Re: National characters...again

Albert Lunde (Albert-Lunde@nwu.edu)
Wed, 1 Feb 1995 18:29:14 +0100


At 2:54 PM 2/1/95, Donald=Greer@tsl.texas.gov wrote:
> I believe that Latin 1 is the only specified extended character set. For a
>definitive answer though, check the DTD or the HTML 2.0 drafts.

A recent version of the HTML 2.0 draft says in the section on MIME and HTML:

>Character sets
> The charset parameter is reserved for future use. See Section 2.16 for a
> discussion of character sets and encodings in HTML.
>
> The actual character set used in the representation of an HTML document
> may be ISO 8859/1, or its 7-bit subset which is ISO 646. There is no
> obligation for an HTML document to contain any characters above decimal
> 127. It is possible that a transport medium such as electronic mail
> imposes constraints on the number of bits in a representation of a
> document, though the HTTP access protocol used by WWW always allows 8 bit
> transfer.
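
To make the 7-bit/8-bit distinction concrete, here is a small Python sketch. It is an illustration only and does not come from the draft. It checks whether a byte string stays at or below decimal 127, which means it stays within the 7-bit ISO 646 (US-ASCII) subset of ISO 8859/1. The function name needs_8bit is made up for this example.

```python
def needs_8bit(data: bytes) -> bool:
    """Return True if any byte is above decimal 127, i.e. the text is not
    confined to the 7-bit (ISO 646 / US-ASCII) subset of ISO 8859/1."""
    return any(b > 127 for b in data)


if __name__ == "__main__":
    latin1 = "café".encode("iso-8859-1")  # 'é' is byte 0xE9 (decimal 233)
    print(needs_8bit(latin1))             # True
    print(needs_8bit(b"plain ASCII"))     # False
```

A document for which this returns False passes unchanged through a 7-bit mail path. One that returns True relies on the 8-bit transfer that HTTP allows.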

I think the context of this is that HTML 2.0 is intended mostly to specify
current practice as of mid-94, and the intention is that HTML 2.1 would
introduce "minor" ;) ;) extensions like international character sets.

Discussion of these issues has broken out, on and off, for the last two
months (at least) on the HTTP and HTML working group lists.

It's my personal opinion that there would be relatively little controversy
about extending HTML/HTTP specs to allow use of the MIME charset parameter
for ISO-8859-X where X=1 to 9 (the character sets already mentioned in the
MIME RFCs). *However*, this has not yet actually been done, and there is an
implementation problem in that not all WWW software parses MIME charset
parameters.
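
For what it's worth, honouring the charset parameter on the client side is not much code. The following is a minimal Python sketch under two assumptions. First, it uses the standard library's email package to parse the Content-Type value. Second, it falls back to ISO-8859-1 when no charset is given; that fallback is my assumption, since the draft quoted above only reserves the parameter. The names charset_of and decode_body are hypothetical helpers.

```python
from email.message import Message

# Assumption for this sketch: treat Latin 1 as the default when the
# Content-Type carries no charset parameter.
DEFAULT_CHARSET = "iso-8859-1"


def charset_of(content_type: str) -> str:
    """Return the (lowercased) charset parameter of a Content-Type value,
    or DEFAULT_CHARSET if there is none."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(DEFAULT_CHARSET)


def decode_body(content_type: str, body: bytes) -> str:
    """Decode a document body using the charset named in its Content-Type."""
    return body.decode(charset_of(content_type))


if __name__ == "__main__":
    print(charset_of("text/html; charset=ISO-8859-2"))  # iso-8859-2
    print(charset_of("text/html"))                       # iso-8859-1
```

Software that skips this step and always assumes Latin 1 will misrender documents sent as, say, ISO-8859-5.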

(De facto, I think people are using other character sets anyway and hacking
their clients to convert them, based on out-of-band knowledge of the
correct charset.)

What seems more controversial is the treatment of mixed character sets
and/or languages in a single document. This brings in Unicode and other
things like ISO 2022 or ideas from the Text Encoding Initiative. I don't
know what solutions will get standardized. (The options are constrained
somewhat by keeping HTML SGML-compliant.)

For more information see:

HTML WG archive
<URL:http://www.acl.lanl.gov/HTML_WG/archives.html>

HTTP WG archive
<URL:http://www.ics.uci.edu/pub/ietf/http/hypermail/>

and my personal collection of bookmarks:

<URL:http://www.mcs.com/%7Elunde/web/aboutwww.html>

---
    Albert Lunde                      Albert-Lunde@nwu.edu