Re: Putting the "World" back in WWW...

HALLAM-BAKER Phillip (hallam@dxal18.cern.ch)
Tue, 4 Oct 1994 11:12:10 +0100


In article <8A5B@cernvm.cern.ch> you write:

|>>We are currently using the standard X11 font distribution, MIT have free
|>>fonts for Korean, Chinese and Japanese. There are several Hebrew ones. I have
|>>metafont for hieroglyphs which I would like to have in X11 but the SeeTeX
|>>stuff will not compile on my machine.
|>
|>Speaking of Hieroglyphs, this is one of those areas where Unicode just
|>isn't done yet. We could likewise argue for centuries over how to
|>handle the Sumerian-Akkadian-Assyrian-Babylonian script(s). There is
|>simply no way that Unicode is ever going to be a complete solution.
|>Being a philologist by training, I can only look with horror on the idea
|>of locking everyone into an internal Unicode representation scheme.
|>
|>I've seen suggestions that seem to me as though they amount to this.
|>Am I misunderstanding? Phil, Dan, anyone else - ?

Well, we have 32 bits to play with, not just 16, so there should be plenty of
space to map in extra sets. Alternatively we could divide up the
application-specific area. Hieroglyphs are meant to be included. We could make
a good guess that the Gardiner list will be used as the basis for the encoding
and work on that basis. The best bet in that case would be to use an ASCII
encoding and SGML entities and make the mapping part internal to the browser.
That way there would be no later incompatibility issues. When UNICODE is
completed, all that would have to change would be the entity
definitions.
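
A minimal sketch of that idea, assuming a browser that keeps the
entity-to-code mapping in an internal table. The entity names (A1, A2, G17,
in the style of Gardiner sign numbers), the PROVISIONAL_BASE value and the
decode() function are all hypothetical, chosen only for illustration:

```python
import re

# Hypothetical provisional assignments: Gardiner-style sign names -> codes.
# PROVISIONAL_BASE is an assumed slot in an application-specific area.
PROVISIONAL_BASE = 0xE000
ENTITY_TABLE = {
    "A1": PROVISIONAL_BASE + 0,
    "A2": PROVISIONAL_BASE + 1,
    "G17": PROVISIONAL_BASE + 2,
}

ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def decode(text, table=ENTITY_TABLE):
    """Turn ASCII text with entity references into a list of integer codes."""
    codes = []
    pos = 0
    for match in ENTITY_RE.finditer(text):
        # Plain ASCII between entity references maps to itself.
        codes.extend(ord(c) for c in text[pos:match.start()])
        name = match.group(1)
        if name in table:
            codes.append(table[name])
        else:
            # Unknown entity: pass the reference through literally.
            codes.extend(ord(c) for c in match.group(0))
        pos = match.end()
    codes.extend(ord(c) for c in text[pos:])
    return codes
```

The documents themselves stay plain ASCII. If the codes are later assigned
differently, only the table passed to decode() has to change.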

Just to repeat: we only have one character at a time in 32 bit form. It
is analysed in the context of the current multifont and font to see if a
font change within the multifont is required. The character is then mapped
to the encoding scheme of the font.
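
The per-character step might be sketched roughly as follows. This is an
illustration under assumptions, not the actual browser code: the Font class,
the font names and the simple linear offset mapping are all hypothetical, and
a real font encoding may need a lookup table instead.

```python
from dataclasses import dataclass


@dataclass
class Font:
    name: str
    first: int   # first 32 bit code the font covers
    last: int    # last code covered (inclusive)
    offset: int  # font-local index assigned to `first`

    def covers(self, code):
        return self.first <= code <= self.last

    def encode(self, code):
        # Assumed linear mapping into the font's own encoding scheme.
        return code - self.first + self.offset


def render(codes, multifont):
    """Return (font name, font-local index) pairs, one character at a time."""
    current = None
    out = []
    for code in codes:
        # Change font only when the current one cannot show this character.
        if current is None or not current.covers(code):
            current = next((f for f in multifont if f.covers(code)), None)
            if current is None:
                out.append((None, code))  # no font in the multifont has it
                continue
        out.append((current.name, current.encode(code)))
    return out
```

For example, with a multifont of [Font("latin1", 0x20, 0xFF, 0x20),
Font("hebrew", 0x5D0, 0x5EA, 0xE0)], a run of Latin text containing one
Hebrew letter causes two font changes, and only one character is ever held
in 32 bit form.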

Issues such as the Han unification controversy (which sounds like the name
of a novel) are inevitable if you only have 16 bits. The point, though, is
that SGML is not really any use in defining entity sets. The standard
entity set definition is:

<!ENTITY aacute SDATA "[aacute]" -- Now just how do you intend to print it?-->

To make a browser work interoperably we need a REAL, I.E. ACTUALLY USEFUL,
definition of the entity. The only one that is of use is to define it within
the context of a character set. UNICODE may not be a complete model but it is
at least a reasonably large one. We can bite off chunks of the 32 bit space
for extended usage and novel stuff.
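
As a rough sketch of what "define within the context of a character set"
could look like: the SDATA string in the declaration only repeats the entity
name, so the usable definition has to come from a separate table keyed on that
name. The CHARSET table holds the Unicode values for four ISO Latin 1 entity
names; the resolve() function and the regular expression are hypothetical.

```python
import re

# Unicode code values for a few ISO Latin 1 entity names.
CHARSET = {
    "aacute": 0x00E1,
    "eacute": 0x00E9,
    "ntilde": 0x00F1,
    "szlig": 0x00DF,
}

# Matches declarations of the form: <!ENTITY name SDATA "[name]" ...>
DECL_RE = re.compile(r'<!ENTITY\s+(\S+)\s+SDATA\s+"\[([^\]]*)\]\s*"')


def resolve(declaration, charset=CHARSET):
    """Return (entity name, code) for an SDATA declaration, code may be None."""
    match = DECL_RE.search(declaration)
    if match is None:
        raise ValueError("not an SDATA entity declaration")
    name = match.group(1)
    sdata = match.group(2).strip()
    # The declaration alone says nothing printable; the table supplies it.
    return name, charset.get(sdata)
```

An entity that the table does not know resolves to None, which is exactly
the situation the bare SDATA definition leaves every browser in.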

--
Phillip M. Hallam-Baker

Not Speaking for anyone else.