ISO, Unicode -> multilinguality

Richard L. Goerwitz (goer@midway.uchicago.edu)
Mon, 26 Sep 94 18:05:11 CDT


Just received an enlightening personal letter. In it there was one
question worth answering here on www-talk. In my haste, it simply
did not occur to me that not everyone here has the same background
as I do :-)...

> Here's another exigency that must be handled. Arabic, Hebrew, etc.
> run right-left:
>
> <language Arabic encoding="ISO 8859-6" wrap="right-left">
>
> The reason the "wrap" must be specified is that it is possible to
> do, say, Arabic in one of two ways. The first is to just code in
> the stuff backwards.
>
>Under what circumstances can it possibly make sense to code letters in
>backwards? Certainly not in HTML.... Unless you want to have to
>reverse whole files of text! (Since the line breaks are not defined
>in the source.)

This is an excellent question. Why indeed should we ever have to code
backwards? I've been talking about Arabic so far, but consider a case we
often see on Israeli servers. They often show menus with English at one
end and Hebrew at the other (implying an ISO 8859-8 encoding - you just
have to switch your client over manually). They do it like this:

Menu item 1....................1 meti uneM
Menu item 2....................2 meti uneM
etc.

Of course the right-hand material would be written in Hebrew. I just
reversed the English here so everyone would be able to see the point.
All servers assume left-right directionality. So the only way to get
the English and Hebrew both up is to code the English in forwards, and
enter the Hebrew backwards.

<UL>
<LI>
<language English encoding="ISO 8859-1" direction="normal">
Menu item 1................
</language>
<language Hebrew encoding="ISO 8859-8" direction="reverse">
1 meti uneM
</language><BR>
<LI>
<language English encoding="ISO 8859-1" direction="normal">
Menu item 2................
</language>
<language Hebrew encoding="ISO 8859-8" direction="reverse">
2 meti uneM
</language><BR>
</UL>

What we are doing is putting the Hebrew in manually, reversing its
natural or normal order in the underlying file (normally, text that is
read first should be coded first - here it is the opposite of that).
Again, if this example were "real," the Hebrew text above would use
high-order characters, as in ISO 8859-8. Note that the above coding
assumes that no server is going to break up the line. That is a pretty
bad assumption. Who knows what they'll do? So that brings us to the
more sensible coding scheme, i.e., to code both English and Hebrew
"forwards," but expect the client to know that when it comes to
displaying Hebrew, it's going to run right-left:

<UL>
<LI>
<language English encoding="ISO 8859-1" direction="normal">
Menu item 1................
</language>
<language Hebrew encoding="ISO 8859-8" direction="normal">
Menu item 1
</language><BR>
<LI>
<language English encoding="ISO 8859-1" direction="normal">
Menu item 2................
</language>
<language Hebrew encoding="ISO 8859-8" direction="normal">
Menu item 2
</language><BR>
</UL>
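
To make the line-breaking point concrete, here is a rough Python sketch.
It is only an illustration, not how any real client works: the function
names, the sample sentence, and the 20-column width are all made up, and
as in the menus above, English stands in for Hebrew. It wraps the same
sentence twice, once stored "forwards" and reversed by the client at
display time, and once stored already reversed, then reads the displayed
lines back right-left to see whether the sentence survives.

```python
import textwrap

# Hypothetical sample text; English stands in for Hebrew here.
TEXT = "Menu item 1 opens the archive"


def display_logical(text, width):
    """Text is stored "forwards". The client wraps it first, then
    reverses each line so that it runs right-left on screen."""
    return [line[::-1] for line in textwrap.wrap(text, width)]


def display_visual(text, width):
    """Text is stored already reversed. A client that assumes
    left-right text just wraps the stored characters as they are."""
    stored = text[::-1]
    return textwrap.wrap(stored, width)


def read_right_left(lines):
    """What a reader sees going right-left along each line,
    top line first."""
    return " ".join(line[::-1] for line in lines)


logical_lines = display_logical(TEXT, 20)
visual_lines = display_visual(TEXT, 20)

print(logical_lines)                  # ['snepo 1 meti uneM', 'evihcra eht']
print(read_right_left(logical_lines))  # Menu item 1 opens the archive
print(visual_lines)                   # ['evihcra eht snepo 1', 'meti uneM']
print(read_right_left(visual_lines))   # 1 opens the archive Menu item
```

With the text stored forwards, the sentence reads correctly after
wrapping. With the text stored backwards, the end of the sentence lands
on the first line and the beginning on the second, which is exactly why
the reversed coding only works if nobody ever breaks the line.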

I hope this helps get us all on the same page. As I mentioned, I
believe that the MIME standard incorporates both of these methods.
But I'm no expert. Someone correct me if I'm wrong.

The fact that menus like the ones I described above are being hacked
together on the net shows that the demand is there for bilingual text
(multilingual text, in fact) - this in addition to all of the database
and dictionary projects I mentioned in earlier postings.

Richard Goerwitz
goer@midway.uchicago.edu