> I've made some tangible progress on the X11 browser, so I thought
> I'd let you know.
> This code is not in any shape to distribute, or even show anybody.
> But it works, and it's pretty speedy. That's enough to encourage me
> to polish it off.
Sounds like great progress! The TCL sounds interesting -- where did
you get it?
> [If you want my stuff, you'll have to be C++ capable. I can't
> think in C any more. :-]
Don't worry - we can handle C++, although for the line mode browser
we wanted portability into places where C++ could not reach. That's
why the common code (in WWW/Implementation) is all in C. Believe me,
after writing the NeXT browser in Objective-C it was a wrench to
conclude that it would have to be deobjectified.
> If you could round up some info on exactly what I can expect to see
> in an HTML file, and some idea of how you want it formatted [I have
> the HTML doc and the LineMode browser, but if you've got time to
> give me a little more info...] I'll be ready to tackle that pretty
You ask for info on exactly what you can expect to find in an HTML
file, but you've read the two HTML files about HTML. What is missing?
Here is some discussion about the tags -- where it's not in
http://info.cern.ch/hypertext/WWW/MarkUp/Tags.html I have updated
that document now.
Most of the tags are just style tags: this goes for the headings H1
to H6, the lists UL and OL with list elements LI, the glossary DL
with elements DT and DD.
<TITLE>..</TITLE> is designed to be used for putting in the top
banner of a window, or using as the window name. It also is what you
would use in a history list. It shouldn't be displayed in the text
itself, as usually there is a <H1> heading at the top of the text
anyway. A difference is that the title is designed to make sense out
of context, whereas the heading is within context. For example,
a title might be "Formatting Characters for Printf -- C reference
manual" whereas the heading may just be "Formatting characters".
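To make the title/heading split concrete, here is a minimal sketch in
Python (not the line mode browser's code). The function name
split_title and the history list are hypothetical, and a regular
expression stands in for a real parser:

```python
import re

def split_title(html):
    """Return (title, body); body is the document with the TITLE element removed."""
    match = re.search(r"<TITLE>(.*?)</TITLE>", html, re.IGNORECASE | re.DOTALL)
    if match is None:
        return None, html
    # Collapse line breaks and runs of spaces so the title fits a window banner.
    title = " ".join(match.group(1).split())
    body = html[:match.start()] + html[match.end():]
    return title, body

doc = ("<TITLE>Formatting Characters for Printf -- C reference manual</TITLE>\n"
       "<H1>Formatting characters</H1>\n")
title, body = split_title(doc)
history = [title]   # hypothetical history list: titles make sense out of context
```

The title goes to the window banner and the history list; only the
body, with its in-context <H1> heading, is formatted as text.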
The base address tag is not used, nor is highlighting HP1 etc.
Anchors are used! The REL attribute is NOT used.
<ISINDEX> is sent by servers to indicate that they will accept a
search given this document name plus keywords. It turns on a search
panel when the document is the main window. An even better
implementation would have a keyword field at the bottom of the text
window if the document is a searchable index. That would make the
document more self-contained as an item in the user's eyes, and
reduce screen clutter.
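As a rough sketch of the "document name plus keywords" idea, in
Python: I am assuming here the usual convention of a "?" after the
document address followed by the keywords joined with "+". The
function names and the example address are made up for illustration.

```python
from urllib.parse import quote

def has_isindex(html):
    """True if the server marked the document as a searchable index."""
    return "<isindex" in html.lower()

def search_address(document_address, keywords):
    """Build the address to fetch for a keyword search on an index document."""
    base = document_address.split("?", 1)[0]   # drop any earlier search part
    words = keywords.split()
    if not words:
        return base
    # Assumed convention: '?' then keywords separated by '+'.
    return base + "?" + "+".join(quote(word, safe="") for word in words)

page = "<TITLE>Index</TITLE>\n<ISINDEX>\nType keywords to search.\n"
if has_isindex(page):
    address = search_address("http://example.org/index", "line mode")
```

A browser would show the keyword field only when has_isindex is true
for the document in question.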
<NEXTID> can be ignored by browsers; it is only needed by editors.
<XMP> and <LISTING> are used to indicate inserted literal text.
To make life easier for those writing documents (and because we don't
have entities in the code yet) they are special in that EVERYTHING is
literal text until the closing tag - so one can use XMP for giving
examples of HTML for example. (We really need an escaping method -
the next parser will have simple entities like "<." for "<".)
Within XMP or LISTING, newlines are significant (and mean "new
<PLAINTEXT> is used to indicate that the rest of the file is in fact
just ASCII. It turns off SGML parsing completely. It's a fudge for
the moment, until we have the document format negotiation.
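The literal-text rules for XMP, LISTING and PLAINTEXT could be
sketched like this in Python. This is only an illustration of the
rule as described above, not the existing parser; split_literal and
the chunk labels "markup" and "literal" are names I made up.

```python
import re

OPENER = re.compile(r"<(XMP|LISTING|PLAINTEXT)>", re.IGNORECASE)

def split_literal(html):
    """Split a document into ("markup", text) and ("literal", text) chunks."""
    chunks = []
    pos = 0
    while True:
        start = OPENER.search(html, pos)
        if start is None:
            if pos < len(html):
                chunks.append(("markup", html[pos:]))
            break
        if start.start() > pos:
            chunks.append(("markup", html[pos:start.start()]))
        tag = start.group(1).upper()
        if tag == "PLAINTEXT":
            # The rest of the file is plain text: parsing stops for good.
            chunks.append(("literal", html[start.end():]))
            break
        # Inside XMP or LISTING everything is literal until the matching close tag.
        end = re.compile("</" + tag + ">", re.IGNORECASE).search(html, start.end())
        if end is None:
            chunks.append(("literal", html[start.end():]))
            break
        chunks.append(("literal", html[start.end():end.start()]))
        pos = end.end()
    return chunks

sample = "<P>Use\n<XMP><H1>Title</H1>\n<LI>item</XMP>\nthen<PLAINTEXT>a < b\n<H1>"
chunks = split_literal(sample)
```

Only the "markup" chunks would go on to the SGML parser; the
"literal" chunks are displayed as they stand, newlines included.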
Structure of documents:
In writing a new generic parser, I wondered whether your text object
will store the nested structure of a document. At the moment, the
document is a linear sequence of styles: you can't have lists within
lists, etc. Ideally, it would be able to handle this - although it's
more difficult for a human writer to handle when formatting the
document. I would in fact prefer, instead of <H1>, <H2> etc for
headings [those come from the AAP DTD] to have a nestable
<SECTION>..</SECTION> element, and a generic <H>..</H> which at any
level within the sections would produce the required level of
For a browser, it is quite satisfactory to flatten the structure back
into a sequence of styles, but for an editor it isn't. Are you going
to go for editing capability?
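To illustrate what I mean by flattening, here is a small Python
sketch. It assumes a hypothetical nested Section object (a heading
plus paragraphs and subsections); the class, the flatten function and
the "P" style name are all made up for the example. The generic
heading takes its level from the nesting depth.

```python
from dataclasses import dataclass, field

@dataclass
class Section:
    heading: str
    # Children are either paragraph strings or nested Section objects.
    children: list = field(default_factory=list)

def flatten(section, level=1):
    """Turn a nested section into a linear sequence of (style, text) pairs."""
    # The nesting depth picks the heading style; H6 is the deepest available.
    styles = [("H%d" % min(level, 6), section.heading)]
    for child in section.children:
        if isinstance(child, Section):
            styles.extend(flatten(child, level + 1))
        else:
            styles.append(("P", child))
    return styles

doc = Section("Tags", ["Intro", Section("Lists", ["UL and OL"]), "Outro"])
flat = flatten(doc)
```

Going this way is easy; an editor needs the nested form because the
flat sequence no longer says where a section ends.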
PS: Shall I put you on the www-talk list?