Re: HTML DTD

Dan Connolly (connolly@pixel.convex.com)
Thu, 25 Jun 92 16:59:59 CDT


>thanks for that contribution. Not being as hot on SGML
>as I ought to be, I don't see why the HREF has to refer to
>an entity declared separately rather than directly having
>a string argument.
>
That's actually left over from when I was trying to point
HREF attributes to MIME attachments. It's not really
necessary to move the UDIs into entities as long as you're
careful that the UDI syntax is a subset of the SGML
attribute literal syntax.

Beware, for example, that an SGML parser will expand entity
references in an attribute literal to produce the CDATA for the
attribute value. So <A HREF="A&P"> might be OK for the linemode
browser, but an SGML parser will try to resolve &P as an entity
reference.
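
To make the A&P problem concrete, here is a rough Python sketch that
lists the entity names an SGML parser would try to resolve inside an
attribute value. It assumes the reference concrete syntax, where "&"
followed by a letter opens an entity reference; the function name and
the regular expression are mine, not part of any SGML tool.

```python
import re
from typing import List

# Assumption: reference concrete syntax. "&" followed by a name start
# character (a letter) opens a general entity reference; the name then
# continues over letters, digits, "." and "-".
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.\-]*)")

def entity_refs(value: str) -> List[str]:
    """Return the entity names a parser would try to resolve in value."""
    return ENTITY_REF.findall(value)

print(entity_refs("A&P"))
print(entity_refs("A & P"))
```

A UDI with no "&" directly followed by a letter comes back with an
empty list, which is the case that is safe to put straight into HREF.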

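Also, SGML attribute values have a maximum length specified
in the SGML declaration. In the reference quantity set, a single
attribute value literal is limited by LITLEN (240) and the whole
attribute specification list by ATTSPLEN (960).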
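
A length check along these lines is easy to sketch in Python. The
limit is a parameter because it comes from the SGML declaration in
use; the default of 240 assumes the reference quantity set's LITLEN,
and a plain character count is only an approximation of how a parser
measures a literal. The function name is hypothetical.

```python
def fits_attribute_literal(value: str, litlen: int = 240) -> bool:
    """Rough check that value is short enough for one attribute literal.

    litlen defaults to 240, assumed here to be LITLEN in the reference
    quantity set; pass the value from your own SGML declaration.
    """
    return len(value) <= litlen

print(fits_attribute_literal("http://info.cern.ch/hypertext/WWW/TheProject.html"))
print(fits_attribute_literal("x" * 1000))
```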

>The title is in fact optional currently, by the way ...
>we could keep it so though it "ought" always to have one.
>
>I'd like a DTD which as closely reflects the current HTML as
>possible.

I suppose you could come up with a DTD that describes something
close to the current HTML, but I'm not sure of the value of it.
HTML allows tags to be pretty much sprinkled wherever you feel
like putting them. Any DTD that allows that much leeway just
looks like this:

<!ENTITY % alltags "TITLE|H1|H2|H3|MENU|OL|UL">
<!ELEMENT (%alltags;) - - (%alltags;)*>

i.e. every element is just a repeatable or-group of all the elements.
Then the SGML parser can't do any minimization cuz nothing's required.
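
To see how little that content model constrains, here is a small
Python sketch that treats a content model as a regular expression over
child element names. PERMISSIVE mirrors the repeatable or-group above;
STRICT is a made-up model (TITLE first, then anything else) included
only for contrast and is not a proposal for HTML.

```python
import re
from typing import List

ALLTAGS = ["TITLE", "H1", "H2", "H3", "MENU", "OL", "UL"]

# Each child name is encoded as "NAME " so that names cannot run together.
# PERMISSIVE: a repeatable or-group of every element, as in the DTD above.
PERMISSIVE = re.compile("(?:(?:" + "|".join(ALLTAGS) + ") )*")

# STRICT: hypothetical model for contrast, a required TITLE first.
STRICT = re.compile("TITLE (?:(?:H1|H2|H3|MENU|OL|UL) )*")

def accepts(model: "re.Pattern[str]", children: List[str]) -> bool:
    """True if the sequence of child element names satisfies the model."""
    encoded = "".join(name + " " for name in children)
    return model.fullmatch(encoded) is not None

scrambled = ["UL", "TITLE", "H3", "H1"]
print(accepts(PERMISSIVE, scrambled))
print(accepts(STRICT, scrambled))
```

The permissive model accepts any ordering, including an empty
document, so no position ever implies a particular tag.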

> Then, if we change HTML to HTML2, I would
>change it in a number of ways, in particular to include
>separate header and body parts. I have come across the
>"Davenport" group of publishers who are defining DTDs for
>technical documentation. They include Steve Newcomb who
>is the HyTime guy (or one of the two I should say).
>I would like to get some input from them.
>

Certainly we should keep tabs on things like the Davenport
group and HyTime.

But my immediate concern is these little syntactic differences
that render HTML documents worthless to an SGML parser. The
current HTML and UDI syntax make a good proof of concept, but
we need to move toward formal definitions so that we can
have confidence that correct implementations will interoperate.

More later...

Dan