Well then maybe the USENET FAQ project needs a separate DTD. I'm
willing to table the issue for now.
>> Special character entities?
>> Yeah! It uses numeric character references already!
>No -- it used named entities. I'll leave it
So lt, gt, and amp are "Deprecated" rather than "Obsolete", that
is, they are not recommended, but they will be supported. In that
case, we should update the DTD to include them.
>> 12. Default text: this node seems to confuse lots of issues. I
>> suggest we get rid of it. The way to look at HTML is as a stream of data
>> and markup. Newlines are handled differently all over the place. It
>> would be reasonable to talk about how newlines are handled by the text
>> formatter, after they have been handed over from the SGML parser.
>People writing SGML don't want to know about parser and formatters
>(an arbitrary distinction which is very questionable in the definition
>of a DTD or SGML -- it is only relevant to the definition of the
>software interface to an SGML engine)
The distinction between parsers and formatters (i.e. applications)
is very much defined by SGML: a conforming application is not allowed
to act on anything but the ESIS. For example, it's illegal to
treat attribute values delimited by single quotes differently from
those surrounded by double quotes, because that information is
not reported by the parser. The same is true for newlines: it's
illegal to treat
differently from <foo>content</foo> because the difference is
not reported by the parser (unless we do some shortref magic
to force the parser to report the difference).
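To make the quoting point concrete, here is a rough sketch in Python.
It uses the standard library's html.parser as a stand-in for an SGML
parser (it is not one, and what it reports is not a real ESIS); the
EventRecorder class and events_for function are names made up for this
illustration. It only shows that the application receives the same
events whichever quote style the author typed:

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Collects the events the parser reports to the application."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as (name, value) pairs with the quotes already removed
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def events_for(markup):
    """Parse markup and return the list of reported events."""
    recorder = EventRecorder()
    recorder.feed(markup)
    recorder.close()
    return recorder.events


single = events_for("<a href='Tags.html'>tags</a>")
double = events_for('<a href="Tags.html">tags</a>')

# The quote style is gone by the time the application sees the attribute.
print(single == double)
```

An application written against these events has no way to branch on
the quote style, which is the situation described above.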
In any case, I think "people writing SGML" is the group for
whom an understanding of these issues is most critical.
They should be referred to the implementors' guide. This
business of "default text" or "Character Data" is thoroughly
discussed in "Text and Markup" under "Parsing content into
data and markup" and in "Recommended Usage" under "Body elements."
I wrote the "Text and Markup" node to replace this "Default Text"
or "Character Data" node, and I still think the node does
more harm than good.
>> In http://info.cern.ch/hypertext/WWW/MarkUp/Tags.html
>> 13. This text is out of place:
>> Each tag starts
>> with a tag opener (a less than sign)
>> and ends with a tag closer (a greater
>> than sign). Many tags have corresponding
>> closing tags which identical except
>> for a slash after the tag opener.
>Take this as an informal intro not a spec.
>Let's keep the spec in parallel.
I took great pains to make "Text and Markup" an
accessible yet correct intro to SGML syntax. I'd like
to see folks referred to that document for these issues.
If it's not readable, let's fix it.
We must be very careful
of two things: 1. that these redundant informal blurbs
do not in any way conflict with the SGML standard,
and 2. that they are not misleading.
This blurb mostly passes criterion 1: all tags do
indeed start with a less than sign (and I guess
"tag opener" is close enough to "start tag open delimiter"),
though "... which is identical except for a slash after
the tag opener" is goofy: </, the end tag open delimiter,
is not viewed as a start tag open delimiter followed by
a slash.
But not all less than signs indicate tags: a less than
sign is only recognized as STAGO when it's followed by
a letter. And most A end tags are hardly identical
to their start tags, even modulo the slash. The case
of the start tag can be different from the case of the
end tag. I fear that folks will read this blurb and
write broken sed scripts.
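As a rough illustration of where a naive script goes wrong, here is a
small Python sketch. The TAG_OPEN pattern and the tag_opens function
are my own names, and the pattern is a simplification: it assumes
names made of letters, digits, periods and hyphens, and it ignores
comments, marked sections, CDATA content, and less than signs inside
attribute values. It treats a less than sign as opening a tag only
when a letter (or a slash and a letter) follows, and it compares
names without regard to case:

```python
import re

# "<" or "</" counts as a tag open only when a letter follows it.
TAG_OPEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)")


def tag_opens(text):
    """Return (kind, NAME) pairs for each recognized tag open in text."""
    return [
        ("end" if match.group(1) else "start", match.group(2).upper())
        for match in TAG_OPEN.finditer(text)
    ]


sample = '<A HREF="Tags.html">if a < b</a> and 1 <2'

print(tag_opens(sample))   # [('start', 'A'), ('end', 'A')]
print(sample.count("<"))   # 4 less than signs, only 2 of them open tags
```

Note that the start tag here is upper case with an attribute and the
end tag is lower case with none, so a script that looks for an end tag
"identical except for a slash" would never find it.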
Certainly there should be a link from this blurb to
>> 14. These blurbs should probably quote their element declarations
>> from the DTD, in order to help folks learn to read the DTD.
>Yes. And the DTD should be in PRE with links back to the blurbs.
>I have started a //info.cern.ch/hypertext/WWW/MarkUp/HTML.dtd.html
Excellent idea. But again, there are maintenance issues we must
>That's where I got to
I certainly appreciate the speedy response, and the notes
of encouragement I got from a few others.
I am much more confident that this will all be resolved soon.