Well, suffice it to say that HTML doesn't give much semantic
information. It would be nice to express relationships between
pieces of information through the document structure, but in
HTML we mostly use links.
>>[...] The same is true for newlines: it's
>>illegal to treat
>>different from <foo>content</foo> because the difference is
>>not reported by the parser (unless we do some shortref magic
>>to force the parser to report the difference.)
>I don't think we should do any shortref magic. The simplest thing
>(the way it works now) is that the two examples above are identical.
>I say this is fine.
But it's a royal pain to implement! Doing full SGML newline processing
by the standard is quite involved (see the article by Erik Naggum
in comp.text.sgml about SGML and Records that I referenced in
an earlier message). For example, you can't just get rid of all
newlines immediately before or after tags, as it says on the
web. The only ones ignored are those right after a start tag (of a
non-empty element), right before an end tag, or on a line
containing only comments and processing instructions.
Newlines around <P> tags, for example, _are_ reported.
If we don't stick the SHORTREF magic in the DTD to force the
parser to report all newlines, we'll end up with countless hacks
at newline processing, none of which matches the standard, and
it'll be sheer luck if any two of them match each other.
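
To make the newline rule above concrete, here is a rough sketch in
Python. It is only an illustration of the three cases as I described
them, not the full record-end handling of the standard. The function
name, the regular expressions, and the empty_elements parameter are
all made up for this example, and a real parser would get the list of
EMPTY elements from the DTD. The <P> case assumes P is declared EMPTY,
which is one reading of why newlines around it are reported.

```python
import re

# Hypothetical, simplified markup patterns. Real SGML markup recognition
# (quoted attribute values, marked sections, etc.) is far more involved.
COMMENT_OR_PI = r"(?:<!--.*?-->|<\?[^>]*>)"
START_TAG_NL = r"(<([A-Za-z][A-Za-z0-9.-]*)[^<>]*>)\n"
NL_END_TAG = r"\n(</[A-Za-z][^<>]*>)"


def strip_ignored_newlines(text, empty_elements=frozenset()):
    """Drop only the newlines the rule says are ignored; keep the rest.

    empty_elements: lowercase names of elements assumed to be declared
    EMPTY (an assumption of this sketch; normally known from the DTD).
    """
    # Case 3: the newline ending a line that holds only comments and/or
    # processing instructions is ignored.
    text = re.sub(rf"^((?:{COMMENT_OR_PI})+)\n", r"\1", text,
                  flags=re.MULTILINE)

    # Case 1: a newline right after a start tag is ignored, unless the
    # element is EMPTY.
    def after_start(match):
        name = match.group(2).lower()
        return match.group(0) if name in empty_elements else match.group(1)

    text = re.sub(START_TAG_NL, after_start, text)

    # Case 2: a newline right before an end tag is ignored.
    text = re.sub(NL_END_TAG, r"\1", text)
    return text


print(repr(strip_ignored_newlines("<foo>\ncontent\n</foo>")))
# prints '<foo>content</foo>'
print(repr(strip_ignored_newlines("one\n<P>\ntwo", empty_elements={"p"})))
# prints 'one\n<P>\ntwo'
```

Even this toy version needs three separate cases and knowledge of which
elements are EMPTY, which is the point: every ad hoc approximation is
likely to get a different subset of them right.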