Re: HTML+ Comments

Klaus Harbo (Klaus.Harbo@euromath.dk)
Tue, 20 Jul 1993 09:50:47 +0200


I'd like to second Rob on the issue of EMPTY P elements, since this issue
is something I've been thinking about for the last couple of months...

Nat writes:

> Why is this? The <P> as separator makes sense to me, and is valid
> SGML.

It is valid, but - in my view - it is not very good SGML.

I assume that the DTD in http://info.cern.ch/hypertext/WWW/MarkUp/HTML.dtd.html
is current(?).

Come to think of it, could someone (Tim? Dave?) summarise the status
of and plans for W3 DTDs? I have a very old one, dated 15 Jul 92, made
by Dan Connolly, then there is the one I referred to above and then
there's the HTML+ DTD (which I haven't seen).

I agree with Rob's point that

> There is a strong habit for authors to think of the <P> as a
> container and when (if) we can ascribe style to objects within HTML, <P>
> as separator will be *the* most misunderstood idea.

> Frankly, I am surprised that it was not defined as such to begin with.

Looking at the DTD, there are actually more cases where I find the use
of EMPTY tags unfortunate.

For example (quoting from http://info.cern.ch/hypertext/WWW/MarkUp/HTML.dtd.html,
with a lot omitted):

<!ENTITY % inline "EM | TT | STRONG | B | I | U |
CODE | SAMP | KBD | KEY | VAR | DFN | CITE " >
<!ELEMENT (%inline;) - - (#PCDATA)>
<!ENTITY % text "#PCDATA | IMG | %inline;">
<!ENTITY % htext "A | %text">
<!ELEMENT DL - - (DT | DD | P | %htext;)*>
<!-- Content should match ((DT,(%htext;)+)+,(DD,(%htext;)+))
But mixed content is messy. -->
<!ELEMENT DT - O EMPTY>
<!ELEMENT DD - O EMPTY>

Why not make this:

<!ELEMENT DL - - ((DT+,DD)+) >
<!ELEMENT (DT|DD) - O (%htext;)* >

(or something similar)? It corresponds to the comment next to the DL
declaration. Of course there is mixed content in this too, but that is
inevitable due to the structure of %text;. Of course, in this example
there is no room for P elements in DT and DD, but that could be worked
out.
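
To illustrate (this is only my own sketch, assuming DT and DD both get
the content model (%htext;)* and keep their omissible end tags), a list
tagged exactly the way authors tag it today would still be a valid
instance:

```sgml
<DL>
<DT>SGML
<DD>Standard Generalized Markup Language
<DT>DTD
<DT>Document Type Definition
<DD>The declarations that define the structure of a document type
</DL>
```

The tags are the same as before. The difference is that each term and
each definition is now the content of its DT or DD element instead of
loose text inside DL. The second entry has two DT elements sharing one
DD, which the DT+ in the content model permits.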

What I don't understand is the choice to have EMPTY elements, which
CREATE problems with mixed content.

To go back to Rob's point, I think the same problem occurs with the
declaration of the P element. The structure of HTML instances would be
much clearer if no text could occur outside of elements (ie. all the
text that is currently outside should be put in P elements).

Please note that this would not have to incur a lot more tagging since
the end tag of P should be optional. The main difference is really in
the _interpretation of the instance_, since text would be interpreted
as the _content_ of P. In terms of actual tags, a <P> tag would be
required before the first paragraph (or text, really). Rather than:

<BODY>
bla bla bla
<P>bla bla bla
</BODY>

we should have:

<BODY>
<P>bla bla bla
<P>bla bla bla
</BODY>
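
Declarations along the following lines would express this. They are a
hypothetical sketch, not taken from any of the DTDs mentioned above, and
I have left out the headings, lists and other elements that BODY must
also allow:

```sgml
<!-- Hypothetical sketch: P as a container with an omissible end tag.
     Headings, lists etc. are left out of BODY for brevity. -->
<!ELEMENT BODY - - (P)* >
<!ELEMENT P    - O (%htext;)* >
```

With these, text placed directly inside BODY is an error, and each P is
closed implicitly by the next <P> or by </BODY>.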

---

At the Euromath Center we work on the development of the Euromath
System, a system which at the core has an SGML editor. I am developing
a Network Information Service module for Euromath, which - hopefully
- will provide Euromath users access to both Gopher and W3 information.

I have developed the necessary bits and pieces to let me export HTML.
That way we get WYSIWYG editing of HTML. The idea is to create a
translator that will let me read HTML, thus producing a real WYSIWYG
HTML editor.

I intend to include browsing capabilities (time permitting) when the
translator is functional.

Cheers,

Klaus Harbo

- a long-time WWW lurker and SGML heretic

--
/--------------------------------------------------------------------------\
|  Klaus Harbo                   | e-mail:         Klaus.Harbo@euromath.dk |
|  Euromath Center   (EmC)       | phone (direct):           +45 3532 0713 |  
|  Universitetsparken 5          | phone (sw.board):         +45 3532 1818 | 
|  DK-2100 Copenhagen            | fax:                      +45 3532 0719 |
\--------------------------------------------------------------------------/