Re: assorted HTML and SGML questions

Daniel W. Connolly
Sat, 18 Nov 1995 23:34:40 -0500

[David: one or two of these issues are good candidates for
BrowserCaps tests...]

In message <>, Joe Wells writes:
>Q: (("text/html" Internet Media Type)) Does text/html forbid including the
> SGML declaration (<!SGML ...>)?

No. In fact, it requires it...

> I know it forbids including a document
> type declaration subset, but the standard is unclear on whether the
> SGML declaration is allowed.

Well, you're the reader: if you say it's unclear, then it's
unclear :-) Sorry. But the info _is_ in there:
HTML Public Text Identifiers

To identify information as an HTML document conforming to this
specification, each document must start with one of the following
document type declarations.



>Q: ((Internet Media Types for SGML)) Since the text/html Internet media
> type forbids including a DTD subset, what media type should one use if
> one wishes to transmit an HTML document with a DTD subset via HTTP? Is
> there something like a text/sgml media type defined anywhere?

Essentially, yes. The ink isn't dry on the spec, but the MIMESGML
working group is working on it. Start at
and poke around. (Sometimes the DNS info for is
hozed. Try

>Q: ((HTML and Empty P Elements)) What are the semantics of an empty P
> element in HTML? The standard doesn't really seem to deal with this.

A P element is a paragraph. An empty P element is an empty paragraph.
An empty paragraph means whatever your user agent thinks it means.

> There are *lots* of documents on the net with *lots* of empty P
> elements. Is it reasonable for a user agent to issue a warning that
> this is bad HTML?

Hmmm... no, it's technically not bad HTML. On the other hand, a
"weblint"-like tool might issue a warning something like:

"foo.html line 27: Two or more consecutive <p> tags.
If you are using this idiom to create the effect of a certain
amount of vertical whitespace on the reader's display, you
should know that this is an unspecified, and hence unreliable,
technique.

To reliably achieve a certain amount of whitespace, use
a stylesheet. See for
more info."

This would go in the category of "unspecified, but deprecated
for stylistic reasons" in the BrowserCaps classification.
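
A rough sketch of such a check, in Python, might look like the
following. It is only an illustration: the class and function names
are made up here, it relies on the tolerant parser in Python's standard
library rather than a real SGML parser, and it treats a paragraph as
empty when nothing but whitespace separates its <p> tag from the next
one.

```python
from html.parser import HTMLParser


class ConsecutiveParagraphLint(HTMLParser):
    """Record the position of every <p> tag that directly follows another <p>.

    Hypothetical helper for illustration; not part of weblint or any real tool.
    """

    def __init__(self):
        super().__init__()
        self.warnings = []        # (line, column) of each offending <p>
        self._last_was_p = False  # True while the open paragraph is still empty

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag names already lowercased.
        if tag == "p":
            if self._last_was_p:
                self.warnings.append(self.getpos())
            self._last_was_p = True
        else:
            self._last_was_p = False

    def handle_endtag(self, tag):
        # An explicit </p> adds no content, so it does not reset the state.
        if tag != "p":
            self._last_was_p = False

    def handle_data(self, data):
        # Whitespace alone does not make a paragraph non-empty.
        if data.strip():
            self._last_was_p = False


def lint_consecutive_p(source):
    """Return the (line, column) positions of <p> tags that follow an empty <p>."""
    checker = ConsecutiveParagraphLint()
    checker.feed(source)
    checker.close()
    return checker.warnings
```

Feeding it "<p>one\n<p>\n<p>\n<p>two" reports the <p> tags on lines 3
and 4, since each of them follows a paragraph with no content.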

>Q: ((SGML Mixed Content)) I'm not sure if I understand the mixed content
> rules properly. Let me state what I guess the rules are so that you
> can tell me if I got it right or wrong. Here is what I think the rules
> are:
> * If a content model contains #PCDATA anywhere, then the entire
> element has "mixed content".
> * If an element does _not_ have mixed content, then a sequence of
> characters between two tags that is solely composed of whitespace
> (SPACE, TAB, RS, RE) is ignored, otherwise the whitespace is
> treated as ordinary data characters and must correspond to an
> occurrence of #PCDATA in the content model.
> Is this right?

Yup. The corresponding DTD-design rule is:

All content models containing #PCDATA should be
repeatable-or groups, i.e. declarations of the form:
<!element foo - - (#PCDATA | x | y| ... )*>
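
A DTD checker could apply that rule mechanically. The Python sketch
below is one way to do it, under some loud assumptions: the function
name is invented for illustration, the content model is passed in as a
plain string with parameter entities already expanded, and the rule is
applied literally (so only a flat "|" group ending in "*" passes).

```python
import re

# A token is either #PCDATA or a simple element name.
_TOKEN = r"(?:#PCDATA|[A-Za-z][A-Za-z0-9.\-]*)"

# A flat group of tokens joined only by "|", followed by the "*" indicator.
_REPEATABLE_OR = re.compile(
    r"^\(\s*" + _TOKEN + r"(?:\s*\|\s*" + _TOKEN + r")*\s*\)\*$",
    re.IGNORECASE,
)


def follows_mixed_content_rule(model):
    """Return True if the content model obeys the design rule quoted above.

    Models without #PCDATA are outside the rule and always pass.
    Hypothetical helper; assumes parameter entities are already expanded.
    """
    model = model.strip()
    if "#PCDATA" not in model.upper():
        return True
    return _REPEATABLE_OR.match(model) is not None
```

With this, "(#PCDATA | x | y)*" passes and "(#PCDATA, x)" is flagged,
while "(x, y+)" passes because it has no #PCDATA at all. Note that the
sketch is stricter than common practice: a bare "(#PCDATA)" is
generally considered harmless, but it is flagged here because it lacks
the "*".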

>Q: ((SGML LITLEN)) Is the SGML limit on attribute value lengths (LITLEN)
> applied to the attribute value after parsing and entity replacement or
> before?

Err.. after, I think. My copy of the SGML spec isn't handy. The
relevant section is section 7.6, as I recall.

>Q: ((HTML PRE Containing FORM)) RFC 1866 says this:
> For example, a <PRE> element may contain a <FORM> element, ...
> This doesn't make any sense because it contradicts the DTD given in the
> same document. What's the story here?

Er... this is a bug. Where were you during the last two years of
review? :-) FORM isn't allowed in PRE according to the DTD (I just
checked with sgmls). Whether it should be or not, I don't really want
to think about right now.

>Q: ((HTML INPUT and SELECT Attributes)) Why is the SIZE attribute of the
> INPUT element specified to have type CDATA while the SIZE attribute of
> element SELECT is specified to have type NUMBER? Is this to allow
> dimension units to be specified? It doesn't say in the standard.

I'll have to admit that I'm not intimately familiar with some
of the details of FORMS. My best reviewer was Paul Burchard. Maybe
he can answer this one...