assorted HTML and SGML questions

Joe Wells (jbw@cs.bu.edu)
Sat, 18 Nov 1995 22:30:14 -0500


Hi, HTML and SGML gurus,

I have some more questions whose answers I haven't been able to find
while browsing the WWW. Some of these questions are about HTML, some are
about SGML, and some are about HTML as an SGML document type.

Q: (("text/html" Internet Media Type)) Does text/html forbid including the
SGML declaration (<!SGML ...>)? I know it forbids including a document
type declaration subset, but the standard is unclear on whether the
SGML declaration is allowed.

Q: ((Internet Media Types for SGML)) Since the text/html Internet media
type forbids including a DTD subset, what media type should one use if
one wishes to transmit an HTML document with a DTD subset via HTTP? Is
there something like a text/sgml media type defined anywhere?

Q: ((HTML and Empty P Elements)) What are the semantics of an empty P
element in HTML? The standard doesn't really seem to deal with this.
There are *lots* of documents on the net with *lots* of empty P
elements. Is it reasonable for a user agent to issue a warning that
this is bad HTML?
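
To make the question concrete, here is a minimal Python sketch of how a
checker might count empty P elements before deciding whether to warn. It
is only an illustration under stated assumptions: the names EmptyPFinder
and count_empty_p are hypothetical, "empty" is taken to mean "no child
elements and nothing but whitespace", and a P is assumed to end only at
</P> or at the next <P> (a real SGML parser would also end it at other
block-level start tags).

```python
from html.parser import HTMLParser


class EmptyPFinder(HTMLParser):
    """Counts P elements with no child elements and only whitespace.

    Hypothetical helper; not taken from any standard or user agent.
    """

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.seen_content = False
        self.empty_count = 0

    def close_open_p(self):
        # Record the currently open P (if any) as empty or non-empty.
        if self.in_p and not self.seen_content:
            self.empty_count += 1
        self.in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # Assumption: a new <P> implicitly ends an open P.
            self.close_open_p()
            self.in_p = True
            self.seen_content = False
        elif self.in_p:
            # Simplification: any other start tag counts as content.
            self.seen_content = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.close_open_p()

    def handle_data(self, data):
        if self.in_p and data.strip():
            self.seen_content = True


def count_empty_p(html_text):
    finder = EmptyPFinder()
    finder.feed(html_text)
    finder.close()
    finder.close_open_p()  # account for a P still open at end of input
    return finder.empty_count
```

A user agent could call count_empty_p on a document and issue a single
warning when the result is nonzero, rather than one warning per element.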

Q: ((SGML Mixed Content)) I'm not sure if I understand the mixed content
rules properly. Let me state what I guess the rules are so that you
can tell me if I got it right or wrong. Here is what I think the rules
are:

* If a content model contains #PCDATA anywhere, then the entire
  element has "mixed content".
* If an element does _not_ have mixed content, then a sequence of
  characters between two tags that is composed solely of whitespace
  (SPACE, TAB, RS, RE) is ignored. Otherwise, the whitespace is
  treated as ordinary data characters and must correspond to an
  occurrence of #PCDATA in the content model.

Is this right?
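
To pin down exactly what I am guessing, here is a minimal Python sketch
of the two rules as stated above. It encodes only my guess, not
necessarily what the SGML standard says. The function names are
hypothetical, content models are treated as plain strings, and RS and RE
are assumed to be line feed and carriage return as in the reference
concrete syntax.

```python
# Whitespace characters named in the rules: SPACE, TAB, RS, RE.
# Assumption: RS is line feed and RE is carriage return.
SEPARATORS = " \t\n\r"


def has_mixed_content(content_model):
    """Rule 1 as guessed: #PCDATA anywhere means mixed content."""
    return "#PCDATA" in content_model.upper()


def classify_run(content_model, chars):
    """Rule 2 as guessed: classify a character run between two tags.

    Returns "ignored" for a whitespace-only run in an element without
    mixed content, and "data" otherwise. A "data" result for a model
    without #PCDATA would then be an error in the document.
    """
    if not has_mixed_content(content_model) and chars.strip(SEPARATORS) == "":
        return "ignored"
    return "data"
```

For example, under these guessed rules a line break between two tags
inside an element declared as (LI)+ is ignored, while the same line
break inside an element declared as (#PCDATA | EM)* counts as data.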

Q: ((SGML LITLEN)) Is the SGML limit on attribute value lengths (LITLEN)
applied to the attribute value after parsing and entity replacement or
before?

Q: ((HTML PRE Containing FORM)) RFC 1866 says this:

    For example, a <PRE> element may contain a <FORM> element, ...

This doesn't make any sense because it contradicts the DTD given in the
same document. What's the story here?

Q: ((HTML INPUT and SELECT Attributes)) Why is the SIZE attribute of the
INPUT element specified to have type CDATA while the SIZE attribute of
the SELECT element is specified to have type NUMBER? Is this to allow
dimension units to be specified? The standard doesn't say.

Thanks for any help you can give me.

-- 
Joe Wells <jbw@cs.bu.edu>