Re: Adding new tags (was: Redefining...)

Daniel W. Connolly (connolly@hal.com)
Mon, 13 Jun 1994 13:29:55 -0500


In message <9406131309.aa06913@dali.scocan.sco.COM>, Murray Maloney writes:
>
>However, now it seems that Tim thinks that it
>will be possible for a document instance
>to "encounter a RENDER tag for an undeclared element".
>
>It seems that things are not so clear again.
>At least to me?
>

From what I gather, you had a reasonable picture of how it might work
in mind, and whoever thought a conforming document could contain tags
that refer to undeclared elements was a little confused...

[note that undeclared entities are a different story... the current
HTML DTD has a #DEFAULT entity declaration... more on that later]

>So, what is the story going to be? I think that
>we have to decide and commit right now. Either
>we are going to define HTML 2.0 and 3.0 as strictly
>conforming SGML DTDs and not provide trivial mechanisms
>for extending the language at the whim of information
>providers or browser developers, OR we are going to use
>SGML as a language of convenience for defining HTML 2.0
>and 3.0 and then provide simple but effective ways to
>formalize a mechanism for the extension of the language.

At this point in the game, it's important to phrase these things
carefully -- I've never seen the term "strictly conforming SGML DTD"
before. The term "conforming SGML document", on the other hand,
is defined in ISO 8879, definition 4.51.

I suggest (for the Nth time... :-) that a requirement of the HTML
language is:

An HTML document shall be a conforming SGML document.

This does _NOT_ directly conflict with the ability to "provide trivial
mechanisms for extending the language at the whim of information
providers or browser developers."

For the purposes of the 2.0 spec, there will be no way to use tags
that are not in the standard DTD in conforming documents. There just
aren't any widely deployed mechanisms in place. Browser implementors
will simply be warned that it is quite common for servers to transmit
invalid documents, and certain classes of errors should be tolerated
in the interest of short-term interoperability with experimental
systems.

But for future specifications, it is perfectly reasonable (and perhaps
inevitable) to include "hooks" in the form of parameter entities like
%cextra in the HTML DTD that allow information providers to extend the
language on a per-document basis. And this does _NOT_ necessarily
imply full DTD parsing in every client. A browser could, for example,
support a constrained subset of declarations like:

<!DOCTYPE HTML [
<!ENTITY % html PUBLIC "-//W3O//DTD WWW HTML 2.0//EN">
<!ENTITY % cextra "|quark|lepton">
%html;
]>
...<quark>...</quark>...
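
To make "a constrained subset of declarations" concrete, here is a rough
sketch in Python of how a client might pick the extra element names out of
a prologue like the one above without parsing a full DTD. The function name
and the regular expression are assumptions made up for illustration; the
sketch only recognizes the exact %cextra form shown and is not a real SGML
declaration parser.

```python
import re

# Hypothetical pattern: matches only the one declaration form shown above,
# i.e. <!ENTITY % cextra "..."> with a double-quoted literal.
CEXTRA_RE = re.compile(r'<!ENTITY\s+%\s+cextra\s+"([^"]*)"\s*>')


def extra_phrase_elements(prologue):
    """Return the element names listed in the %cextra parameter entity."""
    match = CEXTRA_RE.search(prologue)
    if match is None:
        return []
    # The literal is a content-model fragment such as "|quark|lepton";
    # names are upper-cased here so later comparisons ignore case.
    return [name.strip().upper()
            for name in match.group(1).split("|")
            if name.strip()]


prologue = """<!DOCTYPE HTML [
<!ENTITY % html PUBLIC "-//W3O//DTD WWW HTML 2.0//EN">
<!ENTITY % cextra "|quark|lepton">
%html;
]>"""

print(extra_phrase_elements(prologue))
```

A client working this way would treat anything else in the declaration
subset as unsupported rather than trying to interpret it.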

Even with these hooks, we only provide limited extensibility. There
may be a need for folks to experiment with idioms that are completely
irreconcilable with the DTD.

We can model this in any number of ways:

-- The "ignore tags you don't recognize" convention.
Experimental documents are "invalid" documents, and
the "unknown" tag names are markup errors, and could
be reported to the user as such.

This works ok for phrase-level markup, but not for
elements that, for example, should cause paragraph
breaks. Imagine a document that uses BLOCKQUOTE
in a browser that doesn't support that element:
the blockquotes would run into the neighboring
paragraphs.

If all we need is various phrase tags on a per-document
basis, the %cextra hook will do just fine.

-- Any document with experimental tags must include
a prologue with declarations for those tags; i.e.
if you want to mess around with experimental
tags, you have to provide a corresponding DTD.

We could support idioms such as:

<!DOCTYPE HTML PUBLIC "-//experimentor//DTD WWW HTML//EN">

and a browser could look up the PUBLIC identifier
in a table of supported (i.e. "precompiled") DTDs.

This leaves open the question: what do you do
with this arbitrary document that you've parsed?
How do you display it? How do you find the links?

Do we adopt a stylesheet mechanism? Architectural
forms? Both?

-- Browsers could support arbitrary DTDs at runtime, and
we could write:

<!DOCTYPE FOO SYSTEM "http://myhost/mydtd">

and a browser could retrieve the DTD at runtime.

At this point, we're talking about a beast that
is clearly distinct from HTML.
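
The first two models can be sketched together in Python: look up the PUBLIC
identifier in a table of "precompiled" DTDs, then ignore (but record) any
tag the chosen DTD does not declare. Everything here is an assumption for
illustration: the table, the class and function names, and the element
sets, which are abbreviated placeholders rather than the contents of any
real DTD. Python's html.parser is a lenient HTML tokenizer, not an SGML
parser, so this only approximates the behavior.

```python
from html.parser import HTMLParser

# Hypothetical table of supported DTDs, keyed by PUBLIC identifier.
# The element set is an abbreviated placeholder, not the real DTD.
KNOWN_DTDS = {
    "-//W3O//DTD WWW HTML 2.0//EN": {"html", "head", "title", "body",
                                     "p", "h1", "a", "em"},
}

# Elements that this sketch renders with a preceding line break.
BLOCK_ELEMENTS = {"p", "h1"}


class TolerantParser(HTMLParser):
    """Renders text, ignoring undeclared tags and recording them as errors."""

    def __init__(self, known_elements):
        super().__init__()
        self.known = known_elements
        self.errors = []   # names of undeclared start tags, in order
        self.text = []     # rendered output fragments

    def handle_starttag(self, tag, attrs):
        if tag not in self.known:
            self.errors.append(tag)      # markup error; tag is ignored
        elif tag in BLOCK_ELEMENTS:
            self.text.append("\n")       # known block element: break

    def handle_data(self, data):
        self.text.append(data)


def render(document, public_id):
    """Return (rendered_text, undeclared_tags) for a supported DTD."""
    known = KNOWN_DTDS.get(public_id)
    if known is None:
        raise LookupError("unsupported DTD: " + public_id)
    parser = TolerantParser(known)
    parser.feed(document)
    parser.close()
    return "".join(parser.text), parser.errors


text, errors = render(
    "<p>Before</p><blockquote>Quoted</blockquote><p>After</p>",
    "-//W3O//DTD WWW HTML 2.0//EN",
)
print(repr(text), errors)
```

Because BLOCKQUOTE is missing from the placeholder element set, the quoted
text comes out with no break before it and runs into the preceding
paragraph, which is the failure described under the first model.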

There are a lot more issues related to "how do I express stuff that's
not in the spec?" But for now, the answer is "you can't."

Dan