Re: HTML todo list

Tim Berners-Lee (
Thu, 14 Jan 93 18:02:07 +0100

My machine crashed from too many windows and I lost a few unsent mail
messages with that. So I may repeat myself at first, up to point 14.

Changes to the DTD I have made are in


Connolly/Current/* is untouched.

> Date: Mon, 11 Jan 93 22:36:43 CST
> From: Dan Connolly <>

> 1. My dictionary lists "markup" as a word, not mark-up.
> 2. The PLAINTEXT situation should be logged as a bug against
OK it is but not many servers use it and clients like to be able to
get source of postscript files for example easily. HTTP2 will fix it.
> 4. HTML should support QUESTION and RESPONSE elements to
> support converting USENET FAQ's to HTML
Too specific I think.
> In
> 5. PLAINTEXT is deprecated. Use PRE, and use a sed script
Done. text2html.sed on the web under HTML generation tools.
> 6. .../WWW/Tools/HTMLGeneration/dir2html.txt
> This thing doesn't quote attributes; ...

> 7. .../WWW/Tools/HTMLGeneration/ls2html.awk.txt> Quotes around

> 8. .../WWW/Daemon/Implementation/asis.txt
> Quote HREFS, numeric character references where necessary.
Quotes in online version, original is being rewritten anyway I am
> 9.
> Uses HEADER instead of HEAD.
> Quote HREFs.
> Special character entities?
> Yeah! It uses numeric character references already!
Does it? You mean entities I think.
> In

> 10. Mark-up again

> 11. This text seems out of place:
OK I have hidden it. :-) Does your spec say it anywhere?

> 12. Default text: this node seems to confuse lots of issues.
OK Reference to your doc instead

> In
> 13. This text is out of place:


> In
> 14. These blurbs should probably quote their element declarations
I have started an HTML.dtd.html with links.

> In
> 15. This seems redundant:

> 16. What does this mean?
Elaborated and more specific.

> 17. Should the TITLE element be CDATA, RCDATA, or PCDATA?
> If we want to be able to use Latin chars in the title,
> it can't be CDATA. The only difference between RCDATA
> and PCDATA (with no subelements allowed) is that comments
> are recognized in PCDATA, whereas they are just regular
> data in RCDATA.

Good point.

- If we specify Latin 1 as the base set, can't we have Latin 1
characters in CDATA?

- If we can't, then I guess we use PCDATA as it would be the
only place except for <XMP> and <LISTING> where we can use

> In
> 18. The word "Format" connotes lexical details, which are discussed
> elsewhere. I endorse the use of examples, but I'd like to keep
> the model of
> SGML source ==parser==> ESIS ==WWW semantics==>formatted
> consistent. The WWW semantics processor doesn't deal with <>'s etc.
> It just sees the presence of the ISINDEX element and acts

Yes. OK. But I want as I said before (unless the crash lost the
message) to have two documents out of this. One is the HTML spec for
MIME IANA registration. The other is a readable document which is
NOT 100% a precise reference document but can be read by human beings
WITHOUT SGML knowledge. I can guess that this document will have 10
times the readership of the other if it is readable, as <10% of the
people creating HTML will know about SGML CROs etc etc.

It is good to have a lot of cross-reference between them.

> In
> 19. The status of each element should be noted consistently. e.g.
> Mainstream Consistently used by past, present, and future
> Deprecated In use and will be supported, but should be avoided.
> Obsolete In use in some documents, but will not be supported.
> Proposed Not yet in the DTD or widely supported (e.g. LINK)
> Standard Not yet widely supported, but will be (e.g. PRE)
> Extra It's legal to ignore these. (e.g. EM)

We have almost as many categories as elements! I'd add
Obsolescent Will be obsolete when the alternative implementation
(eg HTTP2) is available.

I'd make PRE mainstream as there are no implementations for which a
new PRE-understanding version is not available or easily made
available. And so cut out "Standard". Oops, I put it in again.

I have made NEXTID Mainstream. Editors need it: can't do without it
really. I would perhaps change it to <EDITING NEXTID=z27> if that
was felt to be more logical.

We also need a hook for a version for the checkin/out/lock logic
DAN(?) proposed. That was that when you
lock or PUT a document, you specify the version so that a document
can be PUT or CHECKED IN by a different person to the one who GOT it.
This means the server gives a key, a version or date code, with the
document. This is all HTTP2 except when a document is stored
somewhere, passed around and then eventually returned to the server.
In that case, it needs a place to hold its original version number
on the server.

<EDITING NEXTID=z27 CHECKEDOUTAS="19930217234507">
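
A minimal sketch of the version check described above, in Python. The class and method names (`DocumentStore`, `put`, `based_on`) are hypothetical and not part of any proposal in this thread; the version strings only follow the date-code style of the CHECKEDOUTAS example. The idea shown is that the version travels with the document, so whoever returns it need not be whoever fetched it.

```python
class StaleVersionError(Exception):
    """Raised when a PUT names a version that is no longer current."""


class DocumentStore:
    """Hypothetical server-side store keyed by document name."""

    def __init__(self):
        self._docs = {}  # name -> (version, body)

    def get(self, name):
        """Return (version, body); the version is handed out with the text."""
        return self._docs[name]

    def put(self, name, body, based_on, new_version):
        """Store body only if it was edited from the current version."""
        current = self._docs.get(name)
        if current is not None and current[0] != based_on:
            raise StaleVersionError(
                f"{name}: server has {current[0]}, edit was based on {based_on}"
            )
        self._docs[name] = (new_version, body)
        return new_version


store = DocumentStore()
store.put("todo.html", "first draft", None, "19930217234507")
version, _ = store.get("todo.html")
# Anyone holding the document and its recorded version may check it in.
store.put("todo.html", "second draft", version, "19930218090000")
```

A document that was passed around offline would carry `version` inside itself, which is the role the CHECKEDOUTAS attribute plays in the example element.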


> 20. How many of these are allowed? I could change
Any non-negative integer
> I don't know if the latter is legal SGML. I'd have to try
> it out.
I think that's what we want.

> 21. Link types are not well defined. The only reason to put
> something in a public specification is so everybody can agree
> on some semantics. If there are no semantics to agree on,
> why include the TYPE attribute? (Its status is at best
> "proposed" in my mind, though it's in the DTD.)

Yes and no. We need some well-defined link types but we also need this
as a hook for the future, for which we haven't enough experience. Link
types should be registered.

This is a flexibility point, but it must be firm ... like
a towing ball on the back of your pickup you want to be able
to connect anything onto it but you want it well fixed onto the

But I want to make it REL instead of TYPE as people think TYPE
refers to the object type of the destination object rather than the
link. (From messages on this list).

> In
> 22. "(at least six)" -- how about exactly six? Though I've
> seen a lot of style guides that frown on anything more than 4.

I agree. I would frown on anything over 3 in a hypertext document.
However, it is useful to generate a great big HTML document by
concatenating little ones, demoting their heading levels. You then
print the big document. This generates up to 6 easily. Maybe we
should go to 9 but frown on >4.
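
A minimal sketch of the concatenate-and-demote step described above, in Python. The function names are hypothetical, and the regular expression is an assumption that only handles plain `<Hn>` tags; it is not a real SGML parser. Levels are clamped at 6, the current limit under discussion.

```python
import re

# Matches the digit in <H1>..<H6> start and end tags, case-insensitively.
HEADING_TAG = re.compile(r"(</?h)([1-6])(?=[\s>])", re.IGNORECASE)


def demote_headings(html_text, levels=1, max_level=6):
    """Shift every heading down by `levels`, clamping at `max_level`."""
    def shift(match):
        new_level = min(int(match.group(2)) + levels, max_level)
        return f"{match.group(1)}{new_level}"
    return HEADING_TAG.sub(shift, html_text)


def concatenate(title, parts):
    """Join small documents under one top-level heading for printing."""
    body = "\n".join(demote_headings(part) for part in parts)
    return f"<H1>{title}</H1>\n{body}"


big = concatenate("Book", ["<H1>One</H1>", "<H1>Two</H1><H2>Detail</H2>"])
```

Each round of concatenation adds one level, so documents that already use three levels reach the limit after three rounds; beyond that the clamp flattens distinct levels together.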

> In
> 23. We should give at least one complete reference to the standard,


> 24. In the Archive section, we could mention comp.text.sgml,
> the SGMLs parser materials, and the archive.

Link put in crudely.

> In
> 25. All attribute values have to be quoted, including NAME.
> The example is wrong.

I have changed NAME to be a NAME -- ie doc-wide unique which it must
be. Numeric ones are then not valid but I don't generate them any
more. I think that we should stick to the intended ID system. In the
future, we can think about IDs on many other elements.

> 26. The TYPE attribute hardly seems worth mentioning. In the DTD,
> it's a NAME, not just any old string.

I have made it REL as I said above and I think it is very important.

> 27. We should look at modeling anchors as HyTime linkends
> and/or ilinks.

Yes I agree when someone has time to get into that.

> 28. We should look at modeling the LINK element as a HyTime
> construct as well.

> In
> 29. I don't like the use of "exact representation" here:
OK we stick to "rendering" for that

> 30. Where are P's allowed? In the DTD, they're allowed in:
> CODE, SAMP, etc.

That's right. They are not in the CERN implementations allowed in
<DL> or <UL> etc, but they would be useful in those.

> In
> 31. Ordered lists: Obsolete or Standard?

Standard. Bother I thought we'd got rid of that! (The next editor will
turn them into unordered lists at the moment but I can fix that)

> 32. "The format is:" Here again, this is an example, but it's
> hardly a specification of the format of a UL element.

Ok. example.

> 33. What does this mean?
> The opening list tag must be immediately
> followed by the first list element.

(LI | (A|%text)+) in SGML I suppose just as you say.
You can't
<UL>and here they all are:
<LI>The first..
<LI>the second

> 34. The important difference between UL, MENU, and DIR is not
> how they are displayed, but their semantic meanings. A MENU
> is a list of things to choose from. A DIR is a list of names
> in a directory.

Yes and no. I too like logical definitions -- I am sold on semantic
markup but HTML is to cover a vast range of data and semantics. MENU
These things are NOT necessarily what their names suggest -- many a
selectable menu is set out as a DIR or a DL. The element names are
mnemonic only. The blurb talks about how much text is in the

> 35. We could also make this semantic distinction between PRE,
> XMP, and LISTING, were it not for the syntactic confusion
> surrounding XMP and LISTING.

We could but we are deprecating XMP and LISTING and PRE will do for
all. You can only be very semantic in a very narrow application.
This is not one.

> 36. Get rid of this:

> In
> 37. Wording of the newline documentation:
> Line boundaries within the text are
Reworded with "render"

> 38. Semantics of newlines in PRE. Given the current DTD, a newline
> after the PRE start tag or before the PRE end tag is not reported
> by an SGML parser.

> I think I can cook up some magic SHORTREF declarations that will
> cause the SGML parser to report the newlines (possibly as P tags).
> [This would obviate the need for special newline processing code
> in libHTML]

> In any case, I'd suggest that ALL NEWLINES REPORTED BY THE SGML
> leaves the issue of which newlines are reported, which is governed
> by the SGML standard.

... and with the issue of explaining the end result to the
simple HTML writer and to me without our needing to call on the
model of the SGML engine and application. Awaiting the results
of your tests with SHORTREF.

> 39. I don't like the way this is worded:
> The &#60;p&#62; tag should not be used.
Ok done

> 40. "... character character highlighing elements may be used."
> Ack! I don't recommend this! Hmmm... maybe only the B, I, and U
> elements. This certainly conflicts with the current DTD.

Serious point here folks. There was a great demand for B I U
for man pages and the like. Why prohibit anything other than TT?
Or, to keep it simple, allow anything and mention TT should not be
used, and the constraints of fixed width may limit the ability to
render some highlighting.

I have introduced %htext noting that text always occurred with A.
I hope I have done it right.

> In
> 41. These have status "Extra"
> Where not supported by implementations,
> like all tags, these should be ignored.<p>

> This should be a warning to providers that some information may
> be lost on some browsers.

> 42. (Definition of these and reference
> - Dan?)
> They come from TeXinfo.

> 43. I left the TeXinfo @file element out. I don't remember why.
> It might have been an oversight. Do we want it in there?

No, too specific. We have enough.

> 44. Examples (TBD) see complete.html in my stuff.

I repeat that I like your examples but I would like them split
into GOOD HTML documents describing bad HTML documents,
with links to the bad documents for testing only.
We don't want people to follow links to the only documentation to
find their parser has core dumped :-)

> In
> 45. The PLAINTEXT tag terminates the HTML entity. What
> follows is not SGML. Instead, there's an HTTP convention
> that what follows is a text/plain body.
OK -- in.

> 46. "The text may contain any ISO Latin printable characters" --
> this conflicts with the DTD. I think a design that treats Latin
> characters as external data entities is cleaner than one that
> treats them as text characters, but I'm willing to go the
> other way.

I'm glad. Let's. I think that a full 8-character base will be
easier. I think the text should be able to contain any latin 1.

> 47. "including the
> tag opener, so long as it does not
> contain the closing tag in full."
> For Pete's sake, could we get this out of there once and for all?
OK OK OK :-) I hope "The text may contain any ISO Latin printable
characters, but not the end tag opener. (See Historical note)" is OK

> 48. "The <a NAME="z22">XMP tag</a>..." Use the term "element". The
> term "tag" doesn't include the content of the element.

> In
> 49. "Special characters are represented
> by SGML entities"
> They're represented by numeric character references.
> The lt, gt, and amp entities are not in the DTD. They should
> be supported for historical reasons, but they are obsolete.
I would like them in the DTD. While people are still reading/writing
HTML they are useful. My mental ASCII table is in hex, not decimal,
anyway. Are they any overhead? Why the war against them? For the ISO
characters you wanted the opposite. (Does your mental ASCII table
stop at 128? Mine too)
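
A small Python illustration of the two spellings being argued over. It uses the standard library's `html.unescape`, which follows modern HTML rules rather than the 1993 DTD, so it only shows that a named entity and a decimal numeric character reference denote the same character. The `REFERENCES` table and `both_forms` helper are hypothetical names.

```python
import html

# The three names under discussion, with their decimal code points.
REFERENCES = {"lt": 60, "gt": 62, "amp": 38}


def both_forms(name):
    """Return the named and the decimal numeric reference for one character."""
    return f"&{name};", f"&#{REFERENCES[name]};"


for name in REFERENCES:
    named, numeric = both_forms(name)
    # Both spellings decode to the same single character.
    assert html.unescape(named) == html.unescape(numeric) == chr(REFERENCES[name])
```

The numeric form needs the decimal value (60 for "<", which is 3C in hex), which is the memorability cost raised above.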


> In
> 50. I'd like to move the Abstract, Specification, and the reference
> "Text and Markup" up into
> That node would look like

> <H1>HyperText Markup Language</H1>
> <H3>Abstract</H3>...
> <H2>Language Reference</H2>
> <A>Text and Markup</A>
> <A>The Elements</A>
> <A>Implementors' Guide</A>
> <H2>Specification</H2>
> <A>the DTD</A>
> <H2>Appendices</H2>
> <A>futures</A>
> <A>constraints</A>

> and this node would become "Implementors' Guide", with
> pointers to recommended, complete, tolerated, errors,
> libHTML, and SGMLs.

> In
> 51. include ISO Latin 1 character set in SGML declaration?

> 52. Put PLAINTEXT back in HTML element (fell out by mistake.)

> 53. LINK element?

> 54. Get rid of H5 and H6?

> 55. Get rid of link TYPE element?

> 56. Document BLOCKQUOTE in Elements reference.

This BLOCKQUOTE... it should be one thing or the other.
If it cannot contain other paragraph styles then it should be a
paragraph style like address, and not be able to contain address.
This is easy for everyone to implement.

If it can contain ADDRESS then why not let it contain anything - in
particular, headings. Trouble is, I can't represent that in RTF
easily so that blows the NeXT and Mac browsers. So let's
make it

<!ELEMENT BLOCKQUOTE - - (%htext;|P)+>
like ADDRESS, and bear it in mind for HTML3 which will have SECTION
in, ie without the linear RTF constraint.

> 57. EXPIRES attribute on HEAD?

I took it off .. it's in HTTP2, as it applies to all formats not just
> 58. Get rid of NEXTID element?
Nope .. needed to stop editors reusing deleted IDs. See above.
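
A minimal sketch, in Python, of why an editor needs a persisted counter to avoid reusing deleted IDs. The class name and the "z" prefix with a running number are assumptions modelled on the z27 example earlier in this message, not a description of any existing editor.

```python
class AnchorIds:
    """Hands out anchor names and never reissues one, even after deletion."""

    def __init__(self, next_id=1, prefix="z"):
        self.next_id = next_id   # the value an editor would persist as NEXTID
        self.prefix = prefix
        self.live = set()

    def allocate(self):
        name = f"{self.prefix}{self.next_id}"
        self.next_id += 1        # only ever moves forward
        self.live.add(name)
        return name

    def delete(self, name):
        self.live.discard(name)  # next_id is deliberately left untouched


ids = AnchorIds(next_id=27)
first = ids.allocate()    # "z27"
ids.delete(first)
second = ids.allocate()   # "z28", so old links to z27 cannot hit a new anchor
```

Without the stored counter, an editor scanning only the surviving anchors would hand out z27 again, and existing links to the deleted anchor would silently point at unrelated text.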

> 59. Document URN, TITLE, METHODS attributes of A element.
Ooo yes. Done. Lots of "notes" attached for info only.

> 60. Proposed Headers element (like a DL; for RFC822 message
> <dt>To<dd>
> <dt>Subject<dd>HTML todo list

1. In fact, <DL COMPACT> looks very similar and has less narrow a
2. In fact the headers information could rather be regarded as part of
the metainfo in the <HEAD> element. Many of the RFC822 things will
in fact be outside the document in the HTTP layer. This is a bit
chicken-and-egg. Here we are describing an SGML DTD for a specific
format for a MIME-wrapped RFC822 body, and in it we want to put the
RFC822 header. Hmmm. Something has got muddled. But I understand
what you mean: very often one quotes mail messages as text. Strictly,
one shouldn't though. You shouldn't be able to edit the headers.

Currently there is DL COMPACT which does that. It is implemented in
www. I am torn between generality and the practicality of getting
something defined and out the door and I think the latter wins so
let's put COMPACT as an attribute for DL and leave the HEADERs if you
don't mind too much.

> 61. List STYLE attribute?

No I don't think so -- see discussion #60

CDATA is probably nearest to the original intention?

This is your stuff Dan I think:
> In
> 63. Under "Parsing Content Into Data and Markup," improve the
> explanation of the MIXED, ELEMENT, EMPTY, CDATA, and RCDATA content
> types (PCDATA is the wrong term) and how it affects parsing.
> 64. Revise the section on the sample implementation, libHTML, and
> supported.html.

> In
> 65. This node should be moved to the implementors' guide.
Same comments as above -- moved in <PRE>

> In
> 66. Delete the reference to the perl script.

> 67. There are two references here to old versions of my spec.
> 68. Header: it's in there: HEAD
> 69. Highlighting: it's in there>

> 70. Fixed width with anchors: it's in there: PRE.
gone .. all gone!

> (get rid of HP1 etc. in Elements reference)
No -- I will put that lot in another file though to keep it clean.
There are some people (here) who generate a lot of HPs.

> 71. Entities: Latin chars are in there. What do we need bullets
We don't.

> 72. Comments: the comment element is a bad idea. SGML comments are
> documented and supported.

They are rather different in that a comment can surround a whole
nested stack of SGML elements, and could be nested. I don't suppose
SGML comments can?

> 73. Link types: we should look at HyTime before we go much further
> on this.
Well, there are only 9 pages on hypertext in HyTime (More Time than
Hy) and in that I can't see any mention of link types. As I said
above (with a different metaphor), I think this should be a well
defined and entrenched gate into uncharted territory.

> In the midaswww-1.0 browser: [by the way: I've fixed all these in
my copy]

> 74. HREF's with quotes don't work
Fixed with Tony's fix
> 75. Unrecognized tags are treated as data, rather than ignored.
> 76. numeric character references and entity references aren't
Could you post diffs please for those Dan? Thanks.

> 77. ETAGO doesn't end XMP, LISTING, PLAINTEXT unless it's the right
> GI. (e.g. <XMP>foo</foo> blah : blah should not be in the XMP

> 78. local: access scheme is weird. I suggest we go with ftp: and
> local-file: to match MIME, and get rid of local: and file:
Covered in another message. I am prepared to split
file to ftp: and local: even though there are many (decnet, afs,
etc) ways to get at files and the client may be the best judge of
what will work for him.

Now, what about the SAVEDAS address so that from just the content of
the document the partial UDIs can be resolved? I think that is a
useful thing, and could be essential. I will put that in as Standard.

> Well, that's all I can think of. Good night.
I hope you slept well...

I have made a provisional list of link relationships. I hope
they show the utility of the attribute. They can always be ignored!

> Dan