HTML spec. questions

James (jtilton@jupiter.willamette.edu)
Sat, 8 Jan 1994 16:12:25 -0800 (PST)


Following my comments on comp.infosystems.www about device-independent
HTML and the like, a number of people have suggested that what's needed
is a tool for checking HTML code for inconsistencies and bad
practices. To that end, I'm starting to work on a "lint for the web" sort
of program.

With that in mind, I've been reading through the specification for HTML
at CERN, and have some questions:

* The comment is made in
http://info.cern.ch/hypertext/WWW/MarkUp/Text.html that "neither spaces
nor tabs should be used to make SGML source layout more attractive to
read". This is understandable in the case of tabs, since their
behaviour is undefined. But why shouldn't somebody use spaces to make
their HTML source more readable? I thought the specification called
for runs of spaces to be collapsed into a single space? Or do we not get to
make that assumption? I'd like to be able to format my HTML like:

<ul>
  <li> this is my unordered list. I realize that this first entry is awfully
       long, and I'd like to have spaces to indent it in the source in order
       to make it readable to me, as an author.
  <ul>
    <li> and if I nest lists, I'd like to be able to indent, so I don't
         lose track of things.
  </ul>
</ul>

Is this sort of thing acceptable? Shouldn't it be?

* It's not explicitly declared in the specification whether an <HR>
implies a paragraph break. I'm assuming it does, but I'd like
confirmation :).

* On that note, I'm also under the impression that if an element implies
a paragraph break (such as the ADDRESS element), then a <P> should
not be placed immediately before OR after it. Is this correct?

* Does PRE imply a paragraph break?
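
The whitespace question in the first item can be made concrete with a
small sketch. It assumes the rule I describe above (any run of spaces,
tabs and newlines outside PRE counts as a single space), which is my
reading and not a quotation from the specification. The function name
collapse_whitespace is made up for illustration.

```python
import re


def collapse_whitespace(text):
    """Collapse every run of whitespace into one space (assumed rule)."""
    return re.sub(r"\s+", " ", text).strip()


# The same list, once indented for the author and once written flat.
indented = """<ul>
  <li> first entry,
       continued on an indented line
</ul>"""
flat = "<ul> <li> first entry, continued on an indented line </ul>"

# If the assumed rule holds, both sources mean the same thing to a browser.
print(collapse_whitespace(indented) == collapse_whitespace(flat))
```

If browsers really do collapse whitespace this way, indenting with
spaces would be harmless, which is why the recommendation in Text.html
puzzles me.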

I'm not sure yet what the scope of this program will be. At the minimum,
it will do things like point out incorrect usages of <P>, and other
things which are pointed out as not recommended by the specification.
That is, things that will be parsed successfully, but aren't really
device-independent. I'm not sure whether or not it should also check to
see whether the HTML is just plain illegal -- is this necessary or even
desired functionality? (And do I want to go to the extra effort? :) )
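
To give an idea of the kind of check I mean, here is a rough sketch of
the <P> rule from the questions above. It is only an illustration under
stated assumptions: the set BREAKING (ADDRESS, HR, PRE) is my guess at
which elements imply a paragraph break, the names ParagraphLint and lint
are hypothetical, and it leans on the HTMLParser class from Python's
standard library, not on a real SGML parser.

```python
from html.parser import HTMLParser

# Assumption: the elements treated here as implying a paragraph break.
# ADDRESS is my current understanding; HR and PRE are still open questions.
BREAKING = {"address", "hr", "pre"}


class ParagraphLint(HTMLParser):
    """Flag a <P> that sits directly next to a paragraph-breaking element."""

    def __init__(self):
        super().__init__()
        self.last_event = None   # (tag, is_closing) of the last tag, None after text
        self.warnings = []

    def _warn(self, message):
        line, _ = self.getpos()
        self.warnings.append("line %d: %s" % (line, message))

    def _tag(self, tag, closing):
        if self.last_event is not None and not closing:
            prev_tag, prev_closing = self.last_event
            # <P> followed directly by the start of a breaking element
            if prev_tag == "p" and not prev_closing and tag in BREAKING:
                self._warn("<P> immediately before <%s>" % tag.upper())
            # a breaking element (closed, or the empty HR) followed directly by <P>
            if tag == "p" and prev_tag in BREAKING and (prev_closing or prev_tag == "hr"):
                self._warn("<P> immediately after <%s>" % prev_tag.upper())
        self.last_event = (tag, closing)

    def handle_starttag(self, tag, attrs):
        self._tag(tag, False)

    def handle_endtag(self, tag):
        self._tag(tag, True)

    def handle_data(self, data):
        if data.strip():         # whitespace alone does not separate the tags
            self.last_event = None


def lint(source):
    """Return a list of warning strings for the given HTML source."""
    parser = ParagraphLint()
    parser.feed(source)
    parser.close()
    return parser.warnings


for warning in lint("<p>\n<address>me</address>\n<p>more text"):
    print(warning)
```

A check like this only looks at neighbouring tags, so it says nothing
about whether the document is legal HTML; that would need a proper
parse against the DTD.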

Any comments appreciated!

-et

/ (James) Eric Tilton, Student AND Student Liaison, WITS \
\ Class of '95 - CS/Hist -- Internet - jtilton@willamette.edu /
<a href="http://www.willamette.edu/~jtilton/">ObHyPlan!</a>, chock fulla
<a href="http://www.willamette.edu/~jtilton/whatsnew.html">Fun Stuff!</a>