Re: HTTP HEAD request
Mon, 10 Apr 1995 22:49:54 +0500
Once upon a time you, Jim Hurley, wrote:
--> I wrote:
--> >And according to the DTD:
--> ><!ELEMENT HEAD O O (%head.content)>
--> ><!ELEMENT BODY O O %body.content>
--> ><!ENTITY % html.content "HEAD, BODY+">
--> >The first O indicates the opening tag is optional, the second one
--> >indicates the closing tag is optional.
--> >Every HTML document must have a head, and I did not say it should not.
--> >All I said is that the <head>, </head> *tags* do not have to be
--> >present, as confirmed by the DTD. (Similar for the <body>, </body>
--> >tags.) Apparently, HTML parsers are smart enough to decide for
--> >themselves what is the head and what is the body.
--> >--> >Returning an error if it encounters EOF before </head> would be a
--> >--> >major design bug.
--> >--> A major design bug of the HTML document, yes - but these are so
--> >--> commonly encountered.
--> >Nope, just like </p>, </li>, etc some tags are not required.
--> But this last part was about encountering a <head> but not getting
--> a matching </head>. Are you saying the <head> is terminated by
--> <body> or some body part?
All I said is that all the tags <head>, </head>, <body> and </body>
are optional. One could have a document with just the </head> tag, or
only <head> and </body>. It is all legal according to the DTD. So, if
you want to grab the head *section* (not the <head> *tag*) you would
have to be a little smarter. However, since only a few elements can
appear in the head section, it is not difficult. Whenever you encounter
anything that is neither a valid head-section element nor enclosed by
one (like <title>, </title>), you have reached the body.
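The heuristic above can be sketched with Python's standard html.parser module. This is a minimal sketch, not a full HTML parser. It assumes the head may contain only the HTML 2.0 head elements (TITLE, ISINDEX, BASE, META, LINK, NEXTID), and it treats the first other start tag, the first non-blank text outside <title>, or an explicit </head> as the start of the body. The names HEAD_ELEMENTS, HeadSectionFinder and head_section are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser

# Assumption: the HTML 2.0 head elements. Anything else marks the body.
HEAD_ELEMENTS = {"title", "isindex", "base", "meta", "link", "nextid"}


class HeadSectionFinder(HTMLParser):
    """Collect head-section elements until the first sign of body content.

    <head>, </head>, <body> and </body> are all optional, so the end of
    the head is inferred rather than read from a tag.
    """

    def __init__(self):
        super().__init__()
        self.head_tags = []   # head elements seen, in document order
        self.title = ""       # accumulated <title> text
        self.in_title = False
        self.done = False     # True once body content has been reached

    def handle_starttag(self, tag, attrs):
        if self.done or tag in ("html", "head"):
            return  # optional wrapper tags carry no information
        if tag in HEAD_ELEMENTS:
            self.head_tags.append(tag)
            self.in_title = tag == "title"
        else:
            self.done = True  # e.g. <body>, <h1>, <p>: the body has started

    def handle_endtag(self, tag):
        if self.done:
            return
        if tag == "title":
            self.in_title = False
        elif tag == "head":
            self.done = True  # an explicit </head> also ends the head

    def handle_data(self, data):
        if self.done:
            return
        if self.in_title:
            self.title += data
        elif data.strip():
            self.done = True  # bare text can only belong to the body


def head_section(document):
    """Return (head element names, title text) for an HTML document."""
    finder = HeadSectionFinder()
    finder.feed(document)
    finder.close()
    return finder.head_tags, finder.title.strip()
```

For example, `head_section("<TITLE>Test</TITLE><H1>Hello</H1>")` returns `(['title'], 'Test')`; the result is the same whether or not the optional <head> and </head> tags are present, since the parser never relies on them.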
However, the question originally raised was how to get only the
head of a document. This means the server has to parse the
document itself, which makes servers more complex, and more