HTML and SGML

lee@sq.com
Fri, 19 Aug 1994 17:42:33 -0400


Larry Masinter asked that I distribute this article more widely..
so here it is.

Lee

-- 
Liam Quin, Manager of Contracting, SoftQuad Inc +1 416 239 4801 lee@sq.com
HexSweeper NeWS game;OPEN LOOK/XView/mf-fonts FAQs;lq-text unix text retrieval
HTML: SoftQuad HoTMetaL: ftp.ncsa.uiuc.edu:Web/contrib/SoftQuad, and also
doc.ic.ac.uk:packages/WWW/ncsa/..., gatekeeper.dec.com:net/infosys/...

<INCLUSION>
Newsgroups: comp.text.sgml
Subject: Re: HTML questions
Summary:
Expires:
References: <32ccuo$niu@usenet.hana.nm.kr>
Sender:
X-Feet: bare
Followup-To:
Distribution:
Organization: SoftQuad Inc., Toronto, Canada
Keywords: hypertext, HTML, HyTime

I was going to mail this (as requested), but I thought it might answer some other people's questions about HTML and SGML too, so I am posting it.

Eugene Byon (Human Computers, Inc., Seoul, Korea) <eyb@next.human.co.kr> wrote:
> [...]
> 2) [HTML] an SGML DTD, thus giving us all of the advantages of using SGML
> without having to spend a great deal of time and effort in familiarizing
> ourselves with all the facets of SGML.

"the VW Bug is a car, thus giving us all the benefits of the automobile without having to learn how to operate a complex Cadillac."

HTML does indeed have an SGML DTD, but it uses very, very few of the facilities of SGML. This isn't to say that HTML is bad; it's very useful. It solves real problems, and lets people do things they can't in fact do in `pure SGML', which is why there are an estimated four to ten million users of HTML today.

The hypertext linking that HTML provides is not done in SGML, but with HyTime, an ISO standard for linking between documents (and other hypermedia activities). This doesn't mean that you can't put hypertext links between SGML documents without HyTime and have a system that makes the links work. But doing this without being tied to a single vendor is probably best done with HyTime.

>1) Is there any way hypertext links can be implemented within RTF
> documents?

Not portably. In fact, since the RTF spec changes fairly frequently, you can't even be sure you'll be able to read your RTF files tomorrow.

>2) John Krieger of Westinghouse says that a converter from RTF to HTML
> exists. How effective is that converter, and where can we obtain a
> copy of it?

You could try archie. Please don't post if you don't know archie -- read news.answers or send me mail [this is _not_ a general invitation :-)]
<A HREF="ftp://ftp.cray.com/src/WWWstuff/RTF">You could try here.</A>

>3) How extensively is HTML being used in newspaper publishing today?

It isn't, as far as I know. HTML is not particularly suitable for newspapers. Or, about as heavily as VW Bugs are used in farming :-)

But some of the more successful newspaper publishers are looking at moving to SGML. Obviously, the others are already using SGML :-)

>4) Does the use of HTML documents heavily depend on the World Wide Web
> and the Internet?

Yes. You can use HTML locally, but it's much more fun on the net, because you can see all those undergraduate (`sophomore' in the US) Home pages... For more information about HTML, please go to the newsgroup comp.infosystems.www.users and read the FAQ before posting.

>6) What are the disadvantages of using HTML? How does HTML compare to
> HyTime for creating hypertext documents?

Use HTML if
* you have many small documents
@ there are no tables
@ there are no mathematical formulae
@ you do not want to have links to particular locations within documents,
  only to entire documents
* there are multiple on-line providers of information over whom you have no
  editorial control
* you want people on the internet to have fast online access to your text
* producing paper printouts is not a major goal
@ archival status is not an issue, you are not concerned about incompatible
  changes in software because you can easily edit all your documents in
  the future
* you do not need to do structure-based searching (see below)
* you do not need to print or display different parts of documents
  differently at different times (e.g. put all keywords in bold)

The items marked with @ are changing either as part of the process of making HTML an international (IETF) standard or as part of the natural evolution of HTML.

The items marked with * are probably fundamental to HTML, although it's all fairly subjective.

What HTML does, it does _very_ well. As a basis for a newspaper's archives and publications, it's a little weak.

If you use SGML, you can invent some tags of your own when you start.

For example, suppose you have tags like

<Story>
<By>Simon Barefoot
<Title>White Slavery Ring uncovered in Washington
<Printed>
<paper>Pentagon Paragon
<press-date>15/5/1994
<page>1
<column>1
<Edited>Susan Gore
<Text>
Today in <City>Washington</City>....

Here, you have a complex header that you might keep in a database, and also text with information such as a city name marked as such. How much of this you do is up to you in SGML. You can't do any of it in HTML or RTF.

You might then be able to do searches like

    find me all articles by Simon Barefoot that mention the city of
    Washington (i.e. not George and Mildred Washington)
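A search of that kind can be sketched in a few lines of Python. This is only an illustration, not a description of any particular retrieval product. It assumes the stories have been normalized to well-formed XML with explicit end tags (the example above relies on omitted end tags, which Python's standard xml.etree parser does not accept). The <Archive> wrapper, the second and third stories, and the find_stories helper are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical archive. The first story follows the example in the post;
# the other two are invented purely to make the search interesting.
ARCHIVE = """
<Archive>
  <Story>
    <By>Simon Barefoot</By>
    <Title>White Slavery Ring uncovered in Washington</Title>
    <Text>Today in <City>Washington</City> ...</Text>
  </Story>
  <Story>
    <By>Simon Barefoot</By>
    <Title>At Home with the Neighbours</Title>
    <Text>An interview with George and Mildred Washington.</Text>
  </Story>
  <Story>
    <By>Susan Gore</By>
    <Title>Rain Again</Title>
    <Text>More rain fell on <City>Washington</City> today.</Text>
  </Story>
</Archive>
"""


def find_stories(root, author, city):
    """Return titles of stories by `author` whose text tags `city` as a City."""
    matches = []
    for story in root.iter("Story"):
        by = story.findtext("By", default="").strip()
        # Only text explicitly marked up as <City> counts as a city mention.
        cities = {c.text.strip() for c in story.iter("City") if c.text}
        if by == author and city in cities:
            matches.append(story.findtext("Title", default="").strip())
    return matches


root = ET.fromstring(ARCHIVE)

# Structure-based search: uses the <By> and <City> tags.
titles = find_stories(root, "Simon Barefoot", "Washington")

# Plain string search for contrast: it cannot tell a city from a surname.
naive = [
    s.findtext("Title")
    for s in root.iter("Story")
    if s.findtext("By") == "Simon Barefoot"
    and "Washington" in "".join(s.itertext())
]

print(titles)
print(naive)
```

The tagged search returns only the story in which Washington is marked as a city, while the plain string search also picks up the interview with George and Mildred Washington.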

RTF does not easily support content searches. HTML servers sometimes have quite good searching, but do not support what's called `content tagging', where you have tags that reflect the meaning, rather than the appearance, of your text. You can't do that in HTML at all.

I hope this helps a little.

Lee

</INCLUSION>