Info on Hypertext format?

Marcus Speh (marcus@x4u.desy.de)
Mon, 22 Feb 1993 17:10:13 GMT


Someone from the DESY computer centre assured me
that this was the Hyper-FAQ to the WWW-wizards. I hope
to pass your kill-file nevertheless. [The same person
did not have any clue how to answer my question.]

I wanted to start a FAQ for one of the hepnet.* newsgroups.
Since the freeHEP server at freehep.scri.fsu.edu is tied to
WWW, and it also archives the hepnet hierarchy, my question
is: how do I format my FAQ in Hypertext?

Besides the question, I also have some information to
give: on the faq-maintainers mailing list, there is a BIG
debate right now on standard formats for FAQs. The discussion
was started by Thomas A. Fine's posting to the list which I
repost here, though the WWWmasters may well have heard about
it already. In the future, the project he outlines may be
the cheapest way of getting your FAQ into Hypertext, but right
now the possibility does not seem to exist.

I'd like to know what you guys think.

--Marcus Speh
--<marcus@x4u.desy.de>



------------------------------ cut here -----------------------------
From: Thomas A. Fine <fine@cis.ohio-state.edu>
To: faq-maintainers@MIT.EDU
Subject: Converting FAQs to hypertext
Date: Fri, 12 Feb 93 20:55:55 -0500


I'm trying to develop a project which will convert FAQs into hypertext
documents under the World Wide Web system. I have a specification,
which is included, for a standard format for FAQs such that they can
easily be translated into hypertext documents. We will translate all
conforming FAQs at Ohio State, and make them available to the world in
hypertext. I'd like to let everyone have a look at what I'm doing, so
you can point out problems before I go ahead and do it.

First I'd like to make certain things clear:

* We can deal with more than one format. If you find what I've done
too restrictive, we can probably develop software that will turn
your own format into hypertext. I would encourage the development
of several different formats, as long as "several" is substantially
less than the total number of news.answers postings.

* The development of additional formats and software is fairly easy.
When the original concept was developed, I wrote perl scripts to
convert comp.lang.perl and comp.unix.questions FAQs to hypertext
(different software for each), in about 3 hours. Software for
the format I'm recommending was also fairly easy, although it has
been tweaked a fair bit, so it took more than three hours.

* The format that I'm proposing is very flexible, IMHO. It is designed
for generic documents as well as lists of questions and answers.

* My format preserves most of the text formatting of the original
document.

* I'm very open to suggestions, volunteers, etc.

* I don't want to make conformance to ANY format a posting requirement.
If you choose not to conform to anything, you just won't have your
posting turned into hypertext.

* If you know about WWW and want to supply your FAQ to us directly as
hypertext, we can make arrangements for that.

* Submission of FAQs to us will be through two different methods. The
most painless is to post it to news.answers -- we'll scan it and
convert anything we find that conforms (we may scan other
newsgroups as well). The other method will be to send us mail at
<an address that hasn't been set up yet>. This method will provide
a faster response, which you might want if you are modifying your FAQ
and want to see it in hypertext.

* The conversion software will be available (this is a requirement for
contributed software and formats). Currently, we have software which
converts a document, but we don't have any software for scanning news
or catching mail. That should be fairly trivial, though.

Background

World Wide Web is a networked information retrieval system, with its own
hypertext documents at its base. Even though WWW is still under development,
much of it is already solid, and there are already several applications
and a nice amount of information available. WWW can interface with both
WAIS and Gopher, and interfaces with other systems have been prototyped.
It is ideal as a local help system, as code already exists for converting
man pages, RFCs, emacs info documents, and lots of other things into
hypertext.

Currently, WWW only supports hypertext documents, although plans are being
made for a protocol which will deal with all MIME document types. Lots
of other ideas are being passed around and there's still much to be done.
Volunteers are welcome. There is a mailing list; subscribe via
www-talk-request@nxoc01.cern.ch. Hopefully we will have our own newsgroup
soon.

What follows is several things in one: It is a rough draft of what will
be posted monthly to news.answers describing this project. It is a
description of the format. And it conforms to the format, and therefore
serves as an example. A formatted version is available if you grab the
software (information is available in the document).

I'd very much prefer that you look at the hypertext results before
commenting. If you don't want to go through the hassle of compiling
the software, XMosaic is available in binary form (see below).

tom

----------------------------------------------------------------------------

Path: cis.ohio-state.edu
From: fine@cis.ohio-state.edu (Thomas A Fine)
Newsgroups: news.misc,comp.infosystems,comp.answers,news.answers
Subject: The World Wide Web FAQ Project
Followup-To: comp.infosystems
Date: 11 Feb 1993 15:08:54 -0500
Expires: 11 Mar 1993 00:00:00 GMT
Summary: Information on the conversion of FAQs to WWW hypertext documents
Organization: The Ohio State University Dept. of Computer and Info. Science
Lines: 240
Message-ID: <asdfINN123@soccer.cis.ohio-state.edu>
NNTP-Posting-Host: soccer.cis.ohio-state.edu
Content-Type: text/x-usenet-FAQ;
version=1.0;
title="Hypertext FAQs"

Archive-name: hypertext-faq-format
Last-modified: 1993/02/11

Statement of Intent
-------------------

FAQs are a wonderful resource, but hard to work through. This project
is an attempt to unite the volume of information found in news.answers
and other newsgroups with The World Wide Web, a system for networked
information retrieval.

World Wide Web uses hypertext documents and a network transport
protocol to build a huge web of information all over the world.
Accessing documents is as easy as clicking a mouse (there are
tty-based interfaces available too). WWW (as we like to call it, for
obvious reasons) also knows how to talk to other services including
WAIS and Gopher. There are plans to extend the document type to be a
MIME document, of which WWW's hypertext (called HTML, for HyperText
Markup Language) will be one part.

Since we don't expect everyone to learn HTML (although it is fairly
straightforward), we have designed a format that can be used for
news.answers (et al.) documents and FAQs which will allow us to
automatically convert them to hypertext. The format has been designed
to allow FAQ maintainers to provide conforming news articles with
minimal changes. Note that while this is currently the only format,
it is possible to support multiple formats.

The format is described in following sections. Note that this article
itself conforms to the format, and a formatted version is available thru
the Web (See "Getting WWW Software").

Getting WWW Software
--------------------

To see what this document looks like after it's been formatted, grab
yourself some software and give it a try. The software is available
via anon ftp from various places. There are several different packages
available:

ftp.ncsa.uiuc.edu in /Web/xmosaic
xmosaic-0.7.tar.Z X11 browser - fairly new, very nice.
(binary available in the dir binaries-0.7)

info.cern.ch in /pub/www/src
tkWWW-0.4.tar.Z X11 browser - Tcl/Tk implementation
viola920730.tar.Z X11 browser - a bit out of date
midaswww-1.0.tar.Z X11 browser - new version expected soon
WWWLineMode_1.3b.tar.Z dumb terminal browser
WWWNextStep_0.15.tar.Z A NextStep browser and editor
www_and_frame-0.2.tar.Z A package for editing HTML with FrameMaker
[my tty-based browser and editor will hopefully be included soon]

To find the FAQ stuff in the Web, you will need the following Universal
Resource Locator, which can be typed into your browser in an application
specific way (the XMosaic author promised to include a built-in link to
the information):

"http://www.cis.ohio-state.edu:80/hypertext/faq/usenet/FAQ-List.html"

The hypertext documents produced from conforming FAQs use few of the
features of HTML, and so will look rather plain. If you would
like more information, start by getting the software and rummaging
around the web. You can also get on the www-talk mailing list by
sending to www-talk-request@nxoc01.cern.ch. Lastly, you could bug me
with mail if you were really desperate. If after seeing what this
system can do, you decide you want to provide your documentation
directly as hypertext, contact me (fine@cis.ohio-state.edu) and we'll
work out the details. Send questions about this format to the same.

The Header Format
-----------------
[Note that when "article" is used, a single article is being referred to.
When "posting" is used, the entire set of articles is being referred to.]

In order to be recognized as conforming, an article must use MIME
headers as follows:

For single-article postings, the header must include:

Content-type: text/x-usenet-FAQ

In addition, two fields can be added to this line:

version=1.0

This indicates the version number to process the file with. If absent,
version 1.0 will be assumed. The other field:

title="The title of the article"

This will be used in various places in the conversion to hypertext. If
not present, the subject line (in its entirety) of the first article
of the posting will be used as the title of that posting. The posting
title must be unique.

Note that when attributes are used, semicolons should also be used
after the Content-type, and after each attribute except for the last:

Content-type: text/x-usenet-FAQ;
version=1.0;
title="Blarg"
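
The header rules above could be checked with something like the following.
This is only a rough sketch in Python using the standard email module, not
the project's conversion software (which is written in perl); the function
name faq_header_info and the sample article are made up for illustration.
It assumes the continuation lines of the folded header are indented, as
mail and news headers require.

```python
from email import message_from_string

# Hypothetical sample article; folded header lines start with whitespace.
ARTICLE = (
    "Subject: The World Wide Web FAQ Project\n"
    "Content-type: text/x-usenet-FAQ;\n"
    "\tversion=1.0;\n"
    "\ttitle=\"Blarg\"\n"
    "\n"
    "Body text.\n"
)

def faq_header_info(article_text):
    """Return (version, title) for a conforming article, or None otherwise."""
    msg = message_from_string(article_text)
    # get_content_type() returns the type/subtype pair in lower case.
    if msg.get_content_type() != "text/x-usenet-faq":
        return None
    # A missing version is taken to be 1.0, as the format describes.
    version = msg.get_param("version", "1.0")
    # A missing title falls back to the subject line of the article.
    title = msg.get_param("title", msg["Subject"])
    return version, title

info = faq_header_info(ARTICLE)
```

Here info holds the version and title strings with the quotes around the
title removed; an article without the Content-type line yields None.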

For multiple-article postings, the Content-type information as described
above should be the first thing found in the BODY of the first article
of the posting (it can be included in the secondary header with
Archive-name and Version lines found in many FAQs). In addition, each
article header must contain the MIME multipart information:

Content-type: message/partial;
number=1;
total=3;
id="totally-unique-id-string"

The "total" attribute is only required on the final part of the document.
The id will be the same for each article in the posting, but is supposed
to be guaranteed unique among postings. A format similar to the typical
news message id is recommended, e.g. something including the poster and
the posting host along with an id unique to that host (the time).
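
Collecting the parts of a multiple-article posting might look roughly like
this. It is a sketch under assumptions, not the actual scanner (which does
not exist yet): the function name group_partial_articles is invented, and
each article is assumed to arrive as one string with indented header
continuation lines. It does not check the "total" attribute.

```python
from collections import defaultdict
from email.parser import HeaderParser

def group_partial_articles(article_texts):
    """Map each posting id to its article bodies, ordered by part number."""
    postings = defaultdict(dict)
    for text in article_texts:
        # HeaderParser reads only the headers, so the body is left untouched.
        headers = HeaderParser().parsestr(text)
        if headers.get_content_type() != "message/partial":
            continue
        posting_id = headers.get_param("id")
        number = headers.get_param("number")
        if posting_id is None or number is None:
            continue                      # malformed part: skip it
        body = text.partition("\n\n")[2]  # everything after the first blank line
        postings[posting_id][int(number)] = body
    return {pid: [parts[n] for n in sorted(parts)]
            for pid, parts in postings.items()}
```

Articles can be handed over in any order; the "number" attribute decides
where each body ends up within its posting.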

The Body Format
---------------
The content is fairly free-form. It can contain any of the following
"sections":

Documents
Ignored text
Questions/Answers

The Ignored text is stripped out first, then the articles are appended
together. The converter then expects to see a series of "Documents"
and/or "Questions/Answers" sections. These are all described
below if you are in text, or are links at the top level if you
are in hypertext.

These will be formatted into a top level hypertext document with links
to all the other documents. The software will attempt to handle simple
subsets of this format accordingly; for instance if a posting consists
entirely of a single questions/answers section, the conversion will
show the list of questions as the top level of the hypertext (this
hasn't been implemented yet).

Each of the Documents or Questions/Answers sections must be started
with a blank line, a left-justified title line, and a left-justified
line of dashes to underscore the title. The only exception is a posting
which is either a single document, or a single set of questions/answers,
in which case no such title is required anywhere. Also, the first
section does not require such a title; if it is left out, the title
"Introduction" will be used.

Documents
---------
A document section is just any section of text, separated by the section
title described previously, and not matching the Questions/Answers form.
This means you must make sure no line in a document section starts with
a number followed by a right parenthesis.

This section on "Documents" you are reading now will be an entire
hypertext document after conversion (this may not be the best
choice for the hypertext layout, but makes a good example.)
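
Splitting a body into titled sections could be sketched as follows. This is
an illustration in Python rather than the real converter; the name
split_sections is made up, and it assumes the ignored text has already been
removed and that a title at the very top of the body counts as preceded by
a blank line.

```python
import re

DASHES = re.compile(r"^-+$")

def split_sections(body):
    """Return (title, text) pairs; untitled leading text is "Introduction"."""
    lines = body.splitlines()
    sections = [["Introduction", []]]
    i = 0
    while i < len(lines):
        line = lines[i]
        prev_blank = i == 0 or not lines[i - 1].strip()
        # A title is a left-justified line after a blank line,
        # underscored by a line made up only of dashes.
        is_title = (
            prev_blank
            and line.strip()
            and not line[0].isspace()
            and i + 1 < len(lines)
            and DASHES.match(lines[i + 1])
        )
        if is_title:
            sections.append([line, []])
            i += 2                      # skip the title and its underline
        else:
            sections[-1][1].append(line)
            i += 1
    # Drop the implicit introduction if it holds nothing but blank lines.
    if not any(s.strip() for s in sections[0][1]):
        sections.pop(0)
    return [(title, "\n".join(text).strip("\n")) for title, text in sections]
```

Each returned section would then be classified as a document or as
questions/answers, depending on whether any of its lines looks like the
start of a question.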

Ignored Text
------------
Some text can be ignored in the posting. Typically in news articles, you
may need to include some redundant text in every article, that won't be
needed in the hypertext. Also some information, like how to unpack the
articles, might be unneeded. Lists of questions are another example, since
the conversion software builds this list from the questions themselves.

To mark text as unneeded, surround it with lines consisting only
of "--".

An example starts here:
--
This text won't show up in the hypertext.
--
That was the example.  Note that if you are looking at the hypertext, there
was nothing between the start and the end, because it was ignored.
     
Important note:  Text will stop being ignored at the end of each ARTICLE,
even if there is no ending "--".  (This can be used to  eliminate
signatures, since lots of people use the "--" there anyway.)
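
The stripping rule can be sketched like this in Python. It is a minimal
illustration, not the project's code; the names strip_ignored and
join_articles are invented, and tolerating trailing whitespace after the
two dashes (as in the usual signature separator) is an assumption.

```python
def strip_ignored(article_body):
    """Drop text between marker lines; ignoring always ends with the article."""
    kept = []
    ignoring = False
    for line in article_body.splitlines():
        if line.rstrip() == "--":
            ignoring = not ignoring   # a marker line toggles the state
            continue                  # the marker itself is never kept
        if not ignoring:
            kept.append(line)
    return "\n".join(kept)

def join_articles(article_bodies):
    """Strip each article on its own, then append the results together."""
    return "\n".join(strip_ignored(body) for body in article_bodies)
```

Because every article is stripped separately, an unmatched marker before a
signature cannot swallow the start of the next article.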
     
Questions/Answers
-----------------
Any section which contains a left justified number followed by a right
parenthesis and then white space will be treated as a Questions/Answers
section.  The number can contain several decimal points, so "1.4.11) "
is an acceptable starting string for a question.
     
The requirements for this section can be summed up as:
* There must be a blank line before and after each question.
* All answer text must be indented from the left margin.
* All questions must have no indentation on the first line.
* All questions start with a number, a right parenthesis, and some whitespace.
     
There can be a section title for some portion of the questions.  It
must have no indentation, and must be preceded by a blank line, and
followed by a blank line and then a question.  It cannot start with a
number!
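
Recognizing questions, answers and section titles from these rules might
look like the following. Again this is only a sketch in Python, not the
real converter; the names QUESTION_START and parse_questions are made up,
and the regular expression is one reading of "a number, a right
parenthesis, and some whitespace".

```python
import re

# A digit, then more digits or decimal points, ")" and whitespace.
QUESTION_START = re.compile(r"^\d[\d.]*\)\s")

def parse_questions(section_text):
    """Return (section_title, question, answer) tuples for one Q/A section."""
    entries = []            # each entry: [title, question, answer lines]
    title = None
    in_question = False
    for line in section_text.splitlines():
        if not line.strip():
            in_question = False                 # a blank line ends the question
            if entries and entries[-1][2]:
                entries[-1][2].append("")       # keep blank lines inside answers
        elif QUESTION_START.match(line):
            entries.append([title, line.strip(), []])
            in_question = True
        elif in_question:
            entries[-1][1] += " " + line.strip()    # a long question continues
        elif line[0] in " \t":
            if entries:
                entries[-1][2].append(line)         # indented text is answer text
        else:
            title = line.strip()                    # unindented, not a number
    return [(t, q, "\n".join(a).strip("\n")) for t, q, a in entries]
```

A question that runs over several lines is joined into one string, which
is why no blank line may appear inside it.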
     
Creating Additional Links
-------------------------
Wherever the string ``(See "document title")'' occurs, a link
will be created to that document if it exists.  This link is just
an example: (See "Questions/Answers").
     
If you are going to do this, make sure you refer to a document title
that is unique.  All non-unique references will be ignored.
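
Link creation could be sketched as below. This is an illustration in
Python, not the project's software; the name add_links, the file names and
the exact anchor markup written out are all assumptions made for the
example.

```python
import re
from collections import Counter

SEE_REF = re.compile(r'\(See "([^"]+)"\)')

def add_links(text, documents):
    """Turn (See "title") into an anchor when the title names one document.

    documents is a list of (title, href) pairs; the href values are
    hypothetical file names chosen by the caller.
    """
    counts = Counter(title for title, _ in documents)
    hrefs = {title: href for title, href in documents if counts[title] == 1}

    def replace(match):
        title = match.group(1)
        if title not in hrefs:
            return match.group(0)   # unknown or non-unique: leave the text alone
        return '(See <A HREF="%s">%s</A>)' % (hrefs[title], title)

    return SEE_REF.sub(replace, text)
```

References to a title that is missing or used more than once pass through
unchanged, which matches the rule that non-unique references are ignored.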
     
Questions and Answers
---------------------
     
Section 1.  The Documents
     
1.1) What is the documents section for?
     
  For introductions to newsgroups, introductions to FAQs, and other
  postings of interest for a newsgroup that don't fall into the question
  and answer scheme.
     
1.2) What if I have only a single document (and no questions)?
     
  Then make sure there are zero or one document titles (a title line
  followed by a line of dashes), and nothing that looks like the start
  of a Question/Answer section.
     
     
Section 2.  The Question/Answer portion of the FAQ
     
2.1) What if my question is too
     long to fit on one line?
     
  Just make sure you don't put a blank line in the question.  The software
  will handle it correctly.
     
2.2) How do you distinguish section titles from questions, if both
start at the beginning of the line?
     
  Section titles cannot start with numbers.  Questions have to start with
  a number, followed by a right parenthesis and then whitespace.
     
2.3) Where, exactly, should I put blank lines?
     
  A blank line has to occur before and after each question.  Blank lines
  must also occur before each section title.  All other blank lines will
  be preserved.
     
Section 3.  Other questions
     
3.1) Can I create additional links?
     
  Yeah, just put in text that looks like this:
     
      (See "Creating Additional Links")
     
  It must refer to a Document title or Q/A title somewhere in the posting.
     
3.2) Can I do tree structuring?
     
  The format is converted into a three level tree, but you can't impose
  any other structure on it, unless you want to provide your documentation
  directly as hypertext.
     
3.3) What if I don't like your format?
     
  Come up with your own.  We may even help you write the software to convert
  it.  Since the format requires a version number, it's easy for us to support
  multiple versions at the same time.
     
  If you do decide to create your own format, it would be best if it would
  be general enough for more than your own postings, as we think it would
  be a little cumbersome having a separate piece of software for every
  different posting.
     
3.4) Since this uses MIME headers, will there be an application
     to read such documents with a MIME news/mail reader?
     
  Eventually, but not now.
     
3.5) Why didn't you use SeText format?
     
  Because we wanted to get this done quickly, and SeText isn't a fully
  realized standard yet.  Also, it's not clear that SeText will handle
  all the functionality we needed.  It is possible that future versions
  will be based on SeText, or at least include some of its features.
     
  BTW SeText is a pseudo-markup language for text documents.  All the
  markups are chosen as items which won't interfere with the reading
  of the document in an unformatted state.  Neat stuff, but not ripe
  yet.
     
About this posting
------------------
This is posted monthly to news.answers, comp.answers, comp.infosystems,
and news.misc.  If changes are made between posts, no differences will
be posted, as the formatted and unformatted versions available through
the web are assumed to be the latest.
     
Send comments and corrections about this posting to fine@cis.ohio-state.edu.
Please send a context diff, or an entire modified version of the file.
     
--
------------------------------------------------------------------------------
| Thomas A. Fine    | fine@cis.ohio-state.edu   | 2036 Neil Avenue Mall      |
| CIS Staff         | (614) 292-7325            | Columbus, Ohio 43210       |
| The Ohio State University - Department of Computer and Information Science |
     
     
     
--
       //////////////////////////////////////////////////////////
      //   marcus@x4u.desy.de   //  Marcus Speh               //
     //  [131.169.30.33]       //  II. Inst. f. Theor. Phys. //
    // Phone: (040) 8998-2260 //  Luruper Chaussee 149      //
   // FAX: (040) 8998-2267   //  2000 Hamburg 50 / Germany //
  //////////////////////////////////////////////////////////