Re: Usenet news and WWW

Tim Berners-Lee (timbl@www3.cern.ch)
Mon, 18 Jan 93 14:03:35 +0100


> Date: Tue, 12 Jan 93 0:06:00 CST
> From: Karl Lehenbauer <karl@one.neosoft.com>
>
> Many of the issues that people seem to be grappling with are
> already handled by news.

Yes ... but on the other hand there are things which are already
handled by

> For example, we are talking about caching nodes. News has highly evolved
> caching capabilities -- I mean, caching is what it is all about -- both for
> TCP/IP and UUCP-based links.

I agree. There are some snags with news, though, for the ultimate
retrieval tool. One trouble is that news caches documents very simply.
The distribution scheme is a flood broadcast. This is OK for real
news (short-lived articles), although many sites sag under the load of
a lot of stuff they never read. There are strict limits on what
anyone posts because of the incredible worldwide total system load
and disk space usage per message. There is no well-defined algorithm
for picking up archived news. The message ID of an article is not
enough: you need to know at least one of its newsgroups and its date,
and be able to deduce the archive node name and the organisation of
the archive.
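
To put the contrast concretely, here is a minimal sketch (Python, with
an entirely invented archive layout, not any real site's convention) of
the guesswork needed to locate an archived article from its Message-ID,
against a single address which carries the location with it:

    # Sketch only -- hypothetical archive layout.  A Message-ID alone is
    # not enough; the client must also guess a newsgroup, a date, and an
    # archive host before it can construct a path.

    def archived_article_path(message_id, newsgroup, year, month, archive_host):
        group_dir = newsgroup.replace(".", "/")     # comp.foo -> comp/foo
        return "%s:/archive/%s/%04d-%02d/%s" % (
            archive_host, group_dir, year, month, message_id.strip("<>"))

    # A retrieval-oriented address carries everything needed in one string:
    document_address = "http://info.cern.ch/hypertext/WWW/DesignIssues/Versioning.html"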

The conventions of posting FAQ lists and other "periodic postings" are
in fact an abuse of the protocol, and would be better served by a
retrieval protocol rather than a broadcast protocol.

I know that the NNTP WG is looking at this sort of area, and maybe we
should all get together.

In a nutshell, if you take all the data everywhere available online
and put it into news, the news system will die. The use of newsgroup
names and lists negotiated by system managers to control what
documents are visible and cached where is too crude, too inflexible
-- it doesn't scale well. The caching has to be automatic.
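
A minimal sketch of what "automatic" means here, with a hypothetical
fetch_from_origin() standing in for whatever retrieval protocol is
used: a site stores a copy only when somebody there first asks for the
document, instead of having everything broadcast to it in advance.

    # Sketch of on-demand (automatic) caching; fetch_from_origin() is a
    # placeholder for the actual retrieval protocol.

    cache = {}    # document address -> document body

    def get_document(address, fetch_from_origin):
        if address not in cache:
            # Nothing is pushed here by broadcast; a copy appears only
            # on first request and is reused afterwards.
            cache[address] = fetch_from_origin(address)
        return cache[address]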

All this said, obviously news and retrieval are coming together,
which is why we have always tried to look for analogies (see previous
messages to this list) between news articles and groups on the one
hand and hypertext documents and lists on the other.

> Someone mentioned the issue of caching and node names, apparently
> node names would have to be rewritten by the cacher or need to be made
> machine-independent in some way (?).

Don't worry about that. I think you are referring to a discussion of
complete vs. partial UILs. Let's keep that separate...

> Article IDs are guaranteed unique
> and are server-independent. The mechanism for translating article
> IDs to filenames is fast and pretty highly evolved.
>

> Oh, ugh, "Supercedes:" doesn't cut it unless the article superceding
> the old one replaces its article ID, which would probably be Bad.

Certainly there is a case for having the "current" version of an
article and a given "fixed" version of an article each explicitly
addressable. See
http://info.cern.ch/hypertext/WWW/DesignIssues/Versioning.html
and linked things for an old discussion of these issues.
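
One way to picture that (a sketch only -- the ";rev=" notation is
invented for illustration, not taken from any spec): give the generic
document one address and each frozen revision its own, and let the
generic address resolve to the latest revision.

    versions = {
        "/hypertext/WWW/Example.html;rev=1": "first text ...",
        "/hypertext/WWW/Example.html;rev=2": "corrected text ...",
    }
    current = {
        # The generic ("current") address resolves to the latest fixed version.
        "/hypertext/WWW/Example.html": "/hypertext/WWW/Example.html;rev=2",
    }

    def resolve(address):
        return versions.get(current.get(address, address))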

> Expiration dates can be set with "Expires:",

Exactly. If you read the provisional HTTP2 spec there is
an explicit link to rfc850 under "Expires". (See
/hypertext/WWW/Protocols/HTTP/Object_Headers.html#z5)
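
So a cache can decide locally whether a copy is still usable. A minimal
sketch, assuming the Expires: value is in the rfc850 date format
(e.g. "Monday, 15-Feb-93 00:00:00 GMT"):

    import calendar, time

    def is_fresh(expires_value, now=None):
        # Parse an rfc850-style date and compare it with the current time (UTC).
        expiry = calendar.timegm(
            time.strptime(expires_value, "%A, %d-%b-%y %H:%M:%S GMT"))
        return (now if now is not None else time.time()) < expiry

    # e.g. is_fresh("Monday, 15-Feb-93 00:00:00 GMT")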

> and sites that archive certain groups already do special things
> on "Archive-Name:".

Really? Tell me more. Is that in an RFC somewhere? A reference? An
example?

> Plus news is already ultra-portable.
>

> Is the brief-connection-per-document approach of HTTP still necessary
> when the data is widely replicated?

As I said above, the mass of data will not be widely replicated.
You don't want a copy of all the data in the phone book, you just
want access to it, plus a cache (which you may currently keep in your
diary). When you're talking about all the phone books in the world,
this is still more the case!

So there will in the end be a directory system, not unlike X.500, which
will allow you to find who has the nearest copy of a document you
want, in a fairly sophisticated way. And you will pick it up from
that place. Then you will click again and pick up a reference from
somewhere else.
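
The lookup itself need not be complicated; a sketch, with an invented
replica table and distance measure:

    # Sketch: a directory maps a document name to the sites holding copies;
    # the client takes whichever is "nearest" by some cost measure.
    replicas = {
        "WWW/DesignIssues/Versioning.html": ["info.cern.ch", "mirror.example.edu"],
    }

    def nearest_copy(document, network_distance):
        # network_distance(host) is hypothetical: hop count, round-trip
        # time, tariff, or whatever the directory service knows about.
        return min(replicas[document], key=network_distance)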

An important feature of HTTP is that the document is returned with
the minimum number of round trips. (Sorry for all the people who have
heard this before.) Connection-oriented protocols like WAIS and NNTP
have an introductory dialogue which slows down the first fetch by n
times the distance over the speed of light.
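
As rough arithmetic (the numbers are invented for illustration): if
one-way latency on a long path is about 50 ms, each round trip costs
about 100 ms, so an introductory dialogue of several exchanges delays
the first document by several tenths of a second before any data flows.

    # Illustrative numbers only.
    ONE_WAY_LATENCY = 0.050                 # seconds on a long path

    def time_before_first_data(round_trips):
        return 2 * ONE_WAY_LATENCY * round_trips

    print(time_before_first_data(1))        # single request/response: 0.1 s
    print(time_before_first_data(4))        # greeting/group/article dialogue: 0.4 s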

We probably need horses for courses -- there is nothing wrong with
keeping a few protocols around, optimised for different access
profiles.

(BTW, I think there is a need for a point-to-point low-bandwidth
protocol designed for beating the hell out of a phone line. One that
will keep the phone line occupied in a very intelligent way with
look-ahead fetches of related documents and lists, or parts of them,
so that a home user with a big disk can explore with optimised ease
when he is paying by the minute. Another good student project.)
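
For what it's worth, a minimal sketch of the look-ahead idea -- fetch()
and links_in() are placeholders, not an existing program:

    # Sketch: keep an otherwise idle phone line busy by prefetching
    # documents linked from what the user already has, up to some budget.

    def prefetch(start_address, fetch, links_in, disk_cache, budget=20):
        queue = [start_address]
        while queue and budget > 0:
            address = queue.pop(0)
            if address in disk_cache:
                continue
            disk_cache[address] = fetch(address)   # done while the user reads
            queue.extend(links_in(disk_cache[address]))
            budget -= 1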

> It would be painful to go reap all the references that
> point to expired articles, although if a user traversed to an expired
> article, perhaps it could be pulled off of tape or an NNTP superserver
> somewhere.
>

> Clearly the authors of WWW think news is important because WWW has
> nice capabilities for accessing NNTP servers. What, then, is the
> motivation for HTTP as opposed to, say, using news with HTML article
> bodies?

I hope I've shown that broadcast data can't cover the NIR world. But
I also hope that we can allow the models to converge and create a
supermodel which encompasses them. This is the end goal of HTTP2 --
or should we call it NNTP3?