Re: Broken links, are we ever going to address them?

Robert Robbins (rrobbins@gdb.org)
Mon, 23 Jan 1995 15:00:19 +0100


On Mon, 23 Jan 1995, derek wrote:

> I believe that in the long term we *have* to have some mechanism for finding
> and correcting broken links.

We should also give some thought to the inevitability of the processes
that create broken links, and to the re-engineering needed to reduce
their effects.

Right now, one of the great features of WWW is that a single copy of a
document is available to readers all over the world. This is both a
feature and a bug. The feature aspects are obvious. The bug aspects
derive from the single point of failure that results. So long as WWW
publishing is based upon authors providing access to a single copy of a
document, WWW publishing will remain the electronic equivalent of "copies
of the manuscript are available from the author upon request."

Let's consider some numbers:

Let N = the total number of WWW documents on the planet
M = mean time between failure, in days (i.e., M = the number
of days that the average WWW document remains accessible at
the same URL)
L = the average number of links pointing to a typical document

Then, on a typical day, the number of links that worked yesterday but that
don't today is given by (N/M)*L. Put in some reasonable estimates for
the future, such as:

N = 100,000,000
M = 1,000
L = 100

and the whole web experiences 10,000,000 new broken links every day. And
these breaks are due only to the actual movement or loss of HTML documents
on their servers. Typos introduced as URLs propagate are not even counted.
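
To make the arithmetic concrete, here is a minimal back-of-the-envelope
sketch in Python; the variable names are mine, and the figures are just the
illustrative estimates above, not measurements:

    # Back-of-the-envelope estimate of new broken links per day.
    # N, M, and L are the illustrative guesses used above, not measured values.
    N = 100_000_000   # total WWW documents on the planet
    M = 1_000         # mean days a document stays reachable at the same URL
    L = 100           # average number of links pointing at a typical document

    docs_moved_per_day = N // M        # documents that move or vanish each day
    broken_links_per_day = docs_moved_per_day * L

    print(broken_links_per_day)        # -> 10000000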

What's the point? The point is that as WWW matures, we are likely to see
the emergence of "publishers" and "libraries" who provide the electronic
equivalent of some of the functionality provided by paper publishers and
libraries: stability of access. As we endorse the power of electronic
publishing over paper publishing, let us not forget that there is a lot of
important functionality, not necessarily related to the use of paper,
embedded in the traditional publishing process.

To obtain access to a copy of a published paper book, I contact new and
used bookstores and libraries. If the book was published fairly recently
by a fairly large publisher, I can probably get access within a reasonable
amount of time. The reference "Books in Print" can be consulted to see
whether the book is still available from a publisher; if not, the local
library or interlibrary loan can be used.

Although these processes are currently related to paper publishing, they
are not absolutely wedded to the use of paper. Instead, they represent
several layers of services and meta-services that help readers gain access to
publications. This entire layer of infrastructure is almost wholly
lacking in the current Web world. The "middlemen" in paper publishing do
not merely skim money while delaying the communication process. Instead,
many provide truly important services.

Some interesting technical challenges lie first in figuring out which of
those services need replicating (and improving) in the world of electronic
publishing, and then in implementing the solutions.