WWW meta indexes (proposal)

Tony Sanders (sanders@bsdi.com)
Mon, 25 Oct 1993 12:49:49 -0500


WWW Indexing
============

Ok folks. It's time we got busy and made this better. Here is a proposal
for a simple site definition file. Let's hash out some of the issues and
then do it. The sample file is on my server right now
(http://www.bsdi.com/site.idx).

What we need to accomplish
--------------------------
1) agree on the filename of the site.idx file
2) agree on the format of the file (either "foo: data" or something else).
3) agree on the initial content and semantics of the index file
4) set up an email address where people can send registration forms
(these don't have to be processed right away).

Constraints
-----------
1) The data format must be extensible (need I even say it)
2) It must be simple enough that we can get started soon
3) It must allow for meta-indexing other protocols in the future
4) The database must be distributed (so you can do the search on
a nearby site).

What we will need next
----------------------
1) software to accept and process registration forms (via email)
2) software for updating registration (a robot)
3) software for building the indexes (wais?)
4) software for searching the index and a site to host it

I believe that the above is all fairly easy.

Let the indexing begin!
-----------------------

This document is a proposal. Discussion to take place on:
www-talk@info.cern.ch
Or send electronic mail to Tony Sanders _<sanders@bsdi.com>_

The latest version of this document_ is available online at:
http://www.bsdi.com/HTTP:TNG/www-indexing.etx

To get the process of a WWW global index started I would like to propose
the following for a site registration file format. This data should
be accessible on your server as http://server/site.idx_

To jumpstart the registration process you will have to email one of
these to some address yet to be determined (thereafter, your file will
be occasionally updated by an automated retrieval process). Of course,
you can always email in a new one if something important changes.

We can extend the syntax later to include pointers to other resources.
WWW-wandering-robots would use this file to determine the server's
preferences for indexing. For example, we could add a field "wwwwr:
never" (or "0000 / 2400" for always). If you would like additional
information to be indexed we could invent a tag that points to those
documents (or whatever we want to do).

I believe this covers the basics and sufficiently allows for future
extension.

First an example, then I will explain each field
(this file is http://www.bsdi.com/site.idx_):

Name: www.bsdi.com:80
Organization: Berkeley Software Design, Inc
Organization-Type: Commercial software developer
Contact: Tony Sanders
Postal-Address: 3110 Fairview Park Dr, Suite 580;
    Falls Church, VA 22042
Electronic-address: webmaster@www.bsdi.com
Telephone: +1 800 800 BSDI
Location: Fairfax County, VA, USA
Latitude-Longitude: 77 12 00 - / 38 51 37 +
Timezone: -0500 (Eastern Standard Time)
Written-By: sanders@www.bsdi.com (Tony Sanders);
    Mon Oct 25 11:39:14 CDT 1993
Access-times: 0000 / 2400
Policy: None
Description: This site contains public sources and information
    related to BSDI's software products (eg: BSD/386).
    Currently all sources are for publicly contributed
    BSD/386 utilities.
Keywords: BSD, OS, source, berkeley, BSD/386, BSDI
Index: /info/ BSDI and BSD/386 Information
Index: /bsdi-man/ BSD/386 hypertext manual pages
Index: /official_patches/ BSDI 1.0 Official Patches Archive

Continuation lines begin with white space.
Case is only significant in data that requires it (e.g., inside URLs).
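
The two rules above are enough to sketch a reader for the format. The following Python is only a minimal sketch, not a reference implementation. The function name `parse_site_idx` is hypothetical, and the sketch assumes that field names compare case-insensitively, that repeated fields accumulate in order, and that a continuation line is joined to the previous value with a single space. The proposal does not settle any of these points.

```python
def parse_site_idx(text):
    """Parse "Field: data" lines into {lowercased field name: [values]}.

    Assumptions (not fixed by the proposal): field names are compared
    case-insensitively, repeated fields accumulate in order, and a
    continuation line is joined to the previous value with one space.
    """
    fields = {}
    current = None  # value list of the field seen most recently
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line[0] in " \t":
            # Continuation line: it begins with white space.
            if current is None:
                raise ValueError("continuation line before any field")
            current[-1] = (current[-1] + " " + line.strip()).strip()
            continue
        name, sep, data = line.partition(":")
        if not sep:
            raise ValueError("expected 'Field: data', got %r" % line)
        current = fields.setdefault(name.strip().lower(), [])
        current.append(data.strip())
    return fields


sample = (
    "Name: www.bsdi.com:80\n"
    "Postal-Address: 3110 Fairview Park Dr, Suite 580;\n"
    "    Falls Church, VA 22042\n"
    "Index: /info/ BSDI and BSD/386 Information\n"
    "Index: /bsdi-man/ BSD/386 hypertext manual pages\n"
)
record = parse_site_idx(sample)
print(record["postal-address"][0])
```

Splitting on the first colon only keeps values such as `www.bsdi.com:80` intact, and storing every field as a list lets fields that allow multiple entries (such as Index) share one code path.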

The following isn't a complete specification, but I think it's enough to
get us started. Most of this is stolen from other formats.

Name
----
Server name (including an optional port number).

host[:port]

Host is a fully qualified domain name or a dotted-quad IP address.
Port should be numeric. For example:
Name: www.bsdi.com:80
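
As an illustration, a Name value could be split like this. This is a sketch under assumptions: `parse_name` is a hypothetical helper, the default port of 80 is a guess at what an omitted port should mean, and lowercasing the host relies on host names not being case-sensitive.

```python
def parse_name(value, default_port=80):
    """Split a Name value of the form host[:port] into (host, port).

    The default of 80 is an assumption; the proposal does not say what
    an omitted port means.
    """
    host, sep, port = value.strip().partition(":")
    if not host:
        raise ValueError("missing host in %r" % value)
    if not sep:
        return host.lower(), default_port
    if not port.isdecimal():
        raise ValueError("port must be numeric in %r" % value)
    return host.lower(), int(port)


print(parse_name("www.bsdi.com:80"))
print(parse_name("WWW.BSDI.COM"))
```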

Organization
------------
Organization name. For example:
Organization: Berkeley Software Design, Inc

Organization-Type
-----------------
A general classification of what you do. For example:
Organization-Type: Commercial software developer

Contact
-------
Name of a human to contact. For example:
Contact: Tony Sanders

Postal-Address
--------------
Postal address. For example:
Postal-Address: 3110 Fairview Park Dr, Suite 580;
    Falls Church, VA 22042

Electronic-address
------------------
Contact email address for the server. For example:
Electronic-address: webmaster@www.bsdi.com

Telephone
---------
Telephone number for contact. For example:
Telephone: +1 800 800 BSDI

Location
--------
General geographical location. For example:
Location: Fairfax County, VA, USA

Latitude-Longitude
------------------
Degrees, minutes, and seconds, for drawing cute maps. For example:
Latitude-Longitude: 77 12 00 - / 38 51 37 +
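
For map drawing, the degrees/minutes/seconds values would usually be converted to decimal degrees. The sketch below shows one way to do that. The proposal does not spell out the order of the two coordinates or the meaning of the trailing sign, so the hypothetical `parse_dms_pair` simply returns the two values in the order written and assumes a trailing "-" means negative.

```python
def parse_dms_pair(value):
    """Convert 'D M S sign / D M S sign' into two signed decimal degrees.

    Values are returned in the order written; the proposal does not say
    which one is latitude. Treating a trailing '-' as negative is an
    assumption.
    """
    result = []
    for part in value.split("/"):
        tokens = part.split()
        if len(tokens) != 4 or tokens[3] not in ("+", "-"):
            raise ValueError("expected 'D M S sign', got %r" % part)
        degrees, minutes, seconds = (int(t) for t in tokens[:3])
        magnitude = degrees + minutes / 60 + seconds / 3600
        result.append(-magnitude if tokens[3] == "-" else magnitude)
    if len(result) != 2:
        raise ValueError("expected two coordinates in %r" % value)
    return tuple(result)


first, second = parse_dms_pair("77 12 00 - / 38 51 37 +")
print(round(first, 4), round(second, 4))
```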

Timezone
--------
Offset from GMT and then a textual name. For example:
Timezone: -0500 (Eastern Standard Time)
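
A reader of this field might separate the numeric offset from the textual name roughly as follows. This is a sketch: `parse_timezone` is a hypothetical name, and it assumes the offset is always written as a sign followed by four digits (hours then minutes), with the name in parentheses being optional.

```python
import re
from datetime import timedelta

# Sign, two-digit hours, two-digit minutes, then an optional "(name)".
_TIMEZONE = re.compile(r"^([+-])(\d{2})(\d{2})\s*(?:\((.*)\))?$")


def parse_timezone(value):
    """Return (offset from GMT as a timedelta, textual name or '')."""
    match = _TIMEZONE.match(value.strip())
    if not match:
        raise ValueError("expected '+HHMM (name)', got %r" % value)
    sign, hours, minutes, name = match.groups()
    offset = timedelta(hours=int(hours), minutes=int(minutes))
    if sign == "-":
        offset = -offset
    return offset, name or ""


offset, name = parse_timezone("-0500 (Eastern Standard Time)")
print(offset.total_seconds() / 3600, name)
```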

Written-By
----------
Author of this text, including the last update time. For example:
Written-By: sanders@www.bsdi.com (Tony Sanders);
    Mon Oct 25 11:39:14 CDT 1993

Access-times
------------
When the server is available (in local 24 hour time). For example:
Access-times: 0000 / 2400
Multiple entries are allowed.
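
A robot could use these entries to decide whether a visit falls inside an allowed window. The sketch below assumes each entry is "HHMM / HHMM" with the start no later than the end, that 2400 means the end of the day, and that the end of a window is exclusive. `parse_access_times` and `is_available` are hypothetical names.

```python
def parse_access_times(value):
    """Turn 'HHMM / HHMM' into (start, end) minutes after local midnight."""
    start_text, sep, end_text = value.partition("/")
    if not sep:
        raise ValueError("expected 'HHMM / HHMM', got %r" % value)

    def to_minutes(text):
        text = text.strip()
        if len(text) != 4 or not text.isdecimal():
            raise ValueError("expected HHMM, got %r" % text)
        hours, minutes = int(text[:2]), int(text[2:])
        if hours > 24 or minutes > 59 or (hours == 24 and minutes):
            raise ValueError("time out of range: %r" % text)
        return hours * 60 + minutes

    return to_minutes(start_text), to_minutes(end_text)


def is_available(entries, minute_of_day):
    """True if minute_of_day lies in any window (end exclusive, assumed)."""
    return any(start <= minute_of_day < end
               for start, end in map(parse_access_times, entries))


print(parse_access_times("0000 / 2400"))
print(is_available(["0000 / 0600", "1800 / 2400"], 19 * 60))
```

Windows that wrap past midnight (for example, 2200 to 0600) are not handled here; with this sketch they would have to be written as two entries.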

Policy
------
Any policy statement you wish to make (e.g., the GNN server might
wish to give registration information here). For example:
Policy: None

Description
-----------
A brief description of the server (used for building meta-indexes).
For example:
Description: This site contains public sources and information
    related to BSDI's software products (eg: BSD/386).
    Currently all sources are for publicly contributed
    BSD/386 utilities.

Keywords
--------
Keywords for constrained searches. The words are comma separated;
use "text, text" if you need to embed a comma, but it's best to have
simple words and not phrases. For example:
Keywords: BSD, OS, source, berkeley, BSD/386, BSDI
Multiple entries are allowed.
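
The comma and quoting rule is close to what Python's standard `csv` module already handles, so a sketch of the splitting could lean on it. `parse_keywords` is a hypothetical helper, and treating the double-quote rule as CSV-style quoting is an assumption; the proposal does not define escaping beyond the "text, text" example.

```python
import csv


def parse_keywords(entries):
    """Flatten one or more Keywords values into a list of keywords.

    Assumes CSV-style quoting: a double-quoted keyword may contain a comma.
    """
    keywords = []
    # skipinitialspace lets a quoted keyword follow ", " rather than ",".
    for row in csv.reader(entries, skipinitialspace=True):
        keywords.extend(word.strip() for word in row if word.strip())
    return keywords


print(parse_keywords(["BSD, OS, source, berkeley, BSD/386, BSDI"]))
print(parse_keywords(['unix, "text, text"']))
```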

Index
-----
These are pointers to information indexes that the server supplies.
The first word is a partial URL (relative to the top of the server)
and the rest of the text is used to build the meta-index. For example:
Index: /info/ BSDI and BSD/386 Information
Multiple entries are allowed.
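
To turn an Index entry into something a meta-index can link to, the partial URL has to be combined with the server's Name. A minimal sketch follows; it assumes the `http` scheme and a hypothetical helper name `parse_index`, neither of which is specified by the proposal.

```python
from urllib.parse import urljoin


def parse_index(value, server_name):
    """Split an Index value into (absolute URL, descriptive text).

    server_name is the Name field (host[:port]); using "http" as the
    scheme is an assumption.
    """
    parts = value.split(None, 1)  # first word, then the rest of the text
    if not parts:
        raise ValueError("empty Index value")
    path = parts[0]
    title = parts[1].strip() if len(parts) > 1 else ""
    return urljoin("http://%s/" % server_name, path), title


print(parse_index("/info/ BSDI and BSD/386 Information", "www.bsdi.com:80"))
```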

Tony_Sanders_

.. _Tony_Sanders http://www.bsdi.com/hyplan/sanders.html
.. _document http://www.bsdi.com/HTTP:TNG/www-indexing.etx
.. _http://www.bsdi.com/site.idx http://www.bsdi.com/site.idx