Registrar - a URN registry service

Rob Raisch (raisch@ora.com)
Wed, 7 Jul 1993 17:37:29 -0400 (EDT)

Next message: Michael Mealling: "Uniform Resource Modifier: a meta-information encoding scheme"
Previous message: Tony Sanders: "Re: virtual documents"

(NOTE: There are three mailing lists to which this is crossposted.
Apologies beforehand for the extra bandwidth.

o The URI list is the most appropriate for the obvious reasons,

o the WWW-TALK list is included to generate some discussion regarding
the issues involved in the HTTP protocol and what role it should play
in the issues raised here, and

o the COM-PRIV list is added to generate some discussion regarding the
distinction between Intellectual Properties and Products.

Thanks for your patience.)

Here is registrar. Comments are very welcome, and feel free to play. The
next document, due in a few days, will explain the distribution of
registrar servers; after that will come a paper describing the sonar
(repository availability) protocol.

-----------------------------------------------------------------------------
Quick synopsis:

Registrar is a 'product' registry which serves various pieces of
information when given a unique URN. One of the returned 'attributes' is
a 'product instance record' which contains a URL, content type, content
encoding, content size, access authority, billing authority, and cost records.

Registrar is available on port 99 of the server 'ruby.ora.com' and
offers help information upon receiving a HELP command, e.g.

% telnet ruby.ora.com 99

HELP
-----------------------------------------------------------------------------

Registrar -- Resource Registration Service

Robert Raisch
manager, online services
O'Reilly & Associates
90 Sherman Street, Cambridge MA 02140

Assumptions:
-----------

This document assumes that the reader is conversant in the form and
function of Uniform Resource Locators (URL). It would also be very helpful
if the reader were at least aware of the URI working group, and its efforts
to identify some of the issues addressed in this document.

Scope:
-----

This document discusses an implementation for a Uniform Resource Name
or Notation (URN) server. It describes how the URN can map to useful
information associated with a unique product, and provide the location of
instances of this product on the network. This information can be used to
automate retrieval of such products from multiple repositories.

Definitions:
-----------

Instance - an existing specimen of a product which is
indistinguishable from another instance of the same product based
on the declaration of its owner.

Product - a Product is any information declared by its owner
to be unique and available. That product might be available in
different formats or encodings, and distributed in different
repositories.

Registrar - a service which maps a unique URN to zero or
more Product Instance Records, which contain URLs. It is also a
service which caches other important information which is unique
to a product.

Uniform Resource Name (URN) - a URN is a notation that
uniquely identifies a product. The actual form of a particular
URN is up to the authority which maintains responsibility for
that variety of URN, and this document talks about one possible
form which meets the current needs of the author. URNs take the
form authority:opaque_data; the authority discussed here is
called 'registrar.'

Uniform Resource Locator (URL) - a URL is that information
which allows the retrieval of a particular instance of a product.

Product Attribute (PA) - a Product Attribute is some piece
of information which can be attached to the declaration of a
product, and retrieved from a product registry server.

Product Instance Record (PIR) - a Product Instance Record
contains the information particular to a specific instance of a
Product.

Issues:
------

The Uniform Resource Locator (URL) contains information required to
retrieve a single instance of a network resource. It contains the name and
location of the instance, as well as the proper method used to retrieve it.

While this is useful information once the decision to retrieve the
instance has been made, it does not address the broader and more
complicated issues of whether or not we should retrieve the instance in the
first place, and whether or not we can use the instance once we have
retrieved it.

The information in the URL is insufficient to allow us to make this
decision and so, we must look elsewhere to satisfy our needs.

Thus, the primary concept behind the REGISTRAR server is to provide
enough information about a particular product so that a number of
decisions can be made regarding its accessibility and value.

Currently, some of the information required to make an 'appropriate
retrieval decision' is available, but much is based on the assumption that
the agent which makes the retrieval has this information before the actual
URL is used. In most cases, this is information which the user possesses.
The user may understand that retrieving an instance from a '*.ac.uk' domain
would be less efficient than getting it from '*.berkeley.edu', based on her
understanding or assumptions of the underlying structure of the network.

Assuming that we have already made the decision to retrieve a
particular product from the network, we will need the following information
to decide where we can retrieve it from, and whether or not we can use the
instance once we retrieve it:

- Is the instance available via a retrieval mechanism we can use?
(Instrumentality)

- Is the instance available from a source (server) to which we have
access? (Availability)

- Is the instance of a type which we can use? (Type)

- Is the instance in a form which we can use? (Encoding)

- Is the instance small enough to save and manipulate on our local
system? (Size)

- Are we allowed to retrieve the instance? (Access)

- If the instance is only available for a fee, can we pay for it?
(Billing)

- If the retrieval of this instance is billable, can we pay for it in
a currency which we use? (Payment)
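
The questions above can be sketched as a single check that a client
might run against each instance it is offered. This is only an
illustration: the class names, field names, and the sample host
'ftp.example.org' are assumptions and not part of REGISTRAR.
Availability is left to SONAR, and Access and Billing are skipped
because their rules are not yet defined.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    url: str
    content_type: str                  # e.g. "TEXT", "HTML"
    encoding: str                      # e.g. "UNIX", or "NONE"
    size: int                          # in octets
    costs: dict = field(default_factory=dict)  # monetary system -> amount


@dataclass
class ClientProfile:
    schemes: set       # retrieval mechanisms we support (Instrumentality)
    types: set         # native types we can use (Type)
    encodings: set     # encodings we can undo (Encoding)
    max_size: int      # largest instance we can store (Size)
    currencies: set    # monetary systems we can pay in (Payment)


def usable(inst, client):
    """Return True if this client could retrieve and use the instance."""
    scheme = inst.url.split(":", 1)[0]
    if scheme not in client.schemes:
        return False
    if inst.content_type not in client.types:
        return False
    if inst.encoding != "NONE" and inst.encoding not in client.encodings:
        return False
    if inst.size > client.max_size:
        return False
    # A priced instance is only usable if we share a monetary system.
    if inst.costs and not (client.currencies & set(inst.costs)):
        return False
    return True
```

A client would typically filter the list of returned instances with
usable() and then take the first survivor.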

Instrumentality -

The instrumentality issue is addressed by that portion of the user's
application which allows or facilitates retrieval. If the engine does not
support retrieval using a particular protocol or service, the application
will, no doubt, inform the user.

Availability -

When we request the instances associated with a URN, we will be
presented with a list of those sites which store those instances. This
list, however, does not address whether or not we actually have physical
access to any of the listed sites.

Whether or not a particular instance of a product is available, in
terms of the availability of the repository site, is an important issue
relating to the question of retrieval. If the instance is available on
multiple repositories, we should have access to enough information to be
able to make the 'best' retrieval decision.

'Best' in this context refers to the size of the repository hardware
(its 'power'), its current load, how long it takes to return a request
('ping' time), and how many network 'hops' a request must traverse.

There is another protocol, SONAR - currently in prototype - which
answers this issue. We can assume that SONAR provides enough information
to the REGISTRAR server so that when it returns a number of PRODUCT
INSTANCE RECORDS, those records are in a sorted order (best first, worst
last) in terms of their suitability as 'appropriate' sites from which to
retrieve a product.
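
As a rough illustration of 'best first, worst last', the sketch below
sorts candidate sites on the four measures named above. The SONAR
protocol itself is not described here, so the record layout, the host
names, and above all the priority order (ping time, then hops, then
load, then power) are assumptions made purely for the example.

```python
from dataclasses import dataclass


@dataclass
class SiteStatus:
    host: str
    power: float     # relative hardware capacity; higher is better
    load: float      # current load; lower is better
    ping_ms: float   # round-trip time; lower is better
    hops: int        # network hops; fewer is better


def rank_sites(sites):
    """Return the sites ordered best first, using an assumed priority."""
    return sorted(sites, key=lambda s: (s.ping_ms, s.hops, s.load, -s.power))
```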

NOTE: SONAR is not meant as a 'user' protocol, (in the sense that a
client program interacts with it, as an agent for a user), as REGISTRAR is.
Rather it is an 'inter-server' protocol, used only between REGISTRAR
servers.

Type -

If an instance is in a native format or type we do not support,
retrieving it is of little value. Native format is that form which is used
directly by an application; e.g. ascii, postscript(tm), bitmap, etc.

Encoding -

If an instance is in an encoding which we are unable to render back
into its native type or format, the instance is of little use. (Unless we
can contract with a service which does the conversion for us?) This is the
issue of compression and conversion into a form more appropriate for
network delivery, e.g. unix-compress, uuencode, etc.

There is no information included in the URL which deals with either
type or encoding. Historically, the question of applicability of a certain
encoding,
or the availability of the required program to uncompress an instance has
been handled by the user of the application. The user has made the
decision to retrieve a particular instance based on her knowledge of its
usefulness once it has been retrieved.

This state of affairs is becoming increasingly intolerable, since the
user can no longer, and should no longer, be called upon to make these
distinctions. As the user base increases (mostly at the low end of network
savvy or expertise), there will be more of a need for agents or services
which can make these decisions for the user.

Size -

If an instance is too large to cache locally, and cannot be retrieved
in pieces, it is of little value. Information related to the size of each
particular instance is needed to make an appropriate retrieval decision.

Access -

If we have decided that we can use an instance, we still must find out
whether or not we have permission to access that instance.

(To be completed later.)

Billing -

If we have permission to access an instance, assuming that the
instance is only available to those who can pay for it, we must next find
out whether the billing authority which maintains control over the instance
will accept payment from us.

(To be completed later.)

Payment -

If we can pay, what will we pay?

(To be completed later.)

The Uniform Resource Name:
-------------------------

The Uniform Resource Name is a single, unique identifier for an
abstract product.

The following rules apply to URNs:

- Once created, a URN can never be destroyed.

- The actual encoding of the URN, (how it looks), is
completely immaterial to its function. The actual
content of a URN is that to which it refers.

- URNs are *never* created 'on the fly.' A URN is provided
as a pointer to a product when that product is registered
with the authority responsible for its existence. Humans
never make URNs, servers do.

Implementation:
--------------

There is a prototype REGISTRAR server operating on ruby.ora.com, port
99. It supports all of the features previously identified, as well as a
number of useful additions, such as keyword searching among products and a
test interface to the local SONAR server.

The server is a standard TCP session, similar to the 'finger' service,
and can be accessed via the 'telnet' program.

The command/response structure is simple, and it should be quite easy
to write clients for it. Its general rules are:

o Requests to the REGISTRAR server are in ASCII, and are delimited
with CR/LF.

o Requests to the server are either commands or URN / ATTRIBUTE
pairs.

o Commands which the server understands are:

HELP -- returns a '.' delimited list of available commands.

DEBUG -- toggles debugging output from the server.

LIST -- lists a '.' delimited list of registered URNs.

SEARCH [keyword] -- returns a '.' delimited list of URNs which
contain the keyword.

QUERY [server] -- returns a single line of information
(from SONAR) which lists certain data
about the mentioned server.
(EXPERIMENTAL - NOT ACTIVE)

QUIT -- Ends the session.

o URN / ATTRIBUTE requests are used to retrieve particular
attributes from a product record. Without an explicit ATTRIBUTE,
the INSTANCE attribute is assumed. Thus, these are valid
requests:

registrar://ora/category/item:version
returns the INSTANCE attributes of the product

registrar://ora/category/item:version CREATOR
returns the CREATOR attribute of the product

registrar://ora/category/item:version ALL
returns all of the available attributes of the
product, including the DESCRIPTION property which
is otherwise unavailable.

o Responses all begin with a numeric, in the following form:

0xx -- Command failed.

1xx -- Command succeeded.

o Any response which begins with a dash ('-') is a comment or a
debug or help message and can be safely ignored by the client.
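
The rules above are enough to sketch the two halves of a client: one
helper that formats a request and one that reads a response. This is
a sketch under stated assumptions, not a reference client. The helper
names are invented; wrapping the URN in angle brackets follows the
sample sessions rather than any stated rule; the layout of the
numeric status line is not specified, so the code only assumes three
leading digits on the first non-comment line, if such a line is
present at all.

```python
def build_request(urn, *attributes):
    """Format a URN / ATTRIBUTE request as ASCII terminated by CR/LF."""
    parts = ["<" + urn + ">"] + list(attributes)
    return (" ".join(parts) + "\r\n").encode("ascii")


def parse_response(lines):
    """Split response lines into (code, data, comments).

    Reading stops at a line holding a single '.'. Lines that begin
    with '-' are collected as comments and kept out of the data.
    """
    code, data, comments = None, [], []
    first = True  # only the first non-comment line may carry a status
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == ".":
            break
        if line.startswith("-"):
            comments.append(line.lstrip("-").strip())
            continue
        if first and len(line) >= 3 and line[:3].isdigit():
            code = int(line[:3])
            rest = line[3:].strip()
            if rest:
                data.append(rest)
        else:
            data.append(line)
        first = False
    return code, data, comments


def succeeded(code):
    """1xx means the command succeeded; 0xx means it failed."""
    return code is not None and 100 <= code <= 199
```

Both helpers work on plain bytes and strings, so they can be tried
without a network connection; a real client would write the request
to a TCP socket and feed the received lines to parse_response().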

Typical Sessions:
----------------

Request server HELP information
-------------------------------

server: Registrar URN Service -- version 0.5 (raisch)

client: HELP

server: --DEBUG -- enable copious output
server: --LIST -- list all registered URNS
server: --SEARCH (keyword)+ -- search for a keyword
server: --QUERY (server)+ -- query the status of a remote server
server: --
server: --<URN> ((ATTTRIBUTE)* | ALL) -- URN is in the form:
server: -- authority://domain/category/item:version
server: -- authority = 'registrar' (this service)
server: -- domain = 'ora' (others available)
server: -- category/item:version = product designator
server: --
server: -- ATTRIBUTE is zero or more attributes (default: INSTANCE)
server: -- ALL returns all defined attributes
server: -- including DESCRIPTION (full text description)
server: -- which is otherwise inaccessible
server: --
server: -- Format of the INSTANCE attribute:
server: -- ( URL --Uniform Resource Locator
server: -- ENCODING --TEXT,PS,TEX,GOPHER,HMTL,etc.
server: -- COMPRESSION --UNIX,ARC,ZIP,etc.
server: -- SIZE --in bytes
server: -- ACCESS_AUTHORITY --who grants permission to retrieve?
server: -- BILLING_AUTHORITY --who do we pay?
server: -- [COST]* -- (MONETARY_SYSTEM AMOUNT)
server: -- ) Ex: (UK_POUNDS 15.0)
server: --
server: --QUIT -- exit gracefully
server: .

client: QUIT

Request list of URNs on this server
-----------------------------------

server: Registrar URN Service -- version 0.5 (raisch)

client: LIST

server: <registrar://ora/nutshell books/Learning GNU Emacs:2.0>
server: <registrar://ora/magazine/Global Network Navigator:0.0>
server: .

client: QUIT

Request instances of a URN
(whitespace inserted to improve readability)
--------------------------

server: Registrar URN Service -- version 0.5 (raisch)

client: <registrar://ora/nutshell books/Learning GNU Emacs:2.0>

server: INSTANCE: ( gopher://gopher.ora.com/top_menu
GOPHER
NONE
320
)
server: INSTANCE: ( gopher://amber.ora.com/top_menu
GOPHER
NONE
320
)
server: INSTANCE: (ftp://ftp.../published/oreilly/books/gnu.txt.Z
TEXT
UNIX
16443
NONE
O'REILLY
(US_DOLLARS 20.0)
(CAN_DOLLARS 25.00)
(UK_POUNDS 15.0)
)
server: INSTANCE: (http://ftp.../published/oreilly/books/gnu.html
HTML
NONE
32768
NONE
O'REILLY
(US_DOLLARS 20.0)
(CAN_DOLLARS 25.00)
(UK_POUNDS 15.0)
)
server: .

client: QUIT

Format of a URN:
---------------

authority://domain/category/name:version_major.version_minor

o authority is the descriptor which defines the format of the
following fields.

o domain is a reference to the responsible entity which maintains
all members of a particular name space. (NOTE: Based on the
transience of hostnames and domains in the Domain Name Service on
the Internet, this is not to be assumed to represent a hostname
or domain. We assume that the actual host or hosts which support
a particular domain would be kept in a 'top level' domain
authority, registered with the proper authority (IANA), which
would be queried and cached to retrieve the proper host to
contact when a request for information is made to a particular
name space or domain of responsibility.)

o category is a method of defining separate sub-name spaces within
a particular domain.

o name is the actual official name of the product in question, and

o version_major and version_minor reference a particular version of
a unique product. If the version is left off of the information
request, the request is assumed to refer to the 'current' or most
recent version of the product.

example:

          registrar://ora/nutshell books/Learning GNU Emacs:2.0
          ^           ^   ^              ^                  ^ ^
          |           |   |              |                  | |
authority-+           |   |              |                  | |
domain----------------+   |              |                  | |
category------------------+              |                  | |
name-------------------------------------+                  | |
version_major-----------------------------------------------+ |
version_minor-------------------------------------------------+
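
A minimal sketch of how a client might split a URN of this form into
its fields follows. The type and function names are invented for the
illustration. It treats the last ':' in the name portion as the start
of the version, so a product name that itself contains a colon and
carries no version would be misread.

```python
from typing import NamedTuple, Optional


class URN(NamedTuple):
    authority: str
    domain: str
    category: str
    name: str
    version_major: Optional[int]   # None means 'current' version
    version_minor: Optional[int]


def parse_urn(text):
    """Split authority://domain/category/name[:major.minor] into fields."""
    authority, sep, rest = text.partition("://")
    if not sep:
        raise ValueError("missing '://' after authority")
    # Raises ValueError if domain, category, or name is missing.
    domain, category, tail = rest.split("/", 2)
    name, colon, version = tail.rpartition(":")
    if not colon:
        # No version given: the request refers to the current version.
        return URN(authority, domain, category, tail, None, None)
    major, _, minor = version.partition(".")
    return URN(authority, domain, category, name,
               int(major), int(minor) if minor else None)
```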

URN Record Format:
-----------------

Write-Once Attributes

NAME {1}
DOMAIN {1}
CATEGORY {1}
VERSION {1}

OWNER {1}
ADMINISTRATOR {1}

CREATED {0,1}
REGISTERED {0,1}

AUTHOR {1,N}
EDITOR {0,N}
PUBLISHER {0,N}

KEYWORDS {1}
SUMMARY {0,1}
DESCRIPTION {0,1}

Editable and User Defined Attributes

LAST_ACCESS {1}
INSTANCE {0,N}

ANIMAL {0,1}

{1} = Only One
{0,1} = Zero or One
{0,N} = Zero or More
{1,N} = One or More
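
The cardinalities above lend themselves to a mechanical check. The
sketch below is one hypothetical way to do it; the function name and
the list-of-pairs record layout are assumptions. Attributes that are
not in the table, such as the user defined ANIMAL, are simply passed
over.

```python
from collections import Counter

# (minimum, maximum) occurrences per attribute; None means no upper bound.
# ADMINISTRATOR is spelled as in the example record.
CARDINALITY = {
    "NAME": (1, 1), "DOMAIN": (1, 1), "CATEGORY": (1, 1), "VERSION": (1, 1),
    "OWNER": (1, 1), "ADMINISTRATOR": (1, 1),
    "CREATED": (0, 1), "REGISTERED": (0, 1),
    "AUTHOR": (1, None), "EDITOR": (0, None), "PUBLISHER": (0, None),
    "KEYWORDS": (1, 1), "SUMMARY": (0, 1), "DESCRIPTION": (0, 1),
    "LAST_ACCESS": (1, 1), "INSTANCE": (0, None),
}


def check_record(fields):
    """Return the attributes whose occurrence count breaks the table.

    `fields` is a list of (attribute, value) pairs in any order.
    """
    counts = Counter(attr for attr, _value in fields)
    problems = []
    for attr, (low, high) in CARDINALITY.items():
        n = counts[attr]
        if n < low or (high is not None and n > high):
            problems.append(attr)
    return problems
```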

Example:
NAME: Learning GNU Emacs
DOMAIN: ora.com
CATEGORY: nutshell books
VERSION: 2.0

OWNER: O'Reilly & Assoc. <ora@ora.com>
ADMINISTRATOR: Robert Raisch <raisch@ora.com>

CREATED: 20 June 1993
REGISTERED: 20 June 1993
LAST_ACCESS: 20 June 1993

AUTHOR: Debra Cameron <debra@ora.com>
AUTHOR: Bill Rosenblatt <bill@ora.com>
EDITOR: Mike Loukides <mikel@ora.com>
PUBLISHER: O'Reilly & Assoc. <nuts@ora.com>

ANIMAL: Gnu

KEYWORDS: book tutorial editor gnu lisp
SUMMARY: Tutorial on the GNU Emacs Editor

INSTANCE: ( gopher://ora.com/top_menu
GOPHER
NONE
320
)
INSTANCE: ( gopher://amber.ora.com/top_menu
GOPHER
NONE
320
)
INSTANCE: (
ftp://ftp.uu.net/published/oreilly/books/gnu.txt.Z -- URL
TEXT -- ENCODING
UNIX -- COMPRESSION
16443 -- SIZE
NONE -- ACCESS
O'REILLY -- BILLING
(US_DOLLARS 20.0) -- COST RECORD
(CAN_DOLLARS 25.00)
(UK_POUNDS 15.0)
)
DESCRIPTION:

[TEXT DELETED]
.

Product Instance Record Format (PIR):

URL - Uniform Resource Locator
CONTENT TYPE - (See Instance Type)
CONTENT ENCODING - (See Instance Encoding)
SIZE - Size of the Instance in Octets
ACCESS_AUTHORITY - (See Instance Access Authority)
BILLING_AUTHORITY - (See Instance Billing Authority)
(MONETARY_SYSTEM COST) - (See Instance Cost Record)
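
To show how a client might read such a record, here is a small sketch
that parses the parenthesised form seen in the sample sessions. The
class and function names are invented, and the parser assumes that
fields are separated by whitespace, that no field other than a cost
record contains parentheses, and that the trailing '-- COMMENT'
annotations used in the example record are not present. Records with
only the first four fields, as in the gopher examples, are accepted.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PIR:
    url: str
    content_type: str
    content_encoding: str
    size: int                                   # in octets
    access_authority: Optional[str] = None
    billing_authority: Optional[str] = None
    costs: list = field(default_factory=list)   # [(monetary_system, amount)]


# A cost record looks like (US_DOLLARS 20.0)
COST_RE = re.compile(r"\(\s*(\w+)\s+([0-9]+(?:\.[0-9]+)?)\s*\)")


def parse_pir(text):
    """Parse one '( URL TYPE ENCODING SIZE ... )' record into a PIR."""
    body = text.strip()
    if not (body.startswith("(") and body.endswith(")")):
        raise ValueError("record must be enclosed in parentheses")
    body = body[1:-1]
    costs = [(m.group(1), float(m.group(2))) for m in COST_RE.finditer(body)]
    # Remove the cost records, then split the remaining simple fields.
    fields = COST_RE.sub(" ", body).split()
    if len(fields) < 4:
        raise ValueError("record needs URL, type, encoding and size")
    access = fields[4] if len(fields) > 4 else None
    billing = fields[5] if len(fields) > 5 else None
    return PIR(fields[0], fields[1], fields[2], int(fields[3]),
               access, billing, costs)
```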

Instance Type:

(mime types are, of course, appropriate here.)

ASCII - Ascii Text
PS - Postscript(tm)
TEX - TeX
NROFF - Unix NROFF
TROFF - Unix TROFF
EQN - Unix EQN
GIF - Compuserve GIF, graphic
TIFF - Tagged Image File Format, graphic
JPEG -
GOPHER - UMinn Gopher Menu
WAIS - WAIS query
HTML - WWW HTML document
AIFF -
AU -
MPEG -

Instance Encoding:

COMPRESS - Unix compress/uncompress
GNU - GNU zip (gzip)
ARC -
ZIP -
HQX -
UUENCODE -

Instance Access Authority:

None Defined. - O'REILLY is a reserved value.

Instance Billing Authority:

None Defined. - O'REILLY is a reserved value.

Instance Cost Record:

MONETARY_SYSTEM ex: US_DOLLARS, MEX_PESOS, UK_POUNDS, CAN_DOLLARS
AMOUNT ex: 15.0

Comments:
--------

The most important issue addressed in this document has to be the
requirement of the current Internet community that individual intellectual
properties be uniquely identifiable and that multiple instances of the same
product be identifiable as such. Without this capability, the Internet
will continue to labor under the limitation that the user is unable to make
appropriate retrieval decisions, and will continue to use bandwidth
needlessly. An example of this is the current assumption that two files on
the Internet are exactly the same, based on the implicit information
carried in their names. (foo.tar.Z and bar.arc *might* represent the exact
same information and the user has no method of telling.)

While there is considerable work being done to identify the
characteristics of "Intellectual Properties", the author takes the stance
that the whole concept of intellectual property is a legal construction to
protect the rights of the author.

Intellectual properties do not exist except as the right or license to
create products. The owner of an intellectual property is not making the
property itself available by publishing it on the network. The owner or
the owner's agent is making products available which are based on this
property.

As such, whether or not a particular file or resource on the net is or
is not an intellectual property is not relevant to the issues presented in
this paper.

Once a publisher makes one or more products available on the network,
it is the publisher's decision whether or not one product differs from
another, and any attempt to formalize this characteristic further than this
is not useful to the task at hand.

The other issue is the fact that there are a number of characteristics
of a particular product which are required to make the retrieval decision.
If the file is encoded in Postscript(tm) and the local system does not have
the required technology to render that file, any retrieval of that file
would be in vain. The assumption that all the important details can be
implied from the filename is a very inappropriate one, based on the simple
fact that various systems have differing methods of naming the same file.
A Unix server might represent the file as foo.tar.Z, while a DOS system
might conceivably name the same file 'footar.arc', or a VMS system might
name the same file 'foo_tar.Z,123'.
-----------------------------------------------------------------------------
