Proposition on advanced URL features (Request for comments)

Mirsad Todorovac (tm@rasips1.rasip.etf.hr)
Mon, 27 Nov 1995 21:59:36 +0100 (MET)


This is not an RFC, just an idea for extending the URL standard defined in
RFC 1738.

Introduction
------------

A rapidly growing number of documents are generated automatically by
programs; many of them are produced by online CGI scripts. Often we need to
reference a particular part of a large document that is created by a program
we do not own or do not want to change. Likewise, there are documents we do
not own in which we want to reference certain parts, but they lack <a name>
tags where we would like them to be, and there is nothing we can do about it.

Here we propose an extension of the URL standard which is easy to implement
in existing browsers and does no harm to browsers that lack such an
implementation.

A proposal for a similar extension to the HTTP protocol is in development.

Extension of URL Syntax
------------------------

The existing URL syntax allows constructs like this:

http://<host>:<port>/<path>?<searchpart>#<anchor>

Example:
http://www.foo.bar/path/doc.html#part1
makes the browser jump to the anchor <a name="part1"> in
//www.foo.bar/path/doc.html .

We propose to extend the possibilities of # addressing in the http URL scheme
in the following ways:

1. URL: doc.html##123
"Start display of the rendered document at line 123 of the resulting
document text (not the source)."

2. URL: doc.html##H2.3
"Set top of display to third heading of level 2 in document
doc.html."
URL: doc.html##P.4
"Set top of display to fourth paragraph."

3. URL: doc.html##/foobar/[n]

"Set top of display to the first [n-th] occurrence of the word
(desirably a regexp) 'foobar' inside rendered doc.html".
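
The three forms above can be told apart mechanically. The following is only
a minimal sketch in Python, not part of the proposal: the names Target and
parse_extended_fragment are hypothetical, and we assume that the brackets in
"[n]" mark an optional occurrence count which defaults to 1.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import urlsplit, unquote_plus


class Target(NamedTuple):
    kind: str                    # "line", "heading", "paragraph", "search", "anchor"
    value: str                   # line number, search pattern or anchor name
    level: Optional[int] = None  # heading level, only used for "heading"
    index: int = 1               # 1-based occurrence count


def parse_extended_fragment(url: str) -> Optional[Target]:
    """Classify whatever follows the first '#' in url (hypothetical helper)."""
    fragment = urlsplit(url).fragment      # for "doc.html##123" this is "#123"
    if not fragment:
        return None
    if not fragment.startswith("#"):
        return Target("anchor", fragment)  # ordinary <a name> anchor
    body = fragment[1:]                    # text after the second '#'
    if re.fullmatch(r"\d+", body):         # form 1: ##123
        return Target("line", body)
    m = re.fullmatch(r"H(\d)\.(\d+)", body)    # form 2: ##H2.3
    if m:
        return Target("heading", "", int(m.group(1)), int(m.group(2)))
    m = re.fullmatch(r"P\.(\d+)", body)        # form 2: ##P.4
    if m:
        return Target("paragraph", "", None, int(m.group(1)))
    m = re.fullmatch(r"/(.*)/(\d*)", body)     # form 3: ##/foobar/[n]
    if m:
        # Assumption: a missing count means the first occurrence.
        return Target("search", unquote_plus(m.group(1)), None,
                      int(m.group(2) or 1))
    return None                            # unrecognised extended form
```

A browser would call such a routine before falling back to its ordinary
anchor lookup, so that plain "#part1" references keep working as they do now.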

An Example
----------

Please note the use of the #/ extension in this case:

<P>For detailed information on the HTTP URL scheme, look into
<a href="http://www.rasip.fer.hr/cgi-bin/rfc/rfc1738.txt#/[Page+8]/">RFC
1738, p. 8</a>.</P>

Obviously, this feature saves the overhead of finding page 8 ourselves, which
gives much faster access to the desired information and disturbs the main
flow of the referring document less.

Suppose (as is true in this case) that you do not own the document being
pointed to. The proposed extension allows you to save the reader of your
document the effort of paging through 8 (possibly tens or hundreds of) pages.

Note: The standard '+' escape is used to escape the space in the search
string.
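
To illustrate the search form and the '+' escape, here is a minimal Python
sketch of how a browser might locate the target. The function name
find_occurrence_line and its parameters are hypothetical. We also assume
that the pattern is taken literally unless regexp matching is requested,
since the proposal only calls regexps desirable.

```python
import re
from typing import Optional
from urllib.parse import unquote_plus


def find_occurrence_line(rendered_text: str, escaped_pattern: str,
                         n: int = 1, use_regexp: bool = False) -> Optional[int]:
    """Return the 1-based line holding the n-th match, or None if absent."""
    pattern = unquote_plus(escaped_pattern)   # '+' is decoded to a space
    if not use_regexp:
        pattern = re.escape(pattern)          # assumption: literal text search
    for count, match in enumerate(re.finditer(pattern, rendered_text), start=1):
        if count == n:
            # Count the newlines before the match to get its line number.
            return rendered_text.count("\n", 0, match.start()) + 1
    return None                               # caller falls back to document top
```

With the search string from the example, "[Page+8]" is decoded to "[Page 8]"
and looked up in the rendered text; the returned line becomes the top of the
display.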

Portability Considerations
--------------------------

Existing browsers which support '#' document positioning would not break;
they would start displaying from the top of the document, as they already do
when they do not find an anchor, e.g. "part1", in the document being
retrieved.
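
The fallback can be pictured with a small Python sketch. It is a simplified
model of an existing browser, not a description of any particular one; the
function name legacy_scroll_position and the anchors table are hypothetical.

```python
from typing import Dict
from urllib.parse import urlsplit


def legacy_scroll_position(url: str, anchors: Dict[str, int]) -> int:
    """Line shown first by a browser that only knows <a name> anchors."""
    fragment = urlsplit(url).fragment
    # An unknown name, including one that begins with '#', is simply not
    # found, so the display starts at the top (line 1).
    return anchors.get(fragment, 1)
```

Here "doc.html#part1" resolves through the table as before, while
"doc.html##123" yields a fragment of "#123", matches no anchor name and
leaves the display at the top of the document.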

The feature is useful in both textual and fully graphical browsers. It fits
into the URL scheme currently in use. Its use in other URL schemes may differ
slightly, although the search and line-number forms (3. and 1.) seem to be
the most obvious implementations of the desired effect, and they may be
preserved in the form proposed for the extended HTTP URL scheme. (Obviously,
the ##H and ##P constructs are HTML dependent.)
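
The HTML dependence of the ##H and ##P forms shows in the following minimal
Python sketch, which counts start tags with the standard html.parser module.
The names ElementLocator and locate are hypothetical, and the sketch reports
a line of the HTML source; a real browser would have to map the element
found to its position in the rendered display.

```python
from html.parser import HTMLParser
from typing import Optional


class ElementLocator(HTMLParser):
    """Record the source line of the index-th start tag named `tag`."""

    def __init__(self, tag: str, index: int) -> None:
        super().__init__()
        self.tag = tag.lower()       # the parser reports tag names in lowercase
        self.remaining = index       # occurrences still to be skipped
        self.line: Optional[int] = None

    def handle_starttag(self, tag, attrs):
        if tag == self.tag and self.line is None:
            self.remaining -= 1
            if self.remaining == 0:
                self.line = self.getpos()[0]   # getpos() gives (line, column)


def locate(html: str, tag: str, index: int) -> Optional[int]:
    """Source line of the index-th <tag>, or None when there are fewer."""
    locator = ElementLocator(tag, index)
    locator.feed(html)
    locator.close()
    return locator.line
```

Under these assumptions ##H2.3 corresponds to locate(html, "h2", 3) and ##P.4
to locate(html, "p", 4); a result of None means the browser should start at
the top of the document.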

Security Considerations
-----------------------

This document is not believed to introduce any security issues that are not
already discussed in RFC 1738.

References
----------

[1] Berners-Lee, T., Masinter, L., and M. McCahill, "Uniform Resource
Locators (URL)", RFC 1738, December 1994.
<URL: http://www.rasip.fer.hr/cgi-bin/rfc/rfc1738.txt >

[2] Berners-Lee, T., "Universal Resource Identifiers in WWW: A
Unifying Syntax for the Expression of Names and Addresses of
Objects on the Network as used in the World-Wide Web", RFC
1630, CERN, June 1994.
<URL: http://www.rasip.fer.hr/cgi-bin/rfc/rfc1630.txt >

Author(s)
---------

Mirsad Todorovac
World-Wide Web project
The RASIP Group
Faculty of Electrical Engineering and Computing Sciences
University of Zagreb
Croatia, 10000

Tel. + (385) 1 6129-842
Fax. + (385) 1 6129-809
e-mail: mirsad.todorovac@fer.hr