I've been meaning to write up an RFC on how DynaWeb handles large
files. As I've said, DynaWeb breaks a document into parts based on the
structure of the data. In particular, DynaWeb does runtime conversion
from SGML to HTML, and the smallest addressable part of a document in
DynaWeb is a single SGML element.
As you all probably know, an SGML document basically forms a
hierarchy of nested elements, or in other words, a tree. Filesystems,
in general, are also trees. It seemed natural to me to use the same
*type* of URL for files and for sub-document addressing.
As such, DynaWeb actually supports three sub-document addressing modes,
which are pretty much taken straight from the TEI guidelines:

    http://www.ebt.com/collection/book/doc=1/chap=2/sect=3
    http://www.ebt.com/collection/book/1/2/3
    http://www.ebt.com/collection/book/1

The first form accesses elements in the hierarchy by *typed* child
number; the second form accesses elements by child number,
irrespective of type; and the last is a direct element address. In
practice, because few people ever access the server except by
browsing, the last form can be used in most cases. I would like to
argue that such an addressing scheme is applicable to many other types
of data as well.
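
To make the difference between the first two forms concrete, here is a
minimal Python sketch. It is not DynaWeb's implementation: the Element
class, the resolve() function, the sample tree, and the choice of
1-based numbering are all assumptions made for illustration. The
direct-address form is left out, since how such an address maps to an
element is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A node in a toy element tree (hypothetical, for illustration only)."""
    type: str
    children: list = field(default_factory=list)


def parse_steps(path):
    """Split 'doc=1/chap=2' or '1/2' into (type_or_None, index) steps."""
    steps = []
    for segment in path.strip("/").split("/"):
        name, sep, number = segment.partition("=")
        if sep:
            steps.append((name, int(number)))   # typed step, e.g. chap=2
        else:
            steps.append((None, int(name)))     # untyped step, e.g. 2
    return steps


def resolve(root, path):
    """Walk down from root, one step per path segment.

    Indices are assumed to be 1-based. A typed step counts only children
    of that type; an untyped step counts all children.
    """
    node = root
    for elem_type, index in parse_steps(path):
        if elem_type is None:
            candidates = node.children
        else:
            candidates = [c for c in node.children if c.type == elem_type]
        if not 1 <= index <= len(candidates):
            raise LookupError(f"no such element: {path}")
        node = candidates[index - 1]
    return node


# A made-up document: the third <sect> of the second <chap> of the first <doc>.
target = Element("sect")
second_chap = Element("chap", [
    Element("title"), Element("para"),
    Element("sect"), Element("sect"), target,
])
doc = Element("doc", [Element("title"), Element("chap"), second_chap])
book = Element("book", [Element("front"), doc])

# Both addressing forms reach the same element in this tree.
typed = resolve(book, "doc=1/chap=2/sect=3")
untyped = resolve(book, "2/3/5")
```

Note that in this sample tree the untyped path is 2/3/5 rather than
1/2/3, because untyped steps also count the title, para, and front
elements that the typed steps skip over.
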
As I said before, my real problem with byte-ranges is that, generally,
they don't make sense. Ranges of *parts* do make sense, however. One
other problem I have is that the format of a URL should really be
application dependent, so why make recommendations for cases where it
is meaningless? Let's leave it to the application (i.e., the server)
until we are ready to design a far more general linking mechanism.
Look at http://www.ebt.com/ to see how DynaWeb works.
P.S. I should note that the above naming scheme is very, very useful in
our case, but it drives spiders wild...