************************************************************************
SUMMARY: all we really need, standards-wise, is a new 300-level HTTP
response, "Contains".
************************************************************************
Several good examples have been brought up, as arguments for this proposal,
of files that can be composed of segments, where each of those segments is a
valid file of the same data-type. However, in almost all of the examples,
only *specific* byte ranges would work, i.e. ranges for which the
requested object would really be usable. Thus, for most of these examples,
you could just ask for "parts 0-3" or "2-5" or "3-end", and the right thing
would happen. In only one of the examples was *true* random access
necessary, and that was to resume downloading of a file if it was interrupted
part of the way through. Keep this example off to the side for the next few
paragraphs.
Instead of thinking about one URL that represents a collection of objects,
why not give each object its own unique URL, and devise a way of addressing
a collection of URLs? This is similar to byterange, but more general.
Let's say somewhere a mapping takes place that translates URL1 into a
container for URL2, URL3, URL4, etc. I have a hunch this is URC/URI
territory, but I don't know enough about the specific URC proposals
floating around yet to know if this is already being considered.
So, it works like this:
Client asks for URL1. URL1 gets mapped at a server somewhere into a composite
body whose parts are URL2, URL3, and URL4. * If the browser doesn't find a
place to either inline or link URL3, URL4, etc., it's up to the browser to
figure out how to represent that "auxiliary" file. Maybe it just keeps it
around until it can be represented later.
Caches work just as they always have. If they can cache that container
mapping, so much the better. The important thing is that URL2, URL3,
URL4, etc., can be ANYTHING THEY WANT TO BE - there's no need to give
them some sort of formal syntax; caches know from the mapping for URL1
how they assemble together. If the server prefers knowing them as
byteranges, that's fine. For example, we can have
    http://host/path/file
      is-a-container-for
        http://host/path/file;byterange=0-30
        http://host/path/file;byterange=31-60
or
    http://host/path/file
      is-a-container-for
        http://host/path/file?part1
        http://host/path/file?part2
or even
    http://host/path/file
      is-a-container-for
        http://host/path/file2
        http://host2/path/script
        ftp://host3/path/file3
and in every case the client or proxy will know whether it has the whole
object, or just some of its parts.
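
Here is a minimal sketch of that bookkeeping in Python, assuming the mapping is
held as a plain dictionary from a container URL to an ordered list of part
URLs. The names CONTAINS, missing_parts and has_whole_object are made up for
illustration and are not part of any existing proposal.

```python
# Hypothetical in-memory form of the "is-a-container-for" mapping.
# Part URLs are treated as opaque strings; no syntax is assumed for them.
CONTAINS = {
    "http://host/path/file": [
        "http://host/path/file2",
        "http://host2/path/script",
        "ftp://host3/path/file3",
    ],
}


def missing_parts(container_url, have, mapping=CONTAINS):
    """Return the part URLs of container_url that are not yet in `have`."""
    return [part for part in mapping[container_url] if part not in have]


def has_whole_object(container_url, have, mapping=CONTAINS):
    """True once every part listed for container_url has been fetched."""
    return not missing_parts(container_url, have, mapping)
```

A client or proxy would add each part URL to `have` as it arrives and call
has_whole_object() to decide whether the container is complete.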
Finally, this also allows "parts" to be members of more than one
container, something none of the byterange proposals had considered. I
think this is a good thing; can anyone think of a situation where it
isn't? In fact, the parts can even be on completely separate servers.
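
To illustrate why shared parts are attractive, here is a rough Python sketch
of a cache keyed by part URL. The PartCache class, the fake_fetch function and
the page and icon URLs are all hypothetical; the only idea taken from the text
above is that one part may appear in more than one container.

```python
class PartCache:
    """Toy cache keyed by part URL; `fetch` is any callable url -> bytes."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._store = {}
        self.fetch_count = 0  # number of real fetches performed

    def get(self, url):
        """Return the body for url, fetching it only on first use."""
        if url not in self._store:
            self._store[url] = self._fetch(url)
            self.fetch_count += 1
        return self._store[url]

    def load(self, part_urls):
        """Return the bodies of all parts of one container, in order."""
        return [self.get(url) for url in part_urls]


# Two hypothetical containers that share one part (an icon).
PAGE1 = ["http://host/page1.html", "http://host/icon.gif"]
PAGE2 = ["http://host/page2.html", "http://host/icon.gif"]


def fake_fetch(url):
    """Stand-in for a network fetch, so the sketch runs offline."""
    return ("body of " + url).encode("ascii")


cache = PartCache(fake_fetch)
cache.load(PAGE1)
cache.load(PAGE2)
print(cache.fetch_count)  # 3: the shared icon is fetched only once
```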
Yes, THIS REQUIRES CHANGES TO BROWSERS AND SERVERS. Minimally. Why
are we so afraid of that?
There are a couple of really good side effects, now that I think about it.
For example, right now Netscape's progressive-rendering algorithm has to
wait until it recognizes a reference to an inlined image before it can
start grabbing it. If it could be told that "URL1 contains this HTML
page and these inlined images" then it could possibly be more efficient
in what it does. Additionally, a content provider could "bundle" icons
with one page that weren't necessarily inlined on that page, but which
are used by subsequent pages, so that when visitors go to that subsequent
page, the icons are already loaded.
I can give plenty of examples of how this could work for just about
every application discussed so far. It would seem to be pretty
straightforward for servers to generate these mappings for a large PDF
file, presuming there's some way for it to query the PDF file to know
where it can be segmented.
So, now, back to resuming a download at point x. This is
semantically a very different operation from "give me part x",
so let's just give it its own request header:
    Startbyte: 204567
...would mean start the post-response-header transmission at byte 204567 into
the response, counting from the end of the response headers (\r\n\r\n, or
\n\n). Who cares if this is a CGI script or an actual file, eh? :)
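
On the server side this could be as simple as the following Python sketch. It
assumes the offset is zero-based (the text above does not say), that request
headers arrive as a dictionary, and that the full response body is available
as bytes whether it came from a file or a script. The function name
apply_startbyte is hypothetical.

```python
def apply_startbyte(request_headers, body):
    """Return the part of `body` that starts at the Startbyte offset.

    Assumption: the offset is zero-based and counts bytes after the
    response headers, so only the body is sliced.
    """
    # Header names are compared case-insensitively.
    headers = {name.lower(): value for name, value in request_headers.items()}
    raw = headers.get("startbyte")
    if raw is None:
        return body  # no Startbyte header: send everything
    offset = int(raw)
    if offset < 0:
        raise ValueError("Startbyte must be non-negative")
    return body[offset:]
```

A client that already holds the first n bytes would send "Startbyte: n" and
append whatever comes back to what it has.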
********************************************************************
So, I suppose in the end I'm proposing a new 300-level HTTP response,
something like
    305 Contains Mapping
        o Following: anything
        o Required Headers: none
The server returns an HTTP object consisting of a newline-delimited
list of URIs which this URL is said to "contain". The client is expected
to fetch these URIs and plug them together, representing the
requested URL as the canonical URL for this collection. The other HTTP
headers on this object apply *only* to this object, and this response
should be cached where possible.
*******************************************************************
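
A client-side sketch of handling such a response, in Python. The status
number 305 is taken from the proposal above and not from any published
specification. The function names, the `fetch` callable returning a
(status, body) pair, and the choice to skip blank lines are all assumptions
made for illustration.

```python
def parse_contains_body(body):
    """Split a newline-delimited URI list into a list of URI strings."""
    text = body.decode("ascii")
    # splitlines() accepts both \r\n and \n; blank lines are skipped.
    return [line.strip() for line in text.splitlines() if line.strip()]


def fetch_collection(url, fetch):
    """Fetch url; if it is a container, fetch each part it lists.

    `fetch` is any callable taking a URL and returning (status, body).
    Returns a list of (part_url, body) pairs in the order listed.
    Nested containers are not followed in this sketch.
    """
    status, body = fetch(url)
    if status != 305:
        return [(url, body)]
    parts = []
    for part_url in parse_contains_body(body):
        parts.append((part_url, fetch(part_url)[1]))
    return parts
```

How the parts are then "plugged together" is left to the caller, as in the
proposal.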
*Feedback*, please. I hate having all these ideas and no time to
implement them in a browser (though I'd be happy to implement this on the
server side in Apache).
Roy? Dan? Henrik?
Brian
--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com brian@hyperreal.com http://www.[hyperreal,organic].com/
* - Order is insignificant. A browser first starts rendering URL2 and looks
for where to start plugging in URL3, etc., but that should just be an
optimization; browsers can plug things together however they wish. Some
network-aware file formats like VRML already have the concept of nesting
inlines, which HTML doesn't have (yet), so that order could be
created by a depth- or breadth-first traversal of the scene to aid
rendering, but in a real directed graph that's not necessary.