Snapshot-date Was: Re: pragma no-cache -- Can we make it more useful?

karl@cavebear.com
Thu, 14 Jul 94 12:41:02 PDT


Before launching into the following (long) reply, the point that I really
want to make is that I would like master servers, but *not* cache servers,
to generate the following header line whenever they send a document:

snapshot-date: <date>

Launching...

>This has been discussed before on www-talk, e.g.

> <http://gummo.stanford.edu/html/hypermail/www-talk-1994q1.messages/992.html>

Thanks for the reference. I'll deal with that e-mail at the end of this note.

>The relevant bit being a new header --

>Cache-last-updated:
> should reflect the date/timestamp of last comparison with the original
> (i.e. the Date: header returned by the source document's HTTP server
> in that comparison). Note that this should not be changed by cache
> managers more than once-removed from the origin. This is because the
> act of checking the cache consistency with the original document is
> equivalent to getting a new copy of that document, but the act of
> checking cache consistency with a higher-level cache is only equivalent
> to copying that cache.

Cache-last-updated isn't quite what we need. What is needed is the
time the copy in the cache was separated from the master source.

While it might be nice to know how long something has resided in a
cache, for management or statistical purposes, what is useful to users
is how long it has been since the cache copy was unlinked (as it
were) from the original.

Now for that prior message....

>> In my cache server ('Lagoon'), few of the provisions in the HTTP MIME header
>> specifications are currently implemented, but I have already noticed the
>> need for more headers. To send information on the cache status of a document
>> to the client, I leave the headers obtained from the remote source intact,
>> and consider adding the following ones:
>>
>> Cache-date: <date>
>> Cache-last-refreshed: <date>
>> Cache-last-modified: <date>
>> Cache-via: <url> [, <url>]*
>>
>> This provides: the time the document was served from the cache in answer
>> to the present request, the time the document was last fetched into the
>> cache, the time it was last fetched and found to be different from the
>> previous version, and the sequence of URLs by which the document was
>> subsequently fetched. ...
>
>My first question is: why does the client need to see these headers?
>In other words, what task do you want the client to do that cannot be
>done without these headers? I will assume for now that the only reason
>is to support hierarchies of cache managers.

Even a single level cache needs to provide information to the client
so that the client can decide whether to accept the copy in the cache
or to cut through the cache and go right to the horse's mouth for an
up-to-the-minute version.

Perhaps that new version is identical to what is in the cache. But at
least the user knows that he/she/it has the most recent document
available.

But some users are willing to accept information which may be slightly
stale. How stale is acceptable can only be determined by the
user/client. The degree of "staleness" is not how long something has
lain in a cache, but how long since the cached copy was last
synchronized with the master version, in other words, how long since
the master server dispensed the copy that is now in the cache.

This, of course, can be easily represented if all non-caching servers
add the following line when they deliver a document:

snapshot-date: <date>

Cache servers must not generate this. It can be generated only by the
actual repository.
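
A minimal sketch of that division of labor, in Python. The function names
and the dict-of-headers representation are assumptions made for
illustration, not part of any existing server. The master stamps the
document once; a cache refreshes Date: but passes snapshot-date through
unchanged.

```python
from email.utils import formatdate


def origin_headers(now=None):
    """Headers a master (non-caching) server might attach to a document.

    `now` is seconds since the epoch; None means the current time.
    """
    stamp = formatdate(timeval=now, usegmt=True)
    return {"Date": stamp, "Snapshot-Date": stamp}


def cache_headers(stored, now=None):
    """Headers a cache might send when serving its stored copy.

    Date: reflects when this reply was generated. Snapshot-Date is
    copied from the stored headers and never regenerated here.
    """
    out = dict(stored)
    out["Date"] = formatdate(timeval=now, usegmt=True)
    return out
```

Served an hour later from a cache, the reply would carry a new Date: but
the same snapshot-date the master originally issued.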

A client can fetch the header, look at it and then decide whether
to issue a no-cache GET.
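
The client-side decision might look roughly like the sketch below. The
function name, the max_staleness parameter, and the policy of treating a
missing or unparseable snapshot-date as "too stale" are all assumptions;
a real client could just as reasonably accept such a copy.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def needs_no_cache_get(headers, max_staleness, now=None):
    """Return True if the client should bypass the cache.

    headers       -- dict of response headers from the cache
    max_staleness -- timedelta the user is willing to tolerate
    """
    now = now or datetime.now(timezone.utc)
    raw = headers.get("Snapshot-Date")
    if raw is None:
        return True  # assumption: unknown age counts as too stale
    try:
        snapshot = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        return True  # assumption: unparseable date counts as too stale
    if snapshot.tzinfo is None:
        # Dates without zone information are treated as UTC here.
        snapshot = snapshot.replace(tzinfo=timezone.utc)
    return now - snapshot > max_staleness
```

If this returns True, the client would reissue the request with
"Pragma: no-cache" to fetch directly from the master.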

>Assuming that, here is my opinion about the headers listed above,
>
>Cache-date:
> is inappropriate -- the Date: header of the message should
> list the date/time in which the cache manager generated the
> HTTP message (as a whole) for delivery to the client (i.e. it
> should always be the current date/timestamp).

I agree that it isn't a useful piece of information for the most part.
It is useful, however, for management purposes.

>Cache-last-modified:
> should be Cache-last-updated: and should reflect the date/timestamp
> of last comparison with the original (i.e. the Date: header returned
> by the source document's HTTP server in that comparison). Note that
> this should not be changed by cache managers more than once-removed
> from the origin. This is because the act of checking the cache
> consistency with the original document is equivalent to getting a new
> copy of that document, but the act of checking cache consistency with
> a higher-level cache is only equivalent to copying that cache.

I'm not sure that this is a useful piece of information either, except for
management purposes.

>Cache-last-refreshed:
> is unnecessary given Cache-last-updated.

Agreed.

>Cache-via:
>
>> ... This last header, which provides a comma separated
>> list of URLs, is required in order for cache servers to break loops in chains
>> of forwarded requests. (Lagoon 0.11a now supports such forwarding, but it
>> doesn't check for loops yet.) The last URL in this sequence is always
>> the URL of the present request (with the '#name' relative anchor suffix
>> removed). All headers are completely optional, of course.
>
>I am not convinced that loops are possible. Could you give us an example
>where a normal (non-psychotic) cache hierarchy could result in a loop?

I've actually generated loops myself with the CERN caching httpd. I accidentally
fired it up with an http_proxy environment variable pointing to itself.
It stopped when I ran out of process slots.

Again, I think this is a nice thing to have. It will certainly let people
know when their caching gets too deep.
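
A rough sketch of how a cache could use such a header to break the kind of
loop I hit. It assumes each cache appends its own identifying URL to
Cache-via before forwarding, which may not be exactly what Lagoon does;
the function names and the example URLs are hypothetical.

```python
def parse_cache_via(value):
    """Split a comma-separated Cache-via value into a list of URLs."""
    return [u.strip() for u in (value or "").split(",") if u.strip()]


def forward_via(value, my_url):
    """Return the Cache-via value to send onward.

    Raises RuntimeError if my_url is already in the chain, meaning the
    request has come back around to this cache.
    """
    chain = parse_cache_via(value)
    if my_url in chain:
        raise RuntimeError("forwarding loop detected at " + my_url)
    chain.append(my_url)
    return ", ".join(chain)


# Usage: two distinct caches in a row are fine; the length of the
# parsed chain also shows how deep the caching has become.
via = forward_via(None, "http://cache-a.example/")
via = forward_via(via, "http://cache-b.example/")
depth = len(parse_cache_via(via))
```

A cache that proxies to itself would find its own URL already in the
chain on the second pass and could refuse the request at that point.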

--karl--