Re: pragma no-cache -- Can we make it more useful?

Ken Fox (fox@pt0204.pto.ford.com)
Thu, 14 Jul 1994 09:12:39 -0400 (EDT)


> > To use the technique that I think you are suggesting, the viewer would have to
> > send a HEAD request to get the document change date. Then the viewer would
> > have to look at the response and:
>
> No, not the viewer, the proxy. And I wasn't suggesting it, I thought I was
> describing established practice. If this is not correct, someone tell me please.
>
> So, conceptually, what I was describing goes:
>
> client GETs document, but this is routed via proxy cache
>
> proxy HEADs document at original server
>
> if (original_mod_date <= cached_mod_date) and not expired(cached_document)
> then
>     return cached_document
> else
>     GET new original document
>     put it in cache
>     return cached_document
> endif

I thought the expiration date was used like this:

client GETs document, but this is routed via proxy cache

if expired(cached_document)
then
    proxy HEADs document at original server

    if (original_mod_date > cached_mod_date)
    then
        GET new original document
        put it in cache
    endif
endif

return cached_document
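
For concreteness, here is the same flow as a rough Python sketch. It is only an illustration: CacheEntry, fetch_via_proxy, and the head/get callables are made-up names standing in for the proxy's real machinery, and the cache-miss branch is an assumption that the pseudocode above does not cover.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class CacheEntry:
    body: str
    last_modified: float  # modification time reported by the original server
    expires: float        # time after which the proxy must re-check the document


def fetch_via_proxy(
    url: str,
    cache: Dict[str, CacheEntry],
    now: float,
    head: Callable[[str], float],
    get: Callable[[str], Tuple[str, float, float]],
) -> str:
    """Serve url from the cache, contacting the original server only on expiry.

    head(url) returns the current modification time.
    get(url) returns (body, last_modified, expires).
    """
    entry = cache.get(url)
    if entry is None:
        # Assumption: nothing cached yet, so fetch the document once.
        cache[url] = entry = CacheEntry(*get(url))
    elif now >= entry.expires:
        # Only an expired entry costs a HEAD round trip.
        if head(url) > entry.last_modified:
            cache[url] = entry = CacheEntry(*get(url))
        # As in the pseudocode, an expired but unchanged entry stays expired,
        # so it is re-checked on every later request.
    return entry.body
```

The point of the ordering is that an unexpired document costs no network traffic at all, even if the original has changed in the meantime.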

> The HTTP protocol was therefore extended to contain an If-modified-since request
> header, making it possible to do a conditional GET request.

This is an excellent optimization. I'd also like to see a "standard"
document that a proxy can request to get a list of all documents modified
since some date. Sites that implement it would see a lot less traffic from
proxy servers, and sites that don't wouldn't be penalized.

The document could be named "/changes.txt" or maybe "/cgi-bin/changes". It
would probably be computed on-the-fly from a document database. I wouldn't
expect anybody to do it with a file system traversal, but that is certainly
possible.
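
As a purely hypothetical illustration of the on-the-fly version, assume the document database is a mapping from path to modification time and that the response is one "mtime path" line per changed document. Neither the changes_since name nor this line format comes from any standard; both are invented here.

```python
from typing import Dict


def changes_since(doc_db: Dict[str, float], since: float) -> str:
    """Render one 'mtime path' line for each document modified after `since`.

    doc_db maps a document path to its modification time (seconds).
    The output format is an assumption made for this sketch.
    """
    changed = sorted((path, mtime) for path, mtime in doc_db.items() if mtime > since)
    return "".join(f"{int(mtime)} {path}\n" for path, mtime in changed)
```

A proxy would pass in the date of its last check, so the reply stays small when little has changed.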

This proposal is very similar to the standard "/robots.txt" document that
robots/spiders/mirrors/etc. use to behave nicely.

Assuming all of these optimizations:

client GETs document, but this is routed via proxy cache

if expired(cached_document)
then
    proxy GETs/If-Modified "/cgi-bin/changes" at original server

    if (original_mod_date > cached_mod_date)
    then
        proxy GETs/If-Modified document at original server
        put it in cache
    endif
endif

return cached_document

This algorithm works against all servers --- not just the ones implementing
/cgi-bin/changes. It's an especially big win on servers with many
documents, or servers that have documents composed of many sub-documents ---
anytime it is likely that a client will GET more than two expired documents
from the same original server in the same session.
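
A rough Python sketch of where the savings come from. The names are hypothetical: changes is the already-parsed changes list (path to modification time), or None for a server that does not offer one, and conditional_get stands in for a GET with If-Modified-Since that returns None when the document is unmodified. It also assumes the list was requested with a date no later than the oldest cached copy, so a path missing from the list really is unchanged.

```python
from typing import Callable, Dict, Iterable, Optional


def refresh_expired(
    paths: Iterable[str],
    cached_mtimes: Dict[str, float],
    changes: Optional[Dict[str, float]],
    conditional_get: Callable[[str, float], Optional[str]],
) -> Dict[str, str]:
    """Return {path: new_body} for the expired paths that actually changed."""
    refreshed: Dict[str, str] = {}
    for path in paths:
        cached = cached_mtimes[path]
        if changes is not None and changes.get(path, float("-inf")) <= cached:
            # The changes list says our copy is current: no request needed.
            continue
        # Either the server has no changes list or the path is listed as
        # newer, so fall back to a per-document conditional GET.
        body = conditional_get(path, cached)
        if body is not None:
            refreshed[path] = body
    return refreshed
```

With a changes list, one request covers every expired document on that server; without one, the loop degrades to one conditional GET per document, which is no worse than before.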

> "pragma: nocache gets the original version without going via the proxy. This
> might be put on the reload button"

This is a good idea --- but instead of *not* going through the proxy, maybe
there should be a way to force the proxy to get the original document?
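
Something like the following sketch, in Python, is what I have in mind. It assumes the proxy sees the request headers as a plain dict with a "Pragma" key; handle_request and fetch_origin are invented names, not part of any existing proxy.

```python
from typing import Callable, Dict


def handle_request(
    url: str,
    headers: Dict[str, str],
    cache: Dict[str, str],
    fetch_origin: Callable[[str], str],
) -> str:
    """Serve url from the cache unless the client asked for a fresh copy."""
    pragma = headers.get("Pragma", "").lower()
    force = "no-cache" in [token.strip() for token in pragma.split(",")]
    if force or url not in cache:
        # Refetch through the proxy, so the cache is refreshed for everyone.
        cache[url] = fetch_origin(url)
    return cache[url]
```

The request still goes through the proxy, so other clients benefit from the refreshed copy too.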

> "proxies do not cache CGI documents and protected documents"

How do you tell the difference?! Unless a CGI script is located in /cgi-bin
or the server returns an expiration of zero, there isn't much information a
proxy can use to tell the difference. A few months ago I posted a long list
of problems I had with mirroring; this was one of the major ones.
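
To make the limitation concrete, here is about all a proxy can check, as a Python sketch. looks_uncacheable is a made-up name, and lifetime_seconds is assumed to be the expiration the server reported (None when it reported nothing).

```python
from typing import Optional


def looks_uncacheable(path: str, lifetime_seconds: Optional[float]) -> bool:
    """Guess whether a response is script output that should not be cached."""
    if path == "/cgi-bin" or path.startswith("/cgi-bin/"):
        return True  # conventional script directory
    # An expiration of zero (or less) means "do not reuse this response".
    return lifetime_seconds is not None and lifetime_seconds <= 0
```

A script living at, say, /scripts/search with no expiration looks exactly like a static document to this check, which is the problem.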

- Ken

-- 
Ken Fox, fox@pt0204.pto.ford.com, (313)59-44794
-------------------------------------------------------------------------
Ford Motor Company, Powertrain    | "Is this some sort of trick question
CAD/CAM/CAE Process Integration   |  or what?" -- Calvin
AP Environment Section            |