As somebody else mentioned earlier, it's not good to cache a document if
it requires authorisation. However, there still exists the problem in
the web that you can't always know if a document is protected!
Specifically, this is the case when a document is protected by
host/client based authentication. That happens quite a bit, because it's
the only possible way of saying "this document is available to all
academic sites" without requiring every academic user to have a
username/password.
With such a document, only the server knows that it is protected, and so
the client cannot correctly determine whether or not to cache it.
Exactly the same problem exists with proxy servers (I talked with some
people about this at the Geneva WWW conference...), where the request is
identified as originating from the proxy, and not from the client. Note
that with this problem, it is not the server which is at fault: it is
doing the right thing, limiting the access in that way. It is entirely
the client/cache/proxy's responsibility to identify the correct client
to the server.
Ways to "fix" this problem with caching (proxies can be fixed easily,
but as far as I know, the suggested fix hasn't yet been implemented):
1) check with a HEAD request to the server that it can be accessed (this
also serves to check the freshness of the cache...). Unfortunately, the
semantics of HEAD are bogus, because you can't specify what operation
you actually want to check (e.g. if it's a SEARCH, a GET, or a POST, or
2) have somebody work out what the Public-Methods header in HTTP should
really mean, so that the cache could just look at this information.
Unfortunately, this is also bogus in its definition: should "public"
mean someone from the same site? someone from a different site? someone
using the same cache...? someone from the same domain as the From header
(as opposed to the client's source address)? And so on. Also, it may
actually be hard work for the server to know exactly what methods are
allowed for a particular object: when using forms, the method string
used for posting can be anything, and it is defined in the document
(bizarrely enough), not by the server. So even assuming you know what
"user" the public bit refers to, it's not certain you can determine
which methods to list...
3) Pragma: no-cache. However, this seems like extreme overkill. The
document may well be a lengthy tome with zillions of cute images which
will generally be available to everyone at a particular site, and
accessed often by people on that site. Disabling caching is just
avoiding the issue and makes things slow.
Personally, I'd go for defining the Public-Methods header to return any
(but *not* necessarily all) of the methods that a user from the same
site, with no special authentication, could use. For example, accessing
a document might return "GET" as a public method if the server decides
that "email@example.com" (see note below), passing no extra authdata
to the authorisation evaluation routine, could access the document.
The client could still attempt a "PUT" or anything else if it feels
like it: if a method is specified in the header, it is guaranteed to
succeed for a "public" client. The list is not guaranteed to be
complete, however; other requests should be checked with the
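To make the proposed semantics concrete, here is a rough sketch of how a cache might read the header. It assumes the header value is a comma-separated list of method names and that the cache holds response headers in a plain dict keyed by the exact name "Public-Methods"; both are assumptions of mine, as are the function names, and none of this is settled HTTP.

```python
def parse_public_methods(header_value):
    """Split a Public-Methods value into a set of upper-case method names.

    Assumption: the value is a comma-separated list such as "GET, HEAD".
    """
    if not header_value:
        return set()
    return {m.strip().upper() for m in header_value.split(",") if m.strip()}


def cache_may_answer(method, response_headers):
    """Return True if the cache may answer `method` without asking the server.

    Under the proposed meaning, a listed method is guaranteed to succeed for
    an unauthenticated client at the same site. An unlisted method is merely
    unknown (the list need not be complete), so the cache must not answer it
    on its own.
    """
    public = parse_public_methods(response_headers.get("Public-Methods"))
    return method.upper() in public
```

A document returned with "Public-Methods: GET" could then be served from the cache for a GET, while a PUT on the same document would always be passed on.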
Note: the problem with this is how to determine what the same site *is*
(for example, if they are coming through a proxy server...). The way to
fix this (mentioned previously as how to fix proxy servers) is IMHO to
fix the From header in HTTP to always be trusted (in the absence of any
real authentication data). Rationale: If I'm not using any *real*
authentication system and just using host based authentication (or
similar), then all I really want is something which I can trust, be it
the address of the client socket, or some data passed through it (esp in
the case of proxies). If they're lying to me, then that's their fault:
they're the ones who are liable for doing Bad Things, not the server ---
the server is a victim of fraud. So:
(a) the client should always fill in the from field (if nothing else,
(b) proxy servers should always pass the from field through *unchanged*,
unless the information is blank, in which case it should be filled in to
be "nobody"@socket-peer-address. This handles cases where (a) is not
satisfied and even allows requests passing through chains of proxies to
This results in the server always receiving a From header which has the
originating domain name attached to it, or whatever the user placed
there, which would hopefully be something similar.
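Rule (b) is small enough to sketch. The following assumes request headers are held in a dict and that the proxy already knows the peer address of the incoming socket; the function name is made up for illustration and does not come from any existing proxy.

```python
def from_header_for_upstream(request_headers, peer_address):
    """Return a copy of the headers with a usable From field for the next hop.

    A non-blank From is passed through unchanged. A missing or blank From is
    filled in as nobody@<socket peer address>, as suggested in (b).
    """
    headers = dict(request_headers)  # copy, so the caller's dict is untouched
    if not headers.get("From", "").strip():
        headers["From"] = "nobody@" + peer_address
    return headers
```

Because a filled-in From is no longer blank, the next proxy in a chain leaves it alone, so the value set by the first proxy is the one that reaches the server.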
The above proposals allow servers to consistently have a piece of
information indicating the source of the request and, further to that,
allow caches to use more correct algorithms to determine if a document
is eligible for caching.
What needs to be done then to get all this working?
(1) Get all proxy servers to ensure that a valid From header is always
passed on to the destination server.
(2) Allow servers to use host based authentication based on From address
rather than socket-peer address.
(3) Decide what Public-Methods means.
(4) Modify servers to always correctly return the Public-Methods header.
(5) Modify the cache controllers to only cache/serve documents where the
Public-Methods header permits such access (in combination with any other
heuristics being used to decide if a document should be cached, of
course).
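For step (2), a server-side check might look roughly like the sketch below. It assumes the site is taken from the domain part of the From address, falling back to the socket peer host when From is absent, and that "academic sites" are recognised by domain suffix. The suffix list and the function names are illustrative only.

```python
from email.utils import parseaddr


def requesting_site(from_header, peer_host):
    """Pick the host name to authorise against.

    Uses the domain of the From address when there is one, otherwise the
    socket peer host (the only information available today).
    """
    _, address = parseaddr(from_header or "")
    if "@" in address:
        return address.rsplit("@", 1)[1].lower()
    return peer_host.lower()


def site_allowed(site, allowed_suffixes):
    """True if `site` equals, or is a subdomain of, one of the allowed suffixes."""
    return any(site == s or site.endswith("." + s) for s in allowed_suffixes)
```

With this, a request arriving through a proxy at gateway.example.com but carrying "From: nobody@cs.example.ac.uk" would be judged as coming from cs.example.ac.uk, which is the whole point of trusting the header.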
So.... how about it? :-)
Nick Williams, Systems Architecture Research Centre, City University,
London, EC1V 0HB. UK.
E-mail: firstname.lastname@example.org (MIME and ATK)
Work Telephone: +44 71 477 8551
Work Fax: +44 71 477 8587