Re: caching dilemma

James Gwertzman (gwertzma@eecs.harvard.edu)
Mon, 29 May 1995 23:16:39 +0500

Next message: James Gosling: "(no subject)"
Previous message: Martin Hamilton: "Re: User authentication"
Maybe in reply to: Shel Kaphan: "caching dilemma"
Next in thread: Kee Hinckley: "Re: caching dilemma"

Hi. Let me respond to your points one by one.

>>>>> "Shel" == Shel Kaphan <sjk@amazon.com> writes:

Shel> Hi,

Shel> The "expires" feature should cover the issue of when pages
Shel> should be flushed, but the world is apparently not ready for
Shel> it, because:

Shel> - If you set documents to expire immediately, some major
Shel> browsers display "Data Missing" or equivalently scary
Shel> messages when you use browser commands to "back up" to that
Shel> page. Since many users are not going to understand what is
Shel> going on and will be confused by such messages, and may not
Shel> know to "reload" the page at that point, it would be better
Shel> for them never to see messages like that. (I've already had
Shel> problems with some naive beta testers tripping over that.
Shel> They tend to think something must have broken. You can't
Shel> argue that we need more sophisticated users, because we
Shel> don't have a choice!)

Shel> - Some browsers (such as Prodigy's) appear to ignore the
Shel> "expires" header and cache pages anyway. (and that's just
Shel> their *browser*...)

In my mind the expires field should ONLY be used for documents with a
fixed lifetime: Cool-site-of-the-day, for example, or dynamic pages
which expire immediately. I agree that browsers should do a better job
with pages that expire immediately, namely showing them but not caching
them. I believe that for all other items (those with undetermined
lifetimes) browsers should use the technique that I describe in the
chapter of my thesis labeled "Cache consistency", which is based on the
Alex FTP cache. The idea is that the older a page is, the less likely
it is to change. When the browser suspects that the page might have
changed, it sends the "get-if-changed-since" message to the server to
find out whether its cached replica needs to be updated. If the answer
is "yes", it updates the page before showing it to the user. Otherwise
it simply uses the page currently cached.

The browser decides when to check by using the ratio of the time since
the file was last checked to the age of the file (the time since the
file was created). Whenever this ratio exceeds some threshold, e.g.
10%, the file is checked. In other words, if the file is a month old
and was last checked an hour ago, don't bother checking again before
using the cached copy. If the file was created a month ago and last
checked a week ago, then contact the server before showing the user the
cached file. I describe simulations in my thesis that show this to be a
promising approach.
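
As a rough illustration of the check described above, here is a minimal
Python sketch. The names (needs_check, THRESHOLD) and the choice of
seconds as the unit are assumptions made for this example; they are not
taken from the thesis or from any browser.

```python
# Hypothetical sketch of the age-based check; names and units are assumptions.
THRESHOLD = 0.10  # the 10% figure used as an example in the text

HOUR = 3600
DAY = 24 * HOUR


def needs_check(now, created, last_checked, threshold=THRESHOLD):
    """Return True if the cached copy should be revalidated with the server.

    All arguments are timestamps in seconds.
    """
    age = now - created               # time since the file was created
    since_check = now - last_checked  # time since the copy was last checked
    if age <= 0:
        # No usable age information: be conservative and check.
        return True
    return since_check / age > threshold


now = 100 * DAY
# A month-old file checked an hour ago: use the cached copy.
print(needs_check(now, now - 30 * DAY, now - HOUR))     # False
# A month-old file checked a week ago: ask the server first.
print(needs_check(now, now - 30 * DAY, now - 7 * DAY))  # True
```

When needs_check returns True, the browser would send the
"get-if-changed-since" request and refresh its copy only if the server
says the page has changed.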

Shel> So, I have a question and I have suggestions.

Shel> First, the question:

Shel> Is there any good workaround for the current problem, that
Shel> would have the properties of: - forcing browsers to reload
Shel> expired pages when someone explicitly requests one, and -
Shel> either: - allowing pages on the browser's history stack (for
Shel> instance) to remain in the local cache even if they are
Shel> expired, or, - *somehow* causing the browsers to gracefully
Shel> and silently reload expired pages when re-visited through
Shel> history mechanisms.

Shel> No? I suspected as much...

You're right, my stuff does not address the "here and now" very
well. I'm describing a solution to caching on local-area-networks, not
necessarily clients and their history stacks.

Shel> The suggestions:

Shel> To make the web work more smoothly, it would be nice if
Shel> browsers would handle this situation more gracefully, by,
Shel> for instance, not displaying errors like "Data Missing", but
Shel> just automatically reloading the page.

Shel> However, I also think it is worth considering for browser
Shel> writers that history stacks (that can be re-viewed with
Shel> browser navigation controls) are in a class of their own
Shel> when it comes to caching. However, while it might make
Shel> sense to back up and see an expired document, since history
Shel> mechanisms are for "history", it does not make sense to go
Shel> through a link and see a cached copy of an expired document.
Shel> It is REALLY BAD for browsers to display cached copies of
Shel> expired documents when they are meant to be freshly
Shel> displayed in response to a direct user command, because a
Shel> URL may be a request to a program that is displaying dynamic
Shel> information related to the user's extended "session" with
Shel> the server. (This is the core of the issue).

Shel> I realize these considerations may have no role in the HTTP
Shel> spec, however I feel there are serious problems in this
Shel> area, which can only be resolved by coordinating the
Shel> behavior of browsers and servers.

Shel> Another thing that might help: perhaps there should be a way
Shel> for servers to "force" the URL (the *name*) handled by
Shel> clients to something other than the requested URL. This
Shel> would allow, for example, the requestor's URL to be used to
Shel> encode information relating to a query, but would then
Shel> result in a single cache entry in the client.

Shel> To explain this a little more, if there were two GET
Shel> requests, one for /cgi-bin/food/hamburgers and one for
Shel> /cgi-bin/food/french-fries, which would result in a single
Shel> page that ought to be cached as one page, then the server
Shel> ought to be able to say, "you asked for /food/french-fries,
Shel> but the page is called /food/generic-junk-food", and to have
Shel> the browser use that info to uniquely identify a cache entry
Shel> and update it with the newly fetched data. This might not
Shel> help to avoid fetching documents extra times, but it would
Shel> help on cache coherence if the intent was to display a
Shel> dynamically generated document.

I agree here. There is already a redirection mechanism in place, but
I don't think the results of the redirection are cached across
sessions. I would love it if the user could ask for page a on machine
b, be told that page a now lives on machine c, and have the client
remember that fact until told otherwise. After all, a redirection like
this only takes 30 or 40 bytes, and the typical client could store
thousands of them very neatly.
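
To make the idea of remembering redirections across sessions concrete,
here is a small Python sketch. Everything in it is hypothetical: the
RedirectCache class, the JSON file used for storage, and the example
URLs are assumptions for illustration, not a description of how any
existing client behaves.

```python
import json
import os
import tempfile


class RedirectCache:
    """Hypothetical store of 'page moved from old URL to new URL' facts."""

    def __init__(self, path):
        self.path = path
        self.moved = {}
        if os.path.exists(path):
            # Reload redirections remembered in an earlier session.
            with open(path) as f:
                self.moved = json.load(f)

    def _save(self):
        with open(self.path, "w") as f:
            json.dump(self.moved, f)

    def remember(self, old_url, new_url):
        self.moved[old_url] = new_url
        self._save()

    def forget(self, old_url):
        # The "until told otherwise" case.
        if self.moved.pop(old_url, None) is not None:
            self._save()

    def resolve(self, url):
        # Follow remembered redirections, stopping if a loop is detected.
        seen = set()
        while url in self.moved and url not in seen:
            seen.add(url)
            url = self.moved[url]
        return url


path = os.path.join(tempfile.mkdtemp(), "redirects.json")
cache = RedirectCache(path)
cache.remember("http://b.example/a", "http://c.example/a")

# A later session reads the same file and still knows about the move.
later = RedirectCache(path)
print(later.resolve("http://b.example/a"))  # http://c.example/a
```

Each entry is just a pair of short strings, in line with the estimate
that a client could hold thousands of them cheaply.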

Shel> Anyway, just some thoughts. If you have any ideas, pointers
Shel> or references for me, I would really appreciate it.

Shel> --Shel Kaphan sjk@amazon.com
