Re: "Hits" pragma

Larry Masinter (masinter@parc.xerox.com)
Wed, 16 Aug 1995 00:11:57 PDT


Whoa.

I wrote:
>If most server administrators don't care about statistics most of the
>time, why burden the proxies with gathering data that most of them
>don't want? As was proposed earlier, let those-who-want-usage-data
>periodically turn off caching for their documents to get better data.
>It takes only a little advanced planning to make sure 'expires' dates
>don't exceed date-of-next-usage-survey.

and Marc Hedlund replied:

>I definitely don't agree that most server administrators feel this way. I
>maintain a CGI-FAQ, and *the* single most FAQ is "how can I put a counter
>on my page?" Obviously this is anecdotal, but my conclusion is that many
>people who provide information to the web are *very* interested in exactly
>how many times their pages are accessed. On a per-page basis, no less, and
>graphically displayed to every new visitor.

Of course, this is not a matter of 'opinion' but of externally
verifiable fact ('how server administrators feel'). I suppose we could
do a survey, as long as we ask the right question.

First, the fact that something is a FAQ doesn't mean that it's a
frequent practice. The most-frequently-asked question might still only
be asked by 2% of the server administrators.

Second, the sites I've seen with counters usually count only the
first/home/splash page, rarely the pages beyond that, and certainly
not the embedded GIF images independently.

Finally, those who want counters on their pages probably would not be
satisfied at all with any kind of statistically-based proxy cache
reporting. In particular, any scheme where counts might get lost or
even delayed a few days would interfere with the goals of those doing
the counting. Pages with counters *can't* be cached, because the
content changes every time it is accessed. So your anecdote is
evidence against, not for, the proposals that proxy servers should
accumulate statistics and send them back at some later time.

Do you have any evidence (anecdotal or more formal) that there are
web resources for which:

(a) it is possible to cache the resource and

(b) unreliable delayed measurements of accesses (until the proxy next
reports them) are acceptable, and

(c) statistical measurements (measuring accesses every 8th day)
are not acceptable?

If there are such resources, are there sufficiently many to warrant
the extra overhead of statistics gathering in the cache service,
rather than just accepting the extra overhead of retransmitting the
data each time?
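
To make the sampling alternative concrete, here is a minimal sketch in
Python of the "turn off caching periodically" approach. It is only an
illustration under stated assumptions: the function names and the
survey epoch are hypothetical, the 8-day period is taken from the
"every 8th day" example above, and scaling the survey-day counts
assumes traffic on survey days is representative of the other days.

```python
from datetime import date, timedelta

# Assumption: one survey day in every 8, per the "every 8th day" example.
SURVEY_PERIOD_DAYS = 8


def is_survey_day(day: date, epoch: date) -> bool:
    """True on days when caching is turned off so every access hits the origin."""
    return (day - epoch).days % SURVEY_PERIOD_DAYS == 0


def capped_expires(now: date, desired_lifetime_days: int, epoch: date) -> date:
    """Pick an expiry date that never runs past the next survey day."""
    days_since_epoch = (now - epoch).days
    days_to_next = SURVEY_PERIOD_DAYS - (days_since_epoch % SURVEY_PERIOD_DAYS)
    next_survey = now + timedelta(days=days_to_next)
    return min(now + timedelta(days=desired_lifetime_days), next_survey)


def estimate_total_hits(survey_day_hits: list[int]) -> int:
    """Scale hits seen on survey days up to an estimate for the whole period."""
    return sum(survey_day_hits) * SURVEY_PERIOD_DAYS
```

For example, with a hypothetical epoch of 16 Aug 1995, a document
served on 18 Aug with a desired 30-day lifetime would get an expiry
date of 24 Aug, the next survey day, so no cached copy outlives the
start of the next measurement.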