Re: proposals for log file format changes

Roy T. Fielding (fielding@simplon.ICS.UCI.EDU)
Thu, 10 Feb 1994 11:40:28 --100


Kevin Hughes said:

> More or less following RFC 822, then:
>
> host rfc931 authuser [DD/Mon/YYYY:hh:mm:ss UT[+/-]HHMM] "request" ddd bbbb
>
> How's that?

RFC 822 expects date fields to be separated by spaces.

Now that people are talking about including the Referer: field
(a great idea, but a lot of text per log entry), I think the original idea
of a configurable log is now preferable, to save people's disk space.
However, I would recommend a limited set of options rather than the
fully formattable sscanf codes that Ari first mentioned.
[I think Kevin suggested option names as well, but I didn't save that message].

How about:

host = machine.sub.dom.ain
rfc931 = whatever_it_returns
fromuser = whatever_From:_gives (stripped of comments)
authuser = whatever_Authorization:_gives (stripped of password)
authpass = whatever_Authorization:_gives (stripped of user - IS THIS SAFE?)
charge = whatever_ChargeTo:_gives
locdate = [DD Mon YYYY hh:mm:ss]
gmtdate = [DD Mon YYYY hh:mm:ss GMT]
tzdate = [DD Mon YYYY hh:mm:ss +HHMM]
request = "first line from HTTP request"
response = ddd (3 digit HTTP response code)
bytes = bbbbb (free-formatted number of bytes transmitted)
referer = the_referer's_URI
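
To make the three date variants concrete, here is a minimal sketch in
Python of how one timestamp could be rendered as locdate, gmtdate and
tzdate. The function name date_fields and the MONTHS table are
illustrative assumptions, not part of the proposal, and the input is
assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone

# Fixed English month abbreviations, so output does not depend on the locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def _stamp(dt):
    """Render 'DD Mon YYYY hh:mm:ss' for the wall-clock values in dt."""
    return "%02d %s %04d %02d:%02d:%02d" % (
        dt.day, MONTHS[dt.month - 1], dt.year, dt.hour, dt.minute, dt.second)


def date_fields(when):
    """Return the three proposed date fields for a timezone-aware datetime.

    (Hypothetical helper; 'when' must carry tzinfo.)
    """
    offset = when.utcoffset()
    if offset is None:
        raise ValueError("date_fields needs a timezone-aware datetime")
    minutes = int(offset.total_seconds() // 60)
    sign = "+" if minutes >= 0 else "-"
    hh, mm = divmod(abs(minutes), 60)
    utc = when.astimezone(timezone.utc)
    return {
        "locdate": "[%s]" % _stamp(when),
        "gmtdate": "[%s GMT]" % _stamp(utc),
        "tzdate": "[%s %s%02d%02d]" % (_stamp(when), sign, hh, mm),
    }


if __name__ == "__main__":
    pacific = timezone(timedelta(hours=-8))
    for name, text in date_fields(datetime(1994, 2, 10, 1, 18, 51, tzinfo=pacific)).items():
        print(name, "=", text)
```

Only tzdate and gmtdate identify the instant unambiguously; locdate
is shorter but depends on knowing where the server lives.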

As specified in HTTP2, the From: field looks like an e-mail address.
Should the entire address be logged or just the username? If only username,
how does the server parse it given the wide variety of address formats?
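
As one possible answer to the parsing question, here is a sketch that
leans on Python's standard email.utils.parseaddr to strip comments and
display names, then takes everything before the last "@" as the
username. The function name from_username is hypothetical, and this
only covers the common RFC 822 forms; other address styles (UUCP
paths, for instance) would need their own handling.

```python
from email.utils import parseaddr


def from_username(from_value):
    """Return (address, username) for a From: value, or ("-", "-") if empty.

    Hypothetical helper: the address has comments and display names removed;
    the username is the part before the last "@", or the whole address when
    there is no "@" at all.
    """
    _display_name, addr = parseaddr(from_value)
    if not addr:
        return "-", "-"
    local, _sep, _domain = addr.rpartition("@")
    username = local if local else addr
    return addr, username


if __name__ == "__main__":
    for value in ("fielding@ics.uci.edu",
                  "fielding@ics.uci.edu (Roy Fielding)",
                  "Roy Fielding <fielding@ics.uci.edu>"):
        print(from_username(value))
```

Logging the whole address avoids the ambiguity entirely, at the cost
of a few more bytes per entry.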

The next question is: should the order be configurable as well?
If not, then the format can be specified by simple boolean options.
However, I'll bet people will want it configurable. In that case,
how should it be specified? A list is probably best, placed in a
server config file (e.g. NCSA's srm.conf). E.g.:

(host,rfc931,fromuser,authuser,charge,gmtdate,request,response,bytes,referer)

Any field which is requested but is not defined for a particular log entry
should be logged as a single dash "-".
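
A minimal sketch of how a server might apply such a list, written in
Python for illustration. The names parse_field_list and format_entry
are assumptions of this sketch, as is the choice of a single space as
the separator; values are expected to arrive already bracketed or
quoted where the field definitions above call for it.

```python
def parse_field_list(spec):
    """Turn "(host,locdate,request)" into ["host", "locdate", "request"]."""
    inner = spec.strip().strip("()")
    return [name.strip() for name in inner.split(",") if name.strip()]


def format_entry(fields, values):
    """Build one log line in the configured field order.

    Any requested field that is missing or empty for this entry is
    written as a single dash.
    """
    out = []
    for name in fields:
        value = values.get(name)
        out.append("-" if value is None or value == "" else str(value))
    return " ".join(out)


if __name__ == "__main__":
    fields = parse_field_list("(host,rfc931,fromuser,request,response,bytes)")
    entry = {
        "host": "simplon.ics.uci.edu",
        "request": '"GET /ICShome.html HTTP/1.0"',
        "response": 200,
        "bytes": 4262,
    }
    print(format_entry(fields, entry))
```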

Another question is how should the fields be separated in the log?
Current practice uses a space, but perhaps a comma is better. Any field
which could possibly include the delimiter would have to be surrounded
by some form of brackets (as are the date and request fields above).

Some examples:

(host,locdate,request,response,bytes)

would log something like:

simplon.ics.uci.edu [10 Feb 1994 01:18:51] "GET /ICShome.html HTTP/1.0" 200 4262

(gmtdate,response,host,fromuser,request,bytes,referer)

would log something like:

[10 Feb 1994 09:18:51 GMT] 200 simplon.ics.uci.edu fielding@ics.uci.edu "GET /ICShome.html HTTP/1.0" 4262 http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/whats-new.html
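
From the analyzer's side, lines like these can be split back into
fields as long as the field list is known. The following Python sketch
assumes space-separated fields, dates in square brackets and requests
in double quotes; parse_entry and TOKEN are illustrative names. It
would misparse a request that itself contains a double quote, which is
one reason the delimiter question matters.

```python
import re

# One token is a [bracketed] run, a "quoted" run, or a run of non-spaces.
TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')


def parse_entry(fields, line):
    """Map the tokens of one log line onto the configured field names.

    A lone dash is returned as None, meaning the field was not defined
    for that entry.
    """
    tokens = TOKEN.findall(line)
    if len(tokens) != len(fields):
        raise ValueError("expected %d fields, found %d" % (len(fields), len(tokens)))
    return {name: (None if tok == "-" else tok) for name, tok in zip(fields, tokens)}


if __name__ == "__main__":
    fields = ["host", "locdate", "request", "response", "bytes"]
    line = 'simplon.ics.uci.edu [10 Feb 1994 01:18:51] "GET /ICShome.html HTTP/1.0" 200 4262'
    print(parse_entry(fields, line))
```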

My primary concern about this is the extra work it will require of
the server authors. Provided that the fields are well defined and can
be parsed unambiguously, there should be no problem for log analyzers.
However, I think it would be much easier on the server authors if the field
order is fixed and simple options defined, e.g.:

LogDate LOCAL (or GMT or TIMEZONE)
LogReferer NO (or YES)

I think that decision should be left up to the server authors.
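
For comparison, here is a sketch of the fixed-order alternative in
Python. The option names LogDate and LogReferer come from the example
above; the fixed field order, the defaults (LOCAL and NO) and the
function name fields_from_options are assumptions made only for this
illustration.

```python
# Which date field each LogDate setting selects.
DATE_FIELD = {"LOCAL": "locdate", "GMT": "gmtdate", "TIMEZONE": "tzdate"}


def fields_from_options(config_text):
    """Derive the (fixed-order) field list from simple option lines.

    Unrecognized lines are ignored; an invalid option value raises
    ValueError.
    """
    options = {"LogDate": "LOCAL", "LogReferer": "NO"}  # assumed defaults
    for line in config_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] in options:
            options[parts[0]] = parts[1].upper()
    if options["LogDate"] not in DATE_FIELD:
        raise ValueError("LogDate must be LOCAL, GMT or TIMEZONE")
    if options["LogReferer"] not in ("YES", "NO"):
        raise ValueError("LogReferer must be YES or NO")
    # Assumed fixed order; only the date flavor and the trailing referer vary.
    fields = ["host", "rfc931", "authuser", DATE_FIELD[options["LogDate"]],
              "request", "response", "bytes"]
    if options["LogReferer"] == "YES":
        fields.append("referer")
    return fields


if __name__ == "__main__":
    print(fields_from_options("LogDate GMT\nLogReferer YES\n"))
```

The server only ever has two switches to consult, and an analyzer can
recover the layout from the same two settings.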

Comments?

...Roy Fielding ICS Grad Student, University of California, Irvine USA
(fielding@ics.uci.edu)
<A HREF="http://www.ics.uci.edu/dir/grad/Software/fielding">About Roy</A>