Re: FTP or HTTP consumes more resources?

Robert S. Thau (rst@ai.mit.edu)
Sun, 22 Jan 1995 23:30:04 +0100


Date: Sat, 21 Jan 1995 14:51:00 +0100
Reply-To: connolly@hal.com
From: "Daniel W. Connolly" <connolly@hal.com>

Recently, I had an email exchange with a system administrator whose
figures showed that serving an archive over HTTP put a higher load on
the machine, even though the actual net traffic served was less.

Does anybody have any data or first-hand experience to share? Does
HTTP really fail to be lighter-weight than FTP? Or is it more likely
that the sysadmin above had a very atypical setup, or a misconfigured
server?

Dan

It makes quite a bit of difference which httpd the machine was running.
The NCSA server, in particular, is notorious for its slow performance
in reading MIME headers.
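
For those who haven't read the code: at least part of that cost is
reading the request headers one read() call per byte (the server must
not read past the end of the headers into the request body, and the
simple way to guarantee that is a byte at a time). A sketch of the
pattern, as an illustration of the shape of the problem rather than
the actual NCSA code:

    #include <unistd.h>

    /* Read one header line, one kernel crossing per character.
     * An illustrative sketch, not the NCSA code itself. */
    static int get_header_line(int fd, char *buf, int len)
    {
        int i = 0;
        char c;

        while (i < len - 1 && read(fd, &c, 1) == 1) {
            if (c == '\n')
                break;
            if (c != '\r')
                buf[i++] = c;
        }
        buf[i] = '\0';
        return i;
    }

Reading into a private buffer and scanning for newlines there would
cut the syscall count from one per byte to one per few kilobytes.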

FWIW, I got curious enough about where the time actually goes to do a
profiling run of the AI lab's (somewhat modified) NCSA server early
this afternoon. The profile (based on a few hundred live requests
from actual clients --- the real world is its own best simulation) is
up at

http://www.ai.mit.edu/people/rst/profile.user_data

if people want to take a look. The get_mime_headers() expense isn't
as high as it would be in a stock NCSA server (due to one of my
hacks),
but it evidently still needs work; also, there are still other
candidates for trimming. But look how little of it is in send_fd(),
the function which actually transfers data in almost all cases; even
getting the last-modified-time takes more user-mode cycles (due in
large part to all the locale stuff).
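
For calibration: a transfer loop of the send_fd() flavor is
essentially read() into a buffer, write() to the socket, repeat.
The sketch below is the general shape of such a loop, not the literal
NCSA code; very little user-mode time lands there because the actual
copying happens inside the kernel during read() and write().

    #include <unistd.h>

    /* Copy an open file to the client in fixed-size chunks.
     * Buffer size and error handling are illustrative. */
    static void send_file(int src, int client)
    {
        char buf[8192];
        int n;

        while ((n = read(src, buf, sizeof(buf))) > 0)
            if (write(client, buf, n) != n)
                break;   /* client went away; give up */
    }

(The last-modified cost, by contrast, is real user-mode work: date
formatting drags in the locale machinery even though HTTP dates are
a fixed English format.)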

As a general comment, there are plenty of things a typical Web server
does on each connection, besides the fork(), which also take up
nontrivial amounts of CPU time, including DNS lookup of the client
(for logging and access control), trying to open every .htaccess file
which *might* be present, etc., etc. --- all of which is *in
principle* dispensable, and all of which burns server cycles. For
instance, in the profile cited above, about 20% of the server's time
went into trying to open .htaccess files which are almost never
actually there.
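
To make that concrete: for a request that maps to, say,
/docs/a/b/page.html, the server attempts to open a .htaccess in every
directory along the way. A sketch of the pattern (the names and
buffer sizes are illustrative, and this is the shape of the logic,
not the NCSA source):

    #include <stdio.h>
    #include <string.h>

    /* One (usually failing) open per directory level above
     * the requested file. */
    static void check_htaccess(const char *path)
    {
        char dir[1024], htfile[1040];
        const char *p = path;
        FILE *f;

        while ((p = strchr(p + 1, '/')) != NULL) {
            memcpy(dir, path, p - path);
            dir[p - path] = '\0';
            sprintf(htfile, "%s/.htaccess", dir);
            if ((f = fopen(htfile, "r")) != NULL) {
                /* parse per-directory directives ... */
                fclose(f);
            }
        }
    }

Every one of those fopen() calls costs a full pathname lookup in the
kernel even when, as is almost always the case here, the file doesn't
exist.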

At least some of this (the .htaccess stuff in particular) is often not
done by ftp servers, and certainly not done repeatedly. (A typical
ftp session looks up the client's name *once* --- a web server, even
one that forks, could achieve the same effect by keeping a cache of
the last few hundred hostnames seen in a shared memory segment). I
sometimes suspect that, collectively, this overhead accounts for as
many CPU cycles as forking off the process in the first place, if not
more, but I have no hard data on the question.
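
To sketch what I mean by that cache (everything below, from the table
size to the eviction policy, is an illustrative assumption rather
than code from any existing server): the parent maps the segment
once, before the accept-and-fork loop, so every child shares the same
table.

    #include <string.h>
    #include <sys/mman.h>
    #include <netinet/in.h>

    #define CACHE_SLOTS 512

    struct host_ent {
        struct in_addr addr;
        char name[64];
    };

    static struct host_ent *host_cache;

    /* Called once in the parent, before forking.  On systems
     * without MAP_ANON, map /dev/zero instead.  Error checks
     * and locking omitted for brevity. */
    void cache_init(void)
    {
        host_cache = mmap(0, CACHE_SLOTS * sizeof(struct host_ent),
                          PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_ANON, -1, 0);
    }

    /* Direct-mapped: each address hashes to one slot, and a new
     * entry simply evicts whatever was there before. */
    const char *cache_lookup(struct in_addr a)
    {
        struct host_ent *e = &host_cache[a.s_addr % CACHE_SLOTS];
        return (e->addr.s_addr == a.s_addr && e->name[0])
            ? e->name : NULL;
    }

    void cache_insert(struct in_addr a, const char *name)
    {
        struct host_ent *e = &host_cache[a.s_addr % CACHE_SLOTS];

        /* invalidate, fill the name, then set the address last,
         * so a concurrent reader never matches a half-written
         * entry */
        e->addr.s_addr = 0;
        strncpy(e->name, name, sizeof(e->name) - 1);
        e->name[sizeof(e->name) - 1] = '\0';
        e->addr = a;
    }

A child would try cache_lookup() before calling gethostbyaddr(), and
cache_insert() the answer afterward; since entries just overwrite one
another, the worst a collision costs is one redundant DNS lookup.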

rst