> 1. take a base URL
> 2. retrieves all URL in the base document, but
> 3. do not goes outside the server (e.g. restrict the set of
> allowed URL),
> 4. minimum time between HEADs/GETs,
> 5. runs under unix (preferable SunOS 4.1 - i have ported software
> to hp-ux/solaris 2.x/dec osf/4.3bsd/aix/ultrix/sgi/linux)
I better clarify (4) - i would like to retreive all URL from
a site, but according to (4), have minimum time between two
GETs as to avoid overloading the server.
Answers to the query:
A). http://www.inria.fr/koala/abaird/oscheme/oscheme.html with
the "www-list" scripts (from Anselm.Baird_Smith@inria.fr)
B). http://www.ics.uci.edu/WebSoft/MOMspider/ (MOMspider)
(from joshuap@sdsc.edu (Joshua Polterock))
C). http://iamwww.unige.ch/~scg/Src/Scripts/ with the
explore script (diana@seldon.terminus.com (Cookie Monster))
D). Simon Spero <ses@tipper.oit.unc.edu> have a set of programs
for benmarking.
E). rst@ai.mit.edu (Robert S. Thau) has written a logfile replay program,
runs SunOS, which reports the main latency for every 100 transactions,
and which handle multiple outstanding requests. Found at
ftp://ftp.ai.mit.edu/pub/users/rst/monkey.c
F). www2dot from einpost@win.tue.nl (Reinier Post), it might no
fill the (4) requirement. Contact Reiner Post. Based on libwww2.
BTW, I probably try to use (C). For those interested, i'm running
a gateway (CGI based), which generates HTML pages on the fly.
I'm interested the above to profile the gateway (written in C).
thanks to all who answered,
msj
-- Martin Sj\"olin | http://www.ida.liu.se/labs/iislab/people/marsj Department of Computer Science, LiTH, S-581 83 Link\"oping, SWEDEN phone : +46 13 28 24 10 | fax : +46 13 28 26 66 | e-mail: marsj@ida.liu.se