Re: Broken links, are we ever going to address them?

Michael Chang (chang@cs.umd.edu)
Wed, 25 Jan 1995 21:01:00 +0100


I'd like to add my 2 cents to this.

I've written a Perl script that reads the Referer fields I log (via a
mod'd NCSA httpd) and matches a list of regular expressions against each
request. This is slow. After building a list of Referer URLs to try, it
uses the perl libwww routines to fetch each one and then checks whether any
of the links found on that page match the above regular expressions. This
more or less fits Martijn Koster's ideas #2 and #3, and is, I think, roughly
what Roy Fielding is talking about.
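
For the curious, here's a minimal sketch of that first half. I'm assuming
a combined-format log with the Referer quoted near the end of each line,
and the LWP::UserAgent and HTML::LinkExtor modules from libwww-perl; the
dead-link patterns and log path are placeholders, not my real ones, and
this isn't a verbatim excerpt of the actual script:

    #!/usr/bin/perl
    # Sketch only: collect Referer URLs whose requests hit a dead link,
    # then fetch each referring page and confirm the stale link is there.
    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTML::LinkExtor;

    my @dead = (qr{/old/report\.html}, qr{/papers/draft2\.ps});  # placeholders
    my %referers;

    # Pass 1: scan the access log for requests matching a dead-link pattern.
    open my $log, '<', 'access_log' or die "access_log: $!";
    while (<$log>) {
        # Combined log format: quoted request line, then Referer in quotes.
        next unless m{"(?:GET|HEAD) (\S+)[^"]*" \d+ \S+ "([^"]+)"};
        my ($path, $ref) = ($1, $2);
        next if $ref eq '-';
        $referers{$ref} = 1 if grep { $path =~ $_ } @dead;
    }
    close $log;

    # Pass 2: fetch each referring page and check its links again.
    my $ua = LWP::UserAgent->new(timeout => 30);
    for my $url (sort keys %referers) {
        my $res = $ua->get($url);
        next unless $res->is_success;
        my $hit = 0;
        my $parser = HTML::LinkExtor->new(sub {
            my ($tag, %attr) = @_;
            for my $link (values %attr) {
                $hit = 1 if grep { $link =~ $_ } @dead;
            }
        });
        $parser->parse($res->decoded_content);
        $parser->eof;
        print "$url still links to a dead URL\n" if $hit;
    }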

After fetching the page, it scans it for any links that match the given
regular expressions, and then tries to figure out who to send
warning/notice mail to. It does this by looking for 'link rev=made' and
'reply-to' fields, RCS strings, and URLs of the form /user/$username, and
then by trying to pick out an email address in the final 256 bytes of the
page. As a last resort, it tries webmaster, postmaster, and root on the
remote server. It sends one mail message per email address, so no mail
bombs. VRFYs are done against the remote sendmail daemon, so hopefully no
bounces either.
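
And a sketch of the address-guessing heuristics, tried in the order
described above. The names, regexes, and http-only URL handling here are
all illustrative guesses, not lifted from the real script:

    use strict;
    use warnings;

    # Guess who to mail about a stale page, trying each heuristic in turn.
    sub guess_recipients {
        my ($url, $html) = @_;
        my ($host) = $url =~ m{^http://([^/:]+)};
        my @addrs;

        # 1. <link rev=made href="mailto:...">
        push @addrs, $1
            if $html =~ /<link[^>]*rev\s*=\s*["']?made["']?[^>]*mailto:([^"'>\s]+)/is;

        # 2. A Reply-To-style field embedded in the page.
        push @addrs, $1 if $html =~ /reply-to:\s*(\S+\@\S+)/i;

        # 3. An RCS Id string carries a username; pair it with the host.
        push @addrs, "$1\@$host"
            if $host and $html =~ /\$Id: \S+ [\d.]+ \S+ \S+ (\w+)/;

        # 4. URLs of the form /user/$username imply user@host.
        push @addrs, "$2\@$1" if $url =~ m{^http://([^/:]+)/user/(\w+)};

        # 5. Scan the final 256 bytes for anything address-shaped.
        my $tail = length($html) > 256 ? substr($html, -256) : $html;
        push @addrs, $1 if $tail =~ /(\w[\w.+-]*\@[\w.-]+\.\w+)/;

        # 6. Last resort: the standard role accounts on the remote host.
        push @addrs, map { "$_\@$host" } qw(webmaster postmaster root)
            if !@addrs and $host;

        # One message per address, so de-duplicate here.
        my %seen;
        return grep { !$seen{$_}++ } @addrs;
    }

Before anything actually goes out, each candidate gets the VRFY check;
Net::SMTP's verify() method would be one way to do that, assuming the
remote host runs an SMTP daemon that honors VRFY.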

By fetching the referring page and analyzing it directly, the script
avoids sending bogus messages caused by faulty Referer implementations in
clients. Of course, my 'algorithm' can still guess wrong; it's only making
an educated guess.

Please note that I'm not claiming this is an optimal solution, or anything
more than a hack. It's just an attempt to give other people some advance
warning, and to save myself the effort of picking through my server logs
and composing mail messages by hand.

mike