Re: converting URLs in .html files

Erik Ostrom (eostrom@mcs-server.gac.edu)
Tue, 31 Aug 93 12:37:40 CDT


Has anyone dealt with automatically converting the URLs within HTML files
so that you could take a set of files like the Library of Congress Vatican
Exhibit and use them off a local HTTP server rather than across the
Internet?

This won't help with the Vatican exhibit, but: If a cluster of related
files is written using relative URLs, then the only `conversion' you
need to do is to change the entry point.

That is, if http://sunsite.unc.edu/expo/vatican.exhibit/vatican.exhibit.html
contained a link to HREF="exhibit/Main_Hall.html", then you could just
copy all the files over to your local net, and jump to (say)
file:///my/html/files/vatican.exhibit/vatican.exhibit.html,
and the reference would now point you to the Main Hall file on your
local filesystem.
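One way to check that only the entry point changes is to resolve the same relative reference against both locations. The sketch below uses Python's standard `urllib.parse.urljoin`, a modern tool used here only for illustration. It plugs in the two base URLs and the HREF from the example above:

```python
from urllib.parse import urljoin

# The relative reference as it appears in the entry-point document.
href = "exhibit/Main_Hall.html"

# The same entry-point document, fetched from the server and from a local copy.
remote_base = "http://sunsite.unc.edu/expo/vatican.exhibit/vatican.exhibit.html"
local_base = "file:///my/html/files/vatican.exhibit/vatican.exhibit.html"

# A relative reference is resolved against whatever base the document was
# loaded from, so the link follows the files wherever they are copied.
remote_target = urljoin(remote_base, href)
local_target = urljoin(local_base, href)

print(remote_target)
print(local_target)
```

Nothing inside the document changed between the two calls. Only the base changed, and that base comes from where you jumped in.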

The Vatican exhibit uses absolute URLs, which is a pain for moving or
copying files. Oh well. A cluster of related files using relative
URLs should be easily portable. (That's the point of relative URLs,
as I understand it.) Yes, you need to recreate the folder hierarchy of
the source server, but this is something that tar and other archivers
already do.
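The archiver half of this can be sketched with Python's `tarfile` module, standing in for command-line tar. The file names are the hypothetical ones from the example above. The tree is archived with relative member names, unpacked into a different directory, and the resulting paths are listed:

```python
import os
import tarfile
import tempfile

# Hypothetical file names, taken from the example above.
FILES = ["vatican.exhibit/vatican.exhibit.html",
         "vatican.exhibit/exhibit/Main_Hall.html"]

def round_trip(files):
    """Archive a tree of files, unpack it elsewhere, return the unpacked paths."""
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "src")
        dest = os.path.join(work, "dest")
        for rel in files:
            path = os.path.join(src, *rel.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w") as f:
                f.write("<html></html>\n")

        # arcname keeps member names relative, like running `tar cf` inside src.
        archive = os.path.join(work, "exhibit.tar")
        with tarfile.open(archive, "w") as tar:
            for top in sorted({rel.split("/")[0] for rel in files}):
                tar.add(os.path.join(src, top), arcname=top)

        # Unpack somewhere else; tarfile recreates the directory hierarchy.
        with tarfile.open(archive) as tar:
            if hasattr(tarfile, "data_filter"):
                tar.extractall(dest, filter="data")
            else:
                tar.extractall(dest)

        return sorted(
            os.path.relpath(os.path.join(root, name), dest).replace(os.sep, "/")
            for root, _, names in os.walk(dest)
            for name in names
        )

extracted = round_trip(FILES)
print(extracted)
```

Because the relative layout survives the round trip, a relative link like `exhibit/Main_Hall.html` still points at the right file after unpacking.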

For links that _aren't_ relative, it's really questionable whether you
want to convert. Usually (I hope) an URL inside a document is absolute
because it points to something unrelated, which wouldn't be
part of the package you brought to your local net anyway. Of course,
this isn't the case with the Vatican exhibit, or, no doubt, with many
other data sets on the web now. I can dream, though.
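For a set like the Vatican exhibit, where the absolute links do point inside the package, a conversion could rewrite only the links under the package's own prefix as relative links. Everything else would stay alone. The following is a rough sketch under stated assumptions. `PACKAGE_BASE` and `relativize` are hypothetical names. `http://info.cern.ch/` stands in for an unrelated link. A regular expression on double-quoted HREFs is used where a real tool would want an HTML parser:

```python
import posixpath
import re

# Hypothetical: everything under this prefix was copied into the local package.
PACKAGE_BASE = "http://sunsite.unc.edu/expo/vatican.exhibit/"

# Crude match for double-quoted HREF values; a real tool would parse the HTML.
HREF_RE = re.compile(r'(HREF\s*=\s*")([^"]*)(")', re.IGNORECASE)

def relativize(doc_url, html):
    """Turn absolute links into PACKAGE_BASE into relative ones.

    doc_url is the absolute URL of the document being rewritten and is
    assumed to lie under PACKAGE_BASE. Any other link is left untouched.
    """
    doc_dir = posixpath.dirname(doc_url[len(PACKAGE_BASE):]) or "."

    def fix(m):
        url = m.group(2)
        if not url.startswith(PACKAGE_BASE):
            return m.group(0)  # unrelated (or already relative): leave it alone
        target = url[len(PACKAGE_BASE):] or "."
        return m.group(1) + posixpath.relpath(target, doc_dir) + m.group(3)

    return HREF_RE.sub(fix, html)

entry = relativize(
    PACKAGE_BASE + "vatican.exhibit.html",
    '<A HREF="http://sunsite.unc.edu/expo/vatican.exhibit/exhibit/Main_Hall.html">'
    'Main Hall</A> <A HREF="http://info.cern.ch/">CERN</A>',
)
hall = relativize(
    PACKAGE_BASE + "exhibit/Main_Hall.html",
    '<A HREF="http://sunsite.unc.edu/expo/vatican.exhibit/vatican.exhibit.html">Entrance</A>',
)
print(entry)
print(hall)
```

The entry point's link becomes `exhibit/Main_Hall.html`, and the Main Hall's link back becomes `../vatican.exhibit.html`. The CERN link keeps its absolute form, because it points outside the package.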