Re: indexing html files and the old filter options to WAIS

Ben Beecher (beecher@neb.cc.columbia.edu)
Tue, 14 Sep 93 16:32:04 EDT


> A while back I seem to remember a thread concerning WAIS indexing html files.
> If I'm not mistaken about the thread, did anyone come up with a way to
> index HTML without having tags in the database as well? What I'd like to
> end up with is an html file with a text section that is indexed but that
> also contains links to images. Has anyone already done this?
>
> -MM
> --
> ------------------------------------------------------------------------------
> Michael Mealling
> Georgia Institute of Technology
> Michael.Mealling@oit.gatech.edu
>

I recently modified iubio-wais-8b5 so that it parses HTML files, and I'm
using it to index some poetry files we have online. You can tell it
to ignore HTML tags, but if you want to index only certain sections, you
have to teach the parsing functions about the structure of your
documents (sorry!).
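
To give a rough idea of the approach, here is a minimal sketch in Python.
It is not the actual change to iubio-wais (which is written in C), and the
names IndexableText, indexable_words and wanted are made up for this
illustration. It drops the tags, keeps the text, and can optionally keep
only the text inside the elements you name, which is the "teach it about
your document structure" part.

```python
from html.parser import HTMLParser


class IndexableText(HTMLParser):
    """Collect the text to index, skipping tags and unwanted sections.

    Hypothetical helper for illustration; not part of any WAIS release.
    """

    def __init__(self, wanted=None):
        super().__init__(convert_charrefs=True)
        # wanted: tag names whose contents are indexed.
        # None (or an empty list) means index all text in the file.
        self.wanted = set(wanted) if wanted else None
        self.depth = 0    # number of wanted elements we are currently inside
        self.chunks = []  # pieces of text that will go to the indexer

    def handle_starttag(self, tag, attrs):
        # Tags and their attributes (href, src, ...) are never indexed;
        # we only track whether we have entered a wanted section.
        if self.wanted and tag in self.wanted:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.wanted and tag in self.wanted and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text if there is no section filter, or if we are inside one.
        if self.wanted is None or self.depth > 0:
            self.chunks.append(data)


def indexable_words(html, wanted=None):
    """Return the list of words an indexer would see for this HTML."""
    parser = IndexableText(wanted)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks).split()
```

For example, indexable_words(page) gives every word of text in the page
with the markup removed, while indexable_words(page, wanted=["pre"]) gives
only the words inside <pre> sections. Links to images stay in the file
itself; they simply never reach the index because attribute values are not
treated as text.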

If you'd like to see my changes, I can make them available to you.

Ben Beecher
Columbia University
Academic Information Systems