Re: The future of meta-indices/libraries?

Peter Deutsch (peterd@bunyip.com)
Tue, 15 Mar 1994 19:54:10 --100


Hi all,

[ You wrote: ]

> Hiyall,
>
> Bjoern Stabell (bjoerns@staff.cs.uit.no) writes:
> > The main problem for many users of WWW today is that they cannot
> > locate the information they are searching for.
> [munch]
>
> > The problem is, these meta-libraries require much work to be kept
> > up-to-date (and so, they usually aren't very up-to-date at all)
> > and there are so many of them; resulting in most meta-libraries
> > keeping a list of other meta-libraries.
>
> True 'nuff.. what we need is an archie like mechanism which allows
> for doing world wide searches on a specific topic. We are currently
> working on a CGI interface which allows doing conceptual searches
> on WWW archives. If you use a spider program to walk the web, or a
> mechanism close to archie, you could do a _world wide_
> "show me all documents dealing with foobar"
> and get back a clickable list of world wide URLs!

Actually, we do plan to add this capability to the archie
server system in the very near future. For WWW there's the
obvious problem of what to index, since there is no real
useful meta-info in the URL itself (how many copies of
"default.html" are there, anyways? :-) so at this point
we'd be happy to be told what to collect and serve.

We had planned to come to the community round about the
end of this month to start the discussion as to what would
be your preferences. At our end we need to produce only a
simple data-gathering script (modelled on what we already
have for archie's anonFTP) and a parser to allow us to
check that the collected info is valid. The rest will
use the existing archie code for access, data sharing,
database management and so on.
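
To make the "gather, then validate" split concrete, here is a
minimal sketch of the validation half in Python. It is not
archie code. The one-line-per-document format (URL, a tab,
then a title), the accepted schemes and the name
parse_listing are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical gathered format: "<URL>\t<title>", one document per line.
ACCEPTED_SCHEMES = ("http", "gopher", "ftp")  # assumption, not from the post


def parse_listing(lines):
    """Split gathered lines into (valid_records, rejected_lines)."""
    valid, rejected = [], []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # blank lines carry no record
        url, sep, title = line.partition("\t")
        parts = urlparse(url)
        well_formed = (
            bool(sep)                        # the tab separator is present
            and parts.scheme in ACCEPTED_SCHEMES
            and bool(parts.netloc)           # a host name is present
            and bool(title.strip())          # the title is not empty
        )
        if well_formed:
            valid.append((url, title.strip()))
        else:
            rejected.append(line)
    return valid, rejected
```

Rejected lines are kept rather than dropped, so a site whose
listing fails the check can be reported back to its maintainer.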

FYI, we now have a WAIS index search engine internal to
the system too, so we can also index and serve
template-oriented info. We're planning to use this to
gather and serve IAFA templates (among other things) and
we can use this for WWW, if the info available requires
it. If the WWW community can agree on a template
structure for documents this may be the best way to go.
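
As an illustration of what a template structure could look
like to an indexer, here is a small Python sketch that reads
blank-line-separated "Field: value" records in the general
style of the IAFA templates. The field names in the usage
note (Template-Type, Title, URI, Keywords) and the function
name parse_templates are illustrative assumptions, not an
agreed WWW template.

```python
def parse_templates(text):
    """Parse blank-line-separated 'Field: value' records into dicts."""
    records, current, last = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record, if one is open.
            if current:
                records.append(current)
            current, last = {}, None
        elif line[0].isspace() and last:
            # An indented line continues the previous field's value.
            current[last] += " " + line.strip()
        elif ":" in line:
            # Split on the first colon only, so URLs survive intact.
            name, _, value = line.partition(":")
            last = name.strip()
            current[last] = value.strip()
        # Any other line is ignored in this sketch.
    if current:
        records.append(current)
    return records
```

Fed a record such as

    Template-Type: DOCUMENT
    Title: Archie overview
    URI: http://www.example.org/default.html
    Keywords: archie, index,
      search

it returns one dictionary per record, with the wrapped
Keywords line folded back into a single value.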

FYI, we've already extended the system to index Gopher and
are testing the new collection now at a pilot site
provided by NEARNET in Boston (sorry, no general
availability yet, although it's not far away). This test
collection now indexes several hundred sites and is being
added to on a daily basis. Also FYI, the current gopher
index collection is tentatively called "gophind". Although
my partner despises the name, the rest of us here at Bunyip
Central think it's kinda cute. You'll know who won that
battle when we announce it to the entire net! :-)

As part of this development we've added a direct gopher
frontend onto the information, allowing you to choose
either the anonFTP or gophind collection through
gopher menus. We use the same internal database engine so
you have all the same search choices for gopher menus you
have for archie queries. We also have a WWW frontend
operating, although we're not sure that can be added as a
free upgrade at this point and are looking at the best way
to make this available. We'll keep you posted.

We hope to make the gopher index a part of the next
release currently scheduled for mid-April or so. It will
be offered to all existing archie sites as an additional
collection and they'll each choose individually whether to
offer this as part of their service. We then plan to start
serious work on the WWW index. Feedback and ideas are most
welcome and should be sent to "archie-group@bunyip.com".
Of course, we're also more than happy to see this
discussed on www-talk, as that's where the user community
for this service is to be found.

>
> That'd be kinda cool, wouldn't it.. ;-)

I hope so. The only question is what's the best stuff to
gather and index for a first pass. For that we need to
hear from the community, keeping in mind the tradeoff
between disk space and info desired. Can you all define a
simple template (or perhaps use one of the IAFA ones)? Is
the HEAD info enough? Once we know that, the rest should be
fairly easy.
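
To give a feel for how little "HEAD info" amounts to, here
is a hedged Python sketch that pulls the title and any named
META entries out of a document's HEAD element. It assumes
"HEAD info" means the HTML HEAD element rather than an HTTP
HEAD request. The names HeadCollector and head_info are
invented for this example.

```python
from html.parser import HTMLParser


class HeadCollector(HTMLParser):
    """Collect the TITLE text and named META entries found inside HEAD."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.in_title = False
        self.title = ""
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag == "head":
            self.in_head = True
        elif self.in_head and tag == "title":
            self.in_title = True
        elif self.in_head and tag == "meta":
            a = dict(attrs)
            if a.get("name") and a.get("content") is not None:
                self.meta[a["name"].lower()] = a["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
        elif tag == "head":
            self.in_head = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def head_info(html_text):
    """Return the title and META name/content pairs from the HEAD."""
    parser = HeadCollector()
    parser.feed(html_text)
    parser.close()
    # Collapse runs of whitespace so wrapped titles index cleanly.
    return {"title": " ".join(parser.title.split()), "meta": parser.meta}
```

A record this small costs only a few dozen bytes of disk per
document, which is the cheap end of the space versus detail
tradeoff. Anything drawn from the document body would cost
more.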

- peterd

-- 
------------------------------------------------------------------------------
  My proposal for funding the Internet is pretty simple. I vote we institute
  an "Information Superhighway" tax, the proceeds of which will be used to
  fund network infrastructure. The way this would work is simple - every time
  someone uses the words "Information Superhighway" or any of its derivatives
  we strike them with a sharp object and make them pay a $10 fee (of course,
  the sharp object is not actually needed to make this scheme work, it's just
  in there because it seems an appropriate thing to do...)
------------------------------------------------------------------------------