Note: I hope to get a Perl 5 fix out soon for mhonarc to properly
process multipart messages. I've done the fix, but would like to add a
few other things before the next release.
>
> 1. Base the published URLs on the global message-ids, not on local
> sequence numbers. So instead of:
>
> http://www.foo.com/archive/mlist/00345.html
>
> I want to see:
>
> http://www.foo.com/archive/mlist?message-id=234234223@bar.net
I've been planning to add something like this to mhonarc. The weakness
of mhonarc, hypermail, and similar programs is that they are not well
suited for large archives. By allowing the user to specify the message
links, one can use an existing database system for retrieval and use
mhonarc simply as a dynamic message->html filter.
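A minimal sketch of that idea in Python, assuming a hypothetical
archive base URL and an in-memory dict standing in for the database.
The published URL carries the global Message-ID instead of a local
sequence number, and retrieval goes straight to the stored message.
`message_url`, `lookup` and `STORE` are illustrative names, not part of
mhonarc.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical base URL, taken from the example in the request above.
ARCHIVE_BASE = "http://www.foo.com/archive/mlist"

# Stand-in for a real database: Message-ID -> raw RFC 822 text.
STORE = {
    "234234223@bar.net": "Message-ID: <234234223@bar.net>\nSubject: hello\n\nbody\n",
}

def message_url(message_id: str) -> str:
    """Build a stable URL from the global Message-ID (angle brackets stripped)."""
    mid = message_id.strip().strip("<>")
    # urlencode percent-escapes '@' and other reserved characters.
    return ARCHIVE_BASE + "?" + urlencode({"message-id": mid})

def lookup(url: str) -> Optional[str]:
    """Recover the Message-ID from the query string and fetch the raw message."""
    ids = parse_qs(urlsplit(url).query).get("message-id")
    if not ids:
        return None
    return STORE.get(ids[0])
```

Because the key is the Message-ID, the same URL stays valid even if the
archive is rebuilt or renumbered.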
> 2. Support format negotiation. Make the original message/rfc822 data
> available as well as the enhanced-with-links html format -- at the
> same address. This _should_ allow clients to treat the message as a
> message, i.e. reply to it, etc. by specifying:
>
> Accept: message/rfc822
A reasonable request. It will be very useful once clients can process
MIME data correctly.
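A rough sketch of what the server side of such negotiation could look
like, assuming the archive stores the raw message/rfc822 text and
renders text/html on request. The Accept parsing is simplified (q-values
and `type/*` wildcards only), and `negotiate` is a hypothetical helper.

```python
from typing import Dict, Sequence

def parse_accept(header: str) -> Dict[str, float]:
    """Map each media range in an Accept header to its q-value (default 1.0)."""
    prefs = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs[media] = q
    return prefs

def negotiate(accept: str, offered: Sequence[str] = ("text/html", "message/rfc822")) -> str:
    """Pick the offered type the client prefers; fall back to the first offer."""
    if not accept:
        return offered[0]
    prefs = parse_accept(accept)

    def score(media: str) -> float:
        major = media.split("/")[0]
        # Exact match beats "type/*", which beats "*/*".
        for key in (media, major + "/*", "*/*"):
            if key in prefs:
                return prefs[key]
        return 0.0

    best = max(offered, key=score)
    # A strict server might answer 406 here; this sketch just falls back.
    return best if score(best) > 0 else offered[0]
```

The server would also send `Vary: Accept` so that caches keep the two
representations at the same address apart.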
> 3. Keep the index pages to a reasonable size. Don't list 40000
> messages by default. The cover page should show the last 50 or so
> messages, plus a query form where folks can select articles...
I've had requests for the index to list only the last N messages
while preserving the older messages. It is on my TODO list.
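One way the "last N messages" index could work, sketched under the
assumption that messages are kept in date order and the older ones are
paged rather than dropped. `PAGE_SIZE` and `index_pages` are
illustrative names.

```python
from typing import List, Sequence

PAGE_SIZE = 50  # hypothetical; "the last 50 or so" from the request above

def index_pages(message_ids: Sequence[str], page_size: int = PAGE_SIZE) -> List[List[str]]:
    """Split a date-ordered list (oldest first) into index pages.

    Page 0 is the cover page with the newest `page_size` messages, newest
    first; later pages walk back through the archive, so nothing is lost.
    """
    newest_first = list(reversed(message_ids))
    return [newest_first[i:i + page_size]
            for i in range(0, len(newest_first), page_size)]
```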
As for search engines, those can be hooked in independently, which
some people have done with mhonarc. It would be a waste of my time,
and probably that of other developers of mail processors, to write
search engines when one can already use well-developed ones like
Lycos, Glimpse, etc.
> 4. Allow relational queries: by date, author, subject, message-id,
> keywords, or any combination. Essentially, treat the archive as a
> relational database table with fields message-id, from, date, subject,
> keywords, and body.
This is best done by using an existing database system (e.g. Oracle)
and using mhonarc (or another preferred mail->html filter) to convert
retrieved messages to html on-the-fly.
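As an illustration of that split, here is a sketch using Python's
built-in sqlite3 module in place of Oracle. The table has the fields Dan
lists (column names are assumptions; "from" becomes `sender` because it
is an SQL keyword). `render_html` is a hypothetical stand-in for handing
the retrieved message to mhonarc or another filter.

```python
import html
import sqlite3
from typing import List, Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    message_id TEXT PRIMARY KEY,
    sender     TEXT,   -- the From: header
    date       TEXT,   -- ISO 8601, so string order matches date order
    subject    TEXT,
    keywords   TEXT,
    body       TEXT
)
"""

def query(conn: sqlite3.Connection, sender: Optional[str] = None,
          subject_like: Optional[str] = None, since: Optional[str] = None) -> List[str]:
    """AND together whichever criteria are given; return matching Message-IDs."""
    clauses, params = [], []
    if sender is not None:
        clauses.append("sender = ?")
        params.append(sender)
    if subject_like is not None:
        clauses.append("subject LIKE ?")
        params.append(f"%{subject_like}%")
    if since is not None:
        clauses.append("date >= ?")
        params.append(since)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    rows = conn.execute(f"SELECT message_id FROM messages{where} ORDER BY date", params)
    return [r[0] for r in rows]

def render_html(conn: sqlite3.Connection, message_id: str) -> str:
    """Hypothetical stand-in for mhonarc: fetch one message, emit minimal HTML."""
    row = conn.execute("SELECT subject, body FROM messages WHERE message_id = ?",
                       (message_id,)).fetchone()
    if row is None:
        raise KeyError(message_id)
    subject, body = row
    return f"<h1>{html.escape(subject)}</h1>\n<pre>{html.escape(body)}</pre>"
```

The database does the searching; the HTML is produced only for the
messages a query actually returns.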
> Goals:
>
> 5. Generate HTML on the fly, not in batch. Cache the most recent pages
> of course (in memory?), but don't waste all that disk space. (support
> if-modified-since in the on-the-fly generator, by the way)
mhonarc can work in on-the-fly mode. It's up to the user to set
things up the way they want. That is, mhonarc provides the mail->html
conversion; it is up to the user to figure out how to use that in
his/her environment.
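To make the if-modified-since point concrete, a sketch of the check an
on-the-fly generator might do before rendering anything. It assumes the
generator knows, as a UTC datetime, when the underlying message or index
last changed; `conditional_get` and the tuple it returns are
hypothetical. The date helpers are from Python's standard `email.utils`.

```python
from datetime import datetime
from email.utils import format_datetime, parsedate_to_datetime
from typing import Callable, Dict, Optional, Tuple

def conditional_get(if_modified_since: Optional[str], last_modified: datetime,
                    render: Callable[[], str]) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body); skip rendering when the client copy is current.

    `last_modified` must be timezone-aware UTC (format_datetime requires it).
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: ignore it and send the page
        if since is not None and since.tzinfo is not None and last_modified <= since:
            return 304, headers, ""
    return 200, headers, render()
```

Passing `render` as a callable means the HTML conversion only runs when
the client really needs a fresh copy.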
> Update the index in real-time, as messages arrive, not in batch.
Use a .forward file to pipe each incoming message to the archiver as
it arrives.
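For the real-time case, a .forward entry such as
`"|/usr/local/bin/archive-one"` hands each delivered message to a
program on standard input. Below is a sketch of a hypothetical
`archive-one` in Python: it parses the message and appends its
Message-ID, Date and Subject to a tab-separated index file. The path and
file format are assumptions; a real setup might instead run mhonarc on
the message.

```python
import sys
from email import message_from_string
from email.message import Message
from typing import TextIO

INDEX_PATH = "/var/archive/mlist/index.tsv"  # hypothetical location

def index_line(msg: Message) -> str:
    """One tab-separated index record: Message-ID, Date, Subject."""
    fields = (msg.get("Message-ID", ""), msg.get("Date", ""), msg.get("Subject", ""))
    # Collapse folded headers and stray tabs so each record stays on one line.
    return "\t".join(" ".join(str(f).split()) for f in fields) + "\n"

def archive_one(stream: TextIO, index: TextIO) -> str:
    """Parse one message from `stream` and append its record to `index`."""
    line = index_line(message_from_string(stream.read()))
    index.write(line)
    return line

def main() -> None:
    """Entry point a .forward pipe would invoke (one message on stdin)."""
    with open(INDEX_PATH, "a", encoding="utf-8") as index:
        archive_one(sys.stdin, index)
```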
> 6. Allow batch query results. Offer to return the raw message/rfc822
...
> 7. Export a harvest gatherer interface, so that collections of mail
...
> 8. Allow annotations (using PICS ratings???) for "yeah, that
...
> 9. Make it a long-running process exporting an ILU interface, rather
...
> Major brownie points to anybody who builds something that supports at
> least 1 thru 4 and makes it available to the rest of us. I'd really
> like to use it for all the mailing lists around here.
Your requests are good, and others have made similar requests.
However, I see that many of the tasks can be done by a collection of
tools rather than a single tool. Trying to develop a single program to
do everything may be wasted effort, and it does not make the best use
of existing software that can do the job better (i.e. I'm lazy and do
not want to reinvent the wheel :-).
The approach I take with mhonarc is that it can be used for moderately
sized archives, but it can also be used in non-archive mode to provide
a message->html converter for a larger mail archiving system. I know
mhonarc cannot be the entire solution for some people's needs, but it
can be part of the solution. As long as mhonarc can be invoked just as
a message/rfc822->HTML converter, others can use that capability in
whatever WWW mail archiving system suits their needs.
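To show the shape of that "converter only" contract, here is a
deliberately tiny message/rfc822 -> HTML function in Python. It is not
mhonarc (no MIME decoding, no link conversion, no threading); it only
illustrates the interface: raw message text in, an HTML fragment out,
with no archive state, so any retrieval system can call it. The choice
of headers shown is an assumption.

```python
import html
from email import message_from_string

SHOWN_HEADERS = ("From", "Date", "Subject", "Message-ID")  # illustrative choice

def message_to_html(raw: str) -> str:
    """Convert one raw RFC 822 message to a self-contained HTML fragment."""
    msg = message_from_string(raw)
    rows = []
    for name in SHOWN_HEADERS:
        value = msg.get(name)
        if value is not None:
            rows.append(f"<li><b>{name}:</b> {html.escape(str(value))}</li>")
    # Single-part bodies only; a real filter would walk the MIME parts here.
    body = "(multipart body omitted)" if msg.is_multipart() else msg.get_payload()
    return "<ul>\n" + "\n".join(rows) + "\n</ul>\n<pre>" + html.escape(body) + "</pre>"
```

Because the function keeps no state, the same formatter can sit behind a
CGI script, a database front end, or a batch archiver.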
I'd like to remind people that many of the WWW tools/filters people use
are developed in various individuals' spare time. As one's problems
become more sophisticated, one should not hold his/her breath waiting
for a free, ready-made solution. Many times it will take the
integration of several programs to come up with the desired solution,
because free software developers cannot solve everyone's problems.
Dan's problem may best be solved by an intelligent integration of
several programs rather than a single program. The brownie points will
go to the person, or group, that can make a successful integration.
--ewh
P.S. I hope my message does not convey the attitude that I,
and other developers of free programs, do not want to hear
suggestions. I'm always open to suggestions, as are many others.
The problem is that we do not always have the time to act
on the requests we receive. Now, if I were paid ... :-)