suggested libWWW architecture

Dan Connolly
Wed, 13 Jan 93 19:11:15 CST

I sent this to tim a while ago, but I don't think
he's had time to look at it.

Meanwhile, libWWW is becoming reentrant, but I still
think the architecture is kinda clumsy: you have to
have a big data structure describing the DTD, and
a routine for each element, etc.

This doesn't mesh well with the MidasWWW architecture, which
can read the DTD from the X resource database at run time.

I have an idea for an architecture that the linemode and
MidasWWW could share (along with other new implementations).

It's not radically different from the current libWWW, but
there's a lot of grunt-work between the current libWWW
and what I've got here. But I think the end result would
be much more usable.

We start with the HText class. Instead of the various
style and append methods, we have four methods in a
virtual function table:

typedef struct{
  /* returns the content type of the element just opened */
  int (*start_tag) PARAMS((SGML_Object this, CONST char* gi,
                           CONST char** attributes, int nattrs));
  VOID (*end_tag) PARAMS((SGML_Object this, CONST char* gi));

  VOID (*entity) PARAMS((SGML_Object this, CONST char* name));

  VOID (*data) PARAMS((SGML_Object this, CONST char* data, int char_qty));
} SGML_DocClass;

The linemode would declare something like:

SGML_DocClass griddoc = {HText_start_tag, HText_end_tag,
HText_entity, HText_data};

The HText implementation is responsible for keeping track of
the stack of open elements, if it needs to.
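
To make that last point concrete, here is a rough sketch of a document
class that keeps its own stack of open elements. It uses ANSI prototypes
in place of the PARAMS macro, treats SGML_Object as a plain void*, and
the names TraceDoc, trace_*, tracedoc and MAX_DEPTH are invented for
illustration. Only the four-method shape comes from the proposal above.

```c
#include <stdio.h>
#include <string.h>

typedef void *SGML_Object;      /* assumption: an opaque pointer */

typedef struct {
    int  (*start_tag)(SGML_Object self, const char *gi,
                      const char **attributes, int nattrs);
    void (*end_tag)(SGML_Object self, const char *gi);
    void (*entity)(SGML_Object self, const char *name);
    void (*data)(SGML_Object self, const char *data, int char_qty);
} SGML_DocClass;

#define MAX_DEPTH 32            /* hypothetical nesting limit */

typedef struct {
    const char *open[MAX_DEPTH]; /* stack of open element names */
    int depth;
    int chars_seen;              /* characters of data received so far */
} TraceDoc;

static int trace_start_tag(SGML_Object self, const char *gi,
                           const char **attributes, int nattrs)
{
    TraceDoc *doc = (TraceDoc *)self;
    (void)attributes;
    (void)nattrs;
    /* The name pointer is stored, not copied: the caller must keep it alive. */
    if (doc->depth < MAX_DEPTH)
        doc->open[doc->depth++] = gi;
    return 0;                    /* 0 stands in for "mixed content" here */
}

static void trace_end_tag(SGML_Object self, const char *gi)
{
    TraceDoc *doc = (TraceDoc *)self;
    /* Pop only when the tag matches the innermost open element. */
    if (doc->depth > 0 && strcmp(doc->open[doc->depth - 1], gi) == 0)
        doc->depth--;
}

static void trace_entity(SGML_Object self, const char *name)
{
    (void)self;
    printf("&%s;", name);
}

static void trace_data(SGML_Object self, const char *data, int char_qty)
{
    TraceDoc *doc = (TraceDoc *)self;
    doc->chars_seen += char_qty;
    printf("%.*s", char_qty, data);
}

static SGML_DocClass tracedoc = {
    trace_start_tag, trace_end_tag, trace_entity, trace_data
};
```

A parser only ever sees the tracedoc table and an opaque pointer to a
TraceDoc, so a grid-based or widget-based implementation could be swapped
in without touching the parsing code.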

On top of these we build some format parsing routines:

SGML_parse(void* dest, void* closure, void* stream, int (getc)(void*));
/* pseudocode:
  int read, content = MIXED;
  char buffer[1000];
  SGML_DocClass *docclass = (SGML_DocClass*)closure;

  while( (read = SGML_read(buffer, content, stream, getc)) != EOF){
    switch(read){
    case SGML_start_tag:
      ... parse name, attributes ...
      content = (docclass->start_tag)(dest, name, attrs, nattrs);
      if(content == EMPTY){
        content = MIXED;   @@ could be ELEMENT
      }
      break;

    case SGML_end_tag:
      ... parse name ...
      (docclass->end_tag)(dest, name);
      content = MIXED;   @@ could be ELEMENT
      break;

    case SGML_entity:
      (docclass->entity)(dest, name);
      break;

    default:   character data
      (docclass->data)(dest, buffer, ... number of chars read ...);
    }
  }
*/

PlainText_parse(HText* dest, void* docclass, void* stream, int (getc)(void*));
/* pseudocode:
  (docclass->start_tag)(dest, "HTML");
  (docclass->start_tag)(dest, "BODY");
  (docclass->start_tag)(dest, "PRE");
  keep a local buffer of about 1000 chars.
  Call (getc)(stream) until EOF.
  Call (docclass->data)(dest, buffer) whenever buffer is full,
  and once more at EOF for whatever is left.
  (docclass->end_tag)(dest, "PRE");
  (docclass->end_tag)(dest, "BODY");
  (docclass->end_tag)(dest, "HTML");
*/
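
The plain-text case is small enough to write out in full. What follows
is only a sketch under some assumptions: DocClass is a cut-down version
of the method table, Recorder is a made-up destination that logs what
it is told, StringStream and string_getc stand in for a real stream,
and the buffer is 8 bytes rather than 1000 so the flush path actually
gets exercised.

```c
#include <stdio.h>
#include <string.h>

/* Cut-down doc class: just the three methods this parser needs. */
typedef struct {
    int  (*start_tag)(void *self, const char *gi,
                      const char **attributes, int nattrs);
    void (*end_tag)(void *self, const char *gi);
    void (*data)(void *self, const char *data, int char_qty);
} DocClass;

/* Hypothetical destination that records the events it receives. */
typedef struct {
    char tags[128];   /* e.g. "<HTML><BODY>..." */
    char text[128];   /* concatenated character data */
} Recorder;

/* Append n bytes of src to the string in dst, truncating if dst is full. */
static void append(char *dst, size_t cap, const char *src, size_t n)
{
    size_t used = strlen(dst);
    if (n > cap - 1 - used)
        n = cap - 1 - used;
    memcpy(dst + used, src, n);
    dst[used + n] = '\0';
}

static int rec_start(void *self, const char *gi,
                     const char **attributes, int nattrs)
{
    Recorder *r = (Recorder *)self;
    (void)attributes;
    (void)nattrs;
    append(r->tags, sizeof r->tags, "<", 1);
    append(r->tags, sizeof r->tags, gi, strlen(gi));
    append(r->tags, sizeof r->tags, ">", 1);
    return 0;
}

static void rec_end(void *self, const char *gi)
{
    Recorder *r = (Recorder *)self;
    append(r->tags, sizeof r->tags, "</", 2);
    append(r->tags, sizeof r->tags, gi, strlen(gi));
    append(r->tags, sizeof r->tags, ">", 1);
}

static void rec_data(void *self, const char *data, int char_qty)
{
    Recorder *r = (Recorder *)self;
    append(r->text, sizeof r->text, data, (size_t)char_qty);
}

static const DocClass recorder_class = { rec_start, rec_end, rec_data };

/* A stream that reads from a NUL-terminated string. */
typedef struct { const char *next; } StringStream;

static int string_getc(void *stream)
{
    StringStream *s = (StringStream *)stream;
    return *s->next ? (unsigned char)*s->next++ : EOF;
}

#define CHUNK 8   /* tiny on purpose; the proposal suggests about 1000 */

static void PlainText_parse_sketch(void *dest, const DocClass *dc,
                                   void *stream, int (*get)(void *))
{
    char buffer[CHUNK];
    int used = 0, c;

    dc->start_tag(dest, "HTML", NULL, 0);
    dc->start_tag(dest, "BODY", NULL, 0);
    dc->start_tag(dest, "PRE", NULL, 0);
    while ((c = get(stream)) != EOF) {
        buffer[used++] = (char)c;
        if (used == CHUNK) {          /* buffer full: hand it over */
            dc->data(dest, buffer, used);
            used = 0;
        }
    }
    if (used > 0)                     /* flush the remainder at EOF */
        dc->data(dest, buffer, used);
    dc->end_tag(dest, "PRE");
    dc->end_tag(dest, "BODY");
    dc->end_tag(dest, "HTML");
}
```

Because the parser only sees a getc-style callback, the same routine
would work unchanged over a FILE*, a socket, or a string as above.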

GopherListing_parse(HText* dest, void* docclass, void* stream, int (getc)(void*));
/* pseudocode:
  (docclass->start_tag)(dest, "HTML");
  (docclass->start_tag)(dest, "BODY");
  (docclass->start_tag)(dest, "MENU");
  while(Gopher_parse_line(stream, getc, type, name, host, port, path)){
    char addr[BIG];
    sprintf(addr, "gopher://%s:%d/%c%s", host, port, type, path);
    (docclass->start_tag)(dest, "A",
                          "HREF", addr,
                          NULL);
    (docclass->data)(dest, name);
    (docclass->end_tag)(dest, "A");
  }
  (docclass->end_tag)(dest, "MENU");
  (docclass->end_tag)(dest, "BODY");
  (docclass->end_tag)(dest, "HTML");
*/
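
The per-line step of the Gopher listing can be sketched as below. This
assumes the usual menu line layout (type character, display name, then
tab-separated selector, host and port) and works on a line already read
into memory rather than on a stream. GopherItem, take_field,
gopher_parse_line and gopher_item_url are illustrative names, not
existing libWWW routines.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char type;          /* gopher item type, e.g. '0' or '1' */
    char name[128];     /* display string */
    char path[128];     /* selector */
    char host[128];
    int  port;
} GopherItem;

/* Copy the field at *pp (up to a tab or end of line) and step past it. */
static int take_field(const char **pp, char *out, size_t cap)
{
    const char *p = *pp;
    size_t n = strcspn(p, "\t\r\n");
    if (n >= cap)
        return 0;                    /* field too long for the buffer */
    memcpy(out, p, n);
    out[n] = '\0';
    p += n;
    if (*p == '\t')
        p++;
    *pp = p;
    return 1;
}

/* Returns 1 and fills *item for a menu line, 0 for anything else. */
static int gopher_parse_line(const char *line, GopherItem *item)
{
    char portbuf[16];
    const char *p = line;

    if (*p == '\0' || *p == '\r' || *p == '\n')
        return 0;                    /* empty line */
    if (p[0] == '.' && (p[1] == '\0' || p[1] == '\r' || p[1] == '\n'))
        return 0;                    /* lone "." ends the menu */

    item->type = *p++;
    if (!take_field(&p, item->name, sizeof item->name)) return 0;
    if (!take_field(&p, item->path, sizeof item->path)) return 0;
    if (!take_field(&p, item->host, sizeof item->host)) return 0;
    if (!take_field(&p, portbuf, sizeof portbuf)) return 0;
    item->port = atoi(portbuf);
    return item->host[0] != '\0' && item->port > 0;
}

/* Build the HREF in the same shape as the sprintf in the pseudocode. */
static int gopher_item_url(const GopherItem *item, char *out, size_t cap)
{
    int n = snprintf(out, cap, "gopher://%s:%d/%c%s",
                     item->host, item->port, item->type, item->path);
    return n >= 0 && (size_t)n < cap;
}
```

Using snprintf with an explicit capacity avoids having to guess a safe
value for BIG.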

We register each of these with the following routine:

ContentType_register(CONST char* type, CONST char* subtype,
HTParseProc parse, void* closure);

For example:

ContentType_register("TEXT", "X-HTML", HTML_parse, &griddoc);
ContentType_register("TEXT", "PLAIN", PlainText_parse, &griddoc);
ContentType_register("APPLICATION", "X-GOPHER",
                     GopherListing_parse, &griddoc);

The following routine can be used for any MIME entity. It will dispatch
the appropriate parsing routine based on the content type header:

ContentType_parse(const char* ct, HText* dest, void* stream, int (getc)(void*));
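
One possible shape for the registry behind these two routines is a small
table searched by type and subtype. This is a sketch only: the fixed-size
table, the case-insensitive match, and the decision to ignore anything
after a ';' in the header are my assumptions, and the _sketch names and
record_closure demo parser are invented for the example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef int  (*GetcProc)(void *stream);
typedef void (*ParseProc)(void *dest, void *closure,
                          void *stream, GetcProc getc_fn);

typedef struct {
    const char *type, *subtype;
    ParseProc parse;
    void *closure;
} FormatEntry;

#define MAX_FORMATS 16               /* hypothetical table size */
static FormatEntry formats[MAX_FORMATS];
static int nformats = 0;

/* Does s[0..n) equal the whole of token, ignoring case? */
static int token_equals(const char *s, size_t n, const char *token)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (token[i] == '\0' ||
            toupper((unsigned char)s[i]) != toupper((unsigned char)token[i]))
            return 0;
    }
    return token[n] == '\0';
}

static int ContentType_register_sketch(const char *type, const char *subtype,
                                       ParseProc parse, void *closure)
{
    if (nformats == MAX_FORMATS)
        return 0;
    formats[nformats].type = type;
    formats[nformats].subtype = subtype;
    formats[nformats].parse = parse;
    formats[nformats].closure = closure;
    nformats++;
    return 1;
}

/* Dispatch on "type/subtype"; returns 0 if no parser is registered. */
static int ContentType_parse_sketch(const char *ct, void *dest,
                                    void *stream, GetcProc getc_fn)
{
    const char *slash = strchr(ct, '/');
    size_t tlen, slen;
    int i;

    if (slash == NULL)
        return 0;
    tlen = (size_t)(slash - ct);
    slen = strcspn(slash + 1, "; \t");   /* stop before any parameters */
    for (i = 0; i < nformats; i++) {
        if (token_equals(ct, tlen, formats[i].type) &&
            token_equals(slash + 1, slen, formats[i].subtype)) {
            formats[i].parse(dest, formats[i].closure, stream, getc_fn);
            return 1;
        }
    }
    return 0;
}

/* Demo parser: ignores the stream and records which closure it was given. */
static void record_closure(void *dest, void *closure,
                           void *stream, GetcProc getc_fn)
{
    (void)stream;
    (void)getc_fn;
    *(void **)dest = closure;
}
```

The closure pointer is what lets the same parse routine be registered
against different document classes.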

Then we build some load routines, one per access scheme.
(Note that this design separates the format from the access scheme,
which allows us to, for example, load a gopher menu
from a local file, or load HTML text from a Gopher server.)

/* I don't have error handling worked out yet. We need to have a coherent
design for this. It's a mess in the current WWWlib. */

/* I think the WWW file: should be split into ftp: and local-file:.
It's cleaner to implement; there are precedents in the MidasWWW local:
scheme and the MIME ftp and local-file access-types. */

int LocalFile_load(HText* dest, CONST char* path, CONST char* search)
{
  FILE* stream;

  if((stream = fopen(path, "r")) != NULL){
    const char* content_type = WWW_zen_content_type_from_extension(path);
    ContentType_parse(content_type, dest, (void*)stream, (int (*)(void*))getc);
    fclose(stream);
    return 1;
  }else{
    /* log an error */
    return 0;
  }
}
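
The extension lookup that LocalFile_load relies on could be as simple as
a table scan. The sketch below is a guess at what such a routine might
do: the mapping table, the lowercase content type strings, and the
fallback to plain text are all assumptions, and
content_type_from_extension is a stand-in name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical extension table; a real one would be larger. */
static const struct {
    const char *ext;
    const char *content_type;
} ext_map[] = {
    { ".html", "text/x-html" },
    { ".txt",  "text/plain"  },
};

static const char *content_type_from_extension(const char *path)
{
    /* Look for the dot in the last path component only. */
    const char *slash = strrchr(path, '/');
    const char *base = slash ? slash + 1 : path;
    const char *dot = strrchr(base, '.');
    size_t i;

    if (dot != NULL) {
        for (i = 0; i < sizeof ext_map / sizeof ext_map[0]; i++) {
            if (strcmp(dot, ext_map[i].ext) == 0)
                return ext_map[i].content_type;
        }
    }
    return "text/plain";   /* assumed default for unknown extensions */
}
```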

FTP_load(HText* dest, CONST char* path, CONST char* search);

HTTP_load(HText* dest, CONST char* path, CONST char* search);

int Gopher_load(HText* dest, CONST char* path, CONST char* search)
{
  const char* content_type = Gopher_zen_content_type_from_gtype_char(*path);
  char* host = HTParse(path, PARSE_HOST);
  char* portnum = HTParse(path, PARSE_PORT);
  int port = atoi(portnum);
  static char* tab = "\011";       /* ASCII HT */
  static char* crlf = "\015\012";

  void* stream = TCPOpen(host, port);

  if(stream){
    /* request line: selector TAB search-terms CRLF */
    TCPwrite(stream, path, strlen(path));
    TCPwrite(stream, tab, 1);
    TCPwrite(stream, search, strlen(search));
    TCPwrite(stream, crlf, 2);
    ContentType_parse(content_type, dest, stream, TCPgetc);
    return 1;
  }else{
    /* log an error */
    return 0;
  }
}

Then we register these just like formats:

HTAccess_register(const char* name, HTLoadProc load, void* closure);

And the HTLoadDocument routine in HTAccess.c becomes this:

HTAccess_load(HTParentAnchor* p, CONST char* address)
{
  char* scheme = HTParse(address, PARSE_SCHEME);
  /* path is everything after the colon, except the anchor */
  char* path = HTParse(address, PARSE_HOST|PARSE_PORT|PARSE_PATH);
  char* anchor = HTParse(address, PARSE_ANCHOR);
  char* search = HTParse(address, PARSE_SEARCH_TERMS);
  HText* dest = HText_new(p); /* check for doc already loaded in p @@ */
  void* closure;
  HTLoadProc load;

  if(load = /* load routine registered for scheme. find closure too */){
    (load)(dest, path, search, closure);
    HTSelect(dest, anchor);
  }
}
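
The scheme lookup left as a comment above might look something like
this. It is a sketch under several assumptions: the address is split by
hand rather than with HTParse, the registry is a fixed-size array,
scheme names are matched exactly, and the anchor is split off but not
used. All names ending in _sketch, plus AddressParts, LoadLog and
demo_load, are invented for the example.

```c
#include <stdio.h>
#include <string.h>

typedef int (*LoadProc)(void *dest, const char *path,
                        const char *search, void *closure);

typedef struct {
    const char *scheme;
    LoadProc load;
    void *closure;
} AccessEntry;

#define MAX_SCHEMES 8                /* hypothetical table size */
static AccessEntry schemes[MAX_SCHEMES];
static int nschemes = 0;

static int Access_register_sketch(const char *scheme, LoadProc load,
                                  void *closure)
{
    if (nschemes == MAX_SCHEMES)
        return 0;
    schemes[nschemes].scheme = scheme;
    schemes[nschemes].load = load;
    schemes[nschemes].closure = closure;
    nschemes++;
    return 1;
}

typedef struct { char *scheme, *path, *search, *anchor; } AddressParts;

/* Split "scheme:path?search#anchor" in place; absent parts become "". */
static int split_address(char *address, AddressParts *out)
{
    static char empty[] = "";
    char *colon = strchr(address, ':');
    char *mark;

    if (colon == NULL)
        return 0;
    *colon = '\0';
    out->scheme = address;
    out->path = colon + 1;
    out->anchor = empty;
    out->search = empty;
    if ((mark = strchr(out->path, '#')) != NULL) {
        *mark = '\0';
        out->anchor = mark + 1;
    }
    if ((mark = strchr(out->path, '?')) != NULL) {
        *mark = '\0';
        out->search = mark + 1;
    }
    return 1;
}

/* Returns the load routine's result, or 0 if no scheme matches. */
static int Access_load_sketch(void *dest, const char *address)
{
    char buf[256];
    AddressParts parts;
    int i;

    if (strlen(address) >= sizeof buf)
        return 0;
    strcpy(buf, address);
    if (!split_address(buf, &parts))
        return 0;
    for (i = 0; i < nschemes; i++) {
        if (strcmp(schemes[i].scheme, parts.scheme) == 0)
            /* parts.anchor is where HTSelect would come in afterwards */
            return schemes[i].load(dest, parts.path, parts.search,
                                   schemes[i].closure);
    }
    return 0;
}

/* Demo load routine: records what it was asked to fetch. */
typedef struct { char path[128]; char search[64]; } LoadLog;

static int demo_load(void *dest, const char *path,
                     const char *search, void *closure)
{
    LoadLog *log = (LoadLog *)dest;
    (void)closure;
    snprintf(log->path, sizeof log->path, "%s", path);
    snprintf(log->search, sizeof log->search, "%s", search);
    return 1;
}
```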

What do you think?