Re: HTML parser in Yacc form???

uid#15033@dxal18.cern.ch
Wed, 22 Mar 1995 13:09:44 +0500


In article <3k4hss$l06@stratus.CAM.ORG> you write:

|> Hi all,
|>
|> I was wondering if there exists a specification of HTML in yacc
|> (or BNF) form. It has probably been done, as constructing such a parser is
|> much easier this way than with a traditional C subroutine.

Don't even think about it. HTML does not have an LR(1) grammar, so trying to
use yacc is only going to cause pain. The best way of parsing SGML is with a
top-down recursive descent parser. Try to use yacc and you will end up in all
sorts of trouble, especially with error reporting.
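
To make the recursive descent suggestion concrete, here is a minimal sketch in
Python. It is an illustration only, not a real HTML or SGML parser: it assumes
a toy subset with plain <name> and </name> tags and text, and it ignores
attributes, comments, entities and SGML tag omission. All the names in it
(tokenize, parse_content, parse_element, parse) are made up for the example.
The point is that each grammar rule becomes one function, and each function
knows exactly what it was expecting when something goes wrong.

```python
import re

# Hypothetical token pattern for the toy subset: <name>, </name>, or a run of text.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>|([^<]+)")

def tokenize(src):
    """Split src into (kind, value, offset) tuples."""
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        if m.group(3) is not None:
            tokens.append(("text", m.group(3), pos))
        elif m.group(1):
            tokens.append(("close", m.group(2).lower(), pos))
        else:
            tokens.append(("open", m.group(2).lower(), pos))
        pos = m.end()
    return tokens

def parse_content(tokens, i):
    """content := (text | element)*   (stops at a close tag or end of input)"""
    nodes = []
    while i < len(tokens) and tokens[i][0] != "close":
        kind, value, _ = tokens[i]
        if kind == "text":
            nodes.append(value)
            i += 1
        else:
            node, i = parse_element(tokens, i)
            nodes.append(node)
    return nodes, i

def parse_element(tokens, i):
    """element := open content close   (close name must match open name)"""
    _, name, start = tokens[i]
    children, i = parse_content(tokens, i + 1)
    if i >= len(tokens):
        raise SyntaxError(f"<{name}> opened at offset {start} is never closed")
    _, close_name, pos = tokens[i]
    if close_name != name:
        raise SyntaxError(
            f"expected </{name}> at offset {pos}, found </{close_name}>")
    return (name, children), i + 1

def parse(src):
    """Parse a whole document into a list of text strings and (name, children) tuples."""
    tokens = tokenize(src)
    nodes, i = parse_content(tokens, 0)
    if i < len(tokens):
        raise SyntaxError(f"stray </{tokens[i][1]}> at offset {tokens[i][2]}")
    return nodes
```

With this sketch, parse("<p>Hi <b>there</b></p>") yields a nested structure
with one p element containing the text and a b element, while
parse("<p><b>x</p>") raises a SyntaxError saying that </b> was expected and
where. The parser always knows which element it is inside, so that kind of
message comes for free.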

One of the problems with comp sci courses is that lecturers often make
silly statements, such as claiming that bottom-up parsing is somehow better
than top-down. This is not the case. Bottom-up parsers can be made slightly
faster, but at a disproportionate cost in complexity. My view is that a
language requiring a yacc parser is probably too complex in any case. Nobody
uses an LR(1) parser to parse LISP.
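
As a rough illustration of the LISP point, a reader for S-expressions fits in
a couple of dozen lines of top-down code. This is a sketch under simplifying
assumptions: atoms are kept as plain strings, and there is no handling of
quote syntax, string literals or comments. The names lisp_tokens and
read_sexpr are hypothetical.

```python
def lisp_tokens(text):
    """Split text into parentheses and whitespace-separated atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def read_sexpr(tokens, i=0):
    """sexpr := atom | '(' sexpr* ')'   Returns (value, index of next token)."""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[i]
    if tok == ")":
        raise SyntaxError(f"unexpected ')' at token {i}")
    if tok != "(":
        return tok, i + 1          # an atom
    items = []
    i += 1                         # skip the '('
    while i < len(tokens) and tokens[i] != ")":
        item, i = read_sexpr(tokens, i)
        items.append(item)
    if i >= len(tokens):
        raise SyntaxError("missing ')'")
    return items, i + 1            # skip the ')'
```

Calling read_sexpr(lisp_tokens("(define (sq x) (* x x))")) returns the nested
list for the expression together with the index of the next unread token.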

--
Phillip M. Hallam-Baker

Not Speaking for anyone else.