Re: quotes around tags and escape sequences

Dan Connolly (connolly@pixel.convex.com)
Mon, 30 Nov 92 21:42:47 CST


>Three questions,
>
> 1) If we now expect quotes around tags, are we still meant to understand % as
> an escape character within tags?

In short, I think so.

These dang things get parsed twice: once by the SGML parser, and once
by the URL parser.

After the HREF=, the SGML parser is looking for an attribute value,
which may be a token or a literal. The syntax of a URL conflicts with
the syntax of a token, so you've got to use a literal, i.e. you've
got to put quotes around it.
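
A rough sketch of why a URL can't be written as a bare token, in Python.
The names NAME_CHARS and needs_quotes are made up for illustration, and the
character set assumes the SGML reference concrete syntax (letters, digits,
"." and "-"); a real parser takes it from the SGML declaration.

```python
import string

# Assumption: name characters of the SGML reference concrete syntax.
# A real SGML parser reads these from the SGML declaration instead.
NAME_CHARS = set(string.ascii_letters + string.digits + ".-")


def needs_quotes(value):
    """Return True if value cannot be written as an unquoted token."""
    return value == "" or any(ch not in NAME_CHARS for ch in value)


# The ':', '/' and '%' characters in a URL are not name characters,
# so the URL has to be written as a quoted literal.
print(needs_quotes("gopher://gopher.micro.umn.edu:70/00/Some%20Stuff"))
print(needs_quotes("intro"))
```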

To compute the value of the HREF attribute, the SGML parser grabs
everything between ""s (or ''s, actually). In fact, it expands
&entity; references too!

Then you hand the value of the HREF attribute to the URL parser.
It better be a legal URL at this point. I don't know if the URL
parsing code can handle spaces in a URL or not. If not, they've
got to be represented by the %nn construct.
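
Something like the following shows the two passes, as a sketch only. The
function names are hypothetical, the regular expression is a stand-in for
a real SGML parser, and Python's html.unescape and urllib.parse.unquote
stand in for entity expansion and %nn decoding respectively.

```python
import html
import re
from urllib.parse import unquote


def sgml_attribute_value(tag, name):
    """Pass 1: take a quoted attribute literal out of a start tag
    and expand entity and character references in it."""
    pattern = re.compile(
        r"\b" + re.escape(name) + r"""\s*=\s*("([^"]*)"|'([^']*)')""",
        re.IGNORECASE,
    )
    match = pattern.search(tag)
    if match is None:
        return None  # attribute missing or not written as a literal
    literal = match.group(2) if match.group(2) is not None else match.group(3)
    return html.unescape(literal)


def url_unescape(url):
    """Pass 2: decode the %nn escapes, which the SGML pass leaves alone."""
    return unquote(url)


tag = '<A HREF="gopher://gopher.micro.umn.edu:70/00/Some%20Stuff">'
value = sgml_attribute_value(tag, "HREF")
print(value)                # %20 is still there after the SGML pass
print(url_unescape(value))  # the space appears only after the URL pass
```

The point of keeping the passes separate is that each quoting mechanism is
undone by the layer that defines it.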

NOTE: There's an SGML construct: &#SPACE; or &#123; designed for the same
purpose. We might want to remove the quoting mechanism from the
URL spec, and say that you use whatever quoting mechanisms the
enclosing data format requires.

> 2) Which of the following do I need to support, and which is the "approved"
> method of accessing gopher?
>
> href="gopher://gopher.micro.umn.edu:70/00/Some Stuff"

This is legal SGML -- dunno if it's a legal URL.

> href="gopher://gopher.micro.umn.edu:70/00/Some%20Stuff"

This is probably your best bet for the current linemode code.

> href=gopher://gopher.micro.umn.edu:70/00/Some%20Stuff

SGML parsers won't grok this.

For starters, you've got kind of a bad design for handling SGML
attributes. You parse them twice: once to stick them in the param
resource, and once to take them out of the param resource and stick
them in the href and name resources.

Rather than a param resource, the parsing code should build an XtArglist
with the attribute names and values. Then it can just call XtSetValues
when it's done parsing the start tag. This would be a minor modification
to my current version of the MidasWWW code using my HTML parsing library.
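
The shape of that idea, sketched in Python rather than against the Xt API.
Everything here is hypothetical: parse_start_tag, the Anchor class and its
set_values method are illustrative stand-ins, not MidasWWW or Xt code.
The only point is that the attributes get parsed once into a list of
(name, value) pairs, and the list is applied in a single call.

```python
import re

# Matches name="value" or name='value' inside a start tag.
ATTR = re.compile(r"""([A-Za-z][A-Za-z0-9.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def parse_start_tag(tag):
    """Parse the attributes of a start tag once, into (name, value) pairs."""
    args = []
    for m in ATTR.finditer(tag):
        value = m.group(2) if m.group(2) is not None else m.group(3)
        args.append((m.group(1).lower(), value))
    return args


class Anchor:
    """Hypothetical stand-in for the widget that owns href and name."""

    def __init__(self):
        self.href = None
        self.name = None

    def set_values(self, args):
        # Apply every parsed attribute in one call; nothing is re-parsed.
        for name, value in args:
            setattr(self, name, value)


anchor = Anchor()
anchor.set_values(parse_start_tag('<A NAME="top" HREF="intro.html">'))
print(anchor.name, anchor.href)
```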

> 3) Is the % meant to act as an escape character in search strings? ie
>
> href="http://slacvm.slac.stanford.edu/FIND/PARTICLE?PI%nn"
>
> meant to find entries for PI+ ? (where nn is the ascii code for +).

Yeah... I've got a bunch of questions like this one. My understanding
is that everything after the scheme: is defined by the individual scheme.
It's not safe to just replace %nn by the corresponding ASCII character
in all URLs. The %nn quoting mechanism is specific to the gopher scheme.
(It might be used by other schemes too, but it's not a universal mechanism.)
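
A sketch of what "defined by the individual scheme" could look like in
code. The PERCENT_SCHEMES table and decode_scheme_part are hypothetical,
and which schemes belong in the table is exactly the open question; only
gopher is listed because that's the one claimed above. The example uses
%2B, the hex ASCII code for "+".

```python
from urllib.parse import unquote

# Assumption: schemes whose scheme-specific part uses %nn escapes.
# Only gopher is listed here, following the claim in the text above.
PERCENT_SCHEMES = {"gopher"}


def decode_scheme_part(url):
    """Split off the scheme and decode %nn only where the scheme says so."""
    scheme, sep, rest = url.partition(":")
    if not sep:
        return None, url  # no scheme, leave the string alone
    scheme = scheme.lower()
    if scheme in PERCENT_SCHEMES:
        rest = unquote(rest)
    return scheme, rest


print(decode_scheme_part("gopher://gopher.micro.umn.edu:70/00/Some%20Stuff"))
print(decode_scheme_part("http://slacvm.slac.stanford.edu/FIND/PARTICLE?PI%2B"))
```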

I've got some design ideas for the WWW library that I think would obviate
the need for implementors like Tony to even mess with this stuff.

Details as they develop...

Tony: I'll send you my HTML parsing work separately.

Dan