Re: Web and Mail integration: a few key connections.

David Durand (dgd@cs.bu.edu)
Mon, 8 Nov 93 14:03:55 -0500


This is in reply to Ned Freed's comments on UR* and ISO FPIs. I've
been involved w/ SGML one way or another for a long time now, so I can
at least clear up the factual questions, I think. The Formal Public
Identifier (FPI) addresses the "URN" problem of naming content
(leaving the identity/data format issues to be resolved by the
publisher, not by the naming format).

Ned wrote in response to (timbl?):
>> SGML documents refer to external entities
>> as either "PUBLIC", in which case a special "Formal
>> Public Identifier" (FPI) space is used and everyone
>> is supposed to know what's in it, or "SYSTEM" in which
>> case the significance is purely local.

>My understanding is that there's one level of indirection implicit in SGML
>to begin with. Specifically, the actual documents reference things by
>names which are then bound to actual objects by the DTD.

Documents reference things by the use of "entities", which are defined
in the DTD to resolve either to FPIs, which are presumed known, or to
system strings. My quibble here is that both the system string -> object
mapping and the FPI -> object mapping live in the SGML processor and
are external to both the document and the DTD. Any method of resolution
satisfactory to the users is OK.
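
To make that concrete, here is a minimal sketch in Python of a resolver
whose tables sit in the processor rather than in the document or the DTD.
Everything in it is an assumption for illustration: the table names, the
sample identifiers and paths, and the choice to try the public identifier
before the system string. SGML itself does not prescribe any of this.

```python
from typing import Optional

# Hypothetical processor-side tables. Neither the document nor the DTD
# contains these mappings; they belong to the local installation.
PUBLIC_MAP = {
    "-//Example Org//DTD Memo//EN": "/usr/local/sgml/memo.dtd",
}
SYSTEM_MAP = {
    "memo.dtd": "/home/user/dtds/memo.dtd",
}


def resolve(public_id: Optional[str], system_id: Optional[str]) -> Optional[str]:
    """Map an external entity's identifiers to a local object name.

    Assumption: the public identifier is tried first and the system
    string is only a fallback. Returns None when nothing is known.
    """
    if public_id is not None and public_id in PUBLIC_MAP:
        return PUBLIC_MAP[public_id]
    if system_id is not None:
        return SYSTEM_MAP.get(system_id)
    return None
```

A site could swap these dictionaries for a database or a network lookup
without touching a single document, which is the sense in which any
resolution method the users accept is OK.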

Then we have this proposal:
>> 1. In an Internet context, the SGML "SYSTEM" identifiers
>> should be conventionally URIs. As there are URIs
>> which refer to a local file system, this does not
>> rule out referring to local files too.

>This sounds like a good idea to me.

Actually, I think it's a bad idea. Since URIs are standard, they should
be formal identifiers: SYSTEM strings are _defined_ to be the place for
non-portable names in SGML. Assuming a particular format for SYSTEM
identifiers is guaranteed not to work for existing documents, and
allows conforming SGML software to ignore SYSTEM identifiers at
will. Names which have a standard resolution method should use formal
identifiers.

This seems related to a following point:

>> 2. The FPI space should be registered as a URN.

>As I understand it, the FPI space has a lot in common with several other things
>in OSI. Specifically, while it does provide a convenient space for public
>usage, the lack of any authoritative registration process makes it very
>difficult for things to really interoperate. (Many other aspects of OSI have
>similar problems, such as BP15 OID usage, FTAM file formats, and so on.)
>
>Given that this is a fair representation of the current situation, I think
>having the Internet provide such a process would be a wonderful thing. In
>addition, if that process can be piggybacked on top of some existing or
>soon-to-exist scheme like URNs, so much the better.

The FPI space should be registered as a URN, and this can be done
pretty easily. Going the other direction is a bit trickier. The
objection Ned raises is based on a common misconception, since the ISO
registration authority does not yet exist. However, the ISBN is an
accepted sub-domain for registration. So all one needs to do to create
a public FPI is have an ISBN publisher number and then make up a
subdomain. SRI or the government, for instance, could thus create a
proper inclusion of URN into FPI space tomorrow. However, the worst
wart on ISO 9070 (the FPI spec) is a length limitation to roughly 120
characters, so until this changes, many URNs are unlikely to fit into
the available space.
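
As a rough illustration of the length problem, the Python sketch below
wraps a name in an ISBN-owned FPI and checks it against the limit. The
layout used ("+//ISBN <publisher number>//<class> <description>//<language>"),
the publisher number, and the function names are assumptions for
illustration, not text taken from the standards; the limit of 120 is the
approximate figure quoted above. No attempt is made to escape characters
that an FPI does not allow.

```python
# Approximate limit quoted in the message above; the exact figure is unverified.
FPI_MAX_LEN = 120


def name_to_fpi(isbn_publisher: str, name: str,
                text_class: str = "TEXT", language: str = "EN") -> str:
    """Wrap a name in an FPI owned by an ISBN publisher number.

    Assumed layout: registered owner, then class and description,
    then language, separated by "//".
    """
    return f"+//ISBN {isbn_publisher}//{text_class} {name}//{language}"


def fits(fpi: str, limit: int = FPI_MAX_LEN) -> bool:
    """True if the identifier is within the assumed length limit."""
    return len(fpi) <= limit


short = name_to_fpi("0-000-00000", "some-document-name")
long_ = name_to_fpi("0-000-00000", "x" * 200)
print(short, fits(short))
print(len(long_), fits(long_))
```

Note that the fixed wrapper already uses up part of the budget before
the name itself is counted, so the room left for the wrapped name is
smaller than the limit suggests.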

I don't have ISO 8879 (SGML) or ISO 9070 (FPI) in my hands at the
moment so I can't tell you what my view of an ideal solution would be,
but this kind of inter-standard coordination is pretty important. SGML
is the only contender for a non-presentational document language, and
would thus be important even without its increasing commercial
use.

-- David Durand
Boston University Computer Science