INS, DEL and Collaborative Document Design

Olle Jarnefors (ojarnef@admin.kth.se)
Tue, 12 Dec 95 20:16:36 +0100


Ian Graham <igraham@hprc.utoronto.ca> wrote in message
<9512050146.AA00534@www10.w3.org>:

> I am planning a project to look at collaborative HTML document
> development via the Web. I am thinking of using the proposed
> HTML 3 elements INS and DEL to delineate the changes associated
> with different versions or a document, with appropriate
> attributes to reflect authorship, version numbers and so on.

> Has anyone looked at integrating this type of functionality
> into HTML or other markup languages?

I haven't done that, but I have experimented with different ways
of augmenting _plain text_ for similar purposes. The details
are unimportant, but the _functionality_ I found useful to
support may be of interest and could be provided by means of
HTML or SGML.

SCENARIO: A long, complicated document is developed over a long
period of time jointly by several persons, working in different
places, without daily contact. One of them has the function of
document _editor_, the others are _contributors_. (The word
"editor" is here used for a human, not for editing programs.)

MODEL: The document evolves through a number of different
versions. Each version is composed by the editor alone. These
steps are followed:

1) The process starts with a _strawman document_. This may be
merely a plan for what should be included in the final
version of the document.

2) If the editor finds that the current document version is good
enough, he/she reformats it into a finished document.

3) This version of the document is distributed to the
contributors (together with any comment compilation,
see step 6).

4) If the document is finished: Stop.

5) During a subsequent _comment period_ the contributors are
requested to submit _contributions_, which can be comments on
this version, proposals for changes, and sometimes new text
for a part that has been delegated to a specific person.

6) The editor combines all contributions with the text of the
document to form a special working document, the _comment_
_compilation_, where all comments can be seen in their right
context and competing comments can be compared. (This step
may be shortcut in some cases.)

7) The editor then creates a _new version_ of the document being
worked out, where new text, as well as places where text has
been removed, is marked. This makes it possible for the
contributors to easily see what has been changed since the
last version of the document. For particularly important
changes, the editor may include comments explaining the
choices he/she has made.

8) Goto step 2.
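
The eight steps above can be read as a simple loop. The following
Python sketch is only an illustration of that control flow: the
names Version, run_process, good_enough, collect, compile_comments
and revise are all hypothetical, and the four callables stand in
for work that is done by humans in the model.

```python
from dataclasses import dataclass


@dataclass
class Version:
    number: int
    text: str
    finished: bool = False


def run_process(strawman, good_enough, collect, compile_comments, revise):
    """Drive steps 1-8; returns every distributed (version, compilation) pair."""
    current = Version(1, strawman)                 # step 1: strawman document
    compilation = None                             # no comments exist yet
    distributed = []
    while True:
        if good_enough(current):                   # step 2: editor's judgment
            current.finished = True
        distributed.append((current, compilation)) # step 3: distribute
        if current.finished:                       # step 4: stop
            return distributed
        contributions = collect(current)           # step 5: comment period
        compilation = compile_comments(current, contributions)  # step 6
        new_text = revise(current, compilation)    # step 7: new version
        current = Version(current.number + 1, new_text)
        # step 8: the loop returns to step 2


history = run_process(
    "plan",
    good_enough=lambda v: v.number >= 3,
    collect=lambda v: [f"comment on v{v.number}"],
    compile_comments=lambda v, cs: "; ".join(cs),
    revise=lambda v, comp: v.text + "+",
)
```

With these stand-in callables, three versions are distributed and
the third is the finished one.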

FUNCTIONALITY: The contributions, the comment compilation, and
the document itself can use the same document type. When
implemented in HTML or SGML, it may be feasible to include in one
file the previous version, all contributions suggesting changes,
and the changes actually implemented by the editor in the new
version. I didn't attempt this in my augmented plain text
format, but kept the comment compilation and the revised version
as separate text files. (I'm skeptical of a possible further
generalization, including not only two versions of the document
in one file, but all the previous versions and all previous
comments.)

Compared to a normal document, this REVITEXT document type will
need at least the following functionality:

A) For a part of the document, indication of who has
_contributed_ it (or that it is provided by the editor).

B) A way to distinguish _metatext_ from _object text_. Metatext
is text about the object text and can include justifications
for proposals, discussion of alternatives, plans for the
future work, and free comments that are not intended to end
up in the final version of the document.

C) A mechanism to _connect_ a certain piece of metatext with the
parts or points of the object text which it is about.

D) A way to include directly in the object text short
_meta-descriptions_ that indicate what eventually will be
included at this place, rather than the (not yet written)
wording itself.

E) For a proposed contribution, and for the new version of the
document, ways to indicate, in comparison with the old
version:
1) _new_ text
2) _deleted_ text
3) text that has been _moved_
4) the former _place_ of moved text.

F) A way to include several _alternative_ ways of changing the
same part of the old version.

G) In the new version of the document: Visual indications by
different kinds of change bars of:
1) parts where the _substance_ has been changed or amended
2) parts where only _editorial_ changes have been made.

Authoring tools can take advantage of this to offer
several _views_ of the document:
- the previous version of the document
- the new version of the document
- the new version of the document with changes indicated
according to item G
- the new version with new parts marked and old parts
(that are removed) included
- the hypothetical version corresponding to one contributor's
contribution, together with the new version
- the new version with removed parts, also including the
contribution from a certain contributor
- for a certain part of the document: the old text, comments
from all contributors, the new text.

In all views of the document either only the object text, or
both the object text and comments on it, can be displayed.
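
As a rough illustration of how such views could be derived from
one marked-up source, here is a minimal Python sketch. It assumes
the document is held as a flat list of segments, each tagged as
unchanged, new or deleted (function E), with a contributor
(function A) and a metatext flag (function B). The Segment class,
the view function and the bracket notation for marked changes are
assumptions made for this example, not part of any existing
format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    change: str = "same"         # "same", "new" or "deleted" (function E)
    contributor: str = "editor"  # function A
    metatext: bool = False       # function B


def view(segments, mode, with_comments=False):
    """Render one view; mode is "previous", "new" or "marked"."""
    out = []
    for s in segments:
        if s.metatext:
            # Comments are shown only on request, in any view.
            if with_comments:
                out.append(f"[{s.contributor}: {s.text}]")
            continue
        if s.change == "same":
            out.append(s.text)
        elif s.change == "new":
            if mode == "new":
                out.append(s.text)
            elif mode == "marked":
                out.append(f"{{+{s.text}+}}")
        elif s.change == "deleted":
            if mode == "previous":
                out.append(s.text)
            elif mode == "marked":
                out.append(f"[-{s.text}-]")
    return " ".join(out)


doc = [
    Segment("Each version is composed by"),
    Segment("the editor alone.", "deleted"),
    Segment("the editor and a co-editor.", "new"),
    Segment("Why add a co-editor?", contributor="Ann", metatext=True),
]
```

Here view(doc, "previous") gives the old sentence, view(doc, "new")
the revised one, and view(doc, "marked", with_comments=True) shows
both wordings together with Ann's comment.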

It can be noted that functions A-C are almost all that is needed
for a general _annotation_ mechanism for documents and messages,
something that's useful in many situations other than the rather
formalized collective document development process modelled
here.
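
To make the annotation idea concrete, here is a small Python
sketch of functions A-C taken on their own. It assumes that a
piece of metatext is connected to the object text by a pair of
character offsets; that choice, and the names Annotation,
annotations_at and quoted, are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    author: str   # function A: who contributed the comment
    comment: str  # function B: the metatext itself
    start: int    # function C: the span [start, end) of object
    end: int      # text that the comment is about


def annotations_at(annotations, offset):
    """Return the annotations whose span covers the given offset."""
    return [a for a in annotations if a.start <= offset < a.end]


def quoted(text, annotation):
    """Return the object text that an annotation refers to."""
    return text[annotation.start:annotation.end]


text = "Each version is composed by the editor alone."
pos = text.index("alone")
note = Annotation("Ann", "Too strict?", pos, pos + len("alone"))
```

Character offsets break as soon as the object text is edited, so a
real format would more likely attach comments to marked ranges in
the text itself.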

The RANGE element from the now withdrawn HTML 3.0 draft should
be usable for function C. The INS and DEL elements of that draft
could be used for function E. Several other new elements and
attributes are needed for a full implementation, I think.
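
As one possible illustration of function E with INS and DEL, the
Python sketch below compares two versions word by word using the
standard difflib module and wraps the differences in those two
elements. The function name mark_changes is hypothetical, no
attributes are emitted (authorship and version attributes would be
among the additions needed), and moved text (E3, E4) simply shows
up as one deletion plus one insertion.

```python
import difflib
from html import escape


def mark_changes(old: str, new: str) -> str:
    """Return the new text with word-level changes wrapped in INS/DEL."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words,
                                      autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        removed = escape(" ".join(old_words[i1:i2]))
        added = escape(" ".join(new_words[j1:j2]))
        if tag == "equal":
            parts.append(added)
        if tag in ("delete", "replace"):
            parts.append(f"<DEL>{removed}</DEL>")
        if tag in ("insert", "replace"):
            parts.append(f"<INS>{added}</INS>")
    return " ".join(parts)


marked = mark_changes("the quick brown fox",
                      "the slow brown fox jumps")
```

For this input the result is
"the <DEL>quick</DEL> <INS>slow</INS> brown fox <INS>jumps</INS>".
A diff computed this way only records what changed, not who
proposed it or why, which is where the metatext functions come in.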

PROBLEMS: I haven't addressed the following problems here (in
order of difficulty):

-- A naming scheme for the different versions and contributions.

-- How contributions and new versions are communicated between
the involved persons.

-- A possible differentiation of the contributors into a core
team with a quicker cycle of "working versions", and a wider
circle of "commenters", who are only bothered with and asked
to comment on a smaller number of "main versions".

-- A more "democratic" process where the editor isn't a
dictator.

-- A more "decentralized" process with more than one editor.

-- Possibilities to split the development process into several
parallel documents, or parallel versions of a part of the
document, which later are merged or kept as different
results of the process.

/Olle

--
Olle Jarnefors, Royal Institute of Technology, Stockholm <ojarnef@admin.kth.se>