Sequential Merge Analysis
Introduction
One of the important characteristics of sequential merge is that there is a clearly-defined order of editing. The original document is changed by the first editor, then the second and so on until the last person makes modifications and the document is then submitted for review. Sequential merge reports on the changes made by each editor. Each version is labelled, usually by the name of the editor. The attribute deltaxml:version-order defines the version order.
In the deltaV2 format produced by sequential merge, the deltaV2 attribute accurately describes the contributions of the input files to the merge result. However, subsequent processing is simpler if these deltas are classified by the type of change they represent. The current classification scheme for sequential merge describes certain changes in the result file as an add or a delete.
The order of editing provides us with a temporal frame of reference and so the concepts of add and delete are defined relative to the order of editing. This leads to the following definitions for change categorisation:
add: Something that does not exist in previous versions. When something is added it has never been seen before.
delete: Something that exists in a version, but is missing in one or more of the later versions. As soon as something is deleted it is never seen again.
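The two definitions above can be sketched in a few lines of Python. This is an illustration only, not the product's implementation: the function names are hypothetical, and the sketch assumes that version names contain neither "=" nor "!" and that, as the definitions state, an element is present in one unbroken run of versions.

```python
from typing import List, Optional, Tuple


def versions_present(delta_v2: str) -> List[str]:
    """Return the version names listed in a deltaV2 value such as 'Original=Anna!=Ben'."""
    names: List[str] = []
    # '!=' separates groups of differing versions; '=' separates equal versions.
    for group in delta_v2.split("!="):
        names.extend(name for name in group.split("=") if name)
    return names


def classify(delta_v2: str, version_order: List[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (added_by, deleted_by) for one element; either value may be None."""
    present = set(versions_present(delta_v2))
    indices = [i for i, name in enumerate(version_order) if name in present]
    if not indices:
        raise ValueError(f"no known version in {delta_v2!r}")
    first, last = indices[0], indices[-1]
    # Added: the element is absent from every version before the first one listed.
    added_by = version_order[first] if first > 0 else None
    # Deleted: the element is absent from the version after the last one listed.
    deleted_by = version_order[last + 1] if last + 1 < len(version_order) else None
    return added_by, deleted_by


order = ["Original", "Anna", "Ben"]
print(classify("Original=Anna", order))  # (None, 'Ben')
print(classify("Anna=Ben", order))       # ('Anna', None)
print(classify("Anna", order))           # ('Anna', 'Ben')
```

An element whose deltaV2 value lists every version, such as "Original=Anna!=Ben", yields (None, None): it was neither added nor deleted, although it may contain change.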
Users familiar with the deltaV2 format will be aware that change propagates up an XML tree. For example, a change to a word in a paragraph affects the paragraph itself, then its parent element (such as a section or body), and so on all the way to the root of the tree. This is represented in deltaV2 by a != (not equals) separator between the version identifiers. The add and delete classifications are applied both to leaf elements and, more generally, to elements within the tree.
The analyzed deltaV2 output from a sequential merge augments the default deltaV2 output with two attributes that indicate whether a change is an addition or a deletion and identify the version in which the change happened. The attributes are not restricted to leaf nodes; they appear on whichever element was added or deleted, even if that element contains further change. For example:
deltaxml:added-by="Anna"
deltaxml:deleted-by="Ben"
The deltaxml prefix/namespace is the same as used for deltaV2 attributes. Other than the two additional attributes, the analyzed result is identical to the deltaV2 representation.
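As a minimal sketch of how downstream processing might read the two attributes, the following Python uses the standard library's ElementTree to list every classified element. The namespace URI in the sample is a placeholder assumed for this example only, and the helper names are hypothetical; the code matches attributes by local name so that it does not depend on the real URI.

```python
import xml.etree.ElementTree as ET

# The namespace URI below is a placeholder for illustration; a real result
# file declares the actual deltaxml namespace.
SAMPLE = """\
<section xmlns:deltaxml="urn:example:deltaxml" deltaxml:deltaV2="Original!=Anna!=Ben">
  <p deltaxml:added-by="Anna" deltaxml:deleted-by="Ben" deltaxml:deltaV2="Anna">The quick brown fox jumps over the lazy dog.</p>
</section>
"""


def local_name(qualified: str) -> str:
    """Strip ElementTree's '{namespace-uri}' prefix from a tag or attribute name."""
    return qualified.rsplit("}", 1)[-1]


def classified_changes(root):
    """Yield (element name, added_by, deleted_by) for every classified element."""
    for element in root.iter():
        attrs = {local_name(key): value for key, value in element.attrib.items()}
        added, deleted = attrs.get("added-by"), attrs.get("deleted-by")
        if added is not None or deleted is not None:
            yield local_name(element.tag), added, deleted


changes = list(classified_changes(ET.fromstring(SAMPLE)))
print(changes)  # [('p', 'Anna', 'Ben')]
```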
Examples
For all the examples shown below, the order of editing is Original, Anna and Ben.
A Simple Deletion
Consider a paragraph 'p' that is deleted in the version 'Ben'. The deltaV2 representation of this would be as follows:
<section deltaxml:deltaV2="Original=Anna!=Ben">
<p deltaxml:deltaV2="Original=Anna">The quick brown fox jumps over the lazy dog.</p>
</section>
The analyzed result is similar, but the deleted paragraph will be marked by a deltaxml:deleted-by attribute.
<section deltaxml:deltaV2="Original=Anna!=Ben">
<p deltaxml:deleted-by="Ben" deltaxml:deltaV2="Original=Anna">The quick brown fox jumps over the lazy dog.</p>
</section>
A Simple Addition
Consider a paragraph 'p' that is added in the version 'Anna'. The deltaV2 representation of this would be as follows:
<section deltaxml:deltaV2="Original!=Anna=Ben">
<p deltaxml:deltaV2="Anna=Ben">The quick brown fox jumps over the lazy dog.</p>
</section>
The analyzed result is similar, but the added paragraph will be marked by a deltaxml:added-by attribute.
<section deltaxml:deltaV2="Original!=Anna=Ben">
<p deltaxml:added-by="Anna" deltaxml:deltaV2="Anna=Ben">The quick brown fox jumps over the lazy dog.</p>
</section>
Modification Followed by Deletion
Consider a paragraph 'p' in which a word is modified by 'Anna' and the whole paragraph is then deleted by 'Ben'. The deltaV2 representation of this would be as follows:
<section deltaxml:deltaV2="Original!=Anna!=Ben">
<p deltaxml:deltaV2="Original!=Anna">The
<deltaxml:textGroup deltaxml:deltaV2="Original!=Anna">
<deltaxml:text deltaxml:deltaV2="Original">quick</deltaxml:text>
<deltaxml:text deltaxml:deltaV2="Anna">fast</deltaxml:text>
</deltaxml:textGroup> brown fox jumps over the lazy dog.
</p>
</section>
The analyzed result for this will be:
<section deltaxml:deltaV2="Original!=Anna!=Ben">
<p deltaxml:deleted-by="Ben" deltaxml:deltaV2="Original!=Anna">The
<deltaxml:textGroup deltaxml:deltaV2="Original!=Anna">
<deltaxml:text deltaxml:deltaV2="Original">quick</deltaxml:text>
<deltaxml:text deltaxml:deltaV2="Anna">fast</deltaxml:text>
</deltaxml:textGroup> brown fox jumps over the lazy dog.
</p>
</section>
Addition Followed by Modification
Consider a paragraph 'p', which is added by 'Anna' and is later modified by 'Ben'. The deltaV2 representation of this would be as follows:
<section deltaxml:deltaV2="Original!=Anna!=Ben">
<p deltaxml:deltaV2="Anna!=Ben">The
<deltaxml:textGroup deltaxml:deltaV2="Anna!=Ben">
<deltaxml:text deltaxml:deltaV2="Anna">quick</deltaxml:text>
<deltaxml:text deltaxml:deltaV2="Ben">fast</deltaxml:text>
</deltaxml:textGroup> brown fox jumps over the lazy dog.
</p>
</section>
The analyzed result for this will be:
<section deltaxml:deltaV2="Original!=Anna!=Ben">
<p deltaxml:added-by="Anna" deltaxml:deltaV2="Anna!=Ben">The
<deltaxml:textGroup deltaxml:deltaV2="Anna!=Ben">
<deltaxml:text deltaxml:deltaV2="Anna">quick</deltaxml:text>
<deltaxml:text deltaxml:deltaV2="Ben">fast</deltaxml:text>
</deltaxml:textGroup> brown fox jumps over the lazy dog.
</p>
</section>
Addition Followed by Deletion
Consider a paragraph 'p', which is added by 'Anna' and is later deleted by 'Ben'. The deltaV2 representation of this would be as follows:
<section deltaxml:deltaV2="Original!=Anna!=Ben">
<p deltaxml:deltaV2="Anna">The quick brown fox jumps over the lazy dog.</p>
</section>
The analyzed result for this will be:
<section deltaxml:deltaV2="Original!=Anna!=Ben">
<p deltaxml:added-by="Anna" deltaxml:deleted-by="Ben" deltaxml:deltaV2="Anna">The quick brown fox jumps over the lazy dog.</p>
</section>
Notes
The first example is a simple deletion. For a textGroup or attribute change, this corresponds to there being exactly one deltaxml:text or deltaxml:attributeValue possibility, so a change GUI can offer a simple accept or reject response with a single predictable result. In the simple deletion shown here, the Original version and Anna's version are the same and Ben has deleted the whole paragraph, so the version 'Ben' does not appear in the paragraph's deltaV2 attribute (Original=Anna).
The second example is a simple addition. The paragraph's delta is deltaV2="Anna=Ben". Here the Original version differs from Anna's version because she has added the paragraph. Ben's version is the same as Anna's because her addition carries on into the next version (unlike with concurrent merge).
The third and fourth examples are complex, yet they are still classified as a deletion and an addition respectively, even though a word is also modified. When change is categorized, add and delete override modify.
The examples above use textGroups and distinguish between simple and complex deletions. The same approach applies to additions (the presence of != implies a complex addition, as in the fourth example) and to attribute changes represented using deltaxml:attributes. The simple/complex categorization can also apply to elements. For example, if one editor adds a paragraph and another adds a word to that paragraph, the result is a complex add. Another way to categorize element adds or deletes is to consider that a simple add or delete has no descendant change in its subtree, whereas a complex element add or delete contains nested change.
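The simple/complex distinction can be sketched as follows. This is one possible reading rather than the product's own logic: because change propagates up the tree, the sketch treats an added or deleted element as complex when its own deltaV2 value contains "!=". The sample document, its placeholder namespace URI and the function names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical analyzed result: one simple deletion and one complex addition.
# The namespace URI is a placeholder for illustration only.
SAMPLE = """\
<section xmlns:deltaxml="urn:example:deltaxml" deltaxml:deltaV2="Original!=Anna!=Ben">
  <p deltaxml:deleted-by="Ben" deltaxml:deltaV2="Original=Anna">Unchanged, then removed.</p>
  <p deltaxml:added-by="Anna" deltaxml:deltaV2="Anna!=Ben">The
    <deltaxml:textGroup deltaxml:deltaV2="Anna!=Ben">
      <deltaxml:text deltaxml:deltaV2="Anna">quick</deltaxml:text>
      <deltaxml:text deltaxml:deltaV2="Ben">fast</deltaxml:text>
    </deltaxml:textGroup> brown fox.</p>
</section>
"""


def local_name(qualified: str) -> str:
    """Strip ElementTree's '{namespace-uri}' prefix from a tag or attribute name."""
    return qualified.rsplit("}", 1)[-1]


def describe(root):
    """Return (element name, 'add' or 'delete', 'simple' or 'complex') tuples."""
    results = []
    for element in root.iter():
        attrs = {local_name(key): value for key, value in element.attrib.items()}
        # Nested change shows up as '!=' in the element's own deltaV2 value.
        complexity = "complex" if "!=" in attrs.get("deltaV2", "") else "simple"
        for kind, key in (("add", "added-by"), ("delete", "deleted-by")):
            if key in attrs:
                results.append((local_name(element.tag), kind, complexity))
    return results


report = describe(ET.fromstring(SAMPLE))
print(report)  # [('p', 'delete', 'simple'), ('p', 'add', 'complex')]
```

A change GUI could use such a report to offer a single accept or reject action for simple changes and a more detailed review for complex ones.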
Enabling the merge analysis
The sequential merge analysis is performed as part of the output processing stage whenever the result type is set to analyzed deltaV2. This result type needs to be set before the output processing stage is run. When using the API, the result type must be set before invoking the extractAll method on a SequentialMerge object. The setResultType method can be used with SequentialMerge.MergeResultType.ANALYZED_DELTAV2 as its parameter value.
The command line tool provides the optional ResultType parameter; setting this to ANALYZED_DELTAV2 will provide a result with the classification attributes.