Two and Three Document DeltaV2 Format
DeltaXML DeltaV2 Format Description
The DeltaXML DeltaV2 format is a representation of two or more XML documents in a single document. Any of the original documents can be extracted from the Delta.
The base version for the format described here is Version 2.0. In addition to this, specifically labelled sections describe the extensions added in Version 2.1. This latter version supports the recording of overlapping changes (from formatting elements) and is only output by the document comparator introduced in DeltaXML Core version 7.0.
When the format is applied to two documents, these input documents are denoted A and B, and with three documents the inputs are denoted A, B and C.
Namespaces and Prefixes
Three namespaces are used in the DeltaV2 format to represent change; they are summarized in the following table:
usual prefix | namespace URI | purpose |
---|---|---|
|
| Elements and attributes used to represent change between the inputs |
|
| The namespace of an element, used to represent an attribute, which was not in a namespace in one or both input files. |
|
| The namespace of an element, used to represent an attribute in the XML namespace (corresponding to the URI: |
The v1 component of the namespace URI is not a mistake. This URI was initially chosen when opinions on versioning of XML vocabularies and namespaces were in their infancy; we now agree with the mainstream opinion that new versions of a format/language should keep the same namespace.
Elements and Attributes
This is a list of the elements used by this format:
Element name | Content | Purpose |
---|---|---|
| One or more elements, each of which has a local-name and namespace corresponding to an attribute belonging to the parent element. | Details any differences between the attributes associated with the parent element. |
| CDATA representing the value of an attribute | To record an attribute value that appeared in one or more of the input documents. |
| One or more deltaxml:text elements. | This element contains the variants of this segment of text. |
| PCDATA, i.e. text | To record a text item that appeared in one or more of the input documents. |
| One or more deltaxml:content or deltaxml:text elements | This element contains the variants of text and inline formatting elements. |
| Elements and PCDATA, i.e. text | To record text or inline formatting elements |
| One or more deltaxml:textGroup elements. | Contains the variants of deltaxml:textGroup elements. |
| One or more deltaxml:token elements. | Contains unique deltaxml:token elements. |
| One or more deltaxml:token elements. | Contains deltaxml:token preserving order of attribute values. |
| PCDATA, i.e. text | To represent a tokenized attribute value. |
| One or more deltaxml:movedText elements. | This element contains the variants of this segment of text after element moves are identified. |
| PCDATA, i.e. text | To record a text item that appeared in one or more of the input documents due to element moves |
| One or more elements, each of which has a local-name and namespace corresponding to an attribute belonging to the parent element. | Details any differences between the attributes associated with the parent element after element moves are identified. |
| CDATA representing the value of an attribute | To record an attribute value that appeared in two documents after element moves are identified. |
This is a list of the attributes used by Delta:
Attribute name | Content | Purpose |
---|---|---|
| For a delta of two documents, one of the following values: A, B, A=B, A!=B For a delta of three documents, one of the following values: A, B, C, A=B, A!=B, B=C, B!=C, A=C, A!=C, A=B=C, A!=B!=C, A=B!=C, A=C!=B, A!=B=C. | Details the documents in which this data item appeared. If it appeared in more than one document, this attribute also indicates whether the data items were the same or different. For example, |
| For a text/word delta of a deltaxml:contentGroup, one of the following values: A, B, A=B, A!=B | This attribute defines the text/word equality of a deltaxml:contentGroup. |
| For a delta of moved elements, one of the following values: A, B, A=B, A!=B | Defines the equality of moved elements. |
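The permitted values listed above can be checked and decomposed mechanically. The following is a minimal sketch, not part of any DeltaXML API: the function name `parse_delta_v2` and the `three_way` flag are hypothetical, and the sets of allowed values are copied from the table above.

```python
# Allowed values, copied from the attribute table above.
TWO_WAY = {"A", "B", "A=B", "A!=B"}
THREE_WAY = TWO_WAY | {
    "C", "B=C", "B!=C", "A=C", "A!=C",
    "A=B=C", "A!=B!=C", "A=B!=C", "A=C!=B", "A!=B=C",
}


def parse_delta_v2(value, three_way=False):
    """Split a delta value into groups of documents that share the same data.

    Hypothetical helper: 'A=B!=C' becomes [['A', 'B'], ['C']], meaning the
    item is identical in A and B and different in C.
    """
    allowed = THREE_WAY if three_way else TWO_WAY
    if value not in allowed:
        raise ValueError(f"not a permitted delta value: {value!r}")
    # '!=' separates groups that differ; '=' joins documents that agree.
    return [group.split("=") for group in value.split("!=")]
```

For instance, `parse_delta_v2("A!=B")` yields one group per document, while a value such as `C` is rejected unless `three_way=True` is passed.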
On the root element the following attributes must appear:
Attribute name | Content | Purpose |
---|---|---|
| This must be "2.0" | This indicates the version of the delta format. |
| This must have one of the values: | This indicates if the delta document contains just the changes ( |
| Shows the comma-separated element position for the A and B documents: E.g. 'A=2, B=1' | Optional output for child elements of an 'Orderless Container' element. Shows the position of each child element for the same element as found in the A document and in the B document. |
On each deltaxml:attributes element the following attribute/value must be present:
Attribute name | Content | Purpose |
---|---|---|
| This must be "false" | Indicates that the child elements, used to represent attributes, can appear in any order. |
New Attributes added in Version 2.1
In Version 2.1 new attributes have been added to support the representation of changes in element structure with respect to content. In certain document comparison scenarios (e.g. when processing formatting elements) this can improve granularity. A fuller description and samples can be found in the Overlapping Hierarchies in DeltaV2 document.
The following table summarises the function and contents of each of these new attributes:
Attribute name | Content | Purpose |
---|---|---|
| An alphabetically sorted list of one or more comma-separated document identifiers, e.g. "A", "B", "A,B". | Indicates that the element tag was present in the document(s) identified, and all of its content is as included in this element in the delta file. |
| As above | Indicates that the element tag was present in the document(s) identified, but in that document the element contained not only the content here but more content. The additional content follows in zero or more elements marked deltaxml:deltaTagMiddle and ending with an element marked deltaxml:deltaTagEnd. |
| As above | Indicates that the content of this element was included in an element marked deltaxml:deltaTagStart which precedes this element. There is additional content, which follows in an element marked deltaxml:deltaTagMiddle or deltaxml:deltaTagEnd. |
| As above | Indicates that the content of this element was included in an element marked |
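The content of these Version 2.1 attributes is a sorted, comma-separated list of document identifiers, which is easy to validate. The sketch below is illustrative only: the function name `parse_document_ids` is hypothetical, and it assumes that identifiers are limited to A, B and C, that no whitespace surrounds the commas, and that an identifier is not repeated.

```python
def parse_document_ids(value, known=("A", "B", "C")):
    """Return the list of document identifiers in a Version 2.1 attribute value.

    Hypothetical helper. Raises ValueError when an identifier is unknown,
    or when the list is not alphabetically sorted and free of duplicates.
    """
    ids = value.split(",")
    unknown = [i for i in ids if i not in known]
    if unknown:
        raise ValueError(f"unknown document identifier(s): {unknown}")
    if ids != sorted(set(ids)):
        raise ValueError(f"identifiers must be sorted and unique: {value!r}")
    return ids
```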
Description
There is no DTD or Schema for a Delta, but the Delta will have the same look and feel as the original documents. There is a set of simple rules which apply to the Delta format. In general terms, the Delta generated from a set of documents will be a union of these documents in the sense that all the data that appears in any of the documents will also appear in the Delta.
Elements, attributes and text that are identified by DeltaXML as common to two or more of the documents are shared in the Delta. A subtree that appears unchanged in one or more documents will appear in the Delta almost exactly as it appeared in the original document(s).
Schematron Rules
The definitive version of these rules is contained in a Schematron rule document: deltaV2-schematron.xml. Note: this document does not yet support the new Version 2.1 attributes.
The same rules file is normally used for both the 2-input and 3-input deltas. Only one of the rules, the one which specifies the deltaV2 allowable characters, needs to be changed should a schema be required to differentiate between the 2-way and 3-way deltas. Should a specific 2-way delta be required the rule would be:
<report test="matches(@deltaxml:deltaV2, '[^AB!=]')">Delta value contains invalid characters</report>
Should an application need to handle both 2-way and 3-way delta results and differentiate between them, it can do so by looking at the characters contained in the deltaxml:deltaV2 attribute on the root node of a delta file. The root node will always have a delta attribute and it will always reflect the number of inputs.
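An application might implement this check along the following lines. This is a minimal sketch using only the Python standard library; the function names are hypothetical, the attribute is matched by local name so that no particular namespace URI is assumed, and treating the presence of the character C as the sign of a 3-way delta is an assumption based on the value lists given earlier.

```python
import re
import xml.etree.ElementTree as ET


def root_delta_value(xml_text):
    """Return the deltaV2 attribute value found on the root element."""
    root = ET.fromstring(xml_text)
    for name, value in root.attrib.items():
        # ElementTree reports namespaced attributes as '{uri}local'.
        # Matching on the local name alone is a simplification.
        if name.rpartition("}")[2] == "deltaV2":
            return value
    raise ValueError("root element has no deltaV2 attribute")


def input_count(delta_value):
    """Assumption: only a 3-way root value can mention document C."""
    return 3 if "C" in delta_value else 2


def has_invalid_two_way_chars(delta_value):
    """Python counterpart of the 2-way Schematron test shown above."""
    return re.search(r"[^AB!=]", delta_value) is not None
```

Given a delta file's text, `input_count(root_delta_value(text))` then tells the application which of the two result shapes it is dealing with.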
Full Context Delta
In an analogous manner to the original Delta format, a Delta can either show just changes or can include all the data from the original documents. When only changes are shown, the content of any unchanged element is not reproduced in the delta. This applies to any element with a deltaxml:deltaV2 attribute equal to A=B or, for three documents, A=B=C.
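As an illustration of this rule, the elements whose content would be omitted from a changes-only delta can be listed with a short script. This is a sketch under stated assumptions: the function names are hypothetical, the deltaV2 attribute is located by local name rather than by a specific namespace URI, and the namespace used in the usage note is a placeholder.

```python
import xml.etree.ElementTree as ET

# Values that mark an element as unchanged, as stated in the text above.
UNCHANGED = {"A=B", "A=B=C"}


def delta_value(elem):
    """Return the element's deltaV2 value, or None if it has none."""
    for name, value in elem.attrib.items():
        if name.rpartition("}")[2] == "deltaV2":
            return value
    return None


def unchanged_elements(root):
    """Tags of all elements, in document order, that are marked unchanged."""
    return [e.tag for e in root.iter() if delta_value(e) in UNCHANGED]
```

For a hand-written document such as `<example xmlns:d="urn:example:placeholder" d:deltaV2="A!=B"><firstName d:deltaV2="A=B"/><lastName d:deltaV2="B"/></example>`, only `firstName` is reported.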
Compatibility with original Delta format
The original delta format handled two documents only. The values of the deltaxml:deltaV2 attribute correspond with the deltaxml:delta attribute as follows: add becomes B, delete becomes A, unchanged becomes A=B and WFmodify becomes A!=B. The value WFmodifyUnordered is also A!=B but the deltaxml:ordered="false" attribute remains on the element so that knowledge of the unordered content remains in the delta document.
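The correspondence described above amounts to a simple lookup table. The sketch below restates it in Python for code that needs to translate values from the original format; the names `LEGACY_TO_V2` and `to_delta_v2` are hypothetical and not part of any DeltaXML API.

```python
# Mapping from original deltaxml:delta values to deltaxml:deltaV2 values,
# as listed in the paragraph above.
LEGACY_TO_V2 = {
    "add": "B",
    "delete": "A",
    "unchanged": "A=B",
    "WFmodify": "A!=B",
    # Also A!=B; the deltaxml:ordered="false" attribute stays on the element.
    "WFmodifyUnordered": "A!=B",
}


def to_delta_v2(legacy_value):
    """Translate an original-format delta value to its deltaV2 equivalent."""
    try:
        return LEGACY_TO_V2[legacy_value]
    except KeyError:
        raise ValueError(f"unknown delta value: {legacy_value!r}") from None
```

Note that the mapping is not reversible: both WFmodify and WFmodifyUnordered map to A!=B, so the ordered attribute has to be consulted to tell them apart.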
Attribute changes are now handled within markup to make processing easier. There is no longer a representation of an exchange between two elements or an element and text item.
Examples for Two Documents
Examples of Elements in Delta
Document A | Document B |
---|---|
<example> <tel/> | <example> </person> |
And the Delta for this will be as follows:
Delta | Comments |
---|---|
<example deltaxml:deltaV2="A!=B"> | Element <firstName> appears in both documents, A and B, and is the same in both. Element <lastName> appears in document B only. Element <tel> appears in document A only. |
Examples of Text in Delta
Document A | Document B |
---|---|
<example> | <example> |
And the Delta for this will be as follows:
Delta | Comments |
---|---|
<example deltaxml:deltaV2="A!=B"> | The text in <firstName> is "J" in A and "John" in document B. The text in <lastName> is the same in both documents. |
Examples of Attributes in Delta
Document A | Document B |
---|---|
<example> | <example> |
And the Delta for this will be as follows:
Delta | Comments |
---|---|
<example deltaxml:deltaV2="A!=B"> | The attribute 'gender' is unchanged and so appears as a regular attribute. The attribute 'age' has a value of 36 in document A and 37 in B. Element <firstName> appears now as the second child of <person>. |
Examples for Three Documents
Examples of Elements in Delta
Document A | Document B | Document C |
---|---|---|
<example> | <example> </person> | <example> </person> |
And the Delta for this will be as follows:
Delta | Comments |
---|---|
<example deltaxml:deltaV2="A!=B!=C"> | Element <lastName> appears in two documents, A and B, and is the same in both. Element <tel> appears in only one document, A. |
Examples of Text in Delta
Document A | Document B | Document C |
---|---|---|
<example> | <example> | <example> |
And the Delta for this will be as follows:
Delta | Comments |
---|---|
<example deltaxml:deltaV2="A!=B!=C"> | The text in <firstName> is "J" in both A and C. The text in <firstName> is "John" in document B. The text in <lastName> is the same in all documents. |
Examples of Attributes in Delta
Document A | Document B | Document C |
---|---|---|
<example> | <example> | <example> |
And the Delta for this will be as follows:
Delta | Comments |
---|---|
<example deltaxml:deltaV2="A!=B!=C"> | The attribute 'gender' is unchanged and so appears as a regular attribute. The attribute 'age' has a value of 36 in document A and 37 in B. Element <firstName> appears now as the second child of <person>. |
DeltaV2 Version 2.1 Samples
A set of samples specific to Version 2.1 is provided in the document Overlapping Hierarchies in DeltaV2.
The included samples illustrate how the new attributes introduced to DeltaV2 can be used in practice by supporting DeltaXML products.
Examples of Format Change using deltaxml:contentGroup
Document A | Document B | Document C |
---|---|---|
<example> | <example> | <example> |
And the Delta for this will be as follows:
Delta | Comments |
---|---|
<example deltaxml:deltaV2="A!=B!=C"> | Documents B and C have 'b' and 'i' formatting respectively, along with some text changes. |