Introduction to Delta Format Version 2 (deltaV2)
This document provides an introduction to the new DeltaXML delta format (referred to as deltaV2) for representing changes between two XML documents. It is intended primarily for those familiar with the existing delta format (referred to here as deltaV1) to show how this has been improved. This document does not describe either the old or the new delta format in detail.
Background
The current DeltaXML delta format was designed in 2000 and has been 100% stable since then. We are pleased to introduce a new delta format which builds on this and improves it, in particular:
deltaV2 is simpler with fewer elements and attributes
deltaV2 is even easier to process
deltaV2 is extensible to more than two documents
There are some particular areas in which the new format will prove itself. These include:
changes in attributes are now much easier to process because changed attributes are represented as elements and are no longer embedded in attribute values
attribute namespaces are handled as namespaces rather than using prefixes embedded in attribute values
attribute values and text values are handled in a similar way
the deltaxml:exchange element is no longer needed, removing one level of structure
an attribute on the root element indicates whether the delta is full-context or changes-only
Additionally, deltaV2 preserves some of the unique features and benefits of the original delta format:
both full-context and changes-only deltas have the same basic format
the delta remains bi-directional, i.e. can be used to convert either document to the other
an unchanged, added or deleted subtree has the same format as the original documents
at each element in the delta you know immediately if it is added, deleted, changed or unchanged
deltaxml:key and deltaxml:ordered attributes are handled in the same way as before, so input filters will not need to be changed
Initially the XML Compare product will adopt this new format in a new 5.0 release. At a later date the DeltaXML Sync product will also adopt this new delta format.
We will look at some of these areas in more detail.
Simpler with fewer elements and attributes
DeltaV1 had six elements and three attributes:
deltaxml:PCDATAmodify
deltaxml:PCDATAold
deltaxml:PCDATAnew
deltaxml:exchange
deltaxml:old
deltaxml:new
@deltaxml:delta
@deltaxml:new-attributes
@deltaxml:old-attributes
DeltaV2 has four elements and one attribute, apart from two additional attributes on the root element:
deltaxml:attributes
deltaxml:attributeValue
deltaxml:textGroup
deltaxml:text
@deltaxml:deltaV2
This has the advantage that less code needs to be written to process delta data. Note also that since the new format caters for three or more documents as well as the basic two, there is even less that needs to be learned in order to process changes.
The delta attribute is similar, and the correspondence between the old and new formats is as follows:
deltaV1 | deltaV2 | Comment |
---|---|---|
| | | The element appears in the 'new' document or 'B' document only. |
| | | The element appears in the 'old' document or 'A' document only. |
| | | The element appears in both documents and is equal. |
| | | The element appears in both documents and is different in each, i.e. not equal. |
| | | The element appears in both documents and is different in each, i.e. not equal. |
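As a rough illustration of how the delta attribute can be read at each element, the following Python sketch counts elements by change type. The deltaxml namespace URI and the value "A=B" for unchanged elements are assumptions, since neither is stated in this document; the values "A", "B" and "A!=B" are taken from the examples later in this document.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: namespace URI bound to the deltaxml prefix (not stated in this document).
DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"
DELTA_V2 = f"{{{DELTAXML_NS}}}deltaV2"

# "A", "B" and "A!=B" appear in this document's examples; "A=B" is an assumption.
MEANING = {"A": "deleted", "B": "added", "A=B": "unchanged", "A!=B": "changed"}


def change_type(element):
    """Return the change type of one delta element, or None if it carries no deltaV2 attribute."""
    value = element.get(DELTA_V2)
    if value is None:
        return None
    return MEANING.get(value, value)


# Hypothetical sample delta, written for this sketch only.
SAMPLE = f"""<doc xmlns:deltaxml="{DELTAXML_NS}" deltaxml:deltaV2="A!=B">
  <p deltaxml:deltaV2="A=B">same</p>
  <p deltaxml:deltaV2="A">old only</p>
  <p deltaxml:deltaV2="B">new only</p>
</doc>"""

root = ET.fromstring(SAMPLE)
summary = Counter(
    change_type(e) for e in root.iter() if change_type(e) is not None
)
```

Elements inside an unchanged, added or deleted subtree carry no delta attribute of their own, so the sketch skips them rather than guessing.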
Attributes easier to process
One of the biggest changes is in the way attribute values are handled. DeltaV1 was compact in the way that it handled attribute values but quite difficult to process, and could not be extended to more than two documents.
In deltaV1, changed attributes were encoded within the two delta attributes @deltaxml:new-attributes and @deltaxml:old-attributes. This meant that to process the attribute values they needed to be extracted. Also, because the old and new values were separated in these two attributes, it was often necessary to do set operations to determine whether an attribute was added, deleted or modified.
In deltaV2, attributes are handled within markup and processing is therefore very much easier. Unchanged attributes are handled as before: they remain unchanged as attributes.
Consider this small example to see how this works, where attribute a1 is unchanged, a2 is added, a3 is deleted and a4 is modified.
Document A (old):
<p a1="value1" a3="value3" a4="value4"/>
Document B (new):
<p a1="value1" a2="value2" a4="value5"/>
In deltaV1 this would be represented as:
<p deltaxml:delta="WFmodify" a1="value1"
   deltaxml:old-attributes="a3='value3' a4='value4'"
   deltaxml:new-attributes="a2='value2' a4='value5'" />
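To make the extraction and set operations concrete, here is a minimal Python sketch that classifies the attributes of the deltaV1 example above. It assumes each encoded value is a whitespace-separated list of name='value' pairs with no quote characters inside the values; the real deltaV1 encoding may have escaping rules that this pattern does not cover. The function names are hypothetical.

```python
import re

# Assumption: pairs look like name='value' and values contain no single quotes.
PAIR = re.compile(r"([^\s=]+)='([^']*)'")


def parse_pairs(encoded):
    """Turn "a3='value3' a4='value4'" into {'a3': 'value3', 'a4': 'value4'}."""
    return dict(PAIR.findall(encoded))


def classify_v1(old_encoded, new_encoded):
    """Classify attribute names using set operations on the two encoded lists."""
    old = parse_pairs(old_encoded)
    new = parse_pairs(new_encoded)
    return {
        "added": sorted(new.keys() - old.keys()),     # only in new-attributes
        "deleted": sorted(old.keys() - new.keys()),   # only in old-attributes
        "modified": sorted(old.keys() & new.keys()),  # present in both lists
    }


result = classify_v1(
    "a3='value3' a4='value4'",  # deltaxml:old-attributes
    "a2='value2' a4='value5'",  # deltaxml:new-attributes
)
```

For the example above, `result` reports a2 as added, a3 as deleted and a4 as modified.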
In deltaV2 this is represented as:
<p deltaxml:deltaV2="A!=B" a1="value1">
  <deltaxml:attributes deltaxml:deltaV2="A!=B">
    <dxa:a2 deltaxml:deltaV2="B">
      <deltaxml:attributeValue deltaxml:deltaV2="B">value2</deltaxml:attributeValue>
    </dxa:a2>
    <dxa:a3 deltaxml:deltaV2="A">
      <deltaxml:attributeValue deltaxml:deltaV2="A">value3</deltaxml:attributeValue>
    </dxa:a3>
    <dxa:a4 deltaxml:deltaV2="A!=B">
      <deltaxml:attributeValue deltaxml:deltaV2="A">value4</deltaxml:attributeValue>
      <deltaxml:attributeValue deltaxml:deltaV2="B">value5</deltaxml:attributeValue>
    </dxa:a4>
  </deltaxml:attributes>
</p>
The new format is much more verbose, but the code to process it is much shorter and simpler. For example, to determine which attributes have been modified, in deltaV1 it is necessary to parse deltaxml:old-attributes and deltaxml:new-attributes to extract the names of all the attributes and then do a set intersection on these to find the names of any attributes in both lists. In deltaV2, it is only necessary to find elements within deltaxml:attributes which have more than one deltaxml:attributeValue within them.
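The deltaV2 rule just described can be sketched in Python as follows. The two namespace URIs are assumptions (this document does not state them), and `modified_attributes` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumptions: namespace URIs for the deltaxml and dxa prefixes.
DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"
DXA_NS = "http://www.deltaxml.com/ns/non-namespaced-attribute"
NS = {"deltaxml": DELTAXML_NS}

# The deltaV2 attribute example from this document, with namespace declarations added.
DELTA = f"""<p xmlns:deltaxml="{DELTAXML_NS}" xmlns:dxa="{DXA_NS}"
   deltaxml:deltaV2="A!=B" a1="value1">
  <deltaxml:attributes deltaxml:deltaV2="A!=B">
    <dxa:a2 deltaxml:deltaV2="B">
      <deltaxml:attributeValue deltaxml:deltaV2="B">value2</deltaxml:attributeValue>
    </dxa:a2>
    <dxa:a3 deltaxml:deltaV2="A">
      <deltaxml:attributeValue deltaxml:deltaV2="A">value3</deltaxml:attributeValue>
    </dxa:a3>
    <dxa:a4 deltaxml:deltaV2="A!=B">
      <deltaxml:attributeValue deltaxml:deltaV2="A">value4</deltaxml:attributeValue>
      <deltaxml:attributeValue deltaxml:deltaV2="B">value5</deltaxml:attributeValue>
    </dxa:a4>
  </deltaxml:attributes>
</p>"""


def modified_attributes(element):
    """Local names of attributes that have more than one deltaxml:attributeValue."""
    names = []
    for attr in element.findall("deltaxml:attributes/*", NS):
        values = attr.findall("deltaxml:attributeValue", NS)
        if len(values) > 1:
            # Strip the "{namespace}" part of the tag to recover the attribute name.
            names.append(attr.tag.split("}")[-1])
    return names


modified = modified_attributes(ET.fromstring(DELTA))
```

No string parsing or set intersection is needed: the structure of the markup already separates added, deleted and modified attributes.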
The handling of attribute namespaces is now more consistent because the attribute names become element names (for attributes where the value has changed) rather than the prefixes being embedded in the deltaxml:old-attributes and deltaxml:new-attributes values. This makes for easier handling of the namespaces.
Note also that deltaV2 can be extended to handle three or more documents, whereas deltaV1 is limited to just two.
Attribute Namespaces
Some special namespaces are used when representing attribute change in the deltaxml:attributes element. These are listed below:
usual or recommended prefix | namespace uri | purpose |
---|---|---|
| | | The namespace of an element used to represent an attribute which was not in a namespace in one or both input files. |
| | | The namespace of an element used to represent an attribute in the XML namespace (corresponding to the URI: |
These new namespaces are used for several reasons:
The semantics/use of default (or 'non-prefixed') namespaces applies differently to attributes than it does to elements. We use the 'dxa' namespace so that an attribute converted into an element in a file with a default namespace does not inherit any default namespace.
If the same name is used for an element and an attribute in a grammar (for example the xhtml style element and attribute) there may have been confusion or even mismatching when used with existing XSLT stylesheets/software. Using a new namespace should avoid such issues.
Use of the XML prefix and URI is reserved for future standards. Converting attributes such as xml:space into xml:space elements would have contravened these guidelines/rules.
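The first reason above, that a default namespace applies to elements but not to unprefixed attributes, can be observed with any namespace-aware parser. The following Python sketch uses the standard library parser on a hypothetical XHTML fragment in which an element and an attribute share the name style.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: a default namespace is declared on the element.
FRAGMENT = '<style xmlns="http://www.w3.org/1999/xhtml" style="color: red"/>'

root = ET.fromstring(FRAGMENT)

# The element name picks up the default namespace...
element_name = root.tag
# ...but the unprefixed attribute stays in no namespace.
attribute_names = list(root.attrib)
```

If the attribute were turned into an unprefixed element inside this file, it would inherit the XHTML default namespace and become indistinguishable from the style element, which is the situation a dedicated namespace avoids.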
New Root element attributes
An attribute on the root element specifies that the document is a delta document and conforms to deltaV2: deltaxml:version='2.0'
Another attribute on the root element indicates whether the delta document contains just the changes (deltaxml:content-type='changes-only') or if the data that is unchanged in all the documents is also present (deltaxml:content-type='full-context').
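A consumer might check these two root attributes before processing a delta. The Python sketch below is one way to do that; the namespace URI is an assumption, and `delta_content_type` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI bound to the deltaxml prefix (not stated in this document).
DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"
VERSION = f"{{{DELTAXML_NS}}}version"
CONTENT_TYPE = f"{{{DELTAXML_NS}}}content-type"


def delta_content_type(root):
    """Return 'changes-only' or 'full-context' for a deltaV2 root element."""
    if root.get(VERSION) != "2.0":
        raise ValueError("root element is not marked as deltaV2")
    content_type = root.get(CONTENT_TYPE)
    if content_type not in ("changes-only", "full-context"):
        raise ValueError(f"unexpected content type: {content_type!r}")
    return content_type


# Hypothetical root element, written for this sketch only.
SAMPLE = f"""<doc xmlns:deltaxml="{DELTAXML_NS}"
     deltaxml:version="2.0"
     deltaxml:content-type="full-context"
     deltaxml:deltaV2="A!=B"/>"""

content_type = delta_content_type(ET.fromstring(SAMPLE))
```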
Text handling
Text is handled in a similar manner but there are changes to enable more than two documents to be represented. Consider the following example:
Document A (old):
<p>The quick brown fox</p>
Document B (new):
<p>The quick red fox</p>
In deltaV1 this would be represented as:
<p deltaxml:delta="WFmodify">
  <deltaxml:PCDATAmodify>
    <deltaxml:PCDATAold>The quick brown fox</deltaxml:PCDATAold>
    <deltaxml:PCDATAnew>The quick red fox</deltaxml:PCDATAnew>
  </deltaxml:PCDATAmodify>
</p>
In deltaV2 this is represented as:
<p deltaxml:deltaV2="A!=B">
  <deltaxml:textGroup deltaxml:deltaV2="A!=B">
    <deltaxml:text deltaxml:deltaV2="A">The quick brown fox</deltaxml:text>
    <deltaxml:text deltaxml:deltaV2="B">The quick red fox</deltaxml:text>
  </deltaxml:textGroup>
</p>
This could also be represented in deltaV2 more precisely as:
<p deltaxml:deltaV2="A!=B">
  The quick
  <deltaxml:textGroup deltaxml:deltaV2="A!=B">
    <deltaxml:text deltaxml:deltaV2="A">brown</deltaxml:text>
    <deltaxml:text deltaxml:deltaV2="B">red</deltaxml:text>
  </deltaxml:textGroup>
  fox
</p>
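Because the delta is bi-directional, the text of either input document can be rebuilt from it. The Python sketch below does this for the word-level example, written on one line so that whitespace is unambiguous. The namespace URI is an assumption, `text_for` is a hypothetical helper, and the sketch handles only text and deltaxml:textGroup children, not nested elements.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI bound to the deltaxml prefix (not stated in this document).
DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"
DELTA_V2 = f"{{{DELTAXML_NS}}}deltaV2"
TEXT_GROUP = f"{{{DELTAXML_NS}}}textGroup"
TEXT = f"{{{DELTAXML_NS}}}text"

DELTA = (
    f'<p xmlns:deltaxml="{DELTAXML_NS}" deltaxml:deltaV2="A!=B">The quick '
    '<deltaxml:textGroup deltaxml:deltaV2="A!=B">'
    '<deltaxml:text deltaxml:deltaV2="A">brown</deltaxml:text>'
    '<deltaxml:text deltaxml:deltaV2="B">red</deltaxml:text>'
    "</deltaxml:textGroup> fox</p>"
)


def text_for(element, doc):
    """Rebuild the text of `element` as it appears in document `doc` ("A" or "B")."""
    parts = [element.text or ""]
    for child in element:
        if child.tag == TEXT_GROUP:
            # Keep only the alternative that belongs to the requested document.
            for alternative in child.findall(TEXT):
                if alternative.get(DELTA_V2) == doc:
                    parts.append(alternative.text or "")
        # Unchanged text following the child is shared by both documents.
        parts.append(child.tail or "")
    return "".join(parts)


p = ET.fromstring(DELTA)
text_a = text_for(p, "A")
text_b = text_for(p, "B")
```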
There is therefore no significant difference in the way that text is handled, except that the absence of text in one document is treated in a slightly different manner:
Document A (old):
<p>The quick brown fox</p>
Document B (new):
<p></p>
In deltaV1 this would be represented as:
<p deltaxml:delta="WFmodify">
  <deltaxml:PCDATAmodify>
    <deltaxml:PCDATAold>The quick brown fox</deltaxml:PCDATAold>
    <deltaxml:PCDATAnew/>
  </deltaxml:PCDATAmodify>
</p>
In deltaV2 this is represented as:
<p deltaxml:deltaV2="A!=B">
  <deltaxml:textGroup deltaxml:deltaV2="A!=B">
    <deltaxml:text deltaxml:deltaV2="A">The quick brown fox</deltaxml:text>
  </deltaxml:textGroup>
</p>
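Since absent text is represented by a missing deltaxml:text element rather than an empty one, a deltaxml:textGroup can be classified by looking at which documents its children belong to. The following Python sketch assumes a two-document delta and the same assumed namespace URI as before; `text_change` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI bound to the deltaxml prefix (not stated in this document).
DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"
DELTA_V2 = f"{{{DELTAXML_NS}}}deltaV2"
TEXT = f"{{{DELTAXML_NS}}}text"


def text_change(text_group):
    """Classify a two-document textGroup as 'deleted', 'added' or 'modified'."""
    docs = {t.get(DELTA_V2) for t in text_group.findall(TEXT)}
    if docs == {"A"}:
        return "deleted"   # text exists only in document A
    if docs == {"B"}:
        return "added"     # text exists only in document B
    return "modified"      # both documents contribute text


# The example above: the text is present in A and absent in B.
GROUP = (
    f'<deltaxml:textGroup xmlns:deltaxml="{DELTAXML_NS}" deltaxml:deltaV2="A!=B">'
    '<deltaxml:text deltaxml:deltaV2="A">The quick brown fox</deltaxml:text>'
    "</deltaxml:textGroup>"
)

kind = text_change(ET.fromstring(GROUP))
```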