Using Keys with Ordered Data

Adding keys to your data allows you to control the way XML Compare aligns the elements at each level in the documents you are comparing. Keys are useful for both ordered and orderless data. This document explains how to use them with ordered data. The resources associated with this sample are available on Bitbucket for both Java and .NET.

Comparing ordered data with keys

Ordered data comparison is often improved by using keys, which control how XML Compare aligns elements. Even without keys, XML Compare will always produce correct difference files. The issue is that several correct answers may be possible, and the preferred one is the answer that best matches how a human would understand the changes.

The next example illustrates this idea. Paragraphs form an ordered collection of data. Suppose that we have a small section of a book like this:

Example 1: First book draft (documentA.xml in Bitbucket, https://bitbucket.org/deltaxml/using-keys-with-ordered-data)

<book> 
  <p>The first advantage of DeltaXML is ..</p> 
  <p>The second advantage of DeltaXML is ..</p> 
</book>

Now we create an introductory paragraph and place it at the start of the file, while modifying the other paragraphs:

Example 2: Second book draft (documentB.xml in Bitbucket, https://bitbucket.org/deltaxml/using-keys-with-ordered-data)

<book> 
  <p>DeltaXML has many advantages:</p> 
  <p>The most important advantage of DeltaXML is ..</p> 
  <p>And the next advantage of DeltaXML is ..</p> 
</book>

We made three changes: one paragraph was added and two were modified. Because we did not use key attributes, XML Compare produces a more verbose, though still correct, output:

Example 3: Unkeyed comparison of first and second book drafts showing mismatches

<book xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"      
      deltaxml:deltaV2="A!=B"
      deltaxml:version="2.0"
      deltaxml:content-type="full-context">
   <p deltaxml:deltaV2="A!=B">
      <deltaxml:textGroup deltaxml:deltaV2="A!=B">
         <deltaxml:text deltaxml:deltaV2="A">The first advantage of DeltaXML is ..</deltaxml:text>
         <deltaxml:text deltaxml:deltaV2="B">DeltaXML has many advantages:</deltaxml:text>
      </deltaxml:textGroup>
   </p>
   <p deltaxml:deltaV2="A!=B">
      <deltaxml:textGroup deltaxml:deltaV2="A!=B">
         <deltaxml:text deltaxml:deltaV2="A">The second advantage of DeltaXML is ..</deltaxml:text>
         <deltaxml:text deltaxml:deltaV2="B">The most important advantage of DeltaXML is ..</deltaxml:text>
      </deltaxml:textGroup>
   </p>
   <p deltaxml:deltaV2="B">And the next advantage of DeltaXML is ..</p>
</book>

All paragraphs were mismatched. The delta file is correct, but we know that a different alignment of the paragraphs would reflect the actual edits better. Without further information, XML Compare cannot always tell which paragraphs should be aligned.

NOTE: XML Compare version 4 and above includes an Enhanced Matcher, which can achieve a much better match between elements based on their content. It is especially effective for documents when the word-by-word option is used. For this example, however, we have not used the word-by-word feature, so that the effect of keys is clear.

A better comparison requires hints. The hints take the form of key attributes. Here key attributes are applied to the paragraphs:

Example 4: First book draft with keys (documentA-keyed.xml in Bitbucket, https://bitbucket.org/deltaxml/using-keys-with-ordered-data)

<book xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"> 
  <p deltaxml:key="P1">The first advantage of DeltaXML is ..</p> 
  <p deltaxml:key="P2">The second advantage of DeltaXML is ..</p> 
</book>

Example 5: Second book draft with keys (documentB-keyed.xml in Bitbucket, https://bitbucket.org/deltaxml/using-keys-with-ordered-data)

<book xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"> 
  <p>DeltaXML has many advantages:</p> 
  <p deltaxml:key="P1">The most important advantage of DeltaXML is ..</p> 
  <p deltaxml:key="P2">And the next advantage of DeltaXML is ..</p> 
</book>

Running XML Compare under these conditions produces a more natural result:

Example 6: Keyed comparison of first and second book drafts showing no mismatches

<book xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"      
      deltaxml:deltaV2="A!=B"
      deltaxml:version="2.0"
      deltaxml:content-type="full-context">
   <p deltaxml:deltaV2="B">DeltaXML has many advantages:</p>
   <p deltaxml:deltaV2="A!=B" deltaxml:key="P1">
      <deltaxml:textGroup deltaxml:deltaV2="A!=B">
         <deltaxml:text deltaxml:deltaV2="A">The first advantage of DeltaXML is ..</deltaxml:text>
         <deltaxml:text deltaxml:deltaV2="B">The most important advantage of DeltaXML is ..</deltaxml:text>
      </deltaxml:textGroup>
   </p>
   <p deltaxml:deltaV2="A!=B" deltaxml:key="P2">
      <deltaxml:textGroup deltaxml:deltaV2="A!=B">
         <deltaxml:text deltaxml:deltaV2="A">The second advantage of DeltaXML is ..</deltaxml:text>
         <deltaxml:text deltaxml:deltaV2="B">And the next advantage of DeltaXML is ..</deltaxml:text>
      </deltaxml:textGroup>
   </p>
</book>

Paragraphs with the same key values have been matched up. The first paragraph is now shown as added, and the others as modified. This delta file corresponds to the edits as a human would understand them. Note that not all the paragraphs have keys: XML Compare works with whatever keys are provided. It is not necessary to key every paragraph, although doing so gives the best results as further edits are made.

What data can be used for keys?

You can use any text data as a key, and often there will be information within your document or data file that is suitable. For example, if there are ID attributes, these can be used: all you need to do is copy the value of the ID attribute into the deltaxml:key attribute value. You can also construct the key value from two or more existing attributes, or from some other content. If you do not want the keys in your final result, they can be stripped out, because they are only needed for the comparison process.
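The copy-and-strip steps described above can be sketched with Python's standard library. This is a minimal illustration, not part of the DeltaXML sample: the function names `add_keys_from_id` and `strip_keys` are hypothetical, and the source attribute name `id` is an assumption about your data.

```python
import xml.etree.ElementTree as ET

DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"
KEY_ATTR = f"{{{DELTAXML_NS}}}key"

# Serialize the namespace with the conventional "deltaxml" prefix.
ET.register_namespace("deltaxml", DELTAXML_NS)


def add_keys_from_id(root, id_attr="id"):
    """Copy each element's ID attribute value into deltaxml:key.

    The attribute name "id" is an assumption; use whatever your data has.
    Elements without that attribute are left unkeyed.
    """
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is not None:
            elem.set(KEY_ATTR, value)
    return root


def strip_keys(root):
    """Remove every deltaxml:key attribute, e.g. from a comparison result."""
    for elem in root.iter():
        elem.attrib.pop(KEY_ATTR, None)
    return root


doc = ET.fromstring(
    '<book><p>Intro</p><p id="P1">First</p><p id="P2">Second</p></book>'
)
add_keys_from_id(doc)
keyed = ET.tostring(doc, encoding="unicode")     # input for the comparison
strip_keys(doc)
stripped = ET.tostring(doc, encoding="unicode")  # keys removed again
```

In practice `add_keys_from_id` would be applied to both input documents before the comparison, and `strip_keys` to the result afterwards.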

It is best to keep the keys unique across child elements of a particular type, e.g. all <p> elements in a <section>. This is not essential but will give more predictable results.
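A pre-comparison check for this uniqueness recommendation might look like the following sketch. The helper name `duplicate_sibling_keys` is hypothetical; it only reports same-named sibling elements that share a key and makes no claim about how XML Compare treats them.

```python
from collections import Counter
import xml.etree.ElementTree as ET

DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"
KEY_ATTR = f"{{{DELTAXML_NS}}}key"


def duplicate_sibling_keys(root):
    """Return (parent tag, child tag, key) for keys shared by same-named siblings."""
    duplicates = []
    for parent in root.iter():
        # Count each (element name, key) pair among this parent's children.
        counts = Counter(
            (child.tag, child.get(KEY_ATTR))
            for child in parent
            if child.get(KEY_ATTR) is not None
        )
        duplicates.extend(
            (parent.tag, tag, key) for (tag, key), n in counts.items() if n > 1
        )
    return duplicates


doc = ET.fromstring(
    f'<section xmlns:deltaxml="{DELTAXML_NS}">'
    '<p deltaxml:key="P1">a</p>'
    '<p deltaxml:key="P1">b</p>'
    '<p deltaxml:key="P2">c</p>'
    "</section>"
)
print(duplicate_sibling_keys(doc))  # [('section', 'p', 'P1')]
```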

Key rules in ordered comparisons

These basic rules apply to the use of keys in ordered comparisons:

  • XML Compare never records a key change, because it considers elements with different keys as different elements.
  • An element lacking a key will never match any keyed element.
  • Key strings must be normalized: no leading or trailing whitespace, and at most a single space between words.
  • Matching of keyed elements takes precedence over matching of unkeyed elements.
  • Elements with keys can be nested, and keys can be used at any level.
  • The deltaxml:key attribute may be used to identify ordered data only in XML Compare Version 2.1 or higher. Earlier XML Compare versions allow keys for orderless elements only.
  • Version 4 and higher of XML Compare have an enhanced matcher which will give improved results when it is not possible to use keys. (The Document Comparator always uses it.)
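Two of the rules above lend themselves to a short sketch: the normalization requirement, and the condition that keys place on matching. The function names are hypothetical, and `keys_allow_match` only restates the first two rules as a necessary condition; it is not XML Compare's matching algorithm.

```python
def normalize_key(raw):
    """Trim the ends and collapse each run of whitespace to a single space."""
    return " ".join(raw.split())


def is_normalized(key):
    """True if the key already satisfies the normalization rule."""
    return key == normalize_key(key)


def keys_allow_match(key_a, key_b):
    """Necessary condition implied by the first two rules (None means unkeyed).

    Elements with different keys are different elements, and an unkeyed
    element never matches a keyed one, so the keys must be identical or
    both absent.
    """
    return key_a == key_b


print(normalize_key("  chapter   1 \n intro "))  # chapter 1 intro
print(keys_allow_match("P1", "P2"))              # False
print(keys_allow_match(None, "P1"))              # False
```

Normalizing key values when they are generated, rather than checking them afterwards, avoids accidental mismatches caused by stray whitespace in the source data.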

Running the sample

Download the resources associated with this sample from Bitbucket; versions are available for Java and for .NET.

Details of how to run the sample are given in the file README.md, displayed in Bitbucket under the list of source files.
