Skip to main content
Skip table of contents

Detecting and Handling Moves

Introduction

This document describes the concepts behind the detecting and handling of moves. For the resources associated with this sample, see here.

Handling moves

This "how to" document discusses how you can detect a move rather than a delete and an add

In some situations, an element is moved from one place in an XML file to another. When comparing files where such a move has occurred, DeltaXML will not recognise the move as such, but will identify the change as a 'delete' (denoted by deltaxml:deltaV2="A") from the original position and an 'add' of an element (denoted by deltaxml:deltaV2="B") in the new position.

Although DeltaXML does not identify a move directly, it does generate a delta file in XML and it is possible to process this delta file to match delete/add or A/B pairs and so identify moves. This is described below, but first it is worth considering why DeltaXML adopts this approach, and some of the inherent complexities of a 'move' operation.

Moves in DocumentComparator

From XML Compare 9.1, moves are handled natively in the DocumentComparator. This means that you no longer need to use additional XSLT filters as described in the rest of this sample if you are using the DocumentComparator class. To process moves, enable the feature using the setDetectMoves method in ResultReadabilityOptions.

For an element to be considered as a move candidate, there must be an added (deltaxml:deltaV2='B') and deleted (deltaxml:deltaV2='A') version of the same element and no other occurrences. The element is identified by a combination of its local-name as well as an identifier provided by an XPath expression defined in the ResultReadabilityOptions class, e.g. '@id'.

For examples of enabling and configuring moves in the DocumentComparator, see the HandleMovesSample.java file that is included in this sample.

Background

There are a number of reasons for DeltaXML adopting the approach of identifying additions and deletions but not moves.

The first reason is that, in most situations, the delete and add behaviour is what is required. This is almost invariably the case where the XML represents data rather than a text document.

Secondly, the DeltaXML delta file contains no pointers or XPath expressions: all the elements have the same set of ancestors as in the original file(s). In order to represent a move, an XPath expression or a pointer is needed. This adds a level of complexity and makes processing of the delta files more difficult. It was a design goal of the delta format to keep it as simple as possible and also that the delta should reflect the structure of the files being compared.

A third reason relates to the control complexity that is inherent in identifying moves. Here are some considerations:

  • how far away can an element be moved? Anywhere in the file? Anywhere in the same parent? grandparent?

  • how can we detect a move when the element has also been changed?

  • what happens when an element is moved and also copied?

  • when an element is deleted once and added in many places, is that a move?

  • when an element is added once and deleted in many places, is that a move? which elements correspond?

This leads us to the conclusion that, for reliable move detection, each element should be identified so that moves could be detected accurately. In this case it is in fact quite possible to use DeltaXML and, by simply processing the delta file, to detect and mark these moves.

How to identify moves by processing the delta file

If you have, in your files, a reliable way to detect which elements are the same in the two files, then moves can be identified by processing the delta file. For example, if you have an ID attribute on elements, these can be used in this way. Let's look at an example where a set of work and home contacts are represented in an XML file.

Example 1: an XML file representing a set of contacts (r1.xml in the sample resources on Bitbucket)

CODE
<records xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"> 
 <work-contacts> 
  <contact ID="johnsmith" deltaxml:key="johnsmith"> 
   <name>John Smith</name> 
   <phone type="office">+44 200 1234 567</phone> 
   <phone type="fax">+44 200 1234 568</phone> 
   <phone type="mobile">+44 200 1234 569</phone> 
  </contact> 
  <contact ID="markjones" deltaxml:key="markjones"> 
   <name>Mark Jones</name> 
   <phone type="office">+44 200 1234 599</phone> 
   <phone type="fax">+44 200 1234 599</phone> 
   <phone type="mobile">+44 200 1234 500</phone> 
  </contact> 
 </work-contacts> 
 <home-contacts> 
 </home-contacts> 
</records>

And, in a modified version we have moved John Smith to be a home contact, and we have updated his phone number.

Example 2: an updated version of the contacts file (r2.xml in the sample resources on Bitbucket)

CODE
<records xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"> 
 <work-contacts> 
  <contact ID="markjones" deltaxml:key="markjones"> 
   <name>Mark Jones</name> 
   <phone type="office">+44 200 1234 599</phone> 
   <phone type="fax">+44 200 1234 599</phone> 
   <phone type="mobile">+44 200 1234 500</phone> 
  </contact> 
 </work-contacts> 
 <home-contacts> 
  <contact ID="johnsmith" deltaxml:key="johnsmith"> 
   <name>John Smith</name> 
   <phone type="office">+44 200 1234 555</phone> 
   <phone type="fax">+44 200 1234 568</phone> 
   <phone type="mobile">+44 200 1234 569</phone> 
  </contact> 
 </home-contacts> 
</records>

When these are compared, we get the following delta file.

Example 3: a delta file representing the changes between the contact list versions

CODE
<records deltaxml:deltaV2="A!=B" deltaxml:version="2.0" deltaxml:content-type="full-context">
 <work-contacts deltaxml:deltaV2="A!=B">
  <contact deltaxml:deltaV2="A" ID="johnsmith" deltaxml:key="johnsmith">
   <name>John Smith</name>
   <phone type="office">+44 200 1234 567</phone>
   <phone type="fax">+44 200 1234 568</phone>
   <phone type="mobile">+44 200 1234 569</phone>
  </contact>
  <contact deltaxml:deltaV2="A=B" ID="markjones" deltaxml:key="markjones">
   <name>Mark Jones</name>
   <phone type="office">+44 200 1234 599</phone>
   <phone type="fax">+44 200 1234 599</phone>
   <phone type="mobile">+44 200 1234 500</phone>
  </contact>
 </work-contacts>
 <home-contacts deltaxml:deltaV2="A!=B">
  <contact deltaxml:deltaV2="B" ID="johnsmith" deltaxml:key="johnsmith">
   <name>John Smith</name>
   <phone type="office">+44 200 1234 555</phone>
   <phone type="fax">+44 200 1234 568</phone>
   <phone type="mobile">+44 200 1234 569</phone>
  </contact>
 </home-contacts>
</records>

The move is obvious by inspection here because we have an element with an ID="johnsmith" and a deltaxml:delta="B" attribute on it, and another with the same ID attribute and a deltaxml:delta="A" attribute. By processing the delta file, these situations can be detected and the elements marked to indicate that they represent a move operation. Note that the phone number is not identified as changed, but the new number is present in the moved data, so the data is correct.

The above example also demonstrates the use of keys in the matching process. We would recommend that moved elements should also be keyed for consistent move results. For details of keying please see Using Keys with Ordered Data.

From XML Compare Release 5.3.3 onwards it is possible to call the compare function from within an XSLT stylesheet, and this would enable an actual comparison to be made for these moved objects during the processing of the delta file. An XSLT stylesheet for processing this example data is presented below.

Example 4: an XSLT stylesheet that detects moves in the contact list (handle-moves.xsl in the sample resources on Bitbucket)

CODE
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
  xmlns:deltajava="java:com.deltaxml.ext.xslt.saxon.Comparison"
  version="2.0">
  
  <!-- This filter detects and processes contact 'moves'.  A move is defined
    as exactly one added and one deleted contact with the same ID attribute.
    Keys are used to provide efficiency and count adds/deletes.
    We support moves anywhere within the document.  -->
  
  <xsl:key name="deletedContacts" match="contact[@deltaxml:deltaV2='A']" use="@ID"/>
  <xsl:key name="addedContacts" match="contact[@deltaxml:deltaV2='B']" use="@ID"/>
  
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>
  
  <xsl:template match="contact[@deltaxml:deltaV2='B']
                              [count(key('deletedContacts', @ID)) eq 1 and count(key('addedContacts', @ID)) eq 1]">
    <xsl:copy>
      <!-- if you don't care about changes to moved contacts you could just apply-templates now,
            otherwise need to re-compare the added/deleted contact using the compare extension function -->
      <xsl:variable name="a" select="key('deletedContacts', @ID)" as="node()"/>
      <xsl:variable name="b" select="." as="node()"/>
      <xsl:variable name="delta" as="node()"
          select="deltajava:compare($a, $b, resolve-uri('process-moves.dxp', static-base-uri()))"/>
      <xsl:apply-templates select="$delta/*/@*"/>
      <xsl:comment>This contact element was detected as a move</xsl:comment>
      <xsl:apply-templates select="$delta/*/node()"/>
    </xsl:copy>
  </xsl:template>
  
  <xsl:template match="contact[@deltaxml:deltaV2='A']
                              [count(key('deletedContacts', @ID)) eq 1 and count(key('addedContacts', @ID)) eq 1]">
    <xsl:comment>A contact element with id <xsl:value-of select="@ID"/> was at this location but has been moved</xsl:comment>
  </xsl:template>
</xsl:stylesheet>

When this stylesheet is used to process the previous delta, the improved delta with the moves detected and compared becomes:

CODE
<records xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1" deltaxml:deltaV2="A!=B" 
         deltaxml:version="2.0" deltaxml:content-type="full-context">
  <work-contacts deltaxml:deltaV2="A!=B">
    <!--A contact element with id johnsmith was at this location but has been moved-->
    <contact deltaxml:deltaV2="A=B" ID="markjones" deltaxml:key="markjones">
      <name>Mark Jones</name>
      <phone type="office">+44 200 1234 599</phone>
      <phone type="fax">+44 200 1234 599</phone>
      <phone type="mobile">+44 200 1234 500</phone>
    </contact>
  </work-contacts>
  <home-contacts deltaxml:deltaV2="A!=B">
    <contact deltaxml:deltaV2="A!=B" ID="johnsmith" deltaxml:key="johnsmith">
      <!--This contact element was detected as a move-->
      <name deltaxml:deltaV2="A=B">John Smith</name>
      <phone deltaxml:deltaV2="A!=B" type="office">
        <deltaxml:textGroup deltaxml:deltaV2="A!=B">
          <deltaxml:text deltaxml:deltaV2="A">+44 200 1234 567</deltaxml:text>
          <deltaxml:text deltaxml:deltaV2="B">+44 200 1234 555</deltaxml:text>
        </deltaxml:textGroup>
      </phone>
      <phone deltaxml:deltaV2="A=B" type="fax">+44 200 1234 568</phone>
      <phone deltaxml:deltaV2="A=B" type="mobile">+44 200 1234 569</phone>
    </contact>
  </home-contacts>
</records>

DeltaXML allows keys to control the comparison process and these would typically be useful for detecting moves, as shown in the use of keys in the above example.

How to identify moves by pre-processing the data

Another approach to this problem is to generate, from the original data, a file containing the elements that you know may have moved and that you wish to be compared.

Using the above example, we could filter the original files to generate these files. Here we have removed the <work-contacts> and<home-contacts> wrapper elements and just listed the <contact> elements. We have added a path attribute (or we could have used a sub-element) to record the original ancestors.

CODE
<records xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1" 
    deltaxml:ordered="false"> 
 <contact ID="johnsmith" deltaxml:key="johnsmith" path="records/work-contacts/"> 
   <name>John Smith</name> 
   <phone type="office">+44 200 1234 567</phone> 
   <phone type="fax">+44 200 1234 568</phone> 
   <phone type="mobile">+44 200 1234 569</phone> 
  </contact> 
  <contact ID="markjones" deltaxml:key="markjones" path="records/work-contacts/"> 
   <name>Mark Jones</name> 
   <phone type="office">+44 200 1234 599</phone> 
   <phone type="fax">+44 200 1234 599</phone> 
   <phone type="mobile">+44 200 1234 500</phone> 
  </contact> 
</records>

And, in the modified version we have this:

CODE
<records xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1" 
    deltaxml:ordered="false"> 
 <contact ID="markjones" deltaxml:key="markjones" path="records/work-contacts/"> 
   <name>Mark Jones</name> 
   <phone type="office">+44 200 1234 599</phone> 
   <phone type="fax">+44 200 1234 599</phone> 
   <phone type="mobile">+44 200 1234 500</phone> 
  </contact> 
 <contact ID="johnsmith"  deltaxml:key="johnsmith" path="records/home-contacts/"> 
   <name>John Smith</name> 
   <phone type="office">+44 200 1234 555</phone> 
   <phone type="fax">+44 200 1234 568</phone> 
   <phone type="mobile">+44 200 1234 569</phone> 
  </contact> 
</records>

Note that we have added deltaxml:ordered="false" to the root element because the <contact> elements could appear in any order. An alternative is to sort them by key and this may be easier to check. When these are compared we will see the changes to each contact in terms of its contents and its position in the original file.

CODE
<records deltaxml:deltaV2="A!=B" deltaxml:ordered="false" deltaxml:version="2.0" 
 deltaxml:content-type="full-context">
 <contact deltaxml:deltaV2="A!=B" deltaxml:key="johnsmith" ID="johnsmith">
  <deltaxml:attributes deltaxml:deltaV2="A!=B">
   <dxa:path deltaxml:deltaV2="A!=B">
    <deltaxml:attributeValue deltaxml:deltaV2="A">records/work-contacts/</deltaxml:attributeValue>
    <deltaxml:attributeValue deltaxml:deltaV2="B">records/home-contacts/</deltaxml:attributeValue>
   </dxa:path>
  </deltaxml:attributes>
  <name deltaxml:deltaV2="A=B">John Smith</name>
  <phone deltaxml:deltaV2="A!=B" type="office">
   <deltaxml:textGroup deltaxml:deltaV2="A!=B">
    <deltaxml:text deltaxml:deltaV2="A">+44 200 1234 567</deltaxml:text>
    <deltaxml:text deltaxml:deltaV2="B">+44 200 1234 555</deltaxml:text>
   </deltaxml:textGroup>
  </phone>
  <phone deltaxml:deltaV2="A=B" type="fax">+44 200 1234 568</phone>
  <phone deltaxml:deltaV2="A=B" type="mobile">+44 200 1234 569</phone>
 </contact>
 <contact deltaxml:deltaV2="A=B" ID="markjones" deltaxml:key="markjones" path="records/work-contacts/">
  <name>Mark Jones</name>
  <phone type="office">+44 200 1234 599</phone>
  <phone type="fax">+44 200 1234 599</phone>
  <phone type="mobile">+44 200 1234 500</phone>
 </contact>
</records>

This delta provides details only of the changed contacts and those that have been moved. The moves are indicated by a change in the path attribute.

DeltaXML provides a versatile solution to handling additions and deletions and can be configured to show moves as indicated above.

Running the sample

Using Java

Download the sample from https://bitbucket.org/deltaxml/detecting-and-handling-moves

Details of how to run the sample are given in the file README.md.

JavaScript errors detected

Please note, these errors can depend on your browser setup.

If this problem persists, please contact our support.