The XML files that you are comparing may contain data that you expect to change. You may wish to ignore these changes. From release 5.1 of XML Compare, XSLT filters are provided to allow you to ignore selected changes. This makes it easy to generate some forms of output from the delta file. The last section in this document describes some use cases for this which allow you to "merge" two documents in a controlled way.
What does "ignore" really mean?
First, we need to ask the question: What is meant by "ignore"?
Consider this very simple example of attribute change:
Ignore could mean:
- remove it completely from the result:
- prefer the 'A' or 'old' value:
- prefer the 'B' or 'new' value:
- take the average of any values with numerical/time data types:
- put in a difference marker:
- find some way to represent them both:
All of these approaches are possible using an output filter. However, this document concentrates on a generic approach and describes the filters included in XML Compare since release 5.1, which implement the first three strategies above.
This document discusses how you might handle merges using two sets of input data; one data-centric and one document-centric. Two practical solutions are presented, one for each input data set, with each solution using a different comparator and method for customising a comparison:
- Pipelined Comparator (DXP) - Uses a filter pipeline defined by an XML file called a 'DXP' to customise the comparison.
- Document Comparator - Uses Java API calls to customise a pre-existing pipeline with a number of extension points. The Document Comparator provides a solution tailored to comparing structured documents.
Imagine comparing the following two inputs, with the intention of ignoring the change made to the lastUpdated attribute:
Example 1.1: a small address book as an XML file (documentA.xml in the sample on Bitbucket)
Example 2.1: an updated version of the address book (documentB.xml in the sample on Bitbucket)
XML Compare will produce the following delta:
This shows the changes represented in our deltaV2 format. While this may look overly complicated for such a simple change, it makes the job of processing it considerably easier. A side-effect of representing attribute changes as elements is the addition of the dxa namespace. A non-qualified attribute is not in the namespace of the document but in an anonymous one, so this anonymous namespace needs to be represented.
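To make the attribute-as-element representation concrete, here is a minimal Python sketch that reads the old and new values of a changed attribute out of a delta. The fragment is hand-written for illustration and is not real XML Compare output. The namespace URIs, the deltaxml:attributeValue element name, the contact and name elements and the date values are all assumptions that should be checked against a delta produced by your own release.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; verify them against a real delta file.
DELTAXML = "http://www.deltaxml.com/ns/well-formed-delta-v1"
DXA = "http://www.deltaxml.com/ns/non-namespaced-attribute"

# Hand-written, hypothetical fragment (not real XML Compare output).
DELTA = f"""
<contact xmlns:deltaxml="{DELTAXML}" xmlns:dxa="{DXA}" deltaxml:deltaV2="A!=B">
  <deltaxml:attributes deltaxml:deltaV2="A!=B">
    <dxa:lastUpdated deltaxml:deltaV2="A!=B">
      <deltaxml:attributeValue deltaxml:deltaV2="A">2007-05-01</deltaxml:attributeValue>
      <deltaxml:attributeValue deltaxml:deltaV2="B">2008-11-23</deltaxml:attributeValue>
    </dxa:lastUpdated>
  </deltaxml:attributes>
  <name>Jane Example</name>
</contact>
"""


def changed_attributes(delta_root):
    """Map each changed attribute name to its {'A': old, 'B': new} values."""
    changes = {}
    for holder in delta_root.iter(f"{{{DELTAXML}}}attributes"):
        for attr in holder:
            # ElementTree tags look like '{namespace}local'.
            namespace, local = attr.tag[1:].split("}", 1)
            # The dxa namespace stands in for 'no namespace at all'.
            name = local if namespace == DXA else f"{{{namespace}}}{local}"
            changes[name] = {
                value.get(f"{{{DELTAXML}}}deltaV2"): value.text
                for value in attr.iter(f"{{{DELTAXML}}}attributeValue")
            }
    return changes


root = ET.fromstring(DELTA)
changes = changed_attributes(root)
print(changes)
```

Mapping the dxa namespace back to "no namespace" in the sketch mirrors the point made above: the anonymous namespace exists only so that the delta can represent a non-qualified attribute as an element.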
Imagine comparing the following two inputs, with the intention of ignoring the change made to the revision attribute of the author, and also the date elements:
Example 1.2: the author information from a DocBook file (document/documentA.xml in the sample on Bitbucket)
Example 2.2: an updated version of the author information with changed telephone numbers and updated dates (document/documentB.xml in the sample on Bitbucket)
XML Compare will produce the following delta:
This shows the changes represented in our deltaV2 format. While this may look overly complicated for such a simple change, it makes the job of processing it considerably easier. A side-effect of representing attribute changes as elements is the addition of the dxa namespace. A non-qualified attribute is not in the namespace of the document but in an anonymous one, so this anonymous namespace needs to be represented. The implication is that when promoting this attribute we need to make sure that it is placed in the correct namespace.
Marking data that needs to be ignored
Next we need to mark the data to be ignored. This is achieved by placing the deltaxml:ignore-changes attribute on the following:
- to ignore an attribute change: on the appropriate child of deltaxml:attributes which represents the attribute you wish to ignore,
- to ignore a sub-tree change: on the top most node in the sub-tree with a
- to ignore a text change: on the
By placing the deltaxml:ignore-changes='B,A' attribute, you instruct the apply-ignore-changes XSLT to change the delta of the modification to unchanged and to copy the new (B) version. If there is no new version (i.e. in the case of a deletion), the old (A) version is used. This behaviour can be controlled by using a different value for the deltaxml:ignore-changes attribute. The legal values are shown below:
| Value | Behaviour |
|---|---|
| "B,A" or "true" | Default. Copy new value if it exists, otherwise copy old value. |
| "A,B" | Copy old value if it exists, otherwise copy new value. |
| "A" | Copy old value if it exists, otherwise don’t output. |
| "B" | Copy new value if it exists, otherwise don’t output. |
| "" | Don’t copy under any circumstances (but process the subtree if present). |
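The table above can be read as an ordered preference list: try each listed version in turn and output the first one that exists. The following Python sketch encodes only that reading. The function name select_value and the use of None for "no such version" are illustrative choices, not part of XML Compare.

```python
from typing import Optional


def select_value(setting: str, a: Optional[str], b: Optional[str]) -> Optional[str]:
    """Pick the value to output for one ignored item.

    `a` and `b` are the old and new values, or None when that version
    does not exist. Returns None when nothing should be output.
    """
    if setting == "true":
        setting = "B,A"  # "true" is documented as a synonym for the default
    # "" splits into [""], which filters down to an empty preference list.
    preference = [version for version in setting.split(",") if version]
    for version in preference:
        if version not in ("A", "B"):
            raise ValueError(f"unsupported ignore-changes value: {setting!r}")
        value = a if version == "A" else b
        if value is not None:
            return value
    return None


print(select_value("B,A", "old", "new"))  # new
print(select_value("B,A", "old", None))   # old
print(select_value("A", None, "new"))     # None
```

Note that the sketch covers value selection only. The "" setting also has a structural effect (the subtree is still processed), which a per-value function like this cannot express.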
The ignore-changes attribute can be added using an XSLT stylesheet.
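The effect of such a marking filter can also be imitated outside XSLT, which may help when experimenting. The following is a minimal Python sketch, not the stylesheet shipped in the sample. It assumes the namespace URIs shown, a hand-written delta fragment, and that the elements to be ignored are not in a namespace.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; verify them against a real delta file.
DELTAXML = "http://www.deltaxml.com/ns/well-formed-delta-v1"
DXA = "http://www.deltaxml.com/ns/non-namespaced-attribute"
IGNORE = f"{{{DELTAXML}}}ignore-changes"

# Hand-written, hypothetical fragment (not real XML Compare output).
DELTA = f"""
<contact xmlns:deltaxml="{DELTAXML}" xmlns:dxa="{DXA}" deltaxml:deltaV2="A!=B">
  <deltaxml:attributes deltaxml:deltaV2="A!=B">
    <dxa:lastUpdated deltaxml:deltaV2="A!=B">
      <deltaxml:attributeValue deltaxml:deltaV2="A">2007-05-01</deltaxml:attributeValue>
      <deltaxml:attributeValue deltaxml:deltaV2="B">2008-11-23</deltaxml:attributeValue>
    </dxa:lastUpdated>
  </deltaxml:attributes>
  <lastLoggedIn deltaxml:deltaV2="A!=B">
    <deltaxml:textGroup deltaxml:deltaV2="A!=B">
      <deltaxml:text deltaxml:deltaV2="A">Monday</deltaxml:text>
      <deltaxml:text deltaxml:deltaV2="B">Friday</deltaxml:text>
    </deltaxml:textGroup>
  </lastLoggedIn>
</contact>
"""


def mark_ignored(delta_root, attributes=(), elements=(), setting="B,A"):
    """Add deltaxml:ignore-changes to the named attribute and element nodes.

    Attribute changes are marked on the dxa:* child of deltaxml:attributes;
    element (sub-tree) changes are marked on the element itself.
    Returns the number of nodes marked.
    """
    targets = {f"{{{DXA}}}{name}" for name in attributes} | set(elements)
    marked = 0
    for node in delta_root.iter():
        if node.tag in targets:
            node.set(IGNORE, setting)
            marked += 1
    return marked


root = ET.fromstring(DELTA)
marked = mark_ignored(root, attributes=["lastUpdated"], elements=["lastLoggedIn"])
print(marked)  # 2
```

In a real pipeline the marking would normally stay in XSLT so that it can run as an output filter alongside the supplied apply and propagate filters.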
Note that if you want to ignore specific changes to comments or processing instructions, you will need to change the lexical preservation settings on the Comparator. See the Preserving Processing Instructions and Comments sample for more information.
An example for ignoring changes to the lastUpdated attribute and lastLoggedIn element is included below.
Example 3.1: an XSLT stylesheet to mark parts of the address book to be ignored (mark-ignore-changes.xsl in the sample on Bitbucket)
An example for ignoring changes to the version attribute and date elements is included below.
Example 3.2: an XSLT stylesheet to mark parts of the DocBook document to be ignored (document/mark-ignore-changes.xsl in the sample on Bitbucket)
After the delta has been marked with the changes that should be ignored, using a filter similar to the one above, running apply-ignore-changes.xsl and then propagate-ignore-changes.xsl will process the delta, ignoring the marked data. The filter dx2-extract-version-moded.xsl is imported by apply-ignore-changes.xsl. All of these filters are supplied with XML Compare 5.1 and later.
The examples used in this document are available for your own experimentation in the Ignoring Changes repo on Bitbucket (suitable for versions 5.1 and above). The sample shows how to ignore both element and attribute changes. It provides two examples, one using the Pipelined Comparator and one using the Document Comparator, of how to construct the pipeline of output filters described here.
Running the sample code
Download the sample resources into the XML Compare release directory under the samples directory. The resources should be located such that they are two levels below the top level release directory that contains the jar files. For example
Full instructions for running the sample are given in the README.md file, which is displayed under the source in Bitbucket.
Ignore processing in further detail
This section provides some rules and further details about how ignore-changes processing works, particularly the apply-ignore-changes.xsl filter.
Every element in the post-comparison XML tree has an 'effective' deltaxml:deltaV2 attribute which (a) specifies which of the inputs it was present in and (b) whether or not the elements were identical, if present in both inputs. The word effective is used because if you are in an unchanged, added or deleted sub-tree the deltaV2 attribute may only be on an ancestor element.
An element may also have an ancestor ignore-changes attribute. The closest such ancestor is used when determining whether an element is included in the result.
Like most filters, some data flows through unaffected. In this case, if an element does not have an ancestor ignore-changes attribute it is copied to the result as-is.
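The two lookups described above, the effective deltaV2 value and the closest ignore-changes setting, both amount to walking up the tree until the attribute is found. Here is a minimal Python sketch of that walk. It assumes the namespace URI shown, checks the element itself before its ancestors, and uses a hand-written fragment with hypothetical element names.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; verify it against a real delta file.
DELTAXML = "http://www.deltaxml.com/ns/well-formed-delta-v1"
DELTA_V2 = f"{{{DELTAXML}}}deltaV2"
IGNORE = f"{{{DELTAXML}}}ignore-changes"

# Hand-written, hypothetical fragment (not real XML Compare output).
DELTA = f"""
<book xmlns:deltaxml="{DELTAXML}" deltaxml:deltaV2="A!=B">
  <chapter deltaxml:deltaV2="B" deltaxml:ignore-changes="A">
    <para>new text</para>
  </chapter>
  <appendix deltaxml:deltaV2="A=B">
    <para>same text</para>
  </appendix>
</book>
"""


def nearest(element, parents, attribute):
    """Value of `attribute` on the element or its closest ancestor, else None."""
    node = element
    while node is not None:
        if attribute in node.attrib:  # membership test so that "" still counts
            return node.attrib[attribute]
        node = parents.get(node)
    return None


root = ET.fromstring(DELTA)
# ElementTree has no parent pointers, so build a child -> parent map once.
parents = {child: parent for parent in root.iter() for child in parent}

new_para = root.find("chapter/para")
same_para = root.find("appendix/para")
print(nearest(new_para, parents, DELTA_V2), nearest(new_para, parents, IGNORE))    # B A
print(nearest(same_para, parents, DELTA_V2), nearest(same_para, parents, IGNORE))  # A=B None
```

The second para has no governing ignore-changes attribute, so under the rule above it would flow through the filter unchanged.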
When it does have an ancestor ignore-changes attribute, the following table specifies whether that element appears in the result:
The only difference in behaviour between A,B and B,A occurs at the leaves of the XML tree (i.e. for changed text and attributes). When there are two possible text values in a textGroup, or two possible attribute values, the choice between these settings determines which of the two values is used in the result.
Ignore changes and attributes
There are some issues related to the closest ancestor rule outlined above when considering attributes. Attributes need to be attached to their parent element. If the ignore-change settings specify that an element is not included, neither are any of its attributes irrespective of their ignore change settings. Here is an example:
Normally we would expect y='24' to appear in the result if we look solely at the attribute and its local ignore-changes and deltaV2 attributes. However, the ignore-changes setting on the element x means that the attribute has lost its associated parent element and therefore cannot appear in the result.
Ignore changes and element removal
It is possible to use ignore changes at the element level as well as for simple attribute and text data. This is used for merging as discussed below and can also be used to remove elements from the result. Here are two examples, firstly removing a child element:
In the above example the ignore-changes setting prevents the z element appearing in the result. Note that as well as occurring at the bottom of a hierarchy, this can also occur within a hierarchy. Here is another example:
The ignore-changes settings preclude the section appearing in the result, but the same is not true for the pagebreak element, which is effectively promoted in the result of the filter:
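As a rough model of the promotion behaviour described above, the following Python sketch decides only whether each element is present in the result. It is a simplification under stated assumptions, not the logic of apply-ignore-changes.xsl: it does not rewrite deltaV2 attributes or choose between leaf values, the visibility rule is inferred from the table of legal values, and the namespace URI and the fragment are hand-written for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; verify it against a real delta file.
DELTAXML = "http://www.deltaxml.com/ns/well-formed-delta-v1"
DELTA_V2 = f"{{{DELTAXML}}}deltaV2"
IGNORE = f"{{{DELTAXML}}}ignore-changes"

# Hand-written, hypothetical fragment (not real XML Compare output).
DELTA = f"""
<chapter xmlns:deltaxml="{DELTAXML}" deltaxml:deltaV2="A!=B">
  <section deltaxml:deltaV2="A=B" deltaxml:ignore-changes="">
    <para>dropped together with its section</para>
    <pagebreak deltaxml:ignore-changes="B,A"/>
  </section>
  <para deltaxml:deltaV2="B">added in B and not marked</para>
</chapter>
"""


def visible(setting, delta):
    """True if an element with this effective deltaV2 survives under `setting`."""
    if setting is None:  # no governing ignore-changes: copied as-is
        return True
    if setting == "true":
        setting = "B,A"
    wanted = {version for version in setting.split(",") if version}  # "" -> empty
    present = {"A", "B"} if delta in ("A=B", "A!=B") else {delta}
    return bool(wanted & present)


def apply_ignore(element, setting=None, delta=None):
    """Return the list of result elements that replace `element`."""
    # An element's own attributes override the values inherited from ancestors.
    setting = element.attrib.get(IGNORE, setting)
    delta = element.attrib.get(DELTA_V2, delta)
    kept = [node for child in element for node in apply_ignore(child, setting, delta)]
    if not visible(setting, delta):
        return kept  # the element is dropped; surviving descendants are promoted
    result = ET.Element(
        element.tag, {k: v for k, v in element.attrib.items() if k != IGNORE}
    )
    result.text, result.tail = element.text, element.tail
    result.extend(kept)
    return [result]


result = apply_ignore(ET.fromstring(DELTA))[0]
print([child.tag for child in result])  # ['pagebreak', 'para']
```

In this model the section and its unmarked para disappear, while the pagebreak, which carries its own setting, moves up to become a child of chapter.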
How to merge two documents using deltaxml:ignore-changes
In the examples above, the deltaxml:ignore-changes attribute is applied to individual elements or attributes in the data. However, it can also be applied to a subtree or indeed the entire document. When applied to a subtree, changes in that subtree are removed and therefore the only deltaxml:deltaV2 attribute will be located at the top of the subtree. If other parts of the file are not marked and processed then any deltaV2 markup remains, describing the changes to those parts of the tree.
If, for example, you place the attribute deltaxml:ignore-changes="B,A" on the root element, then you will get a merge of the two documents, with the B values of data (attributes, text etc.) being used in precedence to the A values when both are present. The following 'mark' stylesheet will match the root element and add this attribute:
When this is processed with the usual 'mark', 'apply', 'propagate' chain of filters, the result will only have a deltaxml:deltaV2='A=B' attribute on the root element of the result tree and all other change markup will have been removed. The clean-house.xsl filter could then be used to remove this attribute and other delta attributes, giving a result very close in style to the original inputs but with as much content from both inputs as possible.
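The final clean-up step can be pictured as stripping every attribute in the deltaxml namespace from the merged tree. The Python sketch below imitates only that effect. It is not clean-house.xsl, which may do more, and the namespace URI, element names and attribute value are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; verify it against a real delta file.
DELTAXML = "http://www.deltaxml.com/ns/well-formed-delta-v1"


def strip_delta_attributes(root):
    """Remove every attribute in the deltaxml namespace, in place."""
    prefix = f"{{{DELTAXML}}}"
    for element in root.iter():
        # Collect names first so the dict is not modified while iterating.
        for name in [n for n in element.attrib if n.startswith(prefix)]:
            del element.attrib[name]
    return root


# Hypothetical merged result, as it might look after the mark, apply
# and propagate filters have run with "B,A" on the root element.
merged = ET.fromstring(
    f'<addressBook xmlns:deltaxml="{DELTAXML}" deltaxml:deltaV2="A=B">'
    '<contact lastUpdated="2008-11-23"/>'
    "</addressBook>"
)
strip_delta_attributes(merged)
print(ET.tostring(merged, encoding="unicode"))
```

Ordinary attributes such as lastUpdated are untouched, so the output resembles the original inputs rather than a delta.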