XML Compare is capable of comparing XML files that have some of their elements 'unordered', i.e. the elements in the two files being compared may appear in any order (referred to as an orderless comparison). Specific attributes need to be added to the files to achieve this, and one convenient way to do this is to use XSLT.
This "how to" guide uses this ability to find the differences between XML Schema definition files. It steps through a number of worked examples based on XML Schemas so that you can see how to use XSL stylesheets to automatically add keys to your XML Schema files, allowing them to be processed using the XML Compare differencing engine.
For further details of how the orderless comparison works in XML Compare, please see How to Compare Orderless Elements.
1.1. White space in examples
Most of the examples in this paper are pretty-printed, so the white space is present for display purposes only and is not part of the XML files. It is good practice when comparing files to remove all such white space before the comparison, e.g. using an XSL script.
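A white space removal script of the kind mentioned above can be very small. The following is a minimal sketch, not the script shipped with the product: an identity transform combined with xsl:strip-space, which drops text nodes that consist only of white space.

```xml
<!-- Sketch only: removes white-space-only text nodes and copies everything else. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Strip white-space-only text nodes from every element of the input. -->
  <xsl:strip-space elements="*"/>

  <!-- Do not re-introduce indentation on output. -->
  <xsl:output method="xml" indent="no"/>

  <!-- Identity template: copy all remaining nodes and attributes unchanged. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Note that this also removes white-space-only text between inline elements in mixed content, so it is best suited to data-oriented files such as the schemas used in this guide.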
2. Worked example using XML Schema
Our intention is to provide an XSL filter that will add the required attributes for a specific type of XML file. As an example, we will use XML Schema and develop an XSL filter to add keys to any XML Schema file so that XML Compare will be able to compare two XML Schema files where, for example, the complex type definitions may appear in any order. In addition, after we have added these attributes, XML Compare will be able to determine that changes in the order of items in a <choice> element are not important and can be ignored. In this way, using a simple XSL filter, XML Compare can be configured to provide a very intelligent comparison of XML Schema files.
The first step is to determine where keys need to be added. By default, XML data is taken to be ordered, so we need to indicate to XML Compare the cases where the elements are not ordered. We therefore begin by examining the DTD or Schema to find where this is the case (do not get confused here: as we are using XML Schema as an example, this means we will look at the DTD for XML Schema or the Schema for XML Schema). We will use the DTD in this case as it is more compact, but the same principles would apply if a Schema had been used. Unfortunately, Schema does not provide any way to indicate whether particular elements are ordered or not, so it is necessary to use domain knowledge to determine this.
We can examine the elements in alphabetical order, having expanded out all the entities so that we see how the real content of the element is defined. In the examples below, some analysis has already been done to determine which elements or content particles are unordered.
2.1. <all> element
Example 1. Definition of the 'all' element
In the case of the <all> element, its contents consist of a single optional annotation and zero or more <element> elements. The <element> elements are not ordered - they could appear in any order and would have the same meaning.
This means that for every occurrence of an <all> element in our schema files being compared, we need to add a deltaxml:ordered="false" attribute. This can be achieved in XSL as follows.
Example 2. Adding a deltaxml:ordered="false" attribute to all <all> elements
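A minimal sketch of such a template is given here. It assumes that the deltaxml prefix is bound to the namespace URI http://www.deltaxml.com/ns/well-formed-delta-v1 (check this against your XML Compare release) and uses xs as the stylesheet prefix for the XML Schema namespace; the input files may use any prefix for that namespace.

```xml
<!-- Sketch only: the deltaxml namespace URI below is an assumption. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">

  <!-- Identity template: copy anything that no other template handles. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Mark the content of every xs:all element as unordered. -->
  <xsl:template match="xs:all">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="deltaxml:ordered">false</xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The new attribute is written before the child nodes are processed because XSLT requires all attributes of an element to be created before any of its children.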
The single <annotation> element should appear first, but we cannot enforce this with XML Compare. We can however ensure that the annotation elements are matched up by giving them the same key. This can be achieved by adding a template in the XSL to match the <annotation> element (there will always be at most one of these) within an <all> element and add this key, as follows.
Example 3. Adding a deltaxml:key="single" attribute to <annotation> within <all>
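One way to write this template is sketched here, under the same assumptions as before (the deltaxml namespace URI is assumed, and xs is simply the stylesheet's own prefix for the XML Schema namespace).

```xml
<!-- Sketch only: the deltaxml namespace URI below is an assumption. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">

  <!-- Identity template: copy anything that no other template handles. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Give the (at most one) xs:annotation inside xs:all a constant key,
       so that the annotations in the two files are matched with each other. -->
  <xsl:template match="xs:all/xs:annotation">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="deltaxml:key">single</xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the pattern xs:all/xs:annotation is more specific than the identity pattern, it takes precedence for these elements, while annotations elsewhere are copied unchanged.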
Notice here that we do not always want to do this because annotation is in some cases allowed to occur many times, so this template matches only <annotation> elements within <all> elements. The reason for using the value "single" for the key is simply that it enables us to re-use this template for other similar situations, as you will see later.
Next, we need to add deltaxml:key attributes for the <element> elements within <all>. The key to these will be either the @ref (the attribute named "ref") or the @name attribute. This means that the value of the deltaxml:key attribute needs to be a copy of the value of whichever of these attributes is present. As it is illegal for both @name and @ref to appear together, we can represent this in XSL as follows.
Example 4. Adding a deltaxml:key="XX" attribute to <element> within <all>
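The following sketch shows one way to copy @name or @ref into the key. It relies on the fact stated above that the two attributes never appear together, so the union @name | @ref selects exactly one attribute. The deltaxml namespace URI is again an assumption.

```xml
<!-- Sketch only: the deltaxml namespace URI below is an assumption. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">

  <!-- Identity template: copy anything that no other template handles. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Key each xs:element inside xs:all by its name or ref attribute. -->
  <xsl:template match="xs:all/xs:element">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="deltaxml:key">
        <!-- Only one of the two attributes can be present. -->
        <xsl:value-of select="@name | @ref"/>
      </xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```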
We will see later that there are other situations where we need to do the same thing, and this can be achieved by changing the match attribute on this template.
2.2. <annotation> element
The next element that has repeated items in its content is <annotation>.
Example 5. Definition of <annotation>
In this case the content is ordered, so we do not need to add any deltaxml:ordered attribute to this element.
2.3. <appinfo> element
Example 6. Definition of <appinfo>
The <appinfo> element has ANY content, which allows both text and elements. This is considered to be ordered and so no deltaxml:ordered attribute needs to be added.
2.4. <attributeGroup> element
The <attributeGroup> element is a little more complicated.
Example 7. Definition of <attributeGroup>
Here we have an optional <annotation> element and an optional <anyAttribute> element in addition to the repeated <attribute> and <attributeGroup> elements. Because some of the repeated items are unordered we need to indicate that the <attributeGroup> element is unordered, by adding a deltaxml:ordered="false" attribute to it.
We can treat the optional <annotation> element in the same way as before, but we do not need to create a new template, we can simply change the match on the existing one. In fact we can change it at the same time to cater also for the <anyAttribute> element in this situation.
Example 8. Modifying the template match to add deltaxml:key="single" attribute
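As a sketch, the extended match can be written as a union of patterns, one per context in which a single optional element occurs. The deltaxml namespace URI is an assumption, as before.

```xml
<!-- Sketch only: the deltaxml namespace URI below is an assumption. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">

  <!-- Identity template: copy anything that no other template handles. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Elements that occur at most once within an unordered parent
       all receive the same constant key. -->
  <xsl:template match="xs:all/xs:annotation
                     | xs:attributeGroup/xs:annotation
                     | xs:attributeGroup/xs:anyAttribute">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="deltaxml:key">single</xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Further contexts can be added later simply by extending the union in the match attribute.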
The <attribute> and <attributeGroup> elements have keys of @name or @ref in the same way as the <element> element above. So, again, we can change the match value to cater for these situations also.
Example 9. Modifying the template match to add deltaxml:key="XX" attribute
However, there is a problem here and the above will not work correctly. The problem is that an <attributeGroup> element may be matched by two templates now, depending on where it appears. If it appears within another <attributeGroup> element, it will match the second template, otherwise it will match the first. But we need to apply a deltaxml:ordered="false" attribute in both cases, and this is not done in the second case.
The solution is to split the second template into two, one for orderless elements which will add the deltaxml:ordered="false" attribute, the other for ordered elements which will not do this.
Example 10. Template to add deltaxml:key="XX" attribute for orderless elements
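A sketch of the orderless variant follows. It adds both the key and the deltaxml:ordered="false" attribute, so a nested <attributeGroup> is keyed and also marked as unordered. The deltaxml namespace URI is an assumption.

```xml
<!-- Sketch only: the deltaxml namespace URI below is an assumption. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">

  <!-- Identity template: copy anything that no other template handles. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Keyed elements whose own content is unordered:
       add the key and the ordered="false" marker together. -->
  <xsl:template match="xs:attributeGroup/xs:attributeGroup">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="deltaxml:key">
        <xsl:value-of select="@name | @ref"/>
      </xsl:attribute>
      <xsl:attribute name="deltaxml:ordered">false</xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```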
Example 11. Template to add deltaxml:key="XX" attribute for ordered elements
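The ordered variant can be sketched as follows. The match list reflects the elements discussed up to this point in the walkthrough and would be extended as further elements are analysed; the deltaxml namespace URI is an assumption.

```xml
<!-- Sketch only: the deltaxml namespace URI below is an assumption. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">

  <!-- Identity template: copy anything that no other template handles. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Keyed elements whose own content is treated as ordered:
       add the key only, with no deltaxml:ordered attribute. -->
  <xsl:template match="xs:all/xs:element
                     | xs:attributeGroup/xs:attribute">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="deltaxml:key">
        <xsl:value-of select="@name | @ref"/>
      </xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```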
Now the templates should behave as expected.
2.5. <choice> element
The <choice> element has <annotation> as a child and this can be dealt with as before.
Example 12. Definition of <choice>
The repeating <element> and <group> elements can also make use of previously-defined templates. The other repeating items can be identified by their @id attribute. As this is of type ID it will be unique across the whole file, which means it will also be unique within this element and so can be used as a key. The following new templates will achieve this. We need two templates to cater for ordered and unordered elements as discussed above.
Example 13. Templates to add deltaxml:key attribute using @id value
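A sketch of the two templates is given here. It assumes, based on the W3C definition of <choice>, that the remaining repeating children are <choice>, <sequence> and <any>, with a nested <choice> being unordered and the other two ordered. Since @id is optional, the key is added only when the attribute is present. The deltaxml namespace URI is also an assumption.

```xml
<!-- Sketch only: the deltaxml namespace URI and the list of matched
     children are assumptions. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">

  <!-- Identity template: copy anything that no other template handles. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Unordered child: key from @id (if present) plus ordered="false". -->
  <xsl:template match="xs:choice/xs:choice">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:if test="@id">
        <xsl:attribute name="deltaxml:key">
          <xsl:value-of select="@id"/>
        </xsl:attribute>
      </xsl:if>
      <xsl:attribute name="deltaxml:ordered">false</xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Ordered children: key from @id (if present) only. -->
  <xsl:template match="xs:choice/xs:sequence | xs:choice/xs:any">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:if test="@id">
        <xsl:attribute name="deltaxml:key">
          <xsl:value-of select="@id"/>
        </xsl:attribute>
      </xsl:if>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Elements without an @id receive no key from these templates, so changes to them would be reported as a delete and an add.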
2.6. <complexType> element
The <complexType> element has a more complex structure.
Example 14. Definition of <complexType>
The repeating group can be handled using existing templates. The other items occur as single elements and so again can make use of an existing template.
2.7. <documentation> element
The <documentation> element has ANY content and is ordered. Nothing needs to be added to the XSL filter.
Example 15. Definition of <documentation>
2.8. <element> element
The repeating items in <element> are unordered.
Example 16. Definition of <element>
We need a new template to cater for these three items and we need to amend the existing 'single' element template to add the <annotation>, <complexType> and <simpleType> to it.
Example 17. Templates to add deltaxml:key attribute using @name value
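The following sketch assumes that the three repeating items are <unique>, <key> and <keyref>, as in the W3C definition of <element>, each of which carries a @name attribute. In line with the later sections of this guide, <key> and <keyref> are treated as ordered and <unique> as unordered. The deltaxml namespace URI is an assumption.

```xml
<!-- Sketch only: the deltaxml namespace URI and the choice of matched
     children are assumptions. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">

  <!-- Identity template: copy anything that no other template handles. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Ordered children keyed by @name. -->
  <xsl:template match="xs:element/xs:key | xs:element/xs:keyref">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="deltaxml:key">
        <xsl:value-of select="@name"/>
      </xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Unordered child keyed by @name, also marked ordered="false". -->
  <xsl:template match="xs:element/xs:unique">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="deltaxml:key">
        <xsl:value-of select="@name"/>
      </xsl:attribute>
      <xsl:attribute name="deltaxml:ordered">false</xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```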
2.9. <extension> element
For the <extension> element, we can modify existing templates in a similar way to changes for <complexType>.
Example 18. Definition of <extension>
2.10. <key> element
The <key> element is ordered, so no changes are required to the template.
Example 19. Definition of <key>
2.11. <keyref> element
The <keyref> element is ordered, so no changes are needed to the XSL filter.
Example 20. Definition of <keyref>
2.12. <redefine> element
The <redefine> element is not ordered. The <annotation> element has no key and this means that any changes to an <annotation> element will result in a delete and an add rather than a modify. In general it is good practice to ensure that all unordered elements have some key to avoid this problem.
Example 21. Definition of <redefine>
The key @name can make use of the existing template that adds this key.
2.13. <restriction> element
The repeated items in <restriction> are not ordered, but there are only keys for two of the elements. Any changes to other elements will be handled as a delete and an add.
Example 22. Definition of <restriction>
2.14. <schema> element
The <schema> element has a rather convoluted DTD specification due to the wish to have <annotation> at any position. However, apart from <annotation>, most of the repeated elements have keys. It is not possible with DeltaXML to preserve the order of the <annotation> elements, but this is generally not important because <annotation> is permitted within each of the other elements.
Example 23. Definition of <schema>
2.15. <sequence> element
The <sequence> element is ordered, so no changes are needed to the XSL filter.
Example 24. Definition of <sequence>
2.16. <union> element
The <union> element is not ordered and we can make simple modifications to existing templates to cater for this.
Example 25. Definition of <union>
2.17. <unique> element
The <unique> element is not ordered and a new template is needed to cater for the @xpath key.
Example 26. Definition of <unique>
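A template for the @xpath key mentioned above might look like the following sketch. It assumes that the keyed children are the <selector> and <field> elements, which are the children that carry an @xpath attribute in the W3C definition; the deltaxml namespace URI is also an assumption.

```xml
<!-- Sketch only: the deltaxml namespace URI and the matched children
     are assumptions. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">

  <!-- Identity template: copy anything that no other template handles. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Key each selector or field by the value of its xpath attribute. -->
  <xsl:template match="xs:unique/xs:selector | xs:unique/xs:field">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="deltaxml:key">
        <xsl:value-of select="@xpath"/>
      </xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```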
3. Using the XSL filter
We can now look at the effect of this filter by considering some examples. File t1a.xml is a simple (and not correct or complete) Schema file.
Example 27. File t1a.xml
It is worth looking at the same file after it has passed through the XSL input filter we have developed. This is shown below.
Example 28. File t1a.xml after it has been filtered with XSL (pretty-printed)
This shows how the deltaxml:ordered and deltaxml:key attributes have been added to the file.
We can now make some changes to this file which are not 'real' changes and which we would like to be ignored by the comparator. First, we can change the order of the <element> definitions. Second, we can change the order of the elements in the <choice> in "test3". The new file is shown below.
Example 29. File t1b.xml
If we do not use the XSL filter we have developed, we get a large number of changes as shown below (the white space has been normalized before the comparison process).
Example 30. Comparison without using the XSL filter
If we now use the new XSL filter, then the result is as shown below.
Example 31. Comparison using the XSL filter (no changes as expected)
This shows no changes, as expected, because all we have done is to change the order of some of the elements where this is not significant.
If any 'real' changes are made these will be seen in the result.
4. Summary of steps for XSL template definition
The worked example above shows how an XSL template can be developed for a specific XML structure. These steps can be summarized as follows.
4.1. Expand DTD or Schema
It is first necessary to expand any entities or Schema structures that are used in the DTD or schema so that the real structure can be seen. If this step is not done it is likely that repeating content particles will not be correctly identified.
4.2. Identify repeating content particles
Each repeating content particle should be identified.
4.3. Determine which content particles are unordered
For each identified content particle, determine whether it is ordered or not. Note that MIXED and ANY content is always considered ordered because of the presence of PCDATA.
4.4. For each unordered content particle, add deltaxml:ordered attribute
For any element that has an unordered repeating content particle within it, a deltaxml:ordered="false" attribute needs to be added. Only a single template is needed for this but remember to add this attribute also in any other template that handles these elements.
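As a sketch of the single template described in this step, the unordered elements can be listed in one match pattern. The list used here is illustrative, taken from elements identified as unordered in the worked example, and the deltaxml namespace URI is an assumption.

```xml
<!-- Sketch only: the deltaxml namespace URI is an assumption and the
     list of elements is illustrative, not complete. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">

  <!-- Identity template: copy anything that no other template handles. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- One template marks every listed element as unordered.
       Keyed templates with more specific patterns override this one,
       so they must add the attribute themselves. -->
  <xsl:template match="xs:all | xs:attributeGroup | xs:choice
                     | xs:redefine | xs:restriction | xs:union">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="deltaxml:ordered">false</xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```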
4.5. For each unordered content particle, determine keys
For each unordered content particle, determine, for each element that is repeated, the keys that are suitable to identify the element. The key may be a constant value, a single attribute, a combination of attributes or the content of a child element. With XSL it is possible to cater for complex keys because the full power of the language is available to set up the deltaxml:key attribute.
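To illustrate the more complex kinds of key, the following sketch uses a hypothetical, non-Schema vocabulary: a catalog element containing item elements keyed by the combination of their group and name attributes, and note elements keyed by the content of a title child. All of these element and attribute names are invented for the example, and the deltaxml namespace URI is an assumption.

```xml
<!-- Sketch only: catalog, item, note, title, group and name are
     hypothetical names; the deltaxml namespace URI is an assumption. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">

  <!-- Identity template: copy anything that no other template handles. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- The container holds unordered content. -->
  <xsl:template match="catalog">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="deltaxml:ordered">false</xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Key built from a combination of two attributes. -->
  <xsl:template match="catalog/item">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="deltaxml:key">
        <xsl:value-of select="concat(@group, '|', @name)"/>
      </xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Key built from the content of a child element. -->
  <xsl:template match="catalog/note">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="deltaxml:key">
        <xsl:value-of select="normalize-space(title)"/>
      </xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The separator in the combined key keeps pairs such as ("ab", "c") and ("a", "bc") distinct, provided the separator character does not occur in the attribute values.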
4.6. Check the behaviour of the script
The worked example above discusses some of the issues in developing the XSL stylesheet to behave correctly. A set of test data examples will ensure that the effect that you require has been achieved. As XSL uses a pattern-matching mechanism, it is easy to get the wrong effects if a template is selected that you had not intended.
This paper shows how to develop an XSL filter to make DeltaXML work intelligently for a particular XML format. Simple keys can easily be represented using modifications of the worked example above. For more complex situations, the full power of XSL is available to configure the input files with additional attributes to drive DeltaXML in special ways according to the ordering and keys of elements.
The additional attributes can be stripped out using an output filter or left in for further processing. For any re-combination with the original files, the attributes will need to be present in both files.
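An output filter that strips the additional attributes could be sketched as follows. It removes only the deltaxml:key and deltaxml:ordered attributes and copies everything else unchanged; the deltaxml namespace URI is an assumption.

```xml
<!-- Sketch only: the deltaxml namespace URI below is an assumption. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">

  <!-- Identity template: copy anything that no other template handles. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: the attributes added by the input filter
       are dropped from the output. -->
  <xsl:template match="@deltaxml:key | @deltaxml:ordered"/>

</xsl:stylesheet>
```

Any other attributes in the deltaxml namespace are left in place by this sketch, since only the two named attributes are matched by the empty template.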
The full XSLT filter (schema-input-filter.xsl) is provided in the xsl-filters sub-directory in the samples directory distributed with the product.