1. Configuration File Purpose
The configuration file specifies how the comparison should behave for specific elements or attributes.
Within elements, it is possible to:
- ignore changes,
- specify how a list of items is matched,
- allow numeric values to have a tolerance,
- normalize whitespace and compare whole blocks or single words.
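The difference between normalizing whitespace, comparing a whole block and comparing single words can be illustrated with a short Python sketch. This is a conceptual illustration only, not how XML Data Compare is implemented. The function names are hypothetical, and difflib's SequenceMatcher stands in for the product's own matching.

```python
import difflib


def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces, tabs and newlines into one space; trim both ends."""
    return " ".join(text.split())


def word_changes(a: str, b: str) -> list[tuple[str, str]]:
    """Return (old, new) pairs for each run of words that differs between a and b."""
    a_words, b_words = a.split(), b.split()
    matcher = difflib.SequenceMatcher(None, a_words, b_words)
    return [
        (" ".join(a_words[i1:i2]), " ".join(b_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


a = "The quick\tbrown  fox\n"
b = "The quick red fox"

# Block-level view: after normalization the texts still differ, so the whole block changes.
print(normalize_whitespace(a) == normalize_whitespace(b))  # False
# Word-level view: only the word that actually changed is reported.
print(word_changes(a, b))  # [('brown', 'red')]
```

Whitespace differences alone disappear after normalization; only the changed word remains, either as a changed block or as a single changed word.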
The schema for the configuration file is defined as an XML Schema (XSD). The XSD file is included in the resources of the product distribution and is documented here.
2. The Empty Configuration File
A config file has to be designed for a particular pair of data files, so it is recommended that its name connects it to those data files. The config file should always use the config.xsd file provided with XML Data Compare to ensure validity.
You may like to start with an empty configuration file and examine the results obtained before deciding what to add.
3. Editing the Configuration File
3.1. XSD Schema association
To help ensure that you create a valid configuration file, it is recommended that you use an XML editor that supports XSD validation and that you associate your configuration file with the configuration XSD. The association is made using the xsi:schemaLocation attribute, whose URI value links to the configuration XSD file config.xsd, included in the distribution's resources directory. For example:
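Note that in the example above, the configuration file elements use the 'dcf' prefix, which is bound to the com.deltaxml.data.config namespace. Because an xsi:schemaLocation value is a list of pairs, each consisting of a namespace URI followed by the location of the schema for that namespace, the namespace is also included as the first part of the xsi:schemaLocation value.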
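The sketch below shows how such a value splits into namespace and location pairs. It is illustrative only and is not part of XML Data Compare. parse_schema_location is a hypothetical helper, and ../resources/config.xsd is a hypothetical relative path, because the correct path depends on where your configuration file sits relative to the distribution's resources directory.

```python
def parse_schema_location(value: str) -> dict[str, str]:
    """Split an xsi:schemaLocation value into a {namespace URI: schema location} map."""
    tokens = value.split()
    if len(tokens) % 2:
        raise ValueError("xsi:schemaLocation must contain namespace/location pairs")
    # Even positions are namespace URIs, odd positions are schema locations.
    return dict(zip(tokens[0::2], tokens[1::2]))


# Hypothetical relative path: adjust it to where config.xsd is on your system.
value = "com.deltaxml.data.config ../resources/config.xsd"
print(parse_schema_location(value))
# {'com.deltaxml.data.config': '../resources/config.xsd'}
```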
With this XSD association in your configuration file, your XML editor should provide auto-completion and context-specific help documentation as you type.
3.2. Writing XPath expressions
With the exception of default feature settings, all configuration feature settings must have a dcf:location element that identifies the elements and/or attributes for which the feature should be applied.
Each dcf:location element has a name attribute that must have a unique value. This name attribute makes the configuration more readable, and helps identify the location of an XPath problem.
A dcf:location element also has an xpath attribute. The value for this attribute is an XPath expression as specified in the XPath 3.1 specification. The XPath expression should return zero or more attribute and/or element nodes when evaluated against the input XML. Any returned nodes provide the context for the feature that is specified as a child element of the dcf:location element.
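To make the idea of a location more concrete, the Python sketch below evaluates a few hypothetical named locations against a small hypothetical input document using the standard library's xml.etree.ElementTree. It is an approximation for experimenting, not how XML Data Compare evaluates XPath. ElementTree supports only a limited subset of XPath, returns elements but not attribute nodes, and evaluates paths relative to an element. For that reason, the hypothetical select helper wraps the root element in a synthetic parent that stands in for the document node.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document, used only for this illustration.
INPUT = """<orders>
  <order id="1"><total>10.50</total></order>
  <order id="2"><total>7.25</total></order>
</orders>"""


def select(root, xpath, namespaces=None):
    """Approximate evaluating an absolute path with the document node as context."""
    if not xpath.startswith("/"):
        raise ValueError(f"expected a path starting with '/': {xpath!r}")
    document = ET.Element("document-node")  # synthetic stand-in for the document node
    document.append(root)
    # '//x' searches all descendants; '/a/b' becomes the relative path 'a/b'.
    path = "." + xpath if xpath.startswith("//") else xpath[1:]
    return document.findall(path, namespaces)


root = ET.fromstring(INPUT)
# Hypothetical location names mapped to XPath expressions.
locations = {
    "order-totals": "/orders/order/total",
    "every-order": "//order",
    "invoices": "/orders/invoice",
}
for name, xpath in locations.items():
    print(name, [node.tag for node in select(root, xpath)])
# order-totals ['total', 'total']
# every-order ['order', 'order']
# invoices []
```

As with a real dcf:location, an expression that matches nothing simply returns no nodes, so the associated feature is not applied anywhere.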
3.2.1. XML Namespaces
It is quite likely that your input XML elements (and possibly attributes) are in a namespace. If they are in a namespace that is not the default namespace, element or attribute names have a prefix followed by ':'. An xmlns attribute binds a prefix, or the default namespace (which has no prefix), to a specific namespace URI. For example, in a Maven pom.xml file, we find:
In the example above, the xmlns attribute (line 1) declares http://maven.apache.org/POM/4.0.0 as the default namespace (there is no prefix). The xmlns:xsi attribute (line 2) binds the xsi prefix to the http://www.w3.org/2001/XMLSchema-instance namespace URI. The modelVersion element, a child of the project element, has no prefix; it therefore inherits the default namespace http://maven.apache.org/POM/4.0.0 declared on its parent element.
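This namespace inheritance can be observed with Python's xml.etree.ElementTree, which reports each name as {namespace-uri}local-name. The sketch below parses a trimmed, typical pom.xml header. It only illustrates the XML namespace rules and is not part of XML Data Compare.

```python
import xml.etree.ElementTree as ET

# A trimmed, typical pom.xml header.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
</project>"""

root = ET.fromstring(POM)
model_version = root[0]

# Unprefixed names inherit the default namespace declared on <project>.
print(root.tag)           # {http://maven.apache.org/POM/4.0.0}project
print(model_version.tag)  # {http://maven.apache.org/POM/4.0.0}modelVersion

# A name-test with no namespace information does not match the namespaced element.
print(root.find("modelVersion"))  # None
```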
If an XPath expression in a configuration file is to return the modelVersion element, it must specify the namespace in every node-test within the XPath (or match any namespace by using '*:' as the prefix in the node-test). XPath node-tests are evaluated using a namespace context, which is specified in the dcf:xpath-namespaces element. This element is used to declare the default namespace and/or a set of namespace declarations, each associating a given prefix with a namespace URI. See the example in the next section.
XPath using the Default Namespace
In this case, to use the XPath /project/modelVersion, we must specify the default namespace in the configuration file, as follows:
In the example above, there is a dcf:xpath-namespaces element with a child element dcf:default-namespace specifying http://maven.apache.org/POM/4.0.0 as the default namespace. The XPath /project/modelVersion will therefore return the modelVersion element.
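Python's xml.etree.ElementTree (version 3.8 and later) has a comparable idea: an empty-string key in the namespaces mapping supplies a default namespace for unprefixed element names. The sketch below is only an analogy to dcf:default-namespace, not the product's mechanism. The root element is wrapped in a synthetic parent (a hypothetical stand-in for the document node) so that the path mirrors /project/modelVersion.

```python
import xml.etree.ElementTree as ET

MAVEN_NS = "http://maven.apache.org/POM/4.0.0"
POM = f"""<project xmlns="{MAVEN_NS}">
  <modelVersion>4.0.0</modelVersion>
</project>"""

# Wrap the root in a synthetic parent so the path can start at 'project',
# mirroring /project/modelVersion evaluated from the document node.
document = ET.Element("document-node")
document.append(ET.fromstring(POM))

# The '' key supplies a default namespace for unprefixed element names.
default_ns = {"": MAVEN_NS}
print(document.find("project/modelVersion", default_ns).text)  # 4.0.0
print(document.find("project/modelVersion"))                   # None
```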
XPath using a Non-Default Namespace
It may be desirable or necessary to use XPath expressions with a prefixed (non-default) namespace, for example if the input XML has more than one namespace. In this case, a prefix is used in the node-tests of the XPath, for example /mvn:project/mvn:modelVersion.
To use the above expression in a configuration file, we must declare the namespace-URI binding for the 'mvn' prefix used in the XPath as follows:
In the configuration file above, the dcf:namespace element declares the binding between the prefix and the namespace URI. A dcf:namespace element is required for each prefix used in XPaths in the configuration file.
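Prefix bindings have a direct analogue in Python's xml.etree.ElementTree, where a namespaces mapping plays a role similar to dcf:namespace declarations. The sketch below is an illustration only. It also shows ElementTree's {*} wildcard (version 3.8 and later), which corresponds loosely to the '*:' prefix in XPath, and it shows that an unbound prefix is rejected as an error.

```python
import xml.etree.ElementTree as ET

POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
</project>"""

document = ET.Element("document-node")  # synthetic stand-in for the document node
document.append(ET.fromstring(POM))

# Similar in spirit to binding the 'mvn' prefix with a dcf:namespace element.
namespaces = {"mvn": "http://maven.apache.org/POM/4.0.0"}
print(document.find("mvn:project/mvn:modelVersion", namespaces).text)  # 4.0.0

# '{*}' matches the local name in any namespace (or none), much like '*:' in XPath.
print(document.find("{*}project/{*}modelVersion").text)  # 4.0.0

# Using a prefix that has not been bound is an error.
try:
    document.find("pom:project", namespaces)
except SyntaxError as err:
    print("error:", err)
```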
Attributes and XML Namespaces
The examples above use element node-tests. When using an attribute node-test, for example /element/@attribute, the absence of a prefix in '@attribute' means that the attribute is not in any namespace (that is, it is in the 'null' namespace), even when a default namespace is declared. This is normally the desired behaviour, because attributes in XML are always in the null namespace unless they have a prefix.
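ElementTree follows the same rule, which makes it easy to check. A default namespace supplied with the '' key applies to element names in the path but not to attribute names. The document and the urn:example:catalog namespace below are hypothetical and used only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the default namespace applies to <item>, not to @id.
DOC = """<catalog xmlns="urn:example:catalog">
  <item id="a1">first</item>
  <item id="b2">second</item>
</catalog>"""

root = ET.fromstring(DOC)
ns = {"": "urn:example:catalog"}

# 'item' is resolved in the default namespace; '@id' is left in no namespace.
first = root.find("item[@id='a1']", ns)
print(first.text)    # first
# The unprefixed attribute's key carries no {uri} part.
print(first.attrib)  # {'id': 'a1'}
```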
3.2.2. XPath Restrictions
To improve comparison performance, XPath expressions in configuration files are often used internally as the 'match' attribute of a generated XSLT template.
As a consequence, a number of restrictions apply. These are summarised here:
An XPath 3.1 compatible 'match' pattern must be used. The syntax used for a pattern is defined here: https://www.w3.org/TR/xslt-30/#pattern-syntax
Notable consequences include:
- The comma (',') sequence operator can only be used within sub-expressions such as predicates (in many cases, the union operator '|' can be used instead)
- The context item for the evaluation of the XPath is the XML document node (i.e. not the root element)
- The main XPath expression should start with a /
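Before running a comparison, it can help to pre-check expressions against the comma and leading '/' restrictions above. The function below, check_pattern, is a hypothetical and deliberately rough heuristic. It is not part of XML Data Compare. It only flags expressions that do not start with '/' and commas that appear outside any square brackets or parentheses. It will miss some invalid patterns, such as a comma inside a top-level parenthesised expression, so the XSLT 3.0 pattern grammar remains the authority.

```python
def check_pattern(xpath: str) -> list[str]:
    """Rough pre-check: leading '/' and commas outside brackets or parentheses."""
    problems = []
    if not xpath.strip().startswith("/"):
        problems.append("expression should start with '/'")
    depth = 0      # nesting level of [...] and (...)
    quote = None   # the quote character of an open string literal, if any
    for ch in xpath:
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch in "[(":
            depth += 1
        elif ch in "])":
            depth -= 1
        elif ch == "," and depth == 0:
            problems.append("top-level ',' is not allowed; consider '|' instead")
    return problems


print(check_pattern("/orders/order[@id = ('1', '2')]"))  # []
print(check_pattern("/orders/order, /orders/invoice"))
print(check_pattern("orders/order"))
```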
3.2.3. Diagnostic Mode and XPath Errors
XML Data Compare runs comparisons with 'Diagnostics Mode' switched on by default. This mode allows errors in XPath expressions to be reported in more detail, helping you identify the location of the problem in the configuration file. In this mode, each XPath expression is first evaluated separately, so a comparison of large XML input files may take longer. Diagnostics mode can be switched off via the XML Data Compare REST API; the sample command-line interface client shows how this is done.
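A short Python sketch illustrates the idea behind evaluating each expression separately: an error can then be reported against the dcf:location name that caused it. The location names, the paths and the diagnose function are hypothetical. ElementTree's limited XPath support stands in for the product's XPath 3.1 processor, so this is an analogy rather than the product's behaviour.

```python
import xml.etree.ElementTree as ET

INPUT = "<orders><order id='1'/><order id='2'/></orders>"

# Hypothetical (name, path) pairs standing in for dcf:location elements. Paths are
# written relative to a synthetic document node because ElementTree rejects a leading '/'.
LOCATIONS = [
    ("all-orders", "orders/order"),
    ("unbound-prefix", "x:orders/x:order"),
    ("no-invoices", "orders/invoice"),
]


def diagnose(root, locations, namespaces=None):
    """Evaluate each path on its own and report a count or an error per location name."""
    document = ET.Element("document-node")
    document.append(root)
    report = {}
    for name, path in locations:
        try:
            report[name] = len(document.findall(path, namespaces))
        except SyntaxError as err:
            report[name] = f"error: {err}"
    return report


for name, outcome in diagnose(ET.fromstring(INPUT), LOCATIONS).items():
    print(name, outcome)
```

Because each location is evaluated on its own, one faulty expression does not hide the results of the others, and the report points straight to the offending name.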
4. More Configuration with Samples
You will know your data structure well and be aware of the issues that can arise in a comparison.
| If you have this problem with comparison: | Then do this: | Notes and caveats |
|---|---|---|
| Numeric data is present and shows up as different even when the difference is insignificant. | Use numeric tolerances to specify acceptable differences in certain locations. For more details, please see the numeric tolerances sample. | Numbers must be made up only of the digits 0 to 9, the decimal point, and plus or minus symbols. Letters invalidate a number; for example, 5km is treated as text and not as a numeric value. XSLT can be used to remove unwanted text from numeric fields. |
| Sequences of elements have a key that could be used to match pairs of records, and the matching is not accurate when this key is not used. | Specify a container where the order should be ignored, together with the key for its children. This key is used to identify the pairs of data items that are expected to match. For more details, please see the ordered and unordered comparison sample. | |
| You want the comparison to treat each block of text as a single node and show the whole block as different even when just one word differs. | Set word-by-word to false for the elements where it is not required. Note that word-by-word processing for elements is switched on by default. For more details, please see the word by word sample. | Word by word does not apply to attribute values. |
| Different types of whitespace (single or multiple spaces, tabs and newlines) are significant within the dataset, but by default whitespace is normalized. | Switch off the normalize-whitespace option. By default, all types of whitespace are normalized before matching, either to a single space or to nothing at all, depending on context. For more details, please see the normalize whitespace sample. | If you change the default for the whole comparison, you may get differences on every line if, for example, the indentation differs between the "A" file and the "B" file. It may be preferable to switch off this option only for certain locations. |
| Differences are shown in certain elements, but you would prefer them to be ignored. | Use ignore-changes for those attributes and elements. For more details, please see the ignore changes sample. | You can specify which version you wish to keep. For full details, see the Configuration File Schema Guide. |
| You want the result file to show changes only. | Use the output element and set its changes-only attribute to true. For more details, please see the output format sample. | For full details, see the Configuration File Schema Guide. |
| You want to view a side-by-side folding report in a browser. | Use the output element and set its format attribute to sbs-folding-diffreport rather than deltaV2. For more details, please see the output format sample. | For full details, see the Configuration File Schema Guide. You can download the results and save them as an HTML file in order to view them. |
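The numeric tolerance row above, including the caveat that a value such as 5km is treated as text, can be illustrated with a small Python sketch. It is a conceptual model only. The function names are hypothetical, and the use of a single absolute tolerance is an assumption made for this sketch; see the numeric tolerances sample for the options the product actually supports.

```python
import re

# Digits, an optional decimal point and an optional leading sign, as in the note above.
NUMBER = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def as_number(text):
    """Return a float if the whole (trimmed) text is numeric, otherwise None."""
    text = text.strip()
    return float(text) if NUMBER.fullmatch(text) else None


def values_match(a, b, tolerance):
    """Compare numerically within an absolute tolerance (an assumption), else as text."""
    x, y = as_number(a), as_number(b)
    if x is None or y is None:
        return a == b  # letters invalidate a number, so fall back to a text comparison
    return abs(x - y) <= tolerance


print(values_match("10.50", "10.5", 0.001))   # True
print(values_match("3.14159", "3.14", 0.01))  # True
print(values_match("5km", "5.0km", 0.1))      # False: compared as text
```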
The locations where you want to change the default behaviour of the comparison are specified by XPath expressions. Edit the configuration file to give the required settings, with their locations as XPath expressions. Experiment in an XML editor such as oXygen to confirm that your XPath expressions are correct.