Numeric Tolerances
Introduction
This document describes the concepts behind the use of numeric tolerances.
XML is often used to represent engineering, scientific or financial data, where floating point numbers are common. Software that handles floating point numbers usually compares them using tolerances, and this article describes techniques for doing so with XML Data Compare.
Background
The comparator processes well-formed XML, in which numbers are represented as text. It performs a textual comparison of PCDATA and therefore reports numbers as equal only if they have the same lexical representation. If different processors or serialization software generate the XML documents being compared, the 'same' number may even have different lexical representations (think of '1.0' and '1.00') and therefore be reported as different. The W3C XML Schema Datatypes, also supported as part of XSLT 2.0, provide facilities for converting, reading and writing floating point numbers. Rather than build complicated datatype facilities and associated mechanics into the comparison engine, the configuration file offers the numeric element to resolve these issues with floating point numbers and their tolerances.
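The difference between lexical and numeric equality can be illustrated with a short Python sketch. It is not part of XML Data Compare: the function names are hypothetical, and Python's decimal module stands in for whatever number parsing a real processor would use.

```python
from decimal import Decimal


def lexically_equal(a: str, b: str) -> bool:
    """Textual comparison: equal only if the character strings match."""
    return a == b


def numerically_equal(a: str, b: str) -> bool:
    """Numeric comparison: parse both strings, then compare the values."""
    return Decimal(a) == Decimal(b)


print(lexically_equal("1.0", "1.00"))    # False
print(numerically_equal("1.0", "1.00"))  # True
```

A purely textual comparison reports '1.0' and '1.00' as a change, whereas a numeric comparison treats them as the same value.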
Remember that where a temperature is entered as, say, 20degrees, 20C or 68F (without a space), the value is not seen as numeric and no tolerance is applied. You can use XSLT to pre-process your files to transform such data. A number followed by a space and then some other characters is read as the initial number, and any units given are ignored.
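The rule described above can be sketched in Python as follows. This is only an illustration of the stated behaviour, not the product's actual parser: the function name leading_number and the regular expression are assumptions, and exponent notation is deliberately not handled.

```python
import re
from typing import Optional

# Assumed rule: optional sign, digits with an optional fraction, then either
# the end of the text or whitespace followed by anything (for example units).
_LEADING_NUMBER = re.compile(
    r"^\s*([+-]?(?:\d+(?:\.\d*)?|\.\d+))(?:\s+.*)?$", re.DOTALL
)


def leading_number(text: str) -> Optional[float]:
    """Return the initial number, or None if the text is not seen as numeric."""
    match = _LEADING_NUMBER.match(text)
    return float(match.group(1)) if match else None


print(leading_number("20 C"))       # 20.0 (units after a space are ignored)
print(leading_number("20C"))        # None (no space, so not numeric)
print(leading_number("20degrees"))  # None
```

Values for which such a check returns None would need pre-processing, for example with XSLT, before a tolerance can apply to them.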
This article will use a worked example to show how XML Data Compare allows numeric tolerances.
Example Data
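[Example inputs: two XML documents, A and B, were shown here side by side. They are available in the sample repository linked under 'Running the sample'.]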
The above example is designed to show numeric values used in element content and in attributes. There are some differences in handling and so we'll discuss elements and attributes separately.
Element tolerances
When these inputs are compared using the comparator, the changes are represented in deltaV2. Here is the part of the result corresponding to a change in the 'Salisbury' record, whose temperature element contains a floating point number:
<record deltaxml:deltaV2="A!=B">
<place deltaxml:deltaV2="A=B">Salisbury</place>
<temperature deltaxml:deltaV2="A!=B">
<deltaxml:textGroup deltaxml:deltaV2="A!=B">
<deltaxml:text deltaxml:deltaV2="A">19.8</deltaxml:text>
<deltaxml:text deltaxml:deltaV2="B">19.85</deltaxml:text>
</deltaxml:textGroup>
</temperature>
</record>
We can specify in the configuration file that we wish to allow a tolerance of, for example, 0.5.
<dcf:location name="Temperature" xpath="//temperature">
<dcf:numeric tolerance="0.5" use="B"/>
</dcf:location>
After adding this numeric tolerance to the config file, the Salisbury temperature record is reported as unchanged and, because of use="B", shows the B value:
<record deltaxml:deltaV2="A=B">
<place>Salisbury</place>
<temperature>19.85</temperature>
</record>
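The effect of the tolerance and use attributes on this record can be sketched in Python. The function compare_with_tolerance is hypothetical and only mirrors the behaviour described in this article; in particular, treating a difference exactly equal to the tolerance as a match is an assumption.

```python
from decimal import Decimal
from typing import Optional, Tuple


def compare_with_tolerance(
    a: str, b: str, tolerance: str, use: str = "B"
) -> Tuple[bool, Optional[str]]:
    """Return (equal, reported_value) for two numeric strings.

    Decimal is used so that values such as 19.8 are compared exactly.
    """
    difference = abs(Decimal(a) - Decimal(b))
    if difference <= Decimal(tolerance):  # assumption: the boundary is inclusive
        return True, (a if use == "A" else b)
    return False, None


# Salisbury temperature: 19.8 (A) and 19.85 (B) with tolerance 0.5, use="B"
print(compare_with_tolerance("19.8", "19.85", "0.5", use="B"))  # (True, '19.85')
```

The difference of 0.05 is within the tolerance of 0.5, so the two values are treated as equal and the B value is the one reported.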
Using XPaths to identify the numeric values
In the above example the location's XPath matched all temperature elements, on the assumption that they contain numeric values. More explicit XPaths can also be used, and a single location can cover several numeric elements. Here are some examples which partially illustrate the power of XPath. Download the samples from Bitbucket to try them in the config file.
//temperature
/weather/record/temperature
/weather/record[place='Malvern']/temperature
//temperature | //pressure | //weight
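To see which nodes the first three expressions select, the following Python sketch evaluates equivalent paths with the standard library's ElementTree. The input document is hypothetical: only the place names, the Salisbury reading and the time attribute appear in this article, and the Malvern reading is invented for illustration. ElementTree supports only a subset of XPath and evaluates paths relative to the root element, so the absolute paths are rewritten accordingly.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical input; the Malvern temperature is invented for illustration.
SAMPLE = """
<weather time="12437389">
  <record><place>Salisbury</place><temperature>19.8</temperature></record>
  <record><place>Malvern</place><temperature>18.2</temperature></record>
</weather>
"""

root = ET.fromstring(SAMPLE)  # root is the <weather> element


def texts(path: str) -> List[str]:
    """Return the text of every element selected by an ElementTree path."""
    return [element.text for element in root.findall(path)]


# //temperature
all_temperatures = texts(".//temperature")
# /weather/record/temperature
record_temperatures = texts("./record/temperature")
# /weather/record[place='Malvern']/temperature
malvern_only = texts("./record[place='Malvern']/temperature")

print(all_temperatures)     # ['19.8', '18.2']
print(record_temperatures)  # ['19.8', '18.2']
print(malvern_only)         # ['18.2']
```

The predicate in the third expression narrows the match to a single record, which is useful when only some values should be given a tolerance.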
Attribute Tolerances
The representation of attribute change in deltaV2 is more complicated than that for element content, shown above. Here is how the 'time' attribute used in the example above is represented:
<weather ... deltaxml:deltaV2="A!=B" ...>
<deltaxml:attributes deltaxml:deltaV2="A!=B">
<dxa:time deltaxml:deltaV2="A!=B">
<deltaxml:attributeValue deltaxml:deltaV2="A">12437389</deltaxml:attributeValue>
<deltaxml:attributeValue deltaxml:deltaV2="B">12437395</deltaxml:attributeValue>
</dxa:time>
</deltaxml:attributes>
...
In the input data the attribute has an XPath of /weather/@time, so we add this location to the config file:
<dcf:location name="Attr" xpath="/weather/@time">
<dcf:numeric tolerance="10" use="A"/>
</dcf:location>
The two values differ by 6, which is within the tolerance of 10, so they are treated as equal and, because of use="A", the A value is shown:
<weather ... time="12437389">
Running the sample
See the sample at https://bitbucket.org/deltaxml/xml-data-compare-numeric-tolerances