Customising a Comparison
Introduction
Because XML Compare represents changes as XML, its API and pipeline configuration architecture allow standard XML technologies such as XSLT to be applied to the results. Complex information pipelines can therefore be built from a set of simple, proven components.
Configuration of a typical custom comparison pipeline
Samples of Customised Comparisons
A set of samples is included with XML Compare; these include working code and documentation for a number of customised comparison scenarios.
Choosing the Comparator
When a comparison is invoked via the recommended com.deltaxml.cores9api API, you have the choice of two comparator classes: DocumentComparator or PipelinedComparatorS9.
Note
When a comparison is invoked through the graphical interface (GUI) or command-line interface (CLI), the comparator class used depends on whether a DCP file ID (for DocumentComparator) or a DXP file ID (for PipelinedComparatorS9) is supplied.
Pipelined Comparator
Implemented via the PipelinedComparatorS9 Java class, this provides a very flexible form of comparison, best suited to cases where the input XML is not always document-based or where you require low-level control of the processing pipeline. Except for restrictions associated with lexical preservation filters, input and output filters can be added to the processing pipeline at any point.
Document Comparator
Implemented through the DocumentComparator Java class, this has a pipeline specially optimized for document comparison. The figure below shows a simplified representation of this pipeline. Explicit extension points are available on the pipeline so that new filter steps or chains can be inserted in a managed way.
Filter steps or chains can be applied to specific extension points of the Document Comparator
Defining Pipelines
Pipelined Comparator
The Pipelined Comparator allows comparisons to be optimized for particular types of data or document structure. It also allows customisation of the way detected differences are represented in the output. The pipeline for a Pipelined Comparator is defined using a set of filters managed in FilterStep and FilterChain objects, which can be added at both comparator inputs ('A' and 'B') or at the comparator output.
The guide Specifying a Comparison Pipeline provides an overview of how pipelines can be defined with the Pipelined Comparator, either in Java or in an XML pipeline descriptor file format called DXP.
More details on the use of DXP can be found in the document Pipeline Configuration using DXP.
Document Comparator
The Document Comparator differs from the Pipelined Comparator in that key parts of the pipeline are pre-defined with specialist document comparison features; this pipeline is modified by adding filters at certain named 'extension points'.
As in the Pipelined Comparator, filters are managed as FilterStep and FilterChain objects in Java. These are added to the pipeline using the DocumentComparator's setExtensionPoint method. An alternative way to configure a Document Comparator is to use a Document Comparator Pipelines configuration file (DCP).
The Document Comparator is described in the Document Comparator Guide. More details on using DCP can be found in the guide Document Comparator Configuration using DCP.
JAXP Pipeline Comparator (legacy)
A lower-level method for creating pipelines, now regarded as legacy but still useful for advanced users, is also available to Java developers; it exploits JAXP interfaces. JAXP Pipeline Examples introduces a set of examples available for download, and the paper Powering Pipelines with JAXP provides further details on using JAXP.
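As a minimal sketch of the JAXP style of pipeline, the following chains two XSLT filters using the standard SAXTransformerFactory and TransformerHandler interfaces. It uses only JDK classes, not XML Compare's own, and does not show where a comparator would sit in such a chain. The class name, the two stylesheets and the sample input are hypothetical and exist only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Result;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class JaxpPipelineSketch {

    // Wraps extra templates in a stylesheet that otherwise copies its input unchanged.
    static String stylesheet(String extraTemplates) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
             + "<xsl:template match='@*|node()'><xsl:copy>"
             + "<xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>"
             + extraTemplates
             + "</xsl:stylesheet>";
    }

    // Hypothetical filter 1: renames <old> elements to <new>.
    static final String RENAME_XSL = stylesheet(
        "<xsl:template match='old'><new>"
      + "<xsl:apply-templates select='@*|node()'/></new></xsl:template>");

    // Hypothetical filter 2: adds checked="true" to the document element.
    static final String MARK_XSL = stylesheet(
        "<xsl:template match='/*'><xsl:copy>"
      + "<xsl:attribute name='checked'>true</xsl:attribute>"
      + "<xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>");

    // Runs the input through the stylesheets in order, passing SAX events between stages.
    static String runPipeline(String inputXml, String... stylesheets) {
        try {
            SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
            StringWriter out = new StringWriter();
            // Build the chain back to front: the last filter writes to the string.
            Result next = new StreamResult(out);
            for (int i = stylesheets.length - 1; i >= 0; i--) {
                TransformerHandler handler = factory.newTransformerHandler(
                    new StreamSource(new StringReader(stylesheets[i])));
                handler.setResult(next);
                next = new SAXResult(handler);
            }
            // An identity transformer parses the input and feeds the first filter.
            factory.newTransformer()
                   .transform(new StreamSource(new StringReader(inputXml)), next);
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Pipeline failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runPipeline("<doc><old>text</old></doc>", RENAME_XSL, MARK_XSL));
    }
}
```

Because each stage hands SAX events directly to the next, no intermediate document is serialized between filters; that is the main attraction of this style, at the cost of more wiring code than FilterStep and FilterChain objects require.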
Pipeline Diagnostics
When there is a need to diagnose stages in a pipeline, a debugFiles mode is available in which the inputs and outputs of each filter are written to separate files; a file naming convention indicates where each 'debug file' fits into the pipeline. The debugFiles mode is set either by the setDebugFiles method call or with a Configuration Property (see Configuration Properties) in a DeltaXML configuration file named 'deltaXMLConfig.xml'. Sample XML for setting this property is shown below:
Configuration
Low-level XML Compare functionality is configured using different methods according to how the functionality is implemented. These different methods are summarized below:
Configuration Summary
Config Properties | Comparator Features & Properties | Parser Features | Output Properties |
---|---|---|---|
Diagnostics Settings | DeltaV Format | Configure XInclude | Indentation |
Catalog Settings | Diff/Patch Mode | JAXP/SAX Features | Doctype (affected by the LexicalPreservation configuration property) |
Matching Algorithm | | | |
Ordering Priority | | | |
Configuration Properties
Configuration Properties control aspects of a comparison operation that may have a wider scope than standard features and properties. More details can be found in the Configuration Properties guide.
Comparator Features and Properties
Features and properties are managed using the API or a DXP/DCP definition. The Features and Properties document describes those that are available.
Parser Features
Features for the Apache Xerces parser can be set either from the API or from a DXP/DCP configuration. A DXP example can be found in the sample XInclude and XML Compare.
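To illustrate what enabling a parser feature such as XInclude does, here is a minimal sketch using the standard JAXP SAXParserFactory. It is not the XML Compare API or a DXP/DCP fragment; the class name, file names and sample content are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class ParserFeatureSketch {

    // Writes a hypothetical two-file document to a temp directory and returns the main file.
    static Path writeSample() {
        try {
            Path dir = Files.createTempDirectory("xinclude-sketch");
            Files.write(dir.resolve("chapter.xml"),
                "<chapter>Included text</chapter>".getBytes(StandardCharsets.UTF_8));
            Path book = dir.resolve("book.xml");
            Files.write(book,
                ("<book xmlns:xi='http://www.w3.org/2001/XInclude'>"
               + "<xi:include href='chapter.xml'/></book>").getBytes(StandardCharsets.UTF_8));
            return book;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parses the file with XInclude processing enabled and lists the element names seen.
    static List<String> elementNames(Path xmlFile) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);  // XInclude processing needs namespace awareness
            factory.setXIncludeAware(true);   // the parser feature being demonstrated
            List<String> names = new ArrayList<>();
            factory.newSAXParser().parse(xmlFile.toFile(), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    names.add(localName);
                }
            });
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Parse failed", e);
        }
    }

    public static void main(String[] args) {
        // With XInclude enabled, the included element is reported in place of xi:include.
        System.out.println(elementNames(writeSample()));
    }
}
```

With the feature switched off, the parser would report the xi:include element itself rather than the content it refers to, which is why such features matter to what the comparison sees as input.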
Output Properties
Output properties control the serializer of XML Compare's internal Saxon processor. They are set from the API or using DXP or DCP. An example of how DocType and indentation are set using DXP can be found in the Pipeline Configuration using DXP document.
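As an analogy only, the sketch below sets the equivalent indentation and DOCTYPE serialization properties on a plain JAXP identity Transformer. It does not use XML Compare's API or its Saxon serializer; the class name, sample input and the 'doc.dtd' system identifier are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class OutputPropertySketch {

    // Re-serializes the XML with the requested indentation and optional DOCTYPE system id.
    static String serialize(String xml, boolean indent, String doctypeSystem) {
        try {
            // A Transformer created without a stylesheet copies its input unchanged.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
            if (doctypeSystem != null) {
                identity.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doctypeSystem);
            }
            StringWriter out = new StringWriter();
            identity.transform(new StreamSource(new StringReader(xml)),
                               new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><p>text</p></doc>";
        System.out.println(serialize(xml, true, "doc.dtd"));  // indented, with a DOCTYPE
        System.out.println(serialize(xml, false, null));      // compact, no DOCTYPE
    }
}
```

The exact amount of indentation varies between serializer implementations, so output formatted this way should not be compared byte for byte across processors.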