Introduction

Because XML Compare represents changes as XML, its API and Pipeline Configuration architecture allow standard XML technologies such as XSLT to be applied. Complex information pipelines can therefore be built from a set of simple, proven components.

Configuration of a typical custom comparison pipeline


Samples of Customised Comparisons

A set of samples is included with XML Compare; these include working code and documentation for a number of customised comparison scenarios.

Choosing the Comparator

When a comparison is invoked via the recommended com.deltaxml.cores9api API, you have the choice of two comparator classes: DocumentComparator or PipelinedComparatorS9.

Note

When invoking a comparison through the graphical interface (GUI) or command-line interface (CLI), the comparator class used depends on whether a DCP file ID (for DocumentComparator) or a DXP file ID (for PipelinedComparatorS9) is used.


Pipelined Comparator

Implemented via the PipelinedComparatorS9 Java class, this provides a very flexible form of comparison, best suited to cases where the input XML is not always document-based or where you require low-level control of the processing pipeline. Except for restrictions associated with lexical preservation filters, input and output filters can be added to the processing pipeline at any point.

Document Comparator

Implemented through the DocumentComparator Java class, this has a pipeline specially optimized for document comparison; the figure below shows a simplified representation of this pipeline. Explicit extension points are available on the pipeline so that new filter steps or chains can be inserted in a managed way.

Filter steps or chains can be applied to specific extension points of the Document Comparator


Defining Pipelines

Pipelined Comparator

The Pipelined Comparator allows comparisons to be optimized for particular types of data or document structure. It also allows customisation of the way detected differences are represented in the output. The pipeline for a Pipelined Comparator is defined using a set of filters managed in FilterStep and FilterChain objects that can be added at either comparator input ('A' and 'B') or at the comparator output.
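
The arrangement described above (filter chains on inputs 'A' and 'B', a comparator in the middle, and a filter chain on the output) can be pictured with a small conceptual model. This is only a sketch: it does not use the XML Compare API, and the class PipelineModelSketch, its methods, and the toy string-based comparator are all hypothetical stand-ins for FilterStep, FilterChain and the real comparison engine.

```java
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.UnaryOperator;

// Conceptual model only: NOT the XML Compare API. All names are hypothetical.
class PipelineModelSketch {

    // Applies each step in order, the way steps in a filter chain run in sequence.
    static String applyChain(List<UnaryOperator<String>> chain, String document) {
        String current = document;
        for (UnaryOperator<String> step : chain) {
            current = step.apply(current);
        }
        return current;
    }

    // Filters input A and input B separately, compares them, then filters the result.
    static String compare(String a, String b,
                          List<UnaryOperator<String>> inputAFilters,
                          List<UnaryOperator<String>> inputBFilters,
                          BinaryOperator<String> comparator,
                          List<UnaryOperator<String>> outputFilters) {
        String filteredA = applyChain(inputAFilters, a);
        String filteredB = applyChain(inputBFilters, b);
        return applyChain(outputFilters, comparator.apply(filteredA, filteredB));
    }

    // Toy run: whitespace is normalised on both inputs before a trivial comparison.
    static String demo() {
        UnaryOperator<String> normaliseSpace = s -> s.trim().replaceAll("\\s+", " ");
        List<UnaryOperator<String>> inputFilters = List.of(normaliseSpace);
        BinaryOperator<String> toyComparator = (x, y) -> x.equals(y) ? "unchanged" : "changed";
        List<UnaryOperator<String>> outputFilters = List.of(r -> "<result>" + r + "</result>");
        return compare("<p>a  b</p>", " <p>a b</p> ",
                inputFilters, inputFilters, toyComparator, outputFilters);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this model the two inputs differ only in whitespace, so the input filters make them identical before the comparator sees them. That is the kind of effect input filters are typically added to achieve.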

The guide Specifying a Comparison Pipeline provides an overview of how pipelines can be defined with the Pipelined Comparator, specifically through the use of Java, C# or an XML pipeline descriptor file format called DXP.

More details on the use of DXP can be found in the document Pipeline Configuration using DXP.

Document Comparator

The Document Comparator differs from the Pipelined Comparator in that key parts of the pipeline are pre-defined with specialist document comparison features; this pipeline is modified by adding filters at certain named 'extension points'.

As in the Pipelined Comparator, filters are managed as FilterStep and FilterChain objects in Java or C#. These are added to the pipeline using the DocumentComparator's setExtensionPoint method. An alternative way to configure a Document Comparator is to use a Document Comparator Pipelines configuration file (DCP).

The Document Comparator is described in the Document Comparator Guide. More details on using DCP can be found in the guide Document Comparator Configuration using DCP.

JAXP Pipeline Comparator (legacy)

A lower-level method for creating pipelines, now regarded as legacy but still useful for advanced users, is also available for Java developers; it exploits JAXP interfaces. JAXP Pipeline Examples introduces a set of examples available for download, and the paper Powering Pipelines with JAXP provides further details on using JAXP.
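
As a rough illustration of what a JAXP pipeline looks like, the sketch below chains two XSLT filters using the standard javax.xml.transform.sax classes from the JDK. It is not taken from the JAXP Pipeline Examples and does not involve the XML Compare comparator; the class name JaxpPipelineSketch, the two stylesheets and the element names para and note are assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch of a two-stage JAXP filter pipeline; stylesheets are hypothetical.
class JaxpPipelineSketch {

    static final String HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='@*|node()'><xsl:copy>"
      + "<xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>";

    // Filter 1: renames <para> to <p> and copies everything else.
    static final String RENAME_PARA = HEAD
      + "<xsl:template match='para'><p><xsl:apply-templates select='@*|node()'/></p>"
      + "</xsl:template></xsl:stylesheet>";

    // Filter 2: drops <note> elements and copies everything else.
    static final String DROP_NOTES = HEAD
      + "<xsl:template match='note'/></xsl:stylesheet>";

    static String run(String inputXml) {
        try {
            SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler first =
                factory.newTransformerHandler(new StreamSource(new StringReader(RENAME_PARA)));
            TransformerHandler second =
                factory.newTransformerHandler(new StreamSource(new StringReader(DROP_NOTES)));

            // Wire the stages: first -> second -> serialized output.
            StringWriter out = new StringWriter();
            second.setResult(new StreamResult(out));
            first.setResult(new SAXResult(second));

            // An identity transformer parses the input and feeds the first stage.
            Transformer feeder = factory.newTransformer();
            feeder.transform(new StreamSource(new StringReader(inputXml)), new SAXResult(first));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Pipeline failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<doc><para>hello</para><note>internal</note></doc>"));
    }
}
```

Each TransformerHandler acts as both a SAX event consumer and a producer, which is what lets simple filters be composed into a longer chain without writing intermediate files.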

Pipeline Diagnostics

When there is a need to diagnose stages in a pipeline, a debugFiles mode is available in which the inputs and outputs of each filter are written to separate files. A file naming convention indicates where each 'debug file' fits into the pipeline. The debugFiles mode is set either by the setDebugFiles method call or with a Configuration Property (see Configuration Properties) in a DeltaXML configuration file named 'deltaXMLConfig.xml'. Sample XML for setting this property is shown below:

<!DOCTYPE deltaxmlConfig SYSTEM "deltaxml-config.dtd">
<deltaxmlConfig>
  <configProperty
    name="com.deltaxml.cores9api.DocumentComparator.debugFiles"
    value="true" />
  <configProperty
    name="com.deltaxml.cores9api.PipelinedComparatorS9.debugFiles"
    value="true" />
</deltaxmlConfig>


Configuration

Low-level XML Compare functionality is configured using different methods according to how the functionality is implemented. These different methods are summarized below:

Configuration Summary

Config Properties     Comparator Features & Properties  Parser Features     Output Properties
--------------------  --------------------------------  ------------------  -----------------
Diagnostics Settings  DeltaV Format                     Configure XInclude  Indentation
Catalog Settings      Matching Algorithm                JAXP/SAX Features   Doctype (see note)
                      Diff/Patch Mode
                      Ordering Priority

Note: DocType is affected by the LexicalPreservation configuration property.

Configuration Properties

Configuration Properties control aspects of a comparison operation that may have a wider scope than standard features and properties. More details can be found in the Configuration Properties guide.

Comparator Features and Properties

Features and properties are managed using the API or a DXP/DCP definition. The Features and Properties document describes the features and properties available.

Parser Features

Features for the Apache Xerces parser can be set either from the API or from a DXP/DCP configuration. A DXP example can be found in the XInclude and XML Compare sample.
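
To give a feel for what enabling a parser feature such as XInclude does, the sketch below switches it on through the standard JAXP factory in the JDK, whose built-in parser is derived from Xerces. It is a stand-in only: it does not show how XML Compare or DXP sets parser features, and the class name ParserFeaturesSketch and the two temporary files are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Sketch: XInclude processing enabled on a plain JAXP parser (not the XML Compare API).
class ParserFeaturesSketch {

    // Parses a file with namespace and XInclude processing switched on.
    static Document parseWithXInclude(Path file) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);  // XInclude elements are recognised by namespace
            factory.setXIncludeAware(true);   // resolve xi:include elements while parsing
            return factory.newDocumentBuilder().parse(file.toFile());
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Parse failed", e);
        }
    }

    // Writes two hypothetical files to a temporary directory and parses the outer one.
    static Document demo() {
        try {
            Path dir = Files.createTempDirectory("xinclude-sketch");
            Files.write(dir.resolve("part.xml"),
                "<part>included text</part>".getBytes(StandardCharsets.UTF_8));
            Path main = dir.resolve("main.xml");
            Files.write(main,
                ("<doc xmlns:xi='http://www.w3.org/2001/XInclude'>"
                    + "<xi:include href='part.xml'/></doc>").getBytes(StandardCharsets.UTF_8));
            return parseWithXInclude(main);
        } catch (IOException e) {
            throw new IllegalStateException("Could not write sample files", e);
        }
    }

    public static void main(String[] args) {
        Document doc = demo();
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

With the feature enabled, the xi:include element in main.xml is replaced by the content of part.xml before any later processing sees the document.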

Output Properties

Output properties control the serializer of XML Compare's internal Saxon processor. They are set from the API or by using DXP or DCP. An example of how DocType and indentation are set using DXP can be found in the Pipeline Configuration using DXP document.
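
For readers unfamiliar with serializer output properties, the sketch below sets indentation and a DOCTYPE system identifier using the standard javax.xml.transform.OutputKeys constants. It uses the JDK's built-in transformer rather than XML Compare's internal Saxon processor, so treat it as an illustration of the concept only; the class name OutputPropertiesSketch and the identifier doc.dtd are assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch: standard serializer output properties on a JAXP identity transformer.
class OutputPropertiesSketch {

    // Re-serializes the XML with the requested indentation and optional DOCTYPE.
    static String serialize(String xml, boolean indent, String doctypeSystem) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
            if (doctypeSystem != null) {
                // Emits a DOCTYPE declaration that names the root element and this system id.
                identity.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doctypeSystem);
            }
            StringWriter out = new StringWriter();
            identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize("<doc><p>x</p></doc>", true, "doc.dtd"));
    }
}
```

The same property names (indent, doctype-system and so on) come from the XSLT serialization model, which is why they can be expressed equally well in an API call or in a configuration file.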