Release Documentation

User Guide

1. Introduction

XML Compare provides a powerful solution to identify and process the differences between any two XML files that share the same root element. Its primary use is as a toolkit for integration into other systems or applications via the comprehensive API, but it may also be run standalone from the command-line or a simple GUI.

This user guide introduces you to the XML Compare product, providing a high-level product description along with a look at the main features and concepts associated with this product. You can find more detailed information on the subjects covered here by following the links to an extensive set of tutorials, samples and papers that complement this product.

The Getting Started 'ReadMe' file provides a quick start for the product as well as a description of all dependencies. The Samples 'ReadMe' gives a summary of all the included samples. An overview of the features in the Document Comparator can be found in the Document Comparator Guide. Comprehensive technical implementation information can be found in the Java and .NET API documentation.

Figure 1. The XML Compare API provides a high level of extensibility

The XML Compare approach is unique in that:

  • Changes are recorded in an XML 'delta file'.

  • The delta file has the same look and feel as the original files.

  • The delta file can include changes only or changes plus unchanged data.

  • The delta file is easy to understand and to process because it is an XML file.

  • The delta file can therefore be processed with standard XML tools.

  • Comparison can be customized by defining/extending filter pipelines.

2. XML Comparison Features

Two input files are used for an XML Compare comparison, referred to here as 'A' and 'B' files. Whilst it is often the case that 'B' is a modification of 'A', it is also possible that both inputs are derived independently from a common source. Using this terminology, a user-oriented set of high-level features is outlined below:

General Features
  • Find all the differences between any two XML files ('A' and 'B').

  • Apply changes to convert an 'A' XML file into the 'B' version (i.e. a diff 'patch').

  • Undo changes to convert a 'B' XML file into the 'A' version.

  • Display change information in either XML or HTML form, using a standard web browser.

  • Report changes only or changes+unchanged data.

  • Use XSLT input and output filters to pre and post process the XML data.

  • Handle large files without performance degradation.

Document Comparison Features and Benefits
  • Extensible pipeline with embedded functionality.

  • Text processing for differences on a word-by-word basis.

  • Special processing for formatting-elements.

  • HTML/CALS table structure aware.

  • Extension points for adding filter steps to the pipeline.

  • Ignoring changes to non-significant whitespace.

  • New in 9.1 - Handle moves of uniquely identifiable elements. See Detecting and Handling Moves for more details.

Multi-Document Comparisons

XML Compare only supports the comparison of two XML documents at a time; multi-document comparison is however available in XML Merge and DITA Merge.

3. Running a Comparison

XML Compare runs locally on your own hardware and allows you to quickly embed XML comparison functionality into your own systems. It can be run in a variety of ways, with the range of options determined by the version of the product downloaded.

Table 1. Comparison options for XML Compare Downloads

System            Java (Unix/Windows)   macOS   .NET
Command Line
oXygen Plugin 
Java API 

The XML Compare Download Page provides a choice of three downloads: Java (UNIX & Windows), macOS, and .NET. Unless you want to use the .NET API, select the download option that matches your target operating system; each download option includes the required Java JAR files (e.g. deltaxml.jar) and support resources. If you're developing for .NET (on Windows), select the .NET download option; note that this version does not include the GUI available in the other downloads. Note: A licence file is required to run XML Compare; see the Licensing User Guide for more details.

Figure 2. XML Compare can be invoked using a choice of interfaces (simplified view)

A comparison can be run programmatically, using Java or .NET APIs or, alternatively, it can be user-driven via the command-line (see the Command-Line ReadMe), an oXygen plugin (after installation of the DeltaXML oXygen Adaptor) or a simple graphical user-interface (GUI). Note that the GUI is designed to help demonstrate some of the built-in capabilities of XML Compare, but it is not intended as a standalone productivity tool.

It is also possible to invoke a further nested comparison from within an XSLT filter using a provided compare() XSLT extension function; this is described in the Java API documentation.

4. Customising a Comparison

Since XML Compare uses XML to represent changes, an API and Pipeline Configuration architecture allows standard XML technologies such as XSLT to be applied. Complex information pipelines can therefore be built from a set of simple, proven components.

Figure 3. Configuration of a typical custom comparison pipeline

4.1. Samples of Customized Comparisons

A set of samples is included with XML Compare; these include working code and documentation for a number of customized comparison scenarios.

4.2. Choosing the Comparator

When a comparison is invoked via the recommended com.deltaxml.cores9api API, you have the choice of two comparator classes: DocumentComparator or PipelinedComparatorS9. (Note that when the GUI or command-line processor is used to start a comparison, the standard pipelined comparator class, PipelinedComparatorS9, performs the comparison.)

Pipelined Comparator

Implemented via the PipelinedComparatorS9 class, this provides a very flexible form of comparison, best suited to cases where the input XML is not always document based or where you require low-level control of the processing pipeline. Except for restrictions associated with lexical preservation filters, input and output filters can be added to the processing pipeline at any point.

Document Comparator

Implemented through the DocumentComparator class, this has a pipeline specially optimized for document comparison; Figure 4 shows a simplified representation of this pipeline. Explicit extension points are available on the pipeline so that new filter-steps or chains can be inserted in a managed way.

Figure 4. Filter steps or chains can be applied to specific extension points of the Document Comparator

4.3. Defining Pipelines

Pipelined Comparator

The Pipelined Comparator allows comparisons to be optimized for particular types of data or document structure. It also allows customisation of the way detected differences are represented in the output. The pipeline for a Pipelined Comparator is defined using a set of filters managed in FilterStep and FilterChain objects that can be added at either comparator input ('A' or 'B') or at the comparator output.

The guide Specifying a Comparison Pipeline provides an overview of how pipelines can be defined with the Pipelined Comparator, specifically through the use of Java, C# or an XML pipeline descriptor file format called DXP.

More details on the use of DXP can be found in the document Pipeline Configuration using DXP.

Document Comparator

The Document Comparator differs from the Pipelined Comparator in that key parts of the pipeline are pre-defined with specialist document comparison features; this pipeline is modified by adding filters at certain named 'extension points'.

As in the Pipelined Comparator, filters are managed as FilterStep and FilterChain objects in Java or C#; these are added to the pipeline using the DocumentComparator's setExtensionPoint method. An alternative way to configure a Document Comparator is to use a Document Comparator Pipelines configuration file (DCP).

The Document Comparator is described in the Document Comparator Guide. More details on using DCP can be found in the guide Document Comparator Configuration using DCP.

JAXP Pipeline Comparator (legacy)

A lower-level method for creating pipelines (now regarded as legacy but still useful for advanced users) is also available for Java developers; this exploits JAXP interfaces. JAXP Pipeline Examples introduces you to a set of examples available for download, and the paper Powering Pipelines with JAXP provides further details on using JAXP.
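
To give a feel for what chaining XSLT filters with JAXP involves, the following is a minimal sketch that uses only the standard javax.xml.transform classes shipped with the JDK. It does not use any XML Compare class and is not the product's JAXP comparator API; the class name, the two filter stylesheets and the sample input are all hypothetical and exist only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Illustrative only: applies a sequence of XSLT filters using plain JAXP. */
final class JaxpFilterChainSketch {

    // Identity template shared by both hypothetical filters.
    private static final String IDENTITY =
        "<xsl:template match='@*|node()'><xsl:copy>"
      + "<xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>";

    // Hypothetical filter 1: drop XML comments.
    static final String STRIP_COMMENTS =
        stylesheet("<xsl:template match='comment()' priority='1'/>");

    // Hypothetical filter 2: drop 'timestamp' attributes.
    static final String STRIP_TIMESTAMPS =
        stylesheet("<xsl:template match='@timestamp' priority='1'/>");

    private static String stylesheet(String extraTemplate) {
        return "<xsl:stylesheet version='1.0' "
             + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
             + IDENTITY + extraTemplate + "</xsl:stylesheet>";
    }

    /** Runs the input through each stylesheet in turn and returns the final XML. */
    static String applyFilters(String xml, String... stylesheets) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            String current = xml;
            for (String xslt : stylesheets) {
                Transformer t = factory.newTransformer(
                        new StreamSource(new StringReader(xslt)));
                t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                StringWriter out = new StringWriter();
                t.transform(new StreamSource(new StringReader(current)),
                            new StreamResult(out));
                current = out.toString();
            }
            return current;
        } catch (TransformerException e) {
            throw new IllegalStateException("Filter failed", e);
        }
    }

    public static void main(String[] args) {
        String input = "<a timestamp='1'><!--note--><b>x</b></a>";
        System.out.println(applyFilters(input, STRIP_COMMENTS, STRIP_TIMESTAMPS));
    }
}
```

Each filter here re-parses the output of the previous one, which keeps the sketch short; a production pipeline would more likely connect the stages as a SAX event stream to avoid intermediate serialization.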

Pipeline Diagnostics

When there is a need to diagnose stages in a pipeline, a debugFiles mode is available in which the inputs and outputs of each filter are written to separate files; a file naming convention indicates where each 'debug file' fits into the pipeline. The debugFiles mode is set either by the setDebugFiles method call or with a Configuration Property (see Configuration Properties) in a DeltaXML configuration file named 'deltaXMLConfig.xml'. Sample XML for setting this property is shown below:

<!DOCTYPE deltaxmlConfig SYSTEM "deltaxml-config.dtd">
    value="true" />
    value="true" />

4.4. Configuration

Low-level XML Compare functionality is configured using different methods according to how the functionality is implemented. These different methods are summarized below:

Table 2. Configuration Summary

Config Properties          Comparator Features & Properties   Parser Features      Output Properties
Diagnostics Settings       DeltaV Format                      Configure XInclude   Indentation
Catalog Settings           Matching Algorithm                 JAXP/SAX Features    Doctype [a]
Lexical Preservation [b]   Diff/Patch Mode
                           Ordering Priority

[a] DocType is affected by the LexicalPreservation configuration property.
[b] The preferred method for setting LexicalPreservation is via the API.

Configuration Properties

Configuration Properties are used to control certain properties of a comparison operation that may have a wider scope than standard features and properties. More details can be found in the Configuration Properties guide.

Comparator Features and Properties

Features and properties are managed using the API or a DXP/DCP definition. The Features and Properties document describes the features and properties available.

Parser Features

Features for the Apache Xerces parser can be set either from the API or from a DXP/DCP configuration. A DXP example can be found in the sample XInclude and XML Compare.

Output Properties

Output properties control the serializer of XML Compare's internal Saxon processor; they are set from the API or using DXP or DCP. An example of how DocType and indentation are set using DXP can be found in the Pipeline Configuration using DXP document.

5. Document Comparison

XML document types such as DITA, DocBook and XHTML share a common set of features such as: inline formatting, tables, ordered/orderless lists and linked resources. To help achieve simple and accurate difference reports when comparing documents, each element supporting these features can be processed in a special way, both at comparison time and when the result is output.

Figure 5. Many document features can benefit from special processing.

For optimized processing of document-centric features, two approaches are recommended. The first approach is to exploit built-in features in XML Compare's document comparator augmented with custom XSLT filters where required. The second, more complex approach, is to use the pipelined comparator with a specially configured pipeline exploiting a set of custom XSLT filters.

Most of the features outlined in this section are incorporated into the document comparator. However, links are included to samples for cases where you wish to customize your own pipelined comparator; these samples also provide some useful insight into how the capabilities built into the document comparator actually work. Not all features are enabled by default in the document comparator.

Text Comparison

Normally text comparisons are case-sensitive, but there are certain contexts where case should be ignored; the Case Insensitive Comparison sample shows how this can be done. Also, comparison of text within each element of a document can be performed at different levels. Three levels are considered for XML Compare, as outlined below:

  • Text-node Level - if the content of a text node changes, the whole node is marked as a change (noting that a mixed-content element may contain more than one text node).

  • Word by Word - allows differences in content to be resolved down to specific words - normally differences are shown at the element level. The Word by Word Text Comparison tutorial introduces you to this concept.

  • Character by Character - a further refinement to Word by Word comparison, where differences within words are marked, this is described further in the Character by Character Comparison tutorial.
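
The word-by-word level can be pictured with a small, self-contained sketch. The code below is not XML Compare's own algorithm; it is a generic longest-common-subsequence diff over whitespace-separated tokens, and the class name, method name and output prefixes are hypothetical choices made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative word-level diff; not XML Compare's own algorithm. */
final class WordDiffSketch {

    /** Returns tokens prefixed with "  " (unchanged), "- " (only in A) or "+ " (only in B). */
    static List<String> diff(String a, String b) {
        String[] x = a.trim().split("\\s+");
        String[] y = b.trim().split("\\s+");

        // lcs[i][j] = length of the longest common subsequence of x[i..] and y[j..]
        int[][] lcs = new int[x.length + 1][y.length + 1];
        for (int i = x.length - 1; i >= 0; i--) {
            for (int j = y.length - 1; j >= 0; j--) {
                lcs[i][j] = x[i].equals(y[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk both token arrays, following the table to keep common words aligned.
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < x.length && j < y.length) {
            if (x[i].equals(y[j])) {
                out.add("  " + x[i]);
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- " + x[i++]);
            } else {
                out.add("+ " + y[j++]);
            }
        }
        while (i < x.length) out.add("- " + x[i++]);
        while (j < y.length) out.add("+ " + y[j++]);
        return out;
    }

    public static void main(String[] args) {
        diff("the quick brown fox", "the slow brown fox").forEach(System.out::println);
    }
}
```

With text-node level comparison the whole sentence in this example would be reported as changed; at word level only "quick" and "slow" are marked.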

Lexical Preservation

Lexical preservation retains content that is often lost when processing XML: XML comments, XML processing instructions, CDATA sections, DOCTYPE declarations and entity references. The features supporting this in XML Compare are described in the Lexical Preservation reference. For further help on the use of custom lexical preservation filters, see the tutorials How to Preserve Processing Instructions and Comments and How to Preserve Doctype Information.

Whitespace Management

Whitespace-only nodes found in an XML document should be treated differently depending on whether they are a significant part of content (as in mixed content) or simply used for formatting the XML source. The technique for this is described in the Managing White Space tutorial.
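
The distinction between significant and formatting whitespace can be sketched with the standard DOM API. This is not the technique from the Managing White Space tutorial; it is a simple heuristic, assumed here for illustration, that treats whitespace-only text nodes as formatting when their parent element has no other text content, and keeps them when the parent is mixed content. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Illustrative heuristic only; not XML Compare's whitespace handling. */
final class WhitespaceSketch {

    /** Removes whitespace-only text nodes from elements that hold no other text. */
    static String stripFormattingWhitespace(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            strip(doc.getDocumentElement());
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    private static void strip(Element e) {
        // An element counts as mixed content if any text child has non-whitespace characters.
        boolean mixed = false;
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE && !n.getTextContent().trim().isEmpty()) {
                mixed = true;
            }
        }
        Node n = e.getFirstChild();
        while (n != null) {
            Node next = n.getNextSibling(); // capture before any removal
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                strip((Element) n);
            } else if (!mixed && n.getNodeType() == Node.TEXT_NODE) {
                e.removeChild(n); // whitespace used only to lay out the source
            }
            n = next;
        }
    }

    public static void main(String[] args) {
        String xml = "<doc>\n  <p>Hello <b>big</b> <i>world</i></p>\n</doc>";
        System.out.println(stripFormattingWhitespace(xml));
    }
}
```

In the sample input, the indentation around the p element is dropped, while the single space between the b and i elements is kept because it separates words in mixed content.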

Table Comparison

Complications arise when comparing tables whose structure has changed, for example when a column has been inserted or removed. The DocumentComparator class of the XML Compare API has ProcessCalsTables and ProcessHTMLTables boolean properties (with get/set methods for Java) that, when set, manage table comparison so that the result remains valid.

Key-assisted Matching

Some document elements have unique content, such as id attributes, that can be highlighted for the comparator by adding a special key attribute. Keys are particularly useful for matching 'orderless' elements, but can also be of value for ordered elements, with some additional processing to handle moves (see Detecting and Handling Moves). More information can be found in the following samples and guides: Ordered Comparison, Mixed Ordered and Orderless Data and Comparing Orderless Elements.
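
The value of keys for orderless data can be shown with a short sketch. It does not use the XML Compare API or its key attribute; it simply pairs items from 'A' and 'B' by a key held in a map and labels each pairing with the A, B, A=B and A!=B values used by the delta format. The class name, method name and sample data are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

/** Illustrative key-based matching of orderless items; not the XML Compare API. */
final class KeyedMatchSketch {

    /**
     * Classifies each key as "A" (only in A), "B" (only in B),
     * "A=B" (same value in both) or "A!=B" (present in both, values differ).
     * Values are assumed to be non-null.
     */
    static Map<String, String> match(Map<String, String> a, Map<String, String> b) {
        TreeSet<String> keys = new TreeSet<>(a.keySet());
        keys.addAll(b.keySet());
        Map<String, String> result = new LinkedHashMap<>();
        for (String k : keys) {
            if (!b.containsKey(k)) {
                result.put(k, "A");
            } else if (!a.containsKey(k)) {
                result.put(k, "B");
            } else {
                result.put(k, a.get(k).equals(b.get(k)) ? "A=B" : "A!=B");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> a = new HashMap<>();
        a.put("id1", "apple");
        a.put("id2", "pear");
        a.put("id3", "plum");
        Map<String, String> b = new HashMap<>();
        b.put("id3", "plum");   // same item, different position
        b.put("id2", "peach");  // same key, changed content
        b.put("id4", "fig");    // new item
        System.out.println(match(a, b));
    }
}
```

Because items are paired by key rather than by position, reordering alone produces no differences; only genuinely added, deleted or modified items are reported.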

Linked Resources

For elements whose main purpose is to link to other resources such as images or other documents, results can be improved if special processing is applied. Filters can be included in the processing pipeline to handle such cases. The Image and Binary Comparison sample shows how such link elements can be processed using an XSLT filter that exploits a Java extension function for binary file comparison. This sample could be adapted to suit cases where the link target is a text or XML resource.

Formatting Elements

The document comparator can be configured (by modifying a simple XSLT identity transform) to recognize and process elements used predominantly for inline formatting. This allows content-based element alignment and supports overlaps in the formatted-text range between compared versions. Such formatting differences are represented using extensions introduced in version 2.1 of DeltaV2 and described in the document Overlapping Hierarchies in DeltaV2. Formatting differences can be rendered or styled independently from structural changes according to need.

A practical example of formatting element processing is included in the Formatting Element Changes sample.

Figure 6. Overlaps in formatting in different versions are detected and recorded in the DeltaV2.

6. Data Comparison

For more data-centric XML resources, the comparison pipeline may have a number of design considerations and priorities different from those for comparing document-centric resources (as described in the previous section). This section outlines comparison features that are more significant in this context, but of course, many features described in the Document Comparison section above may also apply.

Numeric Tolerances

When comparing floating point numbers, there may be a requirement to ignore value differences within a specified tolerance. This tolerance can be implemented via output filters based on existing filter resources included in XML Compare; Numeric Tolerances is a worked example of this.
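
The underlying test is simple, and can be sketched independently of any filter. The code below is not taken from the Numeric Tolerances sample; the class name, method name and the fallback to exact string comparison for non-numeric text are assumptions made for this illustration.

```java
/** Illustrative tolerance check; not the filter from the Numeric Tolerances sample. */
final class NumericToleranceSketch {

    /**
     * Returns true when both strings parse as numbers that differ by no more
     * than the given tolerance. Non-numeric text falls back to exact equality.
     */
    static boolean withinTolerance(String a, String b, double tolerance) {
        try {
            double x = Double.parseDouble(a.trim());
            double y = Double.parseDouble(b.trim());
            return Math.abs(x - y) <= tolerance;
        } catch (NumberFormatException e) {
            return a.equals(b);
        }
    }

    public static void main(String[] args) {
        System.out.println(withinTolerance("1.000", "1.0004", 0.001)); // difference is ignored
        System.out.println(withinTolerance("1.0", "1.1", 0.001));      // difference is reported
    }
}
```

An output filter built on this idea would inspect each modified value in the delta and, when the two versions are within tolerance, mark the value as unchanged.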

Comparing Large Datasets

When comparing large datasets there are some extra factors to consider; these are covered in the Comparing Large Files guide.

Ignoring Changes

For cases where changes in data are expected but not deemed significant, changes can be 'ignored' in the processing pipeline. A technique for this is explained in the sample Ignoring Changes.

7. System Integration

Java and .NET APIs

While other methods are provided (such as the command line), XML Compare is designed primarily to be controlled through its API. This runs natively on the Java 1.7 platform but there is also a .NET API wrapper for easy integration with the .NET framework.

Saxon Compatibility

Certain parts of the API allow for integration with an external (Saxonica Ltd) Saxon XSLT/XQuery processor; for example, overloads of the compare function provided by the comparator APIs take Saxon XdmNode instances as arguments. To minimize potential version conflicts with XML Compare's internal processor, XML Compare (versions 8.2 and later) exploits a 'compatibility layer' supporting Saxon versions 9.7 and 9.8.

XML Catalog Resolving

XML Compare uses a custom version of the Apache Commons OASIS catalog resolver by default; this can, however, be changed. Further details are in the guide Using a Catalog Resolver.

Progress Listeners

Systems often need to self-monitor or provide progress feedback to an end-user for operations that have the potential to take a noticeable amount of time. The XML Compare API has provision for adding progress listeners via a ProgressListener interface, allowing a comparison to be monitored through each significant processing stage in the pipeline configuration.

8. Output Formats

8.1. Direct XML Compare Output

The direct output from XML Compare is the 'Delta'; this is the base XML output for both the Pipelined Comparator and the Document Comparator. By default the Delta includes all content, including unchanged content, but there is also an option for a 'patch' output where only the changes are included. Other output format options are also available and described in this section; these are essentially transforms of the original Delta.

8.1.1. The Delta

Figure 7. The Delta XML output from XML Compare uses the DeltaV2 format.

The Delta is the XML output direct from the XML Compare comparator which uses the DeltaV2 format to mark up changes. This format is designed to be compact whilst also making code that processes it clean and efficient. Version 2.0 of the DeltaV2 format is used by default, but if the Document Comparator is used with marked up formatting elements, then version 2.1 is used. Version 2.1 is a superset of 2.0 with extensions to represent overlapping XML hierarchies.

At its simplest, the DeltaV2 format is a representation of the 'A' and 'B' documents in a single document. For this, deltaxml:deltaV2 attributes (in the DeltaXML namespace) are added to all elements where differences are found. The deltaV2 attribute may hold one of the following values: A, B, A=B and A!=B. The A or B represents the document source, and the = or != separator indicates whether the matching source elements are the same or different. Extra elements in the DeltaXML namespace are used to represent modified text or attribute nodes. The DeltaV2 format is defined in full in the DeltaV2 reference; the extensions added in version 2.1 are described in more detail in the reference Overlapping Hierarchies in DeltaV2.
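
Because the Delta is ordinary XML, it can be inspected with standard tools. The sketch below uses only the JDK's DOM parser to count how often each deltaV2 value occurs. The sample document is hand-written for illustration and is not real comparator output, and the namespace URI bound to the deltaxml prefix is an assumption that should be checked against the DeltaV2 reference. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Illustrative only: counts deltaV2 attribute values using the standard DOM API. */
final class DeltaV2CountSketch {

    // Assumed namespace URI for the 'deltaxml' prefix; verify against the DeltaV2 reference.
    static final String DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1";

    /** Returns a map from each deltaV2 value (A, B, A=B, A!=B) to its element count. */
    static Map<String, Integer> countDeltaValues(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, Integer> counts = new TreeMap<>();
            NodeList all = doc.getElementsByTagName("*"); // every element, root included
            for (int i = 0; i < all.getLength(); i++) {
                Element e = (Element) all.item(i);
                if (e.hasAttributeNS(DELTAXML_NS, "deltaV2")) {
                    counts.merge(e.getAttributeNS(DELTAXML_NS, "deltaV2"), 1, Integer::sum);
                }
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse delta", e);
        }
    }

    public static void main(String[] args) {
        // Hand-written sample, not produced by XML Compare.
        String delta =
              "<list xmlns:deltaxml='" + DELTAXML_NS + "' deltaxml:deltaV2='A!=B'>"
            + "<item deltaxml:deltaV2='A=B'>one</item>"
            + "<item deltaxml:deltaV2='A'>two</item>"
            + "<item deltaxml:deltaV2='B'>three</item>"
            + "</list>";
        System.out.println(countDeltaValues(delta));
    }
}
```

The same attribute tests work equally well as XPath or XSLT match patterns, which is how the output filters described later in this section select changed content.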

8.2. Supplementary Output Formats

This section describes output format filters included with the XML Compare distribution. These are used to transform the Delta output within the comparison pipeline (Pipelined Comparator or Document Comparator) immediately prior to serialization.

8.2.1. HTML Difference Reports

HTML5 Side-by-Side (diffreport-sbs)

This is a JavaScript-dependent HTML view that presents the comparison result as the raw XML of the two input file versions rendered alongside each other; colored graphics are used to show how matching elements align. The user-interface provides up/down buttons on a toolbar, allowing the end-user to highlight each change.

Figure 8. The 'side-by-side' output format.

In the Pipelined Comparator, the HTML for this view can be generated using a built in DXP configuration which is invoked from the command-line or GUI using the diffreport-sbs configuration id. Alternatively, it can be generated from the XML Compare API with the dx2-side-by-side.xsl stylesheet added as the final output filter.

For the Document Comparator, the DCP equivalent configuration doc-diffreport-sbs must be used. Or, if using the API, the XSLT filter dx2-side-by-side.xsl should be added as a filter-step to the OUTPUT_FINAL extension point, as shown in the following Java code:

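DocumentComparator dcr= new DocumentComparator();
FilterStepHelper   fsh= dcr.newFilterStepHelper();
FilterChain     fChain= fsh.newFilterChain();
// Load the side-by-side stylesheet from the XML Compare resources
FilterStep       fsSBS= fsh.newFilterStepFromResource(
                 "xsl/side-by-side/dx2-side-by-side.xsl", "side-by-side");
// Add the step to the chain before attaching the chain to the extension point
fChain.addStep(fsSBS);
dcr.setExtensionPoint(ExtensionPoint.OUTPUT_FINAL, fChain);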

HTML Folding Report (diffreport)

As with the side-by-side view described earlier, this is also a JavaScript-dependent HTML view of the comparison result. This view, however, shows XML differences interleaved within a single view of the XML.

The color of the rendered XML indicates the type of change (blue, green and red for 'modified', 'added' and 'deleted' respectively). The view of each element node may be folded/unfolded by pressing the icon immediately to the left of the start tag. A simple toolbar and differences list allow for easier navigation of changes in large documents.

Figure 9. The 'folding' output format.

With the Pipelined Comparator, the HTML for the folding view is generated using a built in DXP configuration which can be invoked either from the GUI or from the command-line, as with the side-by-side view, but now using the diffreport configuration id.

For the Document Comparator, the folding view can be created from the command-line or GUI using the DCP configuration id doc-diffreport. Alternatively, the associated XSLT stylesheet can be added as a filter-step to the final output extension point. This is illustrated in the DCPdiffReport sample.

8.2.2. XML Diff and Patch Output

The XML Compare comparators may be configured to output either a full-context delta (the default) or a changes-only delta. When the pipelined comparator (but not the document comparator) is used, the changes-only format may be used to recreate document B from document A; this could be useful in version control systems and similar scenarios. A worked example of this is Using Deltas for XML Versioning (diff and patch).

8.3. Document Comparator Formats

8.3.1. Tracked Changes

Many XML editors support a tracked changes feature incorporated into an Author Mode with a WYSIWYG view; the output from XML Compare can be represented as tracked changes in supported tools. The main benefit is that detected changes can be more easily accepted or rejected and further edits made within the chosen editor. The Document Comparator API provides a setResultFormat method of the OutputFormatConfiguration object to produce output conforming to the tracked changes format for the following XML editors:

  • oXygen

  • PTC ArborText

  • XMetal

  • Adobe FrameMaker

Figure 10. The tracked changes feature supports a number of XML Editors