
Specifying a Comparison Pipeline

Introduction

One of the main features of XML Compare is the ability to define a comparison pipeline to use when processing your delta. This pipeline definition was introduced in version 3.0 (as the PipelinedComparator class) and allows the specification of input and output filter chains to apply to the data before and after a comparison takes place. This adds powerful functionality which allows the processing of delta files into standards-compliant output files that show change using the grammar of the input file format only.

The DXP Configuration Format

In XML Compare 3.1, the DXP file format was introduced. DXP files are XML definitions of pipelines that can be used with the provided command line tool or used to generate a pre-configured pipeline instance. DXP allows the specification of almost all of the features available on the corresponding API classes.

PipelinedComparatorS9

Since the introduction of the original com.deltaxml.core API package in version 3.0, an improved package based on the Saxon s9api interfaces (com.deltaxml.cores9api) has been released. This new API package has more features and better performance than our original JAXP-based implementation. PipelinedComparatorS9 and the DocumentComparator (see below) are recommended for new users, and this document focuses primarily on the PipelinedComparatorS9.

DocumentComparator and DCP

Version 7.0 of XML Compare introduced a new DocumentComparator component in the cores9api package. This extends the PipelinedComparatorS9 with extra features focused on the comparison of narrative content (as opposed to data-centric content). 'DCP' is the file format, available from Version 7.2, that can be used to configure the DocumentComparator as an alternative to the Java API. A full description of the DocumentComparator can be found in the Document Comparator Guide.

This document, plus the code samples included in the XML Compare release in directory samples/PipelineDefinition, walk you through these technologies for pipeline specification showing how they relate to each other.

PipelinedComparatorS9

The first thing you need before configuring the pipeline is a PipelinedComparatorS9. This is the class that will be configured and on which you start the actual comparison.

Examples

DXP

The minimal DXP file that creates an 'unconfigured' pipeline is shown below:

XML
<comparatorPipeline description="A simple comparison" id="compare"/>

This will validate against the DXP DTD and, if the pipeline is run using the command line interface, will perform a simple comparison, producing a changes-only delta.

Java

Simply create a new PipelinedComparatorS9 instance using the following Java code:

JAVA
import com.deltaxml.cores9api.PipelinedComparatorS9;
...
PipelinedComparatorS9 pc= new PipelinedComparatorS9();

Parser Features

When a comparison is triggered, the PipelinedComparator creates new parser instances (if necessary) with which to parse the input files. Unless otherwise specified, the Apache Xerces parser that is distributed with XML Compare will be used.

It may be necessary to configure the parser before use, for example to enable XInclude or to specify how the inputs are validated.

Examples

The following examples show how to enable XInclude on the Apache Xerces parser with each technology.

DXP

Specify parser features using the <parserFeatures> element. Each feature is set using its own <feature> child element:

XML
<parserFeatures>
  <feature name="http://apache.org/xml/features/xinclude" literalValue="true"/>
</parserFeatures>
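
To make the effect of the XInclude feature concrete, the following sketch parses the same document with and without XInclude processing. It is an illustration only: it uses the JDK's built-in JAXP parser as a stand-in for the Xerces parser distributed with XML Compare, it does not use the XML Compare API, and the class name and file names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/** Hypothetical illustration of XInclude processing with plain JAXP (not the XML Compare API). */
final class XIncludeSketch {

    /** Parses a document containing one xi:include and counts the 'part' elements that result. */
    static int countIncludedParts(boolean xincludeEnabled) {
        try {
            // Write a small main document and the fragment it includes to a temporary directory.
            Path dir = Files.createTempDirectory("xinclude-sketch");
            Files.writeString(dir.resolve("part.xml"), "<part>included</part>");
            Path main = dir.resolve("main.xml");
            Files.writeString(main,
                "<doc xmlns:xi='http://www.w3.org/2001/XInclude'>"
                + "<xi:include href='part.xml'/></doc>");

            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);            // XInclude requires namespace awareness
            factory.setXIncludeAware(xincludeEnabled);  // the switch being demonstrated
            Document doc = factory.newDocumentBuilder().parse(main.toFile());
            return doc.getElementsByTagName("part").getLength();
        } catch (IOException | ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Could not parse the sample document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("with XInclude:    " + countIncludedParts(true));
        System.out.println("without XInclude: " + countIncludedParts(false));
    }
}
```

With the feature switched off, the xi:include element is left in the parsed document unresolved, so a comparison would see the include instruction rather than the included content.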

Comparator Features

Comparator features are switchable functionality settings on the comparator. As a feature is essentially switched on or off, it is set using a boolean value of true or false. Features on the comparator include http://deltaxml.com/api/feature/isFullDelta, http://deltaxml.com/api/feature/enhancedMatch1 and http://deltaxml.com/api/feature/deltaV2. See the Features List for more details on comparator features.

Examples

The following examples show how to turn on the comparator feature http://deltaxml.com/api/feature/isFullDelta which turns the delta from a changes-only delta to a full-context delta.

DXP

Specify comparator features using the <comparatorFeatures> element. Each feature is set using its own <feature> child element:

XML
<comparatorFeatures>
  <feature name="http://deltaxml.com/api/feature/isFullDelta" literalValue="true"/>
</comparatorFeatures>

Comparator Properties

Comparator properties are settings that take an instantiated Object as a value. One example is http://deltaxml.com/api/property/orderlessPresentation which can take a number of String values. See the Properties List for more details on comparator properties.

Examples

The following examples show how to set the comparator property http://deltaxml.com/api/property/orderlessPresentation to 'a_matches_deletes_adds'.

DXP

Specify comparator properties using the <comparatorProperties> element. Each property is set using its own <property> child element. N.B. DXP only supports the setting of String property values.

XML
<comparatorProperties>
  <property name="http://deltaxml.com/api/property/orderlessPresentation"
    literalValue="a_matches_deletes_adds"/>
</comparatorProperties>

Input Filters

The main purpose of the PipelinedComparatorS9 is to make the addition of filters into input and output chains a lot simpler. It is possible to use JAXP to chain filters together but the use of PipelinedComparatorS9 is recommended.

Input filters are applied to the input files in the order specified before the comparison takes place. They can be Java-based streaming filters or XSLT filters.
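
As a rough picture of what "applied in the order specified" means, the sketch below chains XSLT filters using plain JAXP, feeding the output of each step into the next. It is not the PipelinedComparatorS9 implementation; the class, the helper methods and the inline stylesheets are hypothetical and exist only for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical illustration: runs XSLT filters one after another using plain JAXP. */
final class XsltChainSketch {

    /** Applies each stylesheet in order; the output of one step is the input of the next. */
    static String applyChain(String xml, List<String> stylesheets) {
        TransformerFactory factory = TransformerFactory.newInstance();
        String current = xml;
        try {
            for (String xslt : stylesheets) {
                Transformer step = factory.newTransformer(new StreamSource(new StringReader(xslt)));
                step.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                StringWriter out = new StringWriter();
                step.transform(new StreamSource(new StringReader(current)), new StreamResult(out));
                current = out.toString();
            }
        } catch (TransformerException e) {
            throw new IllegalStateException("Filter step failed", e);
        }
        return current;
    }

    /** Builds an identity stylesheet that additionally renames one element. */
    static String renameFilter(String from, String to) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:template match='@*|node()'><xsl:copy>"
            + "<xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>"
            + "<xsl:template match='" + from + "'><" + to + ">"
            + "<xsl:apply-templates select='@*|node()'/></" + to + "></xsl:template>"
            + "</xsl:stylesheet>";
    }

    public static void main(String[] args) {
        // Renaming a to b and then b to c differs from applying the same filters in reverse.
        List<String> chain = List.of(renameFilter("a", "b"), renameFilter("b", "c"));
        System.out.println(applyChain("<a>text</a>", chain));
    }
}
```

Because each filter sees the result of the previous one, swapping the two steps in this example leaves the element named b instead of c, which is why the order of filters in a chain matters.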

Symmetrical Input Filter Chains

If you wish to pass each of the two inputs through the same set of filters, with the same parameter values (where applicable), you only need to define the filter chain once and it will be run on each input in turn.

The following examples show how to specify an input filter chain that consists of two XSLT filters followed by a Java filter.

DXP

If the same filter chain should be applied to both inputs, use the <inputFilters> element to define them. Each filter is defined in its own <filter> child element:

XML
<inputFilters>
  <filter>
    <file path="input-filter1.xsl" relBase="dxp"/>
  </filter>
  <filter>
    <file path="input-filter2.xsl" relBase="dxp"/>
  </filter>
  <filter>
    <class name="com.deltaxml.demo.SimpleJavaFilter"/>
  </filter>
</inputFilters>

Java

Symmetrical input filter chains should be set using the setInputFilters() method on the PipelinedComparatorS9.

JAVA
import com.deltaxml.demo.SimpleJavaFilter;
import java.io.File;
import com.deltaxml.cores9api.FilterChain;
import com.deltaxml.cores9api.FilterStepHelper;
...
FilterStepHelper fsh= pc.newFilterStepHelper();
FilterChain inChain= fsh.newFilterChain();

inChain.addStep(fsh.newFilterStep(new File("input-filter1.xsl"), "in-filter1"));
inChain.addStep(fsh.newFilterStep(new File("input-filter2.xsl"), "in-filter2"));
inChain.addStep(fsh.newFilterStep(SimpleJavaFilter.class, "in-java-filter"));

pc.setInputFilters(inChain);

Asymmetrical Input Filter Chains

For some pipelines, you may wish to apply different filter chains to each input or to pass different parameters to the same filter chains depending on which input is being processed. The PipelinedComparator allows the use of asymmetrical input filter chains for this reason.

The following examples show how to specify different filter chains for each input, including how to pass different parameter values to the same filter.

DXP

When using different filter chains for each input, replace the <inputFilters> element with <input1Filters> and <input2Filters>:

XML
<input1Filters>
  <filter>
    <file path="input-filter1.xsl" relBase="dxp"/>
  </filter>
  <filter>
    <file path="input-filter-with-parameter.xsl"/>
    <parameter name="an-input-param" literalValue="input1-value"/>
  </filter>
</input1Filters>
<input2Filters>
  <filter>
    <file path="input-filter1.xsl" relBase="dxp"/>
  </filter>
  <filter>
    <file path="input-filter-with-parameter.xsl"/>
    <parameter name="an-input-param" literalValue="input2-value"/>
  </filter>
  <filter>
    <class name="com.deltaxml.demo.SimpleJavaFilter"/>
  </filter>
</input2Filters>

Java

Asymmetrical input filter chains should be specified with setInput1Filters() and setInput2Filters() methods on PipelinedComparatorS9:

JAVA
import com.deltaxml.demo.SimpleJavaFilter;
import com.deltaxml.cores9api.FilterChain;
import com.deltaxml.cores9api.FilterStep;
import com.deltaxml.cores9api.FilterStepHelper;
import java.io.File;
...
FilterStepHelper fsh= pc.newFilterStepHelper();
FilterStep step= null;
...
FilterChain inChain1= fsh.newFilterChain();
step= fsh.newFilterStep(new File("input-filter1.xsl"), "in-filter1");
inChain1.addStep(step);
step= fsh.newFilterStep(
        new File("input-filter-with-parameter.xsl"),
        "a-param-filter"
);
step.setParameterValue("an-input-param", "input1-value");
inChain1.addStep(step);

FilterChain inChain2= fsh.newFilterChain();
step= fsh.newFilterStep(new File("input-filter1.xsl"), "in-filter1");
inChain2.addStep(step);
step= fsh.newFilterStep(
        new File("input-filter-with-parameter.xsl"),
        "a-param-filter"
);
step.setParameterValue("an-input-param", "input2-value");
inChain2.addStep(step);
step= fsh.newFilterStep(SimpleJavaFilter.class, "java-filter");
inChain2.addStep(step);

pc.setInput1Filters(inChain1);
pc.setInput2Filters(inChain2);
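
The effect of giving the same XSLT filter a different parameter value on each input can be sketched with plain JAXP. This is an illustration under assumptions, not the XML Compare API: the inline stylesheet is a hypothetical stand-in for input-filter-with-parameter.xsl, and it simply records the value of an-input-param on the root element.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical illustration: one stylesheet, run with a different parameter value per input. */
final class XsltParameterSketch {

    // Stand-in stylesheet: declares an xsl:param with the same name as the parameter passed in.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:param name='an-input-param' select=\"'unset'\"/>"
        + "<xsl:template match='/*'><doc source='{$an-input-param}'/></xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs the stylesheet over the input, passing the given value for an-input-param. */
    static String filter(String xml, String parameterValue) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.setParameter("an-input-param", parameterValue);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(filter("<a/>", "input1-value")); // as if processing the first input
        System.out.println(filter("<b/>", "input2-value")); // as if processing the second input
    }
}
```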

Output Filters

Output filter chains are constructed in the same way as input filter chains except that there is only one chain and it is applied to the result of the comparison.

Examples

The following examples show how to add a simple XSLT filter chain to the output.

DXP

Specify output filters using the <outputFilters> element. Each filter is defined in its own <filter> child element:

XML
<outputFilters>
  <filter>
    <file path="output-filter1.xsl" relBase="dxp"/>
  </filter>
  <filter>
    <file path="output-filter2.xsl" relBase="dxp"/>
  </filter>
</outputFilters>

Java

Output filters are added using the setOutputFilters() method on the PipelinedComparatorS9.

JAVA
import com.deltaxml.cores9api.FilterChain;
import com.deltaxml.cores9api.FilterStepHelper;
import java.io.File;
...
FilterStepHelper fsh= pc.newFilterStepHelper();
...
FilterChain outChain= fsh.newFilterChain();
outChain.addStep(fsh.newFilterStep(new File("output-filter1.xsl"), "out-filter1"));
outChain.addStep(fsh.newFilterStep(new File("output-filter2.xsl"), "out-filter2"));
pc.setOutputFilters(outChain);

Conditional Filters

When using DXP as a pipeline specification, it is helpful to be able to make some filters conditional. This can be achieved by specifying a boolean input parameter that is tested when adding the filter. Filters can be made conditional 'if' a parameter is true or 'unless' a parameter is true (or a combination of both). At present, only a single parameter can be tested for.

Examples

DXP

The boolean to test against should be specified as a <booleanParameter> element. The <filter> element then refers to the parameter name in an if or unless attribute:

XML
<pipelineParameters>
   <booleanParameter name="run-A" defaultValue="true"/>               <!-- filter A will run by default -->
   <booleanParameter name="dont-run-B" defaultValue="false"/>         <!-- filter B will run by default -->
   <booleanParameter name="tidy-inputs" defaultValue="true"/>         <!-- inputs should be 'tidied' by default -->
   <booleanParameter name="input-already-tidy" defaultValue="false"/> <!-- inputs are not already tidy by default -->
</pipelineParameters>

<inputFilters>
  <!-- will tidy input if told to unless it is already tidy -->
  <filter if="tidy-inputs" unless="input-already-tidy">
    <file path="tidy-input.xsl"/>
  </filter>
  <filter if="run-A">
    <file path="filterA.xsl"/>
  </filter>
  <filter unless="dont-run-B">
    <file path="filterB.xsl"/>
  </filter>
</inputFilters>

N.B. An alternative way to optionally add the first of the filters above is to use the when attribute. This attribute is only available if using the com.deltaxml.cores9api.DXPConfigurationS9 class to load the DXP file. Its value should be an XPath statement that evaluates to a boolean. Parameters (both string parameters and boolean parameters) can be referenced by adding a $ character to the start of their names, e.g.:

XML
  <!-- will tidy input if told to unless it is already tidy -->
  <filter when="$tidy-inputs and not($input-already-tidy)">
    <file path="tidy-input.xsl"/>
  </filter>

Java

Conditional filters in a source code pipeline may be implemented by either using if statements to decide whether or not to add specific filters to List objects, or by enabling or disabling filter steps using setEnabled(boolean).

In Java the following code will tidy input when the boolean variable tidyInputs is set, unless it is already tidy.

JAVA
  pc= new PipelinedComparatorS9();
  FilterStepHelper fsh= pc.newFilterStepHelper();
  FilterChain inChain1= fsh.newFilterChain();
  FilterStep step= fsh.newFilterStep(new File("tidy-input.xsl"), "tidy-input");
  inChain1.addStep(step);
  step.setEnabled(tidyInputs && !inputAlreadyTidy);
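
The combination of if and unless described for DXP can be read as a simple boolean rule: a filter runs when its if parameter (where present) is true and its unless parameter (where present) is not true. The sketch below states that rule in plain Java. The class and method names are hypothetical, and treating an undefined parameter as false is an assumption made for the illustration rather than documented behaviour.

```java
import java.util.Map;

/** Hypothetical illustration of the if/unless rule for conditional filters. */
final class ConditionalFilterSketch {

    /**
     * Decides whether a filter should run.
     * ifParam and unlessParam are parameter names, or null when the attribute is absent.
     * Assumption: a parameter missing from the map counts as false.
     */
    static boolean shouldRun(String ifParam, String unlessParam, Map<String, Boolean> params) {
        boolean ifSatisfied = ifParam == null || params.getOrDefault(ifParam, false);
        boolean unlessSatisfied = unlessParam == null || !params.getOrDefault(unlessParam, false);
        return ifSatisfied && unlessSatisfied;
    }

    public static void main(String[] args) {
        Map<String, Boolean> defaults = Map.of(
            "run-A", true,
            "dont-run-B", false,
            "tidy-inputs", true,
            "input-already-tidy", false);

        System.out.println("tidy-input.xsl: " + shouldRun("tidy-inputs", "input-already-tidy", defaults));
        System.out.println("filterA.xsl:    " + shouldRun("run-A", null, defaults));
        System.out.println("filterB.xsl:    " + shouldRun(null, "dont-run-B", defaults));
    }
}
```

A value computed this way could be passed to setEnabled(boolean) on the corresponding filter step, as in the Java example above.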

Filter Parameters

DXP and the PipelinedComparatorS9 allow String values to be passed to filters as parameters. These enable you to change the behaviour of filters and make more flexible pipelines. Parameters could be dependent on external values being passed in or could be dependent on which input chain is being processed (in the case of asymmetrical input filter chains). If a parameter is being passed to an XSLT filter, it should declare an <xsl:param> with the same name as the parameter being passed. If a parameter is being passed to a Java filter, it should have a public method called set{parameterName} that takes a single String, e.g. for a parameter called myParam, there should be a method public void setMyParam(String value) declared on the Java filter.

N.B. For XML Compare versions earlier than 6.0, the capitalisation of the set method is important: the case of the parameter name in the set method must match the name of the parameter itself. For example, to set myParam, a method called setmyParam() must be present; to set MyParam, a method called setMyParam() must be defined. From version 6.0, a parameter whose name starts with a lower case letter can be set by a method containing either the lower case form or the upper case form. The prefix 'set' must always be lower case.
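
The naming rule for Java filter parameters can be illustrated with a small reflection sketch. This is not the XML Compare implementation: SetterLookupSketch and DemoFilter are hypothetical names, and the lookup order (exact-case name first, then the capitalised form) is an assumption chosen to mirror the rule described for version 6.0 and later.

```java
import java.lang.reflect.Method;

/** Hypothetical illustration of mapping a parameter name to a set method by reflection. */
final class SetterLookupSketch {

    /** Stand-in for a Java filter that accepts a String parameter called myParam. */
    public static final class DemoFilter {
        private String myParam;

        public void setMyParam(String value) {
            this.myParam = value;
        }

        public String getMyParam() {
            return myParam;
        }
    }

    /** Looks for set{name} with the name as given, then with its first letter upper-cased. */
    static void setParameter(Object filter, String parameterName, String value) {
        String capitalised = Character.toUpperCase(parameterName.charAt(0)) + parameterName.substring(1);
        for (String methodName : new String[] {"set" + parameterName, "set" + capitalised}) {
            try {
                Method setter = filter.getClass().getMethod(methodName, String.class);
                setter.invoke(filter, value);
                return;
            } catch (NoSuchMethodException e) {
                // No method with this spelling; try the next candidate.
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Could not call " + methodName, e);
            }
        }
        throw new IllegalArgumentException("No set method found for parameter " + parameterName);
    }

    public static void main(String[] args) {
        DemoFilter filter = new DemoFilter();
        setParameter(filter, "myParam", "both-inputs");
        System.out.println(filter.getMyParam());
    }
}
```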

Examples

DXP

Filter parameters can be added to any filter type using the <parameter> element as a child of the <filter>. They can take a fixed value (defined using a literalValue attribute) or can take the value of a parameter defined as either <booleanParameter> or <stringParameter> elements underneath the <pipelineParameters> element (using the parameterRef attribute). In the case of boolean parameters, the boolean value is first converted to a String before being passed to the filter:

XML
<pipelineParameters>
  <stringParameter name="external-parameter" defaultValue="default"/>
</pipelineParameters>

<inputFilters>
  <filter>
    <file path="input-filter-with-parameter.xsl" relBase="dxp"/>
    <parameter name="an-input-param" literalValue="both-inputs"/>
  </filter>
  <filter>
    <class name="com.deltaxml.demo.SimpleJavaFilter"/>
    <parameter name="myParam" parameterRef="external-parameter"/>
  </filter>
</inputFilters>

An alternative way to pass parameters to filters when using com.deltaxml.cores9api.DXPConfigurationS9 to load the DXP file is to use the xpath attribute on the parameter element. This attribute contains an XPath statement that evaluates to a single atomic value (it will be converted to a string before being passed to the filter). The statement can reference any of the pipeline parameters by adding a $ character to the start of their names.

XML
<pipelineParameters>
  <stringParameter name="first-name" defaultValue="John"/>
  <stringParameter name="surname" defaultValue="Smith"/>
</pipelineParameters>

<inputFilters>
  <filter>
    <file path="name-replacement-filter.xsl" relBase="dxp"/>
    <parameter name="full-name" xpath="concat($first-name, ' ', $surname)"/>
  </filter>
</inputFilters>
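
To show how an XPath statement that references pipeline parameters as variables might be evaluated, the sketch below uses the JDK's javax.xml.xpath API with a variable resolver backed by a map. It is an illustration only: DXPConfigurationS9 may evaluate these expressions differently (for example with a later XPath version), and the class and method names here are hypothetical.

```java
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

/** Hypothetical illustration: evaluating an XPath statement that references parameters as $name. */
final class XPathParameterSketch {

    /** Evaluates the expression to a string, resolving $variables from the given parameters. */
    static String evaluate(String expression, Map<String, ?> pipelineParameters) {
        try {
            // An empty document serves as the context; the expressions only use variables.
            Document emptyContext =
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setXPathVariableResolver(name -> pipelineParameters.get(name.getLocalPart()));
            return xpath.evaluate(expression, emptyContext);
        } catch (ParserConfigurationException | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> names = Map.of("first-name", "John", "surname", "Smith");
        System.out.println(evaluate("concat($first-name, ' ', $surname)", names));

        Map<String, Boolean> flags = Map.of("tidy-inputs", true, "input-already-tidy", false);
        System.out.println(evaluate("$tidy-inputs and not($input-already-tidy)", flags));
    }
}
```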

Java

A FilterStep object's parameter can be set using the setParameterValue method as illustrated below. A parameter value can be changed at any time before a comparison, but should not be updated during a comparison.

Implementation Note: When setting a parameter on a Java filter, reflection is used.

JAVA
import com.deltaxml.cores9api.FilterChain;
import com.deltaxml.cores9api.FilterStep;
import com.deltaxml.cores9api.FilterStepHelper;
import com.deltaxml.demo.SimpleJavaFilter;
import java.io.File;
...
FilterStepHelper fsh= pc.newFilterStepHelper();
FilterChain inChain= fsh.newFilterChain();
...
String externalParameter= "default";
FilterStep step= null;
...
step= fsh.newFilterStep(
        new File("input-filter-with-parameter.xsl"),
        "in-param-filter"
);
step.setParameterValue("an-input-param", "both-inputs");
inChain.addStep(step);

step= fsh.newFilterStep(SimpleJavaFilter.class, "in-java-filter");
step.setParameterValue("myParam", externalParameter);
inChain.addStep(step);

pc.setInputFilters(inChain);

Filter Types

Filters can be declared using a variety of types. Java filters are always declared as Class filters but XSLT filters can be declared as files, URLs, Templates objects, classpath resources, XsltTransformers or XsltExecutables depending on which technology is being used.

Adding Filters using DXP

Filters can be added to a DXP pipeline as named classes, files, URLs or classpath resources:

XML
<filter>
  <class name="com.deltaxml.demo.SimpleJavaFilter"/>
</filter>

<filter>
  <file path="input-filterA.xsl"/>
</filter>

<filter>
  <http url="http://www.deltaxml.com/xml-compare/current/samples/PipelineDefinition/input-filterA.xsl"/>
</filter>

<filter>
  <resource name="xsl/input-filterA.xsl"/>
</filter>

DXP/DCP with Java

The following table shows the relationship between the DXP and DCP elements that are used as children of the filter element and the corresponding Object types allowed as filter list members in the 'Core S9API' packages.

Concept/package | DXP element name | Java com.deltaxml.cores9api package filter list member
compiled source filters | class | FilterStep (from a class literal)
file system | file | FilterStep (from a java.io.File object)
http filters | http | FilterStep (from a java.net.URL object)
jar/classpath resource filters | resource | FilterStep (from a resource string)
pre-compiled filter objects | - | FilterStep (from a net.sf.saxon.s9api.XsltExecutable object)

Output Properties

If the result of the comparison is to be serialized, it is possible to configure the final Transformer step using output properties. More details on output properties can be found in the W3C Recommendation for XSLT.

Examples

The following examples show how to configure the pipeline to indent the result file.

DXP

Output properties can be specified using the <outputProperties> element. Each property is defined in its own <property> child element:

XML
<outputProperties>
  <property name="indent" literalValue="yes"/>
</outputProperties>

Java

Call the setOutputProperty() method on the PipelinedComparatorS9 for each property you wish to set. These are specified using values of the net.sf.saxon.s9api.Serializer.Property enumeration:

JAVA
import net.sf.saxon.s9api.Serializer;
...
pc.setOutputProperty(Serializer.Property.INDENT, "yes");
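
To see what the indent output property does to a serialized result, the following sketch serializes the same XML with the property set to "yes" and to "no". It uses the JDK's JAXP identity Transformer as a stand-in for the final serialization step and does not use the XML Compare or Saxon APIs; the class name is hypothetical, and the exact whitespace produced when indenting depends on the serializer in use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical illustration of the 'indent' output property using a JAXP identity transform. */
final class IndentSketch {

    /** Serializes the XML unchanged, with the indent property switched on or off. */
    static String serialize(String xml, boolean indent) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a><b>x</b></a>";
        System.out.println("indent=no:");
        System.out.println(serialize(xml, false));
        System.out.println("indent=yes:");
        System.out.println(serialize(xml, true));
    }
}
```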