This document describes the concepts behind preserving processing instructions and comments. The resources associated with this sample are available in Bitbucket at https://bitbucket.org/deltaxml/preserving-processing-instructions-comments.
XML documents often contain processing instructions (PIs) or comments as well as the normal elements and attributes. These parts of the document are reported by the parser and processed during a comparison by default when the DocumentComparator or PipelinedComparatorS9 comparator classes are used. To achieve this, these node types are converted into XML elements within the document and then converted back again after comparison.
Sometimes, however, it is necessary to control the handling of comments and processing instructions at a finer granularity, for example so that changes in these node types can be treated differently (by default, the 'B' version of a changed node is output). This sample explains how this can be achieved using filters provided in the DeltaXML product.
Note that, if using the API, it may be easier to select one of the pre-configured lexical preservation modes as discussed in the Guide to Lexical Preservation, as many of them include the preservation of comments and processing instructions.
API Approach
A simple approach for retaining PIs and comments is to enable the built-in lexical preservation on our 'Core S9 API' comparators, such as the PipelinedComparatorS9, which are configured by passing them a LexicalPreservationConfig object. The following code extract illustrates how to enable just PI and comment preservation on a LexicalPreservationConfig object.
Having enabled the preservation, the next step is to specify how changes in PIs and comments should be handled. It is relatively straightforward to handle PIs and comments that are either unchanged or appear in an added or deleted XML element: in these cases, the PIs and comments appear as they would in the inputs. The difficulty lies in deciding how to handle PIs or comments that are modified, or that appear in only one of the sources but not inside an added or deleted XML element. In some cases, such as when there is no way of representing a change in a PI or comment, it may be appropriate to output the newer 'B' version; this is illustrated in the following code extract.
Where the desired output format can represent a change to a comment or processing instruction, further details on the lexical preservation format and scheme are required. This more advanced usage is discussed in the Explanation section of this sample.
DXP or DCP Approach
DXP and DCP pipeline configuration files can be used to configure the PipelinedComparatorS9 or the DocumentComparator respectively, allowing comparisons to be invoked with special configurations from a simple GUI or the command-line.
The lexicalPreservation DXP element.
The DXP and DCP pipeline configuration formats support the lexicalPreservation element, which can be used to set lexical preservation options. The approach is to first set the default options that apply to all lexical preservation artifacts, and then to set overrides for specific lexical artifact types. This is illustrated in the sample preserve-pis-and-comments-lp.dxp; a snippet is shown below:
This approach is very flexible because settings can be parameterised by using parameters declared in a
Running the sample
If you have Ant installed, use the build script provided to run the sample from the directory to which you downloaded the sample resources, normally samples/PreservePIsAndComments. To use the DXP configuration exploiting a lexicalPreservation element, simply type the following command to run the pipeline and produce the dxp-lp-result.xml output file.
Use the following command to compile and run the sample with the API approach and produce the output file
If you don't have Ant installed, you can run the sample DXP from a command line by issuing the following command from your sample directory (ensuring that you use the correct directory separators for your operating system). Replace x.y.z with the major.minor.patch version number of your release, e.g. command-10.0.0.jar.
To clean up the sample directory, run the following command in Ant.
Explanation
Converting Processing Instructions and Comments into XML
The first step in preserving PIs and comments is to convert them into XML elements. The following example shows an XML document that contains PIs and comments.
Example 1: an XML file containing PIs and comments (input1.xml in Bitbucket, https://bitbucket.org/deltaxml/preserving-processing-instructions-comments)
The XML Compare product includes a lexical preservation feature, which can be configured to enable the processing of PIs and comments. The following example shows the same file after the lexical preservation input processing has been applied. Notice that the PIs and comments that appeared outside of the root element have been moved inside it, wrapped in special container elements highlighting the fact.
Example 2: the XML file after passing through the LexicalPreservation input filter
These elements can now be compared as part of the comparison and will appear in the delta file.
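To make the idea of this input conversion concrete, the following is a minimal sketch written against the standard JDK DOM API only. It is not the DeltaXML lexical preservation filter and does not reproduce its output format: the element names sketch-comment, sketch-pi and sketch-before-root are hypothetical placeholders chosen for illustration, as is the class name LexicalEncodeSketch. It shows comments and PIs being replaced by ordinary elements, with those found before the root element moved inside it in a container.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

// Illustration only: element names are hypothetical, not the DeltaXML format.
final class LexicalEncodeSketch {

    /** Parses the XML text and replaces every comment and PI with a placeholder element. */
    public static Document encode(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        Element root = doc.getDocumentElement();

        // Encode comments and PIs nested anywhere inside the root element.
        encodeChildren(doc, root);

        // Move comments and PIs that precede the root element inside it,
        // wrapped in a container that records where they came from.
        Element container = doc.createElement("sketch-before-root");
        for (Node n : snapshot(doc.getChildNodes())) {
            if (n == root) {
                break;
            }
            Element encoded = toElement(doc, n);
            if (encoded != null) {
                doc.removeChild(n);
                container.appendChild(encoded);
            }
        }
        if (container.hasChildNodes()) {
            root.insertBefore(container, root.getFirstChild());
        }
        return doc;
    }

    private static void encodeChildren(Document doc, Node parent) {
        for (Node child : snapshot(parent.getChildNodes())) {
            Element encoded = toElement(doc, child);
            if (encoded != null) {
                parent.replaceChild(encoded, child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                encodeChildren(doc, child);
            }
        }
    }

    /** Returns a placeholder element for a comment or PI, or null for any other node. */
    private static Element toElement(Document doc, Node node) {
        if (node.getNodeType() == Node.COMMENT_NODE) {
            Element e = doc.createElement("sketch-comment");
            e.setTextContent(node.getNodeValue());
            return e;
        }
        if (node.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
            ProcessingInstruction pi = (ProcessingInstruction) node;
            Element e = doc.createElement("sketch-pi");
            e.setAttribute("target", pi.getTarget());
            e.setTextContent(pi.getData());
            return e;
        }
        return null;
    }

    /** Copies a live NodeList so the tree can be modified while iterating. */
    private static List<Node> snapshot(NodeList nodes) {
        List<Node> copy = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            copy.add(nodes.item(i));
        }
        return copy;
    }

    public static void main(String[] args) {
        Document doc = encode("<!--top--><root><?style href=\"a.css\"?><a/></root>");
        System.out.println(doc.getDocumentElement().getFirstChild().getNodeName());
    }
}
```

Once comments and PIs are ordinary elements, an element-based comparison can align and diff them like any other content, which is the reason for the conversion step.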
Converting back after comparison
The following example shows the delta file produced after comparing input1.xml and input2.xml in Bitbucket, https://bitbucket.org/deltaxml/preserving-processing-instructions-comments.
Example 3: a delta file showing changes to PIs and comments
The lexical preservation output chain can be used to convert this information into processing instructions and comments, as discussed in the API approach section. However, it is also possible for you to convert the lexically preserved processing instructions and comments into other custom formats, using custom XSLT filters. Further details on the format are presented in the Lexical Preservation Format document.
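The reverse conversion can be pictured with a similar sketch, again using only the standard JDK DOM API. Everything about the encoding here is an assumption made for illustration: the element names sketch-comment, sketch-pi and sketch-before-root, the class name LexicalDecodeSketch, and a simplified version attribute whose value "A" marks an item present only in the first input. This is not the DeltaXML delta format or its output filter chain. The sketch drops 'A'-only items and restores everything else, which is one simple way of keeping the 'B' version.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: element and attribute names are hypothetical, not the DeltaXML format.
final class LexicalDecodeSketch {

    /** Parses the XML text and turns placeholder elements back into comments and PIs. */
    public static Document decode(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        Element root = doc.getDocumentElement();
        decodeChildren(doc, root);

        // Restored nodes inside the container belong before the root element.
        for (Node child : snapshot(root.getChildNodes())) {
            if ("sketch-before-root".equals(child.getNodeName())) {
                for (Node n : snapshot(child.getChildNodes())) {
                    short type = n.getNodeType();
                    if (type == Node.COMMENT_NODE || type == Node.PROCESSING_INSTRUCTION_NODE) {
                        doc.insertBefore(n, root);
                    }
                }
                root.removeChild(child);
            }
        }
        return doc;
    }

    private static void decodeChildren(Document doc, Node parent) {
        for (Node child : snapshot(parent.getChildNodes())) {
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            Element e = (Element) child;
            boolean isComment = "sketch-comment".equals(e.getNodeName());
            boolean isPi = "sketch-pi".equals(e.getNodeName());
            if (!isComment && !isPi) {
                decodeChildren(doc, e);
            } else if ("A".equals(e.getAttribute("version"))) {
                // Present only in the 'A' input: dropped when the 'B' version is preferred.
                parent.removeChild(e);
            } else {
                Node restored = isComment
                        ? doc.createComment(e.getTextContent())
                        : doc.createProcessingInstruction(e.getAttribute("target"), e.getTextContent());
                parent.replaceChild(restored, e);
            }
        }
    }

    /** Copies a live NodeList so the tree can be modified while iterating. */
    private static List<Node> snapshot(NodeList nodes) {
        List<Node> copy = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            copy.add(nodes.item(i));
        }
        return copy;
    }

    public static void main(String[] args) {
        Document doc = decode("<root><sketch-comment version=\"A\">old</sketch-comment>"
                + "<sketch-comment version=\"B\">new</sketch-comment></root>");
        System.out.println(doc.getDocumentElement().getFirstChild().getNodeValue());
    }
}
```

A custom output format would replace the final branch: instead of creating a comment or PI node, a filter could emit whatever markup the target format uses to show both the old and the new value.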
The following example shows the effect of configuring lexical preservation to choose the 'B' version of the encoded processing instructions and comments in the XML of Example 3.
Example 4: output that chooses the 'B' version of PIs and comments