This document describes the concepts behind preserving entity references. For the resources associated with this sample, see here in Bitbucket.
XML documents sometimes contain entity references. While entity references can be either expanded or left as references within an XML document, by default they are not processed during a comparison. This is not always an issue, but sometimes the entity references must be included in the result. To achieve this, they need to be converted into XML elements within the document before the comparison and converted back again afterwards. This sample explains how this can be achieved using filters provided in the DeltaXML product.
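To see why a reference can be invisible to a comparison, it helps to look at what an ordinary parser does with it. The sketch below uses only the JDK's DOM parser with default settings, not the DeltaXML API; the class name and the sample document are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EntityExpansionDemo {

    // Hypothetical input: an internal subset declaring one entity, and one reference to it.
    public static final String XML =
          "<!DOCTYPE doc [ <!ENTITY city \"London\"> ]>"
        + "<doc>Hello &city;</doc>";

    /** Parses the XML with the JDK's default DOM settings and returns the root element's text. */
    public static String parsedText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    public static void main(String[] args) {
        // The reference has been replaced by its text, so the name "city" is no longer visible.
        System.out.println(parsedText(XML));
    }
}
```

Once the text has been expanded in this way, nothing downstream can tell that it came from an entity reference, which is why the references have to be encoded explicitly if they are to survive.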
Note that it may be easier to select one of the pre-configured lexical preservation modes as discussed in the Guide to Lexical Preservation, as some of them include entity preservation (e.g. 'roundTrip' preservation mode).
Simple API approach
A simple approach for retaining entity references is to enable the built-in lexical preservation on our 'Core S9 API' comparators, such as the PipelinedComparatorS9. These comparators are configured by passing them a LexicalPreservationConfig object. The following code extract illustrates how to enable only entity reference preservation on a LexicalPreservationConfig object.
Having enabled the preservation, the next step is to specify how changes in entity references should be handled. It is relatively straightforward to handle entity references that are either unchanged, or appear in an added or deleted XML element. In these cases, the entity references appear as they would in the inputs. The difficulty lies in handling entity references that are modified, or that appear in only one of the sources but not within an added or deleted XML element. In some cases, such as when there is no way of representing a change in an entity reference, it may be appropriate to output the newer 'B' version; this is illustrated in the following code extract.
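Independently of the product API, the decision described above can be sketched as a small function. This is a minimal illustration under stated assumptions: the class, the enum and the method are hypothetical names, and the code is not part of LexicalPreservationConfig.

```java
import java.util.Objects;

public class EntityRefChangePolicy {

    /** Hypothetical setting: which input wins when a reference differs between the documents. */
    public enum Prefer { A, B }

    /**
     * Returns the text to emit for one position in the result.
     * refA and refB are the entity names found at that position in each input,
     * or null if that input has no reference there.
     */
    public static String resolve(String refA, String refB, Prefer prefer) {
        // An unchanged reference is emitted as it appeared in the inputs;
        // otherwise the preferred side decides, and an absent reference yields no output.
        String chosen = Objects.equals(refA, refB)
            ? refA
            : (prefer == Prefer.B ? refB : refA);
        return chosen == null ? "" : "&" + chosen + ";";
    }

    public static void main(String[] args) {
        System.out.println(resolve("london", "paris", Prefer.B)); // prints &paris;
    }
}
```

Preferring 'B' means that a reference which exists only in 'A' disappears from the result, which is the trade-off accepted when a change cannot be represented.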
It is also possible to detect changes in the content of an entity reference, by retaining both the entity reference and its replacement text, before the comparison is performed. Following the comparison, any changes in the replacement text will be identified using the usual scheme. This can then be used to highlight that the entity reference's content has changed by adding and deleting the associated entity reference in the output, as discussed in the Predefined Preservation Modes section of the Guide to Lexical Preservation.
Note that if both the entity reference and replacement text are being preserved then it is possible to choose which should be retained in the output using the lpc.setUseEntityReferences method. The default behaviour is to preserve the entity references, rather than use the associated replacement text.
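The idea of keeping both the reference and its replacement text, and choosing between them on output, can be sketched with plain string handling. Everything here is an assumption made for illustration: the entity-ref element is a hypothetical placeholder rather than the product's encoding, the boolean parameter is only loosely modelled on the choice described above, and replacement text is assumed to be plain text without markup.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityRefWithText {

    // Matches general entity references such as &city; (ASCII names only, for brevity).
    private static final Pattern REF = Pattern.compile("&([A-Za-z_][A-Za-z0-9._-]*);");

    // Matches the hypothetical wrapper element written by encode().
    private static final Pattern WRAPPED =
        Pattern.compile("<entity-ref name=\"([^\"]+)\">([^<]*)</entity-ref>");

    /** Wraps each declared reference in an element carrying its name and replacement text. */
    public static String encode(String content, Map<String, String> declarations) {
        Matcher m = REF.matcher(content);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String text = declarations.get(m.group(1));
            String replacement = (text == null)
                ? m.group() // undeclared or predefined entity: leave untouched
                : "<entity-ref name=\"" + m.group(1) + "\">" + text + "</entity-ref>";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Restores either the references or their replacement text. */
    public static String decode(String content, boolean useEntityReferences) {
        return WRAPPED.matcher(content).replaceAll(useEntityReferences ? "&$1;" : "$2");
    }

    public static void main(String[] args) {
        String encoded = encode("<p>I live in &city;.</p>", Map.of("city", "London"));
        System.out.println(encoded);
        System.out.println(decode(encoded, true));
        System.out.println(decode(encoded, false));
    }
}
```

Because the replacement text sits inside the wrapper as ordinary text, a comparison that runs between encode and decode can report a change in it like any other text change.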
For further information on the representation of the entity declarations and references please refer to the Explanation section of this sample.
Running the sample
If you have Ant installed, use the build script provided to run the sample (found in directory samples/PreserveEntityRefs). Simply type the following command to run the pipeline and produce the output files:
If you don't have Ant installed, you can run the sample from a command line by issuing the following command from the sample directory (ensuring that you use the correct directory separators for your operating system). Replace x.y.z with the major.minor.patch version number of your release, e.g. command-10.0.0.jar.
To clean up the sample directory, run the following command in Ant.
Explanation
The following explanation makes use of a simplified variant of the sample input files. The key changes are that the DOCTYPE has been changed from XHTML to one specified solely by an internal subset, and that the explanatory text has been removed (as it is contained in this document). This allows us to focus on how changes in the entity declarations and use (via entity references) are handled.
Note: The output is not valid HTML, as it contains an internal subset, but it illustrates the changes in entity references in several web-browsers (including Internet Explorer, Safari, Firefox, and Chrome), so long as the file extension is '.html'.
Entity references into XML
The first step in preserving entity references is to convert them into XML elements. The following example shows an XML document that contains entity references.
Example 1: an XML file containing entity references
The XML Compare product includes filters for 'lexical preservation', which include the processing of entity references. The following example shows the same file after being loaded with lexical preservation enabled.
Example 2: the XML file after passing through the lexical preservation input processing.
Part 1 - Encoding the doctype and internal subset declarations
Note that the encoded value of the p2 entity declaration makes use of a non-standard XML entity character encoding scheme, as this simplifies some of the lexical preservation processing and assists with debugging.
Part 2a - Encoding the body in the preconfigured 'roundTrip' preservation mode
Part 2b - Encoding the body in the preconfigured 'entityRef' preservation mode
Part 2c - Encoding the body in the preconfigured 'nestedEntityRef' preservation mode
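Separately from the three preservation modes listed above, the general round trip of turning references into elements and back can be sketched as follows. This is a minimal sketch, not the product's encoding: the entity-ref element name is hypothetical, and the regular expressions are only suitable for references in element content (not in attribute values, CDATA sections or comments).

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityRefEncoder {

    // The five entities predefined by XML are left alone.
    private static final Set<String> PREDEFINED = Set.of("amp", "lt", "gt", "apos", "quot");

    // Matches general entity references such as &city; (ASCII names only, for brevity).
    private static final Pattern REF = Pattern.compile("&([A-Za-z_][A-Za-z0-9._-]*);");

    // Matches the hypothetical placeholder element written by encode().
    private static final Pattern PLACEHOLDER = Pattern.compile("<entity-ref name=\"([^\"]+)\"/>");

    /** Replaces each non-predefined entity reference with a placeholder element. */
    public static String encode(String content) {
        Matcher m = REF.matcher(content);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String replacement = PREDEFINED.contains(name)
                ? m.group()
                : "<entity-ref name=\"" + name + "\"/>";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Turns placeholder elements back into entity references. */
    public static String decode(String content) {
        return PLACEHOLDER.matcher(content).replaceAll("&$1;");
    }

    public static void main(String[] args) {
        String encoded = encode("<p>Tom &amp; Jerry live in &city;.</p>");
        System.out.println(encoded);
        System.out.println(decode(encoded));
    }
}
```

The placeholder is an ordinary element, so it passes through a comparison like any other markup and can be added, deleted or left unchanged.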
Raw intermediate comparison result
Having encoded the entity references for the purposes of comparison, the next step is to perform the comparison and see how changes in the encoded entity references are represented. We continue the above example by creating an input where the inner/nested city entity references in the definitions of both
&er2 have been swapped (as illustrated below).
This leads to the following post-comparison intermediate result, which is split into two sections. The first section presents the common doctype and internal subset result, and the second presents the changes to the XML content.
Part 1 - The raw comparison intermediate result for the doctype and internal subset declarations
Part 2a - The raw comparison intermediate result for the body in the preconfigured 'roundTrip' preservation mode
Part 2b - The raw comparison intermediate result for the body in the preconfigured 'entityRef' preservation mode
Part 2c - The raw comparison intermediate result for the body in the preconfigured 'nestedEntityRef' preservation mode
XML to Entity references
Before the raw comparison result can be converted into an output format, that format needs to be chosen. For the purposes of this sample and our explanation we use HTML as our output format, where changes are marked up using HTML's ins and del elements. Recall that we are currently presenting a simplified version of the sample code, which is (X)HTML.
The transformation from this DeltaV2 markup to the HTML markup is performed by an XSL transformation that is designed to cope with the types of change found in the sample. Note that it is intended only for the purposes of illustrating this sample code; it is not a general purpose DeltaV2 markup to HTML change markup filter.
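To give a feel for what such a transformation involves, here is a deliberately small sketch that runs an XSLT 1.0 stylesheet through the JDK's built-in transformer. It is not the sample's stylesheet. The namespace, the textGroup and text elements and the deltaV2 attribute values are written from an understanding of the DeltaV2 format and should be treated as assumptions to check against your own comparison output; the input fragment is hand-written for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DeltaToHtmlSketch {

    // Namespace assumed for the delta markup; verify it against real comparison output.
    static final String NS = "http://www.deltaxml.com/ns/well-formed-delta-v1";

    // Hypothetical, hand-written fragment in the assumed shape of a changed text run.
    public static final String SAMPLE_DELTA =
          "<p xmlns:deltaxml='" + NS + "' deltaxml:deltaV2='A!=B'>I live in "
        + "<deltaxml:textGroup deltaxml:deltaV2='A!=B'>"
        + "<deltaxml:text deltaxml:deltaV2='A'>London</deltaxml:text>"
        + "<deltaxml:text deltaxml:deltaV2='B'>Paris</deltaxml:text>"
        + "</deltaxml:textGroup>.</p>";

    static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:deltaxml='" + NS + "' exclude-result-prefixes='deltaxml'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // Rebuild ordinary elements, keeping only attributes that are in no namespace.
        + "<xsl:template match='*'>"
        + "<xsl:element name='{name()}'>"
        + "<xsl:copy-of select='@*[not(namespace-uri())]'/>"
        + "<xsl:apply-templates/>"
        + "</xsl:element>"
        + "</xsl:template>"
        // A text group contributes only its A and B versions.
        + "<xsl:template match='deltaxml:textGroup'>"
        + "<xsl:apply-templates select='deltaxml:text'/>"
        + "</xsl:template>"
        // Text only in A becomes a deletion, text only in B an insertion.
        + "<xsl:template match=\"deltaxml:text[@deltaxml:deltaV2='A']\">"
        + "<del><xsl:value-of select='.'/></del>"
        + "</xsl:template>"
        + "<xsl:template match=\"deltaxml:text[@deltaxml:deltaV2='B']\">"
        + "<ins><xsl:value-of select='.'/></ins>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the sketch stylesheet to a delta fragment and returns the serialized result. */
    public static String toHtml(String deltaXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(deltaXml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtml(SAMPLE_DELTA));
    }
}
```

Like the sample's own stylesheet, this handles only the kinds of change it was written for: here, a changed run of text. Whole elements that exist in only one input, and encoded entity references, would need further templates.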
We now continue the above example by illustrating the resolved output in two sections: the doctype and XML content sections.
Part 1 - The final result for the doctype and internal subset declarations.
Part 2a - The final result for the body in the preconfigured 'roundTrip' preservation mode.
With 'roundTrip' processing, no differences are reported between the two inputs, as the changes to the DOCTYPE cannot be reported, and there are no changes to the body of the XML document itself.
Part 2b - The final result for the body in the preconfigured 'entityRef' preservation mode.
With 'entityRef' preservation enabled, it is now possible to detect that the content of the entity referred to by &p1 has changed; this is represented by an insertion and deletion of that entity reference.
Part 2c - The final result for the body in the preconfigured 'nestedEntityRef' preservation mode.
With 'nestedEntityRef' preservation enabled, it is now possible to detect that the definition of the entity referred to by &p2 has changed; this is represented by an insertion and deletion of that entity reference. In other words, neither &p2's name nor its value has changed, but the means by which that value is calculated has changed.
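The distinction between a change in an entity's value and a change only in how that value is defined can be made concrete with a small classifier. This is a sketch under stated assumptions: the class, the enum and the two internal subsets are hypothetical, only double-quoted internal general entities are recognised, and the mapping of VALUE to what 'entityRef' reports and DEFINITION_ONLY to what 'nestedEntityRef' additionally reports is an interpretation of the descriptions above, not the product's algorithm.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityDefinitionDiff {

    // Hypothetical internal subsets for inputs A and B.
    public static final String SUBSET_A =
          "<!ENTITY london \"London\"> <!ENTITY paris \"Paris\"> "
        + "<!ENTITY p1 \"Capital: &london;\"> <!ENTITY p2 \"&london;\">";
    public static final String SUBSET_B =
          "<!ENTITY london \"London\"> <!ENTITY paris \"Paris\"> "
        + "<!ENTITY p1 \"Capital: &paris;\"> <!ENTITY p2 \"London\">";

    private static final Pattern DECL =
        Pattern.compile("<!ENTITY\\s+([A-Za-z_][A-Za-z0-9._-]*)\\s+\"([^\"]*)\"\\s*>");
    private static final Pattern REF = Pattern.compile("&([A-Za-z_][A-Za-z0-9._-]*);");

    public enum Change { UNCHANGED, DEFINITION_ONLY, VALUE }

    /** Collects name -> literal entity value; the first declaration of a name wins. */
    public static Map<String, String> declarations(String internalSubset) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = DECL.matcher(internalSubset);
        while (m.find()) {
            result.putIfAbsent(m.group(1), m.group(2));
        }
        return result;
    }

    /** Recursively expands nested references; assumes the declarations are not circular. */
    public static String expand(String value, Map<String, String> decls) {
        Matcher m = REF.matcher(value);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String nested = decls.get(m.group(1));
            String replacement = (nested == null) ? m.group() : expand(nested, decls);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Classifies how the entity called name differs between the two sets of declarations. */
    public static Change classify(String name, Map<String, String> a, Map<String, String> b) {
        String literalA = a.get(name);
        String literalB = b.get(name);
        if (literalA == null || literalB == null) {
            throw new IllegalArgumentException("Entity must be declared in both inputs: " + name);
        }
        if (!expand(literalA, a).equals(expand(literalB, b))) {
            return Change.VALUE;
        }
        return literalA.equals(literalB) ? Change.UNCHANGED : Change.DEFINITION_ONLY;
    }

    public static void main(String[] args) {
        Map<String, String> a = declarations(SUBSET_A);
        Map<String, String> b = declarations(SUBSET_B);
        for (String name : a.keySet()) {
            System.out.println(name + ": " + classify(name, a, b));
        }
    }
}
```

In this hypothetical data, p1 expands to different text in the two inputs, whereas p2 expands to the same text in both even though its declaration was rewritten.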
Note that, in the cases where the replacement text is kept in the encoded entity references, it is possible to choose to show the changes in the replacement text, instead of preserving the entity references themselves. There are some contexts in which this is desirable, such as when supporting multiple output formats through a single pipeline, where some of the output formats can contain entity references and others cannot. The following output illustrates the changes if the replacement text is kept.