Many XML languages describe documents, including XHTML, DocBook and ODF. One feature these languages share is that some of their elements are used not to define structure but to mark text as having a certain format. Examples of such elements are <em> in XHTML, <emphasis> in DocBook and <text:span> in ODF.
XML Compare makes no distinction between structural elements and formatting elements when comparing two versions of a document. Because of this, changes to formatting elements can produce more change in a delta file than expected. Consider the following examples, written in a very simple documentation language.
Example 1: A simple XML document (input1.xml in the Bitbucket sample, https://bitbucket.org/deltaxml/formatting-elements)
Example 2: The same document with text changes and text formatting added (input2.xml in the Bitbucket sample, https://bitbucket.org/deltaxml/formatting-elements)
When these files are compared, they generate a delta file that shows a lot of change.
Example 3: The delta file without taking formatting elements into account (text is being compared word by word)
Although this is a correct representation of what has changed between the two versions of the document, it may not be intuitive to somebody making changes in a WYSIWYG editor. For someone editing a document, the most important changes are the textual changes, not the formatting changes, and this delta file shows more text change than actually occurred. XML Compare includes some XSLT filters that improve this result by taking into account those elements that are used merely for textual formatting.
com.deltaxml.cores9api.DocumentComparator is designed to handle structural changes such as this. For the comparator to identify formatting elements, they must be marked with a deltaxml:format="true" attribute. In the example documents above, the <bold> element needs to be marked in this way. The following XSLT template could be used to do this.
Example 4: An XSLT template to mark bold elements (defined in mark-formatting.xsl in the Bitbucket sample, https://bitbucket.org/deltaxml/formatting-elements)
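As a rough illustration of the marking step, the following standalone Java sketch applies an identity transform with one extra template that adds deltaxml:format="true" to every <bold> element. It is not the mark-formatting.xsl file from the sample and does not use the XML Compare API. It relies only on the JDK's built-in XSLT processor. The class name MarkFormattingSketch, the mark method, the sample paragraph and the deltaxml namespace URI used here are assumptions made for the sketch; check the namespace against the XML Compare documentation before relying on it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class MarkFormattingSketch {

    // Hypothetical stylesheet: copies everything unchanged, except that
    // <bold> elements gain a deltaxml:format="true" attribute.
    // The deltaxml namespace URI below is an assumption.
    static final String MARK_XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:deltaxml='http://www.deltaxml.com/ns/well-formed-delta-v1'>"
      // Identity template: copy every node and attribute as-is.
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      // Override for formatting elements: copy and add the marker attribute.
      + "<xsl:template match='bold'>"
      + "<xsl:copy>"
      + "<xsl:attribute name='deltaxml:format'>true</xsl:attribute>"
      + "<xsl:apply-templates select='@*|node()'/>"
      + "</xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the marking stylesheet to an XML string and returns the result.
    static String mark(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(MARK_XSL)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Marking transform failed", e);
        }
    }

    public static void main(String[] args) {
        // Illustrative input only; not taken from input1.xml or input2.xml.
        System.out.println(mark("<para>Some <bold>important</bold> text.</para>"));
    }
}
```

Building on the identity transform means only the elements that need marking require their own template; other formatting elements could be handled by widening the match pattern.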
This stylesheet needs to be added to the DocumentComparator by assigning a FilterChain to the PRE_FLATTENING extension point. The following Java code snippet (from FormattingElementDemo.java in Bitbucket) shows how to do this:
Note that the equivalent C# code for the .NET API is broadly similar to the Java above and can be viewed in FormattingElementDemo.cs at https://bitbucket.org/deltaxml/formatting-element-changes-.net
The final result for the example files above is as follows:
Example 5: The final result
Note that the first paragraph is marked as unchanged because there have been no textual changes. The second paragraph shows the new word added within the newly bolded text.
The way in which modified formatting elements are represented is controlled by the ModifiedFormatOutput property. The default value for this property is 'BA'; the property is explained further in the Javadoc and the DCP Schema Guide.
- Changes to formatting elements are shown as a structural change unless they are marked.
- Formatting elements can be marked using the deltaxml:format="true" attribute.
- The result is a delta file that focuses on textual change.
- Changed formatting elements can be represented in different ways using the ModifiedFormatOutput property.
4. Running the sample
For the resources associated with this sample, see https://bitbucket.org/deltaxml/formatting-elements for Java and https://bitbucket.org/deltaxml/formatting-element-changes-.net for .NET. The sample resources should be checked out, cloned, or downloaded and unzipped into the samples directory of the XML Compare release. The resources should be located two levels below the top-level release directory, for example DeltaXML-XML-Compare-10_0_0_j/samples/FormattingElements.