Formatting Element Changes
Introduction
This document describes the concepts behind formatting element changes. For the resources associated with this sample, see the Bitbucket repository at https://bitbucket.org/deltaxml/formatting-elements.
Many XML languages are available for describing documents, such as XHTML, DocBook and ODF. One common feature of these languages is that some of their elements do not define structure but mark text as having a certain format. Examples include <strong> or <em> in XHTML, <emphasis> in DocBook and <text:span> in ODF.
XML Compare makes no distinction between structural elements and formatting elements when comparing two versions of a document. Because of this, changes to formatting elements can generate more change than expected in a delta file. Consider the following examples of a very simple documentation language.
Example 1: A simple XML document (input1.xml in the Bitbucket sample, https://bitbucket.org/deltaxml/formatting-elements)
<document>
<para>This paragraph will have words made bold in the following version.</para>
<para>In this sentence, new words will be added and some made bold.</para>
</document>
Example 2: The same document with text changes and text formatting added (input2.xml in the Bitbucket sample, https://bitbucket.org/deltaxml/formatting-elements)
<document>
<para>This paragraph will have <bold>words made bold</bold> in the following version.</para>
<para>In this sentence, <bold>new bold words</bold> will be added and some made bold.</para>
</document>
When these files are compared, they generate a delta file that shows a lot of change.
Example 3: The delta file without taking formatting elements into account (text is being compared word by word)
Although this is a correct representation of what has changed between the two versions of the document, it may not be intuitive to somebody making changes in a WYSIWYG editor. For a document editor the most important changes are textual changes, not the format changes, and this delta file shows more text change than actually occurred. XML Compare includes some XSLT filters to improve this result by taking into account those elements that are merely used for textual formatting.
DocumentComparator
The com.deltaxml.cores9api.DocumentComparator is designed to handle structural changes such as these. For the comparator to identify formatting elements, they need to be marked with a deltaxml:format="true" attribute. In the example documents above, the <bold> element needs to be marked in this way. The following XSLT template could be used to do this.
Example 4: An XSLT template to mark bold elements (defined in mark-formatting.xsl in the Bitbucket sample, https://bitbucket.org/deltaxml/formatting-elements)
<xsl:template match="bold">
  <xsl:copy>
    <xsl:attribute name="deltaxml:format" select="'true'"/>
    <xsl:apply-templates select="node()"/>
  </xsl:copy>
</xsl:template>
This stylesheet needs to be added to the DocumentComparator by assigning a FilterChain to the PRE_FLATTENING extension point. The following Java code snippet (from FormattingElementDemo.java in Bitbucket) shows how to do this:
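// Create the comparator and a helper for building filter steps.
DocumentComparator comparator = new DocumentComparator();
FilterStepHelper fsh = comparator.newFilterStepHelper();
// Wrap the marking stylesheet in a single-step filter chain named "format-marker".
FilterChain formatMarker =
    fsh.newSingleStepFilterChain(new File("mark-formatting.xsl"), "format-marker");
// Assign the chain to the PRE_FLATTENING extension point.
comparator.setExtensionPoint(ExtensionPoint.PRE_FLATTENING, formatMarker);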
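To inspect what the marking step produces before it reaches the comparison, the effect of the template above can be approximated outside XML Compare. The following is a minimal sketch using only the JDK's DOM and JAXP classes. It is not part of the sample or of the XML Compare API. The class name FormatMarkerSketch and the method markFormatting are hypothetical, and the namespace URI bound to the deltaxml prefix is an assumption that should be checked against the declaration in mark-formatting.xsl.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of XML Compare: mimics the marking template with plain DOM calls.
class FormatMarkerSketch {

    // Assumption: the URI bound to the deltaxml prefix; verify against mark-formatting.xsl.
    static final String DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1";
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    // Adds deltaxml:format="true" to every element with the given name and returns the serialized result.
    static String markFormatting(String xml, String elementName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Declare the prefix once on the root so the output is namespace well-formed.
            doc.getDocumentElement().setAttributeNS(XMLNS_NS, "xmlns:deltaxml", DELTAXML_NS);

            NodeList matches = doc.getElementsByTagName(elementName);
            for (int i = 0; i < matches.getLength(); i++) {
                ((Element) matches.item(i)).setAttributeNS(DELTAXML_NS, "deltaxml:format", "true");
            }

            // Identity transform: serializes the DOM back to text.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not mark formatting elements", e);
        }
    }

    public static void main(String[] args) {
        String input2 = "<document>"
                + "<para>This paragraph will have <bold>words made bold</bold> in the following version.</para>"
                + "<para>In this sentence, <bold>new bold words</bold> will be added and some made bold.</para>"
                + "</document>";
        System.out.println(markFormatting(input2, "bold"));
    }
}
```

Unlike the template, this sketch keeps any attributes already present on the marked element. Within XML Compare itself, the XSLT filter registered at PRE_FLATTENING remains the mechanism the sample uses.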
The final result for the example files above is as follows:
Example 5: the final result
Note that the first paragraph is marked as unchanged because there have been no textual changes. The second paragraph shows the new word added within the newly bolded text.
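The distinction between textual and formatting change can be checked independently of XML Compare by comparing only the text content of each paragraph, ignoring any inline elements. The sketch below does this with the JDK's DOM parser. It is an illustration of the idea and makes no claim about how XML Compare works internally. The class TextOnlyChangeSketch and its methods are hypothetical, and it assumes both versions contain the same number of para elements in the same order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration: reports which paragraphs differ once inline markup is ignored.
class TextOnlyChangeSketch {

    // Returns the text content of each para element, with child element tags dropped.
    static List<String> paragraphTexts(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList paras = doc.getElementsByTagName("para");
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < paras.getLength(); i++) {
                texts.add(paras.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    // For each paragraph position, true if the text differs between the two versions.
    static List<Boolean> textChanged(String before, String after) {
        List<String> a = paragraphTexts(before);
        List<String> b = paragraphTexts(after);
        if (a.size() != b.size()) {
            // Simplifying assumption of this sketch: paragraphs are neither added nor removed.
            throw new IllegalArgumentException("Paragraph counts differ");
        }
        List<Boolean> changed = new ArrayList<>();
        for (int i = 0; i < a.size(); i++) {
            changed.add(!a.get(i).equals(b.get(i)));
        }
        return changed;
    }

    public static void main(String[] args) {
        String input1 = "<document>"
                + "<para>This paragraph will have words made bold in the following version.</para>"
                + "<para>In this sentence, new words will be added and some made bold.</para>"
                + "</document>";
        String input2 = "<document>"
                + "<para>This paragraph will have <bold>words made bold</bold> in the following version.</para>"
                + "<para>In this sentence, <bold>new bold words</bold> will be added and some made bold.</para>"
                + "</document>";
        System.out.println(textChanged(input1, input2));
    }
}
```

For the two example documents, this reports the first paragraph as textually unchanged and the second as changed, which matches the note above.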
The way in which modified formatting elements are represented is controlled by the ModifiedFormatOutput property. The default value for this property is 'BA'; the property is explained further in the Javadoc and the DCP Schema Guide.
Summary
Changes to formatting elements are shown as a structural change unless they are marked.
Formatting elements can be marked using the deltaxml:format='true' attribute.
The result is a delta file that focuses on textual change.
Changed formatting elements can be represented in different ways using the ModifiedFormatOutput property.
Running the sample
For the resources associated with this sample, see the Bitbucket repository at https://bitbucket.org/deltaxml/formatting-elements. The sample resources should be checked out, cloned, or downloaded and unzipped into the samples directory of the XML Compare release. The resources should be located two levels below the top-level release directory, for example DeltaXML-XML-Compare-10_0_0_j/samples/FormattingElements.