Formatting Element Changes

Introduction

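This document describes the concepts behind formatting element changes. The resources associated with this sample are available for Java (https://bitbucket.org/deltaxml/formatting-elements) and for .NET (https://bitbucket.org/deltaxml/formatting-element-changes-.net).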

Many XML languages are available for describing documents, such as XHTML, DocBook and ODF. One common feature of these languages is that some of their elements do not define structure but mark text as having a certain format. Examples are <strong> and <em> in XHTML, <emphasis> in DocBook and <text:span> in ODF.

XML Compare makes no distinction between structural elements and formatting elements when comparing two versions of a document. Because of this, a change to a formatting element can generate more change than expected in a delta file. Consider the following two versions of a document written in a very simple documentation language.

Example 1: A simple XML document (input1.xml in the Bitbucket sample, https://bitbucket.org/deltaxml/formatting-elements)

<document>
  <para>This paragraph will have words made bold in the following version.</para>
  <para>In this sentence, new words will be added and some made bold.</para>
</document>

Example 2: The same document with text changes and text formatting added (input2.xml in the Bitbucket sample, https://bitbucket.org/deltaxml/formatting-elements)

<document>
  <para>This paragraph will have <bold>words made bold</bold> in the following version.</para>
  <para>In this sentence, <bold>new bold words</bold> will be added and some made bold.</para>
</document>

When these files are compared, they generate a delta file that shows a lot of change.

Example 3: The delta file without taking formatting elements into account (text is being compared word by word)

<document xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
          deltaxml:deltaV2="A!=B"
          deltaxml:version="2.0"
          deltaxml:content-type="full-context">
  <para deltaxml:deltaV2="A!=B">
    This paragraph will have 
    <deltaxml:textGroup deltaxml:deltaV2="A">
      <deltaxml:text deltaxml:deltaV2="A">words</deltaxml:text>
    </deltaxml:textGroup>
    <bold deltaxml:deltaV2="B">words made bold</bold>
    <deltaxml:textGroup deltaxml:deltaV2="A">
      <deltaxml:text deltaxml:deltaV2="A">made bold </deltaxml:text>
    </deltaxml:textGroup>
    in the following version.
  </para>
  <para deltaxml:deltaV2="A!=B">
    In this sentence, 
    <deltaxml:textGroup deltaxml:deltaV2="A">
      <deltaxml:text deltaxml:deltaV2="A">new</deltaxml:text>
    </deltaxml:textGroup>
    <bold deltaxml:deltaV2="B">new bold words</bold>
    <deltaxml:textGroup deltaxml:deltaV2="A">
      <deltaxml:text deltaxml:deltaV2="A">words </deltaxml:text>
    </deltaxml:textGroup>
    will be added and some made bold.
  </para>
</document>

Although this is a correct representation of what has changed between the two versions of the document, it may not be intuitive to somebody making changes in a WYSIWYG editor. For someone editing a document, the most important changes are textual changes rather than format changes, and this delta file shows more text change than actually occurred: words that were only wrapped in <bold> appear as deleted from version A and added again in version B. XML Compare includes some XSLT filters to improve this result by taking into account those elements that are used only for textual formatting.

DocumentComparator

The com.deltaxml.cores9api.DocumentComparator is designed to handle structural changes such as this. For the comparator to identify formatting elements, they must be marked with a deltaxml:format="true" attribute. In the example documents above, the <bold> element needs to be marked in this way. The following XSLT template could be used to do this.

Example 4: an XSLT template to mark bold elements (defined in mark-formatting.xsl in the Bitbucket sample, https://bitbucket.org/deltaxml/formatting-elements)

<xsl:template match="bold">
  <xsl:copy>
    <xsl:attribute name="deltaxml:format" select="'true'"/>
    <xsl:apply-templates select="node()"/>
  </xsl:copy>
</xsl:template>
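
To make the effect of the marking step concrete, the following is a minimal sketch that applies the same marking with the JDK's DOM API instead of XSLT. It is an illustration only and is not part of the Bitbucket sample: the class and method names (MarkFormattingSketch, markFormatting, parse) are hypothetical, and the namespace URI for the deltaxml prefix is assumed to be the one declared in the delta files shown in this document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MarkFormattingSketch {
    // Assumption: same namespace URI as declared in the delta files in this document.
    static final String DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1";

    /** Adds deltaxml:format="true" to every element with the given name; returns how many were marked. */
    static int markFormatting(Document doc, String elementName) {
        NodeList matches = doc.getElementsByTagName(elementName);
        for (int i = 0; i < matches.getLength(); i++) {
            ((Element) matches.item(i)).setAttributeNS(DELTAXML_NS, "deltaxml:format", "true");
        }
        return matches.getLength();
    }

    /** Parses an XML string into a namespace-aware DOM document. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // First paragraph of Example 2.
        Document doc = parse("<document><para>This paragraph will have "
            + "<bold>words made bold</bold> in the following version.</para></document>");
        int marked = markFormatting(doc, "bold");
        Element bold = (Element) doc.getElementsByTagName("bold").item(0);
        // prints "1 element(s) marked, format=true"
        System.out.println(marked + " element(s) marked, format="
            + bold.getAttributeNS(DELTAXML_NS, "format"));
    }
}
```

In the comparison pipeline itself the marking is performed by the XSLT template in mark-formatting.xsl; the sketch only shows what the marked input looks like to the comparator.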

The mark-formatting.xsl stylesheet is added to the DocumentComparator by assigning a FilterChain to the PRE_FLATTENING extension point. The following Java code snippet (from FormattingElementDemo.java in the Bitbucket sample) shows how to do this:

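// Create the comparator and a helper for building filter chains.
DocumentComparator comparator = new DocumentComparator();
FilterStepHelper fsh = comparator.newFilterStepHelper();
// Wrap the marking stylesheet in a single-step filter chain named "format-marker".
FilterChain formatMarker =
  fsh.newSingleStepFilterChain(new File("mark-formatting.xsl"), "format-marker");
// Run the chain at the PRE_FLATTENING extension point.
comparator.setExtensionPoint(ExtensionPoint.PRE_FLATTENING, formatMarker);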

The equivalent C# code for the .NET API is broadly similar to the Java above and can be viewed in FormattingElementDemo.cs in the .NET sample (https://bitbucket.org/deltaxml/formatting-element-changes-.net).

The final result for the example files above is as follows:

Example 5: the final result

<document xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
          deltaxml:deltaV2="A!=B"
          deltaxml:version="2.0"
          deltaxml:content-type="full-context">
  <para deltaxml:deltaV2="A=B">
    This paragraph will have <bold>words made bold</bold> in the following version.
  </para>
  <para deltaxml:deltaV2="A!=B">
    In this sentence, <bold deltaxml:deltaV2="A!=B">new 
    <deltaxml:textGroup deltaxml:deltaV2="B">
      <deltaxml:text deltaxml:deltaV2="B">bold </deltaxml:text>
    </deltaxml:textGroup>
    words</bold> will be added and some made bold.
  </para>
</document>

Note that the first paragraph is marked as unchanged because there have been no textual changes. The second paragraph shows the new word added within the newly bolded text.
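
As a rough illustration of how such a result might be inspected programmatically, the sketch below reads the deltaxml:deltaV2 attribute of each <para> in an abridged copy of Example 5, using only the JDK's DOM API. The class and method names (DeltaSummarySketch, paraStatuses) are hypothetical and are not part of XML Compare or the Bitbucket sample.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DeltaSummarySketch {
    // Namespace URI copied from the delta files shown in this document.
    static final String DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1";

    /** Returns the deltaxml:deltaV2 value of each <para> element, in document order. */
    static List<String> paraStatuses(String deltaXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(deltaXml)));
        NodeList paras = doc.getElementsByTagName("para");
        List<String> statuses = new ArrayList<>();
        for (int i = 0; i < paras.getLength(); i++) {
            statuses.add(((Element) paras.item(i)).getAttributeNS(DELTAXML_NS, "deltaV2"));
        }
        return statuses;
    }

    public static void main(String[] args) throws Exception {
        // Abridged copy of Example 5 (whitespace and some attributes omitted).
        String delta = "<document xmlns:deltaxml='" + DELTAXML_NS + "' deltaxml:deltaV2='A!=B'>"
            + "<para deltaxml:deltaV2='A=B'>This paragraph will have "
            + "<bold>words made bold</bold> in the following version.</para>"
            + "<para deltaxml:deltaV2='A!=B'>In this sentence, <bold deltaxml:deltaV2='A!=B'>new "
            + "<deltaxml:textGroup deltaxml:deltaV2='B'>"
            + "<deltaxml:text deltaxml:deltaV2='B'>bold </deltaxml:text></deltaxml:textGroup>"
            + "words</bold> will be added and some made bold.</para></document>";
        // prints "[A=B, A!=B]"
        System.out.println(paraStatuses(delta));
    }
}
```

The first value, A=B, corresponds to the paragraph whose only difference was the added bold formatting; the second, A!=B, corresponds to the paragraph with a genuine text insertion.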

Summary

  • Changes to formatting elements are shown as structural changes unless the elements are marked.
  • Formatting elements can be marked using the deltaxml:format="true" attribute.
  • The result is a delta file that focuses on textual change.

Running the sample

The resources associated with this sample are available for Java (https://bitbucket.org/deltaxml/formatting-elements) and for .NET (https://bitbucket.org/deltaxml/formatting-element-changes-.net). The sample resources should be checked out, cloned, or downloaded and unzipped into the samples directory of the XML Compare release. They should be located two levels below the top-level release directory, for example DeltaXML-XML-Compare-10_0_0_j/samples/FormattingElements.
