
Formatting Elements Sample

Introduction

In XML-based documentation systems, some elements are concerned with the structure of a document, while others exist purely to suggest the intended styling of the final document. We will refer to these latter elements as Formatting Elements. The set of formatting elements varies between documentation systems. For example, to request that words in the final document have bold styling, DITA provides the "<b/>" element and DocBook provides the "<emphasis/>" element. XML Merge allows the user to specify the set of elements to be considered as formatting elements.

The formatting element functionality is intended to let users focus either on purely textual content differences between documents or on styling differences, by treating elements marked as formatting differently from other elements. When formatting element functionality is enabled, XML Merge provides additional output formats which show different views of the changes. The page Formatting Element Representations has details of the different output types. Each of these output formats allows the user to concentrate on a different aspect of the document changes. For example, the content group output format provides a "text" output showing just the textual changes, and also a "content" output showing both textual and styling changes.

Example Input Files and Configuration

Input Files

To illustrate how XML Merge helps with formatting elements, we will examine the following four versions of a simple document. These might be multiple edits of the same document content over time or by different authors. We will show how XML Merge handles these changes with formatting elements both enabled and disabled.

The ancestor version of the document, shown below as version "A", has no formatting in the paragraph.

Ancestor Version "A"
XML
<document>
  <para>This paragraph will have words in bold in the following version.</para>
</document>

The first modification, shown as version "B" below, has no text changes, but specifies that the text "words in bold" is shown in an emphasised style.

Modified Version "B"
XML
<document>
  <para>This paragraph will have <bold>words in bold</bold> in the following version.</para>
</document>

The second modification, shown as version "C" below, has both textual changes and styling changes. The bold markup is removed and a single word is marked to be shown in italics, and there are two text changes.

Modified Version "C"
XML
<document>
  <para>This paragraph will have words in <italic>italics</italic> in this version.</para>
</document>

The third modification of the document, shown as version "D" below, applies multiple styling changes, including italics as a child element of underline. The text is unchanged from version "C".

Modified Version "D"
XML
<document>
  <para>This paragraph will have <underline>words in <italic>italics</italic></underline> in this version.</para>
</document>

The four inputs are shown below as an author might see them in a WYSIWYG view of an editor, and also combined into one diagram to highlight the changes between versions A, B, C, and D.

Configuring XML Merge Formatting Elements

As mentioned in the introduction, we need to specify what elements to consider as formatting elements. To do this we provide a stylesheet called mark-formatting.xsl, shown below.

This stylesheet matches on a collection of element names that we want to be treated as formatting elements, and adds to them an attribute called deltaxml:format with a value of 'true'. Note that the element names in this example are solely to illustrate the mechanics of using formatting elements and do not belong to any document format. As an example of a more concrete list of elements, for DITA you might choose the following elements as formatting elements:

"b|i|sup|sub|tt|u|line-through|overline"

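As a rough illustration of what such a stylesheet could look like, the sketch below embeds a hypothetical identity-plus-marking stylesheet in a small Java program and applies it with the JDK's built-in XSLT processor. It is not the mark-formatting.xsl shipped with the sample. The class and method names are made up for this illustration, and the sketch assumes that the deltaxml prefix is bound to the namespace URI that appears in the sample outputs on this page.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class MarkFormattingSketch {

    // Assumption: the namespace URI bound to "deltaxml" in the sample outputs.
    public static final String DELTAXML_NS =
        "http://www.deltaxml.com/ns/well-formed-delta-v1";

    // Hypothetical stylesheet: an identity transform plus one template that
    // adds deltaxml:format="true" to the sample's bold, italic and underline elements.
    public static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:deltaxml='" + DELTAXML_NS + "'>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='bold|italic|underline'>"
      + "<xsl:copy>"
      + "<xsl:attribute name='deltaxml:format'>true</xsl:attribute>"
      + "<xsl:apply-templates select='@*|node()'/>"
      + "</xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the marked-up result.
    public static String markFormatting(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not apply stylesheet", e);
        }
    }

    public static void main(String[] args) {
        // Version "B" of the sample document.
        String versionB = "<document><para>This paragraph will have "
            + "<bold>words in bold</bold> in the following version.</para></document>";
        System.out.println(markFormatting(versionB));
    }
}
```

For a real document format, only the match pattern would need to change, for instance to the DITA element list given above.
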
The stylesheet must be added to the PRE_FLATTENING extension point. This can be done using Java API code such as the following.

Specify Formatting Elements Using Java API
JAVA
// Specify which elements to consider as formatting
FilterStepHelper fsh = merge.getFilterStepHelper();
FilterChain fc = fsh.newSingleStepFilterChain(new File(dataFolder, "mark-formatting.xsl"), "mark-formatting");
// Run the stylesheet at the PRE_FLATTENING extension point
merge.setExtensionPoint(ExtensionPoint.PRE_FLATTENING, fc);

Next we specify what type of merge output is required, using the enum FormattingOutputType. The available options are listed in Formatting Element Representations.

FormattingOutputType Values 

The formatting output type is set using the setFormattingOutputType method.

The code below selects DeltaV2.1 output. To select content group output, pass FormattingOutputType.CONTENT_GROUP instead.

Set Output Type
JAVA
merge.setFormattingOutputType(FormattingOutputType.DELTA_V_2_1);

Default Output

Without formatting elements enabled, the DeltaV2.1 output from XML Merge for our sample inputs would look like the following.

Output With No Formatting Element Support
XML
<document xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
  deltaxml:version-order="A, B, C, D" deltaxml:content-type="merge-concurrent"
  deltaxml:version="2.0" deltaxml:deltaV2="A!=B!=C!=D">
  <para deltaxml:deltaV2="A!=B!=C!=D">This paragraph will have
    <deltaxml:textGroup deltaxml:deltaV2="A=C!=B">
      <deltaxml:text deltaxml:deltaV2="A=C">words</deltaxml:text>
      <deltaxml:text deltaxml:deltaV2="B"> </deltaxml:text>
    </deltaxml:textGroup>
    <bold deltaxml:deltaV2="B">words in bold</bold>
    <deltaxml:textGroup deltaxml:deltaV2="D">
      <deltaxml:text deltaxml:deltaV2="D"> </deltaxml:text>
    </deltaxml:textGroup>
    <underline deltaxml:deltaV2="D">words in <italic>italics</italic></underline>
    <deltaxml:textGroup deltaxml:deltaV2="A!=C!=D">
      <deltaxml:text deltaxml:deltaV2="A"> in bold</deltaxml:text>
      <deltaxml:text deltaxml:deltaV2="C"> in </deltaxml:text>
      <deltaxml:text deltaxml:deltaV2="D"> in this</deltaxml:text>
    </deltaxml:textGroup>
    <italic deltaxml:deltaV2="C">italics</italic>
    <deltaxml:textGroup deltaxml:deltaV2="A=B!=C">
      <deltaxml:text deltaxml:deltaV2="A=B"> in the following</deltaxml:text>
      <deltaxml:text deltaxml:deltaV2="C"> in this</deltaxml:text>
    </deltaxml:textGroup> version.</para>
</document>

If we are interested only in text changes and not in formatting changes, then this output, although a correct representation of the changes between versions, makes the text changes difficult to see.

Formatting Elements DeltaV2.1 Output

DeltaV2.1 Output With Formatting Elements
XML
<document xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
  deltaxml:version-order="A, B, C, D" deltaxml:content-type="merge-concurrent"
  deltaxml:version="2.1" deltaxml:deltaV2="A!=B!=C!=D">
  <para deltaxml:deltaV2="A!=B!=C!=D">This paragraph will have 
    <bold deltaxml:deltaTag="B" deltaxml:deltaV2="A!=B!=C!=D">
      <underline deltaxml:deltaTag="D" deltaxml:deltaV2="A=B!=C!=D">words in 
        <italic deltaxml:deltaTag="C,D" deltaxml:deltaV2="A=B!=C=D">
          <deltaxml:textGroup deltaxml:deltaV2="A=B!=C=D">
            <deltaxml:text deltaxml:deltaV2="A=B">bold</deltaxml:text>
            <deltaxml:text deltaxml:deltaV2="C=D">italics</deltaxml:text>
          </deltaxml:textGroup>
        </italic>
      </underline>
    </bold> in
    <deltaxml:textGroup deltaxml:deltaV2="A=B!=C=D">
      <deltaxml:text deltaxml:deltaV2="A=B">the following</deltaxml:text>
      <deltaxml:text deltaxml:deltaV2="C=D">this</deltaxml:text>
    </deltaxml:textGroup> version.
  </para>
</document>

Note how, in the above DeltaV2.1 result with formatting elements enabled, the changes highlight the formatting element hierarchy, and there is less duplication of changed text than in the default output.
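When reading these results it can help to break a deltaV2 attribute value into groups of versions. The sketch below is a minimal illustration, assuming (as the outputs above suggest) that versions joined by "=" share the same content and that groups separated by "!=" differ from one another. The class and method names are hypothetical and are not part of the XML Merge API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DeltaV2Groups {

    // Splits a value such as "A=B!=C=D" into groups of equal versions.
    public static List<List<String>> parse(String deltaV2) {
        List<List<String>> groups = new ArrayList<>();
        for (String group : deltaV2.split("!=")) {
            groups.add(Arrays.asList(group.split("=")));
        }
        return groups;
    }

    // True when both versions appear in the same group of the given value.
    public static boolean sameContent(String deltaV2, String v1, String v2) {
        for (List<String> group : parse(deltaV2)) {
            if (group.contains(v1)) {
                return group.contains(v2);
            }
        }
        return false; // v1 is not present at this point in the document
    }

    public static void main(String[] args) {
        System.out.println(parse("A=B!=C=D"));                 // [[A, B], [C, D]]
        System.out.println(sameContent("A=B!=C=D", "C", "D")); // true
        System.out.println(sameContent("A=B!=C=D", "A", "C")); // false
    }
}
```

Applied to the italic element above, the value "A=B!=C=D" says that versions A and B agree with each other, versions C and D agree with each other, and the two pairs differ.
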

Content Group Output

Content Group Output
XML
<document xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
  deltaxml:version-order="A, B, C, D" deltaxml:content-type="merge-concurrent"
  deltaxml:version="2.0" deltaxml:deltaV2="A!=B!=C!=D">
  <para deltaxml:deltaV2="A!=B!=C!=D">This paragraph will have 
    <deltaxml:contentGroup deltaxml:wordDelta="A=B!=C=D" deltaxml:deltaV2="A!=B!=C!=D">
      <deltaxml:text deltaxml:deltaV2="A=B">words in bold</deltaxml:text>
      <deltaxml:text deltaxml:deltaV2="C=D">words in italics</deltaxml:text>
      <deltaxml:content deltaxml:deltaV2="A">words in bold</deltaxml:content>
      <deltaxml:content deltaxml:deltaV2="B"><bold>words in bold</bold></deltaxml:content>
      <deltaxml:content deltaxml:deltaV2="C">words in <italic>italics</italic></deltaxml:content>
      <deltaxml:content deltaxml:deltaV2="D"><underline>words in <italic>italics</italic></underline></deltaxml:content>
    </deltaxml:contentGroup> in
    <deltaxml:textGroup deltaxml:deltaV2="A=B!=C=D">
      <deltaxml:text deltaxml:deltaV2="A=B">the following</deltaxml:text>
      <deltaxml:text deltaxml:deltaV2="C=D">this</deltaxml:text>
    </deltaxml:textGroup> version.
  </para>
</document>

With the content group formatting element output selected (CONTENT_GROUP), the above output is produced. Note how the contentGroup element contains two types of child element: text elements and content elements (we assume the "deltaxml:" namespace prefix for the elements in this description). The text elements show only the text changes, ignoring any formatting, so users who wish to concentrate on textual changes need only examine these. The content elements show both text and formatting changes, grouped together so that the complete change in each version is easier to see.

One intended use case for the content group output format is a Graphical User Interface (GUI) that lets the end user choose which type of differences to show: either just text, or text and formatting. The following example illustrates one way this might be shown. The default display shows only text additions and deletions, and when the highlighted text block is selected, a dropdown could show both the formatting and text changes from each version.
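To make the split between the text and content children concrete, the following sketch reads a content group with the JDK's DOM parser and collects either view, keyed by the deltaV2 value of each child. It is only an illustration of how a consumer such as a GUI might read this output. The class name, method names, and the trimmed sample string are assumptions made for this example, and element attributes inside the content are ignored for brevity.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class ContentGroupViews {

    public static final String NS = "http://www.deltaxml.com/ns/well-formed-delta-v1";

    // Trimmed copy of the contentGroup from the output above.
    public static final String SAMPLE =
        "<para xmlns:deltaxml='" + NS + "'>"
      + "<deltaxml:contentGroup deltaxml:deltaV2='A!=B!=C!=D'>"
      + "<deltaxml:text deltaxml:deltaV2='A=B'>words in bold</deltaxml:text>"
      + "<deltaxml:text deltaxml:deltaV2='C=D'>words in italics</deltaxml:text>"
      + "<deltaxml:content deltaxml:deltaV2='A'>words in bold</deltaxml:content>"
      + "<deltaxml:content deltaxml:deltaV2='B'><bold>words in bold</bold></deltaxml:content>"
      + "<deltaxml:content deltaxml:deltaV2='C'>words in <italic>italics</italic></deltaxml:content>"
      + "<deltaxml:content deltaxml:deltaV2='D'><underline>words in "
      + "<italic>italics</italic></underline></deltaxml:content>"
      + "</deltaxml:contentGroup></para>";

    // Returns deltaV2 value -> rendered children for every child of the first
    // contentGroup whose local name is "text" or "content".
    public static Map<String, String> view(String xml, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Node group = doc.getElementsByTagNameNS(NS, "contentGroup").item(0);
            Map<String, String> result = new LinkedHashMap<>();
            for (Node n = group.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element && NS.equals(n.getNamespaceURI())
                        && localName.equals(n.getLocalName())) {
                    result.put(((Element) n).getAttributeNS(NS, "deltaV2"), render(n));
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read content group", e);
        }
    }

    // Renders child text and element tags; attributes are deliberately omitted.
    private static String render(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {
                sb.append('<').append(c.getNodeName()).append('>')
                  .append(render(c))
                  .append("</").append(c.getNodeName()).append('>');
            } else if (c.getNodeType() == Node.TEXT_NODE) {
                sb.append(c.getNodeValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("Text view:");
        view(SAMPLE, "text").forEach((versions, text) ->
            System.out.println("  " + versions + " -> " + text));
        System.out.println("Content view:");
        view(SAMPLE, "content").forEach((versions, content) ->
            System.out.println("  " + versions + " -> " + content));
    }
}
```

A viewer built along these lines could show the two-entry text view by default and reveal the four-entry content view on demand.
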

Implementation

The sample is implemented in the main method of a single Java class, FormattingElements.java, and the approach taken is as follows:

  1. Create File objects for sample inputs.

  2. Instantiate the merge object using either ConcurrentMerge or SequentialMerge.

  3. Set the ancestor file (only for ConcurrentMerge) with ConcurrentMerge.setAncestor.

  4. Add the demo input versions by using one of the following:

    1. ConcurrentMerge.addVersion

    2. SequentialMerge.addVersion

  5. Set the formatting output type to DELTA_V_2_1 or CONTENT_GROUP.

  6. Call one of the following methods to produce a DeltaV2 representation of the merge.

    1. ConcurrentMerge.extractAll

    2. SequentialMerge.extractAll

Running The Sample

To run this sample, all the files you need, together with instructions, are available on Bitbucket.

