Formatting Elements Sample

1. Introduction

In XML-based documentation systems, some elements are concerned with the structure of a document, while others exist purely to suggest the intended styling of the final document. We refer to these latter elements as Formatting Elements. The set of formatting elements varies between documentation systems. For example, to request that words in the final document have bold styling, DITA provides the "<b/>" element. DITA Merge allows the user to specify the set of elements to be considered formatting elements.

The formatting element functionality lets users focus either on purely textual differences between documents or on styling differences, by treating elements marked as formatting differently to other elements. When formatting element functionality is enabled, DITA Merge provides additional output formats which show different views of the changes. The page Formatting Element Representations has details of the different output types. Each of these additional output formats allows the user to concentrate on a different aspect of the document changes. For example, the content group output format provides a "text" output showing just the textual changes, and also a "content" output showing both textual and styling changes.

2. Example Input Files and Configuration

2.1. Input Files

To illustrate how DITA Merge helps with formatting elements, we will examine the following four versions of a simple document. These might be multiple edits of the same document content over time or by different authors. We will show how DITA Merge handles these changes with formatting elements both enabled and disabled.

The ancestor version of the document, shown below as version "A", has no formatting in the paragraph.

Ancestor Version "A"
<topic id="topic_v4z_cjd_d3b">
  <title></title>
  <body>
    <p>This paragraph will have words in bold in the following version.</p>
  </body>
</topic>

The first modification, shown as version "B" below, has no text changes, but specifies that the text "words in bold" is shown in a 'b' style.

Modified Version "B"
<topic id="topic_v4z_cjd_d3b">
  <title></title>
  <body>
    <p>This paragraph will have <b>words in bold</b> in the following version.</p>
  </body>
</topic>

The second modification, shown as version "C" below, has both textual changes and styling changes. The 'b' markup is removed and a single word is marked to be shown in 'i', and there are two text changes.

Modified Version "C"
<topic id="topic_v4z_cjd_d3b">
  <title></title>
  <body>
    <p>This paragraph will have words in <i>italics</i> in this version.</p>
  </body>
</topic>

The third modification of the document, shown as version "D" below, applies multiple styling changes, including 'i' as a child element of 'u'. The text is unchanged from version "C".

Modified Version "D"
<topic id="topic_v4z_cjd_d3b">
  <title></title>
  <body>
    <p>This paragraph will have <u>words in <i>italics</i></u> in this version.</p>
  </body>
</topic>
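
The relationship between the four versions can be checked by comparing only the text of each paragraph, with all markup ignored. The following is a minimal sketch using the standard Java DOM API rather than DITA Merge itself; the class name TextOnlyView and its constants are hypothetical and exist only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of DITA Merge: compares the paragraph text
// of the four sample versions with all inline markup ignored.
final class TextOnlyView {
    static final String A = "<p>This paragraph will have words in bold in the following version.</p>";
    static final String B = "<p>This paragraph will have <b>words in bold</b> in the following version.</p>";
    static final String C = "<p>This paragraph will have words in <i>italics</i> in this version.</p>";
    static final String D = "<p>This paragraph will have <u>words in <i>italics</i></u> in this version.</p>";

    /** Returns the text of the first p element, dropping any child element tags. */
    static String paragraphText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("p").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse paragraph", e);
        }
    }

    public static void main(String[] args) {
        // A and B differ only in styling, as do C and D.
        System.out.println(paragraphText(A).equals(paragraphText(B))); // true
        System.out.println(paragraphText(C).equals(paragraphText(D))); // true
        // The text itself changes between B and C.
        System.out.println(paragraphText(A).equals(paragraphText(C))); // false
    }
}
```

Seen this way, there are only two distinct texts across the four versions; every other difference is a styling difference.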

The four inputs are shown below as an author might see them in a WYSIWYG view of an editor, and also combined into one diagram to highlight the changes between versions A, B, C, and D.

2.2. Configuring DITA Merge Formatting Elements

In DITA Merge, formatting elements can be enabled and configured by setting a FormattingElementsConfiguration on the DITA Merge object.

When the formatting elements feature is enabled, DITA Merge marks the following elements as formatting elements by default:

Default formatting elements:

apiname, b, cite, cmdname, codeph, filepath, i, lines, msgnum, msgph, parmname, pre, q, sep, sub, sup, systemoutput, term, tm, tt, u, uicontrol, userinput, var, wintitle

This list can be modified by using the helper methods in the class FormattingElementsConfiguration.

Specify Formatting Elements Using Java API
// Requires java.util.Arrays and java.util.HashSet.
// Create the formatting elements configuration.
FormattingElementsConfiguration formattingElementsConfiguration = new FormattingElementsConfiguration();

// The default formatting elements list can be changed using the methods below.
formattingElementsConfiguration.setFormattingElements(new HashSet<>(Arrays.asList("b", "i", "line-through")));
formattingElementsConfiguration.addFormattingElements(new HashSet<>(Arrays.asList("u")));
formattingElementsConfiguration.removeFormattingElements(new HashSet<>(Arrays.asList("line-through")));

// Apply the configuration to the merge object.
ConcurrentMerge merge = new ConcurrentMerge();
merge.setFormattingElementsConfiguration(formattingElementsConfiguration);
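
The effect of the three helper calls above can be traced with standard collections. This is only a sketch: it assumes that the set, add, and remove methods behave like plain set replacement, union, and difference, which this page does not spell out. The class name FormattingElementSetTrace is hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical trace, not part of DITA Merge: models the element list as a
// sorted set and applies the same three changes with standard set operations.
final class FormattingElementSetTrace {
    static Set<String> effectiveElements() {
        // Mirrors setFormattingElements: start from exactly these names.
        Set<String> elements = new TreeSet<>(Arrays.asList("b", "i", "line-through"));
        // Mirrors addFormattingElements (assumed to be a set union).
        elements.addAll(Arrays.asList("u"));
        // Mirrors removeFormattingElements (assumed to be a set difference).
        elements.removeAll(Arrays.asList("line-through"));
        return elements;
    }

    public static void main(String[] args) {
        System.out.println(effectiveElements()); // prints [b, i, u]
    }
}
```

Under that assumption, the configuration in the sample would treat only b, i, and u as formatting elements, which matches the elements used in versions B, C, and D.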

Next, we specify what type of merge output is required, using the enum FormattingOutputType. The available options are listed in Formatting Element Representations.

2.2.1. FormattingOutputType Values 

The formatting output type is set using the setFormattingOutputType method.

The code below selects the DeltaV2.1 output. To select the content group output instead, pass FormattingOutputType.CONTENT_GROUP.

Set Output Type
merge.setFormattingOutputType(FormattingOutputType.DELTA_V_2_1);

3. Default Output

Without formatting elements enabled, the output from DITA Merge for our sample inputs looks like the following.

Output With No Formatting Element Support
<topic deltaxml:version-order="A, B, C, D" deltaxml:content-type="merge-concurrent" 
  deltaxml:version="2.0" deltaxml:deltaV2="A!=B!=C!=D" class="- topic/topic ">
  <title deltaxml:deltaV2="A=B=C=D" class="- topic/title "/>
  <body deltaxml:deltaV2="A!=B!=C!=D" class="- topic/body ">
    <p deltaxml:deltaV2="A!=B!=C!=D" class="- topic/p ">This paragraph will have 
        <deltaxml:textGroup deltaxml:deltaV2="A=C">
        <deltaxml:text deltaxml:deltaV2="A=C">words</deltaxml:text>
      </deltaxml:textGroup>
      <b deltaxml:deltaV2="B" class="+ topic/ph hi-d/b ">words in bold</b>
      <u deltaxml:deltaV2="D" class="+ topic/ph hi-d/u ">words in  <i class="+ topic/ph hi-d/i "
          >italics</i>
      </u>
      <deltaxml:textGroup deltaxml:deltaV2="A!=C!=D">
        <deltaxml:text deltaxml:deltaV2="A"> in bold</deltaxml:text>
        <deltaxml:text deltaxml:deltaV2="C"> in </deltaxml:text>
        <deltaxml:text deltaxml:deltaV2="D"> in this</deltaxml:text>
      </deltaxml:textGroup>
      <i deltaxml:deltaV2="C" class="+ topic/ph hi-d/i ">italics</i>
      <deltaxml:textGroup deltaxml:deltaV2="A=B!=C">
        <deltaxml:text deltaxml:deltaV2="A=B"> in the following</deltaxml:text>
        <deltaxml:text deltaxml:deltaV2="C"> in this</deltaxml:text>
      </deltaxml:textGroup> version. </p>
  </body>
</topic>

This output is a correct representation of the changes between versions, but if we are interested only in text changes and not in formatting changes, it makes the text changes difficult to see.
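
When reading the output above, it helps to break each deltaxml:deltaV2 value into groups of versions. Judging from the listings on this page, versions joined by "=" share the same content and groups separated by "!=" differ. The following sketch parses a value under that reading; the class name DeltaV2Groups is hypothetical and the code is not part of DITA Merge.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a deltaV2 value such as "A=B!=C=D" into
// groups of version names that share the same content.
final class DeltaV2Groups {
    static List<List<String>> parse(String deltaV2) {
        List<List<String>> groups = new ArrayList<>();
        // "!=" separates groups whose content differs.
        for (String group : deltaV2.trim().split("!=")) {
            List<String> versions = new ArrayList<>();
            // "=" joins versions whose content is equal.
            for (String version : group.split("=")) {
                versions.add(version.trim());
            }
            groups.add(versions);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(parse("A=B!=C=D"));   // prints [[A, B], [C, D]]
        System.out.println(parse("A!=B!=C!=D")); // prints [[A], [B], [C], [D]]
        System.out.println(parse("B"));          // prints [[B]]
    }
}
```

A value naming a single version, such as "B" on the b element above, marks content that is present in that version only.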

4. Formatting Elements DeltaV2.1 Output

DeltaV2.1 Output With Formatting Elements
<topic deltaxml:version-order="A, B, C, D" deltaxml:content-type="merge-concurrent"
  deltaxml:version="2.1" deltaxml:deltaV2="A!=B!=C!=D" class="- topic/topic ">
  <title deltaxml:deltaV2="A=B=C=D" class="- topic/title "/>
  <body deltaxml:deltaV2="A!=B!=C!=D" class="- topic/body ">
    <p deltaxml:deltaV2="A!=B!=C!=D" class="- topic/p ">This paragraph will have  <b
        deltaxml:deltaTag="B" deltaxml:deltaV2="A!=B!=C!=D" class="+ topic/ph hi-d/b ">
        <u deltaxml:deltaTag="D" deltaxml:deltaV2="A=B!=C!=D" class="+ topic/ph hi-d/u ">words in 
            <i deltaxml:deltaTag="C,D" deltaxml:deltaV2="A=B!=C=D" class="+ topic/ph hi-d/i ">
            <deltaxml:textGroup deltaxml:deltaV2="A=B!=C=D">
              <deltaxml:text deltaxml:deltaV2="A=B">bold</deltaxml:text>
              <deltaxml:text deltaxml:deltaV2="C=D">italics</deltaxml:text>
            </deltaxml:textGroup>
          </i>
        </u>
      </b> in  <deltaxml:textGroup deltaxml:deltaV2="A=B!=C=D">
        <deltaxml:text deltaxml:deltaV2="A=B">the following</deltaxml:text>
        <deltaxml:text deltaxml:deltaV2="C=D">this</deltaxml:text>
      </deltaxml:textGroup> version. </p>
  </body>
</topic>  

Note how, in the above DeltaV2.1 result with formatting elements enabled, the changes highlight the formatting element hierarchy, and there is less duplication of changed text than in the default output.

5. Content Group Output

Content Group Output
<topic deltaxml:version-order="A, B, C, D" deltaxml:content-type="merge-concurrent" 
  deltaxml:version="2.0" deltaxml:deltaV2="A!=B!=C!=D" class="- topic/topic ">
  <title deltaxml:deltaV2="A=B=C=D" class="- topic/title "/>
  <body deltaxml:deltaV2="A!=B!=C!=D" class="- topic/body ">
    <p deltaxml:deltaV2="A!=B!=C!=D" class="- topic/p ">This paragraph will have 
        <deltaxml:contentGroup deltaxml:wordDelta="A=B!=C=D" deltaxml:deltaV2="A!=B!=C!=D">
        <deltaxml:text deltaxml:deltaV2="A=B">words in bold</deltaxml:text>
        <deltaxml:text deltaxml:deltaV2="C=D">words in italics</deltaxml:text>
        <deltaxml:content deltaxml:deltaV2="A">words in bold</deltaxml:content>
        <deltaxml:content deltaxml:deltaV2="B">
          <b class="+ topic/ph hi-d/b ">words in bold</b>
        </deltaxml:content>
        <deltaxml:content deltaxml:deltaV2="C">words in  <i class="+ topic/ph hi-d/i ">italics</i>
        </deltaxml:content>
        <deltaxml:content deltaxml:deltaV2="D">
          <u class="+ topic/ph hi-d/u ">words in  <i class="+ topic/ph hi-d/i ">italics</i></u>
        </deltaxml:content>
      </deltaxml:contentGroup> in  <deltaxml:textGroup deltaxml:deltaV2="A=B!=C=D">
        <deltaxml:text deltaxml:deltaV2="A=B">the following</deltaxml:text>
        <deltaxml:text deltaxml:deltaV2="C=D">this</deltaxml:text>
      </deltaxml:textGroup> version. </p>
  </body>
</topic>

With the content group formatting element output selected (CONTENT_GROUP), the above output is produced. Note how the contentGroup element (we will assume the presence of the "deltaxml:" namespace prefix for the elements in this description) contains two types of child elements: text and content elements. The text elements show only the text changes, ignoring any formatting, so that users who wish to concentrate on only textual changes need only examine these. The content elements show both text and formatting changes, grouped together in a way that makes it easier to see the changes as a whole.

One intended use case for the content group output format is Graphical User Interface (GUI) systems which allow the end user to choose which type of differences to show: either just text, or both text and formatting. The following example illustrates one way this might be shown. The default display shows only text additions and deletions, and when the highlighted text block is selected, a dropdown could show both the formatting and text changes from each version.
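
A consumer of the content group output that only cares about textual changes could read just the text children of a contentGroup. The following is a minimal sketch using the standard Java DOM API; the class name ContentGroupTextView and its SAMPLE constant are hypothetical, and the fragment is a shortened copy of the listing above. Because the fragment, like the listing, carries no declaration for the "deltaxml:" prefix, the parser is left in its default namespace-unaware mode and elements are matched by their prefixed names.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of DITA Merge: collects the text-only
// variants of a contentGroup, keyed by their deltaV2 attribute value.
final class ContentGroupTextView {
    static final String SAMPLE =
        "<deltaxml:contentGroup deltaxml:wordDelta=\"A=B!=C=D\" deltaxml:deltaV2=\"A!=B!=C!=D\">"
      + "<deltaxml:text deltaxml:deltaV2=\"A=B\">words in bold</deltaxml:text>"
      + "<deltaxml:text deltaxml:deltaV2=\"C=D\">words in italics</deltaxml:text>"
      + "<deltaxml:content deltaxml:deltaV2=\"B\"><b>words in bold</b></deltaxml:content>"
      + "</deltaxml:contentGroup>";

    static Map<String, String> textVariants(String contentGroupXml) {
        Element group;
        try {
            // The default factory is not namespace aware, so the undeclared
            // "deltaxml:" prefix is treated as part of the element name.
            group = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(contentGroupXml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse contentGroup", e);
        }
        Map<String, String> variants = new LinkedHashMap<>();
        NodeList children = group.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Keep the text elements and skip the content elements.
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && "deltaxml:text".equals(child.getNodeName())) {
                String versions = ((Element) child).getAttribute("deltaxml:deltaV2");
                variants.put(versions, child.getTextContent());
            }
        }
        return variants;
    }

    public static void main(String[] args) {
        System.out.println(textVariants(SAMPLE));
        // prints {A=B=words in bold, C=D=words in italics}
    }
}
```

A GUI along the lines described above could show these entries by default and fall back to the content children only when the user asks for formatting detail.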

6. Implementation

The sample is implemented in the main method of a single Java class, FormattingElements.java, and the approach taken is as follows:

  1. Create File objects for the sample inputs.
  2. Instantiate the merge object using either ConcurrentMerge or SequentialMerge.
  3. Set the ancestor file (only for ConcurrentMerge) with ConcurrentMerge.setAncestor.
  4. Add the demo input versions by using one of the following:
    1. ConcurrentMerge.addVersion
    2. SequentialMerge.addVersion
  5. Set the formatting output type to DELTA_V_2_1 or CONTENT_GROUP.
  6. Call one of the following methods to produce a DeltaV2 representation of the merge:
    1. ConcurrentMerge.extractAll
    2. SequentialMerge.extractAll

7. Running The Sample

All the files needed to run this sample, together with instructions, are available on Bitbucket: DITA Merge - Formatting Elements Sample.
