In XML-based documentation systems, some elements are concerned with the structure of a document, while others exist purely to suggest the intended styling of the final document. We will refer to these latter elements as Formatting Elements. The set of formatting elements varies between documentation systems. For example, to request that words in the final document have bold styling, DITA provides the "<b/>" element and DocBook provides the "<emphasis/>" element. XML Merge allows the user to specify the set of elements to be considered as formatting elements.
The formatting element functionality lets users focus either on purely textual differences between documents or on styling differences, by treating elements marked as formatting differently from other elements. When it is enabled, XML Merge provides additional output formats, each of which shows a different view of the changes and so lets the user concentrate on a different aspect of them. The page Formatting Element Representations has details of the different output types. For example, the content group output format provides a "text" output showing just the textual changes and a "content" output showing both textual and styling changes.
2. Example Input Files and Configuration
2.1. Input Files
To illustrate how XML Merge helps with formatting elements, we will examine the following four versions of a simple document. These might be multiple edits of the same document content over time or by different authors. We will show how XML Merge handles these changes with formatting elements both enabled and disabled.
The ancestor version of the document, shown below as version "A", has no formatting in the paragraph.
The first modification, shown as version "B" below, has no text changes, but specifies that the text "words in bold" is shown in an emphasised style.
The second modification, shown as version "C" below, has both textual changes and styling changes. The bold markup is removed and a single word is marked to be shown in italics, and there are two text changes.
The third modification of the document, shown as version "D" below, has applied multiple styling changes including italics as a child element of underline, and the text is unchanged from version "C".
The four inputs are shown below as an author might see them in a WYSIWYG view of an editor, and also combined into one diagram to highlight the changes between versions A, B, C, and D.
2.2. Configuring XML Merge Formatting Elements
As mentioned in the introduction, we need to specify what elements to consider as formatting elements. To do this we provide a stylesheet called mark-formatting.xsl, shown below.
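The shipped stylesheet is the authoritative version. Purely as an illustration, a marking stylesheet of this kind could be sketched as follows and exercised with the JDK's built-in XSLT processor. The element names bold, italic and underline, the class name MarkFormattingSketch, and the namespace URI bound to the deltaxml: prefix are assumptions made for this sketch and are not taken from the product files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class MarkFormattingSketch {

    // Assumed namespace URI for the deltaxml: prefix; check it against your XML Merge release.
    static final String DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1";

    // Hypothetical stand-in for mark-formatting.xsl (XSLT 1.0, so the JDK processor can run it).
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:deltaxml='" + DELTAXML_NS + "'>"
      // Identity template: by default, copy every node and attribute unchanged.
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      // Illustrative formatting elements: copy them and add deltaxml:format='true'.
      + "<xsl:template match='bold|italic|underline'>"
      + "<xsl:copy>"
      + "<xsl:apply-templates select='@*'/>"
      + "<xsl:attribute name='deltaxml:format'>true</xsl:attribute>"
      + "<xsl:apply-templates select='node()'/>"
      + "</xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the sketch stylesheet to an XML string and returns the marked-up result. */
    static String mark(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(mark("<p>Some <bold>words in bold</bold> here.</p>"));
    }
}
```

Running main prints the paragraph with the bold element carrying the deltaxml:format attribute; elements outside the match list pass through unchanged.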
This stylesheet matches a collection of element names that we want to be treated as formatting elements, and adds to each of them an attribute called deltaxml:format with a value of 'true'. Note that the element names in this example are solely to illustrate the mechanics of using formatting elements and do not belong to any document format. As an example of a more concrete list of elements, for DITA you might choose the following elements as formatting elements:
The stylesheet must be added to the PRE_FLATTENING extension point. This can be done using Java API code such as the following.
Next, we specify what type of merge output is required, using the enum FormattingOutputType. The available options are listed in Formatting Element Representations.
2.2.1. FormattingOutputType Values
The formatting output is set using the setFormattingOutputType method.
The code below selects content group output.
3. Default Output
Without formatting elements enabled, the DeltaV2.1 output from XML Merge for our sample inputs would look like the following.
If we are only interested in text changes and not interested in format changes then this output, although a correct representation of the changes between versions, makes it difficult to see the text changes.
4. Formatting Elements DeltaV2.1 Output
Note how, in the above DeltaV2.1 result with formatting elements enabled, the changes highlight the formatting element hierarchy and there is less duplication of changed text than in the default output.
5. Content Group Output
With the content group formatting element output selected (CONTENT_GROUP), the above output is produced. Note how the contentGroup element (we will assume the presence of the "deltaxml:" namespace prefix for the elements in this description) contains two types of child elements: text elements and content elements. The text elements show only the text changes, ignoring any formatting, so that users who wish to concentrate on only textual changes need only examine these. The content elements show both text and formatting changes, grouped together in a way that makes it easier to see the changes in full. One intended use case for the content group formatting element output format is Graphical User Interface (GUI) systems that let the end user choose which type of differences to show: either just text, or text and format. The following example illustrates one way this might be shown. The default display is to show only text additions and deletions, and when the highlighted text block is selected, a dropdown could show both the formatting and text changes from each version.
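On the data side, a GUI of this kind has to pick either the text children or the content children of each contentGroup. A minimal sketch of that selection, using only the JDK's DOM API, might look like the following. The sample fragment is hypothetical and heavily simplified (real content group output carries additional delta markup), and the namespace URI, the italic element name and the class name ContentGroupViewSketch are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ContentGroupViewSketch {

    // Assumed namespace URI for the deltaxml: prefix.
    static final String DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1";

    // Hypothetical, simplified fragment; real output contains more delta attributes.
    static final String SAMPLE =
        "<p xmlns:deltaxml='" + DELTAXML_NS + "'>"
      + "<deltaxml:contentGroup>"
      + "<deltaxml:text>some words</deltaxml:text>"
      + "<deltaxml:content>some <italic>words</italic></deltaxml:content>"
      + "</deltaxml:contentGroup>"
      + "</p>";

    /** Parses an XML string into a namespace-aware DOM document. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Returns the direct children of every contentGroup whose local name is viewName. */
    static List<Element> select(Document doc, String viewName) {
        List<Element> result = new ArrayList<>();
        NodeList groups = doc.getElementsByTagNameNS(DELTAXML_NS, "contentGroup");
        for (int i = 0; i < groups.getLength(); i++) {
            for (Node c = groups.item(i).getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c.getNodeType() == Node.ELEMENT_NODE
                        && DELTAXML_NS.equals(c.getNamespaceURI())
                        && viewName.equals(c.getLocalName())) {
                    result.add((Element) c);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        // Default view: text changes only.
        for (Element e : select(doc, "text")) {
            System.out.println("text view: " + e.getTextContent());
        }
        // Expanded view: text and formatting changes together.
        for (Element e : select(doc, "content")) {
            System.out.println("content view child nodes: " + e.getChildNodes().getLength());
        }
    }
}
```

A display layer could call select(doc, "text") for the default view and switch to select(doc, "content") when the user expands a highlighted block.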
The sample is implemented in the main method of a single Java class, FormattingElements.java, and the approach taken is as follows:
- Create File objects for sample inputs.
- Instantiate the merge object using either ConcurrentMerge or SequentialMerge.
- Set the ancestor file (only for ConcurrentMerge) with:
- Add the demo input versions by using one of the following:
- Set the formatting output type to
- Call one of the following methods to produce a DeltaV2 representation of the merge.
7. Running The Sample
To run this sample, all the files you need, together with instructions, are available on Bitbucket.