DITA Merge Features

The features described in the following sections are provided by the DITA Merge classes and can be configured through the Java or REST APIs.

The sections below link to samples and guides that describe how DITA Merge processing can be configured to suit:

  • the input DITA documents,
  • the context in which the input DITA documents were created,
  • the way in which the merge result is to be used.

Structure and Alignment

DITA Merge aligns corresponding elements in the input files by taking the tree structure of the input XML into account. Corresponding elements have the same local name and namespace, and their parent elements also correspond. Elements can also be aligned using keys. For more details about the merge process, refer to the Technical Specification.
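The alignment rule above can be pictured, for a single pair of already-corresponding parents, with the much-simplified sketch below. XNode, ChildAligner and the greedy "first unused child with the same identity" strategy are hypothetical illustrations, not DITA Merge's algorithm (which is described in the Technical Specification). A child's identity is its namespace plus local name, plus its key when it has one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not DITA Merge's algorithm: greedily pairs the children
// of two parents that are already known to correspond.
final class ChildAligner {
    /** One aligned pair; a null side means no counterpart (added or deleted). */
    record Pair(XNode a, XNode b) {}

    static List<Pair> align(XNode parentA, XNode parentB) {
        // Queue B's children by alignment identity, in document order.
        Map<String, List<XNode>> unusedB = new HashMap<>();
        for (XNode b : parentB.children()) {
            unusedB.computeIfAbsent(b.alignmentId(), id -> new ArrayList<>()).add(b);
        }
        Set<XNode> matched = Collections.newSetFromMap(new IdentityHashMap<>());
        List<Pair> pairs = new ArrayList<>();
        for (XNode a : parentA.children()) {
            // Take the first unused B child with the same identity, if any.
            List<XNode> candidates = unusedB.getOrDefault(a.alignmentId(), List.of());
            XNode b = candidates.isEmpty() ? null : candidates.remove(0);
            if (b != null) {
                matched.add(b);
            }
            pairs.add(new Pair(a, b));
        }
        // B children left unmatched have no counterpart in A.
        for (XNode b : parentB.children()) {
            if (!matched.contains(b)) {
                pairs.add(new Pair(null, b));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        XNode a = new XNode("", "ul", null, List.of(
                new XNode("", "li", "k1"), new XNode("", "li", "k2"), new XNode("", "p", null)));
        XNode b = new XNode("", "ul", null, List.of(
                new XNode("", "li", "k2"), new XNode("", "note", null)));
        align(a, b).forEach(System.out::println);
    }
}

/** Hypothetical element model: namespace URI ("" for none), local name, optional key, children. */
record XNode(String ns, String localName, String key, List<XNode> children) {
    XNode(String ns, String localName, String key) {
        this(ns, localName, key, List.of());
    }

    /** Alignment identity: namespace + local name, plus the key when present. */
    String alignmentId() {
        return "{" + ns + "}" + localName + (key == null ? "" : "#" + key);
    }
}
```

In the example, the keyed li "k2" is paired across versions even though its position differs, while li "k1", p and note have no counterpart. The sketch handles one level only and ignores ordering constraints; a real tree merge recurses into each aligned pair.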

Handling Text

The text processing in DITA Merge is case-sensitive. The text processing within each element of a document can be performed at different levels, as outlined below:

  • Text-node level: if the content of a text node changes, the whole node is marked as changed.
  • Word by word: differences in content are resolved down to individual words. In DITA Merge, word-by-word processing is enabled by default; it can be enabled or disabled for each merge type using the setWordbyWord method.

Please note that DITA Merge does not support character-by-character text comparison.
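The difference between the two levels described above can be illustrated with a small, self-contained sketch. TextLevels is a hypothetical class, not part of the DITA Merge API, and the longest-common-subsequence word diff is a standard technique chosen for illustration; it is not necessarily the algorithm DITA Merge uses. Both comparisons are case-sensitive, matching the behaviour stated above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of text-node-level vs word-by-word comparison.
final class TextLevels {
    /** Splits text into whitespace-separated words; blank text has no words. */
    static String[] words(String s) {
        return s.isBlank() ? new String[0] : s.trim().split("\\s+");
    }

    /** Text-node level: any difference marks the whole node as changed (case-sensitive). */
    static boolean nodeChanged(String a, String b) {
        return !a.equals(b);
    }

    /** Word by word: LCS diff over words, returning "=w" (kept), "-w" (deleted) or "+w" (added). */
    static List<String> wordDiff(String a, String b) {
        String[] x = words(a), y = words(b);
        // lcs[i][j] = length of the longest common subsequence of x[i..] and y[j..]
        int[][] lcs = new int[x.length + 1][y.length + 1];
        for (int i = x.length - 1; i >= 0; i--) {
            for (int j = y.length - 1; j >= 0; j--) {
                lcs[i][j] = x[i].equals(y[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < x.length && j < y.length) {
            if (x[i].equals(y[j])) {
                out.add("=" + x[i++]);
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("-" + x[i++]);
            } else {
                out.add("+" + y[j++]);
            }
        }
        while (i < x.length) out.add("-" + x[i++]);
        while (j < y.length) out.add("+" + y[j++]);
        return out;
    }

    public static void main(String[] args) {
        String before = "The quick brown fox", after = "The quick red fox";
        System.out.println(nodeChanged(before, after)); // true: whole node changed
        System.out.println(wordDiff(before, after));    // [=The, =quick, -brown, +red, =fox]
    }
}
```

At text-node level the whole sentence is reported as changed; word by word, only "brown" and "red" differ.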

Lexical Preservation

DITA Merge provides configuration options for preserving content that XML processing often discards: comments, processing instructions, CDATA sections, DOCTYPE declarations and entity references. Differences in comments, processing instructions and entity references are represented in the result using deltaxml:contentGroup elements. See our web page, The DeltaV2 Format for DITA Merge, for more details on content groups.

DOCTYPE declarations and entity references are controlled by two separate settings: DoctypePreservationMode and EntityReferencePreservationMode.

Whitespace handling

In DITA Merge, whitespace-only nodes are handled differently depending on whether or not they are significant. DITA Merge preserves significant whitespace (in mixed content), but removes whitespace that is only used to format the input DITA documents. DITA Merge also provides an option for indenting the output once the merge is finished; this can be set for each merge type using the setIndent method.
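The whitespace rule above can be illustrated with the JDK's own DOM and JAXP classes. This is a sketch under a stated assumption, not DITA Merge's implementation: WhitespaceSketch and its mixed-content test (an element counts as mixed if any direct text child is non-blank) are hypothetical. A DTD-aware tool knows from the content model which elements allow mixed content, so unlike this heuristic it would keep, for example, a lone space between two inline elements. The standard OutputKeys.INDENT property plays the role that the setIndent option plays for a merge result.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: strip formatting whitespace, keep mixed-content whitespace.
final class WhitespaceSketch {
    /** Assumption: an element holds mixed content if any direct text child is not blank. */
    static boolean isMixed(Element e) {
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n.getNodeType() == Node.TEXT_NODE && !n.getNodeValue().isBlank()) {
                return true;
            }
        }
        return false;
    }

    /** Removes whitespace-only text nodes from non-mixed elements, recursively. */
    static void stripFormatting(Element e) {
        boolean mixed = isMixed(e);
        NodeList kids = e.getChildNodes(); // live list, so iterate backwards while removing
        for (int i = kids.getLength() - 1; i >= 0; i--) {
            Node n = kids.item(i);
            if (n.getNodeType() == Node.TEXT_NODE && !mixed && n.getNodeValue().isBlank()) {
                e.removeChild(n);
            } else if (n.getNodeType() == Node.ELEMENT_NODE) {
                stripFormatting((Element) n);
            }
        }
    }

    /** Parses the XML, strips formatting whitespace and serializes it, optionally indented. */
    static String normalize(String xml, boolean indent) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            stripFormatting(doc.getDocumentElement());
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize("<ul>\n  <li>a</li>\n  <li>b</li>\n</ul>", false));
        System.out.println(normalize("<p>Hello <b>bold</b> world</p>", false));
    }
}
```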

Table processing

Handling tables is a complex process, and it becomes even more complex when more than two tables are involved. Complications arise when merging tables whose structure has changed, for example when a column has been added or removed. DITA Merge handles both CALS tables and DITA simple tables. Please refer to Merging Tables for more details.

Result types

DITA Merge has three different merge types: Concurrent Merge, Sequential Merge and Three Way Merge. Each merge type provides different result types. The page Merge Result Formats and Types summarizes the result types and their differences and offers advice as to which may be appropriate for different user requirements and use-cases.

Rule processing

DITA Merge results are always some form of delta file, typically recording, for example, which content was added or deleted. Line-based merge algorithms often create a result by automatically accepting simple changes such as additions and deletions. A similar, XML-based approach is used in DITA Merge when rule processing is enabled. We call this rule-based processing because a set of rules determines which types of change are applied automatically. Please refer to Rule-Based Processing for more details.
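The idea of rules deciding which types of change are applied automatically can be sketched for a single aligned item. Everything here is hypothetical: RuleSketch, the ADD/DELETE/MODIFY change types and the classic three-way decision (a change made in only one version is applied if its type is enabled, competing changes are left for review) are illustrations of the general technique, not DITA Merge's rule set or API.

```java
import java.util.EnumSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of rule-based acceptance for one aligned item.
final class RuleSketch {
    enum ChangeType { ADD, DELETE, MODIFY }

    /** Either a resolved value, or needsReview = true when the change is left for the user. */
    record Outcome(boolean needsReview, String value) {}

    /** How a version changed the ancestor value; null means the item is absent. */
    static ChangeType classify(String ancestor, String changed) {
        if (ancestor == null) return ChangeType.ADD;
        if (changed == null) return ChangeType.DELETE;
        return ChangeType.MODIFY;
    }

    static Outcome resolve(String ancestor, String mine, String theirs, Set<ChangeType> autoApply) {
        boolean mineChanged = !Objects.equals(ancestor, mine);
        boolean theirsChanged = !Objects.equals(ancestor, theirs);
        if (!mineChanged && !theirsChanged) {
            return new Outcome(false, ancestor);          // nothing changed
        }
        if (mineChanged && theirsChanged) {
            return Objects.equals(mine, theirs)
                    ? new Outcome(false, mine)              // same change on both sides
                    : new Outcome(true, null);              // competing changes: review
        }
        String changed = mineChanged ? mine : theirs;       // exactly one side changed
        return autoApply.contains(classify(ancestor, changed))
                ? new Outcome(false, changed)               // rule says apply automatically
                : new Outcome(true, null);                  // rule says leave for review
    }

    public static void main(String[] args) {
        Set<ChangeType> addsOnly = EnumSet.of(ChangeType.ADD);
        System.out.println(resolve(null, "new para", null, addsOnly)); // addition applied
        System.out.println(resolve("old", null, "old", addsOnly));     // deletion left for review
    }
}
```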

Three to two merge

This applies only to a three-way concurrent merge. Many XML editors do not support the representation of three-way conflicts. They do, however, support two-way change tracking, which users understand well.

Representing three-way merge conflicts as two-way tracked changes therefore provides a significant and useful simplification for users. The Three To Two Merge Guide provides detailed information about three to two merge, and the Three To Two Merge Use Cases code sample describes several of its use cases.

Formatting elements

DITA Merge can be configured to recognize and process elements used predominantly for inline formatting. This allows elements to be aligned by their content and supports overlapping formatted-text ranges between the compared versions. Formatting differences are recorded using a distinct formatting-element representation, so they can be rendered or styled independently of structural changes as needed.

An example of formatting element processing is included in the Formatting Element Changes sample.
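One way to see why content-based alignment copes with overlapping formatting ranges is to flatten inline formatting into a style set per word, align the words alone, and only then compare styles. The sketch below does exactly that; FormattingSketch, Segment and Token are hypothetical, and this flattening is an assumption made for illustration rather than the representation DITA Merge uses (the Formatting Element Changes sample shows the real behaviour).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: align text independently of where formatting elements start and end.
final class FormattingSketch {
    /** A run of text with the inline formatting applied to it, e.g. {"b"} or {"b", "i"}. */
    record Segment(String text, Set<String> styles) {}

    /** One word plus its formatting, kept sorted so it prints deterministically. */
    record Token(String word, Set<String> styles) {}

    /** Flattens runs into words, so formatting boundaries no longer affect alignment. */
    static List<Token> flatten(List<Segment> segments) {
        List<Token> tokens = new ArrayList<>();
        for (Segment s : segments) {
            if (s.text().isBlank()) continue;
            for (String w : s.text().trim().split("\\s+")) {
                tokens.add(new Token(w, new TreeSet<>(s.styles())));
            }
        }
        return tokens;
    }

    /** Aligns words with an LCS, then reports formatting differences on the aligned words. */
    static List<String> formattingChanges(List<Segment> a, List<Segment> b) {
        List<Token> x = flatten(a), y = flatten(b);
        int[][] lcs = new int[x.size() + 1][y.size() + 1];
        for (int i = x.size() - 1; i >= 0; i--) {
            for (int j = y.size() - 1; j >= 0; j--) {
                lcs[i][j] = x.get(i).word().equals(y.get(j).word())
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<String> changes = new ArrayList<>();
        int i = 0, j = 0;
        while (i < x.size() && j < y.size()) {
            Token p = x.get(i), q = y.get(j);
            if (p.word().equals(q.word())) {
                if (!p.styles().equals(q.styles())) {
                    changes.add(p.word() + ": " + p.styles() + " -> " + q.styles());
                }
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                i++; // word only in a: a text change, not a formatting change
            } else {
                j++; // word only in b: a text change, not a formatting change
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        List<Segment> before = List.of(
                new Segment("Press the ", Set.of()), new Segment("Save button", Set.of("b")));
        List<Segment> after = List.of(new Segment("Press ", Set.of()),
                new Segment("the Save", Set.of("i")), new Segment(" button", Set.of()));
        formattingChanges(before, after).forEach(System.out::println);
    }
}
```

Here the bold range ("Save button") and the italic range ("the Save") overlap, yet all four words still align, and only the per-word formatting differences are reported.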

Element splitting

Sometimes modified elements containing dissimilar text are aligned during the merge simply because they occupy the same position in the different documents. DITA Merge provides an option to split such elements when the proportion of unchanged text falls below 10%. Certain DITA elements, such as section and title, are only permitted once in the DITA DTD, so element splitting is not applied to them. Element splitting is currently applied to the following DITA elements: apiname, b, cite, cmdname, codeph, filepath, i, li, lines, msgnum, msgph, note, p, parmname, pre, q, sli, sub, sup, systemoutput, term, text, tm, tt, u, uicontrol, userinput, var, varname, wintitle. This feature can be enabled or disabled for each merge type using the setElementSplitting method.
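The splitting decision above combines two checks: the element must be on the list of splittable elements, and its unchanged text must fall below 10%. The sketch below encodes both. SplitSketch is hypothetical, and measuring "unchanged text" as common words (by longest common subsequence) divided by the word count of the longer version is an assumption; DITA Merge may measure it differently.

```java
import java.util.Set;

// Hypothetical sketch of the element-splitting decision.
final class SplitSketch {
    /** The elements listed above as eligible for splitting. */
    static final Set<String> SPLITTABLE = Set.of(
            "apiname", "b", "cite", "cmdname", "codeph", "filepath", "i", "li", "lines",
            "msgnum", "msgph", "note", "p", "parmname", "pre", "q", "sli", "sub", "sup",
            "systemoutput", "term", "text", "tm", "tt", "u", "uicontrol", "userinput",
            "var", "varname", "wintitle");

    /** Split when less than 10% of the text is unchanged. */
    static final double THRESHOLD = 0.10;

    static String[] words(String s) {
        return s.isBlank() ? new String[0] : s.trim().split("\\s+");
    }

    /** Length of the longest common subsequence of two word arrays. */
    static int lcsLength(String[] x, String[] y) {
        int[][] t = new int[x.length + 1][y.length + 1];
        for (int i = 1; i <= x.length; i++) {
            for (int j = 1; j <= y.length; j++) {
                t[i][j] = x[i - 1].equals(y[j - 1])
                        ? t[i - 1][j - 1] + 1
                        : Math.max(t[i - 1][j], t[i][j - 1]);
            }
        }
        return t[x.length][y.length];
    }

    /** Assumption: unchanged fraction = common words / word count of the longer version. */
    static double unchangedFraction(String a, String b) {
        String[] x = words(a), y = words(b);
        int longer = Math.max(x.length, y.length);
        return longer == 0 ? 1.0 : (double) lcsLength(x, y) / longer;
    }

    static boolean shouldSplit(String element, String a, String b) {
        return SPLITTABLE.contains(element) && unchangedFraction(a, b) < THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(shouldSplit("p", "Install the server first", "Configure logging options later")); // true
        System.out.println(shouldSplit("p", "Install the server first", "Install the client first"));        // false
        System.out.println(shouldSplit("title", "Install the server first", "Configure logging options later")); // false
    }
}
```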

Progress Listeners

Systems often need to monitor themselves or provide progress feedback to an end user for operations that may take a noticeable amount of time. The DITA Merge API supports adding progress listeners through the MergeProgressListener interface, allowing a merge to be monitored through each significant processing stage.
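The listener pattern described above can be sketched in plain Java. StageListener, MonitoredMerge and the stage names are all hypothetical: they are not the MergeProgressListener interface, whose actual methods are defined in the DITA Merge API documentation. The sketch only shows the general shape: listeners are registered before the run and notified as each stage starts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical pipeline that notifies listeners as each processing stage starts. */
final class MonitoredMerge {
    // Copy-on-write so listeners can be added safely from another thread during a run.
    private final List<StageListener> listeners = new CopyOnWriteArrayList<>();
    private final List<String> stages;

    MonitoredMerge(List<String> stages) {
        this.stages = List.copyOf(stages);
    }

    void addListener(StageListener listener) {
        listeners.add(listener);
    }

    void run() {
        for (int i = 0; i < stages.size(); i++) {
            for (StageListener l : listeners) {
                l.stageStarted(stages.get(i), i + 1, stages.size());
            }
            // ... the work for this stage would be done here ...
        }
        listeners.forEach(StageListener::finished);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        MonitoredMerge merge = new MonitoredMerge(List.of("parse", "align", "compare", "serialize"));
        merge.addListener((stage, index, total) -> log.add(index + "/" + total + " " + stage));
        merge.run();
        log.forEach(System.out::println);
    }
}

/** Hypothetical callback; not the real MergeProgressListener signature. */
interface StageListener {
    void stageStarted(String stage, int index, int total);

    default void finished() {}
}
```

Because StageListener has a single abstract method, a lambda is enough for simple progress reporting, while a full implementation can also override finished().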
