The features outlined in these sections are incorporated in the DITA Merge classes. These features can be configured using the Java or REST APIs.
The sections here include links to samples and guides that describe how DITA Merge processing can be configured for:
- the input DITA documents,
- the context in which the input DITA documents were created,
- the way in which the merge result is to be used.
Structure and Alignment
DITA Merge aligns corresponding elements in the input files by taking account of the tree structure of the input XML. Corresponding elements have the same local name and namespace, and their parent elements also correspond. Elements can also be aligned using keys. For more details about the merge process, refer to the Technical Specification.
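The correspondence test described above can be illustrated with a minimal sketch. It is not the DITA Merge implementation or API: the class name `AlignmentSketch`, the plain `key` attribute and the greedy pairing strategy are assumptions made only for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Illustrative only: not the DITA Merge implementation or API. */
class AlignmentSketch {

    /** Parses an XML string and returns its namespace-aware root element. */
    static Element parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getLocalName()/getNamespaceURI()
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Same namespace, same local name and same (hypothetical) "key" attribute. */
    static boolean mayCorrespond(Element a, Element b) {
        return Objects.equals(a.getNamespaceURI(), b.getNamespaceURI())
                && a.getLocalName().equals(b.getLocalName())
                && a.getAttribute("key").equals(b.getAttribute("key"));
    }

    /** Collects the element children of a parent, skipping text and comments. */
    static List<Element> childElements(Element parent) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                out.add((Element) n);
            }
        }
        return out;
    }

    /** Greedily pairs children of two already-aligned parents; returns "name[key]" per pair. */
    static List<String> alignChildren(Element parentA, Element parentB) {
        List<Element> candidates = childElements(parentB);
        List<String> pairs = new ArrayList<>();
        for (Element a : childElements(parentA)) {
            for (int i = 0; i < candidates.size(); i++) {
                if (mayCorrespond(a, candidates.get(i))) {
                    pairs.add(a.getLocalName() + "[" + a.getAttribute("key") + "]");
                    candidates.remove(i); // each element is paired at most once
                    break;
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        Element a = parse("<body><p key='intro'>A</p><p key='usage'>B</p><note>N</note></body>");
        Element b = parse("<body><p key='usage'>B2</p><p key='intro'>A2</p><ul/></body>");
        System.out.println(alignChildren(a, b)); // [p[intro], p[usage]]
    }
}
```

In this sketch the keyed `p` elements pair up even though their order differs, while `note` and `ul` remain unpaired because their names differ.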
The text processing in DITA Merge is case-sensitive. The text processing within each element of a document can be performed at different levels, as outlined below:
- Text-node level: if the content of a text node changes, the whole node is marked as changed.
- Word by Word: differences in content are resolved down to specific words. In DITA Merge, "Word by Word" is enabled by default. It can be enabled or disabled for each merge type by using the setWordbyWord method.
Please note that it is not possible to perform character by character text comparison with DITA Merge.
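The difference between the two levels can be shown with a small, self-contained sketch. It does not call the DITA Merge API, and the `[-…-]` / `[+…+]` markers are an ad hoc notation invented for this example, not the DeltaV2 format. The word-level comparison uses a standard longest-common-subsequence calculation and, like DITA Merge, is case-sensitive.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: contrasts text-node level and word-level comparison. */
class TextGranularitySketch {

    /** Text-node level: any difference marks the whole node as changed. */
    static String textNodeDiff(String a, String b) {
        return a.equals(b) ? a : "[-" + a + "-][+" + b + "+]";
    }

    /** Word level: case-sensitive LCS over whitespace-separated words. */
    static String wordDiff(String a, String b) {
        String[] x = a.trim().split("\\s+");
        String[] y = b.trim().split("\\s+");
        // lcs[i][j] = length of the longest common subsequence of x[i..] and y[j..]
        int[][] lcs = new int[x.length + 1][y.length + 1];
        for (int i = x.length - 1; i >= 0; i--) {
            for (int j = y.length - 1; j >= 0; j--) {
                lcs[i][j] = x[i].equals(y[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < x.length && j < y.length) {
            if (x[i].equals(y[j])) {
                out.add(x[i]);
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("[-" + x[i++] + "-]"); // word only in the first version
            } else {
                out.add("[+" + y[j++] + "+]"); // word only in the second version
            }
        }
        while (i < x.length) {
            out.add("[-" + x[i++] + "-]");
        }
        while (j < y.length) {
            out.add("[+" + y[j++] + "+]");
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        String a = "The quick brown fox";
        String b = "The quick red fox";
        System.out.println(textNodeDiff(a, b)); // [-The quick brown fox-][+The quick red fox+]
        System.out.println(wordDiff(a, b));     // The quick [-brown-] [+red+] fox
        System.out.println(wordDiff("Merge", "merge")); // [-Merge-] [+merge+]
    }
}
```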
DITA Merge provides various configuration options for preserving content that is often ignored when processing XML: comments, processing instructions, CDATA sections, DOCTYPE declarations and entity references. Differences between comments, processing instructions and entity references are represented using deltaxml:contentGroup elements in the result. See our web page, The DeltaV2 Format for DITA Merge, for more details on Content Groups.
In DITA Merge, whitespace-only nodes are handled differently depending on whether they are significant. DITA Merge preserves significant whitespace (in mixed content). However, it removes whitespace that is used only for formatting the input DITA documents. DITA Merge also provides an option for indenting the output once the merge is finished. This can be set for each merge type by using the setIndent method.
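The distinction between significant and formatting whitespace can be sketched as follows. This is a simplified illustration, not DITA Merge's own logic: the `MIXED` set is a small, hand-picked subset of elements assumed to have mixed content, whereas a real implementation would presumably consult the DITA content models.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Illustrative only: removes whitespace-only text nodes outside mixed content. */
class WhitespaceSketch {

    // Assumption: a tiny illustrative subset of elements that allow mixed content.
    static final Set<String> MIXED =
            new HashSet<>(Arrays.asList("p", "li", "title", "b", "i"));

    /** Recursively removes whitespace-only text nodes from non-mixed elements. */
    static void strip(Element e) {
        boolean mixed = MIXED.contains(e.getTagName());
        Node n = e.getFirstChild();
        while (n != null) {
            Node next = n.getNextSibling(); // capture before a possible removal
            boolean blankText = n.getNodeType() == Node.TEXT_NODE
                    && n.getNodeValue().trim().isEmpty();
            if (blankText && !mixed) {
                e.removeChild(n); // formatting whitespace
            } else if (n instanceof Element) {
                strip((Element) n);
            }
            n = next;
        }
    }

    /** Parses the XML, strips formatting whitespace and returns the remaining text. */
    static String textAfterStripping(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            strip(root);
            return root.getTextContent();
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<body>\n  <p><b>bold</b> <i>italic</i></p>\n  <p>text</p>\n</body>";
        // The space between <b> and <i> survives; the indentation inside <body> does not.
        System.out.println(textAfterStripping(xml)); // bold italictext
    }
}
```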
Handling tables is a complex process, and it becomes even more complex when more than two tables are involved. Complications arise when merging tables whose structure has changed, for example when a column has been removed or added. DITA Merge handles CALS tables and DITA simple tables. Please refer to Merging Tables for more details.
DITA Merge has three different merge types: Concurrent Merge, Sequential Merge and Three Way Merge. Each merge type provides different result types. The page Merge Result Formats and Types summarizes the result types and their differences and offers advice as to which may be appropriate for different user requirements and use-cases.
DITA Merge result types are always some form of delta file, usually showing, for example, which content was deleted and added. Line-based merge algorithms often create a result by automatically accepting simple changes such as deletions and additions. DITA Merge uses a similar, XML-based algorithm when rule processing is enabled. We call this rule-based processing because a set of rules is used to determine which types of change are automatically applied. Please refer to Rule-Based Processing for more details.
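The general idea of automatically accepting simple changes can be sketched for a single item with a common ancestor and two edited versions. The rules below are a generic three-way merge convention chosen for illustration; they are an assumption, not the rule set that DITA Merge applies, and `RuleSketch` and `Outcome` are hypothetical names.

```java
import java.util.Objects;

/** Illustrative only: a generic three-way acceptance rule, not DITA Merge's rules. */
class RuleSketch {

    enum Outcome { UNCHANGED, ACCEPT_A, ACCEPT_B, ACCEPT_BOTH, CONFLICT }

    /**
     * Decides how to treat one item. A null value means the item is absent,
     * so (ancestor = "x", a = null) represents a deletion in version A.
     */
    static Outcome resolve(String ancestor, String a, String b) {
        boolean aChanged = !Objects.equals(ancestor, a);
        boolean bChanged = !Objects.equals(ancestor, b);
        if (!aChanged && !bChanged) {
            return Outcome.UNCHANGED;
        }
        if (aChanged && !bChanged) {
            return Outcome.ACCEPT_A; // only A changed: apply automatically
        }
        if (!aChanged) {
            return Outcome.ACCEPT_B; // only B changed: apply automatically
        }
        // Both changed: identical edits are safe, different edits need review.
        return Objects.equals(a, b) ? Outcome.ACCEPT_BOTH : Outcome.CONFLICT;
    }

    public static void main(String[] args) {
        System.out.println(resolve("old", "new", "old")); // ACCEPT_A
        System.out.println(resolve("old", "old", null));  // ACCEPT_B
        System.out.println(resolve("old", "x", "y"));     // CONFLICT
    }
}
```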
Three to two merge
This applies only to a three-way concurrent merge. The representation of three-way conflicts is not supported by many XML editors. On the other hand, XML editors do support two-way change tracking and this is well understood by users.
Therefore, representing three-way merge conflicts in two-way change tracking provides a significant and useful simplification for users. The Three To Two Merge Guide provides detailed information about three to two merge, and the Three To Two Merge Use Cases code sample describes different use cases for it.
DITA Merge can be configured to recognize and process elements used predominantly for inline formatting. This allows content-based element alignment and supports overlaps in the formatted-text range between compared versions. Such formatting differences are represented using different formatting element representations. Formatting differences can be rendered or styled independently from structural changes according to need.
An example of formatting element processing is included in the Formatting Element Changes sample.
Sometimes modified elements containing dissimilar text are aligned during the merge process because they share the same position in the different documents. DITA Merge provides an option to split such elements when the amount of unchanged text falls below 10%. Certain DITA elements, such as section and title, are only permitted once in the DITA DTD, so the element splitting feature is not applied to them. Element splitting is currently applied to the following DITA elements: apiname, b, cite, cmdname, codeph, filepath, i, li, lines, msgnum, msgph, note, p, parmname, pre, q, sli, sub, sup, systemoutput, term, text, tm, tt, u, uicontrol, userinput, var, varname, wintitle. This feature can be enabled or disabled for each merge type by using the setElementSplitting method.
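The splitting decision can be sketched as a threshold test. How DITA Merge measures "unchanged text" is not stated here, so this sketch assumes one plausible measure: the number of words common to both versions (longest common subsequence) divided by the word count of the longer version. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: a threshold test for splitting aligned elements. */
class ElementSplittingSketch {

    // Elements listed above as eligible for splitting.
    static final Set<String> SPLITTABLE = new HashSet<>(Arrays.asList(
            "apiname", "b", "cite", "cmdname", "codeph", "filepath", "i", "li",
            "lines", "msgnum", "msgph", "note", "p", "parmname", "pre", "q",
            "sli", "sub", "sup", "systemoutput", "term", "text", "tm", "tt",
            "u", "uicontrol", "userinput", "var", "varname", "wintitle"));

    static final double THRESHOLD = 0.10;

    /** Length of the longest common subsequence of two word arrays. */
    static int commonWords(String[] x, String[] y) {
        int[][] lcs = new int[x.length + 1][y.length + 1];
        for (int i = 1; i <= x.length; i++) {
            for (int j = 1; j <= y.length; j++) {
                lcs[i][j] = x[i - 1].equals(y[j - 1])
                        ? lcs[i - 1][j - 1] + 1
                        : Math.max(lcs[i - 1][j], lcs[i][j - 1]);
            }
        }
        return lcs[x.length][y.length];
    }

    /** Assumed measure: common words divided by the longer version's word count. */
    static double unchangedRatio(String a, String b) {
        String[] x = a.trim().split("\\s+");
        String[] y = b.trim().split("\\s+");
        return (double) commonWords(x, y) / Math.max(x.length, y.length);
    }

    /** Split only eligible elements whose unchanged text falls below the threshold. */
    static boolean shouldSplit(String elementName, String a, String b) {
        return SPLITTABLE.contains(elementName) && unchangedRatio(a, b) < THRESHOLD;
    }

    public static void main(String[] args) {
        String a = "Install the package before use";
        String b = "Contact support for licensing details";
        System.out.println(shouldSplit("p", a, b));     // true: no words in common
        System.out.println(shouldSplit("title", a, b)); // false: title is not eligible
        System.out.println(shouldSplit("p", a, "Install the package after use")); // false
    }
}
```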
Systems often have the need to self-monitor or provide progress feedback to an end-user for operations that have the potential to take a noticeable amount of time. The DITA Merge API has provision for adding progress listeners via a MergeProgressListener interface, allowing a merge to be monitored through each significant processing stage.
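The listener pattern behind this kind of progress reporting can be sketched generically. The methods of the real MergeProgressListener interface are not described on this page, so the `StageListener` interface, the `StagedTask` class and the stage names below are all hypothetical stand-ins rather than the DITA Merge API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a generic stage-progress listener, not the DITA Merge API. */
class ProgressSketch {

    /** Hypothetical callback; the real MergeProgressListener may look different. */
    interface StageListener {
        void stageStarted(String stage, int index, int total);
    }

    /** A stand-in for a long-running operation made of named stages. */
    static class StagedTask {
        private final List<StageListener> listeners = new ArrayList<>();
        private final List<String> stages;

        StagedTask(List<String> stages) {
            this.stages = stages;
        }

        void addListener(StageListener listener) {
            listeners.add(listener);
        }

        void run() {
            for (int i = 0; i < stages.size(); i++) {
                for (StageListener listener : listeners) {
                    listener.stageStarted(stages.get(i), i + 1, stages.size());
                }
                // A real operation would perform the work for this stage here.
            }
        }
    }

    public static void main(String[] args) {
        // Stage names are invented for the example.
        StagedTask task = new StagedTask(Arrays.asList("load", "align", "serialize"));
        task.addListener((stage, index, total) ->
                System.out.println(index + "/" + total + " " + stage));
        task.run(); // prints 1/3 load, 2/3 align, 3/3 serialize on separate lines
    }
}
```

Keeping the listener as a single-method interface lets callers register a lambda, which is convenient for simple logging or progress bars.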