Technical Specification

Overview

DITA Merge merges three or more well-formed Darwin Information Typing Architecture (DITA) file inputs and generates a single well-formed XML file describing the differences between the files. This file is known as a delta file, and it can be post-processed to create a merged DITA document. We use the term 'file' in this specification, but the inputs and outputs may use other data representations, including strings, in-memory trees or event streams.

The DITA Merge software provides an application programming interface (API) that allows it to be embedded in other Java-based software. DITA Merge also has a REST API, which allows merge operations to be invoked from a wide range of programming languages and systems. The DITA Merge REST service can be used to invoke concurrent merge, three-way concurrent merge and sequential merge, either synchronously or asynchronously.

DITA Versions and Inputs

DITA Merge targets the language features of OASIS DITA 1.1 and DITA 1.2. XML catalog support is provided by the tool. For other versions or specialisations, some configuration of the catalog system will be necessary. DITA Merge merges only DITA Topic inputs; it does not merge DITA Maps.
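The specification does not describe how DITA Merge's own catalog configuration is exposed, so the sketch below uses the JDK's generic XML catalog API (javax.xml.catalog, available from Java 9, which is newer than the Java 8 minimum stated under System Requirements) to show the underlying idea: a catalog maps a DITA public identifier to a local DTD. The catalog content, the local path dtd/topic.dtd and the class name are illustrative assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import org.xml.sax.InputSource;

class DitaCatalogSketch {
    // Hypothetical catalog: maps the DITA topic public identifier to a local DTD path.
    static final String CATALOG =
        "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">"
        + "<public publicId=\"-//OASIS//DTD DITA Topic//EN\" uri=\"dtd/topic.dtd\"/>"
        + "</catalog>";

    /** Writes the catalog to a temporary file and builds a JDK (Java 9+) resolver over it. */
    static CatalogResolver resolverFor(String catalogXml) {
        try {
            Path file = Files.createTempFile("dita-catalog", ".xml");
            Files.write(file, catalogXml.getBytes(StandardCharsets.UTF_8));
            // RESOLVE=continue: identifiers with no catalog entry resolve to null
            // instead of raising a CatalogException.
            CatalogFeatures features = CatalogFeatures.builder()
                    .with(CatalogFeatures.Feature.RESOLVE, "continue")
                    .build();
            return CatalogManager.catalogResolver(features, file.toUri());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the resolved system ID, or null if the catalog has no matching entry. */
    static String resolve(CatalogResolver resolver, String publicId, String systemId) {
        InputSource source = resolver.resolveEntity(publicId, systemId);
        return source == null ? null : source.getSystemId();
    }
}
```

Relative uri values in a catalog are resolved against the catalog file's own location, so keeping specialisation DTDs next to the catalog keeps the configuration portable.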

Delta Files

A DeltaXML delta file has the same basic structure as the files that have been compared, with some additional attributes and elements. An XML namespace (the DeltaXML namespace) distinguishes these additional elements and attributes from those found in the input files. The delta file includes unchanged elements and attributes. The delta file provides a structured representation of the input files as a single file in which common data is shared.
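Because the added markup lives in its own namespace, a consumer can separate it from the original DITA content by namespace URI alone. The sketch below lists every attribute in that namespace using standard DOM calls. The namespace URI, the deltaV2 attribute name and its values in the usage example are assumptions for illustration; check the DITA Merge documentation for the actual delta vocabulary.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DeltaMarkupSketch {
    // Assumed DeltaXML namespace URI; confirm it against the product documentation.
    static final String DELTA_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getNamespaceURI()/getLocalName()
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    /** Lists "element/@attribute" for every attribute in the delta namespace, in document order. */
    static List<String> deltaAttributes(Document doc) {
        List<String> found = new ArrayList<>();
        NodeList elements = doc.getElementsByTagName("*"); // every element, any namespace
        for (int i = 0; i < elements.getLength(); i++) {
            Element element = (Element) elements.item(i);
            NamedNodeMap attrs = element.getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                Node attr = attrs.item(j);
                // xmlns declarations are in the XMLNS namespace, so they are not matched here.
                if (DELTA_NS.equals(attr.getNamespaceURI())) {
                    found.add(element.getLocalName() + "/@" + attr.getLocalName());
                }
            }
        }
        return found;
    }
}
```

For a hypothetical delta such as `<topic xmlns:deltaxml="..." deltaxml:deltaV2="A!=B" id="t1"><title deltaxml:deltaV2="A=B">Intro</title></topic>`, the ordinary id attribute is ignored and only the two deltaV2 attributes are reported.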

XML Processing

Comments and processing instructions can be preserved so that they appear in the delta file. Internal parsed general entities can be expanded or preserved. CDATA sections can be expanded or preserved.
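How these choices play out at the parser level can be seen with the standard JAXP DocumentBuilderFactory, which exposes one switch for each. This is a generic Java illustration, not the DITA Merge configuration API; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class ParserOptionsSketch {
    /** Parses xml, choosing whether comments, CDATA sections and entity references survive. */
    static Document parse(String xml, boolean keepComments, boolean keepCdata, boolean keepEntityRefs) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setIgnoringComments(!keepComments);
            // Coalescing turns CDATA sections into ordinary text merged with adjacent text.
            factory.setCoalescing(!keepCdata);
            // When false, internal entity references stay as EntityReference nodes.
            factory.setExpandEntityReferences(!keepEntityRefs);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    /** Counts nodes of the given DOM node type in the subtree rooted at node. */
    static int count(Node node, short type) {
        int total = node.getNodeType() == type ? 1 : 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            total += count(child, type);
        }
        return total;
    }
}
```

Parsing `<!DOCTYPE p [<!ENTITY co "Acme">]><p><!-- note -->&co; <![CDATA[a < b]]></p>` with all three flags true keeps one comment, one CDATA section and one entity reference under `p`; with all three false, none of them remain as separate nodes.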

DeltaXML is namespace-aware: it recognises elements as being in the same namespace even when the namespace prefixes differ. An element or attribute in a namespace may have a different namespace prefix in the delta file from the one used in the input file.
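The rule is that element identity depends on the namespace URI and local name, never on the prefix. A minimal DOM sketch of that comparison, with hypothetical class and method names and an example namespace URI:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

class NamespaceMatchSketch {
    /** Parses xml with namespace processing enabled and returns its root element. */
    static Element root(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    /** True when both elements share namespace URI and local name; prefixes are ignored. */
    static boolean sameName(Element a, Element b) {
        return Objects.equals(a.getNamespaceURI(), b.getNamespaceURI())
                && Objects.equals(a.getLocalName(), b.getLocalName());
    }
}
```

With this test, `<a:note xmlns:a="urn:example:ns"/>` and `<b:note xmlns:b="urn:example:ns"/>` match even though their prefixes differ, while an unqualified `<note/>` does not.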

Merge Process

DITA Merge aligns the corresponding elements in the input files by taking account of the tree structure of the input XML. Corresponding elements will have the same element local name and namespace and will have corresponding parent elements. The root elements of the files must have the same local name and namespace. DITA Merge determines the alignment at each level in the tree structure between the files. The alignment algorithm determines the longest common subsequence of corresponding elements. The alignment algorithm gives precedence to elements that are exactly equal over those that have just the same element name and namespace. DITA Merge treats elements as ordered: a change in the order of elements is identified as a change.
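One way to express "longest common subsequence, preferring exact matches" is a weighted LCS over the children at one tree level. The sketch below is an illustration under assumptions, not DITA Merge's internal algorithm: each element is reduced to a hypothetical Elem with a name and a serialized content string, a name-only match scores 1, and an exact match scores more than any possible number of name-only matches, so the number of exact matches is maximised first.

```java
import java.util.ArrayList;
import java.util.List;

class AlignmentSketch {
    /** Hypothetical stand-in for an element: its qualified name and serialized content. */
    static final class Elem {
        final String name;
        final String content;
        Elem(String name, String content) { this.name = name; this.content = content; }
    }

    /** Aligns two sibling sequences; returns {indexInA, indexInB} pairs in document order. */
    static List<int[]> align(List<Elem> a, List<Elem> b) {
        int m = a.size(), n = b.size();
        // At most min(m, n) name-only matches exist, so one exact match outweighs all of them.
        long exactWeight = Math.min(m, n) + 1L;
        long[][] score = new long[m + 1][n + 1];
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                long best = Math.max(score[i - 1][j], score[i][j - 1]);
                long w = weight(a.get(i - 1), b.get(j - 1), exactWeight);
                if (w > 0) best = Math.max(best, score[i - 1][j - 1] + w);
                score[i][j] = best;
            }
        }
        // Walk back through the table to recover which elements were paired.
        List<int[]> pairs = new ArrayList<>();
        int i = m, j = n;
        while (i > 0 && j > 0) {
            long w = weight(a.get(i - 1), b.get(j - 1), exactWeight);
            if (w > 0 && score[i][j] == score[i - 1][j - 1] + w) {
                pairs.add(0, new int[] {i - 1, j - 1});
                i--;
                j--;
            } else if (score[i][j] == score[i - 1][j]) {
                i--;
            } else {
                j--;
            }
        }
        return pairs;
    }

    /** 0 = cannot correspond, 1 = same name only, exactWeight = exactly equal. */
    static long weight(Elem x, Elem y, long exactWeight) {
        if (!x.name.equals(y.name)) return 0;
        return x.content.equals(y.content) ? exactWeight : 1;
    }
}
```

Aligning [p "x", p "y"] with [p "y"] pairs the second p, because it is exactly equal. Reordering [title, p] into [p, title] aligns only one of the two, so the other appears as a deletion plus an addition, which is how an ordered comparison reports a move.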

The DITA inputs are loaded into DITA Merge in order. The order is recorded in an attribute on the root element of the merged delta file.

For a delta with type 'merge-concurrent', one input file is considered to be the common ancestor from which the other input files have been derived. As each successive file is loaded into the delta, it is first aligned with the common ancestor, and this alignment takes precedence over alignment between that file and files previously loaded into the delta.
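Aligning every derived file with the ancestor first is what makes it possible to tell who changed what. The sketch below applies the standard three-way merge rule (as used by tools such as diff3) to one item that has already been aligned across the ancestor and two derived versions. It is a swapped-in illustration of the concept, not DITA Merge's documented merge rules; the enum, class and method names are hypothetical, and null stands for an item that is absent from a version.

```java
import java.util.Objects;

class ConcurrentMergeSketch {
    enum Outcome { UNCHANGED, TAKE_A, TAKE_B, SAME_CHANGE, CONFLICT }

    /** Classifies one aligned item; ancestor, a and b are its values (null = absent). */
    static Outcome classify(String ancestor, String a, String b) {
        boolean aChanged = !Objects.equals(ancestor, a);
        boolean bChanged = !Objects.equals(ancestor, b);
        if (!aChanged && !bChanged) return Outcome.UNCHANGED; // nobody touched it
        if (aChanged && !bChanged) return Outcome.TAKE_A;     // only A edited it
        if (!aChanged) return Outcome.TAKE_B;                 // only B edited it
        // Both edited it: agreement is safe, disagreement needs a decision.
        return Objects.equals(a, b) ? Outcome.SAME_CHANGE : Outcome.CONFLICT;
    }
}
```

Without the ancestor, an item present in A but missing from B would be ambiguous (added by A or deleted by B); comparing each version with the ancestor resolves that ambiguity.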

DITA Merge ignores the order of attributes. Changes to attributes are represented using elements in the DeltaXML namespace.
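Ignoring attribute order amounts to comparing attributes as a map keyed by namespace URI and local name rather than as a list. A minimal DOM sketch of that comparison, with hypothetical names; namespace declarations are skipped because they are not ordinary attributes:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;

class AttributeCompareSketch {
    /** Parses xml with namespace processing enabled and returns its root element. */
    static Element root(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    /** Maps "{namespace}localName" to value, skipping xmlns declarations. */
    static Map<String, String> attributes(Element element) {
        Map<String, String> result = new TreeMap<>();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr attr = (Attr) attrs.item(i);
            String ns = attr.getNamespaceURI();
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(ns)) continue;
            result.put("{" + (ns == null ? "" : ns) + "}" + attr.getLocalName(), attr.getValue());
        }
        return result;
    }

    /** Order-insensitive: equal maps mean the same attribute names and values. */
    static boolean sameAttributes(Element a, Element b) {
        return attributes(a).equals(attributes(b));
    }
}
```

Here `<p id="a" outputclass="x"/>` and `<p outputclass="x" id="a"/>` compare as equal, while changing the outputclass value makes them differ.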

System Requirements

DITA Merge requires Java Standard Edition (Java SE) JRE version 8.0 or later. We test on Solaris (Intel 64-bit) and macOS (Intel 64-bit). To receive support, any reported problem should be reproducible on at least one of these platforms.