Output Formats
Introduction
The direct XML Compare output is the 'Delta'. It has the look and feel of the original input documents, but with annotations added to describe the differences. This is the standard output, but other output formats can also be produced. XML Compare has options to give the following:
For the Document Comparator only, the tracked changes formats of the Arbortext, FrameMaker, Oxygen and XMetaL XML editors.
XML Compare includes output filters for either a 'side-by-side' or a 'folding' HTML rendering, called a 'DiffReport'. Custom output formats can be created by adding an XSLT output filter to perform a final transform on the chosen pre-defined output format or, perhaps more commonly, on the raw Delta format.
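To give a feel for what a custom XSLT output filter might do, the sketch below applies a small XSLT 1.0 stylesheet to a hand-written, simplified Delta using the standard JAXP API, outside the comparison pipeline. It emits one line per element whose deltaxml:deltaV2 value is anything other than A=B. The stylesheet, the class name and the namespace URI `http://www.deltaxml.com/ns/well-formed-delta-v1` are assumptions for illustration; check the namespace against the DeltaV2 reference before relying on it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ChangedElementsFilter {
    // Assumed DeltaXML namespace URI (verify against the DeltaV2 reference).
    static final String DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1";

    // Hypothetical custom filter: one text line per element whose deltaV2 value is not A=B.
    // Elements without a deltaV2 attribute are skipped (the XPath comparison is false for them).
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:deltaxml='" + DELTAXML_NS + "'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select=\"//*[@deltaxml:deltaV2 != 'A=B']\">"
        + "<xsl:value-of select=\"concat(local-name(), ' ', @deltaxml:deltaV2, '&#10;')\"/>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the stylesheet over a Delta supplied as a string and returns the text output.
    static String transform(String delta) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(delta)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transform failed", e);
        }
    }
}
```

Within XML Compare itself, a stylesheet like this would be added as a filter step in the output pipeline rather than run standalone.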
Direct XML Compare Output
The direct output from XML Compare is the 'Delta'; this is the base XML output for both the Pipelined Comparator and the Document Comparator. By default the Delta includes all content, including unchanged content, but there is also an option for a 'patch' output where only the changes are included. Other output format options are also available and are described in this section; these are essentially transforms of the original Delta.
The Delta
The Delta XML output from XML Compare uses the DeltaV2 format.
The Delta is the XML output direct from the XML Compare comparator which uses the DeltaV2 format to mark up changes. This format is designed to be compact whilst also making code that processes it clean and efficient. Version 2.0 of the DeltaV2 format is used by default, but if the Document Comparator is used with marked up formatting elements, then version 2.1 is used. Version 2.1 is a superset of 2.0 with extensions to represent overlapping XML hierarchies.
At its simplest, the DeltaV2 format is a representation of the 'A' and 'B' documents in a single document. For this, deltaxml:deltaV2 attributes (in the DeltaXML namespace) are added to all elements where differences are found. The deltaV2 attribute may hold one of the following values: A, B, A=B and A!=B. The A or B represents the document source, and the = or != separator indicates whether the matching source elements are the same or different. Extra elements in the DeltaXML namespace are used to represent modified text or attribute nodes. The DeltaV2 format is defined in full in the DeltaV2 reference; a more detailed description of the extensions added in version 2.1 is given in the reference: Overlapping Hierarchies in DeltaV2.
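As a rough illustration of how the four deltaV2 values can be read, the sketch below parses a small, hand-written Delta with the standard DOM API and reports what each annotated element's value means. The class name, the sample and the namespace URI `http://www.deltaxml.com/ns/well-formed-delta-v1` are assumptions; the sketch ignores the extra DeltaXML elements used for modified text and attributes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class DeltaV2Reader {
    // Assumed DeltaXML namespace URI; verify it against the DeltaV2 reference.
    static final String DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1";

    // Interprets the four deltaV2 values used in a two-way (A versus B) comparison.
    static String describe(String value) {
        switch (value) {
            case "A":    return "only in A";
            case "B":    return "only in B";
            case "A=B":  return "in A and B, same";
            case "A!=B": return "in A and B, different";
            default: throw new IllegalArgumentException("Unexpected deltaV2 value: " + value);
        }
    }

    // Lists "elementName: meaning" for every element carrying a deltaV2 attribute, in document order.
    static List<String> summarize(String deltaXml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // required for the getAttributeNS lookups below
            Document doc = f.newDocumentBuilder().parse(
                new ByteArrayInputStream(deltaXml.getBytes(StandardCharsets.UTF_8)));
            List<String> lines = new ArrayList<>();
            NodeList all = doc.getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                Element e = (Element) all.item(i);
                if (e.hasAttributeNS(DELTAXML_NS, "deltaV2")) {
                    lines.add(e.getLocalName() + ": " + describe(e.getAttributeNS(DELTAXML_NS, "deltaV2")));
                }
            }
            return lines;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse delta", e);
        }
    }
}
```

For a Delta whose root is marked A!=B and contains an unchanged title (A=B) and an added paragraph (B), the summary reads: root different, title same, paragraph only in B.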
Document Comparator Formats
Tracked Changes
Many XML editors support a tracked changes feature incorporated into an Author mode with a WYSIWYG view; the output from XML Compare can be represented as tracked changes in supported tools. The main benefit is that detected changes can be more easily accepted or rejected, and further edits made, within the chosen editor. The Document Comparator API provides a setResultFormat method on the OutputFormatConfiguration object to produce output conforming to the tracked changes format for the following XML editors:
Arbortext
Adobe FrameMaker
Oxygen
XMetaL
Changes to Attributes
Changes in attributes are supported by both the Oxygen and Arbortext tracked changes systems.
The tracked changes feature supports a number of XML Editors
Arbortext Tracked Changes
When using Arbortext Tracked Changes Markup the output of the comparison is an Arbortext tracked-change version of a document. Here XML elements can contain Arbortext tracked change elements.
Assuming that the inputs to the comparison are valid XML documents and all the changes to the output are accepted (or rejected) as previously discussed, then the resulting document will be a valid XML document. Note that in general it is not possible to guarantee that an arbitrary combination of 'accepted' and 'rejected' changes will result in a valid document, due to the granularity of change.
The generated tracked changes use three of the available tracked change elements:
atict:add | For inserted content |
atict:del | For deleted content |
atict:chgm | For attribute modification (outside the context of a table). |
Changes within comments and CDATA sections result in the whole of the old version of the text being marked as deleted, and the whole of the new version of the text being marked as inserted.
The Arbortext tracked change format supports changes to elements, text, attributes, comments and processing instructions. It also supports both cell and row level changes within tables.
For example, to choose Arbortext tracked changes:
// Create a comparator and select the Arbortext tracked changes result format
DocumentComparator dc = new DocumentComparator();
dc.getOutputFormatConfiguration().setResultFormat(ResultFormat.ARBORTEXT_TC);
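Because insertions and deletions are carried by wrapper elements, accepting every change in such a document amounts to removing each atict:del element with its content and unwrapping each atict:add element. The sketch below does this with the standard DOM API; it is a hypothetical helper, not part of the XML Compare API, and it leaves atict:chgm attribute-change markup untouched. It matches elements by their atict: qualified name, so the atict namespace URI used in the sample input (an assumption) does not affect the result.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class ArbortextAcceptAll {
    // Accepts every atict:add and atict:del change; atict:chgm is left as-is.
    static Document acceptAll(String xml) {
        Document doc;
        try {
            // Namespace-unaware parsing, so elements are matched by their "atict:" qualified name.
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }

        // Deletions: drop each atict:del element together with its content.
        // getElementsByTagName returns a live list, so item(0) is always the next match.
        NodeList dels = doc.getElementsByTagName("atict:del");
        while (dels.getLength() > 0) {
            Node del = dels.item(0);
            del.getParentNode().removeChild(del);
        }

        // Insertions: keep the inserted content but remove the atict:add wrapper.
        NodeList adds = doc.getElementsByTagName("atict:add");
        while (adds.getLength() > 0) {
            Node add = adds.item(0);
            Node parent = add.getParentNode();
            while (add.getFirstChild() != null) {
                parent.insertBefore(add.getFirstChild(), add);
            }
            parent.removeChild(add);
        }
        return doc;
    }
}
```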
Adobe FrameMaker Tracked Changes
The FrameMaker Tracked Changes Markup output format is a valid XML document that includes annotations to represent changes in the document. These changes are displayed in the FrameMaker Editor.
The format employs FrameMaker's method for tracking changes, using XML processing instructions and comments to mark additions and deletions within documents.
The FrameMaker tracked change format is restricted to the Author and WYSIWYG views. These views do not support edits within XML marked as CDATA; changes to CDATA sections are therefore converted to normally parsed XML content. This format uses a pseudo-entity '&fm-double-hyphen;' to allow two adjacent hyphen characters to be represented within comments, which the tracked change format uses to contain deleted content.
The FrameMaker tracked change format supports changes to elements, text, comments and processing instructions, but not attributes. As with most editors, FrameMaker has a few limitations on which types of change can be tracked for different element types; an example is the addition or deletion of table rows. For this specific case, the output format defaults to DOWN (controlled by FrameMakerTrackChangesTableChangeMode), where changes to rows and cells are pushed down to the cell content level. Other limitations have not been fully explored, and it is possible that some changes marked in the output will be ignored by FrameMaker.
Code Snippet
OutputFormatConfiguration formatConfig= new OutputFormatConfiguration();
formatConfig.setResultFormat(ResultFormat.FRAMEMAKER_TC);
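XML does not allow two adjacent hyphens inside a comment, which is why deleted content stored in a comment needs the pseudo-entity. The sketch below shows one plausible escape and unescape round trip using plain string replacement. It is a hypothetical helper, not FrameMaker's or XML Compare's implementation, and it ignores edge cases such as text that already contains the literal pseudo-entity or ends with a single hyphen.

```java
class DoubleHyphenEscaper {
    // Pseudo-entity name taken from the FrameMaker format description above.
    static final String PSEUDO_ENTITY = "&fm-double-hyphen;";

    // Replaces each "--" (non-overlapping, left to right) so the text can sit inside an XML comment.
    static String escape(String text) {
        return text.replace("--", PSEUDO_ENTITY);
    }

    // Reverses escape() when the deleted content is read back out of the comment.
    static String unescape(String commentText) {
        return commentText.replace(PSEUDO_ENTITY, "--");
    }
}
```

A run of three hyphens becomes the pseudo-entity followed by a single hyphen, so no "--" sequence survives in the escaped text.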
Oxygen Tracked Changes
When using Oxygen Tracked Changes Markup the output of the comparison is itself an XML document. The format is supported by the Oxygen Editor and Author products.
This output format uses processing instructions to identify changes: deleted content is typically contained within a processing instruction, and inserted content is typically sandwiched between two processing instructions, one marking the start of the insertion and the other the end. Hence, removing (or ignoring) the processing instructions has the effect of accepting all changes to the document.
Comments and CDATA Sections are handled specially, as processing instructions cannot be placed inside their content. Instead, changes are identified by a sequence of processing instructions that immediately follow the Comment or CDATA Section, which mark the location of the change by using a character counting technique. Here, deleted content is contained in the processing instructions, whereas inserted content is already in the Comment or CDATA Section text itself. This preserves the principle of being able to accept all the changes within a document by either ignoring or removing the tracked change processing instructions.
The Oxygen tracked change format supports changes to elements, text, attributes, comments and processing instructions. It also supports both cell and row level changes within tables. To see attribute changes, ModifiedAttributeMode should be set to CHANGE. Note: this is the default when the output format is set to OXYGEN_TC.
Code Snippet
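OutputFormatConfiguration formatConfig = new OutputFormatConfiguration();
// Produce Oxygen tracked changes markup
formatConfig.setResultFormat(ResultFormat.OXYGEN_TC);
// Author name recorded in the generated tracked changes
String author = "New Author";
formatConfig.setTrackChangesAuthor(author);
// Date recorded in the tracked changes: here, one day after the current date/time
Calendar newDate = Calendar.getInstance();
newDate.add(Calendar.DAY_OF_MONTH, 1);
formatConfig.setTrackChangesDate(newDate);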
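Since removing the tracked change processing instructions accepts all changes, an accept-all step can be written with the standard DOM API, as in the sketch below. The processing instruction targets (oxy_delete, oxy_insert_start, oxy_insert_end) and the sample markup are assumptions based on Oxygen's usual notation rather than details given in this document; attribute changes and any other oxy_ instructions are deliberately left alone.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.SAXException;

class OxygenAcceptAll {
    // Assumed Oxygen PI targets for insertions and deletions.
    static final Set<String> CHANGE_TARGETS =
        new HashSet<>(Arrays.asList("oxy_delete", "oxy_insert_start", "oxy_insert_end"));

    // Accepts all insertions and deletions by removing the change-tracking PIs.
    static Document acceptAll(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<Node> toRemove = new ArrayList<>();
            collect(doc, toRemove);
            for (Node pi : toRemove) {
                pi.getParentNode().removeChild(pi);
            }
            return doc;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }

    // Depth-first walk collecting matching PIs; removal is deferred so the walk stays valid.
    private static void collect(Node node, List<Node> out) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE
                    && CHANGE_TARGETS.contains(((ProcessingInstruction) c).getTarget())) {
                out.add(c);
            }
            collect(c, out);
        }
    }
}
```

Deleted text lives only inside the oxy_delete instruction, so it disappears with it, while inserted text stays in place once its start and end markers are gone.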
XMetaL Tracked Changes
When using XMetaL Tracked Changes Markup the output of the comparison is itself an XML document.
This output format uses processing instructions to identify change in a similar manner to that of Oxygen tracked change format. However, it does not support row or cell level table changes. It does support changes to elements, text, comments and processing instructions but not attributes.
Changes within CDATA sections are handled by moving the change to the CDATA section level as a whole. Therefore any textual change within a CDATA section results in the whole of the old version of the CDATA section being marked as deleted, and the whole of the new version being marked as inserted.
There is a special XMetaL specific parameter (XmetalTrackChangesTableChangeMode) which controls what happens when row or cell level table changes are present. These changes can be pushed down to the cell content level, where the content of each cell within the changed region is appropriately deleted and inserted; this is the 'default' behaviour. The second option is that changes to rows or cells can be pushed up to the table level, so that the old and new versions of the table as a whole are tracked. The third option is that changes can simply be ignored (which mirrors what the XMetaL editor would do). However, selecting the ignore mode means that all changes within a table are ignored, not just those that are at the 'row' or 'cell' level. This is deliberate, as we believe that partial tracking of changes within a table would be confusing.
Code Snippet
OutputFormatConfiguration formatConfig= new OutputFormatConfiguration();
formatConfig.setResultFormat(ResultFormat.XMETAL_TC);
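To make the three XMetaL table modes concrete, the toy model below describes what would be tracked for one changed table in each mode. The Mode enum and its constant names are hypothetical stand-ins (see OutputFormatConfiguration for the actual XmetalTrackChangesTableChangeMode values), and rows and cells are aligned purely by position for illustration, whereas the comparator aligns them by matching.

```java
import java.util.ArrayList;
import java.util.List;

class TableChangeModel {
    // Hypothetical stand-ins for the push-down, push-up and ignore behaviours.
    enum Mode { DOWN, UP, IGNORE }

    // Describes the tracked changes produced for one table whose rows or cells changed.
    static List<String> track(List<List<String>> oldRows, List<List<String>> newRows, Mode mode) {
        List<String> out = new ArrayList<>();
        switch (mode) {
            case UP:
                // Whole table tracked: old version deleted, new version inserted.
                out.add("delete table " + oldRows);
                out.add("insert table " + newRows);
                break;
            case IGNORE:
                // Nothing in the table is tracked; only the new version is kept.
                out.add("untracked table " + newRows);
                break;
            case DOWN:
            default:
                // Changes pushed down to cell content: each differing cell is deleted and/or inserted.
                int rowCount = Math.max(oldRows.size(), newRows.size());
                for (int r = 0; r < rowCount; r++) {
                    int cols = Math.max(width(oldRows, r), width(newRows, r));
                    for (int c = 0; c < cols; c++) {
                        String before = cell(oldRows, r, c);
                        String after = cell(newRows, r, c);
                        if (!before.equals(after)) {
                            if (!before.isEmpty()) out.add("delete '" + before + "'");
                            if (!after.isEmpty()) out.add("insert '" + after + "'");
                        }
                    }
                }
        }
        return out;
    }

    // Number of cells in row r, or 0 if that row does not exist in this version.
    private static int width(List<List<String>> rows, int r) {
        return r < rows.size() ? rows.get(r).size() : 0;
    }

    // Cell text, or "" when the row or cell does not exist in this version.
    private static String cell(List<List<String>> rows, int r, int c) {
        if (r >= rows.size() || c >= rows.get(r).size()) return "";
        return rows.get(r).get(c);
    }
}
```

For an added row, DOWN reports an insertion for each of its cells, UP reports the old and new tables as a delete/insert pair, and IGNORE reports nothing as tracked.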
Tracked Changes Configuration Parameters
These parameters apply when ResultFormat is set to any of the tracked changes formats. For full details see OutputFormatConfiguration; code snippets are given in the sections above.
TrackChangesAuthor
Specifies the author name that is embedded into the generated insertion and deletion processing instructions. The default value is 'deltaxml'. See the Javadoc.
TrackChangesDate
Specifies a Calendar instance to use as the date representation when generating tracked changes. A null value can be used to generate the current date/time when the comparison runs.
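The null-means-now rule can be expressed as a small helper. The sketch below resolves the effective Calendar and renders it as a compact timestamp; the helper class and the yyyyMMdd'T'HHmmssZ pattern are illustrative assumptions, not a description of how XML Compare formats dates for any particular editor.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Locale;

class TrackChangesDates {
    // A null setting means "use the date/time at which the comparison runs".
    static Calendar effectiveDate(Calendar configured) {
        return configured != null ? configured : Calendar.getInstance();
    }

    // Formats the date in the calendar's own time zone using an illustrative pattern.
    static String format(Calendar date) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd'T'HHmmssZ", Locale.ROOT);
        fmt.setTimeZone(date.getTimeZone());
        return fmt.format(date.getTime());
    }
}
```

With a UTC calendar set to 2 January 2024, 03:04:05, the pattern yields 20240102T030405+0000.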
ModifiedFormatOutput
Specifies the different options for processing the elements that were flattened in the inputs.
Supplementary Output Formats
This section describes output format filters included with the XML Compare distribution. These are used to transform the Delta output within the comparison pipeline (Pipelined Comparator or Document Comparator) immediately prior to serialization.
HTML Difference Reports
HTML5 Side-by-Side (diffreport-sbs)
This is a JavaScript-dependent HTML view that presents the comparison result of the raw XML of the input file versions rendered alongside each other. The user-interface provides up/down buttons on a toolbar allowing the end-user to highlight each change.
The 'side-by-side' output format.
In the Pipelined Comparator, the HTML for this view can be generated using a built-in DXP configuration, which is invoked from the command line or GUI using the diffreport-sbs configuration id. Alternatively, it can be generated from the XML Compare API with the dx2-deltaxml-sbs-folding-html.xsl stylesheet added as the final output filter.
For the Document Comparator, the equivalent DCP configuration doc-diffreport-sbs must be used. Alternatively, if using the API, the XSLT filter dx2-deltaxml-sbs-folding-html.xsl should be added as a filter step to the OUTPUT_FINAL extension point, as shown in the following Java code:
DocumentComparator dcr = new DocumentComparator();
// Build a filter chain containing the side-by-side HTML stylesheet
FilterStepHelper fsh = dcr.newFilterStepHelper();
FilterChain fChain = fsh.newFilterChain();
FilterStep fsSBS = fsh.newFilterStepFromResource(
    "xsl/dx2-deltaxml-sbs-folding-html.xsl", "side-by-side");
fChain.addStep(fsSBS);
// Apply the chain at the final output stage, immediately before serialization
dcr.setExtensionPoint(ExtensionPoint.OUTPUT_FINAL, fChain);
HTML Folding Report (diffreport)
As with the side-by-side view described earlier, this is also a JavaScript-dependent HTML view of the comparison result. This view, however, shows XML differences interleaved within a single view of the XML.
The color of the rendered XML indicates the type of change (blue, green and red for 'modified', 'added' and 'deleted' respectively). The view of each element node may be folded or unfolded by clicking the icon immediately to the left of its start tag. A simple toolbar and differences list allow for easier navigation of changes in large documents.
The 'folding' output format.
With the Pipelined Comparator, the HTML for the folding view is generated using a built-in DXP configuration, which can be invoked either from the GUI or from the command line, as with the side-by-side view, but now using the diffreport configuration id.
For the Document Comparator, the folding view can be created from the command line or GUI using the DCP configuration id doc-diffreport. Alternatively, the associated XSLT stylesheet can be added as a filter step to the final output extension point. This is illustrated in the DCP Folding Diff Report sample.
XML 'Diff and Patch' Output
The XML Compare comparators may be configured to output either a full-context delta (the default) or a changes-only delta. When the Pipelined Comparator (but not the Document Comparator) is used, the changes-only format can be used to recreate document B from document A, which can be useful in version control systems and similar scenarios. A worked example of this is: Using Deltas for XML Versioning (diff and patch)