Technical Specification

Overview

XML Compare compares two well-formed XML files, an 'old' file and an 'updated' file, and generates an XML file describing the differences between the two files. The file representing the differences is known as a delta file.

The XML Compare software provides a procedural interface that can be embedded in other Java-based software to compare elements and attributes of two well-formed XML 1.0 documents, A and B, and represent any differences in a well-formed, XML-encoded, delta file, D.

The XML Compare Combine function re-combines D with A to generate B', such that when B and B' are compared using the XML Compare Compare function, no differences are identified. Similarly, D may be re-combined with B to generate A', such that when A and A' are compared, no differences are identified. Re-combination is not supported when one or more of the elements are specified as orderless.

Delta Files

A DeltaXML delta file normally represents just the set of differences between two files, and does not include any data that has not changed. The XML Compare Compare function provides a feature that can be set to generate a 'full context delta' which includes unchanged data, including any unchanged elements and attributes. The 'full context delta' provides a structured representation of two files as a single file in which common data is shared.

A DeltaXML delta file has the same basic structure as the files that have been compared, with some additional attributes and elements. An XML namespace (the DeltaXML namespace) distinguishes these additional elements and attributes from those found in the input files.

Delta files cannot be compared unless the DeltaXML namespace in the delta files is changed before comparison.

XML Processing

XML Compare sends each document to an XML parser prior to processing. If the document starts with a DOCTYPE declaration or a reference to an XML Schema, the parser will process the DTD or Schema and return a SAX stream with all the entities expanded and any unspecified attributes added with their default values. XML Compare does not need to take into account the structure of a file as specified in a DTD or Schema file during processing, except to ensure that any white space flagged as ignorable by the parser is ignored. When no DTD or XML Schema is available and white space normalization is performed, the supplied XSLT can attempt to detect significant white space nodes that should not be removed.
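The parser behaviour described above can be observed with the standard JAXP SAX parser that ships with the JRE. The following is a minimal sketch, not part of the XML Compare API: the class name ParserExpansionDemo, the method describe and the sample document are all assumptions made for illustration. It shows that a handler receives the defaulted attribute and the expanded entity text without ever seeing the DTD itself.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: a hypothetical helper, not part of XML Compare.
public class ParserExpansionDemo {

    /** Sample document: internal DTD with a defaulted attribute and an entity. */
    public static final String SAMPLE =
        "<!DOCTYPE note [\n"
        + "  <!ELEMENT note (#PCDATA)>\n"
        + "  <!ATTLIST note status CDATA \"draft\">\n"
        + "  <!ENTITY who \"World\">\n"
        + "]>\n"
        + "<note>Hello &who;</note>";

    /** Parses a single-element document and summarises what the SAX handler saw. */
    public static String describe(String xml) {
        final StringBuilder attrs = new StringBuilder();
        final StringBuilder text = new StringBuilder();
        final String[] rootName = new String[1];
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        if (rootName[0] == null) {
                            rootName[0] = localName;
                        }
                        // Defaulted attributes arrive here like explicit ones.
                        for (int i = 0; i < atts.getLength(); i++) {
                            attrs.append(' ').append(atts.getLocalName(i))
                                 .append('=').append(atts.getValue(i));
                        }
                    }

                    @Override
                    public void characters(char[] ch, int start, int length) {
                        // Entity references have already been expanded.
                        text.append(ch, start, length);
                    }
                });
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return rootName[0] + attrs + " text=" + text;
    }

    public static void main(String[] args) {
        System.out.println(describe(SAMPLE)); // prints "note status=draft text=Hello World"
    }
}
```

The source document never states status="draft" on the note element, yet the handler sees it, which is why two files that differ only in whether a default is written out explicitly can reach the comparison looking identical.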

Comments and processing instructions are not passed to the SAX output stream by the parser. If they are considered to be significant for the purposes of comparison, XSLT should be used to convert comments and processing instructions into elements that can be compared during parsing.

XML Compare handles namespaces and will detect elements in the same namespace even if the namespace prefix values are different. An element or attribute in a namespace may have a different namespace prefix in the delta file from that used in the input file.
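To illustrate why prefixes do not matter, the sketch below uses the standard namespace-aware DOM parser to test whether two elements share a namespace URI and local name. The class NamespaceMatchDemo and its methods are hypothetical names used only for this example; they are not the XML Compare interface.

```java
import java.io.StringReader;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Illustrative only: a hypothetical helper, not part of XML Compare.
public class NamespaceMatchDemo {

    /** Parses the XML text with namespace awareness and returns its root element. */
    public static Element root(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** True when both elements have the same namespace URI and local name. */
    public static boolean sameExpandedName(Element a, Element b) {
        // The prefix is deliberately not consulted.
        return Objects.equals(a.getNamespaceURI(), b.getNamespaceURI())
            && a.getLocalName().equals(b.getLocalName());
    }

    public static void main(String[] args) {
        Element first = root("<a:item xmlns:a='urn:example:stock'/>");
        Element second = root("<b:item xmlns:b='urn:example:stock'/>");
        System.out.println(sameExpandedName(first, second)); // prints "true"
    }
}
```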

Document Comparison

XML Compare compares the two XML files, taking account of the tree structure of the files and identifying corresponding elements in the two files. Corresponding elements will have the same element local name and namespace and will have corresponding parent elements. The root elements of the two files must have the same local name and namespace. XML Compare determines the best fit at each level in the tree structure between the two files. The best fit algorithm determines the longest common subsequence of corresponding elements. The best fit gives precedence to elements that are exactly equal over those that have just the same element name and namespace.
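The idea of a longest common subsequence can be sketched with the textbook dynamic-programming algorithm applied to the names of the child elements at one level of the tree. This is an assumption-laden illustration, not the XML Compare implementation: the class LcsSketch is hypothetical, each child is reduced to a plain string, and the precedence given to exactly equal elements is not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: textbook LCS, not the algorithm used inside XML Compare.
public class LcsSketch {

    /** Returns one longest common subsequence of the two lists. */
    public static List<String> lcs(List<String> a, List<String> b) {
        // len[i][j] = LCS length of a[i..] and b[j..]
        int[][] len = new int[a.size() + 1][b.size() + 1];
        for (int i = a.size() - 1; i >= 0; i--) {
            for (int j = b.size() - 1; j >= 0; j--) {
                if (a.get(i).equals(b.get(j))) {
                    len[i][j] = len[i + 1][j + 1] + 1;
                } else {
                    len[i][j] = Math.max(len[i + 1][j], len[i][j + 1]);
                }
            }
        }
        // Walk forward through the table to recover the matched items.
        List<String> result = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            if (a.get(i).equals(b.get(j))) {
                result.add(a.get(i));
                i++;
                j++;
            } else if (len[i + 1][j] >= len[i][j + 1]) {
                i++; // a[i] has no partner: it would be reported as deleted
            } else {
                j++; // b[j] has no partner: it would be reported as added
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> oldChildren = Arrays.asList("title", "para", "para", "table");
        List<String> newChildren = Arrays.asList("title", "para", "figure", "table");
        System.out.println(lcs(oldChildren, newChildren)); // prints "[title, para, table]"
    }
}
```

Children on the common subsequence are treated as corresponding and compared recursively; the remaining children (the second para and the figure in this example) show up as a deletion and an addition.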

XML Compare can use key values, identified to the software using an attribute in the DeltaXML namespace, to identify corresponding elements in the two files. Elements with different keys in the two files will not be considered to correspond.

XML Compare treats elements as ordered, i.e. a change in order is identified as a change. Optionally any element can be identified to XML Compare as orderless, using an attribute in the DeltaXML namespace which must be present in both files. In this case the child elements may appear in any order in the two files and XML Compare will match corresponding elements. Within an orderless element, a corresponding element is an element with the same name, namespace and key or an element that is exactly equal through its tree structure. In orderless comparison, any elements that do not exactly correspond will be added or deleted according to which file they appear in. Orderless elements must have element-only content.
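Orderless matching can be pictured as pairing children by identity rather than by position. In the sketch below, which is a simplification and not the XML Compare implementation, each child is represented by a single identity string standing in for its name, namespace and key; the class OrderlessSketch and the result labels "matched", "deleted" and "added" are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a simplified model of orderless matching.
public class OrderlessSketch {

    /**
     * Pairs children of the old and updated parent regardless of position.
     * Each string is an assumed identity (name + namespace + key).
     */
    public static Map<String, List<String>> match(List<String> oldChildren,
                                                  List<String> newChildren) {
        List<String> matched = new ArrayList<>();
        List<String> deleted = new ArrayList<>();
        // Copy so that matched items can be crossed off one at a time.
        List<String> unmatchedNew = new ArrayList<>(newChildren);

        for (String child : oldChildren) {
            if (unmatchedNew.remove(child)) {
                matched.add(child);   // present in both files
            } else {
                deleted.add(child);   // only in the old file
            }
        }

        Map<String, List<String>> result = new LinkedHashMap<>();
        result.put("matched", matched);
        result.put("deleted", deleted);
        result.put("added", unmatchedNew); // only in the updated file
        return result;
    }

    public static void main(String[] args) {
        System.out.println(match(
            Arrays.asList("k1", "k2", "k3"),
            Arrays.asList("k3", "k1", "k4")));
        // prints "{matched=[k1, k3], deleted=[k2], added=[k4]}"
    }
}
```

Under ordered comparison the same two inputs would also report a change for k1 and k3, because their relative order differs between the files.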

XML Compare ignores the order of attributes. Changes to attributes are represented using elements in the DeltaXML namespace.

Text Handling

PCDATA items are treated as a whole and are not subdivided into words or characters. XSLT filters may be used to modify the markup before the files are compared and thus provide a word-by-word comparison. The XML parser interprets CDATA sections and expands entity references prior to comparison within XML Compare.
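The kind of markup change such a filter makes can be sketched as follows. The text above describes doing this with XSLT; this example swaps in plain Java DOM code purely for illustration. The class WordSplitSketch and the element name word are assumptions, not names used by XML Compare, and the sketch discards the white space between words, which a real filter would need to preserve.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Illustrative only: a Java stand-in for an XSLT word-splitting filter.
public class WordSplitSketch {

    /** Parses the XML text and returns its root element. */
    public static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Replaces every text node under the element with one <word> element per word. */
    public static void wrapWords(Element element) {
        Document doc = element.getOwnerDocument();
        // Snapshot the children because the loop below modifies the tree.
        List<Node> children = new ArrayList<>();
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            children.add(n);
        }
        for (Node child : children) {
            if (child.getNodeType() == Node.TEXT_NODE) {
                for (String w : child.getNodeValue().trim().split("\\s+")) {
                    if (w.isEmpty()) {
                        continue; // white-space-only text node
                    }
                    Element word = doc.createElement("word"); // hypothetical name
                    word.setTextContent(w);
                    element.insertBefore(word, child);
                }
                element.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                wrapWords((Element) child);
            }
        }
    }

    public static void main(String[] args) {
        Element p = parse("<p>the quick <b>brown fox</b></p>");
        wrapWords(p);
        System.out.println(p.getElementsByTagName("word").getLength()); // prints "4"
    }
}
```

Once each word is its own element, a change to a single word appears as a change to one small element instead of a change to the whole PCDATA item.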

System Requirements

XML Compare requires either:

  • A Java Standard Edition JRE version 8.0 or later. We test on Oracle Solaris 10 (Intel Xeon), Mac OS X (10.6 or higher on Intel), Windows Server 2008 R2 and Windows 7 platforms. To qualify for support, any reported problem should be reproducible on at least one of these platforms.

Patents granted: 2001270901; EP1325432; 60134999.7; US8,196,135B2; CA 2416876; US 8,423,518 B2; EP2174238; 602008031420.0. Patents pending: 1315520.5; 14275178.3; 14/474,377.
