Formatting Element Representations
DeltaV2.1
DeltaV2 is the format used to represent the comparison results from two or more input XML documents in a single XML result document. See the DeltaV2 reference documentation for the full definition.
Version 2.1 of this format adds new capabilities for representing types of change associated with overlapping hierarchies. For full details along with comprehensive examples see Overlapping Hierarchy. There are four new attributes used: deltaxml:deltaTag, deltaxml:deltaTagStart, deltaxml:deltaTagMiddle and deltaxml:deltaTagEnd.
These representations correspond to the 'FormattingOutputType Values' section in the formatting elements sample.
ContentGroup
In DeltaV2.0, content groups were originally used only to represent entities, processing instructions and comments. They have now been extended to represent formatting-element changes as well. The content group (deltaxml:contentGroup) is used to describe changes involving formatting as well as entity references, processing instructions and comments. This element is modelled on the deltaxml:textGroup element, but relaxes the restriction that the child elements (deltaxml:content) must only contain text. The content group also contains a deltaxml:text element to show text differences without formatting elements. The deltaxml:deltaV2 attribute describes formatting change as well as text change, whereas deltaxml:wordDelta only describes text change. The deltaxml:content child elements provide the alternative content and their deltaxml:deltaV2 attributes indicate which of the input files contained that content.
The following example shows how a contentGroup represents different formatting used at corresponding locations in the merge inputs. Here <b> and <u> are formatting elements:
<deltaxml:contentGroup deltaxml:deltaV2="A=D!=B!=C" deltaxml:wordDelta="A=B=C=D">
<deltaxml:text deltaxml:deltaV2="A=B=C=D">The quick brown fox jumps over the lazy dog.</deltaxml:text>
<deltaxml:content deltaxml:deltaV2="A=D"><b>The quick brown fox jumps over the lazy dog.</b></deltaxml:content>
<deltaxml:content deltaxml:deltaV2="B"><b>The quick brown <u>fox jumps over the lazy dog.</u></b></deltaxml:content>
<deltaxml:content deltaxml:deltaV2="C"><b>The quick brown fox <u>jumps over the lazy dog.</u></b></deltaxml:content>
</deltaxml:contentGroup>
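As a rough illustration of how a consumer might read this structure, the Python sketch below selects the deltaxml:content alternative that belongs to one input version. It rests on assumptions that are not stated in this section: the namespace URI bound to the deltaxml: prefix, and a reading of the deltaV2 value inferred from the example alone, in which "=" joins versions that agree and "!=" separates versions that differ. The DeltaV2 reference documentation remains the authority on the attribute grammar. The helper names parse_delta_v2, versions_in and content_for_version are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Assumption: the URI bound to the deltaxml: prefix. Check your own result document.
DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"
DELTA_V2 = f"{{{DELTAXML_NS}}}deltaV2"


def parse_delta_v2(value: str) -> list[list[str]]:
    """Split a value such as 'A=D!=B!=C' into groups of agreeing versions.

    Assumption inferred from the example: '!=' separates groups that differ,
    '=' joins versions that share the same content.
    """
    return [group.split("=") for group in value.split("!=")]


def versions_in(value: str) -> set[str]:
    """Return every version identifier named in a deltaV2 value."""
    return {version for group in parse_delta_v2(value) for version in group}


def content_for_version(group: ET.Element, version: str) -> ET.Element | None:
    """Return the deltaxml:content child that applies to the given version."""
    for content in group.findall(f"{{{DELTAXML_NS}}}content"):
        if version in versions_in(content.get(DELTA_V2, "")):
            return content
    return None


SAMPLE = f"""<deltaxml:contentGroup xmlns:deltaxml="{DELTAXML_NS}" deltaxml:deltaV2="A=D!=B!=C" deltaxml:wordDelta="A=B=C=D">
<deltaxml:text deltaxml:deltaV2="A=B=C=D">The quick brown fox jumps over the lazy dog.</deltaxml:text>
<deltaxml:content deltaxml:deltaV2="A=D"><b>The quick brown fox jumps over the lazy dog.</b></deltaxml:content>
<deltaxml:content deltaxml:deltaV2="B"><b>The quick brown <u>fox jumps over the lazy dog.</u></b></deltaxml:content>
<deltaxml:content deltaxml:deltaV2="C"><b>The quick brown fox <u>jumps over the lazy dog.</u></b></deltaxml:content>
</deltaxml:contentGroup>"""

group = ET.fromstring(SAMPLE)
chosen = content_for_version(group, "B")
# Serialise the formatted content that version B contributed.
print(ET.tostring(chosen[0], encoding="unicode"))
```

Versions A and D resolve to the same deltaxml:content child because they share the group "A=D". A version that is not named in any alternative yields None.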
Interaction of Content Groups and Formatting Element Types
Broadly speaking there are two categories of formatting elements, which can be described quite well using the HTML (up to v4.01) terminology of block-level elements and inline elements. The properties of the two types of element are listed below.
Content model
Generally, block-level elements may contain inline elements and (sometimes) other block-level elements. Inherent in this structural distinction is the idea that block elements create "larger" structures than inline elements.
Default formatting
By default, block-level elements begin on new lines, but inline elements can start anywhere in a line.
Example
<div class="example">
<p>This is a block-level paragraph with some inline <b>formatting</b> within.</p>
</div>
Currently, the Content Group representation does not distinguish between inline and block-level formatting elements, so be aware that content groups containing block-level formatting may produce unexpected or undesirable results. One such result is the duplication of block-level elements, which may make the output an invalid document. In the example above, the "div" and "p" elements are both block-level elements, and the "b" element is inline. If there are changes within the "p" element between versions, the result may contain multiple "p" elements as children of the "div", which may not be desirable. In cases such as this, we recommend marking only inline elements as formatting elements, not block-level elements.
Milestones
Milestones are empty elements placed at the positions of the start and end tags of formatting elements. XML Merge can produce two different types of milestone result:
Overlapping Milestones: This is a basic milestone result, simply representing the starts and ends of formatting elements.
Non-Overlapping Milestones: This milestone format extends overlapping milestones with extra milestones that represent the fragments generated due to Overlapping Hierarchy.
Both of these representations use the same XML elements as milestones. Non-Overlapping Milestones carry additional attribute information to identify the milestones that were added to fragment formatting elements and so prevent overlaps.
Milestone Representation Definition
A milestone is an empty element representing a start or end tag, named either <format:start> or <format:end>.
A <deltaxml:formatting> element is the first child of the root element and holds information about all of the different formatting elements from all the inputs.
<deltaxml:formatting>
<b/>
<u/>
</deltaxml:formatting>
Here, <b> elements will have format:index="1" in their milestones and <u> elements will have format:index="2".
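The index lookup described above could be sketched in Python as follows. This is only an illustration under stated assumptions: the namespace URI for the deltaxml: prefix and the <doc> root element are hypothetical, and formatting_index is a made-up helper name rather than part of any DeltaXML API.

```python
import xml.etree.ElementTree as ET

# Assumption: the URI bound to the deltaxml: prefix. Check your own result document.
DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"


def formatting_index(root: ET.Element) -> dict:
    """Map each 1-based format:index value to the formatting element's tag."""
    formatting = root.find(f"{{{DELTAXML_NS}}}formatting")
    if formatting is None:
        return {}
    # Positions are counted from 1, matching the format:index values above.
    return {position: child.tag for position, child in enumerate(formatting, start=1)}


# <doc> is a hypothetical root element used only for this illustration.
SAMPLE = f"""<doc xmlns:deltaxml="{DELTAXML_NS}">
<deltaxml:formatting>
<b/>
<u/>
</deltaxml:formatting>
</doc>"""

index = formatting_index(ET.fromstring(SAMPLE))
print(index[1], index[2])
```

If the formatting elements are namespaced, ElementTree reports their tags in {uri}local form, so the values in the mapping would need to be compared accordingly.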
Every <format:start> has a corresponding <format:end> with the same format:id.
Every milestone has the following attributes:
Attribute | Overlapping | Non-overlapping |
---|---|---|
format:id | Yes | Yes |
format:index | Yes | Yes |
deltaxml:deltaV2 | Yes | Yes |
format:overlap-split | No | Yes |
format:id is unique across the document for overlapping milestones. However, for non-overlapping milestones, format:id is unique within the groups formed due to fragmentation.
format:index gives the position of the formatting element in the deltaxml:formatting list.
The deltaxml:deltaV2 attribute gives information about the versions in which this formatting element exists.
The format:overlap-split attribute is used only in non-overlapping milestones.
Milestone with attribute format:overlap-split="false": Represents the start or end of a formatting element as it appears in the inputs.
Milestone with attribute format:overlap-split="true": Represents a milestone created due to fragmentation.
Simple example
<format:start format:index="1" deltaxml:deltaV2="A" format:id="1p1_5p1"/>
Word1 Word2 Word3 Word4 Word5
<format:end format:index="1" deltaxml:deltaV2="A" format:id="1p1_5p1"/>
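To show how the format:id pairing might be used, the Python sketch below matches each <format:start> with its <format:end> and reads the text between them. It assumes overlapping milestones, where format:id is unique across the document. Both namespace URIs and the wrapping <p> element are assumptions made for the illustration, and pair_milestones is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumptions: the URIs bound to the deltaxml: and format: prefixes.
DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"
FORMAT_NS = "http://www.deltaxml.com/ns/formatting"

START = f"{{{FORMAT_NS}}}start"
END = f"{{{FORMAT_NS}}}end"
FORMAT_ID = f"{{{FORMAT_NS}}}id"


def pair_milestones(root: ET.Element) -> dict:
    """Pair every format:start with the format:end that shares its format:id.

    Assumes overlapping milestones, where each id occurs on exactly one pair.
    """
    open_starts = {}
    pairs = {}
    for element in root.iter():  # document order
        if element.tag == START:
            open_starts[element.get(FORMAT_ID)] = element
        elif element.tag == END:
            milestone_id = element.get(FORMAT_ID)
            start = open_starts.pop(milestone_id, None)
            if start is None:
                raise ValueError(f"format:end without format:start: {milestone_id}")
            pairs[milestone_id] = (start, element)
    if open_starts:
        raise ValueError(f"format:start without format:end: {sorted(open_starts)}")
    return pairs


# The <p> wrapper is hypothetical; the milestones are those of the simple example.
SAMPLE = (
    f'<p xmlns:deltaxml="{DELTAXML_NS}" xmlns:format="{FORMAT_NS}">'
    '<format:start format:index="1" deltaxml:deltaV2="A" format:id="1p1_5p1"/>'
    "Word1 Word2 Word3 Word4 Word5"
    '<format:end format:index="1" deltaxml:deltaV2="A" format:id="1p1_5p1"/>'
    "</p>"
)

pairs = pair_milestones(ET.fromstring(SAMPLE))
start, end = pairs["1p1_5p1"]
# With no intervening elements, the formatted text is the tail of the start milestone.
print(start.tail)
```

For non-overlapping milestones the same format:id can recur across fragments, so a consumer would need to group milestones differently, for example by also inspecting format:overlap-split.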
Sample
You can define which elements you consider to be formatting elements. See the formatting elements sample.