Rule-Based Processing
Introduction
DITA Merge result types are always some form of delta file. A delta file records each change explicitly; for a deletion, for example, the result shows both the fact of the deletion and the deleted content.
Line-based merge algorithms often use automatic acceptance of simple changes such as deletions and additions to create a result. The XML-based algorithms used in our merge products can apply a similar process.
We call this rule-based processing because a set of rules is used to determine which types of change are automatically applied. This processing is used when the user selects the RULE_PROCESSED_DELTAV2 result type for a merge. Without any user-specified processing rules the processing engine will, by default, process "simple add" changes so that the content is added. Similarly, content affected by "simple delete" changes is removed from the result, and simple (leaf) modifications are applied. The processing rules allow control over which changes are displayed to the user (for example, for subsequent interactive checking or resolution). Another way of thinking about the display rules is that they control which changes are not automatically applied or converted. SIMPLIFIED_RULE_PROCESSED_DELTAV2 is a simplified form of RULE_PROCESSED_DELTAV2.
For a comparison and reference to the different delta formats see the page Merge Result Formats and Types.
Motivation
The motivation for developing a rule-based processing system follows from the design of line-based merge or 'diff3' algorithms used in software version control systems. These systems typically accept non-conflicting changes, so that lines that are changed but not in conflict are merged into the result.
However, our solution is more flexible: rules can be used to control where the non-conflicting changes are applied, whereas line-based algorithms typically do not provide any configuration possibilities.
Example
The following is an example of a deltaxml:textGroup used to show fine-grained changes to inline text. One of the inputs ('Ben' in the example) changes the word 'quick' to 'fast'. Without rule-based processing (when the result type is ANALYZED_DELTAV2) the output would be:
<p deltaxml:deltaV2="Original=Anna=Chris!=Ben">The
<deltaxml:textGroup deltaxml:deltaV2="Original=Anna=Chris!=Ben" deltaxml:edit-type="modify">
<deltaxml:text deltaxml:deltaV2="Original=Anna=Chris">quick</deltaxml:text>
<deltaxml:text deltaxml:deltaV2="Ben">fast</deltaxml:text>
</deltaxml:textGroup> brown fox jumps over the lazy dog.
</p>
The above is an example of a modification which can be rule processed. It is a simple modification because only one editor has made a change in the <p> element. The default action of the rule processing system would be to remove the deltaxml:textGroup describing the change. The corresponding output would then be:
<p deltaxml:deltaV2="Original=Anna=Chris=Ben">The fast brown fox jumps over the lazy dog.</p>
In the above example the deltaxml:textGroup describing the change is removed and replaced with the literal text corresponding to the modification made by Ben (the word 'fast'). Also note that the deltaV2 attribute on the parent p element has been updated to reflect the fact that it no longer contains changes. The updating or correction of delta attributes is applied throughout the tree and, if appropriate, to the root element. In certain cases, such as when all changes are simple and are processed, the result may have a deltaV2 attribute on the root element indicating that no changes are present in the final result. The root element delta can then be used to determine that there is no need for interactive change display or conflict resolution tools in these cases.
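The check on the root element delta can be sketched as a small string test. This is a minimal illustration, not part of the product API: the class and method names are hypothetical, and the rule that a deltaV2 value without "!=" means "no changes" is an assumption inferred from the two example values shown on this page. The deltaV2 format documentation remains the authority.

```java
// Hypothetical helper for illustration only; not a DeltaXML API class.
final class DeltaAttributeSketch {
    private DeltaAttributeSketch() {}

    /**
     * Returns true when a deltaV2 attribute value records a difference.
     * Assumption (inferred from the examples above): "=" joins versions with
     * the same content, and "!=" separates groups of versions that differ.
     */
    static boolean hasChanges(String deltaV2) {
        return deltaV2.contains("!=");
    }

    public static void main(String[] args) {
        // Value on the <p> element before rule processing.
        System.out.println(hasChanges("Original=Anna=Chris!=Ben")); // prints true
        // Value on the <p> element after the simple modification is applied.
        System.out.println(hasChanges("Original=Anna=Chris=Ben"));  // prints false
    }
}
```

In practice the value passed in would be read from the deltaxml:deltaV2 attribute of the result's root element.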
Similar to the deltaxml:textGroup used in DELTAV2 merge output, deltaxml:versionGroup elements are used to represent the changes in SIMPLIFIED_DELTAV2. Rule processing can also be applied to the simplified output to produce SIMPLIFIED_RULE_PROCESSED_DELTAV2 output. The following is an example of a deltaxml:versionGroup:
<p deltaxml:deltaV2="ancestor=edit1!=edit2">The
<deltaxml:versionGroup>
<deltaxml:versionContent deltaxml:deltaV2="base=edit1">quick</deltaxml:versionContent>
<deltaxml:versionContent deltaxml:deltaV2="edit2">fast</deltaxml:versionContent>
</deltaxml:versionGroup> brown fox jumps over the lazy dog.
</p>
The default action of the rule processing system will remove the deltaxml:versionGroup describing the change and produce the same rule processed result as above.
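The default handling of a simple modification described in this section can be modelled roughly with the JDK's DOM API. The sketch below is not the merge engine's implementation. The class name, the `resolve` method and the way the changed version is passed in are all hypothetical, and the namespace URI bound to the deltaxml prefix is an assumption to be checked against the deltaV2 format documentation. It handles only the deltaxml:textGroup form and updates only the immediate parent's deltaV2 attribute, whereas the real processing corrects attributes throughout the tree.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not a DeltaXML API class.
final class SimpleModifySketch {
    // Assumed namespace URI for the deltaxml prefix.
    static final String DELTA_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1";

    // The textGroup example from this page, with the namespace declared.
    static final String EXAMPLE =
        "<p xmlns:deltaxml='" + DELTA_NS + "' deltaxml:deltaV2='Original=Anna=Chris!=Ben'>The "
        + "<deltaxml:textGroup deltaxml:deltaV2='Original=Anna=Chris!=Ben' deltaxml:edit-type='modify'>"
        + "<deltaxml:text deltaxml:deltaV2='Original=Anna=Chris'>quick</deltaxml:text>"
        + "<deltaxml:text deltaxml:deltaV2='Ben'>fast</deltaxml:text>"
        + "</deltaxml:textGroup> brown fox jumps over the lazy dog.</p>";

    /** Replaces each textGroup with the text contributed by changedVersion. */
    static Element resolve(String xml, String changedVersion) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList groups = doc.getElementsByTagNameNS(DELTA_NS, "textGroup");
            // The NodeList is live, so walk it backwards while removing groups.
            for (int i = groups.getLength() - 1; i >= 0; i--) {
                Element group = (Element) groups.item(i);
                NodeList texts = group.getElementsByTagNameNS(DELTA_NS, "text");
                for (int j = 0; j < texts.getLength(); j++) {
                    Element text = (Element) texts.item(j);
                    if (changedVersion.equals(text.getAttributeNS(DELTA_NS, "deltaV2"))) {
                        Element parent = (Element) group.getParentNode();
                        parent.replaceChild(doc.createTextNode(text.getTextContent()), group);
                        // Crude stand-in for delta attribute correction.
                        String delta = parent.getAttributeNS(DELTA_NS, "deltaV2");
                        parent.setAttributeNS(DELTA_NS, "deltaxml:deltaV2", delta.replace("!=", "="));
                        parent.normalize(); // merge the adjacent text nodes
                        break;
                    }
                }
            }
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process delta XML", e);
        }
    }

    public static void main(String[] args) {
        Element p = resolve(EXAMPLE, "Ben");
        System.out.println(p.getTextContent()); // prints: The fast brown fox jumps over the lazy dog.
        System.out.println(p.getAttributeNS(DELTA_NS, "deltaV2")); // prints: Original=Anna=Chris=Ben
    }
}
```

A deltaxml:versionGroup could be handled analogously by looking for deltaxml:versionContent children instead of deltaxml:text.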
Rule configuration
There are currently seven parameter settings available. Six of them control which changes are displayed, and therefore not resolved in any way. The seventh, the version priority list, is used for conflict resolution based on version identifier priorities. If none of these settings are used, their default values ensure that simple adds, simple deletes and simple (leaf) modifications are resolved when rule-based processing is applied.
Rule configuration is achieved using the RuleConfiguration object in the API. This has several methods that describe the configuration, as discussed below. The ConcurrentMerge object has a setRuleConfiguration method that changes the current configuration. This is applied during rule processing in the extractAll methods when the ConcurrentMergeResultType has been set to RULE_PROCESSED_DELTAV2.
DisplaySimpleAdds
The DisplaySimpleAdds parameter controls whether simple adds are displayed or automatically resolved. Its default value is false, which means that they are automatically resolved. The following code can be used to display all simple adds.
ConcurrentMerge cm= new ConcurrentMerge();
cm.setResultType(MergeResultType.RULE_PROCESSED_DELTAV2);
RuleConfiguration rc= new RuleConfiguration();
rc.setDisplaySimpleAdds(true);
cm.setRuleConfiguration(rc);
DisplaySimpleDeletes
The DisplaySimpleDeletes parameter controls whether simple deletes are displayed or automatically resolved. Its default value is false, which means that they are automatically resolved. The following code can be used to display all simple deletes.
ConcurrentMerge cm= new ConcurrentMerge();
cm.setResultType(MergeResultType.RULE_PROCESSED_DELTAV2);
RuleConfiguration rc= new RuleConfiguration();
rc.setDisplaySimpleDeletes(true);
cm.setRuleConfiguration(rc);
DisplaySimpleModify
The DisplaySimpleModify parameter controls whether simple modifications are displayed or automatically resolved. Its default value is false, which means that they are automatically resolved. The following code can be used to display all simple modifications.
ConcurrentMerge cm= new ConcurrentMerge();
cm.setResultType(MergeResultType.RULE_PROCESSED_DELTAV2);
RuleConfiguration rc= new RuleConfiguration();
rc.setDisplaySimpleModify(true);
cm.setRuleConfiguration(rc);
DisplayChangesInvolving
It is possible to specify that changes involving any of a given list of versions are never resolved (i.e. they are always displayed). This is achieved by setting the DisplayChangesInvolving parameter to the version identifiers whose changes are to be displayed; in the Java API these are passed as a set of strings. The following code can be used to see all the changes involving the 'Anna' and 'Chris' versions.
ConcurrentMerge cm= new ConcurrentMerge();
cm.setResultType(MergeResultType.RULE_PROCESSED_DELTAV2);
RuleConfiguration rc= new RuleConfiguration();
rc.setDisplayChangesInvolving(new HashSet<>(Arrays.asList("Anna", "Chris")));
cm.setRuleConfiguration(rc);
DisplayChangesTo
This rule configuration parameter uses XPaths to select specific elements or groups of elements whose simple changes are displayed rather than automatically resolved.
Any XPaths used are applied in the context of the entire 'output' document, after generation of the analysed deltaV2. It is possible to provide control over elements anywhere in the file. For example, the following XPath applies to any title element:
//title
An XPath can also be used to address a specific individual element using its 'id' attribute:
/topic/bodydiv/p[@id='summary']
The output XML tree is, in general, likely to have a different structure to that of at least one of its inputs. For example, an input could add (or delete) a paragraph. Therefore using the 'sibling' index number to identify a position in the tree should only be done when the user is confident that these indexes will not change in the output.
Using the XPath sequence concatenation operator (comma), it is possible to supply multiple XPaths using the setDisplayChangesTo method in the API, for example:
ConcurrentMerge cm= new ConcurrentMerge();
cm.setResultType(MergeResultType.RULE_PROCESSED_DELTAV2);
RuleConfiguration rc= new RuleConfiguration();
rc.setDisplayChangesTo("//title, /topic/bodydiv/p[1], //p[starts-with(@id, 'sum')]");
cm.setRuleConfiguration(rc);
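The caution about sibling index numbers can be checked with a small experiment using the JDK's built-in XPath engine. This is only a sketch: the class name, the `select` helper and the two sample documents are made up for illustration and are not part of the product. The JDK engine implements XPath 1.0, which does not accept the comma operator, so each path is evaluated separately here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not a DeltaXML API class.
final class XPathSelectionSketch {
    // Made-up documents: AFTER has one paragraph added before the summary.
    static final String BEFORE =
        "<topic><title>Guide</title><bodydiv><p id='summary'>Summary</p></bodydiv></topic>";
    static final String AFTER =
        "<topic><title>Guide</title><bodydiv><p id='note'>Note</p>"
        + "<p id='summary'>Summary</p></bodydiv></topic>";

    /** Returns the text content of every node the XPath selects. */
    static List<String> select(String xml, String xpath) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or XML: " + xpath, e);
        }
    }

    public static void main(String[] args) {
        // The positional path selects a different paragraph once one is added.
        System.out.println(select(BEFORE, "/topic/bodydiv/p[1]")); // prints [Summary]
        System.out.println(select(AFTER, "/topic/bodydiv/p[1]"));  // prints [Note]
        // The id-based paths keep selecting the same paragraph.
        System.out.println(select(AFTER, "/topic/bodydiv/p[@id='summary']"));  // prints [Summary]
        System.out.println(select(AFTER, "//p[starts-with(@id, 'sum')]"));     // prints [Summary]
    }
}
```

Trying candidate XPaths against a representative result document in this way is a cheap check before passing them to setDisplayChangesTo.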
Selecting changes to text and attributes is a little more involved, as it requires some knowledge of the deltaV2 format. Specifically, changed text and changed attributes are contained in deltaxml:textGroup and deltaxml:attributes elements respectively. The following example illustrates how to display changes to text containing the word 'important' and to all 'id' attributes.
ConcurrentMerge cm= new ConcurrentMerge();
cm.setResultType(MergeResultType.RULE_PROCESSED_DELTAV2);
RuleConfiguration rc= new RuleConfiguration();
rc.setDisplayChangesTo(
"//deltaxml:textGroup[contains(string(.), 'important')]," +
"//deltaxml:attributes/dxa:id");
cm.setRuleConfiguration(rc);
For further information on deltaxml:textGroup and deltaxml:attributes please see the deltaV2 format documentation.
DisplayFormatChangesIn
This parameter uses XPaths to select specific elements or groups of elements inside which simple formatting changes are displayed rather than automatically resolved.
Any XPaths used are applied in the context of the entire 'output' document, after generation of the analysed deltaV2. It is possible to provide control over elements anywhere in the file. For example, the following XPath applies to any title element:
//title
An XPath can also be used to address a specific individual element using its 'id' attribute:
/topic/bodydiv/p[@id='summary']
The output XML tree is, in general, likely to have a different structure to that of at least one of its inputs. For example, an input could add (or delete) a paragraph. Therefore using the 'sibling' index number to identify a position in the tree should only be done when the user is confident that these indexes will not change in the output.
Using the XPath sequence concatenation operator (i.e. comma), it is possible to supply multiple XPaths using the setDisplayFormatChangesIn method in the API, for example:
ConcurrentMerge cm= new ConcurrentMerge();
cm.setResultType(MergeResultType.RULE_PROCESSED_DELTAV2);
RuleConfiguration rc= new RuleConfiguration();
rc.setDisplayFormatChangesIn("//title, /topic/bodydiv/p[1], //p[starts-with(@id, 'sum')]");
cm.setRuleConfiguration(rc);
For more details about formatting elements and rule processing interaction, see Formatting Elements With Rule Processing Sample.
Version Priority List
This rule configuration setting specifies the priority list of version identifiers. Modifications and conflicts are automatically resolved based on the priority list and the versions involved in a change. This is achieved by supplying a comma separated list of version identifiers, with the first identifier in the list having the highest priority.
ConcurrentMerge cm= new ConcurrentMerge();
cm.setResultType(MergeResultType.RULE_PROCESSED_DELTAV2);
RuleConfiguration rc= new RuleConfiguration();
List<String> priorityList= Arrays.asList("chris","anna","ben");
rc.setVersionPriorityList(priorityList);
cm.setRuleConfiguration(rc);
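One plausible reading of priority-based resolution is that, among the versions involved in a change, the content of the one that appears earliest in the priority list is kept. The sketch below models only that reading; it is an assumption, not a description of the engine's actual behaviour. The class, the `pick` method and the map of content by version are hypothetical, and `Map.of` requires Java 9 or later.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; not a DeltaXML API class.
final class VersionPrioritySketch {
    private VersionPrioritySketch() {}

    /**
     * Picks the content of the highest-priority version involved in a change.
     * Returns an empty Optional when no involved version is in the list.
     */
    static Optional<String> pick(List<String> priorityList, Map<String, String> contentByVersion) {
        return priorityList.stream()
            .filter(contentByVersion::containsKey) // keep only versions involved in this change
            .map(contentByVersion::get)
            .findFirst();                          // earliest in the list wins
    }

    public static void main(String[] args) {
        List<String> priorityList = Arrays.asList("chris", "anna", "ben");
        // A change in which only anna and ben are involved.
        Map<String, String> change = Map.of("anna", "quick", "ben", "fast");
        System.out.println(pick(priorityList, change).orElse("(unresolved)")); // prints quick
    }
}
```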
Parameter precedence
The precedence order of the rule based resolver parameters is summarized as follows:
The DisplayChangesTo, DisplayChangesInvolving and DisplayFormatChangesIn settings override the other display parameters.
The DisplaySimpleAdds, DisplaySimpleDeletes and DisplaySimpleModify settings control whether resolution is applied to these categories of change for any elements that are not selected by the parameters in the previous item.
All other parameters (DisplayChangesTo, DisplayChangesInvolving, DisplayFormatChangesIn, DisplaySimpleAdds, DisplaySimpleDeletes, DisplaySimpleModify) will override the version priority list setting.
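The precedence order above can be expressed as a small decision function. This is a simplified model written for this page, not the engine's logic: the `Kind` enum, the `matchesTargetedRule` flag (standing for "selected by DisplayChangesTo, DisplayChangesInvolving or DisplayFormatChangesIn") and the method name are all hypothetical.

```java
// Hypothetical illustration only; not a DeltaXML API class.
final class PrecedenceSketch {
    enum Kind { SIMPLE_ADD, SIMPLE_DELETE, SIMPLE_MODIFY }

    private PrecedenceSketch() {}

    /**
     * Returns true when a simple change would be displayed rather than resolved.
     * A displayed change is left for the user; only changes that are not
     * displayed remain candidates for automatic resolution, including
     * resolution by the version priority list.
     */
    static boolean isDisplayed(Kind kind, boolean matchesTargetedRule,
                               boolean displaySimpleAdds,
                               boolean displaySimpleDeletes,
                               boolean displaySimpleModify) {
        // Targeted rules take precedence over the per-category settings.
        if (matchesTargetedRule) {
            return true;
        }
        switch (kind) {
            case SIMPLE_ADD:    return displaySimpleAdds;
            case SIMPLE_DELETE: return displaySimpleDeletes;
            default:            return displaySimpleModify;
        }
    }

    public static void main(String[] args) {
        // With default settings an untargeted simple add is resolved, not displayed.
        System.out.println(isDisplayed(Kind.SIMPLE_ADD, false, false, false, false)); // prints false
        // The same change is displayed when a targeted rule selects it.
        System.out.println(isDisplayed(Kind.SIMPLE_ADD, true, false, false, false));  // prints true
    }
}
```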