Mapfile Comparison

A Mapfile comparison compares two DITA map files. The result is a single DITA map file with markup describing the differences between the two input files. The available output formats are the same as those for a topic comparison, as described in Output Formats.

The alignment of structure in the input map files relies on keying based on the topicref element's keyref or href attribute. The keyref takes priority if both attributes exist. A topicref with a changed key is treated as an entirely different element; it will not be aligned with its original version.
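The keying rule can be sketched in a few lines of Python. This is a minimal illustration rather than the product's implementation: the function names and the sample maps are hypothetical, and the map hierarchy is ignored to keep the example short.

```python
import xml.etree.ElementTree as ET

def topicref_key(topicref):
    """Return the alignment key for a topicref: keyref first, then href."""
    keyref = topicref.get("keyref")
    if keyref is not None:
        return ("keyref", keyref)
    href = topicref.get("href")
    if href is not None:
        return ("href", href)
    return None  # no usable key on this element

def aligned_keys(map_a_xml, map_b_xml):
    """Return the keys present in both maps (hierarchy ignored for brevity)."""
    keys_a = {topicref_key(t) for t in ET.fromstring(map_a_xml).iter("topicref")}
    keys_b = {topicref_key(t) for t in ET.fromstring(map_b_xml).iter("topicref")}
    return sorted((keys_a & keys_b) - {None})

# Hypothetical inputs: the first topicref keeps its keyref but changes href,
# the second topicref changes its href (and therefore its key).
MAP_A = """<map>
  <topicref keyref="intro" href="intro.dita"/>
  <topicref href="notices.dita"/>
</map>"""
MAP_B = """<map>
  <topicref keyref="intro" href="introduction.dita"/>
  <topicref href="legal-notices.dita"/>
</map>"""

print(aligned_keys(MAP_A, MAP_B))
```

In this sketch the first pair aligns because the keyref is unchanged, even though the href differs. The second pair does not align, because the href is the only key and it has changed.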

Map Topicset Comparison

A Map TopicSet comparison takes two map files and compares the topics that they link to.

For a Map TopicSet comparison there are four basic cases to consider: topic inserted, topic deleted, topic changed, and topic unchanged. In addition to these cases, other complicating factors can affect how the results at the map level should be presented, such as whether a topic has:

  1. moved location e.g. position in the map or on the file system,

  2. undergone an unrepresentable change, e.g. an attribute change, which cannot be represented in most output formats, or

  3. been refactored e.g. split or merged.

The structure of the Map TopicSet result is referred to as the Map Result Structure. To suit different presentation needs the following Map Result Structure types are provided as options:

Topic Set

The result of the topic comparison is a map that contains a non-hierarchical (i.e. flat) set of topic references, which are marked up to indicate whether their referent topics (i.e. the topics that they point at) have been inserted, deleted, changed, or unchanged.

The default behaviour is for the result map's topic references to be in the order of their occurrence in the second input (or 'B' document map). Note, those topics that appear only in the first input (or 'A' document map) are positioned close to another neighbouring 'A' document topic that is also in the 'B' document.
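A rough sketch of how such a flat result could be assembled is given below. It assumes each topic is identified by a unique href, that the set of changed topics is already known, and that an 'A'-only topic is placed directly after the nearest preceding 'A' topic that also occurs in 'B'. The exact placement rule used by the product is not specified here, so that choice, along with the function name and sample data, is an assumption.

```python
def topic_set(a_topics, b_topics, changed=frozenset()):
    """Return a flat list of (href, status) pairs in 'B' document order."""
    in_a, in_b = set(a_topics), set(b_topics)

    # Walk 'A' and remember, for each deleted topic, the last shared topic
    # seen before it (None means it precedes every shared topic).
    deleted_after = {}
    anchor = None
    for href in a_topics:
        if href in in_b:
            anchor = href
        else:
            deleted_after.setdefault(anchor, []).append(href)

    result = [(href, "deleted") for href in deleted_after.get(None, [])]
    for href in b_topics:
        if href not in in_a:
            status = "inserted"
        elif href in changed:
            status = "changed"
        else:
            status = "unchanged"
        result.append((href, status))
        # Emit the 'A'-only topics that followed this topic in 'A'.
        result.extend((d, "deleted") for d in deleted_after.get(href, []))
    return result

# Hypothetical topic lists, in map order.
A_TOPICS = ["intro.dita", "notices.dita", "usage.dita"]
B_TOPICS = ["intro.dita", "usage.dita", "faq.dita"]

for entry in topic_set(A_TOPICS, B_TOPICS, changed={"usage.dita"}):
    print(entry)
```

With these inputs the deleted notices.dita reference lands immediately after intro.dita, its nearest shared neighbour in the 'A' document.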

Map Pair

The topic references within one of the existing maps (or a copy of it) are marked up to show how they have changed. The topic references that appear only in the other map (the 'remaining map') are output in a separate, non-hierarchical map. Note that this latter map is the one specified as the output of the comparison operation.

The default behaviour is for the second 'B' document map (and its submaps) to be updated. Here, the 'B' document's map hierarchy is updated to indicate which topic references have been inserted, changed, or left unchanged, and the specified output map contains a set of 'deleted' topic references (i.e. those that only appear in the 'A' document).
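The default Map Pair behaviour might be sketched as follows. The attribute name `status` and the value spellings used here are assumptions made for illustration, as are the function name and the sample maps; topics are matched by href only.

```python
import xml.etree.ElementTree as ET

def map_pair(map_a_xml, map_b_xml, changed=frozenset()):
    """Return (marked-up 'B' map, flat 'remaining' map of 'A'-only topicrefs)."""
    map_a = ET.fromstring(map_a_xml)
    map_b = ET.fromstring(map_b_xml)
    hrefs_a = [t.get("href") for t in map_a.iter("topicref")]
    hrefs_b = {t.get("href") for t in map_b.iter("topicref")}

    # Mark up 'B' in place, so that its hierarchy is preserved.
    for topicref in map_b.iter("topicref"):
        href = topicref.get("href")
        if href not in hrefs_a:
            topicref.set("status", "inserted")
        elif href in changed:
            topicref.set("status", "changed")
        else:
            topicref.set("status", "unchanged")

    # The remaining map is flat: one topicref per 'A'-only topic.
    remaining = ET.Element("map")
    for href in hrefs_a:
        if href not in hrefs_b:
            ET.SubElement(remaining, "topicref", href=href, status="deleted")
    return map_b, remaining

# Hypothetical inputs with one level of nesting.
MAP_A = '<map><topicref href="intro.dita"><topicref href="notices.dita"/></topicref></map>'
MAP_B = '<map><topicref href="intro.dita"><topicref href="faq.dita"/></topicref></map>'

marked_b, remaining = map_pair(MAP_A, MAP_B)
print(ET.tostring(marked_b, encoding="unicode"))
print(ET.tostring(remaining, encoding="unicode"))
```

Note that the nested structure of the 'B' map survives, while the deleted reference is reported separately in the flat remaining map.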

Unified Map

This result structure attempts to combine the benefits of both the Topic Set and Map Pair. That is, in the default case, the 'B' map structure is preserved, and the 'A' only topic references are inserted in the 'B' map as close as possible to the last topic reference to match in both maps.

The Unified Map result works best when the maps being compared have a very similar structure. Specialized DITA maps may also cause problems because the element names of deleted topic references may need renaming so as to be valid if they are inserted in the 'B' map at a different level to their occurrence in the 'A' map.

The Result Structure type is set using the 'map-result-structure' parameter. The default behaviour of basing the output on the structure of the second 'B' document can be changed by setting the 'map-result-origin' parameter.

Map comparisons can be performed either 'inplace' or on a 'copy' of their inputs. In the latter case:

  1. both inputs are copied to an output directory, as discussed in Section Copying Maps,

  2. an inplace comparison is performed on that copy, and

  3. result-alias.ditamap is created in the output directory, whose topicref(s) point to the result map(s).

When copying the inputs, the 'map-copy-scope' parameter configures what is copied.

The topic references within an output map are marked up to indicate whether the content that they refer to, i.e. the referent, has changed. Each topic reference is marked with a status attribute to indicate whether its referent (i.e. the topic it is pointing at) is added, deleted, changed, or unchanged. Note that the setting of this attribute is independent of whether the content of the topicref element has changed. Therefore, it is possible to have an unchanged topicref element containing new, deleted, or changed topicref elements.

As the structure of the map comparison result is non-trivial, it is useful to illustrate it with an example. Consider the two versions of a simple ReadMe document illustrated below; here the only difference in the map structures is the removal of the notices.dita topic from the 'B' version of the document.

These inputs can be compared 'in place' using both the 'topic set' and 'map pair' outputs, as illustrated by the next two illustrations, where the structure of the output is based on the 'B' document. Note, the backup of the 'B' document's map is being displayed in the case of the topic set, to highlight that it differs structurally from that of the resulting map.

These examples exclude the 'unified map' output structure which can be considered to be the same as that for 'topic set' (only the content of the 'B' map and any sub-map files is different).

Result of a topic set in place comparison:

Result of a map pair in place comparison:

It is possible to change the name of the map pair result's remaining document DITA map using the 'map-pair-remaining-map-name' parameter. Further, control over the backup mechanism is provided by the 'map-clean-temp' and 'map-backup-suffix' parameters.

The example inputs can also be compared using a 'copy' of the input, as illustrated in the next two illustrations, where the output is stored in the C:\OutDir directory. Note that the structure of the output directory is in part explained by the Section on Copying Maps. In addition to the copying of the inputs, a result-alias.ditamap map is generated, as previously discussed.

Result of a topic set comparison with an OutDir output directory:

Result of a map pair comparison with an OutDir output directory:

Copying Maps

What does it mean to copy a DITA Map? This section answers that question in the context of preparing inputs for comparison.

A DITA map provides a mechanism for specifying those resources that both belong to and are referenced by the map. Each of these resources is associated with a scope (e.g. local and external), which can be used to determine whether a resource should be copied along with the map. The 'map-copy-scope' parameter can be used to configure what is copied.

Identifying the resources to copy

DITA provides two basic means for referencing another resource:

A hypertext reference (href) These references contain a relative or absolute URI to a resource.

A key reference These references are defined indirectly by a string, known as a key, which is bound to a hypertext reference within its definition.

Note that DITA keys can only be defined within a map, and that it is the first definition that is used (see the DITA 1.2 specification's Overview of Keys section for details on the search order).
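The 'first definition wins' rule can be illustrated for a single map document with the short sketch below. It only considers document order within one map; the full search order across submaps is set out in the specification and is not modelled here. The function name and the sample map are hypothetical.

```python
import xml.etree.ElementTree as ET

def resolve_keys(map_xml):
    """Bind each key to the href of its first definition in document order."""
    bindings = {}
    for element in ET.fromstring(map_xml).iter():
        keys = element.get("keys")
        if keys is None:
            continue
        # The keys attribute may hold several space-separated key names.
        for key in keys.split():
            # setdefault keeps an existing binding, so later definitions lose.
            bindings.setdefault(key, element.get("href"))
    return bindings

# Hypothetical map: "logo" is defined twice, "brand" only once.
KEY_MAP = """<map>
  <keydef keys="logo" href="images/logo-v2.png"/>
  <keydef keys="logo brand" href="images/logo-v1.png"/>
</map>"""

print(resolve_keys(KEY_MAP))
```

Here the second definition of "logo" is ignored, while "brand" is still bound by the same element because it has no earlier definition.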

DITA Maps are designed to contain submaps and topics; therefore, the resources that are referenced by a submap or topic are considered to be indirectly referenced by their parent map. However, with the introduction of keys within DITA 1.2, some DITA practitioners are encouraging the use of 'key' only references within topics. In this case, there is no need to scan the topics for potential resource use, as these will be declared at the map level. The 'map-scan-topics-for-references' parameter can be used to prevent DITA topics from being scanned for resources.

Our DITA map model currently considers all resources that are referenced within the map, its submaps, and optionally its topics to be part of the document. In particular, references within key definitions that are not bound are considered to be part of the document. When 'key' aware processing is introduced there will be an option to remove unbound key definitions from the resources associated with a DITA map.

DITA documents are designed to enable references to non-DITA resources, such as images and web-sites. Images are often given a local scope as they are considered to be part of the document. In such cases, the images (and other locally scoped resources) will be copied along with the map. Resources that have other scopes may be copied depending on the setting of the 'map-copy-scope' parameter.
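As a rough illustration of scope-based selection, the sketch below collects the hrefs in a map whose scope is in a chosen set. The `scopes` argument is a hypothetical stand-in for whatever the 'map-copy-scope' parameter accepts, and the sample map is invented. The sketch treats a missing scope as local and lets a scope set on a parent element apply to its children.

```python
import xml.etree.ElementTree as ET

def resources_to_copy(map_xml, scopes=frozenset({"local"})):
    """Return the hrefs in the map whose effective scope is in `scopes`."""
    found = []

    def walk(element, inherited_scope):
        # An explicit scope overrides the one inherited from the parent.
        scope = element.get("scope", inherited_scope)
        href = element.get("href")
        if href is not None and scope in scopes:
            found.append(href)
        for child in element:
            walk(child, scope)

    walk(ET.fromstring(map_xml), "local")
    return found

# Hypothetical map with two local resources and one external link.
SCOPED_MAP = """<map>
  <topicref href="Topics/intro.dita"/>
  <topicref href="images/logo.png" format="png"/>
  <topicref href="https://example.com/support" scope="external" format="html"/>
</map>"""

print(resources_to_copy(SCOPED_MAP))
print(resources_to_copy(SCOPED_MAP, scopes={"local", "external"}))
```

The first call selects only the locally scoped topic and image; widening the set of scopes also selects the external link.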

It is assumed that non-DITA content is either self-contained (i.e. it does not refer to other resources) or that the references are 'global' (i.e. they can be obtained from a globally unique, and available, address). This may not be true in general, and other formats may be provided with resource scanning capabilities when requested.

Copied structure

When copying a resource we aim to maintain its relative position to the map when this is feasible. For example, if a map contains relative references to resources that are being copied, then these references remain unchanged in the copied output. However, if these relatively referenced resources are not being copied, then the references are updated to reflect their current positions. Alternatively, if a map contains resources that are being copied and cannot be relatively addressed (e.g. they are on a different host computer), then these resources are copied to new locations, and their associated references are updated appropriately.

Those resources that are identified as belonging to the map, and thus are copied, are the same resources that are identified as potential resources for comparison.

In general, the output of a copied map is represented by a forest (e.g. a directory) that has a tree (e.g. another directory) for each 'host system', where the relative locations between the resources on each host system are maintained in the copied output.
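The grouping of resources into trees can be sketched as follows, assuming that a Windows drive letter (or UNC share) stands in for a 'host system'. The function name and the sample paths are hypothetical, and the naming of the tree directories in the real output is not modelled.

```python
import ntpath
from collections import defaultdict

def group_into_trees(paths):
    """Group Windows-style paths by drive; each group would become one tree."""
    trees = defaultdict(list)
    for path in paths:
        drive, rest = ntpath.splitdrive(path)
        # Drive letters are case-insensitive, so normalise before grouping.
        trees[drive.upper()].append(rest)
    return dict(trees)

# Hypothetical resources spread over two drives.
RESOURCES = [
    "C:\\Users\\Ian\\Products\\Docs\\ReadMe.ditamap",
    "C:\\Users\\Ian\\Products\\Docs\\Topics\\intro.dita",
    "D:\\Common\\Docs\\Notices\\notices.ditamap",
]

for drive, members in group_into_trees(RESOURCES).items():
    print(drive, members)
```

Within each group the paths keep their positions relative to one another, which is what allows relative references inside a tree to remain unchanged.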

Before we explore the general case, it is worth briefly presenting a simple case, where the resources referenced within the document defined by a DITA map - C:\Common\Docs\Notices\notices.ditamap - are either alongside or beneath the location of the DITA map. In this case, copying the map to a target location is straightforward, as illustrated below, where the map is being copied in preparation for use as the first, 'A', input in a comparison.

The result might look a little odd, as the copying has introduced an apparently unnecessary directory _a-0-file- and file _a-copy-alias.ditamap, which is a map that references the copied notices.ditamap. The reason for this is that in the general case the copied map might not be in the top-level directory of the copied output directory, and it is useful to have a reliable location for specifying where the copied map is. Consider the following example, where:

the map ReadMe.ditamap in C:\Users\Ian\Products\Docs directory is:

<map>
   <title>Product ReadMe</title>
   <topicref href="Topics/intro.dita" />
   <topicref href="file:/C:/Common/Docs/Notices/notices.ditamap" />
</map>

the map notices.ditamap in C:\Common\Docs\Notices directory is:

<map>
   <title>Notices</title>
   <topicref href="notice1.dita" />
   <topicref href="notice2.dita" />
</map>

As it is possible to relatively reference the notices.ditamap from the ReadMe.ditamap, the copying of the ReadMe.ditamap results in a forest with a single tree, as illustrated below:

In this case, the copied tree retains the relative paths from ReadMe.ditamap to notices.ditamap in the copied output. One consequence of this is that the ReadMe.ditamap has ended up four levels of directory structure lower than the top of its copied tree. This is not a problem, as the copy alias map (_a-copy-alias.ditamap) points to the actual location of the copied map. For clarity the three result maps are:

the map _a-copy-alias.ditamap in C:\Copy directory is:

<map>
   <topicref href="_a-0-file-/Users/Ian/Products/Docs/ReadMe.ditamap" />
</map>

the map ReadMe.ditamap in C:\Copy\_a-0-file-\Users\Ian\Products\Docs directory is:

<map>
   <title>Product ReadMe</title>
   <topicref href="Topics/intro.dita" />
   <topicref href="../../../../Common/Docs/Notices/notices.ditamap" />
</map>

the map notices.ditamap in C:\Copy\_a-0-file-\Common\Docs\Notices directory is:

<map>
   <title>Notices</title>
   <topicref href="notice1.dita" />
   <topicref href="notice2.dita" />
</map>
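The relative reference in the copied ReadMe.ditamap can be checked with a short calculation. The sketch below uses Python's standard ntpath module; the helper names are hypothetical, and the tree directory name is simply taken from the example above rather than derived from any documented naming rule.

```python
import ntpath

TREE = "_a-0-file-"  # tree directory name, as it appears in the example

def copied_path(source, copy_root="C:\\Copy"):
    """Place a resource under the single tree inside the copy directory."""
    _drive, rest = ntpath.splitdrive(source)
    return ntpath.join(copy_root, TREE, rest.lstrip("\\"))

def relative_href(from_map, to_resource):
    """Return a URI-style relative reference from one copied file to another."""
    rel = ntpath.relpath(to_resource, ntpath.dirname(from_map))
    return rel.replace("\\", "/")

readme_copy = copied_path("C:\\Users\\Ian\\Products\\Docs\\ReadMe.ditamap")
notices_copy = copied_path("C:\\Common\\Docs\\Notices\\notices.ditamap")

print(readme_copy)
print(relative_href(readme_copy, notices_copy))
```

The four `..` steps in the computed reference correspond to the four directory levels between the copied ReadMe.ditamap and the top of its tree.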

So far the examples have only shown one tree in the copied result, the directory structure under _a-0-file-. It is straightforward to construct an example that requires two trees, by a small modification to our previous example: Let the notices.ditamap file located on the C:\ drive be moved to a similar position on the D:\ drive, and have the ReadMe.ditamap updated to reflect the new location. The copy of the updated ReadMe.ditamap to C:\Copy is illustrated below:

In this case the directory structures under the trees are much more compact, which is arguably a better result than that produced in the previous example, because it appears to mirror the conceptual breakdown of the document. In general, a user might want to state that certain directories should be treated as if they were on a separate non-relatively addressable storage area. Such functionality can be added in the future if there is sufficient demand.