Table Comparison in DITA Compare

DITA tables (which use the CALS table model) are handled slightly differently from the rest of the document because displaying change, particularly structural change, in tables is more difficult. For this reason, structural table changes are shown at various different levels of granularity. Our main aim in the table processing is to produce a result where the changes can be seen in as much detail as possible but with the result document still maintaining validity against the DITA specification and the CALS table specification. Producing an invalid result document can cause problems further down the publishing pipeline.

Simple Structural Change

When the column definitions for the two tables have not changed, it is possible to represent changes to column or row spanning at the effected row granularity. Rather than repeating the whole of the table in two tgroup elements, it is possible to repeat only individual rows, or in some cases a set of consecutive rows in the same format of original document rows (marked with status="deleted") followed by the latest document rows (marked with status="new"). The number of rows that are repeated depends on what type of structural change has occurred. If the change involves changes to column spanning within a single row that does not overlap other rows and is itself not overlapped, it is possible to repeat only that single row. If column spanning changes occur on a row that overlaps other rows or is itself overlapped, it is necessary to group together all of the rows affected by the row spanning and repeat them together. This is also the case for any changes involving changes to row spanning.

Complex Structural Change

Some structural changes are too complex to represent in a single result table section (the tgroup element) and so the result document contains a table with two table sections: the first contains the table from the original document with a status="deleted" attribute on it, the second contains the table from the latest document with a status="new" attribute on it. Although it is not possible to see individual changes to rows/cells etc that occurred between the document versions, it is possible to see the two table versions and, providing the inputs were both valid, be sure that the result document is valid.

This type of result is produced when a table contains changes to row or column spanning as well as changes to the column definitions (e.g. changed column names or added/deleted columns).

Orderless Tables

Sometimes the order of rows within a table is insignificant. For example, consider a simple product information table, where the first column of the table contains a unique product name, the second column its 'tag line', the third column its standard price, etc. The rows in this table can be reasonably ordered in a variety of ways, such as by 'name', or by 'price'. When two versions of a document are compared that use different row ordering mechanisms, a significant number of rows are likely to be added and deleted due to them moving position. If such differences are insignificant then an orderless row comparison would be useful.

Orderless row comparison support can be provided so long as there is no row spanning within the tables being compared. In such cases, the <?dxml-orderless-rows?> processing instruction can be added within the element that directly contains the rows that are to be processed in an orderless fashion. It is important to ensure that this processing instruction is added to the relevant table in both input documents.

The orderless comparison algorithm is greatly improved through the use of unique row keys. Adding a <?dxml-key id1?> processing instruction within the element that directly contains the row, sets that rows key to 'id1'. It is also possible to specify the row 'cell position' that is used for defining the default value for a row's key. For example, the <?dxml-orderless-rows cell-pos:2?> processing instruction specifies that the text content of the row's second cell (e.g. <entry> or <stentry> element) should be used as the row's key. Note that the row cell position takes no account of 'column' data (e.g. @colnum attribute), it just counts the number of cells.

Other Changes

Other kinds of simple structural change can be represented within a single table without needing to repeat any rows. For example, column deletion in a table that does not have any changes to column or row spanning can be represented by marking each of the deleted cells with the status="deleted" attribute.