Comparing Document Tables (CALS or HTML)

Introduction

DeltaXML's comparison and merge products have specialised processing for tables. To benefit from special processing, tables should conform to either the OASIS CALS or W3C HTML table specifications. Table processing can be disabled in all products if required.

In DeltaXML products, the main goal of table-processing is to keep the tables in the comparison output valid according to the relevant table specification. Keeping tables valid ensures comparison results can be rendered properly in the appropriate application.

The purpose of this document is to explain how a comparison result is affected by table processing in DeltaXML products.

Examples

Please see the Examples of Table Comparison Results page for specific examples of how different types of table-differences are represented.

Differences between CALS and HTML tables

The CALS and HTML table models have many similarities, with HTML tables being simpler but based on the CALS model (see W3C HTML-3 tables). However, the way in which parts of the model are expressed in XML is quite different, for example in cases where a cell must span more than a single row or column.

Different processing logic is used for the CALS table syntax than for the HTML table syntax. As a result, in specific cases, the representation of differences between two tables will not be the same for CALS tables as it is for HTML tables.

Detecting Table Types

The logic that determines whether an element represents a CALS table or HTML table is as follows:

CALS Tables

Every element with a local-name of 'tgroup' is processed, together with its descendants, as a CALS table. Element namespaces are not checked for table processing.

HTML Tables (also known as 'Simple' or 'Informal' CALS Tables)

'HTML' tables can be found in many XML grammars, such as DocBook, DITA and XHTML, so the local-name and namespace of the top-level element are not standard. An XML element, together with its descendants, is processed as an HTML table if it has:

  • a local-name of 'table' and child or grand-child elements with a local-name of 'tr'
  • a local-name of 'informaltable' and child or grand-child elements with a local-name of 'tr'
  • a local-name of 'simpletable'
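
The detection rules above can be sketched in a few lines of Python. This is only an illustration of the rules as stated on this page, not DeltaXML's implementation; the function names local_name, has_tr_child_or_grandchild and table_type are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix, since namespaces are not checked."""
    return tag.rsplit('}', 1)[-1]


def has_tr_child_or_grandchild(elem):
    """True if a child or grand-child element has the local-name 'tr'."""
    for child in elem:
        if local_name(child.tag) == 'tr':
            return True
        for grandchild in child:
            if local_name(grandchild.tag) == 'tr':
                return True
    return False


def table_type(elem):
    """Return 'CALS', 'HTML' or None, following the rules listed above."""
    name = local_name(elem.tag)
    if name == 'tgroup':
        return 'CALS'
    if name == 'simpletable':
        return 'HTML'
    if name in ('table', 'informaltable') and has_tr_child_or_grandchild(elem):
        return 'HTML'
    return None


# Illustrative input: an 'informaltable' whose rows sit inside a 'tbody'.
doc = ET.fromstring(
    '<informaltable><tbody><tr><td>x</td></tr></tbody></informaltable>'
)
print(table_type(doc))  # HTML
```

Note that, under these rules, a CALS 'table' wrapper that contains only a 'tgroup' is not itself classified; it is the 'tgroup' element that triggers CALS processing.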

Representing Conflict in Structure (Table Splitting)

Two corresponding input tables can have structures that conflict, for example, when a column is added in a position where it overlaps with an existing cell that spans more than one column (see Example 11). In such cases it is difficult (if not impossible) to represent their differences in the comparison result in a valid way. CALS table processing handles such conflicts by showing the whole document 'A' table followed by the whole document 'B' table. HTML table processing, however, allows the overlapping data, producing a rendered table that needs careful interpretation and could well be misleading.
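
To make the kind of conflict described above concrete, the following minimal sketch tests whether a newly inserted column would land inside an existing column span. It is an assumption-laden illustration, not the product's conflict-detection logic: the function name insertion_splits_span and the 1-based column numbering are hypothetical.

```python
def insertion_splits_span(span_start, span_end, new_col):
    """Return True if a column inserted at position new_col falls inside a span.

    span_start and span_end are the first and last columns (1-based,
    inclusive) covered by a spanning cell in the original table. new_col is
    the position the added column takes in the new table, so it sits between
    original columns new_col - 1 and new_col.
    """
    return span_start < new_col <= span_end


# A cell spans columns 2 to 3 in Document A.
print(insertion_splits_span(2, 3, 3))  # True: the new column splits the span
print(insertion_splits_span(2, 3, 2))  # False: inserted before the span
print(insertion_splits_span(2, 3, 4))  # False: inserted after the span
```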

Alignment of Cells using Column Names (CALS only)

The CALS table specification can exploit a 'colname' attribute to identify each cell as belonging to a specific column. This attribute, if present, is used by the comparator to align cells in each table. This can provide a better comparison result in cases where a column has been added or deleted or its position changed. The HTML table specification does not have an equivalent to the 'colname' attribute. Additional examples 15 and 16, only available on Bitbucket, demonstrate the advantages of using column names with CALS tables.

If Document B uses column names that are not consistent with those in Document A, the integrity of column structure in the comparison result will be compromised.
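
The benefit of 'colname' alignment can be illustrated with a small sketch that pairs the cells of two CALS rows by column name rather than by position. This is a simplified illustration under stated assumptions, not the comparator's algorithm: the helper names cells_by_colname and align_rows are hypothetical, and spans ('namest', 'nameend') are ignored.

```python
import xml.etree.ElementTree as ET


def cells_by_colname(row):
    """Map each 'colname' attribute value to the text of its entry."""
    return {
        entry.get('colname'): (entry.text or '')
        for entry in row
        if entry.get('colname') is not None
    }


def align_rows(row_a, row_b):
    """Pair cells by column name and label each column's status."""
    a, b = cells_by_colname(row_a), cells_by_colname(row_b)
    names = list(a) + [name for name in b if name not in a]
    result = []
    for name in names:
        if name not in b:
            status = 'deleted'
        elif name not in a:
            status = 'added'
        elif a[name] != b[name]:
            status = 'modified'
        else:
            status = 'unchanged'
        result.append((name, status))
    return result


row_a = ET.fromstring(
    '<row><entry colname="c1">a</entry><entry colname="c2">b</entry></row>'
)
row_b = ET.fromstring(
    '<row><entry colname="c1">a</entry><entry colname="c3">new</entry>'
    '<entry colname="c2">b</entry></row>'
)
print(align_rows(row_a, row_b))
```

Because the cells are matched by name, the entry for 'c2' is reported as unchanged even though its position moved. A purely positional comparison would instead pair 'b' with 'new'. The sketch also shows why inconsistent names between the two documents are harmful: a renamed column would appear as one deletion plus one addition.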

Granularity of the Comparison Result

Fine Grain

Simple changes to the table, such as changing the contents of an entry or adding a row or column, are generally represented as 'fine grain' changes. In this context, 'fine grain' means that changes are represented at the same level as the change itself. Some examples:

  • if a word is deleted (with 'word-by-word' enabled) then the text for that word is marked as deleted
  • if a row is deleted then the element for that row is marked as deleted
  • if a column is deleted then, in each table row, the cell element corresponding to the deleted column position is marked as deleted

Row-level

Some types of changes, such as table entries overlapping or spanning multiple rows and columns, are difficult to represent at fine granularity whilst ensuring validity. In these cases, the changes can often be represented at row-level granularity.

An example of row-level granularity: if a cell in a table row is changed to span 2 rows and the corresponding cell in the following row is removed to make room, then the two affected row elements in Document A are shown in the result, marked as deletions. These are followed by the two corresponding rows from Document B, marked as additions (see Example 6).

Table-level

In certain cases where there are structural differences in the compared tables, even whole-table granularity (see 'Table Splitting' above) must be used.

Differences in HTML syntax

HTML tables have elements that are inferred by an HTML processor if they are missing from the tag structure. For example, 'tbody' is optional, so 'tr' elements can be immediate children of the 'table' element. DeltaXML's table processing can still process HTML tables without optional elements like 'tbody'. The HTML table processor does, however, require the same convention to be used in the 'A' and the 'B' documents. For example, if a 'tbody' element is present in Document A but missing in Document B, the child 'tr' elements of 'tbody' in 'A' will not align with 'tr' elements in 'B' that are immediate children of the 'table' element.
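
A pre-comparison check for this kind of inconsistency could look like the sketch below. It is an assumption, not a DeltaXML feature: the helper names tr_parents and same_row_convention are hypothetical, and nested tables are not treated specially.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit('}', 1)[-1]


def tr_parents(table):
    """Return the local-names of all elements that directly contain 'tr'."""
    parents = set()
    for elem in table.iter():
        if any(local_name(child.tag) == 'tr' for child in elem):
            parents.add(local_name(elem.tag))
    return parents


def same_row_convention(table_a, table_b):
    """True if both tables place their 'tr' elements under the same parents."""
    return tr_parents(table_a) == tr_parents(table_b)


table_a = ET.fromstring('<table><tbody><tr><td>1</td></tr></tbody></table>')
table_b = ET.fromstring('<table><tr><td>1</td></tr></table>')
print(tr_parents(table_a))                    # {'tbody'}
print(tr_parents(table_b))                    # {'table'}
print(same_row_convention(table_a, table_b))  # False
```

When the check returns False, one option is to normalise both inputs to a single convention (for example, wrapping bare 'tr' elements in 'tbody') before running the comparison.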

Validation of CALS tables in the input XML

All products that perform CALS table processing have a built-in capability to validate the CALS tables found in the input XML documents for the comparison. Validation is performed using the CALS Table Schematron, which uses assertions to check the cell-structure integrity of the table. It is important to note that a valid table comparison result is only possible if the input XML is also valid.

If tables in the inputs are invalid, this can either be reported as a warning (that may be output in a number of ways) or result in the comparison being terminated with an error, depending on the comparison setting.

Summary

The following table shows how different types of table changes are represented in a comparison or merge result.

Type of Change | Example # | Applies to CALS tables | Applies to HTML tables | Fine grain changes | Groups of added/deleted rows | Table Splitting
Cell content change | 1
Row addition/deletion | 2
Column addition/deletion (no 'colname' available for alignment) | 3
Column addition/deletion ('colname' available for alignment) | -
Column span addition/deletion/modification | 4, 5, 13
Row span addition/deletion/modification | 6, 7
3-Way Column span addition/deletion/modification | 8
3-Way Row span addition/deletion/modification | 9
2-dimensional span addition/deletion/modification | 10
Row group addition/deletion/modification | 14
Invalid tables | 11, 12
