This page gives a detailed explanation of the settings used in a DCP file when processing HTML tables. These are summarised in the DCP Schema Guide.
The use of HTML tables is explained in the page Comparing Document Tables (CALS or HTML).
Note that this page is only applicable when using the Document Comparator.
HTML tables are validated by a Schematron file. There are several ways to configure how validation error messages appear. See below for details.
Each message includes the XPath of the element concerned. This allows the identification of the erroneous element in the input file within an XML editor. Alternatively you may search for the phrase HTML Table Validation Warning in the result document.
Validation is performed on HTML <table>, DocBook <informaltable> and DITA <simpletable> elements. If you have asked for validation messages to be reported as processing instructions or comments, each message appears in the element concerned in the result file.
If you have no HTML tables in the input documents, the comparison may be faster if you switch HTML table processing off.
By default the HTML Table Validation Warning appears in a processing instruction. In a large file you could search for the string HTML Table Validation Warning.
When using a value of message for this parameter and running from the command line, you will obtain output like this if there is an error:
If you set the warningReportMode to comments then an XML comment is inserted after a <tgroup> that has an error.
The XPath expression in a validation message can be copied and pasted into an XPath builder in an XML editor such as oXygen and, with the focus on the file with the problem, the element concerned will be highlighted.
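If no XML editor is to hand, the same lookup can be approximated in a script. The following is a minimal sketch using only the Python standard library. It assumes the reported XPath is a simple absolute path of element names with optional positional predicates, such as /html/body/table[2], and that no namespaces are involved. The function name locate and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def locate(root, absolute_xpath):
    """Return the element addressed by a simple absolute path, or None.

    ElementTree paths are evaluated relative to an element, so the
    leading step (which names the root element) is checked and removed.
    """
    steps = absolute_xpath.strip("/").split("/")
    root_name = steps[0].split("[")[0]
    if root_name != root.tag:
        raise ValueError(f"path starts at {root_name!r}, document root is {root.tag!r}")
    if len(steps) == 1:
        return root
    return root.find("./" + "/".join(steps[1:]))


# Hypothetical input document with two tables.
DOC = ET.fromstring(
    "<html><body><table id='a'/><p/><table id='b'/></body></html>"
)

element = locate(DOC, "/html/body/table[2]")
print(element.tag, element.attrib)
```

ElementTree supports only a subset of XPath, so paths that use axes, functions or namespace prefixes would need a full XPath processor instead.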
For both processing instructions and comments, the message appears in the tgroup element with the problem.
Note that both the comments and processingInstructions options include the wording HTML Table Validation Warning so that searching the result file for this phrase will show the errors.
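The search for this phrase can also be scripted. The sketch below uses only the Python standard library (Python 3.8 or later) and collects every comment or processing instruction in a result document that contains the phrase. The processing instruction name hypothetical-pi and the message wording in the sample are invented for illustration. The real result file may use different names and text.

```python
import xml.etree.ElementTree as ET

PHRASE = "HTML Table Validation Warning"

# Hypothetical result document: the PI name and message text are made up.
SAMPLE = """<body>
<table>
<?hypothetical-pi HTML Table Validation Warning: example message?>
<tr><td>a</td></tr>
</table>
<table>
<!-- HTML Table Validation Warning: another example -->
<tr><td>b</td></tr>
</table>
</body>"""


def find_validation_warnings(xml_text):
    """Return (kind, text) pairs for comments and PIs containing PHRASE."""
    # By default ElementTree discards comments and processing
    # instructions, so ask the tree builder to keep them (Python 3.8+).
    builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
    parser = ET.XMLParser(target=builder)
    root = ET.fromstring(xml_text, parser=parser)

    found = []
    for node in root.iter():
        if node.tag is ET.Comment:
            kind = "comment"
        elif node.tag is ET.ProcessingInstruction:
            kind = "processing-instruction"
        else:
            continue
        text = (node.text or "").strip()
        if PHRASE in text:
            found.append((kind, text))
    return found


for kind, text in find_validation_warnings(SAMPLE):
    print(kind, "->", text)
```

Comments and processing instructions that sit outside the root element are not part of the parsed tree, so this sketch only reports warnings placed inside the document element.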
This propagates the error up to the containing table.
This processes the table as if it were plain XML rather than a table. The warning is still given, defaulting to a processing instruction on the table.
This will fail with a message starting Detected invalid table(s), for example:
For certain errors nothing will be reported in relaxed mode. In strict mode you will see errors like:
- A caption element can occur in an HTML table only once
- A caption element must be inserted immediately after the table element
- A tr element should have one or more td or th elements inside
- A tfoot element should appear before any tbody element
Relaxed validation does not give any error for this table even though there are two captions and the <tfoot> element appears after the <tbody>.
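To make the four strict-mode checks concrete, the sketch below re-implements them in Python against a small table that has two captions and a <tfoot> after the <tbody>. This is an illustration only: it is not the Schematron file used by the product, the function name strict_table_problems is hypothetical, and nested tables and namespaces are ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical table with two captions and tfoot placed after tbody.
SAMPLE_TABLE = """<table>
<caption>First caption</caption>
<caption>Second caption</caption>
<tbody><tr><td>1</td></tr></tbody>
<tfoot><tr><td>total</td></tr></tfoot>
</table>"""


def strict_table_problems(table):
    """Return the strict-mode style messages that apply to one table."""
    problems = []
    tags = [child.tag for child in table]

    if tags.count("caption") > 1:
        problems.append("A caption element can occur in an HTML table only once")
    # Simplification: only checks that the first caption is the first child.
    if "caption" in tags and tags[0] != "caption":
        problems.append(
            "A caption element must be inserted immediately after the table element"
        )
    # Simplification: iter() also visits rows of any nested tables.
    for row in table.iter("tr"):
        if not any(cell.tag in ("td", "th") for cell in row):
            problems.append(
                "A tr element should have one or more td or th elements inside"
            )
    if (
        "tfoot" in tags
        and "tbody" in tags
        and tags.index("tfoot") > tags.index("tbody")
    ):
        problems.append("A tfoot element should appear before any tbody element")
    return problems


for message in strict_table_problems(ET.fromstring(SAMPLE_TABLE)):
    print(message)
```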
It is difficult to compare two tables where different conventions have been used. By default HTML tables will be normalized before comparison takes place to ensure the underlying structure of all tables is the same. If you do not wish this to take place you can set normalizeTables to false. This only applies to HTML tables, simple tables and informal tables because there are different ways of expressing the same structure.
Currently, the normalization feature is limited to the use of <colgroup>; it may be extended to cover more cases in future.
This setting is recommended when the inputs specify columns in different ways, e.g. if one uses just <colgroup> and another uses <col> without <colgroup>.
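The two column conventions can be brought to a common form in several ways. The sketch below shows one possibility: wrapping bare <col> elements in a single <colgroup>. It is an assumption made for illustration and does not describe how the product's normalization is implemented. The function name wrap_bare_cols is hypothetical.

```python
import xml.etree.ElementTree as ET


def wrap_bare_cols(table):
    """Wrap <col> elements that are direct children of table in one <colgroup>."""
    bare = [child for child in table if child.tag == "col"]
    if not bare:
        return table  # already uses <colgroup>, or declares no columns
    position = list(table).index(bare[0])
    group = ET.Element("colgroup")
    for col in bare:
        table.remove(col)
        group.append(col)
    table.insert(position, group)
    return table


# One input uses bare <col>, the other uses <colgroup>.
A = ET.fromstring("<table><col/><col/><tr><td>1</td><td>2</td></tr></table>")
B = ET.fromstring(
    "<table><colgroup><col/><col/></colgroup><tr><td>1</td><td>2</td></tr></table>"
)

print(ET.tostring(wrap_bare_cols(A), encoding="unicode"))
print(ET.tostring(wrap_bare_cols(B), encoding="unicode"))
```

After this step both hypothetical inputs have the same element structure, which is the kind of alignment that makes a structural comparison easier.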
The <tbody> element is treated as a special case for HTML table comparison. This is because the tbody element may not be present in a table, or there may be one or more instances of this element in a table. The tbody elements are flattened in the input so that differences in the existence or number of tbody elements can be hidden in the comparison result. How any tbody structure differences are shown is determined by standard FormattingElement settings.
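The idea of flattening can be pictured with a short Python sketch that lifts the rows of every <tbody> up to the table level, so that tables with zero, one or several <tbody> elements end up with the same row structure. This is a simplified illustration under the assumption that flattening means removing the wrapper and keeping its rows in order. The product's actual processing may differ, and the function name flatten_tbody is hypothetical.

```python
import xml.etree.ElementTree as ET


def flatten_tbody(table):
    """Replace each direct <tbody> child of table with its own children."""
    # Work from the last child backwards so earlier indices stay valid.
    for index, child in reversed(list(enumerate(table))):
        if child.tag == "tbody":
            table.remove(child)
            for offset, row in enumerate(list(child)):
                table.insert(index + offset, row)
    return table


# Hypothetical inputs: two tbody elements versus none.
WITH_TBODY = ET.fromstring(
    "<table><tbody><tr id='1'/></tbody><tbody><tr id='2'/><tr id='3'/></tbody></table>"
)
WITHOUT_TBODY = ET.fromstring("<table><tr id='1'/><tr id='2'/><tr id='3'/></table>")

for table in (WITH_TBODY, WITHOUT_TBODY):
    print([row.get("id") for row in flatten_tbody(table)])
```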