addContentTable: add content tables to HTML documents

1 Function

addContentTable is a Python tool which adds a hierarchical content table to HTML documents.

The content table is derived from the document's element structure. Usually the headers H2, H3 and H4 are used to generate content table entries at hierarchical levels 0, 1 and 2, respectively. Content table entries are links to the respective elements. The elements are numbered hierarchically, e.g. 1.2.1.
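The numbering scheme can be illustrated with a small sketch. It is not the tool's own code: it uses the modern standard library module html.parser instead of the XML package, and the names HeadingCollector and number_headings are hypothetical. How a skipped level is numbered (here with a 0 placeholder) is also an assumption.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for the tags listed in `sections`."""

    def __init__(self, sections=("h2", "h3", "h4")):
        super().__init__()
        # The position in this list is the hierarchical level (0, 1, 2, ...).
        self.sections = [s.lower() for s in sections]
        self.headings = []
        self._level = None  # level of the heading currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.sections:
            self._level = self.sections.index(tag)
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == self.sections[self._level]:
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def number_headings(headings):
    """Turn (level, text) pairs into (number, text) pairs such as ('1.2.1', ...)."""
    counters = []
    numbered = []
    for level, text in headings:
        # Assumption: a skipped level is padded with a 0 counter.
        while len(counters) <= level:
            counters.append(0)
        # Entering a heading resets all deeper counters.
        del counters[level + 1:]
        counters[level] += 1
        numbered.append((".".join(str(c) for c in counters), text))
    return numbered
```

Feeding a document to HeadingCollector and passing its headings attribute to number_headings yields the entries in document order, ready to be turned into links.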

The generated content table is usually placed in the first div class=toc element found in the document (its previous content is removed). If no such element is found, the content table is placed after the first element specified by place (see the -p option).
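The placement rule can be sketched as follows. This is an illustration only, written against the modern standard library module xml.dom.minidom rather than the xml-0.5 package, and it assumes well-formed (XHTML-like) input with lowercase tag names. The function name place_toc and its parameters are hypothetical, and comparing the class attribute case-insensitively is an assumption.

```python
from xml.dom import minidom


def place_toc(doc, toc_node, place="h1", ignore_existing=False):
    """Insert toc_node into doc; return 'replaced' or 'inserted'."""
    if not ignore_existing:
        for div in doc.getElementsByTagName("div"):
            # Assumption: the class name is matched case-insensitively.
            if div.getAttribute("class").lower() == "toc":
                # Reuse the existing container and drop its old content.
                while div.firstChild is not None:
                    div.removeChild(div.firstChild)
                div.appendChild(toc_node)
                return "replaced"
    anchors = doc.getElementsByTagName(place)
    if not anchors:
        raise ValueError("no <%s> element to place the content table after" % place)
    anchor = anchors[0]
    # insertBefore with a None reference node appends at the end.
    anchor.parentNode.insertBefore(toc_node, anchor.nextSibling)
    return "inserted"
```

Passing ignore_existing=True mimics the effect described for the -n option: an existing div is left alone and the table goes after the first place element.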

2 Installation

Extract the TGZ archive addContentTable.tgz. It contains the Python package dmutil and the script addContentTable. Install the Python package as a subdirectory of site-packages. Place the script addContentTable somewhere on your PATH.

3 Invocation

addContentTable [options] file...
adds content tables to the HTML files file... Currently, the following options are recognized:
-s sec,...,sec
    A comma-separated list of tag names used to generate the hierarchical content table.
    Defaults to H2,H3,H4.
-d destdir
    addContentTable puts the rewritten HTML files into destdir.
    If not specified, each source file is replaced by the rewritten file (after file is renamed to file~).
-p place
    Tells where to place a content table when there is not yet a content table or the -n option has also been specified. In these cases, the content table is placed after the first occurrence of the element place. Otherwise, the new content table replaces the old one.
    Defaults to H1.
-n
    Do not place the content table in the first div class=TOC element; place it after the first element place instead (see the -p option).
-v
    Be verbose.
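
As a rough illustration of how these options and their defaults fit together, the following sketch parses them with the standard getopt module. It is not taken from the addContentTable script; the function name parse_args and the keys of the settings dictionary are hypothetical.

```python
import getopt


def parse_args(argv):
    """Parse the documented options; return (settings, files)."""
    settings = {
        "sections": ["H2", "H3", "H4"],  # -s default
        "destdir": None,                 # -d; None means replace the source file
        "place": "H1",                   # -p default
        "skip_toc_div": False,           # -n
        "verbose": False,                # -v
    }
    # Options taking an argument are followed by a colon.
    opts, files = getopt.getopt(argv, "s:d:p:nv")
    for opt, value in opts:
        if opt == "-s":
            settings["sections"] = [s.strip() for s in value.split(",") if s.strip()]
        elif opt == "-d":
            settings["destdir"] = value
        elif opt == "-p":
            settings["place"] = value
        elif opt == "-n":
            settings["skip_toc_div"] = True
        elif opt == "-v":
            settings["verbose"] = True
    return settings, files
```

A script would typically call parse_args(sys.argv[1:]); an unknown option makes getopt raise getopt.GetoptError.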

4 Requirements

The tool requires a Python 1.5.x installation together with the Python XML package xml-0.5.

Python can be downloaded from the Python homepage, the XML package from the XML-SIG repository.

addContentTable requires two minor patches to the XML package.
The first patch fixes a problem with the methods firstChild and lastChild of xml.dom.core.Document objects. They fail to initialize the document owner of the children, which leads to Document Exceptions.
The second patch retains empty lines on output. This is essential e.g. for pre elements.
You can find these patches in the XML-SIG archive, attached to the announcement of addContentTable.

5 Usage Conditions

You can use this tool under an Open Source license at your own risk. Please see the copyright notice at the beginning of dmutil/ContentTable.py or addContentTable for details.

6 Known Bugs/Weaknesses

6.1 Beta State

This software is in a beta (experimental) state. Almost surely, there are hidden bugs.

6.2 Source Document Formatting Lost

addContentTable works by parsing the document into a DOM tree, modifying the tree and then generating a new HTML document. As a result, the source document's formatting is lost.

Consider always using the -d option so that your source files are retained.
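
The effect is easy to reproduce with any DOM round trip. The sketch below uses the modern standard library module xml.dom.minidom, not the xml-0.5 package the tool depends on, so the details of the output may differ from what addContentTable produces; the function name round_trip is hypothetical.

```python
from xml.dom import minidom


def round_trip(markup):
    """Parse well-formed markup into a DOM tree and serialize it again."""
    doc = minidom.parseString(markup)
    return doc.documentElement.toxml()


source = "<p  class='x' >Hello</p>"
result = round_trip(source)
# The content survives, but spacing inside the tag and the quote style
# of the attribute are normalized by the serializer.
print(source)
print(result)
```

The tree keeps elements, attributes and text, but not how the author wrote them, which is why the regenerated file differs from the source even where nothing was changed.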

6.3 Quite Slow

7 Download

Sources: addContentTable-0.2.tgz 4kB TGZ archive.

8 Version History

8.1 Version 0.2


Dieter Maurer
Last modified: Sun Jan 31 15:32:17 CET