XSL-Pattern

1 Introduction

XSL-Pattern is an implementation of the pattern subset of the XSL Working Draft of 16 December 1998.

Patterns are used extensively in the XSL transformation language and its control structures. They are also useful outside XSL, e.g. for querying, selecting or matching parts of HTML, SGML or XML documents.

XSL-Pattern is built on top of PyDOM, the DOM implementation of Python's XML Special Interest Group.

To construct its parser, XSL-Pattern uses bison and Scott Hassan's PyBison package.

2 Installation

There are two TGZ archives: xslpattern.tgz and xslpattern_s.tgz.

xslpattern.tgz is required to use the software. It contains two Python packages, dmutil and PyBison; unpack it in Python's site-packages directory. dmutil contains code I developed, while PyBison is a small (and slightly patched) part of Scott Hassan's PyBison distribution.

xslpattern_s.tgz contains additional sources. You need this archive only if you want to view or change the bison grammar for XSL patterns. To change the grammar, you need bison, PyBison, a C development system and the patch utility. PyBison must be patched with the patch file env.pat provided in the archive. This requirement is new in version 0.3: the patch allows parser-object-specific customization, which became necessary for parser objects that use different pattern factories.

3 Usage

The central module is dmutil.xsl.pattern. It contains the XSL pattern parser Parser, a parser object that parses an XSL pattern string into a pattern object. It also contains the exception classes Error and SyntaxError (derived from Error): SyntaxError is raised for syntax errors, Error for all other errors. The class IdDecl provides DOM document objects with the methods getIdMap and flushIdMap, which are needed to handle IdExpr in XSL patterns.
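Because SyntaxError derives from Error, a single handler for Error covers both kinds of failure. The sketch below mirrors that hierarchy with stand-in classes (Python 3; the real classes live in dmutil.xsl.pattern). The functions stub_parser and parse_checked are hypothetical and only illustrate the catching behaviour. It also shows that a class named SyntaxError defined this way is unrelated to Python's built-in SyntaxError, so it is safer to refer to the module's class qualified, e.g. pattern.SyntaxError.

```python
import builtins


class Error(Exception):
    """Stand-in for dmutil.xsl.pattern.Error (non-syntax errors)."""


class SyntaxError(Error):  # same name as the module's class; shadows the built-in here
    """Stand-in for dmutil.xsl.pattern.SyntaxError (derived from Error)."""


def stub_parser(patternstring):
    """Hypothetical parser: only detects empty patterns and unbalanced parentheses."""
    if not patternstring or patternstring.count("(") != patternstring.count(")"):
        raise SyntaxError("cannot parse %r" % patternstring)
    return patternstring


def parse_checked(patternstring):
    """Return the parsed pattern, or None if any pattern-module Error occurs."""
    try:
        return stub_parser(patternstring)
    except Error:  # also catches SyntaxError, which derives from Error
        return None


# The stand-in SyntaxError is not a subclass of Python's built-in one.
assert not issubclass(SyntaxError, builtins.SyntaxError)
```

For example, parse_checked('A/@HREF') returns the pattern string, while parse_checked('ancestor(H2') returns None.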

XSL pattern objects have three relevant methods: select, match and checkNonempty. Each takes a DOM node as its first parameter. select returns the list of nodes (in document order) selected by the pattern with node as the context. match returns true iff node is matched by the pattern. See the XSL specification for details. checkNonempty accepts an optional second argument, a value. It returns true iff the pattern selects at least one node from the context node (with value value, if value is specified).
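The relationship between the three methods can be illustrated with a toy pattern class. This is a sketch under several assumptions: it uses Python 3 and xml.dom.minidom instead of PyDOM, it only understands hypothetical child-element paths such as 'A/B', and ToyPathPattern, all_nodes and string_value are invented names. match is written directly from the definition (a node matches iff some context node selects it), which is simple but slow; it is not how XSL-Pattern implements matching.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def all_nodes(node):
    """Yield node and all its descendants in document order."""
    yield node
    for child in node.childNodes:
        yield from all_nodes(child)


def string_value(node):
    """Concatenated text content of node (text nodes only, for this sketch)."""
    if node.nodeType == Node.TEXT_NODE:
        return node.data
    return "".join(string_value(child) for child in node.childNodes)


class ToyPathPattern:
    """Hypothetical pattern 'N1/N2/...' made only of child-element steps."""

    def __init__(self, patternstring):
        self.patternstring = patternstring
        self.steps = patternstring.split("/")

    def select(self, node):
        current = [node]
        for name in self.steps:
            # Children of nodes at the same depth are disjoint subtrees,
            # so this concatenation stays in document order.
            current = [child for n in current for child in n.childNodes
                       if child.nodeType == Node.ELEMENT_NODE and child.tagName == name]
        return current

    def match(self, node):
        # Definition: node matches iff some context node selects it.
        # Every node of the document is tried as context (slow but simple).
        root = node.ownerDocument or node
        return any(node in self.select(context) for context in all_nodes(root))

    def checkNonempty(self, node, value=None):
        selected = self.select(node)
        if value is None:
            return len(selected) > 0
        return any(string_value(n) == value for n in selected)
```

With doc = parseString('<r><A><B>x</B><B>y</B></A><B>z</B></r>'), ToyPathPattern('A/B').select(doc.documentElement) yields the two B elements inside A, match is false for the last B, and checkNonempty(doc.documentElement, 'z') is false.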

The attribute patternstring of an XSL pattern object holds the pattern string from which the object was built.

from dmutil.xsl.pattern import Parser

domtree = ....
   # a PyDOM document tree
pattern = Parser('A/@HREF')
a_hrefs = pattern.select(domtree)
   # the list of HREF attributes of A elements, in document order
first = a_hrefs[0]
   # raises an IndexError if the list is empty
pattern.match(first)
   # returns true
h2_ancestor = Parser('ancestor(H2)').select(first)[0]
   # the nearest H2 element above first; raises an IndexError
   # if no such element exists

There are many more XSL pattern examples in the test file test_pattern.py (inside the dmutil/xsl package).

Since version 0.3, experts can use the parser infrastructure to create their own parser objects. The function makeParser takes a PatternFactory object as its argument and creates a parser. That parser parses XSL pattern strings and uses the factory to create customized XSL pattern objects. The factory determines these objects completely; they need not have the methods described above for standard XSL pattern objects. A factory must provide all methods (and attributes) of the standard pattern factory PatternFactory, with the same arguments, but it may use a completely different implementation.
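The division of labour between parser and factory can be sketched with a miniature parser for child paths such as 'A/B'. Everything here is hypothetical: make_toy_parser stands in for makeParser, and the factory methods child and path are invented for this sketch; they are not the real PatternFactory interface. The point is that the parser only recognizes syntax and every object it returns is built by the factory, so two factories with the same methods yield completely different results from the same parser.

```python
class TupleFactory:
    """Hypothetical factory: represents patterns as nested tuples."""

    def child(self, name):
        return ("child", name)

    def path(self, left, right):
        return ("path", left, right)


class UpperStringFactory:
    """A different factory with the same methods: builds a normalized string."""

    def child(self, name):
        return name.upper()

    def path(self, left, right):
        return left + "/" + right


def make_toy_parser(factory):
    """Return a parse function for 'N1/N2/...' that delegates construction to factory."""

    def parse(patternstring):
        names = [name.strip() for name in patternstring.split("/")]
        if not all(names):
            raise ValueError("empty step in %r" % patternstring)
        result = factory.child(names[0])
        for name in names[1:]:
            result = factory.path(result, factory.child(name))
        return result

    return parse
```

make_toy_parser(TupleFactory())('A/B') returns ('path', ('child', 'A'), ('child', 'B')), whereas make_toy_parser(UpperStringFactory())('a / b/c') returns 'A/B/C'.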

4 Requirements

XSL-Pattern requires a Python 1.5.x installation together with the Python XML package xml-0.5.x.

Python can be downloaded from the Python homepage, the XML package from the XML-SIG repository.

You may also need a small patch to xml.dom.core that fixes a type error in PyDOM's attribute handling. I posted this patch to the mailing list xml-sig@python.org; you can find it in the list archive, attached to the announcement of XSL-Pattern.

5 Usage Conditions

You can use XSL-Pattern under an Open Source license, at your own risk. For details, see the copyright notice at the beginning of dmutil/xsl/pattern.py.

6 Known Bugs/Weaknesses

6.1 XML/XSL incompatibilities

6.2 id references

XSL-Pattern has no way to determine which attributes are used as id attributes. To support id references, XSL-Pattern requires the document root to have a method getIdMap that returns a dictionary mapping id names to nodes.
The pattern module contains the class IdDecl. Its constructor has two optional parameters: a list of general id attributes and a dictionary mapping (some) element types to their id attribute. If an element type is a key in the dictionary, its value specifies the id attribute (if the value is empty or None, the element type has no id attribute); for all other element types, any of the attribute names in the list can serve as the id attribute. When applied to a document, an IdDecl instance provides it with the methods getIdMap and flushIdMap.
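The id rules above can be expressed as a small class that builds and caches the id map. This is a sketch, not IdDecl itself: it uses Python 3 and xml.dom.minidom, ToyIdDecl is a hypothetical name, its getIdMap takes the document as an argument instead of being attached to it, and the rule that the first element in document order wins when an id value occurs twice is an assumption of the sketch.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


class ToyIdDecl:
    """Sketch of the id rules described for IdDecl; not the real class."""

    def __init__(self, general_ids=(), per_type=None):
        self.general_ids = list(general_ids)   # id attributes for most element types
        self.per_type = dict(per_type or {})   # element type -> its id attribute
        self._document = None
        self._id_map = None

    def _id_attributes(self, element):
        if element.tagName in self.per_type:
            attr = self.per_type[element.tagName]
            return [attr] if attr else []      # '' or None: no id attribute
        return self.general_ids

    def getIdMap(self, document):
        """Return {id value: element}, building it once per document."""
        if self._document is not document:
            id_map = {}
            stack = [document.documentElement]
            while stack:                       # pre-order walk = document order
                element = stack.pop()
                for name in self._id_attributes(element):
                    if element.hasAttribute(name):
                        # Assumption: on duplicate ids, the first element wins.
                        id_map.setdefault(element.getAttribute(name), element)
                stack.extend(child for child in reversed(element.childNodes)
                             if child.nodeType == Node.ELEMENT_NODE)
            self._document, self._id_map = document, id_map
        return self._id_map

    def flushIdMap(self):
        """Forget the cached map, e.g. after the document was modified."""
        self._document = self._id_map = None
```

With ToyIdDecl(['id'], {'fig': 'name', 'note': None}), a fig element is identified by its name attribute (its id attribute is ignored), a note element has no id at all, and every other element uses id. Caching matters because the map is consulted for every id reference; flushIdMap is needed after the document changes.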

ATTENTION: Match patterns containing an id reference with a pattern argument are extremely inefficient; never use them! Such patterns can, however, be used as select patterns.
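One way to see why the pattern-argument case is so costly: with a literal argument, matching a node needs only a lookup per id token, but with a pattern argument the ids depend on the context, so a matcher must in principle evaluate the inner pattern from every node of the document. The sketch below (Python 3, xml.dom.minidom, hypothetical function names, inner pattern passed as a select function) assumes, as in XPath's later id() function, that the argument's string value is split at whitespace into id names. It illustrates the cost argument only; it is not XSL-Pattern's algorithm.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def all_nodes(node):
    """Yield node and all its descendants in document order."""
    yield node
    for child in node.childNodes:
        yield from all_nodes(child)


def text_of(node):
    """Concatenated text content of node (text nodes only, for this sketch)."""
    if node.nodeType == Node.TEXT_NODE:
        return node.data
    return "".join(text_of(child) for child in node.childNodes)


def match_id_literal(node, literal, id_map):
    # id('a b'): one dictionary lookup per token, independent of document size.
    return any(id_map.get(token) is node for token in literal.split())


def match_id_pattern(node, inner_select, id_map):
    # id(P): the tokens depend on the context, so every node of the document
    # is tried as context and P is evaluated there: O(N * cost(P)).
    root = node.ownerDocument or node
    for context in all_nodes(root):
        for source in inner_select(context):
            if any(id_map.get(token) is node for token in text_of(source).split()):
                return True
    return False
```

Used as a select pattern, id(P) needs only one evaluation of P from the given context, which is why the same pattern remains acceptable there.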

6.3 Incompatibilities between DOM and XSL

XSL does not support CDATA, EntityReference and Notation nodes, which can occur in DOM trees. This incompatibility is not yet handled correctly.
Fortunately, most (SAX-based) parsers turn CDATA sections and entity references into ordinary text nodes, at least unless explicitly told otherwise. This process, however, violates the XSL requirement that no two text nodes be adjacent. The function normalize can be applied to a document to merge adjacent text nodes.
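The merging step can be sketched as a recursive walk over the children of a node. This is a hypothetical helper written for Python 3 and xml.dom.minidom, not the normalize function shipped with XSL-Pattern; besides merging adjacent text nodes (as the standard DOM method Node.normalize does), it first turns CDATA sections into text nodes. EntityReference and Notation nodes are not handled.

```python
from xml.dom import Node
from xml.dom.minidom import Document


def flatten_and_merge(node):
    """Turn CDATA sections into text nodes and merge adjacent text nodes, in place."""
    doc = node if node.nodeType == Node.DOCUMENT_NODE else node.ownerDocument
    child = node.firstChild
    while child is not None:
        following = child.nextSibling          # saved before child may be removed
        if child.nodeType == Node.CDATA_SECTION_NODE:
            text = doc.createTextNode(child.data)
            node.replaceChild(text, child)
            child = text
        if child.nodeType == Node.TEXT_NODE:
            previous = child.previousSibling
            if previous is not None and previous.nodeType == Node.TEXT_NODE:
                previous.appendData(child.data)  # merge into the earlier text node
                node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            flatten_and_merge(child)
        child = following
    return node
```

For a p element whose children are the text 'a', a CDATA section '<b>' and the text 'c', the helper leaves a single text node 'a<b>c'.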

6.4 Efficiency

Patterns consisting only of SubNode expressions (optionally filtered) are fairly efficient, as are IdentityExpr, ParentExpr and Ancestors. IdExpr are expensive, especially when used as match patterns. IdExpr with a pattern (rather than a literal) argument, used as a match (rather than a select) pattern, are prohibitively expensive.
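A plausible reason why SubNode paths are cheap to match: a path 'N1/.../Nk' can be checked bottom-up by looking only at the node and its k-1 ancestors, whereas the definition of matching (some context selects the node) in general requires trying many contexts. The sketch compares both on child-only paths; it assumes Python 3 and xml.dom.minidom, and match_bottom_up, match_by_search and select are hypothetical names, not XSL-Pattern's code.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def is_element(node, name):
    return (node is not None and node.nodeType == Node.ELEMENT_NODE
            and node.tagName == name)


def match_bottom_up(node, steps):
    """Match 'N1/.../Nk' by checking node and its k-1 ancestors: O(k)."""
    for name in reversed(steps):
        if not is_element(node, name):
            return False
        node = node.parentNode
    return node is not None   # N1 needs some context node above it


def all_nodes(node):
    yield node
    for child in node.childNodes:
        yield from all_nodes(child)


def select(context, steps):
    current = [context]
    for name in steps:
        current = [c for n in current for c in n.childNodes if is_element(c, name)]
    return current


def match_by_search(node, steps):
    """Definition-based match: try every document node as context, O(N * select)."""
    root = node.ownerDocument or node
    return any(node in select(ctx, steps) for ctx in all_nodes(root))
```

Both functions agree on child-only paths, but only the first one stays independent of the document size.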

IdentityExpr and absolute patterns could be optimized further.

7 Download

XSL-Pattern -- required to use XSL-Pattern
xslpattern-0.3.tgz 14 kB TGZ archive
Additional Sources -- only required to modify the XSL pattern grammar
xslpattern_s-0.3.tgz 2 kB TGZ archive

8 Version History

8.1 Version 0.3

8.2 Version 0.2

8.3 Version 0.1

The initial implementation with the following bugs/weaknesses:
Dieter Maurer
Last modified: Sat Apr 10 18:25:13 1999