Patterns are used extensively in the XSL transformation language and its control structures. They can also be used outside XSL, for example to query, select, or match parts of HTML, SGML, or XML documents.
XSL-Pattern is built on top of PyDOM, the DOM implementation of Python's XML special interest group.
For parser construction, XSL-Pattern uses
and Scott Hassan's
xslpattern.tgz is necessary to use the software.
The archive contains two python packages
Unpack it in Python's
dmutil contains code
developed by me,
PyBison is a small (and slightly patched) part of
xslpattern_s.tgz contains additional sources.
You need this archive only if you want to change (or view)
the bison grammar for XSL patterns.
To change the grammar, you will need
a C development system and the
PyBison must be patched with the patch file
provided in the archive (This requirement arises with version 0.3.
The patch allows customization specific to each parser object.
This became necessary for parser objects using different
dmutil.xsl.pattern. It contains an XSL pattern parser
Parser, a parser object which parses an XSL pattern string into a pattern object. It also contains the exception classes
SyntaxError is raised for syntax errors,
Error for other errors. The class
IdDecl is used to provide DOM document objects with methods
flushIdMap in order to handle
IdExpr in XSL patterns.
XSL pattern objects have three (relevant) methods:
All methods get a DOM node as first parameter.
select returns the list of nodes (in document order)
selected by the pattern with node as context.
true iff node is matched
by the pattern. See the XSL specification for details.
checkNonemty accepts as optional second argument
true iff from context node
at least one node
is selected with value value (if specified).
The XSL pattern object attribute
contains the pattern string from which the object has been
    from dmutil.xsl.pattern import Parser

    domtree = ....
    pattern = Parser('A/@HREF')
    a_hrefs = pattern.select(domtree)
    # the list of HREF attributes of A elements, in document order
    first = a_hrefs[0]    # may raise an IndexError
    pattern.match(first)  # returns true
    h2_ancestor = Parser('ancestor(H2)').select(first)[0]
    # the nearest H2 element above first or an IndexError exception,
    # if no such element exists
There are many more XSL pattern examples in the
test_pattern.py (inside the
Since version 0.3, experts can use the parser infrastructure
to create their own parser objects. The function
PatternFactory object as argument and creates a
parser. The parser parses XSL pattern strings and uses
the factory to create customized XSL pattern objects.
The factory determines these objects completely.
They may not have the methods described above for standard
XSL pattern objects. A factory must have all methods (and attributes) of
the standard pattern factory
with the same arguments. It may, however, use a completely
Python can be downloaded from the Python homepage, the XML package from the XML-SIG repository.
You may further need a small patch to
to fix a type error in PyDOM's attribute handling.
I posted this patch to the mailing list
You can find it in its archive
attached to the announcement of XSL-Pattern.
Open Source license at your own risk. Please see the copyright notice at the beginning of
dmutil/xsl/pattern.py for details.
id attribute. In order to support
id references, XSL-Pattern requires the document root to have a method
getIdMap returning a dictionary mapping id names to nodes.
pattern module contains the class
IdDecl. Its constructor has two optional parameters: a list of general id attributes and a dictionary mapping (some) element types to their id attribute. If an element type is a key in the dictionary, then the associated value specifies its id attribute (if the value is empty or
None, then the element type has no id attribute); for all other element types, any of the attribute names given in the list can be the id attribute. An
IdDecl instance provides methods
flushIdMap to a document when it is applied to it.
ATTENTION: Match patterns containing an id reference with a pattern parameter are extremely inefficient -- never use them! However, such patterns can be used as select patterns.
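The id attribute rule described above for IdDecl can be sketched as follows. This is only an illustration of the rule, not the IdDecl implementation: it uses the standard library's xml.dom.minidom instead of PyDOM, and the names id_attribute_names and build_id_map are hypothetical helpers introduced for this example.

```python
from xml.dom import minidom


def id_attribute_names(tag, general_ids, per_element):
    """Return the attribute names that may hold the id for element type `tag`."""
    if tag in per_element:
        name = per_element[tag]
        # An empty or None entry means: this element type has no id attribute.
        return [name] if name else []
    # All other element types: any of the general id attributes may be the id.
    return list(general_ids)


def build_id_map(document, general_ids=(), per_element=None):
    """Map id values to element nodes (hypothetical helper, not part of dmutil)."""
    per_element = per_element or {}
    id_map = {}
    for element in document.getElementsByTagName("*"):
        for name in id_attribute_names(element.tagName, general_ids, per_element):
            if element.hasAttribute(name):
                # In this sketch the first element carrying an id value wins.
                id_map.setdefault(element.getAttribute(name), element)
                break
    return id_map


doc = minidom.parseString(
    '<doc><sec id="s1"><a name="top" id="x"/><p id="ignored"/></sec></doc>'
)
id_map = build_id_map(doc, general_ids=["id"], per_element={"a": "name", "p": None})
print(sorted(id_map))
```

In this sample, sec falls back to the general id attribute, a uses name as its id attribute (so its id attribute is not treated as an id), and p is declared to have no id attribute at all.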
Notation nodes, which can occur in DOM trees. This incompatibility is not yet handled correctly.
EntityReference into normal text nodes, at least if not explicitly told otherwise. However, this process violates the XSL requirement that no two text nodes are adjacent. The function
normalize can be applied to a document to merge adjacent text nodes.
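The effect of merging adjacent text nodes can be seen in a small sketch. It assumes the standard library's xml.dom.minidom and its DOM method Node.normalize() as a stand-in; the normalize function of XSL-Pattern itself is not shown, and its exact behavior may differ.

```python
from xml.dom import minidom

doc = minidom.parseString("<p>AT</p>")
p = doc.documentElement

# Simulate the result of expanding an entity reference in place:
# the expansion ends up as a separate text node next to the existing one.
p.appendChild(doc.createTextNode("&T"))
count_before = len(p.childNodes)

# Standard DOM method: merges adjacent text nodes below this node.
p.normalize()
count_after = len(p.childNodes)
merged_text = p.firstChild.data

print(count_before, count_after, merged_text)
```

Before normalization the element has two adjacent text children; afterwards it has a single text node holding the concatenated text.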
SubNode expressions (optionally filtered) are fairly efficient, as are
IdExpr are expensive, especially when used as match patterns.
IdExpr with a pattern (rather than literal) argument used as match (rather than select) patterns are prohibitive.
IdentityExpr and absolute patterns could be optimized further.
"ancestor in match pattern" threw an exception
...//OtherNode started above rather than at OtherNode when used as match pattern (it was correct as select pattern).
patternstring attribute; it is the string the object has been built from.