Patterns are used extensively in the XSL transformation language and its control structures. They can also be used outside XSL, e.g. for querying, selecting and matching parts of HTML/SGML/XML documents.
XSL-Pattern is built on top of PyDOM, the DOM implementation of Python's XML Special Interest Group.
For parser construction, XSL-Pattern uses bison and Scott Hassan's PyBison package.
XSL-Pattern is distributed as two archives: xslpattern.tgz and xslpattern_s.tgz.
xslpattern.tgz is required to use the software.
It contains two Python packages, dmutil
and PyBison.
Unpack it in Python's site-packages directory.
While dmutil contains code developed by me,
PyBison is a small (and slightly patched) part of
Scott Hassan's PyBison distribution.
xslpattern_s.tgz contains additional sources.
You need this archive only if you want to view or change
the bison grammar for XSL patterns.
To change the grammar, you will need bison,
PyBison,
a C development system and the patch utility.
PyBison must be patched with the patch file env.pat
provided in the archive. (This requirement was introduced in version 0.3.
The patch allows parser-object-specific customization,
which became necessary for parser objects using different
pattern factories.)
The functionality is provided by the module dmutil.xsl.pattern.
It contains the XSL pattern parser Parser,
a parser object that
parses an XSL pattern string into a pattern object.
It also contains the exception classes Error
and SyntaxError (derived from Error).
SyntaxError is raised for syntax errors,
Error for other errors.
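Because SyntaxError derives from Error, a caller that wants to tell malformed patterns apart from other failures has to catch SyntaxError first. The following is a minimal sketch of that ordering under stated assumptions: PatternError and PatternSyntaxError are hypothetical stand-ins for the module's Error and SyntaxError, and toy_parse is a hypothetical parser that only checks parentheses, so the code runs without dmutil. Note also that "from dmutil.xsl.pattern import SyntaxError" would shadow Python's built-in SyntaxError in the importing module, so referring to the classes through the module name is probably the safer habit.

```python
class PatternError(Exception):
    """Hypothetical stand-in for dmutil.xsl.pattern.Error."""


class PatternSyntaxError(PatternError):
    """Hypothetical stand-in for dmutil.xsl.pattern.SyntaxError (derived from Error)."""


def toy_parse(patternstring):
    """Hypothetical parser: only checks parentheses, returns the string unchanged."""
    if not patternstring:
        # Artificial non-syntax failure, only here to exercise the base class.
        raise PatternError("empty pattern")
    depth = 0
    for ch in patternstring:
        depth += {"(": 1, ")": -1}.get(ch, 0)
        if depth < 0:
            raise PatternSyntaxError("unexpected ')' in %r" % patternstring)
    if depth:
        raise PatternSyntaxError("unclosed '(' in %r" % patternstring)
    return patternstring


def classify(patternstring):
    """Catch the subclass first; an earlier 'except PatternError' would swallow it."""
    try:
        toy_parse(patternstring)
    except PatternSyntaxError:
        return "syntax error"
    except PatternError:
        return "other error"
    return "ok"


print(classify("ancestor(H2)"), classify("ancestor(H2"), classify(""))
# ok syntax error other error
```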
The class
IdDecl is used to provide DOM document objects
with methods getIdMap and flushIdMap
in order to handle IdExpr in XSL patterns.
XSL pattern objects have three relevant methods:
select, match and checkNonempty.
All three take a DOM node as their first parameter.
select returns the list of nodes (in document order)
selected by the pattern with node as context.
match returns true iff node is matched
by the pattern; see the XSL specification for details.
checkNonempty accepts an optional second argument,
a value.
It returns true iff at least one node
is selected from the context node
(with value value, if specified).
The XSL pattern object attribute patternstring
contains the pattern string from which the object has been
built.
from dmutil.xsl.pattern import Parser

domtree = ...  # a PyDOM document tree, built elsewhere
pattern = Parser('A/@HREF')
a_hrefs = pattern.select(domtree)
# the list of HREF attributes of A elements, in document order
first = a_hrefs[0]
# raises an IndexError if no such attribute exists
pattern.match(first)
# returns true
h2_ancestor = Parser('ancestor(H2)').select(first)[0]
# the nearest H2 element above first; raises an IndexError
# if no such element exists
There are many more XSL pattern examples in the
test file test_pattern.py (inside the
dmutil/xsl package).
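The semantics of select, match and checkNonempty can also be tried out without dmutil and PyDOM. The sketch below is built on Python's standard xml.dom.minidom; its class ChildPathPattern and helper string_value are hypothetical and understand only child paths of element names such as A/B, not the full XSL pattern syntax. match follows the wording of the later XSLT 1.0 recommendation (a node matches if it is selected when the pattern is evaluated with the node itself or one of its ancestors as context), and treating a node's "value" as its string value in checkNonempty is an assumption of the sketch.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def string_value(node):
    """Concatenated text of node and its descendants (XSL-style string value)."""
    if node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        return node.data
    return "".join(string_value(child) for child in node.childNodes)


class ChildPathPattern:
    """Hypothetical toy pattern: child steps of element names, e.g. 'A/B'."""

    def __init__(self, patternstring):
        self.patternstring = patternstring  # mirrors the documented attribute
        self.steps = patternstring.split("/")

    def select(self, node):
        """Nodes reached from the context node by the child steps, in document order."""
        current = [node]
        for name in self.steps:
            current = [child
                       for parent in current
                       for child in parent.childNodes
                       if child.nodeType == Node.ELEMENT_NODE
                       and child.tagName == name]
        return current

    def match(self, node):
        """True iff node is selected with node or one of its ancestors as context."""
        context = node
        while context is not None:
            if any(selected is node for selected in self.select(context)):
                return True
            context = context.parentNode
        return False

    def checkNonempty(self, node, value=None):
        """True iff some node is selected (with string value `value`, if given)."""
        return any(value is None or string_value(selected) == value
                   for selected in self.select(node))


doc = parseString("<R><A><B>x</B><B>y</B></A><A><B>z</B></A></R>")
pattern = ChildPathPattern("A/B")
print([string_value(n) for n in pattern.select(doc.documentElement)])  # ['x', 'y', 'z']
```

Because this naive match may call select once per ancestor, it also hints at why a pattern can cost much more when used for matching than when used for selection.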
Since version 0.3, experts can use the parser infrastructure
to create their own parser objects. The function makeParser
takes a PatternFactory object as argument and creates a
parser. The parser parses XSL pattern strings and uses
the factory to create customized XSL pattern objects.
The factory determines these objects completely; they need not
have the methods described above for standard
XSL pattern objects. A factory must have all methods (and attributes) of
the standard pattern factory PatternFactory
with the same arguments. It may, however, use a completely
different implementation.
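The factory idea can be illustrated without dmutil. In the sketch below, make_parser, SelectFactory, TextFactory and the factory methods element and child are all hypothetical names; the real PatternFactory defines its own set of methods, which are not reproduced here. The same toy parser, handed different factories, builds either callable selectors over a minidom tree or a plain textual description of the pattern, which is the sense in which "the factory determines these objects completely".

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def make_parser(factory):
    """Return a parser for child paths like 'A/B/C' that builds results via factory."""
    def parse(patternstring):
        names = patternstring.split("/")
        if not all(names):
            raise ValueError("empty step in %r" % patternstring)
        result = factory.element(names[0])
        for name in names[1:]:
            # Left-nested: 'A/B/C' becomes child(child(A, B), C).
            result = factory.child(result, factory.element(name))
        return result
    return parse


class SelectFactory:
    """Builds functions mapping a context node to a list of selected nodes."""

    def element(self, name):
        return lambda node: [c for c in node.childNodes
                             if c.nodeType == Node.ELEMENT_NODE and c.tagName == name]

    def child(self, left, right):
        return lambda node: [r for m in left(node) for r in right(m)]


class TextFactory:
    """Builds a textual description instead of anything executable."""

    def element(self, name):
        return name

    def child(self, left, right):
        return "child(%s, %s)" % (left, right)


doc = parseString("<R><A><B/><B/></A><A><B/></A></R>")
select = make_parser(SelectFactory())("A/B")
print(len(select(doc.documentElement)))     # 3
print(make_parser(TextFactory())("A/B/C"))  # child(child(A, B), C)
```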
XSL-Pattern requires Python and the Python XML package xml-0.5.x.
Python can be downloaded from the Python homepage and the XML package from the XML-SIG repository.
You may also need a small patch to xml.dom.core
that fixes a type error in PyDOM's attribute handling.
I posted this patch to the mailing list xml-sig@python.org;
you can find it in the list archive,
attached to the announcement of XSL-Pattern.
XSL-Pattern is distributed under an Open Source license; use it at your
own risk. Please see the copyright notice at the beginning of
dmutil/xsl/pattern.py
for details.
In order to support id
references, XSL-Pattern requires the document root to
have a method getIdMap that returns a dictionary
mapping id names to nodes.
The pattern module contains the class
IdDecl. Its constructor has two optional
parameters: a list of general id attributes and a
dictionary mapping (some) element types to their id attribute.
If an element type is a key in the dictionary, then
its value specifies its id attribute (if it is empty or
None, then it has no id attribute);
for all other element types, any of the attribute names
given in the list can be its id attribute.
When applied to a document, an IdDecl instance
provides it with the methods getIdMap and flushIdMap.
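The id-attribute rules above can be sketched with xml.dom.minidom. IdDeclSketch is a hypothetical re-creation, not the module's IdDecl: instead of attaching getIdMap and flushIdMap to the document, it keeps them on the declaration object and takes the document as an argument, which avoids modifying DOM objects. Keeping the first element found when two elements share an id value is likewise an assumption of the sketch.

```python
from xml.dom.minidom import parseString


class IdDeclSketch:
    """Hypothetical re-creation of the id-attribute rules described above."""

    def __init__(self, general_ids=(), per_type=None):
        self.general_ids = list(general_ids)  # candidate id attributes for other types
        self.per_type = dict(per_type or {})  # element type -> its id attribute
        self._cache = {}                       # id(document) -> (document, idmap)

    def _id_of(self, element):
        """Return the element's id value, or None if it has no id attribute."""
        if element.tagName in self.per_type:
            attr = self.per_type[element.tagName]
            candidates = [attr] if attr else []  # empty or None: no id attribute
        else:
            candidates = self.general_ids
        for name in candidates:
            if element.hasAttribute(name):
                return element.getAttribute(name)
        return None

    def getIdMap(self, document):
        """Build (once) and return a dict mapping id values to elements."""
        entry = self._cache.get(id(document))
        if entry is None:
            idmap = {}
            for element in document.getElementsByTagName("*"):  # document order
                value = self._id_of(element)
                if value is not None:
                    idmap.setdefault(value, element)  # first occurrence wins
            # Storing the document keeps it alive, so its id() cannot be reused.
            entry = self._cache[id(document)] = (document, idmap)
        return entry[1]

    def flushIdMap(self, document):
        """Discard the cached map, e.g. after the document was modified."""
        self._cache.pop(id(document), None)
```

For example, IdDeclSketch(["id", "name"], {"note": "key", "img": None}) treats id or name as the id attribute of most elements, key as the id attribute of note, and gives img no id attribute at all.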
ATTENTION: Match patterns containing an id reference with a pattern argument are extremely inefficient; never use them! Such patterns can, however, be used as select patterns.
The XSL data model has no counterpart for the CDATA, EntityReference
and Notation nodes that can occur in DOM trees.
This incompatibility is not yet handled correctly.
DOM builders typically convert CDATA
and EntityReference nodes into normal text nodes,
at least unless explicitly told otherwise. However,
this process violates the XSL requirement that no two
text nodes be adjacent. The function normalize
can be applied to a document to merge adjacent text nodes.
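The merging step can be sketched on xml.dom.minidom. The standard DOM method Node.normalize() already merges adjacent Text nodes but leaves CDATA sections alone, so the hypothetical function normalize_text below first turns each CDATA section into an ordinary text node and then merges it with a preceding text neighbour. It is an illustration of the technique, not the module's normalize function.

```python
from xml.dom import Node
from xml.dom.minidom import getDOMImplementation


def normalize_text(node):
    """Convert CDATA sections to text nodes and merge adjacent text nodes, recursively."""
    doc = node.ownerDocument or node  # a Document has no ownerDocument
    child = node.firstChild
    while child is not None:
        following = child.nextSibling  # remember before the tree is modified
        if child.nodeType == Node.CDATA_SECTION_NODE:
            text = doc.createTextNode(child.data)
            node.replaceChild(text, child)
            child = text
        if child.nodeType == Node.TEXT_NODE:
            previous = child.previousSibling
            if previous is not None and previous.nodeType == Node.TEXT_NODE:
                previous.data += child.data
                node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            normalize_text(child)
        child = following


# Build <a>x<![CDATA[y]]>z<b>1<![CDATA[2]]></b></a> explicitly, node by node.
doc = getDOMImplementation().createDocument(None, "a", None)
root = doc.documentElement
for new in (doc.createTextNode("x"), doc.createCDATASection("y"),
            doc.createTextNode("z")):
    root.appendChild(new)
b = root.appendChild(doc.createElement("b"))
b.appendChild(doc.createTextNode("1"))
b.appendChild(doc.createCDATASection("2"))

normalize_text(doc)
print(root.toxml())  # <a>xyz<b>12</b></a>
```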
SubNode expressions
(optionally filtered) are fairly efficient, as are
IdentityExpr, ParentExpr and
Ancestors. IdExpr are
expensive, especially when used as match patterns.
IdExpr with a pattern (rather than
literal) argument used as a match (rather than select)
pattern are prohibitively expensive.
IdentityExpr and absolute patterns could be optimized further.
"ancestor" in a match pattern threw an exception.
...//OtherNode started
above rather than at
OtherNode when used as a match pattern
(it was correct as a select pattern).
Pattern objects have a patternstring attribute;
it holds the string the object has been built from.