Patterns are used extensively in the XSL transformation language and its control structures. They can also be used outside XSL, e.g. for querying, selecting, or matching parts of HTML/SGML/XML documents.
XSL-Pattern is built on top of PyDOM, the DOM implementation of Python's XML special interest group.
For parser construction, XSL-Pattern uses bison and Scott Hassan's PyBison package.
xslpattern.tgz and xslpattern_s.tgz.
xslpattern.tgz is necessary to use the software. The archive contains two Python packages, dmutil and PyBison. Unpack it in Python's site-packages directory.
While dmutil contains code developed by me, PyBison is a small (and slightly patched) part of Scott Hassan's PyBison distribution.
xslpattern_s.tgz contains additional sources. You need this archive only if you want to change (or view) the bison grammar for XSL patterns.
To change the grammar, you will need bison, PyBison, a C development system and the patch utility.
PyBison must be patched with the patch file env.pat provided in the archive. This requirement arises with version 0.3: the patch allows customization specific to each parser object, which became necessary for parser objects using different pattern factories.
dmutil.xsl.pattern.
It contains an XSL pattern parser, Parser, a parser object which parses an XSL pattern string into a pattern object.
It also contains the exception classes Error and SyntaxError (derived from Error). SyntaxError is raised for syntax errors, Error for other errors.
The class IdDecl is used to provide DOM document objects with the methods getIdMap and flushIdMap in order to handle IdExpr in XSL patterns.
XSL pattern objects have three (relevant) methods: select, match and checkNonempty. All methods get a DOM node as first parameter.
select returns the list of nodes (in document order) selected by the pattern with node as context.
match returns true iff node is matched by the pattern. See the XSL specification for details.
checkNonempty accepts a value as optional second argument. It returns true iff at least one node is selected from the context node and, if value is specified, that node has the value value.
The XSL pattern object attribute patternstring contains the pattern string from which the object has been built.
    from dmutil.xsl.pattern import Parser

    domtree = ....
    pattern = Parser('A/@HREF')
    a_hrefs = pattern.select(domtree)
    # the list of HREF attributes of A elements, in document order
    first = a_hrefs[0]     # may raise an IndexError
    pattern.match(first)   # returns true
    h2_ancestor = Parser('ancestor(H2)').select(first)[0]
    # the nearest H2 element above first, or an IndexError exception
    # if no such element exists
There are many more XSL pattern examples in the test file test_pattern.py (inside the dmutil/xsl package).
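The relation between select and match can be illustrated without XSL-Pattern itself. The following is only a sketch using the standard library's xml.dom.minidom instead of PyDOM: select_ul_li is a hand-written stand-in for selecting with a pattern like 'UL/LI', and match_ul_li assumes the usual XSL reading that a node matches when it would be selected from the node itself or from one of its ancestors. Both function names are hypothetical and are not part of dmutil.

```python
from xml.dom import minidom


def child_elements(node, name):
    """Return the element children of node named name, in document order."""
    return [c for c in node.childNodes
            if c.nodeType == c.ELEMENT_NODE and c.tagName == name]


def select_ul_li(context):
    """Hand-written stand-in for selecting with the pattern 'UL/LI'."""
    result = []
    for ul in child_elements(context, "UL"):
        result.extend(child_elements(ul, "LI"))
    return result


def match_ul_li(node):
    """Assumed match semantics: some ancestor-or-self context selects node."""
    context = node
    while context is not None:
        if any(n is node for n in select_ul_li(context)):
            return True
        context = context.parentNode
    return False


doc = minidom.parseString("<BODY><UL><LI>a</LI><LI>b</LI></UL><P>c</P></BODY>")
body = doc.documentElement
items = select_ul_li(body)      # the two LI elements, in document order
```

With this reading, match_ul_li(items[0]) is true because BODY is a context that selects the LI, while match_ul_li(body) is false.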
Since version 0.3, experts can use the parser infrastructure
to create their own parser objects. The function makeParser
takes a PatternFactory
object as argument and creates a
parser. The parser parses XSL pattern strings and uses
the factory to create customized XSL pattern objects.
The factory determines these objects completely.
They need not have the methods described above for standard
XSL pattern objects. A factory must have all methods (and attributes) of
the standard pattern factory PatternFactory
with the same arguments. It may, however, use a completely
different implementation.
xml-0.5.x.
Python can be downloaded from the Python homepage, the XML package from the XML-SIG repository.
You may also need a small patch to xml.dom.core to fix a type error in PyDOM's attribute handling.
I posted this patch to the mailing list xml-sig@python.org. You can find it in the list archive, attached to the announcement of XSL-Pattern.
Open Source license at your own risk. Please see the copyright notice at the beginning of dmutil/xsl/pattern.py for details.
id references
id attribute. In order to support id references, XSL-Pattern requires the document root to have a method getIdMap returning a dictionary mapping id names to nodes.
The pattern module contains the class IdDecl. Its constructor has two optional parameters: a list of general id attributes and a dictionary mapping (some) element types to their id attribute.
If an element type is a key in the dictionary, then its value specifies its id attribute (if the value is empty or None, then the element type has no id attribute); for all other element types, any of the attribute names given in the list can be its id attribute.
An IdDecl instance provides the methods getIdMap and flushIdMap to a document when it is applied to it.
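The lookup rule described for IdDecl can be sketched as follows. This is not the IdDecl implementation: it uses the standard library's xml.dom.minidom rather than PyDOM, the function build_id_map and its parameter names are hypothetical, and it assumes that the first listed general id attribute present on an element wins.

```python
from xml.dom import minidom


def build_id_map(document, general_id_attrs=(), element_id_attr=None):
    """Return a dictionary mapping id values to element nodes."""
    element_id_attr = element_id_attr or {}
    id_map = {}

    def visit(node):
        if node.nodeType == node.ELEMENT_NODE:
            if node.tagName in element_id_attr:
                # Element type listed explicitly: only its own id attribute
                # counts; an empty value or None means "no id attribute".
                attr = element_id_attr[node.tagName]
                candidates = [attr] if attr else []
            else:
                # All other element types: any general id attribute may apply.
                candidates = list(general_id_attrs)
            for name in candidates:
                if node.hasAttribute(name):
                    id_map[node.getAttribute(name)] = node
                    break  # assumption: the first attribute found wins
        for child in node.childNodes:
            visit(child)

    visit(document)
    return id_map


doc = minidom.parseString(
    '<DOC><SEC ID="s1"><A NAME="top" ID="ignored"/></SEC><NOTE ID="n1"/></DOC>')
id_map = build_id_map(doc, ["ID"], {"A": "NAME", "NOTE": None})
```

Here SEC falls back to the general attribute ID, A uses NAME because the dictionary says so, and NOTE contributes nothing because its entry is None.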
ATTENTION: Match patterns containing an id reference with a pattern parameter are extremely inefficient -- never use them! However, such patterns can be used as select patterns.
CDATA, EntityReference and Notation nodes, which can occur in DOM trees. This incompatibility is not yet handled correctly.
CDATA and EntityReference into normal text nodes, at least if not explicitly told otherwise. However, this process violates the XSL requirement that no two text nodes are adjacent. The function normalize can be applied to a document to merge adjacent text nodes.
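The effect of merging adjacent text nodes can be seen in a small sketch. It uses the normalize method of the standard library's xml.dom.minidom as a stand-in; the normalize function mentioned above may differ in name, location and details. The two text nodes are created by hand to imitate a text node followed by converted CDATA content.

```python
from xml.dom import minidom

doc = minidom.parseString("<P/>")
p = doc.documentElement

# Two adjacent text nodes, as they might result from converting a CDATA
# section that directly follows ordinary text.
p.appendChild(doc.createTextNode("plain text, "))
p.appendChild(doc.createTextNode("former CDATA content"))
before = len(p.childNodes)

p.normalize()                   # merge adjacent text nodes in place
after = len(p.childNodes)
merged = p.firstChild.data
```

After the call, P has a single text child holding the concatenated data.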
SubNode expressions (optionally filtered) are fairly efficient, as are IdentityExpr, ParentExpr and Ancestors. IdExpr are expensive, especially when used as match patterns. IdExpr with a pattern (rather than literal) argument used as match (rather than select) patterns are prohibitive.
IdentityExpr and absolute patterns could be optimized further.
"ancestor in match pattern" threw an exception
...//OtherNode started above rather than at OtherNode when used as match pattern (was correct as select pattern).
patternstring attribute; it is the string the object has been built from.