XPath is the expression sublanguage of both XSLT and XPointer.
PyXPath is laid on top of PyDOM, the DOM implementation of Python's XML special interest group.
For parser construction, PyXPath uses bison and Scott Hassan's PyBison package.
PyXPath is distributed as two archives, pyxpath.tgz and pyxpath_s.tgz.
pyxpath.tgz is necessary to use the software. The archive contains two Python packages, dmutil and PyBison. Unpack it in Python's site-packages directory.
While dmutil contains code developed by me, PyBison is a small (and slightly patched) part of Scott Hassan's PyBison distribution.
pyxpath_s.tgz contains additional sources. You need this archive only if you want to change (or view) the bison grammar for XPath.
To change the grammar, you will need bison, PyBison, a C development system and the patch utility.
PyBison must be patched with the patch file env.pat provided in the archive. The patch allows customization specific to each parser object.
The module dmutil.xsl.xpath contains the parser factory makeParser and two evaluation context classes, ParseContext and Env.
makeParser constructs an XPath parser. Such a parser parses XPath expression strings and constructs the corresponding XPath objects. An XPath object can be evaluated with a node, a nodelist and an Env instance to obtain a value.
makeParser accepts two optional parameters: a ParseContext instance, context, and a BaseFactory instance, factory. context defines the namespaces and the function library available for XPath parsing. factory is the factory object used to construct XPath objects.
An XPath parser is applied to an XPath expression string. It accepts a ParseContext instance, context, as an optional argument. The expression is parsed with the namespaces defined in this context argument and in the context parameter given at parser construction. Functions are taken from the context argument, from the context given at parser construction, or from the factory. Both context parameters default to None, the factory to dmutil.xsl.DomFactory.DomFactory.
With these defaults, the constructed XPath object does not recognize namespaces and can use the functions defined in the XPath core library. It is evaluated with three parameters: a PyDOM node, a PyDOM nodelist containing the node, and an Env instance specifying the available variables and their values.
from dmutil.xsl.xpath import makeParser, Env

domtree = ...  # placeholder: create a PyDom document here
P = makeParser()  # make a parser
E = Env()         # make a variable environment
E.setVariable('x', 'Hallo')  # binds x to 'Hallo'
# select all links in an HTML document
links = P('//A[@HREF]').eval(domtree, [domtree], E)
# select all anchors in an HTML document
anchors = P('//A[@NAME]').eval(domtree, [domtree], E)
You can find more XPath examples in the test_xpath.py test case file.
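For readers without a PyDOM installation, the same two selections can be tried with the limited XPath subset of Python's standard xml.etree.ElementTree module. This is only a sketch of the queries, not of the PyXPath API; the sample document and all variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A small, hypothetical HTML-like document (well-formed XML).
SAMPLE = """<HTML><BODY>
<A NAME="top">Top</A>
<P>See <A HREF="http://www.python.org/">Python</A> and <A HREF="#top">top</A>.</P>
</BODY></HTML>"""

root = ET.fromstring(SAMPLE)

# ElementTree paths are relative, so '//A[@HREF]' is written './/A[@HREF]'.
links = root.findall('.//A[@HREF]')    # all A elements with an HREF attribute
anchors = root.findall('.//A[@NAME]')  # all A elements with a NAME attribute

link_targets = [a.get('HREF') for a in links]
anchor_names = [a.get('NAME') for a in anchors]
print(link_targets)
print(anchor_names)
```

ElementTree supports only a small part of XPath, so more complex expressions still need a full implementation such as the one described here.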
XPath knows 5 data types (extendible): Boolean, Number, String, Nodeset and Result Tree Fragment. PyXPath maps these to the Python data types int, float, string, list and (unspecified) instance, respectively. You must use one of these types for values of variables.
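The type mapping can be read as a simple classification of Python values. The helper below is hypothetical and not part of PyXPath; it only spells out the correspondence stated above, using modern Python type names (str for string) and treating any other object as an opaque tree fragment, which is an assumption.

```python
# Hypothetical helper, not part of PyXPath: name the XPath type that a
# Python value would stand for under the mapping described in the text.
def xpath_type_name(value):
    if isinstance(value, int):      # note: bool is a subclass of int
        return 'Boolean'
    if isinstance(value, float):
        return 'Number'
    if isinstance(value, str):
        return 'String'
    if isinstance(value, list):
        return 'Nodeset'
    # Assumption: anything else is an opaque tree fragment instance.
    return 'Result Tree Fragment'


examples = [1, 3.0, 'Hallo', []]
names = [xpath_type_name(v) for v in examples]
print(names)
```

Under this reading, a numeric variable would be bound as a float (3.0 rather than 3). Whether PyXPath itself accepts plain integers as numbers is not stated here.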
PyXPath requires Python and the XML package (xml-0.5.1). Python can be downloaded from the Python homepage, the XML package from the XML-SIG repository.
PyXPath is available under an Open Source license; use it at your own risk. Please see the copyright notice at the beginning of dmutil/xsl/xpath.py for details.
The namespace axis is not yet recognized.
id references depend on id attributes being declared. In order to support id references, PyXPath requires each node._document to have an attribute _idMap. This attribute must be a dictionary mapping element names to their id attribute. If no such attribute exists, id references are not found.
The DomFactory module contains the class IdDecl. Its constructor takes a dictionary mapping element names to id attributes as its idMap parameter. If an IdDecl instance is applied to a document, it installs its idMap as the _idMap attribute of the document.
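The idea of a dictionary from element names to id attributes can be sketched with the standard library's xml.dom.minidom rather than PyDOM. ID_MAP and find_by_id are hypothetical names chosen for this illustration; the sketch does not show how PyXPath uses _idMap internally.

```python
from xml.dom import minidom

# Hypothetical declaration: element name -> name of its id attribute.
ID_MAP = {'A': 'NAME', 'DIV': 'ID'}


def find_by_id(document, id_map, wanted):
    """Return the first element whose declared id attribute equals wanted."""
    for name, attr in id_map.items():
        for element in document.getElementsByTagName(name):
            if element.getAttribute(attr) == wanted:
                return element
    return None  # without a matching declaration, nothing is found


doc = minidom.parseString(
    '<HTML><DIV ID="intro"><A NAME="top">Top</A></DIV></HTML>')
found = find_by_id(doc, ID_MAP, 'top')
missing = find_by_id(doc, {}, 'top')  # empty map: the reference is not found
print(found.tagName, missing)
```

The empty-map call mirrors the behaviour described above: without a declaration, an id reference cannot be resolved.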
The XPath data model has no CDATA, EntityReference or Notation nodes, which can occur in DOM trees. This incompatibility is not yet handled correctly.
CDATA and EntityReference nodes are normally converted into normal text nodes, at least unless explicitly requested otherwise. However, this process violates the XSL requirement that no two text nodes are adjacent. The function normalize can be applied to a document to merge adjacent text nodes.
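The effect of merging adjacent text nodes can be seen with the standard DOM method normalize() in Python's xml.dom.minidom. This sketch uses minidom, not the normalize function mentioned above, and the two text nodes are created by hand to simulate a converted CDATA section.

```python
from xml.dom import minidom

doc = minidom.parseString('<P/>')
p = doc.documentElement

# Simulate the result of turning a CDATA section into a text node:
# the element ends up with two adjacent text nodes.
p.appendChild(doc.createTextNode('plain text, '))
p.appendChild(doc.createTextNode('former CDATA content'))
before = len(p.childNodes)

p.normalize()  # DOM method: merges adjacent text nodes in the subtree
after = len(p.childNodes)
merged = p.firstChild.data
print(before, after, merged)
```

After normalization the element has a single text child, which is the shape XSL processing expects.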