AdvancedQuery
is a Zope product aimed to overcome
several limitations of ZCatalog
's native search
function.
Like ZCatalog
's search, it supports elementary
index searches. While ZCatalog
can combine such
elementary searches only by "and", AdvancedQuery
allows them to be combined arbitrary with &
(and),
|
(or) and ~
(not).
While ZCatalog
supports an efficient sorting via an
index on one level, AdvancedQuery
supports sorting via
any level of (field) indexes. Moreover, it sorts the result
incrementally -- only as far as you access your result. This can
drastically speed up the time required for sorting. It uses
Python's generators for this (and thus requires Python 2.2 or
better).
Queries are specified by (full blown) Python objects. They are constructed in the following way:
Expression | printed as | Meaning |
---|---|---|
Eq(index, value,
filter=False) |
index = value |
the documents indexed by index under value |
Le(index, value,
filter=False) |
index <= value |
the documents indexed by index under a value less or equal value |
Ge(index, value,
filter=False) |
index >= value |
the documents indexed by index under a value greater or equal value |
MatchGlob(index, pattern,
filter=False) |
index =~ pattern |
the documents indexed by index under a value
matching the glob pattern. A glob pattern can contain
wildcards * (matches any sequence of characters) and
? (matches any single character).This query type is only supported by most string or unicode based ManagableIndex es (exception: RangeIndex ).
Many TextIndex es support glob matching via the
Eq query. |
MatchRegexp(index, regexp,
filter=False) |
index =~~ regexp |
the documents indexed by index under a value
matching the regular expression regexp. See the
re module documentation in the Python Library
Reference, for a description of regular expressions.This query type is only supported by most string or unicode based ManagableIndex es (exception:
RangeIndex ). |
Between(index, low,
high, filter=False) |
low <= index <=
high |
the documents indexed by index under a value between low and high |
In(index, sequence,
filter=False) |
index in sequence |
the documents indexed by index under a value in sequence |
Generic(index, value,
filter=False) |
index ~~ value |
this query type is used to pass any search expression to
index as understood by it. Such search expressions
usually take the form of a dictionary with query as
the most essential key. Generic is necessary to use
the full power of specialized indexes, such as the
level argument for PathIndex searches. |
Indexed(index) |
Indexed(index) |
the documents that are indexed by index. This does not work for all index types. |
LiteralResultSet(set) |
LiteralResultSet(set) |
the documents specified by set. set must be an IISet ,
IITreeSet or sequence of catalog
"data_record_id_"s.This can e.g. be used to further restrict the document set previously obtained through a query. |
~ query |
~ query |
Not: the documents that do not satisfy query |
query1 & query2 |
(query1 & query2) |
And: the documents satisfying both query1 and query2 |
And(*queries) |
(query1 & ... &
queryn) |
And: the documents satisfying all queries; if
queries is empty, any document satisfies this
And query |
query1 | query2 |
(query1 | query2) |
Or: the documents satisfying either query1 or query2 (or both) |
Or(*queries) |
(query1 | ... | queryn) |
Or: the documents satisfying (at least) one of
queries; if queries is empty, no document
satisfies this Or query |
And
and Or
queries are so called
CompositeQuerys. They possess a method
addSubquery(query)
to add an additional
subquery.
The constructors are imported from
Products.AdvancedQuery
.
AdvancedQuery
uses so called Monkey
Patching to give ZCatalog
the new method
makeAdvancedQuery(catalogSearchSpec)
. A
catalogSearchSpec is a search specification as described
in the Zope Book for ZCatalog
searches (essentially a
dictionary mapping index names to search specifications).
makeAdvancedQuery
returns the equivalent
AdvancedQuery
search object.
AdvancedQuery
uses so called Monkey
Patching to give ZCatalog
and (if available) the
CMF CatalogTool
the new method
evalAdvancedQuery(query, sortSpecs=(),
withSortValues=_notPassed)
.
evalAdvancedQuery
evaluates query and then
sorts the document result set according to
sortSpecs.
If withSortValues is not passed in, it is set to
True
if sortSpecs contains a ranking
specification (as you are probably interested in the rank) and to
False
otherwise.
If withSortValues, then the
data_record_score_
attribute of the returned proxies
is abused to hold the sort value. It is a tuple with one component
per component in sortSpecs. The attribute
data_record_normalized_score_
is set to
None
.
The CatalogTool
's evalAdvancedQuery
uses a new ValidityRange
index (if present) in
preference to the effective
and expires
indexes to restrict searches to valid objects.
ValidityRange
is expected to be a ManagableIndex
RangeIndex
. Searches via such a range index are considerably
more efficient than those via the individual indexes.
AdvancedQuery
supports incremental multi-level
lexicographic sorting via field index like indexes. If an index
used for sorting is not field index like (i.e. does not index an
object under at most one value), you may get funny (and partly non
determistic) results.
Sorting is specified by a sequence of sort specifications, each
for a single level. Such a specification is either an index name, a
pair index name and direction or a ranking specification (see
below). A direction is 'asc'
(ascending) or
'desc'
(descending); if the direction is not
specified, 'asc'
is assumed.
When the result contains documents not indexed by a sorting index, such documents are delivered after indexed documents. This happens always, independant of search direction.
From version 1.1 on, AdvancedQuery
supports
incremental filtering. Incremental filtering can be very promissing
for an unspecific subquery inside an otherwise specific And query,
especially for large Le
, Ge
,
Between
and range subqueries. If we use the index in
the normal way a huge Or query is constructed for such subqueries.
Even IncrementalSearch2
cannot fully optimize the
search against this huge Or query. Whith incremental filtering the
index is not used in the normal way. Instead, the remaining And
subqueries are used to produce a set of document candidates. These
are then filtered by the filtering subquery, discarding documents
not matching the subquery. Provided that the other And subqueries
already have reduced the document set sufficiently, incremental
filtering can save a lot of time.
You request incremental filtering for an (elementary) subquery
with the filter
keyword argument. Usually, you use it
only for some subqueries of specific And queries. Otherwise,
incremental filtering may not reduce but increase the query time
(even considerably).
If you have more than a single filtering subquery in an And query, their order might be relevant for efficiency. You should put filtering subqueries that are likely to reduce the document set more before other filtering subqueries.
Incremental filtering requires that you have version 1.0 of
IncrementalSearch
or IncrementalSearch2
installed. Furthermore, incremental filtering is only effective, if
the index supports it. This is true for most version 1.3
ManagableIndex
index types. If some condition for
incremental filtering is not met, the filter
keyword
is simply ignored.
From version 2.0 on, AdvancedQuery
supports
incremental ranking. Ranking is a form of sorting. Therefore, you
specify it as a sort spec. Ranking can be combined with other sort
specs in the usual way (leading to multi-level sorting).
Like sorting in general, ranking is performed incrementally -- just as far as you have looked at the result. Therefore, although ranking in general is very expensive, its effect can be small if you only look at the first few (hundred) result objects (rather than the several hundred thousands).
Currently, the ranking specifications
RankByQueries_Sum
, and RankByQueries_Max
are supported. In both cases, you call the constructors with one or
more pairs (q, vq), i.e. with a
sequence of weighted queries.
The rank of a document is the sum or the maximum of the weights for
queries matching the document, respectively.
Note that the runtime behaviour for RankByQueries_Sum
is exponential, that of RankByQueries_Max
linear in
the number of queries involved in the ranking.
Note that you probably want to normalize the document ranks. The
ranking classes above have methods getQueryValueSum()
and getQueryValueMax()
, respectively, that can help
with this.
from Products.AdvancedQuery import Eq, Between, Le # search for objects below 'a/b/c' with ids between 'a' and 'z~' query = Eq('path','a/b/c') & Between('id', 'a', 'z~') # evaluate and sort descending by 'modified' and ascending by 'Creator' context.Catalog.evalAdvancedQuery(query, (('modified','desc'), 'Creator',)) # search 'News' not yet archived and 'File's not yet expired. now = context.ZopeTime() query = Eq('portal_type', 'News') & ~ Le('ArchivalDate', now) | Eq('portal_type', 'File') & ~ Le('expires', now) context.Catalog.evalAdvancedQuery(query) # search 'News' containing 'AdvancedQuery' and filter out # not yet effective or still expired documents. query = Eq('portal_type', 'News') & Eq('SearchableText', 'AdvancedQuery') \ & Ge('expires', now, filter=True) & Le('effective', now, filter=True) context.Catalog.evalAdvancedQuery(query) # search for 'ranking' in 'SearchableText' and rank very high # when the term is in 'Subject' and high when it is in 'Title'. # print the id and the normalized rank from Products.AdvancedQuery import RankByQueries_Sum term = 'ranking' rs = RankByQueries_Sum((Eq('Subject', term),16), (Eq('Title', term),8)) norm = 1 + rs.getQueryValueSum() for r in context.Catalog.evalAdvancedQuery( Eq('SearchableText', term), (rs,) ): print r.getId, (1 + r.data_record_score_) / norm
You must not cache the result of an AdvancedQuery
unless you have ensured that sorting has finished (e.g. by
accessing the last element in the result). This is because
AdvancedQuery
uses incremental sorting with
BTrees
iterators. Like any iterator, they do not like
when the base object changes during iteration. Nasty types of
(apparently) non-deterministic errors can happen when the index
changes during sorting.
Download the most recent version via my Zope Page.
Install by unpacking the tar archive into your
Products
folder and restart your Zope.
This software is open source and licensed under a BSD style license. See the license file in the distribution for details.