DTMLMethod's PrincipiaSearchSource returns raw, unprocessed content. If this is used for document indexing, documents may be indexed under irrelevant terms, such as HTML tags. On the other hand, essential index terms may be missing or wrong, e.g. due to the use of HTML entities or because document components (included e.g. via DTML tags) are not included. This is especially serious for internationalized text, where almost all content is generated dynamically based on the effective language.
The following module can be used to preprocess DTML methods and documents before indexing. It renders the object and then applies HTML filtering to the result. This filtering strips HTML/SGML tags and translates HTML 2.0 entities. The result can then be fed to ZCatalog's indexing machinery.
# $Id: CatalogSupport.html,v 1.1.1.1 2002/02/23 13:40:19 dieter Exp $
'''Catalog Support Routines.'''

from sgmllib import SGMLParser
from string import join

class _StripTagParser(SGMLParser):
  '''SGML Parser removing any tags and translating HTML entities.'''
  from htmlentitydefs import entitydefs

  data= None

  def handle_data(self,data):
    if self.data is None: self.data=[]
    self.data.append(data)

  def __str__(self):
    if self.data is None: return ''
    return join(self.data,'')


def filterRenderedHTML(self):
  '''renders *self* and filters HTML.

  can be used as method for DTML Methods/Documents indexing.
  '''
  # we pass "render_for_catalog__", such that a catalog aware object
  # may take special actions, e.g. not create sessions
  # rendering may raise exceptions; in this case, this document
  # does not provide information for this indexing category.
  try: render= self(self,self.REQUEST, render_for_catalog__=1)
  except: return ''
  # filter
  try:
    p= _StripTagParser()
    p.feed(render); p.close()
    return str(p)
  except: return ''
You can download this module, too.
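The module above relies on sgmllib and htmlentitydefs, which no longer exist in Python 3. As a rough illustration of the same filtering step on a current interpreter, here is a minimal sketch based on the standard html.parser module. The names StripTagParser and strip_tags are hypothetical and not part of the original module, and the Zope rendering step is left out.

```python
from html.parser import HTMLParser


class StripTagParser(HTMLParser):
    """Collects text content, dropping tags and decoding entities."""

    def __init__(self):
        # convert_charrefs=True lets the parser translate entities such
        # as &auml; or &amp; before handle_data() receives the text.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def text(self):
        return "".join(self.chunks)


def strip_tags(rendered):
    """Return *rendered* with tags removed and entities translated."""
    parser = StripTagParser()
    parser.feed(rendered)
    parser.close()
    return parser.text()


print(strip_tags("<p>K&auml;se &amp; <b>Brot</b></p>"))
```

Like the original, this sketch joins the text pieces without any separator, so it shares the whitespace limitation of the module.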
To use it automatically, take the following steps:
1. Make filterRenderedHTML an external method accessible from your catalogued objects.
2. Index it in place of PrincipiaSearchSource, but in analogy to it.
3. Check for render_for_catalog__ inside a DTML Method/Document, if it requires special treatment during index preprocessing.
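How the render_for_catalog__ flag reaches a document can be illustrated outside Zope with a small simulation. This is only a sketch: FakeDocument and render_for_indexing are hypothetical stand-ins, not Zope classes or functions, and the "session" is a plain counter. Inside a real DTML Method/Document the check would presumably be a dtml-if test on the same name.

```python
class FakeDocument:
    """Hypothetical stand-in for a DTML Method/Document (not a Zope class)."""

    def __init__(self, body):
        self.body = body
        self.sessions_created = 0
        self.REQUEST = {}  # placeholder for Zope's request object

    def __call__(self, client=None, REQUEST=None, **kw):
        # A catalog-aware document inspects the flag and skips side
        # effects, here the creation of a session.
        if not kw.get("render_for_catalog__"):
            self.sessions_created += 1
        return self.body


def render_for_indexing(obj):
    """Render *obj* the way filterRenderedHTML does, before filtering."""
    try:
        return obj(obj, obj.REQUEST, render_for_catalog__=1)
    except Exception:
        # A document that fails to render contributes nothing to the index.
        return ""


doc = FakeDocument("<h1>Title</h1>")
doc(doc, doc.REQUEST)               # normal page view: counts a session
indexed = render_for_indexing(doc)  # catalog run: the flag suppresses it
```

After both calls, only the normal page view has incremented the session counter, while the catalog run still receives the rendered text.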
You may need a patch for "ZCatalog.ZopeFindAndApply" (or wait for Zope 2.2.1), because older ZopeFindAndApply versions strip the acquisition context. Without that context, the objects cannot find your external method and the index remains empty.
The module is in an alpha state, so expect some problems. The following problems are foreseen:
- Tags are removed without inserting whitespace: <th>Customer</th><th>Product</th> results in CustomerProduct.
- ... PrincipiaSearchSource in addition.
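One possible way around the CustomerProduct problem is to emit a space wherever a tag is removed and then collapse runs of whitespace. The following is a sketch only, written against Python 3's html.parser rather than the sgmllib used by the module; SpacingStripTagParser and strip_tags_spaced are hypothetical names.

```python
from html.parser import HTMLParser


class SpacingStripTagParser(HTMLParser):
    """Strips tags but leaves a space where each tag used to be."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        self.chunks.append(" ")

    def handle_endtag(self, tag):
        self.chunks.append(" ")

    def handle_data(self, data):
        self.chunks.append(data)

    def text(self):
        # Collapse the inserted spaces and any other whitespace runs.
        return " ".join("".join(self.chunks).split())


def strip_tags_spaced(rendered):
    parser = SpacingStripTagParser()
    parser.feed(rendered)
    parser.close()
    return parser.text()


print(strip_tags_spaced("<th>Customer</th><th>Product</th>"))
```

The trade-off is that inline markup inside a word, such as <b>Cus</b>tomer, would now be split into two terms. A more careful version could insert spaces only for block-level tags.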