(Search indexes and metadata)

Description

How to make your custom fields and data queryable through portal_catalog.

Introduction

Indexing is the action of making object data searchable. Plone stores the available indexes in the database. You can create them through the web and inspect existing indexes in portal_catalog on the Indexes tab.

Indexes and metadata

portal_catalog keeps a copy of a subset of each object's field values and makes them searchable.

  • Indexes make content searchable: Indexes are stored values which are used to match queries. Indexed values might be preprocessed to make the matching possible. For example, full text search indexes run incoming text through splitters and similar filters to generate fast searchable data out of it.
  • Metadata make content summarizable: Metadata, also known as columns, are stored values which can be displayed to the user with the search hit. They usually copy the field value as is.

Metadata can exist without index and vice versa.
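
The split between indexes and metadata can be illustrated with a small toy model. This is only a sketch: ToyCatalog, Page and their methods are hypothetical names invented for this illustration and are not the portal_catalog implementation or API. Here portal_type has an index but no metadata column, and Title has a metadata column but no index.

```python
class ToyCatalog(object):
    """Toy model of a catalog; NOT the portal_catalog implementation."""

    def __init__(self, index_names, column_names):
        # Index: for each indexed attribute, value -> set of record ids
        self.indexes = dict((name, {}) for name in index_names)
        # Metadata: record id -> copied column values
        self.column_names = list(column_names)
        self.metadata = {}

    def catalog_object(self, rid, obj):
        for name, index in self.indexes.items():
            index.setdefault(getattr(obj, name), set()).add(rid)
        self.metadata[rid] = dict(
            (name, getattr(obj, name)) for name in self.column_names)

    def search(self, **query):
        """Match every query term against its index, return metadata rows."""
        rids = None
        for name, wanted in query.items():
            matches = self.indexes[name].get(wanted, set())
            rids = matches if rids is None else rids & matches
        return [self.metadata[rid] for rid in sorted(rids or ())]


class Page(object):
    def __init__(self, Title, portal_type):
        self.Title = Title
        self.portal_type = portal_type


catalog = ToyCatalog(index_names=["portal_type"], column_names=["Title"])
catalog.catalog_object(1, Page("Front page", "Document"))
catalog.catalog_object(2, Page("Team photo", "Image"))

# portal_type is searchable, Title is what comes back with each hit
hits = catalog.search(portal_type="Document")
```

In this model a search by Title would fail because no Title index exists, and the hits do not carry portal_type because it is not a metadata column.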

Viewing indexes and indexed data

Indexed data

You can view the indexed data through the portal_catalog tool in the ZMI.

  • Click portal_catalog in the portal root
  • Click Catalog tab
  • Click any object

Indexes and metadata columns

Available indexes are stored in the database, not in Python code. To see which indexes your site has:

  • Click portal_catalog in the portal root
  • Click the Indexes or Metadata tab

Creating an index

To perform queries on custom data, you need to add a corresponding index to portal_catalog first.

For example, if your Archetypes content type has a field:

from Products.Archetypes import atapi

schema = [

    atapi.DateTimeField("revisitDate",
        schemata="revisit",
        widget=atapi.CalendarWidget(
            label="Revisit date",
            description="When you are alerted that this content should be revisited (one month before this date)",
        ),
    ),
]

class MyContent(...):

    # Archetypes generates the accessor method automatically at run time,
    # but an accessor could be any hand-written method as well
    # def getMyCustomValue(self):
    #     pass

You can add a new index which will index the value of this field, so you can make queries based on it later.

See more information about accessor methods.

Note

If you want to create an index for a content type you do not control yourself, or if you want to apply custom logic in your indexer, please see Custom index methods below.

Creating an index through the web

This method is suitable during development: you can create an index in your local Plone database.

  • Go to the ZMI

  • Click portal_catalog

  • Click the Indexes tab

  • In the top right corner there is a drop-down menu for adding new indexes. Choose the index type you need to add.

    • Type: FieldIndex
    • Id: getMyCustomValue
    • Indexed attributes: getMyCustomValue

You can use Archetypes accessor methods directly as indexed attributes. In this example we use getMyCustomValue for the AT field customValue.

The type of index you need depends on what kind of queries you need to run on the data. For example, direct value matching, ranged date queries and free text search each need a different kind of index.

  • After this you can query portal_catalog:

    my_brains = context.portal_catalog(getMyCustomValue=111)
    for brain in my_brains:
        # Reading the value from the brain requires a metadata column with the same name
        print(brain["getMyCustomValue"])
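
The different kinds of queries mentioned above could be expressed roughly as follows. This is a minimal sketch that only builds the query dictionaries: the helper functions are hypothetical names for this illustration, plain datetime objects are used as an assumption so the code runs outside Plone (Plone code usually passes Zope DateTime objects), and the index names are examples.

```python
from datetime import datetime


def exact_match(index_name, value):
    """Direct value matching, as used with a FieldIndex."""
    return {index_name: value}


def date_range(index_name, start=None, end=None):
    """Ranged query using the 'range' option ('min', 'max' or 'min:max')."""
    if start is not None and end is not None:
        return {index_name: {"query": (start, end), "range": "min:max"}}
    if start is not None:
        return {index_name: {"query": start, "range": "min"}}
    if end is not None:
        return {index_name: {"query": end, "range": "max"}}
    raise ValueError("give at least one of start or end")


def all_keywords(index_name, keywords):
    """Keyword query where every keyword must be present."""
    return {index_name: {"query": list(keywords), "operator": "and"}}


query = {}
query.update(exact_match("portal_type", "Document"))
query.update(date_range("revisit_date", start=datetime(2010, 1, 1)))
query.update(all_keywords("Subject", ["news", "sports"]))

# Inside Plone the dictionary would be passed on as:
#     context.portal_catalog(**query)
```

Which options an index accepts depends on its type, so a query like the date range above only makes sense once a suitable index exists under that name.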
    

Adding index using add-on product installer

You need to have your own add-on product which registers new indexes when the add-on installer is run. This is the recommended method for repeated installations. As a prerequisite, your add-on product must have GenericSetup profile support.

You can create an index in any of these ways:

  • Write catalog.xml by hand.
  • Create the index through the web as above and export the catalog data from a development site using the Export functionality of the portal_setup tool. The XML is generated for you and you can fine-tune it before dropping it into your add-on product.
  • Create the index in the Python code of a custom import step in the add-on.

This approach is repeatable: the index gets created every time the add-on product is installed. It is more cumbersome, however.

Warning

There is a known issue of indexed data getting pruned when an add-on product is reinstalled. If you want to avoid this, you need to create new indexes in a custom setup step (Python code) of the add-on installer.

The example below is not safe against data pruning on reinstall. The file is profiles/default/catalog.xml. It installs a new index called revisit_date of type DateIndex.

<?xml version="1.0"?>
<object name="portal_catalog" meta_type="Plone Catalog Tool">
    <index name="revisit_date" meta_type="DateIndex">
        <property name="index_naive_time_as_local">True</property>
    </index>
</object>
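
The Python alternative mentioned in the warning could look roughly like the sketch below. It assumes the catalog object offers the ZCatalog methods indexes(), addIndex() and manage_reindexIndex(); the helper name add_index_if_missing is hypothetical, and DemoCatalog is only a stand-in so the sketch can be tried outside Plone. In a real import step the catalog would typically be obtained with getToolByName(portal, "portal_catalog"). The index_naive_time_as_local property from the XML is not set here.

```python
def add_index_if_missing(catalog, name, meta_type):
    """Create the index only when the catalog does not have it yet.

    Returns True when a new index was created, False otherwise.
    """
    if name in catalog.indexes():
        # Leave the existing index and its indexed data untouched
        return False
    catalog.addIndex(name, meta_type)
    # Fill the new index from the content already in the catalog
    catalog.manage_reindexIndex(ids=[name])
    return True


class DemoCatalog(object):
    """Hypothetical stand-in exposing the three methods used above,
    only so that the sketch can be tried outside Plone."""

    def __init__(self):
        self.created = []
        self.reindexed = []

    def indexes(self):
        return [name for name, meta_type in self.created]

    def addIndex(self, name, meta_type):
        self.created.append((name, meta_type))

    def manage_reindexIndex(self, ids=None):
        self.reindexed.extend(ids or [])


catalog = DemoCatalog()
first = add_index_if_missing(catalog, "revisit_date", "DateIndex")
second = add_index_if_missing(catalog, "revisit_date", "DateIndex")
```

Because the second call finds the index already present, it neither recreates nor reindexes it, which is the point of doing this step in Python.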


Custom index methods

The plone.indexer package provides a way to create custom indexing functions. These functions can be retrofitted to content types you do not directly control, so you do not need to touch the class code.

Note

This method is available since Plone 3.3.

An indexer is defined as an adapter for the content type marker interface(s) you wish to index. You declare the adapter in your ZCML (alternative ways exist).

Example ZCML snippet:

<!-- Adapter name must match indexer name -->
<adapter
    name="recurrence_days"
    factory=".indexing.recurrence_days"/>

The easiest way to create a function which extracts the indexed value from the content item is to use the function decorator from the plone.indexer package.

"""

    Create recurrence_days index for vs.event content types

"""

# Python imports
import logging

# Zope imports
import Missing
import zope.interface
from zope.component import ComponentLookupError

# Plone imports
from plone.indexer.decorator import indexer

# Custom imports
from dateable import kalends
from vs.event.interfaces import IVSEvent

# Do some logging from the indexer
logger = logging.getLogger("vs.event")

# Define which content items are indexed by a marker interface.
# Use zope.interface.Interface here instead if you want to match any Plone content item.
@indexer(IVSEvent)
def recurrence_days(object):
    """ Index the dates of recurrences as ordinals.

    Matches all VS.event content types and their subclasses
    which implement IVSEvent interface.

    This is called when

    * vs.event AT content type is saved and reindexObject() is called

    * portal_catalog rebuild is performed

    @param object: Indexed content item

    @return: Value to be stored on the index (format depends on the index type)

    .. note::

        Indexer swallows all raised exceptions silently.
    """

    # Some debug output to know whether our indexer was
    # called on save or not
    # Note: The default Plone log level is INFO, not DEBUG
    logger.debug("Indexing recurrence_days")

    try:
        # Get an adapter to extract recurrence information
        # from the content item via Zope adapter look-up.
        # (Calling the IRecurrence interface with an IVSEvent object
        # returns the result of the adapter factory call.)
        recurrence = kalends.IRecurrence(object)

        # Index all days when this event happens
        # for the upcoming 5 years.

        # This returns a list of dates converted to int objects,
        # up to many hundreds per content item...
        return recurrence.getOccurrenceDays()
    except (ComponentLookupError, TypeError, ValueError) as e:

        # For some reason, our content type didn't support
        # recurrence. Probably it is a vs.event subclass
        # which disables this behavior.

        # Indexer exceptions fail silently,
        # so try to do something about it
        logger.error("Indexing vs.event failed")
        logger.exception(e)

        # The catalog expects AttributeErrors when a value can't be found.
        # Use the special Zope object to signal that the value is missing
        # (this differs from None - we can index None as a value)
        return Missing.Value

When indexing happens

Content item reindexing is run when

  • The content item is edited (saved)
  • A portal_catalog rebuild is run (from the Advanced tab)
  • reindexObject() is called manually, which you might want to do in some cases

If you add a new index, you need to run Rebuild catalog to get the existing values from content objects into the new index.

Index types

The Zope 2 product PluginIndexes defines various portal_catalog index types used by Plone.

  • FieldIndex stores values as is
  • DateIndex and DateRangeIndex store dates (Zope 2 DateTime objects) in a searchable format. DateIndex indexes a single date per object; DateRangeIndex indexes a start and end date per object and matches objects whose range contains the queried date.
  • KeywordIndex allows keyword-style look-ups (the query term is matched against all values of a stored list)
  • ZCTextIndex is used for full text indexing
  • ExtendedPathIndex is used for indexing content object locations.
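
The difference in matching behavior between some of these index types can be sketched with plain functions. This is a strong simplification under stated assumptions: the function names are hypothetical, nothing here is PluginIndexes code, and the text matching only splits on whitespace and lowercases, whereas a real full text index uses a configurable lexicon.

```python
def field_match(stored_value, term):
    """FieldIndex-like: the stored value must equal the term as is."""
    return stored_value == term


def keyword_match(stored_keywords, term):
    """KeywordIndex-like: the term must equal one item of the stored list."""
    return term in stored_keywords


def text_match(stored_text, term):
    """Very rough full text matching: split into lowercased words."""
    words = set(stored_text.lower().split())
    return term.lower() in words


subjects = ["news", "sports"]

as_keyword = keyword_match(subjects, "news")
as_field = field_match(subjects, "news")
as_text = text_match("Plone is a content management system", "CONTENT")
```

The same stored list matches the term as a keyword but not as a field value, which is why the index type has to follow the kind of query you intend to run.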

Default Plone indexes and metadata columns

Some interesting indexes

  • start and end: Calendar event timestamps, used to build the calendar portlet
  • sortable_title: Title provided for sorting
  • portal_type: Content type as it appears in portal_types
  • Type: Translated, human-readable type of the content
  • path: Where the object is (getPhysicalPath accessor method).
  • object_provides: Which interfaces and marker interfaces the object provides. A KeywordIndex of full interface names.
  • is_default_page: Computed by an indexer function in CMFPlone/CatalogTool.py, registered through plone.indexer, which calls ptool.isDefaultPage(obj). There is no object.is_default_page attribute.

Some interesting columns

  • getRemoteURL: Where to go when the object is clicked
  • getIcon: Which content type icon is used for this object in the navigation
  • exclude_from_nav: If True, the object won’t appear in the sitemap or navigation tree

Indexing an object

Warning

Unit test warning: Usually Plone reindexes modified objects at the end of each request (each transaction). If you modify the object yourself, you are responsible for notifying the related catalogs about the new object data.

Indexing an object is done by calling the reindexObject() method, which is defined in the ICatalogAware interface.

Plone calls reindexObject() if

  • The object is modified by the user using the standard edit forms

You must call reindexObject() if you

  • Directly call object field mutators
  • Otherwise directly change object data

The reindexObject() method takes an optional argument idxs which lists the changed indexes. If idxs is not given, all related indexes are updated even if their values did not change.

Example:

object.setTitle("Foobar")

# reindexObject() must be called to reflect the changed data in portal_catalog.
# In our example, we change the title. Without reindexing, the new title would not
# show up in the navigation, since the navigation tree and folder listing pull
# the object title from the catalog.

object.reindexObject(idxs=["Title"])

Also, if you modify security related parameters (permissions), you need to call reindexObjectSecurity().

TextIndexNG3

TextIndexNG3 is an advanced text indexing solution for Zope.

Please read the TextIndexNG3 README.txt regarding how to add support for custom fields. Besides installing TextIndexNG3 through GenericSetup XML, you need to provide a custom indexing adapter.

  • Add TextIndexNG3 in catalog.xml. Example:

<index name="getYourFieldName" meta_type="TextIndexNG3">

  <field value="getYourFieldName"/>

  <autoexpand value="off"/>
  <autoexpand_limit value="4"/>
  <dedicated_storage value="False"/>
  <default_encoding value="utf-8"/>
  <index_unknown_languages value="True"/>
  <language value="en"/>
  <lexicon value="txng.lexicons.default"/>
  <query_parser value="txng.parsers.en"/>
  <ranking value="True"/>
  <splitter value="txng.splitters.simple"/>
  <splitter_additional_chars value="_-"/>
  <splitter_casefolding value="True"/>
  <storage value="txng.storages.term_frequencies"/>
  <use_normalizer value="False"/>
  <use_stemmer value="False"/>
  <use_stopwords value="False"/>
</index>

  • Create an adapter which will add TextIndexNG3 indexing support for your custom fields. Example:

import logging

from Products.TextIndexNG3.adapters.cmf_adapters import CMFContentAdapter
from zope.component import adapts

logger = logging.getLogger("Plone")

class TextIndexNG3SearchAdapter(CMFContentAdapter):
    """ Adapter which provides custom field specific index information for TextIndexNG3
    """

    # Your content marker interface here
    adapts(IDescriptionBase)

    def indexableContent(self, fields):
        """ Produce TextIndexNG3 indexing information for the object

        Traceback::

              ZCatalog.py(536)catalog_object()
            -> update_metadata=update_metadata)
              Catalog.py(360)catalogObject()
            -> blah = x.index_object(index, object, threshold)
              Products/TextIndexNG3/TextIndexNG3.py(91)index_object()
            -> result = self.index.index_object(obj, docid)
              Products/TextIndexNG3/src/textindexng/index.py(114)index_object()
            -> default_language=self.languages[0])
              Products/TextIndexNG3/src/textindexng/content.py(99)extract_content()
            -> icc = adapter.indexableContent(fields)
            > indexableContent()

        """
        logger.debug("Indexing " + str(self.context))

        # Use superclass to construct generic field adapters (id, title, description, SearchableText)
        icc = CMFContentAdapter.indexableContent(self, fields)

        # These fields have their own TextIndexNG3 indexes which
        # are queried separately from SearchableText
        accessors = [ "getClassifications", "getOtherNames" ]

        for accessor in accessors:

            try:
                method = getattr(self.context, accessor)
            except AttributeError:
                logger.warning("Declared indexing for unsupported accessor: " + accessor)
                continue

            value = method()

            # We might have a value which is not a real string,
            # so it must be stringified first
            try:
                value = unicode(value)
            except UnicodeDecodeError as e:
                # Bad things happen here?
                logger.warning("Failed to index field: " + accessor)
                logger.exception(e)
                continue

            # Convert value to text format (utf-8) expected
            # by the indexer
            text = self._c(value)

            icc.addContent(accessor, text, self.language)

        return icc

  • Add the adapter in your ZCML:

<adapter factory=".customcontent.TextIndexNG3SearchAdapter"/>