Using Documentum’s Verity Full Text Engine

October 31, 2004 by Blue Fish Development Group

Learn to configure and use the embedded Verity Full Text Engine within Documentum.

About This Article

This article describes the use of Verity’s Full Text Engine (FTE) with Documentum’s Content Server 5.2. It covers the basic principles of Documentum’s implementation of Verity’s FTE and the query language needed to perform powerful searches.

This document will prove helpful if:

  • You’re just beginning in your installation of Documentum and want to understand how to search documents and attributes.
  • You’ve already installed Documentum but want more detail around using the embedded full-text search.

To get the most from this article, you should already have:

  • A working knowledge of Documentum Content Server 5.2.
  • A working knowledge of how attributes and content are stored in Documentum.

Full Text Engine Overview

Verity’s Full Text Engine is a powerful search tool embedded within Documentum’s Content Server. It enables a user to search both attributes and content stored within the Docbase without leaving the Documentum environment. Using the full-text search a user can perform a comprehensive search of both attributes and content through the execution of a single query.

Full-text searching is enabled by creating a full-text index of the content and attributes on the Content Server. A full-text index is an index on the attributes and content files associated with dm_sysobject or dm_sysobject subtypes within the Docbase. Effective full-text indexes limit the need for Documentum to perform full table scans, which makes searches more efficient. Each storage area can have only one searchable index, but it can have many standby indexes, which are not searchable by end users but are useful for testing.

Documentum Content Server 5.2.5 embeds Verity FTE version 2.7.1b with Verity 7.3.1 filters.

Using the Full Text Engine

Indexing Object Type Attributes

An object type’s attributes are marked for indexing by setting the a_full_text attribute to true. By default all string attributes of dm_sysobject or dm_sysobject subtypes plus the r_creation_date attribute are indexed. Updating the a_full_text attribute can be accomplished through Documentum Administrator or through a DQL statement:

  • Adding an attribute to the full-text index (setting a_full_text to true)
        ALTER TYPE object_name ADD_FTINDEX on attribute_name

  • Removing an attribute from the full-text index (setting a_full_text to false)
        ALTER TYPE object_name DROP_FTINDEX on attribute_name

If you add a string attribute to an object type that already has indexed instances you must reset and update the full-text index before the new attribute will be included. Indexes may be reset and updated with the full-text administrative methods, RESET_FTINDEX and UPDATE_FTINDEX, respectively. The full-text administrative methods are fully explained in the last section, Full Text Administrative Methods.
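The statements involved in changing an indexed attribute can be generated rather than typed by hand. The following is a minimal Python sketch, not part of any Documentum API: the function names are hypothetical, and the statement shapes are copied from the ALTER TYPE and EXECUTE examples in this article. It only builds DQL strings; running them is left to whichever DQL client you use.

```python
import re
from typing import List

# Hypothetical helpers: only the statement shapes come from this article.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check(name: str) -> str:
    """Reject anything that is not a plain identifier before it reaches DQL."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"not a plain identifier: {name!r}")
    return name


def ftindex_statement(type_name: str, attribute: str, indexed: bool) -> str:
    """Build the ALTER TYPE statement that sets a_full_text to true or false."""
    action = "ADD_FTINDEX" if indexed else "DROP_FTINDEX"
    return f"ALTER TYPE {_check(type_name)} {action} on {_check(attribute)}"


def reindex_statements(index_name: str, batch_size: int = 10000) -> List[str]:
    """Build the reset-then-update pair needed after changing indexed attributes."""
    _check(index_name)
    return [
        f"EXECUTE RESET_FTINDEX with name = '{index_name}';",
        f"EXECUTE UPDATE_FTINDEX with name = '{index_name}', "
        f"batch_size = '{batch_size}';",
    ]
```

For example, ftindex_statement("my_doc", "project_code", True) followed by reindex_statements("filestore_01") yields the three statements to run, in order, for a hypothetical type my_doc stored in filestore_01.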

Indexed Objects

Only objects with an associated content file are included in the full-text index, even if the content contains only a single space. Objects without content, such as dm_folder objects, are not included in the index (i.e., the object’s attributes will not be indexed). For example, a contentless dm_document object will not have any of its attributes indexed. However, if an object has a content file that is not in an indexable format, the object’s attributes are indexed but its content is not.

Query Syntax Language

Once the full-text index has been configured, you can begin querying both the content and attributes through the full text engine. The full text engine supports 2 types of searches: Document Search and Topic Search.

Document Search

Document searches can be used for quick simple searching of specific words within an object’s content. Boolean operators such as AND, OR, and NOT are supported. To perform a document search against the Docbase add the SEARCH DOCUMENT CONTAINS clause to your DQL statement.

                
    SELECT * FROM DM_DOCUMENT SEARCH DOCUMENT CONTAINS 'searchword1' AND 'searchword2' OR NOT 'searchword3'
    
            

Capabilities not available through a document search include:

  • Searching attribute/value pairs
  • Phrases (whitespace), wildcards, and other relationships such as synonyms

Topic Search

Topic searches fill the void left by the limitations of simple document searches. A topic search has the same capabilities as a document search, but it also includes advanced searching such as attribute/value pairs, linguistic operations (near, stem), phrases, wildcards, and fuzzy search. It provides a more robust means of searching content and attributes by utilizing the Verity Query Language (VQL). To perform a topic search against the Docbase, add the SEARCH TOPIC clause to your DQL statement.

                SELECT * FROM DM_DOCUMENT SEARCH TOPIC 'searchword1' AND 'searchword2' OR NOT 'searchword3'
            

VQL utilizes a series of operators and modifiers within the search syntax. The operators and modifiers are grouped into categories: Concept Operators, Evidence Operators, Proximity Operators, Relational Operators, Search Modifiers, and Score Operators.

Concept Operators

Identifies a concept in a document by combining the meanings of search elements. Also known as Boolean operations.

Syntax Description Example
ACCRUE To find objects that include at least one of the search elements that you specify. Results are ranked based on the number of search terms found. 1) SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC 'blue <ACCRUE> fish';


2) SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC '<ACCRUE> (blue, fish)';




Results: Both queries return the same results: documents that contain blue or fish, with documents that contain both ranked higher.
, (comma) To find objects containing at least one of the words specified, ranking them using “the more, the better” approach, so objects with the most evidence of the words searched for are given the highest rank. Similar to ACCRUE. SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC 'blue, fish';




Results: Returns same results as ACCRUE
ALL To find objects that contain all of the search terms specified. ALL and AND retrieve the same results, but queries using ALL are always assigned a score of 1.00. SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC 'blue <ALL> fish';




Results: Returns all objects that contain blue and fish setting the relevance to 1.00
AND To find objects where all words/conditions are found. SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC 'blue <AND> fish';




Results: Returns all objects that contain blue and fish.
ANY To find objects where at least one word/condition is found. ANY and OR retrieve the same results, but queries using ANY are always assigned a score of 1.00. SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC 'blue <ANY> fish';




Results: Returns all objects that contain blue or fish setting the relevance ranking to 1.00.
OR To find objects where at least one word/condition is found. SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC 'blue <OR> fish';




Results: Returns all objects that contain blue or fish.

Evidence Operators

Specifies either a basic or an intelligent word search. A basic word search finds objects that contain only the word or words specified in the query. An intelligent word search expands the query terms to create an expanded word list so that the search returns objects that contain variations of the query terms. For example, the THESAURUS operator selects objects that contain the word specified, as well as its synonyms. This is also referred to as a “fuzzy” search.
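To make the idea of an expanded word list concrete, here is a sketch of the classic American Soundex code in Python. The SOUNDEX entry in the table below quotes a Soundex value of B400 for "blue"; whether Verity's sound-alike index uses exactly this classic algorithm is an assumption, so treat the code as an illustration of how sound-alike words can collapse to one key, not as Verity's implementation.

```python
# Classic American Soundex: letter groups that share a digit.
_CODES = {}
for _letters, _digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    """Return the four-character Soundex key for a word ('' if no letters)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0]
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")  # vowels (and Y) have no code
        if code and code != previous:
            key += code
        previous = code  # a vowel resets the run, so a repeat counts again
    return (key + "000")[:4]


def sound_alikes(term: str, vocabulary):
    """Expand a term to the vocabulary words that share its Soundex key."""
    target = soundex(term)
    return [w for w in vocabulary if soundex(w) == target]
```

With this sketch, "blue", "ball", "bill", and "bail" all map to the same key, which is why a sound-alike search on one of them can return the others.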

Syntax Description Example
SOUNDEX Expands the search to include the word that you enter and one or more words that sound like or whose letter pattern is similar to the word specified. Collections do not have sound-alike indexes by default; to use this feature you must build sound-alike indexes.

Objects are not relevance-ranked unless the MANY modifier is used.

SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC '<SOUNDEX> blue';




Results: Returns objects that contain words such as “blue”, “ball”, “bill”, “bail”, etc. Blue has a Soundex value of B400.
STEM Expands the search to include the word that you enter and its variations.

Objects are not relevance-ranked unless the MANY modifier is used.
SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC '<STEM> fish';




Results: Returns objects that contain words such as “fish”, “fishing”, “fished”, etc.
THESAURUS Expands the search to include the word that you enter and its synonyms. The default synonym index used by Documentum is called vdk20.syd. It is located under the verity installation folder on the Content Server.

Objects are not relevance-ranked unless the MANY modifier is used.
SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC '<THESAURUS> happy';




Results: Returns objects that contain words such as “happy”, “glad”, “joyful”, if they are included as synonyms to happy.
TYPO/N Expands the search to include the word that you enter plus words that are similar to the query term. This operator performs “approximate pattern matching” to identify similar words. The optional N variable in the operator name expresses the maximum number of errors between the query term and a matched term. If N is not specified, the default error distance is 2.

Note: A query term specified with TYPO/N can have a maximum length of 32 characters, and TYPO/N is not supported with multi-byte character sets. TYPO/N is impractical in performance-sensitive environments and in large collections (more than 100,000 documents) unless a current spanning word list is available. Generating a spanning word list for the collections being searched improves performance.
SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC '<TYPO/1> log';




Results: Returns objects that contain words such as “log”, “blog”, “lag”, “leg”, “flog”
WILDCARD Matches wildcard characters included in search strings. Wildcard characters include *, ?, {}, [], [^], or [-]. The asterisk (*) and question mark (?) automatically indicate a wildcard specification, so the use of <WILDCARD> is not necessary with them.

Documents are not relevance-ranked unless the MANY modifier is used.
SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC '<WILDCARD> "l[aeo]g"';




Results: Returns objects that contain words such as “log”, “lag”, “leg”. Note that the search term must be enclosed in double quotes (“”).
WORD A basic word search, selecting objects that include one or more instances of the search term. The WORD operator is automatically implied in any SIMPLE query.

Documents are not relevance-ranked unless the MANY modifier is used.
SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC '<WORD> fish';




Results: Returns objects that contain fish.
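The wildcard characters listed for the WILDCARD operator above map closely onto regular expressions, which gives a convenient way to check what a pattern should match before running it. The following Python sketch is an illustration only: the function names are hypothetical, the real matching happens inside Verity against the index, and the translation assumes the semantics described in this article (? for one alphanumeric character, * for zero or more, sets in square brackets, alternatives in braces).

```python
import re

ALNUM = "[A-Za-z0-9]"


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard term into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "?":
            parts.append(ALNUM)            # exactly one alphanumeric character
        elif ch == "*":
            parts.append(ALNUM + "*")      # zero or more alphanumeric characters
        elif ch == "[":
            end = pattern.index("]", i)    # ValueError if the set is not closed
            body = pattern[i + 1:end]
            if not re.fullmatch(r"\^?[A-Za-z0-9-]+", body):
                raise ValueError(f"unsupported set: [{body}]")
            parts.append("[" + body + "]")  # sets, ranges, and [^...] carry over
            i = end
        elif ch == "{":
            end = pattern.index("}", i)
            options = pattern[i + 1:end].split(",")
            parts.append("(?:" + "|".join(re.escape(o) for o in options) + ")")
            i = end
        else:
            parts.append(re.escape(ch))    # ordinary character
        i += 1
    return re.compile("".join(parts))


def matching_words(pattern: str, words):
    """Return the words that the wildcard pattern matches in full."""
    regex = wildcard_to_regex(pattern)
    return [w for w in words if regex.fullmatch(w)]
```

For example, matching_words("l[aeo]g", ["log", "lag", "leg", "lug"]) keeps the first three words and drops "lug", mirroring the WILDCARD example above.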

Proximity Operators

Specifies the relative location of specific words in the content. Specified words must be in the same phrase, paragraph, or sentence for an object to be retrieved.

Syntax Description Example
IN To find objects that contain specified values in one or more document zones. A document zone represents a region of a document, such as the document’s summary, date, or body text.

To use IN, zones must be defined for your collection by your administrator. If you use the IN operator to search collections without defined zones, no objects will be selected. Also, the zone name you specify must match the zone names defined in your collections.
To search for a term only within zones that meet certain conditions, qualify the IN operator with the WHEN operator.
SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC '("Content Server",Web Publisher) <IN> summary';




Results: Returns objects that contain the phrase “Content Server” or stemmed variations of Web Publisher in the summary.
NEAR To find objects containing specified search terms within close proximity to each other. Object scores are calculated based on the relative number of words between search terms. SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC '"Content" <NEAR> "Server"';




Results: Returns objects that contain the phrase “Content” and the phrase “Server” setting the relevancy ranking higher for documents with the words closer together.
NEAR/N To find objects containing two or more search terms within N number of words of each other, where N is an integer between 1 and 1024. NEAR/1 searches for two words that are next to each other. The closer the search terms are within an object, the higher the object’s score when they are separated by N words or less.
You can specify multiple search terms using multiple instances of NEAR/N as long as the value of N is the same.
SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC '"Content" <NEAR/1> "Server"';




Results: Returns objects that contain the phrase “Content” and the phrase “Server” next to one another. This gives the same results as searching for the phrase “Content Server”.
PARAGRAPH To find objects that include all of the words you specify within the same paragraph. To search for three or more words or phrases in a paragraph, you must use the PARAGRAPH operator between each word or phrase. SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC '"Apples" <PARAGRAPH> "Oranges" <PARAGRAPH> "Grapes"';




Results: Returns objects that contain the phrase “Apples”, the phrase “Oranges”, and the phrase “Grapes” within the same paragraph.
PHRASE To find objects that include a phrase you specify. A phrase is a grouping of two or more words that occur in a specific order.

By default, two or more words separated by a space are considered to be a phrase in simple syntax. Two or more words enclosed in double quotes are also considered to be a phrase.
1) SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC '"red" <PHRASE> "apple"';


2) SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC '<PHRASE> ("red","apple")';




Results: Both queries return the same results. Returns objects that contain the phrase “red apple”.
SENTENCE To find objects that include all of the words you specify within the same sentence. SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC '"red" <SENTENCE> "apple"';




Results: Returns objects that contain the phrase “red” and the phrase “apple” within the same sentence.
WHEN To find objects that contain specified values in one or more document zones upon which certain conditions have been placed.

A zone is a predefined section of content. Verity includes built-in support for HTML tags and treats each tag within an XML file as a zone. For example,
<partner>Documentum</partner> in an XML file represents the zone “partner”, and the value in the zone is Documentum.

Example 1: Search for all objects that have links labeled “new” that point to your company’s web site (e.g., <a href="www.foo.com/new">new</a>).

Solution: SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC '"new" <IN> a <WHEN> (href <CONTAINS> "foo")';




Example 2: Search for all software objects that have Documentum as a partner. In this example, the object’s xml file contains an element to store partner information. In Verity lingo, we are searching for all objects that have “Documentum” in a zone named “partner” when the attribute “type” is equal to “software”.

Solution: SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC '"Documentum" <IN> partner <WHEN> (type = "software")';
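The NEAR/N rule described above can be checked on a small piece of text with a few lines of Python. This is a sketch under stated assumptions: it treats "within N words" as a difference of at most N in word position (so that NEAR/1 means adjacent, as the table says), it uses a simple regular-expression tokenizer in place of Verity's, and it does not attempt to reproduce Verity's scoring. The function name is hypothetical.

```python
import re


def within_n_words(text: str, first: str, second: str, n: int) -> bool:
    """True if some occurrence of `first` is at most n word positions from `second`."""
    words = re.findall(r"\w+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        abs(a - b) <= n
        for a in first_positions
        for b in second_positions
    )
```

Under these assumptions, "Documentum Content Server 5.2" satisfies a NEAR/1 test for content and server, while "The content is stored on a server" needs N to be at least 5.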

Relational Operators

Search values of specific attributes that have been full-text indexed. These operators perform a filtering function by selecting objects that contain specified attribute values. Documents retrieved using relational operators are not relevance-ranked, and you cannot use the MANY modifier with relational operators.

Syntax Description Example
CONTAINS To find objects by matching the word or phrase that you specify with the values stored in a specific attribute field. Objects are selected only if the search terms specified appear exactly the same in the attribute value. SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC 'subject <CONTAINS> documentum';




Results: Returns objects whose subject attribute contains documentum
ENDS To find objects by matching the character string that you specify with the ending characters of the values stored in a specific attribute field. SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC 'subject <ENDS> Guide';




Results: Returns objects whose subject attribute ends with Guide (i.e. Development Guide, Help Guide, etc.)
MATCHES To find objects by matching the query string with values stored in a specific attribute field. Objects are selected only if the search elements specified match the attribute value exactly. If a partial match is found, the object is not selected. You can use ? and * to represent individual and multiple characters, respectively, within a string. SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC 'subject <MATCHES> Development Guide';




Results: Returns objects whose subject attribute equals Development Guide.
STARTS To find objects by matching the character string that you specify with the starting characters of the values stored in a specific attribute field. SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC 'subject <STARTS> Documentum';




Results: Returns objects whose subject attribute begins with Documentum.
SUBSTRING To find objects by matching the query string that you specify with any portion of the strings in a specific attribute field. SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC 'subject <SUBSTRING> server';




Results: Returns objects whose subject attribute contains server.
=, !=, <, >, <=, >= To find objects whose attribute value is equal to (=), not equal to (!=), less than (<), less than or equal to (<=), greater than (>), or greater than or equal to (>=) the value you specify.


These operators work well for numeric or date comparisons.
SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC 'subject = Development Guide';




Results: Returns objects whose subject attribute equals Development Guide

Search Modifiers

Changes Verity’s default behavior. These modifiers are used in conjunction with operators to change the standard behavior of an operator.

Syntax Description Example
CASE Specifies a case-sensitive search. Normally, Verity searches are case-insensitive for search terms entered in all uppercase or all lowercase, and case-sensitive for mixed-case search strings.

Use with the following operators: WORD or WILDCARD
SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC '<CASE> Server <OR> <CASE> server';




Results: Returns objects that contain Server or server but not SERVER.
MANY Counts the density of words, stemmed variations, or phrases in a document and produces a relevance-ranked score for retrieved objects. Use with the following operators: WORD, WILDCARD, STEM, SOUNDEX, PHRASE, SENTENCE, THESAURUS or PARAGRAPH.

The MANY modifier cannot be used with AND, OR, ACCRUE, or Relational operators.

SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC '<MANY> Content Server <AND> Documentum';




Results: Returns objects that contain the phrase Content Server and Documentum setting the relevancy higher on objects that contain a higher occurrence of Content Server.
NOT Excludes objects that contain the specified word or phrase. For use with any operator. SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC 'Documentum <NOT> Content Server';




Results: Returns objects that contain the phrase Documentum and do not contain the phrase Content Server.
ORDER Specifies that the search elements must occur in the same order in the object as they are specified in the query. Use with the following operators: PARAGRAPH, SENTENCE, ALL or NEAR/N.

The ORDER modifier should be placed before any operator
SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC '<ORDER><PARAGRAPH> ("Documentum", "server")';




Results: Returns objects that contain the phrase Documentum before server within a paragraph.
[X] Assigns a relative importance, or weight, to search terms from 1 to 100, where 1 represents the lowest importance and 100 represents the highest. SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC '[50]Server, [75]Documentum';




Results: Returns objects that contain the phrase Documentum or Server assigning a 75% relevancy to Documentum and a 50% relevancy to Server.

Score Operators

Adjusts the ranking of returned documents.

Syntax Description Example
COMPLEMENT Calculates scores for objects matching a query by taking the complement (subtracting from 1) of the scores for the query’s search elements. The new score is 1 minus the search element’s original score. SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC '<COMPLEMENT> "Blue Fish"';




Results: Returns all objects but recalculates their scores. If an object’s original score is .785, the COMPLEMENT operator recalculates the score as .215. In this query, all objects that do not contain Blue Fish return a score of 1.
PRODUCT Multiplies the scores for the search elements in each document matching a query. SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC '<PRODUCT> (blue, fish)';




Results: Returns all objects that contain both blue and fish (if either term scores 0, the product is 0). The final score is the product of the separate scores for blue and fish. If an object’s scores for blue and fish are .5 and .5 respectively, the PRODUCT operator recalculates the score as .25.
SUM Adds the scores for the search element in each document matching a query, up to a maximum value of 1. SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC '<SUM> (blue, fish)';




Results: Returns all objects that contain blue or fish (if either term returns a score > 0, the sum will be > 0). The final score is the sum of the separate scores for blue and fish. If an object’s scores for blue and fish are .5 and .5 respectively, the SUM operator recalculates the score as 1.
YESNO Forces the score of an element to 1 if the element’s score is nonzero. YESNO can be used to avoid relevance ranking. SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC '<YESNO> "Blue Fish"';




Results: Returns all objects that contain the phrase “Blue Fish”, setting the score to 1. If the retrieval result of the search on “Blue Fish” is 0.75, the YESNO operator forces the result to 1.
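The arithmetic behind the four operators above is simple enough to restate in code. The following Python sketch only reproduces the calculations described in the table, using scores between 0 and 1. The function names are hypothetical, and nothing here calls Verity.

```python
from typing import Iterable


def complement(score: float) -> float:
    """COMPLEMENT: 1 minus the search element's original score."""
    return 1.0 - score


def product(scores: Iterable[float]) -> float:
    """PRODUCT: multiply the scores of the search elements."""
    result = 1.0
    for score in scores:
        result *= score
    return result


def capped_sum(scores: Iterable[float]) -> float:
    """SUM: add the scores of the search elements, up to a maximum of 1."""
    return min(1.0, sum(scores))


def yesno(score: float) -> float:
    """YESNO: force any nonzero score to 1."""
    return 1.0 if score != 0 else 0.0
```

For example, product([0.5, 0.5]) gives 0.25 and capped_sum([0.5, 0.5]) gives 1.0, matching the PRODUCT and SUM results described above.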

Documentum Full Text Search Keywords

Documentum provides a set of full-text search keywords for both document and topic searches. The keywords are returned with the result set as part of the select statement. It isn’t necessary to include a SEARCH clause in the query to include the full-text keywords. However, for some keywords, such as SUMMARY or TEXT, the return values are not useful without a SEARCH clause. The table below lists the available search keywords:

Keyword Description Example
CONTENTID or MCONTENTID Returns the object id of the object representing the content file that matches the search criteria.

Use CONTENTID for objects with only one content file. Use MCONTENTID for objects with multiple pages (content files).
SELECT r_object_id, object_name, CONTENTID FROM dm_document SEARCH TOPIC 'Blue Fish';
HITS or MHITS Returns the number of times the search term/condition matched the object. This value is returned as an integer.

Use HITS for objects with only one content file. Use MHITS for objects with multiple pages (content files).
SELECT r_object_id, object_name, HITS FROM dm_document SEARCH TOPIC 'Blue Fish';
ISCURRENT Returns TRUE (1) if the object is the current version and FALSE (0) if it is not. SELECT r_object_id, object_name, ISCURRENT FROM dm_document SEARCH TOPIC 'Blue Fish';
ISPUBLIC Returns TRUE (1) if the object is a public object and FALSE (0) if it is not. SELECT r_object_id, object_name, ISPUBLIC FROM dm_document SEARCH TOPIC 'Blue Fish';
OBJTYPE Returns the r_object_type of each result. SELECT r_object_id, object_name, OBJTYPE FROM dm_document SEARCH TOPIC 'Blue Fish';
OFFSET Returns the location of the word(s) on the content page. This is expressed as the number of words from the beginning of the page.

This keyword is to be used in conjunction with TEXTPAGE and is only useful against content that is indexed from PDF renditions (not PDFText renditions).
SELECT r_object_id, object_name, OFFSET FROM dm_document SEARCH TOPIC 'Blue Fish';
PAGE_NO Returns the page number of the content that contains the search terms. SELECT r_object_id, object_name, PAGE_NO FROM dm_document SEARCH TOPIC 'Blue Fish';
POSITION Returns the offset of each object index entry from the beginning of the full-text indexed document. You must select the r_object_id attribute and order the results by r_object_id. This keyword can only be used for content in PDFText format, and it cannot be used with a repeating attribute in the select list. SELECT r_object_id, object_name, POSITION FROM dm_document SEARCH TOPIC 'Blue Fish' order by r_object_id;
SCORE or MSCORE Returns the relevance ranking for each object.

Use SCORE for objects with only one content file. Use MSCORE for objects with multiple pages (content files).
SELECT r_object_id, object_name, SCORE FROM dm_document SEARCH TOPIC 'Blue Fish';
SUMMARY Returns a summary of each object with the result set. This summary is four sentences defined by the Verity full text search engine after analyzing the object content. The sentences vary in length and may not exceed 255 characters. SELECT r_object_id, object_name, SUMMARY FROM dm_document SEARCH TOPIC 'Blue Fish';
TEXT Returns the word(s) matched by a non-specific full-text search criteria such as STEM or SOUNDEX.

Note: this returns a row for each word matched so multiple rows can be returned for each object. Use this keyword carefully.
SELECT r_object_id, object_name, TEXT FROM dm_document SEARCH TOPIC 'Blue Fish';
TEXTPAGE Returns the number of the page in PDF content that contains the word(s) in the search criteria.

This keyword is typically used with the OFFSET keyword for use by Adobe’s web-highlighting plug-in.
SELECT r_object_id, object_name, TEXTPAGE, OFFSET FROM dm_document SEARCH TOPIC 'Blue Fish';

Verity Syntax Rules

  • With the exception of AND, OR, and NOT, you must use angle brackets (<>) around all Verity operators. This tells Verity that you are using the operator rather than searching for the word in your document.
  • Special characters such as < and > must be escaped with a backslash (\) if you want to search for them in a query.
  • If a mixed-case entry is passed to a query, case sensitivity is applied to the search.
  • When using topic searches, the entire Verity search condition string after SEARCH TOPIC must be enclosed in single quotes ('').
  • AND, OR, or NOT must be within double quotes (“”) if they are included within a search phrase. This tells Verity that you are using the term and not the operator.
  • Stemming is the default behavior. To turn it off, put double quotes (“”) around each search term to make it a phrase.
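The quoting rules above are easy to get wrong when a query is assembled from user input, so it can help to centralize them. The following Python sketch uses hypothetical function names and applies only the rules listed here: angle brackets in a search term are escaped with a backslash, a term is wrapped in double quotes so that words such as AND are treated as text (which, per the last rule, also turns off stemming for that term), and the whole condition is wrapped in single quotes. Terms that themselves contain quote characters are rejected because this article does not say how to escape them.

```python
def literal_term(text: str) -> str:
    """Quote a search term so Verity treats it as text, not as an operator."""
    if "'" in text or '"' in text:
        # Escaping rules for embedded quotes are not covered here, so refuse.
        raise ValueError("quote characters are not supported in this sketch")
    escaped = text.replace("<", "\\<").replace(">", "\\>")
    return f'"{escaped}"'


def topic_query(select_list: str, type_name: str, condition: str) -> str:
    """Wrap a Verity condition in single quotes inside a SEARCH TOPIC clause."""
    return f"SELECT {select_list} FROM {type_name} SEARCH TOPIC '{condition}'"


# Usage: search for the literal words "and" and "fish" in the same object.
condition = literal_term("and") + " <AND> " + literal_term("fish")
query = topic_query("r_object_id, object_name, SCORE", "dm_document", condition)
```

Keeping operator text such as <AND> outside literal_term, and everything supplied by a user inside it, keeps the two roles visibly separate.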

Advanced Features

Verity contains a set of “style” files that may be configured to enable advanced search capabilities. These styles can be set through the SETSTYLE_FTINDEX method. The styles supported by Verity are:

  • Thesaurus (syd) – Allows a user to search for synonyms. The default thesaurus file is located in %Documentum%\fulltext\verity271\common\[locale]\vdk20.syd. Custom thesaurus files can be created with the Verity mksyd command-line tool.
  • Stop Word File (stp) – Defines words that should not be included in the index. The default stop file stops indexing of all single-character words and common prepositions, articles, adverbs, and conjunctions. The default stp file is located in %Documentum%\fulltext\verity271\common\[locale]\dm_default.stp.
  • Lex File (lex) – Defines what constitutes word breaks in text by defining which non-alphanumeric characters may be included in the index.
  • Universal Style file (uni) – Defines the filters used to index the indexable formats. The embedded version of Verity uses Verity’s 7.3.1 filters. If a new indexable format is added, you must create a custom style.uni file and reset the index. The default style.uni file is located in %Documentum%\fulltext\verity271\common\style\style.uni.
  • Zone file (zon) – Determines which section of the content is searched. Zone files are applicable to files with a structured format such as XML.
  • Topics – Groups topics related to a concept in a hierarchical relation (i.e., parent -> child). When a topic is searched, all objects contained within the topic and its subtopics are returned. The default installation does not provide any topic trees. If a custom topic file is created, you must run the Verity mktopics utility to convert the custom topic text file into the Verity topic format.

Frequently Asked Questions

  1. How do you create a custom Verity Synonym File?

    The default synonym file (%Documentum%\fulltext\verity271\common\[locale]\vdk20.syd) is in binary format. The following steps modify the file and update the full-text index with the changes:

    • Ensure %Documentum%\fulltext\verity271\_nti40\bin is set in the Classpath
    • Convert the binary file to a text file:

      Mksyd -dump -syd vdk20.syd -f text filename

    • Update the text file (e.g., list: “US,United States, USA”)
    • Convert the text file back to a binary file:

      Mksyd -f text filename -syd new syd filename

    • Run SETSTYLE_FTINDEX method setting the new syd file for your index:

      EXECUTE SETSTYLE_FTINDEX with name = 'filestore_01', verity_style = 'syd', server_path = 'your new syd file with path', locale='your locale2';

    • Set the dm_FullTextMgr and dm_CleanFTIndex jobs to inactive
    • Run RESET_FTINDEX method:

      EXECUTE RESET_FTINDEX with name = 'filestore_01';

    • Run UPDATE_FTINDEX method:

      EXECUTE UPDATE_FTINDEX with name = 'filestore_01', batch_size = '10000';

    • Reactivate the dm_FullTextMgr and dm_CleanFTIndex jobs
  2. How can you determine the indexable file formats?

    Indexable formats have an attribute can_index = true. Run the following query to list all indexable formats:

    select r_object_id, name from dm_format where can_index = true;

  3. How can an indexed attribute be dropped from an object type?

    Before an indexed attribute can be dropped, you must first drop the attribute’s full-text index. The following steps drop the attribute:

    • Set the dm_FullTextMgr and dm_CleanFTIndex jobs to inactive
    • Drop the attribute’s full-text index:

      ALTER TYPE object_name DROP_FTINDEX on attribute_name

    • Drop the attribute from the object

      ALTER TYPE object_name DROP attribute_name

    • Run RESET_FTINDEX method

      EXECUTE RESET_FTINDEX with name = 'filestore_01';

    • Run UPDATE_FTINDEX method

      EXECUTE UPDATE_FTINDEX with name = 'filestore_01', batch_size = '10000';

    • Reactivate the dm_FullTextMgr and dm_CleanFTIndex jobs
  4. What are the wildcards in Verity?
    • ? – Specifies one alphanumeric character. When the question mark is used, <WILDCARD> is unnecessary.
    • * – Specifies zero or more alphanumeric characters. When the asterisk is used, <WILDCARD> is unnecessary.
    • [ ] – Specifies one of any characters in a set. You must enclose the word that includes a set in backquotes (`), and there can be no spaces in a set.
      <WILDCARD> `c[au]t` finds cat and cut.
    • { } – Specifies one of each pattern separated by a comma. You must enclose the word that includes a pattern in backquotes (`), and there can be no spaces in a set.
      <WILDCARD> `cat{s,er}` finds cats and cater.
    • [^ ] – Specifies characters excluded from the set. The caret (^) must be the first character after the left bracket ([) that introduces a set.
      <WILDCARD> `l[^ai]p` excludes lap and lip.
    • [ - ] – Specifies a range of characters in a set.
      <WILDCARD> `c[a-u]t` finds every three-letter word from “cat” to “cut”.
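
To experiment with the wildcard rules listed above outside the Docbase, the following Python sketch translates a Verity-style pattern into a regular expression. It is an approximation only: the names verity_wildcard_to_regex and matches are hypothetical, matching is case-sensitive here, and Verity's own behavior (locale handling, stemming, case rules) is not modeled.

```python
import re

ALNUM = "[A-Za-z0-9]"  # approximation of "one alphanumeric character"


def verity_wildcard_to_regex(pattern):
    """Translate a Verity-style wildcard pattern into a regex string.

    Hypothetical helper for illustration; not part of Verity or Documentum.
    Raises ValueError if a '[' or '{' is never closed.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "?":
            out.append(ALNUM)              # exactly one alphanumeric
        elif ch == "*":
            out.append(ALNUM + "*")        # zero or more alphanumerics
        elif ch == "[":
            end = pattern.index("]", i)    # sets, ranges, and [^...] map directly
            out.append(pattern[i:end + 1])
            i = end
        elif ch == "{":
            end = pattern.index("}", i)    # {a,b} becomes an alternation
            options = pattern[i + 1:end].split(",")
            out.append("(?:" + "|".join(re.escape(o) for o in options) + ")")
            i = end
        else:
            out.append(re.escape(ch))      # literal character
        i += 1
    return "".join(out)


def matches(pattern, word):
    """Return True if the whole word matches the wildcard pattern."""
    return re.fullmatch(verity_wildcard_to_regex(pattern), word) is not None


if __name__ == "__main__":
    for pat, word in [("c[au]t", "cut"), ("cat{s,er}", "cater"), ("l[^ai]p", "lap")]:
        print(pat, word, matches(pat, word))
```

A translation like this is useful for checking which words a pattern is expected to cover before running the real query; the authoritative result is always what the full-text engine returns.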



Full Text Administrative Methods

If you add a string attribute to an object type that already has indexed instances, you must reset and update the index before the new attribute is indexed. The following list explains the administrative methods for creating and managing full-text indexes:

ADOPT_FTINDEX

Sets a standby full-text index as a storage area’s current index

  • name: name of the standby index. Use the name of the index’s full-text index object
                EXECUTE ADOPT_FTINDEX with name = 'standby_index1'
            

CHECK_FTINDEX

Check the update status for a full text index

  • name: index that is being checked
  • server_path: Optional. If this argument is set, the method copies the status file to the specified directory. The path must be visible to the server.
                EXECUTE CHECK_FTINDEX with name = 'filestore_01', server_path = 'D:/Verity/StatusFiles'
            

CLEAN_FTINDEX

Clean up temporary files from full text indexing and merge partitions within the full text collections. Merging makes it easier for the Verity Full Text Engine to open the collections, which enhances query performance. Before executing this method, set the dm_FullTextMgr and dm_CleanFTIndex jobs to inactive and ensure that no update, clean, or reset operation is being executed against the index. This method generates mkvdk log (status_optimize) files, which are stored in the full-text working directory.

  • name: name of the index being cleaned
                EXECUTE CLEAN_FTINDEX with name = 'filestore_01'
            

CLEAR_PENDING

Mark the content that participated in a specified update as non-indexable. This method is used to recover from index updates that do not complete successfully. Before executing this method, set the dm_FullTextMgr and dm_CleanFTIndex jobs to inactive and ensure that no other full-text methods are being executed against the index.

  • name: name of the index being cleared
                EXECUTE CLEAR_PENDING with name = 'filestore_01'
            

DUMP_FTINDEX

Dump a full text index for shadow indexing. It is recommended to run a CLEAN_FTINDEX before executing this method.

  • name: name of the index being dumped
  • file: name of resulting dump file. This must be the full path.
                EXECUTE DUMP_FTINDEX with name = 'filestore_01', file = 'D:/Verity/DumpFiles/dump.txt'
            

ESTIMATE_SEARCH

Estimate the number of objects that will be returned from a full text search (Content Server 5.1 and later only).

  • name: name of the index being searched
  • type: object_type to be searched
  • query: word or phrase being searched
                EXECUTE ESTIMATE_SEARCH with name = 'filestore_01', type = 'dm_document', query = 'Documentum'
            

GET_FTINDEX_SIZE

Get the current size (in bytes) of a full text index. This is useful to determine the amount of disk space an index is using.

  • name: name of the index being checked
                EXECUTE GET_FTINDEX_SIZE with name = 'filestore_01'
            

GETSTYLE_FTINDEX

Retrieve the topic set associated with a particular index or the style file used to index a particular document.

  • name: name of index (e.g., filestore_01)
  • verity_style: topic_trees, syd (thesaurus file), lex (lex file), stp (stop file), uni (style.uni file), zon (zone file)
  • sample_object: (Optional) Document whose style file you want to retrieve
  • server_path: specifies where to copy the retrieved style file. The path must be visible to the server.
                EXECUTE GETSTYLE_FTINDEX with name = 'filestore_01', verity_style = 'syd', sample_object = 'veritySampleFile', server_path = 'D:/Verity/StyleFiles'
            

LOAD_FTINDEX

Load a dumped full text index as a shadow index.

The file must be one created with the DUMP_FTINDEX method.

  • name: name of index where the shadow index will be loaded
  • file: the full path to the dump file
                EXECUTE LOAD_FTINDEX with name = 'filestore_01', file = 'D:/Verity/DumpFiles/dump.txt'
            

MARK_ALL

Mark all content for full text indexing whose associated object has a_full_text = TRUE. This is a backup method to catch documents that were not indexed. Before executing this method, set the dm_FullTextMgr and dm_CleanFTIndex jobs to inactive and ensure that no other full-text methods are being executed against the index. Messages are logged to %DOCUMENTUM%\dba\log\markall_[Docbase_name].log

  • trace: true or false (setting is false by default)
                EXECUTE MARK_ALL with trace = true
            

MARK_FOR_RETRY

Mark all content with a specified negative update_count value as awaiting indexing. This method is used as a recovery procedure after an unsuccessful UPDATE_FTINDEX. Before executing this method, set the dm_FullTextMgr and dm_CleanFTIndex jobs to inactive and ensure that no other full-text methods are being executed against the index.

  • name: index that contains the objects to mark
  • update_count: specify as a positive integer the update_count value that represents the update operation that was not successful. This is the r_update_count value stored in the dm_fulltext_index object.
                EXECUTE MARK_FOR_RETRY with name = 'filestore_01', update_count  = 5
            

MODIFY_TRACE

Change full text index tracing. This is not used for the MARK_ALL method.

  • subsystem: full-text – this indicates you want to turn on tracing for the full-text indexing operations
  • value: none (turn off tracing), Documentum (log only Content Server messages), verity (log only Verity FTE messages), or all (log all messages)
                EXECUTE MODIFY_TRACE with subsystem = 'full-text', value = 'all'
            

RESET_FTINDEX

Reinitialize a full text index. This method is used to reset the index after new indexable attributes have been added to an object type or after a new style has been set that needs to be applied to all content. After running this method, run UPDATE_FTINDEX to recreate the index. Before executing this method, set the dm_FullTextMgr and dm_CleanFTIndex jobs to inactive and ensure that no other full-text methods are being executed against the index.

  • name: index that is being reset
                EXECUTE RESET_FTINDEX with name = 'filestore_01'
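
The order of operations around a reset (deactivate the jobs, reset, update, reactivate the jobs) can be sketched as a small Python helper. It only assembles the statements in the required order and does not connect to a Docbase. The function name rebuild_statements is hypothetical, and the UPDATE dm_job statements rest on the assumption that jobs are toggled through the is_inactive attribute of dm_job; verify both that attribute and the exact job object names in your Docbase before using anything like this.

```python
def rebuild_statements(index_name,
                       jobs=("dm_FullTextMgr", "dm_CleanFTIndex"),
                       batch_size=10000):
    """Return the DQL statements for a reset-and-rebuild, in execution order.

    Hypothetical helper: it builds strings only and never talks to a server.
    Job names default to the spellings used in this article's steps.
    """
    def set_inactive(job, flag):
        # Assumption: dm_job.is_inactive controls whether the job is scheduled.
        return ("UPDATE dm_job OBJECTS SET is_inactive = %s "
                "WHERE object_name = '%s'" % (flag, job))

    statements = [set_inactive(job, "TRUE") for job in jobs]
    statements.append("EXECUTE RESET_FTINDEX with name = '%s'" % index_name)
    # batch_size is quoted to mirror the form used in this article's examples.
    statements.append(
        "EXECUTE UPDATE_FTINDEX with name = '%s', batch_size = '%d'"
        % (index_name, batch_size))
    statements += [set_inactive(job, "FALSE") for job in jobs]
    return statements


if __name__ == "__main__":
    for statement in rebuild_statements("filestore_01"):
        print(statement)
```

Keeping the sequence in one place makes it harder to forget the final reactivation step, which would otherwise leave the index without scheduled updates.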
            

RMSTYLE_FTINDEX

Remove a user-defined topic set or style file from an index. The FTE requires a thesaurus and a stop file, so it will not let a user remove these unless a backup exists.

  • name: index from which the style is being removed
  • install_loc: name of location file which points to the verity installation location. The default is the verity_location attribute in the Docbase config
  • locale: name of the verity locale. Must be used if verity_style is set to syd.
                EXECUTE RMSTYLE_FTINDEX with name = 'filestore_01', locale='en'
            

SETSTYLE_FTINDEX

Sets a specified installation style file. If using a custom style, it is recommended to set the custom file upon creation of the Docbase. A new style is only applied to indexes created after the style file is added to the system. Existing indexes are not affected by the new style file unless they are recreated.

  • name: index to associate with topic set/style
  • verity_style: syd (thesaurus file), lex (lex file), stp (stop file), uni (style.uni file), zon (zone file)
  • server_path: location of the topic set or style file to add
  • install_loc: name of location file which points to the verity installation location. The default is the verity_location attribute in the Docbase config
  • locale: name of the verity locale. Must be used if verity_style is set to syd.
                EXECUTE SETSTYLE_FTINDEX with name = 'filestore_01', verity_style = 'syd', server_path = 'D:/verityFiles/vdk20.syd', locale='en'
            

UPDATE_FTINDEX

Update a full text index. This method can be executed on searchable or standby indexes. The method executes in two phases: Phase 1 creates the batch files for indexing, and Phase 2 goes through each batch file and indexes the content files in each batch. Both phases must complete to update an index. Before executing this method, set the dm_FullTextMgr and dm_CleanFTIndex jobs to inactive and ensure that no other full-text methods are being executed against the index. Log files are automatically generated and stored in the full-text working directory (dm_ftwork_dir). The files are named status and status.last. If MODIFY_TRACE has been set, another log file is stored under %Documentum%\dba\log\fulltext.

  • name: index to recreate
  • phase: 1 (only the first phase is executed), 2 (only the second phase is executed). If not included, both phases are executed.
  • batch_size: specifies how much content to include in each batch. This only applies to the first phase operation. Do not set this argument above 20,000.
  • max_docs: specifies the approximate number of documents to be indexed. This only applies to the first phase operation. If you include this argument, do not set date_cutoff.
  • date_cutoff: defines the upper time boundary for choosing documents to index. Documents modified after the date specified are not included in the update. You cannot specify a date later than the current time. This only applies to the first phase operation. If you include this argument, do not set max_docs.
                EXECUTE UPDATE_FTINDEX with name = 'filestore_01', batch_size = '10000'
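
The argument rules for UPDATE_FTINDEX can be checked before a statement is sent to the server. The following Python sketch is one possible pre-flight check: the function name check_update_args and the wording of its messages are hypothetical, and it validates only the constraints stated in the argument list above (it does not build or run any DQL).

```python
from datetime import datetime

MAX_BATCH_SIZE = 20000  # upper limit stated in the argument list above


def check_update_args(phase=None, batch_size=None, max_docs=None,
                      date_cutoff=None, now=None):
    """Return a list of problems with a planned UPDATE_FTINDEX call.

    Hypothetical helper; an empty list means no rule from the argument
    list is violated. date_cutoff and now are datetime objects.
    """
    problems = []
    now = now or datetime.now()

    if phase not in (None, 1, 2):
        problems.append("phase must be 1, 2, or omitted")
    if batch_size is not None and batch_size > MAX_BATCH_SIZE:
        problems.append("batch_size must not exceed %d" % MAX_BATCH_SIZE)
    if max_docs is not None and date_cutoff is not None:
        problems.append("max_docs and date_cutoff are mutually exclusive")
    if date_cutoff is not None and date_cutoff > now:
        problems.append("date_cutoff cannot be later than the current time")
    if phase == 2:
        # These three arguments affect batch creation, which happens in phase 1.
        for arg_name, value in (("batch_size", batch_size),
                                ("max_docs", max_docs),
                                ("date_cutoff", date_cutoff)):
            if value is not None:
                problems.append("%s only applies to phase 1" % arg_name)
    return problems


if __name__ == "__main__":
    print(check_update_args(batch_size=10000))
    print(check_update_args(batch_size=25000, max_docs=500,
                            date_cutoff=datetime(2004, 10, 1)))
```

Returning a list of problems rather than raising on the first one lets an administrator see every conflicting argument at once.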
            

dm_CleanFTIndex

Job which executes the CLEAN_FTINDEX method

dm_FulltextMgr

Job which executes the UPDATE_FTINDEX method on a regular basis to update the full-text index as objects change in the Docbase.