Blue Fish Development Group
701 Brazos St. #700
Austin, TX 78701
(512) 469-9300

WebCache Intro and Q&A

December 3, 2002 - Article by Chris Wilper

Introduction

WebCache is a Documentum tool that allows quick access to content and, optionally, its associated attributes by storing the content on a flat filesystem and the attributes in an RDBMS.

WebCache is intended to let a website use content that exists in Documentum without the overhead of talking directly to the server. Once you have configured WebCache in your environment, you can write programs for your website that do anything you want with the content and its associated attributes.

WebCache consists of two major components:

WebCache Source: The machine on which the docbase resides. This component of WebCache is responsible for sending changed data and attributes to the WebCache target, either at periodic intervals (nightly, daily, hourly) or when triggered by the manual invocation of a dm_job.

WebCache Target: This consists of at least one component: a copy of the content from the source docbase that is to be cached. Optionally, it may also include a database component (the RDBMS doesn’t have to reside on the same machine), which provides attribute information on the content that has been cached to the filesystem.

This article describes several of the details about Documentum WebCache in a question/answer format.

Where do documents go on the target?

After a WebCache operation, the content files are saved on the filesystem of the WebCache target. The root location can be configured to be anywhere. Beneath that root, the files are arranged in directories that follow their path within the source docbase.

How, exactly, are attributes stored in the Target database?

All attributes of WebCached content are stored somewhere inside two special tables. The first part of the name of the tables can be configured to whatever you want. In this example, and throughout this document, we will assume WebCache has been configured to name these tables starting with PROPS.

Single-valued attributes are stored in a table called PROPS_S:

                
A_WEBC_URL                                    VARCHAR2(544)
I_CHRONICLE_ID                                VARCHAR2(16)
R_OBJECT_ID                                   VARCHAR2(16)
I_CONTENTS_ID                                 VARCHAR2(16)
OBJECT_NAME                                   VARCHAR2(255)
R_VERSION_LABEL                               VARCHAR2(32)
R_FOLDER_PATH                                 VARCHAR2(255)
I_FULL_FORMAT                                 VARCHAR2(32)
 
            

More columns are added to the single-valued attribute table if you configure WebCache to use additional attributes from your source documents (any attribute can be used, but you must specify which ones).

The A_WEBC_URL column is the unique identifier of the content that is being described. It looks like a path. The A_WEBC_URL is a key into the multi-valued property table, too.

Multi-valued attributes are stored in a table called PROPS_R:

                
A_WEBC_URL                                    VARCHAR2(544)
STATES                                        VARCHAR2(32)
 
            

In this example, STATES is the name of a repeating attribute of a custom object type. There will be as many rows in this table for a given A_WEBC_URL as there are STATES for that document.
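To make the relationship between the two tables concrete, here is a minimal sketch of reading one document's single-valued and repeating attributes together. It uses an in-memory SQLite database as a stand-in for the target RDBMS, with only a subset of the columns shown above; the sample row values and the `states_for` helper are hypothetical. Ordering by SQLite's `rowid` is an assumption of this stand-in, since the article does not say how the real target exposes the order of repeating values.

```python
import sqlite3

def states_for(conn, webc_url):
    """Return (object_name, [states...]) for one cached document, or None."""
    rows = conn.execute(
        "SELECT s.OBJECT_NAME, r.STATES "
        "FROM PROPS_S s "
        "LEFT JOIN PROPS_R r ON r.A_WEBC_URL = s.A_WEBC_URL "
        "WHERE s.A_WEBC_URL = ? "
        "ORDER BY r.rowid",  # stand-in for whatever preserves order on the real target
        (webc_url,),
    ).fetchall()
    if not rows:
        return None
    object_name = rows[0][0]
    # A document with no repeating values yields one row with STATES = NULL.
    return object_name, [state for _, state in rows if state is not None]

# Stand-in tables with illustrative data (not taken from a real target).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PROPS_S (A_WEBC_URL TEXT, OBJECT_NAME TEXT)")
conn.execute("CREATE TABLE PROPS_R (A_WEBC_URL TEXT, STATES TEXT)")
conn.execute("INSERT INTO PROPS_S VALUES ('TestFolder/testdoc1', 'testdoc1')")
conn.executemany(
    "INSERT INTO PROPS_R VALUES (?, ?)",
    [("TestFolder/testdoc1", "TX"), ("TestFolder/testdoc1", "CA")],
)

result = states_for(conn, "TestFolder/testdoc1")
print(result)  # ('testdoc1', ['TX', 'CA'])
```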

Things to keep in mind about repeating attributes:

  • The order of repeating attributes is preserved. That means that if you put a set of values in a specific order in Documentum, you can expect to find them in the same order within the WebCache target DB’s PROPS_R table.
  • If you have multiple repeating attributes, then for each A_WEBC_URL there will be as many rows as the largest number of values held by any one of those attributes. Where an attribute has fewer values, its column is NULL in the remaining rows. For example, with repeating attributes ABC and XYZ, a document that has 3 ABC values and 10 XYZ values gets ten rows for its A_WEBC_URL, and in seven of them the value for ABC is NULL.
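The row-padding rule for multiple repeating attributes can be illustrated with a short sketch. This only models the shape of the rows described above, it is not how WebCache itself builds them; the `repeating_rows` helper and the sample values are hypothetical.

```python
from itertools import zip_longest

def repeating_rows(webc_url, **attrs):
    """Model the PROPS_R rows for one document.

    Each keyword argument is a repeating attribute and its ordered values.
    Shorter attributes are padded with None (NULL in the database).
    """
    value_lists = list(attrs.values())
    return [
        (webc_url, *values)
        for values in zip_longest(*value_lists, fillvalue=None)
    ]

# 3 ABC values and 10 XYZ values, as in the example above.
rows = repeating_rows(
    "TestFolder/doc",
    ABC=["a1", "a2", "a3"],
    XYZ=[f"x{i}" for i in range(1, 11)],
)
null_abc = sum(1 for _url, abc, _xyz in rows if abc is None)
print(len(rows), null_abc)  # 10 7
```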

How do you control which documents are published to the cache?

In the WebCache configuration object in the docbase, you define a starting folder, a version, and (optionally) an effective label.

There is no configurable “where” clause. However, if you have Documentum WebPublisher installed, you can publish one document at a time, given its object ID. So, you could emulate “where” clause behavior by writing your own program that retrieves a list of content from Documentum and then calls this special WebPublisher publish method for each of those objects.
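The shape of such a program might look like the sketch below. Everything here is hypothetical: `publish_one` stands in for whatever call your installation uses to publish a single object by ID (the article does not name it), and the candidate list would in practice come from a query against the docbase rather than a hard-coded list.

```python
def publish_matching(candidates, predicate, publish_one):
    """Publish only the objects for which predicate(obj) is true.

    candidates  -- iterable of dicts describing docbase objects
    predicate   -- the emulated "where" clause
    publish_one -- hypothetical callable that publishes one object by ID
    Returns the list of object IDs that were handed to publish_one.
    """
    published = []
    for obj in candidates:
        if predicate(obj):
            publish_one(obj["r_object_id"])
            published.append(obj["r_object_id"])
    return published

# Illustrative data and a fake publisher that just records its calls.
docs = [
    {"r_object_id": "0900000180000101", "object_name": "a.txt", "STATES": ["TX"]},
    {"r_object_id": "0900000180000102", "object_name": "b.txt", "STATES": ["CA"]},
    {"r_object_id": "0900000180000103", "object_name": "c.txt", "STATES": ["TX", "CA"]},
]
calls = []
sent = publish_matching(docs, lambda d: "TX" in d["STATES"], calls.append)
print(sent)  # ['0900000180000101', '0900000180000103']
```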

WebCache optionally pays attention to the a_effective_label, a_effective_date, and a_expiration_date attributes of each document. If a document has an a_effective_label matching the effective label specified in the WebCache configuration object, it will be made available on the target only for the period of time between the a_effective_date and the a_expiration_date specified for that document.
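As a rough model of that date check, assuming the availability window runs from a_effective_date up to a_expiration_date, a website-side test could look like this. The boundary handling (inclusive start, exclusive end) and the treatment of a missing date as "no limit" are assumptions of this sketch, not documented WebCache behavior.

```python
from datetime import datetime

def is_live(now, effective_date=None, expiration_date=None):
    """Return True if `now` falls inside the document's availability window.

    A missing date is treated as "no limit" on that side (an assumption).
    """
    if effective_date is not None and now < effective_date:
        return False  # not yet effective
    if expiration_date is not None and now >= expiration_date:
        return False  # already expired
    return True

start = datetime(2002, 12, 1)
end = datetime(2003, 1, 1)
print(is_live(datetime(2002, 12, 3), start, end))   # True
print(is_live(datetime(2002, 11, 30), start, end))  # False
print(is_live(datetime(2003, 1, 2), start, end))    # False
```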

Note: Because Documentum WebPublisher uses these special attributes, they shouldn’t be used in conjunction with WebCache if WebPublisher is running on the source docbase.

How does WebCache support multiple renditions?

Multiple formats for objects are supported. In the WebCache configuration object in the docbase, you specify which formats should be published when more than one exists.

When multiple formats exist, they are placed in the same directory. The ‘primary’ format’s filename is the object name. Other formats of the same object are named as the object name (minus the extension, if one exists), plus the dos_extension of the format (from dm_format).

For example, say you have a document called testdoc2.txt. In the docbase, here are the relevant attributes:

                
     DM_DOCUMENT table    DM_FORMAT table
       ______|______      ______|______
      |             |    |             |
      |             |    |             |
OBJECT_NAME     I_FULL_FORMAT    DOS_EXTENSION
------------    -------------    -------------
testdoc2.txt    crtext           txt
testdoc2.txt    html             htm

            

The target database gets a unique A_WEBC_URL entry for each format. The I_FULL_FORMAT value is also propagated to the target DB:

                
A_WEBC_URL                   I_FULL_FORMAT
----------------------       -----------------------
TestFolder/testdoc2.txt      crtext
TestFolder/testdoc2.htm      html

            

The A_WEBC_URL represents the path to the document, relative to the root directory of WebCache’s file dumps on the target.
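The naming rule for renditions, and the way an A_WEBC_URL maps onto the target filesystem, can be sketched as follows. The function names and the `/data/webcache` root are hypothetical; the sketch simply restates the rule described above (primary format keeps the object name, other formats swap in the format's dos_extension).

```python
import posixpath

def rendition_name(object_name, dos_extension, primary=False):
    """File name for one rendition of an object."""
    if primary:
        return object_name  # the primary format keeps the object name as-is
    stem, _old_ext = posixpath.splitext(object_name)
    return f"{stem}.{dos_extension}"

def webc_url(folder_path, object_name, dos_extension, primary=False):
    """Path of a rendition relative to the WebCache root on the target."""
    return posixpath.join(
        folder_path, rendition_name(object_name, dos_extension, primary)
    )

def target_path(webcache_root, url):
    """Absolute location of a cached file, given the configured root."""
    return posixpath.join(webcache_root, url)

primary_url = webc_url("TestFolder", "testdoc2.txt", "txt", primary=True)
html_url = webc_url("TestFolder", "testdoc2.txt", "htm")
print(primary_url)  # TestFolder/testdoc2.txt
print(html_url)     # TestFolder/testdoc2.htm
print(target_path("/data/webcache", html_url))  # /data/webcache/TestFolder/testdoc2.htm
```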

Note: Because documents are given more or less “standard” extensions during the WebCache process, if you’re serving them directly from a webserver, the target webserver should deliver them with the correct MIME type. For non-standard extensions, you may need to add those manually to your webserver’s configuration. Relevant MIME type data can be gathered from the mime_type and dos_extension fields in the docbase’s dm_format table.
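If your webserver happens to be Apache, one way to apply this is to turn (mime_type, dos_extension) pairs read from dm_format into `AddType` directives. The sketch below assumes you have already fetched those pairs by some means; the sample rows are illustrative and not copied from a real dm_format table.

```python
def addtype_lines(format_rows):
    """Build Apache AddType directives from (mime_type, dos_extension) pairs.

    Rows with a missing MIME type or extension are skipped.
    """
    by_mime = {}
    for mime_type, dos_extension in format_rows:
        if mime_type and dos_extension:
            ext = "." + dos_extension.lstrip(".")
            by_mime.setdefault(mime_type, []).append(ext)
    return [
        f"AddType {mime} {' '.join(exts)}"
        for mime, exts in sorted(by_mime.items())
    ]

# Illustrative rows; the last one has no MIME type and is ignored.
sample = [("text/plain", "txt"), ("text/html", "htm"), (None, "xyz")]
lines = addtype_lines(sample)
for line in lines:
    print(line)
# AddType text/html .htm
# AddType text/plain .txt
```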

Is it possible to publish from two distinct WebCache sources to one target?

Documentum says that this shouldn’t be attempted because files will end up over-writing each other and it will end up being a big mess.

Even so, it’s sometimes desirable to have data and attributes from separate docbases available on the same website.

You could do this by:

  • (for content files) Setting up multiple targets, and pretending they are one target from the webserver side. You can create symbolic links into the target content directories from your webserver, or (a more drastic approach) write a website front-end that doesn’t hit the filesystem based on the URL, but instead takes the request and decides which target WebCache file area to retrieve it from.
  • (for attributes) Publish to differently named tables for each of the multiple targets. Set database triggers on these tables which will reflect changes to a master table on-the-fly. This way, you’ll have only one table to query for attributes, instead of two.
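For the content-file half of this approach, the "front-end decides which target area to use" idea could be sketched as a lookup that tries each target root in turn. The function name, the first-match-wins policy, and the directory layout in the demo are assumptions for illustration only.

```python
import os
import tempfile

def find_in_targets(webc_url, target_roots):
    """Return the first existing file for webc_url among several target roots.

    Paths that would resolve outside a root (e.g. via "..") are ignored.
    Returns None when no target holds the file.
    """
    for root in target_roots:
        root = os.path.normpath(root)
        candidate = os.path.normpath(os.path.join(root, webc_url))
        if not candidate.startswith(root + os.sep):
            continue  # refuse anything that escapes this target's root
        if os.path.isfile(candidate):
            return candidate
    return None

# Demo with two throwaway target directories.
with tempfile.TemporaryDirectory() as base:
    root_a = os.path.join(base, "target_a")
    root_b = os.path.join(base, "target_b")
    os.makedirs(os.path.join(root_a, "TestFolder"))
    os.makedirs(os.path.join(root_b, "OtherFolder"))
    with open(os.path.join(root_b, "OtherFolder", "doc.txt"), "w") as fh:
        fh.write("hello")

    found = find_in_targets("OtherFolder/doc.txt", [root_a, root_b])
    found_rel = os.path.relpath(found, base)
    missing = find_in_targets("OtherFolder/nope.txt", [root_a, root_b])
    escaped = find_in_targets("../target_b/OtherFolder/doc.txt", [root_a])
```

Trying the roots in a fixed order means that if two targets ever hold the same relative path, the earlier root wins; whether that is acceptable depends on how your docbases' folder structures overlap.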

What gets copied to the cache when a source document is linked to another folder?

If the links reside under the same root source folder configured for WebCache, a copy is made on the target for each instance of the document. If RDBMS functionality is enabled for WebCache, a set of attributes also exists in the database for each copy.

                
$ ls -la
total 48
drwxr-xr-x   3 dmadmin  staff        512 Jun  1 13:57 .
drwxr-xr-x   3 dmadmin  staff        512 May 24 15:51 ..
drwxr-xr-x   2 dmadmin  staff        512 Jun  1 13:57 InnerFolder
-rw-r--r--   1 dmadmin  staff         23 May 24 16:07 testdoc1
-rw-r--r--   1 dmadmin  staff      18603 May 24 15:51 testdoc2.htm
-rw-r--r--   1 dmadmin  staff         46 May 24 15:51 testdoc2.txt
$ cd InnerFolder
$ ls -la
total 6
drwxr-xr-x   2 dmadmin  staff        512 Jun  1 13:57 .
drwxr-xr-x   3 dmadmin  staff        512 Jun  1 13:57 ..
-rw-r--r--   1 dmadmin  staff         23 Jun  1 13:57 testdoc1

            

Can contentless objects be exported?

Although the documentation states that they can, it is currently not possible (as of WebCache version 4.2). This has been reported as a bug.

A workaround is to attach a 0-byte piece of content to items that would otherwise have no content. If a version more recent than 4.2 has been released since this article was published, the workaround may no longer be needed.

Does WebCache copy virtual documents or multiple versions of the same document?

When a document is copied, only one version gets pushed to a given WebCache target.

When a virtual document is copied, if all its components are present in the to-be-webcached directory, they will be copied, but the parent/child relationships will not.

To get around the virtual document limitation, you could:

  • Not use/rely on virtual documents for your website
    – or –
  • Instead of using the built-in relationship management mechanism for parent/child VDoc relationships, you could use your own attribute (for example, a new attribute called child_object_ids and/or parent_object_ids)

To get around the multiple versions limitation, you could:

  • Create a job in Documentum to split up the versions beforehand, into separate folders. Then do multiple WebCache jobs… one for each source folder.
    – or –
  • Create a job in Documentum to split up the versions into different objects beforehand, each with a name that indicates its version. Then copy them out into the to-be-webcached folder and do a WebCache job. You should get all versions that way.
    – or –
  • Create multiple WebCache targets for the same source. Each target would be configured to copy a specific version. Of course, this creates the problem that multiple targets aren’t seen as one cohesive set. See the answer to the question “Is it possible to publish from two distinct WebCache sources to one target?” for ideas on making two targets appear as one.

One Comment

Hello,

We are having an issue where documents that have passed their expiration dates still have their attributes available in the RDBMS, even though the document itself is successfully removed and its lifecycle state in Documentum changes to the removed state.
We have a publishing configuration created, and it has “include contentless properties” checked. Could this be causing the problem?
Please help me out.

Sahana NC | March 22nd, 2009 9:09 pm
