If you are facing the challenge of moving a massive – or even just a respectable – amount of content and metadata into an Enterprise Content Management system or from one system to another, you immediately face two project-threatening risks:
- Being seduced by the myopic promises of technology-driven solutions
- Walking off the end of the plank in blissful ignorance
Either extreme can be equally hazardous, as can all the points in between. At Blue Fish Development Group, we are very proud of the depth, breadth, and flexibility of our migration tool set and the technical savvy of our staff. However, some of the biggest challenges in any migration effort have nothing to do with technology. They lie dormant in the world of the business users, and they don’t start creating havoc until someone or some migration process begins to poke at them.
The first step in deciding on the best approach for your migration is to determine whether it will be a concentrated, all-at-once effort confined to a relatively short period of time (a “big bang”) or a staged effort performed in batches over a longer period (a “drip feed”). Base this decision on the following considerations:
- Are you migrating content and related metadata, technology, or both?
- Are application layer changes required?
- Will your business permit a period of downtime or a system freeze?
- How large is your user base and what level of training is required on the new system?
A brief discussion of each of these considerations follows.
Content, technology, or both?
What appears to be a single migration effort may actually consist of multiple migrations. Are you upgrading to a newer version of one or more Documentum components? (Documentum refers to this as an “in-place upgrade.”) Are you primarily moving content from one repository to another, or possibly into a repository for the first time? Or are you doing both simultaneously? These questions identify the types of resources required to perform the migration tasks and provide an indication of the complexity and coordination that will be required.
Are application layer changes required?
Do the source and target repositories use an out-of-the-box Documentum user interface, such as Webtop? If so, then you only need to confirm that the versions deployed on the source and target repositories are compatible.
However, if the source repository has a customized application layer that will also be used on the target repository, then several factors come into play. Is the application layer built on the Web Developer Kit (WDK) framework or another development platform? Will it be enhanced or otherwise modified for the target repository? Addressing these questions will identify additional resource requirements and related skill sets. It will also identify schedule dependencies and facilitate the coordination of migration activities with application development and testing activities.
What about downtime?
Will your business allow downtime or a system freeze during which the content and metadata are migrated from the source(s) to the target repository? Or will 24/7 access or other demands require that the migration be transparent and seamless? Maintaining system availability is certainly achievable, but it generally carries additional costs in terms of resources – both human and computer – and may require splitting the content into smaller batches for incremental migration.
Downtime is generally a good thing, allowing those responsible for the migration to focus on the task at hand and removing the distractions of system availability. However, the apparent luxury of downtime can be a double-edged sword: It sounds comforting to know that you can take the system(s) offline while performing the migration, but the pressure of completing the effort within the specified period creates its own set of stresses and requires contingency planning.
Somewhere between 100% uptime and a specified period of downtime is the concept of a system freeze. Users have read-only access to the system for existing content, but they are not allowed to edit that content or create new content during the freeze period. Another compromise is to divide the content to be migrated into critical (must remain accessible during the migration) and non-critical (can be unavailable for a period of time).
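Once content has been divided into critical and non-critical groups, the batch plan itself can be derived mechanically. The following is a minimal Python sketch under stated assumptions, not part of any particular migration tool set: it assumes each document record carries a hypothetical `critical` flag, and it uses small batches for critical content so that any single outage stays short, while non-critical content moves in larger batches during a freeze or downtime window.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Document:
    object_id: str  # hypothetical identifier
    critical: bool  # hypothetical flag: must remain accessible during the migration


def chunk(items: List[Document], size: int) -> List[List[Document]]:
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def plan_batches(docs: List[Document], critical_size: int,
                 bulk_size: int) -> Dict[str, List[List[Document]]]:
    """Partition content into critical and non-critical groups, then batch each group."""
    critical = [d for d in docs if d.critical]
    non_critical = [d for d in docs if not d.critical]
    return {
        # Small batches keep each critical document's unavailable window short.
        "critical": chunk(critical, critical_size),
        # Non-critical content can move in larger batches, e.g. during a freeze.
        "non_critical": chunk(non_critical, bulk_size),
    }


docs = [Document(f"doc{i}", critical=(i % 3 == 0)) for i in range(10)]
plan = plan_batches(docs, critical_size=2, bulk_size=5)
print([len(b) for b in plan["critical"]])      # [2, 2]
print([len(b) for b in plan["non_critical"]])  # [5, 1]
```

The batch sizes are placeholders; in practice they would come from measured throughput and the length of the available freeze or downtime window.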
There is no one approach that fits every migration initiative. The key is evaluating the options and crafting an approach that is a win-win scenario for the consumers of the content and the IT resources responsible for performing the migration.
User base and training
How will your user base be affected by the migration? Changes that can impact them include:
- Migrating to a newer version of Documentum and utilizing new features
- Making changes to the application layer
- Changing the object model, adding new document types, or combining existing document types
- Using a different cabinet/folder structure
- Enriching or transforming the metadata during the migration so that data is in different locations or named differently
If the impact on your user base is minimal, then training probably is not an issue. However, if the changes require training, then you will need to create the appropriate training plan and coordinate it with migration and rollout activities.
The size of your user base also factors into the equation. The more users you have, the more impact even small changes will have, and the more likely it is that you will need to train users in groups over an extended period of time.
Coordinating the requirements of each of these considerations will shape the migration approach that is right for your business. Once you have decided whether to follow a big bang approach, a drip feed, or a combination of the two, you’ll be ready to move on to other high-level considerations.
In a perfect world, the metadata in the source repositories would be complete and consistent. However, experience has taught us that incomplete or inconsistent metadata is the single biggest risk to a successful migration. The effects of “bad data” can be frustrating and costly. A successful migration plan includes an up-front effort to ascertain the condition of the metadata and validate – or reset – assumptions about the state of that data.
The common issues fall into three categories:
- Consistency – Are the attribute values consistently applied within a repository and, if applicable, across multiple repositories? For example, do all date fields follow a consistent format, such as dd/mm/yyyy? Or are some fields mm/dd/yyyy or mm/dd/yy? Is capitalization consistent across all values in an attribute, or will upper- and lowercase letters require manipulation to make them consistent?
- Completeness – If a migration step or a downstream repository process depends on a valid attribute value, does every document slated for migration contain a valid value for the required attribute? For example, if the current_document_owner attribute requires a valid system user, are all values for this attribute in the source repository also valid system users in the target repository? If not, there will be an error when importing the document. A second example shows the negative effects of missing data on a downstream repository process: If each document imported into the target repository will be assigned a state based on a combination of the version_number and date_modified attributes from the source, then a missing value in either of these attributes will cause the assignment of a state to fail.
- Corruption – Broken references or otherwise “corrupted” metadata in a source repository can make exporting the data difficult or even impossible. Corruption can be caused by administrative neglect (like not running periodic cleanup jobs) or bugs within the repository system itself. For example, an object may reference a nonexistent object, access control list (ACL), or user. Business users are often not aware that their source metadata is corrupted until they try to export it. Depending on the extent of the data corruption, it may be necessary to fix the problems manually before attempting an export.
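Many of these issues can be detected with a simple audit pass over exported metadata before any import is attempted. The Python sketch below assumes each document’s metadata has been exported as a dictionary. The attribute names `version_number`, `date_modified`, and `current_document_owner` come from the examples above; `object_id`, `acl_name`, and the sets of valid target users and existing source ACLs are hypothetical inputs you would gather yourself. Note that a format check alone cannot catch an mm/dd/yyyy value such as 03/04/2020 that also happens to be a valid dd/mm/yyyy date.

```python
from datetime import datetime
from typing import Dict, List, Optional, Set

Metadata = Dict[str, Optional[str]]
EXPECTED_DATE_FORMAT = "%d/%m/%Y"  # assumed agreed format: dd/mm/yyyy


def is_valid_date(value: str, fmt: str = EXPECTED_DATE_FORMAT) -> bool:
    """True if the value parses in the expected format (%Y requires a 4-digit year)."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def audit_document(doc: Metadata, target_users: Set[str],
                   source_acls: Set[str]) -> List[str]:
    """Return a list of human-readable problems found in one document's metadata."""
    problems = []
    # Completeness: attributes a downstream process depends on must have values.
    for attr in ("version_number", "date_modified"):
        if not doc.get(attr):
            problems.append(f"missing {attr}")
    # Consistency: dates must follow the agreed format.
    modified = doc.get("date_modified")
    if modified and not is_valid_date(modified):
        problems.append(f"date_modified not in expected format: {modified}")
    # Completeness: the owner must exist as a user in the target repository.
    owner = doc.get("current_document_owner")
    if owner not in target_users:
        problems.append(f"owner not a target user: {owner}")
    # Corruption: the referenced ACL must actually exist in the source repository.
    acl = doc.get("acl_name")
    if acl not in source_acls:
        problems.append(f"dangling ACL reference: {acl}")
    return problems


def audit(docs: List[Metadata], target_users: Set[str],
          source_acls: Set[str]) -> Dict[str, List[str]]:
    """Map each problem document's object_id to its list of issues."""
    report = {}
    for doc in docs:
        problems = audit_document(doc, target_users, source_acls)
        if problems:
            report[str(doc.get("object_id"))] = problems
    return report
```

Running a report like this early, and sharing it with the business users, turns vague worries about “bad data” into a concrete list that can be sized and scheduled.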
Determining the state of the metadata early in the migration effort allows you to plan appropriately for any data scrubbing, manipulation, or transformation that is required. Discovering these issues later in the process creates havoc with schedules, taxes resources, and generates unnecessary frustration.
Every content migration has two main business user activities:
- Moving, transforming, or enriching the metadata
- Validating the migrated content and metadata
These activities are discussed in detail below.
Moving the metadata
Think in terms of a spectrum of activity. If the target has no new attributes, there will be no need to add values; therefore, moving the metadata from the source to the target repository may be a straightforward mapping of an existing attribute from source attribute 123 to target attribute 456. In this scenario, the business users would typically approve the mapping proposed by the database administrators or other technical resources.
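At this end of the spectrum, the mapping can be captured as a simple lookup table that the business users review and approve. A minimal Python sketch, in which the attribute names are hypothetical stand-ins for “attribute 123” and “attribute 456”:

```python
from typing import Dict

# Hypothetical mapping approved by the business users: source name -> target name.
ATTRIBUTE_MAP: Dict[str, str] = {
    "attribute_123": "attribute_456",
    "doc_title": "title",
}


def map_attributes(source: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Rename each source attribute to its target name, refusing unmapped ones."""
    unmapped = sorted(set(source) - set(mapping))
    if unmapped:
        # Fail loudly so gaps in the mapping get reviewed instead of silently dropped.
        raise ValueError(f"no mapping defined for: {unmapped}")
    return {mapping[name]: value for name, value in source.items()}


print(map_attributes({"attribute_123": "Q3 report", "doc_title": "Budget"}, ATTRIBUTE_MAP))
# {'attribute_456': 'Q3 report', 'title': 'Budget'}
```

Rejecting unmapped attributes is a deliberate choice: it forces a decision about every source attribute rather than letting metadata disappear in transit.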
Moving toward the middle of the spectrum, the target might have a small number of attributes that don’t exist in the source. The business users may be able to derive rules for assigning the appropriate attribute value for each piece of content. For example, if the document came from a source file system folder called “Procedures,” then populate the new target attribute named legacy_location with the value “Procedures.” It may not be possible to create rules for populating all new attributes, so the users may need to enrich the metadata by hand. This is generally done with an enrichment tool such as a spreadsheet or database table: each row represents the metadata for one document, and the business user enters the appropriate value in the column designated for the new attribute.
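The rule-then-enrich workflow described above might look like the following Python sketch. The `source_folder` and `object_id` column names and the rule function are hypothetical; the rule encodes the “Procedures” example, and any document that no rule can decide is written to a CSV file (which opens in Excel) with the new column left blank for a business user to fill in.

```python
import csv
import io
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]
# A rule returns a value for the new attribute, or None if it cannot decide.
Rule = Callable[[Row], Optional[str]]


def location_from_folder(row: Row) -> Optional[str]:
    """Hypothetical rule: content from the "Procedures" folder gets legacy_location "Procedures"."""
    return "Procedures" if row.get("source_folder") == "Procedures" else None


def apply_rules(rows: List[Row], attr: str, rules: List[Rule]) -> List[Row]:
    """Fill `attr` from the first rule that yields a value; return rows still needing manual entry."""
    pending = []
    for row in rows:
        for rule in rules:
            value = rule(row)
            if value is not None:
                row[attr] = value
                break
        else:
            row[attr] = ""  # no rule applied: left blank for the business user
            pending.append(row)
    return pending


def to_enrichment_sheet(rows: List[Row], columns: List[str]) -> str:
    """Write rows as CSV text: one row per document, one column per attribute."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = [{"object_id": "1", "source_folder": "Procedures"},
        {"object_id": "2", "source_folder": "Misc"}]
pending = apply_rules(rows, "legacy_location", [location_from_folder])
print(to_enrichment_sheet(pending, ["object_id", "source_folder", "legacy_location"]))
```

Keeping rules as small, separate functions makes it easy for the business users to review each one and to add new rules as patterns emerge from the manual entries.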
At the other end of the spectrum, the mapping of source attributes to target attributes may be complex and require combining multiple source attributes into one target attribute, parsing source attribute values and splitting them into separate target attributes, or providing appropriate values for new attributes that have no relationship to the source repository. Again, much of this may be automated through rules defined by the business users; even so, a complex migration often combines automation with business-user validation and manual entry of attribute values. These validation and manual entry activities can be managed through the enrichment tools mentioned earlier, but they can demand significant effort from the business users, and that effort must be planned for.
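A sketch of combining and splitting attributes in Python. All attribute names and value formats here (`department`, `doc_number`, `product_code`, `review_cycle`, and the PRODUCT-REGION-YEAR pattern) are hypothetical illustrations, not a real object model:

```python
from typing import Dict


def combine(dept: str, number: str) -> str:
    """Hypothetical rule: target document_id = department code + zero-padded number."""
    return f"{dept.upper()}-{int(number):04d}"


def split_product_code(code: str) -> Dict[str, str]:
    """Hypothetical rule: source product_code packs PRODUCT-REGION-YEAR into one field."""
    parts = code.split("-")
    if len(parts) != 3:
        raise ValueError(f"unexpected product_code format: {code!r}")
    product, region, year = parts
    return {"product": product, "region": region, "product_year": year}


def transform(source: Dict[str, str]) -> Dict[str, str]:
    """Build target metadata from one document's source metadata."""
    target = {"document_id": combine(source["department"], source["doc_number"])}
    target.update(split_product_code(source["product_code"]))
    # New attribute with no source counterpart: left blank for manual enrichment.
    target["review_cycle"] = ""
    return target


print(transform({"department": "qa", "doc_number": "42", "product_code": "WIDGET-EU-2019"}))
# {'document_id': 'QA-0042', 'product': 'WIDGET', 'region': 'EU', 'product_year': '2019', 'review_cycle': ''}
```

Malformed values raise `ValueError` here for simplicity; in a real run such documents would more likely be collected and routed to the business users for validation rather than stopping the migration.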
Validation generally occurs at two key points in the migration process:
- Pre-import validation, when business users review attribute values that were derived through automated rules or procedures. This generally consists of sampling derived values, but it may mean a complete validation of every value, depending on the migration requirements.
- Post-import validation, during which business users review content and attributes after they are imported into the target repository. Again, the extent of this effort ranges from sampling to complete validation.
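Sampling is easy to make repeatable, which helps when the same sample must be re-reviewed after a fix or shown to an auditor. A minimal Python sketch, assuming a simple random sample at a chosen rate (the 5% figure in the example is arbitrary) and a fixed seed:

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def validation_sample(items: Sequence[T], rate: float,
                      minimum: int = 1, seed: int = 0) -> List[T]:
    """Pick a reproducible random sample for business-user review.

    rate=1.0 means complete validation of every item.
    """
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    if rate == 1.0:
        return list(items)
    # At least `minimum` items, but never more than exist.
    k = min(len(items), max(minimum, round(len(items) * rate)))
    rng = random.Random(seed)  # fixed seed: the same sample can be re-drawn later
    return rng.sample(list(items), k)


doc_ids = [f"doc{i}" for i in range(1000)]
sample = validation_sample(doc_ids, rate=0.05)
print(len(sample))  # 50
```

A simple random sample can miss rare document types entirely; sampling each document type separately (stratified sampling) is a common refinement when the object model is varied.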
As with moving the metadata, the validation activities are facilitated and managed through applications similar to enrichment tools. But live humans must perform these activities at specified points in time during the migration process.
Having the right resources at the right time
The key question is: Will you have sufficient business resources to perform these activities? In this case, “sufficient” means you can answer yes to all of the following questions:
- Do your business resources have the required domain knowledge, including an understanding of relevant business rules and procedures?
- Do they have a thorough understanding of the source repository and how business rules are applied or enforced by it?
- Are they knowledgeable about the target repository and how business rules are applied or enforced by it?
- Do they have a reasonable level of competency with the selected enrichment and validation tools?
- Are they available for the duration and within the time frame required by the migration plan?
Identifying the correct resources and planning to have them available when their services are needed are key planning elements in a successful migration. In the end, the business users must have confidence in the migrated content, and the best way to earn that confidence is to have them participate in these activities.
Will you bring all versions of each piece of content into the target repository or just the current version? Or is another compromise in order, such as migrating the current and up to three previous versions? The answer depends on many factors, including but not limited to:
- How will the migrated content be used?
- Will the source repository still be available or is it being decommissioned?
- Is there a requirement to trace the revision trail? If so, how far back?
Having the right conversations with the right participants early in the process will allow you to determine the most cost-effective approach.
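Once a retention rule such as “current plus up to three previous versions” has been agreed, selecting the versions is straightforward. This Python sketch assumes each version record carries a hypothetical `chain_id` that groups all versions of one document, that version labels are numeric and dot-separated, and that the highest label is the current version; if the source repository marks the current version explicitly, that marker should be used instead.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Version(NamedTuple):
    chain_id: str  # hypothetical: identifies all versions of one document
    version: str   # assumed numeric dotted label, e.g. "1.0", "1.10", "2.0"


def version_key(label: str) -> Tuple[int, ...]:
    """Compare labels numerically so that "1.10" sorts after "1.9"."""
    return tuple(int(part) for part in label.split("."))


def select_versions(versions: List[Version], previous: int) -> List[Version]:
    """Keep the current version of each document plus up to `previous` earlier versions."""
    if previous < 0:
        raise ValueError("previous must be >= 0")
    chains: Dict[str, List[Version]] = defaultdict(list)
    for v in versions:
        chains[v.chain_id].append(v)
    selected: List[Version] = []
    for chain in chains.values():
        chain.sort(key=lambda v: version_key(v.version))
        selected.extend(chain[-(previous + 1):])  # newest previous+1 versions
    return selected


history = [Version("A", v) for v in ["1.0", "1.1", "1.2", "1.9", "1.10", "2.0"]]
print([v.version for v in select_versions(history, previous=3)])
# ['1.2', '1.9', '1.10', '2.0']
```

Setting `previous=0` migrates only the current version; migrating every version is simply the unfiltered list.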
You can choose to migrate all or some of your PDF and HTML renditions along with the base content, or you can utilize Documentum’s Content Rendition Services to create the renditions once the documents are imported into the target repository. There are advantages and disadvantages to both approaches, so again it means evaluating your migration requirements and determining the best approach.
If you are in a regulated industry, there are certain things you must do in the regular course of business as well as things you must not do. In short, what kind of information must you keep to make the auditor happy? These regulatory requirements must be included when planning a migration effort, which means resources who know the compliance aspects of your business should be part of the migration planning team.
One area of compliance that merits special mention is signed documents. It is unlikely that the appropriate signatories are going to want to re-sign large numbers of migrated documents, and this may also be a violation of regulatory requirements. So preserving the signature trail during the migration will be critical.
As you work through the considerations discussed above and shape the correct migration approach for your business, you’ll identify risks for which contingencies must be put in place. The types of contingencies and the extent of the related plans will depend on the complexity of your migration. Two common candidates are resources and schedule. Unfortunately, migrating content is often overlooked, or gets lost in the noise and fanfare of planning and launching a new systems initiative. Even if you are paying close attention to the requirements of content migration, you will still want to build in appropriate amounts of contingency for both resources and schedule so you will be able to absorb the impact of surprises.
Are you looking for a turnkey solution or do you want knowledge transfer for future migration efforts?
Your migration initiative may be a one-time effort with no plans for subsequent migrations on the same or other repositories. If this is the case, you may be well served by outsourcing the technical aspects of the migration. However, keep in mind that there will almost always be the need for participation from knowledgeable business users when moving and verifying the metadata as discussed earlier.
If, however, you have plans for additional or ongoing migration work, you should consider having some of your IT staff participate in the initial migration initiatives to learn how to use and extend the migration tool set and to learn best practices. Building knowledge transfer into the migration activities can be both efficient and economical, and produce a significant ROI.
Source repository (or source): The framework or system in which the content to be migrated currently resides. This can be a Documentum repository, another content management repository, a file system, a Microsoft Outlook .pst file, or basically any system for organizing content.
Target repository (or target): The new repository to which the content will be migrated.
Metadata (or attribute): The information about each piece of content. Examples of metadata include author, date created, version, and document type.
Enrichment: Activities performed on metadata while it is in transit between the source and target repositories. Examples include adding values for new metadata that does not exist in the source repository and selecting appropriate values based on the data dictionary of the target repository.
Enrichment tools: Applications, such as Microsoft Excel and Access, that provide a consistent and controlled interface to the metadata and are readily used by the business user community.