
Using Server Clusters for Failover

November 14, 2004 by Blue Fish Development Group

Failover is the ability of a system to survive a severe failure in a component that would normally lead to system shutdown, by switching to an alternate standby component. This article, the first part of a two-part series, discusses how to provide failover support at the eContent Server layer using Microsoft Cluster Service. The second part of this series discusses the options available for providing failover support at each layer of a Documentum solution.

Introduction

A cluster comprises two or more computers organized to work together in such a manner that if one of the computers fails, its resources and user load can be redirected to the other computers in the cluster. The cluster failover mechanism ensures that the system continues to be available in the event of application or service failures, system or hardware failures, and site failures.

Conceptually, and at the simplest level, setting up failover at the eContent Server layer is fairly straightforward. Consider a single docbase that can be accessed by more than one eContent Server process, as shown in the following diagram.

Figure 1: Basic Two Server Cluster

This configuration provides the basic redundancy needed for failover at the eContent Server level; that is, should the machine running the primary eContent Server fail, it would be possible to switch the system to the secondary content server. Two elements are missing before the above configuration becomes a true failover configuration:

  1. a means by which the switchover to the secondary server can happen automatically; and
  2. a means by which the client applications can be redirected to the secondary server.

Microsoft Cluster Service provides both of the missing elements, allowing for a true failover solution: the cluster pair shown above is exposed to clients as a single virtual server. This virtual server is known to clients through a virtual IP address and host name – to users and clients, connecting to an application or service running as a clustered virtual server appears the same as connecting to a single physical server. The cluster management layer detects the failure and shifts the docbase resources from the primary eContent Server to the secondary eContent Server.

It is important to note that failover is not the same as fault tolerance. During failover, client applications need to reconnect to the eContent Server, and any unsaved work is lost. In the absence of fault-tolerance capability in the client application, even a successful failover can lead to lost and/or inconsistent data. Strategies for building fault tolerance into the application are not discussed in this article.
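Because failover forces clients to reconnect, client code that must survive a failover typically wraps its calls in reconnect-and-retry logic. The following Python sketch illustrates the general pattern only; the function and exception names are hypothetical and are not part of any Documentum client API.

```python
import time

class ConnectionLost(Exception):
    """Raised when the server connection drops, e.g. during a cluster failover."""

def with_reconnect(connect, operation, retries=3, delay=0.0):
    """Run `operation` against a fresh session, reconnecting if the connection
    is lost mid-call (as happens during failover). Any uncommitted work from
    the failed attempt is lost and must be redone by the caller."""
    last_error = None
    for _ in range(retries):
        try:
            session = connect()          # reconnects via the virtual host name
            return operation(session)
        except ConnectionLost as err:
            last_error = err
            time.sleep(delay)            # give the standby node time to come online
    raise last_error

# Hypothetical usage: the first attempt hits a failover, the retry succeeds.
attempts = {"n": 0}

def connect():
    return "session"                     # stand-in for a real client session

def query(session):
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise ConnectionLost("node failed over")
    return "query result"

print(with_reconnect(connect, query))    # → query result
```

The key point the sketch makes is that the retry must start from a fresh connection: the old session object is dead once the active node fails.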

Clustering in the Windows Environment

Microsoft Cluster Service (MSCS) provides the infrastructure on which eContent Server failover solutions can be implemented in the Windows environment.

In MSCS, a cluster consists of a number of individual computers, each of which is called a node. Cluster resources are the hardware and software components that are managed by the cluster service. Resources can be collected into resource groups; a resource group is a collection of resources managed by the cluster service as a single unit.

MSCS clusters implement a shared-nothing cluster architecture: any device common to the cluster (such as a docbase) can be owned by only one server (node) at a time. Nodes in a cluster can be either active or standby. Under the shared-nothing model, an active node does not share any resources with other nodes; for example, two active eContent Server nodes cannot share the same docbase. However, when an active node fails, its resources are transferred to one or more of the remaining active or standby nodes. For example, the docbase resource of a failed active node can be transferred to a standby node; in this way one active server and one passive server can be configured for a single docbase.

Using active and passive node configurations, eContent Server clustering can be implemented in two ways: as an active/passive cluster or as an active/active cluster.

Figure 2: Cluster Options

An active/passive cluster consists of two nodes, each of which has an eContent Server set up to point to the same docbase. When the cluster is started, the eContent Server process runs on the active node and is turned off on the passive node. When the active node fails, the eContent Server on the passive node is started, providing failover. Since only one of the nodes is active at any one time, there is no sharing of resources between the nodes.

An active/active configuration has two nodes, each of which has an active eContent Server process controlling a different docbase. In the event of a failure, a second process can be started on the remaining node to take over the docbase of the node that failed.
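The ownership semantics of the two configurations can be sketched with a small model. This is an illustration of the shared-nothing behavior described above, not MSCS code; the class, group, and node names are hypothetical.

```python
class Cluster:
    """Minimal model of MSCS-style shared-nothing failover: each resource
    group (e.g. a docbase) is owned by exactly one node at a time."""

    def __init__(self, nodes):
        self.up = {n: True for n in nodes}   # node -> is it running?
        self.owner = {}                      # resource group -> owning node

    def assign(self, group, node):
        self.owner[group] = node

    def fail(self, node):
        """A node fails: move each group it owned to a surviving node."""
        self.up[node] = False
        survivors = [n for n, ok in self.up.items() if ok]
        for group, owner in self.owner.items():
            if owner == node:
                self.owner[group] = survivors[0]   # simplistic placement

# Active/passive: NodeA runs the docbase, NodeB stands by.
ap = Cluster(["NodeA", "NodeB"])
ap.assign("Docbase1", "NodeA")
ap.fail("NodeA")
print(ap.owner["Docbase1"])                        # → NodeB

# Active/active: each node owns its own docbase while both are up;
# after a failure the survivor owns both (never shared concurrently).
aa = Cluster(["NodeA", "NodeB"])
aa.assign("Docbase1", "NodeA")
aa.assign("Docbase2", "NodeB")
aa.fail("NodeA")
print(aa.owner["Docbase1"], aa.owner["Docbase2"])  # → NodeB NodeB
```

Note that in both cases each docbase has exactly one owner at every point in time, which is the defining property of the shared-nothing model.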

Configuring an Active / Passive Cluster

The first step in creating an active/passive eContent Server configuration is to set up MSCS. The details of how to install MSCS are outside the scope of this article (refer to the Microsoft documentation for details). This section assumes that you are familiar with the basic docbase installation process; refer to the eContent Server installation manual for the specific details of the install process, including any other pre-installation tasks that need to be carried out.

Setting up an eContent Server on a cluster begins with the creation of a resource group that defines the resources shared between the two nodes of the cluster. There are three resources in this group:

  • A virtual IP address – the address used by the Content Server and the DocBroker (the DocBroker does not have to run on the same node or share the same address; this configuration assumes that they run on the same node and use the same IP address)
  • A virtual network hostname – the name associated with this IP address
  • A shared disk partition – the location where the docbase content will be stored.

(This configuration does not include the actual DBMS, which is assumed for the purpose of this configuration to reside elsewhere. The DBMS may itself require clustering to provide failover; while the principles are much the same, the failover configuration of databases needs to be treated in its own right.)

The resource group configuration and the values with which the eContent Server will be configured are shown in the following diagram:

Figure 3: Active Passive Configuration

Before proceeding with the installation of the content server, a few points are in order:

  • The content server and DocBroker processes on both machines are set to manual start. This is required because these services are started from the MSCS Cluster Administrator utility;
  • A single docbase is set up in this configuration;
  • The virtual IP address / host name is used to identify the processes both within and outside the cluster.
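On Windows, setting a service to manual start can be done from the Services control panel or from the command line with `sc config`. The commands below are a sketch only; the service names are hypothetical placeholders, since the actual docbase service name depends on your docbase name.

```shell
REM Set the DocBroker and docbase services to manual (demand) start.
REM Service names are hypothetical - check yours in the Services applet.
sc config dmdocbroker start= demand
sc config "DmServer Docbase1" start= demand
```

The space after `start=` is required by the `sc` command-line syntax.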

Installing the Content Server

The eContent Server setup on the cluster is performed in the following sequence:

  • First, the resource group set up above is moved to the first node.
  • Second, install the docbase with a local DocBroker on the first node using the configuration information shown above. Ensure that all location objects are configured to use the shared drive set up in the resource group.
  • Shut down the eContent Server and DocBroker processes on the first node and move the resource group to the second node.
  • Start the install of the eContent Server on the second node with the same docbase name and ID. There are some key differences between the install on the first node and this install:
    • This install must be done with the existing DBMS account name option selected, using the same configuration information as in the first setup.
    • The docbase configuration scripts were all run when the first docbase was installed and do not need to be run again here.
  • Shut down the eContent Server and DocBroker on the second node.
  • The basic configuration on both nodes then needs to be changed as follows:
    • Modify the server.ini on both nodes so that its DocBroker projection target section points to the virtual host name;
    • Modify the DocBroker start-up command (using the Edit DocBroker Service option of the eContent Server Manager console) to add the -host virtual_host_name option.
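The concrete values for these two changes were shown in the original figures. As a hedged illustration, a DocBroker projection target section in server.ini typically looks like the following, where clusterv1 stands in for the virtual host name of the resource group (the section and key names follow the standard eContent Server server.ini layout, but verify them against your installation):

```ini
[DOCBROKER_PROJECTION_TARGET]
host = clusterv1    # virtual host name of the cluster (hypothetical value)
port = 1489         # default DocBroker port
```

The DocBroker start-up command is extended correspondingly with -host clusterv1, so that the DocBroker announces itself under the virtual name rather than the physical machine name.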

The above completes the eContent Server setup process on the cluster. However, the cluster configuration is not yet complete: the newly created eContent Server and DocBroker processes now need to be added to the shared resource group.

Adding eContentServer and DocBroker to Shared Resources

These tasks are completed using the MSCS Cluster Administrator tool. New resources need to be created for both the DocBroker and the eContent Server services and added to the resource group managed by the cluster.

The following table summarizes the configuration information to be used for the two services. The DocBroker service needs to be created before the Docbase service, as the Docbase service depends on the DocBroker. Each service should be brought online as soon as it has been created.

DocBroker Service:
  • Name: DocBroker (or a name of your choice)
  • Resource Type: Generic Service
  • Possible Owners: add both nodes as possible owners
  • Resource Dependencies: 1. Virtual Network hostname
  • Service Name: dmdocbroker
  • Check "Use Network Name as computer name"; no startup parameters needed

Docbase Service:
  • Name: <Docbase Name> Docbase (or a name of your choice)
  • Resource Type: Generic Service
  • Possible Owners: add both nodes as possible owners
  • Resource Dependencies: 1. Shared Disk; 2. Virtual Network hostname; 3. DocBroker
  • Service Name: DmServer docbaseName
  • Check "Use Network Name as computer name"; no startup parameters needed
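On Windows 2000/2003 these resources can also be created from the command line with cluster.exe instead of the Cluster Administrator GUI. The commands below are a sketch only; the group and resource names are hypothetical, and the exact flags should be checked against the cluster.exe documentation for your Windows version.

```shell
REM Create the DocBroker resource in the shared group, point it at the
REM dmdocbroker service, declare its dependency, and bring it online.
REM "Docbase Group" and resource names are hypothetical placeholders.
cluster resource "DocBroker" /create /group:"Docbase Group" /type:"Generic Service"
cluster resource "DocBroker" /priv ServiceName=dmdocbroker
cluster resource "DocBroker" /adddep:"Virtual Network hostname"
cluster resource "DocBroker" /online
```

The Docbase service resource would be created the same way, with the additional Shared Disk and DocBroker dependencies listed in the table above.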

Testing Failover

Once the resources have been created and the services brought online, the next step is to test failover.

  • Assuming that the services have started, use any Documentum client to connect to the docbase and issue a query. The query should return as expected.
  • Use the MSCS Cluster Administrator utility to move the resource group to the passive node. Wait for the resources to come online on that node.
  • Issue another query from the same client (a reconnect may be required); this query should also succeed.
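A quick way to issue the test queries is Documentum's interactive query tool, idql. The session below is illustrative only; the docbase name and credentials are placeholders for your own.

```shell
REM Connect through the docbase (resolved via the virtual host name)
REM and run a simple DQL query before and after moving the group.
idql Docbase1 -Udmadmin -Ppassword
1> select object_name from dm_server_config
2> go
```

Running the same query after the resource group has moved confirms that clients can still reach the docbase through the virtual server.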

Configuring an Active / Active Cluster

An active/active cluster is configured very similarly to the active/passive cluster. The primary difference is that this configuration comprises two docbases, and therefore two resource groups need to be managed.

The process for creating an active/active server cluster can proceed as the install of two active/passive clusters, with one exception. The first node is the active node for the first docbase, and the second node is the active node for the second docbase.

The exception in this configuration relates to how the second DocBroker is set up on each of the machines. Only a single DocBroker process can run on each node, so when the second docbase is being set up, use the Remote DocBroker option and specify the host name associated with the first docbase as the remote host name.

In Closing

MSCS can be used to provide failover capabilities for the eContent Server and DocBroker processes. This can be useful when managing mission-critical and other applications that require high availability. Similar clustering options are also available on Unix platforms.

As discussed in the introduction, failover of the eContent Server and DocBroker processes is only one part of a high-availability implementation. Depending on your application, failover capabilities may also need to be provided at the following layers:

  • The Application Server layer
  • The DBMS layer