Failover is the ability of a system to survive a severe failure in a component, one that would normally lead to system shutdown, by switching to an alternate standby component. This article is the first part of a two-part series and discusses how to provide failover support at the eContent Server layer using Microsoft Cluster Service. The second part of the series discusses the options available for providing failover support at each layer of a Documentum solution.
A cluster comprises two or more computers organized to work together in such a manner that if one of the computers fails, its resources and user load can be redirected to the other computers in the cluster. The cluster failover mechanism keeps the system available in the event of application or service failures, system or hardware failures, and site failures.
Conceptually, and at the simplest level, setting up failover at the eContent Server layer is fairly straightforward. Consider a single docbase that can be accessed by more than one eContent Server process, as shown in the following diagram.
Basic Two Server Cluster
This configuration provides the basic redundancy needed for failover at the eContent Server level, i.e. should the machine running the primary eContent Server fail, it would be possible to switch the system to the secondary eContent Server. Two elements are missing before the above configuration could become a true failover configuration:
Microsoft Cluster Service provides both of the missing elements, allowing for a true failover solution. The cluster pair shown above is exposed to clients as a single virtual server, which is known to the clients through a virtual IP address and host name. To users and clients, connecting to an application or service running as a clustered virtual server appears to be the same process as connecting to a single physical server. The cluster management layer detects the failure and shifts the docbase resources from the primary eContent Server to the secondary eContent Server.
It is important to note that failover is not the same as fault tolerance. During failover the client applications will need to reconnect to the eContent Server, and any unsaved work will be lost. In the absence of fault-tolerance capability in the client application, even failover can lead to loss of data and/or inconsistent data. Alternate strategies, not discussed in this article, are available for building fault tolerance into the application.
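Because clients must reconnect after a failover, a client application typically wraps its connection step in a retry loop. The following is a minimal sketch in Python, not part of any Documentum API: `connect` is a hypothetical callable that opens a session against the cluster's virtual host name and raises `ConnectionError` while the server is unavailable, and the attempt count and delay are illustrative assumptions.

```python
import time


def connect_with_retry(connect, attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts are used up.

    connect is a hypothetical zero-argument callable that opens a session
    against the cluster's virtual host name and raises ConnectionError
    while the eContent Server is unavailable.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError as error:
            last_error = error
            if attempt < attempts:
                # Give the cluster time to start the server on the other node.
                sleep(delay_seconds)
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

Retrying restores the connection only. Work that was not saved before the failure is still lost, which is why failover alone does not amount to fault tolerance.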
The Microsoft Cluster Server (MSCS) provides infrastructure on which eContent Server failover solutions can be implemented in the Windows environment.
In MSCS, a cluster comprises a number of individual computers, each of which is called a node. Cluster resources are hardware and software components that are managed by the cluster service. Resources can be grouped into resource groups, each a collection of resources managed by the cluster service as a single unit.
MSCS clusters implement a shared-nothing model of cluster architecture. This means that any device common to the cluster (such as a docbase) can be owned by only one server (node) at a time. Nodes in a cluster can be either active or standby. Under the shared-nothing model an active node does not share any resources with other nodes; for example, two active eContent Server nodes would not be able to share the same docbase. However, when an active node fails, its resources are transferred to one or more of the remaining active or standby nodes. For example, the docbase resource of a failed active node can be transferred to a standby node. In this way one active server and one passive server can share a single docbase.
Using the active and passive node configurations, it is possible to implement eContent Server clustering in two ways: as an active/passive cluster or as an active/active cluster.
An active/passive cluster consists of two nodes, each of which has an eContent Server set up to point to the same docbase. When the cluster is started, the eContent Server process is running on the active node and is turned off on the passive node. When the active node in such a cluster fails, the eContent Server on the passive node is started, providing failover. Since only one of the nodes is active at any one time, there is no sharing of resources among active nodes.
An active/active configuration has two nodes, each of which has an active eContent Server process controlling a different docbase. In the event of a failure, a second process can be started on the remaining node to take over the docbase of the node that failed.
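The ownership rule can be illustrated with a small model. This is a toy sketch in Python, not MSCS code; the class, method, node, and resource names are all hypothetical. It enforces the single invariant described above: a resource has exactly one owner at a time, and ownership moves only when a node fails.

```python
class SharedNothingCluster:
    """Toy model: every resource has exactly one owning node at a time."""

    def __init__(self, nodes):
        self.online = set(nodes)
        self.owner = {}  # resource name -> owning node

    def assign(self, resource, node):
        """Give an unowned resource to an online node."""
        if node not in self.online:
            raise ValueError(f"{node} is not online")
        if resource in self.owner:
            raise ValueError(f"{resource} is already owned by {self.owner[resource]}")
        self.owner[resource] = node

    def fail_node(self, failed, takeover):
        """Mark a node as failed and move everything it owned to takeover."""
        if takeover == failed or takeover not in self.online:
            raise ValueError(f"{takeover} cannot take over")
        self.online.discard(failed)
        for resource, node in self.owner.items():
            if node == failed:
                self.owner[resource] = takeover
```

For example, after `cluster.assign("docbase1", "node1")`, a second `assign` of `docbase1` to `node2` is rejected, while `cluster.fail_node("node1", "node2")` hands the docbase to the standby node.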
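The difference between the two configurations can be expressed as data. The sketch below, in Python, assumes each resource group carries an ordered list of nodes that may host it and that a group runs on the first node in its list that is still up. The group and node names are hypothetical, and the function is an illustration rather than a description of how MSCS itself selects a node.

```python
def place_groups(preferred_owners, failed_nodes=()):
    """Return {group: node}, choosing the first listed owner that is still up."""
    failed = set(failed_nodes)
    placement = {}
    for group, owners in preferred_owners.items():
        survivors = [node for node in owners if node not in failed]
        # None means no node is left to host the group.
        placement[group] = survivors[0] if survivors else None
    return placement


# Hypothetical names: one docbase group for active/passive, two for active/active.
ACTIVE_PASSIVE = {"docbase1_group": ["node1", "node2"]}
ACTIVE_ACTIVE = {
    "docbase1_group": ["node1", "node2"],
    "docbase2_group": ["node2", "node1"],
}
```

With no failures, `place_groups(ACTIVE_ACTIVE)` puts one docbase on each node. With `node1` failed, both groups land on `node2`, which is the situation in which a second eContent Server process is started on the surviving node.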
The first step in creating an active/passive eContent Server configuration is to set up MSCS. The details of how to install MSCS are outside the purview of this article (refer to the Microsoft documentation for details). This section assumes that you are familiar with the basic docbase installation process. Refer to the eContent Server installation manual for specific details of the install process, including any other pre-installation tasks that need to be carried out.
Setting up an eContent Server on a cluster begins with the creation of a resource group that defines the resources shared between the two nodes of the cluster. There are three resources in this group:
(This configuration does not include the actual DBMS, which is assumed for the purpose of this configuration to reside elsewhere. The DBMS may itself require clustering to provide failover; while the principles are much the same, the failover configuration of databases needs to be treated in its own right.)
The resource group configuration and the values with which the eContent Server will be configured are shown in the following diagram:
Active Passive Configuration
Before we proceed with installing the eContent Server, a few points are in order:
The eContent Server setup on the cluster is performed in the following sequence:
The above completes the eContent Server setup process on the eContent Server cluster. However, the cluster configuration is not yet complete: the newly created eContent Server and DocBroker processes now need to be added to the shared resource group.
These tasks are completed using the MSCS Cluster Administrator tool. New resources need to be created for both the DocBroker and the eContent Server services and added to the resource group managed by the cluster.
The following table summarizes the configuration information to be used for configuring the two services. The Doc Broker service needs to be installed before the Doc Base Service as the Doc Base service depends on the Doc Broker. Each service should be brought online as soon as it has been created.
|Parameter||Doc Broker Service||Doc Base Service|
|Name||DocBroker (or a name of your choice)||<Doc base Name> Docbase (or a name of your choice)|
|Resource Type||Generic Service||Generic Service|
|Possible Owners||add both nodes as possible owners||add both nodes as possible owners|
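The ordering rule for the two services, DocBroker first because the Doc Base service depends on it, is a small dependency graph. As an illustration only, the following Python sketch derives the bring-online order from such a graph using the standard library's `graphlib` module (Python 3.9 or later). The resource names are placeholders; use the names you chose in the table above.

```python
from graphlib import TopologicalSorter

# resource -> set of resources it depends on (names are placeholders)
DEPENDENCIES = {
    "DocBroker": set(),
    "Docbase": {"DocBroker"},
}


def online_order(dependencies):
    """Return the resources in an order that brings dependencies online first."""
    return list(TopologicalSorter(dependencies).static_order())
```

Here `online_order(DEPENDENCIES)` lists `DocBroker` before `Docbase`, matching the order in which the services are created and brought online.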
Once the configuration and the services have been created, the next step is to test the failover.
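One way to observe a failover test from the client side is to poll the virtual server while the active node is taken down and record how long the service is unreachable. The sketch below is a generic TCP probe in Python, not a Documentum or MSCS tool. The host name and port passed to it are assumptions you supply, and a successful TCP connection shows only that something is listening, not that the docbase is fully usable.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def measure_outage(probe, checks, interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Poll probe() `checks` times and return the longest observed outage in seconds."""
    longest = 0.0
    down_since = None
    for _ in range(checks):
        now = clock()
        if probe():
            if down_since is not None:
                # Service is back: close the current outage window.
                longest = max(longest, now - down_since)
                down_since = None
        elif down_since is None:
            # First failed probe: open an outage window.
            down_since = now
        sleep(interval)
    if down_since is not None:
        # Still down when polling stopped: count the time seen so far.
        longest = max(longest, clock() - down_since)
    return longest
```

A run might look like `measure_outage(lambda: port_is_open(virtual_host, docbroker_port), checks=120)`, where `virtual_host` and `docbroker_port` are placeholders for the cluster's virtual host name and the port your DocBroker listens on. The result is accurate only to within the polling interval.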
An active/active cluster is configured very similarly to the active/passive cluster. The primary difference is that this configuration comprises two docbases, and therefore there are two resource groups that need to be managed.
The process for creating an active/active server cluster can proceed as the installation of two active/passive clusters, with one exception. The first node is the active node for the first docbase and the second node is the active node for the second docbase.
The exception in this configuration relates to how to set up the second DocBroker on each of the machines. Only a single DocBroker process can run on each node; therefore, when the second docbase is being set up, use the Remote DocBroker option and specify the host name associated with the first docbase as the remote host name.
MSCS cluster services can be used to provide failover capabilities for the eContent Server and the DocBroker processes. This can be useful when managing mission-critical and other applications that require high availability. Similar clustering options are also available on Unix platforms.
As discussed in the introduction, failover of the eContent Server and DocBroker processes is only one part of a high-availability implementation. Depending on your application, failover capabilities may also need to be provided at the following layers: