
A Comparison of HP-UX Disaster Tolerant Solutions

(Formerly titled “Design Consideration for HP-UX Disaster Tolerant Solutions”)

Executive Summary

Section 1: Introduction
    Target Audience
    Purpose of Document
Section 2: What is a Disaster Tolerance Architecture
Section 3: General Requirements
Section 4: Cluster File System (CFS) Support
Section 5: Oracle 10g
Section 6: DTS and HP's Virtualization Strategy
Section 7: Types of Disaster Tolerant Clusters
    Extended Campus Cluster
        Benefits of Extended Campus Cluster
        Limitations of Extended Campus Cluster
    Extended Cluster for RAC
        Benefits of Extended Cluster for RAC
        Limitations of Extended Cluster for RAC
    Metrocluster
        Benefits of Metrocluster
        Limitations of Metrocluster
    Continentalclusters
        Benefits of Continentalclusters (CC)
        Limitations of Continentalclusters
    Comparison of Solutions
        Differences Between Extended Campus Cluster and Metrocluster
        Comparison - All DTS Solutions
Section 8: Disaster Tolerant Cluster Limitations
Section 9: Recommendations
Appendix A – DTS Design Considerations
    Cluster Arbitration
        Dual cluster lock disks
        Quorum Server in a third location
        Arbitrator node(s) in a third location
    Protecting Data through Replication
        Off-line Data Replication
        On-line Data Replication
    Using Alternative Power Sources
    Creating Highly Available Networking
    Managing a Disaster Tolerant Environment
For more information

Revision History
Printing history
0.1 Review
1.0 Initial publication (written and published by Hue Vu)
1.5 Revised to include the following configurations/additions, and distributed for review:
• Extended Cluster for RAC
• Continentalclusters with RAC
• Continentalclusters with single IP Subnet configuration
2.0 Updated to reflect feedback from review of version 1.5
• Document wording revised, per feedback
• Executive summary enhanced
• New section (General Requirements) added, to precede “Types of
Disaster Tolerant Clusters” section
• For readability Design considerations and implementation attributes
have been moved to an Appendix
• Section 4 (comparison of 4 HP-UX solutions) table reformatted for
usability
• Metrocluster section updated to reflect policy to determine maximum
supported distance
3.0 Second publication of document
3.1 Updated to reflect enhancements to DTS, including
• CFS Support
• Oracle 10g Discussion
• Virtual Server Environment (VSE) Support
• Support of SRDF asynchronous data replication with MC/SRDF

Please send all feedback directly to Deb Alston (deb.alston@hp.com)

Executive Summary
In a Serviceguard cluster configuration, high availability is achieved by using redundant hardware to
eliminate single points of failure. This protects the cluster against hardware faults, such as a single
node failure. This architecture, which is typically implemented on one site in a single data center, is
sometimes called a local cluster. For some installations, the level of protection provided by a local
cluster is insufficient for the business. Consider an order-processing center where power outages are
common during harsh weather. Or consider the systems running the stock market, where multiple
system failures, for any reason, have a significant financial impact. For these types of installations,
and many more like them, it is important to guard not only against single points of failure, but against
multiple points of failure (MPOF), or against single massive failures that cause many components to
fail (such as the failure of a data center, an entire site, or a small area).

Creating clusters that are resistant to multiple points of failure or single massive failures requires a
different type of cluster architecture from the local cluster. This architecture is called a disaster tolerant
architecture – often referred to as a Disaster Tolerant Solution (DTS). This architecture provides you
with the ability to fail over automatically to another part of the cluster or manually to a different cluster
after certain disasters. Specifically, the disaster tolerant solution provides appropriate failover in the
case where an entire data center becomes unavailable. HP has a rich portfolio of disaster tolerant
cluster offerings, including Extended Campus Cluster [1], Metrocluster, and Continentalclusters. While
each of these solutions has its own characteristics, their common goal is to protect users from a site-
wide outage. To achieve this, the common feature they all implement is multiple data centers with
multiple copies of the user’s data. Effectively, if one data center fails, a second data center is
available to continue processing.

Both Metrocluster and Extended Campus Cluster solutions are single Serviceguard clusters, meaning
an application can automatically fail over from one data center to the other in the event of a failure.
Although similar in nature, these topologies have key differences that provide different levels of
disaster tolerance. For example, a key difference between these two topologies is the method of data
replication used. Metrocluster implements storage-based data replication with one of the following
three storage subsystems:
– HP StorageWorks Continuous Access XP (aka Metrocluster/CAXP)
– EMC’s Symmetrix arrays (aka Metrocluster/SRDF)
– HP StorageWorks Continuous Access EVA (aka Metrocluster/CAEVA)
Extended Campus Cluster is a host-based data replication product.

While Extended Campus Cluster spans two data centers up to 100km apart, the distance between
Metrocluster sites is based on the cluster network and data replication link. In a Metrocluster
configuration, maximum distance is the shortest of the distances defined by:
• Cluster network – maximum distance cannot exceed roundtrip cluster heartbeat network latency
requirement of 200ms
• DWDM provider – distance cannot exceed the maximum as specified for the product supplied by
the DWDM provider
• Data replication link – maximum supported distance as stated by the storage partner

The third solution – Continentalclusters – is built on top of two individual Serviceguard clusters, and
uses semi-automatic failover to start up an application on its recovery cluster. When a site fails, the

1
Extended Campus Cluster is also known as “CampusCluster” and “Extended Distance Cluster”. Throughout this document, this configuration will
be referred to as “Extended Campus Cluster”.

5
user is notified, and must initiate a “recovery” process on the secondary site for the affected
applications to be brought up. Continentalclusters has no distance limitation (i.e., it may span very
short to very long distances, implementing both LAN and WAN technologies).

Continentalclusters also supports a configuration with three data centers. In the three data center
configuration, the first two data centers implement Metrocluster. The third data center is a traditional
single data center Serviceguard cluster. This configuration is suited for environments that may
already have two data centers implemented but, for business reasons, require a third data center.
Deployment of this configuration is rare. Typically, Continentalclusters is implemented with two data
centers (i.e., two single data center Serviceguard clusters), with semi-automatic failover between data
centers.

From initial observation, the solutions appear to be interchangeable. The key to selecting the
appropriate fit for a customer’s environment is often driven by the customer’s Recovery Time and
Recovery Point Objectives (referred to as RTO and RPO). Customers requiring the least amount of
downtime will require a solution that tightly integrates data currency with application availability. The
best solution for this customer is one that offers automatic failover of the application. On the other
hand, customers who want control over application failover would prefer a solution that allows the
user to decide when an application starts at the recovery site. Please refer to “Section 9:
Recommendations” for guidelines on selecting and recommending a disaster tolerant solution.

Section 1: Introduction
Many decisions have to be made when designing a disaster tolerant solution. These decisions can
have a tremendous impact on the availability of the solution, consistency of the data, and the overall
cost of the solution. This paper discusses the overall disaster tolerant architecture and its general
requirements, solutions that HP currently offers for HP-UX, differences between them, and offers a high-
level design guideline. Architectures discussed include:
• Extended Campus Cluster [2]
• Extended distance support for Oracle Real Application Clusters (RAC)
– In an active/active configuration (Extended Cluster with RAC)
– In an active/standby configuration (Continentalclusters with RAC)
• Metrocluster
• Continentalclusters
Target Audience
This paper is only available internally to HP personnel. It is intended for use by HP’s pre-sales force to
aid in providing recommendations to customers on disaster tolerant solutions.
Purpose of Document
The purpose of this document is two-fold:
• Discuss and compare disaster tolerant cluster solutions that HP currently offers for HP-UX
• Provide recommendations on positioning our products relative to each other, enabling HP Field
personnel to help customers determine the best disaster tolerant solution for their environments

As this document specifically discusses HP-UX solutions, it does not address implementations on
platforms other than HP-UX.

[2] Extended Campus Cluster is also known as “CampusCluster” and “Extended Distance Cluster”. Throughout this
document, this configuration will be referred to as “Extended Campus Cluster”.
Section 2: What is a Disaster Tolerance Architecture
In a conventional Serviceguard cluster configuration, all components are in a single data center. This
is referred to as a local cluster. High availability is achieved by using redundant hardware to
guard against single points of failure, such as protection against the node failure in Figure 1.

Figure 1. High Availability Architecture (a single data center: clustered nodes on a Data LAN + Heartbeat network
with a shared SAN, and failover between the nodes)

However, for many types of installations, it is important to guard not only against single points of
failure, but against multiple points of failure (MPOF), or against single massive failures that
cause many components to fail, such as the failure of a data center, of an entire site, or of a small
area. A data center, in the context of disaster recovery, is a physically proximate collection of
servers, storage, network, and power source that can be used to run a business application(s), usually
all in one room. Creating clusters that are resistant to multiple points of failure or single massive
failures requires a different type of cluster architecture called a disaster tolerant architecture.
This architecture provides you with the ability to fail over automatically to another part of the cluster or
manually to a different cluster after certain disasters. Specifically, the disaster tolerant cluster provides
appropriate failover in the case where an entire data center becomes unavailable, as in the sample
configuration in Figure 2.

Figure 2. Disaster Tolerant Sample Configuration (Data Center 1 and Data Center 2, each with its own Data LAN +
Heartbeat and SAN, with failover between the data centers)

Section 3: General Requirements


What do customers need in a disaster tolerant solution? As a first step, the customer needs to go
through business impact and risk assessment exercises to understand the application’s availability
requirements. The customer also needs to define the recovery time objectives (RTO) for the
applications that are critical to the business, and the recovery point objectives (RPO) - the point in time
to which data must be restored to resume transaction processing. Two common design requirements
of disaster tolerant architecture affecting RTO and RPO are the ability to protect the data from data
loss or corruption, and the ability to access the data. Since a solution that keeps the applications
running but allows data to become corrupt is useless, data protection should always take
precedence over application availability.

Depending on the type of disaster you are protecting against and the available technology, the
nodes can be as close as partitions within a single node, nodes in another room in the same building,
or as far away as another continent. Whatever the distance, the goal of a disaster tolerant
architecture is to survive the loss of a data center that contains critical resources to run a business
application. Putting clustered nodes further apart increases the likelihood that alternate nodes will be
available for failover in the event of a disaster. The most significant losses during a disaster are the
loss of access to data, and the loss of data itself. You protect against this loss through data
replication (i.e., creating extra copies of the data). Data replication should:
• Ensure data consistency by replicating data in a logical order so that it is immediately usable or
recoverable. Inconsistent data is unusable and is not recoverable for processing. Consistent data
may or may not be current.
• Ensure data currency by replicating data quickly so that a replica of the data can be recovered
to include all committed disk writes that were applied to the local disks.
• Ensure data recoverability so that there are some actions that can be taken to make the data
consistent, such as applying logs or rolling a database.
• Minimize data loss by configuring data replication to address consistency, currency, and
recoverability.

Section 4: Cluster File System (CFS) Support
Traditionally, the only storage management options in Serviceguard (SG) environments have been
either Logical Volume Manager (LVM) or Symantec Volume Manager (VxVM). Similarly, the only
options available to SG Extension for RAC (SG/SGeRAC) were the Shared Logical Volume Manager
(SLVM) and the Symantec Cluster Volume Manager (CVM), where Oracle’s application software is
typically installed on a local file system. In December 2005, support was extended to include
Symantec’s Cluster File System (CFS) by both SG and SG/SGeRAC. With CFS, executables and data
alike can be managed by the file system (e.g., Oracle data files and Oracle binaries can both be put
in a CFS). CFS provides major enhancements such as improved manageability and improved
maintenance. For instance, with CFS, Oracle binaries are installed only once, and are visible to all
cluster nodes. A central location is available to store runtime logs, archive logs, etc. From a
maintenance perspective, software updates, patches, and changes have to be applied only once.

CFS support – which requires CVM 4.1 - is currently available for single data centers only. Support
for CFS and CVM 4.1 with Extended Campus Cluster, Extended Cluster for RAC, and
Continentalclusters is targeted for (calendar year) 2006. Until the time at which support is
provided, CVM 4.1 is not supported in DTS configurations. Please note that the support of CFS
requires the HP Storage Management Suite which includes appropriate versions of both CFS and
CVM in the Management Suite. There are presently no plans to support CFS with Metrocluster.
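
To make the CFS discussion above more concrete, the fragment below sketches how a cluster file system might be
configured and mounted cluster-wide once the HP Storage Management Suite is installed. It is a minimal
illustration only: the disk group, volume, and mount point names are invented, and the exact command options
vary by CFS/CVM release, so the product documentation remains the authoritative syntax reference.

    # Illustrative only - hypothetical names; verify options against the installed CFS/CVM version
    cfscluster config -s                                  # configure and start the CVM/CFS stack in the Serviceguard cluster
    cfsdgadm add cfsdg01 all=sw                           # register CVM disk group "cfsdg01" for shared-write use on all nodes
    cfsmntadm add cfsdg01 vol01 /opt/oracle/shared all=rw # tie volume "vol01" to a cluster-wide mount point
    cfsmount /opt/oracle/shared                           # mount the cluster file system on every node at once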

Section 5: Oracle 10g


The advent of Oracle 10g has introduced several new Oracle features, including Automatic Storage
Management (ASM). ASM was introduced as a component of the Oracle database. ASM provides
an alternative to platform file systems and volume managers for the management of file types used to
store most Oracle files, including data files, control files, and redo logs. A big advantage of ASM is
the ease of management it provides for Oracle database files. However, there are several file types
that are not supported – and cannot be managed - by ASM, including Oracle database server
binaries, trace files, audit files, alert logs, backup files, export files, tar files, core files, and Oracle’s
(clusterware) quorum and registry devices.

Support of ASM by SG/SGeRAC (version A.11.17 and beyond) is available on HP-UX 11iv2 for
RAC databases only (i.e., there is no ASM support for Oracle single instance database with SG).
Additionally, SG/SGeRAC configurations using ASM must use raw logical volumes managed by
SLVM (i.e., ASM “sits on top of” SLVM). The primary reason SLVM is required is to leverage the
multipathing capabilities provided by SLVM so that ASM can be supported by SG/SGeRAC on HP-
UX 11iv2. There are presently no plans to support any Disaster Tolerant cluster configuration with
ASM.
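
To illustrate the “ASM on top of SLVM” layering just described, the sketch below shows how raw logical volumes
could be carved from a shared LVM volume group and then presented to ASM as candidate disks. The device and
volume group names are hypothetical, and an actual SG/SGeRAC deployment with ASM would follow the supported
configuration steps in the product documentation.

    # Hypothetical devices and names - shown only to illustrate the SLVM layer beneath ASM
    pvcreate -f /dev/rdsk/c4t0d1                     # prepare the physical volume
    vgcreate /dev/vg_asm /dev/dsk/c4t0d1             # volume group that SLVM will manage
    lvcreate -n asm_disk1 -L 10240 /dev/vg_asm       # 10 GB raw logical volume to be handed to ASM
    vgchange -c y -S y /dev/vg_asm                   # mark the volume group cluster-aware and shareable (SLVM)
    vgchange -a s /dev/vg_asm                        # activate in shared mode on the RAC nodes
    # ASM would then discover the raw device (e.g., /dev/vg_asm/rasm_disk1) through its disk discovery string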

Extended Distance SG/SGeRAC and Continentalclusters currently support the Oracle 10g RAC
database server in non-ASM, non-CFS configurations. Additionally, Metrocluster and
Continentalclusters support the Oracle 10g single instance database server in non-ASM, non-CFS
configurations. CFS support by Extended Distance SG/SGeRAC and Continentalclusters is targeted
for 2006.

More information on SG/SGeRAC integration with Oracle 10g may be found at the HA ATC
website: http://haweb.cup.hp.com/ATC/, and in the product user’s guides (i.e., Designing Disaster
Tolerant High Availability Clusters 14th Edition, and Using Serviceguard Extension for RAC 3rd Edition).

Section 6: DTS and HP’s Virtualization Strategy
DTS products support HP’s VSE strategy. Serviceguard is integrated with HP VSE products related to
partitioning, utility pricing, workload management and tools for managing the overall VSE
environment. The addition of DTS leverages this integration to extend support from a single data
center to multiple data centers. More information on the integration of DTS with VSE may be found in
the following document: http://www.hp.com/products1/unix/operating/docs/wlm.serviceguard.pdf.
Additionally, a demo that implements Metrocluster in a VSE may be downloaded from the HA ATC
website, http://haweb.cup.hp.com/ATC/. Once on the website, the download is available on the
“Demos” webpage.

Section 7: Types of Disaster Tolerant Clusters


Four HP disaster-tolerant cluster configurations are described in this guide, including:
• Extended Campus Cluster
• Extended Cluster for RAC
• Metrocluster
• Continentalclusters

Extended Campus Cluster


An Extended Campus Cluster is a normal Serviceguard cluster with nodes spread over two data
centers. All nodes are on the same IP subnet. An application runs on one node in the cluster with
other nodes configured to take over in the event of a failure in an active/standby configuration.
Either HP-UX MirrorDisk/UX or Symantec VERITAS VxVM mirroring is used to replicate application
packages' data between the two data centers in an Extended Campus Cluster, even if the data is
stored on RAID.
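
The following fragment is a simplified sketch of the host-based mirroring just described: each logical volume
carries one mirror copy on a disk that physically resides in the other data center, so every write is applied to
both sites. The device names and physical volume groups are hypothetical; PVG-strict allocation is shown only as
one common way of keeping the two copies on separate sites.

    # Hypothetical devices: c5t0d0 is local (Data Center 1) storage, c9t0d0 is storage in Data Center 2
    vgcreate -g DC1 /dev/vg_app /dev/dsk/c5t0d0       # physical volume group holding Data Center 1 disks
    vgextend -g DC2 /dev/vg_app /dev/dsk/c9t0d0       # physical volume group holding Data Center 2 disks
    lvcreate -n lvdata -L 4096 -m 1 -s g /dev/vg_app  # one mirror copy, PVG-strict so the copies land on different sites
    # A mirror can also be added to an existing, unmirrored logical volume after the fact:
    lvextend -m 1 /dev/vg_app/lvdata /dev/dsk/c9t0d0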

Extended Campus Cluster relies on the capabilities of Fibre Channel (FC) technology. It uses FC
switches and/or hubs, and Dense Wavelength Division Multiplexing (DWDM) to provide host-to-
storage connectivity across two data centers up to 100km apart.

In the Extended Campus Cluster architecture, each clustered server is directly connected to the storage
in both data centers. The following diagram depicts a 4-node Extended Campus Cluster using dual
cluster lock disks for arbitration. Cluster locks are discussed in Appendix A of this document.

Figure 3. Extended Campus Cluster with two Data Centers (dual cluster lock disks used for cluster arbitration).
Each data center has its own SAN and a cluster lock disk (primary in Data Center 1, secondary in Data Center 2);
data is replicated between the sites by software mirroring (e.g., MirrorDisk/UX), with the sites up to 100 km
apart if DWDM is used.

Benefits of Extended Campus Cluster


• This configuration implements a single Serviceguard cluster across two data centers, and uses either
MirrorDisk/UX or Symantec VERITAS VxVM mirroring for data replication. No (cluster) license
beyond SG is required for this solution, making it the least expensive to implement. The addition of
CFS support is targeted for 2006.
• Customers may choose any storage supported by Serviceguard, and the storage can be a mix of
any SG-supported storage.
• This configuration may be the easiest for customers to understand and manage, as it “looks and
feels” just like SG.
• Application failover is minimized. All disks are available to all nodes, so that if a primary disk fails
but the node stays up and the replica is available, there is no failover (i.e., the application
continues to run on the same node while accessing the replica).
• Data copies are peers, so there is no issue with reconfiguring a replica to function as a primary disk
after failover.
• Writes are synchronous unless the link or disk is down, so data remains current between the
primary disk and its replica.

Limitations of Extended Campus Cluster


• Extended Campus Cluster provides no built-in mechanism for Serviceguard to determine the state of
the data before starting up the application. An application package will start successfully if volume
group activation is successful. For example, nothing prevents an application from starting if the
Logical Volume Manager (LVM) mirrors are split. This scenario increases the exposure to data
loss in the event of a site disaster. Only a carefully designed architecture coupled with a proper
implementation (e.g., adding additional intelligence to package control scripts, selecting
appropriate volume group activation options, incorporating monitoring tools like Event Monitoring
Services, etc.) can help to avoid undesirable behavior or consequences (a conceptual sketch follows
this list).
• Extended Campus Cluster does not support asynchronous data replication. While data currency is
maintained between the two data centers in normal operations, longer distances between the data
centers increase the likelihood of a performance impact.

• With MirrorDisk/UX, there is an increased I/O load for writes, since each write has to be done
twice by the host. If data resynchronization is required, based on the amount of data involved, this
can have a major performance impact on the host.
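
As a concrete illustration of the “additional intelligence” mentioned in the first limitation above, a package
control script can be made to refuse automatic startup when one plex of the mirror is stale. The fragment below is
only a conceptual sketch with hypothetical volume names, not a supported integration; a real deployment would
combine such checks with carefully chosen volume group activation options and EMS monitors.

    #!/bin/sh
    # Conceptual pre-start check for an Extended Campus Cluster package (hypothetical names)
    VG=/dev/vg_app
    LV=$VG/lvdata

    vgchange -a e $VG || exit 1                     # exclusive activation; fail the package if the VG cannot activate

    # Count stale (out-of-sync) extents; a non-zero count means one mirror copy is not current
    STALE=$(lvdisplay -v $LV | grep -c stale)
    if [ "$STALE" -gt 0 ]; then
        echo "ERROR: $STALE stale extents on $LV - refusing automatic package startup"
        vgchange -a n $VG                           # deactivate again and leave the decision to the administrator
        exit 1
    fi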

Extended Cluster for RAC


Serviceguard Extension for RAC (SGeRAC) is a specialized configuration that enables Oracle Real
Application Clusters (RAC) to run in an HP-UX environment on high availability clusters. RAC in a
Serviceguard environment lets you maintain a single (Oracle) database image that is accessed by the
servers in parallel in an active/active configuration, thereby providing greater processing power
without the overhead of administering separate databases.

Extended Cluster for RAC merges Extended Campus Cluster with SGeRAC. One key difference
between the two configurations is the volume manager. While Extended Campus Cluster uses LVM
and VxVM, Extended Cluster for RAC implements SLVM and CVM 3.5. Additionally, CFS support is
targeted for (calendar year) 2006.
Benefits of Extended Cluster for RAC
• In addition to the benefits of Extended Campus Cluster, RAC runs in active/active mode in the
cluster, so that all resources in both data centers are utilized. The database and data are
synchronized and replicated across two data centers up to 100km apart. In the event of a site failure,
no failover is required, since the instance is already running at the remote site.
• Extended Cluster for RAC implements SLVM so that SGeRAC has a “built-in” mechanism for
determining the status of volume group extents in both data centers (i.e., the state of the volume
groups is kept in memory at the remote site), and SLVM will not operate on non-current data.

Limitations of Extended Cluster for RAC


• There is a limit on cluster size, based on the underlying volume manager. If the volume manager
used is SLVM, the (RAC) configuration is limited to 2 nodes (i.e., while the actual cluster size can be
up to 16 nodes, only 2 nodes in the cluster can be configured with RAC, since SLVM supports 2-
node mirroring. All other nodes can be configured to run “non-RAC” applications.) In the
Extended Cluster for RAC configuration, if one of the RAC nodes is unreachable, the surviving node
has no backup.
• With MirrorDisk/UX, there is an increased I/O load for writes, since each write has to be done
twice by the host. If data resynchronization is required, based on the amount of data involved, this
can have a major performance impact on the host.
• In addition to SLVM, Extended Cluster for RAC also supports Symantec’s Cluster Volume Manager
(CVM 3.5). With CVM 3.5, the cluster may be increased up to four nodes, but the distance for a
4-node cluster is limited to 10km (like SLVM, a 2-node CVM 3.5 cluster supports a maximum
distance of 100km).
• Link distance and latency may affect the application’s performance, as RAC uses the network for
data block passing (Oracle’s Cache Fusion architecture).

Metrocluster
Similar to Extended Campus Cluster, a Metrocluster is a normal Serviceguard cluster that has
clustered nodes and storage devices located in different data centers separated by some distance.
Applications run in an active/standby mode (i.e., application resources are only available to one
node at a time). The distinct characteristic of Metrocluster is its integration with array-based data
replication. Currently, Metrocluster implements three different solutions:
• Metrocluster/CAXP – HP StorageWorks Continuous Access XP

• Metrocluster/CAEVA – HP StorageWorks Continuous Access EVA
• Metrocluster/SRDF - EMC’s Symmetrix arrays

Each data center has a set of nodes connected to the storage local to that data center. Disk arrays in
the two data centers are physically connected to each other. Since the data replication/mirroring is
done by the storage subsystem, there is no need for a storage connection from a local server to the
disk array at the remote data center. Either arbitrator nodes, located in a third location, or a quorum
server is used for cluster arbitration.
The following diagram provides an example of Metrocluster/CAXP, configured with arbitrator nodes
at a location separate from either of the two data centers.

Figure 4. Metrocluster/CAXP with two data centers and a 3rd site for arbitrator nodes (Data Center 1 and
Data Center 2, each with its own Data LAN + Heartbeat, linked by DWDM; arbitrator nodes at the 3rd site)

NOTE: DETAILED INFORMATION ON ARBITRATOR NODES AND QUORUM SERVERS IS DISCUSSED IN


APPENDIX A OF THIS DOCUMENT.

The distance separating the data centers in a Metrocluster is based on the cluster network and data
replication link. In a Metrocluster configuration, maximum distance is the shortest of the distances
defined by:
– Cluster network – maximum distance cannot exceed roundtrip cluster heartbeat network latency
requirement of 200ms
– DWDM provider – distance cannot exceed the maximum as specified for the product supplied by
the DWDM provider
– Data replication link – maximum supported distance as stated by the storage partner
Since this is a single SG cluster, all cluster nodes have to be on the same IP subnet for cluster network
communication.

Benefits of Metrocluster
• Metrocluster offers a more resilient solution than Extended Campus Cluster, as it provides full
integration between Serviceguard’s application package and the data replication subsystem. The
storage subsystem is queried to determine the state of the data on the arrays. Metrocluster knows
that application package data is replicated between two data centers. It takes advantage of this
knowledge to evaluate the status of the local and remote copies of the data, including whether the
local site holds the primary copy or the secondary copy of data, whether the local data is consistent
or not, and whether the local data is current or not. Depending on the result of this evaluation, it
decides if it is safe to start the application package, whether a resynchronization of data is needed
before the package can start, or whether manual intervention is required to determine the state of
the data before the application package is started. Metrocluster allows the startup behavior of
application packages to be customized to the customer's requirements, such as data currency or
application availability. By default, Metrocluster always prioritizes data consistency and data
currency over application availability. If, however, the customer chooses to prioritize availability
over currency, s/he can configure Metrocluster to start up even when the state of the data cannot be
determined to be fully current (but the data is consistent). (A conceptual sketch of this evaluation
follows this list.)
• Users wishing to prioritize performance over data currency between the data centers have a choice
of Metrocluster CAXP or Metrocluster SRDF, as each supports both synchronous and asynchronous
replication modes.
• Because data replication and resynchronization are performed by the storage subsystem,
Metrocluster may provide significantly better performance than Extended Campus Cluster during
recovery. Unlike Extended Campus Cluster, Metrocluster does not require any additional CPU time,
which minimizes the impact on the host.
• There is little or no lag time writing to the replica, so the data remains very current.
• Data can be copied in both directions, so that if the primary site fails and the replica takes over,
data can be copied back to the primary site when it comes back up.
• Disk resynchronization is independent from CPU failure (i.e., if the hosts at the primary site fail but
the disk remains up, the disk knows it does not have to be resynchronized).
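
The evaluation sequence described in the first bullet can be summarized as a small decision flow. The sketch
below is purely conceptual: it is not the shipped Metrocluster integration module, the variable names are
invented, and the real product derives the replica state directly from the array (CA XP, CA EVA, or SRDF) rather
than from a hard-coded value.

    #!/bin/sh
    # Conceptual sketch only - NOT the actual Metrocluster package module (names are hypothetical)
    replica_state="current"              # in the real product this comes from querying the array's device group
    PRIORITIZE_AVAILABILITY="no"         # default: data consistency and currency take precedence over availability

    case "$replica_state" in
      current)
          echo "Local copy is consistent and current - safe to start the package" ;;
      consistent_not_current)
          if [ "$PRIORITIZE_AVAILABILITY" = "yes" ]; then
              echo "Starting on consistent but non-current data (customer has chosen availability over currency)"
          else
              echo "Resynchronizing from the remote copy before the package is started"
          fi ;;
      *)
          echo "Data state cannot be determined safely - manual intervention required"
          exit 1 ;;
    esac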

Limitations of Metrocluster
• Specialized storage hardware is required in a Metrocluster environment, meaning customers are
not allowed to choose their own storage component. Supported storage subsystems include HP
StorageWorks XP, HP StorageWorks EVA, and EMC Symmetrix with SRDF. In addition to
specialized storage, disk arrays from different vendors are incompatible (i.e., a pair of disk arrays
from the same vendor is required).
• There are no plans to support Oracle RAC (neither 9i nor 10g) in a Metrocluster configuration.
• There are no plans to support CFS in a Metrocluster configuration.

Continentalclusters
Continentalclusters provides an alternative disaster tolerant solution in which short to long
distances separate distinct Serviceguard clusters, with either a local area network (LAN) or a wide
area network (WAN) between the clusters. Unlike Metrocluster and Extended Campus Cluster that
have single-cluster architecture, Continentalclusters uses multiple clusters to provide application
recovery. Applications run in the active/standby mode, with application data replicated between
data centers by either storage array-based data replication products (such as Continuous Access XP
or EMC's SRDF), or software-based data replication (such as Oracle 8i Standby DBMS and Oracle 9i
Data Guard).

Two types of connections are needed between the two Serviceguard clusters in this architecture: one
for the inter-cluster communication, and another for the data replication. Depending on the distance
between the two sites, either LAN (i.e., single IP subnet) or WAN connections may be used for cluster
network communication. For data replication, depending on the type of connection (ESCON or FC)
that is supported by the data replication software, the data can be replicated over DWDM, 100Base-
T and Gigabit Ethernet using Internet Protocol (IP), ATM, and T1 or T3/E3 leased lines or switched
lines. The Ethernet links and ATM can be implemented over multiple T1 or T3/E3 leased lines.

Continentalclusters provides the ability to monitor a Serviceguard cluster and fail over mission critical
applications to another cluster if the monitored cluster should become unavailable. In addition,
Continentalclusters supports mutual recovery, which allows for mission critical applications to run on
both clusters, with each cluster configured to recover the mission critical applications of the other. As
of March 2003, Continentalclusters supports SGeRAC in addition to Serviceguard. In an SGeRAC
configuration, Oracle RAC database instances are simultaneously accessible by the nodes within one
cluster, but the database is only accessible to one site at a time. The Oracle database and data are
replicated to the 2nd data center, and the RAC instances are configured for recoverability, so that the
2nd data center stands by, ready to begin processing in the event of a site failure at the 1st data center
(i.e., across sites, this is an active/standby configuration such that the database is only accessible to
one site at a time).

If a participating cluster in Continentalclusters should become unavailable, Continentalclusters sends
the administrator a notification of the problem. The administrator should verify that the monitored
cluster has failed and then issue a recovery command to transfer mission critical applications from the
failed cluster to the recovery cluster.
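
The recovery step just described is driven from the surviving cluster with the Continentalclusters command set.
The fragment below is a hedged illustration of that flow; the package name is hypothetical, and the exact options
for the recovery command should be taken from the Continentalclusters documentation.

    # Run on the recovery cluster, after the alert has been received and the primary site's failure confirmed
    cmviewcl -v                    # confirm that the local (recovery) cluster itself is healthy
    cmrecovercl                    # initiate recovery of the packages configured for the failed primary cluster
    cmviewcl -v -p pkg_oradb       # then check an individual recovery package ("pkg_oradb" is a hypothetical name)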

NOTE: THE MOVEMENT OF AN APPLICATION FROM ONE CLUSTER TO ANOTHER CLUSTER DOES NOT REPLACE
LOCAL FAILOVER. APPLICATION PACKAGES SHOULD ALSO BE CONFIGURED TO FAIL OVER BETWEEN NODES (OR
PARTITIONS) IN THE LOCAL CLUSTER.

The following diagram depicts a Continentalclusters configuration with two data centers.
Figure 5. Continentalclusters with XP CA over IP (two data centers, each a Serviceguard cluster with its own
Data LAN + Heartbeat, connected through IP routers and CNT Edge devices across an IP network)

Benefits of Continentalclusters (CC)
• Customers can build data centers virtually anywhere and still have the data centers provide disaster
tolerance for each other. Since Continentalclusters uses two clusters, theoretically there is no limit to
the distance between the two clusters. The distance between the clusters is dictated by the required
rate of data replication to the remote site, level of data currency, and the quality of networking links
between the two data centers.
• Inter-cluster communication can be implemented with either WAN or LAN. LAN support is a great
advantage for customers who have data centers in proximity of each other, but for whatever
reason, do not want the data centers configured into a single cluster. One example may be a
customer who already has two SG clusters close to each other. For business reasons, the customer
cannot merge these two clusters into a single cluster, but is concerned about having one of the
centers become unavailable. Continentalclusters can be added to provide disaster tolerance.
• Customers can integrate Continentalclusters with any storage component of choice that is supported
by Serviceguard. Continentalclusters provides a structure to work with any type of data replication
mechanism. A set of guidelines for integrating customers’ chosen data replication scheme with
Continentalclusters is included in the “Designing Disaster Tolerant High Availability Clusters”
manual.
• Besides selecting their own storage and data replication solution, customers can also take
advantage of the following (HP) pre-integrated solutions:
– Storage subsystems implemented by Metrocluster are also pre-integrated with Continentalclusters.
Continentalclusters uses the same data replication integration module that Metrocluster
implements to check for data status of the application package before package start up.
– If either Oracle8i or Oracle9i DBMS is used and logical data replication is the preferred method,
depending on the version, either Oracle 8i Standby or Oracle 9i Data Guard with log shipping is
used to replicate the data between two data centers. HP provides a supported integration toolkit
for Oracle 8i Standby DB in the Enterprise Cluster Master Toolkit (ECMT). Contributed
integration templates for Oracle 9i Data Guard are available at the following location:
http://haweb.cup.hp.com/ATC/. While the integration templates for Oracle 9i Data Guard have
been tested with Continentalclusters by ACSL, the scripts are provided at no charge, with no
support from HP.
• Both Oracle 9i and Oracle 10g RAC are supported by Continentalclusters by integrating CC with
SGeRAC. In this configuration, multiple nodes in a single cluster can simultaneously access the
database (i.e., nodes in one data center can access the database). If the site fails, the RAC
instances can be recovered at the second site.
• In a 2-data center configuration, Continentalclusters supports a maximum of 32 nodes – i.e., a
maximum of 16 nodes per data center.
• Continentalclusters supports up to three data centers. In this configuration, the first two data centers
must implement Metrocluster so that applications automatically fail over between the first two data
centers before migrating to the third data center. The third data center is a traditional (single)
Serviceguard data center. If both the first and second data centers fail, the customer will be notified
and advised to migrate the application to the third site.

NOTE: THIS CONFIGURATION MUST BE VERY CAREFULLY DEPLOYED, AS APPLICATION AND DATA FAILBACK IS
A HIGHLY MANUAL PROCESS.

• Failover for Continentalclusters is semi-automatic. If a data center fails, the administrator is advised,
and is required to take action to bring the application up on the surviving cluster. Per customer
feedback via our Field personnel, some customers prefer notification that the site is down before the
application migrates to the recovery site.
• CFS support is targeted for 2006.

Limitations of Continentalclusters
• Semi-automatic failover is a concern for some customers, depending on their Recovery Time
Objectives (RTO). Per feedback from Field personnel, some customers would like the option of
automatic failover as well as semi-automatic failover.
• Although not a limitation of the Continentalclusters product, it should be noted that increased
distance could significantly complicate the solution. For example, operational issues, such as
working with different staff with different processes, and conducting failover rehearsals, are more
difficult the further apart the clusters are. In addition, for configurations that require a WAN between
the clusters because of the distance separating them, the physical connection is one or more leased
lines managed by a common carrier. Common carriers cannot guarantee the same reliability as
a dedicated physical cable. The distance can introduce a time lag for data replication, which
creates an issue with data currency. This could increase the overall solution cost by requiring higher
speed connections to improve data replication performance and reduce latency.

Comparison of Solutions
One of the major problems the Field faces is distinguishing between Extended Campus Cluster and
Metrocluster. The following section is provided to highlight key differences between the two.

Differences Between Extended Campus Cluster and Metrocluster


The major differences between an Extended Campus Cluster and a Metrocluster include:
• The methods used to replicate data between the storage devices in the two data centers. Generally
speaking, there are two basic methods available for replicating data between the data centers for
HP-UX clusters - either host-based or storage array-based. Extended Campus Cluster always uses
host-based replication (either MirrorDisk/UX or Symantec VERITAS VxVM mirroring). Any (mix of)
SG-supported storage can be implemented in an Extended Campus Cluster. Metrocluster always
uses array-based replication/mirroring, and requires storage from the same vendor in both data
centers (i.e, a pair of XPs with CA, a pair of Symmetrix arrays with SRDF, or a pair of EVAs with
CA).
• Data centers in an Extended Campus Cluster can span up to 100km, whereas the distance between
data centers in a Metrocluster is defined by the shortest of the distances for
– the maximum distance that guarantees a network latency of no more than 200ms
– the maximum distance supported by the data replication link
– the maximum supported distance for DWDM as stated by the provider
• In an Extended Campus Cluster, there is no built-in mechanism for determining the state of the data
being replicated. When an application fails over from one data center to another, the package is
allowed to start up if the volume group(s) can be activated. A Metrocluster implementation
provides a higher degree of data integrity - the application is only allowed to start up based on the
state of the data and the disk arrays.
• Extended Campus Cluster supports active/active access by implementing SGeRAC, whereas
Metrocluster only supports active/standby access.
• Extended Campus Cluster reads may outperform Metrocluster in normal operations. On the other
hand, Metrocluster performance is better than Extended Campus Cluster for data resynchronization
and recovery.

Comparison - All DTS Solutions
The following table extends the comparison to include Extended Cluster with RAC and
Continentalclusters.
Solutions compared: Extended Campus Cluster | Extended Cluster with RAC | Metrocluster | Continentalclusters (CC)

The following attributes are included, as they must be considered, based on the type of disaster(s)
about which the customer is concerned.

Key Benefit
- Extended Campus Cluster: Excellent in “normal” operations and partial failure. Since all hosts have
  access to both disks, in a failure where the node running the application is up but the disk becomes
  unavailable, no failover occurs. The node will access the remote disk to continue processing.
- Extended Cluster with RAC: Excellent in “normal” operations and partial failure. The active/active
  configuration provides maximum data throughput and reduces the need for failover (since both data
  centers are active, the application is already up on the 2nd site).
- Metrocluster: Two significant benefits: (a) provides maximum data protection - the state of the data is
  determined before the application is started and, if necessary, data resynchronization is performed
  before the application is brought up; (b) better performance than Extended Campus Cluster for resync,
  as replication is done by the storage subsystem (no impact to the host).
- Continentalclusters: Increased data protection by supporting unlimited distance between data centers
  (protects against such disasters as those caused by earthquakes or violent attacks, where an entire
  area can be disrupted).

Key Limitation
- Extended Campus Cluster: No ability to check the state of the data before starting up the application.
  If the volume group (vg) can be activated, the application will be started. If mirrors are split or PV
  links are down, as long as the vg can be activated, the application will be started. Data
  resynchronization can have a big impact on system performance, as this is a host-based solution.
- Extended Cluster with RAC: The SLVM configuration is limited to 2 nodes. A CVM 3.5 configuration
  supports up to 4 nodes; however, a 4-node configuration is limited to a distance of 10km. Data
  resynchronization can have a big impact on system performance, as this is a host-based solution.
- Metrocluster: Specialized storage required. Currently, XP with Continuous Access, EVA with Continuous
  Access, and EMC’s Symmetrix with SRDF are supported.
- Continentalclusters: No automatic failover between clusters.

Maximum Distance (1)
- Extended Campus Cluster: 100 kilometers
- Extended Cluster with RAC: 100 km (maximum 2 nodes, with either SLVM or CVM 3.5); 10 km (maximum
  4 nodes, with CVM 3.5)
- Metrocluster: Shortest of the 3 distances defined by cluster network latency (not to exceed 200ms),
  data replication link maximum distance, and DWDM provider maximum distance (2)
- Continentalclusters: No distance restrictions (3)

The following attributes are included, as they directly affect data consistency, currency, and
availability, and must be considered when evaluating the customer’s RTO.

Data Replication Mechanism
- Extended Campus Cluster: Host-based, via MirrorDisk/UX or (Symantec) VERITAS VxVM. Replication can
  affect performance (writes are synchronous). Re-syncs can impact performance (a full re-sync is
  required in many scenarios that have multiple failures (4)).
- Extended Cluster with RAC: Host-based, via MirrorDisk/UX or (Symantec) VERITAS CVM 3.5. Replication
  can impact performance (writes are synchronous). Re-syncs can impact performance (a full re-sync is
  required in many scenarios that have multiple failures (4)).
- Metrocluster: Array-based, via CAXP, CAEVA, or EMC SRDF. Replication and resynchronization are
  performed by the storage subsystem, so the host does not experience a performance hit. Incremental
  re-syncs are done, minimizing the need for full re-syncs.
- Continentalclusters: Customers have a choice of either selecting their own SG-supported storage and
  data replication mechanism, or implementing one of HP’s pre-integrated solutions (including CAXP,
  CAEVA, and EMC SRDF for array-based, or Oracle 8i Standby for host-based). Also, customers may choose
  Oracle 9i Data Guard as a host-based solution. Contributed (i.e., unsupported) integration templates
  for Oracle 9i Data Guard are available for download at: http://haweb.cup.hp.com/ATC/

Application Failover Type
- Extended Campus Cluster: Automatic (no manual intervention required)
- Extended Cluster with RAC: Instance is already running at the 2nd site
- Metrocluster: Automatic (no manual intervention required)
- Continentalclusters: Semi-automatic (user must “push the button” to initiate recovery)

Access Mode (5)
- Extended Campus Cluster: Active/Standby
- Extended Cluster with RAC: Active/Active
- Metrocluster: Active/Standby
- Continentalclusters: Active/Standby

Client Transparency
- Extended Campus Cluster: Client detects the lost connection. User must reconnect once the application
  is recovered at the 2nd site.
- Extended Cluster with RAC: Client may already have a standby connection to the remote site.
- Metrocluster: Client detects the lost connection. User must reconnect once the application is
  recovered at the 2nd site.
- Continentalclusters: User must reconnect once the application is recovered at the 2nd site.

The following attributes are included, as they directly impact system scalability.

Maximum Cluster Size Allowed
- Extended Campus Cluster: 2 to 16 nodes (up to 4 when using dual lock disks)
- Extended Cluster with RAC: 2 nodes with SLVM or CVM 3.5, with a maximum distance of 100km; 4 nodes
  with CVM 3.5, with a maximum distance of 10km
- Metrocluster: 3 to 16 nodes
- Continentalclusters: 1 to 16 nodes in each cluster (maximum total of 32 nodes - 16 nodes per cluster
  in a 2-data center configuration)

The following attributes are included, as they directly affect cost of implementation and maintenance.

Storage
- Extended Campus Cluster: Identical storage is not required (replication is host-based with either
  MirrorDisk/UX or VxVM mirroring)
- Extended Cluster with RAC: Identical storage is not required (replication is host-based with either
  MirrorDisk/UX or CVM 3.5 mirroring)
- Metrocluster: Identical storage is required
- Continentalclusters: Identical storage is required if storage-based mirroring is used; identical
  storage is not required for other data replication implementations

Data Replication Link
- Extended Campus Cluster: Dark Fiber, FC over IP, FC over ATM
- Extended Cluster with RAC: Dark Fiber
- Metrocluster: Dark Fiber
- Continentalclusters: WAN, LAN, Dark Fiber (pre-integrated solution), FC over IP (pre-integrated
  solution), FC over ATM (pre-integrated solution)

Cluster Network
- Extended Campus Cluster: Single IP subnet
- Extended Cluster with RAC: Single IP subnet
- Metrocluster: Single IP subnet
- Continentalclusters: Two configurations: a single IP subnet for both clusters (LAN connection between
  clusters), or two IP subnets - one per cluster (WAN connection between clusters)

DTS Software/Licenses Required
- Extended Campus Cluster: SG (no other clustering SW is required)
- Extended Cluster with RAC: SG + SGeRAC
- Metrocluster: SG + Metrocluster (Metrocluster CAXP, Metrocluster CAEVA, or Metrocluster SRDF)
- Continentalclusters: SG + Continentalclusters + (Metrocluster CAXP OR Metrocluster CAEVA OR
  Metrocluster SRDF OR Enterprise Cluster Master Toolkit) OR a customer-selected data replication
  subsystem. CC with RAC: SG + SGeRAC + Continentalclusters.

(1) Data centers that are farther apart increase the likelihood that alternate nodes will be available
for failover in the event of a disaster.
(2) Metrocluster distance is determined by the shortest of: the maximum distance that guarantees a
network latency of no more than 200ms, the maximum supported distance for the data replication link,
or the DWDM provider’s maximum supported distance. As such, these values will vary between
configurations, based on these factors.
(3) Continentalclusters has no limitation on distance between the two data centers. The distance is
dictated by the required rate of data replication to the remote site, level of data currency, and the
quality of networking links between the two data centers.
(4) A full re-sync is required if a failure that caused one of the mirrors to be unavailable (such as a
path failure to the remote site) is followed by a failure that causes a failover to the host at the
remote site that uses the mirror that was unavailable.
(5) Active/standby access means one node at a time is accessing the application’s resources.
Active/active access means all resources are available to multiple nodes.

Section 8: Disaster Tolerant Cluster Limitations
Disaster tolerant clusters have limitations, some of which can be mitigated by good planning. Some
examples of multiple points of failure that may not be covered by disaster tolerant configurations
include:
• Failure of all networks among the data centers — using a different route for all network cables can
mitigate the risk.
• Loss of power in more than one site (e.g., a data center + the site housing arbitrator nodes) — This
can be mitigated by making sure sites are on different power circuits, redundant power supplies are
on different circuits, and power circuits are fed from different grids. If power outages are frequent
in your area, and down time is expensive, you may want to invest in a backup generator.
• Loss of all copies of the on-line data — this can be mitigated by replicating data off-line (frequent
backups). It can also be mitigated by taking snapshots of consistent data and storing it on-line;
Business Copy XP and EMC Symmetrix BCV (Business Continuance Volumes) provide this
functionality and the additional benefit of quick recovery should anything happen to both copies of
on-line data (see the sketch following this list).
• A rolling disaster is a disaster that occurs before the cluster is able to recover from a non-
disastrous failure. An example is a data replication link that fails, then, as it is being restored and
data is being resynchronized, a disaster causes an entire data center to fail. Ensuring that a copy
of the data is stored either off-line or on a separate disk that can quickly be restored can mitigate
the effects of rolling disasters. The trade-off is a lack of currency of the data in the off-line copy.
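
As an illustration of the on-line snapshot approach mentioned in the third bullet above, the fragment below shows
the general Raid Manager (HORCM) command flow for an XP Business Copy group. The group name is hypothetical and
the flags are indicative only; the Business Copy XP (or EMC TimeFinder/BCV) documentation defines the exact,
supported procedure.

    # Illustrative only - hypothetical Business Copy group "dbcopy"; consult the Business Copy XP documentation
    paircreate -g dbcopy -vl       # establish the Business Copy pair (an in-array local mirror of the data volumes)
    pairsplit  -g dbcopy           # split the pair to freeze a consistent point-in-time copy while data stays on-line
    # ... the split copy can be used for backup or kept as a recovery point against a rolling disaster ...
    pairresync -g dbcopy           # re-attach and resynchronize when a fresher snapshot is wanted
    pairdisplay -g dbcopy          # check pair status and copy progress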

Section 9: Recommendations
As previously stated, customers’ recovery time and recovery point objectives (RTO and RPO) typically
drive the type of disaster tolerant solution selected. The following guidelines are provided to help
determine how to select a solution for recommendation.

• When should I recommend Extended Campus Cluster or Extended Cluster for RAC?
Extended Campus Cluster is recommended for any of the following situations:
Ø A customer needs to provide some level of protection, but has his own storage. Since any
storage supported by SG is approved for Extended Campus Cluster, this may be the best
solution for this customer.
Ø A customer has a requirement to implement disaster tolerance on a very limited budget.
Metrocluster would be the customer’s choice, but the cost to deploy it exceeds his budget.
Extended Campus Cluster is a good recommendation – as long as the customer understands
and accepts its limitations.
Ø If a customer’s business is in the financial industry (such as banking) with an extraordinarily
large volume of real-time transactions, the customer needs to maximize resource usage. The
customer is concerned about such natural events as flooding. In this instance, you may
recommend Extended Cluster for RAC.

• When should I recommend Metrocluster?


Metrocluster is recommended for any of the following situations:

Ø A customer has one data center running SG. The shared storage is a disk array (XP, EMC, or
EVA). The customer is investigating building a 2nd data center a few miles away to be used
primarily for development and test. This data center can also be used as a back up for the
existing data center.
Ø A customer has two data centers that are within Metrocluster distance limits. One data center
is running an SG cluster. The 2nd data center is used strictly to back up the data via physical
data replication (such as EMC’s SRDF). The 2nd data center is not running any (business
critical) applications. In this situation, the data is protected, such that in the event of an
outage at the primary data center, the data can be physically moved to a location where a
cluster can be brought up and transaction processing restored. This process is manually
intensive. Because of its automatic failover capability, Metrocluster shortens recovery time,
offering a much better solution.
Ø A customer has three data centers running independently from each other, and realizes the
vulnerability of having unprotected data at each of the data centers. HP offers a solution for
three data centers. The first two data centers implement Metrocluster for automatic failover.
Continentalclusters is then implemented so that the third data center backs up the first two. In
this configuration, if the entire Metrocluster fails, the third data center will take over
operations.

• When should I recommend Continentalclusters?


Continentalclusters is recommended for any of the following situations:
Ø A customer needs disaster tolerance, but wants to decide when an application is recovered
(i.e., the customer wants to be informed/consulted before bringing up an application on the
2nd data center after its main site fails).
Ø A customer has two existing data centers that cannot be disrupted - each configured as a
local cluster - and is concerned about a site failure. Regardless of the distance, the customer
can “add on” disaster tolerance with data replication and Continentalclusters.
Ø A customer has data centers geographically dispersed, and is concerned about an outage at
one of the sites. As an example, a customer’s location is subject to natural disasters – such as
tornadoes – that could impact an entire Metropolitan area. Continentalclusters is an excellent
solution, as it has no distance limitations.
Ø A customer is interested in disaster tolerance for a RAC application, they are in an area
vulnerable to natural disasters that can affect an entire metropolitan area (e.g., earthquakes)
and an active/passive configuration meets their business needs.
Ø A customer has three data centers running independently from each other, and realizes the
vulnerability of having unprotected data at each of the data centers. HP offers a solution for
three data centers. The first two data centers implement Metrocluster for automatic failover.
Continentalclusters is then implemented so that the third data center backs up the first two. In
this configuration, if the entire Metrocluster fails, the third data center will take over
operations.

As you can see, disaster tolerant solutions require a significant investment: hardware in
geographically dispersed data centers, a means to continuously replicate data from the primary site
to the recovery site, clustering software to monitor faults and manage the failover of the applications,
and IT staff in all data centers to operate the environment. With RTO and RPO defined, the customer
can then decide whether implementing a disaster tolerant solution is worth the investment.

Appendix A – DTS Design Considerations
Once a customer defines his requirements and chooses to implement a disaster tolerant solution, he
must make many decisions about the actual implementation. The following information is included to
help with the selection of solution components.

Cluster Arbitration
To protect application data integrity, Serviceguard uses a process called arbitration to prevent
more than one incarnation of a cluster from running and starting up a second instance of an
application. In the Serviceguard user’s manual, this process is known as tie breaking, because it is a
means to decide on a definitive cluster membership when different competing cluster nodes are
independently trying to re-form a cluster. Cluster re-formation takes place when there is a change in
cluster membership. In general, the algorithm for cluster re-formation requires the new cluster to
achieve a cluster quorum of a strict majority (that is, more than 50%) of the nodes previously
running. If both halves (exactly 50%) of a previously running cluster were allowed to re-form, there
would be a split-brain situation in which two instances of the same cluster were running.
Serviceguard employs a lock disk, a quorum server, or arbitrator nodes to provide definitive
arbitration to prevent split-brain conditions.

Serviceguard Cluster Quorum Requirements

• Strictly more than 50% of the active members from the previous cluster membership must be present
to re-form the cluster (all cluster members are required when a cluster is initially started unless a
manual override is specified).
• All supported Serviceguard cluster arbitration schemes apply to individual clusters in
Continentalclusters (including cluster lock disk, quorum server, and arbitrator nodes).
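
To illustrate the strict-majority rule, the following minimal sketch (written in Python purely for
illustration; the function name and node counts are hypothetical, and this is not Serviceguard code)
shows why an even 2+2 split cannot re-form on its own, while a 2+2+1 configuration can survive the
loss of an entire data center:

    # Illustrative sketch only; not Serviceguard code. It expresses the rule that a
    # re-forming group must contain MORE than 50% of the previously active members.
    def has_quorum(previous_active: int, surviving: int) -> bool:
        """True if the surviving group is a strict majority of the previous membership."""
        return surviving * 2 > previous_active

    # A 2+2 cluster split evenly by a failure: neither half re-forms without arbitration.
    print(has_quorum(previous_active=4, surviving=2))   # False: arbitration (tie breaking) needed

    # A 2+2+1 cluster (two data centers plus one arbitrator node): losing an entire
    # data center still leaves 3 of 5 nodes, a strict majority, so the survivors re-form.
    print(has_quorum(previous_active=5, surviving=3))   # True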

Dual cluster lock disks


Extended Campus Cluster and Extended Cluster for RAC can use cluster lock disks for cluster quorum.

In an Extended Campus Cluster where the cluster nodes are running in two separate data centers, a
single cluster lock disk would be a single point of failure if the data center it resides in suffers a
catastrophic failure. In this solution, there should be one lock disk in each of the two data centers,
and all nodes must have access to both lock disks. In the event of a failure of one of the data centers,
the nodes in the remaining data center will be able to acquire their local lock disk, allowing them to
successfully re-form a new cluster. A solution that uses dual cluster lock disks is susceptible to split-brain
syndrome. If it is properly designed, configured, and deployed, split-brain is very unlikely (all storage
links and all cluster network links would have to fail), but it is still possible. Dual cluster lock disks are
only supported with Extended Cluster for RAC and Extended Campus Cluster in clusters of four nodes
or fewer.
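
This exposure can be pictured with a small toy model (Python, purely hypothetical; it is not how
Serviceguard implements arbitration). An exactly even partition may re-form only if it wins an
arbitration lock; with dual lock disks and a total failure of all inter-site links, each half can win its
local lock, whereas a quorum server grants the tie-break to only one side:

    # Toy model of the failure arithmetic discussed above; not Serviceguard behavior.
    def partition_reforms(surviving: int, previous_active: int, wins_arbitration: bool) -> bool:
        """A strict majority always re-forms; an exact half re-forms only if it wins arbitration."""
        strict_majority = surviving * 2 > previous_active
        exact_half = surviving * 2 == previous_active
        return strict_majority or (exact_half and wins_arbitration)

    # Dual cluster lock disks, all inter-site storage and network links down in a 2+2 cluster:
    # each half can still reach (and acquire) its own local lock disk.
    site_a = partition_reforms(2, 4, wins_arbitration=True)
    site_b = partition_reforms(2, 4, wins_arbitration=True)
    print(site_a and site_b)   # True: both halves running, the split-brain case

    # Quorum server at a third site: only one partition is granted the tie-break.
    site_a = partition_reforms(2, 4, wins_arbitration=True)
    site_b = partition_reforms(2, 4, wins_arbitration=False)
    print(site_a and site_b)   # False: no split-brain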

Quorum Server in a third location


Extended Campus Cluster, Extended Cluster for RAC, and Metrocluster can use a quorum server for
cluster quorum.
The quorum server is an alternate form of cluster lock that uses a server program running on a
separate system for tie breaking rather than a lock disk. Should equal-sized groups of nodes become
separated from each other, the quorum server allows one group to achieve quorum and form the
cluster, while the other group is denied quorum and cannot re-form the cluster. The quorum server
process runs on a machine outside of the cluster for which it is providing quorum services. In a
disaster tolerant solution, the quorum server should be located at a third site, away from the two data
centers. The farther the third location is from the two data centers, the greater the disaster protection
the solution can provide. If the customer chooses a building within the same campus as one of the
data centers to house the quorum server, the customer may be protected from a fire or power outage,
but may not be protected from an earthquake or a hurricane. One advantage of the quorum server is
that additional cluster nodes do not have to be configured for arbitration. Also, one quorum server
can serve multiple clusters.

Since you cannot configure a redundant quorum server, an entire cluster will fail if the quorum server
fails and is then followed by a failure that requires cluster re-formation. To reduce this exposure, make
sure that the quorum server is packaged in its own SG cluster, so that when a disaster occurs at one
of the main data centers, the quorum server is available to provide cluster quorum to the remaining
cluster nodes to form a new cluster. A solution using a quorum server is not susceptible to split-brain
syndrome.

Quorum server software is available free of charge.

Arbitrator node(s) in a third location


Extended Campus Cluster, Extended Cluster for RAC, and Metrocluster can use arbitrator node(s) for
cluster quorum.

An arbitrator node is the same as any other cluster node and is not configured in any special way in
the cluster configuration file. It is used to make an even partition of the cluster impossible or at least
extremely unlikely. A single failure in a four-node cluster could result in two equal-sized partitions, but
a single failure in a five-node cluster could not. The fifth node in the cluster, then, performs the job of
arbitration by virtue of the fact that it makes the number of nodes in the cluster odd. If one data
center in the solution were down due to disaster, the surviving data center would still remain
connected to the arbitrator node, so the surviving group of nodes would be larger than 50% of the
previously running nodes in the cluster. It could therefore obtain the quorum and re-form the cluster.
As in the case of quorum server, the arbitrator node should be located in a site separate from the two
data centers to provide the appropriate degree of disaster tolerance. The farther the site is away from
the two data centers, the higher disaster protection the solution can provide. A properly designed
cluster solution with two data centers and a 3rd site using arbitrator node(s) will always be able to
achieve cluster quorum after a site failure because a cluster quorum of a strict majority (that is, more
than 50%) of the nodes previously running will always be available to form a new cluster.

It is recommended that two arbitrator nodes be configured at a site separate from either of the data
centers, to prevent a single arbitrator node from becoming an SPOF of the solution. The arbitrator
nodes can be used to run an application that does not need disaster tolerant protection. The arbitrator
nodes can be configured to share some common local disk storage, and a Serviceguard package can
be configured to provide local failover of the application between the two arbitrator nodes.

Recommended Arbitration Method


For a single-cluster disaster tolerant solution, it is recommended to select the cluster arbitration
method in the following order:
• Two arbitrator nodes at a site separate from either of the two data centers: the highest protection,
at the highest cost.
• One arbitrator node or a quorum server at a site separate from the data centers: medium cost, but
the single node itself can potentially become an SPOF of the solution.
• Dual cluster lock disks: the lowest cost, but susceptible to a slight chance of split-brain syndrome,
and only supported with Extended Campus Cluster and Extended Cluster for RAC.

Protecting Data through Replication


Different data replication methods offer different trade-offs between data consistency and data
currency. Your requirements will dictate your choice of data replication method.

Off-line Data Replication


Off-line data replication is the method most commonly used today. The data is stored on tape and is
kept in a vault at a remote location away from the primary data center. If a disaster occurs at the
primary data center, the off-line copy of data is used and a remote site functions in place of the failed
site. Because data is replicated using physical off-line backup, data consistency is fairly high, barring
human error or an untested corrupt backup. However, data currency is compromised by the amount
of time that elapses between backups.

Off-line data replication is fine for many applications for which recovery time is not critical to the
business. Although data might be replicated weekly or even daily, recovery could take from a day to
a week depending on the volume of data. Some applications, depending on the role they play in the
business, may need to have a faster recovery time, within hours or even minutes. For these
applications, off-line data replication would not be appropriate.

On-line Data Replication


On-line data replication is a method of copying data from one site to another across a link. It is used
when very short recovery time, from minutes to hours, is required. To be able to recover use of an
application in a short time, the data at the alternate site must be replicated in real time on all disks.

Data can be replicated either synchronously or asynchronously. Synchronous replication requires
one disk write to be completed and replicated before another disk write can begin. This method
guarantees data consistency and currency during replication. However, as distance increases
between data centers, it greatly reduces data replication capacity and application performance, as
well as system response time. Asynchronous replication does not require the primary site to wait
for one disk write to be replicated before beginning another. This can be an issue with data
currency, depending on the volume of transactions. An application that has a very large volume of
transactions can get hours behind in replication using asynchronous replication. If the application fails
over to the remote site, it would start up with data that is not current, and this may not be desirable.
Whereas data consistency and currency are inherent traits of synchronous replication mode, in
asynchronous replication, guaranteed write ordering must be provided to ensure data consistency,
and the level of data currency is based on customer requirements and the cost the customer is willing
to pay. Note that not all asynchronous data replication facilities guarantee write ordering.
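
The trade-off between the two modes can be sketched with some back-of-the-envelope figures
(Python, with all numbers invented for illustration): synchronous replication pays the inter-site round
trip on every write but leaves nothing unreplicated, while asynchronous replication acknowledges
writes locally and can lose whatever is still queued when the primary site is destroyed:

    # Simplified illustration of the currency trade-off described above; all figures are hypothetical.
    def synchronous(writes: int, local_ms: float, round_trip_ms: float):
        """Returns (total time spent in ms, writes at risk if the primary site is lost)."""
        return writes * (local_ms + round_trip_ms), 0

    def asynchronous(writes: int, local_ms: float, replicated_so_far: int):
        """Writes complete locally; anything not yet replicated is lost with the primary site."""
        return writes * local_ms, writes - replicated_so_far

    print(synchronous(1000, local_ms=0.5, round_trip_ms=2.0))       # (2500.0, 0)
    print(asynchronous(1000, local_ms=0.5, replicated_so_far=800))  # (500.0, 200 writes at risk)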

Currently the two ways of replicating data on-line are physical data replication and logical data
replication. Either of these can be configured to use synchronous or asynchronous writes.

Physical Data Replication


Each physical write to disk is replicated on another disk at another site. Because the replication is a
physical write to disk, it is not application dependent. This allows each node to run different
applications under normal circumstances. Then, if a disaster occurs, an alternate node can take
ownership of applications and data, provided the replicated data is current and consistent.

Physical data replication can be done in software or hardware. MirrorDisk/UX is an example of
physical replication done in software: a disk I/O is written to each storage device connected to the
node, requiring the node to make multiple disk I/Os. Continuous Access XP on the HP StorageWorks
Disk Array XP series is an example of physical replication in hardware: a single disk I/O is replicated
across the Continuous Access link to a second XP disk array.

Replication Mode
Currently, there are three hardware physical data replication products integrated and supported with
HP-UX Disaster Tolerant Solutions: CAXP, CAEVA, and EMC SRDF. Both CAXP and EMC SRDF are
supported in both synchronous and asynchronous modes.

Advantages of physical replication in hardware are:


• There is little or no lag time writing to the replica. This means that data remains very current.
• Replication consumes no additional CPU.
• The hardware deals with resynchronization if the link or disk fails. Moreover, resynchronization is
independent of CPU failure; if the CPU fails and the disk remains up, the disk knows it does not
have to be resynchronized.
• Data can be copied in both directions, so that if the primary fails and the replica takes over, data
can be copied back to the primary when it comes back up.
• Data recovery is easier and faster because the data is already available on the remote storage;
there is no need to restore from tape.

Disadvantages of physical replication in hardware are:


• The logical order of data writes is not maintained during resync-recovery after a link failure and
recovery. When a replication link goes down and transactions continue at the primary site, writes
to the primary disk are tracked in a bitmap. When the link is restored, if there has been more than
one write to the primary disk, there is no way to determine the original order of transactions. This
increases the risk of data inconsistency in the replica during resynchronization (a short illustration
follows this list).
• Because the replicated data is a write to a physical disk block, database corruption and human
errors, such as the accidental removal of a database table, are replicated at the remote site.
• Redundant disk hardware and cabling are required. This at least doubles data storage costs. Also,
because the technology is in the disk itself, this solution requires specialized hardware.
• For architectures using dedicated cables, the distance between the sites is limited by the cable
interconnect technology. Different technologies support different distances and provide different
“data throughput” performance.
• For architectures using common carriers, the costs can vary dramatically, and the reliability of the
connection can vary, depending on the Service Level Agreement.
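
The bitmap-resynchronization point in the first disadvantage above can be illustrated with a short
sketch (Python, with invented block names; this is not vendor firmware logic). While the link is down,
the array records only which blocks changed, not the order in which they changed, so the copy-back
cannot replay the original transaction sequence:

    # Toy illustration of bitmap-based resynchronization; block names are invented.
    ordered_writes = [("blk7", "debit"), ("blk2", "credit"), ("blk7", "commit")]

    changed_blocks = set()      # the "bitmap": which blocks changed while the link was down
    for block, value in ordered_writes:
        changed_blocks.add(block)

    # Resynchronization copies the marked blocks in block order, not transaction order:
    print(sorted(changed_blocks))   # ['blk2', 'blk7']; the debit/credit/commit sequence is lost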

Advantages of physical replication in software are:


• There is little or no time lag between the initial and replicated disk I/O, so data remains very
current.
• The solution is independent of disk technology, so you can use any supported disk technology.

• Data copies are peers, so there is no issue with reconfiguring a replica to function as a primary disk
after failover.
• Because there are multiple read devices, that is, the node has access to both copies of data, there
may be improvements in read performance.
• Writes are synchronous unless the link or disk is down.

Disadvantages of physical replication in software are:


• As with physical replication in the hardware, the logical order of data writes is not maintained.
When the link is restored, if there has been more than one write to the primary disk, there is no
way to determine the original order of transactions.
• Distance between sites is limited by the physical disk link capabilities.
• Performance is affected by many factors: CPU overhead for mirroring, double I/O writes, degraded
write performance, and CPU time for resynchronization. In addition, CPU failure may cause a
resynchronization even if it is not needed, further affecting system performance.

Logical Data Replication


Logical data replication is a method of replicating data by repeating the sequence of transactions at
the remote site. Logical replication often must be done at both the file system level and the database
level in order to replicate all of the data associated with an application.
one or more database replication products. An example is the Oracle Standby Database. Logical
replication can be configured to use synchronous or asynchronous writes. Transaction processing
monitors (TPMs) can also perform logical replication.

For logical data replication, the Continentalclusters product currently has a fully integrated and
supported solution with Oracle 8i Standby Database. The integration script is available in the
Enterprise Cluster Master Toolkit. Contributed integration templates for Continentalclusters with
Oracle 9i Data Guard are available for download from http://haweb.cup.hp.com/ATC/. While the
integration templates for Oracle 9i Data Guard have been tested with Continentalclusters by ACSL,
the scripts are provided at no charge, with no support from HP.

Advantages of using logical replication are:


• The distance between nodes is limited only by the networking technology.
• There is no additional hardware needed to do logical replication, unless you choose to boost CPU
power and network bandwidth.
• Logical replication can be implemented to reduce the risk of duplicating human error. For example,
if a database administrator erroneously removes a table from the database, a physical replication
method will duplicate that error at the remote site as a raw write to disk. A logical replication
method can be implemented to replicate only database transactions, not database commands, so
such errors would not be replicated at the remote site. This also means that administrative tasks,
such as adding or removing database tables, have to be repeated at each site.
• With database replication you can roll transactions forward or backward to achieve the level of
currency desired on the replica, although this functionality is not available with file system
replication.

Disadvantages of logical replication are:


• It uses significant CPU overhead, because transactions are often replicated more than once and
logged to ensure data consistency, and all but the simplest database transactions take significant
CPU. It also uses network bandwidth, whereas most physical replication methods use a separate
data replication link. As a result, there may be a significant lag in replicating transactions at the
remote site, which affects data currency.
• When a site disaster occurs, logical records or logs that are being prepared for shipment or are in
the process of being transferred to the recovery site will be lost. The amount of data loss can be
significant, depending on the number of transactions contained within those records or logs (e.g.,
an Oracle archive log can potentially contain hundreds of database transactions). Reducing the
number of transactions contained within each transfer and increasing the frequency of the
transfers, which also improves data currency, minimizes data loss (see the rough calculation after
this list).
• If the primary database fails and is corrupt, and the replica takes over, the process for restoring the
primary database so that it can be used as the replica is complex. It often involves recreating the
database and doing a database dump from the replica.
• Logic errors in applications or in the RDBMS code itself that cause database corruption will be
replicated to remote sites. This is also an issue with physical replication. However, Oracle Standby
can be configured so that replicated logs are not applied immediately to the standby database,
providing a window for DBA intervention.
• Most logical replication methods do not support personality swapping, which is the ability after a
failure to allow the secondary site to become the primary and the original primary to become the
new secondary site. This capability can provide increased up time.
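
The rough calculation referred to in the second disadvantage above is sketched below (Python, with
invented transaction rates). The exposure at failure time is roughly the transactions generated since
the last completed transfer, so shipping smaller logs more frequently shrinks both the potential loss
and the currency gap:

    # Rough arithmetic only; the rates are hypothetical, not measured values.
    def worst_case_loss(txn_per_minute: int, minutes_between_transfers: float) -> int:
        """Approximate transactions lost if the site fails just before the next transfer."""
        return int(txn_per_minute * minutes_between_transfers)

    print(worst_case_loss(txn_per_minute=500, minutes_between_transfers=30))  # 15000
    print(worst_case_loss(txn_per_minute=500, minutes_between_transfers=5))   #  2500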

Recommended Data Replication


The recommended disaster tolerant architecture, if budgets allow, is the following combination:
• For performance and data currency: physical data replication.
• For data consistency: either create a second, point-in-time copy of the physical replica at the
remote site (using BC or BCV), or add logical data replication that would be used only in cases
where the primary physical replica is corrupt.

Using Alternative Power Sources


In a high-availability cluster, redundancy is applied to cluster components, such as PV links, redundant
network cards, power supplies, and disks. In disaster tolerant architectures another level of protection
is required for these redundancies. Each data center that houses part of a disaster tolerant cluster
should be supplied with power from a different circuit. In addition to a standard UPS (uninterrupted
power supply), each node in a disaster tolerant cluster should be on a separate power circuit.

Housing remote nodes in another building often implies they are powered by a different circuit, so it
is especially important to make sure all nodes are powered from a different source if the disaster
tolerant cluster is located in two data centers in the same building. Some disaster tolerant designs go
as far as ensuring their redundant power source is supplied by a different power substation on the
grid, and the power circuits are fed from different grids. This adds protection against large-scale
power failures, such as brownouts, sabotage, or electrical storms.

Creating Highly Available Networking


The two critical elements in a disaster tolerant solution are the cluster communication link, and the
data replication link or host to storage connections.

Standard high-availability guidelines require redundant networks. Redundant networks may be highly
available, but they are not disaster tolerant if a single accident can interrupt both network
connections. For example, if you use the same trench to lay cables for both networks, you do not have
a disaster tolerant architecture because a single accident, such as a backhoe digging in the wrong
place, can sever both cables at once. This may lead to split-brain syndrome in an Extended
Campus Cluster using dual cluster lock disks. In a disaster tolerant architecture, the reliability of the
network is paramount. To reduce the likelihood of a single accident causing both networks to fail,
redundant network cables should be installed to use physically different routes for each network.

In addition to redundant lines, you also need to consider what bandwidth you need to support the
data replication method you have chosen. Bandwidth affects the rate of data replication, and
therefore the currency of the data at the remote site. For Extended Campus Cluster, Extended Cluster
for RAC, and Metrocluster, the networking link for cluster communication should have no more than
200 milliseconds of latency.
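
A back-of-the-envelope sizing exercise (Python, with invented rates; not a recommendation for any
particular link technology) shows why bandwidth matters for currency: if the application’s sustained
write rate exceeds what the replication link can carry, an asynchronous replica falls further behind
every second:

    # Hypothetical sizing sketch; substitute measured write rates and link capacity.
    def replica_lag_mb(seconds: int, write_mb_per_s: float, link_mb_per_s: float) -> float:
        """MB of unreplicated data queued after the interval (0 if the link keeps up)."""
        return max((write_mb_per_s - link_mb_per_s) * seconds, 0.0)

    print(replica_lag_mb(3600, write_mb_per_s=40.0, link_mb_per_s=25.0))  # 54000.0 MB behind after 1 hour
    print(replica_lag_mb(3600, write_mb_per_s=10.0, link_mb_per_s=25.0))  # 0.0: the link keeps up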

The reliability of the data replication link affects whether or not data replication happens, and
therefore the consistency of the data at the remote site. Dark fiber is more reliable but more costly
than leased lines.

Cost influences both bandwidth and reliability. It is best to address data consistency issues first by
installing redundant lines, then weigh the price of data currency and select the line speed
accordingly.

Managing a Disaster Tolerant Environment


In addition to the changes in hardware and software to create a disaster tolerant architecture, there
are also changes in the way you manage the environment. Configuration of a disaster tolerant
architecture needs to be carefully planned, implemented and maintained. There are additional
resources needed, and additional decisions to make concerning the maintenance of a disaster
tolerant architecture.

• Manage it in-house, or hire a service?


Hiring a service can remove the burden of maintaining the capital equipment needed to recover
from a disaster. Most disaster recovery services provide their own off-site equipment, which reduces
maintenance costs. Often the disaster recovery site and equipment are shared by many companies,
further reducing cost. Managing disaster recovery in-house, on the other hand, gives you complete
control over the type of redundant equipment used and the methods used to recover from a disaster.

• Implement automated or manual recovery?


Manual recovery costs less to implement and gives more flexibility in making decisions while
recovering from a disaster. Evaluating the data and making decisions can add to recovery time, but
it is justified in some situations, for example if applications compete for resources following a
disaster and one of them has to be halted. Automated recovery reduces the amount of time and in
most cases eliminates human intervention needed to recover from a disaster.
You may want to automate recovery for any number of reasons:
– Automated recovery is usually faster.
– Staff may not be available for manual recovery, as is the case with “lights-out” data centers.
– Reduction in human intervention is also a reduction in human error. Disasters do not happen
often, so lack of practice and the stressfulness of the situation may increase the potential for
human error.
– Automated recovery procedures and processes can be transparent to the clients.

Even if recovery is automated, you may choose to, or need to, recover from some types of disasters
manually. A rolling disaster, which is a disaster that happens before the cluster has recovered from a
previous failure, is an example of when you may want to switch over manually. If the data replication
link failed, and a data center then failed while the link was coming up and data was being
resynchronized, you would want human intervention to make a judgment call on whether the remote
site has consistent data before failing over.

• Who manages the environment and how are they trained?


Putting a disaster tolerant architecture in place without planning for the people aspects is a waste of
money. Training and documentation are more complex because the cluster is in multiple data
centers.

Each data center often has its own operations staff with their own processes and ways of working.
These operations people will now be required to communicate with each other and coordinate
maintenance and failover rehearsals, change control, IT process, as well as working together to
recover from an actual disaster. If the remote nodes are placed in a “lights-out” data center, the
operations staff may want to put additional processes or monitoring software in place to maintain
the nodes in the remote location. Rehearsals of failover scenarios are important to keep everyone
prepared. Changes made to the production environment (such as OS and/or application upgrades)
must also be tested at the recovery site, to ensure applications fail over correctly in the event of a
disaster. A written plan should outline what to do in case of disaster and should be rehearsed at a
minimum of once every 6 months, and ideally once every 3 months.

• How is the environment maintained?


Planned downtime and maintenance, such as backups or upgrades, must be more carefully thought
out because they may leave the cluster vulnerable to another failure. For example, when doing
system maintenance in a Serviceguard cluster, nodes need to be brought down for maintenance in
pairs: one node at each site, so that quorum calculations do not prevent automated recovery if a
disaster occurs during planned maintenance. Rapid detection of failures and rapid repair of
hardware is essential so that the cluster is not vulnerable to additional failures. Testing is more
complex and requires personnel in each of the data centers. Site failure testing should be added to
the current cluster testing plans.
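
The reason for bringing nodes down in pairs follows from the same strict-majority arithmetic discussed
under cluster arbitration. A short worked example (Python, with an illustrative 2+2+1 node layout)
makes the point:

    # Worked example only; a 2-node + 2-node + 1-arbitrator cluster is assumed for illustration.
    def has_quorum(previous_active: int, surviving: int) -> bool:
        return surviving * 2 > previous_active

    # Maintenance on one node at site A only: 4 nodes remain active. If site B (2 nodes)
    # then fails, 2 of 4 survive, which is exactly 50% and NOT a strict majority.
    print(has_quorum(previous_active=4, surviving=2))   # False: automated recovery blocked

    # Maintenance on one node at EACH site: 3 nodes remain active (1 + 1 + arbitrator).
    # If site B's remaining node fails, 2 of 3 survive, a strict majority.
    print(has_quorum(previous_active=3, surviving=2))   # True: automated recovery proceeds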

For more information
• Product User’s Guides and Release Notes, found at http://docs.hp.com/en/ha.html
• Current Unix Server Configuration Guide (by chapter) - found under “Ordering/Configuration
Guides” at http://source.hp.com/portal/site/source/
• DTS Whitepapers and Customer Presentations – found with the search key “DTS” at
http://source.hp.com/portal/site/source/
• HA ATC links, found at http://haweb.cup.hp.com/ATC/
• Clusters for High Availability, Second Edition, Peter S. Weygant
• DWDM: A white paper, Joseph Algieri and Xavier Dahan
• Evaluation of Data Replication Solutions, Bob Baird
• Extended SAN: A Performance Study, Xavier Dahan
• Extended MC/Serviceguard Cluster Configurations (Campus Cluster), Joseph Algieri and Xavier
Dahan
• High Availability Technical Documentation, found at http://docs.hp.com/hpux/ha/index.html
• HP Extended Cluster for RAC – 100 Kilometer Separation Becomes a Reality

© 2003 Hewlett-Packard Development Company, L.P. The information contained herein is subject to
change without notice. The only warranties for HP products and services are set forth in the express
warranty statements accompanying such products and services. Nothing herein should be construed
as constituting an additional warranty. HP shall not be liable for technical or editorial errors or
omissions contained herein.

Itanium is a trademark or registered trademark of Intel Corporation in the U.S. and other countries
and is used under license.
XXXX-XXXXEN, 03/2006
