HP StorageWorks 3 Data Center Replication
White paper

Table of contents

Executive summary
The HP StorageWorks 3 Data Center Replication solution
HP StorageWorks XP Continuous Access Synchronous
HP StorageWorks XP Continuous Access Asynchronous
HP StorageWorks XP Continuous Access Journal
Recovery point objective
  RPO equal to 0
  RPO greater than 0
Recovery time objective
  RTO between one and five minutes
  RTO of one hour or more
The prior Multi-Site Disaster Tolerant Solution (MSDT)
3DC with HP StorageWorks XP Continuous Access Journal configurations
Comparing cascading and multi-target configurations
Detailed rules and restrictions for 3DC configurations
  HP StorageWorks XP Continuous Access Journal and mirror unit descriptors
  A volume can only have one source
  Replication pairs independent of each other
  Suspending a downstream link before the upstream link goes out of order
3DC failover and failback actions without a third link
  Action 1: Transition from a cascaded to multi-target configuration
  Action 2: Transition from a multi-target to cascaded configuration
  Action 3: Cascaded configuration data center failure
  Failback to DC1 after a cascaded configuration failover
  Action 4: Multi-target configuration data center failure
  Failback to DC1 after a multi-target configuration failover to DC2
  Action 5: Multiple data center failures
  Failback to DC1 after a multiple DC failure
The benefits of the optional third data link
  Action 6: Using the optional third link
  Recovery after the use of the optional third data link
Other configuration options
  4DC configuration
  A 3DC configuration with four copies
  3DC with fast delta resynchronization
Appendix A—CLI Management of 3DC configurations
  HP StorageWorks XP RAID Manager and cascaded (forwarded) commands
  Raidscan command options
  Options for the pairsplit command
  XP RAID Manager and 3DC delta resync
  The "nocsus" paircreate command options
  Displaying the journal status for "-nocsus"
  The "horctakeoff" command for 3DC
  Horctakeoff command on running L2 site
  Horctakeoff command on L1 local site
  Horctakeoff command on L1 recovery site
  Horctakeoff command on L2 remote site
For more information

Executive summary

To stay competitive in today’s business climate, organizations must find ways to meet their business needs while controlling IT costs. IT departments face flat budgets and, at the same time, find that their organizations have become increasingly dependent on uninterrupted access to business-critical data. New governmental regulations are requiring some types of businesses to implement resilient disaster-tolerant systems and processes. Adding to this burden is the accelerating pace of change, which must be effectively managed for a business to succeed.

In today’s world, prudent IT administrators prepare to recover from two types of disasters as part of a complete Business Continuity and Availability (BC and A) plan. The first is a localized disaster, affecting a building or a small set of buildings. The second is a wide-area disaster, such as a hurricane or a regional power outage. Enterprises must replicate data to alternate data centers, located at a variety of distances from the primary data center, while maintaining acceptable data currency standards.

Until now, available remote replication technology has involved drawbacks. Synchronous remote replication solutions are designed to maintain total data currency, but can induce application performance degradation when replicating over long distances. By contrast, asynchronous and journal remote replication solutions are designed to provide acceptable performance and data consistency over long replication distances, but cannot guarantee complete data currency.

Today, however, the limitations of these earlier solutions can be overcome by the HP StorageWorks 3 Data Center Replication (3DC) architecture. This technology provides both data currency and consistency, and protects against both local and wide-area disasters by simultaneously supporting short-distance synchronous replication and long-distance asynchronous replication emanating from the same source volume.

This paper provides details on the planning, configuration, and maintenance of a 3DC solution.

Note: A 3DC solution is only possible with the HP StorageWorks XP24000/XP20000/XP12000/XP10000 Disk Arrays.

The HP StorageWorks 3 Data Center Replication solution

Figure 1 shows a high-level implementation of a 3DC solution that combines the data consistency of synchronous replication with the long-distance capability of asynchronous or journal replication to protect against local and wide-area disasters. This technology provides other benefits, including:

Maintaining more efficient data currency. Using synchronous replication over a short distance in a campus or metropolitan area cluster provides the highest level of data currency without undue impact to application performance.

Permitting swift recovery. A campus/metropolitan cluster implementation allows for fast automated failovers after a local area disaster with minimal to no transaction loss.

Permitting recovery even when a disaster exceeds traditional regional boundaries. A wide-area disaster could disable both data centers 1 and 2, but with some manual interaction, operations can be shifted to data center 3 and continue after the disaster.

Shifting to staffing outside the disaster area. A wide-area disaster also affects people located within the disaster area, both professionally and personally. By moving operations out of the region to a remotely located recovery data center, operational responsibilities shift to people not directly affected by the disaster.

Figure 1: 3DC solution benefits
[Figure: Data Center 1 replicates synchronously over campus/metropolitan distance to Data Center 2 (minimized performance impact to the application, fast automated failover during local disasters, data currency guaranteed) and via journal replication over cross-country/continental distance to Data Center 3 (protection against both local and regional disasters, push-button failover, data consistency guaranteed).]

HP StorageWorks XP Continuous Access Synchronous

HP StorageWorks XP Continuous Access Synchronous (previously known as HP StorageWorks Continuous Access XP) is an HP StorageWorks XP “array-to-array” mirroring product that provides synchronous remote replication. It is ideal for replicating data over local or metropolitan networks with small latencies (typically less than 2 ms) and provides replication with full data currency on both the local and remote data center. Because every I/O written to the local array must be sent over the link to the remote array, completed on the remote array, and confirmed to the local array before the I/O can be completed, the latency of the link affects each I/O. Therefore, host performance is directly impacted by the amount of network latency between the data centers.

Most companies prefer synchronous replication because of the high currency of data at the remote site. However, the data centers must be located close to each other to reduce the latency performance impact. This situation often leads to data centers being placed in the same infrastructure, which puts companies at risk of having no operational data in the event of a large disaster.

Seamless integration into a campus/metropolitan cluster environment allows for fast, automated failover between synchronously connected data centers with high levels of data currency. This arrangement is ideal for protection against limited disasters that affect only a single data center.

Figure 2 shows the XP Continuous Access Synchronous (Sync) process steps.

1. The primary data center server writes to the primary data volume.

2. The primary array writes data to the remote array.

3. The remote array acknowledges to the primary array that the write was received.

4. I/O is acknowledged by the primary array to the server.

Figure 2: HP StorageWorks XP Continuous Access Synchronous processes
[Figure: Data Center 1 (P-VOL) and Data Center 2 (S-VOL); the host write (1) is sent to the remote array's cache (2), acknowledged by the remote array to the primary array (3), and then acknowledged to the server (4).]
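For illustration, a minimal XP RAID Manager sketch of creating and verifying such a pair follows. The group name sync3dc and the timeout value are hypothetical; paircreate, pairevtwait, and pairdisplay are standard XP RAID Manager commands.

    # Create the DC1->DC2 XP Continuous Access Sync pair from the DC1 host.
    # -vl makes the local volume the P-VOL; -f data sets the "data" fence level,
    # so host writes are rejected if the remote copy cannot be updated.
    paircreate -g sync3dc -f data -vl

    # Wait (up to 600 seconds) for the initial copy to finish, then check status.
    pairevtwait -g sync3dc -s pair -t 600
    pairdisplay -g sync3dc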

HP StorageWorks XP Continuous Access Asynchronous

HP StorageWorks XP Continuous Access Asynchronous (Async, previously known as HP StorageWorks Continuous Access XP Extension) is an HP StorageWorks XP "array-to-array" mirroring product that provides asynchronous remote replication by holding any unsent data in an area of the cache called the sidefile. It provides an ideal solution for longer-distance replication, over 50–200 km (31–124 miles), with far less latency impact on the performance of the primary server. The remotely replicated data is consistent (but most of the time not current), which can lead to some transactions being lost during a disaster. However, the added distance between data centers makes it possible to retain most of the data needed to restart an application or recover in the event of a large-scale disaster. Meanwhile, host I/O is not affected by the latency between data centers.

Although Continuous Access XP Async is a trusted solution for long-distance replication, it has some operational challenges. The storage of unsent data in the sidefile can result in very high utilization of array cache resources. Because the array must maintain data consistency at all times, all unsent data must be retained until it can be sent to the remote array and its receipt has been acknowledged. If a host or application writes an abnormally large burst of data to the primary array, much higher than the physical link between arrays can handle, the sidefile begins to fill. When the sidefile reaches a preset value (the default is 30% of the total cache capacity dedicated for cache LUNs), the array must apply I/O inflow control to ensure that it does not run out of internal resources. This procedure slows host performance only until the sidefile's unsent data volume returns to a normal level, but the impact on the host and application can be significant. The alternative is to suspend the replication process and begin to track any new changes with a per-track bitmap, but this option freezes the remote data center data currency at the point of replication suspension. This application impact due to limited internal cache resources is one of the drawbacks of the Continuous Access XP Async solution.

Another issue resulting from the limited internal resources arises when the physical replication link fails, even for a short time: there are not enough cache resources to store the unsent data, so the replication process suspends soon after the physical link fails. This in itself is not a critical issue because data consistency is still assured at the remote data center. However, any changes to the production volumes during the suspend status are tracked using a bitmap. Therefore, when performing a replication volume pair resynchronization after the physical links have been recovered, out-of-order data is sent to the remote data center until the resynchronization completes. This situation leaves the remote data center with inconsistent data during the time window required to complete the resynchronization operation. This condition can be detrimental if the primary data center suffers a disaster during the bitmap-based resynchronization, leaving a company with no remotely usable data, unless a best practice is followed to create a local data copy at the remote site before beginning the resynchronization.

Figure 3 shows the XP Continuous Access Asynchronous (Async) process steps.

1. The primary data center server writes to cache and the primary data volume.

2. I/O is immediately acknowledged by the array to the server.

3. The cached data is written asynchronously and in order to the remote array (a “push” data model).

4. The remote data center XP array acknowledges the write to the primary array, clearing the cache.

Figure 3: HP StorageWorks XP Continuous Access Async processes
[Figure: Data Center 1 (P-VOL with sidefile) and Data Center 2 (S-VOL); the host write (1) is acknowledged immediately (2), pushed asynchronously and in order from the sidefile to the remote array (3), and acknowledged back to the primary array, clearing the cache (4).]
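As an aside, a standalone Continuous Access XP Async pair is created in much the same way as a Sync pair, but with the async fence level; the group name asyncgrp below is hypothetical, and as noted next, such a pair cannot be combined into a 3DC configuration.

    # Create a standalone CA Async pair; -f async selects the asynchronous fence level.
    paircreate -g asyncgrp -f async -vl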

Continuous Access XP Async does not support multiple remote copies from a single source, or a combination of Continuous Access XP Sync and Continuous Access XP Async replication from the same primary data volume. Therefore, you cannot use Continuous Access XP Async in the 3DC solution. The older HP StorageWorks Multi-Site Disaster Tolerant Solution (MSDT) used Continuous Access XP Sync and Continuous Access XP Async in combination with HP StorageWorks XP Business Copy to replicate data over three data centers.

HP StorageWorks XP Continuous Access Journal

HP StorageWorks XP Continuous Access Journal is an HP StorageWorks XP “array-to-array” mirroring product that improves the Continuous Access XP Async product by replacing the cache-based sidefile with dedicated disk LDEV volumes, known as a “journal pool.” This method improves the ability to store in-order unsent data by using larger and less expensive storage volumes and improves the resiliency of the replication solution. This increased capacity for unsent data overcomes the limitation of Continuous Access XP Async regarding sudden bursts of high I/O from the application filling the cache-based sidefile. Continuous Access XP Journal can handle much larger and longer bursts of data from the application without resorting to an out-of-order bitmap.

The larger journal pool space also enables XP Continuous Access Journal to continue operations for a substantial period of time following a physical replication link failure, before data must be tracked by a bitmap. It continues the in-order journal replication as long as the physical replication link recovers before the journal volume is full, thereby maintaining data consistency on the remote array more often.

XP Continuous Access Journal is the ideal solution for high network latency and low network bandwidth environments, and overcomes the limitations Continuous Access XP Async may have with physical replication links that lack the bandwidth to cope with application I/O spikes or that might be susceptible to short periods of communication outages. Whereas Continuous Access XP Async required remote replication links to be sized to accommodate "peak" write activity, XP Continuous Access Journal allows links to be sized for "average" write I/O activity, which can account for significant savings on leased lines.

Figure 4 shows the XP Continuous Access Journal process steps.

1. The primary volume (P-VOL) receives a write command from the host.

2. The primary array stores the received write data in its cache, and creates metadata for the write in the journal area in the cache. If necessary, the journal metadata and write data are de-staged to the disk-based journal volumes (P-JNL).

3. The primary array acknowledges the write to the host, enabling the host to perform the next write operation.

4. The remote array regularly polls the primary array for journal information by using a read journal operation (a "pull" data model).

5. When the primary array has any journal data available, it responds to the read journal operation and sends all journal data in order by way of a single I/O or multiple large I/Os to the remote array.

6. The remote array sorts journal data according to sequence numbers and then applies the data in order to the secondary data volumes.

7. When data is successfully applied to secondary data volumes (S-VOL), the remote array informs the primary array of successfully applied journal sequence numbers. The primary array discards these sequence numbers from the local journal volume.

Figure 4: HP StorageWorks XP Continuous Access Journal processes
[Figure: Data Center 1 (P-VOL and P-JNL) and Data Center 2 (S-JNL and S-VOL); the host write (1) is recorded to the journal (2) and acknowledged (3); the remote array issues read-journal requests (4), receives the journal data (5), restores it in order to the S-VOL (6), and confirms the applied sequence numbers to the primary array (7).]
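For illustration, a minimal sketch of creating a journal pair with XP RAID Manager follows. The group name jnl3dc and the journal IDs 0 and 1 are placeholders; the -jp and -js options select the primary and secondary journal groups.

    # Create the XP Continuous Access Journal pair; journal pairs use the
    # "async" fence level, and -jp/-js name the P-JNL and S-JNL journal IDs.
    paircreate -g jnl3dc -f async -vl -jp 0 -js 1

    # The initial copy of a long-distance pair can take a while.
    pairevtwait -g jnl3dc -s pair -t 1800
    pairdisplay -g jnl3dc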

XP Continuous Access Journal can be used in combination with Continuous Access XP Sync, allowing two simultaneous copies of the same (primary) source volume. This functionality enables a three data center replication solution with Continuous Access XP Sync as the local metropolitan replication and XP Continuous Access Journal as the out-of-region remote replication. Both copies can be active simultaneously and are independent of each other.

Recovery point objective

A recovery point objective (RPO) is a goal for how much data loss can be tolerated in the event of a major disruption or disaster. Disaster recovery solutions define an RPO for all applications, with the most critical applications having the shortest RPO. A recovery is only possible if a data-consistent copy exists. Therefore, be sure that one or more data-consistent and unaltered copies are available (the more recent the copy, the better the recovery point).

RPO equal to 0

When an application is sufficiently critical that the business cannot afford the loss of any transactions, its RPO is defined as being 0.

If an RPO of 0 is required, synchronous replication is the only available technology that can deliver it. Synchronous replication does not allow the primary data center to process a transaction until the secondary data center signals that the previous transaction has been received. If the communication link between the primary and secondary data centers is broken, synchronous replication can be set up to prevent the primary data center from processing transactions until the link is repaired or replication is disabled.

The performance of synchronous solutions is sensitive to the network latency between primary and secondary data centers. The longer the distance, the more latency is induced in each transaction. For this reason, HP recommends synchronous replication only for distances shorter than 50–200 km (approximately 31–124 miles).

RPO greater than 0

An RPO greater than 0 indicates that the administrator is willing to sacrifice some amount of data in the event of a disaster due to the use of asynchronous replication.

For this type of solution, the asynchronous or journal replication options are best. Although complete data currency is not assured for this solution, data consistency is provided when using any of the HP StorageWorks XP Continuous Access products.

Recovery time objective

A recovery time objective (RTO) indicates how much time will pass before applications are available for users again. Any disaster recovery solution requires a defined RTO.

RTO between one and five minutes

This setting indicates that application failover to the disaster recovery data center must be automated, with a very fast reaction to any disaster and without waiting for in-flight data to arrive. This requirement normally implies cluster integration.

RTO of one hour or more

This setting indicates that the administrator is willing to first assess the disaster and any possible data loss before initiating a data center failover and application recovery. Recovery can be automatic, but in most cases, a push-button (single-command failover) approach is used after the decision to recover is made.

Part of the reason behind such a large RTO may be the time necessary to enhance the recovery point before starting the application in the recovery data center.

For more information about implementing a cascaded disaster recovery solution using the HP MetroCluster/ContinentalClusters solution, read the "Cascading Failover in a ContinentalClusters" white paper.

The prior Multi-Site Disaster Tolerant Solution (MSDT)

The MSDT solution (see Figure 5) was the first to support replication over multiple data centers, even before the availability of the current 3DC solution. The obsolete MSDT solution used a combination of HP XP Continuous Access products and HP XP Business Copy for a synchronous solution between two nearby data centers with a delayed point-in-time copy to a third, distant data center. The solution was quite complex to configure and required constant monitoring and maintenance, specifically the scripting of the data propagation to data center 3. Only a static point-in-time copy of data was available on data center 3, and the process of deciding between the Continuous Access XP Async copy or the XP Business Copy replica of data on data center 3 was very complex.

Figure 5: HP StorageWorks MSDT solution
[Figure: XP Continuous Access Synchronous replicates the P-VOL to an S/P-VOL at the second data center; XP Business Copy (BC) volumes there feed XP Continuous Access Asynchronous replication to the third data center, where further BC copies hold the point-in-time data.]

3DC with HP StorageWorks XP Continuous Access Journal configurations

With the development of HP XP Continuous Access Journal technology, it became possible to propagate data that was generated by an application and replicated using XP Continuous Access Sync directly and concurrently to a third data center using journal volumes. This process greatly improves the MSDT solution by removing the requirement for a constant data propagation process. Data is consistent in all data centers and is much more current on data center 3 than what the MSDT point-in-time process could deliver. Figure 6 shows the 3DC “cascaded” disaster-tolerant solution.

Figure 6: HP StorageWorks 3DC disaster-tolerant solution
[Figure: XP Continuous Access Synchronous replicates the P-VOL to an S/P-VOL at the second data center, and XP Continuous Access Journal (P-JNL to S-JNL) replicates from there to the S-VOL at the third data center.]

In this solution, data is replicated from data center 1 to data center 2 using XP Continuous Access Sync. The data is then automatically journaled on data center 2 and replicated to data center 3 using XP Continuous Access Journal. Both replication pairs can remain in PAIR status at all times. No point-in-time operations or scripting are necessary to keep data on data center 3 virtually current and available.

Comparing cascading and multi-target configurations

When designing a 3DC solution, two configurations can be considered: cascading and multi-target. In addition, the solution can switch between these two configurations at any time during normal operation. The main characteristics of the configuration are determined by two factors: where data enters the configuration (that is, on which data center the application is running) and in what direction the data flows. Figure 7 shows the cascaded and multi-target configuration data flows.

Figure 7: Cascaded and multi-target configurations
[Figure: Always check the flow of data and the data entry point. Cascaded (1:1:1): P-VOL → S/P-VOL (sync) → P-JNL/S-JNL → S-VOL (journal). Multi Target (1:2): P/P-VOL → S-VOL (sync) and P/P-VOL → P-JNL/S-JNL → S-VOL (journal). Arrows indicate the direction of data flow.]

Cascaded configuration: In this configuration, data enters the system at one end, is replicated synchronously to the next storage array (data center), and from there is replicated to the last storage array (data center). In most cases, the starting point of the configuration indicates the data center or host that runs the application under normal conditions, with the second data center being the automated cluster failover node and the third data center being the manual long-distance failover node.

Multi-target configuration: In this configuration, the data enters the configuration on a specific node and then splits into two directions. One direction is the synchronous replication to data center 2. The other direction is the journaled replication to data center 3.

Your situation and requirements determine whether you need a cascaded solution or a multi-target solution. Normally, the storage array that runs the application most of the time determines the configuration, but during a failover on the synchronous replication pair, a switch from cascade to multi-target or from multi-target to cascade occurs. There are no recommendations on whether to use one rather than the other, and both configurations have their own strengths and drawbacks, although multi-target allows for the delta (no full copy required) resync of the two remaining nodes in the event of a disaster at the primary node. This paper discusses both configurations and how to handle failures and recoveries in these configurations.

For documentation purposes, be sure to identify the point of data origin in a configuration and the direction of data flow. Not all diagrams have the point of origin on the left side of the diagram, which can be confusing. In some configurations, more than one application may be running, each with its own point of origin and, therefore, its own configuration. In some cases, data center naming is arbitrary and has no specific meaning to the configuration. For example, data center 1 often indicates the primary data center for the application and, therefore, the origination point of the data, but an application move to data center 2 changes the origination point and, in this case, indicates that the application is running on a failover node, not the primary node.

Detailed rules and restrictions for 3DC configurations

This section discusses basic rules and restrictions, as well as some special commands used to control 3DC solutions. It is important to understand the applicable restrictions when combining two replication products.

HP StorageWorks XP Continuous Access Journal and mirror unit descriptors

A mirror unit descriptor or number (MU#) is a special index number available with all volumes that provides an individual designator for each local or remote copy of a volume.

With the HP StorageWorks XP24000/XP20000/XP12000/XP10000 arrays, the internal structures of each volume support seven full copy mirror unit descriptors and 64 snapshot copy unit descriptors. Three of the full copy MU#s are for local replication copies using XP Business Copy, and are represented in the HP StorageWorks XP RAID Manager configuration file by the values 0, 1, and 2. The XP RAID Manager configuration file assumes MU# of 0 when no MU# is specified in the configuration file. However, to avoid confusion, the 0 should be explicitly defined in the configuration for XP Business Copy.

The fourth full copy MU# is the standard MU# used for remote replication and can be used for XP Continuous Access Sync, XP Continuous Access Async, or XP Continuous Access Journal replication. This MU# is never explicitly defined in the XP RAID Manager configuration file, and should always be left blank. This is also the only MU# that can be used for XP Continuous Access Sync or XP Continuous Access Async replication, and should be used for this purpose whenever a 3DC solution is configured.

The remaining three full copy MU#s are for XP Continuous Access Journal replication pairs only. These MU#s are represented by h1, h2, and h3 values in the XP RAID Manager configuration file. Currently, the XP24000/XP20000/XP12000/XP10000 Disk Arrays only support one XP Continuous Access Journal pair at any point in time per volume, and this one pair can use any of the four XP Continuous Access Journal MU#s.

With the XP24000/XP20000/XP12000/XP10000 Disk Arrays, it is possible to use XP Continuous Access Journal in combination with XP Continuous Access Sync to create two independent copies from the same source volume. When creating this configuration, the XP Continuous Access Sync replication must use MU# 0 and the XP Continuous Access Journal replication must use one of the remaining three “h” MU#s for remote replication.

Figure 8 shows all the available MU#s for each data volume, as well as the value to use in the XP RAID Manager configuration file to identify the specific replication instance for the volume and any environment variables (for example, HORCC_MRCF) necessary to address the copy using XP RAID Manager.

Figure 8: Mirror unit descriptors
[Figure: Each data volume exposes three XP Business Copy MU#s (RM MU# 0, 1, and 2, addressed with HORCC_MRCF=1; MU# 0 must be specified explicitly), 64 snapshot MU#s (Snap MU# 0-63), one XP Continuous Access Sync/Async MU# (CA-MU#0, not specified in the RM configuration file), and three XP Continuous Access Journal MU#s (RM MU# h1, h2, and h3).]

Note: For XP Continuous Access Journal, the mirror unit descriptor used in the XP RAID Manager configuration file has an "h" prefix. Only one XP Continuous Access Journal replication pair is supported at any point in time per volume, but XP Continuous Access Journal can be used in conjunction with XP Continuous Access Sync to create two remote copies of a P-VOL.

When creating a 3DC solution, the XP Continuous Access Sync pair always uses the remote replication mirror unit descriptor "0" (CA-MU#0). The XP Continuous Access Journal replication pair uses mirror unit descriptor "1," and in some configurations, mirror unit descriptor "2" is used to create temporary replication pairs.

Figure 9: 3DC solution and mirror descriptor usage
[Figure: In both the multi-target (1:2) and cascaded (1:1:1) configurations, the XP Continuous Access Sync replication uses MU#0 and the XP Continuous Access Journal replication (P-JNL to S-JNL) uses MU#1.]
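To make the MU# rules concrete, the following HORCM_DEV fragment is a hypothetical sketch of how the two pairs of Figure 9 could be declared for the same LDEV in the XP RAID Manager configuration file; the group names, device name, port, target ID, and LU# values are placeholders.

    HORCM_DEV
    #dev_group   dev_name   port#    TargetID   LU#   MU#
    sync3dc      db_vol     CL1-A    0          1           # CA Sync pair: MU# left blank (CA-MU#0)
    jnl3dc       db_vol     CL1-A    0          1     h1    # CA Journal pair: MU# h1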

A volume can only have one source

Although a single volume can be directly replicated to several separate volumes, a volume can only receive data from one source at a time. Data can be received from either a host attached to the array or from a primary volume in the configuration.

Figure 10 shows a valid configuration in which the volume only receives data from one source. The volume in the middle of the configuration receives data from the left and forwards data on to the right (data is cascaded “downstream”).

Figure 10: An S-VOL can have only one data source
[Figure: The P-VOL (MU#0) replicates to the S/P-VOL, which forwards the data downstream (MU#1) to the S-VOL.]

Figure 11 shows an invalid configuration in which a volume receives data from two primary volumes. This configuration is not allowed. The array fails the second pair creation operation or any takeover operation that would result in such a configuration. This situation typically occurs in a cascaded environment in which the application is moved from data center 1 to data center 3, and the volume on data center 2 must now act as an S-VOL for two pairs. For more information, see the "Action 5: Multiple data center failures" section in this paper.

Figure 11: Receiving data from two different sources is not allowed
[Figure: Two P-VOLs attempting to replicate into the same S/P-VOL (MU#0 and MU#1); the second pair cannot be created.]

The exception to this rule is when a takeover command on an S-VOL (center) fails because a P-VOL (left) is not available or because the replication links are down, leaving the S-VOL in SSWS (S-VOL suspended with swap pending) status. In this status, the S-VOL can still receive data from a host system, and a takeover from a downstream S-VOL (right) is also allowed. This status allows the volume to appear to act as an S-VOL for two replication pairs, but even in this status, updates can be performed from only one source.

Figure 12: Exception for S-VOL in SSWS status
[Figure: A takeover against the left P-VOL (MU#0) has left the center volume in SSWS status; a takeover from the downstream P-VOL (right, MU#1) is still allowed.]

Replication pairs independent of each other

When multiple replication pairs emanate from a single volume, each pair status is treated independently. A failure on one pair does not affect the status of another pair, as shown in Figure 13.

Figure 13: Failure on one pair does not affect other pairs
[Figure: The XP Continuous Access Sync pair (CA-MU#0, fenced) is in P-VOL PSUE status while the XP Continuous Access Journal pair (Jnl MU#1, P-JNL to S-JNL) remains in P-VOL PAIR status.]

The figure shows that a failure on the XP Continuous Access Sync pair changes the pair status to suspend with error (PSUE) and disables host write I/O to the volume (XP Continuous Access Sync with fence-level data). Although this failure stops host I/O to the P-VOL, the XP Continuous Access Journal pair status is not influenced by this failure and the pair remains in PAIR status. A local takeover on the P-VOL (enabling the P-VOL for host I/O without the XP Continuous Access Sync replication protection) also maintains the XP Continuous Access Journal replication environment.

In the same way, if the XP Continuous Access Journal link suffers a failure, the XP Continuous Access Journal pair status changes to suspend with error (PSUE), but this condition does not influence the XP Continuous Access Sync operations.

Suspending a downstream link before the upstream link goes out of order

Whenever a replication pair goes into COPY status for any length of time, updates to the secondary volume are performed out of order, on a bitmap basis. This results in the secondary volume being in an inconsistent data state for some time. If this secondary volume is also the source volume for further downstream replication, the downstream replication pair is not allowed to be in PAIR status, because inconsistent data would be replicated on a pair reporting a consistent PAIR status. Therefore, whenever an upstream pair must be resynchronized (or during some takeover operations) and will be inconsistent for some time, the downstream replication pairs must be suspended first. Figure 14 shows this process. In the XP24000 and XP20000 with firmware 60-01-68-00/00 or greater, and the XP12000 and XP10000 with firmware 50-05-06-00/00 or greater, this operation is performed automatically. However, resynchronization of the downstream pair, after the upstream pair reaches PAIR status, is an administrator-initiated manual operation.

Figure 14: Suspending a downstream link before the upstream link goes out of order
[Figure: Group1 (sync, MU#0) cannot be in COPY state while Group2 (journal, MU#1) is in PAIR status. Steps: (1) suspend Group2, (2) Group2 reaches PSUS, (3) resync Group1, (4) Group1 goes to COPY, (5) Group1 reaches PAIR, (6) resync Group2, (7) Group2 goes to COPY, (8) Group2 reaches PAIR. V05 firmware performs the suspension automatically when resyncing Group1 in step 3.]
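A hedged XP RAID Manager sketch of the sequence in Figure 14 follows, using the hypothetical group names group1 (sync) and group2 (journal):

    # Steps 1-2: suspend the downstream journal pair (it goes to PSUS).
    pairsplit -g group2

    # Steps 3-5: resynchronize the upstream sync pair and wait for PAIR status.
    pairresync -g group1
    pairevtwait -g group1 -s pair -t 600

    # Steps 6-8: only then resynchronize the downstream pair (always a manual step).
    pairresync -g group2
    pairevtwait -g group2 -s pair -t 600

On the firmware versions noted above, the initial suspension of group2 is issued automatically when group1 is resynchronized; the final resynchronization of group2 remains a manual step.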

Likewise, before performing a takeover on the XP Continuous Access Journal volume group (group2), which causes the XP Continuous Access Journal link to go out of order, you must suspend the XP Continuous Access Sync pair (group1). Performing the action shown in Figure 15 ensures that out-of-order or garbage data is not propagated to the XP Continuous Access Sync volumes.

Figure 15: Suspending the upstream link before the downstream link goes out of order
[Figure: Group1 (sync) cannot be in PAIR state if Group2 (journal) will be in COPY status after the takeover. Steps: (1) suspend Group1, (2) Group1 reaches PSUS, (3) take over Group2 at the site where the application will start, (4) Group2 goes to COPY, (5) Group2 reaches PAIR, (6) resync Group1, (7) Group1 goes to COPY, (8) Group1 reaches PAIR.]
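A corresponding sketch for Figure 15, again with hypothetical group names, where the application is to be started at the data center 3 host:

    # Step 1: suspend the upstream sync pair so no garbage data can cascade.
    pairsplit -g group1

    # Steps 3-5: take over the journal pair from the data center 3 host;
    # the pair swaps direction and passes through COPY back to PAIR.
    horctakeover -g group2
    pairevtwait -g group2 -s pair -t 1800

    # Steps 6-8: resynchronize the sync pair once the journal pair is in PAIR.
    pairresync -g group1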

3DC failover and failback actions without a third link

In a 3DC configuration, a failure of a host or data center that is actively running the application initiates an action for recovery. The action depends on the failure and the condition of the configuration after the failure. Figure 16 shows all the major actions that can occur in a 3DC configuration in which there are only XP Continuous Access data links between two of the three data centers (no phantom volume group).

Figure 16: Host failure in a 3DC configuration
[Figure: Starting from cascaded (1:1:1) and multi-target (1:2) configurations, a host failure drives Action 1 or Action 2; a single data center failure (data center 1 fail or data center 2 fail) drives Action 3 or Action 4; a double data center failure (data centers 1 and 2 fail) drives Action 5.]

One of three major events can occur, and the action depends on the configuration at the time of failure.

Host failure
– Action 1: Transition from a cascaded configuration to a multi-target configuration.
– Action 2: Transition from a multi-target configuration to a cascaded configuration.

Single data center failure
– Action 3: Transition from a cascaded configuration to a two data center XP Continuous Access Journal configuration.
– Action 4: Transition from a multi-target configuration to a two data center unprotected solution. A Continuous Access XP data link between data centers 1 and 3, or another recovery data center, is necessary to protect the data.

Double data center failure
– Action 5: Transition to a single data center unprotected solution.

The following sections describe each action in more detail.

Depending on your documentation style or diagramming preference, Figure 17 shows the same actions, but when starting from a multi-target solution with no phantom volume group. Notice in actions 1 and 2 how a horctakeover can change a multi-target configuration to a cascaded configuration, and vice versa.

Figure 17: Host failure in a 3DC configuration when working from a multi-target solution
[Figure: The same actions drawn from a multi-target (1:2) starting point: a host failure drives Action 2 or Action 1 between multi-target and cascaded configurations; a single data center failure (data center 1 fail or data center 2 fail) drives Action 3 or Action 4; a double failure of data centers 1 and 2 drives Action 5.]

Action 1: Transition from a cascaded to multi-target configuration

A DC1 host failure does not affect the array or the ability of the array to replicate data. When handling a host failure in a 3DC configuration, it is easy to transition from one configuration to another and maintain replication across three data centers. Figure 18 shows a transition from a cascaded to a multi-target configuration.

Figure 18: Transition from a cascaded to multi-target configuration
[Figure: Action 1. Before: H1 writes to the P-VOL in data center 1, which replicates synchronously to the S/P-VOL in data center 2 and on via journal (P-JNL to S-JNL) to the S-VOL in data center 3. After: H2 writes to the P/P-VOL in data center 2, which replicates synchronously to the S-VOL in data center 1 and via journal to the S-VOL in data center 3.]

This figure shows that host 1 failed. This failure could be a hardware issue, or it could be due to a maintenance action, such as a planned operating system upgrade to host 1, requiring that the administrator move the application to another node in a different data center. If this were a hardware failure, the movement of the application could be automated by using a clustered solution to detect the failure and automatically initiate movement. A cluster extension solution or metro cluster solution can swap the pair and prepare the volume for the read/write operation.

The direction of the XP Continuous Access Sync replication is swapped, and application data now enters the system from host 2 and is split into two replication destinations: one going synchronously to data center 1 and one going asynchronously (journal) to data center 3.
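A minimal sketch of this transition from the host 2 side, assuming the hypothetical sync group name sync3dc:

    # From host 2: swap the sync pair so the DC2 volume becomes the P-VOL.
    # The journal pair to data center 3 is independent and stays in PAIR status.
    horctakeover -g sync3dc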

Action 2: Transition from a multi-target to cascaded configuration

Figure 19 shows a transition from a multi-target to a cascaded configuration.

Figure 19: Transition from a multi-target to cascaded configuration
[Figure: Action 2. Before: H2 writes to the P/P-VOL in data center 2, with sync replication to data center 1 and journal replication to data center 3. After: the application runs on H1, the sync pair direction is swapped toward data center 2, and the journal pair is suspended during the swap, returning the solution to a cascaded configuration.]

This figure shows that host 2 failed, or the application was moved from host 2 to host 1. The reason could be to recover from Action 1, or the host currently running the application might have had a hardware issue. If this was a hardware failure, the movement of the application could be automated by using a clustered solution to detect the failure and initiate application movement. A cluster extension solution or metro cluster solution can swap the pair and prepare the volume for read/write operation.

The direction of the XP Continuous Access Sync replication must be swapped, but the swap process puts the XP Continuous Access Sync pair in COPY status for a short time, sending inconsistent (out-of-order) data to data center 2. This could jeopardize the consistency of data on the XP Continuous Access Journal pair, so the XP Continuous Access Journal pair must be suspended before the swap can occur. As long as the XP24000 or XP20000 is running firmware 60-01-68-00/00 or newer, or the XP12000 or XP10000 is running firmware 50.05.xx or newer, this suspension is performed automatically during the swap process. Application data then enters the system from host 1 and is replicated synchronously to data center 2. The XP Continuous Access Journal pair must be manually resynchronized using the pairresync command after the XP Continuous Access Sync pair is back in PAIR status.
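A hedged sketch of this transition from host 1, with the hypothetical group names sync3dc and jnl3dc:

    # From host 1: swap the sync pair back toward data center 2. On the firmware
    # versions above, the journal pair is suspended automatically during the swap.
    horctakeover -g sync3dc
    pairevtwait -g sync3dc -s pair -t 600

    # Once the sync pair is back in PAIR, manually resynchronize the journal pair.
    pairresync -g jnl3dc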

Action 3: Cascaded configuration data center failure

With a data center failure, either the array itself has failed or all communications to the array have failed. Either way, data center 1 is not able to participate in a failover operation. This condition could also indicate a data center 1 disaster in which a power outage or other event disabled the data center for some time. The disaster is localized enough that it does not affect data center 2 or its ability to run the application. Figure 20 shows a cascaded configuration data center failure.

Figure 20: Cascaded configuration data center failure
[Figure: Action 3. Before: H1 writes to the P-VOL in data center 1, cascading through data center 2 to data center 3. After: data center 1 has failed; the volume in data center 2 is in SSWS status, H2 runs the application, and journal replication to data center 3 continues.]

This figure shows that data center 1 is considered failed and the application is moved to data center 2. A takeover action from data center 2 attempts to swap the XP Continuous Access Sync replication pair, but this process fails because data center 1 is unavailable to participate. An S-VOL takeover from DC2 leaves the DC2 volume in SSWS status, which also indicates that a swap is still pending on the volume.

The application can now run on host 2 and continues to replicate, using XP Continuous Access Journal, to data center 3. Data protection is still available, but an RPO of 0 is not assured at this point if a second major failure requires failover to data center 3.
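For illustration, the takeover from host 2 might look as follows (the group name sync3dc is hypothetical); with data center 1 unreachable, the command falls back to an S-VOL takeover:

    # From host 2: the swap portion fails because DC1 cannot participate,
    # so an S-VOL takeover is performed and the DC2 volume is left in SSWS.
    horctakeover -g sync3dc
    pairdisplay -g sync3dc -fc    # the DC2 volume now reports SSWS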

Failback to DC1 after a cascaded configuration failover

The data center failure recovery process depends on the extent of the disaster. If the disaster was just temporary and did not damage the hardware, the array would retain knowledge of preexisting replication pairs, and the recovery process is fast and easy. However, if the hardware was damaged and requires replacement, you must re-create replication relationships. This process requires forcibly deleting remaining relationships on data center 2. The process of replacing hardware requires extended planning and timing. This paper discusses a case in which the hardware did not need to be replaced.

Figure 21 shows recovery from a cascaded configuration data center failure.

Figure 21: Failback to DC1 after a cascaded failover to DC2
[Figure: Data center 1 is available again; the pending swap completes and changed data is copied from the P/P-VOL in data center 2 back to the S-VOL in data center 1 while the journal pair to data center 3 remains in PAIR status, after which the solution can be returned to the cascaded configuration.]

This figure shows that data center 1 has recovered and is available again. The first step in the process is to complete the pending swap on the XP Continuous Access Sync pair, using the pairresync -swaps command from host 2 or the pairresync -swapp command from host 1. The replication direction for the XP Continuous Access Sync pair is swapped and any changed data is copied from data center 2 to data center 1. Because the target of the copy process is the volume on data center 1, the XP Continuous Access Journal pair is unaffected by the out-of-order copy process and remains in PAIR status.
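A minimal sketch of this failback step, assuming the hypothetical group name sync3dc:

    # From host 2: complete the pending swap and copy changed data back to DC1.
    pairresync -g sync3dc -swaps
    # (Equivalently, from host 1: pairresync -g sync3dc -swapp)
    pairevtwait -g sync3dc -s pair -t 600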

After the copy process is complete, the application is in a functional multi-target configuration. A command to transition to cascade configuration (discussed later) returns the configuration back to normal.

Note: Multi-target to cascade transitions suspend the XP Continuous Access Journal pair. A manual resynchronization of this pair is required after completing the transition.

Action 4: Multi-target configuration data center failure

A DC1 failure in a multi-target configuration leaves you with these options:

Perform a failover on the XP Continuous Access Sync pair and continue normal operations, but without data replication protection. During a subsequent DC2 disaster, applications could be moved to data center 3, but a substantial number of transactions might be lost.

Perform a failover on the XP Continuous Access Journal pair and accept the chance of losing some transactions, but move the application out of the disaster area. In a cluster environment, the failover to data center 2 would be initiated automatically by the cluster. Unless a full or delta resync connection is established between DC2 and DC3, DC3 will fall further and further out of date.

Figure 22 shows a failover following a multi-target configuration data center failure.

Figure 22: Multi-target configuration data center failure

This figure shows that data center 1 failed, and the application is moved to data center 2. A takeover action on data center 2 attempts to swap the XP Continuous Access Sync replication pair, but the swap fails because data center 1 is unavailable to participate. An S-VOL takeover leaves the DC2 volume in SSWS status, indicating that a swap is still pending on the volume.

The application can now run on data center 2 but cannot replicate to any data center. The data on data centers 2 and 3 will diverge further as new data is written to data center 2. If a temporary link between data centers 2 and 3 can be established, you can create a new replication pair between these volumes and continue replication.
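If such a temporary link is in place, the new pair could be created along these lines (the group name and journal IDs are illustrative; without delta resync, a full initial copy is required):

   # From host 2: create a CA-Journal pair from the DC2 volume to DC3;
   # -jp/-js select the primary and secondary journal IDs
   paircreate -g Group2 -vl -f async -jp 0 -js 0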

Failback to DC1 after a multi-target configuration failover to DC2

The failback process depends on the extent of the disaster. If the disaster was just temporary and did not damage the DC1 hardware, the DC1 array would retain knowledge of preexisting replication pairs, and the recovery process would be fast and easy. However, if the hardware was damaged and requires replacement, you must re-create the replication relationship. This process requires forcibly deleting remaining relationships from data center 2. The process of replacing hardware requires extended planning and timing. This paper discusses a case in which the hardware did not need to be replaced.

Figure 23 shows recovery from a multi-target data center failure.

Figure 23: Failback to DC1 after a multi-target configuration failover to DC2

The figure shows that data center 1 has recovered and is available. The first step in the process is to complete the pending swap on the XP Continuous Access Sync pair, using either the pairresync -swaps command from host 2 or the pairresync -swapp command from host 1. The replication direction for the XP Continuous Access Sync pair is swapped, and any changed data is copied from data center 2 to data center 1. This action would normally place out-of-order data on the source volume of the XP Continuous Access Journal pair, but the array prevents this by automatically suspending the journal pair. The duration of the copy process depends on the quantity of data modified during the failure and on the link throughput. When the copy is complete and the XP Continuous Access Sync pair is again in PAIR status, the application can either be left on host 2 and run in a cascaded configuration, or be moved back to host 1 to restore the original multi-target solution.

Action 5: Multiple data center failures

A concurrent failure of data centers 1 and 2, or one failing shortly after the other, results in the applications being down with no automatic cluster-enabled failover to data center 3. You may want to take some time to evaluate the failure and the risk, if any, of data loss when moving the application to data center 3. It is not easy to determine whether data loss would occur. Even with a well-sized replication link, a possibility of some data loss remains, depending on the disaster, what part of the configuration was affected first, and how fast the rest of the configuration was affected. When the decision is made to start the application on data center 3, you can initiate the application startup manually.

Figure 24 shows multiple data center failures.

Figure 24: Multiple data center failures

This figure shows that data centers 1 and 2 failed, either together or one after the other, and a decision was made to start the application on data center 3. The application startup on data center 3 attempts a takeover on the XP Continuous Access Journal S-VOL, but this takeover fails and leaves the S-VOL in SSWS status, indicating that the swap is still pending. The S-VOL is read/write enabled, and the application can run on this data center, but the data is no longer protected by remote replication.

Failback to DC1 after a multiple DC failure

This recovery procedure assumes that the temporarily unavailable arrays on data centers 1 and 2 (originally in Multi-target mode) were not replaced with new arrays but were recovered using the existing arrays after a failover to DC3.

1. Verify that the volume at data center 2 can be changed to an S-VOL for the XP Continuous Access Journal pair and to either a P-VOL or an SSWS S-VOL for the Sync pair. If this volume is an S-VOL in PAIR status for the DC1-DC2 XP Continuous Access Sync pair, it must not also receive data from data center 3, because an S-VOL cannot receive data from two directions at once. Depending on the prior status, a takeover directed at the XP Continuous Access Sync volume on data center 2 by way of host 2 either leaves the volume in SSWS (suspended S-VOL) status or changes the replication direction of the XP Continuous Access Sync group by changing the DC2 Sync S-VOL into a Sync P-VOL (re-creating the multi-hop configuration). Figure 25 shows the preparation of the S-VOL for a takeover.
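A status check from host 2 might look like the following (the group name is illustrative):

   # Check the CA-Sync pair status as seen from the DC2 side; recovery
   # can start only with a P-VOL or an SSWS S-VOL on data center 2
   pairvolchk -g Group1 -s
   pairdisplay -g Group1 -l -fcx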

Figure 25: Preparing the DC2 S-VOL for a takeover (the CA-Sync pair must have a P-VOL or an SSWS S-VOL on data center 2 before recovery can start)

2. Complete the takeover on the XP Continuous Access Journal volume group by using the pairresync -swaps command from host 3. This action copies any updated data from data center 3 to data center 2. Figure 26 shows the completion of the swap of the XP Continuous Access Journal pair.
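As a sketch, assuming the DC2-DC3 CA-Journal group is named Group2:

   # From host 3: complete the pending swap; DC3 becomes the P-VOL and
   # updated data copies back to DC2
   pairresync -g Group2 -swaps
   # Wait for the pair to reach PAIR status before the next step
   pairevtwait -g Group2 -s pair -t 3600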

Figure 26: Completing the swap of the XP Continuous Access Journal pair (pairresync -swaps)

3. Move the application to data center 2. This action swaps the direction of replication for the XP Continuous Access Journal link, and any new data is replicated from data center 2 to data center 3. It also prepares the configuration for the recovery of the XP Continuous Access Sync pair. It does not make sense to have the application running on data center 3, replicating with XP Continuous Access Journal to data center 2, and then replicating with a synchronous method to data center 1 (synchronous downstream of asynchronous): because of the nature of journal replication, the XP Continuous Access Journal data is typically not current, so the downstream synchronous replication would not necessarily provide current data either. Figure 27 shows the moving of an application to data center 2.
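The move could be driven from host 2 with a takeover on the CA-Journal group (name assumed):

   # From host 2: take over the CA-Journal group; DC2 becomes the P-VOL
   # and new writes replicate from DC2 to DC3
   horctakeover -g Group2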

Figure 27: Moving an application to data center 2 (takeover on the CA-Journal pair)

4. With the application now running on data center 2, resynchronize the XP Continuous Access Sync pair and replicate any new data to data center 1. This action brings the configuration back into a multi-target solution. The application can remain running on data center 2, or it can be moved to data center 1 to re-create the cascaded solution. Figure 28 shows resynchronization of the XP Continuous Access Sync pair.
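Because the DC2 Sync volume may be in SSWS after step 1, the resynchronization is a swap issued from host 2 (group name assumed):

   # From host 2: if the DC2 volume is in SSWS, swap so that DC2 becomes
   # the Sync P-VOL and changed data copies to DC1
   pairresync -g Group1 -swaps
   # If the step 1 takeover already promoted DC2 to P-VOL, a plain
   # resynchronization is sufficient instead:
   # pairresync -g Group1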

Figure 28: Resynchronization of the XP Continuous Access Sync pair

5. Move the application to data center 1. This process is a standard multi-target to cascade failover, and during the process, inconsistent data is copied to the volume on data center 2, which automatically suspends the XP Continuous Access Journal replication pair. A manual resynchronization of the XP Continuous Access Journal pair is required after the application move completes. Figure 29 shows the movement of the application to data center 1.
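A sketch of this final move (group names assumed):

   # From host 1: take over the CA-Sync group; DC1 becomes the P-VOL again
   horctakeover -g Group1
   # Wait for the Sync copy to complete, then manually resynchronize
   # the auto-suspended CA-Journal pair
   pairevtwait -g Group1 -s pair -t 3600
   pairresync -g Group2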

Figure 29: Moving the application to data center 1 (takeover on the CA-Sync pair; the CA-Journal pair is suspended)

The benefits of the optional third data link

With the standard 3DC configuration, a problematic situation exists when data center 2 fails in a cascaded solution or when data center 1 fails in the multi-target solution. In this condition, the application must run on a single node with no remote replication protection, and the two remaining data centers have no communication with each other. By adding an optional data link between those two data centers, a new replication pair can be created and the data can once again be remotely protected. Figure 30 shows the failover and failback possibilities when using an optional third data link.

Figure 30: Failover and failback with third data link

The diagram shows the failover and failback options available when a third data link is added between data centers 1 and 3 in a cascaded configuration. A new recovery possibility is added to the solution for the scenario in which data center 2 is unavailable, and a new replication pair can be created between data centers 1 and 3 using the additional link.

Viewed from a single application's perspective, this additional link is an expensive solution: it is used only in the case of a data center failure, yet most telco providers charge a monthly fee for the line even when it is idle. Most installations, however, must protect multiple applications running on at least data centers 1 and 2. With multiple applications, each forming its own 3DC configuration, both long-distance links to data center 3 can be used on a fairly equal basis during normal operation; in the event of a disaster, the surviving link carries the replication for both applications.

Figure 31 shows the same third link configuration and failover option from a multi-target perspective.


Figure 31: Failover and failback with third data link from a multi-target perspective


Action 6: Using the optional third link

A failure of data center 2 in the cascaded configuration, or of data center 1 in the multi-target configuration, can leave the remaining data centers isolated from each other. An optional third data link between the remaining data centers makes it possible to create a new XP Continuous Access Journal replication pair between them.

Figure 32 shows a data center failure of data center 2 that results in the suspension of the XP Continuous Access Sync replication group, as well as fencing host 1 from writing to the P-VOL (XP Continuous Access Sync with fence-level data).

Figure 32: Using the optional third data link (start: data center 2 failure)

1. Perform a P-VOL takeover at data center 1. This action removes the "fence" enforced by fence-level data and allows the application to continue writing to the P-VOL. Data is not remotely protected at this time. Figure 33 shows a P-VOL takeover.

Figure 33: Performing a P-VOL takeover (horctakeover -g Group1)

2. Delete any remaining XP Continuous Access Journal pair definitions on the array at data center 3 by using the cascaded pairsplit -FCA 1 -R command from host 1, directing it to the phantom, or dummy, group defined between data centers 1 and 3. (For more information, see the Options for the pairsplit command section in this paper.) This phantom group now becomes the actual XP Continuous Access Journal group between data centers 1 and 3. Figure 34 shows the deletion of any remaining XP Continuous Access Journal pair definitions.
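Issued from host 1 against the phantom group Group3, the command takes this form:

   # Delete the leftover CA-Journal pair definition on the DC3 array;
   # -FCA 1 forwards the command through the cascade (mirror unit 1),
   # and -R deletes the relationship from the S-VOL side
   pairsplit -g Group3 -FCA 1 -R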

Figure 34: Deleting remaining XP Continuous Access Journal pair definitions (pairsplit -g Group3 -FCA 1 -R)

3. Create the new XP Continuous Access Journal pair between data centers 1 and 3. Before the XP24000/20000 60-02-25-00/00 and XP12000/XP10000 50-08-xx firmware releases, this action required a full initial copy of all data from data center 1 to data center 3; those firmware releases removed the full initial copy requirement. For more information, see the "3DC with fast delta resynchronization" section in this paper. Figure 35 shows the creation of the new XP Continuous Access Journal pair.
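As a sketch, with illustrative journal IDs added (the figure omits the ID values):

   # From host 1: create the CA-Journal pair between DC1 and DC3;
   # -jp/-js name the primary (DC1) and secondary (DC3) journal IDs
   paircreate -g Group3 -vl -f async -jp 1 -js 1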

Figure 35: Creating the new XP Continuous Access Journal pair (paircreate -g Group3 -vl -f async -jp -js)

Recovery after the use of the optional third data link

To recover after using the optional third data link, with data center 2 back in production, select either of the following options:

Resynchronize the Sync pair between data centers 1 and 2, continue operations in a multi-target configuration, and use the link between data centers 2 and 3 as the new optional third data link. If the original configuration was cascaded, this option does not return you to the configuration in use before the failure and might require substantial resources to rewrite operational procedures and test plans.

Revert the configuration back to a cascaded solution using the data link between data centers 2 and 3 as the primary data link and the data link between data centers 1 and 3 as the optional data link. This operation is described in the following steps.

The recovery operation requires several steps that must be performed in the correct sequence. The recovery process also uses the "-nocopy" option of the paircreate command to avoid a full, lengthy initial copy of data between data centers 2 and 3. Make sure that all steps are performed in the correct order and that you fully understand the process before starting. The examples indicate that the application can remain active on data center 1 during the entire recovery process; however, HP recommends that you stop the application for the duration of the process.
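Collected from the steps that follow, the full command sequence might look like this (group names as used in this paper's figures, with their case normalized here; run each step only after the previous one completes):

   # 1. From host 2: delete the leftover DC2-DC3 CA-Journal definition
   pairsplit -g Group2 -S -l
   # 2. Resynchronize the DC1-DC2 CA-Sync pair and wait for PAIR status
   pairresync -g Group1
   pairevtwait -g Group1 -s pair -t 3600
   # 3. Suspend the CA-Sync pair (DC2 now holds a point-in-time copy)
   pairsplit -g Group1
   # 4. Suspend, then delete, the DC1-DC3 CA-Journal group
   pairsplit -g Group3
   pairsplit -g Group3 -S
   # 5. Re-create the DC2-DC3 CA-Journal pair with no initial copy,
   #    then suspend it before the Sync resynchronization
   paircreate -g Group2 -vl -nocopy -f async
   pairsplit -g Group2
   # 6. Resynchronize the CA-Sync pair, wait for PAIR, then
   #    resynchronize the CA-Journal pair
   pairresync -g Group1
   pairevtwait -g Group1 -s pair -t 3600
   pairresync -g Group2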

Figure 36 shows the recovery of data center 2. It is assumed that the same original array is used after the recovery. This array at data center 2 still has configuration information about both the XP Continuous Access Sync and XP Continuous Access Journal pairs that existed on this array.

Figure 36: Recovery from the use of the optional third data link (start: data center 2 recovered)

1. Delete the remaining XP Continuous Access Journal definition still existing on the array at data center 2 (shown in figure 37). The journal volume still contains pairing information related to the XP Continuous Access Journal pair between data centers 2 and 3; this information could not be discarded during the failover process because the array at data center 2 was down at the time. This step returns the journal to simplex (SMPL) status and enables the re-creation of the XP Continuous Access Journal pair between data centers 2 and 3.

Figure 37: Deleting the remaining XP Continuous Access Journal definition on the array at data center 2 (pairsplit -g Group2 -S -l)

2. Perform a resynchronization on the XP Continuous Access Sync pair between data centers 1 and 2 (shown in figure 38). The pair relationship still exists in the two arrays, and a resynchronization operation compares the bitmap data and copies all changed tracks from data center 1 to data center 2. If the array at data center 2 had been replaced with a new array after the disaster, it would be necessary to re-create the pair and perform a full initial copy of the data. This step might take a few minutes, depending on the amount of differential data between the arrays.

Figure 38: Performing a resynchronization on the XP Continuous Access Sync pair (pairresync -g group1)

After completion of the resynchronization operation, data centers 1 and 2 will be in sync and will contain exactly the same data. Data center 3 might be a few transactions behind the data on data centers 1 and 2 because of the nature of journal replication. Data centers 2 and 3 have the same base data, but data center 2 might be slightly ahead of data center 3.

3. Suspend the XP Continuous Access Sync pair between data centers 1 and 2 (shown in figure 39). Data center 2 now has a point-in-time copy of the data at data center 1 and the same base data as data center 3. If the application on host 1 continues to write to the P-VOL, XP Continuous Access Journal replication to data center 3 continues, allowing data center 3 to get ahead of the data on data center 2.

Figure 39: Suspending the XP Continuous Access Sync pair (pairsplit -g group1)

Data center 2 has a point-in-time copy of the base data at time T0. Data center 1 is at time T2, and data center 3 is somewhere behind data center 1 but ahead of data center 2 at time T1. At this time, data center 1 tracks all new writes to the P-VOL for the XP Continuous Access Sync pair. This tracking includes all writes that were copied to data center 3 using XP Continuous Access Journal after the suspension of the XP Continuous Access Sync pair.

4. Suspend and then delete the XP Continuous Access Journal group between data centers 1 and 3. This action leaves the data on data center 3 consistent at a point in time later than that of data center 2. The journal pool at data center 3 returns to simplex (SMPL) status and can now be used in the next step to create a new XP Continuous Access Journal pair between data centers 2 and 3. Figure 40 shows deletion of the XP Continuous Access Journal group between data centers 1 and 3.

Figure 40: Deleting the XP Continuous Access Journal group between data centers 1 and 3 (pairsplit -g group3 -S)

At this time, data center 2 has a point-in-time copy of the base data at time T0. Data center 3 has a point-in-time copy of the base data plus some additional data at time T1. Data center 1 has the latest data, at time T3, and holds a bitmap record of all data added to the system after the XP Continuous Access Sync operations were suspended (from T0 to current).

5. Create the XP Continuous Access Journal pair between data centers 2 and 3 using the "-nocopy" option. Although the data on data centers 2 and 3 is not exactly the same, the base data on these volumes is the same. Data center 3 has some newer data, but this newer data is tracked by the differential tables on data center 1 for the suspended XP Continuous Access Sync pair.

Then suspend the replication operation for the new XP Continuous Access Journal pair. This suspension is necessary because the next step resynchronizes the XP Continuous Access Sync pair, and the resulting out-of-order copy data must not be forwarded downstream through a journal group in PAIR status, which is not allowed. Figure 41 shows the creation and suspension of the XP Continuous Access Journal pair using the "-nocopy" option.

Figure 41: Creating and suspending the XP Continuous Access Journal pair using the "-nocopy" option (paircreate -g Group2 -vl -nocopy -f async; pairsplit -g Group2)

6. Resynchronize the XP Continuous Access Sync pair between data centers 1 and 2. This process updates data center 2 with the latest data from data center 1, including all the data already copied to data center 3 through the old XP Continuous Access Journal group, as well as any data written to data center 1 after the deletion of the XP Continuous Access Journal group between data centers 1 and 3. All copied tracks are recorded in the bitmap for the XP Continuous Access Journal pair at data center 2.

After the XP Continuous Access Sync pair reaches PAIR status, resynchronize the XP Continuous Access Journal pair between data centers 2 and 3. This action copies all the updated data from data center 2 to data center 3. Much of this data is already on data center 3 and is harmlessly overwritten during the copy process; any additional changes are also written to data center 3. Figure 42 shows the resynchronization of the XP Continuous Access Sync and XP Continuous Access Journal pairs.
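The ordering matters: the CA-Journal resynchronization must wait until the CA-Sync pair has returned to PAIR status. For example:

   # From host 1: resynchronize the Sync pair and block until PAIR
   pairresync -g Group1
   pairevtwait -g Group1 -s pair -t 3600
   # Only then resynchronize the CA-Journal pair from DC2 to DC3
   pairresync -g Group2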

Figure 42: Resynchronization of the XP Continuous Access Sync pair (pairresync -g Group1; pairresync -g Group2)

At this point, data centers 1 and 2 have exactly the same data again, and data center 3 has the same base data as data centers 1 and 2 but might be slightly behind them because of the XP Continuous Access Journal operation. The configuration is back in the cascaded arrangement, and the recovery operation is complete.

Other configuration options

This section discusses possible alternative configurations for multiple data center replication, showing the available options and their specific requirements.

4DC configuration

This configuration uses four data centers instead of three. Data centers 1 and 2 are in one metropolitan area, and data centers 3 and 4 are in another. XP Continuous Access Sync operations run between the two data centers within each metropolitan area. The metropolitan areas are far enough apart to be safely outside each other's extended disaster range, allowing a unique setup in which four applications can run, each protected in a 3DC configuration, at one time. This configuration also keeps XP Continuous Access Sync protection available at all times in one of the two metropolitan areas.

Another option in the 4DC configuration is to implement XP Continuous Access Journal replication from data centers 1 and 2 to data centers 3 and 4, respectively. This ensures that an XP Continuous Access Journal group is already established whenever a single data center fails, so it is not necessary to create a new XP Continuous Access Journal group between the surviving data centers, avoiding the initial copy process. It also provides two journal copies of the data, on data centers 3 and 4; these two copies are not necessarily at the same data currency level. Figure 43 shows a 4DC configuration.

Figure 43: 4DC configuration

A 3DC configuration with four copies

With a standard 3DC configuration, the biggest remaining concern is recovery from the Action 6 scenario, in which one data center fails and leaves the remaining data centers separated from each other. Recovering and remotely protecting the data again after this type of failure requires a full initial copy for a new XP Continuous Access Journal pair, which can take considerable time. The solution also requires a data link between the two data centers that would not be used under normal conditions.

One way to overcome the requirement for the full initial copy is to create an XP Continuous Access Journal pair for both data centers 1 and 2 to data center 3 using different target volumes on data center 3. This action creates two copies of the data on data center 3. These copies would not necessarily be exactly the same. If either data center 1 or data center 2 fails, the remaining data center still has an XP Continuous Access Journal replication active to data center 3 and there is no need to create a new replication pair. Figure 44 shows a 3DC configuration with four copies.

Figure 44: 3DC configuration with four copies

3DC with fast delta resynchronization

As of the XP24000/20000 60-02-25-00/00 firmware and the XP12000/XP10000 50-08-05 firmware, 3DC with delta resync eliminates the requirement for a full copy when creating a DC2-to-DC3 XP Continuous Access Journal pair after DC1 becomes disabled. With delta resync, data center 2 maintains a journal volume under normal conditions, journaling exactly the same data as data center 1. In the event of a data center failure at data center 1, the journal relationship can be resumed from data center 2 using the journal data already available on this data center. The journal data on data center 3 is compared with the journal data on data center 2 to determine the differential data that must be copied.