PERFORMANCE GUIDE
Revision History
The following table presents the revision history of this document:
Introduction
About this document
This document provides extensive information on the performance capabilities of
RecoverPoint, and performance considerations for building and configuring a
RecoverPoint system.
It assumes that you are familiar with the RecoverPoint product, and have a basic
knowledge of storage technologies and their respective performance
characteristics.
In addition, before using this document, you should be familiar with basic
RecoverPoint performance capabilities, as described in the EMC RecoverPoint
5.0 Performance Guide. If you are not responsible for the detailed specification
of your RecoverPoint system, that document may provide all the information that
you need.
This guide focuses on major use cases and common questions; however, given
the complexity of real-world environments, it cannot cover every possible
configuration and scenario. If, after studying this guide, you still need additional
information, consult RPSPEED.
Definitions
Throughput — volume of incoming writes, normally expressed in megabytes per
second (MB/s).
IOPS — number of incoming writes, in I/Os per second.
Sustained performance — maximum replication rate that can be sustained over
an extended time, and which maintains the required RPO and RTO without
entering a highload state.
Distribution — process by which the replicated data in the copy journal is written
to the copy storage. This process is CPU-intensive and I/O-intensive.
Protection window — how far in time the copy image can be rolled back.
Primary RPA — the preferred RPA for replicating a given consistency group (CG).
Highload — a system state that occurs during replication when RPA resources at
the production cluster are insufficient.
Related documentation
The following related documents are available for download from EMC Online
Support:
EMC RecoverPoint 5.0 Performance Guide
EMC RecoverPoint 5.0 Release Notes
I/O Patterns
IOPS are measured with 4K I/O blocks.
Throughput is measured with 64K I/O blocks.
Except in the application pattern tests, the I/O pattern that is generated is
100% random write cache hit.
The data is generated with compressibility ratio of 2; that is, the RPA can
compress the data to half of its initial size if compression is enabled.
Response time
All host I/Os pass through the splitter: read I/Os are immediately passed to the
designated devices, while write I/Os are intercepted by the splitter. From the
splitter, a write I/O is sent first to the primary RPA. Once it is acknowledged by the
RPA it is passed to the designated device, and only then it is acknowledged to
the host.
In asynchronous replication, the primary RPA acknowledges write I/Os
immediately upon receiving them in its memory. However, even in this case
there may be an added response time to every write I/O, due to any of the
following factors:
Primary RPA hardware — stronger RPAs may respond faster.
Load on primary RPA for the production copy — higher load may induce
longer response time.
Communications protocol between splitter and RPA (FC or iSCSI), and size of
the I/O — these determine the number of round trips needed for the I/O to
pass from the splitter to the RPA.
Splitter type
In synchronous replication, the primary RPA acknowledges write I/Os only after
they are received and acknowledged by its peer RPA at the remote cluster. The
remote RPA acknowledges a write immediately when it reaches its memory.
Hence, in synchronous replication, response time depends on all of the factors
listed for asynchronous replication, together with the following factors:
Remote peer RPA hardware — stronger RPAs may respond faster.
Load on the RPA for the remote copy — higher load may induce longer
response time.
Communications protocol between the RPA clusters at the two sites (FC or
IP), and size of the I/O — these determine the number of round trips needed
for the I/O to pass between the clusters.
The added response times resulting from these factors are presented in the
“Performance Test Results” section for several example environments.
RecoverPoint added response time for distributed consistency groups is the same
as for regular consistency groups.
RecoverPoint added response time for write I/Os is typically higher for
synchronous replication than asynchronous replication. However, for a multi-user
application, although each user’s transaction experiences a delay, the overall
impact on performance is minor.
Dynamic sync mode replication can be assigned to a group that generally
requires synchronous replication, but which can be switched to asynchronous
replication at peak times to avoid excessive application delay. For this
replication mode, the user defines thresholds for the latency and/or the
throughput between RPA clusters. When any threshold is reached, replication
over the link automatically switches to asynchronous mode. When values drop
back below the thresholds, replication automatically returns to synchronous mode.
Workload
RecoverPoint replicates only write I/Os because only these I/Os change the state
of the device. Thus, the added response time due to read I/Os is negligible
relative to write I/Os. Real-world applications have a complex I/O pattern which
is composed of both reads and writes of various sizes.
The following benchmarks are commonly used to simulate common application
patterns:
OLTP1 — mail applications
OLTP2 — small Oracle applications
OLTP2HW — large Oracle applications
DSS2 — data warehouse applications
The sustained write throughput that can be replicated by a single RPA was
described in the EMC RecoverPoint 5.0 Performance Guide. With complex
application patterns, however, many factors beside the RPA can affect the
performance, including the splitter type, the communications channel
properties, port connectivity, and remote-site storage type and configuration.
The results for throughput, IOPS, and added response time are provided for
several example environments in the “Performance Test Results” section, which
begins on page 27.
Communications
The nature of the communications link between RPA clusters has a major impact
on RecoverPoint performance.
Bandwidth
The bandwidth between RPA clusters may become a bottleneck that limits the
replication throughput. Compression and deduplication WAN optimizations can
be enabled to allow greater throughput over the link between these RPA
clusters. WAN optimizations, however, are CPU–intensive, and may reduce RPA
performance if CPU is the performance bottleneck, as it may be, for example,
with weak vRPAs.
In synchronous replication, WAN optimizations are disabled, since they tend to
increase RecoverPoint added response time.
The default compression level is low compression, because it gives the most
benefit across all RPA and vRPA configurations.
When the data is compressible and dedupable, enabling compression and
dedup optimizations doesn’t degrade maximum IOPS and throughput when
replicating with a physical RPA. For example, when the workload has
compressibility ratio of 2 (that is, it can be compressed into half of its initial size)
and dedupe ratio of 2 (that is, dedup optimization can save half the WAN
bandwidth), a single physical RPA can replicate up to 300 MB/s.
The greater the latency and packet loss, the lower the throughput that
RecoverPoint can replicate.
Table 2. Impact of RTT on throughput of single RPA in asynchronous replication
The maximum supported round trip time for asynchronous replication is 200 ms
with up to 1% packet loss.
The maximum supported round trip time for synchronous replication is 4 ms over
FC (distance of 200 km) or 10 ms over WAN.
Communications problems often cause highloads. Ways to detect these
problems are presented in the “Performance Tools” section, on page 38.
External WAN accelerators can be used to improve performance during
asynchronous replication. It is best practice to disable RecoverPoint WAN
optimizations (compression and deduplication) if WAN optimization is performed
by a WAN accelerator.
Journal
To replicate a production write, while maintaining the undo data that is needed
if you want to roll back the target copy image, five-phase distribution mode is
applied. This mode produces five I/Os at the target copy. Of these, two I/Os are
directed to the replication volumes and three I/Os are directed to the journal.
Thus, the throughput requirement of the journal at the target copy is three times
that of the production, and 1.5 times that of the replication volumes. For that
reason it is very important to configure the journal correctly. Misconfiguration
may result in a decrease in sustained throughput, an increase in journal lag, and
highloads.
Journal I/Os are typically large and sequential as opposed to the target copy
I/Os that depend on application write I/O patterns, which may be random. For
performance reasons, the I/O chunk size that RecoverPoint sends depends on
the array type, including:
VNX/Unity — 1 MB
VMAX — 256 KB
VPLEX — 256 KB for reads and 128 KB for writes (starting with GeoSynchrony 5.2,
the write size is 1 MB)
Example:
The application generates throughput of 50 MB/s and 6400 write IOPS. What is the
required performance of the journal at a remote copy?
The throughput requirement of the journal would be (50 MB/s × 3) = 150 MB/s.
The average I/O size in this example is (50 MB/s / 6400 IOPS) = 8 KB. Thus, many
I/Os (16–64, depending on the array type) would be aggregated into a single
I/O to the journal. The IOPS requirement from the journal would be between
(150 MB/s / 512 KB) = 300 and (150 MB/s / 128 KB) = 1200.
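The arithmetic in this example can be sketched as a short Python helper. This is an illustration only (the function name and the 128–512 KB aggregation bounds are assumptions derived from the chunk sizes listed above), not part of any RecoverPoint tool:

```python
def journal_requirements(prod_mbps, prod_iops, min_chunk_kb=128, max_chunk_kb=512):
    """Estimate journal needs at a remote copy.

    Five-phase distribution sends three of its five I/Os to the journal, so
    journal throughput is 3x production. Journal IOPS depend on how many
    small application writes are aggregated into each large journal I/O.
    """
    journal_mbps = prod_mbps * 3
    avg_io_kb = prod_mbps * 1024 / prod_iops       # average application write size
    min_iops = journal_mbps * 1024 / max_chunk_kb  # fewest journal I/Os (largest chunks)
    max_iops = journal_mbps * 1024 / min_chunk_kb  # most journal I/Os (smallest chunks)
    return journal_mbps, avg_io_kb, min_iops, max_iops

# The worked example: 50 MB/s of writes at 6,400 IOPS
print(journal_requirements(50, 6400))  # (150, 8.0, 300.0, 1200.0)
```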
Production journals, as opposed to copy journals, do not have strict performance
requirements since they are used for writing only small amounts of metadata
during replication. In the case of failover, however, these production journals
become copy journals that have major effect on performance. This should be
taken into account when configuring the system.
When journal LUNs are provisioned with FAST VP, they should be bound to a pool
with the slowest drives, provided faster drives are available in the same pool in
case of failover.
Journal compression
RecoverPoint can compress the data that is written to the journal in order to
decrease the required journal capacity and increase the protection window.
However, this compression is CPU-intensive and usually reduces the overall
throughput of an RPA and CG. In addition, it significantly reduces image access
performance. Hence, it is recommended to enable journal compression only
when there is not enough journal storage.
Table 3 presents the effect of journal compression on throughput in
asynchronous replication over WAN with Gen6 RPAs.
Table 3. Impact of journal compression on throughput (asynchronous)
Note that multiple CGs running on the same RPA may show better performance
with journal compression enabled.
The compression ratio depends on the I/O pattern generated from the
application. Without additional information, as a rule of thumb, medium journal
compression ratio doubles the protection window, and high compression ratio
triples it.
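The rule of thumb above can be turned into a rough estimate. The helper below is a hypothetical sketch (the function and its parameters are illustrative assumptions): it divides journal capacity by the change rate and applies the stated 2x/3x factors:

```python
def protection_window_hours(journal_gb, change_rate_mbps, compression="none"):
    """Rough protection window: journal capacity divided by change rate,
    scaled by the rule of thumb (medium compression ~2x, high ~3x)."""
    factor = {"none": 1, "medium": 2, "high": 3}[compression]
    hours = (journal_gb * 1024) / change_rate_mbps / 3600
    return hours * factor

# e.g. a 1 TB journal at a sustained change rate of 30 MB/s
print(round(protection_window_hours(1024, 30), 1))          # 9.7 hours
print(round(protection_window_hours(1024, 30, "high"), 1))  # 29.1 hours
```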
Security level
The following security levels can be configured for IP communication between
all RPA clusters in a RecoverPoint system:
Not authenticated, not encrypted
Authenticated and encrypted
For more details, refer to the EMC RecoverPoint 5.0 Security Configuration Guide.
If maximum security is required, it is recommended that you use the
“Authenticated and encrypted” security level.
If the security provided by authentication and encryption is not required, you
can achieve a small gain in performance by using the “Not authenticated, not
encrypted” security level, especially when the WAN quality is low (i.e., high
latency and/or packet loss) or when sync replication is needed.
Note that when the security level is set higher than “Not authenticated, not
encrypted”, the response time for synchronous replication over WAN may be as
much as doubled.
During an upgrade, if both product versions support the same security levels,
then the existing security level is maintained. If not, ensure that, following
upgrade, the security level is set to your desired level.
Configuration                                  Regular CG            Distributed CG
                                               IOPS     Throughput   IOPS     Throughput
                                                        (MB/s)                (MB/s)
2 clusters — 1 local copy and 1 remote copy    35,000   140          35,000   350
3 clusters — 2 async remote copies             35,000   200          35,000   460
3 clusters — 1 async and 1 sync remote copies  20,000   200          21,000   350
Configuration                                  Regular CG            Distributed CG
                                               IOPS     Throughput   IOPS     Throughput
                                                        (MB/s)                (MB/s)
2 clusters — 1 local copy and 1 remote copy    20,000   110          27,000   350
3 clusters — 2 async remote copies             13,000   55           27,000   220
3 clusters — 1 async and 1 sync remote copies  13,000   55           16,000   88
The write I/Os that are received at production are duplicated once for every
local and remote copy. Only then are WAN compression and deduplication
applied, according to the link configuration. Note that the splitter-to-RPA
communication does not depend on the number of copies.
Example:
An application generates writes at 80 MB/s, and is being replicated by
RecoverPoint to two remote copies and one local copy. The data that the
application generates can be compressed by half. What is the required WAN
bandwidth?
The local copy does not require WAN bandwidth, but for every remote copy, the
80 MB/s throughput is duplicated. Therefore, without compression, the required
bandwidth is 160 MB/s. However, with compression enabled, the required
bandwidth is only 80 MB/s.
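The bandwidth calculation generalizes directly. This minimal sketch (an illustration, not a RecoverPoint sizing tool) applies the rule that production writes are duplicated once per remote copy and then compressed:

```python
def required_wan_mbps(app_mbps, remote_copies, compressibility=1.0):
    """WAN bandwidth: application throughput duplicated once per remote
    copy (local copies use no WAN), divided by the compressibility ratio."""
    return app_mbps * remote_copies / compressibility

# The worked example: 80 MB/s, two remote copies, data compressible by half
print(required_wan_mbps(80, 2))       # 160.0 MB/s without compression
print(required_wan_mbps(80, 2, 2.0))  # 80.0 MB/s with compression enabled
```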
Compression is CPU-intensive. Consider configuring a CG with high throughput to
multiple remote copies as a distributed CG. This will allow spreading of the CPU
load over multiple RPAs.
Note: Replication always takes place between RPAs that have the same roles in
their respective clusters; for example, RPA 1 in one cluster replicates to RPA 1 of
another cluster. “Diagonal replication”—that is, replication between
unmatched RPAs—is not supported. As a consequence, adding an RPA to one
RPA cluster without balancing the number of RPAs at the other clusters will not
increase replication performance between these clusters.
Four VMAX V2 ports spread across two directors can sustain the maximum IOPS
and throughput of a single RPA.
To support more IOPS and throughput, you can add RPAs and VMAX ports while
keeping a ratio of at least 4 VMAX ports per RPA. The ports should be spread as
equally as possible across the directors and engines.
vRPA
vRPA resources
vRPAs can be deployed in the following predefined virtual machine
configurations:
8 vCPUs, 8 GB RAM
4 vCPUs, 4 GB RAM
2 vCPUs, 4 GB RAM
Refer to EMC RecoverPoint 5.0 Performance Guide for the capabilities of these
vRPA configurations.
When designating the configuration of the production side vRPAs, consider the
following:
If synchronous replication is needed, use the “8 vCPUs, 8 GB RAM” or
“4 vCPUs, 4 GB RAM” vRPA configuration.
If deduplication is needed, use the “8 vCPUs, 8 GB RAM” vRPA configuration.
Stronger vRPAs will be able to sustain higher write IOPS and throughput
generated by the application.
Stronger vRPAs will be able to handle longer and stronger peaks of write I/Os.
Scaling considerations
As in the physical case, RecoverPoint scales linearly as vRPAs are added. This is
especially important because it is very easy to add vRPAs (up to 8) to the RPA
cluster (by cloning an existing vRPA or deploying a new vRPA using OVA) without
adding additional physical hardware.
Example:
Four applications generate throughput of 100 MB/s each. How many vRPAs
should be deployed?
Each “8 vCPUs, 8 GB RAM” vRPA can replicate incoming write throughput of
about 100 MB/s over WAN. To allow a total throughput of 400 MB/s, four vRPAs
are required. However, an additional vRPA should be provisioned so that the
system can continue replication non-disruptively in case of an RPA failure. Weaker
vRPA hardware cannot be used unless the application data is divided into
several CGs that are grouped in a “group set” with parallel bookmarks.
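The sizing logic of this example can be sketched as follows. The 100 MB/s per-vRPA figure is the one quoted above for the “8 vCPUs, 8 GB RAM” configuration; the helper itself is hypothetical:

```python
import math

def vrpa_count(total_mbps, per_vrpa_mbps=100, spare=1):
    """vRPAs needed for a target write throughput, plus a spare vRPA so
    replication continues non-disruptively after a single vRPA failure."""
    return math.ceil(total_mbps / per_vrpa_mbps) + spare

print(vrpa_count(400))  # 5: four replicating vRPAs plus one spare
```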
Example:
Two applications generate writes at 5,000 IOPS. Which vRPA hardware should be
provided?
Putting aside redundancy considerations, 5,000 IOPS can be handled by a single
“8 vCPUs, 8 GB RAM” vRPA or by two “2 vCPUs, 4 GB RAM” vRPAs. The hardware
requirements of the latter, however, are lower than those of the former. Hence, in
this case, two weaker vRPAs will utilize host resources better than one strong vRPA.
Deployment considerations
When deploying a vRPA cluster consider the following recommendations:
Memory — It is highly recommended to reserve on the ESX host all memory
required for the vRPAs. Note that when deploying vRPAs from an OVF,
whether all of the memory is reserved depends on the exact version.
CPU — It is recommended to reserve CPU. When deploying a vRPA from an
OVF, about 4000 MHz of CPU is reserved.
Total CPU and memory available on an ESX host must be at least the sum of
CPU and memory required by each of the individual vRPAs on it.
Network bandwidth of the ESX host must be sufficient to handle the I/O load
of all vRPAs that run on it.
If production VMs run on the same ESX host as the vRPA, you must consider also
their CPU, memory, and network requirements in the sizing. In addition, you
should consider the VM management services (such as vMotion) network
requirements.
It is recommended to deploy vRPAs of the same cluster on different ESX hosts, to
spread the CPU load and avoid network congestion on the ESX ports.
Nonetheless, if you deploy more than one vRPA from an RPA cluster on a
single ESX, consider the following:
Never put vRPA roles 1 and 2 on the same ESX, in order to prevent a single
ESX failure causing the failure of an entire RPA cluster.
If that ESX fails, all the CGs running on the vRPAs that run on it will switch to
other vRPAs running on other ESXs. Plan accordingly to ensure that the other
vRPAs and ESXs will be able to handle this extra load.
It is best practice to enable VMware HA (high availability) on the ESX server
hosting the vRPAs. In case of an ESX failure, the vRPAs that ran on it will restart
on another ESX. However, note that the CG switch to another vRPA is likely to
happen before the vRPA starts running on another ESX.
It is best practice to run the vRPAs on a different ESX from the application that
they are replicating. This prevents the vRPA and the application from competing
for ESX resources, especially at peak times. Note that it does not mean that
vRPAs cannot share ESX resources with production VMs.
Example:
Four applications generate throughput of 100MB/s each. How many ESX
machines are required to run the vRPAs, assuming that each ESX machine has
10 Gb WAN?
Best practice is to deploy five “8 vCPUs, 8 GB RAM” vRPAs on five different ESX
machines. However, it is possible for RecoverPoint to replicate this load using only
two ESX machines with 40 GB RAM and enough vCPUs, since each of them can
host five replicating “8 vCPUs, 8 GB RAM” vRPAs. In any case, you must ensure
that vRPA roles 1 and 2 are deployed on different ESXs and that HA is
enabled for all vRPAs. In case of an ESX failure, all the vRPAs on that ESX will
restart on the other ESX; until then, however, there may be a period when
replication is temporarily disrupted due to overloading of the remaining
vRPAs. This situation will correct itself automatically once the vRPAs boot up.
Example:
As in the previous example, four applications generate throughput of 100 MB/s
each. How many ESX machines are required to run the vRPAs, assuming that
each ESX machine has only 1 Gb WAN?
Assuming that there are no additional requirements from production VMs or VM
management services, five ESX machines are needed, because each such ESX
machine can handle the network traffic of a single “8 vCPUs, 8 GB RAM” vRPA,
and an additional ESX machine should be provisioned for redundancy in case of
an ESX failure.
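The host-count reasoning of the two examples above can be sketched as follows. This is illustrative only: it models a 1 Gb/s link as roughly 125 MB/s and enforces the rule, stated earlier, that vRPA roles 1 and 2 must sit on different ESX hosts:

```python
import math

def esx_hosts_needed(num_vrpas, vrpa_mbps, esx_wan_gbps):
    """ESX hosts needed so each host's WAN link can carry the traffic of
    the vRPAs placed on it. At least two hosts are always required so
    that vRPA roles 1 and 2 never share an ESX."""
    vrpas_per_esx = max(1, int(esx_wan_gbps * 125 // vrpa_mbps))
    return max(2, math.ceil(num_vrpas / vrpas_per_esx))

print(esx_hosts_needed(5, 100, 1))   # 5 hosts: one 100 MB/s vRPA per 1 Gb link
print(esx_hosts_needed(5, 100, 10))  # 2 hosts: a 10 Gb link carries several vRPAs
```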
For additional vRPA considerations and best practices, refer to the EMC
RecoverPoint vRPA Technical Notes.
Snap-based replication
Snap-based replication provides an alternative to RecoverPoint traditional
continuous replication, which, despite its many advantages, may in the event of
high I/O load cause an extended out-of-sync mode (that is, highload), with
associated high RPO.
Snap-based replication is supported only with VNX storage systems running VNX
OE for Block 05.32.000.5.215 and later, or VNX OE for Block 05.33.000.5.038 and
later.
In order to allow consistent snaps across all replication sets, all volumes in a
consistency group must reside on the same array, so that a snap taken on the
array is applied on all the volumes in the consistency group at exactly the same
time.
To enable snap-based replication, one of the following shipping modes must be
set for the Snap-based Configuration parameter (in the Link Policy tab for the
consistency group):
On Highload — a single snap will be taken on the production array after a
highload event.
Periodic — the system will create snaps on the production array according to
a specified interval.
I/O flow during snap-based replication is as follows:
Production copy
During snap-based replication, the VNX splitter splits the I/Os to the RPA;
however, the RPA uses only the metadata to mark the dirty regions. This is
similar to RPA behavior when a consistency group is in a pause state. Due to
this, IOPS, throughput, and added response time of snap-based replication
are similar to that of a group that is paused.
Replica copy
Replicated snaps are written to the journal at the remote storage array and
distributed using the regular distribution process to the replica volume. A
RecoverPoint bookmark is taken after a snap is replicated successfully.
For additional information about snap-based replication for VNX storage systems,
including configuration and limitations, refer to the EMC RecoverPoint 5.0
Product Guide.
The following sections present the indicators used to measure performance for
snap-based replication, and the factors that affect those indicators.
Table 7. Snap-based replication added response time at very low IOPS on VNX
array

I/O size (KB) | IOPS | Response time without RecoverPoint (ms) | Response time with RecoverPoint replicating (ms) | Added response time by RecoverPoint (ms)
Snap sizes
Snap sizes depend on the frequency at which snaps are replicated and on the
application write I/O rate, I/O pattern, and hot spots.
The RPA replicates only the dirty regions that were changed since the last snap.
As a consequence, the amount of data to be transferred may be smaller than
the amount of data that was written by the application, due to write-folding. If,
for example, two consecutive I/Os were written to the same offset and length,
only the second I/O will be replicated. This is significant when the application I/O
pattern consists of many hot spots. In addition, the lower the snapshot frequency,
the higher the expected folding factor.
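Write-folding can be illustrated with a toy model. This code is hypothetical: it keys regions by exact offset and length and ignores partial overlaps, which a real splitter must handle:

```python
def fold_writes(writes):
    """Of consecutive writes to the same (offset, length) region, only the
    latest must be replicated; earlier ones are folded away."""
    latest = {}
    for offset, length, data in writes:
        latest[(offset, length)] = data  # a later write replaces an earlier one
    return list(latest.values())

# Three application writes; two target the same 8 KB region at offset 0
app_writes = [(0, 8192, "v1"), (4096, 8192, "x"), (0, 8192, "v2")]
print(fold_writes(app_writes))  # ['v2', 'x']: only two writes are transferred
```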
The maximum size of a snap is the sum of the capacities of all the production
volumes in the consistency group. In this extreme case, in which all the
production volumes are changed in a single snap, the journal will need to be
larger than the sum of all the production volumes. If not, the snap can be
replicated only through long resync, in which case you will lose all previous points
in time.
RPO
The RPO is the amount of data that has reached the production copy but is not
yet available for image access at the copy in case of production disaster.
The RPO for periodic snap-based replication is the interval you set plus the time it
takes to create a snap on the production array and transfer it to the remote side.
Example:
Periodic snap replication is configured on the link between two clusters with a
1-hour interval. What is the RPO, assuming that it takes 10 minutes to create a
snap on the production array, and 20 minutes to transfer it to the remote copy?
The RPO is 1 hour and 30 minutes. That is because right before a snap is created
at the remote copy, the latest available bookmark for image access contains
the snap that was taken 1 hour and 30 minutes ago.
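The RPO arithmetic is simply additive, as this illustrative one-liner shows (the function name is hypothetical):

```python
def periodic_snap_rpo_min(interval_min, create_min, transfer_min):
    """Worst-case RPO for periodic snap-based replication: just before a new
    snap lands at the remote copy, the newest accessible image is one full
    interval plus snap-creation and transfer time old."""
    return interval_min + create_min + transfer_min

print(periodic_snap_rpo_min(60, 10, 20))  # 90 minutes, as in the example
```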
Max IOPS
In snap-based replication, just as for consistency groups that are in pause state,
the maximum IOPS that RecoverPoint can sustain depends on its hardware. For
example, on Gen6 RPAs the maximum IOPS is 50,000. Spreading the IOPS load
across more RPAs will linearly increase the total IOPS the RecoverPoint system
can handle, up to the VNX array limit.
Example:
An application is performing OLTP2HW pattern at maximum rate on 5 volumes in
a single CG. Would it be better to use continuous replication or snap-based
replication to protect it?
It depends on the bookmark granularity you would like to have. If you need
granularity of any point-in-time or on the order of seconds or a few minutes, use
continuous replication. Otherwise, use snap-based replication.
OLTP2HW writes its data to hot spots. This reduces the change rate of the data
due to the large folding factor. As a consequence, snap-based replication
would dramatically reduce the bandwidth usage between clusters when
compared to continuous (CRR) replication. The larger the interval you
choose, the more bandwidth is saved. For example, for a 10-minute interval, it
may be possible to achieve up to a 95% saving in bandwidth.
In addition, snap-based replication would improve host performance by about
30% compared to continuous replication, since the RecoverPoint added response
time in snap-based replication is much lower, especially when the write volume is
high.
Example:
The application generates constant write throughput of 300 MB/s to several
volumes that are protected by a single CG. What period should I configure for
snap-based replication?
The answer depends on the change rate of the data.
Each physical RPA can replicate snapshots at an average rate of 150-200 MB/s.
If the folding factor is 2, the change rate would be 150 MB/s, which is less
than or equal to the replication rate. This means that snapshot sizes won’t
increase over time, and you can configure a short period between snapshots.
However, best practice is to choose an interval of not less than 1 hour.
If the folding factor is 1, the change rate would be 300 MB/s, which is faster
than the replication rate. This means that, regardless of the period you choose,
snapshot sizes will increase over time. This will continue until the size of the
snapshot equals the combined size of all the user volumes.
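The growth condition described above (change rate after folding versus sustained snap replication rate) can be checked with a hypothetical sketch:

```python
def snapshot_backlog_mb(write_mbps, folding_factor, replicate_mbps, seconds):
    """Backlog accumulated over a period: positive when the change rate
    after folding exceeds the sustained snap replication rate, meaning
    snapshot sizes keep growing over time."""
    change_mbps = write_mbps / folding_factor
    return max(0, (change_mbps - replicate_mbps) * seconds)

print(snapshot_backlog_mb(300, 2, 150, 3600))  # 0: replication keeps up
print(snapshot_backlog_mb(300, 1, 150, 3600))  # 540000.0 MB backlog after an hour
```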
The VPLEX array intercepts the UNMAP command and sends write I/Os filled with
zeros to the RPA, to be replicated to all the copies. The size and number of such
write I/Os depend on the capacity that was unmapped by the host.
Depending on the type of array that hosts the replica copy, the writes of zeros
may be translated back to an UNMAP command when applied to the replica.
For example, if a VPLEX volume is replicated to a VNX, a VPLEX UNMAP
command that was sent to RecoverPoint as a write of zeros is applied to the VNX
replica using an UNMAP command. While RecoverPoint supports replication of
UNMAP commands from a VPLEX array at production, it does not yet support
issuing UNMAP commands to a VPLEX array serving as replica.
When UNMAP commands are sent to devices protected by RecoverPoint, the
translation of the commands to zeros may cause the following performance
problems:
Highloads — the amount of data that is being unmapped is the amount of
data that needs to be replicated. If the UNMAP command addresses a large
storage region, it causes high IO load on the relevant CG and RPA. That load
may cause highload even on other CGs running on the same RPA.
Increase in UNMAP command response time (latency) — As in every write IO,
the acknowledgement of the UNMAP IO is sent to the host only after the data
is sent to the RPA. When the UNMAP command is addressed to a large region
of storage, multiple write IOs filled with zeroes are sent to the RPA. That may
cause a significant increase in latency of UNMAP commands sent to devices
that are protected by RecoverPoint. In synchronous replication, that increase
may be even larger because the writes must be sent to and acknowledged
by the replica RPA.
Incompatibility of provisioned storage capacity between production and
remote — Since RecoverPoint does not replicate the original UNMAP
command, the replica receives write IOs filled with zeroes instead. Thus,
capacity that has been de-allocated at production remains allocated at the
replica. This is the case wherever RecoverPoint does not support the issuing of
UNMAP commands to the replica storage (VPLEX included).
Note that the load on communication between sites is expected to be very
minor as long as compression is enabled, because write IOs that are pure zeroes
can be significantly and easily compressed.
Note also that UNMAP commands are not frequently sent by hosts. In addition,
the system recovers automatically from highloads. Thus, in most cases, highloads
are not considered severe as long as they don’t happen too often.
In case of severe problems due to UNMAP commands, it is possible to configure
the ESXi host not to use UNMAP commands, as described in the VMware
documentation. Note, however, that this will disable UNMAP commands to all
datastores and devices, including those that are not protected by RecoverPoint,
thereby degrading their performance.
MetroPoint
The MetroPoint solution allows full RecoverPoint protection of the VPLEX Metro
configuration, maintaining replication even when one Metro site is down.
I/O flow during MetroPoint replication is as follows:
The VPLEX splitter is installed on all VPLEX directors on all sites. The splitter is
located beneath the VPLEX cache. When a host sends a write I/O to a VPLEX
volume, the I/O is intercepted by the splitter on both Metro sites. Each
splitter that receives the I/O sends it to the RPA that is connected and runs
the consistency group that protects this volume. Only when it is
acknowledged by the RPA is it sent to the backend storage array. After the
I/O to the backend storage array on both Metro sites is complete, the host is
acknowledged.
In this flow, two RPAs receive the I/O, one RPA on each side of the Metro.
Only the RPA that runs the active production replicates the I/O to the remote
copy. The RPA that runs the standby production will only mark the regions of
the I/O as dirty as if the group is in pause state.
For additional information about MetroPoint, see the EMC RecoverPoint
Deploying with VPLEX Technical Notes and the EMC RecoverPoint 5.0
Administrator’s Guide.
The following sections present the indicators used to measure performance for
MetroPoint replication, and the factors that affect those indicators.
Deployment considerations
As in regular replication, load balancing of the consistency groups over RPAs
can greatly affect RecoverPoint overall performance. Hence, in MetroPoint, it is
advisable to balance the active and standby copies of the consistency groups
between the two production RecoverPoint clusters.
Example:
There are four MetroPoint CGs with throughput of 50 MB/s each. All of the RPA
clusters have 2 RPAs each. How should the CGs be configured to balance the
load?
Put two CGs on each RPA role. In each RPA role define one of the CGs as active
on one of the production RP clusters and the other CG as active on the second
production RP cluster.
In this way, each RPA at production will need to handle an incoming throughput
of 100 MB/s but replicate only 50 MB/s, and each RPA at the remote site will
need to distribute 100 MB/s.
Example:
Two MetroPoint CGs have throughput of 50 MB/s each. All RPA clusters have 2
RPAs each. How should the CGs be configured to balance the load?
Put one CG on each RPA role. Unless you have WAN restrictions between one of
the production sites and the remote site, then performance-wise it doesn’t
matter which copy is active and which is standby.
It would be incorrect to put the two CGs on RPA 1 but define one of the CGs as
active on one production site and the other CG as active on the second
production site, since in that configuration, RPA 1 on the remote site will need to
distribute 100 MB/s while RPA 2 will be idle.
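The arithmetic behind these examples can be captured in a few lines. The toy model below assumes that every CG's writes arrive at its assigned RPA on both production sites (incoming load), while only the RPA on the CG's active side replicates; the function, its signature, and the site labels are illustrative only, not part of any RecoverPoint interface.

```python
# Toy model of MetroPoint CG placement, illustrating the load-balancing
# arithmetic in the examples above. All names are illustrative.

def per_rpa_load(cgs):
    """cgs: list of (rpa_number, active_site, throughput_mb_s).
    Returns {(site, rpa): (incoming MB/s, replicated MB/s)}.

    Both production sites receive every write, but only the RPA at the
    CG's active site replicates it to the remote copy."""
    load = {}
    for rpa, active_site, mbs in cgs:
        for site in ("A", "B"):
            inc, rep = load.get((site, rpa), (0, 0))
            inc += mbs                      # both sides receive the I/O
            if site == active_site:
                rep += mbs                  # only the active side replicates
            load[(site, rpa)] = (inc, rep)
    return load

# First example: four 50 MB/s CGs, two per RPA, active sides alternated
# between the two production clusters.
cgs = [(1, "A", 50), (1, "B", 50), (2, "A", 50), (2, "B", 50)]
for (site, rpa), (inc, rep) in sorted(per_rpa_load(cgs).items()):
    print(f"site {site}, RPA {rpa}: incoming {inc} MB/s, replicating {rep} MB/s")
```

With this placement, every production RPA sees 100 MB/s incoming and replicates 50 MB/s, matching the example. (Remote-side distribution, 100 MB/s per remote RPA here, is not modeled.)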
It is important to note that the results depend on many parameters and can vary
significantly even if only a few environment parameters change. It is
recommended, therefore, that you not compare the results of different
environments; they differ in too many parameters to be comparable.
When assessing the expected performance of your environment, refer to the
performance results of the environment that most closely resembles yours. The
given performance results provide only an estimate of the performance that
you can expect.
To show the effect of RecoverPoint on performance, each table contains
the results with and without RecoverPoint. The results without
RecoverPoint can be considered a baseline, or the performance
characteristic of the example environment.
Replica storage:
VNX 8000:
Flare version: 05.33.006.5.102
4 frontend FC ports connected
96 SAS disks of 820GB each, in 12 RAID groups (RAID 1/0, 3,282GB each)
used for 512 production volumes
40 SAS disks of 820GB each, in 5 RAID groups (RAID 1/0, 3,282GB each)
used for 24 journal volumes
Communication between RPA clusters:
IP bandwidth of 10Gb per RPA
Performance results
Table 8. Async replication added response time at very low IOPS
Columns: I/O size (KB); IOPS; response time without RecoverPoint (ms); response time with RecoverPoint replicating (ms); added response time by RecoverPoint (ms)
Table 11. Async replication added response time for application patterns, at 60% max IOPS
Columns: application pattern; response time without RecoverPoint (ms); response time with RecoverPoint replicating (ms); added response time by RecoverPoint (ms)
Replica storage:
VPLEX Medium (2 engines):
Software version: INT_D35-30-0.0.08 (Acropolis 5.5)
8 backend FC ports, 8 frontend FC ports connected
4 VPLEX directors
1:1 volume encapsulation
VNX backend storage (8000):
Flare version: 05.33.009.3.101
8 frontend FC ports connected
96 SAS disks of 820GB each, in 12 RAID groups (RAID 1/0, 3,282GB each)
used for 512 production volumes
48 SAS disks of 820GB each, in 6 RAID groups (RAID 1/0, 3,282GB each) for 24
journal volumes
Communication between RPA clusters:
IP bandwidth of 10Gb per RPA
Performance results
Table 12. Async replication added response time at very low IOPS
Columns: I/O size (KB); IOPS; response time without RecoverPoint (ms); response time with RecoverPoint replicating (ms); added response time by RecoverPoint (ms)
Table 13. Sync replication added response time at very low IOPS
Columns: I/O size (KB); IOPS; response time without RecoverPoint (ms); response time with RecoverPoint replicating (ms); added response time by RecoverPoint (ms)
Table 15. Async replication added response time for application patterns, at 60% max IOPS
Columns: application pattern; response time without RecoverPoint (ms); response time with RecoverPoint replicating (ms); added response time by RecoverPoint (ms)
Replica storage:
Unity 500:
Software version: 4.0.0.733913
8 FC ports connected
88 FC disks of 20GB each, in RAID 1/0 for 512 production volumes
24 FC disks of 50GB each, in RAID 1/0 for 128 journal volumes
Communication between RPA clusters:
FC bandwidth of 8 Gb per RPA FC port
Performance results
Table 16. Async replication added response time at very low IOPS
Columns: I/O size (KB); IOPS; response time without RecoverPoint (ms); response time with RecoverPoint replicating (ms); added response time by RecoverPoint (ms)
Table 17. Sync replication added response time at very low IOPS
Columns: I/O size (KB); IOPS; response time without RecoverPoint (ms); response time with RecoverPoint replicating (ms); added response time by RecoverPoint (ms)
Table 19. Async replication added response time for application patterns, at 60% max IOPS
Columns: application pattern; response time without RecoverPoint (ms); response time with RecoverPoint replicating (ms); added response time by RecoverPoint (ms)
96 SAS disks of 820GB each, in 12 RAID groups (RAID 1/0, 3,282GB each)
used for 512 production volumes
40 SAS disks of 820GB each, in 5 RAID groups (RAID 1/0, 3,282GB each)
used for 24 journal volumes
Replica storage:
VPLEX Medium (2 engines):
Software version: INT_D35-30-0.0.08 (Acropolis 5.5)
8 backend FC ports, 8 frontend FC ports connected
4 VPLEX directors
1:1 volume encapsulation
VNX backend storage (8000):
Flare version: 05.33.009.3.101
8 frontend FC ports connected
96 SAS disks of 820GB each, in 12 RAID groups (RAID 1/0, 3,282GB each)
used for 512 user volumes
48 SAS disks of 820GB each, in 6 RAID groups (RAID 1/0, 3,282GB each)
used for 24 journal volumes
Communication between RPA clusters:
FC bandwidth of 8 Gb per RPA FC port
Table 20. Async replication added response time at very low IOPS
Columns: I/O size (KB); IOPS; response time without RecoverPoint (ms); response time with RecoverPoint replicating (ms); added response time by RecoverPoint (ms)
Table 21. Sync replication added response time at very low IOPS
Columns: I/O size (KB); IOPS; response time without RecoverPoint (ms); response time with RecoverPoint replicating (ms); added response time by RecoverPoint (ms)
Table 23. Async replication added response time for application patterns, at 60% max IOPS
Columns: application pattern; response time without RecoverPoint (ms); response time with RecoverPoint replicating (ms); added response time by RecoverPoint (ms)
Performance results
Table 24. Async replication added response time at very low IOPS
Columns: I/O size (KB); IOPS; response time without RecoverPoint (ms); response time with RecoverPoint replicating (ms); added response time by RecoverPoint (ms)
Table 25. Sync replication added response time at very low IOPS
Columns: I/O size (KB); IOPS; response time without RecoverPoint (ms); response time with RecoverPoint replicating (ms); added response time by RecoverPoint (ms)
Table 27. Async replication added response time for application patterns, at 60% max IOPS
Columns: application pattern; response time without RecoverPoint (ms); response time with RecoverPoint replicating (ms); added response time by RecoverPoint (ms)
Performance Tools
The following performance-related tools are available for sizing and analyzing
RecoverPoint systems:
BCSD (Business Continuity Solution Designer) — helps in sizing, based on the
relevant input parameters. For example, it can be used to calculate the
number of RPAs needed for a given workload, the required WAN bandwidth,
or the required journal size. The BCSD is available for download at:
https://elabadvisor.emc.com/app/licensedtools/list
Bottleneck detection tool — provides system performance statistics, and
suggests actions to boost performance by solving bottlenecks. It can detect
problems such as insufficient WAN, wrong compression level, slow journal
volume, or unbalanced load on the RPAs. You can activate this tool by
running the detect_bottlenecks CLI command. For additional information,
refer to EMC RecoverPoint Detecting Bottlenecks Technical Notes.
Load balancer — moves CGs between RPAs to help balance the load across
the RPAs in a cluster. It is activated by running the balance_load CLI
command.
Short-term statistics — the get_group_statistics and get_system_statistics CLI
commands provide a range of statistics about the recent behavior of a CG, or
of the whole system, respectively. The export_statistics CLI command
provides this information in CSV format, at per-minute granularity.
Long-term statistics — the export_consolidated_statistics CLI command
provides the long-term statistics of the RecoverPoint system in CSV format.
These statistics can be displayed graphically using the “RecoverPoint Long
Term Statistics Tool”, which is available with the product downloads at
https://support.emc.com.
DPA (Data Protection Advisor) — an EMC product that enables easy system
monitoring, analysis, and reporting. It can be used to display RecoverPoint
history statistics, detect bottlenecks, predict future system requirements, and
fine-tune a RecoverPoint configuration. It can be obtained from
https://support.emc.com. Future releases of DPA will be able to use the
“RecoverPoint Long Term Statistics Tool”.
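As a rough illustration of the kind of arithmetic that a sizing exercise involves, the sketch below estimates WAN bandwidth and journal capacity from a sustained write rate. The formulas, the compression ratio, and the overhead factor are simplified assumptions for illustration only; they are not RecoverPoint's actual sizing formulas, and the BCSD performs the authoritative calculation.

```python
# Back-of-envelope sizing sketch. Simplified, assumed formulas;
# use the BCSD tool for real sizing.

def required_wan_mbps(sustained_throughput_mbs, compression_ratio=2.0):
    """The WAN must carry the sustained write rate after compression.
    Converts MB/s to Mb/s (x8). Compression ratio of 2:1 is assumed."""
    return sustained_throughput_mbs / compression_ratio * 8

def required_journal_gb(sustained_throughput_mbs, protection_window_hours,
                        overhead=1.2):
    """The journal must hold all writes inside the protection window,
    plus an assumed 20% overhead for metadata and image access."""
    data_gb = sustained_throughput_mbs * protection_window_hours * 3600 / 1024
    return data_gb * overhead

# Example: 50 MB/s sustained writes, 24-hour protection window.
print(required_wan_mbps(50))           # required WAN bandwidth in Mb/s
print(required_journal_gb(50, 24))     # required journal capacity in GB
```

Under these assumptions, 50 MB/s of sustained writes needs about 200 Mb/s of WAN bandwidth and roughly 5 TB of journal capacity for a 24-hour protection window; real requirements depend on workload compressibility, burstiness, and snapshot consolidation policy.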