Best Practice
Technical white paper
Table of contents
Abstract
Background
Overview
Best practices summary
First best practices
Best practices to reduce cost
Mixed disk capacities influence the cost of storage
Number of disk groups influences the cost of storage
Number of disks influences the cost of storage
Thin Provisioning influences the cost of storage
HP Dynamic Capacity Management influences the cost of storage
Dynamic LUN/RAID Migration influences cost
Disk performance influences the cost of storage
Cross-RAID snapshots influence the cost of storage (and the perceived availability of storage)
SAS Mid-line disks influence the cost of storage
Best practices to improve availability
Backup processes to enhance recovery
Disk group type influences availability
RAID level influences availability
Disk group organization influences availability and capacity utilization
Vraid0 influences availability
Replacing a failed disk influences availability
Protection level influences availability
Number of disk groups influences availability
Number of disks in disk groups influences availability
Capacity management and improved availability
Fault monitoring to increase availability
Integrity checking to increase availability
Remote mirroring and availability
HP P6000 Continuous Access and Vraid0 influence availability
System fault-tolerant clusters and availability
Best practices to enhance performance
Number of disks influences performance
Number of disk groups influences performance
Traditional Fibre Channel disk performance influences array performance
Vraid level influences performance
Vraid0 influences performance
Mixing disk performance influences array performance
Mixing disk capacities influences performance and capacity utilization
Read cache management influences performance
Abstract
An important value of the HP P6000 Enterprise Virtual Array (EVA) is simplified management. A
storage system that is simple to administer saves management time and money, and reduces
configuration errors. You can further reduce errors and unnecessary expense by implementing a few
best practices and optimizing your P6000 EVA for its intended applications. This paper highlights
common configuration rules and tradeoffs for optimizing the new HP P6300 and P6500 Enterprise
Virtual Array for cost, availability, and performance. Getting the most from your enterprise-class
storage has never been easier.
Background
Two design objectives for the HP P6000 EVA are to provide maximum real-world performance and to
reduce storage management costs. These objectives led to the design of an intelligent controller
that reduces the number of user-controlled tuning parameters. In contrast, traditional disk
arrays typically have many tunable settings for both individual logical unit numbers (LUNs) and the
controller. Although tunable settings might appear to be desirable, they pose potential issues:
- It is difficult and time consuming for administrators to set the parameters appropriately. Many
settings require in-depth knowledge of the controller's internal algorithms and specific knowledge of
the workload presented to the array.
- Storage administrators often do not have the time and resources to attain the expertise necessary to
maintain a traditional array in optimum configuration.
- As the I/O workload changes, many parameters that were previously set might no longer be
appropriate. Overcoming the effects of change requires continual monitoring, which is impractical
and costly in most situations.
Because of such concerns, HP P6000 EVA algorithms reduce the parameters that users can set, opting
instead to embed intelligence within the controller. The controller has a better view of the workload
than most administrators and can be far more dynamic in responding to workload changes. The result
is an array that is both easy to configure and high performing.
Overview
Although the HP P6000 EVA is designed to work in a wide range of configurations, configuration
options can influence performance, usable capacity, or availability. With the necessary information
about the configuration options for the HP P6000 EVA, a storage administrator can enhance its
configuration for a specific application.
Table 1 summarizes the best practices, mapping each configuration option to its primary influence
(cost, availability, or performance). Highlights from the discussion column:

- Solid State disks have the highest performance. However, they are the most expensive. 15K rpm
disks have equal or higher performance than 10K rpm disks, but they are more expensive. See the
details for a discussion of price-performance optimization.
- Proper settings for the protection level, occupancy alarm, and available free space provide the
resources for the array to respond to capacity-related faults.
- All data center best practices include processes to regularly copy the current data set to external
media or near-line devices.
As reconstruction space is not shared across disk groups, it is more efficient to have the fewest
disk groups possible, thus reducing the reconstruction overhead. When the first few larger disks are
introduced into a disk group, the resulting usable capacity is similar to the smaller disks until the
protection-level requirements are met. Then, the additional usable capacity is in line with the physical
capacity of the new disks. Even though the resulting capacity for the first few disks is not very efficient,
the alternative of creating two disk groups provides less usable capacity.
Best practice to reduce the cost of storage: Mix disk sizes within a single disk group.
Best practice to reduce the cost of storage: Use an equal or lower RAID level for the target LUN of a
Vraid snapshot.
Use mid-line disks only where random-access performance and continuous operation are not
required. The EVA requires that mid-line disks be organized in disk groups that are separate from
other drive types.
The best application for mid-line disks is for the online part of your backup and recovery solution.
Clones assigned to mid-line disk groups provide the lowest-cost solution for zero-downtime backup
and fast recovery storage. HP software solutions, such as HP Data Protector, and other industry
backup software, include the management of mid-line based clones for backup and recovery.
Mid-line drives are not recommended in Continuous Access applications as the remote storage
location for local data residing on standard higher speed disk drives. Continuous Access tends to
perform high-duty cycle random writes to the remote disk array. Matching remote mid-line disks with
local enterprise disks will affect the performance of your application, and will adversely impact the
reliability of the mid-line disks.
Best practice for SAS Mid-line disk and the cost of storage: Use mid-line disks to augment near-line
storage usage.
Note:
SAS Mid-line disks and clone copies are not a replacement for offline
backup. Best practice is to retain data on an external device or media for
disaster recovery.
Note:
A statistical analysis is a prediction of behavior for a large population; it is not a guarantee of
operation for a single unit. Any single unit may experience significantly different results.
If performance constraints do not allow a total Vraid6 configuration, consider using Vraid1 for critical
files or data sets. For example:
- In database applications, select Vraid1 for log files.
- In snapshot and clone applications, select Vraid1 for active data sets and Vraid5 for snapshots
and clones.
Best practice for highest availability: Vraid6 provides the highest levels of availability and data
protection.
Note:
The higher-availability and data-protection capabilities of Vraid1, Vraid5,
or Vraid6 should not be considered a replacement for good backup and
disaster-recovery processes. The best practices for business-critical
applications always include frequent data backup to other near-line or
offline media or devices.
Note:
For Vraid0, increasing the protection level does not increase the availability
of the Vdisk. A single disk failure renders it inoperable, even when single
or double protection is enabled.
Conversely, the statistical availability of disks and the typical service time to replace a failed disk
(MTTR, mean time to repair) indicate that a double protection level is unnecessary in all but the
most conservative installations. A mitigating condition would be a service time (MTTR) that exceeds
seven days. In that case, a protection level of double might be considered.
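As a sketch, the rule of thumb above reduces to a simple check. The seven-day threshold is this paper's guideline, and the function and parameter names are illustrative, not part of any EVA interface:

```python
def protection_level(mttr_days: float) -> str:
    """Rule of thumb from the text: double protection is warranted only when
    the disk replacement time (MTTR) exceeds about seven days."""
    return "double" if mttr_days > 7 else "single"

print(protection_level(3))   # single
print(protection_level(10))  # double
```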
Best practice to improve availability: Use a single protection level.
Note:
Protection level reserved capacity is not associated with the occupancy
alarm setting. These are independent controls.
Figure 1. Two disk groups: Disk Group 1 contains the data, control, and log files; Disk Group 2
contains copies of the control and log files (maintained by software or OS mirroring) and,
optionally, a snapclone of the data files.
If Disk Group 2 does not contain the snapclone of the data files, the number of disks in Disk Group 2
should be determined by the sequential performance demand of the log workload. Typically, this
results in more usable capacity than is required for the logs. In this case, choose Vraid1 for the log
disks. Vraid1 offers the highest availability and, in this case, does not affect the cost of the solution.
A variation on this configuration is shown in Figure 2. In this example, two separate disk groups are
used for the log and control files, and a third disk group is used for the data files. This configuration
has a slightly higher cost but appeals to those looking for symmetry in the configuration. In this
configuration:
Disk Group 1 contains database data files.
Disk Group 2 contains the database log, control file, and archived log files.
Disk Group 3 contains a database log copy, control file copy, and archived log files copy
(if supported).
Figure 2. Three disk groups: Disk Group 1 contains the data files; Disk Group 2 contains the log and
control files; Disk Group 3 contains copies of the control and log files (maintained by software or
OS mirroring) and, optionally, a snapclone of the data files.
Disk groups can be shared with multiple databases. It is not a best practice to create a separate disk
group for each database.
Best practice to improve availability: For critical database applications, consider placing data files
and recovery files in separate disk groups.
Best practice to improve availability: Assign clones to a separate disk group.
Note:
Creating multiple disk groups for redundancy and then using a volume
manager to stripe data from a single application across both disk groups
defeats the availability value of multiple disk groups. If, for cost or capacity
reasons, multiple disk groups are not implemented, the next best practice is
to store the database log, control file, and log archives in Vraid6 LUNs.
Vraid6 provides greater protection from disk failures than Vraid1, Vraid5,
or Vraid0.
Occupancy alarm = 100 - CEILING({(1200 + 5) / 40000} * 100)
                = 100 - CEILING({0.0301} * 100)
                = 100 - CEILING(3.01)
                = 100 - 4
                = 96

Thus the occupancy alarm should be set to 96 percent.
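The calculation can be expressed in code. In this sketch the numerator (1200 + 5) is treated simply as reserved capacity in GB and the denominator (40000) as the disk group's total capacity in GB; the exact meaning of those figures comes from the worked example that precedes the calculation:

```python
import math

def occupancy_alarm_percent(reserved_gb: float, total_gb: float) -> int:
    """Occupancy alarm setting: 100 minus the reserved fraction of the
    disk group, rounded up to the next whole percent."""
    return 100 - math.ceil((reserved_gb / total_gb) * 100)

# Worked example from the text: (1200 + 5) GB reserved in a 40000 GB disk group.
print(occupancy_alarm_percent(1200 + 5, 40000))  # 96
```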
Remaining free space: Remaining free space is managed by the creation of Vdisks, and is measured
by the capacity available to create additional Vdisks. This free-space capacity is used by
space-efficient snapshots and Thin Provisioning LUNs. It is critical that sufficient free space be
available for the space-efficient snapshot copies and Thin Provisioning LUNs. If adequate free space
is not available, all snapshot copies in the disk group become inoperative. (Space-efficient snapshot
Vdisks become inoperative individually as each attempts to allocate additional free space. In
practice, the effect is that all become inoperative together.) Fully allocated snapshot Vdisks and
clone Vdisks will continue to be available. Thin Provisioning LUNs will enter an overcommit state as
writes are attempted to previously unallocated space. The behavior of overcommitted Thin
Provisioning LUNs varies by OS (see the Thin Provisioning user guide for expected behavior).
Snapshots use copy-on-write technology. Copy-on-write occurs only when either the original Vdisk or
the snapshot Vdisk is modified (a write); then an image of the associated blocks is duplicated into free
space. For any given block, the data is copied only once. As snapshot copies diverge, the capacity
available to create Vdisks decreases.
The actual capacity required for a space-efficient snapshot depends on the divergence of the original
Vdisk and the snapshot, that is, how much of the data on the original disk changes after the snapshot
is created. This value is unique for each application, but can range from 0 percent to 100 percent.
The suggestion for the initial usage of snapshot (that is, when you do not know the actual physical
capacity required by the snapshot) is to plan for (not allocate to Vdisks) 10 percent of the capacity of
the parent Vdisks times the number of snapshots per parent Vdisk. For example, if you need to create
two space-efficient snapshot Vdisks of a 500-GB Vdisk, you need to ensure that 100 GB (500 GB x
10 percent x 2) of usable capacity is available. Compute the usable capacity using the RAID level
selected for the snapshot Vdisk.
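As a sketch of the sizing rule above (the 10 percent divergence figure is the paper's initial planning estimate, not a measured value, and the function name is illustrative):

```python
def snapshot_reserve_gb(parent_gb: float, num_snapshots: int,
                        divergence: float = 0.10) -> float:
    """Initial planning estimate for space-efficient snapshot capacity:
    parent Vdisk capacity x expected divergence x number of snapshots."""
    return parent_gb * divergence * num_snapshots

# Worked example from the text: two snapshots of a 500 GB Vdisk.
print(snapshot_reserve_gb(500, 2))  # 100.0 GB of usable capacity to keep free
```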
Best practice to improve availability: Leave unallocated Vdisk capacity, the capacity available to
create Vdisks, equal to the sum of the capacities required for all space-efficient snapshot copies
within a disk group. Note that unallocated Vdisk space is also used by thin provisioning Vdisks.
Monitor thresholds on the Vdisk and disk group to determine when to add more physical capacity to
the disk group.
Experience shows that high performance and low cost generally have an inverse relationship. While
this continues to be true, the HP P6000 EVA virtualization technology can significantly reduce the cost
of high performance. This section outlines configuration options for optimizing performance and
price-for-performance, although sometimes at the expense of cost and availability objectives.
Array performance management typically follows one of two strategies: contention management or
workload distribution. Contention management is the act (or art) of isolating different performance
demands to independent array resources (that is, disks and controllers). The classic example of this is
assigning database table space and log files to separate disks. The logic is that removing the
contention between the two workloads improves the overall system performance.
The other strategy is workload distribution. In this strategy, the performance demand is evenly
distributed across the widest set of array resources. The logic for this strategy is to reduce possible
queuing delays by using the most resources within the array.
Before storage devices with write caches and large stripe sets, like the EVA, contention management
was a good strategy. However, with the EVA and its virtualization technology, workload distribution
is the simplest technique for maximizing real-world performance. While you can still manage EVA
performance using contention management, the potential for error in matching the demand to the
resources decreases the likelihood of achieving the best performance.
This is a key concept and the basis for many of the best practices for EVA performance optimization.
This section explores ways to obtain the best performance through workload distribution.
Enhancing performance raises the issue of demand versus capability. Additional array performance
improvements have very little effect on application performance when the performance capabilities of
the array exceed the demand from applications. An analogy would be tuning a car engine. There are
numerous techniques to increase the horsepower output of a car engine. However, the increased
power has little effect if the driver continues to request that the car travel at a slow speed. If the driver
does not demand the additional power, the capabilities go unused.
The best results from performance tuning are achieved when the analysis considers the whole system,
not just the array. However, short of a complete analysis, the easiest way to determine if an
application could take advantage of array performance tuning is to study the queues in the I/O
subsystem on the server and in the array. If there is little or no I/O queuing, additional array
performance tuning is unlikely to improve application performance. The suggestions are still valuable,
but if the array performance capabilities exceed the demand (as indicated by little or no queuing), the
suggestions may yield only a modest gain.
Best practice for Vraid0 and performance: Accepting the possible data and availability loss (Vraid0
provides no protection from disk failure), consider Vraid0 for non-critical storage needs only.
The EVA stripes LUNs across all the disks in a disk group. The amount of LUN capacity allocated to
each disk is a function of the disk capacity and Vraid level. When disks of different capacities exist in
the same disk group, the larger disks have more LUN data. Since more data is allocated to the larger
disks, they are more likely to be accessed. In a disk group with a few larger-capacity disks, the
larger disks become fully utilized (in a performance sense) before the smaller disks. Although the EVA
does not limit the concurrency of any disk in the disk group, the workload itself may require an I/O to
a larger disk to complete before issuing another command. Thus, the perceived performance of the
whole disk group can be limited by the larger disks.
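As an illustrative sketch (a simplified model, not the EVA's actual allocation algorithm), capacity-proportional striping means each disk holds a share of the LUN data in proportion to its capacity, so mixed sizes concentrate demand on the larger disks:

```python
def striping_shares(disk_capacities_gb):
    """Share of a LUN's data held by each disk when data is spread in
    proportion to disk capacity."""
    total = sum(disk_capacities_gb)
    return [c / total for c in disk_capacities_gb]

# Eight 300 GB disks plus two 600 GB disks: each larger disk holds twice the
# data of each smaller disk, and so sees roughly twice the access demand.
shares = striping_shares([300] * 8 + [600] * 2)
```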
When the larger disks are included in a disk group with smaller disks, there is no control over the
demand to the larger disks. However, if the disks are placed in a separate disk group, the
storage/system administrator can attempt to control the demand to the disk groups to match the
available performance of the disk group.
Separate disk groups require additional management to balance the workload demand so that it
scales with the performance capability of each disk group. For a performance-focused environment,
however, this is the best alternative.
Best practice to enhance performance: Use disks with equal capacity in a disk group.
Best practice for efficient capacity utilization: Use disks with equal capacity in a disk group.
Best practice to enhance performance with disks of unequal capacity: Create separate disk groups
for each disk capacity and manage the demand to each disk group.
Note:
HP P6000 Continuous Access requires that all LUNs within a DR group be
owned by the same controller. Load balancing is performed at the DR
group level and not at the individual LUN. Additional configuration
information can be found in the HP P6000 Continuous Access
implementation guide.
Snapshots take advantage of the HP P6000 EVA virtualization technology and copy only changes
between the two virtual disks. This typically reduces the total data movement and associated
performance impact relative to clones. However, snapshots do affect performance. Snapshot uses
copy-on-write technology, meaning that a data set is copied during the host write operation. A data
set is copied only once. After it diverges from the original Vdisk, it is not copied again. Like clones,
each dataset copied competes with the host workload. However, for many applications, less than
10 percent of the data in a Vdisk typically changes over the life of the snapshot, so when this
10 percent has been copied, the performance impact of the copy-on-write ceases. Another
characteristic of typical workloads is that the performance impact exponentially decays over time as
the Vdisks diverge. In other words, the performance impact is greater on a new snapshot than on an
aged snapshot.
Pre-allocated containers can be used to perform the metadata management phase of snapshot or
clone creation, before actually creating the snapshot or clone and initiating the write-cache flush and
data movement phases. This allows the overhead for the metadata management (the creation of a
LUN) to be incurred once (and reused) during low-demand periods. Using pre-allocated containers
can greatly improve response times during snapshot or clone creation. The controller software version
for the P6300 and P6500 arrays includes an additional container type, demand-allocated, which
allows the overhead for the metadata management to be incurred once for demand-allocated
snapshot creation.
Remember, the clone copy operation is independent of the workload; the copy operation is initiated
by the clone request. The snapshot copy, by design, is driven by the workload and must compete with
the workload for resources (that is, the disks).
To make a consistent copy of a LUN, the data on the associated disks must be current before the
internal commitment for a snapshot or clone. This requires that the write cache for a snapshot or clone
LUN be flushed to the disks before the snapshot or clone commitment. This operation is automatic with
the snapshot or clone command. However, the performance impact of a snapshot or clone operation
can be reduced by transitioning the LUN to write-through mode (no write caching) before issuing the
snapshot or clone copy command, and then re-enabling write caching after the snapshot or clone is
initiated. The performance benefit is greatest when there are multiple snapshots or clones for a
single LUN.
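The sequence above (write-through, snapshot, re-enable caching) can be sketched as code. The array interface here is a hypothetical mock used only to show the ordering; it is not the SSSU or any real HP P6000 management API:

```python
class MockArray:
    """Minimal stand-in for an array management interface (illustration only)."""
    def __init__(self):
        self.log = []

    def set_write_cache(self, lun, enabled):
        # Disabling the write cache puts the LUN in write-through mode,
        # so pending writes are flushed to disk.
        self.log.append(("cache", lun, enabled))

    def create_snapshot(self, lun):
        # With the cache already flushed, the snapshot captures a consistent image.
        self.log.append(("snap", lun))
        return f"snap-of-{lun}"

def snapshot_with_cache_flush(array, lun):
    """Disable write caching, take the snapshot, then restore caching."""
    array.set_write_cache(lun, enabled=False)
    try:
        return array.create_snapshot(lun)
    finally:
        array.set_write_cache(lun, enabled=True)

array = MockArray()
print(snapshot_with_cache_flush(array, "LUN1"))  # snap-of-LUN1
```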
An additional step is available to enhance the creation performance of either type of local business
copy via the SSSU. The prepare command ensures that the original Vdisk and the container used for
the creation of the snapshot are both managed by the same controller.
Snapshots affect the time required for the array to perform a controller failover, should it become
necessary. Failover time is impacted by the size of the source Vdisk and the number of snapshots of
the source. To minimize the controller failover time and host I/O interruption, the size of the source
Vdisk multiplied by the number of source snapshots should not exceed 80 TB.
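The sizing guideline can be checked mechanically; a minimal sketch in which the function name and parameters are illustrative and the 80 TB limit is the figure stated above:

```python
def failover_sizing_ok(source_vdisk_tb: float, num_snapshots: int,
                       limit_tb: float = 80.0) -> bool:
    """Guideline from the text: the source Vdisk size multiplied by the number
    of its snapshots should not exceed 80 TB, to bound controller failover time."""
    return source_vdisk_tb * num_snapshots <= limit_tb

print(failover_sizing_ok(10, 7))  # True: 70 TB is within the guideline
print(failover_sizing_ok(12, 7))  # False: 84 TB exceeds it
```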
In addition to the normal LUN divergence consumption of free space, a disk failure and the
subsequent reconstruction can also compete for free space. After a reconstruction, the reserved space
requirements for the protection level can cause the existing snapshot Vdisks to exceed the available
free space and so can cause the snapshot Vdisks to become inoperative. A thinly provisioned Vdisk
would transition to an overcommit state only on a write request (to unallocated capacity) after all available
capacity has been allocated. See the best practice for capacity management and improved
availability to avoid this condition.
For non-snapshot Vdisks, the EVA always maps data on the disks in its logical order. Unlike other
virtual arrays for which the layout is dynamic and based on the write order, the EVA data structure is
predictable and repeatable.
Given the new LUN-zeroing operation and the predictable data layout, there is no reason (with the
exception of benchmarking) to pre-allocate data by writing zeros to the Vdisk on the EVA.
Note:
Pre-filling a new LUN before a performance benchmark allows the internal
zeroing operation to complete before the benchmark begins.
Summary
All of the preceding recommendations can be summarized in a table. This not only makes it relatively
easy to choose between the various possibilities, but it also highlights the fact that many best practice
recommendations contradict each other. In many cases, there is no single correct choice because the
best choice depends on the goal: cost, availability, or performance. In some cases, a choice has
no impact.
Table 2: Summary (recoverable entries)

- Number of disk groups: as few as possible
- Number of disks in a disk group: maximum available, in multiples of 8
- Protection level: 1 or 2
- Read cache: enabled
- LUN balancing: yes
Consult application or operating system best practices for minimum number of disk groups.
Check operating system requirements for any special queue depth-management requirements.
Glossary
access density
A unit of performance measurement, expressed in I/Os per second per unit of storage. For
example, I/Os per second per GB.
clones
data availability
data protection
disk group
A collection of disks within the array. Vdisks (LUNs) are created from a single disk group. Data
from a single Vdisk is striped across all disks in the disk group.
Dynamic capacity management (DC Management)
free space
leveling
LUN
occupancy
The ratio of the used physical capacity to the total available physical capacity of a disk group.
physical space
The total raw capacity of the number of disks installed in the EVA. This capacity includes
protected space and spare capacity (usable capacity).
protection level
The setting that defines the reserved space used to rebuild the data after a disk failure. A
protection level of none, single, or double is assigned for each disk group at the time the disk
group is created.
protection space
reconstruction, rebuild,
sparing
Terms used to describe the process of recreating the data on a failed disk. The data is recreated
on spare disk space.
reserved space
RSS
Redundancy Storage Set. A group of disks within a disk group that contain a complete set of
parity information.
TP
Thin Provisioning
usable capacity
The capacity that is usable for customer data under normal operation.
virtual disk
A LUN. The logical entity created from a disk group and made available to the server and
application.
workload
The characteristics of the host I/Os presented to the array. Described by transfer size, read/write
ratios, randomness, arrival rate, and other metrics.
To simplify management and save time and money with the HP P6000 EVA, visit
www.hp.com/go/P6000
Copyright 2011 Hewlett-Packard Development Company, L.P. The information contained herein is subject to
change without notice. The only warranties for HP products and services are set forth in the express warranty
statements accompanying such products and services. Nothing herein should be construed as constituting an
additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.
HP-UX Release 10.20 and later and HP-UX Release 11.00 and later (in both 32 and 64-bit configurations) on all
HP 9000 computers are Open Group UNIX 95 branded products.
Oracle is a registered trademark of Oracle and/or its affiliates.
Microsoft and Windows are U.S. registered trademarks of Microsoft Corporation.
4AA3-2641ENW, Created June 2011