Storage area networks (SANs) offer an effective means of storing and sharing data. As the amount of data stored on a SAN increases, however, backup windows lengthen and disaster recovery requires more time. EMC SnapView 2.1 and MirrorView 1.7 storage management software can help facilitate efficient backups and disaster recovery in Dell | EMC SAN environments. To demonstrate how IT administrators can improve SAN management in a typical data center environment using EMC storage management software, this article presents a scenario using a fictional company called Acme. In this scenario, Acme has a local data center that uses a Dell | EMC SAN for consolidated, redundant, high-availability storage. In addition, the company has heterogeneous servers sharing the storage and tape library. To protect the company's valuable business data, the Acme IT department implemented a disaster recovery plan. The plan involved deploying a fully redundant Dell | EMC SAN, which allows Acme to achieve high availability at the hardware level. To prevent failures at the operating system (OS) and application levels, the IT department deployed the company's main application in a Microsoft Cluster Service (MSCS) environment. Every application server that is connected to the SAN has two host bus adapter (HBA) cards to provide redundancy and increase bandwidth. Acme connected a tape library to the SAN for backup and restore; tapes are stored at a remote site after tapes at the production site are backed up and verified. The remote site also acts as a disaster recovery site for the primary (production) site, and the IT department established a secondary SAN for the applications running at the remote location. Currently, Acme faces three business challenges:
• Increasing backup window: As the database grows, the backup window will soon exceed the time available for the daily backups.
• Lengthy time to recovery: The length of time required for disaster recovery, referred to as mean time to recovery (MTTR), is growing because larger databases require longer restore times. The company's main application is inaccessible during restoration.
• Overhead on the production environment: Acme uses its production database to perform application development work; however, developer access to the production database creates overhead on the database engine. Performing online backups also incurs overhead that affects the performance of the production servers.
August 2003
POWER SOLUTIONS
STORAGE ENVIRONMENT
Figure 1. Acme's storage environment: the primary site (NAS server, clustered application server group, and stand-alone application server) connected to the secondary site (remote application server and backup/restore server)
To resolve these issues, Acme decided to update its disaster recovery plan. The Acme IT department connected the primary-site SAN with the secondary-site SAN, as shown in Figure 1. The company plans to use SnapView snapshots and clones to create replicas for online backups and for development use. Acme will use MirrorView software across the Dell | EMC SANs to create remote copies for disaster recovery. Using SnapView and MirrorView will also enable Acme to create a plan for recovery at the file, logical unit number (LUN), and array levels, as well as to complete online backups without affecting the production environment.
The snapshot feature uses a cache-and-pointer design, in which a chunk map table keeps track of data chunks (groups of blocks) based on their state at a given time. When the first write request to a block is made to the source LUN, the chunk to be modified is copied to a snapshot cache on private LUNs, a process known as copy on first write (COFW). The source LUN, the snapshot cache, and the chunk map table work together to create the virtual snapshot LUN. The snapshot LUN is an exact copy of the production LUN, and thus the snapshot must be accessed by a different host, such as a development or backup server. The backup server can read from and write to a snapshot LUN, but any changes made to the snapshot LUN do not replicate back to the source LUN. When the snapshot session is deactivated, the virtual snapshot LUN becomes invisible to the server. As Figure 2 indicates, every source LUN can have as many as eight sessions and eight snapshots. Snapshots have a one-to-one relationship with a server: each snapshot must be assigned to a different server, whereas sessions can be related to any server, depending on which session is activated and when it is activated. The most common use of a snapshot is to produce a backup copy of a large database. Performing an online backup of a database can help to shorten the backup window without interrupting production access to the database.

Figure 2. A source LUN supports up to eight sessions and up to eight snapshots, with snapshots assigned across up to eight server storage groups (storage groups 1 through 8)

However, online backups create overhead on the production database server, sometimes even requiring that the database be stopped during the backup window. A SnapView snapshot allows the database to be replicated instantaneously. The replica can then be used for online backups, as well as for development work, without putting additional overhead on the application server. SnapView snapshots also improve and simplify file-level recovery. Administrators can maintain a repository of snapshot sessions across multiple days on the network attached storage (NAS) server connected to the SAN, as shown in Figure 3. If, for example, a user wants to access files from the Friday snapshot session, the SAN administrator can simply activate the Friday session and share that snapshot LUN with the user. The user can then retrieve the needed files by copying them from the snapshot LUN to the source LUN.

Figure 3. Snapshot sessions taken at 6:00 P.M. Monday through Friday against the NAS server's source LUN; an activated session's snapshot LUN is presented to the backup/restore server

To create a clone, the initial data is copied, or synchronized, to the clone (see Figure 4). During synchronization, any host write requests made to the source LUN are copied to the clone. Once the clone is 100 percent synchronized, it is fractured manually at a point in time to create a stand-alone business continuance volume (BCV) that is independent of the source LUN. Servers cannot access the clone LUN until it is fractured, though application I/O can still access the source LUN during synchronization. Resynchronization can occur in either direction: to recover data from the clone to the source LUN, administrators can use the reverse synchronization feature while I/O continues to the source LUN. A clone becomes available for read and write access once it is fractured. Administrators also can access a clone by creating a snapshot of it and then assigning the snapshot to a second server storage group, as long as the snapshot is in a different storage group than the source LUN. This manner of implementation not only removes the overhead on the server, but it also enables snapshot access without adding I/O overhead on the source LUN. After synchronization and fracturing, a clone becomes a fully populated, physical copy of its source LUN. Because clones are not pointer-based replicas, they are not affected by the COFW performance penalty; data is replicated directly to the clone rather than modified chunks being copied to the snapshot cache. This process results in lower performance overhead for clones than for snapshots. A clone is commonly used in environments that require a quick MTTR or online backups based on point-in-time copies that have zero impact on the production data. A server can read from and write to a fractured clone without affecting the source LUN. Also, resynchronizing a clone is fast because clones use a space in memory called the clone private log (CPL) to keep track of the changes that occur after the fracture. For efficiency, a full resynchronization is avoided; only post-fracture changes are resynchronized.
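The copy-on-first-write behavior described above can be sketched in Python. This is a simplified illustration, not EMC's implementation: the chunk size, class names, and data structures are all assumed for the example.

```python
# Simplified model of copy-on-first-write (COFW) snapshots.
# Chunk size, names, and structures are illustrative assumptions.

CHUNK_SIZE = 64 * 1024  # assumed size of a chunk (a group of blocks)

class SourceLUN:
    def __init__(self, num_chunks):
        self.chunks = [bytes(CHUNK_SIZE) for _ in range(num_chunks)]
        self.sessions = []                 # up to eight active sessions

    def start_session(self):
        session = SnapshotSession(self)
        self.sessions.append(session)      # point-in-time view begins here
        return session

    def write_chunk(self, index, data):
        # COFW: before the first overwrite of a chunk, preserve the
        # original in each active session's cache (the snapshot cache).
        for session in self.sessions:
            if index not in session.chunk_map:
                session.chunk_map[index] = self.chunks[index]
        self.chunks[index] = data          # now modify the source

class SnapshotSession:
    """Virtual snapshot LUN: source LUN + snapshot cache + chunk map table."""
    def __init__(self, source):
        self.source = source
        self.chunk_map = {}                # chunk index -> preserved copy

    def read_chunk(self, index):
        # Return the preserved chunk if the source has since changed it;
        # otherwise read straight through to the unmodified source.
        return self.chunk_map.get(index, self.source.chunks[index])

lun = SourceLUN(num_chunks=4)
lun.write_chunk(0, b"v1".ljust(CHUNK_SIZE, b"\0"))
snap = lun.start_session()                 # snapshot taken at this instant
lun.write_chunk(0, b"v2".ljust(CHUNK_SIZE, b"\0"))
assert snap.read_chunk(0).startswith(b"v1")   # snapshot keeps old data
assert lun.chunks[0].startswith(b"v2")        # source sees new data
```

Only chunks that are actually modified consume snapshot cache space, which is why a snapshot is nearly instantaneous to create but adds a copy penalty to each first write.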
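Clone synchronization, fracture, and CPL-based incremental resynchronization can be modeled the same way. The clone private log here is a simple set of dirty chunk indexes; the real on-array structures are not documented in this article, so the code below is only an illustrative sketch.

```python
# Illustrative model of clone fracture and incremental resynchronization.
# The "clone private log" (CPL) is modeled as a set of dirty chunk
# indexes; EMC's actual structures differ.

class CloneGroup:
    def __init__(self, source_chunks):
        self.source = list(source_chunks)
        self.clone = [None] * len(source_chunks)
        self.fractured = False
        self.cpl = set()                  # clone private log: changed chunks

    def synchronize(self):
        # Initial full copy of the source to the clone.
        self.clone = list(self.source)
        self.fractured = False
        self.cpl.clear()

    def write_source(self, index, data):
        self.source[index] = data
        if self.fractured:
            self.cpl.add(index)           # record change for later resync
        else:
            self.clone[index] = data      # writes are mirrored during sync

    def fracture(self):
        self.fractured = True             # clone is now a stand-alone BCV

    def resynchronize(self):
        # Only post-fracture changes are recopied, never the whole LUN.
        for index in self.cpl:
            self.clone[index] = self.source[index]
        self.cpl.clear()
        self.fractured = False

group = CloneGroup(["a", "b", "c"])
group.synchronize()
group.fracture()
group.write_source(1, "b2")               # dirties only chunk 1
assert group.clone[1] == "b"              # fractured clone is unchanged
group.resynchronize()                     # recopies just chunk 1
assert group.clone == ["a", "b2", "c"]
```

The same dirty-set idea explains why reverse synchronization (clone back to source) can also be incremental.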
The plan also addresses the replication of data from the primary location to the secondary location so that applications running at the secondary site can access the same business data. To implement these processes, the Acme scenario uses the EMC MirrorView add-on software option. MirrorView is similar to the SnapView clone option but works between Dell | EMC arrays instead of within a single array. Because MirrorView is array-based software, it does not use server I/O or CPU resources, and it supports all of the operating systems used on the array. Provision for disaster recovery is the major benefit of MirrorView mirroring. As shown in Figure 5, multiple arrays in different locations can mirror to a common disaster recovery site, making that site the central mirroring site for disaster recovery. If a disaster cripples the primary site, a MirrorView secondary image can be used to recover data and operations at the disaster recovery site. MirrorView runs redundantly across arrays: if one storage processor fails, MirrorView, running on the other storage processor, will take ownership of the mirrored LUNs. If the host can fail over I/O to the remaining storage processor (using PowerPath software), then mirroring will continue as normal. After the primary-site array has been recovered, the data at the secondary site can be synchronized back to the primary site. Although the mirrored target cannot be directly assigned to a server while it is acting as a mirrored target, SnapView software can be used to take a snapshot of the secondary mirrored LUN and then assign the snapshot to the servers at the secondary site for immediate access, even while the two sites are mirroring. MirrorView mirroring is synchronous; thus the longer the distance, the longer the delay, because the application must wait for a commitment to be returned from the remote array.
For disaster recovery, the primary and secondary storage systems should be far enough apart to survive a site-level disaster; for Fibre Channel-based mirroring, however, they must be within 10 km of each other and connected through dedicated redundant pairs of fiber-optic cabling. For longer distances, other solutions exist.
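To see why distance matters for synchronous mirroring, the minimum added delay per write can be estimated from the propagation speed of light in optical fiber, roughly 200,000 km/s (about 5 microseconds per kilometer each way). The quick calculation below assumes one round trip per write and ignores switch, router, and protocol overhead, which only add to these figures.

```python
# Minimum per-write latency added by fiber propagation alone,
# assuming one round trip to the remote array per acknowledged write.

SPEED_IN_FIBER_KM_PER_S = 200_000        # roughly 2/3 of c in glass

def min_round_trip_ms(distance_km):
    """Lower bound on round-trip propagation delay, in milliseconds."""
    return 2 * distance_km / SPEED_IN_FIBER_KM_PER_S * 1000

for km in (10, 60, 500):
    print(f"{km:>4} km: at least {min_round_trip_ms(km):.2f} ms per write")
# At 10 km the floor is about 0.1 ms; at 500 km it is about 5 ms,
# which is significant for a latency-sensitive database workload.
```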
MirrorView can ensure that data from the primary storage system replicates to the secondary array (see Figure 6). The host (if any) connected to the secondary array might normally sit idle until the primary site fails. With SnapView at the secondary site, however, the host there can take snapshot copies of the mirror images (that is, the secondary LUNs) and back them up to other media. This technique provides point-in-time snapshots of production data with little impact on production server performance. MirrorView provides a synchronous mirroring solution, which can help ensure that any write to the primary array is also committed on the secondary array before the production server receives an acknowledgment. Although this approach is common to most synchronous mirroring technologies, it requires that the latency between the two storage arrays be calculated and considered to prevent performance degradation. Currently, MirrorView runs over either Fibre Channel (using dedicated fiber-optic cables) or Fibre Channel over IP (using routers and sufficient dedicated bandwidth on an IP wide area network, or WAN).
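The synchronous write path just described can be sketched as follows. This is a simplified model of the protocol's ordering guarantee only; the class and method names are invented for illustration, and the real arrays exchange this handshake over Fibre Channel.

```python
# Simplified model of a synchronous mirror write path: the host's write
# is acknowledged only after BOTH arrays have committed the data, which
# is why link latency between the sites adds to every write.
# Names and structure are illustrative, not EMC's implementation.

class Array:
    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def commit(self, address, data):
        self.blocks[address] = data
        return True                        # commitment returned to caller

class SynchronousMirror:
    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def write(self, address, data):
        self.primary.commit(address, data)
        # The primary must wait here for the remote array's commitment;
        # over a long link this wait dominates the write latency.
        remote_ok = self.secondary.commit(address, data)
        if not remote_ok:
            raise IOError("secondary did not commit")
        return "ack"                       # host resumes only at this point

primary = Array("production")
secondary = Array("disaster-recovery")
mirror = SynchronousMirror(primary, secondary)
mirror.write(0, b"payroll")
# Once the host sees the acknowledgment, both sites hold identical data.
assert primary.blocks[0] == secondary.blocks[0] == b"payroll"
```

The guarantee that both copies are identical at acknowledgment time is exactly what makes the secondary image usable for recovery after a primary-site disaster.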
Figure 5. Arrays at multiple primary locations (A, B, C, and D) mirroring to a common secondary disaster recovery site

Figure 6. MirrorView mirroring between primary location A and secondary location A
Figure 7. Decision tree for selecting a replication method. On CX400, CX600, or FC4700-2 arrays running Microsoft Windows 2000 Server, IBM AIX, Linux, Sun Solaris, Novell NetWare, or HP-UX, a copy within a single array (BCV, data replication, online backup, and data recovery within the array) is created with a SnapView snapshot or clone, depending on the purpose of the data copy; a copy across multiple arrays uses MirrorView. Connectivity between the arrays to be utilized depends on distance: up to 300 m (Fibre Channel-2) or up to 500 m (Fibre Channel-1) over standard fiber; up to 10 km using Fibre Channel LW-GBICs; 10 km to 60 km and 60 km to 500 km using dense wavelength division multiplexing (DWDM) extenders; and over 500 km using MirrorView over IP or a third-party solution.
SnapView and MirrorView enable Acme to create replicas for online backups and development use without affecting the production environment. These features also provide a way to replicate data to multiple locations as well as maintain data consistency.
Richard Hou (richard_hou@dell.com) is a systems engineer and consultant for the Dell Enterprise Technology and Education Center (ETEC), part of the Dell Enterprise Services and Support Group, where he specializes in SAN and Microsoft solutions. Richard has an M.S. in Electrical and Computer Engineering from The University of Texas at Austin and a B.S. in Mechanical Engineering from Zhejiang University, Hangzhou, China.

Steve Feibus (steve_feibus@dell.com) has been a storage enterprise technologist in the Advanced Systems Group at Dell for the past two years and was recently promoted to manager of the Client Technologist team at Dell. Steve has a B.S. in Electrical Engineering from the University of Florida and has spent many years solving customer storage issues using the latest technologies and products.

Patty Young (patty_young@dell.com) is a storage enterprise technologist in the Advanced Systems Group at Dell. She has been working with storage solutions for many years, supporting field system consultants in architecting storage solutions for their customers and providing feedback from customers to Dell regarding storage challenges and requirements. Patty has a B.A. from North Carolina State University.