Backup Recovery Systems Division EMC Data Domain 2421 Mission College Boulevard Santa Clara, CA 95054 866-WE-DEDUPE 408-980-4800 www.datadomain.com
EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. THE INFORMATION IN THIS PUBLICATION IS PROVIDED AS IS. EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Using, copying, and distributing EMC software described in this publication requires an applicable software license. EMC2, EMC, Data Domain, Global Compression, SISL, the EMC logo, and "where information lives" are registered trademarks or trademarks of EMC Corporation in the United States and other countries. All other trademarks used herein are the property of their respective owners. Copyright 2009-2011 EMC Corporation. All rights reserved. Published in the USA.
Contents
Module 0: Course Introduction
  Module Objectives
  Lesson 1: Course Introduction
    Covered Skills
    Course Objectives
    Course Objectives (Continued)
    Course Content
    Course Content (Continued)
    Daily Agenda
    Guides, Introductions, and Orientation
  Lesson 2: VDC
    VDC Introduction
    Access VDC
    Lab 0.1: Access the VDC
Module 1: Deduplication
  Module Introduction
  Module Objectives
  Deduplication Simplified Definition
  Inline Vs. Post-Process Deduplication
  Inline Deduplication
  Inline Deduplication Process
  Post Processing Deduplication
  Post-Process Deduplication Process
  File Based Deduplication
  Fixed Segment Deduplication
  Variable Segment Size Deduplication
  Module Review
Module 2: Data Domain Operating Environment
  Module Introduction
  Lesson 1: Data Domain Deduplication
    How Deduplication Works
  Lesson 2: SISL
    SISL Definition
    Deduplication
    Deduplication (Continued)
  Lesson 3: DIA
    Definition
    DIA End-to-End Verification
    DIA Fault Avoidance and Containment
    DIA Continuous Fault Detection and Healing
    DIA File System Recovery
  Lesson 4: File Systems
    Administration File System
    Storage File System
    Storage File System (Continued)
    Storage File System (Continued)
  Lesson 5: Data Domain Data Paths
    Data Domain System in Typical Backup Environments
    Data Path over Ethernet
    Data Path over Fibre Channel VTL
  Lesson 6: Administration Interfaces
    Access Enterprise Manager
    Enterprise Manager Main Screen
    Enterprise Manager Tabs
    CLI
    CLI (Continued)
  Module Review
  Module Review (Continued)
  Module Review (Continued)
Module 3: Initial Configuration and Backup
  Module Introduction
  Module Objectives
  Lesson 1: Verify Initial Configuration
    Launch Enterprise Manager
    Launch Configuration Wizard
    Lab 3.1: Initial Configuration
  Lesson 2: Manage System Access
    User Classes
    Manage Administration Access Protocols
    Create a User
  Lesson 3: Configure CIFS/NFS
    Configure a CIFS share
    Configure an NFS Export
  Lesson 4: Verify Hardware
    Model Number, System Uptime, Serial Number
    Storage (Disk) Status
    Disk Overview
    Active Tier
    Locate a Disk
    View Usable Disks
    View Failed, Foreign, or Absent Disks
    View Chassis Status
    Lab 3.2: Copy Data to a Data Domain System
  Module Review
Module 4: Manage Network Interfaces
  Module Introduction and Objectives
  Lesson 1: Interfaces, Settings, & Routes
    Manage Network Routes
    Create Static Routes
  Lesson 2: Link Aggregation
    Definition
    Requirements
    Create a Virtual Interface for Link Aggregation
  Lesson 3: Link Failover
    Specifications
    Manage Link Failover
    Manage Link Failover (Continued)
  Lesson 4: Manage VLAN and IP Alias Network Interfaces
    Introduction and Objectives
    Define VLAN and IP Alias Network Interfaces
    Create VLAN and IP Aliases
    Create VLAN and IP Aliases (Continued)
    Create VLAN and IP Aliases (Continued)
  Module Review
  Module Review (Continued)
  Module Review (Continued)
Module 5: Manage a VTL
  Module Introduction and Objectives
  VTL Overview
  Configuration Terms
  VTL Planning
  Capacity Planning
  Create Tapes
  VTL Barcode Definition
  Configure VTL Barcode
  Configure VTL Barcode (Continued)
  Configure VTL Barcode (Continued)
  Module Review
Module 6: Manage Data
  Module Introduction and Objectives
  Lesson 1: Snapshots
    Manage Snapshots
    Lab 6.1 Configure Snapshot
  Lesson 2: Fastcopy
    Perform Fastcopy
    Lab 6.2: Perform Fastcopy
  Lesson 3: Retention Lock
    Retention Lock (Continued)
    Retention Lock (Continued)
    Configure Retention Lock
  Lesson 3: Sanitization
    Lab 6.3: Configure Sanitization
  Lesson 4: Encryption of Data at Rest
    Passphrase and Encryption Key
    File System Locking
    Encryption Flow
    Configure Encryption
    Apply Encryption Changes
    Deactivate Encryption
    Lab 6.4 Configure Encryption
  Lesson 5: File System Cleaning
    Cleaning
    Lab 6.5: Configure File-System Cleaning
  Lesson 6: Monitor File-System Space Usage
    File System Summary Tab
    Space Usage
    Space Usage Terms
    Space Consumption Tab
    Space Consumption Terms
    Daily Written Tab
    Daily Written Tab Terms
  Module Review
  Module Review (Continued)
Module 7: Manage Data Replication and Recovery
  Module Introduction and Objectives
  Lesson 1: Data Replication
    Lesson Objectives
    Data Replication Overview
    Data Domain Replication Types
    Collection Replication
    Directory Replication
    Pool Replication
    Replication Context
    Replication Context Streams
    Replication Topologies
    Configure General Replication
    Configure Advanced Replication
    Replication Seeding
    Low Bandwidth Optimization Benefits
    Low Bandwidth Optimization Using Delta Compression
    Encryption Over Wire
    Lab 7.1: Replication
  Lesson 2: Recover Data
    Recover Replication-Pair Data
    Why Resynchronize Recovered Data?
    Resynchronization Process
    Manage Throttle Settings
  Module Review
  Module Review (Continued)
Module 8: Manage DD Boost
  Module Introduction and Objectives
  DD Boost Overview
  DD Boost Overview (Continued)
  DD Boost Flow
  Replica Awareness Flow
  DD Boost Replication Awareness Advantages
  Deduplication With Distributed Segment Processing
  NetWorker Data Zone Architecture
  NetWorker Server Architecture
  NetWorker Work Flow
  Interface Groups
  Firewall Ports
  Download DD Boost Plug-In Software
  Configure DD Boost
  Configure DD Boost (Continued)
  Lab 8.1: DD Boost
  Module Review
Module 9: Plan Capacity and Throughput, Monitor Throughput
  Module Introduction and Objective
  Lesson 1: Plan Capacity
    Collect Information
    Collect Information (Continued)
    Determine Capacity Needs
    Compression Requirements with Variables
    Compression Requirements with Variables (Continued)
    Calculate Required Capacity
    Capacity Requirements Calculation (Page 1 of 2)
    Capacity Requirements Calculation (Page 2 of 2)
    Calculate Required Throughput
    System Model Capacity and Performance
    Select Model
    Calculate Capacity Buffer for Selected Models
    Match Required Capacity to Model Specifications
    Calculate Performance Buffer for Selected Models
    Match Required Capacity to Model Specifications
  Lesson 2: Throughput Tuning
    Throughput Bottlenecks
    Performance Metrics
    Tuning Solutions
    Monitor Throughput
  Module Review
Module 10: Monitor a Data Domain System
  Module Introduction
  Module Objectives
  Lesson 1: Monitor a Data Domain System Using SNMP
    SNMP Flow
    Download and Configure MIB File
    Lab 10.1: Monitor Using SNMP
  Lesson 2: Syslog (Remote Logging)
    Configure Remote Logging
    Lab 10.2: Monitor Using Syslog
  Lesson 3: Log Files
    ddvar log files
    Log File Types
    Lab 10.3: Manage Log Files
  Lesson 4: Support Bundles
    Generate Support Bundles
  Lesson 5: Autosupport
    Autosupport System
    Autosupport Types
    Autosupport Via Enterprise Manager
    Autosupport Reports
    Autosupport Reports (Continued)
    Autosupport Reports (Continued)
    Detailed Autosupport Report Contents
    Daily Summary Autosupport
    Alerts
    Alerts (Continued)
    Alerts (Continued)
    Alert Message Types
    Alerts Notification
    Find Autosupport Information
    Find System Autosupports
    Autosupport Device Symbols and Display Options
    View Space Plot, Autosupports, and Support Cases
    Lab 10.4: Autosupport
  Module Review
  Module Review (Continued)
Module 11: Upgrade a Data Domain System
  Module Introduction and Objective
  Download Software
  System Upgrade
Appendix A: Licenses
Appendix B: Further Reading
  Access System Documentation
The Data Domain System Administration course provides you with the knowledge and skills you need to maintain Data Domain systems. This course includes:
Lecture
Demonstrations
Hands-on lab exercises (see lab guide)
Reviews
Pointers to Data Domain system documentation
Module Objectives
After completing this module, you should be able to:
Describe this course
Access the EMC Virtual Data Center (VDC)
After completing lesson 1, you should be able to:
Describe the objectives for this course
Describe the course content
Describe the daily agenda
Identify the other students
Identify your course materials
Covered Skills
This course focuses on Data Domain system:
Concepts
Basic configuration
System monitoring
Other courses in the Data Domain curriculum cover:
Design
Installation
3rd-party application integration
Troubleshooting
Parts replacement
Course Objectives
Course Content
This course contains 11 modules. Each module contains lecture and review. Some modules contain labs.
Module 0 (this module) is the course introduction. It:
Gives a course overview
Introduces course materials
Provides instructions to access the VDC
Module 1 covers deduplication (not Data Domain specific).
Module 2 gives a product overview. It includes an overview of:
The Data Domain operating system
The Data Domain file system
Data Domain data paths
Module 3 covers how to perform the initial setup of a Data Domain system and how to do a file-system based backup.
Module 4 covers how to manage a Data Domain system network.
Module 5 covers how to configure a VTL.
Module 6 covers how to manage data.
Module 7 covers how to perform data replication and recovery.
Module 8 covers DD Boost.
Module 9 covers planning for capacity.
Module 10 covers monitoring a Data Domain system.
Module 11 covers upgrading the Data Domain operating system.
Appendix A lists Data Domain software licenses.
Appendix B tells you how to find Data Domain documentation.
Daily Agenda
This is a typical agenda. Your instructor may speak with you about variances.
Use your student guide to follow the lecture and take notes. Use your lab guide for step-by-step instructions on the labs.
Lesson 2: VDC
EMC2 Education Services provides a virtual data center (VDC) for you to use during labs. The VDC gives you access to Data Domain systems.
VDC Introduction
The VDC provides Microsoft Windows and Linux virtual machines (VMs).
Access VDC
Your instructor will give you a user name and password to access the VDC.
Once your instructor gives you your user name, password, and lab introduction, locate your lab guide and follow the step-by-step instructions to complete this lab.
Module 1: Deduplication
Module Introduction
This module introduces you to general deduplication. Data Domain system deduplication is discussed in the next module. This module reviews concepts that you learned about in the EMC Data Domain Technology and Systems Introduction (eLearning) course.
Module Objectives
Deduplication is an important technology that improves data storage by providing extremely efficient data backups and archiving. In this module you learn more about general deduplication.
Deduplication eliminates redundant data because it stores only one instance of the data. For example (and only as an example; this isn't a precise definition), the sentence "Mary had a little lamb" gets stored as "Mary hd lite mb": no second instances of letters get stored. Deduplication recognizes and deletes common elements in data. It stores only one copy of the duplicated data. It looks at each segment of an incoming data stream to determine if it needs to be stored or if it can be replaced by a smaller reference to a segment that is already on the disk.
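The store-once-and-reference idea above can be sketched in a few lines of Python. This is a teaching illustration only, not the Data Domain implementation: real systems segment a byte stream and keep the fingerprint index in specialized on-disk structures.

```python
import hashlib

def deduplicate(segments):
    """Store each unique segment once; represent repeats as references."""
    store = {}    # fingerprint -> segment data (stored once)
    recipe = []   # ordered list of fingerprints to rebuild the stream
    for seg in segments:
        fp = hashlib.sha1(seg).hexdigest()   # fingerprint the segment
        if fp not in store:
            store[fp] = seg                  # unique: store it
        recipe.append(fp)                    # always: keep a reference
    return store, recipe

# Five segments arrive, but "B" appears twice -> only four get stored.
store, recipe = deduplicate([b"A", b"B", b"C", b"B", b"D"])
print(len(store), len(recipe))   # 4 stored, 5 references
```

The original stream is recoverable by walking the recipe and looking each fingerprint up in the store, which is exactly the "smaller reference to a segment already on disk" described above.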
Inline Deduplication
It is simpler to use inline deduplication. Data is filtered before it's stored to disk, so the system behaves like a regular storage system (it just writes and reads data). There's no separate administration involved (managing multiple pools, some with deduplication and some with regular storage, or managing the conditions between them). Incoming data is examined as soon as it arrives to determine if a segment is new and unique or a duplicate of a segment previously stored. Inline deduplication occurs in RAM before the data is written to disk. Around ninety-nine percent of data segments are analyzed in RAM without disk access. A very small amount of data is not identified immediately as either unique or redundant. That data is stored to disk and examined again later against the already-stored data. Because deduplication is done with limited disk access, the speed of inline deduplication is not limited by disk seek times. Stream speed is as fast as other virtual tape library products that do not have deduplication.
With file-based deduplication, if 2 files are exactly alike, 1 file is stored and future iterations of the file are pointed to the original file. File-based deduplication doesn't segment data like a Data Domain system, nor does it chunk data like an Avamar system. In certain situations it can be efficient. However, it often doesn't result in as large a data reduction as other deduplication methods.
With fixed-segment deduplication, if you add a segment, the entire data stream moves.
Sub-file (segment) deduplication:
Analyzes data backup streams
Breaks files into smaller fixed- or variable-sized segments. Smaller segments make it easier for Data Domain systems to find duplicates.
Is good for backup data stores
Compares backup data against existing data segments
Is commonly used as a quick fix for backup problems
Is more efficient than file-based deduplication
Variable-segment-size deduplication is better than fixed-segment-size deduplication because you can add data to a variable segment without shifting the rest of the data stream. If you add data to a fixed segment, the entire data stream moves.
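The difference is easy to demonstrate with a toy example. Below, a fixed-size chunker is compared with a content-defined chunker that cuts after any byte divisible by 4 (a stand-in assumption; real variable-segment systems use a rolling hash such as Rabin fingerprints to choose boundaries). Inserting a single byte at the front of the stream shifts every fixed-size chunk, so none of the old chunks match; the content-defined boundaries resynchronize immediately, so almost all old chunks reappear.

```python
def fixed_chunks(data, size=4):
    """Cut every `size` bytes: boundaries depend only on offset."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data):
    """Toy content-defined chunking: cut after any byte divisible by 4,
    so boundaries depend on content, not on position in the stream."""
    chunks, start = [], 0
    for i, b in enumerate(data):
        if b % 4 == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

old = b"ABCDEFGHIJKLMNOP"
new = b"X" + old            # one byte inserted at the front

print(fixed_chunks(old), fixed_chunks(new))       # no chunks in common
print(variable_chunks(old), variable_chunks(new)) # old chunks all reappear
```

With fixed chunking the insert yields zero duplicate chunks; with the content-defined cut the only new chunk is the inserted byte itself, which is why variable segmentation finds far more duplicates across shifted backup streams.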
Module Review
Module 2: Data Domain Operating Environment
This module introduces you to the Data Domain system. It reviews concepts that you learned about in the EMC Data Domain Systems and Technology Introduction (eLearning) course. This module includes 6 lessons:
1. Data Domain Deduplication
2. SISL
3. DIA
4. File Systems
5. Data Domain Data Paths
6. Administration Interfaces
Data Domain systems are disk-based deduplication appliances. In this lesson you will learn more about deduplication.
The end result of identifying unique data segments and compressing the unique data before storage is a significant reduction in the size of the data stored on disk. Because of the size reductions, data can be retained on disk on site. The reduced data size also makes WAN vaulting possible because of up to 99% bandwidth reduction. Data Domain global compression technology reduces the data footprint by applying a combination of deduplication and local compression. Deduplication works by breaking the data into segments and then identifying the unique segments. Local compression is performed on the unique segments using standard compression algorithms before the unique data is written to disk. For example, in the illustration of the first full backup, two segments labeled B are the same, so segments A, B, C, and D are stored with a reference to B instead of storing a second copy. The compression factor at this point is the ratio of the size of the original 5 segments received (A+B+C+B again+D) to the size of the 4 segments (A+B+C+D) stored on disk. Usual data reduction for a first full backup is 3-4x.
The next backup in the illustration is incremental. The backup includes copies of A and B as well as a new segment E. Only the new segment E is stored; A and B are stored as references to the previously stored segments. The compression factor of this backup is quite good, since it is the ratio of the 3 received segments (A+B+E) to the single stored segment E. Usual data reduction for a file-level incremental is 6-7x.
The second full backup is when the reductions start to become very large. A, B, C, D, and E are recognized as duplicates from the previous two backups, and only the new segment F gets stored. The compression factor of this second full backup is very high, with 6 segments coming in but only the one new segment getting stored. Usual data reduction for the second full backup is 50-60x.
The compression factor over all three backups is the ratio of all 14 segments coming from the backup software to the 6 segments that get stored to represent all the data received over time.
Deduplication is the process of recognizing common elements in the many versions and copies of data and eliminating the redundant copies of those common elements. With deduplication, only one copy of duplicated data is stored. Because deduplication is performed with limited disk access, the speed of inline deduplication is not limited by disk seek times. Stream speed of Data Domain systems is as fast as other virtual tape library products that do not have deduplication. The process is as follows:
Inbound data is analyzed in RAM.
If data is redundant, a reference to the stored duplicate data is created.
If data is unique, it is compressed and stored.
If data cannot be identified as unique or redundant, the data is stored and re-examined later.
For IT administrators, deduplication means there are fewer and smaller storage systems to manage, smaller data centers to run, fewer tapes to handle, fewer tape pickups, smaller network pipes, and cheaper WAN links.
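The ratios in the walk-through above can be checked with simple arithmetic. Note that these are the toy illustration's segment counts only (local compression excluded); the quoted "usual" reductions of 3-4x, 6-7x, and 50-60x come from real backup data with far more redundancy than a five-segment example can show.

```python
# (segments received, segments stored) per the illustration:
backups = {
    "first full":  (5, 4),   # A B C B D in; A B C D stored, B referenced
    "incremental": (3, 1),   # A B E in; only E stored
    "second full": (6, 1),   # A B C D E F in; only F stored
}

for name, (received, stored) in backups.items():
    print(f"{name}: {received}/{stored} = {received / stored:.2f}x")

total_received = sum(r for r, _ in backups.values())   # 14 segments sent
total_stored = sum(s for _, s in backups.values())     # 6 segments on disk
print(f"overall: {total_received}/{total_stored}"
      f" = {total_received / total_stored:.2f}x")
```

The overall factor for the illustration works out to 14/6, about 2.33x before local compression; in practice the factor keeps climbing with every additional full backup, because each one adds almost nothing new to the store.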
Lesson 2: SISL
SISL Definition
Deduplication
The Data Domain system:
1. Segments
2. Fingerprints
3. Filters
4. Compresses
5. Writes data to containers; containers are written to disk
Deduplication (Continued)
The Data Domain system:
1. Segments
2. Fingerprints
3. Filters
4. Compresses
5. Writes data to a container; the container is written to disk
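The five steps above can be sketched as a single write path. This is a conceptual sketch only, not the SISL implementation: the hash and compressor choices (SHA-1, zlib), the segment size, and the in-memory index are all illustrative assumptions.

```python
import hashlib
import zlib

def ingest(stream, index, container, segment_size=4096):
    """Sketch of the segment -> fingerprint -> filter -> compress ->
    write-to-container pipeline described above (not EMC code)."""
    for off in range(0, len(stream), segment_size):
        seg = stream[off:off + segment_size]         # 1. segment
        fp = hashlib.sha1(seg).digest()              # 2. fingerprint
        if fp in index:                              # 3. filter duplicates
            continue
        index[fp] = len(container)                   # record where it lands
        container.append(zlib.compress(seg))         # 4. compress, 5. write

index, container = {}, []
ingest(b"x" * 8192, index, container)   # two identical 4 KiB segments
print(len(container))                   # 1 -- the duplicate was filtered out
```

Because the fingerprint lookup (step 3) happens against an in-memory index, the duplicate segment never touches the container at all, which is the point of doing the filtering inline before any disk write.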
Lesson 3: DIA
Definition
The following is true for fault avoidance and containment:
New data never puts old data at risk
The container log never overwrites or updates existing data
New data is written in new containers
The old containers and references remain in place and safe, even in the face of software bugs or hardware faults that may occur when storing new backups
New data never overwrites good data
Fewer complex data structures
NVRAM for fast restarts
No partial-stripe writes
Here is the flow for continuous fault detection and healing:
1. The Data Domain system periodically rechecks the integrity of the RAID stripes and container logs.
2. It uses the redundancy of the RAID system to heal any faults.
3. During every read, data integrity is re-verified.
4. Any errors are healed as they are encountered.
Here is the flow for file system recoverability:
1. Data is written in a self-describing format.
2. The file system can be recreated by scanning the logs and rebuilding it from the metadata stored with the data.
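A self-describing log can be illustrated with a tiny append-only format: each record carries its own metadata header, so the whole index can be rebuilt from nothing but a scan of the log. This layout (length-prefixed JSON headers) is an assumption for illustration, not the actual Data Domain container format.

```python
import json
import struct

def append(log, name, payload):
    """Append one self-describing record: [header length][header][payload]."""
    header = json.dumps({"name": name, "size": len(payload)}).encode()
    return log + struct.pack(">I", len(header)) + header + payload

def rebuild_index(log):
    """Recreate the name -> payload-offset index purely by scanning the log."""
    index, pos = {}, 0
    while pos < len(log):
        (hlen,) = struct.unpack_from(">I", log, pos)
        header = json.loads(log[pos + 4 : pos + 4 + hlen])
        index[header["name"]] = pos + 4 + hlen      # payload starts here
        pos += 4 + hlen + header["size"]            # skip to next record
    return index

log = b""
log = append(log, "a.txt", b"hello")
log = append(log, "b.txt", b"world")
print(sorted(rebuild_index(log)))   # the index is recoverable from the log
```

If the separate index were lost, `rebuild_index` recovers it from the log alone; that is the property step 2 above relies on when the file system is recreated by scanning the logs.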
The Data Domain system administrative file system is called ddvar:
The NFS directory is /ddvar
The CIFS directory is \ddvar
This file system stores system core and log files. You cannot rename or delete this file system. Nor can you access all of its sub-directories, for example, the core sub-directory. Data streams for this file system change according to the Data Domain OS version and hardware model. Check the Data Domain support portal for more information on data streams for each OS and hardware model.
/data/col1/ is the parent directory path under which all user data is retained. MTrees provide granular data management so that you can manage and report on different data types or data from different sources separately. For example, you can configure compression separately on different types of data in separate MTrees. Note: You can't replicate under MTrees.
You can add up to 14 MTrees under /data/col1 to keep data separate. You can add subdirectories to MTree directories. You cannot add anything to the /data directory; you can only change to the col1 subdirectory. Subdirectories can be added under /data/col1/backup. The backup MTree (/data/col1/backup) cannot be deleted or renamed. If MTrees are added, they can be renamed and deleted. You can replicate under /backup.
Note: This slide shows the Data Domain system view, not the client view. The client view would not show /data/col1.
Data Domain systems connect to backup servers as storage capacity to hold large collections of backup data. A Data Domain system integrates into typical backup environments non-intrusively. Often the Data Domain system is connected directly to the backup server. The backup data flow is redirected from the clients to the Data Domain device instead of to tape. If tape needs to be made for long term archival retention, data flows from the Data Domain device back to the server and then to tape, completing the same flow that the backup server was doing initially. Tapes come out in the same standard backup software formats as before and can go off site for long term retention. If a tape must be retrieved, it goes back into the tape library and the data flows back through the backup software to the client that needs it.
Backup and archive media servers send data from clients to Data Domain systems on the network. A direct connection can be made between a dedicated port on the backup or archive server and a dedicated port on the Data Domain system. The connection between the backup or archive server and the Data Domain system can be Ethernet or Fibre Channel, or both if needed. This slide shows the Ethernet connection. Data is written to the backup file system on a Data Domain system. When Data Domain Replicator is licensed on two Data Domain systems, replication is enabled between the two systems. The Data Domain systems can be either local, for local retention, or remote, for disaster recovery. Data in flight over the WAN can be secured using VPN. Physical separation of the replication traffic from backup traffic can be achieved by using two separate Ethernet interfaces on the Data Domain system. This allows backups and replication to run simultaneously without network conflicts. Since the Data Domain OS is based on Linux, it needs additional software to work with CIFS. Samba software enables CIFS to work with the Data Domain OS.
If the Data Domain virtual tape library (VTL) option is licensed, and a VTL FC HBA is installed on the Data Domain system, the system can be connected to a Fibre Channel SAN. The backup or archive media server sees the Data Domain system as one or multiple VTLs with up to 256 virtual LTO-1, LTO-2, or LTO-3 tape drives and 20,000 virtual slots across up to 100,000 virtual cartridges.
With the Enterprise Manager, you can manage one or more Data Domain systems. You can access the Enterprise Manager with many browsers, for example:
Internet Explorer
Chrome
Firefox
2. Enter your user name and password
3. Enter your system IP address
CLI
There are 4 ways to log in to the CLI:
1. SSH (PuTTY)
2. Serial console
3. Telnet (not enabled by default)
4. Keyboard and monitor (KVM)
Initial login information:
login: sysadmin
password: the system serial number (found on the Data Domain chassis)
CLI (Continued)
You can do everything in the CLI that you can do from the Enterprise Manager. You enter commands with arguments and options (# command argument options), for example:
# filesys show space
# user show list
# user add Bob
# help admin access
# help show
Module Review
Module Objectives
After completing this module, you should be able to:
Verify and perform an initial configuration
Manage system access
Configure CIFS/NFS
Verify hardware
Copy data to a Data Domain system
Your environment for this course assumes that your hardware installation is complete and your network connections are established. In this lesson you will learn how to verify or configure:
Licenses
Network settings
System settings
Protocols
After you complete this lesson, you should be able to use the Enterprise Manager to perform an initial setup of the following:
Licenses
Network and system settings
Protocols: CIFS, NFS, DD Boost (to be expanded in the DD Boost module)
Currently, you cannot complete the configuration for the VTL protocol in the VDC.
To perform an initial configuration from the Enterprise Manager, do the following:
1. Open the Enterprise Manager from a browser such as Internet Explorer, for example: http://dddev-01/ddem
2. Enter your assigned Username and Password.
3. Click Login.
To launch the Configuration Wizard:
1. From the Enterprise Manager, click Maintenance.
2. Click the More Tasks pull-down menu.
3. Double-click Launch Configuration Wizard...
4. Follow the Configuration Wizard prompts.
You must follow the configuration prompts; you can't select an item to configure from the left-hand navigation pane. You will be prompted to submit your configuration changes as you move through the wizard. You can also quit the wizard during your configuration.
The CLI command is config setup.
In a Data Domain system there are 2 user privilege types: admin and user. In this lesson, you'll learn how to manage both.
User Classes
You may have to manage access for administrators and users on a Data Domain system. A Data Domain system supports 3 classes of access:
1. Sysadmin (default administrative account):
Can't be deleted.
Creates the first security officer.
Has access to all GUI and CLI configuration, management, and monitoring commands.
Can view all users.
Can change the sysadmin password, but can't delete the account.
Can enable/disable users with admin or user privileges, except the sysadmin user.
sysadmin is a domain admin by default when you integrate with Active Directory.
2. Local user (user):
Has access to GUI and CLI monitoring commands.
Can view only their own account.
Can't disable sysadmin.
Can't make changes to the configuration.
3. Security officer (user):
Can enable, disable, modify, and delete other security officers.
Can view all users.
Use the Access Management page to configure and manage access protocols:
Telnet access
FTP access
HTTP/HTTPS access
SSH access
Create a User
Item: Description
User: The user ID or name
Password: The user password. Set a default password, and the user can change it later.
Verify Password: The user password, again
Privilege: The privilege of the user: admin, security, or user
The default value for the minimum length of a password or minimum number of character classes required for a user password is 1. Allowable character classes include:
Lowercase letters (a-z)
Uppercase letters (A-Z)
Numbers (0-9)
The available privileges display based on the user's privilege. Only the sysadmin user can create the first security officer. After the first security officer is created, only security officers can create or modify other security officers. Sysadmin is the default admin user and cannot be deleted or modified.
5. Enter the following information in the Advanced tab:
Item: Description
Minimum Days Between Change: The minimum number of days between password changes that you allow a user. Default is 0.
Maximum Days Between Change: The maximum number of days between password changes that you allow a user. Default is 99999.
Warn Days Before Expire: The number of days to warn the users before their password expires. Default is 7.
Disable Days After Expire: The number of days after a password expires to disable the user account. Default is Never.
Disable account on the following date: Check this box and enter a date (mm/dd/yyyy) on which you want to disable this account. Also, you can click on the calendar to select a date.
6. Click OK. The default password policy can change if an admin user changes it from the Modify Password Policy task. The default values are the initial default password policy values.
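The policy fields above can be mirrored in a short sketch. This is a hypothetical validator for illustration only; the real checks run inside the Data Domain OS, not in user code, and the function names here are invented.

```python
import string
from datetime import date, timedelta

def check_password(pw, min_length=1, min_classes=1):
    """Length and character-class check mirroring the documented defaults
    (minimum length 1, minimum 1 character class)."""
    classes = sum([
        any(c in string.ascii_lowercase for c in pw),   # lowercase a-z
        any(c in string.ascii_uppercase for c in pw),   # uppercase A-Z
        any(c in string.digits for c in pw),            # numbers 0-9
    ])
    return len(pw) >= min_length and classes >= min_classes

def expiry_dates(changed, max_days=99999, warn_days=7):
    """Expiry and warning dates from the Advanced-tab defaults."""
    expires = changed + timedelta(days=max_days)
    return expires, expires - timedelta(days=warn_days)

print(check_password("abc"))                  # passes the default policy
print(check_password("abc", min_classes=3))   # fails: one class only
```

Tightening `min_classes` to 3 forces a mix of lowercase, uppercase, and digits, which is the effect of raising the "minimum number of character classes" value in the policy.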
To configure a CIFS share, you must:
1. Configure the workgroup mode
2. Configure the Active Directory mode
3. Give a descriptive name for the share
4. Enter the path to the target directory (for example, /data/col1/mtree1)
80
To configure an NFS export, do the following:
1. Enter a path name for the export.
2. In the Clients area, select an existing client or click the + icon to create a client. The Clients dialog box appears.
a. Enter a server name in the text box. Enter fully qualified domain names, host names, or IP addresses. A single asterisk (*) as a wild card indicates that all backup servers are to be used as clients. Clients given access to the /data/col1/backup directory have access to the entire directory. A client given access to a subdirectory of /data/col1/backup has access only to that subdirectory.
A client can be a fully qualified domain host name, a class-C IP address, an IP address with either a netmask or length, an NIS netgroup name with the prefix @, or an asterisk (*) wildcard with a domain name, such as *.yourcompany.com. A client added to a subdirectory under /data/col1/backup has access only to that subdirectory. Enter an asterisk (*) as the client list to give access to all clients on the network.
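The client-spec forms above (bare `*`, wildcard-plus-domain, network with mask length, host name) can be illustrated with standard-library matching. This is a hedged sketch of how such specs could be evaluated; the actual matching is performed by the Data Domain NFS server, and `client_allowed` is an invented helper name.

```python
import fnmatch
import ipaddress

def client_allowed(client, spec):
    """Return True if `client` matches one documented client-spec form."""
    if spec == "*":
        return True                                   # all clients
    if "/" in spec:                                   # network with length
        return ipaddress.ip_address(client) in ipaddress.ip_network(spec)
    return fnmatch.fnmatch(client, spec)              # host name / *.domain

print(client_allowed("srv1.yourcompany.com", "*.yourcompany.com"))  # True
print(client_allowed("10.1.2.3", "10.1.2.0/24"))                    # True
print(client_allowed("10.2.0.1", "10.1.2.0/24"))                    # False
```

The point of the `/24`-style form is that one spec grants a whole subnet of media servers access without listing each host individually.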
b. Select the check boxes of the NFS options for the client:
Read-only permission.
(Default) Requires that requests originate on a port that is less than IPPORT_RESERVED (1024).
Map requests from UID or GID 0 to the anonymous UID or GID.
Map all user requests to the anonymous UID or GID.
Use the default anonymous UID or GID.
As part of your Data Domain initial setup, you should verify that your hardware is correctly installed and running. This lesson teaches you how to do that. The CLI command is system show status.
To verify your model number, system uptime, and serial number in the Enterprise Manager:
1. Select a system from the left-hand navigation pane.
2. Click the Maintenance tab.
3. Verify the information.
The CLI commands are:
1. system show status
2. system show config
3. system disk show
To verify your storage status in the Enterprise Manager:
1. Click the Hardware tab.
2. View the Storage Status.
If the Storage operational status indicator is green, all disks are good.
If the Storage operational status indicator is yellow, the system is working, but there is a problem.
If the Storage operational status indicator is red, the system isn't operational.
Disk Overview
To get an overview of your storage (disk) status in the Enterprise Manager:
1. Select a system from the left-hand navigation pane.
2. Click the Hardware tab.
3. Click Storage.
From here you can view the Active Tier, Usable Disks, and Failed/Foreign/Absent Disks.
Active Tier
Disks in the Active Tier are currently marked as usable by the Data Domain file system. Sections are organized by Disks in Use and Disks Not in Use. If the optional archive feature is installed, an archive tier is also shown. From the Storage Status Overview pane, you can expand your view of the disk use in the Active Tier. For both Disks In Use and Disks Not In Use, you can view:
Disk Group (here, dg0)
Status (here, 1 Normal)
Disks Reconstructing
Total Disks
Disks
You can also expand the Disks In Use view to view individual disks. Once you View Details, you can Beacon individual disks. The CLI command is storage show.
87
Locate a Disk
To locate a disk (for example, when a failed disk needs to be replaced): 1. Select the Data Domain system in the Navigational pane. 2. Click the Hardware > Storage > Disks tabs. The Disks view appears. 3. Select a disk from the Disks table and click Beacon. The Beaconing Disk dialog window appears, and the LED light on the disk begins flashing. 4. Click Stop to stop the LED beaconing.
88
Usable disks are those that aren't incorporated into the file system yet. To view details about usable disks from the Enterprise Manager: 1. Select a system from the left-hand navigation pane. 2. Click the Hardware tab. 3. View the status, which includes the disk Name, Status, Size, Manufacturer/Model, Firmware, and Serial Number. You can also view the details of individual disks. The CLI command is disk show status.
89
To get the status of Failed/Foreign/Absent disks in the Enterprise Manager: 1. Select a system from the left-hand navigation pane. 2. Open the Failed/Foreign/Absent Disks panel. 3. View the following disk information: Name, Status, Size, Manufacturer/Model, Firmware, and Serial Number. 4. You can also view the details of individual disks.
90
To view your chassis status in the Enterprise Manager: 1. From the left-hand navigation pane, select a system. 2. Click the Hardware tab. 3. Click Chassis. From here you can view the following by hovering your mouse over them: NVRAM, PCI slots, SAS, power supplies (PS), fans, riser expansion, and temperature.
91
You can also view the fans and the front and back chassis views. The CLI command is system show status.
92
93
Module Review
94
After completing this module, you should be able to:
Manage network interfaces, settings, and routes.
Describe and use link aggregation.
Describe and use link failover.
Describe and use VLAN tagging.
Describe and use IP aliases.
95
Use the Hardware > Network > Settings view to view and configure network settings.
96
Routes determine the path taken to transfer data to and from the Data Domain system and another network or host. To set the default gateway:
1. From the Navigational pane, select the Data Domain system to configure. 2. Click the Hardware > Network > Routes tabs. 3. Click Edit in the Default Gateway area. The Configure Default Gateway dialog box appears. 4. Choose how the gateway address is set. Either:
Select the Use DHCP value radio button to set the gateway using the value provided by the Dynamic Host Configuration Protocol (DHCP) server, or
Select the Manually Configure radio button and enter a gateway address.
97
5. Click OK. The system processes the information and returns you to the Routes tab.
98
1. From the Navigational pane, select the Data Domain system to configure. 2. Click the Hardware > Network > Routes tabs. 3. Click Create in the Static Routes area. The Create Routes dialog box appears. 4. Select an interface to configure for the static route. a. Click the check boxes for the interface(s) whose route you are configuring. b. Click Next. 5. Specify the destination as either of the following: The network address and netmask: a. Click the Network radio button. b. Enter the destination network address and netmask.
99
Note: This is not the IP address of any interface. The interface is selected in the initial dialog box and is used for routing traffic. The host name or IP address of the destination host: a. Click the Host radio button. b. Enter the host name or IP address of the destination host to use for the route.
100
Definition
Link aggregation uses multiple Ethernet network cables, ports, and interfaces (links) in parallel to increase network throughput across a LAN or LANs, up to the maximum speed the system can drive. Data moves faster than it would over any individual link. For example, you can enable link aggregation on a virtual interface (veth1) over two physical interfaces (eth0a and eth0b) in Link Aggregation Control Protocol (LACP) mode with the XOR-L2 hash. Link aggregation splits network traffic evenly across all links or ports in an aggregation group, with minimal impact from the splitting, assembling, and reordering of out-of-order packets. Aggregation is between 2 directly attached systems (point-to-point, and physical or virtual). Normally the aggregation is between the local system and the network device or system it is connected to; a Data Domain system is normally connected to a switch or router. Aggregation is handled between the IP layer (L3 and L4) and the MAC layer (L2) network driver. Link aggregation performance is impacted by the following:
101
Switch speed: normally the switch can handle the speed of each link that is connected to it, but it may lose some packets if all of the packets coming from several ports are concentrated on one uplink running at maximum speed. In most cases, this means that you can use only 1 switch for port aggregation coming out of a Data Domain system. Some network topologies allow for link aggregation across multiple switches.
How much the Data Domain system can process.
Out-of-order packets: a network program must put out-of-order packets back into their original order. If the link aggregation mode allows packets to be sent out of order, and the protocol requires that they be put back in order, the added overhead may reduce throughput enough that the link aggregation mode causing the out-of-order packets shouldn't be used.
Number of clients: in most cases, either the physical or OS resources can't drive data at multiple Gbps. Also, due to hashing limits, you'd need multiple clients to push data at multiple Gbps.
Number of streams (connections) per client, which can significantly impact link utilization depending on the hashing that you use.
A Data Domain system supports 2 aggregation methods: 1. Round robin 2. Balanced-xor (you set up manually on both sides)
Requirements
Links can only be part of 1 group.
Aggregation is between 2 systems.
All links in a group have the same speed.
All links in a group are half-duplex or full-duplex.
No changes to the network headers are allowed.
You must have a unique address across aggregation groups.
Frame distribution must be predictable and consistent.
102
To create a link aggregation virtual interface: 1. From the navigation pane, select a Data Domain system. 2. Click Hardware > Network > Interfaces. 3. Under the Interfaces tab, disable the physical interface where you want to add the virtual interface: select No from the Enabled pull-down menu. 4. From the Interface Type pull-down menu, select Virtual Interface. The Create Virtual Interface dialog box appears. 5. Specify a virtual interface name in the veth text box. Enter a virtual interface name in the form vethx, where x is a unique ID (typically 1 or 2 digits). A typical full virtual interface name with VLAN and IP alias is veth56.3999.199. The maximum length of the full name is 15 characters. Special characters are not allowed. Numbers must be between 0 and 9999. 6. Select Aggregate from the Bonding Type drop-down list.
103
Note: the registry setting can be different from the bonding configuration. When you add interfaces to the virtual interface, the information isn't sent to the bonding module until the virtual interface is brought up. Until that time, the registry and the bonding driver configuration differ. 7. From the General tab, specify the Bonding Mode. Specify a bonding mode that's compatible with the systems to which the interfaces are directly attached. The available modes are:
Round robin: transmits packets in sequential order from the 1st available link through the last in the aggregated group.
Balanced: data is sent over interfaces as determined by the hash method you select. Associated interfaces on the switch must be grouped into an EtherChannel (trunk).
LACP: similar to Balanced, except it has a control protocol that communicates with the other end and coordinates which links within the bond are available. It provides heartbeat failover.
8. Specify the bonding hash. In the General tab, from the Bonding Hash pull-down menu, select either Layer 2 (L2) or Layer 3/Layer 4 (L3L4).
Layer 2 (XOR-L2): transmits based on static balanced and LACP mode aggregation with an XOR hash of layer 2 information (inbound and outbound MAC addresses).
Layer 3/Layer 4 (XOR-L3L4): transmits based on static balanced and LACP mode aggregation with an XOR hash of layer 3 (inbound and outbound IP addresses) and layer 4 (inbound and outbound port numbers).
9. Select an interface to add to the aggregate configuration by clicking the check box corresponding to the interface. 10. Click Next. The Create virtual interface veth_name dialog box appears. 11. Enter an IP address. 12. Enter a netmask address. The netmask is the subnet portion of the IP address that is assigned to the interface. The format is usually 255.255.255.XXX, where XXX is the value that identifies the interface. If you don't specify a netmask, the Data Domain system uses the netmask format as determined by the TCP/IP address class (A, B, C) that you are using. 13. Specify speed and duplex options. Note: Aggregation isn't available for all NICs. The combination of speed and duplex settings defines the rate of data transfer through the interface.
104
Autonegotiate Speed/Duplex: select this option to allow the NIC to autonegotiate the line speed and duplex setting for an interface.
Manually configure Speed/Duplex: select this option to manually set an interface's data transfer rate. Duplex options are half-duplex or full-duplex. Speed options are limited to the capabilities of the hardware: 10 Base-T, 100 Base-T, 1000 Base-T (1 Gb), and 10,000 (10 Gb). Half-duplex is only available at 10 Base-T and 100 Base-T speeds; 1000 and 10,000 line speeds require full-duplex. Optical interfaces require the Autonegotiate option. The copper interface default is 10 Gb. If a copper interface is set to a 1000 or 10,000 line speed, duplex must be full-duplex.
14. Specify maximum transfer unit (MTU) settings. This sets the MTU size for the physical (Ethernet) interface. Supported values are from 350 to 9014. For 100 Base-T and gigabit networks, 1500 is the default. Click the Default button to return the setting to the default value, and ensure that all of your network components support the size set with this option.
15. Select Dynamic Registration (optional). The dynamic DNS (DDNS) protocol enables machines on a network to communicate with and register IP addresses on a Data Domain system DNS server. The DDNS must be registered to enable this option.
16. Click Next. The Configure Interface Settings summary page appears. Ensure that the values listed are correct.
17. Click Finish.
18. Click OK.
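The balanced and LACP hash modes described above pick an outgoing link deterministically from packet header fields, which keeps each flow in order on one link. The following is a minimal sketch of an XOR-L2 style selection, not the actual bonding driver code:

```python
def xor_l2_link(src_mac, dst_mac, num_links):
    """Pick an outgoing link with an XOR-L2 style hash: XOR the two
    MAC addresses and take the result modulo the number of links.
    Simplified sketch of the idea behind balanced/LACP hashing."""
    src = int(src_mac.replace(":", ""), 16)
    dst = int(dst_mac.replace(":", ""), 16)
    return (src ^ dst) % num_links

# All packets for one src/dst MAC pair always land on the same link,
# so per-flow packet ordering is preserved across the bond.
link = xor_l2_link("00:1b:21:aa:bb:01", "00:1b:21:cc:dd:02", 2)
```

Note the trade-off this implies: a single client/server pair always uses one link, which is why multiple clients or streams are needed to fill an aggregated bond.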
105
Link failover keeps backups operational during network glitches by moving traffic to a standby interface when the active link fails. Link failover is supported by a bonding driver on a Data Domain system. The bonding driver checks the carrier signal on the active interface every 9 seconds. If the carrier signal is lost, another standby interface becomes the active interface. An ARP is sent to indicate that the data must now flow to the new interface. The new interface can be: On the same switch On a different switch Directly connected
Specifications
Only 1 interface in a group can be active at a time. Data flows over the active interface. Non-active interfaces can receive data. You can specify a primary interface; if you do, it is the active interface whenever it's available.
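The active-interface rules above can be sketched in a few lines. This is an illustrative model only (hypothetical helper, not the bonding driver): the primary wins whenever its link is up, otherwise the first healthy standby takes over.

```python
def pick_active(interfaces, primary=None):
    """Choose the active interface in a failover group (sketch):
    only one interface is active at a time; a configured primary
    is preferred whenever its link is up, otherwise the first
    healthy standby becomes active."""
    up = [name for name, link_ok in interfaces if link_ok]
    if primary in up:
        return primary
    return up[0] if up else None

group = [("eth0a", False), ("eth0b", True)]   # eth0a lost its carrier signal
print(pick_active(group, primary="eth0a"))    # fails over to eth0b
```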
106
Bonded interfaces can go to the same or different switches. You do not have to configure a switch to make link failover work.
1 GbE interfaces: you can put 2 or more interfaces in a link failover bonding group. The bonded interfaces can be: on the same card, across cards, or between a card and the motherboard. Link failover is independent of the interface type. For example, copper and optical can be failover links if the switches support the connections.
10 GbE interfaces: you can put only 2 interfaces in a failover bonding group, and the bonded interfaces must be on the same card.
107
108
109
In this lesson you'll learn about VLAN and IP alias network interfaces and how to manage them through the Enterprise Manager.
110
Virtual local area networks (VLANs) manage subnets on a network. VLANs enable a LAN to transcend physical boundaries and let you segregate network broadcasts. They are used to: provide security, speed up network traffic, and organize network LANs.
If you're not using VLANs, you can use IP aliases. IP aliases are easy to implement and are less expensive than VLANs, but they are not a true VLAN. For example, you must use 1 IP address for management and another IP address to back up or archive data. You can combine VLANs and IP aliases.
111
VLAN tag insertion (VLAN tagging) enables you to create multiple VLAN segments. (You get the tags from the network administrator.) In a Data Domain system, you can have up to 4096 VLAN tags.
Create a new VLAN interface from either a physical interface or a virtual interface. The recommended total number that can be created is 80, though it is possible to create up to 100 interfaces before the system is affected.
112
To create a VLAN tag from the Enterprise Manager: 1. From the Navigational pane, select the Data Domain system to configure. 2. Click the Hardware > Network > Interfaces tabs. 3. Click Create and select the VLAN option. The Create VLAN dialog box appears. 4. Specify a VLAN ID by entering a number in the ID field. The range of a VLAN ID is between 1 and 4095. 5. Enter an IP address. The Internet Protocol (IP) address is the numerical label assigned to the interface, for example, 192.168.10.23. 6. Enter a netmask address. The netmask is the subnet portion of the IP address that is assigned to the interface. The format is typically 255.255.255.###, where the ### are the values that identify the interface. If you do not specify a netmask, the Data Domain system uses the netmask format as determined by the TCP/IP address class (A, B, C) you are using. 7. Specify MTU settings.
113
This sets the maximum transfer unit (MTU) size for the physical (Ethernet) interface. Supported values are from 350 to 9014. For 100 Base-T and gigabit networks, 1500 is the standard default. Notes: Click the Default button to return the setting to the default value. Ensure that all of your network components support the size set with this option. 8. Specify the Dynamic DNS Registration option. Dynamic DNS (DDNS) is the protocol that allows machines on a network to communicate with, and register their IP address on, a domain name system (DNS) server. The DDNS must be registered to enable this option. Refer to Registering a DDNS for additional information. 9. Click Next. The Configure Interface Settings summary page appears. The values listed reflect the new system and interface state. 10. Click Finish and OK.
114
Create a new IP alias interface from either a physical interface or a virtual interface. The recommended total number of IP aliases and virtual interfaces is 80, though it is possible to create up to 100 interfaces. 1. From the Navigational pane, select the Data Domain system to configure. 2. Click the Hardware > Network > Interfaces tabs. 3. Click Create and select the IP Alias option.
The Create IP Alias dialog box appears.
4. Specify an IP alias ID by entering a number in the eth0a field. Valid IDs are 1 to 4096. 5. Enter an IP address. The Internet Protocol (IP) address is the numerical label assigned to the interface, for example, 192.168.10.23. 6. Enter a netmask address.
The netmask is the subnet portion of the IP address that is assigned to the interface. The format is typically 255.255.255.###, where the ### are the values that identify the interface. If you do not specify a netmask, the Data Domain system uses the netmask format as determined by the TCP/IP address class (A, B, C) you are using.
115
7. Specify the Dynamic DNS Registration option. Dynamic DNS (DDNS) is the protocol that allows machines on a network to communicate with, and register their IP address on, a Domain Name System (DNS) server. The DDNS must be registered to enable this option. Refer to Registering a DDNS for additional information. 8. Click Next. The Configure Interface Settings summary page appears. The values listed reflect the new system and interface state. 9. Click Finish and OK.
116
Module Review
117
118
119
120
121
VTL Overview
In some environments, the Data Domain system must be configured as a virtual tape library (VTL). This practice may be motivated by the need to leverage existing backup policies that were built around physical tape libraries. Using a VTL can also be a step in a longer-range migration to disk-based media for backup, or it may be driven by the need to minimize the effort to recertify a system to meet compliance needs. Virtual tape libraries emulate physical tape equipment and functions. Different tape library products may package some things in different ways, and the names of some elements may differ between products, but the fundamentals are basically the same. Data Domain systems are configured using the concepts of libraries, tapes, cartridge access ports, and barcodes. The Data Domain VTL software option requires installation of a Data Domain VTL HBA to connect to a Fibre Channel storage network. Activating the VTL configuration also requires the installation of a VTL license key.
122
The goal of configuring a Data Domain system as a VTL is to provide an interface that the backup software can work with as if it were a physical tape library. Some parameters must be configured in the VTL environment to structure the libraries and the number and size of elements within each. Many of these parameters are dictated by the tape technology and library model being emulated. Consult the product documentation and best practice guides for more information about the definitions and ranges of each parameter.
123
Configuration Terms
124
VTL Planning
You must configure parameters in the VTL environment to structure the number and size of elements within each library. Parameters are dictated by the tape technology and library you are emulating. Consult the product documentation and best practices guides for information about the definitions and ranges for each parameter.
125
Capacity Planning
See the best practices documentation for planning a virtual library configuration. Capacity calculations are a function of the backup set size and the compression ratios. Consider how space gets allocated and recovered in events like tape creation and expiration. With deduplication, the amount of disk space you need for a tape depends on the compression ratio. When virtual tapes are created, space isn't pre-allocated, so the logical setup to emulate a large library of tapes consumes only a small amount of disk space. You monitor disk space use and compression using the same techniques you use for monitoring data backed up over NFS and CIFS. You must clean a Data Domain system to reclaim data storage space from an expired tape image. DIA requirements not to reuse space until cleaning runs also mean that large amounts of space may become unavailable for use. In some environments, you must manage disk use closely. You may need to run system cleaning frequently.
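The capacity arithmetic above reduces to dividing the logical backup size by the achieved compression factor. A minimal sketch, with made-up numbers for illustration:

```python
def tape_disk_usage(logical_backup_gb, compression_factor):
    """Rough physical space needed for virtual tape data: with
    deduplication, disk use is approximately the logical backup
    size divided by the achieved compression factor
    (illustrative arithmetic only, not a sizing tool)."""
    return logical_backup_gb / compression_factor

# e.g. 4000 GB of logical backups at a 20x cumulative compression
# factor occupy roughly 200 GB of physical disk.
print(tape_disk_usage(4000, 20))   # 200.0
```

In practice the achieved factor depends on data type, change rate, and retention, so use measured compression from your own system rather than an assumed value.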
126
Create Tapes
127
When a tape is created, a barcode is assigned as a unique identifier of the tape. The eight-character barcode must start with six numeric or upper-case alphabetic characters (from the set {0-9, A-Z}) and end in a two-character tag for the supported LTO-1, LTO-2, and LTO-3 tape types.
128
Data Domain recommends creating tapes with unique barcodes only. Duplicate barcodes in the same tape pool create an error. Although no error is created for duplicate barcodes in different pools, duplicate barcodes may cause unpredictable behavior in backup applications.
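The barcode format and uniqueness rules above can be checked up front when generating a tape list. This is a hedged sketch: the regular expression assumes the two-character tags are L1, L2, and L3 for the LTO-1/2/3 types, and the helper is hypothetical, not a Data Domain utility.

```python
import re

# Eight characters: six from {0-9, A-Z}, then a two-character tape
# type tag (assumed here to be L1, L2, or L3 for LTO-1/2/3).
BARCODE_RE = re.compile(r"^[0-9A-Z]{6}(L1|L2|L3)$")

def check_barcodes(barcodes):
    """Return (invalid, duplicates) for a proposed list of tape barcodes."""
    invalid = [b for b in barcodes if not BARCODE_RE.match(b)]
    seen, dups = set(), set()
    for b in barcodes:
        dups.add(b) if b in seen else seen.add(b)
    return invalid, sorted(dups)

invalid, dups = check_barcodes(["TAPE01L3", "TAPE01L3", "bad-1"])
print(invalid, dups)   # ['bad-1'] ['TAPE01L3']
```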
129
130
131
Module Review
132
133
Lesson 1: Snapshots
Use a snapshot to save a copy of the backup directory at a specific point in time. You can use it later to restore files from that point in time, for example, before a Data Domain OS upgrade. A snapshot is a read-only copy of the entire backup directory. Snapshot benefits: snapshots do not use many system resources, and snapshots are free (they do not require a license).
You can schedule multiple snapshots at the same time or create them when you choose, for example, before you upgrade or configure your system. Snapshots are created in the /backup/.snapshot directory. The maximum number of snapshots allowed on a Data Domain system is 750. You will receive a warning when the number of snapshots reaches 90% of the allowed number (675-749). An alert is generated when the maximum number is reached. To clear the warning: 1. Expire snapshots
134
2. Run file system cleaning. If the Data Domain system is a source for collection replication, snapshots are replicated. If the Data Domain system is a source for directory replication, snapshots are not replicated; you must create and replicate snapshots separately. Note: Use snapshots judiciously to avoid having segments in snapshots that remain after files are deleted.
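The 750-snapshot limit and 90% warning threshold above are simple arithmetic, sketched here as an illustrative check (not Data Domain code):

```python
MAX_SNAPSHOTS = 750
WARN_AT = int(MAX_SNAPSHOTS * 0.9)   # 675: start of the warning range

def snapshot_status(count):
    """Classify a snapshot count against the documented limits:
    an alert at the 750 maximum, a warning from 90% of it (675-749)."""
    if count >= MAX_SNAPSHOTS:
        return "alert: maximum reached"
    if count >= WARN_AT:
        return "warning: nearing maximum"
    return "ok"

print(snapshot_status(680))   # falls in the warning range
```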
135
Manage Snapshots
To create a snapshot:
1. Select Data Management > Snapshots.
2. Click Create to create a snapshot.
3. Click Modify Expiration Date to choose: Never; Expire In (the snapshot expires in the number of days, weeks, months, or years you select); or On (the snapshot expires at the first minute of the day you select).
4. Select the Schedule tab to schedule snapshots.
136
137
Lesson 2: Fastcopy
Use fastcopy to retrieve data stored in snapshots. Fastcopy copies files and directory trees of a source directory to a target directory on a Data Domain system. The fastcopy force option allows the destination directory to be overwritten if it exists. Fastcopy makes a destination equal to the source, but not at a particular point in time: the source and destination may not be equal if you change either during the copy operation. Fastcopy takes space (it's like a clone).
138
Perform Fastcopy
1. Data Management > File System > More Tasks > 2. Enter the source and destination 3. Click the box to overwrite the existing destination (if it exists)
139
140
The EMC Data Domain Retention Lock licensed software feature enables organizations to protect records in non-rewriteable and non-erasable formats for a specified length of time, up to 70 years. Retention Lock protects against: accidents and user errors, and malicious activity. It is a licensed feature.
141
Minimum and maximum retention periods are set globally. Within the global parameters, retention parameters can be set on a file-by-file basis. Retention Lock is designed to respond to commands from the user, archive software, or some backup applications, which trigger the lock on stored data. It assumes administrators are trustworthy and allows administrative users to: override global retention settings, update permissions of locked files, and rename empty directories.
Retention Lock is fully integrated with Data Domain replication. File locking is initiated by the user or by backup or archive software.
142
The user or the storage software sets the access time (atime) of the file to a date in the future; the date must be beyond the minimum retention period configured on the Data Domain system. Setting the atime is the signal to lock the file. As soon as this value is set, the file is locked and cannot be deleted or modified before that date. Retention Lock extends the utility of Data Domain systems into environments that require granular control over the protection of individual files stored on the system.
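A client-side script can signal the lock by pushing the file's atime into the future, as described above. A minimal sketch follows; note that on an ordinary local filesystem this only changes the timestamp, since the lock itself is enforced by the Data Domain system on files under its export.

```python
import os
import tempfile
import time

# Set the atime to a date beyond the (assumed) minimum retention
# period; here, one year from now for illustration.
retention_until = time.time() + 365 * 24 * 3600

fd, path = tempfile.mkstemp()      # stand-in for a file on the export
os.close(fd)
st = os.stat(path)
os.utime(path, (retention_until, st.st_mtime))   # atime = lock date

print(os.stat(path).st_atime > time.time())      # atime is in the future
```

Archive applications that support Retention Lock issue the equivalent of this utime call (or `touch -a -t <date> <file>` from a shell) over NFS or CIFS.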
143
144
Lesson 3: Sanitization
With the sanitize function, deleted files can be overwritten using a DoD/NIST-compliant algorithm and procedures. No complex setup or disruption is needed; clean data remains available during the sanitization process, with limited disruption to daily operations. Sanitizing is the electronic equivalent of data shredding: it removes any trace of deleted files. Sanitization supports organizations that: are required to remove and destroy confidential data that is accidentally written to unapproved systems; are required to delete data that is no longer needed; or need to resolve classified message incidents (CMIs).
The system sanitize command erases the following: Segments of deleted files that are not used by other files Contaminated metadata All unused capacity in the file system
145
Not all segments used by deleted files can be erased, because some segments may still be used by other files. CLI commands: system sanitize, system sanitize start.
146
147
With the encryption software option licensed and enabled, all incoming data is encrypted inline before it is written to disk. This is a software-based approach, and it requires no additional hardware. It includes a configurable 128-bit or 256-bit Advanced Encryption Standard (AES) algorithm with either: confidentiality with Cipher Block Chaining (CBC) mode, or both confidentiality and message authenticity with Galois/Counter Mode (GCM).
Encryption and decryption to and from disk is transparent to all access protocols: DD Boost, NFS, CIFS, and VTL (no administrative action is required for decryption)
148
One key is used for all data in a system. The encryption key is passphrase protected; the administrative user specifies the passphrase when enabling encryption. The passphrase is needed by administrative users only: when locking or unlocking the file system, or when disabling encryption.
Take great care not to lose the passphrase, or data can be irrevocably lost. If, for example, the passphrase is lost after a file system is locked, the file system cannot be unlocked. The passphrase is not stored on disk (so it can't be recovered). The two administrative users use the passphrase to lock the Data Domain system and its external storage devices; they enter a new passphrase during the locking procedure. A thief who steals a system can't unlock the file system without the passphrase.
149
Encryption Flow
150
Configure Encryption
1. In the CLI, ensure the Data Domain Encryption license is enabled. (See Appendix A for Data Domain software licenses.)
   Disable the file system:
   # filesys disable
   Enable encryption:
   # filesys encryption enable
2. Enter a passphrase when prompted.
3. Select an alternative cryptographic algorithm (optional):
   # filesys encryption algorithm set algorithm
   The default algorithm is aes_256_cbc. Other options are aes_128_cbc, aes_128_gcm, and aes_256_gcm.
151
152
Deactivate Encryption
153
154
Cleaning reclaims physical storage occupied by deleted objects. For example, as retention periods expire, old backups are deleted. Space from deleted backups becomes available only after cleaning reclaims the disk space. When application software expires backup or archive images, they are deleted in the sense that they are no longer accessible or available for recovery from the application, but the images still occupy physical storage. The clean operation reclaims the segments used by files that are deleted and no longer referenced. Cleaning may require a lot of system resources. Self-throttling mechanisms automatically adjust the priority assigned to cleaning tasks in favor of more time-critical processing tasks. Cleaning schedules are adjustable. By default, cleaning is scheduled to start every Tuesday at 6:00 am. You should schedule cleaning for times when system traffic is lowest. A Data Domain system remains available for write and read operations during cleaning.
155
Cleaning
Cleaning enables you to reorganize data to improve the speed and efficiency of deduplication. Data invulnerability requires that data is only written into new containers; this requirement also applies to cleaning. Copy-forward segments are segments that, for deduplication efficiency, should be stored adjacent to each other, so they are copied forward together into a single container. Dead segments are dead because the files that referred to them were all deleted, and the pointers were removed. Dead segments are not rewritten with new data, since this could put valid data at risk of corruption. Instead, valid segments are copied forward into free containers to group the remaining valid segments together. When the data is safe and reorganized, the original containers are appended back onto the available disk space. Since the Data Domain file system is a log-structured file system, deleted space must be reclaimed; this is called cleaning. Cleaning is done for 2 main reasons: 1. Housekeeping: periodically, segments are considered dead when all the files that referenced them have been deleted.
156
2. Performance tuning: approximately 10% of the duplicate data is rewritten on disk for performance reasons. For example, rewriting duplicate segments (copy-forward segments) into the same location (segment locality) can be more advantageous than having the segments spread across different locations. The default schedule for file system cleaning is every Tuesday at 6 am. The default CPU throttle is 50%. You can change these options using the Enterprise Manager or the CLI.
157
158
159
160
Space Usage
Monitor space use to determine (through extrapolation) whether you have enough space. If you don't have enough space, you can: increase capacity, reduce retention periods, or reduce the amount of data.
Term               GUI Term          Definition
pre-compression    Pre-Comp Written  Data sent to the Data Domain system before deduplication, local compression, or both
post-compression   Post-Comp Used    Space used after compression
161
Term                GUI Term     Definition
compression factor  Comp Factor  The compression factor (global ratio, cumulative ratio) is the amount of data footprint reduction: pre-compression data divided by post-compression data. It is a global value.
cleaning            Cleaning     Cleaning date. Notice the compression factor increase after each cleaning.
162
Term                GUI Term     Definition
capacity            Post-Comp    Space used after compression
compression factor  Comp Factor  The compression factor (global ratio, cumulative ratio) is the amount of data footprint reduction: pre-compression data divided by post-compression data. It is a global value.
cleaning date       Cleaning     Date of the last cleaning
163
Term                                  GUI Term
pre-compression                       Pre-Comp
post-compression                      Post-Comp
total compression factor              Total-Comp Factor
global compression factor             Global-Comp Factor
local compression factor              Local-Comp Factor
total compression factor (reduction)  Total-Comp Factor (Reduction)
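The factors in the tables above are related by simple ratios: the global factor reflects deduplication, the local factor reflects on-disk compression, and the total factor is their product, i.e. pre-compression data divided by post-compression data. A sketch with made-up numbers (illustrative arithmetic, not tool output):

```python
def compression_stats(pre_comp_gb, post_global_gb, post_local_gb):
    """Relate the compression factors described above:
    global factor  = pre-comp / data after deduplication
    local factor   = data after deduplication / data after local compression
    total factor   = pre-comp / post-comp (= global * local)
    reduction      = fraction of the footprint eliminated."""
    global_factor = pre_comp_gb / post_global_gb
    local_factor = post_global_gb / post_local_gb
    total_factor = pre_comp_gb / post_local_gb
    reduction = 1 - post_local_gb / pre_comp_gb
    return global_factor, local_factor, total_factor, reduction

# e.g. 1000 GB written, 100 GB after dedupe, 50 GB on disk
g, l, t, r = compression_stats(1000, 100, 50)
print(g, l, t, r)   # 10.0 2.0 20.0 0.95
```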
164
Module Review
165
166
167
Because replication copies data over a WAN after it's deduplicated (only unique data segments are sent over the network) and compressed, network demands are reduced. Replicate (copy) data from one Data Domain system to another for: disaster recovery, remote office data protection, multiple-site tape consolidation, and onsite archiving.
Once you configure replication between a source and destination, only new data written to the source is automatically replicated to the destination. Data is deduplicated at the source and at the destination. You can recover offsite replicated data online, so you don't need to transport tape by remounting or by truck. You need a Replicator license for both the source and destination Data Domain systems. See Appendix A for a complete list of Data Domain licenses.
168
Lesson Objectives
169
170
Replication is set up with a source Data Domain system and one or more destination Data Domain systems. There are three replication types:
1. Collection
2. Directory
3. Pool
171
Collection Replication
With collection replication, user accounts and passwords are replicated from the source to the destination. Any changes made manually on the destination are overwritten after the next change is made on the source, so it is recommended that changes be made only on the source. Collection replication:
- Replicates the entire /backup directory
- Provides a full mirror of the system data
- Makes data immediately accessible at the destination
Other than receiving data from the source, the destination is read-only. The context must be broken by using the replication break command to make the destination readable and writable.
172
Directory Replication
Directory replication provides replication at the level of individual directories. Each Data Domain system can be the source or destination for multiple directories and can be a source for some directories and a destination for others. During directory replication, a Data Domain system can perform normal backup and restore operations. A destination Data Domain system must have available storage capacity that is at least the post-compressed size of the expected maximum size of the source directory. A single destination Data Domain system can receive backups from both CIFS and NFS clients as long as separate directories are used for each. Do not mix CIFS and NFS data in the same directory. When replication is initialized, a destination directory is created automatically if it does not already exist. After replication is initialized, ownership and permissions of the destination directory are always identical to those of the source directory. At any time, due to differences in global compression, the source and destination directory can differ in size. In summary:
- Subdirectories under /backup are replicated
- The destination must have available storage
- The destination can receive backups from both CIFS and NFS clients
- Do not mix CIFS and NFS data in the same directory
- The destination directory is created automatically
173
Pool Replication
Pool replication is similar to directory replication, adapted for Data Domain systems configured as VTLs. Replicating VTL pools and tape cartridges does not require a VTL license on the destination Data Domain system. VTL virtual tapes can be replicated from multiple replication originators to a single replication destination. Pool replication:
- Refers to directories that contain VTL tape cartridges (or pools)
- Operates similarly to directory replication
- Does not require a VTL license on the destination
174
Replication Context
A replication pair is referred to as a context. A subdirectory that is under a source directory in a replication context cannot be used in another directory replication context. A directory can be in only one context at a time. Use the replication break command to remove Directory Replication contexts instead of disabling the file system. We can also refer to a directory replication context as a Replication Stream, and although the use case is quite different, the stream resource utilization within a Data Domain system is roughly equivalent to a read stream (for a source context) or a write stream (for a destination context).
175
Context numbers appear in CLI output from a number of commands, such as replication status. The context number is shown in the first column of CLI output under the heading CTX. The context number is used for identification; 0 is reserved for collection replication, and directory replication numbering begins at 1. If you exceed the number of streams supported by a specific Data Domain model, your replication throughput may be compromised.
176
Replication Topologies
Data Domain has various supported replication topologies in which data flows from a source to a destination directory over a LAN or WAN. Directory replication can be configured in the following ways.
One-to-One Replication: The simplest type of replication is from a Data Domain source system to a Data Domain destination system. This topology can be used only with directory replication.
Bi-Directional Replication: In a bi-directional replication pair, data from the source directory on the source system is replicated to the destination directory on the destination system, and data from the source directory on the destination system is replicated to the destination directory on the source system. This topology can be used only with directory replication.
Many-to-One Replication: In many-to-one replication, data flows from several source directory contexts to a single destination system. This type of replication occurs, for example, when several branch offices replicate their data to the corporate headquarters' IT systems. This topology can be used with directory replication.
One-to-Many Replication: In one-to-many replication, multi-streamed optimization maximizes replication throughput per context. This topology, in conjunction with cascaded replication, requires DD OS 4.8 or higher at the source and destination. This topology can be used with directory replication.
177
Cascaded Replication: In a cascaded replication topology, directory replication is chained among three or more Data Domain systems. Data recovery can be performed from the non-degraded replication pair context. This topology can be used with directory replication.
Collection replication topologies can be configured in the following ways.
One-to-One Replication: This topology can be used with collection replication, where the entire /backup directory from a source Data Domain system is mirrored to a destination Data Domain system. Other than receiving data from the source, the destination is a read-only system. The replication break command should be used to make the destination readable and writable.
Cascaded Replication: In a cascaded replication topology, directory replication is chained among three or more Data Domain systems. The last system in the chain can be configured as collection replication. Data recovery can be performed from the non-degraded replication pair context.
Data Domain systems are typically used to store backup data onsite for a short period, such as 30, 60, or 90 days, depending on local practices and capacity. Lost or corrupted files are recovered easily from the onsite Data Domain system since it is disk based, and files are easy to locate and read at any time. In the case of a disaster that destroys the onsite data, the offsite replica is used to restore operations. Data on the replica is immediately available for use by systems in the disaster recovery facility. When a Data Domain system at the main site is repaired or replaced, the data can be recovered using a few simple recovery configuration and initiation commands. You can quickly move data offsite (no delays in copying and moving tapes). You don't have to complete replication for backups to occur; replication occurs in real time.
178
Ensure that the retention lock settings are the same on both Data Domain systems.
1. Select Replication > Create Pair > General > Replication Type.
2. Select the replication type.
179
1. Select Replication > Summary > Advanced.
2. Check the box for the configuration type.
3. Check whether you want to use a non-default connection host.
4. Enter the connection port number.
180
Replication Seeding
In DD660 and higher models, initialization and recovery speeds are 250 MB/s. The destination Data Domain system runs a more aggressive container verification mechanism to ensure that containers get verified as quickly as possible. Throughput is best with 10 GbE links on higher-end Data Domain systems. If the source has a lot of data, the initial replication seeding can take some time over a slow link. To expedite the initial seeding, you may want to bring the destination system to the same location as the source system to use a high-speed, low-latency link. Once data is initially seeded using the high-speed network, you can move the system back to its intended location. After the initial seeding, only new data is sent from that point onward. Replication seeding supports collection replication. Directory replication throughput can be limited by both the available network bandwidth and by the filtering/packing process. The filtering/packing overheads are proportional to the amount of logical data to be replicated. Directory replication, therefore, has two throughput limits to keep in mind: the network (post-compressed) limit and the logical (pre-compressed) limit. Collection replication replicates each container that has been written to disk over a network to a remote Data Domain system. Collection replication usually uses little CPU or disk resources, and can fill most of
181
the available network bandwidth: up to about 70 MB/s on older models and around 150 MB/s on DD690 and DD880 models. Models perform at 250 MB/s or higher provided that 10 Gb/s Ethernet is used. All models will saturate 1 Gb/s Ethernet.
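The benefit of seeding over a local high-speed link follows from simple arithmetic. The 5 TB data size and the 10 MB/s WAN rate below are illustrative assumptions; the 250 MB/s local rate is the figure cited above for higher-end models:

```python
def seeding_hours(post_comp_gb: float, rate_mb_per_s: float) -> float:
    """Rough hours to transfer post-compressed data at a sustained rate."""
    return (post_comp_gb * 1000) / rate_mb_per_s / 3600  # GB -> MB, seconds -> hours

# 5 TB of post-compressed data (an assumed figure) over a 10 MB/s WAN,
# versus a local 10GbE link sustaining 250 MB/s:
print(f"Slow WAN link: {seeding_hours(5000, 10):.0f} hours")   # → 139 hours
print(f"Local 10GbE:   {seeding_hours(5000, 250):.1f} hours")  # → 5.6 hours
```
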
182
Low bandwidth optimization can increase the effective throughput of DD Boost replication across links with less than 6 Mbps of available (or throttled) bandwidth. Low bandwidth optimization adds increased data compression to optimize network bandwidth; more compression directly translates to more throughput on low-bandwidth links. On high-bandwidth links, the computational overhead of low bandwidth optimization may actually reduce throughput. For this reason, low bandwidth optimization is recommended for T2 and lower-bandwidth links, which makes it ideal for small remote offices. Low bandwidth optimization must be enabled on both the source and destination. If the source and destination have incompatible low bandwidth optimization settings, low bandwidth optimization will be inactive. This feature, using bandwidth and network-delay settings together, calculates the proper TCP buffer size for replication use.
183
Delta compression is a global compression algorithm that is applied after identity filtering. The algorithm looks for previous similar segments using a sketch-like technique and sends only the difference between previous and new segments. In this example, segment S16 is very similar to S1, so the destination can ask the source if it also has S1. If it does, then only the delta (the difference between S1 and S16) needs to be transferred. If the destination doesn't have S1, the source can always just send the full segment data for S16 and the full missing segment data for S1. Delta compression reduces the amount of data to be replicated over low-bandwidth WANs by eliminating the transfer of redundant data found in replicated deduplicated data. This feature is typically beneficial to remote sites with lower-end Data Domain models (e.g., DD140 and DD120). Replication without deduplication can be expensive, requiring either physical transport of tapes or high-capacity WAN links. This often restricts it to being feasible for only a small percentage of data that is identified as critical and high value. Reductions from deduplication make it possible to replicate everything across a small WAN link. Only new unique segments need to be sent. This reduces the WAN traffic down to a small percentage of what would
184
be needed for replication without deduplication. These large-factor reductions make it possible to replicate over a less expensive WAN link or to replicate more than just the most critical data. As a result, the lag until data is safely offsite is as small as possible.
185
186
187
If something has occurred that makes the source replication data inaccessible, the data can be recovered from the replication pair destination. Either collection or directory replication data can be recovered to the source. For collection replication, the destination context must be fully initialized for the recovery to be successful. Recover a selected data set if it becomes necessary to recover one or more directory replication pairs. Note: If a recovery fails or must be terminated, the replication recovery can be aborted.
188
1. Select Replication > More Tasks > Start Recovery...
2. Select the replication type.
3. Select the system to recover to.
4. Select the system to recover from.
189
Resynchronization is the process of recovering (or bringing back into sync) the data between a source and destination replication pair after a manual break. The replication pair is resynchronized so that both endpoints contain the same data. A replication resynchronization can be used to recreate a deleted context when a directory replication destination runs out of space while the source directory still has data to replicate. Resynchronization can also be used to convert collection replication to directory replication. This is useful when a directory is to be a source directory for cascaded replication. A conversion is started with a replication resync that filters all data from the source Data Domain system to the destination Data Domain system. This implies that seeding can be accomplished by first doing collection replication, then breaking collection replication, and then doing a directory replication resync.
190
Resynchronization Process
191
Temporary Override — If configured, shows the throttle rate, or 0, which means all replication traffic is stopped.
Permanent Schedule — Shows the times and days of the week that scheduled throttling occurs.
The Add Throttle Setting dialog box appears. 2. Set the days of the week that throttling is active by clicking the check boxes next to the days. 3. Set the time that throttling starts with the Start Time drop-down selectors for the hour:minute and AM/PM.
4.
In the Throttle Rate area: a. Click the Unlimited radio button to set no limits.
192
b. Enter a number in the text entry box (for example, 20000) and select the rate from the drop-down menu (bps, Bps, Kibps, or KiBps).
c. Select the 0 Bps (Disabled) option to disable all replication traffic.
5. Click OK to set the schedule. The new schedule is shown in the Throttle Settings Permanent Schedule area. Replication runs at the given rate until the next scheduled change or until a new throttle setting forces a change.
193
Module Review
194
195
196
197
DD Boost Overview
198
DD Boost provides standard and centralized management of a Data Domain system through backup software. It works with industry standard backup software like: EMC NetWorker Symantec NetBackup (Data Domain plug-in required) Symantec Backup Exec (Data Domain plug-in required)
You can use the Data Domain system advanced load balancing and failover feature to enable network performance enhancements. Replication is a licensed feature (see appendix A). Your backup software might also require a license to enable this functionality. Check your backup software documentation for more information.
199
DD Boost Flow
This is how the DD Boost feature works in a backup environment:
1. Network clients send data (have an active stream) to the backup server over the network.
2. The backup server receives data from clients and deduplicates and compresses it.
3. A backup server manages connections between the backup applications and Data Domain systems with DD Boost.
4. The backup server sends deduplicated and compressed data over the LAN to the Data Domain system.
5. Deduplicated data is stored in the Data Domain system.
The main benefits of distributed segment processing are:
- Lower CPU loads on the backup servers and Data Domain system
- Less data sent over the LAN
- Faster backups
Since compression is moved to the backup server, bandwidth consumption is further reduced.
200
Distributed segment processing is able to do this because segmentation, segment ID creation, and compression are distributed to the backup server. While distributed segment processing adds compute responsibilities to the backup server, it reduces the overall CPU load since it reduces the quantity of data to transmit.
201
A backup server initiates and tracks replication for easy management and disaster recovery. You can initiate Data Domain replication on demand and manage it from a backup server management console. If you need to you can archive data to tape by using backup software.
202
With DD Boost, the supported backup application has the visibility and control it needs to manage both a main backup and the offsite replica from a single management pane. Without DD Boost, a recovery requires an administrator to perform the Data Domain recovery procedures and then update the backup management information to restore operations. The OST plug-in doesn't have distributed segment processing. The DD Boost plug-in has distributed segment processing, which can be either enabled or disabled.
Without distributed segment processing enabled, the entire deduplication process is performed on the Data Domain system. The Data Domain system:
1. Segments
2. Fingerprints
3. Filters
4. Compresses
203
5. Writes data to disk
The backup server has no part in the deduplication process.
204
When you enable distributed segment processing, the deduplication process is distributed between a backup server and a Data Domain system. Distributed segment processing off-loads the following from the Data Domain system to the backup server:
- Segment creation
- Fingerprint creation
- Compression (optional)
A Data Domain system filters fingerprints and writes segments to its disk. The deduplication process remains the same with or without distributed segment processing enabled. Distributed segment processing divides the deduplication process between a backup server and a Data Domain system. When distributed segment processing is enabled, part of the deduplication is offloaded to the backup server:
1. As the data arrives at the backup server, it is spliced into 4-12 KB segments.
205
2. A fingerprint (or segment ID) is created for each segment.
3. Each segment ID is sent over the network to the Data Domain system to be filtered.
a. The filter determines whether the segment ID is new or a duplicate. The segment IDs are checked against segment IDs already on the Data Domain system.
b. Segment IDs that match existing segment IDs are referenced and discarded, while the Data Domain system tells the backup server which segment IDs are unmatched (new).
4. Unmatched (new) segments are compressed using a common compression technique such as LZ, GZ, or Gzfast. This is also called local compression.
5. The compressed segments are sent to the Data Domain system and written to containers on the disk with the associated fingerprints, metadata, and logs.
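The filtering exchange in the steps above can be sketched in Python. The fixed segment size, the SHA-1 fingerprint, and the dict standing in for the Data Domain system's fingerprint index are all simplifications for illustration, not the real implementation:

```python
import hashlib
import zlib

SEG_SIZE = 8 * 1024  # fixed size for simplicity; real segments vary from 4-12 KB

def backup(data: bytes, dd_index: dict) -> tuple:
    """One backup pass from the backup server's point of view.

    dd_index stands in for the Data Domain system's fingerprint filter:
    it maps fingerprints of already-stored segments to their compressed bytes.
    Returns (total segments, segments actually sent).
    """
    segments = [data[i:i + SEG_SIZE] for i in range(0, len(data), SEG_SIZE)]  # step 1
    fingerprints = [hashlib.sha1(s).hexdigest() for s in segments]            # step 2
    # Step 3: the filter reports which fingerprints are new (unmatched).
    new = [i for i, fp in enumerate(fingerprints) if fp not in dd_index]
    # Steps 4-5: only new segments are locally compressed and sent.
    for i in new:
        dd_index[fingerprints[i]] = zlib.compress(segments[i])
    return len(segments), len(new)

dd_index = {}
data = b"".join(bytes([n]) * SEG_SIZE for n in range(4))  # four distinct segments
print(backup(data, dd_index))  # first backup: all segments are new → (4, 4)
print(backup(data, dd_index))  # repeat backup: all duplicates → (4, 0)
```

The second pass sends nothing because every fingerprint already exists in the index, which is the source of the reduced LAN traffic described above.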
206
A NetWorker Data Zone consists of a single NetWorker server and its storage node(s) and client(s). Another way to define a data zone is to say it is the set of hosts managed by a single NetWorker server. This includes all hosts with backup devices controlled by the NetWorker server and all hosts that send their backup data to those devices. NetWorker clients may be backed up by multiple NetWorker servers and therefore may belong to multiple data zones. NetWorker servers and storage nodes may belong to only one data zone.

Specifically, the NetWorker Server supports the backup and stores tracking and configuration information. The NetWorker Storage Node writes data to and reads data from the backup device. The NetWorker Client generates the backup data. In large environments, storage nodes serve as an aggregation point for a large number of clients. The clients send their data to the storage node with which they are associated, and the storage node backs up the data to the share on a Data Domain system.

The NetWorker Server is the component that stores the configuration information, such as supported clients, backup device information, when to run the backups, what data to back up, and so on. The NetWorker server also maintains the NetWorker databases that track the save sets and volumes. These include the Client File Indexes (CFI) and Media Database. The NetWorker
207
server, as a client within the data zone, automatically backs up the configuration information and tracking databases to protect NetWorker data. There is a single NetWorker server per data zone, and it must be available for any NetWorker activity to take place in that data zone. NetWorker servers have NetWorker client, storage node, and server software installed. See the nsr_service(5) man page for more information.

The NetWorker Client is the largest NetWorker software component and the fundamental host. The client's most important functions are to generate backups (also called save sets), push them to a NetWorker storage node, and retrieve them during a recovery. NetWorker clients are usually the data servers in an IT environment. While performing a backup, the client also generates tracking information (including the name of each file and directory backed up, and the time of the backup) and sends it to the NetWorker server, where it is stored. This tracking information is used to facilitate point-in-time recoveries. The client software includes graphical user interfaces (GUIs) and command-line utilities that allow users to manually perform backup and recovery operations. NetWorker client software is installed on all participating hosts in the data zone, including hosts that also perform the functions of NetWorker server and NetWorker storage node. Every host in a NetWorker data zone is a NetWorker client.

A NetWorker Storage Node is the component that physically controls a backup device and responds to requests from the NetWorker server. The device may be either a direct-attached or SAN-accessible device. If a device is controlled by a host other than the NetWorker server, it is considered a remote device, and the storage node controlling the device is referred to as a remote storage node. The NetWorker server is always a storage node and is the default storage node for backups.
Using remote storage nodes is optional, although they distribute the backup workload and can reduce network traffic. Storage node hosts have both the NetWorker client and storage node software installed. During a backup, a NetWorker client sends backup data to a particular storage node based on the client's configuration. During a recovery, the client reads from the storage node that controls the device containing the necessary volume. Storage nodes also send tracking information, including details about save sets written to the volume during a backup, to the NetWorker server. This information is used for future backups as well as for recoveries. See nsr_storage_node(5) in the EMC NetWorker Command Reference Guide (man pages) for more information.
208
The NetWorker Server stores configuration information, such as supported clients, backup device information, when to run the backups, what data to back up, etc. The NetWorker server also maintains the NetWorker databases that track the save sets and volumes. These include the client file indexes (CFI) and Media Database. The NetWorker server, as a client within the data zone, automatically backs up the configuration information and tracking databases to protect NetWorker data. There is a single NetWorker server per data zone and it must be available for any NetWorker activity to take place in that data zone. NetWorker servers have NetWorker client, storage node, and server software installed. See the nsr_service(5) man page for more information.
209
The following steps describe the NetWorker backup workflow with DD Boost:
1. The NetWorker server initiates the backup job and sends data to the Data Domain system after deduplication.
2. The save set is stored in the local Data Domain system.
3. Information about the primary copy of the save set is updated in the NetWorker Control Data.
4. The NetWorker server initiates the clone.
5. The clone request from the NetWorker server triggers the replication from the local Data Domain system to the remote Data Domain system.
6. Upon completion of the clone, the status is acknowledged to the NetWorker server.
7. Upon receipt of the acknowledgement, information about the clone copy of the data set is updated in the NetWorker Control Data.
210
Interface Groups
DD Boost-level aggregation of multiple 1 GbE or 10 GbE links on Data Domain systems enables backup/restore loads to be balanced automatically across multiple ports. Performance on 1 Gb Ethernet ports can be improved using the DD Boost interface group feature. DD Boost uses distributed segment processing and can be used in conjunction with network-level and switch-assisted aggregation. With dynamic load balancing, the OST plug-in dynamically negotiates with the Data Domain system.
211
Firewall Ports
When working with Data Domain Boost, make sure the following TCP ports are open in your network firewall:
- TCP 2049
- TCP 2051
- TCP 111
212
You can download the Data Domain plug-in software from the Data Domain support portal at http://my.datadomain.com. Choose the correct client operating system version.
213
Configure DD Boost
214
215
216
Module Review
217
218
219
220
Collect Information
By collecting this information, you're determining the burn rate for your Data Domain system. When backups expire, they are deleted.
221
Other questions to ask yourself are:
- How many full backups does your company do?
- How many incremental backups does your company do?
- Does your company do one full and many incremental backups?
- How does your backup software work?
222
Data Domain system internal indexes and other product components use variable amounts of storage, depending on the type of data and the sizes of files. If you send different data sets to otherwise identical systems, one system may, over time, have room for more or less actual backup data than another. Challenging data types include:
- Pre-compressed data (multimedia, .zip, and .tiff files)
- Pre-encrypted data
223
The compression rates shown are approximate. Compression rates depend on a number of variables so it may be difficult to determine what rates you can expect. The highest rates are usually when many full backups are stored. You can use average rates as a starting point for your calculations and refine them once real data is available.
224
225
1. Calculate the weekly space required by adding up the space needed by four incremental backups and one weekly full backup.
2. Multiply that weekly total by the number of weeks the data is retained, then add the space needed by the first full backup.
226
Use these assumptions when you perform your calculation on the next slide (or the next page in this guide).
227
Calculate the capacity (burn rate) needed by adding up the space needed for each backup that you retain. In this example, 1 TB of data is backed up and a conservative compression rate of 5x is estimated (which may have come from a test or is a reasonable assumption to start with). This gives 200 GB needed for the initial backup. With a 10% change rate in the data each day, incremental backups would be 100 GB each, and with an estimated compression of 10x on these, the amount of space required for each incremental backup would be 10 GB. When subsequent full backups run, they are likely to compress at a much higher rate, so 25x is estimated for the compression rate on subsequent full backups, and the 1 TB of data compresses to 40 GB.

Four daily incrementals needing 10 GB each and one weekly backup needing 40 GB yield a burn rate of 80 GB per week. Running the 80 GB weekly burn rate out over the full 8-week retention period means that an estimated 640 GB is needed to store the daily incrementals and the weekly fulls. Adding this to the original full backup gives a total of 840 GB needed. Using a Data Domain system with 1 TB of usable capacity for this scenario would mean that the unit would operate at about 84% of capacity. This may be acceptable, but a system with a larger capacity, or one that can have additional storage added, might be a better choice to allow for data growth.
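The worked example above can be reproduced with a few lines of arithmetic. All figures are the scenario's stated assumptions (1 TB data set, 5x/10x/25x compression estimates, 10% daily change, 8-week retention):

```python
data_gb = 1000.0                       # 1 TB protected data set
first_full = data_gb / 5               # 5x compression on the initial full
incremental = (data_gb * 0.10) / 10    # 10% daily change, 10x compression
weekly_full = data_gb / 25             # 25x compression on subsequent fulls

weekly_burn = 4 * incremental + weekly_full   # four incrementals + one weekly full
total_needed = first_full + 8 * weekly_burn   # 8-week retention period

print(f"Initial full:     {first_full:.0f} GB")        # → 200 GB
print(f"Weekly burn rate: {weekly_burn:.0f} GB")       # → 80 GB
print(f"Total required:   {total_needed:.0f} GB")      # → 840 GB
print(f"Fill of 1 TB:     {total_needed / 1000:.0%}")  # → 84%
```
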
228
In this sample calculation, the full data set of 6000 GB must be backed up within a 10-hour window. This yields a raw requirement of being able to process at least 600 GB per hour.
229
The system capacity numbers assume a mix of typical enterprise backup data (such as file systems, databases, mail, and developer files). The low and high ends of the range are also determined by how often data is backed up. The maximum capacity for each model assumes the maximum number of drives (either internal or external) that are supported for that model. Maximum throughput for each model is dependent mostly on the number and speed of the processors used. Larger units have more and faster processors so they are able to process incoming data faster. Values shown in this table give an idea of the range of values to expect and the relative differences between models. The number of streams you may expect depends on your hardware model.
230
Select Model
Best practice is to be conservative in calculating the model needed to meet requirements and to allow for a 15-25% buffer. The same buffer should be allowed for both the capacity and the throughput calculations. Required capacity divided by the logical capacity of a particular model, times 100, equals the capacity percentage. Required throughput divided by the maximum throughput of a particular model, times 100, equals the throughput percentage. If the capacity or throughput percentage for a particular model does not provide the 15% to 25% buffer, then calculate the capacity and throughput percentages for the next highest model number. For example, if the capacity calculation for a DD610 yielded a capacity percentage of 91%, only a 9% buffer would be available, so you should look at the DD630 next and calculate its capacity. Sometimes one model will give adequate throughput but not enough capacity, and vice versa. The final model selected has to provide for both.
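The buffer check described above is simple enough to script. The model capacities used here come from the examples that follow (DD140 with 860 GB, DD610 with 7 drives and 1,650 GB), and the 840 GB requirement is from the earlier sizing exercise:

```python
def utilization_pct(required: float, model_max: float) -> float:
    """Capacity (or throughput) percentage as defined above."""
    return required / model_max * 100

def has_buffer(required: float, model_max: float, min_buffer_pct: float = 15) -> bool:
    """True if the model leaves at least the minimum recommended 15% headroom."""
    return utilization_pct(required, model_max) <= 100 - min_buffer_pct

required_gb = 840
for model, capacity_gb in [("DD140", 860), ("DD610 (7 drives)", 1650)]:
    pct = utilization_pct(required_gb, capacity_gb)
    ok = has_buffer(required_gb, capacity_gb)
    print(f"{model}: {pct:.1f}% used, adequate buffer: {ok}")
```

The DD140 fails the check at roughly 98% utilization, while the DD610 passes at about 51%, matching the conclusion in the examples.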
231
The best practice is to select a unit with a capacity that is 15 to 25% higher than the required capacity. In this example, the capacity requirement of 840 GB would fill a DD140 to 97% of capacity. The next highest model, the DD610, has a capacity of 1.65 TB when configured with 7 drives. The capacity percentage estimated for the DD610 is 51%, and the 49% buffer is more than adequate.
232
In this example where 840 GB capacity is needed, it appears from the numbers that the DD140 would meet this need with 860 GB capacity, and the next highest capacity model, a DD610 with 7 drives and 1,650 GB capacity, would have almost two times the capacity needed. But you still need to make sure the capacity buffer is big enough.
233
The best practice is to select a model that will meet the throughput requirements at no more than 75-85% of the model's maximum throughput capacity. In this example, the throughput requirement of 600 GB per hour would load the DD610 to more than 89% of capacity. A better choice would be a model with higher throughput capability, such as the DD630. So, even though a DD610 with 12 drives meets the capacity requirement, the DD630 is the minimum model that would meet the performance requirement.
234
235
To complete throughput tuning, you must do the following:
1. Identify bottlenecks
2. Display and understand Data Domain system performance metrics
3. Implement solutions
236
Throughput Bottlenecks
Integrating Data Domain systems into the backup architecture will change the dynamics of the system. Bottlenecks that restrict the flow of data may shift. Data Domain systems collect and report performance metrics that help identify where bottlenecks might be. Identifying bottlenecks is the first step in the tuning process.
237
Performance Metrics
The system show performance command shows the percentage of time a Data Domain system is:
waiting to send a request
processing a request over the network
waiting to receive a request
idle
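Once the state percentages are captured, the dominant state points at the likely bottleneck. The sketch below parses an illustrative line of state/percentage pairs; the column layout is assumed for the example, not the exact CLI output format.

```python
import re

def parse_states(line):
    """Parse 'state: NN%' pairs into a dict of integer percentages."""
    return {k: int(v) for k, v in re.findall(r"(\w+):\s*(\d+)%", line)}

# Illustrative sample, not verbatim CLI output.
sample = "send: 12%  network: 8%  recv: 5%  idle: 75%"
states = parse_states(sample)

# The state with the highest percentage hints at where time is spent
# (here the system is mostly idle, so no bottleneck is indicated).
print(max(states, key=states.get))  # -> idle
```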
238
Tuning Solutions
Try the following to tune your Data Domain system:
Implement link aggregation
Consider implementing DD Boost
239
Monitor Throughput
In addition to watching disk utilization, you should monitor the rate at which data is being received and processed. These throughput statistics are measured at several points in the system to help you analyze performance and identify bottlenecks. The system show stats 2 command outputs a new line every two seconds showing some of these measurements; it is a monitoring tool for live data. In the example report shown here, there is a high and steady amount of inbound data on the network interface, which indicates that the backup server is writing data to the Data Domain device. The disk write rate is low relative to the steady inbound network activity, most likely because most of the incoming data segments are duplicates of segments already stored on disk. The Data Domain system identifies the duplicates in real time as they arrive and writes only the new segments it detects.
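The relationship between inbound network rate and disk write rate can be expressed as a simple ratio. This is an illustrative calculation, not a Data Domain tool; it assumes you have already read the two rates from the system show stats output.

```python
def dedupe_indicator(net_in_rate, disk_write_rate):
    """Fraction of inbound data NOT written to disk. During a backup,
    a high value suggests most incoming segments are duplicates of
    segments already stored."""
    if net_in_rate == 0:
        return 0.0
    return 1 - disk_write_rate / net_in_rate

# Steady 200 MB/s inbound but only 20 MB/s written to disk:
# roughly 90% of the incoming segments were duplicates.
print(round(dedupe_indicator(200, 20), 2))  # -> 0.9
```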
240
Module Review
241
242
243
Module Objectives
244
245
SNMP Flow
246
Download the MIB file from the Enterprise Manager. You can also download the MIB files from the /ddvar/snmp directory. You can monitor a Data Domain system with any SNMP utility, and you can integrate the Data Domain MIB into SNMP monitoring software such as HP OpenView. Refer to your SNMP monitoring software administration guide for integration instructions and best practices.
247
248
If you want to keep a remote copy of Data Domain system log messages, you can enable the remote logging feature. On a Data Domain system, the remote logging feature uses TCP port 514. You can configure a Data Domain system to send system messages to a remote syslog server, which collects logs from network devices. This way, a copy of the Data Domain logs is available outside the Data Domain system.
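The mechanism is plain syslog over TCP. As an illustration of the receiving side of this setup, the sketch below builds a logger that forwards messages to a remote syslog server over a TCP connection, the same transport the Data Domain feature uses on port 514. The host name and logger name are placeholders.

```python
import logging
import logging.handlers
import socket

def make_remote_logger(host, port=514):
    """Return a logger that forwards records to a remote syslog
    server over TCP (the transport used by Data Domain remote
    logging on port 514). The host is a placeholder."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port), socktype=socket.SOCK_STREAM)
    logger = logging.getLogger("dd-remote-sketch")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

In practice you would point this at your syslog server (e.g. make_remote_logger("syslog.example.com")); on the Data Domain side the equivalent configuration is done with the log command, not Python.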
249
You can use the log command and its options to configure remote logging on a Data Domain system.
250
251
1. Use the Enterprise Manager to view the system log files in /ddvar/log/debug: Maintenance > Logs
2. Click the file you want to view.
The ddvar folder contains log files that you can't view through log commands or from the Enterprise Manager. To view all Data Domain system log files, mount the ddvar folder using CIFS or NFS. The CLI command is log view filename.
252
The /ddvar/log folder includes troubleshooting-related files. Only relevant files or folders are listed. The CLI command to view logs is log view debug.
253
Some log files in a Data Domain system capture information about Data Domain system activity. For example, the messages file captures all system messages. All CIFS-related log files are available under the cifs folder in /ddvar/log/debug. Log files are text files. The following log files are in /ddvar/log:
messages
ost.log
space.log
The following log files are in /ddvar/log/debug:
perf.log
ddfs.info
cifs
vtl.info
254
255
A support upload bundle (SUB) is a multi-gigabyte tar file that contains system files, logs, and settings. It helps prioritize and diagnose customer-reported problems. You can create a SUB through the Enterprise Manager or from the CLI.
256
Generate a SUB from the Enterprise Manager: Maintenance > Support Bundles > Generate Support Bundles. From the CLI, the command is support upload {bundle [file_list] | traces | file_list}.
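At its core, a SUB is a compressed tar archive of files like those under /ddvar/log. The sketch below shows that idea in miniature; the directory layout and names are illustrative, not the real SUB format, which is generated by the system itself.

```python
import tarfile

def make_bundle(src_dir, bundle_path):
    """Tar-gzip everything under src_dir into bundle_path, the way a
    support bundle packages logs and settings. Layout is illustrative."""
    with tarfile.open(bundle_path, "w:gz") as tar:
        tar.add(src_dir, arcname="support-bundle")
    return bundle_path
```

For example, make_bundle("/tmp/logs", "/tmp/out/sub.tar.gz") would produce a single archive a support engineer could unpack and inspect.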
257
Lesson 5: Autosupport
Autosupport helps you solve and prevent problems by providing timely notification of significant events, enabling a rapid response when issues arise.
258
Autosupport System
A Data Domain system periodically sends system configuration, performance, and status information to EMC Data Domain via SMTP. Data Domain support personnel use this autosupport information to analyze and troubleshoot a system; technical support draws on information stored in the autosupport data warehouse. Autosupport:
Sends scheduled daily autosupport reports
Sends an autosupport when a reboot occurs
Sends spontaneous alerts about anomalous behavior (for example: a failed disk, a failed power supply, a system nearly full)
Sends a scheduled daily alerts summary (recent alerts)
Sends a support upload bundle (SUB) containing system information (a system-administrator-initiated event)
Data Domain captures these emails, parses parts of them, and stores them in the data warehouse.
259
Autosupport Types
There are two autosupport types:
1. Scheduled
2. Non-scheduled
Each Data Domain system can be configured to send a regular autosupport at 6 AM and a summary of daily alerts at 8 AM. Data Domain systems also create an autosupport in the event of a system reboot, an alert message (such as a disk failure), or a warning message (such as a power failure).
260
You can access alerts, autosupport reports, and logs through the Enterprise Manager.
261
Autosupport Reports
A standard autosupport is auto-generated at 6 AM daily. It contains detailed information about a system, for example:
runtime parameters
system settings
status
performance data
system log sections
262
You can disable autosupport notifications. You can configure autosupport email subscribers.
263
Email subscribers receive daily detailed reports. By default, autosupports are sent via SMTP to Data Domain technical support daily at 6 AM local time. Autosupports contain:
system ID
uptime information
system command outputs
runtime parameters
logs
system settings
status and performance data
debugging information
The autosupport report is a long text report (500-800 KB). Sections of the autosupport report are parsed into the Data Domain data warehouse for analysis and reporting.
264
Detailed autosupport report contents show:
how disks are being used
configuration information
how much data is being handled and how it is deduplicated
265
A daily summary autosupport provides a summary report of system alerts since the previous report. It is generated every day at 8 AM and uses the autosupport email distribution list. You can change the time at which the daily summary autosupport is generated. The CLI command is: alerts show current
266
Alerts
You can configure and manage alerts from the Enterprise Manager, and you can also filter alerts. The CLI commands are: alerts show history and alerts show current
267
Alerts (Continued)
Alerts are represented with:
a unique numerical ID
the date and time the alert occurred
the physical component where the alert occurred
268
Alerts (Continued)
You can filter alerts by severity and class. The alert severities are:
Debug
Info
Notice
Warning
Error
Critical
Alert
Emergency
269
270
Alerts Notification
Alerts are notification messages generated by a Data Domain system when an undesirable event occurs, and they are sent immediately on detection. An alert provides a short description of the problem and has its own separate email distribution list. On receipt of an alert, Data Domain creates a support case.
271
You can search the autosupport of a Data Domain system through https://my.datadomain.com. This can be very helpful when you are troubleshooting.
272
You can search for an autosupport report at https://my.datadomain.com and use the autosupport reports for troubleshooting. You can search for a system using the:
account name
serial number
host name
You can sort the search results by:
serial number
host name
alert level
contract expiration date
273
274
When you click a system listed on the My Systems page, you are given an option to renew your contract if needed. You can also:
view the space plot
view autosupports
create and view support cases
Clicking View Space Plot opens a graph showing space usage. The space plot page provides links to view detailed tabular data and to view autosupports.
275
276
Module Review
277
278
279
Download Software
280
System Upgrade
281
282
Appendix A: Licenses
Name / Definition

Archiver: DD860 platform archival tier storage.
DD Boost: Enables a Data Domain system to use DD Boost.
Encryption for Data at Rest: Enables data on system drives, or external storage, to be encrypted while being saved, and then locked, before it is moved to another location.
Expansion Storage: Enables a Data Domain system capacity upgrade of a DD630 to 12 disks.
Global Deduplication: Enables the global deduplication array.
Nearline: Identifies systems deployed for archive and nearline workloads.
Replication: Adds the Data Domain replicator, which replicates data from one Data Domain system to another. You can add as many replications as you like with one replication license.
Retention Lock: Prevents retention-locked files from being deleted or modified for up to 70 years.
VTL: Enables backup software to see a Data Domain system as a tape library.
VTL for IBM iSeries: Enables backup software to see a Data Domain system as a tape library on IBM iSeries backup software. Note: Installation is different for the IBM iSeries.
283
284
Appendix A: Licenses
285
3. View the table for the desired application vendor in the application compatibility lists that apply to your configuration.
4. View the Fibre Channel, storage array, OpenStorage (OST), and other compatibility lists that apply to your configuration. Note: A Data Domain OS version is shown in the compatibility document title. Compatibility with earlier Data Domain OS versions is covered in the Min DD OS columns in the tables.
5. For procedures for integrating Data Domain systems with specific vendors, select a vendor from the Vendor pull-down menu and access the applicable document from the documentation list.
286