EMC Data Domain System Administration
Student Guide
Education Services
May 2013
Table of Contents
EMC Data Domain System Administration Course Introduction............................................................ 1
Module 1: Technology Overview ........................................................................................................ 15
Module 2: Basic Administration ......................................................................................................... 69
Module 3: Managing Network Interfaces ......................................................................................... 131
Module 4: CIFS and NFS ................................................................................................................... 169
Module 5: File System and Data Management ................................................................................. 193
Module 6: Data Replication and Recovery ........................................................................................ 257
Module 7: Tape Library and VTL Concepts ........................................................................................ 303
Module 8: DD Boost ......................................................................................................................... 347
Module 9: Data Security................................................................................................................... 379
Module 10: Sizing, Capacity and Throughput Planning and Tuning ................................................... 417
Slide 1
DATA DOMAIN
SYSTEM ADMINISTRATION
EMC², EMC, Data Domain, RSA, EMC Centera, EMC ControlCenter, EMC LifeLine, EMC OnCourse, EMC
Proven, EMC Snap, EMC SourceOne, EMC Storage Administrator, Acartus, Access Logix, AdvantEdge,
AlphaStor, ApplicationXtender, ArchiveXtender, Atmos, Authentica, Authentic Problems, Automated
Resource Manager, AutoStart, AutoSwap, AVALONidm, Avamar, Captiva, Catalog Solution, C-Clip,
Celerra, Celerra Replicator, Centera, CenterStage, CentraStar, ClaimPack, ClaimsEditor, CLARiiON,
ClientPak, Codebook Correlation Technology, Common Information Model, Configuration Intelligence,
Configuresoft, Connectrix, CopyCross, CopyPoint, Dantz, DatabaseXtender, Direct Matrix Architecture,
DiskXtender, DiskXtender 2000, Document Sciences, Documentum, eInput, E-Lab, EmailXaminer,
EmailXtender, Enginuity, eRoom, Event Explorer, FarPoint, FirstPass, FLARE, FormWare, Geosynchrony,
Global File Virtualization, Graphic Visualization, Greenplum, HighRoad, HomeBase, InfoMover,
Infoscape, Infra, InputAccel, InputAccel Express, Invista, Ionix, ISIS, Max Retriever, MediaStor,
MirrorView, Navisphere, NetWorker, nLayers, OnAlert, OpenScale, PixTools, Powerlink, PowerPath,
PowerSnap, QuickScan, Rainfinity, RepliCare, RepliStor, ResourcePak, Retrospect, RSA, the RSA logo,
SafeLine, SAN Advisor, SAN Copy, SAN Manager, Smarts, SnapImage, SnapSure, SnapView, SRDF,
StorageScope, SupportMate, SymmAPI, SymmEnabler, Symmetrix, Symmetrix DMX, Symmetrix VMAX,
TimeFinder, UltraFlex, UltraPoint, UltraScale, Unisphere, VMAX, Vblock, Viewlets, Virtual Matrix, Virtual
Matrix Architecture, Virtual Provisioning, VisualSAN, VisualSRM, Voyence, VPLEX, VSAM-Assist,
WebXtender, xPression, xPresso, YottaYotta, the EMC logo, and where information lives, are registered
trademarks or trademarks of EMC Corporation in the United States and other countries.
All other trademarks used herein are the property of their respective owners.
Copyright 2013 EMC Corporation. All rights reserved. Published in the USA.
Revision Date: 04/23/2013
Revision Number: MR-1CP-DDSADMIN.5.2.1.0
Slide 2
Class Introductions
Name
Company
Region
Role
Data Domain system experience
Slide 3
Classroom Etiquette
Slide 4
Course Overview
Description
This EMC Education Services course provides the knowledge and skills needed to
manage a Data Domain system. It includes lectures and hands-on learning.
Audience
This course is for anyone who currently manages or plans to manage Data
Domain systems.
Prerequisites
Prior to attending this course, you should have attended the EMC Data Domain
Slide 5
Course Objectives
Describe deduplication
Monitor a Data Domain system
Perform a Data Domain system initial setup
Identify and configure Data Domain data paths
Configure and manage Data Domain network interfaces
Slide 6
Slide 7
Course Flow
(Flow diagram: each topic area moves from conceptual material through configuration, application, and monitoring.)
Foundation: Data Domain Introduction; Basic Administration; Managing Network Interfaces; Data Security
CIFS/NFS: configuration and operations; monitor CIFS/NFS performance
File System and Data Management: data management operations; file system management; monitoring MTrees, space usage, and consumption
Replication: concepts, types, and topologies; replication operations
VTL: configure Data Domain as a VTL; monitor VTL performance
DD Boost: configure Data Domain to use DD Boost; monitor DD Boost performance
Throughput monitoring and tuning
Slide 8
Agenda
Day 1
Modules (with labs):
1. Technology Overview
2. Basic Administration
Slide 9
Agenda (Continued)
Day 2
Modules (with labs):
3. Managing Network Interfaces
4. CIFS and NFS
5. File System and Data Management
Slide 10
Agenda (Continued)
Day 3
Modules (with labs):
6. Data Replication and Recovery
7. Tape Library and VTL Concepts
Slide 11
Agenda (Continued)
Day 4
Modules (with labs):
8. Data Security
9. DD Boost
10. Sizing, Capacity and Throughput Planning and Tuning
Slide 12
Course Materials
Bring these materials with you to class each day:
Student Guide
Lab Guide
12
Use your student guide to follow the lecture; space is provided for taking notes.
Use the lab guide for step-by-step instructions to complete the labs.
13
14
Slide 1
This module focuses on Data Domain core technologies. It includes the following lessons:
Data Domain Overview
Deduplication Basics
EMC Data Domain Stream-Informed Segment Layout (SISL) Scaling Architecture Overview
EMC Data Domain Data Invulnerability Architecture (DIA) Overview
EMC Data Domain File Systems Introduction
EMC Data Domain Protocols Overview
EMC Data Domain Data Paths Overview
EMC Data Domain Administration Interfaces
This module also includes a lab, which will enable you to test your knowledge.
15
Slide 2
This lesson is an introduction to the EMC Data Domain appliance. The first topic answers the question:
What is a Data Domain system? Also covered in this lesson is an overview of some Data Domain OS
software features and a current hardware model overview.
16
Slide 3
workloads that:
storage efficiency
Ensures recoverability of data through integrated data integrity intelligence
Can replicate data automatically for disaster recovery
Easily integrates via Ethernet and Fibre Channel into existing backup infrastructures
Safe and reliable: provides continuous recovery verification, fault detection, and healing for end-to-end data integrity
EMC Data Domain storage systems are traditionally used for disk backup, archiving, and disaster
recovery. An EMC Data Domain system can also be used for online storage with additional features and
benefits.
A Data Domain system can connect to your network via Ethernet or Fibre Channel connections.
Data Domain systems use low-cost Serial Advanced Technology Attachment (SATA) disk drives and
implement a redundant array of independent disks (RAID) 6 in the software. RAID 6 is block-level
striping with double distributed parity.
Most Data Domain systems have a controller and multiple storage units.
17
Slide 4
Hardware Overview
Visit the Data Domain Hardware page on http://www.emc.com/ for specific models and specifications.
http://www.emc.com/ > Products and Solutions > Backup and Recovery > EMC Data Domain >
Hardware
18
Slide 5
Software Overview
The latest Data Domain Operating System (DD OS):
Supports leading backup, file archiving, and email archiving
applications
Allows simultaneous use of VTL, CIFS, NFS, NDMP, and EMC
Data Domain Boost
Provides inline write/read verification, continuous fault
detection, and healing
Meets IT governance and regulatory compliance standards for
archived data
The latest Data Domain Operating System (DD OS) has several features and benefits, including:
Support for leading backup, file archiving, and email archiving applications
Simultaneous use of VTL, CIFS, NFS, NDMP, and EMC Data Domain Boost
Inline write/read verification, continuous fault detection, and healing
Conformance with IT governance and regulatory compliance standards for archived data
19
Slide 6
This lesson covers deduplication, which is an important technology that improves data storage by
providing extremely efficient data backups and archiving. This lesson also covers the different types of
deduplication (inline, post-process, file-based, block-based, fixed-length, and variable-length) and the
advantages of each type. The last topic in this lesson covers Data Domain deduplication and its
advantages.
20
Slide 7
Deduplication Fundamentals
Deduplication has the following characteristics:
It is performed at the sub-file, whole file, or backup job level
Redundant data is stored only once
Multiple instances point to the same copy
Deduplication performance is dependent on several factors:
Amount of data
Bandwidth
CPU
Disk speed
Memory
(Diagram: files containing new data are divided into segments; unique segments are stored once, and duplicate segments are replaced with smaller references to the stored segments.)
Deduplication is similar to data compression, but it looks for redundancy of large sequences of bytes.
Sequences of bytes identical to those previously encountered and stored are replaced with references
to the previously encountered data.
This is all hidden from users and applications. When the data is read, the original data is provided to the
application or user.
Deduplication performance is dependent on the amount of data, bandwidth, disk speed, CPU, and
memory of the hosts and devices performing the deduplication.
When processing data, deduplication recognizes data that is identical to previously stored data. When it
encounters such data, deduplication creates a reference to the previously stored data, thus avoiding
storing duplicate data.
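The reference-based scheme described above can be sketched as a tiny in-memory deduplicating store. This is an illustrative toy, not Data Domain code; the class and method names are invented.

```python
import hashlib

class DedupStore:
    """Minimal illustration: store each unique segment once, keep references."""

    def __init__(self):
        self.segments = {}   # fingerprint -> segment bytes (stored only once)
        self.streams = {}    # stream name -> list of fingerprints (references)

    def write(self, name, segments):
        refs = []
        for seg in segments:
            fp = hashlib.sha256(seg).hexdigest()
            if fp not in self.segments:      # new data: store it
                self.segments[fp] = seg
            refs.append(fp)                  # duplicate: just reference it
        self.streams[name] = refs

    def read(self, name):
        # Reassembly is hidden from the user: the original data comes back intact.
        return b"".join(self.segments[fp] for fp in self.streams[name])

store = DedupStore()
store.write("backup1", [b"PLAQ", b"UAPL", b"PLAQ"])
print(len(store.segments))    # 2 unique segments stored for 3 written
print(store.read("backup1"))  # b'PLAQUAPLPLAQ'
```

Three segments were written but only two are stored; the read path resolves the references transparently, which is why deduplication is invisible to applications.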
21
Slide 8
Fingerprints
How deduplication compresses data:
Deduplication typically uses hashing algorithms
Hashing algorithms yield a unique value based on data content
The unique value is called a hash, fingerprint, or checksum
The fingerprint is much smaller than the original data
Fingerprints are used to determine if data is new or duplicate
(Diagram: data segments are hashed to small fingerprints, for example 42, 37, and 89.)
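The fingerprint mechanism can be demonstrated with a standard hash function. SHA-1 is used here purely as an example; the slide does not specify which algorithm Data Domain uses internally.

```python
import hashlib

segment_a = b"PLAQ" * 1024   # a 4 KiB segment
segment_b = b"PLAQ" * 1024   # identical content
segment_c = b"UAPL" * 1024   # different content

fp_a = hashlib.sha1(segment_a).hexdigest()
fp_b = hashlib.sha1(segment_b).hexdigest()
fp_c = hashlib.sha1(segment_c).hexdigest()

print(len(segment_a), len(fp_a))  # 4096 40 -- the fingerprint is far smaller
print(fp_a == fp_b)               # True: identical data, identical fingerprint
print(fp_a == fp_c)               # False: different data, different fingerprint
```

Comparing 40-character fingerprints instead of 4 KiB segments is what makes the new-versus-duplicate decision cheap.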
22
Slide 9
File-Based Deduplication
Pros
Only one copy of file content is stored
Identical copies are replaced with a reference to the original
Cons
Any change to the file results in the whole file being stored again
It uses more disk space than other deduplication methods
Original Data
Deduplicated Data
In file-based deduplication, only the original instance of a file is stored. Future identical copies of the file
use a small reference to point to the original file content. File-based deduplication is sometimes called
single-instance storage (SIS).
In this example, eight files are being deduplicated. The blue files are identical, but each has its own copy
of the file content. The grey files also have their own copy of identical content. After deduplication there
are still eight files. The blue files point to the same content, which is stored only once on disk. This is
similar for the grey files. If each file is 20 megabytes, file-based deduplication has reduced the
storage required from 160 megabytes to 40.
File-based deduplication enables storage savings. It can be combined with compression (a way to
transmit the same amount of data in fewer bits) for additional storage savings. It is popular in desktop
backups and can be more effective for data restores because it doesn't need to re-assemble files. It can
be included in backup software, so an organization doesn't have to depend on a vendor disk.
23
File-based deduplication results are often not as good as those of other types of deduplication (such as
block- and segment-based deduplication). Its most important disadvantage is that a modified file shares
no deduplication with previously backed-up versions of that file.
File-based deduplication stores an original version of a file and creates a digital signature for it (using a
hash standard such as SHA-1). Future exact copies of the file point to the digital signature rather than
being stored again.
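The 160 MB to 40 MB arithmetic above can be reproduced with a toy single-instance store. The file names and contents are invented for illustration.

```python
import hashlib

# Eight 20 MB files: four identical "blue" copies and four identical "grey" copies.
blue = b"B" * (20 * 2**20)
grey = b"G" * (20 * 2**20)
files = {f"blue{i}": blue for i in range(4)}
files.update({f"grey{i}": grey for i in range(4)})

content_store = {}   # digest -> file content, stored once (single-instance storage)
catalog = {}         # file name -> digest (a small reference, like a SIS pointer)

for name, data in files.items():
    digest = hashlib.sha1(data).hexdigest()
    content_store.setdefault(digest, data)  # store content only on first sight
    catalog[name] = digest

before = sum(len(d) for d in files.values()) // 2**20
after = sum(len(d) for d in content_store.values()) // 2**20
print(f"{before} MB -> {after} MB")   # 160 MB -> 40 MB
```

The catalog still lists eight files, but only two distinct contents occupy disk. Changing a single byte of one file would give it a new digest, so the whole file would be stored again, which is the weakness described above.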
24
Slide 10
Fixed-Length Deduplication
(Diagram: the data stream is divided into fixed-length segments with fingerprints 42, 56, and 42; the repeated segment 42 is stored only once.)
10
25
Slide 11
Fixed-Length Deduplication
Add one byte
(Diagram: adding byte A at the front of the stream shifts every fixed-length segment boundary; the segments now fingerprint as 68, 87, and 30, so none of them match the previously stored segments 42, 56, and 42.)
11
When data is altered, the segments shift, causing more segments to be stored. For example, when you
add a slide to a Microsoft PowerPoint deck, all subsequent blocks in the file are rewritten and are likely
to be considered different from those in the original file, so the deduplication effect is less significant.
Smaller blocks get better deduplication than large ones, but deduplicating them takes more resources.
In backup applications, the backup stream consists of many files. Backup streams are rarely entirely
identical, even when they are successive backups of the same file system. A single addition, deletion, or
change to any file changes the number of bytes in the new backup stream. Even if no file has changed,
adding a new file to the backup stream shifts the rest of the stream. Fixed-size segment deduplication
must then store a large number of segments again because the boundaries between the segments have
moved.
Many hardware and software deduplication products use fixed-length segments for deduplication.
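The boundary-shift problem is easy to see with fixed-size chunking. This is a toy example: a 4-byte segment size is unrealistically small, but it makes the shift visible.

```python
import hashlib

def fixed_chunks(data, size=4):
    """Split a stream into fixed-length segments (toy-sized: 4 bytes)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def fingerprints(chunks):
    return {hashlib.sha1(c).hexdigest() for c in chunks}

original = b"PLAQUAPLPLAQ"
shifted = b"A" + original        # one byte added at the front of the stream

old = fingerprints(fixed_chunks(original))
new = fingerprints(fixed_chunks(shifted))

# Every boundary moved, so no segment matches previously stored data.
print(len(new - old), "of", len(new), "segments must be stored again")  # 4 of 4
```

One inserted byte made every fixed-length segment fingerprint differently, so nothing deduplicates against the earlier backup.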
26
Slide 12
Variable-Length Deduplication
(Diagram: the data stream is divided into variable-length segments with fingerprints 21, 56, 21, 28, and 21; the repeated segment 21 is stored only once.)
12
Variable-length segment deduplication evaluates data by examining its contents to look for the
boundary from one segment to the next. Variable-length segments are any number of bytes within a
range determined by the particular algorithm implemented.
Unlike fixed-length segment deduplication, variable-length segment deduplication uses the content of
the stream to divide the backup or data stream into segments based on the contents of the data stream.
27
Slide 13
Variable-Length Deduplication
Add one byte
(Diagram: after byte A is added at the front of the stream, only the first segment changes, fingerprinting as 24; the remaining segments keep their content-defined boundaries and are recognized as duplicates.)
13
When you apply variable-length segmentation to a data sequence, deduplication divides the sequence
into variable-sized segments based on its content. In this example, byte A is added to the beginning of
the data. Only one new segment needs to be stored, because the content defining the boundaries
between the remaining segments was not altered.
Eventually, variable-length segment deduplication finds the segments that have not changed and backs
up fewer segments than fixed-size segment deduplication. Even for storing individual files, variable-length
segments have an advantage. Many files are very similar to, but not identical to, other versions of
the same file. Variable-length segmentation isolates the changes, finds more identical segments, and
stores fewer segments than fixed-length deduplication.
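A toy content-defined chunker shows why the boundaries resynchronize. Real systems choose cut points with a rolling hash (for example, a Rabin fingerprint); cutting after a marker byte is a deliberate simplification so the result can be followed by hand.

```python
def content_chunks(data, marker=ord("L")):
    """Toy content-defined chunking: a segment ends after each marker byte.
    Real systems use a rolling hash over a window to choose cut points."""
    chunks, start = [], 0
    for i, b in enumerate(data):
        if b == marker:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

old = content_chunks(b"PLAQUAPLPLAQ")
new = content_chunks(b"APLAQUAPLPLAQ")   # the same stream with one byte prepended

print(old)                       # [b'PL', b'AQUAPL', b'PL', b'AQ']
print(new)                       # [b'APL', b'AQUAPL', b'PL', b'AQ']
print(len(set(new) - set(old)))  # 1: only the first segment is new
```

Because the cut points depend on content rather than position, every segment after the insertion lands on the same boundary as before: one new segment versus the four-of-four penalty of fixed-length chunking.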
28
Slide 14
Post-Process Deduplication
In contrast, post-process deduplication:
Should not interfere with the incoming backup data speed
Requires more I/O
Writes files first to disk in their entirety, then scans and
deduplicates them
Post-Process
Deduplication
Backup Server
All incoming data written to disk first
14
With post-process deduplication, files are written to disk first, and then they are scanned and
compressed.
Post-process deduplication should never interfere with the incoming backup data speed.
Post-process deduplication requires more I/O. It writes new data to disk and then reads the new data
back before checking for duplicates. It requires an additional write to delete the duplicate data and
another write to update the hash table. If it can't determine whether a data segment is duplicate or new,
it requires another write (this happens about 5% of the time). It requires more disk space to:
initially capture the data.
store multiple pools of data.
provide adequate performance by distributing the data over a large number of drives.
Post-process deduplication runs as a separate processing task and can lengthen the time needed to
fully complete the backup.
29
In post-process deduplication, files are first written to disk in their entirety (they are buffered to a large
cache). After the files are written, the hard drive is scanned for duplicates and compressed. In other
words, with post-process deduplication, deduplication happens after the files are written to disk.
With post-process deduplication, a data segment enters the appliance (as part of a larger stream of data
from a backup), and it is written to disk in its entirety. Then a separate process (running asynchronously
and possibly from another appliance accessing the same disk) reads the block of data to determine if it is
a duplicate. If it is a duplicate, it is deleted and replaced with a pointer. If it is new, it is stored.
30
Slide 15
(Diagram: incoming segments are compared in RAM with previously stored data before being written to disk; duplicate segments are replaced with references, and unique segments are stored.)
15
With Data Domain inline deduplication, incoming data is examined as soon as it arrives to determine
whether a segment (or block, or chunk) is unique or a duplicate of a segment previously stored. Inline
deduplication occurs in RAM before the data is written to disk. Around 99% of data segments are
analyzed in RAM without disk access. A very small amount of data is not identified immediately as either
unique or redundant. That data is stored to disk and examined again later against the previously stored
data.
In some cases, an inline deduplication process will temporarily store a small amount of data on disk
before it is analyzed.
The process is shown in this slide, as follows:
Inbound segments are analyzed in RAM.
If a segment is redundant, a reference to the stored segment is created.
If a segment is unique, it is compressed and stored.
31
Inline deduplication requires less disk space than post-process deduplication. There is less
administration for an inline deduplication process, as the administrator does not need to define and
monitor the staging space.
Inline deduplication analyzes the data in RAM, and reduces disk seek times to determine if the new data
must be stored.
32
Slide 16
Source-based deduplication
Occurs near where data is created
Uses a host-resident agent that reduces data at the server source
Target-based deduplication
Occurs near where the data is stored
Is controlled by a storage system, rather than a host
Provides an excellent fit for a virtual tape library (VTL) without substantial disruption to existing backup software infrastructure and processes
16
When the deduplication occurs close to where data is created, it is often referred to as source-based
deduplication, whereas when it occurs near where the data is stored, it is commonly called target-based
deduplication.
Source-based deduplication
Occurs near where data is created
Uses a host-resident agent that reduces data at the server source and sends just changed data
over the network
Reduces the data stream prior to transmission, thereby reducing bandwidth constraints
Target-based deduplication
Occurs near where the data is stored
Is controlled by a storage system, rather than a host
Provides an excellent fit for a virtual tape library (VTL) without substantial disruption to existing
backup software infrastructure and processes
Works best for higher change-rate environments
33
Slide 17
Local compression
Compresses segments before writing them to disk
Uses common, industry-standard algorithms (lz, gz, and gzfast)
Is similar to zipping a file to reduce the file size
Can be turned off
17
EMC Data Domain Global Compression is the EMC Data Domain trademarked name for global
compression, local compression, and deduplication.
Global compression equals deduplication. It identifies previously stored segments and cannot be turned
off.
Local compression compresses segments before writing them to disk. It uses common, industry-standard
algorithms (for example, lz, gz, and gzfast). The default compression algorithm used by Data
Domain systems is lz.
Local compression is similar to zipping a file to reduce the file size. Zip is a file format used for data
compression and archiving. A zip file contains one or more files that have been compressed, to reduce
file size, or stored as is. The zip file format permits a number of compression algorithms. Local
compression can be turned off.
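Local compression behaves like zipping each segment before it is written. A sketch using Python's gzip module as a stand-in for the lz/gz/gzfast algorithms, which are internal to DD OS:

```python
import gzip

segment = b"PLAQUAPL" * 512            # 4 KiB of repetitive segment data

compressed = gzip.compress(segment)    # compress the segment before "writing" it
restored = gzip.decompress(compressed) # decompression is lossless

print(len(segment), "->", len(compressed), "bytes")
print(restored == segment)             # True: the original data is fully recovered
```

The repetitive segment shrinks substantially, and decompression returns it byte-for-byte, which is the same round trip a read from a locally compressed container performs.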
34
Slide 18
35
18
Slide 19
fingerprints
Scales with Data Domain systems using newer and faster CPUs and
RAM
Increases the new-data processing throughput rate
36
19
Slide 20
gz, gzfast)
Write: Segments (including fingerprints, metadata, & logs) written to
containers, containers written to disk
(Diagram: segments are packed into containers, and containers are written sequentially to disk.)
20
37
Slide 21
21
This lesson covers EMC Data Domain Data Invulnerability Architecture (DIA), which is an important EMC
Data Domain technology that provides safe and reliable storage.
38
Slide 22
22
Data Invulnerability Architecture (DIA) is an important EMC Data Domain technology that provides safe
and reliable storage.
The EMC Data Domain operating system (DD OS) is built for data protection. Its elements comprise an
architectural design whose goal is data invulnerability. Four technologies within the DIA fight data loss:
1. End-to-end verification
2. Fault avoidance and containment
3. Continuous fault detection and healing
4. File system recoverability
DIA helps to provide data integrity and recoverability and extremely resilient and protective disk
storage. This keeps data safe.
39
Slide 23
End-to-End Verification
1. Writes request from backup software.
2. Analyzes data for redundancy.
3. Stores new data segments.
4. Stores fingerprints.
5. Verifies, after backup I/O, that the Data Domain OS (DD OS) can read the data from disk and through the Data Domain file system.
6. Verifies that the checksum that is read back matches the checksum written to disk.
(Diagram: the Generate Checksum and Verify steps wrap the File System, Global Compression, Local Compression, and RAID layers.)
23
The end-to-end verification check verifies all file system data and metadata. The end-to-end verification
flow:
1. Writes request from backup software.
2. Analyzes data for redundancy.
3. Stores new data segments.
4. Stores fingerprints.
5. Verifies, after backup I/O, that the Data Domain OS (DD OS) can read the data from disk and
through the Data Domain file system.
6. Verifies that the checksum that is read back matches the checksum written to disk.
If something goes wrong, it is corrected through self-healing and the system alerts to back up again.
Since every component of a storage system can introduce errors, an end-to-end test is the simplest way
to ensure data integrity. End-to-end verification means reading data after it is written and comparing it
to what was sent to disk, proving that it is reachable through the file system to disk, and proving that
the data is not corrupted.
40
When the DD OS receives a write request from backup software, it computes a checksum over the
constituent data. After analyzing the data for redundancy, it stores the new data segments and all of the
checksums. After the backup I/O completes and all data is synced to disk, the DD OS verifies that it
can read the entire file from the disk platter and through the Data Domain file system, and that the
checksums of the data read back match the checksums of the written data.
This ensures that the data on the disks is readable and correct and that the file system metadata
structures used to find the data are also readable and correct. This confirms that the data is correct and
recoverable from every level of the system. If there is a problem anywhere, for example if a bit flips on
a disk drive, it is caught. Usually a problem is corrected through self-healing. If a problem can't be
corrected, it is reported immediately, and the backup is repeated while the data is still valid on the
primary store.
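The verify-after-write flow (steps 1 through 6) can be sketched as follows. Here `disk` is just a dictionary standing in for the storage path, and the function names are invented; this is not actual DD OS code.

```python
import hashlib

def verify(disk, path):
    """Re-read the stored data and compare its checksum with the one written."""
    stored, expected = disk[path]
    return hashlib.sha256(stored).hexdigest() == expected

def write_with_verification(disk, path, data):
    """Sketch of verify-after-write: compute a checksum at ingest, store it
    alongside the data, then read the data back and re-check the checksum."""
    checksum = hashlib.sha256(data).hexdigest()  # checksum computed on ingest
    disk[path] = (data, checksum)
    if not verify(disk, path):                   # read back through the same path
        raise IOError(f"verification failed for {path}: repeat the backup")
    return True

disk = {}
write_with_verification(disk, "/backup/file1", b"segment data")
print(verify(disk, "/backup/file1"))   # True: readable and correct

# Simulate a fault on disk (e.g., a bit flip) after the write.
disk["/backup/file1"] = (b"bit-flipped", disk["/backup/file1"][1])
print(verify(disk, "/backup/file1"))   # False: the fault is detected
```

The second check shows why the verification must happen on the read-back path: a fault introduced anywhere between ingest and disk produces a checksum mismatch and is caught rather than silently returned.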
41
Slide 24
(Diagram: new data is appended to the container log; old data is never overwritten.)
24
Data Domain systems are equipped with a specialized log-structured file system that has important
benefits.
1. New data never overwrites existing data. (The system never puts existing data at risk.)
Traditional file systems often overwrite blocks when data changes, and then use the old block
address. The Data Domain file system writes only to new blocks. This isolates any incorrect
overwrite (a software bug problem) to only the newest backup data. Older versions remain safe.
As shown in this slide, the container log never overwrites or updates existing data. New data is
written to new containers. Old containers and references remain in place and safe even when
software bugs or hardware faults occur when new backups are stored.
2. There are fewer complex data structures.
In a traditional file system, there are many data structures (for example, free block bit maps and
reference counts) that support fast block updates. In a backup application, the workload is
primarily sequential writes of new data. Because a Data Domain system is simpler, it requires
fewer data structures to support it. As long as the Data Domain system can keep track of the
head of the log, new writes never overwrite old data. This design simplicity greatly reduces the
chances of software errors that could lead to data corruption.
42
3. The system includes non-volatile RAM (NVRAM) for fast, safe restarts.
The system includes a non-volatile RAM (NVRAM) write buffer into which it puts all data not yet
safely on disk. The file system leverages the security of this write buffer to implement a fast,
safe restart capability.
The file system includes many internal logic and data structure integrity checks. If a problem is
found by one of these checks, the file system restarts. The checks and restarts provide early
detection and recovery from the kinds of bugs that can corrupt data. As it restarts, the Data
Domain file system verifies the integrity of the data in the NVRAM buffer before applying it to
the file system and thus ensures that no data is lost due to a power outage.
For example, in a power outage, the old data could be lost and a recovery attempt could fail. For this
reason, Data Domain systems never update just one block in a stripe. Following the no-overwrite policy,
all new writes go to new RAID stripes, and those new RAID stripes are written in their entirety. The
verification-after-write ensures that the new stripe is consistent (there are no partial stripe writes). New
writes don't put existing backups at risk.
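The no-overwrite container log can be sketched as an append-only structure whose only recovery state is the head of the log. This is illustrative, not DD OS internals; the class name is invented.

```python
class ContainerLog:
    """Sketch of a no-overwrite, log-structured store: new data is only ever
    appended, and old containers are never touched."""

    def __init__(self):
        self.log = []    # the container log "on disk"
        self.head = 0    # the only recovery state needed: the head of the log

    def append(self, container):
        self.log.append(container)   # new writes go to new containers only
        self.head = len(self.log)

    def snapshot_at(self, head):
        # Any earlier head still describes valid, untouched data.
        return self.log[:head]

log = ContainerLog()
log.append(b"backup-1 data")
old_head = log.head
log.append(b"backup-2 data")        # a later (possibly buggy) write...
print(log.snapshot_at(old_head))    # [b'backup-1 data'] ...cannot harm backup-1
```

Because nothing before the head is ever rewritten, a fault during the second backup cannot corrupt the first one, which is the containment property described above.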
43
Slide 25
(Diagram: continuous fault detection and healing operate across the File System, Global Compression, Local Compression, and RAID layers.)
25
Continuous fault detection and healing provide an extra level of protection within the Data Domain
operating system. The DD OS detects faults and recovers from them continuously. Continuous fault
detection and healing ensures successful data restore operations.
Here is the flow for continuous fault detection and healing:
1. The Data Domain system periodically rechecks the integrity of the RAID stripes and container
logs.
2. The Data Domain system uses RAID system redundancy to heal faults. RAID 6 is the foundation
of Data Domain systems' continuous fault detection and healing. Its dual-parity architecture
offers advantages over conventional architectures, including RAID 1 (mirroring) and the RAID 3,
RAID 4, and RAID 5 single-parity approaches.
RAID 6:
Protects against two disk failures.
Protects against disk read errors during reconstruction.
Protects against the operator pulling the wrong disk.
Guarantees RAID stripe consistency even during power failure without reliance on NVRAM
or an uninterruptable power supply (UPS).
44
45
Slide 26
(Diagram: containers store data together with its metadata in a self-describing format.)
26
The EMC Data Domain Data Invulnerability Architecture (DIA) file system recovery is a feature that
reconstructs lost or corrupted file system metadata. It includes file system check tools.
If a Data Domain system does have a problem, DIA file system recovery ensures that the system is
brought back online quickly.
This slide shows DIA file system recovery:
Data is written in a self-describing format.
The file system can be recreated by scanning the logs and rebuilding it from metadata stored
with the data.
In a traditional file system, consistency is not continuously checked. Data Domain systems use initial
verification after each backup to ensure consistency for all new writes. The usable size of a traditional
file system is often limited by the time it takes to recover the file system in the event of some sort of
corruption.
46
Imagine running fsck on a traditional file system with more than 80 TB of data. The reason the checking
process can take so long is the file system needs to sort out the locations of the free blocks so new
writes do not accidentally overwrite existing data. Typically, this entails checking all references to
rebuild free block maps and reference counts. The more data in the system, the longer this takes.
In contrast, since the Data Domain file system never overwrites existing data and doesn't have block
maps and reference counts to rebuild, it has to verify only the location of the head of the log to safely
bring the system back online and restore critical data.
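Rebuilding the file system from self-describing containers can be sketched as a single scan of the log. The container layout here is invented for illustration only.

```python
def rebuild_index(container_log):
    """Sketch: because each container is self-describing (metadata is stored
    with the data), the file index can be rebuilt by one scan of the log."""
    index = {}
    for container in container_log:
        for path, segment in container["metadata"]:  # metadata travels with data
            index.setdefault(path, []).append(segment)
    return index

# A toy container log: each container lists the files its segments belong to.
log = [
    {"metadata": [("/backup/a", "seg1"), ("/backup/a", "seg2")]},
    {"metadata": [("/backup/b", "seg3")]},
]
print(rebuild_index(log))  # {'/backup/a': ['seg1', 'seg2'], '/backup/b': ['seg3']}
```

A single pass recovers the complete file-to-segment mapping; there are no free-block maps or reference counts to reconcile, which is why recovery stays fast regardless of how much data is stored.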
47
Slide 27
This lesson covers the Data Domain file system. The Data Domain file system includes:
ddvar (Administrative files)
MTrees (File Storage)
48
27
Slide 28
ddvar
Core files
Log files
Sub-directories: /log, /releases, /snmp, /support
Data Domain system administrative files are stored in /ddvar. This directory stores system core and
log files, generated support upload bundles, compressed core files, and .rpm upgrade packages.
The ddvar file structure keeps administrative files separate from storage files.
You cannot rename or delete /ddvar, nor can you access all of its sub-directories, such as the core sub-directory.
49
Slide 29
MTree Introduction
Is the destination for deduplicated data
Is the root directory for deduplicated data
Lets you configure directory export levels to separate and organize backup files
Lets you manage each MTree directory separately (for example, different compression rates)
(Diagram: /data/col1/backup with sub-directories /HR, /sales, and /support.)
29
The MTree (Managed Tree) file structure is the destination for deduplicated data. It is also the root
directory for deduplicated data. It comes pre-configured for NFS export as /backup. You configure
directory export levels to separate and organize backup files in the MTree file system.
The MTree file structure:
Uses compression.
Implements data integrity.
Reclaims storage space with file-system cleaning. You will learn more about file-system cleaning
later in this course.
MTrees provide more granular space management and reporting. This allows for finer management of
replication, snapshots, and retention locking. These operations can be performed on a specific MTree
rather than on the entire file system. For example, you can configure directory export levels to separate
and organize backup files.
50
Although a Data Domain system supports a maximum of 100 MTrees, system performance might
degrade rapidly if more than 14 MTrees are actively engaged in read or write streams. The degree of
degradation depends on overall I/O intensity and other file-system loads. For optimum performance,
you should contain the number of simultaneously active MTrees to a maximum of 14. Whenever
possible, it is best to aggregate operations on the same MTree into a single operation.
You can add subdirectories to MTree directories. You cannot add anything to the /data directory, and
you can change only the col1 subdirectory. The backup MTree (/data/col1/backup) cannot be
deleted or renamed. MTrees that you add can be renamed and deleted. You can replicate
directories under /backup.
Slide 30
Slide 31
DD Boost
The DD Boost protocol enables backup servers to communicate with storage systems without
the need for Data Domain systems to emulate tape. There are two components to DD Boost:
one component that runs on the backup server and another component that runs on a Data
Domain system.
NDMP
If the VTL communication between a backup server and a Data Domain system is through NDMP
(Network Data Management Protocol), no Fibre Channel (FC) is required. When you use NDMP,
the initiator and port functionality does not apply.
Slide 32
This lesson covers Data Domain data paths, which include NFS, CIFS, DD Boost, NDMP, and VTL over
Ethernet or Fibre Channel.
This lesson also covers where a Data Domain system fits into a typical backup environment.
Slide 33
tape library
WAN-based replication
backup server
WAN
offsite disaster
recovery location
Data Domain systems connect to backup servers as storage capacity to hold large collections of backup
data. This slide shows how a Data Domain system integrates non-intrusively into an existing storage
environment. Often a Data Domain system is connected directly to a backup server. The backup data
flow from the clients is simply redirected to the Data Domain device instead of to a tape library.
Data Domain systems integrate non-intrusively into typical backup environments and reduce the
amount of storage needed to back up large amounts of data by performing deduplication and
compression on data before writing it to disk. The data footprint is reduced, making it possible for tapes
to be partially or completely replaced.
Depending on an organization's policies, a tape library can be either removed or retained.
An organization can replicate and vault duplicate copies of data when two Data Domain systems have
the Data Domain Replicator software option enabled.
One option (not shown) is that data can be replicated locally with the copies stored onsite. The smaller
data footprint after deduplication also makes WAN vaulting feasible. As shown in the slide, replicas can
be sent over the WAN to an offsite disaster recovery (DR) location.
WAN vaulting can replace the process of rotating tapes from the library and sending the tapes to a vault
by truck.
If an organization's policies dictate that tape must still be made for long-term archival retention, data
can flow from the Data Domain system back to the server and then to a tape library.
Often the Data Domain system is connected directly to the backup server. The backup data flow is
redirected from the clients to the Data Domain system instead of to tape. If tape needs to be made for
long-term archival retention, data flows from the Data Domain system back to the server and then to
tape, completing the same flow that the backup server was doing initially. Tapes come out in the same
standard backup software formats as before and can go off-site for long-term retention. If a tape must
be retrieved, it goes back into the tape library, and the data flows back through the backup software to
the client that needs it.
Slide 34
DD Boost
NFS/CIFS
FTP/NDMP
Ethernet
TCP(UDP)/IP
WAN
TCP(UDP)/IP
deduplicated replication
Ethernet
NFS/CIFS/DD Boost
FTP/NDMP
Ethernet
deduplicated
data written to
file system
A data path is the path that data travels from the backup (or archive) servers to a Data Domain system.
Ethernet supports the NFS, CIFS, FTP, NDMP, and DD Boost protocols that a Data Domain system uses to
move data.
In the data path over Ethernet (a family of computer networking technologies), backup and archive
servers send data from clients to Data Domain systems on the network via TCP(UDP)/IP (a set of
communication protocols for the internet and other networks).
You can also use a direct connection between a dedicated port on the backup or archive server and a
dedicated port on the Data Domain system. The connection between the backup (or archive) server and
the Data Domain system can be Ethernet or Fibre Channel, or both if needed. This slide shows the
Ethernet connection.
When Data Domain Replicator is licensed on two Data Domain systems, replication is enabled between
the two systems. The Data Domain systems can be either local, for local retention, or remote, for
disaster recovery. Data in flight over the WAN can be secured using VPN. Physical separation of the
replication traffic from backup traffic can be achieved by using two separate Ethernet interfaces on a
Data Domain system. This allows backups and replication to run simultaneously without network
conflicts. Since the Data Domain OS is based on Linux, it needs additional software to work with CIFS.
Samba software enables CIFS to work with the Data Domain OS.
Slide 35
/dev/rmt
\\.\Tape#
WAN
FC SAN
TCP(UDP)/IP
deduplicated replication
SAN
Ethernet
Ethernet
VTL
deduplicated
data written to
file system
A data path is the path that data travels from the backup (or archive) servers to a Data Domain system.
Fibre Channel supports the VTL protocols that a Data Domain system uses to move data.
If the Data Domain virtual tape library (VTL) option is licensed, and a VTL FC HBA is installed on the Data
Domain system, the system can be connected to a Fibre Channel storage area network (SAN). The
backup or archive server sees the Data Domain system as one or multiple VTLs with up to 512 virtual
linear tape-open (LTO)-1, LTO-2, or LTO-3 tape drives and 20,000 virtual slots across up to 100,000
virtual cartridges.
Slide 36
Slide 37
Enterprise Manager
https://<DDHostName>/ddem
You need the sysadmin password to add a Data Domain system
Enterprise Manager summary screen
With the Enterprise Manager, you can manage one or more Data Domain systems. You can monitor and
add systems from the Enterprise Manager. (To add a system you need a sysadmin password.) You can
also view cumulative information about the systems you're monitoring.
A Data Domain system should be added to, and managed by, only one Enterprise Manager.
You can access the Enterprise Manager from many browsers:
Microsoft Internet Explorer
Google Chrome
Mozilla Firefox
The Summary screen presents a status overview of, and cumulative information for, all managed
systems in the DD Network devices list and summarizes key operating information. The System Status,
Space Usage, and Systems panes provide key factors to help you recognize problems immediately and to
allow you to drill down to the system exhibiting the problem.
The tally of alerts and the charts of disk space that the Enterprise Manager presents enable you to
spot problems quickly.
Click the plus sign (+) next to the DD Network icon in the sidebar to expose the systems being managed
by the Enterprise Manager.
The Enterprise Manager includes tabs to help you navigate your way through administrative tasks. To
access the top- and sub-level tabs, shown in this slide, you must first select a system. In the lower pane
on the screen, you can view information about the system you selected. In this slide, a system has been
selected, and you can view details about it.
Slide 38
1. Keyboard
2. Video Port
3. Serial Port
4. eth0a
5. eth0b
The EMC Data Domain command line interface (CLI) enables you to manage Data Domain systems.
You can do everything from the CLI that you can do from the Enterprise Manager.
After the initial configuration, use the SSH or Telnet (if enabled) utilities to access the system remotely
and open the CLI.
The DD OS 5.2 Command Reference Guide provides information for using the commands to accomplish
specific administration tasks. Each command also has an online help page that gives the complete
command syntax. Help pages are available at the CLI using the help command. Any Data Domain system
command that accepts a list (such as a list of IP addresses) accepts entries separated by commas, by
spaces, or both.
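The list-separator rule can be illustrated with a short Python sketch (a hypothetical helper for clarity, not part of DD OS):

```python
import re

def parse_list(arg: str) -> list[str]:
    """Split a list-valued CLI argument on commas, spaces, or a mix,
    mirroring the DD OS rule described above."""
    return [tok for tok in re.split(r"[,\s]+", arg.strip()) if tok]

# Commas, spaces, or both produce the same entries:
assert parse_list("10.0.0.1,10.0.0.2 10.0.0.3") == [
    "10.0.0.1", "10.0.0.2", "10.0.0.3"
]
```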
Slide 39
Slide 40
Easy to integrate
Qualified with leading enterprise backup and archiving applications
Integrates easily into existing storage infrastructures
Module 1: Technology Overview
EMC Data Domain storage systems are traditionally used for disk backup, archiving, and disaster
recovery with high-speed, inline deduplication. An EMC Data Domain system can also be used for online
storage with additional features and benefits.
A Data Domain system can connect to your network via Ethernet or Fibre Channel connections. With an
Ethernet connection, the system can be accessed using the NDMP, DD Boost, CIFS and NFS protocols.
The Fibre Channel connection supports the VTL protocol.
EMC Data Domain implements deduplication in a special hardware device. Most Data Domain systems
have a controller and multiple storage units.
Data Domain systems use low-cost Serial Advanced Technology Attachment (SATA) disk drives and
implement a redundant array of independent disks (RAID) 6 in the software. RAID 6 is block-level
striping with double distributed parity. Data Domain systems use non-volatile random access memory
(NVRAM) to protect unwritten data. NVRAM is used to hold data not yet written to disk. Holding data
like this ensures that data is not lost in a power outage.
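As a rough illustration of how parity protects striped data, the sketch below implements single XOR parity and rebuilds one lost block. RAID 6 additionally computes a second ("Q") parity with Reed-Solomon coding so it can survive two simultaneous disk failures; that part is omitted here, and this code is a conceptual aid, not the DD OS implementation.

```python
from functools import reduce

def xor_parity(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length data blocks (the 'P' parity)."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def rebuild(surviving: list[bytes], parity: bytes) -> bytes:
    """XOR of the surviving blocks plus parity recovers the lost block."""
    return xor_parity(surviving + [parity])

stripe = [b"\x01\x02", b"\x0f\x0f", b"\xaa\x55"]
p = xor_parity(stripe)
# Lose the middle block and rebuild it from the survivors and parity:
assert rebuild([stripe[0], stripe[2]], p) == stripe[1]
```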
Slide 41
Module 1: Summary
Slide 1
This module covers basic administrative tasks on a Data Domain system. It includes the following
lessons:
Verifying Hardware
Managing System Access
Introduction to Monitoring a Data Domain System
Licensed Features
Upgrading a Data Domain System
Slide 2
As part of initially setting up your Data Domain system, you should verify that your hardware is installed
and configured correctly. This lesson covers verifying your hardware.
Slide 3
1. Click Maintenance
2. Click More Tasks
3. Select Launch Configuration Wizard
4. Follow all steps of the Configuration Wizard
The initial configuration of the Data Domain system will most likely be done using the Enterprise
Manager (EM) Configuration Wizard. The Enterprise Manager Configuration Wizard provides a graphical
user interface (GUI) that includes configuration options. After a network connection is configured (with
the CLI-based Configuration Wizard), you can use the Enterprise Manager Configuration Wizard to
modify or add configuration data. The Configuration Wizard performs an initial configuration; it does
not cover all configuration options, only what is needed for the most basic system setup. After
the initial configuration, you can use the Enterprise Manager or CLI commands to change or update the
configuration.
The Configuration Wizard consists of these sections: Licenses, Network, File system, System, CIFS, and
NFS. You can configure or skip any section. After completing the Configuration Wizard, reboot the Data
Domain system. Note: The file system configuration is not described here. Default values are acceptable
to most sites.
The Configuration Wizard enables you to quickly step through basic configuration options without
having to use CLI commands.
Slide 4
1. Click Maintenance
2. Verify the model number, DD OS version, system uptime, and serial number
After your Data Domain system is installed, you should verify that you have the correct model number,
DD OS version, and serial number to ensure that they match what you ordered.
The System page in the Enterprise Manager gives you important system information without requiring
you to enter multiple commands.
To verify your model number, system uptime, and serial number in the Enterprise Manager:
1. Click the Maintenance tab.
2. Verify the model number, DD OS version, system uptime, and serial number.
You can also use the system show command using the command line interface (CLI) to view system
options.
# system show all
Show all system information.
# system show modelno
Display the hardware model number of a Data Domain system.
Slide 5
1. Click Hardware
2. Click Storage
3. Check the Storage Status:
Green: All disks in the system are in good condition
Yellow: The system is operational, but there are problems that need to be corrected
Red: The system is not operational
After your Data Domain system is installed, you should verify that your storage is operational.
The Storage Status area of the page shows the current status of the storage (such as operational or
non-operational) and any active alerts (these can be clicked to view alert details). There are no
active alerts shown in this slide.
The status of a storage system can be:
Normal: System operational (green). All disks in the system are in good condition.
Warning: System operational (yellow). The system is operational, but there are problems that
need to be corrected. Warnings may result from a degraded RAID group, the presence of foreign
storage, or failed or absent disks.
Error: System non-operational (red). The system is not operational.
The Storage view provides a way of organizing the Data Domain system storage so disks can be viewed
by usage type (Active, Archive, Failed, and so on), operational status, and location. This includes internal
system storage and systems configured with external disk shelves. The status and inventory are shown
for all enclosures, disks, and RAID groups. The system is automatically scanned and inventoried so all
storage is shown in the Storage view.
1. Click the Hardware tab.
2. Click the Storage tab.
3. Verify the storage status.
From the command line, you can use the storage show command to display information about file
system storage.
# storage show {all | summary | tier {active | archive}}
Display information about file system storage. All users may run this command option.
Output includes the number of disk groups working normally and the number of degraded disk
groups. Details on disk groups undergoing, or queued for, reconstruction, are also shown when
applicable. The abbreviation N/A in the column Shelf Capacity License Needed indicates the
enclosure does not require a capacity license, or that part of the enclosure is within a tier and
the required capacity license for the entire enclosure has been accounted for.
Consult the DD OS 5.2 Command Reference Guide for more information on using the commands
referenced in this student guide.
Slide 6
1. Click Hardware
2. Click Storage
3. Click Overview
4. Expand Active Tier, Usable Disks, Failed/Foreign/Absent, or System Disks to view details
After your Data Domain system is installed, you should verify that your storage is operational and your
disk group status is normal. Also check the Disks Not In Use status.
The Storage view provides a way of organizing the Data Domain system storage so disks can be viewed
by usage type (Active, Archive, Failed, and so on), operational status, and location. This includes internal
system storage and systems configured with external disk shelves. The status and inventory are shown
for all enclosures, disks, and RAID groups. The system is automatically scanned and inventoried so all
storage is shown in the Storage view.
To view information about the Active Tier, Usable Disks, or Failed/Foreign/Absent disks, do the
following:
1. Click the Hardware tab.
2. Click the Storage tab.
3. Click the Overview tab.
4. Click Active Tier, Usable Disks, Failed/Foreign/Absent or System Disks to view details.
You can also use the command line interface (CLI) to display state information about all disks in an
enclosure (a Data Domain system or an attached expansion shelf), or LUNs in a Data Domain gateway
system using storage area network (SAN) storage using the disk show state command.
# disk show state
Display state information about all disks in an enclosure (a Data Domain system or an attached
expansion shelf), or LUNs in a Data Domain gateway system using storage area network (SAN)
storage.
Columns in the output display the disk state for each slot number by enclosure ID, the total
number of disks by disk state, and the total number of disks.
If a RAID disk group reconstruction is underway, columns for the disk identifier, progress, and
time remaining are also shown.
Consult the DD OS 5.2 Command Reference Guide for more information on using the commands
referenced in this student guide.
Slide 7
1. Click Hardware
2. Click Storage
3. Click Overview
4. Expand Active Tier
Disks in the active tier are currently marked as usable by the Data Domain file system.
Sections are organized by disks in use and disks not in use. If the optional archive feature is installed,
you can expand your view of the disk use in the active tier from the Storage Status Overview pane. You
can view both disks in use and disks not in use. In this example:
Disk Group: dg1
Status: Normal
Disk Reconstructing: N/A
Total Disks: 14
Disks: 3.1-3.14
You can also click the View Disks link to view individual disks.
Slide 8
Locating a Disk
1. Click Hardware
2. Click Storage
3. Click Disks
4. Select a disk and click Beacon to locate a disk
5. The Beaconing Disk dialog opens. Click Stop to close.
The Disks view lists all the system disks in a scrollable table with the following information.
Disk: The disk identifier. It can be:
The enclosure and disk number (in the form Enclosure.Slot).
A gateway disk (devn).
A LUN.
Status: The status of the disk (for example In Use, Spare).
Manufacturer/Model: The manufacturer's model designation. The display may include a model
ID or RAID type or other information depending on the vendor string sent by the storage array.
Firmware: The firmware level used by the third-party physical disk storage controller.
Serial Number: The manufacturer's serial number for the disk.
The Disks tab enables you to see the status of all disks and details on individual disks.
Use the radio buttons to select how the disks are viewed: by all disks, or by tier, or by disk group.
To locate (beacon) a disk (for example, when a failed disk needs to be replaced):
1. Click Hardware > Storage > Disks.
2. The Disks view appears.
3. Select a disk from the Disks table and click Beacon.
4. The Beaconing Disk dialog window appears, and the LED light on the disk begins flashing.
5. Click Stop to stop the LED from beaconing.
From the command line, you can use the disk show command to display a list of serial numbers of failed
disks in the Data Domain system. The disk beacon command will cause the LED that signals normal
operation to flash on the target disk.
# disk show failure-history
Display a list of serial numbers of failed disks in the Data Domain system.
# disk beacon <enclosure-id>.<disk-id>
Cause the LED that signals normal operation to flash on the target disk. Press Ctrl-C to stop the
flash. To check all disks in an enclosure, use the enclosure beacon command option.
Consult the DD OS 5.2 Command Reference Guide for more information on using the commands
referenced in this student guide.
Slide 9
1. Click Hardware
2. Click Storage
3. Click Overview
4. Expand Usable Enclosures
Usable enclosures are those that aren't incorporated into the file system yet.
The Usable Enclosures section enables you to view the usable disks within the expansion shelves on a
Data Domain system. You can also view the details of individual disks.
To view details about usable disks from the Enterprise Manager:
1. Select a system from the left navigation pane.
2. Click the Hardware tab.
3. View the status, which includes the disk:
Name
Status
Size
Manufacturer/model
Firmware
Serial number
From the command line, the disk show hardware command will display disk hardware information.
# disk show hardware
Display disk hardware information.
Consult the DD OS 5.2 Command Reference Guide for more information on using the commands
referenced in this student guide.
Slide 10
1. Click Hardware
2. Click Storage
3. Click Overview
4. Expand Failed/Foreign/Absent Disks
If there are any unusable disks, whether failed, foreign or absent, they will be displayed in this section.
The Failed/Foreign/Absent Disks section enables you to view failed, foreign, and absent Disks. You can
also view the details of individual disks.
To get the status on failed, foreign, and absent Disks in the Enterprise Manager:
1. Select a system from the left navigation pane.
2. Open the Failed/Foreign/Absent Disks panel.
3. View the following disk information:
Name
Status
Size
Manufacturer/model
Firmware
Serial number
Slide 11
1. Click Hardware
2. Click Chassis
The Chassis view provides a block drawing of the chassis and its components: disks, fans, power
supplies, NVRAM, CPUs, memory, and so on. The components that appear depend on the Data Domain
system model.
The chassis view enables you to check the hardware status.
To view your chassis status in the Enterprise Manager:
1. Click the Hardware tab.
2. Click Chassis.
From here you can view the following by hovering your mouse over them:
NVRAM
PCI slots
SAS
Power supply
PS fan
Riser expansion
Temperature
Fans
Front and back chassis views
Using the command line interface (CLI), you can check system statistics for the time period since the last
reboot using the system show stats command. The system show hardware command will display
information about slots and vendors and other hardware in a Data Domain system. Consult the DD OS
5.2 Command Reference Guide for more information on using the commands referenced in this student
guide.
Slide 12
Slide 13
This lesson covers user privileges, administration access, and user administration.
Slide 14
Only the sysadmin user can create the first security officer. After
the first security officer is created, only security officers can
create or modify other security officers.
Sysadmin is the default admin user and cannot be deleted or
modified.
The first security-officer account cannot be deleted.
To enhance security, each user can be assigned a different role. Roles enable you to restrict system
access to a set of privileges. A Data Domain system supports the following roles:
Admin
Allows one to administer, that is, configure and monitor, the entire Data Domain system.
User
Allows one to monitor Data Domain systems and perform the fast copy operation.
Security
In addition to the user role privileges, allows one to set up security officer configurations and
manage other security officer operators.
Backup-operator
In addition to the user role privileges, allows one to create snapshots, import and export tapes
to a VTL library and move tapes within a VTL library.
Data-access
Intended for DD Boost authentication, an operator with this role cannot monitor or configure a
Data Domain system.
Note: The available roles display based on the users role. Only the Sysadmin user can create the first
security officer. After the first security officer is created, only security officers can create or modify other
security officers. Sysadmin is the default admin user and cannot be deleted or modified.
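The role descriptions above amount to a simple privilege table. The sketch below is hypothetical: the privilege names are ours for illustration, not DD OS identifiers, and real enforcement happens inside the system.

```python
# Privileges implied by each role, paraphrased from the descriptions above
# (illustrative only; not a DD OS data structure).
ROLE_PRIVILEGES = {
    "admin":           {"configure", "monitor"},
    "user":            {"monitor", "fastcopy"},
    "security":        {"monitor", "fastcopy", "security-config"},
    "backup-operator": {"monitor", "fastcopy", "snapshot", "vtl-tape-ops"},
    "data-access":     {"dd-boost-auth"},   # cannot monitor or configure
}

def allowed(role: str, privilege: str) -> bool:
    """Check whether a role carries a given privilege."""
    return privilege in ROLE_PRIVILEGES.get(role, set())

assert allowed("backup-operator", "snapshot")
assert not allowed("data-access", "monitor")
```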
Slide 15
Access Management
In the Access Management tab, you can create and manage users.
Managing users enables you to name the user, grant them privileges, make them active, disabled or
locked, and find out if and when they were disabled. You can also find out the user's last login location
and time.
To create new users, follow these steps:
1. Click the System Settings > Access Management > Local Users tabs.
The Local Users view appears.
2. Click the Create button to create a new user.
The Create User dialog box appears.
3. Enter the following information in the General Tab:
User: The user ID or name.
Password: The user password. Set an initial password (the user can change it later).
Verify Password: The user password, again.
Role: The role assigned to the user.
Slide 16
1. Click System Settings
2. Click Access Management
3. Select Administrator Access
4. Expand More Tasks
5. Select a protocol to configure
As an administrator, you need to view and configure services that provide administrator and user access
to a Data Domain system. The services include:
Telnet: Provides access to a Data Domain system through a Telnet connection.
FTP: Provides access to a Data Domain system through an FTP connection.
HTTP/HTTPS: Provides access to a Data Domain system through an HTTP connection, an HTTPS
connection, or both.
SSH: Provides access to a Data Domain system through an SSH connection.
Managing administration access protocols enables you to view and manage how other administrators
and users access a Data Domain system.
To provide access to a Data Domain system through an HTTP connection, an HTTPS connection, or both:
1. On the Access Management page, select Configure HTTP/HTTPS from the More Tasks menu.
The Configure HTTP/HTTPS Access dialog box appears.
2. To enable HTTP and/or HTTPS access, click the checkbox for Allow HTTP Access and/or the Allow
HTTPS Access.
3. Determine how hosts connect:
To allow complete access, click the Allow all hosts to connect radio button.
To configure specific hosts, click the Limit Access to the following systems radio button, and
click the appropriate icon in the Allowed Hosts pane. A hostname can be a fully qualified
hostname or an IP address.
To add a host, click the plus button (+). Enter the hostname, and click OK.
To modify a hostname, click the checkbox next to the hostname in the Hosts list, and
click the edit button (pencil). Change the hostname, and click OK.
To remove a hostname, click the checkbox of the hostname in the Hosts list, click the
minus button (-), and click OK.
4. To configure system ports and session timeout values, click the Advanced tab.
In the HTTP Port text entry box, enter the port for connection.
Port 80 is assigned by default.
In the HTTPS Port text entry box, enter the port for the connection.
Port 443 is assigned by default.
In the Session Timeout text entry box, enter the interval in seconds that must elapse before
the connection closes.
10800 seconds (3 hours) is assigned by default.
Note: Click Default to return the setting back to the default value.
5. Click OK.
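The session-timeout setting above behaves like a simple idle-expiry check. A minimal sketch under that assumption (not the actual server logic):

```python
DEFAULT_TIMEOUT = 10_800  # seconds: the 3-hour default noted above

def session_expired(last_activity: float, now: float,
                    timeout: int = DEFAULT_TIMEOUT) -> bool:
    """True once the idle interval exceeds the configured timeout,
    at which point the connection closes."""
    return now - last_activity > timeout

assert not session_expired(0.0, 10_799.0)  # still within 3 hours
assert session_expired(0.0, 10_801.0)      # idle too long
```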
To provide access to a Data Domain system through an SSH connection:
1. On the Access Management page, select Configure SSH from the More Tasks menu.
The Configure SSH Access dialog box appears.
2. To enable SSH access, click the Allow SSH Access checkbox.
3. Determine how hosts connect:
To allow complete access, click the Allow all hosts to connect radio button.
To configure specific hosts, click the Limit Access to the following systems radio button, and
click the appropriate icon in the Allowed Hosts pane. A hostname can be a fully qualified
hostname or an IP address.
To add a host, click the plus button (+). Enter the hostname, and click OK.
To modify a hostname, click the checkbox of the hostname in the Hosts list, and click the
edit button (pencil). Change the hostname, and click OK.
To remove a hostname, click the checkbox of the hostname in the Hosts list, click the
minus button (-), and click OK.
4. Click OK.
Using the command line interface (CLI), the adminaccess command can be used to allow remote
hosts to use the FTP, Telnet, HTTP, HTTPS, and SSH administrative protocols on the Data Domain system.
Slide 17
Slide 18
This lesson covers the basics of monitoring a Data Domain system, including log file locations, settings
and alerts.
Slide 19
Log Files
/ddvar
messages
space.log
/log
ddfs.info
vtl.info
perf.log
messages.engineering
/debug
/cifs
/ost
/platform
cifs.log
join_domain.log
ost.log
kern.info
The Data Domain system logs system status messages hourly. Log files can be bundled and sent to Data
Domain Support to provide the detailed system information that aids in troubleshooting any system
issues that may arise.
The Data Domain system log file entries contain messages from the alerts feature, autosupport reports,
and general system messages. The log directory is /ddvar/log.
Every Sunday at 3 a.m., the Data Domain system automatically opens new log files and renames the
previous files with an appended number of 1 through 9, such as messages.1. Each numbered file is
rolled to the next number each week. For example, at the second week, the file messages.1 is rolled
to messages.2. If a file messages.2 already existed, it rolls to messages.3. An existing
messages.9 is deleted when messages.8 rolls to messages.9.
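The weekly roll described above can be modeled in a few lines. This sketch captures only the numbering scheme, not the actual DD OS implementation:

```python
def rotate(files: dict[int, str]) -> dict[int, str]:
    """Model the weekly roll: the current log (key 0, 'messages') becomes
    messages.1, messages.1 becomes messages.2, and so on; an existing
    messages.9 is deleted because messages.8 rolls onto it."""
    return {n + 1: name for n, name in files.items() if n <= 8}

logs = {0: "current-week", 8: "eight-weeks-old", 9: "nine-weeks-old"}
assert rotate(logs) == {1: "current-week", 9: "eight-weeks-old"}
```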
The /ddvar/log folder includes files related to troubleshooting. Only relevant files or folders are
listed. The CLI command to view logs is log view [filename].
Use the Enterprise Manager to view the system log files in /ddvar/log.
1. Maintenance > Logs
2. Click the file you want to view.
The ddvar folder contains other log files that you cannot view through log commands or from the
Enterprise Manager.
To view all Data Domain system log files, create a ddvar share (CIFS) or mount the ddvar folder (NFS).
Contents of listed log files:
messages: Messages from the alerts, autosupport reports, and general system messages
space.log: Messages about disk space used by Data Domain system components and data
storage, and messages from the cleaning process
ddfs.info: Debugging information created by the file system processes
vtl.info: VTL information messages
perf.log: Performance statistics used by Data Domain support staff for system tuning
cifs.log: CIFS information messages
join_domain.log: Active directory information messages
ost.log: System information related to DD Boost
messages.engineering: Engineering-level messages related to the system
kern.info: Kernel information messages
You can also view log files from the command line using the following commands:
# log list
List top level or debug files in the log directory
# log view
View the system log or another log file
# log watch
Watch the system log or another log file in real time
Slide 20
Autosupport logs and alert messages help solve and prevent potentially crippling Data Domain system
problems.
Autosupport alert files provide timely notification of significant issues. Autosupport sends system
administrators, as well as Data Domain Support (when configured), a daily report of system information
and consolidated status output from a number of Data Domain system commands and entries from
various log files. Included in the report are extensive and detailed internal statistics and log information
to aid Data Domain Support in identifying and debugging system problems.
Autosupport logs are sent by email as simple text. Autosupport log distribution can be scheduled, with
the default time being 6:00 a.m.
During normal operation, a Data Domain system may produce warnings or encounter failures of which
administrators must be informed immediately. This communication is performed by means of an alert.
Alerts are sent out to designated individuals or groups so appropriate actions can be taken promptly.
Alerts are sent as email in two forms: one is an immediate email for an individual alert to subscribers set
via the notification settings. The other is sent as a cumulative Daily Alert Summary email that is logged
on the Current Alerts page. These summaries are sent daily at 8:00 a.m. and report any critical events
that might be occurring on the system.
Autosupport logs and alert messages:
Report the system status and identify potential system problems
Provide daily notification of the system's condition
Send email notifications to specific recipients for quicker, targeted responses
Supply critical system data to aid support case triage and management
Slide 21
autosupport@autosupport.datadomain.com via SMTP
summary alert report
detailed autosupport report
System History: /ddvar/support
reboots, warnings, reports
integration to other systems
Each autosupport report is plain text and can be rather large, depending on your system
configuration.
The autosupport file contains a great deal of information on the system. The file includes general
information, such as the DD OS version, System ID, Model Number and Uptime, as well as information
found in many of the log files.
Autosupport logs are stored in the Data Domain system in /ddvar/support. Autosupport contents
include:
system ID
uptime information
system command outputs
runtime parameters
logs
system settings
status and performance data
debugging information
103
By default, the full autosupport report is emailed daily at 6:00 a.m. A second report, the autosupport
alert summary, is sent daily at 8:00 a.m.
A Data Domain system can send autosupport reports, if configured, to EMC Data Domain via SMTP to
the autosupport data warehouse within EMC. Data Domain captures the above files and stores them by
Data Domain serial number in the data warehouse for reference when needed for troubleshooting that
system. Autosupport reports are also a useful resource for Data Domain Technical Support to assist in
researching any cases opened against the system.
The autosupport function also sends alert messages to report anomalous behaviors such as reboots,
serious warnings, a failed disk, a failed power supply, or a nearly full system. For more serious issues,
such as system reboots and failed hardware, these messages can be configured to be sent to Data Domain
and to automatically create cases so Support can proactively take action on your behalf.
Autosupport requires SMTP service to be active on the Data Domain system pointing to a valid email
server over a connection path to the Internet.
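As a rough illustration of the SMTP delivery path described above, the following Python sketch builds an autosupport-style plain-text email report. The hostname, sender address, and report body are hypothetical placeholders; a real Data Domain system hands the message to its configured mail server.

```python
from email.message import EmailMessage

def build_autosupport_email(serial, report_text):
    """Build a plain-text autosupport-style report email.
    Sender address and subject format are illustrative, not DD OS behavior."""
    msg = EmailMessage()
    msg["From"] = f"autosupport@dd-{serial}.example.com"   # hypothetical sender
    msg["To"] = "autosupport@autosupport.datadomain.com"
    msg["Subject"] = f"Autosupport report for system {serial}"
    msg.set_content(report_text)  # autosupport logs are simple text
    return msg

msg = build_autosupport_email("ABC1234", "DD OS version: 5.2\nUptime: 42 days\n")
# Delivery would use smtplib.SMTP(<your mail server>).send_message(msg)
```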
104
Slide 22
Configure Autosupport
1. Click Maintenance
2. Click Support
3. Select Autosupport
4. Add or remove additional subscribers to the autosupport mailing list
5. Enable or disable notifications
22
In the Enterprise Manager, you can add, delete, or edit email subscribers by clicking Configure in the
Autosupport Mailing List Subscribers area of the Autosupport tab.
Autosupport subscribers receive daily detailed reports. Using SMTP, autosupports are sent to Data
Domain Technical Support daily at 6 a.m. local time. This is the default setting.
View any of the collection of Autosupport reports in the Autosupport Report file listing by clicking the
file name. You are then prompted to download the file locally. Open the file for reading in a standard
web browser for convenience.
105
You can also use the command line interface (CLI) to configure Autosupports. Consult the DD OS 5.2
Command Reference Guide for more information on using the commands referenced in this student
guide.
# autosupport disable support-notify
Disables the sending of the Daily Alert Summary and the Autosupport Report to Data Domain
Support.
# autosupport enable support-notify
Enables the sending of the Daily Alert Summary and the Autosupport Report to Data Domain
Support.
# autosupport add
Adds entries to the email list for the Daily Alert Summary or the Autosupport Report.
# autosupport del
Deletes entries from the email list for the Daily Alert Summary or the Autosupport Report.
# autosupport set schedule
Schedules the Daily Alert Summary or the Autosupport Report. For either report, the most
recently configured schedule overrides the previously configured schedule.
# autosupport show
Displays autosupport configuration.
# autosupport show schedule
Displays the schedules for the Daily Alert Summary and the Autosupport Report.
106
Slide 23
Alerts
1. Click Status
2. Click Alerts
3. Select Notification
4. Click Add
5. Add a group name and set appropriate attributes
23
Alerts are notification messages generated by a Data Domain system if an undesirable event occurs.
A configured Data Domain system sends an alert immediately via email to any list of subscribers. Higher-level alerts can be sent automatically to EMC Data Domain Support for tracking.
If Data Domain Support receives a copy of the message, and depending on the nature of the event, a
support case is generated, and a Technical Support Engineer proactively tries to resolve the issue as
soon as possible.
107
Alert notification groups allow flexibility in notifying the responsible parties who provide maintenance
to a Data Domain system. Individual subscribers can be targeted for specific types of alerts. Instead of
sending alerts to every subscriber for every type of problem, a sysadmin can configure groups of
contacts related to types of issues. For example, you can create an environment alert notification group
for team members who are responsible for data center facilities, and power to the system. When the
system creates a specific, environment-related alert, only those recipients for that class of alerts are
contacted.
System administrators can also set groups according to the seriousness of the alert.
Set alert notification groups in Status > Alerts > Notifications tab.
After a group is created, you can configure the Class Attributes pane to modify the types and severity of
the alerts this group should receive. In the Subscribers pane, you can modify a list of recipient email
addresses belonging to this group.
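The class-and-severity filtering described above can be modeled with a small Python sketch. The group names, alert classes, and severity levels below are hypothetical, not DD OS values; the point is only how a class-based notification group limits who receives a given alert.

```python
# Hypothetical model of alert notification groups: each group subscribes to
# certain alert classes at or above a minimum severity, so only the relevant
# recipients are emailed for a given alert.
SEVERITY = {"info": 0, "warning": 1, "critical": 2}

groups = {
    "environment-team": {
        "classes": {"power", "temperature"},
        "min_severity": "warning",
        "subscribers": ["facilities@example.com"],
    },
    "storage-admins": {
        "classes": {"filesystem", "disk"},
        "min_severity": "info",
        "subscribers": ["admins@example.com"],
    },
}

def recipients_for(alert_class, severity):
    """Collect subscribers from every group matching the class and severity."""
    out = []
    for g in groups.values():
        if alert_class in g["classes"] and \
                SEVERITY[severity] >= SEVERITY[g["min_severity"]]:
            out.extend(g["subscribers"])
    return out
```

For example, a critical power alert reaches only the environment team, while an informational power event is suppressed because it falls below that group's minimum severity.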
You can also use the command line interface (CLI) to configure alert notifications.
# alerts notify-list create
Creates a notification list and subscribes to events belonging to the specified list of classes and
severity levels.
# alerts notify-list add
Adds to a notification list and subscribes to events belonging to the specified list of classes and
severity levels.
# alerts notify-list del
Deletes members (classes or email addresses) from a notification list.
# alerts notify-list destroy
Destroys a notification list.
# alerts notify-list reset
Resets all notification lists to the factory default.
# alerts notify-list show
Shows the notification list configuration.
# alerts notify-list test
Sends a test notification to a notification list.
Consult the DD OS 5.2 Command Reference Guide for more information on using the commands
referenced in this student guide.
108
Slide 24
24
Within the EMC Data Domain support portal, you can access and view autosupports, alert messages,
and alert summaries sent by a Data Domain system. Only systems sending autosupport information to
Data Domain are presented through the support portal.
When reviewing your systems, you see a list of systems and their maintenance status. The symbols used
on this web page reflect maintenance contract status and do not reflect the operational status of the
machine.
A maintenance alert is a red disk icon with a white X. It indicates that a maintenance contract has
expired. An amber triangle with a white exclamation point indicates that maintenance is nearing
expiration.
Select a line item in the list of available Data Domain systems, and you are presented with information
about your support contract, including its expiration date and a link to renew the contract.
109
Slide 25
25
When you click View Space Plot, a graph of space usage appears, built from cumulative autosupport
data. The space plot page includes a link to view detailed tabular data.
Within the autosupport archive, you see autosupports, alerts, alert summaries, and reboot notifications
for a given system. Autosupports can be listed in the support portal, showing only the most recent of
each type of autosupport, or a list of all autosupports of a single type, or all autosupports of all types.
110
Slide 26
SNMP
[Diagram: the SNMP agent (snmpd) on the Data Domain system sends trap packets, identified by a
community string (v2c) or an authenticated user with privacy (v3), to the SNMP management console,
which interprets OIDs such as 1.3.6.1.4.1.19746.2.0.1 through the DATA-DOMAIN-MIB as named alarms
such as powerSupplyFailedAlarm.]
26
The Simple Network Management Protocol (SNMP) is an open-standard protocol for exchanging
network management information, and is a part of the Transmission Control Protocol/Internet Protocol
(TCP/IP) protocol suite. SNMP provides a tool for network administrators to monitor and manage
network-attached devices, such as Data Domain systems, for conditions that warrant administrator
attention.
In typical SNMP uses, one or more administrative computers, called managers, have the task of
monitoring or managing a group of hosts or devices on a computer network. Each managed system
executes, at all times, a software component called an agent that reports information via SNMP to the
manager.
Essentially, SNMP agents expose management data on the managed systems through object IDs (OIDs).
The protocol also permits active management tasks, such as modifying and applying a new
configuration, through remote modification of these variables. In the case of Data Domain systems,
active management tasks are not supported. The data contained in the OIDs are called variables, and are
organized in hierarchies. These hierarchies, and other metadata (such as type and description of the
variable), are described by Management Information Bases (MIBs).
111
The SNMP agent residing on the Data Domain system transmits OID traps, which are messages from the
system indicating a change of system state in the form of a very basic OID code (for example,
1.3.6.1.4.1.19746.2.0.1). The management system, running the snmp daemon, interprets the OID
through the Data Domain MIB and generates the alert message on the SNMP management console (for
example, powerSupplyFailedAlarm).
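The OID-to-name translation step can be sketched in a few lines of Python. The single mapping below is the one given in the text; a real MIB defines many more entries, and this lookup table is an illustrative stand-in for proper MIB parsing.

```python
# Minimal sketch of how a management console resolves a raw trap OID to a
# human-readable alarm name via a MIB. Only the example OID from the text
# is included; real MIB handling is far richer than a flat dictionary.
DATA_DOMAIN_MIB = {
    "1.3.6.1.4.1.19746.2.0.1": "powerSupplyFailedAlarm",
}

def interpret_trap(oid):
    """Translate a trap OID into its MIB name, or flag it as unknown."""
    return DATA_DOMAIN_MIB.get(oid, f"unknown trap ({oid})")
```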
DD OS supports two forms of SNMP authentication, each in a different SNMP version. In SNMP version 2
(v2), each SNMP management host and agent belongs to an SNMP community: a collection of hosts
grouped together for administrative purposes. Which computers belong to the same community is
generally, but not always, determined by their physical proximity.
Communities are identified by the names you assign them. A community string can be thought of as a
password shared by SNMP management consoles and managed computers. Set hard-to-guess
community strings when you install the SNMP service. There is little security as none of the data is
encrypted.
SNMP version 3 (v3) uses individual users instead of communities, with authentication (MD5 or SHA1)
and AES or DES privacy.
When an SNMP agent receives a message from the Data Domain system, the community string or user
authentication information contained in the packet is verified against the agent's list of acceptable users
or community strings. After the name is determined to be acceptable, the request is evaluated against
the agent's list of access permissions for that community. Access can be set to read-only or read-write.
System status information can be captured and recorded for the system that the agent is monitoring.
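The community-string check described above can be modeled with a short Python sketch. The community strings and access levels are hypothetical examples; the logic simply mirrors the accept-then-check-permissions flow.

```python
# Hypothetical model of SNMP v2c community checking: a request is accepted
# only if its community string is on the agent's list, and a write operation
# additionally requires read-write access. Strings here are illustrative.
communities = {"s3cr3t-public": "read-only", "s3cr3t-admin": "read-write"}

def check_access(community, wants_write):
    """Return True if the community exists and permits the operation."""
    perm = communities.get(community)
    if perm is None:
        return False          # unknown community string: reject
    return perm == "read-write" or not wants_write
```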
You can integrate the Data Domain management information base into SNMP monitoring software, such
as EMC NetWorker or Data Protection Advisor. Refer to your SNMP monitoring software administration
guide for instructions on how to integrate the MIB into your monitoring software and for recommended
practices. SNMP management systems monitor the system by maintaining an event log of reported
traps.
112
Slide 27
SNMP
27
You can download the Management Information Base (MIB) file from the Enterprise Manager by
navigating to System Settings > General Configuration > SNMP and clicking the Download MIB file
button. You can also download the MIB files from the /ddvar/snmp directory.
Install the MIB file according to the instructions of your management server.
The default port that is open when SNMP is enabled is port 161. Traps are sent out through port 162.
Configure either SNMP V3 or V2C in the same window. Follow the instructions for your SNMP
management software to ensure proper set-up and communication between the management console
and the Data Domain system.
Consult the EMC Data Domain Operating System 5.2 Command Reference Guide for the full set of MIB
parameters included in the Data Domain MIB branch.
113
Slide 28
[Diagram: system messages are sent over the LAN on port 514 to a syslog server, which collects the
logs. Syslog can be configured using only the command line interface (CLI).]
28
Some log messages can be sent from the Data Domain system to other systems. DD OS uses syslog to
publish log messages to remote systems.
In a Data Domain system, the remote logging feature uses UDP port 514.
You can configure a Data Domain system to send system messages to a remote syslog server.
A Data Domain system exports the following facility.priority selectors for log files. For
information on managing the selectors and receiving messages on a third-party system, see your
vendor-supplied documentation for the receiving system.
The log host commands manage the process of sending log messages to another system:
*.notice: Sends all messages at the notice priority and higher.
*.alert: Sends all messages at the alert priority and higher (alerts are included in *.notice).
kern.*: Sends all kernel messages (kern.info log files).
local7.*: Sends all messages from system startups (boot.log files).
Syslog can be configured using only the command line interface (CLI) with the Data Domain system.
114
115
Slide 29
116
29
Slide 30
30
This lesson covers the basics of adding licensed features to, and removing optional licenses from, a Data
Domain system.
117
Slide 31
DD Boost
Replication
Retention Lock Governance
Retention Lock Compliance
VTL (Virtual Tape Library)
Encryption of Data at Rest
Expansion Storage
Shelf Capacity
Level 2
Gateway Expanded Storage
Level 3
DD Extended Retention
(Formerly DD Archiver)
Global Deduplication Array
(GDA)
Nearline
31
DD Boost
Allows a system to use the Boost interface on a Data Domain system.
Replication
Adds the Data Domain Replicator for replication of data from one Data Domain system to
another.
Retention Lock Governance
Protects selected files from modification and unscheduled deletion, that is, deletion before a
specified retention period has expired.
Retention Lock Compliance
Allows you to meet the strictest data retention requirements from regulatory standards such as
SEC17a-4.
VTL (Virtual Tape Library)
Allows backup software to see a Data Domain system as a tape library.
Encryption of Data at Rest
Allows data on system drives or external storage to be encrypted while being saved, and then
locked before moving to another location.
118
Expansion Storage
Allows the upgrade of capacity for the Data Domain system. Enables either the upgrade of a 9-disk DD510/DD530 to 15 disks, or the upgrade of a 7-disk DD610/DD630 to 12 disks.
Shelf Capacity
Allows ES30 and ES20 (purchased for use with DD OS 5.1) external shelves to be added to the
Data Domain system for additional capacity.
Gateway Expanded Storage Level 2
Enables gateway systems to support up to 71 TB of usable capacity.
Gateway Expanded Storage Level 3
Enables gateway systems to support up to 145 TB of usable capacity.
DD Extended Retention (formerly DD Archiver)
Provides long-term backup retention on the DD860 and DD990 platforms.
Global Deduplication Array (GDA)
Licenses the global deduplication array.
Nearline
Identifies systems deployed for archive and nearline workloads.
119
Slide 32
Managing Licenses
1. Click System Settings
2. Click Licenses
3. Click Add Licenses to add licenses
4. Select one or more licenses from the list, then click Delete Selected Licenses to remove licenses
32
You can check which licenses are enabled on your Data Domain system using the Enterprise Manager.
1. In the Navigational pane, expand the DD Network and select a system.
2. Click the System Settings > Licenses tabs.
The Feature Licenses pane appears, showing the list of license keys and features.
You can also use the command line interface (CLI) to check which licenses are enabled by using the
license show command. If the local argument is included in the option, output includes details on
local nodes only.
To add a feature license using the Enterprise Manager:
1. In the Feature Licenses pane, click Add Licenses.
The Add Licenses dialog box displays.
2. In the License Key text box, type or paste one or more license keys, each on its own line or
separated by a space or comma (and they will be automatically placed on a new line).
3. Click Add.
120
The added licenses display in the Added license list. If there are errors, they will be shown in the error
license list. Click a license with an error to edit the license, and click Retry Failed License(s) to retry the
key. Otherwise, click Done to ignore the errors and return to the Feature Licenses page.
You can also add one or more licenses for features and storage capacity using the command line
interface (CLI). Include dashes when entering the license codes. This command option may run on a
standalone Data Domain system or on the master controller of a Global Deduplication Array.
# license add <license-code> [<license-code> ...]
Example
# license add ABCD-DCBA-AABB-CCDD BBCC-DDAA-CCAB-AADD-DCCB-BDAC-E5
Added "ABCD-DCBA-AABB-CCDD" : REPLICATION feature
Added "BBCC-DDAA-CCAB-AADD-DCCB-BDAC-E5" : CAPACITY-ARCHIVE feature
for 6TiB capacity ES20
To remove one or more feature licenses using the Enterprise Manager:
In the Feature Licenses pane, click a checkbox next to one or more licenses you wish to
remove and click Delete Selected Licenses.
In the Warning dialog box, verify the license(s) to delete and click OK.
The licenses are removed from the license list.
You can also use the command line interface (CLI) to delete one or more software option licenses. In a
GDA configuration, run this command on the master controller.
Security officer authorization is required to delete licenses from Retention Lock Compliance systems
only.
You can also use the license del command to remove licenses from the command line.
Example
# license del EEFF-GGHH-JJII-LLKK MMNN-OOQP-NMPQ-PMNM STXZ-ZDYS-GSSG-BBAA
License code "EEFF-GGHH-JJII-LLKK" deleted.
License code "MMNN-OOQP-NMPQ-PMNM" deleted.
License code "STXZ-ZDYS-GSSG-BBAA" deleted.
If you need to remove all licenses at once using the command line interface (CLI) you can use the
license reset command. This command option requires security officer authorization if removing
licenses from Retention Lock Compliance systems. Licenses cannot be reset on a Global Deduplication
Array.
121
Slide 33
122
33
Slide 34
34
Upon completion of this module, you should be able to describe the upgrade process for a Data Domain
system.
This lesson covers the following topics:
Preparing for a DD OS upgrade
Downloading the upgrade file
Using release notes to prepare for an upgrade
Performing the upgrade process
123
Slide 35
DD OS Releases
Release Types
RA, IA, and GA
35
Restricted Availability (RA)
An RA release is not available to all Data Domain system owners as a general download. It can
be obtained only through the appropriate EMC Data Domain Sales or Support team approvals.

Initial Availability (IA)
An IA release is available as a download on the Data Domain support website and is intended for
production use by customers who need any of the new features or bug fixes contained in the
release.
General Availability (GA)
A GA release is available as a download on the Data Domain Support website and is intended for
production use by all customers. Any customer running an earlier Data Domain operating
system release, GA release or non-GA release, should upgrade to the latest GA release.
124
To ensure consistency in how we introduce our software, all release types move through the RA, IA, and
GA progression in a similar fashion. This allows customers to evaluate the releases using similar
standards. Data Domain recommends that you track Data Domain OS releases deployed in your backup
environment. It is important that the backup environment run the most current, supported releases.
Minimize the number of different deployed release versions in the same environment. As a general rule,
you should upgrade to the latest GA release of a particular release family. This ensures you are running
the latest version that has achieved our highest reliability status.
When RA or IA status releases are made available for upgrade, carefully consider factors such as the
backup environment, the feature improvements that are made to the release, and the potential risks of
implementing releases with less customer run-time than a GA release. Depending on these factors, it
might make sense to wait until a release reaches GA status.
There is no down-grade path to a previous version of the Data Domain operating system (DD OS). The
only method to revert to a previous DD OS version is to destroy the file system and all the data
contained therein, and start with a fresh installation of your preferred DD OS.
Caution: REVERTING TO A PREVIOUS DD OS VERSION DESTROYS ALL DATA ON THE DATA DOMAIN
SYSTEM.
Before upgrading:
Read all pertinent information contained in the release notes for the given upgrade version.
If you have questions or need additional information about an upgrade, contact EMC Data
Domain Support before upgrading for the best advice on how to proceed.
125
Slide 36
Why Upgrade?
Replication pairs should run the same version of DD OS
Compatibility is ensured with your backup host software
Unexpected system behavior can be corrected
36
It is not always essential, but it is wise, to maintain a Data Domain system with the current versions of
the OS. With the newest version of the Data Domain operating system, you can be sure that you have
access to all features and capabilities your system has to offer.
When you add newer Data Domain systems to your backup architecture, a newer version of DD
OS is typically required to support hardware changes such as remote-battery NVRAM, or when
adding the newer ES30 expansion shelf.
Data Domain Support recommends that systems paired in a replication configuration all have
the same version of DD OS.
Administrators upgrading or changing backup host software should always check the minimum
DD OS version recommended for a version of backup software in the Backup Compatibility
Guide. This guide is available in the EMC Data Domain support portal. Often, newer versions of
backup software are supported only with a newer version of DD OS. Always use the version of
the Data Domain operating system recommended by the backup software used in your backup
environment.
No software is free of flaws, and EMC Data Domain works continuously to improve the
functionality of the DD OS. Each version release has complete Release Notes that identify bug
fixes by number and what was fixed in the version.
126
Slide 37
Considerations
Are you upgrading more than two release families at a time?
4.7 to 4.9 is considered two families
4.9 to 5.2 is more than two families and requires two upgrades
Time required
Single upgrades can take 45 minutes or more
During the upgrade, the Data Domain file system is unavailable
Shutting down processes, rebooting after upgrade, and checking the
upgrade all take time
Replication
Do not disable replication on either system in the pair
Upgrade the destination (replica) before upgrading the source
(originator)
The system should be idle before beginning the upgrade
37
An upgrade to release 5.2 can be performed only from systems using release families 5.0 or 5.1.
Typically when upgrading DD OS, you should upgrade only two release families at a time (for example,
4.7 to 4.9, or 4.8 to 5.0). In order to upgrade to release 5.2 from a release family earlier than 4.7, you must upgrade in
steps. If you are more than two release families behind, contact EMC Data Domain Support for advice on
the intermediate versions to use for your stepped upgrade.
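The two-families-per-hop rule can be sketched as a simple upgrade planner. The release-family ordering below is assumed from the versions mentioned in the text, and real upgrade paths should always be confirmed with the release notes or EMC Data Domain Support.

```python
# Hypothetical planner for stepped DD OS upgrades: at most two release
# families per upgrade, as described above. The family ordering is assumed.
FAMILIES = ["4.7", "4.8", "4.9", "5.0", "5.1", "5.2"]

def upgrade_path(current, target):
    """Return the sequence of intermediate upgrade targets, hopping at most
    two release families at a time."""
    i, j = FAMILIES.index(current), FAMILIES.index(target)
    path = []
    while i < j:
        i = min(i + 2, j)       # advance at most two families per upgrade
        path.append(FAMILIES[i])
    return path
```

For instance, going from 4.9 to 5.2 requires two upgrades (first to 5.1, then to 5.2), matching the slide's note that 4.9 to 5.2 is more than two families.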
Make sure you allocate appropriate system downtime to perform the upgrade. Set aside enough time to
shut down processes prior to the upgrade and for spot-checking the upgraded system after completing
the upgrade. The actual upgrade should take no longer than 45 minutes; adding the time to shut down
processes and to check the upgraded system, the full procedure might take 90 minutes or more.
Double this time if you are upgrading more than two release families.
For replication users: Do not disable replication on either side of the replication pair. After it is back
online, replication automatically resumes service.
You should upgrade the destination (replica) before you upgrade the source Data Domain system.
Be sure to stop any client connections before beginning the upgrade.
127
Slide 38
128
38
Slide 39
39
When you have the new DD OS upgrade package downloaded locally, you can upload it to the Data
Domain system with the Data Domain Enterprise Manager:
1. Click Upload Upgrade Package and browse your local system until you find the upgrade package
you downloaded from the support portal.
2. Click OK.
The file transfers to the Data Domain system. The file is now in the list of available upgrade packages.
To perform a system upgrade:
1. Select the upgrade package you want to use from the list of available upgrade packages.
2. Click Perform System Upgrade.
The upgrade proceeds.
When the upgrade is complete, the system reboots automatically. Log in to the Data Domain Enterprise
Manager to resume administrative control of the Data Domain system.
129
Slide 40
Module 2: Summary
130
40
Slide 1
This module focuses on managing network interfaces. It includes the following lessons:
Configuring Network Interfaces
Link Aggregation
Link Failover
VLAN and IP Alias Interfaces
This module also includes a lab, which will enable you to test your knowledge.
131
Slide 2
This lesson covers configuring network interfaces. To do this, you need to know how to manage network
settings and routes, and how to create and configure static routes.
132
Slide 3
1. Click Hardware
2. Click Network
3. Click Interfaces
133
134
You can also use the command line interface (CLI) to configure and manage physical and virtual
interfaces, DHCP, DNS, IP addresses, and display network information and status.
# net config <ifname>{[[<ipaddr>] [netmask <mask>] [dhcp {yes |
no}]] | [<ipv6addr>]} {[autoneg] | [duplex {full | half} speed
{10|100|1000|10000}] [up | down] [mtu {<size> | default}]
Configure an Ethernet interface.
# net config <ifname> type {none | management | replication |
cluster}
Configure or set the type of Ethernet interface.
# net show all
Display all networking information, including IPv4 and IPv6 addresses.
# net show config [<ifname>]
Display the configuration for a specific Ethernet interface.
# net show {domainname | searchdomains}
Display the domain name or search domains used for email sent by a Data Domain system.
# net show dns
Display a list of DNS servers used by the Data Domain system. The final line in the output shows
if the servers were configured manually or by DHCP.
# net show hardware
Display Ethernet port hardware information.
# net show stats [ipversion {ipv4 | ipv6}] [all | interfaces |
listening | route | statistics]
Display network statistics.
Consult the EMC Data Domain Operating System 5.2 Command Reference Guide for details on using the
net command.
135
Slide 4
136
137
Slide 5
1. Click Hardware
2. Click Network
3. Click Settings
The Settings view enables you to manage Network settings in one place without having to execute
multiple commands.
To manage hardware settings, go to the Hardware tab, select the Network tab, then select the Settings
tab. From the Settings tab, you can view and edit the host settings, domain list, host mappings, and DNS
list.
The Network view presents status and configuration information about the system Ethernet interfaces.
It contains the Interfaces view, Settings view, and Routes view.
Use the Hardware > Network > Settings view to view and configure network settings. This includes
network parameters such as the hostname, domain name, search domains, host mapping, and the DNS
list.
138
Host Settings
Host Name: The hostname of the selected Data Domain system.
Domain Name: The fully-qualified domain name associated with the selected Data Domain
system.
Search Domain List
Search Domain: A list of search domains used by the Data Domain system. The Data Domain
system applies the search domain as a suffix to the hostname.
Hosts Mapping
IP Address: IP address of the host to resolve.
Host Name: Hostnames associated with the IP address.
DNS List
DNS IP Address: Current DNS IP addresses associated with the selected Data Domain
system. An asterisk (*) indicates the addresses were assigned through DHCP.
139
Slide 6
Data Domain systems do not generate or respond to any of the network routing management protocols
(RIP, EGRP/EIGRP, and BGP) in any way. The only routing implemented on a Data Domain system is
based on the internal route table, where the administrator may define a specific network or subnet used
by a physical interface (or interface group).
Data Domain systems use source-based routing, which means outbound network packets that match
the subnet of multiple interfaces will be routed over only the physical interface from which they
originated.
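Source-based routing can be illustrated with a short Python sketch using the standard ipaddress module: the outbound interface is chosen by the packet's source address, even when several interfaces sit on the same subnet. The interface names and addresses are illustrative only.

```python
import ipaddress

# Illustrative source-based routing: an outbound packet leaves on the
# interface that owns its source address, even though both interfaces
# below match the same destination subnet.
interfaces = {
    "eth0a": ipaddress.ip_interface("192.168.1.10/24"),
    "eth1a": ipaddress.ip_interface("192.168.1.11/24"),  # same subnet
}

def outbound_interface(source_ip):
    """Pick the interface owning the packet's source address."""
    src = ipaddress.ip_address(source_ip)
    for name, iface in interfaces.items():
        if iface.ip == src:
            return name
    return None  # no interface owns this source address
```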
In the Routes view, you can view and manage network routes without having to execute many
commands.
140
141
Use the route command to manage routing between a Data Domain system and the backup hosts. An
added routing rule appears in the Kernel IP routing table and in the Data Domain system Route Config
list, a list of static routes that are reapplied at each system boot.
# route show config
Display the configured static routes in the Route Config list.
# route show table [ipversion {ipv4 | ipv6}]
Display all entries in the Kernel IP routing table.
Consult the EMC Data Domain Operating System 5.2 Command Reference Guide for details on using the
route command.
142
Slide 7
143
Slide 8
This lesson covers link aggregation. First you will learn about link aggregation. Then, you will create a
virtual interface for link aggregation.
144
Slide 9
[Diagram: an application/media server connects through a LAN to a Data Domain appliance; multiple
NIC ports (eth0a, eth1a) on the appliance are combined into a link aggregation group.]
Using multiple Ethernet network cables, ports, and interfaces (links) in parallel, link aggregation
increases network throughput across a LAN or LANs, up to the maximum speed the system can process.
Data transfer can thus be faster than over individual links. For example, you can enable link aggregation
on a virtual interface (veth1) to two physical interfaces (eth0a and eth0b) in the link aggregation control
protocol (LACP) mode with hash XOR-L2. Link aggregation evenly splits network traffic across all links or
ports in an aggregation group, with minimal impact from the splitting, assembling, and reordering of
out-of-order packets.
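An XOR-L2 style hash can be sketched as follows: the link carrying a frame is chosen by XORing the source and destination MAC addresses modulo the number of links, so a given MAC pair always uses the same link and packets within that flow stay in order. This is a simplified illustration, not the exact DD OS hashing implementation.

```python
# Simplified XOR-L2 hashing sketch: choose a link index from the XOR of the
# two MAC addresses so that each source/destination pair maps consistently
# to one link in the aggregation group.
def xor_l2_link(src_mac, dst_mac, num_links):
    src = int(src_mac.replace(":", ""), 16)
    dst = int(dst_mac.replace(":", ""), 16)
    return (src ^ dst) % num_links
```

Because the mapping is deterministic, a single client/server pair uses only one link; pushing traffic across all links in the group requires multiple clients or flows, which is why the text notes that hashing limits demand multiple clients to reach multi-Gbps rates.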
145
Aggregation can occur between two directly attached systems (point-to-point and physical or virtual).
Normally, aggregation is between the local system and the connected network device or system. A Data
Domain system is usually connected to a switch or router. Aggregation is handled between the IP layer
(L3 and L4) and the mac layer (L2) network driver. Link aggregation performance is impacted by the
following:
Switch speed: Normally the switch can handle the speed of each connected link, but it may lose
some packets if all of the packets are coming from several ports that are concentrated on one
uplink running at maximum speed. In most cases, this means you can use only one switch for
port aggregation coming out of a Data Domain system. Some network topologies allow for link
aggregation across multiple switches.
The quantity of data the Data Domain system can process.
Out-of-order packets: A network program must put out-of-order packets back in their original
order. If the link aggregation mode allows the packets to be sent out of order, and the protocol
requires that they be put back to the original order, the added overhead may impact the
throughput speed enough that the link aggregation mode causing the out-of-order packets
should not be used.
The number of clients: In most cases, either the physical or OS resources cannot drive data at
multiple Gbps. Also, due to hashing limits, you need multiple clients to push data at multiple
Gbps.
The number of streams (connections) per client can significantly impact link utilization
depending on the hashing used.
A Data Domain system supports two aggregation methods: round robin and balance-xor (you set
it up manually on both sides).
Requirements
Links can be part of only one group.
Aggregation is only between two systems.
All links in a group must have the same speed.
All links in a group must be either half-duplex or full-duplex.
No changes to the network headers are allowed.
You must have a unique address across aggregation groups.
Frame distribution must be predictable and consistent.
146
Slide 10
147
6. Enter a virtual interface name in the form vethx, where x is a unique ID (typically one or two
digits).
A typical virtual interface name with VLAN and IP alias is veth56.3999.199. The maximum length
of the full name is 15 characters. Special characters are not allowed. Numbers must be between
0 and 9999.
From the General tab, specify the bonding mode by selecting type from the Bonding Type
list.
In this example, aggregate is selected. The registry setting can be different from the bonding
configuration. When you add interfaces to the virtual interface, the information is not sent
to the bonding module until the virtual interface is brought up. Until that time, the registry
and the bonding driver configuration are different. Specify a bonding mode compatible with
the requirements of the system to which the interfaces are directly attached. The available modes
are:
Round robin: Transmits packets in sequential order from the first available link through the
last in the aggregated group.
Balanced: Sends data over the interfaces as determined by the selected hash method. All
associated interfaces on the switch must be grouped into an EtherChannel (trunk).
LACP: Is similar to Balanced, except for the control protocol that communicates with the
other end and coordinates what links, within the bond, are available. It provides heartbeat
failover.
7. Select an interface to add to the aggregate configuration by clicking the checkbox corresponding
to the interface.
8. Click Next.
The Create Virtual Interface veth name dialog box appears.
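The naming rules from step 6 (the veth prefix, optional VLAN and IP alias components, numbers 0 to 9999, 15-character maximum, no special characters) can be sketched as a validator. This is a hypothetical helper for illustration only; DD OS performs its own validation.

```python
import re

def valid_veth_name(name: str) -> bool:
    """Check a virtual interface name such as 'veth56.3999.199':
    'veth' plus an ID, optionally followed by a VLAN number and an
    IP alias number, each 0-9999, at most 15 characters in total."""
    if len(name) > 15:
        return False
    return re.fullmatch(r"veth\d{1,4}(\.\d{1,4}){0,2}", name) is not None
```

For example, the name from the text, veth56.3999.199, passes, while a name containing a special character or a component above 9999 does not.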
148
Slide 11
149
13. Select the Manually Configure Speed/Duplex radio button if you want to manually set an
interface data transfer rate.
Duplex options are half-duplex or full-duplex. Speed options are limited to the capabilities of the
hardware. Ensure that all of your network components support the size set with this option.
Optionally select Dynamic Registration (also called DDNS). The dynamic DNS (DDNS) protocol
enables machines on a network to communicate with and register IP addresses on a Data
Domain system DNS server. The DDNS must be registered to enable this option.
14. Click Next.
The Create Virtual Interface Settings summary appears.
15. Ensure that the values listed are correct.
16. Click Finish.
17. Click OK.
Several commands can be used from the command line interface (CLI) to set up and configure link
aggregation on a Data Domain system:
# net aggregate add
Enables aggregation on a virtual interface by specifying the physical interfaces and mode.
Choose the mode compatible with the requirements of the system to which the ports are
attached.
# net aggregate del
Deletes physical interfaces from the list associated with an aggregate virtual interface.
# net aggregate modify
Changes the aggregation configuration on a virtual interface by specifying the physical interfaces
and mode. Choose the mode compatible with the requirements of the system to which the
ports are directly attached.
# net aggregate reset
Removes all physical interfaces from an aggregate virtual interface.
# net aggregate show
Displays basic information on the aggregate setup.
Consult the EMC Data Domain Operating System 5.2 Command Reference Guide for details on using the
net aggregate commands.
150
Slide 12
151
Slide 13
This lesson covers link failover. First you will learn what link failover does and then you will learn how to
create a virtual interface for link failover on a Data Domain system.
152
Slide 14
A virtual interface may include both physical and virtual interfaces as members (called interface group
members).
Link failover improves network stability and performance by keeping backups operational during
network glitches.
Link failover is supported by a bonding driver on a Data Domain system. The bonding driver checks the
carrier signal on the active interface every 0.9 seconds. If the carrier signal is lost, the active interface is
changed to another standby interface. An address resolution protocol (ARP) is sent to indicate that the
data must flow to the new interface. The interface can be:
On the same switch
On a different switch
Directly connected
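The failover sequence described above can be modeled as a loop. This is an illustrative sketch of the bonding driver's behavior (poll the active interface's carrier, promote a standby on loss, announce the move with ARP) using hypothetical callback functions, not actual driver code.

```python
import time

def monitor_bond(interfaces, carrier_up, send_gratuitous_arp,
                 interval=0.9, rounds=1):
    """Model of link failover: poll the active interface's carrier
    every `interval` seconds; on loss, promote a live standby and
    send a gratuitous ARP so traffic flows to the new interface."""
    active = interfaces[0]
    for _ in range(rounds):
        time.sleep(interval)
        if not carrier_up(active):
            standbys = [i for i in interfaces
                        if i != active and carrier_up(i)]
            if standbys:
                active = standbys[0]
                send_gratuitous_arp(active)
    return active

# With eth0's carrier down and eth1 up, the loop promotes eth1
# and announces it via the ARP callback.
arps = []
new_active = monitor_bond(["eth0", "eth1"], lambda i: i == "eth1",
                          arps.append, interval=0, rounds=1)
```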
153
Specifications
Only one interface in a group can be active at a time.
Data flows over the active interface. Non-active interfaces can receive data.
You can specify a primary interface. If you do specify a primary interface, it is the active
interface if it is available.
Bonded interfaces can go to the same or different switches.
You do not have to configure a switch to make link failover work.
For a 1 GbE interface, you can put two or more interfaces in a link failover bonding group.
The bonding interfaces can be:
On the same card
Across cards
Between a card and an interface on the motherboard
Link failover is independent of the interface type. For example, copper and optical can be
failover links if the switches support the connections.
For a 10 GbE interface, you can put only two interfaces in a failover bonding group.
154
Slide 15
155
Slide 16
156
157
Slide 17
1. Click Hardware
2. Click Network
3. Click Interfaces
4. Select Yes or No from the Enabled menu for the appropriate interface
158
Slide 18
159
Slide 19
This lesson covers virtual local area network (VLAN) and internet protocol (IP) alias interfaces. First, you
will learn more about these interfaces and how they differ. Then, you will learn how to enable and
disable them using the Enterprise Manager.
160
Slide 20
Virtual local area networks (VLANs) manage subnets on a network. VLANs enable a LAN to bypass router
boundaries. IP aliases do the same thing.
Virtual local area network (VLAN) and internet protocol (IP) network interfaces are used to:
Segregate network broadcasting
Provide network security
Segregate network traffic
Speed up network traffic
Organize a network
161
Slide 21
[Slide figure: a corporate network split into IT and HR segments. Each segment carries multiple IP
addresses, either as IP aliases on a single interface or as separate VLANs (VLAN 100 on subnet
192.168.11.X, VLAN 200 on subnet 10.10.10.X).]
IP aliases are easy to implement and are less expensive than VLANs
You can combine VLANs and IP aliases
Module 3: Managing Network Interfaces
If you are not using VLANs, you can use IP aliases. IP aliases are easy to implement and are less
expensive than VLANs, but they are not true VLANs. For example, you must use one IP address for
management and another IP address to back up or archive data. You can combine VLANs and IP aliases.
162
Slide 22
163
164
Slide 23
You can create a new IP Alias interface from a physical interface, a virtual interface, or a VLAN. When
you do this, you are telling the interface the IP Subnet(s) to which it belongs. This is done because the
switch/router may be connected to many networks, and you want the most direct path to the Data
Domain system.
The recommended total number of IP Aliases, VLAN, physical, and virtual interfaces that can exist on the
system is 80, although it is possible to have up to 100 interfaces.
1. From the Navigation pane, select the Data Domain system to configure.
2. Click the Hardware > Network > Interfaces tabs.
3. Click the Create menu and select the IP Alias option.
The Create IP Alias dialog box appears.
4. Specify an IP Alias ID by entering a number in the eth0a field.
The ID must be a number from 1 to 4094, inclusive.
5. Enter an IP Address.
The Internet Protocol (IP) Address is the numerical label assigned to the interface. For example,
192.168.10.23
165
166
Slide 24
Module 3: Summary
167
168
Slide 1
This module focuses on connecting to a Data Domain appliance using the CIFS and NFS protocols.
169
Slide 2
Lesson 1: CIFS
This lesson covers the following topics:
Data Access for CIFS
Enabling CIFS Services
CIFS Authentication
Creating a CIFS Share
Accessing a CIFS Share
Monitoring CIFS
In many cases, as part of the initial Data Domain system configuration, CIFS clients were configured to
access the ddvar and MTree directories. This module describes how to modify these settings and how to
manage data access using the Enterprise Manager and cifs command.
170
Slide 3
The Common Internet File System (CIFS) clients can have access to the system directories on the Data
Domain system. The /data/col1/backup directory is the default destination directory for
compressed backup server data. The /ddvar directory contains Data Domain system core and log files.
Clients, such as backup servers that perform backup and restore operations with a Data Domain system,
need access, at a minimum, to the /data/col1/backup directory. Clients that have administrative
access need to be able to access the /ddvar directory to retrieve core and log files.
The Common Internet File System (CIFS) operates as an application-layer network protocol. It is mainly
used for providing shared access to files, printers, serial ports, and miscellaneous communication
between nodes on a network.
When you configure CIFS, your Data Domain system is able to communicate with MS Windows.
171
172
Slide 4
1. Click Data
2.
Management
Click CIFS
After configuring client access, enable CIFS services, which allow the client to access the system using
the CIFS protocol.
1. For the Data Domain system selected in the Enterprise Manager Navigation pane, click Data
Management > CIFS.
2. In the CIFS Status area, click Enable.
The hostname for the Data Domain system that serves as the CIFS server was set during the system's
initial configuration.
A Data Domain system's hostname should match the name assigned to its IP address, or addresses, in
the DNS table. Otherwise, there might be problems when the system attempts to join a domain, and
authentication failures can occur. If you need to change the Data Domain system's hostname, use the
net set hostname command, and also modify the system's entry in the DNS table.
When the Data Domain system acts as a CIFS server, it takes the hostname of the system. For
compatibility, it also creates a NetBIOS name. The NetBIOS name is the first component of the hostname
in all uppercase letters. For example, the hostname jp9.oasis.local is truncated to the NetBIOS name JP9.
The CIFS server responds to both names.
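The NetBIOS derivation described above is a simple rule, sketched here for illustration (a hypothetical helper, not a DD OS function):

```python
def netbios_name(hostname: str) -> str:
    """Derive the NetBIOS name: the first dot-separated component
    of the hostname, in all uppercase letters."""
    return hostname.split(".")[0].upper()

# The example from the text: jp9.oasis.local becomes JP9.
name = netbios_name("jp9.oasis.local")
```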
173
From the command line, you can use the cifs enable command to enable CIFS services.
# cifs enable
Enable the CIFS service and allow CIFS clients to connect to the Data Domain system.
For complete information about the cifs enable command, see the DD OS 5.2 Command Reference
Guide.
174
Slide 5
CIFS Authentication
The Enterprise Manager Configure Authentication dialog box allows you to set the authentication
parameters that the Data Domain system uses for working with CIFS.
The Data Domain system can join the active directory (AD) domain or the NT4 domain, or be part of a
workgroup (the default). If you did not use the Enterprise Manager's Configuration Wizard to set the
join mode, use the procedures in this section to choose or change a mode.
The Data Domain system must meet all active-directory requirements, such as a clock time that differs
no more than five minutes from that of the domain controller.
The workgroup mode means that the Data Domain system authenticates CIFS clients using local user
accounts defined on the Data Domain system.
175
You can also set authentication for CIFS shares using the command line interface (CLI):
# cifs set authentication active-directory <realm> { [<dc1> [<dc2>
...]] | * }
Set authentication to the Active Directory. The realm must be a fully qualified name. Use
commas, spaces, or both to separate entries in the domain controller list. Security officer
authorization is required for systems with Retention Lock Compliance.
Note: Data Domain recommends using the asterisk to set all controllers instead of entering
them individually.
When prompted, enter a name for a user account. The type and format of the name depend on
whether the user is inside or outside the company domain.
For user Administrator inside the company domain, enter the name only: administrator.
For user JaneDoe in a non-local, trusted domain, enter the username and domain:
jane.doe@trusteddomain.com. The account in the trusted domain must have permission to
join the Data Domain system to your company domain.
If DDNS is enabled, the Data Domain system automatically adds a host entry to the DNS server.
It is not necessary to create the entry manually when DDNS is enabled.
If you set the NetBIOS hostname using the command cifs set nb-hostname, the entry is
created for the NetBIOS hostname only, not the system hostname. Otherwise, the system
hostname is used.
# cifs set authentication workgroup <workgroup>
Set the authentication mode to workgroup for the specified workgroup name.
For complete information about the cifs set authentication command, see the DD OS 5.2
Command Reference Guide.
176
Slide 6
When creating shares, you must assign client access to each directory separately and remove access
from each directory separately. For example, a client can be removed from /ddvar and still have
access to /data/col1/backup.
Note: If Replication is to be implemented, a Data Domain system can receive backups from both CIFS
clients and NFS clients as long as separate directories are used for each. Do not mix CIFS and NFS data in
the same directory.
To share a folder using the CIFS protocol on a Data Domain system:
1. From the Navigational pane, select a Data Domain system to configure shares.
2. Click Data Management > CIFS tabs to navigate to the CIFS view.
3. Ensure authentication has been configured.
4. On the CIFS client, set shared directory permissions or security options.
5. On the CIFS view, click the Shares tab.
6. Click Create.
The Create Shares dialog box appears.
177
178
Slide 7
From a Windows Client, you can access CIFS shares on a Data Domain system either from a Windows
Explorer window or at the DOS prompt (Run menu).
From a Windows Explorer window:
1. Select Map Network Drive
2. Select a Drive letter to assign the share
3. Enter the DD system to connect to and the share name (\\<DD_Sys>\<Share>), for
example, \\host1\backup
4. Check the box Connect using a different username, if necessary
5. Click Finish
If Connect using a different username was checked, you will be prompted for your Data Domain
username and password.
From the DOS Prompt or Run menu, enter:
> net use drive: \\<DD_Sys>\<Share> /USER:<DD_Username>
You will be prompted for the password to your Data Domain user account.
179
180
Slide 8
Monitoring CIFS
The CIFS tab of the Data Domain Enterprise Manager provides information about the configuration and
status of CIFS shares.
Easily viewable are the number of open connections, open files, connection limit and open files limit per
connection. Click the Connection Details link to view the details about active connections to the CIFS
shares.
181
You can also use the command line interface (CLI) to view details and statistics about CIFS shares.
# cifs show active
Display all active CIFS clients.
# cifs show clients
Display all allowed CIFS clients for the default /ddvar administrative share and the default
/backup data share.
# cifs show config
Display the CIFS configuration.
# cifs show detailed-stats
Display statistics for every individual type of SMB operation, display CIFS client statistics, and
print a list of operating systems with their client counts.
The list counts the number of different IP addresses connected from each operating system. In
some cases, the same client may use multiple IP addresses.
Output for CIFS client type shows two fields: Miscellaneous clients, where Yes means the displayed
list of clients is incomplete and No means the list is complete; and Maximum connections, where the
value is the maximum number of connections since the last reset.
# cifs show stats
Show CIFS statistics.
For complete information about the cifs show command, see the DD OS 5.2 Command Reference
Guide.
182
Slide 9
183
Slide 10
Lesson 2: NFS
This lesson covers the following topics:
NFS Exports
Configuring NFS
Monitoring NFS
This lesson covers the configuration and monitoring of NFS exports on a Data Domain system.
184
Slide 11
NFS Exports
Network File System (NFS) clients can have access to the system
directories or MTrees on the Data Domain system.
The Network File System (NFS) is a distributed file system protocol originally developed by Sun
Microsystems in 1984. It allows a user on a client computer to access files over a network in a manner
similar to how local storage is accessed. NFS, like many other protocols, builds on the Open Network
Computing Remote Procedure Call (ONC RPC) system. The Network File System is an open standard
defined in RFCs, allowing anyone to implement the protocol.
Network File System (NFS) clients can have access to the system directories or MTrees on the Data
Domain system.
/backup is the default destination for non-MTree compressed backup server data.
The /data/col1/backup path is the root destination when using MTrees for compressed
backup server data.
The /ddvar directory contains Data Domain System core and log files.
Clients, such as backup servers that perform backup and restore operations with a Data Domain System,
need access to the /backup or /data/col1/backup areas. Clients that have administrative access
need access to the /ddvar directory to retrieve core and log files.
185
Slide 12
Configuring NFS
1. Click Data Management
2. Click NFS
3. Click Create
4. Click + to add clients
186
187
Slide 13
Monitoring NFS
1. Click Data Management
2. Click the NFS tab
3. Click Active Clients
You can use the Data Domain Enterprise Manager to monitor NFS client status and NFS configuration:
1. Click Data Management
2. Click the NFS tab
The top pane shows the operational status of NFS, for example, NFS is currently active and
running.
188
You can also use the command line interface (CLI) to monitor NFS client status and statistics.
# nfs show active
List clients active in the past 15 minutes and the mount path for each. Allow all NFS-defined
clients to access the Data Domain system.
# nfs show clients
List NFS clients allowed to access the Data Domain system and the mount path and NFS options
for each.
# nfs show detailed-stats
Display NFS cache entries and status to facilitate troubleshooting.
# nfs show histogram
Display NFS operations in a histogram. Users with user role permissions may run this command.
# nfs show port
Display NFS port information. Users with user role permissions may run this command.
# nfs show stats
Display NFS statistics.
# nfs status
Enter this option to determine if the NFS system is operational. When the file system is active
and running, the output shows the total number of NFS requests since the file system started, or
since the last time the NFS statistics were reset.
For complete information about the nfs commands, see the DD OS 5.2 Command Reference Guide.
189
Slide 14
190
Slide 15
Module 4: Summary
191
192
Slide 1
In this module, you will learn about managing data with a Data Domain system.
Describe and configure MTrees
Describe and perform snapshots
Describe and perform a fast copy
Describe and perform file system cleaning
Describe file system space usage
193
Slide 2
This lesson covers configuring and monitoring MTrees for storing backups within a Data Domain file
system. Topics include:
MTree use and benefits
Soft and hard MTree quotas
You will have a chance to configure MTrees, as well as set and monitor quotas on a Data Domain system
in a structured lab.
194
Slide 3
MTrees
[Slide figure: the legacy /backup/ tree, where all subdirectories (/hr, /sales) are subject to the
same permissions, policies, and reporting, contrasted with the /data/col1/ tree (/backup, /hr,
/sales), where each MTree can be managed individually.]
MTrees (Management Trees) are used to provide more granular management of data so that different types
of data, or data from different sources, can be managed and reported on separately. Various backup
operations are directed to individual MTrees. For example, you can configure directory export levels and
quotas to separate and manage backup files by department.
Before MTrees were implemented, subdirectories under a single /backup directory were created to
keep different types of data separate. Data from different sources, departments, or locales were backed
up to separate subdirectories under /backup but all subdirectories were subject to the same
permissions, policies, and reporting.
With MTrees enabled, data can now be backed up to separately managed directory trees, MTrees. A
static MTree, /backup, is still created by the file system, but cannot be removed or renamed.
Additional MTrees can be configured by the system administrator under /data/col1/ (col stands for
collection). You can still create a subdirectory under any MTree, but it will be subject to the same
permissions, policies, and reporting as the MTree in which it resides.
195
Slide 4
Benefits of MTrees
Granular reporting by MTree
Independent replication scheduling (MTree replication)
Independent snapshot schedules
MTree-specific retention lock
MTree-specific compression types
Quotas to limit the logical space used by a specific MTree
MTrees provide more granular reporting of space and deduplication rates. If different departments
or geographies back up to the same Data Domain system, each department or geography can have its
own independent storage location, each with different choices for compression and replication.
The term, snapshot, is a common industry term denoting the ability to record the state of a storage
device or a portion of the data being stored on the device, at any given moment, and to preserve that
snapshot as a guide for restoring the storage device, or portion thereof. Snapshots are used extensively
as a part of the Data Domain data restoration process. With MTrees, snapshots can be managed at a
more granular level.
Retention lock is an optional feature used by Data Domain systems to securely retain saved data for a
given length of time and protect it from accidental or malicious deletion. The retention lock feature can
now be applied at the MTree level.
Another major benefit is to limit the logical, pre-comp, space used by the specific MTree through
quotas.
196
Slide 5
MTrees
Data Domain systems support up to 100 MTrees.
More than 14 simultaneous MTrees engaged in read or write streams will degrade performance.
Nothing can be added to the /data/ directory.
/data/, /data/col1/, and /data/col1/backup cannot be deleted or renamed.
MTrees are created only under /data/col1/.
Subdirectories can still be created under /data/col1/backup.
Subdirectories can be created within user-created MTrees. Reporting is cumulative for the entire MTree.
Although a Data Domain system supports a maximum of 100 MTrees, system performance might
degrade rapidly if more than 14 MTrees are actively engaged in read or write streams. The degree of
degradation depends on overall I/O intensity and other file system loads. For optimum performance,
constrain the number of simultaneously active MTrees to a maximum of 14. Whenever possible,
aggregate operations on the same MTree into a single operation.
Regular subdirectories can be configured under /data/col1/backup as allowed in prior versions of
DDOS. Subdirectories can also be configured under any other configured MTree. Although you can
create additional directories under an MTree, the Data Domain system recognizes and reports on the
cumulative data contained within the entire MTree.
You cannot add data or directories to /data or /data/col1. You can add MTrees only under
/data/col1. /data, /data/col1, and /data/col1/backup cannot be deleted or renamed.
197
Slide 6
NFS and CIFS can access /data and all of the MTrees beneath /col1 by configuring normal CIFS
shares and NFS exports.
VTL and DD Boost have special storage requirements within the MTree structure and are discussed in
later modules.
198
Slide 7
MTree Quotas
MTree quotas allow you to set limits on the amount of logical, pre-comp space used by individual
MTrees. Quotas can be set for MTrees used by CIFS, NFS, VTL, or DD BOOST data.
There are two types of quotas:
Soft limit: When this limit is reached, an alert is generated through the system, but operations
continue as normal.
Hard limit: When this limit is reached, any backup in progress to this MTree fails. An
alert is also generated through the system, and an out of space error (EMOSP for VTL) is
reported to the backup application. To resume backup operations after data within an MTree
reaches a hard limit quota, you must either delete sufficient content in the MTree, increase the
hard limit quota, or disable quotas for the MTree.
You can set a soft limit, a hard limit, or both soft and hard limits. Quotas work using the amount of
logical space (pre-comp, not physical space) allocated to an individual MTree. The smallest quota that
can be set is 1 MiB.
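The soft and hard limit semantics can be modeled in a few lines. The function and return values here are illustrative, not a DD OS API; the point is that writes proceed past a soft limit (alert only) but fail at a hard limit.

```python
def check_quota(used_mib, write_mib, soft=None, hard=None):
    """Classify a pending write against MTree quota limits (MiB of
    logical, pre-comp space). Returns (allowed, alert)."""
    new_total = used_mib + write_mib
    if hard is not None and new_total > hard:
        # backup fails; an out of space error (EMOSP for VTL) is reported
        return False, "hard limit reached"
    if soft is not None and new_total > soft:
        # backup continues; the system only raises an alert
        return True, "soft limit exceeded"
    return True, None
```

For example, with soft=200 and hard=250 MiB, a write that lands between the limits succeeds with an alert, while one that would exceed 250 MiB is rejected.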
199
An administrator can set the storage space restriction for an MTree to prevent it from consuming excess
space. The Data Management > Quota page shows the administrator how many MTrees have no soft or
hard quotas set, and for MTrees with quotas set, the percentage of pre-compressed soft and hard limits
used.
The entire quota function is enabled or disabled from the Quota Settings window.
Quotas for existing MTrees are set by selecting the Configure Quota button.
200
Slide 8
201
202
Slide 9
When the MTree is created, it appears in the list of MTrees alphabetically by name.
As data fills the MTree, Data Domain Enterprise Manager will display graphically and by percentage the
quota hard limit. You can view this display at Data Management > MTree. The MTree display presents
the list of MTrees, quota hard limits, daily and weekly pre-comp and post-comp amounts and ratios.
203
Slide 10
Scroll further down the MTree tab and you see three additional tabs: Summary, Space Usage, and Daily
Written.
Selecting an MTree from the list will display a summary of that MTree. In the Summary tab you can also
rename the MTree, adjust the quotas, and create an NFS export.
The Space Usage tab displays a graph representing the amount of space used in the selected MTree over
the selected duration (7, 30, 60, or 120 days).
Click the Daily Written tab, and you see a graph depicting the amount of space written in the selected
MTree over a selected duration (7, 30, 60, or 120 days).
Note: You must have the most current version of Adobe Flash installed and enabled with your web
browser in order to view these reports.
The related pre-, post-, and total compression factors over the same time period are also reported.
204
Slide 11
Data Domain systems not only provide improved control over backups using MTrees, the system also
provides data monitoring at the MTree level.
Under Data Management > MTree is a summary tab that provides an at-a-glance view of all configured
MTrees, their quota hard limits (if set), pre- and post-comp usage, as well as compression ratios for the
last 24 hours, the last 7 days, and current weekly average compression.
Select an MTree, and the Summary pane presents current information about the selected MTree.
Note: The information on this summary page is delayed by at least 10 minutes.
205
Slide 12
If a quota-enabled MTree fills with data, the system will generate soft and hard limit alerts when a soft
or hard limit in a specific MTree is reached.
Soft limit: When this limit is reached, an alert is generated through the system, but operations
continue as normal.
Hard limit: When this limit is reached, any backup in progress to this MTree fails. An
alert is also generated through the system, and an out of space error (EMOSP for VTL) is
reported to the backup application. To resume backup operations after data within an MTree
reaches a hard limit quota, you must delete sufficient content in the MTree, increase the hard
limit quota, or disable quotas for the MTree.
These alerts are reported in the Data Domain Enterprise Manager > Status > Summary > Alerts pane in
the file system alerts. Details are reported in the Status > Alerts > Current Alerts and Alerts History tabs.
When an alert is reported, you will see the status as posted. After the alert is resolved, you will see the
status as cleared.
206
Slide 13
A Data Domain system provides control through individual MTree organization. You can also monitor
system usage at the same MTree level.
Under Data Management > MTree you find a summary tab providing an at-a-glance view of all
configured MTrees, their quota hard limits (if set), pre- and post-comp usage, as well as compression
ratios for the last 24 hours, the last 7 days, and current weekly average compression.
Below the list of MTrees, the MTree Summary pane shows at-a-glance the settings associated with the
selected MTree. In this pane, you can also perform the following on the selected MTree:
Rename the MTree
Configure quotas, hard and soft
Create an NFS export
207
On the same display below the summary pane, you can also find panes that monitor MTree replication,
snapshots and retention lock for the selected MTree. This course covers the MTree replication pane and
the retention lock pane in a later module.
You can control the snapshot schedules associated with the selected MTree. You can also see at a
glance the total number of snapshots collected, expired, and unexpired, as well as the oldest,
newest, and next scheduled snapshot.
208
Slide 14
Soft-Limit (MiB)   Hard-Limit (MiB)
----------------   ----------------
none               none
1500               2000
200                250
750                1000
1000               2000
2000               3000
The reports shown in Data Management > MTree are delayed at least fifteen minutes. Real time
reporting is available only through the command line interface (CLI) using the quota show command.
As data transfers to any MTree, you can use quota show all to view a nearly instant update of the
pre-comp amount change.
In this example, /data/col1/HR has exceeded the soft-limit and nearly reached the hard-limit.
209
Slide 15
Class        Object
----------   -------------------
Filesystem   MTree=/data/col1/HR
After an MTree exceeds the value set as a soft-limit quota, the Data Domain system generates an alert
warning.
In this example, /data/col1/HR has exceeded the soft-limit and the system has generated the alert
warning.
From the command line, you can review current alerts by issuing the alerts show current command. In
this case, there is only one current system alert showing that /data/col1/HR has reached its quota
soft limit.
In the Data Domain Enterprise Manager, you can view alerts by clicking Status > Alerts > Current Alerts.
There are three ways to clear a quota limit alert: remove data stored in the MTree, increase the quota
limit, or turn quota limits off.
210
211
Slide 16
212
Slide 17
This lesson covers snapshot operations and their use in a Data Domain file system. Topics include:
Snapshot definition, use, and benefits
Basic snapshot operations: creation, schedule, and expiration
You will have a chance to configure and create a snapshot on a Data Domain system in a structured lab.
213
Slide 18
What is a Snapshot?
[Slide figure: two identical /data/col1/ directory trees (/backup, /HR, /sales, /support),
representing an MTree and its point-in-time snapshot.]
Snapshot is a common industry term denoting the ability to record the state of a storage device or a
portion of the data being stored on the device, at any given moment, and to preserve that snapshot as a
guide for restoring the storage device, or portion thereof. A snapshot primarily creates a point-in-time
copy of the data. The snapshot copy is created instantly and made available for use by other applications
such as data protection, data analysis and reporting, and data replication. The original copy of
the data continues to be available to applications without interruption, while the snapshot copy is
used to perform other functions on the data.
Snapshots provide an excellent means of data protection. The trend towards using snapshot technology
comes from the benefits that snapshots deliver in addressing many of the issues that businesses face.
Snapshots enable better application availability, faster recovery, and easier backup management of
large volumes of data.
Snapshot benefits:
Snapshots initially do not use many system resources.
Note: Snapshots will continue to place a hold on all data they reference even when the backups
have expired.
Snapshots save a read-only copy of a designated MTree at a specific point in time.
Snapshots are useful for saving a copy of an MTree at a specific point in time, for instance, before
a Data Domain OS upgrade, so it can later be used as a restore point if files need to be
restored from that point in time. Use the snapshot command to take an image of an
MTree, to manage MTree snapshots and schedules, and to display information about the status
of existing snapshots.
You can schedule multiple snapshot schedules at the same time or create them individually as
you choose.
The maximum number of snapshots allowed to be stored on a Data Domain system is 750 per MTree.
You will receive a warning when the number of snapshots reaches 90% of the allowed number (675-749)
in a given MTree. An alert is generated when you reach the maximum snapshot count.
Slide 19
What is a Snapshot?
(Diagram: the production file in /HR and a snapshot of the production file, both referencing the same data.)
A snapshot saves a read-only copy of the designated MTree at a specific point in time where it can later
be used as a restore point if files need to be restored from that specific point in time.
In a snapshot, only the pointers to the production data being copied are recorded at a specific point in
time (in this case, 22:24 GMT). The copy is extremely quick and places very little load on the
production system.
Slide 20
What is a Snapshot?
(Diagram: the modified production file in /HR alongside the snapshot of the unmodified file.)
When production data is changed, additional blocks are written, and pointers are changed to access the
changed data. The snapshot maintains pointers to the original, point-in-time data. All data remains on
the system as long as pointers reference the data.
Snapshots are a point-in-time view of a file system. They can be used to recover previous versions of
files, and also to recover from an accidental deletion of files.
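The pointer mechanics described above can be sketched with a toy model. The block IDs, file names, and data structures here are invented for illustration; this is not the DD OS on-disk format:

```python
# Minimal copy-on-write sketch (hypothetical structures): a snapshot records
# only pointers to the blocks current at creation time; later writes allocate
# new blocks, so the snapshot view is unchanged.

blocks = {1: "payroll-v1"}           # block id -> stored segment
production = {"payroll.db": [1]}     # live file -> block pointers

# Taking the snapshot copies pointers only, not data.
snapshot = {f: list(ptrs) for f, ptrs in production.items()}

# A production change writes a new block and repoints the live file;
# block 1 stays on disk because the snapshot still references it.
blocks[2] = "payroll-v2"
production["payroll.db"] = [2]

snap_view = [blocks[b] for b in snapshot["payroll.db"]]    # point-in-time data
live_view = [blocks[b] for b in production["payroll.db"]]  # current data
```

Because both views resolve through pointers, all data remains on the system as long as any pointer references it, exactly as described above.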
Slide 21
(Diagram: the original copy and the snapshot copy within /HR.)
As an example, snapshots for the MTree named backup are created in the system directory
/data/col1/backup/.snapshot. Each directory under /data/col1/backup also has a
.snapshot directory with the name of each snapshot that includes the directory. Each MTree has the
same type of structure, so an MTree named HR would have a system directory
/data/col1/HR/.snapshot, and each subdirectory in /data/col1/HR would have a
.snapshot directory as well.
Use the snapshot feature to take an image of an MTree, to manage MTree snapshots and schedules, and
to display information about the status of existing snapshots.
Note: If only /data is mounted or shared, the .snapshot directory is not visible. The .snapshot directory
is visible when the MTree itself is mounted.
Slide 22
Snapshot Operations
To create a Snapshot:
1. Go to Data Management > Snapshots
Select the MTree from the dropdown list.
If snapshots are listed, you can search by using a search term in the Filter By Name or Year
field.
You can modify the expiration date, rename a snapshot or immediately expire any number
of selected snapshots from the Snapshots pane.
2. Click Create. A snapshot Create dialog appears.
3. Name the snapshot, and set an expiration date. If you do not set a date, the snapshot will not
release the data to which it is pointing until you manually remove the snapshot.
You can perform modify, rename, and delete actions using the same interface in the Snapshots
tab.
Slide 23
Slide 24
Immediately below the MTree list, in the summary pane, you can view the Snapshot pane that monitors
snapshots for the selected MTree.
The Snapshots pane in the MTree summary page lets you see, at a glance, the total number of
snapshots collected, expired, and unexpired, as well as the oldest, newest, and next scheduled snapshot
within a given MTree.
You can associate configured snapshot schedules with the selected MTree name. Click Assign Snapshot
Schedules, select a schedule from the list of snapshot schedules, and click OK to assign it. You can
create additional snapshot schedules if needed.
Slide 25
Slide 26
This lesson covers fast copy operations and their use in a Data Domain file system. Topics include:
Fast copy definition, use, and benefits.
Basic fast copy operations: creation, schedule, and expiration.
You will have a chance to configure and create a fast copy on a Data Domain system in a structured lab.
Slide 27
Fast Copy
(Diagram: the snapshot /data/col1/hr/.snapshot/10-31-2012 (source) is fast copied to /data/col1/backup/recovery (destination); the /hr MTree also holds a 10-15-2012 snapshot, and /support is a sibling MTree.)
Fast copy is a function that makes an alternate copy of your backed up data on the same Data Domain
system. Fast copy is very efficient at making duplicate copies of pointers to data by using the DD OS
snapshot function with only 1% to 2% of overhead needed to write data pointers to the original data.
Sometimes, access to production backup data is restricted. Fast copy makes all fast-copied data
readable and writable, making this operation handy for data recovery from backups.
The difference between snapshots and fast-copied data is that the fast copy duplicate is not a
point-in-time duplicate. Any changes made during the data copy, in either the source or the target
directories, will not be duplicated in the fast copy.
Note that a fast copy is a read/write copy, accurate only as of the time it was made, while a snapshot
is read-only.
Fast copy makes a copy of the pointers to data segments and structure of a source to a target directory
on the same Data Domain system. You can use the fast copy operation to retrieve data stored in
snapshots. In this example, the /hr MTree contains two snapshots in the /.snapshot directory. One
of these snapshots, 10-31-2012, is fast copied to /backup/recovery. Only pointers to the actual
data are copied, adding a 1% to 2% increase in actual used data space. All of the referenced data is
readable and writable. If the /hr MTree or any of its contents is deleted, no data referenced in the fast
copy is deleted from the system.
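A toy model of these semantics shows why a fast copy adds only pointer overhead and keeps deleted source data alive. The reference counts, names, and structures are illustrative, not DD OS internals:

```python
# Sketch of fast copy semantics (hypothetical model): only the pointer tree
# is duplicated, and a segment stays on disk while any copy references it.

segments = {"s1": "resume.doc segment data"}     # fingerprint -> stored data
refcount = {"s1": 1}                             # references per segment
hr_snapshot_tree = {"10-31-2012/resume.doc": "s1"}

# Fast copy: duplicate the pointers and bump reference counts. No segment
# data is rewritten, which is why the overhead stays around 1% to 2%.
recovery_tree = dict(hr_snapshot_tree)
for seg in recovery_tree.values():
    refcount[seg] += 1

# Deleting the source tree does not free segments the copy still references.
for seg in hr_snapshot_tree.values():
    refcount[seg] -= 1
hr_snapshot_tree.clear()

still_stored = refcount["s1"] > 0   # data survives via the fast copy
```

The same reference-counting idea explains the note in the next lesson: space held by a fast copy is reclaimed only after the fast copy itself is deleted and cleaning runs.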
Slide 28
Slide 29
The fast copy operation can be used as part of a data recovery workflow using a snapshot. Snapshot
content is not viewable from a CIFS share or NFS mount, but a fast copy of the snapshot is fully
viewable. From a fast copy on a share or a mount, you can recover lost data without disturbing normal
backup operations and production files.
Fast copy makes a destination equal to the source, but not at a particular point in time. The source and
destination may not be equal if either is changed during the copy operation.
When backup data expires, a fast copy directory prevents the Data Domain system from recovering the
space held by the expired data, because the fast copy still references that data as in-use. This data
must be manually identified and deleted to free up space. Then, space reclamation (file system
cleaning) must be run to regain the data space held by the fast copy.
Slide 30
Slide 31
This lesson covers Data Domain file system cleaning, also called garbage collection.
Topics include:
The purpose and use of file system cleaning.
Scheduling, configuring, and running the file system cleaning operation.
You will have a chance to configure and run file system cleaning on a Data Domain system in a
structured lab at the end of this lesson.
Slide 32
(Diagram: an application host sending segments A through E to the Data Domain system; expired data in container 1 is marked for deletion.)
When the backup application expires data, the related file segments are
marked on the Data Domain system for deletion. No data is deleted until file
system cleaning is run.
Module 5: File System and Data Management
When your backup application (such as NetWorker or NetBackup) expires backups, the associated data
is marked by the Data Domain system for deletion. However, the expired data is not deleted
immediately by the Data Domain system; it is removed during the cleaning operation. While the data is
not immediately deleted, the path name is. This results in unclaimed segment space that is not
immediately available.
File system cleaning is the process by which storage space is reclaimed from stored data that is no
longer needed. For example, when retention periods on backup software expire, the backups are
removed from the backup catalog, but space on the Data Domain system is not recovered until file
system cleaning is completed.
Depending on the amount of space the file system must clean, file system cleaning can take from several
hours to several days to complete. During the cleaning operation, the file system is available for all
normal operations including backup (write) and restore (read).
Although cleaning uses a significant amount of system resources, cleaning is self-throttling and gives up
system resources in the presence of user traffic.
Slide 33
Cleaning Process
(Diagram: valid segments in container 1 are copied forward and reorganized into a new container, leaving unclaimed segments behind; reclaimed space is appended back onto available disk space as new, empty containers.)
Data invulnerability requires that data be written only into new, empty containers; data already written
in existing containers cannot be overwritten. This requirement also applies to file system cleaning.
During file system cleaning, the system reclaims space taken up by expired data so you can use it for
new data.
The example in this figure refers to dead and valid segments. Dead segments are segments in containers
that are no longer needed by the system: for example, segments whose only (or final) claim came from a
file that has been deleted, or any other segment or container space the file system internally deems
unneeded. Valid segments contain unexpired data used to store backup-related files. When
files in a backup are expired, pointers to the related file segments are removed. Dead segments are not
allowed to be overwritten with new data since this could put valid data at risk of corruption. Instead,
valid segments are copied forward into free containers to group the remaining valid segments together.
When the data is safe and reorganized, the original containers are appended back onto the available
disk space.
Since the Data Domain system uses a log structured file system, space that was deleted must be
reclaimed. The reclamation process runs automatically as a part of file system cleaning.
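The copy-forward step can be sketched as follows. This is a toy container model, not the real on-disk layout:

```python
# Sketch of the copy-forward step (hypothetical model): valid segments from
# partly dead containers are rewritten into a new, empty container; dead
# segments are never overwritten in place, and the old containers are
# returned to free space whole.

containers = {
    "c1": [("a", True), ("b", False)],   # (segment, still referenced?)
    "c2": [("c", False), ("d", True)],
}

def clean(containers):
    copied_forward = [seg for segs in containers.values()
                      for seg, valid in segs if valid]
    reclaimed = list(containers)   # old containers go back to free space
    return {"c_new": copied_forward}, reclaimed

live, freed = clean(containers)
```

Grouping the surviving segments together in new containers is what keeps valid data safe while the space held by dead segments is reclaimed.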
During the cleaning process, a Data Domain system is available for all normal operations, to include
accepting data from backup systems.
Cleaning does require a significant amount of system processing resources and might take several hours,
or under extreme circumstances days, to complete even when undisturbed. Cleaning applies a set
processing throttle of 50% when other operations are running, sharing the system resources with other
operations. The throttling percentage can be manually adjusted up or down by the system
administrator.
File system cleaning can be scheduled to meet the needs of your backup plan. The default schedule runs
cleaning every Tuesday at 6 a.m. The default CPU throttle is 50%: this setting gives half of the CPU
resources to the cleaning process and half to all other processes. Increasing the throttle
amount increases the resources dedicated to the cleaning process and decreases the resources available
to other running processes.
Slide 34
Using the Data Domain Enterprise Manager, navigate to Data Management > File System > Start
Cleaning.
This action begins an immediate cleaning session.
A window displays an informational alert describing the possible performance impact during cleaning,
and a field to set the percentage of throttle for the cleaning session.
Slide 35
Schedule file system cleaning to start when the period of high activity ends, and the competition for
resources is minimal or non-existent.
To schedule file system cleaning using the Data Domain Enterprise Manager, navigate to Data
Management > File System > Configuration > Clean Schedule.
You see a window with three options for scheduling file system cleaning:
Default: Tuesday at 6 a.m. with 50% throttle.
Note: The throttle setting affects cleaning only when the system is servicing other user requests.
When there are no user requests, cleaning always runs at full throttle. For example, if throttle is
set to 70%, the system uses 100% of the system resources and throttles down to 70% of
resources when the system is handling other user requests.
No Schedule: The only cleaning that occurs would be manually initiated.
Custom Clean Schedule: Configurable with weekly-based or monthly-based settings.
Cleaning runs at the same time on the given days, whether you select every day or specific days of the
week.
Click OK to set the schedule you have selected.
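The throttle rule in the note above reduces to a simple function (illustrative only; this is not DD OS code):

```python
# Sketch of the throttle rule: cleaning runs at 100% of resources when the
# system is idle, and drops to the configured percentage only while user
# requests are being serviced.

def cleaning_share(throttle_pct, user_requests_active):
    """CPU share (percent) given to cleaning under the throttle rule."""
    return throttle_pct if user_requests_active else 100

idle_share = cleaning_share(70, user_requests_active=False)   # full speed
busy_share = cleaning_share(70, user_requests_active=True)    # throttled
```

This is why a low throttle setting does not slow an overnight cleaning run on an otherwise idle system.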
Slide 36
Taking the file system offline for any reason stops the cleaning process.
Cleaning does not automatically resume after the file system restarts
until the next cleaning cycle.
Encryption and gz compression increase cleaning process time.
All pointers to data, including snapshots and fast copies, and pending
replication must be removed before that data can be a candidate for
cleaning.
Overly frequent cleaning can cause poor deduplication and increased
file fragmentation.
Cleaning might cause replication to lag.
Run cleaning after the first full backup to increase the compression
factor.
Daily file system cleaning is not recommended as overly frequent cleaning can lead to increased
file fragmentation. File fragmentation can result in poor data locality and, among other things,
higher-than-normal disk utilization. If the retention period of your backups is short, you might
be able to run cleaning more often than once weekly. The more frequently the data expires, the
more frequently file system cleaning can operate. Work with EMC Data Domain Support to
determine the best cleaning frequency under unusual circumstances.
If your system is growing close to full capacity, do not change the cleaning schedule to increase
cleaning cycles. A higher frequency of cleaning cycles might reduce the deduplication factor,
thus reducing the logical capacity of the Data Domain system and causing the same stored data to
consume more space.
Instead, manually remove unneeded data or reduce the retention periods set by your backup
software to free additional space. Run cleaning per the schedule after data on the system has
been expired.
If you encounter a system full (100%) or near full (90%) alert, and you are unable to free up
space before the next backup, contact Support as soon as possible.
If cleaning is run during replication operations and replication lags behind, cleaning may not be able
to complete its operations. This condition requires either breaking and resyncing replication after
cleaning has completed, or allowing replication to catch up (for example, by increasing network link
speed or writing less new data to the source directory).
Note: It is good practice to run a cleaning operation after the first full backup to a Data Domain system.
The initial local compression on a full backup is generally a factor of 1.5 to 2.5. An immediate cleaning
operation gives additional compression by another factor of 1.15 to 1.2 and reclaims a corresponding
amount of disk space.
Slide 37
Slide 38
This lesson covers how to monitor Data Domain file system space usage.
Topics include:
The factors that affect the rate at which space is consumed on the system.
How to monitor the space used and rate of consumption on the system.
You will have a chance to review space usage, and data consumption reports on a Data Domain system
in a structured lab.
Slide 39
When a disk-based deduplication system such as a Data Domain system is used as the primary
destination storage device for backups, sizing must be done appropriately. Presuming the correctly sized
system is installed, it is important to monitor usage to ensure data growth does not exceed system
capacity.
The factors affecting how fast data on a disk grows on a Data Domain system include:
The size and number of data sets being backed up. An increase in the number of backups or an
increase in the amount of data being backed-up and retained will cause space usage to increase.
The compressibility of data being backed up. Pre-compressed data formats do not compress or
deduplicate as well as non-compressed files and thus increase the amount of space used on the
system.
The retention period specified in the backup software. The longer the retention period, the
larger the amount of space required.
If any of these factors increase above the original sizing plan, your backup system could easily overrun
its capacity.
There are several ways to monitor the space usage on a Data Domain system to help prevent system full
conditions.
Slide 40
Slide 41
(Screenshot: space usage plot for the example system testsystem.test.com (1FA1432305).)
If you have set your system to send autosupports to EMC Data Domain Support at
http://my.datadomain.com, you can log in to the site, click My Systems, select from a list of systems
registered for support, and view an up-to-the-day plot of your space usage over time. The plot usually
shows up to a year's worth of data at a time.
On the plot, you can see data reported by your system through daily autosupports. The plots show
your pre-compressed and post-compressed data and the daily compression ratio. This is a valuable tool
to watch longer trends in data growth and compression. You can note when your system took on a
different backup plan and how it impacted the growth rate and compression ratio.
From this same page, you can also view the tabular data used to create the graph, or the autosupports
themselves for a more granular view.
Slide 42
The File System Summary tab is under the Data Management tab in the Data Domain Enterprise
Manager.
The window displays an easy-to-read dashboard of current space usage and availability. It also provides
an up-to-the-minute indication of the compression factor.
The Space Usage section shows two panes:
The first pane shows the amount of disk space available and used by file system components, based on
the last cleaning.
/data:post-comp shows:
Size (GiB): The amount of total physical disk space available for data.
Used (GiB): The actual physical space used for compressed data. Warning messages go to the
system log, and an email alert is generated when the use reaches 90%, 95%, and 100%. At 100%,
the Data Domain system accepts no more data from backup hosts.
Available (GiB): The total amount of space available for data storage. This figure can change
because an internal index may expand as the Data Domain system fills with data. The index
expansion takes space from the Avail GiB amount.
Cleanable (GiB): The estimated amount of space that could be reclaimed if a cleaning operation
were run.
The /ddvar line is the space reserved for system operations such as log files and upgrade tar files. It is
not a part of the data storage total.
The second Space Usage pane shows the compression factors:
Currently Used: The amounts currently in use by the file system.
Written in Last 24 Hours: The compression activity over the last day.
For both of these areas, the following is shown:
Pre-Compression (GiB*): Data written before compression
Post-Compression (GiB*): Storage used after compression
Global-Comp Factor: Pre-Compression / (Size after global compression)
Local-Comp Factor: (Size after global compression) / Post-Compression
Total-Comp Factor: Pre-Compression / Post-Compression
Reduction %: [(Pre-Compression - Post-Compression) / Pre-Compression] * 100
*The gibibyte is a standards-based binary multiple (prefix gibi, symbol Gi) of the byte, a unit of digital
information storage. The gibibyte unit symbol is GiB. 1 gibibyte = 2^30 bytes = 1,073,741,824 bytes = 1024
mebibytes.
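Using invented sizes, the four compression figures defined above can be computed and cross-checked (the values here are made up for illustration):

```python
# Worked example of the compression formulas above (all sizes invented).
pre = 1000.0         # Pre-Compression (GiB): data as written by backup hosts
post_global = 125.0  # size after global (deduplication) compression
post = 100.0         # Post-Compression (GiB): physical storage used

global_factor = pre / post_global          # deduplication gain
local_factor = post_global / post          # local compression gain
total_factor = pre / post                  # equals global_factor * local_factor
reduction_pct = (pre - post) / pre * 100   # percent of space saved
```

Note that the total compression factor always equals the global factor multiplied by the local factor, since the intermediate term cancels.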
Note: It is important to know how these compression statistics are calculated and what they are
reporting to ensure a complete understanding of what is being reported.
Related CLI commands:
# filesys show space
Display the space available to, and used by, file system resources.
# filesys show compression
Display the space used by, and compression achieved for, files and directories in the file system.
Slide 43
(Screenshot: tooltip on the Space Usage graph showing Pre-Comp Written, Sat Feb 04 2012 12:00 PM, 16.9 GiB.)
The Space Usage view contains a graph that displays a visual representation of data usage for the
system.
This view is used to monitor and analyze daily activities on the Data Domain system.
Roll over a point on a graph line to display a box with data at that point (as shown in the slide).
Click Print (at the bottom on the graph) to open the standard Print dialog box.
Click Show in a new window to display the graph in a new browser window.
The lines of the graph denote measurements for:
Pre-comp Written: The total amount of data sent to the Data Domain system by backup
servers. Pre-compressed data on a Data Domain system is what a backup server sees as the total
uncompressed data held by the Data Domain system-as-storage unit. Shown on the Space Used
(left) vertical axis of the graph.
Post-comp Used: The total amount of disk storage in use on the Data Domain system. Shown
on the Space Used (left) vertical axis of the graph.
Comp Factor: The amount of compression the Data Domain system has performed on the
data it received (compression ratio). Shown on the Compression Factor (right) vertical axis of
the graph.
The bottom of the screen also displays all three measurements when a point is rolled over on the graph.
Note: In this example, 16.9 GiB was ingested while only 643.5 MiB was used to store the data for a total
compression factor of 26.8x.
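Since pre-comp is reported in GiB and post-comp in MiB, the factor in this example can be verified with a quick calculation. It works out to roughly 26.9x; the 26.8x shown on screen reflects rounding of the displayed input values:

```python
# Sanity check of the example's total compression factor:
# 16.9 GiB ingested, 643.5 MiB stored (1 GiB = 1024 MiB).
pre_comp_mib = 16.9 * 1024
post_comp_mib = 643.5
factor = pre_comp_mib / post_comp_mib   # roughly 26.9 with these rounded inputs
```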
The view can be set to various durations between 7 and 120 days.
Related CLI command:
# filesys show compression
Display the space used by, and compression achieved for, files and directories in the file system.
Slide 44
(Screenshot: tooltip on the Space Consumption graph showing Post-Comp, Thu Mar 01 2012 12:00 PM, 2.1 GiB.)
The Space Consumption view contains a graph that displays the space used over time, shown in relation
to total system capacity.
With the Capacity option unchecked (see circled on the slide), the scale is reduced from TiB to GiB in
order to present a clear view of space used. In this example, only 2.1 GiB post-comp has been stored
with a 7.5 TiB capacity. See the next slide to see the consumption view with the capacity indicator.
This view is useful to note trends in space availability on the Data Domain system, such as changes in
space availability and compression in relation to cleaning processes.
Roll over a point on a graph line to display a box with data at that point.
Click Print (at the bottom on the graph) to open the standard Print dialog box.
Click Show in a new window to display the graph in a new browser window.
Slide 45
(Screenshot: tooltip on the Space Consumption graph showing Capacity, Sun Feb 05 2012 12:00 PM, 7.5 TiB.)
When the capacity option is checked, the display scales to TiB, and a line at the maximum capacity of 7.5
TiB appears.
When you roll over the capacity line, an indicator will show the capacity details as shown in this
screenshot.
Notice that at this scale, the 666.0 MiB Post-Comp data mark on February 5, does not show on the
graph.
Slide 46
(Screenshot: tooltip on the Daily Written graph showing Pre-Comp, Thu Feb 02 2012 12:00 PM, 13.7 GiB.)
The Daily Written view contains a graph that displays a visual representation of data that is written daily
to the system over a period of time, selectable from 7 to 120 days. The data amounts are shown over
time for pre- and post-compression amounts.
It is useful to see data ingestion and compression factor results over a selected duration. You should be
able to notice trends in compression factor and ingestion rates.
It also provides totals for global and local compression amounts, and pre-compression and post-compression amounts:
Roll over a point on a graph line to display a box with data at that point.
Click Print (at the bottom on the graph) to open the standard Print dialog box.
Click Show in a new window to display the graph in a new browser window.
Slide 47
Module 5: Summary
Key points covered in this module include:
MTrees can be configured so that different types of data, or
data from different sources, can be managed and reported on
separately.
You can set limits on the amount of logical (pre-comp) space
used by individual MTrees using MTree hard and soft quotas.
Snapshots enable you to save a read-only copy of an MTree at
a specific point in time.
Fast copy gives read/write access to all data fast copied,
making this operation handy for data recovery from snapshots.
Slide 48
Module 5: Summary
Key points covered in this module include (continued):
The default time scheduled for file system cleaning is every
Tuesday at 6 a.m.
Overly frequent cleaning can cause poor deduplication and
increased file fragmentation.
Use the Space Usage, Consumption, and Daily Written views in
the File System tab to monitor data ingestion and compression
rates over time.
The total compression factor is the pre-compression rate
divided by the post-compression rate.
Slide 1
Replication of deduplicated, compressed data offers the most economical approach to the automated
movement of data copies to a safe site using minimum WAN bandwidth. This ensures fast recovery in
case of loss of the primary data, the primary site, or the secondary store.
Slide 2
This lesson is an overview of Data Domain replication types and topologies, configuring, and seeding
replication.
Slide 3
Replication pair
Ethernet/SAN
Clients
Server
Primary
storage
Source
Network
Destination
Data Domain systems are used to store backup data onsite for a short period such as 30, 60 or 90 days,
depending on local practices and capacity. Lost or corrupted files are recovered easily from the onsite
Data Domain system since it is disk-based, and files are easy to locate and read at any time.
In the case of a disaster that destroys the onsite data, the offsite replica is used to restore operations.
Data on the replica is immediately available for use by systems in the disaster recovery facility. When a
Data Domain system at the main site is repaired or replaced, the data can be recovered using a few
simple recovery configuration and initiation commands.
You can quickly move data offsite (with no delays in copying and moving tapes). You don't have to
complete replication for backups to occur. Replication occurs in real time.
Replication typically consists of a source Data Domain system (which receives data from a backup
system), and one or more destination Data Domain systems.
Replication duplicates backed-up data over a WAN after it has been deduplicated and compressed.
Replication creates a logical copy of the selected source data post-deduplication, and only sends any
segments that do not already exist on the destination. Network demands are reduced during replication
because only unique data segments are sent over the network.
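The "send only unique segments" behavior can be sketched with a toy model. The fingerprints and segment bytes below are invented for illustration; this is not the actual replication protocol:

```python
# Sketch of deduplicated replication (hypothetical model): after local
# deduplication, only segments the destination does not already hold
# cross the WAN.

source_segments = {"f1": b"seg-A", "f2": b"seg-B", "f3": b"seg-C"}
destination = {"f1": b"seg-A", "f3": b"seg-C"}   # replicated earlier

# Determine which segments are missing remotely and send only those.
to_send = {fp: data for fp, data in source_segments.items()
           if fp not in destination}
destination.update(to_send)
```

Here only one of three segments is transferred, which is the mechanism behind the reduced network demand described above.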
Slide 4
(Diagram: the directory /data/col1/backup/subdir1 on system A (source) paired with its replica on system B (destination).)
A defined replication source and destination together are called a pair. A source or a destination in the
replication pair is referred to as a context. The context is defined on both the source and destination
Data Domain systems paired for replication.
A replication context can also be termed a replication stream, and although the use case is quite
different, the stream resource utilization within a Data Domain system is roughly equivalent to a read
stream (for a source context) or a write stream (for a destination context).
The count of replication streams per system depends upon the processing power of the Data Domain
system on which they are created. Smaller systems can handle no more than 15 source and 20
destination streams, while the most powerful Data Domain systems can handle over 200 streams.
Slide 5
Replication Topologies
(Diagram: replication topologies - one-to-one (System A source to System B destination), bi-directional, one-to-many, many-to-one, cascaded, and cascaded one-to-many.)
Data Domain supports various replication topologies in which data flows from a source to a destination
directory over a LAN or WAN.
One-to-one replication
The simplest type of replication is from a Data Domain source system to a Data Domain
destination system, otherwise known as a one-to-one replication pair. This replication topology
can be configured with directory, MTree, or collection replication types.
Bi-directional replication
In a bi-directional replication pair, data from a directory or MTree on System A is replicated to
System B, and from another directory or MTree on System B to System A.
One-to-many replication
In one-to-many replication, data flows from a source directory or MTree on System A to several
destination systems. You could use this type of replication to create more than two copies for
increased data protection, or to distribute data for multi-site usage.
Many-to-one replication
In many-to-one replication, whether with MTree or directory, replication data flows from
several source systems to a single destination system. This type of replication can be used to
provide data recovery protection for several branch offices at the corporate headquarters IT
systems.
Cascaded replication
In a cascaded replication topology, a source directory or MTree is chained among three Data
Domain systems. The last hop in the chain can be configured as collection, MTree, or directory
replication, depending on whether the source is directory or MTree.
For example, the first DD system replicates one or more MTrees to a second DD system, which
then replicates those MTrees to a final DD system. The MTrees on the second DD system are
both a destination (from the first DD system) and a source (to the final DD system). Data
recovery can be performed from the non-degraded replication pair context.
Slide 6
(Slide: replication types - directory backup; MTree replication for partial-site, point-in-time backup; managed replication, used with Data Domain Boost.)
Data Domain Replicator software offers four replication types that leverage the different logical levels of
the system described in the previous slide for different effects.
MTree replication: This is used to replicate MTrees between Data Domain systems. It uses the
same WAN deduplication mechanism as used by directory replication to avoid sending
redundant data across the network. The use of snapshots ensures that the data on the
destination is always a point-in-time copy of the source with file consistency, while reducing
replication churn, thus making WAN use more efficient. Replicating individual directories under
an MTree is not permitted with this type.
A fourth type, managed replication, belongs to Data Domain Boost operations and will be
discussed later in this course.
Slide 7
Collection Replication
(Diagram: container logs on system A (source) and system B (destination); the head of the destination collection log lags the head of the source collection log, and replication catches up one container at a time.)
Collection replication replicates the entire /data/col1 area from a source Data Domain system to a
destination Data Domain system. Collection replication uses the logging file system structure to track
replication. Transferring data in this way means simply comparing the heads of the source and
destination logs, and catching-up, one container at a time, as shown in this diagram. If collection
replication lags behind, it continues until it catches up.
The Data Domain system to be used as the collection replication destination must be empty before
configuring replication. Once replication is configured, the destination system is dedicated to receive
data only from the source system.
With collection replication, all user accounts and passwords are replicated from the source to the
destination. If the Data Domain system is a source for collection replication, snapshots are also
replicated.
266
Collection replication is the fastest and lightest type of replication offered by the DD OS. There is no ongoing negotiation between the systems regarding what to send. Collection replication is mostly unaware
of the boundaries between files. Replication operates on segment locality containers that are sent after
they are closed.
Because there is only one collection per Data Domain system, this is specifically an approach to system
mirroring. Collection replication is the only form of replication used for true disaster recovery. The
destination system cannot be shared for other roles. It is read-only and shows data only from one
source. After the data is on the destination, it is immediately visible for recovery.
267
Slide 8
Collection replication replicates the entire /data/col1 area from a source Data Domain system to a
destination Data Domain system. This is useful when all the contents being written to the DD system
need to be protected at a secondary site.
The Data Domain system to be used as the collection replication destination must be empty before
configuring replication. The destination immediately offers all backed up data, as a read-only mirror,
after it is replicated from the source.
Snapshots cannot be created on the destination of a collection replication because the destination is
read-only.
With collection replication, all user accounts and passwords are replicated from the source to the
destination.
268
Data Domain Replicator software can be used with the optional Encryption of Data at Rest feature,
enabling encrypted data to be replicated using collection replication. Collection replication requires the
source and target to have the exact same encryption configuration because the target is expected to be
an exact replica of the source data. In particular, the encryption feature must be turned on or off at both
source and target and if the feature is turned on, then the encryption algorithm and the system
passphrases must also match. The parameters are checked during the replication association phase.
During collection replication, the source system transmits the encrypted user data along with the
encrypted system encryption key. The data can be recovered at the target, because the target machine
has the same passphrase and the same system encryption key.
Collection replication topologies can be configured in the following ways.
One-to-One Replication: This topology can be used with collection replication where the entire
/backup directory from a source Data Domain system is mirrored to a destination Data
Domain system. Other than receiving data from the source, the destination is a read-only
system.
Cascaded Replication: In a cascaded replication topology, directory replication is chained among
three or more Data Domain systems. The last system in the chain can be configured as collection
replication. Data recovery can be performed from the non-degraded replication pair context.
269
Slide 9
Directory Replication
(Slide diagram: the directory /data/col1/backup/subdir1 on the source system is paired with
the directory /data/col1/backup/subdir1 on the destination system.)
With directory replication, a replication context pairs a directory, under /data/col1/backup/ and
all files and directories below it on a source system with a destination directory on a different system.
During replication, deduplication is preserved since data segments that already reside on the destination
system will not be resent across the network. The destination directory is read-only, and it can coexist
on the same system with other replication destination directories, replication source directories, and
other local directories, all of which share deduplication in that system's collection.
The directory replication process is triggered by a file closing on the source. In cases where file closures
are infrequent, Data Domain Replicator forces the data transfer periodically.
If the Data Domain system is a source for directory replication, snapshots within that directory are not
replicated. You must create and replicate snapshots separately.
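As a hedged sketch of how such a context is typically created (the hostnames are hypothetical, and exact option syntax may vary by DD OS release; consult the DD OS Command Reference for your version), a directory pair is added on both systems and then initialized from the source:

```
# replication add source dir://dd-src.example.com/backup/subdir1 \
  destination dir://dd-dst.example.com/backup/subdir1
# replication initialize dir://dd-dst.example.com/backup/subdir1
```

The initialize step performs the first full transfer; from then on only new, deduplicated data is sent.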
270
Slide 10
10
During directory replication, a Data Domain system can perform normal backup and restore operations.
A destination Data Domain system must have available storage capacity that is at least the post-compressed size of the expected maximum size of the source directory. In a directory replication pair,
the destination is always read-only. In order to write to the destination outside of replication, you must
first break replication.
When replication is initialized, a destination directory is created automatically if it does not already
exist. After replication is initialized, ownership and permissions of the destination directory are always
identical to those of the source directory.
Directory replication can receive backups from both CIFS and NFS clients, but cannot mix CIFS and
NFS data in the same directory.
Directory replication supports encryption and retention lock.
271
272
Slide 11
MTree Replication
(Slide diagram: the source system holds the MTrees /backup, /hr, and /sales under /data/col1/.
Periodic snapshots (Snapshot 1, Snapshot 2) of the /sales MTree are replicated to
/data/col1/sales on the destination system.)
MTree replication enables the creation of disaster recovery copies of MTrees, identified by their
/data/col1/mtree pathnames, at a secondary location. A Data Domain system can simultaneously be the source of
some replication contexts and the destination for other contexts. The Data Domain system can also
receive data from backup and archive applications while it is replicating data.
One fundamental difference between MTree replication and directory replication is the method used for
determining what needs to be replicated between the source and destination. MTree replication creates
periodic snapshots at the source and transmits the differences between two consecutive snapshots to
the destination. At the destination Data Domain system, the latest snapshot is not exposed until all of
the data for that snapshot is received. This ensures the destination is always a point-in-time image of
the source Data Domain system. In addition, files do not appear out of order at the destination. This
provides file-level consistency, simplifying recovery procedures. It also reduces recovery time objectives
(RTOs). Users are also able to create a snapshot at the source Data Domain system for application
consistency (for example, after a completion of a backup), which is replicated on the destination where
the data can be used for disaster recovery.
273
MTree replication shares some common features with directory replication. It uses the same WAN
deduplication mechanism as used by directory replication to avoid sending redundant data across the
network. It also supports the same topologies that directory replication supports. Additionally, you can
have directory and MTree contexts on the same pair of systems.
The destination of the replication pair is read-only.
The destination must have sufficient available storage to avoid replication failures.
CIFS and NFS clients should not be used within the same MTree.
MTree replication replicates data for an MTree specified by the /data/col1/mtree pathname;
the destination MTree is identified the same way.
Some replication command options with MTree replication may target a single replication pair (source
and destination directories) or may target all pairs that have a source or destination on the Data Domain
system.
MTree replication is usable with encryption and Data Domain Retention Lock Compliance at the MTree level at the source that is replicated to the destination.
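A comparable hedged sketch for creating an MTree replication context (hypothetical hostnames and MTree name; verify the syntax against your DD OS release):

```
# replication add source mtree://dd-src.example.com/data/col1/sales \
  destination mtree://dd-dst.example.com/data/col1/sales
# replication initialize mtree://dd-dst.example.com/data/col1/sales
```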
274
Slide 12
12
A destination Data Domain system must have available storage capacity that is at least the post-compressed size of the expected maximum size of the source MTree.
A destination Data Domain system can receive backups from both CIFS and NFS clients, each in
its own replication pair, but CIFS and NFS data cannot be mixed in the same MTree.
When replication is initialized, a destination MTree is created automatically; it must not already
exist.
After replication is initialized, ownership and permissions of the destination MTree are always
identical to those of the source MTree.
At any time, due to differences in global compression, the source and destination MTree can
differ in size.
MTree replication supports 1-to-1, bi-directional, one-to-many, many-to-one, and cascaded
replication topologies.
275
Slide 13
Directory-based layout:
/data/col1/backup/Oracle/prod
/data/col1/backup/Oracle/dev
/data/col1/backup/SQL/prod
/data/col1/backup/SQL/dev
These sub-directories are replicated as part of the /backup/ MTree.

MTree-based layout:
/data/col1/prod/Oracle
/data/col1/prod/SQL
/data/col1/dev/Oracle
/data/col1/dev/SQL
Replication is a major feature that takes advantage of MTree structure on the Data Domain system.
MTree structure and flexibility provide greater control over the data being replicated. Careful planning
of your data layout will allow the greatest flexibility when managing data under an MTree structure.
MTree replication works only at the MTree level. If you want to implement MTree replication, you must
move data from the existing directory structure within the /backup MTree to a new or existing MTree,
and create a replication pair using that MTree.
For example, suppose that a Data Domain system has shares mounted in locations under /backup/ as
shown in the directory-based layout.
276
If you want to use MTree replication for your production (prod) data, but are not interested in
replicating any of the development (dev) data, the data layout can be modified to create two MTrees:
/prod and /dev, with two directories within each of them. The old shares would then be deleted and
new shares created for each of the four new subdirectories under the two new MTrees. This would look
like the structure shown in the MTree-based layout.
The Data Domain system now has two new MTrees and, as before, four shares. You can set up MTree
replication for the /prod MTree to replicate all of your production data and not set up replication for
the /dev MTree as you are not interested in replicating your development data.
277
Slide 14
Replication Seeding
high-speed
low-latency link
Source
Destination
14
If the source Data Domain system has a high volume of data prior to configuring replication, the initial
replication seeding can take some time over a slow link. To expedite the initial seeding, you can bring
the destination system to the same location as the source system to use a high-speed, low-latency link.
After data is initially seeded using the high-speed network, you then move the system back to its
intended location.
After data is initially seeded, only new data is sent from that point onwards.
All replication topologies are supported for this process, which is typically performed using collection
replication.
278
Slide 15
15
This lesson shows how to configure replication using DD Enterprise Manager, including low-bandwidth
optimization (LBO), encryption over wire, using a non-default connection port, and setting replication
throttle.
279
Slide 16
Configuring Replication
16
280
281
Slide 17
17
Low bandwidth optimization (LBO) is an optional mode that enables remote sites with limited
bandwidth to replicate and protect more of their data over existing networks. LBO:
Can optionally reduce WAN bandwidth utilization.
Is useful if file replication is being performed over a low-bandwidth WAN link.
Provides additional compression during data transfer.
Is recommended only for file replication jobs that occur over WAN links with less than 6 Mb/s of
available bandwidth. Do not use this option if maximum file system write performance is
required.
LBO can be applied on a per-context basis to all file replication jobs on a system.
Additional tuning might be required to improve LBO functionality on your system. Use the bandwidth and
network-delay settings together to calculate the proper TCP buffer size, and set the replication bandwidth
for greater compatibility with LBO.
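The buffer-size calculation mentioned above is essentially a bandwidth-delay product. A minimal sketch in Python (for illustration only; the link numbers are example values, not Data Domain recommendations):

```python
def tcp_buffer_bytes(bandwidth_mbps: float, rtt_ms: float) -> int:
    """TCP buffer needed to keep a link full: the bandwidth-delay product, in bytes."""
    bits_in_flight = bandwidth_mbps * 1_000_000 * (rtt_ms / 1000.0)
    return int(bits_in_flight / 8)

# A 6 Mb/s WAN link (the LBO threshold noted above) with 80 ms round-trip delay:
print(tcp_buffer_bytes(6, 80))  # 60000 bytes
```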
LBO can be monitored and managed through the Data Domain Enterprise Manager Data Management >
DD Boost > Active File Replications view.
282
Slide 18
(Slide diagram: the source and destination each hold segments such as S1 through S16. New
segment S16 on the source is similar to segment S1, which already exists on the destination, so
only the delta between S1 and S16 crosses the WAN; segments with no similar match are sent in
full.)
Delta compression is a global compression algorithm that is applied after identity filtering. The algorithm
looks for previous similar segments using a sketch-like technique that sends only the difference between
previous and new segments. In this example, segment S1 is similar to S16. The destination can ask the
source if it also has S1. If it does, then it needs to transfer only the delta (or difference) between S1 and
S16. If the destination doesn't have S1, the source can send the full segment data for S16 and the full missing
segment data for S1.
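The idea can be illustrated with a toy sketch (this is not Data Domain's actual algorithm; the byte-level "delta" here is a naive stand-in for the real sketch-based technique):

```python
def bytes_to_send(new_segment, base_segment=None):
    """Bytes that must cross the WAN to deliver new_segment."""
    if base_segment is None:
        return len(new_segment)  # no similar segment on the destination: send it all
    # naive delta: count differing positions plus any length difference
    delta = sum(1 for a, b in zip(new_segment, base_segment) if a != b)
    return delta + abs(len(new_segment) - len(base_segment))

s1 = b"backup-segment-AAAA"
s16 = b"backup-segment-AAAB"  # nearly identical to s1
print(bytes_to_send(s16))      # 19 bytes: full segment
print(bytes_to_send(s16, s1))  # 1 byte: just the delta
```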
Delta comparison reduces the amount of data to be replicated over low-bandwidth WANs by eliminating
the transfer of redundant data found with replicated, deduplicated data. This feature is typically
beneficial to remote sites with lower-performance Data Domain models.
Replication without deduplication can be expensive, requiring either physical transport of tapes or high
capacity WAN links. This often restricts it to being feasible for only a small percentage of data that is
identified as critical and high value.
283
Reductions through deduplication make it possible to replicate everything across a small WAN link. Only
new, unique segments need to be sent. This reduces WAN traffic down to a small percentage of what is
needed for replication without deduplication. These large factor reductions make it possible to replicate
over a less-expensive, slower WAN link or to replicate more than just the most critical data.
As a result, the lag is as small as possible.
284
Slide 19
19
LBO is enabled on a per-context basis. LBO must be enabled on both the source and destination Data
Domain systems. If the source and destination have incompatible LBO settings, LBO will be inactive for
that context. This feature is configurable in the Create Replication Pair settings in the Advanced Tab.
To enable LBO, click the checkbox, Use Low Bandwidth Optimization.
Key points of LBO:
Must be enabled on both source and destination
Can be monitored through the Data Domain Enterprise Manager
Encrypted replication uses the ADH-AES256-SHA cipher suite
Related CLI command:
# replication modify
Enables delta replication on a replication context.
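A hedged example of the per-context form of this command (the destination URL is hypothetical, and option names may differ by DD OS release):

```
# replication modify dir://dd-dst.example.com/backup/subdir1 low-bw-optim enabled
```

As the key points note, the same setting must be applied on both the source and the destination system.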
285
Slide 20
20
Encryption over wire or live encryption is supported as an advanced feature to provide further security
during replication. This feature is configurable in the Create Replication Pair settings in the Advanced
tab.
To enable encrypted file replication, click the checkbox, Enable Encryption Over Wire.
It is important to note, when configuring encrypted file replication, that it must be enabled on both the
source and destination Data Domain systems. Encrypted replication uses the ADH-AES256-SHA cipher
suite and can be monitored through the Data Domain Enterprise Manager.
Related CLI command:
# replication modify
Modifies the destination hostname and sets the state of encryption.
Note: This command must be entered on both Data Domain systems: the source and the
destination (target) system. Only an administrator can set this option.
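A hedged example for a directory context (hypothetical destination URL; verify the exact option syntax against your DD OS release):

```
# replication modify dir://dd-dst.example.com/backup/subdir1 encryption enabled
```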
286
Slide 21
21
The source system transmits data to a destination system listen port. As a source system can have
replication configured for many destination systems (each of which can have a different listen port),
each context on the source can configure the connection port to the corresponding listen port of the
destination.
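A hedged sketch of pointing a source context at a non-default destination listen port (the hostname and port are illustrative; Data Domain replication traffic conventionally defaults to TCP port 2051, and option syntax may vary by DD OS release):

```
# replication modify dir://dd-dst.example.com/backup/subdir1 \
  connection-host dd-dst.example.com port 2051
```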
287
Slide 22
22
288
289
Slide 23
290
23
Slide 24
291
24
Slide 25
Replication Reports
25
Data Domain Enterprise Manager allows you to generate reports to track space usage on a Data Domain
system for a period of up to two years back. In addition, you can generate reports to help understand
replication progress. You can view reports on file systems daily and cumulatively, over a period of time.
Access the Reports view by selecting the Reports stack in the left-hand column of the Data Domain
Enterprise Manager beneath the listed Data Domain systems.
292
Slide 26
Replication Reports
26
The Reports view is divided into two sections. The upper section allows you to create various space
usage and replication reports. The lower section allows you to view and manage saved reports.
The reports display historical data, not real-time data. After the report is generated, the charts remain
static and do not update.
The replication status report includes the status of the current replication job running on the system.
This report is used to provide a snapshot of what is happening for all replication contexts, to help you
understand the overall replication status on a Data Domain System.
The replication summary report includes network-in and network-out usage for all replication, in
addition to per-context levels on the system during the specified duration. This report is used to analyze
network utilization during the replication process to help understand the overall replication
performance on a Data Domain system.
293
Slide 27
27
The replication status report generates a summary of all replication contexts on a given Data Domain
system with the following information:
ID: the context number or designation for a particular context. The context number is used for
identification; 0 is reserved for collection replication, and directory replication numbering begins
at 1.
Source > Destination: The path between both Data Domain systems in the context.
Type: The type of replication context: Directory, MTree, or Collection.
Status: Error or Normal.
Sync as of Time: Time and date stamp of the most recent sync.
Estimated Completion: The estimated time at which the current replication operation should be
complete.
Pre-Comp Remaining: The amount of storage remaining pre-compression (applies only to
collection contexts)
Post-Comp Remaining: The amount of storage remaining post-compression (applies only to
directory, MTree, and collection contexts).
294
If an error exists in a reported context, a section called Replication Context Error Status is added to the
report. It includes the ID, source/destination, the type, the status, and a description of the error.
The last section of the report is the Replication Destination Space Availability, showing the destination
system name and the total amount of storage available in GiB.
Related CLI command:
# replication show performance
Displays current replication activity.
295
Slide 28
296
28
Slide 29
Recovering Data
29
Onsite Data Domain systems are typically used to store backup data onsite for short periods such as 30,
60, or 90 days, depending on local practices and capacity. Lost or corrupted files are recovered easily
from the onsite Data Domain system since it is disk-based, and files are easy to locate and read at any
time.
In the case of a disaster destroying onsite data, the offsite replica is used to restore operations. Data on
the replica is immediately available for use by systems in the disaster recovery facility. When a Data
Domain system at the main site is repaired or replaced, the data can be recovered using a few simple
recovery configuration and initiation commands.
If something occurs that makes the source replication data inaccessible, the data can be recovered from
the offsite replica. Either collection- or directory-replicated data can be recovered to the source. For
collection replication, the destination context must be fully initialized for the recovery process to
succeed. For directory replication, recover a selected data set when one or more directory
replication pairs must be recovered.
Note: If a recovery fails or must be terminated, the replication recovery can be aborted.
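As a hedged sketch, recovery of a directory context is initiated from the repaired or replacement system (the URL is hypothetical; verify the exact form of the command, including how to abort a recovery, in the DD OS Command Reference):

```
# replication recover dir://dd-dst.example.com/backup/subdir1
```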
297
Slide 30
30
298
Slide 31
31
Resynchronization is the process of recovering (or bringing back into sync) the data between a source
and destination replication pair after a manual break in replication. The replication pair is
resynchronized so that both endpoints contain the same data.
Resynchronization can be used:
To convert a collection replication to directory replication. This is useful when the system is to
be a source directory for cascaded replication. A conversion is started with a replication
resynchronization that filters all data from the source Data Domain system to the destination
Data Domain system. This implies that seeding can be accomplished by first performing a
collection replication, then breaking collection replication, then performing a directory
replication resynchronization.
To re-create a context that was lost or deleted.
When a replication destination runs out of space and the source system still has data to
replicate.
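A hedged sketch of starting a resynchronization for a directory context (hypothetical destination URL; exact syntax may vary by DD OS release):

```
# replication resync dir://dd-dst.example.com/backup/subdir1
```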
299
Slide 32
Resynchronization Process
Depending on the amount of data, throughput rates, and load
factors, the resync process can take between several hours and
several days.
32
300
Slide 33
Module 6: Summary
Key points covered in this module:
Replicated data is used to restore operations when backup data is lost.
Data replication types include collection, MTree, and directory.
A replication pair is also called a context.
Replication seeding is a term to describe copying initial source backup
data to a remote destination.
You can resynchronize recovered data when:
You need to recreate a deleted context.
A destination system in a context runs out of space.
You want to convert collection replication to directory replication.
301
33
302
Slide 1
In this module, you will learn about things to consider when planning, configuring, and managing a
virtual tape library (VTL).
303
Slide 2
In this lesson, you will become familiar with the virtual tape library (VTL) environment that is
configurable on a Data Domain system.
304
Slide 3
In some environments, the Data Domain system is configured as a virtual tape library (VTL). This practice
may be motivated by the need to leverage existing backup policies that were built using a strategy of
physical tape libraries. Using a VTL can be an intermediate step in a longer range migration plan toward
disk-based media for backup. It might also be driven by the need to minimize the effort to recertify a
system to meet compliance needs.
A Fibre Channel HBA-equipped host connecting to an FC SAN can ultimately connect to a Fibre Channel
HBA-equipped Data Domain system. When properly zoned, the host can send its backups via VTL
protocol directly to the Data Domain system as if the Data Domain system were an actual tape library
complete with drives, robot, and tapes.
This host can be a Windows, Linux, UNIX, Solaris, IBM i, NetApp, or VNX host, or any NAS system that
supports a Fibre Channel card.
Virtual tape libraries emulate the physical tape equipment and function. Virtual tape drives are
accessible to backup software in the same way as physical tape drives. Once drives are created in the
VTL, they appear to the backup software as SCSI tape drives. A virtual tape library appears to the backup
software as a SCSI robotic device accessed through standard driver interfaces.
305
When disaster recovery is needed, pools and tapes can be replicated to a remote Data Domain system
using the Data Domain replication process and later archived to tape.
Data Domain systems support backups over the SAN via Fibre Channel HBA. The backup application on
the backup host manages all data movement to and from Data Domain systems. The backup application
also directs all tape creation. Data Domain replication operations manage virtual tape replication, and
vaulting. The Data Domain Enterprise Manager is used to configure and manage tape emulations.
306
Slide 4
(Slide diagram: clients on the LAN send backup data over TCP/IP to a server configured with an
Ethernet NIC; the Data Domain system, configured with an NDMP tape server, receives the
backup data and places it onto virtual tapes in the VTL.)
NDMP (Network Data Management Protocol) is an open-standard protocol for enterprise-wide backup
of heterogeneous network-attached storage. NDMP was co-invented by Network Appliance and PDC
Software (acquired by Legato Systems, Inc., and now part of EMC).
Data Domain systems support backups using NDMP over TCP/IP via standard Ethernet as an alternate
method. This offers a VTL solution for remote office/back office use.
Data servers configured only with Ethernet can also back up to a Data Domain VTL when used with an
NDMP tape server on the Data Domain system. The backup host must also be running NDMP client
software to route the server data to the related tape server on the Data Domain system.
When a backup is initiated, the host tells the server to send its backup data to the Data Domain VTL tape
server. Data is sent via TCP/IP to the Data Domain system where it is captured to virtual tape and stored.
While this process can be slower than Fibre Channel speeds, a Data Domain system can function as an NDMP
tape server in an NDMP environment over IP.
307
Slide 5
Allows simultaneous use of VTL with NAS, NDMP, and DD Boost.
Eliminates, through disk-based storage, issues related to physical tape.
Simplifies and speeds up backups through the use of Data
Domain deduplication technology.
Reduces RTO by eliminating the need for physical tape handling.
A Data Domain virtual tape library (VTL) offers a simple integration, leveraging existing backup policies.
A Data Domain VTL can leverage existing backup policies in a backup system currently using a strategy of
physical tape libraries.
Any Data Domain system running VTL can also run other backup operations using NAS, NDMP, and DD
Boost simultaneously.
A Data Domain VTL eliminates the use of tape and the accompanying tape-related issues (large physical
storage requirement, off-site transport, high time to recovery, and tape shelf life) for the majority of
restores. Compared to normal tape technology, a Data Domain VTL provides resilience in storage
through the benefits of Data Invulnerability Architecture (DIA) (end-to-end verification, fault avoidance
and containment, continuous fault detection and healing, and file system recoverability).
308
Compared to physical tape libraries, Data Domain systems configured for VTL simplify and speed up
backups through the use of deduplication technology. Backups are also faster because a virtual
tape does not need to wind, rewind, or position to a particular spot. Robotic movement of tapes is also
eliminated, which speeds up the overall performance of the tape backup.
Disk-based network storage provides a shorter RTO by eliminating the need for handling, loading, and
accessing tapes from a remote location.
309
Slide 6
Access Group
A collection (list) of initiator WWPNs or initiator names and the drives and
changers they are allowed to access. The equivalent of LUN masking.
Barcode
A unique ID for a virtual tape that is assigned when the user creates the virtual
tape cartridge.
CAP
Cartridge access port. In a VTL, a CAP is the emulated tape enter/eject point for
moving tapes to or from a library.
Also called: mail slot
Changer
A device that handles the tape between a tape library and the tape drive. In the
virtual tape world, the system emulates a specific changer type.
Initiator
Any Data Domain Storage System client's HBA world-wide port name (WWPN).
An initiator name is an alias that maps to a client's WWPN.
Library
A collection of magnetic tape cartridges used for long term data backup. A
virtual tape library emulates a physical tape library with tape drives, changers,
CAPs, and slots (cartridge slots).
Also called: autoloader, tape silo, tape mount, tape jukebox
Pool
A collection of tapes that maps to a directory in the Data Domain system, used
to replicate tapes to a destination.
Different tape library products may package some components in different ways, and the names of
some elements may differ among products, but the fundamental function is basically the same.
Data Domain VTL configuration elements include tape libraries, tapes, cartridge access ports, and
barcodes.
Barcode
A unique ID for a virtual tape. Barcodes are assigned when the user creates the virtual tape
cartridge.
310
CAP
An abbreviation for cartridge access port. A CAP enables the user to deposit and
withdraw volumes in an autochanger without opening the door to the autochanger. In a VTL, a
CAP is the emulated tape enter/eject point for moving tapes to or from a library.
Also called: mail slot.
Initiator
Any Data Domain Storage System client's HBA WWPN. An initiator name is an alias that maps to
a client's WWPN.
Library
A collection of magnetic tape cartridges used for long-term data backup. A virtual tape library
emulates a physical tape library with tape drives, changer, CAPs, and slots (cartridge slots).
Also called: autoloader, tape silo, tape mount, tape jukebox, vault.
Pool
A collection of tapes that maps to a directory on a file system, used to replicate tapes to a
destination.
Note: Data Domain pools are not the same as backup software pools. Most backup software,
including EMC NetWorker, has its own pooling mechanism.
311
Slide 7
Slot
A storage location within a library. For example, a tape library has one slot for
each tape that the library can hold.
Tape
A tape is a cartridge holding magnetic tape used to store data long term. Tapes
are virtually represented in a system as grouped data files. The user can
export/import from a vault to a library, move within a library across drives,
slots, and CAPs.
Also called: cartridge.
Tape Drive
The device that records backed-up data to a tape. In the virtual tape world, this
drive still uses the same Linear Tape-Open (LTO) technology standards.
Vault
Slot
A storage location within a library. For example, a tape library has one slot for each tape that the
library can hold.
Tape
A cartridge holding magnetic tape used to store data long term. Tapes are virtually represented
in a system as grouped data files. The user can export/import from a vault to a library, and move
within a library across drives, slots, and CAPs.
Also called: cartridge.
Tape Drive
The device that records backed-up data to a tape cartridge. In the virtual tape world, this drive
still uses the same Linear Tape-Open (LTO) technology standards as physical drives with the
following capacities:
LTO-1: 100 GB per tape
LTO-2: 200 GB per tape
LTO-3: 400 GB per tape
312
There are additional generations of LTO, but only LTO -1, -2, and -3 are currently supported by
Data Domain. Each drive operates as a single data stream on your network.
Vault
A holding place for tapes not currently in any library. Tapes in the vault eventually have to be
inserted into the tape library before they can be used.
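Using the per-tape capacities listed above, a quick sketch of sizing a backup onto virtual tapes (illustrative only; real sizing would also account for deduplication and compression):

```python
import math

# Native per-tape capacities, in GB, for the LTO generations supported by Data Domain.
LTO_CAPACITY_GB = {"LTO-1": 100, "LTO-2": 200, "LTO-3": 400}

def tapes_needed(backup_gb, generation):
    """Number of virtual tapes required to hold backup_gb of data."""
    return math.ceil(backup_gb / LTO_CAPACITY_GB[generation])

print(tapes_needed(1500, "LTO-3"))  # 4 tapes
```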
313
Slide 8
In this lesson, you will become familiar with the evaluation process to determine the capacity and
throughput requirements of a Data Domain system.
Note: This lesson is intended to be a simplified overview of Data Domain VTL configuration planning.
Typically, any production Data Domain system running VTL has been assessed, planned, and configured
by Data Domain implementation experts prior to installation and production.
314
Slide 9
Supports between 64 and 540 LTO-1, LTO-2, or LTO-3 tape drives per system:
DD990 has a 540 virtual drive capacity
DD890 has a 256 virtual drive capacity
DD6xx has a 64 virtual drive capacity
A single Data Domain system can support:
Up to 64 virtual libraries
Up to 32k slots per library and 64k slots per system
Up to 100 CAPs per library and 1000 CAPs per system
Up to 4000 GiB per tape.
Note: These are some of the maximum capacities for various features in a VTL
configuration for the larger Data Domain systems. Check the VTL Best Practices Guide for
recommendations for your system and configuration.
In setting up a virtual tape library (VTL) on a Data Domain system, you configure parameters in the
environment to structure the number and size of elements within each library. The parameters you
choose are dictated by the tape technology and library you are emulating. Efficiencies are dictated by
the processing power and storage capacity of the Data Domain restorer being used as the VTL system.
Larger, faster systems allow more streams to write to a higher number of virtual tape drives, thus
providing faster virtual tape backups.
Libraries: All systems are currently limited to a maximum of 64 libraries (64 concurrently active VTL
instances on each Data Domain system).
Drives: Up to 540 tape drives are supported, depending on the Data Domain model. A DD6xx model can
have a maximum of 64 drives. A DD890 model can have a maximum of 256 drives.
Note: Although a DD890 can be configured with up to 256 tape devices, the system is limited to a
maximum of 180 concurrent streams. Drives beyond the 180-stream limit can still be configured for
provisioning per backup policies.
Initiators: A maximum of 92 initiator names or WWPNs can be added to a single access group.
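The per-model and per-system maximums above can be sketched as a quick validation check. This is an illustrative Python sketch, not a Data Domain tool; the function name and the sample plans are hypothetical.

```python
# Illustrative sketch: check a planned VTL layout against the per-model
# and per-system maximums quoted in this module.

MAX_DRIVES = {"DD990": 540, "DD890": 256, "DD6xx": 64}  # virtual drives per model
LIMITS = {
    "libraries": 64,            # concurrently active VTL instances per system
    "slots_per_library": 32_000,
    "slots_per_system": 64_000,
    "caps_per_library": 100,
    "caps_per_system": 1_000,
    "tape_gib": 4_000,          # maximum size of a single virtual tape
}

def check_vtl_plan(model, libraries, slots_per_library, caps_per_library,
                   drives, tape_gib):
    """Return a list of limit violations (an empty list means the plan fits)."""
    problems = []
    if drives > MAX_DRIVES[model]:
        problems.append(f"{model} supports at most {MAX_DRIVES[model]} drives")
    if libraries > LIMITS["libraries"]:
        problems.append("more than 64 libraries")
    if slots_per_library > LIMITS["slots_per_library"]:
        problems.append("too many slots per library")
    if libraries * slots_per_library > LIMITS["slots_per_system"]:
        problems.append("too many slots per system")
    if caps_per_library > LIMITS["caps_per_library"]:
        problems.append("too many CAPs per library")
    if libraries * caps_per_library > LIMITS["caps_per_system"]:
        problems.append("too many CAPs per system")
    if tape_gib > LIMITS["tape_gib"]:
        problems.append("tape size exceeds 4000 GiB")
    return problems

# A DD890 plan with 2 libraries, 500 slots and 10 CAPs each, 128 drives fits:
print(check_vtl_plan("DD890", 2, 500, 10, 128, 400))   # []
# 300 drives on a DD890 exceeds its 256-drive maximum:
print(check_vtl_plan("DD890", 2, 500, 10, 300, 400))   # lists the violation
```

Always confirm the actual limits for your model in the VTL Best Practices Guide; the numbers here come only from this slide.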
Slide 10
VTL license
Fibre Channel hardware
Number of slots and drives
Space management considerations
Backup size
Data type
Retention periods and expired media
Replication
As you plan your VTL configuration, be sure to give special consideration to the following:
VTL License
VTL is a licensed feature of the Data Domain system. Only one license is needed to back up to a
Data Domain configured for VTL.
When you establish fabric zones via FC switches, the best way to avoid problems with VTL configurations
is to include only one initiator and one target port in one zone. Avoid having any other targets or
initiators in any zones that contain a gateway target HBA port.
The following recommendations apply when connecting the Data Domain system to a backup
host via Fibre Channel:
Only initiators that need to communicate with a particular set of VTL target ports on a Data
Domain system should be zoned with that Data Domain system.
The host-side FC port must be dedicated to Data Domain VTL devices.
All host-side FC HBAs should be upgraded to the latest driver version for the OS being used.
If you are uncertain about compatibility with your FC HBAs installed in an application server
and operating as initiators for VTL, consult the DD OS 5.2 Backup Compatibility Guide,
available on the Support Portal or contact Support for assistance and advice.
When establishing fabric zones via FC switches, the best way to avoid problems with VTL
configurations is to include only one initiator and one target port in one zone.
The number of slots and drives in a VTL are governed by the number of simultaneous backup
and restore streams that are expected to run. Drive counts are also constrained by the
configuration and overall performance limits of your particular Data Domain system. Slot counts
are typically based on the number of tapes used over a retention policy cycle.
Slide 11
Choosing the optimal size of tapes for your needs depends on multiple factors, including the specific
backup application being used, and the characteristics of the data being backed up. In general, it's better
to use a larger number of smaller-capacity tapes than a smaller number of large-capacity tapes, in order
to control disk usage and prevent system full conditions.
When choosing a tape size, you should also consider the backup application being used. For instance,
Hewlett Packard Data Protector supports only LTO-1 /200 GB capacity tapes.
Data Domain systems support LTO-1, LTO-2, and LTO-3 formats:
LTO-1: 100 GB per tape
LTO-2: 200 GB per tape
LTO-3: 400 GB per tape
If the data you are backing up is large (over 200 GB, for example), you may want larger-sized tapes since
some backup applications are not able to span across multiple tapes.
The strategy of using smaller tapes across many drives gives your system greater throughput by using
more data streams between the backup host and Data Domain system.
Larger-capacity tapes pose a risk of system-full conditions. It is more difficult to expire and reclaim the
space of data held on a larger tape than on smaller tapes. A larger tape can have more backups
on it, making it potentially harder to expire because it might contain a current backup on it. Expired
tapes are not deleted, and the space occupied by that tape is not reclaimed until it is relabeled,
overwritten, or deleted. Consider a 1 TB tape: you could expire half of the data on it (500 GB) and still
not be able to reclaim any of that space while the tape is still holding unexpired data.
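The reclamation behavior described here can be illustrated with a short sketch: space comes back only when a whole tape holds no unexpired backups. The function and the data layout are hypothetical, not how DD OS tracks tapes internally.

```python
# Illustrative sketch: space on a virtual tape is reclaimable only when
# every backup on that tape has expired. The same 1000 GB of backups,
# spread over one large tape vs. ten small ones, reclaims very differently
# once half of the backups expire.

def reclaimable_gb(tapes):
    """tapes: list of tapes, each a list of (size_gb, expired) backups."""
    total = 0
    for backups in tapes:
        if backups and all(expired for _, expired in backups):
            total += sum(size for size, _ in backups)  # whole tape reclaimed
    return total

# One 1000 GB tape: 5 expired and 5 unexpired 100 GB backups share it.
one_big = [[(100, True)] * 5 + [(100, False)] * 5]
# Ten 100 GB tapes: 5 entirely expired, 5 entirely unexpired.
ten_small = [[(100, True)] for _ in range(5)] + [[(100, False)] for _ in range(5)]

print(reclaimable_gb(one_big))    # 0 -- unexpired data pins the whole tape
print(reclaimable_gb(ten_small))  # 500
```

Either layout stores the same data, but the smaller tapes let the file system cleaning reclaim 500 GB immediately.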
Slide 12
Tape Sizing
Unexpired and active
data pointers
All backups on a tape must expire, by policy or manually, before the cartridge can be relabeled and its
space made available for reuse. If backups with different retention policies exist on a single piece of
media, the youngest image will prevent file system cleaning and reuse of the tape. You can avoid this
condition by initially creating and using smaller tape cartridges; in most cases, tapes in the 100 GB to
200 GB range.
Unless you are backing up larger files, backing up smaller files to larger-sized tapes aggravates this issue
because each cartridge takes longer to fill with data. Using a larger number of smaller-sized tapes
reduces the chance of a few young files preventing the cleaning of older data on a larger tape.
Slide 13
Optimal tape size depends on the size of the files being backed
When deciding how many tapes to create for your VTL configuration, remember that creating more
tapes than you actually need might cause the system to fill up prematurely and cause unexpected
system-full conditions. In most cases, backup software will use blank tapes before recycling tapes. It is a
good idea to start with tapes whose combined capacity is less than twice the available space on the Data
Domain system.
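A minimal sketch of that starting-point arithmetic, assuming the guideline means the combined capacity of the created tapes should stay under twice the available space (the function name and sample figures are hypothetical):

```python
# Illustrative sketch of the sizing guideline: keep the combined capacity
# of the tapes you create strictly under twice the space available on the
# Data Domain system.

def max_initial_tapes(available_gb, tape_size_gb):
    """Largest tape count whose combined capacity stays under 2x free space."""
    limit_gb = 2 * available_gb
    count = limit_gb // tape_size_gb
    if count * tape_size_gb >= limit_gb:   # stay strictly under the limit
        count -= 1
    return int(count)

# 10 TB free with 100 GB tapes: 2x free space is 20 TB, so up to 199 tapes.
print(max_initial_tapes(10_000, 100))  # 199
# The same system with 400 GB tapes: up to 49 tapes.
print(max_initial_tapes(10_000, 400))  # 49
```

This is only a starting point; retention cycles and backup sizes should refine the count.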
Slide 14
The two-character barcode tag sets the default tape capacity:

Code  Capacity  Tape Type
L1    100 GiB   LTO-1
L2    200 GiB   LTO-2
L3    400 GiB   LTO-3
LA     50 GiB
LB     30 GiB
LC     10 GiB
When a tape is created, a logical, eight-character barcode is assigned that uniquely identifies the tape.
When creating tapes, the administrator must provide the starting barcode. The barcode must start with
six numeric or uppercase alphabetic characters (from the set {0-9, A-Z}). The barcode may end with a
two-character tag for the supported LTO-1, LTO-2, and LTO-3 tape types.
A good practice is to use either two or three of the first characters as the identifier of the group in which
the tapes belong. If you use two characters as the identifier, you can then use four numbers in sequence
to number up to 10,000 tapes. If you use three characters, you are able to sequence only 1000 tapes.
Note: If you specify the tape capacity when you create a tape through the Data Domain Enterprise
Manager, you will override the two-character tag capacity specification.
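The barcode rules above can be sketched and checked in a few lines of Python; the prefix, tag choice, and helper name are illustrative assumptions, not a Data Domain utility.

```python
import re

# Illustrative sketch: generate and validate eight-character virtual tape
# barcodes -- six characters from {0-9, A-Z}, optionally followed by a
# two-character capacity tag (L1/L2/L3 for LTO-1/2/3, LA/LB/LC for the
# 50/30/10 GiB sizes).

BARCODE_RE = re.compile(r"^[0-9A-Z]{6}(L[123ABC])?$")

def make_barcodes(prefix, start, count, tag="L3"):
    """Group prefix + zero-padded sequence + capacity tag, per the slide."""
    digits = 6 - len(prefix)       # a 2-char prefix leaves 4 digits: 10,000 tapes
    return [f"{prefix}{n:0{digits}d}{tag}" for n in range(start, start + count)]

tapes = make_barcodes("HP", 0, 3)
print(tapes)                                        # ['HP0000L3', 'HP0001L3', 'HP0002L3']
print(all(BARCODE_RE.fullmatch(t) for t in tapes))  # True
```

A three-character prefix leaves only three digits, which is why the text notes it can sequence only 1000 tapes.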
Slide 15
In this lesson, you will see the steps you would take to create a library and tapes, and set the logical
interaction between the host initiators and their related access groups.
Basic NDMP tape server configuration with a Data Domain VTL library and a brief overview of VTL
support for IBM i products are also presented.
Slide 16
The Enterprise Manager Configuration Wizard walks you through the initial VTL configuration, using the
VTL configuration module. Typically, the Configuration Wizard is run initially by the EMC installation
team in your environment.
To open the Enterprise Manager Configuration Wizard, go to the Enterprise Manager, and select
Maintenance > More Tasks > Launch Configuration Wizard.
Navigate to the VTL configuration, and click No until you arrive at the VTL Protocol configuration section.
Select Yes to configure VTL.
The wizard steps you through library, tape, initiator, and access group configuration.
Manual configuration is also possible. Manually configuring the tape library and tapes, importing tapes,
configuring physical resources, setting initiators, and creating VTL access groups are covered in the
following slides.
Slide 17
Creating a Library
Libraries identify the changer, the drives, the drives' associated slots and CAPs, and the tapes to be used
in a VTL configuration.
To create a library outside of the Configuration Wizard, go to Data Management > VTL.
Click the Virtual Tape Libraries stack > More Tasks menu > Library > Create.
Pictured here is the Create Library window in the Data Domain Enterprise Manager.
If the VTL is properly planned ahead of time, you should know the values to enter when creating a
library.
Keep in mind the capacities and scalability of the elements configured when creating a library (see the
earlier slide on capacity and scalability).
1. Check the backup software application documentation on the Data Domain support site for the
model name you should use with your application. Typically, Restorer-L180 is used only with
Symantec NetBackup and BackupExec software. TS3500 is used with various backup applications
and various OS versions. If you intend to use TS3500 as your changer emulator, check the DD OS
5.2 Backup Compatibility Guide to be sure TS3500 is supported with your selected OS version
and backup application.
2. Click OK.
The new library appears under the Libraries icon in the VTL Service stack. Options configured
above appear as icons under the library. Clicking the library displays the configuration details in
the informational pane.
Related CLI Commands:
# vtl add
Creates/adds a tape library.
# vtl enable
Enables VTL subsystem.
# vtl disable
Closes all libraries and shuts down the VTL process.
Slide 18
Creating Tapes
18
To create tapes:
1. Select the Virtual Tape Library stack, then click the library for which you want to create tapes. In
this case the library titled VTL is selected.
2. From the More Tasks menu (not pictured), select Tapes > Create
The Create Tapes pane appears as shown in this slide.
Refer to your implementation planning to find the number, capacity, and starting barcode for your tape
set.
A VTL supports up to 100,000 tapes, and the tape capacity can be up to 4000 GiB.
You can use the Enterprise Manager to create tapes.
You can create tapes from within a library, a vault, or a pool.
Related CLI commands:
# vtl tape add
Adds one or more virtual tapes and inserts them into the vault. Optionally, associates the tapes
with an existing pool for replication.
Slide 19
Importing Tapes
then click
Import from Vault
When tapes are created, they are added into the vault. From the vault, tapes can be imported, exported,
moved, searched, and removed. Importing moves existing tapes from the vault to a library slot, drive, or
cartridge access port (CAP). The number of tapes you can import at one time is limited by the number of
empty slots in the library.
To import tapes:
1. Select Data Management > VTL > VTL Service > Libraries.
2. Select a library and view the list of tapes, or click More Tasks and select Tapes > Import
3. Enter the search criteria about the tapes you want to import and click Search.
4. Select the tapes to import from the search results.
or
1. Select Data Management > VTL > VTL Service > Libraries.
2. Select the tapes to import by clicking the checkbox next to a tape's barcode, or select all by
clicking the top of the checkbox column.
3. Only tapes showing Vault in the Location column can be imported.
4. Click Import from Vault.
Slide 20
There are four steps to configuring the physical resources used for VTL communication:
1. Enable the HBA ports to be used with your VTL configuration.
2. Work with your networking resources to ensure that the SAN switch is connected and zoned
properly between the host and the Data Domain system.
3. Locate and set the alias of the initiators in the Physical Resources stack in the Data Domain
Enterprise Manager.
4. Configure the VTL access groups.
Slide 21
Slide 22
The slide shows the Initiators view, with callouts indicating the assigned access group for each initiator
and the world-wide node number and port number of the FC port in the media server.
An initiator is the HBA worldwide port name (WWPN) of any Data Domain storage system client that
belongs to the backup host. An initiator name is an alias that maps to a client's WWPN. The Data Domain
system interfaces with the initiator for VTL activity. Initiator aliases are useful because it is easier to
reference a name than an eight-pair WWPN number when configuring access groups.
For instance, you might have a host server with the name HP-1, and you want it to belong to a group
HP-1. You can name the initiator coming from that host server HP-1, then create an access group also
named HP-1 and ensure that the associated initiator has the same name.
To set the alias of an initiator:
1. Click Data Management > VTL > Physical Resources > Initiators.
2. Select the initiator you want to alias.
3. Click More Tasks > Set Alias
Slide 23
A VTL access group (or VTL group) is created to manage a collection of initiator WWPNs or aliases and
the drives and changers they are allowed to access. Access group configuration allows initiators in
backup applications to read and write data only to the devices included in the access group list. An
access group may contain multiple initiators (a maximum of 128), but an initiator can exist in only one
access group. A maximum of 512 initiators can be configured for a Data Domain system.
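A minimal sketch of these access-group rules, assuming a simple in-memory model (the class and method names are hypothetical, not DD OS code):

```python
# Illustrative sketch: model the access group rules quoted above -- at most
# 128 initiators per group, an initiator may belong to only one group, and
# at most 512 initiators system-wide.

class AccessGroups:
    MAX_PER_GROUP = 128
    MAX_PER_SYSTEM = 512

    def __init__(self):
        self.groups = {}        # group name -> set of initiator aliases/WWPNs
        self.assigned = set()   # every initiator already placed in a group

    def add_initiator(self, group, initiator):
        members = self.groups.setdefault(group, set())
        if initiator in self.assigned:
            raise ValueError("initiator already belongs to an access group")
        if len(members) >= self.MAX_PER_GROUP:
            raise ValueError("access group is full (128 initiators)")
        if len(self.assigned) >= self.MAX_PER_SYSTEM:
            raise ValueError("system limit of 512 initiators reached")
        members.add(initiator)
        self.assigned.add(initiator)

ag = AccessGroups()
ag.add_initiator("HP-1", "21:00:00:24:ff:31:ce:f8")   # hypothetical WWPN
```

Trying to place the same WWPN into a second group raises an error, mirroring the one-group-per-initiator rule.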
A default access group named TapeServer exists, to which you can add devices that support NDMP-based backup applications. Configuration for this group is discussed in the next slide.
Access groups are similar to LUN masking. They allow clients to access only selected LUNs (media
changers or virtual tape drives) on a system through assignment. A client set up for an access group can
access only those devices in the access group to which it is assigned.
Note: Avoid making access group changes on a Data Domain system during active backup or restore
jobs. A change may cause an active job to fail. The impact of changes during an active job depends on a
combination of backup software and host configurations.
Slide 24
The Initiators tab of the Access Group shows the initiator alias and its related WWPN, grouped with the
LUNs listed in the LUNs tab.
It shows the administrator that the host associated with this initiator can see the changers and drives
listed in the LUNs tab.
Slide 25
When configured for NDMP over TCP/IP, a Data Domain system starts an NDMP tape
server.
NDMP tape servers are accessed via a standard NDMP protocol. For more details see http://ndmp.org.
The host server must have NDMP client software installed and running. This client software is used to
remotely access the Data Domain VTL.
Devices assigned to the access group TapeServer on the Data Domain system can be accessed only by
the NDMP tape server.
The NDMP tape server on the Data Domain system converts this data to tape I/O, and writes to the Data
Domain VTL.
An NDMP user is associated with the configuration for authentication purposes. DD OS users can be
used, but their passwords are sent in plain text over the network. The ndmpd command adds the user
and can enable password encryption for added security.
The top-level CLI command is ndmpd.
Slide 26
Make sure the NDMP daemon sees the devices in the TapeServer access group:

sysadmin@dddev-01# ndmpd show devicenames
NDMP Device         Virtual Name     Vendor  Product      Serial #
------------------  ---------------  ------  -----------  ----------
/dev/dd_ch_c0t310   Mydd610 changer  STK     L180         3478270003
/dev/dd_ch_c0t410   Mydd610 drive 1  IBM     ULTRIUM-TD3  3478270004
/dev/dd_ch_c0t510   Mydd610 drive 2  IBM     ULTRIUM-TD3  3478270005
/dev/dd_ch_c0t910   Mydd610 drive 3  IBM     ULTRIUM-TD3  3478270006
/dev/dd_ch_c0t1310  Mydd610 drive 4  IBM     ULTRIUM-TD3  3478270007
The following steps configure an NDMP tape server on the Data Domain system.
1. Enable the NDMP daemon by typing the CLI command ndmpd enable.
2. Verify that the NDMP daemon sees the devices created in the TapeServer access group by
entering the command ndmpd show devicenames.
Note: You must first create a VTL per the instructions discussed earlier in this module, and then
assign the access group TapeServer, before performing this step.
The VTL device names will appear as a table as shown in this slide.
3. Add an NDMP user for the ndmpd service. Enter the command, ndmpd user add ndmp.
When prompted, enter and verify the password for this user. Verify the created user by entering
the command, ndmpd user show. The username appears below the command.
Slide 27
Slide 28
IBM Power Systems utilize a hardware abstraction layer that isolates the operating system from the
physical hardware. All peripheral equipment must emulate IBM equipment, including IBM tape libraries
and devices, when presented to the operating system.
Additionally, the hardware drivers used by these systems are embedded in the LIC and IBM i operating
system. LIC PTFs, or program temporary fixes, are IBM's method of updating and activating the drivers.
In most cases, hardware configuration settings cannot be manually configured: because only IBM
equipment, or equipment that emulates IBM equipment, is attached, only fixed configuration settings
are required.
Fibre Channel devices can be connected directly to host (direct attach) through FC-AL topology or
through a switched fabric (FC-SW) topology. Please note that the Data Domain VTL supports only
switched fabric for connectivity. The Fibre Channel host bus adapters or IOAs (input/output adapters)
can negotiate at speeds of 2 Gbps, 4 Gbps, and 8 Gbps in an FC-SW environment without any
configuration on the operating system other than plugging in the cable at the host. Fibre Channel IOPs
and IOAs are typically installed by an IBM business partner.
Virtual Libraries
Data Domain VTL supports one type of library configuration for IBM i use: an IBM TS3500
configured with IBM LTO-3 virtual tape drives. Virtual library management is done from the Virtual Tape
Libraries tab. From Virtual Tape Libraries > More Tasks > Library > Create, you can set the number of
virtual drives and the number of slots.
A special VTL license that supports IBM i use is required. This special license supports other VTL
configurations as well, but the standard VTL license does not directly support IBM i configurations.
IBM i virtual libraries are not managed any differently from those of other operating systems. Once the
library and tapes are created, they are managed either by BRMS (IBM's tape management on the i),
through other IBM i native command access, or by third-party tape management systems. The only
library supported on the IBM i is the TS3500, with LTO-3 drives. They must be created after you add the
i/OS license to the DD system to have the correct IBM i configuration.
Refer to the Virtual Tape Library for IBM System i Integration Guide, available in the support portal, for
current configuration instructions and best-practices information when using VTL in an IBM i
environment.
Slide 29
Slide 30
Module 7: Summary
Slide 1
Module 8: DD Boost
This module discusses how DD Boost incorporates several features to significantly reduce backup time
and manage replicated data for easier access in data recovery operations.
By the end of this module, you should be able to:
Describe DD Boost features and their functions.
Identify how replication is enhanced with DD Boost.
Describe how DD Boost is configured for operation.
Slide 2
EMC Data Domain Boost extends the optimization capabilities of Data Domain systems for other EMC
environments, such as Avamar and NetWorker, as well as Greenplum, Quest vRanger, Oracle RMAN,
Symantec NetBackup, and Backup Exec.
In this lesson, you will get an overview of the DD Boost functionality and the features that make up this
licensed addition to the Data Domain operating system.
Slide 3
Backups with DD Boost are faster and more efficient than CIFS/NFS.
The application host is aware of, and manages, replication of backups created with DD Boost. This
is called Managed File Replication.
DD Boost shares the work of deduplication by distributing some of the processing to the
application host. This feature is called distributed segment processing (DSP).
Reduced load on the storage node/backup host. Managed file replication, an optional
feature of DD Boost, offers a replication environment where the application host is both
aware of replication and can control it.
DD Boost provides systems with centralized replication awareness and management. Using this
feature, known as Managed File Replication, backups written to one Data Domain system can be
replicated to a second Data Domain system under the management of the application host. The
application host catalogs and tracks the replica, making it immediately accessible for recovery
operations. Administrators can use their backup application to recover duplicate copies directly
from a replica Data Domain system.
Benefits of managed file replication include:
Faster disaster recovery.
Quicker access to recovery. All backups and clones are cataloged in your backup application
on your server.
Full administrative control of all backups and replicas through the backup software.
Slide 4
Optimized synthetic backups reduce processing overhead associated with traditional synthetic full
backups. Just like a traditional backup scenario, optimized synthetic backups start with an initial full
backup followed by incremental backups throughout the week. However, the subsequent full backup
requires no data movement between the application server and Data Domain system. The second full
backup is synthesized using pointers to existing segments on the Data Domain system. This optimization
reduces the frequency of full backups, thus improving recovery point objectives (RPO) and enabling
single step recovery to improve recovery time objectives (RTO). In addition, optimized synthetic backups
further reduce the load on the LAN and application host.
Benefits include:
Reduces the frequency of full backups
Improves RPO and RTO
Reduces load on the LAN and application host
Both low-bandwidth optimization and encryption of managed file replication data are optional
replication features, and both are supported with DD Boost enabled.
Slide 5
As of DD OS version 5.2, DD Boost supports interoperability with the listed products on various
backup host platforms and operating systems. The interoperability matrix is both large and complex. To
be certain a specific platform and operating system is compatible with a version of DD Boost, consult the
EMC DD Boost Compatibility Guide found in the Support Portal at http://my.datadomain.com.
Slide 6
To store backup data using DD Boost, the Data Domain system exposes user-created disk volumes called
storage units (SUs) to a DD Boost-enabled application host. In this example, an administrator created an
SU named exchange_su. As the system completes the SU creation, an MTree is created, and the file
/.ddboost is placed within the created MTree. Creating additional storage units creates additional
MTrees under /data/col1, each with its own /.ddboost file. Access to the SU is OS
independent. Multiple application hosts, when configured with DD Boost, can use the same SU on a
Data Domain system as a storage server.
Storage units can be monitored and controlled just as any data managed within an MTree. You can set
hard and soft quota limits and receive reports about MTree content.
Note: Storage units cannot be used with anything but a DD Boost replication context.
Slide 7
The slide shows clients and a backup host (server) on the LAN, sending backup data to a DD Boost-enabled Data Domain system.
If you recall, the deduplication on a Data Domain system is a five-step process where the system:
1. Segments data to be backed up
2. Creates fingerprints of segment data
3. Filters the fingerprints and notes references to previously stored data
4. Compresses unique, new data to be stored
5. Writes the new data to disk
In normal backup operations, the backup host has no part in the deduplication process. When backups
run, the backup host sends all backup data to the Data Domain system, which performs the entire
deduplication process on all of the data.
Slide 8
The slide shows the same environment with the DD Boost library installed on the backup host: clients and the backup host (server) on the LAN, with DD Boost running on both the host and the Data Domain system.
Distributed segment processing (DSP) shares deduplication duties with the backup host. With DSP
enabled, the backup host:
1. Segments the data to be backed up
2. Creates fingerprints of segment data and sends them to the Data Domain system
3. Optionally compresses data to be backed up
4. Sends only the requested unique data segments to the Data Domain system
The Data Domain system:
1. Filters the fingerprints sent by the backup host and requests data not previously stored
2. Notes references to previously stored data and writes new data
The deduplication process is the same whether DSP is enabled or not. With DSP enabled, the backup
host will split the arriving data into 4 KB to 12 KB segments. A fingerprint (or segment ID) is created for each
segment. Each segment ID is sent over the network to the Data Domain system to filter. The filter
determines if the segment ID is new or a duplicate. The segment IDs are checked against segment IDs
already on the Data Domain system. The segment IDs that match existing segments IDs are referenced
and discarded, while the Data Domain system tells the backup host which segment IDs are unmatched
(new).
Unmatched or new segments are compressed using common compression techniques, such as LZ, GZ, or
Gzfast. This is also called local compression. The compressed segments are sent to the Data Domain
system and written to the Data Domain system with the associated fingerprints, metadata, and logs.
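The division of labor described above can be sketched in Python. This is an illustrative simulation only, not DD OS code: it uses fixed 8 KB segments, SHA-1 fingerprints, and zlib in place of the real variable-length segmenting, fingerprinting, and LZ/GZ local compression.

```python
import hashlib
import zlib

# Illustrative DSP sketch: the backup host segments and fingerprints the
# data, the Data Domain system filters the fingerprints, and only segments
# the system has not already stored are compressed and sent.

def segments(data, size=8192):
    return [data[i:i + size] for i in range(0, len(data), size)]

def fingerprint(segment):
    return hashlib.sha1(segment).hexdigest()

class DataDomainSystem:
    def __init__(self):
        self.store = {}                        # fingerprint -> compressed segment

    def filter(self, fingerprints):
        """Return only the fingerprints of segments not previously stored."""
        return [fp for fp in fingerprints if fp not in self.store]

    def write(self, fp, compressed):
        self.store[fp] = compressed

def backup(host_data, dd):
    """Host-side work: segment, fingerprint, send only new compressed segments."""
    segs = {fingerprint(s): s for s in segments(host_data)}
    needed = dd.filter(list(segs))             # server says which IDs are new
    for fp in needed:
        dd.write(fp, zlib.compress(segs[fp]))  # local compression on the host
    return len(needed)                         # segments actually transmitted

dd = DataDomainSystem()
first = backup(b"A" * 16384 + b"B" * 8192, dd)   # two unique segments sent
second = backup(b"A" * 16384 + b"B" * 8192, dd)  # identical backup: nothing sent
print(first, second)   # 2 0
```

The repeated backup transmits zero segments, which is exactly the bandwidth saving DSP provides on the LAN.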
The main benefits of DSP are:
More efficient CPU utilization.
Improved utilization of network bandwidth. Less data throughput is required to send with each
backup.
Less time to restart failed backup jobs. If a job fails, the data already sent to the Data Domain
system does not need to be sent again, reducing the load on the network and improving the
overall throughput for the failed backups upon retry.
Distribution of the workload between the Data Domain system and the DD Boost aware
application.
DD Boost can operate with distributed segment processing either enabled or disabled.
Slide 9
The network bandwidth requirements are significantly reduced because only unique data is sent over
the LAN to the Data Domain systems.
Consider DSP only if your application host can accommodate the additional processing required by its
share of the DSP workflow.
Slide 10
The slide shows a replication pair: clients and a server sending backups to a source Data Domain system with DD Boost, which replicates over the WAN to a destination Data Domain system with DD Boost.
DD Boost integration enables the backup application to manage file replication between two or more
Data Domain systems configured with DD Boost software. It is a simple process to schedule Data
Domain replication operations and keep track of backups for both local and remote sites. In turn,
recovery from backup copies at the central site is also simplified because all copies are tracked in the
backup software catalog.
The Data Domain system uses a wide area network (WAN)-efficient replication process for deduplicated
data. The process can be optimized for WANs, reducing the overall load on the WAN bandwidth
required for creating a duplicate copy.
Slide 11
The slide diagrams managed file replication in a NetWorker environment: the NetWorker storage node sends the initial backup to the local Data Domain system and updates its control data when the backup completes; it then issues a clone request, the local system replicates the backup to the remote Data Domain system, and when replication completes the storage node updates its control data for the replication copy.
This example shows managed file replication with DD Boost. The example is specific to an EMC
NetWorker environment. Symantec and other backup applications using DD Boost will manage
replication in a similar manner.
In this environment, a backup server is sending backups to a local Data Domain system. A remote Data
Domain system is set up for replication and disaster recovery of the primary site.
1. The NetWorker storage node initiates the backup job and sends data to the Data Domain
system. Backup proceeds.
2. The Data Domain system signals that the backup is complete.
3. Information about the initial backup is updated in the NetWorker media database.
4. The NetWorker storage node initiates replication of the primary backup to the remote Data
Domain system through a clone request.
5. Replication between the local and remote Data Domain systems proceeds.
6. When replication completes, the NetWorker storage node receives confirmation of the
completed replication action.
7. Information about the clone copy of the data set is updated in the NetWorker media database.
Replicated data is now immediately accessible for data recovery using the NetWorker media database.
Slide 12
While it is acceptable for both standard MTree replication and managed file replication to operate on
the same system, be aware that managed file replication can be used only with MTrees established with
DD Boost storage units. MTree replication can be used only with CIFS and NFS data.
You also need to be mindful not to exceed the total number of 100 MTrees on a system. The 100 MTree
limit is a count of both standard MTrees and MTrees created as DD Boost storage units.
Also remember to remain below the maximum total number of replication pairs (contexts)
recommended for your particular Data Domain systems.
Slide 13
The slide shows backup hosts, each running an OST plug-in, connected through application-layer aggregation to a 4-port interface group (two NICs) on the Data Domain system.
For Data Domain systems that require multiple 1 GbE links to obtain full system performance, it is
necessary to set up multiple backup servers on the Data Domain systems (one per interface) and target
the backup policies to different servers to spread the load on the interfaces. Using the DD Boost
interface groups, you can improve performance on 1 Gb Ethernet ports.
The Advanced Load Balancing and Link Failover feature allows for combining multiple Ethernet links into
a group. Only one of the interfaces on the Data Domain system is registered with the backup
application. DD Boost software negotiates with the Data Domain system on the interface registered with
the backup application to obtain an interface to send the data. The load balancing provides higher
physical throughput to the Data Domain system compared to configuring the interfaces into a virtual
interface using Ethernet-level aggregation.
The links connecting the backup hosts and the switch that connects to the Data Domain system are
placed in an aggregated failover mode. A network-layer aggregation of multiple 1 GbE or 10 GbE links is
registered with the backup application and is controlled on the backup server.
This configuration provides network failover functionality from end-to-end in the configuration. Any of
the available aggregation technologies can be used between the backup servers and the switch.
An interface group is configured on the Data Domain system as a private network used for data transfer.
The IP address must be configured on the Data Domain system and its interface enabled. If an interface
(or a NIC that has multiple interfaces) fails, all of the in-flight jobs to that interface transparently fail over to a healthy interface in the interface group (ifgroup). Any jobs started subsequent to the failure
are routed to the healthy interfaces. You can add public or private IP addresses for data transfer
connections.
Distributed segment processing (DSP) is not affected by DD Boost application-level groups.
With dynamic load balancing and failover, the DD Boost plug-in dynamically negotiates with the Data
Domain system on the interface registered with the backup application to obtain an interface to send
the data. The load balancing provides higher physical throughput to the Data Domain system compared
to configuring the interfaces into a virtual interface using Ethernet-level aggregation.
Note: Do not use 1GbE and 10GbE connections in the same interface group.
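An ifgroup like the one described above can also be managed from the CLI. The following is a minimal sketch only: the IP addresses are placeholders, and the exact command syntax should be verified against the DD OS Command Reference for your release.

```
# Add the private data-transfer IP addresses to the interface group
ifgroup add interface 192.168.1.1
ifgroup add interface 192.168.1.2

# Enable the group and confirm its configuration
ifgroup enable
ifgroup show config
```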
Slide 14
A synthetic full or synthetic cumulative incremental backup is a backup assembled from previous
backups. Synthetic backups are generated from one previous, traditional full or synthetic full backup,
and subsequent differential backups or a cumulative incremental backup. (A traditional full backup
means a non-synthesized, full backup.) A client can use the synthesized backup to restore files and
directories in the same way that a client restores from a traditional backup.
During a traditional full backup, all files are copied from the client to a media server and the resulting
image set is sent to the Data Domain system. The files are copied even though those files may not have
changed since the last incremental or differential backup. During a synthetic full backup, the previous
full backup and the subsequent incremental backups on the Data Domain system are combined to form
a new, full backup. The new full synthetic backup is an accurate representation of the client's file
system at the time of the most recent full backup.
Because processing takes place on the Data Domain system under the direction of the media server
instead of the client, virtual synthetic backups help to reduce the network traffic and client processing.
Client files and backup image sets are transferred over the network only once. After the backup images
are combined into a synthetic backup, the previous incremental and/or differential images can be
expired.
The virtual synthetic full backup is a scalable solution for backing up remote offices with manageable
data volumes and low levels of daily change. If the clients experience a high rate of change daily, the
incremental or differential backups are too large. In this case, a virtual synthetic backup is no more
helpful than a traditional full backup. To ensure good restore performance, it is recommended that you
create a traditional full backup every two months, presuming a normal weekly full and daily incremental
backup policy.
The virtual synthetic full backup is the combination of the last full (synthetic or full) backup and all
subsequent incremental backups. It is time-stamped as occurring one second after the latest
incremental. It does NOT include any changes to the backup selection since the latest incremental.
Slide 15
system.
How well your systems handle DSP.
How frequently you perform data restores from your backed-up data.
The type of data being backed up; that is, does it lend itself well to virtual synthetic backups?
Synthetic backups can reduce the load on an application server and the data traffic between an
application server and a media server. Synthetic backups can reduce the traffic between the media
server and the DD System by performing the Virtual Synthetic Backup assembly on the DD System.
You might want to consider using virtual synthetic backups when:
Your backups are small, and localized, so that daily incrementals are small (<10% of a normal,
full backup).
The Data Domain system you are using has a large number of disks (>10).
Data restores are infrequent.
Your intention is to reduce the amount of network traffic between the application server, the
media servers and the Data Domain system.
Your media servers are burdened and might not handle DSP well.
Slide 16
EMC Data Domain Boost integrates with many EMC, and a growing number of third-party, applications.
This lesson discusses how DD Boost integrates with EMC NetWorker and Symantec NetBackup.
Slide 17
Enabling DD Boost
[Slide diagram: a backup host with the DD Boost Library sends data to a source Data Domain system, which replicates to a destination Data Domain system; both systems run DD Boost.]
A separate DD Boost license is required for a destination Data Domain system if you implement the managed file replication feature.
The DD Boost feature is built into the Data Domain operating system. Unlock the DD Boost feature on
each Data Domain system with separate license keys. If you do not plan to use Managed File
Replication, the destination Data Domain system does not require a DD Boost license.
Note: For EMC, Oracle, and Quest users, the Data Domain Boost library is already included in recent
versions of software. Before enabling DD Boost on Symantec Backup Exec, and NetBackup, a special OST
plug-in must be downloaded and installed on the backup host. The plug-in contains the appropriate DD
Boost Library for use with compatible Symantec product versions. Consult the most current DD Boost
Compatibility Guide to verify compatibility with your specific software and Data Domain operating
system versions. Both the compatibility guide and versions of OpenStorage (OST) plug-in software are
available through the Data Domain support portal at: http://my.datadomain.com.
A second Data Domain system, licensed with DD Boost and acting as the destination, is needed when
implementing centralized replication awareness and management.
Slide 18
DD Boost Configuration
On the backup host (DD Boost Library):
1. License as required.
2. Create devices and pools through the backup server management console and interface.
3. Configure the backup policies/groups to use Data Domain configured devices.
4. Configure the backup host to use Data Domain configured devices on the desired Data Domain systems.
On the source Data Domain system:
1. License DD Boost.
2. Enable DD Boost.
3. Set a client and a Data Domain local user as a DD Boost user.
4. Create DD Boost storage units.
5. Enable or disable optional DD Boost features.
On the destination Data Domain system:
1. License DD Boost.
2. Enable DD Boost.
3. Set a Data Domain local user as a DD Boost user.
4. Create DD Boost storage units.
Network note: Enable the following ports:
UDP 2049 (enables NFS communication)
Data Domain Boost configuration is the same for all backup environments:
On each of the Data Domain systems:
1. License DD Boost on the Data Domain system(s): System Settings > Licenses > Add Licenses
2. Enable DD Boost on all Data Domain systems: Data Management > DD Boost > DD Boost Status >
Enable.
3. Set a backup host as a client by hostname (the configuration does not accept IP addresses in this
case). Define a Data Domain local user as the DD Boost User: Data Management > DD Boost >
DD Boost User > Modify
4. Create at least one storage unit. You must create one or more storage units for each Data
Domain system enabled for DD Boost: Data Management > DD Boost > Storage Units > Create
Storage Unit
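The same four steps can be performed from the CLI. This is a hedged sketch only: the license code, user name, and storage-unit name are placeholders, and the exact command syntax (for example, `ddboost set user-name`) should be confirmed in the DD OS Command Reference for your version.

```
license add ABCD-EFGH-IJKL-MNOP    # 1. apply the DD Boost license key
ddboost enable                     # 2. enable DD Boost
ddboost set user-name ddboost      # 3. assign a local user as the DD Boost user
ddboost storage-unit create SU1    # 4. create a storage unit
ddboost status                     # verify that DD Boost is enabled
```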
Slide 19
Enable DD Boost by navigating in the Data Domain Enterprise Manager to Data Management > DD Boost
> Settings.
In the example on the slide, the current DD Boost status is Enabled. Click the button circled
in red to enable or disable DD Boost on a system.
Slide 20
To add or change a DD Boost user for the system, click the Modify button. In the Modify DD Boost
User window, select an existing user or add a new user, give the user a password, and assign a
role. In the example on this slide, we have added the user name ddboost and assigned the role of
backup-operator.
In the Allowed Clients field, click the green plus button to add a new client that you are allowing to
access DD Boost on the system. Add the client by hostname, since IP addresses are not
allowed.
Slide 21
Create a storage unit by navigating to Data Management > DD Boost > Storage Units > Create
Note: The Storage Unit Details section is new in DD OS 5.2. It provides a good summary of a storage
unit: file count, compression ratio, storage unit status, and quota settings.
Name the storage unit and set any quota settings you wish. Be aware that these quota settings are not
enforced unless MTree quotas are enabled.
Slide 22
To enable or disable distributed segment processing, bandwidth optimization for file replication, and file
replication encryption, click More Tasks > Set Options.
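The equivalent CLI commands are sketched here under the assumption that your DD OS release supports them; check the Command Reference before relying on exact syntax.

```
# Toggle distributed segment processing
ddboost option set distributed-segment-processing enabled

# Review the current DD Boost option settings
ddboost option show
```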
Slide 23
In this lab, you have the choice of configuring DD Boost using either EMC NetWorker or Symantec
NetBackup. If time allows, you may perform this lab twice, configuring with both backup applications.
Slide 24
Module 8: Summary
Key points covered in this module:
DD Boost uses distributed segment processing (DSP) to reduce
network bandwidth.
DD Boost features centralized replication management as a
single point for tracking all backups and duplicate copies.
DD Boost uses advanced load balancing and failover among
available ports, thereby keeping backups running efficiently and
fault tolerant.
With DSP, the deduplication process is distributed between the
backup host and a Data Domain system, increasing aggregate
throughput while decreasing data transferred over the network.
Slide 1
In this module, you will learn about security and protecting your data with a Data Domain system,
specifically how to:
Describe the purposes of, and differences between, retention lock compliance and retention
lock governance.
Configure and set retention lock compliance
Describe file system locking
Describe and perform data sanitization
Describe and perform encryption for data at rest
Slide 2
As data ages and becomes seldom used, EMC recommends moving this data to archive storage where it
can still be accessed, but no longer occupies valuable storage space.
Unlike backup data, which is a secondary copy of data for shorter-term recovery purposes, archive data
is a primary copy of data and is often retained for several years. In many environments, corporate
governance and/or compliance regulatory standards can mandate that some or all of this data be
retained as-is. In other words, the integrity of the archive data must be maintained for specific time
periods before it can be deleted.
The EMC Data Domain Retention Lock (DD Retention Lock) feature provides immutable file locking and
secure data retention capabilities to meet both governance and compliance standards of secure data
retention. DD Retention Lock ensures that archive data is retained for the length of the policy with data
integrity and security.
This lesson presents an overview of Data Domain Retention Lock, its configuration and use.
Slide 3
Protects against
User errors
Malicious activity
EMC Data Domain Retention Lock is an optional, licensed software feature that allows storage
administrators and compliance officers to meet data retention requirements for archive data stored on
an EMC Data Domain system. For files committed to be retained, DD Retention Lock software works in
conjunction with the application's retention policy to prevent these files from being modified or deleted
during the application's defined retention period, for up to 70 years. It protects against data
management accidents, user errors and any malicious activity that might compromise the integrity of
the retained data. The retention period of a retention-locked file can be extended, but not reduced.
After the retention period expires, files can be deleted, but cannot be modified. Files that are written to
an EMC Data Domain system, but not committed to be retained, can be modified or deleted at any time.
Slide 4
[Slide table: feature comparison of Retention Lock Governance versus Retention Lock Compliance, covering MTree rename (Yes/Yes), supported replication types (Collection, Directory, MTree versus Collection only), audit logging (No/Yes), CLI support (Yes/Yes), and supported protocols (CIFS, NFS).]
The capabilities built into Data Domain Retention Lock are based on governance and compliance archive
data requirements.
Governance archive data requirements:
Governance standards are considered lenient in nature, allowing flexible control of retention
policies, but not at the expense of maintaining the integrity of the data during the retention period.
These standards apply to environments where the system administrator is trusted with administrative
actions.
The storage system has to securely retain archive data per corporate governance standards and must
meet the following requirements:
Allow archive files to be committed for a specific period of time during which the contents of the
secured file cannot be deleted or modified.
Allow for deletion of the retained data after the retention period expires.
Allow for ease of integration with existing archiving application infrastructure through CIFS and
NFS.
Provide flexible policies, such as extending the retention period of a secured file, reverting the
locked state of an archived file, and so on.
Ability to replicate both the retained archive files and retention period attribute to a destination
site to meet the disaster recovery (DR) needs for archived data.
Compliance archive data requirements:
Securities and Exchange Commission (SEC) rules define compliance standards for archive storage to be
retained on electronic storage media, which must meet certain conditions:
Preserve the records exclusively in a non-writeable, non-erasable format.
Verify automatically the quality and accuracy of the storage media recording process.
Serialize the original, and any duplicate units of storage media, and the time-date for the
required retention period for information placed on the storage media.
Store, separately from the original, a duplicate copy of the record on an SEC-approved medium
for the time required.
Data Domain Retention Lock Governance edition maintains the integrity of the archive data with the
assumption that the system administrator is trusted, and that any actions they take are valid to maintain
the integrity of the archive data.
Data Domain Retention Lock Compliance edition is designed to meet the regulatory compliance
standards such as those set by the SEC standards, for records (SEC 17a-4(f)). Additional security
authorization is required to manage the manipulation of retention periods, as well as renaming MTrees
designated for retention lock.
Note: DD Retention Lock software cannot be used with EMC Data Domain GDA models or with the DD
Boost protocol. Attempts to apply retention lock to MTrees containing files created by DD Boost will fail.
Slide 5
As discussed in the Basic Administration module, a security privilege can be assigned to user accounts:
In the Enterprise Manager when user accounts are created.
In the CLI when user accounts are added.
This security privilege is in addition to the user and admin privileges.
A user assigned the security privilege is called a security officer.
The security officer can run a command via the CLI called the runtime authorization policy.
Updating or extending retention periods, and renaming MTrees, requires the use of the runtime
authorization policy. When enabled, runtime authorization policy is invoked on the system for the
length of time the security officer is logged in to the current session.
Runtime authorization policy, when enabled, authorizes the security officer to provide credentials, as
part of a dual authorization with the admin role, to set up and modify both retention lock compliance
features and data encryption features, as you will learn later in this module.
Slide 6
Optional
Extend file retention times or delete
files with expired retention periods
using client-side commands.
1. Enable DD Retention Lock Governance, Compliance, or both on the Data Domain system. (You
must have a valid license for DD Retention lock Governance and/or Compliance.)
2. Enable MTrees for governance or compliance retention locking using Enterprise Manger or CLI
commands.
3. Commit files to be retention locked on the Data Domain system using client-side commands
issued by an appropriately configured archiving or backup application, manually, or using scripts.
4. (Optional) Extend file retention times or delete files with expired retention periods using client-side commands.
Slide 7
With DD Retention Lock enabled, the user or software must set the last access time
(atime) of that file to communicate the retention period to the
Data Domain system.
The atime must be set beyond the currently configured minimum
retention period.
Defaults
Minimum retention period = 12 hours
Maximum retention period = 5 years
After an archive file has been migrated onto a Data Domain system, it is the responsibility of the
archiving application to set and communicate the retention period attribute to the Data Domain system.
The archiving application sends the retention period attribute over standard industry protocols.
The retention period attribute used by the archiving application is the last access time: the atime. DD
Retention Lock software allows granular management of retention periods on a file-by-file basis. As part
of the configuration and administrative setup process of the DD Retention Lock software, a minimum
and maximum time-based retention period for each MTree is established. This ensures that the atime
retention expiration date for an archive file is not set below the minimum, or above the maximum,
retention period.
The archiving application must set the atime value, and DD Retention Lock must enforce it, to prevent
any modification or deletion of files under retention on the Data Domain system. For example,
Symantec Enterprise Vault retains records for a user-specified amount of time. When Enterprise Vault
retention is in effect, these documents cannot be modified or deleted on the Data Domain system.
When that time expires, Enterprise Vault can be set to automatically dispose of those records.
Locked files cannot be modified on the Data Domain system even after the retention period for the file
expires. Files can be copied to another system and then be modified. Archive data retained on the Data
Domain system after the retention period expires is not deleted automatically. An archiving application
must delete the remaining files, or they must be removed manually.
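As a hypothetical illustration of the client-side mechanism, on an NFS client with a retention-lock-enabled MTree mounted, a script could lock a file by setting its atime to a future date with the standard touch command; the mount path, file name, and date below are placeholders.

```
# Set the atime to 31 Dec 2014 23:59 ([[CC]YY]MMDDhhmm format).
# The date must fall between the MTree's configured minimum and
# maximum retention periods for the lock to be accepted.
touch -a -t 201412312359 /mnt/ddfs/hr/records/file.dat
```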
Slide 8
You can configure DD Retention Lock Governance using the Enterprise Manager or by using CLI
commands. Enterprise Manager provides the capability to modify the minimum and maximum retention
period for selected MTrees. In the example above, the Modify dialog is for the MTree /data/col1/hr.
To configure retention lock:
1. Select the system in the navigation pane.
2. Select Data Management > MTree.
3. Select the MTree you want to edit with DD Retention Lock.
4. Go to the Retention Lock pane at the bottom of the window.
5. Click Edit.
6. Check the box to enable retention lock.
7. Enter the retention period or select Default.
8. Click OK.
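Retention lock governance can also be configured from the CLI. The following is a sketch only; the option names and duration formats vary by DD OS release, so verify them in the DD OS Command Reference before use.

```
# Enable retention lock on the MTree
mtree retention-lock enable mtree /data/col1/hr

# Adjust the minimum and maximum retention periods
mtree retention-lock set min-retention-period 720min mtree /data/col1/hr
mtree retention-lock set max-retention-period 5year mtree /data/col1/hr
```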
Slide 9
The DD Retention Lock Compliance edition meets the strict requirements of regulatory standards for
electronic records, such as SEC 17a-4(f), and other standards that are practiced worldwide.
DD Retention Lock Compliance, when enabled on an MTree, ensures that all files locked by an archiving
application, for a time-based retention period, cannot be deleted or overwritten under any
circumstances until the retention period expires. This is achieved using multiple hardening procedures:
Requiring dual sign-on for certain administrative actions. Before engaging DD Retention Lock
Compliance edition, the System Administrator must create a Security Officer role. The System
Administrator can create the first Security Officer, but only the Security Officer can create other
Security Officers on the system.
Some of the actions requiring dual sign-on are:
Extending the retention periods for an MTree.
Renaming the MTree.
Deleting the Retention Lock Compliance license from the Data Domain system.
Slide 10
Slide 11
In this lesson, you will learn about the function of data sanitization and how to run a command from the
CLI to sanitize data on a Data Domain system.
Slide 12
Erases segments of deleted files not used by other files, and all unused capacity in the file system.
Unused capacity is data space that has been used and cleaned; it does not include space that has never been used.
system sanitize
Slide 13
When you issue the system sanitize start command, you are prompted to consider the length
of time required to perform this task. The system advises that it can take longer than the time it takes to
reclaim space holding expired data on the system (filesys clean). This can be several hours or
longer, if there is a high percentage of space to be sanitized.
During sanitization, the system runs through five phases: merge, analysis, enumeration, copy, and zero.
1. Merge: Performs an index merge to flush all index data to disk.
2. Analysis: Reviews all data to be sanitized. This includes all stored data.
3. Enumeration: Reviews all of the files in the logical space and remembers what data is active.
4. Copy: Copies live data forward and frees the space it used to occupy.
5. Zero: Writes zeroes to the disks in the system.
You can view the progress of these five phases by running the system sanitize watch command.
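Putting the commands named above together, a typical sanitization session from the CLI looks like this (output omitted):

```
# Start sanitization; the system warns about the time required
system sanitize start

# Monitor the merge, analysis, enumeration, copy, and zero phases
system sanitize watch
```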
Slide 14
Slide 15
In this lesson, you will learn about the features, benefits, and function of the encryption of data at rest
feature.
You will also learn about the purpose of other security features, such as file system locking, and when
and how to use this feature.
Slide 16
Data encryption protects user data if the Data Domain system is stolen or if the physical storage media is
lost during transit, and eliminates accidental exposure of a failed drive if it is replaced. In addition, if an
intruder ever gains access to encrypted data, the data is unreadable and unusable without the proper
cryptographic keys.
Encryption of data at rest:
Enables data on the Data Domain system to be encrypted, while being saved and locked, before
being moved to another location.
Is also called inline data encryption.
Protects data on a Data Domain system from unauthorized access or accidental exposure.
Requires an encryption software license.
Encrypts all ingested data.
Does not automatically encrypt data that was in the system before encryption was enabled.
Such data can be encrypted by enabling an option to encrypt existing data.
Furthermore, you can use all of the currently supported backup applications described in the Backup
Application Matrix on the Support Portal with the Encryption of Data at Rest feature.
Slide 17
Key Management
Two key management capabilities are available:
1. The Local Key Manager provides a single encryption key per
Data Domain system. This single internal Data Domain
encryption key is available on all Data Domain systems.
2. Optional RSA Data Protection Manager (DPM) Key Manager for
added capability. The RSA DPM Key Manager enables the use
of multiple, rotating keys on a Data Domain system.
A single internal Data Domain encryption key is available on all Data Domain systems.
The first time Encryption of Data at Rest is enabled, the Data Domain system randomly generates an
internal system encryption key. After the key is generated, the system encryption key cannot be
changed and is not accessible to a user.
The encryption key is further protected by a passphrase, which is used to encrypt the encryption key
before it is stored in multiple locations on disk. The passphrase is user-generated and requires both an
administrator and a security officer to change it.
The RSA DPM Key Manager enables the use of multiple, rotating keys on a Data Domain system.
The RSA DPM Key Manager consists of a centralized RSA DPM Key Manager Server and the
embedded DPM client on each Data Domain system.
The RSA DPM Key Manager is in charge of the generation, distribution, and lifecycle
management of multiple encryption keys. Keys can be rotated on a regular basis, depending on
the policy. A maximum number of 254 keys is supported.
If the RSA DPM Key Manager is configured and enabled, the Data Domain system uses keys
provided by the RSA DPM Key Manager Server.
Note: Only one encryption key can be active on a Data Domain system. The DPM Key Manager provides
the active key. If the same DPM Key Manager manages multiple Data Domain systems, all will have the
same active key if they are synced and the Data Domain file system has been restarted.
For additional information about RSA DPM Key Manager, refer to the DD OS 5.2 Administration Guide.
Slide 18
Inline Encryption
With the encryption software option licensed and enabled, all incoming data is encrypted inline before it
is written to disk. This is a software-based approach, and it requires no additional hardware. It includes a
configurable 128-bit or 256-bit advanced encryption standard (AES) algorithm with either:
Confidentiality with cipher-block chaining (CBC) mode, or
Both confidentiality and message authenticity with Galois/Counter (GCM) mode.
When data is backed up, data enters via NFS, CIFS, VTL, DD Boost, and NDMP tape server protocols. It is
then:
1. Segmented
2. Fingerprinted
3. Deduplicated (or globally compressed)
4. Grouped
5. Locally compressed
6. Encrypted
Note: When enabled, the encryption at rest feature encrypts all data entering the Data Domain system.
You cannot enable encryption at a more granular level.
Slide 19
Authorization Workflow
To set encryption on a Data Domain system:
1. The security officer logs in via the CLI and issues the runtime authorization policy.
2. The administrator role issues the command to enable encryption via the Enterprise Manager.
3. The Enterprise Manager prompts for security officer credentials.
4. With the security credentials accepted by the system, encryption is enabled.
Procedures requiring authorization must be dual-authenticated by the security officer and the user in
the admin role.
For example, to set encryption, the admin enables the feature, and the security officer enables runtime
authorization.
A user in the administrator role interacts with the security officer to perform a command that requires
security officer sign off.
In a typical scenario, the admin issues the command, and the system displays a message that security
officer authorizations must be enabled. To proceed with the sign-off, the security officer must enter his
or her credentials on the same console at which the command option was run. If the system recognizes
the credentials, the procedure is authorized. If not, a Security alert is generated. The authorization log
records the details of each transaction.
Slide 20
Configuring Encryption
With encryption active in the Data Domain system, the Encryption tab within the File System section of
the Data Domain Enterprise Manager shows the current status of system encryption of data at rest.
The status indicates Enabled, Disabled, or Not configured. In the slide, the encryption status is Not
configured.
To configure encryption:
1. Click Configure
(Continued on the next slide)
Slide 21
You are prompted for a passphrase. The system generates an encryption key and uses the passphrase to
encrypt the key. One key is used to encrypt all data written to the system. After encryption is enabled,
the passphrase is used by system administrators only when locking or unlocking the file system, or when
disabling encryption. The maximum passphrase size for DD OS 5.2 is 256 characters.
CAUTION: Unless you can reenter the correct passphrase, you cannot unlock the file system and access
the data. The data will be irretrievably lost.
2. Click Next.
You are prompted to choose the encryption algorithm:
Configurable 128-bit or 256-bit Advanced Encryption Standard (AES) algorithm with either:
Confidentiality with Cipher Block Chaining (CBC) mode
Both confidentiality and message authenticity with Galois/Counter (GCM) mode
In this configuration window, you can optionally apply encryption to data that existed on the
system before encryption was enabled.
3. Click Restart the system now to enable encryption of data at rest once you have closed the
Configure Encryption window. If you do not click this, you need to disable and re-enable the file
system before encryption will begin.
4. Click OK to select the default AES 256-bit (CBC) algorithm, close the Configure Encryption
window, and continue.
Related CLI commands:
# filesys disable
Disables the file system
# filesys encryption enable
Enables encryption. Enter a passphrase when prompted
# filesys encryption algorithm set algorithm
Sets an alternative cryptographic algorithm (optional). Default algorithm is aes_256_cbc. Other
options are: aes_128_cbc, aes_128_gcm, or aes_256_gcm
# filesys enable
Enables the file system
Slide 22
Only administrative users with security officer credentials can change the encryption passphrase.
To change the existing encryption passphrase:
1. Disable the file system by clicking the disable button on the State line of the File System section.
The slide shows the file system state as disabled and shut down after the disable button is clicked.
2. Click Change Passphrase.
3. Enter the security officer credentials to authorize the passphrase change.
4. Enter the current passphrase.
5. Enter the new passphrase twice.
6. Click Enable file system now if you want to reinstate services with the new passphrase;
otherwise the passphrase does not go into effect until the file system is re-enabled.
7. Click OK to proceed with the passphrase change.
Slide 23
Disabling Encryption
Only administrative users with security officer credentials can disable encryption.
To disable encryption on a Data Domain system:
1. Click Disable on the Encryption status line of the Encryption tab.
2. Enter the security officer credentials.
3. Click Restart file system now in order to stop any further encryption of data at rest.
Note: Restarting the file system will interrupt any processes currently running on the Data
Domain system.
4. Click OK to continue.
Slide 24
Use file system locking when an encryption-enabled Data Domain system and its external storage
devices (if any) are being transported. Without the encryption provided in file system locking, user data
could possibly be recovered by a thief with forensic tools (especially if local compression is turned off).
This action requires two-user authentication: a sysadmin and a security officer must confirm the lockdown action.
File system locking:
Requires the user name and password of a security officer account to lock the file system.
Protects the Data Domain system from unauthorized data access.
Is run only with the file system encryption feature enabled. File system locking encrypts all user
data, and the data cannot be decrypted without the key.
A passphrase protects the encryption key, which is stored on disk, and is encrypted by the
passphrase. With the system locked, this passphrase cannot be retrieved.
Allows only an admin, who knows the set passphrase, to unlock an encrypted file system.
Slide 25
Note: Before you can lock the file system, the file system must be stopped, disabled, and shut down.
To lock the file system:
1. In the passphrase area, enter the current passphrase (if one existed before) followed by a new
passphrase that locks the file system for transport. Repeat the passphrase in the Confirm New
Passphrase field.
2. Click OK to continue.
After the new passphrase is entered, the system destroys the cached copy of the current
passphrase. Therefore, anyone who does not possess the new passphrase cannot decrypt the
data.
CAUTION: Be sure to safeguard the passphrase. If the passphrase is lost, you will never be able
to unlock the file system and access the data. There is no backdoor access to the file system. The
data is irretrievably lost.
3. Shut down the system using the system poweroff command from the command line
interface (CLI).
CAUTION: Do not use the chassis power switch to power off the system. Only a shutdown with the system poweroff command invokes file system locking.
Slide 26
Module 9: Summary
Slide 1
In any backup environment, it is critical to plan capacity and throughput adequately. Planning ensures
your backups complete within the time required and that data is securely retained for the required
retention periods. Backup data growth is also a reality as business needs change. Inadequate capacity or
bandwidth can cause backups to lag or fail to complete, and unplanned growth can fill a backup device
sooner than expected and choke backup processes.
The main goal in capacity planning is to select a Data Domain model and configuration that can hold the
required data for the required retention periods, with enough space left over to avoid system-full
conditions.
For throughput planning, the goal is to ensure the link bandwidth is sufficient to perform daily and
weekly backups to the Data Domain system within the backup window allotted. Good throughput
planning takes into consideration network bandwidth sharing, along with adequate backup and system
housekeeping timeframes (windows).
Slide 2
In this lesson, you will become familiar with the testing and evaluation process that helps to determine
the capacity requirements of a Data Domain system.
Collecting information
Determining and calculating capacity needs
Note: EMC Sales uses detailed software tools and formulas when working with its customers to identify
backup environment capacity and throughput needs. Such tools help systems architects recommend
systems with appropriate capacities and correct throughput to meet those needs. This lesson discusses
the most basic considerations for capacity and throughput planning.
Slide 3
Using information collected about the backup system, you calculate capacity needs by understanding
the amount of data (data size) to be backed up, the types of data, the size of a full (complete) backup,
and the expected data reduction rates (deduplication).
Data Domain system internal indexes and other product components use additional, variable amounts
of storage, depending on the type of data and the sizes of files. If you send different data sets to
otherwise identical systems, one system may, over time, have room for more or less actual backup data
than another.
Data reduction factors depend on the type of data being backed up. Some types of challenging
(deduplication-unfriendly) data types include:
pre-compressed (multimedia, .mp3, .zip, and .jpg)
pre-encrypted data
In addition, retention policies greatly determine the amount of deduplication that can be realized on a
Data Domain system. The longer data is retained, the greater the data reduction that can be realized. A
backup schedule in which retained data is repeatedly replaced with new data yields very little data
reduction.
Slide 4
5x:
Incremental plus weekly full backup with 2 weeks retention
Daily full backup with 1 week retention
(Online and archival-use data reduction tends to be capped here)
10x:
Incremental plus weekly full backup with 1 month of retention
Daily full backup with 2-3 weeks retention
20x:
Incremental plus weekly full backup with 2-3 months retention
Daily full backup with 3-4 weeks retention
The reduction factors listed in this slide are examples of how changing retention rates can improve the
amount of data reduction over time.
The compression rates shown are approximate.
A daily full backup held only for one week on a Data Domain system may realize no more than a
compression factor of 5x, while holding weekly backups plus daily incrementals for up to 90 days may
result in 20x or higher compression.
Data reduction rates depend on a number of variables including data types, the amount of similar data,
and the length of storage. It is difficult to determine exactly what rates to expect from any given system.
The highest rates are usually achieved when many full backups are stored.
When calculating capacity planning, use average rates as a starting point for your calculations and refine
them after real data is available.
Slide 5
Base (initial full backup) = 200 GB
1 Week = 80 GB
1 Retention Period (8 weeks) = 640 GB
Calculate the required capacity by adding up the space required in this manner:
First Full backup plus
Incremental backups (the number of days incrementals are run, typically 4-6) plus
Weekly cycle (one weekly full and 4-6 incrementals) times the number of weeks data is retained.
For example, 1 TB of data is backed up, and a conservative compression rate is estimated at 5x (which
may have come from a test or is a reasonable assumption to start with). This gives 200 GB needed for
the initial backup. With a 10 percent change rate in the data each day, incremental backups are 100 GB
each, and with an estimated compression on these of 10x, the amount of space required for each
incremental backup is 10 GB.
As subsequent full backups run, it is likely that the backup yields a higher data reduction rate. 25x is
estimated for the data reduction rate on subsequent full backups. 1 TB of data compresses to 40 GB.
Four daily incremental backups require 10 GB each, and one weekly backup needing 40 GB yields a burn
rate of 80 GB per week. Running the 80 GB weekly burn rate out over the full 8-week retention period
means that an estimated 640 GB is needed to store the daily incremental backups and the weekly full
backups.
Adding this to the initial full backup gives a total of 840 GB needed. On a Data Domain system with 1 TB
of usable capacity, the unit operates at about 84% of capacity. This may be acceptable for current needs,
but a system with larger capacity, or one that can accept additional storage, may be a better choice to
allow for data growth.
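The worked example above can be sketched as a short calculation (a rough estimator only; the function name and parameters are illustrative, and real sizing should come from an EMC BRS evaluation):

```python
def required_capacity_gb(data_gb, first_full_factor, weekly_full_factor,
                         daily_change_rate, incremental_factor,
                         incrementals_per_week, weeks_retained):
    """Rough capacity estimate following the lesson's method."""
    first_full = data_gb / first_full_factor             # initial full backup
    incremental = data_gb * daily_change_rate / incremental_factor
    weekly_full = data_gb / weekly_full_factor           # subsequent fulls
    weekly_burn = weekly_full + incrementals_per_week * incremental
    return first_full + weekly_burn * weeks_retained

# Lesson example: 1 TB data, 5x first full, 25x subsequent fulls,
# 10% daily change at 10x, 4 incrementals per week, 8 weeks retention
total = required_capacity_gb(1000, 5, 25, 0.10, 10, 4, 8)  # 840 GB
```

On a 1 TB usable system, the resulting 840 GB corresponds to the 84% utilization figure above.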
Again, these calculations are for estimation purposes only. Before determining true capacity, use the
analysis of real data gathered from your system as a part of an EMC BRS sizing evaluation.
Slide 6
In this lesson, you will become familiar with the testing and evaluation process that helps to determine
the throughput requirements of a Data Domain system.
Note: EMC Sales uses detailed software tools and formulas when working with customers to identify
backup environment capacity and throughput needs. Such tools help systems architects recommend
systems with appropriate capacities and correct throughput to meet those needs. This lesson discusses
the most basic considerations for capacity and throughput planning.
Slide 7
Required Throughput = Largest Backup / Backup Window Time
Example: a 200 GB backup in a 10-hour window requires 20 GB/hr
While capacity is one part of the sizing calculation, it is important not to neglect the throughput of the
data during backups.
An assumption would be that the greatest backup need is to process a full 200 GB backup within a
10-hour backup window. Incremental backups should require much less time to complete, and we could
safely presume that they would easily finish within the backup window.
Dividing 200 GB by 10 hours yields a raw processing requirement of at least 20 GB per hour.
Over an unfettered 1 Gb network with maximum bandwidth available (with a theoretical 270 GB per
hour throughput), this backup would take less than 1 hour to complete. If the network were sharing
throughput resources during the backup time window, the amount of time required to complete the
backup would increase considerably.
It is important to note the effective throughput of both the Data Domain system and the network on
which it runs. Both points in data transfer determine whether the required speeds are reliably feasible.
Feasibility can be assessed by running network testing software such as iperf.
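The throughput formula above reduces to a one-line calculation (a sketch; the names are illustrative):

```python
def required_throughput_gb_per_hr(largest_backup_gb, window_hours):
    # Required throughput = largest backup / backup window time
    return largest_backup_gb / window_hours

# Lesson example: 200 GB full backup in a 10-hour window
rate = required_throughput_gb_per_hr(200, 10)  # 20.0 GB/hr
```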
Slide 8
This lesson applies the formulae from the previous two lessons to selecting the best Data Domain
system to fit specific capacity and throughput requirements.
Slide 9
The system capacity numbers of a Data Domain system assume a mix of typical enterprise backup data
(such as file systems, databases, mail, and developer files). The low and high ends of the range are also
determined by how often data is backed up.
The maximum capacity for each Data Domain model assumes the maximum number of drives (either
internal or external) supported for that model.
Maximum throughput for each Data Domain model is dependent mostly on the number and speed
capability of the network interfaces being used to transfer data. Some Data Domain systems have more
and faster processors so they can process incoming data faster.
Note: Advertised capacity and throughput ratings for Data Domain products are best case results, based
on tests conducted in laboratory conditions. Your throughput will vary depending on your network
conditions.
The number of network streams you may expect to use depends on your hardware model. Refer to the
specific model Data Domain system guide to learn specific maximum supported stream counts.
Slide 10
Selecting a Model
Standard practice is to be conservative when calculating the capacity and throughput required for a
specific backup environment: estimate the need for greater throughput and capacity rather than less.
Apply your requirements against conservative ratings (not the maximums) of the Data Domain system
needed to meet requirements. Allow for a minimum 20% buffer in both capacity and throughput.
Capacity percentage = (required capacity / maximum capacity of a particular model) x 100
Throughput percentage = (required throughput / maximum throughput of a particular model) x 100
If the capacity or throughput percentage for a particular model does not provide at least a 20% buffer,
then calculate the capacity and throughput percentages for a Data Domain model of the next higher
capacity. For example, if the capacity calculation for a DD620 yields a capacity percentage of 91%, only a
9% buffer is available, so you should look at the DD640 next to calculate its capacity.
Sometimes one model provides adequate capacity, but does not provide enough throughput, or vice
versa. The model selection must accommodate both throughput and capacity requirements with an
appropriate buffer.
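The buffer rule described above can be expressed as a small check (a sketch under the lesson's 20% minimum-buffer guideline; the function names are illustrative):

```python
def buffer_pct(required, model_max):
    """Headroom remaining after loading a model to the requirement."""
    return 100.0 - (required / model_max * 100.0)

def model_fits(req_capacity, req_throughput,
               max_capacity, max_throughput, min_buffer=20.0):
    # Both capacity and throughput must leave at least the minimum buffer
    return (buffer_pct(req_capacity, max_capacity) >= min_buffer and
            buffer_pct(req_throughput, max_throughput) >= min_buffer)

# Using figures from the slides that follow: a ~3% capacity buffer fails
# the 20% rule, so the next larger model should be evaluated
print(model_fits(3248, 1200, 3350, 1334))   # False
print(model_fits(3248, 1200, 7216, 2252))   # True
```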
Slide 11
Model A: 3,350 GB capacity; 3,248/3,350 = 97% full (3% buffer)
Model B: 7,216 GB capacity; 3,248/7,216 = 45% full (55% buffer)
In this example, the capacity requirement of 3248 GB fills Model A to 97% of capacity.
Model B has a capacity of 7.2 TB. The capacity percentage estimated for Model B is 45%, and the 55%
buffer is more than adequate.
Slide 12
Model B: 7,216 GB capacity; 3,248/7,216 = 45% full (55% buffer)
OR
Model A: 3,350 GB capacity; 3,248/3,350 = 97% full (3% buffer)
Model A (1 additional shelf): 7,974 GB capacity; 3,248/7,974 = 40% full (60% buffer)
Slide 13
Model A: 1,334 GB/hr throughput; 1,200/1,334 = 89% loaded (11% buffer)
Model B: 2,252 GB/hr throughput (47% buffer)
This calculation is similar to calculating the capacity buffer for selected models.
Select a model that meets throughput requirements at no more than 80% of the model's maximum
throughput capacity.
In this example, the throughput requirement of 1,200 GB per hour would load Model A to more than
89% of capacity, with a buffer of 11%.
A better selection is a model with higher throughput capability, such as Model B, rated at 2,252 GB per
hour and offering a 47% buffer in estimated throughput.
Slide 14
Model A: 3,350 GB capacity, 1,334 GB/hr throughput (3% capacity buffer, 11% throughput buffer)
Model A (1 additional shelf): 7,974 GB capacity, 1,334 GB/hr throughput
Model B: 7,216 GB capacity, 2,252 GB/hr throughput
In summary, Model A with an additional shelf might meet the capacity requirement; Model B is the
minimum model that would meet the throughput performance requirement.
While Model A meets the storage capacity requirement, Model B is the best choice based upon the need
for greater throughput.
Note: Another option is to consider implementing DD Boost with Model A to raise the throughput
rating.
Slide 15
This lesson covers basic throughput monitoring and tuning on a Data Domain System.
There are three primary steps to throughput tuning:
Identifying potential bottlenecks that might reduce the data transfer rates during backups and
restores.
Displaying and understanding Data Domain system performance metrics.
Identifying and implementing viable solutions to resolve slower-than-expected throughput
issues.
Slide 16
Throughput Bottlenecks
Integrating Data Domain systems into an existing backup architecture can change the responsiveness of
the backup system. Bottlenecks can appear and restrict the flow of data being backed up.
Some possible bottlenecks are:
Clients
Disk Issues
Configuration
Connectivity
Network
Wire speeds
Switches and routers
Routing protocols and firewalls
Backup Server
Configuration
Load
Connectivity
As demand shifts among system resources (the backup host, client, network, and the Data Domain
system itself), the source of the bottlenecks can shift as well.
Eliminating bottlenecks where possible, or at least mitigating the cause of reduced performance through
system tuning, is essential to a productive backup system. Data Domain systems collect and report
performance metrics through real-time reporting and in log files to help identify potential bottlenecks
and their causes.
Slide 17
The slide shows simplified # system show performance output with four protocol columns:
1. ops/s - operations per second
2. load - load percentage (pending ops / total RPC ops * 100)
3. data (MB/s, in/out) - protocol throughput; the amount of data the file system can read from and write to the kernel socket buffer
4. wait (ms/MB, in/out) - the time taken to send and receive 1 MB of data between the file system and the kernel socket buffer
Note: The output has been simplified for this lesson to show only the pertinent areas of # system show performance output.
If you notice backups running slower than expected, it is useful to review system performance metrics.
From the command line, use the command system show performance
The command syntax is:
# system show performance [ {hr | min | sec} [ {hr | min | sec} ]]
For example:
# system show performance 24 hr 10 min
This shows the system performance for the last 24 hours at 10 minute intervals. 1 minute is the
minimum interval.
Servicing a file system request consists of three steps: receiving the request over the network,
processing the request, and sending a reply to the request.
ops/s: Operations per second.
load: Load percentage (pending ops / total RPC ops * 100).
data (MB/s, in/out): Protocol throughput; the amount of data the file system can read from and write to the kernel socket buffer.
wait (ms/MB, in/out): Time taken to send and receive 1 MB of data between the file system and the kernel socket buffer.
Slide 18
An important section of the system show performance output is the CPU and disk utilization.
CPU avg/max: The average and maximum CPU utilization; the CPU ID of the most-loaded CPU is
shown in the brackets.
Disk max: Maximum disk utilization over all disks; the disk ID of the most-loaded disk is shown in
the brackets.
If CPU utilization is 80% or greater, or disk utilization is 60% or greater, for an extended period of time,
the Data Domain system is likely running at its disk or CPU processing maximum. Check that there is no
cleaning or disk reconstruction in progress; both appear in the State section of the system show
performance report.
The following is a list of states and their meaning indicated in the # system show performance output:
C - Cleaning
D - Disk reconstruction
B - GDA (also known as multinode cluster [MNC] balancing)
V - Verification (used in the deduplication process)
M - Fingerprint merge (used in the deduplication process)
F - Archive data movement (active to archive)
S - Summary vector checkpoint (used in the deduplication process)
I - Data integrity
Typically the processes listed in the State section of the system show performance report impact the
amount of CPU utilization for handling backup and replication activity.
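The utilization thresholds and states above can be combined into a simple triage check (a sketch; the function and its inputs are illustrative and are not the output of any Data Domain command):

```python
def utilization_needs_investigation(cpu_pct, disk_pct, states):
    """Flag sustained high load that is NOT explained by cleaning or
    disk reconstruction, per the lesson's 80% CPU / 60% disk guidance."""
    if cpu_pct >= 80 or disk_pct >= 60:
        # Cleaning (C) or disk reconstruction (D) can account for the load
        return not ({"C", "D"} & set(states))
    return False

print(utilization_needs_investigation(85, 30, []))      # True
print(utilization_needs_investigation(85, 30, ["C"]))   # False
print(utilization_needs_investigation(50, 40, ["V"]))   # False
```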
Slide 19
In addition to watching disk utilization, you should monitor the rate at which data is being received and
processed. These throughput statistics are measured at several points in the system to assist with
analyzing the performance to identify bottlenecks.
If slow performance is happening in real-time, you can also run the following command:
# system show stats interval [interval in seconds]
Example:
# system show stats interval 2
Adding 2 produces a new line of data every 2 seconds.
The system show stats command reports CPU activity and disk read/write amounts.
In the example report shown, you can see a high and steady amount of data inbound on the network
interface, which indicates that the backup host is writing data to the Data Domain device. We know it is
backup traffic and not replication traffic as the Repl column is reporting no activity.
Low disk-write rates relative to steady inbound network activity are likely because much of the incoming
data consists of segments that are duplicates of segments already stored on disk. The Data Domain
system identifies the duplicates in real time as they arrive and writes only the new segments it detects.
Slide 20
Tuning Solutions
If you experience system performance concerns, for example, you are exceeding your backup window,
or if throughput appears to be slower than expected, consider the following:
Check the Streams columns of the system show performance command to make sure that the
system is not exceeding the recommended write and read stream count. Look specifically under
rd (active read streams) and wr (active write streams) to determine the stream count. Compare
this to the recommended number of streams allowed for your system. If you are unsure about
the recommended streams number, contact Data Domain Support for assistance.
Check that CPU utilization (1 process) is not unusually high. If you see CPU utilization at or
above 80%, the CPU may be underpowered for the load it is currently required to process.
Check the State output of the system show performance command. Confirm that there
is no cleaning (C) or disk reconstruction (D) in progress.
Check the output of the replication show performance all command. Confirm
that there is no replication in progress. If there is no replication activity, the output reports
zeros. Press Ctrl + c to stop the command. If replication is occurring during data ingestion and
causing slower-than-expected performance, you might want to separate these two activities in
your backup schedule.
If CPU utilization (1 process) is unusually high for any extended length of time, and you are unable to
determine the cause, contact Data Domain Support for further assistance.
When you are identifying performance problems, it is important to note the actual time when
poor performance was observed to know where to look in the system show performance output
chronology.
An example of a network-related problem occurs when a client accesses the Data Domain
system over a 100 Mbit network rather than a 1 Gbit network.
Check network settings, and ensure the switch is running 1 Gbit to the Data Domain system and is
not set to 100 Mbit.
If possible, consider implementing link aggregation.
Isolate the network between the backup server and the Data Domain system. Shared bandwidth
adversely impacts optimum network throughput.
Consider implementing DD Boost to improve overall transfer rates between backup hosts and
Data Domain systems.
Slide 21
Separate replication from data ingestion
Implement link aggregation
Consider implementing DD Boost
Maximize Data Domain system storage capacity