EMC Celerra Network Server

Release 5.6.43

Using Celerra Data Deduplication


P/N 300-008-026 REV A02

EMC Corporation Corporate Headquarters: Hopkinton, MA 01748-9103 1-508-435-1000 www.EMC.com

Copyright 2009 EMC Corporation. All rights reserved.

Published April, 2009

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS." EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date regulatory document for your product line, go to the Technical Documentation and Advisories section on EMC Powerlink.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

All other trademarks used herein are the property of their respective owners.

EMC Celerra Network Server 5.6.43 Using Celerra Data Deduplication

Contents

Preface

Chapter 1: Introduction
    System requirements
    Restrictions and limitations
    User interface choices
    Related information

Chapter 2: Concepts
    Overview
    Planning application integration

Chapter 3: Configuring
    Enable file system deduplication

Chapter 4: Managing
    Display deduplication-enabled file systems
    List all deduplication-enabled file systems
    Query deduplication-enabled file systems
    Suspend file system deduplication
    Undo file system deduplication
    Using deduplication parameters
        List all deduplication parameters
        List information for a specific deduplication parameter
        Exclude file extensions from deduplication
        Set access time for deduplication
        Set maximum file size for deduplication
        Set minimum file size for deduplication
        Set minimum scan interval for deduplication
        Set modification time for deduplication
        Set the deduplication data comparison capability

Chapter 5: Troubleshooting
    EMC E-Lab Interoperability Navigator
    Error messages
    Known problems
    Customer training programs

Terminology

Index

Preface

As part of an effort to improve and enhance the performance and capabilities of its product lines, EMC periodically releases revisions of its hardware and software. Therefore, some functions described in this document may not be supported by all versions of the software or hardware currently in use. For the most up-to-date information on product features, refer to your product release notes.

If a product does not function properly or does not function as described in this document, please contact your EMC representative.

Special notice conventions

EMC uses the following conventions for special notices:

Caution: A caution contains information essential to avoid data loss or damage to the system or equipment.

Important: An important note contains information essential to operation of the software.

Note: A note presents information that is important, but not hazard-related.

Hint: A note that provides suggested advice to users, often involving follow-on activity for a particular action.

Typographical conventions

EMC uses the following type style conventions in this document.

Normal
    Used in running (nonprocedural) text for:
    Names of resources, attributes, pools, Boolean expressions, buttons, DQL statements, keywords, clauses, environment variables, functions, utilities
    URLs, pathnames, filenames, directory names, computer names, links, groups, service keys, file systems, notifications

Bold
    Used in running (nonprocedural) text for names of commands, daemons, options, programs, processes, services, applications, utilities, kernels, notifications, system calls, and man pages.
    Used in procedures for:
    Names of interface elements (such as names of windows, dialog boxes, buttons, fields, and menus)
    What user specifically selects, clicks, presses, or types

Italic
    Used for:
    Full titles of publications (citations) referenced in text
    User input variable identifiers

Helvetica bold
    User interface elements (what users specifically select, click, or press)
    Names of interface elements (such as names of windows, dialog boxes, buttons, fields, and menus)
    Command and program options

Courier bold
    Indicates specific user input (such as commands).

Courier italic
    Indicates variables in procedures and syntax diagrams.

[]
    Encloses available selections when they are optional.

|
    Separates alternative selections. The bar means or.

{}
    Encloses available selections when they are required.

...
    Represents nonessential information omitted from an example.

Where to get help

EMC support, product, and licensing information can be obtained as follows.

Product information

For documentation, release notes, software updates, or for information about EMC products, licensing, and service, go to the EMC Powerlink website (registration required) at: http://Powerlink.EMC.com.

Technical support

For technical support, go to Powerlink and choose Support. On the Support page, you can access Support Forums, request a product enhancement, talk directly to an EMC representative, or open a service request. To open a service request, you must have a valid support agreement. Please contact your EMC sales representative for details about obtaining a valid support agreement or to answer any questions about your account.

Your comments

Your suggestions will help us continue to improve the accuracy, organization, and overall quality of the user publications. Please send your opinion of this document to:
techpubcomments@EMC.com

1 Introduction

The EMC Celerra file-level data deduplication feature increases file storage efficiency by eliminating redundant data from the files stored in the file system, thereby saving storage space and money.

For each file system, file-level deduplication gives the Data Mover the ability to process files in order to compress them, as well as the ability to share a single instance of the data among files whose contents are identical. Deduplication functionality operates on whole files and is applicable to files that are static or nearly static.

For example, if there are 50 unique files in a file system that can be deduplicated, 50 unique files will still exist but the data will be compressed, yielding a space savings of up to 50 percent. If there are 70 identical copies of a presentation document in a file system, 70 files will still exist but they will all share the same file data. In this example, the data usage will decrease by a factor of almost 70. In addition, the one instance of the file data shared by the 70 files will also be compressed, providing further space savings.

Celerra deduplication processes file data, not metadata. This means that duplicate files can have different names, permissions, and timestamps.

This document is part of the EMC Celerra Network Server documentation set and is intended for system administrators responsible for creating and managing a deduplication-enabled file system. Topics include:

System requirements
Restrictions and limitations
User interface choices
Related information
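The two examples given earlier in this introduction can be checked with simple arithmetic. The sketch below is illustrative only: the 10 MB file size and the 50 percent compression ratio are assumed values (the text says only "up to 50 percent"), the function names are hypothetical, and metadata overhead is ignored.

```python
def usage_unique_files(count, size_mb, compression_ratio):
    """Space used by `count` unique files when each is only compressed."""
    return count * size_mb * (1 - compression_ratio)


def usage_identical_files(size_mb, compression_ratio):
    """Space used when identical files share one (optionally compressed) copy."""
    # The number of identical files does not matter: one data instance is kept.
    return size_mb * (1 - compression_ratio)


SIZE_MB = 10   # assumed size of every file in both examples
RATIO = 0.5    # assumed best-case compression ratio

# 50 unique files: nothing can be shared, so only compression helps.
before_unique = 50 * SIZE_MB
after_unique = usage_unique_files(50, SIZE_MB, RATIO)

# 70 identical files: sharing alone reduces data usage 70-fold.
before_identical = 70 * SIZE_MB
shared_only = usage_identical_files(SIZE_MB, 0.0)
shared_and_compressed = usage_identical_files(SIZE_MB, RATIO)

print(before_unique, after_unique)
print(before_identical, shared_only, shared_and_compressed)
```

The document says "almost 70" rather than exactly 70 because each of the 70 files still exists with its own metadata, which this sketch does not model.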

System requirements
Table 1 describes the EMC Celerra Network Server software, hardware, network, and storage configurations.

Table 1. System requirements

Software    Celerra Network Server version 5.6.43 or later
Hardware    Celerra Network Server 514 Data Mover or later
            NS500 or later
Network     No specific network requirements
Storage     No specific storage requirements

Restrictions and limitations


If any of this information is unclear, contact your EMC Customer Support Representative for assistance:

File systems enabled for processing by Celerra Data Deduplication cannot be replicated using Celerra Replicator (V1). In addition, you cannot enable deduplication on a file system that is already being replicated by Celerra Replicator (V1).

File systems enabled for processing by Celerra Data Deduplication may be replicated using Celerra Replicator (V2). All destination file systems are required to support Celerra Data Deduplication; therefore, the destination Celerra Network Server must be running Celerra Network Server Version 5.6.43 or later.

Celerra deduplication will not process files smaller than 24 KB.

The file system must have at least 1 MB of free space before deduplication can be enabled. If there is not enough free space, an error message is generated and the server log is updated.

To avoid CIFS client timeouts when modifying deduplicated files, the Celerra will, by default, not deduplicate any file over 200 MB in size. For NFS-only environments, you can set this value higher to potentially achieve greater space savings.

You might not want to use Data Movers that have heavy usage for deduplication. The deduplication scan and ingest process will adaptively throttle itself when the Data Mover is very busy. Therefore, a Data Mover that maintains a high level of usage will not be able to scan or deduplicate files as quickly as a less busy system. Accessing deduplicated files also uses more system resources than accessing normal files. This additional load for deduplication access may negatively impact other client access if the system is already very busy.

Accessing deduplicated files requires more resources than accessing normal files, and the read rate of these files may be less than that of normal files. Because the data being deduplicated is typically cold, user access will likely not be affected. However, PAX-based NDMP backups, which must reduplicate the files during the backup process, will be slowed when backing up deduplicated files. This will be particularly noticeable for small files.

Celerra deduplication does not deduplicate data across or between file systems.

Celerra MPFS may be used to access file systems on which Celerra deduplication is enabled. However, the MPFS client transparently falls back to standard CIFS or NFS when accessing deduplicated files.

Celerra deduplication-enabled file systems can be backed up using Celerra NDMP Volume Backup (NVB) and restored in full by using the full destructive restore (FDR) method. However, a single file restore or a file-by-file restore of deduplicated files from NVB backups is not supported and will be rejected by the Celerra. EMC recommends that a single file restore or a file-by-file restore be performed using local or remotely replicated SnapSure checkpoints instead of by using NVB backups.

If Celerra deduplication is enabled on a file system that contains iSCSI LUNs, the iSCSI LUNs will not be deduplicated.

Celerra deduplication will not process or affect alternate data streams (also known as named attributes) associated with files and directories in the file system.
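The two file-size limits in the list above (the 24 KB minimum and the default 200 MB maximum) can be expressed as a simple range check. This is a minimal sketch, not the Data Mover's actual selection logic: the function and constant names are hypothetical, it assumes 1 KB = 1024 bytes, and it ignores the other criteria (such as access and modification times) that the scan also applies.

```python
KB = 1024
MB = 1024 * KB

MIN_SIZE = 24 * KB           # files smaller than 24 KB are not processed
DEFAULT_MAX_SIZE = 200 * MB  # by default, files over 200 MB are not deduplicated


def size_eligible(size_bytes, max_size=DEFAULT_MAX_SIZE):
    """Return True if a file's size falls inside the deduplication range."""
    return MIN_SIZE <= size_bytes <= max_size


print(size_eligible(10 * KB))                        # below the minimum
print(size_eligible(50 * MB))                        # inside the default range
print(size_eligible(500 * MB))                       # above the default maximum
print(size_eligible(500 * MB, max_size=1024 * MB))   # raised limit, for example NFS-only
```

The `max_size` argument stands in for the configurable maximum file size described later in this document; the 1024 MB value is only an example.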

User interface choices


The Celerra Network Server offers flexibility in managing networked storage based on your support environment and interface preferences. This document describes how to configure deduplicated files and a deduplication-enabled file system by using the command line interface (CLI). You can also perform all of these tasks by using one of the Celerra management applications:

Celerra Manager Basic Edition
Celerra Manager Advanced Edition

For additional information about managing your Celerra:

Learning about EMC Celerra on the EMC Celerra Network Server Documentation CD
Celerra Manager online help

The Installing Celerra Management Applications document includes instructions on launching Celerra Manager.

Related information
Specific information related to the features and functionality described in this document is included in:

EMC Celerra Glossary
EMC Celerra Network Server Command Reference Manual
EMC Celerra Network Server Parameters Guide
Online Celerra man pages

The EMC Celerra Network Server Documentation CD, supplied with Celerra and also available on Powerlink, provides the complete set of EMC Celerra customer publications. After logging in to Powerlink, go to Support > Technical Documentation and Advisories > Hardware/Platforms Documentation > Celerra Network Server. On this page, click Add to Favorites. The Favorites section on your Powerlink home page provides a link that takes you directly to this page.

2 Concepts

The concepts and planning considerations to understand deduplication are:


Overview
Planning application integration

Overview
For each file system, file-level deduplication gives the Data Mover the ability to process files in order to compress them, as well as the ability to share a single instance of the data among files whose contents are identical. Deduplication functionality operates on whole files and is applicable to files that are static or nearly static.

During deduplication, each deduplication-enabled file system on the Data Mover is scanned for files that match specific criteria, such as a last access time and a modification time older than a certain date. Once a file is found that matches the criteria, the file data is deduplicated, and compressed if appropriate. Different instances of the file can have different names, security attributes, and timestamps. None of the metadata is affected by deduplication.

When a user reads a file that was deduplicated, the Celerra retrieves the data so that the NAS clients are unaware that the data was deduplicated. Read operations decompress the requested portion of the file on the fly, in memory. Read operations do not cause the file to be decompressed on disk. When a user tries to write to a deduplicated file, the entire file is decompressed and reduplicated on disk to allow the write operation to be performed.

File-level deduplication relies on the SHA-1 hash calculation to detect identical files. However, if you do not want to use SHA-1, duplicate detection can be disabled while still allowing file-level deduplication to compress files. Set the deduplication data comparison capability contains instructions on changing the deduplication data comparison parameter.

If the Celerra detects over 65,535 identical copies of a specific deduplicated file on the same file system, it will no longer deduplicate that specific file. However, it will still compress the data of any additional copies of that file.

There are three possible states for deduplication: on, off, or suspended.

on = Enable deduplication processing on a file system. Deduplication is the process used to compress redundant data, allowing space to be saved on a file system. Setting the state to on schedules the file system to be the next file system that is scanned. If there are no active scans, the scan starts immediately.

off = Undo all deduplication processing. Do not perform any new space reduction. Any data that was deduplicated is now reduplicated, which is the process used to restore a file that was deduplicated to its original condition. This process may take some time. If reduplication fails, such as when the system detects that the file system is too small to contain the reduplicated data, the state transitions to suspended, the file system is left in a consistent, usable state, and a CCMD message is sent to the server's event log. If reduplication succeeds, the file system remains in the off state.

suspended = Suspend deduplication processing on a file system. The state transitions to the suspended state. No new space reduction is performed, but existing space-reduced files remain as is.
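The overview states that identical files are detected with a SHA-1 hash calculation. The idea can be sketched as grouping files by the digest of their contents, as below. This is an illustration of the general technique only, not the Data Mover's implementation: the function name and sample file names are hypothetical, file contents are held in memory for brevity, and nothing here models compression, the comparison parameter, or the 65,535-copy limit.

```python
import hashlib
from collections import defaultdict


def group_identical(files):
    """Group file names by the SHA-1 digest of their contents.

    `files` maps a file name to its contents as bytes.
    """
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha1(data).hexdigest()].append(name)
    return dict(groups)


files = {
    "q1_slides.ppt": b"presentation data",
    "copy_of_q1_slides.ppt": b"presentation data",  # same data, different name
    "notes.txt": b"different data",
}

groups = group_identical(files)

# Only groups with more than one member are candidates for sharing data.
shared = [sorted(names) for names in groups.values() if len(names) > 1]
print(shared)
```

Because only the contents are hashed, the two presentation files fall into the same group even though their names differ, which matches the statement that metadata is not affected by deduplication.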

Planning application integration


Consider the following applications and features when planning integration with Celerra Data Deduplication:

CDMS
Celerra file-level retention
Celerra FileMover archiving
Celerra NDMP Volume Backup
Multi-Path File System (MPFS)
Network (LAN) and Celerra NDMP (non-NVB) backups
Point-in-time views of the file system
File system space usage
Quotas
Replication

CDMS

Deduplication cannot be used with CDMS. You cannot enable the deduplication functionality on a CDMS migration (MGFS) file system. Deduplication cannot be used to migrate data out of a file system while you are using CDMS to migrate data into it. You can only enable deduplication on a file system after you have finished using CDMS and converted the file system to UxFS.

Celerra file-level retention

Celerra Data Deduplication can be enabled on both enterprise and compliance styles of Celerra File-Level Retention (FLR) file systems without compromising the protection offered to the data that the file systems contain. Files, including locked files, contained on both styles of FLR file systems can be deduplicated.

Celerra FileMover archiving

Celerra Data Deduplication is transparent to Celerra FileMover archiving and the two features can be used together to maximize the storage efficiency of the file storage solution. Any files archived from a Celerra file system on which Celerra Data Deduplication is enabled will be written to and read from the archive storage in their un-deduplicated form. However, the archive storage system may deduplicate the archived data itself. Celerra file systems used as repositories for archived data are good candidates for Celerra Data Deduplication.

Celerra NDMP Volume Backup

Celerra Data Deduplication-enabled file systems can be backed up using Celerra NDMP Volume Backup (NVB) and restored in full by using the full destructive restore (FDR) method. Because NVB operates at the block level (while preserving the history of which files it backs up), backing up a deduplicated file system does not cause any data reduplication. The data in the file system is backed up in its reduced form. The benefits of the space-saving storage efficiency realized in the production file system flow through to backups.

However, a single file restore or a file-by-file restore of space-reduced files from NVB backups is not supported and will be rejected by the Celerra. EMC recommends that NVB backups of deduplicated file systems be used as part of a strategy where a single file restore or file-by-file restore is done from locally or remotely replicated SnapSure checkpoints, not from "tape." The vbb facility's skipDedupFiles parameter determines whether to skip deduplicated files during a file-level restore. Refer to the Celerra Network Server Parameters Guide for further information.

Multi-Path File System (MPFS)

When accessing deduplicated files, MPFS clients will use standard CIFS or NFS, not MPFS accelerated I/O.

Note: An application on the MPFS client is not affected, except that it will likely experience some performance degradation if it accesses a deduplicated file.

Network (LAN) and Celerra NDMP (non-NVB) backups

When backed up over the network using CIFS or NFS, space-reduced files are reduplicated to their original size for transfer to the backup application, although the data is not reduplicated on disk. The benefits of the space-saving storage efficiency realized in the production file system will not flow through to backups when using network-based or PAX-based NDMP backups of Celerra deduplicated file systems.

In addition, when restoring files from a PAX-based NDMP backup, the file will be restored as a normal file and will not be deduplicated. Therefore, the restored file will require more file system space after the restore than it was consuming when backed up. Depending on the amount of available space within the file system, restoring previously deduplicated files from tape could potentially consume all free space within the file system.

Point-in-time views of the file system

The deduplication process releases space in the production file system immediately. However, it may cause blocks to be copied to the SnapSure save volume (SavVol) in the process. Deduplicating data associated with a file involves copying the data within the file system so it can be compressed as well as deduplicated. Since SnapSure checkpoints copy changed blocks to the SavVol on first write, the blocks that are deduplicated may need to be copied to the SavVol in order to preserve a previous checkpoint point-in-time view of the file system. These blocks are freed when the corresponding checkpoint gets deleted or refreshed and are then available for re-use by other checkpoints.

How many blocks will need to be copied to the SavVol during the deduplication process is a function of how full the file system is, the rate of change in it, and so on, and therefore is difficult to predict. By default the system is configured to abort deduplication operations on a file system before it causes the SavVol to extend. This avoids the SavVol expanding due to deduplication activity. If the deduplication process is aborted in this way, an alert is generated that explains what happened. The Celerra administrator can choose to extend the SavVol or simply let the deduplication process execute again on its next scheduled run.

Note: When Celerra Data Deduplication is first enabled on a file system with a SavVol that is relatively full, the system may not be able to achieve the maximum space savings the first time the deduplication process is run. You may need to let the deduplication process run multiple times before achieving the maximum space savings. Known problems describes workarounds.

File system space usage

If the file system is configured to auto-extend, the deduplication process will abort if the space usage is greater than the configured auto-extension threshold minus 5 percent. This avoids the file system unexpectedly auto-extending. If the file system is not configured to auto-extend, the deduplication process will abort if the file system usage is equal to or greater than 95 percent. The behavior described above ensures that the deduplication process has enough free space in which to work, and thereby reduces the need to automatically extend the file system. Known problems describes workarounds.

Quotas

When you set file system quotas on a Celerra Network Server, those quotas monitor and control the usage of the file systems on that Celerra system. The Celerra Network Server can track user, group, and tree quotas by using either of two quota policies: blocks or filesize. By default, the quota policy is set to blocks.

When configured to use the blocks quota policy, the Celerra Network Server calculates quota usage by counting the number of file system blocks each file occupies on disk. For example, a 1 KB file counts as 8 KB in the quota because the file consumes one 8 KB block on disk. When the Celerra deduplicates a file, the block count is subtracted from the quota usage of the owner of the file. Similarly, when a file is reduplicated, the usage of a block-based quota will increase.

When configured to use the filesize quota policy, the Celerra Network Server calculates quota usage by counting the logical size of files. Since deduplication activity does not affect the logical size of files, it has no effect on file-size-based quota usage.

If the quota policy is set to blocks, the user, group, or tree quota will reflect the reduced size of the file on disk. Files with deduplicated file data each count towards any relevant user, group, or tree quota.
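The abort thresholds and the blocks quota example above can be worked through in a few lines. This is a sketch under stated assumptions, not documented behavior: the function names are hypothetical, "minus 5 percent" is read as five percentage points, the 90 percent auto-extension threshold is an example value, and the quota function counts only whole 8 KB data blocks.

```python
import math

BLOCK_KB = 8  # block size taken from the 1 KB -> 8 KB example above


def dedup_aborts(usage_pct, auto_extend_threshold_pct=None):
    """Return True if the deduplication process would abort.

    Pass auto_extend_threshold_pct=None for a file system that is not
    configured to auto-extend.
    """
    if auto_extend_threshold_pct is not None:
        # Auto-extend configured: abort above (threshold - 5) percent.
        return usage_pct > auto_extend_threshold_pct - 5
    # No auto-extend: abort at 95 percent usage or more.
    return usage_pct >= 95


def blocks_quota_kb(file_size_kb):
    """Quota charge under the blocks policy: whole blocks occupied on disk."""
    return math.ceil(file_size_kb / BLOCK_KB) * BLOCK_KB


print(dedup_aborts(86, auto_extend_threshold_pct=90))  # above 85 percent
print(dedup_aborts(85, auto_extend_threshold_pct=90))  # exactly at 85 percent
print(dedup_aborts(95))                                 # no auto-extend
print(blocks_quota_kb(1))                               # the 1 KB example
```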
When calculating the storage required in the file system on the Celerra Network Server, remember that each deduplicated file consumes a minimum of one inode, regardless of the quota policy setting. In addition, the Celerra Network Server will allocate an additional inode in the file system while it deduplicates a file in the file system, and while it reduplicates the content of a file. These temporary inodes count toward any inode-based quotas in use.

If inode quotas prevent reduplication and write operations to deduplicated files, the server log will contain a message stating either that a hard quota was reached or exceeded, or that the quota was exceeded and the reduplication of the file failed. EMC strongly recommends that you avoid letting the system reach hard inode quota limits to avoid disrupting deduplication-related activity.

The server_df command always displays the inodes and blocks in use in the file system on the Celerra Network Server. Using Quotas on EMC Celerra provides additional information about setting quotas.

Replication

Deduplicating the contents of a file system before it is replicated using Celerra Replicator can greatly reduce the amount of data that has to be sent over the network as part of the initial baseline copy process. Once replication and deduplication are running together, the impact of deduplication on the amount of data transferred over the network will depend on the relative timing of replication updates and deduplication runs.

In all but the most extreme circumstances, replication updates will be more frequent than deduplication scans of a file system. New and changed data in the file system will almost always be replicated in its non-deduplicated form first and any subsequent deduplication of that data will prompt additional replication traffic due to the changes within the file system. This effect will be true of any deduplication solution that post-processes data and updates remote replicas of the data set more frequently than the deduplication process is run.

The space savings realized by the production file system will be reflected on the destination file system.

3 Configuring

The task to configure a deduplication-enabled file system is:

Enable file system deduplication

Enable file system deduplication


Note: If you want to exclude file extensions from the deduplication process, set up the file extension exclusion list before running deduplication for the first time. If you add exclusions after deduplication has been run on a file system, the new exclusions take effect from that point forward and do not affect what has already been deduplicated. Exclude file extensions from deduplication contains instructions on excluding file extensions.

Action

To enable deduplication space-reduction processing, use this command syntax:

$ fs_dedupe -modify { <fs_name> | id=<fs_id> } -state on

where:

<fs_name> = name of the file system
<fs_id> = identifier of the file system

Example:

To enable deduplication space-reduction processing on file system fs_ufs1, type:

$ fs_dedupe -modify fs_ufs1 -state on

Output

Done

Note

This feature is disabled by default.

Setting the deduplication state to on prompts an immediate scan of a file system, even if the state was already set to on. If another scan is in progress on the Data Mover, it is aborted.

The file system must have at least 1 MB of free space before deduplication can be enabled. If there is not enough free space, an error message is generated and the server log is updated.
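For administrators who script these operations, the command syntax above can be assembled programmatically. The helper below is a hypothetical sketch that only builds the argument list and does not run anything, since fs_dedupe is available only on the Celerra itself. The on and suspended values appear in the command examples in this document; the off value comes from the state list in the Concepts chapter, and its command form is assumed here to mirror the other two.

```python
VALID_STATES = ("on", "off", "suspended")


def fs_dedupe_modify_args(state, fs_name=None, fs_id=None):
    """Build the argument list for: fs_dedupe -modify {<fs_name>|id=<fs_id>} -state <state>."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown deduplication state: {state!r}")
    # The syntax accepts either a file system name or an id, never both.
    if (fs_name is None) == (fs_id is None):
        raise ValueError("specify exactly one of fs_name or fs_id")
    target = fs_name if fs_name is not None else f"id={fs_id}"
    return ["fs_dedupe", "-modify", target, "-state", state]


print(" ".join(fs_dedupe_modify_args("on", fs_name="fs_ufs1")))
print(" ".join(fs_dedupe_modify_args("suspended", fs_id=401)))
```

Returning a list rather than a single string keeps the arguments separate, so the result could be passed to a process launcher without shell quoting concerns.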

4 Managing

The tasks to manage a deduplication-enabled file system are:


Display deduplication-enabled file systems
List all deduplication-enabled file systems
Query deduplication-enabled file systems
Suspend file system deduplication
Undo file system deduplication
Using deduplication parameters

Display deduplication-enabled file systems


Action

To display deduplication-enabled file systems on the Celerra, use this command syntax:

$ nas_fs -info [-size] { -all | <fs_name> | id=<fs_id> }

where:

<fs_name> = name of the file system
<fs_id> = identifier of the file system

Example:

To display the deduplication-enabled file system fs_ufs1, type:

$ nas_fs -info fs_ufs1

Output

id = 401
name = fs_ufs1
acl = 0
in_use = True
type = uxfs
worm = off
volume = v1232
pool = symm_std_rdf_src
member_of = root_avm_fs_group_8
rw_servers = server_2
ro_servers =
rw_vdms =
ro_vdms =
auto_ext = no,virtual_provision=no
deduplication = On
ckpts = fsufs1_ckpt1
stor_devs = 000190100563-0277
disks = d61
disk=d61 stor_dev=000190100563-0277 addr=c0t4l6-00-0 server=server_2
disk=d61 stor_dev=000190100563-0277 addr=c48t4l6-00-0 server=server_2
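When the output of nas_fs -info is captured by a script, the deduplication field can be picked out by parsing the key = value lines. The parser below is a hypothetical sketch that assumes the layout of the sample output in this section (a field name, whitespace, an equals sign, then the value); the sample text is an abbreviated copy of that output, and real output may differ in spacing.

```python
import re

# A field line is "<name> = <value>"; the value may be empty.
# Lines such as "disk=d61 stor_dev=..." have no space before "=" and are skipped.
FIELD = re.compile(r"^(\w+)\s+=\s*(.*)$")


def parse_nas_fs_info(text):
    """Return a dict of the key = value fields found in nas_fs -info output."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


SAMPLE = """\
id = 401
name = fs_ufs1
type = uxfs
ro_servers =
auto_ext = no,virtual_provision=no
deduplication = On
disk=d61 stor_dev=000190100563-0277 addr=c0t4l6-00-0 server=server_2
"""

info = parse_nas_fs_info(SAMPLE)
print(info["name"], info["deduplication"])
```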


List all deduplication-enabled file systems


Action
To list all deduplication-enabled file systems on the Celerra, type:
$ fs_dedupe -list

Output
id   name     state  status  time_of_last_scan             original_data_size  usage  space_saved
401  fs_ufs1  On     Idle    Wed Nov 12 13:54:29 EST 2008  352 MB              119%   93 MB (26%)


Query deduplication-enabled file systems


Action
To query deduplication-enabled file systems on the Celerra, use this command syntax:
$ fs_dedupe -info { -all | <fs_name> | id=<fs_id> }
where:
<fs_name> = name of the file system
<fs_id> = identifier of the file system
Example:
To query the deduplication-enabled file system fs_ufs1, type:
$ fs_dedupe -info fs_ufs1

Output
Id = 90
Name = fs_ufs1
Deduplication = On
Status = Idle
As of the last file system scan (Fri Nov 28 10:31:16 EST 2008):
Files scanned = 1986265
Files deduped = 606472 (30%)
File system capacity = 1032575 MB
Original data size = 875459 MB (84% of current file system capacity)
Space saved = 341622 MB (39% of original data size)

Note
The Status option can be either Idle, Scanning, or Reduplicating.
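The percentages in the sample output can be reproduced from the raw figures. The sketch below assumes the percentages are truncated to whole numbers; that assumption matches the sample values (875459 / 1032575 is about 84.78%, shown as 84%) but is not stated in this guide. The function name pct is hypothetical.

```shell
# Integer percentage of part relative to whole, truncated toward zero.
pct() {
    [ "$2" -gt 0 ] || return 1      # avoid division by zero
    echo $(( $1 * 100 / $2 ))
}

pct 606472 1986265     # files deduped vs. files scanned
pct 875459 1032575     # original data size vs. file system capacity
pct 341622 875459      # space saved vs. original data size
```

Run as written, the three calls print 30, 84, and 39, which agree with the percentages in the sample output.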


Suspend file system deduplication


Setting the deduplication state to suspended stops deduplication processing on a file system. No new space reduction is performed, but existing space-reduced files remain as is.
Action
To suspend deduplication space-reduction operation but keep existing space-reduced files, use this command syntax:
$ fs_dedupe -modify { <fs_name> | id=<fs_id> } -state suspended
where:
<fs_name> = name of the file system
<fs_id> = identifier of the file system
Example:
To suspend deduplication processing on file system fs_ufs1, type:
$ fs_dedupe -modify fs_ufs1 -state suspended

Output
Done


Undo file system deduplication


Setting the deduplication state to off stops new space reduction and reduplicates any data that was deduplicated. Reduplication is the process that restores a deduplicated file to its original condition, and it may take some time. If reduplication fails, for example because the system detects that the file system is too small to contain the reduplicated data, the deduplication state changes to suspended, the file system is left in a consistent, usable state, and a CCMD message is sent to the server's event log. If reduplication succeeds, the file system remains in the off state.
Action
To reduplicate files and remove all space-reduced data, use this command syntax:
$ fs_dedupe -modify { <fs_name> | id=<fs_id> } -state off
where:
<fs_name> = name of the file system
<fs_id> = identifier of the file system
Example:
To remove all space-reduced data from file system fs_ufs1, type:
$ fs_dedupe -modify fs_ufs1 -state off

Output
Done
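Because reduplication can fail and leave the file system in the suspended state, it can help to check the result with fs_dedupe -info afterwards. The sketch below classifies that output; it assumes one "field = value" pair per line and that the states are reported as Off and Suspended (only On is shown in this guide), so values are compared in lower case. The function name undo_outcome is hypothetical.

```shell
# Read fs_dedupe -info output on stdin after "-state off" was requested
# and classify the outcome. State spellings other than "On" are assumed.
undo_outcome() {
    awk -F'=' '
        {
            key = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)    # trim blanks around the name
            gsub(/^[ \t]+|[ \t]+$/, "", val)    # trim blanks around the value
            key = tolower(key); val = tolower(val)
        }
        key == "deduplication" { state = val }
        key == "status"        { status = val }
        END {
            if (status == "reduplicating")  print "in progress"
            else if (state == "off")        print "reduplication complete"
            else if (state == "suspended")  print "reduplication failed"
            else                            print "unknown"
        }'
}

printf 'Deduplication = Suspended\nStatus = Idle\n' | undo_outcome
```

A "reduplication failed" result corresponds to the case described above where the file system is too small to hold the reduplicated data.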


Using deduplication parameters


Use the dedupe facility command to set deduplication parameters, which allow you to:

List all deduplication parameters
List information for a specific deduplication parameter
Exclude file extensions from deduplication
Set access time for deduplication
Set maximum file size for deduplication
Set minimum file size for deduplication
Set minimum scan interval for deduplication
Set modification time for deduplication
Set the deduplication data comparison capability


List all deduplication parameters


Action
For the dedupe facility, to list all deduplication parameters with default, current, and configured values, use this command syntax:
$ server_param <movername> -facility dedupe -list
where:
<movername> = name of the Data Mover
Example:
To list all deduplication parameters on server_2, type:
$ server_param server_2 -facility dedupe -list

Output
server_2 :
param_name                facility  default  current  configured
fileExtensionExcludeList  dedupe    ''       ''
singleInstancingEnabled   dedupe    1        1        1
minimumSize               dedupe    24       24
savVolThreshold           dedupe    90       90
accessTime                dedupe    30       30       30
maximumSize               dedupe    200      200
caseSensitive             dedupe    0        1        1
throttle.cpuLowTrigger    dedupe    40       40
throttle.cpuHighTrigger   dedupe    75       75
minimumScanInterval       dedupe    7        2        2
modificationTime          dedupe    60       60       60
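To spot parameters that have been changed from their defaults, the list output can be filtered. The sketch below assumes whitespace-separated columns in the order param_name, facility, default, current, configured; the function name changed_params is hypothetical.

```shell
# Read server_param -list output on stdin and print each dedupe parameter
# whose current value differs from its default, as "name default current".
changed_params() {
    awk '$2 == "dedupe" && $3 != $4 { print $1, $3, $4 }'
}

# Demonstration on a few rows in the format of the sample output.
changed_params <<'EOF'
server_2 :
param_name facility default current configured
minimumSize dedupe 24 24
caseSensitive dedupe 0 1 1
minimumScanInterval dedupe 7 2 2
EOF
```

For the rows in the demonstration this reports caseSensitive (0 changed to 1) and minimumScanInterval (7 changed to 2).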


List information for a specific deduplication parameter


Action
For the dedupe facility, to list information for a specific deduplication parameter, use this command syntax:
$ server_param <movername> -facility dedupe -info <paramname> -verbose
where:
<movername> = name of the Data Mover
<paramname> = name of the parameter (case-sensitive). Valid parameters are: fileExtensionExcludeList; singleInstancingEnabled; minimumSize; maximumSize; accessTime; caseSensitive; throttle.cpuLowTrigger; throttle.cpuHighTrigger; and modificationTime. You must use a colon between entries when specifying multiple extensions.
Example:
To list information for the minimumSize deduplication parameter on server_2, type:
$ server_param server_2 -facility dedupe -info minimumSize -verbose

Output
server_2 :
name = minimumSize
facility_name = dedupe
default_value = 24
current_value = 24
configured_value = 24
user_action = none
change_effective = immediate
range = (0,1000)
description = Files less than or equal to this size in KB will not be deduplicated. Setting this value to 0 disables this test.
detailed_description
This value is the size in KB that limits deduplication. Files of this size or smaller will not be deduplicated. Files that are greater than this size will be candidates for deduplication. This value should not be set lower than 24 KB. Changing this value will take effect on the next scan operation but will not affect files there were deduplicated in previous scans.


Exclude file extensions from deduplication


Note: If you want to exclude certain file extensions from the deduplication process, set up the file extension exclusion list before running deduplication for the first time. If you add exclusions after deduplication has been run on a file system, the new exclusions take effect from that point forward and do not affect files that were deduplicated in previous scans.

Action
For the dedupe facility, to exclude file extensions from the deduplication process, use this command syntax:
$ server_param <movername> -facility dedupe -modify fileExtensionExcludeList -value <new_value>
where:
<movername> = name of the Data Mover
<new_value> = colon-delimited list of filename extensions to be excluded from deduplication. Each extension must include the leading dot. The default value is '' (an empty list).
Example:
To exclude mp3 and zip files from the deduplication process on server_2, type:
$ server_param server_2 -facility dedupe -modify fileExtensionExcludeList -value .mp3:.zip

Output
server_2: done
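The exclusion value must be colon-delimited and each extension needs its leading dot. As a minimal sketch, the helper below builds such a value from a list of extensions; the function name ext_exclude_value is hypothetical and the helper does not call server_param itself.

```shell
# Build the colon-delimited value for fileExtensionExcludeList from a list
# of extensions, adding the required leading dot where it is missing.
ext_exclude_value() {
    value=""
    for ext in "$@"; do
        case "$ext" in
            .*) ;;                  # already has the leading dot
            *)  ext=".$ext" ;;
        esac
        if [ -z "$value" ]; then
            value=$ext
        else
            value="$value:$ext"
        fi
    done
    printf '%s\n' "$value"
}

ext_exclude_value mp3 .zip
```

The printed string can then be passed as the -value argument in the command shown above.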


Set access time for deduplication


Action
For the dedupe facility, to set the access time parameter for Celerra Data Deduplication, use this command syntax:
$ server_param <movername> -facility dedupe -modify accessTime -value <new_value>
where:
<movername> = name of the Data Mover
<new_value> = the minimum required file age in days based on read access time. Files that have been read within the specified number of days will not be deduplicated. This parameter does not apply to files with an FLR state. Setting this value to 0 disables this parameter. The range of values is 0 to 365 and the default value is 30.
Example:
To set the access time to 20 days on server_2, type:
$ server_param server_2 -facility dedupe -modify accessTime -value 20

Output
server_2: done


Set maximum file size for deduplication


Action
For the dedupe facility, to set the maximum file size for Celerra Data Deduplication, use this command syntax:
$ server_param <movername> -facility dedupe -modify maximumSize -value <new_value>
where:
<movername> = name of the Data Mover
<new_value> = file size in MB of the largest file to be processed for deduplication. Files larger than this size will not be deduplicated. Setting this value too high may affect system write performance because the first write operation to a deduplicated file reduplicates the file in its entirety. Setting this value to 0 disables this parameter. The range of values is 0 to 200000 and the default value is 200.
Example:
To set the maximum file size to 300 MB on server_2, type:
$ server_param server_2 -facility dedupe -modify maximumSize -value 300

Output
server_2: done


Set minimum file size for deduplication


Action
For the dedupe facility, to set the minimum file size for Celerra Data Deduplication, use this command syntax:
$ server_param <movername> -facility dedupe -modify minimumSize -value <new_value>
where:
<movername> = name of the Data Mover
<new_value> = the file size in KB that limits deduplication. Files of this size or smaller will not be deduplicated. Files larger than this size are candidates for deduplication. Setting this value to 0 disables this parameter. This value should not be set lower than 24 KB. The range of values is 0 to 1000 and the default value is 24.
Example:
To set the minimum file size to 30 KB on server_2, type:
$ server_param server_2 -facility dedupe -modify minimumSize -value 30

Output
server_2: done
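The minimum and maximum size parameters together define a size window for deduplication candidates. The sketch below illustrates the two tests as described in this guide; it assumes 1 MB = 1024 KB, which the guide does not state, and the function name size_candidate is hypothetical.

```shell
# Return 0 if a file of size_kb kilobytes passes both size tests.
# min_kb is minimumSize (KB); max_mb is maximumSize (MB); 0 disables a test.
size_candidate() {
    size_kb=$1
    min_kb=$2
    max_mb=$3
    # Files equal to or smaller than minimumSize are skipped.
    if [ "$min_kb" -ne 0 ] && [ "$size_kb" -le "$min_kb" ]; then
        return 1
    fi
    # Files larger than maximumSize are skipped (assumes 1 MB = 1024 KB).
    if [ "$max_mb" -ne 0 ] && [ "$size_kb" -gt $(( max_mb * 1024 )) ]; then
        return 1
    fi
    return 0
}

# With the defaults (24 KB, 200 MB) a 25 KB file qualifies, a 24 KB file does not.
size_candidate 25 24 200 && echo "25 KB: candidate"
size_candidate 24 24 200 || echo "24 KB: skipped"
```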


Set minimum scan interval for deduplication


Action
For the dedupe facility, to set the minimum scan interval for Celerra Data Deduplication, use this command syntax:
$ server_param <movername> -facility dedupe -modify minimumScanInterval -value <new_value>
where:
<movername> = name of the Data Mover
<new_value> = minimum number of days after completing one scan of a file system before scanning the same file system again. The range of values is 1 to 365 and the default value is 7.
Example:
To set the minimum scan interval to 14 days on server_2, type:
$ server_param server_2 -facility dedupe -modify minimumScanInterval -value 14

Output
server_2: done


Set modification time for deduplication


Action
For the dedupe facility, to set the modification time for Celerra Data Deduplication, use this command syntax:
$ server_param <movername> -facility dedupe -modify modificationTime -value <new_value>
where:
<movername> = name of the Data Mover
<new_value> = minimum required file age in days based on modification time. Files updated within the specified number of days will not be deduplicated. Setting this value to 0 disables this parameter. The range of values is 0 to 365 and the default value is 60.
Example:
To set the modification time to 30 days on server_2, type:
$ server_param server_2 -facility dedupe -modify modificationTime -value 30

Output
server_2: done
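The access time and modification time parameters each exclude recently used files, so a file must be old enough on both counts to be deduplicated. The sketch below illustrates that combined test with ages in whole days; treating an age exactly equal to the threshold as old enough is an assumption, and the function name age_candidate is hypothetical.

```shell
# Return 0 if a file is old enough on both tests. Ages and thresholds are in
# whole days; a threshold of 0 disables that test. Treating an age equal to
# the threshold as old enough is an assumption.
age_candidate() {
    access_age=$1
    modify_age=$2
    access_time=$3
    modification_time=$4
    if [ "$access_time" -ne 0 ] && [ "$access_age" -lt "$access_time" ]; then
        return 1    # read too recently
    fi
    if [ "$modification_time" -ne 0 ] && [ "$modify_age" -lt "$modification_time" ]; then
        return 1    # modified too recently
    fi
    return 0
}

# Defaults are accessTime=30 and modificationTime=60.
age_candidate 45 90 30 60 && echo "read 45 days ago, modified 90 days ago: candidate"
age_candidate 45 40 30 60 || echo "read 45 days ago, modified 40 days ago: skipped"
```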


Set the deduplication data comparison capability


Note: The deduplication data comparison capability is enabled by default. If you want to disable deduplication data comparison, set this parameter before running deduplication for the first time. If you modify this parameter after deduplication has been run on a file system, the modification takes effect from that point forward and does not affect files that were deduplicated in previous scans. Compression remains enabled regardless of which option is set.

Action
For the dedupe facility, to enable or disable the deduplication data comparison capability, use this command syntax:
$ server_param <movername> -facility dedupe -modify singleInstancingEnabled -value <new_value>
where:
<movername> = name of the Data Mover
<new_value> = enable (1) or disable (0) the deduplication data comparison capability. The range of values is 0 to 1 and the default value is 1.
Example:
To enable deduplication data comparison on server_2, type:
$ server_param server_2 -facility dedupe -modify singleInstancingEnabled -value 1

Output
server_2: done
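Each numeric parameter in the preceding sections has a documented range: 0 to 365 for accessTime and modificationTime, 0 to 200000 for maximumSize, 0 to 1000 for minimumSize, 1 to 365 for minimumScanInterval, and 0 to 1 for singleInstancingEnabled. As a minimal sketch, a value can be checked against these ranges before it is passed to server_param; the function names below are hypothetical and the check is not performed by the Celerra CLI on your behalf in this form.

```shell
# Print "min max" for a dedupe parameter, using the ranges listed in this
# guide; returns 1 for parameters whose range is not listed here.
dedupe_param_range() {
    case "$1" in
        accessTime|modificationTime) echo "0 365" ;;
        maximumSize)                 echo "0 200000" ;;
        minimumSize)                 echo "0 1000" ;;
        minimumScanInterval)         echo "1 365" ;;
        singleInstancingEnabled)     echo "0 1" ;;
        *)                           return 1 ;;
    esac
}

# Return 0 if value is a whole number inside the parameter's range.
dedupe_value_ok() {
    range=$(dedupe_param_range "$1") || return 1
    case "$2" in
        ''|*[!0-9]*) return 1 ;;    # reject empty or non-numeric input
    esac
    min=${range% *}
    max=${range#* }
    [ "$2" -ge "$min" ] && [ "$2" -le "$max" ]
}

dedupe_value_ok minimumScanInterval 14 && echo "minimumScanInterval 14: in range"
dedupe_value_ok minimumScanInterval 0  || echo "minimumScanInterval 0: out of range"
```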


5 Troubleshooting

As part of an effort to continuously improve and enhance the performance and capabilities of its product lines, EMC periodically releases new versions of its hardware and software. Therefore, some functions described in this document may not be supported by all versions of the software or hardware currently in use. For the most up-to-date information on product features, refer to your product release notes. If a product does not function properly or does not function as described in this document, please contact your EMC Customer Support Representative. Topics include:

EMC E-Lab Interoperability Navigator
Error messages
Known problems
Customer training programs


EMC E-Lab Interoperability Navigator


The EMC E-Lab Interoperability Navigator is a searchable, web-based application that provides access to EMC interoperability support matrices. It is available at http://Powerlink.EMC.com. After logging in to Powerlink, go to Support > Interoperability and Product LifeCycle Information > E-Lab Interoperability Navigator.

Error messages
As of version 5.6, all new event, alert, and status messages provide detailed information and recommended actions to help you troubleshoot the situation. To view message details, use any of these methods:

Celerra Manager:

Right-click an event, alert, or status message and select to view Event Details, Alert Details, or Status Details.

Celerra CLI:

Type nas_message -info <MessageID>, where <MessageID> is the message identification number.

EMC Celerra Network Server Error Messages Guide:

Use this guide to locate information about messages that are in the earlier-release message format.

Powerlink:

Use the text from the error message's brief description or the message's ID to search the Knowledgebase on Powerlink. After logging in to Powerlink, go to Support > Knowledgebase Search > Support Solutions Search.


Known problems
Table 2 describes known problems that might occur when using deduplication and presents workarounds.

Table 2. Known problems and workarounds

Known problem: The deduplication process aborts because the SavVol is full.
Symptom: Deduplication process fails due to lack of space.
Workaround: If the deduplication process stops because of a SavVol restriction, you can use one of the following options:
  Let deduplication rerun in a week (or as scheduled) because data may have been purged from the SavVol, allowing more deduplication to occur.
  Reduce the minimum scan interval parameter to increase the frequency that deduplication runs.
  Delete the oldest checkpoint(s) and force deduplication to rerun by reenabling deduplication on the file system.
  Manually extend the SavVol.
  Change the dedupe facility's savVolThreshold parameter that allows the SavVol to extend.

Known problem: Not enough space available on the file system to run deduplication.
Symptom: If the file system is configured to auto-extend, the deduplication process will abort if the space usage is greater than the configured auto-extension threshold minus 5%. If the file system is not configured to auto-extend, the deduplication process will abort if the file system usage is equal to or greater than 95%.
Workaround: Either extend the file system manually, or move some files out of the file system temporarily to let the deduplication process run. Once space is available, move the files back again. You can also archive files on a secondary storage to clear space on the primary file system.
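The two space conditions in Table 2 can be expressed as a simple check. The sketch below takes the file system usage and, if auto-extend is configured, the auto-extension threshold, both as whole percentages; the function name dedupe_would_abort is hypothetical, and how the usage figure is obtained is left to the reader.

```shell
# Return 0 if deduplication would abort for lack of file system space.
# usage_pct: current file system usage in percent.
# auto_ext_threshold: configured auto-extension threshold in percent, or
# empty if the file system is not configured to auto-extend.
dedupe_would_abort() {
    usage_pct=$1
    auto_ext_threshold=$2
    if [ -n "$auto_ext_threshold" ]; then
        # Auto-extend: abort when usage is greater than threshold minus 5%.
        [ "$usage_pct" -gt $(( auto_ext_threshold - 5 )) ]
    else
        # No auto-extend: abort when usage is equal to or greater than 95%.
        [ "$usage_pct" -ge 95 ]
    fi
}

dedupe_would_abort 86 90 && echo "86% used, threshold 90%: would abort"
dedupe_would_abort 94 "" || echo "94% used, no auto-extend: would run"
```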


Customer training programs


EMC customer training programs are designed to help you learn how EMC storage products work together and integrate within your environment to maximize your entire infrastructure investment. EMC customer training programs feature online and hands-on training in state-of-the-art labs conveniently located throughout the world. EMC customer training programs are developed and delivered by EMC experts. For program information and registration, log in to Powerlink, our customer and partner website, and select the Training menu.


Terminology

Celerra FileMover
Policy-based system used to determine where files should be physically stored. In most cases, policies are based on file size or last access time (LAT) or both and are used to identify data that can be moved to slower, less-expensive storage.

Celerra Network Server
EMC network-attached storage (NAS) product line.

Common Internet File System (CIFS)
File-sharing protocol based on the Microsoft Server Message Block (SMB). It allows users to share file systems over the Internet and intranets.

compression
Process of encoding data to reduce its size by representing repeating patterns of data using a smaller number of bits than the original.

Data Mover
In a Celerra Network Server, a cabinet component running its own operating system that retrieves data from a storage device and makes it available to a network client. This is also referred to as a blade. A Data Mover is sometimes internally referred to as 'DART' since DART is the software running on the platform.

deduplication
Process used to compress redundant data, allowing space to be saved on a file system. When multiple files have identical data, the file system only stores one copy of the data and shares that data between the multiple files. Different instances of the file can have different names, security attributes, and timestamps. None of the metadata is affected by deduplication.

Internal Policy Engine
Internal process that performs periodic scans to deduplicate files, or reduplicates files on request.

link aggregation
High-availability feature based on the IEEE 802.3ad Link Aggregation Control Protocol (LACP) standard allowing Ethernet ports with similar characteristics to the same switch to combine into a single virtual device or link with a single MAC address and potentially multiple IP addresses.

Multi-Path File System (MPFS)
Celerra Network Server feature that allows heterogeneous servers with MPFS software to concurrently access, directly over Fibre Channel or iSCSI channels, shared data stored on a Symmetrix or CLARiiON storage array. MPFS adds a lightweight protocol called File Mapping Protocol (FMP) that controls metadata operations.

Network Data Management Protocol (NDMP)
Open standard network protocol designed for enterprise-wide backup and recovery of heterogeneous network-attached storage.

network file system (NFS)
Network file system (NFS) is a network file system protocol allowing a user on a client computer to access files over a network as easily as if the network devices were attached to its local disks.

primary storage
Celerra Network Server that provides clients access to normal files as well as archived files through the stub files that represent them. The Celerra Network Server contains all the stub files.

quota
Limit on the amount of allocated disk space as well as the number of files (inodes) a user or group of users can create in a production file system. Quotas control the amount of disk space or the number of files a user or group of users can consume or both.

reduplication
Process to undo the effect of Celerra deduplication on a file. If the file was compressed, it will be decompressed. If there are multiple instances of the file data, then a copy of the file data is made so that blocks are not shared between instances of the file. This process consumes additional space in the file system. Therefore, there must be sufficient free space in the file system to hold an additional copy of the original file for this process to complete.

SnapSure
On a Celerra system, a feature providing read-only point-in-time copies, also known as checkpoints, of a file system.


