
Having Data De-Duplication Doubts?

Discover How to Get the Best of Both Worlds

With the explosive growth of data in business today, problems arise from shrinking backup windows, the need for larger repositories for backup data, longer on-site retention requirements for recovery, and the perceived problems of backing up directly to tape. When disk becomes the primary storage medium for this data, new issues arise: the cost of expanding disk arrays to keep pace with ever-growing data requirements, the lack of removable media, limited functionality and manageability between virtual tape libraries (VTLs) and physical tape libraries (PTLs), and concerns over disk access frequency and usage. Data de-duplication arose primarily to address the need for larger backup repositories and longer on-site retention for recovery.

After extensive analysis of data de-duplication, Gresham Enterprise Storage concluded that the benefits of integrating it into Gresham's Clareti Storage Director (Clareti SD) did not outweigh the performance costs. However, Gresham recognised the need for a technology that could manage data de-duplication, and that is exactly what Gresham built.

To de-dupe or not to de-dupe, that is the question! Torn between the two? Discover how you can have the best of both worlds…

Many virtual backup solutions are designed with disk as the final destination for data. Disk space, however, is limited, so the question becomes: how can I store all my data on a limited amount of disk and make the most of it? Data de-duplication arose in part from this challenge. When disk is the final destination for data, other issues arise, such as how to move data from disk to physical tape when physical tape is also needed. Some virtual backup solutions rely on a commercial backup application to manage that data, which complicates storage management even further.

Background
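Data de-duplication is an advanced form of compression that generally yields higher compression ratios than standard compression algorithms: instead of encoding redundancy within a single data stream, it identifies blocks of data that have already been stored and replaces each repeat with a reference to the existing copy. For virtual backup applications, however, using data de-duplication has several disadvantages.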
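As a rough illustration of how de-duplication finds and removes repeats, here is a minimal sketch of block-level de-duplication. It assumes fixed-size chunks and SHA-256 fingerprints for simplicity; commercial products often use variable-size, content-defined chunking instead. The class and method names (`DedupStore`, `write`, `read`, `stored_bytes`) are hypothetical and do not correspond to any Gresham or vendor API.

```python
import hashlib


class DedupStore:
    """Toy block-level de-duplicating store (illustrative only)."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}       # fingerprint -> unique chunk data
        self.objects: dict[str, list[str]] = {}  # object name -> ordered fingerprints

    def write(self, name: str, data: bytes) -> None:
        """Split data into fixed-size chunks and keep each unique chunk only once."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # A repeated chunk is not stored again; only its reference is recorded.
            self.chunks.setdefault(fingerprint, chunk)
            recipe.append(fingerprint)
        self.objects[name] = recipe

    def read(self, name: str) -> bytes:
        """Rebuild an object by following its chunk references."""
        return b"".join(self.chunks[fp] for fp in self.objects[name])

    def stored_bytes(self) -> int:
        """Physical bytes actually held in the store."""
        return sum(len(chunk) for chunk in self.chunks.values())


store = DedupStore(chunk_size=4)
store.write("monday", b"AAAABBBBCCCC")
store.write("tuesday", b"AAAABBBBDDDD")
print(store.stored_bytes())  # 16
```

Two backups totalling 24 bytes occupy only 16 bytes here because the chunks they share are stored once. Every incoming chunk must be fingerprinted and looked up, which is where the processing overhead of de-duplication comes from.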

De-Duplication Disadvantages
For in-band implementations, de-duplication slows the data transfer itself, which drives up cost because many virtual nodes may be required to meet performance needs. For out-of-band or post-processing implementations, the heavy processing load in the virtual backup device also impacts performance. Further, data must first be stored on disk before it can be de-duplicated, which adds to the disk requirements. It is also important to note that data is generally not available for a restore while it is being de-duplicated.
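The extra disk needed by post-processing can be seen with some simple arithmetic. The sketch below is a deliberately simplified model under stated assumptions: a post-processing system lands the whole backup on disk before de-duplicating it, while an in-band system writes only the portion that is new; metadata and fingerprint indexes are ignored. The function names and the example figures (a 10 TB de-duplicated store, a 5 TB nightly backup, 10% new data) are hypothetical.

```python
def peak_disk_post_process(store_tb: float, backup_tb: float) -> float:
    """Post-processing: the raw backup must land on disk before de-duplication runs."""
    return store_tb + backup_tb


def peak_disk_in_band(store_tb: float, backup_tb: float, new_fraction: float) -> float:
    """In-band: only data not already in the store is written (simplified model)."""
    return store_tb + backup_tb * new_fraction


print(peak_disk_post_process(10, 5))  # 15
print(peak_disk_in_band(10, 5, 0.1))  # 10.5
```

Under these assumptions the post-processing system needs 4.5 TB more disk at its peak, at least until de-duplication of the landed backup completes.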

Gresham's Conclusion
After much analysis of this type of compression, Gresham Enterprise Storage concluded that the benefits of data de-duplication in Gresham's backup virtualization solution, Clareti Storage Director®, did not outweigh the performance costs. Furthermore, the Clareti Storage Director uses its disk as a true cache, and data does not reside on disk long enough for de-duplication's reduction in disk storage to produce significant savings.

Gresham designed the Clareti Storage Director with tape, rather than disk, as the final destination for data. The disk in a Clareti Storage Director system serves as intermediate storage for as long as the data is likely to have significant value. With the Clareti Storage Director's InfiniCache method of disk usage, the cache never fills up: data is sent to tape, and the disk copy is overwritten as more space is needed. In this way, terabytes of disk can conceptually manage petabytes of data, a ratio far beyond what any de-duplication implementation achieves.

Cache Operation

On Disk Only    On Disk & Tape       Replacing Data
vTape 5         vTape 5 / pTape 5
vTape 4         vTape 4 / pTape 4
vTape 3         vTape 3 / pTape 3    vTape 8
vTape 2         vTape 2 / pTape 2    vTape 7
vTape 1         vTape 1 / pTape 1    vTape 6

To move data quickly, the Clareti Storage Director first stores it on disk. Next, it copies the data to real tape so that it reaches its final destination quickly. At this point, there are two copies of the data, one on disk and one on tape, for maximum performance, availability and reliability. This is the diametric opposite of data de-duplication, in which multiple copies of the data are reduced to a single-point copy. Making multiple tape copies to protect your data is useless if those tape volumes reside in a de-dupe VTL, since they will be de-duplicated back to a single-point copy.
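The life cycle described above, and shown in the Cache Operation diagram, can be modelled as a small state machine. This is a hypothetical sketch, not Gresham's implementation: the `Location` states and the `Volume` class are illustrative names, and the one rule it enforces, that a disk copy may only be released once a tape copy exists, is taken from the description of the cache.

```python
from enum import Enum


class Location(Enum):
    DISK_ONLY = "on disk only"
    DISK_AND_TAPE = "on disk and tape"
    TAPE_ONLY = "on tape only"


class Volume:
    """Hypothetical model of one virtual tape volume moving through the cache."""

    def __init__(self, volume_id: str) -> None:
        self.volume_id = volume_id
        self.location = Location.DISK_ONLY  # backups land on disk first

    def copy_to_tape(self) -> None:
        if self.location is Location.DISK_ONLY:
            self.location = Location.DISK_AND_TAPE  # two copies now exist

    def release_disk_copy(self) -> None:
        # The disk copy may only be overwritten once a tape copy exists.
        if self.location is not Location.DISK_AND_TAPE:
            raise RuntimeError(f"{self.volume_id} has no disk copy that is safe to release")
        self.location = Location.TAPE_ONLY


vol = Volume("VOL001")
vol.copy_to_tape()
vol.release_disk_copy()
print(vol.location.value)  # on tape only
```

Keeping the `DISK_AND_TAPE` state explicit makes the two-copy window visible, which is exactly what a de-duplicating store would collapse into one copy.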

The Gresham Concept
The Gresham concept is for data to stay in the Clareti Storage Director's disk cache while it has its highest value, that is, while it is most likely to be needed for a restore. The customer can set this parameter when configuring the Storage Director. Once the data's value has diminished, it resides only on tape. This method provides fast restores from high-speed disk, together with the economy of tape once the likelihood of a restore has fallen. There is no need to keep older, low-value data on spinning disk that requires a lot of power and cooling to maintain. A major criterion for data storage today is how few kilowatts and BTUs (power and cooling) it consumes, and a tape volume uses none to maintain its data.

The disk cache is managed with a Least Recently Used (LRU) algorithm: when the disk reaches its high watermark, the Clareti Storage Director starts replacing the least recently used (typically the oldest) data with new data, provided the older data has already been copied to tape. If data to be restored still resides in cache, it is restored from cache rather than tape, even if a tape copy exists, for the fastest possible data transfer. If the data is not found in cache, the tape is mounted and the data is transferred from tape straight back to the initiator, bypassing the cache, for quick access. In addition, the Clareti Storage Director uses a unique tape positioning method to access data on tape very quickly, making the overall tape restore process high-performance.
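The cache policy just described (LRU replacement above a high watermark, volumes pinned until they have been copied to tape, restores served from cache when possible) can be sketched as follows. This is an illustrative model under assumptions, not Gresham's code: sizes are whole gigabytes, the 90% watermark is an arbitrary example value, and the names (`LruTapeCache`, `mark_copied_to_tape`, `restore`) are hypothetical.

```python
from collections import OrderedDict


class LruTapeCache:
    """Toy disk cache in front of tape (hypothetical model)."""

    def __init__(self, capacity_gb: int, high_watermark: float = 0.9) -> None:
        self.limit_gb = capacity_gb * high_watermark
        # Volume ID -> size in GB, ordered from least to most recently used.
        self.entries: OrderedDict[str, int] = OrderedDict()
        self.on_tape: set[str] = set()

    def used_gb(self) -> int:
        return sum(self.entries.values())

    def write(self, volume_id: str, size_gb: int) -> None:
        self.entries[volume_id] = size_gb
        self.entries.move_to_end(volume_id)
        self.on_tape.discard(volume_id)  # new data has not been copied to tape yet
        self._evict()

    def mark_copied_to_tape(self, volume_id: str) -> None:
        self.on_tape.add(volume_id)
        self._evict()

    def _evict(self) -> None:
        # Replace least recently used volumes first, skipping any not yet on tape.
        for volume_id in list(self.entries):
            if self.used_gb() <= self.limit_gb:
                break
            if volume_id in self.on_tape:
                del self.entries[volume_id]

    def restore(self, volume_id: str) -> str:
        if volume_id in self.entries:
            self.entries.move_to_end(volume_id)  # a cache hit refreshes recency
            return "cache"
        if volume_id in self.on_tape:
            return "tape"  # mount the tape and stream straight to the initiator
        raise KeyError(volume_id)


cache = LruTapeCache(capacity_gb=100)  # replaces data above 90 GB
cache.write("vTape1", 40)
cache.write("vTape2", 40)
cache.mark_copied_to_tape("vTape1")
cache.write("vTape3", 40)  # 120 GB > 90 GB, so vTape1 is replaced
print(cache.restore("vTape1"))  # tape
print(cache.restore("vTape2"))  # cache
```

Pinning volumes that are not yet on tape means the cache can temporarily exceed its watermark if tape copies fall behind; a real system would need to throttle or alert in that case, which this sketch does not model.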

The Clareti Storage Director also provides compression to minimize disk utilization. In its research, Gresham has seen compression ratios of 2:1 to 7:1, depending on data patterns, which can overlap part of the range of de-duplication ratios achieved in practice.
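To put the 2:1 to 7:1 range in perspective, the logical capacity of a disk cache scales linearly with the compression ratio. The 10 TB cache size below is a hypothetical example, and real ratios depend on data patterns, as noted above.

```python
def logical_capacity_tb(disk_tb: float, ratio: float) -> float:
    """Amount of uncompressed data that fits on disk_tb at a ratio:1 compression ratio."""
    return disk_tb * ratio


for ratio in (2, 7):
    print(f"{ratio}:1 on 10 TB of disk holds about {logical_capacity_tb(10, ratio)} TB")
# 2:1 on 10 TB of disk holds about 20 TB
# 7:1 on 10 TB of disk holds about 70 TB
```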

A New Class: The Backup Virtualisation Solution
If the requirement is disk-only virtual tape storage with the higher compression ratios of de-duplication, the Clareti Storage Director can act as “master controller” for one or more de-duplication VTL systems. This feature classifies the Clareti Storage Director as a Backup Virtualization Solution. The de-duplication VTL sits behind the Clareti Storage Director and appears to it as a real library. This configuration has several benefits for the de-duplication Virtual Tape Library:

• The backup server will experience maximum interface performance to the Clareti Storage Director's virtual tape drive, since in reality the server will be sending data to the Gresham Clareti InfiniCache. As a backup virtualization solution, the Clareti Storage Director will mask any performance degradation in the de-duplication VTL.
• If a restore is required shortly after the backup, it will be served from the Clareti InfiniCache. Restoring from disk cache masks any unavailability of data in the de-duplication VTL while it is still in a de-duplication cycle.
• If there are multiple de-duplication appliances, the Clareti Storage Director may be able to present them to the backup server as one virtual library through its library consolidation feature. The Clareti Storage Director's methods of library consolidation minimize configuration and management issues.

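The paper does not describe how library consolidation works internally. One simple approach, assumed here purely for illustration, is to concatenate the slot ranges of several back-end libraries into one virtual slot space and keep a map back to each back-end slot. `BackendLibrary` and `consolidate` are hypothetical names, and the library names and slot counts are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendLibrary:
    name: str
    slots: int


def consolidate(libraries: list[BackendLibrary]) -> dict[int, tuple[str, int]]:
    """Map each virtual slot (numbered from 1) to a (back-end library, local slot) pair."""
    mapping: dict[int, tuple[str, int]] = {}
    virtual_slot = 1
    for library in libraries:
        for local_slot in range(1, library.slots + 1):
            mapping[virtual_slot] = (library.name, local_slot)
            virtual_slot += 1
    return mapping


slot_map = consolidate([BackendLibrary("dedupe-vtl-a", 3), BackendLibrary("dedupe-vtl-b", 2)])
print(len(slot_map))  # 5
print(slot_map[4])    # ('dedupe-vtl-b', 1)
```

A real implementation would also have to map drives and cope with slots being added or removed on the back end; the sketch ignores both.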
Get the Best of Both Worlds
With the benefits of real tape in mind, the Clareti Storage Director unifies physical and virtual tape resources in a way that makes all the devices in the virtual system easier to manage. The default currency of exchange between the virtual and physical tape subsystems is the Volume ID: in the default 1:1 configuration, each virtual tape shares its Volume ID with its corresponding physical tape. This simplifies management of both the virtual devices and the tape subsystems.
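Because a virtual tape and its physical counterpart share the same Volume ID by default, the two inventories can be compared directly, with no translation table. The sketch below assumes each inventory is simply a set of Volume ID strings; the `reconcile` function and the sample IDs are hypothetical.

```python
def reconcile(virtual_ids: set[str], physical_ids: set[str]) -> dict[str, set[str]]:
    """Compare virtual and physical tape inventories keyed by a shared Volume ID."""
    return {
        "matched": virtual_ids & physical_ids,
        "virtual_only": virtual_ids - physical_ids,   # no physical tape with this ID
        "physical_only": physical_ids - virtual_ids,  # no virtual tape with this ID
    }


report = reconcile({"VOL001", "VOL002", "VOL003"}, {"VOL001", "VOL002", "VOL004"})
print(sorted(report["matched"]))       # ['VOL001', 'VOL002']
print(sorted(report["virtual_only"]))  # ['VOL003']
```

With a mapping other than 1:1, the same comparison would first need a lookup table translating virtual IDs to physical ones.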

In addition, the Clareti Storage Director has an SQL database that provides a wealth of statistics and status information on virtual drives, virtual tape volumes, virtual libraries, physical drives, physical tape volumes and physical libraries. This information can be exported from the GUI to CSV format for use in spreadsheets at the touch of a button.
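The database engine and schema behind the Clareti Storage Director are not documented here, so the following is only a hypothetical sketch of the general idea: query status data with SQL and write the result set as CSV. It uses Python's built-in `sqlite3` and `csv` modules with an in-memory database; the `virtual_drives` table and its columns are invented for illustration.

```python
import csv
import io
import sqlite3


def export_csv(connection: sqlite3.Connection, query: str) -> str:
    """Run a query and return the result set as CSV text, header row first."""
    cursor = connection.execute(query)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
    return buffer.getvalue()


# Hypothetical table standing in for the product's real statistics schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE virtual_drives (drive_id TEXT, status TEXT, mb_written INTEGER)")
conn.executemany(
    "INSERT INTO virtual_drives VALUES (?, ?, ?)",
    [("vdrive-01", "idle", 5120), ("vdrive-02", "writing", 20480)],
)
print(export_csv(conn, "SELECT * FROM virtual_drives ORDER BY drive_id"))
```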

Bottom-Line
In summary, Gresham's Clareti Storage Director is designed for high performance and brings the capabilities of physical tape closer to disk, rather than the other way around. It eases tape management by providing more information about the tape subsystem than has ever been provided before, and it can present a common configuration of drives and libraries to all backup servers, regardless of the physical hardware attached, to simplify configuration and management of the backup servers. It also has superior capability for consolidating or partitioning back-end storage devices and scales well from high-end requirements downward.

De-duplication is a technology with its own benefits and purpose, especially for reducing traffic across a network, for some applications such as e-mail, or for systems that don't require high performance. To make de-duplication, disk and tape technologies successful, their strengths must be used properly rather than “shoe-horned” into a system as a cure for all ailments. Tape extends the capabilities of system storage, and we should not ignore this technology or its benefits.

If you are looking for a high-performance backup virtualization solution to complement and bring clarity to your physical tape or de-dupe VTL infrastructure, contact Gresham.

“Clareti Storage Director architecture supports tremendous performance and explains why Gresham is less concerned with secondary features such as de-duplication. With its intelligent algorithm, Clareti Storage Director's cache ensures that the most in-demand data is placed on disk, while rapid tape synchronization optimizes tape performance. Further, Gresham's use of standard hardware makes the Clareti Storage Director flexible, and a solution can be designed to retain an appropriate amount of storage in cache to meet the cost and performance needs of a customer without the entrapments of proprietary repositories. With the focus on performance and intelligent cache management rather than long-term …

About Gresham
Gresham designs and develops enterprise storage software applications. Many of the world’s most successful organisations choose Gresham to help improve their competitive edge and bottom-line performance in some of the most challenging market sectors, including financial services, manufacturing, healthcare, utilities, telecommunications, and public services.

Further information
For more information about Gresham software products, please contact us via www.greshamstorage.com, or email us at info@greshamstorage.com. Alternatively, you can contact our offices directly.

Americas
T US (Toll Free): 1-800-450-0575
T Outside of US: 1-(512)-450-0900

Europe, Middle East and Africa
T +44 (0)1489 555500

Asia Pacific
T 1-(512)-450-0900

©2008 Gresham Computing plc. All rights reserved. Clareti Storage Director is a trademark of Gresham Computing plc.
All other products and company names mentioned may be trademarks of their respective owners.
