
Versity Storage Manager

The cloud scale solution for managing large archives.

Versity Storage Manager (VSM) is a next generation large scale archive data management tool
for cost effectively protecting massive data collections on any mix of storage devices and cloud
services.

VSM interfaces with existing enterprise business applications and backup solutions through a
standard POSIX file system interface, making it easy to expose low cost archival storage
resources to end users. Rich and flexible archive policies optimize the flow of data both in and
out of cloud storage services and archival storage devices such as tape libraries, disk arrays,
and on premise object storage systems. VSM's ability to broker data across a diverse mix of
archival storage systems enables customers to optimize performance, reliability, and cost
factors for different types of data and different users while always maintaining independence
from storage hardware vendors and cloud providers.

VSM is engineered for maximum utilization of storage and networking hardware. Our customers
use VSM to read and write petabytes of archival data per day. With advanced technology,
friendly business practices, and responsive customer support, Versity is an archiving solution
you can love!

VSM Key Benefits

• Lowest TCO
VSM allows large storage sites to operate their own infrastructure with cloud scale
efficiency. Versity sites are able to obtain TCO levels up to 10x lower than Amazon
Glacier while delivering unlimited amounts of data at no incremental cost over high
speed local area networks.

• Hybrid Cloud
VSM provides enterprise class archiving performance natively to both traditional on
premise tape systems and private or public cloud systems. VSM does not rely on
object to tape gateways, which limit performance, functionality, and utilization. VSM’s
POSIX front end interface allows customers to deploy object storage systems without
changing their applications.

• High Throughput
Extremely large and active data collections require very high reading and writing
throughput rates. VSM works in GB/s not MB/s. Busy Versity sites can move up to 1PB
of data per day under real world conditions.

© 2018 Versity Software Inc.

• Open Source
VSM stores all customer data in the open source GNU tar file format so that archive
content is always available with or without vendor software. This practice eliminates
vendor lock-in and provides an extra layer of data availability assurance. VSM2 also
features a fully open source GPL file system.

• No code changes
VSM presents itself as a POSIX filesystem or can be used directly for object interactions.
There is no need to change legacy or new code to use the system.

• Automatic
VSM employs rich policies to automatically move data to the desired media, and
automatically manages cache space so there’s no need to worry about it.

The VSM Product Family

An Evolution

VSM1 was commercially launched in 2013 and is Versity's flagship product. VSM1 has been
established as a leading large scale archive management tool supporting sustained throughput
of up to 12GB/s (1 petabyte per day) under real world conditions. VSM1 has been deployed
globally in nearly every significant archive industry vertical and across sites ranging in size from
Fortune 10 to small research sites. The technology behind VSM1 was based on Sun
Microsystems' open source SAM-QFS product. Versity ported SAM-QFS to Linux, assembled a
team of leading archive storage experts, released dozens of incremental improvements,
including support for object storage, and has been rapidly evolving and advancing the
product for over five years.
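
As a quick sanity check on the throughput figure above, the relationship between a sustained rate in GB/s and a daily volume can be sketched as follows. This assumes decimal units (1 PB = 1,000,000 GB), which the text does not state explicitly; the function name is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_volume_pb(rate_gb_per_s: float) -> float:
    """Convert a sustained rate in GB/s to PB per day (decimal units assumed)."""
    return rate_gb_per_s * SECONDS_PER_DAY / 1_000_000


# 12 GB/s sustained for a full day is slightly more than 1 PB.
print(f"{daily_volume_pb(12):.2f} PB/day")
```

Under that assumption, 12GB/s sustained around the clock works out to a little over 1 PB per day, which matches the figure quoted for VSM1.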

Followed by a Revolution!

VSM2 is Versity's revolutionary next generation scale out archiving product and is currently in
the pre-beta phase of development. VSM2 is a completely new archiving platform built by
Versity from scratch to address the exascale archiving requirements that are materializing over
the next decade. The design challenge for VSM2 was to create a product capable of routinely
managing and archiving tens or even hundreds of billions of files within a POSIX namespace
that could ultimately scale to a capacity of 1 trillion files.

The aggressive design goals for VSM2 required a radical shift from the traditional and
ubiquitous server centric HSM architecture (pictured below) featuring a central metadata server
or metadata controller orchestrating file system operations and archiving work. In this
architecture, the central metadata controller is the architectural bottleneck as well as a single
point of failure. While SAN clients may be added to increase IO to shared storage, the overall
system does not scale beyond certain limits.

Figure: VSM1 and traditional HSM architecture

VSM2 moves from the central server architecture to a full scale out "nodes and services"
architecture where the throughput and performance of the system scale by adding nodes to
the cluster. Each node in the cluster shares work including file system metadata processing, and
there is no longer any single point of failure.

Figure: VSM2 next generation scale out archive data broker

VSM2 Components - ScoutFS and ScoutAM


The VSM2 platform consists of two primary components. The first is an open source scale
out archiving file system called ScoutFS, and the second is a scale out userland application
called ScoutAM. The two components work together to deliver file create rates, parallel
archiving, find operations, and other archival work with vastly improved speed and efficiency.

ScoutFS is a POSIX, scale out, open source GPL, in kernel, block file system designed
specifically for archiving. Metadata is processed on all nodes or a subset of nodes in a VSM2
cluster. There is no central metadata controller nor any single point of failure. ScoutFS
introduces many outstanding new capabilities by radically increasing the number of POSIX files
that can reliably be maintained in a single namespace.


ScoutAM is a next generation scale out data broker designed to replace traditional
HSM applications and deliver the ability to meet exascale archiving requirements where
tens or hundreds of billions of files are rapidly moving in and out of archival storage
systems. ScoutAM provides cloud scale services by intelligently spreading work among
compute nodes and saturating available storage devices. With ScoutAM, policy
application, work packaging, and archive job execution are packetized and run in
parallel, utilizing all available node resources.


VSM2 Scale Out Data Broker:

• Moves beyond the traditional limitations of HSM products.
• Built for exascale workloads.
• Parallel, fast, and scalable.


How VSM Works

VSM includes a SAN file system that runs on nodes with access to shared block
devices. A single VSM node primarily consists of two components: the Intelligent
Cache™ and the Archive Engine™.

The Archive Engine™ efficiently packages and groups archive data. Files are grouped
intelligently by user, group, path name pattern, size, creation/modification time, access
time, or any combination of these so that like data and data destined for a common
target are packed together. Files are containerized into the GNU TAR format, both for its
open data format and for read/write efficiency. This format has distinct benefits for small
files, which are grouped and then sent to archive, thereby shaping traffic into workloads
that are optimal for obtaining maximum throughput of the target archive storage devices
and media. Upon retrieval, the system recalls files individually without retrieving the
entire tar file. Large files are divided into chunks then grouped into data sets that are
efficient for streaming tape writes. This capability allows VSM to stripe large files across
an arbitrary number of drives simultaneously. Unlike other products that stripe across
tape drives, VSM allows files to be read back with a different number of drives than
were used during the write so that administrators can control drive resources based
upon workloads.
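
The packaging idea described above can be illustrated with a minimal sketch using Python's standard tarfile module. It is not Versity's implementation: grouping by owner only, the in-memory archives, and the helper names (pack_groups, build_index, recall) are assumptions made for illustration. It groups small files, packs each group into a GNU tar container, and then reads one file back by its offset without unpacking the rest of the container.

```python
import io
import tarfile
from collections import defaultdict


def pack_groups(files):
    """Group (owner, name, data) records by owner; pack each group into a GNU tar.

    Returns a dict mapping owner -> tar container bytes.
    """
    groups = defaultdict(list)
    for owner, name, data in files:
        groups[owner].append((name, data))

    archives = {}
    for owner, members in groups.items():
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
            for name, data in members:
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
        archives[owner] = buf.getvalue()
    return archives


def build_index(tar_bytes):
    """Map each member name to (data offset, size) inside the container."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        return {m.name: (m.offset_data, m.size) for m in tar.getmembers()}


def recall(tar_bytes, index, name):
    """Return one file's bytes by offset, without unpacking other members."""
    offset, size = index[name]
    return tar_bytes[offset:offset + size]


# Hypothetical sample data: two owners, three small files.
files = [
    ("alice", "run1/a.dat", b"alpha"),
    ("bob", "logs/b.txt", b"bravo"),
    ("alice", "run1/c.dat", b"charlie"),
]
archives = pack_groups(files)
index = build_index(archives["alice"])
recovered = recall(archives["alice"], index, "run1/c.dat")
```

Because the containers are plain GNU tar, the same data could also be read with any standard tar tool, which is the point of the open format.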


Archive resources can include any combination of tape, private cloud, public cloud, and
disk. The archive resources are specified by configurable policies. For instance, a
customer could specify a configuration that saved copy 1 to a tape library, copy 2 to a
different tape library, copy 3 to AWS, and copy 4 to a private cloud system at a remote
site. Storage resources may be reserved or prioritized, making things like placing
certain types of data on separate media within the same archive possible. For example,
it is possible to ensure that multiple copies within the same tape library are always
written to different pieces of physical media.
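
A multi-copy policy like the one described above could be modeled roughly as follows. This is a hypothetical sketch, not VSM's policy syntax: the CopyTarget type, the resource names, and the medium identifiers are all invented for illustration. The check mirrors the stated guarantee that copies are never written to the same piece of physical media.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyTarget:
    copy_number: int
    resource: str  # hypothetical resource name, e.g. a tape library or cloud
    medium: str    # hypothetical identifier for the physical medium or bucket


# Four copies, following the example in the text.
policy = [
    CopyTarget(1, "tape-library-a", "VOL001"),
    CopyTarget(2, "tape-library-b", "VOL900"),
    CopyTarget(3, "aws", "bucket-archive"),
    CopyTarget(4, "private-cloud-remote", "bucket-dr"),
]


def copies_on_distinct_media(targets):
    """Return True when no two copies share the same (resource, medium) pair."""
    seen = set()
    for target in targets:
        key = (target.resource, target.medium)
        if key in seen:
            return False
        seen.add(key)
    return True
```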

All data, including objects, is presented as standard POSIX files, so there is no need to
modify or rewrite existing enterprise applications in order to take advantage of private
cloud or public cloud storage resources. Versity’s file packaging process minimizes the
overhead associated with object storage servers and enables parallel uploading for
extreme file to object performance. For example, a current VSM reference customer
utilizing ActiveScale private cloud hardware sustains full saturation of two 10G interfaces.
Applications that use object storage directly can get and put objects directly to and
from VSM using the S3 protocol.

The Intelligent Cache™ stores files and indexes them. Archive metadata is stored
directly in the filesystem. Metadata stored in this way is a major advantage because
there is no external database that can get out of sync or lag behind the file system.
Metadata remains online for search, browsing, and use by applications. Metadata
operations such as an ‘ls -l’ do not cause tape mounts. Both metadata and data are
checksummed to ensure end to end data correctness.
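
The end to end check can be pictured with a small sketch: a checksum is recorded when data is written and compared again when the data is read back. The text does not say which checksum algorithm VSM uses; SHA-256 here is an assumption chosen only because it is available in Python's standard library.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Compute a hex digest for the payload (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected: str) -> bool:
    """Return True when the payload still matches the recorded checksum."""
    return checksum(data) == expected


original = b"archive payload"
stored = checksum(original)                        # recorded at write time
ok = verify(original, stored)                      # unchanged data passes
corrupted_ok = verify(b"archive pAyload", stored)  # altered data fails
```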

Cache space on shared block devices is managed between high and low thresholds
that are specified by the site administrator. VSM automatically manages space to
balance new incoming files, archiving activities, and data retrieval. Archived files are
optionally released (removed) from the cache only when the high threshold is reached
so that files remain accessible on the fastest storage whenever possible. Storage
administrators may specify that files with certain attributes such as path, owner, size,
type, etc. should always remain in the cache for fast access.
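
The threshold behavior described above can be sketched as follows. This is an illustration under assumptions, not VSM's algorithm: the CachedFile fields, the least recently accessed release order, and the function name are hypothetical. Only files that are already archived and not pinned by policy are eligible, and nothing is released until usage reaches the high threshold.

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    path: str
    size: int             # bytes
    last_access: float    # seconds since epoch
    archived: bool        # True once the required archive copies exist
    pinned: bool = False  # True if policy says to always keep it in cache


def select_for_release(files, capacity, high, low):
    """Pick archived, unpinned files to release once usage reaches `high`.

    `high` and `low` are fractions of `capacity`. Files are released in
    least recently accessed order (an assumed ordering) until usage is at
    or below the low threshold, or no eligible files remain.
    """
    used = sum(f.size for f in files)
    if used < high * capacity:
        return []
    target = low * capacity
    candidates = sorted(
        (f for f in files if f.archived and not f.pinned),
        key=lambda f: f.last_access,
    )
    released = []
    for f in candidates:
        if used <= target:
            break
        released.append(f.path)
        used -= f.size
    return released


# Hypothetical cache contents: 90 of 100 units used.
cache = [
    CachedFile("a", 30, 1.0, archived=True),
    CachedFile("b", 20, 2.0, archived=True, pinned=True),
    CachedFile("c", 20, 3.0, archived=False),
    CachedFile("e", 20, 5.0, archived=True),
]
to_release = select_for_release(cache, capacity=100, high=0.8, low=0.5)
```

In this example the pinned file and the not yet archived file stay in the cache regardless of age, so only "a" and "e" are candidates for release.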

When data is ready to be used by an application or user, it enters a process called
staging. Staging activity is fully automated and transparent to applications, although
there is visibility into the stage queue for the storage administrator if needed. Staging
order is set by policy and may be configured to favor the copy that is most readily
available. This is usually the copy on the fastest media, but this depends on site
specifics like connection speeds and charges for access. Like archive workloads, stage
workloads are sorted and optimized to enable efficient media handling and maximum
throughput. Stage resources may be managed to ensure optimal system availability. For
example, the number of tape mounts or drives utilized by a specific user or by a specific
set of files may be limited.
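
One way to picture stage ordering is the sketch below. It is a simplified model, not VSM's scheduler: the Copy type, the fixed media preference ranking, and the function names are assumptions. For each requested file it picks the most readily available copy, then groups requests by volume and sorts them by position so each volume is read in one pass.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    media: str     # "disk", "cloud", or "tape"
    volume: str    # hypothetical volume or bucket identifier
    position: int  # hypothetical offset of the file on that volume


# Assumed ranking; a real site would derive this from its own policy.
PREFERENCE = {"disk": 0, "cloud": 1, "tape": 2}


def pick_copy(copies, available_volumes):
    """Choose the preferred copy among those on currently available volumes."""
    usable = [c for c in copies if c.volume in available_volumes]
    if not usable:
        return None
    return min(usable, key=lambda c: PREFERENCE[c.media])


def plan_stage(requests, available_volumes):
    """Group stage requests by volume and order each group by position."""
    by_volume = defaultdict(list)
    for path, copies in requests.items():
        chosen = pick_copy(copies, available_volumes)
        if chosen is not None:
            by_volume[chosen.volume].append((chosen.position, path))
    return {vol: [p for _, p in sorted(items)] for vol, items in by_volume.items()}


# Hypothetical requests: three files on tape T1, one also held in cloud C1.
requests = {
    "a": [Copy("tape", "T1", 300)],
    "b": [Copy("tape", "T1", 100), Copy("cloud", "C1", 0)],
    "c": [Copy("tape", "T1", 200)],
}
tape_only_plan = plan_stage(requests, {"T1"})
mixed_plan = plan_stage(requests, {"T1", "C1"})
```

When only the tape is reachable, all three files are read from T1 in position order. When the cloud copy is also reachable, file "b" is staged from there instead.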

To help identify candidate data for archiving and to help with ingesting files from various
enterprise file systems, Versity provides the Archive Fabric Module™ (AFM). AFM is a
tool that will analyze external filesystems and help identify good candidates for
archiving, as well as copy or move bulk data into the VSM system.

VSM runs on customer supplied commodity hardware. The supplied hardware must
include:

• Shared storage device for Intelligent Cache™ (any low latency SAN device or
VSAN, e.g. ScaleIO)
• 1 or more SSD LUN(s) for metadata
• 1 or more LUN(s) for cache data
• Metadata size = 1GB per 1 million files
• Fibre Channel or IB network for SAN, or Ethernet for VSAN
• Commodity server nodes for VSM

Tape, disk, or cloud infrastructure archive resources must also be supplied and
connected.
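
The metadata sizing rule in the hardware list above (1GB per 1 million files) lends itself to a quick capacity estimate. The sketch below simply applies that ratio; the function name is illustrative, and real deployments may need headroom that this rule of thumb does not capture.

```python
GB_PER_MILLION_FILES = 1  # sizing rule from the hardware list


def metadata_gb(file_count: int) -> float:
    """Estimate metadata LUN capacity in GB for a given number of files."""
    return file_count / 1_000_000 * GB_PER_MILLION_FILES


# Example: a namespace of 500 million files.
print(metadata_gb(500_000_000))
```

By this rule, a namespace of 500 million files would call for roughly 500GB of SSD metadata capacity.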

VSM is delivered to customers as an RPM package. The customer installs the package and may
use the installation guide to step through the installation and configuration process.
Installation typically takes anywhere from one to four hours, provided that the hardware
and operating systems have been installed and prepared for VSM. On site professional
services are available for deployment, configuration, and administration if desired.

Versity executives and software engineers are available via a dedicated Slack channel,
email, or phone to help with installation, configuration, or other questions.


Conclusion

Versity provides rock solid data protection at the lowest total cost of ownership.
Because VSM runs on commodity hardware, organizations keep overall costs down
while maximizing performance and avoiding the challenges associated with
archiving large data sets.

About Versity

Versity is the only independent provider of advanced, scalable, high throughput software
defined archiving storage technology products. Organizations leverage the Versity
Storage Manager to implement data preservation strategies for long term storage and
retrieval of massive data stores both on premise and in the cloud. Versity serves
customers in North America, Europe, India, and the Middle East including leading public
and private entities in financial services, education, research, aerospace, energy,
entertainment, web services, and publishing. For more information go to
www.versity.com

Contact Versity: 415-723-0949 | info@versity.com | versitysoftware on social media
