WHITE PAPER

Better Object Storage


With Hitachi Content Platform

The Fundamentals of Hitachi Content Platform

By Michael Ratner
November 2014

Contents

Executive Summary
Introduction
Main Concepts and Features
Object-Based Storage
Distributed Design
Open Architecture
Multitenancy
Object Versioning
Search
Adaptive Cloud Tiering
Spin-Down Capability
Replication and Global Access Topology
Common Use Cases
Cloud-Enabled Storage
Backup-Free Data Protection and Content Preservation
Fixed-Content Archiving
Compliance, E-Discovery and Metadata Analysis
System Fundamentals
Hardware Overview
Software Overview
System Organization
Namespaces and Tenants
Main Concepts
User and Group Accounts
System and Tenant Management
Object Policies
Content Management Services
Conclusion

Better Object Storage With Hitachi Content Platform


Executive Summary
One of IT's greatest challenges today is the explosive, uncontrolled growth of unstructured data. Continual growth of
email and documents, video, Web pages, presentations, medical images and the like increases both complexity and
risk. These difficulties are seen particularly in distributed IT environments, such as cloud service providers and orga-
nizations with branch or remote office sites. The vast quantity of data being created, the difficulties in management
and proper handling of unstructured content, and the complexity of supporting more users and applications pose
significant challenges to IT departments. Organizations often end up with sprawling storage silos for a multitude of
applications and workloads, with few resources available to manage, govern, protect and search the data.

Hitachi Data Systems provides an alternative solution to these challenges through Hitachi Content Platform (HCP).
This single object storage platform can be divided into virtual storage systems, each configured for the desired level
of service. The great scale and rich features of this solution help IT organizations in both private enterprises and cloud
service providers. HCP assists with management of distributed IT environments and control of the flood of storage
requirements for unstructured content, and it addresses a variety of workloads.

The Hitachi Content Platform portfolio products integrate tightly with HCP to deliver powerful file sync and share
capability, and elastic backup-free file services for remote and branch offices.

Built from end to end by Hitachi Data Systems, Hitachi Content Platform Anywhere (HCP Anywhere) provides safe,
secure file sharing, collaboration and synchronization. End users simply save a file to HCP Anywhere and it synchro-
nizes across their devices. These files and folders can then be shared via hyperlinks. Because HCP Anywhere stores
data in HCP, it is protected, compressed, single-instanced, encrypted, replicated and access-controlled.

Hitachi Data Ingestor (HDI) combines with HCP to deliver elastic and backup-free file services beyond the data center.
When a file is written to HDI, it is automatically replicated to HCP. From there, it can be used by another HDI for effi-
cient content distribution and in support of roaming home directories, where users' permissions follow them to any
HDI site. Files stay in the HDI file system until free space is needed. Then, HDI reduces any inactive files to pointers
referencing the object on HCP. HDI drastically simplifies deployment, provisioning and management by eliminating the
need to constantly manage capacity, utilization, protection, recovery and performance of the system.

One infrastructure is far easier to manage than disparate silos of technology for each application or set of users. By
integrating many key technologies in a single storage platform, Hitachi Data Systems object storage solutions provide
a path to short-term return on investment and significant long-term efficiency improvements. They help IT evolve to
meet new challenges, stay agile over the long term, and address future change and growth.

Introduction
Hitachi Content Platform (HCP) is a multipurpose distributed object-based storage system designed to support
large-scale repositories of unstructured data. HCP enables IT organizations and cloud service providers to store,
protect, preserve and retrieve unstructured content with a single storage platform. It supports multiple levels of
service and readily evolves with technology and scale changes. With a vast array of data protection and content
preservation technologies, the system can significantly reduce or even eliminate tape-based backups of itself or of
edge devices connected to the platform. HCP obviates the need for a siloed approach to storing unstructured content.
Massive scale, multiple storage tiers, Hitachi reliability, nondisruptive hardware and software updates, multitenancy
and configurable attributes for each tenant allow the platform to support a wide range of applications on a single
physical HCP instance. By dividing the physical system into multiple, uniquely configured tenants, administrators
create "virtual content platforms" that can be subdivided further into namespaces for finer-grained organization of
content, policies and access. With support for thousands of tenants, tens of thousands of namespaces, and petabytes
of capacity in one system, HCP is truly cloud-ready (see Figure 1).

Figure 1. A single Hitachi Content Platform supports a wide range of applications.

Main Concepts and Features


Object-Based Storage
Hitachi Content Platform, as a general-purpose object store, allows unstructured data files to be stored as objects.
An object is essentially a container that includes both file data and associated metadata that describes the data.
The objects are stored in a repository. Each object is treated within HCP as a single unit for all intents and purposes.
The metadata is used to define the structure and administration of the data. HCP can also leverage object metadata
to apply specific management functions, such as storage tiering, to each object. The objects have intelligence that
enables them to automatically take advantage of advanced storage and data management features to ensure proper
placement and distribution of content.

HCP architecture isolates stored data from the hardware layer. Internally, ingested files are represented as objects
that encapsulate both the data and metadata required to support applications. Externally, HCP presents each object
either as a set of files in a standard directory structure or as a uniform resource locator (URL) accessible by users and
applications via HTTP or HTTPS.

Object Structure
An HCP repository object is composed of fixed-content data and the associated metadata, which in turn consists of
system metadata and, optionally, custom metadata and an access control list (ACL). The structure of the object is
shown in Figure 2.

Fixed-content data is an exact digital copy of the actual file contents at the time of its ingestion. It becomes immuta-
ble after the file is successfully stored in the repository. If the object is under retention, it cannot be deleted before the
expiration of its retention period, except when using a special privileged operation. If versioning is enabled, multiple
versions of a file can be retained. If appendable objects are enabled, data can be appended to an object (with the
CIFS or NFS protocols) without modifying the original fixed-content data.

Figure 2. HCP Object

Metadata is system- or user-generated data that describes the fixed-content data of an object and defines the
object's properties. System metadata, the system-managed properties of the object, includes HCP-specific meta-
data and POSIX metadata.

HCP-specific metadata includes the date and time the object was added to the namespace (ingest time), the
date and time the object was last changed (change time), the cryptographic hash value of the object along with the
namespace hash algorithm used to generate that value, and the protocol through which the object was ingested. It
also includes the object's policy settings such as data protection level (DPL), retention, shredding, indexing and versioning.

POSIX metadata includes a user ID and group ID, a POSIX permissions value and POSIX time attributes.

Custom metadata is optional, user-supplied descriptive information about a data object that is usually provided as
well-formed XML. It is typically intended for more detailed description of the object. This metadata can also be used
by future users and applications to understand and repurpose the object content. HCP supports multiple custom
metadata fields for each object.
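
For illustration only, here is a short, hypothetical custom metadata annotation of the kind an application might attach to a scanned invoice; the element names are assumptions for the example, not part of HCP itself:

<?xml version="1.0" encoding="UTF-8"?>
<invoice>
  <customer>Acme Corp</customer>
  <invoice-number>INV-2014-0042</invoice-number>
  <department>accounts-receivable</department>
  <scanned-by>records-app</scanned-by>
</invoice>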

ACL is optional, user-provided metadata containing a set of permissions granted to users or user groups to perform
operations on an object. ACLs control data access at an individual object level and are the most granular data
access mechanism.

In addition to data objects, HCP also stores directories and symbolic links in the repository. Only POSIX metadata
is maintained for directories and symbolic links; they have no fixed-content data, custom metadata or ACLs.

All the metadata for an object is viewable; only some of it can be modified. The way metadata can be viewed and
modified depends on the namespace configuration, the data access protocol and the type of metadata.

Object Representation
HCP presents objects to a user or application in 2 different ways, depending on the namespace access interface.
With the RESTful HTTP protocols (HCP REST, Amazon S3), HCP presents each object as a URL. Both data and
metadata are accessed through the REST interface. Metadata is handled by using URL query parameters and HTTP
headers. Clients specify metadata values by including HCP-specific parameters in the request URL; HCP returns
system metadata in HTTP response headers.
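
As a minimal sketch of this interface in Python using the requests library: the host, tenant and namespace names, the authorization header encoding and the X-HCP- response header prefix are illustrative assumptions, not definitive API details.

import requests

# Namespace URLs typically follow a namespace.tenant.system pattern (names assumed).
BASE = "https://finance.acme.hcp.example.com/rest"
AUTH = {"Authorization": "HCP dXNlcg==:5f4dcc3b5aa765d61d8327deb882cf99"}  # encoding assumed

# Store a file; the object is addressed by its URL path.
with open("report.pdf", "rb") as f:
    resp = requests.put(BASE + "/invoices/report.pdf", data=f, headers=AUTH)
resp.raise_for_status()

# A HEAD request returns the object's system metadata in HTTP response headers.
resp = requests.head(BASE + "/invoices/report.pdf", headers=AUTH)
for name, value in resp.headers.items():
    if name.startswith("X-HCP-"):  # header prefix assumed
        print(name, "=", value)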

For non-RESTful namespace protocols (WebDAV, CIFS, and NFS), HCP includes the HCP file system, a standard
POSIX file system that allows users and applications to view stored objects as regular files, directories and symbolic
links. HCP file system allows data to be handled in familiar ways using existing methods. It presents each object as a
set of files in 2 hierarchical directory structures that hold the components of the object: one for the object's data and
another for the object's metadata. For a data object (an object other than a directory or symbolic link), one of these
files contains the fixed-content data. The name of this file is identical to the object's name, and its content is the
same as the originally stored file. The other files contain object metadata. These files, which are either plain text, XML
or JSON, are called metafiles. Directories that contain metafiles are called metadirectories.

HCP File System


HCP file system represents a single file system across a given namespace. Each HCP namespace that has any
non-RESTful access protocol enabled exposes a separate HCP file system instance to clients.

HCP file system maintains a directory structure with separate branches for data files and metafiles. The data top-level
directory is a traditional file system view that includes fixed-content data files for all objects in the namespace. This
directory hierarchy is created by a user adding files and directories to the namespace. Each data file and directory in
this structure has the same name as the object or directory it represents. The metadata top-level directory contains
all the metafiles and metadirectories for objects and directories. This structure parallels that of data, excluding sym-
bolic links, and is created by HCP file system automatically as data and directories are added to the namespace
by an end user.

HCP metafiles provide a means of viewing and manipulating object metadata through a traditional file system inter-
face. Clients can view and retrieve metafiles through the WebDAV, CIFS and NFS protocols. These protocols can also
be used to change metadata by overwriting metafiles that contain the HCP-specific metadata (that can be changed).

A sample HCP file system data and metadata structure, as seen through CIFS, NFS and WebDAV protocols, is
shown in Figure 3.

Figure 3. HCP File System Data and Metadata Structure
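
As a textual sketch of this structure, a hypothetical object report.pdf stored under an invoices directory might appear to CIFS or NFS clients roughly as follows; the metafile names shown are illustrative assumptions:

data/
  invoices/
    report.pdf                  <- fixed-content data file
metadata/
  invoices/
    report.pdf/                 <- metadirectory for the object
      core-metadata.xml         <- system metadata metafile (name assumed)
      custom-metadata.xml       <- custom metadata metafile (name assumed)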

Distributed Design
A single Hitachi Content Platform consists of both hardware and software. It is composed of many different compo-
nents that are connected together to form a robust, scalable architecture for object-based storage. HCP runs on an
array of servers, or nodes, that are networked together to form a single physical instance. Each node stores data
objects and can also store the search index. All runtime operations and physical storage, including data, metadata and
index, are distributed among the system nodes. All objects in the repository are distributed across all available
storage space but still presented as files in a standard directory structure. Objects that are physically stored on any
particular node are available from all other nodes.

Open Architecture
Hitachi Content Platform has an open architecture that insulates stored data from technology changes and from
changes in HCP itself due to product enhancements. This open architecture ensures that users will have access to
the data long after it has been added to the repository. HCP acts as both a repository that stores customer data and
an online portal. As a portal, it enables access to that data by means of several industry-standard interfaces, as well
as through an integrated search facility and Hitachi Data Discovery Suite (HDDS).

The industry-standard HTTP REST, Amazon S3, WebDAV, CIFS and NFS protocols support various operations.
These operations include storing data, creating and viewing directories, viewing and retrieving objects and their
metadata, modifying object metadata, and deleting objects. Objects that were added using any protocol are
immediately accessible through any other supported protocol. These protocols can be used to access the data with
a Web browser, the HCP client tools, 3rd-party applications, Microsoft Windows Explorer, or native Windows or
Unix tools. HCP also allows special-purpose access to the repository through the SMTP protocol in order to support
email journaling.

HCP provides a number of HTTP-based RESTful open APIs for easy integration with customer applications. In addi-
tion to the HCP REST and Amazon S3-compatible HS3 interfaces that are used for namespace content access, HCP
supports the metadata query API for searching for objects in a namespace and the management API (MAPI) for tenant-
and namespace-level administration.
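
Because the HS3 interface is Amazon S3 compatible, generic S3 tooling can address a namespace as a bucket. The following Python sketch uses boto3 under that assumption; the tenant endpoint and credential values are placeholders, not actual HCP conventions.

import boto3

# In HS3, a bucket corresponds to an HCP namespace (endpoint and credentials assumed).
s3 = boto3.client(
    "s3",
    endpoint_url="https://acme.hcp.example.com",
    aws_access_key_id="ENCODED-USERNAME",      # placeholder
    aws_secret_access_key="HASHED-PASSWORD",   # placeholder
)

# Store an object in the "finance" namespace and list the namespace contents.
with open("report.pdf", "rb") as f:
    s3.put_object(Bucket="finance", Key="invoices/report.pdf", Body=f)
for obj in s3.list_objects_v2(Bucket="finance").get("Contents", []):
    print(obj["Key"], obj["Size"])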

HCP implements the open, standards-based Internet Protocol version 6 (IPv6), the latest version of the Internet
Protocol (IP). This protocol allows HCP to be deployed in very large-scale networks and to comply with the mandates
of government agencies where IPv6 is required. HCP provides IPv6 dual stack capability that enables
coexistence of IPv4 and IPv6 protocols and corresponding applications. HCP can be configured in native IPv4, native
IPv6, or dual IPv4 and IPv6 modes where each virtual network will support either or both IP versions.

The IPv4 and IPv6 dual-stack feature is indispensable in heterogeneous environments during transition to IPv6 infra-
structure. Any network mode can be enabled when desired, and existing IPv4 applications can be upgraded to IPv6
independently and with minimal disruption in service. All standard networking protocols and existing HCP access
interfaces are supported and can use IPv4 and/or IPv6 addresses based on the enabled network mode, which
allows seamless integration with existing data center environments.

Multitenancy
Multitenancy support allows the repository in a single physical Hitachi Content Platform instance to be partitioned
into multiple namespaces. A namespace is a logical partition that contains a collection of objects particular to one
or more applications. Each namespace is a private object store that is represented by a separate directory structure
and has a set of independently configured attributes. Namespaces provide segregation of data, while tenants, or
groupings of namespaces, provide segregation of management. An HCP system can have up to 1,000 tenants and
10,000 namespaces. Each tenant and its set of namespaces constitute a virtual HCP system that can be accessed
and managed independently by users and applications. This HCP feature is essential in enterprise, cloud and service-
provider environments.

Data access to HCP namespaces can be either authenticated or nonauthenticated, depending on the type and
configuration of the access protocol. Authentication can be performed using HCP local accounts or Microsoft Active
Directory groups.

Object Versioning
Hitachi Content Platform supports object versioning, which is the capability of a namespace to create, store and
manage multiple versions of objects in the HCP repository. This ability provides a history of how the data has
changed over time. Versioning facilitates storage and replication of evolving content, thereby creating new opportu-
nities for HCP in markets such as content depots and workflow applications.

Versioning is available in HCP namespaces and is configured at the namespace level. Versioning is supported only
with HTTP REST protocol. Other protocols cannot be enabled if versioning is enabled for the namespace. Versioning
applies only to objects, not to directories or symbolic links. A new version of an object is created when an object
with the same name and location as an existing object is added to the namespace. A special type of version, called
a deleted version, is created when an object is deleted. This helps protect the content against accidental deletes.
Updates to the object metadata affect only the current version of an object and do not create new versions.

Previous versions of objects that are older than a specified amount of time can be automatically deleted, or pruned. It
is not possible to delete specific historical versions of an object; however, a user or application with appropriate per-
missions can purge the object to delete all its versions, including the current one.
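
As a hedged sketch of this behavior over the REST interface: repeated PUTs to the same URL create successive versions, and a version listing can be requested with a query parameter. The URL, authorization encoding and the version=list parameter are assumptions for illustration.

import requests

URL = "https://finance.acme.hcp.example.com/rest/contracts/terms.txt"  # assumed
AUTH = {"Authorization": "HCP dXNlcg==:5f4dcc3b5aa765d61d8327deb882cf99"}  # assumed

# Each PUT to the same name and location creates a new version of the object.
requests.put(URL, data=b"draft 1", headers=AUTH).raise_for_status()
requests.put(URL, data=b"draft 2", headers=AUTH).raise_for_status()

# Request the version history (parameter name assumed).
resp = requests.get(URL, params={"version": "list"}, headers=AUTH)
print(resp.text)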

Search
Hitachi Content Platform includes comprehensive search capabilities that enable users to search for objects in
namespaces, analyze namespace contents, and manipulate groups of objects. To satisfy government requirements,
HCP supports e-discovery for audits and litigation.

HCP supports 2 search facilities and includes a Web application portal called the search console that provides an
interactive interface to these search facilities. HCP provides the only integrated metadata query engine (MQE) on
the market. The MQE search facility is integrated with HCP and is always available in any HCP system. The HDDS
search facility interacts with Hitachi Data Discovery Suite, and this separate HDS product enables federated search
across multiple HCP and other supported systems. HDDS performs search and returns results to the HCP search
console. HDDS must be installed separately and configured in the HCP search console.

MQE can index and search only object metadata. The HDDS search facility indexes both content and metadata and
allows full content search of objects in a namespace. MQE is also used by the metadata query API, a program-
matic interface for querying namespaces.

Adaptive Cloud Tiering


Adaptive cloud tiering expands Hitachi Content Platform capacity to any storage device or cloud service. It enables
hybrid cloud configurations to scale and share resources between public and private clouds. It also allows HCP
to be used to build custom, evolving service level agreements (SLAs) for specific data sets using enhanced service
plans.

HCP provides comprehensive storage-tiering capabilities as part of the long-term goal of supporting information life-
cycle management (ILM) and intelligent objects. HCP supports a range of storage components that are grouped
into storage pools. Storage pools virtualize access to one or more logically grouped storage components with
similar price/performance characteristics. The storage components can be either primary storage (HCP storage) or
extended storage. Primary storage includes direct attached storage (DAS) and SAN storage; internal DAS storage
is always running, while SAN storage may be running or spin-down-capable. Extended storage includes non-
HCP external storage devices (NFS and S3-compatible) and public cloud storage services (Amazon S3, Microsoft
Azure, Google Cloud Storage and Hitachi Cloud Services).

The topology of the adaptive cloud tiering is shown in Figure 4.

Objects are stored in storage pools and are managed by object life-cycle policies, which are defined in service plans.
Service plans determine content life cycle from ingest to obsolescence or disposition and implement protection strat-
egies at each tier; they effectively represent customer SLAs. Service plans can be offered to a tenant administrator so
they can be applied to individual namespaces.

Storage tiering functionality is implemented as an HCP service. Storage tiering service applies service plans and
moves objects between tiers of storage. Flexible service plans allow storage tiering to adapt to changes.

Spin-Down Capability
HCP spin-down-capable storage takes advantage of the power savings feature of Hitachi midrange storage sys-
tems and is one of the core elements of the storage tiering functionality and adaptive cloud tiering. According to
the storage tiering strategy that an organization specifies, the storage tiering service identifies objects that are eligible
to reside on spin-down storage and moves them to and from the spin-down storage as needed. Tiering selected
content to spin-down-enabled storage lowers overall cost by reducing energy consumption for large-scale unstruc-
tured data storage, such as deep archives and disaster recovery sites. Storage tiering can very effectively be used
with customer-identified "dark data" (rarely accessed data) or data replicated for disaster recovery by moving that
data to spin-down storage some time after ingestion or replication.

Figure 4. Adaptive Cloud Tiering

Replication and Global Access Topology


Replication, an add-on feature to HCP, is the process that keeps selected tenants and namespaces in 2 or more
HCP systems in sync with each other. The replication service copies one or more tenants or namespaces from one
HCP system to another, propagating object creations, object deletions and metadata changes. HCP also replicates
tenant and namespace configuration, tenant-level user accounts, compliance and tenant log messages, and reten-
tion classes. The replication process is object-based and asynchronous.

The HCP system in which the objects are initially created is called the primary system. The second system is called
the replica. Typically, the primary system and the replica are in separate geographic locations and connected by a
high-speed wide area network. HCP supports advanced traditional replication topologies, including many-to-one and
chain configurations, as well as a revolutionary global access topology where globally distributed HCP systems are
synchronized in a way that allows users and applications to access data from the closest HCP site for improved col-
laboration, performance and availability.

Global access topology is based on bidirectional, active-active replication links that allow read-and-write access
to the same namespace on all participating HCP systems. The content is synchronized between systems (or loca-
tions) in both directions. This enables read-and-write access to data in any namespace and from any location across
the entire replication topology, essentially creating a global content point-of-presence network.

Common Use Cases


Cloud-Enabled Storage
The powerful, industry-leading capabilities of Hitachi Content Platform make it well suited to the cloud storage space.
An HCP-based infrastructure solution is sufficiently flexible to accommodate any cloud deployment models (public,
private or hybrid) and simplify the migration to the cloud for both service providers and subscribers. HCP provides
edge-to-core coverage, secure multitenancy, robust management capabilities and a host of features to optimize cloud
storage operations.

HCP, in its role as an online data repository, is truly ready for a cloud-enabled market. While numerous HCP features
were already discussed earlier in this paper, the purpose of this section is to summarize those that contribute the
most to HCP cloud capabilities. They include:
- Large-scale multitenancy.
  - Management segregation. HCP supports up to 1,000 tenants, each of which can be uniquely configured for
    use by a separate cloud service subscriber.
  - Data segregation. HCP supports up to 10,000 namespaces, each of which can be uniquely configured for a
    particular application or workload.
- Massive scale.
  - Petabyte-scale repository offers 80PB of storage, 80 nodes, 64 billion user objects and 30 million files per
    directory, all on a single physical system.
  - Best node density in the object storage industry supports 500TB and 800 million objects per node. With fewer
    nodes, HCP requires less power, less cooling and less floor space.
  - Unparalleled expandability allows organizations to "start small" and expand according to demand. Nodes
    and/or storage can be added to expand an HCP system's storage and throughput capacity, without disruption.
    Multiple storage systems are supported by a single HCP system.
  - Easy tenant and storage provisioning.
- Geographical dispersal and global accessibility.
  - Global access topology that enables creation of a global content point-of-presence network.
  - WAN-friendly REST interface for namespace data access and replication.
  - WAN-optimized, high-throughput data transfer.
- High availability.
  - Fully redundant hardware.
  - Automatic routing of client requests around hardware failures.
  - Load balancing across all available hardware.
- Adaptive cloud tiering enables hybrid cloud configurations where resources can be easily scaled and shared
  between public and private clouds. Specific data sets can be migrated on demand across various cloud services
  and local storage, and new cloud storage can be easily integrated and existing storage retired.

- Multiple REST interfaces. These interfaces include the HCP REST and Amazon S3-compatible REST APIs for
  namespace data access, the management API and the metadata query API. REST is a technology of choice for
  cloud enablers and consumers. Some of the reasons for its popularity include high efficiency and low overhead,
  caching at both the client and the server, and API uniformity. In addition, this technology's stateless nature
  accommodates the latencies of Internet access and potentially complex firewall configurations.
- Secure, granular access to tenants, namespaces and objects, which is crucial in any cloud environment. This
  access is facilitated by the HCP multilayer, flexible permission mechanism, including object-level ACLs.
- Usage metering. HCP has built-in chargeback capabilities, indispensable for cloud use, to facilitate provider-
  subscriber transactions. HCP also provides tools for 3rd-party vendors and customers to write to the API for easy
  integration with the HDS solution for billing and reporting.
- Low-touch system that is self-monitoring, self-managing and self-healing. HCP features advanced monitoring,
  audit and reporting capabilities. HCP services can automatically repair issues if they arise.
- Support for multiple levels of service. This support is provided through HCP policies, service plans and quotas
  that can be configured for each tenant. It helps enforce SLAs and allows the platform to accommodate a wide
  range of subscriber use cases and business models on a single physical system.
- Edge-to-core solution. HCP, working in tandem with Hitachi Data Ingestor, provides an integrated edge-to-core
  solution for cloud storage deployments. HCP serves as the "engine" at the core of the HDS cloud architecture. HDI
  resides at the edge of the storage cloud (for instance, at a remote office or subscriber site) and serves as the "on-
  ramp" for application data to enter the cloud infrastructure. HDI acts as a local storage cache while migrating data
  into HCP and maintaining links to stored content for later retrieval. Users and applications interact with HDI at the
  edge of the cloud but perceive bottomless, backup-free storage provided by HCP at the core.

- File-sync-and-share solution. HCP, working in tandem with Hitachi Content Platform Anywhere (HCP Anywhere),
  provides a secure file and folder synchronization and sharing solution for workforce mobility. HCP again serves as
  the "engine" at the core of the HDS cloud architecture. HCP Anywhere servers are deployed in conjunction with
  HCP, and client applications are installed on user devices including laptops, desktops and mobile devices. End
  users simply save a file to their HCP Anywhere folder and it automatically synchronizes to all of their registered
  devices and becomes available via popular Web browsers. Once saved to the HCP Anywhere folder, the file is
  protected, compressed, single-instanced, encrypted, replicated and access-controlled by the well-proven Hitachi
  Content Platform. Individual files or entire folders can then be shared with a simple hyperlink.

Backup-Free Data Protection and Content Preservation


Hitachi Content Platform is a truly backup-free platform. HCP protects content without the need for backup. It uses
sophisticated data preservation technologies, such as configurable data and metadata protection levels, object ver-
sioning and change tracking, multisite replication with seamless application failover, and many others. HCP includes
a variety of features designed to protect the integrity, provide the privacy, and ensure the availability and security of
stored data. Below is a summary of the key HCP data protection features:
- Content immutability. This intrinsic feature of the HCP "write-once, read-many" (WORM) storage design protects
  the integrity of the data in the repository.
- Content verification. The content verification service maintains data integrity and protects against data corruption
  or tampering by ensuring that the data of each object matches its cryptographic hash value. Any violation is
  repaired in a self-healing fashion.

- Scavenging. The scavenging service ensures that all objects in the repository have valid metadata. In case
  metadata is lost or corrupted, the service tries to reconstruct it by using the secondary, or scavenging, metadata
  (a copy of the metadata stored with each copy of the object data).
- Data encryption. HCP supports encryption-at-rest capability that allows seamless encryption of data on the
  physical volumes of the repository. This ensures data privacy by preventing unauthorized access to the stored
  data. The encryption and decryption are handled automatically and transparently to users and applications.
- Versioning. HCP uses versioning to protect against accidental deletes and storing wrong copies of objects.
- Data availability.
  - RAID protection. RAID storage technology provides efficient protection from simple disk failures. SAN-based
    HCP systems typically use RAID-6 erasure coding protection to guard against dual drive failures.
  - Multipathing and zero-copy failover. These features provide data availability in SAN-based HCP systems.
  - Data protection level (DPL) and protection service. In addition to using RAID and SAN technologies to provide
    data integrity and availability, HCP can use software mirroring to store the data for each object in multiple loca-
    tions on different nodes. HCP groups system nodes into protection sets with the same number of nodes in each
    set. It tries to store all the copies of the data for an object in a single protection set where each copy is stored on
    a different node. The protection service enforces the required level of data redundancy by checking and repair-
    ing protection sets. In case of violation, it creates additional copies or deletes extra copies of an object to bring
    the object into compliance. If replication is enabled, the protection service can use an object copy from a replica
    system if the copy on the primary system is unavailable.
  - Metadata redundancy. In addition to the data redundancy specified by DPL, HCP creates multiple copies of
    the metadata for an object on different nodes. Metadata protection level, or MDPL, is a system-wide setting that
    specifies the number of copies of the metadata that the HCP system must maintain (normally 2 copies, MDPL2).
    Management of MDPL redundancy is independent of the management of data copies for DPL.
  - Nondisruptive software and hardware upgrades. HCP employs a number of techniques that minimize or elimi-
    nate any disruption of normal system functions during software and hardware upgrades. Nondisruptive software
    upgrade (NDSU) is one of these techniques. It includes greatly enhanced online upgrade support, nondisrup-
    tive patch management, and online upgrade performance improvements. HCP supports media-free and remote
    upgrades, HTTP or REST drain mode, and parallel operating system (OS) installation. It also supports automatic
    online upgrade commit, offline upgrade duration estimates, enhanced monitoring and email alerts, and other fea-
    tures. Nodes can be added to an HCP system without causing any downtime. HCP also supports nondisruptive
    storage upgrades that allow online storage addition to SAN-based HCP systems without any data outage.
  - Seamless application failover. This feature is supported by HCP systems in a replicated topology. This capabil-
    ity includes a seamless failover routing feature that enables direct integration with customer-owned load balancers
    by allowing HTTP requests to be serviced by any HCP system in a replication topology. Seamless domain name
    system (DNS) failover is an HCP built-in, multisite, load-balancing and high-availability technology that is ideal for
    cost-efficient, best-effort customer environments.
  - Replication. If enabled, this feature provides a multitude of mechanisms to ensure data availability. The rep-
    lica system can be used both as a source for disaster recovery and to maintain data availability by providing
    good object copies for protection and content verification services. If an object cannot be read from the primary
    system, HCP can try to read the object from the replica if the read-from-replica feature is enabled.

- Data security.
  - Authentication of management and data access.
  - Granular, multilayer data access permission scheme.
  - IP filtering technology and protocol-specific allow or deny lists.
  - Secure Sockets Layer (SSL) support for HTTP and WebDAV data access, management access and replication.
  - Node login prevention.
  - Shredding policy and service.
- Autonomic technology refresh. This feature is implemented as the HCP migration service. It enables organizations
  to maintain continuously operating content stores, allowing them to preserve their digital content assets for the
  long term.

Fixed-Content Archiving
Hitachi Content Platform is optimized for fixed-content data archiving. Fixed-content data is information that does
not change but must be kept available for future reference and be easily accessible when needed. A fixed-content
storage system is one in which the data cannot be modified. HCP uses WORM storage technology, and a variety
of policies and services (such as retention, content verification and protection) to ensure the integrity of data in the
repository. The WORM storage means that data, once ingested into the repository, cannot be updated or modified;
that is, the data is guaranteed to remain unchanged from when it was originally stored. If the versioning feature is
enabled within the HCP system, different versions of the data can be stored and retrieved in which case each version
is WORM.

Compliance, E-Discovery and Metadata Analysis


Custom metadata brings structure to unstructured content. It enables building massive unstructured data stores
by providing means for faster and more accurate access of content. Custom metadata gives storage managers the
meaningful information they need to efficiently and intelligently process data and apply the right object policies to
meet all business, compliance and protection requirements. Structured custom metadata (content properties) and
multiple custom metadata annotations take this capability to the next level by helping yield better analytic results
and facilitating content sharing among applications.

Regulatory compliance features include namespace retention mode (compliance and enterprise), retention classes,
retention hold, automated content disposition, and privileged delete and purge. HCP search capabilities include sup-
port for e-discovery for litigation or audit purposes, and allow direct 3rd-party integration through built-in open
APIs.

The search console offers a structured environment for creating and executing queries (sets of criteria that each
object in the search results must satisfy). End users can apply various selection criteria, such as objects stored before
a certain date or larger than a specified size. Queries return metadata for objects included in the search result. This
metadata can be used to retrieve the object. From the search console, end users can open objects, perform bulk
operations on objects (hold, release, delete, purge, privileged delete and purge, change owner, set ACL), and export
search results in standard file formats for use as input to other applications.

Search is enabled at both the tenant and namespace levels. Indexing is enabled on a per-namespace basis. Settings
at the system and namespace levels determine whether custom metadata is indexed in addition to system meta-
data and ACLs. If indexing of custom metadata is disabled, the MQE index does not include custom metadata. If a
namespace is not indexed at all, searches do not return any results for objects in this namespace.

MQE indexes system metadata, custom metadata (optionally), and ACLs of objects in each search-enabled and
index-enabled namespace. In namespaces with versioning enabled it indexes only the current version of an object.
Each object has an index setting that affects indexing of custom metadata by the metadata query engine. If indexing
is enabled for a namespace, MQE always indexes system metadata and ACLs, regardless of the index setting for an
object. If the index setting is set to true, MQE also indexes custom metadata for this object.

The MQE index resides on designated logical volumes on the HCP nodes, sharing or not sharing the space on
these volumes with the object data, depending on the type of system and volume configuration. The Hitachi Data
Discovery Suite search facility creates and maintains its own index that resides separately in HDDS.

REST clients can search HCP programmatically using the metadata query API. As with the search
console, the response to a query is metadata for the objects that meet the query criteria, in XML or
JSON format. Two types of queries are supported:
- Object-based query locates objects that currently exist in the repository based on their metadata, including
  system metadata, custom metadata and ACLs, as well as object location (namespace or directory). Multiple, robust
  metadata criteria can be specified in object-based queries. Objects must be indexed to support this type of query.
- Operation-based query provides time-based retrieval of object transactions. It searches for objects based
  on operations performed on the objects during specified time periods, retrieving records of object creation,
  deletion and purge (user-initiated actions) and disposition and pruning (system-initiated actions). Operation-based
  queries return not only objects currently in the repository but also deleted, disposed, purged or pruned objects. If
  versioning is enabled, both current and old versions of objects can be returned. The response is retrieved directly
  from the HCP metadata database and internal logs; thus, no indexing is required to support this type of query.
  Operation-based queries enable HCP integration with backup servers, search engines (such as HDDS), policy
  engines and other applications.
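
To make the programmatic interface concrete, here is a hedged Python sketch of an operation-based query. The /query endpoint and the JSON request shape are illustrative assumptions; the exact schema is defined by the HCP metadata query API documentation.

import requests

resp = requests.post(
    "https://acme.hcp.example.com/query",  # tenant-level endpoint (assumed)
    headers={
        "Authorization": "HCP dXNlcg==:5f4dcc3b5aa765d61d8327deb882cf99",  # assumed
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    # Ask for create and delete records within a time window (field names assumed).
    json={
        "operation": {
            "systemMetadata": {
                "changeTime": {"start": 1388534400000, "end": 1420070400000},
                "transactions": ["create", "delete"],
            }
        }
    },
)
print(resp.json())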

System Fundamentals
Hardware Overview
An individual physical Hitachi Content Platform instance, or HCP system, is not a single device; it is a collection of
devices that, combined with HCP software, can provide all the features of an online object repository while tolerating
node, disk and other component failures.

From a hardware perspective, each HCP system consists of the following categories of components:
- Nodes (servers).
- Internal or SAN-attached storage.
- Networking components (switches and cabling).
- Infrastructure components (racks and power distribution units).

System nodes are the vital part of HCP. They store and manage the objects that reside in the physical system stor-
age. The nodes are conventional off-the-shelf servers. Each node can have multiple internal physical drives and/or
connect to external Fibre Channel storage (SAN). In addition to using RAID and SAN technologies and a host of other
features to protect the data, HCP uses software mirroring to store the data and metadata for each object in multiple
locations on different nodes. For data, this feature is managed by the namespace data protection level (DPL) set-
ting, which specifies the number of copies of each object HCP must maintain in the repository to ensure the required
level of data protection. For metadata, this feature is managed by the metadata protection level (MDPL), which is a
system-wide setting.

An HCP system uses private back-end and public front-end networks. The isolated back-end network is used
for vital internode communication and coordination. It uses a bonded Ethernet interface in each node, 2 Ethernet
switches, and 2 sets of cables connecting the nodes to the switches, thereby making it fully redundant. The front-end
network is used for customer interaction with the system and also uses a bonded Ethernet interface in each node.
The recommended setup includes 2 independent switches that connect these ports to the front-end (corporate)
network.

HCP runs on a redundant array of independent nodes (RAIN) or a SAN-attached array of independent nodes
(SAIN). RAIN systems use the internal storage in each node. SAIN systems use the external SAN storage. HCP is
offered as 2 products: HCP 300 (based on RAIN configuration) and HCP 500 (based on SAIN configuration).

Hitachi Content Platform RAIN (HCP 300)


The nodes in an HCP 300 system are Hitachi Compute Rack 210H (CR 210H) servers. RAIN nodes contain internal
storage: RAID controller and disks. All nodes use hardware RAID-5 data protection. In an HCP RAIN system, the
physical disks in each node form a single RAID group, normally RAID-5 (5D+1P) (see Figure 5). This configuration
helps ensure the integrity of the data stored on each node.

Figure 5. HCP 300 Hardware Architecture



An HCP 300 (RAIN) system must have a minimum of 4 nodes. Additional nodes are added in 4-node increments.
An HCP 300 system can have a maximum of 20 nodes.

HCP 300 systems are normally configured with a DPL setting of 2 (DPL2), which, coupled with hardware RAID-5,
yields an effective RAID-5+1 total protection level.

Hitachi Content Platform SAIN (HCP 500/500XL)


The nodes in an HCP 500 system are either Hitachi Compute Rack 210H (CR 210H) or Hitachi Compute Rack 220S
(CR 220S) servers. The HCP 500 nodes contain Fibre Channel host bus adapters (HBAs) and use external Fibre
Channel SAN storage; they are diskless servers that boot from the SAN-attached storage. HCP 500 may use Fibre
Channel switches or have nodes directly connected to external storage. The HCP 500 system using direct connect
is shown in Figure 6.

The nodes in a SAIN system can have internal storage in addition to being connected to external storage. These
nodes are called HCP 500XL nodes. They are an alternative to the standard HCP 500 nodes and have the same
hardware configuration, except the addition of the RAID controller and internal hard disk drives. A typical 500XL node
internal storage configuration includes six 500GB 7200RPM SATA II drives in a single RAID-5 (5D+1P) RAID group,
with 2 LUNs: 31GB (operating system) and 2.24TB (database).

In HCP 500XL nodes the system metadata database resides on the local disks, which leads to more efficient and
faster database operations. As a result, the system can better support larger capacities and higher object counts
per node and address higher performance requirements. The HCP 500XL nodes are usually considered when
the system configuration exceeds 4 standard nodes.

Figure 6. HCP 500 Hardware Architecture (Direct Connect)



Typically, the external SAN-attached storage uses RAID-6. Best protection and high availability of an HCP 500 system
are achieved by giving each node its own RAID group or Hitachi Dynamic Provisioning (HDP) pool containing one RAID
group. SAIN systems support multiple storage arrays in a single system or even for a single node.

HCP 500 and 500XL systems are supported with a minimum of 4 nodes. With a SAIN system, additional nodes
are added in pairs, so the system always has an even number of nodes. A SAIN system can have a maximum of
80 nodes. Both RAIN and SAIN systems can have a DPL as high as 4, which affords maximum data availability but
greatly sacrifices storage utilization.

SAIN systems introduce a number of SAN-specific features that help maintain the organization's data availability.
They include multipathing, cross mapping and zero-copy failover.

In a SAN environment, multiple physical paths may be configured between an HCP node and any given LUN that
maps to it. Multipathing facilitates uninterrupted read and write access to the system, protecting it against storage
array controller, Fibre Channel switch, fiber optic cable and HBA port failures.

The process of one node automatically taking over management of storage previously managed by another, failed
node is called zero-copy failover. To support zero-copy failover, each LUN that stores object data or MQE index
must map to 2 different nodes. The pair of nodes forms a set such that the LUNs that map to one of the nodes also
map to the other. This is called cross-mapping. In a cross-mapped pair of nodes, the LUNs on a node that are
managed by this node during normal operation are called primary LUNs; the LUNs from the other node that will be
managed by this node after failover are called standby LUNs. Cross-mapping of LUNs from one node to another
node in the system allows instantaneous access to data from failed nodes.

Software Overview
Hitachi Content Platform system software consists of an operating system and core software. The Linux-based HCP
operating system is called the appliance operating system.

The core software includes components that:


- Enable access to the object repository through the industry-standard HTTP or HTTPS, WebDAV, CIFS, NFS, SMTP
  and NDMP protocols.
- Ingest fixed-content data, convert it into HCP objects, and manage the object's data and metadata over time.
- Maintain the integrity, stability, availability and security of stored data by enforcing repository policies and executing
  system services.
- Enable configuration, monitoring and management of the HCP system through a human-readable interface.
- Support searching the repository through an interactive Web interface (the search console) and a programmatic
  interface (the metadata query API).

System Organization
HCP is a fully symmetric, distributed application that stores and manages objects (see Figure 7). An HCP
object encapsulates the raw fixed-content data that is written by a client application, and its associated system and
custom metadata. Each node in an HCP system is a Linux-based server that runs a complete HCP instance. The
HCP system can withstand multiple simultaneous node failures, and acts automatically to ensure that all object and
namespace policies are valid.

External system communication is managed by the DNS manager, a distributed network component that balances
client requests across all nodes to ensure maximum system throughput and availability. The DNS manager works
in conjunction with a corporate DNS server to allow clients to access the system as a single entity, even though the
system is made up of multiple independent nodes. The HCP system is configured as a subdomain of an existing cor-
porate domain. Clients access the system using predefined protocol-specific or namespace-specific names.

Figure 7. The High-Level Structure of an HCP System

While not required, using DNS is important in ensuring balanced and problem-free client access to an HCP system,
especially for the REST HTTP clients.

Each node in the HCP system runs a complete software stack made up of the appliance operating system and the
HCP core software. All nodes have an identical software image to ensure maximum reliability and fully symmetrical
operation of the system. An HCP system node can serve as both an object repository and an access point for client
applications and is capable of taking over the functions of other nodes in the event of node failure.

All intranode and internode communication is based on scalable performance-oriented cluster communication
(SPOCC). This efficient, reliable and easily expandable message-based middleware runs over TCP/IP. It functions
as a unified message bus for distributed applications, forming the backbone of the back-end network where all
node interaction occurs. SPOCC supports multicast and point-to-point connections and is designed to deal grace-
fully with network and hardware failures.

An HCP system is inherently a distributed system. Many of its core components, including the database, have a
distributed nature. To process incoming client requests, software components on a particular node need to interact
with the components on other nodes across the system by means of the SPOCC-powered system backbone. All
runtime operations are distributed among the system nodes. Each node bears equal responsibilities for process-
ing requests, storing data and sustaining the overall health of the system. No single node becomes a bottleneck: All
nodes are equally capable of handling any client request, ensuring reliability and performance.

Because HCP uses a distributed processing scheme, the system can scale linearly as the repository grows in size
and in the number of clients accessing it. When a new node is added to the HCP system, the system automatically
integrates that node into the overall workflow without manual intervention.

Namespaces and Tenants


Main Concepts
A Hitachi Content Platform repository is partitioned into namespaces. A namespace is a logical repository as viewed
by an application. Each namespace consists of a distinct logical grouping of objects with its own directory struc-
ture, such that the objects in one namespace are not visible in any other namespace. Access to one namespace
does not grant a user access to any other namespace. To the user of a namespace, the namespace is the repository.
Namespaces are not associated with any preallocated storage; they share the same underlying physical storage.

Namespaces provide a mechanism for separating the data stored for different applications, business units or custom-
ers. For example, there may be one namespace for accounts receivable and another for accounts payable. While a
single namespace can host one or more applications, it typically hosts only one application. Namespaces also enable
operations to work against selected subsets of repository objects. For example, a search could target the accounts
receivable and accounts payable namespaces but not the employees namespace.

Namespaces are owned and managed by tenants. Tenants are administrative entities that provide segregation of
management, while namespaces offer segregation of data. A tenant typically represents an actual organization
such as a company or a department within a company that uses a portion of a repository. A tenant can also corre-
spond to an individual person. Namespace administration is done at the owning tenant level.

Clients can access HCP namespaces through HTTP or HTTPS, WebDAV, CIFS, NFS and SMTP protocols. These
protocols can support authenticated and/or anonymous types of access. HCP namespaces are owned by HCP ten-
ants. An HCP system can have multiple HCP tenants, each of which can own multiple namespaces. The number of
namespaces each HCP tenant can own can be limited by an administrator.

Figure 8 shows the logical structure of an HCP system with respect to its multitenancy features.

Figure 8. HCP System Logical Layout: Namespaces and Tenants

User and Group Accounts


User and group accounts control access to various Hitachi Content Platform interfaces and give users permission to
perform administrative tasks and access namespace content.

An HCP user account is defined in HCP; it has a set of credentials (username and password) that is stored locally
in the system. The HCP system uses these credentials to authenticate a user, performing local authentication.

An HCP group account is a representation of an Active Directory (AD) group. To create group accounts, HCP must
be configured to support Active Directory. A group account enables AD users in the AD group to access one or
more of the HCP interfaces.

Like HCP user accounts, HCP group accounts are defined separately at the system and tenant levels. Different ten-
ants have different user and group accounts. These accounts cannot be shared across tenants. Group membership
is different at the system and tenant levels.

HCP administrative roles can be associated with both system-level and tenant-level user and group accounts. Data
access permissions can be associated with only tenant-level user and group accounts. Consequently, system-level
local and AD users can only be administrative users, while tenant-level local and AD users can be administrative
users, have data access permissions, or both. Tenant-level users can have only administrative roles without
namespace data permissions, only namespace data permissions without administrative roles, or any combination
of administrative roles and namespace data permissions.

System and Tenant Management


The implementation of segregation of management in the Hitachi Content Platform system is illustrated in Figure 8.

An HCP system has both system-level and tenant-level administrators:


System-level administrative accounts are used for configuring system-wide features, monitoring system hard-
ware and software and overall repository usage, and managing system-level users. The system administrator user
interface, the system management console, provides the functionality needed by the maintainer of the physical
HCP system. For example, it allows the maintainer to shut down the system, see information about nodes, manage
policies and services, and create HCP tenants. System administrators have a view of the system as a whole,
including all HCP software and hardware that make up the system, and can perform all of the administration for
actions that have system scope.
Tenant-level administrative accounts are used for creating HCP namespaces and configuring individual tenants
and namespaces. They can monitor namespace usage at the tenant and namespace level, manage tenant-level
users, and control access to namespaces. The required functionality is provided by the tenant administrator user
interface, the tenant management console. This interface is intended for use by the maintainer of the virtual HCP
system (an individual tenant with a set of namespaces it owns). The tenant-level administration feature facilitates
segregation of management, which is essential in cloud environments.

An HCP tenant can optionally grant system-level users administrative access to itself. In this case, system-level users
with the monitor, administrator, security or compliance role can log into the tenant management console or use the
HCP management API for that tenant. System-level users with the monitor or administrator role can also access
the tenant management console directly from the system management console. This effectively enables a system
administrator to function as a tenant administrator, as shown in Figure 8. System-level users can perform all the
activities allowed by the tenant-level roles that correspond to their system-level roles. An AD user may belong to
AD groups for which corresponding HCP group accounts exist at both the system and tenant levels; such a user
holds the roles associated with both the applicable system-level group accounts and the applicable tenant-level
group accounts.
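
As a hedged sketch of what this segregated administration looks like programmatically: by convention, the HCP management API (MAPI) is an HTTPS interface on port 9090 of the system's administrative domain. The host name and token below are hypothetical, and the resource path should be verified against the HCP Management API reference for your release.

```python
import requests

# Hypothetical admin host; MAPI conventionally listens on port 9090.
MAPI_BASE = "https://admin.hcp.example.com:9090/mapi"

# Token built as in the earlier authentication sketch:
# base64("admin") + ":" + md5("secret").
TOKEN = "YWRtaW4=:5ebe2294ecd0e0f08eab7690d2a6ee69"
HEADERS = {"Authorization": "HCP " + TOKEN, "Accept": "application/xml"}

# A system-level account with an appropriate role can list tenants;
# a tenant-level account would address its own tenant's resources instead.
resp = requests.get(MAPI_BASE + "/tenants", headers=HEADERS, verify=False)
print(resp.status_code)
print(resp.text)
```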

Object Policies
Objects in a namespace have a variety of properties, such as their retention and index settings. These properties
are defined for each object by its system metadata. Objects are also affected by certain namespace properties,
such as the default metadata settings inherited by new objects stored in the namespace, or the versioning setting.
Both the namespace-level settings and the properties that are part of the object metadata serve as parameters for
the Hitachi Content Platform system's transactions and services, and they determine an object's behavior during
its life cycle within the repository. These settings are called policies.

An HCP policy is one or more settings that influence how transactions and internal processes (services) affect
objects in a namespace. Policies ensure that objects behave in expected ways.

The HCP policies are described in Table 1.



Table 1. HCP Policies

Policy Name | Policy Description and Components | Transactions and Services Influenced
DPL | System DPL setting, namespace DPL setting. | Object creation. Protection service.
Retention | Default retention setting, object retention setting, hold setting, system metadata and custom metadata options for objects under retention. | Object creation, object deletion, system and custom metadata handling. Disposition and garbage collection services.
Shredding | Default shred setting, object shred setting. | Object deletion. Shredding service.
Indexing | Default index setting, object index setting. | Metadata query engine.
Versioning | Versioning setting, pruning setting. | Object creation and deletion. Garbage collection service.
Custom Metadata Validation | XML syntax validation. | Add/replace custom metadata operations.

Each policy consists of one or more settings, which can differ in their scope of application and method of con-
figuration. Policy settings are defined at the object and namespace levels. While all policies affect objects, only the
object-level policy settings are included in an object's metadata; they affect individual objects. The namespace-level
policy settings affect all objects in the namespace and are part of the namespace configuration.
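
Because the object-level settings live in each object's metadata, they can be inspected per object. As a hedged sketch (hypothetical host, path and token; the X-HCP-* response header convention is assumed and should be verified for your release), a HEAD request returns an object's system metadata, including its policy settings, without transferring the object data:

```python
import requests

URL = "https://finance.acme.hcp.example.com/rest/contracts/msa-2014.pdf"
TOKEN = "bGdyZWVu:0d107d09f5bbe40cade3de5c71e9e9b7"  # base64(user):md5(pass)

# HEAD returns the object's system metadata as response headers only.
resp = requests.head(URL, headers={"Authorization": "HCP " + TOKEN},
                     verify=False)

# Print the HCP-specific metadata headers (retention, shred setting, etc.).
for name, value in resp.headers.items():
    if name.startswith("X-HCP-"):
        print(f"{name}: {value}")
```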

Table 2 lists all policy settings sorted according to their scope and method of configuration.

Table 2. Hitachi Content Platform Policy Settings: Scope and Configuration
(These settings apply to Hitachi Content Platform namespaces.)

Policy | Policy Setting | Scope/Level | Configured Via
Data Protection Level | System DPL: 1-4 | System | System UI
Data Protection Level | Namespace DPL: 1-4, dynamic | Namespace | Tenant UI, MAPI
Retention | Default retention setting: fixed date, offset, special value, retention class | Namespace | Tenant UI, MAPI
Retention | Retention setting: fixed date, offset, special value, retention class | Object | REST API, retention.txt
Retention | Hold setting: true or false | Object | REST API
Retention | Ownership and POSIX permission changes under retention: true or false | Namespace | Tenant UI, MAPI
Retention | Custom metadata operations allowed under retention | Namespace | Tenant UI, MAPI
Indexing | Index setting: true or false (1/0) | Object | REST API, index.txt
Indexing | Default index setting: true or false | Namespace | Tenant UI, MAPI
Shredding | Shred setting: true or false (1/0) | Object | REST API, shred.txt
Shredding | Default shred setting: true or false | Namespace | Tenant UI, MAPI
Custom Metadata Validation | XML validation: true or false | Namespace | Tenant UI, MAPI
Versioning | Versioning setting: true or false | Namespace | Tenant UI, MAPI
Versioning | Pruning setting: true/false and number of days for primary or replica | Namespace | Tenant UI, MAPI
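
To make the object-level rows of Table 2 concrete, here is a hedged sketch of an ingest over the REST API that sets the retention, index and shred settings for a new object in one request. The host, path, token and exact parameter names are illustrative assumptions; consult the HCP REST API reference for the authoritative syntax.

```python
import requests

# Hypothetical namespace endpoint: namespace "finance" under tenant
# "acme"; the /rest prefix addresses namespace content over HTTP.
URL = "https://finance.acme.hcp.example.com/rest/contracts/msa-2014.pdf"

TOKEN = "bGdyZWVu:0d107d09f5bbe40cade3de5c71e9e9b7"  # base64(user):md5(pass)
HEADERS = {"Authorization": "HCP " + TOKEN}

# Object-level policy settings passed as query parameters at ingest
# (assumed names, mirroring Table 2): a fixed retention expiration date,
# indexing enabled for the metadata query engine, shredding on deletion.
PARAMS = {
    "retention": "2021-12-31T23:59:59-0500",
    "index": "true",
    "shred": "true",
}

with open("msa-2014.pdf", "rb") as f:
    resp = requests.put(URL, params=PARAMS, data=f, headers=HEADERS,
                        verify=False)

print(resp.status_code)  # expect 201 Created on a successful store
```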

Content Management Services


A Hitachi Content Platform service is a background process that performs a specific function that is targeted at
preserving and improving the overall health of the HCP system. In particular, services are responsible for optimizing
the use of system resources and maintaining the integrity and availability of the data stored in the HCP repository.

Services are configured during HCP installation and generally run without user intervention. They can be enabled or
disabled, and started or stopped, at the system level via the system management console by a user with the service
role. Services run continuously, periodically (on a specific schedule), in response to certain events, or manually.
Each service runs independently of the other services. Multiple services can execute at the same time, although
some services take precedence over others.

Services work by iterating over the objects in the repository in the background, detecting and repairing conditions
that do not conform to their requirements. They operate on the repository as a whole, across all namespaces, with
the exception of disposition and replication, which can be enabled or disabled at the namespace level.
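
The following toy sketch (not HCP code; all names are invented for illustration) shows the shape of that detect-and-repair loop, using the protection service's copy-count check as the example condition.

```python
from dataclasses import dataclass, field

# Purely illustrative model of the detect-and-repair pattern that HCP
# services follow; none of these names exist in HCP itself.
@dataclass
class ObjectRecord:
    key: str
    copies: int
    required_copies: int  # driven by the object's DPL policy

@dataclass
class ProtectionService:
    repaired: list = field(default_factory=list)

    def run(self, repository):
        # Iterate over objects in the background; any object whose
        # state violates its policy is repaired.
        for obj in repository:
            if obj.copies < obj.required_copies:
                self.repair(obj)

    def repair(self, obj: ObjectRecord):
        obj.copies = obj.required_copies  # re-create missing copies
        self.repaired.append(obj.key)

repo = [ObjectRecord("a/doc1", 1, 2), ObjectRecord("a/doc2", 2, 2)]
svc = ProtectionService()
svc.run(repo)
print(svc.repaired)  # ['a/doc1']
```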

HCP implements 12 services: protection, content verification, scavenging, garbage collection, duplicate elimination,
shredding, disposition, compression, capacity balancing, storage tiering, migration and replication.

The HCP services are briefly described in Table 3.



Table 3. Hitachi Content Platform Services

Service | Description
Protection | Enforces DPL policy compliance by ensuring that the proper number of copies of each object exists in the system, and that damaged or lost objects can be recovered. Any policy violation invokes a repair process. Offers both scheduled and event-driven service: events trigger a full service run, even if the service is disabled, after a configurable amount of time (90 minutes after node shutdown; 1 minute after logical volume failure; 10 minutes after node removal).
Content Verification | Guarantees the data integrity of repository objects by ensuring that the content of a file matches its digital signature, and repairs the object if the hash does not match. Detects and repairs discrepancies between primary and secondary metadata. The SHA-256 hash algorithm is used by default. Checksums are computed on external and internal files. Computationally intensive and time-consuming; runs according to the active service schedule.
Scavenging | Ensures that all objects in the repository have valid metadata, and reconstructs metadata when it is lost or corrupted but the data files exist. The service verifies that both the primary metadata for each data object and the copies of the metadata stored with the object data (secondary metadata) are complete, valid and in sync with each other. Computationally intensive and time-consuming. Scheduled service.
Garbage Collection | Reclaims storage space by purging hidden data and metadata for objects marked for deletion or left behind by incomplete transactions. It also deletes old versions of objects that are eligible for pruning. When applicable, the deletion triggers the shredding service. Scheduled service, not event driven.
Duplicate Elimination | Identifies and eliminates redundant objects in the repository, merging duplicate data to free space. The hash signature of external file representations is used to select objects as input to the service; these objects are then compared byte for byte to ensure that their data contents are indeed identical. Scheduled service.
Shredding | For security reasons, overwrites the storage locations where copies of a deleted object were stored in such a way that none of its data or metadata can be reconstructed. Also called secure deletion. The default HCP shredding algorithm uses 3 passes to overwrite an object and is compliant with the DoD 5220.22-M standard. The algorithm is selected at install time. Event-driven only, not scheduled: it is triggered by the deletion of an object marked for shredding.
Disposition | Automatically cleans up expired objects. All HCP namespaces can be configured to automatically delete objects after their retention periods expire. Can be enabled or disabled at both the system and namespace levels; enabling disposition for a namespace has no effect if the service is disabled at the system level. The disposition service deletes only the current versions of versioned objects. Scheduled service.
Compression | Compresses object data to make more efficient use of system storage space. The space reclaimed by compression can be used for additional storage. A number of configurable parameters are provided via the system management console. Scheduled service.
Capacity Balancing | Attempts to keep the usable storage capacity balanced (roughly equivalent) across all storage nodes in the system. If storage utilization differs among the nodes by a wide margin, the service moves objects around to bring the nodes closer to a balanced state. Runs only when started manually; additions and deletions of objects do not trigger the service. Typically, an authorized HCP service provider starts this service after adding new storage nodes to the system. In addition, while not part of the service, new objects tend to spread naturally among all nodes in fairly even proportion during normal operation, due to the nature of the storage manager selection algorithm and the resource monitoring of the administrative engine.
Storage Tiering | Determines which storage tiering strategy applies to an object, evaluates where the copies of the object should reside based on the rules in the applied service plan, and moves objects between running and spin-down storage as needed. Active only in spin-down-capable HCP SAIN systems. Scheduled service.
Migration | Migrates data off selected nodes in an HCP RAIN system, or selected storage arrays in an HCP SAIN system, so that these devices can be retired. Can only be run manually.
Replication | Copies one or more tenants from one HCP system to another to ensure data availability and enable disaster recovery. Ongoing service: once set up, it runs continually in the background. Users can configure, monitor and control the activity of this service. Replication is an optional feature.

Conclusion
Hitachi Data Systems object storage solutions avoid the limitations of traditional storage systems by intelligently stor-
ing content in far larger quantities and in a much more efficient manner. These solutions meet the new demands
imposed by the explosion of unstructured data and by its growing importance to organizations, their partners, their
customers, their governments and their shareholders.

Hitachi Content Platform, a Hitachi Data Systems object storage platform, treats file data, file metadata and custom
metadata as a single object that is tracked and stored across a variety of storage tiers. With secure multitenancy,
the HCP object repository can be divided into a number of smaller virtual object stores, each with configurable
attributes that support a different service level. This allows the object store to support a wide range of workloads,
such as content preservation, data protection, content distribution and even cloud, from a single physical
infrastructure.

HCP is also part of a larger portfolio of solutions that include Hitachi Data Ingestor for elastic, backup-free file services
and Hitachi Content Platform Anywhere for synchronization and sharing of files and folders across a wide range of
user devices. One infrastructure is far easier to manage than disparate silos of technology for each application or
set of users. By integrating many key technologies in a single storage platform, Hitachi Data Systems object storage
solutions provide a path to short-term return on investment and significant long-term efficiency improvements. They
help IT evolve to meet new challenges, stay agile over the long term, and address future change and growth.
Corporate Headquarters
2845 Lafayette Street, Santa Clara, CA 95050-2639 USA
www.HDS.com | community.HDS.com

Regional Contact Information
Americas: +1 408 970 1000 or info@hds.com
Europe, Middle East and Africa: +44 (0) 1753 618000 or info.emea@hds.com
Asia Pacific: +852 3189 7900 or hds.marketing.apac@hds.com

© Hitachi Data Systems Corporation 2014. All rights reserved. HITACHI is a trademark or registered trademark of Hitachi, Ltd. Microsoft, Windows, Azure and Active Directory are
trademarks or registered trademarks of Microsoft Corporation. All other trademarks, service marks, and company names are properties of their respective owners.
WP-425-D DG November 2014
