
DISTRIBUTED STORAGE

SYSTEMS

A Seminar Report
done in partial fulfilment of
the requirements for the award of the degree of

Bachelor of Technology
in
Computer Science and Engineering

by

KANISHK TEWARI
B070351CS

Department of Computer Science & Engineering


National Institute of Technology Calicut
Kerala - 673601
Monsoon 2010
National Institute of Technology Calicut
Department of Computer Science & Engineering

Certified that this Seminar Report entitled

DISTRIBUTED STORAGE
SYSTEMS
is a bonafide record of the Seminar presented by

KANISHK TEWARI
B070351CS

in partial fulfilment of
the requirements for the award of the degree of
Bachelor of Technology
in
Computer Science & Engineering

SREENU NAIK BHUKYA


(Seminar Co-ordinator)
Assistant Professor
Dept. of Computer Science & Engineering
Acknowledgment

I would like to thank Shri. Sreenu Naik Bhukya, Assistant Professor, and his associates at the Department of Computer Science and Engineering, National Institute of Technology Calicut, and LaTeX, for providing me the opportunity to present this work. I would also like to thank God and my parents.

Contents

1 Introduction
  1.1 DISKOS (Distributed hash table-enabled virtual dISK based on Operating System)
  1.2 Data Replication
  1.3 Data Access in Distributed Computing
  1.4 Web 2.0 and Data Storage

2 DISKOS System Overview
  2.1 System Overview

3 Replication Algorithms

4 Data Access Configuration

5 Web 2.0 and DDS API

6 Conclusion
4
List of Figures

1 DISKOS Overview [4]
2 The Classic Web 2.0 Model [2]
3 Storage (Information Upload) [2]
4 Access (Data Linkage) [2]
5 Data Presentation [2]

Abstract

Over the past years, anywhere, anytime access to large and reliable storage has become increasingly important in both enterprise and home computing environments. In addition, web applications store vast quantities of information online, and the storage, protection and maintenance of this information becomes the sole responsibility of the web application provider. This is where distributed storage systems come in, as storage has evolved from single-host infrastructures to distributed environments, and the need to separate computing from storage has become increasingly evident. In this report we discuss various aspects of distributed storage systems: data replication schemes, the integration of data into the Web 2.0 model, and data access methods.

1 Introduction
1.1 DISKOS (Distributed hash table-enabled virtual dISK based on
Operating System)

Over the past years, anywhere, anytime access to large and reliable storage has become increasingly important in both enterprise and home computing environments. Enterprise data in particular is one of the most valuable assets of any organization and must be readily and rapidly accessible to users and customers according to specific access policies. Key requirements include high performance in data storage and retrieval, data redundancy, resilience to disk failures and dynamic insertion of new storage space. Similar requirements exist for home storage, which, however, is generally confined to single-access use. Distributing storage across several hosts is a suitable approach to meeting these requirements. Accordingly, the distributed storage research community devotes significant effort to the challenge of designing such systems. Notable innovations include remote and distributed file systems, as well as generic wide-area file storage and sharing systems built on top of structured peer-to-peer networks. Despite this considerable variety of existing systems, vast room for improvement remains, which can be unlocked by combining the benefits of a distributed hash table with a low-level operating system storage interface. To this end, we present the design, implementation and experimental evaluation of DISKOS [4], a Distributed hash table-enabled virtual dISK based on Operating System principles and mechanisms. It is a distributed storage infrastructure suitable for both cluster and wide-area storage environment deployments.

1.2 Data Replication

Undesired delays in accessing information have triggered much research on improving web performance through caching (proxy servers) and replication (mirror servers). Here we address the problem of replicating data objects according to their size, read and write frequencies, and site capacities, in order to minimize network traffic, which in turn reduces the average response time. A more comprehensive study of replication would also include consistency and fault-tolerance issues. This work focuses on clarifying the data replication problem. We provide a detailed formulation in the context of both read and write queries. The resulting Data Replication Problem (DRP) [1] is provably NP-complete. A greedy heuristic yields good solutions when the number of reads is large; however, it falls short when the number of writes is large and the storage capacity is limited. To overcome these limitations, an efficient genetic-algorithm-based heuristic provides good quality solutions.

1.3 Data Access in Distributed Computing

The increasing demand for massive data processing has prompted the evolution from single-host to distributed environments. Distributed computing has become the most important method for massive data processing, and many platforms have emerged. At the same time, distributed storage systems have been used to supply massive distributed data. These storage systems vary greatly in design and implementation, and it is desirable to build computing platforms that can adapt to a variety of them; data access therefore becomes the key problem. Some research has addressed this. For example, SRB (Storage Resource Broker) [3] supplies a rich set of data access methods and protocols for distributed storage; it focuses on supplying data from the resource side and can be seen as middleware between computing and storage. Other research targets the access of specific data types, such as the HENP project [3] at Berkeley on physics data processing. Here, access to distributed data is designed as an interface invoked from the computing side. The aim is to improve support for various storage systems as well as the reliability and validity of data processing.

1.4 Web 2.0 and Data Storage

The current breed of web applications, known as Web 2.0, relies on monolithic storage models that place the sole burden of data management on the web application provider. Having the application provider manage this data has led to problems with data ownership, data freshness and data duplication, as has been covered in previous work. As the majority of data in Web 2.0 is user-generated, it is suggested that the responsibility for storing user-generated content be given to the data owners themselves. Accordingly, a distributed storage model is presented that allows web applications to offload the storage and management of user-generated content to storage systems managed by the data owner. The Distributed Data Service (DDS) API [2] allows web applications to seamlessly present user-generated content to a third-party user. The third party interacts directly with both the web application and the numerous DDS systems that host the web application's content. Data owners can manage their data elements by interacting with a DDS directly while also exposing this data to web applications via a publish/subscribe model.

2 DISKOS System Overview
The distributed storage systems designed so far are tailored to specific storage environments and are mostly implemented at the file system level of the operating system's I/O stack. Moreover, distributed file systems are optimized for specific application characteristics, but disk access patterns change over time as applications evolve, which ensures a continuous need for new file systems. DISKOS, on the other hand, is able to achieve high performance in storage environments consisting of more than two storage hosts.
Overall, most distributed storage systems assume the use of expensive storage and network equipment, which in some cases is entirely proprietary. Other systems assume that system reliability is maintained through some type of replication mechanism implemented by external software or hardware. Cluster file systems, moreover, require powerful, dedicated metadata servers for conducting all file metadata operations, resulting in a hierarchical structure that scales well only over high-speed, low-latency, symmetric network links. DISKOS, in contrast, is a completely decentralized storage system that can be deployed even in poor and heterogeneous environments in terms of CPU, storage and network resources. DISKOS decouples file-system-related semantics, such as the storage and management of directory and file metadata, concurrent file access policies and file locking mechanisms, from the actual distribution of data, and provides a low-level operating system (OS) storage interface that can be directly integrated with existing or new OS-compliant file systems. As a result, DISKOS becomes a file-system-neutral storage infrastructure whose main component is implemented as a regular disk driver embedded in the operating system kernel. File system neutrality is achieved because the disk driver is agnostic to the content of I/O requests.
To achieve scalability, supporting thousands of users in both wide-area networks and cooperative storage environments, and to enable self-management, complete decentralization, data replication, resilience to failures and transparency with respect to the underlying hardware and the interfaces exported to the upper layers, across a multitude of storage environments, DISKOS exploits the basic principles of Distributed Hash Tables (DHTs) [4]. Here, the DHT acts as the storage provider for the block requests that the disk driver has to satisfy.

Figure 1: DISKOS Overview [4]

2.1 System Overview

In this section we briefly discuss the components of the proposed system and their interaction with each other and with other parts of the operating system's I/O subsystem. An application running in user space cannot directly access any kernel component except through the system calls exported by the kernel. For instance, when an application has to open a file in a local file system for reading, it simply issues an open() system call with the OS-specific arguments.
One of the main components of the proposed system, the Virtual Disk Driver (VDD), is built into the kernel and therefore supports direct integration with every file system that implements the interfaces exported by the kernel. The VDD is directly connected to the Virtual Device Controller (VDC); this connection is accomplished using an IPC mechanism supported by the operating system. The VDD receives I/O requests for sectors from the upper levels of the I/O system and asynchronously forwards them to the VDC. The VDC provides the VDD with higher-level I/O services, such as request buffering and sector prefetching. It is also the component that, on behalf of the VDD, accesses the DHT, which is implemented in the Storage Provider (SP). The SP implements a DHT peer, which asynchronously receives and satisfies requests from the VDC. The VDC and the SP are user-level processes.
The reason for choosing a three-tier architecture is twofold. First, the VDD is a kernel component and must therefore be lightweight in terms of memory and CPU usage. Second, separating the VDC from the SP allows a host that needs access to a distributed storage space to connect to a remote SP rather than run one itself. With this scheme, the proposed system can also be used on devices with limited CPU and storage resources, such as mobile phones and PDAs.
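The three-tier request path described above can be sketched as follows; the class and method names are hypothetical, and a simple in-process queue stands in for the OS IPC mechanism between the kernel VDD and the user-level VDC.

```python
from queue import Queue

class StorageProvider:
    """Stands in for the SP: a DHT peer satisfying sector requests."""
    def __init__(self):
        self.dht = {}

    def handle(self, op, sector, data=None):
        if op == "write":
            self.dht[sector] = data
            return None
        return self.dht.get(sector, b"\x00" * 512)

class VirtualDeviceController:
    """Stands in for the VDC: buffers requests and forwards them to an
    SP, which may run on the same host or a remote one."""
    def __init__(self, sp):
        self.sp = sp
        self.pending = Queue()  # request buffering

    def submit(self, op, sector, data=None):
        self.pending.put((op, sector, data))

    def drain(self):
        # Asynchronously satisfy everything buffered so far.
        results = []
        while not self.pending.empty():
            results.append(self.sp.handle(*self.pending.get()))
        return results

class VirtualDiskDriver:
    """Stands in for the kernel VDD: forwards sector I/O to the VDC."""
    def __init__(self, vdc):
        self.vdc = vdc

    def write_sector(self, sector, data):
        self.vdc.submit("write", sector, data)

    def read_sector(self, sector):
        self.vdc.submit("read", sector)
```

A host without local storage resources would simply construct its `VirtualDeviceController` around a connection to a remote SP, which is the second benefit of the split.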

3 Replication Algorithms
Our replication policy assumes the existence of one primary copy of each object in the network. Let SP_k be the site holding the primary copy of object O_k, i.e., the only copy in the network that cannot be deallocated, hence referred to as the primary site of the k-th object. Each primary site SP_k contains information about the whole replication scheme R_k of O_k. This can be done by maintaining a list of the sites where the k-th object is replicated, called from now on the replicators of O_k. Moreover, every site S(i) stores a two-field record for each object: the first field is the object's primary site SP_k, and the second is the site SN_k^(i) nearest to S(i) that holds a replica of O_k. In other words, SN_k^(i) is the site for which reads from S(i) for O_k, if served there, would incur the minimum possible communication cost. It is possible that SN_k^(i) = S(i), if S(i) is a replicator or the primary site of O_k. Another possibility is that SN_k^(i) = SP_k, if the primary site is the closest site holding a replica of O_k.
When a site S(i) reads an object, it addresses the request to the corresponding SN_k^(i). For updates we assume that every site can update every object. Updates of an object O_k are performed by sending the updated version to its primary site SP_k, which then broadcasts it to every site in its replication scheme R_k. The simplicity of this policy allows us to develop a general cost model that can be used, with minor changes, to formalize various replication and consistency strategies.
We are primarily interested in minimizing the total network transfer cost (NTC) due to object movement, since the communication cost of control messages has minor impact on the overall performance of the system. Two components make up the NTC. The first is the NTC created by read requests.
Let R_k^(i) denote the total NTC due to S(i)'s read requests for object O_k, addressed to the nearest site SN_k^(i):

    R_k^(i) = r_k^(i) · o_k · C(i, SN_k^(i))    (1)

where
    C(i, j) = communication cost (per unit) between sites i and j
    r_k^(i) = number of reads from site i for object k
    o_k     = size of object k

The second component of NTC is the cost arising from writes. Let W_k^(i) be the total NTC due to S(i)'s write requests for object O_k, addressed to the primary site SP_k:

    W_k^(i) = w_k^(i) · o_k · ( C(i, SP_k) + Σ_{j ∈ R_k, j ≠ i} C(SP_k, j) )    (2)

where
    w_k^(i) = number of writes from site i for object k

Let X_ik = 1 if S(i) holds a replica of object O_k, and 0 otherwise. The X_ik thus define an M×N replication matrix X with boolean elements. Sites that are not replicators of an object create NTC equal to the communication cost of their reads from the nearest replicator, plus that of sending their writes to the primary site of O_k. Sites belonging to the replication scheme of O_k are associated with the cost of sending and receiving all updated versions of it. Summing these costs over all sites and objects gives the total NTC, denoted D. Using the above formulation, the Data Replication Problem (DRP) can be defined as:
Find the assignment of 0/1 values in the matrix X that minimizes D.
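As a sketch, the objective D can be computed directly from Eqs. (1) and (2); all argument names here (cost matrix C, per-site read/write counts, object sizes, primary sites, replication matrix X) are an illustrative representation, not the paper's actual interface.

```python
def total_ntc(C, reads, writes, sizes, primary, X):
    """Total network transfer cost D for a given replication matrix X.

    C[i][j]       : per-unit communication cost between sites i and j
    reads[i][k]   : number of reads r_k^(i) of site i for object k
    writes[i][k]  : number of writes w_k^(i) of site i for object k
    sizes[k]      : size o_k of object k
    primary[k]    : primary site SP_k of object k
    X[i][k]       : 1 if site i holds a replica of object k, else 0
    """
    M, N = len(C), len(sizes)
    D = 0.0
    for k in range(N):
        SP = primary[k]
        replicators = [j for j in range(M) if X[j][k]]  # R_k (includes SP_k)
        for i in range(M):
            # Eq. (1): reads are served by the nearest replica SN_k^(i).
            nearest = min(replicators, key=lambda j: C[i][j])
            D += reads[i][k] * sizes[k] * C[i][nearest]
            # Eq. (2): writes go to SP_k, which broadcasts to R_k \ {i}.
            bcast = sum(C[SP][j] for j in replicators if j != i)
            D += writes[i][k] * sizes[k] * (C[i][SP] + bcast)
    return D
```

Searching over all 0/1 assignments of X is what makes the DRP NP-complete; the heuristics below only evaluate this cost for promising candidate placements.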

A greedy method tries to create replicas of objects by introducing a new term, B_k^(i), called the replication benefit of object O_k at site S(i). The larger the replication benefit value, the better the chance that a replica of the object is created at that site. We define the replication benefit value B_k^(i) as:

    B_k^(i) = ( R_k^(i) − ( Σ_{x=1}^{M} w_k^(x) · o_k · C(i, SP_k) − W_k^(i) ) ) / o_k    (3)
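Eq. (3) can be transcribed almost literally; the input representation below is a hypothetical choice (the same one used to illustrate the cost model), and ties in nearest-replica selection are broken arbitrarily.

```python
def replication_benefit(i, k, C, reads, writes, sizes, primary, X):
    """Eq. (3): benefit of placing a replica of object k at site i.

    A positive benefit means the read traffic saved outweighs the extra
    update traffic the new replica would attract from all M sites.
    """
    M = len(C)
    SP = primary[k]
    replicators = [j for j in range(M) if X[j][k]]
    nearest = min(replicators, key=lambda j: C[i][j])
    # R_k^(i): site i's current read cost (Eq. 1).
    R = reads[i][k] * sizes[k] * C[i][nearest]
    # W_k^(i): site i's current write cost (Eq. 2).
    bcast = sum(C[SP][j] for j in replicators if j != i)
    W = writes[i][k] * sizes[k] * (C[i][SP] + bcast)
    # Update traffic that every site's writes would add toward site i.
    total_w = sum(writes[x][k] for x in range(M))
    return (R - (total_w * sizes[k] * C[i][SP] - W)) / sizes[k]
```

The greedy heuristic repeatedly places the replica with the largest positive benefit until no site has capacity or no benefit remains positive, which is why it degrades when writes dominate.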

4 Data Access Configuration
Efficient data access across different storage systems was the first consideration in implementing the computing platform in this work. The method proposed here abstracts away the differences among the underlying storage systems and supplies the computing platform with universal data access, achieving higher flexibility and efficiency.
The data I/O model [3] hides the differences among the underlying storage systems and mediates the interaction between the computing platform and the storage. This lets the computing platform focus only on data access through the I/O layer; data storage and management no longer impact the computing.
The details of the storage system are hidden behind the I/O layer. When different storage systems are used, a simple configuration (also called the driver) lets the computing call a universal API to perform read and write operations against the storage. This decreases the coupling between storage and computing: the computing can be configured with different storage back ends, and when constructing a distributed application the storage and the computing can be implemented separately, making applications much more flexible.
The distributed computing platform interacts with the distributed storage through this I/O layer, and the interaction between them deserves emphasis. It is supplied by the framework of the computing platform: the properties of the different storage systems are declared in configuration files, and the computing platform switches to the corresponding storage system.
The design and implementation of the configuration files is essential to universal data access. The configuration file for the different distributed file systems is DFS-Mapping.xml. It configures the I/O operations of the storage system and other related details.
For example,
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <name>dfs.name</name>
  <value></value>
  <description>The name of this Distributed File System, such as HDFS or GFS</description>

  <name>dfs.io.input</name>
  <value></value>
  <description>The DataInputStream of the Distributed File System</description>

  <name>dfs.io.output</name>
  <value></value>
  <description>The DataOutputStream of the Distributed File System</description>
  ...
</configuration>
The framework maintains the mapping between distributed file systems and data I/O. Users can supply this simple configuration to the computing platform to achieve universal data access over the low-level file systems. The differences are hidden, so the interaction between the computing and the storage becomes clearer and easier.
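A minimal sketch of this universal-access idea, assuming a DFS-Mapping-style configuration that names the active back end via dfs.name; the UniversalIO and InMemoryDriver classes are invented for illustration, not part of the framework described in [3].

```python
import xml.etree.ElementTree as ET

class UniversalIO:
    """Reads name/value pairs from a DFS-Mapping-style document and
    exposes a uniform read/write API over pluggable storage drivers."""

    def __init__(self, config_xml, drivers):
        root = ET.fromstring(config_xml)
        names = [e.text for e in root.iter("name")]
        values = [e.text or "" for e in root.iter("value")]
        self.props = dict(zip(names, values))
        # Switch to the driver named by dfs.name, e.g. "HDFS" or "GFS".
        self.driver = drivers[self.props["dfs.name"]]

    def read(self, path):
        return self.driver.read(path)

    def write(self, path, data):
        self.driver.write(path, data)

class InMemoryDriver:
    """Hypothetical stand-in for a real distributed file system driver."""
    def __init__(self):
        self.files = {}
    def read(self, path):
        return self.files[path]
    def write(self, path, data):
        self.files[path] = data
```

Swapping storage systems then amounts to editing the configuration file and registering a different driver, with no change to the computing code.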

5 Web 2.0 and DDS API

Figure 2: The Classic Web 2.0 Model [2]

The problems with the existing storage models in Web 2.0 can be viewed from two aspects: the web application and the end user. From the web application's viewpoint, monolithic storage structures pose many problems, while three key problems affect the end user: data ownership, data freshness and data duplication.
The Distributed Data Service (DDS) API addresses these problems by introducing additional technology that allows the storage of data to be offloaded from the web application. The model definition can be broken into three main areas: storage, access and presentation.
The Storage [2] service is responsible for storing the owner's data elements and is located either in-house or outsourced to a specialised data storage provider. When a Web 2.0 site requests content from the data owner, he or she provides a link into the DDS in which the data is stored. Later, when the data is needed for display as part of a page generated by the site, the displaying browser instance uses the embedded link to retrieve the data directly from the owner's DDS [2]. To aid integration of the DDS into the data owner's web experience, a new module called the DDS browser extension is introduced. Communication between the browser extension and the storage service is achieved by inserting a DDS-StorageService-AttachRequest into the HTTP response. This request is transparently inspected on the fly by the browser extension, which allows the extension to rewrite parts of the response HTML dynamically. For the end user, the extension rewrites links into a remote DDS with data returned from that specific DDS.

Figure 3: Storage (Information Upload) [2]

Phase two of the process involves the data owner publishing content to the storage service. Again, the implementation is not restricted by the API, as long as each piece of stored data is given a unique identifier that is global within that data owner's domain. The unique identifier comprises three components:
[name]:[path]@[system]
The name component is an identifier for each specific piece of data (for example, credit card). The path component supports a hierarchical storage structure, allowing a data owner to store various groupings of data (for example, a data owner may have separate sets of personal and business data). The system component is a unique identifier for the specific distributed data service.
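Parsing the [name]:[path]@[system] identifier is straightforward; this sketch assumes the delimiter characters do not appear inside the components themselves, since the report does not define escaping rules.

```python
def parse_dds_id(identifier):
    """Split a [name]:[path]@[system] identifier into its three parts.

    ':' separates the data name from the rest; the last '@' marks the
    DDS system. Escaping of delimiters inside components is assumed
    not to occur.
    """
    name, rest = identifier.split(":", 1)
    path, system = rest.rsplit("@", 1)
    return {"name": name, "path": path, "system": system}
```

Using rsplit on '@' lets the hierarchical path itself contain separators without ambiguity about which DDS the identifier targets.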
Once the data owner has uploaded data into their DDS, the next stage is to allow web applications to Access [2] this data. During the registration process for a DDS-enabled web application, the web browser extension inspects the HTTP response traffic from the application and detects a DDS-Application-Subscribe-AuthRequest. This request, and the corresponding response generated by the browser extension, forms the basis for establishing a trust relationship between the web application and the distributed data service. Once this relationship is established, the data owner establishes a link between their data element and the web application. This link is akin to the data owner uploading content to the web application in a standard Web 2.0 scenario, except that in the DDS design the data owner provides the unique identifier of the data instead of uploading the content itself. Once the link is established between a piece of data referenced by the web application and its storage location in a DDS, the web application is free to request the data directly from the DDS.

Figure 4: Access (Data Linkage) [2]

The final stage in the design is the Presentation [2] stage. Here we define how the data is transparently presented to end users. Again the browser extension plays a key role. In this instance the extension operates on behalf of a third party that does not have any direct relationship to the DDS holding the rendered data. For example, an end user may access a web application and request to view an image collage built from images stored in multiple DDSs owned by multiple data owners. In this instance the web application, instead of returning raw data as in the classic Web 2.0 case, returns a DDS-Present-DataRequest containing security information exchanged between the web application and the DDS during the initial authentication request. This security information is protected using PKI to ensure that it cannot be abused to falsify links between a DDS and unauthorised web applications and clients. The trust relationship enforced in this case is between the web application and the DDS, so the DDS itself does not need to be aware of all the end users who can render the data linked to a specific web application. The DDS-Present-DataRequest message triggers a handoff of the user from the web application to the DDS, allowing the browser extension to request and render the data directly from the DDS, under the web application's instruction.
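The handoff can be sketched as below. The report protects the security information with PKI; this sketch substitutes an HMAC over a key shared between the web application and the DDS purely to keep the example self-contained, and the message fields are invented for illustration.

```python
import hashlib
import hmac
import json

def make_present_request(app_id, data_id, shared_key):
    """Build a DDS-Present-DataRequest-like token binding an application
    to a specific data identifier. A real deployment would use PKI
    signatures instead of this shared-key HMAC."""
    payload = json.dumps({"app": app_id, "data": data_id}, sort_keys=True)
    tag = hmac.new(shared_key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}

def dds_verify(request, shared_key):
    """DDS-side check before handing data to the browser extension:
    reject any request whose payload was altered after issuance."""
    expected = hmac.new(shared_key, request["payload"].encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, request["tag"])
```

Because only the application and the DDS hold the key, the DDS can admit any end user presenting a valid token without maintaining a list of authorised users, mirroring the trust relationship described above.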

Figure 5: Data Presentation [2]

6 Conclusion
In this report, I have tried to outline some basic but important aspects of Distributed Storage Systems, ranging from one of their infrastructures, DISKOS, to their application in Web 2.0. Data replication schemes and methods of accessing data have also been outlined.
DISKOS shows that a file-system-neutral distributed storage environment can be created while reaching high performance for several write access patterns.
The data replication problem has also been addressed, and a cost model has been developed that is applicable to very large distributed systems such as the WWW and distributed databases.
Light has also been shed on some data access methods and on how they are used to separate storage from computing; access efficiency and availability remain the focus of future research.
We have also seen the concerns resulting from the growing popularity of Web 2.0 applications and defined a new paradigm for the distributed storage of data on the Internet. As user take-up of Web 2.0 applications continues, it is sensible to adopt a distributed approach that parallels the way content is originally generated.

References
[1] Thanasis Loukopoulos and Ishfaq Ahmad, “Static and Adaptive Data Replication Algo-
rithms for Fast Information Access in Large Distributed Systems.”

[2] Mark Wallis, Frans Henskens and Michael Hannaford, “A Distributed Content Storage
Model for Web Applications.”

[3] Zeng Dadan, Zhou Minqi and Zhou Aoying, “A Data Accessing Method in Distributed Massive Computing.”

[4] George Parissis and Theodore Apostolopoulos, “A distributed hash table-based approach
for providing file system neutral, robust and resilient storage infrastructure.”

