
Name: Wasim Hyder (ID: 19111)

Name: Arif Mehmood (ID: 20435)

Subject: Research Method and Skills

Date: April 17, 2016

Project Title: Availability and Reliability Properties of Distributed Database Systems

Submitted To: Mr. Muhammad Zeeshan


Methodology:
First, we clarify the meaning of availability and reliability and the types of distributed databases we
consider. Then we introduce a system model that defines the specific application settings and error
conditions we tackle in this document.

A. Availability and Reliability:


Availability is the degree to which a system is operational and accessible when required for use. In
turn, reliability is the ability of a component to perform its required functions under stated conditions for a
specific period of time; it is defined as a measure of the continuity of correct service. Thus, availability is a
liveness guarantee, while reliability is a safety guarantee. Systems that provide both reliability and
availability are often said to be fault-tolerant. Here, the fault-tolerance property simply states that a
system exhibits a specified behavior for dedicated classes of failures.
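Availability, being a statistical property, is commonly quantified as the ratio of uptime to total time, i.e. mean time between failures (MTBF) over MTBF plus mean time to repair (MTTR). A minimal sketch, with illustrative figures that are our own assumption rather than values from this document:

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is
    operational, given mean time between failures and mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Illustrative figures (assumed): a node that fails on average every
# 1000 hours of operation and takes 2 hours to repair.
print(f"{availability(1000.0, 2.0):.4%}")  # prints 99.8004%
```

Reliability, in contrast, cannot be reduced to a single uptime ratio, since it depends on whether the service delivered was continuously correct over the stated period.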

B. Modern Database Systems:


Classically, a database consists of a set of named data items each of which has a value. The state of a
database at a particular time is the set of available items and their values at that point in time.
Distributed database systems provide a single database system spread over multiple physical nodes.
Distributed database systems are commonly classified according to the storage models they provide:
Relational databases store data as tuples, each forming an ordered set of attributes. A relation
consists of a set of tuples; a tuple is a row, an attribute is a column, and a relation forms a table.
Tables are defined using a static, normalized data schema, and different tables can be referenced using
foreign keys. SQL has established itself as the generic data definition, manipulation, and query language for
relational data.
Column-oriented databases store data by columns rather than by rows. This enables storing large
amounts of data in bulk and querying efficiently over very large, structured data sets. A column-
oriented data model does not rely on a fixed schema; instead, it provides nestable, map-like structures
for data items, which improves flexibility over fixed schemas.
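As a concrete illustration of such a nestable, map-like data item, consider the following sketch; the field names and values are our own, modelled loosely on column-family stores, and not taken from this document:

```python
# One data item: a row key plus named groups of columns. Different items
# may carry entirely different columns, since no fixed schema is enforced.
row = {
    "row_key": "user:42",
    "profile": {                       # one nested column group (naming assumed)
        "name": "Alice",
        "email": "alice@example.org",
    },
    "activity": {
        "last_login": "2016-04-01",    # columns can differ from item to item
    },
}
```

A relational schema would require declaring every column up front; here, a second item could omit `activity` entirely or introduce new columns without any schema change.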

C. System Model:
Now we introduce a system model for the discussions in this document. The model covers the
architecture of the database, the environment it is operated in, the way it is accessed, and failure types we
address. We also clarify our view on availability.
In our discussion, we only consider distributed shared-nothing database clusters. We assume that the
various database manager instances are operated by a single administrative instance but are not necessarily
situated in the same geographical location (data centre). The clients accessing the database are middle-tier
components of the enterprise application. They access the database by sending request messages
containing query or update operations; depending on the type of request, they either receive a response
with an update status or the actual data item.
We only tackle common failure events and by no means consider catastrophic events such as
earthquakes, flooding, etc. We assume that access to the database instances is well protected by security
mechanisms outside the databases and that the software does not contain any critical bugs. We also
do not consider data corruption at the disk level or disk failures. Hence, our considerations only deal with the
full failure of one or multiple nodes as well as network partitions between the database nodes. Such
partitions may be full or partial. We argue that all of these additional failures can be dealt with by other
mechanisms, such as RAID for disk failures. In consequence, the system model allows any failed
node to be repaired and re-introduced into the cluster, because the persistent state on
disk does not get lost and can be made available again by repairs.
We assume an unlimited network capacity between clients and database nodes. Furthermore, we
assume that a client is always capable of reaching at least one node of the database cluster, provided that
not all nodes have failed. Then, a database is available when it is possible to access a particular, but
arbitrary, data item within an arbitrary, but fixed, time interval t. A data item i is available when it is
possible to access that item within an arbitrary, but fixed, time interval t. Our model allows nodes to become
overloaded such that requests get dropped, the reply time exceeds t, or a reply is never
created. Clients that interact with such a node will consider the node failed and proceed as
they would in case of a failure. Usually, this means contacting another replica.
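The client behaviour prescribed by the model, treating any node whose reply does not arrive within t as failed and contacting another replica, can be sketched as follows; the function and parameter names are our own illustration:

```python
REPLY_TIMEOUT_T = 0.2  # the fixed interval t from the model (value assumed, seconds)

def request_item(replica_nodes, item, send):
    """Try replicas in order; a node that does not reply within t is treated
    as failed, whether it actually crashed or is merely overloaded."""
    for node in replica_nodes:
        try:
            return send(node, item, timeout=REPLY_TIMEOUT_T)
        except TimeoutError:
            # Proceed exactly as in the failure case: contact another replica.
            continue
    raise RuntimeError("item unavailable: no replica replied within t")
```

Here `send` stands in for whatever transport the middle-tier component uses; only its timeout behaviour matters for the model, which is why overload and crash are indistinguishable from the client's point of view.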

AVAILABLE AND RELIABLE DATABASE SYSTEMS


We now present the relevant challenges for providing availability and reliability of a distributed
database as part of an enterprise application architecture, respecting the application model and system
model.
A. Aspect of Availability:
With a single database server, a data item has to be considered unavailable when one of the following
situations occurs: (i) a high number of requests issued concurrently by clients overloads the server such
that client requests cannot be handled or are handled with a latency > t; (ii) the database node
fails.
An intuitive way to deal with overload is to scale up the system so that it can process more client
requests per time interval. Unfortunately, the cost of scaling up does not grow linearly with the power of
the system. Moreover, scenarios exist that overload even the largest affordable node. Finally, scaling up does
not handle unavailability caused by failures of individual nodes. An alternative to scaling up is to introduce
additional database servers (scaling out) and to split the data set among them (sharding). If done right,
this reduces the load on individual database servers and avoids unavailability due to overload. In
consequence, this approach improves the global availability of the system, as the probability that all nodes
have failed at a particular point in time decreases.
Sadly, however, scaling out in combination with sharding alone does not impact the availability of a
single data item with respect to failures due to the following reason: when only sharding is used, each
item is still only stored on a single node whose failure properties have not changed due to the scale out.
Even worse, the probability that any node of a distributed system has crashed at a particular point in time
increases with the number of nodes the system consists of. Hence, the more nodes a database cluster
consists of, the more necessary it gets to deal with failures of individual nodes.
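These two opposing probability trends can be checked numerically. Assuming an independent per-node failure probability p (the value below is our own illustrative choice), the probability that all n nodes are down, i.e. global unavailability, shrinks as n grows, while the probability that at least one node is down grows:

```python
def p_all_failed(p: float, n: int) -> float:
    """Probability that every one of n independent nodes has failed."""
    return p ** n

def p_any_failed(p: float, n: int) -> float:
    """Probability that at least one of n independent nodes has failed."""
    return 1.0 - (1.0 - p) ** n

p = 0.01  # assumed per-node failure probability at a given instant
for n in (1, 10, 100):
    print(n, p_all_failed(p, n), p_any_failed(p, n))
```

At p = 0.01 and n = 100, at least one node is down roughly 63% of the time, while all nodes are practically never down simultaneously. This is precisely why handling individual node failures becomes mandatory at scale, even though global availability improves.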
Improving on the availability of a single data item when scaling out can only be achieved by adding
additional redundancy for the data items, commonly called (data) replication. Then, a single item is not
stored on one, but multiple nodes and requests targeting an unavailable node can be redirected to another
node that stores a replica of the targeted item. Obviously, the existence of multiple, redundant copies of a
single data item creates the need for dealing with the consistency of these copies. On the upside, having
replicas of a data item offers the chance of performance improvements, as the load caused by access to a
single item can be spread among multiple nodes.
The introduction of replicas adds some design options to a distributed database system. In particular,
the aspects of replication degree, replica placement, and replica access have to be addressed as well as
replica consistency. These aspects are highlighted by the following questions:
(1) How many replicas exist, and where are they located?
(2) Which replicas are accessible for reading / writing an item?
(3) What is the consistency between the replicas?

B. Aspects of Reliability
As stated earlier, reliability is the ability of a component to perform its required functions under stated
conditions for a specific period of time. In contrast to availability (which is a statistical property), the
question whether a system is reliable primarily depends on the specification of the function a system is
supposed to fulfill. Hence, a database may be considered unreliable when interacting with it yields
different results than expected. This is particularly important for concurrency handling and client-side
consistency. With respect to reliability, the main challenges can be highlighted by the following two
questions:
(1) How are concurrent writes to the same item resolved?
(2) What is the consistency experienced by clients?

C. Dimensions of Distributed Databases


From the previous discussion of aspects of availability and reliability, we derive the following design
dimensions for database systems that shall realize one or both properties in a distributed setting:
Scalability deals with the capability of the system to scale out, i.e. add more resources as needed. In
databases this property is realized through (data) partitioning (also called sharding). This is merely a
mechanism to achieve availability on a global level. Replication denotes the creation of copies of
individual data items and provides strategies to keep these copies synchronized. It is required for
achieving both reliability and availability on item level. Conflict management deals with concurrent write
access to the same item. It is required for reliability, but may influence availability. Moreover, it is a
property affecting both replicated and non-replicated items. Finally, consistency is a consequence of both
replication and conflict management properties.

DESIGNING FOR AVAILABILITY AND RELIABILITY


From the previous discussion, we have identified four topics as crucial dimensions of an available
and reliable database system. In this section we discuss each of these concepts individually.

A. Replication
With respect to data, replication means to have several physical copies of the same logical data item.
Apart from replication scope that defines the relation of replicas and nodes, the primary conceptual
decisions to be made when using replication are (i) which of the physical data items may be updated by
clients and (ii) how the other replicas are changed once such an update has taken place [12]. We refer to
the first aspect as the replication strategy of the system and to the latter as the update strategy. The update
strategy determines how the replicas interchange updates: it defines the update data exchanged by the
database nodes, as well as the update laziness, which influences the relative ordering of the responses sent
to the client and the updates themselves.
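The role of update laziness can be sketched for a single-master setting: an eager strategy propagates an update to all replicas before responding to the client, while a lazy strategy responds first and propagates later. The class and method names below are our own illustration, not from the text:

```python
class Replica:
    def __init__(self):
        self.store = {}

class Primary:
    """Single-master replica set; 'eager' updates replicas before acking the
    client, 'lazy' acks first and propagates afterwards (illustrative sketch)."""
    def __init__(self, replicas, eager=True):
        self.store, self.replicas, self.eager = {}, replicas, eager
        self.pending = []

    def write(self, key, value):
        self.store[key] = value
        if self.eager:
            for r in self.replicas:       # replicate before responding:
                r.store[key] = value      # client never observes stale replicas
            return "ack"
        self.pending.append((key, value)) # respond first, propagate later:
        return "ack"                      # replicas may serve stale reads

    def flush(self):
        """Lazy propagation step, e.g. run periodically in the background."""
        for key, value in self.pending:
            for r in self.replicas:
                r.store[key] = value
        self.pending.clear()
```

The trade-off is visible directly: the eager variant couples the client's response time to the slowest replica, while the lazy variant opens a window in which replicas diverge.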

B. Conflict Management
Any sort of concurrent update to a single data item may lead to conflicts. While this is obvious for
multi-master replication, it also has to be considered for single-master approaches and even for non-
replicated items. Two strategies are commonly known to deal with this problem: optimistic and
pessimistic concurrency control. The chosen strategy influences the possible versions of an item.
Concurrent operations may be conflicting, so conflict detection and conflict resolution have to be applied.
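Optimistic concurrency control can be sketched with per-item version counters: a write carries the version the writer last read, and succeeds only if that is still the current version. The class below is our own minimal illustration of the idea, not a description of any particular system:

```python
class VersionedStore:
    """Optimistic concurrency control: a write succeeds only if the caller
    saw the current version; otherwise a conflict is detected and the caller
    must re-read and resolve (e.g. retry or merge)."""
    def __init__(self):
        self.items = {}  # key -> (version, value)

    def read(self, key):
        return self.items.get(key, (0, None))

    def write(self, key, value, expected_version):
        version, _ = self.items.get(key, (0, None))
        if version != expected_version:
            return False  # conflict detected: another write got there first
        self.items[key] = (version + 1, value)
        return True
```

A pessimistic scheme would instead take a lock before the read, trading the possibility of conflicts for reduced concurrency, and thereby potentially reduced availability.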

C. Consistency
Database and distributed systems research has developed a large number of different
consistency models. Consistency protocols define the allowed interleavings when multiple processes access the
same data sets. With respect to the scope of this document, we are merely interested in what the client, i.e.
the user of the database, can experience. In Vogels et al.'s terminology, this vantage point is called client-
centric consistency. Naturally, any database system also requires some internal consistency
specification, mainly with respect to how replicas are handled.

D. Partitioning
In order to enable scalability, the existence of a partitioning mechanism is necessary. Partitioning means
that each node is only responsible for dedicated subsets of the data items stored in the system, i.e. data
partitions. Depending on the actual database system, partitions are also sometimes called ranges or regions.
A partitioning mechanism maps data items to partitions; a commonly established mechanism is consistent
hashing. Database systems differ with respect to the degree of dynamism their partitioning mechanism
allows. To be able to access the data items, the appropriate partition has to be resolved for any request
issued by a client. Finally, a client may be restricted in the way it is allowed to access a cluster of database nodes.
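The consistent hashing mechanism mentioned above can be sketched as follows: nodes and items are hashed onto the same ring, and each item belongs to the first node clockwise from its position. This is a minimal sketch of the technique; production systems add virtual nodes and replication on top of it:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hashing sketch mapping item keys to nodes."""
    def __init__(self, nodes):
        # Place each node on the ring at its hash position.
        self.ring = sorted((self._hash(n), n) for n in nodes)
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, item_key: str) -> str:
        # First node clockwise from the item's hash position (wrap around).
        idx = bisect.bisect_right(self.keys, self._hash(item_key)) % len(self.keys)
        return self.ring[idx][1]
```

Its appeal for dynamic partitioning is that adding or removing a node only remaps the items between that node and its ring predecessor, rather than reshuffling the whole data set as a plain `hash(key) % n` scheme would.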
