Figure 3: Approaches for submitting updates (panel label: Primary Copy)
Figure 4: Approaches for propagating updates (first panel: 1) T: W(X), 2) Propagate W(X), 3) Commit; Lazy panel: W(X), 2) Commit, 3) Propagate W(X))
It depends on the data transfer patterns in the network. For example, a particular site may use a particular piece of data more often, so a replica near that site will be very useful.

3. Where to update?
There are two ways this can be done.

• The first approach is Update Everywhere, i.e. execute update transactions at every site. The advantage of this approach is simplicity, but it is costly.

• The second approach is Update only Primary Copies. Primary copies are master copies, whereas secondary (slave) copies are read-only. Figure 3 shows how this is done. W(X) is the update process. In the Update Everywhere approach all the replicas are updated. In the Primary Copy approach the read-only replicas are left alone.

4. When to update?
There are two approaches for this.

• The first approach is Eager. It means updating within the boundaries of the transaction, i.e., transactions usually terminate with a two-phase commit protocol.

• The second approach is Lazy. It means updating only after the transaction commits. The disadvantage of this approach is that it can lead to inconsistency. Figure 4 shows the two approaches in detail. In the Eager approach, updates are propagated before the commit occurs, whereas in the Lazy approach updates are propagated only after the commit.

2.3 Concurrency Control
Concurrency control is one of the correctness criteria for replicated databases. A replicated database system that achieves replication and concurrency control has the same input/output behavior as a centralized, one-copy database system that executes user requests one at a time. [13] The concurrency control problem is exacerbated in a distributed database because:

• Users may access data stored on different computers in a distributed system.

• A concurrency control mechanism at one computer cannot instantaneously know about interactions at other computers.

Concurrency control has been actively investigated for the past several years, and the problem for non-distributed DBMSs is well understood. The two-phase locking protocol is accepted as a standard solution. [2]

2.3.1 Two-phase locking protocol:
Two phase locking is a process used to gain ownership of shared resources without creating the possibility for deadlock. The technique is extremely simple, and breaks up the
modification of shared data into "two phases", which is what gives the process its name. [1]

There are actually three activities that take place in the "two phase" update algorithm:

1. Lock Acquisition

2. Modification of Data

3. Release Locks

The modification of data and the subsequent release of the locks that protected the data are generally grouped together and called the second phase.

In the variant described here, two phase locking prevents deadlock in distributed systems by releasing all the resources a process has acquired if it is not possible to obtain all the required resources without waiting for another process to finish using a lock. This means that no process is ever in a state where it is holding some shared resources while waiting for another process to release a shared resource which it requires, so deadlock cannot occur due to resource contention. The resource (or lock) acquisition phase of a "two phase" shared data access protocol is usually implemented as a loop within which all the locks required to access the shared data are acquired one by one. If any lock is not acquired on the first attempt, the algorithm gives up all the locks it had previously been able to get and starts trying to get all the locks again.

2.4 Query Optimization
Queries in distributed database systems often cannot be answered by a single local unit. An aggregate of data spanning different databases in a network is needed. There will often be many ways to assemble it. Our goal is finding the best way. [4]

2.4.1 Centralized Vs Distributed Query Processing:
In centralized query processing, the number of I/O operations and the CPU usage needed to process the query are the main concerns. In distributed query processing, the amount of data transmitted between the sites is also an important concern. Two new operators, send and receive, are included in distributed query processing. These operators are used for transferring data between sites. The other important difference is the heterogeneity in data formats and data models in distributed databases. In distributed databases the data is replicated in various locations to increase performance. This leads to more complex problems when processing queries. The usage of resource vectors, interconnect matrix, and caching in a distributed environment will make a huge difference.

2.4.2 Query Processing Objectives:
The cost function of a distributed query is

I/O cost + CPU cost + Communication Cost.

Minimizing this function is the main objective of distributed query processing. These costs may have different weights in different distributed environments. For example, the communication cost will dominate in a wide area network, whereas in a local area network it is negligible.

3. SWARM INTELLIGENCE
We face dynamic optimization problems in almost all fields. Even with today's ever increasing computing power, some of these problems are still hard to solve. Solving these problems is most of the time not a matter of finding the extrema, but of finding something that is as close as possible. Most recently, scientists are turning to insects like ants and bees for ideas to solve such problems. This form of artificial intelligence, based on the collective behavior of decentralized, self-organized systems, is called swarm intelligence.

A single ant or bee isn't smart, but their colonies are. The study of swarm intelligence is providing insights that can help humans manage complex systems, from truck routing to military robots [9].

Following a trail of insects as they work together to accomplish a task offers unique possibilities for problem solving [15].

Swarm intelligence algorithms can be divided into two classes. They are

• Pheromone based navigational algorithms inspired by biological ant-colony behavior.

• Non-pheromone based navigational algorithms inspired by biological bee-colony behavior.

3.1 Pheromone based Algorithms
Let us look at the collective behavior of ants. The objectives of ants are very simple: finding food and building a nest. To achieve these, every single ant follows a simple set of rules. No one is in charge; no one knows the complete picture. But despite this they achieve extraordinary solutions to problems like finding the shortest path to food, allocating workers to different tasks, defending their nests from predators, etc. Now let us look at one of the simple day-to-day tasks of an ant, finding the shortest path from the nest to food, which is analogous to shortest path problems like the travelling salesperson problem [5]. Whenever an ant brings food to the nest, it leaves a trail of a chemical called pheromone. As more and more ants go through this trail the track gets reinforced; if no ant uses the trail for some time the chemical slowly evaporates (deleting the trail, in our computer science words). Suppose half of the ants choose the long path and the other half choose the shorter path. The shorter path will have a more intense pheromone trail than the longer one, because ants complete the shorter path sooner and reinforce it more often, while on the longer path more pheromone evaporates between visits. In the next rounds the ants choose the path with the more intense pheromone. So after a while all the ants choose the shorter path. This whole process is shown in figures 5, 6 and 7.

3.2 Non-Pheromone based Algorithms:
They are also called bee colony algorithms. Bee colony is the area of swarm intelligence which studies (1) the behavior of bees and similar behavior in other insects, and (2) the applicability of
Figure 5: At the start
– Recruit behavior
– Navigation behavior
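The pheromone mechanism of Section 3.1 (reinforcement on use, evaporation over time) can be illustrated with a small deterministic sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function name simulate_two_paths, the evaporation rate, the rule that ants pick a path in proportion to its pheromone share, and the rule that each trip deposits an amount inversely proportional to path length (shorter trips finish sooner and so reinforce more often).

```python
def simulate_two_paths(short_len=1.0, long_len=2.0, evaporation=0.1, rounds=200):
    """Mean-field sketch of ants choosing between a short and a long path.

    Returns the final share of pheromone on each path.
    All parameters are illustrative assumptions, not values from the paper.
    """
    lengths = {"short": short_len, "long": long_len}
    # Both paths start equally attractive (half of the ants on each).
    pheromone = {"short": 1.0, "long": 1.0}

    for _ in range(rounds):
        total = sum(pheromone.values())
        # Fraction of ants taking each path, proportional to pheromone.
        share = {path: pheromone[path] / total for path in pheromone}
        for path in pheromone:
            # Evaporation: an unused trail slowly fades.
            pheromone[path] *= 1.0 - evaporation
            # Reinforcement: shorter trips are completed more often.
            pheromone[path] += share[path] / lengths[path]

    total = sum(pheromone.values())
    return {path: pheromone[path] / total for path in pheromone}


if __name__ == "__main__":
    print(simulate_two_paths())
```

Under these assumptions the pheromone share of the short path grows every round, so almost all ants end up on it, which matches the qualitative behavior described in the text.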
4. METHODOLOGY
4.1 Agent Based Systems:
Swarm intelligence has been widely used in various distributed environments, for example in telecommunications to solve routing problems [10] and load balancing [14]. It has also been used in distributed pattern detection and classification [3] and in robotics [12]. In this paper we address the problem of replication management using bio-inspired algorithms. The interesting feature of these bio-inspired algorithms is that they solve complex problems by following a simple set of rules at the individual level.

These swarm biological systems can be quite naturally emulated in any distributed system through the multi-agent paradigm. The main advantages of using such systems are:

1. Self organization:
All decisions are made based on local information.

2. Adaptability:
They can adapt themselves to any dynamic environment.

3. Stigmergy:
It is the mechanism of spontaneous, indirect coordination between agents or actions, where the trace left in the environment by an action stimulates the performance of a subsequent action, by the same or a different agent.

4.2 Replication Issues:
Issue 1: In a distributed database system, if data items are far away in the network, then the system's performance will be affected. This is because it leads to network load, which causes a decline in overall performance. So a system should have replicas of its most accessed data items nearby.

Issue 2: Overdoing replication is also a major problem, because over-replication consumes memory resources. So when a system no longer uses a replica, it should be deleted.

Issue 3: In distributed database systems a query will often operate on more than one data item. If such operations are very common on a set of data items that are not on a single site, then replicating a merged copy of those data items would drastically increase the performance of the system.

Issue 4: Maintaining the consistency of a system is the main issue when it comes to replication. Whenever a copy is updated, there should be consistency among all the replicas.

4.3 Agent
An Agent is a piece of code. It is the mobile agent we use in our swarm approach to distributed database systems. There will be two types of agents.

1. Local Agent

2. Global Agent

The local agent makes replicas. The global agent is responsible for maintaining consistency among the replicas. So the local agent deals with issues 1, 2 and 3, whereas the global agent deals with issue 4 described above.

4.3.1 Local Agent
A local agent is generated whenever a query is executed. Every agent has an ID which is unique for each source, i.e. all the agents from the same source will have the same ID. Every time a data item is accessed, the local agent checks whether it has to be replicated. The condition on which this check is based will be discussed in the next section. For now, assume that the agent knows it.

There are two possible cases here

• Yes, it has to be replicated

• No, leave it like that.

If it has to be replicated, then a replica of the data item is generated and transferred to the source site. Otherwise no action takes place.

Figure 9 shows this process in a simple distributed database.

Figure 9: Working of a local agent (1. Query executed, local agent is generated. 2. Data item is accessed, condition checked. 3. Data item is replicated.)
How does the agent know whether to replicate or not?
Every time a data item is accessed, the agent leaves a timestamp there, together with its agent ID. So when the data item is accessed, the agent generates a new timestamp and then searches for an earlier timestamp with its agent ID. It computes the difference between the two timestamps and compares it with X, which is a constant. If the difference is less than X, the data item was accessed recently, so the agent increments a count stored there with its agent ID. It then checks the count: if the count is greater than Y (which is another constant), it replicates the data item. Otherwise only the count is updated.

This count is reset after some interval of time.

[Figure: Graph showing value of X versus Number of Replicas (x-axis: X; y-axis: Number of Replicas)]
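The decision rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names AccessNote, DataItem and should_replicate are hypothetical, timestamps are plain numbers, the count is assumed to stay unchanged when an access is not recent, and the periodic reset of the count is left out.

```python
from dataclasses import dataclass, field


@dataclass
class AccessNote:
    """What an agent leaves on a data item: last access time and a count."""
    last_access: float
    count: int = 0


@dataclass
class DataItem:
    name: str
    # One note per agent ID (agents from the same source share an ID).
    notes: dict = field(default_factory=dict)


def should_replicate(item, agent_id, now, x, y):
    """Return True if the accessing agent should replicate `item`.

    x: maximum gap between two accesses to count as "recent" (constant X).
    y: count threshold above which the item is replicated (constant Y).
    """
    note = item.notes.get(agent_id)
    if note is None:
        # First access from this source: just leave a timestamp.
        item.notes[agent_id] = AccessNote(last_access=now)
        return False

    recent = (now - note.last_access) < x
    note.last_access = now
    if recent:
        note.count += 1  # accessed again within X: increment the count
    return note.count > y


if __name__ == "__main__":
    item = DataItem("X")
    for t in [0, 1, 2, 3]:
        print(t, should_replicate(item, "site-A", t, x=5.0, y=2))
```

A smaller X or a larger Y makes the agent more conservative, which is one way to balance Issue 1 (keep hot data nearby) against Issue 2 (avoid over-replication).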
6. REFERENCES
[1] Arnold, P. Two phase locking protocol, March 2009.
Good read with overview of two phase locking
protocol.
[2] Bernstein, P. A., and Goodman, N. Concurrency
control in distributed database systems. acm 13
(1981), 37.
[3] Brueckner, S., and Parunak, H. V. D. Swarming
agents for distributed pattern detection and
classification. pages (forthcoming),.
[4] Cem Evrendilek Asuman Dogac, F. O.
Multidatabase query optimization. Kluwer 39 (1997),
27.
[5] Dorigo, M., and Gambardella. Ant colonies for
the traveling salesman problem. acm 43 (1997), 13.
[6] Frisch, K. The dance language and orientation of
bees. acm 5 (1967), 14.
[7] Karl Tuyls, A. N. A bee algorithm for multi-agent
systems. acm 7 (2000), 5.
[8] M. Tamer Özsu, P. V. Distributed database
systems: Where are we now? IEEE 18 (2001), 19.
[9] Miller. National Geographic swarm theory. web site,
July 2007.
[10] Ns, T. I. Swarm intelligence and problem solving in.
[11] Ozsu. M. T, V. Principles of distributed database
systems. paperback, 1991.
[12] Pettinaro, G. C., Kwee, I. W., Gambardella,
L. M., Mondada, F., and louis Deneubourg, J.
Swarm robotics: A different approach to service
robotics. 71–76.
[13] Bernstein, P. A., and Goodman, N. An algorithm for
concurrency control and recovery in replicated
distributed databases. ACM 9 (1984), 20.
[14] Schoonderwoerd, R., Holland, O., Bruten, J.,
and Rothkrantz, L. Ant-based load balancing in
telecommunications networks.
[15] Tarasewich, P., and McMullen, P. R. Swarm
intelligence, power in numbers. ACM 45, No. 8 (Aug
2002), 6.
APPENDIX
A. HEADINGS IN APPENDICES
A.1 Introduction
A.2 Distributed Database Systems
A.2.1 Advantages of Distributed Databases
A.2.2 Distributed Database Issues
Replication Concurrency Control Query Optimization