
MODULE-V

Transaction Management
Transaction
A transaction is a single logical unit of work which accesses and possibly modifies
the contents of a database. Transactions access data using read and write operations.
“Transaction is a set of operations which are all logically related.”
OR
“Transaction is a single logical unit of work formed by a set of operations.”
Operations in a transaction-
The main operations in a transaction are-
1. Read Operation
2. Write Operation
1. Read Operation
 Read operation reads the data from the database and then stores it in the buffer of
main memory.
 For example- Read(A) instruction will read the value of A from the database and
will store it in the buffer of main memory.
2. Write Operation
 Write operation writes the data back to the database from the buffer.
 For example- Write(A) will write the updated value of A from the buffer to the
database.
Transaction States
A transaction goes through many different states throughout its lifetime. These states
are known as transaction states.
Transaction states are as follows-
1. Active state
2. Partially committed state
3. Committed state
4. Failed state
5. Aborted state
6. Terminated state

1. Active state
 This is the first state in the lifecycle of a transaction.
 A transaction is said to be in the active state as long as its instructions are being
executed, from the first to the last.
 All the changes made by the transaction are being stored in a buffer in main
memory.
2. Partially committed state
 After the last instruction of the transaction gets executed, it enters into a partially
committed state.
 After the transaction has entered into this state, the transaction is now considered to
be partially committed.
 It is not considered to be fully committed because all the changes made by the
transaction are still stored only in the buffer in main memory and not into the
database.
3. Committed state
 After all the changes made by the transaction have been successfully stored into the
database, the transaction enters into a committed state.
NOTE-
 After a transaction has entered the committed state, it is not possible to roll back
the transaction i.e. we cannot undo the changes the transaction has made, because
the system has now been updated to a new consistent state.
 The only way to undo the changes made by a committed transaction is to carry out
another transaction, called a compensating transaction, that reverses the changes.
4. Failed state
 When a transaction is in the active state or partially committed state i.e. when a
transaction is being executed or has partially committed, and some failure occurs
due to which normal execution becomes impossible, the transaction enters the
failed state.
5. Aborted state
 After the transaction has failed and has entered into a failed state, all the changes
made by the transaction have to be undone.
 To undo the changes made by the transaction, it is necessary to roll back the
transaction.
 After the transaction has rolled back completely, it enters into an aborted state.
6. Terminated state
 This is the last state in the life cycle of a transaction.
 After entering the committed state or aborted state, the transaction then finally
enters into a terminated state where the transaction life cycle finally comes to an
end.
ACID Properties
In order to maintain consistency in a database before and after a transaction, certain
properties are followed. These are called ACID properties.
1. Atomicity
By this, we mean that either the entire transaction takes place at once or
doesn't happen at all. There is no midway i.e. transactions do not occur partially.
Each transaction is considered as one unit and either runs to completion or is not
executed at all. It involves the following two operations:
—Abort: If a transaction aborts, changes made to the database are not visible.
—Commit: If a transaction commits, changes made are visible.
Atomicity is also known as the 'All or nothing rule'.
Example:
Consider the following transaction T, consisting of T1 and T2: transfer of 100 from
account X to account Y.
T1: Read(X); X := X − 100; Write(X)
T2: Read(Y); Y := Y + 100; Write(Y)
If the transaction fails after completion of T1 but before completion of T2 (say,
after Write(X) but before Write(Y)), then the amount has been deducted from X but
not added to Y. This results in an inconsistent database state. Therefore, the
transaction must be executed in its entirety in order to ensure the correctness of the
database state.
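The all-or-nothing behaviour can be pictured with a small sketch in Python. The in-memory dictionary standing in for the database and the account names are illustrative assumptions; a real DBMS would enforce atomicity with logs and locks rather than a snapshot.

db = {"X": 500, "Y": 200}

def transfer(db, src, dst, amount):
    snapshot = dict(db)           # remember the before-image
    try:
        db[src] -= amount         # T1: debit X
        if db[src] < 0:
            raise ValueError("insufficient funds")
        db[dst] += amount         # T2: credit Y
    except Exception:
        db.clear()
        db.update(snapshot)       # abort: restore the before-image
        raise
    # commit: both steps succeeded, so the changes become visible together

transfer(db, "X", "Y", 100)
assert db["X"] + db["Y"] == 700   # the total is preserved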
2. Consistency
This means that integrity constraints must be maintained so that the database is
consistent before and after the transaction. It refers to the correctness of a database.
Referring to the example above, the total amount before and after the transaction
must be maintained.
Total before T occurs = 500 + 200 = 700.
Total after T occurs = 400 + 300 = 700.
Therefore, the database is consistent. Inconsistency occurs in case T1 completes
but T2 fails, leaving T incomplete.
3. Isolation
Transactions are often executed concurrently (e.g., reading and writing to
multiple tables at the same time). Isolation ensures that concurrent execution of
transactions leaves the database in the same state that would have been obtained if the
transactions were executed sequentially. Isolation is the main goal of concurrency
control; depending on the method used, the effects of an incomplete transaction
might not even be visible to other transactions.
Let X = 500, Y = 500.
Consider two transactions T and T″:
T : Read(X); X := X × 100; Write(X); Read(Y); Y := Y − 50; Write(Y)
T″: Read(X); Read(Y); display (X + Y)
Suppose T has been executed up to Read(Y) and then T″ starts. As a result,
interleaving of operations takes place, due to which T″ reads the correct value of X
but an incorrect value of Y, and the sum computed by
T″: X + Y = 50000 + 500 = 50500
is thus not consistent with the sum at the end of transaction
T: X + Y = 50000 + 450 = 50450.
This results in database inconsistency, due to a loss of 50 units. Hence, transactions
must take place in isolation, and changes should be visible only after they have been
made to the main memory.
4. Durability:
This property ensures that once the transaction has completed execution, the
updates and modifications to the database are written to disk and
persist even if a system failure occurs. These updates become permanent and are
stored in non-volatile memory. The effects of the transaction are thus never lost.
The ACID properties, in totality, provide a mechanism to ensure the correctness
and consistency of a database, in such a way that each transaction is a group of
operations that acts as a single unit, produces consistent results, acts in isolation
from other operations, and makes updates that are durably stored.
Implementation of Atomicity and Durability
The recovery-management component of a database system can support
atomicity and durability by a variety of schemes. We first consider a simple, but
extremely inefficient, scheme called the shadow copy scheme. This scheme, which
is based on making copies of the database, called shadow copies, assumes that only
one transaction is active at a time. The scheme also assumes that the database is
simply a file on disk. A pointer called db-pointer is maintained on disk; it points to
the current copy of the database.
In the shadow-copy scheme, a transaction that wants to update the database
first creates a complete copy of the database. All updates are done on the new
database copy, leaving the original copy, the shadow copy, untouched. If at any point
the transaction has to be aborted, the system merely deletes the new copy. The old
copy of the database has not been affected.
If the transaction completes, it is committed as follows. First, the operating
system is asked to make sure that all pages of the new copy of the database have been
written out to disk. (Unix systems use the fsync command for this purpose.) After the
operating system has written all the pages to disk, the database system updates the
pointer db-pointer to point to the new copy of the database; the new copy then
becomes the current copy of the database. The old copy of the database is then
deleted.
The transaction is said to have been committed at the point where the updated
db-pointer is written to disk.
We now consider how the technique handles transaction and system failures.
First, consider transaction failure. If the transaction fails at any time before db-pointer
is updated, the old contents of the database are not affected. We can abort the
transaction by just deleting the new copy of the database. Once the transaction has
been committed, all the updates that it performed are in the database pointed to by
db-pointer. Thus, either all updates of the transaction are reflected, or none of the
effects are reflected, regardless of transaction failure.
Now consider the issue of system failure. Suppose that the system fails at any
time before the updated db-pointer is written to disk. Then, when the system restarts,
it will read db-pointer and will thus see the original contents of the database, and
none of the effects of the transaction will be visible on the database. Next, suppose
that the system fails after db-pointer has been updated on disk. Before the pointer is
updated, all updated pages of the new copy of the database were written to disk.
Again, we assume that, once a file is written to disk, its contents will not be
damaged even if there is a system failure. Therefore, when the system restarts, it will
read db-pointer and will thus see the contents of the database after all the updates
performed by the transaction.
Thus, the atomicity and durability properties of transactions are ensured by the
shadow-copy implementation of the recovery-management component.
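The commit step of the shadow-copy scheme can be sketched in Python under the simplifying assumptions above: the database is a single file, and db-pointer is a small file naming the current copy. The file names are illustrative, and os.replace supplies the atomic pointer switch that the scheme relies on.

import os, shutil

def begin(db_pointer="db_pointer"):
    current = open(db_pointer).read().strip()
    new_copy = current + ".new"
    shutil.copyfile(current, new_copy)   # all updates go to the new copy
    return current, new_copy

def commit(db_pointer, old_copy, new_copy):
    with open(new_copy, "rb") as f:      # 1. force all pages of the new copy to disk
        os.fsync(f.fileno())
    tmp = db_pointer + ".tmp"
    with open(tmp, "w") as f:            # 2. write the new pointer value
        f.write(new_copy)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, db_pointer)          # atomic switch: the commit point
    os.remove(old_copy)                  # 3. delete the old (shadow) copy

def abort(new_copy):
    os.remove(new_copy)                  # the old copy was never touched

The transaction is committed exactly when os.replace makes the updated pointer durable; a crash before that line leaves db-pointer naming the old copy.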
Serializability in DBMS:
What is Serializability in DBMS?
 When multiple transactions run concurrently, then it may give rise to inconsistency
of the database.
 A schedule is a list of actions (reading, writing, aborting, or committing)
from a set of transactions, and the order in which two actions of a transaction
T appear in a schedule must be the same as the order in which they appear in T.
 Serializability is a concept that helps to identify which non-serial schedules are
correct and will maintain the consistency of the database.
Serializable Schedules
 If a given schedule of 'n' transactions is found to be equivalent to some serial
schedule of 'n' transactions, then it is called a serializable schedule.
Properties of Serializable Schedules-
Serializable schedules behave exactly like serial schedules with respect to the final
database state, so they always leave the database consistent. Note that
recoverability, cascadelessness and strictness are separate properties that depend on
where the transactions commit, and are not guaranteed by serializability alone.
Difference between Serial Schedules and Serializable Schedules-
The only difference between serial schedules and serializable schedules is that-
 In serial schedules, only one transaction is allowed to execute at a time i.e. no
concurrency is allowed.
 Whereas in serializable schedules, multiple transactions can execute simultaneously
i.e. concurrency is allowed.
Which is better- Serial Schedules or Serializable Schedules?
We have mentioned that serial schedules and serializable schedules produce the
same final database state. The added advantage of serializable schedules is that they
allow multiple transactions to execute simultaneously i.e. they allow concurrency.
Since serializable schedules improve both resource utilization and CPU
throughput, serializable schedules are always better than serial
schedules.
1. Serial Schedules –
A serial schedule is one in which the transactions are executed non-interleaved,
i.e., no transaction starts until the currently running transaction has ended.
Example: Consider the following schedule involving two transactions T1 and T2,
where R(A) denotes that a read operation is performed on data item 'A':

T1          T2
R(A)
W(A)
R(B)
W(B)
            R(A)
            R(B)

This is a serial schedule, since the transactions execute serially in the order T1 → T2.
2. Complete Schedules –
Schedules in which the last operation of each transaction is either abort or commit
are called complete schedules.
Example: Consider the following schedule involving three transactions T1, T2 and T3:

T1          T2          T3
R(A)
W(A)
            R(B)
            W(B)
Commit
            commit
                        abort

This is a complete schedule, since the last operation performed under every
transaction is either "commit" or "abort".

3. Recoverable Schedules –
Schedules in which transactions commit only after all transactions whose changes
they read have committed are called recoverable schedules. In other words, if some
transaction Tj is reading a value updated or written by some other transaction Ti, then
the commit of Tj must occur after the commit of Ti.
Example – Consider the following schedule involving two transactions T1 and T2:

T1          T2
R(A)
W(A)
            W(A)
            R(A)
Commit
            Commit

This is a recoverable schedule, since T1 commits before T2, which makes the value
read by T2 correct.
Example: Consider the following schedule involving two transactions T1 and T2:

T1          T2
R(A)
W(A)
            W(A)
            R(A)
            Commit
Abort

T2 read the value of A written by T1, and committed. If T2 had not yet committed, we
could deal with the situation by cascading the abort of T1 and also aborting T2; this
process recursively aborts any transaction that read data written by T2, and so on.
This situation is called a cascading abort. But since T2 has already committed, we
cannot undo its actions. This schedule is unrecoverable.
4. Cascadeless Schedules –
Also called Avoids cascading aborts/rollbacks (ACA). Schedules in which
transactions read values only after all transactions whose changes they are going to
read have committed are called cascadeless schedules. This avoids a situation in
which a single transaction abort leads to a series of transaction rollbacks. A strategy
to prevent cascading aborts is to disallow a transaction from reading uncommitted
changes from another transaction in the same schedule.
In other words, if some transaction Tj wants to read a value updated or written by
some other transaction Ti, then Tj must read it only after Ti commits.
Example: Consider the following schedule involving two transactions T1 and T2:

T1          T2
R(A)
W(A)
            W(A)
commit
            R(A)
            Commit

This schedule is cascadeless, since the updated value of A is read by T2 only after
the updating transaction, i.e. T1, commits.
Example: Consider the following schedule involving two transactions T1 and T2:

T1          T2
R(A)
W(A)
            R(A)
            W(A)
abort
            Abort

This is a recoverable schedule, but it does not avoid cascading aborts. It can be seen
that if T1 aborts, T2 will have to be aborted too in order to maintain the correctness
of the schedule, as T2 has already read the uncommitted value written by T1.

5. Strict Schedules –
A schedule is strict if, for any two transactions Ti and Tj, whenever a write operation
of Ti precedes a conflicting operation of Tj (either read or write), the commit or
abort event of Ti also precedes that conflicting operation of Tj.
In other words, Tj can read or write a value updated or written by Ti only after
Ti commits/aborts.
Example: Consider the following schedule involving two transactions T1 and T2:

T1          T2
R(A)
            R(A)
W(A)
commit
            W(A)
            R(A)
            Commit

This is a strict schedule, since T2 reads and writes A, which is written by T1, only
after the commit of T1.
Note – It can be seen that:
1. Cascadeless schedules are stricter than recoverable schedules, i.e. they are a subset
of recoverable schedules.
2. Strict schedules are stricter than cascadeless schedules, i.e. they are a subset of
cascadeless schedules.
3. Serial schedules satisfy the constraints of recoverable, cascadeless and strict
schedules, and hence are a subset of strict schedules.
The relation between the various types of schedules can therefore be depicted as:
serial ⊂ strict ⊂ cascadeless ⊂ recoverable.

Anomalies Due to Interleaved Execution


Two actions on the same data object conflict if at least one of them is a write. The
three anomalous situations can be described in terms of when the actions of two
transactions T1 and T2 conflict with each other: in a write-read (WR) conflict T2
reads a data object previously written by T1; we define read-write (RW) and
write-write (WW) conflicts similarly.
Reading Uncommitted Data (WR Conflicts)
The first source of anomalies is that a transaction T2 could read a database object
A that has been modified by another transaction T1, which has not yet committed.
Such a read is called a dirty read. A simple example illustrates how such a schedule
could lead to an inconsistent database state. Consider two transactions T1 and T2,
each of which, run alone, preserves database consistency: T1 transfers $100 from A to
B, and T2 increments both A and B by 6 percent (e.g., annual interest is deposited
into these two accounts). Suppose that their actions are interleaved so that (1) the
account transfer program T1 deducts $100 from account A, then (2) the interest
deposit program T2 reads the current values of accounts A and B and adds 6 percent
interest to each, and then (3) the account transfer program credits $100 to account B.
The corresponding schedule, which is the view the DBMS has of this series of events,
is: T1: R(A), W(A); T2: R(A), W(A), R(B), W(B), Commit; T1: R(B), W(B), Commit.

Unrepeatable Reads (RW Conflicts)


The second way in which anomalous behavior could result is that a transaction T2
could change the value of an object A that has been read by a transaction T1, while
T1 is still in progress. This situation causes two problems.
 If T1 tries to read the value of A again, it will get a different result, even though it
has not modified A in the meantime. This situation could not arise in a serial
execution of two transactions; it is called an unrepeatable read.
 Phantom problem:
A transaction retrieves a collection of objects (in SQL terms, a collection of tuples)
twice and sees different results because of insertions or deletions of tuples, even
though it does not modify any of these tuples itself. This phenomenon is called the
phantom problem. One way to overcome this is to lock the entire table.

Overwriting Uncommitted Data (WW Conflicts)


A transaction T2 could overwrite the value of an object A, which has already been
modified by a transaction T1, while T1 is still in progress. Even if T2 does not read
the value of A written by T1, a potential problem exists, as the following example
illustrates.

Suppose that Harry and Larry are two employees, and their salaries must be
kept equal. Transaction T1 sets their salaries to $2000 and transaction T2 sets
their salaries to $1000. If we execute these in the serial order T1 followed by
T2, both receive the salary $1000; the serial order T2 followed by T1 gives each
the salary $2000. Either of these is acceptable from a consistency standpoint.
Neither transaction reads a salary value before writing it; such a write is called a
blind write.

Now, consider the following interleaving of the actions of T1 and T2: T2 sets
Harry's salary to $1000, T1 sets Larry's salary to $2000, T2 sets Larry's salary
to $1000 and commits, and finally T1 sets Harry's salary to $2000 and commits.
The result is not identical to the result of either of the two possible serial
executions, and the interleaved schedule is therefore not serializable. It violates
the desired consistency criterion that the two salaries must be equal.
The problem is that we have a lost update. The first transaction to commit,
T2, overwrote Larry's salary as set by T1. In the serial order T2 followed by
T1, Larry's salary should reflect T1's update rather than T2's, but T1's update
is 'lost'.

Types of Serializability-

Serializability is mainly of two types-


1. Conflict Serializability
2. View Serializability

1. Conflict Serializability
If a given schedule can be converted into a serial schedule by swapping its
non-conflicting operations, then it is called a conflict serializable schedule.

What are conflicting operations?


Two operations are called conflicting operations if all of the following
conditions hold true for them-
 Both the operations belong to different transactions
 Both the operations are on same data item
 At least one of the two operations is a write operation
Example-
Consider a schedule in which transaction T1 performs W1(A) and transaction T2
later performs R2(A). In this schedule, W1(A) and R2(A) are conflicting operations,
because all the above conditions hold true for them.
How to check if a given schedule is conflict serializable or not-
Step-01:
Find and list all the conflicting operations.
Step-02:
Start creating a precedence graph by drawing one node for each transaction.
Step-03:
Draw an edge for each conflict pair such that if Xi(V) and Yj(V) form a conflict
pair, then draw an edge from Ti to Tj. This ensures that Ti gets executed before Tj.
Step-04:
Check if there is any cycle formed in the graph. If there is no cycle found, then the
schedule is conflict serializable otherwise not.
Note-
 The corresponding serial schedule(s) can be found by topological sorting of the
acyclic precedence graph so obtained.
 There can be more than one such serial schedule. A sketch of the whole test in
code is given below.
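The four steps can be sketched in Python as follows. The representation of a schedule as (transaction, operation, item) tuples is an assumption made for illustration.

def conflict_serializable(schedule):
    # Steps 1-3: find conflict pairs and build the precedence graph.
    edges = set()
    n = len(schedule)
    for i in range(n):
        t1, op1, x1 = schedule[i]
        for j in range(i + 1, n):
            t2, op2, x2 = schedule[j]
            # conflicting: different txns, same item, at least one write
            if t1 != t2 and x1 == x2 and "W" in (op1, op2):
                edges.add((t1, t2))      # t1 must precede t2
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)

    # Step 4: the schedule is conflict serializable iff the graph is acyclic.
    def has_cycle(node, visiting, done):
        visiting.add(node)
        for nxt in graph.get(node, ()):
            if nxt in visiting or (nxt not in done and has_cycle(nxt, visiting, done)):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    done = set()
    return not any(has_cycle(t, set(), done) for t in graph if t not in done)

# Problem-01 below: S = R1(A), R2(A), R1(B), R2(B), R3(B), W1(A), W2(B)
S = [(1, "R", "A"), (2, "R", "A"), (1, "R", "B"), (2, "R", "B"),
     (3, "R", "B"), (1, "W", "A"), (2, "W", "B")]
print(conflict_serializable(S))   # False: the cycle T1 -> T2 -> T1 exists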
PRACTICE PROBLEMS BASED ON CHECKING WHETHER THE GIVEN
SCHEDULE IS CONFLICT SERIALIZABLE OR NOT-
Problem-01:
Check whether the given schedule S is conflict serializable or not-
S : R1(A) , R2(A) , R1(B) , R2(B) , R3(B) , W1(A) , W2(B)
Solution-
Step-01:
List all the conflicting operations and determine the dependency between
transactions-
 R2(A) , W1(A) (T2 → T1)
 R1(B) , W2(B) (T1 → T2)
 R3(B) , W2(B) (T3 → T2)
Step-02:
Draw the precedence graph-
The edges are T2 → T1, T1 → T2 and T3 → T2. Since there exists a cycle
(T1 → T2 → T1) in the precedence graph, the given schedule S is not conflict
serializable.
Problem-02:
Check whether the given schedule S is conflict serializable and recoverable or not-

Solution-

Checking whether schedule S is conflict serializable or not-


Step-01:
List all the conflicting operations and determine the dependency between
transactions-
 R2(X) , W3(X) (T2 → T3)
 R2(X) , W1(X) (T2 → T1)
 W3(X) , W1(X) (T3 → T1)
 W3(X) , R4(X) (T3 → T4)
 W1(X) , R4(X) (T1 → T4)
 W2(Y) , R4(Y) (T2 → T4)

Step-02:
Draw the precedence graph-

Since there exists no cycle in the precedence graph, the given schedule S
is conflict serializable.

View Serializability
o A schedule is view serializable if it is view equivalent to a serial schedule.
o If a schedule is conflict serializable, then it is also view serializable.
o A schedule that is view serializable but not conflict serializable contains blind writes.
View Equivalent
Two schedules S1 and S2 are said to be view equivalent if they satisfy the following
conditions:
1. Initial Read
The initial read of both schedules must be the same. Suppose there are two schedules
S1 and S2. If, in schedule S1, a transaction T1 is reading the data item A, then in S2
transaction T1 should also be the one to read A.

The two schedules above are view equivalent because the initial read operation in S1
is done by T1, and in S2 it is also done by T1.
2. Updated Read
If, in schedule S1, Ti is reading A, which is updated by Tj, then in S2 also Ti should
read A as updated by Tj.
The two schedules above are not view equivalent because, in S1, T3 is reading A as
updated by T2, while in S2, T3 is reading A as updated by T1.
3. Final Write
The final write must be the same in both schedules. If, in schedule S1, a transaction
T1 performs the final write on A, then in S2 the final write operation should also be
done by T1.

The two schedules above are view equivalent because the final write operation in S1
is done by T3, and in S2 the final write operation is also done by T3.
Example:
Consider a schedule S involving three transactions T1, T2 and T3. With 3
transactions, the total number of possible serial schedules = 3! = 6:
S1 = <T1 T2 T3>
S2 = <T1 T3 T2>
S3 = <T2 T3 T1>
S4 = <T2 T1 T3>
S5 = <T3 T1 T2>
S6 = <T3 T2 T1>
Taking the first serial schedule, S1:
Step 1: Updated Read
In both schedules S and S1 there is no read other than the initial read, so this
condition does not need to be checked.
Step 2: Initial Read
The initial read operation in S is done by T1, and in S1 it is also done by T1.
Step 3: Final Write
The final write operation in S is done by T3, and in S1 it is also done by T3. So S
and S1 are view equivalent.
The first serial schedule S1 satisfies all three conditions, so we do not need to check
any other schedule.
Hence, the view equivalent serial schedule is:
T1 → T2 → T3

CONCURRENCY CONTROL WITH LOCKING METHODS


When there are multiple transactions executing at the same time on the same data,
the result of a transaction may be affected. Hence it is necessary to maintain the
order of execution of those transactions. In addition, concurrency control should not
alter the ACID properties of a transaction.
In order to manage the concurrent access of transactions, the following protocols
are introduced.
 Lock Based Protocol
 Basic 2-PL
 Conservative 2-PL
 Strict 2-PL
 Rigorous 2-PL
 Graph Based Protocol
 Time-Stamp Ordering Protocol
 Multiple Granularity Protocol
 Multi-version Protocol
Lock Granularity
Lock granularity indicates the level at which locks are applied. Locking can take
place at the following levels: database, table, page, row or even field. Row-level
locking provides a high degree of concurrency.
LOCK TYPES
Regardless of the level of locking, the DBMS may use different lock types:
1. Binary Locks
Binary locks have only two states: locked (1) or unlocked (0).
2. Shared/Exclusive Locks
An exclusive lock exists when access is reserved specifically for the
transaction that locked the object. The exclusive lock must be used when the
potential for conflict exists.
A shared lock exists when concurrent transactions are granted read access on
the basis of a common lock. A shared lock produces no conflict as long as all the
concurrent transactions are read-only.
Most multiuser DBMSs automatically initiate and enforce locking procedures. All
lock information is managed by a lock manager.
Lock Based Protocols –
A lock is a variable associated with a data item that describes the status of the data
item with respect to the possible operations that can be applied to it. Locks
synchronize access by concurrent transactions to the database items. This protocol
requires that all data items be accessed in a mutually exclusive manner.
Simple Locks:
There are two common types of lock used, and some terminology followed, in this
protocol.
1. Shared Lock (S): Also known as a read-only lock. As the name suggests, it can
be shared between transactions, because while holding this lock a transaction
does not have permission to update the data item. An S-lock is
requested using the lock-S instruction.
2. Exclusive Lock (X): The data item can be both read and written. This lock is
exclusive and cannot be held simultaneously with any other lock on the same
data item. An X-lock is requested using the lock-X instruction.
Lock Compatibility Matrix –

          S      X
    S    Yes     No
    X    No      No

 A transaction may be granted a lock on an item if the requested lock is
compatible with the locks already held on the item by other
transactions.
 Any number of transactions can hold shared locks on an item, but if any
transaction holds an exclusive (X) lock on the item, no other transaction may
hold any lock on the item.
 If a lock cannot be granted, the requesting transaction is made to wait until all
incompatible locks held by other transactions have been released. The
lock is then granted.
Upgrade / Downgrade locks: A transaction that holds a lock on an item A is
allowed, under certain conditions, to change the lock state from one state to another.
Upgrade: S(A) can be upgraded to X(A) if Ti is the only transaction holding the
S-lock on element A.
Downgrade: We may downgrade X(A) to S(A) when we no longer want to write to
data item A. As we were already holding the X-lock on A, we need not check any
conditions.
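The grant rule and the S-to-X upgrade can be sketched as a toy lock manager in Python; the class and method names are illustrative assumptions, not any real DBMS API.

# S is compatible only with S; X is compatible with nothing.
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

class LockManager:
    def __init__(self):
        self.held = {}   # item -> list of (txn, mode)

    def can_grant(self, txn, item, mode):
        # must be compatible with every lock held by OTHER transactions
        return all(COMPATIBLE[(m, mode)] or holder == txn
                   for holder, m in self.held.get(item, []))

    def lock(self, txn, item, mode):
        if not self.can_grant(txn, item, mode):
            return False                  # requester must wait
        locks = self.held.setdefault(item, [])
        locks[:] = [(t, m) for t, m in locks if t != txn]   # upgrade path
        locks.append((txn, mode))
        return True

lm = LockManager()
assert lm.lock("T1", "A", "S")        # granted
assert lm.lock("T2", "A", "S")        # any number of S-locks may coexist
assert not lm.lock("T2", "A", "X")    # upgrade blocked: T1 also holds S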
Simple lock-based protocols (binary locking) have their own disadvantage:
they do not guarantee serializability. A schedule may follow the preceding rules
and still be non-serializable.
Problem With Simple Locking…
Consider the partial schedule:

      T1              T2
1     lock-X(B)
2     read(B)
3     B := B − 50
4     write(B)
5                     lock-S(A)
6                     read(A)
7                     lock-S(B)
8     lock-X(A)
9     ……              ……

Neither transaction can proceed: T2 waits for T1 to release the X-lock on B, while
T1 waits for T2 to release the S-lock on A.

DEADLOCKS
In a database, a deadlock is an unwanted situation in which two or more transactions
are waiting indefinitely for one another to give up locks. Deadlock is said to be one
of the most feared complications in a DBMS, as it can bring the whole system to a halt.
Example – let us understand the concept of deadlock with an example:
Suppose transaction T1 holds a lock on some rows in the Students table and needs to
update some rows in the Grades table. Simultaneously, transaction T2 holds locks on
the very rows in the Grades table that T1 needs to update, but needs to update the
rows in the Students table held by transaction T1.
Now the main problem arises: transaction T1 will wait for transaction T2 to give up
its locks, and similarly transaction T2 will wait for transaction T1 to give up its locks.
As a consequence, all activity comes to a halt and remains at a standstill forever
unless the DBMS detects the deadlock and aborts one of the transactions.
 Deadlock avoidance:
It is always better to avoid a deadlock than to restart or abort transactions
after one occurs. The deadlock avoidance method is suitable for smaller
databases, whereas the deadlock prevention method is suitable for larger
databases.
One method of avoiding deadlock is using application-consistent logic. In the
above example, transactions that access Students and Grades should
always access the tables in the same order. In this way, in the scenario
described above, transaction T1 simply waits for transaction T2 to release the
lock on Grades before it begins. When transaction T2 releases the lock,
transaction T1 can proceed freely.
Another method for avoiding deadlock is to apply both row-level locking
and the READ COMMITTED isolation level. However, this does not
guarantee the removal of deadlocks completely.

 Deadlock detection:
When a transaction waits indefinitely to obtain a lock, the database
management system should detect whether the transaction is involved in a
deadlock or not.
The wait-for graph is one of the methods for detecting a deadlock situation.
This method is suitable for smaller databases. In this method a graph is drawn
based on the transactions and their locks on resources. If the graph
contains a closed loop, i.e. a cycle, then there is a deadlock.
For the scenario described above, the wait-for graph contains the edges
T1 → T2 and T2 → T1, which form a cycle.
The DBMS periodically tests the database for deadlocks.
If a deadlock is found, one of the transactions is aborted (rolled back and
restarted) and the other transactions continue.

 Deadlock prevention:
One technique requires that a transaction obtain all of the locks it needs
before it can be executed. This avoids the rollback of conflicting
transactions by requiring that locks be obtained in succession.
For large databases, the deadlock prevention method is suitable. A deadlock
can be prevented if the resources are allocated in such a way that a
deadlock can never occur. The DBMS analyses the operations to determine
whether they can create a deadlock situation; if they can, that transaction is
never allowed to be executed.
The deadlock prevention mechanism proposes two schemes:
 Wait-Die Scheme –
In this scheme, if a transaction requests a resource that is locked by another
transaction, the DBMS checks the timestamps of both transactions and
allows the older transaction to wait until the resource is available.
Suppose there are two transactions T1 and T2, and let the timestamp of any
transaction T be TS(T). If T2 holds a lock on some resource and T1 requests
that resource, the DBMS performs the following actions:
It checks whether TS(T1) < TS(T2). If T1 is the older transaction, it is
allowed to wait until the resource is available; that is, if a younger
transaction has locked a resource and an older transaction is waiting for it,
the older transaction is allowed to wait. Conversely, if T1 is the older
transaction and holds the resource, and the younger T2 is waiting for it,
then T2 is killed and restarted later, after a small delay, but with the same
timestamp.
This scheme allows the older transaction to wait but kills the younger
one.

 Wound-Wait Scheme –
In this scheme, if an older transaction requests a resource held by a
younger transaction, the older transaction forces the younger
transaction to abort and release the resource. The younger
transaction is restarted after a small delay, but with the same
timestamp. If a younger transaction requests a resource held by an
older one, the younger transaction is made to wait till the older one
releases it. Both schemes are sketched in code below.
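The two schemes reduce to a simple timestamp comparison, sketched here in Python. A smaller timestamp means an older transaction; the function and argument names are illustrative assumptions.

def wait_die(ts_requester, ts_holder):
    # Older requester waits; younger requester dies and is restarted
    # later with its ORIGINAL timestamp.
    return "wait" if ts_requester < ts_holder else "die"

def wound_wait(ts_requester, ts_holder):
    # Older requester wounds (aborts) the younger holder;
    # younger requester simply waits.
    return "wound holder" if ts_requester < ts_holder else "wait"

# T1 entered the system at time 5, T2 at time 8, so T1 is older.
print(wait_die(5, 8))     # 'wait'         : old T1 waits for young T2
print(wait_die(8, 5))     # 'die'          : young T2 is killed and restarted
print(wound_wait(5, 8))   # 'wound holder' : old T1 preempts young T2
print(wound_wait(8, 5))   # 'wait'         : young T2 waits for old T1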

Lock Based Protocols or concurrency control with locking methods:


1. Simplistic Lock Protocol: As the name suggests, it is the simplest way of locking
data during a transaction. This protocol requires every transaction to obtain a lock
on the data before it inserts, updates or deletes it. After completing the transaction,
it unlocks the data.
2. Pre-claiming Protocol: In this protocol, the transaction is evaluated to list all the
data items on which it needs locks. It then requests the DBMS for locks on all those
data items before the transaction begins. If the DBMS grants the locks on all the
data items, this protocol allows the transaction to begin. Once the transaction is
complete, it releases all the locks. If any of the locks are not granted by the DBMS,
the transaction rolls back and waits until all the locks are granted.
For example, if we have to calculate the total marks of 3 subjects, this protocol will
evaluate the transaction and list the locks needed on subject1 marks, subject2 marks
and subject3 marks. Once it gets all the locks, it starts the transaction.
3. Two Phase Locking Protocol (2PL): In this type of protocol, as the transaction
begins to execute, it starts requesting the locks that it needs. It goes on requesting
locks as and when they are needed. Hence it has a growing phase of locks. At some
stage it will hold all the locks it needs. Once the transaction is complete, it goes on
releasing the locks. Hence it has a shrinking phase of locks. Thus this protocol has
two phases – a growing phase, in which locks are only acquired, and a shrinking
phase, in which locks are only released.

For example, if we have to calculate the total marks of 3 subjects, this protocol will
go on asking for locks on subject1 marks, subject2 marks and then subject3
marks. As and when it gets a lock on the subject marks, it reads the marks; it does
not wait till all the locks are received. Then it calculates the total. Once the
calculation is complete, it releases the locks on subject3 marks, subject2 marks and
subject1 marks.
In this protocol, if we need an exclusive lock on any data item for writing, we may
first get a shared lock for reading and then upgrade it to an exclusive lock.
4. Strict Two Phase Locking (Strict 2PL): This protocol is similar to 2PL in the
first phase. However, it does not release each lock as soon as it is no longer
required; it waits till the whole transaction completes and commits, and then
releases all the locks at once. This protocol therefore does not have a gradual
shrinking phase of lock release.

In the example of calculating the total marks of 3 subjects, locks are acquired in the
growing phase of the transaction, and once it has received all the locks, it executes
the transaction. Once the transaction is fully complete, it releases all the locks together.
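The marks example can be sketched in Python to make the two phases visible. The lock/unlock/read callables stand in for lock-manager and storage calls and are assumptions for illustration.

def total_marks_2pl(lock, unlock, read):
    # Growing phase: locks are only acquired, never released.
    lock("marks1"); m1 = read("marks1")
    lock("marks2"); m2 = read("marks2")
    lock("marks3"); m3 = read("marks3")
    total = m1 + m2 + m3
    # Shrinking phase: locks are only released, never acquired.
    # Under STRICT 2PL these releases would instead all happen
    # together, only after the transaction commits.
    unlock("marks3"); unlock("marks2"); unlock("marks1")
    return total

marks = {"marks1": 70, "marks2": 80, "marks3": 90}
noop = lambda item: None            # stand-in lock-manager calls
print(total_marks_2pl(noop, noop, marks.get))   # 240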
Time Stamp Based Protocol, or concurrency control with the timestamp method:
As we have seen above, a lock-based protocol acquires locks at the time of
execution. In this method, by contrast, as soon as a transaction is created it is
assigned an order: the ascending order of transaction creation. Older transactions
are given priority to execute first.
This protocol uses the system time or a logical counter to determine the timestamp
of a transaction.
Suppose there are two transactions T1 and T2. Suppose T1 entered the
system at clock time 0005 and T2 entered the system at clock time 0008. Priority
will be given to T1 to execute first, as it entered the system first.
In addition to the timestamp of a transaction, this protocol also maintains the
timestamps of the last 'read' and 'write' operations on each data item. Based on the
timestamp of the transaction and the timestamps of the data item it is accessing, the
timestamp ordering protocol is defined.
According to this protocol:
 If a transaction T requests to read data item X, then:
If there is an active write operation on X when transaction T requests X, then
transaction T is rejected. If the write is complete, or there is no ongoing write
operation on X, then T is executed.
For example, if there is an update in progress on marks1 in the MARKS table and
meanwhile there is a request to read marks1, then the read of marks1 is not
performed, because an update is happening on marks1. If the update on marks1
completed long ago, or has just completed, and there is a request to read marks1,
then the system will allow the read.
 If a transaction T requests to write data item X, then:
This rule describes the write operation. If there is an active read or write
operation on X when transaction T requests to write X, then the
transaction is rejected. If there is no active read/write operation on X, then the
transaction is executed.
Suppose T1 is reading marks1 from the MARKS table. Meanwhile, transaction T2
begins and tries to update marks1 in MARKS. Then transaction T2 is rejected and
rolled back.
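The textbook form of these rules keeps, for each data item, the largest timestamps of the transactions that read and wrote it. The sketch below follows that standard formulation, which the description above paraphrases; the dictionaries and names are illustrative.

r_ts, w_ts = {}, {}    # largest read/write timestamp seen per item

def read(txn_ts, x):
    if txn_ts < w_ts.get(x, 0):
        return "reject"                    # x was already written by a younger txn
    r_ts[x] = max(r_ts.get(x, 0), txn_ts)
    return "execute"

def write(txn_ts, x):
    if txn_ts < r_ts.get(x, 0) or txn_ts < w_ts.get(x, 0):
        return "reject"                    # would invalidate a younger txn's view
    w_ts[x] = txn_ts
    return "execute"

print(read(5, "marks1"))    # execute: T1 (ts = 5) reads marks1
print(write(8, "marks1"))   # execute: T2 (ts = 8) updates marks1
print(read(5, "marks1"))    # reject : T1 must not see the younger write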

Concurrency control with time stamping methods


The time stamping approach to scheduling concurrent transactions assigns
a global, unique timestamp to each transaction. The timestamp value produces an
explicit order in which transactions are submitted to the DBMS. Timestamps must
have two properties: uniqueness and monotonicity. Uniqueness ensures that no two
equal timestamp values can exist; monotonicity ensures that timestamp values
always increase.
The disadvantage of this approach is that each value stored in the database
requires two additional timestamp fields: one for the last time the field was read and
one for the last update. Timestamping thus increases memory needs and the
database's processing overhead.

Concurrency Control with Optimistic methods


Each transaction moves through two or three phases, as follows:
 During the read phase, the transaction reads the database, executes the needed
computations, and makes its updates to a private copy of the database values.
All update operations of the transaction are recorded in a temporary update
file, which is not accessed by the remaining transactions.
 During the validation phase, the transaction is validated to ensure that the
changes made will not affect the integrity and consistency of the database. If
the validation test is positive, the transaction goes to the write phase. If the
validation test is negative, the transaction is restarted and the changes are
discarded.
 During the write phase, the changes are permanently applied to the database.
The optimistic approach is acceptable for mostly read or query database systems
that require few update transactions.
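The three phases can be sketched as follows; the read-set/write-set bookkeeping and the class name are illustrative assumptions.

class OptimisticTxn:
    def __init__(self, db):
        self.db, self.read_set, self.updates = db, set(), {}

    def read(self, x):                         # read phase
        self.read_set.add(x)
        return self.updates.get(x, self.db[x])

    def write(self, x, v):                     # updates go to a private copy
        self.updates[x] = v

    def validate(self, committed_write_sets):  # validation phase
        return all(self.read_set.isdisjoint(ws)
                   for ws in committed_write_sets)

    def commit(self, committed_write_sets):
        if not self.validate(committed_write_sets):
            return False                       # restart; discard private copy
        self.db.update(self.updates)           # write phase
        return True

db = {"A": 1}
t = OptimisticTxn(db)
t.write("A", t.read("A") + 1)
print(t.commit([]), db["A"])   # True 2: no overlapping writer, so it commits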
TWO-PHASE COMMIT PROTOCOL
The two-phase commit protocol guarantees that if a portion of a transaction
operation cannot be committed, all changes made at the other sites participating in the
transaction will be undone to maintain a consistent database state.
Each DP maintains its own transaction log. The two-phase commit protocol
requires that the transaction entry log for each DP be written before the database
fragment is actually updated. Therefore, the two-phase commit protocol requires a
DO-UNDO-REDO protocol and a write-ahead protocol.
The DO-UNDO-REDO protocol is used by the DP to roll back and/or roll
forward transactions with the help of the system's transaction log entries. The
DO-UNDO-REDO protocol defines three types of operations:
• DO performs the operation and records the “before” and “after” values in the
transaction log.
• UNDO reverses an operation, using the log entries written by the DO portion of
the sequence.
• REDO redoes an operation, using the log entries written by the DO portion of
the sequence.
To ensure that the DO, UNDO, and REDO operations can survive a system
crash while they are being executed, a write-ahead protocol is used. The write-ahead
protocol ensures that transaction logs are always written before any database
data are actually updated, and forces the log entry to be written to permanent
storage before the actual operation takes place.
The two-phase commit protocol defines the operations between two types of
nodes: the coordinator and one or more subordinates. The participating nodes agree
on a coordinator. Generally, the coordinator role is assigned to the node that initiates
the transaction. However, different systems implement various, more sophisticated
election methods. The protocol is implemented in two phases:
Phase 1: Preparation
1. The coordinator sends a PREPARE TO COMMIT message to all subordinates.
2. The subordinates receive the message; write the transaction log, using the write-
ahead protocol; and send an acknowledgment (YES/PREPARED TO COMMIT or
NO/NOT PREPARED) message to the coordinator.
3. The coordinator makes sure that all nodes are ready to commit, or it aborts the
action.
If all nodes are PREPARED TO COMMIT, the transaction goes to phase 2. If one or
more nodes reply NO or NOT PREPARED, the coordinator broadcasts an ABORT
message to all subordinates.
Phase 2: The Final COMMIT
1. The coordinator broadcasts a COMMIT message to all subordinates and waits
for the replies.
2. Each subordinate receives the COMMIT message, and then updates the
database using the DO protocol.
3. The subordinates reply with a COMMITTED or NOT COMMITTED message to
the coordinator.
If one or more subordinates did not commit, the coordinator sends an ABORT
message, thereby forcing them to UNDO all changes.
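The coordinator side of the protocol can be sketched as below; the prepare() and decide() methods stand in for the messages exchanged with the subordinate DPs and are illustrative assumptions.

def two_phase_commit(subordinates):
    # Phase 1: preparation
    votes = [s.prepare() for s in subordinates]     # each DP logs, then votes
    if not all(v == "YES" for v in votes):
        for s in subordinates:
            s.decide("ABORT")                       # forces UNDO everywhere
        return "ABORTED"
    # Phase 2: the final COMMIT
    for s in subordinates:
        s.decide("COMMIT")                          # each DP applies DO
    return "COMMITTED"

class DP:                          # stub subordinate for illustration
    def __init__(self, vote): self.vote = vote
    def prepare(self): return self.vote
    def decide(self, outcome): print(outcome)

print(two_phase_commit([DP("YES"), DP("YES")]))   # COMMIT, COMMIT, COMMITTED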
Crash Recovery
A DBMS may be an extremely complex system with many transactions being
executed every second. The durability and robustness of a DBMS depend on its
complex architecture and its underlying hardware and system software. If it fails or
crashes amid transactions, it is expected that the system would follow some sort of
algorithm or technique to recover the lost data.
DATABASE RECOVERY IN DBMS AND ITS TECHNIQUES
Classification of failure:
To see where the problem has occurred, we generalize failures into the following
categories:
 Transaction failure
 System crash
 Disk failure

Types of Failure
1. Transaction failure: A transaction must abort when it fails to execute or
when it reaches a point from which it cannot proceed any further. This is
called transaction failure, where only a few transactions or processes are
affected. The reasons for transaction failure are:
 Logical errors: where a transaction cannot complete because of a code
error or an internal error condition.
 System errors: where the database system itself terminates an active
transaction because the DBMS is not able to execute it, or has to stop
because of some system condition. For example, in case of deadlock or
resource unavailability, the system aborts an active transaction.
2. System crash: There are problems, external to the system, that may cause the
system to stop abruptly and crash. For instance, interruptions in the power
supply may cause the failure of the underlying hardware or software.
Examples may include operating system errors.
3. Disk failure: In the early days of technology evolution, it was a common
problem that hard-disk drives or storage drives used to fail frequently.
Disk failures include the formation of bad sectors, unreachability of the
disk, a disk head crash, or any other failure that destroys all or part of the
disk storage.
Storage structure:
The classification of storage structures is as explained below:

Classification of Storage
1. Volatile storage: As the name suggests, volatile storage cannot survive
system crashes. Volatile storage devices are placed very close to the CPU;
usually, they are embedded on the chipset itself. For instance, main
memory and cache memory are examples of volatile storage. They are fast
but can store only a small amount of data.
2. Non-volatile storage: These memories are designed to survive system
crashes. They are huge in data storage capacity, but slower in
accessibility. Examples include hard disks, magnetic tapes, flash
memory, and non-volatile (battery-backed-up) RAM.
Log-based Recovery
A log is a sequence of records, which maintains a record of the actions
performed by a transaction. It is important that the logs are written prior to
the actual modification and stored on a stable storage medium, which is failsafe.

Log-based recovery works as follows −

 The log file is kept on a stable storage medium.

 When a transaction enters the system and starts execution, it writes a log record
about it:
<Tn, Start>

 When the transaction modifies an item X from value V1 to V2, it writes a log
record of the form:
<Tn, X, V1, V2>

 When the transaction finishes, it logs:
<Tn, Commit>

Recovery with Concurrent Transactions


When more than one transaction is being executed in parallel, the logs are
interleaved. At the time of recovery, it would become hard for the recovery
system to backtrack all the logs and then start recovering. To ease this situation,
most modern DBMSs use the concept of 'checkpoints'.

Checkpoint
Keeping and maintaining logs in real time and in a real environment may fill
all the memory space available in the system. As time passes, the log file may
grow too big to be handled at all. Checkpointing is a mechanism where all the
previous logs are removed from the system and stored permanently on a
storage disk. A checkpoint declares a point before which the DBMS was in a
consistent state and all transactions were committed.

Recovery
When a system with concurrent transactions crashes and recovers, it behaves
in the following manner −

 The recovery system reads the logs backwards from the end to the last checkpoint.

 It maintains two lists, an undo-list and a redo-list.

 If the recovery system sees a log with <Tn, Start> and <Tn, Commit> or just <Tn,
Commit>, it puts the transaction in the redo-list.

 If the recovery system sees a log with <Tn, Start> but no commit or abort log,
it puts the transaction in the undo-list.

All the transactions in the undo-list are then undone and their logs are
removed. All the transactions in the redo-list are redone and their logs are
retained.
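The backward scan and the two lists can be sketched in Python; the tuple-based log format is an assumption made for illustration.

def build_lists(log):
    started, committed, aborted = set(), set(), set()
    for txn, event in reversed(log):    # read backwards to the checkpoint
        if event == "Checkpoint":
            break
        if event == "Commit":
            committed.add(txn)
        elif event == "Abort":
            aborted.add(txn)
        elif event == "Start":
            started.add(txn)
    redo_list = committed
    undo_list = started - committed - aborted
    return undo_list, redo_list

log = [(None, "Checkpoint"), ("T1", "Start"), ("T1", "Commit"),
       ("T2", "Start")]
print(build_lists(log))   # ({'T2'}, {'T1'}): undo T2, redo T1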

Shadow paging is an alternative to log-based recovery techniques, with both
advantages and disadvantages. It may require fewer disk accesses, but it is hard to
extend paging to allow multiple concurrent transactions. The scheme is very similar
to the paging schemes used by the operating system for memory management.
The idea is to maintain two page tables during the life of a transaction: the current
page table and the shadow page table. When the transaction starts, both tables are
identical. The shadow page table is never changed during the life of the transaction.
The current page table is updated with each write operation. Each table entry points
to a page on the disk. When the transaction is committed, the shadow page table
entry becomes a copy of the current page table entry and the disk block with the old
data is released. If the shadow page table is stored in non-volatile memory and a
system crash occurs, the shadow page table is copied to the current page table. This
guarantees that the shadow page table will point to the database pages corresponding
to the state of the database prior to any transaction that was active at the time of the
crash, making aborts automatic.

There are drawbacks to the shadow-page technique:

1. Commit overhead. The commit of a single transaction using shadow paging
requires multiple blocks to be output: the current page table, the actual data, and
the disk address of the current page table. Log-based schemes need to output only
the log records.
2. Data fragmentation. Shadow paging causes database pages to change location, so
they are no longer contiguous.
3. Garbage collection. Each time a transaction commits, the database pages
containing the old version of the data changed by the transaction become
inaccessible. Such pages are considered garbage, since they are not part of
the free space and do not contain any usable information. Periodically it is
necessary to find all of the garbage pages and add them to the list of free pages.
This process is called garbage collection and imposes additional overhead and
complexity on the system.

Media Recovery:
Media recovery is based on periodically making a copy of the database. Because
copying a large database object such as a file can take a long time, and the DBMS
must be allowed to continue with its operations in the meantime, creating a copy is
handled in a manner similar to taking a fuzzy checkpoint. When a database object
such as a file or a page is corrupted, the copy of that object is brought up-to-date by
using the log to identify and reapply the changes of committed transactions and undo
the changes of uncommitted transactions (as of the time of the media recovery
operation).
The begin_checkpoint LSN of the most recent complete checkpoint is recorded along
with the copy of the database object, to minimize the work in reapplying changes of
committed transactions. We compare the smallest recLSN of a dirty page in the
corresponding end_checkpoint record with the LSN of the begin_checkpoint record,
and call the smaller of these two LSNs I.
We observe that the actions recorded in all log records with LSNs less than I must be
reflected in the copy. Thus, only log records with LSNs greater than I need be
reapplied to the copy.
Finally, the updates of transactions that are incomplete at the time of media recovery,
or that were aborted after the fuzzy copy was completed, need to be undone to ensure
that the page reflects only the actions of committed transactions. The set of such
transactions can be identified in the Analysis pass, and we omit the details.
ARIES recovery algorithm
Algorithms for Recovery and Isolation Exploiting Semantics, or ARIES, is a recovery
algorithm designed to work with a no-force, steal database approach.
ARIES uses a steal/no-force approach for writing, and it is based on three concepts:
1. Write-ahead logging: Any change to a database object is first recorded in the log; the
record in the log must be written to stable storage before the change to the database object is
written to disk.
2. Repeating history during redo:
ARIES will retrace all actions of the database system prior to the crash to reconstruct the database
state when the crash occurred. Transactions that were uncommitted at the time of the crash (active
transactions) are undone.
3. Logging changes during undo: Changes made to the database while undoing a
transaction are themselves logged. This prevents ARIES from repeating completed
undo operations if a failure occurs during recovery, which causes a restart of the
recovery process.
Before describing the procedure, we need to elaborate on some concepts:
1. Log sequence number (LSN)
A unique, monotonically increasing number used to identify each log record.
2. Dirty page table
Refers to pages whose updated versions are placed in main memory while the disk
versions are not yet updated.
A table of such pages is maintained, which is useful in reducing unnecessary REDO
operations.
3. Fuzzy checkpoints
A type of checkpoint that allows new transactions to be processed after the log has
been updated, without having to update the database itself.
The ARIES recovery procedure consists of three main steps:
1. Analysis
The analysis step identifies the dirty (updated) pages in the buffer and the set of
transactions active at the time of the crash. The appropriate point in the log where
the REDO operation should start is also determined.
2. REDO
The REDO phase actually reapplies updates from the log to the database. Generally,
the REDO operation is applied only to committed transactions. However, in ARIES,
this is not the case. Certain information in the ARIES log provides the start point for
REDO, from which REDO operations are applied until the end of the log is reached.
In addition, information stored by ARIES and in the data pages allows ARIES to
determine whether the operation to be redone has actually been applied to the
database, and hence need not be reapplied. Thus only the necessary REDO
operations are applied during recovery.
3. UNDO
During the UNDO phase, the log is scanned backwards and the operations of
transactions that were active at the time of the crash are undone in reverse order.
The information needed for ARIES to accomplish its recovery procedure includes
the log, the Transaction Table, and the Dirty Page Table. In addition, checkpointing
is used. These two tables are maintained by the transaction manager and written to
the log during checkpointing.
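A much-simplified sketch of the three passes is given below. Real ARIES tracks LSNs inside the pages, chains log records per transaction, and writes compensation log records during undo; the dictionary-based log used here is purely an illustrative assumption.

def aries_restart(log, dirty_page_table, txn_table):
    # 1. Analysis: the redo pass starts at the smallest recLSN of a dirty page.
    redo_start = min(dirty_page_table.values(), default=0)
    # 2. Redo: repeat history, reapplying updates whether committed or not.
    for lsn, rec in enumerate(log):
        if lsn >= redo_start and rec.get("page") in dirty_page_table \
                and "redo" in rec:
            rec["redo"]()
    # 3. Undo: roll back loser transactions, scanning the log backwards.
    losers = {t for t, status in txn_table.items() if status == "active"}
    for rec in reversed(log):
        if rec.get("txn") in losers and "undo" in rec:
            rec["undo"]()        # a real system would log a CLR here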
