Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Concepts
3
How things can go wrong?
Suppose we have a table of bank accounts
which contains the balance of the account.
A deposit of $50 to a particular account
# 1234 would be written as:
update Accounts
set balance = balance + $50
where account#= 1234;
5
A bad concurrent execution
But only one action (e.g. a read or a write)
can happen at a time, and there are a
variety of ways in which the two deposits
could be simultaneously executed:
Deposit 1 Deposit 2
BAD! read(X.bal)
read(X.bal)
time X.bal := X.bal + $50
X.bal:= X.bal + $10
write(X.bal)
write(X.bal)
6
A good execution
The previous execution would have been
fine if the two accounts were different
(i.e. one were X and one were Y).
The following execution is a serial
execution, and executes one transaction
after the other:
Deposit 1 Deposit 2
read(X.bal)
GOOD! X.bal := X.bal + $50
write(X.bal)
time read(X.bal)
X.bal:= X.bal + $10
write(X.bal) 7
Good executions
An execution is good if is it serial (i.e.
the transactions are executed one after
the other) or serializable (i.e. equivalent to
some serial execution)
Deposit 1 Deposit 3
read(X.bal)
read(Y.bal)
X.bal := X.bal + $50
Y.bal:= Y.bal + $10
write(X.bal)
write(Y.bal)
This execution is equivalent to executing
Deposit 1 then Deposit 3, or vice versa. 8
Motivation for Concurrent Executions
Multiple transactions are allowed to run concurrently
in the system.
Ensuring transaction isolation while permitting
concurrent execution is difficult but necessary for
performance reasons.
increased processor and disk utilization, leading
to better transaction throughput (the ave. no. of
transactions completed in a given time): one transaction can
be using the CPU while another is reading from or
writing to the disk.
reduced average response time for transactions
(average time taken to complete a transaction): short
transactions need not wait behind long ones.
9
SIMPLE MODEL OF A DATABASE
10
SIMPLE MODEL OF A DATABASE
(for purposes of discussing transactions)
read_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk
block is not already in some main memory buffer).
3. Copy item X from the buffer to the program variable named
X.
13
Additional Operations In Transaction Management
For recovery purposes, the recovery manager keeps track of the
following operations:
begin_transaction: marks the beginning of transaction execution.
read or write: specify read or write operations on the database items
executed as a part of a transaction.
end_transaction: specifies that read and write transaction operations
have ended and marks the end limit of transaction execution.
commit_transaction: signals a successful end of the transaction so
that any changes (updates) executed by the transaction can be
safely committed to the database and will not be undone.
rollback (or abort): signals that the transaction has ended
unsuccessfully, so that any changes or effects that the
transaction may have applied to the database must be undone.
Recovery Techniques Use The Following Operations:
undo: Similar to rollback except that it applies to a single operation
rather than to a whole transaction.
redo: specifies that certain transaction operations must be redone to
ensure that all the operations of a committed transaction have
been applied successfully to the database.
14
Aborting a Transaction
If a transaction Ti is aborted, all of its updates must
be undone. Not only that, if Tj reads an object last
written by Ti, Tj must be aborted as well!
cascading aborts.
18
Schedule 1
Let T1 transfer $50 from A to B, and T2 transfer 10% of the
balance from A to B.
A serial schedule in which T1 is followed by T2:
19
Schedule 2
A serial schedule where T2 is followed by T1
20
Schedule 3
Let T1 and T2 be the transactions defined previously.
The following schedule is not a serial schedule, but it is
equivalent to Schedule 1.
22
1. The temporary update (or dirty read) problem
(WR conflicts)
26
3. Lost update problem (WW Conflicts)
27
3. Lost update problem (WW Conflicts)
sum =0;
R(X)
sum = sum+M;
---
R(X);
X = X-N;
W(X);
R(X);
sum= sum+X;
R(Y);
W(Y);
T2 reads X after N is
R(Y); substracted and reads Y
Y = Y + N; before N is added; a
W(Y) wrong summary is the
result (off by N)
29
State transition diagram for
transaction execution
Read-write
Abort Abort
TERMINATED
FAILED
30
Commit Point of a Transaction
35
Implementation of Atomicity and
Durability (Cont.)
The shadow-database scheme:
36
Transaction Support in SQL2
A single SQL statement is always considered to be atomic.
With SQL, there is no explicit Begin Transaction statement.
Transaction initiation is done implicitly when particular SQL
statements are encountered.
Every transaction must have an explicit end statement, which is
either a commit or rollback.
38
Read-write transactions
If we state read-only, then the transaction
cannot perform any updates; that is ,insert,
delete, update, and create commands cannot
be executed.
set transaction read only;
ILLEGAL! update Accounts
set balance = balance - $100
where account#= 1234; ...
Instead, we must specify that the transaction
read-write, updates (the default):
set transaction read write;
update Accounts
set balance = balance - $100
where account#= 1234; ... 39
Isolation Levels
The default isolation level is SERIALIZABLE
To signal to the system that a dirty read is
acceptable,
SET TRANSACTION
ISOLATION LEVEL REPEATABLE READ; 40
Implementing Isolation Levels
The approach for implementing these isolation levels
is to use locking:
each data item is either locked (in some mode, e.g. shared or
exclusive) or is available (no lock)
an action on a data item can be executed if the transaction
holds an appropriate lock
Appropriate locks:
before a read operation can be executed, a shared lock must
be acquired
before a write operation can be executed, an exclusive lock
must be acquired.
Locks are acquired at some level: record, page, table, etc.
41
Locking
Locks on a data item are granted based on
a lock compatibility matrix:
Current Mode of Data Item
None Shared Exclusive
42
Bad execution revisited
If the system used locking, the first bad
execution could have been avoided :
Deposit 1 Deposit 2
xlock(X)
read(X.bal)
{xlock(X) is not granted}
X.bal := X.bal + $50
write(X.bal)
release(X)
xlock(X)
read(X.bal)
X.bal:= X.bal + $10
write(X.bal)
release(X) 43
Isolation levels and locking
READ UNCOMMITTED allows queries in the
transaction to read data without acquiring
any lock. For updates, exclusive locks must
be obtained and held to end of transaction.
READ COMMITTED requires a read-lock to
be obtained for all tuples touched by
queries, but releases the locks immediately
after the read. Exclusive locks must be
obtained for updates and held to end of
transaction.
44
Isolation levels and locking, cont.
REPEATABLE READ places shared locks on
tuples retrieved by queries, holds them until
the end of the transaction. Exclusive locks
must be obtained for updates and held to end
of transaction.
SERIALIZABLE places shared locks on tuples
retrieved by queries as well as the index,
holds them until the end of the transaction.
Exclusive locks must be obtained for updates
and held to end of transaction.
Holding locks to the end of a transaction is
called strict.
45
Summary of Isolation Levels
Type of Violation
Isolation Level Dirty Read Unrepeatable Read Phantoms
REPEATABLE No No Maybe
READ
SERIALIZABLE No No No
46
Levels of Consistency in SQL-92
Serializable default
Repeatable read only committed records to be read,
repeated reads of same record must return the same
value. However, a transaction may not be serializable it
may find some records inserted by a transaction but not
find others (phantom problem).
Read committed only committed records can be read,
but successive reads of record may return different
(but committed) values.
Read uncommitted even uncommitted records may be
read.
Note: Lower degrees of consistency useful for gathering
approximate information about the database
47
Levels of Consistency in SQL-92
phantom problem:
New rows being read using the same read with a
condition
A transaction T1 may read a set of rows from a table,
perhaps based on some condition specified in the SQL
WHERE clause. Now suppose that a transaction T2
inserts a new row that also satisfies the WHERE clause
condition of T1, into the table used by T1. If T1 is
repeated, then T1 will see a row that previously did not
exist, even though it does not modify any of these tuples
itself. This is called a phantom.
In short, a transaction retrieves a collection of tuples
twice and sees different results.
48
Levels of Consistency in SQL-92
Ex: T1: find the youngest sailor whose rating = 8
T2: insert a new sailor aged 16
Suppose that the DBMS sets shared locks on every existing Sailors
row with rating = 8 for T1. This does not prevent Xact T2 from
creating a brand new row with rating=8 and setting an exclusive
lock on this row. If this new row has a smaller age value than
existing rows, T1 returns an answer that depends on when it is
executed relative to T2. However, the locking scheme imposes no
relative order on these two transactions.
This phenomenon is called the phantom problem: a transaction
retrieves a collection of tuples twice and sees different results,
even though it does not modify any of these tuples itself.
To prevent phantom, DBMS must conceptually lock all possible rows
with rating = 8 on behalf of T1. One way to do this is to lock the
entire table, at the cost of low concurency. It is possible to take
advantage of indexes to do better, but in general preventing
phantoms can have a significant impact on concurrency.
49
Weak Levels of Consistency
Some applications are willing to live with weak
levels of consistency, allowing schedules that
are not serializable
E.g. a read-only transaction that wants to get an
approximate total balance of all accounts
E.g. database statistics computed for query
optimization can be approximate
Such transactions need not be serializable with
respect to other transactions
Tradeoff accuracy for performance
50
The theory of serializability
A schedule of a set of transactions is a linear
ordering of their actions. E.g. for the
simultaneous deposits example:
R1(X.bal) R2(X.bal) W1(X.bal) W2(X.bal)
A serial schedule is one in which all the steps
of each transaction occur consecutively. That
is, serial schedule does not interleave the
actions of different transactions.
The example above is not serial.
51
Scheduling Transactions
Equivalent schedules: For any database state,
the effect (on the set of objects in the
database) of executing the first schedule is
identical to the effect of executing the second
schedule.
A serializable schedule is one which is
equivalent to some serial execution of the
transactions.
53
Testing for serializability
Given a schedule S, we can construct a di-
graph G=(V,E) called a precedence graph
V : all transactions in S
E : Ti Tj whenever an action of Ti precedes and
conflicts with an action of Tj in S
Theorem: A schedule S is conflict serializable
if and only if its precedence graph contains no
cycles.
Note that testing for a cycle in a digraph can
be done in time O(|V|2)
54
Conflict Serializable Schedules
Two schedules are conflict equivalent
if:
Involve the same actions of the
same transactions
Every pair of conflicting actions is
ordered the same way
A
T1 T2 Dependency graph
B
The cycle in the graph reveals the problem. The
output of T1 depends on T2, and vice-versa.
57
Another example
T1 T2 T3
R(X) Y x
W(X) T1 T2 T3
R(X)
W(X) Acyclic: serializable
R(Y)
W(Y)
R(Y) A schedule: T1,T2,T3
W(Y)
58
An example
T1 T2 T3
R(X,Y,Z) x
R(X) Y Y
T1 T2 T3
W(X)
R(Y) x
W(Y)
Cyclic: Not serialiable.
R(Y)
R(X)
W(Z)
59
An example
T1 T2 T3
R(Y);
R(Z);
R(X);
W(X); W(Y); X,Y
W(Z);
Y Y,Z
R(Z); T1 T3 T2
R(Y);
W(Y); No Cycle: serialiable.
R(X);
W(Y);
R(X); Equivalent serial schedule:
W(X); T3;T1;T2;
Schedule A 60
An example
T1 T2 T3
R(Z);
R(Y);
W(Y);
R(Y); Y
R(Z);
Y Y,Z
R(X); T1 T3 T2
W(X); W(Y);
X
W(Z);
R(X);
R(Y); Cyclic: nonserializable.
W(Y);
W(X); Equivalent serial
schedules: None
Schedule B 61
Example Schedule Precedence Graph
T1 T2 T3 T4 T5
read(X)
read(Y)
read(Z)
read(V) Y
read(W) T1 T2
read(W)
read(Y) Y,Z Y
write(Y)
Z
write(Z)
read(U)
read(Y) T3 T4
write(Y)
Z
read(Z)
write(Z)
read(U) A sechedule:
write(U)
T1,T2,T3,T4
62
Test for Conflict Serializability
A schedule is conflict serializable if and
only if its precedence graph is acyclic.
Cycle-detection algorithms exist which
take order n2 time, where n is the number
of vertices in the graph.
(Better algorithms take order n + e
where e is the number of edges.)
If precedence graph is acyclic, the
serializability order can be obtained by a
topological sorting of the graph.
This is a linear order consistent with
the partial order of the graph.
For example, a serializability order for
the previous schedule would be
T1 T2 T3 T4
Are there others?
63
View Serializability
Conflict serializability is sufficient but necessary fro serializability.
View serializability is a more geberal sufficient condition for
serializability.
View serializability: definition of view serializability based on view
equivalence. A schedule is view serializable if it is view
equivalent to a serial schedule.
Schedules S1 and S2 are view equivalent if:
The same set of transactions participates in S1 and S2, and S1 and
S2 include the same operations of those transactions.
If Ti reads the initial value of object A in S1, then Ti also reads the
initial value of A in S2.
If Ti reads a value of A written by Tj in S1, then Ti also reads
the value of A written by Tj in S2.
For any data object A, If Ti writes final value of A in S1, then Ti
also writes final value of A in S2
T1: R(A) W(A) T1: R(A),W(A)
T2: W(A) T2: W(A)
T3: W(A) T3: W(A)
64
View Equivalence and View Serializability
65
View Equivalence and View Serializability
66
View Serializability
A schedule S is view serializable if it is view
equivalent to a serial schedule.
Every conflict serializable schedule is also
view serializable.
Below is a schedule which is view-serializable
but not conflict serializable.
68
Test for View Serializability
The precedence graph test for conflict serializability
cannot be used directly to test for view serializability.
Enforcing or testing view serializability is very
expensive.
Extension to test for view serializability has the
cost exponential in the size of the precedence
graph.
The problem of checking if a schedule is view
serializable falls in the class of NP-complete problems.
Thus existence of an efficient algorithm is
extremely unlikely.
However practical algorithms that just check some
sufficient conditions for view serializability can still be
used. 69
Recoverable Schedules
Need to address the effect of transaction failures on concurrently
running transactions.
Recoverable schedule if a transaction Tj reads a data item
previously written by a transaction Ti , then the commit operation of
Ti appears before the commit operation of Tj.
The following schedule is not recoverable if T9 commits immediately
after the read
70
Cascading Rollbacks
Cascading rollback a single transaction failure leads to a
series of transaction rollbacks. Consider the following schedule
where none of the transactions has yet committed (so the
schedule is recoverable)
71
Cascadeless Schedules
Cascadeless schedules cascading rollbacks
cannot occur; for each pair of transactions Ti
and Tj such that Tj reads a data item
previously written by Ti, the commit
operation of Ti appears before the read
operation of Tj (..Wi(X), ci, Rj(X)...)
Every cascadeless schedule is also
recoverable
It is desirable to restrict the schedules to
those that are cascadeless
72
Summary
Transactions are all-or-nothing units of work
guaranteed despite concurrency or failures in the
system.
Theoretically, the correct execution of transactions
is serializable (i.e. equivalent to some serial
execution).
Tests for serializability help us understand why a
concurrency control protocol is correct.
A database must provide a mechanism that will ensure
that all possible schedules are
either conflict or view serializable, and
are recoverable and preferably cascadeless
Practically, this may adversely affect throughput
isolation levels.
With isolation levels, users can specify the level of
incorrectness they are willing to tolerate. 73