
Implementation of Database

Exercise 4
Tanmaya Mahapatra
Matriculation Number : 340959
tanmaya.mahapatra@rwth-aachen.de
Bharath Rangaraj
Matriculation Number : 340909
bharath.rangaraj@rwth-aachen.de
Manasi Jayapal
Matriculation Number : 340892
manasi.jayapal@rwth-aachen.de
December 8, 2013
1 Exercise 4.1 [Query Optimization] :
Consider the join of three relations R, S, U. Their attributes and the estimated sizes of the value sets for the attributes in each relation are summarized below, with V(Rel, attr) denoting the number of distinct values for the attribute attr of relation Rel, and T(Rel) denoting the number of tuples within relation Rel:
R(a,b) S(b,c) U(c,a)
T(R) = 800 T(S) = 600 T(U) = 400
V(R,a) = 100 V(S,b) = 120 V(U,c) = 100
V(R,b) = 100 V(S,c) = 200 V(U,a) = 40
Moreover, we make the following assumptions:
1. The cost of a sub-plan is the sum of the sizes of all intermediate relations, excluding the base relations and the join result of the sub-plan under consideration.
2. The size of a relation is simplified to its cardinality, i.e., its number of tuples.
3. All attributes are mutually independent.
1.1 Estimate the sizes of R ⋈ S, S ⋈ U, R ⋈ U and R ⋈ S ⋈ U
Solution
R ⋈ S: Number of tuples in R = 800
Number of tuples in S = 600
b is the common attribute.
Maximum number of distinct values of b in R and S = max(100, 120) = 120
Size of the join:
$$T(R \bowtie S) = \frac{T(R)\,T(S)}{\max(V(R,b),\,V(S,b))} = \frac{800 \cdot 600}{\max(100,120)} = \frac{480000}{120} = 4000$$
S ⋈ U: Number of tuples in S = 600
Number of tuples in U = 400
c is the common attribute.
Maximum number of distinct values of c in S and U = max(200, 100) = 200
Size of the join:
$$T(S \bowtie U) = \frac{T(S)\,T(U)}{\max(V(S,c),\,V(U,c))} = \frac{600 \cdot 400}{\max(200,100)} = \frac{240000}{200} = 1200$$
R ⋈ U: Number of tuples in R = 800
Number of tuples in U = 400
a is the common attribute.
Maximum number of distinct values of a in R and U = max(100, 40) = 100
Size of the join:
$$T(R \bowtie U) = \frac{T(R)\,T(U)}{\max(V(R,a),\,V(U,a))} = \frac{800 \cdot 400}{\max(100,40)} = \frac{320000}{100} = 3200$$
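The three pairwise estimates above all apply the same formula, T(R)·T(S) / max(V(R,attr), V(S,attr)). A minimal Python sketch of that arithmetic with the statistics from the exercise; the function name and the dictionary layout are our own, chosen only for illustration:

```python
def join_size(t_r: int, t_s: int, v_r: int, v_s: int) -> float:
    """Estimate T(R JOIN S) on a single shared attribute:
    T(R) * T(S) / max(V(R, attr), V(S, attr))."""
    return t_r * t_s / max(v_r, v_s)

# Statistics from the exercise.
T = {"R": 800, "S": 600, "U": 400}
V = {("R", "a"): 100, ("R", "b"): 100,
     ("S", "b"): 120, ("S", "c"): 200,
     ("U", "c"): 100, ("U", "a"): 40}

sizes = {
    "RS": join_size(T["R"], T["S"], V[("R", "b")], V[("S", "b")]),  # on b
    "SU": join_size(T["S"], T["U"], V[("S", "c")], V[("U", "c")]),  # on c
    "RU": join_size(T["R"], T["U"], V[("R", "a")], V[("U", "a")]),  # on a
}
print(sizes)  # {'RS': 4000.0, 'SU': 1200.0, 'RU': 3200.0}
```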
R ⋈ S ⋈ U: For this join we first consider the join of R and S.
Number of tuples in R = 800
Number of tuples in S = 600
b is the common attribute.
Maximum number of distinct values of b in R and S = max(100, 120) = 120
Size of the join:
$$T(R \bowtie S) = \frac{T(R)\,T(S)}{\max(V(R,b),\,V(S,b))} = \frac{800 \cdot 600}{\max(100,120)} = \frac{480000}{120} = 4000$$
Let X denote the joined relation R ⋈ S. Now we join X with U.
Number of tuples in X = 4000
Number of tuples in U = 400
a and c are the common attributes. X keeps the value counts of its inputs, so V(X,a) = V(R,a) = 100 and V(X,c) = V(S,c) = 200.
Size of the join:
$$T(X \bowtie U) = \frac{T(X)\,T(U)}{\max(V(X,a),\,V(U,a)) \cdot \max(V(X,c),\,V(U,c))} = \frac{4000 \cdot 400}{\max(100,40) \cdot \max(200,100)} = \frac{1600000}{100 \cdot 200} = 80$$
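For the three-way join the denominator becomes a product of one max term per shared attribute. The following sketch reproduces the two steps; it assumes, as the computation above does, that R ⋈ S keeps the value counts of its inputs (V(X,a) = V(R,a) and V(X,c) = V(S,c)). The helper name join_size_multi is hypothetical:

```python
from math import prod

def join_size_multi(t_x, t_y, shared):
    """Estimate T(X JOIN Y) when X and Y share one or more attributes.

    `shared` holds one (V(X, attr), V(Y, attr)) pair per shared attribute;
    attributes are assumed independent, so the max terms are multiplied."""
    return t_x * t_y / prod(max(vx, vy) for vx, vy in shared)

# Step 1: X = R JOIN S on b, with V(R,b) = 100 and V(S,b) = 120.
t_x = join_size_multi(800, 600, [(100, 120)])
# Step 2: X JOIN U on a and c, assuming V(X,a) = 100 and V(X,c) = 200.
t_rsu = join_size_multi(t_x, 400, [(100, 40), (200, 100)])
print(t_x, t_rsu)  # 4000.0 80.0
```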
1.2 Please use dynamic programming to find the cheapest left-deep join plan for the natural join of the three relations. Be specific about the estimation of the cost of each candidate sub-plan. For each sub-plan, specify the estimated cost, and the best join plan.
Solution
In the first step of dynamic programming we consider the singleton sets of relations; their sizes and costs are given. Please refer to Table 1.
1. For the singleton sets, the sizes are as given.
2. The cost is zero, since no intermediate relations are needed.
In the second step we consider pairs of relations. For each pair there are 2 possible plans, since either of the two relations can be on the left side of the join. Since we consider only left-deep plans and the relations are of unequal sizes, we keep the smaller relation on the left-hand side. Please refer to Table 2.
1. The sizes are computed using the standard formula, as in the previous question.
2. The cost of each is zero, since a join of two base relations still has no intermediate relations.
Finally, the joins of all 3 relations are considered. Please refer to Table 3.
1. The sizes are computed using the standard formula, as in the previous question.
2. The cost estimate for each three-way plan is the size of its one intermediate relation, namely the join of the first two relations chosen. Since we want this cost to be as small as possible, we consider each pair of two out of the three relations and take the pair with the smallest size.
The join plan (U ⋈ S) ⋈ R, with cost 1200, is the cheapest, as evident from Table 3; the corresponding join tree is given below.
           {R}    {S}    {U}
Size       800    600    400
Cost       0      0      0
Best Plan  R      S      U
Table 1: The table for singleton sets.

           {S,R}    {U,R}    {U,S}
Size       4000     3200     1200
Cost       0        0        0
Best Plan  S ⋈ R    U ⋈ R    U ⋈ S
Table 2: The table for pairs of relations.

           {S,R,U}        {U,R,S}        {U,S,R}
Size       80             80             80
Cost       4000           3200           1200
Best Plan  (S ⋈ R) ⋈ U    (U ⋈ R) ⋈ S    (U ⋈ S) ⋈ R
Table 3: The table for triples of relations.

Join tree: ((U ⋈ S) ⋈ R), with U ⋈ S as the left subtree and R as the right leaf.
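The dynamic-programming steps of Tables 1 to 3 can be sketched in Python as follows. This is an illustration under the exercise's cost model, not a general optimizer: the size of any set of relations is estimated with the textbook generalization of the join-size formula (divide the product of the T values, for each attribute, by all of its V values except the smallest), ties between plans of equal cost are broken by putting the smaller relation on the left as in Table 2, and all function names are hypothetical:

```python
from itertools import combinations
from math import prod

# Statistics from the exercise.
T = {"R": 800, "S": 600, "U": 400}
V = {"R": {"a": 100, "b": 100},
     "S": {"b": 120, "c": 200},
     "U": {"c": 100, "a": 40}}

def est_size(rels):
    """Estimated size of the natural join of `rels`: product of the T values,
    divided, for every attribute, by all of its V values except the smallest."""
    num = prod(T[r] for r in rels)
    den = 1
    for attr in {a for r in rels for a in V[r]}:
        vs = sorted(V[r][attr] for r in rels if attr in V[r])
        den *= prod(vs[1:])  # an attribute in only one relation contributes 1
    return num / den

def best_left_deep(relations):
    """Return (cost, size, plan) of the cheapest left-deep plan.
    Cost = sum of the sizes of intermediate results, as in the exercise."""
    best = {frozenset([r]): (0, T[r], r) for r in relations}
    for k in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, k)):
            candidates = []
            for right in subset:                  # base relation joined last
                left = subset - {right}
                l_cost, l_size, l_plan = best[left]
                if len(left) > 1:                 # left operand is an intermediate result
                    cost, plan = l_cost + l_size, f"({l_plan}) JOIN {right}"
                else:                             # left operand is a base relation
                    cost, plan = l_cost, f"{l_plan} JOIN {right}"
                # Ties are broken by the smaller left operand, as in Table 2.
                candidates.append((cost, l_size, plan))
            cost, _, plan = min(candidates)
            best[subset] = (cost, est_size(subset), plan)
    return best[frozenset(relations)]

print(best_left_deep(["R", "S", "U"]))  # (1200.0, 80.0, '(U JOIN S) JOIN R')
```

Only the size of the left operand enters the cost, which is why the sketch keeps one best entry per subset and extends it by a single base relation at a time.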
Exercise 4.2 :
1) What are the sizes of the two relations in terms of blocks?
Ans:
T(HumanCriminal) = 10,000 tuples
T(DogCriminal) = 30,000 tuples
One block holds 1 HumanCriminal tuple, or 10 DogCriminal tuples.
So the sizes in terms of blocks are
B(HumanCriminal) = 10,000 blocks
B(DogCriminal) = 3,000 blocks.
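The conversion from tuples to blocks is a ceiling division. A small sketch; the figure of 10 DogCriminal tuples per block is inferred from 30,000 tuples occupying 3,000 blocks, and the helper name is hypothetical:

```python
def blocks(tuples: int, tuples_per_block: int) -> int:
    """Blocks needed to store a relation when tuples do not span blocks."""
    return (tuples + tuples_per_block - 1) // tuples_per_block  # ceiling division

print(blocks(10_000, 1))   # HumanCriminal: 10000
print(blocks(30_000, 10))  # DogCriminal: 3000
```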
2) What is the size of the query result in terms of tuples?
Ans:
SELECT *
FROM HumanCriminal H, DogCriminal D
WHERE H.favorite_bar = D.favorite_bar AND HasMustache(H.picture)
This is the given query.
First we find the size of the join relation over the join condition (selectivity 10^-6):
$$T(H \bowtie D) = 10000 \cdot 30000 \cdot 10^{-6} = 300$$
Then we apply the selection HasMustache(H.picture), which keeps 1% of the tuples:
$$300 \cdot 0.01 = 3$$
The result of the query will have 3 tuples.
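The estimate multiplies the size of the cross product by the two selectivities, assuming they are independent. A sketch with the values used above (join selectivity 10^-6, and 1% for HasMustache, the fraction used in part 3); the function name is hypothetical, and the result is rounded because of floating-point error:

```python
def result_size(t_h: int, t_d: int, join_sel: float, pred_sel: float) -> float:
    """Estimated tuples of sigma_pred(H JOIN D), with independent selectivities."""
    return t_h * t_d * join_sel * pred_sel

est = result_size(10_000, 30_000, 1e-6, 0.01)
print(round(est))  # 3
```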



3. Cost of plan P1: σ_HasMustache(picture)(HumanCriminal) ⋈ DogCriminal
The cost of the plan is Cost(selection) + Cost(join) = Cost(σ) + Cost(⋈), where N denotes the cost of evaluating the HasMustache() predicate on one tuple.
$$\text{Cost}(\sigma) = T(\text{HumanCriminal}) \cdot N = 10000\,N$$
For the join we use a block nested-loop join, assuming that each block is held in 1 page and that the buffer holds 100 pages:
$$\text{Cost}(\bowtie) = B_{outer} + B_{inner} \cdot \frac{B_{outer}}{\text{Buffer}}$$
where B_outer = number of blocks in the outer relation = (number of blocks in HumanCriminal) × 1% = 100, and B_inner = number of blocks in the inner relation (DogCriminal) = 5000. Therefore
$$\text{Cost}(\bowtie) = 100 + 5000 \cdot \frac{100}{100} = 5100$$
Cost of plan P1 = Cost(σ) + Cost(⋈) = 10000N + 5100
Cost of plan P2: σ_HasMustache(picture)(HumanCriminal ⋈ DogCriminal)
The cost of the plan is Cost(join) + Cost(selection) = Cost(⋈) + Cost(σ), where N again denotes the cost of evaluating the HasMustache() predicate on one tuple.
For the block nested-loop join (again assuming that each block is held in 1 page and that the buffer holds 100 pages), B_outer = number of blocks in the outer relation (HumanCriminal) = 10000 and B_inner = number of blocks in the inner relation (DogCriminal) = 5000, so
$$\text{Cost}(\bowtie) = B_{outer} + B_{inner} \cdot \frac{B_{outer}}{\text{Buffer}} = 10000 + 5000 \cdot \frac{10000}{100} = 510000$$
The selection is evaluated on every tuple of the join result, whose size is T(HumanCriminal) × T(DogCriminal) × P, with P = 10^-6 the join selectivity:
$$\text{Cost}(\sigma) = \big(T(\text{HumanCriminal}) \cdot T(\text{DogCriminal}) \cdot P\big) \cdot N = 500\,N$$
Therefore, the cost of plan P2 = Cost(⋈) + Cost(σ) = 510000 + 500N
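A sketch of the two cost formulas as Python functions of N, the per-tuple cost of the HasMustache() predicate. It uses the simplified block nested-loop formula above with a 100-page buffer (inferred from the divisions by 100) and rounds the number of outer chunks up; the function names are hypothetical:

```python
import math

def bnl_cost(b_outer: int, b_inner: int, buffer_pages: int) -> int:
    """Simplified block nested-loop join I/O: read the outer relation once,
    and the inner relation once per buffer-sized chunk of the outer."""
    return b_outer + b_inner * math.ceil(b_outer / buffer_pages)

def cost_p1(n):
    # Predicate on all 10,000 HumanCriminal tuples, then BNL join of the
    # 1% survivors (100 blocks) with DogCriminal (5000 blocks).
    return 10_000 * n + bnl_cost(100, 5_000, 100)

def cost_p2(n):
    # BNL join of the full relations, then the predicate on 500 join tuples.
    return bnl_cost(10_000, 5_000, 100) + 500 * n

print(cost_p1(1), cost_p2(1))  # 15100 510500
```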
4. For plan P1 to be at most as expensive as plan P2:
$$10000N + 5100 \le 510000 + 500N \iff 9500N \le 504900 \iff N \le \frac{504900}{9500} \approx 53.15$$
So, for all integer values N ≤ 53, plan P1 is cheaper than plan P2.
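Solving the inequality with exact rational arithmetic confirms the bound. The two functions below simply restate the totals derived in part 3 (their names are our own):

```python
import math
from fractions import Fraction

def p1(n):
    return 10_000 * n + 5_100

def p2(n):
    return 510_000 + 500 * n

# 10000*N + 5100 <= 510000 + 500*N  <=>  N <= (510000 - 5100) / (10000 - 500)
bound = Fraction(510_000 - 5_100, 10_000 - 500)
largest_n = math.floor(bound)
print(bound, largest_n)  # 5049/95 53
```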

Exercise 4.3 [Datalog] :
1. Given the following extensional database:
child(X, Y): X is a child of Y
male(X): X is a male person
Define the following relations of the intensional database by specifying appropriate Datalog rules:
(a) parent(X,Y): X is a parent of Y.
Solution :
parent(X,Y) :- child(Y,X).
(b) married(X,Y): X and Y are parents of the same child.
Solution :
married(X,Y) :- child(Z,X),
child(Z,Y),
male(X),
not male(Y).
OR
married(X,Y) :- child(Z,X),
child(Z,Y),
male(Y),
not male(X).
(c) sister(X,Y): X is a sister of Y.
Solution :
sister(X,Y) :- child(X,Z),
child(Y,Z),
not male(X),
X <> Y.
X must be female, while Y may be of either sex.
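(d) halfbrother(X,Y): X is a half-brother of Y (X and Y have only one common parent).
Solution :
halfbrother(X,Y) :- child(X,Z),
child(X,P),
child(Y,Z),
child(Y,Q),
male(X),
P <> Z,
Q <> Z,
P <> Q.
Here Z is the common parent, and P and Q are the other parents of X and Y; the inequalities ensure that X and Y share exactly one parent, assuming both parents of every person are recorded in child.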
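To see the rules in action, this sketch evaluates them bottom-up in Python with set comprehensions over a small, entirely hypothetical family (the names are invented for illustration). Negation is read under the closed-world assumption as "not in the male set", and for sister and halfbrother it uses versions with explicit inequality subgoals (X <> Y; P, Q and Z pairwise distinct), assuming every person's two parents are recorded in child:

```python
# Hypothetical EDB, for illustration only: child(X, Y) means X is a child of Y.
child = {("ann", "tom"), ("ann", "mary"),
         ("bob", "tom"), ("bob", "mary"),
         ("carl", "tom"), ("carl", "lisa")}
male = {"tom", "bob", "carl"}

# parent(X,Y) :- child(Y,X).
parent = {(y, x) for (x, y) in child}

# married(X,Y) :- child(Z,X), child(Z,Y), male(X), not male(Y).
married = {(x, y)
           for (z1, x) in child
           for (z2, y) in child
           if z1 == z2 and x in male and y not in male}

# sister(X,Y) :- child(X,Z), child(Y,Z), not male(X), X <> Y.
sister = {(x, y)
          for (x, z1) in child
          for (y, z2) in child
          if z1 == z2 and x not in male and x != y}

# halfbrother(X,Y) :- child(X,Z), child(X,P), child(Y,Z), child(Y,Q), male(X),
#                     P <> Z, Q <> Z, P <> Q.
halfbrother = {(x, y)
               for (x, z) in child
               if x in male
               for (y, z2) in child
               if z2 == z
               for (x2, p) in child
               if x2 == x and p != z
               for (y2, q) in child
               if y2 == y and q != z and q != p}

print(sorted(halfbrother))  # [('bob', 'carl'), ('carl', 'ann'), ('carl', 'bob')]
```

Full siblings such as ann and bob do not appear in halfbrother, because their "other" parents P and Q coincide.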
2. Decide whether the following two Datalog programs are stratified or not and explain why.

Datalog program 1:
q(X) :- p(X), t(X).
p(X) :- s(X,X), r(X).
s(X,Y) :- s(Y,X), t(Y).
r(X) :- t(X), s(X,X).


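The only recursive predicate is s (through the rule s(X,Y) :- s(Y,X), t(Y)), and this cycle contains no negated subgoal, so no predicate depends negatively on a predicate in its own stratum.
Therefore the given Datalog program is stratified.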


Datalog program 2:
p(X) :- q(X,Y), t(Y).
t(Y) :- q(Y,Z), not t(Z).
t(Y) :- s(Y).



There is a negative self-loop on t in stratum 2: t depends negatively on itself. By the definition of stratification, a predicate in stratum i cannot depend negatively on a predicate in the same stratum.
Therefore the given Datalog program is not stratified.
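Whether a program is stratified can be checked mechanically: start every predicate in stratum 1 and raise the head's stratum whenever a rule requires it (at least the stratum of a positive subgoal, strictly above that of a negated one). If some stratum exceeds the number of predicates, a cycle must pass through a negated subgoal. A Python sketch of this standard iterative procedure follows. The rule encoding is our own; program 1 is transcribed exactly as shown (it contains no negation), and for program 2 we assume the subgoal t(Z) in the recursive rule is negated, which is the reading implied by the negative self-loop on t:

```python
def stratify(rules):
    """rules: list of (head, [(body_pred, negated), ...]).
    Returns {pred: stratum}, or None if the program is not stratified."""
    preds = {h for h, _ in rules} | {p for _, body in rules for p, _ in body}
    stratum = {p: 1 for p in preds}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for pred, negated in body:
                need = stratum[pred] + 1 if negated else stratum[pred]
                if stratum[head] < need:
                    stratum[head] = need
                    changed = True
                    if need > len(preds):  # only possible via a negative cycle
                        return None
    return stratum

program1 = [("q", [("p", False), ("t", False)]),
            ("p", [("s", False), ("r", False)]),
            ("s", [("s", False), ("t", False)]),
            ("r", [("t", False), ("s", False)])]
program2 = [("p", [("q", False), ("t", False)]),
            ("t", [("q", False), ("t", True)]),  # assumed: not t(Z)
            ("t", [("s", False)])]

print(stratify(program1) is not None, stratify(program2) is None)  # True True
```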
3)
Ans:
Minimal model 1:
q(x) :- b(x), p(x).
p(x) :- a(x), q(x).
r(x) :- p(x), b(x).

Minimal model 2:
p(x) :- a(x), b(x).
q(x) :- b(x).
r(x) :- p(x).
