
Implementation of Database Exercise 3

Tanmaya Mahapatra, Matriculation Number: 340959, tanmaya.mahapatra@rwth-aachen.de
Bharath Rangaraj, Matriculation Number: 340909, bharath.rangaraj@rwth-aachen.de
Manasi Jayapal, Matriculation Number: 340892, manasi.jayapal@rwth-aachen.de
November 24, 2013

Exercise 3.1 [Evaluating Relational Operators] :

Consider the join $R \bowtie_{R.a = S.b} S$, given the following information about the relations to be joined. The cost metric is the number of page I/Os unless otherwise noted, and the cost of writing out the result should be uniformly ignored. Relation R contains 10,000 tuples and has 10 tuples per page. Relation S contains 2000 tuples and also has 10 tuples per page. Attribute b of relation S is the primary key for S. Both relations are stored as simple heap files. Neither relation has any indexes built on it. 52 buffer pages are available.

1.1

What is the cost of joining R and S using a page-oriented simple nested loops join? What is the minimum number of buffer pages required for this cost to remain unchanged?

Solution:

For relation R:
1. Number of tuples in R = 10000
2. Number of tuples per page = 10 = $P_R$
3. Number of pages in R = $\frac{10000}{10} = 1000 = M$

For relation S:
1. Number of tuples in S = 2000
2. Number of tuples per page = 10 = $P_S$
3. Number of pages in S = $\frac{2000}{10} = 200 = N$

In a page-oriented simple nested loops join the I/O cost is $M + M \cdot N$, where $M$ is the number of pages of the outer relation and $N$ the number of pages of the inner relation.

Considering relation R to be the outer relation we have:
\[ 1000 + 1000 \cdot 200 = 1000 + (2 \times 10^5) = 201000 \text{ I/Os} \]
But if we consider relation S to be the outer relation we have:
\[ 200 + 200 \cdot 1000 = 200 + (2 \times 10^5) = 200200 \text{ I/Os} \]

Let the minimum number of buffers required be $B$. We set aside 2 buffers: 1 for output and 1 for scanning the inner relation. The outer relation R is read in blocks of $B - 2$ buffer pages, and the cost should remain the same. The formula is:
\[ M + N \left\lceil \frac{M}{B-2} \right\rceil \]
Considering R to be the outer relation, this value should equal $1000 + (2 \times 10^5)$:
\[ 1000 + 200 \left\lceil \frac{1000}{B-2} \right\rceil = 1000 + (2 \times 10^5) \]
\[ \Rightarrow 200 \left\lceil \frac{1000}{B-2} \right\rceil = 2 \times 10^5 \]
\[ \Rightarrow \left\lceil \frac{1000}{B-2} \right\rceil = 1000 \]
The value of $\left\lceil \frac{1000}{B-2} \right\rceil$ must be 1000 for the equation to hold, which requires
\[ B - 2 = 1 \Rightarrow B = 3 \]
3 is the minimum number of buffer pages required for this cost to remain unchanged.
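The arithmetic above can be checked with a short Python sketch. The function name `nested_loops_cost` is a hypothetical helper introduced here for illustration. It evaluates the formula $M + N \lceil M/(B-2) \rceil$ from the text, which reduces to the page-oriented cost $M + M \cdot N$ when $B = 3$.

```python
import math

def nested_loops_cost(outer_pages, inner_pages, buffers):
    """I/O cost M + N * ceil(M / (B - 2)), as given in the text.

    Two buffers are reserved: one for scanning the inner relation
    and one for output, so the outer relation is read in blocks
    of (buffers - 2) pages.
    """
    block = buffers - 2
    return outer_pages + inner_pages * math.ceil(outer_pages / block)

M, N = 1000, 200  # pages of R and S

cost_r_outer = nested_loops_cost(M, N, 3)    # R as outer, B = 3
cost_s_outer = nested_loops_cost(N, M, 3)    # S as outer, B = 3
cost_four_buffers = nested_loops_cost(M, N, 4)

print(cost_r_outer, cost_s_outer, cost_four_buffers)
```

With $B = 3$ the outer block is a single page, so the formula reproduces the page-oriented figures. Any larger $B$ lowers the cost, so the page-oriented cost is obtained exactly at $B = 3$.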

1.2

What is the cost of joining R and S using a block nested loops join? What is the minimum number of buffer pages required for this cost to remain unchanged?

Solution: In block nested loops join the total I/O cost is given by:
\[ M + \left\lceil \frac{M}{B-2} \right\rceil N \]
Number of buffers $B = 52$. Buffers to be used $(B-2) = 50$.

Considering relation R to be the outer relation and substituting the values in the equation we get:
\[ 1000 + \left\lceil \frac{1000}{50} \right\rceil \cdot 200 = 1000 + 20 \cdot 200 = 1000 + 4000 = 5000 \text{ I/Os} \]
5000 I/Os is the cost of joining R and S using a block nested loops join (R is the outer relation).

Considering relation S to be the outer relation and substituting the values in the equation we get:
\[ 200 + \left\lceil \frac{200}{50} \right\rceil \cdot 1000 = 200 + 4 \cdot 1000 = 200 + 4000 = 4200 \text{ I/Os} \]
4200 I/Os is the cost of joining R and S using a block nested loops join (S is the outer relation).

Let the minimum number of buffers required be $B$. We set aside 2 buffers: 1 for output and 1 for scanning the inner relation. The outer relation R is read in blocks of $B - 2$ buffer pages, and the cost should remain the same. Considering relation R to be the outer relation:
\[ 1000 + \left\lceil \frac{1000}{B-2} \right\rceil \cdot 200 = 5000 \]
\[ \Rightarrow \left\lceil \frac{1000}{B-2} \right\rceil \cdot 200 = 4000 \]
\[ \Rightarrow \left\lceil \frac{1000}{B-2} \right\rceil = 20 \]
The smallest $B - 2$ for which the ceiling equals 20 is $\frac{1000}{20} = 50$ (with $B - 2 = 49$ the ceiling is already 21), so
\[ B - 2 = 50 \Rightarrow B = 52 \]
52 is the minimum number of buffer pages required for this cost to remain unchanged.
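As a cross-check of the block nested loops figures, the following Python sketch evaluates the cost formula and searches upward for the smallest buffer count that still yields the same cost as $B = 52$. The helper names `bnl_cost` and `min_buffers_for_same_cost` are hypothetical and introduced only for this illustration. The linear search relies on the cost never increasing as $B$ grows.

```python
import math

def bnl_cost(outer_pages, inner_pages, buffers):
    """Block nested loops cost: M + ceil(M / (B - 2)) * N."""
    return outer_pages + math.ceil(outer_pages / (buffers - 2)) * inner_pages

def min_buffers_for_same_cost(outer_pages, inner_pages, buffers):
    """Smallest B (>= 3) whose cost equals the cost at the given buffer count.

    The cost is non-increasing in B, so the first match found while
    counting upward from 3 is the minimum.
    """
    target = bnl_cost(outer_pages, inner_pages, buffers)
    b = 3
    while bnl_cost(outer_pages, inner_pages, b) != target:
        b += 1
    return b

M, N, B = 1000, 200, 52  # pages of R, pages of S, available buffers

cost_r_outer = bnl_cost(M, N, B)
cost_s_outer = bnl_cost(N, M, B)
min_b_r_outer = min_buffers_for_same_cost(M, N, B)
min_b_s_outer = min_buffers_for_same_cost(N, M, B)

print(cost_r_outer, cost_s_outer, min_b_r_outer, min_b_s_outer)
```

The search gives the same minimum whichever relation is taken as the outer one, because 50 divides both 1000 and 200 exactly.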

1.3

What is the cost of joining R and S using a sort-merge join? What is the minimum number of buffer pages required for this cost to remain unchanged?

Solution: In sort-merge join the total cost equals the cost of sorting R + the cost of sorting S + the cost of merging R and S.

Cost for sorting R: here $M = 1000$ and $B = 52$. Let $N_2$ be the number of initial sorted runs:
\[ N_2 = \left\lceil \frac{M}{B} \right\rceil = \left\lceil \frac{1000}{52} \right\rceil = \lceil 19.230769231 \rceil = 20 \]
The I/O cost is given by:
\[ 2M \{ \lceil \log_{B-1} N_2 \rceil + 1 \} = 2 \cdot 1000 \cdot \{ \lceil \log_{51} 20 \rceil + 1 \} = 2000 \cdot \{ \lceil 0.76191890317 \rceil + 1 \} = 2000 \cdot 2 = 4000 \text{ I/Os} \]

Cost for sorting S: here $N = 200$ and $B = 52$. Let $N_3$ be the number of initial sorted runs:
\[ N_3 = \left\lceil \frac{N}{B} \right\rceil = \left\lceil \frac{200}{52} \right\rceil = \lceil 3.846153846 \rceil = 4 \]
The I/O cost is given by:
\[ 2N \{ \lceil \log_{B-1} N_3 \rceil + 1 \} = 2 \cdot 200 \cdot \{ \lceil \log_{51} 4 \rceil + 1 \} = 400 \cdot \{ \lceil 0.35258286878 \rceil + 1 \} = 400 \cdot 2 = 800 \text{ I/Os} \]

In addition, the second phase of the sort-merge join algorithm requires an additional scan of both relations. The total cost is
\[ 4000 + 800 + 1000 + 200 = 6000 \text{ I/Os} \]
(1000 and 200 for scanning R and S respectively in the second phase).

Calculating the minimum number of buffers required to keep the cost the same: with the minimum number of buffers, the I/O cost for sorting the largest relation should remain the same. The cost of sorting the smaller relation then also remains the same. So we calculate $B$ for the largest relation R. Let $B = x$ and $N_1 = \left\lceil \frac{1000}{x} \right\rceil$:
\[ 2M \left\{ \left\lceil \log_{x-1} \left\lceil \frac{1000}{x} \right\rceil \right\rceil + 1 \right\} = 4000 \]
\[ \Rightarrow \left\lceil \log_{x-1} \left\lceil \frac{1000}{x} \right\rceil \right\rceil + 1 = 2 \]
\[ \Rightarrow \left\lceil \log_{x-1} \left\lceil \frac{1000}{x} \right\rceil \right\rceil = 1 \]
The value of the ceiling function should be 1 for the equation to hold true, which requires $\left\lceil \frac{1000}{x} \right\rceil \le x - 1$. The boundary case is
\[ x - 1 = \frac{1000}{x} \]
Solving the quadratic equation $x^2 - x - 1000 = 0$ with
\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \]
where $a = 1$, $b = -1$ and $c = -1000$:
\[ x = \frac{1 \pm \sqrt{1 + 4 \cdot 1 \cdot 1000}}{2 \cdot 1} = \frac{1 \pm \sqrt{4001}}{2} = \frac{1 \pm 63.25}{2} \]
Ignoring the negative value:
\[ x = \frac{1 + 63.25}{2} = 32.12 \Rightarrow x = 33 \]
33 is the minimum number of buffer pages required for this cost to remain unchanged.
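The sort-merge figures can be reproduced with the Python sketch below. It uses the cost model from the text (external sort cost $2 \cdot \text{pages} \cdot \text{passes}$, plus one scan of each relation for the merge). The helper names are hypothetical. The number of passes is counted with integer arithmetic rather than a floating-point logarithm, which gives the same value as $\lceil \log_{B-1} \lceil \text{pages}/B \rceil \rceil + 1$ while avoiding rounding issues.

```python
import math

def sort_passes(pages, buffers):
    """Number of passes of an external merge sort.

    Pass 0 produces ceil(pages / B) sorted runs; every later pass
    merges up to (B - 1) runs at a time. Requires buffers >= 3.
    """
    runs = math.ceil(pages / buffers)
    passes = 1
    while runs > 1:
        runs = math.ceil(runs / (buffers - 1))
        passes += 1
    return passes

def sort_merge_cost(m_pages, n_pages, buffers):
    """Sort both relations (read + write per pass), then scan each once."""
    sort_r = 2 * m_pages * sort_passes(m_pages, buffers)
    sort_s = 2 * n_pages * sort_passes(n_pages, buffers)
    return sort_r + sort_s + m_pages + n_pages

M, N, B = 1000, 200, 52  # pages of R, pages of S, available buffers

total_cost = sort_merge_cost(M, N, B)

# Smallest buffer count that still gives the same total cost.
min_buffers = 3
while sort_merge_cost(M, N, min_buffers) != total_cost:
    min_buffers += 1

print(total_cost, min_buffers)
```

With 32 buffers, sorting R produces 32 runs but only 31 can be merged at once, so a third pass is needed. That is why the search stops at 33 under this cost model.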

1.4

What is the cost of joining R and S using a hash join? What is the minimum number of buffer pages required for this cost to remain unchanged?

Solution: In hash join, in the partitioning phase we have to scan both R and S once and write them both out once. The cost of this phase is therefore $2(M + N)$. In the second phase we scan each partition once, assuming no partition overflows, at a cost of $M + N$ I/Os. The total cost is therefore $3(M + N)$. Here $M = 1000$ and $N = 200$. Substituting the values:
\[ 3 \cdot (1000 + 200) = 3 \cdot 1200 = 3600 \text{ I/Os} \]
3600 I/Os is the cost of joining R and S using a hash join.

Calculating the minimum number of buffers: in this algorithm we need $B > \sqrt{f \cdot M}$, where $f$ is the fudge factor (considering the largest relation R as the outer relation). Taking $f \approx 1$:
\[ B > \sqrt{1000} = 31.62 \Rightarrow B = 32 \]
32 is the minimum number of buffer pages required for this cost to remain unchanged.
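The hash join numbers follow directly from the two formulas in the solution, as the Python sketch below shows. The function names are hypothetical, and the fudge factor is assumed to be 1, which is the value the calculation above effectively uses.

```python
def hash_join_cost(m_pages, n_pages):
    """Partitioning reads and writes both relations (2(M+N));
    probing reads every partition once (M+N)."""
    return 3 * (m_pages + n_pages)

def min_buffers_hash_join(pages, fudge=1):
    """Smallest integer B with B > sqrt(fudge * pages).

    fudge = 1 is an assumption made for this illustration.
    The comparison is done on squares to avoid floating-point roots.
    """
    b = 1
    while b * b <= fudge * pages:
        b += 1
    return b

M, N = 1000, 200  # pages of R and S

cost = hash_join_cost(M, N)
min_buffers = min_buffers_hash_join(M)

print(cost, min_buffers)
```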

1.5

What would be the lowest possible I/O cost for joining R and S using any join algorithm, and how much buffer space would be needed to achieve this cost? Explain briefly.

Solution: In block nested loops join, if enough buffer space is available, each relation is scanned just once, for a total I/O cost of $M + N$, which is optimal in comparison to the other methods. The lowest possible I/O cost for joining R and S is $1000 + 200 = 1200$ I/Os (using block nested loops join).

Calculating the minimum number of buffers required: let the minimum number of buffers required be $B$. Considering relation R to be the outer relation, the total I/O cost is given by:
\[ M + N \left\lceil \frac{M}{B-2} \right\rceil = 1200 \]
\[ \Rightarrow 1000 + 200 \left\lceil \frac{1000}{B-2} \right\rceil = 1200 \]
\[ \Rightarrow 200 \left\lceil \frac{1000}{B-2} \right\rceil = 200 \]
\[ \Rightarrow \left\lceil \frac{1000}{B-2} \right\rceil = 1 \]
The ceiling value should be 1 for the equation to hold true, so 1000 must be divided by a number that is at least 1000:
\[ B - 2 = 1000 \Rightarrow B = 1002 \]
Checking for correctness with $B = 1002$ if we choose S to be the outer relation:
\[ N + M \left\lceil \frac{N}{B-2} \right\rceil = 200 + 1000 \left\lceil \frac{200}{1000} \right\rceil = 200 + 1000 \cdot \lceil 0.2 \rceil = 200 + 1000 \cdot 1 = 1200 \]
With R as the outer relation, 1002 buffer pages are needed to achieve this cost. The check also shows that with S as the outer relation the ceiling is already 1 once $B - 2 \ge 200$, so in that case $B = 202$ buffer pages suffice.
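The single-scan condition can be verified numerically with the Python sketch below. The helper names are hypothetical. It evaluates the block nested loops formula for both choices of outer relation and uses the fact that $\lceil \text{outer}/(B-2) \rceil = 1$ exactly when $B - 2 \ge \text{outer}$.

```python
import math

def bnl_cost(outer_pages, inner_pages, buffers):
    """Block nested loops cost: outer + inner * ceil(outer / (B - 2))."""
    return outer_pages + inner_pages * math.ceil(outer_pages / (buffers - 2))

def min_buffers_single_scan(outer_pages):
    """Smallest B for which the whole outer relation fits in B - 2 pages,
    so the inner relation is scanned exactly once."""
    return outer_pages + 2

M, N = 1000, 200  # pages of R and S

b_r_outer = min_buffers_single_scan(M)
b_s_outer = min_buffers_single_scan(N)

cost_r_outer = bnl_cost(M, N, b_r_outer)
cost_s_outer = bnl_cost(N, M, b_s_outer)

print(b_r_outer, cost_r_outer, b_s_outer, cost_s_outer)
```

Both configurations reach the cost $M + N$. One buffer fewer in either case forces a second scan of the inner relation.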
