Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Systems
Query rewriting
Query
statistics logical query plan Optimization
execute Query
Execution
result
SQL query
Initial logical plan
parse
parse tree Rewrite rules
execute
result
Query Rewriting
B,D B,D
R.C = S.C
R.A = “c” Λ R.C = S.C
R.A = “c”
X
X
R S
R S
parse
parse tree
Query rewriting
statistics Best logical query plan
execute
result
Physical Plan Generation
B,D Project
parse
parse tree Enumerate possible
physical plans
Query rewriting
statistics Best logical query plan
Find the cost of
each plan
Physical plan generation
result
Physical Plan Generation
Physical
P1 P2 …. Pn plans
C1 C2 …. Cn Costs
Disk(s)
Logical Plans Vs. Physical Plans
B,D Project
R.A = “c” S
R
• Materialization: output of one operator written to
disk, next operator reads from the disk
• Pipelining: output of one operator directly fed to
next operator
Materialization
R.A = “c” S
R
Iterators: Pipelining
B,D
Each operator supports:
• Open()
R.A = “c” S
• GetNext()
• Close()
R
Iterator for Table Scan (R)
Open() {
/** initialize variables */ GetNext() {
b = first block of R; IF (t is past last tuple in block b) {
t = first tuple in block b; set b to next block;
} IF (there is no next block)
/** no more tuples */
RETURN EOT;
ELSE t = first tuple in b;
}
/** return current tuple */
oldt = t;
Close() { set t to next tuple in block b;
/** nothing to be done */ RETURN oldt;
} }
Iterator for Select
R.A = “c”
GetNext() {
LOOP:
t = Child.GetNext();
IF (t == EOT) {
Open() {
/** no more tuples */
/** initialize child */
RETURN EOT;
Child.Open();
}
}
ELSE IF (t.A == “c”)
Close() { RETURN t;
/** inform child */ ENDLOOP:
Child.Close(); }
}
Iterator for Sort
R.A
GetNext() {
IF (more tuples)
RETURN next tuple in order;
ELSE RETURN EOT;
}
Open() {
/** Bulk of the work is here */
Child.Open(); Close() {
Read all tuples from Child /** inform child */
and sort them Child.Close();
} }
Iterator for Tuple Nested Loop Join
Lexp Rexp
• TNLJ (conceptually)
for each r Lexp do
for each s Rexp do
if Lexp.C = Rexp.C, output r,s
Example 1: Left-Deep Plan
TNLJ
TableScan
TNLJ
R3(C,D)
TableScan TableScan
R1(A,B) R2(B,C)
Question: What is the sequence of getNext() calls?
Example 2: Right-Deep Plan
TNLJ
TableScan TNLJ
R3(C,D)
TableScan TableScan
R1(A,B) R2(B,C)
Worked on blackboard
Cost Measure for a Physical Plan
• There are many cost measures
– Time to completion
– Number of I/Os (we will see a lot of this)
– Number of getNext() calls
• Tradeoff: Simplicity of estimation Vs.
Accurate estimation of performance as
seen by user
Query Processing - In class order
SQL query
parse 2; 16.1
parse tree
First: A quick look at this
Query rewriting 3; 16.2,16.3
statistics logical query plan
Chapter 16
16.1 Parsing
16.2 Algebraic laws
16.3 Parse tree logical query plan
16.4 Estimating result sizes
16.5-16.7 Cost based optimization
Background Material
Chapter 5 Relational Algebra
Chapter 6 SQL
Why do we need Query Rewriting?
• Pruning the HUGE space of physical plans
– Eliminating redundant conditions/operators
– Rules that will improve performance with very
high probability
• Preprocessing
– Getting queries into a form that we know how
to handle best
R S= S R Commutativity
(R S) T=R (S T) Associativity
p (R S) =
[ p (R)] S
q (R S) = R [ q (S)]
Apply Rewrite Rule (2)
B,D B,D
R.C = S.C R.C = S.C
R.A = “c” X
R.A = “c” S
X
R S R
B,D [ R.C=S.C [R.A=“c”(R)] X S]
Apply Rewrite Rule (3)
B,D
B,D
R.C = S.C
Natural join
X R.A = “c” S
R.A = “c” S
R
R
B,D [[R.A=“c”(R)] S]
Rules: + combined (continued)
Example: R(A,B,C,D,E)
P: (A=3) (B=“cat”)
B = “cat” A=3