Notes 3

CPS216: Advanced Database
Systems
Notes 03:Query Processing

(Overview, contd.)
Shivnath Babu
Announcements
• First homework will be posted this week
– I will send an email announcement when it is posted
• We will start discussing past projects and
project ideas from next class
– Your first goal is to have a concretely defined project by
the last week of September (the earlier the better!)
– Try to form groups of two
– October and November should be used to work on the
projects
– Final “demo’s” will be in the first week of December
Overview of SQL query
Query
parse
Processing
parse tree
Query rewriting
Query
statistics logical query plan Optimization
Physical plan generation
physical query plan
execute Query
Execution
result
SQL query
Initial logical plan
parse
parse tree Rewrite rules
Query rewriting Logical plan

statistics logical query plan
Physical plan generation “Best” logical plan

physical query plan
execute
result
Query Rewriting
B,D B,D
R.C = S.C
R.A = “c” Λ R.C = S.C
R.A = “c”
X
X
R S
R S
We will revisit it towards the end of this lecture

SQL query
parse
parse tree
Query rewriting
statistics Best logical query plan
Best physical query plan
execute
result
Physical Plan Generation
B,D Project
Natural join Hash join

R.A = “c” S Index scan Table scan
R
R S
Best logical plan
SQL query
parse
parse tree Enumerate possible
physical plans
Query rewriting
statistics Best logical query plan
Find the cost of
each plan
Best physical query plan

Pick plan with
execute minimum cost
result
Physical Plan Generation
Logical Query Plan
Physical
P1 P2 …. Pn plans
C1 C2 …. Cn Costs
Pick minimum cost one

Plans for Query Execution
• Roadmap
– Path of a SQL query
– Operator trees
– Physical Vs Logical plans
– Plumbing: Materialization Vs pipelining
Modern DBMS Architecture
Applications
SQL
DBMS
Parser
Logical query plan
Query Optimizer
Physical query plan
Query Executor
Access method API calls
Storage Manager
Storage system API calls File system API calls
OS
Disk(s)
Logical Plans Vs. Physical Plans
B,D Project
Natural join Hash join

R.A = “c” S Index scan Table scan
R
R S
Best logical plan
Operator Plumbing
B,D
R.A = “c” S
R
• Materialization: output of one operator written to
disk, next operator reads from the disk
• Pipelining: output of one operator directly fed to
next operator
Materialization
Materialized here B,D
R.A = “c” S
R
Iterators: Pipelining
B,D
 Each operator supports:
• Open()
R.A = “c” S
• GetNext()
• Close()
R
Iterator for Table Scan (R)
Open() {
/** initialize variables */ GetNext() {
b = first block of R; IF (t is past last tuple in block b) {
t = first tuple in block b; set b to next block;
} IF (there is no next block)
/** no more tuples */
RETURN EOT;
ELSE t = first tuple in b;
}
/** return current tuple */
oldt = t;
Close() { set t to next tuple in block b;
/** nothing to be done */ RETURN oldt;
} }
Iterator for Select
R.A = “c”
GetNext() {
LOOP:
t = Child.GetNext();
IF (t == EOT) {
Open() {
/** no more tuples */
/** initialize child */
RETURN EOT;
Child.Open();
}
}
ELSE IF (t.A == “c”)
Close() { RETURN t;
/** inform child */ ENDLOOP:
Child.Close(); }
}
Iterator for Sort
R.A
GetNext() {
IF (more tuples)
RETURN next tuple in order;
ELSE RETURN EOT;
}
Open() {
/** Bulk of the work is here */
Child.Open(); Close() {
Read all tuples from Child /** inform child */
and sort them Child.Close();
} }
Iterator for Tuple Nested Loop Join
Lexp Rexp
• TNLJ (conceptually)
for each r  Lexp do
for each s  Rexp do
if Lexp.C = Rexp.C, output r,s
Example 1: Left-Deep Plan
TNLJ
TableScan
TNLJ
R3(C,D)
TableScan TableScan
R1(A,B) R2(B,C)
Question: What is the sequence of getNext() calls?
Example 2: Right-Deep Plan
TNLJ
TableScan TNLJ
R3(C,D)
TableScan TableScan
R1(A,B) R2(B,C)
Question: What is the sequence of getNext() calls?

Example
Worked on blackboard
Cost Measure for a Physical Plan
• There are many cost measures
– Time to completion
– Number of I/Os (we will see a lot of this)
– Number of getNext() calls
• Tradeoff: Simplicity of estimation Vs.
Accurate estimation of performance as
seen by user
Query Processing - In class order
SQL query
parse 2; 16.1
parse tree
First: A quick look at this
Query rewriting 3; 16.2,16.3
statistics logical query plan
Physical plan generation 4; 16.4—16.7
physical query plan

execute 1; 13, 15
result
Textbook outline
Chapter 15
15.1 Physical operators
- Scan, Sort (Ch. 11.4), Indexes (Ch. 13)
15.2-15.6 Implementing operators +
estimating their cost
15.8 Buffer Management
15.9 Parallel Processing
Textbook outline (contd.)
Chapter 16
16.1 Parsing
16.2 Algebraic laws
16.3 Parse tree  logical query plan
16.4 Estimating result sizes
16.5-16.7 Cost based optimization
Background Material
Chapter 5 Relational Algebra
Chapter 6 SQL
Why do we need Query Rewriting?
• Pruning the HUGE space of physical plans
– Eliminating redundant conditions/operators
– Rules that will improve performance with very
high probability
• Preprocessing
– Getting queries into a form that we know how
to handle best
 Reduces optimization time drastically without

noticeably affecting quality
Some Query Rewrite Rules
• Transform one logical plan into another
– Do not use statistics
• Equivalences in relational algebra
• Push-down predicates
• Do projects early
• Avoid cross-products if possible
Equivalences in Relational Algebra
R S= S R Commutativity
(R S) T=R (S T) Associativity
Also holds for: Cross Products, Union, Intersection

RxS=SxR
(R x S) x T = R x (S x T)
RUS=SUR
R U (S U T) = (R U S) U T
Apply Rewrite Rule (1)
B,D B,D
R.C = S.C
R.A = “c” Λ R.C = S.C
R.A = “c”
X
X
R S
R S
B,D [ R.C=S.C [R.A=“c”(R X S)]]

Rules: Project
Let: X = set of attributes
Y = set of attributes
XY = X U Y
xy (R) = x [y (R)]

Rules:  + combined
Let p = predicate with only R attribs
q = predicate with only S attribs
m = predicate with only R,S attribs
 p (R S) = 
[ p (R)] S
 q (R S) = R [ q (S)]
B,D B,D
R.C = S.C R.C = S.C
R.A = “c” X
R.A = “c” S
X
R S R
B,D [ R.C=S.C [R.A=“c”(R)] X S]
B,D
B,D
R.C = S.C
Natural join
X R.A = “c” S
R.A = “c” S
R
R
B,D [[R.A=“c”(R)] S]
Rules:  + combined (continued)
pq (R S) = [p (R)] [q (S)]

pqm (R S) =
m [(p R) (q S)]
pvq (R S) =
[(p R) S] U [R (q S)]
Which are “good” transformations?
p1p2 (R)  p1 [p2 (R)]

p (R S)  [p (R)] S
R S  S R
x [p (R)]  x {p [xz (R)]}

Conventional wisdom: do projects early
Example: R(A,B,C,D,E)
P: (A=3)  (B=“cat”)
E {p (R)} vs. E {p{ABE(R)}}

But: What if we have A, B indexes?
B = “cat” A=3
Intersect pointers to get

pointers to matching tuples
Bottom line:
• No transformation is always good

• Some are usually good:
– Push selections down
– Avoid cross-products if possible
– Subqueries  Joins
Avoid Cross Products (if possible)
Select B,D
From R,S,T,U
Where R.A = S.B 
R.C=T.C  R.D = U.D
• Which join trees avoid cross-products?

• If you can't avoid cross products, perform
them as late as possible
More Query Rewrite Rules
• Transform one logical plan into another
– Do not use statistics
• Equivalences in relational algebra
• Push-down predicates
• Do projects early
• Avoid cross-products if possible
• Use left-deep trees
• Subqueries  Joins (we will study this rewrite
rule after we do physical plan selection)
• Use of constraints, e.g., uniqueness

Notes 3

Caricato da

Informazioni sul documento

Copyright

Formati disponibili

Condividi questo documento

Condividi o incorpora il documento

Opzioni di condivisione

Hai trovato utile questo documento?

Questo contenuto è inappropriato?

Copyright:

Formati disponibili

Notes 3

Caricato da

Copyright:

Formati disponibili

CPS216: Advanced Database

Notes 03:Query Processing

Physical plan generation

physical query plan

Query rewriting Logical plan

Physical plan generation “Best” logical plan

We will revisit it towards the end of this lecture

Physical plan generation

Best physical query plan

Natural join Hash join

Best physical query plan

Logical Query Plan

Pick minimum cost one

Natural join Hash join

Materialized here B,D

Question: What is the sequence of getNext() calls?

Physical plan generation 4; 16.4—16.7

physical query plan

 Reduces optimization time drastically without

Also holds for: Cross Products, Union, Intersection

B,D [ R.C=S.C [R.A=“c”(R X S)]]

xy (R) = x [y (R)]

pq (R S) = [p (R)] [q (S)]

p1p2 (R)  p1 [p2 (R)]

x [p (R)]  x {p [xz (R)]}

E {p (R)} vs. E {p{ABE(R)}}

Intersect pointers to get

• No transformation is always good

• Which join trees avoid cross-products?

Potrebbero piacerti anche