ICS 424 - 01 (072) Query Processing and Optimization
Chapter 15

Algorithms for Query Processing
and Optimization

ICS 424 Advanced Database Systems

Dr. Muhammad Shafique
Outline
Introduction
Processing a query
SQL queries and relational algebra
Implementing basic query operations
Heuristics-based query optimization
Overview of query optimization in Oracle
Material Covered from Chapter 15
Pages 537, 538, 539
Section 15.1
Section 15.2
Section 15.6
Section 15.7
Section 15.9
Introduction to Query Processing
Query optimization
The process of choosing a suitable execution strategy
for processing a query.
Two internal representations of a query:
Query Tree
Query Graph
Background Review
DDL compiler
DML compiler
Runtime database processor
System catalog
Processing a Query
Tasks in processing a high-level query
1. Scanner scans the query and identifies the language tokens
2. Parser checks syntax of the query
3. The query is validated by checking that all attribute names and
relation names are valid
4. An intermediate internal representation for the query is created
(query tree or query graph)
5. Query execution strategy is developed
6. Query optimizer produces an execution plan
7. Code generator generates the object code
8. Runtime database processor executes the code
Query processing and query optimization
Processing a Query
Typical steps in processing a high-level query
1. Query in a high-level query language like SQL
2. Scanning, parsing, and validation
3. Intermediate-form of query like query tree
4. Query optimizer
5. Execution plan
6. Query code generator
7. Object-code for the query
8. Run-time database processor
9. Results of query
SQL Queries and Relational Algebra
An SQL query is translated into an equivalent extended
relational algebra expression, represented as a query tree
To transform a given query into a query tree, the
query is decomposed into query blocks
Query block:
The basic unit that can be translated into the algebraic operators and
optimized.
A query block contains a single SELECT-FROM-WHERE
expression, as well as GROUP BY and HAVING clause if these are
part of the block.
The query optimizer chooses an execution plan for each
block
COMPANY Relational Database Schema (1)
COMPANY Relational Database Schema (2)
SQL Queries and Relational Algebra (1)
Example
SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > ( SELECT MAX(Salary)
FROM EMPLOYEE
WHERE Dno = 5 )
Inner block and outer block

Translating SQL Queries into Relational Algebra
SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > ( SELECT MAX (SALARY)
FROM EMPLOYEE
WHERE DNO = 5);
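Inner block:
SELECT MAX (SALARY)
FROM EMPLOYEE
WHERE DNO = 5
Outer block (C is the value returned by the inner block):
SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > C
Inner block in relational algebra:
$\mathcal{F}_{\text{MAX SALARY}}(\sigma_{\text{DNO}=5}(\text{EMPLOYEE}))$
Outer block in relational algebra:
$\pi_{\text{LNAME, FNAME}}(\sigma_{\text{SALARY}>C}(\text{EMPLOYEE}))$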
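The two-block decomposition can be imitated on in-memory data. The following is only a sketch: the EMPLOYEE rows and the function names `inner_block` and `outer_block` are hypothetical, chosen for illustration.

```python
# Hypothetical EMPLOYEE rows; names and salaries are illustrative only.
EMPLOYEE = [
    {"LNAME": "Smith", "FNAME": "John", "SALARY": 30000, "DNO": 5},
    {"LNAME": "Wong", "FNAME": "Franklin", "SALARY": 40000, "DNO": 5},
    {"LNAME": "Wallace", "FNAME": "Jennifer", "SALARY": 43000, "DNO": 4},
    {"LNAME": "Borg", "FNAME": "James", "SALARY": 55000, "DNO": 1},
]


def inner_block(employee):
    """MAX(SALARY) over the tuples with DNO = 5; yields the constant C."""
    return max(e["SALARY"] for e in employee if e["DNO"] == 5)


def outer_block(employee, c):
    """Project LNAME, FNAME from the tuples with SALARY > C."""
    return [(e["LNAME"], e["FNAME"]) for e in employee if e["SALARY"] > c]


# The inner block is uncorrelated, so it is evaluated once and its
# result is then treated as a constant by the outer block.
C = inner_block(EMPLOYEE)
result = outer_block(EMPLOYEE, C)
print(C, result)
```

Because the inner block does not refer to the outer query, it needs to be evaluated only once.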
SQL Queries and Relational Algebra (2)
Uncorrelated nested queries vs. correlated nested queries
Example
Retrieve the name of each employee who works on all the projects
controlled by department number 5.

SELECT FNAME, LNAME
FROM EMPLOYEE
WHERE ( (SELECT PNO
FROM WORKS_ON
WHERE SSN=ESSN)
CONTAINS
(SELECT PNUMBER
FROM PROJECT
WHERE DNUM=5) )

SQL Queries and Relational Algebra (3)
Example
For every project located in Stafford, retrieve the project number,
the controlling department number, and the department manager's
last name, address, and birthdate.
SQL query:
SELECT P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.DNUM=D.DNUMBER AND D.MGRSSN=E.SSN AND
P.PLOCATION='Stafford';
Relational algebra:
$\pi_{\text{PNUMBER, DNUM, LNAME, ADDRESS, BDATE}}(((\sigma_{\text{PLOCATION}=\text{'Stafford'}}(\text{PROJECT})) \bowtie_{\text{DNUM}=\text{DNUMBER}} (\text{DEPARTMENT})) \bowtie_{\text{MGRSSN}=\text{SSN}} (\text{EMPLOYEE}))$
SQL Queries and Relational Algebra (4)
Implementing Basic Query Operations
An RDBMS must provide implementation(s) for
all the required operations including relational
operators and more
External sorting
Sort-merge strategy
Sorting phase
Number of file blocks ($b$)
Number of available buffers ($n_B$)
Runs --- $\lceil b / n_B \rceil$
Merging phase --- passes
Degree of merging --- the number of runs that are merged
together in each pass

Algorithms for External Sorting (1)
External sorting:
Refers to sorting algorithms that are suitable for large files
of records stored on disk that do not fit entirely in main
memory, such as most database files.
Sort-Merge strategy:
Starts by sorting small subfiles (runs) of the main file and
then merges the sorted runs, creating larger sorted subfiles
that are merged in turn.
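A minimal sketch of the sort-merge strategy, assuming Python lists stand in for disk-resident runs. The function name `external_sort` and the parameters `buffer_records` and `degree` are hypothetical; a real implementation would read and write file blocks rather than in-memory lists.

```python
import heapq


def external_sort(records, buffer_records, degree):
    """Sort-merge sketch: each inner list plays the role of a run on disk."""
    if buffer_records < 1 or degree < 2:
        raise ValueError("need buffer_records >= 1 and degree >= 2")
    # Sorting phase: read one buffer-sized portion at a time and sort it
    # in memory, producing the initial sorted runs.
    runs = [sorted(records[i:i + buffer_records])
            for i in range(0, len(records), buffer_records)]
    # Merging phase: each pass merges groups of up to `degree` runs,
    # so the runs get longer and fewer until a single run remains.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + degree]))
                for i in range(0, len(runs), degree)]
    return runs[0] if runs else []


print(external_sort([9, 3, 7, 1, 8, 2, 6, 4, 5], buffer_records=2, degree=2))
```

`heapq.merge` consumes its sorted inputs lazily, which mirrors how a merge pass needs only one buffer per input run plus one for output.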
Algorithms for External Sorting (2)
Algorithms for External Sorting (3)
Analysis
Number of file blocks = $b$
Number of initial runs = $n_R$
Available buffer space = $n_B$
Sorting phase: $n_R = \lceil b / n_B \rceil$
Degree of merging: $d_M = \min(n_B - 1, n_R)$
Number of passes: $n_P = \lceil \log_{d_M}(n_R) \rceil$
Number of block accesses: $(2 \cdot b) + (2 \cdot b \cdot \log_{d_M}(n_R))$
Example done in the class
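The analysis can be turned into a small calculator. This is a sketch under stated assumptions: the values b = 1024 and n_B = 5 are chosen here for illustration and are not necessarily the example done in class, and the function name `external_sort_cost` is hypothetical. The block-access figure uses the whole number of merge passes.

```python
def external_sort_cost(b, n_b):
    """Return (n_R, d_M, n_P, block accesses) for b file blocks and n_B buffers."""
    if n_b < 3:
        raise ValueError("need at least 3 buffers to merge 2 runs at a time")
    n_r = -(-b // n_b)              # ceil(b / n_B): number of initial runs
    d_m = min(n_b - 1, n_r)         # degree of merging
    # Count merge passes by repeated ceiling division, which equals
    # ceil(log_{d_M}(n_R)) without floating-point rounding issues.
    passes, runs = 0, n_r
    while runs > 1:
        runs = -(-runs // d_m)
        passes += 1
    # Sorting phase reads and writes every block once (2 * b); each
    # merge pass reads and writes every block once more.
    accesses = 2 * b + 2 * b * passes
    return n_r, d_m, passes, accesses


print(external_sort_cost(1024, 5))
```

With these assumed values there are 205 initial runs, merged 4 at a time over 4 passes.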
Implementing Basic Query Operations (cont.)
Estimates of selectivity
Selectivity is the ratio of the number of tuples that satisfy the
condition to the total number of tuples in the relation.
SELECT ($\sigma$) operator implementation
1. Linear search
2. Binary search
3. Using a primary index (or hash key)
4. Using primary index to retrieve multiple records
5. Using clustering index to retrieve multiple records
6. Using a secondary index on an equality comparison
7. Conjunctive selection using an individual index
8. Conjunctive selection using a composite index
9. Conjunctive selection by intersection of record pointers
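Two of the ideas in this list can be sketched in a few lines: selectivity as a ratio, and conjunctive selection by intersecting record pointers (method 9). The data and the helper names `selectivity`, `build_index`, and `conjunctive_select` are hypothetical; list positions play the role of record pointers, and dictionaries stand in for secondary indexes.

```python
# Hypothetical EMPLOYEE records; the list position acts as the record pointer.
EMPLOYEE = [
    {"LNAME": "Smith", "DNO": 5, "SEX": "M"},
    {"LNAME": "Wong", "DNO": 5, "SEX": "M"},
    {"LNAME": "Zelaya", "DNO": 4, "SEX": "F"},
    {"LNAME": "English", "DNO": 5, "SEX": "F"},
]


def selectivity(records, predicate):
    """Fraction of tuples that satisfy the condition."""
    return sum(1 for r in records if predicate(r)) / len(records)


def build_index(records, attr):
    """Stand-in for a secondary index: attribute value -> set of record pointers."""
    index = {}
    for rid, record in enumerate(records):
        index.setdefault(record[attr], set()).add(rid)
    return index


def conjunctive_select(records, indexes, conditions):
    """Equality conditions only (at least one): intersect the pointer sets."""
    pointer_sets = [indexes[attr].get(value, set())
                    for attr, value in conditions.items()]
    rids = set.intersection(*pointer_sets)
    return [records[rid] for rid in sorted(rids)]


indexes = {attr: build_index(EMPLOYEE, attr) for attr in ("DNO", "SEX")}
print(selectivity(EMPLOYEE, lambda r: r["DNO"] == 5))
print(conjunctive_select(EMPLOYEE, indexes, {"DNO": 5, "SEX": "F"}))
```

Only the records whose pointers survive the intersection are fetched, which is the point of the method when each individual condition is not very selective.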
Implementing Basic Query Operations (cont.)
JOIN operator implementation
1. Nested-loop join
2. Sort-merge join
3. Hash join
Partition Hash join
Hybrid hash join
PROJECT operator implementation
Set operator implementation
Implementing Aggregate operators/functions
Implementing OUTER JOIN
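The three basic JOIN strategies can be compared on in-memory data. This is a sketch only: tuples are dictionaries, the relations and function names are hypothetical, and the hash join shown is the simple single-pass form with the build side held in memory, not the partition or hybrid variants.

```python
def nested_loop_join(r, s, a, b):
    """Compare every tuple of r with every tuple of s."""
    return [{**x, **y} for x in r for y in s if x[a] == y[b]]


def sort_merge_join(r, s, a, b):
    """Sort both inputs on the join attributes, then scan them in step."""
    r = sorted(r, key=lambda t: t[a])
    s = sorted(s, key=lambda t: t[b])
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1
        elif r[i][a] > s[j][b]:
            j += 1
        else:
            value, j_end = r[i][a], j
            # Find the group of s tuples that share this join value.
            while j_end < len(s) and s[j_end][b] == value:
                j_end += 1
            # Pair every matching r tuple with that whole group.
            while i < len(r) and r[i][a] == value:
                out.extend({**r[i], **s[k]} for k in range(j, j_end))
                i += 1
            j = j_end
    return out


def hash_join(r, s, a, b):
    """Build a hash table on s, then probe it once per tuple of r."""
    table = {}
    for y in s:
        table.setdefault(y[b], []).append(y)
    return [{**x, **y} for x in r for y in table.get(x[a], [])]


def canon(rows):
    """Order-independent form, so results of different algorithms can be compared."""
    return sorted(tuple(sorted(t.items())) for t in rows)


# Hypothetical relations joined on EMPLOYEE.DNO = DEPARTMENT.DNUMBER.
EMPLOYEE = [{"LNAME": "Smith", "DNO": 5}, {"LNAME": "Wallace", "DNO": 4},
            {"LNAME": "Wong", "DNO": 5}, {"LNAME": "Doe", "DNO": 9}]
DEPARTMENT = [{"DNUMBER": 5, "DNAME": "Research"},
              {"DNUMBER": 4, "DNAME": "Administration"},
              {"DNUMBER": 1, "DNAME": "Headquarters"}]

results = [canon(join(EMPLOYEE, DEPARTMENT, "DNO", "DNUMBER"))
           for join in (nested_loop_join, sort_merge_join, hash_join)]
print(len(results[0]), results[0] == results[1] == results[2])
```

All three produce the same set of tuples; they differ only in how many comparisons and block accesses they need to get there.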
Buffer Space and Join performance
In the nested-loop join, it makes a difference which file is chosen for
the outer loop and which for the inner loop. If EMPLOYEE is used for
the outer loop, each block of EMPLOYEE is read once, and the entire
DEPARTMENT file (each of its blocks) is read once for each time we
read in $(n_B - 2)$ blocks of the EMPLOYEE file. We get the following:

Total number of blocks accessed for outer file = $b_E$
Number of times $(n_B - 2)$ blocks of outer file are loaded = $\lceil b_E / (n_B - 2) \rceil$
Total number of blocks accessed for inner file = $b_D \cdot \lceil b_E / (n_B - 2) \rceil$

Hence, we get the following total number of block accesses:

$b_E + (\lceil b_E / (n_B - 2) \rceil \cdot b_D) = 2000 + ((2000/5) \cdot 10) = 6000$ blocks

On the other hand, if we use the DEPARTMENT records in the outer
loop, by symmetry we get the following total number of block
accesses:

$b_D + (\lceil b_D / (n_B - 2) \rceil \cdot b_E) = 10 + ((10/5) \cdot 2000) = 4010$ blocks
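The two totals can be reproduced with a short function. The block counts 2000 and 10 come from the arithmetic on this slide; n_B = 7 is an inference from the divisor 5 (that is, n_B - 2 = 5), and the function name `nested_loop_cost` is hypothetical.

```python
def nested_loop_cost(b_outer, b_inner, n_b):
    """Block accesses for a nested-loop join with n_B buffers.

    (n_B - 2) buffers hold a chunk of the outer file; the remaining two
    are for one inner-file block and one result block.
    """
    chunk = n_b - 2
    outer_loads = -(-b_outer // chunk)   # ceil(b_outer / (n_B - 2))
    # Outer file is read once; inner file is read once per outer chunk.
    return b_outer + outer_loads * b_inner


B_E, B_D, N_B = 2000, 10, 7   # N_B = 7 is inferred, not stated on the slide
print(nested_loop_cost(B_E, B_D, N_B))   # EMPLOYEE as the outer file
print(nested_loop_cost(B_D, B_E, N_B))   # DEPARTMENT as the outer file
```

Placing the smaller file in the outer loop reduces the number of times the larger file has to be scanned.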

Implementing Basic Query Operations (cont.)
Combining operations using pipelining
Temporary files based processing
Pipelining or stream-based processing
Example: consider the execution of the following query

$\pi_{\text{list of attributes}}((\sigma_{c_1}(R)) \bowtie (\sigma_{c_2}(S)))$
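Pipelined execution of a query of this shape can be sketched with Python generators, where each operator pulls tuples from its input on demand instead of writing a temporary file. The relations R and S, the conditions, and the operator function names are all hypothetical.

```python
def select(predicate, source):
    """Pass along only the tuples that satisfy the condition."""
    for t in source:
        if predicate(t):
            yield t


def join(left, right, condition):
    """Nested-loop join; the right input is buffered, the left is streamed."""
    right_rows = list(right)
    for x in left:
        for y in right_rows:
            if condition(x, y):
                yield {**x, **y}


def project(attrs, source):
    """Keep only the listed attributes (duplicates are not eliminated here)."""
    for t in source:
        yield {a: t[a] for a in attrs}


# Hypothetical relations and conditions.
R = [{"A": 1, "B": 10}, {"A": 2, "B": 20}, {"A": 3, "B": 30}]
S = [{"C": 1, "D": "x"}, {"C": 3, "D": "y"}, {"C": 3, "D": "z"}]

plan = project(
    ["A", "D"],
    join(select(lambda t: t["B"] >= 20, R),        # c1 on R
         select(lambda t: t["D"] != "z", S),       # c2 on S
         lambda x, y: x["A"] == y["C"]),
)
result = list(plan)   # nothing is computed until the plan is consumed
print(result)
```

Building `plan` does no work; tuples flow through the selections, the join, and the projection one at a time as the final `list` call consumes them.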
General Transformation Rules for
Relational Algebra Operations
1. Cascade of $\sigma$: A conjunctive selection condition can be
broken up into a cascade (that is, a sequence) of
individual $\sigma$ operations:
$\sigma_{c_1 \text{ AND } c_2 \text{ AND } \ldots \text{ AND } c_n}(R) \equiv \sigma_{c_1}(\sigma_{c_2}(\ldots(\sigma_{c_n}(R))\ldots))$
2. Commutativity of $\sigma$: The $\sigma$ operation is commutative:
$\sigma_{c_1}(\sigma_{c_2}(R)) \equiv \sigma_{c_2}(\sigma_{c_1}(R))$
3. Cascade of $\pi$: In a cascade (sequence) of $\pi$ operations, all
but the last one can be ignored
4. Commuting $\sigma$ with $\pi$: If the selection condition $c$
involves only those attributes A1, ..., An in the projection
list, the two operations can be commuted
And more
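The four rules can be checked on a small example. This is a sketch, not a proof: the relation R, the conditions, and the helper names `select` and `project` are hypothetical, and a check on one relation only illustrates the equivalences.

```python
def select(predicate, relation):
    return [t for t in relation if predicate(t)]


def project(attrs, relation):
    """Projection with duplicate elimination, keeping first-seen order."""
    out = []
    for t in relation:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out


# Hypothetical relation and conditions.
R = [{"A": 1, "B": 5, "C": "x"}, {"A": 2, "B": 7, "C": "y"},
     {"A": 3, "B": 7, "C": "x"}, {"A": 1, "B": 5, "C": "z"}]


def c1(t):
    return t["A"] >= 2


def c2(t):
    return t["B"] == 7


# Rule 1: a conjunctive selection equals a cascade of selections.
rule1 = select(lambda t: c1(t) and c2(t), R) == select(c1, select(c2, R))
# Rule 2: selections commute.
rule2 = select(c1, select(c2, R)) == select(c2, select(c1, R))
# Rule 3: in a cascade of projections only the outermost list matters.
rule3 = project(["A"], project(["A", "B"], R)) == project(["A"], R)
# Rule 4: selection and projection commute when the condition uses
# only attributes in the projection list (here c1 uses only A).
rule4 = project(["A", "B"], select(c1, R)) == select(c1, project(["A", "B"], R))
print(rule1, rule2, rule3, rule4)
```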
Heuristic-Based Query Optimization
Outline of heuristic algebraic optimization algorithm
1. Break up SELECT operations with conjunctive conditions into a
cascade of SELECT operations
2. Using the commutativity of SELECT with other operations, move
each SELECT operation as far down the query tree as is permitted
by the attributes involved in the select condition
3. Using commutativity and associativity of binary operations,
rearrange the leaf nodes of the tree
4. Combine a CARTESIAN PRODUCT operation with a
subsequent SELECT operation in the tree into a JOIN operation,
if the condition represents a join condition
5. Using the cascading of PROJECT and the commuting of
PROJECT with other operations, break down and move lists of
projection attributes down the tree as far as possible by creating
new PROJECT operations as needed
6. Identify sub-trees that represent groups of operations that can be
executed by a single algorithm
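Steps 1 and 2 of this outline can be sketched on a toy query tree. Everything here is an assumption made for illustration: the node classes `Rel`, `Product`, and `Select`, the functions `push_down` and `optimize`, and the representation of a condition as a label plus the set of relations it mentions. The conjuncts are taken from the Aquarius example query used in this deck; steps 3 to 6 are not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rel:
    name: str


@dataclass(frozen=True)
class Product:
    left: object
    right: object


@dataclass(frozen=True)
class Select:
    cond: str
    rels: frozenset   # relations whose attributes the condition mentions
    child: object


def relations(node):
    """Names of the base relations below a node."""
    if isinstance(node, Rel):
        return {node.name}
    if isinstance(node, Select):
        return relations(node.child)
    return relations(node.left) | relations(node.right)


def push_down(cond, rels, node):
    """Step 2: move one SELECT as far down as its attributes permit."""
    if isinstance(node, Select):
        # SELECT operations commute, so slide past an existing one.
        return Select(node.cond, node.rels, push_down(cond, rels, node.child))
    if isinstance(node, Product):
        if rels <= relations(node.left):
            return Product(push_down(cond, rels, node.left), node.right)
        if rels <= relations(node.right):
            return Product(node.left, push_down(cond, rels, node.right))
    return Select(cond, rels, node)


def optimize(conjuncts, tree):
    """Step 1 is implicit: each conjunct is already a separate SELECT."""
    for cond, rels in conjuncts:
        tree = push_down(cond, frozenset(rels), tree)
    return tree


def show(node):
    if isinstance(node, Rel):
        return node.name
    if isinstance(node, Select):
        return f"select[{node.cond}]({show(node.child)})"
    return f"({show(node.left)} x {show(node.right)})"


tree = Product(Product(Rel("EMPLOYEE"), Rel("WORKS_ON")), Rel("PROJECT"))
conjuncts = [
    ("PNAME='Aquarius'", {"PROJECT"}),
    ("PNUMBER=PNO", {"PROJECT", "WORKS_ON"}),
    ("ESSN=SSN", {"EMPLOYEE", "WORKS_ON"}),
    ("BDATE>'1957-12-31'", {"EMPLOYEE"}),
]
plan = show(optimize(conjuncts, tree))
print(plan)
```

In the printed plan the single-relation conditions sit directly on EMPLOYEE and PROJECT, while each two-relation condition rests on the lowest product that contains both of its relations, which is where step 4 would turn it into a JOIN.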

Heuristic-Based Query Optimization:
Example
Query
"Find the last names of employees born after 1957
who work on a project named Aquarius."

SQL
SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME='Aquarius' AND PNUMBER=PNO
AND ESSN=SSN AND BDATE>'1957-12-31';
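The benefit of applying selections early can be seen by evaluating this query two ways on a tiny database. The rows and the function names `canonical_plan` and `heuristic_plan` are hypothetical; the sketch only counts the size of the largest intermediate result in each plan.

```python
# Hypothetical rows; dates are ISO strings, so string comparison orders them.
EMPLOYEE = [{"SSN": 1, "LNAME": "Smith", "BDATE": "1965-01-09"},
            {"SSN": 2, "LNAME": "Wong", "BDATE": "1955-12-08"},
            {"SSN": 3, "LNAME": "English", "BDATE": "1972-07-31"}]
WORKS_ON = [{"ESSN": 1, "PNO": 10}, {"ESSN": 2, "PNO": 10},
            {"ESSN": 3, "PNO": 20}, {"ESSN": 1, "PNO": 20}]
PROJECT = [{"PNUMBER": 10, "PNAME": "Aquarius"},
           {"PNUMBER": 20, "PNAME": "Taurus"}]


def canonical_plan():
    """Cartesian product of all three relations, then one big selection."""
    product = [{**e, **w, **p}
               for e in EMPLOYEE for w in WORKS_ON for p in PROJECT]
    rows = [t for t in product
            if t["PNAME"] == "Aquarius" and t["PNUMBER"] == t["PNO"]
            and t["ESSN"] == t["SSN"] and t["BDATE"] > "1957-12-31"]
    return sorted({t["LNAME"] for t in rows}), len(product)


def heuristic_plan():
    """Selections first, then joins on the reduced inputs."""
    p = [t for t in PROJECT if t["PNAME"] == "Aquarius"]
    e = [t for t in EMPLOYEE if t["BDATE"] > "1957-12-31"]
    pw = [{**x, **w} for x in p for w in WORKS_ON if x["PNUMBER"] == w["PNO"]]
    pwe = [{**x, **y} for x in pw for y in e if x["ESSN"] == y["SSN"]]
    largest = max(len(p), len(e), len(pw), len(pwe))
    return sorted({t["LNAME"] for t in pwe}), largest


print(canonical_plan())
print(heuristic_plan())
```

Both plans return the same last names, but the canonical plan materializes the full product while the heuristic plan never holds more than a couple of tuples at a time on this data.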
Overview of Query Optimization in Oracle
Rule-based query optimization: the optimizer chooses execution plans
based on heuristically ranked operations.
May be phased out
Cost-based query optimization: the optimizer examines alternative access
paths and operator algorithms and chooses the execution plan with the lowest
estimated cost.
The query cost is calculated based on the estimated usage of resources such as
I/O, CPU, and memory.
Application developers can specify hints to the ORACLE query
optimizer.
The application developer might know more about the data than the optimizer does.
SELECT /*+ ...hint... */ [rest of query]
SELECT /*+ index(t1 t1_abc) index(t2 t2_abc) */ COUNT(*)
FROM t1, t2
WHERE t1.col1 = t2.col1;

Summary
Background review
Processing a query
SQL queries and relational algebra
Implementing basic query operations
Heuristics-based query optimization
Overview of query optimization in Oracle