
TEC 1224

Understanding SAP Sybase IQ Query Execution and Query Plans
Lou Stanton, Principal Solution Engineer
October 2012
Disclaimer

This presentation outlines our general product direction and should not be relied on in making a
purchase decision. This presentation is not subject to your license agreement or any other agreement
with SAP. SAP has no obligation to pursue any course of business outlined in this presentation or to
develop or release any functionality mentioned in this presentation. This presentation and SAP's
strategy and possible future developments are subject to change and may be changed by SAP at any
time for any reason without notice. This document is provided without a warranty of any kind, either
express or implied, including but not limited to, the implied warranties of merchantability, fitness for a
particular purpose, or non-infringement. SAP assumes no responsibility for errors or omissions in this
document, except if such damages were caused by SAP intentionally or grossly negligent.

2012 SAP AG. All rights reserved. 2


About Me

Lou Stanton, Principal Solution Engineer

Over 19 years with Sybase/SAP

17 years in the Business Intelligence/Data Warehousing arena

Work with customers and partners to install, configure and tune SAP Sybase IQ databases



Takeaways from this presentation

1. An understanding of how queries are executed in SAP Sybase IQ
2. How to access graphical query plans
3. How to analyze and understand query plans
4. Tuning your IQ server and database for better performance



Query Processing
General Overview
Query execution phases

Upon submission of a query -

1. Syntax and permissions are checked

2. Query is parsed

3. Query is optimized

4. Query Plan is created

5. Query is executed

6. Resources are cleaned up



Query execution: what happens where

Server Front End (SA)
Connection Management
Parse Incoming Statement
Security Checking
Cross-DB Decomposition (CIS)
Stored Procedures
SQL User Defined Functions

IQ Query Optimizer
Predicate Inference
Predicate Selectivity Estimation
Join Optimization
Grouping Algorithm Selection
Subquery Optimization
Index Access Selection

Query Plan

IQ Run-Time Engine
Prefetch Manager
Predicate Execution
Tuple (Row) Projection
Join Execution
Grouping Execution
Sorting
Subquery Execution

Typical query processing
(Diagram: Network Clients send a SQL query to the server; results flow back to the clients.)

SQL Anywhere (SA), prod.db
Server connections
Database security
SQL parsing
more

SA Bridge

IQ Engine (IQ Query Engine)
Optimization
Query execution
Data access

IQ Temp Store, IQ Main Store

SQL Anywhere (SA) and SAP Sybase IQ query execution
SA gets more involved when a query involves

Stored procedures
Procedure code is retrieved from system tables for execution in the IQ Query engine
Stored procs are recompiled and optimized for each execution

Cursors
Invokes row-by-row processing between SA and the IQ query engine

User Defined Functions*
Invokes row-by-row processing between SA and the IQ query engine

* Does NOT apply to Partner UDF libraries added to the SAP Sybase IQ libraries



Cursors
All Cursors are executed in the SA part of the engine
A row from a cursor in an IQ query will be passed to SA for processing - one row at a time

For performance reasons, avoid using Cursors when many rows need to be processed
Try to leverage the IQ engine whenever possible
Use Set Logic or Case statements rather than Cursors

For Example:
Use a series of Update commands with a Where clause to modify values in specific rows



SQL user defined functions (UDFs)

Like cursors, UDFs are handled in SA and processed row by row
And, like cursors, they should be avoided with large result sets

Exceptions (not subject to row-by-row processing):
IQ system functions are processed entirely within the IQ engine
Partner analytical UDF libraries (IQ 15.x) are catalogued in SA but are processed entirely within the IQ engine



Queries that can invoke SA
SA Cursor and SQL UDF Processing
(Diagram: Network Clients send a SQL query; SA with prod.db, the SA Bridge, and the IQ Engine with its Temp Store and Main Store; results flow back to the clients.)

IQ sends a row to SA for processing
Row is returned to IQ
Repeat until all rows are processed
This may take some time
Final result set is sent back to client

Query parallelism in IQ 15

15.2 introduced additional SMP query parallelism in queries
Using many threads on the same server to resolve a query
Many (not all) query operators could invoke parallelism

15.3 added distributed query parallelism across other servers in IQ Multiplex (PlexQ)
Also added additional query operators eligible for parallel execution

Note: Query operators refer to Joins, Group By, Union, etc.



Query parallelism

Query operations now performed in parallel


Most table join operations
Group By
Sorting (Order By and Merge Joins)
Predicate execution

HTML Query Plans illustrate parallel operations and the degree of parallelism
Introduced later in this presentation



Dynamic parallelism

SAP Sybase IQ maintains the ability to support many concurrent users but can take advantage of an idle server
A big query running alone might use many or all cores

The Query engine dynamically adapts to changes in server activity by increasing or decreasing parallelism for individual queries
As other users launch queries during a big query, IQ gracefully scales back the big query's resources



Delayed Projection in SAP Sybase IQ 15
Delayed Projection refers to the operation of decoding optimized FP indexes
(aka tokens) to the actual data values during query execution

SAP Sybase IQ 12.x - Decoding occurred immediately in the leaf node


Actual data values used for all query operations

SAP Sybase IQ 15.x - Decoding is delayed until actual data values are absolutely required
Tokens are manipulated rather than data values reducing memory usage



Tokenization (review)
Many column data values can be stored as tokens
Commonly known as Optimized FP indexes (FP1, FP2, FP3)
Enabled using Minimize_Storage = ON or IQ Unique() with Create Table

Tokens are 1 byte, 2 bytes or 3 bytes


Actual column values are stored as Lookup Tables in the database (aka the FP Lookup Page)
Lookup tables are automatically maintained by the IQ server
Token value is stored as the row value rather than the actual value
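
The tokenization idea above can be sketched in a few lines of Python. This is only an illustration of dictionary encoding under simplifying assumptions, not IQ's actual storage format; the names tokenize, decode and token_width_bytes are hypothetical, and the width rule simply assumes an n-byte token can number up to 256^n distinct values.

```python
def tokenize(values):
    """Dictionary-encode a column: return (lookup_table, tokens)."""
    lookup = []      # token -> actual value (plays the role of the lookup table)
    token_of = {}    # actual value -> token
    tokens = []      # one small integer per row, stored instead of the value
    for v in values:
        if v not in token_of:
            token_of[v] = len(lookup)
            lookup.append(v)
        tokens.append(token_of[v])
    return lookup, tokens


def decode(lookup, tokens):
    """Turn tokens back into the actual column values."""
    return [lookup[t] for t in tokens]


def token_width_bytes(distinct_count):
    """Smallest of the 1-, 2- or 3-byte token sizes that can number the distinct values."""
    for width in (1, 2, 3):
        if distinct_count <= 256 ** width:
            return width
    return None  # too many distinct values for a token in this sketch


states = ["CA", "NV", "CA", "DC", "NV", "CA"]
lookup, tokens = tokenize(states)
print(lookup, tokens, token_width_bytes(len(lookup)))
```

Each row stores only a small integer, while the distinct values are kept once in the lookup list.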



SAP Sybase IQ 15 - Delayed projection

Delaying decoding of token values in queries


Tokens are used in query operations rather than actual values
SORT operations, in particular

Using tokens reduces memory requirements for these operations (often significantly)

Query Plans show which value was used


FPORDINAL = token
FPVALUE = actual data value
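
A rough Python sketch of the difference between decoding early and decoding late. It assumes a toy column that is already tokenized; the function names and sample data are hypothetical, and a simple grouping count stands in for the sort and group operations the engine performs.

```python
from collections import Counter

lookup = ["AUTOMOBILE", "BUILDING", "MACHINERY"]   # token -> actual value
tokens = [2, 0, 1, 0, 2, 2, 0]                     # one token per row


def count_by_value_early(lookup, tokens):
    """12.x style in this sketch: decode every row first, then group on the wide values."""
    values = [lookup[t] for t in tokens]
    return dict(Counter(values))


def count_by_value_delayed(lookup, tokens):
    """15.x style in this sketch: group on the small tokens, decode only the final groups."""
    per_token = Counter(tokens)
    return {lookup[t]: n for t, n in per_token.items()}


print(count_by_value_delayed(lookup, tokens))
```

Both functions return the same answer; the delayed version only ever holds small integers in its working set, which is the memory saving the slide describes.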



Sort Using Tokens Reduces Memory
Query run against a 600 million row table

Select top 100 l_orderkey, sum(l_quantity), max(l_shipdate), count(*)
from lineitem
Group By l_orderkey
having sum(l_quantity) > 300
Query has a big sort operation

25% faster using delayed projection


Temp Space Used:
SAP Sybase IQ 12.7 - 9.2 GB
SAP Sybase IQ 15.x - 4.6 GB
Query Plans Overview
Database options for server side and client side plans
SAP Sybase IQ 15 query plans

Query performance changes in SAP Sybase IQ 15 are highly visible in query plans and show:
Operations performed in parallel
Thread and CPU utilization
Details in query timing



Why are query plans important?
Query plans provide important details about how a query was executed

Consider using a query plan when ...


You suspect (or know) a query is running poorly
Query plans show you how the optimizer processed the query
You may be able to identify a problem in the plan
You are not sure you have the correct indexes for a query
Query plans tell you which indexes are being used
When enabled, the IQ Index Advisor will provide suggestions in plans



How do you access a query plan?

In the IQ Message File


As text embedded in the IQ Message Log

Server side - as an HTML file


Created server side

Client side - as a Graphical Plan


Graphical image (in the Interactive SQL client)
As HTML plan
As an XML plan



Query plan database options
Query_Plan = On (default On)
Creates a very basic query plan in the IQ Message File
Additional query plan options are required for detailed analysis

Query_Detail = On (default Off)


Adds important details to query plans
Should always be ON when analyzing queries

Index_Advisor = On (default = Off)


Adds index advice and design recommendations to plans



Query plan database options (cont.)

Query_Plan_After_Run = On (default = Off)


Delays creating a query plan until query completes
Useful for analyzing optimizer decisions and timings for each operation

Query_Timing = On (default = Off)


Creates query timing histogram (on graphical plans)
Shows thread counts and CPU utilization
Must be used with the Query_Plan_After_Run = On option
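
As a small illustration, the options above could be collected into statements before a tuning session. This Python sketch only builds the statement text; the helper name plan_option_statements is hypothetical, and it assumes the options are set for the current connection with SET TEMPORARY OPTION. It enforces the one dependency stated above: Query_Timing needs Query_Plan_After_Run.

```python
def plan_option_statements(detail=True, after_run=True, timing=True, index_advisor=False):
    """Return SET TEMPORARY OPTION statements for the query plan options described above."""
    if timing and not after_run:
        # Dependency taken from the slide: timing requires the plan to be built after the run.
        raise ValueError("Query_Timing must be used with Query_Plan_After_Run = On")
    options = {
        "Query_Plan": "On",
        "Query_Detail": "On" if detail else "Off",
        "Query_Plan_After_Run": "On" if after_run else "Off",
        "Query_Timing": "On" if timing else "Off",
        "Index_Advisor": "On" if index_advisor else "Off",
    }
    return [f"SET TEMPORARY OPTION {name} = '{value}';" for name, value in options.items()]


for statement in plan_option_statements(index_advisor=True):
    print(statement)
```

The generated statements would then be run through whatever SQL client is in use before submitting the query under analysis.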



Server side graphical query plan database options

Use these options for creating graphical (html) query plans on the IQ Server

Query_Plan_As_HTML = On (default Off)


Creates a graphical html query plan
File is written to the directory where the IQ Message Log resides

Query_Plan_As_HTML_Directory = <dir_name>
Sets a server side directory for HTML query plans
This is optional but recommended



Other database options for query plans

Query_Name = 'query_name' (default = '')
Prints query_name in the query plan and as part of the file name of HTML query plans

NoExec = On (default = Off)


Creates query plan but does not execute the query

See Appendix 3 for database options to access client side graphical query plans



HTML Query Plans
HTML query plan layout: the big picture

Query Text
Query Tree
Node Detail
Timing Histogram
Threads
CPU Utilization
Wall Clock

Query tree
A visual representation of a query plan
Consists of nodes representing query operations
Nodes are color coded and numbered

The query tree is displayed inverted
The top node in the tree is the Root Node
Node connections show data flow up the tree
Node connections show row counts between nodes

Leaf nodes represent tables
Typically at bottom or edges



Query tree basics

Node Number
Node Type
Number of rows flowing up the tree
Highlighted node indicating Index Advice in that node
Node Number (underlined) is a hot link to the details below in the Query Detail

Query tree enhancements in SAP Sybase IQ 15
New Parallel Operator
Double lines || connecting nodes indicate parallelism (SMP)
Node depth (3D) shows relative number of threads used
Node connector line width is relative to number of rows flowing between nodes
Tooltips
Hovering the mouse over parts of the plan reveals notes
Node containing Index Advice is highlighted in red



IQ 15.2 query tree (SMP)
Parallel Dataflow
Node Connectors
Threads Used
Tooltips (hover mouse pointer)
Connector width varies with row count
Double bar indicates parallel dataflow
Tooltip for estimated row count
Tooltip shows max thread count in nodes
Node depth varies with thread count

Query Timing Histogram and Wall Time
Depicts time expended at each node

Wall clock



Thread Utilization
Displays threads used over the life of the query
Thread usage corresponds to timing histogram above



Query Text



Query Detail
Query Detail for each Node in the Query Tree



Distributed Queries
Query Plans for IQ Multiplex PlexQ plans
SAP Sybase IQ Multiplex distributed queries

In IQ Multiplex environments some query operations can be distributed to other nodes (PlexQ)
Not all queries will qualify for distribution
The node where the query is submitted is the Leader node
Other nodes used with distributed processing are considered Worker nodes

Query plans for distributed queries reflect the work performed on all participating nodes
A query plan fragment is created for each worker node



Distributed Query Plan

Mouse over shows rows processed by worker node
Triple bar indicates distributed processing
Distributed Queries create multiple query plans, one for each distributed fragment
(Diagram label: Not Distributed)


Distributed query plan fragments
This query has 4 distributed fragments (5 plans created)

Fragment 1: Sort operation
Fragment 2: Sort operation
Fragment 3: Sort Merge Join
Fragment 4: Sort Merge Join


Distributed query plans
One primary query plan and one plan for each fragment; each is a different HTML file

Primary Plan
Fragment 1
Fragment 2
Fragment 3
Fragment 4

Query text is displayed only in the primary plan

Timing & Threads in distributed queries
This is the Primary Query Plan
Displays all node timings
Tool tips (hover mouse)
Timings
Threads Used

Node #3 (on Leader)
Used 9 threads
Wall time

Node #101 (distributed)
Node #92 on Server2 (S:2)
Used 2 threads
Wall time


Query Node Details
Close look at important nodes in query plans
Query plan node details

A node represents one operation of a query execution

General node information


Node Type
Parent and Child Node(s)
Node Timing (when Query Timing option is On)
Columns projected (output columns) from the node

Node details otherwise vary by the type of node



Details common to all nodes

Header shows node number and type


Hot link back to node in the Query Tree

Parent Node
Node that called this node
Link to that Node

Child Node(s) (if any)
Node(s) that feed this Node

Result Rows
Generated: Actual rows from this node
Estimate: Optimizer's estimate of rows from this node



Node details - Outputs
Output Vector
Number of columns (entries) and total data width output (in bytes) from the node

Output Details
Data type and number of distinct values (Base Distinct) for each output column
Leaf Node outputs provide additional details including index type and index advice (if enabled)



Node types (there are others)

Root
Scrolling Cursor Store
Filter
Group By (3 types)
Order By
Store
Semi-join Filter
Leaf
Join (6 types)
Subquery
Union All
Parallel Combiner
Parallel Order By
Filler


Nodes to Concentrate On

Some Nodes contain a wealth of information


Detail in other nodes is not as important

Start with these Nodes when analyzing queries:


Root
Leaf
Join



Root Node

The Root node is at the top of the (inverted) query tree

Provides general information and timings for the entire query


Root Node

Root Node Details


Result Rows (actual and estimated)

User Name (login)

IQ Temp Space Used (total)

Active users on server

Number of CPUs

Cache sizes

Threads used for all invariant predicates



Examining the Root node

Temp Space Usage
Estimated total usage over the life of the query
Maximum at any point in time may be lower

Effective Number of Users
The count of active cursors (IQ queries running)

Number of CPUs
The number of CPUs the IQ engine believes are available
With hyper-threaded CPUs, or when the -iqnumbercpus server parameter is set, the number of CPUs may not match the actual number on the server

The optimizer may choose a different Query Plan depending upon the number of CPUs, IQ cache sizes and Active Users



Root node attributes

Cache sizes
A reminder how large the caches are

Threads used for invariant predicates *


Total for all invariant predicates in all tables

Database Options
Options set to non-default values affecting query performance or behavior are displayed with their current value

* Invariant predicate
A predicate with constant value(s) throughout the life of the query
Example: customer_state IN ('CA', 'NV', 'DC')



Query Timings in the Root node

Overall query timing is listed in the Root node


Query_Plan_After_Run and Query_Timing options must be ON to report timings
Total time to resolve all predicates is listed as Elapsed Condition time



Root node - Outputs

Identifies all column output details sent to the client



Scrolling Cursor Store node

Scrolling Cursor Store node is usually found just below the Root Node
Affected by the database option Force_No_Scroll_Cursors (default OFF, so scrolling cursors are allowed)

The store buffers all rows from a query result set
Allows scrolling through results (forward and backwards)
i.e. Scrollable Cursor

Consumes Temp Cache and Temp Store (for large results)
Typically Force_No_Scroll_Cursors is set ON for better query performance, which avoids buffering the result set
Some query clients and ETL tools may require scrollable cursors (Force_No_Scroll_Cursors = OFF)



Temp memory usage with Scrolling Cursors

Force_No_Scroll_Cursors = OFF Force_No_Scroll_Cursors = ON



Leaf node - represents access to a table in the IQ Store
All table indexes and columns used in the query are identified and accessed in the Leaf node

Types of Leaf nodes:

(Regular) Leaf
Aggregation Leaf
Grouping Leaf
Distincting Leaf
Ordered Leaf
Proxy Leaf (Proxy Table)
SA Leaf (table in the SA database)



Leaf node: a wealth of information

All column metadata needed by the optimizer is identified in the Leaf node

Row Counts
Total rows in the table
Estimated count after executing all conditions
Generated count after executing all conditions

Conditions (predicates) in the table showing:


Selectivity
Usefulness
Index used to resolve the predicate
Index Advisor Messages
Predicate execution phase



Leaf node details

Row Counts
Total rows in the table
Generated rows after conditions (if any)

Conditions (Predicates)
Selectivity
Usefulness
Elapsed time (to execute)
Rows remaining (post execution)
Index used
Threads used



Selectivity, Index Selection and Usefulness
For each predicate the optimizer determines

Selectivity
Portion (percentage) of the table rows that satisfy that predicate
HG, LF and FP indexes provide this metadata

Index used to resolve the predicate


When multiple indexes exist, the best index for the predicate
Cost and resources required influence index selection

Usefulness Score
The order in which the predicates are executed
Execution phase, cost and resources required influence Usefulness scoring



Estimated Selectivity
If no metadata exists the optimizer estimates selectivity based on the predicate operator

Predicate Type                        Percent of table (Estimated)   Selectivity
Equality (=)                          20%                            0.2000000
Open Range (>)                        40%                            0.4000000
Between                               40%                            0.4000000
Like (%)                              20%                            0.2000000
Inter-column equality (t.a = t.b)     30%                            0.3000000
Inter-column comparison (t.a < t.b)   50%                            0.5000000

Selectivity with trailing zeroes may indicate an optimizer estimate
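
The default estimates can be written down as a small lookup, which makes the arithmetic explicit. This is a sketch in Python using only the percentages from the table; the dictionary keys and the two helper functions are hypothetical names, and the "looks like an estimate" check is just the rule of thumb from the slide, not an optimizer API.

```python
# Default selectivity per predicate type, as listed in the table above.
DEFAULT_SELECTIVITY = {
    "equality": 0.20,
    "open_range": 0.40,
    "between": 0.40,
    "like": 0.20,
    "inter_column_equality": 0.30,
    "inter_column_comparison": 0.50,
}


def estimated_rows(table_rows, predicate_type):
    """Rows the optimizer would expect to survive when it has no metadata."""
    return round(table_rows * DEFAULT_SELECTIVITY[predicate_type])


def looks_like_estimate(selectivity):
    """Rule of thumb: a selectivity equal to one of the defaults may be an estimate."""
    return any(abs(selectivity - s) < 1e-9 for s in DEFAULT_SELECTIVITY.values())


print(estimated_rows(1_000_000, "equality"))
print(looks_like_estimate(0.2), looks_like_estimate(0.34982376))
```

A value that matches a default is only a hint; an index can legitimately report exactly 20% as well.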



Index-based selectivity

When a usable index exists it can provide accurate selectivity
Exact selectivity is shown as a precise number without trailing zeroes, such as 0.34982376

Note: Some functions can negate the ability to use an index for selectivity
Example: SUBSTRING( t.a, 5, 5 ) = 'homes'
Optimizer might use default selectivity
20% for an equality search (0.2000000)



Usefulness
Usefulness determines predicate execution order

Usefulness value ranges: 0.0 to 10.0 (10.0 = most useful)


Predicate with highest score is executed first
Remaining predicates executed in descending order of usefulness score

Factors determining Usefulness scoring


Selectivity: how many rows in the table will be eliminated by this predicate
Execution Phase
Type of predicate (operator)
Index available (including optimized FP index)
Speed of predicate execution
Resources required to execute the predicate



Local predicate Execution Phase

Three categories of local predicates


Invariant
Delayed
Bound

Category is based upon when the data required for predicate execution is available

Predicate category usually dictates the phase (i.e. timing) when it is executed in the query



Invariant predicates

Invariant: expression does not change during query execution

Most local predicates are invariant predicates


Example: WHERE cust_id = 12345

Invariant predicates are usually executed in an early phase to ensure accurate row count
information is available to subsequent phases of query optimization



Delayed predicates

These are predicates that need to be executed once but require information that will become available only after another part of a query is executed

Example: WHERE cust_id in
(Select cust_id from customer where state = 'VA')

The "WHERE cust_id in" predicate is delayed since the subquery must be executed first

Delayed predicates are typically found with uncorrelated subqueries (above) and some Push-Down joins



Bound predicates

Bound predicates need to be executed repeatedly and require information that will only be available after some other portion of the query has been executed

Bound predicates are commonly used in
Correlated subqueries
Nested-Loop Push-Down joins



Predicate execution phases

Local predicates are typically executed in this order -


1. Resolve all invariant predicates
2. Execute Delayed predicates
3. Execute Bound predicates

Usefulness scores are related to a predicate execution phase

Note: Execution phase can be user influenced by using Optimizer Hints written into query code
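
The ordering described above can be sketched as a sort key: phase first, then usefulness. This Python illustration uses made-up predicates and scores, and it assumes that usefulness decides the order inside a phase, which the slides imply but do not state outright.

```python
# Phase order taken from the list above: invariant, then delayed, then bound.
PHASE_ORDER = {"invariant": 0, "delayed": 1, "bound": 2}


def execution_order(predicates):
    """Order predicates by phase, then by descending usefulness within a phase."""
    return sorted(predicates, key=lambda p: (PHASE_ORDER[p["phase"]], -p["usefulness"]))


# Hypothetical predicates and usefulness scores, for illustration only.
predicates = [
    {"text": "o.cust_id = c.cust_id (correlated)", "phase": "bound", "usefulness": 9.0},
    {"text": "state = 'VA'", "phase": "invariant", "usefulness": 4.5},
    {"text": "cust_id IN (subquery)", "phase": "delayed", "usefulness": 7.0},
    {"text": "cust_id = 12345", "phase": "invariant", "usefulness": 9.8},
]

for p in execution_order(predicates):
    print(p["phase"], p["usefulness"], p["text"])
```

Note that the bound predicate runs last even though its score is high, because its inputs are not available earlier.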



Query operators and indexes
Predicates typically use these indexes with these operators

Operator                                                 Indexes
Equality and Inequality (=, !=), IN and NOT IN lists     LF and HG
Ranges (<, >, <=, >=, Between) including NOT Between     DATE, HNG, LF and HG
Like (%) or Not Like (%), Contains                       FP, WD, TEXT



HG and LF indexes in predicates

The HG and LF indexes provide exact metadata to the optimizer:


The number of distinct values in the column and
The number of rows found for each value

These indexes also provide fast run-time access to the rows


Directly influences predicate execution cost and execution phase
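
To see why this metadata matters, here is a small Python sketch of how a distinct count and per-value row counts yield an exact selectivity. The structure is a plain dictionary built from sample data; it is an assumption made for illustration and says nothing about how HG or LF indexes are stored.

```python
from collections import Counter


def build_metadata(column_values):
    """Collect the two facts named above: distinct count and rows per value."""
    counts = Counter(column_values)
    return {"distinct": len(counts), "rows_per_value": counts, "total": len(column_values)}


def equality_selectivity(meta, value):
    """Exact fraction of rows satisfying column = value, taken from the metadata."""
    if meta["total"] == 0:
        return 0.0
    return meta["rows_per_value"].get(value, 0) / meta["total"]


meta = build_metadata(["CA", "NV", "CA", "DC"])
print(meta["distinct"], equality_selectivity(meta, "CA"))
```

With counts like these available, the optimizer does not need to fall back on a default estimate for an equality predicate.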



FP indexes in predicates

Includes FP(1), FP(2) and FP(3) indexes

Optimized FP indexes (tokens) also provide metadata:


Number of distinct values for a column
Number of rows for each value
Do not provide fast access to the rows (unlike HG and LF indexes)

Tokens dramatically reduce storage, disk I/O and memory


Used with sorts to dramatically reduce memory (IQ 15.x)

FP index searches can use multiple threads for predicate execution


Optimizer may prefer using FP index over other indexes since it can exploit many threads (and be executed faster)
IQ server work load at query time can influence this decision



Multi-threaded FP index searches

Optimizer chooses multi-threaded FP index search to resolve a predicate

Example: C_MKTSEGMENT = 'Automobile'


8 threads used



Range predicates in Leaf nodes
Example shows range costing in a Leaf node (lowest cost usually selected)

Range searches are costed by the optimizer to determine which index is appropriate
Cost estimates are provided for different index types (even if they don't exist on the column)

Why wasn't the DATE index used?



The answer lies in the column detail in the Leaf node
The Leaf Node lists all the columns accessed for the query
Data type and indexes: no DATE index on this column

With no DATE index the optimizer chose a multi-threaded FP search


The Index Advisor does NOT provide suggestions for DATE indexes for ranges (or DTTM, TIME or HNG indexes)



Index Advice in leaf nodes
Query tree highlights Leaf nodes in red indicating Index Advice is available in the node

Database option Index_Advisor must be set ON


Other Indexes in predicates

Except for special cases, the other IQ indexes are used to locate values and do not provide metadata
Only the LF, HG and FP indexes provide metadata

Some of the special cases are:


CONTAINS predicate with a WD or TEXT index
DATEPART function predicate on a column with a DATE index



Other types of Leaf nodes
Under the right circumstances the optimizer uses an index-based algorithm to perform work within a node

Examples where these algorithms are used:

Aggregation Leaf
SELECT COUNT(*) FROM ...
SELECT SUM(PAID_AMT) FROM ...

Grouping Leaf
SELECT MEMBER_GENDER, COUNT(*)
FROM ...
GROUP BY MEMBER_GENDER

Distincting Leaf
SELECT DISTINCT (shipdate)
FROM ...



Leaf node wrap up

Leaf nodes provide a wealth of information about the data and indexes used in a query

List of all table predicates including:


Selectivity
Usefulness
Execution phase
Index used to resolve predicate

All columns referenced, including:


Available indexes
Data type
Distinct count

Index advice



Join nodes

Join nodes merge a set of rows from one table with rows from another table passing all
combinations which satisfy the join condition(s)

For joins the optimizer must determine two things:


Join order (if more than 2 tables)
Join algorithm (6 types)

For these decisions the optimizer uses a cost-based process


Relies heavily on the metadata for the join columns
Metadata comes from indexes in Leaf nodes



Metadata for join performance

Tips for providing the right metadata

Individual columns which are primary keys: always declare as a PRIMARY KEY (or create a UNIQUE HG index)
Candidate key columns: create a UNIQUE HG index if used in joins

All other join columns: create an HG (or LF) index

Multi-column primary keys where tables will be joined using all the keys: create a multi-column PRIMARY KEY (or a multi-column UNIQUE HG index)
If all keys are NOT used in a Join then a Primary Key (or multi-column Unique index) is not necessary
For example, the Primary Key for a Fact table would almost never be necessary (except to maintain entity integrity)



Join algorithms

SAP Sybase IQ has 6 different table join algorithms

Valid algorithms for a given join depend upon:


Type of join condition
Data types on the join columns (matched or mismatched)
Outer join (if so which kind)
Indexes on the join columns
Join constraint (Many to Many, One to Many)

Valid algorithms are compared against each other based on their estimated costs and available
server resources



Types of join algorithms

SAP Sybase IQ Join algorithms and their abbreviations:

Nested Loop Join         NLJ
Nested Loop Pushdown     NLPD
Hash Join                HJ
Hash Pushdown Join       HPDJ
Sort Merge Join          SMJ
Sort Merge Pushdown      SMPDJ

Abbreviations are used in Join Nodes



About Join Algorithms

Knowledge of the join algorithms, their strengths and weaknesses, may allow you to identify
ways you could change the query or the schema to improve query performance

Each join algorithm has different advantages:


Some are better for joining large tables to small tables
Some are better for joining two equally large tables
Some require large amounts of Temp Cache be available
Some require large amounts of Main Cache be available
Some are good only for very selective joins



Classic Joins

These join algorithms are used in SAP Sybase IQ (as well as most relational databases):
Nested Loop
Hash
Sort-Merge



Nested Loop Join
Take the smaller table keys
Store them in a row store
For each row in the Big Table
Compare key against each row in the small table
Return results that satisfy the join conditions
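
The steps above map onto a textbook nested loop join. The Python sketch below is a generic illustration with hypothetical sample data, not IQ's implementation; the join condition is passed in as a function to show that any condition works, not just equality.

```python
def nested_loop_join(small, big, condition):
    """Textbook nested loop join: compare every big-table row with every small-table row."""
    row_store = list(small)            # the smaller input is held in a row store
    result = []
    for big_row in big:                # one pass over the big table
        for small_row in row_store:    # full scan of the row store for each big row
            if condition(small_row, big_row):
                result.append((small_row, big_row))
    return result


# Hypothetical sample data: price bands (low, high, label) joined to orders (id, amount).
bands = [(0, 100, "small"), (100, 1000, "large")]
orders = [(1, 50), (2, 250), (3, 100)]

matches = nested_loop_join(bands, orders, lambda band, order: band[0] <= order[1] < band[1])
print(matches)
```

The cost grows with the product of the two input sizes, which is why the small side needs to stay very small.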



Nested Loop join details

Advantages:
Used with any type of join condition, even LIKE join conditions (or with no join conditions at all)
Used with every type of OUTER JOIN

Disadvantages:
If small side has more than a couple rows (1 or 2) then this join may be very slow

Constraints:
Small side row store uses Temp Cache



Hash Join (Classic Hash)
Take the smaller table
Store in a hash table based on the join keys

For each row in the Big Table
Probe into the hash table with its join keys
Return results that satisfy the join conditions
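
A classic hash join follows the same build-then-probe outline. This Python sketch is a generic illustration with hypothetical sample data; a dictionary stands in for the hash table, and nothing here reflects how IQ manages its Temp Cache.

```python
from collections import defaultdict


def hash_join(small, big, small_key, big_key):
    """Classic hash join: build a hash table on the small input, probe it with the big input."""
    table = defaultdict(list)
    for row in small:                             # build phase
        table[small_key(row)].append(row)
    result = []
    for row in big:                               # probe phase: one lookup per big-table row
        for match in table.get(big_key(row), []):
            result.append((match, row))
    return result


# Hypothetical sample data: customers (id, name) and orders (order_id, cust_id).
customers = [(1, "Ann"), (2, "Bob")]
orders = [(10, 2), (11, 1), (12, 2), (13, 3)]

joined = hash_join(customers, orders, lambda c: c[0], lambda o: o[1])
print(joined)
```

The probe is an equality lookup, which is why this algorithm needs at least one equality join condition on matching data types.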



Hash Join

Advantages:
Is fast, when appropriate
Used for any kind of INNER or OUTER JOIN

Disadvantages:
Requires at least one equality join condition where both join columns are identical data types
A data type mismatch of join columns will disqualify Hash joins (index advisor reports type mismatches)

Constraints:
The small side must fit within available and pinnable Temp Cache
Pinnable hash buffers in memory cannot be paged to swap



Sort Merge Join

Sort the (join) keys from each table
Match up (merge) the keys
(Diagram: tables T1 and T2 each feed a sort; the two sorted inputs feed join node J3)
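
The two steps above, sort then merge, can be sketched as follows. This is a generic textbook sort-merge join in Python with hypothetical sample data; it sorts in memory, whereas a real engine would spill large sorts to temporary storage.

```python
def sort_merge_join(left, right, left_key, right_key):
    """Textbook sort-merge join on an equality condition, handling duplicate keys."""
    left = sorted(left, key=left_key)
    right = sorted(right, key=right_key)
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left_key(left[i]), right_key(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right_key(right[j_end]) == lk:
                j_end += 1
            # Pair every left row with this key against the whole right run.
            while i < len(left) and left_key(left[i]) == lk:
                for r in right[j:j_end]:
                    result.append((left[i], r))
                i += 1
            j = j_end
    return result


t1 = [(2, "b"), (1, "a"), (2, "c")]
t2 = [(2, "x"), (3, "y"), (2, "z")]
pairs = sort_merge_join(t1, t2, lambda r: r[0], lambda r: r[0])
print(pairs)
```

Both inputs are fully sorted before any row is matched, which is where the temp cache pressure comes from.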



Sort Merge Join

Advantages:
Is fast, particularly when it can be parallelized
Can be used for any kind of INNER or OUTER JOIN

Disadvantages:
Requires at least one equality join condition

Constraints:
Both join inputs must be sorted within the temp cache
Can consume large amounts of temp cache space
Modest performance penalty if sort set cannot remain entirely within Temp Cache



Sort Merge Join in Query Tree

Sort-Merge Joins are preceded by a (child) Order By Node for sorting
Sorts can be performed in parallel (as in this case) or may be distributed (Multiplex)
(Diagram labels: Parallel Sort, Data flow)


Push Down joins (3 types)

These join algorithms are unique to SAP Sybase IQ

Pushdown joins reduce join work by filtering rows in one table using the keys of a smaller table, like an In List predicate
In multi-table queries, filtering may be performed several nodes below where the actual join occurs

When the optimizer uses a Push Down join, you will see an additional =, IN, or PROBABLY_IN predicate in the Leaf Node being filtered
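
The filtering idea can be illustrated with a short Python sketch. It models only the exact IN case: the keys of the smaller table become a set that filters the larger table before the join runs. The function name and sample data are hypothetical, and the sketch does not attempt to model PROBABLY_IN.

```python
def pushdown_filter(big_rows, big_key, small_rows, small_key):
    """Keep only big-table rows whose join key appears in the smaller table (an IN-list filter)."""
    keys = {small_key(r) for r in small_rows}
    return [r for r in big_rows if big_key(r) in keys]


# Hypothetical sample data: a few selected customers and a larger orders table.
selected_customers = [(7, "VA"), (9, "VA")]
orders = [(100, 7), (101, 3), (102, 9), (103, 7), (104, 5)]

filtered = pushdown_filter(orders, lambda o: o[1], selected_customers, lambda c: c[0])
print(filtered)
```

Only the surviving rows flow up to the join node, so the join itself has less work to do.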

See Appendix 1 for details on Pushdown Joins



Join Node

Joins always have 2 children

Result Rows after joining

Valid algorithms for this join

Hash Key Estimate

Optimization Notes

Table Inputs and row estimates

Join Constraint: Many to 1



Join Performance Considerations
Ways to get the most efficient Join Algorithm and better query performance

Create Primary Keys (or Unique HG) on (most) tables


Create HG index on all components of composite keys that are actually used in Joins

Note the Join Constraint in Join Nodes


Many-to-Many indicates the optimizer is unable to recognize the table relationship (if there is one)
Primary Key or Unique Index may be missing

Use Query_Plan_After_Run to check row counts


If estimated and actual result rows differ widely, check for missing indexes

Be sure join columns have the exact same data type


Data type mismatch disqualifies Hash joins
Examples: varchar(10) != char(10) and integer != smallint



Group By Nodes

Merges a set of rows from one table into a smaller number of output rows
Each row represents collective information from one or more input rows

For a Group By the optimizer must determine two things:


The expected number of resulting groups
Which algorithm to use

To make these decisions the optimizer uses a cost-based process


Relies on available metadata for the grouping columns



Group By algorithms

SAP Sybase IQ has 4 aggregation algorithms

Group By (Hash) for modest result sets


Uses a hash table in the Temp Cache

Group By (Sort) for large sets


Creates a sort object in the Temp Cache

Group By (Single)
1 result row, no Group By clause

Grouped & Aggregation Leafs


For modest result sets of single tables
To qualify all Group By columns must have an HG or LF index
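
As a rough illustration of the difference between the hash and sort approaches, here is a generic sketch of both for a SUM aggregate. The function names and the row-as-dict layout are assumptions for this example; it does not reflect IQ internals.

```python
from itertools import groupby
from operator import itemgetter


def group_by_hash(rows, key, value):
    """SUM(value) GROUP BY key using one hash table entry per group."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    return totals


def group_by_sort(rows, key, value):
    """SUM(value) GROUP BY key by sorting first, then scanning runs of equal keys."""
    ordered = sorted(rows, key=itemgetter(key))
    return {
        k: sum(r[value] for r in run)
        for k, run in groupby(ordered, key=itemgetter(key))
    }
```

The hash version needs memory proportional to the number of groups, while the sort version needs to order all input rows; that trade-off is one reason the expected number of groups matters to the choice of algorithm.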



Tuning Tips for Joins and Group By

Indexing Techniques and Database Options Affecting Query Performance

Indexes provide critical metadata to the optimizer


Major influence on the choice of algorithm the optimizer selects

Hash algorithms can be faster than others


On servers with adequate memory you can improve chances the optimizer will use hashes by modifying some
options



MetaData for Group By performance

Suggestions for getting better Group By performance

Group By (Hash) tends to be the fastest


Requires accurate metadata to determine the exact number of groups and if they can fit in Pinnable Cache*

Create HG / LF index on individual Group By columns


Provides distinct counts for grouping columns

For queries with many grouping columns


Create a multi-column HG index on group by keys from the same table
Composite indexes provide better result size estimates

* Pinnable cache cannot be paged


Tuning Hash Joins and Group By hash

If you have > 4 GB of memory consider modifying database options for Hashes
Default values for these options are based on a server with 4 GB RAM

Improve chances of the optimizer using Hashes by modifying these Database Options:
Max_Hash_Rows - default value = 2,500,000 (keys)
Hash_Pinnable_Cache_Percent - default Value = 20 (percent)



Max_Hash_Rows database option

Affecting Hash Joins and Group By (Hash)

Optimizer considers hash algorithms based upon available Temp Cache and the number of
hash keys expected
Number of hash keys is reported in Join nodes
If number of keys is less than Max_Hash_Rows option value a hash may be chosen

Max_Hash_Rows option is dynamically configurable


Can be set as temporary option

Increase this option value linearly based on your server memory


8 GB: Max_Hash_Rows = 5000000 (5 million)
64 GB: Max_Hash_Rows = 40000000 (40 million)
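
The linear scaling can be written down as a small helper. This is a sketch only: it assumes the baseline stated in this deck (default of 2,500,000 keys for a 4 GB server), and the function names are hypothetical. Treat the result as a starting point for testing, not a recommended final value.

```python
BASE_MEMORY_GB = 4              # defaults are based on a 4 GB server (per this deck)
BASE_MAX_HASH_ROWS = 2_500_000  # default Max_Hash_Rows (per this deck)


def scaled_max_hash_rows(server_memory_gb):
    """Scale Max_Hash_Rows linearly with server memory, never below the default."""
    if server_memory_gb <= BASE_MEMORY_GB:
        return BASE_MAX_HASH_ROWS
    return BASE_MAX_HASH_ROWS * server_memory_gb // BASE_MEMORY_GB


def set_option_statement(server_memory_gb):
    """Build the SET TEMPORARY OPTION statement for the scaled value."""
    value = scaled_max_hash_rows(server_memory_gb)
    return f"SET TEMPORARY OPTION Max_Hash_Rows = {value}"
```

For example, `scaled_max_hash_rows(8)` gives 5000000 and `scaled_max_hash_rows(64)` gives 40000000, matching the two data points above.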



Hash_Pinnable_Cache_Percent option

Affects Hash Joins and Group By (Hash)

Sets the percentage of a user's Temp Cache memory allocation that any one Hash object can
pin in memory
Dynamically configurable

If you increase Max_Hash_Rows then consider increasing the Hash_Pinnable_Cache_Percent


Suggest increasing option value in 5% increments
Recommended maximum* is 40 (percent)

* Depends on Temp Cache memory size and number of concurrent users



Hash options IQ 15.4 ESD 1 and later

As of 15.4 ESD 1 (and later), the Max_Hash_Rows and Hash_Pinnable_Cache_Percent options
are no longer considered in the decision to use the Hash algorithm for Group By operations
There is no row limit consideration for Group Bys
Max_Hash_Rows and Hash_Pinnable_Cache_Percent can be left at default values

The optimizer evaluates the size and availability of the IQ Temp Cache to determine whether to
use hash Group By

Max_Hash_Rows is still considered for joins but should be used with caution
It may cause the optimizer to use hash joins with non-parallel components
Sort-Merge joins, which can be fully parallelized, can be faster but might not be chosen when Max_Hash_Rows is high



Individual user memory

This is a very common tuning error

IQ server parameter -gm (max user connections) affects each individual user's memory allocation
The IQ memory manager sets aside memory to accommodate all possible connections based on -gm
The higher the value of -gm, the lower each user's memory allocation

Recommendation: set -gm to a reasonable value for user connections


Provides better user memory allocation



Before tuning your IQ server using this knowledge

Have a plan before you start modifying database options and memory

Start with a baseline test


A series of queries or data loads

Make only one change at a time and rerun the tests


Monitor performance under peak loads

Make changes gradually


Increase option values in steps

Listen to users: is performance better (or worse)?


Tell your boss what you learned here! (It might get you a ticket back next year.)



Pop Quiz

Before we close

We covered some of the Join and Group By algorithms

Do you recall where Join and Group By algorithms are processed:


in the IQ Main Cache?
in the IQ Temp Cache?

Why does Sybase usually recommend the Temp Cache be larger than Main Cache?



Query Operations and IQ Caches
Temp cache is where the action is!
What happens where in the caches

Main Cache (IQ Store):
NLPD Joins
Correlated Subqueries
Nested Loop Joins (NLJ)

Temp Cache (IQ Temp Store):
Hashes
Hash Joins (HJ, HPDJ)
Group By (Hash)
Sorts
Sort Joins (SMJ, SMPDJ)
Group By (Sort)
Order By


Questions

Thanks for your attention today!

This presentation is accompanied by three appendices


1. Pushdown Join algorithms
2. Influencing Joins using database options and optimizer hints
3. Client side graphical query plans



Appendix 1

SAP Sybase IQ pushdown join algorithms



1. Nested Loop Pushdown

Reverse of Nested Loop Join

For each row in the small table:
Probe the big table using a fast index (LF or HG)

[Diagram: small table probing the big table]
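
A minimal sketch of the idea, assuming a plain dictionary as a stand-in for the LF or HG index on the big table. The helper names `build_index` and `nested_loop_pushdown` are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def build_index(rows, column):
    """Stand-in for an index: maps each column value to matching row positions."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[column]].append(position)
    return index


def nested_loop_pushdown(small, big, big_index, key):
    """For each small-side row, probe the big table through its index."""
    joined = []
    for s in small:
        # Index lookup instead of scanning the whole big table.
        for position in big_index.get(s[key], ()):
            joined.append({**s, **big[position]})
    return joined
```

The cost is driven by the number of small-side rows, since each one triggers an index probe rather than a scan of the big table.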



Nested Loop Pushdown

Advantages:
Is very fast, when appropriate
Can be used for INNER JOIN or RIGHT OUTER JOIN

Disadvantages:
The large side of the join must be a single table
Requires an equality join condition
Column from the large table needs HG or LF index

Constraints:
If small side is more than a few rows, then all projected columns from large side must fit within the Main Cache or
performance will be very slow



2. Hash Pushdown Join

This is an example with a 4 table join

If the distinct count of T3.x is much smaller than the distinct count of T1.x
T3 is a small table, or some T3 or T4 predicate can filter the join of T3 and T4 (J2) to a small result set
Then J3 has a hash table keyed on T3.x when J2 is completed

Create an artificial IN clause on T1 (using the hash table at J3) to filter out rows that cannot
satisfy T1.x = T3.x

By filtering T1 with keys from J3 there is less work to be done in J1 and J3

[Diagram: J3 (T1.x = T3.x) joins J1 and J2; J1 joins T1 and T2; J2 joins T3 and T4]


Hash Pushdown Join

Advantages:
Is very fast, when appropriate
Can be used for INNER JOIN or RIGHT OUTER JOIN

Disadvantages:
Requires equality join condition where both sides are identical data types
Expression from the large side must be column with an LF or HG index

Constraints:
The small side must fit within available Temp Cache
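
The artificial IN clause used by the Hash Pushdown can be sketched as a set membership filter. This is a simplified illustration under assumed names (`hash_pushdown_filter`, rows as dicts with a column `x`), not IQ's implementation.

```python
def hash_pushdown_filter(t1_rows, j2_rows, t1_key="x", j2_key="x"):
    """Filter T1 early using the distinct keys produced by the small side (J2)."""
    # Stand-in for the hash table keyed on T3.x that exists once J2 completes.
    keys = {row[j2_key] for row in j2_rows}
    # The "artificial IN clause": drop T1 rows that cannot satisfy T1.x = T3.x.
    return [row for row in t1_rows if row[t1_key] in keys]
```

Only the surviving T1 rows flow into the joins above T1, which is where the saving comes from.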



3. Sort Merge Pushdown

Similar to the Hash Pushdown, but in this case there are too many rows from J2 to store in a Classic Hash

Instead, compute a hash (stored as a bit vector (BV)) to report that 1 or more rows in J2 have that hash value

Filtering T1 with keys from J3

[Diagram: J3 (T1.x = T3.x) joins J1 and J2; J1 joins T1 and T2; J2 joins T3 and T4]


Sort Merge Pushdown

Bit vector wrapped within a Probably IN predicate
Push Down the Probably IN bit vector to T1

For a given key in T1, if the hash value of that key is NOT in the bit vector then we know that the row
CANNOT match at J3
If the key's hash value IS in the bit vector, then it MAY match and we must pass it up to J1 and J3

[Diagram: J3 (T1.x = T3.x) joins J1 and J2; J1 joins T1 and T2; J2 joins T3 and T4]
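
A minimal sketch of a "probably in" bit vector, assuming a single CRC32 hash and a fixed vector size. The class name `ProbablyIn`, the hash function and the default size are assumptions for illustration; the deck does not describe how IQ sizes or hashes its bit vector.

```python
import zlib


class ProbablyIn:
    """Bit vector filter: never a false 'no', but it can give a false 'maybe'."""

    def __init__(self, keys, n_bits=1024):
        self.n_bits = n_bits
        self.bits = bytearray((n_bits + 7) // 8)
        for key in keys:
            slot = self._slot(key)
            self.bits[slot // 8] |= 1 << (slot % 8)

    def _slot(self, key):
        # Deterministic hash of the key, reduced to a bit position.
        return zlib.crc32(repr(key).encode("utf-8")) % self.n_bits

    def may_contain(self, key):
        slot = self._slot(key)
        return bool(self.bits[slot // 8] & (1 << (slot % 8)))


def filter_with_bit_vector(t1_keys, j2_keys):
    """Keep only T1 keys whose hash bit is set; the survivors still need the real join."""
    probe = ProbablyIn(j2_keys)
    return [k for k in t1_keys if probe.may_contain(k)]
```

A key whose bit is not set can be discarded immediately. A key whose bit is set may be a hash collision, so it is only a candidate and the actual join still decides whether it matches.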



Sort Merge Pushdown Join

Advantages:
Can be used for INNER JOIN or RIGHT OUTER JOIN

Disadvantages:
Requires equality join condition where both sides are identical data types
Expression from large side must be column with LF/HG index

Constraints:
Both join inputs passed through sorts in Temp Cache
Must be sufficient Temp cache space to pin the entire bit vector



Appendix 2

Influencing table joins using database options and optimizer hints



Methods to influence table joins

Two methods:
Database Options (usually set as a Temporary Option)
Optimizer Hints hard coded in SQL

Either method can be used to influence optimizer join decisions


Useful if coded within canned SQL scripts or stored procedures
For evaluating optimizer join choices



Database options for joins

Influence the type of Join algorithm


Database Option - Join_Preference

Force the table join order


Database Option - Join_Optimization



Join Preference
Default = 0 (Optimizer Decides) - Quotes are required for option value

Join_Preference Option Values


Value  Preference               Value  Preference
1      Prefer Sort-Merge        -1     Avoid Sort-Merge
2      Prefer Nested Loop       -2     Avoid Nested Loop
3      Prefer Nested Loop PD    -3     Avoid Nested Loop PD
4      Prefer Hash              -4     Avoid Hash
5      Prefer Hash PD           -5     Avoid Hash PD
6      Prefer PreJoin           -6     Avoid PreJoin
7      Prefer Sort-Merge PD     -7     Avoid Sort-Merge PD

You can only influence the join type from the list of valid join algorithms in the Query
Plan (join node), otherwise the option will be ignored by the optimizer


Example: Join_Preference Option

Set Temporary Option Join_Preference = '-2'


Asking Optimizer to Avoid Nested Loop Joins
Option value must be quoted as shown

Use for problem solving and performance testing


Not helpful with query tools (BOBJ, etc.)

Use as a Temporary Option within a Query


Permanent setting of this option may cause performance problems with other queries



Join optimization

Join_Optimization Default: ON

Switching option OFF parses table joins from left to right as specified in the From clause
May help an individual query but should only be used if you are certain the optimizer has misjudged the query
You may want to use parentheses in the FROM clause to ensure you are getting the desired join order

This option has no effect on the selection of join algorithms


IQ Reference Manual (Statement & Options) has detailed description of this option and its effects



Optimizer hints for join algorithms

One of the obvious problems using the Join_Preference option is the global scope and
potential to affect other joins in a query
Using the database option might help one join but could affect all other joins in the query

Optimizer hints were introduced to allow hard coding of join preference * in SQL code
More granular than using the join_preference option
These hints are useful for coding into canned SQL code and stored procedures

* Hints can also be used to suggest indexes, selectivity and execution phase. See the IQ reference Manual.



Optimizer hint syntax

Simple equality join predicates can be tagged with a predicate hint allowing a join preference to
be specified for just that join
The following example requests a Hash Join:
SELECT ...
FROM customers c, orders o
WHERE (c.cust_key = o.cust_key, 'J:4')

Syntax
Enclose the entire join clause and hint within parentheses
Comma precedes the J:_ hint
Use the same positive or negative numbers for the Join_Preference database option to specify the desired join type



Appendix 3

Client side query plans



Client side query plans

Use these Database Options and Functions to retrieve query plans at client workstations

Database Options
Query_Plan_Text_Access = ON (default = Off)
Allows access to query plans with dbisql (java) client

Query_Plan_Text_Caching = ON (default = Off)


Caches query plans for retrieval at clients

SQL Functions
HTML_Plan()
Returns an HTML query plan file to a client
Graphical_Plan()
Returns XML query plan file to a client



Client side query plan in dbisql (IQ 15)

dbisql client can retrieve graphical Query Plans


With appropriate options set, run the query and get results
From the dbisql Menu Bar: Tools -> Plan Viewer
Click Get Plan



dbisql graphical query plan

Select a node in the Query Tree to see node details

[Screenshot: Query Tree pane alongside the Query Node Detail pane]


