Tec 1123

TEC 1224
Understanding SAP Sybase IQ Query Execution and Query Plans
Lou Stanton, Principal Solution Engineer
October 2012
Disclaimer
This presentation outlines our general product direction and should not be relied on in making a
purchase decision. This presentation is not subject to your license agreement or any other agreement
with SAP. SAP has no obligation to pursue any course of business outlined in this presentation or to
develop or release any functionality mentioned in this presentation. This presentation and SAP's
strategy and possible future developments are subject to change and may be changed by SAP at any
time for any reason without notice. This document is provided without a warranty of any kind, either
express or implied, including but not limited to, the implied warranties of merchantability, fitness for a
particular purpose, or non-infringement. SAP assumes no responsibility for errors or omissions in this
document, except if such damages were caused by SAP intentionally or grossly negligent.
2012 SAP AG. All rights reserved.

About Me
Lou Stanton, Principal Solution Engineer
Over 19 years with Sybase/SAP
17 years in the Business Intelligence/Data Warehousing arena
Work with customers and partners to install, configure and tune SAP Sybase IQ databases

Takeaways from this presentation
1. An understanding of how queries are executed in SAP Sybase IQ
2. How to access graphical query plans
3. How to analyze and understand query plans
4. Tuning your IQ server and database for better performance

Query Processing
General Overview
Query execution phases
Upon submission of a query -
1. Syntax and permissions are checked
2. Query is parsed
3. Query is optimized
4. Query Plan is created
5. Query is executed
6. Resources are cleaned up

Query execution: what happens where
Server Front End (SA)
Connection Management
Parse Incoming Statement
Security Checking
Cross-DB Decomposition (CIS)
Stored Procedures
SQL User Defined Functions

IQ Query Optimizer
Predicate Inference
Predicate Selectivity Estimation
Join Optimization
Grouping Algorithm Selection
Subquery Optimization
Index Access Selection

IQ Run-Time Engine
Prefetch Manager
Predicate Execution
Tuple (Row) Projection
Join Execution
Grouping Execution
Sorting
Subquery Execution

(Diagram label: Query Plan)

Typical query processing
[Diagram: network clients send a SQL query to the server and receive results; SA (database prod.db) connects through the SA Bridge to the IQ Engine, which uses the IQ Temp Store and IQ Main Store]
SQL Anywhere (SA)
Server connections
Database security
SQL parsing
more
IQ Query Engine
Optimization
Query execution
Data access

SQL Anywhere (SA) and SAP Sybase IQ query execution
SA gets more involved when a query involves
Stored procedures
Procedure code is retrieved from system tables for execution in the IQ Query engine
Stored procs are recompiled and optimized for each execution
Cursors
Invokes row-by-row processing between SA and the IQ query engine
User Defined Functions*
Invokes row-by-row processing between SA and the IQ query engine
* Does NOT apply to Partner UDF libraries added to the SAP Sybase IQ libraries

Cursors
All Cursors are executed in the SA part of the engine
A row from a cursor in an IQ query will be passed to SA for processing - one row at a time
For performance reasons avoid using Cursors when many rows need to be processed
Try to leverage the IQ engine whenever possible
Use Set Logic or Case statements rather than Cursors
For Example:
Use a series of Update commands with a Where clause to modify values in specific rows
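
As a rough illustration of the set-logic advice above, the sketch below uses Python's built-in sqlite3 module as a stand-in database (it is not SAP Sybase IQ, and the table orders with its columns is hypothetical). It contrasts a cursor-style row-by-row loop with a single set-based UPDATE that uses a CASE expression. Both leave the same data behind, but the set-based form hands the whole job to the engine in one statement.

```python
import sqlite3

def make_db():
    """Create a small in-memory table; names and data are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, qty INTEGER, tier TEXT)")
    conn.executemany("INSERT INTO orders (id, qty) VALUES (?, ?)",
                     [(1, 5), (2, 50), (3, 500)])
    return conn

def update_row_by_row(conn):
    """Cursor-style processing: fetch each row, decide in client code, update it."""
    rows = conn.execute("SELECT id, qty FROM orders").fetchall()
    for row_id, qty in rows:
        tier = "large" if qty >= 100 else "small"
        conn.execute("UPDATE orders SET tier = ? WHERE id = ?", (tier, row_id))

def update_set_based(conn):
    """Set logic: one UPDATE with a CASE expression covers every row."""
    conn.execute(
        "UPDATE orders SET tier = CASE WHEN qty >= 100 THEN 'large' ELSE 'small' END"
    )

def tiers(conn):
    """Return (id, tier) pairs in id order for comparison."""
    return conn.execute("SELECT id, tier FROM orders ORDER BY id").fetchall()

cursor_db, set_db = make_db(), make_db()
update_row_by_row(cursor_db)
update_set_based(set_db)
print(tiers(set_db))
```

With three rows the difference is invisible; the point of the slide is that the row-by-row round trips grow with the row count while the set-based statement stays a single request.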

SQL user defined functions (UDFs)
Like cursors, SQL UDFs are handled in SA and processed row-by-row
Like cursors, they should be avoided with large result sets
Exceptions: (Not subject to row-by-row processing)

IQ system functions are processed entirely within the IQ engine
Partner analytical UDF libraries (IQ 15.x) are catalogued in SA but are processed entirely within the IQ engine

Queries that can invoke SA
[Diagram: network clients send a SQL query; SA (prod.db) handles cursor and SQL UDF processing and exchanges rows with the IQ Engine (Temp Store, Main Store) over the SA Bridge]
SA Cursor and SQL UDF Processing
IQ sends a row to SA for processing
Row is returned to IQ
Repeat until all rows are processed
This may take some time
Final result set is sent back to client

Query parallelism in IQ 15
15.2 Introduced additional SMP query parallelism in queries
Using many threads on the same server to resolve a query
Many (not all) query operators could invoke parallelism
15.3 Added distributed query parallelism across other servers in IQ Multiplex (PlexQ)
Also added additional query operators eligible for parallel execution
Note: Query operators refer to Joins, Group By, Union, etc

Query parallelism
Query operations now performed in parallel

Most table join operations
Group By
Sorting (Order By and Merge Joins)
Predicate execution
HTML Query Plans illustrate parallel operations and the degree of parallelism
Introduced later in this presentation

Dynamic parallelism
SAP Sybase IQ maintains the ability to support many concurrent users but can take
advantage of an idle server
A big query running alone might use many or all cores
The Query engine dynamically adapts to changes in server activity by increasing or decreasing parallelism for individual queries
As other users launch queries during a big query, IQ gracefully scales back the big query's resources

Delayed Projection in SAP Sybase IQ 15
Delayed Projection refers to the operation of decoding optimized FP indexes
(aka tokens) to the actual data values during query execution
SAP Sybase IQ 12.x - Decoding occurred immediately in the leaf node

Actual data values used for all query operations
SAP Sybase IQ 15.x - Decoding is delayed until actual data values are absolutely required
Tokens are manipulated rather than data values reducing memory usage

Tokenization (review)
Many column data values can be stored as tokens
Commonly known as Optimized FP indexes (FP1, FP2, FP3)
Enabled using Minimize_Storage = ON or IQ Unique() with Create Table
Tokens are 1 byte, 2 bytes or 3 bytes

Actual column values are stored as Lookup Tables in the database (aka the FP Lookup Page)
Lookup tables are automatically maintained by the IQ server
Token value is stored as the row value rather than the actual value
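
The tokenization idea can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the function names are hypothetical, the width thresholds (256, 65,536 and 16,777,216 distinct values) simply follow from how many values fit in 1, 2 or 3 bytes, and numbering tokens in sorted value order is a choice made here for readability, not a statement about how the IQ server numbers its tokens.

```python
def token_width_bytes(distinct_count):
    """Smallest token width (1, 2 or 3 bytes) that can number every distinct value."""
    for width in (1, 2, 3):
        if distinct_count <= 256 ** width:
            return width
    return None  # too many distinct values for a 3-byte token

def tokenize(column):
    """Return (lookup_table, tokens): distinct values stored once, rows store tokens."""
    lookup_table = sorted(set(column))
    position = {value: token for token, value in enumerate(lookup_table)}
    tokens = [position[value] for value in column]
    return lookup_table, tokens

states = ["CA", "NV", "CA", "DC", "NV", "CA"]
lookup_table, tokens = tokenize(states)
print(lookup_table)                          # distinct values, stored once
print(tokens)                                # one small integer per row
print(token_width_bytes(len(lookup_table)))  # bytes needed per token
```

Each row now costs one token instead of the full column value, which is where the storage saving comes from.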

SAP Sybase IQ 15 - Delayed projection
Delaying decoding of token values in queries

Tokens are used in query operations rather than actual values
SORT operations, in particular
Using tokens reduces memory requirements for these operations (often significantly)
Query Plans show which value was used

FPORDINAL = token
FPVALUE = actual data value
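
A minimal sketch of delayed projection, assuming a column that is already tokenized: the aggregation works on tokens only, and the lookup table is consulted just once per surviving output row. The function name, the sample values and the threshold are all hypothetical; the real engine's operators are far more involved.

```python
from collections import defaultdict

def group_sum_delayed(tokens, quantities, lookup_table, threshold):
    """Aggregate on tokens; decode to real values only for groups that pass the filter."""
    totals = defaultdict(int)
    for token, qty in zip(tokens, quantities):
        totals[token] += qty                      # work is done on small integers
    survivors = {t: s for t, s in totals.items() if s > threshold}
    # Delayed projection: the lookup table is consulted only here, at output time.
    return {lookup_table[t]: s for t, s in survivors.items()}

lookup_table = ["AIR", "RAIL", "SHIP"]            # hypothetical distinct values
tokens = [0, 2, 0, 1, 2, 0]                       # one token per row
quantities = [10, 40, 25, 5, 30, 15]

result = group_sum_delayed(tokens, quantities, lookup_table, threshold=40)
print(result)
```

Only two decodes happen here even though six rows were processed, which mirrors why intermediate memory shrinks when tokens stand in for data values.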

Sort Using Tokens Reduces Memory
Query run against a 600 million row table
Select top 100 l_orderkey, sum(l_quantity), max(l_shipdate), count(*)
from lineitem
Group By l_orderkey
having sum(l_quantity) > 300
Query has a big sort operation
25% faster using delayed projection

Temp Space Used:
SAP Sybase IQ 12.7 - 9.2 GB
SAP Sybase IQ 15.x - 4.6 GB
Query Plans Overview
Database options for server side and client side plans
SAP Sybase IQ 15 query plans
Query performance changes in SAP Sybase IQ 15 are highly visible in query plans and show:
Operations performed in parallel
Thread and CPU utilization
Details in query timing

Why are query plans important?
Query plans provide important details about how a query was executed
Consider using a query plan when ...

You suspect (or know) a query is running poorly
Query plans show you how the optimizer processed the query
You may be able to identify a problem in the plan
You are not sure you have the correct indexes for a query
Query plans tell you which indexes are being used
When enabled, the IQ Index Advisor will provide suggestions in plans

How do you access a query plan?
In the IQ Message File: as text embedded in the IQ Message Log
Server side: as an HTML file created on the server
Client side: as a graphical plan
Graphical image (in the Interactive SQL client)
As an HTML plan
As an XML plan

Query plan database options
Query_Plan = On (default On)
Creates a very basic query plan in the IQ Message File
Additional query plan options are required for detailed analysis
Query_Detail = On (default Off)

Adds important details to query plans
Should always be ON when analyzing queries
Index_Advisor = On (default = Off)

Adds index advice and design recommendations to plans

Query plan database options (cont.)
Query_Plan_After_Run = On (default = Off)

Delays creating a query plan until query completes
Useful for analyzing optimizer decisions and timings for each operation
Query_Timing = On (default = Off)

Creates query timing histogram (on graphical plans)
Shows thread counts and cpu utilization
Must be used with the Query_Plan_After_Run = On option

Server side graphical query plan database options
Use these options for creating graphical (html) query plans on the IQ Server
Query_Plan_As_HTML = On (default Off)

Creates a graphical html query plan
File is written to the directory where the IQ Message Log resides
Query_Plan_As_HTML_Directory = <dir_name>
Sets a server side directory for HTML query plans
This is optional but recommended

Other database options for query plans
Query_Name = query_name (default = empty string)

Prints query_name in the query plan and as part of the
file name of HTML query plans
NoExec = On (default = Off)

Creates query plan but does not execute the query
See Appendix 3 for database options to access client side graphical query plans
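
To tie the options above together, here is a small Python helper that prints the option statements you might run in a session before a query you want to analyze. It is a sketch under assumptions: the SET TEMPORARY OPTION form with quoted values should be checked against your IQ version's documentation, and the query name and directory are hypothetical placeholders.

```python
def plan_option_statements(query_name, html_directory=None):
    """Build SET TEMPORARY OPTION statements for a detailed, timed HTML query plan."""
    options = {
        "Query_Plan": "ON",
        "Query_Detail": "ON",
        "Query_Plan_After_Run": "ON",   # Query_Timing must be used with this option
        "Query_Timing": "ON",
        "Index_Advisor": "ON",
        "Query_Plan_As_HTML": "ON",
        "Query_Name": query_name,       # appears in the plan and the HTML file name
    }
    if html_directory is not None:
        # Optional but recommended: keeps HTML plans out of the message log directory.
        options["Query_Plan_As_HTML_Directory"] = html_directory
    return [f"SET TEMPORARY OPTION {name} = '{value}';"
            for name, value in options.items()]

for statement in plan_option_statements("monthly_sales", "/tmp/iq_plans"):
    print(statement)
```

Using temporary options keeps the settings scoped to the current connection, so other users' queries are not affected while you investigate.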

HTML Query Plans
HTML query plan layout: the big picture
Query Text
Query Tree
Node Detail
Timing Histogram
Threads
CPU Utilization
Wall Clock
Query tree
A visual representation of a query plan
Consists of nodes representing query operations
Nodes are color coded and numbered
The query tree is displayed inverted

The top node in the tree is the Root Node
Data flow
Node connections show data flow up the tree
Node Connections show row counts between nodes
Leaf nodes represent tables

Typically at bottom or edges

Query tree basics
Node Number
Node Type
Number of rows flowing up the tree
Highlighted node indicating Index Advice in that node
Node Number (underlined) is a hot link to the details below in the Query Detail
Query tree enhancements in SAP Sybase IQ 15
New Parallel Operator
Double lines || connecting nodes indicate parallelism (SMP)
Node depth (3D) shows relative number of threads used
Node connector line width is relative to number of rows flowing between nodes
Tooltips
Hovering the mouse over parts of the plan reveals notes
Node containing Index Advice is highlighted in red

IQ 15.2 query tree (SMP)
Parallel Dataflow
Double bar indicates parallel dataflow
Node Connectors
Connector width varies with row count
Tooltip for estimated row count
Threads Used
Node depth varies with thread count
Tooltip shows max thread count in nodes
Tooltips (hover mouse pointer)
Query Timing Histogram and Wall Time
Depicts time expended at each node
Wall clock

Thread Utilization
Displays threads used over the life of the query
Thread usage corresponds to timing histogram above

Query Text

Query Detail
Query Detail for each Node in the Query Tree

Distributed Queries
Query plans for IQ Multiplex (PlexQ)
SAP Sybase IQ Multiplex distributed queries
In IQ Multiplex environments some query operations can be distributed to other nodes (PlexQ)
Not all queries will qualify for distribution
The node where the query is submitted is the Leader node
Other nodes used with distributed processing are considered Worker nodes
Query plans for distributed queries reflect the work performed on all participating nodes
A query plan fragment is created for each worker node

Distributed Query Plan
Mouse over shows rows processed by worker node
Triple bar indicates distributed processing
Distributed queries create multiple query plans, one for each distributed fragment
(Diagram label: Not Distributed)


Distributed query plan fragments
This query has 4 distributed fragments (5 plans created)
Fragment 1: Sort operation
Fragment 2: Sort operation
Fragment 3: Sort Merge Join
Fragment 4: Sort Merge Join

Distributed query plans
One primary query plan and one plan for each fragment; each is a different HTML file
Primary Plan
Fragment 1
Fragment 2
Fragment 3
Fragment 4
Query text is displayed only in the primary plan
Timing & Threads in distributed queries
This is the Primary Query Plan
Displays all node timings
Tool tips (hover mouse)
Timings
Threads Used
Node #3 (on Leader)
Used 9 threads
Wall time
Node #101 (distributed)

Node #92 on Server2 (S:2)
Used 2 threads
Wall time

Query Node Details
Close look at important nodes in query plans
Query plan node details
A node represents one operation of a query execution
General node information

Node Type
Parent and Child Node(s)
Node Timing (when Query Timing option is On)
Columns projected (output columns) from the node
Node details otherwise vary by the type of node

Details common to all nodes
Header shows node number and type

Hot link back to node in the Query Tree
Parent Node
Node that called this node
Link to that Node
Child Node(s) (if any)

Node(s) that feeds this Node
Result Rows
Generated: Actual rows from this node
Estimate: Optimizer's estimate of rows from this node

Node details - Outputs
Output Vector
Number of columns (entries) and total data width output (in bytes) from the node
Output Details
Data type and number of distinct values (Base Distinct) for each output column
Leaf Node outputs provide additional details including index type and index advice (if enabled)

Node types (there are others)
Root Leaf
Scrolling Cursor Store Join (6 types)
Filter Subquery
Group By (3 types) Union All
Order By Parallel Combiner
Store Parallel Order By
Semi-join Filter Filler

Nodes to Concentrate On
Some Nodes contain a wealth of information

Detail in other nodes is not as important
Start with these Nodes when analyzing queries:

Root
Leaf
Join

Root Node
The Root node is at the top of the (inverted) query tree
Provides general information and timings for the entire query

Root Node
Root Node Details

Result Rows (actual and estimated)
User Name (login)
IQ Temp Space Used (total)
Active users on server
Number of CPUs
Cache sizes
Threads used for all invariant predicates

Examining the Root node
Temp Space Usage

Estimated total usage over the life of the query
Maximum at any point in time may be lower
Effective Number of Users

The count of active cursors (IQ queries running)
Number of CPUs
The number of CPUs the IQ engine believes are available
With hyper-threaded CPUs, or when the -iqnumbercpus server parameter is set, this number may not match the actual number of CPUs on the server
The optimizer may choose a different Query Plan depending upon the number of CPUs, IQ cache sizes and Active Users

Root node attributes
Cache sizes
A reminder of how large the caches are
Threads used for invariant predicates *

Total for all invariant predicates in all tables
Database Options
Options set to non-default values affecting query performance or behavior are displayed with their
current value
* Invariant predicate
A predicate whose value(s) remain constant throughout the life of the query
Example: customer_state IN ('CA', 'NV', 'DC')

Query Timings in the Root node
Overall query timing is listed in the Root node

Query_Plan_After_Run and Query_Timing options must be ON to report timings
Total time to resolve all predicates is listed as Elapsed Condition time

Root node - Outputs
Identifies all column output details sent to the client

Scrolling Cursor Store node
Scrolling Cursor Store node is usually found just below the Root Node
Affected by the database option Force_No_Scroll_Cursors (default OFF, meaning cursors are scrollable)
The store buffers all rows from a query result set

Allows scrolling through results (forward and backwards)
i.e. Scrollable Cursor
Consumes Temp Cache and Temp Store (for large results)

Typically Force_No_Scroll_Cursors is set ON for better query performance, which avoids buffering the result set
Some query client and ETL tools require scrollable cursors, so the option must stay OFF for them

Temp memory usage with Scrolling Cursors
Force_No_Scroll_Cursors = OFF Force_No_Scroll_Cursors = ON

Leaf node - represents access to a table in the IQ Store
All table indexes and columns used in the query are identified and accessed in the Leaf node
Types of Leaf nodes:
(Regular) Leaf
Aggregation Leaf
Grouping Leaf
Distincting Leaf
Ordered Leaf
Proxy Leaf (Proxy Table)
SA Leaf (table in the SA database)

Leaf node: a wealth of information
All column metadata needed by the optimizer is identified in the Leaf node
Row Counts
Total rows in the table
Estimated count after executing all conditions
Generated count after executing all conditions
Conditions (predicates) in the table showing:

Selectivity
Usefulness
Index used to resolve the predicate
Index Advisor Messages
Predicate execution phase

Leaf node details
Row Counts
Total rows in the table
Generated rows after conditions (if any)
Conditions (Predicates)
Selectivity
Usefulness
Elapsed time (to execute)
Rows remaining (post execution)
Index used
Threads used

Selectivity, Index Selection and Usefulness
For each predicate the optimizer determines
Selectivity
Portion (percentage) of the table rows that satisfy that predicate
HG, LF and FP indexes provide this metadata
Index used to resolve the predicate

When multiple indexes exist, the best index for the predicate
Cost and resources required influence index selection
Usefulness Score
Determines the order in which the predicates are executed
Execution phase, cost and resources required influence Usefulness scoring

Estimated Selectivity
If no metadata exists the optimizer estimates selectivity based on the predicate operator
Predicate Type                         Percent of table   (Estimated) Selectivity
Equality (=)                           20%                0.2000000
Open Range (>)                         40%                0.4000000
Between                                40%                0.4000000
Like (%)                               20%                0.2000000
Inter-column equality (t.a = t.b)      30%                0.3000000
Inter-column comparison (t.a < t.b)    50%                0.5000000
Selectivity with trailing zeroes may indicate an optimizer estimate
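
The default estimates above translate directly into row-count guesses. The following Python sketch simply encodes the table and multiplies; the dictionary keys and the function name are hypothetical labels chosen for this example.

```python
# Default selectivity per predicate type, taken from the table above.
DEFAULT_SELECTIVITY = {
    "equality": 0.20,
    "open_range": 0.40,
    "between": 0.40,
    "like": 0.20,
    "inter_column_equality": 0.30,
    "inter_column_comparison": 0.50,
}

def estimated_rows(table_rows, predicate_type):
    """Rows the optimizer would expect to survive when it has no index metadata."""
    return round(table_rows * DEFAULT_SELECTIVITY[predicate_type])

print(estimated_rows(1_000_000, "equality"))
print(estimated_rows(1_000_000, "inter_column_comparison"))
```

If the real predicate keeps, say, 50 rows out of a million, a default guess of 200,000 can steer later join and grouping decisions badly, which is why an index that supplies real metadata matters.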

Index-based selectivity
When a useable index exists it can provide accurate selectivity

Exact selectivity is shown as a precise number without trailing zeroes such as: 0.34982376
Note: Some functions can negate the ability to use an index for selectivity
Example: SUBSTRING( t.a, 5, 5 ) = 'homes'
Optimizer might use default selectivity
20% for an equality search (0.2000000)

Usefulness
Usefulness determines predicate execution order
Usefulness value ranges: 0.0 to 10.0 (10.0 = most useful)

Predicate with highest score is executed first
Remaining predicates executed in descending order of usefulness score
Factors determining Usefulness scoring

Selectivity: how many rows in the table will be eliminated by this predicate
Execution Phase
Type of predicate (operator)
Index available (including optimized FP index)
Speed of predicate execution
Resources required to execute the predicate
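
The ordering rule (highest usefulness first, then descending) can be sketched as a simple sort. The predicates and their scores below are invented for illustration only; in practice the scores come from the Leaf node of a query plan, and the optimizer's scoring itself is not modeled here.

```python
from dataclasses import dataclass

@dataclass
class Predicate:
    text: str
    usefulness: float   # 0.0 to 10.0; 10.0 is the most useful

def execution_order(predicates):
    """Return predicates in the order they would run: highest usefulness first."""
    return sorted(predicates, key=lambda p: p.usefulness, reverse=True)

# Hypothetical predicates with hypothetical scores.
predicates = [
    Predicate("l_quantity > 10", 4.2),
    Predicate("l_orderkey = 12345", 9.8),
    Predicate("l_comment LIKE '%late%'", 1.5),
]

ordered = execution_order(predicates)
for p in ordered:
    print(p.usefulness, p.text)
```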

Local predicate Execution Phase
Three categories of local predicates

Invariant
Delayed
Bound
Category is based upon when the data required for predicate execution is available
Predicate category usually dictates the phase (i.e. timing) when it is executed in the query

Invariant predicates
Invariant: expression does not change during query execution
Most local predicates are invariant predicates

Example: WHERE cust_id = 12345
Invariant predicates are usually executed in an early phase to ensure accurate row count
information is available to subsequent phases of query optimization

Delayed predicates
These are predicates that need to be executed once but require information that will become
available after another part of a query is executed:
Example: WHERE cust_id in
(Select cust_id from customer where state = 'VA')
The "Where cust_id in" predicate is delayed since the subquery must be executed first
Delayed predicates are typically found with uncorrelated subqueries (above) and some Push-Down joins

Bound predicates
Bound predicates need to be executed repeatedly and require information that will only be available after some other portion of the query has been executed
Bound predicates are commonly used in

Correlated subqueries
Nested-Loop Push-Down joins

Predicate execution phases
Local predicates are typically executed in this order -

1. Resolve all invariant predicates
2. Execute Delayed predicates
3. Execute Bound predicates
Usefulness scores are related to a predicate execution phase
Note: Execution phase can be user influenced by using Optimizer Hints written into query code

Query operators and indexes
Predicates typically use these indexes with these operators
Operator                                                    Indexes
Equality and Inequality ( =, != ), IN and NOT IN lists      LF and HG
Ranges ( <, >, <=, >=, Between ) including NOT Between      DATE, HNG, LF and HG
Like ( % ) or Not Like ( % ), Contains                      FP, WD, TEXT

HG and LF indexes in predicates
The HG and LF indexes provide exact metadata to the optimizer:

The number of distinct values in the column and
The number of rows found for each value
These indexes also provide fast run-time access to the rows

Directly influences predicate execution cost and execution phase

FP indexes in predicates
Includes FP(1), FP(2) and FP(3) indexes
Optimized FP indexes (tokens) also provide metadata:

Number of distinct values for a column
Number of rows for each value
Do not provide fast access to the rows (like HG and LF indexes)
Tokens dramatically reduce storage, disk I/O and memory

Used with sorts to dramatically reduce memory (IQ 15.x)
FP index searches can use multiple threads for predicate execution

Optimizer may prefer using FP index over other indexes since it can exploit many threads (and be executed faster)
IQ server work load at query time can influence this decision

Multi-threaded FP index searches
Optimizer chooses multi-threaded FP index search to resolve a predicate
Example: C_MKTSEGMENT = 'Automobile'

8 threads used

Range predicates in Leaf nodes
Example shows range costing in a Leaf node (lowest cost usually selected)
Range searches are costed by the optimizer to determine which index is appropriate
Cost estimates are provided for different index types (even if they don't exist on the column)
Why wasn't the DATE index used?

The answer lies in the column detail in the Leaf node
The Leaf Node lists all the columns accessed for the query
Data type and indexes: no DATE index on this column
With no DATE index the optimizer chose a multi-threaded FP search

The Index Advisor does NOT provide suggestions for DATE indexes for ranges (or DTTM, TIME or HNG indexes)

Index Advice in leaf nodes
Query tree highlights Leaf nodes in red indicating Index Advice is available in the node
Database option Index_Advisor must be set ON

Other Indexes in predicates
Except for special cases the other IQ indexes are used to locate values and do not provide
metadata
Only the LF, HG and FP indexes provide metadata
Some of the special cases are:

CONTAINS predicate with a WD or TEXT index
DATEPART function predicate on a column with a DATE index

Other types of Leaf nodes
Under the right circumstances optimizer uses an Index Based algorithm to perform work
within a node
Examples where these algorithms are used:

Aggregation Leaf
SELECT COUNT(*) FROM ...
SELECT SUM(PAID_AMT) FROM
Grouping Leaf
SELECT MEMBER_GENDER, COUNT(*)
FROM
GROUP BY MEMBER_GENDER
Distincting Leaf
SELECT DISTINCT (shipdate)
FROM

Leaf node wrap up
Leaf nodes provide a wealth of information about the data and indexes used in a query
List of all table predicates including:

Selectivity
Usefulness
Execution phase
Index used to resolve predicate
All columns referenced, including:

Available indexes
Data type
Distinct count
Index advice

Join nodes
Join nodes merge a set of rows from one table with rows from another table passing all
combinations which satisfy the join condition(s)
For joins the optimizer must determine two things:

Join order (if more than 2 tables)
Join algorithm (6 types)
For these decisions the optimizer uses a cost-based process

Relies heavily on the metadata for the join columns
Metadata comes from indexes in Leaf nodes

Metadata for join performance
Tips for providing the right metadata
Individual columns which are primary keys: always declare as a PRIMARY KEY (or create a UNIQUE HG index)
Candidate key columns: create a UNIQUE HG index if used in joins
All other join columns: create an HG (or LF) index
Multi-column primary keys where tables will be joined using all the keys: create a multi-column PRIMARY KEY (or a multi-column UNIQUE HG index)
If all keys are NOT used in a Join then a Primary Key (or multi-column Unique index) is not necessary
For example, the Primary Key for a Fact table would almost never be necessary (except to maintain entity integrity)

Join algorithms
SAP Sybase IQ has 6 different table join algorithms
Valid algorithms for a given join depend upon:

Type of join condition
Data types on the join columns (matched or mismatched)
Outer join (if so which kind)
Indexes on the join columns
Join constraint (Many to Many, One to Many)
Valid algorithms are compared against each other based on their estimated costs and available
server resources

Types of join algorithms
SAP Sybase IQ Join algorithms and their abbreviations:
Nested Loop Join         NLJ
Nested Loop Pushdown     NLPD
Hash Join                HJ
Hash Pushdown Join       HPDJ
Sort Merge Join          SMJ
Sort Merge Pushdown      SMPDJ
Abbreviations are used in Join Nodes

About Join Algorithms
Knowledge of the join algorithms, their strengths and weaknesses, may allow you to identify
ways you could change the query or the schema to improve query performance
Each join algorithm has different advantages:

Some are better for joining large tables to small tables
Some are better for joining two equally large tables
Some require large amounts of Temp Cache be available
Some require large amounts of Main Cache be available
Some are good only for very selective joins

Classic Joins
These join algorithms are used in SAP Sybase IQ (as well as most relational databases):
Nested Loop
Hash
Sort-Merge

Nested Loop Join
Take the smaller table keys
Store them in a row store
For each row in the Big Table
Compare key against each row in the small table
Return results that satisfy the join conditions

Nested Loop join details
Advantages:
Used with any type of join condition, even LIKE join conditions (or with no join conditions at all)
Used with every type of OUTER JOIN
Disadvantages:
If small side has more than a couple rows (1 or 2) then this join may be very slow
Constraints:
Small side row store uses Temp Cache
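
A textbook nested loop join, sketched in Python to make the steps on the previous slide concrete. It is not IQ's implementation; the function name and sample data are hypothetical. The example deliberately uses a non-equality (prefix match, LIKE-style) condition, since accepting any join condition is this algorithm's main advantage.

```python
def nested_loop_join(small_rows, big_rows, condition):
    """Compare every big-table row against every row held from the small side."""
    row_store = list(small_rows)          # small side is kept in a row store
    results = []
    for big in big_rows:                  # one pass over the big table
        for small in row_store:           # full scan of the row store per big row
            if condition(big, small):
                results.append((big, small))
    return results

prefixes = ["ho", "ha"]                            # hypothetical small side
names = ["homestead", "harbor", "hilltop"]         # hypothetical big side

matches = nested_loop_join(prefixes, names, lambda big, small: big.startswith(small))
print(matches)
```

The inner loop runs once per big-table row, so the cost is roughly (big rows) x (small rows); that is why the slide warns about small sides with more than a couple of rows.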

Hash Join (Classic Hash)
Take the smaller table
Store in a hash table based on the join keys
For each row in the Big Table
Probe into the hash table with its join keys
Return results that satisfy the join conditions


Hash Join
Advantages:
Is fast, when appropriate
Used for any kind of INNER or OUTER JOIN
Disadvantages:
Requires at least one equality join condition where both join columns are identical data types
A data type mismatch of join columns will disqualify Hash joins (index advisor reports type mismatches)
Constraints:
The small side must fit within available and pinnable Temp Cache
Pinnable hash buffers in memory cannot be paged to swap
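
The classic hash join can be sketched as a build phase over the small table and a probe phase over the big one. This is a generic textbook version in Python, not IQ's code; the table contents and the key-extractor parameters are hypothetical.

```python
from collections import defaultdict

def hash_join(small_rows, big_rows, small_key, big_key):
    """Build a hash table on the small side, then probe it once per big-side row."""
    hash_table = defaultdict(list)
    for small in small_rows:                       # build phase
        hash_table[small_key(small)].append(small)
    results = []
    for big in big_rows:                           # probe phase
        for small in hash_table.get(big_key(big), []):
            results.append((big, small))
    return results

customers = [(1, "Ada"), (2, "Grace")]                     # (cust_id, name)
orders = [(100, 2), (101, 1), (102, 2), (103, 9)]          # (order_id, cust_id)

joined = hash_join(customers, orders,
                   small_key=lambda c: c[0], big_key=lambda o: o[1])
print(joined)
```

Each big-table row costs one hash lookup, but the whole small side must sit in the hash table at once, which matches the Temp Cache constraint above. The lookup relies on key equality, which is why an equality join condition is required.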

Sort Merge Join
Sort the (join) keys from each table
Match up (merge) the keys
[Diagram: tables T1 and T2 are each sorted, then merged in join node J3]


Sort Merge Join
Advantages:
Is fast, especially when it can be parallelized
Can be used for any kind of INNER or OUTER JOIN
Disadvantages:
Requires at least one equality join condition
Constraints:
Both join inputs must be sorted within the temp cache
Can consume large amounts of temp cache space
Modest performance penalty if sort set cannot remain entirely within Temp Cache
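
A textbook sort merge join, sketched in Python: sort both inputs on the join key, then walk them in step. It is a single-threaded illustration with hypothetical data and says nothing about how IQ parallelizes or spills its sorts.

```python
def sort_merge_join(left_rows, right_rows, left_key, right_key):
    """Sort both inputs on the join key, then merge matching key runs."""
    left = sorted(left_rows, key=left_key)
    right = sorted(right_rows, key=right_key)
    results, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left_key(left[i]), right_key(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right_key(right[j_end]) == lk:
                j_end += 1
            # Pair every left row with this key against the whole right run.
            while i < len(left) and left_key(left[i]) == lk:
                for r in right[j:j_end]:
                    results.append((left[i], r))
                i += 1
            j = j_end
    return results

t1 = [(3, "c"), (1, "a"), (2, "b"), (2, "bb")]
t2 = [(2, "x"), (4, "y"), (1, "z"), (2, "w")]
merged = sort_merge_join(t1, t2, lambda r: r[0], lambda r: r[0])
print(merged)
```

Both sorts need working space before any row is matched, which corresponds to the Temp Cache constraint listed above.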

Sort Merge Join in Query Tree
Sort-Merge Joins are preceded by a (child) Order By Node for sorting

Sorts can be performed in parallel (as in this case) or may be distributed (Multiplex)
Parallel Sort
Data flow

Push Down joins (3 types)
These join algorithms are unique to SAP Sybase IQ
Pushdown joins reduce join work by filtering rows in one table using the keys of a smaller table
like an In List predicate
In cases of multi-table queries filtering may be performed several nodes below where the actual join occurs
When the optimizer uses a Push Down join, you will see an additional =, IN, or
PROBABLY_IN predicate in the Leaf Node being filtered
See Appendix 1 for details on Pushdown Joins
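
The filtering idea behind pushdown joins can be sketched as building a key set from the smaller table and applying it to the larger table like an IN-list, before the join proper. This Python version is only a conceptual model with hypothetical names and data; it does not reproduce IQ's =, IN or PROBABLY_IN predicate mechanics.

```python
def pushdown_filter(big_rows, small_rows, big_key, small_key):
    """Keep only big-table rows whose join key appears in the small table."""
    key_set = {small_key(row) for row in small_rows}   # acts like an IN-list
    return [row for row in big_rows if big_key(row) in key_set]

va_customers = [(7, "VA"), (9, "VA")]                   # (cust_id, state)
orders = [(100, 7), (101, 3), (102, 9), (103, 4)]       # (order_id, cust_id)

filtered = pushdown_filter(orders, va_customers,
                           big_key=lambda o: o[1], small_key=lambda c: c[0])
print(filtered)
```

Only the surviving rows flow up to the join node, so every operator between the Leaf node and the join has less work to do.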

Join Node
Joins always have 2 children
Result Rows after joining
Valid algorithms for this join
Hash Key Estimate
Optimization Notes
Table Inputs and row estimates
Join Constraint: Many to 1

Join Performance Considerations
Ways to get the most efficient Join Algorithm and better query performance
Create Primary Keys (or Unique HG) on (most) tables

Create HG index on all components of composite keys that are actually used in Joins
Note the Join Constraint in Join Nodes

Many-to-Many indicates optimizer is unable to recognize table relationship (if there is one)
Primary Key or Unique Index may be missing
Use Query_Plan_After_Run to check row counts

If estimate and actual result rows are way off check for missing indexes
Be sure join columns have the exact same data type

Data type mismatch disqualifies Hash joins
Examples: varchar(10) != char(10) and integer != smallint

Group By Nodes
Merges a set of rows from one table into a smaller number of output rows
Each row represents collective information from one or more input rows
For a Group By the optimizer must determine two things:

The expected number of resulting groups
Which algorithm to use
To make these decisions the optimizer uses a cost-based process

Relies on available metadata for the grouping columns

Group By algorithms
SAP Sybase IQ has 4 aggregation algorithms
Group By (Hash) for modest result sets

Uses a hash table in the Temp Cache
Group By (Sort) for large sets

Creates a sort object in the Temp Cache
Group By (Single)
1 result row, no Group By clause
Grouped & Aggregation Leafs

For modest result sets of single tables
To qualify all Group By columns must have an HG or LF index
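
The hash and sort variants produce the same groups by different routes, which the following Python sketch shows on hypothetical data. Both functions are generic illustrations rather than IQ's operators: the hash version keeps one entry per group in a table, while the sort version orders the rows first and then sums each run of equal keys.

```python
from itertools import groupby

def group_by_hash(rows, key, value):
    """One hash-table entry per group; suits a modest number of groups."""
    totals = {}
    for row in rows:
        k = key(row)
        totals[k] = totals.get(k, 0) + value(row)
    return totals

def group_by_sort(rows, key, value):
    """Sort rows by the grouping key, then aggregate each run of equal keys."""
    ordered = sorted(rows, key=key)
    return {k: sum(value(r) for r in run) for k, run in groupby(ordered, key=key)}

sales = [("F", 10), ("M", 5), ("F", 7), ("M", 1), ("F", 3)]   # (gender, amount)
by_hash = group_by_hash(sales, key=lambda r: r[0], value=lambda r: r[1])
by_sort = group_by_sort(sales, key=lambda r: r[0], value=lambda r: r[1])
print(by_hash, by_sort)
```

The hash table's size depends on the number of groups, which is why the optimizer needs a good estimate of distinct values before choosing it.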

Tuning Tips for Joins and Group By
Indexing Techniques and Database Options Affecting Query Performance
Indexes provide critical metadata to the optimizer

Major influence on the choice of algorithm the optimizer selects
Hash algorithms can be faster than others

On servers with adequate memory you can improve chances the optimizer will use hashes by modifying some
options

MetaData for Group By performance
Suggestions for getting better Group By performance
Group By (Hash) tends to be the fastest

Requires accurate metadata to determine the exact number of groups and if they can fit in Pinnable Cache*
Create HG / LF index on individual Group By columns

Provides distinct counts for grouping columns
For queries with many grouping columns

Create a multi-column HG index on group by keys from the same table
Composite indexes provide better result size estimates
* Pinnable cache cannot be paged

Tuning Hash Joins and Group By hash
If you have > 4 GB of memory consider modifying database options for Hashes
Default values for these options are based on a server with 4 GB RAM
Improve chances of the optimizer using Hashes by modifying these Database Options:
Max_Hash_Rows - default value = 2,500,000 (keys)
Hash_Pinnable_Cache_Percent - default Value = 20 (percent)

Max_Hash_Rows database option
Affecting Hash Joins and Group By (Hash)
Optimizer considers hash algorithms based upon available Temp Cache and the number of
hash keys expected
Number of hash keys is reported in Join nodes
If number of keys is less than Max_Hash_Rows option value a hash may be chosen
Max_Hash_Rows option is dynamically configurable

Can be set as temporary option
Increase this option value linearly based on your server memory

8 GB: Max_Hash_Rows = 5000000 (5 million)
64 GB Max_Hash_Rows = 40000000 (40 million)
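
The linear scaling rule is plain arithmetic: the default of 2,500,000 corresponds to a 4 GB server, so the value grows in proportion to memory. The short Python sketch below reproduces the two figures on this slide; the constant and function names are hypothetical, and the result is a starting point to test rather than a guaranteed setting (see also the later slide on IQ 15.4 ESD 1).

```python
BASELINE_RAM_GB = 4                 # defaults are based on a server with 4 GB RAM
DEFAULT_MAX_HASH_ROWS = 2_500_000   # default Max_Hash_Rows value

def scaled_max_hash_rows(server_ram_gb):
    """Scale Max_Hash_Rows linearly with server memory."""
    return DEFAULT_MAX_HASH_ROWS * server_ram_gb // BASELINE_RAM_GB

print(scaled_max_hash_rows(8))    # 5000000
print(scaled_max_hash_rows(64))   # 40000000
```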

Hash_Pinnable_Cache_Percent option
Affects Hash Joins and Group By (Hash)
Sets the percentage of a user's Temp Cache memory allocation that any one Hash object can pin in memory
Dynamically configurable
If you increase Max_Hash_Rows then consider increasing the Hash_Pinnable_Cache_Percent

Suggest increasing option value in 5% increments
Recommended maximum* is 40 (percent)
* Depends on Temp Cache memory size and number of concurrent users

Hash options IQ 15.4 ESD 1 and later
As of 15.4 ESD 1 (and later), the Max_Hash_Rows and Hash_Pinnable_Cache_Percent options are no longer considered in the decision to use the Hash algorithm for Group By operations
There is no row limit consideration for Group Bys
Max_Hash_Rows and Hash_Pinnable_Cache_Percent can be left at default values
The optimizer evaluates the size and availability of the IQ Temp Cache to determine whether to use hash Group By
Max_Hash_Rows is still considered for joins but should be used with caution
It may cause the optimizer to use hash joins with non-parallel components
Sort-Merge joins, which can be fully parallelized, can be faster but might not be chosen when Max_Hash_Rows is high

Individual user memory
This is a very common tuning error
IQ server parameter gm (max user connections) affects each user's memory allocation
The IQ memory manager sets aside memory to accommodate all possible connections based on gm
The higher the value of gm, the lower each user's memory allocation
Recommendation: set gm to a reasonable value for user connections

Provides better user memory allocation
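
To make the effect of gm concrete, here is a deliberately simplified model in Python. It assumes the cache is divided evenly across all possible connections; the slides do not give IQ's actual allocation formula, so the function and the numbers are purely illustrative.

```python
def per_connection_share_mb(cache_mb: float, gm: int) -> float:
    """Even split of a cache across gm possible connections.

    Assumption: this is NOT IQ's real formula, only a model of the trend
    "the higher the value of gm, the lower each user's allocation".
    """
    if gm < 1:
        raise ValueError("gm must be at least 1")
    return cache_mb / gm


# Same 8000 MB cache, two different gm settings.
print(per_connection_share_mb(8000, 1000))  # oversized gm
print(per_connection_share_mb(8000, 100))   # gm closer to real usage
```

In this model, cutting gm from 1000 to 100 gives each connection ten times the share, which is the trend the recommendation relies on.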

Before tuning your IQ server using this knowledge
Have a plan before you start modifying database options and memory
Start with a baseline test

A series of queries or data loads
Make only one change at a time and rerun the tests

Monitor performance under peak loads
Make changes gradually

Increase option values in steps
Listen to users: is performance better (or worse)?
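
The baseline-then-one-change workflow can be sketched as a small Python harness. Everything here is hypothetical: each test is assumed to be a callable that runs one query or load through whatever client you use, and the function names are invented for illustration.

```python
import time
from typing import Callable, Dict


def run_baseline(tests: Dict[str, Callable[[], object]],
                 repeats: int = 3) -> Dict[str, float]:
    """Return the best wall-clock time in seconds for each named test."""
    results = {}
    for name, run in tests.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run()  # hypothetical: executes one query or data load
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


def compare(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Percent change per test; negative means faster after the change."""
    return {
        name: 100.0 * (after[name] - before[name]) / before[name]
        for name in before
        if name in after and before[name] > 0
    }
```

Run `run_baseline` once before any change, make a single option change, run it again, and pass both results to `compare` so every difference can be attributed to that one change.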

Tell your boss what you learned here! (might get you a ticket back next year?)

Pop Quiz
Before we close
We covered some of the Join and Group By algorithms
Do you recall where Join and Group By algorithms are processed:

in the IQ Main Cache?
in the IQ Temp Cache?
Why does Sybase usually recommend the Temp Cache be larger than Main Cache?

Query Operations and IQ Caches
Temp cache is where the action is!
What happens where in the caches
IQ Store:
NLPD Joins
Correlated Subqueries
Nested Loop Joins (NLJ)
IQ Temp Store:
Hashes: Hash Joins (HJ, HPDJ), Group By (Hash)
Sorts: Sort Joins (SMJ, SMPDJ), Group By (Sort), Order By
Questions
Thanks for your attention today!
This presentation is accompanied by three appendices

1. Pushdown Join algorithms
2. Influencing Joins using database options and optimizer hints
3. Client side graphical query plans

Appendix 1
SAP Sybase IQ pushdown join algorithms

1. Nested Loop Pushdown
Reverse of Nested Loop Join
For each row in the small table:
Probe the big table using a fast index (LF or HG)
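
The nested loop pushdown idea (drive from the small table, probe an index on the big table) can be sketched in Python. A plain dictionary stands in for the LF or HG index here, and the function names are illustrative assumptions, not IQ internals.

```python
from collections import defaultdict


def build_index(rows, key):
    """Stand-in for an LF/HG index: maps a key value to matching rows."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def nested_loop_pushdown(small_rows, big_index, key):
    """For each row of the small table, probe the big table's index."""
    for small_row in small_rows:
        # .get avoids creating empty entries for keys with no match.
        for big_row in big_index.get(small_row[key], ()):
            yield small_row, big_row


small = [{"k": 1, "s": "a"}, {"k": 3, "s": "b"}]
big = [{"k": 1, "v": 10}, {"k": 1, "v": 11}, {"k": 2, "v": 20}]
for pair in nested_loop_pushdown(small, build_index(big, "k"), "k"):
    print(pair)
```

The big table is never scanned in full; only the index entries for keys present in the small table are touched, which is why this is fast when the small side has few rows.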

Nested Loop Pushdown
Advantages:
Is very fast, when appropriate
Can be used for INNER JOIN or RIGHT OUTER JOIN
Disadvantages:
The large side of the join must be a single table
Requires an equality join condition
Column from the large table needs HG or LF index
Constraints:
If the small side is more than a few rows, then all projected columns from the large side must fit within the Main Cache or performance will be very slow

2. Hash Pushdown Join
This is an example with a 4 table join
[Figure: join tree. J3 (T1.x = T3.x) joins J1 (T1, T2) and J2 (T3, T4)]
If the distinct count of T3.x is much smaller than the distinct count of T1.x
(T3 is a small table, or some T3 or T4 predicate can filter T3 x T4 to a small result set)
Then J3 has a hash table keyed on T3.x when J2 is completed
Create an artificial IN clause on T1 (using the hash table at J3) to filter out rows that cannot satisfy T1.x = T3.x
By filtering T1 with keys from J3 there is less work to be done in J1 and J3
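
A minimal Python sketch of the hash pushdown idea follows: build the hash table on the small side's join key, then use its keys as an artificial IN list to filter T1 before any join work is done. The row layout and function name are assumptions made for illustration.

```python
def hash_pushdown_filter(t1_rows, j2_rows, key="x"):
    """Filter T1 using the keys of the hash table built from J2's output.

    Returns the surviving T1 rows and the hash table (keyed on the join
    column) that the final join would reuse.
    """
    hash_table = {}
    for row in j2_rows:
        hash_table.setdefault(row[key], []).append(row)

    # Artificial IN clause: keep only T1 rows whose key can match at J3.
    filtered_t1 = [row for row in t1_rows if row[key] in hash_table]
    return filtered_t1, hash_table


t1 = [{"x": 1}, {"x": 2}, {"x": 3}, {"x": 2}]
j2_output = [{"x": 2, "y": "a"}]
survivors, table = hash_pushdown_filter(t1, j2_output)
print(survivors)
```

Only the surviving T1 rows flow into J1 and J3, so both joins process less data.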
Hash Pushdown Join
Advantages:
Is very fast, when appropriate
Disadvantages:
Requires equality join condition where both sides are identical data types
Expression from the large side must be column with an LF or HG index
Constraints:
The small side must fit within available Temp Cache

3. Sort Merge Pushdown
Similar to the Hash Pushdown, but in this case there are too many rows from J2 to store in a Classic Hash
Instead, compute a hash (stored as a bit vector (BV)) to report that 1 or more rows in J2 have that hash value
Filtering T1 with keys from J3
[Figure: join tree. J3 (T1.x = T3.x) joins J1 (T1, T2) and J2 (T3, T4)]

Sort Merge Pushdown
Bit vector wrapped within a Probably IN predicate
Push down the Probably IN bit vector to T1
For a given key in T1, if the hash value of that key is NOT in the bit vector then we know that the row CANNOT match at J3
If the key's hash value IS in the bit vector, then it MAY match and we must pass it up to J1 and J3
[Figure: join tree. J3 (T1.x = T3.x) joins J1 (T1, T2) and J2 (T3, T4)]
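
The "Probably IN" bit vector behaves like a one-hash membership filter: it can report false positives but never false negatives. The Python sketch below illustrates that property only; the class name, the CRC32 hash, and the bit vector size are assumptions, not what IQ uses internally.

```python
import zlib


class ProbablyIn:
    """One bit per hash slot; a set bit means 1 or more keys hashed there."""

    def __init__(self, n_bits: int = 4096):
        self.n_bits = n_bits
        self.bits = bytearray((n_bits + 7) // 8)

    def _slot(self, key) -> int:
        # Assumption: CRC32 of the key's repr is a stand-in hash function.
        return zlib.crc32(repr(key).encode("utf-8")) % self.n_bits

    def add(self, key) -> None:
        slot = self._slot(key)
        self.bits[slot // 8] |= 1 << (slot % 8)

    def might_contain(self, key) -> bool:
        slot = self._slot(key)
        return bool(self.bits[slot // 8] & (1 << (slot % 8)))


bit_vector = ProbablyIn()
for j2_key in (10, 20, 30):          # keys produced by the J2 side
    bit_vector.add(j2_key)

t1_keys = range(100)
passed_up = [k for k in t1_keys if bit_vector.might_contain(k)]
print(len(passed_up), "of", len(t1_keys), "T1 keys passed up to the joins")
```

Every key that truly matches survives the filter; keys that collide on a hash slot also survive and are rejected later by the join itself.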

Sort Merge Pushdown Join
Advantages:
Disadvantages:
Requires equality join condition
Both sides are identical data types
Expression from large side must be column with LF/HG index
Constraints:
Both join inputs passed through sorts in Temp Cache
Must be sufficient Temp cache space to pin the entire bit vector

Appendix 2
Influencing table joins using database options and optimizer hints

Methods to influence table joins
Two methods:
Database Options (usually set as a Temporary Option)
Optimizer Hints hard coded in SQL
Either method can be used to influence optimizer join decisions

Useful if coded within canned SQL scripts or stored procedures
For evaluating optimizer join choices

Database options for joins
Influence the type of Join algorithm

Database Option - Join_Preference
Force the table join order

Database Option - Join_Optimization

Join Preference
Default = 0 (Optimizer Decides) - Quotes are required for option value
Join_Preference Option Values

Value  Prefer            Value  Avoid
1      Sort-Merge        -1     Sort-Merge
2      Nested Loop       -2     Nested Loop
3      Nested Loop PD    -3     Nested Loop PD
4      Hash              -4     Hash
5      Hash PD           -5     Hash PD
6      PreJoin           -6     PreJoin
7      Sort-Merge PD     -7     Sort-Merge PD

You can only influence the join type from the list of valid join algorithms in the Query Plan (join node); otherwise the option will be ignored by the optimizer
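
The sign convention of the Join_Preference values (positive prefers, negative avoids, 0 leaves the decision to the optimizer) can be captured in a small Python lookup. This helper is hypothetical and only restates the table from the slide; it is handy when reading option values out of scripts.

```python
# Codes taken from the Join_Preference slide; PD is expanded to Pushdown.
JOIN_ALGORITHMS = {
    1: "Sort-Merge",
    2: "Nested Loop",
    3: "Nested Loop Pushdown",
    4: "Hash",
    5: "Hash Pushdown",
    6: "PreJoin",
    7: "Sort-Merge Pushdown",
}


def describe_join_preference(value: int) -> str:
    """Translate a Join_Preference value into plain words."""
    if value == 0:
        return "Optimizer decides"
    name = JOIN_ALGORITHMS.get(abs(value))
    if name is None:
        raise ValueError(f"not a value listed on the slide: {value}")
    return ("Prefer " if value > 0 else "Avoid ") + name


print(describe_join_preference(-2))
print(describe_join_preference(4))
```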
Example: Join_Preference Option
Set Temporary Option Join_Preference = '-2'

Asking Optimizer to Avoid Nested Loop Joins
Option value must be quoted as shown
Use for problem solving and performance testing

Not helpful with query tools (BOBJ, etc.)
Use as a Temporary Option within a Query

Permanent setting of this option may cause performance problems with other queries

Join optimization
Join_Optimization Default: ON
Switching the option OFF parses table joins from left to right as specified in the FROM clause
May help an individual query but should only be used if you are certain the optimizer has misjudged the query
You may want to use parentheses in the FROM clause to ensure you are getting the desired join order
This option has no effect on the selection of join algorithms

IQ Reference Manual (Statement & Options) has detailed description of this option and its effects

Optimizer hints for join algorithms
One of the obvious problems with using the Join_Preference option is its global scope and potential to affect other joins in a query
Using the database option might help one join but could hurt all other joins in the query
Optimizer hints were introduced to allow hard coding of join preference* in SQL code
More granular than using the Join_Preference option
These hints are useful for coding into canned SQL code and stored procedures
* Hints can also be used to suggest indexes, selectivity and execution phase. See the IQ Reference Manual.

Optimizer hint syntax
Simple equality join predicates can be tagged with a predicate hint, allowing a join preference to be specified for just that join
The following example requests a Hash Join:
Select
FROM customers c, orders o
WHERE (c.cust_key = o.cust_key , J:4 )
Syntax
Enclose the entire join clause and hint within parentheses
Comma precedes the J:_ hint
Use the same positive or negative numbers for the Join_Preference database option to specify the desired join type

Appendix 3
Client side query plans

Client side query plans
Use these Database Options and Functions to retrieve query plans at client workstations
Database Options
Query_Plan_Text_Access = ON (default = Off)
Allows access to query plans with dbisql (java) client
Query_Plan_Text_Caching = ON (default = Off)

Caches query plans for retrieval at clients
SQL Functions
HTML_Plan()
Returns an HTML query plan file to a client
Graphical_Plan()
Returns XML query plan file to a client

Client side query plan in dbisql (IQ 15)
dbisql client can retrieve graphical Query Plans

With appropriate options set, run the query and get results
From the dbisql Menu Bar: Tools -> Plan Viewer
Click Get Plan

dbisql graphical query plan
[Screenshot: Plan Viewer showing the Query Tree pane and the Query Node Detail pane]
Select a node in the Query Tree to see node details

2012 SAP AG. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice.
Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.
Microsoft, Windows, Excel, Outlook, PowerPoint, Silverlight, and Visual Studio are registered trademarks of Microsoft Corporation.
IBM, DB2, DB2 Universal Database, System i, System i5, System p, System p5, System x, System z, System z10, z10, z/VM, z/OS, OS/390, zEnterprise, PowerVM, Power Architecture, Power Systems, POWER7, POWER6+, POWER6, POWER, PowerHA, pureScale, PowerPC, BladeCenter, System Storage, Storwize, XIV, GPFS, HACMP, RETAIN, DB2 Connect, RACF, Redbooks, OS/2, AIX, Intelligent Miner, WebSphere, Tivoli, Informix, and Smarter Planet are trademarks or registered trademarks of IBM Corporation.
Linux is the registered trademark of Linus Torvalds in the United States and other countries.
Adobe, the Adobe logo, Acrobat, PostScript, and Reader are trademarks or registered trademarks of Adobe Systems Incorporated in the United States and other countries.
Oracle and Java are registered trademarks of Oracle and its affiliates.
UNIX, X/Open, OSF/1, and Motif are registered trademarks of the Open Group.
Citrix, ICA, Program Neighborhood, MetaFrame, WinFrame, VideoFrame, and MultiWin are trademarks or registered trademarks of Citrix Systems Inc.
HTML, XML, XHTML, and W3C are trademarks or registered trademarks of W3C, World Wide Web Consortium, Massachusetts Institute of Technology.
Apple, App Store, iBooks, iPad, iPhone, iPhoto, iPod, iTunes, Multi-Touch, Objective-C, Retina, Safari, Siri, and Xcode are trademarks or registered trademarks of Apple Inc.
IOS is a registered trademark of Cisco Systems Inc.
RIM, BlackBerry, BBM, BlackBerry Curve, BlackBerry Bold, BlackBerry Pearl, BlackBerry Torch, BlackBerry Storm, BlackBerry Storm2, BlackBerry PlayBook, and BlackBerry App World are trademarks or registered trademarks of Research in Motion Limited.
Google App Engine, Google Apps, Google Checkout, Google Data API, Google Maps, Google Mobile Ads, Google Mobile Updater, Google Mobile, Google Store, Google Sync, Google Updater, Google Voice, Google Mail, Gmail, YouTube, Dalvik and Android are trademarks or registered trademarks of Google Inc.
INTERMEC is a registered trademark of Intermec Technologies Corporation.
Wi-Fi is a registered trademark of Wi-Fi Alliance.
Bluetooth is a registered trademark of Bluetooth SIG Inc.
Motorola is a registered trademark of Motorola Trademark Holdings LLC.
Computop is a registered trademark of Computop Wirtschaftsinformatik GmbH.
SAP, R/3, SAP NetWeaver, Duet, PartnerEdge, ByDesign, SAP BusinessObjects Explorer, StreamWork, SAP HANA, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries.
Business Objects and the Business Objects logo, BusinessObjects, Crystal Reports, Crystal Decisions, Web Intelligence, Xcelsius, and other Business Objects products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Business Objects Software Ltd. Business Objects is an SAP company.
Sybase and Adaptive Server, iAnywhere, Sybase 365, SQL Anywhere, and other Sybase products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Sybase Inc. Sybase is an SAP company.
Crossgate, m@gic EDDY, B2B 360, and B2B 360 Services are registered trademarks of Crossgate AG in Germany and other countries. Crossgate is an SAP company.
All other product and service names mentioned are the trademarks of their respective companies. Data contained in this document serves informational purposes only. National product specifications may vary.
The information in this document is proprietary to SAP. No part of this document may be reproduced, copied, or transmitted in any form or for any purpose without the express prior written permission of SAP AG.
