Optimization is the technique of selecting the least expensive plan (fastest plan) for
the query to fetch results. The optimizer considers the possible query plans for a
given input query, and attempts to determine which of those plans will be the most
efficient.
Teradata performance tuning is the practice of improving a process so that
queries run faster with minimal use of CPU resources.
The typical goal of SQL optimization is to get the result (data set) with fewer
computing resources consumed and/or with a shorter response time.
joins.
3. Processes non-correlated subqueries by materializing the subquery and placing its
value in the USING row for the query regardless of whether the subquery is on the
LHS or the RHS of the operator in the predicate.
4. Searches for a relevant join or hash index.
5. Materializes subqueries to spool files.
6. Analyzes the materialized subqueries for optimization possibilities.
a. Separates conditions from one another.
b. Pushes down predicates.
c. Generates connection information.
d. Locates any complex joins.
e. Discovers aggregations and opportunities for partial group by optimizations
7. Generates size and content estimates of spool files required for further processing.
8. Generates an optimal single-table access path.
9. Simplifies and optimizes any complex joins identified in step 6d.
10. Maps join columns from a join (spool) relation to the list of field IDs from the input
base tables to prepare the relation for join planning.
11. Generates information about local connections. A connecting condition is one that
connects an outer query and a subquery. A direct connection exists between two
tables if either of the following conditions is found:
a. ANDed bind term: miscellaneous terms such as inequalities, ANDs, and ORs; cross,
outer, or minus join term that satisfies the dependent information between the two
tables.
b. A spool file of an uncorrelated subquery EXISTS predicate that connects with any
outer table.
12. Generates information about indexes that might be used in join planning, including
the primary indexes for the relevant tables and pointers to the table descriptors of
any other useful indexes.
13. Performs row and column partition elimination for partitioned tables.
14. Uses a recursive greedy 1-table lookahead algorithm to generate the best join plan.
15. If the join plan identified in step 14 does not meet the heuristics-based criteria for
an adequate join plan, generates another best join plan using an n-table lookahead
algorithm.
16. Selects the better join plan of the two plans generated in steps 14 and 15.
17. Generates a star join plan.
18. Selects the better plan of the selection in step 16 and the star join plan generated
in step 17.
19. Passes the optimized white tree to the Generator.
The Generator then generates plastic steps for the plan chosen in step 18.
Methodologies
Optimization is one of the most discussed topics for Teradata today.
Because a Teradata database typically holds a huge amount of data, it is very
important to get optimized performance from it; otherwise queries will
perform poorly and the benefit of parallelism will be lost.
To help the optimizer select the least expensive plan for a query, the following
techniques or practices can be followed:
(1) STATISTICS
Collecting statistics is one of the first steps in Teradata query optimization.
Statistics collection is essential for the optimal performance of the Teradata query
optimizer. The query optimizer relies on statistics to help it determine the best way
to access data. Statistics also help the optimizer ascertain how many rows exist in
tables being queried and predict how many rows will qualify for given conditions.
DIAGNOSTIC STATEMENT
DIAGNOSTIC HELPSTATS ON FOR SESSION
The above statement can be used to determine which statistics might be
required to improve the performance of a SQL statement. After issuing it, run
EXPLAIN on the query to see the statistics suggestions.
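The sequence described above can be sketched as follows. The tables `orders` and `customers` and their columns are hypothetical and only stand in for the query being tuned; the recommended statistics are listed at the end of the EXPLAIN text.

```sql
-- Ask the optimizer to report statistics recommendations for this session.
DIAGNOSTIC HELPSTATS ON FOR SESSION;

-- EXPLAIN the query under investigation (hypothetical tables and columns).
-- The suggested COLLECT STATISTICS statements follow the plan text.
EXPLAIN
SELECT o.order_id, c.cust_name
FROM orders o
JOIN customers c
  ON o.cust_id = c.cust_id
WHERE o.order_dt >= DATE '2009-10-12';

-- Switch the diagnostic off again when finished.
DIAGNOSTIC HELPSTATS NOT ON FOR SESSION;
```

The suggestions are candidates rather than a mandatory list; collecting only those on join and filter columns is usually a reasonable starting point.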
The optimizer's estimates then fall into one of the following confidence levels:
1) No Confidence - no statistics defined for a table.
2) Low Confidence - Stats are difficult to use precisely.
3) High Confidence - Optimizer is sure of results based on the stats available.
Statistics need to be collected for:
1. All non-unique indexes.
2. UPI of small tables (tables with fewer than x rows per AMP; x depends on the
number of available AMPs)
3. All indexes of a join index
4. Any column used in joins
5. Any columns used in a WHERE clause
6. Indexes of global temporary tables.
Stats cannot be collected on:
1. Volatile tables
2. LOB columns
Collected statistics are not automatically updated by the system. Refresh
statistics when 5-10% of the table rows have changed.
Always collect statistics at the column level, even when collecting on an index,
because indexes can be dropped at any time and are often dropped and
recreated.
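As a minimal sketch of the advice above, the statements below collect column-level statistics and later refresh them. The table `orders` and its columns are hypothetical names chosen for illustration.

```sql
-- Column-level statistics on a join column and on a WHERE-clause column
-- (hypothetical table and column names).
COLLECT STATISTICS ON orders COLUMN (cust_id);
COLLECT STATISTICS ON orders COLUMN (order_dt);

-- Re-collect every statistic already defined on the table,
-- for example after roughly 5-10% of its rows have changed.
COLLECT STATISTICS ON orders;

-- Review which statistics exist and when they were last collected.
HELP STATISTICS orders;
```

Collecting with the COLUMN form rather than the INDEX form keeps the statistics definition independent of any index on the same column.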
When to collect Statistics:
When joining two tables, make sure that the join columns use the same
character set. Otherwise one column is implicitly converted to the other, resulting
in poor performance.
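One way to follow this rule is to declare the character set explicitly on both join columns. The sketch below assumes two hypothetical tables, `cust_a` and `cust_b`, joined on `cust_code`.

```sql
-- Both join columns are declared with the same character set (LATIN),
-- so the join needs no implicit conversion. Names are hypothetical.
CREATE TABLE cust_a (
    cust_code VARCHAR(20) CHARACTER SET LATIN NOT NULL,
    cust_name VARCHAR(100) CHARACTER SET LATIN
)
PRIMARY INDEX (cust_code);

CREATE TABLE cust_b (
    cust_code VARCHAR(20) CHARACTER SET LATIN NOT NULL,
    region    VARCHAR(30) CHARACTER SET LATIN
)
PRIMARY INDEX (cust_code);

SELECT a.cust_name, b.region
FROM cust_a a
JOIN cust_b b
  ON a.cust_code = b.cust_code;
```

If one side had been declared CHARACTER SET UNICODE instead, the comparison would require converting one column on every row.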
(7) DATE COMPARISON
Comparing a date against a range of dates in a join condition may result in a
product join.
This can be mitigated by joining through SYS_CALENDAR.CALENDAR, the calendar
view in Teradata's built-in SYS_CALENDAR database.
Example:
Insert into table_a
select
t2.a1,t2.a2,t2.a3,t2.a4
from
table_2 t2
join table_3 t3
on t2.a1=t3.a1
and t2.a5_dt>=t3.a4_dt
and t2.a5_dt<=t3.a5_dt;
The above query can be rewritten to join through SYS_CALENDAR.CALENDAR, which
reduces (but does not completely eliminate) the product join.
Example:
Insert into table_a
select
t2.a1,t2.a2,t2.a3,t2.a4
from table_2 t2
join SYS_CALENDAR.CALENDAR sys_cal
on sys_cal.calendar_date = t2.a5_dt
join table_3 t3
on t2.a1=t3.a1
and sys_cal.calendar_date >=t3.a4_dt
and sys_cal.calendar_date <=t3.a5_dt;
(8) PROPER USAGE OF ALIAS AND TABLE NAMES
Example:
Insert into table_a
select
If the tables used above contain few rows, this may result in high CPU
utilization, because the table name and the alias are treated as separate
references and the same table (table_3 in the above example) is scanned in full twice.
If either of the tables is very big, the above case may lead to a SPOOL
error.
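The pattern being warned against can be illustrated with the sketch below. It reuses the table names table_2 and table_3 from the earlier examples; the column `a2` on table_3 is an assumption made for illustration.

```sql
-- Problematic: table_3 is aliased as t3, but the select list also refers
-- to it by its full name. The full name is treated as a second, unjoined
-- reference to table_3.
SELECT t2.a1, table_3.a2
FROM table_2 t2
JOIN table_3 t3
  ON t2.a1 = t3.a1;

-- Consistent: once an alias is declared, use only the alias.
SELECT t2.a1, t3.a2
FROM table_2 t2
JOIN table_3 t3
  ON t2.a1 = t3.a1;
```

Running EXPLAIN on both forms is a simple way to confirm whether the extra reference and a product join appear in the plan.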
[Table: tab2 and tab3; columns empno, deptno, ename, dname]
If both of the tables used in the above SQL are small, it may result in a
product join (Cartesian product) and may consume high CPU.
If either of the tables is very big, it may lead to a SPOOL error.
E.EMP_NAME,
D.DEPT_NAME,
D.DEPT_LOC
FROM
EMPLOYEE E
JOIN DEPT D ON E.DEPT_ID = D.DEPT_ID
LEFT JOIN PAY_ROLL P ON E.EMP_ID = P.EMP_ID
WHERE
E.JOIN_DT >= DATE '2009-10-12'
When a table is created without an explicitly specified index, Teradata by
default uses the first column as the PI.
The PI for a table should be chosen on a column with as many
unique values as possible.
The query below can be used to see the distribution of rows across the AMPs in a
system.
select
hashamp(hashbucket(hashrow(primary_index_columns))) as "AMP"
,count(*)
from
your_table
group by 1
order by 2 desc;
When no column in a table has enough unique values to serve as the PI, an
identity column may help.
IDENTITY COLUMNS:
An identity column is a system-generated numeric column that can be
used as the PI in that case.
Example:
unq_pk INTEGER GENERATED ALWAYS AS IDENTITY
(START WITH 1
INCREMENT BY 1
MINVALUE -2147483647 MAXVALUE 100000000
CYCLE),
"unq_pk" is the column name and "INTEGER", which follows
"unq_pk", is its data type. The values are generated
by the system (not necessarily in sequential order) whenever data is inserted
into the table holding the above identity column.
Using the above column as a PI distributes the data almost
equally across the available AMPs, which reduces skewing of data to a
single AMP.
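For context, the identity column definition above might sit in a table definition like the following sketch. The table name `emp_stage` and the columns `emp_name` and `dept_id` are hypothetical; only the `unq_pk` definition comes from the example above.

```sql
-- Hypothetical staging table that uses the identity column as its PI.
CREATE MULTISET TABLE emp_stage (
    unq_pk INTEGER GENERATED ALWAYS AS IDENTITY
        (START WITH 1
         INCREMENT BY 1
         MINVALUE -2147483647 MAXVALUE 100000000
         CYCLE),
    emp_name VARCHAR(50),
    dept_id  INTEGER
)
PRIMARY INDEX (unq_pk);

-- The identity column is left out of the insert; the system supplies its value.
INSERT INTO emp_stage (emp_name, dept_id) VALUES ('Smith', 10);
```

After loading, the HASHAMP distribution query shown earlier can be run with `unq_pk` as the primary index column to check how evenly the rows are spread.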
The PI column,
The following methods can be used to scope down the size of SQL statements.
A SET table with no unique indexes will force Teradata to check for duplicate rows
every time a row is inserted or updated. This can cause a lot of overhead on such
operations.
It is better to use MULTISET once you have ensured that the records loaded into the
target are always unique.
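The two options implied above can be sketched as follows. The table and column names are hypothetical, and the choice between them depends on whether uniqueness is already guaranteed by the load process.

```sql
-- Option 1: MULTISET table. No duplicate-row check is performed on insert,
-- so the load process itself must guarantee that rows are unique.
CREATE MULTISET TABLE sales_stage (
    sale_id INTEGER NOT NULL,
    sale_dt DATE,
    amount  DECIMAL(12,2)
)
PRIMARY INDEX (sale_id);

-- Option 2: SET table with a unique index. Uniqueness is enforced through
-- the unique primary index rather than by comparing whole rows.
CREATE SET TABLE sales_target (
    sale_id INTEGER NOT NULL,
    sale_dt DATE,
    amount  DECIMAL(12,2)
)
UNIQUE PRIMARY INDEX (sale_id);
```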