
Database Statistics - good practices not only for experienced administrators

18.03.2015

Asseco at a Glance

Founded in 1991

The largest IT company in CEE

6th largest software producer in Europe

Traded on the WSE, included in the WIG30 Blue Chip index

17 000 employees worldwide

Selling proprietary software and services

Strong financials with a great track record


2013 revenue of PLN 5.9bn (EUR 1.4bn)

CAGR +17.9% (2009-2013)


2013 EBIT of PLN 611m (EUR 145m)

Our Offices Worldwide

Asseco Poland S.A.


The Asseco Group

Presentation Objective
Question:
How do you maintain the everyday database statistics process
in a complex environment?

What do I mean by the term "complex environment"?

Complex Environment

DB2 environment
> 60 TB of production data (6 DB2 members in Data Sharing)
DML 48K/s, GETPAGES 392K/s, IO RW 13K/s, IO R 12K/s
> 475 GB data growth per month

Maintenance
Weekly cost of maintenance for RUNSTATS - 240 MIPS
3 807 colgroup definitions
2 344 columns to verify
> 500 active plans and over 12 000 active packages to BIND

Automation
Largest objects - NPI INDEX > 230 GB, Tablespace 280 GB
400 000 objects to maintain
Every year > 30K new objects

Presentation Plan

Introduction to database statistics

Tools for database statistics

Automation of the process of database statistics

Challenges in the project

Conclusions

Introduction
to Database statistics

Database Statistics General Overview
Usage of the RUNSTATS utility:
Running the RUNSTATS utility enables DB2 to choose efficient access
paths by keeping the statistics accurate and up to date.

The collected statistics concerning database objects are stored in the DB2 catalog.
The collected statistics are used by DB2 during the
BIND process, when the most efficient access
paths are determined.

RUNSTATS Utility Functionalities 1/2

RUNSTATS TABLESPACE

gathers statistics on a tablespace

gathers statistics on tables

gathers statistics on indexes

gathers statistics on columns

RUNSTATS INDEX
* RUNSTATS does not collect statistics for clone tables, CGTT or index spaces.
** Leading columns are collected by using RUNSTATS INDEX.

RUNSTATS Utility 2/2

RUNSTATS collects the following three types of distribution statistics:

01 Frequency
The percentage of rows in the table that contain a value for a column or a combination of values for a set of columns.

02 Cardinality
The number of distinct values in the column or set of columns.

03 Histograms
Histogram statistics are gathered for the specified group of columns.
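The first two definitions can be illustrated with a few lines of Python. This is only a sketch of the arithmetic, not of how RUNSTATS works internally; the STATUS column and its values are hypothetical, and frequency is expressed as a fraction (0.7 means 70% of rows).

```python
from collections import Counter


def cardinality(values):
    """Number of distinct values (or value tuples for a column group)."""
    return len(set(values))


def frequencies(values, top=None):
    """Map each value to the fraction of rows containing it, most frequent first."""
    total = len(values)
    counts = Counter(values).most_common(top)
    return {value: count / total for value, count in counts}


# Hypothetical STATUS column of a 10-row table.
status = ["DONE"] * 7 + ["NEW"] * 2 + ["ERROR"]

print(cardinality(status))          # 3
print(frequencies(status, top=2))   # {'DONE': 0.7, 'NEW': 0.2}
```

For a column group, pass tuples such as ("DONE", 2015) instead of single values; the same two functions then describe the combination of columns.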

Selection of Statistics Stored in DB2 Catalog Used for Access Path Selection

SYSIBM.SYSCOLDIST - CARDF, COLGROUPCOLNO, COLVALUE, FREQUENCYF, HIGHVALUE, LOWVALUE, NUMCOLUMNS, TYPE, QUANTILENO
SYSIBM.SYSCOLSTATS - COLCARD, HIGHKEY, LOWKEY, PARTITION
SYSIBM.SYSCOLUMNS - COLCARDF, HIGH2KEY, LOW2KEY
SYSIBM.SYSINDEXES - CLUSTERING, CLUSTERRATIOF, FIRSTKEYCARDF, FULLKEYCARDF, NLEAF, NLEVELS, DATAREPEATFACTORF
SYSIBM.SYSINDEXPART - LIMITKEY
SYSIBM.SYSTABLES - CARDF, EDPROC, NPAGES, NPAGESF, PCTROWCOMP
SYSIBM.SYSTABLESPACE - NACTIVEF
SYSIBM.SYSTABSTATS - CARDF, NPAGES

Statistics Rules General Recommendations 1/2
Running database statistics is recommended:
after loading a table and before binding application plans and packages that access the table,
after creating an index,
after reorganizing a table space or an index,
after running utilities such as RECOVER or REBUILD,

Statistics Rules General Recommendations 2/2
Running database statistics is recommended:
after heavy insert, update, and delete activity,
against the DB2 catalog, to provide DB2 with more accurate information for access path selection of user queries to the catalog,
before REORG or REBUILD, in order to determine which objects need reorganisation.

Other Factors Influencing the Access Path
Other factors that influence access paths include:
DB2 hints,
the amount of CPU and the buffer pool definitions.
To examine or improve an access path, it is worth considering:

BIND with EXPLAIN(YES),
virtual indexes,
what-if analysis.

For more information, check the EXPLAIN tables.

Invalidate Dynamic Statement Cache
After database statistics change, the RUNSTATS utility can be run with the REPORT NO and UPDATE NONE options on the tablespace or index that the query depends on.
This invalidates the affected statements in the dynamic statement cache.
RUNSTATS TABLESPACE
BPG01.SPGOBJRT
TABLE(ALL)
SHRLEVEL CHANGE
REPORT NO
UPDATE NONE
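
In an automated process such a statement is usually generated rather than typed. The sketch below shows one way this might be done in Python; the function name and the strict name check are assumptions made for illustration, and the generated text simply mirrors the statement shown above.

```python
def invalidate_cache_statement(database: str, tablespace: str) -> str:
    """Build a RUNSTATS control statement that updates nothing (UPDATE NONE)
    and reports nothing (REPORT NO), mirroring the statement on the slide."""
    for name in (database, tablespace):
        # Assumption for this sketch: plain alphanumeric names, at most 8 characters.
        if not (name.isalnum() and len(name) <= 8):
            raise ValueError(f"unexpected object name: {name!r}")
    return "\n".join([
        "RUNSTATS TABLESPACE",
        f"{database.upper()}.{tablespace.upper()}",
        "TABLE(ALL)",
        "SHRLEVEL CHANGE",
        "REPORT NO",
        "UPDATE NONE",
    ])


print(invalidate_cache_statement("BPG01", "SPGOBJRT"))
```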

Database statistics - tools

DSNACCOX Procedure 1/2

DSNACCOX helps you determine on which objects the RUNSTATS utility should be run.

Recommendations are based on the amount of changes rather than on the type of changes (distribution statistics).

DSNACCOX
Procedure

2/2

IBM Data Studio Statistics Advisor

Types of RUNSTATS:
COMPLETE RUNSTATS - for the query or workload.
REPAIR RUNSTATS - repairs the immediate statistics problems.

Automation of the process

Maintenance Process

[Diagram: IBM TWS, ADT, BKP, REO, RTS]

Integration of the DB2 Utilities with IBM Tivoli Workload Scheduler
Automation

Characteristics of the IBM TWS implementation:

universal pattern for every job,
limit of 255 programs per application,
automatic restart in case of an error,
a single point of control for all maintenance tasks (dependencies between utilities).
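
The limit of 255 programs per application means that a long list of maintenance jobs has to be split across several applications. A minimal Python sketch of that splitting follows; the job names are hypothetical and the sketch assumes one program per maintained object, which the slides do not state.

```python
from typing import Iterator, List, Sequence

MAX_PROGRAMS_PER_APPLICATION = 255  # limit quoted in the list above


def split_into_applications(
    jobs: Sequence[str], limit: int = MAX_PROGRAMS_PER_APPLICATION
) -> Iterator[List[str]]:
    """Yield consecutive groups of jobs, each holding at most `limit` entries."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(jobs), limit):
        yield list(jobs[start:start + limit])


# Hypothetical job names for 600 candidate objects.
jobs = [f"RTS{n:05d}" for n in range(600)]
batches = list(split_into_applications(jobs))
print([len(batch) for batch in batches])  # [255, 255, 90]
```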

RUNSTATS Process - How We Use It

Make decision reports, prioritise objects

Select candidate object from a control table

RUNSTATS

Verify and manually update / insert DB2 catalog statistics (only when applicable)

Statistics Rules - Reality of the Project
In the discussed project, database statistics are run when:
objects do not have statistics, but are not empty,
object growth exceeds 3 million rows since the last RUNSTATS, or 10% of changes occurred,
table activity reached:
WHERE (STATSINSERTS + STATSDELETES + STATSUPDATES + STATSMASSDELETE) > 100 AND CARD = 0,
maintenance occurred and:
REORGLASTTIME > STATSLASTTIME,
LOADRLASTTIME > STATSLASTTIME,
REBUILDLASTTIME > STATSLASTTIME,
every 3 months.
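
The rules above can be read as a single decision function. The Python sketch below is one possible reading, not the project's actual implementation: the class, the field names and the rule labels are hypothetical, growth is taken as inserts minus deletes, "10% of changes" as changed rows divided by current rows, and "every 3 months" as 90 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ObjectActivity:
    """Hypothetical snapshot of one object, loosely modelled on the counters above."""
    card: float                         # row count recorded by the last RUNSTATS
    total_rows: int                     # current row count
    inserts: int                        # STATSINSERTS
    deletes: int                        # STATSDELETES
    updates: int                        # STATSUPDATES
    mass_deletes: int                   # STATSMASSDELETE
    stats_last_time: Optional[datetime]
    reorg_last_time: Optional[datetime] = None
    load_last_time: Optional[datetime] = None
    rebuild_last_time: Optional[datetime] = None


def runstats_rule(o: ObjectActivity, now: datetime) -> Optional[str]:
    """Return the label of the first rule that fires, or None if none applies."""
    changes = o.inserts + o.deletes + o.updates + o.mass_deletes
    if o.stats_last_time is None:
        # No statistics yet: only non-empty objects qualify.
        return "NO_STATS" if o.total_rows > 0 else None
    if o.card == 0 and changes > 100:
        return "ACTIVITY_ON_EMPTY"
    if o.inserts - o.deletes > 3_000_000:
        return "GROWTH"
    if o.total_rows > 0 and changes / o.total_rows >= 0.10:
        return "CHANGED_10_PCT"
    maintenance = (o.reorg_last_time, o.load_last_time, o.rebuild_last_time)
    if any(t is not None and t > o.stats_last_time for t in maintenance):
        return "AFTER_MAINTENANCE"
    if now - o.stats_last_time > timedelta(days=90):
        return "AGE"
    return None


now = datetime(2015, 3, 18)
example = ObjectActivity(card=1000, total_rows=1000, inserts=0, deletes=0,
                         updates=100, mass_deletes=0,
                         stats_last_time=datetime(2015, 3, 1))
print(runstats_rule(example, now))  # CHANGED_10_PCT
```

Returning a rule label rather than a plain yes/no keeps the reason for each RUNSTATS visible, which is the kind of value a column such as ST_RULE_NAME in a control table could hold.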

Control Table Maintenance Processes
Select candidate object for control table

CREATE TABLE
PG.MAINTENANCE_CONTROL_TABLE (

RUNSTATS part
ST_OBJECT
ST_DATABASE
ST_PARTITION
ST_OBJECT_TYPE
ST_PRIORITY
ST_PLANNING_DATE
ST_UPDATEPRIO_DATE
ST_JOBID
ST_SAMPLE
ST_SQLID
ST_RULE_NAME
ST_NACTIVE
ST_ONDEMAND

COPY part
AD_IF_COPY_FULL
AD_IF_COPY_INC

Control Tables - Content

Example of RUNSTATS Implementation in IBM TWS

Challenges
in maintaining database statistics

Challenge #1 Mass Update and Copy Statistics in a Production Environment
Complexity of the process:
many correlations;

Clone test statistics to the production environment;

Recommendations coming from tests are distributed as a project product in order to:
indicate which release of the application they concern,
minimise the cost of RUNSTATS,
install them before the first BIND;

Every year, new databases partitioned by year have to be prepared and statistics need to be copied:
update LOW2KEY, HIGH2KEY and COLVALUE.

Challenge #2 Home-made Procedure for Updating Statistics
Home-made procedure for updating statistics ad hoc
Requirements for the procedure:
transaction safety: the procedure saves previous values,
easy to use.

Columns allowed to be changed:

sysibm.SYSCOLUMNS.COLCARDF
sysibm.SYSINDEXES.CLUSTERRATIOF
sysibm.SYSINDEXES.NLEVELS
sysibm.SYSINDEXES.NLEAF
sysibm.SYSINDEXES.FIRSTKEYCARDF
sysibm.SYSINDEXES.FULLKEYCARDF
sysibm.SYSINDEXES.DATAREPEATFACTORF
sysibm.SYSCOLDIST (INSERT, UPDATE, DELETE)

Challenge #3 Statistics Conflicts

Statistics on tablespaces older than on indexes

Try to run statistics on tablespaces and indexes at the same time.
Take care of your index statistics, especially the distribution of the first column of the index.

DB2 v11 helps you discover which statistics conflict with each other.

See the presentation "Runstats Challenges for Optimal Query Performance" by Jase Alpers (IBM) and Terry Purcell (IBM).

Challenge #4 Excessive MIPS Consumption
Use inline statistics as much as possible to lower the maintenance cost. They:
can be used in the LOAD, REORG TABLESPACE, REORG INDEX, and REBUILD INDEX utilities,
REORG TABLESPACE LIST REORG_TBSP DRAIN_WAIT 30 RETRY 4 RETRY_DELAY 10
STATISTICS TABLE (ALL) SAMPLE 60 INDEX (ALL KEYCARD FREQVAL NUMCOLS 2 COUNT 15)

cost less than running the RUNSTATS utility in a separate job,
have some restrictions:
LOAD with inline statistics is only valid for the REPLACE or RESUME NO options,
be careful when collecting statistics during REORG SHRLEVEL CHANGE or REBUILD SHRLEVEL CHANGE.

Challenge #5 Running RUNSTATS at the Right Time

Know your data (frequency statistics) on indexes with a status column.
Running statistics at an inconvenient time influences access paths.
Try to run statistics for indexes and tables at the same time (keep index statistics newer).

Challenge #6 Building Access Paths by Using CGTT
Update CGTT statistics in order to put this table first in the access path.

Update the cardinality of the table and of the columns used for joins.
SELECT * FROM SMALL, BIG
WHERE SMALL.ID = BIG.ID

[Diagram: CGTT, SELECT, JOIN, SMALL TABLE, BIG TABLE]

Small = 1 000 rows
Big = 2.5 billion rows

Conclusions

To Sum Up
Inline statistics cost less.
Automation and database statistics rules help you keep the statistics process under control.
Copying statistics costs less than running RUNSTATS.
Keep the correlations between statistics in mind.
DB2 v11 can find inconsistencies in statistics.
Remember to TEST!

Thank you!
Jacek Rafalak
jacek.rafalak@asseco.pl
Asseco Poland S.A.
