Objectives
Understand the statistics and how they affect the optimizer
Understand why you may need or want to add or modify statistics
Know when and how to add or modify the statistics
Assumptions
Know what the optimizer is - and some basics of how it works and what it does
Have access to ASE 11.9.2 or above - this presentation does not deal with versions prior to 11.9.2
Know the basics of optimizer analysis
Have some experience with traceon 302 and optdiag
Caveats
Adding or changing the statistics is not something you must do. It is something you should be aware of and consider doing
After full testing of course
You will need the SA role to do some of this
Please test changes to statistics before implementing them
You may want to consider using optdiag simulate to test changes to the statistics - see Tech Doc 20472 Using Optdiag Simulate Statistics Mode
Your mileage may vary
Used by the optimizer to estimate the most efficient way to access the data required by a query
They are the optimizer's only view of its universe
The statistics are all it knows about your dataset
Most statistics stored as varbinary - use optdiag to read and write them
To change the default selectivity values - writing statistics
To change the density values - writing statistics
To add statistics to a column's histogram - writing statistics
Default statistics
Statistics obtained by running update statistics on a table or index, as you've done in pre-11.9.2 versions
update statistics table_A [index_1]
Writes statistics for the leading column of index(es) only
Statistics on columns other than the leading column of an index - minor index columns and non-indexed columns
Changing the number of requested steps
Modified statistics -
Values of the steps are used as boundary values in the new histogram.
Weights are approximated based on the steps of the distribution page
FCs (frequency count cells) are created from duplicate step values; again, weights are only approximated
Density values are copied over
Step counts are based on the number of steps in the old page
Maintenance considerations include:
The time it takes to run update statistics on each column
Editing and/or reading in an optdiag file
Increased use of tempdb for updating column statistics
A worktable will be used to sort all inner index columns and non-indexed columns
All values based on the data are overwritten
Density values, and all histogram values
Range and In between default selectivity values are persistent
You will need to reset non-persistent values after running update statistics - read in an optdiag file
Gives the optimizer more information about a composite index (more selective)
On a non-indexed column - the optimizer is able to cost joins on non-indexed columns
Not absolutely necessary, but highly recommended
update statistics table_name (col_name) - for a single column
update index statistics table_name [ind_name] - for all columns of composite index(es)
Estimating selectivity of index 't1_i1', indid 2
scan selectivity 0.900283, filter selectivity 0.000283
28 rows, 882 pages
Without statistics on the column there is no Total density value to use in costing joins
Assumptions are made about how many rows will qualify
Not usually accurate - based on the join operator
Estimated selectivity for col_A, selectivity = 0.100000.
Statistics on the column allow the total density value to be used to estimate the number of qualifying rows.
Estimated selectivity for col_A, selectivity = 0.000025, upper limit = 0.081425.
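To see why the two selectivity values above matter, a rough sketch in Python multiplies each one by a table size. The 100,000-row table and the function name are assumptions for illustration only; the trace output does not state the table size, and the real optimizer applies more than this one multiplication.

```python
def estimated_rows(selectivity, table_rows):
    """Rough row estimate: selectivity is the fraction of rows expected to qualify."""
    return selectivity * table_rows

TABLE_ROWS = 100_000  # hypothetical table size, not taken from the trace output

default_guess = estimated_rows(0.100000, TABLE_ROWS)   # no column statistics
with_density = estimated_rows(0.000025, TABLE_ROWS)    # total density available

print(f"default selectivity: about {default_guess:.0f} rows")
print(f"with column statistics: about {with_density:.1f} rows")
```

With the default value the join is costed as if roughly one row in ten qualifies; with the density value the estimate drops by several orders of magnitude.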
By default new column statistics are built using 20 steps (cells)
If statistics exist the same step count will be reused unless you specify a different count
Increasing the step count may result in more frequency count cells - know your data
May help in optimization of range SARGs because cell granularity is increased - narrower cells (steps)
FCs represent only one value in the column - most accurate since the weight is for only one value
Range Cells (RCs) represent more than one value; uniform distribution within the cell is assumed
FCs can be pulled out of an RC when the value occupies more than 50% of a cell width and there are enough requested steps available
Cell width = number of rows/(number of requested steps -1)
Table has 100K rows and 35 distinct values; the lowest number of rows occupied by a value is 6 (we want an FC for this value)
The number of steps to request =
((rows * .50) / (rows for the value with least rows)) + 1
((100,000 * .5) / 6) + 1 = 8334.33, rounded up to 8335
The final histogram will have either 36 or 71 cells depending on the type of frequency count cell
This is an extreme example. You may not need to have an FC for each value in the table
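The arithmetic in this example can be checked with a short Python sketch. The function names are made up for illustration; the formulas are the cell width and requested-step formulas from the slides, with the result rounded up to a whole step.

```python
import math

def cell_width(total_rows, requested_steps):
    """Cell width = number of rows / (number of requested steps - 1)."""
    return total_rows / (requested_steps - 1)

def steps_for_frequency_cell(total_rows, rows_for_rarest_value, threshold=0.50):
    """Steps to request so the rarest value exceeds `threshold` of a cell width."""
    return math.ceil((total_rows * threshold) / rows_for_rarest_value + 1)

steps = steps_for_frequency_cell(100_000, 6)
width = cell_width(100_000, steps)

print(steps)            # 8335
print(6 > 0.5 * width)  # True: 6 rows is more than half a cell width
```

The second check confirms that, at 8335 requested steps, a value occupying 6 rows is just over 50% of a cell width, which is the condition for it to become a frequency count cell.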
Assumes Range Cells - low degree of duplicates
Try doubling requested step count - then work up from there
Interpolation will establish how close a range SARG value is to a boundary value of a cell and then estimate the number of rows that qualify for the SARG.
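A minimal sketch of the interpolation idea, assuming simple linear interpolation inside one range cell and numeric boundary values. This is an illustration of the principle only, not ASE's actual algorithm; the function name, parameters, and sample numbers are all hypothetical.

```python
def rows_below_in_cell(value, lower, upper, cell_weight, total_rows):
    """Estimate rows in one range cell that fall at or below `value`.

    Assumes values are spread uniformly between the cell's lower and
    upper boundary values, so the qualifying share of the cell is linear
    in how far `value` sits from the lower boundary.
    """
    if not lower < upper:
        raise ValueError("lower boundary must be less than upper boundary")
    fraction = (value - lower) / (upper - lower)
    fraction = min(1.0, max(0.0, fraction))  # clamp to the cell
    return total_rows * cell_weight * fraction

# Hypothetical cell: boundaries 0 and 100, weight 0.05, table of 100,000 rows.
wide = rows_below_in_cell(25, 0, 100, 0.05, 100_000)
# Same SARG value with a narrower cell (more requested steps): boundaries 20 and 30.
narrow = rows_below_in_cell(25, 20, 30, 0.005, 100_000)

print(round(wide), round(narrow))
```

Narrower cells make the uniform-distribution assumption cover a smaller span of values, which is why a higher step count can help range SARGs.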
create index I1 on T1 (colA) using X values
update statistics T1 using X values
X values = requested steps, seen in optdiag
Remember - cells use procedure cache when read
You may not need a lot of cells
Again, don't change step counts after an upgrade until you've tested
create index with 0 values will create the index, but will not write the statistics
Always get an optdiag output file before writing or changing the statistics - as insurance
Useful in general if you want to go back to a previous set of statistics
-o output_file_name, -i input_file_name
Save a clean copy of the output file
Rename and edit the output file for changes to the statistics
All changes to the column level statistics, other than the default selectivity values, will be overwritten by update statistics
Traceon 302 output will display a message when edited statistics are used in costing
Statistics for this column have been edited.
If you change non-persistent values you will need to set them back after updating statistics
Data skew will have an effect on the Total Density value and thus on the costing of joins
It is possible that the estimated number of rows qualifying for a join from an inner table will be pessimistic - check traceon 310
Weighted averaging is used to obtain the Total Density
Change Total Density or leave it alone?
Test it first!! - the Total Density value will be used for all queries that join the column
One approach is to set Total Density equal to Range Cell density
Be careful of a 0 density value; use something higher than 0
Maybe change it to the arithmetic average
Changes to density values are not persistent - update statistics will overwrite them
Will set the Total Density to equal the Range Cell Density
Available in 11.9.2.2 and 12.0.1 - not yet documented
Caution - remember that the Total Density will be used for all joins on the column. If the Range Cell density is very low you may want to modify Total Density using optdiag
sp_modifystats tab_name, col_name, REMOVE_SKEW_FROM_DENSITY
If you see an unknown value message in 302 and the default selectivity is inefficient
Try to eliminate the unknown value, if not possible
Traceon 302 sample
Estimated selectivity for colA, selectivity = 0.000000, upper limit = 0.000000.
Lower bound search value 10000 is greater than the largest value in sysstatistics for this column.
It's July 8 and update statistics hasn't been run - step 20 is last
18    0.05301946    <=    "May 1 2000 12:00AM"
19    0.05290456    <=    "Jun 1 2000 12:00AM"
20    0.04818739    <=    "Jul 1 2000 12:00AM"
Usually done to add FCs or a dummy boundary value
A few rules apply when adding cells to the histogram:
The step numbers must increase monotonically.
The weight of a cell must be between 0 and 1.0.
The sum of the cell weights must be close to 1.0 (0.99 to 1.01)
Always test it before implementing it!!
Save a copy (optdiag output) of the original stats
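Before reading an edited file back in, the rules for added cells can be checked with a small script. The sketch below is in Python and assumes the cells have already been pulled out of the optdiag file as (step number, weight) pairs; the function name and the sample cells are hypothetical, and parsing of the optdiag file itself is not shown.

```python
def check_histogram(cells, low=0.99, high=1.01):
    """Return a list of rule violations for (step_number, weight) pairs."""
    problems = []
    # Rule 1: step numbers must increase monotonically.
    for (prev_step, _), (step, _) in zip(cells, cells[1:]):
        if step <= prev_step:
            problems.append(f"step {step} does not follow step {prev_step}")
    # Rule 2: each weight must be between 0 and 1.0.
    for step, weight in cells:
        if not 0.0 <= weight <= 1.0:
            problems.append(f"step {step} has weight {weight} outside 0..1.0")
    # Rule 3: weights must sum to roughly 1.0 (0.99 to 1.01).
    total = sum(weight for _, weight in cells)
    if not low <= total <= high:
        problems.append(f"weights sum to {total:.6f}, expected {low} to {high}")
    return problems

edited = [(1, 0.0), (2, 0.25), (3, 0.25), (4, 0.5)]  # hypothetical cells
print(check_histogram(edited))  # an empty list means all three rules pass
```

A check like this only catches rule violations; it says nothing about whether the edited weights describe the data well, so testing the queries is still needed.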
The dump and load method - no editing of statistics required
Dump the dataset and load it somewhere else
Run update statistics on the loaded dataset
Get an optdiag output file for those tables you want new stats on
Load the optdiag file into the original dataset
May take as much time as running update statistics on the original dataset - but no interference with users
Conclusion
ASE 11.9.2 and above allow you to add or write statistics
Adding and writing statistics is not absolutely necessary
Adding column level statistics is highly recommended in most cases
Writing statistics is recommended only when necessary
Any What's New docs for a new ASE release
Tech Docs at Sybase Support
http://techinfo.sybase.com/css/techinfo.nsf/Home
Join the ISUG (ISUG Technical Journal, feature requests and more)
http://www.isug.com