Data Warehouse
Jeff Moss
My First Presentation
My background
Five tips
– Partition for success
– Squeeze your data with data segment compression
– Make the most of your PGA memory
– Beware of temporal data affecting the optimizer
– Find out where your query is at
Questions
My Background
Independent Consultant
13 years Oracle experience
Blog: http://oramossoracle.blogspot.com/
Focused on warehousing / VLDB since 1998
First project
– UK Music Sales Data Mart
– Produces BBC Radio 1 Top 40 chart and many more
– 2 billion row sales fact table
– 1 Tb total database size
Currently working with Eon UK (Powergen)
– 4Tb Production Warehouse, 8Tb total storage
– Oracle Product Stack
What Is Partitioning ?
[Diagram: partitions P_JAN_2006, P_FEB_2006 and P_MAR_2006 stored in tablespace T_Q1_2006]
– Read / Write
– Read Only
Performance
– Pruning or elimination
– Partition wise joins
SELECT SUM(sales)
FROM part_tab
WHERE sales_date BETWEEN '01-JAN-2005'
AND '30-JUN-2005'
[Diagram: monthly partitions of part_tab, FEB to OCT visible]
– Quicker checkpointing
– Quicker backup
– Quicker recovery
– …but it depends on mapping of partition:tablespace:datafile
Archiving
Exchange Partition
Partition Truncation
Local Indexes
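The pruning idea behind the SUM(sales) query above can be sketched in a few lines of Python. This is only a toy model, not how Oracle implements it: the monthly partition names (P_JAN_2005 and so on), the 2005 calendar and the helper functions are all assumptions made for illustration.

```python
from datetime import date

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def monthly_partitions(year):
    """Return {partition_name: (first_day, first_day_of_next_month)}.

    The partition names are hypothetical; they only mimic a monthly
    range-partitioned table.
    """
    parts = {}
    for i, name in enumerate(MONTHS, start=1):
        start = date(year, i, 1)
        end = date(year + 1, 1, 1) if i == 12 else date(year, i + 1, 1)
        parts[f"P_{name}_{year}"] = (start, end)
    return parts


def prune(parts, low, high):
    """Keep only partitions whose [start, end) range overlaps [low, high]."""
    return [name for name, (start, end) in parts.items()
            if start <= high and low < end]


# Date range taken from the query on the slide.
scanned = prune(monthly_partitions(2005), date(2005, 1, 1), date(2005, 6, 30))
print(scanned)
```

Only the six partitions from January to June are kept; the other six are never touched, which is where the performance benefit comes from.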
What Is Data Segment Compression ?
Compresses data by eliminating repeated column values within a block
Reduces the space required for a segment
– …but only if there are appropriate repeats!
Self contained
Lossless algorithm
Where Can Data Segment Compression Be Used ?
[Diagram: rows in a database block with their column values replaced by references to the block's symbol table]
Database Block
Symbol Table
1   100
2   Call to discuss bill amount
3   TEL
4   NO
5   YES
6   101
7   Call to discuss new product
8   MAIL
9   N/A
10  102
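The symbol table idea can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the three sample rows are invented from the values visible in the diagram, every distinct value goes into the table (whether or not it repeats), and the function names are hypothetical. It is not Oracle's on-disk format.

```python
def compress_block(rows):
    """Replace each column value with an index into a per-block symbol table."""
    symbols = []          # the block's symbol table, in first-seen order
    position = {}         # value -> index in the symbol table
    encoded = []
    for row in rows:
        encoded_row = []
        for value in row:
            if value not in position:
                position[value] = len(symbols)
                symbols.append(value)
            encoded_row.append(position[value])
        encoded.append(tuple(encoded_row))
    return symbols, encoded


def decompress_block(symbols, encoded):
    """Rebuild the original rows from the symbol table and the references."""
    return [tuple(symbols[i] for i in row) for row in encoded]


# Hypothetical rows, built from values shown in the diagram.
rows = [
    (100, "Call to discuss bill amount", "TEL", "NO"),
    (101, "Call to discuss new product", "MAIL", "YES"),
    (102, "Call to discuss new product", "TEL", "YES"),
]

symbols, encoded = compress_block(rows)
print(len(symbols), "symbols for", sum(len(r) for r in rows), "column values")
```

Decompressing returns exactly the rows that went in, which is what "lossless" and "self contained" mean here: everything needed to rebuild the rows lives in the same block.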
Pros
– Saves space
  Reduces LIO / PIO
  Speeds up backup/recovery
  Improves query response time
– Transparent
  To readers
  …and writers
– Decreases time to perform some DML
  Deletes should be quicker
  Bulk inserts may be quicker
Cons
– Increases CPU load
– Can only be used on Direct Path operations
  CTAS
  Serial Inserts using INSERT /*+ APPEND */
  Parallel Inserts (PDML)
  ALTER TABLE…MOVE…
  Direct Path SQL*Loader
– Increases time to perform some DML
  Bulk inserts may be slower
  Updates are slower
Ordering Your Data For Maximum Benefits
Colocate data to maximise compression benefits
1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 Uniformly distributed
1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 Colocated
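A rough way to see why colocation helps is to count how many distinct values each block has to store. The sketch below assumes a block holds four rows of a single column, which is an arbitrary figure chosen to fit the two sequences above; the function name is hypothetical.

```python
def stored_values(values, rows_per_block):
    """Count the values that must be stored if each block keeps one copy
    of every distinct value it contains."""
    total = 0
    for i in range(0, len(values), rows_per_block):
        block = values[i:i + rows_per_block]
        total += len(set(block))
    return total


uniform = [1, 2, 3, 4, 5] * 4      # 1 2 3 4 5 1 2 3 4 5 ...
colocated = sorted(uniform)        # 1 1 1 1 2 2 2 2 ...

print(stored_values(uniform, 4), stored_values(colocated, 4))
```

With four rows per block the uniformly distributed data stores 20 values (no block contains a repeat), while the colocated data stores 5.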
[] V$SQL_WORKAREA.OPERATION_TYPE
PGA Memory Management: Manual
The “old” way of doing things
– Still available though – even in 10g R2
Configuring
– ALTER SESSION SET WORKAREA_SIZE_POLICY=MANUAL;
– Initialisation parameter: WORKAREA_SIZE_POLICY=MANUAL
Set memory parameters yourself
– HASH_AREA_SIZE
– SORT_AREA_SIZE
– SORT_AREA_RETAINED_SIZE
– BITMAP_MERGE_AREA_SIZE
– CREATE_BITMAP_AREA_SIZE
Optimal values depend on the type of work [1]
– One size does not fit all!
[1] Jože Senegačnik - Advanced Management Of Working Areas In Oracle 9i/10g, presented at UKOUG 2005
Auto PGA Parameters: Pre 10gR2
WORKAREA_SIZE_POLICY
– Set to AUTO
PGA_AGGREGATE_TARGET
– The target for summed PGA across all processes
– Can be exceeded if too small
Over Allocation
_PGA_MAX_SIZE
– Target maximum PGA size for a single process
– Default is a fixed value of 200Mb
– Hidden / Undocumented Parameter
Usual caveats apply
_SMM_MAX_SIZE
– Limit for a single workarea operation for one process
– Derived Default
LEAST(5% of PGA_AGGREGATE_TARGET, 50% of _PGA_MAX_SIZE)
Hits limit of 100Mb
– When PGA_AGGREGATE_TARGET is >= 2000Mb
– And _PGA_MAX_SIZE is left at default of 200Mb
– _PGA_MAX_SIZE = 2 * _SMM_MAX_SIZE
Parallel operations
– _SMM_PX_MAX_SIZE = 50% of PGA_AGGREGATE_TARGET
– When DOP <= 5, _SMM_MAX_SIZE is used
– When DOP > 5, _SMM_PX_MAX_SIZE / DOP is used
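The derived limits above are easier to check with some arithmetic. The Python sketch below applies the pre-10gR2 rules exactly as the slides state them; the function names are hypothetical, all sizes are in Mb, and a real instance may behave differently depending on version and on any hidden parameters that have been changed.

```python
def smm_max_size_mb(pga_aggregate_target_mb, pga_max_size_mb=200):
    """Serial workarea limit: LEAST(5% of target, 50% of _PGA_MAX_SIZE)."""
    return min(pga_aggregate_target_mb * 5 / 100,
               pga_max_size_mb * 50 / 100)


def workarea_limit_mb(pga_aggregate_target_mb, dop=1, pga_max_size_mb=200):
    """Per-process workarea limit for a given degree of parallelism (DOP)."""
    if dop <= 5:
        return smm_max_size_mb(pga_aggregate_target_mb, pga_max_size_mb)
    smm_px_max_size_mb = pga_aggregate_target_mb * 50 / 100
    return smm_px_max_size_mb / dop


for target in (1000, 2000, 8000):
    print(target, smm_max_size_mb(target))
print(workarea_limit_mb(2000, dop=4), workarea_limit_mb(2000, dop=20))
```

With the default _PGA_MAX_SIZE of 200Mb, a 1000Mb target gives a 50Mb serial limit, and any target of 2000Mb or more stays capped at 100Mb, which is the ceiling described above.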
PGA Target Advisor
select trunc(pga_target_for_estimate/1024/1024) pga_target_for_estimate
, to_char(pga_target_factor * 100,'999.9') ||'%' pga_target_factor
, trunc(bytes_processed/1024/1024) bytes_processed
, trunc(estd_extra_bytes_rw/1024/1024) estd_extra_bytes_rw
, to_char(estd_pga_cache_hit_percentage,'999') ||
'%' estd_pga_cache_hit_percentage
, estd_overalloc_count
from v$pga_target_advice
/
PGA Target For PGA Tgt                     Estimated Extra   Estimated PGA            Estimated
   Estimate Mb  Factor  Bytes Processed Bytes Read/Written     Cache Hit % Overallocation Count
-------------- ------- ---------------- ------------------ --------------- --------------------
         5,376   12.5%        5,884,017          7,279,799             45%                  113
        10,752   25.0%        5,884,017          3,593,510             62%                    8
        21,504   50.0%        5,884,017          3,140,993             65%                    0
        32,256   75.0%        5,884,017          3,104,894             65%                    0
        43,008  100.0%        5,884,017          2,300,826             72%                    0
        51,609  120.0%        5,884,017          2,189,160             73%                    0
        60,211  140.0%        5,884,017          2,189,160             73%                    0
        68,812  160.0%        5,884,017          2,189,160             73%                    0
        77,414  180.0%        5,884,017          2,189,160             73%                    0
        86,016  200.0%        5,884,017          2,189,160             73%                    0
       129,024  300.0%        5,884,017          2,189,160             73%                    0
       172,032  400.0%        5,884,017          2,189,160             73%                    0
       258,048  600.0%        5,884,017          2,189,160             73%                    0
Beware Of Temporal Data Affecting The Optimizer
Slowly Changing Dimensions
– Cover ranges of time
– “From” and “To” DATE columns define applicability
– Need BETWEEN operator to retrieve rows for a reporting point in time
SELECT * FROM d_customer
WHERE TO_DATE('15/01/2005','DD/MM/YYYY') BETWEEN valid_from AND valid_to
Month 1: 1st Jan, 2004
CUSTOMER
CUSTOMER_ID  NAME          CUSTOMER_TYPE
487438       Jeff Moss     SME
D_CUSTOMER
CUSTOMER_ID  NAME          CUSTOMER_TYPE  VALID_FROM  VALID_TO
487438       Jeff Moss     SME            01/01/2004
Month 2: 1st Feb, 2004
CUSTOMER
CUSTOMER_ID  NAME          CUSTOMER_TYPE
487438       Jeff Moss     I&C
839398       Mark Rittman  SME
D_CUSTOMER
CUSTOMER_ID  NAME          CUSTOMER_TYPE  VALID_FROM  VALID_TO
487438       Jeff Moss     SME            01/01/2004  31/01/2004
487438       Jeff Moss     I&C            01/02/2004
839398       Mark Rittman  SME            01/02/2004
Dependent Predicates
Three versions of the same table; only the To column differs
Key  Non Key Attr  From         To (table 1)  To (table 2)  To (table 3)
1    Jeff          01-Jan-2005  31-Jan-2005   30-Jun-2005   31-Dec-2005
2    Mark          01-Feb-2005  28-Feb-2005   30-Jun-2005   31-Dec-2005
3    Doug          01-Mar-2005  31-Mar-2005   30-Jun-2005   31-Dec-2005
4    Niall         01-Apr-2005  30-Apr-2005   30-Jun-2005   31-Dec-2005
5    Tom           01-May-2005  31-May-2005   30-Jun-2005   31-Dec-2005
6    Jonathan      01-Jun-2005  30-Jun-2005   30-Jun-2005   31-Dec-2005
7    Lisa          01-Jul-2005  31-Jul-2005   31-Dec-2005   31-Dec-2005
8    Cary          01-Aug-2005  31-Aug-2005   31-Dec-2005   31-Dec-2005
9    Mogens        01-Sep-2005  30-Sep-2005   31-Dec-2005   31-Dec-2005
10   Anjo          01-Oct-2005  31-Oct-2005   31-Dec-2005   31-Dec-2005
11   Larry         01-Nov-2005  30-Nov-2005   31-Dec-2005   31-Dec-2005
12   Pete          01-Dec-2005  31-Dec-2005   31-Dec-2005   31-Dec-2005
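The effect of dependent predicates can be shown with a toy estimate over the three tables above. The Python sketch treats "From <= point" and "To >= point" as independent and multiplies their selectivities, then compares that with the true row count. It uses exact per-column counts, so it illustrates the independence assumption only and does not try to reproduce the optimizer's real arithmetic; the function names and table labels are hypothetical.

```python
import calendar
from datetime import date

YEAR = 2005
MONTH_NUMBERS = range(1, 13)

# From column: first day of each month, identical in all three tables.
froms = [date(YEAR, m, 1) for m in MONTH_NUMBERS]

# To column for each of the three tables.
tables = {
    "To = end of the same month": [
        date(YEAR, m, calendar.monthrange(YEAR, m)[1]) for m in MONTH_NUMBERS],
    "To = 30-Jun or 31-Dec": [
        date(YEAR, 6, 30) if m <= 6 else date(YEAR, 12, 31) for m in MONTH_NUMBERS],
    "To = 31-Dec for every row": [date(YEAR, 12, 31)] * 12,
}


def independent_estimate(froms, tos, point):
    """rows * sel(From <= point) * sel(To >= point), treated as independent."""
    n = len(froms)
    from_matches = sum(f <= point for f in froms)
    to_matches = sum(t >= point for t in tos)
    return from_matches * to_matches / n


def actual_rows(froms, tos, point):
    """Rows whose [From, To] range really contains the point in time."""
    return sum(f <= point <= t for f, t in zip(froms, tos))


point = date(2005, 10, 9)
results = {label: (independent_estimate(froms, tos, point),
                   actual_rows(froms, tos, point))
           for label, tos in tables.items()}
for label, (estimate, actual) in results.items():
    print(f"{label}: estimate {estimate}, actual {actual}")
```

The estimate and the true count only agree when the To column carries no information; the more the To value depends on the From value, the further the independent estimate drifts from reality.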
Optimizer Gets Incorrect Cardinality
select * from test_1_distinct_td
where to_date('09-OCT-2005','DD-MON-YYYY') between from_date and to_date;
Execution Plan
----------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 11 | 264 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST_1_DISTINCT_TD | 11 | 264 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
…And Again
Ignore it
– If your query still gets the right plan of course!
Hints
– Force the optimizer to do as you tell it
Stored outlines
Adjust statistics held against the table
– Affects any SQL that accesses that object
Optimizer Profile (10g)
– Offline Optimisation [1]
Dynamic sampling level 4 or above
– Samples “single table predicates that reference 2 or more columns”
– Takes extra time during the parse – minimal but often worth it
[1] Jonathan Lewis: Cost-Based Oracle Fundamentals, Chapter 2
Dynamic Sampling With A Hint
10 rows selected.
Execution Plan
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 240 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST_1_DISTINCT_TD | 10 | 240 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Find Out Where Your Query Is At
Sorts
Aggregations
Hash joins
Merge joins
Table scans
Materialized View scans
Analytics
Parallel Query
Pruning
Temp Space Use
| 4 | BUFFER SORT | | | | | |
| 5 | PX RECEIVE | | 207K| 9510K| | 25982 (9)|
| 6 | PX SEND BROADCAST | :TQ20000 | 207K| 9510K| | 25982 (9)|
| 7 | VIEW | | 207K| 9510K| | 25982 (9)|
| 8 | WINDOW SORT | | 207K| 10M| 26M| 25982 (9)|
| 9 | MERGE JOIN | | 207K| 10M| | 25976 (9)|
| 10 | TABLE ACCESS BY INDEX ROWID| AML_T_ANALYSIS_DATE | 1 | 22 | | 2 (0)|
| 11 | INDEX UNIQUE SCAN | AML_I_ANL_PK | 1 | | | 0 (0)|
| 12 | SORT AGGREGATE | | 1 | 9 | | |
| 13 | PX COORDINATOR | | | | | |
| 14 | PX SEND QC (RANDOM) | :TQ10000 | 1 | 9 | | |
| 15 | SORT AGGREGATE | | 1 | 9 | | |
| 16 | FILTER | | | | | |
| 17 | PX BLOCK ITERATOR | | 1 | 9 | | 2 (0)|
| 18 | TABLE ACCESS FULL | AML_T_ANALYSIS_DATE | 1 | 9 | | 2 (0)|
| 19 | FILTER | | | | | |
| 20 | TABLE ACCESS FULL | AML_T_BILLING_ACCOUNT_DIM | 82M| 2371M| | 5457 (5)|
| 21 | HASH JOIN | | 18M| 1340M| | 23704 (10)|
| 22 | HASH JOIN | | 10M| 500M| | 17005 (11)|
| 23 | PX RECEIVE | | 10M| 265M| | 11304 (14)|
| 24 | PX SEND HASH | :TQ20003 | 10M| 265M| | 11304 (14)|
| 25 | BUFFER SORT | | 1 | 124 | | |
| 26 | VIEW | AML_V_MD_CUH_SID | 10M| 265M| | 11304 (14)|
| 27 | HASH JOIN | | 10M| 337M| | 11304 (14)|
| 28 | PX RECEIVE | | 17M| 310M| | 5228 (18)|
| 29 | PX SEND HASH | :TQ20001 | 17M| 310M| | 5228 (18)|
| 30 | PX BLOCK ITERATOR | | 17M| 310M| | 5228 (18)|
| 31 | TABLE ACCESS FULL | AML_T_MEASURE_DIM | 17M| 310M| | 5228 (18)|
| 32 | PX RECEIVE | | 34M| 461M| | 5958 (10)|
| 33 | PX SEND HASH | :TQ20002 | 34M| 461M| | 5958 (10)|
| 34 | PX BLOCK ITERATOR | | 34M| 461M| | 5958 (10)|
| 35 | TABLE ACCESS FULL | AML_T_CUSTOMER_DIM | 34M| 461M| | 5958 (10)|
| 36 | PX RECEIVE | | 55M| 1212M| | 5562 (3)|
| 37 | PX SEND HASH | :TQ20004 | 55M| 1212M| | 5562 (3)|
| 38 | PX BLOCK ITERATOR | | 55M| 1212M| | 5562 (3)|
| 39 | TABLE ACCESS FULL | AML_T_CUSTOMER_DIM | 55M| 1212M| | 5562 (3)|
| 40 | PX RECEIVE | | 94M| 2516M| | 6483 (5)|
| 41 | PX SEND HASH | :TQ20005 | 94M| 2516M| | 6483 (5)|
| 42 | PX BLOCK ITERATOR | | 94M| 2516M| | 6483 (5)|
| 43 | MAT_VIEW ACCESS FULL | AML_M_CD_BAD | 94M| 2516M| | 6483 (5)|
V$ Views To The Rescue ?
V$SQL_PLAN Bug
– Service Request: 4990863.992
– Broken in 10gR1, Works in 10gR2
– PARENT_ID corruption
Can’t link rows in this view to their parents as the values are corrupted due to this bug
Shows up in TEMP TABLE TRANSFORMATION operations
Multiple Work Areas can be active…or None
Some operations are not shown in V$SESSION_LONGOPS
V$SESSION sql_id may not be the executing cursor
– E.g. when refreshing a Materialized View