
Top 10 Workflow Tips and Log Reading
New York User Group Meeting

Chris Main, Northeast Consultant, April 20th, 2005

Agenda
Log Reading and Performance Tuning

#1 Interpret the Session Log
#2 Tweak Session Properties
#3 Use Partitioning
#4 Consolidate Multiple Lookups
#5 Use Server Grids

Agenda

#6 Use Expression Variables
#7 Use Changed-Data Capture
#8 Server Variables
#9 Reduce Failures
#10 Use Performance Counters

Tip #1: Interpret the Session Log


Session logs hold all session information
Determine bottlenecks, errors, load statistics

(#1) Session Logs: Initialization


MASTER> CMN_1688 Allocated [12000000] bytes from process memory for [DTM Buffer Pool].
MASTER> PETL_24000 Parallel Pipeline Engine initializing.
MASTER> PETL_24001 Parallel Pipeline Engine running.
MASTER> PETL_24003 Initializing session run.
MAPPING> TM_6014 Initializing session [S_M_CONNECTION_EXTRACT] at [Thu Feb 05 16:35:41 2004]
MAPPING> TM_6101 Mapping name: M_CONNECTION_EXTRACT Version 1.0.0
MAPPING> CMN_1569 Server Mode: [ASCII]
MAPPING> CMN_1570 Server Codepage: [MS Windows Latin 1 (ANSI), superset of Latin1]
MAPPING> TM_6151 Session Sort Order: [Binary]
MAPPING> TM_6156 Using LOW precision decimal arithmetic
MAPPING> TM_6180 Deadlock retry logic will not be implemented.
MAPPING> TE_7022 TShmWriter: Initialized
MAPPING> DBG_21321 Loaded external module library [C:\Program Files\Informatica\Informatica PowerCenterRT 6.2 - Server\ExtProc\crc32.dll]
MAPPING> TM_6007 DTM initialized successfully for session [S_M_CONNECTION_EXTRACT]

Note the allocated memory, the mapping name, and the loaded libraries.

(#1) Session Logs: Lookups


Check the query
Check the start time
Check the number of rows
Compare the amount of data with the memory allotted in the session
TRANSF_1_1_1_1> DBG_21097 Default sql to create lookup cache: SELECT CLASS_UID,CLASS_ID FROM IMW_CLASS ORDER BY CLASS_ID,CLASS_UID

TRANSF_1_1_1_1> DBG_21079 Creating Lookup Cache : (Thu Feb 05 16:35:41 2004)

TRANSF_1_1_1_1> DBG_21297 Lookup cache row count : 704

TRANSF_1_1_1_1> DBG_21294 Lookup cache creation completed : (Thu Feb 05 16:35:41 2004)

(#1) Session Logs: Statistics


[READER_1_1_1] Total Run Time = [1685.518988] secs, Total Idle Time = [1009.423504] secs, Busy Percentage = [40.112006].
[TRANSF_1_1_1_1] Total Run Time = [1524.413078] secs, Total Idle Time = [1380.883920] secs, Busy Percentage = [9.415372].
[TRANSF_1_1_1_2] Total Run Time = [1705.480528] secs, Total Idle Time = [48.280356] secs, Busy Percentage = [97.169105].
[WRITER_1_1_1] Total Run Time = [1660.035222] secs, Total Idle Time = [493.218257] secs, Busy Percentage = [70.288687].

Busy percentages can identify potential bottlenecks
Adjust your mappings to level out the bottlenecks
Here the second transformation thread is the bottleneck; consider partitioning
You don't need 100% utilization, but the percentages should be high and close together
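Statistics like these lend themselves to scripting. As a rough illustration (a hypothetical helper, not part of PowerCenter), a few lines of Python can pull the busy percentage per thread out of a log and flag the likely bottleneck; the sample lines mirror the log format shown above:

```python
import re

# Match "[THREAD_NAME] ... Busy Percentage = [NN.NN]" style statistics lines.
STAT = re.compile(r"\[(\w+)\][^\n]*Busy Percentage = \[([\d.]+)\]")

def busy_percentages(log_text):
    """Return {thread_name: busy_pct} for each thread-statistics line found."""
    return {m.group(1): float(m.group(2)) for m in STAT.finditer(log_text)}

sample = """\
[READER_1_1_1] Total Run Time = [1685.518988] secs, Total Idle Time = [1009.423504] secs, Busy Percentage = [40.112006].
[TRANSF_1_1_1_1] Total Run Time = [1524.413078] secs, Total Idle Time = [1380.883920] secs, Busy Percentage = [9.415372].
[TRANSF_1_1_1_2] Total Run Time = [1705.480528] secs, Total Idle Time = [48.280356] secs, Busy Percentage = [97.169105].
[WRITER_1_1_1] Total Run Time = [1660.035222] secs, Total Idle Time = [493.218257] secs, Busy Percentage = [70.288687].
"""

stats = busy_percentages(sample)
bottleneck = max(stats, key=stats.get)  # the busiest thread is the likely bottleneck
print(bottleneck, stats[bottleneck])    # TRANSF_1_1_1_2 97.169105
```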

(#1) Session Logs: Errors


Find rejected records in the log files
Eliminate errors, especially recurring ones
Use error-handling strategies

Keep your sessions running error free. Anticipate errors and write mappings that handle them appropriately. Each error that occurs slows the session down.

Tip #2: Tweak Session Properties


Experiment!
Use the following tips as guidelines (what to look for in Properties)
Discover the settings optimal for your particular system
Any settings change must consider all other settings

(#2) Server Architecture: Memory


DTM Buffer Pool Size sets the amount of memory for reader and writer buffers
Determines the total number of blocks available
Optimal value is about 25MB
A block size of 64K means 25M/64K = 390 blocks

Buffer Block Size sets the size of blocks in the pipeline
Optimum size depends on the row size being processed
64KB holds 64 rows of 1KB; 128KB holds 128 rows of 1KB
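The slide's block arithmetic can be sketched directly; the helper names below are illustrative, not PowerCenter settings:

```python
def buffer_blocks(dtm_pool_bytes, block_bytes):
    """Number of buffer blocks the DTM pool yields (the slide's 25M/64K math)."""
    return dtm_pool_bytes // block_bytes

def rows_per_block(block_bytes, row_bytes):
    """How many rows fit in one buffer block."""
    return block_bytes // row_bytes

print(buffer_blocks(25_000_000, 64_000))  # 390 blocks, as on the slide
print(rows_per_block(64 * 1024, 1024))    # 64: a 64KB block holds 64 rows of 1KB
```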

(#2) Server Architecture: DTM Parameters


Session task parameters control the processing pipeline and are found on the Properties and Config Object tabs


(#2) Tweaking Commit Intervals


Commit Intervals
Default is 10,000 target rows
Can switch between target commit and source commit
A commit can slow down the load process
Large commit intervals may fill database transaction logs
Small commit intervals may be slower
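The trade-off is easy to see with simple arithmetic; the fixed per-commit cost below is an assumed number, purely for illustration of why tiny intervals multiply overhead:

```python
def commit_overhead(total_rows, commit_interval, commit_cost_ms=50.0):
    """Estimate commit count and total commit time for a load.

    commit_cost_ms is an assumed fixed cost per commit, for illustration only.
    """
    commits = -(-total_rows // commit_interval)  # ceiling division
    return commits, commits * commit_cost_ms

# 1,000,000 rows at the default 10,000-row interval vs a tiny 100-row interval
print(commit_overhead(1_000_000, 10_000))  # (100, 5000.0)
print(commit_overhead(1_000_000, 100))     # (10000, 500000.0)
```

A large interval issues few commits but keeps more uncommitted rows in the database's transaction log, which is the other side of the trade-off above.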

(#2) Tweaking Cache Sizes


Cache Sizes
Default is 1MB index and 1MB data cache
Calculate how much cache you will need
Too little = paging
Too much = wasted resources, wasted time
Set in either the mapping or the session (session overrides mapping)
The PowerCenter Server uses cache for Lookup, Aggregator, Rank, Joiner, and Sorter transformations
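As a back-of-the-envelope check (the overhead factor and row widths here are assumptions, not PowerCenter's exact sizing formula), you can estimate whether the default 1MB caches will hold a lookup:

```python
def lookup_cache_estimate(row_count, key_bytes, data_bytes, overhead=1.25):
    """Rough cache sizing: index cache holds keys, data cache holds the rest.

    The 25% overhead factor is an assumption for illustration; consult the
    sizing formulas in the PowerCenter documentation for real sessions.
    """
    index = int(row_count * key_bytes * overhead)
    data = int(row_count * data_bytes * overhead)
    return index, data

# 704 lookup rows (as in the session log above), assuming 8-byte keys, 50-byte rows
index, data = lookup_cache_estimate(704, 8, 50)
print(index, data)                                # 7040 44000
print(index <= 1_048_576 and data <= 1_048_576)   # True: the 1MB defaults suffice
```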

Tip #3: Use Partitioning


When reads or writes are at 100% busy
When databases are partitioned (use database partitioning)
When transformations are at 100% busy
When source SQL is too complex
When an additional thread will increase performance

(#3) Threads, Partition Points, and Stages

[Diagram: a pipeline from Shortcut_To_ORDER_DATA (flat file) through SQ_Shortcut_To_ORDER_DATA, FILTRANS, EXP_RPT, and AGG_RPT to Shortcut_To_AGG_Customer_SALES (Oracle), with a reader thread (first stage), two transformation threads (second and third stages), and a writer thread (fourth stage)]

Threads are created to move data down the pipeline
The data is moved in pipeline stages defined by partition points
By default PowerCenter assigns a partition point at the Source Qualifier, Target, Aggregator, and Rank transformations

Terminology in action

[Diagram: a pipeline with three threads (Thread 1, Thread 2, Thread 3)]

No Tuning Session Thread Stats


MASTER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition point [SQ_ORPDTL] has completed: Total Run Time = [256.619917] secs, Total Idle Time = [189.648261] secs, Busy Percentage = [26.097606].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of partition point [SQ_ORPDTL] has completed: Total Run Time = [255.374506] secs, Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000].
MASTER> PETL_24022 Thread [WRITER_1_1_1] created for the write stage of partition point(s) [DEMAND2_ALL_INSERTS, DWLDEMDISD] has completed: Total Run Time = [198.297469] secs, Total Idle Time = [173.083904] secs, Busy Percentage = [12.715021].
MASTER> PETL_24021 ***** END RUN INFO *****

Re/Partition Points

[Diagram sequence: creating a partition point (PP) on the pipeline adds a pipeline stage, splitting one transformation thread into two, so three threads become four (Thread 1 through Thread 4)]

Adding Partition Points

[Screenshots: adding partition points in the session properties]

Thread Stats after adding Partition Points


MASTER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition point [SQ_ORPDTL] has completed: Total Run Time = [160.635861] secs, Total Idle Time = [92.679054] secs, Busy Percentage = [42.304879].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of partition point [SQ_ORPDTL] has completed: Total Run Time = [158.843670] secs, Total Idle Time = [146.467909] secs, Busy Percentage = [7.791158].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_2] created for the transformation stage of partition point [exp_Evaluate_Conditions] has completed: Total Run Time = [159.006092] secs, Total Idle Time = [15.034492] secs, Busy Percentage = [90.544707].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_3] created for the transformation stage of partition point [exp_Target_Placeholder] has completed: Total Run Time = [126.347018] secs, Total Idle Time = [72.726160] secs, Busy Percentage = [42.439354].
MASTER> PETL_24022 Thread [WRITER_1_1_1] created for the write stage of partition point(s) [DEMAND2_ALL_INSERTS, DWLDEMDISD] has completed: Total Run Time = [112.095416] secs, Total Idle Time = [86.447935] secs, Busy Percentage = [22.880044].

Partitioning (Data Partition)

[Diagram sequence: creating a second data partition duplicates the pipeline, so each partition runs its own set of five threads across the same partition points, ten threads in total]

Add Data Partition

[Screenshot: adding a partition in the session properties]

Thread Stats after adding Data Partition


***** RUN INFO FOR TGT LOAD ORDER GROUP [1], SRC PIPELINE [1] *****
MASTER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition point [SQ_ORPDTL] has completed: Total Run Time = [147.332109] secs, Total Idle Time = [121.743271] secs, Busy Percentage = [17.368134].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of partition point [SQ_ORPDTL] has completed: Total Run Time = [145.683316] secs, Total Idle Time = [140.503645] secs, Busy Percentage = [3.555432].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_2] created for the transformation stage of partition point [exp_Evaluate_Conditions] has completed: Total Run Time = [146.017557] secs, Total Idle Time = [25.919622] secs, Busy Percentage = [82.248969].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_3] created for the transformation stage of partition point [exp_Target_Placeholder] has completed: Total Run Time = [83.096817] secs, Total Idle Time = [36.874846] secs, Busy Percentage = [55.624238].
MASTER> PETL_24022 Thread [WRITER_1_1_1] created for the write stage of partition point(s) [DEMAND2_ALL_INSERTS, DWLDEMDISD] has completed: Total Run Time = [88.927355] secs, Total Idle Time = [74.749625] secs, Busy Percentage = [15.943047].
MASTER> PETL_24018 Thread [READER_1_1_2] created for the read stage of partition point [SQ_ORPDTL] has completed: Total Run Time = [106.301414] secs, Total Idle Time = [97.831194] secs, Busy Percentage = [7.968116].
MASTER> PETL_24019 Thread [TRANSF_1_1_2_1] created for the transformation stage of partition point [SQ_ORPDTL] has completed: Total Run Time = [103.615048] secs, Total Idle Time = [102.736694] secs, Busy Percentage = [0.847709].
MASTER> PETL_24019 Thread [TRANSF_1_1_2_2] created for the transformation stage of partition point [exp_Evaluate_Conditions] has completed: Total Run Time = [145.728914] secs, Total Idle Time = [27.767894] secs, Busy Percentage = [80.945515].
MASTER> PETL_24019 Thread [TRANSF_1_1_2_3] created for the transformation stage of partition point [exp_Target_Placeholder] has completed: Total Run Time = [84.941995] secs, Total Idle Time = [39.152031] secs, Busy Percentage = [53.907333].
MASTER> PETL_24022 Thread [WRITER_1_1_2] created for the write stage of partition point(s) [DEMAND2_ALL_INSERTS, DWLDEMDISD] has completed: Total Run Time = [87.924204] secs, Total Idle Time = [74.557008] secs, Busy Percentage = [15.203091].

(#3) Adding Partitions and Partition Points

[Diagram: the same pipeline with three partitions, showing 3 reader threads (first stage), 6 transformation threads (second and third stages), and 3 writer threads (fourth stage)]

Adding partitions increases the number of threads
Adding partition points increases the number of pipeline stages

(#3) Partition Types: Pass Through


Define your own partitions
In the Source Qualifier, define your own SQL statements
Partitions are based on incoming data values
For changeable source queries
For evenly distributed extract data
When partitions help avoid contention/deadlocks

(#3) Partition Types: Pass Through


Sample Session with Pass Through Partitioning

New Partition Point at the EXP_RPT Transformation


(#3) Partition Types: Round Robin


Records alternate between partitions
Evenly distributed between the partitions
Use when source records are not evenly distributed or are unpredictable
Grouping of data among the partitions is not necessary

(#3) Example: Round Robin Partitioning


Source partitioning: Pass Through
Target partitioning: Round Robin

Partitioned flat file: Order_Data_1.dat, Order_Data_2.dat
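The distribution behavior can be sketched in a few lines; this illustrates the round-robin idea, not the server's implementation:

```python
from itertools import cycle

def round_robin(rows, n_partitions):
    """Distribute rows across partitions one at a time, in turn.

    Mimics round-robin partitioning: even row counts, no grouping guarantee.
    """
    partitions = [[] for _ in range(n_partitions)]
    for target, row in zip(cycle(partitions), rows):
        target.append(row)
    return partitions

parts = round_robin(range(10), 3)
print([len(p) for p in parts])  # [4, 3, 3]: counts stay as even as possible
```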

(#3) Partition Types: Key Range


Specify keys and ranges for key values
User-defined distribution
Use when keys are easily defined (numbers)
Compound keys accepted

(#3) Example: Source and Target Partitioning by Key Range

[Screenshots: key range settings for source and target partitioning]

Leave the start range or the end range blank to force a < or >.
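A sketch of the assignment rule (illustrative only; in PowerCenter you configure the ranges in the session, and ORDER_ID here is a hypothetical key):

```python
def key_range_partition(row_key, ranges):
    """Assign a row to the first partition whose [start, end) range holds its key.

    A None start or end acts as an open bound, matching the "leave the range
    blank to force a < or >" note above.
    """
    for i, (start, end) in enumerate(ranges):
        if (start is None or row_key >= start) and (end is None or row_key < end):
            return i
    raise ValueError(f"key {row_key!r} falls outside every range")

# Three partitions on ORDER_ID: < 1000, 1000-4999, >= 5000
ranges = [(None, 1000), (1000, 5000), (5000, None)]
print([key_range_partition(k, ranges) for k in (42, 1000, 9999)])  # [0, 1, 2]
```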

(#3) Partition Types: Hash


Specify keys on which to partition
User-defined keys create hash partitions
Use to distribute the data evenly with ease
Data is grouped based on keys
With hash auto-keys, the PowerCenter Server uses all grouped or sorted ports as a compound key

(#3) Example: Hash Partition


Hash partitioning used with an Aggregator transformation

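The grouping guarantee behind hash partitioning (rows with the same compound key always land in the same partition, which is what an Aggregator needs) can be illustrated with a stable hash; CRC32 stands in for whatever hash the server actually uses, and the column names are hypothetical:

```python
import zlib

def hash_partition(row, key_columns, n_partitions):
    """Route a row by hashing its compound key, as hash partitioning does.

    With hash auto-keys, PowerCenter builds the compound key from the grouped
    or sorted ports; here the key columns are passed in explicitly.
    """
    compound_key = "|".join(str(row[c]) for c in key_columns)
    # zlib.crc32 gives a stable hash across runs (unlike Python's builtin hash)
    return zlib.crc32(compound_key.encode()) % n_partitions

row_a = {"CUST_ID": 7, "REGION": "NE", "AMOUNT": 12.5}
row_b = {"CUST_ID": 7, "REGION": "NE", "AMOUNT": 99.0}
# Rows sharing a group key always land in the same partition
print(hash_partition(row_a, ["CUST_ID", "REGION"], 4) ==
      hash_partition(row_b, ["CUST_ID", "REGION"], 4))  # True
```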

(#3) Partition Types: Database


Based on a target IBM DB2 database
Multi-node loading
The target table must contain a partition key
Non-DB2 targets default to Pass Through

(#3) Partition Points Summary


A session pipeline has reader, writer, and one or more transformation threads
Those are the default partition points
One type of partitioning can be applied at each of these points
The session's Mappings-Partitions tab displays the partition type allowed at each point

(#3) Partition Points Summary

Default partition points:
Reader -> Transformation -> Transformation -> Writer

After adding a partition point:
Reader -> Transformation -> Transformation -> Transformation -> Writer

Adding an additional partition point may increase session performance by adding additional threads

Tip #4: Consolidate Multiple Lookups


Lookups to lookups: reduce and reuse
Consolidate lookups by using lookup overrides
Use reusable lookups when multiple calls are needed
Use unconnected lookups in conditional functions

(#4) Consolidating Lookups

[Screenshots: a mapping with several lookups consolidated into one lookup via a SQL override]

Tip #5: Use Server Grids


A grid of servers on which sessions can run
Assign the workflow to a master server, which runs the sessions on the worker servers
Currently round robin
The master server can detect which server to run on

(#5) Server Grid Tips


Not a complete failover system: the master server must always be up
Configure all servers the same: parameter files, connections, scripts, email, etc.
Can include different server platforms
Use a specific server if special properties are necessary (connections, proximity to sources/targets, hardware, etc.)

Server Grid Option

Load balancing and failover for heterogeneous systems

[Diagram: PowerCenter Servers on the server grid and an off-grid PowerCenter Server share a repository; Session1 through Session4 and Session6 are distributed across the grid while Session5 runs off-grid]

Tip #6: Use Expression Variables


Utilize variables in expressions to capture the previous record's data
Can be used with filters to perform aggregate functions
Can be used to compare two records (make sure the source data is sorted)
ORDER MATTERS!

(#6) Variables Example

[Screenshot: an Expression transformation with a variable port computing var_RUNNING_TOTAL + SALES_REVENUE]
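The row-by-row evaluation can be mimicked outside PowerCenter to reason about ordering; here var_RUNNING_TOTAL is carried across rows just as a variable port carries its previous-row value (an illustration, not the expression engine itself):

```python
def running_totals(rows):
    """Emulate var_RUNNING_TOTAL + SALES_REVENUE evaluated row by row.

    Like a PowerCenter variable port, the variable keeps its value from the
    previous row, so input order matters; sort the source first.
    """
    var_running_total = 0.0
    out = []
    for sales_revenue in rows:
        var_running_total = var_running_total + sales_revenue
        out.append(var_running_total)
    return out

print(running_totals([100.0, 250.0, 50.0]))  # [100.0, 350.0, 400.0]
```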

Tip #7: Use Changed-Data Capture


Reduce data volumes to changes only
Reduce load volume, increase performance
May require special logic
Some solutions may burden the source system

(#7) Using Mapping Variables for CDC

Use mapping variables to store the last change date or key range
The max date/key is stored in the repository
On successive runs, use the mapping variable in the Source Qualifier

Pros:
Only brings in affected records
Minimal impact to mappings
Restarting or resetting dates is possible

Cons:
Requires the source to carry effective dates
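The successive-run query can be sketched as string construction; the table and column names and the date format below are hypothetical stand-ins, with the last_change argument playing the role of the mapping variable:

```python
def incremental_source_query(table, last_change):
    """Build a Source Qualifier filter for a mapping-variable CDC run.

    Table, column, and format are illustrative; in PowerCenter the mapping
    variable would appear directly in the SQ's SQL override.
    """
    return (f"SELECT * FROM {table} "
            f"WHERE LAST_UPDATE_DATE > TO_DATE('{last_change}', 'YYYY-MM-DD')")

# The first run uses the variable's initial value; the server then stores the
# max date back in the repository for the next run.
print(incremental_source_query("ORDERS", "2005-04-19"))
```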

(#7) Using CRC Checks for CDC

Calculate a CRC value for each record
Store the CRC values in the target tables
Join source and target tables where the keys match but the CRC values do not

Pros:
Only changed or new records are processed
After a failure you can simply rerun

Cons:
Overhead on the source system
The CRC function will slow mappings slightly
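The CRC comparison can be illustrated end to end; zlib.crc32 stands in for the crc32.dll function seen in the session log earlier, and the table shapes and column names are hypothetical:

```python
import zlib

def crc_of(record, columns):
    """CRC32 over the concatenated column values: the per-record checksum
    that would be stored alongside each row in the target."""
    return zlib.crc32("|".join(str(record[c]) for c in columns).encode())

def changed_or_new(source_rows, target_crcs, key, columns):
    """Return source rows that are new, or whose CRC differs from the target's."""
    return [r for r in source_rows
            if target_crcs.get(r[key]) != crc_of(r, columns)]

src = [{"ID": 1, "NAME": "Acme"}, {"ID": 2, "NAME": "Bolt"}, {"ID": 3, "NAME": "Cog"}]
# The target already holds IDs 1 and 2, but row 2's NAME has since changed
tgt = {1: crc_of({"ID": 1, "NAME": "Acme"}, ["NAME"]),
       2: crc_of({"ID": 2, "NAME": "Bolt (old)"}, ["NAME"])}
print([r["ID"] for r in changed_or_new(src, tgt, "ID", ["NAME"])])  # [2, 3]
```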

(#7) Using EAI Solution for CDC


EAI / message bus
Use a message bus to send only changed or new records to PowerCenter
Real-time capabilities are possible
Minimal impact on the source system

(#7) Using PowerExchange CDC


PowerExchange: pre-built CDC solutions for various databases
Only processes new or changed records
Low-level logic provides optimal performance
Minimal impact on source systems
Supported by Informatica (not you)

Tip #8: Server Variables


Use server variables to ensure smooth migrations between environments
Use variables in session and workflow objects
Update using pmrep or the Workflow Manager

(#8) Server Variables


Server Variable: Description

$PMRootDir: A root directory for use by server variables (usually the PowerCenter Server installation directory)
$PMSessionLogDir: Default directory for session logs. Defaults to $PMRootDir/SessLogs.
$PMBadFileDir: Default directory for reject files. Defaults to $PMRootDir/BadFiles.
$PMCacheDir: Default directory for the index and data cache files. Defaults to $PMRootDir/Cache. Use a drive local to the PowerCenter Server.
$PMTargetFileDir: Default directory for target files. Defaults to $PMRootDir/TgtFiles.
$PMSourceFileDir: Default directory for source files. Defaults to $PMRootDir/SrcFiles.
$PMExtProcDir: Default directory for external procedures. Defaults to $PMRootDir/ExtProc.
$PMTempDir: Default directory for temporary files. Defaults to $PMRootDir/Temp.
$PMSuccessEmailUser: Email address to receive post-session email when the session completes successfully.
$PMFailureEmailUser: Email address to receive post-session email when the session fails.

(#8) Server Variables


$PMSessionLogCount: Number of session logs the PowerCenter Server archives for the session. Defaults to 0.
$PMSessionErrorThreshold: Number of non-fatal errors the PowerCenter Server allows before failing the session. Non-fatal errors include reader, writer, and DTM errors. If you want to stop the session on errors, enter the number of non-fatal errors to allow before stopping the session. The PowerCenter Server maintains an independent error count for each source, target, and transformation. Used to configure the Stop On option in the session properties.
$PMWorkflowLogDir: Default directory for workflow logs. Defaults to $PMRootDir/WorkflowLogs.
$PMWorkflowLogCount: Number of workflow logs the PowerCenter Server archives for the workflow. Defaults to 0.
$PMLookupFileDir: Default directory for lookup files. Defaults to $PMRootDir/LkpFiles.

(#8) Server Variables

[Screenshot: server variables configured in the Workflow Manager]

(#8) Server Variables


System Variables:
$$$SessStartTime, SESSSTARTTIME, SYSDATE, WORKFLOWSTARTTIME

User Defined Variables:


$DBConnection, $InputFile, $OutputFile, $PMFailureEmailUser, $PMSessionLogCount, $PMSessionLogDir, $PMSessionLogFile, $PMSuccessEmailUser, $PMWorkflowLogCount, $PMWorkflowLogDir, $Source, $Target


Tip #9: Reduce Failures


Error handling strategies:
Capture errors into tables or flat files before they reach the target
Use for error reports, corrections, or warnings to sources

Built-in error handling functionality:
7.x provides built-in error handling
Captures function errors

Filter early! Downstream rejection of bad data can degrade performance

Tip #10: Use Performance Counters


All transformations have counters maintained by the server; use them to identify bottlenecks early
Enable counters for a session via the Collect Performance Data option
The server creates a session_name.perf file for counter statistics
The default location is the session log directory
Collecting performance data has some impact on session performance, but not as much as tracing

The important counters:
Reads and writes to disk (Aggregators, Ranks, Sorters, Joiners)
Rows read from cache for Lookups
Efficiency counters

(#10) Enabling Performance Counters


Performance counters provide a variety of statistics for each transformation in a mapping
Counters are enabled in the session properties

Session Wizard: Collect Performance Data

(#10) Viewing Performance Counters


Right-click the Session Task while it is running, select Properties, then the Performance tab

Questions and Answers

