
Top 10 Workflow Tips and Log Reading
New York User Group Meeting

Chris Main, Northeast Consultant, April 20th, 2005

Agenda
Log Reading and Performance Tuning

#1 Interpret the Session Log
#2 Tweak Session Properties
#3 Use Partitioning
#4 Consolidate Multiple Lookups
#5 Use Server Grids

Agenda

#6 Use Expression Variables
#7 Use Changed-Data Capture
#8 Server Variables
#9 Reduce Failures
#10 Use Performance Counters

Tip #1: Interpret the Session Log


Session logs hold all session information
Determine bottlenecks, errors, load statistics

(#1) Session Logs: Initialization


MASTER> CMN_1688 Allocated [12000000] bytes from process memory for [DTM Buffer Pool].
MASTER> PETL_24000 Parallel Pipeline Engine initializing.
MASTER> PETL_24001 Parallel Pipeline Engine running.
MASTER> PETL_24003 Initializing session run.
MAPPING> TM_6014 Initializing session [S_M_CONNECTION_EXTRACT] at [Thu Feb 05 16:35:41 2004]
MAPPING> TM_6101 Mapping name: M_CONNECTION_EXTRACT Version 1.0.0
MAPPING> CMN_1569 Server Mode: [ASCII]
MAPPING> CMN_1570 Server Codepage: [MS Windows Latin 1 (ANSI), superset of Latin1]
MAPPING> TM_6151 Session Sort Order: [Binary]
MAPPING> TM_6156 Using LOW precision decimal arithmetic
MAPPING> TM_6180 Deadlock retry logic will not be implemented.
MAPPING> TE_7022 TShmWriter: Initialized
MAPPING> DBG_21321 Loaded external module library [C:\Program Files\Informatica\Informatica PowerCenterRT 6.2 - Server\ExtProc\crc32.dll]
MAPPING> TM_6007 DTM initialized successfully for session [S_M_CONNECTION_EXTRACT]

Note the allocated memory, the mapping name, and the loaded libraries.

(#1) Session Logs: Lookups


Check the query
Check the start time
Check the number of rows
Compare the amount of data with the memory allotted in the session
TRANSF_1_1_1_1> DBG_21097 Default sql to create lookup cache: SELECT CLASS_UID,CLASS_ID FROM IMW_CLASS ORDER BY CLASS_ID,CLASS_UID

TRANSF_1_1_1_1> DBG_21079 Creating Lookup Cache : (Thu Feb 05 16:35:41 2004)

TRANSF_1_1_1_1> DBG_21297 Lookup cache row count : 704

TRANSF_1_1_1_1> DBG_21294 Lookup cache creation completed : (Thu Feb 05 16:35:41 2004)

(#1) Session Logs: Statistics


[READER_1_1_1] Total Run Time = [1685.518988] secs, Total Idle Time = [1009.423504] secs, Busy Percentage = [40.112006].
[TRANSF_1_1_1_1] Total Run Time = [1524.413078] secs, Total Idle Time = [1380.883920] secs, Busy Percentage = [9.415372].
[TRANSF_1_1_1_2] Total Run Time = [1705.480528] secs, Total Idle Time = [48.280356] secs, Busy Percentage = [97.169105].
[WRITER_1_1_1] Total Run Time = [1660.035222] secs, Total Idle Time = [493.218257] secs, Busy Percentage = [70.288687].

Busy percentages can identify potential bottlenecks
Adjust your mappings to level out the bottlenecks
Here the second transformation thread is the bottleneck; consider partitioning
You don't need 100% utilization, but the percentages should be high and close together
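Statistics like these lend themselves to scripting. As a rough illustration (a hypothetical helper, not part of PowerCenter), a few lines of Python can pull the busy percentage per thread out of a log and flag the likely bottleneck; the sample lines mirror the log format shown above:

```python
import re

# Match "[THREAD_NAME] ... Busy Percentage = [NN.NN]" style statistics lines.
STAT = re.compile(r"\[(\w+)\][^\n]*Busy Percentage = \[([\d.]+)\]")

def busy_percentages(log_text):
    """Return {thread_name: busy_pct} for each thread-statistics line found."""
    return {m.group(1): float(m.group(2)) for m in STAT.finditer(log_text)}

sample = """\
[READER_1_1_1] Total Run Time = [1685.518988] secs, Total Idle Time = [1009.423504] secs, Busy Percentage = [40.112006].
[TRANSF_1_1_1_1] Total Run Time = [1524.413078] secs, Total Idle Time = [1380.883920] secs, Busy Percentage = [9.415372].
[TRANSF_1_1_1_2] Total Run Time = [1705.480528] secs, Total Idle Time = [48.280356] secs, Busy Percentage = [97.169105].
[WRITER_1_1_1] Total Run Time = [1660.035222] secs, Total Idle Time = [493.218257] secs, Busy Percentage = [70.288687].
"""

stats = busy_percentages(sample)
bottleneck = max(stats, key=stats.get)  # the busiest thread is the likely bottleneck
print(bottleneck, stats[bottleneck])    # TRANSF_1_1_1_2 97.169105
```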

(#1) Session Logs: Errors


Find rejected records in the log files
Eliminate errors, especially recurring ones
Use error-handling strategies

Keep your sessions running error free. Anticipate errors and write mappings that handle them appropriately. Each error that occurs slows the session down.

Tip #2: Tweak Session Properties


Experiment!
Use the following tips as guidelines (what to look for in Properties)
Discover the settings optimal for your particular system
Any settings change must consider all other settings

(#2) Server Architecture: Memory


DTM Buffer Pool Size sets the amount of memory for reader and writer buffers
Determines the total number of blocks available
Optimal value is about 25MB
A block size of 64K means 25M/64K = 390 blocks

Buffer Block Size sets the size of blocks in the pipeline
Optimum size depends on the row size being processed
64KB holds 64 rows of 1KB; 128KB holds 128 rows of 1KB
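The slide's block arithmetic can be sketched directly; the helper names below are illustrative, not PowerCenter settings:

```python
def buffer_blocks(dtm_pool_bytes, block_bytes):
    """Number of buffer blocks the DTM pool yields (the slide's 25M/64K math)."""
    return dtm_pool_bytes // block_bytes

def rows_per_block(block_bytes, row_bytes):
    """How many rows fit in one buffer block."""
    return block_bytes // row_bytes

print(buffer_blocks(25_000_000, 64_000))  # 390 blocks, as on the slide
print(rows_per_block(64 * 1024, 1024))    # 64: a 64KB block holds 64 rows of 1KB
```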

(#2) Server Architecture: DTM Parameters


Session task parameters control the processing pipeline and are found on the Properties and Config Object tabs


(#2) Tweaking Commit Intervals


Commit Intervals
Default is 10,000 target rows
Can switch between target commit and source commit
A commit can slow down the load process
Large commit intervals may fill database transaction logs
Small commit intervals may be slower
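The trade-off is easy to see with simple arithmetic; the fixed per-commit cost below is an assumed number, purely for illustration of why tiny intervals multiply overhead:

```python
def commit_overhead(total_rows, commit_interval, commit_cost_ms=50.0):
    """Estimate commit count and total commit time for a load.

    commit_cost_ms is an assumed fixed cost per commit, for illustration only.
    """
    commits = -(-total_rows // commit_interval)  # ceiling division
    return commits, commits * commit_cost_ms

# 1,000,000 rows at the default 10,000-row interval vs a tiny 100-row interval
print(commit_overhead(1_000_000, 10_000))  # (100, 5000.0)
print(commit_overhead(1_000_000, 100))     # (10000, 500000.0)
```

A large interval issues few commits but keeps more uncommitted rows in the database's transaction log, which is the other side of the trade-off above.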

(#2) Tweaking Cache Sizes


Cache Sizes
Default is 1MB index and 1MB data cache
Calculate how much cache you will need
Too little = paging
Too much = wasted resources, wasted time
Set in either the mapping or the session (session overrides mapping)
The PowerCenter Server uses cache for Lookup, Aggregator, Rank, Joiner, and Sorter transformations
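As a back-of-the-envelope check (the overhead factor and row widths here are assumptions, not PowerCenter's exact sizing formula), you can estimate whether the default 1MB caches will hold a lookup:

```python
def lookup_cache_estimate(row_count, key_bytes, data_bytes, overhead=1.25):
    """Rough cache sizing: index cache holds keys, data cache holds the rest.

    The 25% overhead factor is an assumption for illustration; consult the
    sizing formulas in the PowerCenter documentation for real sessions.
    """
    index = int(row_count * key_bytes * overhead)
    data = int(row_count * data_bytes * overhead)
    return index, data

# 704 lookup rows (as in the session log above), assuming 8-byte keys, 50-byte rows
index, data = lookup_cache_estimate(704, 8, 50)
print(index, data)                                # 7040 44000
print(index <= 1_048_576 and data <= 1_048_576)   # True: the 1MB defaults suffice
```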

Tip #3: Use Partitioning


When reads or writes are at 100% busy
When databases are partitioned (use database partitioning)
When transformations are at 100% busy
When source SQL is too complex
When an additional thread will increase performance

(#3) Threads, Partition Points, and Stages

[Diagram: a pipeline from Shortcut_To_ORDER_DATA (flat file) through SQ_Shortcut_To_ORDER_DATA, FILTRANS, EXP_RPT, and AGG_RPT to Shortcut_To_AGG_Customer_SALES (Oracle), with a reader thread (first stage), two transformation threads (second and third stages), and a writer thread (fourth stage)]

Threads are created to move data down the pipeline
The data is moved in pipeline stages defined by partition points
By default PowerCenter assigns a partition point at the Source Qualifier, Target, Aggregator, and Rank transformations

Terminology in action

[Diagram: a pipeline with three threads (Thread 1, Thread 2, Thread 3)]

No Tuning Session Thread Stats


MASTER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition point [SQ_ORPDTL] has completed: Total Run Time = [256.619917] secs, Total Idle Time = [189.648261] secs, Busy Percentage = [26.097606].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of partition point [SQ_ORPDTL] has completed: Total Run Time = [255.374506] secs, Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000].
MASTER> PETL_24022 Thread [WRITER_1_1_1] created for the write stage of partition point(s) [DEMAND2_ALL_INSERTS, DWLDEMDISD] has completed: Total Run Time = [198.297469] secs, Total Idle Time = [173.083904] secs, Busy Percentage = [12.715021].
MASTER> PETL_24021 ***** END RUN INFO *****

Re/Partition Points

[Diagram sequence: creating a partition point (PP) on the pipeline adds a pipeline stage, splitting one transformation thread into two, so three threads become four (Thread 1 through Thread 4)]

Adding Partition Points

[Screenshots: adding partition points in the session properties]

Thread Stats after adding Partition Points


MASTER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition point [SQ_ORPDTL] has completed: Total Run Time = [160.635861] secs, Total Idle Time = [92.679054] secs, Busy Percentage = [42.304879].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of partition point [SQ_ORPDTL] has completed: Total Run Time = [158.843670] secs, Total Idle Time = [146.467909] secs, Busy Percentage = [7.791158].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_2] created for the transformation stage of partition point [exp_Evaluate_Conditions] has completed: Total Run Time = [159.006092] secs, Total Idle Time = [15.034492] secs, Busy Percentage = [90.544707].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_3] created for the transformation stage of partition point [exp_Target_Placeholder] has completed: Total Run Time = [126.347018] secs, Total Idle Time = [72.726160] secs, Busy Percentage = [42.439354].
MASTER> PETL_24022 Thread [WRITER_1_1_1] created for the write stage of partition point(s) [DEMAND2_ALL_INSERTS, DWLDEMDISD] has completed: Total Run Time = [112.095416] secs, Total Idle Time = [86.447935] secs, Busy Percentage = [22.880044].

Partitioning (Data Partition)

[Diagram sequence: creating a second data partition duplicates the pipeline, so each partition runs its own set of five threads across the same partition points, ten threads in total]

Add Data Partition

[Screenshot: adding a partition in the session properties]

Thread Stats after adding Data Partition


***** RUN INFO FOR TGT LOAD ORDER GROUP [1], SRC PIPELINE [1] *****
MASTER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition point [SQ_ORPDTL] has completed: Total Run Time = [147.332109] secs, Total Idle Time = [121.743271] secs, Busy Percentage = [17.368134].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of partition point [SQ_ORPDTL] has completed: Total Run Time = [145.683316] secs, Total Idle Time = [140.503645] secs, Busy Percentage = [3.555432].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_2] created for the transformation stage of partition point [exp_Evaluate_Conditions] has completed: Total Run Time = [146.017557] secs, Total Idle Time = [25.919622] secs, Busy Percentage = [82.248969].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_3] created for the transformation stage of partition point [exp_Target_Placeholder] has completed: Total Run Time = [83.096817] secs, Total Idle Time = [36.874846] secs, Busy Percentage = [55.624238].
MASTER> PETL_24022 Thread [WRITER_1_1_1] created for the write stage of partition point(s) [DEMAND2_ALL_INSERTS, DWLDEMDISD] has completed: Total Run Time = [88.927355] secs, Total Idle Time = [74.749625] secs, Busy Percentage = [15.943047].
MASTER> PETL_24018 Thread [READER_1_1_2] created for the read stage of partition point [SQ_ORPDTL] has completed: Total Run Time = [106.301414] secs, Total Idle Time = [97.831194] secs, Busy Percentage = [7.968116].
MASTER> PETL_24019 Thread [TRANSF_1_1_2_1] created for the transformation stage of partition point [SQ_ORPDTL] has completed: Total Run Time = [103.615048] secs, Total Idle Time = [102.736694] secs, Busy Percentage = [0.847709].
MASTER> PETL_24019 Thread [TRANSF_1_1_2_2] created for the transformation stage of partition point [exp_Evaluate_Conditions] has completed: Total Run Time = [145.728914] secs, Total Idle Time = [27.767894] secs, Busy Percentage = [80.945515].
MASTER> PETL_24019 Thread [TRANSF_1_1_2_3] created for the transformation stage of partition point [exp_Target_Placeholder] has completed: Total Run Time = [84.941995] secs, Total Idle Time = [39.152031] secs, Busy Percentage = [53.907333].
MASTER> PETL_24022 Thread [WRITER_1_1_2] created for the write stage of partition point(s) [DEMAND2_ALL_INSERTS, DWLDEMDISD] has completed: Total Run Time = [87.924204] secs, Total Idle Time = [74.557008] secs, Busy Percentage = [15.203091].

(#3) Adding Partitions and Partition Points

[Diagram: the same pipeline with three partitions, showing 3 reader threads (first stage), 6 transformation threads (second and third stages), and 3 writer threads (fourth stage)]

Adding partitions increases the number of threads
Adding partition points increases the number of pipeline stages

(#3) Partition Types: Pass Through


Define your own partitions
In the Source Qualifier, define your own SQL statements
Partitions are based on incoming data values
For changeable source queries
For evenly distributed extract data
When partitions help avoid contention/deadlocks

(#3) Partition Types: Pass Through


Sample Session with Pass Through Partitioning

New Partition Point at the EXP_RPT Transformation


(#3) Partition Types: Round Robin


Records alternate between partitions
Evenly distributed between the partitions
Use when source records are not evenly distributed or are unpredictable
Grouping of data among the partitions is not necessary

(#3) Example: Round Robin Partitioning


Source partitioning: Pass Through
Target partitioning: Round Robin

Partitioned flat file: Order_Data_1.dat, Order_Data_2.dat
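The distribution behavior can be sketched in a few lines; this illustrates the round-robin idea, not the server's implementation:

```python
from itertools import cycle

def round_robin(rows, n_partitions):
    """Distribute rows across partitions one at a time, in turn.

    Mimics round-robin partitioning: even row counts, no grouping guarantee.
    """
    partitions = [[] for _ in range(n_partitions)]
    for target, row in zip(cycle(partitions), rows):
        target.append(row)
    return partitions

parts = round_robin(range(10), 3)
print([len(p) for p in parts])  # [4, 3, 3]: counts stay as even as possible
```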

(#3) Partition Types: Key Range


Specify keys and ranges for key values
User-defined distribution
Use when keys are easily defined (numbers)
Compound keys accepted

(#3) Example: Source and Target Partitioning by Key Range

[Screenshots: key range settings for source and target partitioning]

Leave the start range or the end range blank to force a < or >.
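A sketch of the assignment rule (illustrative only; in PowerCenter you configure the ranges in the session, and ORDER_ID here is a hypothetical key):

```python
def key_range_partition(row_key, ranges):
    """Assign a row to the first partition whose [start, end) range holds its key.

    A None start or end acts as an open bound, matching the "leave the range
    blank to force a < or >" note above.
    """
    for i, (start, end) in enumerate(ranges):
        if (start is None or row_key >= start) and (end is None or row_key < end):
            return i
    raise ValueError(f"key {row_key!r} falls outside every range")

# Three partitions on ORDER_ID: < 1000, 1000-4999, >= 5000
ranges = [(None, 1000), (1000, 5000), (5000, None)]
print([key_range_partition(k, ranges) for k in (42, 1000, 9999)])  # [0, 1, 2]
```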

(#3) Partition Types: Hash


Specify keys on which to partition
User-defined keys create hash partitions
Use to distribute the data evenly with ease
Data is grouped based on keys
With hash auto-keys, the PowerCenter Server uses all grouped or sorted ports as a compound key

(#3) Example: Hash Partition


Hash partitioning used with an Aggregator transformation

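The grouping guarantee behind hash partitioning (rows with the same compound key always land in the same partition, which is what an Aggregator needs) can be illustrated with a stable hash; CRC32 stands in for whatever hash the server actually uses, and the column names are hypothetical:

```python
import zlib

def hash_partition(row, key_columns, n_partitions):
    """Route a row by hashing its compound key, as hash partitioning does.

    With hash auto-keys, PowerCenter builds the compound key from the grouped
    or sorted ports; here the key columns are passed in explicitly.
    """
    compound_key = "|".join(str(row[c]) for c in key_columns)
    # zlib.crc32 gives a stable hash across runs (unlike Python's builtin hash)
    return zlib.crc32(compound_key.encode()) % n_partitions

row_a = {"CUST_ID": 7, "REGION": "NE", "AMOUNT": 12.5}
row_b = {"CUST_ID": 7, "REGION": "NE", "AMOUNT": 99.0}
# Rows sharing a group key always land in the same partition
print(hash_partition(row_a, ["CUST_ID", "REGION"], 4) ==
      hash_partition(row_b, ["CUST_ID", "REGION"], 4))  # True
```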

(#3) Partition Types: Database


Based on a target IBM DB2 database
Multi-node loading
The target table must contain a partition key
Non-DB2 targets default to Pass Through

(#3) Partition Points Summary


A session pipeline has reader, writer, and one or more transformation threads
Those are the default partition points
One type of partitioning can be applied at each of these points
The session's Mappings-Partitions tab displays the partition type allowed at each point

(#3) Partition Points Summary

Default partition points:
Reader -> Transformation -> Transformation -> Writer

After adding a partition point:
Reader -> Transformation -> Transformation -> Transformation -> Writer

Adding an additional partition point may increase session performance by adding additional threads

Tip #4: Consolidate Multiple Lookups


Lookups to lookups: reduce and reuse
Consolidate lookups by using lookup overrides
Use reusable lookups when multiple calls are needed
Use unconnected lookups in conditional functions

(#4) Consolidating Lookups

[Screenshots: a mapping with several lookups consolidated into one lookup via a SQL override]

Tip #5: Use Server Grids


A grid of servers on which sessions can run
Assign the workflow to a master server, which runs the sessions on the worker servers
Currently round robin
The master server can detect which server to run on

(#5) Server Grid Tips


Not a complete failover system: the master server must always be up
Configure all servers the same: parameter files, connections, scripts, email, etc.
Can include different server platforms
Use a specific server if special properties are necessary (connections, proximity to sources/targets, hardware, etc.)

Server Grid Option

Load balancing and failover for heterogeneous systems

[Diagram: PowerCenter Servers on the server grid and an off-grid PowerCenter Server share a repository; Session1 through Session4 and Session6 are distributed across the grid while Session5 runs off-grid]

Tip #6: Use Expression Variables


Utilize variables in expressions to capture the previous record's data
Can be used with filters to perform aggregate functions
Can be used to compare two records (make sure the source data is sorted)
ORDER MATTERS!

(#6) Variables Example

[Screenshot: an Expression transformation with a variable port computing var_RUNNING_TOTAL + SALES_REVENUE]
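The row-by-row evaluation can be mimicked outside PowerCenter to reason about ordering; here var_RUNNING_TOTAL is carried across rows just as a variable port carries its previous-row value (an illustration, not the expression engine itself):

```python
def running_totals(rows):
    """Emulate var_RUNNING_TOTAL + SALES_REVENUE evaluated row by row.

    Like a PowerCenter variable port, the variable keeps its value from the
    previous row, so input order matters; sort the source first.
    """
    var_running_total = 0.0
    out = []
    for sales_revenue in rows:
        var_running_total = var_running_total + sales_revenue
        out.append(var_running_total)
    return out

print(running_totals([100.0, 250.0, 50.0]))  # [100.0, 350.0, 400.0]
```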

Tip #7: Use Changed-Data Capture


Reduce data volumes to changes only
Reduce load volume, increase performance
May require special logic
Some solutions may burden the source system

(#7) Using Mapping Variables for CDC

Use mapping variables to store the last change date or key range
The max date/key is stored in the repository
On successive runs, use the mapping variable in the Source Qualifier

Pros:
Only brings in affected records
Minimal impact to mappings
Restarting or resetting dates is possible

Cons:
Requires the source to carry effective dates
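The successive-run query can be sketched as string construction; the table and column names and the date format below are hypothetical stand-ins, with the last_change argument playing the role of the mapping variable:

```python
def incremental_source_query(table, last_change):
    """Build a Source Qualifier filter for a mapping-variable CDC run.

    Table, column, and format are illustrative; in PowerCenter the mapping
    variable would appear directly in the SQ's SQL override.
    """
    return (f"SELECT * FROM {table} "
            f"WHERE LAST_UPDATE_DATE > TO_DATE('{last_change}', 'YYYY-MM-DD')")

# The first run uses the variable's initial value; the server then stores the
# max date back in the repository for the next run.
print(incremental_source_query("ORDERS", "2005-04-19"))
```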

(#7) Using CRC Checks for CDC

Calculate a CRC value for each record
Store the CRC values in the target tables
Join source and target tables where the keys match but the CRC values do not

Pros:
Only changed or new records are processed
After a failure you can simply rerun

Cons:
Overhead on the source system
The CRC function will slow mappings slightly
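The CRC comparison can be illustrated end to end; zlib.crc32 stands in for the crc32.dll function seen in the session log earlier, and the table shapes and column names are hypothetical:

```python
import zlib

def crc_of(record, columns):
    """CRC32 over the concatenated column values: the per-record checksum
    that would be stored alongside each row in the target."""
    return zlib.crc32("|".join(str(record[c]) for c in columns).encode())

def changed_or_new(source_rows, target_crcs, key, columns):
    """Return source rows that are new, or whose CRC differs from the target's."""
    return [r for r in source_rows
            if target_crcs.get(r[key]) != crc_of(r, columns)]

src = [{"ID": 1, "NAME": "Acme"}, {"ID": 2, "NAME": "Bolt"}, {"ID": 3, "NAME": "Cog"}]
# The target already holds IDs 1 and 2, but row 2's NAME has since changed
tgt = {1: crc_of({"ID": 1, "NAME": "Acme"}, ["NAME"]),
       2: crc_of({"ID": 2, "NAME": "Bolt (old)"}, ["NAME"])}
print([r["ID"] for r in changed_or_new(src, tgt, "ID", ["NAME"])])  # [2, 3]
```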

(#7) Using EAI Solution for CDC


EAI / message bus
Use a message bus to send only changed or new records to PowerCenter
Real-time capabilities are possible
Minimal impact on the source system

(#7) Using PowerExchange CDC


PowerExchange: pre-built CDC solutions for various databases
Only processes new or changed records
Low-level logic provides optimal performance
Minimal impact on source systems
Supported by Informatica (not you)

Tip #8: Server Variables


Use server variables to ensure smooth migrations between environments
Use variables in session and workflow objects
Update using pmrep or the Workflow Manager

(#8) Server Variables


Server Variable: Description

$PMRootDir: A root directory for use by server variables (usually the PowerCenter Server installation directory)
$PMSessionLogDir: Default directory for session logs. Defaults to $PMRootDir/SessLogs.
$PMBadFileDir: Default directory for reject files. Defaults to $PMRootDir/BadFiles.
$PMCacheDir: Default directory for the index and data cache files. Defaults to $PMRootDir/Cache. Use a drive local to the PowerCenter Server.
$PMTargetFileDir: Default directory for target files. Defaults to $PMRootDir/TgtFiles.
$PMSourceFileDir: Default directory for source files. Defaults to $PMRootDir/SrcFiles.
$PMExtProcDir: Default directory for external procedures. Defaults to $PMRootDir/ExtProc.
$PMTempDir: Default directory for temporary files. Defaults to $PMRootDir/Temp.
$PMSuccessEmailUser: Email address to receive post-session email when the session completes successfully.
$PMFailureEmailUser: Email address to receive post-session email when the session fails.

(#8) Server Variables


$PMSessionLogCount: Number of session logs the PowerCenter Server archives for the session. Defaults to 0.
$PMSessionErrorThreshold: Number of non-fatal errors the PowerCenter Server allows before failing the session. Non-fatal errors include reader, writer, and DTM errors. If you want to stop the session on errors, enter the number of non-fatal errors to allow before stopping the session. The PowerCenter Server maintains an independent error count for each source, target, and transformation. Used to configure the Stop On option in the session properties.
$PMWorkflowLogDir: Default directory for workflow logs. Defaults to $PMRootDir/WorkflowLogs.
$PMWorkflowLogCount: Number of workflow logs the PowerCenter Server archives for the workflow. Defaults to 0.
$PMLookupFileDir: Default directory for lookup files. Defaults to $PMRootDir/LkpFiles.

(#8) Server Variables

[Screenshot: server variables configured in the Workflow Manager]

(#8) Server Variables


System Variables:
$$$SessStartTime, SESSSTARTTIME, SYSDATE, WORKFLOWSTARTTIME

User Defined Variables:


$DBConnection, $InputFile, $OutputFile, $PMFailureEmailUser, $PMSessionLogCount, $PMSessionLogDir, $PMSessionLogFile, $PMSuccessEmailUser, $PMWorkflowLogCount, $PMWorkflowLogDir, $Source, $Target


Tip #9: Reduce Failures


Error handling strategies:
Capture errors into tables or flat files before they reach the target
Use for error reports, corrections, or warnings to sources

Built-in error handling functionality:
7.x provides built-in error handling
Captures function errors

Filter early! Downstream rejection of bad data can degrade performance

Tip #10: Use Performance Counters


All transformations have counters maintained by the server; use them to identify bottlenecks early
Enable counters for a session via the Collect Performance Data option
The server creates a session_name.perf file for counter statistics
The default location is the session log directory
Collecting performance data has some impact on session performance, but not as much as tracing

The important counters:
Reads and writes to disk (Aggregators, Ranks, Sorters, Joiners)
Rows read from cache for Lookups
Efficiency counters

(#10) Enabling Performance Counters


Performance counters provide a variety of statistics for each transformation in a mapping
Counters are enabled in the session properties

Session Wizard: Collect Performance Data

(#10) Viewing Performance Counters


Right-click the Session Task while it is running, select Properties, then the Performance tab

Questions and Answers

