
Jyotheswar Kuricheti

Agenda:
1. Performance Tuning Overview

2. Identify Bottlenecks
3. Optimizing at different levels:
Target
Source
Mapping
Session
System

Performance Tuning Overview:

What is Performance Tuning?
Goal of Performance Tuning
How do you measure performance? Throughput.

Why is it Critical?
Load time is critical to meeting the SLA for data availability in reports.
How do you improve performance?

Identify Bottlenecks
Eliminate Bottlenecks
Use the Test Load option to check whether performance improves
Add partitions
Change one variable at a time

Reasons for Session Performance Issues:


CPU: CPU-intensive operations, such as string manipulation inside an
Expression transformation
Memory/Disk access:
File system read/write issues
Paging (lookup cache, etc.) due to unavailability of RAM
Unavailability of buffer blocks
Network: Database and PowerCenter servers connected over a WAN
Input/output operations
Poor/complex design
Incorrect load strategies

Optimization can be done at different levels of Informatica:
1) Target level
2) Source level
3) Mapping level
4) Transformation level
5) Session level
6) Grid Level
7) Component level
8) System level

Identify Bottlenecks:

WRT_8165 : TIMEOUT BASED COMMIT POINT

This message commonly appears in the session log when there are
session performance issues.
It signifies that there aren't enough rows available in memory to insert
and issue a commit.
It means there is a bottleneck in the source, the target, or one of the
transformations, and the bottleneck must be identified and removed to
improve session performance.

Methods to identify performance bottlenecks:


1. Run test sessions
2. Analyze thread statistics
3. Analyze performance details
4. Monitor system performance

1. Run test sessions: Run a test load of a few records, reading from a
flat file or writing to a flat file target, to isolate source and target
bottlenecks. Performance is measured by comparing the throughput of
each test run.
2. Analyze thread statistics: Analyze thread statistics to determine the
optimal number of partition points.


3. Analyze performance details: Use performance details, such as
performance counters, to determine where session performance
decreases. Enable Collect Performance Data in the session properties.
Performance details are also an area to check when repository
performance is a concern and when a user wants to see statistics from
the monitor.

4. Monitor system performance: Use system monitoring tools to view
the percentage of CPU use, I/O waits, and paging to identify system
bottlenecks. Use the Workflow Monitor to view system resource usage.


2. Using thread statistics: Thread statistics come from the session log
file. Before using them, a few points about threads:
The DTM (Data Transformation Manager) creates a master thread to
run the session. For each target load order group in a mapping, the
master thread can create several threads. The types of threads depend
on the session properties and the transformations in the mapping. The
number of threads depends on the partitioning information for each
target load order group in the mapping.
1. Mapping Threads
2. Pre- and Post-Session Threads
3. Reader Threads
4. Transformation Threads
5. Writer Threads
Thread analysis evaluates mapping performance based on thread
statistics; use these statistics to identify source, target, or
transformation bottlenecks.

The session log contains four entries that give performance details for
each thread:


1. Run time: total time taken by a thread
2. Idle time: time during which the thread is idle
3. Busy percentage: (run time - idle time) / run time x 100
4. Thread work time: time taken by each transformation in a thread
Example:
MANAGER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition
point [SQ_XXXX] has completed: Total Run Time = [576.620988] secs, Total Idle Time =
[535.601729] secs, Busy Percentage = [7.113730].
MANAGER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of
partition point [SQ_XXXX] has completed: Total Run Time = [577.301318] secs, Total Idle Time
= [0.000000] secs, Busy Percentage = [99.000000].
LKP_ADDRESS: 20.000000 percent
AGG_ADDRESS: 79.000000 percent
MANAGER> PETL_24022 Thread [WRITER_1_1_1] created for the write stage of partition
point(s) [TGT_XXXX] has completed: Total Run Time = [577.363934] secs, Total Idle Time =
[492.602374] secs, Busy Percentage = [14.680785].
The thread with the highest busy percentage identifies the bottleneck in the session; here the
transformation thread is 99% busy, and AGG_ADDRESS accounts for most of its work time.
Do not blindly add partition points to spawn more threads: if the CPU is already busy with
other tasks, extra partitions only put more pressure on it. A high busy percentage can also be
ignored when the total run time is under 60 seconds.
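As a quick check, the busy formula applied to the reader thread above reproduces the
reported value (numbers taken from the sample log):

    (576.620988 - 535.601729) / 576.620988 x 100 = 7.11%

so the reader spends almost all of its time idle, waiting on the transformation stage.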

Optimizing at different levels:


(1).Target Level optimization:


There are two target types:
1. Flat file
2. Database
If a flat file target is slow, the problem usually lies not with the flat file
itself but with the storage space or the storage drive.
Database: When loading into a database, consider the following
points.
1. Drop indexes and key constraints
2. Increase checkpoint intervals
3. Use bulk loading
4. Use external loading
5. Minimize deadlocks
6. Increase database network packet size
7. Optimize Oracle target databases

Drop indexes and key constraints: Loading is slow on tables with
indexes or key constraints defined. Use pre-session commands to drop
indexes before the session loads, then rebuild the constraints or
indexes with post-session commands, as in the sketch below.
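A minimal sketch of the pre- and post-session SQL (the table SALES_FACT, index
IDX_SALES_DT, and constraint FK_SALES_CUST are illustrative names; adapt to your schema):

    -- Pre-session SQL: remove the index and disable the constraint before the load
    DROP INDEX idx_sales_dt;
    ALTER TABLE sales_fact DISABLE CONSTRAINT fk_sales_cust;

    -- Post-session SQL: rebuild the index and re-enable the constraint after the load
    CREATE INDEX idx_sales_dt ON sales_fact (sale_date);
    ALTER TABLE sales_fact ENABLE CONSTRAINT fk_sales_cust;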
Increase checkpoint intervals: Load performance improves when the
database checkpoints less often, so increase the checkpoint interval in
the database.
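On Oracle, for example, a DBA might adjust the checkpoint parameters roughly as follows
(a sketch only; the values are illustrative and need DBA review):

    -- Checkpoint on redo volume less often (0 disables interval-based checkpoints)
    ALTER SYSTEM SET log_checkpoint_interval = 0 SCOPE = SPFILE;
    -- Allow up to 30 minutes between time-based checkpoints
    ALTER SYSTEM SET log_checkpoint_timeout = 1800 SCOPE = SPFILE;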

Use bulk loading: The Integration Service bypasses the database log,
which speeds performance, but recovery is not possible.
Use external loaders: Most databases ship with their own bulk-load
utility: SQL*Loader for Oracle, the Teradata external loaders for
Teradata. To increase performance further, use a separate loader for
each partition's pipeline.
Minimize deadlocks: Avoid loading the same target table from multiple
sessions or source systems at the same time. Use different target
connection groups.

Increase database network packet size: If the database network is
the bottleneck, ask your DBA to increase the network packet size in
listener.ora and tnsnames.ora (for Oracle).
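A hedged sketch of what the Oracle-side change can look like: the SDU (session data unit)
controls the Oracle Net packet size, and the alias, host, and size below are illustrative:

    # tnsnames.ora on the PowerCenter server
    ORCL =
      (DESCRIPTION =
        (SDU = 32767)
        (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost)(PORT = 1521))
        (CONNECT_DATA = (SERVICE_NAME = orcl))
      )

A matching SDU must also be configured in listener.ora on the database server, since the
two sides negotiate down to the smaller value.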
Optimize Oracle target databases: With your DBA's help, increase
storage segment sizes or make other database-level changes, and tune
the Oracle redo log settings in the init.ora file.


(2).Source Level Optimization:


1. Optimize the query.
2. Use conditional filters
3. Increase database network packet size
Optimize the query: Join multiple relational sources in one Source
Qualifier, use optimizer hints, and index the columns used in GROUP BY
and ORDER BY clauses. Configure the source database to run parallel
queries. A sketch follows below.
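A minimal sketch of an overridden Source Qualifier query combining these ideas (the
tables, columns, and parallel degree are illustrative, not from the original deck):

    SELECT /*+ PARALLEL(s, 4) */          -- hint: parallel query
           s.order_id, s.cust_id, s.amount
    FROM   sales s
    WHERE  s.sale_date >= TO_DATE('2014-01-01', 'YYYY-MM-DD')  -- source-side filter
    ORDER  BY s.cust_id                    -- cust_id is indexed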

Use conditional filters: Filter the data at the source (as in the WHERE
clause above), but analyze the data thoroughly before applying source
filters. Connect ports from the Source Qualifier only if they are needed
downstream.
Increase database network packet size: As on the target side, ask
your DBA to increase the network packet size in listener.ora and
tnsnames.ora.


(3). Mapping Level Optimization:


Mapping-level optimization is time-consuming. Eliminate unneeded
transformations, fields, and links. Optimize the mapping only after
source- and target-level optimization.
1. Optimize the flat file sources
2. Configure single-pass reading
3. Optimize Simple Pass Through mappings
4. Optimize filters
5. Optimize data type conversions
6. Optimize expressions
Optimize the flat file sources: For delimited files, avoid double or
single quotes and escape characters where possible; for fixed-width
files, tune the line sequential buffer length.
Configure single-pass reading: Consider single-pass reading if you
have multiple sessions that use the same sources. It lets you populate
multiple targets from one Source Qualifier and avoids the use of a
Joiner for relational source tables.



Optimize Simple Pass Through mappings: If you are only passing
data from source to target, connect the Source Qualifier directly to the
target. The mapping wizard adds an Expression transformation between
the Source Qualifier and the target; remove it.
Optimize filters: If the source is a relational table, filter in the Source
Qualifier so that rows not needed by the mapping are never read. For
flat file sources, place a Filter transformation right after the Source
Qualifier. Avoid complex filter conditions; prefer integer or true/false
conditions.
Optimize datatype conversions: Eliminate unnecessary datatype
conversions. Use integer values in the conditions of Lookup and Filter
transformations. Know the source and target datatypes before
converting.
Optimize expressions: Factor out common logic. Minimize aggregate
function calls, e.g. use SUM(COLUMN_A + COLUMN_B) instead of
SUM(COLUMN_A) + SUM(COLUMN_B). Call lookups conditionally (see
the sketch below). Use local variables in Expression transformations.
Use operators instead of functions.
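A small sketch of a conditional lookup call in the PowerCenter expression language (the
unconnected lookup LKP_CUSTOMER and the port CUST_ID are illustrative names):

    -- Call the lookup only when the key is present, instead of once per row
    IIF( ISNULL(CUST_ID), NULL, :LKP.LKP_CUSTOMER(CUST_ID) )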

(4).Transformation level Optimization:


Transformation-level optimization can be considered part of mapping
optimization. This section gives more detail on handling individual
transformations effectively.

Optimizing Aggregator transformations: Aggregators often slow
performance because they must group data before processing it.

1. Group on numeric columns instead of string and date columns.
2. Group on indexed columns.
3. Use sorted input: it reduces the amount of data cached, which
improves performance.
4. Reduce complex logic in aggregator expressions.
5. Use incremental aggregation.
6. Filter data before you aggregate.
7. Limit port connections.


Optimizing Joiner transformations: The Joiner joins data from
different sources into a single pipeline.
1. Designate the source with fewer duplicate key values as the master.
2. Designate the source with fewer rows as the master, since each row
of the detail source is compared against the master.
3. Perform joins in the database where possible; for relational tables,
join in the Source Qualifier (see the sketch below).
4. Join on sorted data.
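A minimal sketch of pushing a homogeneous join into the Source Qualifier as a user-defined
join or SQL override (tables and columns are illustrative):

    SELECT o.order_id, o.amount, c.cust_name
    FROM   orders o
    JOIN   customer c
      ON   o.cust_id = c.cust_id    -- the database does the join; no Joiner needed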

Optimizing Lookup transformations:
Caching lookup tables:
Use the appropriate cache type: static, shared, or persistent.
Enable concurrent caches: set the number of additional concurrent
pipelines to one or more.
Optimize the lookup policy on multiple matches: with Use Any Value,
performance can improve because the transformation does not index
on all ports, yet it still returns a value that matches the lookup condition.
Reduce the number of cached rows.
Override the ORDER BY statement (see the sketch below).
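A common sketch of a lookup SQL override that both filters the lookup rows and replaces
the generated ORDER BY: the trailing comment marker suppresses the ORDER BY clause
that the Integration Service appends (table and columns are illustrative):

    SELECT cust_id, cust_name
    FROM   customer_dim
    WHERE  active_flag = 'Y'    -- cache only the rows the lookup can match
    ORDER  BY cust_id --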


Optimizing the Lookup Condition: =,<,>,<=,>=,!=


Filtering Lookup Rows
Indexing the conditional columns in Lookup Table
Optimizing Multiple Lookups

Optimizing Sequence Generator transformations: Create a reusable
Sequence Generator and use it in multiple mappings simultaneously.
Configuring the Number of Cached Values property reduces how often
the Integration Service must fetch sequence numbers from the
repository.
Optimizing Sorter transformations: If the Integration Service cannot
allocate enough memory to sort the data, it fails the session. For best
performance, set the Sorter cache size to a value less than or equal to
the available physical RAM on the Integration Service machine. The
default size is 16 MB.
Use the following formula to determine the size of the incoming data:

# input rows * ([Sum(column size)] + 16)
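A quick worked example with illustrative numbers: 1,000,000 input rows whose column
sizes sum to 84 bytes need about

    1,000,000 * (84 + 16) = 100,000,000 bytes (roughly 95 MB)

so the default 16 MB cache would page heavily to disk for such a load.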

Optimizing Source Qualifier transformations:
Use Select Distinct, filter at the source, and tune the query.
Optimizing SQL transformations:
Use query mode instead of script mode.
Do not use transaction statements such as COMMIT or ROLLBACK in
an SQL transformation query.

In query mode, construct a static query by using parameter binding
instead of string substitution in the SQL Editor, as sketched below.
Choose a static connection instead of a dynamic connection.
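A minimal sketch of a static query with parameter binding: in query mode the ?PORT?
notation binds an input port into the query (the port CUST_ID and the table are illustrative):

    SELECT cust_name, cust_status
    FROM   customer_dim
    WHERE  cust_id = ?CUST_ID?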


(5).Session Level optimization:


1. Grid
2. Pushdown Optimization
3. Concurrent Sessions and Workflows
4. Buffer Memory
5. Caches
6. Target-Based Commit
7. Real-time Processing
8. Staging Areas
9. Log Files
10. Error Tracing
11. Post-Session Emails


1. Grid:
The Load Balancer distributes tasks to nodes without overloading any
single node.
A grid can improve performance when the bottleneck is in the extract
and load steps of a session, or when memory or temporary storage is
the bottleneck, e.g. Sorter, Aggregator, and Joiner transformations,
which store intermediate results.
2. Pushdown Optimization: The Integration Service executes SQL
against the source or target database instead of processing the
transformation logic itself.
3. Concurrent Sessions and Workflows: Run independent sessions
and workflows concurrently.
4. Buffer Memory: Adjust the DTM Buffer Size and the Default Buffer
Block Size. A rough sizing check is sketched below.


5. Caches:
Limit the number of connected ports.
On a 64-bit platform, the Integration Service is not limited to the 2 GB
cache limit of a 32-bit platform.
If the allocated cache is not large enough to hold the data, the
Integration Service stores the data in a temporary disk file, a cache file.
Performance slows each time the Integration Service pages to a
temporary file.
The Transformation_readfromdisk and Transformation_writetodisk
counters for any Aggregator, Rank, or Joiner transformation indicate
how many times the Integration Service pages to disk to process the
transformation.
6. Target-Based Commit:
If the commit interval is too high, the Integration Service may fill the
database log file and cause the session to fail.


7. Real-time Processing:
Increase the flush latency to improve throughput. The source-based
commit interval determines how often the Integration Service commits
real-time data to the target; for the lowest latency, set the source-based
commit to 1.
8. Staging Areas:
The Integration Service can read multiple sources in a single pass,
which can reduce the need for staging areas.
9. Log Files:
Workflows and sessions always create binary logs, which can be
accessed in the Administrator tool.
10. Error Tracing:
Set the tracing level appropriately: use Verbose only for debugging, and
use Terse when you do not need to log error messages for rejected data.


11. Post-Session Emails:
When you configure a post-session email to attach the session log,
configure the session to write the log to a file (enable flat file logging).

(6). Optimizing Grid Deployments:

Add nodes to the grid.
Increase storage capacity and bandwidth.
Use shared file systems.
Use a high-throughput network.

(7). Optimizing PowerCenter Repository Performance:

Ensure the Repository Service process runs on the same machine
where the repository database resides.
Order conditions in object queries.
Use a single-node tablespace for the PowerCenter repository if you
install it on a DB2 database.
Optimize the database schema for the PowerCenter repository if you
install it on a DB2 or Microsoft SQL Server database, by enabling the
Optimize Database Schema option for the Repository Service in the
Administration Console.

Optimizing Integration Service Performance:


Use native drivers instead of ODBC drivers for the Integration Service.
Run the Integration Service in ASCII data movement mode if the
character data is 7-bit ASCII or EBCDIC; ASCII mode takes 1 byte to
store each character, whereas Unicode mode takes 2 bytes.
Cache PowerCenter metadata for the Repository Service.
Run the Integration Service with high availability: it then recovers
workflows and sessions that fail because of temporary network or
machine failures. To enable recovery, the Integration Service writes the
state of each workflow and session to temporary files in a shared
directory, which may decrease performance.


(8).Optimizing the System:


Improve network speed:
Minimize the number of network hops between the source and target
databases and the Integration Service.
A local disk can move data 5 to 20 times faster than a network, so store
source and target flat files on the Integration Service machine.
Move the target database to a server system if possible.
Ask the network engineer to provide enough bandwidth.
Use multiple CPUs to run multiple sessions in parallel.
Reduce paging.
Use processor binding.
Using Pipeline Partitions:
After you tune the application, databases, and system for maximum
single-partition performance, you may find that the system is still
under-utilized. At that point, configure the session with two or more
partitions.
To improve performance, make the number of pipeline partitions equal
to the number of database partitions.

Use the database partitioning partition type for source and target
databases, and enable parallel queries/inserts.
Source Qualifier: pass-through partitioning
Filter: round-robin partitioning
Sorter: hash auto-keys partitioning; delete the default partition point at
the Aggregator
Performance Counters:
All transformations have counters: the Integration Service tracks the
number of input rows, output rows, and error rows for each
transformation. Some transformations also have performance counters.
To view them, right-click the session in the Workflow Monitor, choose
Properties, and click the Properties tab in the details dialog box:
Errorrows
Readfromcache and Writetocache
Readfromdisk and Writetodisk
Rowsinlookupcache

If the Readfromdisk and Writetodisk counters display any number other
than zero, increase the cache sizes to improve session performance.


© 2014 by Author (Jyotheswar Kuricheti).


All rights reserved. No part of this document may be reproduced or transmitted in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise, without prior written
permission of Author.

