Informatica PowerCenter®
Performance Tuning Guide
Version 8.1.1
September 2006
This software and documentation contain proprietary information of Informatica Corporation and are provided under a license agreement containing
restrictions on use and disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be
reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation.
Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement and as
provided in DFARS 227.7202-1(a) and 227.7202-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR
52.227-14 (ALT III), as applicable.
The information in this document is subject to change without notice. If you find any problems in the documentation, please report them to us in writing.
Informatica Corporation does not warrant that this documentation is error free.
Informatica, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerMart, SuperGlue, Metadata Manager, Informatica Data
Quality and Informatica Data Explorer are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout
the world. All other company and product names may be trade names or trademarks of their respective owners.
Portions of this software and/or documentation are subject to copyright held by third parties, including without limitation: Copyright DataDirect Technologies,
1999-2002. All rights reserved. Copyright © Sun Microsystems. All Rights Reserved. Copyright © RSA Security Inc. All Rights Reserved. Copyright © Ordinal
Technology Corp. All Rights Reserved.
Informatica PowerCenter products contain ACE (TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University and
University of California, Irvine, Copyright (c) 1993-2002, all rights reserved.
Portions of this software contain copyrighted material from The JBoss Group, LLC. Your right to use such materials is set forth in the GNU Lesser General
Public License Agreement, which may be found at http://www.opensource.org/licenses/lgpl-license.php. The JBoss materials are provided free of charge by
Informatica, “as-is”, without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness
for a particular purpose.
Portions of this software contain copyrighted material from Meta Integration Technology, Inc. Meta Integration® is a registered trademark of Meta Integration
Technology, Inc.
This product includes software developed by the Apache Software Foundation (http://www.apache.org/). The Apache Software is Copyright (c) 1999-2005 The
Apache Software Foundation. All rights reserved.
This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit and redistribution of this software is subject to terms available
at http://www.openssl.org. Copyright 1998-2003 The OpenSSL Project. All Rights Reserved.
The zlib library included with this software is Copyright (c) 1995-2003 Jean-loup Gailly and Mark Adler.
The Curl license provided with this Software is Copyright 1996-2004, Daniel Stenberg, <Daniel@haxx.se>. All Rights Reserved.
The PCRE library included with this software is Copyright (c) 1997-2001 University of Cambridge. Regular expression support is provided by the PCRE library
package, which is open source software, written by Philip Hazel. The source for this library may be found at ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre.
Portions of the Software are Copyright (c) 1998-2005 The OpenLDAP Foundation. All rights reserved. Redistribution and use in source and binary forms, with
or without modification, are permitted only as authorized by the OpenLDAP Public License, available at http://www.openldap.org/software/release/license.html.
This Software is protected by U.S. Patent Numbers 6,208,990; 6,044,374; 6,014,670; 6,032,158; 5,794,246; 6,339,775 and other U.S. Patents Pending.
DISCLAIMER: Informatica Corporation provides this documentation “as is” without warranty of any kind, either express or implied,
including, but not limited to, the implied warranties of non-infringement, merchantability, or use for a particular purpose. The information provided in this
documentation may include technical inaccuracies or typographical errors. Informatica could make improvements and/or changes in the products described in
this documentation at any time without notice.
Table of Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
About This Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
Document Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
Other Informatica Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Visiting Informatica Customer Portal . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Visiting the Informatica Web Site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Visiting the Informatica Developer Network . . . . . . . . . . . . . . . . . . . . . xi
Visiting the Informatica Knowledge Base . . . . . . . . . . . . . . . . . . . . . . . . xi
Obtaining Technical Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Using Bulk Loads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Using External Loads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Minimizing Deadlocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Increasing Database Network Packet Size . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Optimizing Oracle Target Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Chapter 6: Optimizing Transformations . . . . . . . . . . . . . . . . . . . . . . . 45
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Optimizing Aggregator Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Group By Simple Columns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Use Sorted Input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Use Incremental Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Filter Data Before You Aggregate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Limit Port Connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Optimizing Custom Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Optimizing Joiner Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Optimizing Lookup Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Using Optimal Database Drivers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Caching Lookup Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Optimizing the Lookup Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Indexing the Lookup Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Optimizing Multiple Lookups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Optimizing Sequence Generator Transformations . . . . . . . . . . . . . . . . . . . . . 55
Optimizing Sorter Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Allocating Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Work Directories for Partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Optimizing Source Qualifier Transformations . . . . . . . . . . . . . . . . . . . . . . . 57
Optimizing SQL Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Eliminating Transformation Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Increasing the Commit Interval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Disabling High Precision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Reducing Error Tracing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Removing Staging Areas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Rowsinlookupcache Counter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
Errorrows Counter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Preface

Welcome to PowerCenter, the Informatica software product that delivers an open, scalable
data integration solution addressing the complete life cycle for all data integration projects
including data warehouses, data migration, data synchronization, and information hubs.
PowerCenter combines the latest technology enhancements for reliably managing data
repositories and delivering information resources in a timely, usable, and efficient manner.
The PowerCenter repository coordinates and drives a variety of core functions, including
extracting, transforming, loading, and managing data. The Integration Service can extract
large volumes of data from multiple platforms, handle complex transformations on the data,
and support high-speed loads. PowerCenter can simplify and accelerate the process of
building a comprehensive data warehouse from disparate data sources.
About This Book
The Performance Tuning Guide is written for PowerCenter administrators, developers,
network administrators, and database administrators who are interested in improving
PowerCenter performance. This guide assumes you have knowledge of your operating
systems, networks, PowerCenter, relational database concepts, and flat files in your
environment. For more information about database performance tuning not covered in this
guide, see the documentation accompanying your database products.
The material in this book is also available online.
Document Conventions
This guide uses the following formatting conventions:
italicized monospaced text
    This is the variable name for a value you enter as part of an operating system command. This is generic text that should be replaced with user-supplied values.

bold monospaced text
    This is an operating system command you enter from a prompt to run a task.

Warning:
    The following paragraph notes situations where you can overwrite or corrupt data, unless you follow the specified procedure.
Other Informatica Resources
In addition to the product manuals, Informatica provides these other resources:
♦ Informatica Customer Portal
♦ Informatica web site
♦ Informatica Developer Network
♦ Informatica Knowledge Base
♦ Informatica Technical Support
Use the following email addresses to contact Informatica Technical Support:
♦ support@informatica.com for technical inquiries
♦ support_admin@informatica.com for general customer service requests
WebSupport requires a user name and password. You can request a user name and password at
http://my.informatica.com.
Chapter 1
Performance Tuning Overview
This chapter includes the following topic:
♦ Overview, 2
Overview
The goal of performance tuning is to optimize session performance by eliminating
performance bottlenecks. To tune session performance, first identify a performance
bottleneck, eliminate it, and then identify the next performance bottleneck until you are
satisfied with the session performance. You can use the test load option to run sessions when
you tune session performance.
If you tune all the bottlenecks, you can further optimize session performance by increasing
the number of pipeline partitions in the session. Adding partitions can improve performance
by utilizing more of the system hardware while processing the session.
Because determining the best way to improve performance can be complex, change one
variable at a time, and time the session both before and after the change. If session
performance does not improve, you might want to return to the original configuration.
Complete the following tasks to improve session performance:
1. Optimize the target. Enables the Integration Service to write to the targets efficiently.
For more information, see “Optimizing the Target” on page 15.
2. Optimize the source. Enables the Integration Service to read source data efficiently. For
more information, see “Optimizing the Source” on page 25.
3. Optimize the mapping. Enables the Integration Service to transform and move data
efficiently. For more information, see “Optimizing Mappings” on page 33.
4. Optimize the transformation. Enables the Integration Service to process transformations
in a mapping efficiently. For more information, see “Optimizing Transformations” on
page 45.
5. Optimize the session. Enables the Integration Service to run the session more quickly.
For more information, see “Optimizing Sessions” on page 61.
6. Optimize the PowerCenter components. Enables the Integration Service and Repository
Service to function optimally. For more information, see “Optimizing the PowerCenter
Components” on page 75.
7. Optimize the system. Enables PowerCenter service processes to run more quickly. For
more information, see “Optimizing the System” on page 79.
Identifying Bottlenecks
Overview
The first step in performance tuning is to identify performance bottlenecks. Performance
bottlenecks can occur in the source and target databases, the mapping, the session, and the
system. Generally, look for performance bottlenecks in the following order:
1. Target
2. Source
3. Mapping
4. Session
5. System
The strategy is to identify a performance bottleneck, eliminate it, and then identify the next
performance bottleneck until you are satisfied with the performance. To tune session
performance, you can use the test load option to run sessions.
Use the following methods to identify performance bottlenecks:
♦ Run test sessions. You can configure a test session to read from a flat file source or to write
to a flat file target to identify source and target bottlenecks.
♦ Study performance details and thread statistics. You can analyze performance details to
identify session bottlenecks. Performance details provide information such as readfromdisk
and writetodisk counters. For more information about performance details, see
“Monitoring Workflows” in the Workflow Administration Guide. For more information
about thread statistics, see “Using Thread Statistics” on page 5.
♦ Monitor system performance. You can use system monitoring tools to view the percentage
of CPU usage, I/O waits, and paging to identify system bottlenecks.
Once you determine the location of a performance bottleneck, use the following guidelines to
eliminate the bottleneck:
♦ Eliminate source and target database bottlenecks. Have the database administrator
optimize database performance by optimizing the query, increasing the database network
packet size, or configuring index and key constraints.
♦ Eliminate mapping bottlenecks. Fine tune the pipeline logic and transformation settings
and options in mappings to eliminate mapping bottlenecks.
♦ Eliminate session bottlenecks. Optimize the session strategy and use performance details
to help tune session configuration.
♦ Eliminate system bottlenecks. Have the system administrator analyze information from
system monitoring tools and improve CPU and network performance.
Using Thread Statistics

If a transformation thread is 100% busy, consider adding a partition point in the segment.
When you add partition points to the mapping, the Integration Service increases the number
of transformation threads it uses for the session. However, if the machine is already running at
or near full capacity, then do not add more threads.
If the reader or writer thread is 100% busy, consider using string datatypes in the source or
target ports. Non-string ports require more processing.
You can identify which thread the Integration Service uses the most by reading the thread
statistics in the session log.
Example
When you run a session, the session log lists run information and thread statistics similar to
the following text:
***** RUN INFO FOR TGT LOAD ORDER GROUP [1], SRC PIPELINE [1] *****
MANAGER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition point
[SQ_NetData] has completed: Total Run Time = [576.620988] secs, Total Idle Time =
[535.601729] secs, Busy Percentage = [7.113730].
MANAGER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of
partition point [SQ_NetData] has completed: Total Run Time = [577.301318] secs, Total Idle
Time = [0.000000] secs, Busy Percentage = [100.000000].
MANAGER> PETL_24022 Thread [WRITER_1_1_1] created for the write stage of partition point(s)
[DAILY_STATUS] has completed: Total Run Time = [577.363934] secs, Total Idle Time =
[492.602374] secs, Busy Percentage = [14.680785].
MANAGER> PETL_24021 ***** END RUN INFO *****
Notice the Total Run Time and Busy Percentage statistic for each thread. The thread with the
highest Busy Percentage identifies the bottleneck in the session.
In this session log, the Total Run Time for the transformation thread is 577 seconds and the
Busy Percentage is 100%. This means the transformation thread was never idle for the 577
seconds. The reader and writer threads were mostly idle, so the transformation thread is the likely bottleneck in this session.
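The per-thread pattern above can also be pulled out of a session log mechanically. The sketch below is a hypothetical helper (not part of PowerCenter), shown here in Python, that scans the PETL thread summaries and reports the busiest thread:

```python
import re

# Abbreviated session-log excerpt, one PETL summary per line (as in the example above).
LOG = """
MANAGER> PETL_24018 Thread [READER_1_1_1] has completed: Busy Percentage = [7.113730].
MANAGER> PETL_24019 Thread [TRANSF_1_1_1_1] has completed: Busy Percentage = [100.000000].
MANAGER> PETL_24022 Thread [WRITER_1_1_1] has completed: Busy Percentage = [14.680785].
"""

# Capture each thread name and its Busy Percentage from the summary lines.
PATTERN = re.compile(r"Thread \[(\w+)\].*?Busy Percentage = \[([\d.]+)\]")

def busiest_thread(log_text):
    """Return (thread_name, busy_pct) for the thread with the highest Busy Percentage."""
    name, pct = max(PATTERN.findall(log_text), key=lambda m: float(m[1]))
    return name, float(pct)

print(busiest_thread(LOG))  # → ('TRANSF_1_1_1_1', 100.0)
```

With this log, the transformation thread comes back as the bottleneck, matching the analysis above.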
Optimizing the Target
Overview
You can optimize the following types of targets:
♦ Flat file
♦ Relational
Relational Target
If the session writes to a relational target, you can perform the following tasks to increase
performance:
♦ Drop indexes and key constraints. For more information, see “Dropping Indexes and Key
Constraints” on page 17.
♦ Increase checkpoint intervals. For more information, see “Increasing Database
Checkpoint Intervals” on page 18.
♦ Use bulk loading. For more information, see “Using Bulk Loads” on page 19.
♦ Use external loading. For more information, see “Using External Loads” on page 20.
♦ Minimize deadlocks. For more information, see “Minimizing Deadlocks” on page 21.
♦ Increase database network packet size. For more information, see “Increasing Database
Network Packet Size” on page 22.
♦ Optimize Oracle target databases. For more information, see “Optimizing Oracle Target
Databases” on page 23.
Increasing Database Network Packet Size
If you write to Oracle, Sybase ASE, or Microsoft SQL Server targets, you can improve the
performance by increasing the network packet size. Increase the network packet size to allow
larger packets of data to cross the network at one time. Increase the network packet size based
on the database you write to:
♦ Oracle. You can increase the database server network packet size in listener.ora and
tnsnames.ora. Consult your database documentation for additional information about
increasing the packet size, if necessary.
♦ Sybase ASE and Microsoft SQL. Consult your database documentation for information
about how to increase the packet size.
For Sybase ASE or Microsoft SQL Server, you must also change the packet size in the
relational connection object in the Workflow Manager to reflect the database server packet
size.
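For Oracle, the relevant knob is the session data unit (SDU). As a hedged sketch only (the connect-descriptor name, host, service, and size below are illustrative, and the right value depends on your network), an entry in tnsnames.ora might look like:

```text
DWH_TGT =
  (DESCRIPTION =
    (SDU = 16384)  # larger session data unit; coordinate with a matching SDU in listener.ora
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = dwh))
  )
```

The SDU usually must be raised on both the client side (tnsnames.ora) and the server side (listener.ora) to take effect; consult the Oracle Net documentation for your release.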
Optimizing the Source
Overview
If the session reads from a relational source, review the following suggestions for improving
performance:
♦ Optimize the query. For example, you may want to tune the query to return rows faster or
create indexes for queries that contain ORDER BY or GROUP BY clauses. For more
information, see “Optimizing the Query” on page 27.
♦ Use conditional filters. You can use the PowerCenter conditional filter in the Source
Qualifier to improve performance. For more information, see “Using Conditional Filters”
on page 28.
♦ Increase database network packet size. You can improve the performance of a source
database by increasing the network packet size, which allows larger packets of data to cross
the network at one time. For more information, see “Increasing Database Network Packet
Size” on page 29.
♦ Connect to Oracle databases using IPC protocol. You can use IPC protocol to connect to
the Oracle database to improve performance. For more information, see “Connecting to
Oracle Database Sources” on page 30.
♦ Use the FastExport utility to extract Teradata data. You can create a PowerCenter session
that uses FastExport to read Teradata sources quickly. For more information, see “Using
Teradata FastExport” on page 31.
♦ Create tempdb to join Sybase ASE or Microsoft SQL Server tables. When you join large
tables on a Sybase ASE or Microsoft SQL Server database, it is possible to improve
performance by creating the tempdb as an in-memory database. For more information, see
“Using tempdb to Join Sybase or Microsoft SQL Server Tables” on page 32.
Optimizing Mappings
Overview
Mapping-level optimization may take time to implement, but it can significantly boost
session performance. Focus on mapping-level optimization after you optimize the targets and
sources.
Generally, you reduce the number of transformations in the mapping and delete unnecessary
links between transformations to optimize the mapping. Configure the mapping with the
least number of transformations and expressions to do the most amount of work possible.
Delete unnecessary links between transformations to minimize the amount of data moved.
You can also perform the following tasks to optimize the mapping:
♦ Optimize the flat file sources. For example, you can improve session performance if the
source flat file does not contain quotes or escape characters. For more information, see
“Optimizing Flat File Sources” on page 35.
♦ Configure single-pass reading. You can use single-pass readings to reduce the number of
times the Integration Service reads sources. For more information, see “Configuring
Single-Pass Reading” on page 36.
♦ Optimize Simple Pass Through mappings. You can use simple pass through mappings to
improve session throughput. For more information, see “Optimizing Simple Pass Through
Mappings” on page 37.
♦ Optimize filters. You can use Source Qualifier transformations and Filter transformations
to filter data in the mapping. To maximize session performance, only process records that
belong in the pipeline. For more information, see “Optimizing Filters” on page 38.
♦ Optimize datatype conversions. You can increase performance by eliminating unnecessary
datatype conversions. For more information, see “Optimizing Datatype Conversions” on
page 39.
♦ Optimize expressions. For example, minimize aggregate function calls and replace
common expressions with local variables. For more information, see “Optimizing
Expressions” on page 40.
♦ Optimize external procedures. If an external procedure needs to alternate reading from
input groups, you can block input data to increase session performance. For more
information, see “Optimizing External Procedures” on page 43.
If you factor out the aggregate function call, as below, the Integration Service adds
COLUMN_A to COLUMN_B, then finds the sum of both.
SUM(COLUMN_A + COLUMN_B)
IIF( FLG_A = 'N' and FLG_B = 'N' AND FLG_C = 'N',
0.0,
))))))))

If you rewrite the expression to test each flag only once and add the results:

IIF( FLG_A = 'Y', VAL_A, 0.0 ) + IIF( FLG_B = 'Y', VAL_B, 0.0 ) + IIF( FLG_C = 'Y', VAL_C, 0.0 )

This results in three IIFs, two comparisons, two additions, and a faster session.
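The claim that a factored, per-flag rewrite returns the same values as the fully nested IIF can be sanity-checked outside PowerCenter. The Python sketch below mirrors the flag logic; the VAL_* figures are made up for illustration:

```python
from itertools import product

VALS = {"A": 1.5, "B": 2.25, "C": 4.0}  # illustrative port values

def nested(a, b, c):
    # One branch per flag combination, like the 8-way nested IIF.
    if a == "Y" and b == "Y" and c == "Y": return VALS["A"] + VALS["B"] + VALS["C"]
    if a == "Y" and b == "Y" and c == "N": return VALS["A"] + VALS["B"]
    if a == "Y" and b == "N" and c == "Y": return VALS["A"] + VALS["C"]
    if a == "Y" and b == "N" and c == "N": return VALS["A"]
    if a == "N" and b == "Y" and c == "Y": return VALS["B"] + VALS["C"]
    if a == "N" and b == "Y" and c == "N": return VALS["B"]
    if a == "N" and b == "N" and c == "Y": return VALS["C"]
    return 0.0  # FLG_A = 'N' and FLG_B = 'N' and FLG_C = 'N'

def factored(a, b, c):
    # Three single-flag tests, summed: IIF(FLG_X = 'Y', VAL_X, 0.0) + ...
    return ((VALS["A"] if a == "Y" else 0.0)
            + (VALS["B"] if b == "Y" else 0.0)
            + (VALS["C"] if c == "Y" else 0.0))

# Both forms agree on every flag combination.
for a, b, c in product("YN", repeat=3):
    assert nested(a, b, c) == factored(a, b, c)
print("all", 2 ** 3, "combinations match")
```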
Evaluating Expressions
If you are not sure which expressions slow performance, evaluate the expression performance
to isolate the problem.
Complete the following steps to evaluate expression performance:
1. Time the session with the original expressions.
2. Copy the mapping and replace half of the complex expressions with a constant.
3. Run and time the edited session.
4. Make another copy of the mapping and replace the other half of the complex expressions
with a constant.
5. Run and time the edited session.
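The same before-and-after timing discipline can be rehearsed on any per-row computation. A rough Python sketch follows; the expressions here are stand-ins for mapping expressions, not PowerCenter code:

```python
import timeit

rows = [{"a": float(i), "b": float(i * 2)} for i in range(10_000)]

def complex_expr(r):
    # Stand-in for a suspect, expensive mapping expression.
    return (r["a"] ** 0.5 + r["b"] ** 0.5) / (1.0 + abs(r["a"] - r["b"]))

def constant_expr(r):
    # The expression replaced with a constant, as in steps 2 and 4.
    return 1.0

def time_session(expr):
    # Stand-in for "run and time the session" with the given expressions.
    return timeit.timeit(lambda: [expr(r) for r in rows], number=10)

baseline, probe = time_session(complex_expr), time_session(constant_expr)
print(f"original: {baseline:.3f}s, constant-replaced: {probe:.3f}s")
# A large drop when the expressions become constants points at those expressions.
```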
Chapter 6
Optimizing Transformations
This chapter includes the following topics:
♦ Overview, 46
♦ Optimizing Aggregator Transformations, 47
♦ Optimizing Custom Transformations, 49
♦ Optimizing Joiner Transformations, 50
♦ Optimizing Lookup Transformations, 51
♦ Optimizing Sequence Generator Transformations, 55
♦ Optimizing Sorter Transformations, 56
♦ Optimizing Source Qualifier Transformations, 57
♦ Optimizing SQL Transformations, 58
♦ Eliminating Transformation Errors, 59
Overview
You can further optimize mappings by optimizing the transformations contained in the
mappings.
You can optimize the following transformations in a mapping:
♦ Aggregator transformations. For more information, see “Optimizing Aggregator
Transformations” on page 47.
♦ Custom transformations. For more information, see “Optimizing Custom
Transformations” on page 49.
♦ Joiner transformations. For more information, see “Optimizing Joiner Transformations”
on page 50.
♦ Lookup transformations. For more information, see “Optimizing Lookup
Transformations” on page 51.
♦ Sequence Generator transformations. For more information, see “Optimizing Sequence
Generator Transformations” on page 55.
♦ Sorter transformations. For more information, see “Optimizing Sorter Transformations”
on page 56.
♦ Source Qualifier transformations. For more information, see “Optimizing Source
Qualifier Transformations” on page 57.
♦ SQL transformations. For more information, see “Optimizing SQL Transformations” on
page 58.
You can also eliminate transformation errors to improve transformation performance. For
more information, see “Eliminating Transformation Errors” on page 59.
The Lookup transformation includes three lookup ports used in the mapping, ITEM_ID,
ITEM_NAME, and PRICE. When you enter the ORDER BY statement, enter the columns
in the same order as the ports in the lookup condition. You must also enclose all database
reserved words in quotes.
Enter the following lookup query in the lookup SQL override:
SELECT ITEMS_DIM.ITEM_NAME, ITEMS_DIM.PRICE, ITEMS_DIM.ITEM_ID FROM ITEMS_DIM
ORDER BY ITEMS_DIM.ITEM_ID, ITEMS_DIM.ITEM_NAME --
For more information about overriding the Order By statement, see “Lookup Transformation”
in the Transformation Guide.
Allocating Memory
If the Integration Service cannot allocate enough memory to sort data, it fails the session. For
best performance, configure Sorter cache size with a value less than or equal to the amount of
available physical RAM on the Integration Service machine. Informatica recommends
allocating at least 8 MB (8,388,608 bytes) of physical memory to sort data using the Sorter
transformation. Sorter cache size is set to 8,388,608 bytes by default.
If the amount of incoming data is greater than the amount of Sorter cache size, the
Integration Service temporarily stores data in the Sorter transformation work directory. The
Integration Service requires disk space of at least twice the amount of incoming data when
storing data in the work directory. If the amount of incoming data is significantly greater than
the Sorter cache size, the Integration Service may require much more than twice the amount
of disk space available to the work directory.
Use the following formula to determine the size of incoming data:
# input rows * ([Sum(column size)] + 16)
For more information about sorter caches, see “Sorter Transformation” in the Transformation
Guide.
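The sizing rule above lends itself to a quick back-of-the-envelope calculator. A Python sketch, where the row count and per-column byte sizes are illustrative:

```python
DEFAULT_SORTER_CACHE = 8_388_608  # default Sorter cache size, 8 MB

def incoming_data_bytes(num_rows, column_sizes):
    # # input rows * ([Sum(column size)] + 16), per the formula above
    return num_rows * (sum(column_sizes) + 16)

rows = 1_000_000
cols = [4, 8, 30, 10]  # bytes per column (illustrative)

data = incoming_data_bytes(rows, cols)      # 68,000,000 bytes of incoming data
spills = data > DEFAULT_SORTER_CACHE        # True: the work directory will be used
work_dir_floor = 2 * data if spills else 0  # plan at least twice the incoming data
print(data, spills, work_dir_floor)
```

Here a one-million-row sort needs a work directory with at least ~136 MB free, well beyond the default cache.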
Optimizing Sessions
Overview
Once you optimize the source database, target database, and mapping, you can focus on
optimizing the session. You can perform the following tasks to improve overall performance:
♦ Use a grid. You can increase performance by using a grid to balance the Integration Service
workload. For more information, see “Using a Grid” on page 63.
♦ Use pushdown optimization. You can increase session performance by pushing
transformation logic to the source or target database. For more information, see “Using
Pushdown Optimization” on page 64.
♦ Run sessions and workflows concurrently. You can run independent sessions and
workflows concurrently to improve session and workflow performance. For more
information, see “Run Concurrent Sessions and Workflows” on page 65.
♦ Allocate buffer memory. You can increase the buffer memory allocation for sources and
targets that require additional memory blocks. If the Integration Service cannot allocate
enough memory blocks to hold the data, it fails the session. For more information, see
“Allocating Buffer Memory” on page 66.
♦ Optimize caches. You can improve session performance by setting the optimal location
and size for the caches. For more information, see “Optimizing Caches” on page 69.
♦ Increase the commit interval. Each time the Integration Service commits changes to the
target, performance slows. You can increase session performance by increasing the interval
at which the Integration Service commits changes. For more information, see “Increasing
the Commit Interval” on page 71.
♦ Disable high precision. Performance slows when the Integration Service reads and
manipulates data with the high precision datatype. You can disable high precision to
improve session performance. For more information, see “Disabling High Precision” on
page 72.
♦ Reduce errors tracing. To improve performance, you can reduce the error tracing level,
which reduces the number of log events generated by the Integration Service. For more
information, see “Reducing Error Tracing” on page 73.
♦ Remove staging areas. When you use a staging area, the Integration Service performs
multiple passes on the data. You can eliminate staging areas to improve session
performance. For more information, see “Removing Staging Areas” on page 74.
Using Pushdown Optimization
You can increase session performance by pushing transformation logic to the source or target
database. Based on the mapping and session configuration, the Integration Service executes
SQL against the source or target database instead of processing the transformation logic
within the Integration Service.
For more information about PowerCenter pushdown optimization, see “Using Pushdown
Optimization” in the Workflow Administration Guide.
1. You determine that the session requires a minimum of 200 memory blocks:
100 * 2 = 200
2. Based on default settings, you determine that you can change the DTM Buffer Size to
15,000,000, or you can change the Default Buffer Block Size to 54,000:
(session Buffer Blocks) = (.9) * (DTM Buffer Size) / (Default Buffer Block
Size) * (number of partitions)
or
200 = .9 * 12000000 / 54000 * 1
Note: For a session that contains n partitions, set the DTM Buffer Size to at least n times the
value for the session with one partition. The Log Manager writes a warning message in the
session log if the number of memory blocks is so small that it causes performance
degradation. The Log Manager writes this warning message even if the number of memory
blocks is enough for the session to run successfully. The warning message also gives a
suggestion for the proper value.
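The arithmetic above is easy to verify. A Python sketch mirroring the formula (the 64,000-byte default block size used in the second call is an assumption, not stated in this excerpt):

```python
def session_buffer_blocks(dtm_buffer_size, block_size, partitions=1):
    # (session Buffer Blocks) =
    #   0.9 * (DTM Buffer Size) / (Default Buffer Block Size) * (number of partitions)
    return 0.9 * dtm_buffer_size / block_size * partitions

# Shrinking the block size to 54,000 with the 12,000,000-byte DTM buffer gives exactly 200 blocks:
print(session_buffer_blocks(12_000_000, 54_000))   # → 200.0

# Alternatively, raising the DTM buffer to 15,000,000 with assumed 64,000-byte blocks also clears 200:
print(session_buffer_blocks(15_000_000, 64_000))   # → 210.9375
```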
The Integration Service process may have faster access to the cache files if it runs on the same machine that contains the cache directory.
Note: You may encounter performance degradation when you cache large quantities of data on
a mapped or mounted drive.
For more information about choosing the cache directory, see “Session Caches” in the
Workflow Administration Guide.
Optimizing the PowerCenter Components
This chapter includes the following topics:
♦ Overview, 76
♦ Optimizing PowerCenter Repository Performance, 77
♦ Optimizing Integration Service Performance, 78
Overview
You can optimize performance of the following PowerCenter components:
♦ PowerCenter repository
♦ Integration Service
If you run PowerCenter on multiple machines, run the Repository Service and Integration
Service on different machines. To load large amounts of data, run the Integration Service on
the machine with the greater processing power. Also, run the Repository Service on the
machine hosting the PowerCenter repository.
Overview
Often performance slows because the session relies on inefficient connections or an
overloaded system running the Integration Service process. System delays can also be caused
by routers, switches, network protocols, and usage by many users.
Slow disk access on source and target databases, source and target file systems, and nodes in
the domain can slow session performance. Have the system administrator evaluate the hard
disks on the machines.
After you determine from the system monitoring tools that you have a system bottleneck,
make the following global changes to improve the performance of all sessions:
♦ Improve network speed. Slow network connections can slow session performance. Have
the system administrator determine if the network runs at an optimal speed. Decrease the
number of network hops between the Integration Service process and databases. For more
information, see “Improving Network Speed” on page 81.
♦ Use multiple CPUs. You can use multiple CPUs to run multiple sessions in parallel and
run multiple pipeline partitions in parallel. For more information, see “Using Multiple
CPUs” on page 82.
♦ Reduce paging. When an operating system runs out of physical memory, it starts paging to
disk to free physical memory. Configure the physical memory for the Integration Service
process machine to minimize paging to disk. For more information, see “Reducing Paging”
on page 83.
♦ Use processor binding. In a multi-processor UNIX environment, the Integration Service
may use a large amount of system resources. Use processor binding to control processor
usage by the Integration Service process. Also, if the source and target database are on the
same machine, use processor binding to limit the resources used by the database. For more
information, see “Using Processor Binding” on page 84.
Using Processor Binding
In a multi-processor UNIX environment, the Integration Service may use a large amount of
system resources if you run a large number of sessions. As a result, other applications on the
machine may not have enough system resources available. You can use processor binding to
control processor usage by the Integration Service process. Also, if the source and target
databases are on the same machine, use processor binding to limit the resources used by the
database.
In a Sun Solaris environment, the system administrator can create and manage a processor set
using the psrset command. The system administrator can then use the pbind command to
bind the Integration Service to a processor set so the processor set only runs the Integration
Service. The Sun Solaris environment also provides the psrinfo command to display details
about each configured processor and the psradm command to change the operational status of
processors. For more information, see the system administrator and Sun Solaris
documentation.
In an HP-UX environment, the system administrator can use the Process Resource Manager
utility to control CPU usage in the system. The Process Resource Manager allocates
minimum system resources and uses a maximum cap of resources. For more information, see
the system administrator and HP-UX documentation.
In an AIX environment, system administrators can use the Workload Manager in AIX 5L to
manage system resources during peak demands. The Workload Manager can allocate resources
and manage CPU, memory, and disk I/O bandwidth. For more information, see the system
administrator and AIX documentation.
Overview
After you tune the application, databases, and system for maximum single-partition
performance, you may find that the system is under-utilized. At this point, you can configure
the session to have two or more partitions.
You can use pipeline partitioning to improve session performance. Increasing the number of
partitions or partition points increases the number of threads, which in turn increases the
load on the nodes where the Integration Service runs. If those nodes have ample CPU
bandwidth, processing rows of data in a session concurrently can increase session
performance.
Note: If you use a single-node Integration Service and you create a large number of partitions
or partition points in a session that processes large amounts of data, you can overload the
system.
If you have the partitioning option, perform the following tasks to manually set up partitions:
♦ Increase the number of partitions.
♦ Select the best performing partition types at particular points in a pipeline.
♦ Use multiple CPUs.
For more information about pipeline partitioning, see “Understanding Pipeline Partitioning”
in the Workflow Administration Guide.
Using Multiple CPUs
The Integration Service performs read, transformation, and write processing for a pipeline in
parallel. It can process multiple partitions of a pipeline within a session, and it can process
multiple sessions in parallel.
If you have a symmetric multi-processing (SMP) platform, you can use multiple CPUs to
concurrently process session data or partitions of data. This provides increased performance,
as true parallelism is achieved. On a single processor platform, these tasks share the CPU, so
there is no parallelism.
The Integration Service can use multiple CPUs to process a session that contains multiple
partitions. The number of CPUs used depends on factors such as the number of partitions,
the number of threads, the number of available CPUs, and the amount of resources required to
process the mapping.
For more information about partitioning, see “Understanding Pipeline Partitioning” in the
Workflow Administration Guide.
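As a rough illustration only, partition-level concurrency can be sketched in plain Python, with one worker thread standing in for the set of threads each pipeline partition receives. The names here are illustrative and are not a PowerCenter API:

```python
from concurrent.futures import ThreadPoolExecutor

def process_partition(rows):
    # Stand-in for the read, transformation, and write work performed
    # on one partition of the pipeline.
    return [r * 2 for r in rows]

# Three partitions of source rows, each handled concurrently, the way
# each pipeline partition in a session gets its own set of threads.
partitions = [[1, 2], [3, 4], [5, 6]]
with ThreadPoolExecutor(max_workers=len(partitions)) as executor:
    results = list(executor.map(process_partition, partitions))
print(results)  # [[2, 4], [6, 8], [10, 12]]
```

On an SMP machine, such concurrent workers can run on separate CPUs; on a single-processor machine they share one CPU, which is why partitioning alone does not guarantee a speedup.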
Performance Counters
Overview
All transformations have counters. For example, each transformation tracks the number of
input rows, output rows, and error rows for each session. Some transformations also have
performance counters. You can use the performance counters to increase session performance.
This appendix discusses the following performance counters:
♦ Readfromdisk
♦ Writetodisk
♦ Rowsinlookupcache
♦ Errorrows
For more information about performance counters, see “Monitoring Workflows” in the
Administrator Guide.
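As a sketch of how these counters are typically read: non-zero Readfromdisk or Writetodisk values indicate that a cache is too small and the Integration Service is paging cache data to disk, and non-zero Errorrows indicate transformation errors. The helper function and hint strings below are illustrative, not part of PowerCenter:

```python
def tuning_hints(counters):
    """Map a transformation's performance counters to tuning suggestions.

    Counter names follow the guide; the hint text is illustrative.
    """
    hints = []
    # Non-zero disk counters suggest the index or data cache is too small,
    # so cache data is being paged to and from disk.
    if counters.get("Readfromdisk", 0) or counters.get("Writetodisk", 0):
        hints.append("increase the cache size")
    # Processing error rows is expensive; eliminate the errors instead.
    if counters.get("Errorrows", 0):
        hints.append("eliminate transformation errors")
    return hints

print(tuning_hints({"Readfromdisk": 12, "Writetodisk": 0, "Errorrows": 3}))
# ['increase the cache size', 'eliminate transformation errors']
```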
Index

A
aggregate function calls
  minimizing 40
aggregate functions
  performance 40
Aggregator transformation
  optimizing performance 47
  optimizing with filters 48
  optimizing with GROUP BY 47
  optimizing with incremental aggregation 47
  optimizing with limited port connections 48
  optimizing with Sorted Input 47
  performance 47
  performance details 95
allocating memory
  XML sources 66
ASCII mode
  performance 78

B
binding
  processor for performance 84
bottlenecks
  identifying 4
  mappings 10
  sessions 11
  sources 8
  system 12
  targets 7
  UNIX system 13
  Windows system 12
buffer block size
  performance 67
buffer length
  performance for flat file source 35
buffer memory
  allocating 66
bulk loading
  performance for targets 19, 20
  target 19
busy
  thread statistic 5

C
cache directory
  sharing 69
caches
  optimizing location 69
  optimizing size 70
  performance 69
caching
  PowerCenter metadata 78
Char datatypes
  removing trailing blanks for optimization 41
check point interval
  performance for targets 18
commit interval
  performance 71
common logic
  factoring 40
conditional filters
  performance for sources 28
counters
  Rowsinlookupcache 96
  Transformation_errorrows 97
  Transformation_readfromdisk 95
  Transformation_writetodisk 95
CPUs
  performance 82
Custom transformation
  performance 49

D
data caches
  optimizing location 69
  optimizing size 70
database drivers
  performance for Integration Service 78
database indexes
  performance for targets 17
database joins
  performance for sources 32
database key constraints
  performance for targets 17
database queries
  performance for sources 27
databases
  optimizing sources 26
  optimizing targets 16
datatypes
  Char 41
  performance for conversions 39
  Varchar 41
DB2
  bulk loading 19
  PowerCenter repository performance 77
deadlocks
  performance for targets 21
DECODE function
  See also Transformation Language Reference
  performance 41
  using for optimization 41
directory
  shared caches 69
DTM buffer
  performance for pool size 67

E
errors
  See also Troubleshooting Guide
  eliminating to improve performance 59
  minimizing tracing level to improve performance 73
evaluating
  expressions 42
expressions
  evaluating 42
  performance 40
  replacing with local variables 40
external loader
  performance 20
external procedures
  performance 43

F
filters
  performance for mappings 38
flat files
  See also Designer Guide
  increasing performance 81
  performance for buffer length 35
  performance for delimited source files 35
  performance for sources 35
functions
  performance 41

G
grid
  performance 63

H
high precision
  performance 72

I
idle time (thread statistic)
  description 5
IIF expressions
  See also Transformation Language Reference
  performance 41
incremental aggregation
  performance 47
index caches
  optimizing location 69
  optimizing size 70
indexes
  optimizing by dropping 17
Integration Service
  performance 78
  performance with database drivers 78
  performance with grid 63

J
Joiner transformation
  performance 50
  performance details 95

K
key constraints
  optimizing by dropping 17

L
local variables
  replacing expressions 40
LOOKUP function
  See also Transformation Language Reference
  minimizing for optimization 41
  performance 41
Lookup SQL Override option
  reducing cache size 53
Lookup transformation
  optimizing 96
  optimizing lookup condition 53
  optimizing lookup condition matching 52
  optimizing multiple lookup expressions 54
  optimizing with cache reduction 53
  optimizing with caches 51
  optimizing with concurrent caches 52
  optimizing with database drivers 51
  optimizing with high-memory machine 53
  optimizing with indexing 53
  optimizing with ORDER BY statement 53
  performance 51

M
mappings
  factoring common logic 40
  identify bottlenecks 10
  single-pass reading 36
memory
  increasing for performance 83
Microsoft SQL Server
  bulk loading 19
  optimizing 32
monitoring
  data flow 97

N
network
  performance 81
network packets
  increasing 29
  performance for database sources 29
  performance for targets 22
numeric operations
  performance 40

O
object queries
  performance 77
operators
  performance 41
optimizing
  Aggregator transformation 47
  data flow 97
  dropping indexes and key constraints 17
  increasing network packet size 29
  mappings 34
  minimizing aggregate function calls 40
  pipeline partitioning 86
  PowerCenter repository 77
  removing trailing blank spaces 41
  sessions 62
  Simple Pass Through mapping 37
  single-pass reading 36
  sources 26
  target database 16
  transformations 46
  using DECODE v. LOOKUP expressions 41
Oracle
  bulk loading 19
  performance for source connections 30
  performance for targets 23
Oracle external loader
  bulk loading 20

P
paging
  eliminating 83
performance
  See also optimizing
  Aggregator transformation 47
  allocating buffer memory 66
  block size 67
  buffer length for flat file sources 35
  bulk loading targets 19, 20
  caching PowerCenter metadata 78
  Char-Char and Char-Varchar 41
  checkpoint interval for targets 18
  commit interval 71
  conditional filters for sources 28
  Custom transformation 49
  data movement mode 78
  database indexes for targets 17
  database joins for sources 32
  database key constraints for targets 17
  database network packets for targets 22
  database queries for sources 27
  datatype conversions 39
  deadlocks for targets 21
  DECODE and LOOKUP functions 41
  delimited flat file sources 35
  disabling high precision 72
  eliminating transformation errors 59
  expressions 40
  external procedures 43
  factoring out common logic 40
  flat file sources 35
  for aggregate function calls 40
  for DTM buffer pool size 67
  grid 63
  identifying bottlenecks 4
  IIF expressions 41
  improving network speed 81
  increasing memory 83
  Integration Service 78
  Joiner transformation 50
  Lookup transformation 51
  mapping filters 38
  minimizing error tracing 73
  network for database sources 29
  numeric and string operations 40
  object queries 77
  operators and functions 41
  Oracle source connections 30
  Oracle targets 23
  PowerCenter components 76
  PowerCenter repository 77
  processor binding 84
  pushdown optimization 64
  removing staging areas 74
  replacing expressions with local variables 40
  Sequence Generator transformation 55
  single-pass reading for mappings 36
  Sorter transformation 56
  Source Qualifier transformation 57
  SQL transformation 58
  Sybase IQ 20
  system 80
  Teradata FastExport for sources 31
  tuning, overview 2
  using multiple CPUs 82
persistent cache
  for lookups 52
pipeline partitioning
  optimizing performance 86
  optimizing source databases 89
  optimizing target databases 91
pipelines
  data flow monitoring 97
PowerCenter repository
  optimal location 77
  optimization 77
  performance 77
  performance on DB2 77
pushdown optimization
  performance 64

R
Rank transformation
  See also Transformation Guide
  performance details 95
Repository Service
  caching PowerCenter metadata 78
Repository Service process
  optimal location 77
run time
  thread statistic 5

S
Sequence Generator transformation
  performance 55
sessions
  identify bottlenecks 11
  optimizing 62
  performance tuning 2
  performance with grid 63
shared cache
  for lookups 52
Simple Pass Through mapping
  optimizing 37
single-pass reading
  definition 36
  performance for mappings 36
Sorter transformation
  optimizing with memory allocation 56
  optimizing with partition directories 56
  performance 56
source databases
  optimizing by partitioning 89
Source Qualifier transformation
  performance 57
sources
  identifying bottlenecks 8
  identifying bottlenecks with database queries 9
  identifying bottlenecks with Filter transformation 8
  identifying bottlenecks with read test mapping 8
  optimizing 26
  performance for flat file sources 35
SQL transformation
  optimizing performance 58
staging areas
  removing to improve performance 62, 74
string operations
  minimizing for performance 40
Sybase IQ external loader
  bulk loading 20
Sybase SQL Server
  bulk loading 19
  optimizing 32
system
  identify bottlenecks 12
  performance 80

T
target databases
  optimizing 16
  optimizing by partitioning 91
targets
  identify bottlenecks 7
Teradata FastExport
  performance for sources 31
threads
  busy time 5
  idle time 5
  run time 5
transformations
  eliminating errors 59
  optimizing 46, 97

U
UNIX
  system bottlenecks 13

V
Varchar datatypes
  See also Designer Guide
  removing trailing blanks for optimization 41

W
Workflow Manager
  increasing network packet size 29

X
XML sources
  allocating memory 66