
All workflows begin with the Start task, but you need to instruct the Integration Service which task to run next. To do this, you link tasks in the Workflow Manager.

1. GETTING STARTED
Metadata Extensions tab
Execute SQL to create table
The source qualifier represents the rows that the
Integration Service reads from the source when it
runs a session

When the PowerCenter Integration Service runs workflows, you can monitor workflow progress in the Workflow Monitor.
You can view details about a workflow or task in
either a Gantt chart view or a Task view. You can
start, stop, and abort workflows from the Workflow
Monitor.
The Workflow Monitor displays workflows that have
run at least once.
A transformation is a part of a mapping that
generates or modifies data.
Every mapping includes a Source Qualifier
transformation, representing all data read from a
source and temporarily stored by the Integration
Service.

A session is a set of instructions that tells the Integration Service how to move data from sources to targets.
A session is a task, similar to other tasks available in
the Workflow Manager.
You create a session for each mapping that you want
the Integration Service to run.
The Integration Service uses the instructions
configured in the session and mapping to move
data from sources to targets.
A workflow is a set of instructions that tells the
Integration Service how to execute tasks, such as
sessions, email notifications, and shell commands.
You create a workflow for sessions you want the
Integration Service to run.
You can include multiple sessions in a workflow to
run sessions in parallel or sequentially.
The Integration Service uses the instructions
configured in the workflow to run sessions and
other tasks.

The Advanced Transformation toolbar contains transformations such as Java, SQL, and XML Parser transformations.
Creating Targets - You can manually create a target definition, import the definition for an existing target from a database, or create a relational target from a transformation in the Designer.

**Select a code page for the database connection (DB Connection Objects).
The source code page must be a subset of the target code page; in other words, the target code page must be a superset of the source code page. For example, a 7-bit ASCII source can be loaded into a UTF-8 target, but not the reverse.

Click the Indexes tab to add an index to the target table.
If the target database is Oracle, skip to the final step. You cannot add an index to a column that already has the PRIMARY KEY constraint added to it.
Click Layout > Link Columns.
When you drag ports from one transformation to
another, the Designer copies the port description
and links the original port to its copy.
If you click Layout > Copy Columns, every port you
drag is copied, but not linked.
Link Columns vs. Copy columns??

You can create reusable or non-reusable sessions in the Workflow Manager.
When you create a workflow, you can include
reusable tasks that you create in the Task
Developer.
You can also include non-reusable tasks that you
create in the Workflow Designer.

By default, the Lookup transformation queries and stores the contents of the lookup table before the rest of the transformation runs, so it performs the join through a local copy of the table that it has cached.
If the data types, including precision and scale, of
these two columns do not match, the Designer
displays a message and marks the mapping invalid.

By default, the workflow is scheduled to run on demand. The Integration Service only runs the workflow when you manually start the workflow. You can configure workflows to run on a schedule.

**Overview window
When you run the workflow, the Integration Service
runs all sessions in the workflow, either
simultaneously or in sequence, depending on how
you arrange the sessions in the workflow.

If you do not specify a link condition, the Integration Service executes the next task in the workflow by default.
If the link condition evaluates to True, the Integration Service runs the next task in the workflow. You can also use predefined or user-defined workflow variables in the link condition.
You can use the -- or // comment indicators with the
Expression Editor to add comments. Use comments
to describe the expression
You can view results of link evaluation during
workflow runs in the workflow log
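For illustration, a minimal link condition (the session name s_LoadSales is a placeholder, and the -- comment uses the indicator described above):
$s_LoadSales.Status = SUCCEEDED -- run the next task only if this session succeeded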

Click the Gantt Chart tab at the bottom of the Time window to verify the Workflow Monitor is in Gantt Chart view.
Note: You can also click the Task View tab at the
bottom of the Time window to view the Workflow
Monitor in Task view. You can switch back and forth
between views at any time.

XML Creating sources/targets etc
From the list of source definitions, add the following source definitions to the mapping:
- PROMOTIONS
- ITEMS_IN_PROMOTIONS
- ITEMS
- MANUFACTURERS
- ORDER_ITEMS
Delete all Source Qualifier transformations that the
Designer creates when you add these source
definitions
Add a Source Qualifier transformation named
SQ_AllData to the mapping, and connect all the
source definitions to it.
Sequence Generator transformation properties:
- The starting number (normally 1).
- The current value stored in the repository.
- The number that the Sequence Generator transformation adds to its current value for every request for a new ID.
- The maximum value in the sequence.
- A flag indicating whether the Sequence Generator transformation counter resets to the minimum value once it has reached its maximum value.
The Sequence Generator transformation has two
output ports, NEXTVAL and CURRVAL, which
correspond to the two pseudo-columns in a
sequence. When you query a value from the
NEXTVAL port, the transformation generates a new
value.
You cannot add any new ports to this transformation or reconfigure NEXTVAL and CURRVAL.
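A small worked illustration of the properties above: with a starting number of 1 and an increment of 1, the first three rows that request a value from the NEXTVAL port receive 1, 2, and 3; if the reset flag is enabled, the counter starts over at the minimum value after it reaches the maximum value.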
Defining a Link Condition
After you create links between tasks, you can
specify conditions for each link to determine the
order of execution in the workflow. If you do not specify conditions for each link, the Integration Service executes the next task in the workflow by default.

You can generate and upload node diagnostics to the Configuration Support Manager and review information such as available EBFs and Informatica recommendations.

You can configure more than one node to serve as a gateway.
If the master gateway node becomes unavailable,
the Service Manager on other gateway nodes elects
another master gateway node.
If you configure one node to serve as the gateway
and the node becomes unavailable, the domain
cannot accept service requests

2. INFA ADMIN GUIDE


A domain is a collection of nodes and services that
you can group in folders based on administration
ownership
One node in the domain acts as a gateway to
receive service requests from clients and route
them to the appropriate service and node.
Services and processes run on nodes in a domain.

Service Manager - A service that manages all domain operations.
Application Services - Services that represent
server-based functionality, such as the Model
Repository Service and the Data Integration Service.
The Service Manager and application services
control security.
The Service Manager manages users and groups
that can log in to application clients and
authenticates the users who log in to the
application clients.
The Service Manager and application services
authorize user requests from application clients.
Informatica Administrator (the Administrator tool),
consolidates the administrative tasks for domain
objects such as services, nodes, licenses, and grids.
You manage the domain and the security of the
domain through the Administrator tool

Gateway Nodes - One node acts as the gateway at any given time. That node is called the master gateway.
A gateway node can run application services, and it
can serve as a master gateway node.
The master gateway node is the entry point to the
domain

Nodes - Each node in the domain runs a Service Manager that manages domain operations on that node.
The operations that the Service Manager performs
depend on the type of node.
A node can be a gateway node or a worker node
You can subscribe to alerts to receive notification
about node events such as node failure or a master
gateway election.
You can also generate and upload node diagnostics to the Configuration Support Manager and review information such as available EBFs and Informatica recommendations.

Worker Nodes - A worker node is any node not configured to serve as a gateway.
Service Manager - It runs as a service on Windows and as a daemon on
UNIX.
When you start Informatica services, you start the
Service Manager.
The Service Manager runs on each node.
If the Service Manager is not running, the node is
not available.
Application service support - It starts and stops services and service processes
based on requests from clients.
It also directs service requests to application
services.
The Service Manager uses TCP/IP to communicate
with the application services
Domain support - The functions that the Service Manager performs on
a node depend on the type of node.
For example, the Service Manager running on the
master gateway node performs all domain
functions on that node.
The Service Manager running on any other node
performs some domain functions on that node

Informatica Administrator
Use the Administrator tool to complete the following
types of tasks:
Domain administrative tasks - Manage logs, domain objects, user permissions, and
domain reports.
Generate and upload node diagnostics.
Monitor jobs and applications that run on the Data
Integration Service.
Domain objects include application services, nodes,
grids, folders, database connections, operating
system profiles, and licenses.
Security administrative tasks - Manage users, groups, roles, and privileges.

Application services represent server-based functionality. Application services include the following services:
- Analyst Service
- Content Management Service
- Data Integration Service
- Metadata Manager Service
- Model Repository Service
- PowerCenter Integration Service
- PowerCenter Repository Service
- PowerExchange Listener Service
- PowerExchange Logger Service
- Reporting Service
- SAP BW Service
- Web Services Hub

The Administrator tool has the following tabs:
Domain - View and edit the properties of the domain and objects within the domain.
The contents that appear and the tasks you can
complete on the Domain tab vary based on the
view that you select.
You can select the following views:
Services and Nodes - View and manage application
services, nodes, grids, licenses
Connections - View and manage connections
Logs - View log events for the domain and services
within the domain.
Monitoring - View the status of profile jobs,
scorecard jobs, preview jobs, mapping jobs, and SQL
data services for each Data Integration Service.
Reports - Run a Web Services Report or License
Management Report.
Security - Manage users, groups, roles, and
privileges.

High Availability
High availability consists of the following components:
Resilience - The ability of application services to
tolerate transient network failures until either the
resilience timeout expires or the external system
failure is fixed.
Failover - The migration of an application service or
task to another node when the node running the
service process becomes unavailable.
Recovery - The automatic completion of tasks after
a service is interrupted. Automatic recovery is
available for PowerCenter Integration Service and
PowerCenter Repository Service tasks. You can also
manually recover PowerCenter Integration Service
workflows and sessions. Manual recovery is not part
of high availability

Folder Management
Folders can contain nodes, services, grids, licenses,
and other folders.
User Accounts
Default Administrator - The default administrator is a user account in the
native security domain.
You cannot create a default administrator.
You cannot disable or modify the user name or
privileges of the default administrator.
You can change the default administrator password

Domain Administrator - A domain administrator can create and manage objects in the domain, including user accounts, nodes, grids, licenses, and application services.
However, by default, the domain administrator
cannot log in to application clients.
The default administrator must explicitly give a domain administrator full permissions and privileges to the application services.

Domain privileges - Determine actions on the Informatica domain that users can perform using the Administrator tool and the infacmd and pmrep command line programs.
command line programs.
Analyst Service privilege - Determines actions
that users can perform using Informatica Analyst.
Data Integration Service privilege - Determines actions on applications that users can perform using the Administrator tool and the infacmd command line program. This privilege also determines whether users can drill down and export profile results.
Metadata Manager Service privileges - Determine actions that users can perform using Metadata Manager.
Model Repository Service privilege - Determines
actions on projects that users can perform using
Informatica Analyst and Informatica Developer.
PowerCenter Repository Service privileges - Determine PowerCenter repository actions that
users can perform using the Repository Manager,
Designer, Workflow Manager, Workflow Monitor, and
the pmrep and pmcmd command line programs.
PowerExchange application service privileges - Determine actions that users can perform on the PowerExchange Listener Service and PowerExchange Logger Service using the infacmd pwx commands.
Reporting Service privileges - Determine
reporting actions that users can perform using Data
Analyzer.

Application Client Administrator - An application client administrator can create and manage objects in an application client.
You must create administrator accounts for the
application clients
By default, the application client administrator does
not have permissions or privileges on the domain.
User - A user with an account in the Informatica domain
can perform tasks in the application clients.
Typically, the default administrator or a domain
administrator creates and manages user accounts
and assigns roles, permissions, and privileges in the
Informatica domain.
However, any user with the required domain
privileges and permissions can create a user
account and assign roles, permissions, and
privileges

You assign privileges to users and groups for the domain and for application services.
You assign privileges to users and groups on the Security tab of the Administrator tool.

Understanding Authentication and Security Domains
When a user logs in to an application client, the Service Manager authenticates the user account in the Informatica domain and verifies that the user can use the application client.
The Service Manager uses native and LDAP
authentication to authenticate users logging in to
the Informatica domain.
You can use more than one type of authentication in
an Informatica domain.
By default, the Informatica domain uses native
authentication.
You can configure the Informatica domain to use
LDAP authentication in addition to native
authentication.

High Availability
If you have the high availability option, you can
achieve full high availability of internal Informatica
components.
You can achieve high availability with external
components based on the availability of those
components.
If you do not have the high availability option,
you can achieve some high availability of
internal components
Example
While you are fetching a mapping into the
PowerCenter Designer workspace, the PowerCenter
Repository Service becomes unavailable, and the
request fails. The PowerCenter Repository Service
fails over to another node because it cannot restart
on the same node.

Privileges
Informatica includes the following privileges:
Domain privileges - Determine actions on the Informatica domain that users can perform using the Administrator tool and the infacmd and pmrep command line programs.

The PowerCenter Designer is resilient to temporary failures and tries to establish a connection to the PowerCenter Repository Service.
The PowerCenter Repository Service starts within
the resilience timeout period, and the PowerCenter
Designer reestablishes the connection.
After the PowerCenter Designer reestablishes the
connection, the PowerCenter Repository Service
recovers from the failed operation and fetches the
mapping into the PowerCenter Designer workspace.

Restart services - The Service Manager can restart application services after a failure.
Manual recovery of PowerCenter workflows
and sessions - You can manually recover
PowerCenter workflows and sessions.
Multiple gateway nodes - You can configure
multiple nodes as gateway.
Note: You must have the high availability option for
failover and automatic recovery
You can configure the following resilience properties
for the domain, application services, and command
line programs:
Resilience timeout - The amount of time a client
tries to connect or reconnect to a service. A limit on
resilience timeouts can override the timeout.
Limit on resilience timeout - The amount of time
a service waits for a client to connect or reconnect
to the service. This limit can override the client
resilience timeouts configured for a connecting
client. This is available for the domain and
application services

Resilience - All clients of PowerCenter components are resilient to service failures.
PowerCenter services may also be resilient to
temporary failures of external systems, such as
database systems, FTP servers, and message
queue sources.
For this type of resilience to work, the external
systems must be highly available
Internal Resilience
You can configure internal resilience at the following
levels:
- Domain
- Application Services
- Gateway

Configuring Service Resilience for the Domain
The domain resilience timeout determines how long services try to connect as clients to other services.
The default value is 30 seconds.
The limit on resilience timeout is the maximum
amount of time that a service allows another
service to connect as a client.
This limit overrides the resilience timeout for the
connecting service if the resilience timeout is a
greater value.
The default value is 180 seconds.

The Model Repository Service, Data Integration Service, and Analyst Service do not have internal resilience.
If the master gateway node becomes unavailable
and fails over to another gateway node, you must
restart these services.
After the restart, the services do not restore the
state of operation and do not recover from the
point of interruption.
You must restart jobs that were previously running
during the interruption

The PowerCenter Client resilience timeout is 180 seconds and is not configurable.
This resilience timeout is bound by the service limit
on resilience timeout.

High Availability in the Base Product
Informatica provides some high availability functionality that does not require the high availability option.
The base product provides the following high
availability functionality:
Internal PowerCenter resilience - The Service
Manager, application services, PowerCenter Client,
and command line programs are resilient to
temporary unavailability of other PowerCenter
internal components.
PowerCenter Repository database resilience - The PowerCenter Repository Service is resilient to temporary unavailability of the repository database.

When you use a command line program to connect to the domain or an application service, the resilience timeout is determined by one of the following values:
Command line option - You can determine the
resilience timeout for command line programs by
using a command line option, -timeout or -t, each
time you run a command
Environment variable - If you do not use the timeout option in the command line syntax, the command line program uses the value of the INFA_CLIENT_RESILIENCE_TIMEOUT environment variable that is configured on the client machine.
Default value - If you do not use the command line
option or the environment variable, the command
line program uses the default resilience timeout of
180 seconds.
Limit on timeout - If the limit on resilience timeout
for the service is smaller than the command line
resilience timeout, the command line program uses
the limit as the resilience timeout
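As a sketch of that precedence (MyIntService, MyDomain, and the credentials reuse the placeholder values shown with the pmcmd examples later in these notes, and the exact placement of -timeout among the other options is an assumption):
export INFA_CLIENT_RESILIENCE_TIMEOUT=120
pmcmd startworkflow -sv MyIntService -d MyDomain -u seller3 -p jackson -f SalesEast -timeout 60 wf_SalesAvg
Here the -timeout value (60 seconds) is used instead of the environment variable (120 seconds), and either value is capped by the service limit on resilience timeout when that limit is smaller.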

To move data from sources to targets, the PowerCenter Integration Service uses the following components:
PowerCenter Integration Service process - The
PowerCenter Integration Service starts one or more
PowerCenter Integration Service processes to run
and monitor workflows. When you run a workflow,
the PowerCenter Integration Service process starts
and locks the workflow, runs the workflow tasks, and
starts the process to run sessions.
Load Balancer - The PowerCenter Integration
Service uses the Load Balancer to dispatch tasks.
The Load Balancer dispatches tasks to achieve
optimal performance. It may dispatch tasks to a
single node or across the nodes in a grid.
Data Transformation Manager (DTM) process - The PowerCenter Integration Service starts a DTM
process to run each Session and Command task
within a workflow. The DTM process performs
session validations, creates threads to initialize the
session, read, write, and transform data, and
handles pre- and post- session operations.
The PowerCenter Integration Service process accepts
requests from the PowerCenter Client and from
pmcmd and performs the following tasks:
- Manage workflow scheduling.
- Lock and read the workflow.
- Read the parameter file.
- Create the workflow log.
- Run workflow tasks and evaluate the conditional
links connecting tasks.
- Start the DTM process or processes to run the
session.
- Write historical run information to the repository.
- Send post-session email in the event of a DTM
failure.
Thread Types
The types of threads the master thread creates
depend on the pre- and post-session properties, as
well as the types of transformations in the
mapping.
The master thread can create the following types of
threads:
Mapping threads - The master thread creates one mapping thread for
each session.
The mapping thread fetches session and mapping
information, compiles the mapping, and cleans up
after session execution
Pre- and post-session threads -

PowerCenter Integration Service
PowerCenter Integration Service files include run-time files, state of operation files, and session log files.
The PowerCenter Integration Service creates files to
store the state of operations for the service.
The state of operations includes information such as
the active service requests, scheduled tasks, and
completed and running processes.
If the service fails, the PowerCenter Integration
Service can restore the state and recover
operations from the point of interruption.

The master thread creates one pre-session and one post-session thread to perform pre- and post-session operations.
Reader threads
Transformation threads
The number of transformation threads depends on
the partitioning information for each pipeline
Writer threads

Cache Files
When the PowerCenter Integration Service process
creates memory cache, it also creates cache files.
The PowerCenter Integration Service process creates
cache files for the following mapping objects:
- Aggregator transformation
- Joiner transformation
- Rank transformation
- Lookup transformation
- Sorter transformation
- XML target

Reject Files
By default, the PowerCenter Integration Service
process creates a reject file for each target in the
session.
The writer may reject a row in the following
circumstances:
- It is flagged for reject by an Update Strategy or
Custom transformation.
- It violates a database constraint such as primary
key constraint.
- A field in the row was truncated or overflowed, and
the target database is configured to reject truncated
or overflowed data.
Note: If you enable row error logging, the
PowerCenter Integration Service process does not
create a reject file.

By default, the DTM creates the index and data files for Aggregator, Rank, Joiner, and Lookup transformations and XML targets in the directory configured for the $PMCacheDir service process variable.
The PowerCenter Integration Service process names
the index file PM*.idx, and the data file PM*.dat.
The PowerCenter Integration Service process creates
the cache file for a Sorter transformation in the
$PMTempDir service process variable directory.
Incremental Aggregation Files
If the session performs incremental aggregation, the PowerCenter Integration Service process saves index and data cache information to disk when the session finishes.
The next time the session runs, the PowerCenter Integration Service process uses this historical information to perform the incremental aggregation.
By default, the DTM creates the index and data files in the directory configured for the $PMCacheDir service process variable.

Recovery Tables Files


The PowerCenter Integration Service process creates
recovery tables on the target database system
when it runs a session enabled for recovery.

Control File
When you run a session that uses an external
loader, the PowerCenter Integration Service process
creates a control file and a target flat file.
The control file contains information about the
target flat file such as data format and loading
instructions for the external loader.
The control file has an extension of .ctl.
The PowerCenter Integration Service process creates
the control file and the target flat file in the
PowerCenter Integration Service variable directory,
$PMTargetFileDir, by default.

Persistent Lookup Cache


By default, the DTM creates the index and data files
in the directory configured for the $PMCacheDir
service process variable
Load Balancer
You configure the following settings for the domain
to determine how the Load Balancer dispatches
tasks:
Dispatch mode - The dispatch mode determines
how the Load Balancer dispatches tasks. You can
configure the Load Balancer to dispatch tasks in a
simple round-robin fashion, in a round-robin fashion
using node load metrics, or to the node with the
most available computing resources.

Indicator File
If you use a flat file as a target, you can configure
the PowerCenter Integration Service to create an
indicator file for target row type information.
For each target row, the indicator file contains a
number to indicate whether the row was marked for
insert, update, delete, or reject.
The PowerCenter Integration Service process names this file target_name.ind and stores it in the PowerCenter Integration Service variable directory, $PMTargetFileDir, by default.

Service level - Service levels establish dispatch priority among tasks that are waiting to be dispatched. You can create different service levels that a workflow developer can assign to workflows.
You configure the following Load Balancer settings
for each node:
Resources - When the PowerCenter Integration
Service runs on a grid, the Load Balancer can
compare the resources required by a task with the
resources available on each node. The Load
Balancer dispatches tasks to nodes that have the
required resources. You assign required resources in
the task properties. You configure available
resources using the Administrator tool or infacmd.
CPU profile - In adaptive dispatch mode, the Load
Balancer uses the CPU profile to rank the computing
throughput of each CPU and bus architecture in a
grid. It uses this value to ensure that more powerful
nodes get precedence for dispatch.
Resource provision thresholds - The Load
Balancer checks one or more resource provision
thresholds to determine if it can dispatch a task. The
Load Balancer checks different thresholds depending
on the dispatch mode

USING PMCMD
pmcmd is a program you use to communicate with
the Integration Service.
With pmcmd, you can perform some of the tasks
that you can also perform in the Workflow Manager,
such as starting and stopping workflows and
sessions.

The Load Balancer uses the following dispatch modes:
Round-robin - The Load Balancer dispatches tasks
to available nodes in a round-robin fashion. It checks
the Maximum Processes threshold on each available
node and excludes a node if dispatching a task
causes the threshold to be exceeded. This mode is
the least compute-intensive and is useful when the
load on the grid is even and the tasks to dispatch
have similar computing requirements.
Metric-based - The Load Balancer evaluates nodes
in a round-robin fashion. It checks all resource
provision thresholds on each available node and
excludes a node if dispatching a task causes the
thresholds to be exceeded. The Load Balancer
continues to evaluate nodes until it finds a node that
can accept the task. This mode prevents overloading
nodes when tasks have uneven computing
requirements.
Adaptive - The Load Balancer ranks nodes
according to current CPU availability. It checks all
resource provision thresholds on each available
node and excludes a node if dispatching a task
causes the thresholds to be exceeded. This mode
prevents overloading nodes and ensures the best
performance on a grid that is not heavily loaded.

Use pmcmd in the following modes:


Command line mode - You invoke and exit pmcmd
each time you issue a command. You can write
scripts to schedule workflows with the command line
syntax. Each command you write in command line
mode must include connection information to the
Integration Service.
Interactive mode - You establish and maintain an
active connection to the Integration Service. This
lets you issue a series of commands.
You can use environment variables for user names
and passwords with pmcmd.
You can also use environment variables to customize
the way pmcmd displays the date and time on the
machine running the Integration Service process.
Before you use pmcmd, configure these variables on
the machine running the Integration Service
process.
The environment variables apply to pmcmd
commands that run on the node.
Note: If the domain is a mixed-version domain, run
pmcmd from the installation directory of the
Integration Service version

Running Commands in Command Line Mode


When you run pmcmd in command line mode, you
enter connection information such as domain
name, Integration Service name, user name and
password in each command.
For example, to start the workflow wf_SalesAvg in
folder SalesEast, use the following syntax:
pmcmd startworkflow -sv MyIntService -d MyDomain -u seller3 -p jackson -f SalesEast wf_SalesAvg

Running Commands in Interactive Mode


Use pmcmd in interactive mode to start and stop
workflows and sessions without writing a script.
When you use the interactive mode, you enter
connection information such as domain name,
Integration Service name, user name, and
password. You can run subsequent commands
without entering the connection information for
each command.
For example, the following commands invoke the
interactive mode, establish a connection to
Integration Service MyIntService, and start
workflows wf_SalesAvg and wf_SalesTotal in
folder SalesEast:

Interactive mode - You can issue pmrep commands from an interactive prompt. pmrep does not exit after it completes a command.
You can use environment variables to set user
names and passwords for pmrep. Before you use
pmrep, configure these variables.
The environment variables apply to pmrep
commands that run on the node.
All pmrep commands require a connection to the
repository except for the following commands:
- Help
- ListAllPrivileges
Use the pmrep Connect command to connect to the
repository before using other pmrep commands.
Note: If the domain is a mixed-version domain, run
pmrep from the installation directory of the
Repository Service version.

pmcmd
pmcmd> connect -sv MyIntService -d MyDomain -u seller3 -p jackson
pmcmd> setfolder SalesEast
pmcmd> startworkflow wf_SalesAvg
pmcmd> startworkflow wf_SalesTotal
USING PMREP
pmrep is a command line program that you use to update repository information and perform repository functions.
pmrep is installed in the PowerCenter Client and PowerCenter Services bin directories.
Use pmrep to perform repository administration tasks such as listing repository objects, creating and editing groups, restoring and deleting repositories, and updating session-related parameters and security information in the PowerCenter repository.
When you use pmrep, you can enter commands in
the following modes:
Command line mode - You can issue pmrep
commands directly from the system command line.
Use command line mode to script pmrep commands.
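A minimal sketch of a scripted pmrep session (the repository name and credentials are placeholders, and the option letters are recalled from the pmrep reference rather than taken from these notes):
pmrep connect -r MyRepository -d MyDomain -n seller3 -x jackson
pmrep listobjects -o folder
The connect command establishes the repository connection described above, and later commands such as listobjects then run against that repository.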

3. DESIGNER GUIDE
The Designer provides the following tools:
Source Analyzer - Import or create source definitions
for flat file, XML, COBOL, Application, and relational
sources.
Target Designer - Import or create target definitions.
Transformation Developer - Create reusable
transformations.
Mapplet Designer - Create mapplets.
Mapping Designer - Create mappings.
The Designer consists of the following windows:
Navigator - Connect to multiple repositories and folders. You can also copy and delete objects and create shortcuts using the Navigator.
Workspace - View or edit sources, targets, mapplets, transformations, and mappings. You work with a single tool at a time in the workspace, which has two formats: default and workbook. You can view multiple versions of an object in the workspace.
Status bar - Displays the status of the operation you perform.
Output - Provides details when you perform certain tasks, such as saving work or validating a mapping. Right-click the Output window to access window options, such as printing output text, saving text to file, and changing the font size.


Overview - View workbooks that contain large mappings or a lot of objects. The Overview window outlines the visible area in the workspace and highlights selected objects in color. To open the Overview window, click View > Overview Window.
Instance Data - View transformation data while you run the
Debugger to debug a mapping.
Target Data - View target data while you run the Debugger to
debug a mapping.
You can view a list of open windows, and you can switch from
one window to another in the Designer. To view the list of open
windows, click Window > Windows.
**Configuring Designer options

Creating a Toolbar
You can create a new toolbar and choose buttons for
the new toolbar.
You can create toolbars in the Designer, Workflow
Manager, and the Workflow Monitor.
Find Next
Use the Find Next tool to search for a column or port
name in:
- Transformations
- Mapplets
- Source definitions
- Target definitions
With the Find Next tool, you can search one object at
a time.
You cannot search multiple objects at the same
time.
Use Find Next in each Designer tool.
Select a single transformation or click in the Output
window before performing the search.
The Designer saves the last 10 strings searched in
the Find Next box on the Standard toolbar
You can search for a string in the Save, Generate, or
Validate tabs in the Output window.
The Find in Workspace tool searches for a field name
or transformation name in all transformations in the
workspace.
The Find in Workspace tool lets you search all of the transformations in the workspace for port or transformation names.
You can search for column or port names or table
names matching the search string.
You can specify whether to search across all names
in the workspace, or across the business name of a
table, column, or port.
You can also choose to search for whole word
matches for the search string or matches which
match the case of the search string
You can complete the following tasks in each Designer
tool:

- Add a repository.
- Print the workspace.
- View date and time an object was last saved.
- Open and close a folder.
- Create shortcuts (You cannot create shortcuts to
objects in non-shared folders)
- Check out and in repository objects.
- Search for repository objects.
- Enter descriptions for repository objects.
- View older versions of objects in the workspace.
- Revert to a previously saved object version.
- Copy objects.
- Export and import repository objects.
- Work with multiple objects, ports, or columns.
- Rename ports.
- Use shortcut keys.
You can also view object dependencies in the
Designer.
Rules and Guidelines for Viewing and Comparing
Versioned Repository Objects
You cannot simultaneously view multiple versions of
composite objects, such as mappings and
mapplets.
Older versions of composite objects might not
include the child objects that were used when the
composite object was checked in.
If you open a composite object that includes a child
object version that is purged from the repository,
the preceding version of the child object appears in
the workspace as part of the composite object.
For example, you want to view version 5 of a
mapping that originally included version 3 of a
source definition, but version 3 of the source
definition is purged from the repository. When you
view version 5 of the mapping, version 2 of the
source definition appears as part of the mapping.
Shortcut objects are not updated when you modify
the objects they reference. When you open a
shortcut object, you view the same version of the
object that the shortcut originally referenced, even
if subsequent versions exist.
Viewing an Older Version of a Repository Object
To open an older version of an object in the
workspace:
1. In the workspace or Navigator, select the object
and click Versioning > View History.
2. Select the version you want to view in the
workspace and click Tools > Open in Workspace.
Note: An older version of an object is read-only, and
the version number appears as a prefix before the
object name. You can simultaneously view multiple versions of a non-composite object in the workspace.
Reverting to a Previous Object Version
When you edit an object in the Designer, you can
revert to a previously saved version, undoing
changes you entered since the last save.
You can revert to the previously saved versions of
multiple objects at the same time.
To revert to a previously saved version of an object:
1. Open the object in the workspace.
2. Select the object and click Edit > Revert to
Saved.
3. Click Yes. If you selected more than one object,
click Yes to All.
The Designer removes all changes entered since the
last time you saved the object.
Copying Designer Objects
You can copy Designer objects within the same
folder, to a different folder, or to a different
repository.
You can copy any of the Designer objects such as
sources, targets, mappings, mapplets,
transformations, and dimensions.
You must open the target folder before you can copy
objects to it.
The Copy Wizard checks for conflicts in the target
folder and provides choices to resolve the conflicts.
The Copy Wizard displays possible resolutions.
For a duplicate object, you can rename, reuse,
replace, or skip copying the object.
To configure display settings and functions of the
Copy Wizard, click Tools > Options in the Designer.
You can import objects from an XML file through the
Import Wizard in the Designer.
The Import Wizard provides the same options to
resolve conflicts as the Copy Wizard
Working with Multiple Ports or Columns
In all Designer tools, you can move or delete
multiple ports or columns at the same time.
Note: You cannot select multiple ports or columns
when editing COBOL sources in the Source Analyzer
Note: When you select multiple ports or columns,
the Designer disables add, copy, and paste
Working with Metadata Extensions
You can extend the metadata stored in the
repository by associating information with
individual repository objects.

For example, you may want to store contact


information with the sources you create.
You associate information with repository objects
using metadata extensions.
Repository objects can contain both vendor-defined
and user-defined metadata extensions.
You can view and change the values of vendor-defined metadata extensions, but you cannot create, delete, or redefine them.
You can create, edit, delete, and view user-defined
metadata extensions and change their values.
You can create metadata extensions for the
following objects in the Designer:
- Source definitions
- Target definitions
- Transformations
- Mappings
- Mapplets
You can create either reusable or non-reusable
metadata extensions.
You associate reusable metadata extensions with all
repository objects of a certain type, such as all
source definitions or all Expression transformations.
You associate non-reusable metadata extensions
with a single repository object, such as one target
definition or one mapping.
If you create a reusable metadata extension for a
transformation, the metadata extension applies to
all transformations of that type (for example, all
Aggregator transformations or all Router
transformations), and not to all transformations.
Note: If you make a metadata extension reusable,
you cannot change it back to non-reusable. The
Designer makes the extension reusable as soon as
you confirm the action.
Editing Reusable Metadata Extensions
If the metadata extension you want to edit is
reusable and editable, you can change the value of
the metadata extension, but not any of its
properties. However, if the vendor or user who
created the metadata extension did not make it
editable, you cannot edit the metadata extension or
its value.
To restore the default value for a metadata
extension, click Revert in the UnOverride column.
Editing Non-Reusable Metadata Extensions
If the metadata extension you want to edit is non-reusable, you can change the value of the metadata extension and its properties.
You can also promote the metadata extension to a
reusable metadata extension.
To restore the default value for a metadata
extension, click Revert in the UnOverride column.


Using Business Names


You can add business names to sources, targets, and
columns.
Business names are descriptive names that you give
to a source, target, or column.
They appear in the Navigator in the Business
Components source node and in the source and
target nodes.
Business names can also appear as column names
of the source and target definition in the
workspace.
You can also create source qualifiers to display
business names as column names in the Mapping
and Mapplet Designers.
Using Business Documentation
Business documentation provides details about a
repository object or transformation expression.
You can create and edit links to business
documentation that you have developed for
repository objects through the Designer.
The documentation must reside on a local machine,
network server, or company intranet or internet
web site in a Windows environment.
You can develop business documentation in HTML,
PDF, or any text format, for the following repository
objects:
- Source and target tables and table instances
- All transformations and transformation instances
- Mapplets
- Mappings
- Business component directories
To access business documentation, you need to
complete the following tasks:
- Specify the documentation path in the Designer.
- Create a link in the repository object.
- Click the link to view the documentation
Viewing Mapplet and Mapping Reports
You can view PowerCenter Repository Reports for
mappings and mapplets in the Designer.
View reports to get more information about the
sources, targets, ports, and transformations in
mappings and mapplets.
When you view a report, the Designer launches the
Data Analyzer application in a browser window and
displays the report.

Before you run reports from the Designer, create a


Reporting Service in the PowerCenter domain that
contains the PowerCenter repository.
When you create a Reporting Service for a
PowerCenter repository, Data Analyzer imports the
PowerCenter Repository Reports.
Viewing a Mapplet Composite Report
The Mapplet Composite Report includes information
about a mapplet:
- All objects. Information about all objects in the
mapplet.
- Lookup transformations. Lookup transformations
in the mapplet.
- Dependencies. Mappings that use the mapplet.
- Ports. Port details for the input and output ports.
- Sources. Source instances in the mapplet.
- Transformations. Transformations used in the
mapplet.
To view a Mapplet Composite Report:
1. In the Designer, open a mapplet.
2. Right-click in the workspace and choose View
Mapplet Report
Viewing a Mapping Composite Report
View a mapping report to get more information
about the objects in a PowerCenter mapping.
The Mapping Composite Report includes information about the following components in the mapping:
- Source and target fields. Fields used in mapping
sources.
- Port connections. Port-level connections between
objects.
- Transformation ports. Transformation ports for
each transformation in the mapping.
- Unconnected ports. Unconnected ports in
mapping objects.
- Object-level connections. Connections between all
objects in the mapping.
To view a Mapping Composite Report:
1. In the Designer, open a mapping.
2. Right-click in the workspace and choose View
Mapping Report.
You can import or create the following types
of source definitions in the Source Analyzer:
- Relational tables, views, and synonyms
- Fixed-width and delimited flat files that do
not contain binary data.
- COBOL files
- XML files
- Web Services Description Language (WSDL)
- Data models using certain data modeling tools through Metadata Exchange for Data Models (an add-on product)
You can view the following reports:


- Mapplet Composite Report
- Mapping Composite Report



You can import sources that use multibyte character
sets.
Source code pages must be a subset of the target code pages.
Source definitions can be single- or multi-group.
A single-group source has a single group in the
source definition.
Relational sources use a single-group source
definition.
A multi-group source has multiple groups in the
source definition.
Non-relational sources such as XML sources use
multi-group source definitions.
Editing Relational Source Definitions
You might want to manually edit a source definition
to record properties that you cannot import from
the source.
You can edit a relational source definition to create
key columns and key relationships.
These relationships can be logical relationships.
They do not have to exist in the database.
Working with COBOL Sources
To provide support for mainframe source data, you
can import a COBOL file as a source definition in
the Designer.
COBOL files are fixed-width files that may contain
text and binary data.
PowerCenter supports the following code pages for
COBOL files:
- 7-bit ASCII
- EBCDIC-US
- 8-bit ASCII
- 8-bit EBCDIC
- ASCII-based MBCS
- EBCDIC-based MBCS
You can import shift-sensitive COBOL files that do
not contain shift keys.
Define the shift states for each column in the COBOL
source definition.
COBOL sources often de-normalize data and
compact the equivalent of separate table records
into a single record.
You use the Normalizer transformation to normalize
these records in the mapping.
COBOL files often represent the functional
equivalent of multiple source tables within the
same set of records.

When you review the structure of the COBOL file, you can adjust the description to identify which groups of fields constitute a single pseudo-table.
Working with COBOL Copybooks
The Designer cannot recognize a COBOL copybook
(.cpy file) as a COBOL file (.cbl file) because it lacks
the proper format.
To import a COBOL copybook in the Designer, you
can insert it into a COBOL file template by using the
COBOL statement copy.
After you insert the copybook file into the COBOL file
template, you can save the file as a .cbl file and
import it in the Designer.
If the .cbl file and the .cpy file are not in the same
local directory, the Designer prompts for the
location of the .cpy file.
When the COBOL copybook file contains tabs, the
Designer expands tabs into spaces.
By default, the Designer expands a tab character
into eight spaces.
You can change this default setting in powrmart.ini.
You can find powrmart.ini in the root directory of the
PowerCenter Client installation.
To change the default setting, add the following text
to powrmart.ini:
[AnalyzerOptions]
TabSize=n
where n is the number of spaces the Designer
reads for every tab character.
To apply changes, restart the Designer.
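With TabSize=4, for instance, the Designer reads four spaces for every tab character in the copybook file.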
For example, the COBOL copybook file is called
sample.cpy. The COBOL file below shows how to
use the copy statement to insert the sample
copybook into a COBOL file template:
identification division.
program-id. mead.
environment division.
select file-one assign to "fname".
data division.
file section.
fd FILE-ONE.
copy sample.cpy.
working-storage section.
procedure division.
stop run.
Components in a COBOL Source File
When you import a COBOL source, the Designer
scans the file for the following components:
- FD Section


- Fields
- OCCURS
- REDEFINES
FD Section
The Designer assumes that each FD entry defines
the equivalent of a source table in a relational
source and creates a different COBOL source
definition for each such entry.
For example, if the COBOL file has two FD entries,
CUSTOMERS and ORDERS, the Designer creates
one COBOL source definition containing the fields
attributed to CUSTOMERS, and another with the
fields that belong to ORDERS
Fields
The Designer identifies each field definition, reads
the datatype, and assigns it to the appropriate
source definition.
OCCURS
COBOL files often contain multiple instances of the
same type of data within the same record.
For example, a COBOL file may include data about
four different financial quarters, each stored in the
same record.
When the Designer analyzes the file, it creates a
different column for each OCCURS statement in the
COBOL file.
These OCCURS statements define repeated
information in the same record. Use the Normalizer
transformation to normalize this information.
For each OCCURS statement, the Designer creates
the following items:
- One target table when you drag the COBOL source
definition into the Target Designer.
- A primary-foreign key relationship
- A generated column ID (GCID)
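A minimal sketch of a record layout with an OCCURS clause (the field names are hypothetical, following the four-quarters example above):
01 SALES-RECORD.
   05 STORE-ID       PIC 9(5).
   05 QUARTER-SALES  PIC 9(7)V99 OCCURS 4 TIMES.
Here QUARTER-SALES is the repeated data described above: dragging the source definition into the Target Designer produces a separate target table for it, with a primary-foreign key relationship back to the parent record and a generated column ID (GCID), and the Normalizer transformation turns the four occurrences into separate rows.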

REDEFINES
COBOL uses REDEFINES statements to build the
description of one record based on the definition of
another record.
When you import the COBOL source, the Designer
creates a single source that includes REDEFINES.
The REDEFINES statement lets you specify multiple PICTURE clauses for the same physical data location.
Therefore, you need to use Filter transformations to
separate the data into the tables created by
REDEFINES.
For each REDEFINES:
- The Designer creates one target table when you drag the COBOL source definition into the Target Designer.
- The Designer creates one primary-foreign key
relationship.
- The Designer creates a generated key (GK).
- You need a separate Filter transformation in the
mapping.
Rules and Guidelines for Delimited File Settings
Delimited files are character-oriented and line
sequential. Use the following rules and guidelines when
you configure delimited files:
- The column and row delimiter character, quote
character, and escape character must all be different
for a source definition. These properties must also be
contained in the source or target file code page.
- The escape character and delimiters must be valid in
the code page of the source or target file. Use the
following rules and guidelines when you configure
delimited file sources:
- In a quoted string, use the escape character to
escape the quote character. If the escape character
does not immediately precede a quote character, the
Integration Service reads the escape character as an
ordinary character.
- Use an escape character to escape the column
delimiter. However, in a quoted string, you do not need
to use an escape character to escape the delimiter
since the quotes serve this purpose. If the escape
character does not immediately precede a delimiter
character, the Integration Service reads the escape
character as an ordinary character.
- When two consecutive quote characters appear within a quoted string, the Integration Service reads them as one quote character. For example, the Integration Service reads the following quoted string as I'm going tomorrow: 2353,'I''m going tomorrow'MD
- The Integration Service reads a string as a quoted
string only if the quote character you select is the first
character of the field.
- If the field length exceeds the column size defined in
the Source Qualifier transformation, the Integration
Service truncates the field.
- If the row of data exceeds the larger of the line
sequential buffer length or the total row size defined in
the Source Qualifier transformation, the Integration
Service drops the row and writes it to the session log
file. To determine the row size defined in the Source
Qualifier transformation, add the column precision and
the delimiters, and then multiply the total by the
maximum bytes per character.
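As a worked instance of that row-size calculation (the precisions and code page width are made-up values): a row with three columns of precision 10, 20, and 35 plus 2 column delimiters totals 10 + 20 + 35 + 2 = 67 characters; with a code page of at most 3 bytes per character, the row size is 67 x 3 = 201 bytes, and a row is dropped only if it exceeds the larger of this value and the line sequential buffer length.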
Working with File Lists

15

A file list is a file that contains the names and


directories of each source file you want the
Integration Service to use.
When you configure a session to read a file list, the
Integration Service reads rows of data from the
different source files in the file list.
To configure the mapping to write the source
file name to each target row, add the
CurrentlyProcessedFileName port to the flat
file source definition.
The Integration Service uses this port to return the
source file name.
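A sketch of what such a file list might contain (the directories and file names are hypothetical), with one source file per line:
/data/sales/east_region.csv
/data/sales/west_region.csv
/data/sales/central_region.csv
When the session reads this list, the Integration Service reads rows from each file in turn, and the CurrentlyProcessedFileName port described above returns the name of the file that each row came from.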
Creating Target Definitions
You can create the following types of target
definitions in the Target Designer:
Relational - Create a relational target for a
particular database platform. Create a relational
target definition when you want to use an external
loader to the target database.
Flat file - Create fixed-width and delimited flat file
target definitions.
XML file - Create an XML target definition to
output data to an XML file.
You can create target definitions in the following
ways:
- Import the definition for an existing target. Import
the target definition from a relational target or a
flat file. The Target Designer uses a Flat File Wizard
to import flat files.
- Create a target definition based on a source
definition. Drag a source definition into the Target
Designer to make a target definition.
- Create a target definition based on a
transformation or mapplet. Drag a transformation
into the Target Designer to make a target definition.
- Manually create a target definition. Create a
target definition in the Target Designer.
- Design several related target definitions. Create
several related target definitions at the same time.
You can create the overall relationship, called a
schema, and the target definitions, through wizards
in the Designer.
The Cubes and Dimensions Wizards follow common
principles of data warehouse design to simplify the
process of designing related targets.
Creating a Target Definition from a Transformation
To create a relational target definition that closely matches a transformation in the repository, you can create the target from the transformation.
Drag a transformation from the Navigator to the
Target Designer, or create a target from a
transformation in the Mapping Designer workspace.

Create target definitions from the following types of transformations:
- Single-group transformations. Create a single
target definition from a transformation with one
output group.
- Multiple-group transformations. Create multiple
target definitions from a transformation with
multiple output groups.
- Normalizer transformations. Create a target
definition from a source qualifier or pipeline
Normalizer transformation.
- Mapplets. Create one or more target definitions
from a mapplet instance in a mapping.
When you create a target definition from a
transformation, the target database type is the
same as the repository database by default.
After you create the target definition in the
repository, you can edit it. For example, you might
want to change the target type.
Creating a Target from a Transformation with One
Output Group
When you create a target from a transformation with
one output group, the Designer creates one target.
All the output ports become input ports in the
target. The name of the target is the same as the
transformation name.
Creating a Target from a Transformation with Multiple
Output Groups
When you create targets from a transformation with
more than one output group, the Designer creates
one target for each output group in the
transformation.
When the transformation is a plug-in or Custom
transformation, the Designer retains the primary
key-foreign key relationships between the groups in
the target definitions.
Creating a Target from a Normalizer Transformation
You can create a target from a source qualifier or
pipeline Normalizer transformation.
When you create a target from a Normalizer
transformation, the Designer creates one
target and includes all the columns from the
Normalizer.
It does not create separate targets to represent the
record hierarchies or multiple-occurring fields in the
Normalizer transformation
Creating a Target from a Mapplet
You can create a target from a mapplet that is under
a mapping Transformation Instances node.


When you drag the mapplet instance to the Target Designer, the Designer creates a target for each output group in the mapplet.
Note: You cannot create a target when you drag a
transformation instance from a mapplet to the
Target Designer.
Data Transformation Source and Target
Use a Data Transformation source or target to
process data in any file format such as Excel
spreadsheets or PDF documents.
You can also transform data in formats such as HL7,
EDI-X12, EDIFACT, SWIFT, NACHA, FIXBAI2, and
DTCC.
A Data Transformation source or target calls a Data
Transformation service from a PowerCenter session.
Data Transformation is the application that
transforms the file formats.
The Data Transformation service is a Data
Transformation project that is deployed to the Data
Transformation repository and is ready to run.
Data Transformation Service Types
When you create a project in Data Transformation
Studio, you choose a Data Transformation service
type to define the project. Data Transformation has
the following types of services that transform data:
Parser - Converts source documents to XML. The
input can have any format. The output of a parser
is always XML.
Serializer - Converts an XML file to another
document. The input is XML. The output can be any
format.
Mapper - Converts an XML source document to
another XML structure or schema. The input is XML.
The output is XML.
Transformer - Modifies the data in any format.
Adds, removes, converts, or changes text. Use
transformers with a parser, mapper, or serializer.
You can also run a transformer as standalone
component.
Streamer - Splits large input documents, such as
multiple gigabyte data streams, into segments. The
Streamer splits documents that have multiple
messages or multiple records in them.
MAPPINGS
Object Dependency
Some objects in a mapping are also stored as
independent objects in the repository:
- Sources
- Targets
- Reusable transformations
- Mapplets
The mapping is dependent on these objects.
When this metadata changes, the Designer and
other PowerCenter Client applications track the
effects of these changes on mappings.
In these cases, you may find that mappings become
invalid even though you do not edit the mapping.
When a mapping becomes invalid, the Integration
Service cannot run it properly, and the Workflow
Manager invalidates the session.
The only objects in a mapping that are not
stored as independent repository objects are
the non-reusable transformations that you
build within the mapping.
These non-reusable transformations are
stored within the mapping only.
Exporting and Importing a Mapping
You export a mapping to an XML file and import a
mapping from an XML file through the Designer.
You might want to use the export and import feature
to copy a mapping to the same repository, a
connected repository, or a repository to which you
cannot connect
Invalidating Sessions
When you edit and save a mapping, some changes
cause the session to be invalid even though the
mapping remains valid.
The Integration Service does not run invalid
sessions. If you edit a mapping, the Designer
invalidates sessions when you perform the
following actions:
- Add or remove sources or targets.
- Remove mapplets or transformations.
- Replace a source, target, mapplet, or
transformation when importing or copying objects
- Add or remove Source Qualifiers or COBOL
Normalizers, or change the list of associated
sources for these transformations.
- Add or remove a Joiner or Update Strategy
transformation.
- Add or remove transformations from a mapplet in
the mapping.
- Change the database type for a source or target.
Deleting a Mapping
You may delete mappings that you no longer use.
When you delete a mapping, you do not delete any
sources, targets, mapplets, or reusable
transformations defined outside the mapping.
Note: If you enable version control, a deleted
mapping remains checked out until you check
it in. To check in a deleted mapping, click
Versioning > Find Checkouts. Select the
deleted mapping and click Tools > Check In.
Rules and Guidelines for Connecting Mapping Objects
- If the Designer detects an error when you try to
link ports between two mapping objects, it displays
a symbol indicating that you cannot link the ports.
- Follow the logic of data flow in the mapping. You
can link the following types of ports:
- The receiving port must be an input or
input/output port.
- The originating port must be an output or
input/output port.
- You cannot link input ports to input ports or output
ports to output ports.
- You must link at least one port of an input
group to an upstream transformation.
- You must link at least one port of an output
group to a downstream transformation.
- You can link ports from one active transformation
or one output group of an active transformation to
an input group of another transformation.
- You cannot connect an active
transformation and a passive transformation
to the same downstream transformation or
transformation input group.
- You cannot connect more than one active
transformation to the same downstream
transformation or transformation input
group.
- You can connect any number of passive
transformations to the same downstream
transformation, transformation input group,
or target
- You can link ports from two output groups in
the same transformation to one Joiner
transformation configured for sorted data if
the data from both output groups is sorted.
- You can only link ports with compatible
datatypes. The Designer verifies that it can map
between the two datatypes before linking them.
The Integration Service cannot transform data
between ports with incompatible datatypes. While
the datatypes do not have to be identical,
they do have to be compatible, such as Char
and Varchar.
- You must connect a source definition to a source
qualifier only. You then link the source qualifier to
targets or other transformations.
- You can link columns to a target definition
in a mapping, but you cannot copy columns
into a target definition in a mapping. Use the
Target Designer to add columns to a target
definition.
- The Designer marks some mappings invalid if the
mapping violates data flow validation.
Viewing Link Paths to a Port
When displaying both link paths, the Designer traces
the flow of data from one column in the source, in
and out of each transformation, and into a single
port in the target.
For unconnected transformations, the Designer does
not display a link path.
For connected Lookup transformations, the Designer
shows each output port dependent upon the input
ports involved in the lookup condition.
For Custom transformations, the Designer shows
that an output port depends on all input ports by
default. However, if you define port relationships in
a Custom transformation, the Designer shows the
dependent ports you define.
Note: You can configure the color the Designer uses
to display connectors in a link path. When
configuring the format options, choose the Link
Selection option.
Viewing Source Column Dependencies
To view column dependencies, right-click a target
column in a mapping and choose Show Field
Dependencies.
The Designer displays the Field Dependencies dialog
box which lists all source columns connected to the
target column.
When you define a port expression that performs a
calculation using multiple source columns, and then
connect that port to a target column, the Field
Dependencies dialog box lists all source columns
you use in the expression
Options for Linking Ports
When you link transformations, you can link with
one of the following options:
- One to one. Link one transformation or output
group to one transformation, input group, or target
only.
- One to many.
- Link one port to multiple transformations, input
groups, or targets.
- Link multiple ports in one transformation or
output group to multiple transformations, input
groups, or targets.
- Many to one. Link many transformations to one
transformation, input group, or target.
Propagating Ports and Attributes
The Designer does not propagate changes to the
following mapping objects:
- Unconnected transformations
- Reusable transformations
- Mapplets
- Source and target instances
- SDK Source Qualifier
Rules and Guidelines for Propagating Ports and
Attributes
- The Designer does not propagate to implicit
dependencies within the same transformation.
- When you propagate a port description, the Designer
overwrites the description for the port in the other
transformations in the mapping
- When you propagate backward along the link path,
verify that the change does not cause the Integration
Service to fail the session. For example, if you
propagate changes to a source qualifier, the
Integration Service might generate invalid SQL when it
runs the session. If you change the port name
CUST_ID to CUSTOMER_ID, the Integration Service
might generate SQL to select the wrong column name
if the source table uses CUST_ID.
- When you propagate port attributes, verify that the
change does not cause the Designer to invalidate the
mapping. For example, when you change the datatype
of a port from integer to string and propagate the
datatype to other transformations, the Designer
invalidates the mapping if a calculation uses one of the
changed ports. Validate the mapping after you
propagate ports. If the Designer invalidates the
mapping, click Edit > Revert to Saved to revert to the
last saved version of the mapping.
- When you propagate multiple ports, and an
expression or condition depends on more than one
propagated port, the Designer does not propagate
attributes to implicit dependencies if the attributes do
not match. For example, you have the following
expression in an Expression transformation:
Item_desc_out = Substr(ITEM_NAME, 0, 6) ||
Substr(ITEM_DESC, 0, 6)
The precision of Item_desc_out is 12, ITEM_NAME is 10,
and ITEM_DESC is 10. You change the precision of
ITEM_DESC to 15. You select parse expressions to infer
dependencies and propagate the port attributes of
ITEM_NAME and ITEM_DESC. The Designer does not
update the precision of the Item_desc_out port in the
Expression transformation since the ITEM_NAME and
ITEM_DESC ports have different precisions.
Creating Target Files by Transaction
You can generate a separate output file each time
the Integration Service starts a new transaction.
You can dynamically name each target flat file.
To generate a separate output file for each
transaction, add a FileName port to the flat file
target definition.
When you connect the FileName port in the
mapping, the Integration Service creates a
separate target file at each commit.
The Integration Service names the output file based
on the FileName port value from the first row in
each transaction.
By default, the Integration Service writes output files
to $PMTargetFileDir
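As a minimal sketch of how the FileName port might be populated, assuming a hypothetical input port REGION and an Expression transformation output port o_FileName linked to the FileName port of the flat file target:
o_FileName = 'sales_' || REGION || '_' || TO_CHAR(SESSSTARTTIME, 'YYYYMMDD') || '.out'
Because the Integration Service takes the file name from the first row of each transaction, each commit, for example one commit per REGION value issued by a Transaction Control transformation, produces its own file under $PMTargetFileDir.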
Rules and Guidelines for Creating Target Files by
Transaction
- You can use a FileName column with flat file
targets.
- You can add one FileName column to the flat file
target definition.
- You can use a FileName column with data from
real-time sources.
- A session fails if you use a FileName column
with merge files, file lists, or FTP targets.
- If you pass the same file name to targets in
multiple partitions, you might get
unexpected results.
- When a transformation drops incoming
transaction boundaries and does not generate
commits, the Integration Service writes all rows into
the same output file. The output file name is the
initial value of the FileName port.
Rejecting Truncated and Overflow Data
When a conversion causes an overflow, the
Integration Service, by default, skips the row.
The Integration Service does not write the
data to the reject file.
For strings, the Integration Service truncates
the string and passes it to the next
transformation.
The Designer provides an option to let you
include all truncated and overflow data
between the last transformation and target in
the session reject file. If you select Reject
Truncated/Overflow Rows, the Integration Service
sends all truncated rows and any overflow rows to
the session reject file or to the row error logs,
depending on how you configure the session.
Rules and Guidelines for Configuring the Target Update
Override
- If you use target update override, you must
manually put all database reserved words in
quotes.
- You cannot override the default UPDATE statement
if the target column name contains any of the
following characters:
' , ( ) < > = + - * / \t \n \0 <space>
- You can use parameters and variables in the
target update query. Use any parameter or variable
type that you can define in the parameter file.
You can enter a parameter or variable within the
UPDATE statement, or you can use a parameter or
variable as the update query.
For example, you can enter a session parameter,
$ParamMyOverride, as the update query, and set
$ParamMyOverride to the UPDATE statement in a
parameter file (see the sketch after this list).
- When you save a mapping, the Designer
verifies that you have referenced valid port
names. It does not validate the SQL.
- If you update an individual row in the target table
more than once, the database only has data from
the last update.
If the mapping does not define an order for the
result data, different runs of the mapping on
identical input data may result in different data in
the target table.
- A WHERE clause that does not contain any column
references updates all rows in the target table, or
no rows in the target table, depending on the
WHERE clause and the data from the mapping.
For example, the following query sets the
EMP_NAME to MIKE SMITH for all rows in the
target table if any row of the transformation has
EMP_ID > 100:
UPDATE T_SALES set EMP_NAME = 'MIKE SMITH'
WHERE :TU.EMP_ID > 100
- If the WHERE clause contains no port references,
the mapping updates the same set of rows for each
row of the mapping.
For example, the following query updates all
employees with EMP_ID > 100 to have the
EMP_NAME from the last row in the mapping:
UPDATE T_SALES set EMP_NAME = :TU.EMP_NAME
WHERE EMP_ID > 100
- If the mapping includes an Update Strategy or
Custom transformation, the Target Update
statement only affects records marked for update.
- If you use the Target Update option, configure the
session to mark all source records as update.
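As a minimal sketch of the $ParamMyOverride approach mentioned above, assuming a hypothetical folder MyFolder, workflow wf_sales, and session s_m_load_sales, and the T_SALES target from the earlier examples: enter $ParamMyOverride as the update query in the target properties, and define it in the parameter file:
[MyFolder.WF:wf_sales.ST:s_m_load_sales]
$ParamMyOverride=UPDATE T_SALES SET EMP_NAME = :TU.EMP_NAME WHERE EMP_ID = :TU.EMP_ID
At run time the Integration Service expands the parameter and issues this UPDATE, binding :TU.EMP_NAME and :TU.EMP_ID to the values of the ports with those names.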
Rules and Guidelines for Adding Pre- and Post-Session
SQL Commands
- Use any command that is valid for the database
type. However, the Integration Service does not
allow nested comments, even though the database
might.
- You can use parameters and variables in the
target pre- and post-session SQL commands.
For example, you can enter a parameter or variable
within the command. Or, you can use a session
parameter, $ParamMyCommand, as the SQL
command, and set $ParamMyCommand to the SQL
statement in a parameter file.
- Use a semicolon (;) to separate multiple
statements. The Integration Service issues a
commit after each statement.
- The Integration Service ignores semicolons
within /* ...*/.
- If you need to use a semicolon outside of
comments, you can escape it with a backslash (\).
- The Designer does not validate the SQL.
Note: You can also enter pre- and post-session SQL
commands on the Properties tab of the Source
Qualifier transformation.
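As a minimal sketch of a pre-session SQL command that follows these rules, assuming Oracle-style SQL and hypothetical STG_SALES and AUDIT_LOG tables:
/* clear the staging table; the ; inside this comment is ignored */
DELETE FROM STG_SALES;
INSERT INTO AUDIT_LOG (RUN_DATE, STATUS) VALUES (SYSDATE, 'PRE-SQL OK')
The semicolon separates the two statements, so the Integration Service issues a commit after the DELETE before it runs the INSERT.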
Validating the mapping
When you develop a mapping, you must configure it
so the Integration Service can read and process the
entire mapping.
The Designer marks a mapping invalid when it
detects errors that will prevent the Integration
Service from running sessions associated with the
mapping.
The Designer performs the following types of
validation:
Connection validation - Required ports are
connected and all connections are valid.
Expression validation - All expressions are valid.
Objects validation - The independent object
definition matches the instance in the mapping.
Data flow validation - The data must be able to
flow from the sources to the targets without hanging
at blocking transformations.
Connection Validation
The Designer performs connection validation each
time you connect ports in a mapping and each time
you validate or save a mapping.
When you connect ports, the Designer verifies that
you make valid connections.
When you save or validate a mapping, the Designer
verifies that the connections are valid and that all
required ports are connected.
When you save or validate a mapping, the Designer
makes the following connection validations:
- At least one source and one target must be
connected.
- Source qualifiers must be mapped to a target.
- Mapplets must be connected. At least one mapplet
input port and output port is connected to the
mapping. If the mapplet includes a source qualifier
that uses an SQL override, the Designer prompts
you to connect all mapplet output ports to the
mapping.
- Datatypes between ports must be compatible. If
you change a port datatype to one that is
incompatible with the port it is connected to, the
Designer generates an error and invalidates the
mapping. For example, you have two Date/Time
ports connected, and you change one port to a
Decimal. The Designer invalidates the mapping. You
can, however, change the datatype if it remains
compatible with the connected ports, such as Char
and Varchar.
Data Flow Validation
When you validate or save a mapping, the Designer
verifies that the data can flow from all sources in a
target load order group to the targets without the
Integration Service blocking all sources.
Mappings that include blocking transformations
might hang at runtime with any of the following
mapping configurations:
- You connect one source pipeline to multiple input
groups of the blocking transformation
- You connect the sources and transformations in a
target load order group in such a way that multiple
blocking transformations could possibly block all
source pipelines.
Depending on the source data used in a session, a
blocking transformation might block data from one
source while it waits for a row from a different
source.
When you save or validate a mapping with one of
these configurations, the Designer marks the
mapping invalid.
When the Designer marks a mapping invalid
because the mapping violates data flow validation,
you must configure the mapping differently, or use
a non-blocking transformation where possible.
The following figure shows mappings that are invalid
because one source provides data for multiple
input groups of a blocking transformation:
To make the mappings valid, use a non-blocking
transformation for MGT1 or create two instances of the
same source and connect them to the blocking
transformation.
The following figure shows two similar mappings, one
which is valid, and one which is invalid:
Mapping A contains two multigroup transformations
that block data, MGT1 and MGT2. If you could run
this session, MGT1 might block data from S1 while
waiting for a row from S2.
And MGT2 might block data from S2 while waiting
for a row from S1.
The blocking transformations would block both
source pipelines and the session would hang.
Therefore, the Designer marks the mapping invalid.
Mapping B contains one multigroup transformation
that blocks data, MGT1.
Blocking transformations can never block all input
groups, so MGT1 might block either S1 or S2, but
never both.
MGT2 is not a blocking transformation, so it will
never block data.
Therefore, this session will not hang at runtime due
to blocking.
The Designer marks the mapping valid
Steps to Validate a Mapping
You can validate a mapping while you are working
on it through the Designer. Also, when you click
Repository > Save, the Designer validates all
mappings since the last time you saved.
When you validate or save a mapping the results of
the validation appear in the Output window.
The Repository Manager also displays whether a
mapping is valid.
To validate a mapping, check out and open the
mapping, and click Mappings > Validate.
If the Output window is not open, click View >
Output Window. Review any errors to determine
how to fix the mapping.
Validating Multiple Mappings
You can validate multiple mappings without fetching
them into the workspace.
To validate multiple mappings you must select and
validate the mappings from either a query results
view or a view object dependencies list.
Note: If you use the Repository Manager, you can
select and validate multiple mappings from the
Navigator.
You can save and optionally check in mappings that
change from invalid to valid status as a result of the
validation.
To validate multiple mappings:
1. Select mappings from either a query or a view
dependencies list.
2. Right-click one of the selected mappings and choose
Validate. The Validate Objects dialog box displays.
3. Choose whether to save objects and check in objects
that you validate
MAPPLETS
Mapplets Overview
When you use a mapplet in a mapping, you use an
instance of the mapplet. Like a reusable
transformation, any change made to the mapplet is
inherited by all instances of the mapplet.
Mapplets help simplify mappings in the following
ways:
Include source definitions - Use multiple source
definitions and source qualifiers to provide source data
for a mapping.
Accept data from sources in a mapping - If you
want the mapplet to receive data from the mapping,
use an Input transformation to receive source data.
Include multiple transformations - A mapplet can
contain as many transformations as you need.
Pass data to multiple transformations - You can
create a mapplet to feed data to multiple
transformations. Each Output transformation in a
mapplet represents one output group in a mapplet.
Contain unused ports - You do not have to connect
all mapplet input and output ports in a mapping.
Understanding Mapplet Input and Output
To use a mapplet in a mapping, you must configure
it for input and output.
In addition to transformation logic that you
configure, a mapplet has the following components:
Mapplet input: You can pass data into a mapplet
using source definitions or Input transformations or
both. When you use an Input transformation, you
connect it to the source pipeline in the mapping.
Mapplet output: Each mapplet must contain one or
more Output transformations to pass data from the
mapplet into the mapping.
Mapplet ports: Mapplet ports display only in the
Mapping Designer. Mapplet ports consist of input ports
from Input transformations and output ports from
Output transformations. If a mapplet uses source
definitions rather than Input transformations for input,
it does not contain any input ports in the mapping.
Mapplet Input
Mapplet input can originate from a source definition
and/or from an Input transformation in the mapplet.
You can create multiple pipelines in a mapplet.
Use multiple source definitions and source qualifiers
or Input transformations.
You can also use a combination of source definitions
and Input transformations.
Using Source Definitions for Mapplet Input
Use one or more source definitions in a mapplet to
provide source data.
When you use the mapplet in a mapping, it is the
first object in the mapping pipeline and contains no
input ports.
Using Input Transformations for Mapplet Input
Use an Input transformation in a mapplet when you
want the mapplet to receive input from a source in
a mapping.
When you use the mapplet in a mapping, the Input
transformation provides input ports so you can pass
data through the mapplet.
Each port in the Input transformation connected to
another transformation in the mapplet becomes a
mapplet input port.
Input transformations can receive data from a single
active source.
Unconnected ports do not display in the Mapping
Designer.
You can connect an Input transformation to
multiple transformations in a mapplet.
However, you cannot connect a single port in
the Input transformation to multiple
transformations in the mapplet.
Mapplet Output
Use an Output transformation in a mapplet to pass
data through the mapplet into a mapping.
A mapplet must contain at least one Output
transformation with at least one connected port in
the mapplet.
Each connected port in an Output transformation
displays as a mapplet output port in a mapping.
Each Output transformation in a mapplet displays as
an output group in a mapping.
An output group can pass data to multiple pipelines
in a mapping.
Viewing Mapplet Input and Output
Mapplets and mapplet ports display differently in the
Mapplet Designer and the Mapping Designer
The following figure shows a mapplet with both an
Input transformation and an Output transformation:
When you use the mapplet in a mapping, the
mapplet object displays only the ports from the
Input and Output transformations. These are
referred to as the mapplet input and mapplet
output ports.
The following figure shows the same mapplet in the
Mapping Designer:
You can expand the mapplet in the Mapping
Designer by selecting it and clicking Mappings >
Expand.
This expands the mapplet within the mapping for
view. Transformation icons within an expanded
mapplet display as shaded.
You can open or iconize all the transformations in
the mapplet and mapping.
You cannot edit any of the properties, navigate to
other folders, or save the repository while the
mapplet is expanded.
The following figure shows an expanded mapplet in
the Mapping Designer:
In an expanded mapping, you do not see the
Input and Output transformations.
Creating a Mapplet
A mapplet can be active or passive depending on
the transformations in the mapplet.
Active mapplets contain one or more active
transformations.
Passive mapplets contain only passive
transformations.
When you use a mapplet in a mapping, all
transformation rules apply to the mapplet
depending on the mapplet type.
For example, as with an active transformation, you
cannot concatenate data from an active mapplet
with a different pipeline.
Use the following rules and guidelines when you add
transformations to a mapplet:
- If you use a Sequence Generator
transformation, you must use a reusable
Sequence Generator transformation.
- If you use a Stored Procedure
transformation, you must configure the Stored
Procedure Type to be Normal.
- You cannot include PowerMart 3.5-style
LOOKUP functions in a mapplet.
- You cannot include the following objects in a
mapplet:
- Normalizer transformations
- COBOL sources
- XML Source Qualifier transformations
- XML sources
- Target definitions
- Other mapplets
Although you can use reusable transformations and
shortcuts in a mapplet, to protect the validity of
the mapplet, use a copy of a transformation
instead.
Reusable transformations and shortcuts inherit
changes to their original transformations. This
might invalidate the mapplet and the mappings
that use the mapplet
Validating Mapplets
The Designer validates a mapplet when you save it.
You can also validate a mapplet using the Mapplets
> Validate menu command. When you validate a
mapplet, the Designer writes all relevant messages
about the mapplet in the Output window.
The Designer validates the mapplet pipeline in the
same way it validates a mapping.
The Designer also performs the following checks
specific to mapplets:
- The mapplet can contain Input transformations and
source definitions with at least one port connected
to a transformation in the mapplet.
- The mapplet contains at least one Output
transformation with at least one port connected to a
transformation in the mapplet.
Editing Mapplets
You can edit a mapplet in the Mapplet Designer. The
Designer validates the changes when you save the
mapplet.
When you save changes to a mapplet, all
instances of the mapplet and all shortcuts to
the mapplet inherit the changes.
These changes might invalidate mappings that use
the mapplet.
To see what mappings or shortcuts may be
affected by changes you make to a mapplet,
select the mapplet in the Navigator, right-click,
and select Dependencies. Or, click
Mapplets > Dependencies from the menu.
You can make the following changes to a mapplet
without affecting the validity of existing mappings
and sessions:
- Add input or output ports.
- Change port names or comments.
- Change Input or Output transformation names or
comments.
- Change transformation names, comments, or
properties.
- Change port default values for transformations in
the mapplet.
- Add or remove transformations in the
mapplet, providing you do not change the
mapplet type from active to passive or from
passive to active.
Use the following rules and guidelines when you edit
a mapplet that is used by mappings:
Do not delete a port from the mapplet: The Designer
deletes mapplet ports in the mapping when you
delete links to an Input or Output transformation or
when you delete ports connected to an Input or
Output transformation.
Do not change the datatype, precision, or scale of a
mapplet port: The data type, precision, and scale of
a mapplet port is defined by the transformation
port to which it is connected in the mapplet.
Therefore, if you edit a mapplet to change the
datatype, precision, or scale of a port connected to
a port in an Input or Output transformation, you
change the mapplet port.
Do not change the mapplet type: If you remove all
active transformations from an active mapplet, the
mapplet becomes passive. If you add an active
transformation to a passive mapplet, the mapplet
becomes active.
Mapplets and Mappings
The following mappings tasks can also be performed
on mapplets:
Set tracing level: You can set the tracing level on
individual transformations within a mapplet in the
same manner as in a mapping.
Copy mapplet: You can copy a mapplet from one
folder to another as you would any other repository
object. After you copy the mapplet, it appears in the
Mapplets node of the new folder. If you make
changes to a mapplet, but you do not want to
overwrite the original mapplet, you can make a copy
of the mapplet by clicking Mapplets > Copy As.
Export and import mapplets: You can export a
mapplet to an XML file or import a mapplet
from an XML file through the Designer. You
might want to use the export and import
feature to copy a mapplet to another
repository.
Delete mapplets: When you delete a mapplet, you
delete all instances of the mapplet. This invalidates
each mapping containing an instance of the mapplet
or a shortcut to the mapplet.
Compare mapplets: You can compare two
mapplets to find differences between them. For
example, if you have mapplets with the same name
in different folders, you can compare them to see if
they differ.
Compare instances within a mapplet: You can
compare instances in a mapplet to see if they
contain similar attributes. For example, you can
compare a source instance with another source
instance, or a transformation with another
transformation. You compare instances within a
mapplet in the same way you compare instances
within a mapping.
Create shortcuts to mapplets: You can create a
shortcut to a mapplet if the mapplet is in a shared
folder. When you use a shortcut to a mapplet in a
mapping, the shortcut inherits any changes you
might make to the mapplet. However, these
changes might not appear until the Integration
Service runs the workflow using the shortcut.
Therefore, only use a shortcut to a mapplet when
you do not expect to edit the mapplet.
Add a description: You can add a description to
the mapplet in the Mapplet Designer in the same
manner as in a mapping. You can also add a
description to the mapplet instance in a mapping.
When you add a description, you can also create
links to documentation files. The links must be a
valid URL or file path to reference the business
documentation.
View links to a port: You can view links to a port in
a mapplet in the same way you would view links to a
port in a mapping. You can view the forward path,
the backward path, or both paths.
Propagate port attributes: You can propagate
port attributes in a mapplet in the same way you
would propagate port attributes in a mapping. You
can propagate attributes forward, backward, or in
both directions.
Using Mapplets in Mappings
In a mapping, a mapplet has input and output ports
that you can connect to other transformations in
the mapping.
You do not have to connect all mapplet ports in a
mapping.
However, if the mapplet contains an SQL
override, you must connect all mapplet
output ports in the mapping.
Creating and Configuring Mapplet Ports
After creating transformation logic for a mapplet,
you can create mapplet ports. Use an Input
transformation to define mapplet input ports if the
mapplet contains no source definitions. Use an
Output transformation to create a group of output
ports. Only connected ports in an Input or Output
transformation become mapplet input or output
ports in a mapping. Unconnected ports do not
display when you use the mapplet in a mapping.
You can create a mapplet port in the following ways:
Manually create ports in the Input/Output
transformation: You can create port names in
Input and Output transformations. You can also
enter a description for each port name. The port has
no defined data type, precision, or scale until you
connect it to a transformation in the mapplet.
Drag a port from another transformation: You
can create an input or output port by dragging a
port from another transformation into the Input or
Output transformation. The new port inherits the
port name, description, data type, and scale of the
original port. You can edit the new port name and
description in the transformation. If you change a
port connection, the Designer updates the Input or
Output transformation port to match the attributes
of the new connection.
You can view the data type, precision, and scale of
available mapplet ports when you use the mapplet
in a mapping.
Connecting to Mapplet Output Groups
Each Output transformation displays as an output
group when you use a mapplet in a mapping.
Connect the mapplet output ports to the mapping
pipeline.
Use Autolink to connect the ports.
Use the following rules and guidelines when you
connect mapplet output ports in the mapping:
- When a mapplet contains a source qualifier
that has an override for the default SQL query,
you must connect all of the source qualifier
output ports to the next transformation within
the mapplet.
- If the mapplet contains more than one source
qualifier, use a Joiner transformation to join the
output into one pipeline.
- If the mapplet contains only one source qualifier,
you must connect the mapplet output ports to
separate pipelines. You cannot use a Joiner
transformation to join the output.
If you need to join the pipelines, you can create two
mappings to perform this task:
- Use the mapplet in the first mapping and write
data in each pipeline to separate targets.
- Use the targets as sources in the second mapping
to join data, and then perform any additional
transformation necessary.
Like a reusable transformation, when you drag a
mapplet into a mapping, the Designer creates an
instance of the mapplet.
You can enter comments for the instance of
the mapplet in the mapping.
You cannot otherwise edit the mapplet in the
Mapping Designer.
If you edit the mapplet in the Mapplet Designer,
each instance of the mapplet inherits the changes.
The PowerCenter Repository Reports has a Mapplets
list report that you use to view all mappings using a
particular mapplet.
Setting the Target Load Plan
When you use a mapplet in a mapping, the Mapping
Designer lets you set the target load plan for
sources within the mapplet.
Pipeline Partitioning
If you have the partitioning option, you can increase
the number of partitions in a pipeline to improve
session performance. Increasing the number of
partitions allows the Integration Service to create
multiple connections to sources and process
partitions of source data concurrently.
When you create a session, the Workflow Manager
validates each pipeline in the mapping for
partitioning. You can specify multiple partitions in a
pipeline if the Integration Service can maintain data
consistency when it processes the partitioned data.
Some partitioning restrictions apply to mapplets.
Rules and Guidelines for Mapplets
The following list summarizes the rules and
guidelines that appear throughout this chapter:
- You can connect an Input transformation to
multiple transformations in a mapplet. However, you
cannot connect a single port in the Input
transformation to multiple transformations in the
mapplet.
- An Input transformation must receive data from a
single active source.
- A mapplet must contain at least one Input
transformation or source definition with at least one
port connected to a transformation in the mapplet.
- A mapplet must contain at least one Output
transformation with at least one port connected to
another transformation in the mapplet.
- When a mapplet contains a source qualifier that
has an override for the default SQL query, you must
connect all of the source qualifier output ports to
the next transformation within the mapplet.
- If the mapplet contains more than one source
qualifier, use a Joiner transformation to join the
output into one pipeline. If the mapplet contains
only one source qualifier, you must connect the
mapplet output ports to separate pipelines. You
cannot use a Joiner transformation to join the
output.
- When you edit a mapplet, you might invalidate
mappings if you change the mapplet type from
passive to active.
- If you delete ports in the mapplet when the
mapplet is used in a mapping, you can invalidate
the mapping.
- Do not change the datatype, precision, or scale of
a mapplet port when the mapplet is used by a
mapping.
- If you use a Sequence Generator transformation,
you must use a reusable Sequence Generator
transformation.
- If you use a Stored Procedure transformation, you
must configure the Stored Procedure Type to be
Normal.
- You cannot include PowerMart 3.5-style LOOKUP
functions in a mapplet.
- You cannot include the following objects in a
mapplet:
- Normalizer transformations
- COBOL sources
- XML Source Qualifier transformations
- XML sources
- Target definitions
- Pre- and post-session stored procedures
- Other mapplets
Mapping Parameters and Variables
Mapping Parameters
A mapping parameter represents a constant value
that you can define before running a session.
A mapping parameter retains the same value
throughout the entire session.
When you use a mapping parameter, you declare
and use the parameter in a mapping or mapplet.
Then define the value of the parameter in a
parameter file.
The Integration Service evaluates all references to
the parameter to that value.
When you want to use the same value for a mapping
parameter each time you run the session, use the
same parameter file for each session run.
When you want to change the value of a mapping
parameter between sessions you can perform one
of the following tasks:
- Update the parameter file between sessions.
- Create a different parameter file and configure the
session to use the new file.
- Remove the parameter file from the session
properties. The Integration Service uses the
parameter value in the pre-session variable
assignment. If there is no pre-session variable
assignment, the Integration Service uses the
configured initial value of the parameter in the
mapping.
Mapping Variables
Unlike a mapping parameter, a mapping variable
represents a value that can change through the
session.
The Integration Service saves the value of a
mapping variable to the repository at the end of
each successful session run and uses that value the
next time you run the session.
When you use a mapping variable, you declare the
variable in the mapping or mapplet, and then use a
variable function in the mapping to change the
value of the variable.
At the beginning of a session, the Integration
Service evaluates references to a variable to
determine the start value.
At the end of a successful session, the Integration
Service saves the final value of the variable to the
repository.
The next time you run the session, the Integration
Service evaluates references to the variable to the
saved value.
To override the saved value, define the start value of
the variable in a parameter file or assign a value in
the pre-session variable assignment in the session
properties.
Using Mapping Parameters and Variables
When the Designer validates a mapping variable in a
reusable transformation, it treats the variable as an
Integer datatype.
You cannot use mapping parameters and variables
interchangeably between a mapplet and a
mapping.
Mapping parameters and variables declared for a
mapping cannot be used within a mapplet.
Similarly, you cannot use a mapping parameter or
variable declared for a mapplet in a mapping.
Initial and Default Values
When you declare a mapping parameter or variable
in a mapping or a mapplet, you can enter an initial
value.
The Integration Service uses the configured initial
value for a mapping parameter when the
parameter is not defined in the parameter file.
Similarly, the Integration Service uses the
configured initial value for a mapping variable when
the variable value is not defined in the parameter
file, and there is no saved variable value in the
repository.
When the Integration Service needs an initial value,
and you did not declare an initial value for the
parameter or variable, the Integration Service uses
a default value based on the datatype of the
parameter or variable.
The following table lists the default values the
Integration Service uses for different types of data:
Using String Parameters and Variables
For example, you might use a parameter named
$$State in the filter for a Source Qualifier
transformation to extract rows for a particular
state:
STATE = $$State
During the session, the Integration Service replaces
the parameter with a string. If $$State is defined as
MD in the parameter file, the Integration Service
replaces the parameter as follows:
STATE = MD
You can perform a similar filter in the Filter
transformation using the PowerCenter
transformation language as follows:
STATE = $$State
If you enclose the parameter in single quotes in the
Filter transformation, the Integration Service reads
it as the string literal $$State instead of replacing
the parameter with MD.
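As a minimal sketch of the matching parameter file entry, assuming a hypothetical folder MyFolder, workflow wf_customers, and session s_m_customers:
[MyFolder.WF:wf_customers.ST:s_m_customers]
$$State=MD
With this file assigned to the session, the filter STATE = $$State expands to STATE = MD at run time; to extract a different state, you change only the parameter file.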
Variable Datatype and Aggregation Type
The Integration Service uses the aggregate type of a
mapping variable to determine the final current
value of the mapping variable.
When you have a pipeline with multiple partitions,
the Integration Service combines the variable value
from each partition and saves the final current
variable value into the repository.
You can create a variable with the following
aggregation types:
- Count
- Max
- Min
You can configure a mapping variable for a Count
Aggregation type when it is an Integer or Small
Integer.
You can configure mapping variables of any
datatype for Max or Min aggregation types.
To keep the variable value consistent throughout the
session run, the Designer limits the variable
functions you use with a variable based on
aggregation type.
For example, use the SetMaxVariable function for a
variable with a Max aggregation type, but not with
a variable with a Min aggregation type.
The following describes the available variable
functions and the aggregation types and datatypes
you use with each function:
Variable Functions
Variable functions determine how the Integration
Service calculates the current value of a mapping
variable in a pipeline.
Use variable functions in an expression to set the
value of a mapping variable for the next session
run.
The transformation language provides the following
variable functions to use in a mapping:
SetMaxVariable: Sets the variable to the maximum
value of a group of values. It ignores rows marked for
update, delete, or reject. To use the SetMaxVariable
with a mapping variable, the aggregation type of the
mapping variable must be set to Max.
SetMinVariable: Sets the variable to the minimum
value of a group of values. It ignores rows marked for
update, delete, or reject. To use the SetMinVariable
with a mapping variable, the aggregation type of the
mapping variable must be set to Min.
SetCountVariable: Increments the variable value by
one. In other words, it adds one to the variable value
when a row is marked for insertion, and subtracts one
when the row is marked for deletion. It ignores rows
marked for update or reject. To use the
SetCountVariable with a mapping variable, the
aggregation type of the mapping variable must be set
to Count.
SetVariable: Sets the variable to the configured
value. At the end of a session, it compares the final
current value of the variable to the start value of the
variable. Based on the aggregate type of the variable,
it saves a final value to the repository. To use the
SetVariable function with a mapping variable, the
aggregation type of the mapping variable must be set
to Max or Min. The SetVariable function ignores rows
marked for delete or reject.
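As a minimal sketch of the incremental-extract pattern these functions support, assuming a hypothetical Datetime mapping variable $$MaxOrderDate with Max aggregation and an ORDER_DATE input port, an output port in an Expression transformation can evaluate:
SETMAXVARIABLE($$MaxOrderDate, ORDER_DATE)
At the end of a successful run the Integration Service saves the highest ORDER_DATE it processed to the repository, and the next run can reference $$MaxOrderDate, for example in the source filter ORDER_DATE > $$MaxOrderDate, to read only rows added since the last session.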
Use variable functions only once for each mapping
variable in a pipeline.
The Integration Service processes variable functions
as it encounters them in the mapping.
The order in which the Integration Service
encounters variable functions in the mapping
may not be the same for every session run.
This may cause inconsistent results when you
use the same variable function multiple times
in a mapping.
The Integration Service does not save the
final current value of a mapping variable to
the repository when any of the following
conditions are true:
- The session fails to complete.
- The session is configured for a test load.
- The session is a debug session.
- The session runs in debug mode and is
configured to discard session output.
You cannot use variable functions in the Rank
or Aggregator transformation. Use a different
transformation for variable functions.
Working with User-Defined Functions
After you create a user-defined function named
REMOVESPACES, you can use the following expression
in an Expression transformation to remove leading
and trailing spaces from last names:
:UDF.REMOVESPACES(LAST_NAME)
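These notes do not show the body of the function. As a minimal sketch, assuming REMOVESPACES is a public user-defined function with a single string argument NAME, its expression could simply be:
LTRIM(RTRIM(NAME))
With that definition, :UDF.REMOVESPACES(LAST_NAME) returns LAST_NAME with its leading and trailing spaces removed.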
Configuring a User-Defined Function Name
A valid function name meets the following
requirements:
- It begins with a letter.
- It can contain letters, numbers, and underscores. It
cannot contain any other character.
- It cannot contain spaces.
- It must be 80 characters or fewer.
Configuring the Function Type
You can place user-defined functions in other user-defined functions. You can also configure a user-defined function to be callable from expressions.
Callable means that you can place user-defined
functions in an expression.
Select one of the following options when you
configure a user-defined function:
Public: Callable from any user-defined function,
transformation expression, link condition expression,
or task expression.
Private: Callable from another user-defined
function. Create a private function when you want
the function to be part of a more complex function.
The simple function may not be usable
independently of the complex function.
After you create a public user-defined
function, you cannot change the function
type to private.
Although you can place a user-defined
function in another user-defined function, a
function cannot refer to itself.
For example, the user-defined function
RemoveSpaces includes a user-defined function
TrimLeadingandTrailingSpaces.
TrimLeadingandTrailingSpaces cannot include
RemoveSpaces. Otherwise, RemoveSpaces is
invalid.
Configuring Public Functions that Contain Private
Functions
When you include ports as arguments in a private
user-defined function, you must also include the
ports as arguments in any public function that
contains the private function. Use the same
datatype and precision for the arguments in the
private and public function.
For example, you define a function to modify order
IDs to include INFA and the customer ID. You first
create the following private function called
ConcatCust that concatenates 'INFA' with the port
CUST_ID: CONCAT ('INFA', CUST_ID)
After you create the private function, you create a
public function called ConcatOrder that contains
ConcatCust:
CONCAT (:UDF.CONCATCUST( CUST_ID), ORDER_ID)
When you add ConcatCust to ConcatOrder, you add
the argument CUST_ID with the same datatype and
precision to the public function.
Note: If you enter a user-defined function when you
manually define the public function syntax, you
must prefix the user-defined function with :UDF
The following table describes the user-defined
function management tasks and lists where you
can perform each task:
Validating User-Defined Functions
You can validate a user-defined function from the
following areas:
- Expression Editor when you create or edit a UDF
- Tools menu
- Query Results window
- View History window
When you validate a user-defined function,
the PowerCenter Client does not validate
other user-defined functions and expressions
that use the function.
If a user-defined function is invalid, any user-defined function and expression that uses the
function is also invalid.
Similarly, mappings and workflows that use
the user-defined function are invalid.
Using Debugger
Debugger Session Types
You can select three different debugger session
types when you configure the Debugger. The
Debugger runs a workflow for each session type.
You can choose from the following Debugger
session types when you configure the Debugger:
Use an existing non-reusable session: The
Debugger uses existing source, target, and session
configuration properties. When you run the
Debugger, the Integration Service runs the non-reusable session and the existing workflow. The
Debugger does not suspend on error.
Use an existing reusable session: The Debugger
uses existing source, target, and session
configuration properties. When you run the
Debugger, the Integration Service runs a debug
instance of the reusable session and creates and
runs a debug workflow for the session.
Create a debug session instance: You can
configure source, target, and session configuration
properties through the Debugger Wizard. When you
run the Debugger, the Integration Service runs a
debug instance of the debug session and creates
and runs a debug workflow for the session.
The following figure shows the windows in the
Mapping Designer that appear when you run the
Debugger:
Running an Existing Session in Debug Mode
If you choose to run an existing session in debug
mode, the Debugger Wizard displays a list of all
sessions in the current folder that use the mapping.
Select the session you want to use.
You cannot run the Debugger against a
session configured with multiple partitions or
a session configured to run on a grid. You
must either change the properties of the
session or choose to create a debug session
for the mapping.
Note: You cannot create breakpoints for
mapplet Input and Output transformations.
Creating Error Breakpoints
When you create an error breakpoint, the Debugger
pauses when the Integration Service encounters
error conditions such as a transformation error or
calls to the ERROR function.
You also set the number of errors to skip for each
breakpoint before the Debugger pauses:
- If you want the Debugger to pause at every error,
set the number of errors to zero.
- If you want the Debugger to pause after a specified
number of errors, set the number of errors greater
than zero. For example, if you set the number of
errors to five, the Debugger skips five errors and
pauses at every sixth error.
Using ISNULL and ISDEFAULT
You can create ISNULL and ISDEFAULT conditions in
transformation and global data breakpoints.
When you use the ISNULL or ISDEFAULT operator,
you cannot use the type or value in the condition.
When you create an ISNULL condition, the
Debugger pauses when the Integration
Service encounters a null input value, and the
port contains the system default value.
When you create an ISDEFAULT condition, the
Debugger pauses in the following
circumstances:
- The Integration Service encounters an
output transformation error, and the port
contains a user-defined default value of a
constant value or constant expression.
- The Integration Service encounters a null
input value, and the port contains a
user-defined default value of a constant value or
constant expression.
Set Target Options
On the last page of the Debugger Wizard, you can
select the following target options:
Discard target data: You can choose to load or
discard target data when you run the Debugger. If
you discard target data, the Integration
Service does not connect to the target.
Display target data: You can select the target
instances you want to display in the Target window
while you run a debug session.
When you click Finish, if the mapping includes
mapplets, the Debugger displays the mapplet
instance dialog box.
Select the mapplets from this dialog box that you
want to debug. To clear a selected mapplet, press
the Ctrl key and select the mapplet.
When you select a mapplet to debug, the Designer
expands it to display the individual transformations
when the Debugger runs.
When you do not select a mapplet to debug, the
Designer does not expand it in the workspace.
You cannot complete the following tasks for
transformations in the mapplet:
- Monitor or modify transformation data.
- Evaluate expressions.
- Edit breakpoints.
- Step to a transformation instance.
The Debugger can be in one of the following states:
Initializing: The Designer connects to the
Integration Service.
Running: The Integration Service processes the
data.
Paused: The Integration Service encounters a
break and pauses the Debugger.
Note: To enable multiple users to debug the same
mapping at the same time, each user must
configure different port numbers in the Tools >
Options > Debug tab.
The Debugger does not use the high
availability functionality.
The following table describes the different tasks you
can perform in each of the Debugger states:
Working with Persisted Values
When you run the Debugger against mappings with
sequence generators and mapping variables, the
Integration Service might save or discard persisted
values:
Discard persisted values: The Integration Service
does not save final values of generated sequence
numbers or mapping variables to the repository
when you run a debug session or you run a session
in debug mode and discard target data.
Save persisted values: The Integration Service
saves final values of generated sequence numbers
and mapping variables to the repository when you
run a session in debug mode and do not discard
target data. You can view the final value for
Sequence Generator and Normalizer transformations
in the transformation properties.
Designer Behavior
When the Debugger starts, you cannot
perform the following tasks:
- Close the folder or open another folder.
- Use the Navigator.
- Perform repository functions, such as Save.
- Edit or close the mapping.
- Switch to another tool in the Designer, such
as Target Designer.
- Close the Designer.
Note: Dynamic partitioning is disabled during
debugging.
Monitoring the Debugger
When you run the Debugger, you can monitor the
following information:
Session status: Monitor the status of the session.
Data movement: Monitor data as it moves through
transformations.
Breakpoints: Monitor data that meets breakpoint
conditions.
Target data: Monitor target data on a row-by-row
basis.
The Mapping Designer displays windows and debug
indicators that help you monitor the session:
Debug indicators: Debug indicators on
transformations help you follow breakpoints and
data flow.
Instance window: When the Debugger pauses, you
can view transformation data and row information in
the Instance window.
Target window: View target data for each target in
the mapping.
Output window: The Integration Service writes
messages to the following tabs in the Output
window:
- Debugger tab: The debug log displays in the
Debugger tab.
- Session Log tab: The session log displays in the
Session Log tab.
- Notifications tab: Displays messages from the
Repository Service.
You can step to connected transformations in
the mapping, even if they do not have an
associated breakpoint.
You cannot step to the following instances:
- Sources
- Targets
- Unconnected transformations
- Mapplets not selected for debugging
Modifying Data
When the Debugger pauses, the current instance
displays in the Instance window, and the current
instance indicator displays on the transformation in
the mapping. You can make the following
modifications to the current instance when
the Debugger pauses on a data breakpoint:
Modify output data: You can modify output data
of the current transformation. When you continue
the session, the Integration Service validates the
data. It performs the same validation it performs
when it passes data from port to port in a regular
session.
Change null data to not-null: Clear the null
column, and enter a value in the value column to
change null data to not-null.
Change not-null to null: Select the null column to
change not-null data to null. The Designer prompts
you to confirm that you want to make this change.
Modify row types: Modify Update Strategy, Filter,
or Router transformation row types.
For Router transformations, you can change the row
type to override the group condition evaluation for
user defined groups.
For example, if the group condition evaluates to
false, the rows are not passed through the output
ports to the next transformation or target.
The Instance window displays <no data
available>, and the row type is filtered. If you
want to pass the filtered row to the next
transformation or target, you can change the
row type to Insert.
Likewise, for a group that meets the group condition,
you can change the row type from Insert to Filtered.
After you change data, you can refresh the
cache before you continue the session.
When you issue the Refresh command, the Designer
processes the request for the current
transformation, and you can see if the data you
enter is valid.
You can change the data again before you continue
the session
Restrictions
You cannot change data for the following output
ports:
Normalizer transformation: Generated Keys and
Generated Column ID ports.
Rank transformation: RANKINDEX port.
Router transformation: All output ports.
Sequence Generator transformation: CURRVAL
and NEXTVAL ports.

Lookup transformation: NewLookupRow port for a
Lookup transformation configured to use a dynamic
cache.
Custom transformation: Ports in output groups
other than the current output group.
Java transformation: Ports in output groups other
than the current output group.
Additionally, you cannot change data associated
with the following:
- Mapplets that are not selected for debugging
- Input or input/output ports
- Output ports when the Debugger pauses on an
error breakpoint

4. WORKFLOW BASICS GUIDE


Workflow Tasks
You can create tasks in the Task Developer, the
Workflow Designer, or the Worklet Designer.
Tasks created in the Task Developer are reusable;
tasks created in the Workflow or Worklet Designer
are non-reusable.
You can create the following types of tasks in the
Workflow Manager:
Assignment: Assigns a value to a workflow
variable.
Command: Specifies a shell command to run during
the workflow.
Control: Stops or aborts the workflow.
Decision: Specifies a condition to evaluate.
Email: Sends email during the workflow.
Event-Raise: Notifies the Event-Wait task that an
event has occurred.
Event-Wait: Waits for an event to occur before
executing the next task.
Session: Runs a mapping you create in the
Designer.
Timer: Waits for a timed event to trigger.
Workflow Manager Windows
The Workflow Manager displays the following
windows to help you create and organize
workflows:
- Navigator
- Workspace
- Output
- Overview

Enhanced Security
The Workflow Manager has an enhanced security
option to specify a default set of permissions for
connection objects.
When you enable enhanced security, the
Workflow Manager assigns default
permissions on connection objects for users,
groups, and others.
When you disable enhanced security, the
Workflow Manager assigns read, write, and execute
permissions to all users that would otherwise
receive permissions of the default group.
If you delete the owner from the repository,
the Workflow Manager assigns ownership of
the object to the administrator.
Viewing and Comparing Versioned Repository
Objects
You can view and compare versions of objects in the
Workflow Manager. If an object has multiple
versions, you can find the versions of the object in
the View History window. In addition to comparing
versions of an object in a window, you can view the
various versions of an object in the workspace to
graphically compare them.
Use the following rules and guidelines when you
view older versions of objects in the workspace:
- You cannot simultaneously view multiple versions
of composite objects, such as workflows and
worklets.
- Older versions of a composite object might not
include the child objects that were used when the
composite object was checked in. If you open a
composite object that includes a child object version
that is purged from the repository, the preceding
version of the child object appears in the workspace
as part of the composite object. For example, you
might want to view version 5 of a workflow that
originally included version 3 of a session, but version
3 of the session is purged from the repository. When
you view version 5 of the workflow, version 2 of the
session appears as part of the workflow.
- You cannot view older versions of sessions if
they reference deleted or invalid mappings, or
if they do not have a session configuration.
Searching for Versioned Objects
Use an object query to search for versioned objects
in the repository that meet specified conditions.
When you run a query, the repository returns
results based on those conditions. You may want to
create an object query to perform the following
tasks:
Track repository objects during development: You
can add Label, User, Last saved, or Comments
parameters to queries to track objects during
development.
Associate a query with a deployment group: When
you create a dynamic deployment group, you can
associate an object query with it.
Comparing Repository Objects
Use the Workflow Manager to compare two
repository objects of the same type to identify
differences between the objects. For example, if
you have two similar Email tasks in a folder, you
can compare them to see which one contains the
attributes you need. When you compare two
objects, the Workflow Manager displays their
attributes in detail.
You can compare objects across folders and
repositories. You must open both folders to
compare the objects. You can compare a
reusable object with a non-reusable object.
You can also compare two versions of the
same object.
You can compare the following types of objects:
- Tasks
- Sessions
- Worklets
- Workflows
You can also compare instances of the same type.
For example, if the workflows you compare contain
worklet instances with the same name, you can
compare the instances to see if they differ.
Use the Workflow Manager to compare the following
instances and attributes:
- Instances of sessions and tasks in a workflow or
worklet comparison. For example, when you

compare workflows, you can compare task instances


that have the same name.
- Instances of mappings and transformations in a
session comparison. For example, when you
compare sessions, you can compare mapping
instances.
- The attributes of instances of the same type within
a mapping comparison. For example, when you
compare flat file sources, you can compare
attributes, such as file type (delimited or fixed),
delimiters, escape characters, and optional quotes.
You can compare schedulers and session
configuration objects in the Repository
Manager.
You cannot compare objects of different types. For
example, you cannot compare an Email task with a
Session task.
When you compare objects, the Workflow Manager
displays the results in the Diff Tool window. The Diff
Tool output contains different nodes for different
types of objects.
When you import Workflow Manager Objects, you
can compare object conflicts.
A workflow must contain a Start task. The
Start task represents the beginning of a
workflow.
When you create a workflow, the Workflow Designer
creates a Start task and adds it to the workflow.
You cannot delete the Start task
You may decide to delete a workflow that you no
longer use. When you delete a workflow, you delete
all nonreusable tasks and reusable task instances
associated with the workflow. Reusable tasks used
in the workflow remain in the folder when you
delete the workflow.
If you delete a workflow that is running, the
Integration Service aborts the workflow.
If you delete a workflow that is scheduled to
run, the Integration Service removes the
workflow from the schedule.
If you want to write performance data to the
repository you must perform the following tasks:
- Configure the session to collect performance data.
- Configure the session to write performance data to
the repository.
- Configure the Integration Service to persist run-time
statistics to the repository at the verbose level.
Guidelines for Entering Pre- and Post-Session SQL
Commands
Use the following guidelines when creating the SQL
statements:
- Use any command that is valid for the database
type. However, the Integration Service does not
allow nested comments, even though the database
might.
- Use a semicolon (;) to separate multiple
statements. The Integration Service issues a commit
after each statement.
- The Integration Service ignores semicolons
within /* ...*/.
- If you need to use a semicolon outside of
comments, you can escape it with a backslash (\).
- The Workflow Manager does not validate the
SQL.
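For illustration, a post-session SQL entry that
follows these guidelines might look like the
following (Oracle-style syntax; table and column
names are hypothetical):
DELETE FROM STG_ORDERS WHERE LOAD_DT < SYSDATE - 30;
/* semicolons inside this comment are ignored */
UPDATE LOAD_CONTROL SET LAST_PURGE = SYSDATE;
INSERT INTO AUDIT_LOG (STEP_DESC) VALUES ('purge\; refresh');
The Integration Service issues a commit after each
statement, and the backslash keeps the semicolon
inside the string from being read as a statement
separator.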

Session Configuration Object
Configuration Object and Config Object Tab Settings
You can configure the following settings in a session
configuration object or on the Config Object tab in
the session properties:
Advanced - Advanced settings allow you to
configure constraint-based loading, lookup caches,
and buffer sizes.
Log options - Log options allow you to configure
how you want to save the session log. By default,
the Log Manager saves only the current session log.
Error handling - Error handling settings allow you
to determine if the session fails or continues when it
encounters pre-session command errors, stored
procedure errors, or a specified number of session
errors.
Partitioning options - Partitioning options allow
the Integration Service to determine the number of
partitions to create at run time.
Session on grid - When Session on Grid is enabled,
the Integration Service distributes session threads to
the nodes in a grid to increase performance and
scalability.

Error Handling
You can configure error handling on the Config
Object tab.
You can choose to stop or continue the session if the
Integration Service encounters an error issuing the
pre- or post-session SQL command.
The Workflow Manager provides the following types
of shell commands for each Session task:
Pre-session command - The Integration Service
performs pre-session shell commands at the
beginning of a session. You can configure a session
to stop or continue if a pre-session shell command
fails.
Post-session success command - The Integration
Service performs post-session success commands
only if the session completed successfully.
Post-session failure command - The Integration
Service performs post-session failure commands
only if the session failed to complete.
Pre-Session Shell Command Errors
If you select Stop, the Integration Service stops the
session but continues with the rest of the workflow.
If you select Continue, the Integration Service
ignores the errors and continues the session.
By default, the Integration Service stops the session
upon shell command errors.
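For example, a session might use shell commands
such as the following (paths and file names are
hypothetical):
Pre-session command: test -f /data/inbound/orders.dat
Post-session success command: mv /data/inbound/orders.dat /data/archive/
If the pre-session error behavior is set to Stop and
the source file is missing, the command fails, the
Integration Service stops the session, and the rest
of the workflow continues.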

Tasks

Decision Task
You can enter a condition that determines the
execution of the workflow, similar to a link condition,
with the Decision task.
The Decision task has a predefined variable called
$Decision_task_name.condition that represents the
result of the decision condition.
The Integration Service evaluates the condition in
the Decision task and sets the predefined condition
variable to True (1) or False (0).
You can specify one decision condition per
Decision task.
If you do not specify a condition in the Decision task,
the Integration Service evaluates the Decision task
to True.
After the Integration Service evaluates the Decision
task, use the predefined condition variable in other
expressions in the workflow to help you develop the
workflow.
Depending on the workflow, you might use link
conditions instead of a Decision task.
However, the Decision task simplifies the workflow.
Using the Decision Task
Use the Decision task instead of multiple link
conditions in a workflow.
Instead of specifying multiple link conditions, use
the predefined condition variable in a Decision task
to simplify link conditions.
Example
For example, you have a Command task that
depends on the status of three sessions in the
workflow. You want the Integration Service to run
the Command task when any of the three sessions
fails. To accomplish this, use a Decision task with
the following decision condition:
$Q1_session.status = FAILED OR $Q2_session.status
= FAILED OR $Q3_session.status = FAILED
You can then use the predefined condition variable
in the input link condition of the Command task.
Configure the input link with the following link
condition:
$Decision.condition = True
You can configure the same logic in the workflow
without the Decision task.
Without the Decision task, you need to use
three link conditions and treat the input links
to the Command task as OR links.
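For comparison, the equivalent logic without the
Decision task places a link condition on each of the
three links into the Command task, using the session
names from the example above, and treats the input
links as OR links:
$Q1_session.status = FAILED
$Q2_session.status = FAILED
$Q3_session.status = FAILED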
Event Task
You can define events in the workflow to specify the
sequence of task execution.
The event is triggered based on the completion of
the sequence of tasks.
Use the following tasks to help you use events in the
workflow:
Event-Raise task - Event-Raise task represents a
user-defined event. When the Integration Service
runs the Event-Raise task, the Event-Raise task
triggers the event. Use the Event-Raise task with the
Event-Wait task to define events.
Event-Wait task - The Event-Wait task waits for an
event to occur. Once the event triggers, the
Integration Service continues executing the rest of
the workflow.
To coordinate the execution of the workflow, you
may specify the following types of events for the
Event-Wait and
Event-Raise tasks:
Predefined event - A predefined event is a file-watch
event. For predefined events, use an Event-Wait
task to instruct the Integration Service to wait
for the specified indicator file to appear before
continuing with the rest of the workflow. When the
Integration Service locates the indicator file, it starts
the next task in the workflow.

you can choose a user-defined workflow variable to


specify the time.
Relative time - You instruct the Integration Service
to wait for a specified period of time after the Timer
task, the parent workflow, or the top-level workflow
starts.
For example, a workflow contains two sessions. You
want the Integration Service wait 10 minutes after
the first session completes before it runs the
second session. Use a Timer task after the first
session. In the Relative Time setting of the Timer
task, specify ten minutes from the start time of the
Timer task. Use a Timer task anywhere in the
workflow after the Start task.

User-defined event - A user-defined event is a


sequence of tasks in the workflow. Use an EventRaise task to specify the location of the user-defined
event in the workflow. A user-defined event is
sequence of tasks in the branch from the Start task
leading to the Event-Raise task.
When all the tasks in the branch from the Start task
to the Event-Raise task complete, the Event-Raise
task triggers the event. The Event-Wait task waits
for the Event-Raise task to trigger the event before
continuing with the rest of the tasks in its branch.

Timer Task
You can specify the period of time to wait before the
Integration Service runs the next task in the
workflow with the Timer task.
You can choose to start the next task in the workflow
at a specified time and date.
You can also choose to wait a period of time after
the start time of another task, workflow, or worklet
before starting the next task.
The Timer task has the following types of settings:
Absolute time - You specify the time that the
Integration Service starts running the next task in
the workflow. You may specify the date and time, or

Sources
Allocating Buffer Memory
When the Integration Service initializes a session, it
allocates blocks of memory to hold source and
target data.
The Integration Service allocates at least two blocks
for each source and target partition.
Sessions that use a large number of sources or
targets might require additional memory blocks.
If the Integration Service cannot allocate
enough memory blocks to hold the data, it
fails the session.

Partitioning Sources
You can create multiple partitions for relational,
Application, and file sources.
For relational or Application sources, the Integration
Service creates a separate connection to the source
database for each partition you set in the session
properties.
For file sources, you can configure the session to
read the source with one thread or multiple
threads.
Overriding the Source Table Name
If you override the source table name on the
Properties tab of the source instance, and you
override the source table name using an SQL query,
the Integration Service uses the source table name
defined in the SQL query.
Targets
Working with Relational Targets
When you configure a session to load data to a
relational target, you define most properties in the
Transformations view on the Mapping tab.
Performing a Test Load
With a test load, the Integration Service reads and
transforms data without writing to targets.
The Integration Service reads the number of rows
you configure for the test load.
The Integration Service generates all session files
and performs all pre- and post-session functions, as
if running the full session.
To configure a session to perform a test load, enable
test load and enter the number of rows to test.
The Integration Service writes data to
relational targets, but rolls back the data
when the session completes.
For all other target types, such as flat file and
SAP BW, the Integration Service does not
write data to the targets.
Use the following guidelines when performing a test
load:
- You cannot perform a test load on sessions
using XML sources.
- You can perform a test load for relational
targets when you configure a session for
normal mode.
- If you configure the session for bulk mode,
the session fails.
- Enable a test load on the session Properties tab.
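As a sketch, a test load against a relational target
might use session settings such as the following
(property labels can vary slightly by PowerCenter
version):
Target load type: Normal
Enable Test Load: checked
Number of Rows to Test: 100
The Integration Service reads and transforms 100
rows, writes them to the relational target, and rolls
the data back when the session completes.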
You can configure the following properties for
relational targets:
Target database connection - Define database
connection information.
Target properties - You can define target
properties such as target load type, target update
options, and reject options.
Truncate target tables - The Integration Service
can truncate target tables before loading data.
Deadlock retries - You can configure the session to
retry deadlocks when writing to targets or a
recovery table.
Drop and recreate indexes - Use pre- and
post-session SQL to drop and recreate an index on a
relational target table to optimize query speed.
Constraint-based loading - The Integration
Service can load data to targets based on primary
key-foreign key constraints and active sources in the
session mapping.
Bulk loading - You can specify bulk mode when
loading to DB2, Microsoft SQL Server, Oracle, and
Sybase databases.

Table name prefix - You can specify the target
owner name or prefix in the session properties to
override the table name prefix in the mapping.
Pre-session SQL - You can create SQL commands
and execute them in the target database before
loading data to the target. For example, you might
want to drop the index for the target table before
loading data into it.
Post-session SQL - You can create SQL commands
and execute them in the target database after
loading data to the target. For example, you might
want to recreate the index for the target table after
loading data into it.
Target table name - You can override the target
table name for each relational target.
Target Table Truncation
If you enable truncate target tables with the
following sessions, the Integration Service does not
truncate target tables:
Incremental aggregation - When you enable both
truncate target tables and incremental aggregation
in the session properties, the Workflow Manager
issues a warning that you cannot enable truncate
target tables and incremental aggregation in the
same session.
Test load - When you enable both truncate target
tables and test load, the Integration Service disables
the truncate table function, runs a test load session,
and writes a message to the session log indicating
that the truncate target tables option is turned off
for the test load session.
Real-time - The Integration Service does not
truncate target tables when you restart a JMS or
WebSphere MQ real-time session that has recovery
data.

You define these properties in the session to
override the properties you define in the mapping.
Dropping and Recreating Indexes


After you insert significant amounts of data into a
target, you normally need to drop and recreate
indexes on that table to optimize query speed.
You can drop and recreate indexes by:
Using pre- and post-session SQL - The preferred
method for dropping and re-creating indexes is to
define an SQL statement in the Pre SQL property
that drops indexes before loading data to the target.
Use the Post SQL property to recreate the indexes
after loading data to the target. Define the Pre SQL
and Post SQL properties for relational targets in the
Transformations view on the Mapping tab in the
session properties.
Using the Designer - The same dialog box you use
to generate and execute DDL code for table creation
can drop and recreate indexes. However, this
process is not automatic.
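For illustration, the Pre SQL and Post SQL
properties of a relational target might contain
statements such as the following (Oracle-style
syntax; index, table, and column names are
hypothetical):
Pre SQL: DROP INDEX IDX_T_ORDERS_CUST
Post SQL: CREATE INDEX IDX_T_ORDERS_CUST ON T_ORDERS (CUSTOMER_ID)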

Constraint-Based Loading
In the Workflow Manager, you can specify
constraint-based loading for a session.
When you select this option, the Integration Service
orders the target load on a row-by-row basis.
For every row generated by an active source, the
Integration Service loads the corresponding
transformed row first to the primary key table, then
to any foreign key tables.
Constraint-based loading depends on the following
requirements:
Active source - Related target tables must have
the same active source.
Key relationships - Target tables must have key
relationships.
Target connection groups - Targets must be in
one target connection group.
Treat rows as insert - Use this option when you
insert into the target. You cannot use updates
with constraint-based loading.

In the first pipeline, target T_1 has a primary key,
and T_2 and T_3 contain foreign keys referencing the
T_1 primary key. T_3 has a primary key that T_4
references as a foreign key.
Since these tables receive records from a single
active source, SQ_A, the Integration Service loads
rows to the target in the following order:
1. T_1
2. T_2 and T_3 (in no particular order)
3. T_4
The Integration Service loads T_1 first because it has
no foreign key dependencies and contains a
primary key referenced by T_2 and T_3.
The Integration Service then loads T_2 and T_3, but
since T_2 and T_3 have no dependencies, they are
not loaded in any particular order.
The Integration Service loads T_4 last, because it
has a foreign key that references a primary key in
T_3.
After loading the first set of targets, the Integration
Service begins reading source B.
If there are no key relationships between T_5 and
T_6, the Integration Service reverts to a normal
load for both targets.
If T_6 has a foreign key that references a primary
key in T_5, since T_5 and T_6 receive data from a
single active source, the Aggregator AGGTRANS,
the Integration Service loads rows to the tables in
the following order:
- T_5
- T_6
T_1, T_2, T_3, and T_4 are in one target connection
group if you use the same database connection for
each target, and you use the default partition
properties.
T_5 and T_6 are in another target connection group
together if you use the same database connection
for each target and you use the default partition
properties.
The Integration Service includes T_5 and T_6 in a
different target connection group because they are
in a different target load order group from the first
four targets.
Bulk Loading
You can enable bulk loading when you load to DB2,
Sybase, Oracle, or Microsoft SQL Server.
If you enable bulk loading for other database types,
the Integration Service reverts to a normal load.
When bulk loading, the Integration Service invokes
the database bulk utility and bypasses the
database log, which speeds performance.
Without writing to the database log, however, the
target database cannot perform rollback.
Note: When loading to DB2, Microsoft SQL Server,
and Oracle targets, you must specify a normal load
for data driven sessions. When you specify bulk
mode and data driven, the Integration Service
reverts to normal load.
Committing Data
When bulk loading to Sybase and DB2 targets, the
Integration Service ignores the commit interval you
define in the session properties and commits data
when the writer block is full.
When bulk loading to Microsoft SQL Server and
Oracle targets, the Integration Service commits
data at each commit interval. Also, Microsoft SQL
Server and Oracle start a new bulk load transaction
after each commit.

Reserved Words
If any table name or column name contains a
database reserved word, such as MONTH or YEAR,
the session fails with database errors when the
Integration Service executes SQL against the
database.
You can create and maintain a reserved words file,
reswords.txt, in the server/bin directory.
When the Integration Service initializes a session, it
searches for reswords.txt.
If the file exists, the Integration Service places
quotes around matching reserved words when it
executes SQL against the database.
Use the following rules and guidelines when working
with reserved words:
- The Integration Service searches the reserved
words file when it generates SQL to connect to
source, target, and lookup databases.
- If you override the SQL for a source, target, or
lookup, you must enclose any reserved word in
quotes.

Working with Active Sources
An active source is an active transformation the
Integration Service uses to generate rows. An
active source can be any of the following
transformations:
- Aggregator
- Application Source Qualifier
- Custom, configured as an active transformation
- Joiner
- MQ Source Qualifier
- Normalizer (VSAM or pipeline)
- Rank
- Sorter
- Source Qualifier
- XML Source Qualifier
- Mapplet, if it contains any of the above
transformations
Note: The Filter, Router, Transaction Control, and
Update Strategy transformations are active
transformations in that they can change the
number of rows that pass through. However, they
are not active sources in the mapping because they
do not generate rows. Only transformations that
can generate rows are active sources.
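For example, if MONTH appears in reswords.txt for
the target database type, generated SQL that would
otherwise read
SELECT MONTH, REVENUE FROM T_SALES
is issued with the reserved word quoted:
SELECT "MONTH", REVENUE FROM T_SALES
(the table and column names are hypothetical, and
the quote characters depend on the database).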

Integration Service Handling for File Targets


Writing to Fixed-Width Flat Files with Relational
Target Definitions

Writing to Fixed-Width Files with Flat File Target
Definitions
When you want to output to a fixed-width flat file
based on a flat file target definition, you must
configure precision and field width for the target
field to accommodate the total length of the target
field.
If the data for a target field is too long for the total
length of the field, the Integration Service performs
one of the following actions:
- Truncates the row for string columns
- Writes the row to the reject file for numeric and
datetime columns
Note: When the Integration Service writes a row to
the reject file, it writes a message in the session
log.
Writing Empty Fields for Unconnected Ports in
Fixed-Width File Definitions
The Integration Service does not write data in
unconnected ports to fixed-width files.
If you want the Integration Service to write empty
fields for the unconnected ports, create output
ports in an upstream transformation that do not
contain data.
Then connect these ports containing null values to
the fixed-width flat file target definition.

Workflow Validation
The Workflow Manager validates the following
properties:
Expressions - Expressions in the workflow must be
valid.
Tasks - Non-reusable tasks and reusable task
instances in the workflow must follow validation
rules.
Scheduler - If the workflow uses a reusable
scheduler, the Workflow Manager verifies that the
scheduler exists.
The Workflow Manager marks the workflow invalid if
the scheduler you specify for the workflow does not
exist in the folder.
The Workflow Manager also verifies that you linked
each task properly.
Note: The Workflow Manager validates
Session tasks separately. If a session is
invalid, the workflow may still be valid.

Validating Multiple Workflows
You can validate multiple workflows or worklets
without fetching them into the workspace.
To validate multiple workflows, you must select and
validate the workflows from a query results view or
a view dependencies list.
When you validate multiple workflows, the validation
does not include sessions, nested worklets, or
reusable worklet objects in the workflows.
You can save and optionally check in workflows that
change from invalid to valid status.

Task Validation
The Workflow Manager validates each task in the
workflow as you create it.
When you save or validate the workflow, the
Workflow Manager validates all tasks in the
workflow except Session tasks
When you delete a reusable task, the Workflow
Manager removes the instance of the deleted task
from workflows.
The Workflow Manager also marks the workflow
invalid when you delete a reusable task used in a
workflow.
The Workflow Manager verifies that there are no
duplicate task names in a folder, and that there are
no duplicate task instances in the workflow

The Workflow Manager uses the following rules to
validate tasks:
Assignment - The Workflow Manager validates the
expression you enter for the Assignment task. For
example, the Workflow Manager verifies that you
assigned a matching datatype value to the workflow
variable in the assignment expression.

Command - The Workflow Manager does not
validate the shell command you enter for the
Command task.
Event-Wait - If you choose to wait for a predefined
event, the Workflow Manager verifies that you
specified a file to watch. If you choose to use the
Event-Wait task to wait for a user-defined event, the
Workflow Manager verifies that you specified an
event.
Event-Raise - The Workflow Manager verifies that
you specified a user-defined event for the
Event-Raise task.
Timer - The Workflow Manager verifies that the
variable you specified for the Absolute Time setting
has the Date/Time datatype.
Start - The Workflow Manager verifies that you
linked the Start task to at least one task in the
workflow.
When a task instance is invalid, the workflow using
the task instance becomes invalid.
When a reusable task is invalid, it does not affect
the validity of the task instance used in the
workflow.
However, if a Session task instance is invalid, the
workflow may still be valid.
The Workflow Manager validates sessions differently.

Session Validation
If you delete objects associated with a Session task
such as session configuration object, Email, or
Command task, the Workflow Manager marks a
reusable session invalid.
However, the Workflow Manager does not mark a
non-reusable session invalid if you delete an object
associated with the session.
If you delete a shortcut to a source or target from
the mapping, the Workflow Manager does not mark
the session invalid.
The Workflow Manager does not validate SQL
overrides or filter conditions entered in the session
properties when you validate a session.
You must validate SQL override and filter conditions
in the SQL Editor.
If a reusable or non-reusable session instance is
invalid, the Workflow Manager marks it invalid in
the Navigator and in the Workflow Designer
workspace.
Workflows using the session instance remain valid

Expression Validation
The Workflow Manager validates all expressions in
the workflow. You can enter expressions in the
Assignment task, Decision task, and link conditions.

The Workflow Manager writes any error message to
the Output window.
Workflow Schedules
You can schedule a workflow to run continuously,
repeat at a given time or interval, or you can
manually start a workflow
If you configure multiple instances of a
workflow, and you schedule the workflow run
time, the Integration Service runs all
instances at the scheduled time. You cannot
schedule workflow instances to run at
different times.
If you choose a different Integration Service for the
workflow or restart the Integration Service, it
reschedules all workflows.
This includes workflows that are scheduled to run
continuously but whose start time has passed and
workflows that are scheduled to run continuously
but were unscheduled.
You must manually reschedule workflows whose
start time has passed if they are not scheduled to
run continuously.
If you delete a folder, the Integration Service
removes workflows from the schedule when it
receives notification from the Repository Service.
If you copy a folder into a repository, the Integration
Service reschedules all workflows in the folder
when it receives the notification.
The Integration Service does not run the workflow in
the following situations:
- The prior workflow run fails. When a workflow fails,
the Integration Service removes the workflow from
the schedule, and you must manually reschedule it.
You can reschedule the workflow in the Workflow
Manager or using pmcmd.
- The Integration Service process fails during a prior
workflow run. When the Integration Services process
fails in a highly available domain and a running
workflow is not configured for recovery, the
Integration Service removes the workflow from the
schedule. You can reschedule the workflow in the
Workflow Manager or using pmcmd.
- You remove the workflow from the schedule. You
can remove the workflow from the schedule in the
Workflow Manager or using pmcmd.
- The Integration Service is running in safe mode. In
safe mode, the Integration Service does not run
scheduled workflows, including workflows scheduled
to run continuously or run on service initialization.

When you enable the Integration Service in normal
mode, the Integration Service runs the scheduled
workflows.
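For example, you might reschedule a workflow
whose start time has passed from the command line
with pmcmd. A sketch, using placeholder service,
domain, folder, and workflow names (verify the
option names against the pmcmd reference for your
version):
pmcmd scheduleworkflow -sv IS_Dev -d Domain_Dev -u Administrator -p password -f Sales wf_load_orders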

Session and Workflow Logs
The following steps describe how the Log Manager
processes session and workflow logs:
1. The Integration Service writes binary log files on
the node. It sends information about the sessions
and workflows to the Log Manager.
2. The Log Manager stores information about
workflow and session logs in the domain
configuration database. The domain configuration
database stores information such as the path to the
log file location, the node that contains the log, and
the Integration Service that created the log.
3. When you view a session or workflow in the Log
Events window, the Log Manager retrieves the
information from the domain configuration database
to determine the location of the session or workflow
logs.
4. The Log Manager dispatches a Log Agent to
retrieve the log events on each node to display in
the Log Events window.
You can also configure a workflow to produce text
log files.
When you configure the workflow or session
to produce text log files, the Integration
Service creates the binary log and the text
log file.
Message Severity
Log Events Window


The Log Events window displays the following
information for each session and workflow:
Severity - Lists the type of message, such as
informational or error.
Time stamp - Date and time the log event reached
the Log Agent.
Node - Node on which the Integration Service
process is running.
Thread - Thread ID for the workflow or session.
Process ID - Windows or UNIX process identification
numbers. Displays in the Output window only.
Message Code - Message code and number.
Message - Message associated with the log event.
Writing to Log Files
When you create a workflow or session log, you can
configure log options in the workflow or session
properties.
You can configure the following information for a
workflow or session log:

Write Backward Compatible Log File - Select
this option to create a text file for workflow or
session logs. If you do not select the option, the
Integration Service creates the binary log
only.
Log File Directory - The directory where you want
the log file created. By default, the Integration
Service writes the workflow log file in the directory
specified in the service process variable,
$PMWorkflowLogDir. It writes the session log file in
the directory specified in the service process
variable, $PMSessionLogDir.
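As a sketch, the log options of a workflow might be
set as follows (the log file name is hypothetical):
Write Backward Compatible Log File: checked
Workflow Log File Name: wf_load_orders.log
Workflow Log File Directory: $PMWorkflowLogDir/
With these settings, the Integration Service creates
the binary log and a text log file in the directory
that $PMWorkflowLogDir resolves to.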

5. ADVANCED WORKFLOW GUIDE

Understanding Pipeline Partitioning
A partition is a pipeline stage that executes in a
single reader, transformation, or writer thread.
The number of partitions in any pipeline stage
equals the number of threads in the stage.
By default, the Integration Service creates one
partition in every pipeline stage.
Complete the following tasks to configure partitions
for a session:
- Set partition attributes including partition points,
the number of partitions, and the partition types.
- You can enable the Integration Service to set
partitioning at run time. When you enable
dynamic partitioning, the Integration Service
scales the number of session partitions based
on factors such as the source database
partitions or the number of nodes in a grid.
- After you configure a session for partitioning, you
can configure memory requirements and cache
directories for each transformation.
- The Integration Service evaluates mapping
variables for each partition in a target load
order group. You can use variable functions in the
mapping to set the variable values.
- When you create multiple partitions in a
pipeline, the Workflow Manager verifies that
the Integration Service can maintain data
consistency in the session using the partitions.
When you edit object properties in the session,
you can impact partitioning and cause a session
to fail.
- You add or edit partition points in the session
properties. When you change partition points you
can define the partition type and add or delete
partitions.
Partitioning Attributes
Partition points - Partition points mark thread
boundaries and divide the pipeline into stages. The
Integration Service redistributes rows of data at
partition points.
Number of partitions - A partition is a pipeline
stage that executes in a single thread. If you
purchase the Partitioning option, you can set the
number of partitions at any partition point. When
you add partitions, you increase the number of
processing threads, which can improve session
performance.
Partition types - The Integration Service creates a
default partition type at each partition point. If you
have the Partitioning option, you can change the
partition type. The partition type controls how the
Integration Service distributes data among partitions
at partition points.

Partition Points
A stage is a section of a pipeline between any
two partition points.
When you set a partition point at a transformation,
the new pipeline stage includes that transformation.
The guide illustrates this with a figure of the default
partition points and pipeline stages for a mapping
with one pipeline.

When you add a partition point, you increase the
number of pipeline stages by one.
Similarly, when you delete a partition point, you
reduce the number of stages by one.
Partition points mark the points in the pipeline
where the Integration Service can redistribute data
across partitions.
Number of Partitions
A partition is a pipeline stage that executes in a
single reader, transformation, or writer thread.
The number of partitions in any pipeline stage
equals the number of threads in that stage
You can define up to 64 partitions at any
partition point in a pipeline
When you increase or decrease the number of
partitions at any partition point, the
Workflow Manager increases or decreases the
number of partitions at all partition points in
the pipeline.
The number of partitions remains consistent
throughout the pipeline
If you define three partitions at any partition
point, the Workflow Manager creates three
partitions at all other partition points in the
pipeline
The number of partitions you create equals the
number of connections to the source or target.
If the pipeline contains a relational source or target,
the number of partitions at the source qualifier or
target instance equals the number of connections
to the database.
If the pipeline contains file sources, you can
configure the session to read the source with one
thread or with multiple threads.

For example, when you define three partitions
across the mapping, the master thread creates
three threads at each pipeline stage, for a
total of 12 threads.

Partitioning Multiple Input Group Transformations
When you connect more than one pipeline to a
multiple input group transformation, the Integration
Service maintains the transformation threads or
creates a new transformation thread depending on
whether or not the multiple input group
transformation is a partition point:
Partition point does not exist at multiple
input group transformation - When a partition
point does not exist at a multiple input group
transformation, the Integration Service processes
one thread at a time for the multiple input group
transformation and all downstream transformations
in the stage.
Partition point exists at multiple input group
transformation - When a partition point exists at
a multiple input group transformation, the
Integration Service creates a new pipeline stage
and processes the stage with one thread for each
partition. The Integration Service creates one
transformation thread for each partition regardless
of the number of output groups the transformation
contains.
Partition Types
You can define the following partition types in the
Workflow Manager:
Database partitioning - The Integration Service
queries the IBM DB2 or Oracle database system for
table partition information. It reads partitioned data
from the corresponding nodes in the database. You
can use database partitioning with Oracle or
IBM DB2 source instances on a multi-node
tablespace. You can use database partitioning with
DB2 targets.
Hash auto-keys - The Integration Service uses a
hash function to group rows of data among
partitions. The Integration Service groups the
data based on a partition key. The Integration
Service uses all grouped or sorted ports as a
compound partition key. You may need to use
hash auto-keys partitioning at Rank, Sorter,
and unsorted Aggregator transformations.
Hash user keys - The Integration Service uses a
hash function to group rows of data among
partitions. You define the number of ports to
generate the partition key.
Key range - With key range partitioning, the
Integration Service distributes rows of data based on
a port or set of ports that you define as the partition
key. For each port, you define a range of values. The
Integration Service uses the key and ranges to send
rows to the appropriate partition. Use key range
partitioning when the sources or targets in the
pipeline are partitioned by key range (see the
example after this list).
Pass-through - In pass-through partitioning, the
Integration Service processes data without
redistributing rows among partitions. All rows
in a single partition stay in the partition after
crossing a pass-through partition point. Choose
pass-through partitioning when you want to create
an additional pipeline stage to improve
performance, but do not want to change the
distribution of data across partitions.
Round-robin - The Integration Service distributes
data evenly among all partitions. Use round-robin
partitioning where you want each partition to
process approximately the same number of rows.
Dynamic Partitioning
When you use dynamic partitioning, you can
configure the partition information so the
Integration Service determines the number of
partitions to create at run time
The Integration Service scales the number of
session partitions at run time based on
factors such as source database partitions or
the number of nodes in a grid.
If any transformation in a stage does not
support partitioning, or if the partition
configuration does not support dynamic
partitioning, the Integration Service does not
scale partitions in the pipeline. The data passes
through one partition.

The session fails if you use a parameter other
than $DynamicPartitionCount to set the
number of partitions.

Note: Do not configure dynamic partitioning
for a session that contains manual partitions.
If you set dynamic partitioning to a value
other than disabled and you manually
partition the session, the session is invalid.
Configuring Dynamic Partitioning
Configure dynamic partitioning using one of the
following methods:
Disabled - Do not use dynamic partitioning. Define
the number of partitions on the Mapping tab.
Based on number of partitions - Sets the
partitions to a number that you define in the
Number of Partitions attribute. Use the
$DynamicPartitionCount session parameter, or enter
a number greater than 1 (see the parameter file
sketch after this list).
Based on number of nodes in grid - Sets the
partitions to the number of nodes in the grid running
the session. If you configure this option for sessions
that do not run on a grid, the session runs in one
partition and logs a message in the session log.
Based on source partitioning - Determines the
number of partitions using database partition
information. The number of partitions is the
maximum of the number of partitions at the source.
For Oracle sources that use composite partitioning,
the number of partitions is the maximum of the
number of subpartitions at the source.
Based on number of CPUs - Sets the number of
partitions equal to the number of CPUs on the node
that prepares the session. If the session is
configured to run on a grid, dynamic partitioning
sets the number of partitions equal to the number of
CPUs on the node that prepares the session
multiplied by the number of nodes in the grid.
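As a sketch, you could set the number of partitions
from a parameter file by assigning a value to the
$DynamicPartitionCount session parameter (folder,
workflow, and session names are placeholders, and
the exact header format should be checked against
the parameter file documentation):
[Sales.WF:wf_load_orders.ST:s_m_load_orders]
$DynamicPartitionCount=4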
Rules and Guidelines for Dynamic Partitioning
Use the following rules and guidelines with dynamic
partitioning:
-Dynamic partitioning uses the same connection for
each partition.
-You cannot use dynamic partitioning with XML
sources and targets.
-You cannot use dynamic partitioning with the
Debugger.
-Sessions that use SFTP fail if you enable dynamic
partitioning.
-When you set dynamic partitioning to a value other
than disabled, and you manually partition the
session on the Mapping tab, you invalidate the
session.

The following dynamic partitioning
configurations cause a session to run with one
partition:
1. You override the default cache directory for an
Aggregator, Joiner, Lookup, or Rank transformation.
The Integration Service partitions a transformation
cache directory when the default is $PMCacheDir.
2. You override the Sorter transformation default
work directory. The Integration Service partitions the
Sorter transformation work directory when the
default is $PMTempDir.
3. You use an open-ended range of numbers or date
keys with a key range partition type.
4. You use datatypes other than numbers or dates
as keys in key range partitioning.
5. You use key range relational target partitioning.
6. You create a user-defined SQL statement or a
user-defined source filter.
7. You set dynamic partitioning to the number of
nodes in the grid, and the session does not run on a
grid.
8. You use pass-through relational source
partitioning.
9. You use dynamic partitioning with an Application
Source Qualifier.
10. You use SDK or PowerConnect sources and
targets with dynamic partitioning
Cache Partitioning
When you create a session with multiple partitions,
the Integration Service may use cache
partitioning for the Aggregator, Joiner,
Lookup, Rank, and Sorter transformations.
When the Integration Service partitions a cache, it
creates a separate cache for each partition
and allocates the configured cache size to
each partition. The Integration Service stores
different data in each cache, where each cache
contains only the rows needed by that partition.
As a result, the Integration Service requires a portion
of total cache memory for each partition.

Mapping Variables in Partitioned Pipelines


The Integration Service evaluates the value of a
mapping variable in each partition separately.
The Integration Service uses the following process to
evaluate variable values:

1. It updates the current value of the variable
separately in each partition according to the
variable function used in the mapping.
2. After loading all the targets in a target load order
group, the Integration Service combines the current
values from each partition into a single final value
based on the aggregation type of the variable.
3. If there is more than one target load order group
in the session, the final current value of a mapping
variable in a target load order group becomes the
current value in the next target load order group.
4. When the Integration Service finishes loading the
last target load order group, the final current value
of the variable is saved into the repository

You can configure a session to read a source file with


one thread or with multiple threads.
You must choose the same connection type for
all partitions that read the file.
When you configure a session to write to a file
target, you can write the target output to a
separate file for each partition or to a merge file
that contains the target output for all partitions.
You can configure connection settings and file
properties for each target partition.
When you create a partition point at
transformations, the Workflow Manager sets the
default partition type.
You can change the partition type depending on the
transformation type
Adding and Deleting Partition Points

The following changes to mappings can cause
session failure:
Any changes you make (such as adding or altering
transformations) that affect the existing partitions
or partition points.
Partition Points
Partition points mark the boundaries between
threads in a pipeline and divide the pipeline into
stages.
By default, the Integration Service creates one
reader and one writer partition point.
The Integration Service redistributes rows of data at
partition points.
You can add partition points to increase the number
of transformation threads and increase session
performance.

Rules and Guidelines for Adding and Deleting


Partition Points
- You cannot create a partition point at a source
instance.
- You cannot create a partition point at a
Sequence Generator transformation or an
unconnected transformation.
- You can add a partition point at any other
transformation provided that no partition
point receives input from more than one
pipeline stage.
- You cannot delete a partition point at a
Source Qualifier transformation, a Normalizer
transformation for COBOL sources, or a target
instance.

When you configure a session to read a source


database, the Integration Service creates a
separate connection and SQL query to the
source database for each partition
When you configure a session to load data to a
relational target, the Integration Service creates a
separate connection to the target database for
each partition at the target instance.
You configure the reject file names and directories
for the target. The Integration Service creates
one reject file for each target partition


- You cannot delete a partition point at a multiple
input group Custom transformation that is
configured to use one thread per partition.
- You cannot delete a partition point at a multiple
input group transformation that is upstream from a
multiple input group Custom transformation that is
configured to use one thread per partition
- The following partition types have restrictions with
dynamic partitioning:
Pass-through - When you use dynamic
partitioning, if you change the number of partitions
at a partition point, the number of partitions in each
pipeline stage changes.
Key Range - To use key range with dynamic
partitioning you must define a closed range of
numbers or date keys. If you use an open-ended
range, the session runs with one partition.
You can add and delete partition points at
other transformations in the pipeline
according to the following rules:
- You cannot create partition points at source
instances.
- You cannot create partition points at
Sequence Generator transformations or
unconnected transformations.
- You can add partition points at any other
transformation provided that no partition
point receives input from more than one
pipeline stage.

Note: When you create a custom SQL query to read
database tables and you set database partitioning,
the Integration Service reverts to pass-through
partitioning and prints a message in the session log.
Partitioning File Sources
The Integration Service creates one connection to
the file source when you configure the session to
read with one thread, and it creates multiple
concurrent connections to the file source when you
configure the session to read with multiple threads.
Use the following types of partitioned file sources:
Flat file - You can configure a session to read flat
file, XML, or COBOL source files.
Command - You can configure a session to use an
operating system command to generate source data
rows or generate a file list.
When connecting to file sources, you must choose
the same connection type for all partitions.
You may choose different connection objects as long
as each object is of the same type.
To specify single- or multi-threaded reading for flat
file sources, configure the source file name
property for partitions 2-n.
To configure for single-threaded reading, pass empty
data through partitions 2-n.
To configure for multi-threaded reading, leave the
source file name blank for partitions 2-n.
Rules and Guidelines for Partitioning File Sources
Use the following rules and guidelines when you
configure a file source session with multiple
partitions:
- Use pass-through partitioning at the source
qualifier.
- Use single- or multi-threaded reading with flat file
or COBOL sources.
- Use single-threaded reading with XML sources.
- You cannot use multi-threaded reading if the
source files are non-disk files, such as FTP files or
WebSphere MQ sources.
- If you use a shift-sensitive code page, use
multi-threaded reading if the following conditions
are true:
- The file is fixed-width.
- The file is not line sequential.
- You did not enable user-defined shift state in the
source definition.
- To read data from the three flat files concurrently,
you must specify three partitions at the source
qualifier. Accept the default partition type,
pass-through.
- If you configure a session for multi-threaded
reading, and the Integration Service cannot create
multiple threads to a file source, it writes a message
to the session log and reads the source with one
thread.
- When the Integration Service uses multiple threads
to read a source file, it may not read the rows in the
file sequentially. If sort order is important, configure
the session to read the file with a single thread. For
example, sort order may be important if the
mapping contains a sorted Joiner transformation and
the file source is the sort origin.
- You can also use a combination of direct and
indirect files to balance the load.
- Session performance for multi-threaded reading is
optimal with large source files. The load may be
unbalanced if the amount of input data is small.
- You cannot use a command for a file source if the
command generates source data and the session is
configured to run on a grid or is configured with the
resume from the last checkpoint recovery strategy.

Using One Thread to Read a File Source

When the Integration Service uses one thread to
read a file source, it creates one connection to the
source.
The Integration Service reads the rows in the file or
file list sequentially.
You can configure single-threaded reading for direct
or indirect file sources in a session:
Reading direct files - You can configure the
Integration Service to read from one or more direct
files. If you configure the session with more than one
direct file, the Integration Service creates a
concurrent connection to each file. It does not create
multiple connections to a file.
Reading indirect files - When the Integration
Service reads an indirect file, it reads the file list and
then reads the files in the list sequentially. If the
session has more than one file list, the Integration
Service reads the file lists concurrently, and it reads
the files in the list sequentially.
Using Multiple Threads to Read a File Source
When the Integration Service uses multiple threads
to read a source file, it creates multiple concurrent
connections to the source.
The Integration Service may or may not read the
rows in a file sequentially.
You can configure multi-threaded reading for direct
or indirect file sources in a session:
Reading direct files - When the Integration
Service reads a direct file, it creates multiple reader
threads to read the file concurrently. You can
configure the Integration Service to read from one or
more direct files. For example, if a session reads
from two files and you create five partitions, the
Integration Service may distribute one file between
two partitions and one file between three partitions.
Reading indirect files - When the Integration
Service reads an indirect file, it creates multiple
threads to read the file list concurrently. It also
creates multiple threads to read the files in the list
concurrently. The Integration Service may use more
than one thread to read a single file.
Partitioning Joiner Transformations
When you create a partition point at the Joiner
transformation, the Workflow Manager sets the
partition type to hash auto-keys when the
transformation scope is All Input.
The Workflow Manager sets the partition type to pass-through when the transformation scope is Transaction.
You must create the same number of partitions for the master and detail source.
If you configure the Joiner transformation for sorted input, you can change the partition type to pass-through.
You can specify only one partition if the pipeline
contains the master source for a Joiner
transformation and you do not add a partition point
at the Joiner transformation.
The Integration Service uses cache partitioning
when you create a partition point at the Joiner
transformation.
When you use partitioning with a Joiner
transformation, you can create multiple partitions
for the master and detail source of a Joiner
transformation.
If you do not create a partition point at the Joiner
transformation, you can create n partitions for the
detail source, and one partition for the master
source (1:n).
Note: You cannot add a partition point at the Joiner transformation when you configure the Joiner transformation to use the Row transformation scope.
When you join data, you can partition data for the
master and detail pipelines in the following ways:
1:n - Use one partition for the master source and
multiple partitions for the detail source. The
Integration Service maintains the sort order because
it does not redistribute master data among
partitions.
n:n - Use an equal number of partitions for the
master and detail sources. When you use n:n
partitions, the Integration Service processes multiple
partitions concurrently. You may need to configure
the partitions to maintain the sort order depending
on the type of partition you use at the Joiner
transformation.
Note: When you use 1:n partitions, do not add a
partition point at the Joiner transformation. If you
add a partition point at the Joiner transformation,
the Workflow Manager adds an equal number of
partitions to both master and detail pipelines.
Pushdown Optimization
When you run a session configured for pushdown
optimization, the Integration Service translates the
transformation logic into SQL queries and sends the
SQL queries to the database.
The source or target database executes the SQL
queries to process the transformations.
The amount of transformation logic you can push to the database depends on the database, the transformation logic, and the mapping and session configuration.
The Integration Service processes all transformation logic that it cannot push to a database.
Pushdown Optimization Types
You can configure the following types of pushdown
optimization:
Source-side pushdown optimization
- The Integration Service pushes as much
transformation logic as possible to the source
database.
- The Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the source database.
- The Integration Service generates and executes a SELECT statement based on the transformation logic for each transformation it can push to the database.
- It then reads the results of this SQL query and processes the remaining transformations.
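As an illustration only, the SELECT generated for source-side pushdown folds the pushed transformation logic into a single query. In the sketch below the table, the columns, and the Filter condition are all hypothetical:
SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.CUST_NAME, CUSTOMERS.STATUS
FROM CUSTOMERS
WHERE CUSTOMERS.STATUS = 'ACTIVE'
Here a Filter transformation on STATUS becomes the WHERE clause, so the database returns only the rows the mapping would have kept; the actual statement depends on the transformations, the mapping, and the database.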
Target-side pushdown optimization
- The Integration Service pushes as much
transformation logic as possible to the target
database.
- The Integration Service analyzes the mapping from the target to the source or until it reaches an upstream transformation it cannot push to the target database.
- It generates an INSERT, DELETE, or UPDATE
statement based on the transformation logic for
each transformation it can push to the target
database.
Full pushdown optimization
- The Integration Service attempts to push all transformation logic to the target database.
- To use full pushdown optimization, the source and target databases must be in the same relational database management system.
- If the Integration Service cannot push all transformation logic to the database, it performs both source-side and target-side pushdown optimization.
- When you run a session with large quantities of
data and full pushdown optimization, the database
server must run a long transaction.
- Consider the following database performance
issues when you generate a long transaction:
- A long transaction uses more database
resources.
- A long transaction locks the database for longer
periods of time. This reduces database concurrency
and increases the likelihood of deadlock.

- A long transaction increases the likelihood of an unexpected event.
- To minimize database performance issues for long transactions, consider using source-side or target-side pushdown optimization.

Active and Idle Databases
During pushdown optimization, the Integration
Service pushes the transformation logic to one
database, which is called the active database.
A database that does not process transformation
logic is called an idle database.
For example, a mapping contains two sources that
are joined by a Joiner transformation.
If the session is configured for source-side pushdown
optimization, the Integration Service pushes the
Joiner transformation logic to the source in the
detail pipeline, which is the active database.
The source in the master pipeline is the idle database because it does not process transformation logic.
The Integration Service uses the following criteria to
determine which database is active or idle:
- When using full pushdown optimization, the target
database is active and the source database is idle.
- In sessions that contain a Lookup transformation,
the source or target database is active, and the
lookup database is idle.
- In sessions that contain a Joiner transformation,
the source in the detail pipeline is active, and the
source in the master pipeline is idle.
- In sessions that contain a Union transformation,
the source in the first input group is active. The
sources in other input groups are idle

Pushdown Compatibility
To push a transformation with multiple connections
to a database, the connections must be pushdown
compatible.
Connections are pushdown compatible if they
connect to databases on the same database
management system and the Integration Service
can identify the database tables that the
connections access.
The following transformations can have multiple
connections:
Joiner - The Joiner transformation can join data from
multiple source connections.
Union - The Union transformation can merge data
from multiple source connections.

Lookup - The connection for the Lookup transformation can differ from the source connection.
Target - The target connection can differ from the
source connection.
Each connection object is pushdown compatible with
itself.
If you configure a session to use the same
connection object for the source and target
connections, the Integration Service can push the
transformation logic to the source or target
database
Error Handling, Logging, and Recovery
The Integration Service and database process error
handling, logging, and recovery differently.
Error Handling
When the Integration Service pushes transformation
logic to the database, it cannot track errors that
occur in the database.
As a result, it handles errors differently than when it
processes the transformations in the session.
When the Integration Service runs a session
configured for full pushdown optimization and an
error occurs, the database handles the errors.
When the database handles errors, the Integration
Service does not write reject rows to the reject file.
Logging
When the Integration Service pushes transformation
logic to the database, it cannot trace all the events
that occur inside the database server.
The statistics the Integration Service can trace
depend on the type of pushdown optimization.
When you push transformation logic to the
database, the Integration Service generates a
session log with the following differences:
- The session log does not contain details for
transformations processed by the database.
- The session log does not contain the thread busy
percentage when the session is configured for full
pushdown optimization.
- The session log contains the number of loaded rows when the session is configured for source-side, target-side, and full pushdown optimization.
- The session log does not contain the number of
rows read from the source when the Integration
Service uses full pushdown optimization and pushes
all transformation logic to the database.
- The session log contains the number of rows read from each source when the Integration Service uses source-side pushdown optimization.

Recovery
If you configure a session for full pushdown optimization and the session fails, the Integration Service cannot perform incremental recovery because the database processes the transformations.
Instead, the database rolls back the transactions.
If the database server fails, it rolls back transactions
when it restarts.
If the Integration Service fails, the database server
rolls back the transaction.
If the failure occurs while the Integration Service is
creating temporary sequence objects or views in
the database, which is before any rows have been
processed, the Integration Service runs the
generated SQL on the database again.
If the failure occurs before the database processes
all rows, the Integration Service performs the
following tasks:
1. If applicable, the Integration Service drops and
recreates temporary view or sequence objects in the
database to ensure duplicate values are not
produced.
2. The Integration Service runs the generated SQL
on the database again.
If the failure occurs while the Integration Service is
dropping the temporary view or sequence objects
from the database, which is after all rows are
processed, the Integration Service tries to drop the
temporary objects again
Using the $$PushdownConfig Mapping Parameter
Depending on the database workload, you might
want to use source-side, target-side, or full
pushdown optimization at different times.
For example, use source-side or target-side
pushdown optimization during the peak hours of
the day, but use full pushdown optimization from
midnight until 2 a.m. when database activity is low.
To use different pushdown optimization configurations at different times, use the $$PushdownConfig mapping parameter.
The settings in the $$PushdownConfig parameter
override the pushdown optimization settings in the
session properties.
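As a sketch, the parameter can be switched through the parameter file used for each run. The folder, workflow, and session names below are hypothetical, and the set of values that $$PushdownConfig accepts (for example Source, Target, or Full) should be confirmed for your PowerCenter version:
[MyFolder.WF:wf_nightly_load.ST:s_load_orders]
$$PushdownConfig=Full
A parameter file used during peak hours could set $$PushdownConfig=Source instead, so the same session runs with lighter pushdown during the day and full pushdown at night.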
Partitioning
You can push a session with multiple partitions to a
database if the partition types are pass-through
partitioning or key range partitioning

Rules and Guidelines for Pushdown Optimization and Transformations
Use the following rules and guidelines when you
configure the Integration Service to push
transformation logic to a database.
The Integration Service processes the transformation logic if any of the following conditions are true:
- The transformation logic updates a mapping
variable and saves it to the repository database.
- The transformation contains a variable port.
- The transformation meets all of the following
criteria:
- Is not a Sorter transformation, Union
transformation, or target.
- Is pushed to Microsoft SQL Server, Sybase, or
Teradata.
- Is downstream from a Sorter transformation,
which is downstream from a Union transformation or
contains a distinct sort.
- The session is configured to override the default
values of input or output ports.
- The database does not have an equivalent
operator, variable, or function that is used in an
expression in the transformation.
- The mapping contains too many branches. When
you branch a pipeline, the SQL statement required
to represent the mapping logic becomes more
complex. The Integration Service cannot generate
SQL for a mapping that contains more than 64 two-way branches, 43 three-way branches, or 32 four-way branches. If the mapping branches exceed
these limitations, the Integration Service processes
the downstream transformations.
The Integration Service processes all transformations in the mapping if any of the following conditions are true:
- The session is a data profiling or debug session.
- The session is configured to log row errors.
Row Error Logging
You can log row errors into relational tables or flat
files.
When you enable error logging, the Integration
Service creates the error tables or an error log file
the first time it runs the session. Error logs are
cumulative.
If the error logs exist, the Integration Service
appends error data to the existing error logs
You can log source row data from flat file or
relational sources but you cannot log row
errors from XML file sources.

You can view the XML source errors in the session log.
By default, the Integration Service logs transformation errors in the session log and reject rows in the reject file.
When you enable error logging, the Integration Service does not generate a reject file or write dropped rows to the session log.
Without a reject file, the Integration Service does not
log Transaction Control transformation rollback or
commit errors.
If you want to write rows to the session log in
addition to the row error log, you can enable
verbose data tracing.
Note: When you log row errors, session
performance may decrease because the
Integration Service processes one row at a
time instead of a block of rows at once
Understanding the Error Log Tables
When you choose relational database error logging,
the Integration Service creates the following error
tables the first time you run a session:
PMERR_DATA - Stores data and metadata about
a transformation row error and its corresponding
source row.
PMERR_MSG - Stores metadata about an error and
the error message.
PMERR_SESS - Stores metadata about the session.
PMERR_TRANS - Stores metadata about the source
and transformation ports, such as name and
datatype, when a transformation error occurs.
If the error tables exist for a session, the Integration
Service appends row errors to these tables.
Relational database error logging lets you
collect row errors from multiple sessions in
one set of error tables.
You can specify a prefix for the error tables. The
error table names can have up to eleven
characters.
The Integration Service creates the error tables
without specifying primary and foreign keys.
However, you can specify key columns.
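Because the error tables are ordinary relational tables, you can review logged errors with SQL. A minimal sketch, assuming no table name prefix was configured; the column layout varies by PowerCenter version, so the queries simply select all columns:
SELECT * FROM PMERR_SESS;  -- which sessions logged errors
SELECT * FROM PMERR_MSG;   -- the error messages
SELECT * FROM PMERR_DATA;  -- the row data captured with each error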
Workflow Recovery
You can recover a workflow if the Integration Service
can access the workflow state of operation.
The workflow state of operation includes the status
of tasks in the workflow and workflow variable
values.

The Integration Service stores the state in memory or on disk, based on how you configure the workflow:
Enable recovery - When you enable a workflow for
recovery, the Integration Service saves the
workflow state of operation in a shared
location. You can recover the workflow if it
terminates, stops, or aborts. The workflow
does not have to be running.
Suspend - When you configure a workflow to
suspend on error, the Integration Service stores
the workflow state of operation in memory.
You can recover the suspended workflow if a task
fails. You can fix the task error and recover the
workflow
The Integration Service recovers tasks in the
workflow based on the recovery strategy of the
task.
By default, the recovery strategy for Session
and Command tasks is to fail the task and
continue running the workflow.
You can configure the recovery strategy for Session
and Command tasks.
The strategy for all other tasks is to restart
the task
Workflow State of Operation
When you enable a workflow for recovery, the
Integration Service stores the workflow state of
operation in the shared location, $PMStorageDir.
The Integration Service can restore the state of
operation to recover a stopped, aborted, or
terminated workflow.
The workflow state of operation includes the following information:
- Active service requests
- Completed and running task status
- Workflow variable values

When you run concurrent workflows, the Integration Service appends the instance name or the workflow run ID to the workflow recovery storage file in $PMStorageDir.
When you enable a workflow for recovery the
Integration Service does not store the session state
of operation by default.

Session State of Operation
When you configure the session recovery strategy to resume from the last checkpoint, the Integration Service stores the session state of operation in the shared location, $PMStorageDir.
The Integration Service also saves relational target recovery information in target database tables.
When the Integration Service performs recovery, it restores the state of operation to recover the session from the point of interruption. It uses the target recovery data to determine how to recover the target tables.
You can configure the session to save the session state of operation even if you do not save the workflow state of operation.
You can recover the session, or you can recover the workflow from the session.
The session state of operation includes the following information:
Source - If the output from a source is not deterministic and repeatable, the Integration Service saves the result from the SQL query to a shared storage file in $PMStorageDir.
Transformation - The Integration Service creates checkpoints in $PMStorageDir to determine where to start processing the pipeline when it runs a recovery session. When you run a session with an incremental Aggregator transformation, the Integration Service creates a backup of the Aggregator cache files in $PMCacheDir at the beginning of a session run. The Integration Service promotes the backup cache to the initial cache at the beginning of a session recovery run.
Relational target recovery data - The Integration Service writes recovery information to recovery tables in the target database to determine the last row committed to the target when the session was interrupted.
Target Recovery Tables
When the Integration Service runs a session that has a resume recovery strategy, it writes to recovery tables on the target database system.
When the Integration Service recovers the session, it uses information in the recovery tables to determine where to begin loading data to target tables.
The Integration Service creates the following recovery tables in the target database:
PM_RECOVERY - Contains target load information for the session run. The Integration Service removes the information from this table after each successful session and initializes the information at the beginning of subsequent sessions.
PM_TGT_RUN_ID - Contains information the Integration Service uses to identify each target on the database. The information remains in the table between session runs. If you manually create this table, you must create a row and enter a value other than zero for LAST_TGT_RUN_ID to ensure that the session recovers successfully.
PM_REC_STATE - Contains information the Integration Service uses to determine if it needs to write messages to the target table during recovery for a real-time session.
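If you do create PM_TGT_RUN_ID by hand, the requirement above translates into something like the following sketch. It assumes LAST_TGT_RUN_ID is the only column you must seed; in practice, use the table creation scripts shipped with PowerCenter rather than hand-written DDL:
INSERT INTO PM_TGT_RUN_ID (LAST_TGT_RUN_ID) VALUES (1);
COMMIT;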
Task Recovery Strategies
Each task in a workflow has a recovery strategy.
When the Integration Service recovers a workflow, it
recovers tasks based on the recovery strategy:
Restart task - When the Integration Service
recovers a workflow, it restarts each recoverable
task that is configured with a restart strategy. You
can configure Session and Command tasks with a
restart recovery strategy. All other tasks have a
restart recovery strategy by default.
Fail task and continue workflow - When the
Integration Service recovers a workflow, it does not
recover the task. The task status becomes failed,
and the Integration Service continues running the
workflow. Configure a fail recovery strategy if you
want to complete the workflow, but you do not want
to recover the task. You can configure Session and
Command tasks with the fail task and continue
workflow recovery strategy.
Resume from the last checkpoint - The
Integration Service recovers a stopped, aborted, or
terminated session from the last checkpoint. You can
configure a Session task with a resume strategy.

Automatically Recovering Terminated Tasks
When you have the high availability option, you can configure automatic recovery of terminated tasks.
When you enable automatic task recovery, the Integration Service recovers terminated Session and Command tasks without user intervention if the workflow is still running.
You configure the number of times the Integration Service attempts to recover the task.
Enable automatic task recovery in the workflow properties.
Resuming Sessions
When the Integration Service resumes a session, the
recovery session must produce the same data as
the original session.
The session is not valid if you configure recovery to
resume from the last checkpoint, but the session
cannot produce repeatable data
When you recover a session from the last
checkpoint, the Integration Service restores the
session state of operation to determine the type of
recovery it can perform:
Incremental - The Integration Service starts
processing data at the point of interruption. It does
not read or transform rows that it processed before
the interruption. By default, the Integration Service
attempts to perform incremental recovery.
Full - The Integration Service reads all source rows again and performs all transformation logic if it cannot perform incremental recovery. The Integration Service begins writing to the target at the last commit point. If any session component requires full recovery, the Integration Service performs full recovery on the session.

Working with Repeatable Data
When you configure recovery to resume from the last checkpoint, the recovery session must be able to produce the same data in the same order as the original session.

When you validate a session, the Workflow Manager verifies that the transformations are configured to produce repeatable and deterministic data.
The session is not valid if you configure recovery to
resume from the last checkpoint, but the
transformations are not configured for repeatable
and deterministic data.
Session data is repeatable when all targets receive
repeatable data from the following mapping
objects:
Source - The output data from the source is
repeatable between the original run and the
recovery run.
Transformation - The output data from each
transformation to the target is repeatable.

For example, a Lookup transformation produces repeatable data when it receives repeatable data, such as the output of a Sorter transformation.
Whether a transformation produces repeatable data depends on the transformation type and on how the transformation is configured.

Configuring a Mapping for Recovery
You can configure a mapping to enable transformations in the session to produce the same data between the session and recovery run.
When a mapping contains a transformation that
never produces repeatable data, you can add a
transformation that always produces repeatable
data immediately after it.

For example, a mapping contains two Source Qualifier transformations that produce repeatable data, a Union transformation and a Custom transformation that never produce repeatable data, and a Lookup transformation that produces repeatable data when it receives repeatable data.
Therefore, the target does not receive repeatable data and you cannot configure the session to resume recovery.
You can modify the mapping to enable resume
recovery.
Add a Sorter transformation configured for
distinct output rows immediately after the
transformations that never output repeatable
data. Add the Sorter transformation after the
Custom transformation.

You can configure the Output is Repeatable and Output is Deterministic properties for the following transformations, or you can add a transformation that produces repeatable data immediately after these transformations:
- Application Source Qualifier
- Custom
- External Procedure
- Source Qualifier, relational
- Stored Procedure
Steps to Recover Workflows and Tasks
You can use one of the following methods to recover
a workflow or task:
Recover a workflow - Continue processing the
workflow from the point of interruption.
Recover a session - Recover a session but not the
rest of the workflow.
Recover a workflow from a session - Recover a
session and continue processing a workflow
If you want to restart a workflow or task without
recovery, you can restart the workflow or task in
cold start mode.
Rules and Guidelines for Session Recovery
Configuring Recovery to Resume from the Last
Checkpoint
Use the following rules and guidelines when
configuring recovery to resume from last checkpoint:
- You must use pass-through partitioning for each
transformation.
- You cannot configure recovery to resume from the
last checkpoint for a session that runs on a grid.
- When you configure a session for full pushdown
optimization, the Integration Service runs the
session on the database. As a result, it cannot
perform incremental recovery if the session fails.
When you perform recovery for sessions that contain
SQL overrides, the Integration Service must drop
and recreate views.
- When you modify a workflow or session between
the interrupted run and the recovery run, you might
get unexpected results. The Integration Service does
not prevent recovery for a modified workflow. The
recovery workflow or session log displays a message
when the workflow or the task is modified since last
run.
- The pre-session command and pre-SQL commands run only once when you resume a session from the last checkpoint. If a pre- or post- command or SQL command fails, the Integration Service runs the command again during recovery. Design the commands so you can rerun them.
- You cannot configure a session to resume if it
writes to a relational target in bulk mode
Unrecoverable Workflows or Tasks
In some cases, the Integration Service cannot
recover a workflow or task. You cannot recover a
workflow or task under the following circumstances:
You change the number of partitions - If you
change the number of partitions after a session fails,
the recovery session fails.
The interrupted task has a fail recovery
strategy - If you configure a Command or Session
recovery to fail and continue the workflow recovery,
the task is not recoverable.
Recovery storage file is missing - The
Integration Service fails the recovery session or
workflow if the recovery storage file is missing from
$PMStorageDir or if the definition of $PMStorageDir
changes between the original and recovery run.
Recovery table is empty or missing from the
target database - The Integration Service fails a
recovery session under the following circumstances:
- You deleted the table after the Integration Service
created it.
- The session enabled for recovery failed
immediately after the Integration Service removed
the recovery information from the table.
You might get inconsistent data if you perform
recovery under the following circumstances:
The sources or targets change after the initial
session - If you drop or create indexes or edit data
in the source or target tables before recovering a
session, the Integration Service may return missing
or repeat rows.
The source or target code pages change after
the initial session failure - If you change the
source or target code page, the Integration Service
might return incorrect data. You can perform
recovery if the code pages are two-way compatible
with the original code pages.
Stopping and Aborting
When you stop a workflow, the Integration Service
tries to stop all the tasks that are currently running
in the workflow
If it cannot stop the workflow, you need to abort the
workflow.
The Integration Service can stop the following tasks
completely:
- Session
- Command

- Timer
- Event-Wait
- Worklet
When you stop a Command task that contains multiple commands, the Integration Service finishes executing the current command and does not run the rest of the commands.
The Integration Service cannot stop tasks such as the Email task.
For example, if the Integration Service has already started sending an email when you issue the stop command, the Integration Service finishes sending the email before it stops running the workflow.
The Integration Service aborts the workflow if the Repository Service process shuts down.
Concurrent Workflows
Use the following rules and guidelines for concurrent
workflows:
- You cannot reference workflow run instances in
parameter files. To use separate parameters for
each instance, you must configure different
parameter files.
- If you use the same cache file name for more than
one concurrent workflow instance, each workflow
instance will be valid. However, sessions will fail if
conflicts occur writing to the cache.
- You can use pmcmd to restart concurrent
workflows by run ID or instance name.
- If you configure multiple instances of a workflow
and you schedule the workflow, the Integration
Service runs all instances at the scheduled time. You
cannot run instances on separate schedules.
- Configure a worklet to run concurrently on the
worklet General tab.
- You must enable a worklet to run concurrently if
the parent workflow is enabled to run concurrently.
Otherwise the workflow is invalid.
- You can enable a worklet to run concurrently and
place it in two non-concurrent workflows. The
Integration Service can run the two worklets
concurrently.
- Two workflows enabled to run concurrently can run
the same worklet. One workflow can run two
instances of the same worklet if the worklet has no
persisted variables.
- A session in a worklet can run concurrently with a
session in another worklet of the same instance
name when the session does not contain persisted
variables.
The following transformations have restrictions with concurrent workflows:
Aggregator transformation - You cannot use incremental aggregation in a concurrent workflow. The session fails.
Lookup transformation - Use the following rules and guidelines for Lookup transformations in concurrent workflows:
- You can use static or dynamic lookup cache with concurrent workflows.
- When the cache is non-persistent, the Integration Service adds the workflow run ID as a prefix to the cache file name.
- When the cache is an unnamed persistent cache, the Integration Service adds the run instance name as a prefix to the cache file name.
- If the cache is a dynamic, unnamed, persistent cache and the current workflow is configured to allow concurrent runs with the same instance name, the session fails.
- If the lookup cache name is parameterized, the Integration Service names the cache file with the parameter value. Pass a different file name for each run instance.
Sequence Generator transformation - To avoid generating the same set of sequence numbers for concurrent workflows, configure the number of cached values in the Sequence Generator transformation.

Grid Processing
Rules and Guidelines for Configuring a Workflow or
Session to Run on a Grid
- To run sessions over the grid, verify that the
operating system and bit mode is the same for each
node of the grid. A session might not run on the grid
if the nodes run on different operating systems or bit
modes.
- If you override a service process variable, ensure
that the Integration Service can access input files,
caches, logs, storage and temporary directories, and
source and target file directories.
- To ensure that a Session, Command, or predefined
Event-Wait task runs on a particular node, configure
the Integration Service to check resources and
specify a resource requirement for the task.
- To ensure that session threads for a mapping
object run on a particular node, configure the
Integration Service to check resources and specify a
resource requirement for the object.
- When you run a session that creates cache files,
configure the root and cache directory to use a
shared location to ensure consistency between
cache files.
- Ensure the Integration Service builds the cache in a shared location when you add a partition point at a Joiner transformation and the transformation is configured for 1:n partitioning. The cache for the Detail pipeline must be shared.
- Ensure the Integration Service builds the cache in a
shared location when you add a partition point at a
Lookup transformation, and the partition type is not
hash auto-keys.
- When you run a session that uses dynamic
partitioning, and you want to distribute session
threads across all nodes in the grid, configure
dynamic partitioning for the session to use the
Based on number of nodes in the grid method.
- You cannot run a debug session on a grid.
- You cannot configure a resume recovery strategy
for a session that you run on a grid.
- Configure the session to run on a grid when you
work with sessions that take a long time to run.
- Configure the workflow to run on a grid when you
have multiple concurrent sessions.
- You can run a persistent profile session on a grid,
but you cannot run a temporary profile session on a
grid.
- When you use a Sequence Generator transformation, increase the number of cached values to reduce the communication required between the master and worker DTM processes and the repository.
- To ensure that the Log Viewer can accurately order
log events when you run a workflow or session on a
grid, use time synchronization software to ensure
that the nodes of a grid use a synchronized
date/time.
- If the workflow uses an Email task in a Windows
environment, configure the same Microsoft Outlook
profile on each node to ensure the Email task can
run.
Workflow Variables
Use the following types of workflow variables:
Predefined workflow variables - The Workflow
Manager provides predefined workflow variables for
tasks within a workflow.
User-defined workflow variables - You create
user-defined workflow variables when you create a
workflow.
Use workflow variables when you configure the
following types of tasks:
Assignment tasks - Use an Assignment task to
assign a value to a user-defined workflow variable.
For example, you can increment a user-defined
counter variable by setting the variable to its current
value plus 1.
Decision tasks - Decision tasks determine how the
Integration Service runs a workflow. For example,

use the Status variable to run a second session only


if the first session completes successfully.
Links - Links connect each workflow task. Use
workflow variables in links to create branches in the
workflow. For example, after a Decision task, you
can create one link to follow when the decision
condition evaluates to true, and another link to
follow when the decision condition evaluates to
false.
Timer tasks - Timer tasks specify when the
Integration Service begins to run the next task in the
workflow. Use a user-defined date/time variable to
specify the time the Integration Service starts to run
the next task
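For the Links usage described above, a link condition is an expression over workflow variables. A minimal sketch, where s_LoadCustomers is a hypothetical session name:
$s_LoadCustomers.Status = SUCCEEDED
Attach this condition to the link that should be followed only when the session succeeds; the alternate branch can test $s_LoadCustomers.Status = FAILED.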
Workflow Variable Start and Current Values
Conceptually, the Integration Service holds two
different values for a workflow variable during a
workflow run:
- Start value of a workflow variable
- Current value of a workflow variable
The Integration Service looks for the start
value of a variable in the following order:
1. Value in parameter file
2. Value saved in the repository (if the variable is
persistent)
3. User-specified default value
4. Datatype default value
Parameters and Variables in Sessions
Use user-defined session parameters in session or
workflow properties and define the values in a
parameter file
In the parameter file, folder and session
names are case sensitive
User-defined session parameters do not have default
values, so you must define them in a parameter
file.
If the Integration Service cannot find a value
for a user-defined session parameter, it fails
the session, takes an empty string as the
default value, or fails to expand the
parameter at run time
You can run a session with different parameter files
when you use pmcmd to start a session.
The parameter file you set with pmcmd overrides
the parameter file in the session or workflow
properties

You cannot define built-in session parameter values in the parameter file. The Integration Service expands these parameters when the session runs.
Rules and Guidelines for Creating File Parameters
and Database Connection Parameters
- When you define the parameter file as a resource
for a node, verify the Integration Service runs the
session on a node that can access the parameter
file. Define the resource for the node, configure the
Integration Service to check resources, and edit the
session to require the resource.
- When you create a file parameter, use
alphanumeric and underscore characters. For
example, to name a source file parameter, use
$InputFileName, such as $InputFile_Data.
- All session file parameters of a particular type must
have distinct names. For example, if you create two
source file parameters, you might name them
$SourceFileAccts and $SourceFilePrices.
- When you define the parameter in the file, you can
reference any directory local to the Integration
Service.
- Use a parameter to define the location of a file.
Clear the entry in the session properties that define
the file location. Enter the full path of the file in the
parameter file.
- You can change the parameter value in the
parameter file between session runs, or you can
create multiple parameter files. If you use multiple
parameter files, use the pmcmd Startworkflow
command with the -paramfile or -localparamfile
options to specify which parameter file to use.
Use the following rules and guidelines when you
create database connection parameters:
- You can change connections for relational sources,
targets, lookups, and stored procedures.
- When you define the parameter, you can reference
any database connection in the repository.
- Use the same $DBConnection parameter for more
than one connection in a session.
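A short sketch of how such parameters might appear in a parameter file; the folder, workflow, session, connection, and path names are hypothetical:
[MyFolder.WF:wf_daily_load.ST:s_load_accounts]
$DBConnectionSource=Ora_DW_Dev
$InputFile_Accounts=/data/incoming/accounts.dat
In the session properties you would reference $DBConnectionSource as the relational connection and $InputFile_Accounts as the source file name, so switching environments only requires pointing the session at a different parameter file.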

Mapping Parameters and Variables in Sessions
If you use mapping variables in a session, you can clear any of the variable values saved in the repository by editing the session.
When you clear the variable values, the Integration Service uses the values in the parameter file the next time you run a session.
If the session does not use a parameter file, the Integration Service uses the values assigned in the pre-session variable assignment.
If there are no assigned values, the Integration Service uses the initial values defined in the mapping.
Assigning Parameter and Variable Values in a
Session
You can update the values of certain parameters and
variables before or after a non-reusable session
runs
Note: You cannot assign parameters and variables in
reusable sessions
You can update the following types of parameters
and variables before or after a session runs:
Pre-session variable assignment - You can
update mapping parameters, mapping variables,
and session parameters before a session runs. You
can assign these parameters and variables the
values of workflow or worklet variables in the parent
workflow or worklet. Therefore, if a session is in a
worklet within a workflow, you can assign values
from the worklet variables, but not the workflow
variables.
You cannot update mapplet variables in the pre-session variable assignment.
Post-session on success variable assignment - You can update workflow or worklet variables in the parent workflow or worklet after the session completes successfully. You can assign these variables the values of mapping parameters and variables.
Post-session on failure variable assignment - You can update workflow or worklet variables in the parent workflow or worklet when the session fails. You can assign these variables the values of mapping parameters and variables.
Passing Parameter and Variable Values between
Sessions
To pass the mapping variable value from
s_NewCustomers to s_MergeCustomers, complete
the following steps:
1. Configure the mapping associated with session s_NewCustomers to use a mapping variable, for example, $$Count1.
2. Configure the mapping associated with session
s_MergeCustomers to use a mapping variable, for
example, $$Count2.
3. Configure the workflow to use a user-defined
workflow variable, for example, $$PassCountValue.
4. Configure session s_NewCustomers to assign the
value of mapping variable $$Count1 to workflow
variable $$PassCountValue after the session
completes successfully.

5. Configure session s_MergeCustomers to assign the value of workflow variable $$PassCountValue to mapping variable $$Count2 before the session starts.
Parameter Files
A parameter file is a list of parameters and variables
and their associated values.
These values define properties for a service, service
process, workflow, worklet, or session.
The Integration Service applies these values when
you run a workflow or session that uses the
parameter file
The Integration Service reads the parameter file at
the start of the workflow or session to determine
the start values for the parameters and variables
defined in the file
Consider the following information when you use
parameter files:
Types of parameters and variables - You can
define different types of parameters and variables in
a parameter file. These include service variables,
service process variables, workflow and worklet
variables, session parameters, and mapping
parameters and variables.
Properties you can set in parameter files - Use
parameters and variables to define many properties
in the Designer and Workflow Manager. For example,
you can enter a session parameter as the update
override for a relational target instance, and set this
parameter to the UPDATE statement in the
parameter file. The Integration Service expands the
parameter when the session runs.
Parameter file structure - Assign a value for a
parameter or variable in the parameter file by
entering the parameter or variable name and value
on a single line in the form name=value. Groups of
parameters and variables must be preceded by a
heading that identifies the service, service process,
workflow, worklet, or session to which the
parameters or variables apply.
Parameter file location - Specify the parameter file
to use for a workflow or session. You can enter the
parameter file name and directory in the workflow or
session properties or in the pmcmd command line.
Parameter and Variable Types
A parameter file can contain different types of
parameters and variables. When you run a session
or workflow that uses a parameter file, the
Integration Service reads the parameter file and

expands the parameters and variables defined in


the file.
You can define the following types of parameter and
variable in a parameter file:
Service variables - Define general properties for the Integration Service such as email addresses, log file counts, and error thresholds. $PMSuccessEmailUser, $PMSessionLogCount, and $PMSessionErrorThreshold are examples of service variables. The service variable values you define in the parameter file override the values that are set in the Administrator tool.
Service process variables - Define the directories
for Integration Service files for each Integration
Service process. $PMRootDir, $PMSessionLogDir, and
$PMBadFileDir are examples of service process
variables. The service process variable values you
define in the parameter file override the values that
are set in the Administrator tool. If the Integration
Service uses operating system profiles, the
operating system user specified in the operating
system profile must have access to the directories
you define for the service process variables.
Workflow variables - Evaluate task conditions and record information in a workflow. For example, you can use a workflow variable in a Decision task to determine whether the previous task ran properly. In a workflow, $TaskName.PrevTaskStatus is a predefined workflow variable and $$VariableName is a user-defined workflow variable.
Worklet variables - Evaluate task conditions and
record information in a worklet. You can use
predefined worklet variables in a parent workflow,
but you cannot use workflow variables from the
parent workflow in a worklet. In a worklet,
$TaskName.PrevTaskStatus is a predefined worklet
variable and $$VariableName is a user-defined
worklet variable.
Session parameters - Define values that can
change from session to session, such as database
connections or file names. $PMSessionLogFile and
$ParamName are user-defined session parameters.
Mapping parameters - Define values that remain constant throughout a session, such as state sales tax rates. When declared in a mapping or mapplet, $$ParameterName is a user-defined mapping parameter.
Mapping variables - Define values that can change during a session. The Integration Service saves the value of a mapping variable to the repository at the end of each successful session run and uses that value the next time you run the session. When declared in a mapping or mapplet, $$VariableName is a mapping variable.

You cannot define the following types of variables in a parameter file:
$Source and $Target connection variables Define the database location for a relational source,
relational target, lookup table, or stored procedure.
Email variables - Define session information in an
email message such as the number of rows loaded,
the session completion time, and read and write
statistics.
Local variables - Temporarily store data in variable
ports in Aggregator, Expression, and Rank
transformations.
Built-in variables - Variables that return run-time
or system information, such as Integration Service
name or system date.
Transaction control variables - Define conditions
to commit or rollback transactions during the
processing of database rows.
ABAP program variables - Represent SAP
structures, fields in SAP structures, or values in the
ABAP program.
Parameter File Structure
Warning: The Integration Service uses the period
character (.) to qualify folder, workflow, and session
names when you run a workflow with a parameter
file. If the folder name contains a period (.), the
Integration Service cannot qualify the names
properly and fails the workflow.
You can define parameters and variables in any
section in the parameter file.
If you define a service or service process variable in
a workflow, worklet, or session section, the variable
applies to the service process that runs the task.
Similarly, if you define a workflow variable in a
session section, the value of the workflow variable
applies only when the session runs
The following table describes the parameter file
headings that define each section in the parameter
file and the scope of the parameters and variables
that you define in each section:
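As a rough guide, the section headings follow patterns such as the ones below, shown with hypothetical folder, workflow, worklet, and session names; confirm the exact formats for your PowerCenter version:
[Global]
[Service:IntegrationServiceName]
[FolderName.WF:WorkflowName]
[FolderName.WF:WorkflowName.WT:WorkletName]
[FolderName.WF:WorkflowName.ST:SessionName]
A [Global] section applies everywhere, while the more specific headings narrow the scope to one service, workflow, worklet, or session.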

If you specify the same heading more than once in a parameter file, the Integration Service uses the information in the section below the first heading and ignores the information in the sections below subsequent identical headings.
If you define the same parameter or variable
in multiple sections in the parameter file, the
parameter or variable with the smallest
scope takes precedence over parameters or
variables with larger scope. For example, a
parameter file contains the following sections:
[HET_TGTS.WF:wf_TGTS_ASC_ORDR]
$DBConnection_ora=Ora2
[HET_TGTS.WF:wf_TGTS_ASC_ORDR.ST:s_TGTS_ASC_ORDR]
$DBConnection_ora=Ora3
In session s_TGTS_ASC_ORDR, the value for session
parameter $DBConnection_ora is Ora3. In all other
sessions in the workflow, it is Ora2.
Using Variables to Specify Session Parameter Files
When you define a workflow parameter file and a
session parameter file for a session within the
workflow, the Integration Service uses the workflow
parameter file, and ignores the session parameter
file.
To use a variable to define the session parameter file
name, you must define the session parameter file
name and set $PMMergeSessParamFile=TRUE in
the workflow parameter file.
The $PMMergeSessParamFile property causes the
Integration Service to read both the session and
workflow parameter files.
For example, you configured a workflow to run two
concurrent instances that contains three sessions:

Create workflow variables to store the session parameter file names. For example, you create user-defined workflow variables $$s_1ParamFileName, $$s_2ParamFileName, and $$s_3ParamFileName. In the session properties for each session, set the parameter file name to a workflow variable.
If you use a variable as the session parameter file name, and you define the same parameter or variable in both the session and workflow parameter files, the Integration Service sets parameter and variable values according to the following rules:
- When a parameter or variable is defined in the same section of the workflow and session parameter files, the Integration Service uses the value in the workflow parameter file.
- When a parameter or variable is defined in both the session section of the session parameter file and the workflow section of the workflow parameter file, the Integration Service uses the value in the session parameter file.
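A sketch of the corresponding lines in the workflow parameter file; the folder, workflow, and file path are hypothetical, and $$s_1ParamFileName is the workflow variable mentioned above:
[MyFolder.WF:wf_concurrent_load]
$PMMergeSessParamFile=TRUE
$$s_1ParamFileName=/infa/paramfiles/s1_instance1.txt
With $PMMergeSessParamFile=TRUE, the Integration Service reads this workflow parameter file and also the session parameter file that $$s_1ParamFileName points to.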
Guidelines for Creating Parameter Files
List all session parameters - Session parameters do not have default values. If the Integration Service cannot find a value for a session parameter, it may fail the session, take an empty string as the default value, or fail to expand the parameter at run time. Session parameter names are not case sensitive.
List all necessary mapping parameters and
variables - Mapping parameter and variable values
become start values for parameters and variables in
a mapping. Mapping parameter and variable names
are not case sensitive.
Enter folder names for non-unique session
names - When a session name exists more than
once in a repository, enter the folder name to
indicate the location of the session.
Precede parameters and variables in mapplets
with the mapplet name - Use the following
format:
mapplet_name.parameter_name=value
mapplet2_name.variable_name=value
Use multiple parameter files - You assign
parameter files to workflows, worklets, and sessions
individually. You can specify the same parameter file
for all of these tasks or create multiple parameter
files.
When defining parameter values, do not use unnecessary line breaks or spaces - The Integration Service interprets additional spaces as part of a parameter name or value.
Use correct date formats for datetime values - Use the following date formats for datetime values:
- MM/DD/RR
- MM/DD/YYYY
- MM/DD/RR HH24:MI
- MM/DD/YYYY HH24:MI
- MM/DD/RR HH24:MI:SS
- MM/DD/YYYY HH24:MI:SS
- MM/DD/RR HH24:MI:SS.MS

- MM/DD/YYYY HH24:MI:SS.MS
- MM/DD/RR HH24:MI:SS.US
- MM/DD/YYYY HH24:MI:SS.US
- MM/DD/RR HH24:MI:SS.NS
- MM/DD/YYYY HH24:MI:SS.NS
You can use the following separators: dash (-), slash (/), backslash (\), colon (:), period (.), and space. The Integration Service ignores extra spaces. You cannot use one- or three-digit values for year or the HH12 format for hour.
Do not enclose parameter or variable values in quotes - The Integration Service interprets everything after the first equals sign as part of the value.
Use a parameter or variable value of the proper length for the error log table name prefix - If you use a parameter or variable for the error log table name prefix, do not specify a prefix that exceeds 19 characters when naming Oracle, Sybase, or Teradata error log tables. The error table names can have up to 11 characters, and Oracle, Sybase, and Teradata databases have a maximum length of 30 characters for table names. The parameter or variable name can exceed 19 characters.
Using a Parameter File with pmcmd
The -localparamfile option defines a parameter file on a local machine that you can reference when you do not have access to parameter files on the Integration Service machine.
The following command starts workflowA using the parameter file, myfile.txt:
pmcmd startworkflow -uv USERNAME -pv PASSWORD -s SALES:6258 -f east -w wSalesAvg -paramfile '\$PMRootDir/myfile.txt' workflowA
The following command starts taskA using the parameter file, myfile.txt:
pmcmd starttask -uv USERNAME -pv PASSWORD -s SALES:6258 -f east -w wSalesAvg -paramfile '\$PMRootDir/myfile.txt' taskA
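A sketch of the local variant, modeled on the commands above; the path is hypothetical and the option spelling should be checked against your pmcmd version:
pmcmd startworkflow -uv USERNAME -pv PASSWORD -s SALES:6258 -f east -localparamfile /tmp/myfile.txt workflowA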
Troubleshooting Parameters and Parameter Files
I have a section in a parameter file for a
session, but the Integration Service does not
seem to read it.
Make sure to enter folder and session names as they
appear in the Workflow Manager. Also, use the
appropriate prefix for all user-defined session
parameters.
I am trying to use a source file parameter to
specify a source file and location, but the
Integration Service cannot find the source file.
Make sure to clear the source file directory in the
session
properties.
The
Integration
Service
concatenates the source file directory with the
source file name to locate the source file. Also,
make sure to enter a directory local to the
Integration Service and to use the appropriate
delimiter for the operating system.
I am trying to run a workflow with a parameter
file and one of the sessions keeps failing.
The session might contain a parameter that is not
listed in the parameter file. The Integration Service
uses the parameter file to start all sessions in the
workflow. Check the session properties, and then
verify that all session parameters are defined
correctly in the parameter file.

I ran a workflow or session that uses a parameter file, and it failed. What parameter and variable values does the Integration Service use during the recovery run?
For service variables, service process variables, session parameters, and mapping parameters, the Integration Service uses the values specified in the parameter file, if they exist. If values are not specified in the parameter file, then the Integration Service uses the value stored in the recovery storage file. For workflow, worklet, and mapping variables, the Integration Service always uses the value stored in the recovery storage file.
Tips for Parameters and Parameter Files
Use a single parameter file to group
parameter information for related sessions.
When sessions are likely to use the same database
connection or directory, you might want to include
them in the same parameter file. When connections
or directories change, you can update information
for all sessions by editing one parameter file.
Use pmcmd and multiple parameter files for
sessions with regular cycles
Sometimes you reuse session parameters in a cycle.
For example, you might run a session against a
sales database every day, but run the same session
against sales and marketing databases once a
week. You can create separate parameter files for
each session run. Instead of changing the
parameter file in the session properties each time
you run the weekly session, use pmcmd to specify
the parameter file to use when you start the
session.
Use reject file and session log parameters in
conjunction with target file or target database
connection parameters.
When you use a target file or target database
connection parameter with a session, you can keep
track of reject files by using a reject file parameter.
You can also use the session log parameter to write
the session log to the target machine.
Use a resource to verify the session runs on a
node that has access to the parameter file.
In the Administrator tool, you can define a file
resource for each node that has access to the
parameter file and configure the Integration Service
to check resources. Then, edit the session that uses
the parameter file and assign the resource. When

64

you run the workflow, the Integration Service runs


the session with the required resource on a node
that has the resource available.

You can capture new source data - Use


incremental aggregation when you can capture new
source data each time you run the session. Use a
Stored Procedure or Filter transformation to process
new data.
Incremental changes do not significantly
change the target - Use incremental aggregation
when the changes do not significantly change the
target. If processing the incrementally changed
source alters more than half the existing target, the
session may not benefit from using incremental
aggregation. In this case, drop the table and
recreate the target with complete source data

You can override initial values of workflow


variables for a session by defining them in a
session section.
If a workflow contains an Assignment task that
changes the value of a workflow variable, the next
session in the workflow uses the latest value of the
variable as the initial value for the session. To
override the initial value for the session, define a
new value for the variable in the session section of
the parameter file.
You can define parameters and variables using
other parameters and variables.
For example, in the parameter file, you can define
session parameter $PMSessionLogFile using a
service process variable as follows:
$PMSessionLogFile=$PMSessionLogDir/TestRun.txt
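A short sketch of how a session-level override might look in a parameter file (folder, workflow, session, and variable names are hypothetical):
[SALES.WF:wf_load_sales.ST:s_m_agg_sales]
$$RunDate=01/15/2011
$PMSessionLogFile=$PMSessionLogDir/s_m_agg_sales.txt
Because $$RunDate is defined in the session section, the session uses this value as its initial value even if an upstream Assignment task changed the workflow variable.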

Incremental Aggregation
If the source changes incrementally and you can
capture changes, you can configure the session to
process those changes.
This allows the Integration Service to update the
target incrementally, rather than forcing it to
process the entire source and recalculate the same
data each time you run the session.
For example, you might have a session using a
source that receives new data every day.
You can capture those incremental changes because
you have added a filter condition to the mapping
that removes pre-existing data from the flow of
data. You then enable incremental aggregation.
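For example (a sketch; ENTRY_DATE and $$LastRunDate are hypothetical names), the Filter transformation condition might keep only rows added since the previous run, with $$LastRunDate supplied as a mapping parameter in the parameter file:
ENTRY_DATE > $$LastRunDate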

When the session runs with incremental aggregation enabled for the first time on March 1, you use the entire source.
This allows the Integration Service to read and store
the necessary aggregate data. On March 2, when
you run the session again, you filter out all the
records except those time-stamped March 2.
The Integration Service then processes the new data
and updates the target accordingly.
Integration Service Processing for Incremental Aggregation
The first time you run an incremental aggregation
session, the Integration Service processes the
entire source.
At the end of the session, the Integration Service
stores aggregate data from that session run in two
files, the index file and the data file.
The Integration Service creates the files in the cache
directory specified in the Aggregator transformation
properties.
Each subsequent time you run the session with
incremental aggregation, you use the incremental
source changes in the session.
For each input record, the Integration Service checks
historical information in the index file for a
corresponding group.
If it finds a corresponding group, the Integration Service performs the aggregate operation incrementally, using the aggregate data for that group, and saves the incremental change.
If it does not find a corresponding group, the
Integration Service creates a new group and saves
the record data.
When writing to the target, the Integration Service
applies the changes to the existing target.
It saves modified aggregate data in the index and
data files to be used as historical data the next
time you run the session.
If the source changes significantly and you want the
Integration Service to continue saving aggregate
data for future incremental changes, configure the
Integration Service to overwrite existing aggregate
data with new aggregate data.

Each subsequent time you run a session with incremental aggregation, the Integration Service creates a backup of the incremental aggregation files.
The cache directory for the Aggregator transformation must contain enough disk space for two sets of the files.

High Precision - With high precision enabled, the Decimal datatype supports precision of up to 28 digits.

When you partition a session that uses incremental aggregation, the Integration Service creates one set of cache files for each partition.
The Integration Service creates new aggregate data, instead of using historical data, when you perform one of the following tasks:
- Save a new version of the mapping.
- Configure the session to reinitialize the aggregate
cache.
- Move the aggregate files without correcting the
configured path or directory for the files in the
session properties.
- Change the configured path or directory for the
aggregate files without moving the files to the new
location.
- Delete cache files.
- Decrease the number of partitions.
When the Integration Service rebuilds incremental
aggregation files, the data in the previous files is
lost.
Note: To protect the incremental aggregation files
from file corruption or disk failure, periodically back
up the files
Reinitializing the Aggregate Files
If the source tables change significantly, you might
want the Integration Service to create new
aggregate data, instead of using historical data.
For example, you can reinitialize the aggregate
cache if the source for a session changes
incrementally every day and completely changes
once a month.
When you receive the new source data for the
month, you might configure the session to
reinitialize the aggregate cache, truncate the
existing target, and use the new source table
during the session.
After you run a session that reinitializes the
aggregate cache, edit the session properties to
disable the Reinitialize Aggregate Cache option.
Avoid moving or modifying the index and data files
that store historical aggregate information.
If you move the files into a different directory, the
Integration Service rebuilds the files the next time
you run the session


6. TRANSFORMATIONS
Active Transformations
An active transformation can perform any of the following actions:
Change the number of rows that pass through the transformation - For example, the Filter transformation is active because it removes rows that do not meet the filter condition. All multi-group transformations are active because they might change the number of rows that pass through the transformation.
Change the transaction boundary - For example, the Transaction Control transformation is active because it defines a commit or roll back transaction based on an expression evaluated for each row.
Change the row type - For example, the Update Strategy transformation is active because it flags rows for insert, delete, update, or reject.
The Designer does not allow you to connect multiple active transformations or an active and a passive transformation to the same downstream transformation or transformation input group because the Integration Service may not be able to concatenate the rows passed by active transformations.
The Sequence Generator transformation is an exception to the rule. The Designer does allow you to connect a Sequence Generator transformation and an active transformation to the same downstream transformation or transformation input group. A Sequence Generator transformation does not receive data. It generates unique numeric values.

All multi-group transformations are active transformations. You cannot connect multiple active transformations or an active and a passive transformation to the same downstream transformation or transformation input group.
Some multiple input group transformations require the Integration Service to block data at an input group while the Integration Service waits for a row from a different input group.
A blocking transformation is a multiple input group transformation that blocks incoming data.

Ports
Port name - Use the following conventions while naming ports:
- Begin with a single- or double-byte letter or single- or double-byte underscore (_).
- Port names can contain any of the following single- or double-byte characters: a letter, number, underscore (_), $, #, or @.

Creating a Transformation
You can create transformations using the following
Designer tools:
Mapping Designer - Create transformations that
connect sources to targets. Transformations in a
mapping cannot be used in other mappings unless
you configure them to be reusable.
Transformation Developer - Create individual
transformations, called reusable transformations
that you can use in multiple mappings.
Mapplet Designer - Create and configure a set of
transformations, called mapplets, that you use in
multiple mappings.

The following transformations are blocking transformations:
- Custom transformation with the Inputs May
Block property enabled
- Joiner transformation configured for
unsorted input
The Designer performs data flow validation when
you save or validate a mapping.
Some mappings that contain active or blocking
transformations might not be valid.


Using Expression Editor - The maximum number of characters that you can include in an expression is 32,767.

Adding Expressions to a Port - In the Data Masking transformation, you can add an expression to an input port.
For all other transformations, add the expression to
an output port.
Guidelines for Configuring Variable Ports
Consider the following factors when you configure
variable ports in a transformation:
Port order - The Integration Service evaluates ports
by dependency. The order of the ports in a
transformation must match the order of evaluation:
input ports, variable ports, output ports.
Data type - The datatype you choose reflects the
return value of the expression you enter.
Variable initialization - The Integration Service
sets initial values in variable ports, where you can
create counters.
Since variables can reference other variables,
the display order for variable ports is the
same as the order in which the Integration
Service evaluates each variable

Using Default Values for Ports


All transformations use default values that
determine how the Integration Service handles
input null values and output transformation errors.
Input, output, and input/output ports are created
with a system default value that you can
sometimes override with a user-defined default
value.
Default values have different functions in different
types of ports:
Input port - The system default value for null input
ports is NULL. It displays as a blank in the
transformation. If an input value is NULL, the
Integration Service leaves it as NULL.
Output port - The system default value for output
transformation errors is ERROR. The default value
appears in the transformation as
ERROR(transformation error). If a transformation
error occurs, the Integration Service skips the row.
The Integration Service notes all input rows skipped
by the ERROR function in the session log file.
The following errors are considered
transformation errors:
- Data conversion errors, such as passing a
number to a date function.
- Expression evaluation errors, such as
dividing by zero.
- Calls to an ERROR function.

The display order for output ports does not matter since output ports cannot reference other output
ports. Be sure output ports display at the bottom of
the list of ports.

Input/output port - The system default value for null input is the same as input ports, NULL. The
system default value appears as a blank in the
transformation. The default value for output
transformation errors is the same as output ports.
The default value for output transformation errors
does not display in the transformation.

Variable Initialization
The Integration Service does not set the initial value
for variables to NULL. Instead, the Integration
Service uses the following guidelines to set initial
values for variables:
- Zero for numeric ports
- Empty strings for string ports
- 01/01/1753 for Date/Time ports with PMServer 4.0
date handling compatibility disabled
- 01/01/0001 for Date/Time ports with PMServer 4.0
date handling compatibility enabled

Note: The Java Transformation converts PowerCenter datatypes to Java datatypes, based on
the Java Transformation port type. Default values for
null input differ based on the Java datatype.
The following table shows the system default values for ports in connected transformations:
Input port - NULL
Output port - ERROR ('transformation error')
Input/output port - NULL for null input; ERROR for output transformation errors

Therefore, use variables as counters, which need an initial value. For example, you can create a numeric
variable with the following expression:
VAR1 + 1
This expression counts the number of rows in the
VAR1 port. If the initial value of the variable were
set to NULL, the expression would always evaluate
to NULL. This is why the initial value is set to zero.


Entering User-Defined Default Values


You can override the system default values with
user-defined default values for supported input,
input/output, and output ports within a connected
transformation:
Input ports - You can enter user-defined default
values for input ports if you do not want the
Integration Service to treat null values as NULL.
Output ports - You can enter user-defined default
values for output ports if you do not want the
Integration Service to skip the row or if you want the
Integration Service to write a specific message with
the skipped row to the session log.
Input/output ports - You can enter user-defined
default values to handle null input values for
input/output ports in the same way you can enter
user-defined default values for null input values for
input ports. You cannot enter user-defined default
values for output transformation errors in an
input/output port.
Note: The Integration Service ignores user-defined default values for unconnected
transformations. For example, if you call a Lookup
or Stored Procedure transformation through an
expression, the Integration Service ignores any
user-defined default value and uses the
system default value only.

Use the ABORT function to abort a session when the Integration Service encounters null input values.
Entering User-Defined Default Output Values


General Rules for Default Values


Use the following rules and guidelines when you
create default values:
The default value must be a NULL, a constant value,
a constant expression, an ERROR function, or an
ABORT function.
For input/output ports, the Integration Service uses
default values to handle null input values. The
output default value of input/output ports is always
ERROR(Transformation Error).
Variable ports do not use default values.
You can assign default values to group by ports in
the Aggregator and Rank transformations.
Not all port types in all transformations allow user-defined default values. If a port does not allow user-defined default values, the default value field is
disabled.
Not all transformations allow user-defined default
values.
If a transformation is not connected to the mapping
data flow, the Integration Service ignores user-defined default values.
If any input port is unconnected, its value is
assumed to be NULL and the Integration
Service uses the default value for that input
port.
If an input port default value contains the ABORT
function and the input value is NULL, the
Integration Service immediately stops the session.
Use the ABORT function as a default value to
restrict null input values. The first null value in an
input port stops the session.
If an output port default value contains the ABORT
function and any transformation error occurs for
that port, the session immediately stops. Use the
ABORT function as a default value to enforce strict
rules for transformation errors. The first
transformation error for this port stops the session.
The ABORT function, constant values, and constant
expressions override ERROR functions configured in
output port expressions.
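For example (a sketch with hypothetical messages), you might enter the following user-defined default values, since ERROR and ABORT are valid default value expressions:
Input port default value: ABORT('Employee ID cannot be NULL')
Output port default value: ERROR('Invalid salary calculation - row skipped')
The first NULL value in the input port stops the session; the ERROR default skips the row and writes the message to the session log.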

Reusable Transformations
The Designer stores each reusable transformation as metadata separate from any mapping that uses the transformation.
If you review the contents of a folder in the Navigator, you see the list of all reusable transformations in that folder.
You can create most transformations as non-reusable or reusable.
However, you can only create the External Procedure transformation as a reusable transformation.
When you add instances of a reusable transformation to mappings, you must be careful that changes you make to the transformation do not invalidate the mapping or generate unexpected data.
Instances and Inherited Changes
Note that instances do not inherit changes to property settings, only modifications to ports, expressions, and the name of the transformation.

A. AGGREGATOR
The Integration Service performs aggregate calculations as it reads and stores data group and row data in an aggregate cache.
Unlike the Expression transformation, you use the Aggregator transformation to perform calculations on groups.
The Expression transformation permits you to perform calculations on a row-by-row basis.
After you create a session that includes an Aggregator transformation, you can enable the session option, Incremental Aggregation. When the Integration Service performs incremental aggregation, it passes source data through the mapping and uses historical cache data to perform aggregation calculations incrementally.
Configuring Aggregator Transformation Properties
Modify the Aggregator transformation properties on the Properties tab.
Configure the following options:

Configuring Aggregate Caches
When you run a session that uses an Aggregator transformation, the Integration Service creates the index and the data caches in memory to process the transformation.
If the Integration Service requires more space, it stores overflow values in cache files.
You can configure the index and the data caches in the Aggregator transformation or in the session properties. Or, you can configure the Integration Service to determine the cache size at run time.
Note: The Integration Service uses memory to process an Aggregator transformation with sorted ports. The Integration Service does not use cache memory. You do not need to configure cache memory for Aggregator transformations that use sorted ports.
The result of an aggregate expression varies based
on the group by ports in the transformation.
For example, when the Integration Service
calculates the following aggregate expression with no group by ports defined, it finds the total quantity of items sold:
SUM( QUANTITY )
However, if you use the same expression, and you
group by the ITEM port, the Integration Service
returns the total quantity of items sold, by item
Aggregate Functions
Use the following aggregate functions within an
Aggregator transformation. You can nest one
aggregate function within another aggregate
function.
The transformation language includes the following
aggregate functions:
AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE

Nested Aggregate Functions


You can include multiple single-level or multiple
nested functions in different output ports in an
Aggregator transformation.
However, you cannot include both single-level and
nested functions in an Aggregator transformation.
Therefore, if an Aggregator transformation contains
a single-level function in any output port, you
cannot use a nested function in any other port in
that transformation.
When you include single-level and nested functions
in the same Aggregator transformation, the
Designer marks the mapping or mapplet invalid.
If you need to create both single-level and nested
functions create separate Aggregator
transformations.
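For example, a single Aggregator transformation can use either of the following, but not both (port names are illustrative):
Single-level: SUM( QUANTITY )
Nested: MAX( COUNT( ITEM ) )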

Group By Ports
When you group values, the Integration Service
produces one row for each group.
If you do not group values, the Integration Service
returns one row for all input rows.
The Integration Service typically returns the
last row of each group (or the last row
received) with the result of the aggregation.
However, if you specify a particular row to be
returned (for example, by using the FIRST function),
the Integration Service then returns the specified
row
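A small illustration, assuming hypothetical ports ITEM and QUANTITY with SUM( QUANTITY ) grouped by ITEM:
Input: (Pen, 2), (Pen, 3), (Pencil, 4)
Output: one row per group - (Pen, 5) and (Pencil, 4), with any non-aggregate, non-group-by port values taken from the last row of each group.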

Sorted Input Conditions
Do not use sorted input if either of the following conditions are true:
- The aggregate expression uses nested aggregate functions.
- The session uses incremental aggregation.

Tips for Aggregator Transformations
Limit connected input/output or output ports.
Limit the number of connected input/output or
output ports to reduce the amount of data the
Aggregator transformation stores in the data cache.
Filter the data before aggregating it.
If you use a Filter transformation in the mapping,
place the transformation before the Aggregator
transformation to reduce unnecessary aggregation.

B. CUSTOM TRANSFORMATION
Custom transformations operate in conjunction with
procedures you create outside of the Designer
interface to extend PowerCenter functionality.
You can create a Custom transformation and bind it
to a procedure that you develop using the Custom
transformation functions.
Use the Custom transformation to create
transformation applications, such as sorting
and aggregation, which require all input rows
to be processed before outputting any output
rows.
To support this process, the input and output
functions occur separately in Custom
transformations compared to External
Procedure transformations.
The Integration Service passes the input data
to the procedure using an input function.
The output function is a separate function
that you must enter in the procedure code to
pass output data to the Integration Service.
In contrast, in the External Procedure
transformation, an external procedure
function does both input and output, and its
parameters consist of all the ports of the
transformation.
You can also use the Custom transformation to
create a transformation that requires
multiple input groups, multiple output
groups, or both
Rules and Guidelines for Custom Transformations
- Custom transformations are connected
transformations. You cannot reference a Custom
transformation in an expression.
- You can include multiple procedures in one
module. For example, you can include an XML
writer procedure and an XML parser procedure in the
same module.

- You can bind one shared library or DLL to multiple Custom transformation instances if you write the procedure code to handle multiple Custom transformation instances.
- When you write the procedure code, you must
make sure it does not violate basic mapping rules.
- The Custom transformation sends and receives
high precision decimals as high precision decimals.
- You can use multi-threaded code in Custom transformation procedures.
Creating Groups and Ports
You can create multiple input groups and multiple
output groups in a Custom transformation.
You must create at least one input group and
one output group
When you create a passive Custom
transformation, you can only create one input
group and one output group.
Working with Port Attributes
Ports have attributes, such as datatype and
precision. When you create a Custom
transformation, you can create user-defined port
attributes.
User-defined port attributes apply to all ports in a
Custom transformation.
For example, you create an external procedure to
parse XML data.
You can create a port attribute called XML path
where you can define the position of an element in
the XML hierarchy.
Custom Transformation Properties
Properties for the Custom transformation apply to
both the procedure and the transformation.
Configure the Custom transformation properties on
the Properties tab of the Custom transformation.
The following table describes the Custom
transformation properties:

Setting the Update Strategy


Use an active Custom transformation to set the
update strategy for a mapping at the following
levels:
Within the procedure - You can write the external
procedure code to set the update strategy for output
rows. The external procedure can flag rows for
insert, update, delete, or reject.
Within the mapping - Use the Custom
transformation in a mapping to flag rows for insert,
update, delete, or reject. Select the Update Strategy Transformation property for the Custom transformation.
Within the session - Configure the session to treat
the source rows as data driven.
If you do not configure the Custom
transformation to define the update strategy,
or you do not configure the session as data
driven, the Integration Service does not use
the external procedure code to flag the
output rows.
Instead, when the Custom transformation is
active, the Integration Service flags the
output rows as insert.
When the Custom transformation is passive,
the Integration Service retains the row type.
For example, when a row flagged for update enters a
passive Custom transformation, the Integration
Service maintains the row type and outputs the row
as update.
Working with Transaction Control
You can define transaction control for Custom
transformations using the following transformation
properties:
Transformation Scope - Determines how the
Integration Service applies the transformation logic
to incoming data.
Generate Transaction - Indicates that the
procedure generates transaction rows and outputs
them to the output groups.
The following table describes how the Integration
Service handles transaction boundaries at Custom
transformations:

Blocking Input Data


By default, the Integration Service concurrently
reads sources in a target load order group.
However, you can write the external procedure code
to block input data on some input groups.
Blocking is the suspension of the data flow into an
input group of a multiple input group
transformation.
To use a Custom transformation to block input
data, you must write the procedure code to
block and unblock data.
You must also enable blocking on the
Properties tab for the Custom transformation
Note: When the procedure blocks data and you
configure the Custom transformation as a non-blocking transformation, the Integration Service fails the session.
Validating Mappings with Custom Transformations
When you include a Custom transformation in a
mapping, both the Designer and Integration Service
validate the mapping.
The Designer validates the mapping you save or
validate and the Integration Service validates the
mapping when you run the session.
Validating at Design Time
When you save or validate a mapping, the
Designer performs data flow validation.
When the Designer does this, it verifies that the data
can flow from all sources in a target load order group to the targets without blocking transformations blocking all sources.
Validating at Runtime
When you run a session, the Integration Service
validates the mapping against the procedure
code at runtime.
When the Integration Service does this, it tracks
whether or not it allows the Custom
transformations to block data

C. DATA MASKING
The Data Masking transformation modifies source
data based on masking rules that you configure for
each column.
You can maintain data relationships in the masked
data and maintain referential integrity between
database tables.

The Data Masking transformation creates a default seed value that is a random number between 1 and 1,000.
You can enter a different seed value or apply a
mapping parameter value.
Apply the same seed value to a column to return the
same masked data values in different source data.
Associated O/P
The Associated O/P is the associated output port for
an input port.
The Data Masking transformation creates an
output port for each input port.
The naming convention is out_<port name>. The
associated output port is a read-only port.

You can apply the following types of masking with the Data Masking transformation:
Key masking - Produces deterministic results for
the same source data, masking rules, and seed
value.
Random masking - Produces random, non-repeatable results for the same source data and
masking rules.
Expression masking - Applies an expression to a
port to change the data or create data.
Substitution - Replaces a column of data with
similar but unrelated data from a dictionary.
Special mask formats - Applies special mask
formats to change SSN, credit card number, phone
number, URL, email address, or IP addresses

You can configure the following masking rules for key masking string values:
Seed - Apply a seed value to generate deterministic
masked data for a column. Select one of the
following options:
Value - Accept the default seed value or enter a
number between 1 and 1,000.
Mapping Parameter - Use a mapping parameter to
define the seed value. The Designer displays a list of
the mapping parameters that you create for the
mapping. Choose the mapping parameter from the
list to use as the seed value.
Mask Format - Define the type of character to
substitute for each character in the input data. You
can limit each character to an alphabetic, numeric,
or alphanumeric character type.

Locale
The locale identifies the language and region of the
characters in the data.
Choose a locale from the list.
The Data Masking transformation masks character
data with characters from the locale that you
choose.
The source data must contain characters that are
compatible with the locale that you select.

Seed
The seed value is a start number that enables
the Data Masking transformation to return
deterministic data with Key Masking.


Source String Characters - Define the characters in the source string that you want to mask. For
example, mask the number sign (#) character
whenever it occurs in the input data. The Data
Masking transformation masks all the input
characters when Source String Characters is blank.
The Data Masking transformation does not always
return unique data if the number of source string
characters is less than the number of result string
characters.
Result String Characters - Substitute the
characters in the target string with the characters
you define in Result String Characters. For example,
enter the following characters to configure each
mask to contain all uppercase alphabetic characters:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
Masking Numeric Values
Configure key masking for numeric source data to
generate deterministic output.
When you configure a column for numeric key
masking, the Designer assigns a random seed
value to the column.
When the Data Masking transformation masks the
source data, it applies a masking algorithm that
requires the seed.
You can change the seed value for a column to
produce repeatable results if the same source value
occurs in a different column.
For example, you want to maintain a primary-foreign
key relationship between two tables.
In each Data Masking transformation, enter the
same seed value for the primary-key column as the
seed value for the foreign-key column.
The Data Masking transformation produces
deterministic results for the same numeric values.
The referential integrity is maintained between the
tables.
Masking Datetime Values
The Data Masking transformation can mask dates
between 1753 and 2400 with key masking.
If the source year is a leap year, the Data Masking
transformation returns a year that is also a leap
year.
If the source month contains 31 days, the Data
Masking transformation returns a month that has
31 days.
If the source month is February, the Data Masking
transformation returns February.
The Data Masking transformation always generates
valid dates
Masking with Mapping Parameters

The Integration Service applies a default seed value in the following circumstances:
- The mapping parameter option is selected for a
column but the session has no parameter file.
- You delete the mapping parameter.
- A mapping parameter seed value is not between 1
and 1,000.
The Integration Service applies masked values from
the default value file.
You can edit the default value file to change the
default values.
The default value file is an XML file in the following
location:
<PowerCenter Installation
Directory>\infa_shared\SrcFiles\defaultValue.xml
The name-value pair for the seed is default_seed =
"500".
If the seed value in the default value file is not
between 1 and 1,000, the Integration Service
assigns a value of 725 to the seed and writes a
message in the session log.
Substitution Masking
Substitution masking replaces a column of data with
similar but unrelated data.
When you configure substitution masking, define the
relational or flat file dictionary that contains the
substitute values.
The Data Masking transformation performs a lookup
on the dictionary that you configure.
The Data Masking transformation replaces source
data with data from the dictionary. Substitution is
an effective way to replace production data with
realistic test data.
You can substitute data with repeatable or non-repeatable values.
When you choose repeatable values, the Data
Masking transformation produces deterministic
results for the same source data and seed value.
You must configure a seed value to substitute data
with deterministic results.
The Integration Service maintains a storage table of
source and masked values for repeatable masking
Dictionaries
A dictionary is a flat file or relational table that
contains the substitute data and a serial number for
each row in the file.
The Integration Service generates a number to
retrieve a dictionary row by the serial number.


The Integration Service generates a hash key for repeatable substitution masking or a random
number for non-repeatable masking.
You can configure an additional lookup condition if
you configure repeatable substitution masking.
You can configure a dictionary to mask more than
one port in the Data Masking transformation.
The following example shows a flat file dictionary
that contains first name and gender:
SNO,GENDER,FIRSTNAME
1,M,Adam
2,M,Adeel
3,M,Adil
4,F,Alice
5,F,Alison
Use the following rules and guidelines when you
create a dictionary:
- Each record in the dictionary must have a serial
number. The serial number does not have to be the
key in a relational table.
- The serial numbers are sequential integers starting
at one. The serial numbers cannot have a missing
number in the sequence.
- The serial number column can be anywhere in a
dictionary row. It can have any label.
- The first row of a flat file dictionary must have
column labels to identify the fields in each record.
The fields are separated by commas. If the first row
does not contain column labels, the Integration
Service takes the values of the fields in the first row
as column names.
- A flat file dictionary must be in the
$PMLookupFileDir lookup file directory. By default,
this directory is in the following location:
<PowerCenter_Installation_Directory>\server\infa_sh
ared\LkpFiles
- If you create a flat file dictionary on Windows and
copy it to a UNIX machine, verify that the file format
is correct for UNIX. For example, Windows and UNIX
use different characters for the end of line marker.
- If you configure substitution masking for more than
one port, all relational dictionaries must be in the
same database schema.
- You cannot change the dictionary type or the
substitution dictionary name in session properties.
Storage Tables
The Data Masking transformation maintains storage
tables for repeatable substitution between
sessions.
A storage table row contains the source column and
a masked value pair.

Each time the Integration Service masks a value


with a repeatable substitute value, it searches the
storage table by dictionary name, locale, column
name, input value, and seed.
If it finds a row, it returns the masked value from the
storage table to the Data Masking transformation.
If the Integration Service does not find a row, it
retrieves a row from the dictionary with a hash key.
Rules and Guidelines for Substitution Masking
- If a storage table does not exist for a repeatable
substitution mask, the session fails.
- If the dictionary contains no rows, the Integration
Service returns default masked values.
- When the Integration Service finds an input value
with the locale, dictionary, and seed in the storage
table, it retrieves the masked value, even if the row
is no longer in the dictionary.
- If you delete a connection object or modify the
dictionary, truncate the storage table. Otherwise,
you might get unexpected results.
- If the number of values in the dictionary is less
than the number of unique values in the source
data, the Integration Service cannot mask the data
with unique repeatable values. The Integration
Service returns default masked values.
Random Masking
Random masking generates random
nondeterministic masked data.
The Data Masking transformation returns different
values when the same source value occurs in
different rows.
You can define masking rules that affect the format
of data that the Data Masking transformation
returns.
Mask numeric, string, and date values with random
masking
Masking Numeric Values
When you mask numeric data, you can configure a
range of output values for a column.
The Data Masking transformation returns a value
between the minimum and maximum values of the
range depending on port precision.
To define the range, configure the minimum and
maximum ranges or a blurring range based on a
variance from the original source value.
You can configure the following masking parameters
for numeric data:
Range - Define a range of output values. The Data
Masking transformation returns numeric data
between the minimum and maximum values.


Blurring Range - Define a range of output values that are within a fixed variance or a percentage
variance of the source data. The Data Masking
transformation returns numeric data that is close to
the value of the source data. You can configure a
range and a blurring range.
Masking String Values
Configure random masking to generate random
output for string columns.
To configure limitations for each character in the
output string, configure a mask format.
Configure filter characters to define which source
characters to mask and the characters to mask
them with.
You can apply the following masking rules for a
string port:
Range - Configure the minimum and maximum
string length. The Data Masking transformation
returns a string of random characters between the
minimum and maximum string length.
Mask Format - Define the type of character to
substitute for each character in the input data. You
can limit each character to an alphabetic, numeric,
or alphanumeric character type.
Source String Characters - Define the characters
in the source string that you want to mask. For
example, mask the number sign (#) character
whenever it occurs in the input data. The Data
Masking transformation masks all the input
characters when Source String Characters is blank.
Result String Replacement Characters - Substitute the characters in the target string with
the characters you define in Result String
Characters. For example, enter the following
characters to configure each mask to contain
uppercase alphabetic characters A - Z:
ABCDEFGHIJKLMNOPQRSTUVWXYZ

Masking Date Values


To mask date values with random masking, either
configure a range of output dates or choose a
variance.
When you configure a variance, choose a part of the
date to blur.
Choose the year, month, day, hour, minute, or
second.
The Data Masking transformation returns a date that
is within the range you configure.

You can configure the following masking rules when you mask a datetime value:
Range - Sets the minimum and maximum values to return for the selected datetime value.
Blurring - Masks a date based on a variance that
you apply to a unit of the date. The Data Masking
transformation returns a date that is within the
variance. You can blur the year, month, day, or hour.
Choose a low and high variance to apply.
Applying Masking Rules
Apply masking rules based on the source datatype.
When you click a column property on the Masking
Properties tab, the Designer displays masking rules
based on the datatype of the port.
The following table describes the masking rules that
you can configure based on the masking type and
the source datatype:

Source String Characters


Source string characters are source characters that
you choose to mask or not mask.
The position of the characters in the source string
does not matter.
The source characters are case sensitive.
You can configure any number of characters.
When Characters is blank, the Data Masking
transformation replaces all the source characters in
the column.
Select one of the following options for source string
characters:
Mask Only - The Data Masking transformation
masks characters in the source that you configure as
source string characters. For example, if you enter
the characters A, B, and c, the Data Masking
transformation replaces A, B, or c with a different
character when the character occurs in source data.


A source character that is not an A, B, or c does not change. The mask is case sensitive.
Mask All Except - Masks all characters except the
source string characters that occur in the source
string. For example, if you enter the filter source
character - and select Mask All Except, the Data
Masking transformation does not replace the -
character when it occurs in the source data. The rest
of the source characters change.
Example
A source file has a column named Dependents. The
Dependents column contains more than one name
separated by commas. You need to mask the
Dependents column and keep the comma in the
test data to delimit the names. For the Dependents
column, select Source String Characters. Choose
Don't Mask and enter the comma (,) as the source character to skip.

Result String Replacement Characters


Result string replacement characters are characters
you choose as substitute characters in the masked
data.
When you configure result string replacement
characters, the Data Masking transformation
replaces characters in the source string with the
result string replacement characters.
To avoid generating the same output for different
input values, configure a wide range of substitute
characters, or mask only a few source characters.
The position of each character in the string does not
matter.
Select one of the following options for result string
replacement characters:
Use Only - Mask the source with only the
characters you define as result string replacement
characters. For example, if you enter the characters
A, B, and c, the Data Masking transformation
replaces every character in the source column with
an A, B, or c. The word horse might be replaced
with BAcBA.
Use All Except - Mask the source with any
characters except the characters you define as
result string replacement characters. For example, if
you enter A, B, and c result string replacement
characters, the masked data never has the
characters A, B, or c.
Example
To replace all commas in the Dependents column
with semicolons, complete the following tasks:

1. Configure the comma as a source string character and select Mask Only.
The Data Masking transformation masks only the
comma when it occurs in the Dependents column.
2. Configure the semicolon as a result string
replacement character and select Use Only.
The Data Masking transformation replaces each
comma in the Dependents column with a semicolon
Range
Define a range for numeric, date, or string data.
When you define a range for numeric or date
values the Data Masking transformation masks the
source data with a value between the minimum and
maximum values. When you configure a range for a
string, you configure a range of string lengths.
Blurring
Blurring creates an output value within a fixed or
percent variance from the source data value.
Configure blurring to return a random value that is
close to the original value. You can blur numeric
and date values

Special Mask Formats


The following types of masks retain the format of the
original data:
- Social Security numbers
- Credit card numbers
- Phone numbers
- URL addresses
- Email addresses
- IP addresses
The Data Masking transformation returns a masked
value that has a realistic format, but is not a valid
value.
For example, when you mask an SSN, the Data
Masking transformation returns an SSN that is the
correct format but is not valid.
You can configure repeatable masking for Social
Security numbers
When the source data format or datatype is invalid
for a mask, the Integration Service applies a default
mask to the data.
The Integration Service applies masked values from
the default value file
Default Value File
The default value file is an XML file in the following
location:
<PC Directory>\infa_shared\SrcFiles\defaultValue.xml

The defaultValue.xml file contains the following name-value pairs:
<?xml version="1.0" standalone="yes" ?>
<defaultValue
default_char = "X"
default_digit = "9"
default_date = "11/11/1111 00:00:00"
default_email = "abc@xyz.com"
default_ip = "99.99.9.999"
default_url = "http://www.xyz.com"
default_phone = "999 999 999 9999"
default_ssn = "999-99-9999"
default_cc = "9999 9999 9999 9999"
default_seed = "500"
/>
Rules and Guidelines for Data Masking
Transformations
- The Data Masking transformation does not mask
null values. If the source data contains null values,
the Data Masking transformation returns null values.
To replace null values, add an upstream
transformation that allows user-defined default
values for input ports.
- When the source data format or datatype is invalid
for a mask, the Integration Service applies a default
mask to the data. The Integration Service applies
masked values from a default values file.
- The Data Masking transformation returns an invalid
Social Security number with the same format and
area code as the source. If the Social Security
Administration has issued more than half of the
numbers for an area, the Data Masking
transformation might not be able to return unique
invalid Social Security numbers with key masking.

D. EXTERNAL PROCEDURE
External Procedure transformations operate in
conjunction with procedures you create outside of
the Designer interface to extend PowerCenter
functionality.
If you are an experienced programmer, you may
want to develop complex functions within a
dynamic link library (DLL) or UNIX shared library,
instead of creating the necessary Expression
transformations in a mapping.
To get this kind of extensibility, use the
Transformation Exchange (TX) dynamic invocation
interface built into PowerCenter.

Using TX, you can create an Informatica External Procedure transformation and bind it
to an external procedure that you have developed.
You can bind External Procedure transformations to
two kinds of external procedures:
- COM external procedures (available on Windows
only)
- Informatica external procedures (available on
Windows, AIX, HP-UX, Linux, and Solaris)
External Procedures and External Procedure
Transformations
There are two components to TX: external
procedures and External Procedure transformations.
An External procedure exists separately from the
Integration Service.
It consists of C, C++, or Visual Basic code written by
a user to define a transformation.
This code is compiled and linked into a DLL or
shared library, which is loaded by the Integration
Service at runtime.
An external procedure is bound to an External
Procedure transformation.
An External Procedure transformation is created
in the Designer.
It is an object that resides in the Informatica
repository and serves several purposes:
1. It contains the metadata describing the following
external procedure. It is through this metadata that
the Integration Service knows the signature
(number and types of parameters, type of return
value, if any) of the external procedure.
2. It allows an external procedure to be referenced in
a mapping. By adding an instance of an External
Procedure transformation to a mapping, you call the
external procedure bound to that transformation.
Note: You can create a connected or unconnected
External Procedure.
3. When you develop Informatica external
procedures, the External Procedure transformation
provides the information required to generate
Informatica external procedure stubs.
External Procedure transformations return
one or no output rows for each input row.

E. FILTER
A filter condition returns TRUE or FALSE for each row
that the Integration Service evaluates, depending
on whether a row meets the specified condition.
For each row that returns TRUE, the Integration Service passes the row through the transformation.

For each row that returns FALSE, the Integration Service drops the row and writes a message to the session log.
You cannot concatenate ports from more than
one transformation into the Filter
transformation.
The input ports for the filter must come from a
single transformation
If the filter condition evaluates to NULL, the
row is treated as FALSE.
Note: The filter condition is case sensitive.
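For example (a sketch; SALARY is a hypothetical port), you can make NULL handling explicit in the filter condition so that NULL rows are kept instead of being dropped:
IIF( ISNULL( SALARY ), TRUE, SALARY > 30000 )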

F. HTTP
The HTTP transformation enables you to connect to
an HTTP server to use its services and applications.
When you run a session with an HTTP
transformation, the Integration Service connects to
the HTTP server and issues a request to retrieve
data from or update data on the HTTP server,
depending on how you configure the
transformation:
Read data from an HTTP server - When the
Integration Service reads data from an HTTP server,
it retrieves the data from the HTTP server and
passes the data to the target or a downstream
transformation in the mapping. For example, you
can connect to an HTTP server to read current
inventory data, perform calculations on the data
during the PowerCenter session, and pass the data
to the target.
Update data on the HTTP server - When the
Integration Service writes to an HTTP server, it posts
data to the HTTP server and passes HTTP server
responses to the target or a downstream
transformation in the mapping. For example, you
can post data providing scheduling information from
upstream transformations to the HTTP server during
a session

G. IDENTITY RESOLUTION
The Identity Resolution transformation is an active
transformation that you can use to search and
match data in Informatica Identity Resolution
(IIR).
The PowerCenter Integration Service uses the search
definition that you specify in the Identity Resolution
transformation to search and match data residing in
the IIR tables.
The input and output views in the system determine
the input and output ports of the transformation.

An Identity Resolution transformation contains an input group and an output group.
The input group has ports that represent
fields in the input view of the search
definition.
The output group has ports that represent fields in
the output view of the search definition in addition
to ports that describe the result of the search.

H. JAVA
Extend PowerCenter functionality with the Java
transformation.
The Java transformation provides a simple native
programming interface to define transformation
functionality with the Java programming language.
You can use the Java transformation to quickly
define simple or moderately complex
transformation functionality without advanced
knowledge of the Java programming language or an
external Java development environment.
The PowerCenter Client uses the Java
Development Kit (JDK) to compile the Java
code and generate byte code for the
transformation.
The PowerCenter Client stores the byte code in
the PowerCenter repository.
The Integration Service uses the Java Runtime
Environment (JRE) to execute generated byte
code at run time.
When the Integration Service runs a session with a
Java transformation, the Integration Service uses
the JRE to execute the byte code and process input
rows and generate output rows.
Create Java transformations by writing Java code
snippets that define transformation logic.
Define transformation behavior for a Java
transformation based on the following events:
- The transformation receives an input row.
- The transformation has processed all input rows.
- The transformation receives a transaction
notification such as commit or rollback.

Active and Passive Java Transformations


When you create a Java transformation, you define
its type as active or passive.
After you set the transformation type, you cannot
change it.
A Java transformation runs the Java code that you
define on the On Input Row tab one time for each
row of input data.


A Java transformation handles output rows based on the transformation type as follows:
- A passive Java transformation generates one
output row for each input row in the
transformation after processing each input
row.
- An active Java transformation generates
multiple output rows for each input row in the
transformation.
Use the generateRow method to generate each
output row. For example, if the transformation
contains two input ports that represent a start date
and an end date, you can use the generateRow
method to generate an output row for each date
between the start date and the end date.
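A minimal On Input Row sketch for an active Java transformation, assuming hypothetical input ports in_start and in_count and an output port out_value (isNull and generateRow are Java transformation API methods):
// Emit one output row for each value in the requested range.
if (!isNull("in_count")) {
    for (int i = 0; i < in_count; i++) {
        out_value = in_start + i;   // ports are referenced as Java variables
        generateRow();              // generate one output row per iteration
    }
}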
Datatype Conversion
When a Java transformation reads input rows, it
converts input port datatypes to Java datatypes.
When a Java transformation writes output rows, it
converts Java datatypes to output port datatypes.
For example, the following processing occurs for an
input port with the integer datatype in a Java
transformation:
1. The Java transformation converts the integer
datatype of the input port to the Java primitive int
datatype.
2. In the transformation, the transformation treats
the value of the input port as the Java primitive int
datatype.
3. When the transformation generates the output
row, it converts the Java primitive int datatype to the
integer datatype.
Groups and Ports
A Java transformation can have input ports, output
ports, and input/output ports. You create and edit
groups and ports on the Ports tab.
A Java transformation always has one input
group and one output group.
The transformation is not valid if it has
multiple input or output groups

Compiling a Java Transformation


The PowerCenter Client uses the Java compiler to
compile the Java code and generate the byte code
for the transformation.
The Java compiler compiles the Java code and
displays the results of the compilation in the Output
window on the code entry tabs.
The Java compiler installs with the PowerCenter
Client in the java/bin directory.
To compile the full code for the Java transformation,
click Compile on the Java Code tab.

When you create a Java transformation, it contains a Java class that defines the base functionality for a
Java transformation.
The full code for the Java class contains the
template class code for the transformation, plus the
Java code you define on the code entry tabs.
When you compile a Java transformation, the
PowerCenter Client adds the code from the code
entry tabs to the template class for the
transformation to generate the full class code for
the transformation.
The PowerCenter Client then calls the Java compiler
to compile the full class code.
The Java compiler compiles the transformation and
generates the byte code for the transformation
Note: The Java transformation is also compiled
when you click OK in the transformation
Java Expressions
You can invoke PowerCenter expressions in a Java
transformation with the Java programming
language.
Use expressions to extend the functionality of a Java
transformation.
For example, you can invoke an expression in a Java
transformation to look up the values of input or
output ports or look up the values of Java
transformation variables.
To invoke expressions in a Java transformation, you
generate the Java code or use Java transformation
API methods to invoke the expression.
You invoke the expression and use the result of the
expression on the appropriate code entry tab.
You can generate the Java code that invokes an
expression or use API methods to write the Java
code that invokes the expression
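A hedged sketch using the simple invokeJExpression interface, if your PowerCenter version provides it (the expression and argument values are illustrative; x1 and x2 refer to the entries of the parameter array):
// Invoke a PowerCenter expression from On Input Row code
String fullName = (String) invokeJExpression("CONCAT(x1, x2)", new Object[] { "John ", "Smith" });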

I. JOINER
The master pipeline ends at the Joiner
transformation, while the detail pipeline
continues to the target.
The Joiner transformation accepts input from most
transformations. However, consider the following
limitations on the pipelines you connect to the
Joiner transformation:
- You cannot use a Joiner transformation when
either input pipeline contains an Update
Strategy transformation.

- You cannot use a Joiner transformation if you connect a Sequence Generator transformation directly before the Joiner transformation.
Joiner Data Cache Size - Default cache size is
2,000,000 bytes
Joiner Index Cache Size - Default cache size is
1,000,000 bytes

When you run the Integration Service in ASCII mode, it sorts all character data using a binary sort order.
To ensure that data is sorted as the Integration
Service requires, the database sort order must be
the same as the user-defined session sort order.
When you join sorted data from partitioned
pipelines, you must configure the partitions to
maintain the order of sorted data

To improve performance for an unsorted Joiner transformation, use the source with fewer rows as the master source.
To improve performance for a sorted Joiner transformation, use the source with fewer duplicate key values as the master.

If you pass unsorted or incorrectly sorted data to a Joiner transformation configured to use sorted data,
the session fails and the Integration Service logs
the error in the session log file

By default, when you add ports to a Joiner transformation, the ports from the first source
pipeline display as detail sources.
Adding the ports from the second source pipeline
sets them as master sources.

Adding Transformations to the Mapping


When you add transformations between the sort
origin and the Joiner transformation, use the
following guidelines to maintain sorted data:
- Do not place any of the following transformations
between the sort origin and the Joiner
transformation (mnemonic CARNUXX):
- Custom
- Unsorted Aggregator
- Normalizer
- Rank
- Union
- XML Parser
- XML Generator
- Mapplet, if it contains one of the above
- You can place a sorted Aggregator
transformation between the sort origin and
the Joiner transformation if you use the
following guidelines:
- Configure the Aggregator transformation for
sorted input.
- Use the same ports for the group by columns in
the Aggregator transformation as the ports at the
sort origin.
- The group by ports must be in the same order as
the ports at the sort origin.
- When you join the result set of a Joiner
transformation with another pipeline, verify that the
data output from the first Joiner transformation is
sorted.
Tip: You can place the Joiner transformation directly after the sort origin to maintain sorted data.
If you use multiple ports in the join condition, the Integration Service compares the ports in the order you specify.
If you join Char and Varchar datatypes, the
Integration Service counts any spaces that pad
Char values as part of the string:
Char(40) = "abcd"
Varchar(40) = "abcd"
The Char value is abcd padded with 36 blank
spaces, and the Integration Service does not join
the two fields because the Char field contains
trailing spaces.
Note: The Joiner transformation does not match null
values. For example, if both EMP_ID1 and EMP_ID2
contain a row with a null value, the Integration
Service does not consider them a match and does
not join the two rows. To join rows with null values,
replace null input with default values, and then join
on the default values.
Using Sorted Input
When you configure the Joiner transformation to use
sorted data, the Integration Service improves
performance by minimizing disk input and output.
When you configure the sort order in a session, you
can select a sort order associated with the
Integration Service code page.
When you run the Integration Service in Unicode mode, it uses the selected session sort order to sort character data.
Example of a Join Condition
For example, you configure Sorter transformations in the master and detail pipelines with the following sorted ports:
1. ITEM_NO
2. ITEM_NAME
3. PRICE
When you configure the join condition, use the
following guidelines to maintain sort order:
- You must use ITEM_NO in the first join condition.
- If you add a second join condition, you must use
ITEM_NAME.
- If you want to use PRICE in a join condition,
you must also use ITEM_NAME in the second
join condition.
If you skip ITEM_NAME and join on ITEM_NO
and PRICE, you lose the sort order and the
Integration Service fails the session.
Joining Two Branches of the Same Pipeline
When you join data from the same source, you can
create two branches of the pipeline.
When you branch a pipeline, you must add a
transformation between the source qualifier
and the Joiner transformation in at least one
branch of the pipeline.
You must join sorted data and configure the Joiner
transformation for sorted input.
The following figure shows a mapping that joins two branches of the same pipeline:
Joining two branches might impact performance if the Joiner transformation receives data from one branch much later than the other branch.
The Joiner transformation caches all the data from
the first branch, and writes the cache to disk if the
cache fills.
The Joiner transformation must then read the data from disk when it receives the data from the second branch. This can slow processing.
Joining Two Instances of the Same Source
You can also join data from the same source by creating a second instance of the source.
After you create the second source instance, you
can join the pipelines from the two source
instances.
If you want to join unsorted data, you must
create two instances of the same source and
join the pipelines
The following figure shows two instances of the same source joined with a Joiner transformation:
Note: When you join data using this method, the Integration Service reads the source data for each source instance, so performance can be slower than joining two branches of a pipeline.
Guidelines for Joining Data from a Single Source
Use the following guidelines when deciding whether
to join branches of a pipeline or join two instances
of a source:
- Join two branches of a pipeline when you have a
large source or if you can read the source data only
once. For example, you can only read source data
from a message queue once.
- Join two branches of a pipeline when you use
sorted data. If the source data is unsorted and you
use a Sorter transformation to sort the data, branch
the pipeline after you sort the data.
- Join two instances of a source when you need to
add a blocking transformation to the pipeline
between the source and the Joiner transformation.
- Join two instances of a source if one pipeline may
process slower than the other pipeline.
- Join two instances of a source if you need to join
unsorted data
Blocking the Source Pipelines
When you run a session with a Joiner transformation,
the Integration Service blocks and un-blocks the
source data, based on the mapping configuration
and whether you configure the Joiner transformation for sorted input.
Unsorted Joiner Transformation
When the Integration Service processes an unsorted Joiner transformation, it reads all master rows before it reads the detail rows.
To ensure it reads all master rows before the detail
rows, the Integration Service blocks the detail
source while it caches rows from the master source.
Once the Integration Service reads and caches all
master rows, it unblocks the detail source and
reads the detail rows.
Some mappings with unsorted Joiner
transformations violate data flow validation.
Sorted Joiner Transformation
When the Integration Service processes a sorted Joiner transformation, it blocks data based on the mapping configuration.
Blocking logic is possible if master and detail input
to the Joiner transformation originate from different
sources.
The Integration Service uses blocking logic to
process the Joiner transformation if it can do so
without blocking all sources in a target load order
group simultaneously.
Otherwise, it does not use blocking logic.
Instead, it stores more rows in the cache.
When the Integration Service can use blocking logic to process the Joiner transformation, it stores fewer rows in the cache, increasing performance.
Caching Master Rows
When the Integration Service processes a Joiner transformation, it reads rows from both sources concurrently and builds the index and data cache based on the master rows.
The Integration Service then performs the join based
on the detail source data and the cache data.
The number of rows the Integration Service stores in
the cache depends on the partitioning scheme, the
source data, and whether you configure the Joiner
transformation for sorted input.
To improve performance for an unsorted Joiner
transformation, use the source with fewer rows as
the master source.
To improve performance for a sorted Joiner
transformation, use the source with fewer duplicate
key values as the master.
Working with Transactions
When the Integration Service processes a Joiner
transformation, it can apply transformation logic to
all data in a transaction, all incoming data, or one
row of data at a time.
The Integration Service can drop or preserve
transaction boundaries depending on the mapping
configuration and the transformation scope.
You configure how the Integration Service applies
transformation logic and handles transaction
boundaries using the transformation scope
property.
You configure transformation scope values based on the mapping configuration and whether you want to preserve or drop transaction boundaries.
You can preserve transaction boundaries when you join the following sources:
You join two branches of the same source pipeline - Use the Transaction transformation scope to preserve transaction boundaries.
You join two sources, and you want to
preserve transaction boundaries for the detail
source - Use the Row transformation scope to
preserve transaction boundaries in the detail
pipeline.
You can drop transaction boundaries when you join
the following sources:
You join two sources or two branches and you
want to drop transaction boundaries - Use the
All Input transformation scope to apply the
transformation logic to all incoming data and drop
transaction boundaries for both pipelines.
The following table summarizes how to preserve transaction boundaries using transformation scopes with the Joiner transformation:
Preserving Transaction Boundaries for a Single Pipeline
When you join data from the same source, use the
Transaction transformation scope to preserve
incoming transaction boundaries for a single
pipeline.
Use the Transaction transformation scope when the
Joiner transformation joins data from the same
source, either two branches of the same pipeline or
two output groups of one transaction generator.
Use this transformation scope with sorted data and
any join type.
When you use the Transaction transformation scope,
verify that master and detail pipelines originate
from the same transaction control point and that
you use sorted input.
For example, in Preserving Transaction Boundaries
for a Single Pipeline on page 223 the Sorter
transformation is the transaction control point.
You cannot place another transaction control point between the Sorter transformation and the Joiner transformation.
In the mapping, the master and detail pipeline branches originate from the same transaction control point, and the Integration Service joins the pipeline branches with the Joiner transformation, preserving transaction boundaries.
The following figure shows a mapping that joins two branches of a pipeline and preserves transaction boundaries:
Preserving Transaction Boundaries in the Detail Pipeline
When you want to preserve the transaction
boundaries in the detail pipeline, choose the Row
transformation scope.
The Row transformation scope allows the Integration
Service to process data one row at a time.
The Integration Service caches the master data and
matches the detail data with the cached master
data.
When the source data originates from a real-time
source, such as IBM MQ Series, the Integration
Service matches the cached master data with each
message as it is read from the detail source.
Use the Row transformation scope with Normal and Master Outer join types that use unsorted data.
Dropping Transaction Boundaries for Two Pipelines
When you want to join data from two sources or two
branches and you do not need to preserve
transaction boundaries, use the All Input
transformation scope.
When you use All Input, the Integration Service
drops incoming transaction boundaries for both
pipelines and outputs all rows from the
transformation as an open transaction.
At the Joiner transformation, the data from the
master pipeline can be cached or joined
concurrently, depending on how you configure the
sort order.
Use this transformation scope with sorted and unsorted data and any join type.
J. LOOKUP
Use a Lookup transformation in a mapping to look up data in a flat file, relational table, view, or synonym.
You can import a lookup definition from any flat file or relational database to which both the PowerCenter Client and Integration Service can connect. You can also create a lookup definition from a source qualifier.
Configure the Lookup transformation to perform the following types of lookups:
Relational or flat file lookup - Perform a lookup on a flat file or a relational table.
When you create a Lookup transformation using a
relational table as the lookup source, you can
connect to the lookup source using ODBC and
import the table definition as the structure for the
Lookup transformation.
Use the following options with relational lookups:
- Override the default SQL statement to add a
WHERE clause or to query multiple tables.
- Sort null data high or low, based on database
support.
- Perform case-sensitive comparisons based on the
database support.
When you create a Lookup transformation using a
flat file as a lookup source, the Designer invokes
the Flat File Wizard.
Use the following options with flat file lookups:
- Use indirect files as lookup sources by configuring
a file list as the lookup file name
- Use sorted input for the lookup.
- Sort null data high or low.
- Use case-sensitive string comparison with flat file lookups.
Pipeline Lookups
Create a pipeline Lookup transformation to
perform a lookup on an application source
that is not a relational table or flat file.
A pipeline Lookup transformation has a source
qualifier as the lookup source. The source qualifier
can represent any type of source definition,
including JMS and MSMQ.
The source definition cannot have more than one
group.
When you configure a pipeline Lookup
transformation, the lookup source and source
qualifier are in a different pipeline from the Lookup
transformation.
The source and source qualifier are in a partial
pipeline that contains no target.
The Integration Service reads the source data in this pipeline and passes the data to the Lookup transformation to create the cache.
You can create multiple partitions in the partial pipeline to improve performance.
To improve performance when processing relational
or flat file lookup sources, create a pipeline Lookup
transformation instead of a relational or flat file
Lookup transformation.
You can create partitions to process the lookup
source and pass it to the Lookup transformation.
Create a connected or unconnected pipeline Lookup
transformation.
Note: Do not enable HA recovery for sessions that have real-time sources for pipeline lookups. You might get unexpected results.
Configuring a Pipeline Lookup Transformation in a Mapping
A mapping that contains a pipeline Lookup
transformation includes a partial pipeline that
contains the lookup source and source qualifier.
The partial pipeline does not include a target. The
Integration Service retrieves the lookup source data
in this pipeline and passes the data to the lookup
cache.
The partial pipeline is in a separate target load order
group in session properties.
You can create multiple partitions in the pipeline to
improve performance.
You cannot configure the target load order with the
partial pipeline.
The following figure shows a mapping that contains a pipeline Lookup transformation and the partial pipeline that processes the lookup source:
The mapping contains the following objects:
- The lookup source definition and source qualifier
are in a separate pipeline. The Integration Service
creates a lookup cache after it processes the lookup
source data in the pipeline.
- A flat file source contains new department names
by employee number.
- The pipeline Lookup transformation receives Employee_Number and New_Dept from the source file. The pipeline Lookup performs a lookup on Employee_ID in the lookup cache. It retrieves the employee first and last name from the lookup cache.
- A flat file target receives the Employee_ID, First_Name, Last_Name, and New_Dept from the Lookup transformation.
Connected or unconnected lookup - A connected Lookup transformation receives source data, performs a lookup, and returns data to the pipeline.
An unconnected Lookup transformation is not
connected to a source or target.
A transformation in the pipeline calls the Lookup
transformation with a :LKP expression.
The unconnected Lookup transformation returns one column to the calling transformation.
Connected Lookup Transformation
The following steps describe how the Integration Service processes a connected Lookup transformation:
1. A connected Lookup transformation receives
input values directly from another transformation in
the pipeline.
2. For each input row, the Integration Service
queries the lookup source or cache based on the
lookup ports and the condition in the transformation.
3. If the transformation is uncached or uses a static
cache, the Integration Service returns values from the
lookup query.
If the transformation uses a dynamic cache, the Integration Service inserts the row into the cache when it does not find the row in the cache. When the Integration Service finds the row in the cache, it updates the row in the cache or leaves it unchanged. It flags the row as insert, update, or no change.
4. The Integration Service passes return values
from the query to the next transformation.
If the transformation uses a dynamic cache, you can
pass rows to a Filter or Router transformation to
filter new rows to the target.
Unconnected Lookup Transformation
An unconnected Lookup transformation receives
input values from the result of a :LKP expression in
another transformation.
You can call the Lookup transformation more than
once in a mapping.
A common use for unconnected Lookup
transformations is to update slowly changing
dimension tables.
For more information about slowly changing dimension tables, visit the Informatica Knowledge Base at http://mysupport.informatica.com.
- You can delete lookup ports from a relational lookup if the mapping does not use the lookup port. This reduces the amount of memory the Integration Service needs to run the session.
Lookup Properties
The following table describes the Lookup transformation properties:
The following steps describe the way the Integration Service processes an unconnected Lookup transformation:
1. An unconnected Lookup transformation receives
input values from the result of a :LKP expression in
another transformation, such as an Update Strategy
transformation.
2. The Integration Service queries the lookup source or
cache based on the lookup ports and condition in the
transformation.
3. The Integration Service returns one value into the
return port of the Lookup transformation.
4. The Lookup transformation passes the return value
into the :LKP expression.
The lookup table can be a single table, or you can
join multiple tables in the same database using a
lookup SQL override.
The Integration Service queries the lookup table or
an in-memory cache of the table for all incoming
rows into the Lookup transformation
Return port - Use only in unconnected Lookup
transformations. Designates the column of data you
want to return based on the lookup condition. You can
designate one lookup port as the return port
Use the following guidelines to configure lookup ports:
- If you delete lookup ports from a flat file lookup, the session fails.
Cached or un-cached lookup - Cache the lookup source to improve performance. If you cache the lookup source, you can use a dynamic or static cache.
By default, the lookup cache remains static and does
not change during the session.
With a dynamic cache, the Integration Service
inserts or updates rows in the cache.
When you cache the target table as the lookup
source, you can look up values in the cache to
determine if the values exist in the target.
The Lookup transformation marks rows to insert or
update the target.
Lookup Query
The Integration Service queries the lookup based on the ports and properties you configure in the Lookup transformation.
The Integration Service runs a default SQL statement when the first row enters the Lookup transformation.
If you use a relational lookup or a pipeline lookup against a relational table, you can customize the default query with the Lookup SQL Override property.
When you use a parameter or variable as the lookup SQL override, the Designer cannot expand parameters and variables in the query override and does not validate the override. The Integration Service expands the parameters and variables when you run the session.
A lookup column name contains a slash (/)
character - When generating the default lookup
query, the Designer and Integration Service replace
any slash character (/) in the lookup column name
with an underscore character. To query lookup
column names containing the slash character,
override the default lookup query, replace the
underscore characters with the slash character, and
enclose the column name in double quotes.
Add a WHERE clause - Use a lookup SQL override
to add a WHERE clause to the default SQL
statement. You might want to use the WHERE clause
to reduce the number of rows included in the cache.
When you add a WHERE clause to a Lookup
transformation using a dynamic cache, use a Filter
transformation before the Lookup transformation to
pass rows into the dynamic cache that match the
WHERE clause.
Note: The session fails if you include large object
ports in a WHERE clause.
Other - Use a lookup SQL override if you want to
query lookup data from multiple lookups or if you
want to modify the data queried from the lookup
table before the Integration Service caches the
lookup rows. For example, use TO_CHAR to convert
dates to strings.
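For illustration of the WHERE clause override described above, here is a minimal sketch; the ITEMS_DIM table and its columns are hypothetical, and the SELECT list keeps the generated lookup/output ports unchanged:
SELECT ITEMS_DIM.ITEM_ID as ITEM_ID,
       ITEMS_DIM.ITEM_NAME as ITEM_NAME,
       ITEMS_DIM.PRICE as PRICE
FROM ITEMS_DIM
WHERE ITEMS_DIM.DISCONTINUED_FLAG = 'N'
With a dynamic cache, place a Filter transformation before the Lookup transformation so that only rows matching the same condition reach the cache.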

If you configure both the Lookup SQL Override and


the Lookup Source Filter properties, the Integration
Service ignores the Lookup Source Filter property
Default Lookup Query
The default lookup query contains the following
statements:
SELECT - The SELECT statement includes all the
lookup ports in the mapping. You can view the
SELECT statement by generating SQL using the
Lookup SQL Override property. Do not add or delete
any columns from the default SQL statement.
ORDER BY - The ORDER BY clause orders the
columns in the same order they appear in the
Lookup transformation. The Integration Service
generates the ORDER BY clause. You cannot view
this when you generate the default SQL using
the Lookup SQL Override property.
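As a sketch of the generated statement (assuming a hypothetical ITEMS_DIM lookup with lookup ports ITEM_ID, ITEM_NAME, and PRICE), the default query is conceptually similar to:
SELECT ITEMS_DIM.ITEM_ID as ITEM_ID,
       ITEMS_DIM.ITEM_NAME as ITEM_NAME,
       ITEMS_DIM.PRICE as PRICE
FROM ITEMS_DIM
ORDER BY ITEM_ID, ITEM_NAME, PRICE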
Overriding the Lookup Query
Override the lookup query in the following
circumstances:
Override the ORDER BY clause - Create the
ORDER BY clause with fewer columns to increase
performance. When you override the ORDER BY
clause, you must suppress the generated ORDER BY
clause with a comment notation.
Note: If you use pushdown optimization, you cannot
override the ORDER BY clause or suppress the
generated ORDER BY clause with a comment
notation.
A lookup table name or column names
contains a reserved word - If the table name or
any column name in the lookup query contains a
reserved word, you must ensure that all reserved
words are enclosed in quotes.
Use parameters and variables - Use parameters
and variables when you enter a lookup SQL override.
Use any parameter or variable type that you can
define in the parameter file. You can enter a
parameter or variable within the SQL statement, or
you can use a parameter or variable as the SQL
query. For example, you can use a session
parameter, $ParamMyLkpOverride, as the lookup SQL query, and set $ParamMyLkpOverride to the SQL statement in a parameter file.
Guidelines for Overriding the Lookup Query
Use the following guidelines when you override the lookup SQL query:
- You can override the lookup SQL query for
relational lookups.
- Generate the default query, and then
configure the override. This ensures that all the
lookup/output ports are included in the query. If you
add or subtract ports from the SELECT
statement, the session fails.
- Add a source lookup filter to filter the rows that are
added to the lookup cache. This ensures the
Integration Service inserts rows in the dynamic
cache and target table that match the WHERE
clause.
- To share the cache, use the same lookup SQL
override for each Lookup transformation.
- If you override the ORDER BY clause, the session fails if the ORDER BY clause does not contain the condition ports in the same order they appear in the Lookup condition, or if you do not suppress the generated ORDER BY clause with the comment notation (see the example after this list).
- If you use pushdown optimization, you
cannot override the ORDER BY clause or
suppress the generated ORDER BY clause with
comment notation.
- If the table name or any column name in the
lookup query contains a reserved word, you must
enclose all reserved words in quotes.
- You must set the Lookup Policy on Multiple Match property to Use Any Value to override the lookup query for an uncached lookup.
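For example (same hypothetical ITEMS_DIM lookup as above), an override that orders by only the condition port and suppresses the generated ORDER BY clause with comment notation might look like this:
SELECT ITEMS_DIM.ITEM_ID as ITEM_ID,
       ITEMS_DIM.ITEM_NAME as ITEM_NAME,
       ITEMS_DIM.PRICE as PRICE
FROM ITEMS_DIM
ORDER BY ITEM_ID --
The trailing -- comments out the ORDER BY clause that the Integration Service appends to the override.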
Handling Multiple Matches
- Use the first matching value, or use the last
matching value
- Use any matching value
- Use all values
- Return an error. When the Lookup transformation
uses a static cache or no cache, the Integration
Service marks the row as an error. The Lookup
transformation writes the row to the session log by
default, and increases the error count by one. When
the Lookup transformation has a dynamic cache, the
Integration Service fails the session when it
encounters multiple matches. The session fails while
the Integration Service is caching the lookup table or
looking up the duplicate key values. Also, if you
configure the Lookup transformation to output old
values on updates, the Lookup transformation
returns an error when it encounters multiple
matches. The transformation creates an index based
on the key ports instead of all Lookup transformation
ports.
Lookup Caches
You can configure a Lookup transformation to cache
the lookup file or table.
The Integration Service builds a cache in memory
when it processes the first row of data in a cached
Lookup transformation.
It allocates memory for the cache based on the
amount you configure in the transformation or
session properties.
The Integration Service stores condition values in
the index cache and output values in the data
cache.
The Integration Service queries the cache for each
row that enters the transformation.
The Integration Service also creates cache files by
default in the $PMCacheDir.
If the data does not fit in the memory cache, the Integration Service stores the overflow values in the cache files.
When the session completes, the Integration Service releases cache memory and deletes the cache files unless you configure the Lookup transformation to use a persistent cache.
When configuring a lookup cache, you can configure
the following options:
- Persistent cache
- Re-cache from lookup source
- Static cache
- Dynamic cache
- Shared cache
- Pre-build lookup cache
Note: You can use a dynamic cache for relational or
flat file lookups
Rules and Guidelines for Returning Multiple Rows
- The Integration Service caches all rows from the
lookup source for cached lookups.
- You can configure an SQL override for a cached or
uncached lookup that returns multiple rows.
- You cannot enable dynamic cache for a Lookup
transformation that returns multiple rows.
- You cannot return multiple rows from an
unconnected Lookup transformation.
- You can configure multiple Lookup transformations to share a named cache if the Lookup transformations have matching caching structures and Lookup Policy on Multiple Match settings.
- A Lookup transformation that returns multiple rows
cannot share a cache with a Lookup transformation
that returns one matching row for each input row.
Configuring Unconnected Lookup Transformations
An unconnected Lookup transformation is a Lookup
transformation that is not connected to a source or
target.
Call the lookup from another transformation with
a :LKP expression.
You can perform the following tasks when you call a
lookup from an expression:
- Test the results of a lookup in an expression.
- Filter rows based on the lookup results.
- Mark rows for update based on the result of a
lookup and update slowly changing dimension
tables.
- Call the same lookup multiple times in one
mapping.
Database Deadlock Resilience
The Lookup transformation is resilient to a database deadlock for un-cached lookups.
When a database deadlock error occurs, the session does not fail.
The Integration Service attempts to re-execute the
last statement for a specified retry period.
You can configure the number of deadlock retries
and the deadlock sleep interval for an Integration
Service.
These values also affect database deadlocks for the
relational writer. You can override these values at
the session level as custom properties.
Configure following Integration Service Properties:
NumOfDeadlockRetries - The number of times the
PowerCenter Integration Service retries a target
write on a database deadlock. Minimum is 0. Default
is 10. If you want the session to fail on deadlock set
NumOfDeadlockRetries to zero.
DeadlockSleep - Number of seconds before the
PowerCenter Integration Service retries a target
write on database deadlock.
If a deadlock occurs, the Integration Service
attempts to run the statement. The Integration
Service waits for a delay period between each retry
attempt. If all attempts fail due to deadlock, the
session fails. The Integration Service logs a message
in the session log whenever it retries a statement.
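For example (illustrative values only), with NumOfDeadlockRetries set to 3 and DeadlockSleep set to 10, the Integration Service retries a deadlocked statement up to 3 times, waiting 10 seconds between attempts, and fails the session only if every retry also deadlocks.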
Tips for Lookup Transformations
Add an index to the columns used in a lookup
condition:
If you have privileges to modify the database
containing a lookup table, you can improve
performance for both cached and un-cached
lookups.
This is important for very large lookup tables.
Since the Integration Service needs to query,
sort, and compare values in these columns,
the index needs to include every column used
in a lookup condition.
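As a sketch (hypothetical table and condition columns), an index that covers both columns used in the lookup condition could be created as follows:
CREATE INDEX IDX_ITEMS_DIM_LKP
    ON ITEMS_DIM (ITEM_ID, ITEM_NAME);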
Place conditions with an equality operator (=) first:
If you include more than one lookup condition,
place the conditions in the following order to
optimize lookup performance:
- Equal to (=)
- Less than (<), greater than (>), less than or equal
to (<=), greater than or equal to (>=)
- Not equal to (!=)
Cache small lookup tables:
Improve session performance by caching small
lookup tables.
The result of the lookup query and processing is the same, whether or not you cache the lookup table.
Join tables in the database:
If the lookup table is on the same database as the source table in the mapping and caching is not feasible, join the tables in the source database rather than using a Lookup transformation.
Use a persistent lookup cache for static lookups:
If the lookup source does not change between sessions, configure the Lookup transformation to use a persistent lookup cache.
The Integration Service then saves and reuses cache
files from session to session, eliminating the time
required to read the lookup source.
Configure a pipeline Lookup transformation to
improve performance when processing a relational
or flat file lookup source:
You can create partitions to process a relational or
flat file lookup source when you define the lookup
source as a source qualifier. Configure a non-reusable pipeline Lookup transformation and create
partitions in the partial pipeline that processes the lookup source.
Lookup Caches
The Integration Service builds a cache in memory
when it processes the first row of data in a cached
Lookup transformation.
It allocates memory for the cache based on the
amount you configure in the transformation or
session properties.
The Integration Service stores condition values in
the index cache and output values in the data
cache.
The Integration Service queries the cache for each
row that enters the transformation
If the data does not fit in the memory cache,
the Integration Service stores the overflow
values in the cache files.
When the session completes, the Integration Service
releases cache memory and deletes the cache files
unless you configure the Lookup transformation to
use a persistent cache.
If you use a flat file or pipeline lookup, the
Integration Service always caches the lookup
source.
If you configure a flat file lookup for sorted input, the Integration Service cannot cache the lookup if the condition columns are not grouped.
If the columns are grouped, but not sorted, the Integration Service processes the lookup as if you did not configure sorted input.
When you configure a lookup cache, you can configure the following cache settings:
Building caches - You can configure the session to build caches sequentially or concurrently.
When you build sequential caches, the Integration
Service creates caches as the source rows enter the
Lookup transformation.
When you configure the session to build concurrent
caches, the Integration Service does not wait for
the first row to enter the Lookup transformation
before it creates caches.
Instead, it builds multiple caches concurrently.
Persistent cache - You can save the lookup cache files and reuse them
the next time the Integration Service processes a
Lookup transformation configured to use the cache.
If the lookup table does not change between
sessions, you can configure the Lookup
transformation to use a persistent lookup cache.
Shared cache - You can share the lookup cache between multiple
transformations.
You can share an unnamed cache between
transformations in the same mapping.
You can share a named cache between
transformations in the same or different mappings.
Lookup transformations can share unnamed static
caches within the same target load order group if
the cache sharing rules match. Lookup transformations cannot share dynamic cache within the same target load order group.
When you do not configure the Lookup transformation for caching, the Integration Service queries the lookup table for each input row.
The result of the Lookup query and processing is the
same, whether or not you cache the lookup table.
However, using a lookup cache can increase session
performance.

Re-cache from source - If the persistent cache is not synchronized with the
lookup table, you can configure the Lookup
transformation to rebuild the lookup cache.
Static cache - You can configure a static, or read-only, cache for
any lookup source.
By default, the Integration Service creates a static
cache.
It caches the lookup file or table and looks up values
in the cache for each row that comes into the
transformation.
When the lookup condition is true, the Integration
Service returns a value from the lookup cache.
The Integration Service does not update the cache
while it processes the Lookup transformation.
Dynamic cache - To cache a table, flat file, or source definition and
update the cache, configure a Lookup
transformation with dynamic cache.
The Integration Service dynamically inserts or
updates data in the lookup cache and passes the
data to the target.
The dynamic cache is synchronized with the target.
The Integration Service can build lookup caches for connected Lookup transformations in the following ways:
- Sequential caches
- Concurrent caches
The Integration Service builds caches for
unconnected Lookup transformations sequentially
regardless of how you configure cache building.
Sequential Caches
By default, the Integration Service builds a cache in
memory when it processes the first row of data in a
cached Lookup transformation.
The Integration Service creates each lookup cache in the pipeline sequentially.
The Integration Service waits for any upstream active transformation to complete processing before it starts processing the rows in the Lookup transformation.
The Integration Service does not build caches for a
downstream Lookup transformation until an
upstream Lookup transformation completes
building a cache.
For example, the following mapping contains an unsorted Aggregator transformation followed by two Lookup transformations:
Configuring sequential caching may allow you to avoid building lookup caches unnecessarily.
For example, a Router transformation might route
data to one pipeline if a condition resolves to true,
and it might route data to another pipeline if the
condition resolves to false.
In this case, a Lookup transformation might not
receive data at all.
Concurrent Caches
You can configure the Integration Service to create
lookup caches concurrently.
You may be able to improve session performance
using concurrent caches.
Performance may especially improve when the
pipeline contains an active transformation
upstream of the Lookup transformation.
You may want to configure the session to create concurrent caches if you are certain that you will need to build caches for each of the Lookup transformations in the session.
Dynamic Lookup Cache
The following list describes some situations when you use a dynamic lookup cache:
Updating a master customer table with new and updated customer information - Use a Lookup transformation to perform a lookup on the customer table to determine if a customer exists in the target. The cache represents the customer table. The Lookup transformation inserts and updates rows in the cache as it passes rows to the target.
Inserting rows into a master customer table from multiple real-time sessions - Use a Lookup transformation in each session to perform a lookup on the same customer table. Each Lookup transformation inserts rows into the customer table and it inserts them in the dynamic lookup cache. For more information about synchronizing dynamic cache between multiple sessions, see Synchronizing Cache with the Lookup Source on page 278.
Loading data into a slowly changing dimension table and a fact table - Create two pipelines and configure a Lookup transformation that performs a lookup on the dimension table. Use a dynamic lookup cache to load data to the dimension table. Use a static lookup cache to load data to the fact table, and specify the name of the dynamic cache from the first pipeline.
Reading a flat file that is an export from a relational table - Read data from a Teradata table when the ODBC connection is slow. You can export the Teradata table contents to a flat file and use the file as a lookup source. Configure the Teradata table as a relational target in the mapping and pass the lookup cache changes back to the Teradata table.
Note - When you create multiple partitions in a pipeline that use a dynamic lookup cache, the Integration Service creates one memory cache and one disk cache for each transformation. However, if you add a partition point at the Lookup transformation, the Integration Service creates one memory cache for each partition.
Dynamic Lookup Properties
A Lookup transformation with a dynamic cache has the following properties:
NewLookupRow - The Designer adds this port to a
Lookup transformation configured to use a dynamic
cache. Indicates with a numeric value whether the
Integration Service inserts or updates the row in the
cache, or makes no change to the cache. To keep
the lookup cache and the target table synchronized,
pass rows to the target when the NewLookupRow
value is equal to 1 or 2.
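For example, a Router transformation downstream of the Lookup transformation can use group filter conditions such as NewLookupRow = 1 for rows to insert and NewLookupRow = 2 for rows to update, while rows with NewLookupRow = 0 can be dropped or routed to a separate target.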
Associated Expression - Associate lookup ports with an expression, an input/output port, or a sequence ID. The Integration Service uses the data in the associated expression to insert or update rows in the lookup cache. If you associate a sequence ID, the Integration Service generates a primary key for inserted rows in the lookup cache.
Ignore Null Inputs for Updates - The Designer activates this port property for lookup/output ports when you configure the Lookup transformation to use a dynamic cache. Select this property when you do not want the Integration Service to update the column in the cache when the data in this column contains a null value.
Ignore in Comparison - The Designer activates this port property for lookup/output ports not used in the lookup condition when you configure the Lookup transformation to use a dynamic cache. The Integration Service compares the values in all lookup ports with the values in their associated input ports by default. Select this property if you want the Integration Service to ignore the port when it compares values before updating a row.
Update Dynamic Cache Condition - Allow the Integration Service to update the dynamic cache conditionally. You can create a Boolean expression that determines whether to update the cache for an input row. Or, you can enable the Integration Service to update the cache with an expression result for an input row. The expression can contain values from the input row or the lookup cache.
Rules and Guidelines for Dynamic Lookup Caches
Use the following guidelines when you use a dynamic lookup cache:
- You cannot share the cache between a dynamic Lookup transformation and static Lookup transformation in the same target load order group.
- You can create a dynamic lookup cache from a relational table, flat file, or source qualifier transformation.
- The Lookup transformation must be a connected transformation.
- Use a persistent or a non-persistent cache.
- If the dynamic cache is not persistent, the Integration Service always rebuilds the cache from the database, even if you do not enable Re-cache from Lookup Source. When you synchronize dynamic cache files with a lookup source table, the Lookup transformation inserts rows into the lookup source table and the dynamic lookup cache. If the source row is an update row, the Lookup transformation updates the dynamic lookup cache only.
- You can only create an equality lookup condition. You cannot look up a range of data in dynamic cache.
- Associate each lookup port that is not in the lookup condition with an input port, sequence ID, or expression.
- Use a Router transformation to pass rows to the cached target when the NewLookupRow value equals one or two. Use the Router transformation to drop rows when the NewLookupRow value equals zero, or you can output those rows to a different target.
- Verify that you output the same values to the target that the Integration Service writes to the lookup cache. When you choose to output new values on update, only connect lookup/output ports to the target table instead of input/output ports. When you choose to output old values on update, add an Expression transformation after the Lookup transformation and before the Router transformation. Add output ports in the Expression transformation for each port in the target table and create expressions to ensure you do not output null input values to the target.
- When you use a lookup SQL override, map the correct columns to the appropriate targets for lookup.
- When you add a WHERE clause to the lookup SQL override, use a Filter transformation before the Lookup transformation. This ensures the Integration Service inserts rows in the dynamic cache and target table that match the WHERE clause.
- When you configure a reusable Lookup transformation to use a dynamic cache, you cannot edit the condition or disable the Dynamic Lookup Cache property in a mapping.
- Use Update Strategy transformations after the Lookup transformation to flag the rows for insert or update for the target.
- Use an Update Strategy transformation before the Lookup transformation to define some or all rows as update if you want to use the Update Else Insert property in the Lookup transformation.
- Set the row type to Data Driven in the session properties.
- Select Insert and Update as Update for the target table options in the session properties.
K. NORMALIZER
The Normalizer transformation generates a key for each source row.
This generated key remains the same for the output group created for each source row.
The Integration Service increments the generated key sequence number each time it processes a source row.
You can create a VSAM Normalizer transformation or a pipeline Normalizer transformation:
VSAM Normalizer transformation - A non-reusable transformation that is a Source Qualifier transformation for a COBOL source. The Mapping Designer creates VSAM Normalizer columns from a COBOL source in a mapping. The column attributes are read-only. The VSAM Normalizer receives a multiple-occurring source column through one input port.
Pipeline Normalizer transformation - A transformation that processes multiple-occurring data from relational tables or flat files. You create the columns manually and edit them in the Transformation Developer or Mapping Designer. The pipeline Normalizer transformation represents multiple-occurring columns with one input port for each source column occurrence.
When a Normalizer transformation receives more than one type of data from a COBOL source, you need to connect the Normalizer output ports to different targets based on the type of data in each row.
Troubleshooting Normalizer Transformations
I cannot edit the ports in my Normalizer transformation when using a relational source.
When you create ports manually, add them on the Normalizer tab in the transformation, not the Ports tab.
Importing a COBOL file failed with errors. What should I do?
Verify that the COBOL program follows the COBOL standard, including spaces, tabs, and end of line characters. The COBOL file headings should be similar to the following text:
identification division.
program-id. mead.
environment division.
select file-one assign to "fname".
data division.
file section.
fd FILE-ONE.
The Designer does not read hidden characters in the COBOL program. Use a text-only editor to make changes to the COBOL file. Do not use Word or Wordpad. Remove extra spaces.
A session that reads binary data completed, but the information in the target table is incorrect.
Edit the session in the Workflow Manager and verify that the source file format is set correctly. The file format might be EBCDIC or ASCII. The number of bytes to skip between records must be set to 0.
I have a COBOL field description that uses a non-IBM COMP type. How should I import the source?
In the source definition, clear the IBM COMP option.
In my mapping, I use one Expression transformation and one Lookup transformation to modify two output ports from the Normalizer transformation. The mapping concatenates them into a single transformation. All the ports are under the same level. When I check the data loaded in the target, it is incorrect. Why is that?
You can only concatenate ports from level one. Remove the concatenation.
L. RANK
You can select only the top or bottom rank of data
with Rank transformation.
Use a Rank transformation to return the largest or
smallest numeric value in a port or group.
You can also use a Rank transformation to return the
strings at the top or the bottom of a session sort
order.
During the session, the Integration Service caches input data until it can perform the rank calculations.
The Rank transformation differs from the transformation functions MAX and MIN in that it lets you select a group of top or bottom values, not just one value.
You can also write expressions to transform data or
perform calculations. You can also create local
variables and write non-aggregate expressions.
When the Integration Service runs in the ASCII data
movement mode, it sorts session data using a
binary sort order.
When the Integration Service runs in Unicode data
movement mode, the Integration Service uses the
sort order configured for the session
Rank Caches
During a session, the Integration Service compares
an input row with rows in the data cache.
If the input row outranks a cached row, the
Integration Service replaces the cached row with
the input row.
If you configure the Rank transformation to rank
across multiple groups, the Integration Service
ranks incrementally for each group it finds.
The Integration Service stores group information in
an index cache and row data in a data cache.
If you create multiple partitions in a pipeline, the
Integration Service creates separate caches for
each partition.
When you create a Rank transformation, you can
configure the following properties:
- Enter a cache directory.
- Select the top or bottom rank.
- Select the input/output port that contains
values used to determine the rank.
- You can select only one port to define a
rank.
- Select the number of rows falling within a
rank.
- Define groups for ranks, such as the 10 least expensive products for each manufacturer.
M. ROUTER
A Filter transformation tests data for one condition
and drops the rows of data that do not meet the
condition.
However, a Router transformation tests data for one
or more conditions and gives you the option to
route rows of data that do not meet any of the
conditions to a default output group.
When you use a Router transformation in a mapping, the Integration Service processes the incoming data only once.
When you use multiple Filter transformations in a
mapping, the Integration Service processes the
incoming data for each transformation.
You cannot modify or delete output ports or their
properties
The Integration Service determines the order
of evaluation for each condition based on the
order of the connected output groups.
The Integration Service processes user-defined
groups that are connected to a transformation or a
target in a mapping.
The Integration Service only processes user-defined groups that are not connected in a
mapping if the default group is connected to
a transformation or a target.
If a row meets more than one group filter condition,
the Integration Service passes this row multiple
times
The Designer deletes the default group when
you delete the last user-defined group from
the list.
N. SEQUENCE GENERATOR
The Sequence Generator transformation generates
numeric values.
Use the Sequence Generator to create unique
primary key values, replace missing primary keys,
or cycle through a sequential range of numbers.
The Sequence Generator transformation is a
connected transformation.
It contains two output ports that you can connect to
one or more transformations.
The Integration Service generates a block of
sequence numbers each time a block of rows
enters a connected transformation
If you connect CURRVAL, the Integration Service
processes one row in each block.
When NEXTVAL is connected to the input port of
another transformation, the Integration Service
generates a sequence of numbers.
When CURRVAL is connected to the input port of
another transformation, the Integration Service
generates the NEXTVAL value plus the Increment
By value
You can make a Sequence Generator reusable, and use it in multiple mappings.
You can use a range of values from 1 to 9,223,372,036,854,775,807 with the smallest interval of 1.
The Sequence Generator transformation has two
output ports: NEXTVAL and CURRVAL. You cannot
edit or delete these ports.
Likewise, you cannot add ports to the transformation.
NEXTVAL
Connect NEXTVAL to multiple transformations to
generate unique values for each row in each
transformation.
Use the NEXTVAL port to generate sequence
numbers by connecting it to a downstream
transformation or target.
You connect the NEXTVAL port to generate the
sequence based on the Current Value and
Increment By properties.
If the Sequence Generator is not configured to cycle through the sequence, the NEXTVAL port generates sequence numbers up to the configured End Value.
For example, you might connect NEXTVAL to two targets in a mapping to generate unique primary key values.
The Integration Service creates a column of unique
primary key values for each target table.
The column of unique primary key values is sent to
one target table as a block of sequence numbers.
The other target receives a block of sequence numbers from the Sequence Generator transformation after the first target receives the block of sequence numbers.
For example, you configure the Sequence Generator
transformation as follows:
Current Value = 1, Increment By = 1.
The Integration Service generates the following
primary key values for the T_ORDERS_PRIMARY and
T_ORDERS_FOREIGN target tables:
T_ORDERS_PRIMARY TABLE (PRIMARY KEY): 1, 2, 3, 4, 5
T_ORDERS_FOREIGN TABLE (PRIMARY KEY): 6, 7, 8, 9, 10
If you want the same values to go to more than one target that receives data from a single transformation, you can connect a Sequence Generator transformation to that preceding transformation.
The Integration Service processes the values into a
block of sequence numbers.
This allows the Integration Service to pass unique
values to the transformation, and then route rows
from the transformation to targets.
The following figure shows a mapping with a
Sequence Generator that passes unique values to
the Expression transformation.
The Expression transformation populates both targets with identical primary key values.
CURRVAL
CURRVAL is NEXTVAL plus the Increment By value.
You typically only connect the CURRVAL port
when the NEXTVAL port is already connected
to a downstream transformation.
When a row enters a transformation connected to the CURRVAL port, the Integration Service passes the last created NEXTVAL value plus one.
The following figure shows connecting CURRVAL and
NEXTVAL ports to a target:
For example, you configure the Sequence Generator
transformation as follows:
Current Value = 1, Increment By = 1.
The Integration Service generates the following
values for NEXTVAL and CURRVAL:
NEXTVAL: 1, 2, 3, 4, 5
CURRVAL: 2, 3, 4, 5, 6
If you connect the CURRVAL port without
connecting the NEXTVAL port, the Integration
Service passes a constant value for each row.
When you connect the CURRVAL port in a Sequence
Generator transformation, the Integration Service processes one row in each block.
You can optimize performance by connecting only the NEXTVAL port in a mapping.
Sequence Generator Transformation Properties
The following table describes the Sequence Generator transformation properties you can configure:
Number of Cached Values
When you have a reusable Sequence Generator transformation in several sessions and the sessions run at the same time, use Number of Cached Values to ensure each session receives unique values in the sequence.
By default, Number of Cached Values is set to 1000 for reusable Sequence Generators.
For a non-reusable Sequence Generator, Number of Cached Values is set to 0 by default.
Reset
If you select Reset for a non-reusable Sequence
Generator transformation, the Integration Service
generates values based on the original current
value each time it starts the session.
Otherwise, the Integration Service updates
the current value to reflect the last-generated value plus one, and then uses the
updated value the next time it uses the
Sequence Generator transformation.
For example, you might configure a Sequence
Generator transformation to create values from 1 to
1,000 with an increment of 1, and a current value
of 1 and choose Reset.
During the first session run, the Integration Service
generates numbers 1 through 234.
Each subsequent time the session runs, the
Integration Service again generates numbers
beginning with the current value of 1.
If you do not select Reset, the Integration Service
updates the current value to 235 at the end of the
first session run.
The next time it uses the Sequence Generator
transformation, the first value generated is 235.
Note: Reset is disabled for reusable Sequence Generator transformations.
End Value
End Value is the maximum value you want the Integration Service to generate.
If the Integration Service reaches the end value and the Sequence Generator is not configured to cycle through the sequence, the session fails with the following error message:
TT_11009 Sequence Generator Transformation: Overflow error.

O. SORTER
You can sort data from relational or flat file sources.
When you specify multiple ports for the sort key, the
Integration Service sorts each port sequentially.
The order the ports appear in the Ports tab
determines the succession of sort operations.
The Sorter transformation treats the data passing
through each successive sort key port as a
secondary sort of the previous port.
Sorter Cache
You can configure a numeric value for the Sorter cache, or you can configure the Integration Service to determine the cache size at run time.
If you configure the Integration Service to determine the cache size, you can also configure a maximum amount of memory for the Integration Service to allocate to the cache.
If the total configured session cache size is 2 GB
(2,147,483,648 bytes) or greater, you must run the
session on a 64-bit Integration Service
Before starting the sort operation, the Integration
Service allocates the amount of memory configured
for the Sorter cache size.
If the Integration Service runs a partitioned session,
it allocates the specified amount of Sorter cache
memory for each partition
If it cannot allocate enough memory, the Integration
Service fails the session.
For best performance, configure Sorter cache size
with a value less than or equal to the amount of
available physical RAM on the Integration Service
machine.
Allocate at least 16 MB (16,777,216 bytes) of
physical memory to sort data using the Sorter
transformation.
Sorter cache size is set to 16,777,216 bytes by
default.
The Integration Service requires disk space of
at least twice the amount of incoming data
when storing data in the work directory
Use the following formula to determine the size of incoming data:
number_of_input_rows * [(Σ column_size) + 16]
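For example (illustrative figures only), 1,000,000 input rows with a total column size of 100 bytes amount to roughly 1,000,000 * (100 + 16) = 116,000,000 bytes of incoming data, so the work directory should have at least about 232,000,000 bytes of free space.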
P. SOURCE QUALIFIER
The Source Qualifier transformation represents the
rows that the Integration Service reads when it runs
a session.
Use the Source Qualifier transformation to complete
the following tasks:
Join data originating from the same source database - You can join two or more tables with primary key-foreign key relationships by linking the sources to one Source Qualifier transformation.
Filter rows when the Integration Service reads source data - If you include a filter condition, the Integration Service adds a WHERE clause to the default query (see the example after this list).
Specify an outer join rather than the default inner join - If you include a user-defined join, the Integration Service replaces the join information specified by the metadata in the SQL query.
Specify sorted ports - If you specify a number for sorted ports, the Integration Service adds an ORDER BY clause to the default SQL query.
Select only distinct values from the source - If you choose Select Distinct, the Integration Service adds a SELECT DISTINCT statement to the default SQL query.
Create a custom query to issue a special SELECT statement for the Integration Service to read source data - For example, you might use a custom query to perform aggregate calculations
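For illustration only, assuming a hypothetical CUSTOMERS source with a source filter of CUSTOMERS.STATE = 'CA', the generated default query might look like this:
SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.CUSTOMER_NAME, CUSTOMERS.STATE FROM CUSTOMERS WHERE CUSTOMERS.STATE = 'CA'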
If the datatypes in the source definition and
Source Qualifier transformation do not
match, the Designer marks the mapping
invalid when you save it.
You specify a target load order based on the Source Qualifier transformations in a mapping.
If one Source Qualifier transformation provides data for multiple targets, you can enable constraint-based loading in a session to have the Integration Service load data based on target table primary and foreign key relationships
You can use parameters and variables in the SQL
query, user-defined join, source filter, and pre- and
post-session SQL commands of a Source Qualifier
transformation
The Integration Service first generates an SQL
query and expands each parameter or
variable.
It replaces each mapping parameter, mapping
variable, and workflow variable with its start value.
Then it runs the query on the source database
Source Qualifier Transformation Properties

Use the Joiner transformation for heterogeneous sources and to join flat files.
Viewing the Default Query
Do not connect to the source database. You
only connect to the source database when
you enter an SQL query that overrides the
default query.
You must connect the columns in the Source Qualifier transformation to another transformation or target before you can generate the default query
Default Join
When you join related tables in one Source Qualifier transformation, the Integration Service joins the tables based on the related keys in each table
This default join is an inner equijoin, using the
following syntax in the WHERE clause:
Source1.column_name = Source2.column_name
The columns in the default join must have:
- A primary key-foreign key relationship
- Matching datatypes
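As a sketch with hypothetical CUSTOMERS and ORDERS sources joined in one Source Qualifier transformation, the default query might contain a WHERE clause such as:
CUSTOMERS.CUSTOMER_ID = ORDERS.CUSTOMER_ID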

Default Query
For relational sources, the Integration Service
generates a query for each Source Qualifier
transformation when it runs a session.
The default query is a SELECT statement for
each source column used in the mapping.
In other words, the Integration Service reads
only the columns that are connected to
another transformation
If any table name or column name contains a
database reserved word, you can create and
maintain a file, reswords.txt, containing reserved
words.
When the Integration Service initializes a session, it
searches for reswords.txt in the Integration Service
installation directory.
If the file exists, the Integration Service places
quotes around matching reserved words when it
executes SQL against the database.
If you override the SQL, you must enclose any
reserved word in quotes.
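A minimal sketch of what a reswords.txt file might contain, assuming the section-per-database layout shown in the PowerCenter documentation (verify the exact format for your version):
[Teradata]
MONTH
DATE
INTERVAL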
When a mapping uses related relational sources, you
can join both sources in one Source Qualifier
transformation.
During the session, the source database performs
the join before passing data to the Integration
Service

Custom Join You might need to override the default join under
the following circumstances:
- Columns do not have a primary key-foreign key
relationship.
- The datatypes of columns used for the join do not
match.
- You want to specify a different type of join, such as
an outer join.
Adding an SQL Query
The Source Qualifier transformation provides the
SQL Query option to override the default query.
You can enter an SQL statement supported by the
source database.
Before entering the query, connect all the input and
output ports you want to use in the mapping.
Entering a User-Defined Join
Entering a user-defined join is similar to entering a
custom SQL query.
However, you only enter the contents of the WHERE
clause, not the entire query.
When you perform an outer join, the Integration
Service may insert the join syntax in the WHERE
clause or the FROM clause of the query, depending
on the database syntax.
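As an illustration, assuming hypothetical CUSTOMERS and ORDERS sources without a primary key-foreign key relationship defined in the metadata, you might enter only the join condition as the user-defined join:
CUSTOMERS.CUSTOMER_ID = ORDERS.CUSTOMER_ID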

When you add a user-defined join, the Source Qualifier transformation includes the setting in the default SQL query.
However, if you modify the default query after
adding a user-defined join, the Integration
Service uses only the query defined in the
SQL Query property of the Source Qualifier
transformation.
When including a string mapping parameter or
variable, use a string identifier appropriate to the
source system.
For most databases, you need to enclose the name
of a string parameter or variable in single quotes.
Outer Join Support
The Integration Service supports two kinds of outer
joins:
Left - Integration Service returns all rows for the
table to the left of the join syntax and the rows from
both tables that meet the join condition
Right - Integration Service returns all rows for the
table to the right of the join syntax and the rows
from both tables that meet the join condition
Informatica Join Syntax
When you enter join syntax, use the Informatica or
database-specific join syntax.
When you use the Informatica join syntax, the
Integration Service translates the syntax and
passes it to the source database during the session.
Note: Always use database-specific syntax for join
conditions.
When you use Informatica join syntax, enclose the
entire join statement in braces ({Informatica
syntax}).
When you use database syntax, enter syntax
supported by the source database without braces.
Normal Join Syntax
{ source1 INNER JOIN source2 on join_condition }
Left Outer Join Syntax
{ source1 LEFT OUTER JOIN source2 on
join_condition }

Right Outer Join Syntax
{ source1 RIGHT OUTER JOIN source2 on join_condition }
Entering a Source Filter
You can enter a source filter to reduce the number of rows the Integration Service queries.
If you include the string WHERE or large objects in the source filter, the Integration Service fails the session.
The Source Qualifier transformation includes
source filters in the default SQL query.
However, if you modify the default query after adding a source filter, the Integration Service uses only the query defined in the SQL Query property of the Source Qualifier transformation.
You can use a parameter or variable as the source
filter or include parameters and variables within the
source filter
Sorted Ports
When you use sorted ports, the Integration Service adds the ports to the ORDER BY clause in the default query.
The sorted ports are applied on the connected
ports rather than the ports that start at the
top of the SQ
Use sorted ports for relational sources only.
When using sorted ports, the sort order of the
source database must match the sort order
configured for the session.
To ensure data is sorted as the Integration Service
requires, the database sort order must be the same
as the user-defined session sort order
The Source Qualifier transformation includes the
number of sorted ports in the default SQL query.
However, if you modify the default query after
choosing the Number of Sorted Ports, the
Integration Service uses only the query
defined in the SQL Query property.
Pre- and Post-Session SQL Commands
You can add pre- and post-session SQL commands on the Properties tab in the Source Qualifier transformation
The Integration Service runs pre-session SQL
commands against the source database before it
reads the source.
It runs post-session SQL commands against the
source database after it writes to the target
Guidelines for pre- and post-session SQL
commands in SQ:
- Use any command that is valid for the database
type. However, the Integration Service does not
allow nested comments, even though the database
might.
- You can use parameters and variables in source
pre- and post-session SQL commands or you can use
a parameter or variable as the command. Use any parameter or variable type that you can define in the parameter file.
- Use a semicolon (;) to separate multiple
statements. The Integration Service issues a commit
after each statement.
- The Integration Service ignores semicolons
within /*...*/.
- If you need to use a semicolon outside of
comments, you can escape it with a backslash (\).
When you escape the semicolon, the Integration
Service ignores the backslash, and it does not use
the semicolon as a statement separator.
- The Designer does not validate the SQL.
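A hedged illustration of a pre-session SQL command, assuming hypothetical STG_ORDERS and ETL_AUDIT tables; the unescaped semicolon separates two statements, while the escaped semicolon inside the string literal is not treated as a statement separator:
DELETE FROM STG_ORDERS; INSERT INTO ETL_AUDIT (NOTE) VALUES ('stage cleared \; reload pending')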

Note: You can also enter pre- and post-session SQL commands on the Properties tab of the target instance in a mapping

Troubleshooting Source Qualifier Transformations
I cannot connect a source definition to a target definition.
You cannot directly connect sources to targets. Instead, you need to connect them through a Source Qualifier transformation for relational and flat file sources, or through a Normalizer transformation for COBOL sources.
I cannot connect multiple sources to one target.
The Designer does not allow you to connect multiple Source Qualifier transformations to a single target. There are two workarounds:
Reuse targets - Since target definitions are reusable, you can add the same target to the mapping multiple times. Then connect each Source Qualifier transformation to each target.
Join the sources in a Source Qualifier transformation. Then remove the WHERE clause from the SQL query.
The source has QNAN (not a number) values in some columns, but the target shows 1.#QNAN.
Operating systems have different string representations of NaN. The Integration Service converts QNAN values to 1.#QNAN on Win64EMT platforms. 1.#QNAN is a valid representation of QNAN.
I entered a custom query, but it is not working when I run the workflow containing the session.
Be sure to test this setting for the Source Qualifier transformation before you run the workflow. Reopen the dialog box in which you entered the custom query. You can connect to a database and click the Validate button to test the SQL. The Designer displays any errors.
The most common reason a session fails is because the database login in both the session and Source Qualifier transformation is not the table owner. You need to specify the table owner in the session and when you generate the SQL Query in the Source Qualifier transformation.
You can test the SQL Query by cutting and pasting it into the database client tool (such as Oracle Net) to see if it returns an error.
I used a mapping variable in a source filter and now the session fails.
Try testing the query by generating and validating the SQL in the Source Qualifier transformation. If the variable or parameter is a string, you probably need to enclose it in single quotes. If it is a datetime variable or parameter, you might need to change its format for the source system.

Q. SQL TRANSFORMATION
The SQL transformation processes SQL queries midstream in a pipeline.
You can insert, delete, update, and retrieve rows from a database.
You can pass the database connection information to the SQL transformation as input data at run time.
The transformation processes external SQL scripts or SQL queries that you create in an SQL editor.
The SQL transformation processes the query and returns rows and database errors
When you create an SQL transformation, you configure the following options:
Mode - The SQL transformation runs in one of the following modes:
Script mode: The SQL transformation runs ANSI SQL scripts that are externally located. You pass a script name to the transformation with each input row. The SQL transformation outputs one row for each input row.
Query mode: The SQL transformation executes a query that you define in a query editor. You can pass strings or parameters to the query to define dynamic queries or change the selection parameters. You can output multiple rows when the query has a SELECT statement.

Passive or active transformation - The SQL transformation is an active transformation by default. You can configure it as a passive transformation when you create the transformation.
Database type - The type of database the SQL
transformation connects to.
Connection type - Pass database connection
information to the SQL transformation or use a
connection object.
Script Mode
An SQL transformation running in script mode runs
SQL scripts from text files.
You pass each script file name from the source to
the SQL transformation ScriptName port.
The script file name contains the complete path to
the script file.
When you configure the transformation to run
in script mode, you create a passive
transformation
The transformation returns one row for each input row.
The Integration Service ignores the output of any SELECT statement you include in the SQL script.
The SQL transformation in script mode does not output more than one row of data for each input row.
You cannot use nested scripts where the SQL script calls another SQL script.
A script cannot accept run-time arguments.

You can create the following types of SQL queries in the SQL transformation:
Static SQL query - The query statement does not
change, but you can use query parameters to
change the data. The Integration Service prepares
the query once and runs the query for all input rows.
Dynamic SQL query - You can change the query
statements and the data. The Integration Service
prepares a query for each input row.
Static Query
When you create a static query, the Integration Service prepares the SQL procedure once and executes it for each row.
When you create a dynamic query, the
Integration Service prepares the SQL for each
input row.
You can optimize performance by creating
static queries
Bind a parameter to an input port - SQL Editor
encloses the name in question marks (?)
When the SQL query contains a SELECT
statement, the output ports must be in the
same order as the columns in the SELECT
statement.
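A minimal sketch of a static query using parameter binding, assuming a hypothetical CUST_ID input port and a CUSTOMERS table; the bound port name is enclosed in question marks:
SELECT CUSTOMER_NAME, CREDIT_LIMIT FROM CUSTOMERS WHERE CUSTOMER_ID = ?CUST_ID?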

Query Mode
It executes an SQL query that you define in the transformation.
When you configure the SQL transformation to
run in query mode, you create an active
transformation
The transformation can return multiple rows for each
input row.
When you create a query, the SQL Editor
validates the port names in the query.
It also verifies that the ports you use for string
substitution are string datatypes.
The SQL Editor does not validate the syntax of
the SQL query

Dynamic Query
To change a query statement, configure a string variable in the query for the portion of the query you want to change.
To configure the string variable, identify an input
port by name in the query and enclose the name
with the tilde (~).
The query changes based on the value of the data in
the port.
The transformation input port that contains the
query parameter must be a string datatype.

You can pass the full query or pass part of the query
in an input port:
Full query - You can substitute the entire SQL query
with query statements from source data.
Partial query - You can substitute a portion of the
query statement, such as the table name.
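A hedged partial-query example, assuming a hypothetical string input port named TABLE_PORT that supplies the table name at run time; the substituted port name is enclosed in tildes:
SELECT ORDER_ID, ORDER_AMOUNT FROM ~TABLE_PORT~ WHERE ORDER_STATUS = 'OPEN'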
You can add pass-through ports to the SQL
transformation
When the source row contains a SELECT query
statement, the SQL transformation returns the data
in the pass-through port in each row it returns from
the database.

If the query result contains multiple rows, the SQL transformation repeats the pass-through data in each row

Guidelines to configure the SQL transformation to run in query mode:
- The number and the order of the output ports must match the number and order of the fields in the query SELECT clause.
- The native datatype of an output port in the transformation must match the datatype of the corresponding column in the database. The Integration Service generates a row error when the datatypes do not match.
- When the SQL query contains an INSERT, UPDATE, or DELETE clause, the transformation returns data to the SQLError port, the pass-through ports, and the NumRowsAffected port when it is enabled. If you add output ports the ports receive NULL data values.
- When the SQL query contains a SELECT statement and the transformation has a pass-through port, the transformation returns data to the pass-through port whether or not the query returns database data. The SQL transformation returns a row with NULL data in the output ports.
- You cannot add the "_output" suffix to output port names that you create.
- You cannot use the pass-through port to return data from a SELECT query.
- When the number of output ports is more than the number of columns in the SELECT clause, the extra ports receive a NULL value.
- When the number of output ports is less than the number of columns in the SELECT clause, the Integration Service generates a row error.
- You can use string substitution instead of parameter binding in a query. However, the input ports must be string datatypes.

Passive Mode Configuration
When you create a SQL transformation, you can configure the SQL transformation to run in passive mode instead of active mode.
You cannot change the mode after you create the transformation
Guidelines to configure the SQL transformation to run in passive mode:
- If a SELECT query returns more than one row, the Integration Service returns the first row and an error to the SQLError port. The error states that the SQL transformation generated multiple rows.
- If the SQL query has multiple SQL statements, then the Integration Service executes all the statements. The Integration Service returns data for the first SQL statement only. The SQL transformation returns one row. The SQLError port contains the errors from all the SQL statements. When multiple errors occur, they are separated by semi-colons in the SQLError port.
- If the SQL query has multiple SQL statements and a statistics port is enabled, the Integration Service returns the data and statistics for the first SQL statement. The SQLError port contains the errors for all the SQL statements.

Ways to connect the SQL transformation to a database:
Static connection - Configure the connection object
in the session. You must first create the connection
object in Workflow Manager.
Logical connection - Pass a connection name to the
SQL transformation as input data at run time. You
must first create the connection object in Workflow
Manager.
Full database connection - Pass the connect string,
user name, password, and other connection
information to the SQL transformation input ports at
run time
Note: If a session has multiple partitions, the SQL
transformation creates a separate database
connection for each partition.
The following transaction control SQL statements are
not valid with the SQL transformation:
SAVEPOINT - Identifies a rollback point in the
transaction.
SET TRANSACTION - Changes transaction options.
When you have high availability, the SQL
transformation provides database connection
resiliency for static and dynamic connections. When
the Integration Service fails to connect to the
database, it retries the connection.
You can configure the connection retry period for a
connection
When the Integration Service cannot connect to the
database in the time period that you configure, it
generates a row error for a dynamic connection or
fails the session for a static connection.
Database Deadlock Resiliency

The SQL transformation is resilient to database deadlock errors when you enable the Session Retry on Deadlock session property.
The SQL transformation is resilient to
database deadlock errors in Query mode but
it is not resilient to deadlock errors in Script
mode.
If a deadlock occurs in Query mode, the Integration
Service tries to reconnect to the database for the
number of deadlock retries that you configure.
When a deadlock occurs, the Integration Service
retries the SQL statements in the current row if the
current row has no DML statements.
If the row contains a DML statement such as INSERT,
UPDATE, or DELETE, the Integration Service does
not process the current row again

For a dynamic connection, if the retry attempt fails, the Integration Service returns an error in the SQLError port. The Integration Service processes the next statement based on the Continue on SQL Error within Row property. If the property is disabled, the Integration Service skips the current row. If the current row contains a DML statement such as INSERT, UPDATE, or DELETE, the Integration Service increments the error counts.
For a static connection, if the retry attempts fail, the Integration Service returns an error in the SQLError port. If the current row contains a DML statement, then the Integration Service fails the session. The Integration Service processes the next statement based on the Continue on SQL Error within a Row property. If the property is disabled the Integration Service skips the current row.
<Print - 361 to 364>

R. STORED PROCEDURE
There are three types of data that pass between the Integration Service and the stored procedure:
- Input/output parameters
- Return values
- Status codes
If a stored procedure returns a result set rather than a single return value, the Stored Procedure transformation takes only the first value returned from the procedure
Status Codes
Status codes provide error handling for the Integration Service during a workflow.
The stored procedure issues a status code that notifies whether or not the stored procedure completed successfully.
You cannot see this value.
The Integration Service uses it to determine whether to continue running the session or stop
Connected - The flow of data through a mapping in connected mode also passes through the Stored Procedure transformation. All data entering the transformation through the input ports affects the stored procedure. You should use a connected Stored Procedure transformation when you need data from an input port sent as an input parameter to the stored procedure, or the results of a stored procedure sent as an output parameter to another transformation.
Unconnected - The unconnected Stored Procedure transformation is not connected directly to the flow of the mapping. It either runs before or after the session, or is called by an expression in another transformation in the mapping.
Specifying when the Stored Procedure Runs
The following list describes the options for running a Stored Procedure transformation:
Normal - The stored procedure runs where the transformation exists in the mapping on a row-by-row basis. This is useful for calling the stored procedure for each row of data that passes through the mapping, such as running a calculation against an input port. Connected stored procedures run only in normal mode.
Pre-load of the Source - Before the session retrieves data from the source, the stored procedure runs. This is useful for verifying the existence of tables or performing joins of data in a temporary table.
Post-load of the Source - After the session
retrieves data from the source, the stored procedure
runs. This is useful for removing temporary tables.
Pre-load of the Target - Before the session sends
data to the target, the stored procedure runs. This is
useful for verifying target tables or disk space on the
target system.
Post-load of the Target - After the session sends
data to the target, the stored procedure runs. This is
useful for re-creating indexes on the database.
You can run more than one Stored Procedure
transformation in different modes in the same
mapping.
For example, a pre-load source stored procedure can
check table integrity, a normal stored procedure
can populate the table, and a post-load stored
procedure can rebuild indexes in the database.
However, you cannot run the same instance of
a Stored Procedure transformation in both
connected and unconnected mode in a
mapping.
You must create different instances of the
transformation
If the mapping calls more than one source or target
pre- or post-load stored procedure in a mapping,
the Integration Service executes the stored
procedures in the execution order that you specify
in the mapping

The Integration Service opens the database connection when it encounters the first stored procedure.
The database connection remains open until the Integration Service finishes processing all stored procedures for that connection.
The Integration Service closes the database connections and opens a new one when it encounters a stored procedure using a different database connection.
To run multiple stored procedures that use the same database connection, set these stored procedures to run consecutively.
If you do not set them to run consecutively, you might have unexpected results in the target.
For example, you have two stored procedures: Stored Procedure A and Stored Procedure B. Stored Procedure A begins a transaction, and Stored Procedure B commits the transaction.
If you run Stored Procedure C before Stored Procedure B, using another database connection, Stored Procedure B cannot commit the transaction because the Integration Service closes the database connection when it runs Stored Procedure C.
Use the following guidelines to run multiple stored procedures within a database connection:
- The stored procedures use the same database connect string defined in the stored procedure properties.
- You set the stored procedures to run in consecutive order.
- The stored procedures have the same stored procedure type:
- Source pre-load
- Source post-load
- Target pre-load
- Target post-load
Creating a Stored Procedure Transformation
After you configure and test a stored procedure in the database, you must create the Stored Procedure transformation in the Mapping Designer.
There are two ways to configure the Stored Procedure transformation:
- Use the Import Stored Procedure dialog box to configure the ports used by the stored procedure.
- Configure the transformation manually, creating the appropriate ports for any input or output parameters.
Stored Procedure transformations are created as Normal type by default, which means that they run during the mapping, not before or after the session.
New Stored Procedure transformations are not created as reusable transformations.
To create a reusable transformation, click Make Reusable in the Transformation properties after creating the transformation.
Note: Configure the properties of reusable transformations in the Transformation Developer, not the Mapping Designer, to make changes globally for the transformation.
Importing Stored Procedures
When you import a stored procedure, the Designer creates ports based on the stored procedure input and output parameters.
You should import the stored procedure whenever possible.
There are three ways to import a stored procedure in the Mapping Designer:
- Select the stored procedure icon and add a Stored Procedure transformation.
- Click Transformation > Import Stored Procedure.
- Click Transformation > Create, and then select Stored Procedure.
When you import a stored procedure containing a
period (.) in the stored procedure name, the
Designer substitutes an underscore (_) for the
period in the Stored Procedure transformation
name.
Manually Creating Stored Procedure Transformations
To create a Stored Procedure transformation
manually, you need to know the input parameters,
output parameters, and return values of the stored
procedure, if there are any.
You must also know the datatypes of those
parameters, and the name of the stored procedure.
All these are configured through Import Stored
Procedure.
To create a Stored Procedure transformation, in the Mapping Designer, click Transformation > Create, and then select Stored Procedure

<Print 384-385>
Changing the Stored Procedure
If the number of parameters or the return value in a stored procedure changes, you can either re-import it or edit the Stored Procedure transformation manually.
The Designer does not verify the Stored Procedure transformation each time you open the mapping.
After you import or create the transformation, the Designer does not validate the stored procedure.
The session fails if the stored procedure does not match the transformation.
Configuring an Unconnected Transformation
An unconnected Stored Procedure transformation is not directly connected to the flow of data through the mapping.
Instead, the stored procedure runs either:
From an expression - Called from an expression written in the Expression Editor within another transformation in the mapping.
Pre- or post-session - Runs before or after a session
When using an unconnected Stored Procedure transformation in an expression, you need a method of returning the value of output parameters to a port. Use one of the following methods to capture the output values:
- Assign the output value to a local variable.
- Assign the output value to the system variable PROC_RESULT.
By using PROC_RESULT, you assign the value of the return parameter directly to an output port, which can apply directly to a target.
You can also combine the two options by assigning one output parameter as PROC_RESULT, and the other parameter as a variable.
Use PROC_RESULT only within an expression.
If you do not use PROC_RESULT or a variable, the port containing the expression captures a NULL.
You cannot use PROC_RESULT in a connected Lookup transformation or within the Call Text for a Stored Procedure transformation
Expression Rules
- A single output parameter is returned using the variable PROC_RESULT.
- When you use a stored procedure in an expression, use the :SP reference qualifier. To avoid typing errors, select the Stored Procedure node in the Expression Editor, and double-click the name of the stored procedure.
- However, the same instance of a Stored Procedure transformation cannot run in both connected and unconnected mode in a mapping. You must create different instances of the transformation.
- The input/output parameters in the expression must match the input/output ports in the Stored Procedure transformation. If the stored procedure has an input parameter, there must also be an input port in the Stored Procedure transformation.
- When you write an expression that includes a stored procedure, list the parameters in the same order that they appear in the stored procedure and the Stored Procedure transformation.
- The parameters in the expression must include all of the parameters in the Stored Procedure transformation. You cannot leave out an input parameter. If necessary, pass a dummy variable to the stored procedure.
- The arguments in the expression must be the same datatype and precision as those in the Stored Procedure transformation.
- Use PROC_RESULT to apply the output parameter of a stored procedure expression directly to a target. You cannot use a variable for the output parameter to pass the results directly to a target. Use a local variable to pass the results to an output port within the same transformation.
- Nested stored procedures allow passing the return value of one stored procedure as the input parameter of another stored procedure. For example, if you have the following two stored procedures:
- get_employee_id (employee_name)
- get_employee_salary (employee_id)
And the return value for get_employee_id is an employee ID number, the syntax for a nested stored procedure is:
:sp.get_employee_salary (:sp.get_employee_id (employee_name))
You can have multiple levels of nested stored procedures.
- Do not use single quotes around string parameters. If the input parameter does not contain spaces, do not use any quotes. If the input parameter contains spaces, use double quotes.
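A sketch of calling an unconnected Stored Procedure transformation from an Expression transformation output port, assuming a hypothetical stored procedure GET_CREDIT_LIMIT with one input parameter and one output parameter captured through PROC_RESULT:
:SP.GET_CREDIT_LIMIT(CUSTOMER_ID, PROC_RESULT)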
Tips for Stored Procedure Transformations
Do not run unnecessary instances of stored
procedures.
Each time a stored procedure runs during a
mapping, the session must wait for the stored
procedure to complete in the database. You have
two possible options to avoid this:
Reduce the row count - Use an active
transformation prior to the Stored Procedure
transformation to reduce the number of rows that
must be passed to the stored procedure. Or, create an
expression that tests the values before passing
them to the stored procedure to make sure that the
value does not really need to be passed.
Create an expression - Most of the logic used in
stored procedures can be easily replicated using
expressions in the Designer.

Troubleshooting Stored Procedures


The session did not have errors before, but
now it fails on the stored procedure.
The most common reason for problems with a
Stored Procedure transformation results from
changes made to the stored procedure in the
database. If the input/output parameters or return
value changes in a stored procedure, the Stored
Procedure transformation becomes invalid. You
must either import the stored procedure again, or manually configure the stored procedure to add, remove, or modify the appropriate ports.
The session has been invalidated since I last
edited the mapping. Why?
Any changes you make to the Stored Procedure
transformation may invalidate the session. The
most common reason is that you have changed the
type of stored procedure, such as from a Normal to
a Post-load Source type.

S. TRANSACTION CONTROL
A transaction is the set of rows bound by commit or
roll back rows. You can define a transaction based
on a varying number of input rows. You might want
to define transactions based on a group of rows
ordered on a common key, such as employee ID or
order entry date.
In PowerCenter, you define transaction control at the
following levels:
Within a mapping - Within a mapping, you use the
Transaction Control transformation to define a
transaction. You define transactions using an
expression in a Transaction Control transformation.
Based on the return value of the expression, you can
choose to commit, roll back, or continue without any
transaction changes.
Within a session - When you configure a session,
you configure it for user-defined commit. You can
choose to commit or roll back a transaction if the
Integration Service fails to transform or write any
row to the target.
If the mapping has a flat file target you can
generate an output file each time the
Integration Service starts a new transaction.
You can dynamically name each target flat
file.
Transaction Control Transformation Properties
Use the Transaction Control transformation to define
conditions to commit and roll back transactions
from transactional targets.
Transactional targets include relational, XML,
and dynamic MQSeries targets
The transaction control expression uses the IIF
function to test each row against the condition.
Use the following syntax for the expression:
IIF (condition, value1, value2)
The Integration Service evaluates the condition on a
row-by-row basis.

The return value determines whether the Integration Service commits, rolls back, or makes no transaction changes to the row.
When the Integration Service issues a commit or roll
back based on the return value of the expression, it
begins a new transaction.
Use the following built-in variables in the Expression
Editor when you create a transaction control
expression:
TC_CONTINUE_TRANSACTION - The Integration
Service does not perform any transaction change for
this row. This is the default value of the expression.
TC_COMMIT_BEFORE - The Integration Service
commits the transaction, begins a new transaction,
and writes the current row to the target. The current
row is in the new transaction.
TC_COMMIT_AFTER - The Integration Service
writes the current row to the target, commits the
transaction, and begins a new transaction. The
current row is in the committed transaction.
TC_ROLLBACK_BEFORE - The Integration Service rolls back the current transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction.
TC_ROLLBACK_AFTER - The Integration Service
writes the current row to the target, rolls back the
transaction, and begins a new transaction. The
current row is in the rolled back transaction.
If the transaction control expression evaluates
to a value other than commit, roll back, or
continue, the Integration Service fails the
session.
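A hedged example of a transaction control expression, assuming a hypothetical DEPT_CHANGE_FLAG port that is 1 on the first row of each department group; rows that start a new group commit the previous transaction, and all other rows stay in the current transaction:
IIF(DEPT_CHANGE_FLAG = 1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)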

Using Transaction Control Transformations in Mappings
Transaction Control transformations are transaction
generators.
They define and redefine transaction boundaries in a
mapping.
They drop any incoming transaction boundary from
an upstream active source or transaction
generator, and they generate new transaction
boundaries downstream.
You can also use Custom transformations configured
to generate transactions to define transaction
boundaries.

Transaction Control transformations can be effective or ineffective for the downstream transformations and targets in the mapping.
The Transaction Control transformation becomes ineffective for downstream transformations or targets if you put a transformation that drops incoming transaction boundaries after it.
This includes any of the following active
sources or transformations:
- Aggregator with the All Input level transformation
scope
- Joiner with the All Input level transformation scope
- Rank with the All Input level transformation scope
- Sorter with the All Input level transformation scope
- Custom with the All Input level transformation
scope
- Custom transformation configured to generate
transactions
- Transaction Control transformation
- A multiple input group transformation, such as a
Custom transformation, connected to multiple
upstream transaction control points
Mappings with Transaction Control transformations
that are ineffective for targets may be valid or
invalid.
When you save or validate the mapping, the
Designer displays a message indicating which
Transaction Control transformations are ineffective
for targets.
Although a Transaction Control transformation may
be ineffective for a target, it can be effective for
downstream transformations.
Downstream transformations with the Transaction
level transformation scope can use the transaction
boundaries defined by an upstream Transaction
Control transformation.
The following figure shows a valid mapping with a
Transaction Control transformation that is effective
for a Sorter transformation, but ineffective for the
target:

In this example, the TCT1 transformation is ineffective for the target, but effective for the Sorter transformation.
The Sorter transformation's Transformation Scope property is Transaction. It uses the transaction boundaries defined by TCT1.
The Aggregator Transformation Scope property is All
Input.
It drops transaction boundaries defined by TCT1.
The TCT2 transformation is an effective Transaction
Control transformation for the target.
Mapping Guidelines and Validation

- If the mapping includes an XML target, and you choose to append or create a new document on commit, the input groups must receive data from the same transaction control point.
- Transaction Control transformations
connected to any target other than relational,
XML, or dynamic MQSeries targets are
ineffective for those targets.
- You must connect each target instance to a Transaction Control transformation. You can connect multiple targets to a single Transaction Control transformation.
- You can connect only one effective Transaction
Control transformation to a target.
- You cannot place a Transaction Control
transformation in a pipeline branch that starts
with a Sequence Generator transformation.
- If you use a dynamic Lookup transformation and a
Transaction Control transformation in the same
mapping, a rolled-back transaction might result in
unsynchronized target data.
- A Transaction Control transformation may be
effective for one target and ineffective for another
target. If each target is connected to an effective
Transaction Control transformation, the mapping is
valid.
- Either all targets or none of the targets in the
mapping should be connected to an effective
Transaction Control transformation.

T. UNION
The Integration Service processes all input
groups in parallel.
It concurrently reads sources connected to the Union
transformation and pushes blocks of data into the
input groups of the transformation.
You can connect heterogeneous sources to a Union
transformation.
The transformation merges sources with matching
ports and outputs the data from one output group
with the same ports as the input groups.
The Union transformation is developed using the
Custom transformation
Similar to the UNION ALL statement, the Union
transformation does not remove duplicate rows
Rules and Guidelines for Union
- You can create multiple input groups, but only one output group.
- All input groups and the output group must have
matching ports. The precision, datatype, and scale
must be identical across all groups.
- The Union transformation does not remove
duplicate rows. To remove duplicate rows, you must

add another transformation such as a Router or Filter transformation.
- You cannot use a Sequence Generator or
Update Strategy transformation upstream
from a Union transformation.
- The Union transformation does not generate
transactions
When a Union transformation in a mapping
receives data from a single transaction
generator, the Integration Service propagates
transaction boundaries.
When the transformation receives data from multiple transaction generators, the Integration Service drops all incoming transaction boundaries and outputs rows in an open transaction.

U. UPDATE STRATEGY
In PowerCenter, you set the update strategy at two
different levels:
Within a session - When you configure a session,
you can instruct the Integration Service to either
treat all rows in the same way (for example, treat all
rows as inserts), or use instructions coded into the
session mapping to flag rows for different database
operations.
Within a mapping - Within a mapping, you use the
Update Strategy transformation to flag rows for
insert, delete, update, or reject.
Note: You can also use the Custom transformation
to flag rows for insert, delete, update, or reject
Flagging Rows Within a Mapping
For the greatest degree of control over the update
strategy, you add Update Strategy transformations
to a mapping.
The most important feature of this transformation is
its update strategy expression, used to flag
individual rows for insert, delete, update, or reject.
The following constants flag rows for each database operation, along with their numeric equivalents: DD_INSERT (0), DD_UPDATE (1), DD_DELETE (2), DD_REJECT (3).

Forwarding Rejected Rows
You can configure the Update Strategy transformation to either pass rejected rows to the next transformation or drop them.

By default, the Integration Service forwards rejected rows to the next transformation.
The Integration Service flags the rows for reject and
writes them to the session reject file.
If you do not select Forward Rejected Rows,
the Integration Service drops rejected rows
and writes them to the session log file.
If you enable row error handling, the
Integration Service writes the rejected rows
and the dropped rows to the row error logs.
It does not generate a reject file. If you want to write
the dropped rows to the session log in addition to
the row error logs, you can enable verbose data
tracing.
Update strategy expression uses the IIF or DECODE
function from the transformation language to test
each row to see if it meets a particular condition
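A hedged sketch of an update strategy expression, assuming hypothetical TOTAL_SALES and EXISTING_CUST_KEY ports; negative amounts are rejected, unmatched rows are inserted, and the rest are updated:
IIF(TOTAL_SALES < 0, DD_REJECT, IIF(ISNULL(EXISTING_CUST_KEY), DD_INSERT, DD_UPDATE))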

Aggregator and Update Strategy Transformations


When you connect Aggregator and Update Strategy
transformations as part of the same pipeline, you
have the following options:
Position the Aggregator before the Update Strategy transformation - In this case, you perform the aggregate calculation, and then use the Update Strategy transformation to flag rows that contain the results of this calculation for insert, delete, or update.
Position the Aggregator after the Update Strategy transformation - Here, you flag rows for insert, delete, update, or reject before you perform the aggregate calculation.
How you flag a particular row determines how the
Aggregator transformation treats any values in that
row used in the calculation.
For example, if you flag a row for delete and
then later use the row to calculate the sum,
the Integration Service subtracts the value
appearing in this row.
If the row had been flagged for insert, the
Integration Service would add its value to the
sum.
Lookup and Update Strategy Transformations
When you create a mapping with a Lookup
transformation that uses a dynamic lookup
cache, you must use Update Strategy
transformations to flag the rows for the
target tables.

When you configure a session using Update Strategy transformations and a dynamic lookup cache, you
must define certain session properties.
You must define the Treat Source Rows As option as
Data Driven.
Specify this option on the Properties tab in the
session properties.
You must also define the following update strategy
target table options:
- Select Insert
- Select Update as Update
- Do not select Delete
These update strategy target table options ensure
that the Integration Service updates rows marked
for update and inserts rows marked for insert.
If you do not choose Data Driven, the Integration
Service flags all rows for the database operation
you specify in the Treat Source Rows As option and
does not use the Update Strategy transformations
in the mapping to flag the rows.
The Integration Service does not insert and update
the correct rows.
If you do not choose Update as Update, the
Integration Service does not correctly update the
rows flagged for update in the target table.
As a result, the lookup cache and target table might
become unsynchronized.
Only perform inserts into a target table.
When you configure the session, select Insert for the
Treat Source Rows As session property. Also, make
sure that you select the Insert option for all target
instances in the session.
Delete all rows in a target table.
When you configure the session, select Delete for
the Treat Source Rows As session property. Also,
make sure that you select the Delete option for all
target instances in the session.
Only perform updates on the contents of a
target table.
When you configure the session, select Update for
the Treat Source Rows As session property. When
you configure the update options for each target
table instance, make sure you select the Update
option for each target instance.
Perform different database operations with
different rows destined for the same target
table.
Add an Update Strategy transformation to the
mapping. When you write the transformation
update strategy expression, use either the DECODE
or IIF function to flag rows for different operations
(insert, delete, update, or reject). When you
configure a session that uses this mapping, select
Data Driven for the Treat Source Rows As session property. Make sure that you select the Insert, Delete, or one of the Update options for each target table instance.
Reject data.
Add an Update Strategy transformation to the
mapping. When you write the transformation
update strategy expression, use DECODE or IIF to
specify the criteria for rejecting the row. When you
configure a session that uses this mapping, select
Data Driven for the Treat Source Rows As session
property.

V. XML SOURCE QUALIFIER


An XML Source Qualifier transformation always has
one input or output port for every column in the
XML source.
When you create an XML Source Qualifier
transformation for a source definition, the Designer
links each port in the XML source definition to a
port in the XML Source Qualifier transformation.
You cannot remove or edit any of the links.
If you remove an XML source definition from a
mapping, the Designer also removes the
corresponding XML Source Qualifier transformation.
You can link one XML source definition to one XML
Source Qualifier transformation

You can link ports of one XML Source Qualifier group to ports of different transformations to form separate data flows.
However, you cannot link ports from more than one group in an XML Source Qualifier transformation to ports in the same target transformation

W. XML PARSER
Use an XML Parser transformation to extract XML inside a pipeline.
The XML Parser transformation lets you extract XML data from messaging systems, such as TIBCO or MQ Series, and from other sources, such as files or databases.
The XML Parser transformation functionality is similar to the XML source functionality, except it parses the XML in the pipeline.
For example, you might want to extract XML data from a TIBCO source and pass the data to relational targets.
The XML Parser transformation reads XML data from a single input port and writes data to one or more output ports.

X. XML GENERATOR
Use an XML Generator transformation to create XML inside a pipeline.
The XML Generator transformation lets you read data from messaging systems, such as TIBCO and MQ Series, or from other sources, such as files or databases.
The XML Generator transformation functionality is similar to the XML target functionality, except it generates the XML in the pipeline.
For example, you might want to extract data from relational sources and pass XML data to targets.
The XML Generator transformation accepts data from multiple ports and writes XML through a single output port.

7. TRANSFORMATION LANGUAGE
REFERENCE

Rules and Guidelines for Expression Syntax
Use the following rules and guidelines when you write expressions:
- Except for literals, the Designer and PowerCenter Integration Service ignore spaces.
- The colon (:), comma (,), and period (.) have
special meaning and should be used only to specify
syntax.
- The PowerCenter Integration Service treats a dash
(-) as a minus operator.
- If you pass a literal value to a function, enclose
literal strings within single quotation marks. Do not
use quotation marks for literal numbers. The
PowerCenter Integration Service treats any string
value enclosed in single quotation marks as a
character string.
- When you pass a mapping parameter or variable or
a workflow variable to a function within an
expression, do not use quotation marks to designate
mapping parameters or variables or workflow
variables.
- Do not use quotation marks to designate ports.
- You can nest multiple functions within an
expression except aggregate functions, which allow only one nested aggregate function. The PowerCenter Integration Service evaluates the expression starting with the innermost function.

- You cannot include both single-level and
nested aggregate functions in an Aggregator
transformation.
- If you need to create both single-level and nested
functions, create separate Aggregator transformations.
- You cannot use strings in numeric expressions. For
example, the expression 1 + '1' is not valid because
you can only perform addition on numeric
datatypes. You cannot add an integer and a string.
- You cannot use strings as numeric parameters. For
example, the expression SUBSTR (TEXT_VAL, '1', 10)
is not valid because the SUBSTR function requires an
integer value, not a string, as the start position.
- You cannot mix datatypes when using comparison
operators. For example, the expression 123.4 =
'123.4' is not valid because it compares a decimal
value with a string.
- You can pass a value from a port, literal string or
number, variable, Lookup transformation, Stored Procedure transformation, External Procedure transformation, or the results of another expression.
- Use the ports tab in the Expression Editor to enter
a port name into an expression. If you rename a port
in a connected transformation, the Designer
propagates the name change to expressions in the
transformation.
- Separate each argument in a function with a
comma.
- Except for literals, the transformation language is
not case sensitive.

Reserved Words
Some keywords in the transformation language,
such as constants, operators, and built-in variables,
are reserved for specific functions. These include:
- :EXT
- :INFA
- :LKP
- :MCR
- :SD
- :SEQ
- :SP
- :TD
- AND
- DD_DELETE
- DD_INSERT
- DD_REJECT
- DD_UPDATE
- FALSE
- NOT
- NULL
- OR
- PROC_RESULT
- SESSSTARTTIME
- SPOUTPUT
- SYSDATE
- TRUE
- WORKFLOWSTARTTIME
The following words are reserved for workflow
expressions:


ABORTED
DISABLED
FAILED
NOTSTARTED
STARTED
STOPPED
SUCCEEDED

Note: You cannot use a reserved word to name a port or local variable. You can only use reserved words within transformation and workflow expressions. Reserved words have predefined meanings in expressions
Working with Null Values in Boolean Expressions
Expressions that combine a null value with a
Boolean expression produce results that are ANSI
compliant.
For example, the PowerCenter Integration Service
produces the following results:
- NULL AND TRUE = NULL
- NULL AND FALSE = FALSE
Working with Null Values in Comparison Expressions
When you use a null value in an expression
containing a comparison operator, the PowerCenter
Integration Service produces a null value.
However, you can also configure the PowerCenter
Integration Service to treat null values as high or
low in comparison operations.
Use the Treat Null In Comparison Operators As
property to configure how the PowerCenter
Integration Service handles null values in
comparison expressions.
This PowerCenter Integration Service configuration
property affects the behavior of the following
comparison operators in expressions:
=, !=, ^=, <>, >, >=, <, <=
For example, consider the following expressions:
NULL > 1
NULL = NULL
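With the default configuration, both expressions above evaluate to NULL rather than TRUE or FALSE. As a hedged sketch, assuming a hypothetical DISCOUNT port that may contain nulls, you can guard a comparison with ISNULL so the result is always Boolean:
IIF(ISNULL(DISCOUNT), FALSE, DISCOUNT > 0.1)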

Transaction Control Variables


The following example uses transaction control variables to determine where to process a row:
IIF (NEWTRAN = 1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)
If NEWTRAN = 1, the TC_COMMIT_BEFORE variable causes a commit to occur before the current row processes.
Otherwise, the TC_CONTINUE_TRANSACTION variable forces the row to process in the current transaction.
Use the following variables in the Expression Editor
when you create a transaction control expression:
TC_CONTINUE_TRANSACTION - The PowerCenter
Integration Service does not perform any transaction
change for the current row. This is the default
transaction control variable value.
TC_COMMIT_BEFORE - The PowerCenter Integration Service commits the transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction.
TC_COMMIT_AFTER - The PowerCenter Integration
Service writes the current row to the target, commits
the transaction, and begins a new transaction. The
current row is in the committed transaction.
TC_ROLLBACK_BEFORE - The PowerCenter Integration Service rolls back the current transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction.
DATES
Date functions accept datetime values only. To pass
a string to a date function, first use TO_DATE to
convert it to a datetime value.
For example, the following expression converts a
string port to datetime values and then adds one
month to each date:
ADD_TO_DATE( TO_DATE( STRING_PORT, 'MM/DD/RR' ), 'MM', 1 )

You can use dates between 1 A.D. and 9999 A.D. in the Gregorian calendar system.
Julian Day, Modified Julian Day, and the Gregorian
calendar
You can use dates in the Gregorian calendar system
only.
Dates in the Julian calendar are called Julian dates
and are not supported in Informatica.
This term should not be confused with Julian Day or
with Modified Julian Day.

You can manipulate Modified Julian Day (MJD) formats using the J format string.
The MJD for a given date is the number of days to
that date since Jan 1 4713 B.C. 00:00:00
(midnight).
By definition, MJD includes a time component
expressed as a decimal, which represents some
fraction of 24 hours.
The J format string does not convert this time
component.

century plus the two-digit year from the source


string. If the source string year is between 50 and
99, the PowerCenter Integration Service returns the
current century plus the specified two-digit year
The following table summarizes how the RR format
string converts to dates:

For example, the following TO_DATE expression


converts strings in the SHIP_DATE_MJD_STRING port
to date values in the default date format:
TO_DATE (SHIP_DATE_MJD_STR, 'J')
SHIP_DATE_MJD_STR RETURN_VALUE
2451544
Dec
31
1999
00:00:00.000000000
2415021
Jan
1
1900
00:00:00.000000000
Because the J format string does not include the
time portion of a date, the return values have the
time set to 00:00:00.000000000.
You can also use the J format string in TO_CHAR
expressions. For example, use the J format string in
a TO_CHAR expression to convert date values to
MJD values expressed as strings. For example:
TO_CHAR(SHIP_DATE, 'J')

Example
The following expression produces the same return
values for any current year between 1950 and 2049:
TO_DATE( ORDER_DATE, 'MM/DD/RR' )
ORDER_DATE RETURN_VALUE
'04/12/98'
04/12/1998 00:00:00.000000000
'11/09/01'
11/09/2001 00:00:00.000000000

SHIP_DATE
RETURN_VALUE
Dec 31 1999 23:59:59
2451544
Jan 1 1900 01:02:03 2415021
RR FORMAT STRING
The transformation language provides the RR format
string to convert strings with two-digit years to
dates.
Using TO_DATE and the RR format string, you can
convert a string in the format MM/DD/RR to a date.
The RR format string converts data differently
depending on the current year.
Current Year Between 0 and 49 - If the current year
is between 0 and 49 (such as 2003) and the source
string year is between 0 and 49, the PowerCenter
Integration Service returns the current century plus
the two-digit year from the source string. If the
source string year is between 50 and 99, the
Integration Service returns the previous century
plus the two-digit year from the source string.
Current Year Between 50 and 99 - If the current year is between 50 and 99 (such as 1998) and the source string year is between 0 and 49, the PowerCenter Integration Service returns the next century plus the two-digit year from the source string. If the source string year is between 50 and 99, the PowerCenter Integration Service returns the current century plus the specified two-digit year.
The following table summarizes how the RR format string converts to dates:
Example
The following expression produces the same return values for any current year between 1950 and 2049:
TO_DATE( ORDER_DATE, 'MM/DD/RR' )
ORDER_DATE   RETURN_VALUE
'04/12/98'   04/12/1998 00:00:00.000000000
'11/09/01'   11/09/2001 00:00:00.000000000
DIFFERENCE BETWEEN THE YY AND RR FORMAT STRINGS
PowerCenter also provides a YY format string.
Both the RR and YY format strings specify two-digit
years.
The YY and RR format strings produce
identical results when used with all date
functions except TO_DATE.
In TO_DATE expressions, RR and YY produce
different results.
The following table shows the different results each
format string returns:

For dates in the year 2000 and beyond, the YY


format string produces less meaningful results than
the RR format string. Use the RR format string for
dates in the twenty-first century
Default Date Format
By default, the date format is MM/DD/YYYY HH24:MI:SS.US.
Note: The format string is not case sensitive. It must always be enclosed within single quotation marks.
The following table describes date functions that use
date format strings to evaluate input dates:

TO_CHAR Format Strings
TO_CHAR is generally used when the target is a flat
file or a database that does not support a
Date/Time datatype.
You can convert the entire date or a part of the date
to a string.
The following table summarizes the format strings
for dates in the function TO_CHAR:
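For instance (a minimal sketch, assuming a DATE_PROMISED Date/Time port not taken from these notes), an expression such as the following returns the date as a string in a month-name format:
TO_CHAR( DATE_PROMISED, 'MON DD YYYY' )
A value of Apr 1 1998 12:00:10AM would come back as a string like 'Apr 01 1998'.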

TO_DATE and IS_DATE Format Strings
The TO_DATE function converts a string with the
format you specify to a datetime value.
TO_DATE is generally used to convert strings from
flat files to datetime values.

Note: TO_DATE and IS_DATE use the same set of format strings.
The source string format and the format string must
match, including any date separator.
If any part does not match, the PowerCenter
Integration Service does not convert the string, and
it skips the row.
If you omit the format string, the source string must
be in the date format specified in the session
The following table summarizes the format strings
for the functions TO_DATE and IS_DATE:
<Same as the TO_CHAR formats>
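As an illustration (a minimal sketch, assuming an INVOICE_DATE string port holding values such as '2023/04/12', which is not from the original notes), you might validate and convert the strings like this:
IIF( IS_DATE( INVOICE_DATE, 'YYYY/MM/DD' ), TO_DATE( INVOICE_DATE, 'YYYY/MM/DD' ), NULL )
Rows whose strings do not match the format pass NULL through instead of failing the conversion.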

ADD_TO_DATE
Adds a specified amount to one part of a datetime value, and returns a date in the same format as the date you pass to the function.
ADD_TO_DATE accepts positive and negative
integer values. Use ADD_TO_DATE to change the
following parts of a date:
Year - Enter a positive or negative integer in the
amount argument. Use any of the year format
strings: Y, YY, YYY, or YYYY. The following expression
adds 10 years to all dates in the SHIP_DATE port:
ADD_TO_DATE ( SHIP_DATE, 'YY', 10 )
Month - Enter a positive or negative integer in the
amount argument. Use any of the month format
strings: MM, MON, MONTH. The following expression
subtracts 10 months from each date in the
SHIP_DATE port: ADD_TO_DATE( SHIP_DATE, 'MONTH', -10 )
Day - Enter a positive or negative integer in the
amount argument. Use any of the day format
strings: D, DD, DDD, DY, and DAY. The following
expression adds 10 days to each date in the
SHIP_DATE port: ADD_TO_DATE( SHIP_DATE, 'DD', 10
)
Hour - Enter a positive or negative integer in the
amount argument. Use any of the hour format
strings: HH, HH12, HH24. The following expression
adds 14 hours to each date in the SHIP_DATE port:
ADD_TO_DATE( SHIP_DATE, 'HH', 14 )
Minute - Enter a positive or negative integer in the
amount argument. Use the MI format string to set
the minute. The following expression adds 25
minutes to each date in the SHIP_DATE port:
ADD_TO_DATE( SHIP_DATE, 'MI', 25 )
Seconds - Enter a positive or negative integer in
the amount argument. Use the SS format string to
set the second. The following expression adds 59
seconds to each date in the SHIP_DATE port:
ADD_TO_DATE( SHIP_DATE, 'SS', 59 )
Milliseconds - Enter a positive or negative integer
in the amount argument. Use the MS format string
to set the milliseconds. The following expression
adds 125 milliseconds to each date in the SHIP_DATE
port: ADD_TO_DATE( SHIP_DATE, 'MS', 125 )
Microseconds - Enter a positive or negative integer
in the amount argument. Use the US format string to
set the microseconds. The following expression adds
2,000 microseconds to each date in the SHIP_DATE
port: ADD_TO_DATE( SHIP_DATE, 'US', 2000 )
Nanoseconds - Enter a positive or negative integer
in the amount argument. Use the NS format string to
set the nanoseconds. The following expression adds
3,000,000 nanoseconds to each date in the SHIP_DATE port: ADD_TO_DATE( SHIP_DATE, 'NS', 3000000 )
If you pass a value that creates a day that does not exist in a particular month, the PowerCenter Integration Service returns the last day of the month.
For example, if you add one month to Jan 31 1998, the PowerCenter Integration Service returns Feb 28 1998.
RULES AND GUIDELINES FOR DATE FORMAT STRINGS
The format of the TO_DATE string must match the
format string including any date separators. If it
does not, the PowerCenter Integration Service
might return inaccurate values or skip the row.
The format string must be enclosed within single
quotation marks
ABORT
When the PowerCenter Integration Service encounters an ABORT function, it stops transforming data at that row.
It processes any rows read before the session aborts
and loads them based on the source- or target-based commit interval and the buffer block size
defined for the session.
The PowerCenter Integration Service writes to the
target up to the aborted row and then rolls back all
uncommitted data to the last commit point.
You can perform recovery on the session after
rollback
If you use ABORT in an expression for an
unconnected
port,
the
PowerCenter
Integration Service does not run the ABORT
function.
Syntax
ABORT( string )
Return Value
NULL.
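For example (a minimal sketch, assuming an ORDER_ID port that must never be null; the port name is not from these notes), you might stop the session when a critical field is missing:
IIF( ISNULL( ORDER_ID ), ABORT( 'Order ID is null. Aborting session.' ), ORDER_ID )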
ABS
Returns the absolute value of a numeric value.
Syntax
ABS( numeric_value )
Return Value
Positive numeric value
ASCII
The ASCII function returns the numeric ASCII value
of the first character of the string passed to the
function
You can pass a string of any size to ASCII, but it
evaluates only the first character in the
string
This function is identical in behavior to the
CHRCODE function.
If you use ASCII in existing expressions, they will still
work correctly. However, when you create new
expressions, use the CHRCODE function instead of
the ASCII function.
AVG
Returns the average of all values in a group of rows.
Optionally, you can apply a filter to limit the rows
you read to calculate the average.
You can nest only one other aggregate function
within AVG, and the nested function must return a
Numeric datatype.
Syntax
AVG( numeric_value [, filter_condition ] )
If a value is NULL, AVG ignores the row.
However, if all values passed from the port
are NULL, AVG returns NULL
AVG groups values based on group by ports you
define in the transformation, returning one result
for each group.
If there is not a group by port, AVG treats all rows as
one group, returning one value.
Example:
The following expression returns the average wholesale cost of flashlights:
AVG( WHOLESALE_COST, ITEM_NAME='Flashlight' )

You can perform arithmetic on the values passed to


AVG before the function calculates the average. For
example: AVG( QTY * PRICE - DISCOUNT )
CEIL
Returns the smallest integer greater than or equal to
the numeric value passed to this function.
For example, if you pass 3.14 to CEIL, the function
returns 4. If you pass 3.98 to CEIL, the function
returns 4.
Likewise, if you pass -3.17 to CEIL, the function
returns -3.
Syntax
CEIL( numeric_value )
CHOOSE
Chooses a string from a list of strings based on a
given position.
You specify the position and the value.
If the value matches the position, the PowerCenter
Integration Service returns the value.
Syntax
CHOOSE( index, string1 [, string2, ..., stringN] )
The following expression returns the string
flashlight based on an index value of 2: CHOOSE( 2,
'knife', 'flashlight', 'diving hood' )

The following expression returns NULL based on an


index value of 4: CHOOSE( 4, 'knife', 'flashlight',
'diving hood' )

CHR
ASCII Mode - CHR returns the ASCII character
corresponding to the numeric value you pass
to this function.
Unicode Mode - returns the Unicode character
corresponding to the numeric value you pass to this
function
ASCII values fall in the range 0 to 255.
You can pass any integer to CHR, but only ASCII
codes 32 to 126 are printable characters.
Syntax
CHR( numeric_value )

Use the CHR function to concatenate a single quote onto a string.
The single quote is the only character that you
cannot use inside a string literal.
Consider the following example: 'Joan' || CHR(39)
|| 's car'

The return value is: Joan's car
CHRCODE
ASCII Mode - CHRCODE returns the numeric ASCII
value of the first character of the string passed to
the function.
UNICODE Mode - returns the numeric Unicode value
of the first character of the string passed to the
function
COMPRESS
Compresses data using the zlib 1.2.1 compression
algorithm.
Use the COMPRESS function before you send large
amounts of data over a wide area network.
Syntax
COMPRESS( value )
Return Value
Compressed binary value of the input value.
NULL if the input is a null value.
CONCAT
Syntax
CONCAT( first_string, second_string )
Return Value
String.
If one of the strings is NULL, CONCAT ignores
it and returns the other string.
If both strings are NULL, CONCAT returns NULL

COUNT
Returns the number of rows that have non-null values in a group.
Optionally, you can include the asterisk (*) argument to count all input values in a transformation.
You can nest only one other aggregate function within COUNT.
You can apply a condition to filter rows before
counting them.
Syntax
COUNT( value [, filter_condition] )
or
COUNT( * [, filter_condition] )
COUNT groups values based on group by ports you
define in the transformation, returning one result
for each group.
If there is no group by port COUNT treats all rows as
one group, returning one value
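For example (a minimal sketch, assuming ITEM_ID and ITEM_PRICE ports, which are not taken from these notes), the following expression counts the items that cost more than 5.00:
COUNT( ITEM_ID, ITEM_PRICE > 5.00 )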
CRC32
Returns a 32-bit Cyclic Redundancy Check (CRC32)
value.
Use CRC32 to find data transmission errors.
You can also use CRC32 if you want to verify that
data stored in a file has not been modified.
If you use CRC32 to perform a redundancy check on
data in ASCII mode and Unicode mode, the
PowerCenter Integration Service may generate
different results on the same input value.
Note: CRC32 can return the same output for
different input strings. If you want to generate keys
in a mapping, use a Sequence Generator
transformation. If you use CRC32 to generate keys in
a mapping, you may receive unexpected results.

Syntax
CRC32( value )
CONCAT does not add spaces to separate strings.
If you want to add a space between two strings, you can write an expression with two nested CONCAT functions.
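For instance (a minimal sketch, assuming FIRST_NAME and LAST_NAME string ports, which are assumptions rather than ports from these notes), the following expression joins the two names with a space between them:
CONCAT( CONCAT( FIRST_NAME, ' ' ), LAST_NAME )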
CONVERT_BASE
Converts a number from one base value to another
base value.
Syntax
CONVERT_BASE( value, source_base, dest_base )
The following example converts 2222 from the
decimal base value 10 to the binary base value 2:
CONVERT_BASE( "2222", 10, 2 )
The PowerCenter Integration Service returns 100010101110.
CUME
Returns a running total. A running total means CUME
returns a total each time it adds a value.
You can add a condition to filter rows out of the row
set before calculating the running total.
Use CUME and similar functions (such as
MOVINGAVG and MOVINGSUM) to simplify reporting
by calculating running values.

CONVERT_BASE( "2222", 10, 2 )

The PowerCenter
100010101110

Integration

Service

returns

Syntax
CUME( numeric_value [, filter_condition] )
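For example (a minimal sketch, assuming a PERSONAL_SALES numeric port, an assumed name), the following expression returns a running total of sales for each row read:
CUME( PERSONAL_SALES )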

DATE_COMPARE
Returns an integer indicating which of two dates
is earlier. DATE_COMPARE returns an integer value
rather than a date value.

Return Value
-1 if the first date is earlier.
0 if the two dates are equal.
1 if the second date is earlier.
NULL if one of the date values is NULL
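For example (a minimal sketch, assuming DATE_PROMISED and DATE_SHIPPED Date/Time ports, which are assumptions), the following expression returns -1 when the promise date precedes the ship date:
DATE_COMPARE( DATE_PROMISED, DATE_SHIPPED )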

DATE_DIFF
Returns the length of time between two dates. You
can request the format to be years, months, days,
hours,
minutes,
seconds,
milliseconds,
microseconds, or nanoseconds.
The PowerCenter Integration Service subtracts the
second date from the first date and returns the
difference.

Syntax
DATE_DIFF( date1, date2, format )
Return Value
Double value. If date1 is later than date2, the return
value is a positive number. If date1 is earlier than
date2, the return value is a negative number.
0 if the dates are the same.
NULL if one (or both) of the date values is NULL.
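For example (a minimal sketch, assuming DATE_PROMISED and DATE_SHIPPED Date/Time ports, which are assumptions), the following expression returns the number of days between the two dates:
DATE_DIFF( DATE_PROMISED, DATE_SHIPPED, 'DD' )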

DECODE
Examples
You might use DECODE in an expression that
searches for a particular ITEM_ID and returns the
ITEM_NAME:
DECODE( ITEM_ID, 10, 'Flashlight',
14, 'Regulator',
20, 'Knife',
40, 'Tank',
'NONE' )
ITEM_ID   RETURN VALUE
10        Flashlight
14        Regulator
17        NONE
20        Knife
25        NONE
NULL      NONE
40        Tank
DECODE returns the default value of NONE for items 17 and 25 because the search values did not match the ITEM_ID.
Also, DECODE returns NONE for the NULL ITEM_ID.
The following expression tests multiple columns and conditions, evaluated in a top to bottom order for TRUE or FALSE:
DECODE( TRUE,
Var1 = 22, 'Variable 1 was 22!',
Var2 = 49, 'Variable 2 was 49!',
Var1 < 23, 'Variable 1 was less than 23.',
Var2 > 30, 'Variable 2 was more than 30.',
'Variables were out of desired ranges.')
Var1   Var2   RETURN VALUE
21     47     Variable 1 was less than 23.
22     49     Variable 1 was 22!
23     49     Variable 2 was 49!
24     27     Variables were out of desired ranges.
25     50     Variable 2 was more than 30.

ERROR
Causes the PowerCenter Integration Service to skip
a row and issue an error message, which you
define.
The error message displays in the session log.
The PowerCenter Integration Service does not write
these skipped rows to the session reject file.
For example, you use the ERROR function in an
expression, and you assign the default value,
1234, to the output port.
Each time the PowerCenter Integration Service
encounters the ERROR function in the expression, it
overrides the error with the value 1234 and
passes 1234 to the next transformation.
It does not skip the row, and it does not log an error
in the session log
IIF( SALARY < 0, ERROR ('Error. Negative salary found. Row skipped.'), EMP_SALARY )

SALARY    RETURN VALUE
10000     10000
-15000    'Error. Negative salary found. Row skipped.'
NULL      NULL
150000    150000
EXP
Returns e raised to the specified power (exponent),
where e=2.71828183.

For example, EXP(2) returns 7.38905609893065.
You might use this function to analyze scientific and
technical data rather than business data.
EXP is the reciprocal of the LN function, which
returns the natural logarithm of a numeric value.

FV
Returns the future value of an investment, where
you make periodic, constant payments and the
investment earns a constant interest rate.

Syntax
EXP( exponent )
Return Value
Double value.
NULL if a value passed as an argument to the
function is NULL
FIRST
Returns the first value found within a port or group.
Optionally, you can apply a filter to limit the rows
the PowerCenter Integration Service reads.
You can nest only one other aggregate function
within FIRST.
Syntax
FIRST( value [, filter_condition ] )
Return Value
First value in a group
If a value is NULL, FIRST ignores the row. However, if
all values passed from the port are NULL, FIRST
returns NULL
FIRST groups values based on group by ports you
define in the transformation, returning one result
for each group.
If there is no group by port, FIRST treats all rows as
one group, returning one value.
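For example (a minimal sketch, assuming ITEM_NAME and ITEM_PRICE ports in an Aggregator, which are assumptions), the following expression returns the first item name whose price exceeds 10.00:
FIRST( ITEM_NAME, ITEM_PRICE > 10.00 )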

FLOOR
Returns the largest integer less than or equal to the
numeric value you pass to this function.
For example, if you pass 3.14 to FLOOR, the function
returns 3.
If you pass 3.98 to FLOOR, the function returns 3.
Likewise, if you pass -3.17 to FLOOR, the function
returns -4.

Syntax
FV( rate, terms, payment [, present value, type] )
Example
You deposit $2,000 into an account that earns 9%
annual interest compounded monthly (monthly
interest of 9%/ 12, or 0.75%).
You plan to deposit $250 at the beginning of every
month for the next 12 months.
The following expression returns $5,337.96 as the
account balance at the end of 12 months:
FV(0.0075, 12, -250, -2000, TRUE)

GET_DATE_PART
Returns the specified part of a date as an integer
value.
Therefore, if you create an expression that returns
the month portion of the date, and pass a date
such as Apr 1 1997 00:00:00, GET_DATE_PART
returns 4.
Syntax
GET_DATE_PART( date, format )
Return Value
Integer representing the specified part of the date.
NULL if a value passed to the function is NULL.

The following expressions return the day for each date in the DATE_SHIPPED port:
GET_DATE_PART( DATE_SHIPPED, 'D' )
GET_DATE_PART( DATE_SHIPPED, 'DD' )
GET_DATE_PART( DATE_SHIPPED, 'DDD' )
GET_DATE_PART( DATE_SHIPPED, 'DY' )
GET_DATE_PART( DATE_SHIPPED, 'DAY' )
DATE_SHIPPED             RETURN VALUE
Mar 13 1997 12:00:00 AM  13
June 3 1997 11:30:44PM   3
Aug 22 1997 12:00:00PM   22
NULL                     NULL

Syntax
FLOOR( numeric_value )
Return Value
Integer if you pass a numeric value with declared
precision between 0 and 28.
Double if you pass a numeric value with declared
precision greater than 28.
NULL if a value passed to the function is NULL.

GREATEST
Returns the greatest value from a list of input values.
Use this function to return the greatest string,
date, or number.
By default, the match is case sensitive
Return Value
Value1 if it is the greatest of the input values, value2
if it is the greatest of the input values, and so on.
NULL if any of the arguments is null
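For example (a minimal sketch, assuming QUANTITY1, QUANTITY2, and QUANTITY3 numeric ports), the following expression returns the largest of the three quantities on each row:
GREATEST( QUANTITY1, QUANTITY2, QUANTITY3 )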
IIF
Returns one of two values you specify, based on the
results of a condition.

IN
Matches input data to a list of values. By default, the
match is case sensitive.
Syntax
IN( valueToSearch, value1, [value2, ..., valueN,]
CaseFlag )
Example
The following expression determines if the input
value is a safety knife, chisel point knife, or
medium titanium knife.
The input values do not have to match the case of
the values in the comma-separated list:
IN( ITEM_NAME, 'Chisel Point Knife', 'Medium Titanium Knife', 'Safety Knife', 0 )
ITEM_NAME               RETURN VALUE
Stabilizing Vest        0 (FALSE)
Safety knife            1 (TRUE)
Medium Titanium knife   1 (TRUE)
NULL                    NULL

Syntax
IIF( condition, value1 [,value2] )
Unlike conditional functions in some systems, the
FALSE (value2) condition in the IIF function is not
required.
If you omit value2, the function returns the
following when the condition is FALSE:
- 0 if value1 is a Numeric datatype.
- Empty string if value1 is a String datatype.
- NULL if value1 is a Date/Time datatype.
For example, the following expression does not
include a FALSE condition and value1 is a string
datatype so the PowerCenter Integration Service
returns an empty string for each row that evaluates
to FALSE: IIF( SALES > 100, EMP_NAME )
SALES   EMP_NAME      RETURN VALUE
150     John Smith    John Smith
50      Pierre Bleu   '' (empty string)
120     Sally Green   Sally Green
NULL    Greg Jones    '' (empty string)

INDEXOF
Finds the index of a value among a list of values. By
default, the match is case sensitive.
Syntax
INDEXOF( valueToSearch, string1 [, string2, ..., stringN,] CaseFlag )

You can often use a Filter transformation instead of IIF to maximize session performance.
Example
The following expression determines if values from
the ITEM_NAME port match the first, second, or
third string:
INDEXOF( ITEM_NAME, 'diving hood', 'flashlight', 'safety knife' )

ITEM_NAME      RETURN VALUE
Safety Knife   0
diving hood    1
Compass        0
safety knife   3
flashlight     2
Safety Knife returns a value of 0 because it does not match the case of the input value.
When you use IIF, the datatype of the return value is the same as the datatype of the result with the greatest precision.
For example, you have the following expression:
IIF( SALES < 100, 1, .3333 )
The TRUE result (1) is an integer and the FALSE result (.3333) is a decimal.
The Decimal datatype has greater precision than Integer, so the datatype of the return value is always a Decimal.
INITCAP
Capitalizes the first letter in each word of a string
and converts all other letters to lowercase.
Words are delimited by white space (a blank space,
formfeed, newline, carriage return, tab, or vertical
tab) and characters that are not alphanumeric.

For example, if you pass the string THOMAS, the function returns Thomas.

Syntax
INITCAP( string )
Example
The following expression capitalizes all names in the
FIRST_NAME port: INITCAP( FIRST_NAME )
FIRST_NAME   RETURN VALUE
ramona       Ramona
18-albert    18-Albert
NULL         NULL
?!SAM        ?!Sam
THOMAS       Thomas
PierRe       Pierre

INSTR
Returns the position of a character set in a string, counting from left to right.
Syntax
INSTR( string, search_value [,start [,occurrence [,comparison_type ]]] )
Return Value
Integer if the search is successful. Integer represents the position of the first character in the search_value, counting from left to right.
0 if the search is unsuccessful.
NULL if a value passed to the function is NULL.
The following expression returns the position of the second occurrence of the letter a, starting at the beginning of each company name.
Because the search_value argument is case sensitive, it skips the A in Blue Fin Aqua Center, and returns 0:
INSTR( COMPANY, 'a', 1, 2 )
COMPANY                RETURN VALUE
Blue Fin Aqua Center   0
Maco Shark Shop        8
Scuba Gear             9
Frank's Dive Shop      0
VIP Diving Club        0
The following expression returns the position of the first character in the string Blue Fin Aqua Center (starting from the last character in the company name):
INSTR( COMPANY, 'Blue Fin Aqua Center', -1, 1 )
COMPANY                RETURN VALUE
Blue Fin Aqua Center   1
Maco Shark Shop        0
Scuba Gear             0
Frank's Dive Shop      0
VIP Diving Club        0
You can nest the INSTR function within other functions to accomplish more complex tasks.
The following expression evaluates a string, starting from the end of the string.
The expression finds the last (rightmost) space in the string and then returns all characters to the left of it:
SUBSTR( CUST_NAME, 1, INSTR( CUST_NAME, ' ', -1, 1 ))
CUST_NAME         RETURN VALUE
PATRICIA JONES    PATRICIA
MARY ELLEN SHAH   MARY ELLEN
The following expression removes the character '#' from a string:
SUBSTR( CUST_ID, 1, INSTR(CUST_ID, '#')-1 ) || SUBSTR( CUST_ID, INSTR(CUST_ID, '#')+1 )
CUST_ID         RETURN VALUE
ID#33           ID33
#A3577          A3577
SS #712403399   SS 712403399

ISNULL
Returns whether a value is NULL. ISNULL evaluates
an empty string as FALSE.
Note: To test for empty strings, use LENGTH.
Syntax
ISNULL( value )
Example
The following example checks for null values in the
items table:
ISNULL( ITEM_NAME )

ITEM_NAME          RETURN VALUE
Flashlight         0 (FALSE)
NULL               1 (TRUE)
Regulator system   0 (FALSE)
''                 0 (FALSE) Empty string is not NULL

IS_NUMBER
Returns whether a string is a valid number. A valid
number consists of the following parts:

- Optional space before the number
- Optional sign (+/-)
- One or more digits with an optional decimal point
- Optional scientific notation, such as the letter e or
E (and the letter d or D on Windows) followed by
an optional sign (+/-), followed by one or more digits
- Optional white space following the number
The following numbers are all valid:
' 100 '
' +100'
'-100'
'-3.45e+32'
'+3.45E-32'
'+3.45d+32' (Windows only)
'+3.45D-32' (Windows only)
'.6804'
The output port for an IS_NUMBER expression must
be a String or Numeric datatype.
ITEM_PRICE   RETURN VALUE
'123.00'     1 (True)
'-3.45e+3'   1 (True)
'-3.45D-3'   1 (True - Windows only)
'-3.45d-3'   0 (False - UNIX only)
'3.45E-'     0 (False) Incomplete number
''           0 (False) Consists entirely of blanks
''           0 (False) Empty string
'+123abc'    0 (False)
' 123'       1 (True) Leading white blanks
'123 '       1 (True) Trailing white blanks
'ABC'        0 (False)
'-ABC'       0 (False)
NULL         NULL

IS_SPACES
Returns whether a string value consists entirely of
spaces.
A space is a blank space, a formfeed, a newline, a
carriage return, a tab, or a vertical tab.
IS_SPACES evaluates an empty string as FALSE
because there are no spaces. To test for an
empty string, use LENGTH.

Example
The following expression checks the ITEM_NAME port for rows that consist entirely of spaces:
IS_SPACES( ITEM_NAME )
ITEM_NAME              RETURN VALUE
Flashlight             0 (False)
(a string of spaces)   1 (True)
Regulator system       0 (False)
NULL                   NULL
''                     0 (FALSE) (Empty string does not contain spaces.)
LAST
Returns the last row in the selected port.
Optionally, you can apply a filter to limit the rows
the PowerCenter Integration Service reads.
You can nest only one other aggregate function
within LAST
Syntax
LAST( value [, filter_condition ] )
Return Value
Last row in a port.
NULL if all values passed to the function are NULL, or
if no rows are selected (for example, the filter
condition evaluates to FALSE or NULL for all rows).
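For example (a minimal sketch, assuming ITEM_NAME and ITEM_PRICE ports, which are assumptions), the following expression returns the last item name whose price exceeds 10.00:
LAST( ITEM_NAME, ITEM_PRICE > 10.00 )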
LAST_DAY
Returns the date of the last day of the month for
each date in a port.
Syntax
LAST_DAY( date )
Return Value
Date. The last day of the month for that date value
you pass to this function
If a value is NULL, LAST_DAY ignores the row.
However, if all values passed from the port are
NULL, LAST_DAY returns NULL
LAST_DAY groups values based on group by ports
you define in the transformation, returning one
result for each group. If there is no group by port,
LAST_DAY treats all rows as one group, returning
one value
Examples
The following expression returns the last day of the
month for each date in the ORDER_DATE port:
LAST_DAY( ORDER_DATE )

ORDER_DATE               RETURN VALUE
Apr 1 1998 12:00:00AM    Apr 30 1998 12:00:00AM
Jan 6 1998 12:00:00AM    Jan 31 1998 12:00:00AM
Feb 2 1996 12:00:00AM    Feb 29 1996 12:00:00AM (Leap year)
NULL                     NULL
Jul 31 1998 12:00:00AM   Jul 31 1998 12:00:00AM


LEAST
Returns the smallest value from a list of input
values.
By default, the match is case sensitive.


Syntax
LEAST( value1, [value2, ..., valueN,] CaseFlag )
Example
The following expression returns the smallest quantity of items ordered:
LEAST( QUANTITY1, QUANTITY2, QUANTITY3 )
QUANTITY1   QUANTITY2   QUANTITY3   RETURN VALUE
150         756         27          27
NULL                                NULL
5000        97          17          17
120         1724        965         120
LENGTH
Returns the number of characters in a string,
including trailing blanks.
Return Value
Integer representing the length of the string.
NULL if a value passed to the function is NULL
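For example (a minimal sketch, assuming a CUSTOMER_NAME string port, an assumed name), the following expression returns the number of characters in each name, including trailing blanks:
LENGTH( CUSTOMER_NAME )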
LN
Returns the natural logarithm of a numeric value.
For example, LN(3) returns 1.098612.
You usually use this function to analyze scientific
data rather than business data.
This function is the reciprocal of the function EXP.

LOOKUP
Note: This function is not supported in mapplets.
Use the Lookup transformation rather than the
LOOKUP function to look up values in PowerCenter
mappings.
If you use the LOOKUP function in a mapping, you
need to enable the lookup caching option for 3.5
compatibility in the session properties
This option exists expressly for PowerMart 3.5 users
who want to continue using the LOOKUP function,
rather than creating Lookup transformation
Syntax
LOOKUP( result, search1, value1 [, search2, value2]... )

Syntax
LN( numeric_value )
Return Value
Double value.
NULL if a value passed to the function is NULL
LOG
Returns the logarithm of a numeric value.
Most often, you use this function to analyze
scientific data rather than business data
Syntax
LOG( base, exponent )
Return Value
Double value.
NULL if a value passed to the function is NULL
Example
The following expression returns the logarithm for all values in the NUMBERS port:
LOG( BASE, EXPONENT )
BASE    EXPONENT   RETURN VALUE
15      1          0
.09     10         -0.956244644696599
NULL    18         NULL
35.78   NULL       NULL
-9      18         Error. (PowerCenter Integration Service does not write the row.)
0       5          Error. (PowerCenter Integration Service does not write the row.)
10      -2         Error. (PowerCenter Integration Service does not write the row.)
The PowerCenter Integration Service displays an error and does not write the row if you pass a negative number, 0, or 1 as a base value, or if you pass a negative value for the exponent.
Return Value
Result if all searches find matching values. If the
PowerCenter Integration Service finds matching
values, it returns the result from the same row as
the search1 argument.
NULL if the search does not find any matching
values.
Error if the search finds more than one matching
value.

Example
The following expression searches the lookup source
:TD.SALES for a specific item ID and price, and
returns the item name if both searches find a
match:

LOOKUP( :TD.SALES.ITEM_NAME, :TD.SALES.ITEM_ID, 10, :TD.SALES.PRICE, 15.99 )

ITEM_NAME            ITEM_ID   PRICE
Regulator            5         100.00
Flashlight           10        15.99
Halogen Flashlight   15        15.99
NULL                 20        15.99
RETURN VALUE: Flashlight

When you compare char and varchar values, the LOOKUP function returns a result only if the two rows match.
This means that both the value and the length for
each row must match (use trim)
LOWER
Converts uppercase string characters to lowercase
Return Value
Lowercase character string. If the data contains
multibyte characters, the return value depends on
the code page and data movement mode of the
Integration Service.
NULL if a value in the selected port is NULL.
LPAD
Adds a set of blanks or characters to the beginning
of a string to set the string to a specified length.
Syntax
LPAD( first_string, length [,second_string] )
Return Value
String of the specified length.
NULL if a value passed to the function is NULL or if
length is a negative number.
Examples
The following expression standardizes numbers to
six digits by padding them with leading zeros.
LPAD( PART_NUM, 6, '0')
PART_NUM   RETURN VALUE
702        000702
1          000001
0553       000553
484834     484834

LPAD counts the length from left to right.
If the first string is longer than the length, LPAD truncates the string from right to left.
For example, LPAD('alphabetical', 5, 'x') returns the string 'alpha'.

If the second string is longer than the total characters needed to return the specified length, LPAD uses a portion of the second string:
LPAD( ITEM_NAME, 16, '*..*' )
ITEM_NAME          RETURN VALUE
Flashlight         *..**.Flashlight
Compass            *..**..**Compass
Regulator System   Regulator System
Safety Knife       *..*Safety Knife

LTRIM
Removes blanks or characters from the beginning of
a string.
You can use LTRIM with IIF or DECODE in an
Expression or Update Strategy transformation to
avoid spaces in a target table.
If you do not specify a trim_set parameter in the
expression:
- In UNICODE mode, LTRIM removes both single- and
double-byte spaces from the beginning of a string.
- In ASCII mode, LTRIM removes only single-byte
spaces.
If you use LTRIM to remove characters from a string,
LTRIM compares the trim_set to each character in the string argument, character-by-character, starting with the left side of the string.
If the character in the string matches any character
in the trim_set, LTRIM removes it.
LTRIM continues comparing and removing characters
until it fails to find a matching character in the
trim_set.
Then it returns the string, which does not include
matching characters.
Syntax
LTRIM( string [, trim_set] )
Return Value
String. The string values with the specified
characters in the trim_set argument removed.
NULL if a value passed to the function is NULL. If the
trim_set is NULL, the function returns NULL.
Example
The following expression removes the characters S
and . from the strings in the LAST_NAME port:
LTRIM( LAST_NAME, 'S.')

LAST_NAME      RETURN VALUE
Nelson         Nelson
Osborne        Osborne
NULL           NULL
S. MacDonald   MacDonald
Sawyer         awyer
H. Bender      H. Bender
Steadman       teadman

LTRIM removes S. from S. MacDonald and the S from both Sawyer and Steadman, but not the period from H. Bender.
This is because LTRIM searches, character-by-character, for the set of characters you specify in the trim_set argument.
If the first character in the string matches the first
character in the trim_set, LTRIM removes it.
Then LTRIM looks at the second character in the
string. If it matches the second character in the
trim_set, LTRIM removes it, and so on.
When the first character in the string does not
match the corresponding character in the trim_set,
LTRIM returns the string and evaluates the next row.
In the example of H. Bender, H does not match
either character in the trim_set argument, so LTRIM
returns the string in the LAST_NAME port and
moves to the next row.
RTRIM
Removes blanks or characters from the end of a
string.
If you do not specify a trim_set parameter in the
expression:
- In UNICODE mode, RTRIM removes both single- and
double-byte spaces from the end of a string.
- In ASCII mode, RTRIM removes only single-byte
spaces.
If you use RTRIM to remove characters from a string,
RTRIM compares the trim_set to each character in the string argument, character-by-character, starting with the right side of the string.
If the character in the string matches any character
in the trim_set, RTRIM removes it.
RTRIM continues comparing and removing characters until it fails to find a matching character in the trim_set.
It returns the string without the matching
characters.

Syntax
RTRIM( string [, trim_set] )
Example
The following expression removes the characters re from the strings in the LAST_NAME port:
RTRIM( LAST_NAME, 're')
LAST_NAME   RETURN VALUE
Nelson      Nelson
Page        Pag
Osborne     Osborn
NULL        NULL
Sawyer      Sawy
H. Bender   H. Bend
Steadman    Steadman

RTRIM removes e from Page even though r is the first character in the trim_set.
This is because RTRIM searches, character-by-character, for the set of characters you specify in the trim_set argument.
If the last character in the string matches the first
character in the trim_set, RTRIM removes it.
If, however, the last character in the string does not
match, RTRIM compares the second character in
the trim_set.
If the second from last character in the string
matches the second character in the trim_set,
RTRIM removes it, and so on.
When the character in the string fails to match the
trim_set, RTRIM returns the string and evaluates
the next row.
In the last example, the last character in Nelson
does not match any character in the trim_set
argument, so RTRIM returns the string 'Nelson' and
evaluates the next row.
MAKE_DATE_TIME
Returns the date and time based on the input
values.
Syntax
MAKE_DATE_TIME( year, month, day, hour, minute,
second, nanosecond )
Return Value
Date as MM/DD/YYYY HH24:MI:SS. Returns a null
value if you do not pass the function a year, month,
or day.

Example
The following expression creates a date and time from the input ports:
MAKE_DATE_TIME( SALE_YEAR, SALE_MONTH, SALE_DAY, SALE_HOUR, SALE_MIN, SALE_SEC )

SALE_YR   SALE_MTH   SALE_DAY   SALE_HR   SALE_MIN   SALE_SEC   RETURN VALUE
2002      10         27         8         36         22         10/27/2002 08:36:22
2000      6          15         15        17                    06/15/2000 15:17:00
2003      1          3                    22         45         01/03/2003 00:22:45
04        3          30         12        5          10         03/30/0004 12:05:10
99        12         12         5                    16         12/12/0099 05:00:16
MAX/MIN (Dates)
Returns the latest date found within a port or group.
You can apply a filter to limit the rows in the search.
You can nest only one other aggregate function
within MAX.
You can also use MAX to return the largest numeric
value or the highest string value in a port or group
Syntax
MAX( date [, filter_condition] )
Return Value
Date.
NULL if all values passed to the function are NULL, or
if no rows are selected (for example, the filter
condition evaluates to FALSE or NULL for all rows).
MAX/MIN (Numbers)
Returns the maximum numeric value found within a
port or group.
You can apply a filter to limit the rows in the search.
You can nest only one other aggregate function
within MAX.
You can also use MAX to return the latest date or the
highest string value in a port or group.
Syntax
MAX( numeric_value [, filter_condition] )
Return Value
Numeric value
NULL if all values passed to the function are NULL or
if no rows are selected (for example, the filter
condition evaluates to FALSE or NULL for all rows).
Note: If the return value is Decimal with precision
greater than 15, you can enable high precision to
ensure decimal precision up to 28 digits

MAX groups values based on group by ports you define in the transformation, returning one result
for each group.
If there is no group by port, MAX treats all rows as
one group, returning one value.

MAX/MIN (String)
Returns the highest string value found within a port
or group.
You can apply a filter to limit the rows in the search.
You can nest only one other aggregate function
within MAX.
Note: The MAX function uses the same sort order
that the Sorter transformation uses. However, the
MAX function is case sensitive, and the Sorter
transformation may not be case sensitive.
You can also use MAX to return the latest date or the
largest numeric value in a port or group.
Syntax
MAX( string [, filter_condition] )
Return Value
String.
NULL if all values passed to the function are NULL, or
if no rows are selected (for example, the filter
condition evaluates to FALSE or NULL for all rows).
MAX groups values based on group by ports you
define in the transformation, returning one result
for each group.
If there is no group by port, MAX treats all rows as
one group, returning one value
MD5
Calculates the checksum of the input value. The
function uses Message-Digest algorithm 5 (MD5).
MD5 is a one-way cryptographic hash function with a
128-bit hash value.
You can conclude that input values are different
when the checksums of the input values are
different. Use MD5 to verify data integrity.
Syntax
MD5( value )
Return Value
Unique 32-character string of hexadecimal digits 0-9
and a-f.
NULL if the input is a null value.
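For example (a minimal sketch, assuming a CUSTOMER_RECORD string port, an assumed name), you might store a checksum alongside the data so later loads can detect changes:
MD5( CUSTOMER_RECORD )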

If a value is NULL, MAX ignores it. However, if all values passed from the port are NULL, MAX returns NULL.
MEDIAN
Returns the median of all values in a selected port.
If there is an even number of values in the port, the
median is the average of the middle two values
when all values are placed ordinally on a number
line. If there is an odd number of values in the port,
the median is the middle number.
You can nest only one other aggregate function
within MEDIAN, and the nested function must return
a Numeric datatype.
The PowerCenter Integration Service reads all rows
of data to perform the median calculation.
The process of reading rows of data to perform the
calculation may affect performance.
Optionally, you can apply a filter to limit the rows
you read to calculate the median.

MOD
Returns the remainder of a division calculation. For
example, MOD(8,5) returns 3.
Syntax
MOD( numeric_value, divisor )
Return Value
Numeric value of the datatype you pass to the
function. The remainder of the numeric value
divided by the divisor.
NULL if a value passed to the function is NULL.
Examples
The following expression returns the modulus of the
values in the PRICE port divided by the values in
the QTY port:
MOD( PRICE, QTY )

PRICE   QTY    RETURN VALUE
10.00   2      0
12.00   5      2
9.00    2      1
15.00   3      0
NULL    3      NULL
20.00   NULL   NULL
25.00   0      Error. Integration Service does not write row.

Syntax
MEDIAN( numeric_value [, filter_condition ] )
Return Value
Numeric value.
NULL if all values passed to the function are NULL, or
if no rows are selected. For example, the filter
condition evaluates to FALSE or NULL for all rows.
Note: If the return value is Decimal with precision
greater than 15, you can enable high precision to
ensure decimal precision up to 28 digits.
MEDIAN groups values based on group by ports you
define in the transformation, returning one result
for each group.
If there is no group by port, MEDIAN treats all rows
as one group, returning one value
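For example (a minimal sketch, assuming a SALES numeric port in an Aggregator, an assumed name), the following expression returns the median of sales greater than 100:
MEDIAN( SALES, SALES > 100 )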

The last row (25, 0) produced an error because you cannot divide by 0.
To avoid dividing by 0, you can create an expression
similar to the following, which returns the modulus
of Price divided by Quantity only if the quantity is
not 0.
If the quantity is 0, the function returns NULL:
MOD( PRICE, IIF( QTY = 0, NULL, QTY ))

PRICE   QTY    RETURN VALUE
10.00   2      0
12.00   5      2
9.00    2      1
15.00   3      0
NULL    3      NULL
20.00   NULL   NULL
25.00   0      NULL
The last row (25, 0) produced a NULL rather than an error because the IIF function replaces NULL with the 0 in the QTY port.

METAPHONE
Encodes string values. You can specify the length of
the string that you want to encode.
METAPHONE encodes characters of the English
language alphabet (A-Z).
It encodes both uppercase and lowercase letters in
uppercase.
METAPHONE encodes characters according to the
following list of rules:
- Skips vowels (A, E, I, O, and U) unless one of them
is the first character of the input string.
METAPHONE(CAR) returns KR and METAPHONE(AAR) returns AR.
- Uses special encoding guidelines

MOVINGAVG
Returns the average (row-by-row) of a specified set
of rows.

Optionally, you can apply a condition to filter rows before calculating the moving average.
Syntax
MOVINGAVG( numeric_value, rowset [, filter_condition] )
Return Value
Numeric value.
MOVINGAVG ignores null values when calculating the moving average. However, if all values are NULL, the function returns NULL.
Example
The following expression returns the average order for a Stabilizing Vest, based on the first five rows in the Sales port, and thereafter, returns the average for the last five rows read:
MOVINGAVG( SALES, 5 )
ROW_NO   SALES   RETURN VALUE
1        600     NULL
2        504     NULL
3        36      NULL
4        100     NULL
5        550     358
6        39      245.8
7        490     243

The function returns the average for a set of five rows: 358 based on rows 1 through 5, 245.8 based on rows 2 through 6, and 243 based on rows 3 through 7.
MOVINGSUM
Returns the sum (row-by-row) of a specified set of rows.
Optionally, you can apply a condition to filter rows before calculating the moving sum.
Syntax
MOVINGSUM( numeric_value, rowset [, filter_condition] )
Return Value
Numeric value.
MOVINGSUM ignores null values when calculating the moving sum. However, if all values are NULL, the function returns NULL.
Example
The following expression returns the sum of orders for a Stabilizing Vest, based on the first five rows in the Sales port, and thereafter, returns the sum for the last five rows read:
MOVINGSUM( SALES, 5 )
ROW_NO   SALES   RETURN VALUE
1        600     NULL
2        504     NULL
3        36      NULL
4        100     NULL
5        550     1790
6        39      1229
7        490     1215
The function returns the sum for a set of five rows: 1790 based on rows 1 through 5, 1229 based on rows 2 through 6, and 1215 based on rows 3 through 7.
NPER
Returns the number of periods for an investment based on a constant interest rate and periodic, constant payments.
Syntax
NPER( rate, present value, payment [, future value, type] )
Return Value
Numeric.
Example
The present value of an investment is $2,000. Each payment is $500 and the future value of the investment is $20,000.
The following expression returns 9 as the number of periods for which you need to make the payments:
NPER( 0.01, -2000, -500, 20000, TRUE )
PERCENTILE
Calculates the value that falls at a given percentile
in a group of numbers.
You can nest only one other aggregate function
within PERCENTILE, and the nested function must
return a Numeric datatype.
The PowerCenter Integration Service reads all rows
of data to perform the percentile calculation.
The process of reading rows to perform the
calculation may affect performance.
Optionally, you can apply a filter to limit the rows
you read to calculate the percentile

Syntax
PERCENTILE( numeric_value, percentile [, filter_condition ] )
Return Value
Numeric value.
If a value is NULL, PERCENTILE ignores the row.
However, if all values in a group are NULL,
PERCENTILE returns NULL.
PERCENTILE groups values based on group by ports
you define in the transformation, returning one
result for each group.
If there is no group by port, PERCENTILE treats all
rows as one group, returning one value
Example
The PowerCenter Integration Service calculates a
percentile using the following logic:
Use the following guidelines for this equation:
- x is the number of elements in the group of values
for which you are calculating a percentile.
- If i < 1, PERCENTILE returns the value of the first
element in the list.
- If i is an integer value, PERCENTILE returns the
value of the ith element in the list
- Otherwise PERCENTILE returns the value of n:

POWER
Returns a value raised to the exponent you pass to
the function.
Syntax
POWER( base, exponent )
Return Value
Double value.
Example
The following expression returns the values in the
Numbers port raised to the values in the Exponent
port:
POWER( NUMBERS, EXPONENT )

NUMBERS   EXPONENT   RETURN VALUE
10.0      2.0        100
3.5       6.0        1838.265625
3.5       5.5        982.594307804838
NULL      2.0        NULL
10.0      NULL       NULL
-3.0      -6.0       0.00137174211248285
3.0       -6.0       0.00137174211248285
-3.0      6.0        729.0
-3.0      5.5        729.0

The following expression returns the salary that falls at the 75th percentile of salaries greater than $50,000:
PERCENTILE( SALARY, 75, SALARY > 50000 )

SALARY
125000.0
27900.0
100000.0
NULL
55000.0
9000.0
85000.0
86000.0
48000.0
99000.0
RETURN VALUE: 106250.0

The value -3.0 raised to 6 returns the same results as -3.0 raised to 5.5.
If the base is negative, the exponent must be an
integer.
Otherwise, the PowerCenter Integration Service
rounds the exponent to the nearest integer value.
PMT
Returns the payment for a loan based on constant payments and a constant interest rate.
Syntax
PMT( rate, terms, present value[, future value, type] )
Return Value
Numeric.
Example
The following expression returns -2111.64 as the monthly payment amount of a loan:
PMT( 0.01, 10, 20000 )
PV
Returns the present value of an investment.
Syntax
PV( rate, terms, payment [, future value, type] )
Return Value
Numeric.
Example
The following expression returns 12,524.43 as the amount you must deposit in the account today to have a future value of $20,000 in one year if you also deposit $500 at the beginning of each period:
PV( 0.0075, 12, -500, 20000, TRUE )

RAND
Returns a random number between 0 and 1. This is
useful for probability scenarios.
Syntax
RAND( seed )
Return Value
Numeric.
For the same seed, the PowerCenter Integration
Service generates the same sequence of numbers

Example
The following expression may return a value of 0.417022004702574:
RAND (1)
RATE
Returns the interest rate earned per period by a security.
Syntax
RATE( terms, payment, present value[, future value, type] )
Return Value
Numeric.
Example
The following expression returns 0.0077 as the monthly interest rate of a loan:
RATE( 48, -500, 20000 )
To calculate the annual interest rate of the loan, multiply 0.0077 by 12. The annual interest rate is 0.0924 or 9.24%.
REG_EXTRACT
Extracts subpatterns of a regular expression within an input value.
For example, from a regular expression pattern for a full name, you can extract the first name or last name.
Note: Use the REG_REPLACE function to replace a character pattern in a string with another character pattern.
Syntax
REG_EXTRACT( subject, 'pattern', subPatternNum )
Return Value
Returns the value of the nth subpattern that is part of the input value. The nth subpattern is based on the value you specify for subPatternNum.
NULL if the input is a null value or if the pattern is null.
Example
You might use REG_EXTRACT in an expression to extract middle names from a regular expression that matches first name, middle name, and last name.
For example, the following expression returns the middle name of a regular expression:
REG_EXTRACT( Employee_Name, '(\w+)\s+(\w+)\s+(\w+)', 2 )
Employee_Name          Return Value
Stephen Graham Smith   Graham
Juan Carlos Fernando   Carlos
REG_MATCH
Returns whether a value matches a regular expression pattern.
This lets you validate data patterns, such as IDs, telephone numbers, postal codes, and state names.
Note: Use the REG_REPLACE function to replace a character pattern in a string with a new character pattern.
Syntax
REG_MATCH( subject, pattern )
Return Value
TRUE if the data matches the pattern.
FALSE if the data does not match the pattern.
NULL if the input is a null value or if the pattern is NULL.
Example
You might use REG_MATCH in an expression to validate telephone numbers.
For example, the following expression matches a 10-digit telephone number against the pattern and returns a Boolean value based on the match:
REG_MATCH( Phone_Number, '(\d\d\d-\d\d\d-\d\d\d\d)' )
Phone_Number    Return Value
408-555-1212    TRUE
NULL            NULL
510-555-1212    TRUE
92 555 51212    FALSE
650-555-1212    TRUE
415-555-1212    TRUE
831 555 12123   FALSE

You can also use REG_MATCH for the following tasks:
- To verify that a value matches a pattern. This use is
similar to the SQL LIKE function.
- To verify that values are characters. This use is
similar to the SQL IS_CHAR function.
To verify that a value matches a pattern, use a
period (.) and an asterisk (*) with the
REG_MATCH function in an expression.
A period matches any one character. An
asterisk matches 0 or more instances of
values that follow it.
For example, use the following expression to find
account numbers that begin with 1835:
REG_MATCH( ACCOUNT_NUMBER, '1835.*' )

To verify that values are characters, use a REG_MATCH function with the regular expression [a-zA-Z]+. a-z matches all lowercase characters.
A-Z matches all uppercase characters.
The plus sign (+) indicates that there should be at least one character.
For example, use the following expression to verify that a list of last names contains only characters:
REG_MATCH( LAST_NAME, '[a-zA-Z]+' )

REPLACECHR
Replaces characters in a string with a single
character or no character.
REPLACECHR searches the input string for the
characters you specify and replaces all occurrences
of all characters with the new character you specify.
Syntax
REPLACECHR( CaseFlag, InputString, OldCharSet,
NewChar )

Return Value
String.
Empty string if REPLACECHR removes all characters
in InputString.
NULL if InputString is NULL.
InputString if OldCharSet is NULL or empty
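For example (a minimal sketch, assuming a CUSTOMER_CODE string port, an assumed name), the following expression replaces every uppercase A in the code with M, treating the search as case sensitive:
REPLACECHR( 1, CUSTOMER_CODE, 'A', 'M' )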


REG_REPLACE
Replaces characters in a string with another
character pattern.
By default, REG_REPLACE searches the input string
for the character pattern you specify and replaces
all occurrences with the replacement pattern.
You can also indicate the number of occurrences of
the pattern you want to replace in the string.
Syntax
REG_REPLACE( subject, pattern, replace, numReplacements )
Return Value
String
Example
The following expression removes additional spaces
from the Employee name data for each row of the
Employee_name port:
REG_REPLACE( Employee_Name, '\s+', ' ' )
Employee_Name     RETURN VALUE
Adam  Smith       Adam Smith
Greg   Sanders    Greg Sanders
Sarah  Fe         Sarah Fe
Sam    Cooper     Sam Cooper
REPLACESTR
Replaces characters in a string with a single
character, multiple characters, or no character.
REPLACESTR searches the input string for all strings
you specify and replaces them with the new string
you specify.
Syntax
REPLACESTR ( CaseFlag, InputString, OldString1,
[OldString2, ... OldStringN,] NewString )
Return Value
String.

Empty string if REPLACESTR removes all characters in InputString.
NULL if InputString is NULL.
InputString if all OldString arguments are NULL or
empty.
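For instance (a minimal sketch, assuming a COMPANY_NAME string port and replacement strings that are not from these notes), the following expression replaces 'Corp.' and 'Inc.' with 'Company', ignoring case:
REPLACESTR( 0, COMPANY_NAME, 'Corp.', 'Inc.', 'Company' )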

REVERSE
Reverses the input string.
Syntax
REVERSE( string )
<Print 123-127>
SETCOUNTVARIABLE
Counts the rows evaluated by the function and increments the current value of a mapping variable based on the count.
Increases the current value by one for each row marked for insertion.
Decreases the current value by one for each row marked for deletion.
Keeps the current value the same for each row marked for update or reject.
Returns the new current value.
At the end of a successful session, the PowerCenter Integration Service saves the last current value to the repository.
When used with a session that contains multiple partitions, the PowerCenter Integration Service generates different current values for each partition.
At the end of the session, it determines the total count for all partitions and saves the total to the repository.
Unless overridden, it uses the saved value as the initial value of the variable for the next time you use this session.
The PowerCenter Integration Service does not save the final value of a mapping variable to the repository when any of the following are true:
- The session fails to complete.
- The session is configured for a test load.
- The session is a debug session.
- The session runs in debug mode and is configured to discard session output.
Syntax
SETCOUNTVARIABLE( $$Variable )
SET_DATE_PART
Sets one part of a Date/Time value to a value you specify. With SET_DATE_PART, you can change the following parts of a date:
Year - Change the year by entering a positive integer in the value argument. Use any of the year format strings: Y, YY, YYY, or YYYY to set the year. For example, the following expression changes the year to 2001 for all dates in the SHIP_DATE port:
SET_DATE_PART( SHIP_DATE, 'YY', 2001 )
Month - Change the month by entering a positive integer between 1 and 12 (January=1 and December=12) in the value argument. Use any of the month format strings: MM, MON, MONTH to set the month. For example, the following expression changes the month to October for all dates in the SHIP_DATE port:
SET_DATE_PART( SHIP_DATE, 'MONTH', 10 )
Day - Change the day by entering a positive integer between 1 and 31 (except for the months that have less than 31 days: February, April, June, September, and November) in the value argument. Use any of the day format strings (D, DD, DDD, DY, and DAY) to set the day. For example, the following expression changes the day to 10 for all dates in the SHIP_DATE port:
SET_DATE_PART( SHIP_DATE, 'DD', 10 )
Syntax
SET_DATE_PART( date, format, value )
Return Value
Date in the same format as the source date with the specified part changed.
NULL if a value passed to the function is NULL.
Examples
The following expressions change the hour to 4PM
for each date in the DATE_PROMISED port:
SET_DATE_PART( DATE_PROMISED, 'HH', 16 )
SET_DATE_PART( DATE_PROMISED, 'HH12', 16 )
SET_DATE_PART( DATE_PROMISED, 'HH24', 16 )
DATE_PROMISED           RETURN VALUE
Jan 1 1997 12:15:56AM   Jan 1 1997 4:15:56PM
Feb 13 1997 2:30:01AM   Feb 13 1997 4:30:01PM
Mar 31 1997 5:10:15PM   Mar 31 1997 4:10:15PM
Dec 12 1997 8:07:33AM   Dec 12 1997 4:07:33PM
NULL                    NULL
SETMAXVARIABLE/SETMINVARIABLE
Sets the current value of a mapping variable to the
higher of two values: the current value of the
variable or the value you specify.
Returns the new current value.
The function executes only if a row is marked as
insert.
SETMAXVARIABLE ignores all other row types and
the current value remains unchanged.
At the end of a successful session, the PowerCenter
Integration Service saves the final current value to
the repository.
When used with a session that contains multiple
partitions, the PowerCenter Integration Service
generates different current values for each
partition.
At the end of the session, it saves the highest
current value across all partitions to the repository.
Unless overridden, it uses the saved value as the
initial value of the variable for the next session run.
When used with a string mapping variable,
SETMAXVARIABLE returns the higher string based
on the sort order selected for the session.
Examples
The following expression compares the number of
items purchased in each transaction with a
mapping variable $$MaxItems.
It sets $$MaxItems to the higher of two values and
returns the historically highest number of items
purchased in a single transaction to the MAX_ITEMS
port.
The initial value of $$MaxItems from the previous
session run is 22.
SETMAXVARIABLE ($$MAXITEMS, ITEMS)
TRANSACTION   ITEMS   MAX_ITEMS
0100002       12      22
0100003       5       22
0100004       18      22
0100005       35      35
0100006       5       35
0100007       14      35
At the end of the session, the PowerCenter Integration Service saves 35 to the repository as the maximum current value for $$MaxItems.
The next time the session runs, the PowerCenter Integration Service evaluates the initial value of $$MaxItems as 35.
If the same session contains three partitions, the PowerCenter Integration Service evaluates $$MaxItems for each partition.
Then, it saves the largest value to the repository. For example, the last evaluated value for $$MaxItems in each partition is as follows:
Partition     Final Current Value for $$MaxItems
Partition 1   35
Partition 2   23
Partition 3   22
SETVARIABLE
Sets the current value of a mapping variable to a
value you specify.
Returns the specified value.
The SETVARIABLE function executes only if a row is
marked as insert or update. SETVARIABLE ignores
all other row types and the current value remains
unchanged.
At the end of a successful session, the PowerCenter
Integration Service compares the final current value
of the variable to the start value of the variable.
Based on the aggregate type of the variable, it
saves a final current value to the repository.
Unless overridden, it uses the saved value as the
initial value of the variable for the next session run.
Return Value
Current value of the variable.
When value is NULL, the PowerCenter Integration
Service returns the current value of $$Variable.
Examples
The following expression sets a mapping variable $$Time to the system date at the time the PowerCenter Integration Service evaluates the row and returns the system date to the SET_$$TIME port:
SETVARIABLE ($$Time, SYSDATE)


TRANSACTION   TOTAL    SET_$$TIME
0100002       534.23   10/10/2000 01:34:33
0100003       699.01   10/10/2000 01:34:34
0100004       97.50    10/10/2000 01:34:35
0100005       116.43   10/10/2000 01:34:36
0100006       323.95   10/10/2000 01:34:37
At the end of the session, the PowerCenter
Integration Service saves 10/10/2000 01:34:37 to
the repository as the last evaluated current value
for $$Time. The next time the session runs, the
PowerCenter Integration Service evaluates all
references to $$Time to 10/10/2000 01:34:37
SIGN
Returns whether a numeric value is positive, negative, or 0.
Syntax
SIGN( numeric_value )
Return Value
-1 for negative values.
0 for 0.
1 for positive values.
NULL if NULL.
SOUNDEX
Encodes a string value into a four-character string.
SOUNDEX works for characters in the English alphabet (A-Z).
It uses the first character of the input string as the first character in the return value and encodes the remaining three unique consonants as numbers.
SOUNDEX encodes characters according to the following rules:
- Uses the first character in string as the first character in the return value and encodes it in uppercase. For example, both SOUNDEX(John) and SOUNDEX(john) return J500.
- Encodes the first three unique consonants following the first character in string and ignores the rest. For example, both SOUNDEX(JohnRB) and SOUNDEX(JohnRBCD) return J561.
- Assigns a single code to consonants that sound alike. The following guidelines apply to consonant encoding:
  Code 1 - B, F, P, V
  Code 2 - C, G, J, K, Q, S, X, Z
  Code 3 - D, T
  Code 4 - L
  Code 5 - M, N
  Code 6 - R
- Skips the characters A, E, I, O, U, H, and W unless one of them is the first character in string. For example, SOUNDEX(A123) returns A000 and SOUNDEX(MAeiouhwC) returns M000.
- If string produces fewer than four characters, SOUNDEX pads the resulting string with zeroes. For example, SOUNDEX(J) returns J000.
- If string contains a set of consecutive consonants that use the same code, SOUNDEX encodes the first occurrence and skips the remaining occurrences in the set. For example, SOUNDEX(AbbpdMN) returns A135.
- Skips numbers in string. For example, both SOUNDEX(Joh12n) and SOUNDEX(1John) return J500.
- Returns NULL if string is NULL or if none of the characters in string is a letter of the English alphabet.
Syntax
SOUNDEX( string )
Return Value
String.
NULL if one of the following conditions is true:
- The value passed to the function is NULL.
- No character in string is a letter of the English alphabet.
- string is empty.
Example
The following expression encodes the values in the EMPLOYEE_NAME port:
SOUNDEX( EMPLOYEE_NAME )
EMPLOYEE_NAME   RETURN VALUE
John            J500
William         W450
jane            J500
joh12n          J500
1abc            A120
NULL            NULL

STDDEV
Returns the standard deviation of the numeric values you pass to this function.
STDDEV is used to analyze statistical data.
You can nest only one other aggregate function within STDDEV, and the nested function must return a Numeric datatype.
Syntax
STDDEV( numeric_value [,filter_condition] )
Return Value
Numeric value.
NULL if all values passed to the function are NULL or if no rows are selected (for example, the filter condition evaluates to FALSE or NULL for all rows).
STDDEV groups values based on group by ports you define in the transformation, returning one result for each group.
If there is no group by port, STDDEV treats all rows as one group, returning one value.
SUBSTR
Returns a portion of a string.
SUBSTR counts all characters, including blanks, starting at the beginning of the string.
Syntax
SUBSTR( string, start [,length] )
Return Value
String.
Empty string if you pass a negative or 0 length value.
NULL if a value passed to the function is NULL.
Examples
The following expression returns the area code for each row in the PHONE port:
SUBSTR( PHONE, 0, 3 )
PHONE          RETURN VALUE
809-555-0269   809
357-687-6708   357
NULL           NULL
You can also pass a negative start value to return the phone number for each row in the PHONE port.
The expression still reads the source string from left to right when returning the result of the length argument:
SUBSTR( PHONE, -8, 3 )
PHONE          RETURN VALUE
808-555-0269   555
809-555-3915   555
357-687-6708   687
NULL           NULL
You can nest INSTR in the start or length argument to search for a specific string and return its position.
The following expression evaluates a string, starting from the end of the string.
The expression finds the last (rightmost) space in the string and then returns all characters preceding it:
SUBSTR( CUST_NAME, 1, INSTR( CUST_NAME, ' ', -1, 1 ) - 1 )
CUST_NAME         RETURN VALUE
PATRICIA JONES    PATRICIA
MARY ELLEN SHAH   MARY ELLEN
The following expression removes the character '#' from a string:
SUBSTR( CUST_ID, 1, INSTR(CUST_ID, '#') - 1 ) || SUBSTR( CUST_ID, INSTR(CUST_ID, '#') + 1 )
When the length argument is longer than the string, SUBSTR returns all the characters from the start position to the end of the string.
Consider the following example:
SUBSTR('abcd', 2, 8)
The return value is bcd. Compare this result to the following example:
SUBSTR('abcd', -2, 8)
The return value is cd.

SUM
Returns the sum of all values in the selected port.
Optionally, you can apply a filter to limit the rows
you read to calculate the total.
You can nest only one other aggregate function
within SUM, and the nested function must return a
Numeric datatype
Syntax
SUM( numeric_value [, filter_condition ] )
Return Value
Numeric value.

NULL if all values passed to the function are NULL or if no rows are selected (for example, the filter condition evaluates to FALSE or NULL for all rows).
You can perform arithmetic on the values passed to SUM before the function calculates the total. For example:
SUM( QTY * PRICE - DISCOUNT )

SYSTIMESTAMP
Returns the current date and time of the node
hosting the PowerCenter Integration Service with
precision to the nanosecond.
The precision to which you display the date and time
depends on the platform.
The return value of the function varies depending on
how you configure the argument:
- When you configure the argument of SYSTIMESTAMP as a variable, the PowerCenter Integration Service evaluates the function for each row in the transformation.
- When you configure the argument of SYSTIMESTAMP as a constant, the PowerCenter Integration Service evaluates the function once and retains the value for each row in the transformation.
Syntax
SYSTIMESTAMP( [format] )
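The following is a brief illustration (the precision codes SS, MS, US, and NS follow the function reference; the sample timestamps shown are hypothetical):
SYSTIMESTAMP('SS') might return 10/10/2000 01:34:37
SYSTIMESTAMP('NS') might return 10/10/2000 01:34:37.486743345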
TO_BIGINT
Converts a string or numeric value to a bigint value.
TO_BIGINT syntax contains an optional flag that lets you choose whether to round the number to the nearest integer or truncate the decimal portion.
TO_BIGINT ignores leading blanks.
Syntax
TO_BIGINT( value [, flag] )
Return Value
Bigint.
NULL if the value passed to the function is NULL.
0 if the value passed to the function contains alphanumeric characters.
Examples
The following expressions use values from the port IN_TAX:
TO_BIGINT( IN_TAX, TRUE )
IN_TAX                      RETURN VALUE
'7245176201123435.6789'     7245176201123435
'7245176201123435.2'        7245176201123435
'7245176201123435.2.48'     7245176201123435
NULL                        NULL
'A12.3Grove'                0
' 176201123435.87'          176201123435
'-7245176201123435.2'       -7245176201123435
'-7245176201123435.23'      -7245176201123435
-9223372036854775806.9      -9223372036854775806
9223372036854775806.9       9223372036854775806
TO_CHAR (Dates)
Converts dates to character strings.
TO_CHAR also converts numeric values to strings.
You can convert the date into any format using the TO_CHAR format strings.
Syntax
TO_CHAR( date [,format] )
Return Value
String.
NULL if a value passed to the function is NULL.
Examples
The following expression converts the dates in the DATE_PROMISED port to text in the format MON DD YYYY:
TO_CHAR( DATE_PROMISED, 'MON DD YYYY' )
DATE_PROMISED            RETURN VALUE
Apr 1 1998 12:00:10AM    'Apr 01 1998'
Feb 22 1998 01:31:10PM   'Feb 22 1998'
Oct 24 1998 02:12:30PM   'Oct 24 1998'
NULL                     NULL
If you omit the format argument, TO_CHAR returns a string in the date format specified in the session, by default MM/DD/YYYY HH24:MI:SS.US:
TO_CHAR( DATE_PROMISED )
DATE_PROMISED            RETURN VALUE
Apr 1 1998 12:00:10AM    '04/01/1998 00:00:10.000000'
Feb 22 1998 01:31:10PM   '02/22/1998 13:31:10.000000'
Oct 24 1998 02:12:30PM   '10/24/1998 14:12:30.000000'
NULL                     NULL

TO_CHAR (Numbers)
Converts numeric values to text strings. TO_CHAR
also converts dates to strings.
TO_CHAR converts numeric values to text strings as
follows:

- Converts double values to strings of up to 16 digits and provides accuracy up to 15 digits. If you pass a number with more than 15 digits, TO_CHAR rounds the number to the sixteenth digit.
- Returns decimal notation for numbers in the ranges (-1e16,-1e-16] and [1e-16, 1e16). TO_CHAR returns scientific notation for numbers outside these ranges.
Note: The PowerCenter Integration Service converts the values 1e16 and -1e16 to scientific notation, but returns the values 1e-16 and -1e-16 in decimal notation.
Syntax
TO_CHAR( numeric_value )
Return Value
String.
NULL if a value passed to the function is NULL.
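A short illustrative example (the SALES port and its values are hypothetical):
TO_CHAR( SALES )
SALES        RETURN VALUE
1010.99      '1010.99'
-15.62567    '-15.62567'
NULL         NULL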
TO_DATE
Converts a character string to a Date/Time datatype.
You use the TO_DATE format strings to specify the
format of the source strings.
The output port must be Date/Time for TO_DATE
expressions.
If you are converting two-digit years with TO_DATE,
use either the RR or YY format string. Do not use
the YYYY format string.
Syntax
TO_DATE( string [, format] )
Return Value
Date.
TO_DATE always returns a date and time. If you pass a string that does not have a time value, the date returned always includes the time 00:00:00.000000000.
You can map the results of this function to any
target column with a datetime datatype.
If the target column precision is less than
nanoseconds, the PowerCenter Integration Service
truncates the datetime value to match the
precision of the target column when it writes
datetime values to the target.
NULL if you pass a null value to this function.
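A brief sketch (the DATE_STR port and its values are hypothetical; the format string must match the format of the source string):
TO_DATE( DATE_STR, 'MM/DD/YYYY' )
DATE_STR       RETURN VALUE
'01/22/1998'   Jan 22 1998 00:00:00.000000000
'05/17/1998'   May 17 1998 00:00:00.000000000
NULL           NULL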

TO_DECIMAL
Converts a string or numeric value to a decimal value.
TO_DECIMAL ignores leading blanks.
Syntax
TO_DECIMAL( value [, scale] )
Return Value
Decimal of precision and scale between 0 and 28, inclusive.
If the string contains a non-numeric character, TO_DECIMAL converts the numeric portion of the string up to the first non-numeric character.
If the first character in the string is non-numeric, TO_DECIMAL returns 0.
NULL if a value passed to the function is NULL.
Example
This expression uses values from the port IN_TAX. The datatype is decimal with precision of 10 and scale of 3:
TO_DECIMAL( IN_TAX, 3 )
IN_TAX         RETURN VALUE
'15.6789'      15.679
'60.2'         60.200
'118.348'      118.348
NULL           NULL
'A12.3Grove'   0
'711A1'        711
TO_FLOAT
Converts a string or numeric value to a double-precision floating point number (the Double datatype).
TO_FLOAT ignores leading blanks.
Syntax
TO_FLOAT( value )
Return Value
Double value.
0 if the value in the port is blank or a non-numeric character.
NULL if a value passed to this function is NULL.
Example
This expression uses values from the port IN_TAX:
TO_FLOAT( IN_TAX )
IN_TAX         RETURN VALUE
'15.6789'      15.6789
'60.2'         60.2
'118.348'      118.348
NULL           NULL
'A12.3Grove'   0
TO_INTEGER
Converts a string or numeric value to an integer.
TO_INTEGER syntax contains an optional flag that lets you choose whether to round the number to the nearest integer or truncate the decimal portion.
TO_INTEGER ignores leading blanks.
Syntax
TO_INTEGER( value [, flag] )
Return Value
Integer.
NULL if the value passed to the function is NULL.
0 if the value passed to the function contains alphanumeric characters.
Examples
The following expressions use values from the port IN_TAX. The PowerCenter Integration Service displays an error when the conversion causes a numeric overflow:
TO_INTEGER( IN_TAX, TRUE )
IN_TAX           RETURN VALUE
'15.6789'        15
'60.2'           60
'118.348'        118
5,000,000,000    Error. Integration Service doesn't write row.
NULL             NULL
'A12.3Grove'     0
' 123.87'        123
'-15.6789'       -15
TRUNC (Dates)
Truncates dates to a specific year, month, day, hour, minute, second, millisecond, or microsecond.
You can also use TRUNC to truncate numbers.
You can truncate the following date parts:
Year - If you truncate the year portion of the date, the function returns Jan 1 of the input year with the time set to 00:00:00.000000000. For example, the following expression returns 1/1/1997 00:00:00.000000000:
TRUNC(12/1/1997 3:10:15, 'YY')
Month - If you truncate the month portion of a date, the function returns the first day of the month with the time set to 00:00:00.000000000. For example, the following expression returns 4/1/1997 00:00:00.000000000:
TRUNC(4/15/1997 12:15:00, 'MM')
Day - If you truncate the day portion of a date, the function returns the date with the time set to 00:00:00.000000000. For example, the following expression returns 6/13/1997 00:00:00.000000000:
TRUNC(6/13/1997 2:30:45, 'DD')
Syntax
TRUNC( date [,format] )
Return Value
Date.
NULL if a value passed to the function is NULL.
Examples
The following expressions truncate the year portion of dates in the DATE_SHIPPED port:
TRUNC( DATE_SHIPPED, 'Y' )
TRUNC( DATE_SHIPPED, 'YY' )
TRUNC( DATE_SHIPPED, 'YYY' )
TRUNC( DATE_SHIPPED, 'YYYY' )
DATE_SHIPPED            RETURN VALUE
Jan 15 1998 2:10:30AM   Jan 1 1998 12:00:00.000000000AM
Apr 19 1998 1:31:20PM   Jan 1 1998 12:00:00.000000000AM
Jun 20 1998 3:50:04AM   Jan 1 1998 12:00:00.000000000AM
Dec 20 1998 3:29:55PM   Jan 1 1998 12:00:00.000000000AM
NULL                    NULL

TRUNC (Numbers)
Truncates numbers to a specific digit. You can also
use TRUNC to truncate dates.
Syntax
TRUNC( numeric_value [, precision] )
If precision is a positive integer, TRUNC returns
numeric_value with the number of decimal places
specified by precision.
If precision is a negative integer, TRUNC changes
the specified digits to the left of the decimal point
to zeros.
If you omit the precision argument, TRUNC truncates
the decimal portion of numeric_value and returns
an integer.
If you pass a decimal precision value, the PowerCenter Integration Service rounds numeric_value to the nearest integer before evaluating the expression.
Return Value
Numeric value.
NULL if one of the arguments is NULL


Examples
The following expressions truncate the values in the
Price port:
TRUNC( PRICE, 3 )
PRICE     RETURN VALUE
12.9995   12.999
-18.8652  -18.865
56.9563   56.956
15.9928   15.992
NULL      NULL

TRUNC( PRICE, -1 )
PRICE     RETURN VALUE
12.99     10.0
-187.86   -180.0
56.95     50.0
1235.99   1230.0
UPPER
Converts lowercase string characters to uppercase.
Syntax
UPPER( string )
VARIANCE
Returns the variance of the numeric values you pass to this function.
VARIANCE is used to analyze statistical data.
You can nest only one other aggregate function within VARIANCE, and the nested function must return a Numeric datatype.
Syntax
VARIANCE( numeric_value [, filter_condition ] )
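Brief illustrations (the ITEM_NAME and TOTAL_SALES ports are hypothetical):
UPPER( ITEM_NAME ) returns 'FLASHLIGHT' for the value 'Flashlight'.
VARIANCE( TOTAL_SALES ) returns one variance per group when group by ports are defined, or a single value for all rows when they are not.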

8. PERFORMANCE TUNING
Complete the following tasks to improve session performance:
1. Optimize the target - Enables the Integration Service to write to the targets efficiently.
2. Optimize the source - Enables the Integration Service to read source data efficiently.
3. Optimize the mapping - Enables the Integration Service to transform and move data efficiently.
4. Optimize the transformation - Enables the Integration Service to process transformations in a mapping efficiently.
5. Optimize the session - Enables the Integration Service to run the session more quickly.
6. Optimize the grid deployments - Enables the Integration Service to run on a grid with optimal performance.
7. Optimize the PowerCenter components - Enables the Integration Service and Repository Service to function optimally.
8. Optimize the system - Enables PowerCenter service processes to run more quickly.

Bottlenecks
Look for performance bottlenecks in the following order:
1. Target
2. Source
3. Mapping
4. Session
5. System
Use the following methods to identify performance
bottlenecks:
Run test sessions - You can configure a test
session to read from a flat file source or to write to a
flat file target to identify source and target
bottlenecks.

Analyze performance details - Analyze performance details, such as performance counters, to determine where session performance decreases.
Analyze thread statistics - Analyze thread
statistics to determine the optimal number of
partition points.
Monitor system performance - You can use
system monitoring tools to view the percentage of
CPU use, I/O waits, and paging to identify system
bottlenecks. You can also use the Workflow Monitor
to view system resource usage

Eliminating Bottlenecks Based on Thread Statistics
Complete the following tasks to eliminate bottlenecks based on thread statistics:
- If the reader or writer thread is 100% busy,
consider using string datatype in the source or
target ports. Non-string ports require more
processing.
- If a transformation thread is 100% busy, consider
adding a partition point in the segment. When you
add partition points to the mapping, the Integration
Service increases the number of transformation
threads it uses for the session. However, if the
machine is already running at or near full capacity,
do not add more threads.
- If one transformation requires more processing time than the others, consider adding a pass-through partition point to the transformation.

Using Thread Statistics


You can use thread statistics in the session log to identify source, target, or transformation bottlenecks.
By default, the Integration Service uses one reader
thread, one transformation thread, and one writer
thread to process a session.
The thread with the highest busy percentage
identifies the bottleneck in the session.
The session log provides the following thread
statistics:
Run time - Amount of time the thread runs.
Idle time - Amount of time the thread is idle. It
includes the time the thread waits for other thread
processing within the application. Idle time
includes the time the thread is blocked by the
Integration Service, but not the time the
thread is blocked by the operating system.
Busy time - Percentage of the run time the thread is busy, according to the following formula:
(run time - idle time) / run time x 100
You can ignore high busy percentages when the total
run time is short, such as under 60 seconds. This
does not necessarily indicate a bottleneck.
Thread work time - The percentage of time the
Integration Service takes to process each
transformation in a thread. The session log shows
the following information for the transformation
thread work time:
Thread work time breakdown:
<transformation name>: <number> percent
<transformation name>: <number> percent
<transformation name>: <number> percent

If a transformation takes a small amount of time, the


session log does not include it.
If a thread does not have accurate statistics,
because the session ran for a short period of time,
the session log reports that the statistics are not
accurate.

Example
When you run a session, the session log lists run
information and thread statistics similar to the
following text:
***** RUN INFO FOR TGT LOAD ORDER GROUP [1],
CONCURRENT SET [1] *****
Thread [READER_1_1_1] created for [the read stage]
of partition point [SQ_two_gig_file_32B_rows] has
completed.
Total Run Time = [505.871140] secs
Total Idle Time = [457.038313] secs
Busy Percentage = [9.653215]
Thread [TRANSF_1_1_1] created for [the
transformation stage] of partition point
[SQ_two_gig_file_32B_rows] has completed.
Total Run Time = [506.230461] secs
Total Idle Time = [1.390318] secs
Busy Percentage = [99.725359]
Thread work time breakdown:
LKP_ADDRESS: 25.000000 percent
SRT_ADDRESS: 21.551724 percent
RTR_ZIP_CODE: 53.448276 percent
Thread [WRITER_1_*_1] created for [the write stage]
of partition point [scratch_out_32B] has completed.
Total Run Time = [507.027212] secs
Total Idle Time = [384.632435] secs
Busy Percentage = [24.139686]
In this session log, the total run time for the
transformation thread is 506 seconds and the busy
percentage is 99.7%.
This means the transformation thread was never idle
for the 506 seconds.
The reader and writer busy percentages were
significantly smaller, about 9.6% and 24%.

In this session, the transformation thread is the bottleneck in the mapping.
To determine which transformation in the transformation thread is the bottleneck, view the busy percentage of each transformation in the thread work time breakdown.
In this session log, the transformation RTR_ZIP_CODE had a busy percentage of 53%.
Identifying Target Bottlenecks
To identify a target bottleneck, complete the
following tasks:
- Configure a copy of the session to write to a flat file
target. If the session performance increases
significantly, you have a target bottleneck. If a
session already writes to a flat file target, you
probably do not have a target bottleneck.
- Read the thread statistics in the session log. When
the Integration Service spends more time on the
writer thread than the transformation or reader
threads, you have a target bottleneck.
Eliminating Target Bottlenecks
Complete the following tasks to eliminate target
bottlenecks:
- Have the database administrator optimize
database performance by optimizing the query.
- Increase the database network packet size.
- Configure index and key constraints.

Identifying Source Bottlenecks
You can read the thread statistics in the session log to determine if the source is the bottleneck.
When the Integration Service spends more time on the reader thread than the transformation or writer threads, you have a source bottleneck.
If the session reads from a flat file source, you probably do not have a source bottleneck.
If the session reads from a relational source, use the following methods to identify source bottlenecks:
- Filter transformation
- Read test mapping
- Database query
Using a Filter Transformation
You can use a Filter transformation in the mapping to measure the time it takes to read source data.
Add a Filter transformation after each source qualifier.
Set the filter condition to false so that no data is processed past the Filter transformation.
If the time it takes to run the new session remains about the same, you have a source bottleneck.
Using a Read Test Mapping
You can create a read test mapping to identify source bottlenecks. A read test mapping isolates the read query by removing the transformations in the mapping.
To create a read test mapping, complete the following steps:
1. Make a copy of the original mapping.
2. In the copied mapping, keep only the sources, source qualifiers, and any custom joins or queries.
3. Remove all transformations.
4. Connect the source qualifiers to a file target.
Run a session against the read test mapping. If the session performance is similar to the original session, you have a source bottleneck.
Using a Database Query
To identify source bottlenecks, execute the read query directly against the source database.
Copy the read query directly from the session log.
Execute the query against the source database with a query tool such as isql.
Measure the query execution time and the time it takes for the query to return the first row.
Eliminating Source Bottlenecks
Complete the following tasks to eliminate source bottlenecks:
- Set the number of bytes the Integration Service reads per line if the Integration Service reads from a flat file source.
- Have the database administrator optimize database performance by optimizing the query.
- Increase the database network packet size.
- Configure index and key constraints.
- If there is a long delay between the two time measurements in a database query, you can use an optimizer hint.
Identifying Mapping Bottlenecks
To identify mapping bottlenecks, complete the following tasks:
- Read the thread statistics and work time statistics in the session log. When the Integration Service spends more time on the transformation thread than the writer or reader threads, you have a transformation bottleneck. When the Integration Service spends more time on one transformation, it is the bottleneck in the transformation thread.


- Analyze performance counters. High errorrows and rowsinlookupcache counters indicate a mapping bottleneck.
- Add a Filter transformation before each target
definition. Set the filter condition to false so that no
data is loaded into the target tables. If the time it
takes to run the new session is the same as the
original session, you have a mapping bottleneck.

Identifying System Bottlenecks on UNIX
top - View overall system performance. This tool
displays CPU usage, memory usage, and swap
usage for the system and for individual processes
running on the system.
iostat - Monitor the loading operation for every disk
attached to the database server. Iostat displays the
percentage of time that the disk is physically active.
If you use disk arrays, use utilities provided with the
disk arrays instead of iostat.
vmstat - Monitor disk swapping actions. Swapping
should not occur during the session.
sar - View detailed system activity reports of CPU, memory, and disk usage. You can use this tool to monitor CPU loading. It provides percent usage on user, system, idle time, and waiting time. You can also use this tool to monitor disk swapping actions.

Eliminating Mapping Bottlenecks


- To eliminate mapping bottlenecks, optimize
transformation settings in mappings.
If you do not have a source, target, or mapping
bottleneck, you may have a session bottleneck.
Small cache size, low buffer memory, and small
commit intervals can cause session bottlenecks.
Identifying Session Bottlenecks
- To identify a session bottleneck, analyze the
performance details. Performance details display
information about each transformation, such as the
number of input rows, output rows, and error rows.

Eliminating System Bottlenecks


- If the CPU usage is more than 80%, check the
number of concurrent running tasks. Consider
changing the load or using a grid to distribute tasks
to different nodes. If you cannot reduce the load,
consider adding more processors.
- If swapping occurs, increase the physical memory
or reduce the number of memory-intensive
applications on the disk.
- If you have excessive memory pressure
(thrashing), consider adding more physical memory.
- If the physical disk percent time is high, tune the cache for PowerCenter to use in-memory cache instead of writing to disk. If you tune the cache but requests are still in queue and the disk busy percentage is at least 50%, add another disk device or upgrade to a faster disk device. You can also use a separate disk for each partition in the session.
- If physical disk queue length is greater than two,
consider adding another disk device or upgrading
the disk device. You also can use separate disks for
the reader, writer, and transformation threads.
- Consider improving network bandwidth.
- When you tune UNIX systems, tune the server for a
major database system.
- If the percent time spent waiting on I/O (%wio) is
high, consider using other under-utilized disks. For
example, if the source data, target data, lookup,
rank, and aggregate cache files are all on the same
disk, consider putting them on different disks.

Eliminating Session Bottlenecks


- To eliminate session bottlenecks, optimize the
session
Using the Workflow Monitor to Identify System
Bottlenecks
CPU% - The percentage of CPU usage includes other
external tasks running on the system.
Memory usage - To troubleshoot, use system tools
to check the memory usage before and after running
the session and then compare the results to the
memory usage while running the session.
Swap usage - Swap usage is a result of paging due
to possible memory leaks or a high number of
concurrent tasks.
Identifying System Bottlenecks on Windows
Use the Windows Performance Monitor to create a
chart that provides the following information:
Percent processor time - If you have more than
one CPU, monitor each CPU for percent processor
time.
Pages/second - If pages/second is greater than
five, you may have excessive memory pressure
(thrashing).
Physical disks percent times - The percent of
time that the physical disk is busy performing read
or write requests.
Physical disks queue length - The number of users waiting for access to the same disk device.
Server total bytes per second - The number of bytes the server has sent to and received from the network.

Optimizing Flat File Targets
If you use a shared storage directory for flat file targets, you can optimize session performance by ensuring that the shared storage directory is on a machine that is dedicated to storing and managing files, instead of performing other tasks.
If the Integration Service runs on a single node and
the session writes to a flat file target, you can
optimize session performance by writing to a flat file
target that is local to the Integration Service process
node.
Dropping Indexes and Key Constraints
When you define key constraints or indexes in target
tables, you slow the loading of data to those tables.
To improve performance, drop indexes and key
constraints before you run the session.
You can rebuild those indexes and key constraints
after the session completes.
If you decide to drop and rebuild indexes and key
constraints on a regular basis, you can use the
following methods to perform these operations
each time you run the session:
- Use pre-load and post-load stored procedures.
- Use pre-session and post-session SQL commands.
Note: To optimize performance, use constraint-based loading only if necessary.
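For example, a minimal sketch of pre-session and post-session SQL commands that drop and rebuild an index (the index and table names are hypothetical; adjust the syntax to the target database):
Pre-session SQL: DROP INDEX IDX_ORDERS_DATE
Post-session SQL: CREATE INDEX IDX_ORDERS_DATE ON ORDERS (ORDER_DATE)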
Increasing Database Checkpoint Intervals
The Integration Service performance slows each
time it waits for the database to perform a
checkpoint.
To decrease the number of checkpoints and increase
performance, increase the checkpoint interval in
the database.
Note: Although you gain performance when you
reduce the number of checkpoints, you also
increase the recovery time if the database shuts
down unexpectedly.
Using Bulk Loads
You can use bulk loading to improve the
performance of a session that inserts a large
amount of data into a DB2, Sybase ASE, Oracle, or
Microsoft SQL Server database.
Configure bulk loading in the session properties.
When bulk loading, the Integration Service bypasses
the database log, which speeds performance.
Without writing to the database log, however, the
target database cannot perform rollback.
As a result, you may not be able to perform
recovery. When you use bulk loading, weigh the
importance of improved session performance
against the ability to recover an incomplete
session.

When bulk loading to Microsoft SQL Server or Oracle targets, define a large commit interval to increase performance.
Microsoft SQL Server and Oracle start a new bulk
load transaction after each commit.
Increasing the commit interval reduces the number
of bulk load transactions, which increases
performance.
Using External Loaders
To increase session performance, configure
PowerCenter to use an external loader for the
following types of target databases:
- IBM DB2 EE or EEE
- Oracle
- Sybase IQ
- Teradata
Minimizing Deadlocks
If the Integration Service encounters a deadlock
when it tries to write to a target, the deadlock only
affects targets in the same target connection
group. The Integration Service still writes to targets
in other target connection groups.
Encountering deadlocks can slow session performance.
To improve session performance, you can increase
the number of target connection groups the
Integration Service uses to write to the targets in a
session. To use a different target connection group
for each target in a session, use a different
database connection name for each target
instance.
You can specify the same connection information for
each connection name.
Increasing Database Network Packet Size
If you write to Oracle, Sybase ASE, or Microsoft SQL
Server targets, you can improve the performance
by increasing the network packet size.
Increase the network packet size to allow larger
packets of data to cross the network at one time.
Increase the network packet size based on the
database you write to:
Oracle - You can increase the database server
network packet size in listener.ora and
tnsnames.ora. Consult your database
documentation for additional information about
increasing the packet size, if necessary.
Sybase ASE and Microsoft SQL Server - Consult
your database documentation for information about
how to increase the packet size.
For Sybase ASE or Microsoft SQL Server, you must also change the packet size in the relational connection object in the Workflow Manager to reflect the database server packet size.
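As an illustration only, one way to raise the Oracle session data unit (SDU) is in the tnsnames.ora entry used by the connection (the alias, host, and size below are hypothetical, and a matching SDU setting is normally needed on the listener side as well):
PC_TGT =
  (DESCRIPTION =
    (SDU = 32767)
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = orcl))
  )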
Optimizing the Query
If a session joins multiple source tables in one
Source Qualifier, you might be able to improve
performance by optimizing the query with
optimizing hints. Also, single table select
statements with an ORDER BY or GROUP BY clause
may benefit from optimization such as adding
indexes.
Usually, the database optimizer determines the
most efficient way to process the source data.
However, you might know properties about the
source tables that the database optimizer does not.
The database administrator can create optimizer
hints to tell the database how to execute the query
for a particular set of source tables.
The query that the Integration Service uses to read data appears in the session log and in the Source Qualifier transformation.
Have the database administrator analyze the query,
and then create optimizer hints and indexes for the
source tables.
Use optimizing hints if there is a long delay between
when the query begins executing and when
PowerCenter receives the first row of data.
Configure optimizer hints to begin returning rows as
quickly as possible, rather than returning all rows at
once.
This allows the Integration Service to process rows in parallel with the query execution.
Once you optimize the query, use the SQL override option to take full advantage of these modifications.
You can also configure the source database to run
parallel queries to improve performance.
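For example, a hedged sketch of an SQL override that asks the optimizer to start returning rows quickly (the FIRST_ROWS hint is Oracle-specific; the table and column names are hypothetical):
SELECT /*+ FIRST_ROWS(100) */ ORDERS.ORDER_ID, ORDERS.CUSTOMER_ID, ORDERS.ORDER_DATE FROM ORDERS WHERE ORDERS.ORDER_DATE >= TO_DATE('01-JAN-2024', 'DD-MON-YYYY')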
Using Conditional Filters
A simple source filter on the source database can
sometimes negatively impact performance because
of the lack of indexes.
You can use the PowerCenter conditional filter in the
Source Qualifier to improve performance.
Whether you should use the PowerCenter
conditional filter to improve performance depends
on the session.
For example, if multiple sessions read from the same
source simultaneously, the PowerCenter conditional
filter may improve performance.
However, some sessions may perform faster if you
filter the source data on the source database.
You can test the session with both the database filter
and the PowerCenter filter to determine which
method improves performance.

Increasing Database Network Packet Size


If you read from Oracle, Sybase ASE, or Microsoft
SQL Server sources, you can improve the
performance by increasing the network packet size.
Increase the network packet size to allow larger
packets of data to cross the network at one time.
Increase the network packet size based on the
database you read from:
Oracle - You can increase the database server network packet size in listener.ora and tnsnames.ora. Consult your database documentation for additional information about increasing the packet size, if necessary.
Sybase ASE and Microsoft SQL Server - Consult your
database documentation for information about how
to increase the packet size.
For Sybase ASE or Microsoft SQL Server, you must
also change the packet size in the relational
connection object in the Workflow Manager to
reflect the database server packet size.

Connecting to Oracle Database Sources


If you are running the Integration Service on a single
node and the Oracle instance is local to the
Integration Service process node, you can optimize
performance by using IPC protocol to connect to the
Oracle database. You can set up an Oracle database connection in listener.ora and tnsnames.ora.
Using tempdb to Join Sybase or Microsoft SQL Server Tables
When you join large tables on a Sybase or Microsoft SQL Server database, it is possible to improve performance by creating the tempdb as an in-memory database to allocate sufficient memory.

Optimizing Mappings Overview


Focus on mapping-level optimization after you
optimize the targets and sources.
Generally, you reduce the number of transformations in the mapping and delete unnecessary links between transformations to optimize the mapping.
Configure the mapping with the least number of
transformations and expressions to do the most
amount of work possible.
Delete unnecessary links between transformations
to minimize the amount of data moved.
Optimizing Flat File Sources
- Optimize the line sequential buffer length.
- Optimize delimited flat file sources.
- Optimize XML and flat file sources.


Optimizing the Line Sequential Buffer Length


If the session reads from a flat file source, you can
improve session performance by setting the
number of bytes the Integration Service reads per
line.
By default, the Integration Service reads 1024 bytes
per line.
If each line in the source file is less than the default
setting, you can decrease the line sequential buffer
length in the session properties.
Optimizing Delimited Flat File Sources
If a source is a delimited flat file, you must specify
the delimiter character to separate columns of data
in the source file.
You must also specify the escape character.
The Integration Service reads the delimiter
character as a regular character if you include the
escape character before the delimiter character.
You can improve session performance if the source
flat file does not contain quotes or escape
characters.
Optimizing XML and Flat File Sources
XML files are usually larger than flat files because of
the tag information.
The size of an XML file depends on the level of
tagging in the XML file.
More tags result in a larger file size.
As a result, the Integration Service may take longer
to read and cache XML sources.
Configuring Single-Pass Reading
Single-pass reading allows you to populate multiple
targets with one source qualifier.
Consider using single-pass reading if you have
multiple sessions that use the same sources.
You can combine the transformation logic for each
mapping in one mapping and use one source
qualifier for each source.
The Integration Service reads each source once and
then sends the data into separate pipelines.
A particular row can be used by all the pipelines, by
any combination of pipelines, or by no pipelines.
For example, you have the Purchasing source table,
and you use that source daily to perform an
aggregation and a ranking.
If you place the Aggregator and Rank transformations in separate mappings and sessions, you force the Integration Service to read the same source table twice.
However, if you include the aggregation and ranking logic in one mapping with one source qualifier, the Integration Service reads the Purchasing source table once, and then sends the appropriate data to the separate pipelines.
When changing mappings to take advantage of
single-pass reading, you can optimize this feature
by factoring out common functions from mappings.
For example, if you need to subtract a percentage
from the Price ports for both the Aggregator and
Rank transformations, you can minimize work by
subtracting the percentage before splitting the
pipeline.

Optimizing Pass-Through Mappings


To pass directly from source to target without any
other transformations, connect the Source Qualifier
transformation directly to the target.
If you use the Getting Started Wizard to create a pass-through mapping, the wizard creates an Expression transformation between the Source Qualifier transformation and the target.
Optimizing Filters
Use one of the following transformations to filter
data:
Source Qualifier transformation - The Source
Qualifier transformation filters rows from relational
sources.
Filter transformation - The Filter transformation
filters data within a mapping. The Filter
transformation filters rows from any type of source.
If you filter rows from the mapping, you can improve
efficiency by filtering early in the data flow.
Use a filter in the Source Qualifier transformation to
remove the rows at the source.
The Source Qualifier transformation limits the row
set extracted from a relational source.
If you cannot use a filter in the Source Qualifier
transformation, use a Filter transformation and
move it as close to the SQ transformation as
possible to remove unnecessary data early in the
data flow.
The Filter transformation limits the row set sent to a
target.
Avoid using complex expressions in filter conditions.
To optimize Filter transformations, use simple
integer or true/false expressions in the filter
condition.
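For example, a simple sketch of the difference (the STATUS, REGION, and ACTIVE_FLAG ports are hypothetical):
Complex filter condition: IIF( STATUS = 'ACTIVE' AND REGION = 'WEST', TRUE, FALSE )
Simpler equivalent: STATUS = 'ACTIVE' AND REGION = 'WEST'
A numeric comparison such as ACTIVE_FLAG = 1 is faster still than an equivalent string comparison.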


Note: You can also use a Filter or Router


transformation to drop rejected rows from an Update
Strategy transformation if you do not need to keep
rejected rows.


Optimizing Datatype Conversions


You can increase performance by eliminating
unnecessary datatype conversions.
For example, if a mapping moves data from an
Integer column to a Decimal column, then back to
an Integer column, the unnecessary datatype
conversion slows performance.
Where possible, eliminate unnecessary datatype
conversions from mappings.

Use the following datatype conversions to


improve system performance:
- Use integer values in place of other datatypes
when performing comparisons using Lookup and
Filter transformations. For example, many databases
store U.S. ZIP code information as a Char or Varchar
datatype. If you convert the zip code data to an
Integer datatype, the lookup database stores the zip
code
94303-1234 as 943031234. This helps increase the
speed of the lookup comparisons based on zip code.
- Convert the source dates to strings through port-to-port conversions to increase session performance.
You can either leave the ports in targets as strings or change the ports to Date/Time ports.

Optimizing Expressions
Complete the following tasks to isolate the slow
expressions:
1. Remove the expressions one-by-one from the
mapping.
2. Run the mapping to determine the time it takes to
run the mapping without the transformation.
If there is a significant difference in session run time,
look for ways to optimize the slow expression.
Factoring Out Common Logic
If the mapping performs the same task in multiple
places, reduce the number of times the mapping
performs the task by moving the task earlier in the
mapping.
For example, you have a mapping with five target
tables.
Each target requires a Social Security number lookup.
Instead of performing the lookup five times, place the Lookup transformation in the mapping before the data flow splits.
Next, pass the lookup results to all five targets.

Minimizing Aggregate Function Calls


When writing expressions, factor out as many
aggregate function calls as possible.
Each time you use an aggregate function call, the
Integration Service must search and group the
data.
For example, in the following expression, the
Integration Service reads COLUMN_A, finds the
sum, then reads COLUMN_B, finds the sum, and
finally finds the sum of the two sums:
SUM(COLUMN_A) + SUM(COLUMN_B)
If you factor out the aggregate function call, as
below, the Integration Service adds COLUMN_A to
COLUMN_B, then finds the sum of both.
SUM(COLUMN_A + COLUMN_B)

Replacing Common Expressions with Local Variables
If you use the same expression multiple times in one
transformation, you can make that expression a
local variable.
You can use a local variable only within the
transformation.
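For example, a minimal sketch of an Expression transformation (the port names are hypothetical): define a variable port v_FULL_NAME with the expression FIRST_NAME || ' ' || LAST_NAME, then reuse it in the output ports:
OUT_GREETING = 'Dear ' || v_FULL_NAME
OUT_MAIL_LABEL = UPPER( v_FULL_NAME )
The concatenation is evaluated once per row instead of once for every output port that needs it.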
Choosing Numeric versus String Operations
The Integration Service processes numeric operations faster than string operations.
For example, if you look up large amounts of data on
two columns, EMPLOYEE_NAME and EMPLOYEE_ID,
configuring the lookup around EMPLOYEE_ID
improves performance.
Optimizing Char-Char and Char-Varchar Comparisons
When the Integration Service performs comparisons
between CHAR and VARCHAR columns, it slows
each time it finds trailing blank spaces in the row.
You can use the TreatCHARasCHARonRead option
when you configure the Integration Service in the
Informatica Administrator so that the Integration
Service does not trim trailing spaces from the end
of Char source fields
Choosing DECODE Versus LOOKUP
When you use a LOOKUP function, the Integration
Service must look up a table in a database.
When you use a DECODE function, you incorporate
the lookup values into the expression so the
Integration Service does not have to look up a

separate table.
Therefore, when you want to look up a small set of unchanging values, use DECODE to improve performance.
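For example, a small sketch that replaces a lookup against a tiny, static code table (the COUNTRY_CODE port and its values are hypothetical):
DECODE( COUNTRY_CODE, 'US', 'United States', 'CA', 'Canada', 'MX', 'Mexico', 'Unknown' )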
Using Operators Instead of Functions
The Integration Service reads expressions written
with operators faster than expressions with
functions.
Where possible, use operators to write expressions.
For example, you have the following expression that contains nested CONCAT functions:
CONCAT( CONCAT( CUSTOMERS.FIRST_NAME, ' ' ), CUSTOMERS.LAST_NAME )
You can rewrite that expression with the || operator as follows:
CUSTOMERS.FIRST_NAME || ' ' || CUSTOMERS.LAST_NAME

Optimizing IIF Functions


IIF functions can return a value and an action, which
allows for more compact expressions.
For example, you have a source with three Y/N flags:
FLG_A, FLG_B, FLG_C. You want to return values
based on the values of each flag.
You use the following expression:
IIF( FLG_A = 'Y' and FLG_B = 'Y' AND FLG_C = 'Y',
VAL_A + VAL_B + VAL_C,
IIF( FLG_A = 'Y' and FLG_B = 'Y' AND FLG_C = 'N',
VAL_A + VAL_B ,
IIF( FLG_A = 'Y' and FLG_B = 'N' AND FLG_C = 'Y',
VAL_A + VAL_C,
IIF( FLG_A = 'Y' and FLG_B = 'N' AND FLG_C = 'N',
VAL_A ,
IIF( FLG_A = 'N' and FLG_B = 'Y' AND FLG_C = 'Y',
VAL_B + VAL_C,
IIF( FLG_A = 'N' and FLG_B = 'Y' AND FLG_C = 'N',
VAL_B ,
IIF( FLG_A = 'N' and FLG_B = 'N' AND FLG_C = 'Y',
VAL_C,
IIF( FLG_A = 'N' and FLG_B = 'N' AND FLG_C = 'N',
0.0,))))))))
This expression requires 8 IIFs, 16 ANDs, and at least
24 comparisons.
If you take advantage of the IIF function, you can
rewrite that expression as:
IIF(FLG_A='Y', VAL_A, 0.0)+ IIF(FLG_B='Y', VAL_B,
0.0)+ IIF(FLG_C='Y', VAL_C, 0.0)
This results in three IIFs, three comparisons, two additions, and a faster session.
Evaluating Expressions
If you are not sure which expressions slow
performance, evaluate the expression performance
to isolate the problem.
1. Time the session with the original expressions.

2. Copy the mapping and replace half of the


complex expressions with a constant.
3. Run and time the edited session.
4. Make another copy of the mapping and replace
the other half of the complex expressions with a
constant.
5. Run and time the edited session
Optimizing External Procedures
For example, you need to create an external
procedure with two input groups.
The external procedure reads a row from the first
input group and then reads a row from the second
input group.
If you use blocking, you can write the external
procedure code to block the flow of data from one
input group while it processes the data from the
other input group.
When you write the external procedure code to
block data, you increase performance because the
procedure does not need to copy the source data to
a buffer. However, you could write the external
procedure to allocate a buffer and copy the data
from one input group to the buffer until it is ready
to process the data.
Copying source data to a buffer decreases
performance.

Optimizing Aggregator Transformations


Aggregator transformations often slow performance
because they must group data before processing it.
Aggregator transformations need additional memory
to hold intermediate group results.
Use the following guidelines to optimize the
performance of an Aggregator transformation:
- Group by simple columns.
1. When possible, use numbers instead of strings and dates in the columns used for the GROUP BY.
2. Avoid complex expressions in the Aggregator
expressions.
- Use sorted input.
1. The Sorted Input option decreases the use of
aggregate caches
2. You can increase performance when you use the
Sorted Input option in sessions with multiple
partitions
- Use incremental aggregation.
1. If you can capture changes from the source that affect less than half the target, you can use incremental aggregation to optimize the performance of Aggregator transformations.
2. When you use incremental aggregation, you apply
captured changes in the source to aggregate
calculations in a session.

3. The Integration Service updates the target incrementally, rather than processing the entire source and recalculating the same calculations every time you run the session.
4. You can increase the index and data cache sizes
to hold all data in memory without paging to disk
- Filter data before you aggregate it.
- Limit port connections


Optimizing Custom Transformations


The Integration Service can pass a single row to a
Custom transformation procedure or a block of rows
in an array.
You can write the procedure code to specify whether
the procedure receives one row or a block of rows.
You can increase performance when the procedure
receives a block of rows:
- You can decrease the number of function calls the
Integration Service and procedure make. The
Integration Service calls the input row notification
function fewer times, and the procedure calls the
output notification function fewer times.
- You can increase the locality of memory access
space for the data.
- You can write the procedure code to perform an
algorithm on a block of data instead of each row of
data.

Optimizing Lookup Transformations


- Use the optimal database driver.
- Cache lookup tables.
Use the appropriate cache type.
Enable concurrent caches.
Optimize Lookup condition matching.
- When the Lookup transformation matches
lookup cache data with the lookup condition,
it sorts and orders the data to determine the
first matching value and the last matching
value.
- You can configure the transformation to
return any value that matches the lookup
condition.
- When you configure the Lookup transformation to return any matching value, the transformation returns the first value that matches the lookup condition.
- It does not index all ports as it does when
you configure the transformation to return
the first matching value or the last matching
value.
- When you use any matching value,
performance can improve because the
transformation does not index on all ports,
which can slow performance.



Reduce the number of cached rows.


You can reduce the number of rows included
in the cache to increase performance. Use
the Lookup SQL Override option to add a
WHERE clause to the default SQL statement

Override the ORDER BY statement.


By default, the Integration Service generates
an ORDER BY statement for a cached lookup.

The ORDER BY statement contains all lookup


ports.
To increase performance, suppress the
default ORDER BY statement and enter an
override ORDER BY with fewer columns.
The Integration Service always generates an
ORDER BY statement, even if you enter one
in the override.
Place two dashes -- after the ORDER BY
override to suppress the generated ORDER
BY statement.
For example, a Lookup transformation uses
the following lookup condition:
ITEM_ID = IN_ITEM_ID
PRICE <= IN_PRICE
The Lookup transformation includes three
lookup ports used in the mapping, ITEM_ID,
ITEM_NAME, and PRICE.
When you enter the ORDER BY statement,
enter the columns in the same order as the
ports in the lookup condition.
You must also enclose all database reserved
words in quotes.
Enter the following lookup query in the
lookup SQL override:
SELECT ITEMS_DIM.ITEM_NAME, ITEMS_DIM.PRICE, ITEMS_DIM.ITEM_ID FROM ITEMS_DIM ORDER BY ITEMS_DIM.ITEM_ID, ITEMS_DIM.PRICE --
- Use a machine with more memory.

- Optimize the lookup condition.


If you include more than one lookup
condition, place the conditions in the
following
order
to
optimize
lookup
performance:
Equal to (=)
Less than (<), greater than (>), less than or
equal to (<=), greater than or equal to (>=)
Not equal to (!=)
- Filter lookup rows.
- Index the lookup table.
The Integration Service needs to query, sort,
and compare values in the lookup condition
columns.
The index needs to include every column
used in a lookup condition
- Optimize multiple lookups.
If a mapping contains multiple lookups, even
with caching enabled and enough heap
memory, the lookups can slow performance.

Tune the Lookup transformations that query the largest amounts of data to improve overall performance.
To determine which Lookup transformations process the most data, examine the Lookup_rowsinlookupcache counters for each Lookup transformation.
The Lookup transformations that have a large number in this counter might benefit from tuning their lookup expressions.
If those expressions can be optimized, session performance improves.
- Create a pipeline Lookup transformation and configure partitions in the pipeline that builds the lookup source.
Optimizing Joiner Transformations
Use the following tips to improve session performance with the Joiner transformation:
Designate the master source as the source with fewer duplicate key values - When the Integration Service processes a sorted Joiner transformation, it caches rows for one hundred unique keys at a time. If the master source contains many rows with the same key value, the Integration Service must cache more rows, and performance can be slowed.
Designate the master source as the source with fewer rows - During a session, the Joiner transformation compares each row of the detail source against the master source. The fewer rows in the master, the fewer iterations of the join comparison occur, which speeds the join process.
Perform joins in a database when possible - Performing a join in a database is faster than performing a join in the session. The type of database join you use can affect performance. Normal joins are faster than outer joins and result in fewer rows. In some cases, you cannot perform the join in the database, such as joining tables from two different databases or flat file systems.
To perform a join in a database, use the following options:
- Create a pre-session stored procedure to join the tables in a database.
- Use the Source Qualifier transformation to perform the join.
Join sorted data when possible - To improve session performance, configure the Joiner transformation to use sorted input. When you configure the Joiner transformation to use sorted data, the Integration Service improves performance by minimizing disk input and output. You see the greatest performance improvement when you work with large data sets. For an unsorted Joiner transformation, designate the source with fewer rows as the master source.

Optimizing Sequence Generator Transformations
To optimize Sequence Generator transformations, create a reusable Sequence Generator and use it in multiple mappings simultaneously.
Also, configure the Number of Cached Values
property.
The Number of Cached Values property determines
the number of values the Integration Service
caches at one time.
Make sure that the Number of Cached Values is not too small.
Consider configuring the Number of Cached Values to a value greater than 1,000.
If you do not have to cache values, set the Number of Cached Values to 0.
Sequence Generator transformations that do not use
cache are faster than those that require cache.
Optimizing Sorter Transformations
- Allocate enough memory to sort the data.
- Specify a different work directory for each partition
in the Sorter transformation.


Optimizing SQL Transformations


Each time the Integration Service processes a new
query in a session, it calls a function called
SQLPrepare to create an SQL procedure and pass it
to the database.
When the query changes for each input row, it has a
performance impact.
When an SQL query contains commit and rollback
query statements, the Integration Service must
recreate the SQL procedure after each commit or
rollback.
To optimize performance, do not use transaction
statements in an SQL transformation query.

When you configure the transformation to use a


static connection, you choose a connection from
the Workflow Manager connections.
The SQL transformation connects to the database
once during the session.
When you pass dynamic connection information, the
SQL transformation connects to the database each
time the transformation processes an input row.

Eliminating Transformation Errors


In large numbers, transformation errors slow the
performance of the Integration Service.
With each transformation error, the Integration
Service pauses to determine the cause of the error
and to remove the row causing the error from the
data flow.
Next, the Integration Service typically writes the row
into the session log file.
Transformation errors occur when the Integration
Service encounters conversion errors, conflicting
mapping logic, and any condition set up as an error,
such as null input.
Check the session log to see where the
transformation errors occur.
If the errors center on particular transformations,
evaluate those transformation constraints.
Optimizing Sessions
Grid
A grid is an alias assigned to a group of nodes that
allows you to automate the distribution of
workflows and sessions across nodes.
When you use a grid, the Integration Service
distributes workflow tasks and session threads
across multiple nodes.
A Load Balancer distributes tasks to nodes without
overloading any node. Running workflows and
sessions on the nodes of a grid provides the
following performance gains:
- Balances the Integration Service workload.
- Processes concurrent sessions faster.
- Processes partitions faster.

When a PowerCenter mapping contains a
transformation that has cache memory, deploying
adequate memory and separate disk storage for
each cache instance improves performance.
Running a session on a grid can improve throughput
because the grid provides more resources to run
the session.
Performance improves when you run a few sessions
on the grid at a time.
Running a session on a grid is more efficient than
running a workflow over a grid if the number of
concurrent session partitions is less than the
number of nodes.
When you run multiple sessions on a grid, session
subtasks share node resources with subtasks of
other concurrent sessions.
Running a session on a grid requires coordination
between processes running on different nodes.
For some mappings, running a session on a grid
requires additional overhead to move data from
one node to another node.
In addition to loading the memory and CPU
resources on each node, running multiple sessions
on a grid adds to network traffic.
When you run a workflow on a grid, the Integration
Service loads memory and CPU resources on nodes
without requiring coordination between the nodes
Pushdown Optimization
To increase session performance, push
transformation logic to the source or target
database.

Concurrent Sessions and Workflows
If possible, run sessions and workflows concurrently
to improve performance.
For example, if you load data into an analytic
schema, where you have dimension and fact tables,
load the dimensions concurrently

The Integration Service requires CPU resources for
parsing input data and formatting the output data.
A grid can improve performance when you have a
performance bottleneck in the extract and load
steps of a session.
A grid can improve performance when memory or
temporary storage is a performance bottleneck.


Buffer Memory
When the Integration Service initializes a session, it
allocates blocks of memory to hold source and
target data.
The Integration Service allocates at least two blocks
for each source and target partition.
Sessions that use a large number of sources and
targets might require additional memory blocks.
If the Integration Service cannot allocate enough
memory blocks to hold the data, it fails the session.
You can configure the amount of buffer memory, or
you can configure the Integration Service to
calculate buffer settings at run time.

To increase the number of available memory blocks,
adjust the following session properties:
DTM Buffer Size - Increase the DTM buffer size on
the Properties tab in the session properties.
Default Buffer Block Size - Decrease the buffer
block size on the Config Object tab in the session
properties.
Before you configure these settings, determine the
number of memory blocks the Integration Service
requires to initialize the session.
Then, based on default settings, calculate the buffer
size and the buffer block size to create the required
number of session blocks.
If you have XML sources or targets in a mapping,
use the number of groups in the XML source or
target in the calculation for the total number of
sources and targets.
For example, you create a session that contains a
single partition using a mapping that contains 50
sources and 50 targets.
Then, you make the following calculations:
1. You determine that the session requires a
minimum of 200 memory blocks:
[(total number of sources + total number of
targets)* 2] = (session buffer blocks)
100 * 2 = 200
2. Based on default settings, you determine that you
can change the DTM Buffer Size to 15,000,000, or
you can change the Default Buffer Block Size to
54,000:
(Session Buffer Blocks) = (.9) * (DTM Buffer Size) /
(Default Buffer Block Size) * (number of partitions)
200 = .9 * 14222222 / 64000 * 1
Or
200 = .9 * 12000000 / 54000 * 1
Note: For a session that contains n partitions, set
the DTM Buffer Size to at least n times the value for
the session with one partition.
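
A minimal sketch of the same arithmetic, assuming the example's defaults of a 64,000-byte buffer block and a 12,000,000-byte DTM buffer; the helper names are illustrative, not PowerCenter settings.

def required_buffer_blocks(num_sources, num_targets):
    # [(total number of sources + total number of targets) * 2]
    return (num_sources + num_targets) * 2

def dtm_buffer_size_needed(blocks, block_size, partitions=1):
    # Invert: blocks = 0.9 * DTM Buffer Size / block size * partitions
    return blocks * block_size / (0.9 * partitions)

def buffer_block_size_needed(blocks, dtm_buffer_size, partitions=1):
    # Invert the same formula for the buffer block size instead.
    return 0.9 * dtm_buffer_size * partitions / blocks

blocks = required_buffer_blocks(50, 50)                      # 200 blocks
print(round(dtm_buffer_size_needed(blocks, 64_000)))         # 14222222 -> round up, e.g. 15,000,000
print(round(buffer_block_size_needed(blocks, 12_000_000)))   # 54000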
The Log Manager writes a warning message in the
session log if the number of memory blocks is so
small that it causes performance degradation.
The Log Manager writes this warning message even
if the number of memory blocks is enough for the
session to run successfully.
The warning message also gives a suggestion for
the proper value.
If you modify the DTM Buffer Size, increase the
property by multiples of the buffer block size.

Increasing DTM Buffer Size
The DTM Buffer Size setting specifies the amount of
memory the Integration Service uses as DTM buffer
memory.
The Integration Service uses DTM buffer memory to
create the internal data structures and buffer
blocks used to bring data into and out of the
Integration Service.
When you increase the DTM buffer memory, the
Integration Service creates more buffer blocks,
which improves performance during momentary
slowdowns.
Increasing DTM buffer memory allocation generally
causes performance to improve initially and then
level off.
When you increase the DTM buffer memory
allocation, consider the total memory available on
the Integration Service process system.
If you do not see a significant increase in
performance, DTM buffer memory allocation is not
a factor in session performance.
Note: Reducing the DTM buffer allocation can cause
the session to fail early in the process because the
Integration Service is unable to allocate memory to
the required processes.
To increase the DTM buffer size, open the session
properties and click the Properties tab.
Edit the DTM Buffer Size property in the Performance
settings.
Increase the property by multiples of the buffer
block size, and then run and time the session after
each increase


Optimizing the Buffer Block Size


Depending on the session source data, you might
need to increase or decrease the buffer block size.
If the machine has limited physical memory and the
mapping in the session contains a large number of
sources, targets, or partitions, you might need to
decrease the buffer block size.
If you are manipulating unusually large rows of data,
increase the buffer block size to improve
performance.
If you do not know the approximate size of the rows,
determine the configured row size by completing
the following steps.
To evaluate needed buffer block size:
1. In the Mapping Designer, open the mapping for
the session.
2. Open the target instance
3. Click the Ports tab.
4. Add the precision for all columns in the target.

5. If you have more than one target in the mapping,
repeat steps 2 to 4 for each additional target to
calculate the precision for each target.
6. Repeat steps 2 to 5 for each source definition in
the mapping.
7. Choose the largest precision of all the source and
target precisions for the total precision in the buffer
block size calculation.

The total precision represents the total bytes needed
to move the largest row of data.
For example, if the total precision equals 33,000,
then the Integration Service requires 33,000 bytes
in the buffers to move that row.
If the buffer block size is 64,000 bytes, the
Integration Service can move only one row at a
time.
Ideally, a buffer accommodates at least 100 rows at
a time.
So if the total precision is greater than 32,000,
increase the size of the buffers to improve
performance.
To increase the buffer block size, open the session
properties and click the Config Object tab.
Edit the Default Buffer Block Size property in the
Advanced settings.
Increase the DTM buffer block setting in relation to
the size of the rows.
As with DTM buffer memory allocation, increasing
buffer block size should improve performance.
If you do not see an increase, buffer block size is not
a factor in session performance.
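
A rough sketch of steps 1 to 7 above; the column precisions are made-up placeholders, so read the real values from the Ports tab of each source and target definition. The suggested block size simply applies the 100-rows-per-block rule of thumb stated above.

# Hypothetical per-column precisions copied from the Ports tab.
target_precisions = {
    "T_ITEM_SUMMARY": [10, 72, 15, 15],
    "T_ORDERS": [10, 10, 19, 30],
}
source_precisions = {
    "ITEMS": [10, 72, 15],
}

def largest_total_precision(definitions):
    # Steps 4 and 6: add the precision for all columns, per definition.
    return max(sum(columns) for columns in definitions.values())

# Step 7: the largest of all source and target totals is the row size.
largest_row = max(largest_total_precision(target_precisions),
                  largest_total_precision(source_precisions))

# Rule of thumb from above: a buffer block should hold about 100 rows.
suggested_block_size = largest_row * 100
print(largest_row, suggested_block_size)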

Caches
The Integration Service uses the index and data
caches for XML targets and Aggregator, Rank,
Lookup, and Joiner transformations.
The Integration Service stores transformed data in
the data cache before returning it to the pipeline.
It stores group information in the index cache.
Also, the Integration Service uses a cache to store
data for Sorter transformations.
To configure the amount of cache memory, use the
cache calculator or specify the cache size.
You can also configure the Integration Service to
calculate cache memory settings at run time.
If the allocated cache is not large enough to store
the data, the Integration Service stores the data in
a temporary disk file, a cache file, as it processes
the session data.
Performance slows each time the Integration Service
pages to a temporary file.
Examine the performance counters to determine
how often the Integration Service pages to a file.

Perform the following tasks to optimize caches:


- Limit the number of connected input/output and
output only ports.
- Select the optimal cache directory location.
If you run the Integration Service on a grid and only
some Integration Service nodes have fast access to
the shared cache file directory, configure each
session with a large cache to run on the nodes with
fast access to the directory.
If all Integration Service processes in a grid have
slow access to the cache files, set up a separate,
local cache file directory for each Integration
Service process.
An Integration Service process may have faster
access to the cache files if it runs on the same
machine that contains the cache directory.
Note: You may encounter performance degradation
when you cache large quantities of data on a
mapped or mounted drive.

Increase the cache sizes


You configure the cache size to specify the amount
of memory allocated to process a transformation.
The amount of memory you configure depends on
how much memory cache and disk cache you want
to use.
If you configure the cache size and it is not enough
to process the transformation in memory, the
Integration Service processes some of the
transformation in memory and pages information to
cache files to process the rest of the
transformation.
Each time the Integration Service pages to a cache
file, performance slows.
You can examine the performance details of a
session to determine when the Integration Service
pages to a cache file.
The Transformation_readfromdisk or
Transformation_writetodisk counters for any
Aggregator, Rank, or Joiner transformation indicate
the number of times the Integration Service pages
to disk to process the transformation.

Use the 64-bit version of PowerCenter to run large
cache sessions.
If you process large volumes of data or perform
memory-intensive transformations, you can use the
64-bit PowerCenter version to increase session
performance.
The 64-bit version provides a larger memory space
that can significantly reduce or eliminate disk
input/output.


This can improve session performance in the
following areas:
Caching - With a 64-bit platform, the Integration
Service is not limited to the 2 GB cache limit of a
32-bit platform.
Data throughput - With a larger available
memory space, the reader, writer, and DTM threads
can process larger blocks of data

Target-Based Commit
Each time the Integration Service commits,
performance slows.
Therefore, the smaller the commit interval, the more
often the Integration Service writes to the target
database, and the slower the overall performance.
If you increase the commit interval, the number of
times the Integration Service commits decreases
and performance improves.
When you increase the commit interval, consider the
log file limits in the target database.
If the commit interval is too high, the Integration
Service may fill the database log file and cause the
session to fail.
Therefore, weigh the benefit of increasing the
commit interval against the additional time you
would spend recovering a failed session.
Click the General Options settings in the session
properties to review and adjust the commit interval.
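
The trade-off is simple arithmetic; the row count and intervals below are made-up examples, not recommended values.

import math

total_rows = 1_000_000
for commit_interval in (1_000, 10_000, 100_000):
    # Fewer commits means less commit overhead, but more uncommitted
    # rows held in the target database log between commits.
    commits = math.ceil(total_rows / commit_interval)
    print(f"interval={commit_interval:>7}: {commits} commits")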

Log Files
A workflow runs faster when you do not configure it
to write session and workflow log files.
Workflows and sessions always create binary logs.
When you configure a session or workflow to write a
log file, the Integration Service writes logging
events twice.
You can access the binary session and workflow
logs in the Administrator tool.

Error Tracing
If a session contains a large number of
transformation errors, and you do not need to
correct them, set the session tracing level to Terse.
At this tracing level, the Integration Service does not
write error messages or row-level information for
reject data.
If you need to debug the mapping and you set the
tracing level to Verbose, you may experience
significant performance degradation when you run
the session. Do not use Verbose tracing when you
tune performance.

The session tracing level overrides any
transformation-specific tracing levels within the
mapping.
This is not recommended as a long-term response to
high levels of transformation errors.
Post-Session Emails
When you attach the session log to a post-session
email, enable flat file logging.
If you enable flat file logging, the Integration
Service gets the session log file from disk.
If you do not enable flat file logging, the
Integration Service gets the log events from
the Log Manager and generates the session
log file to attach to the email.
When the Integration Service retrieves the session
log from the log service, workflow performance
slows, especially when the session log file is large
and the log service runs on a different node than
the master DTM.
For optimal performance, configure the session to
write to log file when you configure post-session
email to attach a session log.
Optimizing Grid Deployments Overview
When you run PowerCenter on a grid, you can
configure the grid, sessions, and workflows to use
resources efficiently and maximize scalability.
To improve PowerCenter performance on a grid,
complete the following tasks:
- Add nodes to the grid.
- Increase storage capacity and bandwidth.
- Use shared file systems.
- Use a high-throughput network when you complete
the following tasks:
1. Access sources and targets over the
network.
2. Transfer data between nodes of a grid
when using the Session on Grid option.


Storing Files
When you configure PowerCenter to run on a grid,
you specify the storage location for different types
of session files, such as source files, log files, and
cache files.
To improve performance, store files in optimal
locations.
For example, store persistent cache files on a high-bandwidth shared file system.
Different types of files have different storage
requirements.
You can store files in the following types of locations:
Shared file systems - Store files on a shared file
system to enable all Integration Service processes to
access the same files. You can store files on
low-bandwidth and high-bandwidth shared file systems.
Local - Store files on the local machine running the
Integration Service process when the files do not
have to be accessed by other Integration Service
processes.
High Bandwidth Shared File System Files
Because they can be accessed often during a
session, place the following files on a
high-bandwidth shared file system:
- Source files, including flat files for lookups.
- Target files, including merge files for partitioned
sessions.
- Persistent cache files for lookup or incremental
aggregation.
- Non-persistent cache files for only grid-enabled
sessions on a grid.
This allows the Integration Service to build the cache
only once.
If these cache files are stored on a local file system,
the Integration Service builds a cache for each
partition group.
Low Bandwidth Shared File System Files
Because they are accessed less frequently during a
session, store the following files on a low-bandwidth
shared file system:
- Parameter files or other configuration related files.
- Indirect source or target files.
- Log files.
Local Storage Files
To avoid unnecessary file sharing when you use
shared file systems, store the following files locally:
- Non-persistent cache files for sessions that are not
enabled for a grid, including Sorter transformation
temporary files.
- Individual target files for different partitions when
performing a sequential merge for partitioned
sessions.
- Other temporary files that are deleted at the end of
a session run. In general, to establish this, configure
$PmTempFileDir for a local file system.
Avoid storing these files on a shared file system,
even when the bandwidth is high
OPTIMIZING THE POWERCENTER COMPONENTS
You can optimize performance of the following
PowerCenter components:
- PowerCenter repository
- Integration Service

If you run PowerCenter on multiple machines, run
the Repository Service and Integration Service on
different machines.
To load large amounts of data, run the Integration
Service on the machine with more processing power.
Also, run the Repository Service on the machine
hosting the PowerCenter repository
Optimizing PowerCenter Repository Performance
- Ensure the PowerCenter repository is on the same
machine as the Repository Service process.
- Order conditions in object queries.
- Use a single-node tablespace for the PowerCenter
repository if you install it on a DB2 database.
- Optimize the database schema for the
PowerCenter repository if you install it on a DB2 or
Microsoft SQL Server database.
Optimizing Integration Service Performance
- Use native drivers instead of ODBC drivers for the
Integration Service.
- Run the Integration Service in ASCII data
movement mode if character data is 7-bit ASCII or
EBCDIC.
- Cache PowerCenter metadata for the Repository
Service.
- Run Integration Service with high availability.
Note: When you configure the Integration Service
with high availability, the Integration Service
recovers workflows and sessions that may fail
because of temporary network or machine failures.
To recover from a workflow or session, the
Integration Service writes the states of each
workflow and session to temporary files in a shared
directory. This may decrease performance
OPTIMIZING THE SYSTEM OVERVIEW
Often performance slows because the session relies
on inefficient connections or an overloaded
Integration Service process system.
System delays can also be caused by routers,
switches, network protocols, and usage by many
users.
Slow disk access on source and target databases,
source and target file systems, and nodes in the
domain can slow session performance.
Have the system administrator evaluate the hard
disks on the machines.
After you determine from the system monitoring
tools that you have a system bottleneck, make the
following global changes to improve the
performance of all sessions:


Improve network speed - Slow network
connections can slow session performance. Have the
system administrator determine if the network runs
at an optimal speed. Decrease the number of
network hops between the Integration Service
process and databases.
Use multiple CPUs - You can use multiple CPUs to
run multiple sessions in parallel and run multiple
pipeline partitions in parallel.
Reduce paging - When an operating system runs
out of physical memory, it starts paging to disk to
free physical memory. Configure the physical
memory for the Integration Service process machine
to minimize paging to disk.
Use processor binding - In a multi-processor UNIX
environment, the Integration Service may use a
large amount of system resources. Use processor
binding to control processor usage by the
Integration Service process. Also, if the source and
target database are on the same machine, use
processor binding to limit the resources used by the
database.
USING PIPELINE PARTITIONS
If you have the partitioning option, perform the
following tasks to manually set up partitions:
Increase the number of partitions
Use the following tips when you add partitions to a
session:
Add one partition at a time - To best monitor
performance, add one partition at a time, and
note the session settings before you add each
partition.
Set DTM Buffer Memory - When you increase the
number of partitions, increase the DTM buffer
size. If the session contains n partitions, increase
the DTM buffer size to at least n times the value
for the session with one partition.
Set cached values for Sequence Generator - If a
session has n partitions, you should not need to
use the Number of Cached Values property for
the Sequence Generator transformation. If you
set this value to a value greater than 0, make
sure it is at least n times the original value for
the session with one partition.
Partition the source data evenly - Configure each
partition to extract the same number of rows.
Monitor the system while running the session - If
CPU cycles are available, you can add a partition
to improve performance. For example, you may
have CPU cycles available if the system has 20
percent idle time.
Monitor the system after adding a partition - If
the CPU utilization does not go up, the wait for
I/O time goes up, or the total data transformation
rate goes down, then there is probably a
hardware or software bottleneck. If the wait for
I/O time goes up by a significant amount, then
check the system for hardware bottlenecks.
Otherwise, check the database configuration.
Select the best performing partition types at
particular points in a pipeline
- You can use multiple pipeline partitions and
database partitions.
- To improve performance, ensure the number of
pipeline partitions equals the number of database
partitions
- To increase performance, specify partition types at
the following partition points in the pipeline:
Source Qualifier transformation - To read data
from multiple flat files concurrently, specify one
partition for each flat file in the Source Qualifier
transformation. Accept the default partition type,
pass-through.
Filter transformation - Since the source files vary
in size, each partition processes a different amount
of data. Set a partition point at the Filter
transformation, and choose round-robin partitioning
to balance the load going into the Filter
transformation.
Sorter transformation - To eliminate overlapping
groups in the Sorter and Aggregator
transformations, use hash auto-keys partitioning at
the Sorter transformation. This causes the
Integration Service to group all items with the same
description into the same partition before the Sorter
and Aggregator transformations process the rows.
You can delete the default partition point at the
Aggregator transformation.
Target - Since the target tables are partitioned by
key range, specify key range partitioning at the
target to optimize writing data to the target
Use multiple CPUs.
Optimizing the Source Database for Partitioning
You can add partitions to increase the speed of the
query.
Usually, each partition on the reader side represents
a subset of the data to be processed.
Complete the following tasks to optimize the source
database for partitioning:
Tune the database - If the database is not tuned
properly, creating partitions may not make sessions
quicker.
Enable parallel queries - Some databases may
have options that must be set to enable parallel
queries. Check the database documentation for
these options. If these options are off, the
Integration Service runs multiple partition SELECT
statements serially.
Separate data into different tablespaces - Each
database provides an option to separate the data
into different tablespaces. If the database allows it,
use the PowerCenter SQL override feature to provide
a query that extracts data from a single partition.
Group the sorted data - You can partition and
group source data to increase performance for a
sorted Joiner transformation.
Maximize single-sorted queries

Optimizing the Target Database for Partitioning


If a session contains multiple partitions, the
throughput for each partition should be the same
as the throughput for a single partition session.
If you do not see this correlation, then the database
is probably inserting rows into the database serially.
To ensure that the database inserts rows in parallel,
check the following configuration options in the
target database:
- Set options in the database to enable parallel
inserts. For example, set the db_writer_processes
option in an Oracle database and the max_agents
option in a DB2 database to enable parallel inserts.
Some databases may enable these options by default.
- Consider partitioning the target table. If possible,
have each pipeline partition write to a single database
partition; you can use a Router transformation to do this.
Also, have the database partitions on separate disks
to prevent I/O contention among the pipeline
partitions.
- Set options in the database to enhance database
scalability. For example, disable archive logging and
timed statistics in an Oracle database to enhance
scalability
PERFORMANCE COUNTERS OVERVIEW
All transformations have counters. The Integration
Service tracks the number of input rows, output
rows, and error rows for each transformation.
Some transformations have performance counters.
You can use the following performance counters to
increase session performance:
- Errorrows
- Readfromcache and Writetocache
- Readfromdisk and Writetodisk
- Rowsinlookupcache
Errorrows Counter
Transformation errors impact session performance. If
a transformation has large numbers of error rows in
any of the Transformation_errorrows counters, you
can eliminate the errors to improve performance.

Readfromcache and Writetocache Counters


If a session contains Aggregator, Rank, or Joiner
transformations, examine the
Transformation_readfromcache and
Transformation_writetocache counters along with
the Transformation_readfromdisk and
Transformation_writetodisk counters to analyze how
the Integration Service reads from or writes to disk.
To analyze the disk access, first calculate the hit or
miss ratio.
The hit ratio indicates the number of read or write
operations the Integration Service performs on the
cache.
The miss ratio indicates the number of read or write
operations the Integration Service performs on the
disk.
Use the following formula to calculate the cache
miss ratio:
[(# of reads from disk) + (# of writes to disk)]/[(# of
reads from memory cache) + (# of writes to
memory cache)]
Use the following formula to calculate the cache hit
ratio:
[1 - Cache Miss ratio]
To minimize reads and writes to disk, increase the
cache size. The optimal cache hit ratio is 1.
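
The two formulas expressed as a small sketch; the counter values are hypothetical, so substitute the Transformation_readfromdisk/writetodisk and Transformation_readfromcache/writetocache values from the session performance details.

def cache_miss_ratio(reads_disk, writes_disk, reads_cache, writes_cache):
    return (reads_disk + writes_disk) / (reads_cache + writes_cache)

def cache_hit_ratio(reads_disk, writes_disk, reads_cache, writes_cache):
    return 1 - cache_miss_ratio(reads_disk, writes_disk, reads_cache, writes_cache)

# Example: 20 disk reads + 30 disk writes vs. 900 cache reads + 1,100 cache writes.
print(cache_hit_ratio(20, 30, 900, 1_100))   # 0.975 -> close to the optimal ratio of 1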

Readfromdisk and Writetodisk Counters


If a session contains Aggregator, Rank, or Joiner
transformations, examine each
Transformation_readfromdisk and
Transformation_writetodisk counter.
If these counters display any number other than
zero, you can increase the cache sizes to improve
session performance.
The Integration Service uses the index cache to
store group information and the data cache to store
transformed data, which is typically larger.
Therefore, although both the index cache and data
cache sizes affect performance, you may need to
increase the data cache size more than the index
cache size.
However, if the volume of data processed is greater
than the memory available, you can increase the
index cache size to improve performance.


For example, the Integration Service uses 100 MB to
store the index cache and 500 MB to store the data
cache.
With 200 randomly distributed accesses on each of
the index and data caches, you can configure the
cache in the following ways:
- To optimize performance, allocate 100 MB to the
index cache and 200 MB to the data cache. The
Integration Service serves 100 percent of the index
accesses and 40 percent of the data accesses from
memory. Every index access hits the cache, but 120
of the 200 data accesses go to disk. Therefore, 70
percent of all accesses are served from the caches.
- Allocate 50 MB to the index cache and 250 MB to
the data cache. The Integration Service serves 50
percent of the index accesses and 50 percent of the
data accesses from memory, so 100 accesses miss
each cache. Therefore, only 50 percent of all
accesses are served from the caches.
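
The same example worked as arithmetic, assuming each cache serves the fraction of accesses equal to the fraction of its required size that fits in memory.

def fraction_served_from_cache(allocated_mb, required_mb):
    return min(allocated_mb / required_mb, 1.0)

accesses_per_cache = 200                       # from the example above
required_index_mb, required_data_mb = 100, 500

for index_mb, data_mb in ((100, 200), (50, 250)):
    index_hit = fraction_served_from_cache(index_mb, required_index_mb)
    data_hit = fraction_served_from_cache(data_mb, required_data_mb)
    served = accesses_per_cache * (index_hit + data_hit)
    total = accesses_per_cache * 2
    print(f"index={index_mb} MB, data={data_mb} MB -> {served / total:.0%} served from cache")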
If the session performs incremental aggregation, the
Integration Service reads historical aggregate data
from the local disk during the session and writes to
disk when saving historical data.
As a result, the Aggregator_readfromdisk and
Aggregator_writetodisk counters display numbers
other than zero.
However, since the Integration Service writes the
historical data to a file at the end of the session,
you can still evaluate the counters during the
session.
If the counters show numbers other than zero during
the session run, you can tune the cache sizes to
increase performance.
However, there is a cost associated with allocating
and deallocating memory, so if you know the volume
of data the Integration Service will process, do not
increase the cache sizes beyond what that volume
requires.
Rowsinlookupcache Counter
Multiple lookups can decrease session performance.
To improve session performance, tune the lookup
expressions for the larger lookup tables.


