
Using Pushdown Optimization

What is PDO?
Pushdown optimization can take your mapping and turn it into a SQL statement that will
be executed against either the Source or Target database.
PDO differs from, say, a SQL override in a Source Qualifier because it keeps the
mapping logic at eye level (you still see all the transformations at a glance) while pushing
the work out to a database engine. PDO thus keeps the mapping's metadata intact for data
lineage, facilitating maintenance and reusability.
PDO is set up at the session level, although many aspects of the underlying mapping will
decide how PDO behaves.

When should you consider using PDO?


The best-case scenario is when source and target data already reside on the same database
(or databases that are accessible by the same user on the same system, such as the STG
and CORE databases in EDW). It is then possible to implement an Extract, Load and
Transform pattern rather than a regular Extract, Transform and Load. The obvious
advantage is that your data, once loaded on the target system, will not have to leave it. It
will be processed in place by the database engine rather than being lugged around by
PowerCenter to be processed on its server and having to be written back to the database.
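To make the contrast concrete, here is a sketch of what the ELT pattern amounts to. The schema and table names (STG_DB, CORE_DB, STG_CUSTOMER, DIM_CUSTOMER) are hypothetical, not from any real EDW schema: the transformation runs as one set-based statement inside the database, so no rows ever travel through the PowerCenter server.

```sql
-- Hypothetical illustration of the ELT pattern: data already staged
-- is transformed and loaded into the core table in one statement.
INSERT INTO CORE_DB.DIM_CUSTOMER (CUSTOMER_ID, CUSTOMER_NAME, COUNTRY_CODE)
SELECT
    STG.CUSTOMER_ID,
    UPPER(TRIM(STG.CUSTOMER_NAME)),    -- expression logic pushed to the engine
    COALESCE(STG.COUNTRY_CODE, 'XX')   -- default handling pushed to the engine
FROM STG_DB.STG_CUSTOMER STG;
```

With a classic ETL session, PowerCenter would instead read every row out of the staging table, apply the expressions on its own server, and write the rows back, tripling the data movement.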
Another possible scenario is when your mapping joins tables from the same database,
maybe performing some filtering, outer joins or even aggregation on the source data. In
that case, instead of writing your own SQL overrides in Source Qualifiers, why not lay it
down more visibly in a mapping, using plain transformations, and let PowerCenter
generate the SQL for you at the session level? This way, you can also see if PDO (or a
SQL override for that matter) is really more efficient than PowerCenter processing.

Source vs. Target vs. Full Optimization


PowerCenter provides 3 types of pushdown optimizations.

To Source
PowerCenter interprets your mapping starting from the Source Qualifier and converts as
much as it can into a SELECT statement that is then pushed to the source database.
Transformations that cannot be pushed (see limitations below) will be processed by
PowerCenter.
In this situation, PowerCenter handles the load into the target, and some of the data may
have to be processed by PowerCenter as well.
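As a hypothetical illustration (the table and column names are invented), a mapping with a Filter and an Expression transformation downstream of the Source Qualifier might be converted to a single SELECT like the following, leaving only the non-pushable transformations to PowerCenter:

```sql
-- The Filter condition becomes a WHERE clause; Expression logic becomes
-- derived columns. PowerCenter reads this result set instead of raw rows.
SELECT
    ORDER_NO,
    ORDER_AMT * 1.1 AS ADJUSTED_AMT   -- Expression transformation, pushed down
FROM STG_DB.STG_ORDERS
WHERE ORDER_STATUS = 'OPEN'           -- Filter transformation, pushed down
```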

To Target
PowerCenter reads the mapping from the target(s) this time and turns what it can into
INSERT or UPDATE or DELETE statement(s), depending on your target settings and
update strategy choices.
This is typically less useful, as PowerCenter places a lot more limitations on which
transformations can be pushed to the target.

Full
If every piece of data is accessible from the same connection (or connections that share
systems and users) and if every transformation can be pushed down, PowerCenter
generates one INSERT INTO SELECT statement. Clearly the best scenario, as the data is
left entirely on the system on which it is loaded and the database engine processes
everything. However, keep in mind that the database engine processes the statement in a
single transaction (consider the resources used, table locking and recovery aspects).
If source and target data reside on different systems, PowerCenter works its way through
your mapping from the source and from the target at the same time, generating a
SELECT statement as well as an INSERT/UPDATE/DELETE statement. The data still
has to transit through PowerCenter to be transferred from the source system to the target
system.
If some transformations cannot be pushed down, PowerCenter will process them but
leave the rest to the source or target database engines.

Limitations
The main drawback of PDO is that it has to be able to convert mapping functionality into
standard ANSI SQL (when using TERADATA and ODBC drivers; you have fewer
limitations when using Oracle and native drivers, since PDO can make use of Oracle-specific
SQL syntax).
What PDO will not do, however, is create database stored procedures or macros for your
fancy processing.
So, you'll have to work around quite a few limitations and possibly bastardize your
mapping to fit the mold. How much change is too much, that's for you to decide, but a
word of advice: make a copy of your original mapping and session before you start
adapting it for PDO. This road can be long and rocky.

General considerations and limitations

System dates (SYSDATE), when pushed down, will use the database system's
date, not the Informatica server's system date. If the Informatica server is in D.C.
and the database server in CA, you'll get different results unless these systems use
a common time zone.
Sort order and case sensitivity may be different on the Informatica server and the
database server.
Error log events are not generated for the processing done by a database engine,
making it much harder to debug.
Source rows read statistics may be missing when using full pushdown.
Dimensional update mappings are very hard to push down on TERADATA. We'll
have an example below, but it is not really practical. You may have better luck on
Oracle, where you can use mappings generated by the slowly changing dimension
wizard for type 1 and 3 (not type 2!) or a similar template.

Mapping level limitations


As a rule of thumb, try to keep the logic in as few data streams as possible; don't do too
much branching and merging. For instance, PDO doesn't like multiple lookups feeding
into a router's input group.
With TERADATA, don't try target-side pushdown if your target has DATE fields. Source-side
and full pushdown do support date fields in TERADATA targets (see more
TERADATA-specific limitations below).

Source Qualifier:
o SQL overrides are supported via temporary views (see session setup
below). The connection user must have create and drop view privileges on
the active database.
o Use ANSI outer join syntax: SELECT ... FROM LEFT_TABLE [LEFT|
RIGHT|FULL OUTER] JOIN RIGHT_TABLE ON ...
o No ORDER BY clause in overrides targeting TERADATA (OK with
Oracle)
o The override must be valid; PDO does NOT validate it
o You should qualify the table names in outer joins within a SQL override
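For example, an override that follows these rules might look like the sketch below (the schema and table names are hypothetical): ANSI join syntax, schema-qualified table names, and no ORDER BY clause.

```sql
-- A PDO-friendly Source Qualifier override: ANSI outer join syntax,
-- schema-qualified table names, no ORDER BY.
SELECT S.ORDER_NO, S.ORDER_AMT, C.CUSTOMER_NAME
FROM MY_STG.STG_ORDERS S
LEFT OUTER JOIN MY_STG.STG_CUSTOMERS C
    ON S.CUSTOMER_NO = C.CUSTOMER_NO
```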
Lookups:
o Attach multiple lookups in series, not in parallel. PDO does not support
multiple lookups if you branch out, do the lookups, then merge back
o PDO will do an outer join, un-cached of course
o Avoid unconnected lookups
o No pipeline lookups
o No dynamic cache
o Must be set to report an error on multiple matches (not Use Any Value,
which is the default)
o SQL override is supported (through a temporary view, see session setup)
but ORDER BY in the override is NOT supported
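Conceptually, a pushed-down connected lookup ends up as an outer join in the generated SQL; here is a sketch with invented table and column names:

```sql
-- What a pushed-down lookup roughly becomes: the lookup table is
-- LEFT OUTER JOINed to the source stream, so rows with no match
-- still come through with NULL lookup ports (hence "un-cached").
SELECT SRC.ORDER_NO, SRC.CUSTOMER_NO, LKP.CUSTOMER_NAME
FROM MY_STG.STG_ORDERS SRC
LEFT OUTER JOIN MY_CORE.D_CUSTOMER LKP
    ON SRC.CUSTOMER_NO = LKP.CUSTOMER_NO
```

This is also why multiple matches must raise an error: an outer join returns every matching row, which is not what a lookup set to "use any value" does.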
Joiner:
o Both detail and master sides must be push-able (homogeneous joins only:
no joins with flat files or sources from different database systems)
o No multi-table joins inside either Source Qualifier if you have the join
property in the Joiner set to an outer join type (Detail outer, Master outer
or Full outer join)
Aggregator:
o Pass-through ports are not supported. Make them part of your GROUP BY
ports, or break each one into input and output ports and attach a supported
aggregate function to the output.
o No conditional clause, i.e. SUM(port, <condition>)
o FIRST, LAST, MEDIAN and PERCENTILE functions are not supported
Rank transformation is not supported
Functions without a SQL equivalent cannot be pushed down, such as:
o ABORT, ERROR
o Encryption/decryption/compression/decompression functions (MD5,
AES_DECRYPT, COMPRESS, ...)
o DATE_DIFF
o IS_NUMBER, IS_DATE, IS_SPACES
o Regular expression functions, such as REG_EXTRACT or REG_MATCH
o For a complete list of supported and unsupported functions on any
particular database, see the online help (Advanced Workflow Manager
Guide > Pushdown > Working With Expressions)
o In general, you'll find the greatest support for Oracle using Oracle-specific
SQL syntax
No local variables (in Expression or Aggregator transformations)
No default value overrides (using the Default Value field to change an incoming
NULL to a given value or change the output of a failed output port function)
No user-defined functions
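The Aggregator pass-through restriction mirrors plain SQL: a column in the SELECT list of a grouped query must either appear in the GROUP BY or be wrapped in an aggregate function. A sketch of the two workarounds, using hypothetical table and column names:

```sql
-- Workaround 1: make the pass-through port part of the GROUP BY.
SELECT DEPT_NO, REGION, SUM(SALARY) AS TOTAL_SALARY
FROM MY_STG.STG_EMPLOYEES
GROUP BY DEPT_NO, REGION;

-- Workaround 2: wrap the pass-through port in a supported aggregate
-- function (here MAX), keeping the GROUP BY unchanged.
SELECT DEPT_NO, MAX(REGION) AS REGION, SUM(SALARY) AS TOTAL_SALARY
FROM MY_STG.STG_EMPLOYEES
GROUP BY DEPT_NO;
```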

Session level limitations and considerations

Active and passive databases:

o PDO divides the work into active and passive (or idle) databases. The
active database is the one that processes transformation logic. For
instance, if you have a lookup transformation and use source-side
pushdown, the database targeted by the lookup is passive, and the database
targeted by the Source Qualifier is active. When using full pushdown, the
target database is the active one.
Connections:
o Connections must be compatible between active and passive databases for
PDO to work, i.e. the active database user must be able to read from the
passive database.
o On TERADATA, connections are compatible when they use the same user
name (e.g. PCMT_EDW_DEV_ETL), password, data source name (e.g.
usto-ddbx-edw01), code page and environment SQL.
o If user names and passwords are different, PDO can log in to the passive
database using the active database user credentials (see session setup
below). This requires the active database user to have read access to the
passive database, and you need to qualify the table names used by the
passive databases (as in PCMT_DEV_STG.STG_ASSESSMENT)
Target load switch Update Else Insert is not supported with full pushdown

Limitations Specific to TERADATA systems

No DATE fields in targets when pushing to the target database (source side or full
pushdown OK)
A pushdown session must use ODBC connections. At this time, TPT connections
are not allowed.
There is a run-time issue pushing date-time values (such as SESSSTARTTIME)
to time(0) target fields (KB 105651):

Avoid Sorter transformations


No Lookup transformations with target side pushdown
No date to string conversions
Sequence Generator is not supported (OK on Oracle through temporary
sequences)
No updates (update strategy transformation or treat row as update in the session)
if you have Lookups or Joiners in the mapping

When pushing Aggregators grouping by one or more string ports, make sure case
sensitivity is turned on in TERADATA for these columns, as the Aggregator in
PowerCenter is case-sensitive.
Very high precision (>18) numerical conversion is not supported.
No HH24 token or spaces in TO_CHAR(DATE) or TO_DATE() format strings.
PDO converts the former to HH and the latter to B.
The string concatenation operator || is supported by source side, but not by full
pushdown
ASCII conversion functions such as CHR(), ASCII(), CHRCODE() are not
supported
No ORDER BY clauses in SQL overrides

PDO Session Setup and Interface


Setting up the session for PDO is easy.
First, go to the Properties tab, under the Performance node, and select the type of
PDO you would like: To Source, To Target or Full. You can also specify a
parameter to drive this option.
Second, if you have SQL overrides in Source Qualifiers or Lookups, you must
select Allow Temporary View for Pushdown. Ignore the Temporary
Sequence option on TERADATA, it is not supported.
Third, if you plan on using connections to the same system and data source name
but with different users and passwords, turn on Allow Pushdown for User
Incompatible Connections. The user of the active database must then have read
privileges into the idle databases:

Then go to the Mapping tab and make sure the relational connections for all the
objects you wish to push down use compatible connections. Select connections for
your Lookups as well; PDO doesn't like the $Source and $Target aliases:

Once you have done that, click on the Pushdown Optimization link or item in the
navigator. Be patient, it may take a while for PowerCenter to respond.
The PDO interface shows what can be done for your mapping. Working from
the error messages displayed in the bottom pane, you can then modify your
mapping to comply with PDO rules if appropriate.


An Example
Mapping specifics
A simple mapping that transfers data from stage into core, using a pair of lookups and an
expression transformation

Session specifics
The original session uses compatible connections, sourcing from STG and targeting
CORE
The lookups use the variable $Source pointing to STG

Making full pushdown optimization work


Under the session's properties, turn on PDO, using the Full option

In the mapping tab, click on the PDO link to bring up the interface

The screen shows connection errors due to the use of the $Source variable in the lookups
PDO can preview results by temporarily setting the $Source or $Target variable. Just
click on the <Preview Results for Connection> button and pick a replacement connection

After selecting a STG source connection, PDO still shows errors, now about branching
and date fields.
This is a common pattern; implementing PDO often requires many iterations and round
trips between the designer and workflow manager.

Now, let's first change the $Source variable to a real connection for our lookups

Then, we will remove the lookup branches in the mapping. That's a bit of work and
means passing unrelated data through the lookups via pass-through ports. Not a clean
design.

Back in the workflow manager, refresh and save the mapping and try PDO again
Now, it complains about the policy on multiple matches for lookups:

In designer, change the policy to report an error on multiple matches, for both lookups.
The default is to use any value when you create a lookup in v8.6

In workflow manager, after having refreshed and saved the session again, we finally see
that PDO is able to generate a full pushdown statement

You can change the pushdown option (preview) in the interface:


To Source:

To Target, this fails because of the date fields in our target:

Running the session, we see the SQL executed in the session log (OPTM messages) but
not much else:

No read information in the run properties either:

Verdict: session performance improves from the original 50 secs to about 9 secs on
average, a sizeable gain.

A Slowly Changing Dimension (sort of!) Example


Mapping specifics
The starting point of this example is a regular type 1 dimensional update mapping.
Source data is in STG, the target in CORE. The mapping simply reads the source data
and determines if a row is new or changed by looking up the target. If a row is new, a
new surrogate key is generated, based on the maximum value of the PK from the
previous run, and the row is inserted. If the row has changed, it is updated, based on the
existing primary key:

Session specifics
Run of the mill settings, with server connections to STG and CORE assigned to our
source and target. The target lookups use the $Target variable.

Trying to make Full pushdown optimization work


This mapping presents a particular challenge on TERADATA for several reasons:

PDO on TERADATA does not support any kind of updates unless there are no
Lookup and no Joiner transformations in the mapping
o You will have to use SQL overrides to compare source to target
Sequence Generators aren't supported either. So, if you can't look up the target for
the last PK and you can't use sequence generators, how do you generate a
surrogate key? Some options:
o Use identity columns. They are a bit erratic on TERADATA, sometimes
leaving big gaps between values
o Use SQL: generate unique row numbers for your source rows and add that
to the largest PK generated by the previous run (also obtained from SQL).
This can work, but it also leaves gaps in your sequences (although smaller
than with identity columns) if a row is updated instead of inserted.
o Break the process into two components, leaving the comparing and the PK
generation to the first process and the load to the second process. This can
work too, but it's a lot of work, not very efficient, and requires an extra
staging table. Furthermore, PDO can only be achieved on the second
process (the insert/update process)
o This example will take a look at the SQL option
When we use SQL overrides, PDO creates views, and the connection user needs
privileges to create and drop these views, probably requiring the mapping to run
within the WORK area.

We will now gut this mapping, removing all lookups and PK generating logic and
replacing all that with a SQL override in the Source Qualifier.
One option for generating a row number on TERADATA with SQL is:
SELECT sum(1) over( rows unbounded preceding ) AS
<row_number_col>
To find out if a row exists or not in the target, we need to perform an outer join
where we take all the rows from the source and the matching data from the
target, as in:
o SELECT <all fields from source>, <natural key fields from target>
FROM SOURCE LEFT OUTER JOIN TARGET on <natural key fields>
We also need the maximum primary key value currently in the target to be
added to every row. This can be achieved by performing a Cartesian join
between the output of the outer join above and this query:
o SELECT MAX(<PK_COLUMN>) AS CURRENT_MAX_KEY FROM
<CORE_USER.TARGET_TABLE>

The complete SQL for the example mapping is shown here (there are probably more
efficient ways to do this):
SELECT * FROM
(SELECT
  SUM(1) OVER (ROWS UNBOUNDED PRECEDING) AS INSERT_NUM,
  STG.ASSESSMENT_NO,
  STG.ISSUE_NO,
  STG.COMPLAINT_NO,
  STG.ASSESSMENT_STATUS,
  STG.ASSESSMENT_TYPE,
  STG.ASSESSMENT_SITE,
  STG.ASSESSMENT_DEPT,
  STG.INSTRUCTIONS,
  STG.REFERENCED_ASSESSMENT,
  STG.ASSESSOR_LOGIN,
  STG.ASSESSOR_NAME,
  STG.ACKNOWLEDGED_BY,
  STG.ACKNOWLEDGED_DATE,
  STG.ASSESSMENT_TEXT,
  STG.REWORK_REQUIRED,
  STG.ASSESSMENT_RECEIVED_DATE,
  STG.ASSESSED_BY_LOGIN,
  STG.ASSESSED_BY_NAME,
  STG.ASSESSED_DATE,
  STG.APPROVED_BY_LOGIN,
  STG.APPROVED_BY_NAME,
  STG.APPROVED_DATE,
  STG.CREATED_BY,
  STG.CREATED_DATE,
  STG.UPDATED_BY,
  STG.UPDATED_DATE,
  STG.DATE_FORW_TO_EXT_ASSESSOR,
  STG.DATE_ACKN_BY_EXT_ASSESSOR,
  STG.DATE_REWORK_CLAR_RECEIVED,
  DIM1.ASSESSMENT_KEY,
  DIM1.CREATE_DT,
  DIM1.CREATE_BY
FROM
  EIRT_DEV_STG.STG_ASSESSMENT STG
  LEFT OUTER JOIN EIRT_DEV_CORE.D_ASSESSMENT_PDO DIM1
    ON STG.ASSESSMENT_NO = DIM1.CMIS_ASSESSMENT_NO) A,
(SELECT MAX(DIM2.ASSESSMENT_KEY) AS CURRENT_MAX_KEY
 FROM EIRT_DEV_CORE.D_ASSESSMENT_PDO DIM2) B
WHERE 1=1
Note that tables in the override are qualified with the user (schema) name. This is
required.
Now, all that is left is to flag the new rows for insert and to assemble their primary key
from the CURRENT_MAX_KEY and INSERT_NUM fields:
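In SQL terms, that flagging and key assembly amount to something like the sketch below. Note this is only an illustration: in the actual mapping, the logic lives in an Expression transformation and an Update Strategy, not in literal SQL.

```sql
-- Sketch only: the mapping expresses this logic with an Expression and
-- an Update Strategy transformation rather than SQL.
SELECT
    CASE
        WHEN ASSESSMENT_KEY IS NULL                         -- no match in target: new row
            THEN COALESCE(CURRENT_MAX_KEY, 0) + INSERT_NUM  -- assemble new surrogate key
        ELSE ASSESSMENT_KEY                                 -- existing row: keep its key
    END AS TARGET_KEY,
    CASE WHEN ASSESSMENT_KEY IS NULL THEN 'I' ELSE 'U' END AS ROW_FLAG
FROM ...
```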

Here is a view of the new mapping:

Let's go to the workflow manager to set up the session. Turn on Full pushdown in the
properties, as well as Allow Temporary View for Pushdown:

Taking a look at the pushdown SQL, we note the multiple SQL groups: the first one to
handle the temporary views and the update, the second for the insert:

Verdict: not so good this time. What is the point of using PDO if one has to write one's
own SQL? The resulting mapping is much too far from the original, and maintainability is
pretty much lost.
