
PIPELINE PARTITIONING

In addition to a good ETL design, the session itself must be optimized and free of bottlenecks to get the best session performance.

After tuning the session, we can improve performance further by exploiting under-utilized hardware power through parallel processing.

What is Session Partitioning

The Informatica PowerCenter Partitioning Option increases the performance of PowerCenter through parallel data processing.

The Partitioning Option lets you split a large data set into smaller subsets that can be processed in parallel, giving better session performance.

Partitioning Terminology
Partition : A partition is a subset of the data that executes in a single thread.
Number of Partitions : We can divide the data set into smaller subsets by increasing the number of partitions. When we add partitions, we increase the number of processing threads, which can improve session performance.
Partition Point : A partition point is the boundary between two stages and divides the pipeline into stages. A partition point is always associated with a transformation.
Partition Type : A partition type is an algorithm for distributing data among partitions and is always associated with a partition point. The partition type controls how the Integration Service distributes data among partitions at partition points.

Types of Session Partitions

Pass-through Partitioning : In this type of partitioning, the Integration Service passes all rows from one partition point to the next partition point without redistributing them.
Key Range Partitioning : With this type of partitioning, you specify one or more ports to form a compound partition key for a source or target. The Integration Service then passes data to each partition depending on the ranges you specify for each port.
Round-Robin Partitioning : Using this partitioning algorithm, the Integration Service distributes data evenly among all partitions. Use round-robin partitioning when you need to distribute rows evenly and do not need to group data among partitions (see the sketch after this list).
Hash Auto-Keys Partitioning : The Integration Service uses a hash function to group rows of data among partitions. When hash auto-keys partitioning is used, the Integration Service uses all grouped or sorted ports as a compound partition key. You can use hash auto-keys partitioning at or before Rank, Sorter, and unsorted Aggregator transformations to ensure that rows are grouped properly before they enter these transformations.
Hash User-Keys Partitioning : The Integration Service uses a hash function to group rows of data among partitions based on a user-defined partition key.
Database Partitioning : The Integration Service queries the database system for table partition information. It reads partitioned data from the corresponding nodes in the database.
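To make these distribution rules concrete, here is a minimal Python sketch. It is not Informatica code, only an illustration of the routing logic; the port name PRODUCT_LINE_ID and the partition count are assumptions, and Python's built-in hash() merely stands in for whatever hash function the Integration Service uses.

from itertools import count

NUM_PARTITIONS = 3
_rr = count()

def route_pass_through(row, current_partition):
    # Pass-through: the row stays in whatever partition it already occupies.
    return current_partition

def route_round_robin(row):
    # Round-robin: rows are dealt out in turn, keeping partitions evenly loaded.
    return next(_rr) % NUM_PARTITIONS

def route_hash(row, key_ports=("PRODUCT_LINE_ID",)):
    # Hash keys: rows with the same key always land in the same partition,
    # which keeps sort and aggregation groups together.
    return hash(tuple(row[p] for p in key_ports)) % NUM_PARTITIONS

The point of the sketch is that each partition type is just a different rule for answering the same question: given a row arriving at a partition point, which of the n parallel threads should process it next.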

We can invoke the user interface for session partitioning using the menu Mapping -> Partitions.

Choose any transformation from the mapping, and the "Delete Partition Point" or "Edit Partition Point" button will let you modify its partition points.

The "Add/Delete/Edit Partition Point" opens up an additional window which let


you modify the partition and choose the type of the partition algorithm

Informatica PowerCenter Session Partitioning can be used effectively for parallel data processing to achieve faster data delivery.

Parallel data processing performance depends heavily on the additional hardware power available. In addition to that, it is important to choose the appropriate partitioning algorithm or partition type.

Business Use Case


Daily sales data generated from three sales regions needs to be loaded into an Oracle data warehouse.
The sales volume from the three regions varies widely, so the number of records processed for each region also varies widely.
The warehouse target table is partitioned by product line.

Pass-through Partition
A pass-through partition at the Source Qualifier transformation is used to split the source data into three data sets that are processed in parallel. The image below shows how to set up the pass-through partition for the three sales regions.

Once the partitions are set up at the Source Qualifier, you get an additional Source Filter option to restrict the data that corresponds to each partition. Be sure to provide the filter conditions such that the same data is not processed through more than one partition and no data is duplicated.
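As an illustration of that rule, assume a hypothetical SALES_REGION port with the values 'NORTH', 'SOUTH', and 'WEST'. The sketch below (Python, not the actual Source Filter syntax) writes the three filters as predicates and checks the key property: every row matches exactly one partition, so nothing is duplicated and nothing is lost.

# Hypothetical source filter conditions, one per pass-through partition.
# In the session these would be entered as the Source Filter of each
# partition on the Source Qualifier; here they are written as predicates.
SOURCE_FILTERS = {
    1: lambda row: row["SALES_REGION"] == "NORTH",
    2: lambda row: row["SALES_REGION"] == "SOUTH",
    3: lambda row: row["SALES_REGION"] == "WEST",
}

def check_filters(rows):
    """Every row must match exactly one filter: no overlaps, no gaps."""
    for row in rows:
        matches = [p for p, pred in SOURCE_FILTERS.items() if pred(row)]
        assert len(matches) == 1, f"row {row} matched partitions {matches}"

sample = [{"SALES_REGION": r} for r in ("NORTH", "SOUTH", "WEST", "NORTH")]
check_filters(sample)  # passes: each row is read by exactly one partition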

Round Robin Partition


Since the data volume from the three sales regions is not the same, use the round-robin partition algorithm at the next transformation in the pipeline, so that the data is distributed evenly among the three partitions and the processing load is balanced.

Hash Auto Key Partition


At the Aggregator transformation, the data needs to be redistributed across the partitions to avoid splitting aggregator groups. The hash auto-keys partition algorithm makes sure the data from the different partitions is redistributed such that records with the same key end up in the same partition. This algorithm identifies the keys based on the group-by ports defined in the transformation.
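To see why this matters, the sketch below uses hypothetical rows and Python's built-in hash() as a stand-in for the Integration Service's hash function. Because identical group keys always hash to the same partition, a group is never split, so each partition can compute a complete aggregate for the groups it owns.

from collections import defaultdict

NUM_PARTITIONS = 3

rows = [
    {"PRODUCT_LINE_ID": 10, "AMOUNT": 100.0},
    {"PRODUCT_LINE_ID": 20, "AMOUNT": 50.0},
    {"PRODUCT_LINE_ID": 10, "AMOUNT": 75.0},   # same group key as the first row
    {"PRODUCT_LINE_ID": 30, "AMOUNT": 25.0},
]

# Hash on the Aggregator's group-by port: identical keys -> identical partition,
# so each group's SUM is computed once, inside a single partition.
partitions = defaultdict(list)
for row in rows:
    partitions[hash(row["PRODUCT_LINE_ID"]) % NUM_PARTITIONS].append(row)

for p, part_rows in partitions.items():
    sums = defaultdict(float)
    for row in part_rows:
        sums[row["PRODUCT_LINE_ID"]] += row["AMOUNT"]
    print(f"partition {p}: {dict(sums)}")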

Key Range Partition


Use key range partitioning when you need to distribute records among partitions based on a range of values of one or more ports.
Here the target table is range partitioned on product line. Create a key range partition on the target definition on the PRODUCT_LINE_ID port to get the best write throughput.

Now give the start and end of the value range for each partition, as shown below.
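For example, if PRODUCT_LINE_ID values ran from 1 to 300 (purely hypothetical numbers), the start/end ranges for the three target partitions could look like this sketch. Ideally the ranges mirror the Oracle table's own partition boundaries so each writer thread hits a single table partition.

# Hypothetical start/end key ranges for the three target partitions.
# Ranges are half-open [start, end) so adjacent partitions never overlap.
KEY_RANGES = {
    1: (1, 101),
    2: (101, 201),
    3: (201, 301),
}

def target_partition(product_line_id):
    for partition, (start, end) in KEY_RANGES.items():
        if start <= product_line_id < end:
            return partition
    raise ValueError(f"PRODUCT_LINE_ID {product_line_id} is outside all ranges")

print(target_partition(150))  # -> 2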

You cannot create partition points for the following transformations:

Source definition
Sequence Generator
XML Parser
XML target
Unconnected transformations
