
PIPELINE PARTITIONING

In addition to a good ETL design, the session itself must be optimized and free of bottlenecks to get the best session performance.

After tuning the session, we can improve performance further by exploiting under-utilized hardware power through parallel processing.

What is Session Partitioning

The Informatica PowerCenter Partitioning Option increases the performance of PowerCenter through parallel data processing.

The Partitioning Option lets you split a large data set into smaller subsets that can be processed in parallel, giving better session performance.

Partitioning Terminology
Partition : A partition is a subset of the data that executes in a single thread.
Number of Partitions : We can divide the data set into smaller subsets by increasing the number of partitions. When we add partitions, we increase the number of processing threads, which can improve session performance.
Partition Point : A partition point is the boundary between two stages and divides the pipeline into stages. A partition point is always associated with a transformation.
Partition Type : A partition type is an algorithm for distributing data among partitions and is always associated with a partition point. The partition type controls how the Integration Service distributes data among partitions at partition points.

Types of Session Partitions

Pass-through Partitioning : In this type of partitioning, the Integration Service passes all rows from one partition point to the next partition point without redistributing them.
Key Range Partitioning : With this type of partitioning, you specify one or more ports to form a compound partition key for a source or target. The Integration Service then passes data to each partition depending on the ranges you specify for each port.
Round-Robin Partitioning : Using this partitioning algorithm, the Integration Service distributes data evenly among all partitions. Use round-robin partitioning when you need to distribute rows evenly and do not need to group data among partitions (see the sketch after this list).
Hash Auto-Keys Partitioning : The Integration Service uses a hash function to group rows of data among partitions. When hash auto-keys partitioning is used, the Integration Service uses all grouped or sorted ports as a compound partition key. You can use hash auto-keys partitioning at or before Rank, Sorter, and unsorted Aggregator transformations to ensure that rows are grouped properly before they enter these transformations.
Hash User-Keys Partitioning : The Integration Service uses a hash function to group rows of data among partitions based on a user-defined partition key.
Database Partitioning : The Integration Service queries the database system for table partition information. It reads partitioned data from the corresponding nodes in the database.
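To make these distribution rules concrete, here is a minimal Python sketch. It is not Informatica code, only an illustration of the routing logic; the port name PRODUCT_LINE_ID and the partition count are assumptions, and Python's built-in hash() merely stands in for whatever hash function the Integration Service uses.

from itertools import count

NUM_PARTITIONS = 3
_rr = count()

def route_pass_through(row, current_partition):
    # Pass-through: the row stays in whatever partition it already occupies.
    return current_partition

def route_round_robin(row):
    # Round-robin: rows are dealt out in turn, keeping partitions evenly loaded.
    return next(_rr) % NUM_PARTITIONS

def route_hash(row, key_ports=("PRODUCT_LINE_ID",)):
    # Hash keys: rows with the same key always land in the same partition,
    # which keeps sort and aggregation groups together.
    return hash(tuple(row[p] for p in key_ports)) % NUM_PARTITIONS

The point of the sketch is that each partition type is just a different rule for answering the same question: given a row arriving at a partition point, which of the n parallel threads should process it next.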

We can invoke the user interface for session partitioning using the menu Mapping -> Partitions.

Choose any transformation from the mapping, and the "Delete Partition Point" or "Edit Partition Point" button will let you modify its partition points.

The "Add/Delete/Edit Partition Point" opens up an additional window which let


you modify the partition and choose the type of the partition algorithm

Informatica PowerCenter Session Partitioning can be used effectively for parallel data processing to achieve faster data delivery.

Parallel data processing performance depends heavily on the additional hardware power available. In addition to that, it is important to choose the appropriate partitioning algorithm or partition type.

Business Use Case


Daily sales data generated from three sales regions needs to be loaded into an Oracle data warehouse.
The sales volume from the three regions varies widely, so the number of records processed for each region also varies widely.
The warehouse target table is partitioned by product line.

Pass-through Partition
A pass-through partition at the Source Qualifier transformation is used to split the source data into three data sets that are processed in parallel. The image below shows how to set up the pass-through partition for the three sales regions.

Once the partitions are set up at the Source Qualifier, you get an additional Source Filter option to restrict the data that corresponds to each partition. Be sure to provide the filter conditions such that the same data is not processed through more than one partition and no data is duplicated.
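As an illustration of that rule, assume a hypothetical SALES_REGION port with the values 'NORTH', 'SOUTH', and 'WEST'. The sketch below (Python, not the actual Source Filter syntax) writes the three filters as predicates and checks the key property: every row matches exactly one partition, so nothing is duplicated and nothing is lost.

# Hypothetical source filter conditions, one per pass-through partition.
# In the session these would be entered as the Source Filter of each
# partition on the Source Qualifier; here they are written as predicates.
SOURCE_FILTERS = {
    1: lambda row: row["SALES_REGION"] == "NORTH",
    2: lambda row: row["SALES_REGION"] == "SOUTH",
    3: lambda row: row["SALES_REGION"] == "WEST",
}

def check_filters(rows):
    """Every row must match exactly one filter: no overlaps, no gaps."""
    for row in rows:
        matches = [p for p, pred in SOURCE_FILTERS.items() if pred(row)]
        assert len(matches) == 1, f"row {row} matched partitions {matches}"

sample = [{"SALES_REGION": r} for r in ("NORTH", "SOUTH", "WEST", "NORTH")]
check_filters(sample)  # passes: each row is read by exactly one partition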

Round Robin Partition


Since the data volume from the three sales regions is not the same, use the round-robin partition algorithm at the next transformation in the pipeline, so that the data is distributed evenly among the three partitions and the processing load is balanced.

Hash Auto Key Partition


At the Aggregator transformation, the data needs to be redistributed across the partitions to avoid splitting aggregator groups. The hash auto-keys partition algorithm makes sure the data from the different partitions is redistributed such that records with the same key end up in the same partition. This algorithm identifies the keys based on the group-by ports defined in the transformation.
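To see why this matters, the sketch below uses hypothetical rows and Python's built-in hash() as a stand-in for the Integration Service's hash function. Because identical group keys always hash to the same partition, a group is never split, so each partition can compute a complete aggregate for the groups it owns.

from collections import defaultdict

NUM_PARTITIONS = 3

rows = [
    {"PRODUCT_LINE_ID": 10, "AMOUNT": 100.0},
    {"PRODUCT_LINE_ID": 20, "AMOUNT": 50.0},
    {"PRODUCT_LINE_ID": 10, "AMOUNT": 75.0},   # same group key as the first row
    {"PRODUCT_LINE_ID": 30, "AMOUNT": 25.0},
]

# Hash on the Aggregator's group-by port: identical keys -> identical partition,
# so each group's SUM is computed once, inside a single partition.
partitions = defaultdict(list)
for row in rows:
    partitions[hash(row["PRODUCT_LINE_ID"]) % NUM_PARTITIONS].append(row)

for p, part_rows in partitions.items():
    sums = defaultdict(float)
    for row in part_rows:
        sums[row["PRODUCT_LINE_ID"]] += row["AMOUNT"]
    print(f"partition {p}: {dict(sums)}")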

Key Range Partition


Use key range partitioning when you need to distribute records among partitions based on a range of values of one or more ports.
Here the target table is range partitioned on product line. Create a key range partition on the target definition on the PRODUCT_LINE_ID port to get the best write throughput.

Now give the start and end of the value range for each partition, as shown below.
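For example, if PRODUCT_LINE_ID values ran from 1 to 300 (purely hypothetical numbers), the start/end ranges for the three target partitions could look like this sketch. Ideally the ranges mirror the Oracle table's own partition boundaries so each writer thread hits a single table partition.

# Hypothetical start/end key ranges for the three target partitions.
# Ranges are half-open [start, end) so adjacent partitions never overlap.
KEY_RANGES = {
    1: (1, 101),
    2: (101, 201),
    3: (201, 301),
}

def target_partition(product_line_id):
    for partition, (start, end) in KEY_RANGES.items():
        if start <= product_line_id < end:
            return partition
    raise ValueError(f"PRODUCT_LINE_ID {product_line_id} is outside all ranges")

print(target_partition(150))  # -> 2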

You cannot create partition points for the following transformations:

Source definition
Sequence Generator
XML Parser
XML target
Unconnected transformations
