Dynamic Partitioning in Informatica 8.X

Dynamic Partitioning
integration * intelligence * insight

AGENDA
High Availability
Grid Computing
Dynamic Partitioning

Introduction
PowerCenter Domains
• PowerCenter introduces a service-oriented architecture.
• PowerCenter introduces a domain, which serves as the primary unit of administration for the PowerCenter environment.
• A domain is a collection of nodes and services in the PowerCenter environment.
• The first time you install Informatica Services, you create a domain and add a node to the domain.

Administration Console
• The Administration Console is a browser-based utility that enables you to view domain properties and perform basic domain administration tasks.
• The Navigator displays the following types of objects:
• Domain. You can view one domain in the Administration Console.
• Node. A node represents a machine in the domain.
• Grid. Create a grid to run the Integration Service on multiple nodes.

High Availability
• High availability is the PowerCenter option that eliminates a single point of failure in the PowerCenter environment.
• High availability provides the following functionality:
• Resilience.
• Failover.
• Recovery.

The Partitioning Option
• The Partitioning Option increases PowerCenter’s performance through parallel data processing.
• When the Integration Service runs the session, it can achieve higher performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel.
• Partition types:
• Database partitioning.
• Hash auto-keys.
• Hash user keys.
• Key range.
• Pass-through.
• Round-robin.

Configuring Partitioning
• Create or edit a session.
• Update partitioning information using the Partitions view on the Mapping tab of session properties.
• Add, delete, or edit partition points on the Partitions view of session properties.

Configuring a Partition Point
• You can configure the following information when you edit or add a partition point:
• Specify the partition type at the partition point.
• Add and delete partitions.
• Enter a description for each partition.

Hash user keys
• The Integration Service uses a hash function to group rows of data among partitions.
• Choosing a numeric key can improve session performance, because the hash function usually processes numerical data more quickly than string data.
• With hash user keys, you specify the hash key yourself.
• Baseline: the sample mapping (m_orders_scd3), run without any partitioning configured, takes 37 seconds.

Hash user keys contd..
• With hash user keys partitioning, the session completes in 22 seconds, as shown in the figure below.
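The idea behind hash partitioning can be sketched in a few lines of Python. This is only an illustration of the technique, not the Integration Service's actual hash function; the `hash_partition` helper, the `customer_id` key, and the sample rows are all hypothetical.

```python
import zlib

def hash_partition(rows, key, num_partitions):
    """Assign each row to a partition based on a hash of its key value."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # crc32 is used because it is deterministic across runs
        # (Python's built-in hash() is salted for strings).
        index = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[index].append(row)
    return partitions

# Hypothetical sample data: several orders per customer.
orders = [
    {"order_id": 1, "customer_id": 101},
    {"order_id": 2, "customer_id": 102},
    {"order_id": 3, "customer_id": 101},
    {"order_id": 4, "customer_id": 103},
    {"order_id": 5, "customer_id": 102},
]

parts = hash_partition(orders, "customer_id", 3)
for number, part in enumerate(parts, start=1):
    print(f"partition {number}: {[row['order_id'] for row in part]}")
```

Because the partition is a pure function of the key, all rows that share a key value land in the same partition, which is what makes per-key processing safe to run in parallel.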

Key range partition
• With key range partitioning, the Integration Service distributes rows of data based on a port that you define as the partition key.
• You define a range of values for each partition.
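As a rough sketch of the routing logic, the Python below sends each row to the partition whose range contains the key value. The `key_range_partition` helper, the `item_id` port, and the range boundaries are hypothetical. Treating the start as inclusive and the end as exclusive, and raising an error for values outside every range, are assumptions of this sketch rather than documented PowerCenter behavior.

```python
def key_range_partition(rows, key, ranges):
    """Route rows to partitions by key range.

    ranges is a list of (start, end) pairs, one per partition.
    Assumption of this sketch: start is inclusive, end is exclusive.
    """
    partitions = [[] for _ in ranges]
    for row in rows:
        value = row[key]
        for index, (start, end) in enumerate(ranges):
            if start <= value < end:
                partitions[index].append(row)
                break
        else:
            # Sketch-only choice: reject values that match no range.
            raise ValueError(f"{key}={value} is outside every key range")
    return partitions

# Hypothetical item IDs and ranges.
items = [{"item_id": n} for n in (10, 2999, 3000, 4500, 6000, 8999)]
ranges = [(1, 3000), (3000, 6000), (6000, 9000)]

for number, part in enumerate(key_range_partition(items, "item_id", ranges), start=1):
    print(f"partition {number}: {[row['item_id'] for row in part]}")
```

Unlike hash partitioning, the balance between partitions here depends entirely on how well the chosen ranges match the actual distribution of key values.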

Key range partition contd..
• With key range partitioning, the session completes in 33 seconds.

Partition details
• Source/target statistics

Hash auto-keys
• Use hash auto-keys partitioning at or before Rank, Sorter, Joiner, and unsorted Aggregator transformations.
• The Integration Service distributes rows to each partition according to group before they enter the Sorter and Aggregator transformations.

Pass-Through Partition Type
• In pass-through partitioning, the Integration Service processes data without redistributing rows among partitions.
• Use it to increase data throughput without increasing the number of partitions.

Round-Robin Partition Type
• In round-robin partitioning, the Integration Service distributes rows of data evenly to all partitions.
• Example: a session reads item information from three flat files of different sizes:
• Source file 1: 80,000 rows
• Source file 2: 5,000 rows
• Source file 3: 15,000 rows
• When the Integration Service reads the source data, the first partition begins processing 80% of the data, the second partition processes 5% of the data, and the third partition processes 15% of the data.
• To distribute the workload more evenly, set a partition point at the Filter transformation and set the partition type to round-robin. The Integration Service distributes the data so that each partition processes approximately one-third of the data.
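The rebalancing effect can be checked with a small Python sketch. The `round_robin` helper is hypothetical, and the 5,000 and 15,000 row counts for the second and third files are inferred from the 5% and 15% shares, assuming 100,000 rows in total.

```python
def round_robin(rows, num_partitions):
    """Deal rows out to partitions in turn, like dealing cards."""
    partitions = [[] for _ in range(num_partitions)]
    for index, row in enumerate(rows):
        partitions[index % num_partitions].append(row)
    return partitions

# Row counts per source file; the last two are inferred (see lead-in).
source_sizes = [80_000, 5_000, 15_000]
total = sum(source_sizes)

# Before the round-robin partition point: one partition per source file.
before = [size / total for size in source_sizes]

# After the round-robin partition point: rows are redistributed evenly.
after_parts = round_robin(range(total), 3)
after = [len(part) / total for part in after_parts]

print("share before:", [f"{share:.0%}" for share in before])
print("share after: ", [f"{share:.1%}" for share in after])
```

The skew of the source files no longer matters downstream of the partition point, since each partition receives rows in strict rotation regardless of which file they came from.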

Dynamic Partitioning
• If the volume of data grows or you add more CPUs, you might need to adjust
partitioning so the session run time does not increase.
• When you use dynamic partitioning, you can configure the partition information so
the Integration Service determines the number of partitions to create at run time.
• The Integration Service scales the number of session partitions at run time based on
factors such as source database partitions or the number of nodes in a grid.

Configuring Dynamic Partitioning
• Configure dynamic partitioning using one of the following methods:
• Disabled. Do not use dynamic partitioning. The number of partitions is defined on the Mapping tab.
• Based on number of partitions. Sets the partitions to a number that you define in
the Number of Partitions attribute. Use the $DynamicPartitionCount session
parameter, or enter a number greater than 1.
• Based on number of nodes in grid. Sets the partitions to the number of nodes in the
grid running the session. If you configure this option for sessions that do not run on a
grid, the session runs in one partition and logs a message in the session log.
• Based on source partitioning. Determines the number of partitions using database partition information. The number of partitions is the maximum of the number of partitions at the source.
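The four methods can be summarized as a small decision function. The Python below is a sketch of the rules as stated above, not Informatica code: the function name, the method strings, and the parameter names are all hypothetical, and falling back to one partition when no source partition information is available is an assumption.

```python
def resolve_partition_count(method, configured=1, grid_nodes=0, source_partitions=()):
    """Return the number of partitions a session would run with.

    method            -- one of the four dynamic partitioning settings
    configured        -- number set on the Mapping tab or in Number of Partitions
    grid_nodes        -- nodes in the grid running the session (0 if no grid)
    source_partitions -- partition counts reported by each source database
    """
    if method == "disabled":
        # Number of partitions comes from the Mapping tab.
        return configured
    if method == "number_of_partitions":
        # Value of the Number of Partitions attribute
        # (a literal number or the $DynamicPartitionCount parameter).
        return configured
    if method == "nodes_in_grid":
        # Sessions that do not run on a grid run in one partition.
        return grid_nodes if grid_nodes > 0 else 1
    if method == "source_partitioning":
        # Maximum partition count across sources; default of 1 is an assumption.
        return max(source_partitions, default=1)
    raise ValueError(f"unknown dynamic partitioning method: {method}")

print(resolve_partition_count("number_of_partitions", configured=3))
print(resolve_partition_count("nodes_in_grid", grid_nodes=0))
print(resolve_partition_count("source_partitioning", source_partitions=(2, 4)))
```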

Based on number of partitions
• Edit the session task and go to the Config Object tab. Set Dynamic Partitioning to "Based on number of partitions" and Number of Partitions to 3.

Based on number of partitions contd..
• With dynamic partitioning based on number of partitions, the session completes in 32 seconds.

Partition details
• Source/target statistics

Based on number of nodes in grid
• Edit the session task and go to the Config Object tab. Set Dynamic Partitioning to "Based on number of nodes in grid".

Based on number of nodes in grid contd..
• With dynamic partitioning based on number of nodes in grid, the session completes in 25 seconds.

Based on source partitioning
• Edit the session task and go to the Config Object tab. Set Dynamic Partitioning to "Based on source partitioning".

Based on source partitioning contd..
• With dynamic partitioning based on source partitioning, the session completes in 20 seconds, as shown in the figure below.
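To compare the run times reported in these slides against the 37-second unpartitioned baseline, a short Python calculation helps. The `speedup` helper and the dictionary labels are hypothetical; the timings are the single-run figures quoted above, so the ratios are indicative only.

```python
def speedup(baseline_seconds, seconds):
    """Ratio of the unpartitioned run time to the partitioned run time."""
    return baseline_seconds / seconds

baseline_seconds = 37  # m_orders_scd3 without partitioning

# Run times in seconds, as reported for each configuration.
run_times = {
    "hash user keys": 22,
    "key range": 33,
    "dynamic, number of partitions = 3": 32,
    "dynamic, nodes in grid": 25,
    "dynamic, source partitioning": 20,
}

for name, seconds in run_times.items():
    saved = 100 * (baseline_seconds - seconds) / baseline_seconds
    print(f"{name}: {speedup(baseline_seconds, seconds):.2f}x, {saved:.0f}% less run time")
```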

Advantages of Dynamic Partition
• Keeps session run time from increasing as the volume of data grows, and adapts when you add more CPUs.
• Scales cost-effectively to handle large data volumes.
• Enhances developer productivity.
• Optimizes system performance in response to changing business requirements.
• With grid computing, the session can still complete even if a node fails.

LIMITATIONS OF DYNAMIC PARTITION
• You cannot use dynamic partitioning with XML sources and targets.
• You cannot use dynamic partitioning with the Debugger.

Thanks
