
Dynamic Partitioning

integration * intelligence * insight


AGENDA

High Availability
Grid Computing
Dynamic Partitioning



Introduction

PowerCenter Domains

• PowerCenter introduces a service-oriented architecture.

• PowerCenter introduces a domain, which serves as the primary unit of
administration for the PowerCenter environment.

• A domain is a collection of nodes and services in the PowerCenter
environment.

• The first time you install Informatica Services, you create a domain and add a
node to the domain.



Administration Console

• The Administration Console is a browser-based utility that enables you to
view domain properties and perform basic domain administration tasks.

• The Navigator displays the following types of objects:

• Domain. You can view one domain in the Administration Console.

• Node. A node represents a machine in the domain.

• Grid. Create a grid to run the Integration Service on multiple nodes.





High Availability

• High availability is the PowerCenter option that eliminates a single point of
failure in the PowerCenter environment.

• High availability provides the following functionality:

• Resilience.

• Failover.

• Recovery.



The Partitioning Option

• The Partitioning Option increases PowerCenter’s performance through parallel data
processing.
• When the Integration Service runs the session, it can achieve higher performance by
partitioning the pipeline and performing the extract, transformation, and load for each partition
in parallel.

• Partition types:
• Database partitioning.
• Hash auto-keys.
• Hash user keys.
• Key range.
• Pass-through.
• Round-robin.



Configuring Partitioning

• Create or edit a session.

• Update partitioning information using the Partitions view on the Mapping tab
of session properties.

• Add, delete, or edit partition points on the Partitions view of session
properties.



Configuring a Partition Point

• You can configure the following information when you edit or add a partition point:
• Specify the partition type at the partition point.
• Add and delete partitions.
• Enter a description for each partition.



Hash user keys

• The Integration Service uses a hash function to group rows of data among partitions.
• Hashing on a numeric key can improve session performance, because the hash function
usually processes numerical data more quickly than string data.
• With hash user keys partitioning, you specify the hash key yourself.
• Baseline: with no partitioning configured, the session for the sample mapping
(m_orders_scd3) takes 37 seconds to run.
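
The idea of grouping rows by a user-chosen hash key can be sketched in a few lines of Python. This is only an illustration of the general technique, not the Integration Service's actual hash function; the `hash_partition` helper, the `customer_id` key, and the sample rows are all assumptions made for the example.

```python
import zlib
from collections import defaultdict


def hash_partition(rows, key, num_partitions):
    """Assign each row to a partition by hashing the chosen key column.

    Hypothetical helper for illustration only.
    """
    partitions = defaultdict(list)
    for row in rows:
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        partitions[digest % num_partitions].append(row)
    return dict(partitions)


# Made-up sample data: the user picks customer_id as the hash key.
orders = [
    {"order_id": 1, "customer_id": 101},
    {"order_id": 2, "customer_id": 102},
    {"order_id": 3, "customer_id": 101},
    {"order_id": 4, "customer_id": 103},
    {"order_id": 5, "customer_id": 102},
]

parts = hash_partition(orders, "customer_id", 3)
for number, rows in sorted(parts.items()):
    print(number, [r["order_id"] for r in rows])
```

Rows that share a key value always hash to the same partition, which is the property the partition type relies on.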



Hash user keys contd..

• Using hash user keys partitioning, the session completes in 22 seconds, as shown
in the figure below.



Key range partition

• With key range partitioning, the Integration Service distributes rows of data based on a port.
• You define a range of values for each partition.
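
A minimal sketch of range-based routing, assuming each partition owns a contiguous range of key values. The `key_range_partition` helper, the boundary values, and the sample rows are hypothetical and only illustrate the concept; they are not PowerCenter syntax.

```python
from bisect import bisect_right


def key_range_partition(rows, key, boundaries):
    """Route rows into len(boundaries) + 1 partitions by key value.

    boundaries must be sorted. With boundaries [100, 200]:
      partition 0 holds keys < 100,
      partition 1 holds keys from 100 up to 199,
      partition 2 holds keys >= 200.
    Hypothetical helper for illustration only.
    """
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        partitions[bisect_right(boundaries, row[key])].append(row)
    return partitions


# Made-up sample data keyed on an item_id port.
items = [{"item_id": n} for n in (5, 99, 100, 150, 199, 200, 350)]
for number, rows in enumerate(key_range_partition(items, "item_id", [100, 200])):
    print(number, [r["item_id"] for r in rows])
```

If the key values are not spread evenly across the ranges, some partitions receive more rows than others, so the choice of ranges affects how balanced the work is.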



Key range partition contd..

• Using key range partitioning, the session completes in 33 seconds, as shown in
the figure below.



Partition details

• Source/target statistics



Hash auto-keys

• Use hash auto-keys partitioning at or before Rank, Sorter, Joiner,
and unsorted Aggregator transformations.
• The Integration Service distributes rows to each partition according
to group before they enter the Sorter and Aggregator
transformations.
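
Why grouping matters before an aggregation can be shown with a small sketch: if every row of a group lands in the same partition, each partition can aggregate on its own and the results can simply be combined. The helpers and the sample data below are assumptions made for illustration; they do not reproduce how the Integration Service chooses or hashes its keys.

```python
import zlib
from collections import defaultdict


def partition_by_group(rows, group_key, num_partitions):
    """Send all rows of the same group to the same partition (illustrative)."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        digest = zlib.crc32(str(row[group_key]).encode("utf-8"))
        partitions[digest % num_partitions].append(row)
    return partitions


def sum_by_group(rows, group_key, value_key):
    """Stand-in for an Aggregator: total value_key per group."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)


# Made-up sample data.
sales = [
    {"item": "bolt", "qty": 4},
    {"item": "nut", "qty": 7},
    {"item": "bolt", "qty": 1},
    {"item": "gear", "qty": 2},
    {"item": "nut", "qty": 3},
]

merged = {}
for part in partition_by_group(sales, "item", 2):
    # No group spans two partitions, so update() never overwrites a total.
    merged.update(sum_by_group(part, "item", "qty"))

print(merged == sum_by_group(sales, "item", "qty"))
```

Had the rows been spread without regard to group, each partition would hold only partial totals and a further merge step would be needed.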



Pass-Through Partition Type

• In pass-through partitioning, the Integration Service processes data without
redistributing rows among partitions.
• It increases data throughput without increasing the number of partitions.



Round-Robin Partition Type

• In round-robin partitioning, the Integration Service distributes rows of data evenly to all partitions.

• Example: a session reads item information from three flat files of different sizes:
• Source file 1: 80,000 rows
• Source file 2: 5,000 rows
• Source file 3: 15,000 rows
• When the Integration Service reads the source data, the first partition processes 80% of the
data, the second partition processes 5% of the data, and the third partition processes 15% of the
data.
• To distribute the workload more evenly, set a partition point at the Filter transformation and set the
partition type to round-robin. The Integration Service distributes the data so that each partition
processes approximately one-third of the data.
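
The arithmetic in this example can be checked with a short sketch. It models round-robin as dealing rows to partitions in turn, which is a simplification of what happens at a real partition point; the `round_robin_counts` helper is hypothetical.

```python
def round_robin_counts(total_rows, num_partitions):
    """Deal rows to partitions in turn and return rows per partition."""
    counts = [0] * num_partitions
    for row_number in range(total_rows):
        counts[row_number % num_partitions] += 1
    return counts


# Row counts of the three flat files from the example.
source_rows = [80_000, 5_000, 15_000]
total = sum(source_rows)

# Without redistribution, each partition keeps the rows of its own file.
skewed_share = [round(100 * n / total) for n in source_rows]

# With a round-robin partition point, the rows are spread evenly.
balanced = round_robin_counts(total, len(source_rows))

print(skewed_share)  # [80, 5, 15]
print(balanced)      # [33334, 33333, 33333]
```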



Dynamic Partitioning

• If the volume of data grows or you add more CPUs, you might need to adjust
partitioning so the session run time does not increase.

• When you use dynamic partitioning, you can configure the partition information so
the Integration Service determines the number of partitions to create at run time.

• The Integration Service scales the number of session partitions at run time based on
factors such as source database partitions or the number of nodes in a grid.



Configuring Dynamic Partitioning



Configuring Dynamic Partitioning contd..

• Configure dynamic partitioning using one of the following methods:

• Disabled. Do not use dynamic partitioning. Defines the number of partitions on the
Mapping tab.

• Based on number of partitions. Sets the partitions to a number that you define in
the Number of Partitions attribute. Use the $DynamicPartitionCount session
parameter, or enter a number greater than 1.

• Based on number of nodes in grid. Sets the partitions to the number of nodes in the
grid running the session. If you configure this option for sessions that do not run on a
grid, the session runs in one partition and logs a message in the session log.

• Based on source partitioning. Determines the number of partitions using database
partition information. The number of partitions is the maximum of the number of
partitions at the source.
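
The four methods above amount to a small decision rule, sketched below. The function name, the method strings, and the parameters are all hypothetical, chosen for illustration; this is not PowerCenter configuration syntax, and falling back to one partition when no source partition information is available is an assumption of the sketch.

```python
def resolve_partition_count(method, static_count=1, requested=None,
                            grid_nodes=None, source_partitions=None):
    """Return the number of partitions a session would run with (illustrative)."""
    if method == "disabled":
        # Use the number of partitions defined on the Mapping tab.
        return static_count
    if method == "number_of_partitions":
        # A fixed number supplied by the user or by a session parameter.
        if requested is None or requested < 1:
            raise ValueError("requested must be a positive integer")
        return requested
    if method == "nodes_in_grid":
        # Not running on a grid: the session runs in one partition.
        return grid_nodes if grid_nodes else 1
    if method == "source_partitioning":
        # Maximum of the partition counts at the sources.
        # Assumption: default to 1 when no information is available.
        return max(source_partitions) if source_partitions else 1
    raise ValueError(f"unknown method: {method}")


print(resolve_partition_count("number_of_partitions", requested=3))
print(resolve_partition_count("nodes_in_grid", grid_nodes=4))
print(resolve_partition_count("nodes_in_grid"))
print(resolve_partition_count("source_partitioning", source_partitions=[2, 4, 3]))
```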



Based on number of partitions

• Edit the session task and go to the Config Object tab. Set dynamic partitioning to
"Based on number of partitions" and set the number of partitions to 3.



Based on number of partitions contd..

• With dynamic partitioning based on number of partitions, the session completes in
32 seconds, as shown in the figure below.



Partition details

• Source/target statistics



Based on number of nodes in grid

• Edit the session task and go to the Config Object tab. Set dynamic partitioning to
"Based on number of nodes in grid".



Based on number of nodes in grid contd..

• With dynamic partitioning based on number of nodes in grid, the session completes
in 25 seconds, as shown in the figure below.



Based on source partitioning

• Edit the session task and go to the Config Object tab. Set dynamic partitioning to
"Based on source partitioning".



Based on source partitioning contd..

• With dynamic partitioning based on source partitioning, the session completes in
20 seconds, as shown in the figure below.
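
The run times quoted across these slides can be put side by side. The sketch below only does the arithmetic on the reported numbers. It assumes that every run used the same sample mapping (m_orders_scd3) and the same data as the 37-second unpartitioned run, which the slides do not state explicitly for the dynamic partitioning runs. Each figure is a single measurement.

```python
# Run times in seconds as reported on the slides (one run each).
BASELINE_SECONDS = 37  # sample mapping without partitioning

run_times = {
    "hash user keys": 22,
    "key range": 33,
    "dynamic: number of partitions (3)": 32,
    "dynamic: nodes in grid": 25,
    "dynamic: source partitioning": 20,
}


def reduction_pct(baseline, seconds):
    """Percentage reduction in run time relative to the baseline."""
    return round(100 * (baseline - seconds) / baseline, 1)


summary = {name: reduction_pct(BASELINE_SECONDS, s) for name, s in run_times.items()}

for name, pct in sorted(summary.items(), key=lambda item: -item[1]):
    print(f"{name}: {run_times[name]} s ({pct}% shorter than baseline)")
```

On these numbers, source partitioning and hash user keys give the largest reductions (roughly 46% and 41%), while key range gives the smallest (roughly 11%).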



Advantages of Dynamic Partition

• The number of partitions adjusts at run time, so session run time need not
increase as the volume of data grows or as you add more CPUs.

• Scales cost-effectively to handle large data volumes.

• Enhances developer productivity.

• Optimizes system performance in response to changing business
requirements.

• With grid computing, the session can still complete even if a node fails.



LIMITATIONS OF DYNAMIC PARTITION

• You cannot use dynamic partitioning with XML sources
and targets.

• You cannot use dynamic partitioning with the Debugger.



Thanks

