
PARTITION COMPONENTS

Partition components distribute data records to multiple flow partitions to support data
parallelism, as follows:

BROADCAST arbitrarily combines all the data records it receives into a single
flow and writes a copy of that flow to each of its output flow partitions.
PARTITION BY EXPRESSION distributes data records to its output flow
partitions according to a specified DML expression.
PARTITION BY KEY distributes data records to its output flow partitions
according to key values.
PARTITION BY PERCENTAGE distributes a specified percent of the total
number of input data records to each output flow.
PARTITION BY RANGE distributes data records to its output flow partitions
according to the ranges of key values specified for each partition.
PARTITION BY ROUND-ROBIN distributes data records evenly to each
output flow.
PARTITION WITH LOAD BALANCE distributes data records to its output
flow partitions, writing more records to the flow partitions that consume records
faster.

BROADCAST
Purpose
Broadcast arbitrarily combines all the data records it receives into a single flow and
writes a copy of that flow to each of its output flow partitions.
Parameters
None.
Runtime Behavior
Broadcast:
Reads records from all flows on the in port
Combines the records arbitrarily into a single flow
Copies all the records to all the flow partitions connected to the out port
Use Broadcast to increase data parallelism when you have connected a single fan-out
flow to the out port or to increase component parallelism when you have connected
multiple straight flows to the out port.
Broadcast does not support default record assignment. To avoid unpredictable results,
make the record format of the in and out ports identical.
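The runtime behavior above can be pictured with a small Python sketch. It is an illustration only, not Ab Initio code: the function name broadcast and the list-of-dicts record representation are assumptions, and the sketch concatenates the input flows in order where the component combines them arbitrarily.

```python
from typing import Dict, Iterable, List


def broadcast(in_flows: Iterable[Iterable[Dict]], n_out: int) -> List[List[Dict]]:
    """Combine all input flows into one flow, then copy it to every output partition."""
    # Hypothetical helper: simple concatenation stands in for "arbitrary" combining.
    combined = [record for flow in in_flows for record in flow]
    # Each output partition receives its own full copy of the combined flow.
    return [list(combined) for _ in range(n_out)]


outputs = broadcast([[{"id": 1}], [{"id": 2}, {"id": 3}]], n_out=2)
print(outputs[0] == outputs[1])  # every partition holds the same records
```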

PARTITION BY EXPRESSION
Purpose
Partition by Expression distributes data records to its output flow partitions according to a
specified DML expression. The output port for Partition by Expression is ordered.
Parameter
Function (expression, required)
DML expression using a field or fields from the input record format:
The expression must evaluate to a number between 0 and the number of flows
connected to the out port minus 1.
Partition by Expression routes the record to the flow number returned by this
expression.
Flow numbers start at 0.
Runtime Behavior
Although Partition by Expression accepts fan-out flows on its out ports, the effects of
using them can be difficult to predict. If you need to use a fan-out flow, make certain you
understand what happens when you do.
The Partition by Expression component:
Reads records in arbitrary order from the flows connected to the in port
Distributes the records to the flows connected to the out port, according to the
expression in the function parameter
Example
If the input records have a five-digit field named zipcode, the following DML expression
(zipcode divided by 10000) divides the total number of data records into 10 sections,
based on the first digit of the zip code field:
zipcode/10000
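As a rough illustration of the routing rule, the following Python sketch sends each record to the flow number returned by an expression. The function name partition_by_expression and the sample records are hypothetical, and the sketch assumes that the division truncates to an integer, which in DML depends on the type of the zipcode field.

```python
from typing import Callable, Dict, List


def partition_by_expression(
    records: List[Dict], expression: Callable[[Dict], int], n_out: int
) -> List[List[Dict]]:
    """Route each record to the output flow whose number the expression returns."""
    outputs: List[List[Dict]] = [[] for _ in range(n_out)]
    for record in records:
        flow = expression(record)
        # Flow numbers start at 0 and must be less than the number of output flows.
        if not 0 <= flow < n_out:
            raise ValueError(f"expression returned {flow}, expected 0..{n_out - 1}")
        outputs[flow].append(record)
    return outputs


# Sample data (made up); zip code 02139 is stored as the number 2139.
records = [{"zipcode": 2139}, {"zipcode": 10001}, {"zipcode": 94105}, {"zipcode": 90210}]
outputs = partition_by_expression(records, lambda r: r["zipcode"] // 10000, n_out=10)
print([len(flow) for flow in outputs])  # [1, 1, 0, 0, 0, 0, 0, 0, 0, 2]
```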

PARTITION BY KEY
Purpose
Partition by Key distributes data records to its output flow partitions according to key
values.
Parameter
Key (key specifier, required)
Name(s) of the key field(s) you want Partition by Key to use when it distributes data
records among flow partitions.
Runtime Behavior

Partition by Key is typically followed by SORT as shown in the Example. Alternatively,
you can use the PARTITION BY KEY AND SORT component.
The Partition by Key component:
Reads records in arbitrary order from the in port
Distributes them to the flows connected to the out port, according to the key
parameter, writing records with the same key value to the same output flow
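The guarantee that records with the same key value land on the same output flow can be sketched in Python as follows. The text does not say how a key value is mapped to a flow, so hashing the key and taking the result modulo the number of flows is an assumption here, as are the function name partition_by_key and the customer_id field.

```python
import zlib
from typing import Dict, List, Sequence


def partition_by_key(
    records: List[Dict], key_fields: Sequence[str], n_out: int
) -> List[List[Dict]]:
    """Send records with equal key values to the same output flow."""
    outputs: List[List[Dict]] = [[] for _ in range(n_out)]
    for record in records:
        # Build one string from the key fields; the separator avoids accidental merges.
        key = "\x1f".join(str(record[field]) for field in key_fields)
        # Assumption: a deterministic hash (CRC-32) modulo the flow count picks the flow.
        flow = zlib.crc32(key.encode("utf-8")) % n_out
        outputs[flow].append(record)
    return outputs


records = [{"customer_id": c, "amount": i} for i, c in enumerate("ABCABCAB")]
outputs = partition_by_key(records, ["customer_id"], n_out=3)
```

Because the flow depends only on the key, a downstream SORT on each partition sees every record for a given key.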

PARTITION BY PERCENTAGE
Purpose
Partition by Percentage distributes a specified percentage of the total number of input
data records to each output flow.
Parameter
Percentages (string, optional)
List of percentages expressed as integers from 1 to 100, separated by spaces.
Runtime Behavior of Partition by Percentage
The Partition by Percentage component:
Reads records from the in port
Writes a specified percentage of the input records to each flow on the out port
You can supply the percentages that Partition by Percentage uses to partition data records
in either of two ways:
By specifying the percentages in the percentages parameter.
By connecting the output of any component that produces a list of percentages
to the pct port of Partition by Percentage. Use decimal('\n') as the record
format for the pct port of Partition by Percentage.
You can assign a different percentage to each output flow. Express percentages as
integers from 1 to 100. The numbers supplied by the percentages parameter or the pct
port represent percentages, and therefore must sum to 100 or less. If the percentages sum
to less than 100, the last flow gets the remainder. If you have n flows connected to the out
port, you can specify n-1 percentages.
Example of Partition by Percentage
In the following example, Partition by Percentage distributes 100 data records to 3
different flow partitions. If the percentages supplied are 20 and 10, Partition by
Percentage writes 20 records to the first flow, 10 records to the second flow, and the
remaining 70 records to the third flow.

Note that, although within each output flow partition the records still appear in the same
order as they did in the input, Partition by Percentage distributes individual records to
individual output flow partitions at random.
In other words, if you were to concatenate the output flow partitions, you would not end
up with records in the same order as they were in the input.
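The following Python sketch is one way to picture these rules. It is not the component's actual algorithm: the function name partition_by_percentage, the seeded shuffle used to model the random assignment, and the rounding down of per-flow counts are all assumptions.

```python
import random
from typing import List, Sequence


def partition_by_percentage(
    records: List[int], percentages: Sequence[int], n_out: int, seed: int = 0
) -> List[List[int]]:
    """Give each flow its percentage of the records; the last flow gets the remainder."""
    if len(percentages) > n_out or sum(percentages) > 100:
        raise ValueError("at most one percentage per flow, summing to 100 or less")
    total = len(records)
    # Flows without a supplied percentage start at 0 (n-1 percentages are allowed).
    pcts = list(percentages) + [0] * (n_out - len(percentages))
    counts = [total * p // 100 for p in pcts]
    counts[-1] += total - sum(counts)  # remainder goes to the last flow
    # Assumption: a shuffled assignment list models the random record-to-flow choice.
    assignment = [flow for flow, count in enumerate(counts) for _ in range(count)]
    random.Random(seed).shuffle(assignment)
    outputs: List[List[int]] = [[] for _ in range(n_out)]
    for record, flow in zip(records, assignment):
        outputs[flow].append(record)  # input order is kept within each flow
    return outputs


records = list(range(100))
outputs = partition_by_percentage(records, [20, 10], n_out=3)
print([len(flow) for flow in outputs])  # [20, 10, 70]
```

Each output list is still in ascending order, but concatenating the three lists does not reproduce the input order.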

PARTITION BY RANGE
Purpose
Partition by Range distributes data records to its output flow partitions according to the
ranges of key values specified for each partition. The component distributes the records
relatively equally among the partitions. The records with the key values that come first in
the key order go to partition 0, the records with the key values that come next in the order
go to partition 1, and so on. The records with the key values that come last in the key
order go to the partition with the highest number.
It is recommended to keep Automatic Flow Buffering, the default, turned on for Partition
by Range. This component reads input from its flows in a specific order. Thus, turning off
Automatic Flow Buffering could cause deadlock.
Parameter
Key (key specifier, required)
Name(s) of the field(s) containing the key values you want Partition by Range to use to
determine which partition to add each input record to.
The field(s) specified must exist in the record formats for both the in and split ports, and
must be of the same type in both record formats.
Runtime Behavior of Partition by Range

Use Partition by Range when you want to divide data into useful, relatively equal,
groups. Input can be sorted or unsorted. If the input is sorted, the output is sorted. If the
input is unsorted, the output is unsorted.
Typically, you route the output from the out port of FIND SPLITTERS to the split port
of Partition by Range. When you do this, also do the following:
Use the same key specifier for both components.
Make the number of partitions on the flow connected to the out port of Partition by
Range the same as the value in the num_partitions parameter of Find Splitters.
The Partition by Range component:
Reads splitter records from the split port, and assumes that these records are sorted
according to the key parameter.
Determines whether the number of flows connected to the out port is equal to n
(where n-1 represents the number of splitter records).
If not, Partition by Range writes an error message and stops the execution of the graph.
Reads data records from the flows connected to the in port in arbitrary order.
Distributes the data records to the flows connected to the out port according to the
values of the key field(s), as follows:
Assigns records with key values less than or equal to the first splitter record to the
first output flow.
Assigns records with key values greater than the first splitter record, but less than
or equal to the second splitter record to the second output flow, and so on.
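The splitter comparison described above maps naturally onto a binary search. The Python sketch below is an illustration under assumptions: the function name partition_by_range and the age field are hypothetical, and the splitters are passed as a plain sorted list rather than read from a split port.

```python
from bisect import bisect_left
from typing import Dict, List


def partition_by_range(
    records: List[Dict], key_field: str, splitters: List[int], n_out: int
) -> List[List[Dict]]:
    """Assign each record to a flow according to sorted splitter key values."""
    # n output flows require exactly n-1 splitter records.
    if n_out != len(splitters) + 1:
        raise ValueError("number of output flows must be number of splitters + 1")
    outputs: List[List[Dict]] = [[] for _ in range(n_out)]
    for record in records:
        # bisect_left counts the splitters strictly less than the key, so a key
        # equal to a splitter stays in that splitter's flow (the "<=" rule).
        flow = bisect_left(splitters, record[key_field])
        outputs[flow].append(record)
    return outputs


records = [{"age": a} for a in (5, 10, 11, 20, 21)]
outputs = partition_by_range(records, "age", splitters=[10, 20], n_out=3)
print([[r["age"] for r in flow] for flow in outputs])  # [[5, 10], [11, 20], [21]]
```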

PARTITION BY ROUND-ROBIN
Purpose
Partition by Round-robin distributes blocks of data records evenly to each output flow in
round-robin fashion. The output port for Partition by Round-robin is ordered.
Parameter
block_size (integer, required)
Number of records distributed to one flow before distributing the same number to the
next flow. Default is 1.
Runtime Behavior
The Partition by Round-robin component:
Reads records from the in port.
Distributes them in block_size chunks to its output flows according to the order in
which the flows are connected.
The effect is similar to dealing a deck of cards.
Example
This example shows how Partition by Round-robin writes to its output flows according to
their connection order.

Suppose you attach four flows to the Partition by Round-robin output port, as shown in
the following figure. The Partition by Round-robin component writes to Load-1, then
Load-2, then Load-3, then Load-4, then back to Load-1 again.
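The card-dealing behavior, including the block_size parameter, can be sketched in Python as follows. The function name partition_round_robin is hypothetical, and the four output lists stand in for the flows Load-1 through Load-4.

```python
from typing import List


def partition_round_robin(
    records: List[int], n_out: int, block_size: int = 1
) -> List[List[int]]:
    """Deal records to the output flows in blocks of block_size, in connection order."""
    outputs: List[List[int]] = [[] for _ in range(n_out)]
    for index, record in enumerate(records):
        # The block number (index // block_size) cycles through the flows.
        flow = (index // block_size) % n_out
        outputs[flow].append(record)
    return outputs


outputs = partition_round_robin(list(range(10)), n_out=4)
print(outputs)  # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]

blocks = partition_round_robin(list(range(8)), n_out=2, block_size=2)
print(blocks)  # [[0, 1, 4, 5], [2, 3, 6, 7]]
```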

PARTITION WITH LOAD BALANCE


Purpose
Partition with Load Balance distributes data records to its output flow partitions by
writing more records to the flow partitions that consume records faster.
The output port for Partition with Load Balance is ordered.
Parameters
None.
Runtime Behavior
NOTE: Although Partition with Load Balance balances the workload between CPUs, the
resulting number of data records in each partition can be unbalanced. You can use
PARTITION BY ROUND-ROBIN to balance the number of data records among
partitions.
The Partition with Load Balance component:
Reads records in arbitrary order from the flows connected to its in port
Distributes those records among the flows connected to its out port by
sending more records to the flows that consume records faster
Partition with Load Balance writes data records until each flow's output buffer fills up.
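A toy simulation can show why faster consumers end up with more records. Everything in this Python sketch is an assumption made for illustration: the function name partition_with_load_balance, the fixed per-tick consumption rates, and the small buffer_size are not parameters of the component, which has none.

```python
from collections import deque
from typing import List


def partition_with_load_balance(
    records: List[int], consume_rates: List[int], buffer_size: int = 3
) -> List[List[int]]:
    """Write to each flow until its buffer is full; faster consumers free space sooner."""
    if buffer_size < 1 or any(rate < 1 for rate in consume_rates):
        raise ValueError("buffer_size and all consume rates must be at least 1")
    n_out = len(consume_rates)
    buffered = [0] * n_out  # records waiting in each flow's output buffer
    outputs: List[List[int]] = [[] for _ in range(n_out)]
    pending = deque(records)
    while pending:
        # Write phase: fill every buffer that has free space.
        for flow in range(n_out):
            while pending and buffered[flow] < buffer_size:
                outputs[flow].append(pending.popleft())
                buffered[flow] += 1
        # Consume phase: each downstream consumer drains up to its rate per tick.
        for flow, rate in enumerate(consume_rates):
            buffered[flow] = max(0, buffered[flow] - rate)
    return outputs


outputs = partition_with_load_balance(list(range(40)), consume_rates=[1, 3])
print([len(flow) for flow in outputs])  # the faster second flow receives more records
```

The uneven record counts in the result match the note above: the workload is balanced, but the number of records per partition is not.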
