
Components

Components run on a computer running the
Co>Operating System.
The Ab Initio Component Library contains a diverse,
built-in set of components.
The particular work a component accomplishes
depends on its parameter settings.
Some components require a data transformation
parameter, that is, a set of business rules to be
applied to the input(s) to produce the required output.
Viewing Component Properties
Click the component to bring up its Properties page.
Datasets
A dataset is a source or destination of data. It can be a
simple file, a database table, a SAS dataset, ...
Datasets may reside on the machine running the
Co>Operating System.
Datasets may reside on other machines if connected via
database middleware.
Data within a dataset must always be exactly
described using Ab Initio's Data Manipulation
Language (DML) to form record format metadata.
Viewing Port Properties
Click the Ports tab to view the 'Port(s)' properties.
Dataset: Records and Fields
A dataset is made up of records; a record consists of fields.
Analogous database terms are rows and columns.
Data Manipulation Language - DML
Data Manipulation Language (DML) is the Ab Initio
language used to define:
record formats (which kinds of types),
functions,
and key specifiers.
Record Format Metadata in Graphical Form
Field and Field Length
There are several built-in types available via the drop-down
menu. The most common types are string, decimal (for all numbers),
and date.
A date requires a format specifier that is an exact
representation of the date (e.g., "MM-DD YYYY").
The field length is either a number, for fixed-length fields, or the
delimiter that terminates the field, for variable-length fields.
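As a rough illustration in Python (not DML), a fixed-length field takes a set number of characters, while a variable-length field runs up to its delimiter. The `parse_record` helper, the `layout` list, and the sample record are hypothetical names and data made up for this sketch.

```python
from datetime import datetime

def parse_record(line, layout):
    """Split one text record into fields (hypothetical helper).

    layout is a list of (name, spec) pairs: an int spec means a fixed-length
    field, a str spec is the delimiter that terminates the field.
    """
    record, pos = {}, 0
    for name, spec in layout:
        if isinstance(spec, int):
            # Fixed length: take exactly `spec` characters.
            record[name] = line[pos:pos + spec]
            pos += spec
        else:
            # Variable length: read up to (not including) the delimiter.
            end = line.index(spec, pos)
            record[name] = line[pos:end]
            pos = end + len(spec)
    return record

# Sample layout and data are assumptions for illustration only.
layout = [("id", 4), ("first_name", ","), ("last_name", ","), ("bday", 10)]
rec = parse_record("0042Ada,Lovelace,12-10 1815", layout)

# A date only parses when the format matches the text exactly ("MM-DD YYYY" here).
bday = datetime.strptime(rec["bday"], "%m-%d %Y").date()
```

If the date text used a different separator or order, `strptime` would raise an error, which mirrors the requirement that the format specifier be an exact representation of the date.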
The Record Format Metadata in GDE and in text form

record
decimal(4) id;
first_name;
last_name;
string(5) newfield;
end;
Text Record Format for Date Field
record
decimal(4) id;
first_name;
last_name;
newfield;
end;
Field Names
Names consist of letters, digits, and underscores:
a ... z, A ... Z, 0 ... 9, _
Note: No spaces, hyphens, $'s, #'s, %'s
Case matters: upper- and lowercase letters are different!
Some words are reserved (record, end, date, ...)
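The naming rules above can be sketched as a small Python check. This is only an illustration: `is_valid_field_name` is a hypothetical helper, and the reserved-word set holds just the three words named on the slide, not the full DML list.

```python
import re

# Only the reserved words the slide names; the real list is longer (assumption).
RESERVED = {"record", "end", "date"}

# Letters, digits, and underscores only.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")

def is_valid_field_name(name):
    """Return True if `name` uses only allowed characters and is not reserved."""
    return bool(NAME_PATTERN.fullmatch(name)) and name not in RESERVED

checks = {name: is_valid_field_name(name)
          for name in ["first_name", "first-name", "first name", "cost$", "record"]}
```

In this sketch only `first_name` passes; the others contain a hyphen, a space, a `$`, or are reserved.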
What Data Can Be Described?
There are both fixed-size and variable-size
types.
ASCII, UNICODE character sets
Types represent strings,
numbers, packed decimals,
dates, ...
Complex data formats consist of nested
records, vectors, ...
Record Format Editing
The View Data Panel
Expressions in DML
Computations are expressed in the algebraic syntax of
Pascal, etc.
Field names act as variables.
Arithmetic: +, -, *, ...
Comparison: >, <, ==, !=, ...
Built-in functions: string_concat, string_trim, today,
date_day_of_week, ...
Simple Components
Filter by Expression
Sort
In these components the
record format metadata
does not change from
input to output
Viewing Data
The Filter by Expression Component
For each record on the input port the 'select_expr' parameter is
evaluated. If 'select_expr' evaluates to true (non-zero), the input
record is written to the 'out' port exactly as the input was read.
If 'select_expr' evaluates to false (zero), the record is written
to the 'deselect' port.
The 'out' port must be connected downstream; those records
meeting the 'select_expr' criteria are propagated to the
next component.
The 'deselect' output is optionally used.
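The routing described above can be sketched in Python. This is not Ab Initio code: `filter_by_expression` is a hypothetical function, the two returned lists stand in for the 'out' and 'deselect' ports, and the sample records and `id < 10` expression are made up.

```python
def filter_by_expression(records, select_expr):
    """Route each record unchanged to `out` or `deselect` (illustrative sketch)."""
    out, deselect = [], []
    for record in records:
        if select_expr(record):      # true (non-zero): goes to the 'out' port
            out.append(record)
        else:                        # false (zero): goes to the 'deselect' port
            deselect.append(record)
    return out, deselect

# Sample data and expression are assumptions for illustration.
records = [{"id": 5}, {"id": 12}, {"id": 9}]
out, deselect = filter_by_expression(records, lambda r: r["id"] < 10)
```

Note that the records are passed through as-is, so the record format is the same on both ports as on the input.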
Filter Data (Selection)
The Sort Component
Reads records from the input port, sorts them by
key, and writes the result to the output port.
Expression Parameter
Sorting
A key identifies a single field or a set of fields (a composite key)
used to organize a dataset in some way.
Single field: {id}
Multiple field: {last_name; first_name}
Modifiers: {id descending}
Used for sorting, grouping, partitioning.
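To make the three key forms concrete, here is a Python sketch that sorts by a key given as a list of (field, descending) pairs. The `sort_by_key` function and the sample records are hypothetical; this only mimics the idea of a key specifier and is not how the Sort component is implemented.

```python
def sort_by_key(records, key):
    """Sort records by a composite key.

    key is a list of (field, descending) pairs, e.g. [("id", True)]
    for {id descending}. Hypothetical helper for illustration.
    """
    result = list(records)
    # Sort on the least significant field first. Python's sort is stable,
    # so fields listed earlier in the key end up taking precedence.
    for field, descending in reversed(key):
        result.sort(key=lambda r: r[field], reverse=descending)
    return result

# Sample data is an assumption for illustration.
people = [
    {"id": 2, "last_name": "Smith", "first_name": "Bob"},
    {"id": 1, "last_name": "Jones", "first_name": "Zed"},
    {"id": 3, "last_name": "Smith", "first_name": "Al"},
]
by_name = sort_by_key(people, [("last_name", False), ("first_name", False)])
by_id_desc = sort_by_key(people, [("id", True)])
```

The same (field, descending) list could also drive grouping or partitioning, which is why the slide treats the key as a general way to organize a dataset.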
Sorting - The Specifier Editor


(Figure: Sort component, with datasets labeled Simple and Sorted)
Complex Components
In these components the
record format metadata
typically changes (goes
through transformation)
from input to output
Transformation Functions
A transform function specifies the business
rules used to create the output record.
Each field of the output record must be
successfully assigned a value. Partial output
records are not allowed!
The Transform Editor is used to create a
transform function in a graphical manner.
decimal(7) id;
string(8) last_name;
bday;
Input record format:
decimal(",") id;
date("MMDDYY") bday;
string(",") first_name;
string(";") last_name;
The Transform Function Editor

out :: reformat(in) =
begin
out.id :: in.id + 1000000;
...
end;
The Reformat Component
Reads records from the input port, reformats each according
to a transform function, and writes the result records to
the output port.
Additional output ports can be created.
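A minimal Python sketch of this behavior follows. It is not Ab Initio code: `reformat`, `add_offset`, and the sample record are hypothetical, and raising an error on a missing field is just one way to model "partial output records are not allowed".

```python
def reformat(records, transform, output_fields):
    """Apply `transform` to each record and yield the result (illustrative sketch)."""
    for rec in records:
        out = transform(rec)
        # Every output field must have been assigned; no partial records.
        missing = [f for f in output_fields if f not in out]
        if missing:
            raise ValueError(f"unassigned output fields: {missing}")
        yield out

def add_offset(rec):
    # Models the rule out.id :: in.id + 1000000; other fields are copied.
    return {
        "id": rec["id"] + 1000000,
        "last_name": rec["last_name"],
        "bday": rec["bday"],
    }

# Sample input is an assumption for illustration.
source = [{"id": 7, "first_name": "Ada", "last_name": "Lovelace", "bday": "121015"}]
result = list(reformat(source, add_offset, ["id", "last_name", "bday"]))
```

Here the output record drops `first_name` and changes `id`, so the record format differs between input and output.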

(Figure: datasets labeled Simple and Simple-Out)
Look Inside the Reformat Component
A record arrives at the input port.
The record is read into the component.
The transformation function is evaluated:
out :: trans(in) =
begin
out.x :: in.b - 1;
out.y :: in.a;
out.z :: fn(in.c);
end;
Since every rule within the transform function is
successful, a result record is issued.
The result record is written to the output port of the
component.
Data Aggregation
To mimic the kind of aggregation available as extensions to
the database command, Ab Initio has 2
components:
Aggregate
Rollup

Data Aggregation of Sorted/Grouped Input
Rollup Wizard
The Rollup Component
By default, Rollup reads grouped records from
the input port, aggregates them as indicated by the key
and transform parameters, and writes the resulting
aggregate record to the out port.
Joining Data
Built-in Functions for Rollup
The following aggregation functions are
predefined and only available in the rollup
component:
avg
count
first
last

min
product
sum
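The idea of rolling up grouped input can be sketched in Python with `itertools.groupby`, which, like the default behavior described for Rollup, only merges adjacent records and so expects the input to be grouped by key already. The `rollup` and `summarize` functions, the `city` and `amount` fields, and the sample data are assumptions for this sketch.

```python
from itertools import groupby

def rollup(records, key, aggregate):
    """Yield one aggregate record per run of adjacent records sharing `key`."""
    for key_value, group in groupby(records, key=lambda r: r[key]):
        yield aggregate(key_value, list(group))

def summarize(city, group):
    # Python counterparts of a few of the listed functions: count, sum, min, avg.
    amounts = [r["amount"] for r in group]
    return {
        "city": city,
        "count": len(amounts),
        "sum": sum(amounts),
        "min": min(amounts),
        "avg": sum(amounts) / len(amounts),
    }

# Sample data (already grouped by city) is an assumption for illustration.
sales = [
    {"city": "Boston", "amount": 10},
    {"city": "Boston", "amount": 30},
    {"city": "Austin", "amount": 5},
]
totals = list(rollup(sales, "city", summarize))
```

If the input were not grouped, the same city could appear in several output records, which is why grouped (or sorted) input matters here.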
Joining Sorted Data on the 'id' Field
Building the Output Record
Resulting display when out.dt is selected
Prioritized Assignment
out.dt :1: in1.dt;
out.dt :2: "1900/01/01";
In DML, a missing value (say, if there is no 'in1' record)
causes an assignment to fail.
If an assignment for a left-hand side fails, the next
priority assignment is tried. There must be one
successful assignment for each output field.
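The fallback logic can be modeled in Python as "try each rule in priority order and keep the first that succeeds". This is a sketch under assumptions: `prioritized` and `build_dt` are hypothetical names, and a missing record or field is modeled as a Python exception or `None`.

```python
def prioritized(*rules):
    """Try the rules in priority order and return the first successful value."""
    for rule in rules:
        try:
            value = rule()
        except (KeyError, TypeError):
            # A missing record (None) or missing field counts as a failed assignment.
            continue
        if value is not None:
            return value
    raise ValueError("no rule succeeded for this output field")

def build_dt(in1):
    # Priority 1: take dt from the in1 record; priority 2: fall back to a constant.
    return prioritized(lambda: in1["dt"], lambda: "1900/01/01")

with_record = build_dt({"dt": "2001/02/03"})
without_record = build_dt(None)
```

If every rule failed, `prioritized` would raise, matching the requirement that each output field gets one successful assignment.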
The Join Component
Join performs a join of inputs. By default, the inputs to join
must be sorted and an inner join is computed.
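One way to see why sorted inputs matter is a merge join, sketched here in Python. This is an illustration of the general technique, not the Join component's implementation: `inner_join_sorted` is a hypothetical function, keys are assumed unique within each input, and the sample records are made up.

```python
def inner_join_sorted(in0, in1, key):
    """Merge-join two lists sorted ascending on `key` (keys unique per input)."""
    i = j = 0
    while i < len(in0) and j < len(in1):
        k0, k1 = in0[i][key], in1[j][key]
        if k0 < k1:
            i += 1                   # no partner in in1: inner join drops it
        elif k0 > k1:
            j += 1                   # no partner in in0: inner join drops it
        else:
            yield in0[i], in1[j]     # keys match: pass the aligned pair on
            i += 1
            j += 1

# Sample data is an assumption for illustration.
in0 = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 4, "a": "z"}]
in1 = [{"id": 2, "q": "p"}, {"id": 3, "q": "r"}, {"id": 4, "q": "s"}]
pairs = list(inner_join_sorted(in0, in1, "id"))
```

Because both inputs are sorted, each list is read once from front to back and only the current record from each side is compared.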
Assigning Priorities to Business Rules
"join-type": Full Outer join
Records arrive at the inputs of the Join.
The input key fields are compared.
The aligned records are passed to the transformation
function:
out :: join(in0, in1) =
begin
... in0.a;
out.x :1: in1.r + 20;
out.x :2: in0.b + 10;
out.q :1: in1.q;
out.q :2: ...;
end;
The transformation engine evaluates based on
the inputs.
Output: a result record is
emitted and written out
as long as all output
fields have been
successfully computed.
Again, the next records arrive into the Join component,
the input key fields are compared, the aligned records are
passed to the transformation function, the engine evaluates
based on the inputs, and a result record is emitted and written out.
Configuring the Input Dataset
Steps in Building an Application
Create a new graph ('File>New'), then do
'File>Save As' (i.e., my_graph).
Begin the application construction:
Add datasets. Where are they sourced from, and where does the
output go?
Add components.
Add flows.
Edit Component Parameters as needed.
Run the application.
Adding an Output Dataset
Configuring the Filter Expression
Configuring the Output Dataset
Running the Application
Adding Flows
Forms of Parallelism
Component parallelism
Pipeline parallelism
Data parallelism
Component Parallelism
Comes "for free" with graph programming.
Limitation:
Scales to the number of "branches" in a graph.
Two Ways of Looking at Data Parallelism
Pipeline Parallelism
Comes "for free" with graph programming.
Limitations:
Scales to the length of "branches" in a graph.
Some operations, like sorting, do not pipeline.
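The pipelining limitation can be illustrated with Python generators. This sketch shows only the data dependency, not real concurrency (generators interleave stages in one thread); `source` and `keep_even` are hypothetical stages, and the logs record the order of events.

```python
def source(n, log):
    """Emit n, n-1, ..., 1, logging each read (hypothetical upstream stage)."""
    for i in range(n, 0, -1):
        log.append(f"read {i}")
        yield i

def keep_even(records, log):
    """A record-at-a-time stage: each record can move on as soon as it is read."""
    for r in records:
        if r % 2 == 0:
            log.append(f"pass {r}")
            yield r

# Streaming stage: the first output appears after reading a single record.
stream_log = []
first_streamed = next(keep_even(source(4, stream_log), stream_log))

# Sorting stage: all input must be read before the first output is known.
sort_log = []
first_sorted = sorted(source(4, sort_log))[0]
```

After this runs, `stream_log` holds one read and one pass, while `sort_log` holds all four reads, which is the sense in which a sort does not pipeline: downstream stages wait until it has seen its last input record.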
Data Partitioning

Data Parallelism
Data Partitioning: The Global View
The Global View: Data Parallel
Loading data (Cont'd)
Types of Parallelism
Types of Ab Initio parallelism are not mutually distinct.
Each graph can take advantage of different types of
parallelism at the same time, in different areas of the graph.
Loading data
Customers table
DML : customers. m
file : Customers.dat
to load into
Unloading data (Cont'd)
