Introduction
Palo connections
Palo input step
Palo output step
Palo engine step
    Palo engine step usage and configuration
    Working with rules repository
Engine rules
    Rule execution
    Rule Node
    Parameters and output stream
    Assignment Node
    Call Node
    Condition Node
    Enumeration Node
    Immediately and buffered rule execution
    Nested rules
Expression Syntax
    Numeric operations and functions
    String operations and functions
    Logical operations and functions
    Sets of values
    Dimension element operations and functions
    Cube cells operations
    Aggregate function
    Working with input and output streams
    Database operations
    Logging functions
Error messages description
    Compiler errors
    Evaluating errors
    Engine errors
    Rule reading errors
Examples
    Exporting cube structure to stream
    Create cube by given structure
    Currency exchange example
    Cube examples
    Creating and filling germany cube
    Creating and filling cube for other countries
    Creating and filling global cube
    Creating and filling analytical cube
Introduction
This document describes in detail how to process data in Palo cubes by using Kettle transformations. The processing is performed by special Kettle extensions. These software modules connect to Palo servers, read data from them and write data to them. The extensions are well integrated with the other Kettle tools, so they can be combined with existing Kettle steps (for example, loading data from a flat file). Finally, the extensions support more complex data manipulation through the Palo Rule Engine (PRE) in Kettle transformations.
This document covers the following topics:
- How to connect to Palo servers in Kettle transformations.
- How to get data from Palo servers in Kettle transformations.
- How to put data to Palo servers in Kettle transformations.
- How to use PRE in Kettle transformations and write PRE rules for complex data processing.
Palo connections
If a Kettle transformation has to process data from Palo, each connection to a Palo server (or to several servers) must be described. There is no significant difference between a Palo connection and a connection to any other database server (for example, Oracle), so you can create a new connection with Kettle's Connection Wizard or enter the connection parameters in the Kettle connection dialog. You can also create several Palo connections in the same Kettle transformation. To configure a connection to a Palo server, specify the following:
- Connection name. This field does not affect the Palo connection itself; it only identifies this connection among the others. For example, the connection name is used in the configuration of Kettle transformation steps (see below).
- Connection type (Palo).
- Network address and port of the Palo server, because the server may be remote.
- Database name, because one Palo server can manage several databases. Each database can in turn contain several data cubes, and all of these cubes can be processed through the same Palo connection.
- Authentication parameters (user name and password).
The following figure illustrates the creation (or modification) of a Palo connection.
Fig 1. Palo connection dialog
After setting up the common, mandatory parameters, you can set Palo-specific parameters on the Palo tab page of that dialog. In some cases setting them correctly is a strict requirement, so please be careful with them.
Fig 2. Palo-specific parameters
Palo 1.0 supports only one interface, called Legacy. Palo 1.5 additionally supports a new API based on HTTP. Two different access drivers serve the two interfaces, so in some cases switching between access methods may improve performance. The HTTP interface is faster in most cases.
The connection dialog contains helper buttons for working with an already configured connection:
- Button Test opens a connection to the specified database server. If the connection succeeds, a confirmation dialog is shown. If the connection cannot be established, the module shows a detailed description of the error.
- Button Explore allows viewing the database contents. This feature is helpful if the user wants information about the Palo database. The following figure illustrates this.
Fig 3. Palo database explorer
The dialog window that appears contains detailed information about the database cubes, dimensions and elements. It does not show cube content, however; other software must be used for that task.
- Button Feature List provides more structured information about the connection. The following figure illustrates this.
Fig 5. Palo input step configuration
The configuration window allows you to specify the following parameters:
1. Step name. This value is used only to identify the Palo input step among the other transformation steps.
2. Palo connection and cube name. The Palo input step reads data from this Palo cube. There are two buttons next to the connection combo box; they allow you to configure an existing connection or create a new one.
3. Field names in the generated output stream. A name must be specified for each cube dimension (in the table) and for the value field.
The Palo input step automatically estimates the size of the output stream by multiplying the element counts of all cube dimensions and shows the result as the Combination factor. This value is usually much larger than the real output stream size, but it indicates precisely how many cube cells will be scanned and so allows you to estimate the duration of the operation.
The scan time can be reduced by specifying a single element in any dimension. In that case the Palo input step does not scan all elements of that dimension. Either a simple element or a consolidated element can be specified. In the first case the Palo input step processes only that single element of the dimension; in the second case it processes all children of the specified element. Click in the elements column to open a window for selecting an element from the corresponding dimension.
The user can select any element in the displayed dimension tree or click the Clean button. Clicking Clean clears the element selection for this dimension, so the Palo input step scans all of its elements. Finally, the Refresh button reloads the contents of the displayed tree.
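The Combination factor arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the multiplication; the dimension sizes are invented and the function name combination_factor is hypothetical, not part of the Palo input step.

```python
from math import prod

def combination_factor(dimension_sizes):
    """Product of the element counts of all cube dimensions."""
    return prod(dimension_sizes)

# Hypothetical cube: 12 months x 50 products x 8 regions.
print(combination_factor([12, 50, 8]))   # 4800 cells to scan

# Selecting one simple element in the second dimension reduces
# that dimension's contribution to a single element.
print(combination_factor([12, 1, 8]))    # 96 cells to scan
```

Selecting a consolidated element instead would contribute the number of its children rather than 1.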
Fig 7. Palo output step configuration
The elements of this dialog allow you to specify the following parameters:
1. Transformation step name.
2. Palo connection. There are two buttons next to the connection combo box; they allow you to configure an existing connection or create a new one.
3. The mapping between input row fields and cube dimensions.
4. The action to take when an element does not exist in its dimension. There are three types of action: add numeric elements, add string elements or skip the whole row. A cube cell holds a numeric value only if all of its corresponding elements are numeric; if a corresponding element is a string element, the cell holds a string value.
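The missing-element handling and the cell typing rule described above can be modelled roughly as follows. This is a sketch under stated assumptions: the dictionary layout, the policy names "add_numeric", "add_string" and "skip", and the functions prepare_row and cell_type are all hypothetical and do not reflect the actual implementation of the Palo output step.

```python
def prepare_row(dimensions, row, policy):
    """Apply the missing-element policy; return False if the row is skipped.

    dimensions: {dimension name: {element name: "numeric" or "string"}}
    row:        {dimension name: element name}
    policy:     "add_numeric", "add_string" or "skip" (hypothetical names)
    """
    missing = [(dim, elem) for dim, elem in row.items()
               if elem not in dimensions[dim]]
    if missing and policy == "skip":
        return False
    new_type = "numeric" if policy == "add_numeric" else "string"
    for dim, elem in missing:
        dimensions[dim][elem] = new_type
    return True

def cell_type(dimensions, row):
    """A cell is numeric only if every addressed element is numeric."""
    types = (dimensions[dim][elem] for dim, elem in row.items())
    return "numeric" if all(t == "numeric" for t in types) else "string"

dims = {"Months": {"Jan": "numeric"}, "Products": {"Bike": "numeric"}}
row = {"Months": "Feb", "Products": "Bike"}
if prepare_row(dims, row, "add_string"):
    print(cell_type(dims, row))   # string, because Feb was added as a string element
```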
Fig 10. Palo engine step dialog. Connections with Palo cubes
This dialog window has elements for specifying:
1. The Kettle transformation step name.
2. The rule that analyses the data of the input row and calculates the fields of the output row. Button Browse allows you to view the contents of the rules repository and select a rule (see below).
3. Tab Fields shows the structure of the output stream: the number of fields and the name and type of each field.
4. Tab Parameters allows you to view the rule parameters and specify their values.
5. Tab Connections allows you to view the connections and set up the mapping between cube aliases and real connections to Palo servers.
Fig 11. Rules repository
The repository window allows you to create new rules, delete rules, or modify an existing rule or function. Its buttons allow selecting a rule, cancelling the selection or refreshing the window content. When the user creates a new rule or modifies an existing one, Kettle opens the rule modification dialog (see next figure).
Fig 12. Rule modification window
The rule name, description and content can be modified here. The rule content is an XML document that describes how data is modified. The syntax and meaning of this XML document are described in the following parts. The window also has buttons for checking the rule and for viewing the fields of the input stream.
Engine rules
As described above, the rules repository contains user-defined rules that can be used for data processing in Palo rule transformation steps. Palo rules can get data from the input stream, from rule configuration parameters and directly from Palo cubes. Similarly, rules can put data to the output stream or directly to a Palo cube.
Rule execution
Each rule has a name that identifies it in the repository and can have a description containing any text. This text may hold additional comments on the rule that are helpful for the user. A rule can be represented as a multibranch tree with a single root node. Each node has one of several types and can have additional parameters that depend on the node type. Executing a node performs some actions; which actions depends on the node type and the additional node parameters. In most cases the additional node parameters are expressions that calculate output or intermediate data. Executing a rule means executing its root node. A rule is easily represented as an XML document: each rule node is an XML element with specific attributes and child elements, and the element name depends on the node type. All currently supported node types are described below.
Rule Node
The root node of a rule is always a node of type rule. The corresponding XML element is named rule and has the following format:
<rule name="..." description="..." immediately="true/false"> ... </rule>
It always has the attribute name, which contains the name of the rule, and can have two additional attributes. The first, description, contains a free-form user comment for the rule. The second, immediately, defines how the rule stores data to Palo cubes or to the output stream. If this attribute has the value true, the rule engine stores data immediately. Otherwise the Palo engine caches the data modifications and stores the cache content only when rule execution has finished. Immediate and buffered execution are described in a following section. Executing a rule node means executing each of its child nodes in order, from the first to the last.
used by the rule. For this purpose, each cube must be declared as a parameter with the special type connection. Similarly, if the rule generates an output stream, the structure of that stream must be declared. Each stream field is declared by a special XML element:
<output name="..." type="..."/>
The output field type can be numeric, string or boolean.
Assignment Node
This node type sets values in the output stream or in Palo cubes. In more familiar terms, it implements the assignment operator. It has the following structure in the XML document:
<set cell="..." expression="..." description="..." />
This node type has two additional parameters. The first, cell, contains an expression that determines which cell (or output value) will be modified. This expression can refer to the output stream, a cube cell, a consolidation factor and so on. Some special functions can be used only in this expression; they are described in the following sections. The second parameter, expression, contains the expression that calculates the new value. Both expressions must have the same value type. Finally, an assignment node can have a description attribute for user comments. An assignment node can't have child nodes. Executing an assignment node means calculating the source and destination expressions. If the rule stores data immediately, the node then writes the data to the cube or output stream; otherwise the node stores the calculated value in the cache. Immediate and buffered execution are described in a following section.
Call Node
This node type calls functions that perform operations such as creating a new Palo cube. It has the following structure in the XML document:
<call expression="..." description="..." />
The first parameter (expression) holds an expression that refers to the function and describes how to evaluate its parameter values. The second parameter holds the description of this rule node. A call node can't have child nodes. Executing a call node means evaluating the given expression, which consists of calculating the function parameters and executing the specified function. A call node can perform its action in immediate or buffered mode.
Condition Node
This node type defines branches in rule execution. In the XML document a condition node has the following format:
<if condition="..." description="..."> ... </if>
<elseif condition="..."> ... </elseif>
<else> ... </else>
A condition node always has a condition attribute that contains a logical expression. Like other node types, a condition node can have a description attribute with user comments. Executing a condition node involves calculating the condition. If the condition is true, the node executes its child nodes in order. If the condition is false, the if node itself does nothing; the elseif branches and the else branch are then evaluated in sequence.
Enumeration Node
This node enumerates all items of a specified collection and performs actions for each value in it. In the XML document an enumeration node has the following format:
<foreach name="..." in="..." description="..."> ... </foreach>
The first parameter (name) contains the name of a variable that holds the current item of the collection. The second parameter (in) specifies the collection; it is an expression whose result has one of the following types: set, vector, rule result, cell enumeration. An enumeration node contains child nodes that describe the actions performed on the collection items. An enumeration node can be empty (without child nodes), but such a node does nothing.
All buffered actions are performed after rule execution, but a rule can force all buffered actions to run earlier by calling the special function processActions. This is very helpful in some cases; for example, the database creation rule uses it (see Examples).
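The difference between immediate and buffered execution can be pictured with a small Python model. This is a sketch only: the class RuleContext and its methods are hypothetical stand-ins for the engine's behaviour, and the store dictionary stands in for cube cells or the output stream.

```python
class RuleContext:
    """Toy model of a rule running in immediate or buffered mode."""

    def __init__(self, immediately):
        self.immediately = immediately
        self.store = {}     # stands in for cube cells / output stream
        self.pending = []   # buffered actions, kept in execution order

    def set_cell(self, cell, value):
        if self.immediately:
            self.store[cell] = value            # written at once
        else:
            self.pending.append((cell, value))  # deferred

    def process_actions(self):
        """Models processActions: flush everything buffered so far."""
        for cell, value in self.pending:
            self.store[cell] = value
        self.pending.clear()

    def finish(self):
        """End of rule execution: remaining buffered actions are applied."""
        self.process_actions()

ctx = RuleContext(immediately=False)
ctx.set_cell("Sales/Jan", 100)
print(ctx.store)        # {} because nothing is written yet
ctx.process_actions()
print(ctx.store)        # {'Sales/Jan': 100}
```

Flushing early in this way matters when a later step depends on the result of an earlier buffered action, as in the database creation case mentioned above.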
Nested rules
Nested rules are rules that are described inside another rule. They calculate something and can be executed several times from different node positions. They are very helpful in the following cases:
1. The same actions must be performed in several places.
2. Something must be calculated and then processed, but the calculation requires a complex node. In addition, the engine allows variables to be declared but does not allow their values to be modified. Modifying a variable is convenient when its value is calculated by complex logic and depends on several conditions. In the Palo engine this can be programmed as follows: a special nested rule calculates the value, and the variable is declared with this value.
3. The rule is very complex and it helps to split it into several simpler rules.
Nested rules have the same format as normal rules: they have a name, a description and input parameters, and they can generate an output stream. A nested rule can be executed in any expression like any internal engine function, but the result of a nested rule function is a sequence of tuples. This sequence can be stored in a variable or processed by an enumeration node. Two special functions are available inside nested rules. The first, RuleName_OutputRow, writes data to the rule execution result. The second, RuleName_Restart, restarts the nested rule execution with other parameter values. The following simple example demonstrates the calculation of a factorial.
<rule name="F" immediately="true" description="">
  <param name="N" type="numeric"/>
  <output name="V" type="numeric"/>
  <if condition="N=1">
    <call expression="F_OutputRow(1)"/>
  </if>
  <if condition="N>1">
    <rule name="FComplex" immediately="true" description="">
      <param name="N1" type="numeric"/>
      <param name="V1" type="numeric"/>
      <output name="result" type="numeric"/>
      <if condition="N1=1">
        <call expression="F_OutputRow(V1)"/>
      </if>
      <if condition="N1>1">
        <call expression="FComplex_Restart(N1-1,V1*N1)"/>
      </if>
    </rule>
    <foreach name="value" in="FComplex(N,1)">
      <call expression="value.getNumeric(0)"/>
    </foreach>
  </if>
</rule>
The main rule F has one input parameter N and generates a stream with one field V. This stream always contains one tuple, which holds the factorial of the given N. If N is equal to one, the rule writes one to the result. Otherwise the additional nested rule FComplex is used. It gets two parameters, N1 and V1, and calculates N1!*V1. The internal logic of FComplex is very simple: if N1=1, the result of FComplex is V1; otherwise the result of FComplex is equal to (N1-1)!*V1*N1. Finally, rule F gets the result of FComplex and writes it to its own result. It is also possible to store the result of a nested rule execution in a variable and check it with the function isEmpty, which returns true if the rule result is an empty sequence.
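The arithmetic behind F and FComplex can be traced with a short Python model. It is a sketch of the calculation only, not PRE code: the functions f and f_complex are hypothetical, each restart is modelled as one loop iteration, and a rule result is modelled as a list of one-field tuples.

```python
def f_complex(n1, v1):
    """Models FComplex: the result holds N1! * V1 as a single row."""
    while n1 > 1:
        # Corresponds to a restart with parameters (N1-1, V1*N1).
        n1, v1 = n1 - 1, v1 * n1
    # A row is produced only when N1 has reached 1.
    return [(v1,)] if n1 == 1 else []

def f(n):
    """Models F: one row holding N! for N >= 1, otherwise no rows."""
    if n == 1:
        return [(1,)]
    if n > 1:
        return [(row[0],) for row in f_complex(n, 1)]
    return []

print(f(5))   # [(120,)]
```

For N=5 the restarts pass the parameter pairs (4, 5), (3, 20), (2, 60) and (1, 120), so the single output row holds 120.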
Expression Syntax
Most rule nodes rely on expressions that determine how to analyze input data and calculate output data. The Palo rule engine supports the data types listed below:
- Numeric data. A value of this type is a floating-point number.
- String data. A value of this type is a string. There are no restrictions on string content or length.
- Logic data (Boolean). A value of this type is a boolean flag.
- Single dimension elements. A value of this type is a single element from a specified dimension in a specified Palo database.
- Sets of dimension elements. A value of this type is a set of elements from a specified dimension in a specified Palo database.
- Sets of cube cells. A value of this type is a set of cells from a specified cube; in other words, a cube projection.
PRE expressions are strongly typed: every expression has a definite type. This type depends on the operations and functions used in the expression and, more importantly, can always be determined while the expression is parsed and compiled. The Palo rule engine supports a standard set of operations for manipulating values; for example, it supports the standard mathematical operations written in infix form. The syntax and usage of these operations depend on the operand types, but some constructs are independent of value types and have a fixed syntax: brackets can be used to give operations higher priority, and functions can be called. A function call has the following syntax:
function-name( arg1,arg2,....,argN )
A function name need not be unique, but a function is unambiguously identified by its name together with the number and types of its parameters. The function result depends on the parameter values and may also depend on external data. For example, a function can get additional information from Palo cubes, the input stream and so on, even though this information is not explicitly specified in the function arguments. The function result type, in contrast, is determined during expression compilation. It depends on the number of arguments, the argument types and, possibly, the argument values if they can be calculated during compilation.
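One way to picture how a function is identified by its name together with the number and types of its arguments is a lookup table keyed by the whole signature. The sketch below is a Python model under stated assumptions: the SIGNATURES table, the type labels and the resolve function are hypothetical, and the result types listed are inferred from the function descriptions in this document rather than taken from the engine.

```python
# Hypothetical registry: (function name, argument types) -> result type.
SIGNATURES = {
    ("getSize", ("set",)): "numeric",
    ("isEmpty", ("set",)): "boolean",
    ("Sum", ("set",)): "numeric",
    ("Sum", ("set", "string", "string")): "numeric",
}

def resolve(name, arg_types):
    """Return the result type for a call, known before any value is computed."""
    key = (name, tuple(arg_types))
    if key not in SIGNATURES:
        raise TypeError(f"no function {name}({', '.join(arg_types)})")
    return SIGNATURES[key]

print(resolve("Sum", ["set"]))                      # numeric
print(resolve("Sum", ["set", "string", "string"]))  # numeric
```

Both Sum entries share a name, yet each call resolves to exactly one entry, and the result type is available without evaluating any argument.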
Sets of values
PRE provides several operations for manipulating sets of values. Sets do not differ from other values, so it is possible to create sets of sets of any values. The set operations have the following syntax:
- Set construction: { item1, item2, item3 ,... }
  All items must have the same value type, but expressions can be used instead of constant values. The result is the set of the listed values.
- Set union uses the syntax of numeric addition: A + B + C ...
  Expressions A, B, C must have a set type. The result is the union of the given arguments.
- Set intersection uses the syntax of numeric multiplication: A * B * C ...
  Expressions A, B, C must have a set type. The result is the intersection of the given arguments.
- Set subtraction uses the syntax of numeric subtraction: A - B - C ...
  Expressions A, B, C must have a set type. The result is the difference of the given arguments.
- Element access: A[index]
  Here A is a function or variable that returns a set and index is the numeric index of the element.
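The meaning of the set operators can be checked against a small Python model. This is an illustration only: the helper functions are hypothetical, sets are modelled as ordered lists without duplicates so that A[index] has a meaning, and the 0-based indexing is Python's convention, since this section does not state which index base PRE uses.

```python
def union(a, b):
    """Models A + B: items of a, then items of b not already present."""
    return a + [x for x in b if x not in a]

def intersection(a, b):
    """Models A * B: items present in both sets."""
    return [x for x in a if x in b]

def difference(a, b):
    """Models A - B: items of a that are not in b."""
    return [x for x in a if x not in b]

a = [1, 2, 3]
b = [3, 4]
print(union(a, b))         # [1, 2, 3, 4]
print(intersection(a, b))  # [3]
print(difference(a, b))    # [1, 2]
print(union(a, b)[0])      # 1 (element access, 0-based in Python)
```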
All items of a set can be enumerated with an enumeration node. In addition, PRE has several functions for working with sets of values. These functions are listed in the following table.
Function and arguments            Description
getSize(set)                      Returns the number of elements in the given set.
isEmpty(set)                      Returns true if the given set is empty.
contains(set,elem)                Returns true if the given set contains the given element.
includes(set,subset)              Returns true if the given set contains each element of the given subset.
getEmpty(value)                   Creates an empty set that can contain values of the same type as the given value.
indexOf(set,value)                Returns the index of the given value in the given set.
getNumericRange(start, finish)    Returns the set of values in the range (start, finish).
Description
- Returns the name of the given element.
- Returns the type of the given element. The type is a string and may be one of the following values:
  - consolidated
  - numeric
  - string
  - rule
- Returns the set of children of the dimension element.
- Returns the set of all consolidated elements that contain the specified element.
- Returns the set of all elements of the dimension.
- Returns the set of all elements of the dimension given by one element of this dimension or by a given set of dimension elements.
Function and arguments                                   Description
getDimensionElement(connection, dimension, element)      Returns the specified element from the dimension.
checkDimensionElement(connection, dimension, element)    Checks whether the specified element exists in the dimension.
- Checks the projection and returns true if it contains only numeric elements.
- Checks the projection and returns true if it contains only string elements.
- Returns the numeric value of the first projection element. This function can be used in set-cell engine items.
- Returns the string value of the first projection element. This function can be used in set-cell engine items.
- Caches the values of the given cube or sub cube.
- Caches the values of the given cube or sub cube, but limits the cache size.
- Puts the value of the specified cell into the cache.
Aggregate function
PRE supports the aggregate functions listed below. Each function takes one or three parameters. With one parameter, the parameter must be a set of values or a cube projection, and the function calculates its result from the values in the set or in the cube cells. With three parameters, the first must be a set of values or cube cells, the second must be a string constant that specifies a variable name, and the third must be a string constant that specifies an expression. This expression can use the variable named in the second parameter. The aggregate function binds each value from the given set or cube projection to the variable, calculates the expression for each value, and finally aggregates the calculated values into a single result.
Function and arguments    Description
Min(set)                  Calculates the minimum value.
Min(set,varname,exp)      Calculates the minimum of the expression values.
Max(set)                  Calculates the maximum value.
Max(set,varname,exp)      Calculates the maximum of the expression values.
Avg(set)                  Calculates the average value.
Avg(set,varname,exp)      Calculates the average of the expression values.
Sum(set)                  Calculates the sum of all values.
Sum(set,varname,exp)      Calculates the sum of the expression values.
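The one-parameter and three-parameter forms can be compared in a small Python model. It is a sketch under stated assumptions: the aggregate function and the AGGREGATES table are hypothetical, and the expression string is written in Python syntax and evaluated with Python's eval purely for illustration, which says nothing about how PRE compiles its own expressions.

```python
def avg(values):
    """Arithmetic mean of a non-empty sequence."""
    values = list(values)
    return sum(values) / len(values)

AGGREGATES = {"Min": min, "Max": max, "Avg": avg, "Sum": sum}

def aggregate(name, values, varname=None, expr=None):
    """One-parameter form: aggregate the values directly.
    Three-parameter form: bind each value to varname, evaluate expr,
    then aggregate the evaluated results."""
    if expr is not None:
        code = compile(expr, "<expr>", "eval")
        values = [eval(code, {"__builtins__": {}}, {varname: v})
                  for v in values]
    return AGGREGATES[name](values)

print(aggregate("Sum", [1, 2, 3]))               # 6
print(aggregate("Sum", [1, 2, 3], "x", "x*x"))   # 14
print(aggregate("Max", [1, 2, 3], "x", "10-x"))  # 9
```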
- ... default input stream
- Checks whether the current row is the first row of the specified input stream.
- Checks whether the current row is the last row of the default input stream.
- Checks whether the current row is the last row of the specified input stream.
- Checks whether the default input stream is finished.
- Checks whether the specified input stream is finished.
- Tries to read the next row from the specified input stream.
- Allows row caching in the specified input stream.
- Marks the stream as read.
- Reads all remaining rows of the specified stream.
Database operations
PRE provides several functions for working with the cubes and dimensions of a database.
Function and arguments / Description
getDatabaseCubes(connect)
    Returns the set of cube names in the given Palo database. The database is specified by the connection name declared in the rule.
createDatabaseCube(connect,name,dims)
    Creates a new cube from the given connection name, cube name and set of dimension names. This function can be used in set-call engine items only.
getDatabaseDimensions(connect)
    Returns the set of dimension names in the given Palo database. The database is specified by the connection name declared in the rule.
createDatabaseDimension(connect,name)
    Creates a new dimension from the given connection name and dimension name. This function can be used in set-call engine items only.
getCubeDimensions(connect,cube)
    Returns the set of dimension names in the given Palo cube. This function is similar to getDatabaseDimensions, but returns only the dimensions used in the given cube. It receives the connection name and the cube name as arguments.
getDimensionElementsByName(connect,name)
    Returns the set of element names in the specified dimension. It receives the connection name and the dimension name as arguments.
createDimensionElement(connect,dim,name,type)
    Creates a new element. It receives the connection name, dimension name, new element name and element type. The last argument (type) must be one of the following values:
    - numeric
    - string
    - consolidated
    This function can be used in set-call engine items only.
createConsolidation(connect,dim,elem,elemParent,factor)
    Consolidates elements. It receives the connection name, dimension name, child element name, parent element name and consolidation factor. After the function executes, the specified child element becomes a child of the specified parent element. This function can be used in set-call engine items only.
consolidationFactor(elem,elemParent)
    Works with an already consolidated element. It returns the consolidation factor when used in an expression and sets the consolidation factor when used in a set-call engine item.
Logging functions
PRE supports several functions for writing messages to the log and controlling data processing.
Function and arguments / Description
logMinimal(args)
    Writes a message to the log. The message is built by concatenating the string representations of the given arguments and is visible at the minimal logging level.
logBasic(args)
    Writes a message to the log. The message is built by concatenating the string representations of the given arguments and is visible at the basic logging level.
logError(args)
    Writes an error message to the log. The message is built by concatenating the string representations of the given arguments.
logDetailed(args)
    Writes a detailed message to the log. The message is built by concatenating the string representations of the given arguments.
logDebug(args)
    Writes a debug message to the log. The message is built by concatenating the string representations of the given arguments.
stop()
    Stops data processing.
processActions()
    Processes delayed (buffered) actions.
incrementInputRows
    Increments the count of read rows.
incrementOutputRows
    Increments the count of written rows.
InputRowsCount
    Returns the count of read rows.
OutputRowsCount
    Returns the count of written rows.
Compiler errors
Code  Description
1     Syntax error (cannot parse expression). Occurs if the expression contains an evident syntax error, for example a missing closing bracket. In most cases, the error description contains more detailed information about the error.
2     Internal parser error. This is an unknown error that occurs in unexplained cases.
3     Attempt to incorrect add. Occurs if the user attempts to add values that do not support this operation. For example, the user tries to add a logic value to another logic value.
4     Attempt to incorrect sub. Occurs if the user attempts to subtract values that do not support this operation. For example, the user tries to subtract a logic value from another logic value.
5     Add different types. Occurs if the user tries to add values of different types. For example, the user tries to add a numeric value to a string value.
6     Attempt to incorrect multiplication. Occurs if the user attempts to multiply values that do not support this operation. For example, the user tries to multiply a logic value by another logic value.
7     Attempt to multiply different types. Occurs if the user tries to multiply values of different types. For example, the user tries to multiply a numeric value by a set of numeric values.
8     Attempt to and different types. Occurs if the user tries to perform a logical and on values of different types. For example, the user tries to build a logical and of a logic value and a numeric value.
9     Attempt incorrect and. Occurs if the user attempts to perform a logical and operation on values that do not support this operation.
10    Attempt incorrect not. Occurs if the user attempts to perform a logical not operation on values that do not support this operation.
11    Attempt to or different types. Occurs if the user tries to perform a logical or on values of different types. For example, the user tries to build a logical or of a logic value and a numeric value.
12    Attempt incorrect or. Occurs if the user attempts to perform a logical or operation on values that do not support this operation.
13    Cannot compare values. Occurs if the user attempts to compare values that do not support the specified comparison operation.
14    Constructed set contains values of different types. The user tries to create a set of values, but the specified set members have different types. For example, the user tries to put a numeric value and a string value into the same set.
15    Index is not supported for this type. The user tries to index a value whose type does not support the indexing operation. For example, i[10] cannot be written if i is a numeric variable.
16    Different indexing operations. The user has specified values of different types in the same index. For example, elements from different dimensions are written in the same index.
17    Result of condition expression has different types. The expression contains a condition operator, but the result type of this operator cannot be determined. For example, the user has written (i>j)?1:{1}. If i>j, the result is a numeric value, but if i<=j, the result is a set of numeric values.
18    Condition expression is not a logical expression. The condition expression in a condition operator does not give a boolean result. For example, i?1:0 is an erroneous expression if i is not a logical value.
19    Function or variable not found. Occurs if the user has specified an unknown function or variable name.
20    Unknown database object identifier. Occurs if the user has specified some database object that the compiler cannot resolve. For example, the user has written A.B.C, but A is not a connection name or B is not a database object name.
21    Invalid function arguments. The expression calls an existing function, but the function has a different parameter count or different parameter types.
Evaluating errors
Code  Description
1     Division by zero.
2     Dimension element not found.
3     Database not found.
4     Dimension not found.
5     Cube not found.
6     Cannot set value. The specified expression does not represent any stored value, so a new value cannot be set for it.
7     Invalid function argument. The function has received an invalid argument value.
8     Unable to set up constant. The specified expression is always a constant or is not a constant.
9     Operation not supported.
10    Cannot compile string. The aggregate function cannot compile the given expression.
11    Elements are not consolidated. The specified dimension elements are not consolidated.
12    Calculation error that does not depend on the evaluator (some external error).
13    Cannot calculate function.
14    Cannot perform action.
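Some evaluating errors can be avoided by guarding the expression with a condition item. The sketch below guards against the division by zero error. The rule name SafeShare and the stream fields Value, Total and Share are hypothetical, and the use of / as the division operator is an assumption, since the numeric operators are described in another section.

```xml
<rule description="Computes a share only when the total is not zero" name="SafeShare">
  <!-- Divide only when the divisor is not zero -->
  <if condition="Input('Total')!=0">
    <set cell="Output('Share')" expression="Input('Value')/Input('Total')"/>
  </if>
  <!-- Otherwise report the skipped row at the basic logging level -->
  <if condition="Input('Total')=0">
    <call expression="logBasic('Share skipped because the total is zero')"/>
  </if>
</rule>
```

Two separate condition items are used because the rule listings in this document show no else branch.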
Engine errors
Code  Description
1     Cannot compile left part of assignment. Can occur during engine initialization. The expression in the cell attribute of an assignment item cannot be compiled. The error description contains detailed information about the compiler error.
2     Cannot compile right part of assignment. Can occur during engine initialization. The expression in the expression attribute of an assignment item cannot be compiled. The error description contains detailed information about the compiler error.
3     Cannot calculate assigned value. Can occur during rule execution if the expression in an assignment item cannot be calculated. The error description contains detailed information about the evaluator error.
4     Cannot set value. Can occur during rule execution if a new value cannot be set for the specified expression (see the cell attribute of the assignment item). The error description contains detailed information about the evaluator error.
5     Cannot compile condition expression. Can occur during engine initialization. The expression in the condition attribute of a condition item cannot be compiled. The error description contains detailed information about the compiler error.
6     Cannot calculate condition. Can occur during rule execution if the expression in a condition item cannot be calculated. The error description contains detailed information about the evaluator error.
7     Invalid parameter name. Attempt to set the value of an unknown additional rule parameter.
8     Invalid parameter type. Attempt to set an incorrect value of an additional rule parameter.
9     Condition expression is not a logic expression. Occurs if the expression specified in a condition item does not return a logical value.
10    Cannot compile enumeration expression. Can occur during engine initialization. The expression in the in attribute of an enumeration item cannot be compiled. The error description contains detailed information about the compiler error.
11    Cannot calculate result of enumeration expression. Can occur during engine execution. The expression in the in attribute of an enumeration item cannot be calculated. The error description contains detailed information about the evaluator error.
12    Result of enumeration expression is not a set of elements. Can occur during engine initialization. The expression in the in attribute of an enumeration item must return a set of values.
13    Cannot compile variable value. Can occur during engine initialization. The expression in the expression attribute of a variable item cannot be compiled. The error description contains detailed information about the compiler error.
14    Cannot calculate variable value. Can occur during engine initialization. The variable value in a variable item cannot be calculated. The error description contains detailed information about the evaluator error.
15    Variable already declared. Can occur during engine initialization if a variable with the specified name is already declared.
16    Cannot compile function call. Can occur during engine initialization. The expression in the expression attribute of a call item cannot be compiled. The error description contains detailed information about the compiler error.
17    Cannot make call. Occurs because of some internal function error. In most cases, the error description gives detailed information about the cause.
...   In the XML representation, the <foreach> element has no name attribute.
22    Cannot connect to repository.
23    There is no rule description. In the XML representation, the <rule> element has no description attribute.
24    There is no function description. In the XML representation, the <function> element has no description attribute.
25    There is no parameter type. In the XML representation, the <param> or <output> element has no type attribute.
26    Invalid parameter type. In the XML representation, the <param> or <output> element contains an invalid value in the type attribute.
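The attribute-related errors above can be avoided by a rule skeleton that carries every attribute they mention. The following is a minimal sketch; the rule name ListYears and the Years dimension are hypothetical, and the item syntax follows the rule listings in the Examples section.

```xml
<!-- The rule element carries a description attribute (it may be empty) -->
<rule description="Logs the names of all Years elements" name="ListYears">
  <!-- Each param element carries a type attribute with a valid value -->
  <param name="connect" type="connection"/>
  <!-- The foreach element carries a name attribute for the loop variable -->
  <foreach in="getDimensionElementsByName('connect','Years')" name="elemYear">
    <call expression="logDebug('Year element ',elemYear.getElementName)"/>
  </foreach>
</rule>
```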
Examples
Exporting cube structure to stream
This example demonstrates how to represent a cube structure in a Kettle stream and how to generate this stream with the Palo Rule step. For this purpose, the Kettle transformation must contain a Palo Rule step that gets data from the Palo server and generates the data stream. This step must use a special rule that receives the Palo connection and the Palo cube as input parameters. The generated output stream may be directed to a flat file or to any other Kettle step. For example, it may be directed to another Palo Rule step that creates a new cube from the given description.

The stream generation rule is described after the detailed description of the cube structure stream. The cube structure description includes information about each cube dimension and each element in the dimensions. Each row of the structure stream has five fields and can represent information about a single dimension element or about an element consolidation. In addition, a special row contains information about all dimensions in the cube. This information is needed for cube creation, and this row must be the first row in the stream. The following table contains detailed information about the stream fields.

Field      Description
dimension  Dimension name. If this field is empty, the row contains information about all dimensions of the cube.
element    Element name, or the list of cube dimensions (if the row's dimension field is empty).
parent     Parent element name, if the row contains information about an element consolidation.
type       Element type. The field must contain one of the listed values:
           - numeric
           - string
           - consolidation
factor     If the row contains information about an element consolidation, this field contains the consolidation factor.
  <!-- First, declare a variable for simple access to the cube -->
  <declare expression="getCube('connect',cubename)" name="c"/>
  <!-- Second, write the special row to the output stream.
       This row contains information about the cube dimensions -->
  <call expression="OutputRow('',getCubeDimensions('connect',cubename).toString,'','',0)"/>
  <!-- Loop over each cube dimension -->
  <foreach in="getNumericRange(0,c.getDimensionCount-1)" name="dimnum">
    <!-- Get the dimension name and store it in a variable -->
    <declare expression="c.getDimensionName(dimnum)" name="dimname"/>
    <!-- Loop over each element of the dimension -->
    <foreach in="c.getDimensionElements(dimnum)" name="elem">
      <!-- Write information about each element -->
      <call expression="OutputRow(c.getDimensionName(dimnum),'',elem.getElementName,elem.getElementType,0)"/>
      <!-- Write information about consolidated elements -->
      <foreach in="elem.getChildren" name="child">
        <!-- Get the consolidation factor and store it in a variable -->
        <declare expression="consolidationFactor('connect',dimname,child.getElementName,elem.getElementName)" name="f"/>
        <!-- Output row -->
        <call expression="OutputRow(dimname,elem.getElementName,child.getElementName,elem.getElementType,f)"/>
      </foreach>
    </foreach>
  </foreach>
</rule>
Cube examples
The following examples all work on the same cubes and demonstrate the use of the Palo Engine for manipulating data in a more complex case. This section describes the structure of each cube; the rule examples are described in the following sections.

All examples are based on one idea. A company works in Europe and has two subsidiaries. The first works in Germany, the second works in another country. Each company has its own cube, which contains information about sales. This information is analogous to the Palo Demo database, with some differences. The Germany company cube has no region dimension. The Other company cube has this dimension, but it has no Germany element. Both cubes are filled with data from the Palo database; the global company cube is filled with data from the Germany and Other cubes. The following figure illustrates this.
[Figure: data flow between the Palo Demo, Germany, Other, Global and Analytics cubes]
Fig 10. Data transformation

Finally, the global company has a special cube that contains analytical information based on the global cube data. The following sections describe the implementation of each arrow in the figure.
  <if condition="Input('dimension')=''">
    <declare expression="toStringSet(Input('parent'))" name="dims"/>
    <foreach in="dims" name="dim">
      <!-- Create the dimension if it is not the Regions dimension -->
      <if condition="dim!='Regions'">
        <call expression="createDatabaseDimension('connect',dim)"/>
      </if>
    </foreach>
    <!-- Create the Germany cube -->
    <call expression="createDatabaseCube('connect',cubename,dims-{'Regions'})"/>
  </if>
  <!-- If the row contains data about a dimension element and it is not
       an element of the Regions dimension, create it -->
  <if condition="Input('dimension')!=''">
    <if condition="Input('dimension')!='Regions'">
      <if condition="!(getCubeDimensions('connect',cubename).contains(Input('dimension')))">
        <call expression="logError('Does not contain dimension ',Input('dimension'))"/>
        <call expression="stop"/>
      </if>
      <call expression="createDimensionElement('connect',Input('dimension'),Input('element'),Input('type'))"/>
      <if condition="Input('parent')!=''">
        <call expression="createConsolidation('connect',Input('dimension'),Input('element'),Input('parent'),Input('factor'))"/>
      </if>
    </if>
  </if>
</rule>
The rule that fills the Germany cube is not complex. It receives the Demo cube output stream (which may be generated by a Palo Output stream), reads data from the input stream and checks the region value. If the region value is Germany, the rule writes the data to the Germany cube. The rule is listed below.
<rule description="" name="ImportGermanyData">
  <!-- The rule gets two additional input parameters: the Palo connection
       and the cube name in the specified Palo database -->
  <param name="connect" type="connection"/>
  <param name="cubename" type="string"/>
  <!-- The rule works with the specified cube only and stores it in a variable -->
  <declare expression="getCube('connect',cubename)" name="c"/>
  <!-- Check the region value -->
  <if condition="Input('Regions')='Germany'">
    <!-- If the region is Germany, get the element of each dimension
         and store the value in the cube -->
    <declare expression="getDimensionElement('connect','Products',Input('Products'))" name="product"/>
    <declare expression="getDimensionElement('connect','Months',Input('Months'))" name="month"/>
    <declare expression="getDimensionElement('connect','Years',Input('Years'))" name="year"/>
    <declare expression="getDimensionElement('connect','Datatypes',Input('Datatypes'))" name="datatype"/>
    <declare expression="getDimensionElement('connect','Measures',Input('Measures'))" name="measure"/>
    <set cell="NumericValue(c[product][month][year][datatype][measure])" expression="Input('Value')"/>
  </if>
</rule>
  <if condition="Input('Regions')!='Germany'">
    <!-- If the region is not Germany, get the element of each dimension
         and store the value in the cube -->
    <declare expression="getDimensionElement('connect','Products',Input('Products'))" name="product"/>
    <declare expression="getDimensionElement('connect','Regions',Input('Regions'))" name="region"/>
    <declare expression="getDimensionElement('connect','Months',Input('Months'))" name="month"/>
    <declare expression="getDimensionElement('connect','Years',Input('Years'))" name="year"/>
    <declare expression="getDimensionElement('connect','Datatypes',Input('Datatypes'))" name="datatype"/>
    <declare expression="getDimensionElement('connect','Measures',Input('Measures'))" name="measure"/>
    <set cell="NumericValue(c[product][region][month][year][datatype][measure])" expression="Input('Value')"/>
  </if>
</rule>
  <set cell="Output('Products')" expression="Input('Products')"/>
  <set cell="Output('Years')" expression="Input('Years')"/>
  <set cell="Output('Value')" expression="Input('Value')"/>
</rule>
  <foreach in="getDimensionElementsByName('connect','Years')" name="elemYear">
    <call expression="OutputRow('Periods','',elemYear.getElementName,'consolidated',0)"/>
    <foreach in="getDimensionElementsByName('connect','Months')" name="elemMonth">
      <if condition="elemMonth.getElementName!='Year'">
        <declare name="elemName" expression="elemYear.getElementName + ' ' + elemMonth.getElementName"/>
        <call expression="OutputRow('Periods','',elemName,elemMonth.getElementType,0)"/>
        <if condition="elemMonth.getChildren.getSize!=0">
          <call expression="OutputRow('Periods',elemYear.getElementName,elemName,elemMonth.getElementType,1)"/>
        </if>
        <foreach in="elemMonth.getChildren" name="child">
          <declare name="childName" expression="elemYear.getElementName + ' ' + child.getElementName"/>
          <declare expression="consolidationFactor('connect','Months',child.getElementName,elemMonth.getElementName)" name="f"/>
          <call expression="OutputRow('Periods',elemName,childName,child.getElementType,f)"/>
        </foreach>
      </if>
    </foreach>
  </foreach>
  <!-- Generate the Shares dimension -->
  <call expression="OutputRow('Shares','','Value','numeric',0)"/>
  <call expression="OutputRow('Shares','','In qtr','numeric',0)"/>
  <call expression="OutputRow('Shares','','In year','numeric',0)"/>
  <call expression="OutputRow('Shares','','In all time','numeric',0)"/>
  <call expression="OutputRow('Shares','','In subregion','numeric',0)"/>
  <call expression="OutputRow('Shares','','In all regions','numeric',0)"/>
  <call expression="OutputRow('Shares','','In product group','numeric',0)"/>
  <call expression="OutputRow('Shares','','In all products','numeric',0)"/>
</rule>
Moving data from the global cube to the analytic cube consists of two steps, although they can be combined into a single step based on a transformation rule. The first step copies data from the global cube to the analytic cube, and the second calculates the analytic cells in the cube. Both steps are based on a Palo Rule. The data is moved from the global cube to the analytic cube by a rule that reads data from the stream and puts it into the Palo cube.
<rule name="ImportAnalyticFromGlobal" description="">
  <!-- The rule gets the cube name as a parameter and works with the Palo database directly -->
  <param name="connect" type="connection"/>
  <param name="cubename" type="string"/>
  <!-- Get the cube -->
  <declare name="c" expression="getCube('connect',cubename)"/>
  <!-- Get the elements from each dimension -->
  <declare name="m" expression="getDimensionElement('connect','Measures',Input('Measures'))"/>
  <declare name="p" expression="getDimensionElement('connect','Products',Input('Products'))"/>
  <declare name="r" expression="getDimensionElement('connect','Regions',Input('Regions'))"/>
  <declare name="pername" expression="Input('Years') + ' ' + Input('Months')"/>
  <declare name="per" expression="getDimensionElement('connect','Periods',pername)"/>
  <declare name="s" expression="getDimensionElement('connect','Shares','Value')"/>
  <!-- Write the value to the cube cell -->