
Generating DB based Dimension table PKs

By Saurabh Dua (Tata Consultancy Services Ltd.)

Introduction

The Informatica Sequence Generator (SG) is widely used in Dimension mappings to generate the primary keys (PKs) of Dimension tables, which are used as surrogate keys in the corresponding Fact tables. Using the Sequence Generator directly makes it very easy to generate PKs, since minimal coding is required, but it comes with certain limitations:

1. The value of the next PK to be generated is stored in Informatica, so any back-end insert into the Dimension table leaves the code out of sync with the database keys.
2. 2,147,483,647 is the upper limit up to which the Sequence Generator can generate PKs. However, in real-world scenarios there can be dimensions, such as bank accounts, that need more dimension keys.

Preface

This document aims to make the dimension table key generation process depend on actual database values rather than on a value stored in Informatica. The example below uses an Oracle Dimension table in SCD2 mode to demonstrate the Cached Lookup approach:

CREATE TABLE D_ABC (
    PK     NUMBER(38) PRIMARY KEY,
    BIZKEY VARCHAR2(20)
);
[Mapping diagram: Source Qualifier -> Data Transformation (e.g. exp_xyz) -> Dimension table Insert / Update, with Sequence Generator s1 (NEXTVAL) and the Unconnected Cached Lookup (e.g. lkp_Max_PK) both feeding the Data Transformation.]

Steps:
1. All these steps assume that the data type of PK and the related ports is decimal with precision 38.
2. Fetch the maximum value of PK in the SQL override of the cached lookup:
   SELECT NVL(MAX(PK),0) AS MAXPK, 1 AS DUMMY FROM D_ABC
   This SQL returns just one row, so the lookup cache contains only that row.
3. For every INSERT type of row, call this lookup in exp_xyz from a variable port, say v_lkp_maxpk = :LKP.lkp_Max_PK(1)
4. Create a Sequence Generator with these properties: Start Value = 0, Current Value = 1, Reset = Yes, Cycle = No
5. In exp_xyz, connect NEXTVAL from this Sequence Generator to an input port, say in_increment
6. Create an output port in exp_xyz: out_Generated_PK = v_lkp_maxpk + in_increment
7. Connect this output port to the PK of INSERT type of rows.

Advantages:
1. The technique works even in partitioned mappings.
2. If required, back-end inserts are now possible in Dimension tables without changing the stored value of the key in Informatica.
3. It overcomes the Sequence Generator upper limit of 2,147,483,647 generated keys for the entire table.
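
As a rough illustration of steps 2 to 6 and of the second advantage, the following Python sketch simulates two session runs with a back-end insert in between. It is only a simulation under stated assumptions: the list d_abc stands in for the D_ABC table, and the functions lookup_max_pk and run_session are hypothetical helpers invented for this example, not Informatica objects or APIs.

```python
# Simulation only: plain Python stand-ins for the D_ABC table, the cached
# lookup lkp_Max_PK and the Sequence Generator. Nothing here calls Informatica.

def lookup_max_pk(dim_rows):
    """Mimic SELECT NVL(MAX(PK),0) FROM D_ABC: returns 0 for an empty table."""
    return max((pk for pk, _ in dim_rows), default=0)


def run_session(dim_rows, new_bizkeys):
    """One session run: the lookup is read once, NEXTVAL restarts at 1 (Reset = Yes)."""
    v_lkp_maxpk = lookup_max_pk(dim_rows)  # cached lookup: evaluated once per run
    # enumerate(..., start=1) plays the role of NEXTVAL feeding in_increment
    for in_increment, bizkey in enumerate(new_bizkeys, start=1):
        out_generated_pk = v_lkp_maxpk + in_increment
        dim_rows.append((out_generated_pk, bizkey))


d_abc = []                       # empty dimension table
run_session(d_abc, ["A", "B"])   # first run generates PKs 1 and 2
d_abc.append((3, "BACKEND"))     # back-end insert made outside Informatica
run_session(d_abc, ["C"])        # second run reads MAX(PK) = 3 and generates PK 4

print(d_abc)  # [(1, 'A'), (2, 'B'), (3, 'BACKEND'), (4, 'C')]
```

Because the starting point is read from the table at the beginning of each run, the key inserted from the back end is simply taken into account by the next run, and no stored counter has to be adjusted.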

Limitation:
1. Even though the Sequence Generator upper limit of 2,147,483,647 for the entire table is overcome, the technique still has a limit of processing 2,147,483,647 new dimension rows in a single session run.

Workaround to the limitation:
Use a variable port instead of the Sequence Generator to increment the lookup MAX dimension key. The mapping session then cannot run in partitioned mode, because the variable port starts from 1 in every partition, whereas the Sequence Generator is partition aware.

Summary:
Large data warehouses need to hold many dimension keys. This technique can prove useful with little performance impact: Oracle automatically creates an index on the primary key, so fetching MAX is not time consuming. The reliability and robustness of this technique make it better than using SG alone in any dimension mapping.
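
The partition caveat of the workaround can be illustrated with a small Python simulation. This is a sketch under assumptions, not a model of how Informatica actually schedules partitions: each partition is represented by a list of new rows, the partitions are processed one after another, and the function names are hypothetical.

```python
# Simulation only: each "partition" is a list of new rows; no Informatica API is used.

SG_MAX = 2_147_483_647  # Sequence Generator upper limit quoted in the text


def keys_with_variable_port(max_pk, partitions):
    """Workaround: every partition keeps its own counter, which starts from 1."""
    keys = []
    for rows in partitions:
        counter = 0  # a variable port is initialised separately in each partition
        for _ in rows:
            counter += 1
            keys.append(max_pk + counter)
    return keys


def keys_with_shared_sequence(max_pk, partitions):
    """Original technique: one sequence numbers the new rows of all partitions."""
    keys = []
    nextval = 0
    for rows in partitions:
        for _ in rows:
            nextval += 1
            if nextval > SG_MAX:
                raise OverflowError("more new rows than one run can number")
            keys.append(max_pk + nextval)
    return keys


partitions = [["A", "B"], ["C", "D"]]  # two partitions with two new rows each
variable_keys = keys_with_variable_port(100, partitions)
shared_keys = keys_with_shared_sequence(100, partitions)

print(variable_keys)  # [101, 102, 101, 102] -> duplicate keys across partitions
print(shared_keys)    # [101, 102, 103, 104] -> unique keys
```

With per-partition counters the same keys are produced in both partitions, which is why the workaround is restricted to non-partitioned sessions, while the shared sequence stays unique but is still bounded by the per-run limit.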
