Subject Oriented
Data warehouses are designed to help you analyze data. For example, to learn more
about your company's sales data, you can build a warehouse that concentrates on sales.
Using this warehouse, you can answer questions like "Who was our best customer for this
item last year?" This ability to define a data warehouse by subject matter, sales in this case
makes the data warehouse subject oriented.
Integrated
Integration is closely related to subject orientation. Data warehouses must put data
from disparate sources into a consistent format. They must resolve such problems as
naming conflicts and inconsistencies among units of measure. When they achieve this, they
are said to be integrated.
Nonvolatile
Nonvolatile means that, once entered into the warehouse, data should not
change. This is logical because the purpose of a warehouse is to enable you to analyze
what has occurred.
Time Variant
In order to discover trends in business, analysts need large amounts of data. This
is very much in contrast to online transaction processing (OLTP) systems, where performance
requirements demand that historical data be moved to an archive. A data warehouse's focus
on change over time is what is meant by the term time variant.
2. How many stages are there in data warehousing?
Data warehouse generally includes two stages
ETL
Report Generation
ETL
Short for extract, transform, load: three database functions that are combined into one tool.
Extract -- the process of reading data from the source database.
Transform -- the process of converting the extracted data from its previous form into the required form.
Load -- the process of writing the data into the target database.
ETL is used to migrate data from one database to another, to form data marts and data
warehouses, and to convert databases from one format to another.
It is used to retrieve data from various operational databases, transform it into
useful information, and finally load it into the data warehousing system.
1. INFORMATICA
2. AB INITIO
3. DATASTAGE
4. BODI
5. ORACLE WAREHOUSE BUILDER
Report generation
In report generation, OLAP (online analytical processing) is used. It is a set of
specifications which allows client applications to retrieve data for analytical
processing.
It is a specialized tool that sits between a database and user in order to provide various
analyses of the data stored in the database.
OLAP Tool is a reporting tool which generates the reports that are useful for Decision support for top level management.
1. Business Objects
2. Cognos
3. MicroStrategy
4. Hyperion
5. Oracle Express
6. Microsoft Analysis Services
OLTP vs. OLAP (Data Warehouse)
OLTP is application oriented (e.g., a purchase order is the functionality of an application); OLAP is subject oriented.
OLTP is used to run the business; OLAP is used to analyze the business.
OLTP holds detailed data; OLAP holds summarized data.
OLTP access is repetitive; OLAP access is ad-hoc.
OLTP holds current data; OLAP holds historical data.
OLTP serves the clerical user; OLAP serves the knowledge user.
OLTP uses record-at-a-time inserts and updates; OLAP is refreshed by bulk loading.
OLTP is time invariant; OLAP is time variant.
OLTP data is normalized; OLAP data is de-normalized.
OLTP uses an E-R schema; OLAP uses a star schema.
3. What are the types of data warehousing?
EDW (Enterprise datawarehousing)
It provides a central database for decision support throughout the enterprise
It is a collection of DATAMARTS
DATAMART
It is a subset of a data warehouse
It is a subject oriented database which supports the needs of individual departments in an
organization
It is called a high-performance query structure
It supports a particular line of business, like sales, marketing etc.
ODS (Operational data store)
It is defined as an integrated view of operational databases designed to support operational
monitoring
It is a collection of operational data sources designed to support transaction processing
Data is refreshed near real-time and used for business activity
It is an intermediate layer between OLTP and OLAP which helps to create instant
reports
Logical data model term --> Physical data model term
Entity --> Table
Attribute --> Column
Primary Key --> Primary key constraint
Alternate Key --> Unique constraint or unique index
Rule --> Check constraint or default value
Relationship --> Foreign Key
Definition --> Comment
Types of dimensional schemas:
Star schema
Snowflake schema
Starflake schema (or) Hybrid schema
Multi-star schema
What is Star Schema?
The star schema is a logical database design which contains a centrally located fact table
surrounded by one or more dimension tables
Since the database design looks like a star, it is called a star schema database
The dimension table contains primary keys and the textual descriptions
It contains de-normalized business information
A fact table contains a composite key and measures
The measures are key performance indicators which are used to evaluate the
enterprise performance in the form of success and failure
Eg: Total revenue, Product sale, Discount given, no. of customers
To generate a meaningful report, the report should contain at least one dimension and one fact
table
The advantages of a star schema:
Fewer joins
Improved query performance
Easy slicing and dicing
Easy understanding of data.
Disadvantage:
Requires more storage space
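As a minimal sketch (the table and column names are illustrative, not from any specific project), a sales star schema and a typical report query could look like this in SQL:

CREATE TABLE dim_product (
    product_key   INT PRIMARY KEY,      -- surrogate key
    product_name  VARCHAR(100),
    category      VARCHAR(50)
);
CREATE TABLE dim_date (
    date_key      INT PRIMARY KEY,      -- surrogate key, e.g. 20240131
    calendar_date DATE,
    month_name    VARCHAR(20),
    year_no       INT
);
CREATE TABLE fact_sales (
    product_key   INT REFERENCES dim_product (product_key),
    date_key      INT REFERENCES dim_date (date_key),
    quantity      INT,                  -- additive measure
    revenue       DECIMAL(12,2)         -- additive measure
);

-- Report joining the fact table with its dimensions (one join per dimension):
SELECT d.year_no, p.category, SUM(f.revenue) AS total_revenue
FROM fact_sales f
JOIN dim_date d    ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY d.year_no, p.category;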
Semi Additive - Measures that can be summed up across some dimensions but not
across others (e.g., an account balance can be summed across accounts but not across time)
Surrogate Key
Joins between fact and dimension tables should be based on surrogate keys
Users should not obtain any information by looking at these keys
These keys should be simple integers
The staging area is used to clean operational data before loading it into the data warehouse.
Cleaning here means merging and standardizing data that comes from different sources.
It is the area where most of the ETL work is done.
Data Cleansing
It is used to remove duplications
It is used to correct wrong email addresses
It is used to identify missing data
It is used to convert data types
It is used to capitalize name & addresses.
Types of Dimensions:
The main types of dimensions are:
Conformed Dimensions
Junk Dimensions (Garbage Dimensions)
Degenerate Dimensions
Slowly Changing Dimensions
A conformed dimension is one which can be shared by multiple fact tables or multiple data marts.
A junk (garbage) dimension is a grouping of flag and indicator values.
A degenerate dimension is something dimensional in nature but stored in the fact table (e.g., Invoice No).
It is neither a fact nor strictly a dimension attribute, but it is useful for some kinds of analysis. Such attributes are
kept in the fact table and called degenerate dimensions.
Degenerate dimension: a column in the key section of the fact table that does not have an associated dimension
table but is used for reporting and analysis; such a column is called a degenerate dimension or line item dimension.
For example, we have a fact table with customer_id, product_id, branch_id, employee_id, bill_no, and date in the key section and
price, quantity, amount in the measure section. In this fact table, bill_no in the key section is a single value; it has no
associated dimension table. Instead of creating a separate dimension table for that single value, we can include it in the
fact table to improve performance. So here the column bill_no is a degenerate dimension or line item dimension.
Informatica Architecture
Q. How can you define a transformation? What are different types of transformations available in Informatica?
A. A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of
transformations that perform specific functions. For example, an Aggregator transformation performs calculations on
groups of data. Below are the various transformations available in Informatica:
Aggregator
Custom
Expression
External Procedure
Filter
Input
Joiner
Lookup
Normalizer
Rank
Router
Sequence Generator
Sorter
Source Qualifier
Stored Procedure
Transaction Control
Union
Update Strategy
XML Generator
XML Parser
XML Source Qualifier
Q. What is a source qualifier? What is meant by Query Override?
A. Source Qualifier represents the rows that the PowerCenter Server reads from a relational or flat file source when it runs
a session. When a relational or a flat file source definition is added to a mapping, it is connected to a Source Qualifier
transformation.
The PowerCenter Server generates a query for each Source Qualifier transformation whenever it runs the session. The default
query is a SELECT statement containing all the source columns. The Source Qualifier has the capability to override this default query
by changing the default settings of the transformation properties. The list of selected ports, and the order they appear in the
default query, should not be changed in the overridden query.
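As an illustrative sketch (the EMP table and its columns are assumed, not taken from any specific project), the default generated query and a hedged example of an override that keeps the same ports in the same order might look like:

-- Default query generated by the Source Qualifier:
SELECT EMP.EMPNO, EMP.ENAME, EMP.DEPTNO, EMP.SAL FROM EMP

-- SQL override adding a filter and an ORDER BY, with the same ports in the same order:
SELECT EMP.EMPNO, EMP.ENAME, EMP.DEPTNO, EMP.SAL
FROM EMP
WHERE EMP.DEPTNO = 10
ORDER BY EMP.EMPNO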
Q. What is aggregator transformation?
A. The Aggregator transformation allows performing aggregate calculations, such as averages and sums. Unlike Expression
Transformation, the Aggregator transformation can only be used to perform calculations on groups. The Expression
transformation permits calculations on a row-by-row basis only.
Aggregator Transformation contains group by ports that indicate how to group the data. While grouping the data, the
aggregator transformation outputs the last row of each group unless otherwise specified in the transformation properties.
Various group by functions available in Informatica are : AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE,
STDDEV, SUM, VARIANCE.
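To make the group-by behavior concrete, here is a hedged SQL analogue (the EMP table is assumed): an Aggregator with a group-by port on DEPTNO and SUM/AVG output ports behaves roughly like:

SELECT DEPTNO,
       SUM(SAL) AS TOTAL_SAL,   -- aggregate output port
       AVG(SAL) AS AVG_SAL      -- aggregate output port
FROM EMP
GROUP BY DEPTNO;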
Q. What is Incremental Aggregation?
A. Whenever a session is created for a mapping that contains an Aggregator transformation, the session option for Incremental Aggregation
can be enabled. When PowerCenter performs incremental aggregation, it passes new source data through the
mapping and uses historical cache data to perform the aggregation calculations incrementally.
Q. How Union Transformation is used?
A. The union transformation is a multiple input group transformation that can be used to merge data from various sources
(or pipelines). This transformation works just like UNION ALL statement in SQL, that is used to combine result set of two
SELECT statements.
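As a hedged SQL analogue (the table names are illustrative), the Union transformation behaves like:

SELECT CUST_ID, CUST_NAME FROM CUSTOMERS_US
UNION ALL                                   -- duplicates are NOT removed
SELECT CUST_ID, CUST_NAME FROM CUSTOMERS_EU;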
Q. Can two flat files be joined with Joiner Transformation?
A. Yes, joiner transformation can be used to join data from two flat file sources.
Q. What is a look up transformation?
A. This transformation is used to lookup data in a flat file or a relational table, view or synonym. It compares lookup
transformation ports (input ports) to the source column values based on the lookup condition. Later returned values can be
passed to other transformations.
Q. Can a lookup be done on Flat Files?
A. Yes.
Q. What is a mapplet?
A. A mapplet is a reusable object that is created using mapplet designer. The mapplet contains set of transformations and
it allows us to reuse that transformation logic in multiple mappings.
Q. What does reusable transformation mean?
A. Reusable transformations can be used multiple times in a mapping. The reusable
transformation is stored as a metadata separate from any other mapping that uses the
transformation. Whenever any changes to a reusable transformation are made, all the mappings where the transformation
is used will be invalidated.
Q. What is update strategy and what are the options for update strategy?
A. Informatica processes the source data row by row. By default every row is marked to be inserted into the target table. If
the row has to be updated or inserted based on some logic, the Update Strategy transformation is used. The condition can be
specified in the Update Strategy to mark the processed row for update or insert.
Following options are available for update strategy:
DD_INSERT: If this is used the Update Strategy flags the row for insertion. Equivalent numeric value of DD_INSERT is
0.
DD_UPDATE: If this is used the Update Strategy flags the row for update. Equivalent numeric value of DD_UPDATE is 1.
DD_DELETE: If this is used the Update Strategy flags the row for deletion. Equivalent numeric value of DD_DELETE is 2.
DD_REJECT: If this is used the Update Strategy flags the row for rejection. Equivalent numeric value of DD_REJECT is
3.
The informatica server follows instructions coded into update strategy transformations within the session mapping which
determine how to flag records for insert, update, delete or reject. If we do not choose data driven option setting, the
informatica server ignores all update strategy transformations in the mapping.
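A hedged sketch of an Update Strategy expression (the lookup port name is illustrative): rows whose key was not found in the target are flagged for insert, all others for update.

IIF(ISNULL(LKP_CUSTOMER_KEY), DD_INSERT, DD_UPDATE)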
Q. What are the types of mapping wizards that are provided in Informatica?
The Designer provides two mapping wizards:
1. Getting Started Wizard - Creates mapping to load static facts and dimension tables as well as slowly growing
dimension tables.
2. Slowly Changing Dimensions Wizard - Creates mappings to load slowly changing dimension tables based on the
amount of historical dimension data we want to keep and the method we choose to handle historical dimension data.
Q. What is Load Manager?
A. While running a Workflow, the PowerCenter Server uses the Load Manager
process and the Data Transformation Manager Process (DTM) to run the workflow and carry out workflow tasks.
When the PowerCenter Server runs a workflow, the Load Manager performs the following tasks:
1. Locks the workflow and reads workflow properties.
2. Reads the parameter file and expands workflow variables.
3. Creates the workflow log file.
4. Runs workflow tasks.
5. Distributes sessions to worker servers.
6. Starts the DTM to run sessions.
7. Runs sessions from master servers.
8. Sends post-session email if the DTM terminates abnormally.
When the PowerCenter Server runs a session, the DTM performs the following tasks:
1. Fetches session and mapping metadata from the repository.
2. Creates and expands session variables.
3. Creates the session log file.
4. Validates session code pages if data code page validation is enabled. Checks
query conversions if data code page validation is disabled.
5. Verifies connection object permissions.
6. Runs pre-session shell commands.
7. Runs pre-session stored procedures and SQL.
8. Creates and runs mappings, reader, writer, and transformation threads to extract,
transform, and load data.
9. Runs post-session stored procedures and SQL.
10. Runs post-session shell commands.
11. Sends post-session email.
Q. What is Data Transformation Manager?
A. After the load manager performs validations for the session, it creates the DTM
process. The DTM process is the second process associated with the session run. The
primary purpose of the DTM process is to create and manage threads that carry out
the session tasks.
The DTM allocates process memory for the session and divides it into buffers. This
is also known as buffer memory. It creates the main thread, which is called the
master thread. The master thread creates and manages all other threads.
If we partition a session, the DTM creates a set of threads for each partition to
allow concurrent processing. When the Informatica server writes messages to the
session log it includes the thread type and thread ID.
Following are the types of threads that DTM creates:
Master Thread - Main thread of the DTM process. Creates and manages all other
threads.
Mapping Thread - One Thread to Each Session. Fetches Session and Mapping
Information.
Pre and Post Session Thread - One Thread each to Perform Pre and Post Session
Operations.
Reader Thread - One Thread for Each Partition for Each Source Pipeline.
Writer Thread - One Thread for Each Partition, if a target exists in the source pipeline,
to write to the target.
Transformation Thread - One or More Transformation Thread For Each Partition.
Q. What is Session and Batches?
Session - A session is a set of instructions that tells the Informatica Server how
and when to move data from sources to targets. After creating the session, we
can use either the Server Manager or the command line program pmcmd to start
or stop the session.
Batches - A batch provides a way to group sessions for either serial or parallel execution by the Informatica Server. There
are two types of batches:
1. Sequential - Runs sessions one after the other.
2. Concurrent - Runs sessions at the same time.
Q. How many ways you can update a relational source definition and what
are they?
A. Two ways
1. Edit the definition
2. Reimport the definition
Q. What is a transformation?
A. It is a repository object that generates, modifies or passes data.
Q. What are the designer tools for creating transformations?
A. Mapping designer
Transformation developer
Mapplet designer
Q. In how many ways can you create ports?
A. Two ways
1. Drag the port from another transformation
2. Click the add button on the ports tab.
Q. What are reusable transformations?
A. A transformation that can be reused is called a reusable transformation
They can be created using two methods:
1. Using transformation developer
2. Create normal one and promote it to reusable
Q. Is there an aggregate cache in the aggregator transformation?
A. The aggregator stores data in the aggregate cache until it completes aggregate calculations. When u run a session that
uses an aggregator transformation, the Informatica server creates index and data caches in memory to process the
transformation. If the Informatica server requires more space, it stores overflow values in cache files.
Q. What are the settings that you use to configure the joiner transformation?
Master and detail source
Type of join
Condition of the join
Q. What are the join types in joiner transformation?
A. Normal (Default) -- only matching rows from both master and detail
Master outer -- all detail rows and only matching rows from master
Detail outer -- all master rows and only matching rows from detail
Full outer -- all rows from both master and detail (matching or non matching)
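As a hedged SQL analogy (MASTER and DETAIL stand for the two input pipelines; the join column is illustrative), the four join types correspond roughly to:

-- Normal:       SELECT ... FROM DETAIL D INNER JOIN MASTER M      ON D.DEPTNO = M.DEPTNO
-- Master outer: SELECT ... FROM DETAIL D LEFT OUTER JOIN MASTER M ON D.DEPTNO = M.DEPTNO
-- Detail outer: SELECT ... FROM MASTER M LEFT OUTER JOIN DETAIL D ON D.DEPTNO = M.DEPTNO
-- Full outer:   SELECT ... FROM DETAIL D FULL OUTER JOIN MASTER M ON D.DEPTNO = M.DEPTNO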
Q. What are the joiner caches?
A. When a Joiner transformation occurs in a session, the Informatica Server reads all the records from the master source
and builds index and data caches based on the master rows. After building the caches, the Joiner transformation reads
records
from the detail source and performs joins.
Q. What are the types of lookup caches?
Static cache: You can configure a static or read-only cache for any lookup table. By default the Informatica server creates a
static cache. It caches the lookup table and lookup values in the cache for each row that comes into the transformation.
When the lookup condition is true, the Informatica server does not update the cache while it processes the lookup
transformation.
Dynamic cache: If you want to cache the target table and insert new rows into cache and the target, you can create a
look up transformation to use dynamic cache. The Informatica server dynamically inserts data to the target table.
Persistent cache: You can save the lookup cache files and reuse them the next time the Informatica server processes a
lookup transformation configured to use the cache.
Recache from database: If the persistent cache is not synchronized with the lookup table, you can configure the lookup
transformation to rebuild the lookup cache.
Shared cache: You can share the lookup cache between multiple transformations. You can share an unnamed cache between
transformations in the same mapping.
Q. What is Transformation?
A: A transformation is a repository object that generates, modifies, or passes data.
A transformation performs a specific function. There are two types of transformations:
1. Active
An active transformation can affect the rows during the transformation, i.e. it can change the number of rows that pass through it. Eg: Aggregator,
Filter, Joiner, Normalizer, Rank, Router, Source Qualifier, Update Strategy, ERP Source Qualifier, Advanced External
Procedure.
2. Passive
Does not change the number of rows that pass through it. Eg: Expression, External Procedure, Input, Lookup, Stored
Procedure, Output, Sequence Generator, XML Source Qualifier.
Q. What are Options/Type to run a Stored Procedure?
A: Normal: During a session, the stored procedure runs where the
transformation exists in the mapping on a row-by-row basis. This is useful for calling the stored procedure for each row of
data that passes through the mapping, such as running a calculation against an input port. Connected stored procedures
run only in normal mode.
Pre-load of the Source. Before the session retrieves data from the source, the stored procedure runs. This is useful for
verifying the existence of tables or performing joins of data in a temporary table.
Post-load of the Source. After the session retrieves data from the source, the stored procedure runs. This is useful for
removing temporary tables.
Pre-load of the Target. Before the session sends data to the target, the stored procedure runs. This is useful for
verifying target tables or disk space on the target system.
Post-load of the Target. After the session sends data to the target, the stored procedure runs. This is useful for recreating indexes on the database. It must contain at least one Input and one Output port.
Q. What kinds of sources and of targets can be used in Informatica?
Sources may be Flat file, relational db or XML.
Target may be relational tables, XML or flat files.
Q: What is Session Process?
A: The Load Manager process. Starts the session, creates the DTM process, and
sends post-session email when the session completes.
Q. What is DTM process?
A: The DTM process creates threads to initialize the session, read, write, transform
data and handle pre and post-session operations.
Q. What is the different type of tracing levels?
Tracing level represents the amount of information that Informatica Server writes in a log file. Tracing levels store
information about mapping and transformations. There are 4 types of tracing levels supported
1. Normal: It specifies the initialization and status information and summarization of the success rows and target rows
and the information about the skipped rows due to transformation errors.
2. Terse: Specifies initialization information, error messages, and notification of rejected data.
3. Verbose Initialization: In addition to normal tracing, specifies the location of the data cache files and index cache
files that are created, and detailed transformation statistics for each and every transformation within the mapping.
4. Verbose Data: Along with verbose initialization records each and every record processed by the informatica server.
Q. TYPES OF DIMENSIONS?
A dimension table consists of the attributes about the facts. Dimensions store the
textual descriptions of the business.
Conformed Dimension:
Conformed dimensions mean the exact same thing with every possible fact table to
which they are joined.
Eg: The date dimension table connected to the sales facts is identical to the date
dimension connected to the inventory facts.
Junk Dimension:
A junk dimension is a collection of random transactional codes flags and/or text
attributes that are unrelated to any particular dimension. The junk dimension is
simply a structure that provides a convenient place to store the junk attributes.
Eg: Assume that we have a gender dimension and marital status dimension. In the
fact table we need to maintain two keys referring to these dimensions. Instead of
that create a junk dimension which has all the combinations of gender and marital
status (cross join gender and marital status table and create a junk table). Now we
can maintain only one key in the fact table.
Degenerated Dimension:
A degenerate dimension is a dimension which is derived from the fact table and
doesn't have its own dimension table.
Eg: A transactional code in a fact table.
Slowly changing dimension:
Slowly changing dimensions are dimension tables that have slowly increasing
data as well as updates to existing data.
Q. What are the output files that the Informatica server creates during the
session running?
Informatica server log: Informatica server (on UNIX) creates a log for all status and
error messages (default name: pm.server.log). It also creates an error log for error
messages. These files will be created in Informatica home directory
Session log file: Informatica server creates session log file for each session. It writes
information about session into log files such as initialization process, creation of sql
commands for reader and writer threads, errors encountered and load summary. The
amount of detail in session log file depends on the tracing level that you set.
Session detail file: This file contains load statistics for each target in mapping.
Session detail includes information such as table name, number of rows written or
rejected. You can view this file by double clicking on the session in monitor window.
Performance detail file: This file contains information known as session performance
details which helps you where performance can be improved. To generate this file
select the performance detail option in the session property sheet.
Reject file: This file contains the rows of data that the writer does not write to
targets.
Control file: Informatica server creates control file and a target file when you run a
session that uses the external loader. The control file contains the information about
the target flat file such as data format and loading instructions for the external
loader.
Post session email: Post session email allows you to automatically communicate
information about a session run to designated recipients. You can create two
different messages. One if the session completed successfully the other if the session
fails.
Indicator file: If you use the flat file as a target, you can configure the Informatica
server to create indicator file. For each target row, the indicator file contains a
number to indicate whether the row was marked for insert, update, delete or reject.
Output file: If session writes to a target file, the Informatica server creates the
target file based on file properties entered in the session property sheet.
Cache files: When the Informatica server creates memory cache it also creates cache
files.
For the following circumstances Informatica server creates index and data cache
files:
Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation
Q. What is meant by lookup caches?
A. The Informatica server builds a cache in memory when it processes the first row
of data in a cached lookup transformation. It allocates memory for the cache
based on the amount you configure in the transformation or session properties. The
Informatica server stores condition values in the index cache and output values in
the data cache.
Q. How do you identify existing rows of data in the target table using lookup
transformation?
A. There are two ways to look up the target table to verify whether a row exists or not:
1. Use a connected dynamic cache lookup and then check the value of the NewLookupRow
output port to decide whether the incoming record already exists in the table / cache
or not.
2. Use an unconnected lookup, call it from an expression transformation, and check
the lookup condition port value (Null / Not Null) to decide whether the incoming
record already exists in the table or not.
The status code provides error handling for the Informatica server during the session. The stored procedure issues a status
code that notifies whether or not the stored procedure completed successfully. This value cannot be seen by the user. It is only used
by the Informatica server to determine whether to continue running the session or to stop it.
Source definitions. Definitions of database objects (tables, views, synonyms) or files that provide source data.
Target definitions. Definitions of database objects or files that contain the target data. Multi-dimensional metadata.
Target definitions that are configured as cubes and dimensions.
Mappings. A set of source and target definitions along with transformations containing business logic that you build into
the transformation. These are the instructions that the Informatica Server uses to transform and move data.
Reusable transformations. Transformations that you can use in multiple mappings.
Mapplets. A set of transformations that you can use in multiple mappings.
Sessions and workflows. Sessions and workflows store information about how and when the Informatica Server moves
data. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and
loading data. A session is a type of task that you can put in a workflow. Each session corresponds to a single mapping.
Following are the types of metadata that are stored in the repository:
Database Connections
Global Objects
Multidimensional Metadata
Reusable Transformations
Short cuts
Transformations
The audit table is nothing but the table which contains your workflow names and session names. It contains information
about workflow and session status and their details.
WKFL_RUN_ID
WKFL_NME
START_TMST
END_TMST
ROW_INSERT_CNT
ROW_UPDATE_CNT
ROW_DELETE_CNT
ROW_REJECT_CNT
Q. If a session fails after loading 10,000 records into the target, how can we load the 10,001st record when we run the
session the next time?
Select the Recovery Strategy in the session properties as "Resume from the last checkpoint". Note: set this property
before running the session.
O - Overflowed Numeric Data. Numeric data exceeded the specified precision or scale for the column. Bad data, if you
configured the mapping target to reject overflow or truncated data.
N - Null Value. The column contains a null value. Good data. Writer passes it to the target, which rejects it if the target
database does not accept null values.
T - Truncated String Data. String data exceeded a specified precision for the column, so the Integration Service
truncated it. Bad data, if you configured the mapping target to reject overflow or truncated data.
Also to be noted that the second column contains column indicator flag value D which signifies that the Row Indicator is
valid.
Now let us see how Data in a Bad File looks like:
0,D,7,D,John,D,5000.375,O,,N,BrickLand Road Singapore,T
Q. What is a factless fact table? For what purpose do we use it in our DWH projects?
A factless fact table contains only the keys; there are no measures, or in other
words we can say that it contains no facts. Generally it is used to integrate fact
tables.
Factless fact table contains only foreign keys. We can have two kinds of aggregate
functions from the factless fact one is count and other is distinct count.
Two purposes of a factless fact table:
1. Coverage: to indicate what did NOT happen.
Like: which product did not sell well in a particular region?
2. Event tracking: to know whether an event took place or not.
Like: a fact table for tracking students' attendance will not contain any measures.
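A minimal sketch of an event-tracking factless fact table (table and column names are illustrative):

-- Only foreign keys, no measures:
CREATE TABLE fact_attendance (
    date_key    INT,
    student_key INT,
    class_key   INT
);

-- Typical analysis is a COUNT over the keys:
SELECT d.month_name, COUNT(*) AS attendance_events
FROM fact_attendance f
JOIN dim_date d ON f.date_key = d.date_key
GROUP BY d.month_name;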
Q. What is staging area?
The staging area is where we apply our logic to extract data from the source,
cleanse it, and put it into a meaningful and summarized form for the
data warehouse.
Q. What is constraint based loading
Constraint based load order defines the order of loading the data into the multiple
targets based on primary and foreign keys constraints.
Q. Why is the union transformation an active transformation?
The only condition for a transformation to become active is that the row number changes.
Now the thing is how a row number can change. There are
2 conditions:
1. either the number of rows coming in and going out is different,
eg: in case of filter we have the data like
id name dept row_num
1 aa 4 1
2 bb 3 2
3 cc 4 3
and we have a filter condition like dept=4, then the output would
be like
id name dept row_num
1 aa 4 1
3 cc 4 2
So row num changed and it is an active transformation
2. or the order of the rows changes,
eg: when Union transformation pulls in data, suppose we have
2 sources
sources1:
id name dept row_num
1 aa 4 1
2 bb 3 2
3 cc 4 3
source2:
id name dept row_num
4 aaa 4 4
5 bbb 3 5
6 ccc 4 6
it never restricts the data from any source so the data can
come in any manner
id name dept row_num old row_num
1 aa 4 1 1
4 aaa 4 2 4
5 bbb 3 3 5
2 bb 3 4 2
3 cc 4 5 3
6 ccc 4 6 6
so the row numbers are changing. Thus we say that union is an active transformation.
Q. What is the use of a batch file in informatica? How many types of batch file are there in
informatica?
With a batch file, we can run sessions either sequentially or concurrently.
A grouping of sessions is known as a batch.
Two types of batches:
1) Sequential: runs sessions one after another.
2) Concurrent: runs the sessions at the same time.
If you have sessions with source-target dependencies you have to go for a sequential
batch to start the sessions one after another. If you have several independent
sessions you can use concurrent batches, which run all the sessions at the same time.
Q. What is the joiner cache?
When we use the joiner transformation, the integration service maintains a cache;
all the records are stored in the joiner cache. The joiner cache has 2 types:
1. Index cache 2. Data cache.
The index cache stores all the port values which participate in the join condition,
and the data cache stores all the ports which do not participate in the join
condition.
Q. What is the location of parameter file in Informatica?
$PMBWPARAM
Post SQL: SQL statements executed using the database connection after a pipeline is
run. For example, to truncate a target table you can write a post SQL statement such as
TRUNCATE TABLE <table name>; alternatively, the session has a truncate target table option.
Q. What is polling in informatica?
It displays the updated information about the session in the monitor window. The
monitor window displays the status of each session when you poll the Informatica
server.
Q. How will I stop my workflow after 10 errors?
Set the "Stop on errors" property in the session configuration (Config Object tab) to 10;
the session stops once the error count reaches that threshold.
Q. What is a target load plan?
Suppose the requirement is to load tar2 first, then tar1 and finally tar3.
For this type of loading, the target load plan is used to control the order in which the source
qualifiers extract data from the sources.
Q. What is meant by data driven? In which scenario do we use it?
Data driven is available at the session level. It says that when we are using an Update
Strategy transformation, the integration service fetches the data and decides how to update, insert,
delete or reject each row in the database.
Data driven is nothing but instructing, for each source row, which action should be taken on the
target, i.e. update, delete, reject or insert. If we use the Update Strategy transformation
in a mapping then we select the data driven option in the session.
Q. How to run a workflow in Unix?
Use the pmcmd command line program, for example:
pmcmd startworkflow -sv <integration service> -d <domain> -u <user> -p <password> -f <folder> <workflow name>
Q. What is constraint based loading exactly? And how to do this? I think it is when we have a primary key-foreign key relationship. Is it correct?
Constraint based load order loads the data into multiple targets depending on
the primary key - foreign key relation.
To set the option: double-click the session, go to the Config Object tab, and enable the constraint based load ordering property.
Static cache
Dynamic cache
Shared cache
Persistent cache
$ls -lrt
Q. How to import multiple flat files into a single target where there is no common column in the flat files?
Create an Oracle source with as many columns as you want and write the join
query in the SQL query override. But the column order and data types should be the same
as in the SQL query.
Q. How to call unconnected lookup in expression transformation?
:LKP.LKP_NAME(PORTS)
CONNECTED LOOKUP:
>> It participates in the data pipeline.
>> It can have multiple input and multiple output ports.
>> It supports both static and dynamic caches.
UNCONNECTED LOOKUP:
>> It does not participate in the data pipeline.
>> It can have multiple input ports but only a single output (return) port.
>> It supports a static cache only.
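A hedged sketch of calling an unconnected lookup from an Expression transformation (the lookup and port names are hypothetical): a variable port calls the lookup, and a second port flags the row for a downstream Update Strategy.

-- Variable port v_CUST_KEY in an Expression transformation:
v_CUST_KEY = :LKP.LKP_GET_CUSTOMER_KEY(CUST_ID)
-- Output port o_ROW_FLAG:
o_ROW_FLAG = IIF(ISNULL(v_CUST_KEY), 'INSERT', 'UPDATE')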
Q. Types of partitioning in Informatica?
There are 5 partition types:
1. Pass-through
2. Round-robin
3. Hash auto-keys
4. Hash user keys
5. Key range
The following transformations build caches:
1. Lookup transformation
2. Aggregator transformation
3. Rank transformation
4. Sorter transformation
5. Joiner transformation
The informatica server queries the look up source based on the look up ports in the
transformation. It compares look up t/r port values to look up source column values
based on the look up condition.
Look up t/r is used to perform the below mentioned tasks:
1) To get a related value.
2) To perform a calculation.
3) To update SCD tables.
Q. How to identify whether a row is for insert or for update in a dynamic lookup cache?
Based on the NewLookupRow port, the Informatica server indicates which row is an insert and
which one is an update:
NewLookupRow = 0 ... no change
NewLookupRow = 1 ... insert
NewLookupRow = 2 ... update
The following objects cannot be used in a mapplet:
Normalizer transformations
COBOL sources
Joiner transformations
XML Source Qualifier transformations
XML sources
Target definitions
Pre- and post-session stored procedures
Other mapplets
If the source table contains multiple records and the record specified in the associated
port is not found in the lookup cache, it is inserted into the lookup cache; if the record
is found, the data in the associated port is used to change (update) the cached row.
We set this property when the lookup transformation uses a dynamic cache and the session
property TREAT SOURCE ROWS AS has been set to "Insert".
The other option is used when we want to maintain history:
if records are not available in the target table then it inserts the records into the target,
and if records are available in the target table then it updates the records.
Q. What is an incremental loading? in which situations we will use
incremental loading?
Incremental loading is an approach. Suppose you have a mapping to load data
from an employee table to an employee_target table on the basis of hire date. Suppose
you have already moved the employee data from source to target up to the
hire date 31-12-2009. Your organization now wants to load data into
employee_target today. Your target already has the data of the employees
having hire dates up to 31-12-2009, so you now pick up only the source data for employees
hired from 1-1-2010 to date. You need not take the data before
that date; if you do, it is an unnecessary overhead to load data into the target
that already exists there. So in the source qualifier you filter the records by hire date,
and you can also parameterize the hire date, which controls from which date you want
to load data into the target.
This is the concept of incremental loading.
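A minimal sketch of such a filter in the Source Qualifier override, assuming an Oracle source and a hypothetical mapping parameter $$LAST_LOAD_DATE supplied through the parameter file:

SELECT EMP.EMPNO, EMP.ENAME, EMP.HIRE_DATE, EMP.SAL
FROM EMP
WHERE EMP.HIRE_DATE > TO_DATE('$$LAST_LOAD_DATE', 'DD-MM-YYYY')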
Q. What is target update override?
By Default the integration service updates the target based on key columns. But we
might want to update non-key columns also, at that point of time we can override
the
UPDATE statement for each target in the mapping. The target override affects only
when the source rows are marked as update by an update strategy in the mapping.
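A hedged sketch of a target update override (table and port names are illustrative; :TU references the ports of the target definition): it updates non-key columns using the key column in the WHERE clause.

UPDATE T_EMPLOYEE
SET    EMP_NAME = :TU.EMP_NAME,
       SALARY   = :TU.SALARY
WHERE  EMP_ID   = :TU.EMP_ID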
Q. What is the difference between rank and dense rank?
Rank:
select rank() over (partition by deptno order by sal) from emp
gives, for example,
1, 2, 2, 4, 5, 6 (ties share a rank and the next rank is skipped),
while dense rank:
select dense_rank() over (partition by deptno order by sal) from emp
gives
1, 2, 2, 3, 4, 5 (ties share a rank and no ranks are skipped).
Q. What is the incremental aggregation?
The first time you run an upgraded session using incremental aggregation, the
Integration Service upgrades the index and data cache files. If you want to partition
a session using a mapping with incremental aggregation, the Integration Service
realigns the index and data cache files.
Q. What is session parameter?
A parameter file is a text file where we can define the values of the parameters.
Session parameters are used to assign values such as database connections.
Q. What is a mapping parameter?
A mapping parameter represents a constant value that can be defined before the
mapping runs. Its value is supplied through a parameter file, which is saved with the
extension .prm; a mapping parameter lets you reuse the same constant value across runs.
Q. What is parameter file?
A parameter file can be a text file. Parameter file is to define the values for
parameters and variables used in a session. A parameter file is a file created by
text editor such as word pad or notepad. You can define the following values in
parameter file
Mapping parameters
Mapping variables
Session parameters
Q. What is session override?
Session override is an option in informatica at session level. Here we can manually
give a sql query which is issued to the database when the session runs. It is
nothing but over riding the default sql which is generated by a particular
transformation at mapping level.
Q. What are the diff. b/w informatica versions 8.1.1 and 8.6.1?
Little change in the Administrator Console. In 8.1.1 we can do all the creation of IS
and repository Service, web service, Domain, node, grid ( if we have licensed
version),In 8.6.1 the Informatica Admin console we can manage both Domain page
and security page. Domain Page means all the above like creation of IS and
repository Service, web service, Domain, node, grid ( if we have licensed version)
etc. Security page means creation of users, privileges, LDAP configuration, Export
Import user and Privileges etc.
Q. What are the uses of a Parameter file?
A parameter file is one which contains the values of mapping variables.
Type this in Notepad and save it:
[foldername.sessionname]
$$inputvalue1=
Parameter files are created with an extension of .prm.
These are created to pass values that can be changed for mapping parameters and
session parameters during a mapping run.
Mapping Parameters:
A parameter is defined in a parameter file for which a parameter has already been created in
the mapping with data type, precision and scale.
The Mapping parameter file syntax (xxxx.prm).
[FolderName.WF:WorkFlowName.ST:SessionName]
$$ParameterName1=Value
$$ParameterName2=Value
After that we have to select the properties Tab of Session and Set Parameter file
name including physical path of this xxxx.prm file.
Session Parameters:
The Session Parameter files syntax (yyyy.prm).
[FolderName.SessionName]
$InputFileValue1=Path of the source Flat file
After that we have to select the properties Tab of Session and Set Parameter file
name including physical path of this yyyy.prm file.
Do the following changes in the Mapping tab of the Source Qualifier's
properties section:
Attribute                  Value
Source File Type      -->  Direct
Source File Directory -->  (empty)
Source File Name      -->  $InputFileValue1
Q. What is the default data driven operation in informatica?
This is default option for update strategy transformation.
The integration service follows the instructions coded in the update strategy within the session
mapping to determine how to flag records for insert, delete, update, or reject. If you do not
choose the data driven option setting, the integration service ignores the update strategy
transformations in the mapping.
Q. What is threshold error in informatica?
When the Update Strategy flags rows as DD_REJECT or DD_UPDATE and a limited error count is set for the session,
if the number of rejected records exceeds that count the
session ends with a failed status. This error is called a threshold error.
Q. SO many times i saw "$PM parser error ". What is meant by PM?
PM: POWER MART
1) Parsing error will come for the input parameter to the lookup.
2) Informatica is not able to resolve the input parameter CLASS for your lookup.
3) Check the Port CLASS exists as either input port or a variable port in your
expression.
4) Check data type of CLASS and the data type of input parameter for your lookup.
Q. What is a candidate key?
A candidate key is a combination of attributes that can be uniquely used to identify
a database record without any extraneous data (unique). Each table may have one
or more candidate keys. One of these candidate keys is selected as the table's
primary key; the others are called alternate keys.
Q. What is the difference between Bitmap and Btree index?
Bitmap index is used for repeating values.
ex: Gender: male/female
Account status:Active/Inactive
Btree index is used for unique values.
ex: empid.
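A hedged Oracle-style sketch (table and column names are illustrative):

-- Bitmap index for a low-cardinality, repeating column:
CREATE BITMAP INDEX idx_cust_gender ON customers (gender);

-- B-tree index (the default) for a high-cardinality, mostly unique column:
CREATE INDEX idx_emp_empid ON emp (empid);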
The crontab (cron derives from chronos, Greek for time; tab stands for table)
command, found in Unix and Unix-like operating systems, is used to schedule
commands to be executed periodically. To see what crontabs are currently running
on your system, you can open a terminal and run:
sudo crontab -l
To edit the list of cronjobs you can run:
sudo crontab -e
This will open the default editor (this could be vi or pico; if you want, you can change
the default editor) to let us manipulate the crontab. If you save and exit the editor,
all your cronjobs are saved into crontab. Cronjobs are written in the following
format:
* * * * * /bin/execute/this/script.sh
Scheduling explained
As you can see there are 5 stars. The stars represent different date parts in the
following order:
1. minute (from 0 to 59)
2. hour (from 0 to 23)
3. day of month (from 1 to 31)
4. month (from 1 to 12)
5. day of week (from 0 to 6) (0 = Sunday)
Execute every minute
If you leave the star, or asterisk, it means every. Maybe
that's a bit unclear. Let's use the previous example
again:
* * * * * /bin/execute/this/script.sh
They are all still asterisks! So this means
execute /bin/execute/this/script.sh:
1. every minute
2. of every hour
3. of every day of the month
4. of every month
5. and every day in the week.
In short: This script is being executed every minute.
Without exception.
Execute every Friday 1AM
So if we want to schedule the script to run at 1AM every
Friday, we would need the following cronjob:
0 1 * * 5 /bin/execute/this/script.sh
Get it? The script is now being executed when the system
clock hits:
1. minute: 0
2. of hour: 1
3. of day of month: * (every day of month)
4. of month: * (every month)
5. and weekday: 5 (= Friday)
Execute on weekdays 1AM
So if we want to schedule the script to run at 1AM on every weekday, we would need the
following cronjob:
0 1 * * 1-5 /bin/execute/this/script.sh
Get it? The script is now being executed when the system
clock hits:
1. minute: 0
2. of hour: 1
3. of day of month: * (every day of month)
4. of month: * (every month)
5. and weekday: 1-5 (= Monday till Friday)
Execute 10 past after every hour on the 1st of every month
Here's another one, just for practicing
10 * 1 * * /bin/execute/this/script.sh
Fair enough, it takes some getting used to, but it offers great flexibility.
Expression Transformation
Sorter Transformations
Aggregator Transformations
Filter Transformation
Union Transformation
Joiner Transformation
Normalizer Transformation
Rank Transformation
Router Transformation
Update Strategy Transformation
The ODS may also be used to audit the data warehouse to assure summarized and derived data is calculated properly. The
ODS may further become the enterprise shared operational database, allowing operational systems that are being
reengineered to use the ODS as their operational database.
Decision Task
Event-Raise
Event- Wait
Timer Task
Link Task
Domains
Nodes
Services
Q. WHAT IS VERSIONING?
It is used to keep a history of the changes made to mappings and workflows.
1. Check in: You check in when you are done with your changes so that everyone can see those changes.
2. Check out: You check out from the main stream when you want to make any change to the mapping/workflow.
3. Version history: It will show you all the changes made and who made it.
The integration service increments the generated key (GK) sequence number each time it process a source row. When the
source row contains a multiple-occurring column or a multiple-occurring group of columns, the normalizer transformation
returns a row for each occurrence. Each row contains the same generated key value.
The normalizer transformation has a generated column ID (GCID) port for each multiple-occurring column. The GCID is an
index for the instance of the multiple-occurring data. For example, if a column occurs 3 times in a source record, the
normalizer returns a value of 1, 2 or 3 in the generated column ID.
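To make the GCID idea concrete, here is a hedged SQL analogue (the source table and its quarterly columns are hypothetical): a Normalizer over a record with three occurrences of a sales column behaves roughly like this unpivot, where the constant in the last column plays the role of the GCID.

SELECT store_id, q1_sales AS sales, 1 AS gcid_sales FROM quarterly_sales
UNION ALL
SELECT store_id, q2_sales AS sales, 2 AS gcid_sales FROM quarterly_sales
UNION ALL
SELECT store_id, q3_sales AS sales, 3 AS gcid_sales FROM quarterly_sales;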
TABLES
VIEWS
INDEXES
SYNONYMS
SEQUENCES
TABLESPACES
Q. WHAT IS @@ERROR?
The @@ERROR automatic variable returns the error code of the last Transact-SQL statement. If there was no error,
@@ERROR returns zero. Because @@ERROR is reset after each Transact-SQL statement, it must be saved to a variable if
it is needed to process it further after checking it.
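A minimal Transact-SQL sketch of saving @@ERROR before it is reset (table and variable names are illustrative):

DECLARE @err INT;

UPDATE accounts SET balance = balance - 100 WHERE account_id = 42;
SET @err = @@ERROR;          -- capture immediately; the next statement resets it

IF @err <> 0
    PRINT 'Update failed with error ' + CAST(@err AS VARCHAR(10));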
Surrogate key:
1. Query processing is fast.
2. It is only numeric.
3. The developer generates the surrogate key using a Sequence Generator transformation.
4. Eg: 12453
Primary key:
1. Query processing is slower.
2. It can be alphanumeric.
3. The source system gives the primary key.
4. Eg: C10999
Q. How does the server recognize the source and target databases?
If it is relational - by using an ODBC connection.
If it is a flat file - by using an FTP connection.
B-tree index
B-tree cluster index
Hash cluster index
Reverse key index
Bitmap index
Function Based index
$ ps -ef
Q. How can i display only and only hidden file in the current directory?
ls -a|grep "^\."
Q. How to display the first 10 lines of a file?
# head -10 logfile
Q. How to display the last 10 lines of a file?
# tail -10 logfile
TC_CONTINUE_TRANSACTION: The Integration Service does not perform any transaction change for this row. This is
the default value of the expression.
TC_COMMIT_BEFORE: The Integration Service commits the transaction, begins a new transaction, and writes the
current row to the target. The current row is in the new transaction.
TC_COMMIT_AFTER: The Integration Service writes the current row to the target, commits the transaction, and begins a
new transaction. The current row is in the committed transaction.
TC_ROLLBACK_BEFORE: The Integration Service rolls back the current transaction, begins a new transaction, and writes
the current row to the target. The current row is in the new transaction.
TC_ROLLBACK_AFTER: The Integration Service writes the current row to the target, rolls back the transaction, and
begins a new transaction. The current row is in the rolled back transaction.
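A hedged sketch of a Transaction Control expression (the port names are illustrative): it starts a new transaction whenever the department number changes, and otherwise continues the current transaction.

IIF(DEPTNO <> v_PREV_DEPTNO, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)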
Q. WHAT ARE THE DIFFERENCES BETWEEN DDL, DML AND DCL COMMANDS?
DDL is Data Definition Language statements (e.g., CREATE, ALTER, DROP, TRUNCATE).
DML is Data Manipulation Language statements (e.g., SELECT, INSERT, UPDATE, DELETE).
DCL is Data Control Language statements (e.g., GRANT, REVOKE).
RANK CACHE
Sample Rank Mapping
When the Power Center Server runs a session with a Rank transformation, it compares an input row with rows in the data
cache. If the input row out-ranks a Stored row, the Power Center Server replaces the stored row with the input row.
Example: PowerCenter caches the first 5 rows if we are finding the top 5 salaried employees. When the 6th row is read, it
compares it with the 5 rows in the cache and places it in the cache if needed.
1) RANK INDEX CACHE:
The index cache holds group information from the group by ports. If we are Using Group By on DEPTNO, then this cache
stores values 10, 20, 30 etc.
All Group By Columns are in RANK INDEX CACHE. Ex. DEPTNO
2) RANK DATA CACHE:
It holds row data until the Power Center Server completes the ranking and is generally larger than the index cache. To
reduce the data cache size, connect only the necessary input/output ports to subsequent transformations.
All Variable ports if there, Rank Port, All ports going out from RANK Transformations are stored in RANK DATA CACHE.
Example: All ports except DEPTNO In our mapping example.
Aggregator Caches
1. The Power Center Server stores data in the aggregate cache until it completes Aggregate calculations.
2. It stores group values in an index cache and row data in the data cache. If the Power Center Server requires more space, it
stores overflow values in cache files.
Note: The Power Center Server uses memory to process an Aggregator transformation with sorted ports. It does not use
cache memory. We do not need to configure cache memory for Aggregator transformations that use sorted ports.
1) Aggregator Index Cache:
The index cache holds group information from the group by ports. If we are using Group By on DEPTNO, then this cache
stores values 10, 20, 30 etc.
JOINER CACHES
Joiner always caches the MASTER table. We cannot disable caching. It builds Index cache and Data Cache based on
MASTER table.
1) Joiner Index Cache:
All Columns of MASTER table used in Join condition are in JOINER INDEX CACHE.
Example: DEPTNO in our mapping.
2) Joiner Data Cache:
Master column not in join condition and used for output to other transformation or target table are in Data Cache.
Example: DNAME and LOC in our mapping example.
Unconnected Lookup:
Does not support user-defined
default values (a connected lookup does).
Cache Comparison
Persistence and Dynamic Caches
Dynamic
1) When you use a dynamic cache, the Informatica Server updates the lookup
cache as it passes rows to the target.
2) With a dynamic cache, we can update the cache with new data as well.
3) A dynamic cache is not reusable.
(When we need updated cache data, that is when we need a dynamic cache.)
Persistent
1) A Lookup transformation can use a non-persistent or persistent cache. The
PowerCenter Server saves or deletes lookup cache files after a successful session
based on the Lookup Cache Persistent property.
2) With a persistent cache, we are not able to update the cache with new data.
3) A persistent cache is reusable.
(When we need the previous cache data, that is when we need a persistent cache.)
Informatica - Transformations
In Informatica, Transformations help to transform the source data according to the
requirements of target system and it ensures the quality of the data being loaded
into target.
Transformations are of two types: Active and Passive.
Active Transformation
An active transformation can change the number of rows that pass through it from
source to target. (i.e) It eliminates rows that do not meet the condition in
transformation.
Passive Transformation
A passive transformation does not change the number of rows that pass through it
(i.e) It passes all rows through the transformation.
Transformations can be Connected or Unconnected.
Connected Transformation
Connected transformation is connected to other transformations or directly to
target table in the mapping.
Unconnected Transformation
An unconnected transformation is not connected to other transformations in the
mapping. It is called within another transformation, and returns a value to that
transformation.
Following are the list of Transformations available in Informatica:
Aggregator Transformation
Expression Transformation
Filter Transformation
Joiner Transformation
Lookup Transformation
Normalizer Transformation
Rank Transformation
Router Transformation
Sequence Generator Transformation
Stored Procedure Transformation
Sorter Transformation
Update Strategy Transformation
XML Source Qualifier Transformation
In the following pages, we will explain all the above Informatica Transformations
and their significances in the ETL process in detail.
==============================================================================
Aggregator Transformation
Aggregator transformation is an Active and Connected transformation.
This transformation is useful to perform calculations such as averages and sums (mainly to perform calculations on
multiple rows or groups).
For example, to calculate total of daily sales or to calculate average of monthly or yearly sales. Aggregate functions such
as AVG, FIRST, COUNT, PERCENTILE, MAX, SUM etc. can be used in aggregate transformation.
==============================================================================
Expression Transformation
Expression transformation is a Passive and Connected transformation.
This can be used to calculate values in a single row before writing to the target.
For example, to calculate discount of each product
or to concatenate first and last names
or to convert date to a string field.
==============================================================================
Filter Transformation
Filter transformation is an Active and Connected transformation.
This can be used to filter rows in a mapping that do not meet the condition.
For example,
To know all the employees who are working in Department 10 or
To find out the products that falls between the rate category $500 and $1000.
==============================================================================
Joiner Transformation
Joiner Transformation is an Active and Connected transformation. This can be used to join two sources coming from two
different locations or from same location. For example, to join a flat file and a relational source or to join two flat files or to
join a relational source and a XML source.
In order to join two sources, there must be at least one matching port. While joining two sources it is a must to specify
one source as master and the other as detail.
==============================================================================
Update Strategy Transformation
Update Strategy transformation is an Active and Connected transformation.
It is used to update data in the target table, either to maintain a history of the data or of recent changes.
You can specify how to treat source rows in the table: insert, update, delete or data driven.
==============================================================================
XML Source Qualifier Transformation
XML Source Qualifier is a Passive and Connected transformation.
XML Source Qualifier is used only with an XML source definition.
It represents the data elements that the Informatica Server reads when it executes a session with XML sources.
==============================================================================
Constraint-Based Loading
In the Workflow Manager, you can specify constraint-based loading for a session. When you select this option, the
Integration Service orders the target load on a row-by-row basis. For every row generated by an active source, the
Integration Service loads the corresponding transformed row first to the primary key table, then to any foreign key tables.
Constraint-based loading depends on the following requirements:
Active source: Related target tables must have the same active source.
Key relationships: Target tables must have key relationships.
Target connection groups: Targets must be in one target connection group.
Treat rows as insert. Use this option when you insert into the target. You cannot use updates with constraint based
loading.
Active Source:
When target tables receive rows from different active sources, the Integration Service reverts to normal loading for those
tables, but loads all other targets in the session using constraint-based loading when possible. For example, a mapping
contains three distinct pipelines. The first two contain a source, source qualifier, and target. Since these two targets
receive data from different active sources, the Integration Service reverts to normal loading for both targets. The third
pipeline contains a source, Normalizer, and two targets. Since these two targets share a single active source (the
Normalizer), the Integration Service performs constraint-based loading: loading the primary key table first, then the
foreign key table.
Key Relationships:
When target tables have no key relationships, the Integration Service does not perform constraint-based loading.
Similarly, when target tables have circular key relationships, the Integration Service reverts to a normal load. For example,
you have one target containing a primary key and a foreign key related to the primary key in a second target. The second
target also contains a foreign key that references the primary key in the first target. The Integration Service cannot
enforce constraint-based loading for these tables. It reverts to a normal load.
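To make the key relationship requirement concrete, here is a sketch of two target tables where constraint-based loading would load the primary key table first (table and column names illustrative):
CREATE TABLE T_DEPT (
  DEPT_ID   NUMBER PRIMARY KEY,
  DEPT_NAME VARCHAR2(30)
);
CREATE TABLE T_EMP (
  EMP_ID  NUMBER PRIMARY KEY,
  DEPT_ID NUMBER REFERENCES T_DEPT (DEPT_ID)
);
Rows for T_DEPT are loaded before the corresponding T_EMP rows, so the foreign key never references a missing parent row.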
Target Connection Groups:
The Integration Service enforces constraint-based loading for targets in the same target connection group. If you want to
specify constraint-based loading for multiple targets that receive data from the same active source, you must verify the
tables are in the same target connection group. If the tables with the primary key-foreign key relationship are in different
target connection groups, the Integration Service cannot enforce constraint-based loading when you run the workflow. To
verify that all targets are in the same target connection group, complete the following tasks:
Verify all targets are in the same target load order group and receive data from the same active source.
Use the default partition properties and do not add partitions or partition points.
Define the same target type for all targets in the session properties.
Define the same database connection name for all targets in the session properties.
Choose normal mode for the target load type for all targets in the session properties.
Treat Rows as Insert:
Use constraint-based loading when the session option Treat Source Rows As is set to insert. You might get inconsistent
data if you select a different Treat Source Rows As option and you configure the session for constraint-based loading.
When the mapping contains Update Strategy transformations and you need to load data to a primary key table first, split
the mapping using one of the following options:
Load primary key table in one mapping and dependent tables in another mapping. Use constraint-based loading to load
the primary table.
Perform inserts in one mapping and updates in another mapping.
Constraint-based loading does not affect the target load ordering of the mapping. Target load ordering defines the order
the Integration Service reads the sources in each target load order group in the mapping. A target load order group is a
collection of source qualifiers, transformations, and targets linked together in a mapping. Constraint based loading
establishes the order in which the Integration Service loads individual targets within a set of targets receiving data from a
single source qualifier.
Example
Consider a mapping configured to perform constraint-based loading as follows.
In the first pipeline, target T_1 has a primary key, and T_2 and T_3 contain foreign keys referencing the T_1 primary key. T_3
has a primary key that T_4 references as a foreign key.
Since these tables receive records from a single active source, SQ_A, the Integration Service loads rows to the targets
in the following order:
1. T_1
2. T_2 and T_3 (in no particular order)
3. T_4
The Integration Service loads T_1 first because it has no foreign key dependencies and contains a primary key referenced
by T_2 and T_3. The Integration Service then loads T_2 and T_3, but since T_2 and T_3 have no dependencies, they are
not loaded in any particular order. The Integration Service loads T_4 last, because it has a foreign key that references a
primary key in T_3. After loading the first set of targets, the Integration Service begins reading source B. If there are no
key relationships between T_5 and T_6, the Integration Service reverts to a normal load for both targets.
If T_6 has a foreign key that references a primary key in T_5, since T_5 and T_6 receive data from a single active source,
the Aggregator AGGTRANS, the Integration Service loads rows to the tables in the following order:
T_5
T_6
T_1, T_2, T_3, and T_4 are in one target connection group if you use the same database connection for each target, and
you use the default partition properties. T_5 and T_6 are in another target connection group together if you use the same
database connection for each target and you use the default partition properties. The Integration Service includes T_5 and
T_6 in a different target connection group because they are in a different target load order group from the first four
targets.
Enabling Constraint-Based Loading:
When you enable constraint-based loading, the Integration Service orders the target load on a row-by-row basis. To enable
constraint-based loading:
1. In the General Options settings of the Properties tab, choose Insert for the Treat Source Rows As property.
2. Click the Config Object tab. In the Advanced settings, select Constraint Based Load Ordering.
3. Click OK.
When you use a mapplet in a mapping, the Mapping Designer lets you set the target load plan for sources within the
mapplet.
Setting the Target Load Order
You can configure the target load order for a mapping containing any type of target definition. In the Designer, you can set
the order in which the Integration Service sends rows to targets in different target load order groups in a mapping. A
target load order group is the collection of source qualifiers, transformations, and targets linked together in a mapping.
You can set the target load order if you want to maintain referential integrity when inserting, deleting, or updating tables
that have the primary key and foreign key constraints.
The Integration Service reads sources in a target load order group concurrently, and it processes target load order groups
sequentially.
To specify the order in which the Integration Service sends data to targets, create one source qualifier for each target
within a mapping. To set the target load order, you then determine in which order the Integration Service reads each
source in the mapping.
For example, consider a mapping with two target load order groups:
In this mapping, the first target load order group includes ITEMS, SQ_ITEMS, and T_ITEMS. The second target load order
group includes all other objects in the mapping, including the TOTAL_ORDERS target. The Integration Service processes
the first target load order group, and then the second target load order group.
When it processes the second target load order group, it reads data from both sources at the same time.
To set the target load order:
1. Create a mapping that contains multiple target load order groups.
2. Click Mappings > Target Load Plan. The Target Load Plan dialog box lists all Source Qualifier transformations in the mapping and the targets that receive data from each source qualifier.
3. Select a source qualifier from the list.
4. Click the Up and Down buttons to move the source qualifier within the load order.
5. Repeat steps 3 to 4 for other source qualifiers you want to reorder.
6. Click OK.
Aggregation types for mapping variables:
Count: Only integer and small integer data types are valid.
Max: All transformation data types except binary are valid.
Min: All transformation data types except binary are valid.
Variable Functions
Variable functions determine how the Integration Service calculates the current value of a mapping variable in a pipeline.
SetMaxVariable: Sets the variable to the maximum value of a group of values. It ignores rows marked for update, delete,
or reject. Aggregation type set to Max.
SetMinVariable: Sets the variable to the minimum value of a group of values. It ignores rows marked for update, delete,
or reject. Aggregation type set to Min.
SetCountVariable: Increments the variable value by one. It adds one to the variable value when a row is marked for
insertion, and subtracts one when the row is Marked for deletion. It ignores rows marked for update or reject. Aggregation
type set to Count.
SetVariable: Sets the variable to the configured value. At the end of a session, it compares the final current value of the
variable to the start value of the variable. Based on the aggregate type of the variable, it saves a final value to the
repository.
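For example, assuming a date/time mapping variable $$MAX_LOAD_DATE with aggregation type Max and a LOAD_DATE input port (names illustrative), an output port in an Expression transformation could be defined as:
NEW_MAX_DT = SETMAXVARIABLE($$MAX_LOAD_DATE, LOAD_DATE)
At the end of a successful session, the Integration Service saves the highest LOAD_DATE seen back to the repository, so the next run starts from that value.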
Creating Mapping Parameters and Variables
In the Mapping Designer, click Mappings > Parameters and Variables. -or- In the Mapplet Designer, click Mapplet >
Parameters and Variables.
Select Type and Data type. Select Aggregation type for mapping variables.
PARAMETER FILE
A parameter file is a list of parameters and associated values for a workflow, worklet, or session.
Parameter files provide flexibility to change these variables each time we run a workflow or session.
We can create multiple parameter files and change the file we use for a session or workflow. We can create a parameter
file using a text editor such as WordPad or Notepad.
Enter the parameter file name and directory in the workflow or session properties.
A parameter file contains the following types of parameters and variables:
Workflow variable: References values and records information in a workflow.
Worklet variable: References values and records information in a worklet. Use predefined worklet variables in a parent
workflow, but we cannot use workflow variables from the parent workflow in a worklet.
Session parameter: Defines a value that can change from session to session, such as a database connection or file
name.
Mapping parameter and Mapping variable
USING A PARAMETER FILE
Parameter files contain several sections preceded by a heading. The heading identifies the Integration Service, Integration
Service process, workflow, worklet, or session to which we want to assign parameters or variables.
Make the session and workflow.
Give connection information for the source and target tables.
Run the workflow and see the result.
Sample Parameter File for Our example:
In the parameter file, folder and session names are case sensitive.
Create a text file in notepad with name Para_File.txt
[Practice.ST:s_m_MP_MV_Example]
$$Bonus=1000
$$var_max=500
$$var_min=1200
$$var_count=0
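A slightly fuller sketch that also shows a workflow section and session parameters; the folder, workflow, connection, and file names below are illustrative only:
[Practice.WF:wf_MP_MV_Example]
$$WF_RunDate=01/01/2024
[Practice.WF:wf_MP_MV_Example.ST:s_m_MP_MV_Example]
$DBConnection_Target=ORA_DWH_TGT
$InputFile_Src=D:\Files\emp_src.txt
$$Bonus=1000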
CONFIGURING PARAMETER FILE
We can specify the parameter file name and directory in the workflow or session properties.
To enter a parameter file in the workflow properties:
1. Open a Workflow in the Workflow Manager.
2. Click Workflows > Edit.
3. Click the Properties tab.
4. Enter the parameter directory and name in the Parameter Filename field.
5. Click OK.
To enter a parameter file in the session properties:
1. Open a session in the Workflow Manager.
2. Click the Properties tab and open the General Options settings.
3. Enter the parameter directory and name in the Parameter Filename field.
4. Example: D:\Files\Para_File.txt or $PMSourceFileDir\Para_File.txt
5. Click OK.
Indirect File Loading (File List)
To load several flat files of the same structure in one session, create a text file (the file list) that contains the name and path of each source file, for example:
D:\EMP1.txt
E:\EMP2.txt
E:\FILES\DWH\EMP3.txt and so on
3. Now make a session and in Source file name and Source File Directory location fields, give the name and location of
above created file.
4. In Source file type field, select Indirect.
5. Click Apply.
6. Validate Session
7. Make Workflow. Save it to repository and run.
Incremental Aggregation
When we enable the session option Incremental Aggregation, the Integration Service performs incremental
aggregation: it passes source data through the mapping and uses historical cache data to perform aggregation calculations
incrementally.
When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. If
the source changes incrementally and you can capture changes, you can configure the session to process those changes.
This allows the Integration Service to update the target incrementally, rather than forcing it to process the entire source
and recalculate the same data each time you run the session.
For example, you might have a session using a source that receives new data every day. You can capture those
incremental changes because you have added a filter condition to the mapping that removes pre-existing data from the
flow of data. You then enable incremental aggregation.
When the session runs with incremental aggregation enabled for the first time on March 1, you use the entire source. This
allows the Integration Service to read and store the necessary aggregate data. On March 2, when you run the session
again, you filter out all the records except those time-stamped March 2. The Integration Service then processes the new
data and updates the target accordingly. Consider using incremental aggregation in the following circumstances:
You can capture new source data. Use incremental aggregation when you can capture new source data each time you run
the session. Use a Stored Procedure or Filter transformation to process new data.
Incremental changes do not significantly change the target. Use incremental aggregation when the changes do not
significantly change the target. If processing the incrementally changed source alters more than half the existing target,
the session may not benefit from using incremental aggregation. In this case, drop the table and recreate the target with
complete source data.
Note: Do not use incremental aggregation if the mapping contains percentile or median functions. The Integration Service
uses system memory to process these functions in addition to the cache memory you configure in the session properties.
As a result, the Integration Service does not store incremental aggregation values for percentile and median functions in
disk caches.
Integration Service Processing for Incremental Aggregation
(i)The first time you run an incremental aggregation session, the Integration Service processes the entire source. At the
end of the session, the Integration Service stores aggregate data from that session run in two files, the index file and the
data file. The Integration Service creates the files in the cache directory specified in the Aggregator transformation
properties.
(ii)Each subsequent time you run the session with incremental aggregation, you use the incremental source changes in the
session. For each input record, the Integration Service checks historical information in the index file for a corresponding
group. If it finds a corresponding group, the Integration Service performs the aggregate operation incrementally, using the
aggregate data for that group, and saves the incremental change. If it does not find a corresponding group, the Integration
Service creates a new group and saves the record data.
(iii)When writing to the target, the Integration Service applies the changes to the existing target. It saves modified
aggregate data in the index and data files to be used as historical data the next time you run the session.
(iv) If the source changes significantly and you want the Integration Service to continue saving aggregate data for future
incremental changes, configure the Integration Service to overwrite existing aggregate data with new aggregate data.
Each subsequent time you run a session with incremental aggregation, the Integration Service creates a backup of the
incremental aggregation files. The cache directory for the Aggregator transformation must contain enough disk space for
two sets of the files.
(v)When you partition a session that uses incremental aggregation, the Integration Service creates one set of cache files
for each partition.
The Integration Service creates new aggregate data, instead of using historical data, when you perform one of the
following tasks:
Save a new version of the mapping.
Configure the session to reinitialize the aggregate cache.
Move the aggregate files without correcting the configured path or directory for the files in the session properties.
Change the configured path or directory for the aggregate files without moving the files to the new location.
Delete cache files.
Decrease the number of partitions.
When the Integration Service rebuilds incremental aggregation files, the data in the previous files is lost.
Note: To protect the incremental aggregation files from file corruption or disk failure, periodically back up the files.
Preparing for Incremental Aggregation:
When you use incremental aggregation, you need to configure both mapping and session properties:
Implement mapping logic or filter to remove pre-existing data.
Configure the session for incremental aggregation and verify that the file directory has enough disk space for the
aggregate files.
Configuring the Mapping
Before enabling incremental aggregation, you must capture changes in source data. You can use a Filter or Stored
Procedure transformation in the mapping to remove pre-existing source data during a session.
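As a sketch, assuming the source rows carry a TXN_DATE port and the mapping defines a date/time mapping variable $$LAST_RUN_DATE with aggregation type Max (names illustrative), the Filter transformation passes only new rows and an Expression transformation advances the variable:
Filter condition:        TXN_DATE > $$LAST_RUN_DATE
Expression output port:  NEW_MAX = SETMAXVARIABLE($$LAST_RUN_DATE, TXN_DATE)
On each successful run the saved variable moves forward, so only rows newer than the previous run reach the Aggregator.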
Configuring the Session
Use the following guidelines when you configure the session for incremental aggregation:
(i) Verify the location where you want to store the aggregate files.
The index and data files grow in proportion to the source data. Be sure the cache directory has enough disk space to store
historical data for the session.
When you run multiple sessions with incremental aggregation, decide where you want the files stored. Then, enter the
appropriate directory for the process variable, $PMCacheDir, in the Workflow Manager. You can enter session-specific
directories for the index and data files. However, by using the process variable for all sessions using incremental
aggregation, you can easily change the cache directory when necessary by changing $PMCacheDir.
Changing the cache directory without moving the files causes the Integration Service to reinitialize the aggregate cache
and gather new aggregate data.
In a grid, Integration Services rebuild incremental aggregation files they cannot find. When an Integration Service rebuilds
incremental aggregation files, it loses aggregate history.
(ii) Verify the incremental aggregation settings in the session properties.
You can configure the session for incremental aggregation in the Performance settings on the Properties tab.
You can also configure the session to reinitialize the aggregate cache. If you choose to reinitialize the cache, the Workflow
Manager displays a warning indicating the Integration Service overwrites the existing cache and a reminder to clear this
option after running the session.
TASKS
The Workflow Manager contains many types of tasks to help you build workflows and worklets. We can create reusable
tasks in the Task Developer.
Types of tasks:
Task Type       Tool where task is created               Reusable or not
Session         Task Developer                           Yes
Command         Task Developer                           Yes
Event-Raise     Workflow Designer / Worklet Designer     No
Event-Wait      Workflow Designer / Worklet Designer     No
Timer           Workflow Designer / Worklet Designer     No
Decision        Workflow Designer / Worklet Designer     No
Assignment      Workflow Designer / Worklet Designer     No
Control         Workflow Designer / Worklet Designer     No
SESSION TASK
A session is a set of instructions that tells the Power Center Server how and when to move data from sources to targets.
To run a session, we must first create a workflow to contain the Session task.
We can run as many sessions in a workflow as we need. We can run the Session tasks sequentially or concurrently,
depending on our needs.
The Power Center Server creates several files and in-memory caches depending on the transformations and options used
in the session.
EMAIL TASK
The Workflow Manager provides an Email task that allows us to send email during a workflow.
It is usually created by the Administrator, and we just drag it into our workflow and use it.
Steps:
1. In the Task Developer or Workflow Designer, choose Tasks-Create.
2. Select an Email task and enter a name for the task. Click Create.
3. Click Done.
4. Double-click the Email task in the workspace. The Edit Tasks dialog box appears.
5. Click the Properties tab.
6. Enter the fully qualified email address of the mail recipient in the Email User Name field.
7. Enter the subject of the email in the Email Subject field. Or, you can leave this field blank.
8. Click the Open button in the Email Text field to open the Email Editor.
9. Click OK twice to save your changes.
Example: To send an email when a session completes:
Steps:
1. Create a workflow wf_sample_email
2. Drag any session task to workspace.
3. Edit Session task and go to Components tab.
4. See On Success Email Option there and configure it.
5. In Type select reusable or Non-reusable.
6. In Value, select the email task to be used.
7. Click Apply -> Ok.
8. Validate workflow and Repository -> Save
9. We can also drag the email task and use as per need.
10. We can set the option to send email on success or failure in the Components tab of a session task.
COMMAND TASK
The Command task allows us to specify one or more shell commands in UNIX or DOS commands in Windows to run during
the workflow.
For example, we can specify shell commands in the Command task to delete reject files, copy a file, or archive target files.
Ways of using command task:
1. Standalone Command task: We can use a Command task anywhere in the workflow or worklet to run shell commands.
2. Pre- and post-session shell command: We can call a Command task as the pre- or post-session shell command for a
Session task. This is done in COMPONENTS TAB of a session. We can run it in Pre-Session Command or Post Session
Success Command or Post Session Failure Command. Select the Value and Type option as we did in Email task.
Example: to copy a file sample.txt from the D drive to the E drive.
Command (Windows): COPY D:\sample.txt E:\
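On UNIX, the equivalent shell command would be along these lines (paths illustrative):
cp /data/sample.txt /backup/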
Steps for creating command task:
1. In the Task Developer or Workflow Designer, choose Tasks-Create.
2. Select Command task, enter a name and click Create. Click Done.
3. Double-click the Command task in the workspace. In the Commands tab, add a command and enter the shell command in the Command Editor.
4. Click OK.
CONTROL TASK
We can use the Control task to stop, abort, or fail the top-level workflow or the parent workflow based on an input link condition.
Control options: Fail Me, Fail Parent, Stop Parent, Abort Parent, Fail Top-Level WF, Stop Top-Level WF.
ASSIGNMENT TASK
To use an Assignment task in the workflow, first create and add the
Assignment task to the workflow. Then configure the Assignment task to assign values or expressions to user-defined
variables.
Scheduler
We can schedule a workflow to run continuously, repeat at a given time or interval, or we can manually start a workflow.
The Integration Service runs a scheduled workflow as configured.
By default, the workflow runs on demand. We can change the schedule settings by editing the scheduler. If we change
schedule settings, the Integration Service reschedules the workflow according to the new settings.
If we delete a folder, the Integration Service removes workflows from the schedule.
If we choose a different Integration Service for the workflow or restart the Integration Service, it reschedules all
workflows.
For each folder, the Workflow Manager lets us create reusable schedulers so we can reuse the same set of
scheduling settings for workflows in the folder.
Use a reusable scheduler so we do not need to configure the same set of scheduling settings in each workflow.
When we delete a reusable scheduler, all workflows that use the deleted scheduler become invalid. To make the
workflows valid, we must edit them and replace the missing scheduler.
Run options:
1. Run on Demand
2. Run Continuously
3. Run on Server initialization
1. Run on Demand:
Integration Service runs the workflow when we start the workflow manually.
2. Run Continuously:
Integration Service runs the workflow as soon as the service initializes. The Integration Service then starts the next run of
the workflow as soon as it finishes the previous run.
3. Run on Server initialization
Integration Service runs the workflow as soon as the service is initialized. The Integration Service then starts the next run
of the workflow according to settings in Schedule Options.
Schedule options for Run on Server initialization:
Customized Repeat: Integration Service runs the workflow on the dates and times specified in the Repeat
dialog box.
Start options for Run on Server initialization:
Start Date
Start Time
End options for Run on Server initialization:
End On
End After
Forever
Steps:
1. Open the workflow in the Workflow Manager and click Workflows > Edit.
2. Click the Scheduler tab.
3. Choose Non-reusable. Select Reusable if we want to select an existing reusable scheduler for the workflow.
4. Click the right side of the Scheduler field to edit scheduling settings for the non-reusable scheduler.
5. If we select Reusable, choose a reusable scheduler from the Scheduler Browser dialog box.
6. Click Ok.
Points to Ponder:
To remove a workflow from its schedule, right-click the workflow in the Navigator window and choose
Unschedule Workflow.
To reschedule a workflow on its original schedule, right-click the workflow in the Navigator window and choose
Schedule Workflow.
PUSHDOWN OPTIMIZATION
You can push transformation logic to the source or target database using pushdown optimization. When you run a session
configured for pushdown optimization, the Integration Service translates the transformation logic into SQL queries
and sends the SQL queries to the database. The source or target database executes the SQL queries to process the
transformations.
The amount of transformation logic you can push to the database depends on the database, transformation logic, and
mapping and session configuration. The Integration Service processes all transformation logic that it cannot push to a
database.
Use the Pushdown Optimization Viewer to preview the SQL statements and mapping logic that the Integration Service can
push to the source or target database. You can also use the Pushdown Optimization Viewer to view the messages related
to pushdown optimization.
For example, consider a mapping containing transformation logic that can be pushed to the source database:
This mapping contains a Filter transformation that filters out all items except those with an ID greater than 1005. The
Integration Service can push the transformation logic to the database. It generates the following SQL statement to process
the transformation logic:
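The exact statement depends on the source and target definitions; assuming an ITEMS source table with ITEM_ID, ITEM_NAME, and ITEM_DESC columns loaded into a T_ITEMS target on the same database (names illustrative), it would look roughly like:
INSERT INTO T_ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC)
SELECT ITEMS.ITEM_ID, ITEMS.ITEM_NAME, ITEMS.ITEM_DESC
FROM ITEMS
WHERE ITEMS.ITEM_ID > 1005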
The Integration Service generates an INSERT SELECT statement to get the ID, NAME, and DESCRIPTION columns from
the source table. It filters the data using a WHERE clause. The Integration Service does not extract data from the database
at this time.