
The first part of an ETL process involves extracting the data from the source systems.

Most data warehousing projects consolidate data from different source systems. Each separate system may also use a different data organization or format. Common data source formats are relational databases and flat files, but sources may also include non-relational database structures.

The transform stage applies a series of rules or functions to the extracted data to derive the data for loading into the end target. Some data sources will require very little or even no manipulation of data.

The load phase loads the data into the end target, usually the data warehouse (DW). Depending on the requirements of the organization, this process varies widely. Some data warehouses may overwrite existing information with cumulative, updated data every week, while other DWs (or even other parts of the same DW) may add new data in a historized form, for example, hourly.

The typical real-life ETL cycle consists of the following execution steps:

1. Cycle initiation
2. Build reference data
3. Extract (from sources)
4. Validate
5. Transform (clean, apply business rules, check for data integrity, create aggregates or disaggregates)
6. Stage (load into staging tables, if used)
7. Audit reports (for example, on compliance with business rules; in case of failure, they also help to diagnose and repair)
8. Publish (to target tables)
9. Archive
10. Clean up
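The core of the cycle (extract, validate, transform, stage, audit, publish) can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular ETL tool implements the cycle: the sample CSV data, the table names stage_sales and sales_by_region, and the function names are all assumptions made up for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source; two of the four rows are deliberately invalid.
SOURCE_CSV = """order_id,region,amount
1,north,10.50
2,south,abc
3,north,4.50
4,,7.00
"""

def extract(text):
    """Extract: read every row from the flat-file source."""
    return list(csv.DictReader(io.StringIO(text)))

def validate(rows):
    """Validate: keep rows with a numeric amount and a non-empty region."""
    good, bad = [], []
    for row in rows:
        try:
            float(row["amount"])
            ok = bool(row["region"])
        except ValueError:
            ok = False
        (good if ok else bad).append(row)
    return good, bad

def transform(rows):
    """Transform: normalise region names and aggregate amounts per region."""
    totals = {}
    for row in rows:
        region = row["region"].upper()
        totals[region] = totals.get(region, 0.0) + float(row["amount"])
    return totals

def run_cycle(conn, text):
    rows = extract(text)
    good, bad = validate(rows)
    totals = transform(good)
    with conn:  # stage and publish inside one transaction
        conn.execute("CREATE TABLE IF NOT EXISTS stage_sales (region TEXT, total REAL)")
        conn.execute("CREATE TABLE IF NOT EXISTS sales_by_region (region TEXT PRIMARY KEY, total REAL)")
        # Stage: load the transformed rows into a staging table.
        conn.execute("DELETE FROM stage_sales")
        conn.executemany("INSERT INTO stage_sales VALUES (?, ?)", sorted(totals.items()))
        # Publish: overwrite the target table from staging.
        conn.execute("DELETE FROM sales_by_region")
        conn.execute("INSERT INTO sales_by_region SELECT region, total FROM stage_sales")
    # Audit: a simple count report for diagnosing failed or partial runs.
    return {"extracted": len(rows), "rejected": len(bad), "published": len(totals)}

conn = sqlite3.connect(":memory:")
audit = run_cycle(conn, SOURCE_CSV)
print(audit)
```

The publish step here uses the overwrite style of loading; a historized load would instead append rows stamped with a load time rather than deleting the existing ones.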

Informatica can communicate with all major databases and can move and transform data between them. It can move huge volumes of data efficiently. It can throttle transactions (performing big updates in small chunks to avoid long locks and filling the transaction log). It can also join tables that reside in different databases on different servers. These tasks are performed by the Informatica Server, which runs on Unix or MS Windows. A client application called "Server Manager" is provided for working with the server.
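The idea of throttling a big update into small chunks can be illustrated independently of Informatica. The sketch below is a general technique shown with Python's sqlite3 module, not Informatica's own mechanism; the accounts table, the migrated column, and the update_in_chunks function are hypothetical names chosen for the example.

```python
import sqlite3

def update_in_chunks(conn, chunk_size):
    """Apply one large update as a series of small, separately committed transactions."""
    commits = 0
    while True:
        with conn:  # each chunk commits on its own, so locks are held only briefly
            cur = conn.execute(
                "UPDATE accounts SET migrated = 1 "
                "WHERE id IN (SELECT id FROM accounts WHERE migrated = 0 LIMIT ?)",
                (chunk_size,),
            )
        if cur.rowcount == 0:  # nothing left to update
            break
        commits += 1
    return commits

# Hypothetical table with 1000 rows waiting to be updated.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, migrated INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO accounts (id) VALUES (?)", [(i,) for i in range(1, 1001)])
conn.commit()

chunks = update_in_chunks(conn, chunk_size=250)
remaining = conn.execute("SELECT COUNT(*) FROM accounts WHERE migrated = 0").fetchone()[0]
print(chunks, remaining)
```

Smaller chunks mean shorter locks and smaller log growth per transaction, at the cost of more commits overall.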

Define the transformation process, known as a mapping (Designer)
Define run-time properties for a mapping, known as sessions (Workflow Manager)
Monitor execution of sessions (Workflow Monitor)
Manage the repository, useful for administrators (Repository Manager)
Report metadata (Metadata Reporter)

Informatica PowerCenter includes the following types of repositories:

Standalone repository: functions individually.
Global repository: a centralized repository in a domain, which also contains objects shared across the repositories in that domain.
Local repository: a repository within a domain.
Versioned repository: can be either local or global, and allows version control.

A code page contains the encoding that specifies the characters in a set of one or more languages. It is selected based on the source of the data.

A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. Each transformation has rules for configuring it and connecting it in a mapping. A transformation can be created for use once in a single mapping, or as a reusable transformation for use in multiple mappings. For example, the Aggregator transformation performs calculations on groups of data.

There are two types of loading in Informatica: normal loading and bulk loading. In normal loading, records are loaded one at a time and a log is written for them, so loading data to the target takes longer. In bulk loading, a number of records are loaded into the target database at a time, which takes less time than normal loading.

The Designer provides two mapping wizards. The Getting Started Wizard creates mappings to load static fact and dimension tables as well as slowly growing dimension tables. The Slowly Changing Dimensions Wizard creates mappings to load slowly changing dimension tables based on the amount of historical dimension data we want to keep and the method we choose to handle historical dimension data.
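The difference between normal and bulk loading described above can be sketched outside Informatica. The following is a rough analogy in Python with SQLite, not Informatica's actual loader: the customers table, the load_log table (standing in for per-record logging), and both function names are assumptions invented for the illustration.

```python
import sqlite3

# Hypothetical records to load into the target.
ROWS = [(i, f"customer-{i}") for i in range(1, 501)]

def make_target():
    """Create an in-memory target with a data table and a per-record log table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE load_log (id INTEGER)")
    return conn

def normal_load(conn, rows):
    """Normal-style load: one record per transaction, with a log entry for each."""
    for row in rows:
        with conn:
            conn.execute("INSERT INTO customers VALUES (?, ?)", row)
            conn.execute("INSERT INTO load_log VALUES (?)", (row[0],))

def bulk_load(conn, rows):
    """Bulk-style load: all records in one batch and one transaction, no per-record log."""
    with conn:
        conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)

normal = make_target()
normal_load(normal, ROWS)
bulk = make_target()
bulk_load(bulk, ROWS)

normal_rows = normal.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
normal_logs = normal.execute("SELECT COUNT(*) FROM load_log").fetchone()[0]
bulk_rows = bulk.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
bulk_logs = bulk.execute("SELECT COUNT(*) FROM load_log").fetchone()[0]
print(normal_rows, normal_logs, bulk_rows, bulk_logs)
```

Both targets end up with the same data. The trade-off is that the per-record log makes the slower path easier to trace and recover, while the batch path does less work per record.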

SQL Server Integration Services (SSIS) is a component of the Microsoft SQL Server database software that can be used to perform a broad range of data migration tasks. SSIS is a platform for data integration and workflow applications. It features a fast and flexible data warehousing tool used for data extraction, transformation, and loading (ETL). The tool may also be used to automate maintenance of SQL Server databases and updates to multidimensional cube data.

The Ab Initio software is a powerful fourth-generation, GUI-based parallel processing tool for data analysis, batch processing, and data manipulation. It is commonly used to extract, transform, and load (ETL) data.
