
Session 1: Overview of Extraction, Transformation & Loading

Slide 1

Objectives
At the end of this lesson, you will know:
- What a data warehouse or data mart is
- The data extraction, transformation and load process
- Types of ETL tools
- What to look for in ETL tools
- Key tools in the market
- ETL trends and new solution options

<Enter Project Name>

Slide 2

What is a Data Warehouse?


- Repository of data
- Optimized for report generation
- Supports business analysis
  - Projections
  - Comparisons
  - Assessments
- Extracted from operational sources
  - Integrated
  - Summarized
  - Filtered
  - Cleansed
  - De-normalized
  - Historical


Slide 3

Data Marts
- Like data warehouses but smaller in scope
- Subject oriented
- Organize data from a single subject area or department
- Solve a small set of business requirements
- Cheaper and faster to build


Slide 4

Data Extraction and Preparation


[Figure: three-stage flow. Labels: Stage I; Stage II; Stage III; Analyze, Clean and Transform; Data Movement and Load; Periodic Refresh/Update.]


Slide 5

The Need For Data Transformation


- Businesses have data in multiple databases with different formats
- Mergers and acquisitions have also created disparities in data representation
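
To make the format problem concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two source systems ("CRM" and "billing"), their field names, and the country-code mapping do not come from the slides. It shows the same customer stored two different ways and converted to one common representation.

```python
from datetime import datetime

# Hypothetical records: two systems describe the same customer differently.
crm_row = {"cust_name": "ACME Corp", "signup": "03/15/2021", "country": "US"}
billing_row = {"customer": "acme corp", "signup_date": "2021-03-15", "country_code": "USA"}

# Assumed mapping from each source's country code to one canonical code.
COUNTRY_CODES = {"US": "US", "USA": "US"}

def from_crm(row):
    """Convert a CRM-style record to the common target format."""
    return {
        "name": row["cust_name"].strip().upper(),
        "signup_date": datetime.strptime(row["signup"], "%m/%d/%Y").date(),
        "country": COUNTRY_CODES[row["country"]],
    }

def from_billing(row):
    """Convert a billing-style record to the common target format."""
    return {
        "name": row["customer"].strip().upper(),
        "signup_date": datetime.strptime(row["signup_date"], "%Y-%m-%d").date(),
        "country": COUNTRY_CODES[row["country_code"]],
    }

unified_crm = from_crm(crm_row)
unified_billing = from_billing(billing_row)
```

After transformation both records carry identical values, so they can be integrated in the warehouse as one customer.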


Slide 6

The ETL Process


- Access data dictionaries defining source files
- Build logical and physical data models for target data
- Survey existing systems to identify sources of data
- Specify business and technical rules for data extraction, conversion and transformation
- Perform data extraction and transformation
- Load target databases
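
The last two steps above (extract and transform, then load) can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not a production design: the CSV source, the `orders` table, and the rule that rows without an amount are skipped are all assumptions introduced for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real job would read from an operational system.
SOURCE_CSV = """order_id,amount,currency
1, 10.50 ,usd
2,,usd
3,7.25,USD
"""

def extract(text):
    """Read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Apply conversion rules: trim whitespace, fix types, normalize currency."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # assumed business rule: skip rows with no amount
        clean.append((int(row["order_id"]), float(amount), row["currency"].strip().upper()))
    return clean

def load(conn, rows):
    """Create the target table if needed and insert the cleaned rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the target database
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT order_id, amount, currency FROM orders ORDER BY order_id"
).fetchall()
```

Keeping the three functions separate mirrors the process on the slide: the rules in `transform` can change without touching how data is read or written.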


Slide 7

The ETL Process


[Figure: ETL process flow. Labels: Data Modeling Tool; Data Definitions; Define/code Extraction Rules; Source Metadata; Metadata Repository; Target Metadata; Extract Program Generation; Source Databases; Run Extract Programs; Raw Data; Clean Data; Load Data Warehouse; MDDB; RDBMS.]


Slide 8

The ETL Process

[Figure: ETL flow from OLTP Systems to the Data Warehouse. Labels: OLTP Systems; Extract; Staging Area; Transform; Load; Data Warehouse; Stage I; Stage II; Stage III.]

Slide 9

ETL Tools
- Provide a facility to specify a large number of transformation rules through a GUI
- Generate programs to transform data
- Handle multiple data sources
- Handle data redundancy
- Generate metadata as output
- Most tools exploit parallelism by running on multiple low-cost servers in a multi-threaded environment
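
As a rough illustration of two of these ideas, rules specified as data rather than hand-written code and several sources processed in parallel, consider the Python sketch below. The rule table, column names, and sample rows are hypothetical, and a thread pool on one machine only stands in for the multi-server parallelism that commercial tools provide.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rule table: target column -> (source column, conversion function).
# An ETL tool would typically capture such rules through its GUI.
RULES = {
    "customer_id": ("id", int),
    "name": ("name", lambda s: s.strip().title()),
    "balance": ("bal", float),
}

def apply_rules(row):
    """Build one target row by applying every rule to a source row."""
    return {target: convert(row[source]) for target, (source, convert) in RULES.items()}

def transform_source(rows):
    """Transform all rows that came from one source."""
    return [apply_rules(r) for r in rows]

# Two hypothetical sources, each a list of raw rows.
SOURCES = [
    [{"id": "1", "name": " alice smith ", "bal": "10.5"}],
    [{"id": "2", "name": "BOB JONES", "bal": "0"}],
]

# Transform each source on its own worker thread; map() preserves source order.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(transform_source, SOURCES))
```

Because the rules live in a table, adding a column means adding one entry to `RULES` rather than writing a new program.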


Slide 10
