Chapter 3: Extracting
Prepared by: Thang Nguyen, Dung Phan
Date: 09/02/2008
3.0 Overview
The ETL process needs to effectively integrate
systems that have different:
Database management systems
Operating systems
Hardware
Communications protocols
3.2.2 Components of the Logical Data Map (cont.)
Department/Business use.
Business owner.
Technical Owner.
DBMS.
Production server/OS.
One-to-one
One-to-many.
Many-to-many.
3.4 Integrating Heterogeneous Data
1. Identify the source systems.
2. Understand the source systems (data profiling).
3. Create record matching logic.
4. Establish survivorship rules.
5. Establish non-key attribute business rules.
6. Load conformed dimension.
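Steps 3 and 4 above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the two source lists, the field names, the choice of a normalized email as the match key, and the "most recently updated non-null value wins" survivorship rule are all assumptions made for the example.

```python
from datetime import date

# Hypothetical customer rows from two source systems (illustrative data only).
crm_rows = [
    {"name": "Ann  Lee ", "email": "Ann.Lee@Example.com", "phone": None,
     "updated": date(2008, 1, 10)},
]
billing_rows = [
    {"name": "ann lee", "email": "ann.lee@example.com", "phone": "555-0100",
     "updated": date(2008, 2, 1)},
    {"name": "Bob Ray", "email": "bob.ray@example.com", "phone": "555-0199",
     "updated": date(2007, 12, 5)},
]


def match_key(row):
    """Record matching logic (step 3): assume a normalized email identifies a customer."""
    return row["email"].strip().lower()


def survive(rows):
    """Survivorship (step 4): per attribute, keep the newest non-null value."""
    newest_first = sorted(rows, key=lambda r: r["updated"], reverse=True)
    merged = {}
    for attr in ("name", "email", "phone"):
        merged[attr] = next(
            (r[attr] for r in newest_first if r[attr] is not None), None
        )
    return merged


def conform(*sources):
    """Group rows from all sources by match key, then merge each group."""
    groups = {}
    for rows in sources:
        for row in rows:
            groups.setdefault(match_key(row), []).append(row)
    return {key: survive(rows) for key, rows in groups.items()}


conformed = conform(crm_rows, billing_rows)
```

In practice the match key and the survivorship rule come out of data profiling (step 2); an exact match on email is only the simplest possible choice.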
3.4.2 Connecting to Diverse Sources through ODBC
3.5.5 Handling Mainframe Numeric Data
When you begin to work with quantitative
data elements, such as dollar amounts,
counts, and balances, you can see that there's
more to these numbers than meets the eye.
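The slide does not say which numeric formats it has in mind. One format commonly met in mainframe extracts is packed decimal (COBOL COMP-3), where each byte holds two digits and the final half-byte holds the sign. The sketch below decodes such a field; the function name and the `scale` parameter (number of implied decimal places, which would come from the record layout) are assumptions for illustration.

```python
from decimal import Decimal

POSITIVE_SIGNS = {0x0A, 0x0C, 0x0E, 0x0F}  # 0x0F is the unsigned marker
NEGATIVE_SIGNS = {0x0B, 0x0D}


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed-decimal field: two digits per byte, sign in the last nibble."""
    if not raw:
        raise ValueError("empty packed-decimal field")

    # Split every byte into its high and low 4-bit halves.
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)

    sign_nibble = nibbles.pop()
    if sign_nibble not in POSITIVE_SIGNS | NEGATIVE_SIGNS:
        raise ValueError("invalid sign nibble in packed-decimal field")
    if any(n > 9 for n in nibbles):
        raise ValueError("invalid digit nibble in packed-decimal field")

    value = int("".join(str(n) for n in nibbles))
    if sign_nibble in NEGATIVE_SIGNS:
        value = -value

    # The decimal point is implied, so shift by the number of decimal places.
    return Decimal(value).scaleb(-scale)


amount = unpack_comp3(bytes([0x12, 0x34, 0x5C]), scale=2)
```

The three bytes `12 34 5C` carry the digits 12345 with a positive sign, so with two implied decimal places the field decodes to 123.45.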
3.5.11 Handling Mainframe Variable Record Lengths
3.6.2 Processing Delimited Flat Files
Flat files often come
with a set of delimiters
that separate the data fields within the file.
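As a small illustration, Python's standard `csv` module can read such a file once the delimiter is known. The pipe delimiter, the column names, and the sample rows below are assumptions made up for the example.

```python
import csv
import io


def read_delimited(text_stream, delimiter="|"):
    """Parse a delimited flat file with a header row into a list of dicts."""
    reader = csv.DictReader(text_stream, delimiter=delimiter)
    return [dict(row) for row in reader]


# Hypothetical extract; the second line has a quoted field containing the delimiter.
sample = io.StringIO(
    'customer_id|name|balance\n'
    '1001|"Lee | Sons"|250.75\n'
    '1002|Ray Ltd|0.00\n'
)

rows = read_delimited(sample)
```

Using a real parser rather than `str.split` matters when a field value can itself contain the delimiter, as in the quoted name above. All values come back as strings, so numeric columns such as `balance` still need an explicit conversion step.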
attributes
Extensible to future additions
Support of namespaces
Namespaces
3.10.1 Detecting Changes
Using Audit Columns.
Database Log Scraping or Sniffing
Timed Extracts
Process of Elimination
Initial and Incremental Loads
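The audit-column approach from the list above can be sketched as follows, using an in-memory SQLite database so the example is self-contained. The `orders` table, its `last_modified` audit column, and the stored timestamp of the previous extract are hypothetical; a real source system would define its own audit columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, amount REAL, last_modified TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 10.0, "2008-01-30 09:00:00"),
        (2, 25.5, "2008-02-01 14:30:00"),
        (3, 7.25, "2008-02-02 08:15:00"),
    ],
)


def extract_changed(conn, last_extract_time):
    """Incremental extract: return rows whose audit column is newer than the last run."""
    cur = conn.execute(
        "SELECT order_id, amount, last_modified FROM orders "
        "WHERE last_modified > ? ORDER BY last_modified",
        (last_extract_time,),
    )
    return cur.fetchall()


# The previous extract is assumed to have finished at midnight on 2008-02-01.
changed = extract_changed(conn, "2008-02-01 00:00:00")
```

Here only orders 2 and 3 are returned. The approach is only as reliable as the audit column itself: if some update path in the source system does not maintain `last_modified`, those changes are missed and one of the other techniques in the list is needed.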
Summary
The Logical Data Map
The Challenge of Extracting from Disparate Platforms
Extracting Changed Data