GROUP 7:
Arshiya
Pathan
Jisha
Rakesh
Karthik
Ramanuj
Amit Kumar
Data Warehousing Concepts
What is a Data Warehouse?
A data warehouse is a subject-oriented, integrated, time-variant,
non-volatile collection of data in support of management's
decision-making process.
(OR)
A data warehouse is a relational database designed for query and
analysis rather than for transaction processing.
Subject-oriented (customer, products, sales, etc.)
Non-volatile
Time-variant
Integrated
(William Inmon, 1993)
Subject-oriented
A data warehouse is organized around major subjects, such as
customer, supplier, product and sales, rather than concentrating on
the day-to-day operations and transaction processing of an
organization.
Integrated
A data warehouse is usually constructed by integrating multiple
heterogeneous sources, such as relational databases, flat files, and
on-line transaction records. Data cleaning and data integration
techniques are applied to ensure consistency in naming conventions,
encoding structures, attribute measures and so on.
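The consistency step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the two source systems (`crm_rows`, `billing_rows`), their field names, the sex code tables and the height units are all hypothetical and do not come from the slides.

```python
# Hypothetical extracts from two source systems. Field names, sex encodings
# and height units all differ, which is the inconsistency integration removes.
crm_rows = [{"cust_id": 1, "sex": "F", "height_cm": 170.0}]
billing_rows = [{"CustomerID": 2, "gender": 0, "height_in": 70.0}]

# Assumed code tables: CRM uses letters, billing uses 0 = male, 1 = female.
CRM_SEX = {"M": "MALE", "F": "FEMALE"}
BILLING_SEX = {0: "MALE", 1: "FEMALE"}
CM_PER_INCH = 2.54


def from_crm(row):
    """Map a CRM row onto the warehouse naming, encoding and units."""
    return {
        "customer_id": row["cust_id"],
        "sex": CRM_SEX[row["sex"]],
        "height_cm": row["height_cm"],
    }


def from_billing(row):
    """Map a billing row onto the same warehouse layout."""
    return {
        "customer_id": row["CustomerID"],
        "sex": BILLING_SEX[row["gender"]],
        "height_cm": round(row["height_in"] * CM_PER_INCH, 1),
    }


def integrate(crm, billing):
    """Return one list of rows that all share a single convention."""
    return [from_crm(r) for r in crm] + [from_billing(r) for r in billing]


warehouse_rows = integrate(crm_rows, billing_rows)
```

After integration every row uses the same column names, the same sex encoding and the same unit, whichever system it came from.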
Time Variant
SEX
FEMALE, MALE
Dimension Modeling
Example:
Level 0: Profession
Level 1: Engineer, Secretary, Teacher
Level 2
FACTS
Fact: A fact table holds the detailed data of a business process. It
has a primary key, foreign keys that relate it to the dimensions,
and measures.
There are three types of facts:
1. ADDITIVE FACTS
2. SEMI-ADDITIVE FACTS
3. NON-ADDITIVE FACTS
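The slide only names the three types. The sketch below uses the usual dimensional-modeling reading of them: an additive fact can be summed across every dimension, a semi-additive fact across some dimensions but not others (typically not across time), and a non-additive fact cannot be summed at all. The account and sales data are made up for illustration.

```python
from collections import defaultdict

# Hypothetical daily account snapshot: (day, account, deposits, balance).
snapshot = [
    ("2024-01-01", "A", 100.0, 100.0),
    ("2024-01-01", "B", 50.0, 50.0),
    ("2024-01-02", "A", 20.0, 120.0),
    ("2024-01-02", "B", 0.0, 50.0),
]

# Additive: deposits may be summed across both day and account.
total_deposits = sum(row[2] for row in snapshot)

# Semi-additive: balances may be summed across accounts within one day,
# but adding the balances of different days has no business meaning.
balance_by_day = defaultdict(float)
for day, _account, _deposits, balance in snapshot:
    balance_by_day[day] += balance
closing_balance = balance_by_day[max(balance_by_day)]  # latest day only

# Hypothetical sales rows: (product, revenue, cost).
sales = [("pen", 100.0, 50.0), ("book", 300.0, 270.0)]

# Non-additive: a margin ratio cannot be summed or naively averaged.
# Recompute it from the additive components instead.
naive_margin = sum((rev - cost) / rev for _p, rev, cost in sales) / len(sales)
total_revenue = sum(rev for _p, rev, _cost in sales)
total_cost = sum(cost for _p, _rev, cost in sales)
true_margin = (total_revenue - total_cost) / total_revenue
```

Here the naive average of the two margins (50% and 10%) is 30%, while the margin recomputed from total revenue and total cost is 20%, which is why ratios are normally stored as their additive components.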
Factless Fact & Conformed Dimension
A factless fact is a fact table that does not contain measures.
A dimension that is shared by more than one fact is called a
conformed dimension.
A collection of star schemas and snowflake schemas is called a
galaxy.
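A small sketch may make these three terms concrete. The tables below (`date_dim`, `student_dim`, `attendance_fact`, `fee_fact`) are hypothetical and exist only for illustration: the attendance table is a factless fact, the date dimension is conformed because both facts use it, and the two facts together form a tiny galaxy.

```python
# Conformed dimensions: shared by both fact tables below.
date_dim = {1: "2024-01-01", 2: "2024-01-02"}
student_dim = {10: "Student X", 11: "Student Y"}

# Factless fact: only foreign keys, no measures.
# One row means "this student attended on this date".
attendance_fact = [(1, 10), (1, 11), (2, 10)]  # (date_key, student_key)

# Ordinary fact with a measure, using the same dimensions.
fee_fact = [(1, 10, 500.0), (2, 11, 250.0)]  # (date_key, student_key, amount)


def attendance_by_date(fact, dates):
    """Analyse a factless fact by counting rows per dimension member."""
    counts = {label: 0 for label in dates.values()}
    for date_key, _student_key in fact:
        counts[dates[date_key]] += 1
    return counts


def fees_by_date(fact, dates):
    """Analyse an ordinary fact by summing its measure per dimension member."""
    totals = {label: 0.0 for label in dates.values()}
    for date_key, _student_key, amount in fact:
        totals[dates[date_key]] += amount
    return totals


attendance = attendance_by_date(attendance_fact, date_dim)
fees = fees_by_date(fee_fact, date_dim)
```

Because both results are keyed by the same date dimension, they can be compared side by side, which is the practical benefit of conforming a dimension.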
TYPES OF MAPPINGS
Mapping                      History
Simple pass through          None
Slowly growing target        Full
Slowly changing dimension    Depends
Types of SCDs
Slowly changing dimension-1
Slowly changing dimension-2
  Time stamping
  Versioning
  Flagging
Slowly changing dimension-3
Slowly changing Dimension-1
SCD-1: Use this kind of mapping when you do not want history. It
inserts new dimension rows and updates existing ones in place
(insert, else update), so earlier values are overwritten.
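The insert-else-update behaviour of SCD-1 can be sketched as follows. This is a minimal in-memory illustration, not the ETL tool's mapping; the function name `apply_scd1` and the `customer_id` and `city` columns are assumptions made for the example.

```python
def apply_scd1(dimension, incoming):
    """Insert new rows, overwrite existing ones. No history is kept.

    dimension: dict mapping customer_id -> dict of attributes
    incoming:  list of source rows, each containing 'customer_id'
    """
    for row in incoming:
        key = row["customer_id"]
        attrs = {k: v for k, v in row.items() if k != "customer_id"}
        if key in dimension:
            dimension[key].update(attrs)  # update: old values are lost
        else:
            dimension[key] = attrs        # insert: new dimension row
    return dimension


customers = {}
apply_scd1(customers, [{"customer_id": 1, "city": "Pune"}])
apply_scd1(customers, [{"customer_id": 1, "city": "Delhi"},
                       {"customer_id": 2, "city": "Chennai"}])
```

After the second load, customer 1 shows only the new city; nothing records that the earlier value ever existed.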
Slowly changing Dimension-2
SCD-2 (Time stamping): Use this kind of mapping when you want to
maintain full history. It inserts new and changed dimensions and
creates an effective date range to track changes.
SCD-2 (Versioning): Inserts new and changed dimensions. Creates a
version number and increments the primary key to track changes.
SCD-2 (Flagging): Inserts new and changed dimensions. Flags the
current version and increments the primary key to track changes.
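The three SCD-2 variants differ only in which tracking column they maintain. For brevity, the sketch below keeps all three on each row (effective date range, version number, current flag), whereas a real mapping would normally pick one. The function name `apply_scd2` and the column names are assumptions for illustration.

```python
from datetime import date


def apply_scd2(rows, incoming, load_date):
    """Insert new and changed rows; never overwrite an attribute value.

    rows:     list of dimension rows (dicts), modified in place
    incoming: list of source rows with 'customer_id' and 'city'
    """
    next_key = max((r["surrogate_key"] for r in rows), default=0) + 1
    for new in incoming:
        current = next(
            (r for r in rows
             if r["customer_id"] == new["customer_id"] and r["is_current"]),
            None,
        )
        if current is not None and current["city"] == new["city"]:
            continue  # unchanged: nothing to insert
        version = 1
        if current is not None:
            current["is_current"] = False        # flagging
            current["effective_to"] = load_date  # time stamping
            version = current["version"] + 1     # versioning
        rows.append({
            "surrogate_key": next_key,  # incremented primary key
            "customer_id": new["customer_id"],
            "city": new["city"],
            "version": version,
            "is_current": True,
            "effective_from": load_date,
            "effective_to": None,
        })
        next_key += 1
    return rows


dim = []
apply_scd2(dim, [{"customer_id": 1, "city": "Pune"}], date(2024, 1, 1))
apply_scd2(dim, [{"customer_id": 1, "city": "Delhi"}], date(2024, 6, 1))
apply_scd2(dim, [{"customer_id": 1, "city": "Delhi"}], date(2024, 7, 1))
```

The third load carries no change, so the dimension ends with two rows: the closed-out original and the current version.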
Slowly changing Dimension-3
SCD-3: Use this kind of mapping when you want partial history. It
inserts new dimensions and updates changed values in existing
dimensions. Optionally, it uses the load date to track changes.
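A common way to keep partial history is to store the previous value in an extra column on the same row. The sketch below assumes that layout; the function name `apply_scd3` and the columns `previous_city` and `changed_on` are hypothetical names chosen for the example.

```python
from datetime import date


def apply_scd3(dimension, incoming, load_date):
    """Keep the current value plus one previous value per row.

    dimension: dict mapping customer_id -> attributes, modified in place
    incoming:  list of source rows with 'customer_id' and 'city'
    """
    for row in incoming:
        existing = dimension.get(row["customer_id"])
        if existing is None:
            # Insert a new dimension row with no previous value yet.
            dimension[row["customer_id"]] = {
                "city": row["city"],
                "previous_city": None,
                "changed_on": None,
            }
        elif existing["city"] != row["city"]:
            # Update in place: shift current to previous, then overwrite.
            existing["previous_city"] = existing["city"]
            existing["city"] = row["city"]
            existing["changed_on"] = load_date  # optional load date
    return dimension


dim3 = {}
apply_scd3(dim3, [{"customer_id": 1, "city": "Pune"}], date(2024, 1, 1))
apply_scd3(dim3, [{"customer_id": 1, "city": "Delhi"}], date(2024, 6, 1))
apply_scd3(dim3, [{"customer_id": 1, "city": "Mumbai"}], date(2024, 9, 1))
```

After the third load the row holds the current and the immediately previous city only; the first value is gone, which is what makes the history partial.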
Data Warehouse Execution Architecture
[Figure: architecture diagram. Components: Oracle, SQL Server and flat-file sources; Informatica (ETL); NCR Teradata warehouse on UNIX-HP. Connection labels: ODBC, Native, FTP.]
Benefits of Data Warehousing
✓ Allows for a continuous planning process.
✓ Data from different locations can be combined in one location.
Drawbacks of Data Warehousing