Sei sulla pagina 1di 21

DATA WAREHOUSING

GROUP 7:
Arshiya
Pathan
Jisha
Rakesh
Karthik
Ramanuj
Amit Kumar
Data Warehousing
Concepts
What is a Data Warehouse?
 A Data warehouse is a subject-oriented, integrated, time-
varying, non-volatile collection of data in support of the
management's decision –making process
 (OR)
 Data warehouse is a relational database for Query and
Analysis Rather than for transaction processing.

 Subject-oriented(customer,products,sales,etc)
 Non-volatile
 Time-varying
 Integrated

(William.Inmon –1993)
Subject-oriented
A Data Warehouse is organized around
major subjects, such as customer,
supplier, product and sales. Rather than
concentrating on day-to-day operations
and transaction processing of an
organization.
Integrated
A Data Warehouse is usually constructed by
integrating multiple heterogeneous
sources, such as relational databases, flat
files, and on-line transaction records. data
cleaning and data integration techniques
are applied to ensure consistency in
naming conventions, encoding structures,
attribute measures and so on.
Time Variant

Data are stored to provide information from


a historical perspective (example: the past
5 -10 years). Every key structure in the
data warehouse contains, either implicitly
or explicitly an element of time
Non-Volatile
A data warehouse is always a physically
separate store of data transformed from
the application data found in the
operational environment. due to this
separation, a data warehouse does not
require transaction
processing,recovery,and concurrency
control mechanisms. it usually requires
only two operations in data accessing
:initial loading loading of data and
access of data.
Comparison of OLTP systems and
data warehousing systems
OLTP Systems Data warehousing systems

Holds Current data Holds Historical data

Stores detailed data stores detailed,lightly,and highly summarized


data
Data is dynamic Data is largely static

Transaction-driven Analysis driven

Application -oriented Subject -oriented

Supports day-to-day decisions Supports strategic decisions

Serves large number of clerical/operational Serves relatively low number of managerial


users users
Dimension Modeling
Dimension: Dimension is a structure which
consists of levels, and hierarchies is
defined on each level.
Example:

SEX

FEMAL
MALE
E
Dimension Modeling
 Example:

Profession Level 0

Level
Engineer Secretary Teacher 1

Chemical Civil Executive Junior Elementary High School

Level 2
FACTS
Fact: Fact consists of whole data with primary key,
foreign key relation ship with dimensions and
also consists of measures.
There are Three types of facts

1.ADDITIVE FACTS

2.SEMI ADDITIVE FACTS

3.NON ADDITIVE FACTS
Fact less Fact & Conform dim
Fact less fact is a fact it does not
contain
 Measures.
A Dimension which can share more
than one Fact is called Conform
Dimension
Collection of Star Schemas and
Snowflake
 Schemas is called Galaxy.

TYPES OF MAPPINGS

History
Simple pass through
(None)

Slowly growing target
(Full)

Slowly changing dimension
(depends)
Types of SCD’S
Slowly changing dimension—1
Slowly changing dimension—2
 Time stamping
 versioning
 Flaging
Slowly changing dimension—3

Slowly changing
Dimension-1
SCD-1:When you does not want History
use this kind of mapping (Only insert
else
 Update takes place) it inserts the new
row or Update the existing dimensions.
Slowly changing
Dimension-2
SCD -2 (Time stamp): When you want maintain
full history use this kind of mapping. Inserts new
and changed dimensions. Creates an effective
date range to track changes.
SCD -2 (Versioning): Inserts new and changed
dimensions. Creates a version number and
increments the primary key to track changes.
SCD -2 (Flaging): Inserts new and changed
dimensions. Flags the current version and
increments the primary key to track changes.

Slowly changing
Dimension-3

SCD -3 : when you want partial history use
this kind of mapping. It Inserts new
dimensions. Updates changed values in
existing dimensions. Optionally uses the
load date to track changes.

Data Warehouse Execution
Architecture
 Architecture

Source Systems ETL DATA WAREHOUSE

Reporting( DSS) Servers


DB2/400

Oracle

ODBC
NCR
ODBC Tera Data
Informatica (ETL)
ODBC Warehouse
UNIX-HP
SQL Server Native Native

FTP

Flat files
Benefits Of DataWarehousing

Helps Users to access large


amount of information
§Problems can be solved using
information

Has a Subject Area


Orientation
§Data is arranged by subject rather than by
application, and is more intuitive for users to
navigate
ü
üAssist a company in increasing its profits,
cost of computing is reduced.

ü
üAllows for a continuous planning process .

ü
üData from different locations can be combined
in one location.
Drawbacks of data
warehousing

ØTakes long period of time and difficult to maintain


Ø
ØDesign and implementation effort
Ø
ØIssues with compatibility
Ø
ØMay lead to number of security problems
( accessing via the internet)
THANK YOU

Potrebbero piacerti anche