Data Warehouse
Submitted By: RITESH
Registration no.: 123456789
Branch/Roll no.: IT/12345
Guided by: Dr. ABC XYZ

Definition
“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.”
W. H. Inmon

Subject-Oriented
• Data is arranged and optimized to provide answers to questions from diverse functional areas
• Data is organized and summarized by topic
  - Sales / Marketing / Finance / Distribution / etc.

Integrated
• The data warehouse is a centralized, consolidated database that integrates data derived from the entire organization
  - Multiple Sources
  - Diverse Sources
  - Diverse Formats

Time-Variant
• The Data Warehouse represents the flow of data through time
• Can contain projected data from statistical models
• Data is periodically uploaded, then time-dependent data is recomputed

Nonvolatile
• Once data is entered it is NEVER removed
• Represents the company’s entire history
  - Near-term history is continually added to it
  - Always growing
  - Must support terabyte databases and multiprocessors
• Read-only database for data analysis and query processing
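The time-variant and nonvolatile properties can be illustrated with an append-only table: each periodic load adds rows stamped with a load date, and nothing is updated or deleted. This is a minimal sketch, assuming an in-memory SQLite database; the table and column names (`sales_history`, `load_date`, `region`, `revenue`), the helper `load_snapshot`, and the sample figures are hypothetical and not from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_history ("
    " load_date TEXT NOT NULL,"   # when this snapshot was loaded
    " region TEXT NOT NULL,"
    " revenue REAL NOT NULL)"
)

def load_snapshot(load_date, rows):
    """Append one periodic load; existing rows are never updated or deleted."""
    conn.executemany(
        "INSERT INTO sales_history (load_date, region, revenue) VALUES (?, ?, ?)",
        [(load_date, region, revenue) for region, revenue in rows],
    )
    conn.commit()

# Two hypothetical month-end loads.
load_snapshot("2024-01-31", [("North", 100.0), ("South", 80.0)])
load_snapshot("2024-02-29", [("North", 120.0), ("South", 70.0)])

# History is preserved, so the same region can be compared across time.
north_history = conn.execute(
    "SELECT load_date, revenue FROM sales_history"
    " WHERE region = 'North' ORDER BY load_date"
).fetchall()
print(north_history)
```

Because loads only insert, the January figures remain queryable after the February load, which is what lets the warehouse represent data through time.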



Data Warehousing
Data warehousing is the process of constructing and using a data warehouse. Constructing a data warehouse consists of three operations on data:
 1. Data integration
 2. Data cleaning
 3. Data consolidation

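The three construction operations can be sketched as a small pipeline. This is only an illustration under stated assumptions: the two source systems (`crm_rows`, `erp_rows`), their field names, and the reading of "consolidation" as summing per customer are all hypothetical choices, not something the slides specify.

```python
from collections import defaultdict

# Two hypothetical source systems that describe the same customers differently.
crm_rows = [{"customer": " Alice ", "amount": "100.50"},
            {"customer": "Bob", "amount": None}]
erp_rows = [{"cust_name": "alice", "total": 40.0},
            {"cust_name": "Carol", "total": 10.0}]

def integrate(crm, erp):
    """Data integration: map both sources onto one common schema."""
    unified = [{"customer": r["customer"], "amount": r["amount"]} for r in crm]
    unified += [{"customer": r["cust_name"], "amount": r["total"]} for r in erp]
    return unified

def clean(rows):
    """Data cleaning: drop rows with missing amounts and normalize values."""
    cleaned = []
    for r in rows:
        if r["amount"] is None:
            continue
        cleaned.append({"customer": r["customer"].strip().title(),
                        "amount": float(r["amount"])})
    return cleaned

def consolidate(rows):
    """Data consolidation (assumed here): one total per customer."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["customer"]] += r["amount"]
    return dict(totals)

warehouse_totals = consolidate(clean(integrate(crm_rows, erp_rows)))
print(warehouse_totals)
```

In this sketch the row with a missing amount is discarded during cleaning, and the two differently spelled "Alice" entries end up under one key once names are normalized.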
Why Data Warehousing?
• Which are our lowest/highest margin customers?
• Who are my customers and what products are they buying?
• What is the most effective distribution channel?
• What product promotions have the biggest impact on revenue?
• Which customers are most likely to go to the competition?
• What impact will new products/services have on revenue?

DATA WAREHOUSE DEPENDENTS
• Data warehouse dependents are those parts of a data warehouse that are not directly connected with the warehouse's functioning. The commonly used dependents are data marts and metadata.

Data Warehouse Functionality

[Diagram: Relational Databases, ERP Systems, Purchased Data, and Legacy Data feed Extraction and Cleansing; an Optimized Loader fills the Data Warehouse Engine, which serves Analyze and Query; a Metadata Repository accompanies the engine.]

Data Warehouse Architecture

• Top-Down Architecture
• Bottom-Up Architecture
• Hybrid Data Mart Architecture

Data Warehouse Components

• Staging Area
  - A preparatory repository where transaction data can be transformed for use in the data warehouse
• Data Mart
  - Traditional dimensionally modeled set of dimension and fact tables
  - Per Kimball, a data warehouse is the union of a set of data marts
• Operational Data Store (ODS)
  - Modeled to support near real-time reporting needs

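A dimensionally modeled data mart, as described above, pairs a fact table with dimension tables. The following is a minimal sketch using an in-memory SQLite database; every table name, column name, and data value (`dim_product`, `dim_date`, `fact_sales`, and so on) is a hypothetical example rather than a schema taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold descriptive attributes.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- The fact table holds measures plus keys pointing at the dimensions.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    quantity INTEGER,
    revenue REAL
);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
INSERT INTO dim_date VALUES (10, 2024, 1), (11, 2024, 2);
INSERT INTO fact_sales VALUES (1, 10, 3, 30.0), (1, 11, 2, 20.0), (2, 10, 1, 15.0);
""")

# Join the fact table to a dimension and aggregate a measure by an attribute.
revenue_by_category = conn.execute(
    "SELECT p.category, SUM(f.revenue)"
    " FROM fact_sales f JOIN dim_product p ON p.product_id = f.product_id"
    " GROUP BY p.category ORDER BY p.category"
).fetchall()
print(revenue_by_category)
```

The design choice illustrated here is that measures live only in the fact table, while grouping attributes such as category come from the dimensions.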
Very Large Data Bases
WAREHOUSES ARE VERY LARGE DATABASES

Scale:
• Terabytes -- 10^12 bytes
• Petabytes -- 10^15 bytes
• Exabytes -- 10^18 bytes
• Zettabytes -- 10^21 bytes

Examples:
• Wal-Mart -- 24 Terabytes
• Geographic Information Systems
• National Medical Records
• Weather images
• Intelligence Agency Videos

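To make the powers of ten concrete, the units above can be converted to raw byte counts. This is a small sketch; the helper name `to_bytes` and the dictionary `UNIT_EXPONENTS` are hypothetical, and the units are treated as decimal (powers of ten) exactly as the slide writes them.

```python
# Decimal byte units expressed as powers of ten, matching the slide.
UNIT_EXPONENTS = {"terabytes": 12, "petabytes": 15, "exabytes": 18, "zettabytes": 21}

def to_bytes(amount, unit):
    """Convert an amount in the given unit to bytes."""
    return amount * 10 ** UNIT_EXPONENTS[unit]

# The Wal-Mart figure quoted on the slide: 24 terabytes.
walmart_bytes = to_bytes(24, "terabytes")
print(walmart_bytes)
```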
Complexities of Creating a Data Warehouse

• Incomplete errors
  - Missing Fields
  - Records or Fields That, by Design, are not Being Recorded
• Incorrect errors
  - Wrong Calculations, Aggregations
  - Duplicate Records
  - Wrong Information Entered into Source System

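Two of the error types listed above, missing fields and duplicate records, can be detected with simple checks before loading. This is a hedged sketch: the required-field list, the sample records, and the helper names `find_incomplete` and `find_duplicates` are assumptions made for illustration.

```python
REQUIRED_FIELDS = ("order_id", "customer", "amount")  # hypothetical schema

records = [
    {"order_id": 1, "customer": "Alice", "amount": 20.0},
    {"order_id": 2, "customer": None, "amount": 15.0},     # missing value
    {"order_id": 1, "customer": "Alice", "amount": 20.0},  # duplicate record
    {"order_id": 3, "customer": "Bob"},                     # field not recorded
]

def find_incomplete(rows):
    """Return indexes of rows where a required field is absent or None."""
    return [i for i, r in enumerate(rows)
            if any(r.get(f) is None for f in REQUIRED_FIELDS)]

def find_duplicates(rows):
    """Return indexes of rows that repeat an earlier row on all required fields."""
    seen = set()
    dupes = []
    for i, r in enumerate(rows):
        key = tuple(r.get(f) for f in REQUIRED_FIELDS)
        if key in seen:
            dupes.append(i)
        seen.add(key)
    return dupes

incomplete = find_incomplete(records)
duplicates = find_duplicates(records)
print(incomplete, duplicates)
```

Wrong calculations and wrongly entered source values are harder to catch mechanically and usually need reconciliation against the source system, which this sketch does not attempt.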
SUCCESS & FUTURE OF DATA WAREHOUSE
• The Data Warehouse has successfully supported the increased needs of the State over the past eight years.
• The need for growth continues, however, as the desire for more integrated data increases.
• The Data Warehouse has software and tools in place to provide the functionality needed to support new enterprise Data Warehouse projects.
• The future capabilities of the Data Warehouse can be expanded to include other programs and agencies.

Data Warehouse Pitfalls

• You are going to spend much time extracting, cleaning, and loading data
• You are going to find problems with systems feeding the data warehouse
• You will find the need to store/validate data not being captured/validated by any existing system
• Large scale data warehousing can become an exercise in data homogenizing

Data Warehouse Pitfalls…

• The time it takes to load the warehouse will expand to fill the available window... and then some
• You are building a HIGH maintenance system
• You will fail if you concentrate on resource optimization to the neglect of project, data, and customer management issues and an understanding of what adds value to the customer

Best Practices

• Complete requirements and design
• Prototyping is key to business understanding
• Utilizing proper aggregations and detailed data
• Training is an ongoing process
• Build data integrity checks into your system


Useful URLs

• Ralph Kimball’s home page: http://www.rkimball.com
• Larry Greenfield’s Data Warehouse Information Center: http://pwp.starnetinc.com/larryg/
• Data Warehousing Institute: http://www.dw-institute.com/
• OLAP Council: http://www.olapcouncil.com/

Thank You

Top-Down Architecture

Bottom-Up Architecture

HYBRID Data Mart Architecture
