IMRAN KHAN
IBA
A producer wants to know…
Which are our lowest/highest margin customers?
Who are my customers and what products are they buying?
What is the most effective distribution channel?
In the Beginning, life was simple…
1960s:
Data collection, database creation, IMS and network DBMS.
Batch Reports
Hard to find and analyze information
Inflexible and expensive: every new request required reprogramming
1970s:
Relational data model, relational DBMS implementation.
Terminal-Based DSS and EIS
Still inflexible, not integrated with desktop tools
But…
1980s:
RDBMS, advanced data models (extended-relational, OO, deductive,
etc.) and application-oriented DBMS (spatial, scientific, engineering,
etc.).
Desktop data access and analysis tools
Query tools, spreadsheets, GUIs
Easier to use, but they only access the operational DB
Our information needs kept growing. (The Spider web)
Goal: Unified Access to Data
Data Warehouse - Subject Oriented
Subject oriented: organized around the major subject areas
of the corporation that have been defined in the data
model.
E.g. for an insurance company: customer, product,
transaction or activity, policy, claim, account, etc.
Operational DBs and applications may be organized
differently.
E.g. based on type of insurance: auto, life, medical,
fire, ...
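
The contrast between the two organizations can be sketched in a few lines of Python. This is only an illustration under assumed data: the record layout, the policy numbers, and the helper `by_customer` are hypothetical and do not come from the slides. It regroups policy records that are stored per type of insurance (application orientation) around the customer (subject orientation).

```python
from collections import defaultdict

# Hypothetical application-oriented records: one list per type of insurance.
auto_policies = [
    {"customer": "C1", "policy": "A-100"},
    {"customer": "C2", "policy": "A-101"},
]
life_policies = [
    {"customer": "C1", "policy": "L-200"},
]


def by_customer(**applications):
    """Regroup per-application policy records around the customer subject."""
    subject_view = defaultdict(list)
    for app_name, records in applications.items():
        for rec in records:
            subject_view[rec["customer"]].append((app_name, rec["policy"]))
    return dict(subject_view)


# One entry per customer, listing every policy regardless of source application.
customer_view = by_customer(auto=auto_policies, life=life_policies)
```

In the subject-oriented view, a question such as "which policies does customer C1 hold?" is a single lookup instead of a search across every application.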
Data Warehouse - Integrated
Data Warehouse - Non-Volatile
Data Warehouse - Time Variance
The time horizon for the data warehouse is significantly
longer than that of operational systems.
Operational databases contain current-value data. Data
warehouse data is nothing more than a sophisticated
series of snapshots, taken as of some moment in time.
The key structure of operational data may or may not
contain some element of time. The key structure of the
data warehouse always contains some element of time.
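
A minimal sketch of the snapshot idea, assuming a hypothetical account-balance table (the class `SnapshotStore`, its methods, and the sample values are illustrative, not taken from the slides). Every key includes a date, so loading a newer snapshot never overwrites an older one, and a query can ask for the value as of any moment.

```python
from datetime import date


class SnapshotStore:
    """Append-only store whose key always includes a snapshot date."""

    def __init__(self):
        self._rows = {}  # (account_id, as_of) -> balance

    def load(self, account_id, as_of, balance):
        """Add one snapshot; existing snapshots are never updated in place."""
        key = (account_id, as_of)
        if key in self._rows:
            raise ValueError(f"snapshot {key} already loaded")
        self._rows[key] = balance

    def balance_as_of(self, account_id, when):
        """Return the most recent snapshot taken on or before `when`."""
        dates = [d for (a, d) in self._rows if a == account_id and d <= when]
        if not dates:
            return None
        return self._rows[(account_id, max(dates))]


store = SnapshotStore()
store.load("ACC-1", date(2023, 1, 31), 500)
store.load("ACC-1", date(2023, 2, 28), 750)

mid_february = store.balance_as_of("ACC-1", date(2023, 2, 15))
```

An operational system would typically keep only the latest balance; here the January value remains available after the February load.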
Data Warehouse Definitions
• The information in a DW is subject-oriented, non-volatile,
and of an historic nature, and so DWs tend to
contain extremely large datasets
• The purpose of the DW is to provide the tools and
facilities to manage and deliver complete, timely,
accurate, and understandable business information to
authorized individuals for effective business decision
making
• DW implementation needs a company-wide effort that
requires user involvement and commitment at all levels
• A successful DW implementation tracks return on
investment
Data Warehouse Definitions
A DW is a concept not a product
It is the compiling, assembling, and consolidating of
application data common to user communities at a
single logical point
Typical use includes ad hoc queries, “what if”, data
matching, trend analysis and other sophisticated
information functions
Warehouse data is typically extracted from OLTP systems
A DW can be described as a read-only database that
provides users with access to consolidated, historic, or
static data extracted from operational databases, usually
augmented with external data
Comparison Chart of Database Types
Operational System: Data Entry
Data Warehouse: Data Retrieval
Application-Orientation vs. Subject-Orientation
Application-Orientation (Operational Database): Loans, Credit Card, Trust, Savings
Subject-Orientation (Data Warehouse): Customer, Vendor, Product, Activity
To summarize ...
Data Warehouses, Data Marts, and
Operational Data Stores (ODS)
• Data Warehouse – The queryable source of data in the
enterprise. It comprises the union of all of its
constituent data marts.
• Data Mart – A logical subset of the complete data
warehouse. Often viewed as a restriction of the data
warehouse to a single business process or to a group of
related business processes targeted toward a particular
business group.
• Operational Data Store (ODS) – A point of integration for
operational systems that developed independently of each
other. Since an ODS supports day-to-day operations, it
needs to be continually updated.
SOURCE: Ralph Kimball
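
The relationship between warehouse and data marts in the definitions above can be sketched as follows. The rows, the `process` field, and the helper `data_mart` are hypothetical; the sketch only illustrates the idea of a mart as a restriction of the warehouse to one business process, with the warehouse equal to the union of its marts.

```python
# Hypothetical warehouse rows, each tagged with the business process it records.
warehouse = [
    {"process": "sales", "customer": "C1", "amount": 120},
    {"process": "sales", "customer": "C2", "amount": 80},
    {"process": "claims", "customer": "C1", "amount": 300},
]


def data_mart(rows, process):
    """Restrict the warehouse to the rows of a single business process."""
    return [row for row in rows if row["process"] == process]


sales_mart = data_mart(warehouse, "sales")
claims_mart = data_mart(warehouse, "claims")

# With these two processes, the marts together cover the whole warehouse.
union_of_marts = sales_mart + claims_mart
```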
Inmon’s 12 Rules - 1
Inmon’s 12 Rules - 2
No online update
DW SDLC is data-driven
DW contains several levels of data, from raw to summarized
Data sources are traced
Metadata is a critical component
DW contains a chargeback mechanism
DW Architecture (W. H. Inmon)
[Figure: two architecture diagrams; only their labels survive.]
Diagram 1, stage labels: Source Systems (authoritative source, external systems); Extract / Enhance / Transform layer; Enterprise Data Warehouse (single image data view); Customise; DataMarts; Delivery.
Diagram 1, annotations: fully modelled & documented process; business rules; consistency & controls; value add; separates data from application; process once; build data for OLAP; denormalize for specific use; copy mgt; meets specific user requirements; tailored applications where appropriate; industry standard tools; parallel process.
Diagram 2, labels: other sources; Extract, Transform, Load, Refresh; Monitor & Integrator; metadata; Data Warehouse; Data Marts; Serve; OLAP Server; Analysis, Query, Reports, Data mining tools.
Why Separate Data Warehouse?
Performance
special data organization, access methods, and
implementation methods are needed to support
multidimensional views and operations typical of
OLAP
Complex OLAP queries would degrade performance
for operational transactions
Concurrency control and recovery of OLTP mode are
not compatible with OLAP analysis
Why Separate Data Warehouse (II)?
Function
missing data: Decision support requires historical
data which operational DBs do not typically maintain
data consolidation: DS requires consolidation
(aggregation, summarization) of data from
heterogeneous sources: operational DBs, external
sources
data quality: different sources typically use
inconsistent data representations, codes and
formats which have to be reconciled.
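
The consolidation and data-quality points can be illustrated with a short sketch. Everything in it is assumed for illustration: the gender code table, the field names, and the two sample sources are hypothetical. Each source encodes the same attribute differently, so records are first reconciled to one representation and only then aggregated.

```python
from collections import defaultdict

# Hypothetical mapping from source-specific gender codes to one warehouse code.
GENDER_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}


def reconcile(record):
    """Convert one source record to the warehouse's representation."""
    raw = str(record["gender"]).strip().lower()
    return {
        "region": record["region"],
        "gender": GENDER_CODES.get(raw, "UNKNOWN"),
        "sales": float(record["sales"]),
    }


def consolidate(*sources):
    """Aggregate sales per (region, gender) across heterogeneous sources."""
    totals = defaultdict(float)
    for source in sources:
        for record in source:
            clean = reconcile(record)
            totals[(clean["region"], clean["gender"])] += clean["sales"]
    return dict(totals)


# Two sources with inconsistent codes and formats for the same facts.
operational_db = [
    {"region": "North", "gender": "m", "sales": 100},
    {"region": "North", "gender": "f", "sales": 50},
]
external_feed = [
    {"region": "North", "gender": "Male", "sales": "25.5"},
    {"region": "North", "gender": 1, "sales": 10},
]

totals = consolidate(operational_db, external_feed)
```

Without the reconciliation step, "m", "Male", and 1 would be counted as three different groups and the summary would be wrong.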