
Lecture #1

IMRAN KHAN
IBA

A producer wants to know…
• Which are our lowest/highest margin customers?
• Who are my customers and what products are they buying?
• What is the most effective distribution channel?
• What product promotions have the biggest impact on revenue?
• Which customers are most likely to go to the competition?
• What impact will new products/services have on revenue and margins?

In the Beginning, life was simple…

• 1960s:
  - Data collection, database creation, IMS and network DBMSs
  - Batch reports
  - Hard to find and analyze information
  - Inflexible and expensive: every new request requires reprogramming
• 1970s:
  - Relational data model, relational DBMS implementation
  - Terminal-based decision support systems (DSS) and executive information systems (EIS)
  - Still inflexible, not integrated with desktop tools
But…

• 1980s:
  - RDBMSs, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMSs (spatial, scientific, engineering, etc.)
  - Desktop data access and analysis tools: query tools, spreadsheets, GUIs
  - Easier to use, but able to access only the operational databases
Our information needs… kept growing (the “spider web”).

SOURCE: William H. Inmon


1990s onward
• 1990s: data warehousing with integrated OLAP engines and data mining tools, multimedia databases, and Web-based database technology.

Goal: Unified Access to Data

• Collects and combines information
• Provides an integrated view and a uniform user interface
• Supports sharing
What is a Data Warehouse?
A Practitioner’s Viewpoint

• Defined in many different ways:
  - A decision support database that is maintained separately from the organization’s operational database
  - Supports information processing by providing a solid platform of consolidated, historical data for analysis
  - “A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.”
    -- Barry Devlin, IBM Consultant
What is a Data Warehouse?
An Alternative Viewpoint
• “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.”
  -- W. H. Inmon
• Data warehousing:
  - The process of constructing and using data warehouses

Data Warehouse - Subject Oriented
• Subject-oriented: organized around the major subject areas of the corporation that have been defined in the data model.
  - E.g., for an insurance company: customer, product, transaction or activity, policy, claim, account, etc.
• Operational databases and applications may be organized differently.
  - E.g., by type of insurance: auto, life, medical, fire, ...
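A minimal Python sketch of this contrast, assuming hypothetical per-application tables (`auto_policies`, `life_policies`) that share a `customer_id`: the operational side keeps one table per type of insurance, while the warehouse view regroups the same facts around the customer subject area. The helper `build_customer_subject` is illustrative only, not a prescribed design.

```python
from collections import defaultdict

# Hypothetical application-oriented data: one table per type of insurance,
# as an operational system might store it.
auto_policies = [
    {"customer_id": "C1", "policy_no": "A-100", "premium": 500.0},
    {"customer_id": "C2", "policy_no": "A-101", "premium": 650.0},
]
life_policies = [
    {"customer_id": "C1", "policy_no": "L-200", "premium": 1200.0},
]


def build_customer_subject(sources):
    """Regroup per-application records around the 'customer' subject area."""
    customers = defaultdict(lambda: {"policies": [], "total_premium": 0.0})
    for line_of_business, rows in sources.items():
        for row in rows:
            entry = customers[row["customer_id"]]
            entry["policies"].append((line_of_business, row["policy_no"]))
            entry["total_premium"] += row["premium"]
    return dict(customers)


customer_view = build_customer_subject({"auto": auto_policies, "life": life_policies})
print(customer_view["C1"])
# {'policies': [('auto', 'A-100'), ('life', 'L-200')], 'total_premium': 1700.0}
```

A question such as "what does customer C1 hold with us?" becomes a single lookup in the subject-oriented view instead of a search across every application's tables.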

Data Warehouse - Integrated

• Different data sources are inconsistent in encoding, naming conventions, … 
• When data is moved into the warehouse, it is converted into one consistent representation.
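As an illustration of that conversion step, here is a small Python sketch. The three source systems, their key field names (`cust_no`, `CustomerID`, `id`), and their gender codes are hypothetical; the point is that each source's naming and encoding is mapped onto one warehouse convention on the way in.

```python
# Hypothetical source systems that name the key field and encode gender differently.
SOURCES = {
    # source: (name of its customer-key field, source gender code -> warehouse code)
    "billing": ("cust_no", {"m": "M", "f": "F"}),
    "crm": ("CustomerID", {"1": "M", "0": "F"}),
    "legacy": ("id", {"male": "M", "female": "F"}),
}


def integrate(source, record):
    """Convert one source record to the warehouse's single naming and encoding."""
    key_field, gender_codes = SOURCES[source]
    raw_gender = str(record["gender"]).strip().lower()
    return {
        "customer_id": record[key_field],             # one name for the key
        "gender": gender_codes.get(raw_gender, "U"),  # one code set; 'U' = unmappable
        "source_system": source,                      # keep lineage to the source
    }


rows = [
    integrate("billing", {"cust_no": 7, "gender": "F"}),
    integrate("crm", {"CustomerID": 8, "gender": 1}),
    integrate("legacy", {"id": 9, "gender": "Male"}),
]
print([row["gender"] for row in rows])  # ['F', 'M', 'M']
```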

Data Warehouse - Non-Volatile

• Operational data is regularly accessed and manipulated one record at a time, and updates are made to data in the operational environment.
• Warehouse data is loaded and accessed; updates of data do not occur in the data warehouse environment.
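A toy Python model of the load-and-access pattern, assuming a hypothetical in-memory `NonVolatileStore`: rows can be loaded and queried, but there is no update path. A real warehouse enforces this through its load process and access permissions rather than through a class like this one.

```python
class NonVolatileStore:
    """Warehouse-style store: data is loaded in batches and read, never updated in place."""

    def __init__(self):
        self._rows = []

    def load(self, batch):
        # Loading only appends; rows loaded earlier are left untouched.
        self._rows.extend(tuple(row) for row in batch)

    def query(self, predicate=lambda row: True):
        # Read-only access: returns a new list of immutable tuples.
        return [row for row in self._rows if predicate(row)]

    def update(self, *args, **kwargs):
        # Deliberately unsupported: updates do not occur in the warehouse environment.
        raise PermissionError("the warehouse is non-volatile: load and access only")


store = NonVolatileStore()
store.load([("2024-01-31", "C1", 100.0)])  # first load
store.load([("2024-02-29", "C1", 40.0)])   # a later load adds rows and replaces nothing
c1_rows = store.query(lambda row: row[1] == "C1")
```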

Data Warehouse - Time Variance
• The time horizon of the data warehouse is significantly longer than that of operational systems.
• Operational databases contain current-value data. Data warehouse data is nothing more than a sophisticated series of snapshots, each taken as of some moment in time.
• The key structure of operational data may or may not contain some element of time. The key structure of the data warehouse always contains some element of time.
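The snapshot idea can be sketched in a few lines of Python. The balances, customer ids, and the `take_snapshot` helper are hypothetical; note that the operational dictionary is keyed by customer only, while every warehouse key includes the snapshot date.

```python
from datetime import date

# Hypothetical operational table: current values only, keyed by customer id.
operational_balances = {"C1": 100.0, "C2": 250.0}

# Warehouse: a series of snapshots whose key always includes a time element.
warehouse = {}  # (snapshot_date, customer_id) -> balance


def take_snapshot(as_of):
    """Copy the current operational values into the warehouse as of a given date."""
    for customer_id, balance in operational_balances.items():
        warehouse[(as_of, customer_id)] = balance


take_snapshot(date(2024, 1, 31))
operational_balances["C1"] = 40.0  # the operational system overwrites the current value
take_snapshot(date(2024, 2, 29))

# The warehouse still holds C1's full history.
c1_history = sorted((day, bal) for (day, cust), bal in warehouse.items() if cust == "C1")
```

After the second snapshot the operational system only knows that C1's balance is 40.0, while the warehouse can still answer what it was at the end of January.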

Data Warehouse Definitions
• The information in a DW is subject-oriented, non-volatile, and historical in nature, so DWs tend to contain extremely large datasets.
• The purpose of the DW is to provide the tools and facilities to manage and deliver complete, timely, accurate, and understandable business information to authorized individuals for effective business decision making.
• DW implementation needs a company-wide effort that requires user involvement and commitment at all levels.
• A successful DW implementation tracks return on investment.

Data Warehouse Definitions
• A DW is a concept, not a product.
  - It is the compiling, assembling, and consolidating of application data common to user communities at a single logical point.
• Typical uses include ad hoc queries, “what-if” analysis, data matching, trend analysis, and other sophisticated information functions.
• Warehouse data is typically extracted from OLTP systems.
• A DW can be described as a read-only database that provides users with access to consolidated, historical, or static data extracted from operational databases, usually augmented with external data.

Comparison Chart of Database Types

Data warehouse                              Operational system
Subject oriented                            Transaction oriented
Large (hundreds of GB up to several TB)     Small (MB up to several GB)
Historical data                             Current data
De-normalized table structure (few          Normalized table structure (many
  tables, many columns per table)             tables, few columns per table)
Batch updates                               Continuous updates
Usually very complex queries                Simple to complex queries


Design Differences
• Operational system: ER diagram
• Data warehouse: star schema
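To make the design difference concrete, here is a minimal star schema sketched with Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_date`, `dim_product`, `dim_customer`) and the sample rows are hypothetical; what matters is the shape: one central fact table holding measures and foreign keys, surrounded by denormalized dimension tables, queried with joins plus aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive, denormalized attributes.
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, region TEXT);
    -- Fact table at the center: a foreign key to each dimension plus numeric measures.
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        quantity INTEGER,
        revenue REAL
    );
""")
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20240131, 2024, 1), (20240229, 2024, 2)])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware")])
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                 [(1, "Acme", "North"), (2, "Globex", "South")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)", [
    (20240131, 1, 1, 10, 100.0),
    (20240131, 2, 2, 5, 75.0),
    (20240229, 1, 2, 3, 30.0),
])

# A typical analytical query: join the fact table to its dimensions and aggregate.
rows = conn.execute("""
    SELECT d.month, c.region, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_date d ON f.date_key = d.date_key
    JOIN dim_customer c ON f.customer_key = c.customer_key
    GROUP BY d.month, c.region
    ORDER BY d.month, c.region
""").fetchall()
print(rows)  # [(1, 'North', 100.0), (1, 'South', 75.0), (2, 'South', 30.0)]
```

An ER-modelled operational schema would instead spread the same information across many normalized tables tuned for inserting and updating individual records.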


Supporting a Complete Solution

• Operational system: data entry
• Data warehouse: data retrieval

Application-Orientation vs. Subject-Orientation

• Operational database (application-orientation): Loans, Credit Card, Trust, Savings
• Data warehouse (subject-orientation): Customer, Vendor, Product, Activity

To summarize ...

• OLTP systems are used to “run” a business.
• The data warehouse helps to “optimize” the business.

Data Warehouses, Data Marts, and
Operational Data Stores (ODS)
• Data Warehouse – The queryable source of data in the enterprise. It is the union of all of its constituent data marts.
• Data Mart – A logical subset of the complete data warehouse, often viewed as a restriction of the data warehouse to a single business process or to a group of related business processes targeted toward a particular business group.
• Operational Data Store (ODS) – A point of integration for operational systems that were developed independently of each other. Since an ODS supports day-to-day operations, it needs to be continually updated.
SOURCE: Ralph Kimball
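Kimball's warehouse/data-mart relationship can be illustrated with a small Python sketch in which each warehouse row is tagged with the business process it records. The rows, the `process` tag, and the `data_mart` helper are hypothetical; a real data mart would be a separate schema or a set of views, not a filtered list.

```python
# Hypothetical warehouse rows, each tagged with the business process it records.
warehouse_rows = [
    {"process": "sales", "customer": "C1", "amount": 120.0},
    {"process": "sales", "customer": "C2", "amount": 80.0},
    {"process": "returns", "customer": "C1", "amount": -20.0},
    {"process": "inventory", "sku": "P9", "on_hand": 40},
]


def data_mart(rows, processes):
    """Restrict the warehouse to one business process or a group of related ones."""
    wanted = set(processes)
    return [row for row in rows if row["process"] in wanted]


# One mart for a group of related processes (sales and returns), one for inventory.
sales_mart = data_mart(warehouse_rows, ["sales", "returns"])
inventory_mart = data_mart(warehouse_rows, ["inventory"])

# Here the two marts are disjoint and together cover every process,
# so concatenating them reproduces the warehouse rows.
union_of_marts = sales_mart + inventory_mart
```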
Inmon’s 12 Rules - 1

• DW and operational environments are separated
• Integrated DW data
• DW contains historical data
• DW is snapshot data captured at a particular point in time
• DW data is subject-oriented

Inmon’s 12 Rules - 2

• No online update
• DW SDLC is data-driven
• DW contains several levels of data, from raw to summarized
• Data sources are traced
• Metadata is a critical component
• DW contains a chargeback mechanism
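The rule that a DW holds several levels of data, from raw to summarized, can be sketched as successive aggregations over the same detail rows. The rows, their (date, region, amount) layout, and the `summarize` helper below are hypothetical illustrations of the idea.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw detail level: one row per sale (date, region, amount).
raw_sales = [
    (date(2024, 1, 3), "North", 40.0),
    (date(2024, 1, 17), "North", 60.0),
    (date(2024, 1, 20), "South", 25.0),
    (date(2024, 2, 2), "North", 10.0),
]


def summarize(rows, group_key):
    """Sum the amount column over the groups defined by group_key(row)."""
    totals = defaultdict(float)
    for row in rows:
        totals[group_key(row)] += row[2]
    return dict(totals)


# Lightly summarized level: (year, month, region).
monthly_by_region = summarize(raw_sales, lambda r: (r[0].year, r[0].month, r[1]))
# Highly summarized level: (year, month) only.
monthly_totals = summarize(raw_sales, lambda r: (r[0].year, r[0].month))
```

Keeping the raw level alongside the summaries lets analysts drill down when a summarized figure looks surprising.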

DW Architecture (W. H. Inmon)

Load
Authoritative Extract / Enterprise Data Customise DataMarts
Source Enhance / single image Warehouse
Transform data view
Source Systems Build data Meets specific Delivery to
Layer
External systems Separates for OLAP user
data from appropriate requirements
Copy mgt
application datamart Industry
Extract
standard
Transform
Fully modelled Parallel tools
& documented process
Process once
Tailored
Business rules
Denormalize applications
for specific where
Consistency
use appropriate
& controls

Value add

Business Information Directory


Multitiered Architecture

Monitor OLAP
metadata & Server
Integrator
Analysis
other Query
sources Extract Serve Reports
Transform Data Data mining
Load
Refresh
Warehouse

Tools
IMRAN KHAN Data Marts
26
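The extract, transform, load, and refresh stages of this architecture can be sketched as plain Python functions. Everything here (`orders_db`, its field names, and the date-based watermark used to pick up only new rows) is a hypothetical simplification, not the architecture's prescribed implementation.

```python
from datetime import date

# Hypothetical operational source (in practice: databases, files, external feeds).
orders_db = [
    {"order_id": 1, "cust": "c1", "amount": "100.50", "day": "2024-01-30"},
    {"order_id": 2, "cust": "c2", "amount": "75.00", "day": "2024-01-31"},
]


def extract(source, since):
    """Extract only rows dated after the last refresh (incremental extract)."""
    return [r for r in source if date.fromisoformat(r["day"]) > since]


def transform(rows):
    """Clean and convert rows to the warehouse's representation."""
    return [
        {
            "order_id": r["order_id"],
            "customer_id": r["cust"].upper(),
            "amount": float(r["amount"]),
            "order_date": date.fromisoformat(r["day"]),
        }
        for r in rows
    ]


def load(warehouse, rows):
    """Append transformed rows; warehouse rows are never updated in place."""
    warehouse.extend(rows)


def refresh(warehouse, source, last_refresh):
    """One refresh cycle: extract -> transform -> load; returns the new watermark."""
    new_rows = transform(extract(source, last_refresh))
    load(warehouse, new_rows)
    return max((r["order_date"] for r in new_rows), default=last_refresh)


warehouse = []
watermark = refresh(warehouse, orders_db, date(2024, 1, 1))  # initial load
orders_db.append({"order_id": 3, "cust": "c1", "amount": "20", "day": "2024-02-01"})
watermark = refresh(warehouse, orders_db, watermark)          # picks up only order 3
```

A date watermark is the simplest refresh strategy, but it would miss rows that arrive late for a day that was already loaded; real pipelines often rely on change timestamps or change data capture instead.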
Why a Separate Data Warehouse?

• Performance
  - Special data organization, access methods, and implementation methods are needed to support the multidimensional views and operations typical of OLAP.
  - Complex OLAP queries would degrade performance for operational transactions.
  - The concurrency control and recovery mechanisms of OLTP are not compatible with OLAP analysis.

Why a Separate Data Warehouse? (II)

• Function
  - Missing data: decision support requires historical data, which operational DBs do not typically maintain.
  - Data consolidation: decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources: operational DBs and external sources.
  - Data quality: different sources typically use inconsistent data representations, codes, and formats, which have to be reconciled.
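Consolidation and reconciliation can be sketched together in Python. The two sources below are hypothetical: an operational table storing region codes and integer cents, and an external CSV feed using full region names and decimal amounts. Both are reconciled to one representation (region name, `Decimal` amount) before being summed per region; `Decimal` is used here to avoid binary floating-point rounding on money values.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Hypothetical operational source: (region code, amount in integer cents).
operational_rows = [("N", 12050), ("S", 7500), ("N", 2000)]

# Hypothetical external source: CSV text with full region names and decimal amounts.
external_csv = "region,amount\nNorth,10.00\nSouth,5.25\n"

# Reconcile the differing region codes to one representation.
REGION_NAMES = {"N": "North", "S": "South"}


def consolidate(op_rows, csv_text):
    """Reconcile both sources to (region name, Decimal amount) and sum per region."""
    totals = defaultdict(Decimal)
    for code, cents in op_rows:
        totals[REGION_NAMES[code]] += Decimal(cents) / 100
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["region"]] += Decimal(row["amount"])
    return dict(totals)


revenue_by_region = consolidate(operational_rows, external_csv)
print(revenue_by_region)
```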
