
Introduction to Data Warehousing

From DBMS to Decision Support


DBMSs are widely used to maintain transactional data
Attempts to use these data for analysis, exploration, identification of trends etc. have led to Decision Support Systems
Rapid growth since the mid-70s
DBMS vendors have answered this trend by adding new features to existing products
This is rarely enough

DBs for Decision Support


Trend towards data warehousing
Data warehousing: consolidation of data from several databases, each maintained by an individual business unit, together with historical and summary information

Characteristics of TPSs
Characteristic                      OLTP
Typical operation                   Update
Level of analytical requirements    Low
Screens                             Unchanging
Amount of data per transaction      Small
Data level                          Detailed
Age of data                         Current
Orientation                         Records

TPS vs Decision Support

OLTP
Information to support day-to-day service
Data stored at transaction level
Database design: normalized

Complex analysis
Historical information to analyze
Data needs to be integrated
Database design: denormalized, star schema

MIS and Decision Support


[Figure labels: Production platforms, Operational reports, Decision makers, Ad hoc access]

MIS systems provided business data
Reports were developed on request
Reports provided little analysis capability
No personal ad hoc access to data

Analyzing Data from Operational Systems


[Figure labels: ERP, Production platforms, Operational reports]

Data structures are complex
Systems are designed for high performance and throughput
Data is not meaningfully represented
Data is dispersed
TPS systems are unsuitable for intensive queries

Data Extract Processing

[Figure labels: Operational systems, Extracts, Decision makers]

End-user computing is offloaded from the operational environment
Users own the data

Management Issues

[Figure labels: Operational systems, Extracts, Decision makers]

Extract explosion
Duplicated effort
Multiple technologies
Obsolete reports
No metadata

Data Quality Issues


No common time basis
Different calculation algorithms
Different levels of extraction
Different levels of granularity
Different data field names
Different data field meanings
Missing information
No data correction rules
No drill-down capability

From Extract to Warehouse DSS

[Figure labels: Internal and external systems, Data warehouse, Decision makers]

Controlled
Reliable
Quality information
Single source of data

Data Warehousing Architecture


[Figure: external data sources and operational databases are extracted, cleaned, transformed, loaded and refreshed into the data warehouse, which is described by a metadata repository and serves OLAP, data mining and visualisation]
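The extract, clean, transform and load steps named in the architecture can be sketched in a few lines. This is only an illustration under stated assumptions: the two source layouts, the table name warehouse_sales and all data values are made up, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical rows from two operational sources with different layouts.
SALES_SYSTEM = [
    {"cust": " alice ", "amount": "120.50", "date": "2024-01-05"},
    {"cust": "BOB", "amount": "80", "date": "2024-01-06"},
    {"cust": "", "amount": "15", "date": "2024-01-06"},  # missing customer
]
WEB_SHOP = [
    {"customer_name": "Alice", "total_cents": 9900, "day": "2024-01-07"},
]


def extract():
    """Extract: yield raw rows from each source, tagged with their origin."""
    for row in SALES_SYSTEM:
        yield "sales", row
    for row in WEB_SHOP:
        yield "web", row


def clean_and_transform(source, row):
    """Map each source layout onto one common record, or None if rejected."""
    if source == "sales":
        name, amount, day = row["cust"], float(row["amount"]), row["date"]
    else:
        name, amount, day = row["customer_name"], row["total_cents"] / 100, row["day"]
    name = name.strip().title()  # common spelling for customer names
    if not name:
        return None  # simple quality rule: a customer is required
    return (name, amount, day, source)


def load(conn, records):
    """Load: append the cleaned records to the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS warehouse_sales "
        "(customer TEXT, amount REAL, sale_date TEXT, source TEXT)"
    )
    conn.executemany("INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)", records)
    conn.commit()


def run_etl(conn):
    """Run one extract-clean-transform-load pass; return rows loaded."""
    cleaned = (clean_and_transform(source, row) for source, row in extract())
    records = [record for record in cleaned if record is not None]
    load(conn, records)
    return len(records)


conn = sqlite3.connect(":memory:")
loaded = run_etl(conn)
totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM warehouse_sales GROUP BY customer")
)
```

A periodic refresh would amount to calling run_etl again on newly extracted rows. A real pipeline would also record metadata about each load, which this sketch leaves out.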

Business Motivators
Provide superior services and products
Know the business
New products
Invest in customers
Retain customers
Invest in technology
Reinvent to face new challenges

Centralised data warehouse


[Figure labels: Corporate data warehouse, Mainframe, Corporate, Financial, Marketing, Manufacturing, Distribution, Server, Analyst]

Federated data warehouse


[Figure labels: Financial, Corporate data warehouse, Marketing, Manufacturing, Distribution, Analyst, Mainframe]

Tiered data warehouse


Tier 3 (detailed data): corporate data warehouse on the mainframe
Tier 2 (summarized data): local data mart
Tier 1 (highly summarized data): analyst workstation

Data Warehouses Vs Data Marts


Property               Data Warehouse         Data Mart
Scope                  Enterprise             Department
Subjects               Multiple               Single-subject
Data source            Many                   Few
Size (typical)         100 GB to > 1 TB       < 100 GB
Implementation time    Months to years        Months

End-user Access Tools


High performance is achieved by pre-planning the requirements for joins, summations, and periodic reports by end-users.
There are five main groups of access tools:
Data reporting and query tools
Application development tools
Executive information system (EIS) tools
Online analytical processing (OLAP) tools
Data mining tools

Data Usage - $1000 questions


Verification
What is the average sale for in-store and catalog customers?
What is the average high school GPA of students who graduate from college compared to those who do not?

Discovery
What is the best predictor of sales?
What are the best predictors of college graduation?

Need to complement RDBMS technology with a flexible, multidimensional view of data
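A verification question states its hypothesis in advance, so it reduces to a plain aggregate over known columns. The sketch below answers the in-store versus catalog question on made-up records. The names sales and average_sale_by_channel are hypothetical. Discovery questions cannot be answered this way, because the relevant columns are not known beforehand.

```python
from collections import defaultdict
from statistics import mean

# Made-up sales records: (channel, sale amount).
sales = [
    ("in-store", 40.0),
    ("catalog", 25.0),
    ("in-store", 60.0),
    ("catalog", 35.0),
    ("catalog", 30.0),
]


def average_sale_by_channel(rows):
    """Group sale amounts by channel and return the mean of each group."""
    by_channel = defaultdict(list)
    for channel, amount in rows:
        by_channel[channel].append(amount)
    return {channel: mean(amounts) for channel, amounts in by_channel.items()}


averages = average_sale_by_channel(sales)
```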

The Functionality of OLAP


Rotate and drill down
Create and examine calculated data
Determine comparative or relative differences
Perform exception and trend analysis
Perform advanced analytical functions
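Drill down, rotate and calculated data can be illustrated on a tiny fact list. This is a sketch in plain Python, not how an OLAP server is implemented. The facts, the DIMENSIONS mapping and the aggregate helper are all assumptions made for the example.

```python
from collections import defaultdict

# Made-up facts: (year, quarter, product, units sold).
facts = [
    (2023, "Q1", "wine", 10),
    (2023, "Q2", "wine", 14),
    (2023, "Q1", "pasta", 7),
    (2024, "Q1", "wine", 12),
    (2024, "Q1", "pasta", 9),
]
# Position of each dimension inside a fact tuple.
DIMENSIONS = {"year": 0, "quarter": 1, "product": 2}


def aggregate(rows, dims):
    """Sum units grouped by the named dimensions, in the order given."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMENSIONS[d]] for d in dims)
        totals[key] += row[3]
    return dict(totals)


by_year = aggregate(facts, ["year"])                     # summary level
by_year_quarter = aggregate(facts, ["year", "quarter"])  # drill down one level
by_product_year = aggregate(facts, ["product", "year"])  # rotate: product becomes the leading axis

# Calculated data: change in units from 2023 to 2024.
growth = by_year[(2024,)] - by_year[(2023,)]
```

Drilling down adds a dimension to the grouping key, while rotating changes which dimension leads the key. The underlying facts stay the same in both cases.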

The star structure

[Figure labels: Multidimensional database; FINANCE, SALES; Customer, Store, Model, Time, Product]

The data is found at the intersection of dimensions.
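One way to picture "data at the intersection of dimensions" is a relational star schema: a central fact table whose rows point at one member of each dimension table. The sketch below uses SQLite. The table and column names (fact_sales, dim_time, dim_product, dim_store) and all values are hypothetical, and only three of the dimensions from the slide are modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT);
-- Each fact row sits at the intersection of one member of every dimension.
CREATE TABLE fact_sales (
    time_id    INTEGER REFERENCES dim_time(time_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    amount     REAL
);
INSERT INTO dim_time VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO dim_product VALUES (1, 'wine'), (2, 'pasta');
INSERT INTO dim_store VALUES (1, 'Rome'), (2, 'Milan');
INSERT INTO fact_sales VALUES
    (1, 1, 1, 100.0), (1, 2, 1, 40.0), (2, 1, 1, 60.0), (2, 1, 2, 80.0);
""")

# Total sales per product and store: join the fact table to two dimensions.
rows = conn.execute("""
    SELECT p.name, s.city, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_store   AS s ON s.store_id = f.store_id
    GROUP BY p.name, s.city
    ORDER BY p.name, s.city
""").fetchall()
```

The dimension tables are kept small and descriptive, and the fact table holds only keys and measures, which is what gives the schema its star shape.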

Data Mining

Data mining functions


Associations
85 percent of customers who buy a certain brand of wine also buy a certain type of pasta

Sequential patterns
32 percent of female customers who order a red jacket buy a gray skirt within six months

Classifying
Frequent customers are those with incomes about $50,000 and having two or more children

Clustering
Market segmentation

Predicting
Predict the revenue value of a new customer based on that customer's personal demographic variables
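A statement such as the wine and pasta association above is usually expressed as the confidence of a rule: of the baskets containing the first item, the fraction that also contain the second. The sketch below computes support and confidence on made-up baskets. The helper names count, support and confidence are hypothetical, and the data is not meant to reproduce the 85 percent figure.

```python
# Made-up market baskets, one set of items per purchase.
baskets = [
    {"wine", "pasta"},
    {"wine", "pasta", "bread"},
    {"wine", "cheese"},
    {"pasta"},
    {"wine", "pasta"},
]


def count(baskets, items):
    """Number of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket)


def support(baskets, items):
    """Fraction of all baskets that contain every item in `items`."""
    return count(baskets, items) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    both = count(baskets, set(antecedent) | set(consequent))
    return both / count(baskets, antecedent)


wine_and_pasta_support = support(baskets, {"wine", "pasta"})
wine_to_pasta_confidence = confidence(baskets, {"wine"}, {"pasta"})
```

Here three of the four wine baskets also contain pasta, so the rule "wine implies pasta" has a confidence of 75 percent on this toy data.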
