CLASS AGENDA
Data Warehousing
MY BACKGROUND
President, Data Warehouse Consultants LLC, a
Pittsburgh-focused database and data warehouse
consulting company
Started company in 2004 to focus solely on data
warehousing consulting opportunities
Former Deloitte Consulting Manager and one of the
founding members of Deloitte's DW public sector
practice.
15+ years of experience in design, development and
implementation of data warehouse projects
Successful implementation of many different data
warehouses in various businesses
Master of Business Administration from Tepper School
of Business and EE undergraduate degree from Penn
State University
SYLLABUS REVIEW
Textbook
Ralph Kimball and Margy Ross. The
Data Warehouse Toolkit: The
Definitive Guide to Dimensional
Modeling (Third Edition).
ISBN: 1-118-53080-2
BLACKBOARD
Blackboard will be used as the
class website.
Lecture slides, handouts and
announcements will be posted via
the website.
COURSE GOALS
Understand the basic components of a
data warehouse
Design a data warehouse based on user
requirements
Create a prototype data warehouse using
established principles discussed in class
This will essentially be a project course,
and most work will revolve around your
group's project.
COURSE GRADING
Grading Criteria
Quizzes (2): 30%
Project Requirements & Design: 15%
Project Presentation: 15%
Course Project: 30%
Class Participation: 10%
QUIZZES
Two scheduled in-class quizzes
Focused on key principles of data warehousing
discussed in class and in the handouts
Scheduled for Week 3 and Week 6 (subject to change)
If you miss a quiz, you will receive a zero for that score
unless you make alternate arrangements with me IN
ADVANCE. No make-ups or alternate arrangements
will be made after the quiz is given.
Alternate arrangements are not guaranteed, however,
and are made solely at my discretion based on the
individual student's circumstances.
COURSE PROJECT
Once the basics of data warehousing have been
covered, the course project will become the focal point
of the class
Groups will consist of 3-4 members (depending on
class size) and be assigned randomly by me
The project is due the final week of class. I strongly
encourage you to begin the project in week 3 once
your groups have been assigned.
Every group member should know the subject and
goals of the group's project. Failure to be knowledgeable
about your group's activities will negatively affect your
evaluation.
GROUP PRESENTATION
Each group will be required to
present a short synopsis of their
project during the final class.
Presentations will be approximately
10 to 12 minutes in length.
All group members are strongly
encouraged to be present for their
group's presentation.
The final presentations will be
during the last day of class.
COURSE PROJECT
The final project is due the last week of class
Detailed instructions on what is required for
the project will be posted later in the year
A subject-oriented, integrated, time-varying, non-volatile
collection of data that is used primarily in
organizational decision making [Inmon, 1992]
REAL-WORLD DEFINITIONS
Whatever the business says it is
The distinctions really break down:
Data mart
Decision Support System
Data warehouse
Data Analysis Environment
Cube
Access database
Reporting system
Analytic Workspace
Target Audience
Helps to maintain one version of the truth.
OLAP Servers
Relational OLAP (ROLAP): extended relational DBMS that
maps operations on multidimensional data to standard
relational operations.
Multidimensional OLAP (MOLAP): special purpose server
that directly implements multidimensional data and
operations.
Tools or Clients
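As a rough illustration of the ROLAP idea, the sketch below stores a tiny, hypothetical sales cube as an ordinary relational table and answers two multidimensional operations with standard SQL: a "roll-up" from day to month becomes a `GROUP BY`, and a "slice" on one store becomes a `WHERE` clause. It uses Python's built-in `sqlite3` module; the `sales` table, its columns, and the rows are assumptions for illustration, not any particular OLAP server's design.

```python
import sqlite3

def rollup_by_month(conn, store=None):
    """Roll daily sales up to months; optionally slice on a single store."""
    sql = "SELECT substr(sale_date, 1, 7) AS month, SUM(amount) FROM sales "
    params = ()
    if store is not None:
        sql += "WHERE store = ? "  # slice: fix the store dimension to one member
        params = (store,)
    sql += "GROUP BY month ORDER BY month"  # roll-up: day level -> month level
    return conn.execute(sql, params).fetchall()

# Hypothetical cube stored relationally: one row per day/product/store
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, product TEXT, store TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2024-01-05", "P1", "S1", 10.0),
        ("2024-01-20", "P2", "S2", 5.0),
        ("2024-02-03", "P1", "S1", 7.0),
    ],
)
print(rollup_by_month(conn))        # [('2024-01', 15.0), ('2024-02', 7.0)]
print(rollup_by_month(conn, "S2"))  # [('2024-01', 5.0)]
```

A MOLAP server would instead keep these totals in its own multidimensional storage; the point here is only that each cube operation has a relational equivalent.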
Oracle
Microsoft SQL Server
Sybase
IBM DB2
Microsoft Access
ETL Tools
Informatica PowerCenter
Ascential DataStage (IBM)
Hyperion Application Link
Oracle PL/SQL
Microsoft Data Transformation Services
Many Others
Figure (data warehouse architecture): Source System 1 and Source System 2 → Extraction (extracts) → Transformation Logic → Operational Data Store (ODS) or Staging Area → Data Warehouse → User View: Data marts, OLAP Cubes, Standard Reports. Also shown: METADATA.
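The extract, transform, and load flow in the architecture diagram can be sketched in a few lines of Python. Everything here is hypothetical: two source systems that record the same kind of customer data with different field names, codes, and units, a transformation step that standardizes both into one staging layout, and a load step that appends to a warehouse table (a plain list standing in for a database table).

```python
def extract_source1():
    # Hypothetical source system 1: lowercase state codes, dollars as strings
    return [{"cust": "C001", "state": "pa", "amount": "120.50"}]

def extract_source2():
    # Hypothetical source system 2: different field names, amounts in cents
    return [{"customer_id": "C002", "st": "OH", "amount_cents": 9900}]

def transform(rows1, rows2):
    """Map both sources onto one staging layout with consistent types and codes."""
    staged = []
    for r in rows1:
        staged.append({"customer_id": r["cust"], "state": r["state"].upper(),
                       "amount": float(r["amount"]), "source": "S1"})
    for r in rows2:
        staged.append({"customer_id": r["customer_id"], "state": r["st"].upper(),
                       "amount": r["amount_cents"] / 100, "source": "S2"})
    return staged

def load(warehouse, staged):
    """Append staged rows to the (stand-in) warehouse table; return the row count."""
    warehouse.extend(staged)
    return len(staged)

warehouse = []
loaded = load(warehouse, transform(extract_source1(), extract_source2()))
print(loaded, warehouse[1]["amount"])  # 2 99.0
```

Keeping a `source` column on every staged row is one simple way to record lineage, which is the kind of information the metadata layer tracks.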
IBM Cognos 8 BI
Business Objects
Enterprise
Desktop Intelligence /
OLAP Intelligence
Web Intelligence
Web-based
multidimensional
analysis
Microsoft Excel
Spreadsheet integration
MS SQL Server BI
Comments
PowerPlay Excel
IBM Cognos 8 BI
Analysis for Microsoft
Excel
Impromptu
Crystal Reports
ReportNet (Report
Studio + Query Studio) WebIntelligence
Reporting
Cognos WebPortal
Cognos Connection
InfoView
Web Portal
GO! Dashboard
Performance manager
Report Studio
Xcelsius
Business Intelligence
Development Studio
Visual dashboards
Visualizer
Live office
Xcelsius
Dashboard builder
DecisionStream
Data Manager
Framework Manager
Designer
Business Intelligence
Development Studio
Modeling application
Business Intelligence
Development Studio
Scorecarding
BusinessObjects
Enterprise XI
Business activity
monitoring
Planning application
PowerPlay Transformer
Impromptu
Administrator
Metrics Manager
Metrics Studio
NoticeCast
Event Studio
Planning
Planning
Controller
Controller
Performance manager
Dashboard manager
http://www.bi-dw.info/cognos-bo-sqlserver.htm
DIMENSIONAL MODELING
Transactional systems
Designed to allow for quick transactional processing and
efficient storing of data.
To accomplish this, designers typically use some type of
normalization. Most strive for Third Normal Form.
Analytical systems
Designed to extract and query data quickly
Access speed is the main concern
Hence, normalization, which is widely used for transactional
databases, is generally not appropriate for data warehouse
design
Design should reflect a multidimensional view
THE PROBLEM
Transactional models, while efficient for
transaction processing, are not good for
analytics
How do we determine the average grade in biology for CMU in a given semester?
THE SOLUTION
Organize the data so it can be pulled out more
efficiently.
The number of students can be counted by a simple aggregate query based on the fact table.
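To make the problem and solution concrete, here is a minimal star schema for the grade question, built with Python's `sqlite3` module. The table names, columns, rows, and the 4-point grade scale are all assumptions for illustration: one fact table at the grain of one student's grade in one course for one term, plus course and term dimensions. Both the student count and the average biology grade then come from a single aggregate query over the fact table joined to its dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course_dim (course_key INTEGER PRIMARY KEY, subject TEXT, university TEXT);
CREATE TABLE term_dim   (term_key   INTEGER PRIMARY KEY, semester TEXT);
-- Grain: one row per student per course per term (hypothetical)
CREATE TABLE grade_fact (student_key INTEGER, course_key INTEGER,
                         term_key INTEGER, grade_points REAL);
INSERT INTO course_dim VALUES (1, 'Biology', 'CMU'), (2, 'Chemistry', 'CMU');
INSERT INTO term_dim   VALUES (1, 'Fall 2023'), (2, 'Spring 2024');
INSERT INTO grade_fact VALUES
  (101, 1, 1, 4.0), (102, 1, 1, 3.0), (103, 1, 2, 2.0), (101, 2, 1, 3.0);
""")

# Constrain on dimension attributes, aggregate the fact
count, avg_grade = conn.execute("""
SELECT COUNT(DISTINCT f.student_key), AVG(f.grade_points)
FROM grade_fact f
JOIN course_dim c ON f.course_key = c.course_key
JOIN term_dim   t ON f.term_key   = t.term_key
WHERE c.subject = 'Biology' AND c.university = 'CMU' AND t.semester = 'Fall 2023'
""").fetchone()
print(count, avg_grade)  # 2 3.5
```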
DIMENSIONAL MODELING
COMPONENTS
Fact Table
Primary table which stores the performance measurements
of the business
The term fact refers to a business measure
Each row in a fact table corresponds to a specific
measurement
Each measurement is taken at the intersection of all the
relevant dimensions (e.g., day, product, and store); this list
of dimensions defines the grain of the fact table
All measurements in a fact table must be at the same grain
Facts are either additive, semiadditive, or nonadditive;
most are numeric
Contains two or more foreign keys to dimension tables
Expresses the many-to-many relationships between
dimensions in dimensional models
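The difference between additive and semiadditive facts shows up as soon as you aggregate. In the hypothetical daily snapshot below, sales dollars are additive (they can be summed across stores and across days), while units on hand, an inventory level, is semiadditive: summing it across stores for one day is meaningful, but summing it across days double-counts the same stock, so the daily totals are averaged over time instead.

```python
# Hypothetical daily snapshot rows: (day, store, sales_dollars, units_on_hand)
rows = [
    ("Mon", "S1", 100.0, 50),
    ("Mon", "S2", 80.0, 30),
    ("Tue", "S1", 120.0, 40),
    ("Tue", "S2", 60.0, 20),
]

# Additive fact: summing over every dimension is valid
total_sales = sum(r[2] for r in rows)

# Semiadditive fact: summing across stores for a single day is valid...
on_hand_monday = sum(r[3] for r in rows if r[0] == "Mon")

# ...but across days, average the daily totals instead of adding them
days = sorted({r[0] for r in rows})
daily_totals = [sum(r[3] for r in rows if r[0] == d) for d in days]
avg_daily_on_hand = sum(daily_totals) / len(daily_totals)

print(total_sales, on_hand_monday, avg_daily_on_hand)  # 360.0 80 70.0
```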
DIMENSIONAL MODELING
COMPONENTS
Dimension Tables
Contain the textual descriptors of the business
Usually low in cardinality, but very wide (50-100 attributes not uncommon)
Dimension attributes used as query
constraints, groupings, and report labels
The more descriptive the dimension attributes,
the better
Often contain hierarchical relationships
(city=>state=>region)
DIMENSIONAL MODELING
COMPONENTS
Fact Table + Dimension Tables =
Dimensional Model (Star Schema)
Benefits of dimensional model
Simplicity
Easy for business users to understand
Improved query performance
Extensibility
Easily accommodates change (but not that
easily!)
Why?
Date
Product
Store
Product B
Price is $100, Cost is $90
Gross Profit is $10
Gross Margin is 10%
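Gross margin is a nonadditive fact: it cannot be summed, and averaging per-product percentages gives the wrong answer when products differ in volume. A safe approach, which is also what the SELECT query below does, is to sum the additive components and divide. Product B uses the figures above; Product A is a hypothetical second product added only for the comparison.

```python
# Product B comes from the example above; Product A is hypothetical.
products = {
    "A": {"sales": 900.0, "cost": 450.0},  # hypothetical: 50% margin
    "B": {"sales": 100.0, "cost": 90.0},   # $10 gross profit, 10% margin
}

def gross_profit(p):
    return p["sales"] - p["cost"]

# Misleading: a plain average of the per-product margin percentages
naive = sum(gross_profit(p) / p["sales"] for p in products.values()) / len(products)

# Correct: ratio of the summed additive facts
total_profit = sum(gross_profit(p) for p in products.values())
total_sales = sum(p["sales"] for p in products.values())
margin = total_profit / total_sales

print(round(naive, 2), round(margin, 2))  # 0.3 0.46
```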
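-- Gross profit and gross margin by SKU for January
-- (DateDimension, ProductDimension, and the key columns are assumed names)
SELECT P.SKU,
       SUM(F.GrossProfit) AS TotalGrossProfit,
       SUM(F.GrossProfit) / SUM(F.SalesDollars) AS GrossMargin
FROM RetailSalesTransactionFact F
JOIN DateDimension D ON F.DateKey = D.DateKey
JOIN ProductDimension P ON F.ProductKey = P.ProductKey
WHERE D.MonthName = 'January'
GROUP BY P.SKU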
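To check the shape of the January gross-margin query, the sketch below runs a version of it against tiny in-memory SQLite tables via Python's `sqlite3` module. The dimension table names (`DateDimension`, `ProductDimension`), the key columns, and every row are hypothetical; only `RetailSalesTransactionFact`, `SKU`, `GrossProfit`, `SalesDollars`, and `MonthName` come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DateDimension (DateKey INTEGER PRIMARY KEY, MonthName TEXT);
CREATE TABLE ProductDimension (ProductKey INTEGER PRIMARY KEY, SKU TEXT);
CREATE TABLE RetailSalesTransactionFact (
    DateKey INTEGER, ProductKey INTEGER, SalesDollars REAL, GrossProfit REAL);
INSERT INTO DateDimension VALUES (1, 'January'), (2, 'February');
INSERT INTO ProductDimension VALUES (1, 'SKU-A'), (2, 'SKU-B');
INSERT INTO RetailSalesTransactionFact VALUES
  (1, 1, 200.0, 50.0), (1, 2, 100.0, 10.0), (1, 2, 100.0, 10.0),
  (2, 1, 500.0, 100.0);  -- February row, excluded by the month filter
""")

rows = conn.execute("""
SELECT P.SKU,
       SUM(F.GrossProfit) AS TotalGrossProfit,
       SUM(F.GrossProfit) / SUM(F.SalesDollars) AS GrossMargin
FROM RetailSalesTransactionFact F
JOIN DateDimension D ON F.DateKey = D.DateKey
JOIN ProductDimension P ON F.ProductKey = P.ProductKey
WHERE D.MonthName = 'January'
GROUP BY P.SKU
ORDER BY P.SKU
""").fetchall()
print(rows)  # [('SKU-A', 50.0, 0.25), ('SKU-B', 20.0, 0.1)]
```

The `ORDER BY` is added only so the output order is deterministic; the month constraint lives on the Date dimension, not the fact table, which is why the join is needed.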
Date
Full Date Description
Month Number
Month Name
Month Short Name
Day Number in Month
Day of Week
Day Number in Year
Year
Fiscal Quarter
Fiscal Year
Holiday Indicator
First Day of Quarter Indicator
Selling Season
Etc.
It is possible to pre-populate
the Date dimension
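Because the calendar is known in advance, the Date dimension can be generated by a script rather than extracted from a source system. The sketch below builds rows for a subset of the attributes listed above using Python's `datetime` module. Fiscal quarter, fiscal year, holiday indicator, and selling season depend on the organization and are left out; the integer `date_key` in YYYYMMDD form is an assumed convention, and month and day names come from `strftime` in the default C locale.

```python
from datetime import date, timedelta

def build_date_dimension(start, end):
    """Return one dict per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": int(day.strftime("%Y%m%d")),  # assumed key format
            "full_date_description": day.strftime("%B %d, %Y"),
            "month_number": day.month,
            "month_name": day.strftime("%B"),
            "month_short_name": day.strftime("%b"),
            "day_number_in_month": day.day,
            "day_of_week": day.strftime("%A"),
            "day_number_in_year": day.timetuple().tm_yday,
            "year": day.year,
            # Calendar quarters assumed (Jan, Apr, Jul, Oct)
            "first_day_of_quarter": day.day == 1 and day.month in (1, 4, 7, 10),
        })
        day += timedelta(days=1)
    return rows

rows = build_date_dimension(date(2024, 1, 1), date(2024, 12, 31))
print(len(rows), rows[59]["full_date_description"])  # 366 February 29, 2024
```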
Product hierarchy:
SKU=>Brand=>Category=>Department
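Because every level of the product hierarchy is stored as an attribute on the same Product dimension row, rolling a measure up the hierarchy amounts to grouping by a coarser attribute. The SKUs, brands, categories, and sales amounts below are hypothetical.

```python
from collections import defaultdict

# Hypothetical Product dimension rows: SKU => Brand => Category => Department
product_dim = {
    "SKU1": {"brand": "BrandX", "category": "Snacks", "department": "Grocery"},
    "SKU2": {"brand": "BrandX", "category": "Snacks", "department": "Grocery"},
    "SKU3": {"brand": "BrandY", "category": "Soda", "department": "Grocery"},
}
sales_by_sku = {"SKU1": 10.0, "SKU2": 15.0, "SKU3": 20.0}

def roll_up(level):
    """Sum SKU-level sales to the given hierarchy level."""
    totals = defaultdict(float)
    for sku, amount in sales_by_sku.items():
        totals[product_dim[sku][level]] += amount
    return dict(totals)

print(roll_up("brand"))       # {'BrandX': 25.0, 'BrandY': 20.0}
print(roll_up("department"))  # {'Grocery': 45.0}
```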
Store Number
Store Name
Store Street Address
Store City
Store County
Store State
Store Zip Code
Store Manager
Store District
Store Region
Floor Plan Type
Selling Square Footage
First Open Date
A causal dimension: it
describes factors believed to
cause a change in product sales
Example attributes:
Promotion Code
Promotion Name
Price Reduction Type
Promotion Media Type
Ad Type
Display Type
Coupon Type
Ad Media Name
Display Provider
Promotion Cost
Promotion Begin Date
Promotion End Date
SURROGATE KEYS
Operational codes (e.g., SKU number) can still be retained for analysis purposes
The main disadvantage of using surrogate keys is that they require some effort to
implement
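Much of that implementation effort is maintaining a lookup from each operational (natural) key to its warehouse surrogate key while loading. A minimal sketch, assuming an in-memory dictionary as the key map and sequential integers as surrogates; in practice the mapping usually lives in the dimension table itself, which keeps the operational code as an ordinary attribute. The class name and SKU values are hypothetical.

```python
class SurrogateKeyMap:
    """Assign stable integer surrogate keys to natural keys (e.g., SKU numbers)."""

    def __init__(self):
        self._keys = {}
        self._next = 1

    def lookup_or_assign(self, natural_key):
        # Reuse the existing surrogate if this natural key has been seen before
        if natural_key not in self._keys:
            self._keys[natural_key] = self._next
            self._next += 1
        return self._keys[natural_key]

keys = SurrogateKeyMap()
incoming_skus = ["SKU-9001", "SKU-4410", "SKU-9001"]
fact_keys = [keys.lookup_or_assign(sku) for sku in incoming_skus]
print(fact_keys)  # [1, 2, 1]
```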
Date Dimension
3,650 dates (10 years) x 1 KB per row = 3.65 MB
Store Dimension
100 stores x 2 KB per row = 0.2 MB
Promotion Dimension
5,000 promotions x 1 KB per row = 5 MB
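Each estimate above is row count times average row width. The sketch below recomputes them, assuming decimal units (1 MB = 1,000 KB), the convention under which the Store and Promotion figures come out to exactly 0.2 MB and 5 MB.

```python
# (rows, KB per row) taken from the dimension estimates above
dimensions = {
    "Date": (3_650, 1),
    "Store": (100, 2),
    "Promotion": (5_000, 1),
}

KB_PER_MB = 1_000  # decimal units assumed

sizes_mb = {name: rows * kb / KB_PER_MB for name, (rows, kb) in dimensions.items()}
print(sizes_mb)  # {'Date': 3.65, 'Store': 0.2, 'Promotion': 5.0}
```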
COURSE PROJECT
Project groups will be assigned by
me after the late drop period next
week.
Groups will consist of 4-5 members
COURSE PROJECT
It's not too early to begin thinking
about your topic for the course
project.
What you will need:
A business objective (real or
plausible)
A source of data
An interest in the topic (this could be
important!)
READING ASSIGNMENTS
Kimball Chapters 1, 2 and 3
Chaudhuri and Dayal, "An
Overview of Data Warehousing and
OLAP Technology," Sections 1-7
(available on Blackboard)