Presented by
1
Course Agenda
Rationale for dimensional modeling
Dimensional modeling basics
Dimensional modeling details
Fact table details
Dimension table details
Design process
Aggregate schemas
Multiple fact tables
Architected data marts
2
Rationale for Dimensional
Modeling
3
OLTP Design Characteristics
Focus of OLTP Design
Individual data elements
Data relationships
Design goals
Accurately model
business
Remove redundancy
4
OLTP Design Shortcomings
Complex
Unfamiliar to business
people
Incomplete history
Slow query
performance
5
Emergence of Dimensional
Model
Logical modeling technique
For designing relational database structures
Addresses OLTP design shortcomings
For use in analytic systems
First developed early 1980's
Packaged goods industry
Popularized by Ralph Kimball, PhD.
1996 book: 'The Data Warehouse Toolkit'
6
Dimensional Modeling
Basics
7
Process Measurement
Measures
Metrics or indicators by which people evaluate a business process
Referred to as “Facts”
Examples
Inventory Amount
Receivable Dollars
Return Rate
[Figure: Coffee Maker Fulfillment Report with columns Brand, Product, Units Sold, Units Shipped, % Shipped; sample row: Deluxe Coffee Maker, 2,073, 1,658, 80%. The measure columns are labeled "Facts".]
8
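Dimensions
Perspective / Focus
G/L category, account
Product, supplier, warehouse
Descriptive business terms
Examples
Product
Warehouse
Customer
Supplier
[Figure: Coffee Maker Fulfillment Report, columns Units Sold, Units Shipped, % Shipped; rows shown: Thermal Coffee Maker 2,400, 1,632, 68%; Deluxe Coffee Maker 2,073, 1,658, 80%; All Products 9,473, 7,090, 75%. The descriptive columns are labeled "Dimensions".]
10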
Dimensional Model
Definition
Logical data model used to represent the
measures and dimensions that pertain to one or
more business subject areas
Dimensional Model = Star Schema
Serves as basis for the design of a relational
database schema
Can easily translate into multi-dimensional
database design if required
Overcomes OLTP design shortcomings
11
Dimensional Model Advantages
Understandable
Systematically
represents history
Enterprise scalability
12
Schema Simplicity
Fewer tables
Denormalized
Consolidated
Dimensional
Familiar to users
Facts go in the fact tables
Dimensions in dimension tables
Increases understandability
[Figure: Star Schema with a central Facts table joined to Store, Time, and Product dimensions]
13
Data Familiarity
Adding business context
Single source field
ord_date
Expanded into parts
Decoded into business terms
Add special indicators and flags
e.g. time dimension
Increases
[Figure: Time Dimension with attributes year, quarter, month, date, day of the week]
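The expansion of a single source field into descriptive parts can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function name `time_dimension_row`, the `holidays` parameter, and the sample date are assumptions; only the attribute names (year, quarter, month, date, day of the week, holiday flag) come from the slides.

```python
from datetime import date

# English names are spelled out so the output does not depend on locale settings.
MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def time_dimension_row(ord_date, holidays=frozenset()):
    """Expand one source date (e.g. ord_date) into time dimension attributes."""
    return {
        "date": ord_date.isoformat(),
        "year": ord_date.year,
        "quarter": "Q{}".format((ord_date.month - 1) // 3 + 1),
        "month": MONTH_NAMES[ord_date.month - 1],
        "day_of_week": DAY_NAMES[ord_date.weekday()],
        # Special indicator: True when the date is in the supplied holiday set.
        "holiday_flag": ord_date in holidays,
    }


# Hypothetical sample: one order date that is also a holiday.
row = time_dimension_row(date(1996, 7, 4), holidays=frozenset({date(1996, 7, 4)}))
```

Each attribute then becomes a column that business users can filter or group by without decoding the raw date themselves.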
14
Representing History
Time dimension
Part of every star schema
Marks the date when the facts (process measurements) occurred
Allows the schema to easily add and query data over time
Especially useful for performing comparison queries
[Figure: star schema with Facts joined to Store, Time, and Product dimensions; the Time Dimension lists year, quarter, month, date, day of the week, holiday flag]
15
Fewer Join Paths
Star schema joins
Defined during schema
design - not runtime
Business people can
easily understand these
relationships
One-to-many relations
between dimensions and
facts
Referential integrity
always enforced
16
High Performance Design
Fewer joins means
less 'expensive'
queries
Deterministic query
patterns
Star schema query
optimization
supported by all
major RDBMS
vendors
17
Subject Area Models
Subject area E/R models
Manufacturing and Process Control
Shipping and Inventory Management
Sales Order Entry and Campaign Management
Customer Support and Relationship Management
Subject area dimensional models
18
Enterprise Models
Enterprise scope E/R model
Enterprise scope dimensional model
19
Dimensional Design
Details
20
Star Schema Dimension Tables
Dimension tables
Dimension tables usually referred to simply as 'dimensions'
Spend extra effort to add dimensional attributes
[Figure: star schema diagram with boxes labeled Dimension]
21
Dimension Keys
Synthetic keys
Each table assigned a key
[Figure: diagram with boxes labeled Dimension]
22
Dimension Columns
Dimension attributes
Specify the way in which measures are viewed: rolled up, broken out or summarized
Often follow the word “by” as in “Show me
as 'Dimensions'
[Figure: Dimension tables, each listing a Key followed by attribute columns]
23
Star Schema Fact Table
Process measures
Start by assigning one fact table per business subject area
Fact tables store the process measures (aka Facts)
Compared to
[Figure: Fact Table listing fact1, fact2, fact3]
24
Fact Table Primary Key
Every fact table
Multi-part primary key added
Made up of foreign keys referencing dimensions
[Figure: Fact Table listing key, key, key, fact1, fact2, fact3]
25
Fact Table Sparsity
Sparsity
Term used to describe the very common situation
where a fact table does not contain a row for
every combination of every dimension table row
for a given time period
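Sparsity can be made concrete by comparing the rows a fact table actually holds with the combinations that are theoretically possible. The sketch below uses hypothetical products, stores, days, and sales; none of these values come from the slides.

```python
from itertools import product

# Hypothetical dimension members.
products = ["thermal", "deluxe", "basic"]
stores = ["north", "south"]
days = ["mon", "tue"]

# Only combinations where a sale actually happened get a fact row.
fact_rows = {
    ("thermal", "north", "mon"),
    ("deluxe", "north", "mon"),
    ("deluxe", "south", "tue"),
}

# Every combination of every dimension member for the period.
possible_rows = set(product(products, stores, days))

# Share of possible combinations that are actually stored.
density = len(fact_rows) / len(possible_rows)
```

Here only 3 of 12 possible combinations are stored, so the fact table is 25% dense; real fact tables are usually far sparser.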
26
Fact Table Grain
Grain
The level of detail represented by a row in the fact table
Must be identified early
Cause of greatest confusion during design process
Example
Each row in the fact table represents the daily item sales total
[Figure: Fact Table]
27
Designing a Star Schema
Five initial design steps
Based on Kimball's six steps
Start designing in order
Re-visit and adjust over project life
28
Step One
29
Step Two
30
Step Three
3. Identify dimensions
31
Step Four
4. Select facts
32
Step Five
5. Identify dimensional
attributes
33
Fact Table Details
34
Example Fact Table
Sales Facts
model_key
dealer_key
time_key
revenue
quantity
35
Facts
Fully additive
Can be summed across any and all dimensions
Stored in fact table
Examples: revenue, quantity
36
Facts
Semi-additive
Can be summed across most dimensions but not
all
Anything that measures a “level”
Must be careful with ad-hoc reporting
Often aggregated across the “forbidden
dimension” by averaging
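A small sketch of a semi-additive fact, assuming a hypothetical inventory snapshot (units on hand per product per day). Units on hand is a "level": it can be summed across products, but summing it across days is meaningless, so time is treated as the forbidden dimension and averaged instead.

```python
# Hypothetical snapshot rows: (day, product, units_on_hand).
inventory = [
    ("mon", "thermal", 100), ("mon", "deluxe", 50),
    ("tue", "thermal", 80), ("tue", "deluxe", 70),
]

days = sorted({day for day, _, _ in inventory})

# Summing across the product dimension is valid for a level.
daily_totals = {
    day: sum(units for d, _, units in inventory if d == day) for day in days
}

# Across time, average the daily totals instead of summing them.
average_on_hand = sum(daily_totals.values()) / len(daily_totals)

# Summing across time as well overstates the stock that ever existed.
misleading_sum = sum(units for _, _, units in inventory)
```

The misleading sum is twice the stock that was actually on hand on either day, which is the kind of error ad-hoc reports can produce when the level is treated as fully additive.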
37
Facts
Non-Additive
Cannot be summed across any dimension
All ratios are non-additive
Break down to fully additive components, store
them in fact table
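The point about ratios can be illustrated with two rows from the Coffee Maker Fulfillment Report shown earlier in the deck. The sketch stores only the additive components (units sold, units shipped) and derives "% shipped" at query time; the variable and function names are illustrative assumptions.

```python
# Additive components taken from two rows of the fulfillment report.
rows = [
    {"product": "Thermal Coffee Maker", "units_sold": 2400, "units_shipped": 1632},
    {"product": "Deluxe Coffee Maker", "units_sold": 2073, "units_shipped": 1658},
]


def pct_shipped(units_sold, units_shipped):
    """Ratio derived from additive components; never stored or summed itself."""
    return units_shipped / units_sold


total_sold = sum(r["units_sold"] for r in rows)
total_shipped = sum(r["units_shipped"] for r in rows)

# Correct: sum the components first, then compute the ratio.
combined_pct = pct_shipped(total_sold, total_shipped)

# Incorrect: summing the per-row ratios gives a value above 100%.
summed_ratios = sum(pct_shipped(r["units_sold"], r["units_shipped"]) for r in rows)
```

Averaging the per-row ratios is also unreliable, because it ignores how many units each row represents.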
38
Factless Fact Table
A fact table with no measures in it
Nothing to measure...
…Except the convergence of dimensional
attributes
Sometimes store a “1” for convenience
Examples: Attendance, Customer
Assignments, Coverage
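A factless fact table for attendance might look like the sketch below. The key values and column names are hypothetical; the `attended` column holds the constant "1" mentioned above, which lets attendance be counted with an ordinary sum.

```python
from collections import Counter

# Each row records only that a student attended a class on a date.
attendance_facts = [
    {"date_key": 1, "student_key": 10, "class_key": 100, "attended": 1},
    {"date_key": 1, "student_key": 11, "class_key": 100, "attended": 1},
    {"date_key": 2, "student_key": 10, "class_key": 101, "attended": 1},
]

# Summing the constant is the same as counting rows per class.
attendance_by_class = Counter()
for row in attendance_facts:
    attendance_by_class[row["class_key"]] += row["attended"]
```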
39
Dimension Table
Details
40
Example Dimension Tables
Time
time_key
year
quarter
month
date
Model
model_key
brand
category
line
model
Dealer
dealer_key
region
state
city
dealer
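The example fact and dimension tables can be sketched as a small star schema. The version below uses Python's built-in `sqlite3` module purely for illustration. The table names (`time_dim`, `model_dim`, `dealer_dim`, `sales_facts`), the column types, and all sample rows are assumptions; only the column names come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE time_dim   (time_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT, month TEXT, date TEXT);
CREATE TABLE model_dim  (model_key INTEGER PRIMARY KEY, brand TEXT, category TEXT, line TEXT, model TEXT);
CREATE TABLE dealer_dim (dealer_key INTEGER PRIMARY KEY, region TEXT, state TEXT, city TEXT, dealer TEXT);
CREATE TABLE sales_facts (
    model_key  INTEGER REFERENCES model_dim(model_key),
    dealer_key INTEGER REFERENCES dealer_dim(dealer_key),
    time_key   INTEGER REFERENCES time_dim(time_key),
    revenue    REAL,
    quantity   INTEGER,
    PRIMARY KEY (model_key, dealer_key, time_key)  -- multi-part key made of foreign keys
);
""")

# Hypothetical sample rows; dimensions are loaded before facts.
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)", [
    (1, 1996, "Q1", "January", "1996-01-15"),
    (2, 1997, "Q1", "January", "1997-01-15"),
])
conn.executemany("INSERT INTO model_dim VALUES (?, ?, ?, ?, ?)", [
    (1, "Acme", "Sedan", "Family", "A100"),
    (2, "Acme", "Truck", "Work", "T200"),
])
conn.execute("INSERT INTO dealer_dim VALUES (1, 'East', 'NY', 'Albany', 'Albany Motors')")
conn.executemany("INSERT INTO sales_facts VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 20000.0, 1),
    (2, 1, 1, 30000.0, 1),
    (1, 1, 2, 21000.0, 1),
])

# "Show me revenue and quantity by brand, by year."
revenue_by_brand_year = conn.execute("""
    SELECT m.brand, t.year, SUM(f.revenue), SUM(f.quantity)
    FROM sales_facts f
    JOIN model_dim m ON m.model_key = f.model_key
    JOIN time_dim  t ON t.time_key  = f.time_key
    GROUP BY m.brand, t.year
    ORDER BY m.brand, t.year
""").fetchall()
```

Every query follows the same shape: join the fact table to the dimensions it needs, group by dimensional attributes, and sum the additive facts.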
41
Dimension Tables
Characteristics
Hold the dimensional attributes
Usually have a large number of attributes (“wide”)
Add flags and indicators that make it easy to
perform specific types of reports
Have small number of rows in comparison to fact
tables (most of the time)
42
Don’t Normalize Dimensions
Saves very little space
Impacts performance
Can confuse matters when multiple
hierarchies exist
A star schema with normalized dimensions is
called a "snowflake schema"
Usually advocated by software vendors whose
products require snowflake for performance
43
Slowly Changing Dimensions
Dimension source data may change over time
Relative to fact tables, dimension records
change slowly
Allows dimensions to have multiple 'profiles'
over time to maintain history
Each profile is a separate record in a
dimension table
44
Slowly Changing Dimension
Example
Example: A woman gets married
Possible changes to customer dimension
• Last Name
• Marriage Status
• Address
• Household Income
Existing facts need to remain associated with her
single profile
New facts need to be associated with her married
profile
45
Slowly Changing Dimension
Types
Three types of slowly changing dimensions
Type 1
• Updates existing record with modifications
• Does not maintain history
Type 2
• Adds new record
• Does maintain history
• Maintains old record
Type 3:
• Keep old and new values in the existing row
• Requires a design change
46
Designing Loads to Handle SCD
Design and implementation guidelines
Gather SCD requirements when designing data
mapping and loading
SCD needs to be defined and implemented at the
dimensional attribute level
Each column in a dimension table needs to be
identified as a Type 1 or a Type 2 SCD
If one Type 1 column changes, then all Type 1
columns will be updated
If one Type 2 column changes, then a new record
will be inserted into the dimension table
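The Type 1 and Type 2 handling described above can be sketched as a single load step. Everything specific here is an assumption made for illustration: the column classification, the `is_current` flag, the `customer_id` natural key, and the choice to apply Type 1 overwrites to every existing profile of the customer.

```python
from itertools import count

TYPE1_COLUMNS = {"address"}                      # assumption: overwrite, no history
TYPE2_COLUMNS = {"last_name", "marital_status"}  # assumption: keep history


def apply_change(dimension, key_source, customer_id, changes):
    """Apply one source change to a customer dimension held as a list of dicts."""
    profiles = [row for row in dimension if row["customer_id"] == customer_id]
    current = next(row for row in profiles if row["is_current"])

    # Type 1: overwrite the value on every profile of this customer.
    for column in changes.keys() & TYPE1_COLUMNS:
        for row in profiles:
            row[column] = changes[column]

    # Type 2: any changed value retires the current profile and adds a new one.
    type2_changes = {
        column: value for column, value in changes.items()
        if column in TYPE2_COLUMNS and current[column] != value
    }
    if type2_changes:
        current["is_current"] = False
        dimension.append({
            **current,
            **type2_changes,
            "customer_key": next(key_source),  # new synthetic key for the new profile
            "is_current": True,
        })


# Hypothetical data following the marriage example.
customer_dim = [{
    "customer_key": 1, "customer_id": "C-17", "last_name": "Smith",
    "marital_status": "Single", "address": "1 Elm St", "is_current": True,
}]
apply_change(customer_dim, count(2), "C-17",
             {"last_name": "Jones", "marital_status": "Married", "address": "9 Oak Ave"})
```

Existing facts keep pointing at `customer_key` 1 (the single profile), while new facts are loaded against `customer_key` 2 (the married profile).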
47
Designing Loads to Handle SCD
Design and implementation guidelines
For large dimension tables, change data capture
techniques may be used to minimize the data
volume
For smaller dimension tables, compare all OLTP
records with dimension table records
Balance data volume with change data capture
logic complexities
48
Degenerate Dimensions
Dimensions with no other place to go
Stored in the fact table
Are not facts
Common examples include invoice numbers
or order numbers
49
Dimensional Design
Process
Project Context
50
Data Mart Development
Dimensional modeling is a critical part of the
data mart development effort
[Figure: Design Phase, Development Phase, Deployment Phase]
51
Data Mart Development
Design phase
Determine requirements and design schema
Development phase
Iterative build and feedback
Deployment phase
Automate load, document, train users
52
Project Deliverables
Design
Project definition document
Project plan
Schema design
Mapping document
Report design
Development
Populated data mart
Load routines (Sagent “Plans”)
Query and reporting environment
Deployment
Automation
Documentation
Training materials
53
Project Approach
The dimensional model is developed during
the design stage
Scope of the project has already been
determined
[Figure: Design Phase, Development Phase, Deployment Phase]
54
Design Stage Activities
Gather requirements through requirements
workshops
Develop star schema
Conduct design review
[Figure: Design Phase, Development Phase, Deployment Phase]
55
Gather Requirements
Requirements definition
User workshops
Spreadsheets
Sample reports
56
Design Deliverables
Deliverables
The star schema itself
Load mapping document
57
Notation
No recognized standard
ER semantics unnecessary
Clarity is the only characteristic that really
matters
58
Design Naming Standards
Responsibility of data administration
Extended to the data warehouse
Important to start early in the project
Suggested conventions
Fact tables
Dimension tables
Aggregate tables
Keys
59
Data Element Definitions
Clear descriptions
Facts
Calculated formulae
Dimensional attributes
Multiple meanings/synonymous terms
Aliases
60
Data Element Instances
Example of Data
As it will exist in the warehouse
After decoding
Adds to model understanding
Removes ambiguity/uncertainty
61
Data Element Mapping
62
Data Transformation
63
Aggregate Schemas
64
Aggregate Designs
Aggregates
Pre-stored fact summaries
Along one or more dimensions
The most effective tool for improving performance
Examples
Summary of sales by region, by product, by
category
Monthly sales
65
Aggregate Background
Aggregate rationale
Improve end user query performance
Reduce required CPU cycles
Powerful cost saving tool
Restrictions
Additive facts only
Must use dimensional design
66
Aggregate Guidelines
67
Aggregate Types
Level field
Separate fact tables
68
Aggregate Types
Level field
Old technique
Requires “level” attribute in appropriate dimensions
Aggregates and base-level facts stored in same
table
Same number of total fact records as separate
table approach
Drawbacks
Every query must constrain on the level field
Possibility of double counting
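The double-counting risk of the level field technique can be shown with a few rows. The data and field names below are hypothetical; the point is that base rows and their pre-stored summary share one table.

```python
# Base-level facts and a pre-stored monthly aggregate in the same table.
facts = [
    {"level": "day", "period": "1996-01-01", "revenue": 100},
    {"level": "day", "period": "1996-01-02", "revenue": 150},
    {"level": "month", "period": "1996-01", "revenue": 250},  # summary of the two rows above
]

# Forgetting the level constraint counts the same revenue twice.
unconstrained_total = sum(row["revenue"] for row in facts)

# Constraining on the level field gives the true total.
constrained_total = sum(row["revenue"] for row in facts if row["level"] == "day")
```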
69
Aggregate Types
Separate Tables
Separate fact table for every aggregate
Separate dimension table for every aggregate
dimension
Same number of fact records as level field tables
Advantage
Removes possibility of double counting
Schema clarity
Caveat
Requires software with aggregate navigation
capability
70
Aggregate Pitfalls
Sparsity failure
Term used to describe the result of building too
many aggregate fact tables that do not summarize
enough rows.
When sparsity failure occurs, a relatively small
star schema can grow (in terms of disk size)
by a factor of thousands.
Sparsity failure = aggregate explosion
71
Aggregate Design Guidelines
Rule of twenty
To avoid aggregate explosion
Make sure each aggregate record summarizes 20
or more lower-level records
Remember
Total number of possible fact tables in any given
dimensional model = cartesian product of all
levels in all the dimensions
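Both guidelines reduce to simple arithmetic. The sketch below assumes four levels per dimension, matching the Time, Model, and Dealer examples earlier in the deck, and uses hypothetical row counts for the rule of twenty check.

```python
from math import prod

# Levels per dimension, e.g. time: year, quarter, month, date.
levels_per_dimension = {"time": 4, "model": 4, "dealer": 4}

# Cartesian product of all levels in all the dimensions
# (the count includes the base-level fact table itself).
possible_fact_tables = prod(levels_per_dimension.values())


def summarizes_enough(base_rows, aggregate_rows, minimum=20):
    """Rule of twenty: each aggregate row should summarize 20+ lower-level rows."""
    return base_rows / aggregate_rows >= minimum


# Hypothetical row counts for two candidate aggregates.
worth_building = summarizes_enough(base_rows=1_000_000, aggregate_rows=40_000)
too_sparse = summarizes_enough(base_rows=1_000_000, aggregate_rows=200_000)
```

With 64 possible tables for even this small model, building every aggregate is clearly impractical, which is why the ratio check is applied to each candidate.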
72
Hierarchies & Aggregate Design
Hierarchy diagram
Helps visualize options for building aggregates
Adding cardinalities ensures following the rule of 20
Not required to build initial star schema
[Figure: Time hierarchy with cardinalities: Year (1), 5 years; Quarter (4), 20 quarters; Month (12), 60 months; Date (365), 1825 days]
73
Aggregate Navigation
Description
Function provided by software layer: Aggregate
Navigator
Directs user queries to the most favorable
available aggregate
Transparent to the end user
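The core decision an aggregate navigator makes can be sketched in a few lines. This is a simplified illustration, not how any particular product works: the catalog, table names, attribute sets, and row counts are all hypothetical, and "most favorable" is assumed to mean the smallest table that carries every requested attribute.

```python
# Hypothetical catalog of the base fact table and its aggregates.
AGGREGATES = [
    {"table": "sales_facts", "rows": 1_000_000,
     "attributes": {"date", "month", "year", "model", "brand", "dealer", "region"}},
    {"table": "sales_by_month_brand", "rows": 2_000,
     "attributes": {"month", "year", "brand"}},
    {"table": "sales_by_year_region", "rows": 50,
     "attributes": {"year", "region"}},
]


def choose_table(requested_attributes):
    """Return the smallest table that can answer a query on these attributes."""
    candidates = [
        entry for entry in AGGREGATES
        if requested_attributes <= entry["attributes"]
    ]
    return min(candidates, key=lambda entry: entry["rows"])["table"]
```

For example, a request for `{"year", "brand"}` is routed to `sales_by_month_brand`, while a request that needs `date` falls back to the base table `sales_facts`. The user writes the same query in both cases.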
74
Aggregate Framework
Business View
Designer View
75
Aggregate Deployment
Incremental
Based on usage
Transparent to users
Typically warehouse DBA responsibility
76
Aggregate Deployment
78
Multiple Fact Tables
Different business processes usually require
different fact tables
There are also several cases where a single
business process will require multiple fact
tables
Core and custom
Snapshot and transaction
Coverage
Aggregates
79
Different Business Processes
Different business processes usually require
different fact tables
In practice, it may be hard to identify what a
“process” is
Sometimes you can spot different processes
because measures are recorded
With different dimensions
At differing grains
80
Different Dimensions or Grain
81
Different Points in Time
Sometimes, it is not easy to identify the
discrete business processes
All measures may have the same
dimensionality or grain
Different measures are recorded at different
times
Quantity sold is not recorded at the same time as
quantity shipped
82
Different Timing
Building a single fact table would require
recording zero or null for measures that are
not applicable at a point in time
Reports would contain a confusing
combination of zeros, nulls, and absence of
data
83
Identifying Different Processes
84
Design Tools for Multiple Tables
Create a set of matrices
Facts vs dimension
Facts vs dimensional attributes
Mark where facts apply to dimensions
Mark where facts apply to dimensional
attributes
When facts don't apply, assume separate fact
table
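The facts-versus-dimensions matrix can be sketched as a mapping from each fact to the dimensions it applies to. Facts that share the same set of dimensions are candidates for the same fact table. The facts and dimensions listed are hypothetical, and the grouping rule is a simplification that ignores grain and timing.

```python
from collections import defaultdict

# Matrix: which dimensions apply to which fact.
fact_dimension_matrix = {
    "quantity_sold": {"time", "product", "customer"},
    "revenue": {"time", "product", "customer"},
    "quantity_shipped": {"time", "product", "warehouse"},
}

# Group facts by identical dimensionality.
candidate_fact_tables = defaultdict(list)
for fact, dimensions in fact_dimension_matrix.items():
    candidate_fact_tables[frozenset(dimensions)].append(fact)
```

Here the matrix suggests two fact tables: one for the sales facts and one for the shipment fact.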
85
Multiple Fact Table Summary
Different processes need different tables
Identified with
Grain
Dimensionality
Timing
Same process may need multiple fact tables
Heterogeneous attributes
Coverage
Snapshot and transaction
Aggregates
86
Architected Data Marts
87
Data Mart
Meaning of the term 'data mart' has shifted
over the last several years...
88
Data Mart Architecture 1993
[Figure: components shown: Operational Systems, E.T.L. Software, Data Warehouse, E.T.L. Software, Data Marts, Query & Reporting Software, Analysis Users]
89
Data Mart Architecture 1997
[Figure: components shown: Operational Systems, E.T.L Software, Data Warehouse, Data Mart, Query & Reporting Software, Analysis Users]
91
Data Mart
92
“Stovepipe” Data Marts
“Stovepipe” data marts
Dimensions not conformed
[Figure: Sales Facts star schema with Time (Day), Store, and Product dimensions]
93
Conformed Dimensions
Definition
Dimensions are conformed when they are the
same
-or-
When one dimension is a strict rollup of another
94
Conformed Dimensions
Same dimensions must:
95
Conformed Dimensions
Rolled up dimension
When one dimension is a strict rollup of another
Which means
Two conformed dimensions can be combined into
a single logical dimension by creating a union of
the attributes
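One way to read "strict rollup" is that the rolled-up dimension contains exactly the attribute values that result from dropping the finer-grained attributes of the detailed dimension. The check below is a sketch under that interpretation; the function name and the sample day and month dimensions are assumptions.

```python
def is_strict_rollup(rolled_up, detailed):
    """True when rolled_up equals detailed with its finer attributes dropped."""
    if not rolled_up:
        return True
    shared = list(rolled_up[0].keys())
    # The detailed dimension must carry every attribute of the rollup.
    if not all(attr in row for row in detailed for attr in shared):
        return False
    derived = {tuple(row[attr] for attr in shared) for row in detailed}
    actual = {tuple(row[attr] for attr in shared) for row in rolled_up}
    return actual == derived


# Hypothetical day-level and month-level time dimensions.
day_dim = [
    {"date": "1996-01-01", "month": "1996-01", "year": 1996},
    {"date": "1996-01-02", "month": "1996-01", "year": 1996},
    {"date": "1996-02-01", "month": "1996-02", "year": 1996},
]
month_dim = [
    {"month": "1996-01", "year": 1996},
    {"month": "1996-02", "year": 1996},
]
conflicting_month_dim = [{"month": "1996-01", "year": 1997}]
```

`month_dim` passes the check against `day_dim`, while `conflicting_month_dim` fails because its year disagrees with the detailed rows.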
96
Conformed Dimensions
Description
Shared common dimensions
Integrates logical design
Ensures consistency between data marts
Allows incremental development
Independent of physical location
Some re-work may be required
97
Conformed Dimensions
Advantages
Enables an incremental development approach
Easier and cheaper to maintain
Drastically reduces extraction and loading
complexity
Answers business questions that cross data marts
Supports both centralized and distributed
architectures
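Answering a question that crosses data marts amounts to querying each fact table separately and lining up the results on a conformed dimension. The sketch below assumes both marts have already been summarized by the same product dimension; the figures are the units sold and units shipped from the Coffee Maker Fulfillment Report, and the function name is illustrative.

```python
# Results of two separate queries, each grouped by the conformed product dimension.
sales_by_product = {"Thermal Coffee Maker": 2400, "Deluxe Coffee Maker": 2073}
shipments_by_product = {"Thermal Coffee Maker": 1632, "Deluxe Coffee Maker": 1658}


def drill_across(left, right):
    """Merge two result sets on their shared dimension values."""
    keys = sorted(left.keys() | right.keys())
    return [(key, left.get(key), right.get(key)) for key in keys]


report = drill_across(sales_by_product, shipments_by_product)
```

The merge only works because both marts use identical product names; with non-conformed dimensions the keys would not match and rows would fail to line up.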
98
Interlocking Star Schemas
[Figure: Sales Facts, Shipment Facts, and Inventory Facts sharing dimensions; labels shown: Time Dimension, Store Dimension, Product Dimension, Warehouse Dimension, Month Dimension]
99
Conformed Dimensions
Kimball’s Data Warehouse Bus
Sales Facts, Shipment Facts, Inventory Facts