Sudarshan
Copyright © 1995-1996 Archer Decision Sciences, Inc.
The “Classic” Star Schema
A single fact table, with detail and summary data.
Fact table primary key has only one key column per dimension.
Each key is generated.
Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr., Level
Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
Time Dimension: PERIOD KEY, Period Desc, Year, Quarter, Month, Day, Current Flag, Resolution, Sequence
Product Dimension: PRODUCT KEY, Product Desc., Brand
Benefits: Easy to understand, easy to define hierarchies, reduces # of physical joins, low maintenance, very simple metadata
Drawbacks: Summary data in the fact table yields poorer performance for summary levels, huge dimension tables a problem
The “Classic” Star Schema
The biggest drawback: dimension tables must carry a “level” indicator for every record, and every query must use it. In the example below, without the level constraint, keys for all stores in the NORTH region, including aggregates for region and district, will be pulled from the fact table, resulting in error.
Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr., Level
Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
Time Dimension: PERIOD KEY, Period Desc, Year, Quarter, Month, Day, Current Flag, Resolution, Sequence
Product Dimension: PRODUCT KEY, Product Desc., Brand, Color, Size, Manufacturer, Level
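The effect of omitting the level constraint can be reproduced with a small sketch. The table names, column names, and sample rows here are illustrative assumptions, not taken from the slides; the point is only that one fact table holds both store-level detail and a region-level aggregate, so an unconstrained query counts the same dollars twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim (
    store_key   INTEGER PRIMARY KEY,
    store_desc  TEXT,
    region_desc TEXT,
    level       TEXT      -- 'STORE', 'DISTRICT' or 'REGION' (assumed values)
);
CREATE TABLE fact (store_key INTEGER, dollars REAL);

-- Two detail stores plus one region-level aggregate row, all in NORTH.
INSERT INTO store_dim VALUES
    (1, 'Store A',          'NORTH', 'STORE'),
    (2, 'Store B',          'NORTH', 'STORE'),
    (3, 'All NORTH stores', 'NORTH', 'REGION');
INSERT INTO fact VALUES (1, 100.0), (2, 50.0), (3, 150.0);
""")

def north_dollars(level=None):
    """Sum dollars for the NORTH region, optionally constrained by level."""
    sql = """SELECT SUM(f.dollars)
             FROM fact f JOIN store_dim s ON s.store_key = f.store_key
             WHERE s.region_desc = 'NORTH'"""
    params = ()
    if level is not None:
        sql += " AND s.level = ?"
        params = (level,)
    return conn.execute(sql, params).fetchone()[0]

without_level = north_dollars()        # detail and aggregate rows are mixed
with_level = north_dollars("STORE")    # detail rows only
print(without_level, with_level)       # 300.0 150.0
```

Without the constraint the region aggregate is added on top of the store detail, doubling the total; constraining on either 'STORE' or 'REGION' returns the correct 150.0.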
The “Fact Constellation” Schema
Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr.
Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
Time Dimension: PERIOD KEY, Period Desc, Year, Quarter, Month, Day, Current Flag, Sequence
Product Dimension: PRODUCT KEY, Product Desc., Brand, Color, Size, Manufacturer
District Fact Table: District_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
Region Fact Table: Region_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
The “Fact Constellation” Schema
In the Fact Constellation, aggregate tables are created separately from the detail; therefore it is impossible to pick up, for example, Store detail when querying the District Fact Table.
Major Advantage: No need for the “Level” indicator in the dimension tables, since no aggregated data is stored with lower-level detail.
Disadvantage: Dimension tables are still very large in some cases, which can slow performance; the front-end must be able to detect the existence of aggregate facts, which requires more extensive metadata.
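The metadata requirement mentioned above can be illustrated with a minimal sketch of how a front-end might choose a fact table. The function name, the metadata dictionary, and the table names are hypothetical; real query tools handle this in their own ways.

```python
# Hierarchy levels of the store dimension, finest grain first.
HIERARCHY = ["store", "district", "region"]

def pick_fact_table(requested_level, available):
    """Return the coarsest available fact table that can answer the query.

    `available` maps a hierarchy level to the (hypothetical) name of the
    fact table stored at that grain.
    """
    needed = HIERARCHY.index(requested_level)  # ValueError on unknown level
    # Walk from the requested grain down towards the finest grain.
    for level in reversed(HIERARCHY[: needed + 1]):
        if level in available:
            return available[level]
    raise LookupError(f"no fact table can answer a {requested_level!r} query")

metadata = {
    "store": "store_fact",
    "district": "district_fact",
    "region": "region_fact",
}
print(pick_fact_table("region", metadata))
```

If no aggregate exists at the requested level, the sketch falls back to a finer-grained table; the query would then have to roll the detail up itself.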
Another Alternative to “Level”
The “Snowflake” Schema
No LEVEL in dimension tables.
Dimension tables are normalized by decomposing at the attribute level.
Each dimension table has one key for each level of the dimension’s hierarchy.
The lowest level key joins the dimension table to both the fact table and the lower level attribute table.
Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr.
District attribute table: District_ID, District Desc., Region_ID
Region attribute table: Region_ID, Region Desc., Regional Mgr.
Store Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
District Fact Table: District_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
Region Fact Table: Region_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
How does it work? The best way is for the query to be built by understanding which summary levels exist, finding the proper snowflaked attribute tables, constraining there for keys, and then selecting from the fact table.
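The two-step pattern described above (constrain the attribute table for keys, then select from the matching fact table) might look like the following sketch. The lower-case table and column names and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region (
    region_id INTEGER PRIMARY KEY, region_desc TEXT, regional_mgr TEXT
);
CREATE TABLE region_fact (
    region_id INTEGER, product_key INTEGER, period_key INTEGER,
    dollars REAL, units INTEGER, price REAL
);
INSERT INTO region VALUES (1, 'NORTH', 'Mgr N'), (2, 'SOUTH', 'Mgr S');
INSERT INTO region_fact VALUES
    (1, 10, 202401, 500.0, 50, 10.0),
    (1, 11, 202401, 300.0, 20, 15.0),
    (2, 10, 202401, 400.0, 40, 10.0);
""")

# Step 1: constrain the snowflaked attribute table to obtain keys.
keys = [row[0] for row in conn.execute(
    "SELECT region_id FROM region WHERE region_desc = ?", ("NORTH",))]

# Step 2: select from the fact table at the same summary level.
placeholders = ",".join("?" * len(keys))
total = conn.execute(
    f"SELECT SUM(dollars) FROM region_fact WHERE region_id IN ({placeholders})",
    keys,
).fetchone()[0]
print(keys, total)  # [1] 800.0
```

Because the region aggregate lives in its own fact table, no level indicator is needed to keep detail and summary rows apart.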
The “Snowflake” Schema
Additional features: The original Store Dimension table, completely de-normalized, is kept intact, since certain queries can benefit from its all-encompassing content.
[Figure fragment: Fact Table (CityName, Date, Quantity, Total Price); Salesperson (SalespersonID, SalespersonName, City, Quota); City (CityName, State, Country)]
Star Schema
• A single fact table and a single table for each
dimension
• Every fact points to one tuple in each of the
dimensions and has additional attributes
• Does not capture hierarchies directly
• Straightforward means of capturing a multiple
dimension data model using relations
Example of a Snowflake Schema
Fact Table: OrderNO, SalespersonID, CustomerNO, ProdNo, DateKey, CityName, Quantity, Total Price
Order: Order No, Order Date
Customer: Customer No, Customer Name, Customer Address, City
Salesperson: SalespersonID, SalespersonName, City, Quota
Product: ProductNO, ProdName, ProdDescr, Category, UnitPrice
Category: CategoryName, CategoryDescr
Date: DateKey, Date, Month
Month: Month, Year
Year: Year
City: CityName, State, Country
State: StateName, Country
Snowflake Schema
• Represent dimensional hierarchy directly
by normalizing the dimension tables
• Easy to maintain
• Saves storage, but may reduce
effectiveness of browsing (Kimball)
Fact Constellation
Sales Fact Table: Store Key, Product Key, Period Key, Units, Price
Shipping Fact Table: Shipper Key, Store Key, Product Key, Period Key, Units, Price
Product Dimension: Product Key, Product Desc
Store Dimension: Store Key, Store Name, City, State, Region
Fact Constellation
• Multiple fact tables share dimension tables.
• The schema can be viewed as a collection of stars, hence the names galaxy schema and fact constellation.
• Sophisticated applications may require such a schema.
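A minimal sketch of querying two fact tables through a shared dimension follows. The sales and shipping column lists follow the Fact Constellation diagram in these slides; the lower-case names and sample rows are assumptions. Each fact table is aggregated separately before the results are combined, so that rows from one fact table do not multiply rows from the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_desc TEXT);
CREATE TABLE sales_fact (
    store_key INTEGER, product_key INTEGER, period_key INTEGER,
    units INTEGER, price REAL
);
CREATE TABLE shipping_fact (
    shipper_key INTEGER, store_key INTEGER, product_key INTEGER,
    period_key INTEGER, units INTEGER, price REAL
);
INSERT INTO product_dim VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO sales_fact VALUES (1, 1, 1, 5, 2.0), (2, 1, 1, 3, 2.0), (1, 2, 1, 4, 9.0);
INSERT INTO shipping_fact VALUES
    (7, 1, 1, 1, 10, 0.5), (7, 1, 2, 1, 6, 0.5), (8, 2, 2, 1, 2, 0.5);
""")

# Aggregate each fact table on the shared product key, then join the results.
rows = conn.execute("""
SELECT p.product_desc, s.units_sold, h.units_shipped
FROM product_dim p
JOIN (SELECT product_key, SUM(units) AS units_sold
      FROM sales_fact GROUP BY product_key) s
  ON s.product_key = p.product_key
JOIN (SELECT product_key, SUM(units) AS units_shipped
      FROM shipping_fact GROUP BY product_key) h
  ON h.product_key = p.product_key
ORDER BY p.product_key
""").fetchall()
print(rows)  # [('Widget', 8, 10), ('Gadget', 4, 8)]
```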
Data Warehouse vs. Data Marts
• Enterprise warehouse: collects all
information about subjects (customers,
products, sales, assets, personnel) that span
the entire organization.
– Requires extensive business modeling
– May take years to design and build
• Data Marts: departmental subsets that focus
on selected subjects, e.g., a marketing data mart
covering customers, products, and sales.
– Faster roll out, but complex integration in
the long run.
Extraction, Transformation, & Load (ETL)
ETL is a set of tools and techniques used to populate a data warehouse.
• Extraction
– Extract data from sources (e.g., operational DBMSs, file systems, Web pages)
• Transformation
– Clean data
– Convert from legacy/host format to warehouse format (e.g., convert “surname” to “last name”)
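The transformation step can be sketched as a small function that renames legacy fields and cleans string values. The field map and the cleaning rule (trimming and collapsing whitespace) are illustrative assumptions; only the “surname” to “last name” conversion comes from the slide.

```python
# Mapping from legacy field names to warehouse field names.
# Only "surname" comes from the slide; "forename" is a hypothetical extra.
FIELD_MAP = {"surname": "last_name", "forename": "first_name"}

def transform(record):
    """Rename legacy fields and normalize whitespace in string values."""
    out = {}
    for field, value in record.items():
        target = FIELD_MAP.get(field, field)   # unmapped fields keep their name
        if isinstance(value, str):
            value = " ".join(value.split())    # trim and collapse whitespace
        out[target] = value
    return out

cleaned = transform({"surname": "  Smith ", "city": "New   Delhi", "age": 41})
print(cleaned)  # {'last_name': 'Smith', 'city': 'New Delhi', 'age': 41}
```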
Extraction, Transformation, & Load (ETL)
• Load
– Sort, summarize, consolidate, compute views, check integrity, build indexes, partition
– Huge volumes of data to be loaded, yet only a small time window (usually at night) when the warehouse can be taken off-line
– Techniques: batch, sequential load is often too slow; incremental, parallel loading techniques may be used
• Refresh
– Propagate updates from sources to the warehouse
– When to refresh: on every update, periodically (e.g., every 24 hours), or after “significant” events
– How to refresh: full extract from base tables vs. incremental techniques
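One possible incremental technique is a timestamp watermark: only source rows changed since the previous refresh are propagated. The sketch below assumes each source row carries an `id` and an `updated_at` value (both hypothetical names) and models the warehouse as a plain dictionary.

```python
def incremental_refresh(source_rows, warehouse, last_refresh):
    """Copy rows changed after `last_refresh`; return the new watermark."""
    newest = last_refresh
    for row in source_rows:
        if row["updated_at"] > last_refresh:
            warehouse[row["id"]] = row              # insert or overwrite
            newest = max(newest, row["updated_at"])
    return newest

source = [
    {"id": 1, "updated_at": 5, "value": "a"},
    {"id": 2, "updated_at": 12, "value": "b"},
]
warehouse = {1: {"id": 1, "updated_at": 5, "value": "a"}}
watermark = incremental_refresh(source, warehouse, last_refresh=10)
print(watermark, sorted(warehouse))  # 12 [1, 2]
```

This sketch does not detect rows deleted at the source; a full extract, or a change log kept by the source system, would be needed for that.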
OLTP vs. OLAP
                     OLTP                           OLAP
users                clerk, IT professional         knowledge worker
function             day to day operations          decision support
DB design            application-oriented           subject-oriented
data                 current, up-to-date,           historical, summarized,
                     detailed, flat relational,     multidimensional,
                     isolated                       integrated, consolidated
usage                repetitive                     ad-hoc
access               read/write,                    lots of scans
                     index/hash on prim. key
unit of work         short, simple transaction      complex query
# records accessed   tens                           millions
# users              thousands                      hundreds
DB size              100MB-GB                       100GB-TB
metric               transaction throughput         query throughput, response
CSP002N-week2 11
Conceptual Modeling of Data Warehouses
Example of Star Schema
Sales Fact Table: time_key, item_key, branch_key, location_key, units_sold, dollars_sold, avg_sales (measures: units_sold, dollars_sold, avg_sales)
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_state, country
Example of Snowflake Schema
Sales Fact Table: time_key, item_key, branch_key, location_key, units_sold, dollars_sold, avg_sales (measures: units_sold, dollars_sold, avg_sales)
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_key
supplier: supplier_key, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city_key
city: city_key, city, province_or_state, country
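In the snowflake example, country is no longer stored in the location table, so a query by country needs one extra join through the city table. The sketch below builds a subset of the example's tables (only the columns the query touches) with made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Subset of the snowflake example: only columns used by the query.
CREATE TABLE city (city_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE location (
    location_key INTEGER PRIMARY KEY, street TEXT,
    city_key INTEGER REFERENCES city(city_key)
);
CREATE TABLE sales_fact (
    time_key INTEGER, item_key INTEGER, branch_key INTEGER,
    location_key INTEGER, units_sold INTEGER, dollars_sold REAL
);
INSERT INTO city VALUES (1, 'Vancouver', 'Canada'), (2, 'Chicago', 'USA');
INSERT INTO location VALUES (10, '1 Main St', 1), (11, '2 Oak Ave', 1), (12, '3 Lake Dr', 2);
INSERT INTO sales_fact VALUES
    (1, 1, 1, 10, 2, 20.0), (1, 1, 1, 11, 1, 10.0), (1, 1, 1, 12, 3, 45.0);
""")

# fact -> location -> city: the second join is the cost of normalization.
by_country = conn.execute("""
SELECT c.country, SUM(f.dollars_sold)
FROM sales_fact f
JOIN location l ON l.location_key = f.location_key
JOIN city c     ON c.city_key = l.city_key
GROUP BY c.country
ORDER BY c.country
""").fetchall()
print(by_country)  # [('Canada', 30.0), ('USA', 45.0)]
```

In the star version of the same example, country sits directly in the location table and the query needs only one join.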
Example of Fact Constellation
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
Sales Fact Table: time_key, item_key, branch_key
Shipping Fact Table: time_key, item_key, shipper_key, from_location