1. Methodological Framework
Conceptual Design & Logical Design
Top-Down Versus Bottom-Up Approach
Bernard ESPINASSE
Professor, Aix-Marseille Université (AMU)
École Polytechnique Universitaire de Marseille
Fact schema
Dimension hierarchies
Additive, semi-additive and non-additive attributes
November 5, 2013
Methodological Framework
Conceptual Modelling: the Dimensional Fact Model (DFM)
Conceptual Design: from Relational schema to DFM
Books
M. Golfarelli, S. Rizzi. Data Warehouse Design: Modern Principles and Methodologies. McGraw-Hill, 2009.
R. Kimball, M. Ross. Entrepôts de données : guide pratique de modélisation dimensionnelle, 2nd edition. Vuibert, 2003.
S. Rizzi. Conceptual modeling solutions for the data warehouse. In J. Wang (Ed.), Data Warehousing and Mining: Concepts, Methodologies, Tools, and Applications, Information Science Reference, pp. 208-227, 2008.
M. Golfarelli, D. Maio, S. Rizzi. Conceptual Design of Data Warehouses from E/R Schemes. Proceedings of the 31st Hawaii International Conference on System Sciences (HICSS-31), vol. VII, Kona, Hawaii, pp. 334-343, 1998.
Courses
Course of M. Golfarelli and S. Rizzi, University of Bologna
Courses of M. Böhlen and J. Gamper, Free University of Bolzano
A better approach:
1) first, design a conceptual model (Conceptual Design);
2) then, translate it into a logical model (Logical Design).
Figure: Top-Down, Bottom-Up and Mixed approaches (existing databases and systems (OLTP) DB1 to DB4, the DW, the data marts DM1 to DM3 and their applications).
Top-Down Approach: analyze global business needs, plan how to develop a DW, design it, and implement it as a whole with its DMs.
1. Design of the DW
2. Design of the DMs
Bottom-Up Approach:
1. Design of the DMs
2. Integration of the DMs into the DW
3. Possibly no physical DW
Mixed Approach:
1. Design of the DW for DM1
2. Design of DM2 and integration with the DW
3. Design of DM3 and integration with the DW
4. ...
Top-Down Approach:
(+) Strengths:
- Promising: it is based on a global picture of the goal to achieve and, in principle, it ensures a consistent, well-integrated DW.
(-) Weaknesses:
- High cost estimates and long implementation times discourage company managers from embarking on this kind of project.
- Analyzing and integrating all relevant sources at the same time is a very difficult task, as it requires them all to be available and stable at the same time.
- It is extremely difficult to forecast the specific needs of every department involved in the project, needs that each lead to specific DMs.
- As no working DW system will be delivered in the short term, users cannot check whether the project is useful, so they lose trust and interest in it.
Bernard ESPINASSE - Data Warehouse Conceptual modeling and Design
Figure: Phase 1: goal setting and planning; Phase 2: infrastructure design.
Data mart design phases
Top-Down and Bottom-Up strategies should be mixed:
Figure: data mart design phases (source analysis and integration, requirement analysis, conceptual design, workload and data volume, logical design, ETL design, physical design), involving the db administrator, the designer and the business user.
Figure: conceptual scheme (E/R scheme), logical scheme (relational scheme) and physical scheme, illustrated with a sample table (store key, store, city).
Fact schema
Dimension hierarchies
Fact schema and fact instances
Additive attributes
Semi-additive and non-additive attributes
Overlapping compatible fact schemata
Representing query patterns on a fact schema
Figure: conceptual design (inputs: facts, preliminary workload), logical design (inputs: workload, target logical model) and physical design (inputs: workload, target DBMS).
The Dimensional Fact Model (DFM) was proposed by M. Golfarelli and S. Rizzi to support the conceptual design of DWs.
Conceptual design is based on the documentation of the underlying operational information system (IS):
- relational schemata, or
- E/R schemata.
Steps:
1. Find facts.
2. For each fact:
a) navigate functional dependencies;
b) drop useless attributes.
Figure: the SALE fact schema, with the fact SALE, its measures (quantity, receipts, unitPrice, numberOfCustomer) and its dimensions (product, date, store).
In dimension hierarchies, nodes represented by circles are dimension attributes, which may assume a discrete set of values; each arc between two dimension attributes expresses a functional dependency.
Figure: hierarchy of the product dimension in the SALE fact schema (product, type, category, department, marketingGroup, brand, brandCity), with size shown as a non-dimension attribute.
Ex: product -> type; type -> category; category -> department
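Because every arc is a functional dependency, each member of a child attribute has exactly one parent, which is what makes roll-up along a hierarchy well defined. A minimal Python sketch, assuming hypothetical product, type, category and department members (they are not from the course) and a flow measure (quantity) that can be summed:

```python
from collections import defaultdict

# Hypothetical members of the product hierarchy. Each dictionary encodes one
# functional dependency (child -> parent), so every member has exactly one parent.
PARENT = {
    "product": {"p1": "milk", "p2": "yogurt", "p3": "shampoo"},          # product -> type
    "type": {"milk": "dairy", "yogurt": "dairy", "shampoo": "hygiene"},  # type -> category
    "category": {"dairy": "food", "hygiene": "non-food"},                # category -> department
}
LEVELS = ["product", "type", "category", "department"]


def roll_up(member: str, from_level: str, to_level: str) -> str:
    """Follow the functional dependencies from `from_level` up to `to_level`."""
    start, end = LEVELS.index(from_level), LEVELS.index(to_level)
    if end < start:
        raise ValueError("can only roll up towards coarser levels")
    for level in LEVELS[start:end]:
        member = PARENT[level][member]
    return member


def aggregate(quantity_by_product: dict, to_level: str) -> dict:
    """Sum a measure given per product at a coarser level of the hierarchy."""
    totals = defaultdict(int)
    for product, qty in quantity_by_product.items():
        totals[roll_up(product, "product", to_level)] += qty
    return dict(totals)


sales = {"p1": 10, "p2": 5, "p3": 7}
print(aggregate(sales, "category"))    # {'dairy': 15, 'hygiene': 7}
print(aggregate(sales, "department"))  # {'food': 15, 'non-food': 7}
```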
Figure: the SALE fact schema with its hierarchies: date (day, holiday, week, month, quarter, year), store (storeCity, state, country, salesManager, salesDistrict) and product; measures quantity, receipts, unitPrice, numberOfCustomer.
Figure: the SALE fact schema extended with an optional arc (diet), non-dimension attributes (such as size, address and telephone) and a promotion dimension (discount, startDate, endDate, cost, advertising).
Figure: cross-dimensional attributes in the SALE fact schema.
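Figure: shared hierarchy and roles in the CALL fact schema: the calling and called numbers share the telephone number hierarchy (telNumber, type, district); the fact also has the hour and date dimensions and the measures number and duration.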
Ex: consider a fact schema modeling the sales of books, whose dimensions are date and book. It would certainly be interesting to aggregate and select sales on the basis of book authors.
However, it would not be accurate to model author as a child dimension attribute of book, because a book may be written by several authors and an author may write many books. Therefore, the relationship between books and authors is modeled as a multiple arc:
Figure: the SALE fact schema for books (dimensions date and book, with genre and author as attributes of book); author is connected to book by a multiple arc.
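A multiple arc means a sale of one book must be credited to several authors, so aggregation has to decide how to split it. A minimal Python sketch, assuming hypothetical data and an even split among co-authors (the weighting rule is an assumption, not part of the slides):

```python
from collections import defaultdict

# Hypothetical data: copies sold per book (SALE measure quantity) and the
# multiple arc book -> author (a book may have several authors).
quantity = {"B1": 100, "B2": 40}
authors_of = {"B1": ["A1", "A2"], "B2": ["A1"]}


def quantity_by_author(quantity: dict, authors_of: dict, weighted: bool = True) -> dict:
    """Aggregate quantity along the multiple arc book -> author.

    weighted=True splits each book's quantity evenly among its authors (an
    assumed weighting), so the totals add up to the overall quantity;
    weighted=False credits the full quantity to every author.
    """
    totals = defaultdict(float)
    for book, qty in quantity.items():
        authors = authors_of[book]
        share = qty / len(authors) if weighted else qty
        for author in authors:
            totals[author] += share
    return dict(totals)


print(quantity_by_author(quantity, authors_of))                  # {'A1': 90.0, 'A2': 50.0}
print(quantity_by_author(quantity, authors_of, weighted=False))  # {'A1': 140.0, 'A2': 100.0}
```

With the even split the author totals add up to the 140 copies actually sold; without it, B1 is counted once per author and the totals add up to 240.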
3 types of measure:
- Flow measures refer to a time period (ex: number of products sold in a day).
- Level measures are evaluated at particular points in time (ex: number of products in inventory).
- Unit measures are evaluated at particular points in time but are expressed in relative terms (ex: product unit price, discount percentage).
Suitable operators for aggregation:

                         Flow measures        Level measures       Unit measures
Temporal hierarchies     SUM, AVG, MIN, MAX   AVG, MIN, MAX        AVG, MIN, MAX
Nontemporal hierarchies  SUM, AVG, MIN, MAX   SUM, AVG, MIN, MAX   AVG, MIN, MAX
By default, measures are additive (operator SUM) along all dimensions.
A non-additive measure can be explicitly annotated with the operator(s) other than SUM used to aggregate it (ex: AVG and MIN for the inventory level measure along the time dimension).
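The operator table can be read as a lookup from (measure type, hierarchy kind) to the suitable operators. A minimal Python sketch; the names `SUITABLE`, `is_suitable` and `is_additive` are hypothetical and not part of the DFM notation:

```python
# Aggregation operators suitable for each (measure type, hierarchy kind) pair,
# transcribed from the table above.
ALL = frozenset({"SUM", "AVG", "MIN", "MAX"})
NO_SUM = frozenset({"AVG", "MIN", "MAX"})

SUITABLE = {
    ("flow", "temporal"): ALL,
    ("flow", "nontemporal"): ALL,
    ("level", "temporal"): NO_SUM,
    ("level", "nontemporal"): ALL,
    ("unit", "temporal"): NO_SUM,
    ("unit", "nontemporal"): NO_SUM,
}


def is_suitable(measure_type: str, hierarchy: str, operator: str) -> bool:
    """True if `operator` may aggregate a measure of `measure_type` along a `hierarchy` hierarchy."""
    return operator.upper() in SUITABLE[(measure_type, hierarchy)]


def is_additive(measure_type: str, hierarchy: str) -> bool:
    """A measure is additive along a hierarchy when SUM is a suitable operator there."""
    return is_suitable(measure_type, hierarchy, "SUM")


print(is_suitable("level", "temporal", "avg"))  # True
print(is_additive("level", "temporal"))         # False
print(is_additive("level", "nontemporal"))      # True
```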
3 natures of measure:
- additive along a dimension when the SUM operator can be used to aggregate it along that dimension;
- non-additive along a dimension when the aggregation operator along it is not SUM (ex: inventory level);
- a non-additive measure is non-aggregable when no aggregation operator can be used on it (ex: unitPrice of product).
Figure: the INVENTORY fact schema, with the measures level (aggregated with AVG and MIN along time) and incomingQuantity, and the dimensions product (type, category, department, brand, weight, packaging, ItemPerPallet), date (week, month, quarter, year) and warehouse (address, city, country).
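The inventory level shows why additivity depends on the dimension. A minimal Python sketch, assuming hypothetical monthly snapshots for two warehouses: summing across warehouses gives the total stock, while summing across months would count stock that stays in a warehouse once per snapshot, so AVG is used along time.

```python
# Hypothetical monthly snapshots of the INVENTORY level measure for two warehouses.
LEVEL = {
    ("W1", "2013-01"): 100, ("W1", "2013-02"): 80,
    ("W2", "2013-01"): 50, ("W2", "2013-02"): 70,
}


def total_per_month(level: dict) -> dict:
    """Along warehouse (nontemporal), level is additive: SUM gives the stock held in all warehouses."""
    totals = {}
    for (_, month), qty in level.items():
        totals[month] = totals.get(month, 0) + qty
    return totals


def average_per_warehouse(level: dict) -> dict:
    """Along time, level is non-additive: aggregate the snapshots with AVG instead of SUM."""
    values = {}
    for (warehouse, _), qty in level.items():
        values.setdefault(warehouse, []).append(qty)
    return {w: sum(q) / len(q) for w, q in values.items()}


print(total_per_month(LEVEL))        # {'2013-01': 150, '2013-02': 150}
print(average_per_warehouse(LEVEL))  # {'W1': 90.0, 'W2': 60.0}
```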
Queries the user formulates on the DW may require comparing fact attributes taken from distinct, though related, schemata (drill-across in OLAP).
Two fact schemata are said to be compatible if they share at least one dimension attribute.
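Drill-across works because compatible schemata can both be aggregated to a shared dimension attribute and then matched on its values. A minimal Python sketch, assuming hypothetical per-city results from two employee schemata (the numbers and the helper `drill_across` are illustrative only):

```python
# Hypothetical query results from two compatible fact schemata, both rolled up
# to a dimension attribute they share (city).
num_emp = {"Marseille": 120, "Bologna": 80}     # numberOfEmp, all employees
num_non_euro = {"Marseille": 30, "Bologna": 8}  # numberOfEmp, non-European employees


def drill_across(left: dict, right: dict) -> dict:
    """Pair the measures of two results on the values of the shared attribute
    that appear in both (an inner join on that attribute)."""
    return {key: (left[key], right[key]) for key in sorted(left.keys() & right.keys())}


for city, (total, non_euro) in drill_across(num_emp, num_non_euro).items():
    print(f"{city}: {non_euro / total:.0%} non-European employees")
```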
Figure: the fact schemata EMPLOYEES, NON-EUROPEAN EMPLOYEES and ALL EMPLOYEES (measures such as numberOfEmp, maxSalary and numberOfNonEuroEmp; dimension attributes such as job, month, quarter, year, city, state, nation, continent, sex and ageRange; aggregation operators AVG and MAX).
F and G are compatible: they share the time, job and store dimensions.
The set of fact attributes in H is the union of the sets in F and G.
On the other hand, the designer must keep in mind that, by adopting
this solution, the time for extracting data by quarter will increase
significantly
Fig. 8. The SHIPMENT scheme (a) and its overlap with INVENTORY (b).
(a) The SHIPMENT schema, with the measure qty shipped and attributes such as product, type, brand, category, department, diet, weight, package size, order date, ship to, ship from, deal, ship mode, carrier, contact person and invoice number.
(b) The schema overlapping INVENTORY and SHIPMENT, with the measures qty shipped and inventory qty (AVG, MIN) and the common attributes product, type, brand, category, weight, package size and month.
The measures in f are the union of those in f' and f''. Thus, the fact on which f is centred may be considered as a sort of "macro-fact" embracing both f' and f''.
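Compatibility and overlap can be sketched on sets of attribute names. A minimal Python sketch with simplified, hypothetical attribute sets for SHIPMENT and INVENTORY; keeping only the shared dimension attributes in the overlapped schema is an assumption, suggested by the SHIPMENT/INVENTORY figure where only common attributes such as month survive:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactSchema:
    name: str
    dimension_attributes: frozenset
    measures: frozenset


def compatible(f: FactSchema, g: FactSchema) -> bool:
    """Two fact schemata are compatible if they share at least one dimension attribute."""
    return bool(f.dimension_attributes & g.dimension_attributes)


def overlap(f: FactSchema, g: FactSchema, name: str) -> FactSchema:
    """Union of the measures; (assumption) only the shared dimension attributes are kept."""
    if not compatible(f, g):
        raise ValueError(f"{f.name} and {g.name} are not compatible")
    return FactSchema(
        name,
        f.dimension_attributes & g.dimension_attributes,
        f.measures | g.measures,
    )


shipment = FactSchema(
    "SHIPMENT",
    frozenset({"product", "type", "brand", "category", "weight", "packageSize",
               "date", "month", "shipTo"}),
    frozenset({"qtyShipped"}),
)
inventory = FactSchema(
    "INVENTORY",
    frozenset({"product", "type", "brand", "category", "weight", "packageSize",
               "month", "warehouse"}),
    frozenset({"inventoryQty"}),
)
macro = overlap(shipment, inventory, "SHIPMENT_INVENTORY")
print(sorted(macro.dimension_attributes))
print(sorted(macro.measures))  # ['inventoryQty', 'qtyShipped']
```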
Note that the steps to derive DF schemata from an E/R schema are very similar: the main difference concerns the algorithm used to build the attribute tree.
For each fact defined on a table F, the attribute tree is built as follows: each node of the attribute tree corresponds to one or more attributes of the relational schema.
The table RENTALS is the only candidate for expressing facts; its associated attribute tree is:
Figure: attribute tree of RENTALS, with nodes positionOnShelf (RENTALS), positionOnShelf (COPIES), movieCode, title, category, length, director, cardNumber (CARDS), cardNumber (CUSTOMER), expiry, personalDocument, name, address, telephone, gender and time.
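The attribute-tree construction and the "drop useless attributes" step can be sketched in Python as below. This is a hedged sketch, not the authors' algorithm: the schema is a simplified, hypothetical version of the video-rental tables (the MOVIES table in particular is assumed), foreign keys stand for the functional dependencies being navigated, and `prune`/`graft` are hypothetical helpers in the spirit of the pruning and grafting operations described by Golfarelli and Rizzi.

```python
# Simplified, hypothetical video-rental schema:
# table -> (primary key attributes, other attributes, {foreign key: referenced table})
SCHEMA = {
    "RENTALS": (("positionOnShelf", "cardNumber", "date"), ("time",),
                {"positionOnShelf": "COPIES", "cardNumber": "CARDS"}),
    "COPIES": (("positionOnShelf",), ("movieCode",), {"movieCode": "MOVIES"}),
    "MOVIES": (("movieCode",), ("title", "category", "length", "director", "mainActor"), {}),
    "CARDS": (("cardNumber",), ("expiry", "personalDocument"), {"personalDocument": "CUSTOMER"}),
    "CUSTOMER": (("personalDocument",), ("name", "address", "telephone", "gender"), {}),
}


def attribute_tree(table: str, seen: frozenset = frozenset()) -> dict:
    """Build the subtree below a node that identifies a row of `table`.

    Each attribute becomes a child node; a foreign key is expanded further
    by navigating the functional dependency towards the referenced table.
    """
    key, others, fks = SCHEMA[table]
    seen = seen | {table}
    # A single-attribute key is the node we arrived from; a composite key
    # (typical of a fact table) contributes one child per component.
    children = (key if len(key) > 1 else ()) + others
    tree = {}
    for attr in children:
        target = fks.get(attr)
        tree[attr] = attribute_tree(target, seen) if target and target not in seen else {}
    return tree


def prune(tree: dict, attr: str) -> dict:
    """Drop `attr` together with its whole subtree."""
    return {a: prune(sub, attr) for a, sub in tree.items() if a != attr}


def graft(tree: dict, attr: str) -> dict:
    """Drop `attr` but attach its children to its parent."""
    result = {}
    for a, sub in tree.items():
        sub = graft(sub, attr)
        if a == attr:
            result.update(sub)
        else:
            result[a] = sub
    return result


rentals = attribute_tree("RENTALS")
edited = prune(graft(rentals, "movieCode"), "time")
print(sorted(edited["positionOnShelf"]))
```

Grafting movieCode keeps the movie attributes directly below positionOnShelf, while pruning time removes that node entirely.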
Figure: the four candidate attribute trees built from the tables FLIGHTS (1), FLIGHT_INSTANCES (2), TICKETS (3) and CHECK_IN (4); their nodes include flightNumber, airline, carrier, departureTime, fromAirport and toAirport (each with name, city, country), date, ticketNumber, fare, passengerLastName, passengerFirstName, passengerGender, checkInTime and numberOfBags.
TICKETS and CHECK_IN are the best choices of facts, because the existing functional dependencies make it possible to include the largest number of attributes in trees 3 and 4.
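The choice of TICKETS and CHECK_IN can be checked mechanically by counting how many attributes each candidate tree reaches. A sketch under explicit assumptions: the flight schema below is hypothetical and simplified (`flightInstance` stands for the foreign key from TICKETS to FLIGHT_INSTANCES, and airports are reduced to city and country).

```python
# Hypothetical, simplified flight schema:
# table -> (attributes other than the key, {foreign key: referenced table})
SCHEMA = {
    "AIRPORTS": (("city", "country"), {}),
    "FLIGHTS": (("airline", "departureTime", "fromAirport", "toAirport"),
                {"fromAirport": "AIRPORTS", "toAirport": "AIRPORTS"}),
    "FLIGHT_INSTANCES": (("flightNumber", "date"), {"flightNumber": "FLIGHTS"}),
    "TICKETS": (("flightInstance", "passengerLastName", "passengerFirstName",
                 "passengerGender", "fare"), {"flightInstance": "FLIGHT_INSTANCES"}),
    "CHECK_IN": (("ticketNumber", "checkInTime", "numberOfBags"), {"ticketNumber": "TICKETS"}),
}


def tree_size(table: str) -> int:
    """Number of nodes below the root of the attribute tree built from `table`,
    following every foreign key (the schema is assumed to be acyclic)."""
    attributes, fks = SCHEMA[table]
    return sum(1 + (tree_size(fks[a]) if a in fks else 0) for a in attributes)


candidates = ["FLIGHTS", "FLIGHT_INSTANCES", "TICKETS", "CHECK_IN"]
ranking = sorted(candidates, key=tree_size, reverse=True)
print([(t, tree_size(t)) for t in ranking])
# [('CHECK_IN', 18), ('TICKETS', 15), ('FLIGHT_INSTANCES', 10), ('FLIGHTS', 8)]
```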
Figure: attribute tree of the sale fact, with nodes ticketNumber, date, month, quarter, year, product, type, category, department, marketingGroup, brand, brandCity, store, storeCity, state, country, address, phone, salesManager, salesDistrict, quantity and unitPrice.
Granularity of data:
Figure: granularity of the sale fact schema (attributes year and store).
Figure: from the attribute tree of RENTALS to the RENTAL fact schema. After dropping useless attributes, the RENTAL fact has the dimensions date, customer (gender) and title (category, length, director, mainActor) and the measure number; the source data are obtained with the join conditions ON R.positionOnShelf = C.positionOnShelf and ON R.cardNumber = C.cardNumber.
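A minimal sketch of how the measure number of the RENTAL fact could be computed from the operational tables, using Python's built-in sqlite3 module. The schema and rows are hypothetical simplifications: gender is stored directly in CARDS, and the aliases CP and CA replace the single alias C used in the join conditions above.

```python
import sqlite3


def rentals_by_category_and_gender() -> list:
    """Count rentals (the measure number) by movie category and customer gender."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE COPIES  (positionOnShelf TEXT PRIMARY KEY, title TEXT, category TEXT);
        CREATE TABLE CARDS   (cardNumber TEXT PRIMARY KEY, gender TEXT);
        CREATE TABLE RENTALS (positionOnShelf TEXT, cardNumber TEXT, date TEXT);
        INSERT INTO COPIES VALUES ('S1', 'Alien', 'sci-fi'), ('S2', 'Heat', 'thriller');
        INSERT INTO CARDS VALUES ('C1', 'F'), ('C2', 'M');
        INSERT INTO RENTALS VALUES
            ('S1', 'C1', '2013-11-01'), ('S1', 'C2', '2013-11-02'),
            ('S2', 'C1', '2013-11-02'), ('S1', 'C1', '2013-11-03');
    """)
    rows = con.execute("""
        SELECT CP.category, CA.gender, COUNT(*) AS number
        FROM RENTALS R
        JOIN COPIES CP ON R.positionOnShelf = CP.positionOnShelf
        JOIN CARDS  CA ON R.cardNumber = CA.cardNumber
        GROUP BY CP.category, CA.gender
        ORDER BY CP.category, CA.gender
    """).fetchall()
    con.close()
    return rows


print(rentals_by_category_and_gender())
# [('sci-fi', 'F', 2), ('sci-fi', 'M', 1), ('thriller', 'F', 1)]
```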
Figure: the attribute tree of TICKETS after dropping useless attributes, and the resulting TICKET ISSUE fact schema, with dimension attributes such as date, flightNumber, airline, carrier, departureTime, arrivalTime, the from and to Airport (city, country), passengerGender, check-in and seat, and the measures numberOfFlights, numberOfBags and receipts.
The check-in dimension was left out to avoid making the query too complex.
Note that, in the edited trees, the functional dependency from … to … has been removed to make … a child of the root, so that it can be chosen as a measure. The same is done for … The ticket and the skipass granularities are removed, and a … flag is added to …