
Data Warehouse/Data Mart

Conceptual Modeling and Design

Bernard ESPINASSE
Professor, Aix-Marseille Université (AMU)
École Polytechnique Universitaire de Marseille

November 5, 2013

1. Methodological Framework
Conceptual Design & Logical Design
Top-Down Versus Bottom-Up Approach
Design Phases and schemata derivations

2. Conceptual Modelling: The Dimensional Fact Model (DFM)
Fact schema
Dimension hierarchies
Additive, semi-additive and non-additive attributes
Overlapping compatible fact schemata
Representing query patterns on a fact schema

3. Conceptual Design: From Relational schema to DFM of Data Mart
Finding and defining facts from Relational schema
Building the Attribute Tree from Relational schema
Building the Fact Schema from Attribute Tree
Bernard ESPINASSE - Data Warehouse Conceptual modeling and Design


Books
Golfarelli M., Rizzi S., Data Warehouse Design: Modern Principles and Methodologies, McGraw-Hill, 2009.
Kimball R., Ross M., Entrepôts de données : guide pratique de modélisation dimensionnelle, 2e édition, Ed. Vuibert, 2003.
Rizzi S., Conceptual modeling solutions for the data warehouse. In Data Warehousing and Mining: Concepts, Methodologies, Tools, and Applications, J. Wang (Ed.), Information Science Reference, pp. 208-227, 2008.
Golfarelli M., Maio D., Rizzi S., Conceptual Design of Data Warehouses from E/R Schemes. Proceedings of the 31st Hawaii International Conference on System Sciences (HICSS-31), vol. VII, Kona, Hawaii, pp. 334-343, 1998.

Conceptual Design & Logical Design


Life-Cycle
Top-Down, Bottom-Up and Mixed Strategies
Design Phases
Schemata derivations for DMs design

Courses
Course of M. Golfarelli and S. Rizzi, University of Bologna
Courses of M. Böhlen and J. Gamper, Free University of Bolzano


Entity-Relationship models are not very useful for modeling DWs.

A DW is conceptually based on a multidimensional view of data:
- But there is still no agreement on HOW to develop its conceptual design!

Most of the time, DW design is done at the logical level: a multidimensional model (star/snowflake schema) is designed directly:
- But a star/snowflake schema is nothing but a relational schema
- It contains only the definition of a set of relations and integrity constraints!

A better approach:
- 1) first design a conceptual model: Conceptual Design
- 2) then translate it into a logical model: Logical Design

Building a DW is a very complex task, which requires accurate planning aimed at devising satisfactory answers to organizational and architectural questions.
A large number of organizations lack the experience and skills that are required to meet the challenges involved in DW projects.
A major cause of DW failures lies in the absence of a global view of the design process, that is, of a design methodology.
Design methodologies are necessary to minimize the risk of failure.
Three main strategies for DW design:
- Top-Down strategy
- Bottom-Up strategy
- Mixed strategy

[Figure: existing databases and systems (OLTP: applications, DB1 to DB4) feed a global Data Warehouse (DW) and the Data Marts DM1 to DM3; the three approaches differ in the order in which the DW and the DMs are designed]

Top-Down Approach:
1. Design of DW
2. Design of DMs

Bottom-Up Approach:
1. Design of DMs
2. Integration of DMs in DW
3. Maybe no physical DW

Mixed Approach:
1. Design of DW for DM1
2. Design of DM2 and integration with DW
3. Design of DM3 and integration with DW
4. ...

Top-Down strategy: analyze global business needs, plan how to develop a DW, design it, and implement it as a whole with its DMs.

(+) Strengths:
- Promising: it is based on a global picture of the goal to achieve, and in principle it ensures a consistent, well integrated DW
(-) Weaknesses:
- High cost estimates and long implementation times discourage company managers from embarking on this kind of project.
- Analyzing and integrating all relevant sources at the same time is a very difficult task, also because they are unlikely to all be available and stable at the same time.
- It is extremely difficult to forecast the specific needs of every department involved in a project, which leads to specific DMs
- As no working DW system is going to be delivered in the short term, users cannot check that the project is useful, so they lose trust and interest in it.


Phase 1 : Goal setting and planning of the DW


set system goals, borders, and size
select an approach for design and implementation
estimate costs and benefits
analyze risks and expectations
examine the skills of the working team
Phase 2 : Infrastructure design of the DW
analyze and compare the possible architectural
solutions
assess the available technologies and tools
create a preliminary plan of the whole DW
system
Phase 3 : Design and development of DMs
Every iteration causes a new DM and new
applications to be created and progressively
added to the DW system


Bottom-Up strategy: the DW is incrementally built and several DMs are iteratively created.
Each DM is based on a set of facts that are linked to a specific department and that can be interesting for a user group.
(+) Strengths:
- Leads to concrete results in a short time
- Does not require huge investments
- Enables designers to investigate one area at a time
- Gives managers quick feedback about the actual benefits of the system being built
- Keeps the interest for the project constantly high
(-) Weakness:
- May lead to a partial vision of the business domain
=> Mixed strategy

Top-Down and Bottom-Up strategies should be mixed:
- When planning a DW, a bottom-up strategy should be followed
- One Data Mart (DM) at a time is identified and prototyped according to a top-down strategy by building a conceptual schema for each fact of interest
- The first DM (DM1) to prototype:
  - is the one playing the most strategic role for the enterprise
  - should be a backbone for the whole DW
  - should lean on available and consistent data sources

Data Mart design phases
Each Data Mart (DM) will be designed according to these steps:
[Figure: Source analysis and integration; Requirement analysis; Conceptual design; Workload and data volume; Logical design; ETL design; Physical design. Actors: db administrator, designer, business user. Source: J. Gamper, Free University of Bolzano, DWDM 2012/13]

[Figure: schemata derivation for DM design. Labels: E/R Scheme, Relational Scheme, Facts, Preliminary workload, CONCEPTUAL DESIGN, Conceptual Scheme, Workload, Target logical model, LOGICAL DESIGN, Logical Scheme, Workload, Target DBMS, PHYSICAL DESIGN, Physical Scheme. The figure also shows sample relational tables for stores and sales.]

2. Conceptual Modelling: the Dimensional Fact Model (DFM)
Fact schema
Dimension hierarchies
Fact schema and fact instances
Additive attributes
Semi-additive and non-additive attributes
Overlapping compatible fact schemata
Representing query patterns on a fact schema

Conceptual Design is based on the documentation of the underlying operational information system (IS):
- Relational schemata or
- E/R schemata

Steps:
1. Find facts
2. For each fact:
a) Navigate functional dependencies
b) Drop useless attributes
c) Define dimensions and measures

The Dimensional Fact Model (DFM) has been proposed by Golfarelli M. and Rizzi S. to support the Conceptual Design of DWs.
The DFM is a graphical conceptual model for Data Mart design.

The aim of the DFM is to:
1. Provide efficient support for Conceptual Design
2. Create an environment in which user queries may be formulated intuitively
3. Make communication possible between designers and end users with the goal of formalizing requirement specifications
4. Build a stable platform for logical design (independently of the target logical model)
5. Provide clear and expressive design documentation

The conceptual representation generated by the DFM consists of a set of fact schemata that basically model facts, measures, dimensions, and hierarchies.

A fact is a concept relevant to decision-making processes:
- It models a set of events (ex: in a company: sales, shipments, purchases, ...)
- It has dynamic properties or evolves in some way over time
- It has one or more numeric and continuously valued attributes which "measure" the fact from different points of view

A fact schema is structured as a tree whose root is a fact.
A Conceptual Model of a DW consists of a set of fact schemata.

Ex: a simple 3-dimensional fact schema SALE for a chain of stores:
[Figure: fact SALE with measures quantity, receipts, unitPrice, numberOfCustomer and dimensions product, date, store]
- A measure is a numerical property of a fact and describes a quantitative aspect of the fact that is relevant to analysis. Ex: every sale is quantified by its quantity, receipts, unitPrice, numberOfCustomer
- A dimension is a fact property with a finite domain and describes an analysis axis of the fact. Ex: typical dimensions for the sales fact are product, store, and date
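The vocabulary above (fact, measures, dimensions) can be pictured as a small data structure. The following is only a minimal sketch: the class `FactSchema`, the helper `is_valid_event` and the sample values are hypothetical names and data introduced for illustration, not part of the DFM itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FactSchema:
    """A fact with its measures and dimensions (hierarchies are left out)."""
    name: str
    measures: List[str] = field(default_factory=list)
    dimensions: List[str] = field(default_factory=list)


# The SALE example: 4 measures and 3 dimensions.
sale = FactSchema(
    name="SALE",
    measures=["quantity", "receipts", "unitPrice", "numberOfCustomer"],
    dimensions=["product", "date", "store"],
)


def is_valid_event(schema: FactSchema, key: Dict, values: Dict) -> bool:
    """An event must give one value per dimension and one value per measure."""
    return set(key) == set(schema.dimensions) and set(values) == set(schema.measures)


# One fact instance; the coordinates and numbers are invented.
event_key = {"product": "P1", "date": "2013-11-05", "store": "N1"}
event_values = {"quantity": 10, "receipts": 25.0, "unitPrice": 2.5, "numberOfCustomer": 2}
event_ok = is_valid_event(sale, event_key, event_values)
```

Here each fact instance is addressed by one value per dimension, which is what makes the three dimensions the analysis axes of the SALE fact.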

Hierarchies determine how fact instances may be aggregated and selected significantly for the decision-making process, and determine the granularity adopted for representing facts.
Hierarchies are subtrees rooted in dimensions:
[Figure: SALE fact schema with measures quantity, receipts, unitPrice, numberOfCustomer and hierarchies rooted in the dimensions product (type, category, department, marketingGroup, brand, brandCity, plus the non-dimension attribute size), date (day, holiday, week, month, quarter, year) and store (storeCity, state, country, salesManager, salesDistrict)]

In dimension hierarchies:
- Nodes represented by circles are dimension attributes, which may assume a discrete set of values. Ex: week, month, product, ...
- Arcs represent relationships between pairs of attributes; these relationships are functional dependencies. Ex: product -> type; type -> category; category -> department
- Dimension attributes in the nodes along each sub-path of the hierarchy starting from the dimension define progressively coarser granularities.
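Because every arc is a functional dependency, aggregating along a hierarchy amounts to mapping each value to its parent and combining the measure. A minimal sketch follows, assuming the dependencies product -> type -> category are stored as plain lookup tables; the product names, quantities and the function `roll_up` are invented for illustration.

```python
from collections import defaultdict

# Functional dependencies as lookup tables (child value -> parent value).
product_to_type = {"Slurp milk": "milk", "Yum yogurt": "yogurt", "Crunch chips": "snack"}
type_to_category = {"milk": "dairy", "yogurt": "dairy", "snack": "food"}

# Fact instances at the finest granularity: (product, quantity sold).
sales = [("Slurp milk", 10), ("Yum yogurt", 4), ("Crunch chips", 7), ("Slurp milk", 3)]


def roll_up(rows, *mappings):
    """Sum quantities after mapping each product through the given dependencies."""
    totals = defaultdict(int)
    for key, quantity in rows:
        for mapping in mappings:
            key = mapping[key]  # climb one level of the hierarchy
        totals[key] += quantity
    return dict(totals)


by_type = roll_up(sales, product_to_type)
by_category = roll_up(sales, product_to_type, type_to_category)
```

Each additional mapping passed to `roll_up` moves one step further from the dimension, that is, to a coarser granularity.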

- Non-dimension attributes contain additional information about an attribute of the hierarchy and cannot be used for aggregation. Ex: size: aggregating sales according to the size of the product would not make sense!
- Optional arcs (marked by a dash) express optional relationships between pairs of attributes (useful for logical design). Ex: diet, promotion. The diet attribute takes a value (such as cholesterol-free, gluten-free, or sugar-free) only for food products; for the other products, it is undefined.
[Figure: SALE fact schema showing the non-dimension attribute size and the optional arcs; additional labels in the figure: diet, promotion, discount, startDate, endDate, cost, advertising, telephone, address]

A cross-dimensional attribute is a dimensional or descriptive attribute whose value is defined by the combination of 2 or more dimensional attributes, possibly belonging to different hierarchies.
Ex: if a product's Value Added Tax (VAT) depends both on the product category and on the country where the product is sold, you can use a cross-dimensional attribute to represent it:
[Figure: SALE fact schema with the cross-dimensional attribute VAT, determined jointly by category and country]

A convergence takes place when 2 dimensional attributes within a hierarchy are connected by 2 or more alternative paths of many-to-one associations (graphically, arrows are used).
Ex: in the store dimension, stores are grouped into sales districts and no inclusion relationship exists between districts and states, but each district is part of only one country:
store -> salesDistrict -> country
or
store -> storeCity -> state -> country
[Figure: SALE fact schema with a convergence on country in the store hierarchy]
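A convergence implies that both paths lead every store to the same country. The sketch below checks that property on invented data; the lookup tables and the functions `follow` and `converges` are hypothetical helpers, not part of the DFM notation.

```python
# Child -> parent lookups for the two alternative paths (invented values).
store_to_district = {"S1": "D-North", "S2": "D-North", "S3": "D-South"}
district_to_country = {"D-North": "Italy", "D-South": "Italy"}
store_to_city = {"S1": "Bologna", "S2": "Milan", "S3": "Rome"}
city_to_state = {"Bologna": "Emilia-Romagna", "Milan": "Lombardy", "Rome": "Lazio"}
state_to_country = {"Emilia-Romagna": "Italy", "Lombardy": "Italy", "Lazio": "Italy"}


def follow(value, *mappings):
    """Apply a chain of many-to-one associations to a starting value."""
    for mapping in mappings:
        value = mapping[value]
    return value


def converges(stores):
    """True if both paths give the same country for every store."""
    return all(
        follow(s, store_to_district, district_to_country)
        == follow(s, store_to_city, city_to_state, state_to_country)
        for s in stores
    )


paths_agree = converges(["S1", "S2", "S3"])
```

If the two paths disagreed for some store, country would not be functionally determined by store and the two branches could not be drawn as converging.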

Shared hierarchies exist when entire portions of hierarchies are replicated 2 or more times in a fact schema.
In particular with time hierarchies: 2 or more date-type dimensions with different meanings can easily exist in the same fact, and a month-year hierarchy would have to be built on each one of them
=> an abbreviation is introduced
Ex: calling and called phone numbers
[Figure: CALL fact (measures number, duration; dimensions date, hour, callingNumber, calledNumber), first with the replicated attributes callingNumberType, callingNumberDistrict, calledNumberType, calledNumberDistrict, then with a shared hierarchy on telNumber (type, district) reached through the roles "calling" and "called"]

A multiple arc models a many-to-many association between the 2 dimensional attributes it connects (graphically, denoted by doubling the arc).
Ex: consider a fact schema modeling the sales of books, whose dimensions are date and book. It would certainly be interesting to aggregate and select sales on the basis of book authors.
However, it would not be accurate to model author as a dimensional child attribute of book, because many different authors can write many books. The relationship between books and authors is therefore modeled as a multiple arc:
[Figure: SALE fact (measures quantity, receipts, unitPrice, numberOfCustomer) with dimensions date (week, month, year, holiday) and book (genre), and a multiple arc from book to author]

3 natures of measure:
- A measure is additive along a dimension when the SUM operator can be used to aggregate it
- It is non-additive along a dimension if the aggregation operator is not SUM (ex: inventory level)
- A non-additive measure is non-aggregable if no aggregation operator exists (ex: unitPrice of a product)

3 types of measure:
- Flow measures: refer to a time period (ex: number of products sold in a day)
- Level measures: evaluated at a particular time (ex: number of products in inventory)
- Unit measures: evaluated at a particular time but expressed in relative terms (ex: product unit price, discount percentage)

Suitable operators for aggregation:

                  Temporal hierarchies    Nontemporal hierarchies
Flow measures     SUM, AVG, MIN, MAX      SUM, AVG, MIN, MAX
Level measures    AVG, MIN, MAX           SUM, AVG, MIN, MAX
Unit measures     AVG, MIN, MAX           AVG, MIN, MAX

By default, measures are additive along all the dimensions (operator SUM).
A non-additive measure can be explicitly annotated with the operator(s) other than SUM used for its aggregation (ex: AVG and MIN for the inventory level measure along the time dimension):
[Figure: INVENTORY fact with measures level (non-additive, annotated AVG, MIN) and incomingQuantity; dimensions product (type, category, department, brand, weight, packaging, itemPerPallet), date (week, month, quarter, year) and warehouse (address, city, country)]
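The operator table can be read as a lookup that guards each aggregation. The sketch below transcribes the table into a dictionary; the function `aggregate`, the key names and the inventory figures are assumptions made for illustration only.

```python
from statistics import mean

# Operators allowed per (measure type, hierarchy kind), copied from the table.
ALLOWED = {
    ("flow", "temporal"): {"SUM", "AVG", "MIN", "MAX"},
    ("flow", "nontemporal"): {"SUM", "AVG", "MIN", "MAX"},
    ("level", "temporal"): {"AVG", "MIN", "MAX"},
    ("level", "nontemporal"): {"SUM", "AVG", "MIN", "MAX"},
    ("unit", "temporal"): {"AVG", "MIN", "MAX"},
    ("unit", "nontemporal"): {"AVG", "MIN", "MAX"},
}
OPERATORS = {"SUM": sum, "AVG": mean, "MIN": min, "MAX": max}


def aggregate(values, operator, measure_type, hierarchy_kind):
    """Aggregate values, refusing operators the table does not allow."""
    if operator not in ALLOWED[(measure_type, hierarchy_kind)]:
        raise ValueError(f"{operator} is not suitable for a {measure_type} "
                         f"measure along a {hierarchy_kind} hierarchy")
    return OPERATORS[operator](values)


# Inventory level of one product in one warehouse on three days (invented).
daily_level = [100, 80, 90]
avg_level = aggregate(daily_level, "AVG", "level", "temporal")
min_level = aggregate(daily_level, "MIN", "level", "temporal")
```

Asking for `SUM` on the same data raises `ValueError`, which mirrors the statement that an inventory level is non-additive along time: adding the three daily levels would count the same stock three times.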

Different facts are represented in different fact schemata.
Queries the user formulates on the DW may require comparing fact attributes taken from distinct, though related, schemata (drill-across in OLAP).
2 fact schemata are said to be compatible if they share at least one dimension attribute.
2 compatible schemata F and G may be overlapped to create a resulting schema H. If there is no conflict between attribute dependencies in the 2 schemata:
- the set of the fact attributes in H is the union of the sets in F and G
- the dimensions in H are the intersection of those in F and G, assuming that a given dimension is common to F and G if at least one dimension attribute is shared
- each hierarchy in H includes all and only the dimension attributes included in the corresponding hierarchies of both F and G.

Consider the 2 fact schemata:
- F represents all employees of an enterprise
- G only the non-European employees.
[Figure: fact schemata F (ALL EMPLOYEES, measures numberOfEmp and maxSalary) and G (NON-EUROPEAN EMPLOYEES, measure numberOfEmp), with measures annotated AVG and MAX; dimension attributes shown include job, city, state, nation, continent, month, quarter, year, sex, ageRange]

F and G are compatible: they share the time, job and store dimensions.
The schema resulting from overlapping F and G is H:
[Figure: fact schema H (EMPLOYEES, measures numberOfEmp, maxSalary, numberOfNonEuroEmp) with the shared dimension attributes]
H can be used, for instance, to calculate the percentage of non-European employees for each city, job and year.

In some cases, aggregation along a dimension can be carried out at different abstraction levels even if the corresponding dimension attributes were not explicitly shown.
Ex: given a month attribute within a time hierarchy, fact instances can be aggregated by quarter, semester and year by performing a simple calculation.
Thus, given the F and G fact schemata, the attribute quarter could in principle be added to the time dimension in the resulting schema H.
On the other hand, the designer must keep in mind that, by adopting this solution, the time for extracting data by quarter will increase significantly.
Thus, the best solution would probably be to add the quarter attribute explicitly to the time hierarchy in the employee fact schema.
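The overlap rules (union of fact attributes, intersection of dimensions and of hierarchy attributes) can be sketched with sets. This is a simplified illustration under stated assumptions: schemata are plain dictionaries, hierarchies are keyed by hypothetical dimension labels ("time", "job", "place", ...), the attribute sets are loosely based on the figure labels, and the measure of G is assumed to be already named numberOfNonEuroEmp to avoid a name clash.

```python
def overlap(f, g):
    """Overlap two compatible fact schemata F and G into H."""
    hierarchies = {}
    for dim in f["hierarchies"].keys() & g["hierarchies"].keys():
        common = f["hierarchies"][dim] & g["hierarchies"][dim]
        if common:  # a dimension is shared if at least one attribute is shared
            hierarchies[dim] = common
    if not hierarchies:
        raise ValueError("schemata are not compatible: no shared dimension attribute")
    return {"measures": f["measures"] | g["measures"], "hierarchies": hierarchies}


F = {
    "measures": {"numberOfEmp", "maxSalary"},
    "hierarchies": {
        "time": {"month", "year"},
        "job": {"job"},
        "place": {"city", "state", "nation", "continent"},
        "sex": {"sex"},
    },
}
G = {
    "measures": {"numberOfNonEuroEmp"},
    "hierarchies": {
        "time": {"year"},
        "job": {"job"},
        "place": {"city", "state"},
        "ageRange": {"ageRange"},
    },
}

H = overlap(F, G)
```

With these inputs H keeps three dimensions, and its time hierarchy contains only year because month is not present in G.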

As a guideline, most measures in a fact scheme should be additive. An example of additive measure in the sale scheme is qty sold: the quantity sold for a given sales manager is the sum of the quantities sold for all the stores managed by that sales manager.
A measure may be non-additive on one or more dimensions. Examples of this are all the measures expressing a level, such as an inventory level, a temperature, etc. An inventory level is non-additive on time, but it is additive on the other dimensions. A temperature measure is non-additive on all the dimensions, since adding up two temperatures hardly makes sense. However, this kind of non-additive measures can still be aggregated by using operators such as average, maximum, minimum; Figure 5 shows an example where both operators AVG and MIN can be used for aggregation; measure qty expresses, for each product, the number of copies present within each warehouse during each week.

Fact schema INVENTORY:
[Fig. 5. The INVENTORY fact scheme (measure qty, annotated AVG, MIN).]

For other measures, aggregation is inherently impossible for conceptual reasons. Consider the measure number of customers in the sale example, estimated for a given product, day and store by counting the number of purchase tickets for that product printed on that day in that store. Since the same ticket may include other products, adding or averaging the number of customers for two or more products would lead to an inconsistent result. Thus, number of customers is non-aggregable on the product dimension (while it is additive on the time and the stores dimensions). In this case, the reason for non-aggregability is that the relationship between purchase tickets and products is many-to-many instead of many-to-one: measure number of customers cannot be consistently ...

Fact schema SHIPMENT and fact schema overlapping INVENTORY and SHIPMENT:
[Fig. 8. The SHIPMENT scheme (a) and its overlap with INVENTORY (b).]

The measures in f are the union of those in f' and f". Thus, the fact on which f is centred may be considered as a sort of "macro-fact" embracing both f' and f".
Each hierarchy in f includes all and only the attributes included in the corresponding hierarchies of both f' and f". The functional dependencies expressed by the inter-attribute links in f' and f" are preserved.
The domain of each dimension attribute in f is the intersection of the domains of the corresponding attributes in f' and f".

3. Conceptual Design: From Relational schema to DFM of Data Mart
Finding and defining facts from Relational schema
Building the Attribute Tree from Relational schema
Building the Fact Schema from Attribute Tree

The steps to derive DF schemata from a Relational schema are:
1. Finding and defining facts from Relational schema
For each fact:
2. Building the Attribute Tree from Relational schema
3. Building the Fact Schema from Attribute Tree
Note that the steps to derive DF schemata from an E/R schema are very similar: the main difference concerns the algorithm used to build the attribute tree.

Facts correspond to events occurring dynamically.
Within a Relational schema, a fact is represented by a table:
- Tables representing frequently updated archives are good candidates to define facts
- Tables representing nearly-static archives or structural properties of the domain (such as STORE and CITY) are not candidates to define facts
Each fact identified on the Relational schema becomes the root of an attribute tree, which becomes a fact schema.
Ex: the most important fact is a product sale, and it is represented by the SALES table.

For each fact defined from a table F, the attribute tree is built as follows:
- Each node of the attribute tree corresponds to one or more Relational schema attributes
- The root of the attribute tree corresponds to the primary key of F
- For each node v, the corresponding attribute functionally determines all the attributes that correspond to the descendants of v (functional dependencies)

Relational schema of the DVD rental DB:
CARDS (cardNumber, expiry)
CUSTOMERS (cardNumber:CARDS, name, gender, address, telephone, personalDocument)
MOVIES (movieCode, title, category, director, length, mainActor)
COPIES (positionOnShelf, movieCode:MOVIES)
RENTALS (positionOnShelf:COPIES, cardNumber:CARDS, date, time)

The table RENTALS is the only candidate for expressing facts; the associated attribute tree is:
[Figure: attribute tree rooted in positionOnShelf (RENTALS), with nodes date, time, cardNumber (CARDS), expiry, cardNumber (CUSTOMER), name, gender, address, telephone, personalDocument, positionOnShelf (COPIES), movieCode, title, category, director, length, mainActor]
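The construction rule (a node functionally determines its descendants) can be sketched as a recursive walk along foreign keys. This is a simplified illustration, not the algorithm of the course: it only follows foreign keys from the referencing table to the referenced one, so CUSTOMERS, which is linked to CARDS in the opposite direction, is not reached. The dictionary layout and the function `build_tree` are assumptions, and the schema is assumed to be free of foreign-key cycles.

```python
# DVD rental schema: plain attributes and foreign keys (attribute -> table).
SCHEMA = {
    "CARDS": {"attrs": ["expiry"], "fks": {}},
    "MOVIES": {"attrs": ["title", "category", "director", "length", "mainActor"],
               "fks": {}},
    "COPIES": {"attrs": [], "fks": {"movieCode": "MOVIES"}},
    "RENTALS": {"attrs": ["date", "time"],
                "fks": {"positionOnShelf": "COPIES", "cardNumber": "CARDS"}},
}


def build_tree(table, schema):
    """Return the attribute tree of a table as nested dicts {label: subtree}."""
    info = schema[table]
    # Plain attributes become leaves.
    children = {attr: {} for attr in info["attrs"]}
    # A foreign key determines every attribute of the referenced table,
    # so the referenced table's tree hangs below the foreign-key node.
    for attr, target in info["fks"].items():
        children[f"{attr} ({target})"] = build_tree(target, schema)
    return children


tree = {"RENTALS": build_tree("RENTALS", SCHEMA)}
```

In the resulting tree, title and the other movie attributes sit two levels below the root, reached through positionOnShelf (COPIES) and movieCode (MOVIES).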

Relational schema of the Flight DB:
FLIGHTS (flightNumber, airline, fromAirport:AIRPORTS, toAirport:AIRPORTS, departureTime, arrivalTime, carrier)
FLIGHT_INSTANCES (flightNumber:FLIGHTS, date)
AIRPORTS (IATAcode, name, city, country)
TICKETS (ticketNumber, flightNumber:FLIGHT_INSTANCES, seat, fare, passengersFirstName, passengersSurname, passengersGender)
CHECK_IN (ticketNumber:TICKETS, checkInTime, numberOfBags)

The tables that are candidates for expressing facts are:
- FLIGHTS
- FLIGHT_INSTANCES
- TICKETS
- CHECK_IN

Attribute Tree 1 (FLIGHTS):
[Figure: tree rooted in flightNumber (FLIGHTS), with nodes airline, carrier, departureTime, fromAirport and toAirport, each airport with name, city, country]

Attribute Tree 2 (FLIGHT_INSTANCES):
[Figure: tree rooted in flightNumber (FLIGHT_INSTANCES), with nodes date and flightNumber (FLIGHTS) followed by the nodes of tree 1]

Attribute Tree 3 (TICKETS):
[Figure: tree rooted in ticketNumber (TICKETS), with nodes fare, the passenger's first name, last name and gender, and flightNumber (FLIGHT_INSTANCES) followed by the nodes of tree 2]

Attribute Tree 4 (CHECK_IN):
[Figure: tree rooted in ticketNumber (CHECK_IN), with nodes checkInTime, numberOfBags and ticketNumber (TICKETS) followed by the nodes of tree 3]

Facts TICKETS and CHECK_IN are the best choices because the existing functional dependencies make it possible to include a maximum number of attributes in trees 3 and 4.

For each fact:
3.1. Pruning and grafting the attribute tree
3.2. Defining Fact Schema with its dimensions (fact dimensions)
3.3. Defining Fact Schema measures (fact attributes)
3.4. Defining Fact Schema granularity of data (dimension hierarchies)
The steps to derive DF schemata from an E/R schema are very similar: the main difference concerns the algorithm used to build the attribute tree.

3.1. Pruning and grafting the attribute tree:
Some attributes in the tree may be uninteresting for the DW.
In order to drop useless levels of detail, it is possible to apply the following operators:
- Pruning: delete a vertex and its subtree.
- Grafting: delete a vertex and move its subtree up to the vertex's parent. It is useful when an attribute is not interesting but the attributes it determines must be preserved.
In addition:
- We can retain or graft any nodes corresponding to composite keys
- We can modify, add, or delete a functional dependency
- We can add one or more functional dependencies if a non-normalized table exists in the relational schema
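The two operators can be sketched on a tree stored as nested dictionaries. This is a minimal illustration under assumptions: node labels are unique, the functions `prune` and `graft` are hypothetical helpers, and the small tree below is a shortened version of the DVD rental example.

```python
def prune(tree, label):
    """Return a copy of the tree without the vertex `label` and its subtree."""
    return {k: prune(v, label) for k, v in tree.items() if k != label}


def graft(tree, label):
    """Return a copy without the vertex `label`; its children move up to its parent."""
    result = {}
    for k, v in tree.items():
        if k == label:
            result.update(graft(v, label))  # children take the vertex's place
        else:
            result[k] = graft(v, label)
    return result


tree = {
    "RENTALS": {
        "date": {},
        "time": {},
        "positionOnShelf (COPIES)": {"movieCode": {"title": {}, "category": {}}},
        "cardNumber (CARDS)": {"expiry": {}},
    }
}

step1 = graft(tree, "positionOnShelf (COPIES)")  # keep movieCode and below
step2 = prune(step1, "time")                     # drop an uninteresting leaf
step3 = prune(step2, "expiry")
```

Grafting keeps the descendants of the removed vertex, whereas pruning discards them; both functions return new dictionaries and leave the original tree unchanged.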

3.2. Defining dimensions:
- The choice of dimensions determines the fact granularity
- Dimensions must be chosen among the root children in the attribute tree
- Time should always be a dimension
[Figure: attribute tree of the sale fact, with the root children date, product and store marked as dimensions]

3.3. Defining measures:
- Measures must be chosen among the children of the root
- Measures are typically computed either by counting the number of instances of F, or by summing (averaging, ...) expressions which involve numerical attributes
- An attribute cannot be both a measure and a dimension
- A fact may have no measures
[Figure: attribute tree of the sale fact, with quantity and unitPrice marked as measures]

3.4. Granularity of data:
- Primary issue in determining performance
- Depends on the queries users are interested in
- Represents a trade-off between query response time and detail of information to be stored:
  - It may be worth adopting a finer granularity than that required by users, provided that this does not slow down the system too much
  - Constrained by the maximum time frame for loading
- Choosing granularity includes defining the refresh interval, which needs to consider:
  - Availability of operational data
  - Workload characteristics
  - The total time period to be analysed

3.1: Pruning and grafting the attribute tree:

Relational schema of the DVD rental DB:

CARDS (cardNumber, expiry)
CUSTOMERS (cardNumber:CARDS, name, gender, address, telephone, personalDocument)
MOVIES (movieCode, title, category, director, length, mainActor)
COPIES (positionOnShelf, movieCode:MOVIES)
RENTALS (positionOnShelf:COPIES, cardNumber:CARDS, date, time)

[Figure: attribute tree rooted in positionOnShelf (RENTALS), before and after editing. Before: date, time, positionOnShelf (COPIES), movieCode, title, category, director, length, mainActor, cardNumber (CARDS), expiry, cardNumber (CUSTOMERS), name, gender, address, telephone, personalDocument. After: date, customer, gender, title, category, length, director, mainActor.]

- movieCode and title are inverted
- cardNumber (CARDS) and name (renamed customer) are inverted
- positionOnShelf (COPIES) and cardNumber (CARDS) are grafted
- time, expiry, telephone, address, personalDocument, movieCode and cardNumber (CUSTOMERS) are pruned

Fact schema RENTAL:

[Figure: fact schema RENTAL. Dimensions: date, customer (with gender), title (with category, length, director, mainActor). Measure: number.]

SQL measure glossaries for fact schema RENTAL:

number = SELECT COUNT(*)
         FROM RENTALS R
           INNER JOIN COPIES C ON R.positionOnShelf = C.positionOnShelf
           INNER JOIN MOVIES F ON C.movieCode = F.movieCode
           INNER JOIN CUSTOMERS U ON R.cardNumber = U.cardNumber
         GROUP BY F.title, R.date, U.name;
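The measure glossary for RENTAL can be tried out on a toy database. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the table contents (movie titles, customer names, dates) are made up for illustration, and only the columns needed by the query are created. The grouping columns are added to the SELECT list so that each count can be read next to its coordinates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CARDS (cardNumber INTEGER PRIMARY KEY, expiry TEXT);
CREATE TABLE CUSTOMERS (cardNumber INTEGER REFERENCES CARDS, name TEXT, gender TEXT);
CREATE TABLE MOVIES (movieCode INTEGER PRIMARY KEY, title TEXT, category TEXT);
CREATE TABLE COPIES (positionOnShelf INTEGER PRIMARY KEY,
                     movieCode INTEGER REFERENCES MOVIES);
CREATE TABLE RENTALS (positionOnShelf INTEGER REFERENCES COPIES,
                      cardNumber INTEGER REFERENCES CARDS, date TEXT, time TEXT);

-- Hypothetical sample data.
INSERT INTO CARDS VALUES (100, '2014-12-31'), (101, '2014-06-30');
INSERT INTO CUSTOMERS VALUES (100, 'Ann', 'F'), (101, 'Bob', 'M');
INSERT INTO MOVIES VALUES (1, 'Alien', 'SF'), (2, 'Brazil', 'SF');
INSERT INTO COPIES VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO RENTALS VALUES
    (10, 100, '2013-11-05', '10:00'),
    (11, 100, '2013-11-05', '18:00'),
    (20, 101, '2013-11-05', '11:00'),
    (10, 101, '2013-11-06', '09:00');
""")

# One row per (title, date, customer) coordinate, with the measure "number".
rows = conn.execute("""
SELECT F.title, R.date, U.name, COUNT(*) AS number
FROM RENTALS R
  INNER JOIN COPIES C ON R.positionOnShelf = C.positionOnShelf
  INNER JOIN MOVIES F ON C.movieCode = F.movieCode
  INNER JOIN CUSTOMERS U ON R.cardNumber = U.cardNumber
GROUP BY F.title, R.date, U.name
ORDER BY F.title, R.date, U.name
""").fetchall()

for row in rows:
    print(row)
```

Ann rented two different copies of the same movie on the same day, so the corresponding fact has number = 2: the copy (positionOnShelf) is no longer visible at the granularity chosen for the fact schema.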

The following relational logical schema describes an operational DB for flights:

FLIGHTS (flightNumber, airline, fromAirport:AIRPORTS)
FLIGHT_INSTANCES (flightNumber:FLIGHTS, date)
AIRPORTS (IATAcode, name, city, country)
TICKETS (ticketNumber, (flightNumber, date):FLIGHT_INSTANCES, seat, fare, passengerFirstName, passengerSurname, passengerGender)
CHECK_IN (ticketNumber:TICKETS, checkInTime, numberOfBags)

Pruning and grafting the attribute tree:

[Figure: attribute tree for the fact TICKET ISSUE, rooted in ticketNumber (TICKETS), before and after editing. Before: ticketNumber (CHECK_IN), checkInTime, numberOfBags, seat, fare, passengerFirstName, passengerLastName, passengerGender, flightNumber (FLIGHT_INSTANCES), date, flightNumber (FLIGHTS), airline, carrier, departureTime, fromAirport, toAirport, name, city, country. After: check-in, numberOfBags, fare, passengerGender, date, flightNumber, airline, carrier, departureTime, fromAirport, toAirport, city, country.]

- country is now the child of city
- check-in is now a Boolean attribute, added to the tree when the ticketNumber (CHECK_IN) node was grafted: its value is TRUE only for tickets whose passengers have checked in
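Pruning and grafting can be sketched as two small operations on a tree stored as a parent-to-children mapping. This is only an illustration under stated assumptions: the functions `subtree`, `prune` and `graft` are hypothetical helpers, and the tree below is a simplified version of the DVD rental attribute tree, not the complete one.

```python
def subtree(tree, node):
    """Return node followed by all of its descendants."""
    nodes = [node]
    for child in tree.get(node, []):
        nodes.extend(subtree(tree, child))
    return nodes


def prune(tree, node):
    """Pruning: drop node together with its whole subtree."""
    doomed = set(subtree(tree, node))
    return {
        parent: [c for c in children if c not in doomed]
        for parent, children in tree.items()
        if parent not in doomed
    }


def graft(tree, node):
    """Grafting: drop node only, attaching its children to its parent."""
    grafted = {}
    for parent, children in tree.items():
        if parent == node:
            continue
        new_children = []
        for child in children:
            if child == node:
                new_children.extend(tree.get(node, []))
            else:
                new_children.append(child)
        grafted[parent] = new_children
    return grafted


ROOT = "positionOnShelf(RENTALS)"
# Simplified, partly hypothetical attribute tree (parent -> children).
tree = {
    ROOT: ["date", "time", "positionOnShelf(COPIES)", "cardNumber(CARDS)"],
    "positionOnShelf(COPIES)": ["movieCode"],
    "movieCode": ["title", "category"],
    "cardNumber(CARDS)": ["expiry", "cardNumber(CUSTOMERS)"],
    "cardNumber(CUSTOMERS)": ["name", "gender", "telephone"],
}

step1 = graft(tree, "positionOnShelf(COPIES)")  # movieCode moves up to the root
step2 = prune(step1, "time")                     # time disappears
step3 = prune(step2, "telephone")                # telephone disappears
```

Grafting keeps the descendants of the removed node, whereas pruning discards them; this is why intermediate keys are usually grafted and uninteresting leaves are pruned.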

Fact schema TICKET ISSUE:

[Figure: final attribute tree and derived fact schema TICKET ISSUE. Measures: numberOfFlights, numberOfBags, receipts. Attributes shown: date, check-in, passengerGender, flightNumber, airline, carrier, departureTime, arrivalTime, from and to Airport, city, country.]

SQL measure glossaries for fact schema TICKET ISSUE:

numberOfFlights = SELECT COUNT(*)
                  FROM TICKETS T INNER JOIN FLIGHT_INSTANCES I
                    ON T.flightNumber = I.flightNumber AND T.date = I.date
                  GROUP BY T.passengerGender, I.date, T.flightNumber;

numberOfBags = SELECT SUM(C.numberOfBags)
               FROM TICKETS T INNER JOIN FLIGHT_INSTANCES I
                 ON T.flightNumber = I.flightNumber AND T.date = I.date
               INNER JOIN CHECK_IN C
                 ON T.ticketNumber = C.ticketNumber
               GROUP BY T.passengerGender, I.date, T.flightNumber;

receipts = SELECT SUM(T.fare)
           FROM TICKETS T INNER JOIN FLIGHT_INSTANCES I
             ON T.flightNumber = I.flightNumber AND T.date = I.date
           GROUP BY T.passengerGender, I.date, T.flightNumber;

The check-in dimension was left out to avoid making the query too complex.

The following relational logical schema describes an operational database for car rentals:

RENTAL_OFFICES (OfficeName, City, Area, State, Country)
CARS (LicensePlate, Category, Model, Brand, Fuel, RegistrationDate)
HAVE_OPTIONAL (LicensePlate:CARS, Optional)
RENTALS (LicensePlate:CARS, PickupDate, DropoffDate, PickupPlace:RENTAL_OFFICES, DropoffPlace:RENTAL_OFFICES, Miles)
DRIVERS (LicenseNumber, LicenseExpiration, DriverName, Birthdate)
DRIVE (LicenseNumber:DRIVERS, (LicensePlate, PickupDate):RENTALS)
INSURANCES (Risk, (LicensePlate, PickupDate):RENTALS, Cost)
PAYMENTS ((LicensePlate, PickupDate):RENTALS, Amount, Discount, PaymentMode)

Some hidden functional dependencies hold: City->State->Country->Area, and Model->Brand. Inspect and normalize the source schema, then choose a fact of interest and design its fact schema.
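When inspecting a source schema for hidden functional dependencies such as City->State, it can help to test them against the data. The following sketch checks whether a dependency holds on a set of rows; the `holds` helper and the sample offices are hypothetical, and a dependency that holds on a sample is only a candidate that must still be confirmed with the domain experts.

```python
def holds(rows, lhs, rhs):
    """Return True if each value of column lhs maps to a single value of rhs."""
    seen = {}
    for row in rows:
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


# Hypothetical content of RENTAL_OFFICES.
offices = [
    {"OfficeName": "O1", "City": "Marseille", "State": "PACA",
     "Country": "France", "Area": "Europe"},
    {"OfficeName": "O2", "City": "Marseille", "State": "PACA",
     "Country": "France", "Area": "Europe"},
    {"OfficeName": "O3", "City": "Nice", "State": "PACA",
     "Country": "France", "Area": "Europe"},
    {"OfficeName": "O4", "City": "Bologna", "State": "Emilia-Romagna",
     "Country": "Italy", "Area": "Europe"},
]

chain = [("City", "State"), ("State", "Country"), ("Country", "Area")]
chain_holds = all(holds(offices, lhs, rhs) for lhs, rhs in chain)
reverse_holds = holds(offices, "State", "City")
```

Each dependency in the chain becomes a parent-child arc in the attribute tree once the schema has been normalized, which is how City, State, Country and Area end up forming a hierarchy.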

Choosing either RENTALS or PAYMENTS as the fact makes no difference here, because these two tables are related by a one-to-one link.

In the edited attribute tree, the drop-off date is pruned and replaced by a Duration attribute, computed as the number of days between the drop-off and the pick-up dates:

[Figure: attribute tree rooted in RENTALS, before and after editing. Attributes shown: LicensePlate, Model, Brand, Category, Fuel, registration date, pick-up and drop-off OfficeName, City, State, Country, Area, Miles, Amount, Discount, PaymentMode, Date, Month, Year, Duration.]
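The derived Duration attribute can be computed during loading from the two dates it replaces. A minimal sketch, assuming both dates are available as `datetime.date` values; the function name `rental_duration` is chosen for the example.

```python
from datetime import date


def rental_duration(pickup_date, dropoff_date):
    """Number of days between the pick-up and the drop-off dates."""
    return (dropoff_date - pickup_date).days


# A car picked up on 5 November 2013 and returned on 8 November 2013.
duration = rental_duration(date(2013, 11, 5), date(2013, 11, 8))
```

Storing the duration instead of the drop-off date reduces a second temporal dimension to a single numerical attribute with few distinct values.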

RENTAL fact schema:

[Figure: RENTAL fact schema derived from the edited attribute tree.]

The attribute tree with root in LIFTS is depicted below.

[Figure: attribute tree rooted in LIFTS. Attributes shown: Date, Time, Points, Duration, LiftSystemCode, TicketCode, OfficeCode+PurchaseDate+ProgressiveNumber, OfficeCode, Name, Address, PurchaseTime, ProgressiveNumber, HolderName, Discount, Description, Days, TypeCode, Type, Cost, Length.]

Note that, in the edited tree, the functional dependency from LiftSystemCode to Points has been removed to make Points a child of the root, so that it can be chosen as a measure. The same is done for Duration. The ticket and the skipass granularities are removed, and a SkipassOrTicket flag is added to
