
Database Design

Logical Design
Application independent and data-centered, using data and application semantics.
How do we say that one schema (set of relations) is better than another schema? What is the measure of goodness? How do we quantify it?

Normalization
Sharma Chakravarthy, UT Arlington, sharma@cse.uta.edu, http://www2.uta.edu/sharma

Is it possible to have a theory that analyzes schema, identifies its drawbacks, and transforms it into a better schema?
Database management systems:

Database Management Systems: Sharma Chakravarthy

S. Chakravarthy

Slide 2

Database Design
Physical Design
Application specific. How do we say that one design (choice of physical representation) is better than another? What is the measure of goodness, and how do we quantify it?
Performance (#transactions/sec, throughput, # of disk accesses)
Is it possible to have a theory that analyzes schema, identifies its drawbacks, and transforms it into a better schema?
Wizards, advisors, DBA! Model specific! DBMS specific!

Normalization
Need to understand:
Functional dependencies (FDs)
Multi-valued dependencies (MVDs)
Join dependencies (JDs)
Closure property
Loss-less join property
Decomposition

Normal Forms
Using FDs: 1NF (First Normal Form), 2NF (Second Normal Form), 3NF (Third Normal Form), BCNF (Boyce-Codd Normal Form)
Based on MVDs: 4NF (Fourth Normal Form)
Based on JDs: 5NF (Fifth Normal Form)
Others: Domain-Key Normal Form (DK/NF), Restriction-Union Normal Form

Normal Forms
Advantages
Data redundancy is minimized, which simplifies maintenance and reduces storage.
There are more tables than in an unnormalized database/schema, so there can be more clustered indexes.
Each table contains information about a single entity, so each index has fewer columns.
Each table has fewer indexes, so inserts, deletes, and updates are more efficient.
Disadvantages
More joins must be performed for the same query!

Normal Forms: how are they related?
(Figure: nested normal forms, from innermost to outermost: 5NF, 4NF, BCNF, 3NF, 2NF, 1NF. A relation in a given normal form is also in every normal form that encloses it.)

Functional Dependency
Generalization of the key concept.
Given the key value, all other attribute values can be determined (e.g., ssn → name, address, phone#). Each key value uniquely determines the values of all other attributes in a relation.

A functional dependency (FD) is denoted by X → Y, where X and Y are sets of attributes of a relation. Whenever two tuples agree on their X-value, they also agree on their Y-value, that is:
If t1[X] = t2[X], then t1[Y] = t2[Y]
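The definition can be turned into a check against one relation instance. A minimal sketch, assuming an instance is a Python list of dicts (one dict per tuple); `fd_holds` is a hypothetical helper, not a library function. It can only refute an FD on the given instance: a True result does not show that the FD holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two tuples in `rows` violate the FD lhs -> rhs.

    rows: list of dicts, one per tuple, mapping attribute name -> value.
    lhs, rhs: lists of attribute names (X and Y).
    """
    seen = {}  # X-value -> the Y-value first seen with it
    for t in rows:
        x_val = tuple(t[a] for a in lhs)
        y_val = tuple(t[a] for a in rhs)
        # Two tuples that agree on X but disagree on Y violate X -> Y.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


# Hypothetical three-tuple instance of R(A, B, C).
r = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 1, "B": "x", "C": 20},
    {"A": 2, "B": "y", "C": 10},
]
print(fd_holds(r, ["A"], ["B"]))  # True
print(fd_holds(r, ["A"], ["C"]))  # False: A=1 appears with C=10 and C=20
```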


Functional Dependency
SCP table
S#   City     P#   qty
S1   London   P1   100
S1   London   P2   100
S2   Paris    P1   200
S2   Paris    P2   200
S3   Paris    P2   300
S4   London   P2   400
S4   London   P4   400
S4   London   P5   400

FD - example
S# → city
S#, P# → qty
S#, P# → city
{S#, P#} → {city, qty}
S#, P# → S#
S#, P# → S#, P#, city, qty
S# → qty
qty → S#
S# → P# is NOT true!
city → P# is NOT true!

In the previous example, how did we obtain/generate the functional dependencies?
By looking at the table! Is that a correct way to derive FDs?
How do we obtain FDs in general?
Based on data/application semantics, not by eyeballing the relation.
Tuples in a relation change (tables grow and shrink), but FDs need to hold for all tuples at all times!

FD (example)
SCP table (one tuple added: S1, Paris, P1, 100)
S#   City     P#   qty
S1   London   P1   100
S1   London   P2   100
S2   Paris    P1   200
S2   Paris    P2   200
S3   Paris    P2   300
S4   London   P2   400
S4   London   P4   400
S1   Paris    P1   100
S4   London   P5   400

Not true any more:
S# → city
S#, P# → city
{S#, P#} → {city, qty}
S#, P# → S#, P#, city, qty
Still true:
S#, P# → S#
S#, P# → qty
S#, P#, city → S#, P#, city, qty
S# → qty
qty → S#
Moral: we CANNOT derive FDs by looking at a snapshot; an FD has to be true for all instances of a table.
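The two SCP snapshots can be checked mechanically. This sketch redefines the same hypothetical `fd_holds` helper (so the block runs on its own) and tests the candidate FDs from the FD example against both snapshots. Only the False results are conclusive: S# → qty and qty → S# happen to survive both snapshots, but that does not make them FDs of the application.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two tuples agree on lhs but differ on rhs."""
    seen = {}
    for t in rows:
        x_val = tuple(t[a] for a in lhs)
        y_val = tuple(t[a] for a in rhs)
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


COLS = ("S#", "city", "P#", "qty")
before = [dict(zip(COLS, t)) for t in [
    ("S1", "London", "P1", 100), ("S1", "London", "P2", 100),
    ("S2", "Paris", "P1", 200), ("S2", "Paris", "P2", 200),
    ("S3", "Paris", "P2", 300), ("S4", "London", "P2", 400),
    ("S4", "London", "P4", 400), ("S4", "London", "P5", 400),
]]
# Second snapshot: the same table after inserting (S1, Paris, P1, 100).
# Tuple order does not matter for FD checks.
after = before + [dict(zip(COLS, ("S1", "Paris", "P1", 100)))]

CANDIDATES = [
    (["S#"], ["city"]),
    (["S#", "P#"], ["qty"]),
    (["S#", "P#"], ["city"]),
    (["S#", "P#"], ["city", "qty"]),
    (["S#"], ["qty"]),
    (["qty"], ["S#"]),
    (["S#"], ["P#"]),
    (["city"], ["P#"]),
]

for lhs, rhs in CANDIDATES:
    name = ", ".join(lhs) + " -> " + ", ".join(rhs)
    print(f"{name:<22} first: {fd_holds(before, lhs, rhs)!s:<6} second: {fd_holds(after, lhs, rhs)}")
```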

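Trivial and non-trivial FDs
An FD X → Y is trivial if Y is a subset of X; otherwise it is non-trivial.
x → x and x, y → x are trivial FDs.
x, y → u, v implies x, y → u and x, y → v.
x → y and y → z implies x → z (transitive FD).
The set of FDs implied by a given set S of FDs is called the closure of S and is denoted by S+. How do we compute the closure?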

Keys
Key: a minimal set of attributes that functionally determines all other attributes of the relation.
Candidate keys: a relation may have several keys; each of them is a candidate key.
Primary key: the candidate key chosen to identify tuples.
Superkey: a set of attributes that contains a key; it need not be minimal.

Armstrong's inference rules
1. Reflexivity: if Y is a subset of X, then X → Y
2. Augmentation: if X → Y, then XZ → YZ
3. Transitivity: if X → Y and Y → Z, then X → Z
The above rules are sound and complete.
Soundness: they do not generate any incorrect FDs.
Completeness: they generate all FDs that can be inferred from a given set of FDs.

Inferring a FD
Suppose we are given a relation R with attributes A, B, C, D, E, F, and the FDs
A → BC, B → E, CD → EF
Show that the FD AD → F holds in R.
1. A → BC (given)
2. A → C (1, decomposition)
3. AD → CD (2, augmentation)
4. CD → EF (given)
5. AD → EF (3 and 4, transitivity)
6. AD → F (5, decomposition)

Inferring a FD
Suppose we are given a relation R with attributes A, B, C, G, H, I, and the FDs
A → B, A → C, CG → H, CG → I, B → H
Show that the following FDs can be inferred:
A → H
CG → HI
AG → I
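The six-step derivation of AD → F can be replayed with Armstrong's rules written as small functions. This is an illustrative sketch: FDs are pairs of frozensets of single-letter attributes, the function names are made up for this example, and decomposition is implemented as a derived rule (reflexivity followed by transitivity) rather than as an axiom.

```python
# An FD is a pair (lhs, rhs) of frozensets of single-letter attribute names.
def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))


def reflexivity(x, y):
    """Rule 1: if Y is a subset of X, then X -> Y."""
    x, y = frozenset(x), frozenset(y)
    if not y <= x:
        raise ValueError("reflexivity requires Y to be a subset of X")
    return (x, y)


def augmentation(f, z):
    """Rule 2: from X -> Y infer XZ -> YZ."""
    z = frozenset(z)
    return (f[0] | z, f[1] | z)


def transitivity(f, g):
    """Rule 3: from X -> Y and Y -> Z infer X -> Z."""
    if f[1] != g[0]:
        raise ValueError("right side of f must equal left side of g")
    return (f[0], g[1])


def decomposition(f, part):
    """Derived rule: from X -> YZ infer X -> Y (reflexivity, then transitivity)."""
    return transitivity(f, reflexivity(f[1], part))


def show(f):
    return "".join(sorted(f[0])) + " -> " + "".join(sorted(f[1]))


f1 = fd("A", "BC")            # given
f2 = decomposition(f1, "C")   # A -> C
f3 = augmentation(f2, "D")    # AD -> CD
f4 = fd("CD", "EF")           # given
f5 = transitivity(f3, f4)     # AD -> EF
f6 = decomposition(f5, "F")   # AD -> F
print(show(f6))               # AD -> F
```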

Closure of a set of attributes
Algorithm for computing the closure of a set of attributes X (denoted by X+) for a given set of functional dependencies F:
X+ := X
repeat
    oldX+ := X+
    for each FD Y → Z in F do
        if Y ⊆ X+ then X+ := X+ ∪ Z
until (oldX+ = X+)

Closure of a set of attributes
Compute (AG)+ for the previous example:
(AG)+ := AG
is AGBC (using A → B and A → C)
is AGBCHI (using CG → H and CG → I)
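The closure algorithm translates almost line for line into Python. A minimal sketch, assuming FDs are given as (lhs, rhs) pairs where each side is an iterable of attribute names (single-letter strings here, so "CG" means {C, G}). The hypothetical `implies` helper uses the fact that F implies X → Y exactly when Y ⊆ X+, which also confirms the three FDs of the second Inferring a FD exercise.

```python
def attribute_closure(attrs, fds):
    """Compute X+ with the repeat/until loop from the slide.

    attrs: iterable of attribute names (X).
    fds: list of (lhs, rhs) pairs; each side is an iterable of attribute names.
    """
    closure = set(attrs)                  # X+ := X
    changed = True
    while changed:                        # until oldX+ = X+
        changed = False
        for lhs, rhs in fds:              # for each FD Y -> Z in F
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)       # X+ := X+ union Z
                changed = True
    return closure


def implies(fds, lhs, rhs):
    """F implies X -> Y exactly when Y is contained in X+."""
    return set(rhs) <= attribute_closure(lhs, fds)


# FDs of the second "Inferring a FD" example.
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print("".join(sorted(attribute_closure("AG", F))))   # ABCGHI
for lhs, rhs in [("A", "H"), ("CG", "HI"), ("AG", "I")]:
    print(lhs, "->", rhs, implies(F, lhs, rhs))     # all True
```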

Significance of closure of a set of attributes
Can be used to check whether a set of attributes forms a key (e.g., (AG)+ contains every attribute of R, so AG is a superkey).
Can be used to check whether a set of attributes is NOT a key.
Need to have a complete set of dependencies!
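Key checks follow directly from the closure. A sketch with hypothetical helper names, on the same R(A, B, C, G, H, I): the superkey test compares X+ with R, and the candidate-key test adds minimality.

```python
def attribute_closure(attrs, fds):
    """X+ under fds; each FD is an (lhs, rhs) pair of attribute iterables."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def is_superkey(attrs, schema, fds):
    """X is a superkey of R if X+ contains every attribute of R."""
    return set(schema) <= attribute_closure(attrs, fds)


def is_key(attrs, schema, fds):
    """X is a (candidate) key if it is a superkey and no proper subset is one.

    Any superset of a superkey is a superkey, so it is enough to try
    dropping one attribute at a time.
    """
    attrs = set(attrs)
    return is_superkey(attrs, schema, fds) and all(
        not is_superkey(attrs - {a}, schema, fds) for a in attrs
    )


R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(is_key("AG", R, F))        # True
print(is_superkey("CG", R, F))   # False: (CG)+ = CGHI
print(is_key("ABG", R, F))       # False: a superkey, but not minimal
```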



Anomalies: insertion
Consider the EMP_DEPT table:
Ename  Ssn  Bdate  Add   Dnum  Dname  Dmgrssn
e1     123  89     Abcd  5     Res    345
e2     124  88     Abde  5     Res    345
e1     125  85     Abdf  6     Cse    346
e1     126  89     Abdg  6     Cse    346
e1     127  91     Abdh  6     Cse    346
e1     346  80     Abdi  6     Cse    346
To insert a new employee tuple, we need to give the details of the department the employee works for (or nulls).
It is not possible to insert a department tuple that has NO employees as yet, because the key value (Ssn) of EMP_DEPT cannot be NULL!

Modification Anomaly
Consider the EMP_DEPT table:
Ename  Ssn  Bdate  Add   Dnum  Dname  Dmgrssn
e1     123  89     Abcd  5     Res    345
e2     124  88     Abde  5     Res    345
e1     125  85     Abdf  6     Cse    346
e1     126  89     Abdg  6     Cse    346
e1     127  91     Abdh  6     Cse    346
e1     346  80     Abdi  6     Cse    346
Any change to the department attributes in EMP_DEPT requires that all tuples that refer to the department being modified be changed accordingly!

Deletion Anomaly
Consider the EMP_DEPT table:
Ename  Ssn  Bdate  Add   Dnum  Dname  Dmgrssn
e1     123  89     Abcd  5     Res    345
e2     124  88     Abde  5     Res    345
e1     125  85     Abdf  6     Cse    346
e1     126  89     Abdg  6     Cse    346
e1     127  91     Abdh  6     Cse    346
e1     346  80     Abdi  6     Cse    346
If we delete the last remaining employee of a department, the information about that department is lost!

First Normal Form (1NF)
A first normal form relation must have an atomic (i.e., simple) value for each of its attributes in every tuple. This means that a relation with set-valued attributes and/or nested subrelations (subtables) arising from composite attributes will not be in 1NF.

Example
Name         Age  Child name   Previous job: Years  Company   Title
J Smith      40   Alice        75-80                Abc Chem  Engineer
H. Nicholas  50   Ben, Donna   70-79                Xerox     Accountant

(Figure: EER diagram of entity person with attributes name and age, multivalued attribute childname, and composite attribute Previous job (years, company, title).)

Example (Contd.)
Name         Age  Child name  Years  Company   Title
J Smith      40   Alice       75-80  Abc Chem  Engineer
H. Nicholas  50   Ben         70-79  Xerox     Accountant
H. Nicholas  50   Donna       70-79  Xerox     Accountant

If you follow the EER mapping properly, you will always have relations in 1NF.
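Flattening a set-valued attribute, as in the (Contd.) table, can be sketched as below. The dict representation and the `unnest` helper are assumptions for illustration. This only produces the flat table shown above; a proper EER mapping would instead put multivalued attributes such as childname into a relation of their own.

```python
def unnest(rows, attr):
    """Flatten one set-valued attribute: emit one tuple per element of row[attr]."""
    flat = []
    for row in rows:
        for value in row[attr]:
            new_row = dict(row)      # shallow copy; the input rows are not modified
            new_row[attr] = value    # replace the set by a single atomic value
            flat.append(new_row)
    return flat


person = [
    {"name": "J Smith", "age": 40, "childname": ["Alice"],
     "years": "75-80", "company": "Abc Chem", "title": "Engineer"},
    {"name": "H. Nicholas", "age": 50, "childname": ["Ben", "Donna"],
     "years": "70-79", "company": "Xerox", "title": "Accountant"},
]

for row in unnest(person, "childname"):
    print(row)
```

Note that H. Nicholas's age and job are now repeated in two tuples; that kind of redundancy is what the higher normal forms address.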

Prime and non-prime attributes
An attribute that is part of a key (any candidate key, not just the primary key) is called a prime attribute.
An attribute that is not part of any key (any candidate key, not just the primary key) is called a non-prime attribute.
Given a relation and its keys, we can identify its prime and non-prime attributes.
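Prime attributes are determined by the candidate keys, which can be found by brute force for small schemas. A sketch using the R(ABCD) example with F = {AB → C, B → D, BC → A} from the 3NF slides; the helper names are hypothetical, and the search is exponential in the number of attributes, so it is only meant for classroom-size relations.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """X+ under fds; each FD is an (lhs, rhs) pair of attribute iterables."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def candidate_keys(schema, fds):
    """Brute force: try attribute sets in increasing size, keep minimal superkeys."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if set(schema) <= attribute_closure(cand, fds):
                keys.append(cand)
    return keys


def prime_attributes(schema, fds):
    """Attributes that appear in at least one candidate key."""
    return set().union(*candidate_keys(schema, fds))


R = "ABCD"
F = [("AB", "C"), ("B", "D"), ("BC", "A")]
print(sorted("".join(sorted(k)) for k in candidate_keys(R, F)))  # ['AB', 'BC']
print(sorted(prime_attributes(R, F)))                            # ['A', 'B', 'C']
print(sorted(set(R) - prime_attributes(R, F)))                   # ['D']
```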

Second Normal Form (2NF)
A relation R is in 2NF if every non-prime attribute is fully functionally dependent on each relation key.
A relation not in 2NF embodies two disjoint facts together.
e.g., Sales (Dept, Item, Price)
Key: {Dept, Item}; Item → Price
Price is NOT fully functionally dependent on the key {Dept, Item}.

2NF (Contd.)
The term full functional dependency is used to indicate the minimum set of attributes on the left side of an FD.
Formally, a set of attributes Y is fully functionally dependent on a set of attributes X if
1. Y is functionally dependent on X, and
2. Y is NOT functionally dependent on any proper subset of X.
If AB → C and A → C, then C is NOT fully functionally dependent on AB.
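Full functional dependency and the 2NF condition can be tested with the same closure routine. A sketch with hypothetical helper names; keys are passed in explicitly rather than computed.

```python
def attribute_closure(attrs, fds):
    """X+ under fds; each FD is an (lhs, rhs) pair of attribute iterables."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def fully_dependent(y, x, fds):
    """X -> Y holds, but no proper subset of X determines Y.

    If some proper subset S of X determined Y, so would X minus any attribute
    outside S, so testing one-attribute removals is enough.
    """
    y, x = set(y), set(x)
    if not y <= attribute_closure(x, fds):
        return False
    return all(not y <= attribute_closure(x - {a}, fds) for a in x)


def partial_dependencies(schema, keys, fds):
    """(attribute, key) pairs where a non-prime attribute is not fully dependent on a key."""
    prime = set().union(*(set(k) for k in keys))
    return [
        (a, k)
        for k in keys
        for a in sorted(set(schema) - prime)
        if not fully_dependent([a], k, fds)
    ]


# Slide example: AB -> C and A -> C.
print(fully_dependent("C", "AB", [("AB", "C"), ("A", "C")]))  # False
print(fully_dependent("C", "AB", [("AB", "C")]))              # True

# Sales(Dept, Item, Price), key {Dept, Item}, with Item -> Price.
SALES = ["Dept", "Item", "Price"]
SALES_FDS = [(["Item"], ["Price"])]
print(partial_dependencies(SALES, [("Dept", "Item")], SALES_FDS))
# [('Price', ('Dept', 'Item'))], so Sales is not in 2NF
```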

Loss-less join
To convert the previous relation into 2NF, separate the different facts by decomposing the relation into Sales(Dept, Item) and Iteminfo(Item, Price).
The above decomposition is loss-less. That is, if the two decomposed relations are joined (on Item), you get the original relation back!

Sales
Dept  Item  Price
D1    I1    p1
D1    I2    p2
D2    I2    p2

Sales
Dept  Item
D1    I1
D1    I2
D2    I2

Iteminfo
Item  Price
I1    p1
I2    p2
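The loss-less decomposition of Sales, and the lossy one on the Lossy join slide, can be replayed on the actual tuples. A minimal sketch, assuming rows are dicts and using hand-written `project` and `natural_join` helpers (hypothetical, not a database API); the second join reproduces the two spurious tuples listed on the Lossy join slide.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; each row is a dict."""
    out = []
    for r in rows:
        p = {a: r[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out


def natural_join(r1, r2):
    """Combine every pair of tuples that agree on all shared attributes."""
    out = []
    for t1 in r1:
        for t2 in r2:
            common = set(t1) & set(t2)
            if all(t1[a] == t2[a] for a in common):
                out.append({**t1, **t2})
    return out


def as_set(rows, attrs):
    return {tuple(r[a] for a in attrs) for r in rows}


ATTRS = ("Dept", "Item", "Price")
sales = [dict(zip(ATTRS, t)) for t in
         [("D1", "I1", "p1"), ("D1", "I2", "p2"), ("D2", "I2", "p2")]]

# Loss-less: split on Item, since Item -> Price.
good = natural_join(project(sales, ["Dept", "Item"]), project(sales, ["Item", "Price"]))
print(as_set(good, ATTRS) == as_set(sales, ATTRS))    # True

# Lossy: split on Dept instead.
bad = natural_join(project(sales, ["Dept", "Item"]), project(sales, ["Dept", "Price"]))
print(sorted(as_set(bad, ATTRS) - as_set(sales, ATTRS)))
# [('D1', 'I1', 'p2'), ('D1', 'I2', 'p1')]
```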


Lossy join
For the same relation, if we decompose it as shown below, it is NOT a loss-less join. We get 2 extra (spurious) tuples:
D1  I1  p2
D1  I2  p1

Dept  Item  Price
D1    I1    p1
D1    I2    p2
D2    I2    p2

Dept  Item
D1    I1
D1    I2
D2    I2

Dept  Price
D1    p1
D1    p2
D2    p2

2NF Normalization
Dept (DNAME, DNUM, DMGRSSN, DLOC)
DLOC is a set-valued attribute.
Why?

2NF Normalization
Dept (DNUM, DLOC, DNAME, DMGRSSN)
But DNUM → DNAME and DNUM → DMGRSSN.
DNAME and DMGRSSN are NOT fully functionally dependent on {DNUM, DLOC}.

2NF Normalization
Decomposed into:
Dept (DNUM, DNAME, DMGRSSN)
Dept Loc (DNUM, DLOC)
Loss-less decomposition
How do we decompose?
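For a decomposition into two relations there is a standard closure-based test: the split of R into R1 and R2 is loss-less if and only if R1 ∩ R2 → R1 − R2 or R1 ∩ R2 → R2 − R1 follows from F. A sketch (helper names hypothetical), assuming the FDs are stated on the original relation; it covers only two-way splits.

```python
def attribute_closure(attrs, fds):
    """X+ under fds; each FD is an (lhs, rhs) pair of attribute iterables."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def is_lossless_binary(r1, r2, fds):
    """Loss-less iff (R1 & R2) determines all of R1 - R2 or all of R2 - R1."""
    r1, r2 = set(r1), set(r2)
    common_plus = attribute_closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


DEPT_FDS = [(["DNUM"], ["DNAME", "DMGRSSN"])]
print(is_lossless_binary(["DNUM", "DNAME", "DMGRSSN"], ["DNUM", "DLOC"], DEPT_FDS))  # True

SALES_FDS = [(["Item"], ["Price"])]
print(is_lossless_binary(["Dept", "Item"], ["Item", "Price"], SALES_FDS))   # True
print(is_lossless_binary(["Dept", "Item"], ["Dept", "Price"], SALES_FDS))   # False
```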

2NF Normalization
(Prop_id#, County_name, Lot#, Area, Price, Tax_rate)

2NF Normalization
County_name → Tax_rate
Area → Price does not affect 2NF.
Two keys: Prop_id# and {County_name, Lot#}
Prime attributes: _____________________
Non-prime attributes: _________________

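2NF Normalization
Decomposed into:
(Prop_id#, County_name, Lot#, Area, Price)
(County_name, Tax_rate)

2NF Normalization
(SSN, Pnumber, Hours, Ename, Pname, Plocation)
SSN, Pnumber → Hours
SSN → Ename
Pnumber → Pname, Plocation

2NF Normalization
Decomposed into:
(SSN, Pnumber, Hours)
(SSN, Ename)
(Pnumber, Pname, Plocation)
All three are in 2NF.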

Points to Note!
A 1NF relation must have an atomic value for each of its attributes in every tuple.
When there is more than one attribute in the key, the relation MUST be verified for 2NF.
Typically, relations are decomposed along the lines of functional dependencies to be in 2NF.

3NF (Third Normal Form)
A relation scheme R with dependencies F is said to be in 3NF if, whenever X → A holds in R and A is not in X, then
a) X is a superkey for R, or
b) A is prime.
A relation R is in 3NF if it has the following properties:
1. The relation R is in 2NF.
2. The non-prime attributes are mutually independent; i.e., no non-prime attribute is functionally dependent on another non-prime attribute.

3NF Explained
A relation schema R is in 3NF if, for every FD X → A that holds on R (where X ⊆ R and A ∈ R), at least one of the following holds:
X → A is a trivial FD;
X is a superkey for R;
A is contained in a candidate key for R (i.e., A is a prime attribute of R).

(Figure: 3NF, showing a key and the non-prime attributes A and B.)


3NF Example
(Prop_id#, County_name, Lot#, Area, Price)
Area → Price: Area is not a superkey, and Price is NOT a prime attribute, so the relation is not in 3NF.

3NF Decomposition
Decomposed into:
(Prop_id#, County_name, Lot#, Area)
(Area, Price)
Both are in 3NF.


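3NF Example
EMP_DEPT (EName, SSN, Bdate, ADD, DNUM, DNAME, DMGRSSN)
SSN → EName, Bdate, ADD, DNUM
DNUM → DNAME, DMGRSSN
DNAME and DMGRSSN depend on the key SSN only transitively (via DNUM), so EMP_DEPT is not in 3NF.

3NF Example
Decompose into:
(EName, SSN, Bdate, ADD, DNUM): 3NF
(DNUM, DNAME, DMGRSSN): 3NF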

3NF Example
R(ABCD), F = {AB → C, B → D, BC → A}
AB is a key; BC is also a key. Here B → D holds, B is not a superkey, and D is NOT a prime attribute. Hence R is not in 3NF.

3NF Example
R(City, Zipcode, Street), abbreviated C, Z, S
F = {CS → Z, Z → C}
Both CS and SZ are keys. Hence all attributes are prime. In spite of Z → C, R is in 3NF because C is prime.
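Both 3NF examples can be checked mechanically. A sketch with hypothetical helper names that finds the candidate keys by brute force, derives the prime attributes, and flags every given FD X → A with A not in X, X not a superkey, and A not prime. Attributes are single letters (C, Z, S for City, Zipcode, Street) for brevity; checking only the given FDs is enough when they are stated on R itself.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """X+ under fds; each FD is an (lhs, rhs) pair of attribute iterables."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so not minimal
            if set(schema) <= attribute_closure(cand, fds):
                keys.append(cand)
    return keys


def third_nf_violations(schema, fds):
    """(X, A) pairs from the given FDs with A not in X, X not a superkey, A not prime."""
    schema = set(schema)
    prime = set().union(*candidate_keys(schema, fds))
    violations = []
    for lhs, rhs in fds:
        lhs = set(lhs)
        if schema <= attribute_closure(lhs, fds):
            continue  # X is a superkey
        for a in sorted(set(rhs) - lhs):
            if a not in prime:
                violations.append(("".join(sorted(lhs)), a))
    return violations


print(third_nf_violations("ABCD", [("AB", "C"), ("B", "D"), ("BC", "A")]))  # [('B', 'D')]
print(third_nf_violations("CSZ", [("CS", "Z"), ("Z", "C")]))                # []
```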

1 to 3NF
1NF: No multi-valued attributes.
2NF: A relation R is in 2NF if every non-prime attribute is fully functionally dependent on each relation key.
3NF: A relation scheme R with dependencies F is said to be in 3NF if, whenever X → A holds in R and A is not in X, then a) X is a superkey for R, or b) A is prime.

Boyce-Codd Normal Form
A relation scheme R with dependencies F is said to be in BCNF if, whenever X → A holds in R and A is not in X, then X is a superkey for R; that is, X is a key or contains a key.
Example:
R(City, Zipcode, Street), abbreviated C, Z, S
F = {CS → Z, Z → C}
Both CS and SZ are keys. In Z → C, Z is not a superkey. Hence R is not in BCNF, although it is in 3NF.
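The BCNF condition can be checked the same way as 3NF, minus the exception for prime attributes. A sketch with a hypothetical `bcnf_violations` helper; for the Grades example on a later slide it assumes the FDs SC → G, PC → G, S → P and P → S (S → P is needed for SC to be a key, as that slide states).

```python
def attribute_closure(attrs, fds):
    """X+ under fds; each FD is an (lhs, rhs) pair of attribute iterables."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def bcnf_violations(schema, fds):
    """Given FDs X -> Y that are nontrivial and whose X is not a superkey.

    Testing only the given FDs is enough when they are stated on this schema;
    it is NOT enough for a fragment produced by a decomposition.
    """
    schema = set(schema)
    bad = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if rhs <= lhs:
            continue  # trivial FD
        if not schema <= attribute_closure(lhs, fds):
            bad.append("".join(sorted(lhs)) + " -> " + "".join(sorted(rhs)))
    return bad


print(bcnf_violations("CSZ", [("CS", "Z"), ("Z", "C")]))    # ['Z -> C']
GRADES_FDS = [("SC", "G"), ("PC", "G"), ("S", "P"), ("P", "S")]
print(bcnf_violations("SPCG", GRADES_FDS))                  # ['S -> P', 'P -> S']
print(bcnf_violations("ABCD", [("A", "BCD"), ("D", "A")]))  # []
```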

BCNF Cont.
A relation R is in BCNF if every determinant (left side of an FD) in the relation is a relation key.
R = ({A, B, C, D}, {A → BCD, D → A}) is in BCNF, since both A and D are keys of R.

BCNF Example
(Figure: attributes City, Street, Zipcode, with CS → Z and Z → C.)
Z → C violates the BCNF definition.
Decompose into SZ and CZ: this does not preserve the dependency CS → Z!


BCNF Example
Grades(Stud_id, Ph#, Course, Grade), abbreviated S, P, C, G
F = {SC → G, PC → G, S → P, P → S}
SC is a key; PC is a key. But P → S (an FD) violates the BCNF definition.
Decompose into SP and SCG.

BCNF Example
Lending_scheme (branch_name, assets, branch_city, loan_number, cust_name, amount)
Key: {loan_number, cust_name}
FDs:
branch_name → assets
branch_name → branch_city
loan_number → amount
loan_number → branch_name



BCNF Example
Decompose the previous scheme into:
R1 = (branch_name, assets)
R2 = (branch_name, branch_city)
R3 = (loan_number, amount)
R4 = (loan_number, branch_name)
R5 = (loan_number, cust_name), with key {loan_number, cust_name}
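One standard way to obtain a BCNF decomposition is to repeatedly pick a violating X in some relation R and split R into X+ ∩ R and X ∪ (R − X+). A sketch of that algorithm (helper names hypothetical; the violation search tries every subset, so it is exponential). On Lending_scheme it yields (branch_name, assets, branch_city), (loan_number, amount, branch_name) and (loan_number, cust_name): these merge R1 with R2 and R3 with R4, and are likewise in BCNF and loss-less.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """X+ under fds; each FD is an (lhs, rhs) pair of attribute iterables."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def find_violation(rel, fds):
    """Return some X inside rel with X -> (X+ & rel) nontrivial and X not a
    superkey of rel, or None if rel is in BCNF.

    Testing every proper subset handles the projected FDs of a fragment.
    """
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            implied = attribute_closure(x, fds) & rel
            if implied != x and implied != rel:
                return x
    return None


def bcnf_decompose(schema, fds):
    """Split violating relations until every fragment is in BCNF."""
    pending, result = [set(schema)], []
    while pending:
        rel = pending.pop()
        x = find_violation(rel, fds)
        if x is None:
            result.append(rel)
            continue
        implied = attribute_closure(x, fds) & rel
        # The two parts share exactly X, and X determines the first part,
        # so the split is loss-less.
        pending.append(implied)
        pending.append(x | (rel - implied))
    return result


LENDING = ["branch_name", "assets", "branch_city",
           "loan_number", "cust_name", "amount"]
LENDING_FDS = [
    (["branch_name"], ["assets"]),
    (["branch_name"], ["branch_city"]),
    (["loan_number"], ["amount"]),
    (["loan_number"], ["branch_name"]),
]

for rel in bcnf_decompose(LENDING, LENDING_FDS):
    print(sorted(rel))
```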

BCNF Example
R(City, Zipcode, Street), abbreviated C, Z, S
F = {CS → Z, Z → C}
Both CS and SZ are keys.
In any decomposition in which no scheme contains all of C, S, and Z, the dependency CS → Z is not implied by the projected dependencies. For example, the decomposition into SZ and CZ does not preserve dependencies.
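Dependency preservation can be tested without computing the projected FDs: grow a set Z from X using only the closure each fragment can see, and check whether Y ends up inside Z (the usual textbook test). A sketch (helper names hypothetical) for the decomposition into SZ and CZ, followed by a data-level view using the two Yates St tuples from the BCNF Example Cont. slide: each fragment is fine on its own, but their join violates CS → Z.

```python
def attribute_closure(attrs, fds):
    """X+ under fds; each FD is an (lhs, rhs) pair of attribute iterables."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def is_preserved(fd, fragments, fds):
    """Is fd = (X, Y) implied by the FDs projected onto the fragments?

    Grows Z from X using only what each fragment Ri can see:
    Z := Z | ((Z & Ri)+ & Ri), until Z stops changing; then check Y <= Z.
    """
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for frag in fragments:
            gained = attribute_closure(z & set(frag), fds) & set(frag)
            if not gained <= z:
                z |= gained
                changed = True
    return set(rhs) <= z


F = [("CS", "Z"), ("Z", "C")]
FRAGMENTS = ["SZ", "CZ"]
print(is_preserved(("CS", "Z"), FRAGMENTS, F))  # False
print(is_preserved(("Z", "C"), FRAGMENTS, F))   # True

# Data-level view: join the (street, zip) and (city, zip) fragments on zip.
street_zip = [("415 Yates St", "76010"), ("415 Yates St", "76019")]
city_zip = [("Arlington", "76010"), ("Arlington", "76019")]
joined = [(s, c, z) for s, z in street_zip for c, z2 in city_zip if z == z2]
zips_for = {}
for s, c, z in joined:
    zips_for.setdefault((c, s), set()).add(z)
# CS -> Z fails if one (city, street) pair is joined with two different zips.
print(any(len(zs) > 1 for zs in zips_for.values()))  # True
```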

BCNF Example Cont.
street         zip
415 Yates St   76010
415 Yates St   76019

city           zip
Arlington      76010
Arlington      76019

Join tuples with equal zip codes:
street         city        zip
415 Yates St   Arlington   76010
415 Yates St   Arlington   76019

Although no FDs were violated in the decomposed relations, the FD CS → Z is violated by the database as a whole.
(Adapted from Jeff Ullman)

Summary
Loss-less join decomposition
Dependency-preserving decomposition
Any relation scheme has a loss-less join decomposition into BCNF.
Any relation scheme has a decomposition into 3NF that is both loss-less join and dependency preserving.
There may be no decomposition into BCNF that is dependency preserving.

Summary
Motivation behind normal forms:
A dependency is both an integrity constraint and a relationship that the database is intended to store.
Anomalies are avoided by the normal forms.
If we have a transitive dependency X → Y → Z, then we cannot associate a Y-value with an X-value unless there is a Z-value associated with the Y-value. This leads to insertion and deletion anomalies.
BCNF avoids some anomalies not prevented by 3NF; e.g., in the CSZ example, we cannot record the city to which a zip code belongs unless we know a street address with that zip code.
