Logical Design
Application independent, data-centered, using data and application semantics
How do we say that one schema (set of relations) is better than another schema?
What is the measure of goodness? How do we quantify it?
Normalization
Sharma Chakravarthy UT Arlington sharma@cse.uta.edu http://www2.uta.edu/sharma
Is it possible to have a theory that analyzes schema, identifies its drawbacks, and transforms it into a better schema?
Database management systems:
S. Chakravarthy
Slide 2
Database Design
Physical Design
Application specific
How do we say that one design (choices of physical representation) is better than another?
What is the measure of goodness and how do we quantify it?
Performance (#trans/sec, throughput, # of disk accesses)
Normalization
Need to understand
Functional dependencies (FDs)
Multi-valued dependencies (MVDs)
Join dependencies (JDs)
Closure property
Loss-less join property
Decomposition
Wizards, advisor, DBA! Model specific !!! DBMS specific !!!
Normal Forms
Using FDs:
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd Normal Form)
Based on MVDs: 4NF (Fourth Normal Form)
Based on JDs: 5NF (Fifth Normal Form)
Others: Domain key normal form (DK/NF), Restriction-Union Normal Form
Normal Forms
Advantages
Data redundancy is minimized, which simplifies maintenance and reduces storage
Has more tables than an unnormalized database/schema; can have more clustered indexes
Each table contains information about a single entity; each index has fewer columns
Each table has fewer indexes; insert, delete and update are more efficient
Disadvantages
More joins to be performed for the same query!
Functional Dependency
Generalization of the key concept
Given the key value, all other attribute values can be determined (e.g., ssn → name, address, phone#)
Each key value uniquely determines the values of all other attributes in a relation
A functional dependency (FD) is denoted by X → Y, where X and Y are sets of attributes of a relation. Whenever two tuples agree on their X-value, they also agree on their Y-value, that is:
If t1[X] = t2[X] then t1[Y] = t2[Y]
FD - example
SCP table
S#   City    P#   qty
S1   London  P1   100
S1   London  P2   100
S2   Paris   P1   200
S2   Paris   P2   200
S3   Paris   P2   300
S4   London  P2   400
S4   London  P4   400
S4   London  P5   400
S# → city
S#, P# → qty
S#, P# → city
{S#, P#} → {city, qty}
S#, P# → S#
S#, P# → S#, P#, city, qty
S# → qty
qty → S#
S# → P# is not true!
City → P# is not true!
FD (example)
SCP table
S#   City    P#   qty
S1   London  P1   100
S1   London  P2   100
S2   Paris   P1   200
S2   Paris   P2   200
S3   Paris   P2   300
S4   London  P2   400
S4   London  P4   400
S1   Paris   P1   100
S4   London  P5   400
Not true any more:
S# → city
S#, P# → qty
S#, P# → city
{S#, P#} → {city, qty}
Still true:
S#, P# → S#
S#, P#, city → S#, P#, city, qty
S# → qty
qty → S#
Moral: CANNOT derive FDs from looking at a snapshot; an FD has to be true for all instances of a table
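The definition "if t1[X] = t2[X] then t1[Y] = t2[Y]" can be tested mechanically against one snapshot. The sketch below is only an illustration: the helper name fd_holds is hypothetical, and the rows are the nine-row SCP snapshot with letter case normalised. In line with the moral on the slide, a True result only means that this snapshot does not contradict the FD; it does not establish the FD.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not contradicted by this set of rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first tuple with a given X-value fixes the expected Y-value.
        if seen.setdefault(x, y) != y:
            return False
    return True


COLUMNS = ("S#", "city", "P#", "qty")
DATA = [
    ("S1", "London", "P1", 100),
    ("S1", "London", "P2", 100),
    ("S2", "Paris", "P1", 200),
    ("S2", "Paris", "P2", 200),
    ("S3", "Paris", "P2", 300),
    ("S4", "London", "P2", 400),
    ("S4", "London", "P4", 400),
    ("S1", "Paris", "P1", 100),
    ("S4", "London", "P5", 400),
]
SCP = [dict(zip(COLUMNS, r)) for r in DATA]

s_to_city = fd_holds(SCP, ["S#"], ["city"])   # S1 appears with two cities
s_to_qty = fd_holds(SCP, ["S#"], ["qty"])
```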
X → Y and Y → Z implies X → Z (transitive FD)
The set of FDs implied by a given set S of FDs is called the closure of S and is denoted by S+
How do we compute closure?
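One common way to answer the closure question in practice is to avoid enumerating S+ and compute the closure of an attribute set instead: X → Y is in S+ exactly when Y is contained in the attribute closure of X. The sketch below assumes single-letter attribute names, and the helpers closure and fd are hypothetical names introduced here for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD when its whole left side is already in the result.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(text):
    """Parse 'xy->z' into a pair of attribute sets (one letter per attribute)."""
    lhs, rhs = text.split("->")
    return set(lhs.strip()), set(rhs.strip())


FDS = [fd("x->y"), fd("y->z")]
x_plus = closure({"x"}, FDS)   # contains z, so x -> z is implied
```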
Keys
Key
Minimal set of attributes that functionally determine all other attributes of the relation.
Candidate Keys
Primary Key
Superkey
A set of attributes that contains a key. Need not be minimal.
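The key and superkey definitions can be sketched as a brute-force search: try attribute sets from smallest to largest, keep those whose closure is the whole relation, and skip any set that already contains a smaller key (it is only a superkey). The function names are hypothetical, and the example uses the Lending_scheme attributes and FDs that appear later in this deck. The search is exponential in the number of attributes, so this is for small teaching examples only.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the full attribute set."""
    attrs = frozenset(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = frozenset(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key: superkey, not minimal
            if closure(cand, fds) == attrs:
                keys.append(cand)
    return keys


ATTRS = {"branch_name", "assets", "branch_city",
         "loan_number", "cust_name", "amount"}
FDS = [
    ({"branch_name"}, {"assets"}),
    ({"branch_name"}, {"branch_city"}),
    ({"loan_number"}, {"amount"}),
    ({"loan_number"}, {"branch_name"}),
]
keys = candidate_keys(ATTRS, FDS)
```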
Inferring a FD
Suppose we are given a relation R with attributes A, B, C, D, E, F, and the FDs
A → BC
B → E
CD → EF
Inferring a FD
Suppose we are given a relation R with attributes A, B, C, G, H, I, and the FDs
A → B
A → C
CG → H
CG → I
B → H
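The slides do not say which FDs are to be inferred, so the targets below are assumptions chosen for illustration. The derivations use the usual inference rules (augmentation, transitivity, union, decomposition), which the deck itself does not name.

```latex
% Second FD set: A -> B, A -> C, CG -> H, CG -> I, B -> H
\begin{align*}
A \to B,\; B \to H &\;\Rightarrow\; A \to H && \text{(transitivity)} \\
CG \to H,\; CG \to I &\;\Rightarrow\; CG \to HI && \text{(union)} \\
A \to C &\;\Rightarrow\; AG \to CG && \text{(augmentation with } G\text{)} \\
AG \to CG,\; CG \to I &\;\Rightarrow\; AG \to I && \text{(transitivity)}
\end{align*}

% First FD set: A -> BC, B -> E, CD -> EF
\begin{align*}
A \to BC &\;\Rightarrow\; A \to C && \text{(decomposition)} \\
A \to C &\;\Rightarrow\; AD \to CD && \text{(augmentation with } D\text{)} \\
AD \to CD,\; CD \to EF &\;\Rightarrow\; AD \to EF && \text{(transitivity)} \\
AD \to EF &\;\Rightarrow\; AD \to F && \text{(decomposition)}
\end{align*}
```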
Anomalies: insertion
Consider the EMP_DEPT table
Ename  Ssn  Bdate  Add   Dnum  Dname  dmgssn
e1     123  89     Abcd  5     Res    345
e2     124  88     Abde  5     Res    345
e1     125  85     Abdf  6     Cse    346
e1     126  89     Abdg  6     Cse    346
e1     127  91     Abdh  6     Cse    346
e1     346  80     Abdi  6     Cse    346
To insert a new emp tuple, we need to give details of the dept he works for (or null)
It is not possible to insert a dept tuple that has NO employees as yet. The key value of the emp_dept cannot be NULL!
Modification Anomaly
Consider the EMP_DEPT table
Ename Ssn Bdate Add Dnum Dname dmgssn
Any change to the dept attributes in EMP_DEPT requires that all tuples that refer to the dept tuple being modified be changed consistently!
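A small sketch of the modification anomaly, using the EMP_DEPT rows shown on the insertion-anomaly slide (letter case of Dname normalised). The new department name "Computer Science" and the helper dnum_determines_dname are made up for illustration. Renaming department 6 means touching every employee tuple of that department; if the update stops early, the table contradicts Dnum → Dname.

```python
COLS = ("Ename", "Ssn", "Bdate", "Add", "Dnum", "Dname", "dmgssn")
DATA = [
    ("e1", 123, 89, "Abcd", 5, "Res", 345),
    ("e2", 124, 88, "Abde", 5, "Res", 345),
    ("e1", 125, 85, "Abdf", 6, "Cse", 346),
    ("e1", 126, 89, "Abdg", 6, "Cse", 346),
    ("e1", 127, 91, "Abdh", 6, "Cse", 346),
    ("e1", 346, 80, "Abdi", 6, "Cse", 346),
]
emp_dept = [dict(zip(COLS, row)) for row in DATA]


def dnum_determines_dname(rows):
    """True if no two rows share a Dnum but disagree on Dname."""
    names = {}
    for row in rows:
        if names.setdefault(row["Dnum"], row["Dname"]) != row["Dname"]:
            return False
    return True


dept6 = [row for row in emp_dept if row["Dnum"] == 6]
rows_to_touch = len(dept6)           # one logical change, many tuples

dept6[0]["Dname"] = "Computer Science"   # update stops after one tuple
consistent_after_partial = dnum_determines_dname(emp_dept)

for row in dept6:                    # finish the update on every tuple
    row["Dname"] = "Computer Science"
consistent_after_full = dnum_determines_dname(emp_dept)
```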
Deletion Anomaly
Consider the EMP_DEPT table
Ename Ssn Bdate Add Dnum Dname dmgssn
1NF
A first normal form relation must have an atomic (i.e., simple) value for every attribute in every tuple. This means that a relation with set-valued attributes and/or nested sub-relations (sub-tables) from composite attributes will not be in 1NF.
Example
Name         Age  Child name  Years  Company   Title
J Smith      40   Alice       75-80  Abc Chem  Engineer
                  Ben, Donna  70-79  Xerox     Accountant
H. Nicholas  50
Example (Contd.)
Name         Age  Child name  Years  Company   Title
J Smith      40   Alice       75-80  Abc Chem  Engineer
                  Ben         70-79  Xerox     Accountant
                  Donna       70-79  Xerox     Accountant
H. Nicholas  50
[Figure: EER diagram; labels: person, name, age, childname, Previous job, years, company, title]
If you follow EER mapping properly, you will always have relations in 1NF
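A rough sketch of what the mapping does to the J Smith example. It assumes that name identifies a person and that children and previous jobs are independent multivalued attributes, which the slides do not state; the dictionary layout and the function to_1nf are hypothetical. Each multivalued attribute becomes its own flat relation, so every value in every tuple is atomic.

```python
person_nested = {
    "name": "J Smith",
    "age": 40,
    "children": ["Alice", "Ben", "Donna"],            # set-valued attribute
    "previous_jobs": [                                # nested sub-relation
        {"years": "75-80", "company": "Abc Chem", "title": "Engineer"},
        {"years": "70-79", "company": "Xerox", "title": "Accountant"},
    ],
}


def to_1nf(p):
    """Split one nested person record into three flat relations."""
    person = [(p["name"], p["age"])]
    child = [(p["name"], c) for c in p["children"]]
    prev_job = [(p["name"], j["years"], j["company"], j["title"])
                for j in p["previous_jobs"]]
    return person, child, prev_job


person, child, prev_job = to_1nf(person_nested)
```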
2NF
Dept, Item → price
price is not fully functionally dependent on (Dept, Item)
2NF (Contd.)
The term full functional dependency is used to indicate the minimum set of attributes on the left side of an FD.
Formally, a set of attributes Y is fully functionally dependent on a set of attributes X if
1. Y is functionally dependent on X
2. Y is NOT functionally dependent on any proper subset of X.
If AB → C and A → C, then C is NOT fully functionally dependent on AB.
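The two conditions translate directly into a check built on attribute closure. This is a minimal sketch with hypothetical function names, assuming single-letter attributes so that a string such as "AB" can stand for the set {A, B}.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fully_dependent(lhs, rhs, fds):
    """True if rhs depends on lhs and on no proper, non-empty subset of lhs."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        return False                      # condition 1 fails
    for size in range(1, len(lhs)):
        for sub in combinations(sorted(lhs), size):
            if rhs <= closure(sub, fds):
                return False              # condition 2 fails
    return True


WITH_PARTIAL = [(set("AB"), set("C")), (set("A"), set("C"))]
ONLY_FULL = [(set("AB"), set("C"))]
partial_case = fully_dependent("AB", "C", WITH_PARTIAL)
full_case = fully_dependent("AB", "C", ONLY_FULL)
```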
Item  price
I1    p1
I2    p2
Lossy join
For the same relation, if we decompose it as shown below, it is NOT a loss-less join. We get 2 extra (or spurious) tuples:
D1 I1 p2
D1 I2 p1
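The spurious tuples can be reproduced with a few set comprehensions. The three-row instance and the choice of projections below are assumptions: they are a reconstruction that is consistent with the two spurious tuples named on the slide, not a copy of the original tables. Splitting on Dept loses information, while splitting along Item → price does not.

```python
# Assumed instance of (Dept, Item, price).
R = {("D1", "I1", "p1"), ("D1", "I2", "p2"), ("D2", "I2", "p2")}


def project(rel, idx):
    """Project a set of tuples onto the given column positions."""
    return {tuple(t[i] for i in idx) for t in rel}


dept_item = project(R, (0, 1))
dept_price = project(R, (0, 2))
item_price = project(R, (1, 2))

# Natural join of (Dept, Item) and (Dept, price) on Dept.
lossy = {(d, i, p)
         for (d, i) in dept_item
         for (d2, p) in dept_price if d == d2}

# Natural join of (Dept, Item) and (Item, price) on Item.
lossless = {(d, i, p)
            for (d, i) in dept_item
            for (i2, p) in item_price if i == i2}

spurious = lossy - R
```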
Dept  Item      dept  price
D1    I1        D1    p1
D1    I2        D1    p2
D2    I2        D2    p2
Why?
2NF Normalization
Dept
DNAME DNUM DMGRSSN DLOC
2NF Normalization
Dept
DNUM, DLOC → DNAME, DMGRSSN
DNAME and DMGRSSN are NOT fully functionally dependent on DNUM, DLOC.
How do we decompose?
2NF Normalization
Decomposed into
Dept: DNUM DNAME DMGRSSN
Dept Loc: DNUM DLOC
Loss-less decomposition
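For a split into two relations there is a standard test based on FDs: the decomposition is loss-less if the common attributes functionally determine all of one side. The sketch below applies it to the Dept example, assuming the FD DNUM → DNAME, DMGRSSN (implied by the statement that DNAME and DMGRSSN are not fully dependent on DNUM, DLOC), and to a Dept/Item/price relation with the assumed FD Item → price. Function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """True if (r1 & r2) determines all of r1 or all of r2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


DEPT_FDS = [({"DNUM"}, {"DNAME", "DMGRSSN"})]
dept_ok = lossless_binary({"DNUM", "DNAME", "DMGRSSN"},
                          {"DNUM", "DLOC"}, DEPT_FDS)

ITEM_FDS = [({"Item"}, {"price"})]
split_on_dept = lossless_binary({"Dept", "Item"}, {"Dept", "price"}, ITEM_FDS)
split_on_item = lossless_binary({"Dept", "Item"}, {"Item", "price"}, ITEM_FDS)
```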
2NF Normalization
Prop_id#  County_name  Lot#  Area  Price  Tax-rate
County_name → Tax_rate
Area → Price does not affect the 2NF
Two keys: Prop_id# and (County_name, Lot#)
Prime attributes: _____________________
Non-prime attributes: _________________
2NF Normalization
Decomposed into:
Prop_id#  County_name  Lot#  Area  Price
County_name  Tax-rate
2NF Normalization
SSN  Pnumber  Hours  Ename  Pname  Plocation
2NF Normalization
SSN  Pnumber  Hours
SSN  Ename
Pnumber  Pname  Plocation
2NF
Points to Note !
A 1NF relation must have an atomic value for every attribute in every tuple.
When there is more than one attribute in the key, the relation MUST be verified for 2NF
Typically, relations are decomposed along the lines of functional dependencies to be in 2NF
3NF Explained
A relation schema R is in 3NF if, for every FD of the form X → A that holds on R, where X ⊆ R and A ∈ R, at least one of the following holds:
X → A is a trivial FD
X is a superkey for R
A is contained in a candidate key for R (or A is a prime attribute of R)
3NF
[Figure: dependency diagram; labels: Key, Non-Prime, A, B]
3NF Example
Prop_id#  County_name  Lot#  Area  Price
Area → Price
Area is not a superkey
Price is NOT a prime attribute
3NF Decomposition
Decomposed into
Prop_id#  County_name  Lot#  Area
Area  Price
3NF Example
EMP_DEPT
EName SSN Bdate ADD DNUM DNAME DMGRSSN
SSN → DNUM
DNUM → DNAME, DMGRSSN
3NF Example
Decompose
EName SSN Bdate ADD DNUM (3NF)
DNUM DNAME DMGRSSN (3NF)
3NF Example
R(ABCD)
F = { AB → C, B → D, BC → A }
AB is a key
BC is also a key
Here B → D and D is NOT a prime attribute. Hence R is not in 3NF.
3NF Example
R(City, Zipcode, Street)
CS → Z
Z → C
Both CS and SZ are keys. Hence all attributes are prime.
In spite of Z → C, R is in 3NF because C is prime.
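Both examples can be checked with a short sketch of the 3NF test. It assumes single-letter attributes, finds candidate keys by brute force, and inspects only the FDs that are given (for 3NF testing this is generally taken to be sufficient, but treat it as an assumption of the sketch). All function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the full attribute set."""
    attrs = frozenset(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = frozenset(combo)
            if any(k <= cand for k in keys):
                continue
            if closure(cand, fds) == attrs:
                keys.append(cand)
    return keys


def violations_3nf(attrs, fds):
    """List (X, A) pairs where X is not a superkey and A is not prime."""
    attrs = set(attrs)
    prime = set().union(*candidate_keys(attrs, fds))
    bad = []
    for lhs, rhs in fds:
        for a in sorted(rhs - lhs):          # ignore the trivial part
            if closure(lhs, fds) != attrs and a not in prime:
                bad.append(("".join(sorted(lhs)), a))
    return bad


ABCD_FDS = [(set("AB"), set("C")), (set("B"), set("D")), (set("BC"), set("A"))]
CSZ_FDS = [(set("CS"), set("Z")), (set("Z"), set("C"))]
abcd_bad = violations_3nf("ABCD", ABCD_FDS)   # B -> D is the offender
csz_bad = violations_3nf("CSZ", CSZ_FDS)      # empty: C is prime
```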
1 to 3NF
1NF: No multi-valued attributes
2NF: A relation R is in 2NF if every non-prime attribute is fully functionally dependent on each relation key.
3NF: A relation scheme R with dependencies F is said to be in 3NF if whenever X → A holds in R, and A is not in X, then a) X is a superkey for R or b) A is prime.
BCNF Cont.
A relation R is in BCNF if every determinant (left side of an FD) in the relation is a relation key.
R = ({A, B, C, D}, {A → BCD, D → A}) is in BCNF if both A and D are keys of R
BCNF Example
[Figure: dependency diagram; labels: City, Street, Zipcode]
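A minimal sketch of the BCNF test, reading "relation key" as superkey (the usual textbook formulation, assumed here) and checking only the FDs that are given. It confirms that the A → BCD, D → A example passes, and that the City/Street/Zipcode relation with CS → Z and Z → C fails because Z is not a key. Function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_bcnf(attrs, fds):
    """Non-trivial FDs whose determinant does not determine every attribute."""
    attrs = set(attrs)
    return [("".join(sorted(lhs)), "".join(sorted(rhs)))
            for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != attrs]


ABCD_FDS = [(set("A"), set("BCD")), (set("D"), set("A"))]
CSZ_FDS = [(set("CS"), set("Z")), (set("Z"), set("C"))]
abcd_bad = violations_bcnf("ABCD", ABCD_FDS)
csz_bad = violations_bcnf("CSZ", CSZ_FDS)
```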
BCNF Example
Grades(Stud_id, Ph#, Course, Grades), abbreviated S, P, C, G
SC is a key
PC is a key
But P → S (an FD) violates the BCNF definition
BCNF Example
Lending_scheme(branch_name, assets, branch_city, loan_number, cust_name, amount)
Key: loan_number, cust_name
FDs:
branch_name → assets
branch_name → branch_city
loan_number → amount
loan_number → branch_name
BCNF Example
Decompose the previous into:
R1 = (branch_name, assets)
R2 = (branch_name, branch_city)
R3 = (loan_number, amount)
R4 = (loan_number, branch_name)
R5 = (loan_number, cust_name)
Key: loan_number, cust_name
BCNF Example
R(City, Zipcode, Street)
CS → Z
Z → C
Both CS and SZ are keys.
If R is decomposed into schemes none of which contains all of CSZ, the dependency CS → Z is not implied by the projected dependencies. For example, decomposition into SZ and CZ does not preserve dependencies.
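Whether an FD survives a decomposition can be tested without computing the projected FDs explicitly: grow the left side using only what each fragment can see, and check whether the right side is reached. The sketch below is one common formulation of that test, with hypothetical function names and single-letter attributes. For the split into SZ and CZ it reports that Z → C is preserved while CS → Z is not.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserved(dep, schemas, fds):
    """True if dep = (lhs, rhs) follows from the FDs projected onto schemas."""
    lhs, rhs = dep
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for r in schemas:
            # What this fragment alone can derive from what we already have.
            gained = (closure(result & r, fds) & r) - result
            if gained:
                result |= gained
                changed = True
    return rhs <= result


F = [(set("CS"), set("Z")), (set("Z"), set("C"))]
FRAGMENTS = [set("SZ"), set("CZ")]
cs_to_z_kept = preserved((set("CS"), set("Z")), FRAGMENTS, F)
z_to_c_kept = preserved((set("Z"), set("C")), FRAGMENTS, F)
```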
Summary
Loss-less join decomposition
Dependency preserving decomposition
Any relation scheme has a loss-less join decomposition into BCNF
Any relation scheme has both a loss-less join and dependency preserving decomposition into 3NF.
There may be no decomposition into BCNF that is dependency preserving
Join tuples with equal zip codes
street        city       zip
415 Yates St  Arlington  76010
415 Yates St  Arlington  76019
Although no FDs were violated in the decomposed relations, the FD CS → Z is violated by the database as a whole.
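The same situation in a few lines. The two decomposed tables are assumed to be the (street, zip) and (city, zip) projections of the joined rows shown on the slide. Z → C holds locally in the city table, yet the join contains one city and street with two zip codes.

```python
street_zip = {("415 Yates St", "76010"), ("415 Yates St", "76019")}
city_zip = {("Arlington", "76010"), ("Arlington", "76019")}

# Z -> C inside the (city, zip) table: each zip maps to exactly one city.
cities_per_zip = {}
for city, z in city_zip:
    cities_per_zip.setdefault(z, set()).add(city)
z_to_c_holds = all(len(c) == 1 for c in cities_per_zip.values())

# Join tuples with equal zip codes.
joined = {(s, c, z)
          for (s, z) in street_zip
          for (c, z2) in city_zip if z == z2}

# CS -> Z on the joined relation: each (city, street) should have one zip.
zips_per_address = {}
for s, c, z in joined:
    zips_per_address.setdefault((c, s), set()).add(z)
cs_to_z_holds = all(len(zs) == 1 for zs in zips_per_address.values())
```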
(Adapted from Jeff Ullman)
Summary
Motivation behind normal forms
An integrity constraint and a relationship the database is intended to store
Anomalies are avoided by the normal forms
If we have a transitive dependency X → Y → Z, then we cannot associate a Y-value with an X-value unless there is a Z-value associated with the Y-value. This leads to insertion and deletion anomalies
BCNF avoids some anomalies not prevented by 3NF, e.g., in the CSZ example, we cannot record the city to which a zip code belongs unless we know a street address with that zip code.