
A table is always said to be in some normal form.
A relation is said to be in a particular normal form if it satisfies a prescribed set of conditions.
The aim of normalisation is to provide rules that help to produce a sensible set of tables.
Normalisation is the successive decomposition of a relation into further relations that have a more desirable form (i.e. tables with fewer problems).

Database [ITC218]:

Example:
S {S#, SNAME, STATUS, CITY}
P {P#, PNAME, COLOR, WEIGHT, CITY}
SP {S#, P#, QTY}
In the Suppliers/Parts database shown above, we could delete CITY from S and insert it into SP, but the CITY value would then be duplicated for every occurrence of a particular value of S#, leading to redundancy.


Example Database: Supplier/Parts

S
S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens

P
P#   PNAME   COLOR   WEIGHT   CITY
P1   Nut     Red     12       London
P2   Bolt    Green   17       Paris
P3   Screw   Blue    17       Rome
P4   Screw   Red     14       London
P5   Cam     Blue    12       Paris
P6   Cog     Red     19       London

SP
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
S4   P4   300
S4   P5   400


There are many normal forms, and each higher normal form implies all the lower ones.
The process is reversible, i.e. going to a higher normal form never loses information.
The main ones, in increasing level, are:
1NF, 2NF, 3NF, BCNF, 4NF, 5NF
As a rule, a database should be in 3NF or higher.


A relation is in 1NF if all the underlying domains contain only atomic values.
In any relational system this is always true of the relations.
It is true for SQL, since only one value is allowed in any field.
"Normalised" sometimes means 1NF, but more often it means a higher normal form, e.g. 3NF or higher.


[FD diagram for FIRST over the attributes S#, P#, QTY, CITY and STATUS]
FIRST {S#, SNAME, STATUS, CITY, P#, QTY}
PRIMARY KEY {S#, P#}
Note: there are arrows out of the candidate key, along with certain additional arrows.

Suppose our database contained only one table called FIRST, instead of S and SP, as shown on the previous slide,
e.g. FIRST (S#, SNAME, STATUS, CITY, P#, QTY)
The primary key is now {S#, P#}, and there are FDs other than those from the primary key.
In addition, add the FD
CITY → STATUS
i.e. the status is determined by the location.

An illustration of the problems this causes: we cannot insert the CITY for a supplier unless that supplier supplies at least one part, because P# is part of the primary key.


If we delete the sole FIRST tuple for a particular supplier, we delete not only the shipment connecting that supplier to a particular part but also the information that the supplier is located in a particular city.
The city value for a given supplier appears in FIRST many times, in general. This redundancy causes update problems.


Decomposition
SECOND {S#, STATUS, CITY}
SP {S#, P#, QTY}
[FD diagrams for SECOND (S#, STATUS, CITY) and SP (S#, P#, QTY)]

SECOND
S#   STATUS   CITY
S1   20       London
S2   10       Paris
S3   30       Paris
S4   20       London
S5   30       Athens

SP
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
S4   P4   300
S4   P5   400

Now, we can see the definition of 2NF



Definition of 2NF
A relation is in 2NF iff (if and only if) it is in 1NF and every nonkey attribute is irreducibly dependent on the primary key.
A nonkey attribute is an attribute that does not participate in the primary key.
This assumes there is only one candidate key, which we can then take to be the primary key.


A relation in 1NF can always be transformed into a collection of 2NF relations by means of projections.
In general, a relation R{A, B, C, D} with primary key {A, B} and an FD A → D
can always be decomposed into the projections
R1{A, D} with primary key {A}
and
R2{A, B, C} with primary key {A, B} and foreign key {A} REFERENCES R1
Thus FIRST(S#, STATUS, CITY, P#, QTY) can be decomposed into:
SECOND(S#, STATUS, CITY) and SP(S#, P#, QTY)
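
The decomposition by projection can be tried out on a few rows. The following is a minimal Python sketch, not part of the original slides: the helper names `relation`, `project` and `join`, the set-of-frozensets representation, and the choice of four FIRST rows (built from the S1 and S2 sample data) are all assumptions made for illustration.

```python
def relation(header, rows):
    """Build a relation as a set of frozensets of (attribute, value) pairs."""
    return {frozenset(zip(header, r)) for r in rows}

def project(rel, attrs):
    """Projection: keep only the named attributes; duplicate tuples collapse."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}

def join(r1, r2):
    """Natural join on whatever attributes the two relations share."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

# A few illustrative FIRST rows (suppliers S1 and S2 only).
FIRST = relation(
    ("S#", "STATUS", "CITY", "P#", "QTY"),
    [
        ("S1", 20, "London", "P1", 300),
        ("S1", 20, "London", "P2", 200),
        ("S2", 10, "Paris", "P1", 300),
        ("S2", 10, "Paris", "P2", 400),
    ],
)

SECOND = project(FIRST, {"S#", "STATUS", "CITY"})
SP = project(FIRST, {"S#", "P#", "QTY"})

print(len(FIRST), len(SECOND), len(SP))  # 4 2 4
print(join(SECOND, SP) == FIRST)         # True: nothing was lost
```

In this sample the supplier's city and status are stored once per supplier in SECOND rather than once per shipment, and the join of the two projections gives back FIRST exactly.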
[FD diagrams for SECOND and SP]
This shows the 2NF tables diagrammatically.
The transformation from 1NF to 2NF is an example of non-loss decomposition.

However, 2NF still causes problems: although the dependency of STATUS on S# is functional and irreducible, it is also transitive (via CITY):
S# → CITY and CITY → STATUS
which has the logical consequence that
S# → STATUS
also holds.
In general, if A → B and B → C then A → C.
This causes problems with insert, delete and update.
In fact, we want all the nonkey attributes in a relation to be mutually independent, so that each can be updated independently of the rest.
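
The transitive chain can be checked mechanically on sample rows. The sketch below is illustrative only: the function name `fd_holds` and the list-of-dicts layout are assumptions, and the three rows are the S1, S2 and S4 suppliers from the sample data. Note that a check like this can only refute an FD on a given sample; an FD is a constraint on every legal value of the relation, so a sample cannot prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        # setdefault stores val the first time a key is seen,
        # and returns the stored value on later occurrences.
        if seen.setdefault(key, val) != val:
            return False
    return True

SECOND = [
    {"S#": "S1", "STATUS": 20, "CITY": "London"},
    {"S#": "S2", "STATUS": 10, "CITY": "Paris"},
    {"S#": "S4", "STATUS": 20, "CITY": "London"},
]

print(fd_holds(SECOND, ["S#"], ["CITY"]))      # True
print(fd_holds(SECOND, ["CITY"], ["STATUS"]))  # True
print(fd_holds(SECOND, ["S#"], ["STATUS"]))    # True (follows from the two above)
print(fd_holds(SECOND, ["STATUS"], ["S#"]))    # False: status 20 has S1 and S4
```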

INSERT: The fact that a particular city has a particular status cannot be inserted until some supplier is actually located in that city.
DELETE: If the sole SECOND tuple for a particular city is deleted, we delete not only the information for the supplier concerned but also the information that the city has that particular status.
UPDATE: The status for a given city appears in SECOND many times, in general. This redundancy causes update problems.

Replace the original table (SECOND in this case) by two projections,
SC {S#, CITY}
CS {CITY, STATUS}
[FD diagrams: S# → CITY in SC; CITY → STATUS in CS]

Now, we can define 3NF


Definition of 3NF:
A relation is in 3NF iff it is in 2NF and every
nonkey attribute is nontransitively dependent
on the primary key
This implies that there are no mutual
dependencies between the nonkey attributes
A relation that is in 2NF can always be
transformed by means of projections into a
collection of 3NF relations


Thus, given a relation R{A, B, C} with primary key {A} and assuming the FD B → C, we normalise by taking the projections
R1{B, C} with primary key {B}
and
R2{A, B} with primary key {A} and foreign key {B} REFERENCES R1
e.g. SECOND{S#, STATUS, CITY} becomes SC{S#, CITY} and CS{CITY, STATUS}
Note: the relation SP was already in 3NF.
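
As a rough illustration of why this step helps, the Python sketch below splits three SECOND rows into SC and CS, rejoins them, and then records a status for a city that has no supplier. The function name `rejoin`, the tuple layout, and the Rome/50 entry are hypothetical and chosen only for this example.

```python
# Rows are (S#, STATUS, CITY); a subset of the sample suppliers.
SECOND = [("S1", 20, "London"), ("S2", 10, "Paris"), ("S4", 20, "London")]

# The two projections.
SC = {(s, city) for s, _, city in SECOND}
CS = {(city, status) for _, status, city in SECOND}

def rejoin(sc, cs):
    """Natural join of SC and CS over CITY, back in SECOND's column order."""
    return {(s, status, city)
            for s, city in sc
            for city2, status in cs
            if city == city2}

print(rejoin(SC, CS) == set(SECOND))  # True: the decomposition is non-loss

# INSERT anomaly removed: a city's status can be stored with no supplier there.
CS.add(("Rome", 50))  # hypothetical city/status pair
print(rejoin(SC, CS) == set(SECOND))  # still True: no supplier is in Rome
```

Each city's status now lives in exactly one CS row, so updating it touches one tuple.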

Boyce/Codd Normal Form (BCNF)

The definition of 3NF assumed that the relation had only one candidate key.
BCNF is a slightly stronger normal form that addresses this (it is named after its authors).
It deals with the case where a relation has:
1. Two or more candidate keys, such that
2. The candidate keys are composite, and
3. They overlap (have at least one attribute in common)

Boyce/Codd Normal Form (cont)

Formal definition: A relation is in BCNF iff every nontrivial, left-irreducible FD has a candidate key as its determinant (i.e. LHS).
Informal definition: A relation is in BCNF iff the only determinants are candidate keys.
This means that in an FD diagram, the only arrows are out of candidate keys.
The definition makes no reference to 3NF, yet it is stronger than 3NF and implies it.
It is conceptually simpler than 3NF.
Relations FIRST and SECOND are not in BCNF.
Relations SP, SC and CS are in BCNF.
Examples: See Date pp. 366-372
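
The informal definition suggests a simple mechanical test: for each nontrivial FD, compute the closure of its determinant and see whether it covers the whole heading. The Python sketch below does this for the five relations named above. The function names `closure` and `is_bcnf` are assumptions, and each FD list is simply the set of FDs the slides state for that relation (SNAME is left out, as in the later slides).

```python
def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_bcnf(heading, fds):
    """True if every nontrivial FD in fds has a superkey as its determinant."""
    heading = set(heading)
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FD, ignored by the definition
        if not heading <= closure(lhs, fds):
            return False  # determinant does not determine the whole heading
    return True

first_fds = [({"S#", "P#"}, {"QTY"}), ({"S#"}, {"CITY"}), ({"CITY"}, {"STATUS"})]
second_fds = [({"S#"}, {"CITY"}), ({"CITY"}, {"STATUS"})]

print(is_bcnf({"S#", "P#", "QTY", "CITY", "STATUS"}, first_fds))  # False
print(is_bcnf({"S#", "CITY", "STATUS"}, second_fds))              # False
print(is_bcnf({"S#", "P#", "QTY"}, [({"S#", "P#"}, {"QTY"})]))    # True
print(is_bcnf({"S#", "CITY"}, [({"S#"}, {"CITY"})]))              # True
print(is_bcnf({"CITY", "STATUS"}, [({"CITY"}, {"STATUS"})]))      # True
```

In FIRST the offending determinant is S# (its closure omits P# and QTY); in SECOND it is CITY.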

Consider a relation representing courses, teachers and texts.
There may be several teachers and several texts for one course, but the set of texts is independent of the set of teachers.
We could represent this by a row for every combination of teacher and text, i.e. CTX(COURSE, TEACHER, TEXT), but this contains redundancy, which could lead to update anomalies.
This is an example of two multi-valued dependencies (MVDs):
COURSE →→ TEACHER and COURSE →→ TEXT


Definition: Let R be a relation, and let A, B and C be subsets of the attributes of R. Then
B is multi-dependent on A,
A →→ B
iff in every possible legal value of R, the set of B values matching a given (A value, C value) pair depends only on the A value (i.e. is independent of the C value).
Every FD is an MVD (but not the other way round).
MVDs go in pairs: if the MVD A →→ B holds then so does A →→ C, often written as A →→ B | C
e.g. COURSE →→ TEACHER and COURSE →→ TEXT
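
One way to read the definition: for each A value, the (B, C) combinations present must be the full cross product of the B values and the C values seen for that A value. The Python sketch below tests this on a small CTX sample. The function name `mvd_holds` and the course, teacher and text names are illustrative assumptions, and the check assumes A and B are disjoint.

```python
from itertools import product

def mvd_holds(header, rows, a, b):
    """Check A ->> B on one sample; C is taken to be all remaining attributes."""
    c = [x for x in header if x not in a and x not in b]
    idx = {name: i for i, name in enumerate(header)}

    def pick(r, attrs):
        return tuple(r[idx[x]] for x in attrs)

    # Group the (B value, C value) pairs by A value.
    groups = {}
    for r in rows:
        groups.setdefault(pick(r, a), set()).add((pick(r, b), pick(r, c)))

    # Within each group the pairs must form a complete cross product.
    for pairs in groups.values():
        b_vals = {p[0] for p in pairs}
        c_vals = {p[1] for p in pairs}
        if pairs != set(product(b_vals, c_vals)):
            return False
    return True

HEADER = ("COURSE", "TEACHER", "TEXT")
CTX = {
    ("Physics", "Prof. Green", "Basic Mechanics"),
    ("Physics", "Prof. Green", "Principles of Optics"),
    ("Physics", "Prof. Brown", "Basic Mechanics"),
    ("Physics", "Prof. Brown", "Principles of Optics"),
    ("Maths", "Prof. Green", "Basic Mechanics"),
    ("Maths", "Prof. Green", "Vector Analysis"),
}

print(mvd_holds(HEADER, CTX, ["COURSE"], ["TEACHER"]))  # True
print(mvd_holds(HEADER, CTX, ["COURSE"], ["TEXT"]))     # True

# Dropping one teacher/text combination breaks both MVDs.
broken = CTX - {("Physics", "Prof. Brown", "Principles of Optics")}
print(mvd_holds(HEADER, broken, ["COURSE"], ["TEACHER"]))  # False
```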

Fourth Normal Form (4NF)

Fagin's Theorem
Let R{A, B, C} be a relation, where A, B and C are sets of attributes. Then R is equal to the join of its projections on {A, B} and {A, C} iff R satisfies the MVDs A →→ B | C
Definition of 4NF:
A relation R is in 4NF iff, whenever there exist subsets A and B of the attributes of R such that the nontrivial MVD A →→ B is satisfied, all the attributes of R are also functionally dependent on A.
An MVD A →→ B is trivial if either A is a superset of B or the union of A and B is the entire heading.

Thus, instead of having the relation
CTX(COURSE, TEACHER, TEXT)
which is in BCNF, we decompose it into the two projections
CT(COURSE, TEACHER)
and
CX(COURSE, TEXT)
which are both in 4NF.
Notes:
1. 4NF implies BCNF
2. 4NF is always achievable but not always desirable, due to other constraints
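
The decomposition of CTX can be sketched in a few lines of Python. The course, teacher and text names below are illustrative assumptions (including the hypothetical Prof. White), and `rejoin` is a helper name introduced only for this example.

```python
# Each course maps to (its teachers, its texts); the two sets are independent.
courses = {
    "Physics": (["Prof. Green", "Prof. Brown"],
                ["Basic Mechanics", "Principles of Optics"]),
    "Maths": (["Prof. Green"],
              ["Basic Mechanics", "Vector Analysis", "Trigonometry"]),
}

# CTX holds one row per (teacher, text) combination for each course.
CTX = {(c, t, x) for c, (ts, xs) in courses.items() for t in ts for x in xs}

# The two projections.
CT = {(c, t) for c, t, _ in CTX}
CX = {(c, x) for c, _, x in CTX}

def rejoin(ct, cx):
    """Natural join of CT and CX over COURSE."""
    return {(c, t, x) for c, t in ct for c2, x in cx if c == c2}

print(len(CTX), len(CT), len(CX))  # 7 3 5
print(rejoin(CT, CX) == CTX)       # True, as Fagin's theorem predicts

# Adding a teacher is one row in CT, where CTX would need one row per text.
CT.add(("Physics", "Prof. White"))  # hypothetical new teacher
print(len(rejoin(CT, CX)))          # 9
```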


The assumption so far has been that the relation we are normalising can be non-loss decomposed into two projections.
This is not possible for all relations: some relations are only n-decomposable, where n > 2,
e.g. the relation SPJ(S#, P#, J#), which is an extension of our SP table to include a project number (ignoring QTY for simplicity). Its candidate key is {S#, P#, J#}.
Decomposing it into only two projections and then joining them adds spurious triples (Date p. 395).


A set of non-loss projections is:
SPJ{S#, P#}  SPJ{P#, J#}  SPJ{J#, S#}

Definition of join dependency (JD):
Let R be a relation, and let A, B, ..., Z be subsets of the attributes of R. Then we say that R satisfies the JD
* {A, B, ..., Z}
iff every possible legal value of R is equal to the join of its projections on A, B, ..., Z
e.g. SPJ satisfies * {SP, PJ, JS}
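
A small sample makes the spurious triple visible. In the Python sketch below the four SPJ rows are illustrative values chosen so that the join dependency holds; the variable names are assumptions made for this example.

```python
# Rows are (S#, P#, J#).
SPJ = {
    ("S1", "P1", "J2"),
    ("S1", "P2", "J1"),
    ("S2", "P1", "J1"),
    ("S1", "P1", "J1"),
}

# The three binary projections.
SP = {(s, p) for s, p, _ in SPJ}
PJ = {(p, j) for _, p, j in SPJ}
JS = {(j, s) for s, _, j in SPJ}

# Joining only two projections (SP and PJ over P#) creates an extra triple.
sp_pj = {(s, p, j) for s, p in SP for p2, j in PJ if p == p2}
spurious = sp_pj - SPJ
print(spurious)  # {('S2', 'P1', 'J2')}

# Joining the result with the third projection (over J# and S#) removes it.
all_three = {(s, p, j) for s, p, j in sp_pj if (j, s) in JS}
print(all_three == SPJ)  # True: this sample satisfies * {SP, PJ, JS}
```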


Fagin's theorem can be restated as:
R{A, B, C} satisfies the JD * {AB, AC} iff it satisfies the MVDs A →→ B | C
This theorem can be taken as a definition of MVD; thus an MVD is just a special case of a JD.
Definition of 5NF:
A relation R is in 5NF iff every nontrivial join dependency that holds for R is implied by the candidate keys of R
e.g. SPJ is not in 5NF because:
a) it can be 3-decomposed
b) this 3-decomposability is not implied by the fact that the combination {S#, P#, J#} is a candidate key

What does it mean for a JD to be implied by candidate keys?
Suppose the suppliers relation S had two candidate keys, S# and SNAME.
Then the relation satisfies several join dependencies, e.g.
* { {S#, SNAME, STATUS}, {S#, CITY} }
This join dependency is implied by the candidate key S#.
A superset of a candidate key is called a superkey.
A given JD * {A, B, ..., Z} is implied by candidate keys iff each of A, B, ..., Z is a superkey for the relation.
Thus we can tell whether a relation is in 5NF if we know all its candidate keys and all its JDs.
However, discovering all the JDs might not be easy, because they might not be intuitively obvious (unlike FDs and MVDs).
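
The superkey test as stated on this slide is easy to express in code. The Python sketch below applies only that stated condition (every component of the JD is a superkey); it is not a general JD-implication algorithm, and the function names are assumptions made for illustration.

```python
def is_superkey(attrs, candidate_keys):
    """A superkey is any superset of some candidate key."""
    return any(set(key) <= set(attrs) for key in candidate_keys)

def components_are_superkeys(jd_components, candidate_keys):
    """The slide's condition: each component of the JD is a superkey."""
    return all(is_superkey(c, candidate_keys) for c in jd_components)

# Suppliers relation S with candidate keys S# and SNAME.
s_keys = [{"S#"}, {"SNAME"}]
s_jd = [{"S#", "SNAME", "STATUS"}, {"S#", "CITY"}]
print(components_are_superkeys(s_jd, s_keys))  # True

# SPJ: the only candidate key is the whole heading.
spj_keys = [{"S#", "P#", "J#"}]
spj_jd = [{"S#", "P#"}, {"P#", "J#"}, {"J#", "S#"}]
print(components_are_superkeys(spj_jd, spj_keys))  # False
```

The second result matches the earlier observation that the 3-decomposability of SPJ is not implied by its candidate key.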

1. To eliminate certain types of redundancy
2. To avoid certain update anomalies
3. To produce a design that is a good representation of the real world (i.e. utilise semantic information)
4. To simplify the enforcement of certain integrity constraints
e.g. if a relation is in 5NF and we enforce uniqueness of candidate keys, then we automatically enforce all JDs (and all MVDs and FDs), since they are implied by the candidate keys