Adbms Lectures Midterm-Unit 1-Ch7 Normalization

A table is always said to be in some normal form.
A relation is said to be in a particular normal form if it
satisfies a prescribed set of conditions.
The aim of normalisation is to provide rules
that help to produce a sensible set of tables.
Normalisation is the successive decomposition of
a relation into further relations that have a more
desirable form (i.e. tables with fewer problems).
Database [ITC218]:
Example:
S {S#, SNAME, STATUS, CITY}
P {P#, PNAME, COLOR, WEIGHT, CITY}
SP {S#, P#, QTY}
In the Suppliers/Parts database shown above, we could
remove CITY from S and add it to SP instead, but the
city value would then be repeated in every SP row for a
particular value of S#, leading to redundancy.

Example Database: Supplier/Parts

S
S#  SNAME  STATUS  CITY
S1  Smith  20      London
S2  Jones  10      Paris
S3  Blake  30      Paris
S4  Clark  20      London
S5  Adams  30      Athens

P
P#  PNAME  COLOR  WEIGHT  CITY
P1  Nut    Red    12      London
P2  Bolt   Green  17      Paris
P3  Screw  Blue   17      Rome
P4  Screw  Red    14      London
P5  Cam    Blue   12      Paris
P6  Cog    Red    19      London

SP
S#  P#  QTY
S1  P1  300
S1  P2  200
S1  P3  400
S1  P4  200
S1  P5  100
S1  P6  100
S2  P1  300
S2  P2  400
S3  P2  200
S4  P2  200
S4  P4  300
S4  P5  400

There are many normal forms, and a higher one
implies all the lower ones as well.
The process is reversible,
i.e. going to a higher normal form never loses
information: the original relation can be recovered
by joining the projections.
The main ones, in increasing level, are:
1NF, 2NF, 3NF, BCNF, 4NF, 5NF
As a rule of thumb, a database should be in 3NF or higher.

A relation is in 1NF if all the underlying domains
contain only atomic values.
For any relational system this is always true of the
relations.
It is true for SQL, since only one value is allowed in
any field.
"Normalised" sometimes means 1NF, but more often it
means a higher normal form, e.g. 3NF or higher.

FIRST {S#, SNAME, STATUS, CITY, P#, QTY}
PRIMARY KEY {S#, P#}
[FD diagram for FIRST, showing S#, P#, QTY, CITY and STATUS]
Note: the diagram has arrows out of the candidate key, along with
certain additional arrows

Suppose our database had only one table, called FIRST
(shown on the previous slide), instead of S and SP,
e.g. FIRST (S#, SNAME, STATUS, CITY, P#, QTY)
The primary key is now (S#, P#), and there are FDs
other than those out of the primary key.
In addition, add the FD
CITY → STATUS
i.e. the status is determined by the location
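
As a minimal sketch of what it means for an FD to hold, the Python below checks FDs against a handful of FIRST rows. The helper name fd_holds and the sample rows are illustrative assumptions rather than part of the lecture; in particular, S3's status is set to 10 here so that CITY → STATUS holds in the sample.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val the first time key is seen,
        # and returns the stored value on later occurrences
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical FIRST rows (SNAME omitted; S3's STATUS is an assumption)
FIRST = [
    {"S#": "S1", "STATUS": 20, "CITY": "London", "P#": "P1", "QTY": 300},
    {"S#": "S1", "STATUS": 20, "CITY": "London", "P#": "P2", "QTY": 200},
    {"S#": "S2", "STATUS": 10, "CITY": "Paris", "P#": "P1", "QTY": 300},
    {"S#": "S3", "STATUS": 10, "CITY": "Paris", "P#": "P2", "QTY": 200},
    {"S#": "S4", "STATUS": 20, "CITY": "London", "P#": "P2", "QTY": 200},
]

print(fd_holds(FIRST, ["S#", "P#"], ["QTY"]))  # True: FD out of the primary key
print(fd_holds(FIRST, ["S#"], ["CITY"]))       # True: FD out of part of the key
print(fd_holds(FIRST, ["CITY"], ["STATUS"]))   # True: the additional FD
print(fd_holds(FIRST, ["S#"], ["P#"]))         # False: S1 supplies P1 and P2
```

A check like this can only refute an FD on sample data. An FD is a constraint on every legal value of the relation, so it has to come from the meaning of the data, not from one instance.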

[FD diagram for FIRST repeated]
An illustration of the problems this causes is that
we cannot insert the CITY for a supplier unless that
supplier supplies at least one part, because P# is part
of the primary key and so cannot be left empty.

If we delete the sole FIRST tuple for a particular
supplier, we delete not only the shipment
connecting that supplier to a particular part but
also the information that the supplier is located in
a particular city.
The city value for a given supplier appears in
FIRST many times, in general. This redundancy
causes update problems: a change of city must be
applied to every one of those tuples.
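
The delete and update problems can be sketched in a few lines of Python. The rows reuse values from the S and SP tables of the example database (SNAME omitted); the variable and function names are illustrative assumptions.

```python
# A few FIRST rows, combining S and SP values from the example database
FIRST = [
    {"S#": "S1", "STATUS": 20, "CITY": "London", "P#": "P1", "QTY": 300},
    {"S#": "S1", "STATUS": 20, "CITY": "London", "P#": "P2", "QTY": 200},
    {"S#": "S3", "STATUS": 30, "CITY": "Paris", "P#": "P2", "QTY": 200},
]


def supplier_cities(rows):
    """Map each supplier that appears in rows to its city."""
    return {r["S#"]: r["CITY"] for r in rows}


# Delete anomaly: removing S3's only shipment also removes S3's city
after_delete = [r for r in FIRST if (r["S#"], r["P#"]) != ("S3", "P2")]
print(supplier_cities(FIRST))         # {'S1': 'London', 'S3': 'Paris'}
print(supplier_cities(after_delete))  # {'S1': 'London'}

# Update anomaly: moving S1 means changing every row that mentions S1
rows_to_update = sum(1 for r in FIRST if r["S#"] == "S1")
print(rows_to_update)  # 2
```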

Decomposition
SECOND {S#, STATUS, CITY}
SP {S#, P#, QTY}
[FD diagrams: SECOND with S#, CITY and STATUS; SP with S#, P# and QTY]

SECOND
S#  STATUS  CITY
S1  20      London
S2  10      Paris
S3  30      Paris
S4  20      London
S5  30      Athens

SP
S#  P#  QTY
S1  P1  300
S1  P2  200
S1  P3  400
S1  P4  200
S1  P5  100
S1  P6  100
S2  P1  300
S2  P2  400
S3  P2  200
S4  P2  200
S4  P4  300
S4  P5  400

Now we can give the definition of 2NF.

Definition of 2NF
A relation is in 2NF iff (if and only if) it is in 1NF
and every nonkey attribute is irreducibly
dependent on the primary key.
A nonkey attribute is an attribute that does not
participate in the primary key.
This assumes there is only one candidate key,
which we can then take to be the primary key.

A relation in 1NF can always be transformed into a
collection of 2NF relations by means of projections.
In general, a relation R{A, B, C, D} with primary key
{A, B} and an FD A → D
can always be decomposed into the projections
R1{A, D} with primary key {A}
and
R2{A, B, C} with primary key {A, B} and
foreign key {A} REFERENCES R1
Thus FIRST(S#, STATUS, CITY, P#, QTY), ignoring SNAME
for simplicity, can be decomposed into:
SECOND(S#, STATUS, CITY) and SP(S#, P#, QTY)

[FD diagrams: SECOND with S#, CITY and STATUS; SP with S#, P# and QTY]
This shows the 2NF tables diagrammatically.
The transformation from 1NF to 2NF is an
example of non-loss decomposition.
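
Non-loss decomposition can be illustrated with a small Python sketch: project FIRST onto the SECOND and SP headings, join the projections back on S#, and compare with the original. The helpers project, natural_join and same_relation, and the sample rows, are assumptions made for this illustration.

```python
def project(rows, attrs):
    """Projection on attrs, with duplicate rows removed."""
    seen, out = set(), []
    for r in rows:
        t = tuple(r[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out


def natural_join(left, right):
    """Join rows that agree on all attributes they have in common."""
    out = []
    for l in left:
        for r in right:
            common = l.keys() & r.keys()
            if all(l[a] == r[a] for a in common):
                out.append({**l, **r})
    return out


def same_relation(x, y):
    """Compare two row lists as sets (order and duplicates ignored)."""
    canon = lambda rows: {frozenset(r.items()) for r in rows}
    return canon(x) == canon(y)


# Hypothetical FIRST rows (SNAME omitted)
FIRST = [
    {"S#": "S1", "STATUS": 20, "CITY": "London", "P#": "P1", "QTY": 300},
    {"S#": "S1", "STATUS": 20, "CITY": "London", "P#": "P2", "QTY": 200},
    {"S#": "S2", "STATUS": 10, "CITY": "Paris", "P#": "P1", "QTY": 300},
    {"S#": "S4", "STATUS": 20, "CITY": "London", "P#": "P4", "QTY": 300},
]

SECOND = project(FIRST, ["S#", "STATUS", "CITY"])
SP = project(FIRST, ["S#", "P#", "QTY"])

print(len(SECOND))                                     # 3: one row per supplier
print(same_relation(natural_join(SECOND, SP), FIRST))  # True: nothing lost
```

The join recovers FIRST exactly because S# determines STATUS and CITY, so each SP row matches exactly one SECOND row.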

However, a relation in 2NF (SECOND here) can still cause
problems, in that although the dependency of STATUS on S# is
functional and irreducible, it is also transitive:
S# → CITY and CITY → STATUS
which has the logical consequence that
S# → STATUS
also holds.
In general, if A → B and B → C then A → C.
This causes problems with insert, delete and
update.
In fact, we want all the nonkey attributes in a
relation to be mutually independent, so that each
can be updated independently of the rest.
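
The transitivity rule can be applied mechanically by computing the closure of a set of attributes under a set of FDs. The following is a minimal sketch; the function name closure and the representation of FDs as (left-hand side, right-hand side) pairs are assumptions for illustration.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole LHS, we also get the RHS
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# FDs of SECOND as given above
FDS = [({"S#"}, {"CITY"}), ({"CITY"}, {"STATUS"})]

print(sorted(closure({"S#"}, FDS)))    # ['CITY', 'S#', 'STATUS']
print(sorted(closure({"CITY"}, FDS)))  # ['CITY', 'STATUS']
```

STATUS appears in the closure of S# only by way of CITY, which is the transitive dependency.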

INSERT: We cannot record that a particular city has a
particular status until we have some supplier
actually located in that city.
DELETE: If the sole SECOND tuple for a
particular city is deleted, then we delete not only
the information for the supplier concerned but also
the information that that city has that particular
status.
UPDATE: The status for a given city appears in
SECOND many times, in general. This redundancy
causes update problems.

Replace the original table (SECOND in this case)
by two projections:
SC {S#, CITY}
CS {CITY, STATUS}
[FD diagrams: SC with S# and CITY; CS with CITY and STATUS]
Now we can define 3NF.
Definition of 3NF:
A relation is in 3NF iff it is in 2NF and every
nonkey attribute is nontransitively dependent
on the primary key
This implies that there are no mutual
dependencies between the nonkey attributes
A relation that is in 2NF can always be
transformed by means of projections into a
collection of 3NF relations

Thus, given a relation R{A, B, C} with primary key {A} and
assuming the FD B → C, we normalise by taking the
projections
R1{B, C} with primary key {B}
and
R2{A, B} with primary key {A} and foreign
key {B} REFERENCES R1
e.g. SECOND{S#, STATUS, CITY} becomes
SC{S#, CITY} and CS{CITY, STATUS}
Note: the relation SP was already in 3NF.

Boyce/Codd Normal Form (BCNF)
The definition of 3NF assumed that the relation had
only one candidate key.
BCNF is a slightly stronger normal form that
removes this assumption (it is named after its authors).
It differs from 3NF only in the case where a relation has:
1. Two or more candidate keys, such that
2. The candidate keys are composite, and
3. They overlap (have at least one attribute in
common)

Boyce/Codd Normal Form (cont)
Formal definition: A relation is in BCNF iff every
nontrivial, left-irreducible FD has a candidate key as
its determinant (i.e. LHS).
Informal definition: A relation is in BCNF iff the only
determinants are candidate keys.
This means that in an FD diagram, the only arrows are
out of candidate keys.
The definition makes no reference to 3NF, yet it is stronger
and implies 3NF.
It is conceptually simpler than 3NF.
Relations FIRST and SECOND are not in BCNF.
Relations SP, SC and CS are in BCNF.
Examples: See Date pp.366-372
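
The BCNF test can be sketched in Python as follows: for each nontrivial FD, check whether its determinant functionally determines the whole heading. The function names and the FD lists are assumptions for illustration; the FDs listed are the ones stated in the slides, all of which are left-irreducible, so a determinant that determines the whole heading is a candidate key.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(heading, fds):
    """Return the nontrivial FDs whose determinant is not a (super)key."""
    bad = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FD, always allowed
        if closure(lhs, fds) != set(heading):
            bad.append((lhs, rhs))
    return bad


FIRST_FDS = [(("S#", "P#"), ("QTY",)), (("S#",), ("CITY",)), (("CITY",), ("STATUS",))]
SECOND_FDS = [(("S#",), ("CITY",)), (("CITY",), ("STATUS",))]

# FIRST: both S# -> CITY and CITY -> STATUS are arrows out of non-keys
print(bcnf_violations(["S#", "P#", "STATUS", "CITY", "QTY"], FIRST_FDS))
# SECOND: CITY -> STATUS is the remaining violation
print(bcnf_violations(["S#", "STATUS", "CITY"], SECOND_FDS))
# SP, SC and CS: no violations
print(bcnf_violations(["S#", "P#", "QTY"], [(("S#", "P#"), ("QTY",))]))  # []
print(bcnf_violations(["S#", "CITY"], [(("S#",), ("CITY",))]))           # []
print(bcnf_violations(["CITY", "STATUS"], [(("CITY",), ("STATUS",))]))   # []
```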

Consider a relation representing courses, teachers and texts.
There may be several teachers and several texts for one
course, but the set of texts is independent of the set of
teachers.
We could represent this by a row for every combination of
teacher and text, i.e. CTX(COURSE, TEACHER, TEXT), but this
contains redundancy, which could lead to update anomalies.
This is an example of two multi-valued dependencies
(MVDs):
COURSE →→ TEACHER and COURSE →→ TEXT

Definition: Let R be a relation, and let A, B and C be
subsets of the attributes of R. Then
B is multi-dependent on A, written
A →→ B
iff in every possible legal value of R, the set of B values
matching a given (A value, C value) pair depends only on
the A value (i.e. is independent of the C value).
Every FD is an MVD (but not the other way round).
MVDs go in pairs: if the MVD A →→ B holds then so does
A →→ C, often written as A →→ B | C
e.g. COURSE →→ TEACHER and COURSE →→ TEXT
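
The definition translates fairly directly into a check on sample data. In the Python sketch below, the function name mvd_holds and the CTX rows (course, teacher and text names) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def mvd_holds(rows, a, b):
    """Check A ->-> B on rows; C is every attribute not in A or B."""
    heading = list(rows[0])
    c = [x for x in heading if x not in a and x not in b]
    # Collect the set of B values seen for each (A value, C value) pair
    b_by_ac = defaultdict(set)
    for r in rows:
        a_val = tuple(r[x] for x in a)
        c_val = tuple(r[x] for x in c)
        b_by_ac[(a_val, c_val)].add(tuple(r[x] for x in b))
    # The B set must be the same for every C value paired with a given A value
    b_by_a = {}
    for (a_val, _), b_vals in b_by_ac.items():
        if b_by_a.setdefault(a_val, b_vals) != b_vals:
            return False
    return True


def row(course, teacher, text):
    return {"COURSE": course, "TEACHER": teacher, "TEXT": text}


# Hypothetical CTX value: every teacher/text combination for each course
CTX = [
    row("Physics", "Green", "Mechanics"), row("Physics", "Green", "Optics"),
    row("Physics", "Brown", "Mechanics"), row("Physics", "Brown", "Optics"),
    row("Maths", "Green", "Mechanics"), row("Maths", "Green", "Vectors"),
]

print(mvd_holds(CTX, ["COURSE"], ["TEACHER"]))  # True
print(mvd_holds(CTX, ["COURSE"], ["TEXT"]))     # True

# Dropping one combination breaks the MVD
broken = [r for r in CTX if r != row("Physics", "Brown", "Optics")]
print(mvd_holds(broken, ["COURSE"], ["TEACHER"]))  # False
```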

Fourth Normal Form (4NF)
Fagin's Theorem
Let R{A, B, C} be a relation, where A, B and C are sets
of attributes. Then R is equal to the join of its
projections on {A, B} and {A, C} iff R satisfies the
MVDs A →→ B | C
Definition of 4NF:
A relation R is in 4NF iff, whenever there exist subsets
A and B of the attributes of R such that the nontrivial
MVD A →→ B is satisfied, then all the attributes of R
are also functionally dependent on A.
An MVD A →→ B is trivial if either A is a superset of B or
the union of A and B is the entire heading.

Thus, instead of keeping the relation
CTX(COURSE, TEACHER, TEXT)
which is in BCNF, we decompose it into the two projections
CT(COURSE, TEACHER)
and
CX(COURSE, TEXT)
which are both in 4NF.
Notes:
1. 4NF implies BCNF
2. 4NF is always achievable but not always desirable, due to other
constraints
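
Fagin's theorem can be illustrated on a small CTX value: when the MVDs hold, joining CT and CX gives back CTX exactly, and when they do not, the join adds rows. The course, teacher and text names below are hypothetical sample data.

```python
# Hypothetical CTX value satisfying COURSE ->-> TEACHER | TEXT
CTX = {
    ("Physics", "Green", "Mechanics"), ("Physics", "Green", "Optics"),
    ("Physics", "Brown", "Mechanics"), ("Physics", "Brown", "Optics"),
    ("Maths", "Green", "Mechanics"), ("Maths", "Green", "Vectors"),
}


def decompose_and_join(ctx):
    """Project onto CT and CX, then join the projections on COURSE."""
    ct = {(c, t) for c, t, x in ctx}
    cx = {(c, x) for c, t, x in ctx}
    joined = {(c, t, x) for c, t in ct for c2, x in cx if c == c2}
    return ct, cx, joined


ct, cx, joined = decompose_and_join(CTX)
print(len(CTX), len(ct), len(cx))  # 6 3 4
print(joined == CTX)               # True: the decomposition is non-loss

# Without the MVDs the same decomposition is not non-loss
broken = CTX - {("Physics", "Brown", "Optics")}
_, _, joined_broken = decompose_and_join(broken)
print(joined_broken == broken)     # False: the join restores the dropped row
```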

The assumption so far has been that the relation we are
normalising can be non-loss decomposed into two
projections.
This is not possible for all relations: some relations are only
n-decomposable, where n > 2,
e.g. the relation SPJ(S#, P#, J#), which extends our
SP table to include a project number (ignoring QTY for
simplicity). Its candidate key is {S#, P#, J#}.
Decomposing into only two projections and then joining them
adds spurious triples (Date p.395).

A set of non-loss projections of SPJ is:
SP = SPJ{S#, P#}  PJ = SPJ{P#, J#}  JS = SPJ{J#, S#}
Definition of join dependency (JD):
Let R be a relation, and let A, B, ..., Z be subsets of the
attributes of R. Then we say that R satisfies the JD
*{A, B, ..., Z}
iff every possible legal value of R is equal to the join of
its projections on A, B, ..., Z
i.e. SPJ satisfies *{SP, PJ, JS}
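
The effect of joining two versus three projections can be sketched in Python. The four SPJ triples below are illustrative sample data, not taken from the slides.

```python
# Illustrative SPJ value: (supplier, part, project) triples
SPJ = {
    ("S1", "P1", "J2"),
    ("S1", "P2", "J1"),
    ("S2", "P1", "J1"),
    ("S1", "P1", "J1"),
}

# The three binary projections
SP = {(s, p) for s, p, j in SPJ}
PJ = {(p, j) for s, p, j in SPJ}
JS = {(j, s) for s, p, j in SPJ}

# Joining only SP and PJ (over P#) produces a spurious triple
sp_pj = {(s, p, j) for s, p in SP for p2, j in PJ if p == p2}
print(sorted(sp_pj - SPJ))  # [('S2', 'P1', 'J2')]

# Joining the result with JS (over J# and S#) removes it again
all_three = {(s, p, j) for s, p, j in sp_pj if (j, s) in JS}
print(all_three == SPJ)     # True
```

This sample value therefore equals the join of its three projections even though the SP and PJ join alone is lossy.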

Fagin's theorem can be restated as:
R{A, B, C} satisfies the JD *{AB, AC} iff it satisfies the
MVDs A →→ B | C
This theorem can be taken as a definition of MVD; thus an
MVD is just a special case of a JD.
Definition of 5NF:
A relation R is in 5NF iff every nontrivial join dependency
that holds for R is implied by the candidate keys of R
e.g. SPJ is not in 5NF because:
a) It can be 3-decomposed
b) This 3-decomposability is not implied by the fact that
the combination {S#, P#, J#} is a candidate key

What does it mean for a JD to be implied by candidate keys?
Suppose the suppliers relation S had two candidate keys, S# and SNAME.
Then the relation satisfies several join dependencies, e.g.
*{ {S#, SNAME, STATUS}, {S#, CITY} }
This join dependency is implied by the candidate key S#.
A superset of a candidate key is called a superkey.
A given JD *{A, B, ..., Z} is implied by candidate keys iff
each of A, B, ..., Z is a superkey for the relation.
Thus we can tell whether a relation is in 5NF if we know all its
candidate keys and all its JDs.
However, discovering all the JDs might not be easy,
because they might not be intuitively obvious (unlike FDs
and MVDs).
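
The superkey criterion as stated above is easy to apply once the candidate keys are known. The Python below is a minimal sketch of that criterion only; the function names are assumptions for illustration.

```python
def is_superkey(attrs, candidate_keys):
    """A set of attributes is a superkey if it contains some candidate key."""
    return any(set(key) <= set(attrs) for key in candidate_keys)


def jd_implied_by_keys(components, candidate_keys):
    """Apply the criterion above: every component of the JD is a superkey."""
    return all(is_superkey(c, candidate_keys) for c in components)


# Suppliers relation S with candidate keys S# and SNAME
S_KEYS = [{"S#"}, {"SNAME"}]
print(jd_implied_by_keys([{"S#", "SNAME", "STATUS"}, {"S#", "CITY"}], S_KEYS))  # True

# SPJ with its single candidate key {S#, P#, J#}
SPJ_KEYS = [{"S#", "P#", "J#"}]
print(jd_implied_by_keys([{"S#", "P#"}, {"P#", "J#"}, {"J#", "S#"}], SPJ_KEYS))  # False
```

The second result matches the statement that the 3-decomposability of SPJ is not implied by its candidate key.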

The objectives of normalisation:
1. To eliminate certain types of redundancy
2. To avoid certain update anomalies
3. To produce a design that is a good representation of the
real world (i.e. utilise semantic information)
4. To simplify the enforcement of certain integrity
constraints
e.g. if a relation is in 5NF and we enforce uniqueness of
candidate keys, then we automatically enforce all JDs (and all
MVDs and FDs), since they are implied by the
candidate keys