
8. Normalization

See Pernul, Unland: Datenbanken im Unternehmen, Chapter 3.6.3.3.
See also Elmasri, Navathe: Fundamentals of Database Systems, Chapters 10-11.

8.1 Normal Forms


8.2 Normalization

Normalization of data is a process of analyzing the given relation schemas
based on their FDs and primary keys to achieve the desirable properties of
minimizing redundancy and minimizing insertion, deletion and update
anomalies. First of all we have to define the different normal forms.


8.1 Normal Forms


Normal Forms are a measuring system for the quality of a relational schema.

First Normal Form (1NF):

A relational schema RS({A1, ..., An}; {F}) is in 1NF if dom(Ai) (i = 1..n) is atomic.

This means that the domain of each attribute may contain only single, indivisible
values, and that the value of the attribute in each tuple must be a single
value from that domain. 1NF therefore rules out a set of values or a tuple as
the attribute value of a single tuple. In other words, 1NF prevents "relations
inside relations" or "relations as attributes of tuples".
To make this clearer, consider the following example "company", which is not in 1NF:

Department Number  City        Zip Code  Employees
101                Regensburg  93053     Obermeier
102                Regensburg  93047     Müller, Denk
103                Munich      80331     Haller, Holzer
104                Cologne     50667     Muster
105                Berlin      13629     Fischer, Eger

The same example in 1NF would look like this:

Department Number  City        Zip Code  Employee
101                Regensburg  93053     Obermeier
102                Regensburg  93047     Müller
102                Regensburg  93047     Denk
103                Munich      80331     Haller
103                Munich      80331     Holzer
104                Cologne     50667     Muster
105                Berlin      13629     Fischer
105                Berlin      13629     Eger

Unfortunately, this structure contains multiple redundancies. These can be
eliminated by the following Normal Forms.
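The step from the nested "company" table to its 1NF version can be sketched in
Python. This is only an illustration: the tuple layout and the function name
to_first_normal_form are assumptions made for this example, not part of the
original notes.

```python
# Non-1NF rows: the last field holds a list of employees (a non-atomic value).
departments = [
    (101, "Regensburg", "93053", ["Obermeier"]),
    (102, "Regensburg", "93047", ["Müller", "Denk"]),
    (103, "Munich", "80331", ["Haller", "Holzer"]),
    (104, "Cologne", "50667", ["Muster"]),
    (105, "Berlin", "13629", ["Fischer", "Eger"]),
]


def to_first_normal_form(rows):
    """Emit one row per (department, employee) pair so every value is atomic."""
    return [
        (number, city, zip_code, employee)
        for number, city, zip_code, employees in rows
        for employee in employees
    ]


flat = to_first_normal_form(departments)
for row in flat:
    print(row)
```

Each department with several employees now appears once per employee, which
is exactly where the repeated City and Zip Code values come from.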

Second Normal Form (2NF):

A 1NF relational schema RS(S;F) is in 2NF if each nonprime attribute of S is
fully functionally dependent on each key of RS.

If a relation schema is not in 2NF, it can be decomposed into new 2NF
relations in which each nonprime attribute is associated only with the part of
the primary key on which it is fully functionally dependent (remember: Y is
fully functionally dependent on X if the FD X → Y is left-reduced).

E.g. we can find the following functional dependencies in the relation above:

Department Number → City
Department Number, Employee → Department Number, City, Zip Code, Employee

In our example, Department Number and Employee together form a key
because of the second FD. However, City depends only on Department
Number, which is a part of the key. So we could build two separate relations
(Department Number, City) and (Department Number, Zip Code, Employee) to
eliminate this redundancy.
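A 2NF violation can be detected mechanically with the attribute closure X+. The
following Python sketch is an illustration only: FDs are modelled as pairs of
sets, and the names closure, partial_dependencies, DeptNo and ZipCode are
assumptions chosen for this example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the attribute closure of attrs under fds (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, nonprime, fds):
    """List (subset, attribute) pairs where a nonprime attribute
    already depends on a proper subset of the key."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in sorted(nonprime & closure(subset, fds)):
                found.append((set(subset), attr))
    return found


# The two FDs stated for the 1NF "company" relation.
fds = [
    ({"DeptNo"}, {"City"}),
    ({"DeptNo", "Employee"}, {"DeptNo", "City", "ZipCode", "Employee"}),
]
key = {"DeptNo", "Employee"}
nonprime = {"City", "ZipCode"}

violations = partial_dependencies(key, nonprime, fds)
print(violations)
```

With only these two FDs, the single reported violation is City depending on
DeptNo alone, which matches the decomposition proposed in the text.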

Third Normal Form (3NF):

A 2NF relational schema RS(S;F) is in 3NF if no nonprime attribute is
transitively dependent on the key.

Transitive Dependency:

Given: X, Y ⊆ S and an attribute A from S. Attribute A is transitively dependent
on X via Y if the following holds:
X → Y, Y ↛ X, Y → A, A ∉ XY.

This means that X determines Y, but Y does not determine X. In addition, Y
determines A, and A is not an attribute of XY.

In the section on 2NF we did not consider that Zip Code → City. Let's expand
our relation (not in 3NF):

Department Nr.  City        Zip Code  Street              Street Nr.  Employee Name
101             Regensburg  93053     Universitätsstraße  1           Obermeier
102             Regensburg  93047     Ägidienplatz        2           Müller
103             Munich      80331     Tal                 41          Haller
104             Cologne     50667     Helenenstraße       14          Muster
105             Berlin      13629     Jugendweg           4           Fischer
106             Berlin      13629     Rohrdamm            80          Birkert
107             Hamburg     20095     Bugenhagenstraße    28          Schmidt

Now, we have the FDs Department Number → Zip Code and Zip Code → City,
so City depends on Department Number transitively because of Zip Code. In
the table of the example we can see that the information "City Berlin belongs to
Zip Code 13629" is redundant. So again we could form a new relation (City, Zip
Code) to eliminate this redundancy.

If we choose X as a key and Y as a proper subset of that key, we can see that
partial functional dependencies are a special kind of transitive dependency;
so 3NF implies 2NF.
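
The four conditions of a transitive dependency translate directly into closure
tests. The sketch below is a hedged illustration in Python; the function names
and the attribute names DeptNo, ZipCode and Employee are assumptions made for
this example.

```python
def closure(attrs, fds):
    """Return the attribute closure of attrs under fds (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_transitively_dependent(a, x, y, fds):
    """Check X → Y, Y ↛ X, Y → A and A ∉ XY for attribute a."""
    x, y = set(x), set(y)
    return (
        y <= closure(x, fds)            # X → Y
        and not x <= closure(y, fds)    # Y does not determine X
        and a in closure(y, fds)        # Y → A
        and a not in x | y              # A is not an attribute of XY
    )


fds = [({"DeptNo"}, {"ZipCode"}), ({"ZipCode"}, {"City"})]

# City depends on the key DeptNo via ZipCode.
via_zip = is_transitively_dependent("City", {"DeptNo"}, {"ZipCode"}, fds)

# A partial dependency seen as a transitive one: X is the key
# {DeptNo, Employee} and Y is its proper subset {DeptNo}.
via_key_part = is_transitively_dependent(
    "City", {"DeptNo", "Employee"}, {"DeptNo"}, fds
)
print(via_zip, via_key_part)
```

The second call illustrates the remark that a partial dependency satisfies the
definition of a transitive dependency when Y is a proper subset of the key.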

Boyce-Codd Normal Form (BCNF):

BCNF is an even stronger normal form than 3NF, because it also rules out
certain dependencies between prime attributes. A relation schema RS is in BCNF
if for every nontrivial FD X → A, X is a superkey of RS; that is, every
nontrivial FD must have a superkey on its left side. Technically:

A 3NF relational schema RS(S;F) is in BCNF if for each Y ⊆ S and for each
attribute A ∈ S\Y the following holds: Y → A ⇒ Y → S.

In the table above, we also can notice the following FDs of attributes Zip Code,
City, Street and Street Number:
City, Street, Street Number → Zip Code
Zip Code → City

The keys for these four attributes are the set City, Street, Street Number and
the set Zip Code, Street, Street Number. So all four attributes are prime, but
there is still a dependency among them (see above):

City, Street, Street Number → Zip Code → City

We will not look into how to obtain BCNF relations from a 3NF relation.
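
Whether a schema is in BCNF can be tested by checking, for every nontrivial FD,
that the closure of its left side covers all attributes. The following Python
sketch illustrates this for the four address attributes; the function names and
the abbreviations StreetNo and ZipCode are assumptions for this example.

```python
def closure(attrs, fds):
    """Return the attribute closure of attrs under fds (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the nontrivial FDs whose left side is not a superkey."""
    attributes = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attributes
    ]


attributes = {"City", "Street", "StreetNo", "ZipCode"}
fds = [
    ({"City", "Street", "StreetNo"}, {"ZipCode"}),
    ({"ZipCode"}, {"City"}),
]

violations = bcnf_violations(attributes, fds)
print(violations)
```

Only Zip Code → City is reported: Zip Code alone does not determine Street or
Street Number, so it is not a superkey.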

Examples

Example 1: Determination of the Normal Form

Given:
RS(S;F) with S = {A, B, C, D, E} and F = {AB → CE, E → AB, C → D}

F is already minimal.

Candidate keys: AB, E

From this it follows that attributes C and D are nonprime.

Because neither C nor D depends on a proper subset of a candidate key,
RS(S;F) is in 2NF.
Because the nonprime attribute D is transitively dependent on the candidate
key AB (via C), RS(S;F) is not in 3NF.

Example 2: Determination of the Normal Form

Given:
RS(S;F) with S = {A, B, C, D} and F = {AC → BD, D → A, CD → A}

Minimal cover of F: F' = { AC → BD, D → A }

Candidate keys : AC, CD

Nonprime attribute: B
RS(S;F) is in 2NF, because B is fully functionally dependent on each candidate key.
It is in 3NF, because B is not transitively dependent on a candidate key.
It is not in BCNF, because the functional dependency D → A is not a key
dependency (D is not a superkey).
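
The candidate keys and nonprime attributes of both examples can be reproduced
by brute force over all attribute subsets. This Python sketch is only suitable
for small schemas like these; the helper names fd, closure and candidate_keys
are assumptions made for this illustration.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attribute names."""
    return (set(lhs), set(rhs))


def closure(attrs, fds):
    """Return the attribute closure of attrs under fds (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole schema."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


# Example 1
keys1 = candidate_keys("ABCDE", [fd("AB", "CE"), fd("E", "AB"), fd("C", "D")])
nonprime1 = set("ABCDE") - set().union(*keys1)

# Example 2
keys2 = candidate_keys("ABCD", [fd("AC", "BD"), fd("D", "A"), fd("CD", "A")])
nonprime2 = set("ABCD") - set().union(*keys2)

print(keys1, nonprime1)
print(keys2, nonprime2)
```

Smaller subsets are tried first, so any superset of an already found key is
skipped and only minimal keys are returned.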


8.2 Normalization
If a relation schema is normalized, it is decomposed into smaller relation
schemas that show the desirable properties. These smaller relation schemas
have a normal form of a higher degree. But there are some restrictions:

• The semantics must remain intact.
• No loss of information is allowed (attribute preservation).
• The nonadditive join property and the dependency preservation property
must be ensured (see below).

Nonadditive Join Property

This means that if a decomposed relation schema is reconstructed by a
natural join, no additional tuples may occur (such tuples are called
spurious tuples).

Example: Violation of the Nonadditive Join Property

r:

A B C
1 1 1
1 2 2
2 1 2

Decomposition:

πAB(r):

A B
1 1
1 2
2 1
πBC(r):

B C
1 1
2 2
1 2

Join:

πAB(r) ⋈ πBC(r):

A B C
1 1 1
1 1 2
1 2 2
2 1 1
2 1 2

The tuples (1, 1, 2) and (2, 1, 1) do not occur in r; they are spurious.
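
The example can be replayed with Python sets. This is a minimal sketch that
models a relation as a set of tuples (A, B, C); the variable names are
assumptions chosen for this illustration.

```python
# The relation r from the example, as a set of (A, B, C) tuples.
r = {(1, 1, 1), (1, 2, 2), (2, 1, 2)}

# Projections on AB and BC.
ab = {(a, b) for a, b, _ in r}
bc = {(b, c) for _, b, c in r}

# Natural join of the two projections on the common attribute B.
joined = {(a, b, c) for a, b in ab for b2, c in bc if b == b2}

# Tuples produced by the join that were never in r.
spurious = joined - r
print(sorted(joined))
print(sorted(spurious))
```

The join contains every tuple of r plus the extra ones, so information about
which combinations really existed has been lost by the decomposition.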

Dependency Preservation Property

The dependency preservation property ensures that each functional
dependency is represented in at least one of the individual relations after
decomposition, or can be derived from the dependencies represented there.

Example: Violation of the Dependency Preservation Property

RS({ A, B, C, D, E }; {A → BCD, CD → E, AE → B}) is decomposed into:

RS1({A, B, C, D}; {A → BCD}) and RS2({C, D, E }; {CD → E})

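The FD AE → B is no longer listed in any of the schemas. (In this particular F
it still follows from A → B; a dependency is truly lost only if it cannot be
derived from the union of the Fi.)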
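
Whether an FD survives a decomposition can be tested with the attribute
closure: X → Y is preserved if Y lies in the closure of X under the union of
the Fi. The Python sketch below is a simplified illustration. It takes the Fi
as given instead of computing the projections of F+, and the second
decomposition (of the address FDs from the BCNF section) is a hypothetical one
chosen for this example.

```python
def fd(lhs, rhs):
    """Build an FD from two iterables of attribute names."""
    return (set(lhs), set(rhs))


def closure(attrs, fds):
    """Return the attribute closure of attrs under fds (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def not_preserved(fds, local_fd_sets):
    """Return the FDs of fds that do not follow from the union of the Fi."""
    union = [f for part in local_fd_sets for f in part]
    return [(lhs, rhs) for lhs, rhs in fds if not rhs <= closure(lhs, union)]


# The decomposition from the example above.
F = [fd("A", "BCD"), fd("CD", "E"), fd("AE", "B")]
lost_1 = not_preserved(F, [[fd("A", "BCD")], [fd("CD", "E")]])

# Hypothetical decomposition of the address attributes into
# (ZipCode, City) and (Street, StreetNo, ZipCode).
G = [
    fd(["City", "Street", "StreetNo"], ["ZipCode"]),
    fd(["ZipCode"], ["City"]),
]
lost_2 = not_preserved(G, [[fd(["ZipCode"], ["City"])], []])

print(lost_1)
print(lost_2)
```

In the first case nothing is reported, because AE → B follows from A → B. In
the second case City, Street, Street Number → Zip Code cannot be derived from
the remaining FDs and is reported as not preserved.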

Definition:

A decomposition of RS(S;F) into {RSi(Xi;Fi)} (i = 1, ..., n) is valid if

• X1 ∪ ... ∪ Xn = S (attribute preservation)
• (F1 ∪ ... ∪ Fn)+ = F+ (dependency preservation property),
i.e. the union of all Fi is equivalent to F
• each relation r(S) with the schema RS(S;F) and decomposition RSi(Xi;Fi)
fulfills the condition:
r(S) = πX1(r) ⋈ ... ⋈ πXn(r) (nonadditive join property),
i.e. a natural join of the projections of r(S) on the subsets Xi results in the
original relation r(S)

Algorithm NORMALIZATION(RS(S;F))
Input: a relation schema RS(S;F) in an undesirable form
Output: a valid 3NF decomposition of RS(S;F) into {RSi(Xi;Fi)} (i=1, ..., n)

NORMALIZATION(RS(S;F))
BEGIN
    F := REDUCE(F);

    [determine the candidate keys of RS(S;F)]

    FOR EACH FD X→Y ∈ F DO
        determine X+;

    [determine groups of equivalent FDs]

    FOR EACH group DO
        generate a schema RSi(Xi;Fi) where
            Xi are all attributes of the FDs of the group and
            Fi are all FDs of the group;

    FOR EACH pair of schemas RSi(Xi;Fi), RSj(Xj;Fj) DO
        MERGE(RSi(Xi;Fi), RSj(Xj;Fj));

    IF NOT nonadditive join property holds THEN
        generate another schema RSi(Xi;Fi) with
            Xi the attributes of a candidate key and Fi = {};
END

MERGE(RSi(Xi;Fi), RSj(Xj;Fj))
BEGIN
    IF Xi ⊆ Xj AND RSi(Xi ∪ Xj; Fi ∪ Fj) satisfies 3NF THEN
        merge RSi(Xi;Fi) and RSj(Xj;Fj)
END

Example: Normalization

F = {

Exercise → Lecturer (E → L)
/* Each exercise is coached by a lecturer */

Exercise, Student → Grade (ES → G)
/* Only one grade per exercise and student */

Time, Lecture Room → Exercise (TR → E)
/* Only one exercise per room at the same time */

Time, Student → Lecture Room (TS → R)
/* A student can only be in one lecture room at the same time */

Time, Lecturer → Lecture Room (TL → R)
/* A lecturer can only be in one lecture room at the same time */

Time, Student → Exercise (TS → E)
/* A student can only be at one exercise at the same time */

Time, Lecturer → Exercise (TL → E)
/* A lecturer can coach only one exercise at the same time */

Time, Exercise → Lecture Room (TE → R)
/* An exercise can take place in only one room at the same time */

}

A minimal cover shall be generated. If this is not in 3NF, decompose it.

REDUCE(F) results in:

F'={ E → L, ES → G, TR → E, TS → E, TL → E, TE → R }
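
The notes do not spell out REDUCE. A common way to compute a minimal cover is
to split right sides, left-reduce each FD, and then drop FDs that follow from
the rest; the Python sketch below assumes that REDUCE works along these lines.
The function names are assumptions, and the result of such a procedure can
depend on the order in which the FDs are processed.

```python
def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attribute names."""
    return (set(lhs), set(rhs))


def closure(attrs, fds):
    """Return the attribute closure of attrs under fds (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: one attribute on every right side.
    work = [(set(lhs), {a}) for lhs, rhs in fds for a in sorted(rhs)]
    # Step 2: left-reduce, i.e. drop left-side attributes that are not needed.
    for i, (lhs, rhs) in enumerate(work):
        for attr in sorted(lhs):
            reduced = lhs - {attr}
            if reduced and rhs <= closure(reduced, work):
                lhs = reduced
                work[i] = (lhs, rhs)
    # Step 3: drop every FD that follows from the remaining ones.
    result = list(work)
    for current in work:
        rest = [other for other in result if other is not current]
        if current[1] <= closure(current[0], rest):
            result = rest
    return result


F = [
    fd("E", "L"), fd("ES", "G"), fd("TR", "E"), fd("TS", "R"),
    fd("TL", "R"), fd("TS", "E"), fd("TL", "E"), fd("TE", "R"),
]
cover = minimal_cover(F)
for lhs, rhs in cover:
    print("".join(sorted(lhs)), "→", "".join(sorted(rhs)))
```

With the FDs processed in the order listed, TS → R and TL → R are dropped,
because each follows from TS → E or TL → E together with TE → R.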

In the next step groups of equivalent FDs are determined:

• {E}+ = EL; Group 1
• {ES}+ = ESGL; Group 2
• {TR}+ = TREL; Group 3
• {TS}+ = TRELSG; Group 4
• {TL}+ = TREL; Group 3
• {TE}+ = TREL; Group 3

Therefore, TR → E, TL → E and TE → R are equivalent FDs (merged into the
same group), and the only candidate key is TS. In the next step a relation
schema is generated from each group:

RS1({E, L}; {E → L}),

RS2({E, S, G}; {ES → G}),

RS3({T, R, E, L}; {TR → E, TL → E, TE → R}) and

RS4({T, S, E}; {TS → E})
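
The grouping step can be sketched in Python by treating two FDs as equivalent
when each left side lies in the closure of the other. This is one reading of
"groups of equivalent FDs", stated here as an assumption; the function names
are likewise chosen for this illustration.

```python
def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attribute names."""
    return (set(lhs), set(rhs))


def closure(attrs, fds):
    """Return the attribute closure of attrs under fds (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def group_by_equivalent_left_sides(fds):
    """Put FDs into the same group if their left sides determine each other."""
    groups = []
    for lhs, rhs in fds:
        for group in groups:
            representative = group[0][0]
            if (lhs <= closure(representative, fds)
                    and representative <= closure(lhs, fds)):
                group.append((lhs, rhs))
                break
        else:
            groups.append([(lhs, rhs)])
    return groups


def schema_attributes(group):
    """Collect all attributes that occur in the FDs of one group."""
    attrs = set()
    for lhs, rhs in group:
        attrs |= lhs | rhs
    return attrs


reduced = [
    fd("E", "L"), fd("ES", "G"), fd("TR", "E"),
    fd("TS", "E"), fd("TL", "E"), fd("TE", "R"),
]
groups = group_by_equivalent_left_sides(reduced)
schemas = [schema_attributes(group) for group in groups]
for attrs, group in zip(schemas, groups):
    print(sorted(attrs), len(group), "FD(s)")
```

The four attribute sets produced correspond to RS1 to RS4; merging RS1 into RS3
and the key check are separate steps that this sketch does not cover.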

The relation schemas RS1 and RS3 can be merged, because the attributes of RS1
are a subset of those of RS3 and the merged relation schema does not violate
any 3NF constraint. The relation schema RS4 contains TS, the only candidate
key of the original schema, so the nonadditive join property is guaranteed.
All FDs of F' are contained in the decomposition, so the dependency
preservation property is guaranteed as well. A possible solution could look
like this:

test({Exercise, Student, Grade}; {Exercise Student → Grade})
candidate key: Exercise Student; BCNF

assignment({Time, Lecture Room, Exercise, Lecturer}; {Time Lecture Room →
Exercise, Time Lecturer → Exercise, Time Exercise → Lecture Room,
Exercise → Lecturer})
candidate keys: Time Lecture Room, Time Lecturer, Time Exercise; 3NF

timetable({Time, Student, Exercise}; {Time Student → Exercise})
candidate key: Time Student; BCNF

Note: The relation schema "assignment" is not in BCNF. Hence, anomalies such as
the insertion anomaly can still occur: an exercise can only be inserted into
the database if a time is available for it; a lecturer and a lecture room can
only be inserted if they are assigned to an exercise. The first anomaly can be
avoided by not merging RS1 and RS3. The other ones stem from the fact that no
further properties are specified for lecturer and lecture room.
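
The normal forms stated for the three resulting schemas can be checked
mechanically. The Python sketch below uses the common FD-based formulation
(for every nontrivial FD X → A, X is a superkey or, for 3NF, A is prime)
instead of the transitive-dependency definition used above. It checks only
the FDs listed for each schema, and all function names are assumptions made
for this illustration.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attribute names."""
    return (set(lhs), set(rhs))


def closure(attrs, fds):
    """Return the attribute closure of attrs under fds (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole schema."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


def classify(attributes, fds):
    """Return 'BCNF', '3NF' or 'below 3NF' for the listed FDs."""
    attributes = set(attributes)
    prime = set().union(*candidate_keys(attributes, fds))
    bcnf = third = True
    for lhs, rhs in fds:
        if rhs <= lhs or closure(lhs, fds) == attributes:
            continue  # trivial FD, or the left side is a superkey
        bcnf = False
        if not (rhs - lhs) <= prime:
            third = False
    return "BCNF" if bcnf else "3NF" if third else "below 3NF"


test = classify("ESG", [fd("ES", "G")])
assignment = classify(
    "TREL", [fd("TR", "E"), fd("TL", "E"), fd("TE", "R"), fd("E", "L")]
)
timetable = classify("TSE", [fd("TS", "E")])
print(test, assignment, timetable)
```

In "assignment" every attribute belongs to some candidate key, so E → L does
not violate 3NF even though E is not a superkey.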
