RelativeResourceManager JSESSIONID fKvNMv4Jb1T2cJjGTz3KlyjK315Rhy2JvCkkWBL6gTlg09QpKCB1!271448655!Wctng - Uni Ffo

8. Normalization
see Pernul, Unland: Datenbanken im Unternehmen, Chapter 3.6.3.3
also: Elmasri, Navathe: Fundamentals of Database Systems, Chapters 10-11
8.1 Normal Forms
8.2 Normalization
Normalization of data is the process of analyzing given relation schemas
based on their functional dependencies (FDs) and primary keys in order to
minimize redundancy and to minimize insertion, deletion and update
anomalies. First of all, we have to define the different normal forms.
8.1 Normal Forms

Normal forms are a measuring system for the quality of a relational schema.
First Normal Form (1NF):
A relational schema RS({A1, ..., An}; {F}) is in 1NF if dom(Ai) (i=1..n) is atomic.
This means that the domain of each attribute may only contain single, indivisible
values and that the value of the attribute in each tuple may only be a single
value from the domain. So 1NF prevents a set of values or a tuple from being the
attribute value of a single tuple. In other words, 1NF prevents "relations inside
relations" or "relations as attributes of tuples".
To make this clearer, take a look at the following example "company", which is not in 1NF:
Department Number   City         Zip Code   Employees
101                 Regensburg   93053      Obermeier
102                 Regensburg   93047      Müller, Denk
103                 Munich       80331      Haller, Holzer
104                 Cologne      50667      Muster
105                 Berlin       13629      Fischer, Eger
The same example in 1NF would look like this:
Department Number   City         Zip Code   Employee
101                 Regensburg   93053      Obermeier
102                 Regensburg   93047      Müller
102                 Regensburg   93047      Denk
103                 Munich       80331      Haller
103                 Munich       80331      Holzer
104                 Cologne      50667      Muster
105                 Berlin       13629      Fischer
105                 Berlin       13629      Eger
Unfortunately, this structure contains multiple redundancies. These can be
eliminated by the following normal forms.
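The step from the first table to the second can be sketched in a few lines of Python. This is only an illustration: the tuple layout and the name to_1nf are assumptions made for this sketch, not part of the lecture material.

```python
# Non-1NF rows of the "company" example: the last field holds a list of names.
company = [
    (101, "Regensburg", "93053", ["Obermeier"]),
    (102, "Regensburg", "93047", ["Müller", "Denk"]),
    (103, "Munich", "80331", ["Haller", "Holzer"]),
    (104, "Cologne", "50667", ["Muster"]),
    (105, "Berlin", "13629", ["Fischer", "Eger"]),
]


def to_1nf(rows):
    """Emit one tuple per (department, employee) pair so every value is atomic."""
    return [
        (dept, city, zip_code, employee)
        for dept, city, zip_code, employees in rows
        for employee in employees
    ]


company_1nf = to_1nf(company)
for row in company_1nf:
    print(row)
```

The result has eight tuples, and the city and zip code of a department are repeated once per employee, which is exactly the redundancy mentioned above.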
Second Normal Form (2NF):
A 1NF relational schema RS(S;F) is in 2NF if each nonprime attribute of S is
fully functionally dependent on each key of RS.
If a relation schema is not in 2NF, it can be decomposed into new 2NF
relations in which nonprime attributes are associated only with the part of the
primary key on which they are fully functionally dependent (remember: Y is
fully functionally dependent on X if the FD X → Y is left-reduced).
E.g. we can find the following functional dependencies in the relation above:
Department Number → City
Department Number, Employee → Department Number, City, Zip Code, Employee
In our example, Department Number and Employee together form a key
because of the second FD. However, City depends only on Department
Number, which is a part of the key. So we could build two separate relations
(Department Number, City) and (Department Number, Zip Code, Employee) to
eliminate the redundancy.
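The partial dependency can also be found mechanically. The following is a minimal sketch, assuming FDs are given as pairs of Python sets; the helper names closure and partial_dependencies and the shortened attribute names (DeptNo, ZipCode) are made up for this illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, nonprime, fds):
    """Pairs (part, attribute) where a nonprime attribute depends on a proper subset of key."""
    found = []
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            for attr in sorted(closure(part, fds) & nonprime):
                found.append((set(part), attr))
    return found


# The two FDs stated in the text, with shortened attribute names.
fds = [
    ({"DeptNo"}, {"City"}),
    ({"DeptNo", "Employee"}, {"DeptNo", "City", "ZipCode", "Employee"}),
]
key = {"DeptNo", "Employee"}
nonprime = {"City", "ZipCode"}

violations = partial_dependencies(key, nonprime, fds)
print(violations)  # City depends on DeptNo alone
```

With only the two FDs listed above, City is the single attribute reported, which matches the decomposition into (Department Number, City) and the remaining relation.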
Third Normal Form (3NF):
A 2NF relational schema RS(S;F) is in 3NF if no nonprime attribute is
transitively dependent on a key.
Transitive Dependency:
Given: X, Y ⊆ S and an attribute A from S. Attribute A is transitively dependent
on X via Y if the following holds:
X → Y, Y ↛ X, Y → A, A ∉ XY.
This means that X determines Y, but Y does not determine X. In addition, Y
determines A, and A is not an attribute of XY.
In the section on 2NF we did not consider that Zip Code → City. Let's expand
our relation (not in 3NF):
Department Nr.   City         Zip Code   Street               Street Nr.   Employee Name
101              Regensburg   93053      Universitätsstraße   1            Obermeier
102              Regensburg   93047      Ägidienplatz         2            Müller
103              Munich       80331      Tal                  41           Haller
104              Cologne      50667      Helenenstraße        14           Muster
105              Berlin       13629      Jugendweg            4            Fischer
106              Berlin       13629      Rohrdamm             80           Birkert
107              Hamburg      20095      Bugenhagenstraße     28           Schmidt
Now, we have the FDs Department Number → Zip Code and Zip Code → City,
so City depends on Department Number transitively because of Zip Code. In
the table of the example we can see that the information "City Berlin belongs to
Zip Code 13629" is redundant. So again we could form a new relation (City, Zip
Code) to eliminate this redundancy.
If we choose Y as a proper subset of a key X, we can see that partial functional
dependencies are a special kind of transitive dependency; so 3NF implies
2NF.
Boyce-Codd Normal Form (BCNF):
BCNF is an even stronger normal form than 3NF, because it also rules out
problematic dependencies between prime attributes. A relation schema is in BCNF
if for every nontrivial FD X → A, X is a superkey; this means that every
nontrivial FD has to have a superkey on its left side. Technically:
A 3NF relational schema RS(S;F) is in BCNF if for each Y ⊆ S and for each
attribute A ∈ S\Y the following holds: Y → A ⇒ Y → S.
In the table above, we can also notice the following FDs between the attributes
Zip Code, City, Street and Street Number:
City, Street, Street Number → Zip Code
Zip Code → City
Both the set {City, Street, Street Number} and the set {Zip Code, Street, Street
Number} are keys for these four attributes. So all four attributes are prime, but
there is still a dependency between some of them (see above):
City, Street, Street Number → Zip Code → City
Since Zip Code alone is not a superkey, the FD Zip Code → City violates BCNF.
We will not look into the issue of how to obtain BCNF from a 3NF relation.
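The BCNF condition "every nontrivial FD has a superkey on its left side" translates directly into a small check. The sketch below restricts the example to the four address attributes; closure and bcnf_violations are hypothetical helper names, and the attribute names are shortened for this illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Nontrivial FDs whose left side is not a superkey of schema."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != schema
    ]


schema = {"City", "Street", "StreetNo", "ZipCode"}
fds = [
    ({"City", "Street", "StreetNo"}, {"ZipCode"}),
    ({"ZipCode"}, {"City"}),
]

violations = bcnf_violations(schema, fds)
print(violations)  # only Zip Code -> City is reported
```

Checking only the FDs listed in F (rather than all of F+) is sufficient when testing the original schema against its own F; it would not be sufficient for testing a projected sub-schema.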
Examples
Example 1: Determination of the Normal Form
Given:
RS(S;F) with S = {A, B, C, D, E} and F = {AB → CE, E → AB, C → D}
F is already minimal.
Candidate keys: AB, E
It follows that attributes C and D are nonprime.
Because neither C nor D depends on a proper subset of a candidate key,
RS(S;F) is in 2NF.
Because the nonprime attribute D is transitively dependent on the candidate
key AB (via C), RS(S;F) is not in 3NF.
Example 2: Determination of the Normal Form
Given:
RS(S;F) with S = {A, B, C, D} and F = {AC → BD, D → A, CD → A}
Minimal cover of F: F' = { AC → BD, D → A }
Candidate keys: AC, CD
Nonprime attribute: B
RS(S;F) is in 2NF, because B is fully functionally dependent on each candidate key.
It is in 3NF because B is not transitively dependent on a candidate key.
It is not in BCNF, because the left side of the functional dependency D → A is
not a superkey.
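Both examples can be re-checked with a brute-force sketch. It assumes single-letter attribute names and atomic domains (so 1NF is taken for granted); the functions fd, closure, candidate_keys and normal_form are illustrative names, and the key search is exponential, so it is only meant for schemas of this size.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attribute names."""
    return (set(lhs), set(rhs))


def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if closure(cand, fds) == schema and not any(k <= cand for k in keys):
                keys.append(cand)
    return keys


def normal_form(schema, fds):
    """Highest normal form among 1NF, 2NF, 3NF, BCNF that the schema satisfies."""
    keys = candidate_keys(schema, fds)
    prime = set().union(*keys)
    nonprime = schema - prime
    # 2NF fails if a proper subset of some key determines a nonprime attribute.
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if closure(part, fds) & nonprime:
                    return "1NF"
    # Split right sides into single attributes and drop trivial parts.
    unit = [(lhs, a) for lhs, rhs in fds for a in rhs - lhs]
    if all(closure(lhs, fds) == schema for lhs, _ in unit):
        return "BCNF"
    if all(closure(lhs, fds) == schema or a in prime for lhs, a in unit):
        return "3NF"
    return "2NF"


example1 = normal_form(set("ABCDE"), [fd("AB", "CE"), fd("E", "AB"), fd("C", "D")])
example2 = normal_form(set("ABCD"), [fd("AC", "BD"), fd("D", "A"), fd("CD", "A")])
print(example1, example2)  # 2NF 3NF
```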
8.2 Normalization
If a relation schema is normalized, it is decomposed into smaller relation
schemas that show the desirable properties. These smaller relation schemas
are in a higher normal form. But there are some restrictions:
• The semantics must remain intact.
• No loss of information is allowed (attribute preservation).
• The nonadditive join property and the dependency preservation property
must be ensured (see below).
Nonadditive Join Property
This means that if a decomposed relation schema is reconstructed by a
natural join, no additional tuples are allowed to occur (such tuples are called
spurious tuples).
Example: Violation of the Nonadditive Join Property
r:
A B C
1 1 1
1 2 2
2 1 2
Decomposition:
πAB(r):
A B
1 1
1 2
2 1
πBC(r):
B C
1 1
2 2
1 2
Join:
πAB(r) ⋈ πBC(r):
A B C
1 1 1
1 1 2
1 2 2
2 1 1
2 1 2
The tuples (1, 1, 2) and (2, 1, 1) are not contained in r; they are spurious tuples.
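The example can be replayed with Python sets, which makes the spurious tuples easy to isolate. This is a small sketch with tuples in the order (A, B, C); the variable names are chosen for this illustration only.

```python
# The relation r from the example, as a set of (A, B, C) tuples.
r = {(1, 1, 1), (1, 2, 2), (2, 1, 2)}

# Projections onto AB and BC.
proj_ab = {(a, b) for a, b, _ in r}
proj_bc = {(b, c) for _, b, c in r}

# Natural join of the two projections on the shared attribute B.
joined = {(a, b, c) for a, b in proj_ab for b2, c in proj_bc if b == b2}

# Tuples produced by the join that were never in r.
spurious = joined - r
print(sorted(spurious))  # [(1, 1, 2), (2, 1, 1)]
```

The join always contains r itself; the decomposition is nonadditive exactly when nothing else is produced, i.e. when the set of spurious tuples is empty.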
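Dependency Preservation Property
The dependency preservation property ensures that each functional
dependency of F is represented in one of the individual relation schemas after
decomposition, or can be derived from the dependencies represented there.
Example: Violation of the Dependency Preservation Property
RS({A, B, C, D, E}; {A → BCD, CD → E, AE → B}) is decomposed into:
RS1({A, B, C, D}; {A → BCD}) and RS2({C, D, E}; {CD → E})
The FD AE → B is lost: its attributes are spread over both schemas, so it is no
longer stated in any single one. (In this particular F it still follows from A → B
by augmentation, so the union of the Fi remains equivalent to F; the property is
genuinely violated when a dropped FD cannot be derived from the remaining ones.)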
Definition:
A decomposition of RS(S;F) into {RSi(Xi;Fi)} (i=1, ..., n) is valid if
• X1 ∪ ... ∪ Xn = S (attribute preservation)
• (F1 ∪ ... ∪ Fn)+ = F+ (dependency preservation property)
i.e. the union of all Fi is equivalent to F
• each relation r(S) with the schema RS(S;F) and decomposition RSi(Xi;Fi)
fulfills the condition:
r(S) = πX1(r) ⋈ ... ⋈ πXn(r) (nonadditive join property)
i.e. a natural join of the projections of r(S) on the subsets Xi results in the
original relation r(S)
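The first two conditions of the definition can be tested with attribute closures alone. The sketch below uses a hypothetical schema that does not come from the lecture, R({A, B, C}; {AB → C, C → B}) split into {A, C} and {B, C}, because there the FD AB → C really cannot be derived after the decomposition. The name check_decomposition is made up, and the code assumes that every Fi only contains FDs that follow from F, so only one direction of the equivalence is tested.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def check_decomposition(schema, fds, parts):
    """parts is a list of (Xi, Fi); returns (attributes_preserved, lost_fds).

    Assumes each Fi follows from fds, so only the direction
    "F is implied by the union of the Fi" is checked.
    """
    attributes_preserved = set().union(*(x for x, _ in parts)) == schema
    union_fds = [f for _, fi in parts for f in fi]
    lost = [(lhs, rhs) for lhs, rhs in fds if not rhs <= closure(lhs, union_fds)]
    return attributes_preserved, lost


# Hypothetical example: AB -> C cannot be rebuilt from C -> B alone.
schema = {"A", "B", "C"}
fds = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]
parts = [
    ({"A", "C"}, []),
    ({"B", "C"}, [({"C"}, {"B"})]),
]

attributes_preserved, lost = check_decomposition(schema, fds, parts)
print(attributes_preserved, lost)
```

An FD counts as lost only if its right side is missing from the closure of its left side under the union of the Fi, which is the closure-based reading of "the union of all Fi is equivalent to F".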
Algorithm NORMALIZATION(RS(S;F))
Input: a relation schema RS(S;F) in an undesirable form
Output: a valid 3NF decomposition of RS(S;F) into {RSi(Xi;Fi)} (i=1, ..., n)
NORMALIZATION(RS(S;F))
BEGIN
    F := REDUCE(F);
    [determine the candidate keys of RS(S;F)]
    FOR EACH FD X → Y ∈ F DO
        determine X+;
    [determine groups of equivalent FDs]
    FOR EACH group DO
        generate a schema RSi(Xi;Fi) where
            Xi are all attributes of the FDs of the group and
            Fi are all FDs of the group;
    MERGE(RSi(Xi;Fi), RSj(Xj;Fj));
    IF NOT nonadditive join property holds THEN
        generate another schema RSi(Xi;Fi) with
            Xi = attributes of a candidate key and Fi = {};
END
MERGE(RSi(Xi;Fi), RSj(Xj;Fj))
BEGIN
    IF Xi ⊆ Xj AND RSi(Xi ∪ Xj; Fi ∪ Fj) satisfies 3NF THEN
        merge RSi(Xi;Fi) and RSj(Xj;Fj)
END
Example: Normalization
F = {
Exercise → Lecturer (E → L)
/* Each exercise is coached by one lecturer */
Exercise, Student → Grade (ES → G)
/* Only one grade per exercise and student */
Time, Lecture Room → Exercise (TR → E)
/* Only one exercise per room at the same time */
Time, Student → Lecture Room (TS → R)
/* A student can only be in one lecture room at the same time */
Time, Lecturer → Lecture Room (TL → R)
/* A lecturer can only be in one lecture room at the same time */
Time, Student → Exercise (TS → E)
/* A student can only attend one exercise at the same time */
Time, Lecturer → Exercise (TL → E)
/* A lecturer can coach only one exercise at the same time */
Time, Exercise → Lecture Room (TE → R)
/* An exercise can take place in only one room at the same time */
}
A minimal cover shall be generated. If this is not in 3NF, decompose it.
REDUCE(F) results in:
F'={ E → L, ES → G, TR → E, TS → E, TL → E, TE → R }
In the next step, groups of equivalent FDs are determined:
• {E}+ = EL; Group 1
• {ES}+ = ESGL; Group 2
• {TR}+ = TREL; Group 3
• {TS}+ = TRELSG; Group 4
• {TL}+ = TREL; Group 3
• {TE}+ = TREL; Group 3
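The closures listed above can be recomputed from F' with a short sketch. Grouping FDs by equal closure of their left sides is used here as one straightforward way to find equivalent FDs; it is an assumption of this sketch rather than a statement about how the lecture's tooling works, and the names fd, closure, reduced and groups are illustrative.

```python
from collections import defaultdict


def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# F' as produced by REDUCE(F) in the example.
reduced = [
    fd("E", "L"), fd("ES", "G"), fd("TR", "E"),
    fd("TS", "E"), fd("TL", "E"), fd("TE", "R"),
]

# FDs whose left sides have the same closure end up in the same group.
groups = defaultdict(list)
for lhs, rhs in reduced:
    groups[frozenset(closure(lhs, reduced))].append((lhs, rhs))

for attrs, members in groups.items():
    names = ["".join(sorted(l)) + " -> " + "".join(sorted(r)) for l, r in members]
    print("".join(sorted(attrs)), names)
```

Only the left side TS has a closure that covers all six attributes, which is why TS is the only candidate key.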
Therefore: TR → E, TL → E, TE → R are equivalent FDs (merged into the
same group) and the only candidate key is TS. From each group a relation
schema shall be generated in the next step:
RS1({E,L}; {E → L}),
RS2({E,S,G}; {ES → G}),
RS3({T,R,E,L}; {TR → E, TL → E, TE → R}) and
RS4({T,S,E}; {TS → E})
Both relation schemas RS1 and RS3 can be merged because of RS1 ⊆ RS3
and because the new relation schema does not violate any 3NF-constraints. In
the relation schema RS4 the key TS is also the only candidate key, thus the
nonadditive join property is granted. All FDs of F+ are also enclosed in the
decompositions and therefore the dependency preservation property is granted
as well. A possible solution could look like this:
test({Exercise, Student, Grade}; {Exercise Student → Grade})
candidate key: Exercise Student; BCNF
assignment({Time, Lecture Room, Exercise, Lecturer}; {Time Lecture Room →
Exercise, Time Lecturer → Exercise, Time Exercise → Lecture Room,
Exercise → Lecturer})
candidate keys: Time Lecture Room, Time Lecturer, Time Exercise; 3NF
timetable({Time, Student, Exercise}; {Time Student → Exercise})
candidate key: Time Student; BCNF
Note: The relation schema "assignment" is not in BCNF. Hence, anomalies like
the insertion anomaly still can occur: An exercise can only be inserted into the
database if a time is available for it; a lecturer and a lecture room can only be
inserted if they are assigned to an exercise. The first anomaly can be avoided
by not merging RS1 and RS3. The other ones are based on the fact that no
more properties are specified for lecturer and lecture room.