Sei sulla pagina 1di 30

Notes of Database Management System

UNIT – III

NORMALISATION
Functional Dependencies

A relational schema has attributes (A, B, C... Z) and that the whole database is described by a single universal relation
called R = (A, B, C, ..., Z). This assumption means that every attribute in the database has a unique name. A functional
dependency is a property of the semantics of the attributes in a relation. The semantics indicate how attributes relate to one
another, and specify the functional dependencies between attributes. When a functional dependency is present, the
dependency is specified as a constraint between the attributes.
Consider a relation with attributes A and B, where attribute B is functionally dependent
on attribute A. If we know the value of A and we examine the relation that holds this dependency, we will find only one
value of B in all of the tuples that have a given value of A, at any moment in time.

In the figure above, A is the determinant of B and B is the consequent of A.

The determinant of a functional dependency is the attribute or group of attributes on the left-hand side of the arrow in
the functional dependency. The consequent of a fd is the attribute or group of attributes on the right-hand side of the
arrow.
Consider a relational schema

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 1


Notes of Database Management System

The functional dependency staff# → position clearly holds on this relation instance. However, the reverse functional
dependency position → staff# clearly does not hold.

The relationship between staff# and position is 1:1 – for each staff member there is only one position. On the other
hand, the relationship between position and staff# is 1:M – there are several staff numbers associated with a given
position.

For the purposes of normalization we are interested in identifying functional dependencies between attributes of a
relation that have a 1:1 relationship.
A functional dependency is a property of a relational schema (its intension) and not a property of a
particular instance of the schema (extension).
Identify the functional dependencies that hold using the relation schema STAFFBRANCH
In order to identify the time invariant Fds, we need to clearly understand the semantics of the various attributes in each of
the relation schemas in question. For example, if we know that a staff member’s position and the branch at which they are
located determines their salary.

Inference Rules for Functional Dependencies


The set of all Fds that are implied by a given set S of Fds is called the closure of S, written S+.

The very first Armstrong is the person who gives a set of inference rules. The following are the six well-known
inference rules that apply to functional dependencies.
IR1: reflexive rule – if X ⊇ Y, then X → Y
IR2: augmentation rule – if X → Y, then XZ → YZ
IR3: transitive rule – if X → Y and Y → Z, then X → Z
IR4: projection rule – if X → YZ, then X → Y and X → Z
IR5: additive rule – if X → Y and X → Z, then X → YZ
IR6: pseudo transitive rule – if X → Y and YZ → W, then XZ → W

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 2


Notes of Database Management System

The first three of these rules (IR1-IR3) are known as Armstrong’s Axioms and constitute a necessary and sufficient set of
inference rules for generating the closure of a set of functional dependencies.
The other rules derived from this are:

Given R = (A,B,C,D,E,F,G,H, I, J) and


F = {AB → E, AG → J, BE → I, E → G, GI → H}
Does F = { AB → GH} ?
Proof
1. AB → E, given in F
2. AB → AB, reflexive rule IR1
3. AB → B, projective rule IR4 from step 2
4. AB → BE, additive rule IR5 from steps 1 and 3
5. BE → I, given in F
6. AB → I, transitive rule IR3 from steps 4 and 5
7. E → G, given in F
8. AB → G, transitive rule IR3 from steps 1 and 7
9. AB → GI, additive rule IR5 from steps 6 and 8
10. GI → H, given in F
11. AB → H, transitive rule IR3 from steps 9 and 10
12. AB → GH, additive rule IR5 from steps 8 and 11 . Hence proven

Irreducible sets of dependencies

We define a set of Fds to be irreducible( Usually called minimal in the literature) if and only if it satisfies the
following three properties
1. The right hand side (the dependent) of every Fds in S involves just one attribute (that is, it is singleton set)
2. The left hand side (determinant) of every in S is irreducible in turn-meaning that no attribute can be
discarded from the determinant without changing the closure S+(that is, with out converting S into some set
not equivalent to S). We will say that such an Fd is left irreducible.
3. No Fd in S can be discarded from S without changing the closure S+(That is, without converting s into
some set not equivalent to S)

Eg. Relation R {A,B,C,D,E,F} satisfies the following Fds


AB → C , C → A , BC → D , ACD → B
BE → C
CE → FA
CF → BD
D → EF
Find an irreducible equivalent for this set of Fds?
Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 3
Notes of Database Management System

The solution is simple.


1. AB → C
2. C → A
3. BC → D
4. ACD → B
5. BE → C
6. CE → A
7. CE → F
8. CF → B
9. CF → D
10. D → E
11. D → F

Now:
• 2 implies CE → AE (By augmentation), CE → A (By decomposition), so we can drop 6.
• 8 implies CF → BC (By augmentation), by which 3 implies CF → D (By Transitivity), so we can drop 9.
• 8 implies ACF → AB (By augmentation), and 11 implies ACD → ACF (By augmentation), and so ACD → AB
(By Transitivity), and so ACD → B (By Decomposition), so we can drop 4.

No further reductions are possible, and so we are left with the following irreducible set:
1. AB → C
2. C → A
3. BC → D
5. BE → C
7. CE → F
8. CF → B
10. D → E
11. D → F

Eg. Given a relation schema R(A,B,C,D) and a set of FDs


A BC
B C
A B
AB C
AC D
Compute an irreducible set of FDs that is equivalent to this set.

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 4


Notes of Database Management System

Sol:
1) The first step is to rewrite the FDs such that each has a singleton right-hand side:
AB
AC [By Decomposition]
BC
AB
ABC
ACD
We observe immediately that is FD AB occurs twice, so one occurrence can be eliminated.

2) Next, attribute C can be eliminated from the left-hand side of the FD ACD, because we have
AC, so AAC by augmentation, and we are given ACD, so AD by transitivity; thus the C on
the left-hand side of ACD is redundant.
So the remaining FDs are:
AB
AC
BC
ABC
AD

3) Next, we observe that the FD ABC can be eliminated, because again we have AC, so ABCB
by augmentation, so ABC by decomposition. So the remaining FDs are:
AB
AC
BC
AD

4) Finally, the FD AC is implied by the FDs AB and BC, so it can also be eliminated. We are
left with
AB
BC
AD

This set is irreducible.

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 5


Notes of Database Management System

Eg. Consider the Relation R(A,B,C,D,E,F,G,H,I,J) and a set of FDs


ABDE , ABG , BF , CJ ,CJI ,GH
Is this an irreducible set? What are the candidate keys?

Sol: 1) The set is not irreducible, since CJ and CJI together imply CI.

2) The super key of this relation is (A, B, C, D, G, J) (i.e. the set of attributes mentioned on the
left-hand side of the given FDs). We can eliminate J from this set because CJ, and we eliminate
G because ABG. Since none of the attributes A, B, C, D appears on the right-hand side of any
of the given FDs, it follows that (A,B,C,D) is a candidate key.

Full Functional Dependency – A functional dependency XY is a full functional dependency if removal
of any attribute A from X means that the dependency does not hold any more; i.e. for any attribute A є X,
(X – {A}) does not functionally determine Y.

E.g. consider the following InvLine relation.

INV_NUM LINENUM QTY INVDATE

Candidate Keys: {InvNum, LineNum}.


Qty and InvDate is fully functionally dependent on combination of InvNum and LineNum, because
removal of any attribute from candidate key does not determine any entity from the relation.

Trivial Dependency – A functional dependency XY is trivial if X ‫ כ‬Y.(i.e X is a super set of Y or Y is a


subset of X).
E.g. The FD ABB and ABA is a trivial dependency.

Partial dependency
A partial dependency exists when an attribute B is functionally dependent on an attribute A, and A is a component of a
multipart candidate key. Candidate keys: {InvNum, LineNum} InvDate is partially dependent on {InvNum, LineNum} as
InvNum is a determinant of InvDate and InvNum is part of a candidate key.

If there is some attribute that can be removed from A and the dependency still holds.

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 6


Notes of Database Management System

PART - II
A redundancy in a conceptual schema corresponds to a piece of information that can be derived (that is, obtained
through a series of retrieval operations) from other data in the database.
The presence of a redundancy in a database may be decided upon the following factores
• an advantage: a reduction in the number of accesses necessary to.Obtain the derived information.
• a disadvantage: because of larger storage requirements, (but, usually at negligible cost) and the necessity to
carry out additional operations in order to keep the derived data consistent.
The serious problem with using the relations is the problem of update anomalies.
These can be classified in to
• Insertion anomalies
• Deletion anomalies
• Modification anomalies

Q. Discuss insertion, deletion, and modification anomalies. Why are they considered bad? Illustrate with
examples.

Ans. EMP_DEPT

Insertion Anomalies
An "insertion anomaly" is a failure to place information about a new database entry into all the places in the database
where information about that new entry needs to be stored. In a properly normalized database, information about a new
entry needs to be inserted into only one place in the database; in an inadequately normalized database, information about a
new entry may need to be inserted into more than one place and, human fallibility being what it is, some of the needed
additional insertions may be missed.
Eg. First Instance: - To insert a new employee tuple in to Emp_Dept table, we must include either the attribute values for
the department that the employee works for, or nulls (if the employee does not work for a department as yet). For example
to insert a new tuple for an employee who works in department no 5, we must enter the attribute values of department
number 5correctly so that they are consistent, with values for the department 5 in other tuples in emp_dept.

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 7


Notes of Database Management System

Second Instance: - It is difficult to insert a new department that has no employees as yet in the emp_dept relation. The only
way to do this is to place null values in the attributes for the employee this causes a problem because SSN in the primary
key of emp_dept table and each tuple is supposed to represent an employee entity- not a department entity.
Moreover, when the first employee is assigned to that department, we do not need this tuple with null values anymore.
Deletion Anomalies
A "deletion anomaly" is a failure to remove information about an existing database entry when it is time to remove that
entry. In a properly normalized database, information about an old, to-be-gotten-rid-of entry needs to be deleted from only
one place in the database; in an inadequately normalized database, information about that old entry may need to be deleted
from more than one place, and, human fallibility being what it is, some of the needed additional deletions may be missed.
Eg. The problem of deletion anomaly is related to the second insertion anomaly situation which we have discussed earlier,
if we delete from emp_dept an employee tuple that happens to represent the last employee working for a particular
department, the information concerning that department is lost from the database.
Modification Anomalies
Eg. In Emp_Dept, if we change the value of one of the attribute of a particular department- say, the manager of
department 5-we must update the tuples of all employees who work in that department; other wise, the database will
become inconsistent. If we fail to update some tuples, the same department will be shown to have 2 different values for
manager in different employee tuple which would be wrong.
All three kinds of anomalies are highly undesirable, since their occurrence constitutes corruption of the
database. Properly normalized databases are much less susceptible to corruption than are unnormalized databases.
Redundant information not only wastes storage but makes updates more difficult since.

Normalization is the process of taking data from a problem and reducing it to a set of relations while
ensuring data integrity, eliminating data redundancy and minimizing the insertion, deletion and update
anomalies.
1. Data Integrity – All of the data in the database are consistent and satisfy all integrity constraints.
2. Data redundancy – If data in the database can be found in two different locations (direct redundancy) or if data can be
calculated from other data items (indirect redundancy) then the data is said to contain redundancy.
Data should only be stored once and avoid storing data that can be calculated from other
data already held in the database. During the process of normalization redundancy must be removed, but not at the expense
of breaking data integrity rules.

Basically the normal form of the data indicates how much redundancy is in that data. The normal forms
have a strict ordering.
a) 1NF
b) 2NF
c) 3NF
d) BCNF
e) 4NF
f) 5NF

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 8


Notes of Database Management System

A normalized relational database provides several benefits:


• Elimination of redundant data storage.
• Close modeling of real world entities, processes, and their relationships.
• Structuring of data so that the model is flexible.

1st Normal Form (1NF)


Def: A table (relation) is in 1NF if
1. There are no duplicated rows in the table.
2. Each cell is single-valued (i.e., there are no repeating groups or arrays).
3. Entries in a column (attribute, field) are of the same kind.
Note: The order of the rows is immaterial; the order of the columns is immaterial. The requirement that there be no
duplicated rows in the table means that the table has a key (although the key might be made up of more than one
column—even, possibly, of all the columns).

So we come to the conclusion,


A relation is in 1NF if and only if all underlying domains contain atomic values only.
The first normal form deals only with the basic structure of the relation and does not resolve the problems of
redundant information or the anomalies discussed earlier. All relations discussed in these notes are in 1NF.

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 9


Notes of Database Management System

Eg. The following is not in 1NF.


EmpNum EmpPhone EmpDegrees

1234 2345-987

3456 2367-890 BA,BSC

6789 2367-890 BSC,MSC,PhD


EmpDegrees is a multi valued field:
Employee 6789 has 3 degrees: BSC,PhD and MSC
Employee 3456 has 2 degrees: BA, BSC
To obtain 1NF relations we must, without loss of information, replace the above with two relations.
EMP_DETAIL
EmpNum EmpPhone

1234 2345-987

3456 2367-890

6789 2367-890

EMP_DEG
EmpNum EmpDegrees

1234

3456 BA

3456 BSC

6789 BSC

6789 MSC

6789 PhD

An outer join between Employee and EmployeeDegree will produce the information that the original table hold.

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 10


Notes of Database Management System

Eg. No. of duplicate tuples

Eg. EMP_Proj

A relation is in first normal form if and only if, in every legal value of that relation every tuple contains one
value for each attribute
The above definition merely states that the relations are always in first normal form which is always correct.
However the relation that is only in first normal form has a structure those undesirable for a number of reasons.
First normal form (1NF) sets the very basic rules for an organized database:
• Eliminate duplicative columns from the same table.
• Createseparate tables for each group of related data and identify each row with a unique column or set
of columns (the primary key).

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 11


Notes of Database Management System

2nd Normal Form (2NF)


Def: A table is in 2NF if it is in 1NF and if all non-key attributes are dependent on the entire key.
The second normal form attempts to deal with the problems that are identified with the relation above that is in 1NF. The
aim of second normal form is to ensure that all information in one relation is only about one thing.
A relation is in 2NF if it is in 1NF and every non-key attribute is fully dependent on each candidate key of the relation.
Note: Since a partial dependency occurs when a non-key attribute is dependent on only a part of the (composite)
key, the definition of 2NF is sometimes phrased as, "A table is in 2NF if it is in 1NF and if it has no partial
dependencies."
The general requirements of 2NF:
• Remove subsets of data that apply to multiple rows of a table and place them in separate rows.
• Create relationships between these new tables and their predecessors through the use of foreign keys.
These rules can be summarized in a simple statement: 2NF attempts to reduce the amount of redundant data in a table by
extracting it, placing it in new table(s) and creating relationships between those tables.
Eg. The concept of 2NF requires that all attributes that are not part of a candidate key be fully dependent on each
candidate key. If we consider the relation
student (sno, sname, cno, cname)
and the functional dependencies
sno  sname
cno  cname
and assume that (sno, cno) is the only candidate key (and therefore the primary key), the relation is not in 2NF
since sname and cname are not fully dependent on the key. The above relation suffers from the same anomalies
and repetition of information as discussed above since sname and cname will be repeated. To resolve these
difficulties we could remove those attributes from the relation that are not fully dependent on the candidate keys
of the relations. Therefore we decompose the relation into the following projections of the original relation:
S1 (sno, sname) , S2 (cno, cname) , SC (sno, cno)

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 12


Notes of Database Management System

Eg. Online store that maintains customer information database.

At this table reveals a small amount of redundant data. We're storing the "Sea Cliff, NY 11579" and "Miami, FL
33157" entries twice each. After decomposition:

Now that we've removed the duplicative data from the Customers table, we've satisfied the first rule of second
normal form. We still need to use a foreign key to tie the two tables together. We'll use the ZIP code (the primary
key from the ZIPs table) to create that relationship. Here's our new Customers table:

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 13


Notes of Database Management System

This minimized the amount of redundant information stored within the database and our structure is in second
normal form.
Eg.

Stu_ID Class_ID Class_name

134-56-7890 M148 Math 148

123-45-7894 P113 Physics 113

534-98-9009 H151 History 151

134-56-7890 H151 History 151

AND
Transitive Dependency
A functional dependency XY in a relation schema R is a transitive dependency if there is a set of
attributes Z that is neither a candidate key nor a subset of any key of R, and both X Z and ZY hold.
Consider attributes A, B, and C, and where A  B and B  C.
FDs are transitive, which means that we also have the FD, A  C.
We say that C is transitively dependent on A through B.
Eg. EmpNum  DeptNum

Emp_number Emp_email Dept_number Dept_name

DeptNum  DeptName

Emp_number Emp_email Dept_number Dept_name

DeptName is transitively dependent on EmpNum via DeptNum: EmpNum  DeptName

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 14


Notes of Database Management System

3rd Normal Form (3NF)


Def: A table is in 3NF if it is in 2NF and if it has no transitive dependencies.
The basic requirements of 3NF are as follows

• Meet the requirements of 1NF and 2NF


• Remove columns that are not fully dependent upon the primary key.

Although transforming a relation that is not in 2NF into a number of relations that are in 2NF removes many of the
anomalies that appear in the relation that was not in 2NF, not all anomalies are removed and further normalization is
sometime needed to ensure further removal of anomalies. These anomalies arise because a 2NF relation may have
attributes that are not directly related to the thing that is being described by the candidate keys of the relation. Let us first
define the 3NF.
For E.g. The relation schema EMP_DEPT is in 2NF, since no partial dependencies on a key exist. However, EMP_DEPT
is not in 3NF because of the transitive dependency of DMGRENO (and also DNAME) on ENO via DNUMBER. We can
normalize EMP_DEPT by decomposing it into the two 3NF relation schemas ED1 and ED2. Intuitively, we see that ED1
and ED2 represent independent entity facts about employees and departments. A NATURAL JOIN operation on ED1 and
ED2 will recover the original relation EMP_DEPT without generating spurious tuples.

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 15


Notes of Database Management System

Consider the following relation


subject (cno, cname, instructor, office)
cno  cname , cno  instructor , instructure  office
Assume that cname is not unique and therefore cno is the only candidate key. The following functional
dependencies exist.
We can derive cno  office from the above functional dependencies and therefore the above relation is in 2NF.
The relation is however not in 3NF since office is not directly dependent on cno. This transitive dependence is an
indication that the relation has information about more than one thing (viz. course and instructor) and should
therefore be decomposed. The primary difficulty with the above relation is that an instructor might be
responsible for several subjects and therefore his office address may need to be repeated many times. This leads
to all the problems that we identified at the beginning of this chapter. To overcome these difficulties we need to
decompose the above relation in the following two relations:
s (cno, cname, instructor) ins (instructor, office)
s is now in 3NF and so is ins.

Eg. Consider a table of Orders

Test for 1NF, 2NF and 3NF????


Ans. Our first requirement is that the table must satisfy the requirements of 1NF and 2NF. Are there any
duplicative columns? No. Do we have a primary key? Yes, the order number. Therefore, we satisfy the
requirements of 1NF. Are there any subsets of data that apply to multiple rows? No, so we also satisfy the
requirements of 2NF.
Now, are all of the columns fully dependent upon the primary key? The customer number varies with the order
number and it doesn't appear to depend upon any of the other fields. What about the unit price? This field could
be dependent upon the customer number in a situation where we charged each customer a set price. However,
looking at the data above, it appears we sometimes charge the same customer different prices. Therefore, the unit
price is fully dependent upon the order number. The quantity of items also varies from order to order. BUT, The
Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 16
Notes of Database Management System

total can be derived by multiplying the unit price by the quantity; therefore it's not fully dependent upon the
primary key. We must remove it from the table to comply with the third normal form:

Boyce-Codd Normal Form (BCNF)


A relation R is said to be in BCNF if whenever X  A , holds in R, and A is not in X, then X is a candidate
key for R.
Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3NF, but it was found to be stricter than 3NF. That
is, every relation in BCNF is also in 3NF; however, a relation in 3NF is not necessarily in BCNF.
It should be noted that most relations that are in 3NF are also in BCNF. Infrequently, a 3NF relation is not in
BCNF and this happens only if
(a) the candidate keys in the relation are composite keys (that is, they are not single attributes),
(b) there is more than one candidate key in the relation, and
(c) the keys are not disjoint, that is, some attributes in the keys are common.
The BCNF differs from the 3NF only when there are more than one candidate keys and the keys are
composite and overlapping.
So, a relation is said to be in the BCNF if and only if it is in the 3NF and every non-trivial, left-irreducible
functional dependency has a candidate key as its determinant. In more informal terms, a relation is in BCNF if it
is in 3NF and the only determinants are the candidate keys.
( REFER understandingnormalisation.pdf )

DESIRABLE PROPERTIES OF DECOMPOSITIONS


So far our approach has consisted of looking at individual relations and checking if they belong to 2NF, 3NF or
BCNF. If a relation was not in the normal form that was being checked for and we wished the relation to be
normalized to that normal form so that some of the anomalies can be eliminated, it was necessary to decompose
the relation in two or more relations. The process of decomposition of a relation R into a set of relations
R1,R2….RN was based on identifying different components and using that as a basis of decomposition. The
decomposed relations R1,R2….RN are projections of R and are of course not disjoint otherwise the glue holding
the information together would be lost. Decomposing relations in this way based on a recognize and split method

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 17


Notes of Database Management System

is not a particularly sound approach since we do not even have a basis to determine that the original relation can
be constructed if necessary from the decomposed relations.
Desirable properties of decomposition are:
1. Attribute preservation
2. Lossless-join decomposition
3. Dependency preservation
4. Lack of redundancy

Attribute Preservation
This is a simple and an obvious requirement that involves preserving all the attributes that were there in the
relation that is being decomposed.

Lossless-Join Decomposition
In these notes so far we have normalized a number of relations by decomposing them. We decomposed a relation
intuitively. We need a better basis for deciding decompositions since intuition may not always be correct. We
illustrate how a careless decomposition may lead to problems including loss of information.
Consider the following relation
enrol (sno, cno, date-enrolled, room-No., instructor)
Suppose we decompose the above relation into two relations enrol1 and enrol2 as follows
enrol1 (sno, cno, date-enrolled) , enrol2 (date-enrolled, room-No., instructor)
There are problems with this decomposition but we wish to focus on one aspect at the moment. Let an instance
of the relation enrol be

and let the decomposed relations enrol1 and enrol2 be:

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 18


Notes of Database Management System

All the information that was in the relation enrol appears to be still available in enrol1 and enrol2 but this is not
so. Suppose, we wanted to retrieve the student numbers of all students taking a course from Wilson, we would
need to join enrol1 and enrol2. The join would have 11 tuples as follows:

…………. So on.
The join contains a number of spurious tuples that were not in the original relation Enrol. Because of these
additional tuples, we have lost the information about which students take courses from WILSON. (Yes, we have
more tuples but less information because we are unable to say with certainty who is taking courses from
WILSON). Such decompositions are called lossy decompositions. A nonloss or lossless decomposition is that
which guarantees that the join will result in exactly the same relation as was decomposed. One might think that
there might be other ways of recovering the original relation from the decomposed relations but, sadly, no other
operators can recover the original relation if the join does not.

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 19


Notes of Database Management System

We need to analyze why some decompositions are lossy. The common attribute in above decompositions was
Date-enrolled. The common attribute is the glue that gives us the ability to find the relationships between
different relations by joining the relations together. If the common attribute is not unique, the relationship
information is not preserved. If each tuple had a unique value of Date-enrolled, the problem of losing
information would not have existed. The problem arises because several enrolments may take place on the same
date.

Dependency Preservation
It is clear that decomposition must be lossless so that we do not lose any information from the relation that is
decomposed. Dependency preservation is another important requirement since a dependency is a constraint on
the database and if X  Yholds than we know that the two (sets) attributes are closely related and it would be
useful if both attributes appeared in the same relation so that the dependency can be checked easily. Let us
consider a relation R(A, B, C, D) that has the dependencies F that include the following:
A  B , A C etc….
If we decompose the above relation into R1(A, B) and R2(B, C, D) the dependency A C cannot be checked
(or preserved) by looking at only one relation. It is desirable that decompositions be such that each dependency
in F may be checked by looking at only one relation and that no joins need be computed for checking
dependencies. In some cases, it may not be possible to preserve each and every dependency in F but as long as
the dependencies that are preserved are equivalent to F, it should be sufficient.

Lack of Redundancy
We have discussed the problems of repetition of information in a database. Such repetition should be avoided as
much as possible.
Lossless-join, dependency preservation and lack of redundancy not always possible with BCNF. Lossless-join,
dependency preservation and lack of redundancy is always possible with 3NF.
Q. Define Boyce-Codd normal form. How it is differ from 3NF? Why it is considered a stronger form of 3NF?

Sol:
BCNF - A relation schema R is in BCNF if whenever a nontrivial functional dependency X A holds in R, then X is a
super key of R.
3NF - A relation schema R is in third normal form (3NF) if, whenever a nontrivial functional dependency XA holds in
R, either
• X is a super key of R, or
• A is a prime attribute of R.

Difference – The formal definition of BCNF differs slightly from the definition of 3NF. The only difference between the
definitions of BCNF and 3NF is the second condition of 3NF (i.e. A is a prime attribute of R) which allows A to be prime,
is absent from BCNF. BCNF requires that every determinant is a candidate key and 3NF doesn’t require every determinant
to act as candidate key. So every relation which is in BCNF is also in 3NF but every relation which is in 3NF is not
necessarily in BCNF. Consider for instance a relation R(A, B, C) with the following dependency diagram.

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 20


Notes of Database Management System

R
A B C

Here the candidate key of the relation is AB and also the super key of this relation is AB. So the relation is
in 3NF because we have the FDs ABC and AB is a super key of R and the FD CB and B is a prime
attribute of R. The relation is not in BCNF because C is not a super key of the relation and still the FD
CB holds.
Why it is considered as a stronger normal form of 3NF
BCNF is considered as a stronger normal form of 3NF because 3NF relations still have insertion,
deletion and update anomalies but BCNF relations doesn’t have any insertion, deletion and update anomalies. In
BCNF every dependency is on the candidate key which is our goal of normalization and in 3NF it is not
required that every dependency is on candidate key.
For e.g., consider a relation Stud_Course with the following dependency diagram with the fact that a
particular student takes a course and has one instructor and a particular instructor teaches one course only.
Stud_Course

With the dependency diagram we have following FDs.


Studno, CoursenoInstrno and
InstrnoCourseno
Consider the following instance of the above relation:-

From this relation we have InstrnoCourseno i.e. instructor 99 teaches 1803, instructor 77 teaches 1903
and instructor 66 teaches 1803.
The super key and candidate key of this relation is Studno, Courseno.The relation is in 3NF
because we have FD Studno, CoursenoInstrno and Studno, Courseno is a super key and FD
InstrnoCourseno and Courseno is a prime attribute of this relation.

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 21


Notes of Database Management System

This relation has the following insertion, deletion and update anomalies.
1. Insertion Anomaly – How do we add the fact that instructor 55 teaches course 2906.
2. Deletion Anomaly – If we delete all rows for course 1803 we will lose the information that
instructor 99 teaches student 121 and 66 teaches student 222.
3. Update Anomaly – If we update the Courseno of student 121 to be 1903 then we also have to
update his Instrno to be 77.
Now the relation is not in BCNF because we have the FD InstrnoCourseno and Instrno is not a super key
of this relation. So we decompose this relation into two relations Stud(Studno, Instrno) and Instr(Instrno,
Courseno).

Stud
Studno Instrno

Instr
Instrno courseno

Stud
Studno Instrno
121 99
121 77
222 66
222 77

Instr
Instrno Courseno
99 1803
77 1903
66 1803

Now these relations don’t have any insertion, deletion and update anomalies because every dependency is
on the super key. The natural join between these two relations produce the original relations.

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 22


Notes of Database Management System

Q. Prove that any relation schema with two attributes is in BCNF?

Sol: Consider a relation schema R1 with two attributes A and B and with the following dependency
diagram.
R1
A B
Now with this dependency diagram we either have AB or BA.
Now we define BCNF as a relation schema R is in BCNF if whenever a nontrivial functional dependency
XA holds in R, then X is a super key of R.
From the definition of BCNF the relation schema R1 is necessarily in BCNF because we either have AB
with super key A or have BA with super key B.

MULTIVALUED DEPENDENCIES

A multivalued dependency (MVD) on R, X Y , says that if two tuples of R agree on all the attributes of X, then their
components in Y may be swapped, and the result will be two tuples that are also in the relation.
i.e., for each value of X, the values of Y are independent of the values of R-X-Y.
Every FD is an MVD (promotion ):- If X Y, then swapping Y ’s between two tuples that agree on X doesn’t change
the tuples. Therefore, the “new” tuples are surely in the relation, and we know X Y.
Consider the following relation that represents an entity employee that has one mutlivalued attribute proj:
emp (e#, dept, salary, proj)
We have so far considered normalization based on functional dependencies; dependencies that apply only to
single-valued facts. For example, implies only one dept value for each value of e#. Not all information in a
database is single-valued, for example, e# proj in an employee relation may be the list of all projects that the
employee is currently working on. Although e# determines the list of all projects that an employee is working
on, e# dept is not a functional dependency.
Eg. programmer (emp_name, qualifications, languages)
The above relation includes two multivalued attributes of entity programmer; qualifications and languages. There
are no functional dependencies.
The attributes qualifications and languages are assumed independent of each other. If we were to consider
qualifications and languages separate entities, we would have two relationships (one between employees and
qualifications and the other between employees and programming languages). Both the above relationships are
many-to-many i.e. one programmer could have several qualifications and may know several programming
languages. Also one qualification may be obtained by several programmers and one programming language may
be known to many programmers.
The above relation is therefore in 3NF (even in BCNF) but it still has some disadvantages. Suppose a
programmer has several qualifications (B.Sc, Dip. Comp. Sc, etc) and is proficient in several programming
languages; how should this information be represented? There are several possibilities.

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 23


Notes of Database Management System

All these variations have some disadvantages. If the information is repeated we face the same problems of
repeated information and anomalies as we did when second or third normal form conditions are violated. If there
is no repetition, there are still some difficulties with search, insertions and deletions. For example, the role of
NULL values in the above relations is confusing.. These problems may be overcome by decomposing a relation
like that:-

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 24


Notes of Database Management System

The concept of multivalued dependencies was developed to provide a basis for decomposition of relations like
the one above. Therefore if a relation like enrolment(sno, subject#) has a relationship between sno and subject#
in which sno uniquely determines the values of subject#, the dependence of subject# on sno is called a
trivial MVD since the relation enrolment cannot be decomposed any further. More formally, X Ya MVD is
called trivial MVD if either Y is a subset of X or X and Y together form the relation R. The MVD is trivial since
it results in no constraints being placed on the relation.
A MVD XY in R is called a trivial MVD if
a) Y is a subset of X or
b) X U Y = R.
For e.g., consider the relation EMP_PROJECTS.from table we have a trivial MVD ENAME PNAME.

Therefore a relation having non-trivial MVDs must have at least three attributes;
two of them multivalued. Non-trivial MVDs result in the relation having some constraints on it since all possible
combinations of the multivalue attributes are then required to be in the relation. Possible: ZX Y but
X ZY means X Y , X Z.

Multivalued Normalization --- Fourth Normal Form


Def: A table is in 4NF if it is in BCNF and if it has no multi-valued dependencies.

Consider an example of Programmer(Emp name, qualification, languages) and discussed the problems that may
arise if the relation is not normalised further. We also saw how the relation could be decomposed into P1(Emp
name, qualifications) and P2(Emp name, languages) to overcome these problems. The decomposed relations are
in fourth normal form (4NF) which we shall now define.
We are now ready to define 4NF. A relation R is in 4NF if, whenever a multivalued dependency XYholds then
either
(a) the dependency is trivial, or (b) X is a candidate key for R.
In fourth normal form, we have a relation that has information about only one entity. If a relation has more than
one multivalue attribute, we should decompose it to remove difficulties with multivalued facts.

EMP

ENAME PNAME DNAME

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 25


Notes of Database Management System

EMP
ENAME PNAME DNAME

Kumar X Rakesh

Kumar Y Rao

Kumar X Rao

Kumar Y Rakesh
From the above fig we have ENAMEPNAME and ENAME DNAME (or ENAME PNAME|DNAME)
holds in the EMP relation. The employee with ENAME ‘Kumar’ works on projects with PNAME ‘X’ and ‘Y’ and has two
dependents with DNAME ‘Rakesh’ and ‘Rao’.
This relation is not in 4NF because in the nontrivial MVDs ENAMEPNAME and ENAMEDNAME, ENAME is
not a super key of EMP. We decompose EMP into EMP_PROJECTS and EMP_DEPENDENTS shown in fig below.

Both EMP_PROJECTS and EMP_DEPENDENTS are in 4NF, because the MVDs ENAMEPNAME

in EMP_PROJECTS and ENAMEDNAME in EMP_DEPENDENTS are trivial MVDs. No other nontrivial MVDs
hold in either EMP_PROJECTS or EMP_DEPENDENTS. No FDs hold in these relation schemas either.
JOIN DEPENDENCIES

Notice that an MVD is a special case of a JD where n = 2. That is, a JD denoted as JD(R1, R2) implies a
MVD (R1 R2)  (R1 – R2) (or, by symmetry, (R1 R2  (R2 – R1)). A join dependency JD(R1,
R2,…,Rn), specified on relation schema R, is a trivial JD if one of the relation schemas Ri in JD(R1, R2,…,Rn) is equal to
R. Such a dependency is called trivial because it has the nonadditive join property for any relation state r of R and hence
does not specify any constraint on R.
For e.g., consider the relation SUPPLY with JD(R1, R2, R3) shown below. From the relations we have
R3) = SUPPLY

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 26


Notes of Database Management System

SUPPLY
SNAME PARTNAME PROJNAME
Kumar Bolt projX
Kumar Nut projY
Prakash Bolt projY
Ramesh Nut projZ
Prakash Nail projX

R1 R2

R3

Fifth Normal Form (5NF) (Project Join Normal Form) (PJNF)


A relation schema R is in fifth normal form (5NF) (or project join normal form [PJNF]) with respect
to a set of F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1,
R2,…,Rn) in F+ (that is implied by F), every Ri is a super key of R.
OR
Def: A table is in 5NF, also called "Projection-Join Normal Form" (PJNF), if it is in 4NF and if every join
dependency in the table is a consequence of the candidate keys of the table. The fifth normal form deals with
join-dependencies which is a generalisation of the MVD. The aim of fifth normal form is to have relations that
cannot be decomposed further. A relation in 5NF cannot be constructed from several smaller relations.
A relation R satisfies join dependency (R1,R2,…..Rn) if and only if R is equal to the join of R1,R2,…..Rn where
are R1subsets of the set of attributes of R.
A relation R is in 5NF (or project-join normal form, PJNF) if for all join dependencies at least one of the
following holds.
(a) (R1,R2,…..Rn) is a trivial join-dependency (that is, one of R1 is R) (b) Every R1 is a candidate key for R.

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 27


Notes of Database Management System

For e.g., consider the SUPPLY relation shown above. The super keys of this relation are {SNAME, PARTNAME},
{SNAME, PROJNAME} and {PARTNAME, PROJNAME}. The decomposition of
R1(SNAME, PARTNAME), R2(SNAME, PROJNAME) and R3(PARTNAME, PROJNAME) is in 5NF
because we have a JD(R1, R2, R3) over R and every Ri is a super key of R.
Eg. An example of 5NF can be provided by the example below that deals with departments, subjects and
students.

The above relation says that Comp. Sc. offers subjects CP1000, CP2000 and CP3000 which are taken by a
variety of students. No student takes all the subjects and no subject has all students enrolled in it and therefore all
three fields are needed to represent the information.
The above relation does not show MVDs since the attributes subject and student are not independent; they are
related to each other and the pairings have significant information in them. The relation can therefore not be
decomposed in two relations
(dept, subject), and (dept, student)
without loosing some important information. The relation can however be decomposed in the following three
relations.
(dept, subject), and (dept, student) , (subject, student)
and now it can be shown that this decomposition is lossless.

DOMAIN KEY NF (Free from modification anomalies)


A key uniquely identifies each row in a table. A domain is the set of permissible values
for an attribute. By enforcing key and domain restrictions, the database is assured of being free from modification
anomalies. Thus, it avoids all non-temporal anomalies. It's much easier to build a database in domain/key normal form than
it is to convert lesser databases which may contain numerous anomalies. However, successfully building a domain/key
normal form database remains a difficult task, even for experienced database programmers. Thus, while the domain/key
normal form eliminates the problems found in most databases, it tends to be the most costly normal form to achieve.

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 28


Notes of Database Management System

Eg. A violation of DKNF occurs in the following table:


Wealthy Person

Wealthy Person Wealthy Person Type Net Worth in Dollars

Steve Eccentric Millionaire 124,543,621

Roderick Evil Billionaire 6,553,228,893

Katrina Eccentric Billionaire 8,829,462,998

Gary Evil Millionaire 495,565,211

(Assume that the domain for Wealthy Person consists of the names of all wealthy people in a pre-defined sample of
wealthy people; the domain for Wealthy Person Type consists of the values 'Eccentric Millionaire', 'Eccentric Billionaire',
'Evil Millionaire', and 'Evil Billionaire'; and the domain for Net Worth in Dollars consists of all integers greater than or
equal to 1,000,000.)

There is a constraint linking Wealthy Person Type to Net Worth in Dollars, even though we cannot deduce one from the
other. The constraint dictates that an Eccentric Millionaire or Evil Millionaire will have a net worth of 1,000,000 to
999,999,999 inclusive, while an Eccentric Billionaire or Evil Billionaire will have a net worth of 1,000,000,000 or higher.
This constraint is neither a domain constraint nor a key constraint; therefore we cannot rely on domain constraints and key
constraints to guarantee that an inconsistent Wealthy Person Type / Net Worth in Dollars combination does not make its
way into the database.

The DKNF violation could be eliminated by altering the Wealthy Person Type domain to make it consist of just two
values, 'Evil' and 'Eccentric' (the wealthy person's status as a millionaire or billionaire is implicit in their Net Worth in
Dollars, so no useful information is lost).

DKNF is frequently difficult to achieve in practice.

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 29


Notes of Database Management System

Denormalization
Occasionally database designers choose a schema that has redundant information; that is, it is not
normalized. They use the redundancy to improve performance for specific applications. The penalty paid
for not using a normalized schema is the extra work (in terms of coding time and execution time) to keep
redundant data consistent. For instance, suppose that the name of an account holder has to be displayed
along with the account number and balance, every time the account is accessed. In our normalized schema,
this requires a join of account with depositor. One alternative to computing the join on the fly is to store a
relation containing all the attributes of account and depositor. This makes displaying the account
information faster. However, the balance information for an account is repeated for every person who owns
the account, and all copies must be updated by the application, whenever the account balance is updated.
The process of taking a normalized schema and making it non-normalized is called denormalization, and
designers use it to tune performance of systems to support time-critical operations.

Mrs Mousmi Ajay Chaurasia , LECTURER, Dept. of INFORMATION TECHNOLOGY Page 30

Potrebbero piacerti anche