Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
UNIT – III
NORMALISATION
Functional Dependencies
A relational schema has attributes (A, B, C... Z) and that the whole database is described by a single universal relation
called R = (A, B, C, ..., Z). This assumption means that every attribute in the database has a unique name. A functional
dependency is a property of the semantics of the attributes in a relation. The semantics indicate how attributes relate to one
another, and specify the functional dependencies between attributes. When a functional dependency is present, the
dependency is specified as a constraint between the attributes.
Consider a relation with attributes A and B, where attribute B is functionally dependent
on attribute A. If we know the value of A and we examine the relation that holds this dependency, we will find only one
value of B in all of the tuples that have a given value of A, at any moment in time.
The determinant of a functional dependency is the attribute or group of attributes on the left-hand side of the arrow in
the functional dependency. The consequent of a fd is the attribute or group of attributes on the right-hand side of the
arrow.
Consider a relational schema
The functional dependency staff# → position clearly holds on this relation instance. However, the reverse functional
dependency position → staff# clearly does not hold.
The relationship between staff# and position is 1:1 – for each staff member there is only one position. On the other
hand, the relationship between position and staff# is 1:M – there are several staff numbers associated with a given
position.
For the purposes of normalization we are interested in identifying functional dependencies between attributes of a
relation that have a 1:1 relationship.
A functional dependency is a property of a relational schema (its intension) and not a property of a
particular instance of the schema (extension).
Identify the functional dependencies that hold using the relation schema STAFFBRANCH
In order to identify the time invariant Fds, we need to clearly understand the semantics of the various attributes in each of
the relation schemas in question. For example, if we know that a staff member’s position and the branch at which they are
located determines their salary.
The very first Armstrong is the person who gives a set of inference rules. The following are the six well-known
inference rules that apply to functional dependencies.
IR1: reflexive rule – if X ⊇ Y, then X → Y
IR2: augmentation rule – if X → Y, then XZ → YZ
IR3: transitive rule – if X → Y and Y → Z, then X → Z
IR4: projection rule – if X → YZ, then X → Y and X → Z
IR5: additive rule – if X → Y and X → Z, then X → YZ
IR6: pseudo transitive rule – if X → Y and YZ → W, then XZ → W
The first three of these rules (IR1-IR3) are known as Armstrong’s Axioms and constitute a necessary and sufficient set of
inference rules for generating the closure of a set of functional dependencies.
The other rules derived from this are:
We define a set of Fds to be irreducible( Usually called minimal in the literature) if and only if it satisfies the
following three properties
1. The right hand side (the dependent) of every Fds in S involves just one attribute (that is, it is singleton set)
2. The left hand side (determinant) of every in S is irreducible in turn-meaning that no attribute can be
discarded from the determinant without changing the closure S+(that is, with out converting S into some set
not equivalent to S). We will say that such an Fd is left irreducible.
3. No Fd in S can be discarded from S without changing the closure S+(That is, without converting s into
some set not equivalent to S)
Now:
• 2 implies CE → AE (By augmentation), CE → A (By decomposition), so we can drop 6.
• 8 implies CF → BC (By augmentation), by which 3 implies CF → D (By Transitivity), so we can drop 9.
• 8 implies ACF → AB (By augmentation), and 11 implies ACD → ACF (By augmentation), and so ACD → AB
(By Transitivity), and so ACD → B (By Decomposition), so we can drop 4.
No further reductions are possible, and so we are left with the following irreducible set:
1. AB → C
2. C → A
3. BC → D
5. BE → C
7. CE → F
8. CF → B
10. D → E
11. D → F
Sol:
1) The first step is to rewrite the FDs such that each has a singleton right-hand side:
AB
AC [By Decomposition]
BC
AB
ABC
ACD
We observe immediately that is FD AB occurs twice, so one occurrence can be eliminated.
2) Next, attribute C can be eliminated from the left-hand side of the FD ACD, because we have
AC, so AAC by augmentation, and we are given ACD, so AD by transitivity; thus the C on
the left-hand side of ACD is redundant.
So the remaining FDs are:
AB
AC
BC
ABC
AD
3) Next, we observe that the FD ABC can be eliminated, because again we have AC, so ABCB
by augmentation, so ABC by decomposition. So the remaining FDs are:
AB
AC
BC
AD
4) Finally, the FD AC is implied by the FDs AB and BC, so it can also be eliminated. We are
left with
AB
BC
AD
Sol: 1) The set is not irreducible, since CJ and CJI together imply CI.
2) The super key of this relation is (A, B, C, D, G, J) (i.e. the set of attributes mentioned on the
left-hand side of the given FDs). We can eliminate J from this set because CJ, and we eliminate
G because ABG. Since none of the attributes A, B, C, D appears on the right-hand side of any
of the given FDs, it follows that (A,B,C,D) is a candidate key.
Full Functional Dependency – A functional dependency XY is a full functional dependency if removal
of any attribute A from X means that the dependency does not hold any more; i.e. for any attribute A є X,
(X – {A}) does not functionally determine Y.
Partial dependency
A partial dependency exists when an attribute B is functionally dependent on an attribute A, and A is a component of a
multipart candidate key. Candidate keys: {InvNum, LineNum} InvDate is partially dependent on {InvNum, LineNum} as
InvNum is a determinant of InvDate and InvNum is part of a candidate key.
If there is some attribute that can be removed from A and the dependency still holds.
PART - II
A redundancy in a conceptual schema corresponds to a piece of information that can be derived (that is, obtained
through a series of retrieval operations) from other data in the database.
The presence of a redundancy in a database may be decided upon the following factores
• an advantage: a reduction in the number of accesses necessary to.Obtain the derived information.
• a disadvantage: because of larger storage requirements, (but, usually at negligible cost) and the necessity to
carry out additional operations in order to keep the derived data consistent.
The serious problem with using the relations is the problem of update anomalies.
These can be classified in to
• Insertion anomalies
• Deletion anomalies
• Modification anomalies
Q. Discuss insertion, deletion, and modification anomalies. Why are they considered bad? Illustrate with
examples.
Ans. EMP_DEPT
Insertion Anomalies
An "insertion anomaly" is a failure to place information about a new database entry into all the places in the database
where information about that new entry needs to be stored. In a properly normalized database, information about a new
entry needs to be inserted into only one place in the database; in an inadequately normalized database, information about a
new entry may need to be inserted into more than one place and, human fallibility being what it is, some of the needed
additional insertions may be missed.
Eg. First Instance: - To insert a new employee tuple in to Emp_Dept table, we must include either the attribute values for
the department that the employee works for, or nulls (if the employee does not work for a department as yet). For example
to insert a new tuple for an employee who works in department no 5, we must enter the attribute values of department
number 5correctly so that they are consistent, with values for the department 5 in other tuples in emp_dept.
Second Instance: - It is difficult to insert a new department that has no employees as yet in the emp_dept relation. The only
way to do this is to place null values in the attributes for the employee this causes a problem because SSN in the primary
key of emp_dept table and each tuple is supposed to represent an employee entity- not a department entity.
Moreover, when the first employee is assigned to that department, we do not need this tuple with null values anymore.
Deletion Anomalies
A "deletion anomaly" is a failure to remove information about an existing database entry when it is time to remove that
entry. In a properly normalized database, information about an old, to-be-gotten-rid-of entry needs to be deleted from only
one place in the database; in an inadequately normalized database, information about that old entry may need to be deleted
from more than one place, and, human fallibility being what it is, some of the needed additional deletions may be missed.
Eg. The problem of deletion anomaly is related to the second insertion anomaly situation which we have discussed earlier,
if we delete from emp_dept an employee tuple that happens to represent the last employee working for a particular
department, the information concerning that department is lost from the database.
Modification Anomalies
Eg. In Emp_Dept, if we change the value of one of the attribute of a particular department- say, the manager of
department 5-we must update the tuples of all employees who work in that department; other wise, the database will
become inconsistent. If we fail to update some tuples, the same department will be shown to have 2 different values for
manager in different employee tuple which would be wrong.
All three kinds of anomalies are highly undesirable, since their occurrence constitutes corruption of the
database. Properly normalized databases are much less susceptible to corruption than are unnormalized databases.
Redundant information not only wastes storage but makes updates more difficult since.
Normalization is the process of taking data from a problem and reducing it to a set of relations while
ensuring data integrity, eliminating data redundancy and minimizing the insertion, deletion and update
anomalies.
1. Data Integrity – All of the data in the database are consistent and satisfy all integrity constraints.
2. Data redundancy – If data in the database can be found in two different locations (direct redundancy) or if data can be
calculated from other data items (indirect redundancy) then the data is said to contain redundancy.
Data should only be stored once and avoid storing data that can be calculated from other
data already held in the database. During the process of normalization redundancy must be removed, but not at the expense
of breaking data integrity rules.
Basically the normal form of the data indicates how much redundancy is in that data. The normal forms
have a strict ordering.
a) 1NF
b) 2NF
c) 3NF
d) BCNF
e) 4NF
f) 5NF
1234 2345-987
1234 2345-987
3456 2367-890
6789 2367-890
EMP_DEG
EmpNum EmpDegrees
1234
3456 BA
3456 BSC
6789 BSC
6789 MSC
6789 PhD
An outer join between Employee and EmployeeDegree will produce the information that the original table hold.
Eg. EMP_Proj
A relation is in first normal form if and only if, in every legal value of that relation every tuple contains one
value for each attribute
The above definition merely states that the relations are always in first normal form which is always correct.
However the relation that is only in first normal form has a structure those undesirable for a number of reasons.
First normal form (1NF) sets the very basic rules for an organized database:
• Eliminate duplicative columns from the same table.
• Createseparate tables for each group of related data and identify each row with a unique column or set
of columns (the primary key).
At this table reveals a small amount of redundant data. We're storing the "Sea Cliff, NY 11579" and "Miami, FL
33157" entries twice each. After decomposition:
Now that we've removed the duplicative data from the Customers table, we've satisfied the first rule of second
normal form. We still need to use a foreign key to tie the two tables together. We'll use the ZIP code (the primary
key from the ZIPs table) to create that relationship. Here's our new Customers table:
This minimized the amount of redundant information stored within the database and our structure is in second
normal form.
Eg.
AND
Transitive Dependency
A functional dependency XY in a relation schema R is a transitive dependency if there is a set of
attributes Z that is neither a candidate key nor a subset of any key of R, and both X Z and ZY hold.
Consider attributes A, B, and C, and where A B and B C.
FDs are transitive, which means that we also have the FD, A C.
We say that C is transitively dependent on A through B.
Eg. EmpNum DeptNum
DeptNum DeptName
Although transforming a relation that is not in 2NF into a number of relations that are in 2NF removes many of the
anomalies that appear in the relation that was not in 2NF, not all anomalies are removed and further normalization is
sometime needed to ensure further removal of anomalies. These anomalies arise because a 2NF relation may have
attributes that are not directly related to the thing that is being described by the candidate keys of the relation. Let us first
define the 3NF.
For E.g. The relation schema EMP_DEPT is in 2NF, since no partial dependencies on a key exist. However, EMP_DEPT
is not in 3NF because of the transitive dependency of DMGRENO (and also DNAME) on ENO via DNUMBER. We can
normalize EMP_DEPT by decomposing it into the two 3NF relation schemas ED1 and ED2. Intuitively, we see that ED1
and ED2 represent independent entity facts about employees and departments. A NATURAL JOIN operation on ED1 and
ED2 will recover the original relation EMP_DEPT without generating spurious tuples.
total can be derived by multiplying the unit price by the quantity; therefore it's not fully dependent upon the
primary key. We must remove it from the table to comply with the third normal form:
is not a particularly sound approach since we do not even have a basis to determine that the original relation can
be constructed if necessary from the decomposed relations.
Desirable properties of decomposition are:
1. Attribute preservation
2. Lossless-join decomposition
3. Dependency preservation
4. Lack of redundancy
Attribute Preservation
This is a simple and an obvious requirement that involves preserving all the attributes that were there in the
relation that is being decomposed.
Lossless-Join Decomposition
In these notes so far we have normalized a number of relations by decomposing them. We decomposed a relation
intuitively. We need a better basis for deciding decompositions since intuition may not always be correct. We
illustrate how a careless decomposition may lead to problems including loss of information.
Consider the following relation
enrol (sno, cno, date-enrolled, room-No., instructor)
Suppose we decompose the above relation into two relations enrol1 and enrol2 as follows
enrol1 (sno, cno, date-enrolled) , enrol2 (date-enrolled, room-No., instructor)
There are problems with this decomposition but we wish to focus on one aspect at the moment. Let an instance
of the relation enrol be
All the information that was in the relation enrol appears to be still available in enrol1 and enrol2 but this is not
so. Suppose, we wanted to retrieve the student numbers of all students taking a course from Wilson, we would
need to join enrol1 and enrol2. The join would have 11 tuples as follows:
…………. So on.
The join contains a number of spurious tuples that were not in the original relation Enrol. Because of these
additional tuples, we have lost the information about which students take courses from WILSON. (Yes, we have
more tuples but less information because we are unable to say with certainty who is taking courses from
WILSON). Such decompositions are called lossy decompositions. A nonloss or lossless decomposition is that
which guarantees that the join will result in exactly the same relation as was decomposed. One might think that
there might be other ways of recovering the original relation from the decomposed relations but, sadly, no other
operators can recover the original relation if the join does not.
We need to analyze why some decompositions are lossy. The common attribute in above decompositions was
Date-enrolled. The common attribute is the glue that gives us the ability to find the relationships between
different relations by joining the relations together. If the common attribute is not unique, the relationship
information is not preserved. If each tuple had a unique value of Date-enrolled, the problem of losing
information would not have existed. The problem arises because several enrolments may take place on the same
date.
Dependency Preservation
It is clear that decomposition must be lossless so that we do not lose any information from the relation that is
decomposed. Dependency preservation is another important requirement since a dependency is a constraint on
the database and if X Yholds than we know that the two (sets) attributes are closely related and it would be
useful if both attributes appeared in the same relation so that the dependency can be checked easily. Let us
consider a relation R(A, B, C, D) that has the dependencies F that include the following:
A B , A C etc….
If we decompose the above relation into R1(A, B) and R2(B, C, D) the dependency A C cannot be checked
(or preserved) by looking at only one relation. It is desirable that decompositions be such that each dependency
in F may be checked by looking at only one relation and that no joins need be computed for checking
dependencies. In some cases, it may not be possible to preserve each and every dependency in F but as long as
the dependencies that are preserved are equivalent to F, it should be sufficient.
Lack of Redundancy
We have discussed the problems of repetition of information in a database. Such repetition should be avoided as
much as possible.
Lossless-join, dependency preservation and lack of redundancy not always possible with BCNF. Lossless-join,
dependency preservation and lack of redundancy is always possible with 3NF.
Q. Define Boyce-Codd normal form. How it is differ from 3NF? Why it is considered a stronger form of 3NF?
Sol:
BCNF - A relation schema R is in BCNF if whenever a nontrivial functional dependency X A holds in R, then X is a
super key of R.
3NF - A relation schema R is in third normal form (3NF) if, whenever a nontrivial functional dependency XA holds in
R, either
• X is a super key of R, or
• A is a prime attribute of R.
Difference – The formal definition of BCNF differs slightly from the definition of 3NF. The only difference between the
definitions of BCNF and 3NF is the second condition of 3NF (i.e. A is a prime attribute of R) which allows A to be prime,
is absent from BCNF. BCNF requires that every determinant is a candidate key and 3NF doesn’t require every determinant
to act as candidate key. So every relation which is in BCNF is also in 3NF but every relation which is in 3NF is not
necessarily in BCNF. Consider for instance a relation R(A, B, C) with the following dependency diagram.
R
A B C
Here the candidate key of the relation is AB and also the super key of this relation is AB. So the relation is
in 3NF because we have the FDs ABC and AB is a super key of R and the FD CB and B is a prime
attribute of R. The relation is not in BCNF because C is not a super key of the relation and still the FD
CB holds.
Why it is considered as a stronger normal form of 3NF
BCNF is considered as a stronger normal form of 3NF because 3NF relations still have insertion,
deletion and update anomalies but BCNF relations doesn’t have any insertion, deletion and update anomalies. In
BCNF every dependency is on the candidate key which is our goal of normalization and in 3NF it is not
required that every dependency is on candidate key.
For e.g., consider a relation Stud_Course with the following dependency diagram with the fact that a
particular student takes a course and has one instructor and a particular instructor teaches one course only.
Stud_Course
From this relation we have InstrnoCourseno i.e. instructor 99 teaches 1803, instructor 77 teaches 1903
and instructor 66 teaches 1803.
The super key and candidate key of this relation is Studno, Courseno.The relation is in 3NF
because we have FD Studno, CoursenoInstrno and Studno, Courseno is a super key and FD
InstrnoCourseno and Courseno is a prime attribute of this relation.
This relation has the following insertion, deletion and update anomalies.
1. Insertion Anomaly – How do we add the fact that instructor 55 teaches course 2906.
2. Deletion Anomaly – If we delete all rows for course 1803 we will lose the information that
instructor 99 teaches student 121 and 66 teaches student 222.
3. Update Anomaly – If we update the Courseno of student 121 to be 1903 then we also have to
update his Instrno to be 77.
Now the relation is not in BCNF because we have the FD InstrnoCourseno and Instrno is not a super key
of this relation. So we decompose this relation into two relations Stud(Studno, Instrno) and Instr(Instrno,
Courseno).
Stud
Studno Instrno
Instr
Instrno courseno
Stud
Studno Instrno
121 99
121 77
222 66
222 77
Instr
Instrno Courseno
99 1803
77 1903
66 1803
Now these relations don’t have any insertion, deletion and update anomalies because every dependency is
on the super key. The natural join between these two relations produce the original relations.
Sol: Consider a relation schema R1 with two attributes A and B and with the following dependency
diagram.
R1
A B
Now with this dependency diagram we either have AB or BA.
Now we define BCNF as a relation schema R is in BCNF if whenever a nontrivial functional dependency
XA holds in R, then X is a super key of R.
From the definition of BCNF the relation schema R1 is necessarily in BCNF because we either have AB
with super key A or have BA with super key B.
MULTIVALUED DEPENDENCIES
A multivalued dependency (MVD) on R, X Y , says that if two tuples of R agree on all the attributes of X, then their
components in Y may be swapped, and the result will be two tuples that are also in the relation.
i.e., for each value of X, the values of Y are independent of the values of R-X-Y.
Every FD is an MVD (promotion ):- If X Y, then swapping Y ’s between two tuples that agree on X doesn’t change
the tuples. Therefore, the “new” tuples are surely in the relation, and we know X Y.
Consider the following relation that represents an entity employee that has one mutlivalued attribute proj:
emp (e#, dept, salary, proj)
We have so far considered normalization based on functional dependencies; dependencies that apply only to
single-valued facts. For example, implies only one dept value for each value of e#. Not all information in a
database is single-valued, for example, e# proj in an employee relation may be the list of all projects that the
employee is currently working on. Although e# determines the list of all projects that an employee is working
on, e# dept is not a functional dependency.
Eg. programmer (emp_name, qualifications, languages)
The above relation includes two multivalued attributes of entity programmer; qualifications and languages. There
are no functional dependencies.
The attributes qualifications and languages are assumed independent of each other. If we were to consider
qualifications and languages separate entities, we would have two relationships (one between employees and
qualifications and the other between employees and programming languages). Both the above relationships are
many-to-many i.e. one programmer could have several qualifications and may know several programming
languages. Also one qualification may be obtained by several programmers and one programming language may
be known to many programmers.
The above relation is therefore in 3NF (even in BCNF) but it still has some disadvantages. Suppose a
programmer has several qualifications (B.Sc, Dip. Comp. Sc, etc) and is proficient in several programming
languages; how should this information be represented? There are several possibilities.
All these variations have some disadvantages. If the information is repeated we face the same problems of
repeated information and anomalies as we did when second or third normal form conditions are violated. If there
is no repetition, there are still some difficulties with search, insertions and deletions. For example, the role of
NULL values in the above relations is confusing.. These problems may be overcome by decomposing a relation
like that:-
The concept of multivalued dependencies was developed to provide a basis for decomposition of relations like
the one above. Therefore if a relation like enrolment(sno, subject#) has a relationship between sno and subject#
in which sno uniquely determines the values of subject#, the dependence of subject# on sno is called a
trivial MVD since the relation enrolment cannot be decomposed any further. More formally, X Ya MVD is
called trivial MVD if either Y is a subset of X or X and Y together form the relation R. The MVD is trivial since
it results in no constraints being placed on the relation.
A MVD XY in R is called a trivial MVD if
a) Y is a subset of X or
b) X U Y = R.
For e.g., consider the relation EMP_PROJECTS.from table we have a trivial MVD ENAME PNAME.
Therefore a relation having non-trivial MVDs must have at least three attributes;
two of them multivalued. Non-trivial MVDs result in the relation having some constraints on it since all possible
combinations of the multivalue attributes are then required to be in the relation. Possible: ZX Y but
X ZY means X Y , X Z.
Consider an example of Programmer(Emp name, qualification, languages) and discussed the problems that may
arise if the relation is not normalised further. We also saw how the relation could be decomposed into P1(Emp
name, qualifications) and P2(Emp name, languages) to overcome these problems. The decomposed relations are
in fourth normal form (4NF) which we shall now define.
We are now ready to define 4NF. A relation R is in 4NF if, whenever a multivalued dependency XYholds then
either
(a) the dependency is trivial, or (b) X is a candidate key for R.
In fourth normal form, we have a relation that has information about only one entity. If a relation has more than
one multivalue attribute, we should decompose it to remove difficulties with multivalued facts.
EMP
EMP
ENAME PNAME DNAME
Kumar X Rakesh
Kumar Y Rao
Kumar X Rao
Kumar Y Rakesh
From the above fig we have ENAMEPNAME and ENAME DNAME (or ENAME PNAME|DNAME)
holds in the EMP relation. The employee with ENAME ‘Kumar’ works on projects with PNAME ‘X’ and ‘Y’ and has two
dependents with DNAME ‘Rakesh’ and ‘Rao’.
This relation is not in 4NF because in the nontrivial MVDs ENAMEPNAME and ENAMEDNAME, ENAME is
not a super key of EMP. We decompose EMP into EMP_PROJECTS and EMP_DEPENDENTS shown in fig below.
Both EMP_PROJECTS and EMP_DEPENDENTS are in 4NF, because the MVDs ENAMEPNAME
in EMP_PROJECTS and ENAMEDNAME in EMP_DEPENDENTS are trivial MVDs. No other nontrivial MVDs
hold in either EMP_PROJECTS or EMP_DEPENDENTS. No FDs hold in these relation schemas either.
JOIN DEPENDENCIES
Notice that an MVD is a special case of a JD where n = 2. That is, a JD denoted as JD(R1, R2) implies a
MVD (R1 R2) (R1 – R2) (or, by symmetry, (R1 R2 (R2 – R1)). A join dependency JD(R1,
R2,…,Rn), specified on relation schema R, is a trivial JD if one of the relation schemas Ri in JD(R1, R2,…,Rn) is equal to
R. Such a dependency is called trivial because it has the nonadditive join property for any relation state r of R and hence
does not specify any constraint on R.
For e.g., consider the relation SUPPLY with JD(R1, R2, R3) shown below. From the relations we have
R3) = SUPPLY
SUPPLY
SNAME PARTNAME PROJNAME
Kumar Bolt projX
Kumar Nut projY
Prakash Bolt projY
Ramesh Nut projZ
Prakash Nail projX
R1 R2
R3
For e.g., consider the SUPPLY relation shown above. The super keys of this relation are {SNAME, PARTNAME},
{SNAME, PROJNAME} and {PARTNAME, PROJNAME}. The decomposition of
R1(SNAME, PARTNAME), R2(SNAME, PROJNAME) and R3(PARTNAME, PROJNAME) is in 5NF
because we have a JD(R1, R2, R3) over R and every Ri is a super key of R.
Eg. An example of 5NF can be provided by the example below that deals with departments, subjects and
students.
The above relation says that Comp. Sc. offers subjects CP1000, CP2000 and CP3000 which are taken by a
variety of students. No student takes all the subjects and no subject has all students enrolled in it and therefore all
three fields are needed to represent the information.
The above relation does not show MVDs since the attributes subject and student are not independent; they are
related to each other and the pairings have significant information in them. The relation can therefore not be
decomposed in two relations
(dept, subject), and (dept, student)
without loosing some important information. The relation can however be decomposed in the following three
relations.
(dept, subject), and (dept, student) , (subject, student)
and now it can be shown that this decomposition is lossless.
(Assume that the domain for Wealthy Person consists of the names of all wealthy people in a pre-defined sample of
wealthy people; the domain for Wealthy Person Type consists of the values 'Eccentric Millionaire', 'Eccentric Billionaire',
'Evil Millionaire', and 'Evil Billionaire'; and the domain for Net Worth in Dollars consists of all integers greater than or
equal to 1,000,000.)
There is a constraint linking Wealthy Person Type to Net Worth in Dollars, even though we cannot deduce one from the
other. The constraint dictates that an Eccentric Millionaire or Evil Millionaire will have a net worth of 1,000,000 to
999,999,999 inclusive, while an Eccentric Billionaire or Evil Billionaire will have a net worth of 1,000,000,000 or higher.
This constraint is neither a domain constraint nor a key constraint; therefore we cannot rely on domain constraints and key
constraints to guarantee that an inconsistent Wealthy Person Type / Net Worth in Dollars combination does not make its
way into the database.
The DKNF violation could be eliminated by altering the Wealthy Person Type domain to make it consist of just two
values, 'Evil' and 'Eccentric' (the wealthy person's status as a millionaire or billionaire is implicit in their Net Worth in
Dollars, so no useful information is lost).
Denormalization
Occasionally database designers choose a schema that has redundant information; that is, it is not
normalized. They use the redundancy to improve performance for specific applications. The penalty paid
for not using a normalized schema is the extra work (in terms of coding time and execution time) to keep
redundant data consistent. For instance, suppose that the name of an account holder has to be displayed
along with the account number and balance, every time the account is accessed. In our normalized schema,
this requires a join of account with depositor. One alternative to computing the join on the fly is to store a
relation containing all the attributes of account and depositor. This makes displaying the account
information faster. However, the balance information for an account is repeated for every person who owns
the account, and all copies must be updated by the application, whenever the account balance is updated.
The process of taking a normalized schema and making it non-normalized is called denormalization, and
designers use it to tune performance of systems to support time-critical operations.