
The Relational Model and Normalization

The Relational Model

Page 113

Broad, flexible model
Basis for almost all DBMS products
E.F. Codd defined well-structured normal forms of relations (normalization)

Relational Data Model

A relational data model organizes data as a set of relations, or two-dimensional tables.
A relation is viewed as a two-dimensional table with the following properties:

Each column contains values about the same attribute, and each table cell must be simple (single-valued)
Each column has a distinct name (attribute name), and the order of columns is immaterial
Each row is distinct; duplicate rows are not allowed
The sequence of the rows is immaterial
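The last three properties map naturally onto Python's built-in set types. This is a minimal sketch, not a DBMS implementation. It assumes each row is modelled as a `frozenset` of (attribute name, value) pairs, so column order does not matter, and the relation as a set of such rows, so duplicates collapse and row order does not matter. The helper `make_row` and the attribute names `emp_no`, `name` and `dept` are hypothetical. The values are taken from the example relation.

```python
# A row is a set of (attribute name, value) pairs, so column order is immaterial.
def make_row(**attrs):
    return frozenset(attrs.items())

# A relation is a set of rows: duplicates collapse and row order is immaterial.
employee = {
    make_row(emp_no=28719, name="Smith Tom", dept="172"),
    make_row(name="Jones Bill", emp_no=53730, dept="044"),  # columns listed in another order
    make_row(emp_no=28719, dept="172", name="Smith Tom"),   # duplicate of the first row
}

print(len(employee))  # 2: the duplicate row is stored only once
```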

An Example Relation

Employee Number | Employee Name   | Department Number | Salary              | Date Started
(key)           | (candidate key) | (foreign key)     | (non-key attribute) | (non-key attribute)
28719           | Smith Tom       | 172               | 18,000              | 12/03/84
53730           | Jones Bill      | 044               | 20,000              | 01/05/83
79313           | Ropley Ed       | 044               | 11,000              | 18/09/81
51616           | Fair Carolyn    | 090               | 50,000              | 05/12/79
61930           | Hall Albert     | 090               | 25,000              | 21/06/82
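A sample of rows can only refute a key, never prove one. Whether Employee Name is truly a candidate key is a design decision about all possible data. The sketch below applies this uniqueness test to the rows above, with Salary and Date Started omitted. The column names and the helper `is_unique` are illustrative assumptions.

```python
# Rows from the example Employee relation (Salary and Date Started omitted).
employees = [
    (28719, "Smith Tom", "172"),
    (53730, "Jones Bill", "044"),
    (79313, "Ropley Ed", "044"),
    (51616, "Fair Carolyn", "090"),
    (61930, "Hall Albert", "090"),
]
COLUMNS = ("employee_number", "employee_name", "department_number")

def is_unique(rows, columns, attrs):
    """True if the given attributes take a different value in every row of this instance."""
    idx = [columns.index(a) for a in attrs]
    values = [tuple(row[i] for i in idx) for row in rows]
    return len(values) == len(set(values))

for attrs in (["employee_number"], ["employee_name"], ["department_number"]):
    print(attrs, is_unique(employees, COLUMNS, attrs))
# employee_number and employee_name are unique here; department_number repeats (it is a foreign key).
```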

Terminology in a Relation

Tuple - a row or record
Column - values of an attribute
Domain - a set of possible values for an attribute

Terminology in a Relation

Key
Primary key (unique ID)
Concatenated key - use two or more attributes to identify a record (e.g. Student ID & Course ID to identify a Grade record)
Foreign key (cross-reference key)
A foreign key is an attribute (usually a non-key attribute) in one relation that also appears as the primary key in another relation
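The sketch below illustrates the Grade record example. Neither Student ID nor Course ID alone identifies a Grade record, but the pair does. Each of them must also match an existing primary key in its own relation, which makes them foreign keys as well as parts of the concatenated key. All IDs and grades are hypothetical sample data.

```python
# Hypothetical sample data; all IDs are illustrative only.
students = {"S1", "S2"}   # primary keys of a Student relation
courses = {"C10", "C20"}  # primary keys of a Course relation
grades = [                # Grade records: (student_id, course_id, grade)
    ("S1", "C10", "A"),
    ("S1", "C20", "B"),
    ("S2", "C10", "B"),
]

def unique(rows, positions):
    """True if the values at these positions never repeat across rows."""
    keys = [tuple(r[p] for p in positions) for r in rows]
    return len(keys) == len(set(keys))

# Neither ID alone identifies a Grade record, but the concatenated key does.
print(unique(grades, [0]), unique(grades, [1]), unique(grades, [0, 1]))  # False False True

# Foreign key check: every referenced ID must exist in the referenced relation.
dangling = [r for r in grades if r[0] not in students or r[1] not in courses]
print(dangling)  # []
```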

An E-R Model for Student Registration System

[Figure: E-R diagram with the entities Instructor, Course, Student and Course Enrollment and the relationships Teaches and Advises; attributes shown include Course Number, Description, Instructor ID, Name, Room, Rank, Student Number, Student Name, Major and Grade]

Convert E-R Model to Relational Tables
Create one table for each entity, with its key and attributes
Introduce a foreign key into the many side to represent a 1:M relationship

A Relational Model for Student Registration System
Course Table (Course ID, Description, Credit, Instructor ID)
Instructor Table (Instructor ID, Instructor Name, Rank)
Student Table (Student ID, Student Name, Major, Advisor ID)
Enrollment Table (Course ID, Student ID, Grade)
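As a sketch of the conversion rules applied to this model, the tables can be declared in SQLite through Python's standard `sqlite3` module. The snake_case column names are illustrative. Two choices are assumptions: Advisor ID is placed in the Student table as a foreign key to Instructor, on the many side of Advises, and `instructor_rank` stands in for Rank. Each 1:M relationship becomes a `REFERENCES` clause on the many side, and the Enrollment table uses the concatenated key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
CREATE TABLE instructor (
    instructor_id   TEXT PRIMARY KEY,
    instructor_name TEXT NOT NULL,
    instructor_rank TEXT
);
CREATE TABLE course (
    course_id     TEXT PRIMARY KEY,
    description   TEXT,
    credit        INTEGER,
    instructor_id TEXT REFERENCES instructor(instructor_id)  -- 1:M Teaches
);
CREATE TABLE student (
    student_id   TEXT PRIMARY KEY,
    student_name TEXT NOT NULL,
    major        TEXT,
    advisor_id   TEXT REFERENCES instructor(instructor_id)   -- 1:M Advises (assumed placement)
);
CREATE TABLE enrollment (
    course_id  TEXT REFERENCES course(course_id),
    student_id TEXT REFERENCES student(student_id),
    grade      TEXT,
    PRIMARY KEY (course_id, student_id)                      -- concatenated key
);
""")

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['course', 'enrollment', 'instructor', 'student']
```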

Relational Database

Advantages
Easy to understand and use
Powerful data manipulation capability
Implicit associations meet different needs; flexible, best suited for decision support systems (DSS)
Normalization theory for database design

Disadvantages
Keys are stored redundantly as logical pointers to implement relationships
Inefficient for high-volume transaction processing
Lack of semantic quality control

Equivalent Relational Terms

Page 114

Figure 5-1

2000 Prentice Hall

Normalization

Reduce complex user views to a set of small, stable data structures
Eliminate errors and inconsistencies related to adding, deleting or updating record occurrences

Modification Anomalies

Insertion anomalies - a record cannot be added because of a missing value for one or more fields
Deletion anomalies - deleting a record causes an unintended loss of other information
Update anomalies - updating is made needlessly complicated by redundancy
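The three anomalies can be reproduced on a small un-normalized table in which the course title is repeated for every enrolled student. This is a minimal sketch with hypothetical rows (`student_id`, `course_id`, `course_title`). The insertion anomaly is described only in a comment, because no valid row can be built for it.

```python
# Hypothetical un-normalized enrollment rows: (student_id, course_id, course_title)
rows = [
    ("S1", "C10", "Databases"),
    ("S2", "C10", "Databases"),
    ("S2", "C20", "Networks"),
]

# Update anomaly: the title of C10 is stored twice, so an update that
# changes only one copy leaves the data inconsistent.
rows[0] = ("S1", "C10", "Database Systems")
titles_for_c10 = {title for _, cid, title in rows if cid == "C10"}
print(titles_for_c10)  # two different titles for the same course

# Deletion anomaly: removing the only enrollment in C20 also loses its title.
rows = [r for r in rows if not (r[0] == "S2" and r[1] == "C20")]
print(any(cid == "C20" for _, cid, _ in rows))  # False: "Networks" is gone

# Insertion anomaly: a new course cannot be recorded until a student enrolls,
# because student_id would have to be left empty.
```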

Functional Dependence

Given a relation R, attribute Y of R is functionally dependent on attribute X of R if and only if, whenever two tuples of R agree on their X-value, they necessarily agree on their Y-value.
We write R.X --> R.Y

Example:
(Student ID, Student Name, Course ID, Course Title, Grade)

Student ID --> Student Name
Course ID --> Course Title
Student ID -/-> Course ID
Course Title -/-> Student Name
Student ID -/-> Grade
Course ID -/-> Grade
(-/-> means "does not functionally determine")
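The definition translates directly into a check on a table instance: group rows by their X-value and confirm that each group has a single Y-value. Such a check can refute a dependency but cannot prove it, because dependencies are statements about all valid data. The sketch below uses hypothetical rows and shortened attribute names (`sid`, `sname`, `cid`, `ctitle`, `grade`). It tests the dependencies listed in the example, plus Student ID + Course ID --> Grade.

```python
def holds(rows, x, y):
    """True if X --> Y holds in this instance: rows that agree on X agree on Y."""
    seen = {}
    for row in rows:
        xv = tuple(row[a] for a in x)
        yv = tuple(row[a] for a in y)
        if seen.setdefault(xv, yv) != yv:
            return False
    return True

# Hypothetical rows of (Student ID, Student Name, Course ID, Course Title, Grade)
rows = [
    {"sid": "S1", "sname": "Ann", "cid": "C10", "ctitle": "Databases", "grade": "A"},
    {"sid": "S1", "sname": "Ann", "cid": "C20", "ctitle": "Networks", "grade": "B"},
    {"sid": "S2", "sname": "Bob", "cid": "C10", "ctitle": "Databases", "grade": "B"},
]

print(holds(rows, ["sid"], ["sname"]))         # True
print(holds(rows, ["cid"], ["ctitle"]))        # True
print(holds(rows, ["sid"], ["cid"]))           # False
print(holds(rows, ["ctitle"], ["sname"]))      # False
print(holds(rows, ["sid"], ["grade"]))         # False
print(holds(rows, ["cid"], ["grade"]))         # False
print(holds(rows, ["sid", "cid"], ["grade"]))  # True
```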

Normal Forms

A relation is said to be in a
particular normal form if it satisfies
a certain specified set of
constraints

Normal Forms
1 NF (no repeating groups)
2 NF (no partial dependencies)
3 NF (no transitive dependencies)
Boyce-Codd NF
4 NF (no multi-value dependencies)
5 NF

Domain-Key NF

First Normal Form

A relation is in first normal form if it contains no repeating groups
An un-normalized relation contains repeating groups

First Normal Form

Grade Report with a repeating group of courses for each student
(Student ID, Student Name, Campus Address, Major, Course ID, Course Title, Instructor Name, Instructor Location, Grade)
Remove the repeating group:
(Student ID, Student Name, Campus Address, Major) (3NF)
(Student ID, Course ID, Course Title, Instructor Name, Instructor Location, Grade) (1NF)
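Removing the repeating group amounts to flattening a nested list into one row per (student, course) pair, while keeping the student-level data in its own relation. A minimal sketch with hypothetical data follows. For brevity it omits Campus Address, Instructor Name and Instructor Location.

```python
# Hypothetical grade report with a repeating group of courses per student.
report = [
    {"student_id": "S1", "student_name": "Ann", "major": "CS",
     "courses": [("C10", "Databases", "A"), ("C20", "Networks", "B")]},
    {"student_id": "S2", "student_name": "Bob", "major": "Math",
     "courses": [("C10", "Databases", "B")]},
]

# Student relation: one row per student, no repeating group.
student = {(r["student_id"], r["student_name"], r["major"]) for r in report}

# Course-grade relation: one row per (student, course) pair; key = Student ID + Course ID.
student_course = {
    (r["student_id"], cid, title, grade)
    for r in report
    for cid, title, grade in r["courses"]
}
print(len(student), len(student_course))  # 2 3
```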


Second Normal Form

A relation is in second normal form if it is already in first normal form and any partial functional dependencies on the primary key have been removed

Second Normal Form

[Figure: partial functional dependencies on the primary key]

Second Normal Form

(Student ID, Course ID, Course Title, Instructor Name, Instructor Location, Grade) (1NF)
Primary key is Student ID + Course ID
Student ID + Course ID --> Grade
Course ID --> Course Title (partial dependency)
Removing partial dependencies:
(Student ID, Course ID, Grade) (3NF)
(Course ID, Course Title, Instructor Name, Instructor Location) (2NF)
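The decomposition is a pair of projections, and it loses no information. Because Course ID determines the other course attributes, joining the two projections on Course ID rebuilds exactly the original rows. The sketch below checks this on hypothetical data, with instructor names and locations chosen only for illustration.

```python
# Hypothetical 1NF rows:
# (student_id, course_id, course_title, instructor_name, instructor_location, grade)
enroll_1nf = {
    ("S1", "C10", "Databases", "Lee", "B-101", "A"),
    ("S2", "C10", "Databases", "Lee", "B-101", "B"),
    ("S1", "C20", "Networks", "Kim", "B-205", "B"),
}

# Project onto the two relations produced by removing the partial dependency.
grade = {(s, c, g) for s, c, _, _, _, g in enroll_1nf}            # key: Student ID + Course ID
course = {(c, t, n, loc) for _, c, t, n, loc, _ in enroll_1nf}    # key: Course ID

# Natural join on course_id rebuilds the original rows (lossless decomposition).
rejoined = {(s, c, t, n, loc, g)
            for s, c, g in grade
            for c2, t, n, loc in course if c == c2}
print(rejoined == enroll_1nf)  # True
print(len(course))             # 2: each course title is now stored once
```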


Third Normal Form

A relation is in third normal form if it is already in second normal form and contains no transitive dependencies
Transitive dependency - one nonkey attribute is dependent on one or more other nonkey attributes

Third Normal Form

[Figure: transitive dependencies]

Third Normal Form

(Course ID, Course Title, Instructor Name, Instructor Location) (2NF)
Course ID --> Instructor Name --> Instructor Location
Instructor Name is a nonkey attribute
Instructor Location is dependent on Instructor Name
Remove the transitive dependency:
(Course ID, Course Title, Instructor Name) (3NF)
(Instructor Name, Instructor Location) (3NF)
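Splitting along Instructor Name --> Instructor Location means each instructor's location is stored once. Moving an instructor then changes one row rather than one row per course taught. A minimal sketch with hypothetical rows and locations:

```python
# Hypothetical 2NF rows: (course_id, course_title, instructor_name, instructor_location)
course_2nf = [
    ("C10", "Databases", "Lee", "B-101"),
    ("C30", "Data Mining", "Lee", "B-101"),
    ("C20", "Networks", "Kim", "B-205"),
]

# Decompose along the transitive dependency Instructor Name --> Instructor Location.
course = {(c, t, n) for c, t, n, _ in course_2nf}
instructor = {(n, loc) for _, _, n, loc in course_2nf}
print(len(instructor))  # 2: each location is stored once per instructor

# Moving Lee now changes a single row instead of one row per course Lee teaches.
instructor = {(n, "B-300" if n == "Lee" else loc) for n, loc in instructor}
print(sorted(instructor))  # [('Kim', 'B-205'), ('Lee', 'B-300')]
```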


Third Normal Form

A relation is in third normal form if it is in second normal form and has no transitive dependencies

Figure 5-7


Practice: Mountain View Community Hospital

Mountain View Community Hospital
Physician Report
Physician: A Campbell
Specialty: Internal Medicine

Date       Patient-Code   Patient-Name       Procedure       Charge
--------------------------------------------------------------------
10/17/96   32968          Baker, Marry S.    Examination      35.00
                                             X-ray            75.00
10/17/96   39271          Emery, Nancy       Examination      35.00
                                             Chemotherapy     50.00
10/18/96   32968          Baker, Marry S.    Examination      35.00
--------------------------------------------------------------------

Normalize a table
Report (Doctor Name, Specialty, Date, Patient Code, Patient Name, Procedure Name, Charge)
Analyzing functional dependencies:
Assume no duplicate Doctor Names; otherwise introduce a Doctor ID
Assume no duplicate Procedure Names; otherwise introduce a Procedure code
Assume the charge is determined by the procedure
Assume a patient may visit a doctor more than once on the same day

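Answer

Doctors (Doctor ID, Doctor Name, Specialty)
Patients (Patient Code, Patient Name)
Visit (Visit ID, Doctor ID, Patient Code, Date)
Treatment (Visit ID, Procedure ID)
Procedure (Procedure ID, Procedure Name, Charge)

Here the Visit ID is generated automatically by the system. It is needed because a patient may visit the same doctor more than once on the same day, so Doctor ID + Patient Code + Date does not identify a visit.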
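One way to check the answer is to load the report's data into the five relations and rebuild the report with joins. This is a sketch using Python's `sqlite3` module. The IDs `D1`, `P1`, `P2` and `P3` are hypothetical. The Procedure relation is named `procedures` and the Date column `visit_date` for clarity. SQLite's `INTEGER PRIMARY KEY` plays the role of the system-generated Visit ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE doctors    (doctor_id TEXT PRIMARY KEY, doctor_name TEXT, specialty TEXT);
CREATE TABLE patients   (patient_code TEXT PRIMARY KEY, patient_name TEXT);
CREATE TABLE procedures (procedure_id TEXT PRIMARY KEY, procedure_name TEXT, charge REAL);
CREATE TABLE visit (
    visit_id     INTEGER PRIMARY KEY,  -- generated by SQLite when omitted on insert
    doctor_id    TEXT REFERENCES doctors(doctor_id),
    patient_code TEXT REFERENCES patients(patient_code),
    visit_date   TEXT
);
CREATE TABLE treatment (
    visit_id     INTEGER REFERENCES visit(visit_id),
    procedure_id TEXT REFERENCES procedures(procedure_id),
    PRIMARY KEY (visit_id, procedure_id)
);
""")
conn.execute("INSERT INTO doctors VALUES ('D1', 'A Campbell', 'Internal Medicine')")
conn.executemany("INSERT INTO patients VALUES (?, ?)",
                 [("32968", "Baker, Marry S."), ("39271", "Emery, Nancy")])
conn.executemany("INSERT INTO procedures VALUES (?, ?, ?)",
                 [("P1", "Examination", 35.00), ("P2", "X-ray", 75.00),
                  ("P3", "Chemotherapy", 50.00)])

def add_visit(doctor_id, patient_code, visit_date, procedure_ids):
    """Insert one visit (new Visit ID) and one treatment row per procedure."""
    cur = conn.execute(
        "INSERT INTO visit (doctor_id, patient_code, visit_date) VALUES (?, ?, ?)",
        (doctor_id, patient_code, visit_date))
    conn.executemany("INSERT INTO treatment VALUES (?, ?)",
                     [(cur.lastrowid, p) for p in procedure_ids])

add_visit("D1", "32968", "10/17/96", ["P1", "P2"])
add_visit("D1", "39271", "10/17/96", ["P1", "P3"])
add_visit("D1", "32968", "10/18/96", ["P1"])

# Rebuild the physician report by joining the normalized tables.
report = conn.execute("""
    SELECT v.visit_date, p.patient_code, p.patient_name, pr.procedure_name, pr.charge
    FROM visit v
    JOIN patients p    ON p.patient_code = v.patient_code
    JOIN treatment t   ON t.visit_id = v.visit_id
    JOIN procedures pr ON pr.procedure_id = t.procedure_id
    WHERE v.doctor_id = 'D1'
    ORDER BY v.visit_id, pr.procedure_id
""").fetchall()
for row in report:
    print(row)
```

Each charge is stored once in `procedures`, so changing the price of an examination touches a single row.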

An E-R Model for Hospital Treatment Charge

[Figure: E-R diagram with the entities Doctors (Doctor ID, Name, Specialty), Patients (Patient Code, Patient Name), Visit (Visit ID, Date/Time, Doctor ID, Patient Code), Treatment (Visit ID, Procedure ID) and Procedure (Procedure ID, Description, Rate)]

E-R Model Improvement Criteria vs. Normalization Theory

Each entity must have a key, simple or composite (basic requirement of a relational table)
Introduce a composite entity to convert an m:n relationship into two 1:m relationships, and give it a composite key (the way m:n relationships are represented in a relational database)
Convert a multivalued attribute into an attribute entity or weak entity (1NF)
Make each entity represent a simple object or concept (2NF and 3NF)
Divide a complex entity into several related simple entities (2NF and 3NF)
Make each attribute associate with only one entity unless it is a foreign key (3NF)
A good E-R model usually satisfies 3NF.

Boyce-Codd Normal Form

A relation is in BCNF if every determinant is a candidate key

Figure 5-8


Boyce-Codd Normal Form

(Student, Major, Advisor) with primary key Student + Major (3NF)
or (Student, Advisor, Major) with primary key Student + Advisor (1NF)
A student may have more than one major, with one advisor in each major
Student + Major --> Advisor
Student + Advisor --> Major
Advisor --> Major (Advisor determines Major, but Advisor is not a candidate key)

(Student, Advisor) (BCNF)
(Advisor, Major) (BCNF)
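Testing for BCNF reduces to one question about each functional dependency: is its determinant a superkey? That is, does its attribute closure contain every attribute of the relation? Below is a minimal sketch of the standard attribute-closure algorithm, applied to the Student/Major/Advisor example. The helper names are illustrative. For the decomposed relation, the projected dependency Advisor --> Major is supplied by hand.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Non-trivial FDs whose determinant is not a superkey of the schema."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != schema]

schema = {"Student", "Major", "Advisor"}
fds = [
    (frozenset({"Student", "Major"}), frozenset({"Advisor"})),
    (frozenset({"Advisor"}), frozenset({"Major"})),
]
violations = bcnf_violations(schema, fds)
print(violations)  # only Advisor --> Major: Advisor is a determinant but not a candidate key

# After decomposition, (Advisor, Major) has Advisor as its key, so no violation remains.
after = bcnf_violations({"Advisor", "Major"},
                        [(frozenset({"Advisor"}), frozenset({"Major"}))])
print(after)  # []
```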

Boyce-Codd Normal Form

A relation is in BCNF if and only if it is in 3NF and every determinant is a candidate key
A determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent
A 3NF relation can fail BCNF only in this situation:
1. It has multiple candidate keys
2. Those candidate keys are composite
3. The candidate keys overlap

Fourth Normal Form

A relation is in fourth normal form if it is in BCNF and contains no multivalued dependencies
Multivalued Dependency
There are three attributes (e.g. A, B, C) in a relation.
For each value of A there is a well-defined set of values of B and a well-defined set of values of C.
The set of values of B is independent of the set of values of C, and vice versa.

Fourth Normal Form

(Course, Instructor, Textbook) (BCNF)
One course is taught by several instructors
One course uses the same set of textbooks regardless of the instructor
(Course, Textbook) (4NF)
(Course, Instructor) (4NF)

Fourth Normal Form

Course | Instructor | Textbook
1ka3   | David      | Intro. Web design
1ka3   | Smith      | Intro. Web design
1ka3   | David      | Intro. Access
1ka3   | Smith      | Intro. Access

Course | Instructor
1ka3   | David
1ka3   | Smith

Course | Textbook
1ka3   | Intro. Web design
1ka3   | Intro. Access
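Because instructors and textbooks are independent for a given course, joining the two 4NF projections on Course reproduces the original rows exactly. The sketch below checks this on the table above. It also counts the rows the BCNF table would need for a new textbook. The title "Intro. SQL" is hypothetical.

```python
# The Course/Instructor/Textbook rows from the table above.
cit = {
    ("1ka3", "David", "Intro. Web design"),
    ("1ka3", "Smith", "Intro. Web design"),
    ("1ka3", "David", "Intro. Access"),
    ("1ka3", "Smith", "Intro. Access"),
}
course_instructor = {(c, i) for c, i, _ in cit}
course_textbook = {(c, t) for c, _, t in cit}

# Joining the two projections on Course gives back exactly the original rows.
rejoined = {(c, i, t)
            for c, i in course_instructor
            for c2, t in course_textbook if c == c2}
print(rejoined == cit)  # True

# Adding a textbook to the BCNF table needs one row per instructor,
# but only one row in the (Course, Textbook) relation.
new_book = "Intro. SQL"  # hypothetical
rows_needed_before = {(c, i, new_book) for c, i in course_instructor}
print(len(rows_needed_before))  # 2
```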

Fifth Normal Form

Page 125

Fifth Normal Form

A relation is in fifth normal form if every join dependency in it is a consequence of its candidate keys
Not in 5NF: Person-using-skills-on-jobs (Person, Skill, Job)
5NF: Has-skill (Person, Skill)
Need-skill (Skill, Job)
Assigned-to-job (Person, Job)
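The 5NF decomposition is lossless only when all three projections are joined; joining any two can produce spurious tuples. This sketch assumes hypothetical data that obeys a business rule the decomposition depends on: if a person has a skill, a job needs that skill, and the person is assigned to that job, then the person uses that skill on that job. The rule is an assumption about the example, not something stated above.

```python
# Hypothetical Person-using-skills-on-jobs rows that satisfy the assumed rule.
uses = {
    ("Ann", "SQL", "Reporting"),
    ("Ann", "SQL", "Backend"),
    ("Ann", "Java", "Backend"),
    ("Bob", "SQL", "Backend"),
}

has_skill = {(p, s) for p, s, _ in uses}   # Has-skill (Person, Skill)
need_skill = {(s, j) for _, s, j in uses}  # Need-skill (Skill, Job)
assigned = {(p, j) for p, _, j in uses}    # Assigned-to-job (Person, Job)

# Joining only two projections produces a spurious tuple...
two_way = {(p, s, j) for p, s in has_skill for s2, j in need_skill if s == s2}
print(sorted(two_way - uses))  # [('Bob', 'SQL', 'Reporting')]

# ...while joining all three reconstructs the relation exactly.
three_way = {(p, s, j) for p, s, j in two_way if (p, j) in assigned}
print(three_way == uses)  # True
```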

Domain-Key Normal Form

A relation is in DK/NF if every constraint on the relation is a logical consequence of the definition of keys and domains
Constraint - a rule governing static values of attributes
Key - unique identifier of a tuple
Domain - description of an attribute's allowed values

Example of a Relation Not in DK/NF

Enrollment (Student ID, Course ID, Grade)
Key constraint: Student ID + Course ID --> Grade
Domain constraints:
Student ID: 7 digits; Course ID: 3 digits; Grade: A, B, C, D, F, P
General constraint:
If Course ID < 900 then Grade in {A, B, C, D, F}
else Grade in {P, F}
Since the general constraint cannot be inferred from the key constraint or the domain constraints, the relation is not in DK/NF.
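A row can satisfy every domain constraint and still break the general constraint. That gap is why the relation is not in DK/NF: the rule needs its own check, beyond key and domain definitions. A minimal sketch follows. The function names and the sample row are hypothetical.

```python
import re

def domain_ok(student_id, course_id, grade):
    # Domain constraints: 7-digit Student ID, 3-digit Course ID, grade letter.
    return (re.fullmatch(r"\d{7}", student_id) is not None
            and re.fullmatch(r"\d{3}", course_id) is not None
            and grade in {"A", "B", "C", "D", "F", "P"})

def general_ok(course_id, grade):
    # General constraint: courses below 900 are letter-graded, the rest pass/fail.
    allowed = {"A", "B", "C", "D", "F"} if int(course_id) < 900 else {"P", "F"}
    return grade in allowed

row = ("1234567", "950", "A")      # hypothetical enrollment row
print(domain_ok(*row))             # True: every value is within its domain
print(general_ok(row[1], row[2]))  # False: course 950 allows only P or F
```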

Remarks on Normalization

The notions of dependency and normalization are semantic in nature
The normalization guidelines should be regarded primarily as a discipline to help database design

Limitations of Normalization

May not be natural, e.g. zip code, or the area code of a phone number
May ignore operational considerations: some values need not change once recorded, even though their source may change over time, e.g. (Order#, Prod#, Description, Unit-price, Quantity)
Difficult to enforce integrity control:
(Order#, Prod#, Quantity)
(Prod#, Description, Unit-price)
Prod# may not be valid.
Today, this integrity control is provided by the relational DBMS
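As an example of the DBMS providing this integrity control, the sketch below uses SQLite through Python's `sqlite3` module, with foreign key enforcement switched on because SQLite leaves it off by default. An order line that references a nonexistent Prod# is rejected. The table and column names (`product`, `order_line`, `prod_no`, and so on) and the sample values are hypothetical mappings of the relations above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE product (
    prod_no     TEXT PRIMARY KEY,
    description TEXT,
    unit_price  REAL
);
CREATE TABLE order_line (
    order_no TEXT,
    prod_no  TEXT REFERENCES product(prod_no),
    quantity INTEGER,
    PRIMARY KEY (order_no, prod_no)
);
""")
conn.execute("INSERT INTO product VALUES ('P1', 'Widget', 2.50)")
conn.execute("INSERT INTO order_line VALUES ('O1', 'P1', 10)")  # valid Prod#

try:
    conn.execute("INSERT INTO order_line VALUES ('O1', 'P9', 5)")  # P9 does not exist
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)
```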

Denormalization

Normalization is only one of many database design goals.
Normalized (decomposed) tables require additional processing (joins), reducing system speed.
Normalization purity is often difficult to sustain in the modern database environment. The conflict between design efficiency, information requirements, and processing speed is often resolved through compromises that include denormalization.
