
Choon's Guide to Normalization

(UNF -> 3NF)
For AFW2851 use only

by Tan Choon Ling


27/3/2011

Disclaimer
This guide is NOT designed to replace the lecture
notes; it is meant to make reading the lecture
notes EASIER.
This guide will NOT tell you what a primary/foreign
key is (again, lecture notes!).
The examples in this guide are from the lecture notes
because, with a different way of explaining things,
hopefully you can understand the lecture notes better.
Finally, I CAN be wrong, so let me know if you see
something you feel is not right so I can help make this
guide better.

UNF
You should have seen this table before:

However, the way this table is shown might not be intuitive (it's
CORRECT, but you might not get the right idea of what UNF
really is).

So why show it like this?


Because this is what UNF looks like in real-life databases.

Now, if you look at, say, project number 18, you might think that the
project is represented by 4 rows, right?
Not quite. It's actually one row; real-life databases can only
show it this way.
Fortunately, there are other ways of showing project number 18 (and all
the other projects).

This is another way of representing UNF


PROJ_NUM  PROJ_NAME   EMP_NUM  EMP_NAME             JOB_CLASS              CHG_HOUR  HOURS
18        Amber Wave  114      Annelise Jones       Applications Designer  $48.10    24.6
                      118      James J. Frommer     General Support        $18.36    45.3
                      104      Anne K. Ramoras *    Systems Analyst        $96.75    32.4
                      102      Darlene M. Smithson  DSS Analyst            $45.95    44

Shown this way, it is much easier to spot repeating groups, and you can
see that project 18 is really more like 1 row, but with many different
values of emp_name, job_class, etc.
Repeating groups are attributes/columns that have many values
associated with one value of the key attributes/columns in the table
(most likely proj_num).
Note: I may use the words attribute/column interchangeably in this
guide, but we (both you and I) should all try to stick to the word
attribute.
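
One way to picture this is as a nested record. The sketch below is only an illustration, not how the lecture notes define UNF: it stores project 18 as a single Python dictionary and keeps the repeating group in a list. The names `project_18`, `assignments` and `repeating_attributes` are made up for this example.

```python
# Project 18 as ONE record. The list under "assignments" (a made-up name)
# is the repeating group: many values for a single value of proj_num.
project_18 = {
    "proj_num": 18,
    "proj_name": "Amber Wave",
    "assignments": [
        {"emp_num": 114, "emp_name": "Annelise Jones",
         "job_class": "Applications Designer", "chg_hour": 48.10, "hours": 24.6},
        {"emp_num": 118, "emp_name": "James J. Frommer",
         "job_class": "General Support", "chg_hour": 18.36, "hours": 45.3},
        {"emp_num": 104, "emp_name": "Anne K. Ramoras *",
         "job_class": "Systems Analyst", "chg_hour": 96.75, "hours": 32.4},
        {"emp_num": 102, "emp_name": "Darlene M. Smithson",
         "job_class": "DSS Analyst", "chg_hour": 45.95, "hours": 44.0},
    ],
}


def repeating_attributes(record):
    """List the attributes that hold many values for one key value."""
    names = []
    for value in record.values():
        if isinstance(value, list):  # a list marks a repeating group
            for item in value:
                for name in item:
                    if name not in names:
                        names.append(name)
    return names


print(repeating_attributes(project_18))
```

The record count is 1 (one project), even though the repeating group inside it has 4 entries.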

So why is UNF bad?


Let's take a look at another example.
CUST_ID  CUST_NAME      CUST_ADDRESS           CUST_PHONE  CUST_EMAIL
23       John Travolta  No. 22, Beverly Hills  555-3341    jt@hollywood.com

It's possible for any customer to have more than one email address.
So if we add one more email address, it will look like this:
CUST_ID  CUST_NAME      CUST_ADDRESS           CUST_PHONE  CUST_EMAIL
23       John Travolta  No. 22, Beverly Hills  555-3341    jt@hollywood.com
23       John Travolta  No. 22, Beverly Hills  555-3341    john.t@beverly.com

See what happens? Adding a new email means repeating 4 attribute
values that were already there (redundant data). This is a
waste of space.
And this is only for the email address. What about the phone
number? And the address?

So the issue is figuring out which attributes can have multiple values.
If an attribute can, it is definitely part of a repeating group.

If you know for a fact that an attribute can only have one value
(whether it is by common sense or set by the question), then all is
good!
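
The redundancy in the customer example can be counted directly. This is a small sketch using the values from the table; the function name `redundant_values` is made up for illustration.

```python
# Single-valued details of customer 23, taken from the example above.
customer = {
    "cust_id": 23,
    "cust_name": "John Travolta",
    "cust_address": "No. 22, Beverly Hills",
    "cust_phone": "555-3341",
}
emails = ["jt@hollywood.com", "john.t@beverly.com"]

# One flat row per email: every single-valued attribute is copied each time.
rows = [{**customer, "cust_email": email} for email in emails]


def redundant_values(rows, multi_valued):
    """Count the attribute values stored again after the first row."""
    single_valued = [name for name in rows[0] if name != multi_valued]
    return (len(rows) - 1) * len(single_valued)


print(redundant_values(rows, "cust_email"))  # 4 repeated values for 1 extra email
```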

UNF to 1NF (method 1)


Step 1: Copy appropriate values into the empty cells for each row.

So just copy the values of proj_num and proj_name into the empty
cells (this process actually divides 1 row into multiple rows).

Before Step 1

After Step 1

Another way of seeing it


Step 1: Copy appropriate values into the empty cells for each row.

In other words, unmerge these cells

Before Step 1
PROJ_NUM  PROJ_NAME   EMP_NUM  EMP_NAME             JOB_CLASS              CHG_HOUR  HOURS
18        Amber Wave  114      Annelise Jones       Applications Designer  $48.10    24.6
                      118      James J. Frommer     General Support        $18.36    45.3
                      104      Anne K. Ramoras *    Systems Analyst        $96.75    32.4
                      102      Darlene M. Smithson  DSS Analyst            $45.95    44

After Step 1
PROJ_NUM  PROJ_NAME   EMP_NUM  EMP_NAME             JOB_CLASS              CHG_HOUR  HOURS
18        Amber Wave  114      Annelise Jones       Applications Designer  $48.10    24.6
18        Amber Wave  118      James J. Frommer     General Support        $18.36    45.3
18        Amber Wave  104      Anne K. Ramoras *    Systems Analyst        $96.75    32.4
18        Amber Wave  102      Darlene M. Smithson  DSS Analyst            $45.95    44
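
Step 1 can also be sketched in a few lines of Python. This is only an illustration under one assumption: the UNF row is stored as a nested dictionary, with the repeating group in a list called `assignments` (a made-up name, as are `unf_record` and `flatten`).

```python
# The single UNF "row" for project 18, with its repeating group in a list.
unf_record = {
    "proj_num": 18,
    "proj_name": "Amber Wave",
    "assignments": [
        {"emp_num": 114, "emp_name": "Annelise Jones",
         "job_class": "Applications Designer", "chg_hour": 48.10, "hours": 24.6},
        {"emp_num": 118, "emp_name": "James J. Frommer",
         "job_class": "General Support", "chg_hour": 18.36, "hours": 45.3},
        {"emp_num": 104, "emp_name": "Anne K. Ramoras *",
         "job_class": "Systems Analyst", "chg_hour": 96.75, "hours": 32.4},
        {"emp_num": 102, "emp_name": "Darlene M. Smithson",
         "job_class": "DSS Analyst", "chg_hour": 45.95, "hours": 44.0},
    ],
}


def flatten(record, group_key):
    """Copy the non-repeating values into every row of the repeating group."""
    shared = {name: value for name, value in record.items() if name != group_key}
    return [{**shared, **item} for item in record[group_key]]


rows_1nf = flatten(unf_record, "assignments")
for row in rows_1nf:
    print(row)
```

Each printed row now carries its own copy of proj_num and proj_name, which is exactly the "unmerge the cells" idea.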

UNF to 1NF (method 1)


Step 2: Find a combination of attributes that uniquely identifies each
row. This combination will be the primary key. (This CAN be tedious to
do, but it's not hard.)
PROJ_NUM  PROJ_NAME   EMP_NUM  EMP_NAME             JOB_CLASS              CHG_HOUR  HOURS
18        Amber Wave  114      Annelise Jones       Applications Designer  $48.10    24.6
18        Amber Wave  118      James J. Frommer     General Support        $18.36    45.3
18        Amber Wave  104      Anne K. Ramoras *    Systems Analyst        $96.75    32.4
18        Amber Wave  102      Darlene M. Smithson  DSS Analyst            $45.95    44

How to do it:
(1) Look at proj_num + proj_name: does this combination uniquely identify a row?
Nope, 18 + Amber Wave can mean any of the 4 rows.
(2) Look at proj_num + emp_num: Yes! 18 + 114 uniquely identifies the first row.
(3) Look at proj_num + emp_name: Yes! But emp_num already does the job well
enough, so ignore this. (Besides, you never know if two employees share the
same name.)
(4) Proj_num and job_class? No, you could have many Systems Analysts in project
18. So 18 + Systems Analyst would not uniquely identify a row.
(5) Repeat the process of finding combinations. If you can't find combinations of
2 attributes, look at combinations of 3, then 4, and so on (the highest combination
number you should reach is (total number of attributes - 1); can you guess why?)
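
The search above can be sketched with `itertools.combinations`. This is a rough illustration, not a replacement for thinking about what the data means: sample rows can only rule a combination OUT. The two project 99 rows are HYPOTHETICAL (they are not in the lecture notes) and are added only so that the sample contains an employee on two projects and two Systems Analysts on one project.

```python
from itertools import combinations

ATTRS = ["proj_num", "proj_name", "emp_num", "emp_name",
         "job_class", "chg_hour", "hours"]

ROWS = [dict(zip(ATTRS, values)) for values in [
    (18, "Amber Wave", 114, "Annelise Jones", "Applications Designer", 48.10, 24.6),
    (18, "Amber Wave", 118, "James J. Frommer", "General Support", 18.36, 45.3),
    (18, "Amber Wave", 104, "Anne K. Ramoras *", "Systems Analyst", 96.75, 32.4),
    (18, "Amber Wave", 102, "Darlene M. Smithson", "DSS Analyst", 45.95, 44.0),
    # HYPOTHETICAL rows, made up for this sketch only:
    (99, "Made-Up Project", 104, "Anne K. Ramoras *", "Systems Analyst", 96.75, 32.4),
    (99, "Made-Up Project", 200, "Sam Example", "Systems Analyst", 96.75, 32.4),
]]


def is_unique(rows, attrs):
    """True if no two rows share the same values for `attrs`."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def smallest_unique_combinations(rows, attrs):
    """Try combinations of 1 attribute, then 2, then 3... and stop at the
    first size where at least one combination identifies every row."""
    for size in range(1, len(attrs) + 1):
        found = [c for c in combinations(attrs, size) if is_unique(rows, c)]
        if found:
            return found
    return []


for combo in smallest_unique_combinations(ROWS, ATTRS):
    print(combo)
```

On this sample, no single attribute works, and proj_num + emp_num is among the pairs that survive. Pairs using a name instead of a number survive too, so choosing between them is still a judgement call, as in steps (2) and (3).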

UNF to 1NF (method 1)


PROJ_NUM  PROJ_NAME   EMP_NUM  EMP_NAME             JOB_CLASS              CHG_HOUR  HOURS
18        Amber Wave  114      Annelise Jones       Applications Designer  $48.10    24.6
18        Amber Wave  118      James J. Frommer     General Support        $18.36    45.3
18        Amber Wave  104      Anne K. Ramoras *    Systems Analyst        $96.75    32.4
18        Amber Wave  102      Darlene M. Smithson  DSS Analyst            $45.95    44

In this case, the primary key consists of 2 attributes (proj_num and
emp_num).
Some people like to move all the primary key attributes to the left; it's
really up to you whether you do this or not.
PROJ_NUM  EMP_NUM  PROJ_NAME   EMP_NAME             JOB_CLASS              CHG_HOUR  HOURS
18        114      Amber Wave  Annelise Jones       Applications Designer  $48.10    24.6
18        118      Amber Wave  James J. Frommer     General Support        $18.36    45.3
18        104      Amber Wave  Anne K. Ramoras *    Systems Analyst        $96.75    32.4
18        102      Amber Wave  Darlene M. Smithson  DSS Analyst            $45.95    44

Both tables above (the original column order and the rearranged one) are now in 1NF.

UNF to 1NF (method 2)


Step 1: Place repeating data with a copy of the original key attribute(s)
in a separate relation (table). Nominate one of the non-repeating
attributes to be the primary key of the new table. Name all tables.
Before Step 1
PROJ_NUM  PROJ_NAME   EMP_NUM  EMP_NAME             JOB_CLASS              CHG_HOUR  HOURS
18        Amber Wave  114      Annelise Jones       Applications Designer  $48.10    24.6
                      118      James J. Frommer     General Support        $18.36    45.3
                      104      Anne K. Ramoras *    Systems Analyst        $96.75    32.4
                      102      Darlene M. Smithson  DSS Analyst            $45.95    44

After Step 1

Projects
PROJ_NUM  PROJ_NAME
18        Amber Wave

We normally nominate the ID attribute (here, proj_num).

Employee_Assignments
PROJ_NUM  EMP_NUM  EMP_NAME             JOB_CLASS              CHG_HOUR  HOURS
18        114      Annelise Jones       Applications Designer  $48.10    24.6
18        118      James J. Frommer     General Support        $18.36    45.3
18        104      Anne K. Ramoras *    Systems Analyst        $96.75    32.4
18        102      Darlene M. Smithson  DSS Analyst            $45.95    44

The repeating groups are copied into this new table, together with a copy of proj_num.
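
Method 2's Step 1 can be sketched the same way. Again this assumes the UNF row is held as a nested dictionary; `unf_record`, `assignments` and `split_repeating_group` are made-up names for illustration.

```python
# The single UNF "row" for project 18, with its repeating group in a list.
unf_record = {
    "proj_num": 18,
    "proj_name": "Amber Wave",
    "assignments": [
        {"emp_num": 114, "emp_name": "Annelise Jones",
         "job_class": "Applications Designer", "chg_hour": 48.10, "hours": 24.6},
        {"emp_num": 118, "emp_name": "James J. Frommer",
         "job_class": "General Support", "chg_hour": 18.36, "hours": 45.3},
        {"emp_num": 104, "emp_name": "Anne K. Ramoras *",
         "job_class": "Systems Analyst", "chg_hour": 96.75, "hours": 32.4},
        {"emp_num": 102, "emp_name": "Darlene M. Smithson",
         "job_class": "DSS Analyst", "chg_hour": 45.95, "hours": 44.0},
    ],
}


def split_repeating_group(record, key, group_key):
    """Return (parent_row, child_rows). Each child row gets a copy of the
    parent's key so the two tables can still be linked."""
    parent = {name: value for name, value in record.items() if name != group_key}
    children = [{key: record[key], **item} for item in record[group_key]]
    return parent, children


projects_row, employee_assignments = split_repeating_group(
    unf_record, "proj_num", "assignments")

print(projects_row)
for row in employee_assignments:
    print(row)
```

Unlike Method 1, proj_name is stored once (in the Projects row) and only proj_num is copied into the new table.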

UNF to 1NF (method 2)


Step 2: Find the primary keys for ALL tables (don't take for granted that
the nominated primary key is enough!)
Projects
PROJ_NUM  PROJ_NAME
18        Amber Wave

For this table, it's easy enough to see that each proj_num uniquely identifies a
proj_name. So make proj_num the PK.

Employee_Assignments
PROJ_NUM  EMP_NUM  EMP_NAME             JOB_CLASS              CHG_HOUR  HOURS
18        114      Annelise Jones       Applications Designer  $48.10    24.6
18        118      James J. Frommer     General Support        $18.36    45.3
18        104      Anne K. Ramoras *    Systems Analyst        $96.75    32.4
18        102      Darlene M. Smithson  DSS Analyst            $45.95    44

Proj_num cannot uniquely identify each row. So look for another
column to combine it with.
Proj_num + emp_num? Yes! It definitely identifies each row uniquely.
Proj_num + emp_name? Yes! But emp_num does that already.
Proj_num + job_class? Nope. A project might have two DSS analysts.
Proj_num + chg_hour? Nope. Two people in a project may charge the
same price.
Proj_num + hours? Nope. Two people in a project may work the same
no. of hours.

End of 1NF
Method 1
Rpt_Format(proj_num, proj_name, emp_num, emp_name, job_class,
chg_hour, hours)
Method 2
Projects(proj_num, proj_name)
Employee Assignments(proj_num, emp_num, emp_name, job_class,
chg_hour, hours)

Which is the better method? I prefer Method 2, since it shows a better
understanding of UNF -> 1NF. Besides, if you are planning to go all the way to
3NF, you will already be doing some of the work if you use Method 2. (You
might already be able to see some partial dependencies if you use Method
1.)

Partial Dependencies (method 1)


Before moving on to normalizing from 1NF to 2NF, you'll need to know
what partial dependencies are.
PROJ_NUM  PROJ_NAME   EMP_NUM  EMP_NAME             JOB_CLASS              CHG_HOUR  HOURS
18        Amber Wave  114      Annelise Jones       Applications Designer  $48.10    24.6
18        Amber Wave  118      James J. Frommer     General Support        $18.36    45.3
18        Amber Wave  104      Anne K. Ramoras *    Systems Analyst        $96.75    32.4
18        Amber Wave  102      Darlene M. Smithson  DSS Analyst            $45.95    44

If the value of a non-key attribute is determined by only PART OF the
whole primary key, then that attribute is partially dependent on the
primary key (you can also say that the attribute is
fully/functionally dependent on part of the PK).
This is because only part of the PK is needed to uniquely identify the
non-key attribute.

Note: It is impossible for a table whose primary key is a single attribute to
have a partial dependency. Can you guess why?

Non-key attribute = any attribute that is not part of a primary key or a foreign key

Partial Dependencies (method 1)


PROJ_NUM  PROJ_NAME   EMP_NUM  EMP_NAME             JOB_CLASS              CHG_HOUR  HOURS
18        Amber Wave  114      Annelise Jones       Applications Designer  $48.10    24.6
18        Amber Wave  118      James J. Frommer     General Support        $18.36    45.3
18        Amber Wave  104      Anne K. Ramoras *    Systems Analyst        $96.75    32.4
18        Amber Wave  102      Darlene M. Smithson  DSS Analyst            $45.95    44

So now, for each attribute in the PK...

(1) Does proj_num alone uniquely identify proj_name? Yes! There's a partial dependency
there.
(2) Don't compare proj_num to emp_num; they are both part of the primary key.
(3) Does proj_num alone uniquely identify emp_name? No.
(4) Compare proj_num with all the other attributes. None found.
(5) Compare emp_num with:
Proj_name: an employee can work on more than one project, so one emp_num can go
with different proj_names. Nope.
Emp_name: 1 emp_num for 1 emp_name. Yes!
Job_class: At first, it seems like a yes. But check the original table! The test is
whether one emp_num can ever go with two different job_classes (several employees
sharing a job_class, like 101 and 105 who are both Database Designers, does not
settle it either way). This guide keeps job_class with the assignment, i.e.
dependent on the whole PK. Nope.
Chg_hour: Same as job_class, so nope.
Hours: the same employee can work different hours on different projects, so one
emp_num can go with different values of hours. Nope.
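
The question "does A alone uniquely identify B?" can be turned into a small check. The sketch below uses the project 18 rows plus ONE HYPOTHETICAL row (project 99, not in the lecture notes) so that an employee appears on two projects. Keep in mind that sample data can only disprove a dependency; whether it really holds depends on the rules of the business. The function name `determines` is made up.

```python
COLUMNS = ["proj_num", "proj_name", "emp_num", "emp_name", "hours"]

ROWS = [dict(zip(COLUMNS, values)) for values in [
    (18, "Amber Wave", 114, "Annelise Jones", 24.6),
    (18, "Amber Wave", 118, "James J. Frommer", 45.3),
    (18, "Amber Wave", 104, "Anne K. Ramoras *", 32.4),
    (18, "Amber Wave", 102, "Darlene M. Smithson", 44.0),
    # HYPOTHETICAL row, made up for this sketch only:
    (99, "Made-Up Project", 104, "Anne K. Ramoras *", 10.0),
]]


def determines(rows, lhs, rhs):
    """False if some value of `lhs` appears with two different values of
    `rhs`; True if the sample rows give no such counterexample."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


print(determines(ROWS, "proj_num", "proj_name"))  # part of the PK -> non-key
print(determines(ROWS, "emp_num", "emp_name"))    # part of the PK -> non-key
print(determines(ROWS, "proj_num", "emp_name"))
print(determines(ROWS, "emp_num", "hours"))
```

The first two checks find no counterexample, matching the two partial dependencies in this guide; the last two fail because one proj_num goes with several emp_names and one emp_num goes with several values of hours.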

Partial Dependencies (method 1)


(Dependency diagram: arrows for primary key(s) and non-keys should be here!)
PROJ_NUM  PROJ_NAME  EMP_NUM  EMP_NAME  JOB_CLASS  CHG_HOUR  HOURS
Partial dependency (PROJ_NUM -> PROJ_NAME)
Partial dependency (EMP_NUM -> EMP_NAME)

Rpt_Format(PROJ_NUM, EMP_NUM, PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS)
Partial Dependencies:
(PROJ_NUM -> PROJ_NAME)
(EMP_NUM -> EMP_NAME)
Transitive Dependencies:
Not shown here. Up to you if you want to write them down.

1NF to 2NF
Now that you have identified all the partial dependencies and written them down,
create new tables based on these dependencies, and name the tables.
Rpt_Format(PROJ_NUM, EMP_NUM, PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR,
HOURS)
Partial Dependencies:
(PROJ_NUM -> PROJ_NAME)
(EMP_NUM -> EMP_NAME)

Create 1 new table for each dependency.

Attributes that are taken over into the new tables don't have to stay in the old
table. So after 2NF...
Projects
PROJ_NUM  PROJ_NAME

Employees
EMP_NUM  EMP_NAME

Employee Assignments
PROJ_NUM  EMP_NUM  JOB_CLASS  CHG_HOUR  HOURS
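
As a rough sketch, the 1NF to 2NF step amounts to keeping a subset of attributes for each new table and dropping duplicate rows. The helper name `project` is made up (it borrows the relational term "projection"); the data is the project 18 table at 1NF.

```python
ATTRS = ["proj_num", "proj_name", "emp_num", "emp_name",
         "job_class", "chg_hour", "hours"]

ROWS_1NF = [dict(zip(ATTRS, values)) for values in [
    (18, "Amber Wave", 114, "Annelise Jones", "Applications Designer", 48.10, 24.6),
    (18, "Amber Wave", 118, "James J. Frommer", "General Support", 18.36, 45.3),
    (18, "Amber Wave", 104, "Anne K. Ramoras *", "Systems Analyst", 96.75, 32.4),
    (18, "Amber Wave", 102, "Darlene M. Smithson", "DSS Analyst", 45.95, 44.0),
]]


def project(rows, attrs):
    """Keep only `attrs` from each row and drop duplicate rows."""
    result = []
    for row in rows:
        slim = {a: row[a] for a in attrs}
        if slim not in result:
            result.append(slim)
    return result


# One new table per partial dependency; the determining attribute is its PK.
projects = project(ROWS_1NF, ["proj_num", "proj_name"])
employees = project(ROWS_1NF, ["emp_num", "emp_name"])
# The old table keeps the whole PK and whatever was not taken over.
employee_assignments = project(
    ROWS_1NF, ["proj_num", "emp_num", "job_class", "chg_hour", "hours"])

print(projects)
print(len(employees), len(employee_assignments))
```

The four copies of "18, Amber Wave" collapse into a single Projects row, which is the redundancy that 2NF removes.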

Partial Dependencies (method 2)


At 1NF for Method 2...
Projects(proj_num, proj_name)
Employee Assignments(proj_num, emp_num, emp_name, job_class,
chg_hour, hours)

Projects
PROJ_NUM  PROJ_NAME

Employee_Assignments
PROJ_NUM  EMP_NUM  EMP_NAME  JOB_CLASS  CHG_HOUR  HOURS

Partial Dependencies:
Emp_num -> emp_name

End of 2NF
At 1NF for Method 2...
Projects(proj_num, proj_name)
Employee Assignments(proj_num, emp_num, emp_name, job_class,
chg_hour, hours)
Partial Dependencies:
Emp_num -> emp_name
Projects
PROJ_NUM  PROJ_NAME

Employee Assignments
PROJ_NUM  EMP_NUM  JOB_CLASS  CHG_HOUR  HOURS

Employees
EMP_NUM  EMP_NAME

End of Method 1 = End of Method 2! (Not always the case, just for this example.)

Transitive Dependencies
Transitive dependencies are easy to look for. Just see if any non-key attribute
is 100% influenced by another non-key attribute
Projects
PROJ_NUM  PROJ_NAME

Employee Assignments
PROJ_NUM  EMP_NUM  JOB_CLASS  CHG_HOUR  HOURS

Employees
EMP_NUM  EMP_NAME

The Projects and Employees tables each have only 1 non-PK attribute, so they are
definitely in 3NF already! So we look at the Employee Assignments table:
Does job_class affect chg_hour? Yes! DSS analysts definitely earn a specific
amount, for example.
Does job_class affect hours? No, 2 people with the same job can work
different numbers of hours.
Does chg_hour affect job_class? No! 2 people with different jobs can charge
the same amount! (This is an example where A influences B, but B doesn't
influence A.)
Chg_hour doesn't influence hours. Hours doesn't influence anything.

2NF to 3NF
Now that you have identified all the transitive dependencies and written them
down, create new tables based on these dependencies, and name the tables.
At 2NF...
Projects(proj_num, proj_name)
Employees(emp_num, emp_name)
Employee Assignments(proj_num, emp_num, job_class, chg_hour, hours)
Transitive Dependencies:
Job_class -> chg_hour

Projects
PROJ_NUM  PROJ_NAME

Employees
EMP_NUM  EMP_NAME

Employee Assignments
PROJ_NUM  EMP_NUM  JOB_CLASS  CHG_HOUR  HOURS
Transitive dependency (JOB_CLASS -> CHG_HOUR)

Again, create one new table for each transitive dependency. The influencing
attribute (job_class) will stay in the old table, and be the PK for the new
table.

End of 3NF
Projects
PROJ_NUM  PROJ_NAME

Employees
EMP_NUM  EMP_NAME

Employee Assignments
PROJ_NUM  EMP_NUM  JOB_CLASS  HOURS

Job Rates
JOB_CLASS  CHG_HOUR

Just as for partial dependencies, the influencing attribute stays in the
old table, and becomes the PK for the new table, while the
influenced attributes are taken (cut) from the old table and placed
(pasted) into the new one. The ONLY difference is whether the
influencing attribute is part of the PK in the old table (partial) or a
non-key in the old table (transitive).
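
The cut-and-paste step for job_class -> chg_hour can be sketched as follows, using the project 18 rows of Employee Assignments at 2NF. The variable names are made up for illustration. The last part looks up each job_class in Job Rates again to confirm that no information was lost by splitting the table.

```python
COLUMNS = ["proj_num", "emp_num", "job_class", "chg_hour", "hours"]

ASSIGNMENTS_2NF = [dict(zip(COLUMNS, values)) for values in [
    (18, 114, "Applications Designer", 48.10, 24.6),
    (18, 118, "General Support", 18.36, 45.3),
    (18, 104, "Systems Analyst", 96.75, 32.4),
    (18, 102, "DSS Analyst", 45.95, 44.0),
]]

# New table "Job Rates": job_class is its PK, chg_hour is pasted in.
job_rates = {}
for row in ASSIGNMENTS_2NF:
    job_rates[row["job_class"]] = row["chg_hour"]

# Old table: chg_hour is cut out, job_class stays behind as the link.
assignments_3nf = [
    {name: value for name, value in row.items() if name != "chg_hour"}
    for row in ASSIGNMENTS_2NF
]

# Join the two tables back together on job_class.
rejoined = [
    {**row, "chg_hour": job_rates[row["job_class"]]}
    for row in assignments_3nf
]

print(rejoined == ASSIGNMENTS_2NF)
```

Because job_class stays in Employee Assignments, every row can still find its charge per hour in Job Rates.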
