
Database Normalization

About You
How many of you:
- Currently use SQL?
- Currently use another RDBMS?
- Are responsible for database design?
- Will be in the future?
- Know about database normalization?

About This Session

- Introduction
- What Is Database Normalization?
- What are the Benefits of Database Normalization?
- What are the Normal Forms?
- First Normal Form
- Second Normal Form
- Forming Relationships
- Third Normal Form
- Joining Tables
- De-Normalization
- Conclusion

What Is Database Normalization?

- Cures the "Spreadsheet Syndrome" (i.e., null fields)
- Stores only the minimal amount of information
- Removes redundancies
- Restructures data

Database Normalization

Database normalization is the process of removing redundant data from your tables in order to improve storage efficiency, data integrity, and scalability. In the relational model, there are formal criteria for classifying how well a database design avoids redundancy. These classifications are called normal forms (NF), and there are algorithms for converting a given database between them. Normalization generally involves splitting existing tables into multiple ones, which must be re-joined or linked each time a query is issued.
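As a minimal illustration of the split-and-rejoin idea, the Python sketch below (plain lists and dicts, no database) takes flat book rows that repeat the publisher name, splits them into a book list and a publisher lookup, and re-joins them to confirm that nothing was lost. The tuple layout and the integer publisher IDs are assumptions made for illustration; the book data comes from the tables later in this session.

```python
# Flat rows: (ISBN, Title, Pages, Publisher). The publisher name repeats on every row.
flat_rows = [
    ("0072958863", "Database System Concepts", 1168, "McGraw-Hill"),
    ("0471694665", "Operating System Concepts", 944, "McGraw-Hill"),
]


def split(rows):
    """Split flat rows into a book list and a publisher lookup table."""
    publisher_ids = {}  # publisher name -> surrogate integer ID (an assumption)
    books = []
    for isbn, title, pages, publisher in rows:
        # The first time a name is seen it gets the next ID; later rows reuse it.
        pub_id = publisher_ids.setdefault(publisher, len(publisher_ids) + 1)
        books.append((isbn, title, pages, pub_id))
    # Invert the mapping so it can be looked up by ID, like a Publisher table.
    publishers = {pub_id: name for name, pub_id in publisher_ids.items()}
    return books, publishers


def rejoin(books, publishers):
    """Re-join the two tables, as every query would have to do."""
    return [(isbn, title, pages, publishers[pub_id])
            for isbn, title, pages, pub_id in books]


books, publishers = split(flat_rows)
print(publishers)  # {1: 'McGraw-Hill'}: the name is now stored once
assert rejoin(books, publishers) == flat_rows
```

The cost side of the trade-off is visible in `rejoin`: the publisher name is stored once, but every read has to look it up again.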

What are the Benefits of Database Normalization?

- Decreased storage requirements! Converting one VARCHAR(20) column to a TINYINT UNSIGNED column in a table of 1 million rows saves up to ~20 MB
- Faster search performance!
  - Smaller files for table scans
  - More directed searching
- Improved data integrity!
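The ~20 MB figure can be checked with quick arithmetic. The sketch below assumes MySQL's storage rules for these types: a VARCHAR(20) value in a single-byte character set takes up to 20 bytes plus a 1-byte length prefix, while TINYINT UNSIGNED always takes 1 byte. It also assumes every value fills all 20 characters, so the result is an upper bound, and it ignores the small lookup table the IDs point into.

```python
ROWS = 1_000_000
VARCHAR_MAX_BYTES = 20 + 1  # up to 20 one-byte characters + 1-byte length prefix
TINYINT_BYTES = 1           # TINYINT UNSIGNED holds 0..255 in one byte

saved_bytes = ROWS * (VARCHAR_MAX_BYTES - TINYINT_BYTES)
print(saved_bytes)                      # 20000000
print(round(saved_bytes / 1024**2, 1))  # 19.1 (MiB)
```

A TINYINT UNSIGNED key only works while the column has at most 256 distinct values; a wider integer type shrinks the savings accordingly.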

What are the Normal Forms?

- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Boyce-Codd Normal Form (BCNF)
- Fourth Normal Form (4NF)
- Fifth Normal Form (5NF)

Table 1
Title | Author1 | Author2 | ISBN | Subject | Pages | Publisher
Database System Concepts | Abraham Silberschatz | Henry F. Korth | 0072958863 | MySQL, Computers | 1168 | McGraw-Hill
Operating System Concepts | Abraham Silberschatz | Henry F. Korth | 0471694665 | Computers | 944 | McGraw-Hill

First Normal Form

In Table 1, we have two violations of First Normal Form. First, we have more than one author field. Second, our Subject field contains more than one piece of information. With more than one value in a single field, it would be very difficult to search for all books on a given subject.
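To see why a multi-valued field makes searching hard, the sketch below compares a naive substring search on the comma-separated Subject field (similar to SQL's `LIKE '%term%'`) with a search over atomic values. The data mirrors Table 1. The search term "SQL" is a hypothetical query chosen to expose the false match.

```python
# Subject field as stored in Table 1: several values packed into one string.
table1_subjects = {
    "0072958863": "MySQL, Computers",
    "0471694665": "Computers",
}


def naive_search(subject):
    """Substring match on the packed field."""
    return sorted(isbn for isbn, s in table1_subjects.items() if subject in s)


def atomic_search(subject):
    """Split the packed field into atomic values and compare whole values."""
    return sorted(
        isbn
        for isbn, s in table1_subjects.items()
        if subject in [part.strip() for part in s.split(",")]
    )


print(naive_search("SQL"))     # ['0072958863']: a false match on "MySQL"
print(atomic_search("SQL"))    # []
print(atomic_search("MySQL"))  # ['0072958863']
```

In a 1NF design the splitting is done once, in the schema, so every query can use a plain equality test instead of parsing strings.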

First Normal Form

- Remove horizontal redundancies
  - No two columns hold the same information
  - No single column holds more than a single item
- Each row must be unique
  - Use a primary key
- Benefits
  - Easier to query/sort the data
  - More scalable
  - Each row can be identified for updating
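The 1NF rules above can be sketched in Python. The repeating Author1/Author2 columns and the packed Subject field of Table 1 become one row per value, and each resulting row is identified by an (ISBN, value) pair that must be unique. The tuple layout is an assumption made for illustration.

```python
# Table 1 rows: (ISBN, Title, Author1, Author2, Subject, Pages, Publisher)
table1 = [
    ("0072958863", "Database System Concepts", "Abraham Silberschatz",
     "Henry F. Korth", "MySQL, Computers", 1168, "McGraw-Hill"),
    ("0471694665", "Operating System Concepts", "Abraham Silberschatz",
     "Henry F. Korth", "Computers", 944, "McGraw-Hill"),
]

book_authors = []   # one (ISBN, Author) row per author: no Author1/Author2
book_subjects = []  # one (ISBN, Subject) row per subject: one item per column
for isbn, _title, author1, author2, subjects, _pages, _publisher in table1:
    for author in (author1, author2):
        if author:  # skip an empty second-author slot
            book_authors.append((isbn, author))
    for subject in subjects.split(","):
        book_subjects.append((isbn, subject.strip()))

# "Each row must be unique": the (ISBN, value) pair acts as the primary key.
assert len(set(book_authors)) == len(book_authors)
assert len(set(book_subjects)) == len(book_subjects)
print(book_subjects)
```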


First Normal Form Table

Table 2
Title | Author | ISBN | Subject | Pages | Publisher
Database System Concepts | Abraham Silberschatz | 0072958863 | MySQL | 1168 | McGraw-Hill
Database System Concepts | Henry F. Korth | 0072958863 | Computers | 1168 | McGraw-Hill
Operating System Concepts | Henry F. Korth | 0471694665 | Computers | 944 | McGraw-Hill
Operating System Concepts | Abraham Silberschatz | 0471694665 | Computers | 944 | McGraw-Hill


We now have two rows for a single book. Additionally, we would be violating Second Normal Form. A better solution to our problem would be to separate the data into separate tables, an Author table and a Subject table, to store our information, removing that information from the Book table:

Subject Table
Subject_ID | Subject
1 | MySQL
2 | Computers

Author Table
Author_ID | Last Name | First Name
1 | Silberschatz | Abraham
2 | Korth | Henry

Book Table
ISBN | Title | Pages | Publisher
0072958863 | Database System Concepts | 1168 | McGraw-Hill
0471694665 | Operating System Concepts | 944 | McGraw-Hill
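The three tables above can be sketched with Python's built-in `sqlite3` module. An in-memory SQLite database stands in for MySQL here, so the column types are SQLite's and are assumptions. "Last Name" is written `Last_Name` because an unquoted SQL identifier cannot contain a space. ISBN is stored as TEXT so the leading zero in 0072958863 survives, which an integer column would drop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Subject (
    Subject_ID INTEGER PRIMARY KEY,
    Subject    TEXT NOT NULL
);
CREATE TABLE Author (
    Author_ID  INTEGER PRIMARY KEY,
    Last_Name  TEXT NOT NULL,
    First_Name TEXT NOT NULL
);
CREATE TABLE Book (
    ISBN      TEXT PRIMARY KEY,  -- TEXT keeps the leading zero
    Title     TEXT NOT NULL,
    Pages     INTEGER,
    Publisher TEXT
);
""")
conn.executemany("INSERT INTO Subject VALUES (?, ?)",
                 [(1, "MySQL"), (2, "Computers")])
conn.executemany("INSERT INTO Author VALUES (?, ?, ?)",
                 [(1, "Silberschatz", "Abraham"), (2, "Korth", "Henry")])
conn.executemany("INSERT INTO Book VALUES (?, ?, ?, ?)", [
    ("0072958863", "Database System Concepts", 1168, "McGraw-Hill"),
    ("0471694665", "Operating System Concepts", 944, "McGraw-Hill"),
])

for row in conn.execute("SELECT ISBN, Title FROM Book ORDER BY ISBN"):
    print(row)
```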


Each table has a primary key, used for joining tables together when querying the data. A primary key value must be unique within the table (no two books can have the same ISBN), and a primary key is also an index, which speeds up data retrieval based on the primary key. Now we can define the relationships between the tables.
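Both properties of a primary key can be observed in a small `sqlite3` sketch, with SQLite again standing in for MySQL. A second row with an existing ISBN is rejected with `sqlite3.IntegrityError`. The index that backs a non-integer primary key shows up in `PRAGMA index_list`. The automatic index name (`sqlite_autoindex_Book_1`) is an SQLite-specific detail, not something MySQL uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (ISBN TEXT PRIMARY KEY, Title TEXT NOT NULL)")
conn.execute("INSERT INTO Book VALUES ('0072958863', 'Database System Concepts')")

# A second row with the same ISBN violates the primary key.
try:
    conn.execute("INSERT INTO Book VALUES ('0072958863', 'Another Title')")
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("rejected:", exc)

# SQLite backs a non-INTEGER primary key with an automatic unique index.
# Each PRAGMA index_list row is (seq, name, unique, origin, partial).
index_names = [row[1] for row in conn.execute("PRAGMA index_list('Book')")]
print(index_names)
```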


Forming Relationships

- Three forms
  - One to (zero or) one
  - One to (zero or) many
  - Many to many
- One to one: same table?
- One to many: place the PK of the "one" in the "many"
- Many to many: create a joining table
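The one-to-many and many-to-many rules translate directly into table definitions. In this `sqlite3` sketch, Book carries Publisher_ID (the PK of the "one" placed in the "many"), and Book_Author is the joining table with a composite primary key. The column set is trimmed for brevity and is an assumption. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, which is why a link to a nonexistent author is rejected here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Publisher (
    Publisher_ID INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL
);
-- One to many: the PK of the "one" (Publisher) is placed in the "many" (Book).
CREATE TABLE Book (
    ISBN         TEXT PRIMARY KEY,
    Title        TEXT NOT NULL,
    Publisher_ID INTEGER REFERENCES Publisher(Publisher_ID)
);
CREATE TABLE Author (
    Author_ID INTEGER PRIMARY KEY,
    Last_Name TEXT NOT NULL
);
-- Many to many: a joining table holds one row per (book, author) pair.
CREATE TABLE Book_Author (
    ISBN      TEXT    REFERENCES Book(ISBN),
    Author_ID INTEGER REFERENCES Author(Author_ID),
    PRIMARY KEY (ISBN, Author_ID)
);
""")
conn.execute("INSERT INTO Publisher VALUES (1, 'McGraw-Hill')")
conn.execute("INSERT INTO Book VALUES ('0072958863', 'Database System Concepts', 1)")
conn.execute("INSERT INTO Author VALUES (1, 'Silberschatz')")
conn.execute("INSERT INTO Book_Author VALUES ('0072958863', 1)")

# A link to an author that does not exist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO Book_Author VALUES ('0072958863', 99)")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True
print(dangling_rejected)  # True
```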


Relationships

Book_Author Table
ISBN | Author_ID
0072958863 | 1
0072958863 | 2
0471694665 | 1
0471694665 | 2

Book_Subject Table
ISBN | Subject_ID
0072958863 | 1
0072958863 | 2
0471694665 | 2
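With the Book_Author and Book_Subject joining tables in place, the 1NF search problem reduces to joins with an equality test. This `sqlite3` sketch loads the rows from the tables above and runs two hypothetical queries: all books on the subject "Computers", and each book's authors. The table definitions are trimmed to the columns these queries need.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Book    (ISBN TEXT PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE Author  (Author_ID INTEGER PRIMARY KEY, Last_Name TEXT NOT NULL);
CREATE TABLE Subject (Subject_ID INTEGER PRIMARY KEY, Subject TEXT NOT NULL);
-- Joining tables: one row per (book, author) and per (book, subject) pair.
CREATE TABLE Book_Author  (ISBN TEXT, Author_ID INTEGER,
                           PRIMARY KEY (ISBN, Author_ID));
CREATE TABLE Book_Subject (ISBN TEXT, Subject_ID INTEGER,
                           PRIMARY KEY (ISBN, Subject_ID));
INSERT INTO Book VALUES ('0072958863', 'Database System Concepts'),
                        ('0471694665', 'Operating System Concepts');
INSERT INTO Author  VALUES (1, 'Silberschatz'), (2, 'Korth');
INSERT INTO Subject VALUES (1, 'MySQL'), (2, 'Computers');
INSERT INTO Book_Author  VALUES ('0072958863', 1), ('0072958863', 2),
                                ('0471694665', 1), ('0471694665', 2);
INSERT INTO Book_Subject VALUES ('0072958863', 1), ('0072958863', 2),
                                ('0471694665', 2);
""")

# All books on a given subject: an equality test, no string matching needed.
computer_books = [title for (title,) in conn.execute("""
    SELECT b.Title
    FROM Book AS b
    JOIN Book_Subject AS bs ON bs.ISBN = b.ISBN
    JOIN Subject AS s ON s.Subject_ID = bs.Subject_ID
    WHERE s.Subject = ?
    ORDER BY b.Title
""", ("Computers",))]
print(computer_books)

# Each book's authors, gathered through Book_Author.
authors_by_title = defaultdict(list)
for title, last_name in conn.execute("""
    SELECT b.Title, a.Last_Name
    FROM Book AS b
    JOIN Book_Author AS ba ON ba.ISBN = b.ISBN
    JOIN Author AS a ON a.Author_ID = ba.Author_ID
    ORDER BY b.Title, a.Author_ID
"""):
    authors_by_title[title].append(last_name)
print(dict(authors_by_title))
```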


Second Normal Form

- Table must be in First Normal Form
- Remove vertical redundancy
  - The same value should not repeat across rows
- Composite keys
  - All columns in a row must refer to BOTH parts of the key
- Benefits
  - Increased storage efficiency
  - Less data repetition
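The composite-key rule can be checked mechanically. The simplified Python sketch below flags any non-key attribute that depends on only a proper part of a composite key. It inspects only the functional dependencies you list and does not derive implied ones. The table it is applied to, keyed by (ISBN, Author_ID) and also storing Title and Last_Name, is hypothetical.

```python
from itertools import combinations


def partial_dependencies(key, fds):
    """Find non-key attributes that depend on only part of a composite key.

    key: iterable of key attribute names.
    fds: list of (lhs, rhs) pairs of attribute-name sets, e.g.
         ({"ISBN"}, {"Title"}) for ISBN -> Title.
    Only the dependencies listed are inspected; implied ones are not derived.
    """
    key = frozenset(key)
    # Every non-empty proper subset of the key, e.g. {ISBN} and {Author_ID}.
    proper_parts = {
        frozenset(part)
        for size in range(1, len(key))
        for part in combinations(sorted(key), size)
    }
    found = []
    for lhs, rhs in fds:
        if frozenset(lhs) in proper_parts:
            for attr in sorted(set(rhs) - key):
                found.append((tuple(sorted(lhs)), attr))
    return found


# Hypothetical table keyed by (ISBN, Author_ID) that also stores Title and
# Last_Name: each of those depends on only one part of the key.
key = {"ISBN", "Author_ID"}
fds = [
    ({"ISBN"}, {"Title"}),
    ({"Author_ID"}, {"Last_Name"}),
]
print(partial_dependencies(key, fds))
# [(('ISBN',), 'Title'), (('Author_ID',), 'Last_Name')]
```

Both findings point back to the normalized design: Title belongs in the Book table and Last_Name in the Author table, leaving Book_Author with nothing but its key.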


2NF Table

Publisher Table
Publisher_ID | Publisher Name
1 | McGraw-Hill

Book Table
ISBN | Title | Pages | Publisher_ID
0072958863 | Database System Concepts | 1168 | 1
0471694665 | Operating System Concepts | 944 | 1


2NF

Here we have a one-to-many relationship between the Book table and the Publisher table. A book has only one publisher, and a publisher will publish many books. When we have a one-to-many relationship, we place a foreign key in the Book table, pointing to the primary key of the Publisher table. The other requirement for Second Normal Form is that a table with a composite key cannot contain any data that does not relate to all portions of the composite key.
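The payoff of moving the publisher into its own table can be sketched with `sqlite3`. The Book table stores only Publisher_ID as a foreign key, so renaming the publisher changes exactly one row, and every book picks up the new name through the join. The new name "McGraw-Hill Education" is a hypothetical value used only to show the update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce Book.Publisher_ID
conn.executescript("""
CREATE TABLE Publisher (Publisher_ID   INTEGER PRIMARY KEY,
                        Publisher_Name TEXT NOT NULL);
CREATE TABLE Book (ISBN  TEXT PRIMARY KEY,
                   Title TEXT NOT NULL,
                   Pages INTEGER,
                   Publisher_ID INTEGER NOT NULL
                       REFERENCES Publisher(Publisher_ID));
INSERT INTO Publisher VALUES (1, 'McGraw-Hill');
INSERT INTO Book VALUES ('0072958863', 'Database System Concepts', 1168, 1),
                        ('0471694665', 'Operating System Concepts', 944, 1);
""")


def publisher_of_each_book():
    """Follow the foreign key from each book to its publisher."""
    return conn.execute("""
        SELECT b.Title, p.Publisher_Name
        FROM Book AS b JOIN Publisher AS p ON p.Publisher_ID = b.Publisher_ID
        ORDER BY b.ISBN
    """).fetchall()


# Renaming the publisher (hypothetical new name) touches exactly one row...
cur = conn.execute(
    "UPDATE Publisher SET Publisher_Name = 'McGraw-Hill Education' "
    "WHERE Publisher_ID = 1")
print(cur.rowcount)  # 1
# ...yet every book sees the new name through the join.
print(publisher_of_each_book())
```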


Third Normal Form

- Table must be in Second Normal Form
- If your table is 2NF, there is a good chance it is 3NF
- All columns must relate directly to the primary key
- Benefits: no extraneous data

Third normal form (3NF) requires that there are no functional dependencies of non-key attributes on something other than a candidate key. A table is in 3NF if all of its non-primary-key attributes are mutually independent; there should be no transitive dependencies.
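The 3NF rule can be checked with a small Python sketch. It computes attribute closures, finds candidate keys by brute force (practical only for a handful of columns), and flags every dependency X -> A where X is not a superkey and A is not part of any candidate key. The example schema is hypothetical: a Book table that still carries Publisher_Name, reproducing the transitive chain ISBN -> Publisher_ID -> Publisher_Name that a separate Publisher table removes. The sketch tests only the dependencies you list.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            combo = set(combo)
            if closure(combo, fds) == schema and not any(k <= combo for k in keys):
                keys.append(combo)
    return keys


def violations_3nf(schema, fds):
    """FDs X -> A that break 3NF: non-trivial, X not a superkey, A not prime."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == schema:
            continue  # X is a superkey: allowed
        for attr in sorted(rhs - lhs):
            if attr not in prime:
                bad.append((tuple(sorted(lhs)), attr))
    return bad


schema = {"ISBN", "Title", "Publisher_ID", "Publisher_Name"}
fds = [
    ({"ISBN"}, {"Title", "Publisher_ID"}),
    ({"Publisher_ID"}, {"Publisher_Name"}),  # the transitive step
]
print([sorted(k) for k in candidate_keys(schema, fds)])  # [['ISBN']]
print(violations_3nf(schema, fds))  # [(('Publisher_ID',), 'Publisher_Name')]
```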


Boyce-Codd Normal Form

BCNF requires that the table be in 3NF and that the only determinants are candidate keys. A determinant is a column (or set of columns) on which some of the other columns are fully functionally dependent. BCNF is a more rigorous version of 3NF, designed to deal with relational tables that have:
1. Multiple candidate keys
2. Composite candidate keys
3. Candidate keys that overlap
In BCNF, it may not be possible to preserve all functional dependencies: a decomposition into BCNF tables is not always dependency-preserving.

Example

R = {J, K, L}, F = {JK -> L, L -> K}

J | K | L
J1 | K1 | L1
J2 | K1 | L1
J3 | K1 | L1
null | K2 | L2
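The example can be checked mechanically with a Python sketch that uses attribute closure plus a brute-force candidate-key search (practical only for small schemas). It finds two overlapping candidate keys, JK and JL. The dependency L -> K violates BCNF because L is not a superkey, yet R is still in 3NF because K belongs to a candidate key. This explains why the table repeats K1 for every row with L1 and needs a null J to record the pair (L2, K2). Splitting R into (L, K) and (J, L) would remove that redundancy, but JK -> L could then no longer be checked within a single table, which is the dependency-preservation problem noted for BCNF.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            combo = set(combo)
            if closure(combo, fds) == schema and not any(k <= combo for k in keys):
                keys.append(combo)
    return keys


def bcnf_violations(schema, fds):
    """Non-trivial FDs whose left side is not a superkey."""
    return [(tuple(sorted(lhs)), tuple(sorted(rhs - lhs)))
            for lhs, rhs in fds
            if rhs - lhs and closure(lhs, fds) != schema]


def third_nf_violations(schema, fds):
    """Like BCNF, but an FD is allowed if every attribute it adds is prime."""
    prime = set().union(*candidate_keys(schema, fds))
    return [(lhs_t, rhs_t) for lhs_t, rhs_t in bcnf_violations(schema, fds)
            if not set(rhs_t) <= prime]


R = {"J", "K", "L"}
F = [({"J", "K"}, {"L"}), ({"L"}, {"K"})]

print([sorted(k) for k in candidate_keys(R, F)])  # [['J', 'K'], ['J', 'L']]
print(bcnf_violations(R, F))      # [(('L',), ('K',))]
print(third_nf_violations(R, F))  # []
```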


4NF

A relation is in 4NF if it is in BCNF and all of its nontrivial multivalued dependencies are also functional dependencies. A multivalued dependency is written R.A ->> R.B.

Student_id | Project_id
705 | 1
705 | 5

Student_id | Skill
705 | Analysis
705 | Design
705 | Program
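To see what the two tables above avoid, the Python sketch below builds a hypothetical combined table (Student_id, Project_id, Skill). Projects and skills are independent of each other, so every project must be listed with every skill. The sketch then tests the multivalued dependency Student_id ->> Project_id against the definition: for any two rows that agree on Student_id, the row taking Project_id from one and Skill from the other must also exist. The MVD holds, but Student_id -> Project_id is not a functional dependency (705 maps to both 1 and 5), so the combined table is not in 4NF.

```python
from itertools import product


def mvd_holds(rows, attrs, x, y):
    """Test whether the multivalued dependency X ->> Y holds in `rows`.

    rows: list of tuples; attrs: column names in tuple order; x, y: sets of names.
    For every pair of rows that agree on X, the row taking its Y values from the
    first and its remaining values from the second must also be present.
    """
    idx = {a: i for i, a in enumerate(attrs)}
    present = set(rows)
    for t1, t2 in product(rows, repeat=2):
        if all(t1[idx[a]] == t2[idx[a]] for a in x):
            t3 = tuple(t1[idx[a]] if a in x or a in y else t2[idx[a]]
                       for a in attrs)
            if t3 not in present:
                return False
    return True


attrs = ("Student_id", "Project_id", "Skill")
projects = [(705, 1), (705, 5)]                                  # first table
skills = [(705, "Analysis"), (705, "Design"), (705, "Program")]  # second table

# Hypothetical combined table: every project paired with every skill (2 x 3 rows).
combined = [(s, p, k) for s, p in projects for s2, k in skills if s == s2]

print(len(combined))                                                    # 6
print(mvd_holds(combined, attrs, {"Student_id"}, {"Project_id"}))       # True
print(mvd_holds(combined[:-1], attrs, {"Student_id"}, {"Project_id"}))  # False
```

Dropping a single row breaks the dependency. This is the update burden that splitting the data into the two smaller tables removes.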

De-Normalizing Tables
- Use with caution
- Normalize first, then de-normalize
- Use only when you cannot optimize
- Try temp tables, UNIONs, VIEWs, and subselects first
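Before de-normalizing, a VIEW can often give queries the flat shape they want without storing anything twice. This `sqlite3` sketch defines a hypothetical view, `Book_Listing`, over the normalized Book and Publisher tables. Reading from it looks like reading a single wide table, while the join still runs on each read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Publisher (Publisher_ID INTEGER PRIMARY KEY, Publisher_Name TEXT);
CREATE TABLE Book (ISBN TEXT PRIMARY KEY, Title TEXT, Pages INTEGER,
                   Publisher_ID INTEGER REFERENCES Publisher(Publisher_ID));
INSERT INTO Publisher VALUES (1, 'McGraw-Hill');
INSERT INTO Book VALUES ('0072958863', 'Database System Concepts', 1168, 1),
                        ('0471694665', 'Operating System Concepts', 944, 1);

-- A VIEW presents a flat, de-normalized shape without storing the publisher
-- name twice: the join runs each time the view is read.
CREATE VIEW Book_Listing AS
    SELECT b.ISBN, b.Title, b.Pages, p.Publisher_Name
    FROM Book AS b JOIN Publisher AS p ON p.Publisher_ID = b.Publisher_ID;
""")

for row in conn.execute("SELECT * FROM Book_Listing ORDER BY ISBN"):
    print(row)
```

Only if the view's join proves too slow after optimization would copying Publisher_Name into Book be worth its update cost.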


Conclusion


QUESTIONS?

