Sei sulla pagina 1di 9

What is Normalization ? Why should we use it?

Normalization is a database design technique which organizes tables in a manner that reduces redundancy and dependency of data. It divides larger tables to smaller tables and link them using relationships. The inventor of the relational model Edgar Codd proposed the theory of normalization with the introduction of FirstNormal Form and he continued to extend theory with Second and Third Normal Form. Later he joined with Raymond F. Boyce to develop the theory of Boyce-Codd Normal Form. Theory of Normalization is still being developed further. For example there are discussions even on 6th Normal Form. But in most practical applications normalization achieves its best in 3rd Normal Form. The evolution of Normalization theories is illustrated below-

Lets learn Normalization with practical example Assume a video library maintains a database of movies rented out. Without any normalization all information is stored in one table as shown below. Table 1 : Here you see Movies Rented column has multiple values.

1NF Rules :

A table is in 1NF form if and only if all records in the table is Unique and must be Single Valued. Each table cell should contain single value. Each record needs to be unique.

The above table in 1NF-

Before we proceed lets understand a few things --

What is a KEY ?
A KEY is a value used to uniquely identify a record in a table. A KEY could be a single column or combination of multiple columns Note: Columns in a table that are NOT used to uniquely identify a record are called non-key columns.

What is a primary Key?


A primary is a single column values used to uniquely identify a database record. It has following attributes

A primary key cannot be NULL A primary key value must be unique The primary key values can not be changed The primary key must be given a value when a new record is inserted.

What is a composite Key?


A composite key is a primary key composed of multiple columns used to identify a record uniquely. In our database , we have two people with the same name Robert Phil but they live at different places.

Hence we require both Full Name and Address to uniquely identify a record. This is a composite key.

2NF Rules : A 1NF table is in 2NF form if and only if


Functionally Dependent on the Candidate Key.

every Non-Key Attributes are

Rule 1- Be in 1NF Rule 2- Has no Partial Dependancy

It is clear that we cant move forward to make our simple database in 2nd Normalization form unless we partition the table above.

Table 1

Table 2 We have divided our 1NF table into two tables viz. Table 1 and Table2. Table 1 contains member information. Table 2 contains information on movies rented. We have introduced a new column called Membership_id which is the primary key for table 1. Records can be uniquely identified in Table 1 using membership id

Introducing Foreign Key!


Foreign Key references primary key of another Table!It helps connect your Tables

A foreign key can have a different name from its primary key It ensures rows in one table have corresponding rows in another Unlike Primary key they do not have to be unique. Most often they arent Foreign keys can be null even though primary keys can not

Why do you need a foreign key?


Suppose an idiot inserts a record in Table 2 such as. You will only be able to insert values into your foreign key that exist in the unique key in the parent table. This helps in referential integrity.

The above problem can be overcome by declaring membership id from Table2 as foreign key of membership id from Table1 Now , if somebody tries to insert a value in the membership id field that does not exist in the parent table , an error will be shown!

What is a transitive functional dependency?


A transitive functional dependency is when changing a non-key column , might cause any of the other non-key columns to change Consider the table 1. Changing the non-key column Full Name , may change Salutation.

Lets move ito 3NF

3NF Rules: A 2NF table is in 3NF form if and only if every Non-Key Attribute in a table is
Fully Functionally Dependent on Key Attribute.

Rule 1- Be in 2NF Rule 2- Has no transitive functional dependencies

To move our 2NF table into 3NF we again need to need divide our table.
Table 1

Table 2

Table 3

We have again divided our tables and created a new table which stores Salutations. There are no transitive functional dependencies and hence our table is in 3NF In Table 3 Salutation ID is primary key and in Table 1 Salutation ID is foreign to primary key in Table 3

NORMAL FORMS
First Normal Form (1NF) Rules for an organized database:

Eliminate duplicative columns from the same table. Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Second Normal Form (2NF) Rules for removing duplicative data:


Meet all the requirements of the first normal form. Remove subsets of data that apply to multiple rows of a table and place them in separate tables. Create relationships b/w these new tables and their predecessors through the use of foreign keys.

Third Normal Form (3NF) Goes one large step further:


Meet all the requirements of the second normal form. Remove columns that are not dependent upon the primary key.

BCNF (3.5 NF) Adds one more requirement:


Meet all the requirements of the third normal form. Every determinant must be a candidate key.

4th Normal Form (4NF) Has one additional requirement:


Meet all the requirements of the third normal form. A relation is in 4NF if it has no multi-valued dependencies.

5th Normal Form (5NF)


A table is in 5th Normal Form only if it is in 4NF and it cannot be decomposed in to any number of smaller tables without loss of data.

6th Normal Form(6NF)


6th Normal Form is not standardized yet however it is being discussed by database experts for some time. Hopefully we would have clear standardized definition for 6th Normal Form in near future.

Thats all to Normalization!!!

Codd's Rules!!
CODDs 12 rules define an ideal relational database which is used as a guideline for designing relational database systems today. Though no commercial database system completely conform to all 12 rules, they do interpret the relational approach. Here are the CODDs 12 rules:

Rule 0: Foundation Rule: The system must qualify as relational both as a database and as a management system. Rule 1: The Information Rule: All information in the database must be represented in one and only one way Rule 2: The Guaranteed Access Rule: All data should be logically accessible through a combination of table name, primary key value and column name. Rule 3: Systematic Treatment of NULL Values: A DBMS must support Null Values to represent missing information and inapplicable information independent of data types. Rule 4: Active Online Catalog Based on the Relational Model: The database must support online relational catalog that is accessible to authorized users through their regular query language. Rule 5: The Comprehensive Data Sublanguage Rule: The database must support at least one language that defines & supports data definition and manipulation operations, data integrity and database transaction control. Rule 6: The View Updating Rule: Representation of data can be done using different logical combinations called Views. All the views that are theoretically updatable must also be updatable by

the system.

Rule 7: High-level Insert, Update, and Delete: The system must support set at a time insert, update and delete operators. Rule 8: Physical Data Independence: Changes made in physical level must not impact and require a change to be made in the application program. Rule 9: Logical Data Independence: Changes made in logical level must not impact and require a change to be made in the application program. Rule 10: Integrity Independence: Integrity constraints must be defined and separated from the application programs. Changing Constraints must be allowed without affecting the applications. Rule 11: Distribution Independence: The user should be unaware about the database location i.e. whether or not the database is distributed in multiple locations. Rule 12: The No-subversion Rule: If a system provides a low level language, then there should be no way to subvert or bypass the integrity rules of high-level language.

Of all the rules, rule 3 is the most controversial. This is due to a debate about three-valued or ternary, logic. Codd's rules and SQL use ternary logic, where null is used to represent missing data and comparing anything to null results in an unknown truth state. However, when both booleans or operands are false, the operation is false; therefore, not all data that is missing is unknown, hence the controversy.

Potrebbero piacerti anche