Introduction
The Entity-Relationship model (or ER model) is a way of graphically representing the
logical relationships among objects in order to design a database. Creating an ER diagram is
the first step in designing a database. It helps the designers understand and specify
the desired components of the database and the relationships among those components. An
ER model is a graphical representation that contains entities (or "items"), relationships among
the entities, and attributes of both the entities and the relationships.
Let us take a University database as an example and see how its ER model is arrived at.
Example:
A university consists of a number of departments. Each department offers several courses. Each
course includes a number of modules. Students enroll in a particular course and study modules
towards the completion of that course. Each module is taught by a lecturer from the appropriate
department, and each lecturer teaches a group of students.
Entities
Entities are real world items or concepts that exist on their own and are represented as objects
or things of interest. An entity type is a collection of entities that share a common definition.
Identify all the nouns in our university example:
A university consists of a number of departments. Each department offers several courses. Each
course includes a number of modules. Students enroll in a particular course and study modules
towards the completion of that course. Each module is taught by a lecturer from the appropriate
department, and each lecturer teaches a group of students.
This scenario consists of students, lecturers, modules, courses and departments. Physical things
(those that exist in the world and that we can touch or feel), such as students and lecturers,
and abstract things (ideas or concepts that we cannot physically touch, smell, hear, taste or
see), such as modules and departments, each make up an entity type. If we take Student as an
entity type, then each student in the university is an entity. Entities appear as nouns in the
description because they are objects or things.
An entity type is only an idea, whereas its entities are the individual instances. Student is an
entity type for physical things, while Scott, Nancy, Lindsey and Mackenzie are entities
(individual students we could actually meet). Department is an entity type for abstract things,
while IT, CSE, ECE and CIVIL are entities.
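The distinction between an entity type and its entities can be sketched in Python, where a class
plays the role of the entity type and each instance is an entity. This is only an illustration:
the single name attribute is an assumption made for brevity, and the class names are chosen to
mirror the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    """Entity type for a physical thing (assumed to carry only a name here)."""
    name: str


@dataclass(frozen=True)
class Department:
    """Entity type for an abstract thing (assumed to carry only a name here)."""
    name: str


# Entities: the individual instances mentioned in the text.
students = [Student(n) for n in ("Scott", "Nancy", "Lindsey", "Mackenzie")]
departments = [Department(n) for n in ("IT", "CSE", "ECE", "CIVIL")]

print(len(students), "Student entities,", len(departments), "Department entities")
```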
Entity Diagrams
In an ER diagram, an entity type is drawn as a box labeled with the name of the entity type.
The entities identified in our example are shown in Figure 2.1.
If an entity depends on another existing entity, it is considered weak. A weak entity
cannot be identified by its own attributes alone. A weak entity is represented by a double
rectangle in an ER diagram.
Example:
SubModule is a good example of a weak entity. A SubModule is meaningless without a
Module entity, so it depends on the existence of Module, as shown in Figure 2.2.
Attributes
Attributes represent properties, facts, aspects or details of an entity. Each entity is
described by its particular attributes. In our University database, each student has a
Student ID, Name, Course taken and so on. Similarly, each lecturer has his or her own ID, Name,
Department and so on. An attribute has a name and an associated entity, and it describes a
property of that entity. Like entities, attributes are often nouns.
Attributes in ER diagram
The figure below represents the entities and their corresponding attributes in the University
database.
A single-valued attribute holds a single value. In our example, attributes of Student such as
Roll number, Age, Date of Birth and City can each have only one value.
A multivalued attribute can have more than one value attached to it. For instance, if phone
number and graduating degree are attributes of an entity called Person, those attributes could
have multiple values, because a person can have several phone numbers or hold several degrees.
In our example, a Student can have multiple phone numbers, so Phone number is a multivalued
attribute. A multivalued attribute is represented by a double oval in an ER diagram.
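As a rough illustration of the difference, the sketch below models a Student whose Roll number
and City hold one value each, while Phone number holds a list of values. The field names and the
sample values are placeholders invented for this example, not data from the University database.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Student:
    roll_number: int  # single-valued attribute
    city: str         # single-valued attribute
    phone_numbers: List[str] = field(default_factory=list)  # multivalued attribute


# Placeholder values, for illustration only.
student = Student(roll_number=1, city="Chennai")
student.phone_numbers.append("555-0100")
student.phone_numbers.append("555-0101")

print(student.roll_number, len(student.phone_numbers))
```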
Relationships
The association between two or more entities is called a relationship. In our University
database, each Student studies several Modules and each Lecturer teaches several Students. Here
the entity types Student and Module, and Lecturer and Student, have a relationship. Verbs most
often describe relationships between entities. Identify the verbs (relationships) in our
University database example:
A university consists of a number of departments. Each department offers several courses. Each
course includes a number of modules. Students enroll in a particular course and study modules
towards the completion of that course. Each module is taught by a lecturer from the appropriate
department, and each lecturer teaches a group of students.
Each relationship has a name, a set of entities that participate in it, a degree and a
cardinality ratio. The degree is the number of entity types that participate in the
relationship. Most relationships have degree 2. For example, in Figure 2.3 each Lecturer teaches
several Students; two entity types (Lecturer and Student) take part, so the relationship has
degree 2.
Relationships in an ER diagram
The name of the relationship is written in a diamond-shaped box (for example, Belongs to, as
shown in Figure 5.1).
Cardinality Ratio
Each student belongs to one University. We can illustrate this ratio by writing ones on
the lines indicating the relationship as shown in Figure 2.5.
A lecturer teaches many students, and this One to Many relationship is illustrated in
figure 2.7.
Each student takes many modules, and each module is taken by many students as shown in
So far we have seen how to identify the basic elements of an ER diagram. To summarize, to build
an ER model you need to identify:
Entities
Attributes
Relationships
Cardinality ratios
Now let's see what an ER model looks like when all these elements are put together. The final
ER model of our University database is shown in Figure 2.10. The figure shows the entities and
the relationships between them, which together make up the complete ER model of a
University. Here Department, Course, Module, Lecturer and Student are the entities.
The relationships in Figure 2.10 are defined as follows:
A Department Offers many Courses (one (1) to many (n)).
A Department Assigns many Lecturers (one (1) to many (n)).
Each Lecturer Teaches many Students (one (1) to many (n)).
Every Student Takes several Modules (many (n) to many (n)).
Courses Include Modules (many (n) to many (n)).
Many Students Enroll in a Course (one (1) to many (n)).
The complete ER Model for our University database will be as shown in the diagram below. It is
an Integrated ER model containing the Entities and Relationships for a University database.
Figure 2.10 : University ER Model
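The elements of the model (a name, the participating entities, a degree and a cardinality ratio
for each relationship) can also be written down as plain data. The sketch below is one possible
encoding; the Relationship type, the "1:n" / "n:n" notation and the relationship names Takes,
Includes and Enrolls are choices made for this illustration.

```python
from typing import NamedTuple, Tuple


class Relationship(NamedTuple):
    name: str
    entities: Tuple[str, ...]  # entity types taking part in the relationship
    cardinality: str           # "1:n" = one to many, "n:n" = many to many

    @property
    def degree(self) -> int:
        # The degree is the number of participating entity types.
        return len(self.entities)


relationships = [
    Relationship("Offers", ("Department", "Course"), "1:n"),
    Relationship("Assigns", ("Department", "Lecturer"), "1:n"),
    Relationship("Teaches", ("Lecturer", "Student"), "1:n"),
    Relationship("Takes", ("Student", "Module"), "n:n"),
    Relationship("Includes", ("Course", "Module"), "n:n"),
    Relationship("Enrolls", ("Course", "Student"), "1:n"),
]

for r in relationships:
    print(f"{r.name}: {' - '.join(r.entities)} ({r.cardinality}), degree {r.degree}")
```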
Summary
The database design technique that organizes tables in a manner that reduces
redundancy and dependency of data is called normalization. It is the systematic process of
decomposing complex tables (relations) into smaller, more manageable tables. Normalization
helps ensure that data can be retrieved from the database accurately. Without normalization, a
database can hold redundant and inconsistent data, be slow and inefficient, and might not
produce the data that is expected. Listed below are the advantages of normalization.
Advantages
Edgar Codd invented the relational model and proposed the theory of normalization with the
introduction of First Normal Form. He went on to extend the theory with Second and Third
Normal Forms. Later, Codd worked with Raymond F. Boyce to develop Boyce-Codd Normal Form
(BCNF).
The theory of normalization is still developing. For example, discussions on Sixth Normal Form
are in progress. However, in most practical applications, normalizing up to Third Normal Form
is sufficient. The evolution of normalization theory is illustrated below:
What is a KEY ?
A KEY is a value used to uniquely identify a row in a table. It could be a single column or a
combination of multiple columns.
Note: The columns in a table that are NOT used to uniquely identify a record or row in a
table are called non-key columns.
A primary key is a single column value that is used to uniquely identify a database record.
Example:
The table below contains the details of students. Here studentId is Primary Key which is used to
uniquely identify the details of a student from the table.
Figure 2.14 : Primary Key Illustration
Composite Key
If two or more columns are used to uniquely identify a record, the combination of those
columns constitutes a composite key. In the Student table given below, we have
StudentId, TestId and Mark. One student can take multiple tests and one test can be taken
by multiple students. To uniquely identify the mark of a student in a test, we therefore
need both StudentId and TestId. Together they form a composite key.
Student Table
Table 2.1
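A composite key can be tried out with Python's built-in sqlite3 module. The sketch below declares
StudentId and TestId together as the primary key and shows that a second mark for the same
student and test is rejected. The inserted marks are made-up sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Student (
        StudentId INTEGER NOT NULL,
        TestId    INTEGER NOT NULL,
        Mark      INTEGER,
        PRIMARY KEY (StudentId, TestId)  -- composite key
    )
    """
)

# Sample values, for illustration only.
conn.execute("INSERT INTO Student VALUES (1, 1, 80)")
conn.execute("INSERT INTO Student VALUES (1, 2, 75)")  # same student, another test
conn.execute("INSERT INTO Student VALUES (2, 1, 90)")  # same test, another student

try:
    # Same StudentId and TestId as the first row: violates the composite key.
    conn.execute("INSERT INTO Student VALUES (1, 1, 60)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
print(duplicate_rejected, row_count)
```

Neither StudentId nor TestId is unique on its own in this table; only the pair is.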
Functional Dependency
In simple terms, functional dependency can be explained as follows. If knowing the value of one
attribute lets you determine the value of another attribute, the second attribute is said to be
functionally dependent on the first. In the Student table given below, we can get the attribute
'Name' if we know the attribute 'StudentId', so Name is functionally dependent on StudentId.
Here StudentId is the determinant and Name is the dependent.
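Whether a functional dependency holds in a given set of rows can be checked mechanically: every
value of the determinant must always appear with the same value of the dependent. The helper
below is a minimal sketch; the function name holds and the sample rows are assumptions made for
this example.

```python
def holds(rows, determinant, dependent):
    """Return True if each determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        key = row[determinant]
        value = row[dependent]
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Sample rows, for illustration only.
rows = [
    {"StudentId": 101, "Name": "Scott"},
    {"StudentId": 102, "Name": "Nancy"},
    {"StudentId": 101, "Name": "Scott"},
]
fd_holds = holds(rows, "StudentId", "Name")

# A second name for StudentId 102 breaks the dependency.
conflicting = rows + [{"StudentId": 102, "Name": "Lindsey"}]
fd_broken = holds(conflicting, "StudentId", "Name")

print(fd_holds, fd_broken)
```

A check like this can only show that the data at hand does not contradict a dependency; whether
the dependency is meant to hold is a design decision.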
For example, let's consider the Student table given below. Table 2.2 stores student
details (StudentId, Name, Languages Known), the student's department details (Dept_No,
Dept_Name) and lecturer details (LecturerInCharge, Designation).
In this approach, we keep repeating the languages known and the department details for every
student in the same table. This is called an unnormalized table. Instead of storing the
same data again and again, we can normalize the data and create related tables.
Let's see how to normalize the Student table (which is not yet normalized), create related
tables and learn the normal forms along the way:
Table 2.2
To move from unnormalized form to first normal form (1NF), all multivalued attributes (called
repeating groups) must be eliminated, so that every attribute is atomic.
Table 2.2 is not in 1NF because it contains repeating groups (more than one value in a field).
The column "Languages Known" holds (English, Hindi and Tamil) in row (tuple) 1 and (English
and Hindi) in row (tuple) 2. To satisfy 1NF, we can create a separate row for each value in
Languages Known, duplicating the values in the remaining columns. Table 2.3 shows the
result.
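The step of creating one row per language can be sketched as follows. The language lists match
the two rows described above; the StudentId and Name values, the comma-separated storage format
and the function name to_1nf are assumptions made for this illustration.

```python
# Unnormalized rows: "Languages Known" holds several values in one field.
unnormalized = [
    {"StudentId": 101, "Name": "Scott", "Languages Known": "English, Hindi, Tamil"},
    {"StudentId": 102, "Name": "Nancy", "Languages Known": "English, Hindi"},
]


def to_1nf(rows, column):
    """Emit one row per value in `column`, copying the remaining columns unchanged."""
    result = []
    for row in rows:
        for value in row[column].split(","):
            atomic = dict(row)
            atomic[column] = value.strip()
            result.append(atomic)
    return result


first_nf = to_1nf(unnormalized, "Languages Known")
for row in first_nf:
    print(row)
```

Every field now holds a single value, at the cost of repeating StudentId and Name in each row.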
1NF Rules
Partial functional dependencies must be removed. If two attributes of a table are combined to
form a composite key, then the non-key attributes of that table must depend on both attributes
of the composite key. They must not depend on only one of the attributes that make up the
composite key.
2NF Rules
Partial dependency
It is a functional dependency on part of the primary key instead of the entire primary
key.
We cannot bring our simple database into second normal form unless we partition the columns
of Table 2.3. Here, assume that StudentId and Dept_No together act as the key (a composite
key). As per 2NF, all non-key attributes must depend on the whole
key.
Table 2.4
Department
Table 2.5
Languages
Table 2.6
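A decomposition into Student, Department and Languages tables can be sketched as a set of
projections that drop duplicate rows. The column split used here (Name with StudentId, Dept_Name
with Dept_No, one language per StudentId) is an assumption based on the dependencies described
above, the lecturer columns are left out for brevity, and the sample rows are invented.

```python
def project(rows, columns):
    """Keep only `columns` and drop duplicate rows, preserving first-seen order."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[c] for c in columns)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(columns, values)))
    return result


# Sample 1NF rows, for illustration only.
rows_1nf = [
    {"StudentId": 101, "Name": "Scott", "Language": "English", "Dept_No": "D001", "Dept_Name": "IT"},
    {"StudentId": 101, "Name": "Scott", "Language": "Hindi", "Dept_No": "D001", "Dept_Name": "IT"},
    {"StudentId": 102, "Name": "Nancy", "Language": "English", "Dept_No": "D001", "Dept_Name": "IT"},
]

student = project(rows_1nf, ["StudentId", "Name", "Dept_No"])
department = project(rows_1nf, ["Dept_No", "Dept_Name"])
languages = project(rows_1nf, ["StudentId", "Language"])

print(len(student), len(department), len(languages))
```

The department name is now stored once per department instead of once per student and language.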
A foreign key is a field in a table that matches the primary key column of another table.
Cross-references between tables are implemented with foreign keys.
Table 2.7
Figure 2.15 : Foreign Key
A foreign key refers to the primary key of another table. It helps to connect the two tables.
A foreign key ensures that a row in one table is mapped to a corresponding row in
another table.
A foreign key does not have to be unique; most often it is not unique.
Foreign Key
Why do you need a foreign key? A foreign key is required in an RDBMS to support the concept of
referential integrity.
Referential integrity
It is a concept used in databases to ensure that relationships between tables stay consistent.
If one table has a foreign key to another table, referential integrity states that you
cannot add a record to the table that contains the foreign key unless there is a corresponding
record in the table it refers to.
For example, consider Figure 2.16, where Dept_No in the Student table is a foreign key
referring to Dept_No in the Department table. Let's try to add a student with StudentId "103"
and Dept_No "D003" to the Student table, as shown below. There is no entry for Dept_No "D003"
in the Department table, which means we would have added a student to a department that does
not exist. This leads to inconsistent data across related tables. Hence an RDBMS enforces
referential integrity, which does not allow a record to be added to the table that contains
the foreign key unless there is a corresponding record in the table to which it is
linked.
Student
Table 2.8
Department
Table 2.9
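The rejected insert can be reproduced with Python's built-in sqlite3 module. This is a sketch
under some assumptions: the column lists are reduced to what the example needs, the names and
the department "D001" are sample values, and SQLite only enforces foreign keys once
PRAGMA foreign_keys is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.execute("CREATE TABLE Department (Dept_No TEXT PRIMARY KEY, Dept_Name TEXT)")
conn.execute(
    """
    CREATE TABLE Student (
        StudentId TEXT PRIMARY KEY,
        Name      TEXT,
        Dept_No   TEXT REFERENCES Department (Dept_No)  -- foreign key
    )
    """
)

conn.execute("INSERT INTO Department VALUES ('D001', 'IT')")
conn.execute("INSERT INTO Student VALUES ('101', 'Scott', 'D001')")  # D001 exists

try:
    # There is no department D003, so this row would break referential integrity.
    conn.execute("INSERT INTO Student VALUES ('103', 'Lindsey', 'D003')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
print(rejected, student_count)
```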
Transitive functional dependencies
When changing a non-key column might cause another non-key column to change,
it is called a transitive functional dependency. Attributes that are not part of the key must
not depend on any non-key attribute.
Consider Table 2.9. Changing the non-key column Lecturer In Charge may change
Designation. Here Dept_No acts as the key, and all other columns are non-key attributes. As per
3NF, non-key attributes should not depend on other non-key attributes, but 'Designation'
depends on 'Lecturer In Charge', and both are non-key attributes. This forms a transitive
dependency. To satisfy 3NF, we will split the table shortly.
Third normal form (3NF) is the third step in database normalization and builds on the
first (1NF) and second (2NF) normal forms. 3NF states that all columns that do not depend
directly on the primary key should be removed from the table. Another way of putting this is
that only foreign key columns should be used to reference another table; the other columns of
the parent table should not be repeated in the referencing table. 2NF deals with partial
dependencies on multi-column primary keys, whereas 3NF deals with the transitive functional
dependencies among non-key columns described above, which can occur even with a single-column
key.
3NF Rules
We need to split our table to move it from second normal form (2NF) to third normal form
(3NF). In table 2.1 Dept_No acts as the key, and all other columns are non-key
attributes. As per third normal form, non-key attributes should not depend on other non-key
attributes. Here 'Designation' depends on 'Lecturer In Charge', and both are non-key
attributes. This forms a transitive dependency. So, to satisfy 3NF, split
the table as follows.
Student
Table 2.10
Department
Table 2.11
Lecturer
Table 2.12
Languages
Table 2.13
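The 3NF split of the department data can be sketched with the same projection idea: Designation
moves into a Lecturer table keyed by the lecturer, and Department keeps only a reference to the
lecturer in charge. The column names follow the text, while the lecturer names and designations
are placeholder values.

```python
def project(rows, columns):
    """Keep only `columns` and drop duplicate rows, preserving first-seen order."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[c] for c in columns)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(columns, values)))
    return result


# 2NF department rows with placeholder lecturer data, for illustration only.
department_2nf = [
    {"Dept_No": "D001", "Dept_Name": "IT", "LecturerInCharge": "Lecturer A", "Designation": "Professor"},
    {"Dept_No": "D002", "Dept_Name": "CSE", "LecturerInCharge": "Lecturer B", "Designation": "Assistant Professor"},
    {"Dept_No": "D003", "Dept_Name": "ECE", "LecturerInCharge": "Lecturer A", "Designation": "Professor"},
]

# Department keeps LecturerInCharge as a reference into the Lecturer table.
department = project(department_2nf, ["Dept_No", "Dept_Name", "LecturerInCharge"])
# Designation depends only on the lecturer, so it is stored once per lecturer.
lecturer = project(department_2nf, ["LecturerInCharge", "Designation"])

print(len(department), len(lecturer))
```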
The example given above cannot be decomposed further to attain higher forms of normalization
because it is already normalized to the highest level. Normally, only complex databases
need the next levels of normalization.
2.3. JOINS
A join is a technique by which records from two or more tables are retrieved through a single
SQL query and shown as a single output. Because the output forms a set, it can be saved as a
table or used as it is. A join combines columns from two tables by using values common to
both tables, so that data from more than one table ends up in a single result set. A
join condition is used in the WHERE clause of select, update and delete queries.
Note: If the join condition is omitted, the query returns the Cartesian product of the two
tables (a Cartesian product is all possible combinations of rows from all the tables): each row
of the first table is joined with every row of the second table. For example, if the first
table has 30 rows and the second table has 10 rows, the result will have 30 * 10, or 300 rows.
On large tables, such a query can take a long time to execute.
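The row counts in the note can be verified with a small sqlite3 experiment. The table names
FirstTable and SecondTable and their single id column are hypothetical, chosen only to reproduce
the 30-row and 10-row case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FirstTable (id INTEGER)")
conn.execute("CREATE TABLE SecondTable (id INTEGER)")
conn.executemany("INSERT INTO FirstTable VALUES (?)", [(i,) for i in range(30)])
conn.executemany("INSERT INTO SecondTable VALUES (?)", [(i,) for i in range(10)])

# No join condition: every row of FirstTable is paired with every row of SecondTable.
cartesian_rows = conn.execute(
    "SELECT COUNT(*) FROM FirstTable, SecondTable"
).fetchone()[0]

# With a join condition in the WHERE clause, only matching rows are paired.
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM FirstTable, SecondTable WHERE FirstTable.id = SecondTable.id"
).fetchone()[0]

print(cartesian_rows, joined_rows)
```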
Let's use the two tables below to explain the join conditions.
Table "Student"
Table 2.14
Table "Department"
Table 2.15
In the above example, the column common to both tables is Dept_No. Using
Dept_No, the Student and Department tables can be joined to combine data from both tables,
as shown below.
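A join on Dept_No might look like the following sqlite3 sketch. The rows are sample values
invented for this example, and the column lists are reduced to StudentId, Name, Dept_No and
Dept_Name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Department (Dept_No TEXT PRIMARY KEY, Dept_Name TEXT)")
conn.execute("CREATE TABLE Student (StudentId TEXT PRIMARY KEY, Name TEXT, Dept_No TEXT)")

# Sample rows, for illustration only.
conn.executemany(
    "INSERT INTO Department VALUES (?, ?)",
    [("D001", "IT"), ("D002", "CSE")],
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [("101", "Scott", "D001"), ("102", "Nancy", "D002"), ("103", "Lindsey", "D001")],
)

# The join condition on the common column Dept_No goes in the WHERE clause.
joined = conn.execute(
    """
    SELECT s.StudentId, s.Name, d.Dept_Name
    FROM Student s, Department d
    WHERE s.Dept_No = d.Dept_No
    ORDER BY s.StudentId
    """
).fetchall()

for row in joined:
    print(row)
```

The same result can be written with the explicit form
Student s JOIN Department d ON s.Dept_No = d.Dept_No.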
2.4. SUMMARY