Session 4
Session 4 objectives
Review Questions
Lecture Notes
Introduction to ERD
P. P. Chen first introduced the entity-relationship model in 1976. It facilitated database
design and complemented the relational data model concepts. The E-R model is a
semantic model at the conceptual level and it captures meanings as well as structure.
E-R models are represented in an entity relationship diagram or ERD, which uses
graphical representations to model database components in terms of entities, attributes,
and relationships.
A particular occurrence of an entity is called an entity instance. For example, each student who takes the Database System class is an instance of the Student entity.
An entity type is a category of entities. The entity type forms the intension of the entity, that is, its permanent definition, while all the entity instances that satisfy that definition at a given moment form the extension of the entity.
An entity set is a collection of entities of the same type. In E-R diagrams, an entity set is represented by a rectangle with the entity's name inside.
Each entity has its properties and characteristics. A set of attributes can be used to
describe the defining properties or qualities of the entity. For example, the Student entity
will have attributes such as name, major, address, and GPA.
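As a rough analogy (a sketch, not part of the original notes), the intension/extension distinction resembles a class definition versus the objects that exist at a given moment. Below, a hypothetical Python dataclass `Student` stands for the entity type and the list `student_set` stands for the entity set; the attribute names follow the example above, and all values are invented.

```python
from dataclasses import dataclass


# Intension: the permanent definition of the (hypothetical) Student entity type.
@dataclass(frozen=True)
class Student:
    name: str
    major: str
    address: str
    gpa: float


# Extension: the entity instances that satisfy the definition at this moment.
# All values are invented sample data.
student_set = [
    Student("Ana Lopez", "Biology", "12 Oak St", 3.6),
    Student("Ben Okafor", "History", "9 Elm Ave", 3.1),
]

# A new enrollment changes the extension; the class definition stays the same.
student_set.append(Student("Chen Wei", "Physics", "4 Pine Rd", 3.8))

for s in student_set:
    print(s.name, s.gpa)
```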
A domain is the set of permitted values for a given attribute. For example, the domain for
the student gender attribute consists of only two possible values, male and female.
An attribute may have null values if the value is unknown at the present time or is
undefined for a particular entity instance. For example, some students have not declared
their major yet, so the values for their major attribute are null.
A student may also have more than one email address, so the email address attribute of the Student entity is a multi-valued attribute.
An attribute may consist of other attributes. For example, the address of a student
includes street, city, state and zip code. We call attributes like address composite
attributes.
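The attribute kinds described above can be approximated with Python type annotations. In this hypothetical sketch, an `Enum` limits gender to its two-value domain, `Optional` allows a null major, a tuple holds the multi-valued email attribute, and a nested `Address` dataclass models the composite address. The names `Gender`, `Address` and `zip_code`, and all sample values, are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Gender(Enum):
    # Domain: the set of permitted values for the gender attribute.
    MALE = "male"
    FEMALE = "female"


@dataclass(frozen=True)
class Address:
    # Composite attribute: built from simpler component attributes.
    street: str
    city: str
    state: str
    zip_code: str


@dataclass(frozen=True)
class Student:
    name: str
    gender: Gender
    address: Address
    major: Optional[str] = None   # None plays the role of a null value
    emails: Tuple[str, ...] = ()  # multi-valued attribute


s = Student(
    name="Ana Lopez",
    gender=Gender("female"),      # Gender("other") would raise ValueError
    address=Address("12 Oak St", "Springfield", "IL", "62701"),
    emails=("ana@school.edu", "ana.lopez@mail.com"),
)
print(s.major is None, len(s.emails), s.address.city)
```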
The diagram below shows two entities, Student and Course. Student has attributes of
Student_ID, First_Name, Last_Name, Major, GPA, SSN, and Email. Course has
attributes of Course_No, Description, and Cost.
A super key is an attribute or a set of attributes that uniquely identifies each entity in an entity set. For example, for the Student entity set, Student_ID is a super key because it can be used to uniquely identify each student. On the other hand, First_Name plus Last_Name is not a super key because two or more students may have the same name. Any set of attributes that contains a super key is also a super key. Therefore, the combination of Student_ID, First_Name, and Last_Name is also a super key.
A candidate key is a minimal super key, that is, a super key with no unnecessary attributes. For example, Student_ID plus First_Name and Last_Name is a super key but not a candidate key, because First_Name and Last_Name are unnecessary: Student_ID by itself identifies a student. A key with more than one attribute is called a composite key.
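The super key and candidate key definitions can be tested against sample data with the small helpers below (`is_super_key` and `is_candidate_key` are hypothetical names). Keep in mind that keys are business rules about every possible instance, so a sample can only prove that an attribute set is not a key; these functions check one sample, nothing more.

```python
from itertools import combinations


def is_super_key(rows, attrs):
    """True if no two rows in the sample agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_candidate_key(rows, attrs):
    """A super key none of whose proper, non-empty subsets is a super key."""
    if not is_super_key(rows, attrs):
        return False
    for size in range(1, len(attrs)):
        for subset in combinations(attrs, size):
            if is_super_key(rows, subset):
                return False
    return True


# Hypothetical sample: two different students share a name.
students = [
    {"Student_ID": 1, "First_Name": "Ana", "Last_Name": "Lopez"},
    {"Student_ID": 2, "First_Name": "Ana", "Last_Name": "Lopez"},
    {"Student_ID": 3, "First_Name": "Ben", "Last_Name": "Okafor"},
]

print(is_super_key(students, ("Student_ID",)))                                # True
print(is_super_key(students, ("First_Name", "Last_Name")))                    # False
print(is_super_key(students, ("Student_ID", "First_Name", "Last_Name")))      # True
print(is_candidate_key(students, ("Student_ID", "First_Name", "Last_Name")))  # False
print(is_candidate_key(students, ("Student_ID",)))                            # True
```

Because any superset of a super key is also a super key, checking only the subsets one attribute smaller would be enough; looping over all subset sizes is simply easier to read.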
An entity may have several candidate keys. For example, Student_ID, SSN (social security number), and Email are all candidate keys for the Student entity. However, when implementing the database, we usually choose one of the candidate keys as the normal way of identifying entities and accessing records. This becomes the primary key; for example, we use Student_ID as the primary key for Student. The remaining candidate keys often become alternate keys, whose unique values provide another way of accessing records; in our example, SSN is an alternate key. For the Course entity, the primary key is Course_No.
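One way to carry primary and alternate keys into an implementation is with SQL constraints; the sketch below uses Python's built-in `sqlite3` module. Column types and sample values are assumptions: `PRIMARY KEY` marks the chosen candidate key, and `UNIQUE` keeps the alternate keys unique. The helper `try_insert` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (
    Student_ID INTEGER PRIMARY KEY,    -- the candidate key chosen as PK
    First_Name TEXT NOT NULL,
    Last_Name  TEXT NOT NULL,
    Major      TEXT,                   -- may be null (undeclared major)
    GPA        REAL,
    SSN        TEXT NOT NULL UNIQUE,   -- alternate key
    Email      TEXT NOT NULL UNIQUE    -- another candidate key, single-valued here
);
CREATE TABLE Course (
    Course_No   TEXT PRIMARY KEY,
    Description TEXT,
    Cost        REAL
);
""")


def try_insert(row):
    """Insert a Student row; return False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


first = try_insert((1, "Ana", "Lopez", None, 3.6, "111-22-3333", "ana@school.edu"))
# Same SSN as student 1: the UNIQUE constraint on the alternate key rejects it.
dup_ssn = try_insert((2, "Ben", "Okafor", "History", 3.1, "111-22-3333", "ben@school.edu"))
# Same Student_ID as student 1: the PRIMARY KEY constraint rejects it.
dup_id = try_insert((1, "Cara", "Diaz", None, 3.9, "999-88-7777", "cara@school.edu"))
print(first, dup_ssn, dup_id)
```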
The term secondary key usually means an attribute or set of attributes whose values, not
necessarily unique, are used as a means of accessing records. For the Student example,
the attribute set First_Name and Last_Name can be used as a secondary key. We can use
the name to help us find a student record if we do not know the student ID or social
security number.
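SQL has no dedicated secondary-key declaration; a common approach, assumed here, is an ordinary non-unique index on the search attributes. The index name `idx_student_name` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Student (
    Student_ID INTEGER PRIMARY KEY,
    First_Name TEXT NOT NULL,
    Last_Name  TEXT NOT NULL,
    SSN        TEXT UNIQUE)""")

# Secondary key: a NON-unique index, so duplicate names are allowed.
conn.execute("CREATE INDEX idx_student_name ON Student (Last_Name, First_Name)")

conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?)", [
    (1, "Ana", "Lopez", "111-22-3333"),
    (2, "Ana", "Lopez", "444-55-6666"),
    (3, "Ben", "Okafor", "777-88-9999"),
])

# A lookup by name may return several students; the PK then tells them apart.
matches = conn.execute(
    "SELECT Student_ID FROM Student "
    "WHERE Last_Name = ? AND First_Name = ? ORDER BY Student_ID",
    ("Lopez", "Ana"),
).fetchall()
print(matches)
```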
Understanding relationships
A natural business association, or relationship, usually exists between two or more entities. For example, students take courses, while faculty members teach courses. Relationships that share common properties can be described by a relationship type, and the collection of all relationships of that type forms the corresponding relationship set. The relationship set consists of relationship instances, that is, the relationships that exist at a given moment.
A relationship set that associates two entity sets is called a binary relationship set. For example, the following diagram shows an “Enrolls” relationship between Student and Course. The degree of a relationship can also be ternary, linking three entity sets, or n-ary, linking n entity sets.
A relationship set may have descriptive attributes that belong to the relationship rather than to any of the entities involved. For example, Grade is a descriptive attribute of the Enrolls relationship set: its value is the final grade a student earns in a particular course, so it describes neither the student alone nor the course alone.
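Continuing with `sqlite3`, one common way to store a relationship set is a table with one row per relationship instance; `Grade` then lives in that table rather than in `Student` or `Course`. The table layout, the course number `CS340`, and the grades are hypothetical, and the composite primary key assumes a student enrolls in a given course at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Student (Student_ID INTEGER PRIMARY KEY, Last_Name TEXT);
CREATE TABLE Course  (Course_No TEXT PRIMARY KEY, Description TEXT);

-- Each row is one relationship instance of the Enrolls relationship set.
CREATE TABLE Enrolls (
    Student_ID INTEGER REFERENCES Student(Student_ID),
    Course_No  TEXT    REFERENCES Course(Course_No),
    Grade      TEXT,                       -- descriptive attribute
    PRIMARY KEY (Student_ID, Course_No)    -- one grade per student per course
);

INSERT INTO Student VALUES (1, 'Lopez'), (2, 'Okafor');
INSERT INTO Course  VALUES ('CS340', 'Database System');
INSERT INTO Enrolls VALUES (1, 'CS340', 'A'), (2, 'CS340', 'B+');
""")

# The grade is reachable only through the relationship row.
rows = conn.execute("""
    SELECT s.Last_Name, e.Grade
    FROM Enrolls e JOIN Student s ON s.Student_ID = e.Student_ID
    WHERE e.Course_No = 'CS340'
    ORDER BY s.Last_Name
""").fetchall()
print(rows)
```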
The following three diagrams illustrate 1:1, 1:M, and M:N relationships.
In some cases, not all members of an entity set participate in a relationship. If every member of an entity set must participate in a relationship, we call that total participation. Partial participation refers to cases in which some members of the entity set may not participate in the relationship.
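Participation can be checked mechanically on a given relationship set: an entity set participates totally if every one of its members appears in at least one relationship instance. The helper `participation` and the sample data below are hypothetical; in a real design, total participation is a business rule the database must enforce, not something inferred from current data.

```python
def participation(entity_ids, relationship_set, position):
    """Return 'total' if every entity appears in some relationship instance.

    entity_ids: identifiers of all entities in the entity set
    relationship_set: pairs, one per relationship instance
    position: 0 or 1, the side of each pair this entity set occupies
    """
    participating = {pair[position] for pair in relationship_set}
    return "total" if set(entity_ids) <= participating else "partial"


# Hypothetical Enrolls relationship set between Student and Course.
students = {1, 2, 3}
courses = {"CS340", "CS350"}
enrolls = {(1, "CS340"), (2, "CS340"), (3, "CS340")}

student_side = participation(students, enrolls, 0)  # every student is enrolled
course_side = participation(courses, enrolls, 1)    # nobody takes CS350
print(student_side, course_side)                    # total partial
```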
In a relationship, each entity plays a function called its role in the relationship. Naming the role of an entity is optional, though it can be helpful for a recursive relationship or when multiple relationships exist between the same entity sets. A recursive relationship occurs when an entity set is related to itself.
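In a recursive relationship, role names are what keep the two sides apart. The sketch below assumes a hypothetical “requires” relationship in which one Course is a prerequisite of another; the table `Requires` and the column `Prerequisite_No` are invented names that record each role, and the course numbers are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Course (Course_No TEXT PRIMARY KEY, Description TEXT);

-- Recursive relationship: both foreign keys point back to Course.
-- The column names record the role each Course plays.
CREATE TABLE Requires (
    Course_No       TEXT REFERENCES Course(Course_No),  -- role: course
    Prerequisite_No TEXT REFERENCES Course(Course_No),  -- role: prerequisite
    PRIMARY KEY (Course_No, Prerequisite_No)
);

INSERT INTO Course VALUES ('CS101', 'Intro to Programming'),
                          ('CS201', 'Data Structures'),
                          ('CS340', 'Database System');
INSERT INTO Requires VALUES ('CS201', 'CS101'), ('CS340', 'CS201');
""")

# Self-join: Course appears twice, once for each role.
rows = conn.execute("""
    SELECT c.Description, p.Description
    FROM Requires r
    JOIN Course c ON c.Course_No = r.Course_No
    JOIN Course p ON p.Course_No = r.Prerequisite_No
    ORDER BY c.Course_No
""").fetchall()
print(rows)
```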
When the existence of an entity Y depends on another entity X and Y does not have a candidate key of its own, Y is called a weak entity and X is called a strong entity. A weak entity may have a partial key, or discriminator, that distinguishes instances of the weak entity that are related to the same strong entity. In the following example, Building is a strong entity, as its existence does not depend on other entities and it has its own primary key. On the other hand, a room would not exist without a building, so Room is a weak entity, and part of its primary key has to be derived from the parent entity. Room_Number alone is not unique, because two different buildings can each have a room numbered 101. So in this case, the combination of Building_Number and Room_Number is the primary key of the Room entity, with Building_Number derived from the Building entity.
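The Building/Room example translates into a table whose primary key combines the owner's key with the partial key. This `sqlite3` sketch assumes `ON DELETE CASCADE` as one way to express existence dependence (removing a building removes its rooms); the helper `add_room`, the column types, and the sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Building (
    Building_Number INTEGER PRIMARY KEY
);
CREATE TABLE Room (
    Building_Number INTEGER NOT NULL
        REFERENCES Building(Building_Number) ON DELETE CASCADE,
    Room_Number     TEXT NOT NULL,            -- partial key (discriminator)
    PRIMARY KEY (Building_Number, Room_Number)
);
INSERT INTO Building VALUES (1), (2);
-- Same room number in two different buildings is allowed.
INSERT INTO Room VALUES (1, '101'), (2, '101');
""")


def add_room(building, room):
    """Try to insert a room; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Room VALUES (?, ?)", (building, room))
        return True
    except sqlite3.IntegrityError:
        return False


dup_ok = add_room(1, "101")     # duplicate within building 1: rejected by the PK
orphan_ok = add_room(99, "101") # no building 99: rejected by the FK

# Deleting the owner removes its dependent rooms.
conn.execute("DELETE FROM Building WHERE Building_Number = 2")
remaining = conn.execute("SELECT COUNT(*) FROM Room").fetchone()[0]
print(dup_ok, orphan_ok, remaining)
```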
Notations
In an ERD using Chen’s notation, a rectangle is used to represent an entity, an oval to
represent an attribute, and a diamond to represent a relationship. These elements are
connected by lines. A complete list of Chen’s notation symbols can be found below.
There are other conventions, and one of the most widely used is Crow’s Foot notation, which is used in most of the examples in this and earlier lectures. As in Chen’s notation, an entity is represented by a rectangle containing the entity’s name. Unlike Chen’s notation, Crow’s Foot notation lists attributes in a box below the entity name rather than in separate ovals. Relationships are drawn as straight lines. A relationship can have a name, usually a strong verb phrase, written on the relationship line. Below is a list of Crow’s Foot notation symbols.
Although M:N relationships may properly be viewed in a relational database model at the
conceptual level, such relationships should not be implemented, because their existence
creates undesirable redundancies. Therefore, M:N relationships must be decomposed into
1:M relationships to fit into the ER framework. For example, if you were to develop an
ER model for a video rental store, you would note that tapes can be rented more than
once and that customers can rent more than one tape.
We also have to keep in mind that newly arrived tapes that have just been entered into inventory have not yet been rented, and that some tapes may never be rented at all if there is no demand for them. Therefore, CUSTOMER is optional to TAPE. Assuming that the video store only rents videos and that a CUSTOMER entry will not exist unless a person coming into the store actually rents a first tape, TAPE is mandatory to CUSTOMER. On the other hand, if the store offers other services, such as selling movies or games, then a CUSTOMER entry could exist without that customer having rented a video; in that case, TAPE is optional to CUSTOMER. Note that this discussion includes only a very brief description of the video store's operations and some business rules. The relationship between customers and tapes would thus be M:N, as shown in Figure 4.7.
[Figure 4.7: The M:N relationship “CUSTOMER rents TAPE” (Chen model)]
The M:N relationship depicted in Figure 4.7 must be broken up into two 1:M
relationships through the use of a bridge entity, also known as a composite entity. The
composite entity, named RENTAL in the example shown in Figure 4.8, must include at
least the primary key components (CUS_NUM and TAPE_CODE) of the two entities it
bridges, because the RENTAL entity’s foreign keys must point to the primary keys of the
two entities CUSTOMER and TAPE.
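• Had RENTAL used the composite PK CUS_NUM + TAPE_CODE, the relationships between RENTAL and CUSTOMER and between RENTAL and TAPE would have been strong relationships. Because this composite PK was not used, it is a candidate key.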
• In this case, the designer made the decision to use a single-attribute PK rather than a composite PK. Note that the RENTAL entity uses the PK RENT_NUM. Single-attribute PKs are usually more desirable than composite PKs, especially when relationships must be established between RENTAL and some as yet unidentified entity: a composite PK can be used as a foreign key, but every related entity must then carry all of its attributes. In addition, a composite PK tends to make queries more cumbersome to write.
• Note the placement of the optional symbols. Because a tape that is never rented
will never show up in the RENTAL entity, RENTAL has become optional to TAPE.
That's why the optional symbol has migrated from CUSTOMER to the opposite
side of RENTAL. Also, note the addition of a few attributes in each of the three
entities to make it easier to see what is being tracked.
• Because the M:N relationship has now been decomposed into two 1:M
relationships, the ER model shown in Figure 4.8 can be implemented. However,
“implementable” is not necessarily synonymous with “practical” or “useful.”
(We’ll modify the ERD in Figure 4.8 after some additional discussion.)
• Therefore, the relationships between CUSTOMER and RENTAL and between
TAPE and RENTAL are read as:
CUSTOMER generates RENTAL
TAPE enters RENTAL
• The dashed relationship lines indicate weak relationships. In this case, the
RENTAL entity’s primary key is RENT_NUM, and this PK did not use any
attribute from the CUSTOMER and TAPE entities.
The (implied) cardinalities in Figure 4.8 reflect the rental transactions. Each rental
transaction, i.e., each record in the RENTAL table, will reference one and only one
customer and one and only one tape. The (simplified!) implementation of this model may
thus yield the sample data shown in the database in Figure 4.9. The database's relational
diagram is shown in Figure 4.10.
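A simplified sketch of the Figure 4.8 design in `sqlite3` follows. RENT_NUM, CUS_NUM, TAPE_CODE, RENT_DATE_OUT, and RENT_DATE_RETURN come from the text; the columns `CUS_LNAME` and `TAPE_TITLE`, the customer names, the dates, and the helper `try_rental` are hypothetical. The `NOT NULL` foreign keys express that each rental references exactly one customer and exactly one tape, and querying the bridge table in either direction recovers the original M:N relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE CUSTOMER (
    CUS_NUM   INTEGER PRIMARY KEY,
    CUS_LNAME TEXT NOT NULL
);
CREATE TABLE TAPE (
    TAPE_CODE  TEXT PRIMARY KEY,
    TAPE_TITLE TEXT NOT NULL
);
-- Bridge (composite) entity: its own single-attribute PK plus two
-- NOT NULL foreign keys, one per 1:M relationship.
CREATE TABLE RENTAL (
    RENT_NUM         INTEGER PRIMARY KEY,
    CUS_NUM          INTEGER NOT NULL REFERENCES CUSTOMER(CUS_NUM),
    TAPE_CODE        TEXT    NOT NULL REFERENCES TAPE(TAPE_CODE),
    RENT_DATE_OUT    TEXT    NOT NULL,
    RENT_DATE_RETURN TEXT               -- null until the tape comes back
);
INSERT INTO CUSTOMER VALUES (1, 'Ramos'), (2, 'Chung');
INSERT INTO TAPE VALUES ('R2345-1', 'Once Upon a Midnight Breezy'),
                        ('R2345-2', 'Once Upon a Midnight Breezy');
INSERT INTO RENTAL VALUES (10050, 1, 'R2345-1', '2010-01-10', '2010-01-11'),
                          (10051, 2, 'R2345-1', '2010-01-11', NULL),
                          (10052, 1, 'R2345-2', '2010-01-11', NULL);
""")

# One customer, many tapes ...
tapes_of_1 = [r[0] for r in conn.execute(
    "SELECT TAPE_CODE FROM RENTAL WHERE CUS_NUM = 1 ORDER BY TAPE_CODE")]
# ... and one tape, many customers: the M:N relationship, via the bridge.
renters_of_r1 = [r[0] for r in conn.execute(
    "SELECT CUS_NUM FROM RENTAL WHERE TAPE_CODE = 'R2345-1' ORDER BY CUS_NUM")]


def try_rental(row):
    """Insert a RENTAL row; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO RENTAL VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# A rental without a customer violates NOT NULL on CUS_NUM.
no_customer = try_rental((10053, None, "R2345-2", "2010-01-12", None))
print(tapes_of_1, renters_of_r1, no_customer)
```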
The relational diagram that corresponds to the design in Figure 4.8 is shown in Figure
4.10.
The database’s TAPE and RENTAL tables contain some attributes that merit additional
discussion.
• The TAPE_CODE attribute values include a “trailer” after the dash. For example, note that the third record in the TAPE table has a PK value of R2345-2. The “trailer” indicates the tape copy: the “-2” trailer in the PK value R2345-2 indicates the second copy of the “Once Upon a Midnight Breezy” tape. So why include a separate TAPE_COPY attribute? This decision was made to make it easier to write queries that use the tape copy value. (It is much more difficult to use a right-string function to “strip” the tape copy value from TAPE_CODE than simply to use the TAPE_COPY value. And “simple” usually translates into “fast” in a query environment; “fast” is a good thing!)
• The RENTAL table uses two dates: RENT_DATE_OUT and
RENT_DATE_RETURN. This decision leaves the RENT_DATE_RETURN
value null until the tape is returned. Naturally, such nulls can be avoided by
creating an additional table in which the return date is not a named attribute. Note
the following few check-in and check-out transactions:
RENT_NUM   TRANS_TYPE    TRANS_DATE
10050      Checked-out   10-Jan-2010
10050      Returned      11-Jan-2010
10051      Checked-out   10-Jan-2010
10051      Returned      11-Jan-2010
10052      Checked-out   11-Jan-2010
10053      Returned      10-Jan-2010
…          …             …
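The transaction-table alternative can be sketched with a hypothetical `RENTAL_TRANS` table loaded with the first five transactions listed above (dates rewritten in ISO format). No return date is ever stored as null; a query pivots the rows back into the two-date view when needed, and a rental that has not been returned simply has no `Returned` row. The table name, the CHECK constraint, and the composite PK are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE RENTAL_TRANS (
    RENT_NUM   INTEGER NOT NULL,
    TRANS_TYPE TEXT    NOT NULL CHECK (TRANS_TYPE IN ('Checked-out', 'Returned')),
    TRANS_DATE TEXT    NOT NULL,
    PRIMARY KEY (RENT_NUM, TRANS_TYPE))""")

conn.executemany("INSERT INTO RENTAL_TRANS VALUES (?, ?, ?)", [
    (10050, "Checked-out", "2010-01-10"),
    (10050, "Returned",    "2010-01-11"),
    (10051, "Checked-out", "2010-01-10"),
    (10051, "Returned",    "2010-01-11"),
    (10052, "Checked-out", "2010-01-11"),
])

# Pivot back to the two-date view. A missing 'Returned' row appears as NULL
# only in the query result; no null is stored in the table.
rows = conn.execute("""
    SELECT RENT_NUM,
           MAX(CASE WHEN TRANS_TYPE = 'Checked-out' THEN TRANS_DATE END) AS RENT_DATE_OUT,
           MAX(CASE WHEN TRANS_TYPE = 'Returned'    THEN TRANS_DATE END) AS RENT_DATE_RETURN
    FROM RENTAL_TRANS
    GROUP BY RENT_NUM
    ORDER BY RENT_NUM
""").fetchall()

for row in rows:
    print(row)
```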