
Database Management System

SQL Star International Ltd.


Copyright © 2009
First Edition

SQL STAR INTERNATIONAL LIMITED

SQL STAR HOUSE, No. 8-2-293/174/A 25, Road No. 14, Banjara Hills, Hyderabad - 500 033.
Tel. No. 91- 40-23101600(30 lines)
Fax No. 23101663
Toll Free No: 1800 425 2944

Email: knowledge.services@sqlstar.com

No part of this publication may be reproduced (incl. photocopying) in any way, without prior
agreement and written permission of SQL Star International Ltd., Hyderabad. SQL Star
International Ltd., Hyderabad assumes no responsibility for its use, nor for any infringements
of patents or other rights of third parties which could result.
Contents

Chapter 1: Database Management System (DBMS)
Chapter 2: Introduction to Relational Databases (RDBMS)
Chapter 3: Conceptual Design using Entity-Relationship Model
Chapter 4: Schema Refinement and Normalization
Chapter 5: Supertypes and Subtypes
Chapter 6: Exercises


Database Management Systems (DBMS)

Chapter 1

Database Management System (DBMS)

Database
File Systems and Associated Problems
Benefits of Database Approach
Database Management System
DBMS Functions
Database System
Users
Functions of a Database Administrator (DBA)
Components of DBMS
Data Model
Database Architecture
Schema
Types of Database Models

© SQL Star International Ltd. 1



Objectives

In this chapter, we will discuss:


• What is a Database?
• File System Vs Database Approach
• Benefits of database approach
• What is DBMS
• Various functions of a DBMS
• Database system
• Role of a DBA
• Components of DBMS
• Data Model and their types
• Database Architecture
• Types of Database Models


Data

Data are known facts that can be recorded and that have implicit meaning.

Database
A database is a logical collection of related data. It is designed to offer an organized
mechanism for storing, managing and retrieving stored information. A ledger, a
telephone directory or an address book can be called a database because they all
store related data in a structured way.

Traditionally, data accessed through computers has been stored on different storage
media in the form of individual files. Files proved to be quite satisfactory as long as
computerization was limited to a few application areas and the use of computers
restricted to a privileged few. However, as actual users grew in number, especially
with the advent of online time-sharing systems, the file systems gave rise to many
serious problems. The discipline of database systems evolved in response to these
problems. Let us first consider what these problems are so as to understand the
different features of database systems more clearly.

File Systems and the Associated Problems


Most data processing systems in existence today, especially in India, use files for
storing, accessing and manipulating data. Files are stored typically on magnetic
tapes and disks.
Most of the problems with files arise out of the fact that files are specific to an
application, e.g., a set of files may be designed for the sales analysis system of a
company.
Programs of the same application system can use these files. However, if some
other new application needs source data from this system, there may be difficulties.
Therefore, in many cases, new files, with considerable data in common with the
existing files, may have to be designed for the new applications.
Thus, as applications proliferate, the total number of computerized files grows
considerably. Also, as the number of actual users of the computer grows, the number
of applications increases, in turn resulting in an increase in the number of files.
A large number of files gives rise to the following problems:
• Files involve a high level of redundancy in data. As we have mentioned above,
proliferation of files results in the same data item being stored at many
different places.
• Redundancy in data often results in inconsistency. The same data item being
used by different applications may exist in different versions. What is worse,
it may exist in different stages of update at different places and thus may
have different values. This may ultimately result in inconsistencies amongst
reports generated by the two application systems.

• Individual files are not amenable to rapid changes, especially with respect to
the way the data items are structured within the file. If an application wants
data from an existing file but structured differently, it cannot be provided
quickly and easily. For this purpose, either conversion programs have to be
written or new files have to be created.
• Because of the inflexibility of files, many ad hoc queries cannot be answered.
• Yet another consequence of the inflexibility of files is that it is usually
expensive to make changes to a file system. It is also a very slow process. It
may even involve modification of application programs.
• What is worse, modification in one program may require modifications in
other programs, which interface with this program. This process may set off a
chain reaction of modifications.

The above problems give rise to further difficulties detailed below:


• The Management Information Systems (MIS) department finds it difficult to control
data, especially when the actual users develop applications on their own.
• Major changes required by the system while modifying files increase the
maintenance load on Data Processing (DP) professionals substantially, thus
making them unavailable for development of new systems.
• High-level data redundancy entails repetitive data entry and redundant
storage with the accompanying costs.

As already stated, database systems provide an effective solution to the above
problems. Let us see how.
As you have just seen, files give rise to several problems because they are application
specific. Consequently, the applications become data-dependent, that is, they
depend upon the organization of, and the access method for, the data on secondary
storage. This happens because, with conventional application development tools such
as COBOL, the application logic incorporates knowledge of the data organization and
access methods. Therefore, most changes in data organization or access methods
affect the application logic substantially. To avoid the problems arising from this,
the data organization (and the access method) and the application logic
have to be made independent of each other. Database systems do precisely this.
The first step towards this goal is to distinguish between data as it is actually stored
(called the physical representation of data) and data as it is presented to an
individual user (called the logical representation of data).

Physical Representation of Data


The smallest named unit of data physically stored in the database is known as a
“stored field”, and a named collection of associated stored fields is known as a
“stored or physical record”. The named collection of all occurrences of one type of
physical record is known as a “stored or a physical file”. This concept will be clearer
after we discuss the logical representation of data.


Logical Representation of Data


A logical field, record or file is the field, record or file as it appears to the users,
that is, as it is defined in the user’s application programs. In traditional file systems,
the logical and physical data are practically the same, which is the root cause of the
major problems with traditional files. This is not the case with database systems.
In a database system, the structure of stored and logical records can be different. A
logical record type may be obtained by selectively combining fields from different
stored records. The logical and physical views of a file could also be different in terms
of, say, the key fields for sequencing the records in the file.
With such a separation of the logical and physical data, the database can be modified
and developed without affecting existing applications. The database architecture
achieves this separation.
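The idea of a logical record built from several stored records can be sketched as follows. This is only an illustration, assuming Python's built-in sqlite3 module as a stand-in DBMS; the tables `book` and `author`, the view `book_listing` and the sample values are hypothetical and do not come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two stored record types (hypothetical library tables).
    CREATE TABLE author (author_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO author VALUES (1, 'Sample Author');
    INSERT INTO book VALUES (10, 'Sample Title', 1);

    -- One logical record type, obtained by combining fields of both stored records.
    CREATE VIEW book_listing AS
        SELECT b.title AS title, a.name AS author_name
        FROM book AS b JOIN author AS a ON a.author_id = b.author_id;
""")

# The application sees only the logical record, not the two stored ones.
rows = conn.execute("SELECT title, author_name FROM book_listing").fetchall()
print(rows)
```

An application written against `book_listing` does not need to know that the data comes from two stored tables.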
Finally, a DBMS must have mechanisms to handle system failures (e.g., power failure,
disk crash) so that the database can be recovered to a consistent state.

Benefits of the Database Approach


• Redundancy can be reduced - Because of the relational approach towards
data organization, the same data need not be stored in more than one location,
so repetition of information is avoided.

• Inconsistency can be avoided - With a database, all users see the same, true
picture of the information present in the database.

• Data can be shared - Multiple users can log in to the database, each with the
access granted to them, and can manipulate the data in a controlled environment.

• With centralized control of data, the database system may be designed
for overall optimal performance from the viewpoint of the entire organization.

• Standards can be enforced - Standards can be enforced on the database to
regulate access to it.

• Security restrictions can be applied - Security is the process of limiting
access to the database. Limiting access to the database server itself is the most
important aspect of security and needs to be carefully planned.

• Integrity can be maintained - Through integrity checks, one can ensure that only
accurate data is stored within the database.

• Data independence can be provided - Users do not need to know the
technical aspects of the database to access it. Their access is independent of both
the physical and the logical organization of the data.

• New applications may be developed using the existing database.

Database Management System


A DBMS is a computerized record-keeping system.

Modern computer-based Information Systems (IS) are capable of performing a
variety of complex tasks in a coordinated manner. Such systems handle large
volumes of data, multiple users and several applications for activities occurring in a
central and/or distributed environment.

The heart of an IS is database management, because most information systems have to
handle massive amounts of data. This core module of an IS is called the Database
Management System (DBMS). A DBMS provides for the storage, retrieval and
updating of data in an organized manner.

A request for data is handled as follows:

1. The user requests a data item.
2. The DBMS intercepts and interprets the request.
3. The DBMS retrieves the data from the physical database.
4. It constructs the conceptual record using the conceptual/physical mapping.
5. It applies the relevant external/conceptual mapping.
6. It derives the required external record from the conceptual record.
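Steps 4 to 6 can be sketched in a few lines of Python. This is a simplified illustration, not how any particular DBMS implements its mappings: the stored field names, the two mapping tables and the sample book record are all hypothetical.

```python
# Hypothetical stored record, as it might be held in the physical database.
stored_record = {"BK_ID": 101, "TTL": "Sample Title", "PRC": 250.0, "STS": "issued"}

# Conceptual/physical mapping: conceptual field name -> stored field name.
CONCEPTUAL_FROM_STORED = {
    "book_id": "BK_ID", "title": "TTL", "price": "PRC", "status": "STS",
}

# External/conceptual mapping for one user category: a borrower sees two fields.
BORROWER_VIEW = ["title", "status"]

def build_conceptual_record(stored):
    """Step 4: construct the conceptual record from the stored record."""
    return {name: stored[field] for name, field in CONCEPTUAL_FROM_STORED.items()}

def derive_external_record(conceptual, view_fields):
    """Steps 5 and 6: apply the external mapping and derive the external record."""
    return {name: conceptual[name] for name in view_fields}

conceptual_record = build_conceptual_record(stored_record)
external_record = derive_external_record(conceptual_record, BORROWER_VIEW)
print(external_record)
```

The borrower's program receives only the title and status, although the stored record holds more fields under different names.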

An example: Consider the situation in a library. Here, we have data corresponding
to books, authors, suppliers, borrowers, etc. The total volume of data stored and
handled in a library may be quite large. The library DBMS must support several
operations, such as the issue, return or purchase of books, and handle queries relating
to book information, borrowing information, etc. Moreover, there are different types of
users who operate at various stages or activities. For instance, a borrower may merely
view certain information, whereas an issuer may be allowed to update the status of a
book during issue or return. The library staff may, on the other hand, add new
books, their supplier, price and other information to the database. Each user
category has different access rights on both the data and the processing
capabilities. Multiple users may operate the library DBMS concurrently, performing
several tasks at the same time. They may even try to access the same data
simultaneously. It is the job of a DBMS to handle the data and its processing in an
integrated, coordinated and consistent manner.

DBMS Functions
• Data Definition - The DBMS lets us define our own data in the simplest
possible way.
• Data Manipulation - The DBMS lets us manipulate data, i.e., insert, update
and delete information.
• Data Security and Integrity - The DBMS lets us secure the data and gives users
accessing it a true picture of the data.
• Data Recovery and Concurrency - After a crash, we can always get back to a
previously defined consistent state of the database, and multiple users
can access the database at the same time.
• Data Dictionary - The DBMS maintains meta-information in its dictionaries,
which helps it locate information on behalf of user queries.
• Performance - The DBMS aims to maintain performance irrespective of the
load, in terms of the number of users accessing the database.
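Several of these functions can be seen in a small sketch. It assumes Python's built-in sqlite3 module as a stand-in DBMS; the `book` table, its columns and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: declare the structure together with an integrity rule.
conn.execute("""
    CREATE TABLE book (
        book_id INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        price   REAL CHECK (price >= 0)
    )
""")

# Data manipulation: insert and update, then make the changes permanent.
conn.execute("INSERT INTO book VALUES (1, 'Sample Title', 250.0)")
conn.execute("UPDATE book SET price = 275.0 WHERE book_id = 1")
conn.commit()

# Integrity: the DBMS rejects data that violates the declared rule.
try:
    conn.execute("INSERT INTO book VALUES (2, 'Bad Price', -5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Recovery: an uncommitted change is rolled back to the last consistent state.
conn.execute("DELETE FROM book")
conn.rollback()

remaining = conn.execute("SELECT book_id, price FROM book").fetchall()
print(rejected, remaining)
```

The rollback here stands in for recovery after a failure: the deletion was never committed, so the database returns to the state saved by the last commit.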

Database System
A DBMS is a complex piece of software that usually consists of a number of modules.
It may be considered as an agent that allows the various types of users to communicate
with the physical database and the operating system without being aware of every
detail of how it is done. To fulfill its tasks, the DBMS must maintain information
about the data that is stored in the system. This information would normally include
what data is stored, how it is stored, who has access to what parts of it, and so on.

The information about the data in a database is called the metadata (data about
data). In addition to information listed above, some information regarding the use of
a database is often collected to monitor the system's performance. This metadata
helps management in maintaining an effective and efficient database system.
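As an illustration of metadata, the sketch below asks a DBMS what data it stores and how. It assumes SQLite (through Python's sqlite3 module), whose catalog table `sqlite_master` and `PRAGMA table_info` command expose this information; other products use differently named catalogs. The `borrower` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE borrower (borrower_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# What data is stored: SQLite records every table definition in sqlite_master.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# How it is stored: PRAGMA table_info lists each column with its declared type.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(borrower)")]

print(tables)
print(columns)
```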

[Figure: a database system. Labels: Users; Application Programs/Queries; Software to
Process Programs/Queries; Software to Access Stored Data; Stored Meta-Data; Stored
Database]


Users
The three broad classes of users are as follows:

• Application programmers - Responsible for writing application programs
that use the database.
• End users - Interact with the system from workstations or terminals. A given
end user can access the database via one of the applications, or can use an
interface provided as an integral part of the database system software. Such
interfaces are also supported by means of applications, of course, but those
applications are built-in, not user-written (e.g., a query language processor).
• Database Administrator (DBA) - Creates the actual database and
implements the technical controls needed to enforce various policy decisions. The
DBA is also responsible for ensuring that the system operates with adequate
performance and for providing a variety of other related technical services.

Functions of Database Administrator (DBA)

The database administrator is responsible for the overall planning of the company’s
data resources, for the design of data, and for the day-to-day operational aspects of
data management.
The overall planning of corporate data is the strategic aspect of the database
administration function and involves company-wide planning of existing data and
assessment of organization-wide data standards.

Some of the design aspects of database administration work are:

• Deciding on the storage structures and access methods
• Selecting database software and hardware
• Designing restart and recovery procedures to take care of system outages or crashes
• Designing means of reconstructing data in the event of abnormal loss of the same
• Designing schema
• Designing the means of reorganizing or tuning databases periodically
• Designing database searching strategies
• Designing authorization checks and validation procedures
• Specifying techniques for monitoring database performance

The operations-management aspect of database administration deals with data problems
arising on a day-to-day basis. Specifically, the responsibilities include:

• Investigation of errors found in the data
• Supervision of restart and recovery procedures in the event of a failure
• Supervision of reorganization of databases
• Initiation and control of all periodic dumps of data

In addition, this aspect of database administration includes maintenance of data
security, which involves maintaining security authorization tables, conducting
periodic security audits and investigating all known security breaches.

To carry out all these functions, it is crucial that the DBA has accurate
information about the company’s data readily on hand. For this purpose the DBA
maintains a data dictionary. The data dictionary contains definitions of all data items
and structures, the various schemas, the relevant authorization and validation checks
and the different mapping definitions. It should also have information about the
source and destination of a data item and the flow of a data item as it is used by a
system. This type of information is a great help to the DBA in maintaining centralized
control of data.

Components of DBMS
The main components of DBMS are:

Query language and Data Description Language (DDL) - give users access to the
database.

Query processor - translates statements in a query language (or DML) into
low-level instructions that the database manager understands.

Database manager - provides the interface between the low-level data stored in the
database and the application programs and queries submitted to the system.

File manager - manages the allocation of space on disk storage and the data
structures used to represent information stored on the disk.

Physical database - the collection of data held on disk drives.

Metadata - describes the data, so that the DBMS can give users access to it.


The above listing of DBMS components is not exhaustive. A complete DBMS also includes
some very important components, such as the concurrency controller and the recovery
manager. These components have not been shown, to keep the architecture relatively
simple.

Data Model
One fundamental characteristic of the database approach is that it provides some
level of data abstraction by hiding details of data storage that are not needed by
most database users. A data model is the main tool for providing this abstraction.

A data model is a set of concepts that can be used to describe the structure of a
database. It is a collection of high-level data description constructs that hide many
low-level storage details.

Categories of Data Models


Many data models have been proposed. We can categorize data models based on the
types of concepts they provide to describe the data structure.

High-level or conceptual data models: Provide concepts that are close to the
way many users perceive data. They use concepts such as entities, attributes and
relationships, where:
• An entity represents a real-world object (e.g., student, employee) or concept
(e.g., course, company);
• An attribute represents a property that describes an object (e.g., color, name);
• A relationship represents an interaction or link among entities (e.g., works-on,
is-a, has, etc.).
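The three concepts can be pictured with a small Python sketch. The classes `Employee`, `Project` and `WorksOn` and their sample values are hypothetical; they only illustrate the vocabulary and are not a design method.

```python
from dataclasses import dataclass

@dataclass
class Employee:          # entity: a real-world object
    name: str            # attribute: a property describing the object

@dataclass
class Project:           # entity: a concept
    title: str           # attribute

@dataclass
class WorksOn:           # relationship: a link between two entities
    employee: Employee
    project: Project

asha = Employee(name="Asha")
catalog = Project(title="Library Catalog")
assignment = WorksOn(employee=asha, project=catalog)
print(assignment.employee.name, "works on", assignment.project.title)
```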


Low-level or physical data models: Provide concepts that describe the details of
how data is stored in the computer. Concepts provided by low-level data models are
generally meant for computer specialists, not for typical end users. They represent
information such as record formats, record orderings and access paths (structures
that make the search for particular database records efficient, e.g., indexes).
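An access path can be observed directly in a small experiment. The sketch assumes SQLite through Python's sqlite3 module and its `EXPLAIN QUERY PLAN` statement; the exact wording of the plan text varies between SQLite versions, so the code only checks whether the index name appears. The `staff` table, the index name and the generated rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (sno INTEGER, lname TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [(i, f"Name{i}", 1000.0 + i) for i in range(1000)],
)

query = "SELECT sno FROM staff WHERE lname = 'Name500'"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_before = plan(query)                     # no access path: full scan
conn.execute("CREATE INDEX idx_staff_lname ON staff (lname)")
plan_after = plan(query)                      # the index is now available
result = conn.execute(query).fetchall()

print(plan_before)
print(plan_after)
```

The query text is identical before and after; only the way the DBMS reaches the record changes.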

Representational or implementation data models: Between the above two extremes
is a class of representational (or implementation) data models, which provide
concepts that may be understood by end users, but that are not too far removed
from the way data is organized within the computer. Representational data models
hide some details of data storage, but can be implemented on a computer system in
a direct way.

The three important characteristics of the database approach are:

(a) Insulation of programs and data (program-data and program-operation
independence).
(b) Support of multiple user views.
(c) Use of a catalog to store the database description.

The three-schema architecture was proposed to achieve these characteristics.

Architecture
The goal of the three-schema architecture is to separate the user applications from
the physical database. The three levels of the architecture are:

Internal Level
The internal level is the one closest to the physical storage, i.e., it is the one
concerned with the way data is physically stored. The internal (or physical) database
is stored on secondary storage devices, mainly magnetic disks. The physical database
can itself be viewed at different levels of abstraction.

At its lowest level, it is stored in the form of bits with the associated physical
addresses on the secondary storage device.

At its highest level, it can be viewed in the form of files and simple data structures. It
is this level that we shall study when we discuss the physical organization for
databases in later chapters.
The physical database is described by means of a physical schema or an internal
schema. It essentially describes the various types of stored records, the different
indexes that are employed for accessing these and the representations for different
stored fields. It is also called the “storage structure definition”.


External Level
The external level is the one closest to the user, i.e., it is the one concerned with
the way data is viewed by individual users. The external model (or view) is
application-specific. Therefore, the user views the database through an external
model, and there are as many external views as there are applications. External
views are the proper interface between the user and the database as an individual
user can hardly be expected to be interested in the entire database.

Generally, an external model consists of multiple occurrences of multiple types of
external record. An example of an external record is the record of a file as defined in
the data division of a COBOL program.

Each external model is defined by means of an external schema, which describes
each external record type in the external model.

The external model is derived from the conceptual model. For this purpose, the
correspondence between a particular external model and the conceptual model has to be
defined. An external/conceptual mapping, similar to the conceptual/physical mapping,
does this. However, a separate mapping has to be defined for each external view.

An explicit definition of the mapping should be documented, preferably in the
corresponding external schema.

The user interacts with the database through a high-level language such as COBOL,
PL/I or some special purpose language. This language is known as the host
language. It includes a data sub-language (DSL). The user carries out the retrieval
and storage operations on the database through the DSL.

In fact, the Database Task Group (DBTG) report published in April 1971, contains
proposals for three distinct languages, two of which relate very closely to the concept
of a DSL. These are sub schema Data Description Language (DDL) and Data
Manipulation Language (DML). The sub schema DDL is used for defining the external
views while the DML is used for carrying out operations on the database.

In addition to the languages, the user is also supposed to be provided with a
workspace. This workspace is an area meant for receiving or transmitting all data
transferred between the user and the database. This is simply the input-output area
for a program.

Conceptual Level
The conceptual level is a level of indirection between the other two. The conceptual
model, also called the data model, represents the information content of the database
in its entirety, but is abstract with respect to the physical database. Broadly
speaking, the conceptual model provides a view of the data as it really is.

This model consists of multiple occurrences of multiple types of conceptual record.
A conceptual record represents relevant information content only. In this sense, it is
much closer to the external record than a stored record is.


However, it is not the same as the external record. It contains all the information
needed to build the relevant external records. For example, a conceptual stock record
may consist of the quantity of material and the buying rate but not its value, yet the
user’s external record may include the value of the stock. A conceptual model may
consist of occurrences of such stock records, a collection of supplier record
occurrences and a collection of assembly records.
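The stock example can be sketched as follows, assuming Python's built-in sqlite3 module as a stand-in DBMS. The table `stock`, the view `stock_value` and the figures (40 units at a rate of 12.5) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual stock record: quantity and buying rate, but no value.
    CREATE TABLE stock (material TEXT, quantity INTEGER, buying_rate REAL);
    INSERT INTO stock VALUES ('steel rod', 40, 12.5);

    -- External record: the value is derived when requested, not stored.
    CREATE VIEW stock_value AS
        SELECT material, quantity * buying_rate AS value FROM stock;
""")

external_record = conn.execute("SELECT material, value FROM stock_value").fetchone()
print(external_record)  # ('steel rod', 500.0)
```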
Obviously, the conceptual model is derived from the physical model. For this, the
database needs a conceptual/physical mapping, which specifies how conceptual
records and fields map into their counterparts in the physical database. The
conceptual database is described by means of a conceptual schema. Needless to say,
the conceptual schema is independent of the physical characteristics of data, such as
storage structures, physical sequences, stored field representations etc. Ideally the
conceptual schema should include many features in addition to just the definitions of
conceptual records. These may include relevant authorization checks and validation
procedures, the uses of data, the source and destination of data etc.
The conceptual database is a real-world view of data from the organization’s point of
view. As the real world changes, changes have to be made to the conceptual
database and schema as well. In such a case, it is usually possible to limit the
corresponding changes to only those external schemas that use the conceptual
elements that are changed.
There will be many distinct external views, each consisting of a more or less abstract
representation of some portion of the total database, and there will be one
conceptual view, consisting of a similarly abstract representation of the database in
its entirety. Likewise, there will be precisely one internal view, representing the total
database as physically stored.
[Figure: the three levels of the architecture. External level (individual user views);
Conceptual level (community user view); Internal level (storage view); Database]


Database Architecture

An example for the three levels is as shown:

External View 1: SNo, LName, BranchNo

External View 2: SNo, FName, LName, Age, Salary

Conceptual Model: SNo, FName, LName, Age, Salary, BranchNo

Internal View:

-- Definition of the STAFF records
CREATE TABLE STAFF
(
    SNo      NUMBER(3),
    FName    VARCHAR2(20),
    LName    VARCHAR2(20),
    Age      NUMBER,
    Salary   NUMBER(8,2),
    BranchNo NUMBER(6)
);
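The two external views in this example could be expressed as views over the STAFF table. The sketch below assumes SQLite through Python's sqlite3 module rather than the Oracle-style types used above, so the column types differ; the view names and the sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (
        sno INTEGER, fname TEXT, lname TEXT,
        age INTEGER, salary REAL, branchno INTEGER
    );
    INSERT INTO staff VALUES (7, 'Asha', 'Rao', 30, 4200.0, 12);

    -- Each external view exposes only the fields its users need.
    CREATE VIEW external_view_1 AS SELECT sno, lname, branchno FROM staff;
    CREATE VIEW external_view_2 AS SELECT sno, fname, lname, age, salary FROM staff;
""")

def column_names(relation):
    # cursor.description carries one entry per result column; item 0 is its name.
    cursor = conn.execute(f"SELECT * FROM {relation}")
    return [entry[0] for entry in cursor.description]

view_1_columns = column_names("external_view_1")
view_2_columns = column_names("external_view_2")
print(view_1_columns)
print(view_2_columns)
```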

Schema
A description of data in terms of a data model is called a schema. The description of
a database is called the database schema, which is specified during database design
and is not expected to change frequently.

The Internal View/ Schema:


The internal view (or stored database) is a low-level representation of the entire
database. The internal view is defined by the internal schema, which defines the
various stored record types and specifies what indexes exist, how stored fields are
represented, what physical sequence the stored records are in, and so on.

The Conceptual View / Schema:

The conceptual view is a representation of the entire content of the database, in a
form that is more or less abstract in comparison with the way in which the data is
physically stored. The conceptual view is defined by means of the conceptual
schema, which includes definitions of each of the various conceptual record types.

The External View / Schema:


Each external view is defined by means of an external schema. An external schema
consists of definitions of each of the various external record types in that external
view. There must be a definition of the mapping between the external schema and
the underlying conceptual schema.

Data Independence
Data independence refers to changing the schema at one level of a database system
without the need to change the schema at the next higher level.

The three-level database architecture allows a clear separation of the information
meaning (conceptual view) from the external data representation and from the
physical data structure layout.
A database system that is able to separate the three different views of data is likely
to be flexible and adaptable. This flexibility and adaptability is data independence.
Physical data independence: The separation of the conceptual view from the
internal view enables us to provide a logical description of the database without the
need to specify physical structures. This is often called physical data independence.

Logical data independence: Separating the external views from the conceptual
view enables us to change the conceptual view without affecting the external views.
This separation is sometimes called logical data independence.
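Both kinds of independence can be tried out in a small sketch, assuming SQLite through Python's sqlite3 module. The table `staff`, the view `branch_staff`, the index name and the sample row are hypothetical. The same external query is run three times: before any change, after a physical change and after a conceptual change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (sno INTEGER, lname TEXT, branchno INTEGER);
    INSERT INTO staff VALUES (7, 'Rao', 12);
    CREATE VIEW branch_staff AS SELECT sno, lname, branchno FROM staff;
""")

external_query = "SELECT sno, lname, branchno FROM branch_staff"
before = conn.execute(external_query).fetchall()

# Physical change: add an access path. The conceptual schema is untouched.
conn.execute("CREATE INDEX idx_staff_branch ON staff (branchno)")
after_physical_change = conn.execute(external_query).fetchall()

# Conceptual change: add a field. The external view is untouched.
conn.execute("ALTER TABLE staff ADD COLUMN salary REAL")
after_conceptual_change = conn.execute(external_query).fetchall()

print(before, after_physical_change, after_conceptual_change)
```

The program using `branch_staff` needs no modification in either case, which is the practical meaning of data independence.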

Types of Database Models


The most well known record-based models are the hierarchical model, the network
model and the relational model.

Hierarchical Model: This model represents data as a hierarchical tree. It is a
special kind of network model in which the relationships form a tree-like
structure, where one parent may have many children, but a child cannot have
more than one parent.
The relationship between a borrower and books in a library system satisfies this
condition. One of the popular DBMS products based on the hierarchical model is the
Information Management System (IMS) from IBM.
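The one-parent rule can be sketched in Python. This is a toy illustration of the constraint, not of how IMS stores data; the class `Node` and the borrower and book names are hypothetical.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.parent = None      # a record has at most one parent
        self.children = []      # a record may have many children

    def add_child(self, child):
        if child.parent is not None:
            raise ValueError(f"{child.name} already has a parent")
        child.parent = self
        self.children.append(child)

borrower_a = Node("Borrower A")
borrower_b = Node("Borrower B")
book_1 = Node("Book 1")
book_2 = Node("Book 2")

# One parent, many children.
borrower_a.add_child(book_1)
borrower_a.add_child(book_2)

# A second parent for the same child is refused.
try:
    borrower_b.add_child(book_1)
    second_parent_allowed = True
except ValueError:
    second_parent_allowed = False

print([child.name for child in borrower_a.children], second_parent_allowed)
```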

[Figure: Hierarchical model]

Network model: This model represents data as record types. Here, we have
explicit linkages (expressed in the form of pointers), which relate various records.
Each record has a link field corresponding to every relationship that it participates in.
IDS (Integrated Data Store) is one of the DBMS products based on the network model.
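Link fields can be pictured with a small Python sketch in which object references play the role of pointers. It is a toy illustration, not the storage format of IDS; the class `Record`, the relationship names and the sample records are hypothetical.

```python
class Record:
    def __init__(self, record_type, data):
        self.record_type = record_type
        self.data = data
        self.links = {}   # one link field per relationship the record takes part in

    def link(self, relationship, other):
        self.links.setdefault(relationship, []).append(other)

supplier_1 = Record("Supplier", {"name": "Supplier 1"})
supplier_2 = Record("Supplier", {"name": "Supplier 2"})
book = Record("Book", {"title": "Sample Title"})

# Unlike a tree, one book record may be linked from several supplier records.
supplier_1.link("supplies", book)
supplier_2.link("supplies", book)
book.link("supplied_by", supplier_1)
book.link("supplied_by", supplier_2)

suppliers_of_book = [record.data["name"] for record in book.links["supplied_by"]]
print(suppliers_of_book)
```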

[Figure: Network model]

Relational model: In this model, each database item is viewed as a record with
attributes. A set of records with similar attributes is called a table. Most of the
popular commercial DBMS products like Oracle, Sybase, MySQL, etc. are based on
relational model.

[Figure: Relational model]

Object Relational model

Hierarchical, network and relational database models have been quite successful in
storing data for traditional business applications. Object-oriented databases, however,
evolved to handle more complex applications such as databases for scientific
experiments, geographic information systems, engineering design and manufacturing.
An object-oriented database stores data, their relationships and the way they interact
with other data. This model draws its concepts from real-world objects. The relational
database approach deals with data at the lowest level, that is, columns and rows,
whereas the object-oriented approach deals with data at a higher level,
that is, with the objects surrounding the data. This model represents the database in
terms of objects, their attributes and their behaviors.
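An object with attributes and behaviors can be sketched in Python. The class `Book` and its methods are hypothetical and only illustrate the idea of keeping data and behavior together; they do not show how an object-oriented DBMS persists objects.

```python
class Book:
    def __init__(self, title):
        self.title = title        # attribute
        self.on_loan = False      # attribute

    def issue(self):              # behavior: the book knows how it is issued
        if self.on_loan:
            raise ValueError(f"{self.title!r} is already on loan")
        self.on_loan = True

    def take_back(self):          # behavior: the book knows how it is returned
        self.on_loan = False

book = Book("Sample Title")
book.issue()
status_after_issue = book.on_loan
book.take_back()
status_after_return = book.on_loan
print(status_after_issue, status_after_return)
```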


Summary

In this chapter, we have discussed:

• What a database is
• File system vs database approach
• Benefits of the database approach
• What a DBMS is
• Various functions of a DBMS
• Database system
• Components of a DBMS
• Data models and their types
• Database architecture
• Role of a DBA
• Types of database models



Introduction to Relational Databases (RDBMS)

Chapter 2

Introduction to Relational Databases (RDBMS)

Evolution of RDBMS
What is a Relational Database?
What is an RDBMS?
Features of RDBMS
Basic Relational Database Terminology
Keys and their Use
Referential Integrity


Objectives

In this chapter, we will discuss:

• Evolution of RDBMS
• Relational Database
• Relational Database Management System (RDBMS)
• Features of an RDBMS
• Important terms related to RDBMS
• Different types of keys and their use
• Referential integrity


RDBMS
Dr. E. F. Codd outlined the principles of the relational model, which formed the basis
for the evolution of the Relational Database Management System.
A relational database is defined as a collection of tables related to each other
through common values; a Relational Database Management System is the software
that manages such a database.

Evolution of RDBMS
Before the acceptance of Codd’s relational model, database management systems
were largely ad hoc: each was designed to solve a particular type of problem and was
later extended to serve more general purposes. This led to complex systems, which
were difficult to understand, install, maintain and use. These database systems were
plagued with the following problems:

• They required large budgets and staffs of people with special skills that were
in short supply.

• Database administration staff and application developers required prior
preparation to access these database systems.

• End-user access to the data was rarely provided.

• These database systems did not support the implementation of business logic
as a DBMS responsibility.

Hence, the objective of developing the relational model was to address every one of
the shortcomings that plagued the systems that existed at the end of the 1960s,
and to make DBMS products more widely appealing to all kinds of users.
Today's relational database management systems offer powerful, yet simple
solutions for a wide variety of commercial and scientific application problems. Almost
every industry uses relational systems to store, update and retrieve data for
operational, transactional and decision support systems.

What is a Relational Database?


A relational database is a database system in which the database is organized and
accessed according to the relationships between data items, without the need for any
consideration of how the data is physically stored. Relationships between data
items are expressed by means of tables.

It is a tool that helps you store, manage and disseminate information of various
kinds. It is a collection of inter-related objects, such as tables, queries, forms,
reports and macros, all stored together in the computer.
It is a method of structuring data in the form of records, so that relations between
different entities and attributes can be used for data access and transformation.


What is a Relational Database Management System?


A Relational Database Management System (RDBMS) is a system that allows us to
perceive data as tables (and nothing but tables) and that places the operators
necessary to manipulate that data at the user’s disposal.

Features of an RDBMS
The features of an RDBMS are as follows:
• The ability to create multiple relations (tables) and enter data into them
• An interactive query language
• Retrieval of information stored in more than one table
• A catalog or dictionary, which itself consists of tables (called system tables)

Basic Relational Database Terminology

Catalog:

A catalog consists of all the information of the various schemas (external, conceptual
and internal) and also all of the corresponding mappings (external/conceptual,
conceptual/internal).

It contains detailed information regarding the various objects that are of interest to
the system itself; e.g., tables, views, indexes, users, integrity rules, security rules,
etc.

In a relational database, the entities of the ERD are represented as tables and their
attributes as the columns of their respective tables in a database schema.

Some important terms of the relational model are:

• Table: Tables are the basic storage structures of a database, where data about
something in the real world is stored. A table is also called a relation or an entity.

• Row: A row represents the collection of data recorded for one particular instance
of an entity. To identify each row uniquely there should be a unique identifier,
called the primary key, which allows no duplicate rows. For example, in a library
every member is unique and is therefore given a membership number, which
uniquely identifies that member. A row is also called a record or a tuple.

• Column: Columns represent characteristics or attributes of an entity. Each
attribute maps onto a column of a table. Hence, a column is also known as an
attribute.


• Relationship: Relationships represent a logical link between two tables. A
relationship is represented by a foreign key column.

• Degree: The number of attributes (columns) in a table.

• Cardinality: The number of tuples (rows) in a table.

• Domain: An attribute of an entity has a particular value. The set of possible values
that a given attribute can have is called its domain. For example, the domain of the
attribute EMPLOYEE.id is the set of positive integers of 5 digits.
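
The terms degree, cardinality and domain can be tried out on a small table. The following is a minimal sketch, assuming Python's built-in sqlite3 module; the employee table, its columns and its rows are invented for illustration, and the CHECK clause is one possible way to model the 5-digit domain of EMPLOYEE.id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical EMPLOYEE table; the CHECK clause models the domain of id
# (positive integers of 5 digits).
conn.execute("""
    CREATE TABLE employee (
        id         INTEGER PRIMARY KEY CHECK (id BETWEEN 10000 AND 99999),
        name       TEXT NOT NULL,
        department TEXT
    )
""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(10001, "Smith", "Sales"), (10002, "Jones", "Sales"), (10003, "Ally", "Accounts")],
)

# Degree: number of attributes (columns) of the relation.
degree = len(conn.execute("SELECT * FROM employee").description)
# Cardinality: number of tuples (rows) of the relation.
cardinality = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

# A value outside the domain is rejected by the CHECK constraint.
try:
    conn.execute("INSERT INTO employee VALUES (42, 'Rita', 'Sales')")
    domain_enforced = False
except sqlite3.IntegrityError:
    domain_enforced = True

print(degree, cardinality, domain_enforced)
```

Here the table has degree 3 and cardinality 3, and the row with id 42 is refused because 42 lies outside the declared domain.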

Keys and Their Use


Key: An attribute or set of attributes whose values uniquely identify each entity in
an entity set is called a key for that entity set.

Super Key: If we add additional attributes to a key, the resulting combination
would still uniquely identify an instance of the entity set. Such augmented keys are
called super keys.

Primary Key: It is a minimal super key, that is, one from which no attribute can be
removed without losing uniqueness, chosen to identify the rows of a table.

It is a unique identifier for the table (a column or a column combination with the
property that at any given time no two rows of the table contain the same value in
that column or column combination).

Candidate Key: There may be two or more attributes or combinations of attributes
that uniquely identify an instance of an entity set. These attributes or combinations
of attributes are called candidate keys.

In such a case, we must decide which of the candidate keys will be used as the
primary key. The remaining candidate keys would be considered alternate keys.

Secondary Key: A secondary key is an attribute or combination of attributes that
may not be a candidate key, but that classifies the entity set on a particular
characteristic.

A case in point is the entity set EMPLOYEE having the attribute department, which
identifies by its value all instances of EMPLOYEE who belong to a given department.

Any key consisting of a single attribute is called a simple key, while that consisting
of a combination of attributes is called a composite key.
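
The difference between super keys and candidate keys can be checked mechanically on sample data. The sketch below is illustrative only: the EMPLOYEE rows and attribute names are invented, and a key is really a property of the schema (of every allowable instance), so a test on one sample can rule a key out but cannot prove it.

```python
from itertools import combinations

# Hypothetical EMPLOYEE rows; attribute names and values are assumptions.
rows = [
    {"emp_id": 1, "email": "smith1@example.com", "name": "Smith", "department": "Sales"},
    {"emp_id": 2, "email": "jones@example.com", "name": "Jones", "department": "Sales"},
    {"emp_id": 3, "email": "smith2@example.com", "name": "Smith", "department": "Accounts"},
    {"emp_id": 4, "email": "smith3@example.com", "name": "Smith", "department": "Sales"},
]

def is_super_key(attrs, rows):
    """True if no two rows agree on all of the given attributes."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def candidate_keys(rows):
    """Minimal super keys: super keys that contain no smaller key."""
    attributes = sorted(rows[0])
    found = []
    # Visiting smaller combinations first guarantees minimality.
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            contains_smaller_key = any(set(k) <= set(attrs) for k in found)
            if not contains_smaller_key and is_super_key(attrs, rows):
                found.append(attrs)
    return found

keys = candidate_keys(rows)
print(keys)
```

In this sample, emp_id and email are both candidate keys (one would be chosen as the primary key, the other becomes an alternate key), (emp_id, name) is a super key but not minimal, and department is only a secondary key.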


Referential Integrity
Referential integrity can be defined as an integrity constraint that specifies that the
value (or existence) of an attribute in one relation depends on the value (or
existence) of an attribute in the same or another relation.

Referential integrity in a relational database is consistency between coupled tables. It
is usually enforced by the combination of a primary key and a foreign key.

For referential integrity to hold, any field in a table that is declared a foreign key can
contain only values from a parent table's primary key field. For instance, deleting a
record that contains a value referred to by a foreign key in another table would break
referential integrity.

Course (Course code is the primary key)

Course code    Course Name
E01            ELECTRONICS
M02            MATHS
A03            ACCOUNTS
B04            BIOLOGY

Student (Course code is a foreign key referring to Course)

Student No    Name     Course code
101           Annie    B04
102           Julie    E01
103           Rita     A03
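
The Course and Student tables above can be used to see referential integrity at work. This is a minimal sketch, assuming Python's built-in sqlite3 module (SQLite checks foreign keys only after PRAGMA foreign_keys = ON); the column names are adapted from the tables, and student 104 and course code C05 are invented for the test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE course (
        course_code TEXT PRIMARY KEY,
        course_name TEXT NOT NULL
    );
    CREATE TABLE student (
        student_no  INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        course_code TEXT REFERENCES course (course_code)
    );
    INSERT INTO course VALUES ('E01', 'ELECTRONICS'), ('M02', 'MATHS'),
                              ('A03', 'ACCOUNTS'), ('B04', 'BIOLOGY');
    INSERT INTO student VALUES (101, 'Annie', 'B04'), (102, 'Julie', 'E01'),
                               (103, 'Rita', 'A03');
""")

def rejected(sql):
    """Run one statement and report whether the DBMS refused it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# A student cannot refer to a course code that does not exist in course.
bad_insert = rejected("INSERT INTO student VALUES (104, 'Maya', 'C05')")
# A course that a student still refers to cannot be deleted.
bad_delete = rejected("DELETE FROM course WHERE course_code = 'B04'")
# A course that no student refers to can be deleted.
free_delete = rejected("DELETE FROM course WHERE course_code = 'M02'")
print(bad_insert, bad_delete, free_delete)
```

The first two statements are refused because they would leave a Student row pointing at a missing Course row; deleting MATHS is accepted because no student refers to it.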


Summary

In this chapter, we have discussed:

• Evolution of RDBMS
• Relational Database
• Relational Database Management System (RDBMS)
• Features of an RDBMS
• Important terms related to RDBMS
• Different types of keys and their use
• Referential Integrity



Conceptual Design Using the Entity – Relationship Model

Chapter 3

Conceptual Design Using the Entity-Relationship Model

Overview of Database Design


E-R Modeling
Degree of Relationship
Cardinality
Keys
E-R Model example
Constraints on E-R Model
ISA Hierarchies
Aggregation
Conceptual Design using E-R Model
Constraints beyond E-R Model


Objectives

In this chapter, we will discuss:

• Process of designing a database
• Components of an E-R model
• Drawing E-R diagrams
• Designing E-R diagrams with key constraints
• Aggregation
• Conceptual design using the E-R model
• Constraints beyond the E-R model


Overview of Database Design

The database design process comprises the following steps:

• Requirement Analysis
• Conceptual Design (ER Model is used at this stage)
• Logical Design
• Schema Refinement (Normalization)
• Physical Database Design and Tuning

• Requirement Collection & Analysis: The database designers interview
prospective database users to understand and document their data requirements.
The result of this step is a concisely written set of user requirements.
The user-defined operations that will be applied to the database are specified
as well; they include both retrievals and updates.

• Conceptual Design: It is a concise description of the data requirements of the
users and includes detailed descriptions of the entity types, relationships and
constraints. They are expressed using the concepts provided by the high level
data model.

• Logical Design: The conceptual design is mapped to the data model of the chosen
system (RDBMS / DBMS / Object Model).

• Schema Refinement (Normalization): Check the relational schema for
redundancies and related anomalies.

• Physical Design: Here, the internal storage structures, access paths and file
organizations for the database files are specified. In parallel with these activities,
application programs are designed and implemented as database transactions
corresponding to the high level specifications.

E-R Modeling
The Entity-Relationship model (ER Model in short) is a graphical design tool used in
the design of database systems. It provides a common, informal and convenient model
for communication between users and the DBA for the purpose of modeling the
structure of data.

The following components are used in developing an E-R Model:

• Entity
• Entity Set
• Instance
• Attribute
• Relationship
• Cardinality
• Keys


Entity: An entity is anything that exists and is distinguishable. For example, each
chair is an entity. So is each person and each automobile. Entities can have concrete
existence or constitute ideas or concepts. Concepts like love and hate are entities.

Entities can be classified as Regular entities and Weak entities.

A regular (independent) entity does not depend on any other entity for its
existence. For example, Employee is a regular entity. A regular entity is depicted
using a rectangle.

Figure: The regular entity Employees drawn as a rectangle (two alternative notations)

An entity whose existence depends on the existence of another entity is called a
weak (or dependent) entity. For example, the dependent of an employee is a
weak entity, whose existence depends on the entity Employee. A dependent entity is
depicted in a double-lined box or a darkened rectangle.

During the design phase, each entity is mapped to a table.

Entity Set: A group of similar entities forms an entity set.

Examples of entity sets are:

• All persons
• All automobiles
• All emotions

Instance: A specific occurrence of an entity is called an instance.

Example: Smith, Jones and Ally are all instances of the entity Employee.

Attributes: Attributes are the properties that characterize an entity set.

For example, employees of an organization are modeled by the entity set
EMPLOYEE. We must include in the model the properties of the employees that may
be useful to the organization. Some of these properties are name, address, skill, etc.

An attribute is denoted by an ellipse with its type written inside, attached to
its entity.


Figure: An attribute ellipse labeled with its type (Name)

An attribute is attached to its entity in the following manner.

Figure: The attribute Name attached to the entity Employees

During the design phase, each attribute is mapped to a column of a table.

Relationship: It is an association between two or more entities, which may belong
to different entity sets or to the same entity set.

For example, we may have the relationship that an employee works in a
department.

The same entity set can participate in different relationship sets, or in different
“roles” in the same relationship set.

A relationship is depicted by a diamond containing the name of the relationship type.

A relationship can be of the following types:

• Strong relationship: A strong relationship can have attributes.

Figure: Symbol for a strong relationship (a diamond labeled Type)

• Weak relationship: A weak relationship cannot have any attributes.

Figure: Symbols for a weak relationship (two alternative notations)

Degree of Relationship:

The number of participating entities in a relationship is known as the degree of the
relationship.


According to the degree of relationship, there can be four types of relationships:

• Unary Relationship
• Binary Relationship
• Ternary Relationship
• N-ary Relationship

Unary Relationship:

A relationship in which only one entity participates, in more than one role, is called
a unary relationship.

Example: Employee manages Employee.

Binary Relationship:

A relationship in which two entities participate is called a binary relationship.

Example: Manager manages Employee.

Ternary Relationship:

A relationship in which three entity types are involved is called a ternary relationship.

Example: The relationship sell among Sales Assistant, Product and Customer.


N-ary Relationship:

An n-ary relationship set R relates n entity sets E1, ..., En; each relationship in R
involves entities e1 ∈ E1, ..., en ∈ En.

Cardinality: It defines the numeric relationship between occurrences of the entities
on either end of the relationship line.

Relationships can be classified into three types based on cardinality:

1. One-to-one: One student is issued only one card (and vice versa).

Figure: Student (1) Issued (1) Card

2. One-to-many (or many-to-one): One student can enroll for only one course, but
one course can be offered to many students.

Figure: Student enroll Course, in Chen notation (with 1 and m labels) and in crow’s foot notation

3. Many-to-many: One student can take many tests, and one test can be taken by
many students.

Figure: Student Write Test


Keys:
Data items used to uniquely identify individual occurrences of an entity type.

Candidate Keys:

A candidate key is a set of attributes used to uniquely identify individual occurrences
of an entity type. Each table may have more than one candidate key.

Primary Key:

One of the candidate keys is selected to be the primary key.

Composite key:

A candidate key with more than one attribute is called a composite key.

E-R Model Example:

Let us now see how the E-R model is applied using the notations discussed above.
Consider an employee who works in a department, and whose details stored in the
database include id, name, department name, department id, etc.

Figure: E-R diagram of Employees, Works_in and Department

In the above figure, we show the relationship set Works_in, in which each
relationship indicates a department in which an employee works. The entities are
described by a set of attributes and identified by primary keys, which are underlined.


Entities used in the above diagram are:

Entity Name: Employees
Attributes: Ssn, Name, Lot
Primary Key: Ssn

Entity Name: Department
Attributes: Did, Dname, Budget
Primary Key: Did

The entity sets that participate in a relationship set need not be distinct; sometimes
a relationship might involve two entities in the same entity set. For example, in the
Reports_To relationship set, every relationship is of the form (emp1, emp2), where
both emp1 and emp2 are employees.

The Works_In relationship shows that an employee can work in many departments and a
department can have many employees.

Relationship sets can also have descriptive attributes (e.g., the since attribute of
Works_In).

A relationship must be uniquely identified by the participating entities, without
reference to the descriptive attributes. In the Works_In relationship set, for example,
each Works_In relationship must be uniquely identified by the combination of
employee ssn and department did. Thus, for a given employee-department pair, we
cannot have more than one associated since value.

Thus, in translating a relationship set to a relation, the attributes of the relation must
include:

• Keys for each participating entity set (as foreign keys). This set of attributes
forms a superkey for the relation.

• All descriptive attributes
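
The translation rule above can be sketched for Works_In. This is an illustrative example only, assuming Python's built-in sqlite3 module; the table definitions follow the attributes named in this chapter, while the data types and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Employees (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER);
    CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname CHAR(20), budget REAL);
    -- Keys of both participating entity sets plus the descriptive attribute since.
    CREATE TABLE Works_In (
        ssn   CHAR(11),
        did   INTEGER,
        since DATE,
        PRIMARY KEY (ssn, did),
        FOREIGN KEY (ssn) REFERENCES Employees,
        FOREIGN KEY (did) REFERENCES Departments
    );
    -- Sample rows are invented for illustration.
    INSERT INTO Employees VALUES ('123-22-3666', 'Smith', 48);
    INSERT INTO Departments VALUES (51, 'Sales', 100000), (56, 'Accounts', 50000);
    INSERT INTO Works_In VALUES ('123-22-3666', 51, '2008-01-01');
    INSERT INTO Works_In VALUES ('123-22-3666', 56, '2009-03-01');
""")

# A second since value for the same employee-department pair is refused.
try:
    conn.execute("INSERT INTO Works_In VALUES ('123-22-3666', 51, '2009-06-01')")
    second_since_allowed = True
except sqlite3.IntegrityError:
    second_since_allowed = False

departments_for_smith = conn.execute(
    "SELECT COUNT(*) FROM Works_In WHERE ssn = '123-22-3666'"
).fetchone()[0]
print(second_since_allowed, departments_for_smith)
```

The employee can work in two departments (many-to-many), but the composite key (ssn, did) allows only one since value per employee-department pair.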


Constraints on E-R model

Key Constraints:
A Key constraint between an entity set S and a relationship set restricts instances of
the relationship set by requiring that each entity of S participate in at most one
relationship.

Look at an example:

Consider the relationship Manages: each department has at most one manager,
according to the key constraint on the ‘Manages’ relationship. The arrow from
Department to Manages indicates that each Department entity appears in at most
one ‘Manages’ relationship in any allowable instance of ‘Manages’. Thus, given a
Department entity, we can uniquely determine the ‘Manages’ relationship in which it
appears.

Translating ER Diagrams with Key Constraints:

Map the relationship to a table. Note that did alone is the key now. Since each
department has a unique manager, we could instead combine ‘Manages’ and Departments
into a single table.

Manages table with the key constraint:

CREATE TABLE Manages (
    ssn CHAR(11),
    did INTEGER,
    since DATE,
    PRIMARY KEY (did),
    FOREIGN KEY (ssn) REFERENCES Employees,
    FOREIGN KEY (did) REFERENCES Departments)
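
The effect of making did the key of Manages can be tried out as follows. This is a minimal sketch, assuming Python's built-in sqlite3 module; the Employees and Departments definitions and all sample rows are assumptions added so that the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Employees (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER);
    CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname CHAR(20), budget REAL);
    CREATE TABLE Manages (
        ssn   CHAR(11),
        did   INTEGER,
        since DATE,
        PRIMARY KEY (did),
        FOREIGN KEY (ssn) REFERENCES Employees,
        FOREIGN KEY (did) REFERENCES Departments
    );
    -- Sample rows are invented for illustration.
    INSERT INTO Employees VALUES ('111-11-1111', 'Smith', 48), ('222-22-2222', 'Jones', 22);
    INSERT INTO Departments VALUES (51, 'Sales', 100000), (56, 'Accounts', 50000);
    INSERT INTO Manages VALUES ('111-11-1111', 51, '2008-01-01');
""")

def rejected(sql):
    """Run one statement and report whether the DBMS refused it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# A second manager for department 51 breaks the key constraint on did.
second_manager_rejected = rejected(
    "INSERT INTO Manages VALUES ('222-22-2222', 51, '2009-01-01')")
# The same employee may still manage another department.
second_department_rejected = rejected(
    "INSERT INTO Manages VALUES ('111-11-1111', 56, '2009-01-01')")
print(second_manager_rejected, second_department_rejected)
```

Because did is the whole primary key, each department appears in at most one Manages row, while an employee may appear in several.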


Key Constraints for Ternary Relationships

The following figure shows a relationship among employees, departments and
locations. Since three entities are involved in the relationship, with a key constraint on
the Employee entity, it is known as a key constraint for a ternary relationship.

In the figure, Ssn, Did and Address are the primary keys of the Employee entity,
the Department entity and the Location entity respectively.
An arrow drawn from the Employee entity indicates that an employee can work in
at most one department at a single location.

Participation Constraints:
The key constraint on ‘Manages’ tells us that a Department has at most one Manager
(indicated by an arrow).

The participation constraint specifies whether the existence of an entity depends on
its being related to another entity via the relationship type.

Participation constraints can be of two types:

• Total participation
• Partial participation

Total Participation constraint: Does every department have a manager? If so,
the participation of Department in Manages is said to be total.
Total participation is indicated by a dark (thick) line between the entity and the
relationship.

Partial Participation constraint: A participation that is not total is said to be
partial. For example, the participation of Employee in Manages is partial, because
not every employee manages a department.


In the above example, the participation of departments in Manages is total, whereas
the participation of employees in Manages is partial.

A participation constraint between an entity set S and a relationship set restricts
instances of the relationship set by requiring that each entity of S participate in at
least one relationship. Every did value in the Department table must appear in a row of
the Manages table (with a non-null ssn value!). Similarly, every ssn value in the
Employee table must appear in a row of the Works_in table.

Participation Constraints in SQL: We can capture participation constraints
involving one entity set in a binary relationship, but little else (without resorting to
CHECK constraints).

CREATE TABLE Dept_Mgr (
    did INTEGER,
    dname CHAR(20),
    budget REAL,
    ssn CHAR(11) NOT NULL,
    since DATE,
    PRIMARY KEY (did),
    FOREIGN KEY (ssn) REFERENCES Employees
        ON DELETE NO ACTION)
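
The two guarantees given by Dept_Mgr (every department has a manager, and a managing employee cannot simply disappear) can be checked with a short experiment. This is a sketch under assumptions: it uses Python's built-in sqlite3 module, and the Employees definition and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Employees (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER);
    CREATE TABLE Dept_Mgr (
        did    INTEGER,
        dname  CHAR(20),
        budget REAL,
        ssn    CHAR(11) NOT NULL,
        since  DATE,
        PRIMARY KEY (did),
        FOREIGN KEY (ssn) REFERENCES Employees ON DELETE NO ACTION
    );
    -- Sample rows are invented for illustration.
    INSERT INTO Employees VALUES ('111-11-1111', 'Smith', 48);
    INSERT INTO Dept_Mgr VALUES (51, 'Sales', 100000, '111-11-1111', '2008-01-01');
""")

def rejected(sql):
    """Run one statement and report whether the DBMS refused it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Total participation: a department without a manager is refused (NOT NULL).
no_manager_rejected = rejected(
    "INSERT INTO Dept_Mgr VALUES (56, 'Accounts', 50000, NULL, NULL)")
# NO ACTION: an employee who still manages a department cannot be deleted.
delete_manager_rejected = rejected(
    "DELETE FROM Employees WHERE ssn = '111-11-1111'")
print(no_manager_rejected, delete_manager_rejected)
```

Both statements are refused, which is how the NOT NULL constraint and the ON DELETE NO ACTION rule together express total participation of departments in Manages.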

Weak entity
A weak entity’s existence is dependent on another (owner) entity. Hence, a weak
entity will not have its own key. It can be identified uniquely only by considering the
primary key of its owner entity.

• The owner entity set and the weak entity set must participate in a one-to-many
relationship set (1 owner, many weak entities).

• The weak entity set must have total participation in this identifying relationship set.


Translating Weak Entity Sets: Weak entity set and identifying relationship set are
translated into a single table.

• When the owner entity is deleted, all owned weak entities must also be deleted.
For example: If the employee quits, any policy owned by the employee is
terminated. All the relevant policy and dependent information is also deleted from
the database.

To indicate that Dependent is a weak entity and Policy is its identifying relationship,
we draw both with dark lines.

CREATE TABLE Dep_Policy (
    pname CHAR(20),
    age INTEGER,
    cost REAL,
    ssn CHAR(11) NOT NULL,
    PRIMARY KEY (pname, ssn),
    FOREIGN KEY (ssn) REFERENCES Employees
        ON DELETE CASCADE)
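
The cascading delete for a weak entity set can be observed directly. The following is a minimal sketch, assuming Python's built-in sqlite3 module; the Employees definition and all sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Employees (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER);
    CREATE TABLE Dep_Policy (
        pname CHAR(20),
        age   INTEGER,
        cost  REAL,
        ssn   CHAR(11) NOT NULL,
        PRIMARY KEY (pname, ssn),
        FOREIGN KEY (ssn) REFERENCES Employees ON DELETE CASCADE
    );
    -- Sample rows are invented for illustration.
    INSERT INTO Employees VALUES ('111-11-1111', 'Smith', 48), ('222-22-2222', 'Jones', 22);
    -- The same dependent name may occur under different owners,
    -- because pname identifies a dependent only together with ssn.
    INSERT INTO Dep_Policy VALUES ('Joe', 7, 120.0, '111-11-1111'),
                                  ('Ann', 9, 150.0, '111-11-1111'),
                                  ('Joe', 5, 110.0, '222-22-2222');
""")

def dependents():
    """Return the (pname, ssn) pairs currently stored, in a stable order."""
    return conn.execute(
        "SELECT pname, ssn FROM Dep_Policy ORDER BY ssn, pname").fetchall()

before = dependents()
# The employee quits: all owned dependents and policies go with the owner.
conn.execute("DELETE FROM Employees WHERE ssn = '111-11-1111'")
after = dependents()
print(len(before), after)
```

After the owner is deleted, only the dependent of the other employee remains; no orphaned Dep_Policy rows are left behind.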

ISA (‘is a’) Hierarchies

It is the formation of a new entity set as a union of two or more entity sets. The
process is also known as generalization.
For example, an employee can be an hourly employee or a contract employee. The
attributes of the general entity set are inherited by the more specific ones.


ISA Constraints:
There are two types of ISA constraints:

Overlap constraints: Can Joe be an Hourly_Emp as well as a Contract_Emp
entity? (Allowed / disallowed)

Covering constraints: Does every Employee entity also have to be an Hourly_Emp
or a Contract_Emp entity? (Yes / no)

Reasons for using ISA:

• To add descriptive attributes specific to a subclass;
• To identify entities that participate in a relationship.

Translating ISA hierarchies to relations:

General approach:

Use 3 relations: Employee, Hourly_Emp and Contract_Emp.

Every employee is recorded in Employee. For hourly employees, extra information is
recorded in Hourly_Emp (hourly_wages, hours_worked, ssn); the Hourly_Emp tuple must
be deleted if the referenced Employee tuple is deleted.

Queries involving all employees are easy; those involving just Hourly_Emp require a join
to get some attributes.

Alternative:
• Use just Hourly_Emp and Contract_Emp.
• Hourly_Emp: ssn, name, lot, hourly_wages, hours_worked.
• Contract_Emp: ssn, name, lot, contractid.
• Each employee must be in one of these two subclasses.
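
The general (three-relation) approach to an ISA hierarchy can be sketched as follows. This is illustrative only, assuming Python's built-in sqlite3 module; the data types, the ON DELETE CASCADE rule chosen for the subclass tables and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Employee (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER);
    -- Each subclass table stores only its extra attributes plus the owner key.
    CREATE TABLE Hourly_Emp (
        ssn          CHAR(11) PRIMARY KEY,
        hourly_wages REAL,
        hours_worked INTEGER,
        FOREIGN KEY (ssn) REFERENCES Employee ON DELETE CASCADE
    );
    CREATE TABLE Contract_Emp (
        ssn        CHAR(11) PRIMARY KEY,
        contractid INTEGER,
        FOREIGN KEY (ssn) REFERENCES Employee ON DELETE CASCADE
    );
    -- Sample rows are invented for illustration.
    INSERT INTO Employee VALUES ('111-11-1111', 'Smith', 48), ('222-22-2222', 'Jones', 22);
    INSERT INTO Hourly_Emp VALUES ('111-11-1111', 12.5, 40);
    INSERT INTO Contract_Emp VALUES ('222-22-2222', 7001);
""")

# A query over all employees needs only the Employee table.
all_names = [row[0] for row in conn.execute("SELECT name FROM Employee ORDER BY name")]

# A query over hourly employees needs a join to reach the inherited attributes.
hourly = conn.execute("""
    SELECT e.name, e.lot, h.hourly_wages, h.hours_worked
    FROM Hourly_Emp h JOIN Employee e ON e.ssn = h.ssn
""").fetchall()

# Deleting the Employee tuple also removes the Hourly_Emp tuple.
conn.execute("DELETE FROM Employee WHERE ssn = '111-11-1111'")
hourly_left = conn.execute("SELECT COUNT(*) FROM Hourly_Emp").fetchone()[0]
print(all_names, hourly, hourly_left)
```

With the alternative (two-relation) design the join disappears, but name and lot would be stored in both subclass tables and a query over all employees would have to read both of them.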

Aggregation
Aggregation is meant to represent a relationship between a whole object and its
component parts. It is used when we have to model a relationship between entity
sets and a relationship set.

Aggregation allows us to treat a relationship set as an entity set for purposes of
participation in (other) relationships. For example, a Project is sponsored by a
Department. This is a simple relationship.

An Employee monitors this Sponsorship (and not the Project or the Department alone).
This is aggregation.

Monitors is mapped to a table like any other relationship set.

Aggregation vs. Ternary Relationship:


Can we express relationships involving other relationships without using
aggregation?

The use of aggregation vs. ternary relationship may be guided by certain integrity
constraints. For example: we can impose a constraint that each sponsorship is
monitored by at most one employee (not possible without aggregation).
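
One possible relational mapping of this aggregation is sketched below, assuming Python's built-in sqlite3 module. The table and column names (Projects, pid, Sponsors, Monitors) and the sample rows are hypothetical; the point is only that making (pid, did) the key of Monitors expresses "each sponsorship is monitored by at most one employee".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Employees (ssn CHAR(11) PRIMARY KEY, name CHAR(20));
    CREATE TABLE Projects (pid INTEGER PRIMARY KEY);
    CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname CHAR(20));
    -- The sponsorship relationship between a project and a department.
    CREATE TABLE Sponsors (
        pid INTEGER,
        did INTEGER,
        PRIMARY KEY (pid, did),
        FOREIGN KEY (pid) REFERENCES Projects,
        FOREIGN KEY (did) REFERENCES Departments
    );
    -- Monitors refers to a sponsorship as a whole, not to a project or department.
    CREATE TABLE Monitors (
        ssn CHAR(11) NOT NULL,
        pid INTEGER,
        did INTEGER,
        PRIMARY KEY (pid, did),
        FOREIGN KEY (ssn) REFERENCES Employees,
        FOREIGN KEY (pid, did) REFERENCES Sponsors (pid, did)
    );
    -- Sample rows are invented for illustration.
    INSERT INTO Employees VALUES ('111-11-1111', 'Smith'), ('222-22-2222', 'Jones');
    INSERT INTO Projects VALUES (1), (2), (3);
    INSERT INTO Departments VALUES (51, 'Sales');
    INSERT INTO Sponsors VALUES (1, 51), (2, 51);
    INSERT INTO Monitors VALUES ('111-11-1111', 1, 51);
""")

def rejected(sql):
    """Run one statement and report whether the DBMS refused it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# At most one monitoring employee per sponsorship.
second_monitor_rejected = rejected("INSERT INTO Monitors VALUES ('222-22-2222', 1, 51)")
# Only an existing sponsorship can be monitored.
unsponsored_rejected = rejected("INSERT INTO Monitors VALUES ('222-22-2222', 3, 51)")
# One employee may monitor several sponsorships.
other_sponsorship_rejected = rejected("INSERT INTO Monitors VALUES ('111-11-1111', 2, 51)")
print(second_monitor_rejected, unsponsored_rejected, other_sponsorship_rejected)
```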

Conceptual Design Using the E-R Model


The design choices are:

• Should a concept be modeled as an entity or an attribute?
• Should a concept be modeled as an entity or a relationship?
• Identifying relationships: Binary or ternary? Aggregation?


Entity vs. Attribute


Should address be an attribute of Employees or an entity (connected to Employees
by a relationship)? It all depends upon the use we want to make of address
information, and the semantics of the data.

If we have several addresses per employee, address must be an entity (since
attributes cannot be set-valued). If the structure (city, street, etc.) is important,
e.g., we want to retrieve employees in a given city, address must be modeled as an
entity (since attribute values are atomic). Otherwise, address can be used as an
attribute of Employee.

Works_In does not allow an employee to work in a department for two or more
periods. Why?

The situation is similar to the problem of wanting to record several addresses for an
employee: we want to record several values of the descriptive attributes for each
instance of this relationship.

Consider an employee who works in a given department over more than one period.
This possibility is ruled out by the semantics of the ER diagram shown earlier. The
problem is that we want to record several values of the descriptive attributes for each
instance of the Works_In relationship. We can address this problem by introducing an
entity set called Duration, with attributes from and to.


Entity vs. Relationship


The ER diagram above is OK if a manager gets a separate discretionary budget
(dbudget) for each department.

But what if a manager gets a discretionary budget that covers all managed
departments? The design then has two problems:

• Redundancy: dbudget is stored for each department managed by the manager.

• Misleading: the design suggests that dbudget is tied to the managed department.

One possible design that resolves the two issues of the previous ER diagram:

We model the appointment as an entity set, say Mgr_appt, and use a ternary
relationship, say Manages, to relate a manager, an appointment and a department.
The budget is now associated with the appointment of the employee as manager of a
group of departments. The details of an appointment (such as the discretionary
budget) are no longer repeated for each department included in the appointment,
although there is still one Manages relationship instance per such department.

The figure below models a situation in which an employee can own several policies,
each policy can be owned by several employees, and each dependent can be covered
by several policies.

Suppose we have the following constraint:

Each policy is owned by just 1 employee. A key constraint on Policy would then mean
that a policy can cover only 1 dependent!

Binary Vs Ternary Relationship - A better Design

The key constraints allow us to combine Purchaser with Policy and Beneficiary with
Dependent.


Participation constraints lead to NOT NULL constraints.

CREATE TABLE Policy (
    policyid INTEGER,
    cost REAL,
    ssn CHAR(11) NOT NULL,
    PRIMARY KEY (policyid),
    FOREIGN KEY (ssn) REFERENCES Employees
        ON DELETE CASCADE)

CREATE TABLE Dependent (
    pname CHAR(20),
    age INTEGER,
    policyid INTEGER,
    PRIMARY KEY (pname, policyid),
    FOREIGN KEY (policyid) REFERENCES Policy
        ON DELETE CASCADE)

Constraints Beyond the ER Model


What the ER Model can and cannot express:

• A lot of data semantics can (and should) be captured in the ER model.
• But some constraints cannot be captured in ER diagrams.

Hence, there is a further need for refining the schema. The relational schema obtained
from the ER diagram is a good first step. But the ER design is subjective and can’t
express certain constraints, so this relational schema may need refinement.

Functional Dependencies

For example, the rule that a department can’t order two distinct parts from the same
supplier is a functional dependency (FD). We cannot express this with respect to the
ternary Contracts relationship.
Normalization refines the ER design by considering FDs.

The next chapter will deal with Normalization to refine the Entity Relationship Design.


Summary

In this chapter, we have discussed:

• Process of designing a database
• Components of an E-R model
• Drawing E-R diagrams
• Designing E-R diagrams with key constraints
• Aggregation
• Conceptual design using the E-R model
• Constraints beyond the E-R model



Schema Refinement and Normalization

Chapter 4

Schema Refinement and Normalization

Normalization
Why Normalization?
What is a Normal Form?
Types of Normal Forms
First Normal Form
Functional Dependencies
Second Normal Form
Transitive Dependency
Third Normal Form
Boyce-Codd Normal Form
Multivalued Dependency
Fourth Normal Form
Fifth Normal Form


Objectives

In this chapter, we will discuss:

• Normalization
• Reasons for Normalization
• Refining a database
• Defining Normal Form
• Types of Normal Forms


Normalization
Normalization is a process of designing a consistent Database by minimizing
redundancy and ensuring Data Integrity through the principle of Non-loss
decomposition.

Why Normalization?

In order to produce good database design, we should ask questions like:

a. Does the design ensure that all database operations will be efficiently
performed and that the design does not make the DBMS perform expensive
consistency checks, which could be avoided?

b. Is the information unnecessarily replicated?

Unless these issues are properly handled, several difficulties, such as redundancy and
loss of information, may arise. There are several methods to avoid the above-mentioned
problems. One such method is database decomposition through normalization, which
tries to minimize redundancy and the effort of checking constraints and dependencies.

Database normalization:

y Ensures Data Integrity

Now, let us see what is Data Integrity.

Data integrity ensures the correctness of data stored within the database.
It is achieved by imposing integrity constraints.
An integrity constraint is a rule, which restricts values present in the
database.

There are three integrity constraints:

♦ Entity constraints:
The entity integrity rule states that the value of the primary key can never

be a null value (a null value is one that has no value and is not the same as a

blank). Because a primary key is used to identify a unique row in a relational

© SQL Star International Ltd. 47


Schema Refinement and Normalization

table, its value must always be specified and should never be unknown. The

integrity rule requires that insert, update and delete operations maintain the

uniqueness and existence of all primary keys.

♦ Domain Constraints:
Only permissible values of an attribute are allowed in a relation.

♦ Referential Integrity constraints:


The referential integrity rule states that if a relational table has a
foreign key, then every value of the foreign key must either be null or
match a value of the primary key in the relational table that the foreign
key refers to.
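
The three kinds of integrity constraint can be tried out with a small sketch. This is an illustration only, not part of the original course material: it assumes Python's built-in sqlite3 module, and the table and column names (faculty, subject, gender) are hypothetical. Each invalid insert should be rejected by the database, while a null foreign key is accepted.

```python
import sqlite3

# Hypothetical schema: one constraint of each kind.
SCHEMA = """
CREATE TABLE faculty (
    faculty_code TEXT NOT NULL PRIMARY KEY,                 -- entity integrity
    gender       TEXT CHECK (gender IN ('MALE', 'FEMALE'))  -- domain constraint
);
CREATE TABLE subject (
    sno          INTEGER PRIMARY KEY,
    faculty_code TEXT REFERENCES faculty (faculty_code)     -- referential integrity
);
INSERT INTO faculty VALUES ('100', 'MALE');
"""

def is_rejected(conn, sql, params):
    """Return True if the database refuses the statement as an integrity violation."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True

def demo_integrity_constraints():
    """Run one insert per rule and report which ones were rejected."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
    conn.executescript(SCHEMA)
    faculty_insert = "INSERT INTO faculty VALUES (?, ?)"
    subject_insert = "INSERT INTO subject VALUES (?, ?)"
    results = {
        "null primary key": is_rejected(conn, faculty_insert, (None, "MALE")),
        "duplicate primary key": is_rejected(conn, faculty_insert, ("100", "FEMALE")),
        "value outside domain": is_rejected(conn, faculty_insert, ("101", "UNKNOWN")),
        "dangling foreign key": is_rejected(conn, subject_insert, (1, "999")),
        "null foreign key": is_rejected(conn, subject_insert, (2, None)),
    }
    conn.close()
    return results
```

Calling demo_integrity_constraints() returns a dictionary that maps each attempted insert to whether it was rejected.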

• Prevents Redundancy in data

A non-normalized database is vulnerable to data anomalies if it stores data
redundantly. If data is stored in two locations, but later updated in only one of the
locations, then the data is inconsistent; this is referred to as an "update anomaly". A
normalized database stores non-primary key data in only one location.

Redundancy can be:

♦ Direct Redundancy:
Direct redundancy results from the presence of the same data in two
different locations, which can lead to anomalies when the data is read,
written, updated or deleted.

♦ Indirect Redundancy:
Indirect redundancy results from storing information that can be
computed from the other data items stored within the database.

Normalized databases have a design that reflects the true dependencies between
tracked quantities, allowing quick updates to data with little risk of introducing
inconsistencies. There are formal methods for quantifying "how normalized" a
relational database is, and these classifications are called Normal Forms (or NF).

What is a Normal Form?


Normal forms are designed to logically address potential problems such as
inconsistencies and redundancy in information stored in the database.

A database is said to be in a given Normal Form if it satisfies the rules required
by that Form as well as those of all the previous Forms; it then does not suffer from
any of the problems addressed by those Forms.


Types of Normal Forms


Several normal forms have been identified, the most important and widely used of
which are:

• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
• Boyce-Codd normal form (BCNF)
• Fourth normal form (4NF)
• Fifth normal form (5NF)

A relation is said to be in a particular normal form only if it also satisfies the
previous normal forms.

First Normal Form (1NF)

A Relation is in 1NF, if every row contains exactly one value for each attribute.

Let us understand this with an example.

Consider a table ‘Faculty’, which holds information about the faculty, the subjects
they teach, and the number of hours allotted to each subject.

Faculty:

Faculty code   Faculty Name   Date of Birth   Subject   Hours
100            Smith          17/07/64        Java      16
                                              PL/SQL    8
                                              Linux     8
101            Jones          24/12/72        Java      16
                                              Forms     8
                                              Reports   12
102            Fred           03/02/80        SQL       10
                                              Linux     8
                                              Java      16
103            Robert         28/11/66        SQL       10
                                              PL/SQL    8
                                              Forms     8

Anomalies:

The ‘Subject’ and ‘Hours’ columns of the above table do not hold atomic values:
each faculty row holds several subjects and their hours. Hence, it is called an
un-normalized table. Insertion, updating and deletion would be a problem in such a
table.

Hence it has to be normalized.


For the above table to be in first normal form, each row should have atomic values.
Hence let us re-construct the data in the table. An ‘SNO’ column is included in the
table to uniquely identify each row.

SNO   Faculty code   Faculty Name   Date of Birth   Subject   Hours
1     100            Smith          17/07/64        Java      16
2     100            Smith          17/07/64        PL/SQL    8
3     100            Smith          17/07/64        Linux     8
4     101            Jones          24/12/72        Java      16
5     101            Jones          24/12/72        Forms     8
6     101            Jones          24/12/72        Reports   12
7     102            Fred           03/02/80        SQL       10
8     102            Fred           03/02/80        Linux     8
9     102            Fred           03/02/80        Java      16
10    103            Robert         28/11/66        SQL       10
11    103            Robert         28/11/66        PL/SQL    8
12    103            Robert         28/11/66        Forms     8

This table shows the same data as the previous table but we have eliminated the
repeating groups.
Hence the table is now said to be in First Normal form (1NF). But we have
introduced Redundancy into the table now. This can be eliminated using Second
Normal Form (2NF).
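
The conversion to 1NF can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the dictionary layout and the function name to_first_normal_form are assumptions made for the example, and the data is the Faculty table shown above.

```python
# Un-normalized form: each faculty record carries a repeating group of (subject, hours).
UNNORMALIZED = [
    {"code": 100, "name": "Smith", "dob": "17/07/64",
     "subjects": [("Java", 16), ("PL/SQL", 8), ("Linux", 8)]},
    {"code": 101, "name": "Jones", "dob": "24/12/72",
     "subjects": [("Java", 16), ("Forms", 8), ("Reports", 12)]},
    {"code": 102, "name": "Fred", "dob": "03/02/80",
     "subjects": [("SQL", 10), ("Linux", 8), ("Java", 16)]},
    {"code": 103, "name": "Robert", "dob": "28/11/66",
     "subjects": [("SQL", 10), ("PL/SQL", 8), ("Forms", 8)]},
]

def to_first_normal_form(records):
    """Emit one row per (faculty, subject) pair so that every value is atomic."""
    rows = []
    for record in records:
        for subject, hours in record["subjects"]:
            sno = len(rows) + 1  # running serial number, as in the SNO column
            rows.append((sno, record["code"], record["name"], record["dob"], subject, hours))
    return rows

first_nf = to_first_normal_form(UNNORMALIZED)
```

The result has twelve rows, and the faculty code, name and date of birth are repeated once per subject, which is the redundancy noted above.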

Functional Dependencies (FDs)

A functional dependency exists when the value of one attribute (or set of
attributes) uniquely determines the value of another attribute.

It is denoted by

A -> B i.e., B is functionally dependent on A

Or

A determines B.

Functional Dependencies can be of two types:

• Full Functional Dependency
• Partial Functional Dependency


Full Functional Dependency:

A Functional Dependency A -> B is a full functional dependency if removal of any
attribute x from A means that the dependency does not hold any more.

Example:

{Empno, Project_no} -> Hours

This is a full functional dependency because neither Empno -> Hours nor
Project_no -> Hours holds on its own.

In the above example, Hours is fully functionally dependent on {Empno,
Project_no}.

Why? The reason is:

The number of hours spent on the project by a particular employee cannot be
determined with the project number (project_no) alone. It needs the employee
number (empno) as well.

Partial Dependency:

An FD A -> B is a partial dependency if there is some attribute x ∈ A (x is an
element of A) that can be removed from A and the dependency will still hold.

Example:

{Empno, Project_no} -> Ename

This is a partial dependency because Empno -> Ename holds on its own.

In the above example, Ename is partially dependent on {Empno, Project_no}.
The reason is that the employee name (ename) can be determined using the employee
id (empno) alone, even if project_no is removed from the determinant.

For a table to be in 2nd Normal form, there should be no partial dependencies.
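
Full and partial dependencies can be checked against sample data with a short Python sketch. The rows below are hypothetical values invented for the illustration, and the function names holds and classify are assumptions. Note that such a check only shows whether a dependency is violated by the given rows; a functional dependency itself is a rule about all possible data.

```python
from itertools import combinations

# Hypothetical sample rows for the Empno / Project_no example.
ROWS = [
    {"Empno": 1, "Project_no": 10, "Ename": "Smith", "Hours": 16},
    {"Empno": 1, "Project_no": 20, "Ename": "Smith", "Hours": 8},
    {"Empno": 2, "Project_no": 10, "Ename": "Jones", "Hours": 12},
    {"Empno": 2, "Project_no": 20, "Ename": "Jones", "Hours": 8},
]

def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

def classify(rows, lhs, rhs):
    """Classify lhs -> rhs as 'none', 'partial' or 'full' for the given rows."""
    if not holds(rows, lhs, rhs):
        return "none"
    # Partial: some proper, non-empty subset of lhs already determines rhs.
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds(rows, subset, rhs):
                return "partial"
    return "full"

hours_dependency = classify(ROWS, ("Empno", "Project_no"), ("Hours",))
ename_dependency = classify(ROWS, ("Empno", "Project_no"), ("Ename",))
```

With these rows, Hours needs both attributes of the determinant, while Ename is already determined by Empno alone.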

Second Normal Form (2NF)


A relation is in 2NF, if it is in 1NF and every non-key attribute is fully functionally
dependent on the primary key of the relation.

2NF prohibits partial dependencies.

The steps for converting a database to 2NF are as follows:

• Find and remove attributes that are related to only a part of the key.
• Group the removed items in another table.
• Assign the new table a key that consists of that part of the old composite key.


If a relation is not in 2NF, it can be further normalized into a number of 2NF
relations. Let us consider the table we obtained after first normalization.

SNO   Faculty code   Faculty Name   Date of Birth   Subject   Hours
1     100            Smith          17/07/64        Java      16
2     100            Smith          17/07/64        PL/SQL    8
3     100            Smith          17/07/64        Linux     8
4     101            Jones          24/12/72        Java      16
5     101            Jones          24/12/72        Forms     8
6     101            Jones          24/12/72        Reports   12
7     102            Fred           03/02/80        SQL       10
8     102            Fred           03/02/80        Linux     8
9     102            Fred           03/02/80        Java      16
10    103            Robert         28/11/66        SQL       10
11    103            Robert         28/11/66        PL/SQL    8
12    103            Robert         28/11/66        Forms     8

While eliminating the repeating groups, we have introduced redundancy into the table.
Faculty Code, Name and Date of Birth are repeated since the same faculty is
multi-skilled.
To eliminate this, let us split the table into 2 parts; one with the non-repeating
groups and the other for repeating groups.

Faculty:

Faculty code   Faculty Name   Date of Birth
100            Smith          17/07/64
101            Jones          24/12/72
102            Fred           03/02/80
103            Robert         28/11/66

Faculty_code -> Faculty_name, Date_of_Birth

The other table holds the repeating groups.

Subject:

SNO   Faculty code   Subject   Hours
1     100            Java      16
2     100            PL/SQL    8
3     100            Linux     8
4     101            Java      16
5     101            Forms     8
6     101            Reports   12
7     102            SQL       10
8     102            Linux     8
9     102            Java      16
10    103            SQL       10
11    103            PL/SQL    8
12    103            Forms     8


Faculty code alone determines the faculty name and the date of birth.

Hence, Faculty code is the primary key in the first table and a foreign key in the second table.

Faculty code is repeated in the Subject table. Hence, we have to take ‘SNO’ into
account to form a composite key in the Subject table. Now, SNO + Faculty code can
uniquely identify each row in this table.

Hence, the relation is now in Second Normal form.
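
The split can be sketched in Python, together with a check that joining the two tables back on Faculty code gives the original rows, i.e. that no information is lost. This is an illustration only: the function names decompose and rejoin are assumptions, and only the first six rows of the 1NF table are used to keep the example short.

```python
# First six rows of the 1NF table: (SNO, Faculty code, Name, Date of Birth, Subject, Hours).
FIRST_NF = [
    (1, 100, "Smith", "17/07/64", "Java", 16),
    (2, 100, "Smith", "17/07/64", "PL/SQL", 8),
    (3, 100, "Smith", "17/07/64", "Linux", 8),
    (4, 101, "Jones", "24/12/72", "Java", 16),
    (5, 101, "Jones", "24/12/72", "Forms", 8),
    (6, 101, "Jones", "24/12/72", "Reports", 12),
]

def decompose(rows):
    """Split into Faculty (keyed by faculty code) and Subject (one row per SNO)."""
    faculty = {}
    subject = []
    for sno, code, name, dob, subj, hours in rows:
        faculty[code] = (name, dob)          # stored once per faculty
        subject.append((sno, code, subj, hours))
    return faculty, subject

def rejoin(faculty, subject):
    """Join Subject back to Faculty through the Faculty code foreign key."""
    return [(sno, code, *faculty[code], subj, hours) for sno, code, subj, hours in subject]

faculty, subject = decompose(FIRST_NF)
is_lossless = rejoin(faculty, subject) == FIRST_NF
```

Each faculty name and date of birth is now stored once, and the join reproduces every original row.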

Anomalies in 2nd NF:

This situation could lead to the following problems:

• Insertion: Inserting the records of various faculty teaching the same subject
would result in redundancy of hours information.

• Updation: The number of hours allotted to a subject is repeated several
times. Hence, if the number of hours has to be changed, this change will have
to be recorded in every instance of that subject. Any omissions will lead to
inconsistencies.

• Deletion: If a faculty leaves the organization, information regarding hours
allotted to the subject is lost.

This Subject table should therefore be further decomposed without any loss of information as:

Fac_Sub (SNO, Faculty code, Subject)

Sub_Hrs (Subject, Hours)

Transitive Dependency
Transitive dependencies arise:

• When one non-key attribute is functionally dependent on another non-key
attribute.
• FD: non-key attribute -> non-key attribute
• And when there is redundancy in the database.

Third Normal Form


A relation is in 3NF, if it is in 2NF and no non-key attribute of the relation is
transitively dependent on the primary key.

3NF prohibits transitive dependencies.


In order to remove the anomalies that arose in Second Normal Form and to remove
transitive dependencies, if any, we have to perform third normalization.
Now let us see how to normalize the second table obtained after 2NF.

Subject:

SNO   Faculty code   Subject   Hours
1     100            Java      16
2     100            PL/SQL    8
3     100            Linux     8
4     101            Java      16
5     101            Forms     8
6     101            Reports   12
7     102            SQL       10
8     102            Linux     8
9     102            Java      16
10    103            SQL       10
11    103            PL/SQL    8
12    103            Forms     8

In this table, hours depend on the subject, and the subject depends on the Faculty
code and SNO. But hours depend neither on the faculty code nor on the SNO directly.
Hence, there exists a transitive dependency between SNO, Subject and Hours.

If a faculty code is deleted, due to this transitive dependency, information
regarding the subject and the hours allotted to it will be lost.

For a table to be in 3rd Normal form, transitive dependencies must be eliminated.

So, we need to decompose the table further to normalize it.

Fac_Sub:

SNO   Faculty code   Subject
1     100            Java
2     100            PL/SQL
3     100            Linux
4     101            Java
5     101            Forms
6     101            Reports
7     102            SQL
8     102            Linux
9     102            Java
10    103            SQL
11    103            PL/SQL
12    103            Forms

Sub_Hrs:

Subject   Hours
Java      16
PL/SQL    8
Linux     8
Forms     8
Reports   12
SQL       10

After decomposing the ‘Subject’ table, we now have the ‘Fac_Sub’ and ‘Sub_Hrs’
tables. By doing so, the following anomalies are addressed.

Insertion: No redundancy of data for subject and hours while inserting the records.

Updation: Subject and hours are stored in a separate table, so updating becomes
much easier as the data is not repeated.

Deletion: Even if a faculty leaves the organization, the hours allotted to a
particular subject can still be retrieved from the Sub_Hrs table.
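
The effect of the decomposition on updates and deletions can be sketched in Python. This is an illustration under assumptions: the two tables are held as plain Python structures, only the rows for faculty 100 and 101 are included, and the helper names hours_for_faculty and remove_faculty are invented for the example.

```python
# Fac_Sub rows: (SNO, Faculty code, Subject). Sub_Hrs maps Subject to Hours.
FAC_SUB = [
    (1, 100, "Java"), (2, 100, "PL/SQL"), (3, 100, "Linux"),
    (4, 101, "Java"), (5, 101, "Forms"), (6, 101, "Reports"),
]
SUB_HRS = {"Java": 16, "PL/SQL": 8, "Linux": 8, "Forms": 8, "Reports": 12}

def hours_for_faculty(code, fac_sub, sub_hrs):
    """Look up the hours of every subject taught by one faculty."""
    return {subject: sub_hrs[subject] for _, c, subject in fac_sub if c == code}

def remove_faculty(code, fac_sub):
    """Delete a faculty's rows from Fac_Sub; Sub_Hrs is not touched."""
    return [row for row in fac_sub if row[1] != code]

# Update: the hours for Java are changed in exactly one place.
sub_hrs = dict(SUB_HRS)
sub_hrs["Java"] = 20
java_hours_seen = {code: hours_for_faculty(code, FAC_SUB, sub_hrs)["Java"] for code in (100, 101)}

# Deletion: faculty 101 leaves, but the hours for Reports are still recorded.
remaining = remove_faculty(101, FAC_SUB)
reports_hours = sub_hrs["Reports"]
```

After the single update, both faculty see the new hours for Java, and removing faculty 101 leaves the Sub_Hrs entry for Reports in place.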

Boyce–Codd Normal Form (BCNF)


Boyce-Codd Normal Form (BCNF) was introduced because 3NF does not satisfactorily
handle the case of a relation possessing two or more composite or overlapping
candidate keys.

A relation R is said to be in BCNF, if and only if every determinant is a candidate key.
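
The BCNF condition can be tested mechanically once the functional dependencies are written down. The sketch below is an illustration, not the book's method: it uses the standard attribute-closure technique, treats a determinant as acceptable when it determines every attribute of the relation (a superkey), and the dependencies listed for the Subject table are assumptions based on the earlier example.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under the given FDs."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(relation, fds):
    """List the non-trivial FDs whose determinant does not determine the whole relation."""
    relation = set(relation)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not relation <= closure(lhs, fds)
    ]

# Assumed dependencies for the Subject table of the earlier example.
SUBJECT_FDS = [
    (("SNO",), ("Faculty_code", "Subject")),
    (("Subject",), ("Hours",)),
]
subject_violations = bcnf_violations(("SNO", "Faculty_code", "Subject", "Hours"), SUBJECT_FDS)
sub_hrs_violations = bcnf_violations(("Subject", "Hours"), [(("Subject",), ("Hours",))])
```

Under these assumptions, Subject -> Hours is reported for the undecomposed Subject table because Subject is not a key there, while the Sub_Hrs table on its own has no violation.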

In most cases, third normal form is a sufficient level of decomposition. But some
cases require the design to be further normalized up to the 4th or even the 5th
normal form. These are based on the concept of Multivalued Dependency. Let us get
an idea of it now.

Multivalued Dependency:
A multivalued dependency, denoted X ->> Y, is said to hold for a relation R(X, Y, Z)
if, for a given value of X, there is a set of associated values of attribute Y, and
these Y values depend only on the X value and have no dependence on the set of
attributes Z.

Fourth Normal Form (4NF)


A relation is said to be in fourth normal form if it contains no more than one
independent multivalued dependency per key attribute.


Seminar   Faculty   Topic
DBP-1     Brown     Database Principles
DAT-2     Brown     Database Advanced Techniques
DBP-1     Brown     Data Modeling Techniques
DBP-1     Robert    Database Principles
DBP-1     Robert    Data Modeling Techniques
DAT-2     Maria     Database Advanced Techniques

In the above example, the same topic is taught in a seminar by more than one
faculty, and each faculty takes up different topics in the same seminar. Hence, topic
names are repeated several times. This is an example of multivalued
dependency. For a table to be in Fourth Normal Form, multivalued dependency must
be avoided.
To eliminate the multivalued dependency, split the table such that no table
contains it.

Seminar   Faculty
DBP-1     Brown
DAT-2     Brown
DBP-1     Robert
DAT-2     Maria

Seminar   Topic
DBP-1     Database Principles
DAT-2     Database Advanced Techniques
DBP-1     Data Modeling Techniques
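
The split into Seminar/Faculty and Seminar/Topic can be checked with a short Python sketch: projecting the original rows onto each pair of columns and joining the projections on Seminar should give back exactly the original table. The helper names project and join_on_seminar are assumptions made for this illustration.

```python
# Original (Seminar, Faculty, Topic) rows from the example.
SEMINARS = [
    ("DBP-1", "Brown", "Database Principles"),
    ("DAT-2", "Brown", "Database Advanced Techniques"),
    ("DBP-1", "Brown", "Data Modeling Techniques"),
    ("DBP-1", "Robert", "Database Principles"),
    ("DBP-1", "Robert", "Data Modeling Techniques"),
    ("DAT-2", "Maria", "Database Advanced Techniques"),
]

def project(rows, *columns):
    """Keep only the given column positions and drop duplicate rows."""
    return sorted({tuple(row[c] for c in columns) for row in rows})

def join_on_seminar(seminar_faculty, seminar_topic):
    """Natural join of the two projections on their shared Seminar column."""
    return sorted(
        (seminar, faculty, topic)
        for seminar, faculty in seminar_faculty
        for other_seminar, topic in seminar_topic
        if seminar == other_seminar
    )

seminar_faculty = project(SEMINARS, 0, 1)
seminar_topic = project(SEMINARS, 0, 2)
rebuilt = join_on_seminar(seminar_faculty, seminar_topic)
```

The two smaller tables hold four and three rows, and their join contains the same six rows as the original table.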

Fifth Normal Form


A relation is said to be in 5NF if and only if it is in 4NF and every join dependency in
it is implied by the candidate keys.
Fifth normal form deals with cases where information can be reconstructed from
smaller pieces of information that can be maintained with less redundancy. It
emphasizes lossless decomposition.
Consider the following example:
Faculty Seminar Location
Brown DBP-1 New York
Brown DAT-2 Chicago
Robert DBP-1 Chicago

If we were to add the seminar DAT-2 to New York, we would have to add a line to
the table for each instructor located in New York.

The table would look as shown below after adding the above information:

Faculty   Seminar   Location
Brown     DBP-1     New York
Brown     DAT-2     Chicago
Robert    DBP-1     Chicago
Brown     DAT-2     New York
Robert    DAT-2     New York

From the above table, we observe that there is a redundancy of data stored for
Brown’s information. So to eliminate this redundancy, we have to do a ‘Non-Loss
decomposition’ of the table.
Consider the following decomposition of the above table into fifth normal form:

Faculty Seminar
Brown DBP-1
Brown DAT-2
Robert DBP-1
Robert DAT-2

Seminar Location
DBP-1 New York
DAT-2 Chicago
DBP-1 Chicago
DAT-2 New York

Faculty Location
Brown New York
Brown Chicago
Robert Chicago
Robert New York

Generally, a table is in fifth normal form when its information content cannot be
reconstructed from several smaller tables, i.e., from tables having fewer fields than
the original table, each table having different keys.
In the normalized form, the fact that ‘Brown’ travels to ‘New York’ is recorded only
once, whereas in the unnormalized form it may be repeated many times.

An attempt has been made to explain Normal forms in a simple yet understandable
manner.

Some redundancies are unavoidable. One should take care while normalizing a table
so that data integrity is not compromised for removing redundancies.


Summary

In this chapter, we have discussed:

• Normalization
• Reasons for Normalization
• Refining a database
• Normal Form
• Types of Normal Forms


Supertypes and Subtypes

Chapter 5

Supertypes and Subtypes

Supertype
Subtype
Inheritance
Relationships and Subtypes
Supertype/Subtype Notation
Generalization and Specialization
Constraints in Supertype
Constraints in Supertype/Subtype
Supertype/Subtype Hierarchy
Domains
Domain Integrity Constraints


Objectives

In this chapter, we will discuss:

• Advanced concepts of database design
• Defining Subtypes and Supertypes
• Generalization and Specialization
• Using Constraints in Supertype
• Using Constraints in Supertype/Subtype Discriminators
• Supertype/Subtype Hierarchy
• Domains


Basics

Supertype

Supertype is a generic parent entity that contains generalized attributes and key. It
is a generic entity type that has a relationship with one or more subtypes.

Subtype

A subtype is a subgrouping of the entities in an entity type, which has attributes that
are distinct from those in other subgroupings. Subtypes are category entities that
inherit the attributes, keys, and relationships of the supertype entity. Each subtype
entity will contain the migrated foreign key and only those attributes that pertain to
the category type.

Inheritance

An instance of a subtype is also an instance of the supertype. By this important
property, subtype entities inherit the values of all attributes of the supertype,
which makes it unnecessary to include supertype attributes redundantly with the
subtypes.

[Diagram: the SUPERTYPE is a general entity type holding the attributes shared by
all entities. Subtype 1, Subtype 2, and so forth are specialized versions of the
supertype, each holding the attributes unique to that subtype.]

Figure 1: Basic notation for supertype/subtype relationships


Example:

The following figure shows an Employee supertype with three subtypes.

[Diagram: the EMPLOYEE supertype has the attributes Employee_name, Employee_no,
Address and Date_hired. Its subtypes are Hourly Employee (Hourly_rate), Salaried
Employee (annual_salary, Stock_option) and Consultant (contact_number,
Billing_rate).]

All Employee subtypes will have Emp name, number, date_hired and address.
Each Employee subtype will also have its own attributes.

Figure 2: An Employee supertype with three subtypes
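
Attribute inheritance from a supertype can be illustrated with Python classes. This is only a sketch of the idea: the class and field names follow the attributes named for Figure 2, the assignment of attributes to subtypes is an assumption read from the figure, and the helper attribute_names is invented for the example.

```python
from dataclasses import dataclass, fields

@dataclass
class Employee:
    """Supertype: attributes shared by all employees."""
    employee_no: int
    employee_name: str
    address: str
    date_hired: str

@dataclass
class HourlyEmployee(Employee):
    """Subtype with its own attribute."""
    hourly_rate: float

@dataclass
class SalariedEmployee(Employee):
    """Subtype with its own attributes."""
    annual_salary: float
    stock_option: int

@dataclass
class Consultant(Employee):
    """Subtype with its own attributes."""
    contact_number: str
    billing_rate: float

def attribute_names(entity_type):
    """List inherited attributes first, followed by the subtype's own."""
    return [f.name for f in fields(entity_type)]

# Hypothetical instance: a subtype instance is also an instance of the supertype.
consultant = Consultant(7, "Maria", "12 Elm St", "15/01/09", "C-100", 95.0)
is_also_employee = isinstance(consultant, Employee)
```

The supertype attributes are declared once on Employee and are not repeated in any subtype.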

Relationships and Subtypes


a) Relationships at the supertype level indicate that all subtypes will participate in
the relationship.
b) The instances of a subtype may participate in a relationship unique to that
subtype. In this situation, the relationship is shown at the subtype level.


Example:

The following figure shows the supertype/subtype relationships in a hospital:

Figure 3: supertype/subtype relationships in a hospital

Supertype/Subtype Notation

The hierarchy of the supertype/subtype notation is as follows:

• SUPERTYPE: attributes shared by all entities
• SUBTYPE 1: attributes unique to subtype 1
• SUBTYPE 2: attributes unique to subtype 2
• SUBTYPE 3: attributes unique to subtype 3


Generalization and Specialization

Generalization

In general, an object can be described by its shared characteristics, the attributes.
For example, we can characterize an employee by their employee id, name, job title
and skill set.

Another method of characterizing entities is by both similarities and differences. For
example, suppose an organization categorizes the work it does into internal and
external projects. Internal projects are done on behalf of some unit within the
organization. External projects are done for entities outside of the organization. We
can recognize that both types of projects are similar in that each involves work done
by employees of the organization within a given schedule. Yet, we also recognize
that there are differences between them. External projects have unique attributes,
such as a customer identifier and the fee charged to the customer.

This process of categorizing entities by their similarities and differences is known as
generalization.

Generalization hierarchies should be used when:

• A large number of entities appear to be of the same type
• Attributes are repeated for multiple entities
• The model is continually evolving

Rules for Generalization

The primary rule of generalization hierarchies is that each instance of the supertype
entity must appear in at least one subtype; likewise, an instance of the subtype must
appear in the supertype.
Subtypes can be a part of only one generalization hierarchy. That is, a subtype
cannot be related to more than one supertype. However, generalization hierarchies
may be nested by having the subtype of one hierarchy be the supertype for another.
Subtypes may be the parent entity in a relationship, but not the child. If this were
allowed, the subtype would inherit two primary keys.
The following figure shows three entity types: CAR, TRUCK and MOTORCYCLE.

Specialization
It is the process of defining one or more subtypes of the supertype, and forming
supertype/subtype relationships TOP-DOWN.


Figure 4: Example of Generalization

Constraints in Supertype

Completeness Constraints

The completeness constraint addresses the question of whether an instance of a
supertype must also be a member of at least one subtype.

There are two possible rules:

a) Total Specialization Rule

The total specialization rule specifies that each entity instance of the supertype must
be a member of some subtype in the relationship. For example: all STUDENTS are
either UNDERGRADUATE or GRADUATE students.

It is denoted by a double line.

b) Partial Specialization Rule

The partial specialization rule specifies that an entity instance of the supertype is
allowed to not belong to any subtype. For example: FACULTY and STAFF are not the
only possible members of the entity EMPLOYEE.

It is denoted by a single line.


Following are the examples of completeness constraints.

Figure 5: Total specialization rule

Figure 6: Partial specialization rule

Disjointness Constraints
The disjointness constraint addresses the question of whether an instance of a
supertype may simultaneously be a member of two (or more) subtypes.

There are two possible rules:

a) Disjoint Rule

The disjoint rule specifies that if an entity instance is a member of one subtype, it
cannot simultaneously be a member of any other subtype. For example: all
PERSONS are either MALE or FEMALE.

It is denoted by the letter “d”.

b) Overlap Rule

The overlap rule specifies that an entity instance can simultaneously be a member
of two (or more) subtypes. For example: an ATHLETE can be both a RUNNER and a
JUMPER. It is denoted by the letter “O”.
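
The completeness and disjointness rules can be expressed as a small validation routine. The sketch below is an illustration under assumptions: the function name check_specialization, the instance identifiers and the membership data are all hypothetical.

```python
def check_specialization(memberships, total, disjoint):
    """Return the instances that break the chosen rules.

    memberships maps each supertype instance to the set of subtypes it belongs to.
    total=True applies the total specialization rule, disjoint=True the disjoint rule.
    """
    violations = {}
    for instance, subtypes in memberships.items():
        if total and not subtypes:
            violations[instance] = "belongs to no subtype"
        elif disjoint and len(subtypes) > 1:
            violations[instance] = "belongs to more than one subtype"
    return violations

# STUDENT: total specialization, disjoint subtypes (hypothetical instances).
students = {"s1": {"UNDERGRADUATE"}, "s2": {"GRADUATE"}, "s3": set()}
student_violations = check_specialization(students, total=True, disjoint=True)

# ATHLETE: partial specialization, overlapping subtypes (hypothetical instances).
athletes = {"a1": {"RUNNER", "JUMPER"}, "a2": set()}
athlete_violations = check_specialization(athletes, total=False, disjoint=False)
```

Under the total rule the student without a subtype is reported, while under the partial and overlap rules the same situations are allowed for athletes.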

Figure 7: An example of Disjoint Rule


Figure 8: An example of Overlap Rule

Constraints in Supertype/Subtype Discriminators

Subtype Discriminator

The subtype discriminator is “an attribute of the supertype whose values determine
the target subtype(s)”. It is used to direct into which of the subtypes (if any) a new
instance of the supertype should be inserted.

Disjoint - a simple attribute with alternative values to indicate the possible
subtypes.

Overlapping - a composite attribute whose subparts pertain to different subtypes.
Each subpart contains a Boolean value to indicate whether or not the instance
belongs to the associated subtype.
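
The two kinds of discriminator can be sketched in Python. This is an illustration only: the discriminator attribute names (employee_type, discipline) and the one-letter codes are hypothetical, chosen to match the EMPLOYEE and ATHLETE examples used in this chapter.

```python
# Disjoint rule: a simple attribute whose value names the single target subtype.
# The codes H, S and C are hypothetical.
EMPLOYEE_TYPES = {"H": "HOURLY EMPLOYEE", "S": "SALARIED EMPLOYEE", "C": "CONSULTANT"}

def disjoint_target(employee):
    """Return the one subtype selected by the discriminator, or None if there is none."""
    return EMPLOYEE_TYPES.get(employee["employee_type"])

def overlapping_targets(athlete):
    """Overlap rule: a composite attribute with one Boolean subpart per subtype."""
    flags = athlete["discipline"]
    return sorted(subtype for subtype, belongs in flags.items() if belongs)

salaried = disjoint_target({"employee_type": "S"})
both = overlapping_targets({"discipline": {"RUNNER": True, "JUMPER": True}})
```

A disjoint discriminator yields at most one subtype for a new instance, whereas an overlapping discriminator can direct the instance into several subtypes at once.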

The following figure introduces subtype discriminators - disjoint rule and overlap
rule.


Figure 9: Introducing a subtype discriminator (Disjoint Rule)

Figure 10: Introducing a subtype discriminator (Overlap Rule)


Supertype/Subtype Hierarchy
A supertype/subtype hierarchy is “a hierarchical arrangement of supertypes and
subtypes, where each subtype has only one supertype”.

In this hierarchy, attributes are assigned at the highest logical level that is possible
in the hierarchy. Subtypes that are lower in the hierarchy inherit attributes not only
from their immediate supertype, but also from all supertypes higher in the
hierarchy, up to the root.

The following figure shows the supertype/subtype hierarchy:

Figure 11: Example of supertype/subtype hierarchy


Domains
A domain is a conceptual pool of values from which one or more attributes draw their
actual values.

Examples:

DOMAIN AGE RANGE 0-127
ATTRIBUTE EMPLOYEE.AGE 16-65
ATTRIBUTE DEPENDENT.AGE 0-60

Two values can only be compared if they come from the same domain.

Defining a Domain
The syntax to create a domain in a database is as follows:

CREATE { DOMAIN | DATATYPE } [ AS ] domain-name data-type
[ [ NOT ] NULL ]
[ DEFAULT default-value ]
[ CHECK ( condition ) ]

domain-name: identifier
data-type: built-in data type, with precision and scale

Example:

DOMAIN GENDER

- Data Type: Character
- Length: 6 bytes
- Allowable Values: Male, Female, Null
- Storage Format: Uppercase
- Operations Allowed:
- Inherited Operators: String, Unstring, =
- Input Editing: Nil
- Extra Functions: Is_Male, Is_Female, What_Gender
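
The idea of a domain as a named pool of permitted values can be sketched in Python. This is an illustration, not database syntax: the Domain class, its accepts method and the helper employee_age_ok are assumptions made for the example, using the AGE and GENDER domains described above.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Domain:
    """A named set of permitted values, described by a check condition."""
    name: str
    check: Callable[[Any], bool]
    nullable: bool = True

    def accepts(self, value):
        """Null is accepted only if the domain allows it; otherwise apply the check."""
        if value is None:
            return self.nullable
        return self.check(value)

# DOMAIN AGE RANGE 0-127
AGE = Domain("AGE", lambda v: isinstance(v, int) and 0 <= v <= 127)

# DOMAIN GENDER: values stored in uppercase, Null allowed.
GENDER = Domain("GENDER", lambda v: v in ("MALE", "FEMALE"))

def employee_age_ok(value):
    """ATTRIBUTE EMPLOYEE.AGE draws from the AGE domain but is narrowed to 16-65."""
    return value is not None and AGE.accepts(value) and 16 <= value <= 65
```

Several attributes can share one domain and still apply their own narrower range, as EMPLOYEE.AGE does here.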

Domain Integrity Constraints

Domains are used in the relational model to define the characteristics of the columns
of a table. The domain specifies its own name, data type and logical size. The logical
size represents the size as perceived by the user, not how it is implemented
internally.
For example, for an integer, the logical size represents the number of digits used to
display the integer, not the number of bytes used to store it. The domain integrity
constraints are used to specify the valid values that a column defined over the
domain can take. You can define the valid values by listing them as a set of values
(such as an enumerated data type in a strongly typed programming language), a
range of values, or an expression that accepts the valid values. Strictly speaking,
only values from the same domain should ever be compared or be integrated
through a union operator.
Note that a formal treatment of the domain concept would require the following for
all of the domains:

• The ability to specify the complete set of domains that apply to a given
database (the result of any operation on any column defined over any domain
must then yield a result in one of the specified domains).
• The ability to specify - for every domain, pair of domains, triplet of domains,
and so on - which operators can be applied to the values taken from the
domains, as well as what the domain of the result must be.
• The ability to specify an ordering of the values in the domain.

Summary

In this chapter, we have discussed:

• Advanced concepts of database design
• Defining Subtypes and Supertypes
• Generalization and Specialization
• Using Constraints in Supertype
• Using Constraints in Supertype/Subtype Discriminators
• Supertype/Subtype Hierarchy
• Domains


Exercises

Chapter 6

E-R Diagrams

1. Construct an E-R Diagram for a hospital with a set of patients and a set of medical
doctors. A log of the various conducted tests is associated with each patient. Construct
the normalized relations from this ER diagram.
2. Construct an E-R Diagram for a car insurance company with a set of customers, each of
whom owns a number of cars. Each car has a number of accidents associated with it.
Construct the normalized relations from this ER diagram.
3. Consider the following E-R Diagram: Represent the diagram in the relational model by
relations (tables).
4. Suppose we have a database consisting of the following 3 relations:
FREQUENTS ( DRINKER, BAR )
SERVES ( BAR, BEER )
LIKES ( DRINKER, BEER )
The first relation indicates the bars each drinker visits, the second tells what beers
each bar serves, and the last indicates which beers each drinker likes to drink.
Draw an E-R Diagram for the given relations.
5. An education database contains information about an in-house company education-
training scheme. For each training course, the database contains details of all
prerequisite courses and all offerings for that course; and for each offering it contains
details of all teachers and all student enrollments for that offering. The database also
contains information about employees. The relevant relations in outline are as follows:
COURSE ( COURSE#, TITLE )
PREREQ (SUP_COURSE#, SUB_COURSE# )
OFFERING ( COURSE#, OFF#, OFFDATE, LOCATION )
TEACHER ( COURSE#, OFF#, EMP# )
ENROLLMENT ( COURSE#, OFF#, EMP#, GRADE )
EMPLOYEE ( EMP#, ENAME, JOB )
The meaning of the PREREQ relation is that the superior course (SUP_COURSE#) has the
subordinate course (SUB_COURSE#) as an immediate prerequisite.
Draw an E-R Diagram for this education database.


Normalization
6. Consider the table:

Course_No.   Course_Name            Student_Name     Address            Credits
CIS200       Information Systems    John Warner      23, Main St.       5
CIS220       Information Systems    Tim Hoffman      87, River Rd.      5
CIS220       Information Systems    Jenny Lin        18, Wind Circle    5
CIS450       System Ana. and Des.   Alice Chalmers   5483, Ocean Bld.   5
CIS480       Communication N/ws.    John Warner      23, Main St.       5
CIS480       Communication N/ws.    Jenny Lin        18, Wind Circle    5

a) Does the relation contain any repeating groups? Explain.
b) In what normal form is this relation?
c) If the relation is not already in third normal form, develop new relations that meet the
requirements of 3NF.
7. The following figure is an un-normalized representation of a collection of information to
be recorded in a company personnel database. The figure is intended to be read as
follows:
1. The company has a set of departments.
2. Each department has a set of employees, a set of projects, and a set of
offices.
3. Each employee has a job history (set of jobs the employee has held). For
each such job, the employee also has a salary history (set of salaries
received while employed on that job).
4. Each office has a set of phones.
The database is to contain the following information:
• For each department: department number (unique), budget, and the
department manager’s employee number.
• For each employee: employee number (unique), current project number,
office number, and phone number; also, title of each job the employee has
held, plus date and salary for each distinct salary received in that job.
• For each project: project number (unique) and budget.
• For each office: office number (unique), area in square feet, and numbers
(unique) of all phones in that office.
Design an appropriate set of normalized relations to represent this information.
State any assumptions you make concerning the dependencies involved.

Normalization - A sample example


Normalization is defined briefly, but accurately, in the following statement:

‘The Key, the Whole Key and Nothing but the Key’!

Typically, the literature on normalization covers many levels of normalization; 9 is not
uncommon. This seems to me to be a race amongst academics to identify as many
levels as possible: in 99 cases out of 100, 3 levels of normalization are all that is required.
1st Normal Form: Converting an un-normalized data structure, such as a report or an
order form into 1st Normal Form (1NF) is commonly referred to as removing repeating
groups, but also may involve removing complex groups, such as the Address Group
described in rule 2. The aim is to ensure that each item is atomic.
2nd Normal Form: Converting a 1NF data structure into 2nd Normal Form (2NF) involves
looking at each non-primary key attribute and ensuring that it depends on the whole of the
key and not just part of it.
3rd Normal Form: Converting a 2NF data structure into 3rd Normal Form (3NF) involves
looking at the interrelationships between non-key attributes to see if any non-key
attributes depend only on each other.
This is all best described by looking at an example. Consider the following table, which has
been built up by an order entry clerk.
However, this seems to be a clumsy approach and results in a three part key consisting of
Cust#, Ord# and Part#. A simpler approach is to separate the repeating groups out into
separate tables.
Step 1: Remove the repeating group of orders:
CUSTOMERS(Customer_Number, Customer_Name)
ORDERS(Order_Number, Customer_Number*, Order_Date, (Part_Number,
Part_Description,Part_Quantity,Part_Price,Supplier_Number,
Supplier_Name))
Step 2: Remove the repeating group of parts:
CUSTOMERS(Customer_Number, Customer_Name)
ORDERS(Order_Number, Customer_Number*, Order_Date)
ORDER_PARTS(Part_Number, Order_Number*, Part_Description, Part_Quantity,
Part_Price, Supplier_Number, Supplier_Name)
The structure is now in 1NF, since there are no repeating or complex group items (each
item depends on the key). The next step is to convert the structure into 2NF, by
examining each non-primary key attribute to ensure that each depends on the whole of
the key.
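Steps 1 and 2 above can be sketched in Python as a minimal illustration. The nested record, its values and the function name to_1nf are hypothetical and are not taken from the order entry clerk's table; only the column names follow the structures above.

```python
# Hypothetical un-normalized data: each customer carries a repeating group
# of orders, and each order carries a repeating group of parts.
UNNORMALIZED = [
    {
        "Customer_Number": 1,
        "Customer_Name": "Acme Ltd",
        "orders": [
            {
                "Order_Number": 100,
                "Order_Date": "2008-01-15",
                "parts": [
                    {"Part_Number": 7, "Part_Description": "Bolt",
                     "Part_Quantity": 20, "Part_Price": 0.25,
                     "Supplier_Number": 50, "Supplier_Name": "Fastco"},
                    {"Part_Number": 9, "Part_Description": "Nut",
                     "Part_Quantity": 20, "Part_Price": 0.10,
                     "Supplier_Number": 50, "Supplier_Name": "Fastco"},
                ],
            },
            {
                "Order_Number": 101,
                "Order_Date": "2008-02-01",
                "parts": [
                    {"Part_Number": 7, "Part_Description": "Bolt",
                     "Part_Quantity": 5, "Part_Price": 0.25,
                     "Supplier_Number": 50, "Supplier_Name": "Fastco"},
                ],
            },
        ],
    },
]


def to_1nf(customers):
    """Split nested customer records into three flat lists of rows."""
    customer_rows, order_rows, order_part_rows = [], [], []
    for customer in customers:
        customer_rows.append({
            "Customer_Number": customer["Customer_Number"],
            "Customer_Name": customer["Customer_Name"],
        })
        # Step 1: the repeating group of orders moves to ORDERS,
        # keeping Customer_Number as a foreign key.
        for order in customer["orders"]:
            order_rows.append({
                "Order_Number": order["Order_Number"],
                "Customer_Number": customer["Customer_Number"],
                "Order_Date": order["Order_Date"],
            })
            # Step 2: the repeating group of parts moves to ORDER_PARTS,
            # keeping Order_Number as a foreign key.
            for part in order["parts"]:
                order_part_rows.append(
                    {"Order_Number": order["Order_Number"], **part}
                )
    return customer_rows, order_rows, order_part_rows


customers, orders, order_parts = to_1nf(UNNORMALIZED)
```

Every value in the three resulting lists is atomic, which is the aim of 1NF. The part description, price and supplier details are still repeated for every order line, which is what the 2NF step addresses.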
The CUSTOMERS and ORDERS tables each have a single column making up their primary
key and are therefore by definition in 2NF. However, looking at the ORDER_PARTS table, it
can be seen that Part_Description, Part_Price, Supplier_Number and Supplier_Name only
depend on Part_Number, i.e. their values are the same regardless of Order_Number.
(Part_Quantity depends on the whole of the key since different quantities can appear on
different orders.) To convert to 2NF, a separate table is created for part descriptions,
prices, and supplier details.


CUSTOMERS(Customer_Number, Customer_Name)
ORDERS(Order_Number, Customer_Number*, Order_Date)
ORDER_PARTS(Part_Number, Order_Number*, Part_Quantity)
PARTS(Part_Number, Part_Description, Part_Price, Supplier_Number,
Supplier_Name)
The structures are now in 2NF, since every non-primary key attribute depends on the
whole of the key. The next step is to convert the structure into 3NF by ensuring that each
non-primary key attribute depends on nothing but the key.
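The partial dependencies that motivated the 2NF split can be checked against sample rows. The sketch below is only an illustration: the helper fd_holds and the three rows are hypothetical. Sample data can refute a dependency but cannot prove one, so whether a dependency really holds remains a business rule.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns
        # the stored one on later visits, so a mismatch means a violation.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical 1NF ORDER_PARTS rows (key: Part_Number + Order_Number).
ORDER_PARTS = [
    {"Part_Number": 7, "Order_Number": 100, "Part_Description": "Bolt",
     "Part_Quantity": 20, "Part_Price": 0.25},
    {"Part_Number": 9, "Order_Number": 100, "Part_Description": "Nut",
     "Part_Quantity": 20, "Part_Price": 0.10},
    {"Part_Number": 7, "Order_Number": 101, "Part_Description": "Bolt",
     "Part_Quantity": 5, "Part_Price": 0.25},
]

# Description and price depend on Part_Number alone: a partial dependency.
partial = fd_holds(ORDER_PARTS, ["Part_Number"],
                   ["Part_Description", "Part_Price"])
# Quantity is not determined by Part_Number alone ...
quantity_on_part = fd_holds(ORDER_PARTS, ["Part_Number"], ["Part_Quantity"])
# ... but it is determined by the whole key.
quantity_on_key = fd_holds(ORDER_PARTS, ["Part_Number", "Order_Number"],
                           ["Part_Quantity"])

print(partial, quantity_on_part, quantity_on_key)  # True False True
```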
The CUSTOMERS table is patently in 3NF, because there is no non-primary key attribute
for Customer_Name to depend on. The ORDERS table is in 3NF, because there is no
dependency between Order_Date and Customer_Number (a customer can place different
orders on different dates). The ORDER_PARTS table is in 3NF, because the quantity
ordered is dependent on both the order number and the part number. Looking at
the PARTS table, however, it can be seen that the Supplier_Name attribute depends on the
Supplier_Number and has nothing to do with the part number. To convert the structure
into 3NF, a separate table is created containing supplier details.
CUSTOMERS(Customer_Number, Customer_Name)
ORDERS(Order_Number, Customer_Number*, Order_Date)
ORDER_PARTS(Part_Number, Order_Number*, Part_Quantity)
PARTS(Part_Number, Supplier_Number*, Part_Description, Part_Price)
SUPPLIERS(Supplier_Number, Supplier_Name)
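As a hedged sketch, the final 3NF structure could be declared in SQL as follows, run here through Python's built-in sqlite3 module. The column types, the NOT NULL choices and the sample rows are assumptions, and the asterisk in the notation above is read as marking a foreign key.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (
    customer_number INTEGER PRIMARY KEY,
    customer_name   TEXT NOT NULL
);
CREATE TABLE orders (
    order_number    INTEGER PRIMARY KEY,
    customer_number INTEGER NOT NULL REFERENCES customers(customer_number),
    order_date      TEXT NOT NULL
);
CREATE TABLE suppliers (
    supplier_number INTEGER PRIMARY KEY,
    supplier_name   TEXT NOT NULL
);
CREATE TABLE parts (
    part_number      INTEGER PRIMARY KEY,
    supplier_number  INTEGER NOT NULL REFERENCES suppliers(supplier_number),
    part_description TEXT NOT NULL,
    part_price       REAL NOT NULL
);
CREATE TABLE order_parts (
    part_number   INTEGER NOT NULL REFERENCES parts(part_number),
    order_number  INTEGER NOT NULL REFERENCES orders(order_number),
    part_quantity INTEGER NOT NULL,
    PRIMARY KEY (part_number, order_number)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(SCHEMA)

# Hypothetical sample rows.
conn.execute("INSERT INTO customers VALUES (1, 'Acme Ltd')")
conn.execute("INSERT INTO orders VALUES (100, 1, '2008-01-15')")
conn.execute("INSERT INTO suppliers VALUES (50, 'Fastco')")
conn.executemany("INSERT INTO parts VALUES (?, ?, ?, ?)",
                 [(7, 50, "Bolt", 0.25), (9, 50, "Nut", 0.10)])
conn.executemany("INSERT INTO order_parts VALUES (?, ?, ?)",
                 [(7, 100, 20), (9, 100, 20)])

# Rebuild the original order-form view by joining the five tables.
order_lines = conn.execute("""
    SELECT c.customer_name, o.order_number, p.part_description,
           op.part_quantity, s.supplier_name
    FROM order_parts AS op
    JOIN orders    AS o ON o.order_number    = op.order_number
    JOIN customers AS c ON c.customer_number = o.customer_number
    JOIN parts     AS p ON p.part_number     = op.part_number
    JOIN suppliers AS s ON s.supplier_number = p.supplier_number
    ORDER BY p.part_number
""").fetchall()
```

The supplier name is stored once in suppliers, yet the join still reproduces it on every order line, so no information from the un-normalized form is lost.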

Sample Example Of E-R Diagram


Company:
The company is organized into departments. Each department has a name, a number and a
manager, an employee who manages the department. The company keeps track of the date on
which that employee began managing the department. A department may have several locations.
Department:
A department controls a number of projects, each of which has a unique name and number,
and a single location.
Employee:
Name, Age, Gender, BirthDate, SSN, Address, Salary. An employee is assigned to one
department but may work on several projects, which are not necessarily controlled by the
same department. The number of hours per week that an employee works on each project is
also tracked.
The dependants of each employee are tracked for insurance purposes. For each
dependant we keep the first name, gender, date of birth and relationship to the employee.


Example:

Manage:
Department and Employee
Partial Participation
Relationship Attribute: StartDate.
Works For:
Department and Employee
Total Participation

Control:
Department, Project
Partial Participation from Department
Total Participation from Project
Control Department is a RKA.

Supervisor:
Employee, Employee
Partial and Recursive


Works–On:
Project, Employee
Total Participation
Hours Worked is a RKA.
Dependants of:
Employee, Dependant
Dependant is a weak entity.
Dependant is Total, Employee is Partial.
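Two of the relationships listed above, Works–On and Dependants of, might be mapped to tables as in the following sketch, which uses Python's sqlite3 module. It is one possible mapping under stated assumptions, not the only one: the table and column names, the sample rows, and the choice of the dependant's first name as the partial key are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE employee (
    ssn  TEXT PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE project (
    number INTEGER PRIMARY KEY,
    name   TEXT NOT NULL UNIQUE
);
-- Works-On: many-to-many relationship carrying the hours attribute.
CREATE TABLE works_on (
    employee_ssn   TEXT    NOT NULL REFERENCES employee(ssn),
    project_number INTEGER NOT NULL REFERENCES project(number),
    hours_per_week REAL    NOT NULL,
    PRIMARY KEY (employee_ssn, project_number)
);
-- Dependant: weak entity, identified only together with its owning employee.
CREATE TABLE dependant (
    employee_ssn  TEXT NOT NULL REFERENCES employee(ssn) ON DELETE CASCADE,
    first_name    TEXT NOT NULL,
    gender        TEXT,
    date_of_birth TEXT,
    relationship  TEXT,
    PRIMARY KEY (employee_ssn, first_name)
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO employee VALUES ('111', 'Asha')")
conn.execute("INSERT INTO employee VALUES ('222', 'Ravi')")
conn.execute("INSERT INTO project VALUES (1, 'Payroll')")
conn.execute("INSERT INTO works_on VALUES ('111', 1, 12.5)")
conn.execute(
    "INSERT INTO dependant VALUES ('222', 'Meena', 'F', '2001-04-02', 'Daughter')"
)

# Deleting the owning employee also removes that employee's dependants.
conn.execute("DELETE FROM employee WHERE ssn = '222'")
remaining_dependants = conn.execute(
    "SELECT COUNT(*) FROM dependant"
).fetchone()[0]

hours = conn.execute(
    "SELECT hours_per_week FROM works_on "
    "WHERE employee_ssn = '111' AND project_number = 1"
).fetchone()[0]
```

The composite primary key on dependant reflects that a weak entity has no key of its own, and ON DELETE CASCADE reflects its total participation in the identifying relationship.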

Summary of Conceptual Design


Conceptual design follows requirements analysis. It yields a high-level description of the data
to be stored.
The ER model is popular for conceptual design. Its constructs are expressive and close to the way
people think about their applications.
Basic constructs: Entities, Relationships, and Attributes (of entities and relationships).
Some additional constructs are: Weak entities, ISA hierarchies, and Aggregation.
Note: There are many variations on the ER model.

Summary of ER
Several kinds of integrity constraints can be expressed in the ER model: key constraints,
participation constraints, and overlap/covering constraints for ISA hierarchies. Some
foreign key constraints are also implicit in the definition of a relationship set.
• Some of these constraints can be expressed in SQL only if we use general
CHECK constraints or assertions.


• Some constraints (notably, functional dependencies) cannot be expressed in the ER
model.
• Constraints play an important role in determining the best database design for an
enterprise.
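The difference between constraints that map directly to SQL and those that need something more general can be illustrated with the Works For relationship from the sample ER example. This is a sketch under assumptions: table names, column names and rows are hypothetical. SQLite, used here through Python's sqlite3 module, does not implement CREATE ASSERTION, so the second constraint is only detected by a query rather than enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE department (
    number INTEGER PRIMARY KEY,
    name   TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    ssn               TEXT PRIMARY KEY,
    name              TEXT NOT NULL,
    -- NOT NULL: every employee works for a department (total participation).
    -- A single column: at most one department per employee (key constraint).
    department_number INTEGER NOT NULL REFERENCES department(number)
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO department VALUES (1, 'Research')")
conn.execute("INSERT INTO department VALUES (2, 'Sales')")
conn.execute("INSERT INTO employee VALUES ('111', 'Asha', 1)")

# An employee without a department is rejected by the NOT NULL constraint.
try:
    conn.execute("INSERT INTO employee VALUES ('222', 'Ravi', NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# "Every department has at least one employee" is not enforced by the
# declarations above; this query merely reports the violations.
unstaffed = [row[0] for row in conn.execute("""
    SELECT d.name FROM department AS d
    WHERE NOT EXISTS (SELECT 1 FROM employee AS e
                      WHERE e.department_number = d.number)
    ORDER BY d.name
""")]
```

Here the participation of Employee is captured by an ordinary column constraint, while the participation of Department spans two tables and would need an assertion, a trigger or application logic to enforce.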
ER design is subjective. There are often many ways to model a given scenario! Analyzing
alternatives can be tricky, especially for a large enterprise. Common choices include:
Entity vs. attribute, entity vs. relationship, binary or n-ary relationship, whether or not
to use ISA hierarchies, and whether or not to use aggregation.
Ensuring good database design: The resulting relational schema should be analyzed and
refined further. FD information and normalization techniques are especially useful.


Case Studies

1. Prescriptions-R-X chain
The Prescriptions-R-X chain of pharmacies has offered to give you a free lifetime supply of
medicines if you design its database. Given the rising cost of health care, you agree.
Here's the information that you gather:
Patients are identified by an SSN, and their names, addresses, and ages must be
recorded.
Doctors are identified by an SSN. For each doctor, the name, specialty, and years of
experience must be recorded.
Each pharmaceutical company is identified by name and has a phone number.
For each drug, the trade name and formula must be recorded. Each drug is sold by a given
pharmaceutical company, and the trade name identifies a drug uniquely from among the
products of that company. If a pharmaceutical company is deleted, you no longer need to
keep track of its products.
Each pharmacy has a name, address, and phone number.
Every patient has a primary physician. Every doctor has at least one patient.
Each pharmacy sells several drugs and has a price for each. A drug could be sold at
several pharmacies, and the price could vary from one pharmacy to another.
Doctors prescribe drugs for patients. A doctor could prescribe one or more drugs for
several patients, and a patient could obtain prescriptions from several doctors. Each
prescription has a date and a quantity associated with it. You can assume that if a doctor
prescribes the same drug for the same patient more than once, only the last such prescription
needs to be stored.
Pharmaceutical companies have long-term contracts with pharmacies. A pharmaceutical
company can contract with several pharmacies, and a pharmacy can contract with several
pharmaceutical companies. For each contract, you have to store a start date, an end date,
and the text of the contract.
Pharmacies appoint a supervisor for each contract. There must always be a supervisor for
each contract, but the contract supervisor can change over the lifetime of the contract.
1. Draw an ER diagram that captures the above information. Identify any constraints that
are not captured by the ER diagram.
2. How would your design change if each drug must be sold at a fixed price by all
pharmacies?
3. How would your design change if the design requirements changed as follows: if a
doctor prescribes the same drug for the same patient more than once, several such
prescriptions may have to be stored?

2. Dane County Airport


The Computer Sciences Department has been complaining frequently to Dane County Airport
officials about the poor organization at the airport. As a result, the officials have decided
that all information related to the airport should be organized using a DBMS, and you've
been hired to design the database. Your first task is to organize the information about all
the airplanes that are stationed and maintained at the airport.
The relevant information is as follows:
• Every airplane has a registration number, and each airplane is of a specific model.
• The airport accommodates a number of airplane models, and each model is
identified by a model number (e.g., DC-10) and has a capacity and a weight.
• A number of technicians work at the airport. You need to store the name, SSN,
address, phone number, and salary of each technician.
• Each technician is an expert on one or more plane model(s), and his or her
expertise may overlap with that of other technicians. This information about
technicians must also be recorded.
• Traffic controllers must have an annual medical examination. For each traffic
controller, you must store the date of the most recent exam.
• All airport employees (including technicians) belong to a union. You must store the
union membership number of each employee. You can assume that each employee
is uniquely identified by the social security number.
• The airport has a number of tests that are used periodically to ensure that airplanes
are still airworthy. Each test has a Federal Aviation Administration (FAA) test
number, a name, and a maximum possible score.
• The FAA requires the airport to keep track of each time that a given airplane is
tested by a given technician using a given test. For each testing event, the
information needed is the date, the number of hours the technician spent doing the
test, and the score that the airplane received on the test.
1. Draw an ER diagram for the airport database. Be sure to indicate the various attributes
of each entity and relationship set; also specify the key and participation constraints
for each relationship set. Specify any necessary overlap and covering constraints as
well (in English).
2. The FAA passes a regulation that tests on a plane must be conducted by a technician
who is an expert on that model. How would you express this constraint in the ER
diagram? If you cannot express it, explain briefly.

3. University Database


Consider the following information about a university database:


• Professors have an SSN, a name, an age, a rank, and a research speciality.
• Projects have a project number, a sponsor name (e.g., NSF), a starting date, an
ending date and a budget.
• Graduate students have an SSN, a name, an age and a degree program (e.g., M.S. or
Ph.D.).
• Each project is managed by one professor (known as the project's principal
investigator).
• Each project is worked on by one or more professors (known as the project's co-
investigators).
• Professors can manage and/or work on multiple projects.
• Each project is worked on by one or more graduate students (known as the project's
research assistants).
• When graduate students work on a project, a professor must supervise their work on
the project. Graduate students can work on multiple projects, in which case they will
have a (potentially different) supervisor for each one.
• Departments have a department number, a department name and a main office.
Departments have a professor (known as the chairman) who runs the department.
• Professors work in one or more departments, and for each department that they work
in, a time percentage is associated with their job.
• Graduate students have one major department in which they are working on their
degree.
• Each graduate student has another, more senior graduate student (known as a student
advisor) who advises him or her on what courses to take.
Design and draw an ER diagram that captures the information about the university.
Use only the basic ER model here; that is, entities, relationships, and attributes. Be sure to
indicate any key and participation constraints.
