Fundamentals of Database Management 1

Table of contents

CHAPTER 1: INTRODUCTION TO RDBMS TECHNOLOGIES ................................................ 8

Introduction to Database Management System Concepts .................................................................... 8


What is a Database? ............................................................................................................................. 8
What is a DBMS? ................................................................................................................................. 8
Components of a DBMS ...................................................................................................................... 8
Characteristics of Data in a Database ................................................................................................... 9
Types of Database Management Systems ............................................................................................ 9
Introduction .......................................................................................................................................... 9
Relational Model .................................................................................................................................. 9
Properties of Relational Tables: ......................................................................................................... 10
Advantages ......................................................................................................................................... 10
Disadvantages..................................................................................................................................... 10
Network Model .................................................................................................................................. 10
Advantages ......................................................................................................................................... 11
Disadvantages..................................................................................................................................... 11
Hierarchical Model ............................................................................................................................. 11
Advantages ......................................................................................................................................... 11
Disadvantages..................................................................................................................................... 11
Object Oriented Data Models ............................................................................................................. 12
Advantages ......................................................................................................................................... 12
Disadvantages..................................................................................................................................... 12
Semistructured Model ........................................................................................................................ 12
Associative Model .............................................................................................................................. 13
Entity-Attribute-Value (EAV) data model ......................................................................................... 13
Context Model .................................................................................................................................... 13
Advantages of DBMS .......................................................................................................................... 13
Redundancies and inconsistencies can be reduced ............................................................................ 14
Better service to the Users .................................................................................................................. 14
Flexibility of the system is improved ................................................................................................. 15
Cost of developing and maintaining systems is lower ....................................................................... 15
Standards can be enforced .................................................................................................................. 15
Security can be improved ................................................................................................................... 15
Integrity can be improved .................................................................................................................. 15


Enterprise requirements can be identified .......................................................................................... 16
Data model must be developed .......................................................................................................... 16

CHAPTER 2 : Database Design ........................................................................................................ 17

Introduction ........................................................................................................................................ 17
Design Process ................................................................................................................................... 17
Determining data to be stored ............................................................................................................ 17
Conceptual schema ............................................................................................................................. 18
Logically structuring data................................................................................................................... 18
Physical database design .................................................................................................................... 18
Difference between a Database System and a File System .............................................................. 18
Introduction ........................................................................................................................................ 18
Self-Describing Nature of a Database System ................................................................................... 19
Insulation Between Programs And Data ............................................................................................ 19
Support of Multiple Views of the Data .............................................................................................. 19
Sharing Of Data and Multi-User Transaction Processing .................................................................. 19
Moving to Relational Model............................................................................................................... 19
Introduction ........................................................................................................................................ 19
Schema ............................................................................................................................................... 19
Subschema .......................................................................................................................................... 20
Levels of Abstraction ......................................................................................................................... 20
Data Independence ............................................................................................................................. 20
Relation .............................................................................................................................................. 20
Types of Relationship......................................................................................................................... 20
One-to-one relationships .................................................................................................................... 21
One-to-many relationships ................................................................................................................. 21
Many-to-many relationships .............................................................................................................. 21
The Relational Data Structure ............................................................................................................ 22
Relational Data Integrity .................................................................................................................... 23
Integrity Constraints ........................................................................................................................... 24
Domain Constraints ............................................................................................................................ 24
Referential Integrity ........................................................................................................................... 25
Operational Constraints ...................................................................................................................... 25
CODD’S Rules ................................................................................................................................... 25

CHAPTER 3: Relational Algebra ..................................................................................................... 27

A Brief Introduction ........................................................................................................................... 27


What? Why? ....................................................................................................................................... 27
The Basic Operations in Relational Algebra ...................................................................................... 27
Selection Operation σ ......................................................................................................................... 28
SELECT S WHERE CITY = 'PARIS' ............................................................................................... 29
Project Operation................................................................................................................................ 30
PROJECT S OVER CITY.................................................................................................................. 31
PROJECT S OVER SNAME, STATUS ............................................................................................ 32
Sequences Of Operations ................................................................................................................... 32
Part names where weight is less than 17: ........................................................................................... 32
Renaming Operation .......................................................................................................................... 32
The Cartesian Product ........................................................................................................................ 33
Division .............................................................................................................................................. 34
Using basic operations: ...................................................................................................................... 34
Theta-join:- ......................................................................................................................................... 35
Equi-join:- .......................................................................................................................................... 35
Outer Joins:- ....................................................................................................................................... 35
Natural Joins....................................................................................................................................... 36
Set Operations .................................................................................................................................... 36

CHAPTER 4 : Relational Calculus ................................................................................................... 38

Introduction ........................................................................................................................................ 38
Why It Is Called Relational Calculus? ............................................................................................... 38
Tuple Calculus.................................................................................................................................... 38
Domain Calculus ................................................................................................................................ 39
Analogies ............................................................................................................................................ 40
Entity .................................................................................................................................................. 40
Attribute ............................................................................................................................................. 40
Single Valued vs. Multi Valued ......................................................................................................... 41
Database Architecture Explained ...................................................................................................... 41
Types of Database Architecture ......................................................................................................... 41
Two-Tier Architecture (Client-Server Architecture) ......................................................................... 42
Presentation Services.......................................................................................................................... 42
Business Services/objects................................................................................................................... 42
Application Services .......................................................................................................................... 42
Advantages of Two-tier Architecture ................................................................................................. 43
Drawbacks of Two-tier Architecture.................................................................................................. 43
Three-tier Architecture ....................................................................................................................... 43
Multitier Architecture ......................................................................................................................... 44
E-R Diagrams ...................................................................................................................................... 46
Introducing E/R Diagram ................................................................................................................... 46
Analogies ............................................................................................................................................ 46
Entity .................................................................................................................................................. 46
Attribute ............................................................................................................................................. 46
Movie World Example: ...................................................................................................................... 46
Student World Example: .................................................................................................................... 47
Single Valued vs. Multi Valued ......................................................................................................... 47
Movie World Example: ...................................................................................................................... 47
Student World Example: .................................................................................................................... 47
Movie World Example: ...................................................................................................................... 47
Student World Example: .................................................................................................................... 47
E-R Diagrams ..................................................................................................................................... 48
An aside on null values ...................................................................................................................... 48
Symbols Used In E-R Diagrams ........................................................................................................ 49
Entity Type ......................................................................................................................................... 49
Movie World Example: ...................................................................................................................... 49
Student World Example: .................................................................................................................... 50
Key Attributes .................................................................................................................................... 50
Movie World Example: ..................................................................................................................... 50
Student World Example: .................................................................................................................... 50
Relationship ........................................................................................................................................ 50
Movie World Example: .................................................................................................................... 50
Student World Example: .................................................................................................................... 50
Relationship Type .............................................................................................................................. 50
Movie World Example: ...................................................................................................................... 51
Student World Example: .................................................................................................................... 51
Cardinality Ratio ................................................................................................................................ 51
Movie World Example: ...................................................................................................................... 52
Student World Example: .................................................................................................................... 52


Movie World Example: ...................................................................................................................... 52
Student World Example: .................................................................................................................... 52
Movie World Example ...................................................................................................................... 53
Student World Example ..................................................................................................................... 53
Weak Entity Type............................................................................................................................... 53
Movie World Example: .................................................................................................................... 53
Example of an E-R Diagram .............................................................................................................. 54
Data Flow Diagram ............................................................................................................................ 55
The process specification: .................................................................................................................. 56
Functional Dependencies .................................................................................................................... 60
Introduction ........................................................................................................................................ 60
What Is Functional Dependency In A Relation? ................................................................................ 60
Identifying Functional Dependencies................................................................................................. 61
Trivial Functional Dependencies ....................................................................................................... 62
Inference Rules for Functional Dependencies ................................................................................... 62

Chapter 5 : Normalization ................................................................................................................. 63

Analysis of Redundancies .................................................................................................................. 63


Deciding About Redundancies ........................................................................................................... 63
Issues Related To Redundancies (Anomalies) ................................................................................... 63
Insertion Anomalies ........................................................................................................................... 63
Emp_Dept........................................................................................................................................... 63
THE RELATIONAL MODEL........................................................................................................... 64
INTRODUCTION TO THE RELATIONAL MODEL ..................................................................... 65
The Relational Model ......................................................................................................................... 65
Creating and Modifying Relations Using SQL-92 ............................................................................. 67
INTEGRITY CONSTRAINTS OVER RELATIONS ....................................................................... 69
Specifying Key Constraints in SQL-92 .............................................................................................. 70
CONSTRAINT Students Key PRIMARY KEY (sid) ) ..................................................................... 70
Students (Referenced relation) ........................................................................................................... 72
Specifying Foreign Key Constraints in SQL-92 ................................................................................ 72
General Constraints ............................................................................................................................ 72
ENFORCING INTEGRITY CONSTRAINTS .................................................................................. 73
QUERYING RELATIONAL DATA................................................................................................. 75
LOGICAL DATABASE DESIGN: ER TO RELATIONAL ............................................................ 77


Entity Sets to Tables ........................................................................................................................... 77
Relationship Sets (without Constraints) to Tables ............................................................................. 78
Translating Relationship Sets with Key Constraints .......................................................................... 80
Translating Relationship Sets with Participation Constraints ............................................................ 81
Normalization ..................................................................................................................................... 82
Design versus Implementation ........................................................................................................... 83
Normalized Design: Pros and Cons ................................................................................................... 83
Pros of Normalizing: .......................................................................................................................... 83
Cons of Normalizing: ......................................................................................................................... 83
Terminology ....................................................................................................................................... 83
Formal Definitions of the Normal Forms ........................................................................................... 84
Steps to Normalize a Table ................................................................................................................ 87
Understanding Database Instance ..................................................................................................... 87
Understanding Database Language .................................................................................................. 87
Explaining Database Security ............................................................................................................ 88
What is Database Security? ................................................................................................................ 88
Discretionary Access Control............................................................................................................. 88
User Roles .......................................................................................................................................... 89
Setting Permission to Create Databases ............................................................................................. 89
Security for External Routines (UDRs) ............................................................................................. 89
Enabling non-DBSAs to View SQL Statements a Session Is Executing ........................................... 89
Mandatory Access .............................................................................................................................. 89
Statistical Databases ........................................................................................................................... 89
Security in Statistical Databases ........................................................................................................ 90
Data Encryption.................................................................................................................................. 90

Chapter 6 : Writing Queries Using SQL .......................................................................................... 91

A Brief History of SQL ...................................................................................................................... 91


Current State....................................................................................................................................... 91
SQL Data Definition Statements ....................................................................................................... 91
Table space Creation .......................................................................................................................... 91
Semantics ........................................................................................................................................... 93
BIGFILE | SMALLFILE .................................................................................................................... 93
Table space Management ................................................................................................................... 93
Introduction to Tablespaces, Datafiles, and Control Files ................................................................. 93


So What is SQL? ................................................................................................................................. 95
SQL Commands ................................................................................................................................. 95
Characteristics Of SQL Commands ................................................................................................... 95
SQL Data Definition Language (DDL) .............................................................................................. 95
How to Modify Table ......................................................................................................................... 96
SQL Data Manipulation Language (DML) ........................................................................................ 96
Transaction Control Language(TCL) ................................................................................................. 98

CHAPTER 1: INTRODUCTION TO RDBMS TECHNOLOGIES

Introduction to Database Management System Concepts

What is a Database?

A computer database is a structured collection of records or data stored in a computer system. The
structure is achieved by organizing the data according to a database model. The model in most common
use today is the relational model. Other models, such as the hierarchical model and the network model,
use a more explicit representation of relationships.
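
As a minimal sketch of what "organizing the data according to a database model" can look like in the relational case, the following Python snippet uses the standard-library sqlite3 module. The table name, column names, and rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# In the relational model, data is organized into tables of rows and columns.
# The table and its columns are hypothetical examples.
conn.execute("CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO student (sid, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Paris"), (2, "Ben", "London"), (3, "Chen", "Paris")],
)

# Records are found by comparing values, not by following stored links.
paris_students = conn.execute(
    "SELECT name FROM student WHERE city = ? ORDER BY sid", ("Paris",)
).fetchall()
print(paris_students)  # [('Asha',), ('Chen',)]
```

The hierarchical and network models would instead store explicit parent-child or owner-member links between records, which is what a more explicit representation of relationships refers to.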

What is a DBMS?

The database management system, or DBMS, is one of the oldest kinds of computer software. It is a
program designed to manage all databases that are installed on a system's storage or reachable over a
network. Different types of database management systems exist, and some are designed to oversee and
control databases that are configured for specific purposes. The following paragraphs give examples of
DBMS products currently in use and describe the basic elements found in DBMS software.

As the tool employed in the broad practice of managing databases, the DBMS is marketed in many
forms. Popular examples of DBMS solutions include Microsoft Access, FileMaker, DB2, and Oracle. All
these products allow rights or privileges to be defined and associated with a specific user. This makes
it possible to designate one or more database administrators who control each function and who can
give other users various levels of administration rights. With this flexibility, oversight of a system
through the DBMS can be centrally controlled or allocated to several different people.
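
The idea of associating privileges with users can be sketched in a few lines of Python. This is a hypothetical in-memory model for illustration only, not the mechanism of any of the products named above; the user names, operation names, and the functions is_allowed and grant are all assumptions.

```python
# Hypothetical privilege table: user name -> set of operations that user may perform.
privileges = {
    "dba": {"select", "insert", "update", "delete", "grant"},
    "clerk": {"select", "insert"},
    "auditor": {"select"},
}


def is_allowed(user: str, operation: str) -> bool:
    """Return True if the user holds the privilege for the operation."""
    return operation in privileges.get(user, set())


def grant(granter: str, user: str, operation: str) -> None:
    """Let a user who holds the 'grant' privilege extend another user's rights."""
    if not is_allowed(granter, "grant"):
        raise PermissionError(f"{granter} may not grant privileges")
    privileges.setdefault(user, set()).add(operation)


# The administrator gives the auditor one additional right.
grant("dba", "auditor", "insert")
```

Here only the administrator can hand out rights, which corresponds to the centrally controlled case; giving the grant privilege to several users would correspond to allocating administration to several people.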

Components of a DBMS

There are four essential elements that are found with just about every example of DBMS currently on the
market. The first is the implementation of a modeling language that serves to define the language of each
database that is hosted via the DBMS. There are several approaches currently in use, with hierarchical,
network, relational, and object examples. Essentially, the modeling language ensures the ability of the
databases to communicate with the DBMS and thus operate on the system.

Second, data structures also are administered by the DBMS. Examples of data that are organized by this
function are individual profiles or records, files, fields and their definitions, and objects such as visual
media. Data structures are what allow the DBMS to interact with the data without causing any damage to the
integrity of the data itself.

A third component of DBMS software is the data query language. This element is involved in maintaining
the security of the database, by monitoring the use of login data, the assignment of access rights and
privileges, and the definition of the criteria that must be employed to add data to the system. The data
query language works with the data structures to make sure it is harder to input irrelevant data into any
of the databases in use on the system.

Last, a mechanism that allows for transactions is an essential element of any DBMS. This mechanism allows
concurrent access to the database by multiple users, prevents the manipulation of one record by two users
at the same time, and prevents the creation of duplicate records.

Characteristics of Data in a Database

The characteristics of data in a database are as follows:

1. Shared – Data in a database is shared among different users and applications.
2. Persistence – Data in a database exists permanently, in the sense that data can live beyond the
scope of the process that created it.
3. Correctness – Data should be correct.
4. Security – Data should be protected from unauthorized access.
5. Consistency – Whenever more than one data element in a database represents related real-world
values, the values should be consistent with respect to the relationship.
6. Non-redundancy – No two data items in a database should represent the same real-world
entity.

Types of Database Management Systems

Introduction

A DBMS can take any one of the several approaches to manage data. Each approach constitutes
a database model. A data model is a collection of descriptions of data structures and their contained fields,
together with the operations or functions that manipulate them. A data model is a comprehensive
scheme for describing how data is to be represented for manipulation by humans or computer
programs. A thorough representation details the types of data, the topological arrangements of data,
spatial and temporal maps onto which data can be projected, and the operations and structures that can
be invoked to handle data and its maps. The main database models are the following:

• Relational – data model based on tables.
• Network – data model based on graphs with records as nodes and relationships between records
as edges.
• Hierarchical – data model based on trees.
• Object-Oriented – data model based on the object-oriented programming paradigm.

Relational Model

The relational model is a database model that organizes data logically in tables. It is a formal theory of
data consisting of three major components: (a) a structural aspect, meaning that data in the database is
perceived as tables, and only tables; (b) an integrity aspect, meaning that those tables satisfy certain
integrity constraints; and (c) a manipulative aspect, meaning that the tables can be operated upon by
means of operators which derive tables from tables. Here each table corresponds to an application entity
and each row represents an instance of that entity. The relational model was developed by E. F. Codd; a
relational database management system (RDBMS) is a DBMS based on this model.

A relational database allows the definition of data structures, storage and retrieval operations and
integrity constraints. In such a database the data and relations between them are organized in tables. A
table is a collection of records and each record in a table contains the same fields.

Properties of Relational Tables:

• Values Are Atomic
• Each Row is Unique
• Column Values Are of the Same Kind
• The Sequence of Columns is Insignificant
• The Sequence of Rows is Insignificant
• Each Column Has a Unique Name

Certain fields may be designated as keys, which means that searches for specific values of that field will
use indexing to speed them up. Where fields in two different tables take values from the same set, a join
operation can select related records in the two tables by matching values in those fields. Often, but not
always, the fields will have the same name in both tables. For example, an "orders" table might contain
(customer-ID, product-code) pairs and a "products" table might contain (product-code, price) pairs, so to
calculate a given customer's bill you would sum the prices of all products ordered by that customer by
joining on the product-code fields of the two tables. This can be extended to joining multiple tables on
multiple fields. Because these relationships are only specified at retrieval time, relational databases are
classed as dynamic database management systems. The relational database model is based on relational
algebra.
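
The orders/products example above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names and sample rows are assumptions made for the sketch, not part of any particular product.

```python
import sqlite3

# In-memory database; table names, column names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_code TEXT PRIMARY KEY, price REAL);
    CREATE TABLE orders (customer_id INTEGER, product_code TEXT);
    INSERT INTO products VALUES ('P1', 10.0), ('P2', 2.5), ('P3', 7.0);
    INSERT INTO orders VALUES (1, 'P1'), (1, 'P2'), (2, 'P3'), (1, 'P2');
""")


def customer_bill(customer_id):
    """Sum the prices of all products ordered by one customer."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(p.price), 0)
        FROM orders AS o
        JOIN products AS p ON p.product_code = o.product_code
        WHERE o.customer_id = ?
        """,
        (customer_id,),
    ).fetchone()
    return row[0]


print(customer_bill(1))  # 10.0 + 2.5 + 2.5 = 15.0
```

The relationship between the two tables is stated only in the query (the JOIN condition), which is the sense in which the text says relationships are specified at retrieval time.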

Advantages

• Structural Independence
• Conceptual Simplicity
• Ease of design, implementation, maintenance and usage.
• Ad hoc query capability

Disadvantages

• Hardware Overheads
• Ease of design can lead to bad design

Network Model

The popularity of the network data model coincided with the popularity of the hierarchical data model.
Some data were more naturally modelled with more than one parent per child. So, the network model
permitted the modelling of many-to-many relationships in data. In 1971, the Conference on Data
Systems Languages (CODASYL) formally defined the network model. The basic data modelling
construct in the network model is the set construct. A set consists of an owner record type, a set name,
and a member record type.

A member record type in the network model can have that role in more than one set; hence the
multiparent concept is supported. An owner record type can also be a member or owner in another set.
The data model is a simple network, and link and intersection record types (called junction records by
IDMS) may exist, as well as sets between them. Thus, the complete network of relationships is
represented by several pairwise sets; in each set one record type is the owner (at the tail of the
network arrow) and one or more record types are members (at the head of the relationship arrow).
Usually, a set defines a 1:M relationship, although 1:1 is permitted. The CODASYL network model is based
on mathematical set theory.
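
As a rough sketch of the set construct, the following Python fragment models a set as an owner record type, a set name and a member record type. The record types (student, course, enrolment) are hypothetical and chosen only to show a member with more than one owner; this is not CODASYL syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetType:
    """A network-model set: a named 1:M link from an owner to a member record type."""
    name: str
    owner: str
    member: str


# Hypothetical schema: an enrolment record type is a member of two sets,
# which is how a many-to-many link between students and courses is expressed.
sets = [
    SetType("student_enrolments", owner="student", member="enrolment"),
    SetType("course_enrolments", owner="course", member="enrolment"),
]


def owners_of(member_type):
    """Record types that own a set in which member_type is the member."""
    return sorted(s.owner for s in sets if s.member == member_type)


print(owners_of("enrolment"))  # ['course', 'student']
```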

Advantages

• Conceptual Simplicity
• Ease of data access
• Data Integrity and capability to handle more relationship types
• Data independence
• Database standards

Disadvantages

• System complexity
• Absence of structural independence

Hierarchical Model

The hierarchical data model organizes data in a tree structure. There is a hierarchy of parent and child
data segments. This structure implies that a record can have repeating information, generally in the child
data segments. Data is held in a series of records, each of which has a set of field values attached to it.
The model collects all the instances of a specific record together as a record type. These record types are
the equivalent of tables in the relational model, with the individual records being the equivalent of rows.

In a hierarchical model you can create links between these record types; the hierarchical model
uses parent-child relationships, which are 1:N mappings between record types. These mappings are
represented as trees, a structure "borrowed" from mathematics, much as the relational model borrows
set theory. For example, an organization
might store information about an employee, such as name, employee number, department, salary. The
organization might also store information about an employee's children, such as name and date of birth.
The employee and children data forms a hierarchy, where the employee data represents the parent
segment and the children data represents the child segment.

If an employee has three children, then there would be three child segments associated with one
employee segment. In a hierarchical database the parent-child relationship is one to many. This restricts a
child segment to having only one parent segment. Hierarchical DBMSs were popular from the late
1960s, with the introduction of IBM's Information Management System (IMS) DBMS, through the
1970s.
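
The employee/children hierarchy described above can be sketched in Python as follows. The Segment class and the sample field values are hypothetical; the sketch shows only the one-parent, many-children shape and is not how IMS stores data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Segment:
    """One node of the hierarchy; every segment has at most one parent."""
    record_type: str
    fields: dict
    parent: Optional["Segment"] = field(default=None, repr=False, compare=False)
    children: list = field(default_factory=list)

    def add_child(self, record_type, fields):
        """Attach a child segment whose only parent is this segment."""
        child = Segment(record_type, fields, parent=self)
        self.children.append(child)
        return child


# Illustrative data only.
employee = Segment("employee", {"name": "A. Rao", "number": 1001,
                                "department": "Sales", "salary": 40000})
for name, dob in [("Asha", "2001-04-02"), ("Ravi", "2003-09-15"), ("Mina", "2007-01-30")]:
    employee.add_child("child", {"name": name, "date_of_birth": dob})

print(len(employee.children))  # three child segments under one employee segment
```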

Advantages

• Simplicity
• Data Security and Data Integrity
• Efficiency

Disadvantages

• Implementation Complexity
• Lack of structural independence
• Programming complexity

Object Oriented Data Models

Object DBMSs add database functionality to object programming languages. They bring much more than
persistent storage of programming language objects. Object DBMSs extend the semantics of the
C++, Smalltalk and Java object programming languages to provide full-featured database
programming capability, while retaining native language compatibility. A major benefit of this
approach is the unification of the application and database development into a seamless data model
and language environment. As a result, applications require less code, use more natural data
modeling, and code bases are easier to maintain. Object developers can write complete database
applications with a modest amount of additional effort.

In contrast to a relational DBMS where a complex data structure must be flattened out to fit into tables or
joined together from those tables to form the in-memory structure, object DBMSs have no performance
overhead to store or retrieve a web or hierarchy of interrelated objects. This one-to-one mapping of object
programming language objects to database objects has two benefits over other storage approaches: it
provides higher performance management of objects, and it enables better management of the
complex interrelationships between objects. This makes object DBMSs better suited to support
applications such as financial portfolio risk analysis systems, telecommunications service
applications, World Wide Web document structures, design and manufacturing systems, and hospital
patient record systems, which have complex relationships between data.

Advantages

• Capability to handle large number of different data types
• Marriage of object-oriented programming and database technology
• Data access

Disadvantages

• Difficult to maintain
• Not suited for all applications

Semistructured Model

In the semi-structured data model, the information that is normally associated with a schema is contained
within the data, which is sometimes called "self-describing". In such a database there is no clear
separation between the data and the schema, and the degree to which it is structured depends on the
application. In some forms of semistructured data there is no separate schema, in others it exists but only
places loose constraints on the data. Semi-structured data is naturally modelled in terms of graphs which
contain labels which give semantics to its underlying structure. Such databases subsume the modelling
power of recent extensions of flat relational databases, to nested databases which allow the nesting (or
encapsulation) of entities, and to object databases which, in addition, allow cyclic references between
objects.
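
A small sketch of "self-describing" data, using JSON as one common semi-structured format (the text does not name a format, so JSON is an assumption here, and the records are invented). The labels travel with the data, and two records in the same collection need not share a shape.

```python
import json

# Two records from the same hypothetical collection with different shapes.
raw = """
[
  {"name": "Ada", "email": "ada@example.org"},
  {"name": "Lin", "phones": ["555-0100", "555-0101"], "address": {"city": "Pune"}}
]
"""
records = json.loads(raw)


def labels(node, prefix=""):
    """Collect the label paths found in the data itself (no separate schema)."""
    paths = set()
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}/{key}"
            paths.add(path)
            paths |= labels(value, path)
    elif isinstance(node, list):
        for item in node:
            paths |= labels(item, prefix)
    return paths


for record in records:
    print(sorted(labels(record)))
```

The nested labels form a tree, which is the simplest case of the labelled graphs mentioned above.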

Semistructured data has recently emerged as an important topic of study for a variety of reasons. First,
there are data sources such as the Web, which we would like to treat as databases but which cannot be
constrained by a schema. Second, it may be desirable to have an extremely flexible format for data
exchange between disparate databases. Third, even when dealing with structured data, it may be helpful
to view it as semi-structured for the purposes of browsing.

Associative Model

The associative model divides the real-world things about which data is to be recorded into two sorts:
Entities are things that have discrete, independent existence. An entity's existence does not depend on
any other thing. Associations are things whose existence depends on one or more other things, such that
if any of those things ceases to exist, then the thing itself ceases to exist or becomes meaningless.
An associative database comprises two data structures:
• A set of items, each of which has a unique identifier, a name and a type.
• A set of links, each of which has a unique identifier, together with the unique identifiers of three
other things, that represent the source, verb and target of a fact that is recorded about the source in the
database. Each of the three things identified by the source, verb and target may be either a link or an
item.
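
The two structures can be sketched as follows. The class names, identifiers and sample facts are illustrative assumptions; the point is only that a link's source, verb and target are identifiers that may refer to either an item or another link.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    id: int
    name: str
    type: str


@dataclass(frozen=True)
class Link:
    id: int
    source: int  # identifier of an item or of another link
    verb: int
    target: int


# Illustrative data only.
items = {it.id: it for it in [
    Item(1, "Flight BA1234", "entity"),
    Item(2, "arrived at", "verb"),
    Item(3, "London Heathrow", "entity"),
    Item(4, "on", "verb"),
    Item(5, "12-Aug-98", "entity"),
]}
links = {ln.id: ln for ln in [
    Link(100, source=1, verb=2, target=3),
    Link(101, source=100, verb=4, target=5),  # the source here is itself a link
]}


def describe(identifier):
    """Spell out an item name, or recursively the fact recorded by a link."""
    if identifier in items:
        return items[identifier].name
    link = links[identifier]
    return " ".join(describe(part) for part in (link.source, link.verb, link.target))


print(describe(101))
```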

Entity-Attribute-Value (EAV) data model

The best way to understand the rationale of EAV design is to understand row modelling (of which EAV is a
generalized form). Consider a supermarket database that must manage thousands of products and
brands, many of which have a transitory existence. Here, it is intuitively obvious that product names
should not be hard-coded as names of columns in tables. Instead, one stores product descriptions in a
Products table: purchases/sales of individual items are recorded in other tables as separate rows with a
product ID referencing this table. Conceptually an EAV design involves a single table with three columns,
an entity (such as an olfactory receptor ID), an attribute (such as species, which is actually a pointer into
the metadata table) and a value for the attribute (e.g., rat). In EAV design, one row stores a single fact.
In a conventional table that has one column per attribute, by contrast, one row stores a set of facts. EAV
design is appropriate when the number of parameters that potentially apply to an entity is vastly more
than those that actually apply to an individual entity.
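
A minimal sketch of the three-column EAV layout, using Python's built-in sqlite3 module. The table name, entity identifiers and attribute values are assumptions for illustration, and the attribute is stored as plain text rather than as a pointer into a metadata table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row stores a single fact: (entity, attribute, value).
    CREATE TABLE eav (entity TEXT, attribute TEXT, value TEXT);
    INSERT INTO eav VALUES
        ('receptor-1', 'species', 'rat'),
        ('receptor-1', 'tissue', 'olfactory epithelium'),
        ('receptor-2', 'species', 'mouse');
""")


def facts_for(entity):
    """Gather the one-fact-per-row entries for an entity into a dict."""
    rows = conn.execute(
        "SELECT attribute, value FROM eav WHERE entity = ?", (entity,)
    )
    return dict(rows.fetchall())


print(facts_for("receptor-1"))
```

Only the attributes that actually apply to an entity take up rows, which is why the design suits sparse data.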

Context Model

The context data model combines features of all the above models. It can be considered as a collection of
object-oriented, network and semi-structured models or as some kind of object database. In other words,
this is a flexible model: you can use any type of database structure depending on the task. This data
model has been implemented in the DBMS Context.

The fundamental unit of information storage in Context is a CLASS. A class contains METHODS and
describes an OBJECT. An object contains FIELDS and a PROPERTY. A field may be composite, in which case
it contains subfields, and so on. The property is a set of fields that belongs to a particular object (similar
to AVL database). In other words, fields are the permanent part of an object, while the property is its
variable part.

The header of Class contains the definition of the internal structure of the Object, which includes the
description of each field, such as their type, length, attributes and name. Context data model has a set of
predefined types as well as user defined types. The predefined types include not only character strings,
texts and digits but also pointers (references) and aggregate types (structures).

Advantages of DBMS

There are three main features of a database management system that make it attractive to use a DBMS in
preference to more conventional software. These features are centralized data management, data
independence, and systems integration.

In a database system, the data is managed by the DBMS and all access to the data is through the DBMS
providing a key to effective data processing. This contrasts with conventional data processing systems
where each application program has direct access to the data it reads or manipulates. In a conventional
DP system, an organization is likely to have several files of related data that are processed by several
different application programs.

In the conventional data processing application programs, the programs usually are based on a
considerable knowledge of data structure and format. In such environment any change of data structure
or format would require appropriate changes to the application programs. These changes could be as
small as the following:

• Coding of some field is changed. For example, a null value that was coded as -1 is
now coded as -9999.
• A new field is added to the records.
• The length of one of the fields is changed. For example, the maximum number of
digits in a telephone number field or a postcode field needs to be changed.
• The field on which the file is sorted is changed.

If some major changes were to be made to the data, the application programs may need to be rewritten.
In a database system, the database management system provides the interface between the application
programs and the data. When changes are made to the data representation, the metadata maintained by
the DBMS is changed but the DBMS continues to provide data to application programs in the previously
used way. The DBMS handles the task of transformation of data wherever necessary.

This independence between the programs and the data is called data independence. Data independence is
important because every time some change needs to be made to the data structure, the programs that
were being used before the change would continue to work. To provide a high degree of data
independence, a DBMS must include a sophisticated metadata management system.
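
Data independence can be illustrated with a small sketch using Python's built-in sqlite3 module. The table, columns and function name are hypothetical. The application code names the fields it needs, so adding a field or an index does not require changing it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
conn.execute("INSERT INTO customer (name, phone) VALUES ('Ada', '555-0100')")


def phone_book():
    """Application code: asks for fields by name, not by position or width."""
    return conn.execute("SELECT name, phone FROM customer ORDER BY name").fetchall()


before = phone_book()

# Structural changes: a new field is added and an index is created.
conn.execute("ALTER TABLE customer ADD COLUMN postcode TEXT")
conn.execute("CREATE INDEX customer_by_name ON customer (name)")

after = phone_book()
print(before == after)  # the unchanged application code sees the same data
```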

In DBMS, all files are integrated into one system thus reducing redundancies and making data
management more efficient. In addition, DBMS provides centralized control of the operational data. Some
of the advantages of data independence, integration and centralized control are:

Redundancies and inconsistencies can be reduced

In conventional data systems, an organization often builds a collection of application programs often
created by different programmers and requiring different components of the operational data of the
organisation. The data in conventional data systems is often not centralised. Some applications may
require data to be combined from several systems. These several systems could well have data that is
redundant as well as inconsistent (that is, different copies of the same data may have different values).
Data inconsistencies are often encountered in everyday life. For example, we have all come across
situations where, after a new address is communicated to an organisation that we deal with (e.g. a bank,
or Telecom, or a gas company), some of the communications from that organisation are received at the
new address while others continue to be mailed to the old address. Combining all the data in a database
would reduce both redundancy and inconsistency. It is also likely to reduce the costs of collecting,
storing and updating data.

Better service to the Users

A DBMS is often used to provide better service to the users. In conventional systems, availability of
information is often poor since it normally is difficult to obtain information that the existing systems were
not designed for. Once several conventional systems are combined to form one centralised data base, the
availability of information and its up-to-dateness is likely to improve since the data can now be shared and
the DBMS makes it easy to respond to unforeseen information requests.
Centralizing the data in a database also often means that users can obtain new and combined information
that would have been impossible to obtain otherwise. Also, use of a DBMS should allow users that do not
know programming to interact with the data more easily.

The ability to quickly obtain new and combined information is becoming increasingly important in an
environment where various levels of governments are requiring organizations to provide more and more
information about their activities. An organization running a conventional data processing system would
require new programs to be written (or the information compiled manually) to meet every new demand.

Flexibility of the system is improved

Changes are often necessary to the contents of data stored in any system. These changes are more easily
made in a database than in a conventional system in that these changes do not need to have any impact
on application programs.

Cost of developing and maintaining systems is lower

As noted earlier, it is much easier to respond to unforeseen requests when the data is centralized in a
database than when it is stored in conventional file systems. Although the initial cost of setting up of a
database can be large, one normally expects the overall cost of setting up a database and developing and
maintaining application programs to be lower than for similar service using conventional systems since the
productivity of programmers can be substantially higher in using non-procedural languages that have
been developed with modern DBMS than using procedural languages.

Standards can be enforced

Since all access to the database must be through the DBMS, standards are easier to enforce. Standards
may relate to the naming of the data, the format of the data, the structure of the data etc.

Security can be improved

In conventional systems, applications are developed in an ad hoc manner. Often different systems of an
organisation would access different components of the operational data. In such an environment,
enforcing security can be quite difficult.

Setting up of a database makes it easier to enforce security restrictions since the data is now centralized.
It is easier to control who has access to what parts of the database. However, setting up a database can
also make it easier for a determined person to breach security. We will discuss this in the next section.

Integrity can be improved

Since the data of the organization using a database approach is centralized and would be used by a
number of users at a time, it is essential to enforce integrity controls.

Integrity may be compromised in many ways. For example, someone may make a mistake in data input
and the salary of a full-time employee may be input as $4,000 rather than $40,000. A student may be
shown to have borrowed books but has no enrolment. Salary of a staff member in one department may be
coming out of the budget of another department.

If a number of users are allowed to update the same data item at the same time, there is a possibility that
the result of the updates is not quite what was intended. For example, in an airline DBMS we could have a
situation where the number of bookings made is larger than the capacity of the aircraft that is to be used
for the flight. Controls therefore must be introduced to prevent such errors from occurring because of concurrent
updating activities. However, since all data is stored only once, it is often easier to maintain integrity than
in conventional systems.
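
One kind of integrity control for the airline example is a declared constraint that the DBMS checks on every update. The sketch below uses Python's built-in sqlite3 module; the table, the flight number and the capacity are invented for illustration, and a single connection is used, so it shows the constraint itself rather than true concurrent access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE flight (
        flight_no TEXT PRIMARY KEY,
        capacity  INTEGER NOT NULL,
        booked    INTEGER NOT NULL DEFAULT 0,
        CHECK (booked <= capacity)
    )
""")
conn.execute("INSERT INTO flight (flight_no, capacity) VALUES ('XY100', 2)")
conn.commit()


def book_seat(flight_no):
    """Try to book one seat; return False if the flight is already full."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "UPDATE flight SET booked = booked + 1 WHERE flight_no = ?",
                (flight_no,),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [book_seat("XY100") for _ in range(3)]
print(results)  # the third booking would exceed the capacity of 2
```

Because the rule is stored once with the data, every application that updates the table is subject to it.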

Enterprise requirements can be identified

All enterprises have sections and departments, and each of these units often considers its own work
the most important and therefore considers its own needs the most important. Once a database has
been set up with centralized control, it will be necessary to identify enterprise requirements and to
balance the needs of competing units. It may become necessary to ignore some requests for information if
they conflict with higher priority needs of the enterprise.

Data model must be developed

Perhaps the most important advantage of setting up a database system is the requirement that an overall
data model for the enterprise be built. In conventional systems, it is more likely that files will be designed
as needs of particular applications demand. The overall view is often not considered. Building an overall
view of the enterprise data, although often an expensive exercise, is usually very cost-effective in the long
term.

CHAPTER 2: Database Design

Introduction

Database design is the process of producing a detailed data model of a database. This data model
contains all the logical and physical design choices and physical storage parameters needed to
generate a design in a Data Definition Language, which can then be used to create a database. A fully
attributed data model contains detailed attributes for each entity.

The term database design can be used to describe many different parts of the design of an overall
database system. Principally, and most correctly, it can be thought of as the logical design of the base
data structures used to store the data. In the relational model these are the tables and views. In an
Object database the entities and relationships map directly to object classes and named relationships.
However, the term database design could also be used to apply to the overall process of designing, not
just the base data structures, but also the forms and queries used as part of the overall database
application within the Database Management System or DBMS.

Design Process

The process of doing database design generally consists of a number of steps which will be carried out by
the database designer. Not all of these steps will be necessary in all cases. Usually, the designer must:

• Determine the data to be stored in the database
• Determine the relationships between the different data elements
• Superimpose a logical structure upon the data on the basis of these relationships.

Within the relational model the final step can generally be broken down into two further steps:
determining the grouping of information within the system (generally, determining the basic
objects about which information is being stored), and then determining the relationships between these
groups of information, or objects. This step is not necessary with an Object database.

The tree structure of data may enforce a hierarchical model organization, with a parent-child relationship
table. An Object database will simply use a one-to-many relationship between instances of an object class.
It also introduces the concept of a hierarchical relationship between object classes, termed inheritance.

Determining data to be stored

In a majority of cases, the person who is doing the design of a database is a person with expertise in the
area of database design, rather than expertise in the domain from which the data to be stored is drawn
e.g. financial information, biological information etc. Therefore the data to be stored in the database must
be determined in cooperation with a person who does have expertise in that domain, and who is aware of
what data must be stored within the system.

This process is one which is generally considered part of requirements analysis, and requires skill on the
part of the database designer to elicit the needed information from those with the domain knowledge. This
is because those with the necessary domain knowledge frequently cannot express clearly what their
system requirements for the database are as they are unaccustomed to thinking in terms of the discrete
data elements which must be stored. Data to be stored can be determined by Requirement Specification.

Conceptual schema

Once a database designer is aware of the data which is to be stored within the database, they must then
determine how the various pieces of that data relate to one another. When performing this step, the
designer is generally looking out for the dependencies in the data, where one piece of information is
dependent upon another, i.e. when one piece of information changes, the other will also. For example, in a
list of names and addresses, assuming the normal situation where two people can have the same address,
but one person cannot have two addresses, the address is dependent upon the name: a given name
determines exactly one address, so if two addresses differ then the associated names must differ too.
However, the inverse is not necessarily true, i.e. when the name changes the address may stay the same.
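
The direction of such a dependency can be checked mechanically on sample data. The helper below is a hypothetical sketch (the function name and the sample list are assumptions): it tests whether each value of one field maps to exactly one value of another.

```python
def determines(rows, lhs, rhs):
    """Return True if each lhs value maps to exactly one rhs value in rows."""
    seen = {}
    for row in rows:
        # setdefault records the first rhs seen for this lhs value.
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


# Two people share an address; nobody has two addresses.
people = [
    {"name": "Ada", "address": "1 High St"},
    {"name": "Lin", "address": "1 High St"},
    {"name": "Raj", "address": "9 Mill Rd"},
]

print(determines(people, "name", "address"))  # True: each name has one address
print(determines(people, "address", "name"))  # False: one address, two names
```

A check on sample data can only refute a dependency; whether it holds in general is a statement about the domain that the designer must confirm with the domain expert.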

(NOTE: A common misconception is that the relational model is so called because of the stating of
relationships between data elements therein. This is not true. The relational model is so named
because it is based upon the mathematical structures known as relations.)

Logically structuring data

Once the relationships and dependencies amongst the various pieces of information have been
determined, it is possible to arrange the data into a logical structure which can then be mapped into the
storage objects supported by the database management system. In the case of relational databases the
storage objects are tables which store data in rows and columns.

Each table may represent an implementation of either a logical object or a relationship joining one or more
instances of one or more logical objects. Relationships between tables may then be stored as links
connecting child tables with parents. Since complex logical relationships are themselves tables they will
probably have links to more than one parent.

In an Object database the storage objects correspond directly to the objects used by the Object-oriented
programming language used to write the applications that will manage and access the data. The
relationships may be defined as attributes of the object classes involved or as methods that operate on
the object classes.

Physical database design

The physical design of the database specifies the physical configuration of the database on the storage
media. This includes detailed specification of data elements, data types, indexing options, and other
parameters residing in the DBMS data dictionary. It is the detailed design of the system, covering its
modules and the hardware and software specifications of the database.

Difference between a Database System and a File System


Introduction

In the database approach, a single repository of data is maintained that is defined once then accessed by
various users.

The major differences between a database and a file system are:

• Self-describing nature of a database
• Insulation between programs and data
• Support of multiple views of the data
• Sharing of data and multiuser transaction processing

Self-Describing Nature of a Database System

A database system contains not only the database itself but also a complete definition of the database
structure and constraints. This definition is stored in the system catalog.

The information stored in the catalog is called metadata (data about data), and it describes the structure
of the primary database.
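
The catalog can be inspected like any other data. The sketch below uses SQLite through Python's built-in sqlite3 module, where the catalog is exposed as the sqlite_master table and the table_info pragma; other DBMS products expose their catalogs differently, and the employee table is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary REAL)"
)

# The catalog lists the objects defined in the database ...
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# ... and PRAGMA table_info describes the columns of one table
# (each row is: cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employee)")]

print(tables)
print(columns)
```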

Insulation Between Programs And Data

In file processing, any change to the structure of a file may require changing all programs that access
the file.

In a database system, the structure of data files is stored in the DBMS catalog, separately from the access
programs. This is called program-data independence.

Support of Multiple Views of the Data

Each user may see a different view of the database, which describes only the data of interest to that user.
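
A view can be sketched with Python's built-in sqlite3 module as follows. The employee table and the staff_directory view are hypothetical names; the view shows a directory user only the name and department, not the salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary REAL);
    INSERT INTO employee VALUES (1, 'Ada', 'Sales', 40000), (2, 'Lin', 'IT', 52000);
    -- A directory view that leaves out the salary column.
    CREATE VIEW staff_directory AS SELECT name, department FROM employee;
""")

directory = conn.execute("SELECT * FROM staff_directory ORDER BY name").fetchall()
print(directory)  # [('Ada', 'Sales'), ('Lin', 'IT')]
```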

Sharing Of Data and Multi-User Transaction Processing

A multiuser DBMS allows a set of concurrent users to retrieve from and to update the database.
Concurrency control within the DBMS guarantees that each transaction is either correctly executed or
aborted.
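
The "correctly executed or aborted" guarantee can be sketched with Python's built-in sqlite3 module. The account table and the transfer function are assumptions for the example, and a single connection is used, so the sketch shows the all-or-nothing behaviour of one transaction rather than interleaving between users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0));
    INSERT INTO account VALUES ('a', 100), ('b', 0);
""")


def transfer(src, dst, amount):
    """Move amount between accounts; either both updates apply or neither does."""
    try:
        with conn:  # one transaction: commit on success, roll back on error
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer("a", "b", 30)       # succeeds
failed = transfer("a", "b", 500)  # would overdraw 'a', so the whole transfer is undone
print(ok, failed, balances())
```

In the failed transfer the credit to the destination account is applied first and then rolled back, so no partial result is left behind.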

Moving to Relational Model


Introduction

The relational model is an abstract theory of data based on mathematical theory, whose principles were
laid down by Dr. E. F. Codd. Codd's relational model uses certain terms and principles, and relational
database management systems are based on it. More precisely, the relational model is concerned with
three aspects of data: data structure, data integrity, and data manipulation.

Here we discuss some basic concepts related to relational model.

Schema

A schema describes the organization of data and relationships within the database. In some systems
(Oracle, for example), a schema is owned by a database user and has the same name as that user. A
schema separates physical aspects of data storage from logical aspects of data representation. The
internal schema defines how and where data are organized in physical data storage. The conceptual
schema defines the stored data structure in terms of the database model used. The external schema
defines a view or views of the database for particular users. An instance of a database is the data it
contains at some particular time.

Subschema

A subschema is the part of a database definition, to be viewed by particular applications, that describes all
or a subset of the data elements, record types, set types, and areas defined in the schema. It is basically a
portion of a schema, usually showing a particular user department's portion of the database. It
identifies a subset of the areas, sets, records, and data names defined in the database schema that are
available to user sessions.

Levels of Abstraction

 Physical level: describes how a record (e.g., customer) is stored.
 Logical level: describes the data stored in the database, and the relationships among the data.

Data Independence

It is the ability to modify a schema definition in one level without affecting a schema definition
in the next higher level. The interfaces between the various levels and components should be
well defined so that changes in some parts do not seriously influence others.

Two levels of data independence:

 Physical data independence


 Logical data independence

Relation

A relation is a set of tuples. A database is a collection of relations. A relation is a mathematical entity
corresponding to a table. Each row in a table represents a fact that corresponds to an entity or a
relationship that exists. Each row is called a tuple. Formally, the column headings of the table are the
attributes of a relation. Each attribute must be atomic. Each attribute has a domain. The domain must be
a simple data type, including for convenience strings, enumerations, dates, and subrange types. All tuples
in a relation have the same structure; they are constructed from the same set of attributes.

 row ~ tuple
 column ~ attribute

Values in a tuple are related to each other. A relation R can be thought of as a predicate: R(x, y, z) is true
if and only if the tuple (x, y, z) is in R.
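
The predicate view of a relation can be tried out directly. The following is a minimal sketch in Python, assuming a relation over three attributes x, y, z; the attribute names and the sample values are invented for illustration only.

```python
# A relation modelled as a set of tuples over the attributes (x, y, z).
# Attribute names and sample values are illustrative assumptions.
ATTRIBUTES = ("x", "y", "z")
R = {
    (1, "a", True),
    (2, "b", False),
}

def in_relation(x, y, z):
    """Predicate view of R: true exactly when (x, y, z) is a tuple of R."""
    return (x, y, z) in R

# Every tuple has the same structure: one value per attribute.
same_structure = all(len(t) == len(ATTRIBUTES) for t in R)

print(in_relation(1, "a", True))   # True
print(in_relation(1, "b", True))   # False
print(same_structure)              # True
```

Because R is a set, a tuple is either in the relation or not; there is no notion of the same tuple occurring twice.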

Types of Relationship

In database theory, there are different types of relationships between data.



One-to-one relationships

Every student has a mobile telephone number (probably!), and every mobile telephone number
corresponds to just one person. There is a one-to-one relationship between students and mobile telephone
numbers. If we have a table whose entity is Student (i.e. a table with information about students), we
could simply add "Mobile number" as one of the fields in that table. However, we might not want to clutter
the table with this kind of information, so we might make another table showing simply the student ID
and the telephone number. We can then look up this information if we ever need it. The link between the
two tables Students and Mobiles would be a one-to-one link. You can represent such a link with an entity
relationship diagram:

Fig – Example of a One-to-One Relationship

One-to-many relationships

A student only has one Director of Studies in any one year (usually), but a Director of Studies can have
many different students. The relationship between Directors of Studies and Students is a one-to-many
relationship. So, if our database were to show the director of studies of each student, we could have a
table listing all the Directors of Studies of the Colleges, with their IDs (primary key) and any extra
information wanted (full names and contact information, for example).

DoS_ID   Name                College   Telephone
rrb20    Brown, Dr Rachael   King's    32179

You could then simply add a field "DoS" to your Students table (since a student can only have one DoS)
containing the DoS's ID, and link the two tables, DoS and Students, with a one-to-many relationship. The
linked fields are the DoS_IDs (which appear in both tables).

Stud_ID   Name           Matric.   Mobile         DoS_ID
frt20     Twome, Frida   2003      01734 568983   rrb20

It is useful to draw an entity relationship diagram to conceptualize this:

Fig – Example of a One-to-Many Relationship


Notice the crow's foot (the "little feet") at one end of the line: it marks the "many" side, indicating that
one DoS has many Students.

Many-to-many relationships

There is a kind of relationship that needs special handling in relational databases, the many-to-many
relationship. One student may have many supervisors, but equally, one supervisor will have many
students. This poses a problem in terms of how to represent the relationship without resorting to
repeating attributes like this:

Stud_ID   Name           Matric.   Supervisor1   Supervisor2   Supervisor3   Supervisor4   Supervisor5
frt20     Twome, Frida   2003      egk10         fpm20         llt101        hf2003        ffrt2

If you find yourself wanting to put repeating attributes in a table, then it is a sure sign that there is
something wrong with your data structure. Imagine the complications here if the Supervisors table were
to list all the students taught by each supervisor: you would have to have an indeterminate number of
fields: Student1, Student2, Student3, Student4, ..., Student25 ...
The solution is to provide a third linking table, one which simply lists pairs of supervisors and supervisees.
In relational databases, many-to-many relationships always require a third linking table between the two
entities which are linked by this kind of relationship. An entity diagram shows how this works:

Fig – Example of a Many-to-Many Relationship


And this is what the linking table Students_Supervisors would look like:

Student_ID   Supervisor_ID
frt20        egk10
lmnu1        rpu5
frt20        ull200
yt1001       egk10

This table might also contain Course Codes (FR9, SP5, etc.), in which case it would also be linked to the
Courses table with a crow's-foot line in the diagram. Note that it doesn't matter if a student ID or a
supervisor ID appears twice; in fact that's the whole point, since a student can have many supervisors and
vice versa. This table doesn't need a separate primary key column because each pair of IDs forms a unique
composite key. In case you think that entering data into this kind of table, with just IDs (and perhaps
Course Codes), would be error-prone, do not worry: a data entry form would present you with the
surnames, forenames and IDs of both students and supervisors, and Course Titles, as a drop-down list to
choose from, and would then insert the appropriate IDs and codes into the table for you.
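
The linking table can be sketched in a few lines of Python. This is only an illustration, assuming the four sample rows shown for Students_Supervisors; the helper names supervisors_of and students_of are invented here. A Python set is used because it cannot hold the same pair twice, which mirrors the uniqueness of the composite key.

```python
# Linking table Students_Supervisors as a set of (student_id, supervisor_id)
# pairs. The helper function names are hypothetical.
students_supervisors = {
    ("frt20", "egk10"),
    ("lmnu1", "rpu5"),
    ("frt20", "ull200"),
    ("yt1001", "egk10"),
}

def supervisors_of(student_id):
    """All supervisors linked to one student."""
    return sorted(sup for stu, sup in students_supervisors if stu == student_id)

def students_of(supervisor_id):
    """All students linked to one supervisor."""
    return sorted(stu for stu, sup in students_supervisors if sup == supervisor_id)

# Adding a pair that is already present changes nothing (composite key).
students_supervisors.add(("frt20", "egk10"))

print(supervisors_of("frt20"))     # ['egk10', 'ull200']
print(students_of("egk10"))        # ['frt20', 'yt1001']
print(len(students_supervisors))   # 4
```

The same ID appears in several pairs, but no pair is repeated, so both directions of the many-to-many relationship can be answered from the one table.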

The Relational Data Structure

The smallest unit of data in the relational model is the individual data value. Such values are assumed to
be atomic, which means that they have no internal structure as far as the model is concerned. A domain is
a set of all possible data values of one kind. For example, in the supplier-parts example, the domain of
supplier numbers is the set of all valid supplier numbers. Thus domains are pools of values, from which
the actual values appearing in the attributes are drawn. The domain concept is a very important and
integral part of the relational model. Now let us take a look at relations.

A relation schema R, denoted by R(A1, A2, …, An), is made up of a relation name R and a list of
attributes A1, A2, …, An. Each attribute Ai is the name of a role played by some domain D in the relation
schema R. D is called the domain of Ai and is denoted by dom(Ai). A relation schema is used to describe a
relation; R is called the name of this relation. The degree (or arity) of a relation is the number of
attributes n of its relation schema. The figure shown below is an example of a STUDENT relation.

Fig – An Example of a Relation


The earlier definition of a relation can be restated more formally as follows. A relation (or relation state)
r(R) is a mathematical relation of degree n on the domains dom(A1), dom(A2), …, dom(An), which is a
subset of the Cartesian product of the domains that define R:

r(R) ⊆ (dom(A1) × dom(A2) × … × dom(An))

The Cartesian product specifies all combinations of values from the underlying domains.
Hence, if we denote the total number of values, or cardinality, in a domain D by |D| (assuming
that all domains are finite), the total number of tuples in the Cartesian product is

|dom(A1)| × |dom(A2)| × … × |dom(An)|

So if we think of a relation as a table, then a tuple corresponds to a row of the table; the number of tuples
is called the cardinality; the number of attributes is called the degree; and a domain is a pool of values
from which the values of specific attributes of a specific relation are taken.
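
The counting argument can be checked on a small case. The sketch below assumes two made-up finite domains (supplier numbers and cities); it builds their Cartesian product and confirms that a relation state is simply a subset of it.

```python
from itertools import product

# Hypothetical finite domains for a relation schema R(A1, A2).
dom_a1 = {"S1", "S2", "S3"}     # assumed supplier numbers
dom_a2 = {"London", "Paris"}    # assumed cities

# All combinations of values from the underlying domains.
cartesian = set(product(dom_a1, dom_a2))

# A relation state r(R) is any subset of the Cartesian product.
r = {("S1", "London"), ("S2", "Paris")}

cardinality = len(r)             # number of tuples
degree = len(next(iter(r)))      # number of attributes

print(len(cartesian))                       # 6, that is 3 * 2
print(len(cartesian) == len(dom_a1) * len(dom_a2))   # True
print(r <= cartesian)                       # True
print(cardinality, degree)                  # 2 2
```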

Relational Data Integrity

Most relations have an attribute that uniquely identifies each tuple in the relation. Such an attribute is
called a candidate key. In some cases more than one attribute can uniquely identify each tuple; each of
those attributes is then eligible to be a candidate key. One of the candidate keys is arbitrarily
designated to be the primary key, and the others are called secondary or alternate keys. A key is a minimal
set of attributes that distinguishes every tuple of the relation from all the others. When more than one key
exists, a primary key is selected.

In the above table, symbol, name and atomic number can each uniquely identify a row, so any one of them
can be a candidate key; that is, the Element_Table has three candidate keys. Let R be a relation with
attributes A1, A2, …, An. A set of attributes K = (Ai, Aj, …, Ak) of R is said to be a candidate key of R if
and only if the following two properties are satisfied:
 Uniqueness: At any given point of time, no two distinct tuples of R have the same
value for Ai, the same value for Aj, …, and the same value for Ak.
 Minimality: No proper subset of the set (Ai, Aj, …, Ak) has the uniqueness
property.
In the Element_Table relation there are three candidate keys, so we can choose any one of them as the
primary key. There are no hard and fast rules on how to choose the primary key from the list of
candidate keys. It is a matter of preference and convenience for the database designer.
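
The two properties can be tested mechanically against sample data. The sketch below assumes a small Element_Table with invented column names (symbol, name, atomic_number) and three rows. Note that it only inspects the rows currently present, whereas a real candidate key must hold for every possible state of the relation.

```python
from itertools import combinations

# Assumed sample rows for Element_Table(symbol, name, atomic_number).
rows = [
    {"symbol": "H",  "name": "Hydrogen", "atomic_number": 1},
    {"symbol": "He", "name": "Helium",   "atomic_number": 2},
    {"symbol": "Ag", "name": "Silver",   "atomic_number": 47},
]

def is_unique(rows, attrs):
    """Uniqueness: no two distinct rows agree on all attributes in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def is_candidate_key(rows, attrs):
    """Uniqueness plus minimality: no proper subset of attrs is unique."""
    if not is_unique(rows, attrs):
        return False
    for size in range(1, len(attrs)):
        for subset in combinations(attrs, size):
            if is_unique(rows, subset):
                return False
    return True

print(is_candidate_key(rows, ("symbol",)))           # True
print(is_candidate_key(rows, ("atomic_number",)))    # True
print(is_candidate_key(rows, ("symbol", "name")))    # False: not minimal
```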
Let us take a look at another relation, SHIPMENT_TABLE.

In the ELEMENT_TABLE the attribute Symbol, and in the SHIPMENT_TABLE the attribute Item, have the
same data values. It is clear that a given value for that attribute, say Item 'Ag', should be permitted to
appear in the database only if the same value appears as a value of the primary key Symbol in the
relation ELEMENT_TABLE.
Such an attribute is a foreign key. A foreign key is an attribute or attribute combination of one relation
whose values are required to match those of the primary key of some other relation. Also, the foreign
key and the primary key should be defined on the same underlying domain.

Fig – Primary Key – Foreign Key Relationship

Integrity Constraints

Relational model includes several types of constraints whose purpose is to maintain the accuracy and
integrity of the data in the database. The major types of integrity constraints are:
 Domain Constraints
 Entity Integrity
 Referential Integrity
 Operational Constraints

Domain Constraints

All the values that appear in a column of a relation must be taken from the same domain. A domain
usually consists of the following components:

 Domain Name
 Meaning
 Data Type
 Size or length
 Allowable values or allowable range (if applicable)

Entity Integrity

The entity integrity rule is designed to ensure that every relation has a primary key and that the data
values for the primary key are all valid. Entity integrity guarantees that every primary key attribute is
non-null: no attribute participating in the primary key of a base relation is allowed to contain nulls. The
primary key performs the unique identification function in the relational model. A null primary key value
would therefore be like saying that there is some entity that has no known identity. An entity that
cannot be identified is a contradiction in terms, hence the name entity integrity.

Referential Integrity

In the relational model the association between tables is defined using foreign keys. The association
between the SHIPMENT and ELEMENT tables is defined by including the element symbol as a foreign key
in the SHIPMENT table. This implies that before we insert a row in the SHIPMENT table, the element for
that shipment must already exist in the ELEMENT table.
A referential integrity constraint is a rule that maintains consistency among the rows of two tables or
relations. The rule states that if there is a foreign key in one relation, each foreign key value
must either match a primary key value in the other table or be null.
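
Entity and referential integrity can be observed in a real DBMS. The following is a sketch only, using Python's standard sqlite3 module with an in-memory database; the table layout (element(symbol, name), shipment(shipment_no, item)) is an assumption made for this example. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, and NOT NULL is stated explicitly on the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE element (symbol TEXT NOT NULL PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE shipment ("
    " shipment_no INTEGER PRIMARY KEY,"
    " item TEXT REFERENCES element(symbol))"
)
conn.execute("INSERT INTO element VALUES ('Ag', 'Silver')")

def try_insert(sql):
    """Run one INSERT and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# Foreign key value matches an existing primary key value: accepted.
ok_match = try_insert("INSERT INTO shipment VALUES (1, 'Ag')")
# Null foreign key value: accepted.
ok_null = try_insert("INSERT INTO shipment VALUES (2, NULL)")
# No element with symbol 'Xx': rejected (referential integrity).
ok_dangling = try_insert("INSERT INTO shipment VALUES (3, 'Xx')")
# Null primary key value: rejected (entity integrity).
ok_null_pk = try_insert("INSERT INTO element VALUES (NULL, 'Unknown')")

print(ok_match, ok_null, ok_dangling, ok_null_pk)  # True True False False
```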

Operational Constraints

These are the constraints enforced in the database by business rules or real-world
limitations. For example, if the retirement age of the employees in an organization is 60, then the age
column of the employee table can have the constraint "Age should be less than or equal to 60". These
kinds of constraints, enforced by the business and the environment, are called operational constraints.

CODD’S Rules

Dr. E. F. Codd, the founder of relational database systems, placed the relational model's
characteristics in three main categories. First, structural features that support the view of the data. They
include relations and their underlying components, and views and queries, both mechanisms for creating
virtual relations. Second, integrity features such as entity and referential integrity, and also
application-specific constraints. Finally, data manipulation features for data retrieval, insertion, deletion
and update. These features must be able to emulate any operation from relational algebra. Codd's rules
are listed below.

 Information Rule.
 Guaranteed Access Rule.
 Systematic Treatment of Nulls Rule.
 Active On-line Catalog Based on the Relational Model.
 Comprehensive Data Sub-language Rule.
 View Updating Rule.
 High-Level Insert, Update and Delete.
 Physical Data Independence.
 Logical Data Independence.
 Integrity Independence.
 Distribution Independence.
 Non-Subversion Rule.

CHAPTER 3: Relational Algebra

A Brief Introduction

 Relational algebra and relational calculus are formal languages associated with the relational
model.
 Informally, relational algebra is a (high-level) procedural language and relational calculus a non-
procedural language.
 However, formally both are equivalent to one another.
 A language that can produce any relation derivable using relational calculus is relationally
complete.
 Relational algebra operations work on one or more relations to define another relation
without changing the original relations.
 Both operands and results are relations, so output from one operation can become input to
another operation.
 This allows expressions to be nested, just as in arithmetic. This property is called closure.

What? Why?

 Similar to normal algebra (as in 2+3*x-y), except we use relations as values instead of
numbers.
 Not used as a query language in actual DBMSs. (SQL instead.)
 The inner, lower-level operations of a relational DBMS are, or are similar to, relational
algebra operations. We need to know about relational algebra to understand query
execution and optimization in a relational DBMS.
 Some advanced SQL queries require explicit relational algebra operations, most commonly outer
join.
 SQL is declarative, which means that you tell the DBMS what you want, but not how it is to be
calculated. A C++ or Java program is procedural, which means that you have to state, step by
step, exactly how the result should be calculated. Relational algebra is (more) procedural than
SQL. (Actually, relational algebra is mathematical expressions.)
 It provides a formal foundation for operations on relations.
 It is used as a basis for implementing and optimizing queries in DBMS software.
 DBMS programs add more operations which cannot be expressed in the relational algebra.
 Relational calculus (tuple and domain calculus systems) also provides a foundation,
but is more difficult to use. We'll skip these for now.

The Basic Operations in Relational Algebra

 Basic Operations:
 Selection (σ): choose a subset of rows.
 Projection (π): choose a subset of columns.
 Cross Product (×): combine two tables.
 Union (∪): unique tuples from either table.
 Set difference (−): tuples in R1 not in R2.
 Renaming (ρ): change names of tables & columns.
 Additional Operations (for convenience):
 Intersection, joins (very useful), division, outer joins, aggregate functions, etc.
 Now we will see the various operations in relational algebra in detail.

Selection Operation σ

The select command gives a programmer the ability to choose tuples from a relation (rows from
a table). Please do not confuse the Relational Algebra select command with the more powerful SQL select
command that we will discuss later.
Idea: choose tuples of a relation (rows of a table)
Format: σ selection-condition (R). Choose tuples that satisfy the selection condition.
The result has the same schema as the input.
σ Major = 'CS' (Students)
This expression returns the tuples of all students who have taken CS as their major. The
selection condition is a Boolean expression built from =, ≠, <, ≤, >, ≥, and, or, not.

Fig – Students and Result Table


Once again, all the Relational Algebra select command does is choose tuples from a relation. For
example, consider the following relation R (A, B, C, D):

Fig – Illustrating Relation

This is an "abstract" table because there is no way to determine the real-world model that the table
represents. All we know is that attribute (column) A is the primary key, and that fact is reflected in the
fact that no two items currently in the A column of R are the same. Now, using a popular variant of
Relational Algebra notation, suppose we were to do the Relational Algebra command:

Fig – A Select Relational Algebra Command


We would create a relation R1 with exactly the same attributes and attribute domains (column
headers and column domains) as R, but we would select only the tuples where the B attribute value is
greater than 'b2'. This table would be

Fig – Illustrating Output of a Select Command


The important thing to know about the Relational Algebra select command is that the relation
produced always has exactly the same attribute names and attribute domains as the original table;
we just delete certain rows.
Let us consider the following relations

Fig – Illustrating Tables Related to Each Other

Now based on the above tables, consider the following examples:

SELECT S WHERE CITY = 'PARIS'

Following figure shows a sample output for the preceding SELECT command:

Fig – Illustrating Output of SELECT S WHERE CITY = 'PARIS' Command

SELECT SP WHERE (S# = S1 and P# = P1)


Following figure shows a sample output for the preceding SELECT command:

Fig – Illustrating Output of SELECT SP WHERE (S# = S1 and P# = P1) Command

The resulting relation has the same attributes as the original relation. The selection condition is applied to
each tuple in turn - it cannot therefore involve more than one tuple.
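
Selection is easy to imitate in ordinary code. The following is a minimal sketch, assuming a relation is held as a list of rows (Python dicts); the Students rows and the helper name select are invented for illustration. The condition is a function of a single row, which reflects the point that it is applied to each tuple in turn.

```python
# Assumed sample Students relation; each row is a dict of attribute values.
students = [
    {"name": "Ann",  "major": "CS"},
    {"name": "Bob",  "major": "Math"},
    {"name": "Cara", "major": "CS"},
]

def select(relation, condition):
    """sigma: keep the rows for which condition(row) is true; schema unchanged."""
    return [row for row in relation if condition(row)]

# sigma Major = 'CS' (Students)
cs_students = select(students, lambda row: row["major"] == "CS")

print([row["name"] for row in cs_students])   # ['Ann', 'Cara']
print(list(cs_students[0]))                   # ['name', 'major']
```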

Project Operation

The Relational Algebra project command allows the programmer to choose attributes (columns) of
a given relation and delete information in the other attributes.
Idea: Choose certain attributes of a relation (columns of a table)
Format: π Attribute_List (Relation)
Returns: a relation with the same tuples as (Relation) but limited to those attributes of interest (in the
attribute list). It selects some of the columns of a table, so it constructs a vertical subset of a relation,
and it implicitly removes any duplicate tuples (so that the result will be a relation).

π Major (Students)

Fig – Students and Result Table for Project Relation

For example, given the original abstract relation R(A, B, C, D):

Fig – Illustrating a Sample Relation to be Used for Project Command

We can pick out columns A, B, and C with the following:

Fig - A Project Relational Algebra Command

This would give us a relation R2(A, B, C) with the D attribute gone:



Fig – Illustrating Output of project R over [A, B, C] giving R2 Command


There is one slight problem with this command. Sometimes the result might contain a duplicate tuple.
For example, what about

Fig – Another Project Command


What would R3(C, D) look like?

Fig – Illustrating Output of project R over [C, D] giving R3 Command

There are two (c3, d2) tuples. This is not allowed in a "legal" relation. What is to be done? In
Relational Algebra, all duplicates are deleted. Now consider the following examples:

PROJECT S OVER CITY

The following figure shows a sample output for the preceding PROJECT command:

Fig – Illustrating Output of PROJECT S OVER CITY Command



PROJECT S OVER SNAME, STATUS

Following figure shows a sample output for the preceding PROJECT command:

Fig – Illustrating Output of PROJECT S OVER SNAME, STATUS Command
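
The duplicate-removal step of projection is worth seeing in code. This is a sketch only: the supplier rows are made-up sample data, and the helper name project is an assumption of this example.

```python
def project(relation, attributes):
    """pi: keep only the named attributes and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:          # a relation cannot contain duplicates
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

# Assumed sample rows for the supplier relation S.
s = [
    {"S#": "S1", "SNAME": "Smith", "CITY": "London"},
    {"S#": "S2", "SNAME": "Jones", "CITY": "Paris"},
    {"S#": "S3", "SNAME": "Blake", "CITY": "Paris"},
]

# PROJECT S OVER CITY
print(project(s, ["CITY"]))   # [{'CITY': 'London'}, {'CITY': 'Paris'}]
```

Three input rows give only two output rows, because the two 'Paris' tuples collapse into one.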

Sequences Of Operations

Now we can see a sequence of operations based on both the selection and projection operations.
For example, to find the part names where the weight is less than 17:

TEMP <- SELECT P WHERE WEIGHT < 17
RESULT <- PROJECT TEMP OVER PNAME

or, as nested operations:

PROJECT (SELECT P WHERE WEIGHT < 17) OVER PNAME
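
Both styles, the step-by-step form with a temporary relation and the nested form, give the same answer. The sketch below assumes a small parts relation P with invented rows, and defines its own select and project helpers so that it can be run on its own.

```python
# Assumed sample rows for the parts relation P.
p = [
    {"P#": "P1", "PNAME": "Nut",   "WEIGHT": 12},
    {"P#": "P2", "PNAME": "Bolt",  "WEIGHT": 17},
    {"P#": "P3", "PNAME": "Screw", "WEIGHT": 17},
    {"P#": "P4", "PNAME": "Cam",   "WEIGHT": 12},
]

def select(relation, condition):
    """sigma: keep the rows that satisfy the condition."""
    return [row for row in relation if condition(row)]

def project(relation, attributes):
    """pi: keep the named attributes, removing duplicate rows."""
    unique = dict.fromkeys(tuple(row[a] for a in attributes) for row in relation)
    return [dict(zip(attributes, values)) for values in unique]

# Step by step, saving the intermediate result.
temp = select(p, lambda row: row["WEIGHT"] < 17)
result = project(temp, ["PNAME"])

# Nested, as a single expression.
nested = project(select(p, lambda row: row["WEIGHT"] < 17), ["PNAME"])

print(result)             # [{'PNAME': 'Nut'}, {'PNAME': 'Cam'}]
print(result == nested)   # True
```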

Renaming Operation

Format: ρ S (R) or ρ S(A1, A2, …) (R): change the name of relation R to S and, in the second form, the
names of the attributes of R to A1, A2, …:
ρ CS_Students (σ Major = 'CS' (Students))

Fig – Tables for Illustrating Renaming Operation

Consider the scenario:

 π List1 (σ cond1 (R1))
 π List2 (σ cond1 (R1))
 σ cond2 (π List1 (σ cond1 (R1)))

It would be useful to have a notation that allows us to "save" the output of an operation for future use:

Tmp1 ← σ cond1 (R1)
Tmp2 ← π List1 (Tmp1)
Tmp3 ← π List2 (Tmp1)
Tmp4 ← σ cond2 (Tmp2)

The resulting temporary relations will have the same attribute names as the originals. We might also
want to change the attribute names:
 To avoid confusion between relations.
 To make them agree with names in another table.

For this we will define a Rename operator ρ (rho):

ρ S(B1, B2, …, Bn) (R(A1, A2, …, An))

where S is the new relation name, and B1, …, Bn are the new attribute names. Note that degree(S) =
degree(R) = n.
Examples:
ρ EmpNames(LAST, FIRST) (π LNAME, FNAME (Employee))
ρ Works_On2(SSN, PNUMBER, HOURS) (Works_On(ESSN, PNO, HOURS))
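
Renaming changes names only, never values or the degree. The sketch below illustrates this on the Works_On example; the single sample row and the helper name rename are assumptions made here, and rows are Python dicts whose key order stands for the attribute order.

```python
def rename(relation, new_names):
    """rho: give the attributes new names, position by position."""
    renamed = []
    for row in relation:
        if len(row) != len(new_names):
            raise ValueError("rename must keep the degree unchanged")
        renamed.append(dict(zip(new_names, row.values())))
    return renamed

# Assumed sample row for Works_On(ESSN, PNO, HOURS).
works_on = [{"ESSN": "123456789", "PNO": 1, "HOURS": 32.5}]

# rho Works_On2(SSN, PNUMBER, HOURS) (Works_On(ESSN, PNO, HOURS))
works_on2 = rename(works_on, ["SSN", "PNUMBER", "HOURS"])

print(works_on2)   # [{'SSN': '123456789', 'PNUMBER': 1, 'HOURS': 32.5}]
```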

The Cartesian Product

The Cartesian product of two tables combines each row in one table with each row in the other table.
The Cartesian product of n domains, written dom(A1) × dom(A2) × ... × dom(An), is defined
as follows:

A1 × A2 × ... × An = {(a1, a2, ..., an) | a1 ∈ A1 AND a2 ∈ A2 AND ... AND an ∈ An}

We will call each element in the Cartesian product a tuple. So each (a1, a2, ..., an) is known
as a tuple. Note that in this formulation, the order of values in a tuple matters.
Example: The table E (for EMPLOYEE)

enr   ename   dept
1     Bill    A
2     Sarah   C
3     John    A

Example: The table D (for DEPARTMENT)

dnr   dname
A     Marketing
B     Sales
C     Legal

Now consider the following SELECT command:

select *
from E, D

It is equivalent to the Cartesian product E × D, as shown below:

enr   ename   dept   dnr   dname
1     Bill    A      A     Marketing
1     Bill    A      B     Sales
1     Bill    A      C     Legal
2     Sarah   C      A     Marketing
2     Sarah   C      B     Sales
2     Sarah   C      C     Legal
3     John    A      A     Marketing
3     John    A      B     Sales
3     John    A      C     Legal
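
The nine rows of E × D can be reproduced with a short sketch. It uses the rows of tables E and D as given, held as Python tuples, and the standard itertools.product function to pair every row of E with every row of D.

```python
from itertools import product

e = [(1, "Bill", "A"), (2, "Sarah", "C"), (3, "John", "A")]    # (enr, ename, dept)
d = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]       # (dnr, dname)

# Each row of E is combined with each row of D.
e_x_d = [e_row + d_row for e_row, d_row in product(e, d)]

print(len(e_x_d))   # 9, that is 3 * 3
print(e_x_d[0])     # (1, 'Bill', 'A', 'A', 'Marketing')
print(e_x_d[-1])    # (3, 'John', 'A', 'C', 'Legal')
```

Most of these rows pair an employee with a department they do not belong to, which is why the product is normally followed by a selection.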

Division

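Division is expressed as R ÷ S, where R is defined over the attribute set A, S is defined over the attribute
set B, B is a subset of A, and C = A − B.
It defines a relation over the attributes C that consists of the set of tuples from R that match the
combination of every tuple in S.

Using basic operations:

T1 ← πC(R)
T2 ← πC((S × T1) − R)
T ← T1 − T2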
The division operation (÷) is useful for a particular type of query that sometimes occurs in database
applications. For example, if I want to organize a study group, I would like to find people who take all of
the subjects I take. The division operator provides you with the facility to perform this query without the
need to "hard code" the subjects involved.
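
The three-step definition of division can be followed literally on the study-group example. This is a sketch under assumptions: R(student, subject) and S(subject) hold invented sample data, and relations are Python sets of tuples.

```python
# R(student, subject): who takes what. S(subject): the subjects "I" take.
# All names and rows are invented for illustration.
r = {
    ("Ann", "Databases"), ("Ann", "Networks"), ("Ann", "Logic"),
    ("Bob", "Databases"),
    ("Cara", "Databases"), ("Cara", "Networks"),
}
s = {("Databases",), ("Networks",)}

# T1 <- pi_C(R): every student that appears in R.
t1 = {(student,) for student, _ in r}

# S x T1, written in R's column order: every (student, subject) pair
# that would have to exist for the student to qualify.
required = {(student, subject) for (student,) in t1 for (subject,) in s}

# T2 <- pi_C((S x T1) - R): students missing at least one required pair.
t2 = {(student,) for student, _ in required - r}

# T <- T1 - T2
t = t1 - t2

print(sorted(t))   # [('Ann',), ('Cara',)]
```

Bob is removed because the pair (Bob, Networks) is required but not present in R. Changing S changes the answer without touching the query itself.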

Joins

Theta-join:-

The theta-join operation is the most general join operation. We can define theta-join in terms of the
operations that we are familiar with already:

R ⋈θ S = σθ(R × S)

So the join of two relations results in a subset of the Cartesian product of those relations. Which subset is
determined by the join condition θ.
Let's look at an example. The result of
Professions ⋈ Job = Job Careers is shown below.

Name   Job          Job          Pays
Joe    Garbageman   Garbageman   50000
Sue    Doctor       Doctor       40000
Joe    Surfer       Surfer       6500

Equi-join:-

The join condition, θ, can be any well-formed logical expression, but usually it is just the conjunction of
equality comparison between pairs of attributes, one from each of the joined relations. This common
case is called an equi-join. The example given above is an example of an equi-join.
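
The definition of theta-join as a selection over a Cartesian product translates directly into code. In the sketch below, the contents of Professions and Careers are assumptions inferred from the result table above, and the helper name theta_join is invented. Result rows are kept as pairs so that the two Job attributes do not clash.

```python
# Assumed contents of the two operand relations.
professions = [
    {"Name": "Joe", "Job": "Garbageman"},
    {"Name": "Sue", "Job": "Doctor"},
    {"Name": "Joe", "Job": "Surfer"},
]
careers = [
    {"Job": "Garbageman", "Pays": 50000},
    {"Job": "Doctor",     "Pays": 40000},
    {"Job": "Surfer",     "Pays": 6500},
]

def theta_join(left, right, theta):
    """sigma_theta(left x right): pair every row with every row, keep matches."""
    return [(lrow, rrow) for lrow in left for rrow in right if theta(lrow, rrow)]

# Equi-join: theta is an equality comparison between one attribute of each side.
equi = theta_join(professions, careers, lambda p, c: p["Job"] == c["Job"])

for p, c in equi:
    print(p["Name"], p["Job"], c["Job"], c["Pays"])
```

Any other Boolean condition on a pair of rows, for example an inequality, can be passed as theta in the same way.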

Outer Joins:-

A join operation is complete if all the tuples of the operands contribute to the result. Tuples not
participating in the result are said to be dangling. Outer join operations are variants of the join
operations in which the dangling tuples are kept and padded with NULL fields.
They can be categorized into:
 LEFT OUTER JOIN - keep the dangling tuples from the left-hand table
 RIGHT OUTER JOIN - keep the dangling tuples from the right-hand table
 FULL OUTER JOIN - keep the dangling tuples from both tables
The following figure illustrates examples of left and full outer joins:

Fig – Illustrating Left and Full Outer Join
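
The treatment of dangling tuples can be sketched as follows. The data are invented (one employee in a department "Z" that does not exist, one department "B" with no employees), the join condition is fixed as "last column of the left row equals first column of the right row", and Python's None stands in for NULL.

```python
employees = [("Bill", "A"), ("Sarah", "C"), ("Tom", "Z")]             # (ename, dept)
departments = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]    # (dnr, dname)

def left_outer_join(left, right):
    """Join on left[-1] == right[0]; dangling left rows are padded with None."""
    result = []
    for lrow in left:
        matches = [rrow for rrow in right if lrow[-1] == rrow[0]]
        if matches:
            result.extend(lrow + rrow for rrow in matches)
        else:
            result.append(lrow + (None,) * len(right[0]))
    return result

def full_outer_join(left, right):
    """Left outer join plus the dangling right rows, padded on the left."""
    result = left_outer_join(left, right)
    matched = {lrow[-1] for lrow in left}
    for rrow in right:
        if rrow[0] not in matched:
            result.append((None,) * len(left[0]) + rrow)
    return result

print(left_outer_join(employees, departments)[-1])   # ('Tom', 'Z', None, None)
print(full_outer_join(employees, departments)[-1])   # (None, None, 'B', 'Sales')
```

A right outer join is the mirror image: swap the roles of the two tables.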



Natural Joins

A join is the method whereby two or more tables are combined so that data from all of them can be used
to extract information. In Relational Algebra, Codd defined the idea of a natural join, which is described
below.
To do a natural join of two relations, you examine the relations for common attributes (columns with the
same name and domain). For example, look at the following abstract tables: R (A, B, C, D) and Q (B, E,
F).

Fig – Tables for Illustrating Natural Joins


Notice that relations R and Q both have attribute B with the same domain. Note also that there are no
other attributes common to both. So, to do a natural join of R and Q, you match rows of R and rows
of Q that have the same B value.
Next, you must know that the result of a join is another table that contains the attributes
(columns) of both of the tables being joined. The attributes in Q are B, E, and F. The attributes in R are
A, B, C, and D. The attributes in the join of Q and R are B, E, F, A, C, D. Notice that B appears only once
in this list.

Fig – Illustrating Natural Join
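
A natural join can be sketched by finding the common attribute names and matching rows on them. The rows below for R (A, B, C, D) and Q (B, E, F) are invented sample values, and the helper name natural_join is an assumption of this example.

```python
def natural_join(left, right):
    """Match rows that agree on every attribute name the two relations share."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    result = []
    for lrow in left:
        for rrow in right:
            if all(lrow[a] == rrow[a] for a in common):
                # Merging the dicts keeps each common attribute only once.
                result.append({**lrow, **rrow})
    return result

# Assumed sample rows.
r = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a2", "B": "b2", "C": "c2", "D": "d2"},
    {"A": "a3", "B": "b9", "C": "c3", "D": "d3"},
]
q = [
    {"B": "b1", "E": "e1", "F": "f1"},
    {"B": "b2", "E": "e2", "F": "f2"},
    {"B": "b1", "E": "e3", "F": "f3"},
]

joined = natural_join(q, r)

print(list(joined[0]))   # ['B', 'E', 'F', 'A', 'C', 'D']
print(len(joined))       # 3
```

The row of R with B = 'b9' has no partner in Q, so it does not appear in the result.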

Set Operations

Consider two relations R and S. You can perform the following set operations on these two relations:
 UNION of R and S: The union of two relations is a relation that includes all the tuples that are either in
R or in S or in both R and S. Duplicate tuples are eliminated.
 INTERSECTION of R and S: The intersection of R and S is a relation that includes all tuples that are in
both R and S.
 DIFFERENCE of R and S: The difference of R and S is the relation that contains all the tuples that are in
R but not in S.
For set operations to function correctly, the relations R and S must be union compatible. Two
relations are union compatible if:
 They have the same number of attributes
 The domain of each attribute, in column order, is the same in both R and S
The following figures illustrate each of the set operations:

Fig – UNION Operation

Fig – INTERSECTION Operation

Fig – DIFFERENCE Operation
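
Since a relation is a set of tuples, the three set operations map directly onto Python's set operators. The sketch below uses invented sample rows; the union-compatibility check is a simplification that treats the Python type of each value as its domain.

```python
# Assumed sample relations with attributes (supplier number, city).
r = {("S1", "London"), ("S2", "Paris"), ("S3", "Paris")}
s = {("S3", "Paris"), ("S4", "Athens")}

def union_compatible(a, b):
    """Same number of attributes and the same value type in each position."""
    shapes = {tuple(type(v) for v in row) for row in a | b}
    return len(shapes) <= 1

print(union_compatible(r, s))   # True
print(sorted(r | s))   # UNION: [('S1', 'London'), ('S2', 'Paris'), ('S3', 'Paris'), ('S4', 'Athens')]
print(sorted(r & s))   # INTERSECTION: [('S3', 'Paris')]
print(sorted(r - s))   # DIFFERENCE: [('S1', 'London'), ('S2', 'Paris')]
```

The tuple ('S3', 'Paris') is in both operands but appears only once in the union.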



CHAPTER 4 : Relational Calculus

Introduction
Relational calculus is an operational methodology, founded on predicate calculus, that deals with
descriptive expressions equivalent to the operations of relational algebra. Codd proposed the concept of
a relational calculus (applied predicate calculus tailored to relational databases). Two forms of the
relational calculus exist: the tuple calculus and the domain calculus. Codd's reduction algorithm can
convert from relational calculus to relational algebra.

Why It Is Called Relational Calculus?

It is founded on a branch of mathematical logic called the predicate calculus. Relational calculus is a
formal query language where we write one declarative expression to specify a retrieval request and hence
there is no description of how to evaluate a query; a calculus expression specifies what is to be retrieved
rather than how to retrieve it. Therefore, the relational calculus is considered to be a nonprocedural
language. This differs from relational algebra, where we must write a sequence of operations to specify a
retrieval request; hence it can be considered as a procedural way of stating a query. It is possible to nest
algebra operations to form a single expression; however, a certain order among the operations is always
explicitly specified in a relational algebra expression. This order also influences the strategy for evaluating
the query.
It has been shown that any retrieval that can be specified in the relational algebra can also be specified
in the relational calculus, and vice versa; in other words, the expressive power of the two
languages is identical. This has led to the definition of the concept of a relationally complete language. A
relational query language L is considered relationally complete if we can express in L any query that can
be expressed in relational calculus. Relational completeness has become an important basis for
comparing the expressive power of high-level query languages. However certain frequently required
queries in database applications cannot be expressed in relational algebra or calculus. Most
relational query languages are relationally complete but have more expressive power than relational
algebra or relational calculus because of additional operations such as aggregate functions,
grouping, and ordering.

Tuple Calculus

The tuple calculus is a calculus that was introduced by Edgar F. Codd as part of the relational
model in order to give a declarative database query language for this data model. It formed the
inspiration for the database query languages QUEL and SQL of which the latter, although far less
faithful to the original relational model and calculus, is now used in almost all relational database
management systems as the ad-hoc query language. Along with the tuple calculus Codd also
introduced the domain calculus which is closer to first-order logic and showed that these two calculi (and
the relational algebra) are equivalent in expressive power. The SQL language is based on the tuple
relational calculus (TRC) which in turn is a subset of classical predicate logic. Queries in the TRC all have
the form:
{QueryTarget | QueryCondition}
The QueryTarget is a tuple variable which ranges over tuples of values. The
QueryCondition is a logical expression such that
 It uses the QueryTarget and possibly some other variables.
 If a concrete tuple of values is substituted for each occurrence of the QueryTarget in
QueryCondition, the condition evaluates to a boolean value of true or false.

The result of a TRC query with respect to a database instance is the set of all choices of values for the
query variable that make the query condition a true statement about the database instance. The
TRC is related to logic in that the QueryCondition is a logical expression of classical first-order
logic.
The tuple relational calculus is based on specifying a number of tuple variables. Each tuple variable
usually ranges over a particular database relation, meaning that the variable may take as its
value any individual tuple from that relation. A simple tuple relational calculus query is of the form
{t | COND(t)} where t is a tuple variable and COND(t) is a conditional expression involving t.
The result of such a query is the set of all tuples t that satisfy COND(t).
For example, to find all employees whose salary is above $50,000, we can write the following tuple
calculus expression:
{t | EMPLOYEE(t) and t.SALARY>50000}
The condition EMPLOYEE(t) specifies that the range relation of tuple variable t is EMPLOYEE. Each
EMPLOYEE tuple t that satisfies the condition t.SALARY>50000 will be retrieved. Notice that t.SALARY
references attribute SALARY of tuple variable t; this notation resembles how attribute names are qualified
with relation names or aliases in SQL. The above query retrieves all attribute values for each
selected EMPLOYEE tuple t. To retrieve only some of the attributes (say, the first and last names), we
write
{t.FNAME, t.LNAME | EMPLOYEE(t) and t.SALARY>50000}
This is equivalent to the following SQL query:
SELECT T.FNAME, T.LNAME FROM EMPLOYEE AS T WHERE T.SALARY>50000;
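As an informal illustration only, the same query can be mimicked with a Python set comprehension, which has the same "target | condition" shape as a tuple calculus expression. The Employee type and the three sample rows below are assumptions made up for this sketch; they are not part of the EMPLOYEE relation used in the text.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    FNAME: str
    LNAME: str
    SALARY: int


# Hypothetical sample instance of the EMPLOYEE relation.
EMPLOYEE = {
    Employee("John", "Smith", 30000),
    Employee("Jennifer", "Wallace", 43000),
    Employee("James", "Borg", 55000),
}

# {t.FNAME, t.LNAME | EMPLOYEE(t) and t.SALARY > 50000}
# "for t in EMPLOYEE" plays the role of the range condition EMPLOYEE(t).
result = {(t.FNAME, t.LNAME) for t in EMPLOYEE if t.SALARY > 50000}
print(result)
```

The comprehension states which tuples are wanted rather than how to find them, which is the declarative flavour the calculus is meant to capture.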
Informally, we need to specify the following information in a tuple calculus expression:
1. For each tuple variable t, the range relation R of t. This value is specified by a
condition of the form R(t).
2. A condition to select particular combinations of tuples. As tuple variables range over their respective
range relations, the condition is evaluated for every possible combination of tuples to identify the
selected combinations for which the condition evaluates to TRUE.
3. A set of attributes to be retrieved, the requested attributes. The values of these
attributes are retrieved for each selected combination of tuples.
Observe the correspondence of the preceding items to a simple SQL query: item 1 corresponds
to the FROM-clause relation names; item 2 corresponds to the WHERE-clause condition; and item
3 corresponds to the SELECT-clause attribute list.
Before we discuss the formal syntax of tuple relational calculus, consider another query we have
seen before.
Retrieve the birthdate and address of the employee (or employees) whose name is 'John B. Smith'.
Q0 : {t.BDATE, t.ADDRESS | EMPLOYEE(t) and t.FNAME='John' and t.MINIT='B' and t.LNAME='Smith'}
In tuple relational calculus, we first specify the requested attributes t.BDATE and t.ADDRESS for
each selected tuple t. Then we specify the condition for selecting a tuple following the bar ( | ),
namely, that t be a tuple of the EMPLOYEE relation whose FNAME, MINIT, and LNAME attribute
values are 'John', 'B', and 'Smith', respectively.

Domain Calculus

There is another type of relational calculus called the domain relational calculus, or simply,
domain calculus. The language QBE that is related to domain calculus was developed almost
concurrently with SQL at IBM Research, Yorktown Heights. The formal specification of the domain
calculus was proposed after the development of the QBE system.
The domain calculus differs from the tuple calculus in the type of variables used in formulas:
rather than having variables range over tuples, the variables range over single values from domains of
attributes. To form a relation of degree n for a query result, we must have n of these domain variables—
one for each attribute. An expression of the Domain calculus is of the form
{x1, x2, . . ., xn | COND(x1, x2, . . ., xn, xn+1, xn+2, . . ., xn+m)}
where x1, x2, . . ., xn, xn+1, xn+2, . . ., xn+m are domain variables that range over domains (of attributes)
and COND is a condition or formula of the domain relational calculus. A formula is made up of atoms.
As in tuple calculus, atoms evaluate to either TRUE or FALSE for a specific set of values, called
the truth values of the atoms.
In a similar way to the tuple relational calculus, formulas are made up of atoms, variables, and
quantifiers, so we will not repeat the specifications for formulas here. Some examples of queries
specified in the domain calculus follow. We will use lowercase letters l, m, n, . . ., x, y, z for domain
variables.

Example: Q0

Retrieve the birthdate and address of the employee whose name is 'John B. Smith'.
Q0 : {uv | (∃q) (∃r) (∃s) (∃t) (∃w) (∃x) (∃y) (∃z)
(EMPLOYEE(qrstuvwxyz) and q='John' and r='B' and s='Smith')}
Example: Q1
Retrieve the name and address of all employees who work for the 'Research' department.
Q1 : {qsv | (∃z) (∃l) (∃m) (EMPLOYEE(qrstuvwxyz) and
DEPARTMENT(lmno) and l='Research' and m=z)}
A condition relating two domain variables that range over attributes from two relations, such as m = z in
Q1, is a join condition; whereas a condition that relates a domain variable to a constant, such as l =
'Research', is a selection condition.
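To make the role of domain variables more concrete, Q1 can be sketched in Python by binding one variable per attribute position. This is only an analogy: the sample rows are invented for the illustration, and every column other than the names, birthdate, address, and salary mentioned in the text is an assumption about what the remaining positions hold.

```python
# Hypothetical sample rows; positions follow EMPLOYEE(qrstuvwxyz)
# and DEPARTMENT(lmno) as used in Q1.
EMPLOYEE = [
    ("John", "B", "Smith", "123", "1965-01-09", "731 Fondren", "M", 30000, "333", 5),
    ("Alicia", "J", "Zelaya", "999", "1968-07-19", "3321 Castle", "F", 25000, "987", 4),
]
DEPARTMENT = [
    ("Research", 5, "333", "1988-05-22"),
    ("Administration", 4, "987", "1995-01-01"),
]

# Q1: {qsv | EMPLOYEE(qrstuvwxyz) and DEPARTMENT(lmno)
#            and l='Research' and m=z}
# Each name bound in a "for" clause is one domain variable.
q1 = {
    (q, s, v)
    for (q, r, s, t, u, v, w, x, y, z) in EMPLOYEE
    for (l, m, n, o) in DEPARTMENT
    if l == "Research"   # selection condition
    and m == z           # join condition
}
print(q1)
```

Variables that appear only in the "for" clauses and not in the result tuple correspond to the existentially quantified variables of the calculus expression.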
Example: Q2
For every project located in 'Stafford', list the project number, the controlling department number, and
the department manager's last name, birthdate, and address.
Q2 : {iksuv | (∃j) (∃m) (∃n) (∃t) (PROJECT(hijk) and EMPLOYEE(qrstuvwxyz) and
DEPARTMENT(lmno) and k=m and n=t and j='Stafford')}
As mentioned earlier, it can be shown that any query that can be expressed in the relational
algebra can also be expressed in the domain or tuple relational calculus. Also, any safe expression in
the domain or tuple relational calculus can be expressed in the relational algebra.
The Entity/Relationship (E/R) model was developed to give an overall, conceptual view of the organization
of data. In these notes, we present the modeling concepts. The E/R model has an associated graphical
representation, called E/R diagrams, which will be discussed later.

Analogies

A mini world is a small part of the real world that we are interested in modeling.
Movie World Example: For a running example we will assume that our mini world is the motion picture
industry.
Student World Example: For another running example we will assume that our mini world is the
students and subjects at JCU.

Entity

An entity is a thing or an object in that world, usually one that physically exists, that is distinguishable
from other entities.

Attribute

An attribute is a property of an entity.
Movie World Example: Let us assume that we have several "Star" and "Movie" entities.
entity a1 has attributes Name = Meryl Streep, Age = 50, HairColour = {blond, red, brunette}
entity a2 has attributes Name = Robert Redford, Age = 60, HairColour = blond
entity a3 has attributes Name = Yul Brenner, Age = 60, HairColour = bald
entity m1 has attributes Name = Sneakers, Cost = $10M, Earnings = $40M, Profit = $30M, When-Released
= 1995
Here the prefixes a and m indicate stars and movies, respectively.
Student World Example: Let us assume that we have several "Student" and "Subject" entities.
entity s1 has attributes Name = Charles Walker, Id = 484350
entity s2 has attributes Name = Jasper, Id = 2234433
entity u1 has attributes Code = CP1500, Name = Information Systems
entity u2 has attributes Code = CP1200, Name = Programming
Now we will consider the following observations from the above.
Even among these simple entities we notice that there are several different kinds of attributes.
One distinction is simple vs. composite. A simple attribute has an atomic value, while a composite
attribute is (naturally) composed of other attributes.
Movie World Example: We could view a "Star's" Name attribute as a composite attribute, since it is the
composition of Given Names and Surname attributes.
Student World Example: We could view a "Student's" Name attribute as a composite attribute,
since it is the composition of Given Names and Surname attributes.

Single Valued vs. Multi Valued

Another distinction we can make is single-valued vs. multivalued. A single-valued attribute can only be a
single value, while a multivalued attribute can be a list or set of values.
Movie World Example: The HairColour attribute is multivalued since Meryl Streep's hair colour is three
different colours. We will assume that it is three different colours all at the same time!
Student World Example: A Location attribute could be added to each Subject indicating in which rooms
lectures are held. It is often the case that a subject is taught in different rooms, so Location is a
multivalued attribute.
In general, the fact that a single-valued attribute changes value over time (e.g., when a person dyes their
hair) does not mean that it is multivalued.
A third distinction is stored vs. derived. While the vast majority
of attributes will be stored, some attributes can be computed or derived from other attributes.
Movie World Example: A movie's Profit is a derived attribute, computable from the Cost and Earnings
attributes.
Student World Example: Assume that each subject has a multivalued When attribute that indicates when
the lectures are held. Then a possible derived attribute would be Lecture hours, which is the total number of
hours that the class meets each week. Lecture hours is derived from the When attribute.
You should now have the basic idea of entities and attributes.
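The attribute kinds discussed above can be sketched with Python dataclasses. This is one possible rendering, not a prescribed mapping: the class and field names are chosen for this illustration, and the monetary amounts are stored as whole millions of dollars for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Name:
    """Composite attribute: made up of other attributes."""
    given_names: str
    surname: str


@dataclass
class Star:
    name: Name                                        # composite
    age: int                                          # simple, single-valued
    hair_colours: set = field(default_factory=set)    # multivalued


@dataclass
class Movie:
    name: str
    cost_musd: int        # stored
    earnings_musd: int    # stored
    when_released: int

    @property
    def profit_musd(self) -> int:
        """Derived attribute: computed from Cost and Earnings, not stored."""
        return self.earnings_musd - self.cost_musd


a1 = Star(Name("Meryl", "Streep"), 50, {"blond", "red", "brunette"})
m1 = Movie("Sneakers", 10, 40, 1995)
print(m1.profit_musd)
```

Making Profit a computed property means it can never disagree with Cost and Earnings, which is the usual motivation for treating an attribute as derived rather than stored.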

Database Architecture Explained

Types of Database Architecture

Database architecture essentially describes the location of all the pieces of information that make up the
database application. The database architecture can be broadly classified into two-, three-, and multitier
architecture.

Two-Tier Architecture (Client-Server Architecture)

The two-tier architecture is a client–server architecture in which the client contains the presentation code
and the SQL statements for data access. The database server processes the SQL statements and sends
query results back to the client. The two-tier architecture is shown in the figure depicted below. Two-tier
client/server provides a basic separation of tasks. The client, or first tier, is primarily responsible for the
presentation of data to the user and the server, or second tier, is primarily responsible for supplying
data services to the client.

Fig - Two-tier client–server architecture

Presentation Services

Presentation services refers to the portion of the application which presents data to the user. In
addition, it also provides for the mechanisms in which the user will interact with the data. More simply
put, presentation logic defines and interacts with the user interface. The presentation of the data should
generally not contain any validation rules.

Business Services/objects

Business services are a category of application services. Business services encapsulate an organization's
business processes and requirements. These rules are derived from the steps necessary to carry out
day-to-day business in an organization. These rules can be validation rules, used to be sure that the incoming
information is of a valid type and format, or they can be process rules, which ensure that the proper
business process is followed in order to complete an operation.

Application Services

Application services provide other functions necessary for the application.


Data Services

Data services provide access to data independent of their location. The data can come from legacy
mainframe, SQL RDBMS, or proprietary data access systems. Once again, the data services provide a
standard interface for accessing data.

Advantages of Two-tier Architecture

The two-tier architecture is a good approach for systems with stable requirements and a moderate
number of clients. The two-tier architecture is the simplest to implement, because many good
commercial development environments support it.

Drawbacks of Two-tier Architecture

Software maintenance can be difficult because PC clients contain a mixture of presentation, validation, and
business logic code. To make a significant change in the business logic, code must be modified on many
PC clients. Moreover, the performance of two-tier architecture can be poor when a large number of clients
submit requests, because the database server may be overwhelmed with managing messages. With a
large number of simultaneous clients, three-tier architecture may be necessary.

Three-tier Architecture

A multitier architecture, often referred to as a three-tier or N-tier architecture, provides greater application
scalability, lower maintenance, and increased reuse of components. Three-tier architecture offers a
technology-neutral method of building client/server applications with vendors who employ standard
interfaces which provide services for each logical tier. The three-tier architecture is shown in the figure
depicted below. As the figure shows, a middle tier is included between the client and the server in order to
improve performance.

Fig - Three-tier client–server architecture


Through standard tiered interfaces, services are made available to the application. A single application can
employ many different services, which may reside on dissimilar platforms or be developed and maintained
with different tools. This approach allows a developer to leverage investments in existing systems while
creating new applications that can utilize existing resources.
Although the three-tier architecture addresses performance degradations of the two-tier architecture, it
does not address division-of-processing concerns.
The PC clients and the database server still contain the same division of code although the tasks of the
database server are reduced. Multiple-tier architectures provide more flexibility on division of processing.
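The separation of presentation, business, and data services can be sketched in a few lines of Python. This is a single-process toy, assuming an in-memory SQLite database as a stand-in for the database server; the function and table names are made up for the illustration. In a real three-tier system each group of functions would run on a different machine.

```python
import sqlite3


# Data tier: the only code that issues SQL statements.
def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
    return conn


def insert_employee(conn, name, salary):
    conn.execute("INSERT INTO employee VALUES (?, ?)", (name, salary))


def fetch_employees(conn):
    return conn.execute(
        "SELECT name, salary FROM employee ORDER BY name"
    ).fetchall()


# Middle tier: validation and process rules live here, not on the client.
def hire(conn, name, salary):
    if not name or salary <= 0:
        raise ValueError("invalid employee data")
    insert_employee(conn, name.strip(), salary)


# Client tier: presentation only, with no validation rules and no SQL.
def render(rows):
    return "\n".join(f"{name}: {salary}" for name, salary in rows)


conn = make_db()
hire(conn, "Smith", 30000)
print(render(fetch_employees(conn)))
```

Because the validation rule sits in the middle tier, changing it means changing one function rather than the code on every PC client, which is the maintenance drawback of the two-tier approach noted earlier.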

Multitier Architecture

A multi-tier, three-tier, or N-tier implementation employs a three-tier logical architecture superimposed on
a distributed physical model. Application Servers can access other application servers in order to supply
services to the client application as well as to other Application Servers. The multiple-tier architecture is
the most general client–server architecture. It can be the most difficult to implement because of its generality.
However, a good design and implementation of multiple-tier architecture can provide the most benefits in
terms of scalability, interoperability, and flexibility.
For example, in the diagram shown in the following figure, the client application looks to Application
Server #1 to supply data from a mainframe-based application. Application Server #1 has no direct access
to the mainframe application, but it does know, through the development of application services, that
Application Server #2 provides a service to access the data from the mainframe application which satisfies
the client request.
Application Server #1 then invokes the appropriate service on Application Server #2 and receives the
requested data which is then passed on to the client.
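The delegation described in this example can be sketched as follows. The class names, the get_record method, and the sample record are all hypothetical; the sketch only shows the shape of one application server forwarding a request to another.

```python
class ApplicationServer2:
    """Hypothetical server with direct access to the mainframe application."""

    def __init__(self, mainframe_data):
        self._mainframe_data = mainframe_data

    def get_record(self, key):
        return self._mainframe_data[key]


class ApplicationServer1:
    """Has no mainframe access, but knows that server #2 offers the service."""

    def __init__(self, server2):
        self._server2 = server2

    def get_record(self, key):
        # Invoke the appropriate service on server #2 and pass the result on.
        return self._server2.get_record(key)


# A dictionary stands in for the mainframe-based application.
mainframe = {"acct-1": {"balance": 100}}
server1 = ApplicationServer1(ApplicationServer2(mainframe))

# The client only ever talks to Application Server #1.
record = server1.get_record("acct-1")
print(record)
```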

Fig - Multiple-tier architecture


Application Servers can take many forms. An Application Server may be anything from custom application
services, Transaction Processing Monitors, Database Middleware, or a Message Queue to a CORBA/COM-based
solution.

E-R Diagrams

Introducing E/R Diagram

The entity-relationship (ER) data model allows us to describe the data involved in a real-world
enterprise in terms of objects and their relationships and is widely used to develop an initial database
design. Here, we introduce the ER model and discuss how its features allow us to model a wide
range of data faithfully.
The ER model is important primarily for its role in database design. It provides useful concepts that allow
us to move from an informal description of what users want from their database to a more detailed,
and precise, description that can be implemented in a DBMS. Within the larger context of the overall
design process, the ER model is used in a phase called conceptual database design.
Many variations of ER diagrams are in use, and no widely accepted standards prevail.
The presentation here is representative of the family of ER models and includes a selection of the most
popular features.

E-R Diagrams
In an E/R diagram we will represent an attribute using an oval inscribed with the name of the attribute, as
follows.

Fig – Representing a Single-Valued Attribute in E/R Diagram

At least that is how we will represent a simple, single-valued, stored attribute. A composite
attribute will be represented by a hierarchy of ovals, where each oval represents a component
attribute within the composite. A multivalued attribute will be represented as an oval within an oval.

Fig – Representing a Multivalued Attribute in E/R Diagram

Finally, a derived attribute will be represented as an attribute with dashed or dotted lines.

Fig – Representing a Derived Attribute in E/R Diagram

An aside on null values

One interesting question is what happens when we do not know the value of a particular attribute. When an
attribute value is unknown we will use a null value. For the above entities, we have complete information,
but in real-world databases null values will often be present. We will represent a null value with the special
symbol @.
For some entities an attribute is inapplicable, which means that the entity does not have a value
for that attribute. For instance, the HairColour attribute for Yul Brenner is really inapplicable since he does
not have any hair. We will use @ to represent inapplicable values as well. We have thus
overloaded the semantics of @ with two completely disparate meanings. This overloading
is nevertheless common in databases, since SQL's NULL is overloaded in the same way.
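The overloading can be seen directly in SQL. The sketch below uses Python's built-in sqlite3 module with a made-up star table; the "Unknown Extra" row is invented to show an unknown value next to an inapplicable one. Both are stored as the same NULL, and a NULL never compares equal to anything, not even to another NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE star (name TEXT, age INTEGER, hair_colour TEXT)")
conn.executemany(
    "INSERT INTO star VALUES (?, ?, ?)",
    [
        ("Robert Redford", 60, "blond"),
        ("Yul Brenner", 60, None),        # inapplicable: no hair
        ("Unknown Extra", None, "red"),   # hypothetical row: age unknown
    ],
)

# IS NULL finds the row, whatever the reason for the missing value.
is_null = conn.execute(
    "SELECT name FROM star WHERE hair_colour IS NULL"
).fetchall()

# "= NULL" evaluates to unknown for every row, so nothing is returned.
equals_null = conn.execute(
    "SELECT name FROM star WHERE hair_colour = NULL"
).fetchall()

print(is_null, equals_null)
```

Nothing in the stored data distinguishes "unknown" from "inapplicable"; if an application needs that distinction, it has to record it some other way.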

Symbols Used In E-R Diagrams

The following figure shows the various symbols used in an E/R diagram.

Fig – Symbols Used in E/R Diagram

Entity Type

Now that you know what an entity is, we will look at what exactly an entity type is.
An entity type is a description of the attributes that a set of possible entities has in common.

Fig - The Students Entity

Movie World Example:

In our running example, we so far have two entity types: Star and Movie. We will use a third, Studio,
as well. We will assume that Star has attributes Name, Age, and HairColour. Movie has attributes
Name, When-Released, Cost, Earnings, and Profit. Finally, Studio has attributes Name and Location. Name
is certainly a popular attribute name for these entity types!

Student World Example:

In our running example, we so far have two entity types: Student and Subject. We will use a third,
Lecturer as well. We will assume that Student has attributes Name, Address, and Id. Subject has
attributes Code, Name, and When. Finally, Lecturer has attributes Name and Age. Name is certainly
a popular attribute name for these entity types!
An entity type is sometimes called an entity set; however, some authors distinguish between the
two. More specifically, an entity set is a set of actual entities (that is, it is an extension of an entity type,
rather than an entity type itself). We will use the two terms interchangeably. In an E/R diagram an entity
type is represented with a rectangular box inscribed with the name of that entity type.

Key Attributes

Key attributes (or just a key) are a set of attributes whose combined values are distinct for every
possible entity of the type. There may be several keys for a particular entity type.

Movie World Example:

By convention, two movies with the same name cannot be released during the same year. So
the attributes Name and When Released form a perfectly reasonable key for the Movie entity type.

Student World Example:

Each student has a unique Id, so that attribute makes a perfectly reasonable key for the Student entity
type. In an E/R diagram we depict a key attribute (or an attribute that is part of a key) by underlining the
attribute name.
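A candidate key can be tested against a sample of entities, as in the sketch below. The is_key helper and the sample movies are hypothetical (only Sneakers, 1995 comes from the text). Note that a key is a constraint on all possible entities, so a sample can refute a candidate key but can never prove one.

```python
def is_key(entities, attributes):
    """Return True if no two entities agree on all the given attributes."""
    seen = set()
    for entity in entities:
        values = tuple(entity[a] for a in attributes)
        if values in seen:
            return False      # two entities share the same key values
        seen.add(values)
    return True


# Hypothetical Movie entities, chosen so that Name alone repeats.
movies = [
    {"Name": "Sneakers", "WhenReleased": 1995},
    {"Name": "Sneakers", "WhenReleased": 2010},
    {"Name": "Heist", "WhenReleased": 2010},
]

print(is_key(movies, ["Name"]))
print(is_key(movies, ["Name", "WhenReleased"]))
```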

Relationship

Just as a student and a faculty member are related to one another, entities in the E-R
model can be related. A relationship is an association between two or more entities.

Movie World Example:

The star Robert Redford "stars in" the movie Sneakers.

Student World Example:

The student Charles Walker "takes" the subject

Relationship Type

A relationship type or relationship set is a set of "similar in kind" relationships among entities of one or more entity types.
Mathematically, a relationship type R among entity types E1, E2, ..., En is R ⊆ E1 × E2 × ... × En. In other
words, a relationship set can be thought of as a subset of the Cartesian product of the participating entity
types. The Cartesian product is just the space of all possible associations among the entity types. A
relationship type is often also called a role because it describes a role that one entity plays with another.
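The subset relationship can be checked directly in Python with itertools.product. The entity sets and the StarsIn pairs below are small samples invented for the illustration.

```python
from itertools import product

# Sample entity sets (illustrative only).
stars = {"Robert Redford", "Meryl Streep"}
movies = {"Sneakers", "Out of Africa"}

# The Cartesian product: every possible (star, movie) association.
all_possible = set(product(stars, movies))

# A relationship set picks out the associations that actually hold.
stars_in = {
    ("Robert Redford", "Sneakers"),
    ("Robert Redford", "Out of Africa"),
    ("Meryl Streep", "Out of Africa"),
}

print(len(all_possible))
print(stars_in <= all_possible)   # subset test: R ⊆ E1 × E2
```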

Fig – Relating Entities

Movie World Example:

Each star may "star in" one or more movies. So we could have a relationship type StarsIn that
captures all the associations between stars and the movies in which they star.

Student World Example:

The relationship type EnrolledIn is the set of associations between Student and the Subject in which they
are enrolled.

Cardinality Ratio

We will often be interested in the cardinality ratio of a relationship type, that is, how many of
each entity type participate in the relationship. Possible cardinality ratios are the following.
One-to-one (1-to-1)
Each entity in E1 is associated with zero or one entity in E2, and vice versa.

Fig – Illustrating One-to-one Relationship


Movie World Example:

Assume that Married is a relationship type between Star and Star, which captures who is married to
whom. It is a 1-1 relationship since each Star is married to at most one other Star (let's not worry
too much about people who currently have multiple wives or husbands!).

Student World Example:

Assume that Married is a relationship type between Student and Student, which captures who is
married to whom. It is a 1-1 relationship since each Student is married to at most one other
Student (let's not worry too much about students who currently have multiple wives or husbands!).
One-to-many (1-to-N)
A one-to-many relationship type (1-N or 1:N) is one in which a single entity of one entity type can be
related to several entities of another type, but each entity of the other type is related to at most
one entity of the first type.

Fig – Illustrating One-to-many Relationship

Movie World Example:

Assume that Produces is a relationship type between Studio and Movie, which captures which studio
produces which movies. It is a 1-N relationship since each Studio may produce several different
Movies, but each movie can be produced by at most one Studio (assuming that only one studio can
produce a movie, let's not worry too much about collaboration between studios).

Student World Example:

Assume that Teaches is a relationship type between Lecturers and Subjects, which captures which
lecturer teaches which subject. It is a 1-N relationship since each Lecturer can teach several
different subjects, but each Subject has a single Lecturer (let's not worry too much about subjects
that have more than one lecturer).
Many-to-many (N-to-M)
A many-to-many relationship type (N-M or N:M) is one in which a single entity of one entity type can be
related to several entities of another type, and vice versa.

Fig – Illustrating Many-to-many Relationship

Movie World Example

Assume that StarsIn is a relationship type between Star and Movie, which captures who stars in what
movies. It is a N-M relationship since each Star may star in many different Movies, and each Movie may
have many different Stars.

Student World Example

Assume that EnrolledIn is a relationship type between Student and Subject, which captures who is
enrolled in what subject. It is a N-M relationship since each Student may enroll in many different
Subjects, and each Subject may have many different Students.
In an E/R diagram we depict a relationship type as a diamond. The cardinality ratio is also shown by
adding 1, N, or M to the lines connecting the relationship type to the entity type.
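The three cardinality ratios can be told apart by counting how often each entity appears in a set of pairs. The cardinality_ratio function and the sample data below are a sketch written for this illustration. It reports only what one particular set of pairs exhibits; the cardinality ratio of a relationship type is a design constraint and cannot be inferred from a single sample.

```python
from collections import Counter


def cardinality_ratio(pairs):
    """Classify a set of distinct (e1, e2) pairs as '1-1', '1-N', 'N-1' or 'N-M'."""
    left = Counter(e1 for e1, _ in pairs)
    right = Counter(e2 for _, e2 in pairs)
    left_many = any(n > 1 for n in left.values())    # some e1 has several e2
    right_many = any(n > 1 for n in right.values())  # some e2 has several e1
    if left_many and right_many:
        return "N-M"
    if left_many:
        return "1-N"
    if right_many:
        return "N-1"
    return "1-1"


# Hypothetical sample relationship sets.
married = {("Star A", "Star B")}
produces = {("Studio X", "Movie 1"), ("Studio X", "Movie 2")}
stars_in = {
    ("Robert Redford", "Sneakers"),
    ("Robert Redford", "Out of Africa"),
    ("Meryl Streep", "Out of Africa"),
}

print(cardinality_ratio(married))
print(cardinality_ratio(produces))
print(cardinality_ratio(stars_in))
```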

Weak Entity Type

A weak entity type is an entity type that needs the key attributes of another entity type to uniquely
identify its entities. Weak entity types lack keys of their own. In an E/R diagram a weak entity type is represented by a nested
pair of rectangles as shown below. The weak entity type is connected by an identifying or owning relationship
to the entity type that supplies the key attributes, which in turn is called the owning entity type. An
owning relationship is depicted as a nested pair of diamonds.

Fig – Illustrating Weak Entity Type

Movie World Example:

Each Star could have several children. We choose to represent a Child entity type using Child
Name and Age attributes. For instance assume that Meryl Streep has a child named Joe who is 6 years
old. The key of the Star entity type needs to be used to help identify which Child is dependent on
which Star, since children in different families could be the same age with the same first name.
For instance, assume that Robert Redford also has a child named Joe who is 6 years old. We need the
Star's key to identify which child is owned by which Star, to keep the two Joes separate.
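The two Joes can be used to show why the owner's key is needed. The sketch below assumes, for illustration only, that a Star is identified by Name; the Child type and its field names are made up.

```python
from typing import NamedTuple


class Child(NamedTuple):
    star_name: str    # key borrowed from the owning Star (assumed to be Name)
    child_name: str
    age: int


children = {
    Child("Meryl Streep", "Joe", 6),
    Child("Robert Redford", "Joe", 6),
}

# Using only the weak entity's own attributes, the two children collapse into one.
own_attributes = {(c.child_name, c.age) for c in children}

# Adding the owner's key keeps them apart.
with_owner_key = {(c.star_name, c.child_name) for c in children}

print(len(own_attributes), len(with_owner_key))
```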

Example of an E-R Diagram

Consider the example of a library management system, which has the following relationships:

• Each book or magazine (a document) must belong to exactly one language, but one
language can have many books or magazines, so this relationship is
one-to-many (one language has many documents). Similar relationships hold between
documents and nation, documents and collection, and documents and specialty.
• For magazines, each magazine category has many volumes (depending on years,
months, and numbers), but each volume must belong to exactly one magazine category, so this
relationship is one-to-many.
The relationship between department and readers is also one-to-many, because each reader
must belong to one and only one department, but one department can have many readers (staff).
The last relationship, between readers and documents, is a special one:
one reader can borrow many documents, and one document can be borrowed by many readers at
different times (because, when a reader returns a document, it can be borrowed by another reader).
This relationship is therefore many-to-many, and we separate it into two
one-to-many relationships and one entity, the Borrowing/Returning Ticket entity, whose primary
key is composed of the primary key of the document entity, the primary key of the reader entity, and the
BORROW_DATE attribute.
This gives the following Entity Relationship diagrams:

Fig – ERD Diagram (Books)


Fig – ERD Diagram (Magazines)
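The Borrowing/Returning Ticket entity described above, which resolves the many-to-many relationship between readers and documents, can be sketched as a table with a composite primary key. The sketch uses Python's built-in sqlite3 module; the table names, column names, and sample rows are assumptions made for the illustration and are not taken from the diagrams.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reader   (reader_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE document (document_id INTEGER PRIMARY KEY, title TEXT);
-- One row per loan: the key combines both owners' keys and the borrow date.
CREATE TABLE borrowing_ticket (
    document_id INTEGER REFERENCES document(document_id),
    reader_id   INTEGER REFERENCES reader(reader_id),
    borrow_date TEXT,
    PRIMARY KEY (document_id, reader_id, borrow_date)
);
""")
conn.executemany("INSERT INTO reader VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.execute("INSERT INTO document VALUES (10, 'Databases')")

# The same document is borrowed by two readers, and twice by the same reader.
conn.executemany(
    "INSERT INTO borrowing_ticket VALUES (?, ?, ?)",
    [(10, 1, "2024-01-05"), (10, 2, "2024-02-01"), (10, 1, "2024-03-10")],
)

# Repeating an existing (document, reader, date) combination violates the key.
try:
    conn.execute("INSERT INTO borrowing_ticket VALUES (10, 1, '2024-01-05')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

ticket_count = conn.execute("SELECT COUNT(*) FROM borrowing_ticket").fetchone()[0]
print(ticket_count, duplicate_rejected)
```

Including BORROW_DATE in the key is what allows the same reader to borrow the same document again on a later date.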

Data Flow Diagram

In addition to E-R diagrams, another tool that comes in handy during database and system design is
the Data Flow Diagram (DFD). Both the DFD and the ERD are important for an organization. An ERD represents
entities, whether they are people, places, events, or objects, while a DFD describes how data
flows between entities. The ERD tells us about the entities for which data is stored in the organization,
while the DFD gives information about the flow of data between entities and how and where it is
stored.
A data flow diagram supports four main activities:
• Analysis: the DFD is used to determine the requirements of users.
• Design: the DFD is used to map out a plan and illustrate solutions to analysts and users while designing
a new system.
• Communication: one of the strengths of the DFD is its simplicity and ease of understanding for analysts and
users.
• Documentation: the DFD is used to provide a special description of requirements and system design.
A DFD provides an overview of the key functional components of the system, but it does not provide any detail on
these components. We have to use other tools, such as the data dictionary and process specifications, to get
an idea of which information will be exchanged and how.
The data dictionary is an organized listing of all the data elements pertinent to the system, with precise,
rigorous definitions so that both user and systems analyst will have a common understanding of all inputs,
outputs, components of stores, and intermediate calculations. The data dictionary defines the data
elements by doing the following:
• Describing the meaning of the flows and stores shown in the data flow diagrams;
• Describing the composition of aggregate packets of data moving along the flows;
• Describing the composition of packets of data in stores;
• Specifying the relevant values and units of elementary chunks of information in the
data flows and data stores;
• Describing the details of relationships between stores that are highlighted in an
entity-relationship diagram.
The systems analyst can ensure that the dictionary is complete, consistent, and non-contradictory
by examining it and asking the following questions:
• Has every flow on the data flow diagram been defined in the data dictionary?
• Have all the components of composite data elements been defined?
• Has any data element been defined more than once?
• Has the correct notation been used for all data dictionary definitions?
• Are there any data elements in the data dictionary that are not referenced in the
function diagrams, data flow diagrams, or entity-relationship diagrams?
Building a data dictionary is one of the more important and time-consuming aspects of systems analysis.
But without a formal dictionary that defines the meaning of all the terms, there can be no hope for
precision.
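
Two of the questions above (is every flow defined, and is any entry unreferenced) can be checked
mechanically. The following is a minimal sketch, assuming the flows of a DFD and the entries of the
data dictionary are available as plain names. The flow names, the dictionary entries, and the
function check_dictionary are all hypothetical and are used only for illustration.

```python
# Hypothetical flow names taken from a small library-system DFD.
dfd_flows = {"borrow_request", "reader_record", "ticket"}

# Hypothetical data dictionary: element name -> definition.
data_dictionary = {
    "borrow_request": "reader_id + document_id + borrow_date",
    "reader_record": "reader_id + name + address",
    "late_fee": "amount in local currency",
}


def check_dictionary(flows, dictionary):
    """Return (flows with no definition, entries that no flow references)."""
    defined = set(dictionary)
    undefined = sorted(flows - defined)
    unreferenced = sorted(defined - flows)
    return undefined, unreferenced


undefined, unreferenced = check_dictionary(dfd_flows, data_dictionary)
print("Flows missing from the dictionary:", undefined)
print("Entries not referenced by any flow:", unreferenced)
```

A check like this covers only naming; whether each definition is correct and non-contradictory
still has to be reviewed by the analyst.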

The process specification:

As we know, a variety of tools can be used to produce a process specification: decision tables,
structured English, pre/post conditions, flowcharts, and so on. Most systems analysts use structured
English, but any method can be used as long as it satisfies two important requirements:
• The process specification must be expressed in a form that can be verified by the
user and the systems analysts;
• The process specification must be expressed in a form that can be
effectively communicated to the various audiences involved.
The process specification represents the largest amount of detailed work in building a system model.
Because of the amount of work involved, you may want to consider the top-down implementation
approach: begin the design and implementation phase of your project before all the process specifications
have been finished.
Writing process specifications can also be regarded as a check of the data flow diagrams that have
already been developed. In writing process specifications, you may discover that a process
needs additional functions, input data flows, or output data flows. Thus, the DFD model may be changed,
revised, and corrected based on the detailed work of writing the process specifications.
A data flow diagram can be described by answering the following questions:
• What functions should the system perform?
• How do the functions interact?
• What does the system have to transfer?
• What inputs are transformed into what outputs?
• What type of work does the system do?
• Where does the system get the information it needs to work?
• Where does it deliver its results?
Regardless of how it is described, the data flow diagram needs to meet the following requirements:
• Without explanation in words, the diagram should still convey the system's functions and
its information flows. Moreover, it must be simple for users and systems analysts to understand.
• The diagram must be laid out in a balanced way on one page (for small systems), or on
separate pages that each show system functions of the same level (for larger systems).
• It is better to lay out the diagram with computer-based tools, because the diagram will then
be consistent and standardized, and adjustments (when needed) can be made quickly and easily.
The main components of a data flow diagram are:
• The process: The process shows a part of the system that transforms inputs into outputs;
that is, it shows how one or more inputs are changed into outputs. Generally, the process
is represented graphically as a circle or a rectangle with rounded edges. The process name
describes what the process does.
• The flow: The flow is used to describe the movement of information from one part of the
system to another. Thus, the flows represent data in motion, whereas the stores represent
data at rest. A flow is represented graphically by an arrow into or out of a process.
• The store: The store is used to model a collection of data packets at rest. A
store is represented graphically by two parallel lines. The name of a store is typically
the plural of the name of the packets that are carried by flows into and out of the store.
• External factors: An external factor can be a person, a group of persons, or an organization
that is outside the scope of the system under study (it may be inside or outside the
organization) but has some contact with the system. The presence of these factors
on the diagram shows the boundary of the system and identifies the system's
relationship to the outside world. External factors are crucial to
the survival of every system, because they are the sources of information for the
system and the destinations of its products. An external factor tends to
be represented by a rectangle, one shorter edge of which is omitted while the other is
drawn as a double line.
• Internal factors: While the names of external factors are always nouns
denoting a department or an organization, the names of internal factors are
expressed by verbs or modifiers. Internal factors are the system's functions or processes. To
distinguish it from an external factor, an internal factor is represented by a
rectangle, one shorter edge of which is omitted while the other is drawn as a single
line.
You can construct the DFD model of a system using the following guidelines:
• Choose meaningful names for processes, flows, stores, and terminators.
• Number the processes.
• Redraw the DFD as many times as necessary.
• Avoid overly complex DFDs.
• Make sure the DFD is consistent internally and with any associated DFDs.

To recap, the DFD is one of the most important tools in structured systems analysis. It presents a method of
establishing the relationship between the functions or processes of the system and the information it uses.
The DFD is a key component of the system requirements specification, because it determines what information
each process needs before it is implemented. Many systems analysts reckon that the DFD is all they need to
know about structured analysis.
On the one hand, this is because the DFD is often the only thing that a systems analyst remembers after
reading a book focusing on DFDs or after a course in structured analysis. On the other hand, without
additional modelling tools such as the data dictionary and process specifications, a DFD not only cannot show
all the necessary details, but also becomes meaningless and useless.
In the library management system example, we develop a data flow diagram corresponding to each level of the
function hierarchy diagram:

Fig – DFD High Level

Fig – DFD Exploded – Function 1



Fig – DFD Exploded – Function 2

Fig – DFD Exploded – Function 3



Fig – DFD Exploded – Function 4

Functional Dependencies

Introduction

For our discussion on functional dependencies assume that a relational schema has attributes (A,
B, C... Z) and that the whole database is described by a single universal relation called R = (A,
B, C, ..., Z). This assumption means that every attribute in the database has a unique name.

What Is Functional Dependency In A Relation?

A functional dependency is a property of the semantics of the attributes in a relation. The semantics
indicate how attributes relate to one another, and specify the functional dependencies between
attributes. When a functional dependency is present, the dependency is specified as a constraint between
the attributes.
Consider a relation with attributes A and B, where attribute B is functionally dependent on attribute A. If
we know the value of A and we examine the relation that holds this dependency, we will find only
one value of B in all of the tuples that have a given value of A, at any moment in time. Note however,
that for a given value of B there may be several different values of A.

Fig –Functional Dependency



In the figure above, A is the determinant of B and B is the consequent of A.


The determinant of a functional dependency is the attribute or group of attributes on the left-hand side of
the arrow in the functional dependency. The consequent of an FD is the attribute or group of attributes on
the right-hand side of the arrow.

Identifying Functional Dependencies

Now let us consider the following Relational schema:

Fig – Table for Illustrating Functional Dependency

The functional dependency staff# → position clearly holds on this relation instance. However, the reverse
functional dependency position → staff# clearly does not hold.

The relationship between staff# and position is 1:1 – for each staff member there is only one position. On
the other hand, the relationship between position and staff# is 1:M – there are several staff numbers
associated with a given position.
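
The check described above can be sketched in a few lines of Python. This is a minimal illustration,
assuming a relation instance is held as a list of dictionaries. The function name fd_holds and the
sample rows are hypothetical and do not reproduce the table in the figure.

```python
def fd_holds(rows, determinant, consequent):
    """Return True if determinant -> consequent holds in this instance.

    rows is a list of dicts; determinant and consequent are lists of
    attribute names.
    """
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in determinant)
        right = tuple(row[a] for a in consequent)
        # The same determinant value must always map to the same consequent.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Hypothetical sample rows; the values are illustrative only.
staff = [
    {"staff#": "S1", "position": "Manager"},
    {"staff#": "S2", "position": "Assistant"},
    {"staff#": "S3", "position": "Assistant"},
]

print(fd_holds(staff, ["staff#"], ["position"]))  # True
print(fd_holds(staff, ["position"], ["staff#"]))  # False
```

A result of True only says that this particular instance does not contradict the dependency. It
does not prove that the dependency holds for every legal instance of the schema.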

Fig – Illustrating Functional Dependency

For the purposes of normalization we are interested in identifying functional dependencies between
attributes of a relation that have a 1:1 relationship.
When identifying FDs between attributes in a relation it is important to distinguish clearly between the
values held by an attribute at a given point in time and the set of all possible values that an attribute
may hold at different times.
In other words, a functional dependency is a property of a relational schema (its intension) and
not a property of a particular instance of the schema (its extension).
The reason that we need to identify FDs that hold for all possible values of the attributes of a relation is
that these represent the types of integrity constraints that we need to identify. Such constraints indicate
the limitations on the values that a relation can legitimately assume. In other words, they identify the
legal instances which are possible.
Let's identify the functional dependencies that hold for the relation schema
STAFFBRANCH.
In order to identify the time-invariant FDs, we need to clearly understand the semantics of the various
attributes in each of the relation schemas in question.
For example, suppose we know that a staff member's position and the branch at which they are located
determine their salary. There is no way of knowing this constraint unless you are familiar with the
enterprise, but this is what the requirements analysis phase and the conceptual design phase are
all about!
staff# → sname, position, salary, branch#, baddress
branch# → baddress
baddress → branch#
branch#, position → salary
baddress, position → salary

Trivial Functional Dependencies

As well as identifying FDs which hold for all possible values of the attributes involved in the FD, we also
want to ignore trivial functional dependencies. A functional dependency is trivial if the consequent is a
subset of the determinant. In other words, it is impossible for it not to be satisfied.
Although trivial FDs are valid, they offer no additional information about integrity constraints for the
relation. As far as normalization is concerned, trivial FDs are ignored.

Inference Rules for Functional Dependencies

We'll denote by F the set of functional dependencies that are specified on a relational schema R.
Typically, the schema designer specifies the FDs that are semantically obvious; usually, however,
numerous other FDs hold in all legal relation instances that satisfy the dependencies in F.
These additional FDs are those which can be inferred or deduced from the FDs in F.
The set of all functional dependencies implied by a set of functional dependencies F is called the closure of
F and is denoted F+.
The notation F ⊨ X → Y denotes that the functional dependency X → Y is implied by the set of FDs F.
Formally, F+ = {X → Y | F ⊨ X → Y}
A set of inference rules is required to infer the set of FDs in F+.
For example, if Kristi is older than Debi and Debi is older than Traci, you are able to infer that Kristi is
older than Traci. How did you make this inference? Without thinking about it, or maybe without knowing it,
you used a transitivity rule.
Clearly we need an algorithm that will allow us to compute F+ from F. The first attack on this
problem appeared in a paper by Armstrong, which gives a set of inference rules. The following are the
six well-known inference rules that apply to functional dependencies.
• IR1: reflexive rule – if X ⊇ Y, then X → Y
• IR2: augmentation rule – if X → Y, then XZ → YZ
• IR3: transitive rule – if X → Y and Y → Z, then X → Z
• IR4: projection rule – if X → YZ, then X → Y and X → Z
• IR5: additive rule – if X → Y and X → Z, then X → YZ
• IR6: pseudo transitive rule – if X → Y and YZ → W, then XZ → W
The first three of these rules (IR1-IR3) are known as Armstrong's Axioms and constitute a
sufficient set of inference rules for generating the closure of a set of functional dependencies. These rules
can be stated in a variety of equivalent ways. Each of these rules can be proved directly from the
definition of functional dependency (the rules are sound). Moreover, the rules are complete, in the sense
that, given a set F of FDs, all FDs implied by F can be derived from F using the rules. The other rules
(IR4-IR6) are derived from these three rules.
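
In practice, F+ is rarely enumerated in full. A common shortcut, which the text above does not
describe, is to compute the closure of a set of attributes and then test whether a given FD follows
from F. The sketch below assumes FDs are stored as pairs of Python sets. The function names
attribute_closure and implies are hypothetical, and the list F is one reading of the STAFFBRANCH
dependencies from the example earlier in this chapter.

```python
def attribute_closure(attributes, fds):
    """Return the set of attributes determined by `attributes` under `fds`.

    fds is a list of (determinant, consequent) pairs of attribute sets.
    """
    closure = set(attributes)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            # Once the closure contains the determinant, transitivity
            # allows the consequent to be added as well.
            if left <= closure and not right <= closure:
                closure |= right
                changed = True
    return closure


def implies(fds, left, right):
    """Return True if left -> right can be inferred from fds."""
    return set(right) <= attribute_closure(left, fds)


# Assumed reading of the STAFFBRANCH dependencies, as pairs of sets.
F = [
    ({"staff#"}, {"sname", "position", "salary", "branch#", "baddress"}),
    ({"branch#"}, {"baddress"}),
    ({"baddress"}, {"branch#"}),
    ({"branch#", "position"}, {"salary"}),
    ({"baddress", "position"}, {"salary"}),
]

print(sorted(attribute_closure({"branch#", "position"}, F)))
print(implies(F, {"baddress", "position"}, {"branch#", "salary"}))
print(implies(F, {"position"}, {"salary"}))
```

The loop stops when a full pass adds nothing new, so it always terminates: the closure can only
grow, and it is bounded by the set of all attributes.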

Chapter 5 : Normalization

Analysis of Redundancies
Before we go into the details of normalization, we first discuss redundancies in databases.
A redundancy in a conceptual schema corresponds to a piece of information that can be derived (that is,
obtained through a series of retrieval operations) from other data in the database.

Deciding About Redundancies

Whether to keep a redundancy in a database is decided by weighing the following factors:
• An advantage: a reduction in the number of accesses necessary to obtain the
derived information;
• A disadvantage: larger storage requirements (usually at negligible cost) and the
need to carry out additional operations in order to keep the derived data consistent.
The decision to maintain or delete a redundancy is made by comparing the cost of the operations
that involve the redundant information, and the storage needed, with and without the
redundancy.

Issues Related To Redundancies (Anomalies)

We now look in detail at why normalization is needed.
A serious problem with relations that store redundant information is the problem of update anomalies.
These can be classified into:
• Insertion anomalies
• Deletion anomalies
• Modification anomalies

Insertion Anomalies

An "insertion anomaly" is a failure to place information about a new database entry into all the places in
the database where information about that new entry needs to be stored.
In a properly normalized database, information about a new entry needs to be inserted into only one place
in the database; in an inadequately normalized database, information about a new entry may need to be
inserted into more than one place and, human fallibility being what it is, some of the needed additional
insertions may be missed.
Insertion anomalies can be divided into two types, illustrated by the following example:

Emp_Dept

EName  SSN        BDate       Address   DNumber  DName     DMGRSSN
Smith  123456789  1965-01-09  Kandivly  5        Research  333445555



THE RELATIONAL MODEL

TABLE: An arrangement of words, numbers, or signs, or combinations of them, as in parallel columns, to
exhibit a set of facts or relations in a definite, compact, and comprehensive form; a synopsis or scheme.

Webster's Dictionary of the English Language

Codd proposed the relational data model in 1970. At that time most database systems were based on one
of two older data models (the hierarchical model and the network model); the relational model
revolutionized the database field and largely supplanted these earlier models. Prototype relational database
management systems were developed in pioneering research projects at IBM and UC-Berkeley by the
mid-70s, and several vendors were offering relational database products shortly thereafter. Today, the
relational model is by far the dominant data model and is the foundation for the leading DBMS products,
including IBM's DB2 family, Informix, Oracle, Sybase, Microsoft's Access and SQLServer, FoxBase, and
Paradox. Relational database systems are ubiquitous in the marketplace and represent a multibillion dollar
industry.

The relational model is very simple and elegant; a database is a collection of one or more relations, where
each relation is a table with rows and columns. This simple tabular representation enables even novice
users to understand the contents of a database, and it permits the use of simple, high-level languages to
query the data. The major advantages of the relational model over the older data models are its simple
data representation and the ease with which even complex queries can be expressed.

This chapter introduces the relational model and covers the following issues:

• How is data represented?
• What kinds of integrity constraints can be expressed?
• How can data be created and modified?
• How can data be manipulated and queried?
• How do we obtain a database design in the relational model?
• How are logical and physical data independence achieved?

SQL was the query language of the pioneering System-R relational DBMS developed at IBM. Over the
years, SQL has become the most widely used language for creating, manipulating, and querying relational
DBMSs. Since many vendors offer SQL products, there is a need for a standard that defines 'official SQL.'
The existence of a standard allows users to measure a given vendor's version of SQL for completeness. It
also allows users to distinguish SQL features that are specific to one product from those that are standard;
an application that relies on non-standard features is less portable.

The first SQL standard was developed in 1986 by the American National Standards Institute (ANSI), and
was called SQL-86. There was a minor revision in 1989 called SQL-89, and a major revision in 1992 called
SQL-92. The International Standards Organization (ISO) collaborated with ANSI to develop SQL-92. Most
commercial DBMSs currently support SQL-92. An exciting development is the imminent approval of
SQL:1999, a major extension of SQL-92. While the coverage of SQL in this book is based upon SQL-92,
we will cover the main extensions of SQL:1999 as well.
While we concentrate on the underlying concepts, we also introduce the Data Definition Language (DDL)
features of SQL-92, the standard language for creating, manipulating, and querying data in a relational
DBMS. This allows us to ground the discussion firmly in terms of real database systems.

We discuss the concept of a relation in Section 3.1 and show how to create relations using the SQL
language. An important component of a data model is the set of constructs it provides for specifying
conditions that must be satisfied by the data. Such conditions, called integrity constraints (ICs), enable the
DBMS to reject operations that might corrupt the data. We present integrity constraints in the relational
model in Section 3.2, along with a discussion of SQL support for ICs. We discuss how a DBMS enforces
integrity constraints in Section 3.3. In Section 3.4 we turn to the mechanism for accessing and retrieving
data from the database, query languages, and introduce the querying features of SQL, which we examine
in greater detail in a later chapter.

We then discuss the step of converting an ER diagram into a relational database schema in Section 3.5.
Finally, we introduce views, or tables defined using queries, in Section 3.6. Views can be used to define the
external schema for a database and thus provide the support for logical data independence in the
relational model.

INTRODUCTION TO THE RELATIONAL MODEL

The main construct for representing data in the relational model is a relation. A relation consists of a
relation schema and a relation instance. The relation instance is a table, and the relation schema describes
the column heads for the table. We first describe the relation schema and then the relation instance. The
schema specifies the relation's name, the name of each field (or column, or attribute), and the domain of
each field. A domain is referred to in a relation schema by the domain name and has a set of associated
values.

We use the example of student information in a university database from Chapter 1 to illustrate the parts
of a relation schema:

Students(sid: string, name: string, login: string, age: integer, gpa: real)

This says, for instance, that the field named sid has a domain named string. The set of values associated
with domain string is the set of all character strings.

We now turn to the instances of a relation. An instance of a relation is a set of tuples, also called records,
in which each tuple has the same number of fields as the relation schema. A relation instance can be
thought of as a table in which each tuple is a row, and all rows have the same number of fields. (The term
relation instance is often abbreviated to just relation, when there is no confusion with other aspects of a
relation such as its schema.)

An instance of the Students relation appears in Fig-1. The instance S1 contains six tuples and has, as we
expect from the schema, five fields. Note that no two rows are identical. This is a requirement of the
relational model: each relation is defined to be a set of unique tuples or rows.[1] The order in which the
rows are listed is not important. Fig-2 shows the same relation instance.

FIELDS (ATTRIBUTES, COLUMNS)

sid    name     login          age  gpa     <- field names
50000  Dave     dave@cs        19   3.3     <- TUPLES (RECORDS, ROWS)
53666  Jones    jones@cs       18   3.4
53688  Smith    smith@ee       18   3.2
53650  Smith    smith@math     19   3.8
53831  Madayan  madayan@music  11   1.8
53832  Guldu    guldu@music    12   2.0

Fig-1 An Instance S1 of the Students Relation

[1] In practice, commercial systems allow tables to have duplicate rows, but we will assume that a relation is
indeed a set of tuples unless otherwise noted.

sid    name     login          age  gpa
53831  Madayan  madayan@music  11   1.8
53832  Guldu    guldu@music    12   2.0
53688  Smith    smith@ee       18   3.2
53650  Smith    smith@math     19   3.8
53666  Jones    jones@cs       18   3.4
50000  Dave     dave@cs        19   3.3

Fig-2 An Alternative Representation of Instance S1 of Students

If the fields are named, as in our schema definitions and figures depicting relation instances, the order of
fields does not matter either. However, an alternative convention is to list fields in a specific order and to
refer to a field by its position. Thus sid is field 1 of Students, login is field 3, and so on. If this
convention is used, the order of fields is significant. Most database systems use a combination of these
conventions. For example, in SQL the named fields convention is used in statements that retrieve tuples, and
the ordered fields convention is commonly used when inserting tuples.

A relation schema specifies the domain of each field or column in the relation instance. These domain
constraints in the schema specify an important condition that we want each instance of the relation to
satisfy: the values that appear in a column must be drawn from the domain associated with that column.
Thus, the domain of a field is essentially the type of that field, in programming language terms, and restricts
the values that can appear in the field.

More formally, let R(f1:D1, ..., fn:Dn) be a relation schema, and for each fi, 1 ≤ i ≤ n, let Domi be the set of
values associated with the domain named Di. An instance of R that satisfies the domain constraints in the
schema is a set of tuples with n fields:

{ ⟨f1: d1, ..., fn: dn⟩ | d1 ∈ Dom1, ..., dn ∈ Domn }

The angular brackets ⟨...⟩ identify the fields of a tuple. Using this notation, the first Students tuple shown
in Fig-1 is written as ⟨sid: 50000, name: Dave, login: dave@cs, age: 19, gpa: 3.3⟩. The curly
brackets {...} denote a set (of tuples, in this definition). The vertical bar | should be read 'such that,' the
symbol ∈ should be read 'in,' and the expression to the right of the vertical bar is a condition that must
be satisfied by the field values of each tuple in the set. Thus, an instance of R is defined as a set of tuples.
The fields of each tuple must correspond to the fields in the relation schema.

Domain constraints are so fundamental in the relational model that we will henceforth consider only
relation instances that satisfy them; therefore, relation instance means relation instance that satisfies the
domain constraints in the relation schema.

The degree, also called arity, of a relation is the number of fields. The cardinality of a relation instance
is the number of tuples in it. In Fig-1, the degree of the relation (the number of columns) is five, and the
cardinality of this instance is six.
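
The definitions of domain constraints, degree, and cardinality can be made concrete with a short
sketch. It assumes that Python types stand in for the domains string, integer, and real, and that an
instance is a set of tuples listed in schema order. The names STUDENTS_SCHEMA, S1, and
satisfies_domain_constraints are hypothetical.

```python
# Field name -> Python type standing in for the domain (an assumption).
STUDENTS_SCHEMA = {"sid": str, "name": str, "login": str, "age": int, "gpa": float}

# Instance S1 from Fig-1 as a set of tuples: no duplicates and no row order.
S1 = {
    ("50000", "Dave", "dave@cs", 19, 3.3),
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@ee", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
    ("53831", "Madayan", "madayan@music", 11, 1.8),
    ("53832", "Guldu", "guldu@music", 12, 2.0),
}


def satisfies_domain_constraints(schema, instance):
    """Check that every tuple has n fields and each value lies in its domain."""
    domains = list(schema.values())
    return all(
        len(row) == len(domains)
        and all(isinstance(value, domain) for value, domain in zip(row, domains))
        for row in instance
    )


degree = len(STUDENTS_SCHEMA)  # number of fields
cardinality = len(S1)          # number of tuples in this instance

print(degree, cardinality, satisfies_domain_constraints(STUDENTS_SCHEMA, S1))
```

Using a Python set mirrors the definition of a relation instance as a set of unique tuples: adding
the same tuple twice leaves the cardinality unchanged.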

A relational database is a collection of relations with distinct relation names. The relational database
schema is the collection of schemas for the relations in the database. For example, in Chapter 1, we
discussed a university database with relations called Students, Faculty, Courses, Rooms, Enrolled,
Teaches, and Meets In. An instance of a relational database is a collection of relation instances, one per
relation schema in the database schema; of course, each relation instance must satisfy the domain
constraints in its schema.

Creating and Modifying Relations Using SQL-92

The SQL-92 language standard uses the word table to denote relation, and we will often follow this
convention when discussing SQL. The subset of SQL that supports the creation, deletion, and modification
of tables is called the Data Definition Language (DDL). Further, while there is a command that lets users
define new domains, analogous to type definition commands in a programming language, we postpone a
discussion of domain definition until Section 5.11. For now, we will just consider domains that are built-in
types, such as integer.

The CREATE TABLE statement is used to define a new table.[2] To create the Students relation, we can use
the following statement:

CREATE TABLE Students ( sid   CHAR(20),
                        name  CHAR(30),
                        login CHAR(20),
                        age   INTEGER,
                        gpa   REAL )

Tuples are inserted using the INSERT command. We can insert a single tuple into the Students table as
follows:

INSERT
INTO   Students (sid, name, login, age, gpa)
VALUES (53688, 'Smith', 'smith@ee', 18, 3.2)

We can optionally omit the list of column names in the INTO clause and list the values in the appropriate
order, but it is good style to be explicit about column names.

[2] SQL also provides statements to destroy tables and to change the columns associated with a table; we
discuss these in Section 3.7.

We can delete tuples using the DELETE command. We can delete all Students tuples with name equal to
Smith using the command:

DELETE
FROM   Students S
WHERE  S.name = 'Smith'

We can modify the column values in an existing row using the UPDATE command. For example, we can
increment the age and decrement the gpa of the student with sid 53688:

UPDATE Students S
SET    S.age = S.age + 1, S.gpa = S.gpa - 1
WHERE  S.sid = 53688

These examples illustrate some important points. The WHERE clause is applied first and determines which
rows are to be modified. The SET clause then determines how these rows are to be modified. If the column
that is being modified is also used to determine the new value, the value used in the expression on the
right side of equals (=) is the old value, that is, before the modification. To illustrate these points
further, consider the following variation of the previous query:

UPDATE Students S
SET    S.gpa = S.gpa - 0.1
WHERE  S.gpa >= 3.3

If this query is applied on the instance S1 of Students shown in Fig-1, we obtain the instance shown
in Figure-3.

sid    name     login          age  gpa
50000  Dave     dave@cs        19   3.2
53666  Jones    jones@cs       18   3.3
53688  Smith    smith@ee       18   3.2
53650  Smith    smith@math     19   3.7
53831  Madayan  madayan@music  11   1.8
53832  Guldu    guldu@music    12   2.0

Figure-3 Students Instance S1 after Update
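
The statements in this section can be tried out with Python's built-in sqlite3 module. The sketch
below is only an approximation of the SQL-92 examples: SQLite is an assumption (the text does not
name a particular DBMS), the UPDATE is written without the table alias S because SQLite does not
accept a qualified column name in the SET clause, and the gpa values are rounded in Python because
REAL columns hold binary floating-point numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE Students (sid CHAR(20), name CHAR(30), login CHAR(20), "
    "age INTEGER, gpa REAL)"
)

# Instance S1 of Students.
rows = [
    ("50000", "Dave", "dave@cs", 19, 3.3),
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@ee", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
    ("53831", "Madayan", "madayan@music", 11, 1.8),
    ("53832", "Guldu", "guldu@music", 12, 2.0),
]
conn.executemany(
    "INSERT INTO Students (sid, name, login, age, gpa) VALUES (?, ?, ?, ?, ?)",
    rows,
)

# WHERE picks the rows using the old gpa values; SET then changes only those rows.
conn.execute("UPDATE Students SET gpa = gpa - 0.1 WHERE gpa >= 3.3")

after = {
    sid: round(gpa, 1)
    for sid, gpa in conn.execute("SELECT sid, gpa FROM Students")
}
for sid in sorted(after):
    print(sid, after[sid])
```

Only the three rows whose gpa was at least 3.3 before the update are changed; the student with gpa
3.2 is left alone even though other rows end up with the same value.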

INTEGRITY CONSTRAINTS OVER RELATIONS

A database is only as good as the information stored in it, and a DBMS must therefore help prevent the
entry of incorrect information. An integrity constraint (IC) is a condition that is specified on a database
schema and restricts the data that can be stored in an instance of the database.

If a database instance satisfies all the integrity constraints specified on the database schema, it is a legal
instance. A DBMS enforces integrity constraints, in that it permits only legal instances to be stored
in the database.

Integrity constraints are specified and enforced at different times:

1. When the DBA or end user defines a database schema, he or she specifies the ICs that must hold on
any instance of this database.

2. When a database application is run, the DBMS checks for violations and disallows changes to the data
that violate the specified ICs. (In some situations, rather than disallow the change, the DBMS might
instead make some compensating changes to the data to ensure that the database instance satisfies
all ICs. In any case, changes to the database are not allowed to create an instance that violates any
IC.)
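
The two steps can be observed with Python's built-in sqlite3 module. This is a minimal sketch under
stated assumptions: SQLite stands in for the DBMS, and the constraint is declared with the
PRIMARY KEY clause, which the text has not introduced at this point. The constraint is specified
when the table is defined, and it is enforced later when a statement would violate it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Step 1: the constraint is specified when the schema is defined.
conn.execute("CREATE TABLE Students (sid CHAR(20) PRIMARY KEY, name CHAR(30))")
conn.execute("INSERT INTO Students VALUES ('53688', 'Smith')")

# Step 2: the DBMS checks each change and disallows one that violates the IC.
try:
    conn.execute("INSERT INTO Students VALUES ('53688', 'Jones')")
    rejected = False
except sqlite3.IntegrityError as err:
    rejected = True
    print("Rejected:", err)

count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
print("Rows stored:", count)
```

The second INSERT is refused, so the table still holds a legal instance with a single row.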

Many kinds of integrity constraints can be specified in the relational model. We have already seen one
example of an integrity constraint in the domain constraints associated with a relation schema (Section 3.1).
In general, other kinds of constraints can be specified as well; for example, no two students have the same
sid value. In this section we discuss the integrity constraints, other than domain constraints, that a DBA or
user can specify in the relational model.

Consider the Students relation and the constraint that no two students have the same student id. This IC is
an example of a key constraint. A key constraint is a statement that a certain minimal subset of the fields of
a relation is a unique identifier for a tuple. A set of fields that uniquely identifies a tuple according to a key
constraint is called a candidate key for the relation; we often abbreviate this to just key. In the case of the
Students relation, the (set of fields containing just the) sid field is a candidate key.

Let us take a closer look at the above definition of a (candidate) key. There are two parts to the definition:[3]

1. Two distinct tuples in a legal instance (an instance that satisfies all ICs, including the key constraint)
cannot have identical values in all the fields of a key.
2. No subset of the set of fields in a key is a unique identifier for a tuple.

[3] The term key is rather overworked. In the context of access methods, we speak of search keys, which
are quite different.

The first part of the definition means that in any legal instance, the values in the key fields uniquely identify
a tuple in the instance.
When specifying a key constraint, the DBA or user must be sure that this constraint will not prevent them
from storing a correct' set of tuples. (A similar comment applies to the specication of other kinds of Integrity
Constraints well).The notion
of `correctness' here depends upon the nature of the data being stored. For example, several students may
have the same name, although each student has a unique student id. If the name field is declared to be a
key, the DBMS will not allow the Students relation to contain two tuples describing different students with
the same name!

The second part of the definition means, for example, that the set of fields {sid, name} is not a key for
Students, because this set properly contains the key {sid}. The set {sid, name} is an example of a superkey,
which is a set of fields that contains a key.

Look again at the instance of the Students relation in Figure 3. Observe that two different rows always
have different sid values; sid is a key and uniquely identifies a tuple. However, this does not hold for
nonkey fields. For example, the relation contains two rows with Smith in the name field.

Note that every relation is guaranteed to have a key. Since a relation is a set of tuples, the set of all fields is
always a superkey. If other constraints hold, some subset of the fields may form a key, but if not, the set of
all fields is a key.

A relation may have several candidate keys. For example, the login and age fields of the Students relation
may, taken together, also identify students uniquely. That is, {login, age} is also a key. It may seem that
login is a key, since no two rows in the example instance have the same login value. However, the key must
identify tuples uniquely in all possible legal instances of the relation. By stating that {login, age} is a key,
the user is declaring that two students may have the same login or age, but not both.
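
The two parts of the definition can be checked mechanically against a single instance. The sketch below is only an illustration: the helper names is_superkey and is_candidate_key are invented here, and the rows are the Students tuples that appear in this chapter's figures. Keep in mind that passing the test on one instance does not establish a key, because a key constraint is a statement about every legal instance.

```python
from itertools import combinations

FIELDS = ("sid", "name", "login", "age", "gpa")

# Students tuples taken from the figures in this chapter.
STUDENTS = [
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@ee", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
    ("53831", "Madayan", "madayan@music", 11, 1.8),
    ("53832", "Guldu", "guldu@music", 12, 2.0),
]


def is_superkey(rows, fields, subset):
    """Part 1: no two rows agree on every field in `subset`."""
    positions = [fields.index(name) for name in subset]
    seen = set()
    for row in rows:
        values = tuple(row[i] for i in positions)
        if values in seen:
            return False
        seen.add(values)
    return True


def is_candidate_key(rows, fields, subset):
    """Parts 1 and 2: a superkey none of whose proper subsets is a superkey."""
    if not is_superkey(rows, fields, subset):
        return False
    return not any(
        is_superkey(rows, fields, smaller)
        for size in range(len(subset))
        for smaller in combinations(subset, size)
    )
```

On these rows ("sid",) passes as a candidate key and ("sid", "name") is a superkey but not a candidate key. The set ("login",) passes as well, which is the trap described above: login looks like a key in this instance even though nothing forces it to be one in general.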

Out of all the available candidate keys, a database designer can identify a primary key. Intuitively, a tuple
can be referred to from elsewhere in the database by storing the values of its primary key fields. For
example, we can refer to a Students tuple by storing its sid value. As a consequence of referring to student
tuples in this manner, tuples are frequently accessed by specifying their sid value. In principle, we can use
any key, not just the primary key, to refer to a tuple. However, using the primary key is preferable because
it is what the DBMS expects and optimizes for; this is the significance of designating a particular candidate
key as a primary key. For example, the DBMS may create an index with the primary key fields as the search
key, to make the retrieval of a tuple given its primary key value efficient. The idea of referring to a tuple is
developed further in the next section.

Specifying Key Constraints in SQL-92

In SQL we can declare that a subset of the columns of a table constitute a key by using the UNIQUE
constraint. At most one of these `candidate' keys can be declared to be a primary key, using the PRIMARY
KEY constraint. (SQL does not require that such constraints be declared for a table.)

Let us revisit our example table definition and specify key information:

CREATE TABLE Students ( sid CHAR(20),
                        name CHAR(30),
                        login CHAR(20),
                        age INTEGER,
                        gpa REAL,
                        UNIQUE (name, age),
                        CONSTRAINT StudentsKey PRIMARY KEY (sid) )

This definition says that sid is the primary key and that the combination of name and age is also a key. The
definition of the primary key also illustrates how we can name a constraint by preceding it with CONSTRAINT
constraint-name. If the constraint is violated, the constraint name is returned and can be used to identify
the error.
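
To see these declarations at work, the following sketch runs the table definition in SQLite through Python's sqlite3 module. SQLite is an assumption of this example, not the system discussed in the text, and the helper names and the sids 53998 and 53999 are invented for illustration. The wording of the error, and whether it mentions the constraint name, differs from one DBMS to another, so the sketch only distinguishes accepted rows from rejected ones.

```python
import sqlite3


def make_students_table():
    """Create an in-memory database holding the Students table and one row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE Students ( sid CHAR(20),
                                name CHAR(30),
                                login CHAR(20),
                                age INTEGER,
                                gpa REAL,
                                UNIQUE (name, age),
                                CONSTRAINT StudentsKey PRIMARY KEY (sid) )
        """
    )
    conn.execute(
        "INSERT INTO Students VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)"
    )
    return conn


def try_insert(conn, row):
    """Return None if the row is stored, else the engine's error message."""
    try:
        conn.execute("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


conn = make_students_table()
# Same sid as Jones: violates the PRIMARY KEY constraint.
print(try_insert(conn, ("53666", "Mike", "mike@ee", 17, 3.4)))
# Same (name, age) as Jones under a made-up sid: violates UNIQUE (name, age).
print(try_insert(conn, ("53999", "Jones", "jones2@cs", 18, 3.0)))
# A Jones of a different age under a made-up sid is accepted.
print(try_insert(conn, ("53998", "Jones", "jones3@cs", 19, 3.0)))
```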

Sometimes the information stored in a relation is linked to the information stored in another relation. If one
of the relations is modified, the other must be checked, and perhaps modified, to keep the data consistent. An
IC involving both relations must be specified if a DBMS is to make such checks. The most common IC
involving two relations is a foreign key constraint.

Suppose that in addition to Students, we have a second relation:

Enrolled(sid: string, cid: string, grade: string)

To ensure that only bona fide students can enroll in courses, any value that appears in the sid
field of an instance of the Enrolled relation should also appear in the sid field of some tuple in the Students
relation. The sid field of Enrolled is called a foreign key and refers to Students. The foreign key in
the referencing relation (Enrolled, in our example) must match the primary key of the referenced relation
(Students), i.e., it must have the same number of columns and compatible data types, although the column
names can be different.
This constraint is illustrated in Figure- 4. As the figure shows, there may be some students who are not
referenced from Enrolled (e.g., the student with sid=50000).
However, every sid value that appears in the instance of the Enrolled table appears in the primary key
column of a row in the Students table.

Enrolled (Referencing relation); sid is the foreign key

cid          grade  sid
Carnatic101  C      53831
Reggae203    B      53832
Topology112  A      53650
History105   B      53666

Students (Referenced relation); sid is the primary key

Figure- 4 Referential Integrity

If we try to insert the tuple <55555, Art104, A> into E1, the IC is violated because there is no tuple
in S1 with the id 55555; the database system should reject such an insertion. Similarly, if we
delete the tuple <53666, Jones, jones@cs, 18, 3.4> from S1, we violate the foreign key constraint
because the tuple <53666, History105, B> in E1 contains sid value 53666, the sid of the deleted
Students tuple. The DBMS should disallow the deletion or, perhaps, also delete the Enrolled tuple
that refers to the deleted Students tuple. We discuss foreign key constraints and their impact on
updates in Section 3.3.

Finally, we note that a foreign key could refer to the same relation. For example, we could extend
the Students relation with a column called partner and declare this column to be a foreign key
referring to Students. Intuitively, every student could then have a partner, and the partner field
contains the partner's sid. The observant reader will no doubt ask, "What if a student does not
(yet) have a partner?" This situation is handled in SQL by using a special value called null. The use
of null in a field of a tuple means that the value in that field is either unknown or not applicable (e.g., we
do not know the partner yet, or there is no partner). The appearance of null in a foreign key field
does not violate the foreign key constraint. However, null values are not allowed to appear in a
primary key field (because the primary key fields are used to identify a tuple uniquely). We will
discuss null values further.
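
The partner example can be tried out directly. The following is a minimal sketch that assumes SQLite (through Python's sqlite3 module) as the DBMS and uses a cut-down Students table with an added partner column; the helper names and the sid 99999 are made up for this illustration. Note that SQLite enforces foreign keys only after the pragma shown in the code has been issued.

```python
import sqlite3


def make_partner_table():
    """Students with a partner column that refers back to Students."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        """
        CREATE TABLE Students ( sid CHAR(20),
                                name CHAR(30),
                                partner CHAR(20),
                                PRIMARY KEY (sid),
                                FOREIGN KEY (partner) REFERENCES Students (sid) )
        """
    )
    return conn


def add_student(conn, sid, name, partner):
    """Insert a student; return False if the DBMS rejects the row."""
    try:
        conn.execute(
            "INSERT INTO Students VALUES (?, ?, ?)", (sid, name, partner)
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_partner_table()
print(add_student(conn, "53666", "Jones", None))     # no partner yet: null is fine
print(add_student(conn, "53688", "Smith", "53666"))  # partner is an existing student
print(add_student(conn, "53650", "Smith", "99999"))  # 99999 is not a known sid
```

Python's None is stored as SQL null, so the first row shows that a null foreign key value does not violate the constraint, while the last row is rejected because it names a student who does not exist.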

Specifying Foreign Key Constraints in SQL-92

Let us define Enrolled(sid: string, cid: string, grade: string):

CREATE TABLE Enrolled ( sid CHAR(20),
                        cid CHAR(20),
                        grade CHAR(10),
                        PRIMARY KEY (sid, cid),
                        FOREIGN KEY (sid) REFERENCES Students )

The foreign key constraint states that every sid value in Enrolled must also appear in Students, that
is, sid in Enrolled is a foreign key referencing Students. Incidentally, the primary key constraint
states that a student has exactly one grade for each course that he or she is enrolled in. If we want
to record more than one grade per student per course, we should change the primary key
constraint.

General Constraints

Domain, primary key, and foreign key constraints are considered to be a fundamental part of the
relational data model and are given special attention in most commercial systems. Sometimes,
however, it is necessary to specify more general constraints.

For example, we may require that student ages be within a certain range of values; given such an
IC specification, the DBMS will reject inserts and updates that violate the constraint. This is very
useful in preventing data entry errors. If we specify that all students must be at least 16 years old,
the instance of Students shown in Figure- 1 is illegal because two students are underage. If we
disallow the insertion of these two tuples, we have a legal instance, as shown in Figure- 5.

sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@ee    18   3.2
53650  Smith  smith@math  19   3.8

Figure-5 An Instance S2 of the Students Relation

The IC that students must be at least 16 years old can be thought of as an extended domain constraint,
since we are essentially defining the set of permissible age values more stringently than is possible
by simply using a standard domain such as integer. In general, however, constraints that go well
beyond domain, key, or foreign key constraints can be specified. For example, we could require that
every student whose age is greater than 18 must have a gpa greater than 3.

Current relational database systems support such general constraints in the form of table
constraints and assertions. Table constraints are associated with a single table and are checked
whenever that table is modified. In contrast, assertions involve several tables and are checked
whenever any of these tables is modified. Both table constraints and assertions can use the full
power of SQL queries to specify the desired restriction. We discuss SQL support for table
constraints and assertions in Section 5.11 because a full appreciation of their power requires a
good grasp of SQL's query capabilities.
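
As a small preview, both example restrictions from this section can be written as CHECK table constraints. The sketch below is an assumption-laden illustration rather than the text's own definition: it uses SQLite through Python's sqlite3 module, the helper names are invented, and the row with sid 59999 is hypothetical. Assertions that span several tables are not shown.

```python
import sqlite3


def make_checked_students():
    """Students with two CHECK table constraints on age and gpa."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE Students ( sid CHAR(20),
                                name CHAR(30),
                                login CHAR(20),
                                age INTEGER,
                                gpa REAL,
                                PRIMARY KEY (sid),
                                CHECK (age >= 16),
                                CHECK (age <= 18 OR gpa > 3) )
        """
    )
    return conn


def accepted(conn, row):
    """Return True if the row is stored, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_checked_students()
print(accepted(conn, ("53666", "Jones", "jones@cs", 18, 3.4)))
print(accepted(conn, ("53831", "Madayan", "madayan@music", 11, 1.8)))  # under 16
print(accepted(conn, ("59999", "Test", "test@cs", 19, 2.5)))  # over 18, gpa too low
```

The second constraint states "age greater than 18 implies gpa greater than 3" in the equivalent form "age at most 18, or gpa greater than 3", since SQL has no implication operator.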

ENFORCING INTEGRITY CONSTRAINTS

As we observed earlier, ICs are specified when a relation is created and enforced when a relation is
modified. The impact of domain, PRIMARY KEY, and UNIQUE constraints is straightforward: if an
insert, delete, or update command causes a violation, it is rejected. Potential IC violation is
generally checked at the end of each SQL statement execution, although it can be deferred until
the end of the transaction executing the statement.

Consider the instance S1 of Students shown in Figure 1. The following insertion violates the
primary key constraint because there is already a tuple with the sid 53688, and it will be rejected
by the DBMS:

INSERT
INTO Students (sid, name, login, age, gpa)
VALUES (53688, 'Mike', 'mike@ee', 17, 3.4)

The following insertion violates the constraint that the primary key cannot contain null:

INSERT
INTO Students (sid, name, login, age, gpa)
VALUES (null, 'Mike', 'mike@ee', 17, 3.4)

Of course, a similar problem arises whenever we try to insert a tuple with a value in a field that is not
in the domain associated with that field, i.e., whenever we violate a domain constraint. Deletion
does not cause a violation of domain, primary key or unique constraints. However, an update can
cause violations, similar to an insertion:

UPDATE Students S
SET S.sid = 50000
WHERE S.sid = 53688

This update violates the primary key constraint because there is already a tuple with sid 50000.

The impact of foreign key constraints is more complex because SQL sometimes tries to rectify a
foreign key constraint violation instead of simply rejecting the change. We will
discuss the referential integrity enforcement steps taken by the DBMS in terms of our Enrolled and
Students tables, with the foreign key constraint that Enrolled.sid is a reference to (the primary key
of) Students.

In addition to the instance S1 of Students, consider the instance of Enrolled shown in Figure 4.
Deletions of Enrolled tuples do not violate referential integrity, but insertions of Enrolled tuples
could. The following insertion is illegal because there is no student with sid 51111:

INSERT
INTO Enrolled (cid, grade, sid)
VALUES ('Hindi101', 'B', 51111)

On the other hand, insertions of Students tuples do not violate referential integrity although
deletions could. Further, updates on either Enrolled or Students that change the sid value could
potentially violate referential integrity.
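
The four cases just listed (insert and delete on each of the two tables) can be exercised with the rows from this chapter's figures. This is a sketch under stated assumptions: SQLite via Python's sqlite3 module stands in for the DBMS, foreign key enforcement has to be switched on explicitly there, and the helper names make_db and allowed are invented for the example.

```python
import sqlite3

STUDENTS = [  # (sid, name, login, age, gpa), from the figures in this chapter
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@ee", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
    ("53831", "Madayan", "madayan@music", 11, 1.8),
    ("53832", "Guldu", "guldu@music", 12, 2.0),
]
ENROLLED = [  # (cid, grade, sid), from the Enrolled instance
    ("Carnatic101", "C", "53831"),
    ("Reggae203", "B", "53832"),
    ("Topology112", "A", "53650"),
    ("History105", "B", "53666"),
]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.execute(
        "CREATE TABLE Students ( sid CHAR(20), name CHAR(30), login CHAR(20),"
        " age INTEGER, gpa REAL, PRIMARY KEY (sid) )"
    )
    conn.execute(
        "CREATE TABLE Enrolled ( sid CHAR(20), cid CHAR(20), grade CHAR(10),"
        " PRIMARY KEY (sid, cid), FOREIGN KEY (sid) REFERENCES Students )"
    )
    conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", STUDENTS)
    conn.executemany(
        "INSERT INTO Enrolled (cid, grade, sid) VALUES (?, ?, ?)", ENROLLED
    )
    return conn


def allowed(conn, sql):
    """Run one statement; return False if an integrity constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
# Inserting into Enrolled can violate referential integrity ...
print(allowed(conn, "INSERT INTO Enrolled (cid, grade, sid) VALUES ('Hindi101', 'B', '51111')"))
# ... deleting from Enrolled cannot.
print(allowed(conn, "DELETE FROM Enrolled WHERE cid = 'Carnatic101'"))
# Deleting a referenced Students row is rejected when no action is specified.
print(allowed(conn, "DELETE FROM Students WHERE sid = '53666'"))
# Inserting into Students cannot violate referential integrity.
print(allowed(conn, "INSERT INTO Students VALUES ('51111', 'Mike', 'mike@ee', 17, 3.4)"))
```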

SQL-92 provides several alternative ways to handle foreign key violations. We must consider three
basic questions:

1. What should we do if an Enrolled row is inserted, with a sid column value that does not appear
in any row of the Students table?

In this case the INSERT command is simply rejected.

2. What should we do if a Students row is deleted?

The options are:

Delete all Enrolled rows that refer to the deleted Students row.

Disallow the deletion of the Students row if an Enrolled row refers to it.

Set the sid column to the sid of some (existing) `default' student, for every Enrolled row that refers
to the deleted Students row.

For every Enrolled row that refers to it, set the sid column to null. In our example, this option
conflicts with the fact that sid is part of the primary key of Enrolled and therefore cannot be set to
null. Thus, we are limited to the first three options in our example, although this fourth option
(setting the foreign key to null) is available in the general case.

3. What should we do if the primary key value of a Students row is updated?

The options here are similar to the previous case.

SQL-92 allows us to choose any of the four options on DELETE and UPDATE. For example, we can
specify that when a Students row is deleted, all Enrolled rows that refer to it are to be deleted as
well, but that when the sid column of a Students row is modified, this update is to be rejected if an
Enrolled row refers to the modified Students row:

CREATE TABLE Enrolled ( sid CHAR(20),
                        cid CHAR(20),
                        grade CHAR(10),
                        PRIMARY KEY (sid, cid),
                        FOREIGN KEY (sid) REFERENCES Students
                            ON DELETE CASCADE
                            ON UPDATE NO ACTION )

The options are specified as part of the foreign key declaration. The default option is
NO ACTION, which means that the action (DELETE or UPDATE) is to be rejected. Thus, the ON
UPDATE clause in our example could be omitted, with the same effect. The CASCADE keyword says
that if a Students row is deleted, all Enrolled rows that refer to it are to be deleted as well. If the
UPDATE clause specified CASCADE, and the sid column of a Students row is updated, this update is
also carried out in each Enrolled row that refers to the updated Students row.

If a Students row is deleted, we can switch the enrollment to a `default' student by using ON
DELETE SET DEFAULT. The default student is specified as part of the definition of the sid field in
Enrolled; for example, sid CHAR(20) DEFAULT '53666'. Although the specification of a default value
is appropriate in some situations (e.g., a default parts supplier if a particular supplier goes out of
business), it is really not appropriate to switch enrollments to a default student. The correct
solution in this example is to also delete all enrollment tuples for the deleted student (that is,
CASCADE), or to reject the update.

SQL also allows the use of null as the default value by specifying ON DELETE SET NULL.
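
The combination ON DELETE CASCADE with ON UPDATE NO ACTION discussed in this section can be observed in a small experiment. The sketch assumes SQLite through Python's sqlite3 module, trims Students down to two columns to keep the example short, and uses invented helper names; the sid 60000 is made up.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.execute(
        "CREATE TABLE Students ( sid CHAR(20), name CHAR(30), PRIMARY KEY (sid) )"
    )
    conn.execute(
        """
        CREATE TABLE Enrolled ( sid CHAR(20),
                                cid CHAR(20),
                                grade CHAR(10),
                                PRIMARY KEY (sid, cid),
                                FOREIGN KEY (sid) REFERENCES Students
                                    ON DELETE CASCADE
                                    ON UPDATE NO ACTION )
        """
    )
    conn.executemany(
        "INSERT INTO Students VALUES (?, ?)",
        [("53666", "Jones"), ("53650", "Smith")],
    )
    conn.executemany(
        "INSERT INTO Enrolled (cid, grade, sid) VALUES (?, ?, ?)",
        [("History105", "B", "53666"), ("Topology112", "A", "53650")],
    )
    return conn


def enrolled_sids(conn):
    """The sid values currently stored in Enrolled, in sorted order."""
    return [row[0] for row in conn.execute("SELECT sid FROM Enrolled ORDER BY sid")]


def change_sid(conn, old_sid, new_sid):
    """Try to change a student's sid; return False if the update is rejected."""
    try:
        conn.execute(
            "UPDATE Students SET sid = ? WHERE sid = ?", (new_sid, old_sid)
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
conn.execute("DELETE FROM Students WHERE sid = '53666'")  # cascades to Enrolled
print(enrolled_sids(conn))
print(change_sid(conn, "53650", "60000"))  # rejected: an Enrolled row refers to 53650
```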

QUERYING RELATIONAL DATA

A relational database query (query, for short) is a question about the data, and the answer consists
of a new relation containing the result. For example, we might want to find all students younger than
18 or all students enrolled in Reggae203. A query language is a specialized language for writing
queries.

SQL is the most popular commercial query language for a relational DBMS. We now present some
SQL examples that illustrate how easily relations can be queried. Consider the instance of the
Students relation shown in Figure 1. We can retrieve rows corresponding to students who are
younger than 18 with the following SQL query:

SELECT *
FROM Students S
WHERE S.age < 18

The symbol * means that we retain all fields of selected tuples in the result. To understand this
query, think of S as a variable that takes on the value of each tuple in Students, one tuple after the
other. The condition S.age < 18 in the WHERE clause specifies that we want to select only tuples in
which the age field has a value less than 18. This query evaluates to the relation shown in Figure-6.

sid    name     login          age  gpa
53831  Madayan  madayan@music  11   1.8
53832  Guldu    guldu@music    12   2.0

Figure 6 Students with age < 18 on Instance S1

This example illustrates that the domain of a field restricts the operations that are permitted on field
values, in addition to restricting the values that can appear in the field. The condition S.age < 18
involves an arithmetic comparison of an age value with an integer and is permissible because the
domain of age is the set of integers. On the other hand, a condition such as S.age = S.sid does not
make sense because it compares an integer value with a string value, and this comparison is
defined to fail in SQL; a query containing this condition will produce no answer tuples.

In addition to selecting a subset of tuples, a query can extract a subset of the fields of each selected
tuple. We can compute the names and logins of students who are younger than 18 with the
following query:

SELECT S.name, S.login
FROM Students S
WHERE S.age < 18

Figure 7 shows the answer to this query; it is obtained by applying the selection to the instance
S1 of Students (to get the relation shown in Figure 6), followed by removing unwanted fields. Note
that the order in which we perform these operations does matter: if we remove unwanted fields
first, we cannot check the condition S.age < 18, which involves one of those fields.
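
The point about ordering can be made concrete by treating selection and projection as two separate steps. The following is a plain-Python sketch, not how a DBMS evaluates the query; the function names select and project are chosen here for illustration, and the rows are the Students tuples from this chapter's figures.

```python
STUDENTS = [
    {"sid": "53666", "name": "Jones", "login": "jones@cs", "age": 18, "gpa": 3.4},
    {"sid": "53688", "name": "Smith", "login": "smith@ee", "age": 18, "gpa": 3.2},
    {"sid": "53650", "name": "Smith", "login": "smith@math", "age": 19, "gpa": 3.8},
    {"sid": "53831", "name": "Madayan", "login": "madayan@music", "age": 11, "gpa": 1.8},
    {"sid": "53832", "name": "Guldu", "login": "guldu@music", "age": 12, "gpa": 2.0},
]


def select(rows, predicate):
    """Keep only the rows that satisfy the predicate (the WHERE clause)."""
    return [row for row in rows if predicate(row)]


def project(rows, fields):
    """Keep only the named fields of each row (the SELECT list).

    Duplicates are kept, as SQL does when DISTINCT is not specified.
    """
    return [{name: row[name] for name in fields} for row in rows]


# Selection first, then projection: this mirrors the query above.
under_18 = select(STUDENTS, lambda s: s["age"] < 18)
answer = project(under_18, ["name", "login"])
print(answer)

# Projection first throws away age, so the condition can no longer be checked.
try:
    select(project(STUDENTS, ["name", "login"]), lambda s: s["age"] < 18)
except KeyError as missing:
    print("cannot evaluate the condition, missing field:", missing)
```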

We can also combine information in the Students and Enrolled relations. If we want to obtain the
names of all students who obtained an A and the id of the course in which they got an A, we could
write the following query:

SELECT S.name, E.cid
FROM Students S, Enrolled E
WHERE S.sid = E.sid AND E.grade = 'A'

DISTINCT types in SQL: A comparison of two values drawn from different domains should fail, even
if the values are `compatible' in the sense that both are numeric or both are string values etc. For
example, if salary and age are two different domains whose values are represented as integers, a
comparison of a salary value with an age value should fail. Unfortunately, SQL-92's support for the
concept of domains does not go this far: We are forced to define salary and age as integer types
and the comparison S < A will succeed when S is bound to the salary value 25 and A is bound to
the age value 50. A later version of the SQL standard, called SQL:1999, addresses this problem,
and allows us to define salary and age as DISTINCT types even though their values are represented
as integers. Many systems, e.g., Informix UDS and IBM DB2, already support this feature.

name     login
Madayan  madayan@music
Guldu    guldu@music

Figure- 7 Names and Logins of Students under 18

The query given earlier that combines Students and Enrolled can be understood as follows: "If there is
a Students tuple S and an Enrolled tuple E such that S.sid = E.sid (so that S describes the student who
is enrolled in E) and E.grade = 'A', then print the student's name and the course id." When evaluated
on the instances of Students and Enrolled in Figure 4, this query returns a single tuple, <Smith, Topology112>.
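
The result can be reproduced by loading the two instances and running the query. This sketch assumes SQLite through Python's sqlite3 module; the function names are invented, and the grade is passed as a parameter so that other grades can be tried as well.

```python
import sqlite3

STUDENTS = [  # (sid, name, login, age, gpa), from the figures in this chapter
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@ee", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
    ("53831", "Madayan", "madayan@music", 11, 1.8),
    ("53832", "Guldu", "guldu@music", 12, 2.0),
]
ENROLLED = [  # (cid, grade, sid), from the Enrolled instance
    ("Carnatic101", "C", "53831"),
    ("Reggae203", "B", "53832"),
    ("Topology112", "A", "53650"),
    ("History105", "B", "53666"),
]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Students ( sid CHAR(20), name CHAR(30), login CHAR(20),"
        " age INTEGER, gpa REAL, PRIMARY KEY (sid) )"
    )
    conn.execute(
        "CREATE TABLE Enrolled ( sid CHAR(20), cid CHAR(20), grade CHAR(10),"
        " PRIMARY KEY (sid, cid) )"
    )
    conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", STUDENTS)
    conn.executemany(
        "INSERT INTO Enrolled (cid, grade, sid) VALUES (?, ?, ?)", ENROLLED
    )
    return conn


def names_and_courses(conn, grade):
    """Names of students with the given grade, paired with the course id."""
    query = """
        SELECT S.name, E.cid
        FROM Students S, Enrolled E
        WHERE S.sid = E.sid AND E.grade = ?
    """
    return conn.execute(query, (grade,)).fetchall()


print(names_and_courses(make_db(), "A"))
```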

LOGICAL DATABASE DESIGN: ER TO RELATIONAL

The ER model is convenient for representing an initial, high-level database design. Given an ER
diagram describing a database, there is a standard approach to generating a relational database
schema that closely approximates the ER design. (The translation is approximate to the extent that
we cannot capture all the constraints implicit in the ER design using SQL-92, unless we use certain
SQL-92 constraints that are costly to check.) We now describe how to translate an ER diagram into
a collection of tables with associated constraints, i.e., a relational database schema.

Entity Sets to Tables

An entity set is mapped to a relation in a straightforward way: Each attribute of the entity set
becomes an attribute of the table. Note that we know both the domain of each attribute and the
(primary) key of an entity set.

Consider the Employees entity set with attributes ssn, name, and lot shown in Figure 8. A
possible instance of the Employees entity set, containing three Employees entities, is shown in
Figure-9 in a tabular format.

Figure-8 The Employees Entity Set (entity set Employees with attributes ssn, name, and lot)

ssn name lot

123-22-3666 Attishoo 48
231-31-5368 Smiley 22
131-24-3650 Smethurst 35

Figure-9 An Instance of the Employees Entity Set

The following SQL statement captures the preceding information, including the domain constraints
and key information:

CREATE TABLE Employees ( ssn CHAR(11),
                         name CHAR(30),
                         lot INTEGER,
                         PRIMARY KEY (ssn) )

Relationship Sets (without Constraints) to Tables

A relationship set, like an entity set, is mapped to a relation in the relational model. We begin by
considering relationship sets without key and participation constraints, and we discuss how to
handle such constraints in subsequent sections. To represent a relationship, we must be able to
identify each participating entity and give values to the descriptive attributes of the relationship.
Thus, the attributes of the relation include:

The primary key attributes of each participating entity set, as foreign key fields.

The descriptive attributes of the relationship set.

The set of non-descriptive attributes is a superkey for the relation. If there are no key constraints
(see Section 2.4.1), this set of attributes is a candidate key.

Consider the Works_In2 relationship set shown in Figure 10. Each department has offices in several
locations and we want to record the locations at which each employee works.

Figure-10 A Ternary Relationship Set (Works_In2, with descriptive attribute since, relating Employees (ssn, name, lot), Departments (did, dname, budget), and Locations (address, capacity))

All the available information about the Works_In2 table is captured by the following SQL definition:

CREATE TABLE Works_In2 ( ssn CHAR(11),
                         did INTEGER,
                         address CHAR(20),
                         since DATE,
                         PRIMARY KEY (ssn, did, address),
                         FOREIGN KEY (ssn) REFERENCES Employees,
                         FOREIGN KEY (address) REFERENCES Locations,
                         FOREIGN KEY (did) REFERENCES Departments )

Note that the address, did, and ssn fields cannot take null values. Because these fields are part of
the primary key for Works_In2, a NOT NULL constraint is implicit for each of these fields. This
constraint ensures that these fields uniquely identify a department, an employee, and a location in
each tuple of Works_In2. We can also specify that a particular action is desired when a referenced
Employees, Departments or Locations tuple is deleted, as explained in the discussion of integrity
constraints in Section 3.2. In this chapter we assume that the default action is appropriate except
for situations in which the semantics of the ER diagram require some other action.

Finally, consider the Reports_To relationship set shown in Figure 11.

Figure-11 The Reports_To Relationship Set (Employees, with attributes ssn, name, and lot, participates twice, in the roles supervisor and subordinate)


The role indicators supervisor and subordinate are used to create meaningful field names in the
CREATE statement for the Reports_To table:

CREATE TABLE Reports_To ( supervisor_ssn CHAR(11),
                          subordinate_ssn CHAR(11),
                          PRIMARY KEY (supervisor_ssn, subordinate_ssn),
                          FOREIGN KEY (supervisor_ssn) REFERENCES Employees(ssn),
                          FOREIGN KEY (subordinate_ssn) REFERENCES Employees(ssn) )

Observe that we need to explicitly name the referenced field of Employees because the field name
differs from the name(s) of the referring field(s).

Translating Relationship Sets with Key Constraints

If a relationship set involves n entity sets and some m of them are linked via arrows in the ER
diagram, the key for any one of these m entity sets constitutes a key for the relation to which the
relationship set is mapped. Thus we have m candidate keys, and one of these should be designated
as the primary key. The translation discussed in Section 2.3 from relationship sets to a relation can
be used in the presence of key constraints, taking into account this point about keys.

Consider the relationship set Manages shown in Figure 12.

Figure-12 Key Constraint on Manages (Manages, with descriptive attribute since, relates Employees (ssn, name, lot) and Departments (did, dname, budget); each department has at most one manager)

The table corresponding to Manages has the attributes ssn, did, since. However, because each
department has at most one manager, no two tuples can have the same did value but differ on the
ssn value. A consequence of this observation is that did is itself a key for Manages; indeed, the
set {did, ssn} is not a key (because it is not minimal). The Manages relation can be defined using
the following SQL statement:

CREATE TABLE Manages ( ssn CHAR(11),
                       did INTEGER,
                       since DATE,
                       PRIMARY KEY (did),
                       FOREIGN KEY (ssn) REFERENCES Employees,
                       FOREIGN KEY (did) REFERENCES Departments )
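
Declaring did as the primary key is what enforces "at most one manager per department". The sketch below shows this with SQLite through Python's sqlite3 module, which is an assumption of the example. The foreign key clauses are left out so that the snippet needs no other tables, the ssn values come from the Employees instance, and the department numbers and dates are made up.

```python
import sqlite3


def make_manages():
    """Manages with did as its key; foreign keys omitted to stay self-contained."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE Manages ( ssn CHAR(11),
                               did INTEGER,
                               since DATE,
                               PRIMARY KEY (did) )
        """
    )
    return conn


def assign_manager(conn, ssn, did, since):
    """Record that `ssn` manages department `did`; False if the DBMS rejects it."""
    try:
        conn.execute("INSERT INTO Manages VALUES (?, ?, ?)", (ssn, did, since))
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_manages()
print(assign_manager(conn, "123-22-3666", 51, "1991-01-01"))
# A second manager for department 51 violates the key constraint on did.
print(assign_manager(conn, "231-31-5368", 51, "1992-02-02"))
# The same employee may still manage another department.
print(assign_manager(conn, "123-22-3666", 56, "1993-03-03"))
```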

A second approach to translating a relationship set with key constraints is often superior because
it avoids creating a distinct table for the relationship set. The idea is to include the information
about the relationship set in the table corresponding to the entity set with the key, taking
advantage of the key constraint. In the Manages example, because a department has at most one
manager, we can add the key fields of the Employees tuple denoting the manager and the since
attribute to the Departments tuple.

This approach eliminates the need for a separate Manages relation, and queries asking for a
department's manager can be answered without combining information from two relations. The
only drawback to this approach is that space could be wasted if several departments have no
managers. In this case the added fields would have to be filled with null values. The first translation
(using a separate table for Manages) avoids this inefficiency, but some important queries require us
to combine information from two relations, which can be a slow operation.

The following SQL statement, defining a Dept_Mgr relation that captures the information in both
Departments and Manages, illustrates the second approach to translating relationship sets with key
constraints:

CREATE TABLE Dept_Mgr ( did INTEGER,
                        dname CHAR(20),
                        budget REAL,
                        ssn CHAR(11),
                        since DATE,
                        PRIMARY KEY (did),
                        FOREIGN KEY (ssn) REFERENCES Employees )

Note that ssn can take on null values.
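
A short experiment shows both sides of this trade-off: the manager lookup touches a single table, and a department without a manager carries nulls in the added fields. The sketch assumes SQLite through Python's sqlite3 module; the department rows (numbers, names, budgets, date) are made up for illustration, the ssn comes from the Employees instance, and the foreign key clause is omitted so that no other table is needed.

```python
import sqlite3


def make_dept_mgr():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE Dept_Mgr ( did INTEGER,
                                dname CHAR(20),
                                budget REAL,
                                ssn CHAR(11),
                                since DATE,
                                PRIMARY KEY (did) )
        """
    )
    conn.executemany(
        "INSERT INTO Dept_Mgr VALUES (?, ?, ?, ?, ?)",
        [
            (51, "Toys", 100000.0, "123-22-3666", "1991-01-01"),
            (56, "Books", 80000.0, None, None),  # no manager: ssn and since are null
        ],
    )
    return conn


def manager_of(conn, did):
    """ssn of the department's manager; None if there is no manager
    (or no such department)."""
    row = conn.execute(
        "SELECT D.ssn FROM Dept_Mgr D WHERE D.did = ?", (did,)
    ).fetchone()
    return row[0] if row else None


def departments_without_manager(conn):
    """Department ids whose manager field is null."""
    query = "SELECT D.did FROM Dept_Mgr D WHERE D.ssn IS NULL ORDER BY D.did"
    return [row[0] for row in conn.execute(query)]


conn = make_dept_mgr()
print(manager_of(conn, 51))
print(departments_without_manager(conn))
```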

This idea can be extended to deal with relationship sets involving more than two entity sets. In
general, if a relationship set involves n entity sets and some m of them are linked via arrows in the
ER diagram, the relation corresponding to any one of the m sets can be augmented to capture the
relationship.

We discuss the relative merits of the two translation approaches further after considering how to
translate relationship sets with participation constraints into tables.

Translating Relationship Sets with Participation Constraints

Consider the ER diagram in Figure 13, which shows two relationship sets, Manages and Works_In.

Figure-13 Manages and Works_In (Employees (ssn, name, lot) and Departments (did, dname, budget) are related by both Manages and Works_In, each with a descriptive attribute since)

Rajeev 333445555 1955-12-08 Vashi 5 Research 333445555

Greta 999887777 1968-07-19 Sion 4 Admin 987654321

Rajesh 987654321 1941-06-20 Dadar 4 Admin 987654321

First Instance: To insert a new employee tuple into the Emp_Dept table, we must include either the
attribute values for the department that the employee works for, or nulls (if the employee does not
work for a department as yet). For example, to insert a new tuple for an employee who works in
department number 5, we must enter the attribute values of department 5 correctly so
that they are consistent with the values for department 5 in other tuples in Emp_Dept.
Second Instance: It is difficult to insert a new department that has no employees as yet in the
Emp_Dept relation. The only way to do this is to place null values in the attributes for the
employee. This causes a problem because SSN is the primary key of the Emp_Dept table and
each tuple is supposed to represent an employee entity, not a department entity.
Moreover, when the first employee is assigned to that department, we do not need this tuple with
null values anymore.

Deletion Anomalies
A "deletion anomaly" is a failure to remove information about an existing database entry when it is
time to remove that entry. In a properly normalized database, information about an obsolete
entry needs to be deleted from only one place in the database; in an
inadequately normalized database, information about that old entry may need to be deleted from
more than one place, and, human fallibility being what it is, some of the needed additional
deletions may be missed.
The problem of deletion anomalies is related to the second insertion anomaly situation
discussed earlier. If we delete from Emp_Dept an employee tuple that happens to
represent the last employee working for a particular department, the information
concerning that department is lost from the database.
Modification Anomalies
In Emp_Dept, if we change the value of one of the attributes of a particular department (say, the
manager of department 5), we must update the tuples of all employees who work in that
department; otherwise, the database will become inconsistent. If we fail to update some tuples, the
same department will be shown to have two different values for manager in different employee tuples,
which would be wrong.
All three kinds of anomalies are highly undesirable, since their occurrence constitutes
corruption of the database. Properly normalized databases are much less susceptible to
corruption than are unnormalized databases.
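
The deletion and modification anomalies can be reproduced on the three Emp_Dept tuples shown earlier. The sketch below is plain Python rather than SQL and rests on assumptions: the field names (ename, ssn, dnumber, dname, dmgrssn) are guessed from the values in those rows, only the fields needed here are kept, and the helper names are invented.

```python
# Three Emp_Dept tuples; field names are assumed for this illustration.
EMP_DEPT = [
    {"ename": "Rajeev", "ssn": "333445555", "dnumber": 5,
     "dname": "Research", "dmgrssn": "333445555"},
    {"ename": "Greta", "ssn": "999887777", "dnumber": 4,
     "dname": "Admin", "dmgrssn": "987654321"},
    {"ename": "Rajesh", "ssn": "987654321", "dnumber": 4,
     "dname": "Admin", "dmgrssn": "987654321"},
]


def departments(rows):
    """Department numbers that still appear somewhere in the relation."""
    return {row["dnumber"] for row in rows}


def managers_by_department(rows):
    """Map each department number to the set of manager SSNs recorded for it."""
    result = {}
    for row in rows:
        result.setdefault(row["dnumber"], set()).add(row["dmgrssn"])
    return result


def update_manager(rows, dnumber, new_mgr_ssn, only_ssn=None):
    """Return updated copies of the rows with a new manager for `dnumber`.

    If `only_ssn` is given, only that employee's tuple is changed, which
    simulates an update that misses the other tuples of the department.
    """
    updated = []
    for row in rows:
        row = dict(row)  # work on a copy, leave the input untouched
        in_department = row["dnumber"] == dnumber
        selected = only_ssn is None or row["ssn"] == only_ssn
        if in_department and selected:
            row["dmgrssn"] = new_mgr_ssn
        updated.append(row)
    return updated


# Deletion anomaly: removing the last employee of department 5 loses the department.
after_delete = [row for row in EMP_DEPT if row["ssn"] != "333445555"]
print(departments(EMP_DEPT), departments(after_delete))

# Modification anomaly: a partial update leaves department 4 with two managers.
partial = update_manager(EMP_DEPT, 4, "999887777", only_ssn="999887777")
print(managers_by_department(partial)[4])
```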

Normalization

Designing a normalized database structure is the first step when building a database that is meant
to last. Normalization is a simple, commonsense process that leads to flexible, efficient,
maintainable database structures. We'll examine the major principles and objectives of
normalization and denormalization, and then take a look at some powerful optimization techniques
that can break the rules of normalization.
What is Normalization?

Simply put, normalization is a formal
process for determining which fields belong in which tables in a relational database.
Normalization follows a set of rules worked out at the time relational databases were born. A
normalized relational database provides several benefits:
• Elimination of redundant data storage.
• Close modeling of real world entities, processes, and their relationships.
• Structuring of data so that the model is flexible.
Normalization ensures that you get the benefits relational databases offer. Time spent
learning about normalization will begin paying for itself immediately.

Design versus Implementation

We now look at the tasks associated with designing and implementing a database.
Designing a database structure and implementing a database structure are different tasks. When
you design a structure it should be described without reference to the specific database
tool you will use to implement the system, or what concessions you plan to make for
performance reasons. These steps come later. After you've designed the database structure
abstractly, then you implement it in a particular environment (4D in our case). Too often people
new to database design combine design and implementation in one step. 4D makes this
tempting because the structure editor is so easy to use. Implementing a structure without
designing it quickly leads to flawed structures that are difficult and costly to modify. Design first,
implement second, and you'll finish faster and cheaper.

Normalized Design: Pros and Cons

Oh, now we‘ve implied that there are various advantages to producing a properly
normalized design before you implement your system. Let's look at a detailed list of the pros and
cons:

Pros of Normalizing:

• More efficient database structure.
• Better understanding of your data.
• More flexible database structure.
• Easier to maintain database structure.
• Few (if any) costly surprises down the road.
• Validates your common sense and intuition.
• Avoids redundant fields.
• Ensures that distinct tables exist when necessary.

Cons of Normalizing:

• You can't start building the database before you know what the user needs.
As the lists above show, the pros outweigh the cons.

Terminology

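Two terms are central to a discussion of normalization: "key" and "dependency". These are
probably familiar concepts to anyone who has built relational database systems, though they may
not use these words. A key is a column, or set of columns, whose values uniquely identify each row
of a table. A dependency (more precisely, a functional dependency) exists when the value of one
attribute determines the value of another. These definitions are the necessary background for the
discussion of normal forms that follows.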

Formal Definitions of the Normal Forms

1st Normal Form (1NF)


Def: A table (relation) is in 1NF if
• There are no duplicated rows in the table.
• Each cell is single-valued (i.e., there are no repeating groups or arrays).
• Entries in a column (attribute, field) are of the same kind.
Note: The order of the rows is immaterial; the order of the columns is immaterial. The requirement
that there be no duplicated rows in the table means that the table has a key (although the key
might be made up of more than one column, possibly even of all the columns).
Stated more formally:
A relation is in first normal form if and only if, in every legal value of that relation, every tuple
contains exactly one value for each attribute.

This definition merely states that relations are always in first normal form, which is correct by
definition. However, a relation that is only in first normal form has a structure that is
undesirable for a number of reasons.
In practice, first normal form (1NF) sets the very basic rules for an organized database:
• Eliminate duplicative columns from the same table.
• Create separate tables for each group of related data and identify each row with a unique
column or set of columns (the primary key).
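The single-valued-cell rule can be sketched with Python's built-in sqlite3 module. This is only an illustration: the person_phone table, its columns, and the sample rows are hypothetical and do not come from this text.

```python
import sqlite3

# Hypothetical un-normalized input: one cell holds several phone numbers.
raw_rows = [
    ("Alice", "555-0100, 555-0101"),
    ("Bob", "555-0200"),
]

conn = sqlite3.connect(":memory:")
# In 1NF every cell is single-valued and the table has a key;
# here the key is the column pair (name, phone).
conn.execute(
    "CREATE TABLE person_phone ("
    " name TEXT NOT NULL,"
    " phone TEXT NOT NULL,"
    " PRIMARY KEY (name, phone))"
)

# Split each repeating group into one row per value.
for name, phones in raw_rows:
    for phone in phones.split(","):
        conn.execute(
            "INSERT INTO person_phone (name, phone) VALUES (?, ?)",
            (name, phone.strip()),
        )

rows_1nf = conn.execute(
    "SELECT name, phone FROM person_phone ORDER BY name, phone"
).fetchall()
print(rows_1nf)
```

Each phone number now occupies its own row, so a query can filter on a single number without parsing a list stored in one cell.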
2nd Normal Form (2NF)
Def: A table is in 2NF if it is in 1NF and if all non-key attributes are dependent on the entire key.
Note: Since a partial dependency occurs when a non-key attribute is dependent on only a part of
the (composite) key, the definition of 2NF is sometimes phrased as, "A table is in
2NF if it is in 1NF and if it has no partial dependencies." The general requirements of 2NF are:
• Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
• Create relationships between these new tables and their predecessors through the use of
foreign keys.
These rules can be summarized in a simple statement: 2NF attempts to reduce the
amount of redundant data in a table by extracting it, placing it in new table(s) and
creating relationships between those tables.
Let's look at an example. Imagine an online store that maintains customer information in a
database. Their Customers table might look something like this:

A brief look at this table reveals a small amount of redundant data. We're storing the
"Sea Cliff, NY 11579" and "Miami, FL 33157" entries twice each. Now, that might not seem like too
much added storage in our simple example, but imagine the wasted space if we had thousands of
rows in our table. Additionally, if the ZIP code for Sea Cliff were to change, we'd need to make
that change in many places throughout the database.
In a 2NF-compliant database structure, this redundant information is extracted and stored in a
separate table. Our new table (let's call it ZIPs) might look like this:

If we want to be super-efficient, we can even fill this table in advance -- the post office provides a
directory of all valid ZIP codes and their city/state relationships. Surely, you've encountered
a situation where this type of database was utilized. Someone taking an order might have asked
you for your ZIP code first and then knew the city and state you were calling from. This type
of arrangement reduces operator error and increases efficiency.
Now that we've removed the duplicative data from the Customers table, we've satisfied the first
rule of second normal form. We still need to use a foreign key to tie the two tables together. We'll
use the ZIP code (the primary key from the ZIPs table) to create that relationship. Here's our new
Customers table:

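We've now minimized the amount of redundant information stored within the database, and our
structure is in second normal form. (Strictly speaking, if Customers is keyed by a single column
such as a customer number, the dependence of city and state on ZIP is a transitive dependency,
which the formal definitions assign to 3NF; the technique of extracting and linking is the same.)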
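The Customers/ZIPs split can be sketched as follows using Python's sqlite3 module. The two cities and ZIP codes come from the text; the column names (CustNum, Name) and the customer rows are hypothetical, since the original tables are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE ZIPs (ZIP TEXT PRIMARY KEY, City TEXT, State TEXT)")
conn.execute(
    "CREATE TABLE Customers ("
    " CustNum INTEGER PRIMARY KEY,"
    " Name TEXT,"
    " ZIP TEXT REFERENCES ZIPs(ZIP))"
)

# City and state are stored once per ZIP code.
conn.executemany(
    "INSERT INTO ZIPs VALUES (?, ?, ?)",
    [("11579", "Sea Cliff", "NY"), ("33157", "Miami", "FL")],
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [(1, "Customer A", "11579"), (2, "Customer B", "11579"),
     (3, "Customer C", "33157"), (4, "Customer D", "33157")],
)

# A join rebuilds the original wide view on demand.
joined = conn.execute(
    "SELECT c.CustNum, z.City, z.State, c.ZIP"
    " FROM Customers AS c JOIN ZIPs AS z ON z.ZIP = c.ZIP"
    " ORDER BY c.CustNum"
).fetchall()

# A ZIP that is missing from ZIPs is rejected by the foreign key.
try:
    conn.execute("INSERT INTO Customers VALUES (5, 'Customer E', '00000')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

If the city for a ZIP code ever changes, a single UPDATE on ZIPs corrects it for every customer at once.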
3rd Normal Form (3NF)
Def: A table is in 3NF if it is in 2NF and if it has no transitive dependencies. The basic requirements
of 3NF are as follows:
• Meet the requirements of 1NF and 2NF.
• Remove columns that are not fully dependent upon the primary key.
Imagine that we have a table of widget orders:

Remember, our first requirement is that the table must satisfy the requirements of 1NF and 2NF.
Are there any duplicative columns? No. Do we have a primary key? Yes, the order number.
Therefore, we satisfy the requirements of 1NF. Are there any subsets of data that apply to multiple
rows? No, so we also satisfy the requirements of 2NF.
Now, are all of the columns fully dependent upon the primary key? The customer number
varies with the order number and it doesn't appear to depend upon any of the other
fields. What about the unit price? This field could be dependent upon the customer number in a
situation where we charged each customer a set price. However, looking at the data above, it
appears we sometimes charge the same customer different prices. Therefore, the unit price is
fully dependent upon the order number. The quantity of items also varies from order to order, so
we're OK there.
What about the total? It looks like we might be in trouble here. The total can be derived by
multiplying the unit price by the quantity; therefore it's not fully dependent upon the primary key.
We must remove it from the table to comply with the third normal form:

Now our table is in 3NF.
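One way to keep the total available without storing it is to compute it in the query. The sketch below assumes an Orders table with hypothetical column names and sample values, since the original widget-orders table is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No Total column: it can always be derived from UnitPrice and Quantity.
conn.execute(
    "CREATE TABLE Orders ("
    " OrderNum INTEGER PRIMARY KEY,"
    " CustNum INTEGER,"
    " UnitPrice REAL,"
    " Quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?)",
    [(1, 241, 10.0, 2), (2, 842, 9.0, 20), (3, 241, 8.0, 5)],
)

# The total is calculated at query time, so it cannot drift out of
# step with the price or quantity it depends on.
order_totals = conn.execute(
    "SELECT OrderNum, UnitPrice * Quantity AS Total"
    " FROM Orders ORDER BY OrderNum"
).fetchall()
print(order_totals)
```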


Boyce-Codd Normal Form (BCNF)
Def: A relation is said to be in BCNF if and only if it is in 3NF and every non-trivial,
left-irreducible functional dependency has a candidate key as its determinant. In more informal
terms, a relation is in BCNF if it is in 3NF and the only determinants are the candidate keys.
4th Normal Form (4NF)
Def: A table is in 4NF if it is in BCNF and if it has no multi-valued dependencies.
5th Normal Form (5NF)
Def: A table is in 5NF, also called "Projection-Join Normal Form" (PJNF), if it is in 4NF and if every
join dependency in the table is a consequence of the candidate keys of the table.
Domain-Key Normal Form (DKNF)
Def: A table is in DKNF if every constraint on the table is a logical consequence of the definition of
keys and domains.

Steps to Normalize a Table

The following are the steps to normalize a table:

1. Eliminate Repeating Groups
Make a separate table for each set of related attributes, and give each table a primary key.
2. Eliminate Redundant Data
If an attribute depends on only part of a multi-valued (composite) key, remove it to a separate
table.
3. Eliminate Columns Not Dependent On Key
If attributes do not contribute to a description of the key, remove them to a separate table.
4. Isolate Independent Multiple Relationships
No table may contain two or more 1:n or n:m relationships that are not directly related.
5. Isolate Semantically Related Multiple Relationships
There may be practical constraints on information that justify separating logically related
many-to-many relationships.
6. Optimal Normal Form
A model limited to only simple (elemental) facts, as expressed in ORM.
7. Domain-Key Normal Form
A model free from all modification anomalies.

Understanding Database Instance

The term instance is typically used to describe a complete database environment, including the
RDBMS software, table structure, stored procedures and other functionality. It is most commonly
used when administrators describe multiple instances of the same database.
Example:
An organization with an employees database might have three different instances: production
(used to contain live data), pre-production (used to test new functionality prior to release into
production) and development (used by database developers to create new functionality).

Understanding Database Language

A database language is another important part of a DBMS. It is used to access the required data
in the database as well as to define the structure of the database. A user uses a database
language to interface with the DBMS. A user can be either an application programmer or an
end-user. For example, an application programmer may use COBOL, C++, Visual Basic or any
fourth-generation language (4GL).

Similarly, an end-user may use a database access language, which is also known as a query language.
Usually, the application programmer embeds statements of the database access language in a
program written in a general-purpose programming language. For this reason, the database access
language is also referred to as a data sub-language: it does not provide the complete features of
a programming language. Many DBMSs have their own unique sub-languages.
Users use the database access language to enter new data, change existing data and retrieve
required data. The user writes a set of appropriate commands or statements in the database access
language and submits these to the DBMS. The DBMS translates the user commands and sends them to a
specific part of the DBMS called the database engine (in Ms-Access, the Jet database engine). The
database engine generates a set of results according to the commands submitted by the user,
converts these into a user-readable form, sometimes called an inquiry report, and then displays
them on the screen. Administrators use the database access language to create and maintain
databases.
The most popular database access language is SQL (Structured Query Language). Relational
databases are required to have a database query language, and today most RDBMSs use SQL as their
database access language. Ms-Access also uses SQL to perform different operations on its
databases, although these operations are usually hidden from the users.

Explaining Database Security

What is Database Security?

Database security is the system, processes, and procedures that protect a database from
unintended activity. Unintended activity can be categorized as authenticated misuse, malicious
attacks or inadvertent mistakes made by authorized individuals or processes. Database security is
also a specialty within the broader discipline of computer security.
Traditionally, databases have been protected from external connections by firewalls or routers on
the network perimeter, with the database environment existing on the internal network as opposed
to being located within a demilitarized zone. Additional network security devices that detect and
alert on malicious database protocol traffic include network intrusion detection systems along
with host-based intrusion detection systems.
Database security has become more critical as networks have become more open.
Databases provide many layers and types of information security, typically specified in the data
dictionary, including:
• Access control
• Auditing
• Authentication
• Encryption
• Integrity controls

Discretionary Access Control

Discretionary access control verifies whether the user who is attempting to perform an operation
has been granted the required privileges to perform that operation. You can perform the following
types of discretionary access control:
• Create user roles to control which users can perform operations on which database objects.
• Control who is allowed to create databases.
• Prevent unauthorized users from registering user-defined routines.
• Control whether other users besides the DBSA are allowed to view executing SQL statements.

User Roles

A role is a work-task classification, such as payroll or payroll manager. Each defined role has
privileges on the database object granted to the role. You use the CREATE ROLE statement to
define a role.

Setting Permission to Create Databases

Use the DBCREATE_PERMISSION configuration parameter to give specified users permission to
create databases and thus prevent other users from creating databases.

Security for External Routines (UDRs)

External routines with shared libraries that are outside the database server can be security risks.
External routines include user-defined routines (UDRs) and the routines in DataBlade modules.

Enabling non-DBSAs to View SQL Statements a Session Is Executing

Mandatory Access

Mandatory Access Control (MAC) implementations in Relational Database Management Systems
(RDBMS) have focused solely on Multilevel Security (MLS). MLS has posed a number of challenging
problems to the database research community, and there has been an abundance of research work
to address those problems. Unfortunately, the use of MLS RDBMS has been restricted to a few
government organizations where MLS is of paramount importance such as the intelligence
community and the Department of Defense. The implication of this is that the investment of
building an MLS RDBMS cannot be leveraged to serve the needs of application domains where there
is a desire to control access to objects based on the label associated with that object and the label
associated with the subject accessing that object, but where the label access rules and the label
structure do not necessarily match the two MLS security rules and the MLS label structure. A more
flexible and generic implementation of MAC in an RDBMS can address the requirements of a variety
of application domains and allow the RDBMS to take part efficiently in an end-to-end MAC
enterprise solution. Such an implementation extends the SQL compiler component of the RDBMS to
incorporate the label access rules in the access plan it generates for an SQL query, and to
prevent the unauthorized leakage of data that could otherwise occur as a result of traditional
optimization techniques performed by SQL compilers.
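Label-based access checks can be illustrated with a small sketch. It assumes that the two MLS rules are the commonly cited "no read up" and "no write down" rules, which this text does not spell out; the level names, the rows, and the function names are all hypothetical.

```python
# Hypothetical ordered security levels; a higher number is more sensitive.
LEVELS = {"UNCLASSIFIED": 0, "CONFIDENTIAL": 1, "SECRET": 2, "TOP SECRET": 3}


def can_read(subject_label: str, object_label: str) -> bool:
    """No read up: a subject may read objects at or below its own level."""
    return LEVELS[subject_label] >= LEVELS[object_label]


def can_write(subject_label: str, object_label: str) -> bool:
    """No write down: a subject may write objects at or above its own level."""
    return LEVELS[subject_label] <= LEVELS[object_label]


# Hypothetical labeled rows: (row id, label attached to the row).
labeled_rows = [("r1", "UNCLASSIFIED"), ("r2", "SECRET"), ("r3", "TOP SECRET")]


def visible_rows(subject_label: str) -> list:
    """Return the ids of the rows the subject is allowed to read."""
    return [rid for rid, label in labeled_rows if can_read(subject_label, label)]


print(visible_rows("SECRET"))
```

A generic MAC implementation of the kind described above would let an administrator replace these two rules and this label structure with others, rather than hard-coding them.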

Statistical Databases

A statistical database is a database used for statistical analysis purposes. It is an OLAP rather
than an OLTP system, although the term predates that modern distinction, and classical statistical
databases are often closer to the relational model than to the multidimensional model commonly
used in OLAP systems today.
Statistical databases often incorporate support for advanced statistical analysis techniques, such as
correlations, which go beyond SQL. They also pose unique security concerns, which were the focus
of much research, particularly in the late 1970s and early to mid 1980s.

Security in Statistical Databases

In a statistical database, it is often desired to allow query access only to aggregate data, not
individual records. However, securing such a database is a difficult problem, since intelligent users
can use a combination of aggregate queries to derive information about a single individual.
Some common approaches are:
• Only allow aggregate queries (SUM, COUNT, AVG, STDEV, etc.).
• Rather than returning exact values for sensitive data like income, only return which partition
it belongs to (e.g. 35k-40k).
• Return imprecise counts (e.g. rather than reporting that 141 records met the query, only
indicate that 130-150 records met it).
• Don't allow overly selective WHERE clauses.
• Audit all user queries, so that users who misuse the system can be investigated.
• Use intelligent agents to detect inappropriate system use automatically.
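Two of these approaches, refusing overly selective queries and returning a partition instead of an exact value, might look roughly like this. The Staff table, the threshold of three rows, and the function names are assumptions made for the example.

```python
import sqlite3

MIN_QUERY_SET = 3  # hypothetical threshold: refuse aggregates over fewer rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (Name TEXT, Dept TEXT, Income INTEGER)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [("A", "Finance", 36000), ("B", "Finance", 41000),
     ("C", "Finance", 38500), ("D", "Marketing", 52000)],
)


def average_income(dept):
    """Return AVG(Income) for a department, or None if too few rows match."""
    count, avg = conn.execute(
        "SELECT COUNT(*), AVG(Income) FROM Staff WHERE Dept = ?", (dept,)
    ).fetchone()
    if count < MIN_QUERY_SET:
        return None  # the query is too selective, so the answer is withheld
    return avg


def income_partition(income, width=5000):
    """Report only the band an income falls in, e.g. 36000 gives '35k-40k'."""
    low = (income // width) * width
    return f"{low // 1000}k-{(low + width) // 1000}k"


print(average_income("Finance"), average_income("Marketing"))
print(income_partition(36000))
```

A size threshold alone does not stop a determined user from combining several permitted queries, which is why auditing appears in the list as well.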

Data Encryption

Encryption is the process of transforming information using an algorithm to make it unreadable to
anyone except those possessing special knowledge, usually referred to as a key. The result of the
process is encrypted information (in cryptography, referred to as ciphertext). In many contexts, the
word encryption also implicitly refers to the reverse process, decryption (e.g. "software for
encryption" can typically also perform decryption), to make the encrypted information readable
again (i.e. to make it unencrypted).
Encryption has long been used by militaries and governments to facilitate secret communication.
Encryption is now used in protecting information within many kinds of civilian systems, such as
computers, storage devices (e.g. USB flash drives), networks (e.g. the Internet and e-commerce),
mobile telephones, wireless microphones, wireless intercom systems, Bluetooth devices and bank
automatic teller machines. Encryption is also used in digital rights management to prevent
unauthorized use or reproduction of copyrighted material, and in software to protect against
reverse engineering.
Encryption, by itself, can protect the confidentiality of messages, but other techniques are still
needed to protect the integrity and authenticity of a message; for example, verification of a
message authentication code (MAC) or a digital signature. Standards and cryptographic software
and hardware to perform encryption are widely available, but successfully using encryption to
ensure security may be a challenging problem. A single slip-up in system design or execution can
allow successful attacks. Sometimes an adversary can obtain unencrypted information without
directly undoing the encryption.
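Verifying a message authentication code can be sketched with Python's standard hmac and hashlib modules. The key and the message below are hypothetical, and the sketch covers integrity and authenticity only; it does not encrypt anything.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag for the message under the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


key = b"hypothetical-shared-secret"
message = b"UPDATE PayRoll SET Salary = Salary * 1.1"

tag = make_tag(key, message)
ok = verify(key, message, tag)                  # untouched message
tampered_ok = verify(key, message + b"0", tag)  # message altered in transit
print(ok, tampered_ok)
```

The receiver accepts the message only when the recomputed tag matches, so any change to the message or use of a different key is detected.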

Chapter 6 : Writing Queries Using SQL

A Brief History of SQL

SQL is both deep and wide. It is deep in the sense that it is implemented at many levels of
database communication, from a simple Access form list box right up to high-volume communications
between mainframes. It is wide in the sense that almost every DBMS supports SQL statements for
communication. The reason for this level of acceptance is partially explained by the amount of
effort that went into the theory and development of the standards.

Current State

The ANSI-SQL group has published three standards over the years:
SQL89 (SQL1), SQL92 (SQL2) and SQL99 (SQL3).
The vast majority of the language has not changed through these updates. We can all
profit from the fact that almost all of the code we wrote to SQL standards of 1989 is still perfectly
usable. Or in other words, as a new student of SQL there is over ten years of SQL code out there
that needs your expertise to maintain and expand.
Most DBMS are designed to meet the SQL92 standard. Virtually all of the material in this book
was available in the earlier standards as well. Since many of the advanced features of
SQL92 have yet to be implemented by DBMS vendors, there has been little pressure for a new
version of the standard. Nevertheless a SQL99 standard was developed to address advanced
issues in SQL. All of the core functions of SQL, such as adding, reading and modifying data, are the
same. Therefore, the topics in this book are not affected by the new standard. As of early
2001, no vendor has implemented the SQL99 standard.
There are three areas of current development in SQL standards. The first entails improving
Internet access to data, particularly to meet the needs of the emerging XML standards. The second
is integration with Java, either through Sun's Java Database Connectivity (JDBC) or
through internal implementations. Last, the groups that establish SQL standards are considering
how to integrate object-based programming models.

SQL Data Definition Statements

Tablespace Creation

Use the CREATE TABLESPACE statement to create a tablespace, which is an allocation of space in
the database that can contain schema objects.
• A permanent tablespace contains persistent schema objects. Objects in permanent
tablespaces are stored in datafiles.
• An undo tablespace is a type of permanent tablespace used by Oracle Database to
manage undo data if you are running your database in automatic undo management mode.
Oracle strongly recommends that you use automatic undo management mode rather than
using rollback segments for undo.
• A temporary tablespace contains schema objects only for the duration of a session.
Objects in temporary tablespaces are stored in tempfiles.

When you create a tablespace, it is initially a read/write tablespace. You can subsequently use
the ALTER TABLESPACE statement to take the tablespace offline or online, add datafiles or
tempfiles to it, or make it a read-only tablespace.
You can also drop a tablespace from the database with the DROP TABLESPACE statement.
Syntax
[Syntax diagrams not reproduced: create_tablespace, permanent_tablespace_clause,
temporary_tablespace_clause, undo_tablespace_clause]

Semantics

BIGFILE | SMALLFILE

Use this clause to determine whether the tablespace is a bigfile or smallfile tablespace. This
clause overrides any default tablespace type setting for the database.
• A bigfile tablespace contains only one datafile or tempfile, which can contain up to
approximately 4 billion (2^32) blocks. The maximum size of the single datafile or tempfile is
128 terabytes (TB) for a tablespace with 32K blocks and 32TB for a tablespace with 8K
blocks.
• A smallfile tablespace is a traditional Oracle tablespace, which can contain 1022 datafiles
or tempfiles, each of which can contain up to approximately 4 million (2^22) blocks.
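The size limits quoted above follow from multiplying the block count by the block size. The short calculation below assumes binary units (1K = 1024 bytes) and takes "approximately 4 billion" and "approximately 4 million" as 2^32 and 2^22 blocks; the smallfile figure for 8K blocks is derived here and is not stated in the text.

```python
KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4

BIGFILE_MAX_BLOCKS = 2 ** 32    # "approximately 4 billion" blocks
SMALLFILE_MAX_BLOCKS = 2 ** 22  # "approximately 4 million" blocks


def max_file_bytes(max_blocks: int, block_size_bytes: int) -> int:
    """Upper bound on file size: block count times block size."""
    return max_blocks * block_size_bytes


bigfile_32k_tib = max_file_bytes(BIGFILE_MAX_BLOCKS, 32 * KIB) // TIB
bigfile_8k_tib = max_file_bytes(BIGFILE_MAX_BLOCKS, 8 * KIB) // TIB
smallfile_8k_gib = max_file_bytes(SMALLFILE_MAX_BLOCKS, 8 * KIB) // GIB

print(bigfile_32k_tib, bigfile_8k_tib, smallfile_8k_gib)  # 128 32 32
```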

If you omit this clause, then Oracle Database uses the current default tablespace type of
permanent or temporary tablespace set for the database. If you specify BIGFILE for a permanent
tablespace, then the database by default creates a locally managed tablespace with automatic
segment-space management.

Tablespace Management

Introduction to Tablespaces, Datafiles, and Control Files

Oracle Database stores data logically in tablespaces and physically in datafiles associated with
the corresponding tablespace. Figure A illustrates this relationship.

Figure A: Datafiles and Tablespaces (figure not reproduced)

Databases, tablespaces, and datafiles are closely related, but they have important differences:

• An Oracle database consists of at least two logical storage units called tablespaces, which
collectively store all of the database's data. The SYSTEM and SYSAUX tablespaces are
required; a third tablespace, called TEMP, is optional.
• Each tablespace in an Oracle database consists of one or more files called datafiles, which
are physical structures that conform to the operating system in which Oracle Database is
running.
• A database's data is collectively stored in the datafiles that constitute each tablespace of
the database. For example, the simplest Oracle database would have two tablespaces (SYSTEM
and SYSAUX), each with one datafile. Another database can have three tablespaces, each
consisting of two datafiles (for a total of six datafiles).

So What is SQL?

Structured Query Language, commonly abbreviated to SQL and pronounced "sequel", is not a
conventional computer programming language in the normal sense of the phrase. It allows
users to access data in relational database management systems. SQL is about data and
results: each SQL statement returns a result, whether that result is the answer to a query, an
update to a record or the creation of a database table. SQL is most often used to address a
relational database, which is what some people refer to as a SQL database. In brief, we can
describe SQL as follows:
• SQL stands for Structured Query Language
• SQL allows you to access a database
• SQL can execute queries against a database
• SQL can retrieve data from a database
• SQL can insert new records in a database
• SQL can delete records from a database
• SQL can update records in a database
• SQL is easy to learn

SQL Commands

There are three groups of commands in SQL:

• Data Definition
• Data Manipulation
• Transaction Control

Characteristics Of SQL Commands

SQL commands follow a number of basic rules:
• SQL keywords are not normally case sensitive, though in this tutorial all
commands (SELECT, UPDATE etc) are upper-cased.
• Variable and parameter names are displayed here in lower case.
• New-line characters are ignored in SQL, so a command may be all on one line or
broken up across a number of lines for the sake of clarity.
• Many DBMS systems expect SQL commands to be terminated with a semi-colon
character.
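These rules are easy to check against a real engine. The sketch below uses SQLite through Python's sqlite3 module as one example DBMS; other products may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Keyword case does not matter.
upper = conn.execute("SELECT 1 + 1;").fetchone()
lower = conn.execute("select 1 + 1;").fetchone()

# Line breaks inside a statement are ignored.
multiline = conn.execute(
    """
    SELECT
        1 + 1
    ;
    """
).fetchone()

print(upper, lower, multiline)
```

All three statements return the same single-row result.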

SQL Data Definition Language (DDL)

The Data Definition Language (DDL) part of SQL permits database tables to be created or deleted.
We can also define indexes (keys), specify links between tables, and impose constraints between
database tables.
The most important DDL statements in SQL are:

• CREATE TABLE - creates a new database table
• ALTER TABLE - alters (changes) a database table
• DROP TABLE - deletes a database table

How to Create Table


The SQL statement to create a table has the basic form:
CREATE TABLE name (col1 datatype, col2 datatype, …);

So, to create our User table we enter the following command:
CREATE TABLE User (FirstName TEXT, LastName TEXT, UserID TEXT, Dept TEXT,
EmpNo INTEGER, PCType TEXT);

The TEXT datatype, supported by many of the most common DBMS, specifies a string of characters
of any length. In practice there is often a default string length which varies by product. In some
DBMS TEXT is not supported, and instead a specific string length has to be declared. Strings with
a declared length are usually called CHAR(x) for fixed-length strings or VARCHAR(x) for
variable-length strings of up to x characters. In the case of INTEGER there are often multiple
flavors of integer available. Because larger integers require more bytes for data storage, the
choice of integer size is usually a design decision that ought to be made up front.
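The CREATE TABLE command for the User table can be tried out in an in-memory SQLite database from Python. This is a sketch using SQLite, which accepts TEXT; PRAGMA table_info is a SQLite-specific way of listing a table's columns and is not part of standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE User (FirstName TEXT, LastName TEXT, UserID TEXT,"
    " Dept TEXT, EmpNo INTEGER, PCType TEXT)"
)

# Each row of table_info is (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(User)")]
print(columns)
```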

How to Modify Table

Once a table is created its structure is not necessarily set in stone. Over time requirements
change and the structure of the database is likely to evolve to match them. SQL can be used to
change the structure of a table, so, for example, if we need to add a new field to our User table to
tell us if the user has Internet access, then we can execute an SQL ALTER TABLE command as
shown below:
ALTER TABLE User ADD COLUMN Internet BOOLEAN;

To delete a column the ADD keyword is replaced with DROP, so to delete the field we have just
added the SQL is:
ALTER TABLE User DROP COLUMN Internet;

How to Delete Table

If you have already executed the original CREATE TABLE command your database will already
contain a table called User, so let's get rid of that using the DROP command:
DROP TABLE User;
And now we'll recreate the User table we'll use throughout the rest of this tutorial:
CREATE TABLE User (FirstName VARCHAR(20), LastName VARCHAR(20), UserID
VARCHAR(12) UNIQUE, Dept VARCHAR(20), EmpNo INTEGER UNIQUE, PCType
VARCHAR(20));
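The sequence of adding a column, dropping the table and recreating it with UNIQUE constraints can be sketched with SQLite from Python. ALTER TABLE ... DROP COLUMN is left out of the executed code because support for it varies between products and versions; the second user inserted here, 'Jsmith', is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE User (FirstName TEXT, LastName TEXT, UserID TEXT,"
    " Dept TEXT, EmpNo INTEGER, PCType TEXT)"
)

# Add the Internet column to the existing table.
conn.execute("ALTER TABLE User ADD COLUMN Internet BOOLEAN")
cols_after_add = [row[1] for row in conn.execute("PRAGMA table_info(User)")]

# Remove the whole table, then recreate it with UNIQUE constraints.
conn.execute("DROP TABLE User")
conn.execute(
    "CREATE TABLE User (FirstName VARCHAR(20), LastName VARCHAR(20),"
    " UserID VARCHAR(12) UNIQUE, Dept VARCHAR(20),"
    " EmpNo INTEGER UNIQUE, PCType VARCHAR(20))"
)

conn.execute("INSERT INTO User (UserID, EmpNo) VALUES ('Jjones', 9)")
# A second row with the same EmpNo violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO User (UserID, EmpNo) VALUES ('Jsmith', 9)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```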

SQL Data Manipulation Language (DML)

SQL also includes syntax to query, update, insert, and delete records.
These query and update commands together form the Data Manipulation Language (DML)
part of SQL:

• INSERT INTO - inserts new data into a database table
• UPDATE - updates data in a database table
• DELETE - deletes data from a database table
• SELECT - extracts data from a database table

How to Insert Data


Having now built the structure of the database it is time to populate the tables with some data. In
the vast majority of desktop database applications data entry is performed via a user interface built
around some kind of GUI form. The form gives a representation of the information required for the
application, rather than providing a simple mapping onto the tables. So, in this sample application
you would imagine a form with text boxes for the user details, drop-down lists to select from
the PC table, drop-down selection of the software packages etc. In such a situation the
database user is shielded both from the underlying structure of the database and from the SQL
which may be used to enter data into it. However we are going to use the SQL directly to populate
the tables so that we can move on to the next stage of learning SQL.
The command to add new records to a table (usually referred to as an append query) is:
INSERT INTO target [(field1[, field2[, ...]])] VALUES (value1[, value2[, ...]]);
So, to add a User record for user Jim Jones, we would issue the following INSERT query:
INSERT INTO User (FirstName, LastName, UserID, Dept, EmpNo, PCType)
VALUES ('Jim', 'Jones', 'Jjones', 'Finance', 9, 'DellDimR450');

Obviously populating a database by issuing such a series of SQL commands is both tedious and
prone to error, which is another reason why database applications have front-ends. Even without
a specifically designed front-end, many database systems, including MS Access, allow data entry
directly into tables via a spreadsheet-like interface.
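When a front-end issues the INSERT on the user's behalf, it typically passes the values as parameters rather than building the SQL text by hand. A minimal sketch with Python's sqlite3 module, using the Jim Jones record from above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE User (FirstName VARCHAR(20), LastName VARCHAR(20),"
    " UserID VARCHAR(12) UNIQUE, Dept VARCHAR(20),"
    " EmpNo INTEGER UNIQUE, PCType VARCHAR(20))"
)

new_user = ("Jim", "Jones", "Jjones", "Finance", 9, "DellDimR450")
# The ? placeholders let the driver pass the values safely instead of
# pasting them into the SQL text.
conn.execute(
    "INSERT INTO User (FirstName, LastName, UserID, Dept, EmpNo, PCType)"
    " VALUES (?, ?, ?, ?, ?, ?)",
    new_user,
)
conn.commit()

stored = conn.execute(
    "SELECT FirstName, LastName, UserID, Dept, EmpNo, PCType"
    " FROM User WHERE EmpNo = ?",
    (9,),
).fetchone()
print(stored)
```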
The INSERT command can also be used to copy data from one table into another. For example, to
copy every record from a NewUsers table into the User table, the SQL query is:
INSERT INTO User (FirstName, LastName, UserID, Dept, EmpNo, PCType, Internet)
SELECT FirstName, LastName, UserID, Dept, EmpNo, PCType, Internet
FROM NewUsers;

How to Update Data

The INSERT command is used to add records to a table, but what if you need to make an
amendment to a particular record? The SQL command to perform updates is the
UPDATE command, with syntax:
UPDATE table SET newvalue WHERE criteria;
For example, let's assume that we want to move user Jim Jones from the Finance
department to Marketing. Our SQL statement would then be:
UPDATE User
SET Dept='Marketing' WHERE EmpNo=9;

Notice that we used the EmpNo field to set the criteria because we know it is unique. If we'd used
another field, for example LastName, we might have accidentally updated the records for any other
user with the same surname.
The UPDATE command can be used for more than just changing a single field or record at a time.
The SET keyword can be used to set new values for a number of different fields, so we
could have moved Jim Jones from Finance to Marketing and changed the PCType as well in the
same statement (SET Dept='Marketing', PCType='PrettyPC'). Or if all of the Finance
department were suddenly granted Internet access then we could have issued the following
SQL query:
UPDATE User
SET Internet=TRUE WHERE Dept='Finance';
You can also use the SET keyword to perform arithmetical or logical operations on the values. For
example if you have a table of salaries and you want to give everybody a 10% increase you can
issue the following command:
UPDATE PayRoll
SET Salary=Salary * 1.1;
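The three UPDATE statements above can be run against small sample tables to see how many rows each one touches. In this sketch the User and PayRoll tables are cut down to a few columns, and every row other than Jim Jones is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE User (LastName TEXT, Dept TEXT,"
    " EmpNo INTEGER UNIQUE, Internet BOOLEAN)"
)
conn.execute("CREATE TABLE PayRoll (EmpNo INTEGER, Salary REAL)")
conn.executemany(
    "INSERT INTO User VALUES (?, ?, ?, ?)",
    [("Jones", "Finance", 9, 0), ("Lee", "Finance", 10, 0), ("Ray", "Sales", 11, 0)],
)
conn.executemany("INSERT INTO PayRoll VALUES (?, ?)", [(9, 1000.0), (10, 2000.0)])

# Move one user, identified by the unique EmpNo field.
moved = conn.execute(
    "UPDATE User SET Dept = ? WHERE EmpNo = ?", ("Marketing", 9)
).rowcount

# Update every row that matches the criteria.
granted = conn.execute(
    "UPDATE User SET Internet = 1 WHERE Dept = ?", ("Finance",)
).rowcount

# With no WHERE clause, every row is updated.
conn.execute("UPDATE PayRoll SET Salary = Salary * 1.1")
conn.commit()

jim_dept = conn.execute("SELECT Dept FROM User WHERE EmpNo = 9").fetchone()[0]
salaries = [row[0] for row in conn.execute("SELECT Salary FROM PayRoll ORDER BY EmpNo")]
```

Because Jim Jones has already moved to Marketing, the second statement affects only the one user still in Finance.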

How to Delete Data

Now that we know how to add new records and to update existing records it only remains to learn
how to delete records before we move on to look at how we search through and collate data. As
you would expect SQL provides a simple command to delete complete records. The syntax of the
command is:
DELETE FROM table [WHERE <condition>];

Let's assume we have a user record for John Doe (with an employee number of 99), which we want
to remove from our User table. We could issue the following query:
DELETE FROM User
WHERE EmpNo=99;
In practice, delete operations are not usually handled by manually keying in SQL queries, but are
likely to be generated from a front-end system which will handle warnings and add safeguards
against accidental deletion of records.
Note that the DELETE query will delete an entire record or group of records. If you want to delete
a single field or group of fields without destroying that record, then use an UPDATE query
and set the fields to Null to overwrite the data that needs deleting. It is also worth noting that the
DELETE query does not do anything to the structure of the table itself; it deletes data only. To
delete a whole table, use a DROP TABLE query; to delete part of a table, such as a single column,
use the DROP clause of an ALTER TABLE query.
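The difference between deleting a record, clearing a field and removing the table itself can be seen in a small experiment. The sketch below again assumes Python's sqlite3 module and a made-up User table; the table_exists helper is a hypothetical convenience that looks the table up in SQLite's own catalogue (sqlite_master), which other database systems do not share.

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "User" (EmpNo INTEGER PRIMARY KEY, LastName TEXT, PCType TEXT);
    INSERT INTO "User" VALUES (98, 'Roe', 'OldPC');
    INSERT INTO "User" VALUES (99, 'Doe', 'OldPC');
""")

# DELETE removes the whole record for EmpNo 99.
conn.execute('DELETE FROM "User" WHERE EmpNo = ?', (99,))

# UPDATE ... SET field = NULL clears one field but keeps the record.
conn.execute('UPDATE "User" SET PCType = NULL WHERE EmpNo = ?', (98,))
conn.commit()

remaining = conn.execute('SELECT EmpNo, LastName, PCType FROM "User"').fetchall()


def table_exists(name):
    """Return True if a table with this name is listed in SQLite's catalogue."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


# DELETE left the table structure in place; DROP TABLE removes it.
exists_before_drop = table_exists("User")
conn.execute('DROP TABLE "User"')
exists_after_drop = table_exists("User")
```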

Transaction Control Language (TCL)

The SQL Data Control Language (DCL) provides security for your database. Its GRANT and REVOKE
statements enable you to determine whether a user can view, modify, add, or delete database
information. The COMMIT and ROLLBACK statements are sometimes listed with the DCL as well,
but they are more commonly classed as Transaction Control Language (TCL), as they are here.
Working With Transaction Control
Applications execute a SQL statement or group of logically related SQL statements to
perform a database transaction. The SQL statement or statements add, delete, or modify data in
the database.
Transactions are atomic and durable. To be considered atomic, a transaction must
successfully complete all of its statements; otherwise none of the statements take effect. To be
considered durable, a transaction's changes to a database must be permanent once committed.
Complete a transaction by using either the COMMIT or the ROLLBACK statement. COMMIT
makes permanent the changes to the database created by a transaction.
ROLLBACK restores the database to the state it was in before the transaction was performed.
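A short experiment makes the effect of COMMIT and ROLLBACK visible. The sketch below assumes Python's sqlite3 module, whose commit() and rollback() methods issue the corresponding SQL statements; the PayRoll layout, the figures and the NOT NULL rule used to force a failure are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PayRoll (EmpNo INTEGER PRIMARY KEY, Salary INTEGER NOT NULL)"
)
conn.execute("INSERT INTO PayRoll VALUES (1, 1000)")
conn.execute("INSERT INTO PayRoll VALUES (2, 2000)")
conn.commit()  # COMMIT: the two records are now permanent


def salaries():
    """Return all (EmpNo, Salary) pairs in EmpNo order."""
    return conn.execute(
        "SELECT EmpNo, Salary FROM PayRoll ORDER BY EmpNo"
    ).fetchall()


# A transaction that fails part-way through.
try:
    conn.execute("UPDATE PayRoll SET Salary = Salary + 500 WHERE EmpNo = 1")
    # The next statement breaks the NOT NULL rule and raises an error.
    conn.execute("UPDATE PayRoll SET Salary = NULL WHERE EmpNo = 2")
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()  # ROLLBACK: the first UPDATE is undone as well
after_rollback = salaries()

# A transaction in which every statement succeeds.
conn.execute("UPDATE PayRoll SET Salary = Salary + 500 WHERE EmpNo = 1")
conn.execute("UPDATE PayRoll SET Salary = Salary + 500 WHERE EmpNo = 2")
conn.commit()
after_commit = salaries()
```

After the failed transaction both salaries are unchanged, which is the atomic behaviour described above: either all of the statements take effect or none of them do.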
SQL Transaction Control Language Commands (TCL)
This section lists some SQL TCL commands that may be useful. Each
command's description is adapted from the SQL*Plus help. The descriptions are provided as is
and are most likely incomplete, so if you want more detail or other commands, please
use HELP in SQL*Plus directly.

COMMIT

PURPOSE:
To end your current transaction and make permanent all changes performed in the
transaction. This command also erases all savepoints in the transaction and releases the
transaction's locks. You can also use this command to manually commit an in-doubt
distributed transaction.
SYNTAX:

SQL>COMMIT;
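
The statement that COMMIT erases savepoints can be checked with a small sketch. It runs against SQLite through Python's sqlite3 module rather than Oracle SQL*Plus, so it shows the analogous behaviour only; the savepoint name before_raise and the PayRoll data are assumptions made up for the example.

```python
import sqlite3

# isolation_level=None puts the module in autocommit mode, so BEGIN,
# SAVEPOINT and COMMIT are issued only where they are written below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE PayRoll (EmpNo INTEGER PRIMARY KEY, Salary INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO PayRoll VALUES (1, 1000)")
conn.execute("SAVEPOINT before_raise")
conn.execute("UPDATE PayRoll SET Salary = 9999 WHERE EmpNo = 1")
conn.execute("ROLLBACK TO before_raise")  # undo only the UPDATE
conn.execute("COMMIT")                     # make the INSERT permanent

committed = conn.execute(
    "SELECT Salary FROM PayRoll WHERE EmpNo = 1"
).fetchone()[0]

# After COMMIT the savepoint no longer exists, so rolling back to it fails.
try:
    conn.execute("ROLLBACK TO before_raise")
    savepoint_survived = True
except sqlite3.OperationalError:
    savepoint_survived = False
```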
