
Unit-I

Database Management Systems

What is a Database?
A database is any collection of data.
A DBMS is a software system
designed to maintain a database.
We use a DBMS when:
there is a large amount of data
security and integrity of the data are important
many users access the data concurrently

Example Database Application
Consider a phone company, such as AT&T.
Kinds of information they deal with:
customer records
employee records
billing information
management records
switching and wiring diagrams
customer service orders

Concerns of a Database User
With all that data, AT&T must be concerned with questions such as:
Where is the information kept?
How is the data structured?
How is the data kept consistent?
How is the data described?
How is the data kept secure?
How do different pieces of data interrelate?

Why Use a DBMS?
Without a DBMS, we'd have:
users of the data → a collection of ad hoc programs (in C++, Java, PHP, etc.) → data stored as bits on disks, organized as files
There is no control or coordination of what these programs do with the data.

Why Use a DBMS?
With a DBMS, we have:
users of the data → applications → DBMS → data stored as bits on disks, organized as files
The DBMS provides control and coordination to protect the data.

Levels of Abstraction
[Figure: Users → View 1, View 2, View 3 → Conceptual Schema → Physical Schema → DB (sometimes called the ANSI/SPARC model)]
Views describe how users see the data.
Conceptual schema defines logical structure.
Physical schema describes the files and indexes used.

Example: University Database
Conceptual schema:
Students(sid: string, name: string, login: string, age: integer, gpa: real)
Courses(cid: string, cname: string, credits: integer)
Enrolled(sid: string, cid: string, grade: string)
External Schema (View):
Course_info(cid: string, enrollment: integer)
Physical schema:
Relations stored as unordered files.
Index on first column of Students.
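The university schema can be tried out directly. The following is a minimal sketch using Python's built-in sqlite3 module, assuming SQLite column types (TEXT, INTEGER, REAL) for the slide's string/integer/real, and assuming that Course_info.enrollment means the number of Enrolled rows per course; the index name students_sid and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database; table and column names follow the conceptual schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
    CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
    CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

    -- Physical-level decision from the slide: an index on the first column of Students.
    CREATE INDEX students_sid ON Students (sid);

    -- External schema (view). Assumption: enrollment = number of Enrolled rows per course.
    CREATE VIEW Course_info AS
        SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid;
""")

# Hypothetical sample enrollments, for illustration only.
conn.executemany(
    "INSERT INTO Enrolled VALUES (?, ?, ?)",
    [("s1", "CS101", "A"), ("s2", "CS101", "B"), ("s3", "CS202", "A")],
)

course_info = conn.execute(
    "SELECT cid, enrollment FROM Course_info ORDER BY cid"
).fetchall()
print(course_info)
```

Because Course_info is a view, it stores no data of its own: it is recomputed from Enrolled each time it is queried.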

Data Independence
Applications are insulated from how data is structured and stored.
[Figure: View 1, View 2, View 3 → Conceptual Schema → Physical Schema → DB]
Logical data independence: protection from changes in logical structure of data.
Physical data independence: protection from changes in physical structure of data.

Queries, Query Plans, and Operators
Three example queries:
    SELECT eid, ename, title FROM Emp E WHERE E.sal > $50K
    SELECT E.loc, AVG(E.sal) FROM Emp E GROUP BY E.loc HAVING Count(*) > 5
    SELECT COUNT DISTINCT (E.eid) FROM Emp E, Proj P, Asgn A
        WHERE E.eid = A.eid AND P.pid = A.pid AND E.loc <> P.loc
[Figure: the corresponding query plans. First query: Select over Emp. Second query: Group(agg) and Having over Emp. Third query: two Joins combining Emp, Asgn and Proj, topped by Count distinct. Emp = Employees, Proj = Projects, Asgn = Assignments.]
The system handles query plan generation and optimization, and ensures correct execution.
Issues: view reconciliation, operator ordering, physical operator choice, memory management, access path (index) use, ...
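To make the first and third queries concrete, here is a minimal sketch with Python's sqlite3 module, assuming hypothetical column layouts Emp(eid, ename, title, sal, loc), Proj(pid, loc) and Asgn(eid, pid), hypothetical rows, and $50K written as 50000. The SQL states only what is wanted; the order in which SQLite performs the joins is left to its optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns and rows; only the names used in the queries come from the slide.
conn.executescript("""
    CREATE TABLE Emp  (eid INTEGER, ename TEXT, title TEXT, sal INTEGER, loc TEXT);
    CREATE TABLE Proj (pid INTEGER, loc TEXT);
    CREATE TABLE Asgn (eid INTEGER, pid INTEGER);
    INSERT INTO Emp VALUES (1, 'Ann', 'Engineer', 60000, 'NY'),
                           (2, 'Bob', 'Analyst',  40000, 'SF'),
                           (3, 'Cy',  'Engineer', 70000, 'NY');
    INSERT INTO Proj VALUES (10, 'NY'), (20, 'SF');
    INSERT INTO Asgn VALUES (1, 10), (1, 20), (2, 20), (3, 20);
""")

# First query: a single Select operator over Emp.
well_paid = conn.execute(
    "SELECT eid, ename, title FROM Emp E WHERE E.sal > 50000 ORDER BY eid"
).fetchall()

# Third query: two joins feeding a COUNT DISTINCT; the optimizer picks the join order.
remote = conn.execute("""
    SELECT COUNT(DISTINCT E.eid)
    FROM Emp E, Proj P, Asgn A
    WHERE E.eid = A.eid AND P.pid = A.pid AND E.loc <> P.loc
""").fetchone()[0]

print(well_paid)  # employees earning more than 50000
print(remote)     # employees assigned to a project in another location
```

SQLite's EXPLAIN QUERY PLAN prefix shows the plan it actually chose; its output format differs between SQLite versions, so it is not reproduced here.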

Levels of Abstraction
Categories of data models
One fundamental characteristic of the database approach is that it provides some level of data abstraction.
High-level or conceptual data models:
Provide concepts that are close to the way many users perceive data.
Low-level or physical data models:
Provide concepts that describe the details of how data is stored in the computer.

Conceptual data models
They use concepts such as entities, attributes and relationships.
An entity represents a real-world object or concept, such as an employee or a project.
An attribute represents some property of interest that further describes an entity, such as an employee's name or salary.
A relationship represents an association among two or more entities.

Example of a Relation

Schemas and Database State
In any data model, it is important to distinguish between the description of the data and the database itself.
The description of the database is called the database schema.
A displayed schema is called a schema diagram.

University Database

Example of a Database Schema

Example of a Database Schema

Schemas and Database State
The data in the database at a particular moment in time is called a database state.
The distinction between database schema and database state is very important.
When we define a new database, we specify only its database schema to the DBMS.
At this point, the corresponding database state is the empty state, with no data.
We get the initial state of the database when the database is first loaded with data.
From then on, every time an update operation is applied to the database, we get another database state.

Schemas and Database State
Valid state: a state that satisfies the structure and constraints specified in the schema.
The database schema changes very infrequently.
The database state changes every time the database is updated.
Schema is also called intension.
State is also called extension.

Three-Schema Architecture
Defines DBMS schemas at three levels:
Internal schema at the internal level, to describe physical storage structures and access paths (e.g., indexes).
Conceptual schema at the conceptual level, to describe the structure and constraints for the whole database for a community of users.
External schemas at the external level, to describe the various user views.

The three-schema architecture
[Figure: External level: user/application views, defined by users or application programmers in consultation with the DBA. Conceptual level: defined by the DBA. Internal level: defined by the DBA for optimization.]

DBMS Languages
The first step in creating a database through a DBMS is to specify the conceptual and internal schemas for the database.
Data Definition Language (DDL): used by database designers to define schemas.
Data Manipulation Language (DML): used to access and manipulate the data.
View Definition Language (VDL): used to specify user views.
In current DBMSs, the preceding types of languages are usually not considered distinct languages.

Data Definition Language (DDL)
Specification notation for defining the database schema.
The DDL compiler generates a set of tables stored in a data dictionary.
The data dictionary contains metadata (data about data).
Data storage and definition language: a special type of DDL in which the storage structure and access methods used by the database system are specified.
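As a minimal sketch of the data dictionary idea, SQLite (used here through Python's sqlite3 module) records every schema object in a built-in catalog table, sqlite_master, and exposes column metadata through PRAGMA table_info. The Employee table is borrowed from the conceptual schema example in this unit; nothing here is specific to other DBMSs, whose catalogs have different names and layouts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A DDL statement: it defines structure and stores no data rows.
conn.execute("CREATE TABLE Employee (Name VARCHAR(25), Salary REAL, Dept_Name VARCHAR(10))")

# SQLite keeps the schema (metadata) in its built-in catalog table, sqlite_master.
catalog = conn.execute("SELECT type, name FROM sqlite_master").fetchall()
# PRAGMA table_info returns one row per column: (cid, name, type, notnull, dflt_value, pk).
columns = [row[1] for row in conn.execute("PRAGMA table_info(Employee)")]
row_count = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]

print(catalog)    # the table is described in the dictionary...
print(columns)
print(row_count)  # ...but holds no data yet
```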

Data Manipulation Language (DML)
Language for accessing and manipulating the data organized by the appropriate data model.
Two classes of languages:
Procedural: the user specifies what data is required and how to get those data.
Nonprocedural: the user specifies what data is required without specifying how to get those data.
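The difference between the two classes can be sketched in Python (an illustration, not a DBMS interface): the loop spells out how to find the rows, while the SQL statement, run through the standard sqlite3 module, only states which rows are wanted. The employee rows are hypothetical.

```python
import sqlite3

# Hypothetical rows: (name, salary, department).
rows = [("Ann", 60000.0, "Sales"), ("Bob", 40000.0, "HR"), ("Cy", 70000.0, "Sales")]

# Procedural: spell out HOW to get the data (iterate, test, collect).
procedural = []
for name, salary, dept in rows:
    if dept == "Sales" and salary > 50000:
        procedural.append(name)

# Nonprocedural (declarative): state WHAT is wanted; the DBMS decides how to find it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, Salary REAL, Dept_Name TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", rows)
declarative = [
    name
    for (name,) in conn.execute(
        "SELECT Name FROM Employee WHERE Dept_Name = 'Sales' AND Salary > 50000 ORDER BY Name"
    )
]

print(procedural, declarative)
```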

ANSI/SPARC Architecture
ANSI: American National Standards Institute
SPARC: Standards Planning and Requirements Committee
1975: proposed a framework for DBs
A three-level architecture:
Internal level: for systems designers
Conceptual level: for database designers and administrators
External level: for database users

Internal Level
Deals with the physical storage of data
Structure of records on disk: files, pages, blocks
Indexes and ordering of records
Used by database system programmers

Internal Schema
RECORD EMP
    LENGTH=44
    HEADER: BYTE(5)     OFFSET=0
    NAME: BYTE(25)      OFFSET=5
    SALARY: FULLWORD    OFFSET=30
    DEPT: BYTE(10)      OFFSET=34
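The internal schema is essentially a byte layout. A minimal Python sketch with the standard struct module can reproduce the stated length and offsets, assuming big-endian byte order with no padding, ASCII text padded with NUL bytes, and FULLWORD read as a 4-byte signed integer (the IBM mainframe convention); the pack_emp/unpack_emp helpers and the sample values are hypothetical.

```python
import struct

# Assumed encoding: '>' = big-endian, no padding; BYTE(n) -> 'ns' (n raw bytes);
# FULLWORD -> 'i' (4-byte signed integer, as on IBM mainframes).
FIELDS = [("HEADER", "5s"), ("NAME", "25s"), ("SALARY", "i"), ("DEPT", "10s")]
EMP_FORMAT = ">" + "".join(code for _, code in FIELDS)

# Each field's offset is the total size of the fields before it.
offsets = {
    name: struct.calcsize(">" + "".join(code for _, code in FIELDS[:i]))
    for i, (name, _) in enumerate(FIELDS)
}
record_length = struct.calcsize(EMP_FORMAT)


def pack_emp(header: bytes, name: str, salary: int, dept: str) -> bytes:
    """Build one EMP record; struct pads short byte strings with NUL bytes."""
    return struct.pack(EMP_FORMAT, header, name.encode("ascii"), salary, dept.encode("ascii"))


def unpack_emp(record: bytes):
    """Split a record back into its fields, stripping the NUL padding."""
    header, name, salary, dept = struct.unpack(EMP_FORMAT, record)
    return header, name.rstrip(b"\0").decode("ascii"), salary, dept.rstrip(b"\0").decode("ascii")


rec = pack_emp(b"H0001", "Smith", 50000, "Sales")
print(record_length, offsets)
print(unpack_emp(rec))
```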

Conceptual Level
Deals with the organisation of the data as a whole
Abstractions are used to remove unnecessary details of the internal level
Used by DBAs and application programmers

Conceptual Schema
CREATE TABLE Employee (
    Name VARCHAR(25),
    Salary REAL,
    Dept_Name VARCHAR(10))

External Level
Provides a view of the database tailored to a user
Parts of the data may be hidden
Data is presented in a useful form
Used by end users and application programmers

External Schemas
Payroll:
String Name
double Salary
Personnel:
char *Name
char *Department
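In SQL, external schemas like these are commonly realized as views. A minimal sketch with Python's sqlite3 module, assuming the Employee table from the conceptual schema and hypothetical rows; the view names mirror the slide, Dept_Name is renamed to Department in the Personnel view, and the host-language types (String, double, char *) are left to each application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Name VARCHAR(25), Salary REAL, Dept_Name VARCHAR(10));
    INSERT INTO Employee VALUES ('Smith', 50000, 'Sales'), ('Jones', 42000, 'HR');

    -- External schemas as views: each exposes only the columns its users need.
    CREATE VIEW Payroll   AS SELECT Name, Salary FROM Employee;
    CREATE VIEW Personnel AS SELECT Name, Dept_Name AS Department FROM Employee;
""")

payroll = conn.execute("SELECT * FROM Payroll ORDER BY Name").fetchall()
personnel = conn.execute("SELECT * FROM Personnel ORDER BY Name").fetchall()
print(payroll)    # Payroll users never see departments
print(personnel)  # Personnel users never see salaries
```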

Mappings
Mappings translate information from one level to the next:
External/Conceptual
Conceptual/Internal
These mappings provide data independence.
Physical data independence: changes to the internal level shouldn't affect the conceptual level.
Logical data independence: changes to the conceptual level shouldn't affect the external levels.
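The two kinds of independence can be observed in a small experiment. This is a sketch with Python's sqlite3 module, assuming the Employee table and a Personnel view like the ones above (the index name emp_dept and the Phone column are hypothetical): an application query written against the view returns the same result after an internal-level change (a new index) and a conceptual-level change (a new column).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Name VARCHAR(25), Salary REAL, Dept_Name VARCHAR(10));
    INSERT INTO Employee VALUES ('Smith', 50000, 'Sales'), ('Jones', 42000, 'HR');
    CREATE VIEW Personnel AS SELECT Name, Dept_Name AS Department FROM Employee;
""")

# The "application": a query written only against the external view.
APP_QUERY = "SELECT Name, Department FROM Personnel ORDER BY Name"
before = conn.execute(APP_QUERY).fetchall()

# Internal-level change: add an access path (index). Conceptual schema unchanged.
conn.execute("CREATE INDEX emp_dept ON Employee (Dept_Name)")
# Conceptual-level change: add a new attribute to Employee.
conn.execute("ALTER TABLE Employee ADD COLUMN Phone TEXT")

after = conn.execute(APP_QUERY).fetchall()
print(before == after)  # the application sees no difference
```

Not every conceptual change can be absorbed this way: removing Dept_Name, for instance, would leave the Personnel view with nothing to show as Department.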

ANSI/SPARC Architecture
[Figure: Users 1, 2 and 3 → External Views 1 and 2 (external schemas) → External/Conceptual mappings → Conceptual View (conceptual schema) → Conceptual/Internal mapping → Internal Schema → Stored Data; the DBA is also shown.]

Typical DBMS Component Modules

Interfacing Components of DBMS
users
software
hardware
data

DBMS Roles
[Figure: DBMS roles. Components shown: application program(s), query processor, security manager, concurrency manager, index manager, data definition processor, data dictionary, data. Roles shown: users of the data, application developers, DBMS system developers, database designer, system administrator (and DB designer).]

DBMS Roles
Actors on the Scene (people interested in the actual data):
database administrators
database designers
systems analysts and application programmers
end users

Actors on the Scene


Database Administrators
acquiring a DBMS
managing the system
acquiring HW and SW to support the
DBMS
authorizing access (security policies)
managing staff, including DB designers

Actors on the Scene


Database Designers
identifying the information of interest in the Universe of Discourse (UoD)
designing the database conceptual schema
designing views for particular users
designing the physical data layout and logical schema
adjusting data parameters for performance

Actors on the Scene


Systems Analysts and Application Programmers (generic database developers)
provide specialized knowledge to
optimize database usage
provide generic (canned) application
programs

Actors on the Scene


End Users
casual users: ad hoc queries
naive or parametric users: canned queries, such as menus for a phone company customer service agent
sophisticated users: people who understand the system and the data and use it in many novel ways
standalone users: people who use personal, easy-to-use databases for personal data

DBMS Roles
Actors Behind the Scene: people who maintain the environment but aren't interested in the actual data
DBMS designers and implementers
tools developers
operators and maintenance personnel
database researchers

Actors Behind the Scene


DBMS designers and implementers
work for the company that supplies the DBMS (e.g., Microsoft, Oracle, Sybase, MySQL)
programmers and engineers
design and implement the DBMS

Actors Behind the Scene


Tools Developers
design and implement DBMS add-ons or
plug-ins
may work for DBMS supplier or be
independent
kinds of tools: database design aids,
performance monitoring tools, user and
designer interfaces

Actors Behind the Scene


Operators and maintenance personnel
run and maintain the computer environment in which a DBMS operates
probably work for the database administrator (DBA)

Actors Behind the Scene


Database Researchers
academic or industrial researchers
develop new theory, new designs, new
data models and new algorithms to
improve future database management
systems

Software
The DBMS software controls the organization, storage, management, and retrieval of data in a database.
This component also includes the operating system, network software, and the application programs.
At the lowest level, software consists of a machine language specific to an individual processor, but it is usually written in a high-level programming language that is more efficient for humans to use.

Hardware
Hardware encompasses the physical interconnections and devices required to store and execute (or run) the software.
The hardware of a system can range from a PC to a network of computers.
It also includes various storage devices, such as hard disks, and input and output devices, such as monitors and printers.

DATA
Data stored in a database includes numerical data, such as whole numbers and floating-point numbers, and non-numerical data, such as characters, dates, or logical data.
More advanced systems may include more complicated data entities, such as pictures and images, as data types.

Some other components of DBMS

User Interface
Data Manager
File Manager
Disk Manager
Physical Database

User Interface
The user interface is the aggregate of means by which people (the users) interact with a system: a particular machine, device, computer program or other complex tool.
The user interface provides the means of:
- Input, allowing the users to manipulate the system.
- Output, allowing the system to produce the effects of the users' manipulation.
It refers to the graphical, textual and auditory information the program presents to the user, and the control sequences the user employs to control the program.

Data Manager
It is a program that allows you to process and manipulate your data in an easy and logical manner using a graphical interface.
Data Manager reads and writes delimited files, such as comma-separated values (CSV) files, and can also read data from ODBC data sources.
It allows you to construct a conceptual design of how you are going to process your data and transform it into another form.
You form your design by adding functional nodes and linking them so that the links form the data flow through the nodes on a graphical work area.
Each node performs a single function on your data; once it completes, it passes your data to the node it is linked to, and the process continues until the data reaches an output node.
You can form a simple design or a complicated design with hundreds of nodes and multiple input and output nodes.

File Manager
A file manager or file browser is a computer program that provides a user interface to work with file systems.
File managers are very useful for speeding up interaction with files.
The most common operations on files are create, open, edit, view, print, play, rename, move, copy, delete, attributes, properties, search/find, and permissions.
File managers may contain features inspired by web browsers, including forward and back navigational buttons.
File managers also provide the ability to extend operations using user-written scripts.
In the DBMS, the file manager passes requests to the disk manager.

Disk Manager
Disk Manager is a simple filesystem configurator that allows you to:
- Automatically detect new partitions at startup.
- Fully manage the configuration of the filesystem.
Disk Manager logs every change you make to the filesystem configuration.
Disk managers may also be responsible for:
explaining hardware concepts
documenting switches of many of the existing disks
putting into place custom software drivers, notably those related to maximum disk or partition size
providing testing and informational utilities

Interaction of DBMS components
[Figure: DBMS user interface → Data Manager → File Manager → Disk Manager → Physical Database, with the Transaction Manager and the Recovery Manager alongside.]

Explanation of interactions
The user requests specific information through the user interface.
This request is processed by the data manager, which, after processing, requests specific records from the file manager.
The file manager then requests the specific block from the disk manager.
The disk manager retrieves the block and sends it to the file manager, which sends the required record to the data manager.
The transaction manager supervises the data transactions carried out between the data manager, the file manager, and the disk manager.
The recovery manager keeps a check on the transacted data so that, in case of system failure, the data can be protected.
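The request path just described can be sketched as three cooperating Python classes. This is a simplified, hypothetical model (the class names, BLOCK_SIZE and the in-memory blocks are assumptions, not a real DBMS API), and the transaction and recovery managers are omitted.

```python
BLOCK_SIZE = 2  # hypothetical: number of records per disk block


class DiskManager:
    """Reads whole blocks from the (simulated) physical database."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.reads = 0

    def read_block(self, block_no):
        self.reads += 1
        return self.blocks[block_no]


class FileManager:
    """Finds the block holding a record and asks the disk manager for it."""

    def __init__(self, disk):
        self.disk = disk

    def get_record(self, record_no):
        block = self.disk.read_block(record_no // BLOCK_SIZE)
        return block[record_no % BLOCK_SIZE]


class DataManager:
    """Turns a user request into record requests to the file manager."""

    def __init__(self, files):
        self.files = files

    def lookup(self, record_nos):
        return [self.files.get_record(n) for n in record_nos]


disk = DiskManager([["Ann", "Bob"], ["Cy", "Dee"]])
data_manager = DataManager(FileManager(disk))
result = data_manager.lookup([0, 3])  # the "user request" arriving from the interface
print(result, disk.reads)
```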

Advantages of Using a DBMS
[Figure: users of the data → application program(s) → DBMS (query processor, security manager, concurrency manager, index manager, data definition processor, data dictionary) → data]
Software operating between the data and the applications can provide many capabilities in a generic way.

Persistence
A DBMS provides
persistent objects, types and data structures
persistent = having a lifetime longer than
the programs that use the data
any information that fits the data model
of a particular DBMS
can be made persistent with little effort
data model = concepts that can be used to
describe the data

Concurrency
A DBMS supports access by concurrent users
concurrent = happening at the same time
concurrent access, particularly writes (data changes), can result in inconsistent states (even when the individual operations are correct)
the DBMS can check the actual operations of
concurrent users, to prevent activity that will lead to
inconsistent states
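A classic example of such an inconsistent state is the lost update. The following Python sketch simulates it deterministically, assuming a hypothetical one-item database holding a balance of 1000 and two transactions that each deposit 100; each transaction's read and write are correct in isolation, but the interleaved schedule loses one deposit.

```python
def run_schedule(schedule, start=1000):
    """Execute (transaction, operation, amount) steps against a one-item database."""
    db = {"balance": start}
    read_value = {}  # what each transaction last read (its private working copy)
    for txn, op, amount in schedule:
        if op == "read":
            read_value[txn] = db["balance"]
        elif op == "write":
            db["balance"] = read_value[txn] + amount
    return db["balance"]


# T1 and T2 each deposit 100. Both reading before either writes loses T1's update.
interleaved = [("T1", "read", 0), ("T2", "read", 0), ("T1", "write", 100), ("T2", "write", 100)]
serial = [("T1", "read", 0), ("T1", "write", 100), ("T2", "read", 0), ("T2", "write", 100)]

print(run_schedule(interleaved))  # one deposit is lost
print(run_schedule(serial))       # both deposits applied
```

A DBMS's concurrency control prevents schedules like the first one, for example by making T2 wait until T1 has finished.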

Access Control
A DBMS can restrict access to
authorized users
security policies often require control
that is more fine-grained than that
provided by a file system
since the DBMS understands the data
structure, it can enforce fairly
sophisticated and detailed security
policies

Redundancy Control
A DBMS can assist in controlling redundancy
redundancy = multiple copies of the same data
with file storage, it's often convenient to store
multiple copies of the same data, so that it's "local"
to other data and applications
this can cause many problems:
wasted disk space
inconsistencies
need to enter the data multiple times

Complex Semantics
A DBMS supports representation
of complex relationships and integrity
constraints
the semantics (meaning) of an application often
includes many relationships and rules
about the relative values of subsets of the data
these further restrict the possible instances of the
database
relationships and constraints can be defined as part of
the schema
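Such rules can be declared in the schema itself. A minimal sketch with Python's sqlite3 module, assuming hypothetical Dept and Emp tables: a CHECK constraint, a PRIMARY KEY and a FOREIGN KEY reject rows that would produce an invalid state. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key checks off by default
conn.executescript("""
    CREATE TABLE Dept (dname TEXT PRIMARY KEY);
    CREATE TABLE Emp (
        eid   INTEGER PRIMARY KEY,                  -- no two employees share an eid
        sal   REAL CHECK (sal >= 0),                -- rule on the value of an attribute
        dname TEXT NOT NULL REFERENCES Dept (dname) -- every employee belongs to an existing department
    );
    INSERT INTO Dept VALUES ('Sales');
    INSERT INTO Emp VALUES (1, 50000, 'Sales');
""")


def try_insert(row):
    """Attempt one insert; report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO Emp VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


candidates = [
    (2, 42000, "Sales"),  # valid
    (3, -5, "Sales"),     # violates CHECK
    (4, 30000, "Toys"),   # no such department: violates FOREIGN KEY
    (1, 10, "Sales"),     # duplicate eid: violates PRIMARY KEY
]
results = [try_insert(row) for row in candidates]
print(results)
```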

Backup and Recovery
A DBMS can provide backup and recovery
backup = snapshots of the data at particular times
recovery = restoring the data to a consistent state after a system crash
the higher-level semantics (relationships and constraints) can make it difficult to restore a consistent state
transaction analysis can allow a DBMS to reconstruct a consistent state from a number of backups
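A minimal sketch of both ideas with Python's sqlite3 module, assuming a hypothetical Account table: Connection.backup copies a snapshot into a second database, and a transaction that fails halfway is rolled back, so the database returns to its last consistent state instead of keeping half of a transfer. This is a simplified stand-in: real crash recovery also relies on the DBMS's logs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 500), ("B", 100)])
conn.commit()

# Backup: copy a snapshot of the whole database into a second (in-memory) database.
snapshot = sqlite3.connect(":memory:")
conn.backup(snapshot)

# A transfer of 200 from A to B that fails after the debit but before the credit.
try:
    with conn:  # commits if the block succeeds, rolls back if an exception escapes
        conn.execute("UPDATE Account SET balance = balance - 200 WHERE id = 'A'")
        raise RuntimeError("simulated crash before crediting B")
except RuntimeError:
    pass

balances = conn.execute("SELECT id, balance FROM Account ORDER BY id").fetchall()
backup_rows = snapshot.execute("SELECT id, balance FROM Account ORDER BY id").fetchall()
print(balances, backup_rows)
```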

Views and Interfaces


A DBMS can support
multiple user interfaces and user views
since the DBMS provides a well-defined data model
and a persistent data dictionary, many different
interfaces can be developed to access the same data
data independence ensures that these UIs will not be
made invalid by most changes to the data
new user views can be supported as new schemas
defined against the conceptual schema

DBMS Structure
[Figure: users of the data → application program(s) (external/application view) → DBMS software components (internal/implementation view): query processor, security manager, concurrency manager, index manager, data definition processor, data dictionary → data; the data description is also shown.]

DBMS Languages
DML: data manipulation language
QL: query language
GPL: general purpose languages
DDL: data definition language
System configuration languages
[Figure: the DBMS component diagram (application program(s), query processor, security manager, concurrency manager, index manager, data definition processor, data dictionary, data), annotated with the languages above.]

Data Independence
physical data independence
conceptual and external schema are defined
in terms of the data model,
rather than the actual data layout
ensures that conceptual and external schemas
are not affected by changes to the physical data
layout

logical data independence


ensures that changes to the conceptual schema
don't affect the external views
(this is not always achievable)

Disadvantages of DBMS
The disadvantages of the database approach are summarized as follows:
1. Complexity: The provision of the functionality that is expected of a good DBMS makes the DBMS an extremely complex piece of software. Database designers, developers, database administrators and end users must understand this functionality to take full advantage of it. Failure to understand the system can lead to bad design decisions, which can have serious consequences for an organization.
2. Size: The complexity and breadth of functionality make the DBMS an extremely large piece of software, occupying many megabytes of disk space and requiring substantial amounts of memory to run efficiently.
3. Performance: Typically, a file-based system is written for a specific application, such as invoicing. As a result, ...
4. Higher impact of a failure: The centralization of resources increases the vulnerability of the system. Since all users and applications rely on the availability of the DBMS, the failure of any component can bring operations to a halt.
5. Cost of DBMS: The cost of a DBMS varies significantly, depending on the environment and functionality provided. There is also a recurrent annual maintenance cost.
6. Additional hardware costs: The disk storage requirements for the DBMS and the database may necessitate the purchase of additional storage space. Furthermore, to achieve the required performance, it may be necessary to purchase a larger machine, perhaps even a machine dedicated to running the DBMS. The procurement of additional hardware results in further expenditure.

Data Associations
Entities, Attributes and Relations
A database can be modeled as:
a collection of entities,
relationships among entities.
An entity is an object that exists and is distinguishable from other objects.
Example: specific person, company, event, plant
Entities have attributes.
Example: people have names and addresses
An entity set is a set of entities of the same type that share the same properties.
Example: set of all persons, companies, trees, holidays

Copyright @ www.bcanotes.com

Entity Sets customer and loan
[Figure: entity set customer with attributes customer-id, customer-name, customer-street, customer-city; entity set loan with attributes loan-number, amount]

Attributes
An entity is represented by a set of attributes, that is, descriptive properties possessed by all members of an entity set.
Example:
customer = (customer-id, customer-name, customer-street, customer-city)
loan = (loan-number, amount)

Domain: the set of permitted values for each attribute
Attribute types:
Simple and composite attributes
Single-valued and multivalued attributes
E.g. multivalued attribute: phone-numbers
Derived attributes
Can be computed from other attributes
E.g. age, given date of birth
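The attribute types can be mirrored in a Python data class. This is a sketch under assumptions (the class and field names are hypothetical, and list[str] needs Python 3.9 or later): Address is a composite attribute, phone_numbers is multivalued, and age is derived from date_of_birth on demand instead of being stored.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Address:
    """Composite attribute: built from smaller component attributes."""
    street: str
    city: str


@dataclass
class Customer:
    customer_id: str                 # simple, single-valued
    name: str                        # simple, single-valued
    address: Address                 # composite
    date_of_birth: date              # stored
    phone_numbers: list[str] = field(default_factory=list)  # multivalued

    def age(self, on: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        had_birthday = (on.month, on.day) >= (self.date_of_birth.month, self.date_of_birth.day)
        return on.year - self.date_of_birth.year - (0 if had_birthday else 1)


hayes = Customer("C-1", "Hayes", Address("Main", "Harrison"), date(1990, 6, 15),
                 ["555-1234", "555-9876"])
print(hayes.age(on=date(2024, 6, 14)))  # the day before the birthday
```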


Composite Attributes


Relationship Sets
A relationship is an association among several entities
Example:
Hayes (customer entity) - depositor (relationship set) - A-102 (account entity)
Example:
(Hayes, A-102) ∈ depositor

Relationship Set borrower


Relationship Sets (Cont.)
An attribute can also be a property of a relationship set.
For instance, the depositor relationship set between entity sets customer and account may have the attribute access-date.


Degree of a Relationship Set
Refers to the number of entity sets that participate in a relationship set.
Relationship sets that involve two entity sets are binary (or degree two). Generally, most relationship sets in a database system are binary.
Relationship sets may involve more than two entity sets. E.g., suppose employees of a bank may have jobs (responsibilities) at multiple branches, with different jobs at different branches. Then there is a ternary relationship set between entity sets employee, job and branch.
Relationships between more than two entity sets are rare. Most relationships are binary.


Mapping Cardinalities
Express the number of entities to which another
entity can be associated via a relationship set.
Most useful in describing binary relationship sets.
For a binary relationship set the mapping
cardinality must be one of the following types:

One to one
One to many
Many to one
Many to many
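Given the pairs actually stored in a relationship set, the mapping cardinality they exhibit can be classified mechanically. A minimal Python sketch (the function name is hypothetical); note that a real cardinality constraint is a rule about all legal instances, so checking one instance only shows which category that data is consistent with.

```python
def mapping_cardinality(pairs):
    """Classify a binary relationship set R between A and B from its (a, b) pairs."""
    b_per_a, a_per_b = {}, {}
    for a, b in pairs:
        b_per_a.setdefault(a, set()).add(b)
        a_per_b.setdefault(b, set()).add(a)
    each_a_one_b = all(len(bs) <= 1 for bs in b_per_a.values())  # a relates to at most one b
    each_b_one_a = all(len(as_) <= 1 for as_ in a_per_b.values())  # b relates to at most one a
    if each_a_one_b and each_b_one_a:
        return "one to one"
    if each_b_one_a:
        return "one to many"   # an a may have many b's; each b has at most one a
    if each_a_one_b:
        return "many to one"   # a b may have many a's; each a has at most one b
    return "many to many"


# A = customers, B = loans (hypothetical identifiers).
print(mapping_cardinality([("c1", "L1"), ("c1", "L2"), ("c2", "L3")]))
```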


Mapping Cardinalities

One to one
One to many
Note: Some elements in A and B may not be mapped to any elements in the other set

Mapping Cardinalities ...

Many to one

Many to many

Note: Some elements in A and B may not be mapped to any elements in the other set

Mapping Cardinalities affect ER Design
Can make access-date an attribute of account, instead of a relationship attribute, if each account can have only one customer
I.e., the relationship from account to customer is many to one, or equivalently, customer to account is one to many


E-R Diagrams

Rectangles represent entity sets.


Diamonds represent relationship sets.
Lines link attributes to entity sets and entity sets to relationship sets.

Ellipses represent attributes


Double ellipses represent multivalued attributes.

Dashed ellipses denote derived attributes.


Underline indicates primary key attributes (will study later)

E-R Diagram With Composite, Multivalued, and Derived Attributes

Relationship Sets with Attributes

Roles
Entity sets of a relationship need not be distinct.
The labels "manager" and "worker" are called roles; they specify how employee entities interact via the works-for relationship set.
Roles are indicated in E-R diagrams by labeling the lines that connect diamonds to rectangles.
Role labels are optional, and are used to clarify the semantics of the relationship.

Cardinality Constraints
We express cardinality constraints by drawing either a directed line (->), signifying "one", or an undirected line (-), signifying "many", between the relationship set and the entity set.

E.g.: One-to-one relationship:
A customer is associated with at most one loan via the relationship borrower
A loan is associated with at most one customer via borrower

One-To-Many Relationship
In the one-to-many relationship, a loan is associated with at most one customer via borrower, and a customer is associated with several (including 0) loans via borrower

Many-To-One Relationships
In a many-to-one relationship, a loan is associated with several (including 0) customers via borrower, and a customer is associated with at most one loan via borrower

Many-To-Many Relationship
A customer is associated with several (possibly 0) loans via borrower
A loan is associated with several (possibly 0) customers via borrower

Participation of an Entity Set in a Relationship Set
Total participation (indicated by double line): every entity in the entity set participates in at least one relationship in the relationship set
E.g. participation of loan in borrower is total: every loan must have a customer associated to it via borrower
Partial participation: some entities may not participate in any relationship in the relationship set
E.g. participation of customer in borrower is partial

Keys
A super key of an entity set is a set of one or more attributes whose values uniquely determine each entity.
A candidate key of an entity set is a minimal super key
customer-id is a candidate key of customer
account-number is a candidate key of account
Although several candidate keys may exist, one of the candidate keys is selected to be the primary key.
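The definitions translate directly into checks over a set of rows. A minimal Python sketch with hypothetical customer data: a super key is tested by looking for two rows that agree on the chosen attributes, and a candidate key must additionally have no smaller non-empty subset that is also a super key. One instance can refute a key but never prove it, so the checks below only describe the sample data.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs (this instance only)."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_candidate_key(rows, attrs):
    """A super key with no non-empty proper subset that is also a super key."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


# Hypothetical instance of the customer entity set.
customers = [
    {"customer-id": "C1", "customer-name": "Hayes", "customer-city": "Harrison"},
    {"customer-id": "C2", "customer-name": "Jones", "customer-city": "Harrison"},
    {"customer-id": "C3", "customer-name": "Hayes", "customer-city": "Rye"},
]
print(is_candidate_key(customers, ("customer-id",)))                  # minimal
print(is_candidate_key(customers, ("customer-id", "customer-name")))  # super key, not minimal
```

With this sample, (customer-name, customer-city) also passes as a candidate key only because no two sample customers happen to share both values; that is why keys are chosen from the meaning of the data rather than its current contents.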

Keys for Relationship Sets
The combination of primary keys of the participating entity sets forms a super key of a relationship set.
(customer-id, account-number) is the super key of depositor
NOTE: this means a pair of entities can have at most one relationship in a particular relationship set.
E.g., if we wish to track all access-dates to each account by each customer, we cannot assume a relationship for each access. We can use a multivalued attribute, though.
Must consider the mapping cardinality of the relationship set when deciding what the candidate keys are
Need to consider the semantics of the relationship set in selecting the primary key in case of more than one candidate key

E-R Diagram with a Ternary Relationship


Cardinality Constraints on Ternary Relationship
We allow at most one arrow out of a ternary (or greater degree) relationship to indicate a cardinality constraint.
E.g., an arrow from works-on to job indicates each employee works on at most one job at any branch.
If there is more than one arrow, there are two ways of defining the meaning.
E.g., a ternary relationship R between A, B and C with arrows to B and C could mean
1. each A entity is associated with a unique entity from B and C, or
2. each pair of entities from (A, B) is associated with a unique C entity, and each pair (A, C) is associated with a unique B
Each alternative has been used in different formalisms.
To avoid confusion, we outlaw more than one arrow.

14.4 DATABASE MODELS
A database model defines the logical design of data. The model also describes the relationships between different parts of the data. In the history of database design, three models have been in use: the hierarchical model, the network model and the relational model.
1. File-Based Systems or Primitive Data Models: Entities or objects of interest are represented by records that are stored together in files. Relationships between objects are represented by using directories of various kinds.
2. Traditional Data Models: The most commonly used traditional models are the hierarchical, network and relational data models.
3. Semantic Data Models: this type of model was influenced by the semantic networks developed by ...
Hierarchical database model
In the hierarchical model, data is organized as an inverted tree. Each entity has only one parent but can have several children. At the top of the hierarchy, there is one entity, which is called the root.

Figure 14.3 An example of the hierarchical model representing a university

Network database model
In the network model, the entities are organized in a graph, in which some entities can be accessed through several paths (Figure 14.4).

Figure 14.4 An example of the network model representing a university

Relational database model
In the relational model, data is organized in two-dimensional tables called relations. The tables or relations are, however, related to each other.

Figure 14.5 An example of the relational model representing a university

14.5 THE RELATIONAL DATABASE MODEL

In the relational database management system (RDBMS), the data is represented as a set of relations.
The entity-relationship model is a generalization of the other two commercial models (hierarchical and network). It allows the representation of explicit constraints as well as relationships. This model is basically useful in the design and communication of the logical database. In this model, objects of similar structure are collected into an entity set. The relationship between entity sets is represented by a named E-R relationship and is 1:1, 1:M or M:N, mapping from one entity set to another. The database structure, employing the E-R model, is usually shown pictorially using entity-relationship (E-R) diagrams.

Relations

A relation appears as a two-dimensional table. The RDBMS organizes the data so that its external view is a set of relations or tables. This does not mean that data is stored as tables: the physical storage of the data is independent of the way in which the data is logically organized.

Figure 14.6 An example of a relation


A relation in an RDBMS has the following features:
Name. Each relation in a relational database should have a name that is unique among other relations.
Attributes. Each column in a relation is called an attribute. The attributes are the column headings in the table in Figure 14.6.
Tuples. Each row in a relation is called a tuple. A tuple defines a collection of attribute values. The total number of rows in a relation is called the cardinality of the relation. Note that the cardinality of a relation changes when tuples are added or deleted. This makes the database dynamic.

14.6 OPERATIONS ON RELATIONS

In a relational database we can define several operations to create new relations based on existing ones. We define nine operations in this section: insert, delete, update, select, project, join, union, intersection and difference. Instead of discussing these operations in the abstract, we describe each operation as defined in the database query language SQL (Structured Query Language).

Structured Query Language


Structured Query Language (SQL) is the language
standardized by the American National Standards Institute
(ANSI) and the International Organization for
Standardization (ISO) for use on relational databases. It is
a declarative rather than a procedural language, which
means that users declare what they want without having to
write a step-by-step procedure. SQL was first implemented
commercially in 1979 by Relational Software, Inc. (later
Oracle Corporation), and various versions of SQL have been
released since then.
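The difference between declarative and procedural can be seen by asking the same question both ways. In this sketch (hypothetical `courses` data, Python's built-in `sqlite3` module), the loop spells out how to find the titles of 5-unit courses, while the SQL statement only states what is wanted and leaves the search strategy to the DBMS.

```python
import sqlite3

rows = [("CIS15", "Intro to C", 5), ("CIS17", "Intro to Java", 5), ("CIS19", "UNIX", 4)]

# Procedural: we describe how to find the answer, step by step.
procedural = []
for course_no, title, units in rows:
    if units == 5:
        procedural.append(title)

# Declarative: we state what we want; the DBMS decides how to get it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (course_no TEXT, title TEXT, units INTEGER)")
conn.executemany("INSERT INTO courses VALUES (?, ?, ?)", rows)
declarative = [title for (title,) in conn.execute(
    "SELECT title FROM courses WHERE units = 5 ORDER BY course_no")]

print(procedural == declarative)  # True
```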

Insert
The insert operation is a unary operation; that is, it is
applied to a single relation. The operation inserts a new
tuple into the relation. The insert operation uses the
following format:

insert into RELATION-NAME
values ( ..., ..., ... )

Figure 14.7 An example of an insert operation
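A minimal sketch of the insert operation, run through Python's built-in `sqlite3` module on a hypothetical `courses` relation. The `?` placeholders are the `sqlite3` module's way of passing values separately from the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (course_no TEXT PRIMARY KEY, title TEXT, units INTEGER)")

# INSERT adds exactly one new tuple to the relation.
conn.execute("INSERT INTO courses VALUES (?, ?, ?)", ("CIS52", "TCP/IP Protocols", 6))

rows = conn.execute("SELECT * FROM courses").fetchall()
print(rows)  # [('CIS52', 'TCP/IP Protocols', 6)]
```

Because `course_no` is declared as the primary key here, inserting a second tuple with the same `course_no` would raise `sqlite3.IntegrityError`.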


Delete
The delete operation is also a unary operation. The operation
deletes from the relation the tuples that satisfy a criterion. The
delete operation uses the following format:

delete from RELATION-NAME
where criteria

Figure 14.8 An example of a delete operation
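The sketch below (hypothetical `courses` data, Python's `sqlite3` module) shows that the `WHERE` clause is the criterion: every tuple that satisfies it is removed, and the cursor's `rowcount` reports how many were deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (course_no TEXT PRIMARY KEY, title TEXT, units INTEGER)")
conn.executemany("INSERT INTO courses VALUES (?, ?, ?)",
                 [("CIS15", "Intro to C", 5), ("CIS19", "UNIX", 4)])

# Every tuple satisfying the criterion is deleted.
cur = conn.execute("DELETE FROM courses WHERE course_no = ?", ("CIS19",))

remaining = conn.execute("SELECT * FROM courses").fetchall()
print(cur.rowcount)  # 1
print(remaining)     # [('CIS15', 'Intro to C', 5)]
```

Omitting the `WHERE` clause makes the criterion true for every tuple, so all tuples are deleted.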


Update

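The update operation is also a unary operation that is applied
to a single relation. The operation changes the values of some
attributes of the tuples that satisfy a criterion. The update
operation uses the following format:

update RELATION-NAME
set attribute1 = value1, attribute2 = value2, ...
where criteria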

Figure 14.9 An example of an update operation
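A minimal update sketch on a hypothetical `courses` relation, using Python's `sqlite3` module: `SET` names the attributes to change and `WHERE` chooses which tuples change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (course_no TEXT PRIMARY KEY, title TEXT, units INTEGER)")
conn.executemany("INSERT INTO courses VALUES (?, ?, ?)",
                 [("CIS15", "Intro to C", 5), ("CIS19", "UNIX", 4)])

# Only the 'units' attribute of the matching tuple is changed.
conn.execute("UPDATE courses SET units = ? WHERE course_no = ?", (6, "CIS15"))

rows = conn.execute("SELECT * FROM courses ORDER BY course_no").fetchall()
print(rows)  # [('CIS15', 'Intro to C', 6), ('CIS19', 'UNIX', 4)]
```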


Select
The select operation is a unary operation. The tuples (rows)
in the resulting relation are a subset of the tuples in the
original relation.

Figure 14.10 An example of a select operation
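In SQL the keyword `SELECT` is used for several relational operations; the relational select operation corresponds to `SELECT *` with a `WHERE` criterion, which keeps every attribute but only some tuples. A sketch on hypothetical `courses` data with Python's `sqlite3` module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (course_no TEXT PRIMARY KEY, title TEXT, units INTEGER)")
conn.executemany("INSERT INTO courses VALUES (?, ?, ?)",
                 [("CIS15", "Intro to C", 5), ("CIS17", "Intro to Java", 5), ("CIS19", "UNIX", 4)])

# All attributes, but only the tuples that satisfy the criterion.
rows = conn.execute("SELECT * FROM courses WHERE units = 5 ORDER BY course_no").fetchall()
print(rows)  # [('CIS15', 'Intro to C', 5), ('CIS17', 'Intro to Java', 5)]
```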


Project
The project operation is also a unary operation and creates
another relation. The attributes (columns) in the resulting
relation are a subset of the attributes in the original relation.

Figure 14.11 An example of a project operation
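Project is the column-wise counterpart of select: every tuple is kept, but only some attributes. Since a relation is a set, duplicate tuples in the result should collapse into one; SQL does this only when `DISTINCT` is written. A sketch on hypothetical `courses` data with Python's `sqlite3` module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (course_no TEXT PRIMARY KEY, title TEXT, units INTEGER)")
conn.executemany("INSERT INTO courses VALUES (?, ?, ?)",
                 [("CIS15", "Intro to C", 5), ("CIS17", "Intro to Java", 5), ("CIS19", "UNIX", 4)])

# Project onto (title, units): all tuples, a subset of the attributes.
title_units = conn.execute("SELECT title, units FROM courses ORDER BY course_no").fetchall()

# Project onto (units) alone: DISTINCT removes the duplicate 5.
unit_values = conn.execute("SELECT DISTINCT units FROM courses ORDER BY units").fetchall()

print(title_units)  # [('Intro to C', 5), ('Intro to Java', 5), ('UNIX', 4)]
print(unit_values)  # [(4,), (5,)]
```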


Join
The join operation is a binary operation that combines two
relations on common attributes.

Figure 14.12 An example of a join operation
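A minimal join sketch using Python's `sqlite3` module. The relations `courses` and `teaches` and their rows are hypothetical; they share the attribute `course_no`, and each result tuple combines a `teaches` tuple with the `courses` tuple that has the same `course_no`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (course_no TEXT PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE teaches (professor TEXT, course_no TEXT)")
conn.executemany("INSERT INTO courses VALUES (?, ?)",
                 [("CIS15", "Intro to C"), ("CIS19", "UNIX")])
conn.executemany("INSERT INTO teaches VALUES (?, ?)",
                 [("Smith", "CIS15"), ("Lee", "CIS19"), ("Lee", "CIS15")])

# Combine the two relations on their common attribute, course_no.
rows = conn.execute("""
    SELECT t.professor, c.course_no, c.title
    FROM teaches AS t JOIN courses AS c ON t.course_no = c.course_no
    ORDER BY t.professor, c.course_no
""").fetchall()

print(rows)
# [('Lee', 'CIS15', 'Intro to C'), ('Lee', 'CIS19', 'UNIX'), ('Smith', 'CIS15', 'Intro to C')]
```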

Union

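The union operation takes two relations with the same set of
attributes and creates a new relation that contains every tuple
found in either relation; a tuple present in both appears only once.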

Figure 14.13 An example of a union operation
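A sketch of union with Python's `sqlite3` module. The two relations `cis_courses` and `cs_courses` are hypothetical and have the same attributes; SQL's `UNION` keeps each tuple once (`UNION ALL` would keep duplicates). The result is sorted in Python only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cis_courses (course_no TEXT, title TEXT)")
conn.execute("CREATE TABLE cs_courses (course_no TEXT, title TEXT)")
conn.executemany("INSERT INTO cis_courses VALUES (?, ?)",
                 [("CIS15", "Intro to C"), ("CIS19", "UNIX")])
conn.executemany("INSERT INTO cs_courses VALUES (?, ?)",
                 [("CS10", "Algorithms"), ("CIS19", "UNIX")])

# Tuples in either relation; ('CIS19', 'UNIX') is in both but appears once.
rows = sorted(conn.execute(
    "SELECT * FROM cis_courses UNION SELECT * FROM cs_courses").fetchall())

print(rows)  # [('CIS15', 'Intro to C'), ('CIS19', 'UNIX'), ('CS10', 'Algorithms')]
```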


Intersection

The intersection operation takes two relations with the same
set of attributes and creates a new relation, which is the
intersection of the two: it contains only the tuples that appear
in both relations.

Figure 14.14 An example of an intersection operation
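The same two hypothetical relations can illustrate intersection; SQL expresses it with `INTERSECT`. A sketch using Python's `sqlite3` module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cis_courses (course_no TEXT, title TEXT)")
conn.execute("CREATE TABLE cs_courses (course_no TEXT, title TEXT)")
conn.executemany("INSERT INTO cis_courses VALUES (?, ?)",
                 [("CIS15", "Intro to C"), ("CIS19", "UNIX")])
conn.executemany("INSERT INTO cs_courses VALUES (?, ?)",
                 [("CS10", "Algorithms"), ("CIS19", "UNIX")])

# Only the tuples present in both relations.
rows = conn.execute(
    "SELECT * FROM cis_courses INTERSECT SELECT * FROM cs_courses").fetchall()

print(rows)  # [('CIS19', 'UNIX')]
```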


Difference

The difference operation is applied to two relations with the
same attributes. The tuples in the resulting relation are those
that are in the first relation but not in the second.

Figure 14.15 An example of a difference operation
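SQL writes difference as `EXCEPT`. Unlike union and intersection, difference is not symmetric, so swapping the two relations changes the result. A sketch on the same kind of hypothetical relations, using Python's `sqlite3` module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cis_courses (course_no TEXT, title TEXT)")
conn.execute("CREATE TABLE cs_courses (course_no TEXT, title TEXT)")
conn.executemany("INSERT INTO cis_courses VALUES (?, ?)",
                 [("CIS15", "Intro to C"), ("CIS19", "UNIX")])
conn.executemany("INSERT INTO cs_courses VALUES (?, ?)",
                 [("CS10", "Algorithms"), ("CIS19", "UNIX")])

# Tuples in the first relation but not in the second.
cis_minus_cs = conn.execute(
    "SELECT * FROM cis_courses EXCEPT SELECT * FROM cs_courses").fetchall()
cs_minus_cis = conn.execute(
    "SELECT * FROM cs_courses EXCEPT SELECT * FROM cis_courses").fetchall()

print(cis_minus_cs)  # [('CIS15', 'Intro to C')]
print(cs_minus_cis)  # [('CS10', 'Algorithms')]
```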


14.7 DATABASE DESIGN

The design of any database is a lengthy and involved
task that can only be done through a step-by-step
process. The first step normally involves interviewing
potential users of the database. The second step is to
build an entity-relationship model (ERM) that defines
the entities, the attributes of those entities and the
relationships between those entities.


Entity-relationship models (ERM)

In this step, the database designer creates an entity-relationship
(E-R) diagram to show the entities for which information needs to
be stored and the relationships between those entities. E-R
diagrams use several geometric shapes, but we use only a few of
them here:
Rectangles represent entity sets
Ellipses represent attributes
Diamonds represent relationship sets
Lines link attributes to entity sets and link entity sets to
relationship sets

Example 14.1
Figure 14.16 shows a very simple E-R diagram with three entity
sets, their attributes and the relationship between the entity sets.

Figure 14.16 Entities, attributes and relationships in an E-R diagram



From E-R diagrams to relations

After the E-R diagram has been finalized, relations (tables)
in the relational database can be created.

Relations for entity sets


For each entity set in the E-R diagram, we create a relation
(table) in which there are n columns related to the n
attributes defined for that set.


Example 14.2
We can have three relations (tables), one for each entity set
defined in Figure 14.16, as shown in Figure 14.17.

Figure 14.17 Relations for entity set in Figure 14.16



Relations for relationship sets


For each relationship set in the E-R diagram, we create a
relation (table). This relation has one column for the key of
each entity set involved in this relationship and also one
column for each attribute of the relationship itself if the
relationship has attributes (not in our case).
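The two mapping rules (one table per entity set, one table per relationship set holding the keys of the participating entity sets) can be sketched as a small generator of `CREATE TABLE` statements, run here with Python's `sqlite3` module. The entity sets `student`, `professor` and `course`, their key and attribute names, and the use of `TEXT` for every column are hypothetical assumptions; only the relationship names `teaches` and `takes` come from Example 14.3. Using the pair of keys as the relationship table's primary key assumes an M:N relationship.

```python
import sqlite3

# Hypothetical E-R description: entity set -> (key attribute, other attributes).
entity_sets = {
    "student": ("sid", ["name"]),
    "professor": ("pid", ["name"]),
    "course": ("cno", ["title"]),
}
# Relationship set -> the entity sets it connects (no attributes of its own here).
relationship_sets = {
    "teaches": ["professor", "course"],
    "takes": ["student", "course"],
}

def to_ddl(entity_sets, relationship_sets):
    """Return CREATE TABLE statements following the two mapping rules."""
    stmts = []
    for name, (key, attrs) in entity_sets.items():
        # One column per attribute of the entity set; the key is the primary key.
        cols = [f"{key} TEXT PRIMARY KEY"] + [f"{a} TEXT" for a in attrs]
        stmts.append(f"CREATE TABLE {name} ({', '.join(cols)})")
    for name, members in relationship_sets.items():
        # One column for the key of each participating entity set.
        keys = [entity_sets[m][0] for m in members]
        cols = [f"{k} TEXT REFERENCES {m}({k})" for k, m in zip(keys, members)]
        # Composite primary key: assumes the relationship is M:N.
        cols.append(f"PRIMARY KEY ({', '.join(keys)})")
        stmts.append(f"CREATE TABLE {name} ({', '.join(cols)})")
    return stmts

conn = sqlite3.connect(":memory:")
for stmt in to_ddl(entity_sets, relationship_sets):
    conn.execute(stmt)

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
takes_cols = [c[1] for c in conn.execute("PRAGMA table_info(takes)")]
print(tables)      # ['course', 'professor', 'student', 'takes', 'teaches']
print(takes_cols)  # ['sid', 'cno']
```

If a relationship set had attributes of its own, they would become extra columns of its table, as the rule above states.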


Example 14.3
There are two relationship sets in Figure 14.16, teaches and takes,
each connected to two entity sets. The relations for these
relationship sets are added to the previous relations for the entity
sets and shown in Figure 14.18.

Figure 14.18 Relations for E-R diagram in Figure 14.16



Normalization

Normalization is the process by which a given set of
relations is transformed into a new set of relations with a
more solid structure. Normalization is needed to allow any
relation in the database to be represented, to allow a
language like SQL to use powerful retrieval operations
composed of atomic operations, to remove anomalies in
insertion, deletion and updating, and to reduce the need for
restructuring the database as new data types are added.
The normalization process defines a set of hierarchical
normal forms (NFs). Several normal forms have been
proposed, including 1NF, 2NF, 3NF, BCNF (Boyce-Codd
Normal Form), 4NF, PJNF (Projection-Join Normal Form),
5NF and so on.

First normal form (1NF)


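When we transform entities or relationships into tabular
relations, there may be some relations in which there is
more than one value at the intersection of a row and a column.
A relation is in first normal form when every such
intersection holds a single, atomic value.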

Figure 14.19 An example of 1NF
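One common way to reach 1NF is to repeat the other attribute values once for each value in a multi-valued cell. The sketch below applies that idea in plain Python to a hypothetical student relation whose `courses` cell holds a list; it illustrates the principle and does not reproduce Figure 14.19.

```python
# Hypothetical unnormalized relation: the third cell holds several course numbers.
unnormalized = [
    ("S1", "Alice", ["CIS15", "CIS19"]),
    ("S2", "Bob", ["CIS17"]),
]

def to_1nf(rows):
    """Emit one tuple per course so that every cell holds a single value."""
    return [(sid, name, course) for sid, name, courses in rows for course in courses]

print(to_1nf(unnormalized))
# [('S1', 'Alice', 'CIS15'), ('S1', 'Alice', 'CIS19'), ('S2', 'Bob', 'CIS17')]
```

The repeated `('S1', 'Alice')` values in the result are exactly the redundancy that the higher normal forms go on to remove.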


Second normal form (2NF)


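In each relation we need to have a key (called a primary key)
on which all other attributes (column values) depend.
For example, if the ID of a student is given, it should be
possible to find the student's name. When the key is made up
of several attributes, every other attribute must depend on
the whole key, not on just part of it.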

Figure 14.20 An example of 2NF
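The sketch below, in plain Python, tests whether one attribute depends on a set of key attributes in a hypothetical relation `(sid, cno, name, grade)` whose key is `(sid, cno)`. Here `name` depends on `sid` alone, which is part of the key, so the relation is not in 2NF; moving `name` into its own relation fixes this. Note that checking sample rows can only refute a dependency, never prove it: real dependencies are rules of the schema, not accidents of the data.

```python
# Hypothetical relation ENROLLED(sid, cno, name, grade); its key is (sid, cno).
SID, CNO, NAME, GRADE = range(4)
rows = [
    ("S1", "CIS15", "Alice", "A"),
    ("S1", "CIS19", "Alice", "B"),
    ("S2", "CIS15", "Bob", "C"),
]

def depends_on(rows, key, attr):
    """True if, in these rows, each value of the key columns maps to one value of attr."""
    seen = {}
    for row in rows:
        k = tuple(row[i] for i in key)
        if seen.setdefault(k, row[attr]) != row[attr]:
            return False
    return True

print(depends_on(rows, (SID, CNO), GRADE))  # True: grade depends on the whole key
print(depends_on(rows, (SID,), NAME))       # True: name depends on part of the key
print(depends_on(rows, (CNO,), GRADE))      # False: a course alone does not fix the grade

# 2NF decomposition: the partially dependent attribute gets its own relation.
students = sorted({(r[SID], r[NAME]) for r in rows})
enrolled = [(r[SID], r[CNO], r[GRADE]) for r in rows]
print(students)  # [('S1', 'Alice'), ('S2', 'Bob')]
```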


Other normal forms


Other normal forms use more complicated dependencies
among attributes. We leave these dependencies to books
dedicated to the discussion of database topics.


14.8 OTHER DATABASE MODELS

The relational database is not the only database model
in use today. Two other common models are distributed
databases and object-oriented databases. We briefly
discuss these here.


Distributed databases

The distributed database model is not a new model; it is
based on the relational model. However, the data is stored on
several computers that communicate through the Internet or
a private wide area network. Each computer (or site)
maintains either part of the database or the whole database.

Fragmented distributed databases
In a fragmented distributed database, data is localized:
locally used data is stored at the corresponding site.
However, this does not mean that a site cannot access data
stored at another site. Access is mostly local, but
occasionally global.
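The "mostly local, occasionally global" access pattern can be modeled with a toy lookup in plain Python. The site names and data are hypothetical, and in-memory dictionaries stand in for sites; a real system would reach other sites over a network.

```python
# Hypothetical sites, each holding the fragment of data it uses most.
sites = {
    "site_a": {"S1": "Alice", "S2": "Bob"},
    "site_b": {"S3": "Carol"},
}

def lookup(local_site, sid):
    """Try the local fragment first; fall back to the other sites (global access)."""
    if sid in sites[local_site]:
        return sites[local_site][sid], local_site
    for name, fragment in sites.items():
        if name != local_site and sid in fragment:
            return fragment[sid], name
    raise KeyError(sid)

print(lookup("site_a", "S1"))  # ('Alice', 'site_a')  local access
print(lookup("site_a", "S3"))  # ('Carol', 'site_b')  occasional global access
```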

Replicated distributed databases


In a replicated distributed database, each site holds an exact
replica of the data held at every other site. Any modification
to data stored at one site is repeated exactly at every site.
The reason for having such a database is security against
failure: if the system at one site fails, users at that site can
access the data at another site.
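A toy model of replication in plain Python: every write is repeated at every site, and a read falls back to another replica when the preferred site has failed. The class and site names are hypothetical; the sketch deliberately ignores failures during writes and the consistency protocols a real replicated DBMS needs.

```python
class ReplicatedStore:
    """Toy model: every site holds a full copy of the data."""

    def __init__(self, site_names):
        self.replicas = {name: {} for name in site_names}
        self.down = set()  # sites currently marked as failed

    def write(self, key, value):
        # Repeat the modification exactly at every site.
        for replica in self.replicas.values():
            replica[key] = value

    def read(self, key, preferred_site):
        # Use the preferred site if it is up; otherwise any other live replica.
        others = [s for s in self.replicas if s != preferred_site]
        for name in [preferred_site] + others:
            if name not in self.down:
                return self.replicas[name][key], name
        raise RuntimeError("no site available")

store = ReplicatedStore(["site_a", "site_b", "site_c"])
store.write("S1", "Alice")
store.down.add("site_a")           # simulate a failure at site_a
print(store.read("S1", "site_a"))  # ('Alice', 'site_b')
```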


Object-oriented databases

An object-oriented database tries to keep the advantages of
the relational model while allowing applications to access
structured data. In an object-oriented database, objects and
their relations are defined. In addition, each object can have
attributes that can be expressed as fields.

XML
The language normally used with object-oriented databases
is XML (Extensible Markup Language). As we discussed in
Chapter 6, XML was originally designed to add markup
information to text documents, but it has also found
application in databases, where XML data can be queried
with languages such as XPath and XQuery. XML can
represent data with nested structures.
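To show what nested data looks like in XML, the sketch below stores hypothetical courses together with their enrolled students and queries them with Python's built-in `xml.etree.ElementTree`. Its `findall` method supports only a limited subset of XPath, so this is an illustration of querying nested XML, not of a full XML query language.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested data: each course element contains its students.
doc = """
<courses>
  <course no="CIS15">
    <title>Intro to C</title>
    <student sid="S1">Alice</student>
    <student sid="S2">Bob</student>
  </course>
  <course no="CIS19">
    <title>UNIX</title>
    <student sid="S1">Alice</student>
  </course>
</courses>
"""

root = ET.fromstring(doc.strip())

# Limited XPath: the students nested inside course CIS15.
names = [s.text for s in root.findall("./course[@no='CIS15']/student")]
titles = [c.findtext("title") for c in root.findall("course")]

print(names)   # ['Alice', 'Bob']
print(titles)  # ['Intro to C', 'UNIX']
```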
