What is Data?
Data is a raw collection of facts about people, places, things, etc. That means it is unprocessed
data. All business activities deal with large amounts of data. Data is recognized as an important
asset because it is the raw material from which information is derived. Business data is likely to
include facts about employees, customers and products.
What is information?
Information is processed data. It is a meaningful form of data, derived from data. Data is
recognized as an important asset because it is the raw material from which information is
derived.
Data                                    Information
Data is a raw collection of facts       Information is meaningful data in an organized form
Data exists with the user               Information is required by the user
Data is unprocessed                     Information is processed data
What is DBMS?
A collection of logically related data in an organized form, together with a set of application
programs/software, is called a Database Management System (DBMS). (Or) A collection of
logically interrelated files and a set of programs is called a Database Management System
(DBMS). It allows us to create databases.
What is database?
A collection of logically related data in an organized form is called a database. (Or) A collection
of logically interrelated files is called a database. A database allows us to create objects such as
tables, views, indexes, procedures, etc.
What is field?
A character or group of characters that has a specific meaning is called a field. In a database, we
store data in the form of tables; a table consists of rows and columns. Rows are called records
or tuples, whereas columns are called fields or attributes.
-NSV Degree Colleges, Jagtial
B.Com (CA) IV - Semester Relational Database Management System
What is record?
A logically connected set of one or more fields that describes a person, place or thing.
In a DBMS, records are also called rows, tuples or entity instances.
Ex (one record): 0922  Neha  HR  30
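The notions of field (column) and record (row) can be seen directly in code. Below is a minimal sketch using SQLite through Python's sqlite3 module; the employee table and its values are invented for illustration:

```python
import sqlite3

# A hypothetical employee table: each column is a field (attribute),
# each row is a record (tuple).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (empno TEXT, ename TEXT, dept TEXT, age INTEGER)")
cur.execute("INSERT INTO employee VALUES ('0922', 'Neha', 'HR', 30)")

cur.execute("SELECT * FROM employee")
# The column names are the fields of the table.
fields = [d[0] for d in cur.description]
print(fields)            # ['empno', 'ename', 'dept', 'age']

# Each fetched row is one record (tuple).
record = cur.fetchone()
print(record)            # ('0922', 'Neha', 'HR', 30)
conn.close()
```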
Explanation:
In the above diagram, Program-A, Program-B and Program-C access the "Employee Master
File". This file contains field descriptions such as Empno, Ename, Sal, Ta, Hra, etc., and these
descriptions are stored inside Program-A, Program-B and Program-C themselves. If we want to
change the field size of Ename, we must change it in Program-A, Program-B and Program-C.
2. Data Redundancy:
Storing the same data in many places is called data redundancy.
Data redundancy occurs when data is duplicated across files.
This duplication of data (redundancy) leads to higher storage and access costs.
In addition, it may lead to data inconsistency.
In the traditional file approach, data is stored in FLAT FILES. A flat file contains records that
have "no structured interrelationship"; such files are maintained by the FILE SYSTEM under the
operating system's control.
3. Limited Data Sharing
Each department maintains its own files.
The users of one department can access only that department's files; they cannot access
the files of other departments.
Ex: Users in the orders department can access only the orders department's files; they
cannot access the files of other departments such as the payroll and accounts
departments.
4. Lengthy Development time
It takes a lot of time to develop a file management system.
Each new application begins with designing the file format and file descriptions and
writing the file-access logic.
Compared with the demands of the modern business environment, an FMS takes a lot of
time to develop.
5. Excessive Program maintenance
The cost of maintaining the application programs of an FMS (File Management System)
is very high.
Up to 80% of the information systems budget may be used for program maintenance.
If the data changes, the programs have to be modified, and each new application must
be started from scratch by designing new file descriptions.
Advantages of DBMS
Database: A collection of logically related data in an organized form is called a database. A
database can share data among different users, who can use the data for different purposes.
Data in a database:
Is integrated
Can be shared
Can be concurrently accessed.
DBMS: A collection of logically related data in an organized form, together with a set of
programs, is called a Database Management System (DBMS).
A DBMS allows users to create, access and modify the database.
The primary goal of a DBMS is to provide a convenient and efficient way to store, retrieve
and modify information.
The advantages are:
1. Program-data independence.
2. Minimal data redundancy.
3. Improved data consistency.
4. Improved data sharing.
5. Improved data accessibility.
6. Improved data quality.
7. Reduced program maintenance.
1. External
2. Conceptual
3. Internal
4. Physical
The ANSI/SPARC framework has been expanded with the addition of a physical model to
explicitly address physical-level implementation details of the internal model.
EXTERNAL MODEL:
The external model is the end user's view of the data environment.
It represents a subset of the database.
End users use application programs to manipulate the data and generate information.
End users use applications to perform specific tasks in their environment. In general,
companies are divided into several units such as sales, finance and marketing.
Each unit has its own requirements and constraints.
All departments use the same data in an organization.
ERDs are used to represent external views. An external view is also known as an external
schema.
[Figure: end users access external views; the external views are integrated into the conceptual
model; the conceptual model maps to the internal model (logical independence), which in turn
maps to the physical model (physical independence).]
CONCEPTUAL MODEL:
Integrating all external views (entities, relationships and constraints) into a single,
organization-wide view is called the conceptual model.
INTERNAL MODEL:
Once a specific DBMS has been selected, the internal model maps the conceptual model
to that DBMS.
The internal model is the representation of the database as "seen" by the DBMS.
The internal model depends on the software; therefore, a change in DBMS software
requires a change in the internal model.
Being able to change the internal model without affecting the conceptual model is called
LOGICAL INDEPENDENCE.
PHYSICAL MODEL:
It describes how the data are stored on disk and how they can be accessed.
The physical model requires the definition of both the physical storage devices and the
access methods.
This model is both software- and hardware-dependent.
Being able to change the physical model without affecting the internal model is called
PHYSICAL INDEPENDENCE.
It is also called the physical schema.
The physical architecture describes the software components used to enter and
process data, and how these software components are related and interconnected.
It is possible to identify a number of key functions that are common to most
database management systems.
All database management systems have two basic sets of languages: the data
definition language (DDL) and the data manipulation language (DML).
Based on these functions, the database system may be partitioned into the
following modules.
DDL Compiler:
The DDL compiler converts data definition statements into a set of tables
containing metadata.
The DDL contains the set of commands required to define the format of the data that is
being stored.
These metadata tables contain information about the database and are in a form that
can be used by the other components of the DBMS.
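As a sketch of the idea, SQLite exposes its metadata catalog as the sqlite_master table, which the DDL compiler populates when a table is defined (the student table here is invented; other DBMSs use their own catalog names):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A DDL statement: the compiler turns this definition into catalog entries.
conn.execute("CREATE TABLE student (sno INTEGER PRIMARY KEY, name TEXT)")

# The table's definition is now stored as metadata, queryable like any data.
row = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE name = 'student'"
).fetchone()
print(row)   # ('table', 'student')
conn.close()
```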
DML Compiler:
The data manipulation language defines the set of commands that modify and
process data to create user-definable output.
DML statements can also be written inside an application program as normal procedural
calls in the host language.
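A minimal sketch of DML statements embedded in a host language, as the text describes (Python with SQLite here; the emp table and figures are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, sal INTEGER)")

# DML statements issued as ordinary calls from the host language:
conn.execute("INSERT INTO emp VALUES (1, 'Smith', 3000)")          # insert
conn.execute("UPDATE emp SET sal = sal + 500 WHERE empno = 1")     # modify
new_sal = conn.execute(
    "SELECT sal FROM emp WHERE empno = 1"                          # retrieve
).fetchone()[0]
print(new_sal)   # 3500
conn.close()
```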
File Manager:
The file manager manages the allocation of space on disk storage.
It establishes and maintains the list of structures and indexes defined in the internal
schema.
However, the file manager does not directly manage the physical input and output
of data.
Entity Instances:
A single occurrence of an entity type is called an entity instance. It is also called a record or
tuple.
Employee:
EMPNO ENAME JOB CITY
3200 Nisha Manager Hyd
3201 Neha Clerk Pune
Attribute and its types:
An attribute is a characteristic or property of an entity; it is also called a field.
Attribute names start with a capital letter followed by lowercase letters. An entity can have
multiple attributes.
The following are the types of attributes:
1. Mandatory / Required attribute
2. Optional attribute
3. Single value attribute
4. Multi value attribute
5. Simple attribute
6. Composite attribute
7. Derived attribute
8. Stored attribute
9. Identifier attribute
Mandatory / Required Attribute:
An attribute that must contain a value.
Ex: Date of birth
Optional Attribute:
An attribute that may or may not have a value.
Ex: E-mail id
Single Valued Attribute:
An attribute that can have only one value.
Ex: Admin.no, S.No
Multi Valued Attribute:
An attribute that may have more than one value for a given entity instance.
Ex: The "Languages known" attribute may take multiple values for each record; that is, a
student may know more than one language.
Languages Known    Skills
English            C
Telugu             Java
Hindi              Oracle
English            C
Telugu             Java
Hindi              CPP
Simple Attribute:
It is an attribute that cannot be broken down into smaller components.
Ex:
Sno
10
20
30
Composite Attribute:
It is an attribute that can be broken down into two or more components.
Ex:
Name                First Name    Last Name
Peter Chen          Peter         Chen
Sachin Tendulkar    Sachin        Tendulkar
Derived Attribute:
An attribute whose value can be calculated from related attribute values.
Ex: TOTAL
The Total attribute is a derived attribute that can be calculated from the other attributes
M1, M2 and M3.
Stored Attribute:
An attribute stored in the database to supply values for derived attributes.
Ex: M1, M2 and M3 are stored attributes that supply the values used to calculate the
"AVERAGE" attribute.
M1   M2   M3
95   90   80
60   80   60
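The stored/derived distinction can be sketched in a few lines: M1, M2 and M3 are stored, while TOTAL and AVERAGE are computed on demand (marks taken from the table above):

```python
# Stored attributes: m1, m2, m3 are physically kept in the database.
rows = [
    {"m1": 95, "m2": 90, "m3": 80},
    {"m1": 60, "m2": 80, "m3": 60},
]

# Derived attributes: total and average are calculated, not stored.
for r in rows:
    r["total"] = r["m1"] + r["m2"] + r["m3"]
    r["average"] = r["total"] / 3

print(rows[0]["total"])   # 265
```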
Identifier Attribute:
An attribute used to identify a row uniquely.
Ex: S.No is an identifier attribute that identifies a row uniquely. It can also be referred to as a
key attribute.
Relational Constraints
Every relation has some conditions that must hold for it to be a valid relation. These conditions are
called Relational Integrity Constraints. There are three main integrity constraints −
Key constraints
Domain constraints
Referential integrity constraints
Key Constraints
Keys are attributes or sets of attributes that uniquely identify an entity within its entity set.
An Entity set E can have multiple keys out of which one key will be designated as
the primary key.
Primary Key must have unique and not null values in the relational table.
In a relation with a key attribute, no two tuples can have identical values for key attributes.
A key attribute cannot have NULL values.
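A small sketch of the key constraint in action, using SQLite via Python's sqlite3 module (schema and values are invented): a duplicate primary-key value is rejected by the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The primary key combines the unique and not-null constraints.
conn.execute("CREATE TABLE student (sno INTEGER PRIMARY KEY NOT NULL, name TEXT)")
conn.execute("INSERT INTO student VALUES (1001, 'Smith')")

duplicate_rejected = False
try:
    # Same key value again: violates the key constraint.
    conn.execute("INSERT INTO student VALUES (1001, 'Scott')")
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)   # True
conn.close()
```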
Domain Constraints
Domain constraints specify what set of values an attribute can take. The value of each
attribute X must be an atomic value from its domain.
The data types associated with domains include integer, character, string, date, time,
currency, etc. An attribute value must come from the corresponding domain.
Referential integrity constraints work on the concept of Foreign Keys. A foreign key is a key
attribute of a relation that can be referred in other relation.
Referential integrity constraint states that if a relation refers to a key attribute of a different
or same relation, then that key element must exist.
This rule states that if a foreign key in Table 1 refers to the primary key of Table 2, then every
value of the foreign key in Table 1 must either be null or be available in Table 2.
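A minimal sketch of referential integrity, again with SQLite (note that SQLite requires foreign-key enforcement to be switched on explicitly; the dept/emp schema is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, "
             "deptno INTEGER REFERENCES dept(deptno))")
conn.execute("INSERT INTO dept VALUES (10)")
conn.execute("INSERT INTO emp VALUES (1, 10)")       # ok: 10 exists in dept

fk_violation = False
try:
    conn.execute("INSERT INTO emp VALUES (2, 99)")   # 99 does not exist in dept
except sqlite3.IntegrityError:
    fk_violation = True
print(fk_violation)   # True
conn.close()
```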
Relational Algebra
Relational algebra is a formal language describing how new relations are created from
old ones.
It is a useful tool for describing queries on a database management system.
Each row in a table represents one tuple of the relation, and each column one
attribute.
Operations in Relational Algebra:
1. Union (∪): Union combines all rows from two tables, removing duplicate rows. The tables must
have the same attribute characteristics.
Notation: T1 ∪ T2
Ex: SQL> select sname from T1 union select sname from T2;
2. Intersect (∩): It displays only the rows that appear in both tables, i.e. the common rows.
Notation: T1 ∩ T2
3. Difference (Minus): It gives all rows from one table that are not found in the other table; that
is, it subtracts one table from the other.
Notation: T1 − T2 (or T2 − T1)
4. Product (×): Product displays all possible pairs of rows from two tables. It is also known as the
Cartesian product. Ex: If the first table has three rows and the second table has two rows, then
the product gives a list of 3 × 2 = 6 rows.
Notation: T1 × T2
Ex: SQL> select * from T1, T2;
5. Divide (÷): Division is a binary operator. It helps to process more than one relation,
matching source-relation tuples to target-relation tuples through their common
attributes.
Ex: R1 ÷ R2
6. Select (σ): The select operator gives the required information as a horizontal subset of a
table; that is, it gives records (rows).
Notation: σ subject = "database" (Books)
7. Project (π): Project displays a vertical subset of a table; that is, it gives chosen columns only.
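The operations above can be sketched on relations represented as Python sets of tuples (attribute names and values are invented):

```python
# Two relations over the attributes (sname, age).
t1 = {("Smith", 30), ("Scott", 25), ("Adams", 30)}
t2 = {("Scott", 25), ("Jones", 40)}

union = t1 | t2                              # all rows, duplicates removed
intersect = t1 & t2                          # rows common to both
difference = t1 - t2                         # rows in t1 but not in t2
product = {a + b for a in t1 for b in t2}    # Cartesian product: 3 x 2 = 6 rows

# Select (sigma): horizontal subset -- rows satisfying a condition.
select = {row for row in t1 if row[1] == 30}
# Project (pi): vertical subset -- chosen columns only.
project = {(name,) for (name, age) in t1}

print(len(union), len(product))   # 4 6
```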
One-to-one relationship: In this relationship, one instance of one entity is related to exactly
one instance of another entity.
Example: One person (P1, P2, P3, P4) can sit on one chair at any point in time, and one
chair (C1, C2, C3, C4) can accommodate at most one person at any given time. In this
relationship, both participating entities have a one-to-one relationship.
One-to-many relationship: One instance of an entity is related to multiple instances of another
entity.
Example 1: One organization (O1, O2, O3) can have many employees, but one employee
(e1, e2, e3, e4, e5) can work for only one organization.
Many-to-many relationship: In this relationship, multiple instances of one entity are related to
multiple instances of another entity.
Example 1: One student (s1, s2, s3, s4) is enrolled in many courses (c1, c2, c3, c4), and one
course is enrolled in by many students.
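These cardinalities translate directly into table designs. Below is a hedged sketch of the many-to-many case, resolved with a junction table in SQLite (schema and data invented); a 1:N relationship would use a plain foreign key on the "many" side, and 1:1 a unique foreign key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid TEXT PRIMARY KEY);
CREATE TABLE course  (cid TEXT PRIMARY KEY);
CREATE TABLE enrollment (              -- junction table resolving M:N
    sid TEXT REFERENCES student(sid),
    cid TEXT REFERENCES course(cid),
    PRIMARY KEY (sid, cid)
);
INSERT INTO student VALUES ('s1'), ('s2');
INSERT INTO course  VALUES ('c1'), ('c2');
INSERT INTO enrollment VALUES ('s1','c1'), ('s1','c2'), ('s2','c1');
""")
# One student enrolled in many courses; one course has many students.
n = conn.execute("SELECT COUNT(*) FROM enrollment WHERE sid = 's1'").fetchone()[0]
print(n)   # 2
conn.close()
```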
Degree of Relationship:
The number of entities participating in a relationship is called the "degree of
relationship". Based on degree, relationships can be classified into three types:
1. Unary: A relationship between the instances of a single entity is called a unary
relationship; it is also called a recursive relationship. Its degree of relationship is one.
2. Binary: A binary relationship is a relationship between the instances of two entities; this is
the most common relationship in data modeling. Its degree of relationship is two.
A database is a collection of entities, where an entity is a set of related data arranged in
the form of a table, i.e. a collection of rows and columns. A database table includes the
following kinds of keys:
1. Primary key
2. Composite key
3. Candidate key
4. Foreign key
5. Secondary key
Primary Key: The combination of the unique and not-null constraints is considered a primary
key. It cannot accept duplicate or null values. A database table can have at most one primary
key. The primary key enforces the ENTITY INTEGRITY constraint.
SNO (Primary Key)   Name    Address
1001                Smith   New York
1002                Scott   London
Composite Key: A primary key that consists of a combination of attributes is known as a
"composite key". (Or) When a table's primary key is made up of more than one attribute, it is
considered a composite key.
Candidate Key: A candidate key is a set of one (single) or more (composite) attributes that can
uniquely identify a row. A table can have multiple candidate keys, one of which is designated
as the primary key.
Secondary Key: A non-primary search key is known as a "secondary key". If you don't know
the primary key value, some other attribute or combination of attributes may be used to find
the record. (Or) Any other (non-primary-key) attribute or combination of attributes is called a
"secondary key".
More about Entities and Relationships:
The basic features of E-R diagrams are sufficient for designing many database situations.
However, with more complex relations and advanced database applications, it is
necessary to move to the enhanced features of E-R models.
EERM stands for extended or enhanced entity relationship model.
EERM is the result of adding more semantic constructs to the original entity
relationship (ER) model.
An EERM diagram representing organizational data is called an EERD
(extended/enhanced entity relationship diagram).
EER model constructs include:
Entity clustering: An entity cluster is a virtual entity type used to represent multiple
entities and relationships in the ERD. It is formed by combining multiple
interrelated entities into a single abstract entity object.
Generalization:
Generalization is the process of defining more general entity types from a set of more
specialized ones. It is the bottom-up process of identifying a parent object from child objects.
Generalization is based on grouping the common characteristics and relationships of the
subtypes.
Example: [Figure] EERM diagram with a supertype and subtype relationship for the EMPLOYEE
supertype (attributes Empno, Ename, Address) with two subtypes: Full Time Employee (Salary,
Hra) and Part Time Employee (Hourly_salary).
Specialization
Specialization is a process opposite to generalization. In specialization, a group of
entities is divided into sub-groups based on their characteristics. Take the group Person, for
example: a person has a name, date of birth, gender, etc., and these properties are common
to all persons.
Example: [Figure] EERM diagram with a supertype and subtype relationship for the PRODUCT
supertype (attributes Product Id, Pname, Quantity, Cost) with two subtypes: CUSTOMER and
SUPPLIER.
Functional Dependency:
A dependency between attributes is called a "functional dependency". Functional
dependencies are classified into two types:
1. Partial Dependency: a dependency in which a non-key attribute depends on only part
of a composite primary key.
2. Transitive Dependency: a dependency on an attribute that is not part of the primary
key. In the example above, group and fee are non-primary-key attributes, and fee
depends on group.
Normalization
1st NORMAL FORM → (remove partial dependency) → 2nd NORMAL FORM → (remove
transitive dependency) → 3rd NORMAL FORM → ... → 4th NORMAL FORM
Consider the following table with anomalies. The table has the multi-valued attribute "skills".
Let us decompose the table into structured tables step by step using normalization.
Explanation: The above table (student) is in first normal form (1NF) because it contains no
multi-valued attributes. But it is not in second normal form (2NF) because it has a partial
dependency: Name, Group and Fee are functionally dependent on part (Sno) but not all of
the composite primary key.
2. Second Normal Form (2NF):
Conditions:
1. It is in first normal form (1NF), and
2. No partial dependency exists between non-key attributes and key attributes.
Solution: To bring the above table (student) to 2NF, we have to remove all the partial
dependencies. To do this, we split the student table into two separate tables:
STUDENT and SKILL.
Explanation: The above tables are in second normal form (2NF) because they are in first
normal form and every non-key attribute is fully functionally dependent on the primary key.
But they are not yet in 3NF.
Explanation: The above tables (STUDENT, SKILL and GROUP) are in third normal form
(3NF) because they are in second normal form and have no transitive dependencies.
Boyce-Codd Normal Form (BCNF):
A relation is in BCNF if and only if every determinant in the relation is a candidate key;
here we need to remove the anomalies resulting from functional dependencies. A relation in
3NF can be converted to relations in BCNF using a simple two-step process:
1. Modify the relation so that the determinant that is not a candidate key becomes a
component of the primary key of the revised table.
2. Decompose the relation to eliminate the resulting partial functional dependency.
Consider the following table
SID SUBJECT FACULTY MARKS
Now, Subject (non-key) is functionally dependent on Faculty (key). Thus there exists a
partial functional dependency, and the determinant becomes part of the composite
primary key.
According to step 2, it can be decomposed to satisfy BCNF.
DECOMPOSITION
Decomposition is the process of breaking a relation down into parts.
It replaces a relation with a collection of smaller relations.
It breaks one table into multiple tables in a database.
It should always be lossless, which guarantees that the information in the original relation
can be accurately reconstructed from the decomposed relations.
If a relation is not decomposed properly, it may lead to problems such as loss of
information.
Properties of Decomposition
Following are the properties of Decomposition,
1. Lossless Decomposition
2. Dependency Preservation
3. Lack of Data Redundancy
1. Lossless Decomposition
Decomposition must be lossless, meaning that no information is lost from the relation that is
decomposed.
This guarantees that joining the decomposed relations yields the same relation that was
decomposed.
Example:
Let 'E' be a relation schema with instance 'e', decomposed into E1, E2, E3, ..., En with
instances e1, e2, e3, ..., en. If e1 ⋈ e2 ⋈ e3 ⋈ ... ⋈ en = e, then it is called a 'lossless join
decomposition'.
In other words, if the natural join of all the decomposed relations gives back the original
relation, the decomposition is said to be lossless.
Decompose the above relation into two relations to check whether the decomposition is
lossless or lossy.
Now, we have decomposed the relation into Employee and Department.
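The lossless-join test can be sketched in plain Python: decompose, natural-join the pieces on the common attribute, and compare with the original relation (data invented):

```python
# Original relation: (empno, ename, deptno, dname).
emp = [
    ("E1", "Neha", "D1", "Sales"),
    ("E2", "Ravi", "D2", "HR"),
]

# Decompose on the common attribute deptno.
r1 = {(e, n, d) for (e, n, d, dn) in emp}    # Employee(empno, ename, deptno)
r2 = {(d, dn) for (e, n, d, dn) in emp}      # Department(deptno, dname)

# Natural join r1 |x| r2 on deptno.
joined = {(e, n, d, dn) for (e, n, d) in r1 for (d2, dn) in r2 if d == d2}

# Lossless iff the join reproduces the original relation exactly.
lossless = joined == set(emp)
print(lossless)   # True
```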
2. Dependency Preservation
Databases are stored in file formats, which contain records. At physical level, the actual data is
stored in electromagnetic format on some device. These storage devices can be broadly
categorized into three types −
Primary Storage − The memory storage that is directly accessible to the CPU comes under this
category. CPU's internal memory (registers), fast memory (cache), and main memory (RAM) are
directly accessible to the CPU, as they are all placed on the motherboard or CPU chipset.
Secondary Storage − Secondary storage devices are used to store data for future use or as backup.
Secondary storage includes memory devices that are not a part of the CPU chipset or
motherboard, for example, magnetic disks, optical disks (DVD, CD, etc.), hard disks, flash drives,
and magnetic tapes.
Tertiary Storage − Tertiary storage is used to store huge volumes of data. Since such storage
devices are external to the computer system, they are the slowest in speed. These storage devices
are mostly used to take the back up of an entire system. Optical disks and magnetic tapes are
widely used as tertiary storage.
Magnetic Disks
Hard disk drives are the most common secondary storage devices in present computer
systems. These are called magnetic disks because they use the concept of magnetization to
store information.
Hard disks consist of metal disks coated with magnetizable material. These disks are placed
vertically on a spindle. A read/write head moves in between the disks and is used to
magnetize or de-magnetize the spot under it. A magnetized spot can be recognized as 0
(zero) or 1 (one).
Hard disks are formatted in a well-defined order to store data efficiently. A hard disk plate has
many concentric circles on it, called tracks. Every track is further divided into sectors. A sector on
a hard disk typically stores 512 bytes of data.
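Given the geometry above, the capacity of one disk surface is tracks × sectors per track × 512 bytes. A back-of-envelope sketch (track and sector counts are invented):

```python
# Hypothetical geometry for one disk surface.
tracks_per_surface = 10_000
sectors_per_track = 400
bytes_per_sector = 512          # typical sector size, as stated above

capacity = tracks_per_surface * sectors_per_track * bytes_per_sector
print(capacity // (1024 * 1024), "MiB")   # 1953 MiB
```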
RAID 1: RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends a copy
of data to all the disks in the array. RAID level 1 is also called mirroring and provides 100%
redundancy in case of a failure.
RAID 2: RAID 2 records an Error Correction Code using Hamming distance for its data, striped on
different disks. Like level 0, each data bit in a word is recorded on a separate disk, and the ECC
codes of the data words are stored on a different set of disks. Due to its complex structure and
high cost, RAID 2 is not commercially available.
RAID 3
RAID 3 stripes the data onto multiple disks. The parity bit generated for each data word is
stored on a different disk. This technique makes it possible to overcome single-disk failures.
RAID 4
In this level, an entire block of data is written onto data disks and then the parity is generated and
stored on a different disk. Note that level 3 uses byte-level striping, whereas level 4 uses block-
level striping. Both level 3 and level 4 require at least three disks to implement RAID.
RAID 5
RAID 5 writes whole data blocks onto different disks, but the parity bits generated for each
data-block stripe are distributed among all the data disks rather than being stored on a
separate dedicated disk.
RAID 6
RAID 6 is an extension of level 5. In this level, two independent parities are generated and stored
in distributed fashion among multiple disks. Two parities provide additional fault tolerance. This
level requires at least four disk drives to implement RAID.
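The parity idea behind RAID 3-6 is XOR: the parity block is the XOR of the data blocks, so any single lost block can be rebuilt from the survivors. A minimal sketch with invented values:

```python
# Three data blocks (tiny invented bit patterns stand in for real blocks).
d1, d2, d3 = 0b1010, 0b0110, 0b1100

# Parity is the XOR of all data blocks.
parity = d1 ^ d2 ^ d3

# Suppose the disk holding d2 fails: rebuild it from the survivors and parity.
rebuilt_d2 = d1 ^ d3 ^ parity
print(rebuilt_d2 == d2)   # True
```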
File organization:
Databases are used to store information. The principal operations performed on a
database relate to creating data, retrieving data, and modifying or deleting information
that is no longer useful or valid.
Databases store information in the form of files of records, typically kept on magnetic
disks.
This unit focuses on file organization in a DBMS, the access methods available, and the
system parameters associated with them.
File organization is the way files are arranged on the disk; the access method is how the
data can be retrieved based on that file organization.
A database contains lots of data, grouped into related groups called tables. Each table
holds many related records. Users see these records as tables on the screen, but the
records are stored as files in memory.
Storing the files in a certain order is called file organization. The main objectives of file
organization are:
o Optimal selection of records, i.e. records should be accessed as fast as possible.
o Any insert, update or delete transaction on records should be easy and quick, and
should not harm other records.
o No duplicate records should be introduced as a result of an insert, update or delete.
o Records should be stored efficiently so that the cost of storage is minimal.
In the diagram above, R1, R2, R3, etc. are the records. Each contains all the attributes of one
row; i.e. a student record will have the student's id, name, address, course, DOB, etc. So R1,
R2, R3, etc. can each be considered one full set of attribute values.
In the second method, records are kept sorted (either ascending or descending) as they
are inserted into the system. This is called the sorted file method. Sorting of records
may be based on the primary key or on any other column. Whenever a new record is
inserted, it is first appended at the end of the file, then the file is sorted (ascending or
descending) on the key value so that the record is placed in the correct position. In the
case of an update, the record is updated and the file is then re-sorted to place the
updated record in the right place. The same applies to delete.
This (the heap file) is the simplest form of file organization. Records are inserted at the
end of the file as and when they arrive; there is no sorting or ordering of the records.
Once a data block is full, the next record is stored in a new block. This new block need
not be the very next block.
This method can select any block in memory to store new records. It is similar to the
pile file in the sequential method, but here data blocks are not selected sequentially;
they can be any data blocks in memory.
It is the responsibility of the DBMS to store and manage the records.
When a record has to be retrieved from the database, in this method we need to traverse
from the beginning of the file until we reach the requested record. Hence, fetching records
from very large tables is time-consuming. This is because there is no sorting or ordering of
the records: we need to check all the data.
Similarly, if we want to delete or update a record, we first need to search for it. Searching
for a record is similar to retrieving it: start from the beginning of the file and scan until the
record is found. A small file can be searched quickly, but the larger the file, the more time
is spent fetching.
In addition, when a record is deleted, it is removed from its data block, but the space is
not freed and cannot be re-used.
Advantages of Heap File Organization
It is a very good method of file organization for bulk insertion; i.e. when a huge amount
of data needs to be loaded into the database at one time, this method is best suited,
since records are simply inserted one after the other into the memory blocks.
It is suited to very small files, where fetching records is fast. As the file size grows,
the linear search for a record becomes time-consuming.
Disadvantages of Heap File Organization
This method is inefficient for larger databases, as it takes time to search for or modify a record.
Proper memory management is required to maintain performance; otherwise many unused
memory blocks accumulate and the memory footprint simply keeps growing.
In this method of file organization, a hash function is used to calculate the address of the
block in which to store a record. The hash function can be any simple or complex
mathematical function.
The hash function is applied to some columns/attributes -- either key or non-key columns --
to get the block address.
Each record is therefore stored at a location independent of the order in which records
arrive, so this method is also known as direct or random file organization.
If the hash function is computed on a key column, that column is called the hash key; if
it is computed on a non-key column, the column is called the hash column.
When a record has to be retrieved, the address is generated from the hash key column
and the whole record is fetched directly from that address; there is no need to traverse
the whole file.
Similarly, when a new record has to be inserted, the address is generated from the hash
key and the record is inserted directly. The same applies to update and delete: there is
no need to search the entire file or to sort the files. Each record is stored at its hashed
location in memory.
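A hedged sketch of hash file organization: a hash function maps the hash-key value straight to a block address, so no file scan is needed (block count and records are invented):

```python
# Blocks are modeled as lists keyed by block address.
NUM_BLOCKS = 8
blocks = {i: [] for i in range(NUM_BLOCKS)}

def block_address(key):
    # Any hash function works; here the built-in hash modulo block count.
    return hash(key) % NUM_BLOCKS

def insert(record):                      # record = (hash key, payload)
    blocks[block_address(record[0])].append(record)

def fetch(key):                          # direct access: no full-file scan
    for rec in blocks[block_address(key)]:
        if rec[0] == key:
            return rec
    return None

insert((1001, "Smith"))
insert((1002, "Scott"))
print(fetch(1002))   # (1002, 'Scott')
```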
These types of file organizations are useful in online transaction systems, where retrieval or
insertion/updation should be faster.
Advantages of Hash File Organization
Records need not be sorted after any transaction, so the effort of sorting is
reduced in this method.
Since the block address is known from the hash function, accessing any record is very
fast. Similarly, updating or deleting a record is also very quick.
This method can handle multiple transactions, as each record is independent of the
others; i.e. since there is no dependency between records' storage locations, multiple
records can be accessed at the same time.
It is suitable for online transaction systems like online banking, ticket booking system etc.
Disadvantages of Hash File Organization
Since all the records are stored at effectively random addresses, they are scattered in
memory, so memory is not used efficiently.
If we are searching for a range of values, this method is not suitable: each record is
stored at an unrelated address, so a range search cannot find the correct addresses.
In this method, if any record has to be retrieved, based on its index value, the data block address is
fetched and the record is retrieved from memory.
Advantages of ISAM
This method gives the flexibility of using any column as the key field; the index is generated
on that column.
In addition to the primary key and its index, indexes can be generated on other fields
too.
Hence searching becomes more efficient when the search is based on columns other than the
primary key.
Disadvantages of ISAM
An extra cost must be borne to maintain the index: extra disk space is needed to store the
index values, and with multiple key-index combinations the required disk space grows
further.
As new records are inserted, these files have to be reorganized to maintain the sequence.
Similarly, when a record is deleted, the space used by it must be released; otherwise the
performance of the database will degrade.
Types of Indexes
Indexing is a way to optimize the performance of a database by minimizing the number of disk
accesses required when a query is processed.
An index (or database index) is a data structure which is used to quickly locate and access
the data in a database table.
Indexes are created using some database columns.
o The first column is the search key, which contains a copy of the primary key or a
candidate key of the table. These values are stored in sorted order so that the
corresponding data can be accessed quickly.
o The second column is the data reference, which contains pointers to the disk blocks
where the records for those key values are stored.
Indexing Methods
Ordered Indices: The indices are usually sorted so that searching is faster; indices which are
sorted are known as ordered indices.
If the search key of an index specifies the same order as the sequential order of the file, it is
known as the primary index or clustering index.
Note: The search key of a primary index is usually the primary key, but it is not necessarily
so.
If the search key of an index specifies an order different from the sequential order of the file,
it is called the secondary index or non-clustering index.
Clustered Indexing: A clustering index is defined on an ordered data file, where the data file is
ordered on a non-key field. In some cases the index is created on non-primary-key columns, which
may not be unique for each record. In such cases, to identify the records faster, two or more
columns are grouped together to get unique values and an index is created on them.
This method is known as clustering index. Basically, records with similar characteristics are
grouped together and indexes are created for these groups.
Primary Index
In this case, the data is sorted according to the search key, which induces a sequential file
organisation. The primary key of the database table is used to create the index. As primary keys
are unique and are stored in sorted order, the search operation is quite efficient.
The primary index is classified into two types: Dense Index and Sparse Index.
(I) Dense Index:
For every search-key value in the data file, there is an index record.
This record contains the search key and also a reference to the first data record with that
search-key value.
(II) Sparse Index:
An index record appears for only some of the search-key values. To locate a record, we find
the index entry with the largest search-key value not exceeding the target, and then scan
sequentially from the record it points to.
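The dense index described above can be sketched in a few lines of Python (table contents and names are invented for illustration): one index entry per search-key value, each pointing at a record's position, so a lookup is a direct jump rather than a scan.

```python
# Dense index sketch: one index entry for EVERY search-key value.
data_file = [                       # data file, sorted on the search key
    (1, "Raju"), (2, "Ramu"), (3, "Ravi"), (5, "Sindhu"),
]
# index entry: search key -> position of the first record with that key
dense_index = {sno: pos for pos, (sno, _) in enumerate(data_file)}

def lookup(sno):
    pos = dense_index.get(sno)      # direct jump via the index
    return data_file[pos] if pos is not None else None

row = lookup(3)
```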
Secondary Index:
A secondary index is used to optimize query processing and access records in a database with
some information other than the usual search key (the primary key).
Here, two levels of indexing are used in order to reduce the size of the first-level mapping.
Initially, for the first level, a large range of numbers is selected so that the mapping size
stays small; each range is then divided into further sub-ranges.
For quick memory access, the first level is stored in primary memory. The actual physical
location of the data is determined by the second mapping level.
Clustering Index:
In some cases, the index is created on non-primary-key columns, which may not be unique
for each record.
In such cases, to identify the records faster, two or more columns are grouped together
to get unique values and an index is created on them. This method is known as
clustering index.
Basically, records with similar characteristics are grouped together and indexes are created
for these groups.
For example, students studying in each semester are grouped together: 1st-semester students,
2nd-semester students, 3rd-semester students, etc., each form a group.
In the above diagram we can see that indexes are created for each semester in the index file. In
the data blocks, the students of each semester are grouped together to form a cluster.
The address in the index file points to the beginning of each cluster; within the data blocks,
the requested student ID is then searched for sequentially.
New records are inserted into the clusters based on their group. In the above case, if a new
student joins the 3rd semester, his record is inserted into the semester-3 cluster in
secondary memory. The same is done for update and delete.
If any cluster runs short of space, new data blocks are added to that cluster.
This method of file organization is better than the other methods discussed, as it provides a
clean distribution of records, making search easier and faster.
Tree Structure:
A tree consists of nodes connected by edges. Here we discuss the binary tree, or binary search
tree, specifically.
A binary tree is a special data structure used for data storage purposes, with the condition
that each node can have a maximum of two children.
A binary tree combines the benefits of an ordered array and a linked list: search is as
quick as in a sorted array, while insertion and deletion are as fast as in a linked list.
B+ Tree:
A B+ tree is a balanced search tree that follows a multi-level index format (unlike a binary
tree, each node may have many children). The leaf nodes of a B+ tree hold the actual data
pointers. A B+ tree ensures that all leaf nodes remain at the same height, and is thus balanced.
Additionally, the leaf nodes are linked in a linked list; therefore a B+ tree can support random
access as well as sequential access.
Structure of a B+ Tree: Every leaf node is at equal distance from the root node. A B+ tree is of
order n, where n is fixed for every B+ tree.
Internal nodes:
Internal (non-leaf) nodes contain at least ⌈n/2⌉ pointers, except the root node.
At most, an internal node can contain n pointers.
Leaf nodes:
Leaf nodes contain at least ⌈n/2⌉ record pointers and ⌈n/2⌉ key values.
At most, a leaf node can contain n record pointers and n key values.
Every leaf node contains one block pointer P to point to next leaf node and forms a linked list.
B+ Tree Insertion: B+ trees are filled from the bottom, and each entry is made at a leaf node.
If a leaf node overflows:
Split the node into two parts.
Partition at i = ⌊(n+1)/2⌋.
The first i entries are stored in one node.
The rest of the entries (from i+1 onwards) are moved to a new node.
The ith key is duplicated at the parent of the leaf.
If a non-leaf node overflows:
Split the node into two parts.
Partition the node at i = ⌈(n+1)/2⌉.
Entries up to i are kept in one node.
The rest of the entries are moved to a new node.
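The leaf-split rule above reduces to simple index arithmetic. A sketch (the order and the keys are invented; note that some texts copy up the first key of the new right node rather than the ith key named in the rule above):

```python
# B+ tree leaf split sketch, following the rule stated above.
order = 3                      # the tree's order (written m/n in the rules)
keys = [5, 10, 15, 20]         # an overflowing leaf holding order+1 keys

i = (order + 1) // 2           # partition point i = floor((m+1)/2) = 2
left_leaf = keys[:i]           # first i entries stay: [5, 10]
right_leaf = keys[i:]          # entries i+1 onwards move: [15, 20]
copied_up = keys[i - 1]        # the ith key is duplicated at the parent
```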
SQL Introduction:
Characteristics of SQL:
SQL is extremely flexible.
SQL uses a free-form syntax that gives users the ability to structure SQL statements in
the way best suited to them.
It is a high-level language.
It gains natural extensions to its functional capabilities.
It can execute queries against the database.
Advantages of SQL:
SQL provides a greater degree of abstraction than procedural languages.
It is coded without embedded data-navigational instructions.
It enables end users to work with any of the many database management systems where it is
available.
It retrieves huge numbers of records from a database quickly and efficiently.
No coding is required for basic operations when using standard SQL.
DDL (Data Definition Language):
DDL stands for Data Definition Language. These SQL commands are used for creating,
modifying and dropping the structure of database objects.
The commands are
o Create
o Alter
o Drop
o Truncate
o Rename
o Desc / Describe
2. Alter: This command is used to modify the definition (structure) of a table by modifying
the definition of its columns. It performs functions such as:
a. ADD a new column to an existing table.
Syntax: SQL> alter table <table_name> add (<column_name> <data type>);
Ex: SQL> alter table student add (email varchar2(30));
3. Drop: The SQL drop command is used to remove an object from the database. The drop
command removes the table structure from the database, including all the rows in the table.
Once a table is dropped it cannot be retrieved.
Syntax: SQL> drop table <table_name>;
Ex: SQL> drop table student;
4. Truncate: The SQL truncate command is used to delete all rows from a table and free the
space occupied by the table.
Syntax: SQL> truncate table <table_name>;
Ex: SQL> truncate table emp1;
5. Rename: Changes the name of the selected table from the old name to a new name.
Syntax: SQL> rename <old_name> to <new_name>;
Ex: SQL> rename student to student1;
6. Desc / Describe: Use describe to list all the columns in the specified table or view.
Syntax: SQL> Desc <table_name>;
Ex: SQL> desc student1;
Name      Type
--------  ------------
Sno       Number(3)
name      Varchar2(10)
add       Varchar2(10)
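The DDL commands above can be tried from Python against SQLite, which supports CREATE, ALTER ... ADD and DROP (Oracle's desc has no direct SQLite equivalent; PRAGMA table_info plays that role here). A sketch with invented table names:

```python
import sqlite3

con = sqlite3.connect(":memory:")       # throwaway in-memory database
cur = con.cursor()

cur.execute("CREATE TABLE student (sno INTEGER, name TEXT)")   # Create
cur.execute("ALTER TABLE student ADD COLUMN addr TEXT")        # Alter: add a column
cols = [row[1] for row in cur.execute("PRAGMA table_info(student)")]

cur.execute("DROP TABLE student")                              # Drop
tables = cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
```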
DML (Data Manipulation Language)
DML is an abbreviation of Data Manipulation Language. The DML commands help the user to
access the table, insert content into the table, modify the data, or delete the data available in
the table.
Some of the commands which are used to enter the data and perform manipulation or
retrieve the selected content from the tables are
o Select
o Insert
o Update
o Delete
Select: The most commonly used SQL command is the SELECT statement. The SQL select
statement is used to query or retrieve data from a table in the database.
Syntax: SQL> select * from <table_name>;
Ex: SQL> select * from student;
The above command displays all the records and fields from the student table.
Syntax: SQL> select <column_list> from <table_name>;
The above command displays the selected columns with all records available in the table.
Syntax: SQL> select * from <table_name> where <condition>;
The above command displays all the columns for the records that satisfy the condition.
Syntax: SQL> select <column_list> from <table_name> where <condition>;
This command displays selected columns and selected records based on the condition.
The insert command is used to enter and store values in the database object, in the defined
order and type.
The data values given by the user in the insert statement should match the data types
declared for the corresponding columns in the table.
The SQL insert command is implemented in three forms:
o Simple insert statement
Syntax: SQL> insert into <table_name> values (column1 value, column2 value,…………..
Column n value);
Ex: SQL> insert into student values (1, 'raju', 'knr', 30);
The above insert statement adds a new row of data to the table.
o Partial insert statement
Syntax:
SQL> insert into <table_name> (<column_list>) values (column1 value, column2 value,
………………….Column n value);
Ex: SQL> insert into student (sno, name, add, age) values (2, 'ramu', 'hyd', 25);
The order of values depends on the column names given with the insert command.
o Interactive insert statement
Syntax: SQL> insert into <table_name> values (&column_name1,
'&column_name2',………………..
&column_name n);
Ex: SQL> insert into student values (&sno, '&name', '&add', &age);
Enter value for sno : 3
Enter value for name : ravi
Enter value for add : JGL
Enter value for age : 30
1 row inserted
SQL> /
The above command helps the user to insert one record and then repeat the statement:
entering / (the forward slash) at the prompt re-executes the last statement, and this can be
repeated any number of times until another SQL command is issued.
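The & substitution above is a SQL*Plus feature; in program code the closest analogue is a parameterized insert, where placeholders are filled per row. A sqlite3 sketch covering the three insert forms (table and values invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE student (sno INTEGER, name TEXT, addr TEXT, age INTEGER)")

# Simple insert: values listed in the defined column order.
cur.execute("INSERT INTO student VALUES (1, 'raju', 'knr', 30)")

# Partial insert: only the named columns receive values.
cur.execute("INSERT INTO student (sno, name) VALUES (2, 'ramu')")

# Parameterized insert: placeholders filled per row, the programmatic
# analogue of SQL*Plus &substitution prompts.
rows = [(3, "ravi", "JGL", 30), (4, "sindhu", "hyd", 25)]
cur.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", rows)

count = cur.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```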
Update:
Update is a DML command used to edit the data of selected records or fields based on a
condition.
The update command is implemented with the SET keyword, which overwrites the
previous values with the current values.
This command is used to change and modify the data values in a table.
Syntax: SQL> update <table_name> set <column>=<value> where <condition>;
Ex: SQL> update student set name='SCOTT' where sno=3;
Multiple update: edits or changes the values of multiple fields of a record based on a
condition.
Syntax: SQL> update <table_name> set <column1>=<value1>, <column2>=<value2> where <condition>;
Ex: SQL> update student set name='sindhu', age=20 where sno=5;
Delete
The SQL delete statement is used to delete records from a table: either a single record or
multiple records, with the criteria specified using a where clause.
Syntax: SQL> delete from <table_name> where <column_name>=<expression>;
Ex: SQL> delete from student where sno=5;
To delete all records from a table, omit the where clause:
Syntax: SQL> delete from <table_name>;
Ex: SQL> delete from student;
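Update and delete can be exercised the same way from Python; `rowcount` reports how many records a statement touched (a sqlite3 sketch with invented data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE student (sno INTEGER, name TEXT, age INTEGER)")
cur.executemany("INSERT INTO student VALUES (?, ?, ?)",
                [(3, "ravi", 30), (5, "sindhu", 22)])

# Single-record update based on a condition (cf. UPDATE ... WHERE sno = 3).
cur.execute("UPDATE student SET name = 'SCOTT' WHERE sno = 3")

# Conditional delete removes only matching rows; without WHERE, all rows go.
cur.execute("DELETE FROM student WHERE sno = 5")
deleted = cur.rowcount

names = [r[0] for r in cur.execute("SELECT name FROM student")]
```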
The Data Control Language (DCL) is a subset of the Structured Query Language (SQL)
that allows database administrators to configure security access to a relational database.
DCL is the simplest of the SQL subsets; DCL commands are used to enforce database
security in a multi-user database environment.
The two DCL commands are GRANT and REVOKE.
Grant: SQL GRANT is a command used to provide access or privileges on database objects
to users.
Revoke: The SQL REVOKE command is used to remove given privileges from a selected user
of the database.
Commit and rollback are transaction statements used in database access; strictly they form
the Transaction Control Language (TCL) part of SQL.
A commit statement does what it says: it makes permanent all the changes made to data
during the current transaction.
Syntax: SQL> commit;
The ROLLBACK statement undoes changes to the database: it ends a unit of work and backs
out all the relational database changes made by that unit of work.
Syntax: SQL> rollback;
Ex: SQL> delete from student;
SQL> rollback;
(Note: rollback undoes uncommitted DML such as delete; DDL statements like drop table
commit implicitly and cannot be rolled back.)
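The effect of commit and rollback can be demonstrated directly with sqlite3: a committed insert survives, while an uncommitted one is undone.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE student (sno INTEGER)")
con.commit()

cur.execute("INSERT INTO student VALUES (1)")
con.commit()                          # COMMIT makes the insert permanent

cur.execute("INSERT INTO student VALUES (2)")
con.rollback()                        # ROLLBACK undoes the uncommitted insert

rows = cur.execute("SELECT sno FROM student").fetchall()
```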
A data type represents the type of data an attribute or a variable will hold. The common data
types used are listed below.
1. Number.
2. Char.
3. Varchar/varchar 2.
4. Integer / int
5. Date/Time.
6. Lob (Large Objects).
NUMBER: It accepts only numbers.
Syntax: Number (p, s) — p = precision (total number of digits), s = scale (digits after the decimal point)
Example: Number (5, 2) can store a value such as 312.45.
CHAR:
It accepts character data up to 2000 characters. It is a fixed-length data type.
With the char data type there can be memory wastage: the full declared length is allocated
for every value, with shorter values padded to that length.
Syntax: Char (20);
This means it accepts up to 20 characters.
VARCHAR / VARCHAR2:
It accepts numbers and text up to 4000 characters.
It is a variable-length data type.
It is better than the char data type, as no space is wasted.
Syntax: Varchar2 (25)
INTEGER/INT: - It accepts whole-number values (no decimal part).
Syntax: - Int (5);
Date/Time: - It is used to assign the date data type to columns.
The default format of the date data type is DD-MON-YY.
Syntax: <column name> date;
Ex: - DOB date
LARGE OBJECTS:
LOB is a data type introduced more recently in SQL.
It mainly contains two data types:
1. BLOB (Binary Large Object)
2. CLOB (Character Large Object)
Both data types can store up to 4 GB of data.
These types store graphics, images, files, documents, etc.
For example, the following column values contain a duplicate, which a unique constraint
would reject:
SNo  Name
1    Nani
1    Hima
Unique: A unique column cannot accept duplicate values; it accepts only unique values, but it
does accept null values.
Syntax: - create table <table_name> (<column_name> <data type> unique);
Example: - create table student (sno number (3) unique);
Primary Key:
The combination of Unique and Not Null is called the Primary Key. It accepts only
unique values and does not accept null (empty) values.
Syntax: AdmiNo Number (4) Primary Key
Foreign Key: - Establishes a relationship (connection) between two tables based on a
common column.
Syntax: create table <table_name2> (<column_name> <data type> references table1 (column name));
Example: create table dept (deptno number(4) primary key, dname varchar2(20));
create table emp (empid number(4), ename varchar2(20), deptno number(4) references
dept(deptno));
Check: - The check constraint is used to enforce user-defined conditions. It allows a value to
be entered only when the check condition is true.
Syntax: - create table <table_name> (<column_name> <data type> check (condition));
Example: - create table student (marks number (3) check (marks >= 0 and marks <= 100));
Default: - This constraint is used to assign a default value to a specific column.
Syntax: - create table <table_name> (<column_name> <data type> default <value>);
Example: - create table svndc (college_code number (6) default 7063);
When the user skips the default-value column, the default value is inserted automatically;
otherwise the supplied value is stored.
Ex: Create table student (sno number(4) primary key, name varchar2(20) not null,
college_code number(4) default 20,
m1 number(3) check (m1 >= 0 and m1 <= 100),
m2 number(3) check (m2 >= 0 and m2 <= 100));
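The constraints above can be verified with sqlite3, which supports PRIMARY KEY, NOT NULL, DEFAULT and CHECK; violations raise IntegrityError (column names and values invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("""
    CREATE TABLE student (
        sno          INTEGER PRIMARY KEY,          -- unique + not null
        name         TEXT NOT NULL,
        college_code INTEGER DEFAULT 20,           -- default value
        m1           INTEGER CHECK (m1 >= 0 AND m1 <= 100)
    )
""")
cur.execute("INSERT INTO student (sno, name, m1) VALUES (1, 'Nani', 85)")

# Skipping college_code makes the DEFAULT kick in.
code = cur.execute("SELECT college_code FROM student").fetchone()[0]

# A duplicate primary key and an out-of-range mark are both rejected.
violations = 0
for stmt in (
    "INSERT INTO student (sno, name, m1) VALUES (1, 'Hima', 70)",   # dup key
    "INSERT INTO student (sno, name, m1) VALUES (2, 'Hima', 150)",  # check fails
):
    try:
        cur.execute(stmt)
    except sqlite3.IntegrityError:
        violations += 1
```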
And: Select * from EMP where job='CLERK' and sal <= 2000;
Or: - Select * from EMP where job='CLERK' or sal <= 2000;
Set Operators
Union: combines the unique records of multiple tables with matching attributes.
Union all: combines all records of multiple tables with matching attributes, including duplicates.
Intersect: retrieves the common records of multiple tables.
Minus: - retrieves the records of the first table that do not appear in the second.
Special Operators
IN: - checks whether a value matches any value in a list.
BETWEEN ... AND: - checks whether a value is within a range.
LIKE: - checks whether a value matches a string pattern.
IS NULL: - checks whether a value is null.
EXISTS: - checks whether a subquery returns any row.
||: - concatenates two values.
DISTINCT: - retrieves unique values from a table.
Examples:
Select * from emp where Ename IN ('SMITH', 'JAMES');
----------Shows the records where ename is SMITH or JAMES.
Select * from EMP where sal BETWEEN 2000 AND 4000;
----------Shows the records where the salary is between 2000 and 4000.
Select * from EMP where Ename like 'S%';
----------Displays the records where the name starts with S.
Select * from EMP where COMM IS NULL;
----------Displays the records where commission is null.
Select Deptno from Dept where NOT EXISTS (Select Deptno from EMP where EMP.Deptno =
Dept.Deptno);
----------Displays the deptnos which are not present in EMP.
Select 'Asshu' || 'Qadri' from dual;
----------O/p: AsshuQadri
Select DISTINCT job from EMP;
----------Displays the unique job values.
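These special operators behave the same way in SQLite, so the examples above can be checked from Python (the sample emp rows are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE emp (ename TEXT, sal INTEGER, comm INTEGER)")
cur.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    ("SMITH", 800, None), ("JAMES", 950, None),
    ("ALLEN", 1600, 300), ("KING", 5000, None),
])

in_list = cur.execute(
    "SELECT ename FROM emp WHERE ename IN ('SMITH', 'JAMES')").fetchall()
ranged = cur.execute(
    "SELECT ename FROM emp WHERE sal BETWEEN 2000 AND 6000").fetchall()
starts_s = cur.execute(
    "SELECT ename FROM emp WHERE ename LIKE 'S%'").fetchall()
no_comm = cur.execute(
    "SELECT COUNT(*) FROM emp WHERE comm IS NULL").fetchone()[0]
joined = cur.execute("SELECT 'Asshu' || 'Qadri'").fetchone()[0]   # concatenation
```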
Functions in SQL
SQL provides several built-in functions.
Built-in functions are mainly used to get results easily and quickly.
FUNCTIONS IN SQL:
Aggregate (group) functions
Date and time
Numeric
String
Conversion
Miscellaneous
AGGREGATE: -
These functions operate on a group of values, for example finding the MAX, MIN or AVG, or
calculating the SUM.
They allow a GROUP BY clause to categorize results.
Aggregate functions can be divided into two types: scalar aggregate and vector aggregate.
Scalar aggregate:
Retrieving a single value using an SQL query that includes an aggregate function is
called scalar aggregation.
Example: select sum(sal) from emp;
Vector aggregate:
Retrieving multiple values (one per group) from a table using aggregate functions is called
vector aggregation. Example: select deptno, sum(sal) from emp group by deptno;
FUNCTION   PURPOSE                            EXAMPLE
SUM        Finds the sum of a column          Select SUM (sal) from EMP;
COUNT      Counts the records in a table      Select count (*) from EMP;
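The scalar/vector distinction is visible directly in query results: without GROUP BY an aggregate collapses the table to one value, while with GROUP BY it yields one value per group. A sqlite3 sketch with invented rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER, sal INTEGER)")
cur.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    ("A", 10, 1000), ("B", 10, 2000), ("C", 20, 3000),
])

# Scalar aggregate: the whole table collapses to a single value.
total = cur.execute("SELECT SUM(sal) FROM emp").fetchone()[0]

# Vector aggregate: GROUP BY yields one aggregated value per group.
per_dept = cur.execute(
    "SELECT deptno, SUM(sal) FROM emp GROUP BY deptno ORDER BY deptno"
).fetchall()
```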
Select: The select clause is used to list the columns from a table or view.
Ex: SQL> select * from emp;
In the above example all columns are selected from the emp table.
From: The from clause identifies the tables or views from which the columns will be chosen.
Ex: SQL> select empid, ename from emp;
In the above example the empid and ename columns are selected from the emp table.
Where: The where clause includes the conditions for row selection within a single table, or
across multiple tables or views.
Ex: SQL> select ename from emp where job='clerk';
In the above example ename is selected for the rows whose job is clerk.
Distinct: The distinct clause is used to avoid duplicate rows, i.e. to return unique rows/records.
Ex: SQL> select distinct empid from emp;
Group by: This clause groups rows according to the specified categories.
Ex: SQL> select job, sum(sal) from emp group by job;
Order by: This clause sorts the final result rows in ascending order or descending order.
Ex: SQL> select * from emp order by sal;
Having: This clause can only be used with a group by, and acts as a secondary where clause
applied to the groups.
Ex: SQL> select job, sum(sal) from emp group by job having sum(sal) > 2000;
Sorting by multiple columns: ascending order on department number and, within each
department, descending order of salary:
Ex: SQL> select * from emp order by deptno asc, sal desc;
Equi-Join: Returns only the matched records of the two tables, based on an equality join condition.
Ex: Select rno, name, std.cid, course.cid, cname from std, course where std.cid = course.cid;
Non-Equi-Join: Joins on a non-equality condition, displaying the non-matching combinations.
Ex: Select rno, name, std.cid, course.cid, cname from std, course where std.cid !=
course.cid;
Rno Name CName
10 Niru C
10 Niru Java
10 Niru Oracle
20 Hima C
20 Hima Java
20 Hima Oracle
30 Monica C
30 Monica Java
30 Monica Oracle
Natural Join: The same as an equi-join, except that one of the duplicated columns is eliminated
from the result table.
Ex: Select rno, name, std.cid, cname from std, course where std.cid = course.cid;
STD table:
Rno  Name     CID
30   Hima     33
40   Sri      44
50   Mounika  55

Course table:
CID  CName
10   C
20   Java
30   Oracle
Outer Join:
An outer join returns not only the matching rows but also the unmatched attribute values
from one or both tables.
Example result (the unmatched row has no CName value):
33   Hima      10   C
44   Sri       30   Oracle
55   Mounika   80
CROSS JOIN:
A cross join forms the relational product of two tables; it is also called a Cartesian
product join.
Syntax: Select column_list from table1 cross join table2
Example: Select Name, CName from STD cross join course;
NAME CNAME
Hima C
Sri C
Mona C
Hima Java
Sri Java
Mona Java
Hima Oracle
Sri Oracle
Mona Oracle
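The join types above can be compared side by side with sqlite3 (tables and rows invented; SQLite writes the outer join as LEFT OUTER JOIN):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE std (rno INTEGER, name TEXT, cid INTEGER)")
cur.execute("CREATE TABLE course (cid INTEGER, cname TEXT)")
cur.executemany("INSERT INTO std VALUES (?, ?, ?)",
                [(10, "Niru", 1), (20, "Hima", 2), (30, "Monica", 9)])
cur.executemany("INSERT INTO course VALUES (?, ?)",
                [(1, "C"), (2, "Java"), (3, "Oracle")])

# Equi-join: only rows whose cid values match on both sides.
equi = cur.execute(
    "SELECT name, cname FROM std, course WHERE std.cid = course.cid "
    "ORDER BY name").fetchall()

# Outer join: matched rows plus the unmatched std row (cname is NULL).
outer = cur.execute(
    "SELECT name, cname FROM std LEFT OUTER JOIN course "
    "ON std.cid = course.cid ORDER BY name").fetchall()

# Cross join: the Cartesian product, every std row paired with every course.
cross_count = cur.execute(
    "SELECT COUNT(*) FROM std CROSS JOIN course").fetchone()[0]
```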
SQL Views:
A view is a virtual table based on a select query.
The query may contain columns, computed columns, aliases and aggregate functions
from one or more tables.
The table on which the view is based is called the base table.
Views can be treated as stored queries.
The create command is used to create a view.
Its select statement generates the virtual table.
ADVANTAGES:
Views provide data security.
Views need little storage space.
There is no need to write the same query again and again to retrieve the same data.
A view is a stored query.
Views are dynamically updated.
The name of a view can be used anywhere a table can.
Views can restrict users to only specified columns and specified rows of a table.
Views may also be used to generate reports.
LIMITATIONS:
Views cannot be indexed.
Limitations on updating data through views.
To Create a View:
Syntax: Create view <view_name> as select <column list> from <table name> where <condition>;
Ex: Create view emp_dept_details as select empno, ename, job, dept.deptno, dname from
emp, dept where emp.deptno = dept.deptno;
To Run a View:
Select * from emp_dept_details;
Types of Views:
There are two types of views: simple views and complex views.
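A view's behaviour as a stored query, rather than stored data, can be seen with sqlite3 (tables and rows invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, deptno INTEGER)")
cur.execute("CREATE TABLE dept (deptno INTEGER, dname TEXT)")
cur.execute("INSERT INTO emp VALUES (1, 'Hima', 10)")
cur.execute("INSERT INTO dept VALUES (10, 'Sales')")

# The view stores the query, not the data; emp and dept stay the base tables.
cur.execute("""
    CREATE VIEW emp_dept_details AS
    SELECT empno, ename, dname
    FROM emp, dept
    WHERE emp.deptno = dept.deptno
""")
rows = cur.execute("SELECT * FROM emp_dept_details").fetchall()

# Views are dynamic: a new base-table row appears in the view automatically.
cur.execute("INSERT INTO emp VALUES (2, 'Sri', 10)")
rows_after = cur.execute("SELECT COUNT(*) FROM emp_dept_details").fetchone()[0]
```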
Ex: create a sequence named SEQSS that starts at 105, has a step of 1 and can take a maximum
value of 2000:
Create sequence seqss start with 105 increment by 1 maxvalue 2000;
Synonyms:
A synonym is an alias for a database object (table, view, procedure, function, package,
sequence, etc.). Synonyms may be used to reference the original object in SQL as well as in
PL/SQL.
They can be used to hide ownership and location of the database objects they refer to and
minimize the impact of moving or renaming the database objects.
There are two types of synonyms
Private: Private synonyms exist only in a specific user schema. The owner of the synonym
maintains control over availability to other users.
Public: A public synonym is available to all users in the database
Ex: SQL> create synonym D30 for EMPD30;
Now the command
Select * from D30;
retrieves the rows of EMPD30 through the synonym.
o Active state: In this state the transaction is being executed. This is the initial state of every
transaction.
o Partially committed state: When a transaction executes its final operation, it is said to be in
this state. After execution of all operations, the database system performs some checks.
o Failed state: If the execution of the transaction cannot proceed, the transaction enters the
failed state, and the system performs a rollback operation.
o Aborted state: The rolled-back transaction enters the aborted state, where the system
performs one of the following operations:
o Restart: If the transaction was aborted due to hardware or software failure, it can be
restarted; such a transaction is considered a new transaction.
o Kill: If the transaction was aborted due to some internal logical error or due to bad
input, the system can kill the transaction.
o Committed state: If a transaction executes all its operations successfully, it is said to be
committed. All its effects are now permanently made on the database system.
Concurrency Control:
o Simultaneous execution of transactions in a multi-user database system is known as
concurrency.
o Concurrency control is the process of managing these simultaneous operations.
o Simultaneous execution of transactions over a shared database can create several data
integrity and consistency problems. This situation occurs in a multi-user database
environment, where the database allows a number of users to perform their individual
operations.
Need for Concurrency Control:
Simultaneous execution of transactions over a shared database can create several data
integrity and consistency problems:
1. Lost updates
2. Uncommitted data
3. Inconsistent retrievals
Lost Updates: The lost update problem occurs when two concurrent transactions update the
same data element and one of the updates is lost (overwritten by the other transaction). It is a
kind of write/write conflict.
Uncommitted data (dirty read): - The uncommitted data problem occurs when two
transactions execute concurrently and the first transaction is rolled back after the second
transaction has already accessed the uncommitted data. It is a kind of write/read conflict
and violates the isolation property. This is also called the dirty-read problem.
Inconsistent retrievals: - Inconsistent retrievals occur when a transaction accesses data before
and after another transaction finishes working with that data. The problem is that the transaction
might read some data before it is changed and other data after it is changed, thereby
yielding inconsistent (wrong) results. This is a kind of read/write conflict.
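The lost update conflict can be replayed deterministically: both transactions read the same value before either writes, so the first write is silently overwritten. A sketch with an invented balance:

```python
# Deterministic replay of the lost-update interleaving: both transactions
# read the same balance before either writes, so T1's update is lost.
balance = 100

t1_read = balance          # T1: READ  balance -> 100
t2_read = balance          # T2: READ  balance -> 100 (before T1 writes!)
balance = t1_read + 50     # T1: WRITE balance = 150
balance = t2_read - 30     # T2: WRITE balance = 70, overwriting T1's write

lost_update_result = balance
serial_result = 100 + 50 - 30   # serial execution would give 120
```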
Locking Protocol:
o A locking protocol is a set of rules followed by every transaction of a DBMS; such a
protocol can ensure that only serializable, recoverable schedules are executed.
o A lock is basically a variable associated with a data item in the database.
o A lock can be placed by a transaction on a shared resource that it desires to use.
Locks:
Serializability is only a test of whether a given interleaved schedule is acceptable or has a
concurrency-related problem; locking is what enforces it.
The different types of locks, and how locking ensures serializability of executing
transactions, are as follows:
Binary lock: This locking mechanism has two states for a data item: locked or unlocked.
Multiple-mode lock: In this locking type each data item can be in three states: read-locked,
write-locked, or unlocked.
Many transactions in the database system never update the data values; such read-only
transactions can coexist with other transactions.
If a transaction is an updating transaction, that is, it updates data items, it has to ensure
that no other transaction can access (read or write) the data items that it wants to update.
Ex:
T1                  T2                  Lock Manager
Lock-S(P)                               Grant-S(P, T1)
Read(P)
Unlock(P)
Lock-X(Q)                               Grant-X(Q, T1)
Read(Q)
Q = Q + P
Write(Q)
Unlock(Q)
                    Lock-S(Q)           Grant-S(Q, T2)
                    Read(Q)
                    Unlock(Q)
                    Lock-X(P)           Grant-X(P, T2)
                    Read(P)
                    P = P + Q
                    Write(P)
                    Unlock(P)
Ex: Assume a set of transactions {T0, T1, T2, …, Tn}. T0 needs a resource X to complete its task.
Resource X is held by T1, and T1 is waiting for a resource Y, which is held by T2; T2 in turn is
waiting for resource Z, which is held by T0. Thus all the processes wait for each other to release
resources, and none of them can finish its task. This situation is known as a DEADLOCK.
Deadlocks are not healthy for a system. In case a system is stuck in a deadlock, the
transactions involved in the deadlock are either rolled back or restarted.
Deadlock Prevention:
A transaction requesting a new lock is aborted when there is the possibility that a deadlock can
occur. If the transaction is aborted, all changes made by this transaction are rolled back and all
locks obtained by the transaction are released. The transaction is then rescheduled for execution.
Deadlock prevention works because it avoids the conditions that lead to deadlock.
Wait-Die Scheme:
In this scheme, if a transaction requests a lock on a resource (data item) that is already held
with a conflicting lock by another transaction, one of two possibilities occurs:
If TS(Ti) < TS(Tj), that is, Ti, which is requesting the conflicting lock, is older than Tj, then Ti is
allowed to wait until the data item is available.
If TS(Ti) > TS(Tj), that is, Ti is younger than Tj, then Ti dies. Ti is restarted later, after a random
delay, but with the same timestamp.
Wound-Wait Scheme:
In this scheme, if a transaction requests a lock on a resource (data item) that is already held
with a conflicting lock by another transaction, one of two possibilities occurs:
If TS(Ti) < TS(Tj), then Ti forces Tj to be rolled back: Ti wounds Tj. Tj is restarted later, after a
random delay, but with the same timestamp.
If TS(Ti) > TS(Tj), then Ti is forced to wait until the resource is available.
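Both schemes are pure decision rules on timestamps, so they can be sketched as two small functions (timestamps are invented; a lower timestamp means an older transaction):

```python
# Sketch of the two timestamp-based schemes described above.
def wait_die(ts_requester, ts_holder):
    # Older requester waits; younger requester dies (is restarted later).
    return "wait" if ts_requester < ts_holder else "die"

def wound_wait(ts_requester, ts_holder):
    # Older requester wounds (rolls back) the holder; younger one waits.
    return "wound holder" if ts_requester < ts_holder else "wait"

old, young = 5, 9                    # illustrative timestamps

wd_old   = wait_die(old, young)      # older asks younger  -> wait
wd_young = wait_die(young, old)      # younger asks older  -> die
ww_old   = wound_wait(old, young)    # older asks younger  -> wound holder
ww_young = wound_wait(young, old)    # younger asks older  -> wait
```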
Deadlock Avoidance:
Aborting a transaction is not always a practical approach. Instead, deadlock avoidance
mechanisms can be used to detect any deadlock situation in advance. Methods like the
wait-for graph are available, but they are suitable only for systems where transactions are
lightweight and hold fewer instances of each resource.
Optimistic Locking: This strategy can be used when simultaneous transactions, or
collisions, are expected to be infrequent. (Pessimistic locking, by contrast, guarantees that
database changes are made safely.)
Optimistic locking can alleviate the problem of waiting for locks to be released.
Database Recovery:
During the life of a transaction, that is, after the start of a transaction but before the
transaction commits, several changes may be made to the database state; during such a
period the database may be in an inconsistent state.
Assume that a transaction transfers Rs. 2000/- from A's account to B's account (for
simplicity, no error checking is shown). The transaction may be written as
READ A
A=A-2000
WRITE A
Failure
READ B
B=B+2000
WRITE B
COMMIT
Recovery techniques are used to bring a database that does not satisfy the consistency
requirements back into a consistent state. If a transaction completes normally and commits,
then all changes made by the transaction are permanently registered in the database.
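The Rs. 2000 transfer above can be replayed with sqlite3: a simulated failure after WRITE A triggers a rollback, and the database returns to its consistent pre-transaction state (account names and balances invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE account (name TEXT, balance INTEGER)")
cur.executemany("INSERT INTO account VALUES (?, ?)",
                [("A", 5000), ("B", 1000)])
con.commit()

try:
    cur.execute("UPDATE account SET balance = balance - 2000 "
                "WHERE name = 'A'")                  # WRITE A done
    raise RuntimeError("failure before WRITE B")     # simulated crash
    cur.execute("UPDATE account SET balance = balance + 2000 "
                "WHERE name = 'B'")                  # (never reached here)
    con.commit()
except RuntimeError:
    con.rollback()      # recovery: undo the half-finished transfer

balances = dict(cur.execute("SELECT name, balance FROM account"))
```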
An abnormal termination of transaction may be due to several reasons including:
a) User may decide to abort the transaction issued by him/her
b) There might be a deadlock in the system
c) There might be a system failure.
Kinds of Failures
The kinds of failures that a transaction program can encounter during its execution are:
1. Software failures:
In such cases, a software error stops the execution of the current transaction (or all
transactions), thus losing the state of program execution and the state/contents
of the buffers.
2. Hardware failures:
Hardware failures occur when some hardware chip or disk fails, which may result in loss
of data. This may happen for many reasons: bad sectors may develop on the disk, or the
disk may crash. In all these cases, the database is left in an inconsistent state.
3. External failures:
A failure can also result from an external cause, such as fire, earthquake or flood. The
database must be duly backed up to avoid problems arising from such failures.
Database Errors
An error is said to have occurred if the execution of a command to manipulate the database
cannot be completed successfully, either due to inconsistent data or due to the state of the program.
Physical backups
Physical backups are copies of the physical files used in storing and recovering the
database, such as datafiles, control files, archived redo logs and log files.
A physical backup is a copy of the files storing database information to some other
location, such as disk or offline storage like magnetic tape.
Physical backups are the foundation of the recovery mechanism in the database.
A physical backup provides minute details about the transactions and modifications to the
database.
Logical backup
Logical Backup contains logical data which is extracted from a database.
It includes backup of logical data like views, procedures, functions, tables, etc.
It is a useful supplement to physical backups in many circumstances, but it is not
sufficient protection against data loss without physical backups, because a logical backup
provides only structural information.
Importance of Backups
Planning and testing backups helps guard against failures of media, the operating system,
software and any other kind of failure that causes a serious data crash.
It determines the speed and success of the recovery.
Physical backup extracts data from physical storage (usually from disk to tape). An
operating-system level file copy is an example of a physical backup.
Logical backup extracts data from the database using SQL and stores it in a binary file.
Logical backups are used to restore database objects into the database, so logical
backup utilities allow the DBA (Database Administrator) to back up and recover selected
objects within the database.
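The idea of a logical backup, extracting the database as SQL rather than copying its physical files, can be sketched with sqlite3's `iterdump()` (the table and data are illustrative):

```python
# Sketch: a logical backup as SQL text, then a restore into a fresh database.
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT)")
src.execute("INSERT INTO emp VALUES (1, 'Neha')")
src.commit()

# Logical backup: schema and rows extracted as SQL statements,
# not a copy of the raw datafiles.
dump = "\n".join(src.iterdump())

# Restore the backed-up objects into another database from the SQL dump.
dst = sqlite3.connect(":memory:")
dst.executescript(dump)
```

This is why a logical backup is selective (the DBA can dump chosen objects) but does not replace physical backups: it carries no physical file or log details.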
Recovery can be done by restoring the previous consistent state (backward recovery) or by
moving forward to the next consistent state as per the committed transactions (forward
recovery).
Backward Recovery (Undo):
In this scheme the uncommitted changes made by a transaction to the database are undone,
and the system is reset to the previous consistent state of the database, which is free
from errors.
[Diagram: UNDO uses before-images to turn the database with changes back into the database without changes.]
Forward Recovery (Redo):
In this scheme, the committed changes made by a transaction are reapplied to an earlier
copy of the database.
[Diagram: REDO uses after-images to reapply the changes to an earlier copy of the database.]
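Both schemes can be sketched with a log of before- and after-images, again using the Rs. 2000/- transfer as the example (the log format and item names are illustrative assumptions):

```python
# Sketch: backward (UNDO) and forward (REDO) recovery with a log of images.

def undo(db, log):
    """Backward recovery: restore before-images, newest write first."""
    for record in reversed(log):
        db[record["item"]] = record["before"]

def redo(db, log):
    """Forward recovery: reapply after-images in their original order."""
    for record in log:
        db[record["item"]] = record["after"]

# Log for the transfer of 2000 from A (5000) to B (1000).
log = [{"item": "A", "before": 5000, "after": 3000},
       {"item": "B", "before": 1000, "after": 3000}]

# A crash after WRITE A but before WRITE B leaves an inconsistent state:
crashed = {"A": 3000, "B": 1000}
undo(crashed, log)          # back to the consistent state before the transaction
```

After `undo`, the database is `{"A": 5000, "B": 1000}`; applying `redo` to that earlier copy yields the committed state `{"A": 3000, "B": 3000}`.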
Physical: The site or sites containing the computer system must be physically secured
against entry by unauthorized persons.
Human: Authorization is given to a user to reduce the chance of information leakage
and unwanted manipulation.
The Database Administrator (DBA) is responsible for implementing the database security
policies in a database system. The organization or data owners create these policies.
Database Security:
Database security refers to the collective measures used to protect and secure a database or
database management software from illegitimate use and malicious threats and attacks.
It is a broad term that includes a multitude of processes, tools and methodologies that
ensure security within a database environment.
Database and functions can be managed by two different modes of security controls:
1. Authentication
2. Authorization
Authentication:
o Database access usually requires user authentication and authorization. For user
authentication, the first level of security establishes that the person seeking system entry is
an authorized user.
o Authorization allows database users to access certain parts of the database. However,
before accessing the database, users need to identify themselves to the system so that
their identity can be verified.
o Authentication is used by a server when the server needs to know exactly who is accessing
its information or site.
o Authentication is used by a client when the client needs to know that the server is the
system it claims to be.
o In authentication, the user or computer has to prove its identity to the server or client.
o Usually, authentication by a server entails the use of a user name and password. Other
ways to authenticate can be through cards, retina scans, voice recognition, and fingerprints.
Authorization:
o Authorization is a process by which a server determines if the client has permission to use a
resource or access a file.
o Authorization is usually coupled with authentication so that the server has some concept of
who the client is that is requesting access.
o The type of authentication required for authorization may vary; passwords may be
required in some cases but not in others.
o In some cases, there is no authorization; any user may use a resource or access a file
simply by asking for it. Most web pages on the Internet require no authentication or
authorization.
Different types of access authorization may be allowed on a particular view, such as the following:
1. Read authorization: allows reading, but not modification, deletion or insertion of data.
2. Insert authorization: allows insertion of new data, but no modification of existing data.
3. Update authorization: allows modification of data, but not deletion.
4. Delete authorization: allows deletion of data.
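The four authorization types above can be sketched as a per-user grant table checked before each operation. A minimal sketch in Python (the users and their grants are hypothetical):

```python
# Sketch: checking read/insert/update/delete authorizations against grants.

GRANTS = {
    "clerk":   {"read", "insert"},                       # no update or delete
    "manager": {"read", "insert", "update", "delete"},   # full access
}

def is_authorized(user, action):
    """Return True if `user` holds the named authorization on the view."""
    return action in GRANTS.get(user, set())
```

For instance, a clerk may read and insert data but is refused delete, while a manager holds all four authorizations.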
[Diagram: Distributed database systems arise from the integration offered by database technology combined with the distribution offered by computer networks.]
Types of DDBMS:
DDBMS are basically divided into two types, they are
Homogeneous Distributed database:
All sites have identical software
All sites are aware of each other and agree to process user requests
Each site surrenders part of its autonomy in terms of the right to change schema or software
The system appears to the user as a single system
Heterogeneous Distributed database:
Different sites may use different schemas and software
Differences in schema are a major problem for query processing
Differences in software are a major problem for transaction processing.
[Diagram: A distributed database system. Each site (e.g. Calcutta, Delhi, Chennai, Bangalore, Mumbai) has its own CPU, memory and database, and the sites are connected through a communication network; in one arrangement a central site at Mumbai holds the main database.]
Advantages of DDBMS:
1. Data are located near the greatest demand site. The data in a distributed database system are
dispersed to match business requirements, which reduces the cost of data access.
2. Faster data access. End users often work with only a locally stored subset of the company’s
data.
3. Faster data processing. A distributed database system spreads out the system's workload by
processing data at several sites.
4. Growth facilitation. New sites can be added to the network without affecting the operations of
other sites.
5. Improved communications. Because local sites are smaller and located closer to customers,
local sites foster better communication among departments and between customers and company
staff.
Disadvantages of DDBMS:
1. Complexity of management and control. Applications must recognize data location, and they
must be able to stitch together data from various sites. Database administrators must have the
ability to coordinate database activities to prevent database degradation due to data anomalies.
2. Technological difficulty. Data integrity, transaction management, concurrency control,
security, backup, recovery, query optimization, access path selection, and so on, must all be
addressed and resolved.
3. Security. The probability of security lapses increases when data are located at multiple sites.
The responsibility of data management will be shared by different people at several sites.
4. Lack of standards. There are no standard communication protocols at the database level.
(Although TCP/IP is the de facto standard at the network level, there is no standard at the
application level.) For example, different database vendors employ different—and often
incompatible—techniques to manage the distribution of data and processing in a DDBMS
environment.
Data Replication:
Data Replication refers to the storage of data copies at multiple sites served by a
computer network. Fragment copies can be stored at several sites to serve specific
information requirements.
Data replication is the frequent electronic copying of data from a database on one computer
or server to a database on another, so that all users share the same level of information.
The result is a distributed database in which users can access data relevant to their tasks
without interfering with the work of others.
The implementation of database replication for the purpose of eliminating data ambiguity or
inconsistency among users is known as database synchronization.
Replicated data are subject to the mutual consistency rule. The mutual consistency rule requires
that all copies of data fragments be identical. Therefore, to maintain data consistency among the
replicas, the DDBMS must ensure that a database update is performed at all sites where replicas
exist.
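The mutual consistency rule can be sketched by performing every update at all sites where a replica exists (the sites, fragment name and data are illustrative):

```python
# Sketch: enforcing the mutual consistency rule across replicas.
# Every update is applied at all sites that hold a copy of the fragment.

replicas = {
    "Mumbai":  {"emp_fragment": {"0922": 30}},
    "Delhi":   {"emp_fragment": {"0922": 30}},
    "Chennai": {"emp_fragment": {"0922": 30}},
}

def replicated_update(replicas, fragment, key, value):
    """Perform the update at every site where a replica exists."""
    for site in replicas:
        replicas[site][fragment][key] = value

replicated_update(replicas, "emp_fragment", "0922", 31)
```

After the call, every replica holds the same value, so all copies of the fragment remain identical as the rule requires.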
Advantages of Replication:
Disadvantages of Replication:
Fully Replicated: A fully replicated database stores multiple copies of each database
fragment at multiple sites. In this case, all database fragments are replicated.
Partially Replicated: A partially replicated database stores multiple copies of some
database fragments at multiple sites. Most DDBMS are able to handle the partially
replicated database well.
Unreplicated: An unreplicated database stores each database fragment at a single site.
Snapshot Replication: Data on one server is simply copied to another server, or to another
database on the same server.
Merging Replication: Data from two or more databases is combined into a single database.
Transactional Replication: Users receive full initial copies of the database and then
receive periodic updates as data changes.
A Distributed database management system (DDBMS) ensures that changes, additions, and
deletions performed on the data at any given location are automatically reflected in the data
stored at all the other locations.
Data Fragmentation:
Data fragmentation allows us to break a single object into two or more segments or
fragments.
The object might be a user's database, a system database, or a table. Each fragment can
be stored at any site over a computer network.
Information about data fragmentation is stored in the distributed data catalog (DDC), from
which it is accessed by the TP to process user requests.
Data fragmentation strategies can be divided into three types:
1. Horizontal,
2. Vertical,
3. Mixed.
Horizontal Fragmentation: refers to the division of a relation into subsets (fragments) of tuples
(rows). Each fragment is stored at a different node, and each fragment has unique rows. However,
the unique rows all have the same attributes (columns). In short, each fragment represents
the equivalent of a SELECT statement, with the WHERE clause on a single attribute.
Vertical Fragmentation: refers to the division of a relation into attribute (column) subsets. Each
subset (fragment) is stored at a different node, and each fragment has unique columns, with the
exception of the key column, which is common to all fragments. This is the equivalent of the
PROJECT statement in SQL.
Mixed Fragmentation: refers to a combination of horizontal and vertical strategies. In other
words, a table may be divided into several horizontal subsets (rows), each one having a subset
of all attributes (columns).
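The three strategies can be sketched on a small relation: horizontal fragmentation keeps a subset of rows (SELECT ... WHERE), vertical fragmentation keeps a subset of columns plus the key (PROJECT). A minimal sketch in Python (the employee rows and predicate are illustrative):

```python
# Sketch: horizontal and vertical fragmentation of a small relation.

employees = [
    {"id": "0921", "name": "Ravi", "dept": "HR",    "age": 25},
    {"id": "0922", "name": "Neha", "dept": "HR",    "age": 30},
    {"id": "0923", "name": "Arun", "dept": "Sales", "age": 28},
]

def horizontal(rows, predicate):
    """Subset of tuples: the equivalent of SELECT ... WHERE predicate."""
    return [r for r in rows if predicate(r)]

def vertical(rows, columns, key="id"):
    """Subset of attributes, always keeping the key column: PROJECT."""
    return [{c: r[c] for c in [key, *columns]} for r in rows]

# Horizontal fragment: HR employees only (WHERE dept = 'HR').
hr_fragment = horizontal(employees, lambda r: r["dept"] == "HR")

# Vertical fragment: only the name column, plus the key for re-joining.
name_fragment = vertical(employees, ["name"])
```

A mixed fragment would simply apply `vertical` to the rows of `hr_fragment`; the shared key column is what makes the vertical fragments joinable again.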
Advantages of Fragmentation:
Horizontal
o Allows parallel processing on fragments of a relation
o Allows a relation to be split so that tuples are located where they are most frequently
accessed.
Vertical
o Allows tuples to be split so that each part of the tuple is stored where it is most
frequently accessed.
o Tuple-id attribute allows efficient joining of vertical fragments.
Vertical and Horizontal fragmentation can be mixed
o Fragments may be successively fragmented to an arbitrary depth.
[Diagram: A client sends a request to a server; the server sends back a response.]
A client makes a request for a service and receives a reply to that request.
A server receives and processes a request, and sends back the required response.
Client/server systems may follow two different architectures (models):
1. 2-Tier client/server model
2. 3- Tier client /server model
The two-tier model is based on client-server architecture. In a two-tier application, direct
communication takes place between client and server, with no intermediary between them.
Because of this tight coupling, a two-tier application runs faster.
[Diagram: Two-tier architecture, showing direct communication between client and server with no intermediary.]
The database and server are incorporated with each other, so this technology is called
"Client-Server Technology".
Business Layer:
In this layer all business logic is written, such as validation of data, calculations and
data insertion. It acts as an interface between the client layer and the data access layer.
This intermediary layer helps to make communication faster between the client and data layers.
Data Layer:
In this layer the actual database comes into the picture. The data access layer contains
methods to connect to the database and to insert, update, delete and get data from the
database based on the input data.
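The layering above can be sketched as plain functions: the client calls the business layer, which validates and then calls the data access layer, so the client never touches the database directly. A minimal sketch in Python (all names and the in-memory store are illustrative):

```python
# Sketch: three-tier layering. Only the data access layer touches the store.

DATABASE = {}                             # stand-in for the data layer's database

def data_insert(key, value):
    """Data access layer: the only code that touches the database."""
    DATABASE[key] = value

def business_add_employee(emp_id, name):
    """Business layer: validation and calculations before data access."""
    if not emp_id or not name:
        raise ValueError("employee id and name are required")
    data_insert(emp_id, name)             # delegate the actual write

# Client layer: issues its request through the business layer only.
business_add_employee("0922", "Neha")
```

Because validation lives in the middle layer, an invalid request (say, an empty id) is rejected before any database access, which is the integrity and security benefit listed below.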
Advantages:
1. High performance, lightweight persistent objects
2. High degree of flexibility in deployment platform and configuration.
3. Improved data integrity and improved security (the client does not directly access the database).
Disadvantage:
1. Increased complexity/effort.