What is Data?
Data is a raw collection of facts about people, places, things, etc. That means it is unprocessed
data. All business activities deal with large amounts of data. Data is recognized as an important
asset because it is the raw material from which information is derived. Business data is likely to
include facts about employees, customers and products.
What is information?
Information is processed data. It is a meaningful form of data, derived from data. Data is
recognized as an important asset because it is the raw material from which information is
derived.
Data                                    Information
Data is a raw collection of facts       Information is meaningful data in an organized form
Data exists with the user               Information is required by the user
Data is unprocessed                     Information is processed data
What is DBMS?
A collection of logically related data in an organized form, together with a set of application
programs/software, is called a Database Management System (DBMS). (Or) A collection of
logically interrelated files and a set of programs is called a Database Management System
(DBMS). It allows us to create databases.
What is database?
A collection of logically related data in an organized form is called a database. (Or) A collection
of logically interrelated files is called a database. A database allows us to create objects such as
tables, views, indexes, procedures, etc.
What is field?
A character or group of characters that has a specific meaning is called a field. In a database, we
store data in the form of tables; a table consists of rows and columns. Rows are called records
or tuples, whereas columns are called fields or attributes.
-NSV Degree Colleges, Jagtial
B.Com (CA) IV - Semester Relational Database Management System
What is record?
A logically connected set of one or more fields that describes a person, place or thing.
In a DBMS, records are also called rows, tuples or entity instances.
Ex (one record): 0922  Neha  HR  30
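The notions of field (column) and record (row) can be seen directly in code. Below is a minimal sketch using SQLite through Python's sqlite3 module; the employee table and its values are invented for illustration:

```python
import sqlite3

# A hypothetical employee table: each column is a field (attribute),
# each row is a record (tuple).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (empno TEXT, ename TEXT, dept TEXT, age INTEGER)")
cur.execute("INSERT INTO employee VALUES ('0922', 'Neha', 'HR', 30)")

cur.execute("SELECT * FROM employee")
# The column names are the fields of the table.
fields = [d[0] for d in cur.description]
print(fields)            # ['empno', 'ename', 'dept', 'age']

# Each fetched row is one record (tuple).
record = cur.fetchone()
print(record)            # ('0922', 'Neha', 'HR', 30)
conn.close()
```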
Explanation:
In the above diagram, Program-A, Program-B and Program-C access the "Employee Master
File". This file contains field descriptions such as Empno, Ename, Sal, Ta, Hra, etc., and these
descriptions are stored inside Program-A, Program-B and Program-C themselves. If we want to
change the field size of Ename, we must change it in Program-A, Program-B and Program-C.
2. Data Redundancy:
Storing the same data in many places is called data redundancy.
Data redundancy occurs when data is duplicated across files.
This duplication of data (redundancy) leads to higher storage and access costs.
In addition, it may lead to data inconsistency.
In the traditional file approach, data is stored in FLAT FILES. A flat file contains records that
have "no structured interrelationship"; such files are maintained by the FILE SYSTEM under the
operating system's control.
3. Limited Data Sharing
Each department maintains its own files.
The users of one department can access only that department's files; they cannot access
the files of other departments.
Ex: Users in the orders department can access only the orders department's files; they
cannot access the files of other departments such as the payroll and accounts
departments.
4. Lengthy Development time
It takes a lot of time to develop a file management system.
Each new application begins with designing the file format and file descriptions and
writing the file-access logic.
Compared with the demands of the modern business environment, an FMS takes a lot of
time to develop.
5. Excessive Program maintenance
The cost of maintaining the application programs of an FMS (File Management System)
is very high.
Up to 80% of the information systems budget may be used for program maintenance.
If the data changes, the programs have to be modified, and each new application must
be started from scratch by designing new file descriptions.
Advantages of DBMS
Database: A collection of logically related data in an organized form is called a database. A
database can share data among different users, who can use the data for different purposes.
Data in a database:
Is integrated
Can be shared
Can be concurrently accessed.
DBMS: A collection of logically related data in an organized form, together with a set of
programs, is called a Database Management System (DBMS).
A DBMS allows users to create, access and modify the database.
The primary goal of a DBMS is to provide a convenient and efficient way to store, retrieve
and modify information.
The advantages are:
1. Program-data independence.
2. Minimal data redundancy.
3. Improved data consistency.
4. Improved data sharing.
5. Improved data accessibility.
6. Improved data quality.
7. Reduced program maintenance.
1. External
2. Conceptual
3. Internal
4. Physical
The ANSI/SPARC framework has been expanded with the addition of a physical model to
explicitly address physical-level implementation details of the internal model.
EXTERNAL MODEL:
The external model is the end user's view of the data environment.
It represents a subset of the database.
End users use application programs to manipulate the data and generate information.
End users use applications to perform specific tasks in their environment. In general,
companies are divided into several units such as sales, finance and marketing.
Each unit has its own requirements and constraints.
All departments use the same data in an organization.
ERDs are used to represent external views. An external view is also known as an external
schema.
[Figure: end users access external views; the external views are integrated into the conceptual
model; the conceptual model maps to the internal model (logical independence), which in turn
maps to the physical model (physical independence).]
CONCEPTUAL MODEL:
Integrating all external views (entities, relationships and constraints) into a single,
organization-wide view is called the conceptual model.
INTERNAL MODEL:
Once a specific DBMS has been selected, the internal model maps the conceptual model
to that DBMS.
The internal model is the representation of the database as "seen" by the DBMS.
The internal model depends on the software; therefore, a change in DBMS software
requires a change in the internal model.
Being able to change the internal model without affecting the conceptual model is called
LOGICAL INDEPENDENCE.
PHYSICAL MODEL:
It describes how the data are stored on disk and how they can be accessed.
The physical model requires the definition of both the physical storage devices and the
access methods.
This model is both software- and hardware-dependent.
Being able to change the physical model without affecting the internal model is called
PHYSICAL INDEPENDENCE.
It is also called the physical schema.
The physical architecture describes the software components used to enter and
process data, and how these software components are related and interconnected.
It is possible to identify a number of key functions that are common to most
database management systems.
All database management systems have two basic sets of languages: the data
definition language (DDL) and the data manipulation language (DML).
Based on these functions, the database system may be partitioned into the
following modules.
DDL Compiler:
The DDL compiler converts data definition statements into a set of tables
containing metadata.
The DDL contains the set of commands required to define the format of the data that is
being stored.
These metadata tables contain information about the database and are in a form that
can be used by the other components of the DBMS.
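As a sketch of the idea, SQLite exposes its metadata catalog as the sqlite_master table, which the DDL compiler populates when a table is defined (the student table here is invented; other DBMSs use their own catalog names):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A DDL statement: the compiler turns this definition into catalog entries.
conn.execute("CREATE TABLE student (sno INTEGER PRIMARY KEY, name TEXT)")

# The table's definition is now stored as metadata, queryable like any data.
row = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE name = 'student'"
).fetchone()
print(row)   # ('table', 'student')
conn.close()
```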
DML Compiler:
The data manipulation language defines the set of commands that modify and
process data to create user-definable output.
DML statements can also be written inside an application program as normal procedural
calls in the host language.
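A minimal sketch of DML statements embedded in a host language, as the text describes (Python with SQLite here; the emp table and figures are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, sal INTEGER)")

# DML statements issued as ordinary calls from the host language:
conn.execute("INSERT INTO emp VALUES (1, 'Smith', 3000)")          # insert
conn.execute("UPDATE emp SET sal = sal + 500 WHERE empno = 1")     # modify
new_sal = conn.execute(
    "SELECT sal FROM emp WHERE empno = 1"                          # retrieve
).fetchone()[0]
print(new_sal)   # 3500
conn.close()
```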
File Manager:
The file manager manages the allocation of space on disk storage.
It establishes and maintains the list of structures and indexes defined in the internal
schema.
However, the file manager does not directly manage the physical input and output
of data.
Entity Instances:
A single occurrence of an entity type is called an entity instance. It is also called a record or
tuple.
Employee:
EMPNO ENAME JOB CITY
3200 Nisha Manager Hyd
3201 Neha Clerk Pune
Attribute and its types:
An attribute is a characteristic or property of an entity; it is also called a field.
Attribute names start with a capital letter followed by lowercase letters. An entity can have
multiple attributes.
The following are the types of attributes:
1. Mandatory / Required attribute
2. Optional attribute
3. Single value attribute
4. Multi value attribute
5. Simple attribute
6. Composite attribute
7. Derived attribute
8. Stored attribute
9. Identifier attribute
Mandatory / Required Attribute:
An attribute that must contain a value.
Ex: Date of birth
Optional Attribute:
An attribute that may or may not have a value.
Ex: E-mail id
Single Valued Attribute:
An attribute that can have only one value.
Ex: Admin.no, S.No
Multi Valued Attribute:
An attribute that may have more than one value for a given entity instance.
Ex: The "Languages known" attribute may take multiple values for each record; that is, a
student may know more than one language.
Languages Known    Skills
English            C
Telugu             Java
Hindi              Oracle
English            C
Telugu             Java
Hindi              CPP
Simple Attribute:
It is an attribute that cannot be broken down into smaller components.
Ex:
Sno
10
20
30
Composite Attribute:
It is an attribute that can be broken down into two or more components.
Ex:
Name                First Name    Last Name
Peter Chen          Peter         Chen
Sachin Tendulkar    Sachin        Tendulkar
Derived Attribute:
An attribute whose value can be calculated from related attribute values.
Ex: TOTAL
The Total attribute is a derived attribute that can be calculated from the other attributes
M1, M2 and M3.
Stored Attribute:
An attribute stored in the database to supply values for derived attributes.
Ex: M1, M2 and M3 are stored attributes that supply the values used to calculate the
"AVERAGE" attribute.
M1   M2   M3
95   90   80
60   80   60
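The stored/derived distinction can be sketched in a few lines: M1, M2 and M3 are stored, while TOTAL and AVERAGE are computed on demand (marks taken from the table above):

```python
# Stored attributes: m1, m2, m3 are physically kept in the database.
rows = [
    {"m1": 95, "m2": 90, "m3": 80},
    {"m1": 60, "m2": 80, "m3": 60},
]

# Derived attributes: total and average are calculated, not stored.
for r in rows:
    r["total"] = r["m1"] + r["m2"] + r["m3"]
    r["average"] = r["total"] / 3

print(rows[0]["total"])   # 265
```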
Identifier Attribute:
An attribute used to identify a row uniquely.
Ex: S.No is an identifier attribute that identifies a row uniquely. It can also be referred to as a
key attribute.
Relational Constraints
Every relation has some conditions that must hold for it to be a valid relation. These conditions are
called Relational Integrity Constraints. There are three main integrity constraints −
Key constraints
Domain constraints
Referential integrity constraints
Key Constraints
Keys are attributes or sets of attributes that uniquely identify an entity within its entity set.
An Entity set E can have multiple keys out of which one key will be designated as
the primary key.
Primary Key must have unique and not null values in the relational table.
In a relation with a key attribute, no two tuples can have identical values for key attributes.
A key attribute cannot have NULL values.
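A small sketch of the key constraint in action, using SQLite via Python's sqlite3 module (schema and values are invented): a duplicate primary-key value is rejected by the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The primary key combines the unique and not-null constraints.
conn.execute("CREATE TABLE student (sno INTEGER PRIMARY KEY NOT NULL, name TEXT)")
conn.execute("INSERT INTO student VALUES (1001, 'Smith')")

duplicate_rejected = False
try:
    # Same key value again: violates the key constraint.
    conn.execute("INSERT INTO student VALUES (1001, 'Scott')")
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)   # True
conn.close()
```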
Domain Constraints
Domain constraints specify what set of values an attribute can take. The value of each
attribute X must be an atomic value from its domain.
The data types associated with domains include integer, character, string, date, time,
currency, etc. An attribute value must come from the corresponding domain.
Referential integrity constraints work on the concept of Foreign Keys. A foreign key is a key
attribute of a relation that can be referred in other relation.
Referential integrity constraint states that if a relation refers to a key attribute of a different
or same relation, then that key element must exist.
This rule states that if a foreign key in Table 1 refers to the primary key of Table 2, then every
value of the foreign key in Table 1 must either be null or be available in Table 2.
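A minimal sketch of referential integrity, again with SQLite (note that SQLite requires foreign-key enforcement to be switched on explicitly; the dept/emp schema is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, "
             "deptno INTEGER REFERENCES dept(deptno))")
conn.execute("INSERT INTO dept VALUES (10)")
conn.execute("INSERT INTO emp VALUES (1, 10)")       # ok: 10 exists in dept

fk_violation = False
try:
    conn.execute("INSERT INTO emp VALUES (2, 99)")   # 99 does not exist in dept
except sqlite3.IntegrityError:
    fk_violation = True
print(fk_violation)   # True
conn.close()
```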
Relational Algebra
Relational algebra is a formal language describing how new relations are created from
old ones.
It is a useful tool for describing queries on a database management system.
Each row in a table represents one tuple of the relation, and each column one
attribute.
Operations in Relational Algebra:
1. Union (∪): Union combines all rows from two tables, removing duplicate rows. The tables must
have the same attribute characteristics.
Notation: T1 ∪ T2
Ex: SQL> select sname from T1 union select sname from T2;
2. Intersect (∩): It displays only the rows that appear in both tables, i.e. the common rows.
Notation: T1 ∩ T2
3. Difference (Minus): It gives all rows from one table that are not found in the other table; that
is, it subtracts one table from the other.
Notation: T1 − T2 (or T2 − T1)
4. Product (×): Product displays all possible pairs of rows from two tables. It is also known as the
Cartesian product. Ex: If the first table has three rows and the second table has two rows, then
the product gives a list of 3 × 2 = 6 rows.
Notation: T1 × T2
Ex: SQL> select * from T1, T2;
5. Divide (÷): Division is a binary operator. It helps to process more than one relation,
matching source-relation tuples to target-relation tuples through their common
attributes.
Ex: R1 ÷ R2
6. Select (σ): The select operator gives the required information as a horizontal subset of a
table; that is, it gives records (rows).
Notation: σ subject = "database" (Books)
7. Project (π): Project displays a vertical subset of a table; that is, it gives chosen columns only.
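The operations above can be sketched on relations represented as Python sets of tuples (attribute names and values are invented):

```python
# Two relations over the attributes (sname, age).
t1 = {("Smith", 30), ("Scott", 25), ("Adams", 30)}
t2 = {("Scott", 25), ("Jones", 40)}

union = t1 | t2                              # all rows, duplicates removed
intersect = t1 & t2                          # rows common to both
difference = t1 - t2                         # rows in t1 but not in t2
product = {a + b for a in t1 for b in t2}    # Cartesian product: 3 x 2 = 6 rows

# Select (sigma): horizontal subset -- rows satisfying a condition.
select = {row for row in t1 if row[1] == 30}
# Project (pi): vertical subset -- chosen columns only.
project = {(name,) for (name, age) in t1}

print(len(union), len(product))   # 4 6
```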
One-to-one relationship: In this relationship, one instance of one entity is related to exactly
one instance of another entity.
Example: One person (P1, P2, P3, P4) can sit on one chair at any point in time, and one
chair (C1, C2, C3, C4) can accommodate at most one person at any given time. In this
relationship, both participating entities have a one-to-one relationship.
One-to-many relationship: One instance of an entity is related to multiple instances of another
entity.
Example 1: One organization (O1, O2, O3) can have many employees, but one employee
(e1, e2, e3, e4, e5) can work for only one organization.
Many-to-many relationship: In this relationship, multiple instances of one entity are related to
multiple instances of another entity.
Example 1: One student (s1, s2, s3, s4) is enrolled in many courses (c1, c2, c3, c4), and one
course is enrolled in by many students.
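These cardinalities translate directly into table designs. Below is a hedged sketch of the many-to-many case, resolved with a junction table in SQLite (schema and data invented); a 1:N relationship would use a plain foreign key on the "many" side, and 1:1 a unique foreign key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid TEXT PRIMARY KEY);
CREATE TABLE course  (cid TEXT PRIMARY KEY);
CREATE TABLE enrollment (              -- junction table resolving M:N
    sid TEXT REFERENCES student(sid),
    cid TEXT REFERENCES course(cid),
    PRIMARY KEY (sid, cid)
);
INSERT INTO student VALUES ('s1'), ('s2');
INSERT INTO course  VALUES ('c1'), ('c2');
INSERT INTO enrollment VALUES ('s1','c1'), ('s1','c2'), ('s2','c1');
""")
# One student enrolled in many courses; one course has many students.
n = conn.execute("SELECT COUNT(*) FROM enrollment WHERE sid = 's1'").fetchone()[0]
print(n)   # 2
conn.close()
```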
Degree of Relationship:
The number of entities participating in a relationship is called the "degree of
relationship". Based on degree, relationships can be classified into three types:
1. Unary: A relationship between the instances of a single entity is called a unary
relationship; it is also called a recursive relationship. Its degree of relationship is one.
2. Binary: A binary relationship is a relationship between the instances of two entities; this is
the most common relationship in data modeling. Its degree of relationship is two.
A database is a collection of entities, where an entity is a set of related data arranged in
the form of a table, i.e. a collection of rows and columns. A database table includes the
following kinds of keys:
1. Primary key
2. Composite key
3. Candidate key
4. Foreign key
5. Secondary key
Primary Key: The combination of the unique and not-null constraints is considered a primary
key. It cannot accept duplicate or null values. A database table can have at most one primary
key. The primary key enforces the ENTITY INTEGRITY constraint.
SNO (Primary Key)   Name    Address
1001                Smith   New York
1002                Scott   London
Composite Key: A primary key that consists of a combination of attributes is known as a
"composite key". (Or) When a table's primary key is made up of more than one attribute, it is
considered a composite key.
Candidate Key: A candidate key is a set of one (single) or more (composite) attributes that can
uniquely identify a row. A table can have multiple candidate keys, one of which is designated
as the primary key.
Secondary Key: A non-primary search key is known as a "secondary key". If you don't know
the primary key value, some other attribute or combination of attributes may be used to find
the record. (Or) Any other (non-primary-key) attribute or combination of attributes is called a
"secondary key".
More about Entities and Relationships:
The basic features of E-R diagrams are sufficient for designing many database situations.
However, with more complex relations and advanced database applications, it is
necessary to move to the enhanced features of E-R models.
EERM stands for extended or enhanced entity relationship model.
EERM is the result of adding more semantic constructs to the original entity
relationship (ER) model.
An EERM diagram representing organizational data is called an EERD
(extended/enhanced entity relationship diagram).
EER model constructs include:
Entity clustering: An entity cluster is a virtual entity type used to represent multiple
entities and relationships in the ERD. It is formed by combining multiple
interrelated entities into a single abstract entity object.
Generalization:
Generalization is the process of defining more general entity types from a set of more
specialized ones. It is the bottom-up process of identifying a parent object from child objects.
Generalization is based on grouping the common characteristics and relationships of the
subtypes.
Example: [Figure] EERM diagram with a supertype and subtype relationship for the EMPLOYEE
supertype (attributes Empno, Ename, Address) with two subtypes: Full Time Employee (Salary,
Hra) and Part Time Employee (Hourly_salary).
Specialization
Specialization is a process opposite to generalization. In specialization, a group of
entities is divided into sub-groups based on their characteristics. Take the group Person, for
example: a person has a name, date of birth, gender, etc., and these properties are common
to all persons.
Example: [Figure] EERM diagram with a supertype and subtype relationship for the PRODUCT
supertype (attributes Product Id, Pname, Quantity, Cost) with two subtypes: CUSTOMER and
SUPPLIER.
Functional Dependency:
A dependency between attributes is called a "functional dependency". Functional
dependencies are classified into two types:
1. Partial Dependency: a dependency in which a non-key attribute depends on only part
of a composite primary key.
2. Transitive Dependency: a dependency on an attribute that is not part of the primary
key. In the example above, group and fee are non-primary-key attributes, and fee
depends on group.
Normalization
1st NORMAL FORM → (remove partial dependency) → 2nd NORMAL FORM → (remove
transitive dependency) → 3rd NORMAL FORM → ... → 4th NORMAL FORM
Consider the following table with anomalies. The table has the multi-valued attribute "skills".
Let us decompose the table into structured tables step by step using normalization.
Explanation: The above table (student) is in first normal form (1NF) because it contains no
multi-valued attributes. But it is not in second normal form (2NF) because it has a partial
dependency: Name, Group and Fee are functionally dependent on part (Sno) but not all of
the composite primary key.
2. Second Normal Form (2NF):
Conditions:
1. It is in first normal form (1NF), and
2. No partial dependency exists between non-key attributes and key attributes.
Solution: To bring the above table (student) to 2NF, we have to remove all the partial
dependencies. To do this, we split the student table into two separate tables:
STUDENT and SKILL.
Explanation: The above tables are in second normal form (2NF) because they are in first
normal form and every non-key attribute is fully functionally dependent on the primary key.
But they are not yet in 3NF.
Explanation: The above tables (STUDENT, SKILL and GROUP) are in third normal form
(3NF) because they are in second normal form and have no transitive dependencies.
Boyce-Codd Normal Form (BCNF):
A relation is in BCNF if and only if every determinant in the relation is a candidate key;
here we need to remove the anomalies resulting from functional dependencies. A relation in
3NF can be converted to relations in BCNF using a simple two-step process:
1. Modify the relation so that the determinant that is not a candidate key becomes a
component of the primary key of the revised table.
2. Decompose the relation to eliminate the resulting partial functional dependency.
Consider the following table
SID SUBJECT FACULTY MARKS
Now, Subject (non-key) is functionally dependent on Faculty (key). Thus there exists a
partial functional dependency, and the determinant becomes part of the composite
primary key.
According to step 2, it can be decomposed to satisfy BCNF.
DECOMPOSITION
Decomposition is the process of breaking a relation down into parts.
It replaces a relation with a collection of smaller relations.
It breaks one table into multiple tables in a database.
It should always be lossless, which guarantees that the information in the original relation
can be accurately reconstructed from the decomposed relations.
If a relation is not decomposed properly, it may lead to problems such as loss of
information.
Properties of Decomposition
Following are the properties of Decomposition,
1. Lossless Decomposition
2. Dependency Preservation
3. Lack of Data Redundancy
1. Lossless Decomposition
Decomposition must be lossless, meaning that no information is lost from the relation that is
decomposed.
This guarantees that joining the decomposed relations yields the same relation that was
decomposed.
Example:
Let 'E' be a relation schema with instance 'e', decomposed into E1, E2, E3, ..., En with
instances e1, e2, e3, ..., en. If e1 ⋈ e2 ⋈ e3 ⋈ ... ⋈ en = e, then it is called a 'lossless join
decomposition'.
In other words, if the natural join of all the decomposed relations gives back the original
relation, the decomposition is said to be lossless.
Decompose the above relation into two relations to check whether the decomposition is
lossless or lossy.
Now, we have decomposed the relation into Employee and Department.
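The lossless-join test can be sketched in plain Python: decompose, natural-join the pieces on the common attribute, and compare with the original relation (data invented):

```python
# Original relation: (empno, ename, deptno, dname).
emp = [
    ("E1", "Neha", "D1", "Sales"),
    ("E2", "Ravi", "D2", "HR"),
]

# Decompose on the common attribute deptno.
r1 = {(e, n, d) for (e, n, d, dn) in emp}    # Employee(empno, ename, deptno)
r2 = {(d, dn) for (e, n, d, dn) in emp}      # Department(deptno, dname)

# Natural join r1 |x| r2 on deptno.
joined = {(e, n, d, dn) for (e, n, d) in r1 for (d2, dn) in r2 if d == d2}

# Lossless iff the join reproduces the original relation exactly.
lossless = joined == set(emp)
print(lossless)   # True
```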
2. Dependency Preservation
Databases are stored in file formats, which contain records. At physical level, the actual data is
stored in electromagnetic format on some device. These storage devices can be broadly
categorized into three types −
Primary Storage − The memory storage that is directly accessible to the CPU comes under this
category. CPU's internal memory (registers), fast memory (cache), and main memory (RAM) are
directly accessible to the CPU, as they are all placed on the motherboard or CPU chipset.
Secondary Storage − Secondary storage devices are used to store data for future use or as backup.
Secondary storage includes memory devices that are not a part of the CPU chipset or
motherboard, for example, magnetic disks, optical disks (DVD, CD, etc.), hard disks, flash drives,
and magnetic tapes.
Tertiary Storage − Tertiary storage is used to store huge volumes of data. Since such storage
devices are external to the computer system, they are the slowest in speed. These storage devices
are mostly used to take the back up of an entire system. Optical disks and magnetic tapes are
widely used as tertiary storage.
Magnetic Disks
Hard disk drives are the most common secondary storage devices in present computer
systems. These are called magnetic disks because they use the concept of magnetization to
store information.
Hard disks consist of metal disks coated with magnetizable material. These disks are placed
vertically on a spindle. A read/write head moves in between the disks and is used to
magnetize or de-magnetize the spot under it. A magnetized spot can be recognized as 0
(zero) or 1 (one).
Hard disks are formatted in a well-defined order to store data efficiently. A hard disk plate has
many concentric circles on it, called tracks. Every track is further divided into sectors. A sector on
a hard disk typically stores 512 bytes of data.
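Given the geometry above, the capacity of one disk surface is tracks × sectors per track × 512 bytes. A back-of-envelope sketch (track and sector counts are invented):

```python
# Hypothetical geometry for one disk surface.
tracks_per_surface = 10_000
sectors_per_track = 400
bytes_per_sector = 512          # typical sector size, as stated above

capacity = tracks_per_surface * sectors_per_track * bytes_per_sector
print(capacity // (1024 * 1024), "MiB")   # 1953 MiB
```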
RAID 1: RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends a copy
of data to all the disks in the array. RAID level 1 is also called mirroring and provides 100%
redundancy in case of a failure.
RAID 2: RAID 2 records an Error Correction Code using Hamming distance for its data, striped on
different disks. Like level 0, each data bit in a word is recorded on a separate disk, and the ECC
codes of the data words are stored on a different set of disks. Due to its complex structure and
high cost, RAID 2 is not commercially available.
RAID 3
RAID 3 stripes the data onto multiple disks. The parity bit generated for each data word is
stored on a different disk. This technique makes it possible to overcome single-disk failures.
RAID 4
In this level, an entire block of data is written onto data disks and then the parity is generated and
stored on a different disk. Note that level 3 uses byte-level striping, whereas level 4 uses block-
level striping. Both level 3 and level 4 require at least three disks to implement RAID.
RAID 5
RAID 5 writes whole data blocks onto different disks, but the parity bits generated for each
data-block stripe are distributed among all the data disks rather than being stored on a
separate dedicated disk.
RAID 6
RAID 6 is an extension of level 5. In this level, two independent parities are generated and stored
in distributed fashion among multiple disks. Two parities provide additional fault tolerance. This
level requires at least four disk drives to implement RAID.
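The parity idea behind RAID 3-6 is XOR: the parity block is the XOR of the data blocks, so any single lost block can be rebuilt from the survivors. A minimal sketch with invented values:

```python
# Three data blocks (tiny invented bit patterns stand in for real blocks).
d1, d2, d3 = 0b1010, 0b0110, 0b1100

# Parity is the XOR of all data blocks.
parity = d1 ^ d2 ^ d3

# Suppose the disk holding d2 fails: rebuild it from the survivors and parity.
rebuilt_d2 = d1 ^ d3 ^ parity
print(rebuilt_d2 == d2)   # True
```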
File organization:
Databases are used to store information. The principal operations performed on a
database relate to creating data, retrieving data, and modifying or deleting information
that is no longer useful or valid.
Databases store information in the form of files of records, typically kept on magnetic
disks.
This unit focuses on file organization in a DBMS, the access methods available, and the
system parameters associated with them.
File organization is the way files are arranged on the disk; the access method is how the
data can be retrieved based on that file organization.
A database contains lots of data, grouped into related groups called tables. Each table
holds many related records. Users see these records as tables on the screen, but the
records are stored as files in memory.
Storing the files in a certain order is called file organization. The main objectives of file
organization are:
o Optimal selection of records, i.e. records should be accessed as fast as possible.
o Any insert, update or delete transaction on records should be easy and quick, and
should not harm other records.
o No duplicate records should be introduced as a result of an insert, update or delete.
o Records should be stored efficiently so that the cost of storage is minimal.
In the diagram above, R1, R2, R3, etc. are the records. Each contains all the attributes of one
row; i.e. a student record will have the student's id, name, address, course, DOB, etc. So R1,
R2, R3, etc. can each be considered one full set of attribute values.
In the second method, records are kept sorted (either ascending or descending) as they
are inserted into the system. This is called the sorted file method. Sorting of records
may be based on the primary key or on any other column. Whenever a new record is
inserted, it is first appended at the end of the file, then the file is sorted (ascending or
descending) on the key value so that the record is placed in the correct position. In the
case of an update, the record is updated and the file is then re-sorted to place the
updated record in the right place. The same applies to delete.
This (the heap file) is the simplest form of file organization. Records are inserted at the
end of the file as and when they arrive; there is no sorting or ordering of the records.
Once a data block is full, the next record is stored in a new block. This new block need
not be the very next block.
This method can select any block in memory to store new records. It is similar to the
pile file in the sequential method, but here data blocks are not selected sequentially;
they can be any data blocks in memory.
It is the responsibility of the DBMS to store and manage the records.
When a record has to be retrieved from the database, in this method we need to traverse
from the beginning of the file until we reach the requested record. Hence, fetching records
from very large tables is time-consuming. This is because there is no sorting or ordering of
the records: we need to check all the data.
Similarly, if we want to delete or update a record, we first need to search for it. Searching
for a record is similar to retrieving it: start from the beginning of the file and scan until the
record is found. A small file can be searched quickly, but the larger the file, the more time
is spent fetching.
In addition, when a record is deleted, it is removed from its data block, but the space is
not freed and cannot be re-used.
Advantages of Heap File Organization
It is a very good method of file organization for bulk insertion; i.e. when a huge amount
of data needs to be loaded into the database at one time, this method is best suited,
since records are simply inserted one after the other into the memory blocks.
It is suited to very small files, where fetching records is fast. As the file size grows,
the linear search for a record becomes time-consuming.
Disadvantages of Heap File Organization
This method is inefficient for larger databases, as it takes time to search for or modify a record.
Proper memory management is required to maintain performance; otherwise many unused
memory blocks accumulate and the memory footprint simply keeps growing.
In this method of file organization, a hash function is used to calculate the address of the
block in which to store a record. The hash function can be any simple or complex
mathematical function.
The hash function is applied to some columns/attributes -- either key or non-key columns --
to get the block address.
Each record is therefore stored at a location independent of the order in which records
arrive, so this method is also known as direct or random file organization.
If the hash function is computed on a key column, that column is called the hash key; if
it is computed on a non-key column, the column is called the hash column.
When a record has to be retrieved, the address is generated from the hash key column
and the whole record is fetched directly from that address; there is no need to traverse
the whole file.
Similarly, when a new record has to be inserted, the address is generated from the hash
key and the record is inserted directly. The same applies to update and delete: there is
no need to search the entire file or to sort the files. Each record is stored at its hashed
location in memory.
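A hedged sketch of hash file organization: a hash function maps the hash-key value straight to a block address, so no file scan is needed (block count and records are invented):

```python
# Blocks are modeled as lists keyed by block address.
NUM_BLOCKS = 8
blocks = {i: [] for i in range(NUM_BLOCKS)}

def block_address(key):
    # Any hash function works; here the built-in hash modulo block count.
    return hash(key) % NUM_BLOCKS

def insert(record):                      # record = (hash key, payload)
    blocks[block_address(record[0])].append(record)

def fetch(key):                          # direct access: no full-file scan
    for rec in blocks[block_address(key)]:
        if rec[0] == key:
            return rec
    return None

insert((1001, "Smith"))
insert((1002, "Scott"))
print(fetch(1002))   # (1002, 'Scott')
```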
These types of file organizations are useful in online transaction systems, where retrieval or
insertion/updation should be faster.
Advantages of Hash File Organization
Records need not be sorted after any transaction, so the effort of sorting is
reduced in this method.
Since the block address is known from the hash function, accessing any record is very
fast. Similarly, updating or deleting a record is also very quick.
This method can handle multiple transactions, as each record is independent of the
others; i.e. since there is no dependency between records' storage locations, multiple
records can be accessed at the same time.
It is suitable for online transaction systems like online banking, ticket booking system etc.
Disadvantages of Hash File Organization
Since all the records are stored at effectively random addresses, they are scattered in
memory, so memory is not used efficiently.
If we are searching for a range of values, this method is not suitable: each record is
stored at an unrelated address, so a range search cannot find the correct addresses.
In this method, if any record has to be retrieved, based on its index value, the data block address is
fetched and the record is retrieved from memory.
Advantages of ISAM
This method gives the flexibility of using any column as the key field; the index is generated
on that column.
In addition to the primary key and its index, indexes can be generated on other fields
too.
Hence searching becomes more efficient when the search is based on columns other than the
primary key.
Disadvantages of ISAM
An extra cost must be borne to maintain the index: extra disk space is needed to store the
index values, and with multiple key-index combinations the required disk space grows
further.
As new records are inserted, these files have to be reorganized to maintain the sequence.
Similarly, when a record is deleted, the space used by it must be released; otherwise the
performance of the database will degrade.
Types of Indexes
Indexing is a way to optimize the performance of a database by minimizing the number of disk
accesses required when a query is processed.
An index (or database index) is a data structure which is used to quickly locate and access
the data in a database table.
Indexes are created using some database columns.
o The first column is the search key, which contains a copy of the primary key or a
candidate key of the table. These values are stored in sorted order so that the
corresponding data can be accessed quickly.
o The second column is the data reference, which contains pointers to the disk blocks
where the records for those key values are stored.
Indexing Methods
Ordered Indices: The indices are usually sorted so that searching is faster; indices which are
sorted are known as ordered indices.
If the search key of an index specifies the same order as the sequential order of the file, it is
known as the primary index or clustering index.
Note: The search key of a primary index is usually the primary key, but it is not necessarily
so.
If the search key of an index specifies an order different from the sequential order of the file,
it is called the secondary index or non-clustering index.
Clustered Indexing: A clustering index is defined on an ordered data file, where the data file is
ordered on a non-key field. In some cases the index is created on non-primary-key columns, which
may not be unique for each record. In such cases, to identify the records faster, two or more
columns are grouped together to get unique values and an index is created on them.
This method is known as clustering index. Basically, records with similar characteristics are
grouped together and indexes are created for these groups.
Primary Index
In this case, the data is sorted according to the search key, which induces a sequential file
organisation. The primary key of the database table is used to create the index. As primary keys
are unique and are stored in sorted order, the search operation is quite efficient.
The primary index is classified into two types: Dense Index and Sparse Index.
(I) Dense Index:
For every search-key value in the data file, there is an index record.
This record contains the search key and also a reference to the first data record with that
search-key value.
(II) Sparse Index:
An index record appears for only some of the search-key values. To locate a record, we find
the index entry with the largest search-key value not exceeding the target, and then scan
sequentially from the record it points to.
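The dense index described above can be sketched in a few lines of Python (table contents and names are invented for illustration): one index entry per search-key value, each pointing at a record's position, so a lookup is a direct jump rather than a scan.

```python
# Dense index sketch: one index entry for EVERY search-key value.
data_file = [                       # data file, sorted on the search key
    (1, "Raju"), (2, "Ramu"), (3, "Ravi"), (5, "Sindhu"),
]
# index entry: search key -> position of the first record with that key
dense_index = {sno: pos for pos, (sno, _) in enumerate(data_file)}

def lookup(sno):
    pos = dense_index.get(sno)      # direct jump via the index
    return data_file[pos] if pos is not None else None

row = lookup(3)
```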
Secondary Index:
A secondary index is used to optimize query processing and access records in a database with
some information other than the usual search key (the primary key).
Here, two levels of indexing are used in order to reduce the size of the first-level mapping.
Initially, for the first level, a large range of numbers is selected so that the mapping size
stays small; each range is then divided into further sub-ranges.
For quick memory access, the first level is stored in primary memory. The actual physical
location of the data is determined by the second mapping level.
Clustering Index:
In some cases, the index is created on non-primary-key columns, which may not be unique
for each record.
In such cases, to identify the records faster, two or more columns are grouped together
to get unique values and an index is created on them. This method is known as
clustering index.
Basically, records with similar characteristics are grouped together and indexes are created
for these groups.
For example, students studying in each semester are grouped together: 1st-semester students,
2nd-semester students, 3rd-semester students, etc., each form a group.
In the above diagram we can see that indexes are created for each semester in the index file. In
the data blocks, the students of each semester are grouped together to form a cluster.
The address in the index file points to the beginning of each cluster; within the data blocks,
the requested student ID is then searched for sequentially.
New records are inserted into the clusters based on their group. In the above case, if a new
student joins the 3rd semester, his record is inserted into the semester-3 cluster in
secondary memory. The same is done for update and delete.
If any cluster runs short of space, new data blocks are added to that cluster.
This method of file organization is better than the other methods discussed, as it provides a
clean distribution of records, making search easier and faster.
Tree Structure:
A tree consists of nodes connected by edges. Here we discuss the binary tree, or binary search
tree, specifically.
A binary tree is a special data structure used for data storage purposes, with the condition
that each node can have a maximum of two children.
A binary tree combines the benefits of an ordered array and a linked list: search is as
quick as in a sorted array, while insertion and deletion are as fast as in a linked list.
B+ Tree:
A B+ tree is a balanced search tree that follows a multi-level index format (unlike a binary
tree, each node may have many children). The leaf nodes of a B+ tree hold the actual data
pointers. A B+ tree ensures that all leaf nodes remain at the same height, and is thus balanced.
Additionally, the leaf nodes are linked in a linked list; therefore a B+ tree can support random
access as well as sequential access.
Structure of a B+ Tree: Every leaf node is at equal distance from the root node. A B+ tree is of
order n, where n is fixed for every B+ tree.
Internal nodes:
Internal (non-leaf) nodes contain at least ⌈n/2⌉ pointers, except the root node.
At most, an internal node can contain n pointers.
Leaf nodes:
Leaf nodes contain at least ⌈n/2⌉ record pointers and ⌈n/2⌉ key values.
At most, a leaf node can contain n record pointers and n key values.
Every leaf node contains one block pointer P to point to next leaf node and forms a linked list.
B+ Tree Insertion: B+ trees are filled from the bottom, and each entry is made at a leaf node.
If a leaf node overflows:
Split the node into two parts.
Partition at i = ⌊(n+1)/2⌋.
The first i entries are stored in one node.
The rest of the entries (from i+1 onwards) are moved to a new node.
The ith key is duplicated at the parent of the leaf.
If a non-leaf node overflows:
Split the node into two parts.
Partition the node at i = ⌈(n+1)/2⌉.
Entries up to i are kept in one node.
The rest of the entries are moved to a new node.
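The leaf-split rule above reduces to simple index arithmetic. A sketch (the order and the keys are invented; note that some texts copy up the first key of the new right node rather than the ith key named in the rule above):

```python
# B+ tree leaf split sketch, following the rule stated above.
order = 3                      # the tree's order (written m/n in the rules)
keys = [5, 10, 15, 20]         # an overflowing leaf holding order+1 keys

i = (order + 1) // 2           # partition point i = floor((m+1)/2) = 2
left_leaf = keys[:i]           # first i entries stay: [5, 10]
right_leaf = keys[i:]          # entries i+1 onwards move: [15, 20]
copied_up = keys[i - 1]        # the ith key is duplicated at the parent
```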
SQL Introduction:
Characteristics of SQL:
SQL is extremely flexible.
SQL uses a free-form syntax that gives users the ability to structure SQL statements in
the way best suited to them.
It is a high-level language.
It gains natural extensions to its functional capabilities.
It can execute queries against the database.
Advantages of SQL:
SQL provides a greater degree of abstraction than procedural languages.
It is coded without embedded data-navigational instructions.
It enables end users to work with any of the many database management systems where it is
available.
It retrieves huge numbers of records from a database quickly and efficiently.
No coding is required for basic operations when using standard SQL.
DDL (Data Definition Language):
DDL stands for Data Definition Language. These SQL commands are used for creating,
modifying and dropping the structure of database objects.
The commands are
o Create
o Alter
o Drop
o Truncate
o Rename
o Desc / Describe
2. Alter: This command is used to modify the definition (structure) of a table by modifying
the definition of its columns. It performs functions such as:
a. ADD a new column to an existing table.
Syntax: SQL> alter table <table_name> add (<column_name> <data type>);
Ex: SQL> alter table student add (email varchar2(30));
3. Drop: The SQL drop command is used to remove an object from the database. The drop
command removes the table structure from the database, including all the rows in the table.
Once a table is dropped it cannot be retrieved.
Syntax: SQL> drop table <table_name>;
Ex: SQL> drop table student;
4. Truncate: The SQL truncate command is used to delete all rows from a table and free the
space occupied by the table.
Syntax: SQL> truncate table <table_name>;
Ex: SQL> truncate table emp1;
5. Rename: Changes the name of the selected table from the old name to a new name.
Syntax: SQL> rename <old_name> to <new_name>;
Ex: SQL> rename student to student1;
6. Desc / Describe: Use describe to list all the columns in the specified table or view.
Syntax: SQL> Desc <table_name>;
Ex: SQL> desc student1;
Name      Type
--------  ------------
Sno       Number(3)
name      Varchar2(10)
add       Varchar2(10)
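The DDL commands above can be tried from Python against SQLite, which supports CREATE, ALTER ... ADD and DROP (Oracle's desc has no direct SQLite equivalent; PRAGMA table_info plays that role here). A sketch with invented table names:

```python
import sqlite3

con = sqlite3.connect(":memory:")       # throwaway in-memory database
cur = con.cursor()

cur.execute("CREATE TABLE student (sno INTEGER, name TEXT)")   # Create
cur.execute("ALTER TABLE student ADD COLUMN addr TEXT")        # Alter: add a column
cols = [row[1] for row in cur.execute("PRAGMA table_info(student)")]

cur.execute("DROP TABLE student")                              # Drop
tables = cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
```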
DML (Data Manipulation Language)
DML is an abbreviation of Data Manipulation Language. The DML commands help the user to
access the table, insert content into the table, modify the data, or delete the data available in
the table.
Some of the commands which are used to enter the data and perform manipulation or
retrieve the selected content from the tables are
o Select
o Insert
o Update
o Delete
Select: The most commonly used SQL command is the SELECT statement. The SQL select
statement is used to query or retrieve data from a table in the database.
Syntax: SQL> select * from <table_name>;
Ex: SQL> select * from student;
The above command displays all the records and fields from the student table.
Syntax: SQL> select <column_list> from <table_name>;
The above command displays the selected columns with all records available in the table.
Syntax: SQL> select * from <table_name> where <condition>;
The above command displays all the columns for the records that satisfy the condition.
Syntax: SQL> select <column_list> from <table_name> where <condition>;
This command displays selected columns and selected records based on the condition.
The insert command is used to enter and store values in the database object, in the defined
order and type.
The data values given by the user in the insert statement should match the data types
declared for the corresponding columns in the table.
The SQL insert command is implemented in three forms:
o Simple insert statement
Syntax: SQL> insert into <table_name> values (column1 value, column2 value,…………..
Column n value);
Ex: SQL> insert into student values (1, 'raju', 'knr', 30);
The above insert statement adds a new row of data to the table.
o Partial insert statement
Syntax:
SQL> insert into <table_name> (<column_list>) values (column1 value, column2 value,
………………….Column n value);
Ex: SQL> insert into student (sno, name, add, age) values (2, 'ramu', 'hyd', 25);
The order of values depends on the column names given with the insert command.
o Interactive insert statement
Syntax: SQL> insert into <table_name> values (&column_name1,
'&column_name2',………………..
&column_name n);
Ex: SQL> insert into student values (&sno, '&name', '&add', &age);
Enter value for sno : 3
Enter value for name : ravi
Enter value for add : JGL
Enter value for age : 30
1 row inserted
SQL> /
The above command helps the user to insert one record and then repeat the statement:
entering / (the forward slash) at the prompt re-executes the last statement, and this can be
repeated any number of times until another SQL command is issued.
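The & substitution above is a SQL*Plus feature; in program code the closest analogue is a parameterized insert, where placeholders are filled per row. A sqlite3 sketch covering the three insert forms (table and values invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE student (sno INTEGER, name TEXT, addr TEXT, age INTEGER)")

# Simple insert: values listed in the defined column order.
cur.execute("INSERT INTO student VALUES (1, 'raju', 'knr', 30)")

# Partial insert: only the named columns receive values.
cur.execute("INSERT INTO student (sno, name) VALUES (2, 'ramu')")

# Parameterized insert: placeholders filled per row, the programmatic
# analogue of SQL*Plus &substitution prompts.
rows = [(3, "ravi", "JGL", 30), (4, "sindhu", "hyd", 25)]
cur.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", rows)

count = cur.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```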
Update:
Update is a DML command used to edit the data of selected records or fields based on a
condition.
The update command is implemented with the SET keyword, which overwrites the
previous values with the current values.
This command is used to change and modify the data values in a table.
Syntax: SQL> update <table_name> set <column>=<value> where <condition>;
Ex: SQL> update student set name='SCOTT' where sno=3;
Multiple update: edits or changes the values of multiple fields of a record based on a
condition.
Syntax: SQL> update <table_name> set <column1>=<value1>, <column2>=<value2> where <condition>;
Ex: SQL> update student set name='sindhu', age=20 where sno=5;
Delete
The SQL delete statement is used to delete records from a table: either a single record or
multiple records, with the criteria specified using a where clause.
Syntax: SQL> delete from <table_name> where <column_name>=<expression>;
Ex: SQL> delete from student where sno=5;
To delete all records from a table, omit the where clause:
Syntax: SQL> delete from <table_name>;
Ex: SQL> delete from student;
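Update and delete can be exercised the same way from Python; `rowcount` reports how many records a statement touched (a sqlite3 sketch with invented data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE student (sno INTEGER, name TEXT, age INTEGER)")
cur.executemany("INSERT INTO student VALUES (?, ?, ?)",
                [(3, "ravi", 30), (5, "sindhu", 22)])

# Single-record update based on a condition (cf. UPDATE ... WHERE sno = 3).
cur.execute("UPDATE student SET name = 'SCOTT' WHERE sno = 3")

# Conditional delete removes only matching rows; without WHERE, all rows go.
cur.execute("DELETE FROM student WHERE sno = 5")
deleted = cur.rowcount

names = [r[0] for r in cur.execute("SELECT name FROM student")]
```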
The Data Control Language (DCL) is a subset of the Structured Query Language (SQL)
that allows database administrators to configure security access to a relational database.
DCL is the simplest of the SQL subsets; DCL commands are used to enforce database
security in a multi-user database environment.
The two DCL commands are GRANT and REVOKE.
Grant: SQL GRANT is a command used to provide access or privileges on database objects
to users.
Revoke: The SQL REVOKE command is used to remove given privileges from a selected user
of the database.
Commit and rollback are transaction statements used in database access; strictly they form
the Transaction Control Language (TCL) part of SQL.
A commit statement does what it says: it makes permanent all the changes made to data
during the current transaction.
Syntax: SQL> commit;
The ROLLBACK statement undoes changes to the database: it ends a unit of work and backs
out all the relational database changes made by that unit of work.
Syntax: SQL> rollback;
Ex: SQL> delete from student;
SQL> rollback;
(Note: rollback undoes uncommitted DML such as delete; DDL statements like drop table
commit implicitly and cannot be rolled back.)
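The effect of commit and rollback can be demonstrated directly with sqlite3: a committed insert survives, while an uncommitted one is undone.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE student (sno INTEGER)")
con.commit()

cur.execute("INSERT INTO student VALUES (1)")
con.commit()                          # COMMIT makes the insert permanent

cur.execute("INSERT INTO student VALUES (2)")
con.rollback()                        # ROLLBACK undoes the uncommitted insert

rows = cur.execute("SELECT sno FROM student").fetchall()
```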
A data type represents the type of data an attribute or a variable will hold. The common data
types used are listed below.
1. Number.
2. Char.
3. Varchar/varchar 2.
4. Integer / int
5. Date/Time.
6. Lob (Large Objects).
NUMBER: It accepts only numbers.
Syntax: Number (p, s) — p = precision (total number of digits), s = scale (digits after the decimal point)
Example: Number (5, 2) can store a value such as 312.45.
CHAR:
It accepts character data up to 2000 characters. It is a fixed-length data type.
With the char data type there can be memory wastage: the full declared length is allocated
for every value, with shorter values padded to that length.
Syntax: Char (20);
This means it accepts up to 20 characters.
VARCHAR / VARCHAR2:
It accepts numbers and text up to 4000 characters.
It is a variable-length data type.
It is better than the char data type, as no space is wasted.
Syntax: Varchar2 (25)
INTEGER/INT: - It accepts whole-number values (no decimal part).
Syntax: - Int (5);
Date/Time: - It is used to assign the date data type to columns.
The default format of the date data type is DD-MON-YY.
Syntax: <column name> date;
Ex: - DOB date
LARGE OBJECTS:
LOB is a data type introduced more recently in SQL.
It mainly contains two data types:
1. BLOB (Binary Large Object)
2. CLOB (Character Large Object)
Both data types can store up to 4 GB of data.
These types store graphics, images, files, documents, etc.
For example, the following column values contain a duplicate, which a unique constraint
would reject:
SNo  Name
1    Nani
1    Hima
Unique: A unique column cannot accept duplicate values; it accepts only unique values, but it
does accept null values.
Syntax: - create table <table_name> (<column_name> <data type> unique);
Example: - create table student (sno number (3) unique);
Primary Key:
The combination of Unique and Not Null is called the Primary Key. It accepts only
unique values and does not accept null (empty) values.
Syntax: AdmiNo Number (4) Primary Key
Foreign Key: - Establishes a relationship (connection) between two tables based on a
common column.
Syntax: create table <table_name2> (<column_name> <data type> references table1 (column name));
Example: create table dept (deptno number(4) primary key, dname varchar2(20));
create table emp (empid number(4), ename varchar2(20), deptno number(4) references
dept(deptno));
Check: - The check constraint is used to enforce user-defined conditions. It allows a value to
be entered only when the check condition is true.
Syntax: - create table <table_name> (<column_name> <data type> check (condition));
Example: - create table student (marks number (3) check (marks >= 0 and marks <= 100));
Default: - This constraint is used to assign a default value to a specific column.
Syntax: - create table <table_name> (<column_name> <data type> default <value>);
Example: - create table svndc (college_code number (6) default 7063);
When the user skips the default-value column, the default value is inserted automatically;
otherwise the supplied value is stored.
Ex: Create table student (sno number(4) primary key, name varchar2(20) not null,
college_code number(4) default 20,
m1 number(3) check (m1 >= 0 and m1 <= 100),
m2 number(3) check (m2 >= 0 and m2 <= 100));
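The constraints above can be verified with sqlite3, which supports PRIMARY KEY, NOT NULL, DEFAULT and CHECK; violations raise IntegrityError (column names and values invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("""
    CREATE TABLE student (
        sno          INTEGER PRIMARY KEY,          -- unique + not null
        name         TEXT NOT NULL,
        college_code INTEGER DEFAULT 20,           -- default value
        m1           INTEGER CHECK (m1 >= 0 AND m1 <= 100)
    )
""")
cur.execute("INSERT INTO student (sno, name, m1) VALUES (1, 'Nani', 85)")

# Skipping college_code makes the DEFAULT kick in.
code = cur.execute("SELECT college_code FROM student").fetchone()[0]

# A duplicate primary key and an out-of-range mark are both rejected.
violations = 0
for stmt in (
    "INSERT INTO student (sno, name, m1) VALUES (1, 'Hima', 70)",   # dup key
    "INSERT INTO student (sno, name, m1) VALUES (2, 'Hima', 150)",  # check fails
):
    try:
        cur.execute(stmt)
    except sqlite3.IntegrityError:
        violations += 1
```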
And: Select * from EMP where job='CLERK' and sal <= 2000;
Or: - Select * from EMP where job='CLERK' or sal <= 2000;
Set Operators
Union: combines the unique records of multiple tables with matching attributes.
Union all: combines all records of multiple tables with matching attributes, including duplicates.
Intersect: retrieves the common records of multiple tables.
Minus: - retrieves the records of the first table that do not appear in the second.
Special Operators
IN: - checks whether a value matches any value in a list.
BETWEEN ... AND: - checks whether a value is within a range.
LIKE: - checks whether a value matches a string pattern.
IS NULL: - checks whether a value is null.
EXISTS: - checks whether a subquery returns any row.
||: - concatenates two values.
DISTINCT: - retrieves unique values from a table.
Examples:
Select * from emp where Ename IN ('SMITH', 'JAMES');
----------Shows the records where ename is SMITH or JAMES.
Select * from EMP where sal BETWEEN 2000 AND 4000;
----------Shows the records where the salary is between 2000 and 4000.
Select * from EMP where Ename like 'S%';
----------Displays the records where the name starts with S.
Select * from EMP where COMM IS NULL;
----------Displays the records where commission is null.
Select Deptno from Dept where NOT EXISTS (Select Deptno from EMP where EMP.Deptno =
Dept.Deptno);
----------Displays the deptnos which are not present in EMP.
Select 'Asshu' || 'Qadri' from dual;
----------O/p: AsshuQadri
Select DISTINCT job from EMP;
----------Displays the unique job values.
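These special operators behave the same way in SQLite, so the examples above can be checked from Python (the sample emp rows are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE emp (ename TEXT, sal INTEGER, comm INTEGER)")
cur.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    ("SMITH", 800, None), ("JAMES", 950, None),
    ("ALLEN", 1600, 300), ("KING", 5000, None),
])

in_list = cur.execute(
    "SELECT ename FROM emp WHERE ename IN ('SMITH', 'JAMES')").fetchall()
ranged = cur.execute(
    "SELECT ename FROM emp WHERE sal BETWEEN 2000 AND 6000").fetchall()
starts_s = cur.execute(
    "SELECT ename FROM emp WHERE ename LIKE 'S%'").fetchall()
no_comm = cur.execute(
    "SELECT COUNT(*) FROM emp WHERE comm IS NULL").fetchone()[0]
joined = cur.execute("SELECT 'Asshu' || 'Qadri'").fetchone()[0]   # concatenation
```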
Functions in SQL
SQL provides several built-in functions.
Built-in functions are mainly used to get results easily and quickly.
FUNCTIONS IN SQL:
Aggregate (group) functions
Date and time
Numeric
String
Conversion
Miscellaneous
AGGREGATE: -
These functions operate on a group of values, for example finding the MAX, MIN or AVG, or
calculating the SUM.
They allow a GROUP BY clause to categorize results.
Aggregate functions can be divided into two types: scalar aggregate and vector aggregate.
Scalar aggregate:
Retrieving a single value using an SQL query that includes an aggregate function is
called scalar aggregation.
Example: select sum(sal) from emp;
Vector aggregate:
Retrieving multiple values (one per group) from a table using aggregate functions is called
vector aggregation. Example: select deptno, sum(sal) from emp group by deptno;
FUNCTION   PURPOSE                            EXAMPLE
SUM        Finds the sum of a column          Select SUM (sal) from EMP;
COUNT      Counts the records in a table      Select count (*) from EMP;
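The scalar/vector distinction is visible directly in query results: without GROUP BY an aggregate collapses the table to one value, while with GROUP BY it yields one value per group. A sqlite3 sketch with invented rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER, sal INTEGER)")
cur.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    ("A", 10, 1000), ("B", 10, 2000), ("C", 20, 3000),
])

# Scalar aggregate: the whole table collapses to a single value.
total = cur.execute("SELECT SUM(sal) FROM emp").fetchone()[0]

# Vector aggregate: GROUP BY yields one aggregated value per group.
per_dept = cur.execute(
    "SELECT deptno, SUM(sal) FROM emp GROUP BY deptno ORDER BY deptno"
).fetchall()
```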
Select: The select clause is used to list the columns from a table or view.
Ex: SQL> select * from emp;
In the above example all columns are selected from the emp table.
From: The from clause identifies the tables or views from which the columns will be chosen.
Ex: SQL> select empid, ename from emp;
In the above example the empid and ename columns are selected from the emp table.
Where: The where clause includes the conditions for row selection within a single table, or
across multiple tables or views.
Ex: SQL> select ename from emp where job='clerk';
In the above example ename is selected for the rows whose job is clerk.
Distinct: The distinct clause is used to avoid duplicate rows, i.e. to return unique rows/records.
Ex: SQL> select distinct empid from emp;
Group by: This clause groups rows according to the specified categories.
Ex: SQL> select job, sum(sal) from emp group by job;
Order by: This clause sorts the final result rows in ascending order or descending order.
Ex: SQL> select * from emp order by sal;
Having: This clause can only be used with a group by, and acts as a secondary where clause
applied to the groups.
Ex: SQL> select job, sum(sal) from emp group by job having sum(sal) > 2000;
Sorting by multiple columns: ascending order on department number and, within each
department, descending order of salary:
Ex: SQL> select * from emp order by deptno asc, sal desc;
Equi-Join: Returns only the matched records of the two tables, based on an equality join condition.
Ex: Select rno, name, std.cid, course.cid, cname from std, course where std.cid = course.cid;
Non-Equi-Join: Joins on a non-equality condition, displaying the non-matching combinations.
Ex: Select rno, name, std.cid, course.cid, cname from std, course where std.cid !=
course.cid;
Rno Name CName
10 Niru C
10 Niru Java
10 Niru Oracle
20 Hima C
20 Hima Java
20 Hima Oracle
30 Monica C
30 Monica Java
30 Monica Oracle
Natural Join: The same as an equi-join, except that one of the duplicated columns is eliminated
from the result table.
Ex: Select rno, name, std.cid, cname from std, course where std.cid = course.cid;
STD table:
Rno  Name     CID
30   Hima     33
40   Sri      44
50   Mounika  55

Course table:
CID  CName
10   C
20   Java
30   Oracle
Outer Join:
An outer join returns not only the matching rows but also the unmatched attribute values
from one or both tables.
Example result (the unmatched row has no CName value):
33   Hima      10   C
44   Sri       30   Oracle
55   Mounika   80
CROSS JOIN:
A cross join forms the relational product of two tables; it is also called a Cartesian
product join.
Syntax: Select column_list from table1 cross join table2
Example: Select Name, CName from STD cross join course;
NAME CNAME
Hima C
Sri C
Mona C
Hima Java
Sri Java
Mona Java
Hima Oracle
Sri Oracle
Mona Oracle
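The join types above can be compared side by side with sqlite3 (tables and rows invented; SQLite writes the outer join as LEFT OUTER JOIN):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE std (rno INTEGER, name TEXT, cid INTEGER)")
cur.execute("CREATE TABLE course (cid INTEGER, cname TEXT)")
cur.executemany("INSERT INTO std VALUES (?, ?, ?)",
                [(10, "Niru", 1), (20, "Hima", 2), (30, "Monica", 9)])
cur.executemany("INSERT INTO course VALUES (?, ?)",
                [(1, "C"), (2, "Java"), (3, "Oracle")])

# Equi-join: only rows whose cid values match on both sides.
equi = cur.execute(
    "SELECT name, cname FROM std, course WHERE std.cid = course.cid "
    "ORDER BY name").fetchall()

# Outer join: matched rows plus the unmatched std row (cname is NULL).
outer = cur.execute(
    "SELECT name, cname FROM std LEFT OUTER JOIN course "
    "ON std.cid = course.cid ORDER BY name").fetchall()

# Cross join: the Cartesian product, every std row paired with every course.
cross_count = cur.execute(
    "SELECT COUNT(*) FROM std CROSS JOIN course").fetchone()[0]
```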
SQL Views:
A view is a virtual table based on a select query.
The query may contain columns, computed columns, aliases and aggregate functions
from one or more tables.
The table on which the view is based is called the base table.
Views can be treated as stored queries.
The create command is used to create a view.
Its select statement generates the virtual table.
ADVANTAGES:
Views provide data security.
Views need little storage space.
There is no need to write the same query again and again to retrieve the same data.
A view is a stored query.
Views are dynamically updated.
The name of a view can be used anywhere a table can.
Views can restrict users to only specified columns and specified rows of a table.
Views may also be used to generate reports.
LIMITATIONS:
Views cannot be indexed.
Limitations on updating data through views.
To Create a View:
Syntax: Create view <view_name> as select <column list> from <table name> where <condition>;
Ex: Create view emp_dept_details as select empno, ename, job, dept.deptno, dname from
emp, dept where emp.deptno = dept.deptno;
To Run a View:
Select * from emp_dept_details;
Types of Views:
There are two types of views: simple views and complex views.
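A view's behaviour as a stored query, rather than stored data, can be seen with sqlite3 (tables and rows invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, deptno INTEGER)")
cur.execute("CREATE TABLE dept (deptno INTEGER, dname TEXT)")
cur.execute("INSERT INTO emp VALUES (1, 'Hima', 10)")
cur.execute("INSERT INTO dept VALUES (10, 'Sales')")

# The view stores the query, not the data; emp and dept stay the base tables.
cur.execute("""
    CREATE VIEW emp_dept_details AS
    SELECT empno, ename, dname
    FROM emp, dept
    WHERE emp.deptno = dept.deptno
""")
rows = cur.execute("SELECT * FROM emp_dept_details").fetchall()

# Views are dynamic: a new base-table row appears in the view automatically.
cur.execute("INSERT INTO emp VALUES (2, 'Sri', 10)")
rows_after = cur.execute("SELECT COUNT(*) FROM emp_dept_details").fetchone()[0]
```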
Ex: create a sequence named SEQSS that starts at 105, has a step of 1 and can take a maximum
value of 2000:
Create sequence seqss start with 105 increment by 1 maxvalue 2000;
Synonyms:
A synonym is an alias for a database object (table, view, procedure, function, package,
sequence, etc.). Synonyms may be used to reference the original object in SQL as well as in
PL/SQL.
They can be used to hide ownership and location of the database objects they refer to and
minimize the impact of moving or renaming the database objects.
There are two types of synonyms
Private: Private synonyms exist only in a specific user schema. The owner of the synonym
maintains control over availability to other users.
Public: A public synonym is available to all users in the database
Ex: SQL> create synonym D30 for EMPD30;
Now the command
Select * from D30;
retrieves the rows of EMPD30 through the synonym.
o Active state: In this state the transaction is being executed. This is the initial state of every
transaction.
o Partially committed state: When a transaction executes its final operation, it is said to be in
this state. After execution of all operations, the database system performs some checks.
o Failed state: If the execution of the transaction cannot proceed, the transaction enters the
failed state, and the system performs a rollback operation.
o Aborted state: The rolled-back transaction enters the aborted state, where the system
performs one of the following operations:
o Restart: If the transaction was aborted due to hardware or software failure, it can be
restarted; such a transaction is considered a new transaction.
o Kill: If the transaction was aborted due to some internal logical error or due to bad
input, the system can kill the transaction.
o Committed state: If a transaction executes all its operations successfully, it is said to be
committed. All its effects are now permanently made on the database system.
Concurrency Control:
o Simultaneous execution of transactions in a multi-user database system is known as
concurrency.
o Concurrency control is the process of managing these simultaneous operations.
o Simultaneous execution of transactions over a shared database can create several data
integrity and consistency problems. This situation occurs in a multi-user database
environment, where the database allows a number of users to perform their individual
operations.
Need for Concurrency Control:
Simultaneous execution of transactions over a shared database can create several data
integrity and consistency problems:
1. Lost updates
2. Uncommitted data
3. Inconsistent retrievals
Lost Updates: The lost update problem occurs when two concurrent transactions update the
same data element and one of the updates is lost (overwritten by the other transaction). It is a
kind of write/write conflict.
Uncommitted data (dirty read): - The uncommitted data problem occurs when two
transactions execute concurrently and the first transaction is rolled back after the second
transaction has already accessed the uncommitted data. It is a kind of write/read conflict
and violates the isolation property. This is also called the dirty-read problem.
Inconsistent retrievals: - Inconsistent retrievals occur when a transaction accesses data before
and after another transaction finishes working with that data. The problem is that the transaction
might read some data before it is changed and other data after it is changed, thereby
yielding inconsistent (wrong) results. This is a kind of read/write conflict.
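The lost update conflict can be replayed deterministically: both transactions read the same value before either writes, so the first write is silently overwritten. A sketch with an invented balance:

```python
# Deterministic replay of the lost-update interleaving: both transactions
# read the same balance before either writes, so T1's update is lost.
balance = 100

t1_read = balance          # T1: READ  balance -> 100
t2_read = balance          # T2: READ  balance -> 100 (before T1 writes!)
balance = t1_read + 50     # T1: WRITE balance = 150
balance = t2_read - 30     # T2: WRITE balance = 70, overwriting T1's write

lost_update_result = balance
serial_result = 100 + 50 - 30   # serial execution would give 120
```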
Locking Protocol:
o A locking protocol is a set of rules followed by every transaction of a DBMS; such a
protocol can ensure that only serializable, recoverable schedules are executed.
o A lock is basically a variable associated with a data item in the database.
o A lock can be placed by a transaction on a shared resource that it desires to use.
Locks:
Serializability is only a test of whether a given interleaved schedule is acceptable or has a
concurrency-related problem; locking is what enforces it.
The different types of locks, and how locking ensures serializability of executing
transactions, are as follows:
Binary lock: This locking mechanism has two states for a data item: locked or unlocked.
Multiple-mode lock: In this locking type each data item can be in three states: read-locked,
write-locked, or unlocked.
Many transactions in the database system never update the data values; such read-only
transactions can coexist with other transactions.
If a transaction is an updating transaction, that is, it updates data items, it has to ensure
that no other transaction can access (read or write) the data items that it wants to update.
Ex:
T1                  T2                  Lock Manager
Lock-S(P)                               Grant-S(P, T1)
Read(P)
Unlock(P)
Lock-X(Q)                               Grant-X(Q, T1)
Read(Q)
Q = Q + P
Write(Q)
Unlock(Q)
                    Lock-S(Q)           Grant-S(Q, T2)
                    Read(Q)
                    Unlock(Q)
                    Lock-X(P)           Grant-X(P, T2)
                    Read(P)
                    P = P + Q
                    Write(P)
                    Unlock(P)
Ex: Assume a set of transactions {T0, T1, T2, …, Tn}. T0 needs a resource X to complete its task.
Resource X is held by T1, and T1 is waiting for a resource Y, which is held by T2; T2 in turn is
waiting for resource Z, which is held by T0. Thus all the processes wait for each other to release
resources, and none of them can finish its task. This situation is known as a DEADLOCK.
Deadlocks are not healthy for a system. In case a system is stuck in a deadlock, the
transactions involved in the deadlock are either rolled back or restarted.
Deadlock Prevention:
A transaction requesting a new lock is aborted when there is the possibility that a deadlock can
occur. If the transaction is aborted, all changes made by this transaction are rolled back and all
locks obtained by the transaction are released. The transaction is then rescheduled for execution.
Deadlock prevention works because it avoids the conditions that lead to deadlock.
Wait-Die Scheme:
In this scheme, if a transaction requests a lock on a resource (data item) that is already held
with a conflicting lock by another transaction, one of two possibilities occurs:
If TS(Ti) < TS(Tj), that is, Ti, which is requesting the conflicting lock, is older than Tj, then Ti is
allowed to wait until the data item is available.
If TS(Ti) > TS(Tj), that is, Ti is younger than Tj, then Ti dies. Ti is restarted later, after a random
delay, but with the same timestamp.
Wound-Wait Scheme:
In this scheme, if a transaction requests a lock on a resource (data item) that is already held
with a conflicting lock by another transaction, one of two possibilities occurs:
If TS(Ti) < TS(Tj), then Ti forces Tj to be rolled back: Ti wounds Tj. Tj is restarted later, after a
random delay, but with the same timestamp.
If TS(Ti) > TS(Tj), then Ti is forced to wait until the resource is available.
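Both schemes are pure decision rules on timestamps, so they can be sketched as two small functions (timestamps are invented; a lower timestamp means an older transaction):

```python
# Sketch of the two timestamp-based schemes described above.
def wait_die(ts_requester, ts_holder):
    # Older requester waits; younger requester dies (is restarted later).
    return "wait" if ts_requester < ts_holder else "die"

def wound_wait(ts_requester, ts_holder):
    # Older requester wounds (rolls back) the holder; younger one waits.
    return "wound holder" if ts_requester < ts_holder else "wait"

old, young = 5, 9                    # illustrative timestamps

wd_old   = wait_die(old, young)      # older asks younger  -> wait
wd_young = wait_die(young, old)      # younger asks older  -> die
ww_old   = wound_wait(old, young)    # older asks younger  -> wound holder
ww_young = wound_wait(young, old)    # younger asks older  -> wait
```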
Deadlock Avoidance:
Aborting a transaction is not always a practical approach. Instead, deadlock avoidance
mechanisms can be used to detect any deadlock situation in advance. Methods like the
wait-for graph are available, but they are suitable only for systems where transactions are
lightweight and hold fewer instances of each resource.
Optimistic Locking: This strategy can be used when simultaneous transactions, or
collisions, are expected to be infrequent. (Pessimistic locking, by contrast, guarantees that
database changes are made safely.)
Optimistic locking can alleviate the problem of waiting for locks to be released.
Database Recovery:
During the life of a transaction, that is, after the start of a transaction but before the
transaction commits, several changes may be made to the database state; during such a
period the database may be in an inconsistent state.
Assume that a transaction transfers Rs. 2000/- from A's account to B's account (for
simplicity, no error checking is shown). The transaction may be written as
READ A
A=A-2000
WRITE A
Failure
READ B
B=B+2000
WRITE B
COMMIT
Recovery techniques are used to bring a database that does not satisfy the consistency
requirements back into a consistent state. If a transaction completes normally and commits,
then all changes made by the transaction are permanently registered in the database.
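The Rs. 2000 transfer above can be replayed with sqlite3: a simulated failure after WRITE A triggers a rollback, and the database returns to its consistent pre-transaction state (account names and balances invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE account (name TEXT, balance INTEGER)")
cur.executemany("INSERT INTO account VALUES (?, ?)",
                [("A", 5000), ("B", 1000)])
con.commit()

try:
    cur.execute("UPDATE account SET balance = balance - 2000 "
                "WHERE name = 'A'")                  # WRITE A done
    raise RuntimeError("failure before WRITE B")     # simulated crash
    cur.execute("UPDATE account SET balance = balance + 2000 "
                "WHERE name = 'B'")                  # (never reached here)
    con.commit()
except RuntimeError:
    con.rollback()      # recovery: undo the half-finished transfer

balances = dict(cur.execute("SELECT name, balance FROM account"))
```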
An abnormal termination of transaction may be due to several reasons including:
a) User may decide to abort the transaction issued by him/her
b) There might be a deadlock in the system
c) There might be a system failure.
Kinds of Failures
The kinds of failures that a transaction program can encounter during its execution are:
1. Software failures:
In such cases, a software error stops the execution of the current transaction (or all
transactions), thus losing the state of program execution and the state/contents
of the buffers.
2. Hardware failures:
Hardware failures occur when some hardware chip or disk fails, which may result in loss
of data. This may happen for many reasons: bad sectors may develop on the disk, or the
disk may crash. In all these cases, the database is left in an inconsistent state.
3. External failures:
A failure can also result from an external cause, such as fire, earthquake or flood. The
database must be duly backed up to avoid problems arising from such failures.
Database Errors
An error is said to have occurred if the execution of a command to manipulate the database
cannot be completed successfully, either due to inconsistent data or due to the state of the program.
Physical backups
Physical backups are copies of the physical files used in storing and recovering the
database, such as datafiles, control files, archived redo logs and log files.
A physical backup is a copy of the files storing database information to some other
location, such as disk or offline storage like magnetic tape.
Physical backups are the foundation of the recovery mechanism in the database.
A physical backup provides minute details about the transactions and modifications to the
database.
Logical backup
Logical Backup contains logical data which is extracted from a database.
It includes backup of logical data like views, procedures, functions, tables, etc.
It is a useful supplement to physical backups in many circumstances, but it is not
sufficient protection against data loss without physical backups, because a logical backup
provides only structural information.
Importance of Backups
Planning and testing backups helps guard against failures of media, the operating system,
software and any other kind of failure that causes a serious data crash.
It determines the speed and success of the recovery.
Physical backup extracts data from physical storage (usually from disk to tape). An
operating-system level file copy is an example of a physical backup.
Logical backup extracts data from the database using SQL and stores it in a binary file.
Logical backups are used to restore database objects into the database, so logical
backup utilities allow the DBA (Database Administrator) to back up and recover selected
objects within the database.
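The idea of a logical backup, extracting the database as SQL rather than copying its physical files, can be sketched with sqlite3's `iterdump()` (the table and data are illustrative):

```python
# Sketch: a logical backup as SQL text, then a restore into a fresh database.
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT)")
src.execute("INSERT INTO emp VALUES (1, 'Neha')")
src.commit()

# Logical backup: schema and rows extracted as SQL statements,
# not a copy of the raw datafiles.
dump = "\n".join(src.iterdump())

# Restore the backed-up objects into another database from the SQL dump.
dst = sqlite3.connect(":memory:")
dst.executescript(dump)
```

This is why a logical backup is selective (the DBA can dump chosen objects) but does not replace physical backups: it carries no physical file or log details.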
Recovery can be done by restoring the previous consistent state (backward recovery) or by
moving forward to the next consistent state as per the committed transactions (forward
recovery).
Backward Recovery (Undo):
In this scheme the uncommitted changes made by a transaction to the database are undone,
and the system is reset to the previous consistent state of the database, which is free
from errors.
[Diagram: UNDO uses before-images to turn the database with changes back into the database without changes.]
Forward Recovery (Redo):
In this scheme, the committed changes made by a transaction are reapplied to an earlier
copy of the database.
[Diagram: REDO uses after-images to reapply the changes to an earlier copy of the database.]
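Both schemes can be sketched with a log of before- and after-images, again using the Rs. 2000/- transfer as the example (the log format and item names are illustrative assumptions):

```python
# Sketch: backward (UNDO) and forward (REDO) recovery with a log of images.

def undo(db, log):
    """Backward recovery: restore before-images, newest write first."""
    for record in reversed(log):
        db[record["item"]] = record["before"]

def redo(db, log):
    """Forward recovery: reapply after-images in their original order."""
    for record in log:
        db[record["item"]] = record["after"]

# Log for the transfer of 2000 from A (5000) to B (1000).
log = [{"item": "A", "before": 5000, "after": 3000},
       {"item": "B", "before": 1000, "after": 3000}]

# A crash after WRITE A but before WRITE B leaves an inconsistent state:
crashed = {"A": 3000, "B": 1000}
undo(crashed, log)          # back to the consistent state before the transaction
```

After `undo`, the database is `{"A": 5000, "B": 1000}`; applying `redo` to that earlier copy yields the committed state `{"A": 3000, "B": 3000}`.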
Physical: The site or sites containing the computer system must be physically secured
against entry by unauthorized persons.
Human: Authorization is given to a user to reduce the chance of information leakage
and unwanted manipulation.
The Database Administrator (DBA) is responsible for implementing the database security
policies in a database system. The organization or data owners create these policies.
Database Security:
Database security refers to the collective measures used to protect and secure a database or
database management software from illegitimate use and malicious threats and attacks.
It is a broad term that includes a multitude of processes, tools and methodologies that
ensure security within a database environment.
Database and functions can be managed by two different modes of security controls:
1. Authentication
2. Authorization
Authentication:
o Database access usually requires user authentication and authorization. For user
authentication, the first level of security establishes that the person seeking system entry is
an authorized user.
o Authorization allows database users to access certain parts of the database. However,
before accessing the database, users need to identify themselves to the system so that
their identity can be verified.
o Authentication is used by a server when the server needs to know exactly who is accessing
its information or site.
o Authentication is used by a client when the client needs to know that the server is the
system it claims to be.
o In authentication, the user or computer has to prove its identity to the server or client.
o Usually, authentication by a server entails the use of a user name and password. Other
ways to authenticate can be through cards, retina scans, voice recognition, and fingerprints.
Authorization:
o Authorization is a process by which a server determines if the client has permission to use a
resource or access a file.
o Authorization is usually coupled with authentication so that the server has some concept of
who the client is that is requesting access.
o The type of authentication required for authorization may vary; passwords may be
required in some cases but not in others.
o In some cases, there is no authorization; any user may use a resource or access a file
simply by asking for it. Most web pages on the Internet require no authentication or
authorization.
Different types of access authorization may be allowed on a particular view, such as the following:
1. Read authorization: allows reading, but not modification, deletion or insertion of data.
2. Insert authorization: allows insertion of new data, but no modification of existing data.
3. Update authorization: allows modification of data, but not deletion.
4. Delete authorization: allows deletion of data.
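The four authorization types above can be sketched as a per-user grant table checked before each operation. A minimal sketch in Python (the users and their grants are hypothetical):

```python
# Sketch: checking read/insert/update/delete authorizations against grants.

GRANTS = {
    "clerk":   {"read", "insert"},                       # no update or delete
    "manager": {"read", "insert", "update", "delete"},   # full access
}

def is_authorized(user, action):
    """Return True if `user` holds the named authorization on the view."""
    return action in GRANTS.get(user, set())
```

For instance, a clerk may read and insert data but is refused delete, while a manager holds all four authorizations.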
[Diagram: Distributed database systems arise from the integration offered by database technology combined with the distribution offered by computer networks.]
Types of DDBMS:
DDBMS are basically divided into two types, they are
Homogeneous Distributed database:
All sites have identical software
All sites are aware of each other and agree to process user requests
Each site surrenders part of its autonomy in terms of the right to change schema or software
The system appears to the user as a single system
Heterogeneous Distributed database:
Different sites may use different schemas and software
Differences in schema are a major problem for query processing
Differences in software are a major problem for transaction processing.
[Diagram: A distributed database system. Each site (e.g. Calcutta, Delhi, Chennai, Bangalore, Mumbai) has its own CPU, memory and database, and the sites are connected through a communication network; in one arrangement a central site at Mumbai holds the main database.]
Advantages of DDBMS:
1. Data are located near the greatest demand site. The data in a distributed database system are
dispersed to match business requirements, which reduces the cost of data access.
2. Faster data access. End users often work with only a locally stored subset of the company’s
data.
3. Faster data processing. A distributed database system spreads out the system's workload by
processing data at several sites.
4. Growth facilitation. New sites can be added to the network without affecting the operations of
other sites.
5. Improved communications. Because local sites are smaller and located closer to customers,
local sites foster better communication among departments and between customers and company
staff.
Disadvantages of DDBMS:
1. Complexity of management and control. Applications must recognize data location, and they
must be able to stitch together data from various sites. Database administrators must have the
ability to coordinate database activities to prevent database degradation due to data anomalies.
2. Technological difficulty. Data integrity, transaction management, concurrency control,
security, backup, recovery, query optimization, access path selection, and so on, must all be
addressed and resolved.
3. Security. The probability of security lapses increases when data are located at multiple sites.
The responsibility of data management will be shared by different people at several sites.
4. Lack of standards. There are no standard communication protocols at the database level.
(Although TCP/IP is the de facto standard at the network level, there is no standard at the
application level.) For example, different database vendors employ different—and often
incompatible—techniques to manage the distribution of data and processing in a DDBMS
environment.
Data Replication:
Data Replication refers to the storage of data copies at multiple sites served by a
computer network. Fragment copies can be stored at several sites to serve specific
information requirements.
Data replication is the frequent electronic copying of data from a database on one computer
or server to a database on another, so that all users share the same level of information.
The result is a distributed database in which users can access data relevant to their tasks
without interfering with the work of others.
The implementation of database replication for the purpose of eliminating data ambiguity or
inconsistency among users is known as database synchronization.
Replicated data are subject to the mutual consistency rule. The mutual consistency rule requires
that all copies of data fragments be identical. Therefore, to maintain data consistency among the
replicas, the DDBMS must ensure that a database update is performed at all sites where replicas
exist.
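The mutual consistency rule can be sketched by performing every update at all sites where a replica exists (the sites, fragment name and data are illustrative):

```python
# Sketch: enforcing the mutual consistency rule across replicas.
# Every update is applied at all sites that hold a copy of the fragment.

replicas = {
    "Mumbai":  {"emp_fragment": {"0922": 30}},
    "Delhi":   {"emp_fragment": {"0922": 30}},
    "Chennai": {"emp_fragment": {"0922": 30}},
}

def replicated_update(replicas, fragment, key, value):
    """Perform the update at every site where a replica exists."""
    for site in replicas:
        replicas[site][fragment][key] = value

replicated_update(replicas, "emp_fragment", "0922", 31)
```

After the call, every replica holds the same value, so all copies of the fragment remain identical as the rule requires.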
Advantages of Replication:
Disadvantages of Replication:
Fully Replicated: A fully replicated database stores multiple copies of each database
fragment at multiple sites. In this case, all database fragments are replicated.
Partially Replicated: A partially replicated database stores multiple copies of some
database fragments at multiple sites. Most DDBMS are able to handle the partially
replicated database well.
Unreplicated: An unreplicated database stores each database fragment at a single site.
Snapshot Replication: Data on one server is simply copied to another server, or to another
database on the same server.
Merging Replication: Data from two or more databases is combined into a single database.
Transactional Replication: Users receive full initial copies of the database and then
receive periodic updates as data changes.
A Distributed database management system (DDBMS) ensures that changes, additions, and
deletions performed on the data at any given location are automatically reflected in the data
stored at all the other locations.
Data Fragmentation:
Data fragmentation allows us to break a single object into two or more segments or
fragments.
The object might be a user's database, a system database, or a table. Each fragment can
be stored at any site over a computer network.
Information about data fragmentation is stored in the distributed data catalog (DDC), from
which it is accessed by the TP to process user requests.
Data fragmentation strategies can be divided into three types:
1. Horizontal,
2. Vertical,
3. Mixed.
Horizontal Fragmentation: refers to the division of a relation into subsets (fragments) of tuples
(rows). Each fragment is stored at a different node, and each fragment has unique rows. However,
the unique rows all have the same attributes (columns). In short, each fragment represents
the equivalent of a SELECT statement, with the WHERE clause on a single attribute.
Vertical Fragmentation: refers to the division of a relation into attribute (column) subsets. Each
subset (fragment) is stored at a different node, and each fragment has unique columns, with the
exception of the key column, which is common to all fragments. This is the equivalent of the
PROJECT statement in SQL.
Mixed Fragmentation: refers to a combination of horizontal and vertical strategies. In other
words, a table may be divided into several horizontal subsets (rows), each one having a subset
of all attributes (columns).
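The three strategies can be sketched on a small relation: horizontal fragmentation keeps a subset of rows (SELECT ... WHERE), vertical fragmentation keeps a subset of columns plus the key (PROJECT). A minimal sketch in Python (the employee rows and predicate are illustrative):

```python
# Sketch: horizontal and vertical fragmentation of a small relation.

employees = [
    {"id": "0921", "name": "Ravi", "dept": "HR",    "age": 25},
    {"id": "0922", "name": "Neha", "dept": "HR",    "age": 30},
    {"id": "0923", "name": "Arun", "dept": "Sales", "age": 28},
]

def horizontal(rows, predicate):
    """Subset of tuples: the equivalent of SELECT ... WHERE predicate."""
    return [r for r in rows if predicate(r)]

def vertical(rows, columns, key="id"):
    """Subset of attributes, always keeping the key column: PROJECT."""
    return [{c: r[c] for c in [key, *columns]} for r in rows]

# Horizontal fragment: HR employees only (WHERE dept = 'HR').
hr_fragment = horizontal(employees, lambda r: r["dept"] == "HR")

# Vertical fragment: only the name column, plus the key for re-joining.
name_fragment = vertical(employees, ["name"])
```

A mixed fragment would simply apply `vertical` to the rows of `hr_fragment`; the shared key column is what makes the vertical fragments joinable again.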
Advantages of Fragmentation:
Horizontal
o Allows parallel processing on fragments of a relation
o Allows a relation to be split so that tuples are located where they are most frequently
accessed.
Vertical
o Allows tuples to be split so that each part of the tuple is stored where it is most
frequently accessed.
o Tuple-id attribute allows efficient joining of vertical fragments.
Vertical and Horizontal fragmentation can be mixed
o Fragments may be successively fragmented to an arbitrary depth.
[Diagram: A client sends a request to a server; the server sends back a response.]
A client makes a request for a service and receives a reply to that request.
A server receives and processes a request, and sends back the required response.
Client/server systems may follow two different architectures (models):
1. 2-Tier client/server model
2. 3- Tier client /server model
The two-tier model is based on client-server architecture. In a two-tier application, direct
communication takes place between client and server, with no intermediary between them.
Because of this tight coupling, a two-tier application runs faster.
[Diagram: Two-tier architecture, showing direct communication between client and server with no intermediary.]
The database and server are incorporated with each other, so this technology is called
"Client-Server Technology".
Business Layer:
In this layer all business logic is written, such as validation of data, calculations and
data insertion. It acts as an interface between the client layer and the data access layer.
This intermediary layer helps to make communication faster between the client and data layers.
Data Layer:
In this layer the actual database comes into the picture. The data access layer contains
methods to connect to the database and to insert, update, delete and get data from the
database based on the input data.
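The layering above can be sketched as plain functions: the client calls the business layer, which validates and then calls the data access layer, so the client never touches the database directly. A minimal sketch in Python (all names and the in-memory store are illustrative):

```python
# Sketch: three-tier layering. Only the data access layer touches the store.

DATABASE = {}                             # stand-in for the data layer's database

def data_insert(key, value):
    """Data access layer: the only code that touches the database."""
    DATABASE[key] = value

def business_add_employee(emp_id, name):
    """Business layer: validation and calculations before data access."""
    if not emp_id or not name:
        raise ValueError("employee id and name are required")
    data_insert(emp_id, name)             # delegate the actual write

# Client layer: issues its request through the business layer only.
business_add_employee("0922", "Neha")
```

Because validation lives in the middle layer, an invalid request (say, an empty id) is rejected before any database access, which is the integrity and security benefit listed below.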
Advantages:
1. High performance, lightweight persistent objects
2. High degree of flexibility in deployment platform and configuration.
3. Improved data integrity and improved security (the client does not directly access the database).
Disadvantage:
1. Increased complexity/effort.