
CS6302-Database Management Systems

Department of IT/CSE

2016-2017

UNIT I
INTRODUCTION TO DBMS
File Systems Organization - Sequential, Pointer, Indexed, Direct - Purpose of Database System - Database System
Terminologies - Database Characteristics - Data Models - Types of Data Models - Components of DBMS - Relational Algebra.
LOGICAL DATABASE DESIGN: Relational DBMS - Codd's Rule - Entity-Relationship Model - Extended ER -
Normalization: Functional Dependencies, Anomaly - 1NF to 5NF - Domain Key Normal Form - Denormalization
UNIT-I / PART-A
1
What is first normal form?
The domain of an attribute must include only atomic (simple, indivisible) values.
2
What is 2NF?
A relation schema R is in 2NF if it is in 1NF and every non-prime attribute A in R is fully functionally dependent on the
primary key.
3
Define Functional dependency. (Apr/May 2015)
In a given relation R, X and Y are attributes. Attribute Y is functionally dependent on attribute X if each value
of X determines EXACTLY ONE value of Y, which is represented as X → Y (X can be composite in nature).
We say X determines Y, or Y is functionally dependent on X. Example: EmpID → EName.
4
What is a data model? List the types of data models used?
A data model is a collection of conceptual tools for describing data, data relationships, data semantics and
consistency constraints. Types: the entity-relationship model, the relational model, the object-based data model and
the semi-structured data model.
5
Define full functional dependency.
If X → Y is a full functional dependency, then the removal of any attribute A from X means that the dependency does not hold any more.
6
Explain about partial functional dependency?
X and Y are attributes. Attribute Y is partially dependent on the attribute X only if it is dependent on a subset of
attribute X.
7
What you meant by transitive functional dependency?
Transitive dependency is a functional dependency which holds by virtue of transitivity. A transitive dependency can
occur only in a relation that has three or more attributes. Let A, B, and C designate three distinct attributes (or
distinct collections of attributes) in the relation.
Suppose all three of the following conditions hold:
1. A → B
2. It is not the case that B → A
3. B → C
Then the functional dependency A → C (which follows from 1 and 3 by the axiom of transitivity) is a transitive
dependency.
8
What is meant by domain key normal form?
Domain/key normal form (DKNF) is a normal form used in database normalization which requires that the database
contains no constraints other than domain constraints and key constraints.
9
Define database management system?
Database management system (DBMS) is a collection of interrelated data and a set of programs to access those data.
10 List any five applications of DBMS.
Banking, Airlines, Universities, Credit card transactions, Tele communication, Finance, Sales, Manufacturing,
Human resources.
11 What is the purpose of Database Management System? (Nov/Dec 2014)
A DBMS removes the drawbacks of file-processing systems: data redundancy and inconsistency, difficulty in accessing
data, data isolation, integrity problems, atomicity problems and concurrent access anomalies.
12 Define instance and schema?
Instance: Collection of data stored in the data base at a particular moment is called an Instance of the database
Schema: The overall design of the data base is called the data base schema.
13 Explain entity relationship model?(May/June 2016)
The entity relationship model is a collection of basic objects called entities and relationships among those objects. An
entity is a thing or object in the real world that is distinguishable from other objects.
14 What is relationship? Give examples
A relationship is an association among several entities. Example: A depositor relationship associates a customer with
each account that he/she has.
15 What are stored and derived attributes?
Stored attributes: The attributes stored in a data base are called stored attributes.
Derived attributes: The attributes that are derived from the stored attributes are called derived attributes

St. Joseph's College of Engineering & St. Joseph's Institute of Technology

16 What are composite attributes?


Composite attributes can be divided into sub-parts. The degree of a relationship type is the number of participating
entity types.
17 Define weak and strong entity sets.
Weak entity set: entity sets that do not have a key attribute of their own are called weak entity sets.
Strong entity set: Entity set that has a primary key is termed a strong entity set.
18 What are attributes? Give examples.
An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each member of an
entity set.
Example: possible attributes of customer entity are customer name, customer id, Customer Street, customer city.
19 What does the cardinality ratio specify?
Mapping cardinalities or cardinality ratios express the number of entities to which another entity can be associated.
Mapping cardinalities must be one of the following:
One to one
One to many
Many to one
Many to many
20 Define relational algebra.
The relational algebra is a procedural query language. It consists of a set of operations that take one or two relations
as input and produce a new relation as output.
21 What is a data dictionary?
A data dictionary is a data structure which stores metadata about the structure of the database, i.e., the schema of the
database.
22 Explain the two types of participation constraint.
Total: The participation of an entity set E in a relationship set R is said to be total if every entity in E participates in
at least one relationship in R.
Partial: if only some entities in E participate in relationships in R, the participation of entity set E in relationship R
is said to be partial.
23 Define tuple variable.
Tuple variables are used for comparing two tuples in the same relation. The tuple variables are defined in the from
clause by way of the as clause.
24 Define de-normalization.
De-normalization is the process of attempting to optimize the performance of a database by adding redundant data
or by grouping data. In some cases, denormalization helps cover up the inefficiencies inherent in relational database
software. A normalized relational database imposes a heavy access load on the physical storage of data even if it is
well tuned for high performance.
25 List out the operations of the relational algebra.
It has six basic operators:
1) select
2) project
3) union
4) set difference
5) Cartesian product
6) rename
26 Define relational data model.
The relational model uses a collection of tables to represent both data and the relationships among those data. Each
table has multiple columns and each column has a unique name.
27 Define single and multi-valued attributes.
Single-valued: can take on only a single value for each entity instance.
E.g., age of an employee; there can be only one value for this.
Multi-valued: can take many values.
E.g., skill set of an employee.
28 Explain the semi-structured data model.
Specification of data where individual data items of the same type may have different sets of attributes.
Sometimes called schema-less or self-describing.
XML is widely used to represent this data model.
29 Mention Codd's rules.
Rule 0: The Foundation rule
Rule 1: The information rule


Rule 2: The guaranteed access rule


Rule 3: Systematic treatment of null values
Rule 4: Active online catalog based on the relational model
Rule 5: The comprehensive data sublanguage rule
Rule 6: The view updating rule
Rule 7: High-level insert, update, and delete
Rule 8: Physical data independence
Rule 9: Logical data independence
Rule 10: Integrity independence
Rule 11: Distribution independence
Rule 12: The non subversion rule
30 Define object-based data model.
The object-based data model can be seen as an extension of the E-R model with notions of encapsulation, methods and
object identity.
31 Explain the hierarchical data model.
The hierarchical data model organizes data in a tree structure: there is a hierarchy of parent and child data
segments. This model uses a parent-child relationship, with a 1:M mapping between record types.
32 Define the network model.
In the network model, some data are more naturally modeled with more than one parent per child.
This model permits the modeling of M:N relationships.
33 Why is 4NF more desirable than BCNF? (Nov/Dec 2014)
A relation schema is in BCNF (Boyce-Codd normal form) if, for every functional dependency A → B, either the
dependency is trivial or A is a superkey. Splitting a relation schema to reach BCNF does not necessarily preserve all
functional dependencies; lossless decomposition and dependency preservation are the main goals of normalization, and
sometimes it is not possible to get a BCNF decomposition that is dependency-preserving. 4NF has a very similar
definition: a relation schema is in 4NF if, for every multivalued dependency A →→ B, either the dependency is trivial
or A is a superkey of the schema.
If a relation schema is in 4NF, it is already in BCNF, and a 4NF decomposition preserves all functional
dependencies. So 4NF is preferable to BCNF.
34 Write the characteristics that distinguish the database approach from the file-based approach. (Apr/May 2015)
File-based System.
1. Separation and isolation of data
2. Duplication of data
3. Incompatible file formats
4. Data dependence
Database Approach:
1. Control of data redundancy
2. Data consistency
3. Sharing of data
4. Improved data integrity
5. Improved security
35 State the anomalies of 1NF. (Nov/Dec 2015)
The anomalies of 1NF are
1. Insert anomaly
2. Delete anomaly
3. Update anomaly
36 Is it possible for several attributes to have the same domain? Illustrate your answer with suitable examples.
(Nov/Dec 2015)
Yes. Several attributes can be defined over the same domain. For example, the attributes first name, middle name
and last name of a composite name attribute all share the same domain of character strings.
37 What are the disadvantages of the file processing system? (May/June 2016)
The file processing system has the following major disadvantages:
Data redundancy and inconsistency,
Integrity Problems,


Security Problems,
Difficulty in accessing data,
Data isolation.

UNIT-I / PART-B
1
Explain the ER model by taking Hospital Management / University Database Management as a case study.

2
Explain the various components of an ER diagram with examples.

ENTITY-RELATIONSHIP (ER) MODELING:
ER modeling: a graphical technique for understanding and organizing data, independent of
the actual database implementation.
Entity: anything that may have an independent existence and about which we intend to collect
data. Also known as an entity type.
Entity instance: a particular member of the entity type, e.g., a particular student.
Attributes: properties/characteristics that describe entities.
Relationships: associations between entities.
Attributes
The set of possible values for an attribute is called the domain of the attribute.
Example:
The domain of attribute marital status is just the four values: single, married, divorced,
widowed
The domain of the attribute month is the twelve values ranging from January to
December
Key attribute: The attribute (or combination of attributes) that is unique for every entity instance.
E.g., the account number of an account, the employee id of an employee, etc.
If the key consists of two or more attributes in combination, it is called a composite key.
Simple vs. composite attribute
Simple attribute: cannot be divided into simpler components.
E.g., age of an employee.
Composite attribute: can be split into components.
E.g., date of joining of the employee can be split into day, month and year.
Single Vs Multi-valued Attributes
Single-valued: can take on only a single value for each entity instance.
E.g., age of an employee; there can be only one value for this.
Multi-valued: can take many values.
E.g., skill set of an employee.
Stored Vs Derived attribute
Stored attribute: attribute that needs to be stored permanently.
E.g. name of an employee
Derived Attribute: Attribute that can be calculated based on other attributes
E.g. : years of service of employee can be calculated from date of joining and
current date
Regular Vs. Weak entity type
Regular Entity: Entity that has its own key attribute.
E.g.: Employee, student ,customer, policy holder etc.
Weak entity: entity that depends on another entity for its existence and doesn't have a key
attribute of its own.
E.g.: spouse of an employee
RELATIONSHIPS
A relationship type between two entity types defines the set of all associations between these
entity types
Each instance of the relationship between members of these entity types is called a Relationship
instance
Degree of a Relationship
Degree: the number of entity types involved
One: Unary Relationship

A unary relationship is represented as a diamond which connects one entity type to
itself as a loop.
Such a relationship means that some instances of Employee manage other
instances of Employee.
E.g.: employee manager-of employee is unary
Two: Binary Relationship

A relationship between two entity types


E.g.: employee works-for department is binary
Three: Ternary Relationship


E.g.: customer purchases item from shopkeeper is a ternary relationship among the three entity types customer, item and shopkeeper.


Cardinality
Relationships can have different connectivity
one-to-one (1:1)
one-to-many (1:N)
many-to- One (M:1)
many-to-many (M:N)
E.g.: Employee head-of Department (1:1).
Lecturer offers Course (1:N), assuming a course is taught by a single lecturer.
Student enrolls-in Course (M:N).
The minimum and maximum values of this connectivity are called the cardinality of the relationship.
Cardinality One To One

One instance of entity type Person is related to one instance of the entity type Chair.
Cardinality One to- Many

One instance of entity type Organization is related to multiple instances of entity type Employee.
Cardinality Many-to-One

Reverse of the One to Many relationship


Cardinality Many-to-Many

Multiple instances of one Entity are related to multiple instances of another Entity.

Relationship Participation
Total: Every entity instance must be connected through the relationship to another instance of
the other participating entity types
Partial: All instances need not participate
E.g.: Employee Head-of Department
Employee: partial
Department: total
Not all employees will be head-of some department, so only a few instances of the Employee entity
participate in the above relationship. But each department will be headed by some employee, so the
Department entity's participation is total and the Employee entity's participation is partial in the above
relationship.
3

Explain Different types of file systems organization in detail.

Heap File Organization: When a file is created using the heap file organization mechanism, the
operating system allocates a memory area to that file without any further accounting details. File
records can be placed anywhere in that memory area, and it is the responsibility of the software to manage the
records. A heap file does not support any ordering, sequencing or indexing on its own.
Sequential File Organization: Every file record contains a data field (attribute) to uniquely identify
that record. In the sequential file organization mechanism, records are placed in the file in some
sequential order based on a unique key field or search key. Practically, it is not possible to store all
the records sequentially in physical form.
Hash File Organization: This mechanism uses a hash function computed on some field of the
records. A file is a collection of records which has to be mapped onto the disk blocks allocated to it;
this mapping is defined by the hash computation. The output of the hash determines the location of the
disk block where the record may exist.
Clustered File Organization: Clustered file organization is not considered good for large databases.
In this mechanism, related records from one or more relations are kept in the same disk block; that is,
the ordering of records is not based on a primary key or search key. This organization helps to retrieve
data easily for a particular join condition. For queries other than the join condition on which the data is
stored, all queries become more expensive.
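The hash file organization described above can be sketched in a few lines of Python. This is an illustrative model only: in-memory lists stand in for disk blocks, and the block count, record fields and hash function are assumptions, not part of any real file system.

```python
NUM_BLOCKS = 4  # pretend disk blocks

def block_for(key):
    """The hash computation on the key field decides which block holds the record."""
    return hash(key) % NUM_BLOCKS

blocks = [[] for _ in range(NUM_BLOCKS)]

def insert(record):
    blocks[block_for(record["id"])].append(record)

def lookup(key):
    # Only one block has to be searched, not the whole file.
    for record in blocks[block_for(key)]:
        if record["id"] == key:
            return record
    return None

for i in range(10):
    insert({"id": i, "name": f"rec{i}"})

print(lookup(3))   # {'id': 3, 'name': 'rec3'}
print(lookup(42))  # None
```

The point of the sketch is that a lookup touches a single block chosen by the hash, which is why hash organization is fast for equality searches but useless for range queries.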
4

Explain the purpose and components of DBMS in detail.

Database Users
Users are differentiated by the way they expect to interact with the system
Application programmers
Sophisticated users
Naïve users
Database Administrator
Specialized users etc,.
Application programmers:
Professionals who write application programs and using these application programs they
interact with the database system
Sophisticated users:
These users interact with the database system without writing programs; instead they submit
queries to retrieve information.
Specialized users:
Who write specialized database applications to interact with the database system.
Naïve users:
Interact with the database system by invoking application programs that have been
written previously by application programmers.
E.g.: people accessing a database over the web


Database Administrator:
Coordinates all the activities of the database system; the database administrator has a good
understanding of the enterprise's information resources and needs.
Schema definition
Access method definition
Schema and physical organization modification
Granting user authority to access the database
Monitoring performance
Storage Manager

The Storage Manager include these following components/modules


Authorization Manager
Transaction Manager
File Manager
Buffer Manager
Storage manager is a program module that provides the interface between the low-level data
stored in the database and the application programs and queries submitted to the system.
The storage manager is responsible for the following tasks:
interaction with the file manager
efficient storing, retrieving and updating of data
Authorization Manager
Checks whether the user is an authorized person or not
Tests the satisfaction of integrity constraints
Transaction Manager
Responsible for concurrent transaction execution
Ensures that the database remains in a consistent state despite system failures
Query Processor
It is also a collection of components; it interprets the queries submitted by the
user.
The query processor includes these following components
DDL interpreter
DML Compiler
Query evaluation engine
DDL interpreter
-Interprets DDL statements and records the definition in the data dictionary
DML Compiler
Translates the DML statements into an evaluation plan consisting of low-level
instructions that the query evaluation engine can understand.
A query can be translated into many alternative evaluation plans.
Query optimization picks the lowest-cost evaluation plan from the
alternatives.
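The DDL interpreter / DML compiler split above can be seen in miniature with Python's built-in sqlite3 module. This is only a sketch: SQLite bundles all these components into one library, and the table and column names here are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a throwaway in-memory database
cur = conn.cursor()

# DDL: interpreted, and the definition is recorded in SQLite's catalog
# (its data dictionary, the sqlite_master table).
cur.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance REAL)")

# DML: compiled into an evaluation plan and executed.
cur.execute("INSERT INTO account VALUES ('A-101', 500.0)")
cur.execute("INSERT INTO account VALUES ('A-102', 700.0)")
conn.commit()

# The optimizer's chosen evaluation plan can be inspected:
for row in cur.execute("EXPLAIN QUERY PLAN SELECT * FROM account WHERE balance > 600"):
    print(row)

cur.execute("SELECT account_number FROM account WHERE balance > 600")
print(cur.fetchall())  # [('A-102',)]
```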
5

List out the disadvantages of File system over DB & explain it in detail.

In the early days, file-processing systems were used to store records, with various files used for storing
the records.
Drawbacks of using file systems to store data:
Data redundancy and inconsistency
Multiple file formats, duplication of information in different files
Difficulty in accessing data
Need to write a new program to carry out each new task
Data isolation: multiple files and formats
Integrity problems
Hard to add new constraints or change existing ones

Atomicity problem
Failures may leave database in an inconsistent state with partial updates carried
out
E.g. transfer of funds from one account to another should either complete or not
happen at all
Concurrent access anomalies
Concurrent access is needed for performance
Security problems
Database systems offer solutions to all the above problems

6
Discuss about (i) Data Models (ii) Mapping cardinalities. (Nov/Dec 2014)

(i) Data Models


It is the underlying structure of a database.
It is a collection of conceptual tools for describing data, relationships and constraints.
Categories:
Entity-Relationship model
Relational model
Object based data model
Semi structured data model
1) Relational model
It uses a collection of tables to represent the data and the relationships among those data.
Each table contains many columns and each column has a unique name.
It is the most widely used data model and most database systems are based on the relational model.
2) Object-based data model
The object-based data model can be seen as extending the E-R model with notions of
encapsulation, methods and object identity.
3) Semi structured data model
- Specification of data where individual data items of the same type may have
different sets of attributes
- Sometimes called schema-less or self-describing
- XML is widely used to represent this data model

4) Entity-Relationship model
The entity-relationship (ER) data model allows us to describe the data involved in a real-world
enterprise in terms of objects and their relationships. It is widely used to develop an initial
database design.
(ii) Mapping cardinalities

One to one(1:1)
An entity in A is associated with at most one entity in B and an entity in B is
associated with at most one entity in A.

One to many(1:M)
An entity in A is associated with any number of entities in B and an entity in B is
associated with at most one entity in A.
Many to one(M:1)
An entity in A is associated with at most one entity in B and an entity in B is
associated with any number of entities in A.

Many to many(M:N)
An entity in A is associated with any number of entities in B and an entity in B is
associated with any number of entities in A.

7
List out the operations of the relational algebra and explain with suitable examples.

Relational Algebra:

Procedural language
The Operations can be classified as
Basic Operations
Additional Operations
Extended Operations
Basic Operations
select
project
union
set difference
Cartesian product
rename
Select Operation

It is used to select tuples from a relation.

Notation: σp(r)
p is called the selection predicate.
Defined as:
σp(r) = {t | t ∈ r and p(t)}
Each term of the predicate is one of:
<attribute> op <attribute> or <attribute> op <constant>
where op is one of =, ≠, >, ≥, <, ≤
Example:
σ branch-name = Perryridge (account)
Project Operation

It is used to select certain columns from the relation.

Notation: π A1, A2, …, Ak (r)


where A1, A2, …, Ak are attribute names and r is a relation name.
The result is defined as the relation of k columns obtained by erasing the columns that are not
listed
Duplicate rows removed from result, since relations are sets
E.g., to eliminate the branch-name attribute of the account relation:
π account-number, balance (account)
Union Operation
For r ∪ s to be valid:
1. r, s must have the same arity (same number of attributes)
2. The attribute domains must be compatible (e.g., the 2nd column of r deals with the same type of
values as does the 2nd column of s)
E.g., to find all customers with either an account or a loan:
π customer-name (depositor) ∪ π customer-name (borrower)
Set Difference Operation

It is used to find tuples that are in one relation but are not in another relation.
Notation: r − s
Defined as:
r − s = {t | t ∈ r and t ∉ s}
Set differences must be taken between compatible relations:
r and s must have the same arity
Cartesian-Product Operation
It is used to combine information from two relations.
Notation: r × s
Defined as:
r × s = {t q | t ∈ r and q ∈ s}
Rename Operation
Allows us to name, and therefore to refer to, the results of relational-algebra expressions.
Allows us to refer to a relation by more than one name.
Example:
ρx(E)
returns the expression E under the name x
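The basic operators above can be modeled in plain Python, with a relation as a list of dictionaries. This is an illustrative sketch only; the relation contents and attribute names mirror the banking examples in the answer but are otherwise made up.

```python
def select(r, p):
    """sigma_p(r): keep the tuples of r satisfying predicate p."""
    return [t for t in r if p(t)]

def project(r, attrs):
    """pi_attrs(r): keep only the named columns, removing duplicate rows."""
    seen, out = set(), []
    for t in r:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out

def union(r, s):
    """r ∪ s: both relations must have the same attributes (same arity)."""
    out = list(r)
    out.extend(t for t in s if t not in r)
    return out

def difference(r, s):
    """r − s: tuples in r but not in s."""
    return [t for t in r if t not in s]

def cartesian(r, s):
    """r × s: every combination of a tuple from r with a tuple from s.
    Assumes the two relations have disjoint attribute names."""
    return [{**t, **q} for t in r for q in s]

account = [
    {"account_number": "A-101", "branch_name": "Perryridge", "balance": 500},
    {"account_number": "A-102", "branch_name": "Brighton", "balance": 700},
]

perryridge = select(account, lambda t: t["branch_name"] == "Perryridge")
print(project(perryridge, ["account_number"]))  # [{'account_number': 'A-101'}]
```

Rename has no runtime effect in this model (attribute names already label the values), which is why it is omitted from the sketch.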
8

Explain functional dependency in database design with its properties.

Functional dependency
In a given relation R, X and Y are attributes. Attribute Y is functionally
dependent on attribute X if each value of X determines EXACTLY ONE value
of Y, which is represented as X -> Y (X can be composite in nature).
We say X determines Y, or Y is functionally dependent on X. X → Y does
not imply Y → X.
If the value of the attribute Marks is known, then the value of the attribute
Grade is determined, since Marks → Grade.
Full functional dependency
X and Y are attributes, and X functionally determines Y. Note: no proper subset of X should
functionally determine Y.
Marks is fully functionally dependent on STUDENT# COURSE# and not on a subset
of STUDENT# COURSE#. This means Marks cannot be determined by either
STUDENT# or COURSE# alone; it can be determined only using
STUDENT# and COURSE# together. Hence Marks is fully functionally
dependent on STUDENT# COURSE#.

Partial functional dependency


X and Y are attributes. Attribute Y is partially dependent on the attribute X only if
it is dependent on a sub-set of attribute X.
Course Name, IName, Room# are partially dependent on composite attributes
STUDENT# COURSE# because COURSE# alone defines the Course Name,
IName, Room#.
Transitive functional dependency
X, Y and Z are three attributes. If X → Y and Y → Z, then X → Z.
Room# depends on IName, and in turn IName depends on COURSE#. Hence
Room# transitively depends on COURSE#.
9

Define anomalies and explain 1st normal form & 2nd normal form with example. (Nov/Dec 2014)

A relation schema is in 1NF:


If and only if all the attributes of the relation R are atomic in nature.
Atomic: The smallest levels to which data may be broken down and remain meaningful
Second normal form: 2NF
A Relation is said to be in Second Normal Form if and only if:
It is in the First normal form, and
No partial dependency exists between non-key attributes and key attributes.
An attribute of a relation R that belongs to any key of R is said to be a prime attribute, and one that
doesn't is a non-prime attribute. To make a table 2NF compliant, we have to remove all the partial
dependencies.
Note: - All partial dependencies are eliminated
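The two steps can be seen on a small example. This is a sketch with hypothetical enrolment data; the attribute names are invented for illustration.

```python
# Violates 1NF: the "courses" attribute is not atomic.
unnormalized = [
    {"student": "S1", "name": "Ann", "courses": ["C1", "C2"]},
]

# 1NF: one atomic course value per row.
first_nf = [
    {"student": s["student"], "name": s["name"], "course": c}
    for s in unnormalized for c in s["courses"]
]

# In first_nf the key is (student, course), but name depends on student alone:
# a partial dependency. 2NF splits the relation on that dependency.
students = {(r["student"], r["name"]) for r in first_nf}
enrolments = {(r["student"], r["course"]) for r in first_nf}

print(sorted(students))    # [('S1', 'Ann')]
print(sorted(enrolments))  # [('S1', 'C1'), ('S1', 'C2')]
```

After the split, the student's name is stored once instead of once per enrolment, which is exactly the redundancy 2NF removes.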
10

Discuss third normal form & BCNF with example. (Nov/Dec 2014)

Third normal form: (3 NF) A relation R is said to be in the Third Normal Form (3NF) if and only if
It is in 2NF and
No transitive dependency exists between non-key attributes and key attributes.
STUDENT# and COURSE# are the key attributes.
All other attributes, except Grade, are non-partially, non-transitively dependent on the key attributes.
STUDENT#, COURSE# → Marks
Marks → Grade
Note: all transitive dependencies are eliminated
Boyce-Codd Normal form BCNF
A relation is said to be in Boyce Codd Normal Form (BCNF)
- if and only if all the determinants are candidate keys. A BCNF relation is a strong 3NF, but not every
3NF relation is in BCNF.

11

Explain the 4NF and 5NF in detail with example. (Nov/Dec 2014)

Fourth NF:
Fourth normal form (4NF) is a normal form used in database normalization. Introduced by Ronald
Fagin in 1977, 4NF is the next level of normalization after Boyce-Codd normal form (BCNF). Whereas
the second, third, and Boyce-Codd normal forms are concerned with functional dependencies, 4NF is
concerned with a more general type of dependency known as a multivalued dependency. A table is in 4NF
if and only if, for every one of its non-trivial multivalued dependencies X →→ Y, X is a superkey; that is, X
is either a candidate key or a superset thereof.
Multivalued dependencies
If the column headings in a relational database table are divided into three disjoint groupings X, Y, and Z,
then, in the context of a particular row, we can refer to the data beneath each group of headings as x, y,
and z respectively. A multivalued dependency X →→ Y signifies that if we choose any x actually occurring
in the table (call this choice xc), and compile a list of all the xc yz combinations that occur in the table, we
will find that xc is associated with the same y entries regardless of z. A trivial multivalued dependency X
→→ Y is one where either Y is a subset of X, or X and Y together form the whole set of attributes of the
relation. A functional dependency is a special case of multivalued dependency: in a functional dependency
X → Y, every x determines exactly one y, never more than one.
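The "same y entries regardless of z" condition has a convenient equivalent form: for each X value, the (Y, Z) combinations present must be the full cross product of the Y values and the Z values seen with that X. The sketch below checks this directly; the course/book/lecturer rows are an invented example of the classic independent-facts pattern.

```python
from collections import defaultdict

def mvd_holds(rows, x_attrs, y_attrs):
    """Check the multivalued dependency X ->> Y on rows (a list of dicts).

    X ->> Y holds iff, for each X value, the set of (Y, Z) combinations
    equals the cross product of the Y values and Z values seen with that X.
    """
    if not rows:
        return True
    z_attrs = [a for a in rows[0] if a not in x_attrs and a not in y_attrs]
    ys, zs, yz = defaultdict(set), defaultdict(set), defaultdict(set)
    for t in rows:
        x = tuple(t[a] for a in x_attrs)
        y = tuple(t[a] for a in y_attrs)
        z = tuple(t[a] for a in z_attrs)
        ys[x].add(y); zs[x].add(z); yz[x].add((y, z))
    return all(len(yz[x]) == len(ys[x]) * len(zs[x]) for x in ys)

# A course's textbooks are independent of its lecturers,
# so every book/lecturer combination must appear.
teaches = [
    {"course": "DB", "book": "Silberschatz", "lecturer": "Rao"},
    {"course": "DB", "book": "Ullman",       "lecturer": "Rao"},
    {"course": "DB", "book": "Silberschatz", "lecturer": "Iyer"},
    {"course": "DB", "book": "Ullman",       "lecturer": "Iyer"},
]
print(mvd_holds(teaches, ("course",), ("book",)))  # True: course ->> book
```

Dropping any one of the four rows breaks the cross product, and the check returns False, which is exactly the redundancy pattern 4NF decomposition removes by splitting the table into (course, book) and (course, lecturer).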

Fifth normal form



Fifth normal form (5NF), also known as project-join normal form (PJ/NF), is a level of database
normalization designed to reduce redundancy in relational databases recording multi-valued facts by
isolating semantically related multiple relationships. A table is said to be in 5NF if and only if every
join dependency in it is implied by the candidate keys.
A join dependency *{A, B, …, Z} on R is implied by the candidate key(s) of R if and only if each of A, B,
…, Z is a superkey for R.

12

i)

With the help of a neat block diagram explain the basic architecture of a database management system.
(May/June 2016)

Database Users
Users are differentiated by the way they expect to interact with the system
Application programmers
Sophisticated users
Naïve users
Database Administrator
Specialized users etc,.
Application programmers:
Professionals who write application programs and using these application programs they
interact with the database system
Sophisticated users:
These users interact with the database system without writing programs; instead they submit
queries to retrieve information.
Specialized users:
Who write specialized database applications to interact with the database system.
Naïve users:
Interact with the database system by invoking application programs that have been
written previously by application programmers.
E.g.: people accessing a database over the web

Database Administrator:
Coordinates all the activities of the database system; the database administrator has a good
understanding of the enterprise's information resources and needs.
Schema definition
Access method definition
Schema and physical organization modification
Granting user authority to access the database
Monitoring performance
Storage Manager

The Storage Manager include these following components/modules

Authorization Manager
Transaction Manager
File Manager
Buffer Manager
The storage manager is a program module that provides the interface between the low-level data
stored in the database and the application programs and queries submitted to the system.
The storage manager is responsible for the following tasks:
interaction with the file manager
efficient storing, retrieving and updating of data
Authorization Manager
Checks whether the user is an authorized person or not
Test the satisfaction of integrity constraints
Transaction Manager
Responsible for concurrent transaction execution
It ensures that the database remains in a consistent state despite system failures
Query Processor
It is also a collection of components and is used to interpret the queries submitted by the user.
The query processor includes these following components
DDL interpreter
DML Compiler
Query evaluation engine
DDL interpreter
-Interprets DDL statements and records the definition in the data dictionary
DML Compiler
Translates DML statements into an evaluation plan that contains low-level
instructions the query evaluation engine can understand
A query can be translated into many alternative evaluation plans
Query optimization picks the lowest-cost evaluation plan from the
alternatives
ii) What are the advantages of having a centralized control of data? Illustrate your answer with suitable example.
(Nov/Dec 2015)

Data integrity is maximised and data redundancy is minimised, as the single storage place for all
the data implies that a given set of data has only one primary record. This helps keep data as
accurate and consistent as possible and enhances data reliability.
Generally better data security, as the single data storage location implies only one possible
place from which the database can be attacked and sets of data can be stolen or tampered with.
Better data preservation than other types of databases, due to an often-included fault-tolerant setup.
Easier for the end-user to use, due to the simplicity of a single database design.
Generally easier data portability and database administration.
More cost-effective than other types of database systems, as labour, power supply and maintenance
costs are all minimised.
Data kept in the same location is easier to change, re-organise, mirror, or analyse.
All the information can be accessed at the same time from the same location.
Updates to any given set of data are immediately received by every end-user.

Example:
Think about a banking application which uses centralized database. In this case the data is
stored in a common place. Applications running in various banks may communicate to the
common database over network to access or insert/update/delete information.

13

A car rental company maintains a database for all vehicles in its current fleet. For all vehicles, it includes the vehicle
identification number, license number, manufacturer, model, date of purchase, and color. Special data are included
for certain types of vehicles.
Trucks: cargo capacity.
Sports cars: horsepower, renter age requirement.
Vans: number of passengers.
Off-road vehicles: ground clearance, drivetrain (four- or two-wheel drive).
Construct an ER model for the car rental company database. (Nov/Dec 2015)


14

Draw E R Diagram for the Restaurant Menu Ordering System, which will facilitate the food items ordering and
services within a restaurant. The entire restaurant scenario is detailed as follows. The Customer is able to view the
food items menu, call the waiter, place orders and obtain the final bill through the computer kept in their table. The
waiters through their wireless tablet PC are able to initialize a table for customers, control the table functions to
assist customers, orders, send orders to food preparation staff (chef) and finalize the customers bill. The food
preparation staffs (Chefs), with their touch-display interface to the system, are able to view orders sent to the kitchen
by waiters. During preparation, they are able to let the waiter know the status of each item, and can send notification
when items are completed. The system should have full accountability and logging facilities, and should support
supervisor actions to account for exceptional circumstances, such as a meal being refunded or walked out on.
(Apr/May 2015)

15

State the need for Normalization of a Database and Explain the various Normal Forms (1st, 2nd, 3rd, BCNF, 4th, 5th and
Domain-Key) with suitable examples. (Apr/May 2015)

A relation schema is in 1NF:


If and only if all the attributes of the relation R are atomic in nature.
Atomic: The smallest levels to which data may be broken down and remain meaningful
Second normal form: 2NF
A Relation is said to be in Second Normal Form if and only if:
It is in the First normal form, and
No partial dependency exists between non-key attributes and key attributes.
An attribute of a relation R that belongs to any key of R is said to be a prime attribute, and one that
does not is a non-prime attribute. To make a table 2NF compliant, we have to remove all the partial
dependencies.
Note: All partial dependencies are eliminated
Third normal form: (3 NF) A relation R is said to be in the Third Normal Form (3NF) if and only if
It is in 2NF and
No transitive dependency exists between non-key attributes and key attributes.
STUDENT# and COURSE# are the key attributes.
All other attributes, except grade, are non-partially, non-transitively dependent on the key attributes.
Student#, Course# -> Marks
Marks -> Grade
Note: All transitive dependencies are eliminated
Boyce-Codd Normal form BCNF
A relation is said to be in Boyce-Codd Normal Form (BCNF)
if and only if all the determinants are candidate keys. A BCNF relation is a strong 3NF, but not every
3NF relation is BCNF.
Fourth NF:
Fourth normal form (4NF) is a normal form used in database normalization. Introduced by Ronald
Fagin in 1977, 4NF is the next level of normalization after Boyce-Codd normal form (BCNF). Whereas
the second, third, and Boyce-Codd normal forms are concerned with functional dependencies, 4NF is
concerned with a more general type of dependency known as a multivalued dependency. A table is in 4NF
if and only if, for every one of its non-trivial multivalued dependencies X ->> Y, X is a superkey;
that is, X is either a candidate key or a superset thereof.
Multivalued dependencies
If the column headings in a relational database table are divided into three disjoint groupings X, Y, and Z,
then, in the context of a particular row, we can refer to the data beneath each group of headings as x, y,
and z respectively. A multivalued dependency X ->> Y signifies that if we choose any x actually occurring
in the table (call this choice xc), and compile a list of all the xc-yz combinations that occur in the table, we
will find that xc is associated with the same y entries regardless of z. A trivial multivalued dependency
X ->> Y is one where either Y is a subset of X, or X and Y together form the whole set of attributes of the
relation. A functional dependency is a special case of multivalued dependency: in a functional dependency
X -> Y, every x determines exactly one y, never more than one.
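The distinction can be checked mechanically. The following sketch (illustrative; the data and function name are assumptions, not from the question bank) tests whether a functional dependency X -> Y holds over a set of rows:

```python
# Check whether the functional dependency X -> Y holds in a set of rows:
# every value of X must determine exactly one value of Y.
def fd_holds(rows, X, Y):
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in X)
        y = tuple(row[a] for a in Y)
        if x in seen and seen[x] != y:
            return False  # the same X value maps to two different Y values
        seen[x] = y
    return True

rows = [
    {"student": 1, "course": "DB", "marks": 90, "grade": "A"},
    {"student": 1, "course": "OS", "marks": 75, "grade": "B"},
    {"student": 2, "course": "DB", "marks": 75, "grade": "B"},
]
# (student, course) -> marks holds; marks -> grade also holds here,
# making grade transitively dependent on the key.
print(fd_holds(rows, ["student", "course"], ["marks"]))  # True
print(fd_holds(rows, ["marks"], ["grade"]))              # True
print(fd_holds(rows, ["student"], ["marks"]))            # False
```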


Fifth normal form


Fifth normal form (5NF), also known as Project-join normal form (PJ/NF) is a level of database
normalization designed to reduce redundancy in relational databases recording multi-valued facts by
isolating semantically related multiple relationships. A table is said to be in the 5NF if and only if every
join dependency in it is implied by the candidate keys.
A join dependency *{A, B, ..., Z} on R is implied by the candidate key(s) of R if and only if each of A, B, ..., Z is a superkey for R.

16

Briefly explain about views of data.(May/June 2016)


Database System provides three levels of abstraction.
Physical level
Logical level
View level

Physical level: It describes how the data are stored.


Logical level: It describes data stored in database, and the relationships among the
data.
View level: It describes only part of the database. Views can hide information for
security purposes
UNIT II
SQL & QUERY OPTIMIZATION
SQL Standards - Data types - Database Objects - DDL-DML-DCL-TCL - Embedded SQL - Static Vs Dynamic SQL - QUERY OPTIMIZATION: Query Processing and Optimization - Heuristics and Cost Estimates in Query Optimization.
UNIT-II / PART-A
1
Define query language?
A query is a statement requesting the retrieval of information. The portion of DML that involves information
retrieval is called a query language.
2
List the string operations supported by SQL?
1) Pattern matching Operation
2) Concatenation
3) Extracting character strings
4) Converting between uppercase and lower case letters.



3
Name the categories of SQL command? (May/June 2016)


SQL commands are divided in to the following categories:
1. Data - definition language
2. Data manipulation language
3. Data Query language
4. Data control language
5. Data administration statements
6. Transaction control statements
4
What is the use of Union and intersection operation?
Union: The result of this operation includes all tuples that are either in r1 or in r2 or in both r1 and r2. Duplicate
tuples are automatically eliminated.
Intersection: The result of this relation includes all tuples that are in both r1 and r2.
5
What are aggregate functions? And list the aggregate functions supported by SQL?
Aggregate functions are functions that take a collection of values as input and return a single value. Aggregate
functions supported by SQL are
Average: avg
Minimum: min
Maximum: max
Total: sum
Count: count
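The five aggregates can be seen in action on a toy table; this sketch uses Python's sqlite3 module with illustrative data (table name and values are assumptions, not from the question bank):

```python
import sqlite3

# Demonstrate the SQL aggregate functions avg, min, max, sum, count
# on a small in-memory table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (branch TEXT, balance INTEGER)")
con.executemany("INSERT INTO account VALUES (?, ?)",
                [("tnagar", 100), ("tnagar", 300), ("adayar", 200)])
row = con.execute(
    "SELECT avg(balance), min(balance), max(balance), "
    "sum(balance), count(*) FROM account").fetchone()
print(row)  # (200.0, 100, 300, 600, 3)
```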
6
List out some date functions.
To_date
To_char(sysdate, fmt)
Format models: d, dd, ddd, mon, dy, day, y, yy, yyy, yyyy, year, month, mm
7
What is the use of sub queries?
A sub query is a select-from-where expression that is nested with in another query. A common use of sub queries is
to perform tests for set membership, make set comparisons, and determine set cardinality.
8
List the SQL domain Types?
SQL supports the following domain types.

1) Char (n) ,2) varchar (n), 3) int , 4) numeric (p,d) ,5) float(n) , 6) date.
9
What is the use of integrity constraints?
Integrity constraints ensure that changes made to the database by authorized users do not result in a loss of data
consistency. Thus integrity constraints guard against accidental damage to the database
10
Mention the 2 forms of integrity constraints in ER model?
Key declarations, Form of a relationship
11
List some security violations (or) name any forms of malicious access.
1) Unauthorized reading of data
2) Unauthorized modification of data
3) Unauthorized destruction of data.
12
What is called query processing?
Query processing refers to the range of activities involved in extracting data from a database.
13
What is called a query evaluation plan?
A sequence of primitive operations that can be used to evaluate be query is a query evaluation plan or a query
execution plan.
14
What is called as an N-way merge?
The merge operation is a generalization of the two-way merge used by the standard in-memory sort-merge
algorithm. It merges N runs, so it is called an N-way merge.
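The same idea underlies library routines such as Python's heapq.merge; a minimal sketch of an N-way merge over sorted runs (the runs here are in memory, whereas external sort-merge reads them from disk):

```python
import heapq

# Merge N sorted runs into one sorted sequence, as the merge phase of
# the external sort-merge algorithm does with N runs.
runs = [[1, 4, 9], [2, 3, 10], [5, 6, 7, 8]]
merged = list(heapq.merge(*runs))
print(merged)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```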
15
What is a primary key?
Primary key is a set of one or more attributes that can uniquely identify record from the relation; it will not accept
null values and redundant values. A relation can have only one primary key.
16
What is a super key?
A super key is a set of one or more attributes that collectively allows us to identify uniquely an entity in the entity
set.
17
What is foreign key?
A relation schema r1 derived from an ER schema may include among its attributes the primary key of another
relation schema r2.this attribute is called a foreign key from r1 referencing r2.
18
What is the difference between unique and primary key?


Unique and primary key are keys which are used to uniquely identify record from the relation. But unique key
accepts null values; primary key does not accept null values.
19
What is the difference between char and varchar2 data type?
Char and varchar2 are data types which are used to store character values.
Char is static memory allocation; varchar2 is dynamic memory allocation.
20
How to add primary key to a table with suitable query?
Alter table <table name> add primary key(column);
21
Explain Query optimization? (May/June 2016)
Query optimization refers to the process of finding the lowest cost method of evaluating a given query.
22
Define Candidate key
A Candidate key is a set of one or more attributes that can uniquely identify a row in a given table. A relation can
have more than one candidate keys.
23
How to create composite primary key with suitable query?
Syntax: Create table <table name>(Var1 datatype, Var2 datatype, primary key(Var1,Var2));
Example:
Create table course(course_id int, student_id int, primary key(course_id, student_id));
24
Why does SQL allow duplicate tuples in a table or in a query result? (Nov/Dec 2015)
If key constraint is not set on a relation every result in a relation will be considered as a tuple and hence SQL allows
duplicate tuples in a table. Distinct keyword is used to avoid duplicate tuples in the result.
25
State the need for Query Optimization (Apr/May 2015)
The query optimizer attempts to determine the most efficient way to execute a given query by considering the
possible query plans.
26
Differentiate static and dynamic SQL. (Nov/Dec 2014) (Apr/May 2015) (Nov/Dec 2015)
Static SQL:
- The SQL statements do not change each time the program is run.
- Static SQL is compiled and optimized prior to its execution.
- The statement is prepared before the program is executed, and the operational form of the statement persists beyond the execution of the program.
Dynamic SQL:
- The program must dynamically allocate memory to receive query results.
- Dynamic SQL is compiled and optimized during execution.
- The dynamic SQL component of SQL allows programs to construct and submit SQL queries at runtime.
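As an illustration of the dynamic side, a program can assemble the statement text at run time. This sketch uses Python's sqlite3 (an assumption; the syllabus does not prescribe a host language), with the filter column chosen at run time and the value passed as a bound parameter; find_emp and the table are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (id INTEGER, name TEXT, dept TEXT)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(1, "asha", "IT"), (2, "ravi", "CSE")])

def find_emp(column, value):
    # The statement text is constructed at run time (dynamic SQL);
    # the column name is validated against a whitelist, the value is bound.
    assert column in ("id", "name", "dept")
    query = "SELECT name FROM emp WHERE %s = ?" % column
    return [r[0] for r in con.execute(query, (value,))]

print(find_emp("dept", "CSE"))  # ['ravi']
```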
27
Define: DDL, DML, DCL and TCL. (Nov/Dec 2014) (Apr/May 2015)
DDL COMMANDS:
Create
Alter
Add
Modify
Drop
Rename
Drop
DML Commands:
Insert
Select
Update
Delete
DCL commands
Grant - Provide access privilege to user
Revoke - Get back access privilege from user
TCL commands
Commit
Rollback
Save point
28
What is embedded SQL? What are its advantages?
The SQL standard defines embedding of SQL in a variety of programming languages such as C, Java, and Cobol. A

language to which SQL queries are embedded is referred to as a host language, and the SQL structures permitted in
the host language comprise embedded SQL.
The basic form of these languages follows that of the System R embedding of SQL into PL/I.
An EXEC SQL statement is used to identify an embedded SQL request to the preprocessor:
EXEC SQL <embedded SQL statement> END_EXEC
UNIT-II / PART-B
1
Explain about data definition language in SQL with examples. (Nov/Dec 2014)(May/June 2016)

DDL COMMANDS (DATA DEFINITION LANGUAGE)

Create
Alter
Add
Modify
Drop
Rename
Drop

SYNTAX:
CREATE:
Create table <table name> (column name1 datatype1 constraints, column2 datatype2 . . .);
ALTER:
ADD:
Alter table <table name> add(column name1 datatype1);
MODIFY:
Alter table <table name> modify(column name1 datatype1);
DROP:
Alter table <table name> drop (column name);
RENAME:
Rename <old table name> to <new table name>;
DROP:
Drop table <table name>;
2

Consider a student registration database comprising of the below given table schema.
Student File
Student Number | Student Name | Address | Telephone
Course File
Course Number | Description | Hours | Professor Number
Professor File
Professor Number | Name | Office
Registration File
Student Number | Course Number | Date

Consider a suitable sample of tuples/records for the above mentioned tables and write DML statements (SQL) to
answer for the queries listed below.
1. Which courses does a specific professor teach?
2. What courses do specific professors teach?
3. Who teaches a specific course and where is his/her office?
4. For a specific student number, in which courses is the student registered and what is his/her name?
5. Who are the professors for a specific student?
6. Who are the students registered in a specific course? (Apr/May 2015)

Answer
1. select coursenumber, description, name from coursefile, professorfile where
coursefile.professornumber = professorfile.professornumber;
2. select description from coursefile where professornumber in (102, 103);
3. select coursenumber, description, name, office from coursefile, professorfile where
coursefile.professornumber = professorfile.professornumber;
4. select studentname, description from studentfile, coursefile, registrationfile where
studentfile.studentnumber = registrationfile.studentnumber and
coursefile.coursenumber = registrationfile.coursenumber;
5. select studentname, name from studentfile, coursefile, professorfile, registrationfile where
studentfile.studentnumber = registrationfile.studentnumber and
coursefile.coursenumber = registrationfile.coursenumber
and coursefile.professornumber = professorfile.professornumber;
6. select studentname from studentfile, registrationfile where
studentfile.studentnumber = registrationfile.studentnumber and
registrationfile.coursenumber = <specific course number>;
3

Explain about data manipulation language in SQL with examples. (Nov/Dec 2014)

DATA MANIPULATION LANGUAGE


DML Commands:

Insert
Select
Update
Delete

SYNTAX:
INSERT:
Single level:
Insert into <table name> values (attributes1, attributes2);
Multilevel:
Insert into <table name> values (&attributes1,&attributes2.);
SELECT:
Single level:
Select <column name> from <table name>;
Multilevel:
Select * from <table name> where <condition>;
UPDATE:
Single level:
Update <table name> set <column name>=values where <condition>;
Multilevel:
Update <table name> set <column name>=values;
DELETE:
Single level:
Delete from <table name> where <column name>=values;
Multilevel:
Delete from <table name>;
4

Explain about data control language in SQL with examples.

DCL (Data Control Language)


GRANT
Used to grant privileges to the user

GRANT privilege_name
ON object_name
TO {user_name |PUBLIC |role_name}
[WITH GRANT OPTION];

privilege_name is the access right or privilege granted to the user. Some of the access rights
are ALL, EXECUTE, and SELECT.
object_name is the name of an database object like TABLE, VIEW, STORED PROC and
SEQUENCE.
user_name is the name of the user to whom an access right is being granted.
PUBLIC is used to grant access rights to all users.
ROLES are a set of privileges grouped together.
WITH GRANT OPTION - allows a user to grant access rights to other users.

REVOKE
Used to revoke privileges from the user
REVOKE privilege_name
ON object_name
FROM {user_name |PUBLIC |role_name}
1) System privileges - This allows the user to CREATE, ALTER, or DROP database objects.
2) Object privileges - This allows the user to EXECUTE, SELECT, INSERT, UPDATE, or DELETE data
from database objects to which the privileges apply.
5

Explain about TCL in SQL with examples.

TCL (Transaction Control Language)


COMMIT
Used to make the changes permanent in the database.
ROLLBACK
Similar to the undo operation.
SQL> delete from branch;
6 rows deleted.
SQL> select * from branch;
no rows selected
SQL> rollback;
Rollback complete.
SQL> select * from branch;
BRANCH_NAME  BRANCH_CITY  ASSETS
-----------  -----------  ------
tambaram     chennai-20    50000
adayar       chennai-20   100000
tnagar       chennai-17   250000
saidapet     chennai-15   150000
chrompet     chennai-43   450000
guindy       chennai-32   150000
6 rows selected.
SAVE POINT
SQL> select * from customer;
CUSTID  PID   QUANTITY
------  ----  --------
100     1234  10
101     1235  15
102     1236  15
103     1237  10
SQL> savepoint s1;
Savepoint created.
SQL> Delete from customer where custid=103;
CUSTID  PID   QUANTITY
------  ----  --------
100     1234  10
101     1235  15
102     1236  15
SQL> rollback to s1;
Rollback complete.
SQL> select * from customer;
CUSTID  PID   QUANTITY
------  ----  --------
100     1234  10
101     1235  15
102     1236  15
103     1237  10
SQL> commit;
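The savepoint sequence above can be reproduced programmatically; this sketch uses Python's sqlite3 as an illustrative stand-in for the SQL*Plus session (SQLite also supports SAVEPOINT and ROLLBACK TO):

```python
import sqlite3

# Re-create the SAVEPOINT / ROLLBACK TO sequence from the transcript.
con = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
con.execute("CREATE TABLE customer (custid INTEGER, pid INTEGER, quantity INTEGER)")
con.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                [(100, 1234, 10), (101, 1235, 15), (102, 1236, 15), (103, 1237, 10)])
con.execute("BEGIN")
con.execute("SAVEPOINT s1")
con.execute("DELETE FROM customer WHERE custid = 103")
assert con.execute("SELECT count(*) FROM customer").fetchone()[0] == 3
con.execute("ROLLBACK TO s1")  # undo work done after the savepoint
con.execute("COMMIT")
print(con.execute("SELECT count(*) FROM customer").fetchone()[0])  # 4
```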
6

Explain Integrity constraints with example.

Constraints

Constraints within a database are rules which control values allowed in columns.
Integrity
Integrity refers to the requirement that information be protected from improper modification.
Integrity constraints
Integrity constraints provide a way of ensuring that changes made to the database by
authorized users do not result in a loss of data consistency
Constraints
- Primary key
- Referential integrity
- Check constraint
- Unique Constraint
- Not Null/Null
Primary key
The primary key of a relational table uniquely identifies each record in the table
-Unique
-Not null
Referential integrity
We ensure that a value that appears in one relation for a given set of attributes also appears for
a certain set of attributes in another relation.
e.g.: branch_name varchar2(10) references branch(branch_name)
The foreign key identifies a column or set of columns in one (referencing) table that refers to a
column or set of columns in another (referenced) table

Check constraint
Eg :
Create table branch(
Balance number check (balance >500));
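That these constraints are actually enforced can be verified with a short script; this sketch uses Python's sqlite3 with SQLite syntax (an illustrative assumption; note SQLite requires foreign-key enforcement to be switched on explicitly):

```python
import sqlite3

# Show that CHECK and FOREIGN KEY constraints reject invalid data.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite: enable FK enforcement
con.execute("CREATE TABLE branch (branch_name TEXT PRIMARY KEY, "
            "balance INTEGER CHECK (balance > 500))")
con.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, "
            "branch_name TEXT REFERENCES branch(branch_name))")
con.execute("INSERT INTO branch VALUES ('tnagar', 1000)")

check_violation = False
try:
    con.execute("INSERT INTO branch VALUES ('adayar', 100)")  # violates CHECK
except sqlite3.IntegrityError:
    check_violation = True

fk_violation = False
try:
    con.execute("INSERT INTO account VALUES (1, 'guindy')")  # no such branch
except sqlite3.IntegrityError:
    fk_violation = True

print(check_violation, fk_violation)  # True True
```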
7

Design an employee detail relation and explain referential integrity using SQL queries.

Employee(emp_id pk,emp_Name,Addr,phone_no,dept_id FK)


Dept(dept_id PK,dept_name)

Dept table
Create table dept(dept_id varchar2(20) primary key, dept_name varchar2(20))
Employee table
Create table employee(emp_id varchar2(20) primary key, emp_name varchar2(20), phone_no
number, dept_id varchar2(20) references dept(dept_id));
8

Explain about Query optimization with neat Diagram. (Nov/Dec 2014)

Query Optimization:
Among all equivalent evaluation plans choose the one with lowest cost.
Cost is estimated using statistical information from the database catalog
e.g. number of tuples in each relation, size of tuples, etc.

9
Explain about Embedded SQL, Static and Dynamic SQL (Nov/Dec 2014)

Embedded SQL
-The SQL standards defines embedding of SQL in variety of programming language such as
VB,C,C++,Java.
-A language to which SQL queries are embedded is referred to as a host language and the SQL
structures permitted in the language comprise embedded SQL.
Two reasons:
Not all queries can be expressed in SQL, since SQL does not provide the full expressive power of
a general-purpose language.
Non-declarative actions, such as printing a report or interacting with a user, cannot be done from within SQL.
Dynamic SQL
The dynamic SQL components of SQL allows programs to construct and submit SQL Queries at
runtime.
In contrast the embedded SQL statement must be completely present at compile time they are
compiled by the embedded SQL pre processor
Using dynamic SQL the programs can create SQL queries at runtime and can either have them
executed immediately or have them prepared for subsequent use.
10

Explain about different data types in data base language with an example.

User-defined types are useful in situations where the same data type is used in several columns from
different tables.
The names of the user-defined types provide the extra information.
Syntax
Creation
Create or replace type <type_name> as object <representation >
Description:
Desc <type_name>;
Delete type:
Drop type <typename>;
11

Explain in detail about Heuristics and Cost estimates in Query Optimization (Nov/Dec 2014).

Process for heuristics optimization


The parser of a high-level query generates an initial internal representation
Apply heuristics rules to optimize the internal representation
A query execution plan is generated to execute groups of operations based on the access paths
available on the files involved in the query
The main heuristic is to apply first the operations that reduce the size of intermediate results.
E.g., Apply SELECT and PROJECT operations before applying the JOIN or other binary
operations.
Cost-Based Query Optimization:
Cost-based optimization is expensive, even with dynamic programming.
Systems may use heuristics to reduce the number of choices that must be made in a cost-based
fashion.
Heuristic optimization transforms the query-tree by using a set of rules that typically (but not in
all cases) improve execution performance:
Perform selection early (reduces the number of tuples)
Perform projection early (reduces the number of attributes)
Perform most restrictive selection and join operations before other similar operations.
Some systems use only heuristics, others combine heuristics with partial cost-based
optimization.
Steps in Typical Heuristic Optimization
1. Deconstruct conjunctive selections into a sequence of single selection operations.
2. Move selection operations down the query tree for the earliest possible execution.
3. Execute first those selection and join operations that will produce the smallest relations.
4. Replace Cartesian product operations that are followed by a selection condition by join operations.
5. Deconstruct and move as far down the tree as possible lists of projection attributes, creating new projections where needed.
6. Identify those subtrees whose operations can be pipelined, and execute them using pipelining.
12

Describe the six clauses in the syntax of an SQL query, and show what type of constructs can be specified in each of
the six clauses. Which of the six clauses are required and which are optional? (Nov/Dec 2015)
WHERE Clause
The WHERE clause defines the criteria to limit the records affected by SELECT, UPDATE, and DELETE
statements.
Syntax:
SELECT column1, column2, columnN
FROM table_name
WHERE [condition]
Eg:
SQL> SELECT ID, NAME, SALARY
FROM CUSTOMERS
WHERE SALARY > 2000;
Having clause
Specifies a search condition for a group or an aggregate. HAVING can be used only with the SELECT statement.
HAVING is typically used in a GROUP BY clause.


Syntax:
SELECT column1, column2
FROM table1, table2
WHERE [ conditions ]
GROUP BY column1, column2
HAVING [ conditions ];
Eg:
SQL> SELECT AGE, COUNT(*)
FROM CUSTOMERS
GROUP BY AGE
HAVING COUNT(AGE) >= 2;
Group by clause
Group by clause is used to group the results of a SELECT query based on one or more columns.
Syntax:
SELECT column1, column2
FROM table_name
WHERE [ conditions ]
GROUP BY column1, column2
Eg:
SELECT NAME, SUM(SALARY) FROM CUSTOMERS GROUP BY NAME;
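The GROUP BY / HAVING combination can be run end to end on toy data; this sketch uses Python's sqlite3 with an illustrative CUSTOMERS table (names and values are assumptions):

```python
import sqlite3

# GROUP BY collects rows by age; HAVING then filters the groups.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (name TEXT, age INTEGER, salary INTEGER)")
con.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [("asha", 25, 2000), ("asha", 25, 1500),
                 ("ravi", 30, 3000), ("mala", 25, 2500)])
rows = con.execute(
    "SELECT age, count(*) FROM customers "
    "GROUP BY age HAVING count(age) >= 2").fetchall()
print(rows)  # [(25, 3)] -- only the age-25 group has two or more rows
```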
Order by
The SQL ORDER BY clause is used to sort the data in ascending or descending order, based on one or more
columns. Some databases sort query results in ascending order by default.
Syntax:
The basic syntax of ORDER BY clause is as follows:
SELECT column-list
FROM table_name
[WHERE condition]
[ORDER BY column1, column2, .. columnN] [ASC | DESC];
Eg:
SELECT * FROM CUSTOMER ORDER BY NAME DESC;

Into clause
The SELECT INTO statement copies data from one table and inserts it into a new table.
Syntax:
SELECT column_name(s)
INTO newtable [IN externaldb]
FROM table1;
Eg:
SELECT CustomerName, ContactName
INTO Customers1
FROM Customers;

From clause
The SQL FROM clause is used to list the tables and any joins required for the SQL statement.
Syntax
FROM table1
[ { INNER JOIN
| LEFT [OUTER] JOIN
| RIGHT [OUTER] JOIN
| FULL [OUTER] JOIN } table2
ON table1.column1 = table2.column1 ]
Eg:
SELECT products.product_name, categories.category_name
FROM products
INNER JOIN categories
ON products.category_id = categories.category_id;
Of the six clauses, SELECT and FROM are required; WHERE, GROUP BY, HAVING, ORDER BY and INTO are optional.
13

Discuss about join order optimization and heuristic optimization algorithm. (Apr/May 2015)

Join order optimization


A join combines records from two tables based on some common information. But since
a join works with only two tables at a time, a query requesting data from n tables must be
executed as a sequence of n - 1 joins. Although the results of a query are the same regardless of
the join order, the order in which the tables are joined greatly influences the cost and performance
of a query. The Query Optimizer must find the optimal sequence of joins between the tables used
in the query. Finding the optimal join order is one of the most difficult problems in query
optimization.
Example
For all relations r1, r2, and r3,
(r1 ⋈ r2) ⋈ r3 = r1 ⋈ (r2 ⋈ r3)
If r2 ⋈ r3 is quite large and r1 ⋈ r2 is small, we choose
(r1 ⋈ r2) ⋈ r3
so that we compute and store a smaller temporary relation.
Consider the expression
Π customer-name ((σ branch-city = "Brooklyn" (branch)) ⋈ account ⋈ depositor)
We could compute account ⋈ depositor first, and join the result with
σ branch-city = "Brooklyn" (branch)
but account ⋈ depositor is likely to be a large relation. Since it is more likely that only a
small fraction of the bank's customers have accounts in branches located in Brooklyn, it
is better to compute
σ branch-city = "Brooklyn" (branch) ⋈ account
first.
Join Order Optimization Algorithm
procedure findbestplan(S)
if (bestplan[S].cost < ∞)
return bestplan[S]
// else bestplan[S] has not been computed earlier; compute it now
for each non-empty subset S1 of S such that S1 ≠ S
P1 = findbestplan(S1)
P2 = findbestplan(S - S1)
A = best algorithm for joining results of P1 and P2
cost = P1.cost + P2.cost + cost of A
if cost < bestplan[S].cost
bestplan[S].cost = cost
bestplan[S].plan = "execute P1.plan; execute P2.plan; join results of P1 and P2 using A"
return bestplan[S]
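The dynamic program above can be sketched in a few lines of Python. This is a toy version: the cost model (sum of input sizes of each join) and the relation cardinalities are illustrative assumptions; a real optimizer would use catalog statistics and per-algorithm cost formulas:

```python
from functools import lru_cache
from itertools import combinations

# Illustrative cardinalities for three relations.
sizes = {"r1": 10, "r2": 1000, "r3": 5}

def join_size(s):
    # Crude size estimate for the result of joining a set of relations.
    out = 1
    for r in s:
        out *= sizes[r]
    return out

def proper_subsets(S):
    items = sorted(S)
    for r in range(1, len(items)):
        for c in combinations(items, r):
            yield frozenset(c)

@lru_cache(maxsize=None)          # memoization plays the role of bestplan[S]
def findbestplan(S):
    if len(S) == 1:
        return 0, next(iter(S))   # scanning a base relation
    best_cost, best_plan = float("inf"), None
    for S1 in proper_subsets(S):
        c1, p1 = findbestplan(S1)
        c2, p2 = findbestplan(S - S1)
        cost = c1 + c2 + join_size(S1) + join_size(S - S1)  # toy "cost of A"
        if cost < best_cost:
            best_cost, best_plan = cost, (p1, p2)
    return best_cost, best_plan

cost, plan = findbestplan(frozenset(sizes))
print(cost)  # 1065: join r1 with r3 first, then join the result with r2
```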
Heuristic Optimization Algorithm.
Process for heuristics optimization
1. The parser of a high-level query generates an initial internal representation;
2. Apply heuristics rules to optimize the internal representation.
3. A query execution plan is generated to execute groups of operations based on the access paths
available on the files involved in the query.
The main heuristic is to apply first the operations that reduce the size of intermediate results.
E.g., Apply SELECT and PROJECT operations before applying the JOIN or other binary
operations.
Heuristic Optimization of Query Trees:
The same query could correspond to many different relational algebra expressions, and hence
to many different query trees.
The task of heuristic optimization of query trees is to find a final query tree that is efficient to
execute.
Steps in heuristic optimization of a query tree:
a. Start with the initial query tree for the SQL query Q.
b. Move SELECT operations down the query tree.
c. Apply the more restrictive SELECT operations first.
d. Replace CARTESIAN PRODUCT and SELECT pairs with JOIN operations.
e. Move PROJECT operations down the query tree.
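The effect of pushing a selection below a join can be illustrated with a tiny in-memory sketch; the table contents are made up for this example.

```python
# Plan 1 joins first and selects afterwards; Plan 2 selects first.
# Both produce the same answer, but Plan 2's intermediate result is smaller.
branch = [("tambaram", "Brooklyn"), ("adayar", "Chennai"), ("tnagar", "Chennai")]
account = [("A-101", "tambaram"), ("A-102", "adayar"), ("A-103", "tnagar"),
           ("A-104", "adayar")]

# Plan 1: join first, select afterwards -> large intermediate result.
joined = [(a, b) for a in account for b in branch if a[1] == b[0]]
late = [(a, b) for (a, b) in joined if b[1] == "Brooklyn"]

# Plan 2: select first, then join -> small intermediate result.
brooklyn = [b for b in branch if b[1] == "Brooklyn"]
early = [(a, b) for a in account for b in brooklyn if a[1] == b[0]]

print(len(joined), len(brooklyn))   # 4 intermediate tuples vs 1
```

Both plans return the same single Brooklyn account, but the pushed-down selection touches far fewer intermediate tuples.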
Example of transforming a query
Problem:
Find the last names of employees born after 1957 who work on a project named 'Aquarius'.
Query:
SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = 'Aquarius' AND PNUMBER = PNO
AND ESSN = SSN AND BDATE > '1957-12-31';
Applying rules

14. Explain about SQL Fundamentals. (May/June 2016)

DDL COMMANDS (DATA DEFINITION LANGUAGE)

Create
Alter (Add, Modify, Drop)
Rename
Drop

SYNTAX:
CREATE:
Create table <table name> (column name1 datatype1 constraints, column2 datatype2 . . .);
ALTER:
ADD:
Alter table <table name> add(column name1 datatype1);
MODIFY:
Alter table <table name> modify(column name1 datatype1);
DROP:
Alter table <table name> drop (column name);
RENAME:
Rename <old table name> to <new table name>;
DROP:
Drop table <table name>;
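The DDL commands above can be exercised from Python using SQLite. Note that the syntax shown is Oracle-flavoured; SQLite supports CREATE, ALTER ... ADD COLUMN, ALTER ... RENAME TO and DROP, but not ALTER ... MODIFY. The table and column names here are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE student (regno INTEGER PRIMARY KEY, name TEXT)")
cur.execute("ALTER TABLE student ADD COLUMN dept TEXT")        # ADD a column
cur.execute("ALTER TABLE student RENAME TO student_info")      # RENAME the table
cols = [row[1] for row in cur.execute("PRAGMA table_info(student_info)")]
print(cols)                                                    # ['regno', 'name', 'dept']
cur.execute("DROP TABLE student_info")                         # DROP the table
conn.close()
```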
DATA MANIPULATION LANGUAGE
DML Commands:
Insert
Select
Update
Delete
SYNTAX:
INSERT:
Single level:
Insert into <table name> values (attributes1, attributes2);
Multilevel:
Insert into <table name> values (&attributes1, &attributes2, ...);
SELECT:
Single level:
Select <column name> from <table name>;
Multilevel:
Select * from <table name> where <condition>;
UPDATE:
Single level:
Update <table name> set <column name>=values where <condition>;
Multilevel:
Update <table name> set <column name>=values;
DELETE:
Single level:
Delete from <table name> where <column name>=values;
Multilevel:
Delete from <table name>;
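A minimal sketch of these DML commands on SQLite; the employee table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE emp (empid INTEGER, ename TEXT, salary INTEGER)")

# INSERT several rows
cur.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(1, "anu", 30000), (2, "bala", 45000), (3, "charu", 52000)])
# SELECT with a condition
rows = cur.execute("SELECT ename FROM emp WHERE salary > 40000").fetchall()
# UPDATE one row
cur.execute("UPDATE emp SET salary = salary + 5000 WHERE empid = 1")
# DELETE one row
cur.execute("DELETE FROM emp WHERE empid = 3")
count = cur.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
print(rows, count)   # [('bala',), ('charu',)] 2
conn.close()
```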
DCL (Data Control Language)
GRANT
Used to grant privileges to the user
GRANT privilege_name
ON object_name
TO {user_name |PUBLIC |role_name}
[WITH GRANT OPTION];

privilege_name is the access right or privilege granted to the user. Some of the access rights
are ALL, EXECUTE, and SELECT.
object_name is the name of an database object like TABLE, VIEW, STORED PROC and
SEQUENCE.
user_name is the name of the user to whom an access right is being granted.
PUBLIC is used to grant access rights to all users.
ROLES are a set of privileges grouped together.
WITH GRANT OPTION - allows a user to grant access rights to other users.

REVOKE
Used to revoke privileges from the user
REVOKE privilege_name
ON object_name
FROM {user_name |PUBLIC |role_name}
1) System privileges - This allows the user to CREATE, ALTER, or DROP database objects.
2) Object privileges - This allows the user to EXECUTE, SELECT, INSERT, UPDATE, or DELETE data
from database objects to which the privileges apply.
TCL (Transaction Control Language)


COMMIT
Used to make the changes permanent in the database.
ROLLBACK
Similar to the undo operation.
SQL> delete from branch;
6 rows deleted.
SQL> select * from branch;
no rows selected
SQL> rollback;
Rollback complete.
SQL> select * from branch;
BRANCH_NAME     BRANCH_CITY     ASSETS
--------------- --------------- ----------
tambaram        chennai-20           50000
adayar          chennai-20          100000
tnagar          chennai-17          250000
saidapet        chennai-15          150000
chrompet        chennai-43          450000
guindy          chennai-32          150000
6 rows selected.
SAVEPOINT
SQL> select * from customer;
CUSTID     PID        QUANTITY
---------- ---------- ----------
100        1234       10
101        1235       15
102        1236       15
103        1237       10
SQL> savepoint s1;
Savepoint created.
SQL> delete from customer where custid=103;
1 row deleted.
SQL> select * from customer;
CUSTID     PID        QUANTITY
---------- ---------- ----------
100        1234       10
101        1235       15
102        1236       15
SQL> rollback to s1;
Rollback complete.
SQL> select * from customer;
CUSTID     PID        QUANTITY
---------- ---------- ----------
100        1234       10
101        1235       15
102        1236       15
103        1237       10
SQL> commit;
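The ROLLBACK and SAVEPOINT behaviour of the session above can be reproduced from Python with SQLite; the rows follow the customer example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None            # manage transactions explicitly
cur = conn.cursor()
cur.execute("CREATE TABLE customer (custid INTEGER, pid INTEGER, quantity INTEGER)")

cur.execute("BEGIN")
cur.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                [(100, 1234, 10), (101, 1235, 15), (102, 1236, 15), (103, 1237, 10)])
cur.execute("SAVEPOINT s1")
cur.execute("DELETE FROM customer WHERE custid = 103")
after_delete = cur.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
cur.execute("ROLLBACK TO s1")          # undo back to the savepoint only
after_rollback = cur.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
cur.execute("COMMIT")                  # make the remaining changes permanent
final = cur.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(after_delete, after_rollback, final)   # 3 4 4
conn.close()
```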
15. Assume the following tables:


Degree(degcode, name, subject)
Candidate(cand_name, seatno, degcode, semester, month, year, result)
Marks(seatno, degcode, semester, month, year, papcode, marks)


Degcode - degree code; Name - name of the degree (e.g. MSc, MCom);
Subject - subject of the course (e.g. Physics); Papcode - paper code (e.g. A1)
Solve the following queries using SQL:
(i) Write a SELECT statement to display all the degree codes which are there in the candidate table but not present in
degree table in the order of degcode.

Select degcode from candidate minus select degcode from degree order by degcode;


(ii) Write a SELECT statement to display the name of all the candidates who have got less than 40
marks in exactly 2 subjects.
Select c.cand_name from candidate c, marks m
where c.seatno = m.seatno and c.degcode = m.degcode and m.marks < 40
group by c.cand_name having count(*) = 2;
(iii) Write a SELECT statement to display the name, subject and number of candidates for all
degrees in which there are less than 5 candidates.
Select d.name, d.subject, count(c.cand_name) from degree d, candidate c
where d.degcode = c.degcode
group by d.name, d.subject having count(c.cand_name) < 5;
(iv) Write a SELECT statement to display the names of all the candidates who have got the highest
total marks in MSc (Maths). (Nov/Dec 2015)
Select c.cand_name from candidate c, degree d, marks m
where c.degcode = d.degcode and c.seatno = m.seatno
and d.name = 'MSc' and d.subject = 'Maths'
group by c.cand_name
having sum(m.marks) >= all (
    select sum(m2.marks) from candidate c2, degree d2, marks m2
    where c2.degcode = d2.degcode and c2.seatno = m2.seatno
    and d2.name = 'MSc' and d2.subject = 'Maths'
    group by c2.cand_name);
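Query (i) can be checked from Python on SQLite, which spells Oracle's MINUS as EXCEPT. The sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE degree (degcode TEXT, name TEXT, subject TEXT)")
cur.execute("CREATE TABLE candidate (cand_name TEXT, seatno INTEGER, degcode TEXT,"
            " semester INTEGER, month TEXT, year INTEGER, result TEXT)")
cur.executemany("INSERT INTO degree VALUES (?, ?, ?)",
                [("D1", "MSc", "Maths"), ("D2", "MCom", "Commerce")])
cur.executemany("INSERT INTO candidate (cand_name, seatno, degcode) VALUES (?, ?, ?)",
                [("asha", 1, "D1"), ("ravi", 2, "D3"), ("mala", 3, "D4")])

# degree codes present in candidate but missing from degree, in order
missing = cur.execute(
    "SELECT degcode FROM candidate EXCEPT SELECT degcode FROM degree"
    " ORDER BY degcode").fetchall()
print(missing)   # [('D3',), ('D4',)]
conn.close()
```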
16. Briefly explain about Query Processing. (May/June 2016)

The process of choosing a suitable execution strategy for processing a query is called query
optimization.
Two internal representations of a query
Query Tree
a tree data structure that corresponds to a relational algebra expression. It
represents the input relations of the query as leaf nodes of the tree, and
represents the relational algebra operations as internal nodes.
Query Graph
a graph data structure that corresponds to a relational calculus expression.
It does not indicate an order on which operations to perform first. There is
only a single graph corresponding to each query.

Steps in processing a high-level query


Cost-based query optimization


Estimate and compare the costs of executing a query using different execution strategies
and choose the strategy with the lowest cost estimate.
Issues

Cost function
Number of execution strategies to be considered

Cost Components for Query Execution


1. Access cost to secondary storage
2. Storage cost
3. Computation cost
4. Memory usage cost
5. Communication cost
Catalog Information Used in Cost Functions
Information about the size of a file
o number of records (tuples) (r),
o record size (R),
o number of blocks (b)
o blocking factor (bfr)
Information about indexes and indexing attributes of a file
o Number of levels (x) of each multilevel index
o Number of first-level index blocks (bI1)
o Number of distinct values (d) of an attribute
o Selectivity (sl) of an attribute
o Selection cardinality (s) of an attribute. (s = sl * r)
Examples of Cost Functions for SELECT
S1. Linear search (brute force) approach

o CS1a = b;
o For an equality condition on a key, CS1a = (b/2) if the record is found; otherwise
CS1a = b.
S2. Binary search:
o CS2 = ceil(log2(b)) + ceil(s/bfr) - 1
o For an equality condition on a unique (key) attribute, CS2 = ceil(log2(b))
S3. Using a primary index (S3a) or hash key (S3b) to retrieve a single record
o CS3a = x + 1; CS3b = 1 for static or linear hashing;
o CS3b = 1 for extendible hashing;
S4. Using an ordering index to retrieve multiple records:
o For the comparison condition on a key field with an ordering index, CS4 = x +
(b/2)
S5. Using a clustering index to retrieve multiple records:
o CS5 = x + (s/bfr)
S6. Using a secondary (B+-tree) index:
o For an equality comparison, CS6a = x + s;
o For a comparison condition such as >, <, >=, or <=,
o CS6a = x + (bI1/2) + (r/2)
S7. Conjunctive selection:
o Use either S1 or one of the methods S2 to S6 to solve.
o For the latter case, use one condition to retrieve the records and then check in the
memory buffer whether each retrieved record satisfies the remaining conditions in
the conjunction.
S8. Conjunctive selection using a composite index:
o Same as S3a, S5 or S6a, depending on the type of index.
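The selection cost formulas above can be coded directly. The file parameters here (r records, blocking factor bfr, index height x, selection cardinality s) are made-up sample values.

```python
from math import ceil, log2

r, bfr = 30000, 100
b = ceil(r / bfr)              # number of blocks = 300
x, s = 2, 50                   # index levels, selection cardinality

CS1 = b                        # S1: linear search, non-key condition
CS1_key = b // 2               # S1: equality on a key, found on average halfway
CS2 = ceil(log2(b))            # S2: binary search, equality on a unique key
CS3a = x + 1                   # S3a: primary index, single record
CS5 = x + ceil(s / bfr)        # S5: clustering index, multiple records
CS6 = x + s                    # S6: secondary B+-tree index, equality

print(CS1, CS1_key, CS2, CS3a, CS5, CS6)   # 300 150 9 3 3 52
```

The comparison makes the catalog's role concrete: with an index, the cost drops from hundreds of block accesses to a handful.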

Examples of Cost Functions for JOIN


J1. Nested-loop join:
o CJ1 = bR + (bR*bS) + ((js* |R|* |S|)/bfrRS)
o (Use R for outer loop)
J2. Single-loop join (using an access structure to retrieve the matching record(s))
o If an index exists for the join attribute B of S with index levels xB, we can retrieve
each record s in R and then use the index to retrieve all the matching records t
from S that satisfy t[B] = s[A].
o The cost depends on the type of index.
o For a secondary index,
CJ2a = bR + (|R| * (xB + sB)) + ((js* |R|* |S|)/bfrRS);
o For a clustering index,
CJ2b = bR + (|R| * (xB + (sB/bfrB))) + ((js* |R|* |S|)/bfrRS);
o For a primary index,
CJ2c = bR + (|R| * (xB + 1)) + ((js* |R|* |S|)/bfrRS);
o If a hash key exists for one of the two join attributes B of S
CJ2d = bR + (|R| * h) + ((js* |R|* |S|)/bfrRS);
J3. Sort-merge join:
CJ3a = CS + bR + bS + ((js* |R|* |S|)/bfrRS);
(CS: Cost for sorting files)
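Formulas J1 and J2c can likewise be compared numerically; the file parameters below are made-up sample values, with js the join selectivity and bfrRS the blocking factor of the result file.

```python
from math import ceil

bR, bS = 2000, 500             # blocks of R and S
nR, nS = 100000, 25000         # records |R| and |S|
js = 0.0001                    # join selectivity
bfrRS = 50                     # blocking factor of the result file
xB = 2                         # primary index height on the join attribute of S

result_blocks = ceil(js * nR * nS / bfrRS)
CJ1 = bR + bR * bS + result_blocks           # J1: nested-loop join, R outer
CJ2c = bR + nR * (xB + 1) + result_blocks    # J2c: single-loop join via primary index
print(CJ1, CJ2c)   # 1007000 307000
```

Even for this modest example the index-based single-loop join is several times cheaper than the brute-force nested loop.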
UNIT III
TRANSACTION PROCESSING AND CONCURRENCY CONTROL
Introduction-Properties of Transaction- Serializability- Concurrency Control Locking Mechanisms-Two Phase Commit
Protocol-Dead lock.
UNIT-III/ PART-A


Give the reasons for allowing concurrency?


If transactions run serially, a short transaction may have to wait for a preceding long transaction
to complete, which can lead to unpredictable delays. Concurrent execution reduces these delays and
improves throughput and resource utilization.
What is average response time?
The average response time is the average time for a transaction to be completed after it has been submitted.
What are the two types of serializability?
The two types of serializability are conflict serializability and view serializability.
Differentiate strict two phase locking protocol and rigorous two phase locking protocol.(May/June 2016)
In the strict two-phase locking protocol, all exclusive-mode locks taken by a transaction are held
until that transaction commits. The rigorous two-phase locking protocol requires that all locks
(shared and exclusive) be held until the transaction commits.
How the time stamps are implemented
Use the value of the system clock as the timestamp; that is, a transaction's timestamp is equal to
the value of the clock when the transaction enters the system. Alternatively, use a logical counter
that is incremented after each new timestamp is assigned; the timestamp is then equal to the value
of the counter.
What are the time stamps associated with each data item?
W-timestamp(Q) denotes the largest timestamp of any transaction that executed write(Q) successfully.
R-timestamp(Q) denotes the largest timestamp of any transaction that executed read(Q) successfully.
Define blocks?
The database system resides permanently on nonvolatile storage, and is partitioned into fixed-length storage units
called blocks.
What are the different modes of lock?
The modes of lock are:
Shared
Exclusive
Define deadlock?
A deadlock is a situation in which two or more transactions wait for each other to release locks,
so that none of them can ever proceed with its normal execution.
Define the phases of two phase locking protocol
Growing phase: a transaction may obtain locks but not release any lock.
Shrinking phase: a transaction may release locks but may not obtain any new locks.
Define upgrade and downgrade?
The conversion of a shared lock to an exclusive lock is known as an upgrade.
The conversion of an exclusive lock to a shared lock is known as a downgrade.
What is a database graph?
The partial ordering implies that the set D may now be viewed as a directed acyclic graph, called a database graph.
What is meant by buffer blocks?
The blocks residing temporarily in main memory are referred to as buffer blocks.
What are uncommitted modifications?
The immediate-modification technique allows database modifications to be output to the database while the
transaction is still in the active state. Data modifications written by active transactions are called uncommitted
modifications.
Define shadow paging.
An alternative to log-based crash recovery technique is shadow paging. This technique needs fewer disk accesses
than do the log-based methods.
Define page.
The database is partitioned into some number of fixed-length blocks, which are referred to as pages.
Explain current page table and shadow page table.
The key idea behind the shadow paging technique is to maintain two page tables during the life of the transaction:
the current page table and the shadow page table. Both the page tables are identical when the transaction starts. The
current page table may be changed when a transaction performs a write operation.
What are the drawbacks of shadow-paging technique?
Commit Overhead
Data fragmentation
Garbage collection
What is meant by garbage collection.(May/June 2016)
Garbage may also be created as a side effect of crashes. Periodically, it is necessary to find all
the garbage pages and add them to the list of free pages. This process is called garbage collection.


What is transaction?
Collections of operations that form a single logical unit of work are called transactions.
What are the properties of transaction? Or Write the ACID properties of Transaction. (Nov/Dec 2014)
(Apr/May 2015)(May/June 2016)
Atomicity
Consistency
Isolation

Durability
What is recovery management component?
Ensuring durability is the responsibility of a software component of the database system called the
recovery-management component.
When is a transaction rolled back?
Any changes that the aborted transaction made to the database must be undone. Once the changes caused by an
aborted transaction have been undone, then the transaction has been rolled back.
What are the states of transaction?
The states of transaction are
Active
Partially committed
Failed
Aborted
Committed
Terminated
What is a shadow copy scheme?
It is a simple, but efficient, scheme called the shadow-copy scheme. It is based on making copies
of the database, called shadow copies, and assumes that only one transaction is active at a time.
The scheme also assumes that the database is simply a file on disk.
What is serializability? How it is tested? (Nov/Dec 2014)
A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. A
precedence graph is used to test for serializability.
Mention the approaches of deadlock recovery
The common solution is to roll back one or more transactions to break the deadlock. The issues
involved are selection of a victim, rollback (partial or total), and starvation.
Give an example of two phase commit protocol. (Nov/Dec 2015)
A funds transfer between two sites: the client wants an all-or-nothing transaction, so the
transfer either happens at both sites or does not happen at all.
What is meant by concurrency control? (Nov/Dec 2015)
A transaction is a particular execution of a program. When multiple transactions try to access the
same shared resource, many problems can arise if the access is not controlled properly. The
mechanisms by which such concurrent access is controlled are called concurrency control.
UNIT-III / PART-B
1. What is serializability? Explain its types. (Nov/Dec 2015)

A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Different


forms of schedule equivalence give rise to the notions of:
1.
conflict serializability
2.
view serializability
conflict serializability

Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there exists some
item Q accessed by both li and lj, and at least one of these instructions wrote Q.
1. li = read(Q), lj = read(Q): li and lj don't conflict.
2. li = read(Q), lj = write(Q): they conflict.
3. li = write(Q), lj = read(Q): they conflict.
4. li = write(Q), lj = write(Q): they conflict.
If a schedule S can be transformed into a schedule S' by a series of swaps of non-conflicting
instructions, we say that S and S' are conflict equivalent.
We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
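The standard test builds a precedence graph: add an edge Ti -> Tj whenever an operation of Ti conflicts with a later operation of Tj; the schedule is conflict serializable iff the graph has no cycle. A minimal sketch:

```python
def conflict_serializable(schedule):
    """schedule: list of (txn, op, item) tuples, op in {'r', 'w'}."""
    edges = set()
    for i, (ti, opi, qi) in enumerate(schedule):
        for tj, opj, qj in schedule[i + 1:]:
            # conflicting pair: different txns, same item, at least one write
            if ti != tj and qi == qj and "w" in (opi, opj):
                edges.add((ti, tj))
    # detect a cycle in the precedence graph with a depth-first search
    nodes = {t for t, _, _ in schedule}
    adj = {n: [b for a, b in edges if a == n] for n in nodes}
    state = {}

    def has_cycle(n):
        state[n] = "visiting"
        for m in adj[n]:
            if state.get(m) == "visiting":
                return True
            if m not in state and has_cycle(m):
                return True
        state[n] = "done"
        return False

    return not any(n not in state and has_cycle(n) for n in nodes)

serial = [("T1", "r", "Q"), ("T1", "w", "Q"), ("T2", "r", "Q"), ("T2", "w", "Q")]
mixed = [("T1", "r", "Q"), ("T2", "w", "Q"), ("T1", "w", "Q")]
print(conflict_serializable(serial), conflict_serializable(mixed))  # True False
```

The second schedule produces edges T1 -> T2 and T2 -> T1, a cycle, so it is not conflict serializable.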

view serializability
Let S and S' be two schedules with the same set of transactions. S and S' are view equivalent if
the following three conditions are met, for each data item Q:
1. If in schedule S transaction Ti reads the initial value of Q, then in schedule S' transaction
Ti must also read the initial value of Q.
2. If in schedule S transaction Ti executes read(Q), and that value was produced by transaction Tj
(if any), then in schedule S' transaction Ti must also read the value of Q that was produced by
the same write(Q) operation of transaction Tj.
3. The transaction (if any) that performs the final write(Q) operation in schedule S must also
perform the final write(Q) operation in schedule S'.

2. Write short notes on transaction states and discuss the properties of a transaction.

Active: the initial state; the transaction stays in this state while it is executing.
Partially committed: after the final statement has been executed.
Failed: after the discovery that normal execution can no longer proceed.
Aborted: after the transaction has been rolled back and the database restored to its state prior
to the start of the transaction. There are two options after a transaction has been aborted:
restart the transaction, or kill the transaction.
Committed: after successful completion.
ACID Properties
- Atomicity
-Consistency
-Isolation
-Durability
3. Briefly describe two-phase locking in concurrency control techniques. (Nov/Dec 2014)

This protocol requires that each transaction issue lock and unlock requests in two phases:
Growing phase
Shrinking phase
Growing phase
- During this phase new locks can be acquired but none can be released.
Shrinking phase
- During this phase existing locks can be released but no new locks can be acquired.
Types
Strict two phase locking protocol
Rigorous two phase locking protocol
Strict two phase locking protocol
This protocol requires not only that locking be two-phase, but also that all exclusive locks
taken by a transaction be held until that transaction commits.
Rigorous two phase locking protocol
This protocol requires that all locks be held until the transaction commits.
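A toy strict-2PL lock table can make the phases concrete: a transaction acquires locks while it runs (growing phase) and releases them only at commit, so no other transaction ever sees its uncommitted writes. Waiting is modelled here simply as an exception; real lock managers queue the request instead.

```python
class LockError(Exception):
    pass

class StrictTwoPhaseLocker:
    def __init__(self):
        self.locks = {}                    # item -> [mode, set of holders]

    def acquire(self, txn, item, mode):
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = [mode, {txn}]
        elif held[0] == "S" and mode == "S":
            held[1].add(txn)               # S is compatible with S
        elif held[1] == {txn}:
            held[0] = "X" if "X" in (mode, held[0]) else "S"   # lock upgrade
        else:
            raise LockError(f"{txn} must wait for {item}")

    def commit(self, txn):
        for item in list(self.locks):      # strict 2PL: release only at commit
            self.locks[item][1].discard(txn)
            if not self.locks[item][1]:
                del self.locks[item]

mgr = StrictTwoPhaseLocker()
mgr.acquire("T1", "A", "S")
mgr.acquire("T2", "A", "S")                # both may read A
try:
    mgr.acquire("T2", "A", "X")            # upgrade blocked while T1 holds S
    blocked = False
except LockError:
    blocked = True
mgr.commit("T1")                           # T1 releases all its locks at once
mgr.acquire("T2", "A", "X")                # now the upgrade succeeds
print(blocked, mgr.locks["A"][0])          # True X
```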
4. Explain the concept of concurrent execution in a transaction-processing system. (Nov/Dec 2014)

The transaction-processing system allows concurrent execution of multiple transactions to


improve the system performance. In concurrent execution, the database management system controls the
execution of two or more transactions in parallel; however, allows only one operation of any transaction to
occur at any given time within the system. This is also known as interleaved execution of multiple
transactions. The database system allows concurrent execution of transactions due to two reasons.
First, a transaction performing read or write operation using I/O devices may not be using the
CPU at a particular point of time. Thus, while one transaction is performing I/O operations, the CPU can
process another transaction. This is possible because CPU and I/O system in the computer system are
capable of operating in parallel. This overlapping of I/O and CPU activities reduces the amount of time
for which the disks and processors are idle and, thus, increases the throughput of the system (the number
of transactions executed in a given amount of time).
5. Why is concurrency control needed? Explain with an example.

Simultaneous execution of transactions over a shared database can create several data integrity
and consistency problems:
- the lost update problem
- the temporary update (dirty read) problem
- the incorrect summary problem
Lost update problem
T1                  T2
read(X)
X := X - N
                    read(X)
                    X := X + M
write(X)
read(Y)
                    write(X)
Y := Y + N
write(Y)

Temporary update problem
T1                  T2
read(X)
X := X - N
write(X)
                    read(X)
                    X := X - M
                    write(X)
read(Y)
rollback            (T2 has read a temporary, uncommitted value of X)

Incorrect summary problem
T1                  T2
                    sum := 0
                    read(A)
                    sum := sum + A
read(B)
B := B - N
write(B)
                    read(B)
                    sum := sum + B    (reads B after N is subtracted)
                    read(X)
                    sum := sum + X    (reads X before N is added)
read(X)
X := X + N
write(X)
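The lost update anomaly can be replayed deterministically in plain Python; the statements below follow the interleaving of the first schedule above, with made-up starting values.

```python
db = {"X": 100}
N, M = 10, 5

t1_x = db["X"]        # T1: read(X)
t1_x = t1_x - N       # T1: X := X - N
t2_x = db["X"]        # T2: read(X), still sees the old value 100
t2_x = t2_x + M       # T2: X := X + M
db["X"] = t1_x        # T1: write(X), X becomes 90
db["X"] = t2_x        # T2: write(X), X becomes 105, overwriting T1's update

print(db["X"])        # 105; either serial order would give 95
```

T1's subtraction of N is lost, which is exactly the inconsistency concurrency control is meant to prevent.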

6. Explain about the deadlock recovery algorithm with an example.

The common solution is to roll back one or more transactions to break the deadlock.
Three action need to be taken
a. Selection of victim
b. Rollback
c. Starvation
Selection of victim
i. Given a set of deadlocked transactions, we must determine which transaction to roll back to
break the deadlock.
ii. We should roll back the transaction that incurs the minimum cost.
Rollback
- Once we decide that a particular transaction must be rolled back, we must determine how far
this transaction should be rolled back:
- total rollback
- partial rollback
Starvation
- Ensure that a transaction can be picked as victim only a (small) finite number of times.
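A sketch of the detection-and-victim step: find a cycle in the wait-for graph, then roll back the cycle member with minimum cost. Here "cost" is simply the amount of work done so far, an assumption of this sketch; the graph and work figures are made up.

```python
def find_cycle(wait_for):
    """wait_for maps each waiting transaction to the one it waits for."""
    for start in wait_for:
        path, node, seen = [], start, set()
        while node in wait_for and node not in seen:
            seen.add(node)
            path.append(node)
            node = wait_for[node]
        if node in path:                  # walked back into the path: a cycle
            return path[path.index(node):]
    return None

wait_for = {"T1": "T2", "T2": "T3", "T3": "T1", "T4": "T1"}  # T4 waits but is not on the cycle
work_done = {"T1": 40, "T2": 5, "T3": 70}

cycle = find_cycle(wait_for)
victim = min(cycle, key=work_done.get)    # minimum-cost victim to roll back
print(cycle, victim)                      # ['T1', 'T2', 'T3'] T2
```

Rolling back T2 (the least work lost) breaks the cycle, after which T1, T3 and T4 can proceed.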
7. Illustrate the granularity locking method in concurrency control.

Intent locking
Intent locks are put on all the ancestors of a node before that node is locked explicitly.
If a node is locked in an intention mode, explicit locking is being done at a lower level of the tree.
Intent shared lock (IS)
If a node is locked in intent-shared mode, explicit locking is being done at a lower level
of the tree, but with only shared-mode locks.
Suppose transaction T1 reads record ra2 in file Fa. Then T1 needs to lock the database,
area A1, and Fa in IS mode, and finally lock ra2 in S mode.
Intent exclusive lock (IX)
If a node is locked in intent-exclusive mode, explicit locking is being done at a lower level of
the tree, with exclusive-mode or shared-mode locks.
Suppose transaction T2 modifies record ra9 in file Fa. Then T2 needs to lock the database, area
A1, and Fa in IX mode, and finally lock ra9 in X mode.
Shared Intent exclusive lock (SIX)
If the node is locked in Shared Intent exclusive mode, the subtree rooted by that node is
locked explicitly in shared mode, and that explicit locking is being done at lower level
with exclusive mode.
Shared lock (S)
-T can tolerate concurrent readers but not concurrent updaters in R.
Exclusive lock (X)
-T cannot tolerate any concurrent access to R at all.
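The five modes interact according to the standard compatibility matrix, which a lock manager consults before granting a lock on a node; True means the two modes can be held on the same node by different transactions at the same time.

```python
MODES = ["IS", "IX", "S", "SIX", "X"]
COMPAT = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}

# e.g. T1 holds IS on file Fa (it will read one record below); T2 may still
# take IX on Fa to update a different record:
print(COMPAT["IS"]["IX"])   # True
# but an X lock on Fa is incompatible with every other mode:
print(any(COMPAT["X"].values()))   # False
```

Note the matrix is symmetric, as compatibility does not depend on which transaction arrived first.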
8. Describe database recovery concepts.

Crash Recovery
Even in a highly advanced technological era, where hundreds of satellites monitor the earth and
billions of people are connected through information technology every second, failures are
expected, but they are not always acceptable.

A DBMS is a highly complex system with hundreds of transactions being executed every second. The
availability of a DBMS depends on its complex architecture and on the underlying hardware and
system software. If it fails or crashes while transactions are being executed, the system is
expected to follow some algorithm or technique to recover from the crash or failure.
Failure Classification
To see where the problem has occurred we generalize the failure into various categories, as follows:
TRANSACTION FAILURE
When a transaction fails to execute, or reaches a point after which it cannot be completed
successfully, it has to abort. This is called transaction failure; only a few transactions or
processes are affected.
Reasons for transaction failure include:
Logical errors: the transaction cannot complete because of a code error or some internal error
condition.
System errors: the database system itself terminates an active transaction because the DBMS is
not able to execute it, or it has to stop because of some system condition. For example, in case
of deadlock or resource unavailability, the system aborts an active transaction.
SYSTEM CRASH
There are problems, which are external to the system, which may cause the system to stop abruptly and
cause the system to crash. For example interruption in power supply, failure of underlying hardware or
software failure.
Examples may include operating system errors.
DISK FAILURE:
In the early days of technology evolution, it was a common problem that hard disk drives or
storage drives failed frequently.
Disk failures include the formation of bad sectors, unreachability of the disk, a disk head
crash, or any other failure that destroys all or part of disk storage.
Storage Structure
We have already described storage system here. In brief, the storage structure can be divided in various
categories:
Volatile storage: as the name suggests, this storage does not survive system crashes and is mostly
placed very close to the CPU, often on the chipset itself; examples are main memory and cache
memory. It is fast but can store only a small amount of information.
Non-volatile storage: this storage is made to survive system crashes. It is huge in data storage
capacity but slower to access. Examples include hard disks, magnetic tapes, flash memory, and
non-volatile (battery backed up) RAM.
Recovery and Atomicity
When a system crashes, it may have several transactions being executed and various files opened
for them to modify data items. Transactions are made of various operations, which are atomic in
nature; but according to the ACID properties of a DBMS, the atomicity of the transaction as a
whole must be maintained: either all operations are executed or none.
When DBMS recovers from a crash it should maintain the following:


It should check the states of all transactions, which were being executed.

A transaction may be in the middle of some operation; DBMS must ensure the atomicity of
transaction in this case.

It should check whether the transaction can be completed now or needs to be rolled back.

No transaction should be allowed to leave the DBMS in an inconsistent state.

There are two types of techniques, which can help DBMS in recovering as well as maintaining the
atomicity of transaction:

Maintaining the logs of each transaction, and writing them onto some stable storage before
actually modifying the database.

Maintaining shadow paging, where the changes are done in volatile memory and the actual
database is updated later.

Log-Based Recovery
Log is a sequence of records, which maintains the records of actions performed by a transaction. It is
important that the logs are written prior to actual modification and stored on a stable storage media, which
is failsafe.
Log based recovery works as follows:

The log file is kept on stable storage media

When a transaction enters the system and starts execution, it writes a log about it
<Tn, Start>

When the transaction modifies an item X, it write logs as follows:


<Tn, X, V1, V2>

It records that Tn has changed the value of X from V1 to V2.

When transaction finishes, it logs:


<Tn, commit>

Database can be modified using two approaches:


Deferred database modification: All logs are written on to the stable storage and database is updated
when transaction commits.
Immediate database modification: Each log follows an actual database modification. That is, database is
modified immediately after every operation.
Recovery with concurrent transactions
When more than one transaction is being executed in parallel, the logs are interleaved. At the
time of recovery it would become hard for the recovery system to backtrack through all the logs
and then start recovering. To ease this situation, most modern DBMSs use the concept of
checkpoints.
CHECKPOINT
Keeping and maintaining logs in real time and in a real environment may fill all the memory space
available in the system. As time passes, the log file may become too big to be handled at all. A
checkpoint is a mechanism by which all the previous logs are removed from the system and stored
permanently on the storage disk. The checkpoint declares a point before which the DBMS was in a
consistent state and all transactions were committed.

RECOVERY
When a system with concurrent transactions crashes and recovers, it behaves in the following manner:

The recovery system reads the logs backwards from the end to the last Checkpoint.

It maintains two lists, undo-list and redo-list.

If the recovery system sees a log with <Tn, Start> and <Tn, Commit> (or just <Tn, Commit>), it
puts the transaction in the redo-list.
If the recovery system sees a log with <Tn, Start> but no commit or abort log, it puts the
transaction in the undo-list.

All transactions in the undo-list are then undone and their logs removed. For all transactions in
the redo-list, their previous logs are removed, the transactions are redone, and the logs saved.
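The undo/redo classification can be sketched on a toy log. The checkpoint record here carries the list of transactions active at checkpoint time (an assumption of this sketch; real checkpoint records typically do record the active set), and the log is scanned backwards until the checkpoint.

```python
log = [
    ("T1", "start"), ("T1", "commit"),
    ("T2", "start"),
    ("ckpt", ["T2"]),          # checkpoint: T2 was still active
    ("T3", "start"), ("T3", "commit"),
    ("T4", "start"),           # crash happens before T4 commits
]

redo, undo = set(), set()
for rec, arg in reversed(log):  # scan backwards from the end
    if rec == "ckpt":
        # transactions active at the checkpoint and never committed later
        undo |= {t for t in arg if t not in redo}
        break
    if arg == "commit":
        redo.add(rec)
    elif arg == "start" and rec not in redo:
        undo.add(rec)

print(sorted(undo), sorted(redo))   # ['T2', 'T4'] ['T3']
```

T1 committed before the checkpoint, so its updates are already safely on disk and it appears in neither list.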

9. What is concurrency control? How is it implemented in a DBMS? Illustrate with a suitable example.
In a multiprogramming environment where multiple transactions can be executed simultaneously, it is highly
important to control the concurrency of transactions. We have concurrency control protocols to ensure atomicity,
isolation, and serializability of concurrent transactions. Concurrency control protocols can be broadly divided into
two categories

Lock based protocols

Time stamp based protocols

Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any transaction cannot read or
write data until it acquires an appropriate lock on it. Locks are of two kinds


Binary locks: a lock on a data item can be in two states; it is either locked or unlocked.

Shared/exclusive locks: this type of locking mechanism differentiates the locks based on their
use. If a lock is acquired on a data item to perform a write operation, it is an exclusive lock;
allowing more than one transaction to write the same data item would lead the database into an
inconsistent state. Read locks are shared, because no data value is being changed.

There are four types of lock protocols available


Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a 'write' operation is
performed. Transactions may unlock the data item after completing the write operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they need locks. Before
initiating an execution, the transaction requests the system for all the locks it needs beforehand. If all the locks are
granted, the transaction executes and releases all the locks when all its operations are over. If all the locks are not
granted, the transaction rolls back and waits until all the locks are granted.

Two-Phase Locking (2PL)


This locking protocol divides the execution phase of a transaction into three parts. In the first part, when the
transaction starts executing, it seeks permission for the locks it requires. The second part is where the transaction
acquires all the locks. As soon as the transaction releases its first lock, the third phase starts. In this phase, the
transaction cannot demand any new locks; it only releases the acquired locks.

Two-phase locking has two phases, one is growing, where all the locks are being acquired by the transaction; and
the second phase is shrinking, where the locks held by the transaction are being released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then upgrade it to an


exclusive lock.
Strict Two-Phase Locking
The first phase of Strict-2PL is same as 2PL. After acquiring all the locks in the first phase, the transaction
continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a lock after using it. Strict-2PL
holds all the locks until the commit point and releases all the locks at a time.

Strict-2PL does not have cascading abort as 2PL does.


Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp based protocol. This protocol uses either system
time or logical counter as a timestamp.
Lock-based protocols manage the order between the conflicting pairs among transactions at the time of execution,
whereas timestamp-based protocols start working as soon as a transaction is created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age of the transaction. A
transaction created at 0002 clock time would be older than all other transactions that come after it. For example,
any transaction 'y' entering the system at 0004 is two seconds younger and the priority would be given to the older
one.
In addition, every data item is given the latest read and write-timestamp. This lets the system know when the last
read and write operation was performed on the data item.
Timestamp Ordering Protocol
The timestamp-ordering protocol ensures serializability among transactions in their conflicting read and write
operations. This is the responsibility of the protocol system that the conflicting pair of tasks should be executed
according to the timestamp values of the transactions.

The timestamp of transaction Ti is denoted as TS(Ti).

Read time-stamp of data-item X is denoted by R-timestamp(X).

Write time-stamp of data-item X is denoted by W-timestamp(X).

Timestamp ordering protocol works as follows

If a transaction Ti issues a read(X) operation:

o If TS(Ti) < W-timestamp(X): operation rejected.

o If TS(Ti) >= W-timestamp(X): operation executed, and all data-item timestamps updated.

If a transaction Ti issues a write(X) operation:

o If TS(Ti) < R-timestamp(X): operation rejected and Ti rolled back.

o If TS(Ti) < W-timestamp(X): operation rejected and Ti rolled back.

o Otherwise, operation executed.
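The rules above reduce to two small predicates. A minimal Python sketch (the function names are illustrative, not part of any standard API):

```python
def read_allowed(ts, w_ts):
    """read(X): rejected if TS(Ti) < W-timestamp(X)."""
    return ts >= w_ts

def write_allowed(ts, r_ts, w_ts):
    """write(X): rejected if TS(Ti) < R-timestamp(X) or TS(Ti) < W-timestamp(X)."""
    return ts >= r_ts and ts >= w_ts
```

On a permitted read, the scheduler would then set R-timestamp(X) to max(R-timestamp(X), TS(Ti)); on a permitted write, it would set W-timestamp(X) to TS(Ti).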

Thomas' Write Rule

This rule modifies the timestamp-ordering checks for write operations: if TS(Ti) < W-timestamp(X), then
instead of rejecting the operation and rolling Ti back, the obsolete 'write' operation itself is simply ignored.
Modifying the timestamp-ordering rules in this way makes some view-serializable schedules possible that
basic timestamp ordering would reject.
In a multi-process system, deadlock is an unwanted situation that arises in a shared resource environment, where a
process indefinitely waits for a resource that is held by another process.
For example, assume a set of transactions {T0, T1, T2, ...,Tn}. T0 needs a resource X to complete its task. Resource X
is held by T1, and T1 is waiting for a resource Y, which is held by T2. T2 is waiting for resource Z, which is held by
T0. Thus, all the processes wait for each other to release resources. In this situation, none of the processes can finish
their task. This situation is known as a deadlock.
Deadlocks are not healthy for a system. In case a system is stuck in a deadlock, the transactions involved in the
deadlock are either rolled back or restarted.
10

What is deadlock? How does it occur? How transactions be written to (i) Avoid deadlock (ii) guarantee correct
execution. Illustrate with suitable example. (Nov/Dec 2015) (Nov/Dec 2014)
Deadlock
A system is deadlocked if there is a set of transactions such that every transaction in the set is waiting for another
transaction in the set.
Consider the following two transactions:
T1: write(A); write(B)
T2: write(B); write(A)
Schedule with deadlock: T1 holds the lock on A and waits for B, while T2 holds the lock on B and waits for A.

Avoid deadlock:

Deadlock prevention protocol

Ensure that the system will never enter a deadlock state.
Some prevention strategies:
Approach 1
Require that each transaction locks all its data items before it begins execution: either all are
locked in one step or none are locked.
Disadvantages
It is hard to predict, before the transaction begins, which data items need to be locked.
Data-item utilization may be very low.
Approach 2
Assign a unique timestamp to each transaction.
These timestamps are used only to decide whether a transaction should wait or roll back.
Schemes:
- wait-die scheme
- wound-wait scheme
wait-die scheme
- A non-preemptive technique.
- When transaction Ti requests a data item currently held by Tj, Ti is allowed to wait only if it has a
timestamp smaller than that of Tj; otherwise, Ti is rolled back (dies).
- An older transaction may wait for a younger one to release a data item. Younger transactions never
wait for older ones; they are rolled back instead.
- A transaction may die several times before acquiring the needed data item.
Example:
Transactions T1, T2, T3 have timestamps 5, 10, 15, respectively.
If T1 requests a data item held by T2, then T1 will wait.
If T3 requests a data item held by T2, then T3 will be rolled back.
wound-wait scheme
- A preemptive technique.
- When transaction Ti requests a data item currently held by Tj, Ti is allowed to wait only if it has a
timestamp larger than that of Tj; otherwise, Tj is rolled back.
- An older transaction wounds (forces the rollback of) a younger transaction instead of waiting for it. Younger
transactions may wait for older ones.
Example:
Transactions T1, T2, T3 have timestamps 5, 10, 15, respectively.
If T1 requests a data item held by T2, then the data item will be preempted from T2, and T2 will be
rolled back.
If T3 requests a data item held by T2, then T3 will wait.
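The two schemes differ only in who is rolled back when the requester is older or younger. A small Python sketch of the decision rules, using the convention that a smaller timestamp means an older transaction (the return strings are illustrative):

```python
def wait_die(ts_requester, ts_holder):
    """Ti requests an item held by Tj: wait if Ti is older, else Ti dies."""
    return "wait" if ts_requester < ts_holder else "rollback requester"

def wound_wait(ts_requester, ts_holder):
    """Ti requests an item held by Tj: wound (roll back) Tj if Ti is older."""
    return "rollback holder" if ts_requester < ts_holder else "wait"

# With timestamps T1=5, T2=10, T3=15, the decisions match the examples above.
```

Both schemes avoid deadlock because every wait edge points in one consistent timestamp direction, so no cycle of waiting transactions can form.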
11

Briefly explain about Two phase commit and three phase commit protocols. (Apr/May 2015) (May/June 2016)
(Nov/Dec 2014)

Two phase commit protocol


Assumes the fail-stop model: failed sites simply stop working, and do not cause any other harm,
such as sending incorrect messages to other sites.
Execution of the protocol is initiated by the coordinator after the last step of the transaction has
been reached.
The protocol involves all the local sites at which the transaction executed
Let T be a transaction initiated at site Si, and let the transaction coordinator at Si be Ci.
Phase 1: Obtaining a Decision (prepare)
Coordinator asks all participants to prepare to commit transaction Ti.
Ci adds the records <prepare T> to the log and forces log to stable storage
sends prepare T messages to all sites at which T executed
Upon receiving message, transaction manager at site determines if it can commit the transaction
if not, add a record <no T> to the log and send abort T message to Ci
if the transaction can be committed, then:
add the record <ready T> to the log
force all records for T to stable storage
send ready T message to Ci

Phase 2: Recording the Decision (commit)
T can be committed if Ci received a ready T message from all the participating sites; otherwise T
must be aborted.
Coordinator adds a decision record, <commit T> or <abort T>, to the log and forces the record onto
stable storage. Once the record reaches stable storage, it is irrevocable (even if failures occur).
Coordinator sends a message to each participant informing it of the decision (commit or abort)
Participants take appropriate action locally.

Possible Failures
Site Failure
Coordinator Failure
Network Partition
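The heart of phase 2 is a single decision rule: commit only if every participating site voted ready. A minimal sketch, assuming the votes have already been collected (timeouts and failed sites would simply count as abort votes):

```python
def decide(votes):
    """2PC phase-2 decision: commit iff every participant replied 'ready'."""
    return "commit" if votes and all(v == "ready" for v in votes) else "abort"

# One 'no' vote from any site forces a global abort.
```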
Three phase commit protocols.
Assumptions:
No network partitioning
At any point, at least one site must be up.
At most K sites (participants as well as coordinator) can fail
Phase 1: Obtaining Preliminary Decision: Identical to 2PC Phase 1.
Every site is ready to commit if instructed to do so
Under 2PC each site is obligated to wait for decision from coordinator
Under 3PC, knowledge of pre-commit decision can be used to commit despite coordinator failure
Phase 2: Recording the Preliminary Decision
Coordinator adds a decision record (<abort T> or <precommit T>) in its log and forces record
to stable storage. Coordinator sends a message to each participant informing it of the decision.
Participant records decision in its log
If abort decision reached then participant aborts locally
If pre-commit decision reached then participant replies with
<acknowledge T>
Phase 3: Recording the Decision in the Database
Executed only if the decision in phase 2 was to precommit
Coordinator collects acknowledgments. It sends <commit T> message to the participants as soon as it
receives K acknowledgments.
Coordinator adds the record <commit T> in its log and forces record to stable storage.
Coordinator sends a message to each participant to <commit T>.
Participants take appropriate action locally
Under 3PC, knowledge of pre-commit decision can be used to commit despite coordinator failure
Avoids blocking problem as long as < K sites fail
Drawbacks:
higher overheads
Assumptions may not be satisfied in practice.
12

Consider the following schedules. The actions are listed in the order they are scheduled, and prefixed with the
transaction name.
S1: T1: R(X), T2: R(X), T1: W(Y), T2: W(Y), T1: R(Y), T2: R(Y)
S2: T3: R(X), T1: R(X), T1: W(Y), T2: R(Z), T2: W(Z), T3: R(Z)
For each of the schedules, answer the following questions:
i. What is the precedence graph for the schedule?
ii. Is the schedule conflict-serializable? If so, what are all the conflict-equivalent serial schedules?
iii. Is the schedule view-serializable? If so, what are all the view-equivalent serial schedules? (Apr/May 2015)

S1:
i. Precedence graph: the conflicting pairs are T1: W(Y) before T2: W(Y), giving the edge T1 -> T2, and
T2: W(Y) before T1: R(Y), giving the edge T2 -> T1.
ii. The graph contains the cycle T1 -> T2 -> T1, so S1 is not conflict-serializable.
iii. S1 is not view-serializable either: in S1, T1: R(Y) reads the value written by T2, but in the serial
schedule T1, T2 it would read T1's own write, and in the serial schedule T2, T1 it would likewise read
T1's own write; no serial order has the same reads-from relationships.
S2:
i. Precedence graph: the only conflicting pair is T2: W(Z) before T3: R(Z), giving the single edge T2 -> T3.
ii. The graph is acyclic, so S2 is conflict-serializable. The conflict-equivalent serial schedules are all
orders that place T2 before T3: T1, T2, T3; T2, T1, T3; and T2, T3, T1.
iii. Since every conflict-serializable schedule is also view-serializable, S2 is view-serializable, with the
same equivalent serial schedules.
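The precedence-graph test can be mechanized. A small Python sketch that builds the graph from (transaction, operation, item) triples and checks for a cycle; acyclic is equivalent to conflict-serializable:

```python
def precedence_edges(schedule):
    """Edge (t1, t2) whenever an action of t1 conflicts with a later action of t2."""
    edges = set()
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            # Two actions conflict if they are from different transactions,
            # touch the same item, and at least one of them is a write.
            if t1 != t2 and x1 == x2 and (op1 == "W" or op2 == "W"):
                edges.add((t1, t2))
    return edges

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    def reachable(src, dst, seen=()):
        return any(n == dst or (n not in seen and reachable(n, dst, seen + (n,)))
                   for n in graph.get(src, ()))
    return any(reachable(t, t) for t in graph)

S1 = [("T1", "R", "X"), ("T2", "R", "X"), ("T1", "W", "Y"),
      ("T2", "W", "Y"), ("T1", "R", "Y"), ("T2", "R", "Y")]
S2 = [("T3", "R", "X"), ("T1", "R", "X"), ("T1", "W", "Y"),
      ("T2", "R", "Z"), ("T2", "W", "Z"), ("T3", "R", "Z")]
```

Running `has_cycle(precedence_edges(S1))` confirms the cycle T1 -> T2 -> T1, while S2 yields only the edge T2 -> T3 and no cycle.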
13

Explain about Locking Protocols. (May/June 2016).


Lock
A lock is a variable associated with a data item that describes the status of the item with respect to the possible
operations that can be applied to it.
Locking is an operation which secures
(a) permission to read, or
(b) permission to write a data item, for a transaction.
Example:
Lock(X): data item X is locked on behalf of the requesting transaction.
Unlocking is an operation which removes these permissions from the data item.
Example:
Unlock(X): data item X is made available to all other transactions.
Lock and Unlock are atomic operations.
Types of lock


Binary lock
Read/write(shared / Exclusive) lock

Binary lock

It can have two states (or values): 0 and 1.

0 - unlocked
1 - locked
If the lock value is 0, the data item can be accessed when requested.
If the lock value is 1, the item cannot be accessed when requested.
lock_item(X)
B: if LOCK(X) = 0 (* item is unlocked *)
   then LOCK(X) := 1
   else begin
     wait (until LOCK(X) = 0);
     goto B;
   end;
unlock_item(X)
if LOCK(X) = 1 (* item is locked *)
then LOCK(X) := 0
else
   print ('item is already unlocked');
Read/write (shared/exclusive) lock
read_lock
- Also called a shared-mode lock.
- If a transaction Ti has obtained a shared-mode lock on item X, then Ti can read, but cannot write, X.
- Other transactions are also allowed to read the data item, but cannot write it.
read_lock(X)
B: if LOCK(X) = unlocked then
     begin
       LOCK(X) := read_locked;
       no_of_reads(X) := 1;
     end
   else if LOCK(X) = read_locked then
     no_of_reads(X) := no_of_reads(X) + 1
   else begin
     wait (until LOCK(X) = unlocked);
     goto B;
   end;
write_lock
- Also called an exclusive-mode lock.
- If a transaction Ti has obtained an exclusive-mode lock on item X, then Ti can both read and write X.
- Other transactions are not allowed to read or write the data item.
write_lock(X)
B: if LOCK(X) = unlocked then
     LOCK(X) := write_locked
   else begin
     wait (until LOCK(X) = unlocked);
     goto B;
   end;
unlock(X)
if LOCK(X) = write_locked then
  LOCK(X) := unlocked
else if LOCK(X) = read_locked then
  begin
    no_of_reads(X) := no_of_reads(X) - 1;
    if no_of_reads(X) = 0 then
      LOCK(X) := unlocked;
  end;

Two-phase locking protocol
This protocol requires that each transaction issue lock and unlock requests in two phases:
Growing phase
- During this phase new locks can be acquired but none can be released.
Shrinking phase
- During this phase existing locks can be released and no new locks can be acquired.
Types
Strict two-phase locking protocol
Rigorous two-phase locking protocol
Strict two-phase locking protocol
This protocol requires not only that locking be two-phase, but also that all exclusive locks
taken by a transaction be held until that transaction commits.
Rigorous two-phase locking protocol
This protocol requires that all locks be held until the transaction commits.
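The two-phase rule for a single transaction can be sketched as a tiny checker: once any lock is released, the shrinking phase has begun and acquiring a new lock is a protocol violation. This is an illustrative sketch, not a full lock manager:

```python
class TwoPhaseChecker:
    """Enforces the 2PL rule for one transaction: no lock after the first unlock."""
    def __init__(self):
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violated: lock requested after first unlock")
        return f"locked {item}"

    def unlock(self, item):
        self.shrinking = True  # the growing phase ends at the first unlock
        return f"unlocked {item}"
```

With this checker, `lock("A"); lock("B"); unlock("A")` is a legal two-phase sequence, while any further `lock("C")` raises an error.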
UNIT IV
TRENDS IN DATABASE TECHNOLOGY
Overview of Physical Storage Media - Magnetic Disks - RAID - Tertiary storage - File Organization - Organization of
Records in Files - Indexing and Hashing - Ordered Indices - B+ tree Index Files - B tree Index Files - Static Hashing -
Dynamic Hashing - Introduction to Distributed Databases - Client server technology - Multidimensional and Parallel
databases - Spatial and multimedia databases - Mobile and web databases - Data Warehouse - Mining - Data marts.
UNIT-IV / PART-A
1
What is a B-Tree?
A B-tree eliminates the redundant storage of search-key values. It allows search-key values to appear only once.
2
What is a B+-Tree index?
A B+-Tree index takes the form of a balanced tree in which every path from the root of the tree
to a leaf of the tree is of the same length.
3
What is a hash index?
A hash index organizes the search keys, with their associated pointers, into a hash file structure
4
Define seek time.
The time for repositioning the arm is called the seek time, and it increases with the distance that the arm
must move.
5
Define rotational latency time.
The time spent waiting for the sector to be accessed to appear under the head is called the rotational latency time.
6
What is called mirroring?
The simplest approach to introducing redundancy is to duplicate every disk. This technique is called mirroring or
shadowing.
7
What are the two main goals of parallelism?
Load balance multiple small accesses, so that the throughput of such accesses increases.
Parallelize large accesses so that the response time of large accesses is reduced.
8
What are the factors to be taken into account when choosing a RAID level?
Monetary cost of extra disk storage requirements.
Performance requirements in terms of number of I/O operations
Performance when a disk has failed.
Performances during rebuild.
9
What is an index?

An index is a structure that helps to locate desired records of a relation quickly, without examining all records.
10
What are the types of storage devices?
Primary storage, secondary storage, tertiary storage, volatile storage, non-volatile storage.
11
What is called remapping of bad sectors?
If the controller detects that a sector is damaged when the disk is initially formatted, or when an attempt is made to
write the sector, it can logically map the sector to a different physical location.
12
Define software and hardware RAID systems. (May/June 2016)
RAID can be implemented with no change at the hardware level, using only software modification. Such RAID
implementations are called software RAID systems, and the systems with special hardware support are called
hardware RAID systems.
13
Define hot swapping.
Hot swapping permits the removal of faulty disks and their replacement by new ones without turning the power off.
Hot swapping reduces the mean time to repair.
14
What are the ways in which variable-length records arise in database systems?
Storage of multiple record types in a file; record types that allow variable lengths for one or more fields; record
types that allow repeating fields.
15
What are the two types of blocks in the fixed-length representation? Define them.
Anchor block: contains the first record of a chain.
Overflow block: contains the records other than those that are the first record of a chain.
16
What is hashing file organization?
In the hashing file organization, a hash function is computed on some attribute of each record. The result of the hash
function specifies in which block of the file the record should be placed.
17
What are called index-sequential files?
The files that are ordered sequentially with a primary index on the search key are called index-sequential files.
18
Define Primary index and Secondary index.
A primary index is, in a sequentially ordered file, the index whose search key specifies the sequential order of the
file; it is also called a clustering index. The search key of a primary index is usually, but not necessarily, the
primary key.
A secondary index is an index whose search key specifies an order different from the sequential order of the file; it
is also called a non-clustering index.
19
Define Data marts.
A data mart is the access layer of the data warehouse environment that is used to get data out to the users. The data
mart is a subset of the data warehouse that is usually oriented to a specific business line or team. Data marts are
small slices of the data warehouse.
20
Define Distributed Database Systems.
A database spread over multiple machines (also referred to as sites or nodes), with a network interconnecting the
machines. A database shared by users on multiple machines is called a distributed database system.
21
Define Data mining and Data warehouse. (Nov/Dec 2014) (May/June 2016)
Data mining: it refers to the mining or discovery of new information in terms of patterns or rules from vast amounts
of data.
Data warehouse: data warehouses provide access to data for complex analysis, knowledge discovery, and decision
making.
22
What are the advantages of web-DBMS?
Platform independence, graphical user interface, transparent network access, scalable deployment.
23
What are the issues to be considered for mobile databases?
Data distribution, replication, recovery and fault tolerance.
24
What is spatial data?
Spatial data include geographic data, such as maps and associated information, and computer-aided design data.
25
What is multimedia data? Explain about multimedia databases.
Multimedia data typically means digital images, audio, video, animation and graphics together with text data. The
acquisition, generation, storage and processing of multimedia data in computers, and its transmission over networks,
have grown tremendously in the recent past. A multimedia database must maintain the consistency, concurrency,
integrity, security and availability of such data.
26
What are the types of Distributed Database?
Homogeneous distributed DB
Heterogeneous distributed DB
27
Define fragmentation in Distributed Database.
The system partitions the relation into several fragments and stores each fragment at different sites.
Two approaches: horizontal fragmentation and vertical fragmentation.



28


Differentiate static and dynamic hashing. (Apr/May 2015) (Nov/Dec 2015)

Static hashing:
- When a search-key value is provided, the hash function always computes the same address.
- The number of buckets provided remains unchanged at all times, i.e. it is fixed.
- Space and overhead are greater.
- As the file grows, performance decreases.
Dynamic hashing:
- The hash function is made to produce a large number of values, and only a few are used initially.
- Provides a mechanism in which data buckets are added and removed dynamically and on demand, i.e. the
number of buckets is not fixed.
- Minimum space and less overhead.
- Performance does not degrade as the file grows.

29

Give an example of a join that is not a simple equi-join for which partitioned parallelism can be used.
(Nov/Dec 2015)
A join whose predicate combines an equality with an inequality, such as r joined with s on the condition
r.A = s.A and r.B < s.B, is not a simple equi-join. Because the predicate still contains an equality condition
on A, both relations can be partitioned on attribute A, and the join can be computed in parallel at each
partition.

30

Write about the four types (Star, Snowflake, Galaxy and Fact constellation) of data warehouse schemas.
(Apr/May 2015)
Star schema: consists of a fact table with a single table for each dimension. Each dimension table contains the set
of attributes for that dimension.
Snowflake schema: a variation of the star schema, in which the dimension tables of a star schema are
organized into a hierarchy by normalizing them.
Fact constellation: a set of fact tables that share some dimension tables. However, fact
constellations limit the possible queries for the warehouse.
Galaxy schema: a combination of both the snowflake and star schemas.
UNIT-IV / PART-B
1
Describe File Organization.

-The database is stored as a collection of files.


- Each file is a sequence of records.
-A record is a sequence of fields.
-Classifications of records
i. Fixed length record
ii. Variable length record
Fixed-length record approach:
- Assume the record size is fixed.
- Each file has records of one particular type only.
- Different files are used for different relations. This case is the easiest to implement.
Simple approach:
- Record access is simple.
- Modification: do not allow records to cross block boundaries.
Deletion of record i - alternatives:
- Move records i + 1, . . ., n to i, . . ., n - 1.
- Do not move records, but link all free records on a free list.
Variable-length record
Variable-length records arise in database systems in several ways:
- Storage of multiple record types in a file.
- Record types that allow variable lengths for one or more fields.

Byte-string representation

Attach an end-of-record control character to the end of each record.
Difficulty with deletion.


Slotted Page Structure


Slotted page header contains:
number of record entries
end of free space in the block
location and size of each record

Free Lists
Store the address of the first deleted record in the file header.
Use this first record to store the address of the second deleted record, and so on
Pointer method
A variable-length record is represented by a list of fixed-length records, chained together
via pointers.
Can be used even if the maximum record length is not known

Disadvantage of the pointer structure: space is wasted in all records except the first in a chain.
The solution is to allow two kinds of blocks in the file:
- Anchor block: contains the first records of chains.
- Overflow block: contains records other than those that are the first records of chains.

2
Define RAID and Briefly Explain RAID techniques. (Nov/Dec 2014) (Apr/May 2015) (Nov/Dec 2015) (May/June 2016)

RAID: Redundant Arrays of Independent Disks

Disk organization techniques that manage a large number of disks, providing a view of a single disk of:
i. high capacity and high speed, by using multiple disks in parallel, and
ii. high reliability, by storing data redundantly, so that data can be recovered even if
a disk fails.
RAID Level 0: striping; non-redundant.

Used in high-performance applications where data loss is not critical.


RAID Level 1: Mirroring (or shadowing)
Duplicate every disk.
Every write is carried out on both disks
If one disk in a pair fails, data still available in the other
Data loss would occur only if a disk fails, and its mirror disk also fails before the
system is repaired
Probability of combined event is very small

RAID Level 2: Error-Correcting-Codes (ECC) with bit striping.


RAID Level 3: Bit-Interleaved Parity
a single parity bit is enough for error correction, not just detection, since we know which
disk has failed
When writing data, corresponding parity bits must also be computed and written
to a parity bit disk
To recover data in a damaged disk, compute XOR of bits from other disks
(including parity bit disk)

RAID Level 4: Block-Interleaved Parity.


When writing data block, corresponding block of parity bits must also be computed and
written to parity disk
To find value of a damaged block, compute XOR of bits from corresponding blocks
(including parity block) from other disks.

RAID Level 5: Block-Interleaved Distributed Parity; partitions data and parity among all N + 1
disks, rather than storing data in N disks and parity in 1 disk.


RAID Level 6: P+Q Redundancy scheme; similar to Level 5, but stores extra redundant
information to guard against multiple disk failures.
Better reliability than Level 5 at a higher cost; not used as widely.
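The parity mechanism behind Levels 3-5 is a plain XOR, and the recovery rule is the same operation run in reverse: a lost block equals the XOR of the surviving data blocks and the parity block. A minimal Python sketch (byte strings stand in for disk blocks):

```python
from functools import reduce

def parity(blocks):
    """XOR the corresponding bytes of all blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, byte_group)
                 for byte_group in zip(*blocks))

# Three data "disks" and their parity block:
disks = [b"\x0f\x10", b"\xf0\x01", b"\x33\x44"]
p = parity(disks)

# If disk 1 fails, XOR the survivors with the parity block to rebuild it.
recovered = parity([disks[0], disks[2], p])
```

This works because XOR is its own inverse: x ^ x = 0, so XOR-ing a block into the parity twice cancels it out, leaving exactly the missing block.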

3
Explain Secondary storage devices.

Storage can be differentiated into:
- Volatile storage: loses its contents when power is switched off.
- Non-volatile storage: contents persist even when power is switched off; includes secondary and
tertiary storage.
Primary storage: the fastest media, but volatile (cache, main memory).
Secondary storage: the next level in the hierarchy; non-volatile, with moderately fast access time.
Also called on-line storage.
Tertiary storage: the lowest level in the hierarchy; non-volatile, with slow access time. Also called
off-line storage.
E.g. magnetic tape, optical storage.
Cache: the fastest and most costly form of storage; volatile; managed by the computer system
hardware.
Main memory:
- Fast access.
- Generally too small.
- Capacities of up to a few gigabytes are widely used currently.
- Volatile: contents of main memory are usually lost if a power failure or system
crash occurs.
Flash memory
Data survives power failure
Data can be written at any location.
The location can be erased and written to again
Can support only a limited number of write/erase cycles.
Reads are fast.
But writes are slow (few microseconds), erase is slower
also known as EEPROM (Electrically Erasable Programmable Read-Only Memory)
Magnetic disk
- Data is stored on a spinning disk, and read/written magnetically.
- The primary medium for the long-term storage of data.
- Data must be moved from disk to main memory for access, and written back for storage:
much slower access than main memory.
- Direct access: possible to read data on disk in any order, unlike magnetic tape.
- Much larger capacity than main memory/flash memory.
- Disk failure can destroy data, but is very rare.
Read-write head
- Positioned very close to the platter surface.
- Reads or writes magnetically.
- The surface of a platter is divided into circular tracks.
- Each track is divided into sectors.
- A sector is the smallest unit of data that can be read or written.
Head-disk assemblies
- Multiple disk platters on a single spindle (typically 2 to 4).
- One head per platter, mounted on a common arm.
- Cylinder i consists of the ith track of all the platters.
Disk controller: interfaces between the computer system and the disk drive hardware.
- Accepts high-level commands to read or write a sector.
- Initiates actions such as moving the disk arm to the right track and actually reading or
writing the data.
- Ensures successful writing by reading back the sector after writing it.
Optical storage
- Non-volatile; data is read optically from a spinning disk using a laser.
- CD-ROM and DVD are the most popular forms.
- Write-once, read-many (WORM) optical disks are available (CD-R and DVD-R).
- Multiple-write versions are also available (CD-RW and DVD-RW).
- Reads and writes are slower than with magnetic disk.
Tape storage
- Non-volatile; used mainly for backup, for storage of infrequently used information, and as
an off-line medium for transferring information from one system to another.
- Holds large volumes of data and provides high transfer rates.
- Sequential access: much slower than disk; very slow access time in comparison to
magnetic disks and optical disks.
- Very high capacity (300 GB tapes available).
- Storage costs are much cheaper than disk, but drives are expensive.
4

Explain about static and dynamic hashing with an example

Static hashing:
A bucket is a unit of storage containing one or more records.
In a hash file organization we obtain the bucket of a record directly from its search-key value
using a hash function.
Hash function h is a function from the set of all search-key values K to the set of all
bucket addresses B.
Hash function is used to locate records for access, insertion as well as deletion.
Example:
-There are 10 buckets,
-The hash function returns the sum of the binary representations of the characters modulo 10
E.g., h(Perryridge) = 5, h(Round Hill) = 3, h(Brighton) = 3
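The hash-file idea above can be sketched in a few lines of Python. This is an illustrative sketch only: summing character codes stands in for the slide's "sum of the binary representations of the characters", so the bucket numbers it produces will not match the slide's example values, and the record tuples are hypothetical.

```python
NUM_BUCKETS = 10

def h(search_key: str) -> int:
    """Map a search-key string to a bucket number: sum of the character
    codes modulo the number of buckets (a stand-in hash function)."""
    return sum(ord(ch) for ch in search_key) % NUM_BUCKETS

# A toy hash file: one list of records per bucket.
buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(branch_name: str, record) -> None:
    # The bucket is obtained directly from the search-key value.
    buckets[h(branch_name)].append(record)

def lookup(branch_name: str):
    # Only the records in one bucket need to be scanned.
    return [r for r in buckets[h(branch_name)] if r[0] == branch_name]

insert("Perryridge", ("Perryridge", "A-102", 400))
insert("Round Hill", ("Round Hill", "A-305", 350))
```

The same `h` is used for access, insertion, and deletion, which is exactly what makes static hashing fast and also what makes it inflexible when the file grows.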

Dynamic hashing:
Good for database that grows and shrinks in size
Allows the hash function to be modified dynamically
Extendable hashing one form of dynamic hashing
Hash function generates values over a large range typically b-bit integers, with
b = 32.
At any time use only a prefix of the hash function to index into a table of bucket addresses.
Let the length of the prefix be i bits, 0 ≤ i ≤ 32.
Bucket address table size = 2^i. Initially i = 0.
Value of i grows and shrinks as the size of the database grows and shrinks.
Multiple entries in the bucket address table may point to a bucket.
Thus, the actual number of buckets is < 2^i.
The number of buckets also changes dynamically due to coalescing and splitting
of buckets.

General Extendable Hash Structure

In this structure, i2 = i3 = i, whereas i1 = i - 1.

Use of Extendable Hash Structure


To locate the bucket containing search-key Kj:
1. Compute h(Kj) = X
2. Use the first i high-order bits of X as a displacement into the bucket address table, and follow the pointer to the appropriate bucket
Updates in Extendable Hash Structure
To insert a record with search-key value Kj
follow same procedure as look-up and locate the bucket, say j.
If there is room in the bucket j insert record in the bucket.
Overflow buckets used instead in some cases.
To delete a key value,
locate it in its bucket and remove it.
The bucket itself can be removed if it becomes empty
Coalescing of buckets can be done
Decreasing bucket address table size is also possible
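The locate step (1-2 above) can be sketched in Python. Assumptions: `crc32` stands in for the unspecified 32-bit hash function, the search keys and records are hypothetical, and bucket splitting/coalescing is omitted to keep the sketch short.

```python
import zlib

B = 32  # the hash function produces b-bit values, with b = 32 here

def h(key: str) -> int:
    # crc32 is just a convenient stand-in for a fixed 32-bit hash.
    return zlib.crc32(key.encode()) & 0xFFFFFFFF

def locate_bucket(bucket_address_table, i, key):
    """Steps 1-2: compute X = h(key), then use the first i high-order
    bits of X as a displacement into the bucket address table."""
    x = h(key)
    slot = x >> (B - i) if i > 0 else 0   # table has 2**i entries
    return bucket_address_table[slot]

# With i = 2 the table has 2**2 = 4 entries; note that several table
# entries may reference the SAME bucket (here, entries 0 and 1).
shared = []
table = [shared, shared, [], []]
locate_bucket(table, 2, "Perryridge").append(("Perryridge", 400))
```

Because only the prefix length i changes as the file grows, doubling the table is cheap; only the one overflowing bucket ever needs to be split.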

5

Explain about multidimensional and parallel databases with an example


Multidimensional DB
A multidimensional database is a computer software system designed to allow for the efficient and
convenient storage and retrieval of large volumes of data that is
(1) intimately related and
(2) stored, viewed and analyzed from different perspectives. These perspectives are called dimensions.

Let us now consider a larger three-dimensional array.


Each of our three dimensions now has 10 positions. Conceptually, this array would look like the 10x10x10 cube below

In relational table format, this array translates into a table of 1000 records

PERFORMANCE ADVANTAGES
- A user wants to know the Sales Volume figure when
CAR TYPE=SEDAN, COLOR=BLUE, and
DEALERSHIP=GLEASON.
- A relational system might have to search through all 1000
records just to find the right record.
- In contrast the multidimensional structure has more
"knowledge" about where a particular piece of data lies.
-If we assume that the average search equals half the maximum search, we are looking at 15 versus 500 searches, a 3300% improvement in performance!
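The 15-versus-500 comparison can be made concrete with a small sketch. The dimension member names and sales values below are hypothetical; a Python dict keyed by cell coordinates stands in for the multidimensional structure's direct cell addressing.

```python
import itertools
import random

CAR_TYPES = [f"type{t}" for t in range(10)]
COLORS = [f"color{c}" for c in range(10)]
DEALERS = [f"dealer{d}" for d in range(10)]

# "Relational" form: 10 x 10 x 10 = 1000 flat records.
rows = [(t, c, d, random.randint(0, 99))
        for t, c, d in itertools.product(CAR_TYPES, COLORS, DEALERS)]

def relational_lookup(t, c, d):
    """May scan up to all 1000 records to find the matching one."""
    scanned = 0
    for row in rows:
        scanned += 1
        if row[:3] == (t, c, d):
            return row[3], scanned
    return None, scanned

# "Multidimensional" form: the cell position is computed, not searched.
cube = {(t, c, d): v for t, c, d, v in rows}

def cube_lookup(t, c, d):
    return cube[(t, c, d)]   # direct cell addressing
```

The relational scan's cost grows with the number of records, while the cube lookup stays constant because the structure "knows" where each cell lives.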
FEATURES OF MULTIDIMENSIONAL DATABASES
Multidimensional Data Views:
Rotation
Ranging
Hierarchies, roll-ups & drill-downs
Advantages
Supports end user business queries
Powerful and efficient for comparative analysis
Handles large volumes of data
Easily maintainable
Parallel DB:
Parallel machines are becoming quite common and affordable
Prices of microprocessors, memory and disks have dropped sharply

Databases are growing increasingly large


large volumes of transaction data are collected and stored for later analysis.
multimedia objects like images are increasingly stored in databases
Large-scale parallel database systems increasingly used for:
storing large volumes of data
processing time-consuming decision-support queries
providing high throughput for transaction processing

Types of architectures
Shared-memory
Shared-disc
Shared-nothing
Parallelism in Databases:
Data can be partitioned across multiple disks for parallel I/O.
Individual relational operations (e.g., sort, join, aggregation) can be executed in parallel
data can be partitioned and each processor can work independently on its own partition.
Queries are expressed in high level language (SQL, translated to relational algebra)
makes parallelization easier.
Different queries can be run in parallel with each other.
Concurrency control takes care of conflicts.
Thus, databases naturally lend themselves to parallelism.
I/OParallelism:
Horizontal partitioning
-Round-robin
-Hash partitioning
-Range partitioning
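The three horizontal partitioning strategies above can be sketched as follows. This is a minimal illustration over in-memory lists (each list standing in for one disk); the `key` default and the tuple shapes are assumptions, not a real I/O layer.

```python
import bisect

def round_robin(tuples, n):
    """Send the i-th tuple to disk i mod n."""
    parts = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        parts[i % n].append(t)
    return parts

def hash_partition(tuples, n, key=lambda t: t[0]):
    """Disk chosen by hashing the partitioning attribute."""
    parts = [[] for _ in range(n)]
    for t in tuples:
        parts[hash(key(t)) % n].append(t)
    return parts

def range_partition(tuples, cuts, key=lambda t: t[0]):
    """cuts = [v1, v2, ...] yields len(cuts)+1 contiguous key ranges."""
    parts = [[] for _ in range(len(cuts) + 1)]
    for t in tuples:
        parts[bisect.bisect_right(cuts, key(t))].append(t)
    return parts
```

Round-robin balances load best for full scans, hash partitioning supports point queries on the partitioning attribute, and range partitioning additionally supports range queries.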
Interquery Parallelism:
Queries/transactions execute in parallel with one another.
Increases transaction throughput; used primarily to scale up a transaction processing system to support a
larger number of transactions per second.
Easiest form of parallelism to support, particularly in a shared-memory parallel database, because even
sequential database systems support concurrent processing.
More complicated to implement on shared-disk or shared-nothing architectures
Locking and logging must be coordinated by passing messages between processors.
Data in a local buffer may have been updated at another processor.
Cache-coherency has to be maintained: reads and writes of data in the buffer must find the latest
version of the data.
Intraquery Parallelism:
Execution of a single query in parallel on multiple processors/disks; important for speeding up long-running
queries.
Two complementary forms of intraquery parallelism :
Intraoperation parallelism - parallelize the execution of each individual operation in the query.
Interoperation parallelism - execute the different operations in a query expression in parallel.
Interoperator Parallelism:
Pipelined parallelism
Independent parallelism
6

Explain about ordered indices with an example

In an ordered index, index entries are stored sorted on the search key value.
Primary index: in a sequentially ordered file, the index whose search key specifies the sequential
order of the file.
Secondary index: an index whose search key specifies an order different from the sequential
order of the file.
Types of Ordered Indices
Dense index
Sparse index
Dense index Index record appears for every search-key value in the file.
Sparse Index: contains index records for only some search-key values.
Applicable when records are sequentially ordered on search-key
To locate a record with search-key value K we:
Find the index record with the largest search-key value ≤ K
Search the file sequentially starting at the record to which the index record points
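The sparse-index lookup just described can be sketched in Python. The records and index entries below are hypothetical; a sorted list of `(search-key, record position)` pairs stands in for the index file.

```python
import bisect

# File records sequentially ordered on the search key.
records = [(10, "a"), (20, "b"), (30, "c"), (40, "d"), (50, "e")]
# Sparse index: entries for only SOME keys, each pointing at a record slot.
sparse_index = [(10, 0), (30, 2), (50, 4)]   # (search-key, record position)

def locate(k):
    keys = [e[0] for e in sparse_index]
    # Find the index record with the largest search-key value <= k ...
    i = bisect.bisect_right(keys, k) - 1
    if i < 0:
        return None
    pos = sparse_index[i][1]
    # ... then scan the file sequentially from the record it points to.
    while pos < len(records) and records[pos][0] <= k:
        if records[pos][0] == k:
            return records[pos]
        pos += 1
    return None
```

The trade-off is visible in the code: the index is smaller than a dense one, but a lookup may have to scan a few records past the index entry.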

Multilevel index
If primary index does not fit in memory, access becomes expensive.
To reduce number of disk accesses to index records, treat primary index kept on disk as a
sequential file and construct a sparse index on it.
outer index a sparse index of primary index
inner index the primary index file
If even outer index is too large to fit in main memory, yet another level of index can be
created, and so on.

Secondary Indices
- Index record points to a bucket that contains pointers to all the actual records with that
particular search-key value.
- Secondary indices have to be dense
7

Explain about B+ trees indexing concepts with an example (Nov/Dec 2014)(May/June 2016)

Disadvantage of indexed-sequential files: performance degrades as file grows, since many overflow
blocks get created. Periodic reorganization of entire file is required.
Advantage of B+-tree index files: automatically reorganizes itself with small, local, changes, in the
face of insertions and deletions. Reorganization of entire file is not required to maintain performance.
Disadvantage of B+-trees: extra insertion and deletion overhead, space overhead.
A B+-tree is a rooted tree satisfying the following properties:

All paths from root to leaf are of the same length


Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.
Special cases:
o If the root is not a leaf, it has at least 2 children.
o If the root is a leaf, it can have between 0 and (n-1) values.

B+Tree Node Structure


-Typical node

*Ki are the search-key values


*Pi are pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes).
-The search keys in a node are ordered:
K1 < K2 < K3 < ... < Kn-1
Properties of leaf node
For i = 1, 2, ..., n-1, pointer Pi either points to a file record with search-key value Ki, or to a
bucket of pointers to file records, each record having search-key value Ki.
Pn points to next leaf node in search-key order
Non-Leaf Nodes in B+-Trees
Non-leaf nodes form a multi-level sparse index on the leaf nodes. For a non-leaf node with m
pointers: all the search keys in the subtree to which P1 points are less than K1.
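The node structure and the descent rule above can be sketched for a tiny hand-built B+-tree. The keys and record names are hypothetical, and insertion/splitting is omitted; only search is shown.

```python
import bisect

class Node:
    def __init__(self, keys, pointers, leaf, next_leaf=None):
        self.keys = keys           # K1 < K2 < ... (ordered search keys)
        self.pointers = pointers   # children, or record pointers in a leaf
        self.leaf = leaf
        self.next_leaf = next_leaf # Pn: next leaf in search-key order

def search(root, k):
    node = root
    while not node.leaf:
        # Follow the pointer whose subtree covers k: all keys in the
        # subtree are >= the separating key on its left and < the one
        # on its right, so bisect_right picks the correct child.
        node = node.pointers[bisect.bisect_right(node.keys, k)]
    if k in node.keys:
        return node.pointers[node.keys.index(k)]   # Pi -> record for Ki
    return None

# A two-leaf tree: root separates keys < 30 from keys >= 30.
leaf2 = Node([30, 40], ["rec30", "rec40"], leaf=True)
leaf1 = Node([10, 20], ["rec10", "rec20"], leaf=True, next_leaf=leaf2)
root = Node([30], [leaf1, leaf2], leaf=False)
```

The `next_leaf` chain is what makes range scans cheap: after locating the first matching leaf, the scan simply walks the leaves in search-key order.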
8

Explain about B trees indexing concepts with an example (Nov/Dec 2014)

Similar to B+-tree, but B-tree allows search-key values to appear only once; eliminates redundant
storage of search keys.
Search keys in nonleaf nodes appear nowhere else in the B-tree; an additional pointer field for
each search key in a nonleaf node must be included.

Generalized B-tree leaf node

Nonleaf node: the pointers Bi are the bucket or file-record pointers.


9

Explain about Distributed Databases and their characteristics, also list out advantages and disadvantages.

In a distributed database system, data reside in several locations, whereas in a centralized
database system the data reside in a single location.
Classification
- Homogenous distributed DB
- Heterogeneous distributed DB

Homogenous distributed DB
All sites have identical database management software, are aware of one another.
They agree to cooperate in processing users' requests.
Heterogeneous distributed DB
Different sites may use different schemas and different DBMS software.
The sites may not be aware of one another
Provide only limited facilities for cooperation in transaction processing.
Consider a relation r, there are two approaches to store this relation in the distributed DB.
Replication
Fragmentation
Replication
The system maintains several identical replicas (copies) of the relation at different sites.
Full replication- copy is stored in every site in the system.
Advantages and disadvantages
Availability
Increased parallelism
Increased overhead on update
Fragmentation
The system partitions the relation into several fragment and stores each fragment at
different sites
Two approaches
Horizontal fragmentation
Vertical fragmentation
Horizontal fragmentation
Splits the relation by assigning each tuple of r to one or more fragments.
Relation r is partitioned into a number of subsets r1, r2, ..., rn, and the original relation can
be reconstructed by taking the union of all fragments, that is

r = r1 ∪ r2 ∪ ... ∪ rn

Vertical fragmentation
Splits the relation by decomposing the schema R of the relation; the original relation is
reconstructed by taking the natural join of all fragments, that is
r = r1 ⋈ r2 ⋈ ... ⋈ rn
10

Illustrate indexing and hashing techniques with suitable examples. (Nov/Dec 2015)

In an ordered index, index entries are stored sorted on the search key value.
Primary index: in a sequentially ordered file, the index whose search key specifies the sequential
order of the file.
Secondary index: an index whose search key specifies an order different from the sequential
order of the file.
Types of Ordered Indices
Dense index
Sparse index
Dense index Index record appears for every search-key value in the file.

Sparse Index: contains index records for only some search-key values.
Applicable when records are sequentially ordered on search-key
To locate a record with search-key value K we:
Find the index record with the largest search-key value ≤ K
Search the file sequentially starting at the record to which the index record points

Multilevel index
If primary index does not fit in memory, access becomes expensive.
To reduce number of disk accesses to index records, treat primary index kept on disk as a
sequential file and construct a sparse index on it.
outer index a sparse index of primary index
inner index the primary index file
If even outer index is too large to fit in main memory, yet another level of index can be
created, and so on.

St. Josephs College of Engineering & St. Josephs Institute of Technology

Page 68 of 97

CS6302-Database Management Systems

Department of IT/CSE

2016-2017

Secondary Indices
- Index record points to a bucket that contains pointers to all the actual records with that
particular search-key value.
- Secondary indices have to be dense
Hashing:
Static hashing:
A bucket is a unit of storage containing one or more records.
In a hash file organization we obtain the bucket of a record directly from its search-key value
using a hash function.
Hash function h is a function from the set of all search-key values K to the set of all
bucket addresses B.
Hash function is used to locate records for access, insertion as well as deletion.
Example:
-There are 10 buckets,
-The hash function returns the sum of the binary representations of the characters modulo 10
E.g., h(Perryridge) = 5, h(Round Hill) = 3, h(Brighton) = 3

Dynamic hashing:
Good for database that grows and shrinks in size
Allows the hash function to be modified dynamically
Extendable hashing one form of dynamic hashing
Hash function generates values over a large range typically b-bit integers, with
b = 32.
At any time use only a prefix of the hash function to index into a table of bucket
addresses.
Let the length of the prefix be i bits, 0 ≤ i ≤ 32.
Bucket address table size = 2^i. Initially i = 0.
Value of i grows and shrinks as the size of the database grows and shrinks.
Multiple entries in the bucket address table may point to a bucket.
Thus, the actual number of buckets is < 2^i.
The number of buckets also changes dynamically due to coalescing and splitting of buckets.
General Extendable Hash Structure

In this structure, i2 = i3 = i, whereas i1 = i - 1.


Use of Extendable Hash Structure


To locate the bucket containing search-key Kj:
1. Compute h(Kj) = X
2. Use the first i high-order bits of X as a displacement into the bucket address table, and follow the pointer to the appropriate bucket
Updates in Extendable Hash Structure
To insert a record with search-key value Kj
follow same procedure as look-up and locate the bucket, say j.
If there is room in the bucket j insert record in the bucket.
Overflow buckets used instead in some cases.
To delete a key value,
locate it in its bucket and remove it.
The bucket itself can be removed if it becomes empty
Coalescing of buckets can be done
Decreasing bucket address table size is also possible

11

Write short notes on (i) Spatial and multimedia databases (ii) Mobile and web databases. (Nov/Dec 2015)
Spatial DB:

A spatial database is a database that is optimized to store and query data related to objects in space,
including points, lines and polygons
Keep track of objects in a multi-dimensional space
Maps
Geographical Information Systems (GIS)
Weather
Examples of Spatial data
Census Data
NASA satellite imagery - terabytes of data per day
Weather and Climate Data
Rivers, Farms, ecological impact
Medical Imaging
R-trees
Quad trees
Multimedia databases

Types of multimedia data available in current systems:


Text: May be formatted or unformatted. For ease of parsing structured documents,
standards like SGML and variations such as HTML are being used.
Graphics: Examples include drawings and illustrations that are encoded using some
descriptive standards (e.g. CGM, PICT, postscript).
Images: Includes drawings, photographs, and so forth, encoded in standard formats such
as bitmap, JPEG, and MPEG. Compression is built into JPEG and MPEG.
Animations: Temporal sequences of image or graphic data.
Mobile database
A mobile database is either a stationary database that can be connected to by a mobile
computing device (e.g., smart phones and PDAs) over a mobile network, or a database which is actually
stored by the mobile device. This could be a list of contacts, price information, distance travelled, or any
other information.
In mobile computing, the problems are more difficult, mainly:
The limited and intermittent connectivity afforded by wireless communications.
The limited life of the power supply(battery).
The changing topology of the network.
Web databases


Database access over the internet


Web-based clients
Web server
Database server(s)
Web server serves pages to browsers (clients) and can access database(s)
Typical operation
Client sends a request for a page to the web server
Web server sends SQL to database
The web server uses results to create page
The page is returned to the client
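The four-step flow above (request → SQL → results → page) can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a real web server: the `product` table, its rows, and the `max_price` request parameter are all hypothetical, and the "client" is just a function call.

```python
import sqlite3

# An in-memory SQLite database stands in for the database server.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (name TEXT, price REAL)")
db.execute("INSERT INTO product VALUES ('pen', 1.5), ('book', 9.0)")

def handle_request(max_price: float) -> str:
    """Web server: turn a client request into SQL, then render a page."""
    # Web server sends SQL to the database (parameterized, never string-built).
    rows = db.execute(
        "SELECT name, price FROM product WHERE price <= ?",
        (max_price,)).fetchall()
    # The web server uses the results to create the page ...
    items = "".join(f"<li>{name}: {price}</li>" for name, price in rows)
    # ... and the page is returned to the client.
    return f"<html><body><ul>{items}</ul></body></html>"
```

The `?` placeholder is the important detail: passing the request value as a parameter rather than concatenating it into the SQL string is what protects such a setup from SQL injection.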

12

Explain the architectural components of a data warehouse and write Data marts. (Apr/May 2015)

Architectural component of a data warehouse:

Operational Source System


It is the traditional OLTP system which stores the transaction data of the organization's business. It
generally works on one record at a time and does not necessarily store the history of the organization's
information. Operational source systems are generally not used for reporting, unlike a data warehouse.
LOAD MANAGER
The load manager performs all the operations associated with extraction and loading data into the
data warehouse.
QUERY MANAGER
The query manager performs all operations associated with management of user queries. This
component is usually constructed using vendor end-user access tools, data warehousing monitoring tools,
database facilities and custom built programs. The complexity of a query manager is determined by
facilities provided by the end-user access tools.
ARCHIVE AND BACK UP DATA
This area of the warehouse stores detailed and summarized data for the purpose of archiving and
back up.
META DATA
The data warehouse also stores all the Meta data (data about data) definitions used by all
processes in the warehouse.
Data Staging Area
Data staging area is the storage area as well as set of ETL process that extract data from source
system. It is everything between source systems and Data warehouse.
THE E T L (EXTRACT TRANSFORMATION LOAD) PROCESS
The four major processes of the data warehouse are: extract (data from the operational systems
is brought to the data warehouse), transform (the data is converted into the internal format and structure
of the data warehouse), cleanse (the data is checked to make sure it is of sufficient quality to be used for
decision making) and load (the cleansed data is put into the data warehouse). The four processes from
extraction through loading are often referred to collectively as data staging.
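The extract → transform → cleanse → load pipeline can be sketched as four small functions. Everything here is hypothetical (the flat-file "operational source", the record fields, the cleansing rule); it only illustrates how the four stages chain together.

```python
# Hypothetical flat-file rows from an operational source system.
source_rows = [
    "ord1,2016-03-01, 120 ",
    "ord2,2016-03-02,-5",     # bad amount: should be cleansed out
    "ord3,2016-03-02,80",
]

def extract(rows):
    """Pull raw data out of the source system."""
    return [r.split(",") for r in rows]

def transform(recs):
    """Convert to the warehouse's internal format and structure."""
    return [{"order": o, "date": d, "amount": int(a)} for o, d, a in recs]

def cleanse(recs):
    """Drop rows not of sufficient quality for decision making."""
    return [r for r in recs if r["amount"] >= 0]

def load(recs, warehouse):
    """Put the cleansed data into the warehouse."""
    warehouse.extend(recs)

warehouse = []
load(cleanse(transform(extract(source_rows))), warehouse)
```

Real ETL tools add scheduling, logging, and incremental loads, but the staging-area idea is the same: nothing reaches the presentation area until it has passed through all four stages.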
Data Presentation Area
Data presentation area is generally called the data warehouse. It is the place where cleaned,
transformed data is stored in a dimensionally structured warehouse and made available for analysis
purposes.
Data Access Tools
Once data is available in presentation area it is accessed using data access tools like Business
Objects.
Data mart
A subset of a data warehouse that supports the requirements of a particular department or
business function.
Characteristics include:
Do not normally contain detailed operational data unlike data warehouses.
May contain certain levels of aggregation

Reasons for Creating a Data Mart


To give users more flexible access to the data they need to analyse most often.
To provide data in a form that matches the collective view of a group of users
To improve end-user response time.
Potential users of a data mart are clearly defined and can be targeted for support
To provide appropriately structured data as dictated by the requirements of the end-user access
tools.
Building a data mart is simpler compared with establishing a corporate data warehouse.
The cost of implementing data marts is far less than that required to establish a data warehouse.
Data Marts Issues
Data mart functionality
Data mart size
Data mart load performance
Users access to data in multiple data marts
Data mart Internet / Intranet access
Data mart administration
Data mart installation
UNIT V ADVANCED TOPICS
DATABASE SECURITY: Data Classification-Threats and risks Database access Control Types of Privileges
Cryptography- Statistical Databases- Distributed Databases-Architecture-Transaction Processing-Data Warehousing and
Mining-Classification-Association rules-Clustering-Information Retrieval-Relevance ranking-Crawling and Indexing the Web-Object Oriented Databases-XML Databases.


UNIT-V / PART-A
Define Database security.
Database security concerns the use of a broad range of information security controls to protect databases (potentially
including the data, the database applications or stored functions, the database systems, the database servers and the
associated network links) against compromises of their confidentiality, integrity and availability.
What are the Security risks to database systems?
Unauthorized or unintended activity or misuse by authorized database users, database administrators, or
network/systems managers, or by unauthorized users or hackers. For example inappropriate access to sensitive data,
metadata or functions within databases, or inappropriate changes to the database programs, structures or security
configurations
List out the types of information security.
Access control ,Auditing, Authentication, Encryption, Integrity controls, Backups, Application security, Database
Security applying Statistical Method
Define Authentication.
Authentication is the act of determining the identity of a user and of the host that they are using. The goal of
authentication is to first verify that the user, either a person or system, which is attempting to interact with your
system is allowed to do so. The second goal of authentication is to gather information regarding the way that the user
is accessing your system
What is Authorization?
Authorization is the act of determining the level of access that an authorized user has to behaviour and data.
Define Check Point.
This is the place to validate users and to make appropriate decisions when dealing with security breaches. Also
known as Access Verification, Validation and Penalization, and Holding off Hackers.
Define OLTP.
Online transaction processing, or OLTP, is a class of information systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing.
Define Privileges.
A separate database administrator (DBA) account should be created and protected that has full privileges to
create/drop databases, create user accounts, and update user privileges. This simple means of separation of
responsibility helps prevent accidental mis-configuration, lowers risk and lowers scope of compromise.
What is Cryptography in database?
Cryptography is the art of "extreme information security." It is extreme in the sense that, once treated with a
cryptographic algorithm, a message (or a database field) is expected to remain secure even if the adversary has full
access to the treated message. The adversary may even know which algorithm was used. If the cryptography is good,
the message will remain secure.
What is Benefits of Database Encryption?
Ensure guaranteed access to encrypted data by authorized users by automating storage and back-up for mission
critical master encryption keys. Simplify data privacy compliance obligations and reporting activities through the use
of a security-certified encryption and key management to enforce critical best practices and other standards of due
care.
What is statistical database?
A statistical database is a database used for statistical analysis purposes. It is an OLAP (online analytical processing),
instead of OLTP (online transaction processing) system. Statistical databases contain parameter data and the
measured data for these parameters.
Define Database replication.
Database replication can be used on many database management systems, usually with a master/slave relationship
between the original and the copies. The master logs the updates, which then ripple through to the slaves. The slave
outputs a message stating that it has received the update successfully, thus allowing the sending of subsequent
updates.
What is homogeneous distributed database and heterogeneous distributed database
A homogeneous distributed database has identical software and hardware running all databases instances, and may
appear through a single interface as if it were a single database. A heterogeneous distributed database may have
different hardware, operating systems, database management systems, and even data models for different databases.
What are benefits of a data ware house
Congregate data from multiple sources into a single database so a single query engine can be used to present data.
Mitigate the problem of database isolation level lock contention in transaction processing systems caused by
attempts to run large, long running, analysis queries in transaction processing databases.
Define offline data warehouse and Online database.


Data warehouses at this stage are updated from data in the operational systems on a regular basis, and the data
warehouse data are stored in a data structure designed to facilitate reporting. An online integrated data warehouse
represents the real-time stage: data in the warehouse is updated for every transaction performed on the source data.
Write about integrated data warehouse
These data warehouses assemble data from different areas of business, so users can look up the information they
need across other systems.
Define spatial data mining.
Spatial data mining is the application of data mining methods to spatial data. The end objective of spatial data
mining is to find patterns in data with respect to geography.
What is the advantage of OODB?
An integrated repository of information that is shared by multiple users, multiple products, multiple applications on
multiple platforms.
Define Text mining.
Text mining is also referred to as text data mining, it is equivalent to text analytics, refers to the process of deriving
high-quality information from text. High-quality information is typically derived through the devising of patterns
and trends through means such as statistical pattern learning.
Define XML Database.
An XML database is a data persistence software system that allows data to be stored in XML format. These data can
then be queried, exported and serialized into the desired format. XML databases are usually associated with
document-oriented databases.
Define Clustering.
Clustering techniques consider data tuples as objects. They partition the objects into groups or clusters so that objects
within a cluster are similar to one another and dissimilar to objects in other clusters.
Define Association Rule Mining. (Apr/May 2015)
Association rule mining is discovering frequent patterns, associations and correlations among items which are
meaningful to the users and can generate strong rules on the basis of these frequent patterns, which helps in decision
support system.
Define Information Retrieval.
It is an activity of obtaining information resources relevant to an information need from a collection of information
resources.
Define Crawling and indexing the web. (Nov/Dec 2014)
Web Crawling is the process of search engines combing through web pages in order to properly index them. These
web crawlers systematically crawl pages and look at the keywords contained on the page, the kind of content, all
the links on the page, and then returns that information to the search engines server for indexing. Then they follow
all the hyperlinks on the website to get to other websites. When a search engine user enters a query, the search engine
will go to its index and return the most relevant search results based on the keywords in the search term. Web
crawling is an automated process and provides quick, up to date data.
Define Relevance Ranking. (Nov/Dec 2014)
A system in which the search engine tries to determine the theme of a site that a link is coming from.
Can we have more than one constructor in a class? If yes, explain the need for such a situation . (Nov/Dec 2015)
Yes. A class can have a default (no-argument) constructor as well as constructors with parameters. This is called
constructor overloading, and it allows objects to be initialized in different ways depending on the arguments supplied.
List the types of privileges used in database access control. (Nov/Dec 2015)
Types of privileges used in database access control are Read, Write and Update.
Define Threats and Risks (Apr/May 2015)
Threats: accidental loss and corruption, and deliberate unauthorized attempts to access or alter data. Loss
of integrity, loss of availability, loss of confidentiality.
Risks: Unauthorized or unintended activity or misuse by authorized database users, database administrators, or
network/systems managers, or by unauthorized users or hackers.
Explain Data Classification. (May/June 2016)
Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of
classification is to accurately predict the target class for each case in the data. For example, a classification model
could be used to identify loan applicants as low, medium, or high credit risks.
UNIT-V / PART-B
Explain about Object Oriented Databases and XML Databases.

OODB = Object Orientation + Database Capabilities


Object-Oriented Database Features:
-persistence
- support of transactions

CS6302-Database Management Systems


Department of IT/CSE
2016-2017
- simple querying of bulk data
- concurrent access
- resilience
- security
Integration and Sharing
-Seamless integration of operating systems, databases, languages, spreadsheets, word processors, AI
expert system shells.
-Sharing of data, information, software components, products, computing environments.
-Referential sharing:
Multiple applications, products, or objects share common sub-objects.
Object-oriented databases allows referential sharing through the support of object identity and inheritance.
The object-oriented paradigm is illustrated below:

Objects and Identity


The following figure shows object with state and behavior. The state is represented by the values of the
object's attributes, and the behavior is defined by the methods acting on the state of the object. There is a
unique object identifier OID to identify the object.

Complex Objects
Complex objects are built by applying constructors to simpler objects including: sets, lists and tuples. An
example is illustrated below:

Encapsulation
Encapsulation is derived from the notion of Abstract Data Type (ADT). It is motivated by the need to
make a clear distinction between the specification and the implementation of an operation. It reinforces
modularity and provides a form of logical data independence.


Class
A class object is an object which acts as a template.
It specifies:
A structure that is the set of attributes of the instances
A set of operations
A set of methods which implement the operations
Instantiation means generating objects, e.g., the 'new' operator in C++
Persistence of objects: Two approaches
An implicit characteristic of all objects
An orthogonal characteristic - insert the object into a persistent collection of objects
Inheritance
A mechanism of reusability, the most powerful concept of OO programming


Association
Association is a link between entities in an application
In OODB, associations are represented by means of references between objects
a representation of a binary association
a representation of a ternary association
reverse reference


ADVANTAGES OF OODB
-An integrated repository of information that is shared by multiple users, multiple products, multiple
applications on multiple platforms.
- It also solves the following problems:
1. The semantic gap: the real world and the conceptual model are very similar.


2. Impedance mismatch: Programming languages and database systems must be interfaced to solve
application problems. But the language style, data structures, of a programming language (such as C) and
the DBMS (such as Oracle) are different. The OODB supports general purpose programming in the
OODB framework.
3. New application requirements: Especially in OA, CAD, CAM, CASE, object-orientation is the most
natural and most convenient.
XML DB

An XML database is used to store huge amounts of information in the XML format. As the use
of XML is increasing in every field, it is necessary to have a secure place to store XML
documents. The data stored in the database can be queried using XQuery, serialized, and
exported into the desired format.

XML Database Types


There are two major types of XML databases:

XML- enabled

Native XML (NXD)

XML- Enabled Database


An XML-enabled database is simply a conventional database extended to handle the conversion
of XML documents. It is a relational database, where data are stored in tables consisting of rows
and columns. The tables contain sets of records, which in turn consist of fields.

Native XML Database


A native XML database is based on the container rather than the table format. It can store large
amounts of XML documents and data. A native XML database is queried by XPath expressions.
A native XML database has an advantage over the XML-enabled database: it is more capable of
storing, querying and maintaining XML documents than an XML-enabled database.
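As a rough sketch of XPath-style querying (using Python's standard xml.etree module rather than an actual native XML database; the document below is invented for illustration):

```python
# Sketch: querying an XML document with XPath-style expressions.
# A real native XML database stores many such documents in containers
# and supports the full XPath/XQuery languages.
import xml.etree.ElementTree as ET

doc = """<library>
  <book year="2003"><title>Database System Concepts</title></book>
  <book year="2009"><title>Operating System Concepts</title></book>
</library>"""

root = ET.fromstring(doc)

# Titles of all books published after 2005.
titles = [b.find("title").text
          for b in root.findall("book")
          if int(b.get("year")) > 2005]
print(titles)  # ['Operating System Concepts']
```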
2

Explain in detail (i) Clustering (ii) Information Retrieval (iii) Transaction processing.

(Nov/Dec 2014)

(i) Clustering
Unsupervised learning or clustering builds models from data without predefined classes.
The goal is to place records into groups where the records in a group are highly similar to each
other and dissimilar to records in other groups.
The k-Means algorithm is a simple yet effective clustering technique
Agglomerative clustering algorithms
Build small clusters, then cluster small clusters into bigger clusters, and
so on
Divisive clustering algorithms
Start with all items in a single cluster, repeatedly refine (break) clusters
into smaller ones
(ii) Information Retrieval
Information retrieval (IR) systems use a simpler data model than database systems
Information organized as a collection of documents
Documents are unstructured, no schema
Information retrieval locates relevant documents, on the basis of user input such as keywords or
example documents
e.g., find documents containing the words database systems
Can be used even on textual descriptions provided with non-textual data such as images
IR on Web documents has become extremely important
e.g., Google, AltaVista
Conceptually, IR is the study of finding needed information. I.e., IR helps users find information
that matches their information needs.
Expressed as queries
Historically, IR is about document retrieval, emphasizing document as the basic unit.
Finding documents relevant to user queries
Technically, IR studies the acquisition, organization, storage, retrieval, and distribution of
information.
An IR model governs how a document and a query are represented and how the relevance of a
document to a user query is defined.
Main models:
Boolean model
Vector space model
Statistical language model
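The vector space model can be sketched with term-frequency vectors and cosine similarity; the two documents and the query below are made up for illustration:

```python
# Sketch of the vector space IR model: rank documents by the cosine
# similarity between their term-frequency vectors and the query vector.
import math
from collections import Counter

def cosine(a, b):
    # a, b are Counter term-frequency vectors
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = {
    "d1": "database systems store data",
    "d2": "operating systems manage hardware",
}
query = Counter("database systems".split())
scores = {d: cosine(Counter(text.split()), query) for d, text in docs.items()}
best = max(scores, key=scores.get)
print(best)  # d1
```

The document about database systems scores higher because it shares more query terms, which is exactly the notion of relevance this model defines.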
(iii) Transaction processing.
It is a type of information system that collects, stores, retrieves and modifies the transaction data of
an enterprise.
3

Explain about Types of Privileges in database language


-Grant
-Revoke

A privilege is a right to execute a particular type of SQL statement or to access another user's object. Some
examples of privileges include the right to:

Connect to the database


Create a table
Select rows from another user's table
Execute another user's stored procedure

You grant privileges to users so these users can accomplish tasks required for their jobs. You should grant
a privilege only to a user who requires that privilege to accomplish the necessary work. Excessive
granting of unnecessary privileges can compromise security. A user can receive a privilege in two
different ways:

You can grant privileges to users explicitly. For example, you can explicitly grant to user SCOTT
the privilege to insert records into the employees table.
You can also grant privileges to a role (a named group of privileges), and then grant the role to
one or more users. For example, you can grant the privileges to select, insert, update, and delete
records from the employees table to the role named clerk, which in turn you can grant to
users scott and brian.

Types:

System Privileges
Schema Object Privileges
Table Privileges
View Privileges
Procedure Privileges

Type Privileges
In some cases it is desirable to grant a privilege to a user temporarily. For example,
The owner of a relation may want to grant the SELECT privilege to a user for a specific task and then
revoke that privilege once the task is completed.
Hence, a mechanism for revoking privileges is needed. In SQL, a REVOKE command is included for the
purpose of canceling privileges.
Suppose that the DBA creates four accounts
A1, A2, A3, A4 and wants only A1 to be able to create base relations. Then the DBA must issue the
following GRANT command in SQL
GRANT CREATETAB TO A1;
In SQL the same effect can be accomplished by having the DBA issue a CREATE SCHEMA command
as follows:
CREATE SCHEMA EXAMPLE AUTHORIZATION A1;
User account A1 can create tables under the schema called EXAMPLE.

Suppose that A1 creates the two base relations EMPLOYEE and DEPARTMENT
A1 is then owner of these two relations and hence all the relation privileges on each of them.
Suppose that A1 wants to grant A2 the privilege to insert and delete tuples in both of these

relations, but A1 does not want A2 to be able to propagate these privileges to additional accounts:

GRANT INSERT, DELETE ON EMPLOYEE, DEPARTMENT TO A2;


Suppose that A1 decides to revoke the SELECT privilege on the EMPLOYEE relation from A3;
A1 can issue:
REVOKE SELECT ON EMPLOYEE FROM A3;
4

Explain about Threats and risks in database security.


Threats to databases
Loss of integrity
Loss of availability
Loss of confidentiality
Threats
o Excessive and Unused Privileges
o Privilege Abuse
o SQL Injection
o Malware
o Weak Audit Trail
o Storage Media Exposure
o Exploitation of Vulnerabilities
o Misconfigured Databases
o Unmanaged Sensitive Data
o Denial of Service

Security risk to Database System:


Security risks to database systems include, for example:

Unauthorized or unintended activity or misuse by authorized database users, database


administrators, or network/systems managers, or by unauthorized users or hackers
(e.g. inappropriate access to sensitive data, metadata or functions within databases, or
inappropriate changes to the database programs, structures or security
configurations);
Malware infections causing incidents such as unauthorized access, leakage or
disclosure of personal or proprietary data, deletion of or damage to the data or
programs, interruption or denial of authorized access to the database, attacks on other
systems and the unanticipated failure of database services;
Overloads, performance constraints and capacity issues resulting in the inability of
authorized users to use databases as intended;
Physical damage to database servers caused by computer room fires or floods,
overheating, lightning, accidental liquid spills, static discharge, electronic
breakdowns/equipment failures and obsolescence;
Design flaws and programming bugs in databases and the associated programs and
systems, creating various security vulnerabilities (e.g. unauthorized privilege
escalation), data loss/corruption, performance degradation etc.;
Data corruption and/or loss caused by the entry of invalid data or commands,
mistakes in database or system administration processes, sabotage/criminal damage

Explain about Data Warehousing with an example. Nov/Dec 2015)


Purpose of Data Warehousing
Traditional databases are not optimized for data access alone; they have to balance the requirement
of data access with the need to ensure the integrity of data.
Most of the times the data warehouse users need only read access but, need the access to be fast
over a large volume of data.
Most of the data required for data warehouse analysis comes from multiple databases and these
analysis are recurrent and predictable to be able to design specific software to meet the
requirements.
W. H Inmon characterized a data warehouse as:
A subject-oriented, integrated, nonvolatile, time-variant collection of data in support of
management's decisions.
Data Warehouse processing involves
Cleaning and reformatting of data
OLAP
Data Mining
Back Flushing

[Figure: data warehouse architecture. Source databases pass through cleaning and reformatting
into the data warehouse (data plus metadata), which serves OLAP, DSS/EIS and data mining
clients; other data inputs and updates/new data also feed the warehouse.]

Data Modeling for Data Warehouses


Advantages of a multi-dimensional model
Multi-dimensional models lend themselves readily to hierarchical views in what is known
as roll-up display and drill-down display.
The data can be directly queried in any combination of dimensions, bypassing complex
database queries.
Multi-dimensional schemas are specified using:
Dimension table
It consists of tuples of attributes of the dimension.
Fact table
Each tuple is a recorded fact. This fact contains some measured or observed
variable(s) and identifies it with pointers to dimension tables. The fact table
contains the data, and the dimensions identify each tuple in the data.
Star schema:
Consists of a fact table with a single table for each dimension.

Snowflake Schema:
It is a variation of star schema, in which the dimensional tables from a
star schema are organized into a hierarchy by normalizing them.

Fact Constellation
Fact constellation is a set of tables that share some dimension tables.
However, fact constellations limit the possible queries for the
warehouse.

Building A Data Warehouse


The Design of a Data Warehouse involves following steps.
Acquisition of data for the warehouse.
Ensuring that Data Storage meets the query requirements efficiently.
Giving full consideration to the environment in which the data warehouse resides.
Functionality of a Data Warehouse
Functionality that can be expected:
Roll-up: Data is summarized with increasing generalization
Drill-Down: Increasing levels of detail are revealed
Pivot: Cross tabulation is performed
Slice and dice: Performing projection operations on the dimensions.
Sorting: Data is sorted by ordinal value.
Selection: Data is available by value or range.
Derived attributes: Attributes are computed by operations on stored derived values.
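Roll-up can be illustrated by aggregating a measure up a dimension hierarchy; the cities, regions and sales figures below are invented:

```python
# Sketch of a roll-up: aggregate fact-table measures up a dimension
# hierarchy (city -> region). Data and hierarchy are made up.
from collections import defaultdict

region_of = {"Chennai": "South", "Madurai": "South", "Delhi": "North"}
sales = [("Chennai", 100), ("Madurai", 50), ("Delhi", 70)]

rollup = defaultdict(int)
for city, amount in sales:
    rollup[region_of[city]] += amount
print(dict(rollup))  # {'South': 150, 'North': 70}
```

Drill-down is the inverse operation: returning from the region totals to the underlying city-level rows.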
Difficulties of implementing Data Warehouses
Lead time is huge in building a data warehouse
Potentially it takes years to build and efficiently maintain a data warehouse.
Both quality and consistency of data are major concerns.
Revising the usage projections regularly to meet the current requirements.
The data warehouse should be designed to accommodate addition and attrition of data
sources without major redesign
Administration of data warehouse would require far broader skills than are needed for a
traditional database
6

Explain about Data mining with an example.

Data Mining:
The discovery of new information in terms of patterns or rules from vast amounts of data.
The process of finding interesting structure in data.
The process of employing one or more computer learning techniques to automatically analyze and
extract knowledge from data.
Goals of Data Mining:
Prediction:
Determine how certain attributes will behave in the future.
Identification:
Identify the existence of an item, event, or activity.
Classification:
Partition data into classes or categories.
Optimization:
Optimize the use of limited resources.
Types of Discovered Knowledge
Association Rules
Classification Hierarchies
Sequential Patterns
Patterns Within Time Series
Clustering
Additional Data Mining Methods:
Sequential pattern analysis
Time Series Analysis
Regression
Neural Networks
Genetic Algorithms
Association Rules


Algorithm used
Market-Basket Model, Support, and Confidence
Apriori Algorithm
Sampling Algorithm
Frequent-Pattern Tree Algorithm
Partition Algorithm
Example

Retail shops are often interested in associations between different items that people buy.
Someone who buys bread is quite likely also to buy milk
A person who bought the book Database System Concepts is quite likely also to buy the
book Operating System Concepts.
Associations information can be used in several ways.
E.g. when a customer buys a particular book, an online shop may suggest associated
books.
Association rules:
bread => milk
DB-Concepts, OS-Concepts => Networks
Left hand side: antecedent, right hand side: consequent
An association rule must have an associated population; the population consists of a set of
instances


Sequential Pattern Analysis


Transactions ordered by time of purchase form a sequence of itemsets.
The problem is to find all subsequences from a given set of sequences that have a minimum
support.
The sequence S1, S2, S3, ... is a predictor of the fact that a customer purchasing itemset S1 is likely
to buy S2, and then S3, and so on.
Time Series Analysis
Time series are sequences of events. For example, the closing price of a stock is an event that
occurs each day of the week.
Time series analysis can be used to identify the price trends of a stock or mutual fund.
Time series analysis is an extended functionality of temporal data management.
Regression
A regression equation estimates a dependent variable using a set of independent variables and
a set of constants.
The independent variables as well as the dependent variable are numeric.
A regression equation can be written in the form Y = f(x1, x2, ..., xn) where Y is the dependent
variable.
If f is linear in the domain variables xi, the equation is called a linear regression equation.
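As a sketch (the data here are toy values of my own, chosen to lie exactly on y = 2x + 1), the coefficients of a one-variable linear regression can be computed in closed form by least squares:

```python
# Sketch: least-squares fit of Y = a + b*x for one independent variable.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope: covariance of x and y divided by variance of x.
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx  # intercept
    return a, b

a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```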
Neural Networks
A neural network is a set of interconnected nodes designed to imitate the functioning of the brain.
Node connections have weights which are modified during the learning process.
Neural networks can be used for supervised learning and unsupervised clustering.
The output of a neural network is quantitative and not easily understood.
Genetic Learning
Genetic learning is based on the theory of evolution.
An initial population of several candidate solutions is provided to the learning model.
A fitness function defines which solutions survive from one generation to the next.
Crossover, mutation and selection are used to create new population elements.
Application
Marketing
Marketing strategies and consumer behavior
Finance
Fraud detection, creditworthiness and investment analysis
Manufacturing
Resource optimization
Health
Image analysis, side effects of drug, and treatment effectiveness

7

Explain the concepts of Crawling and Indexing the Web


Web Crawling and indexing:

Web crawlers are programs that locate and gather information on the Web
Recursively follow hyperlinks present in known documents, to find other documents
Starting from a seed set of documents
Fetched documents
Handed over to an indexing system
Can be discarded after indexing, or store as a cached copy
Crawling the entire Web would take a very large amount of time
Search engines typically cover only a part of the Web, not all of it
Take months to perform a single crawl

The most well-known crawler is called Googlebot.


Crawling is done by multiple processes on multiple machines, running in parallel
Set of links to be crawled stored in a database
New links found in crawled pages added to this set, to be crawled later
Indexing process also runs on multiple machines
Creates a new copy of index instead of modifying old index
Old index is used to answer queries
After a crawl is completed new index becomes old index
Multiple machines used to answer queries
Indices may be kept in memory
Queries may be routed to different machines for load balancing
Methods:
Basic Crawling
Selective crawling
Focused crawling
Distributed crawling
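The crawl loop described above amounts to a breadth-first traversal of the link graph; in this sketch a hypothetical in-memory graph of URLs stands in for real HTTP fetches:

```python
# Sketch of a crawler's core loop: breadth-first traversal of hyperlinks
# starting from a seed set. A real crawler fetches pages over HTTP and
# runs many such loops in parallel; the URLs below are hypothetical.
from collections import deque

links = {
    "a.com": ["b.com", "c.com"],
    "b.com": ["c.com"],
    "c.com": ["a.com", "d.com"],
    "d.com": [],
}

def crawl(seeds):
    seen, frontier = set(seeds), deque(seeds)
    order = []
    while frontier:
        url = frontier.popleft()
        order.append(url)          # "fetch" the page, hand it to the indexer
        for out in links.get(url, []):
            if out not in seen:    # newly discovered links join the frontier
                seen.add(out)
                frontier.append(out)
    return order

print(crawl(["a.com"]))  # ['a.com', 'b.com', 'c.com', 'd.com']
```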
8

Explain Association rules and its types with an example

Association Rules

Algorithm used
Market-Basket Model, Support, and Confidence
Apriori Algorithm
Sampling Algorithm
Frequent-Pattern Tree Algorithm
Partition Algorithm

Example


Retail shops are often interested in associations between different items that people buy.
Someone who buys bread is quite likely also to buy milk
A person who bought the book Database System Concepts is quite likely also to buy the
book Operating System Concepts.
Associations information can be used in several ways.
E.g. when a customer buys a particular book, an online shop may suggest associated
books.
Association rules:
bread => milk
DB-Concepts, OS-Concepts => Networks
Left hand side: antecedent, right hand side: consequent
An association rule must have an associated population; the population consists of a set of
instances
Support and Confidence
Support count:
The support count of an itemset X, denoted by X.count, in a data set T is the number of
transactions in T that contain X. Assume T has n transactions.
Then,

support(X => Y) = (X U Y).count / n
confidence(X => Y) = (X U Y).count / X.count
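The support and confidence of a rule can be computed directly on a small transaction set (the four transactions below are made up):

```python
# Sketch: computing support and confidence for a rule X => Y
# over a toy transaction set (made-up data).
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def support(itemset):
    # Fraction of transactions containing every item of the itemset.
    count = sum(1 for t in transactions if itemset <= t)
    return count / len(transactions)

def confidence(x, y):
    # confidence(X => Y) = support(X union Y) / support(X)
    return support(x | y) / support(x)

print(support({"bread", "milk"}))       # 0.5 (2 of 4 transactions)
print(confidence({"bread"}, {"milk"}))  # 0.666... (bread in 3, with milk in 2)
```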

Generating Association Rules:


Apriori Algorithm:

Apriori principle:
If an itemset is frequent, then all of its subsets must also be frequent

Apriori principle holds due to the following property of the support measure:

The support of an itemset never exceeds the support of its subsets:
for all X, Y : (X subset of Y) implies s(X) >= s(Y)
This is known as the anti-monotone property of support

Apriori Algorithm
N = number of attributes
attribute_list = all attributes
For i = 1 to N
Frequent_item_set = generate itemsets of i attributes using attribute_list
Frequent_item_set = remove all infrequent itemsets
Rule = Rule U generate rules using Frequent_item_set
attribute_list = only attributes contained in Frequent_item_set
End For
Given a set of transactions T, the goal of association rule mining is to find all rules having
support >= minsup threshold
confidence >= minconf threshold
Brute-force approach:
List all possible association rules
Compute the support and confidence for each rule
Discard rules that fail the minsup and minconf thresholds
Two steps approach
The general algorithm for generating association rules is a two-step process.
Generate all itemsets that have a support exceeding the given threshold. Itemsets with this
property are called large or frequent itemsets.
Generate rules for each itemset as follows:
For itemset X and Y a subset of X, let Z = X - Y;
If support(X)/support(Z) > minimum confidence, the rule Z => Y is a valid rule.
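Step 1 (frequent-itemset generation) can be sketched as a naive level-wise search over a made-up transaction set; this illustrates the Apriori principle rather than an optimized implementation:

```python
# Sketch of level-wise frequent-itemset generation (step 1 above):
# grow candidate itemsets one item at a time, keeping only those whose
# support meets the threshold. By the anti-monotone property, once a
# level is empty no larger itemset can be frequent.
from itertools import combinations

transactions = [{"bread", "milk"}, {"bread", "milk", "eggs"},
                {"bread", "butter"}, {"milk", "eggs"}]
items = sorted(set().union(*transactions))
minsup = 0.5  # minimum support threshold

def sup(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

frequent = []
for k in range(1, len(items) + 1):
    level = [frozenset(c) for c in combinations(items, k)
             if sup(set(c)) >= minsup]
    if not level:
        break  # anti-monotonicity: stop growing
    frequent.extend(level)

print([sorted(s) for s in frequent])
# [['bread'], ['eggs'], ['milk'], ['bread', 'milk'], ['eggs', 'milk']]
```

Step 2 would then derive rules from each frequent itemset and keep those meeting the confidence threshold.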


The Sampling Algorithm:


The sampling algorithm selects samples from the database of transactions that individually fit into
memory. Frequent itemsets are then formed for each sample.
If the frequent itemsets form a superset of the frequent itemsets for the entire database,
then the real frequent itemsets can be obtained by scanning the remainder of the database.
In some rare cases, a second scan of the database is required to find all frequent itemsets .
Frequent-Pattern Tree Algorithm
The Frequent-Pattern Tree Algorithm reduces the total number of candidate itemsets by
producing a compressed version of the database in terms of an FP-tree.
The FP-tree stores relevant information and allows for the efficient discovery of frequent itemsets.
The algorithm consists of two steps:
Step 1 builds the FP-tree.
Step 2 uses the tree to find frequent itemsets.
The Partition Algorithm
Divide the database into non-overlapping subsets.
Treat each subset as a separate database where each subset fits entirely into main memory.
Apply the Apriori algorithm to each partition.
Take the union of all frequent itemsets from each partition.
These itemsets form the global candidate frequent itemsets for the entire database.
Verify the global set of itemsets by having their actual support measured for the entire database.
9

Describe the GRANT functions and explain how it relates to security. What types of privileges may be granted? How
rights could be revoked? (Nov/Dec 2015)
Granting and revoking privileges

The account level:


At this level, the DBA specifies the particular privileges that each account holds
independently of the relations in the database.
The relation level (or table level):
At this level, the DBA can control the privilege to access each individual relation or view
in the database.
Propagation of Privileges using the GRANT OPTION
Whenever the owner A of a relation R grants a privilege on R to another account B, privilege can
be given to B with or without the GRANT OPTION.
If the GRANT OPTION is given, this means that B can also grant that privilege on R to other
accounts.
Suppose that B is given the GRANT OPTION by A and that B then grants the privilege
on R to a third account C, also with GRANT OPTION. In this way, privileges on R can
propagate to other accounts without the knowledge of the owner of R.
If the owner account A now revokes the privilege granted to B, all the privileges that B
propagated based on that privilege should automatically be revoked by the system.
Suppose that the DBA creates four accounts
A1, A2, A3, A4
and wants only A1 to be able to create base relations. Then the DBA must issue the following
GRANT command in SQL
GRANT CREATETAB TO A1;
In SQL2 the same effect can be accomplished by having the DBA issue a CREATE SCHEMA
command as follows:
CREATE SCHEMA EXAMPLE AUTHORIZATION A1;
User account A1 can create tables under the schema called EXAMPLE.
Suppose that A1 creates the two base relations EMPLOYEE and DEPARTMENT
A1 is then owner of these two relations and hence all the relation privileges on each of
them.
Suppose that A1 wants to grant A2 the privilege to insert and delete tuples in both of these
relations, but A1 does not want A2 to be able to propagate these privileges to additional accounts:
GRANT INSERT, DELETE ON EMPLOYEE, DEPARTMENT TO A2;
Suppose that A1 wants to allow A3 to retrieve information from either of the two tables and also
to be able to propagate the SELECT privilege to other accounts.
A1 can issue the command:
GRANT SELECT ON EMPLOYEE, DEPARTMENT TO A3 WITH GRANT
OPTION;
Suppose that A1 decides to revoke the SELECT privilege on the EMPLOYEE relation from A3;
A1 can issue:
REVOKE SELECT ON EMPLOYEE FROM A3;
10

Suppose an Object Oriented database had an object A, which references object B, which in turn references object C.
Assume all objects are on disk initially. Suppose a program first dereferences A, then dereferences B by following
the reference from A, and then finally dereferences C. Show the objects that are represented in memory after each
dereference, along with their state. (Nov/Dec 2015)

Row types can be used:
1. For tuples of relations ("row types").
2. For columns of relations ("column types").
But row types can also be used as column types.
References :
Row types can have references.
If T is a row type, then REF(T) is the type of a reference to a T object.
CREATE ROW TYPE A (
name CHAR(20) UNIQUE,
addr CHAR(20)
);
CREATE ROW TYPE B(
name CHAR(20) UNIQUE,
manf CHAR(20)
);
CREATE ROW TYPE C (
a REF(A),
b REF(B),
price FLOAT
);
Row-type declarations do not create tables.
They are used in place of element lists in CREATE TABLE statements.
Example
CREATE TABLE a1 OF TYPE A
CREATE TABLE b1 OF TYPE B
CREATE TABLE c1 OF TYPE C
Dereferencing:

A -> B = the B attribute of the object referred to by reference A.

Example
Find the B served by Joe:
SELECT b->name
FROM c1
WHERE a->name = 'Joe';
11

Neatly write the K-Mean algorithm and show the intermediate results in clustering the below given points into two
clusters using K-Means algorithm.
P1: (0,0), P2:(1,10), P3:(2,20), P4:(1,15), P5:(1000,2000),P6:(1500,1500),P7:(1000,1250). (Apr/May 2015)

K-Means algorithm
K-Means clustering intends to partition n objects into k clusters in which each object belongs to the
cluster with the nearest mean. This method produces exactly k different clusters of greatest possible
distinction. The best number of clusters k leading to the greatest separation (distance) is not known a
priori and must be computed from the data. The objective of K-Means clustering is to minimize total
intra-cluster variance, i.e., the squared error function
J = sum over clusters j = 1..k of sum over points x in cluster j of ||x - c_j||^2, where c_j is the centroid of cluster j.

Algorithm
1. Clusters the data into k groups where k is predefined.
2. Select k points at random as cluster centers.
3. Assign objects to their closest cluster center according to the Euclidean distance function.
4. Calculate the centroid or mean of all objects in each cluster.
5. Repeat steps 3 and 4 until the same points are assigned to each cluster in consecutive rounds
Problem
Consider the following data set consisting of the scores of two variables on each of seven
individuals:

This data set is to be grouped into two clusters. As a first step in finding a sensible initial partition, let the
A & B values of the two individuals furthest apart (using the Euclidean distance measure), define the
initial cluster means, giving:

The remaining individuals are now examined in sequence and allocated to the cluster to which they are
closest, in terms of Euclidean distance to the cluster mean. The mean vector is recalculated each time a
new member is added. This leads to the following series of steps:


Now the initial partition has changed, and the two clusters at this stage having the following
characteristics:

But we cannot yet be sure that each individual has been assigned to the right cluster. So, we compare each
individual's distance to its own cluster mean and to that of the opposite cluster. And we find:

Only individual 3 is nearer to the mean of the opposite cluster (Cluster 2) than its own (Cluster 1). In other
words, each individual's distance to its own cluster mean should be smaller than the distance to the other
cluster's mean (which is not the case with individual 3). Thus, individual 3 is relocated to Cluster 2
resulting in the new partition:

The iterative relocation would now continue from this new partition until no more
relocation occurs. However, in this example each individual is now nearer its own cluster mean than that
of the other cluster and the iteration stops, choosing the latest partitioning as the final cluster solution.
Also, it is possible that the k-means algorithm won't find a final solution. In this case it would be a good
idea to consider stopping the algorithm after a pre-chosen maximum number of iterations.
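The same computation can be run on the points from the question; this is a direct implementation of the steps above, with the initial centers chosen as the two points furthest apart:

```python
# k-Means (k = 2) on the points from the question. Initial centers are
# the two points furthest apart, as in the worked example above.
import math

points = [(0, 0), (1, 10), (2, 20), (1, 15),
          (1000, 2000), (1500, 1500), (1000, 1250)]

def dist(p, q):
    return math.dist(p, q)

# Initial centers: the pair of points with maximum Euclidean distance.
centers = list(max(((p, q) for p in points for q in points),
                   key=lambda pq: dist(*pq)))

while True:
    # Assignment step: each point goes to its nearest center.
    clusters = [[], []]
    for p in points:
        i = 0 if dist(p, centers[0]) <= dist(p, centers[1]) else 1
        clusters[i].append(p)
    # Update step: recompute each center as the mean of its cluster.
    new = [tuple(sum(c) / len(cl) for c in zip(*cl)) for cl in clusters]
    if new == centers:
        break  # assignments are stable, so the clustering has converged
    centers = new

print(clusters[0])  # [(0, 0), (1, 10), (2, 20), (1, 15)]
print(clusters[1])  # [(1000, 2000), (1500, 1500), (1000, 1250)]
```

The algorithm converges after one relocation round: the four small points form one cluster and the three large points the other.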
12

Discuss about the Access control mechanism and Cryptography methods to secure the Database. (Apr/May 2015)

Access control mechanism


It is all about users' access to the database.
Types
Discretionary Access Control
Mandatory Access Control
Role based Access Control.
Discretionary Access Control
The typical method of enforcing discretionary access control in a database system is based on
the granting and revoking of privileges.
Types of Discretionary Privileges
The account level:
At this level, the DBA specifies the particular privileges that each account holds
independently of the relations in the database.
The privileges at the account level apply to the capabilities provided to the account itself
and can include
The CREATE SCHEMA or CREATE TABLE privilege, to create a schema or base
relation;
The CREATE VIEW privilege;
The ALTER privilege, to apply schema changes such as adding or removing attributes from
relations;
The DROP privilege, to delete relations or views;
The MODIFY privilege, to insert, delete, or update tuples;
The SELECT privilege, to retrieve information from the database by using a SELECT
query.
The relation level (or table level):
At this level, the DBA can control the privilege to access each individual relation or view
in the database.
This includes base relations and virtual (view) relations.
To control the granting and revoking of relation privileges, each relation R in a database is
assigned an owner account, which is typically the account that was used when the relation was
created in the first place.
The owner of a relation is given all privileges on that relation.
Suppose that the DBA creates four accounts
A1, A2, A3, A4
And wants only A1 to be able to create base relations. Then the DBA must issue the following
GRANT command in SQL
GRANT CREATETAB TO A1;
In SQL2 the same effect can be accomplished by having the DBA issue a CREATE SCHEMA
command as follows:
CREATE SCHEMA EXAMPLE AUTHORIZATION A1;
Suppose that A1 wants to allow A3 to retrieve information from either of the two tables and also
to be able to propagate the SELECT privilege to other accounts.
A1 can issue the command:
GRANT SELECT ON EMPLOYEE, DEPARTMENT
TO A3 WITH GRANT OPTION;
Mandatory Access Control and Role-Based Access Control for Multilevel Security
The discretionary access control techniques of granting and revoking privileges on relations have
traditionally been the main security mechanism for relational database systems.
This is an all-or-nothing method:
A user either has or does not have a certain privilege.
In many applications, an additional security policy is needed that classifies data and users based
on security classes.
This approach, known as mandatory access control, would typically be combined with the
discretionary access control mechanisms.
Typical security classes are top secret (TS), secret (S), confidential (C), and unclassified (U),
where TS is the highest level and U the lowest: TS ≥ S ≥ C ≥ U
The commonly used model for multilevel security, known as the Bell-LaPadula model, classifies
each subject (user, account, program) and object (relation, tuple, column, view, operation) into
one of the security classifications, TS, S, C, or U:
We refer to the clearance (classification) of a subject S as class(S) and to the classification of an
object O as class(O).
Two restrictions are enforced on data access based on the subject/object classifications:
Simple security property: A subject S is not allowed read access to an object O unless
class(S) ≥ class(O).
A subject S is not allowed to write an object O unless class(S) ≤ class(O). This is known as
the star property (or * property).
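These two restrictions can be sketched as simple checks. The helper functions and level encoding below are hypothetical, assuming the ordering TS > S > C > U from the model:

```python
# Sketch of the Bell-LaPadula read/write checks (hypothetical helpers).
LEVELS = {"U": 0, "C": 1, "S": 2, "TS": 3}  # unclassified < confidential < secret < top secret

def can_read(subject_class, object_class):
    # Simple security property: no "read up" above the subject's clearance.
    return LEVELS[subject_class] >= LEVELS[object_class]

def can_write(subject_class, object_class):
    # Star property: no "write down" below the subject's clearance.
    return LEVELS[subject_class] <= LEVELS[object_class]
```

A secret-cleared subject can thus read confidential data but may only write at secret level or above, which prevents leaking classified information downward.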
Role-based access control
Its basic notion is that permissions are associated with roles, and users are assigned to appropriate
roles.
Roles can be created using the CREATE ROLE and DESTROY ROLE commands.
The GRANT and REVOKE commands discussed under DAC can then be used to assign
and revoke privileges from roles.
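The role-based idea, permissions attached to roles and users assigned to roles, can be sketched as follows (all role, user, and privilege names here are illustrative, not from the text):

```python
# Toy role-based access control: privileges attach to roles, users to roles.
roles = {"clerk": {"SELECT"}, "manager": {"SELECT", "MODIFY"}}
user_roles = {"alice": {"manager"}, "bob": {"clerk"}}

def has_privilege(user, privilege):
    # A user holds a privilege if any of the user's roles grants it.
    return any(privilege in roles[r] for r in user_roles.get(user, ()))
```

Granting or revoking a privilege on a role immediately affects every user assigned that role, which is what makes RBAC easier to administer than per-user grants.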
Cryptography method to secure the database
Encryption is a means of maintaining secure data in an insecure environment.
Encryption consists of applying an encryption algorithm to data using some prespecified
encryption key.
The resulting data has to be decrypted using a decryption key to recover the original data.
Data Encryption Standard (DES)
DES can provide end-to-end encryption on the channel between the sender A and receiver B.
DES algorithm is a careful and complex combination of two of the fundamental building blocks
of encryption:
Substitution and permutation (transposition).
The DES algorithm derives its strength from repeated application of these two techniques for a
total of 16 cycles.
Plaintext (the original form of the message) is encrypted as blocks of 64 bits.
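To illustrate the two building blocks, here is a toy single round combining substitution and permutation on 4-bit values. This is emphatically not DES (whose 16 rounds use carefully designed S-boxes and operate on 64-bit blocks); the S-box and permutation below are invented for the sketch:

```python
# One toy substitution-permutation round (NOT DES; invented tables for illustration).
SBOX = {i: (i * 7 + 3) % 16 for i in range(16)}  # invertible map on 4-bit values (7 is coprime to 16)
PERM = [2, 0, 3, 1]                              # transposition of the four nibble positions

def round_encrypt(nibbles):
    substituted = [SBOX[n] for n in nibbles]     # substitution step
    return [substituted[p] for p in PERM]        # permutation (transposition) step
```

Repeating such rounds many times with key-dependent tables is the source of a cipher's strength; a single round like this one is trivially breakable.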
Public Key Encryption
Public key algorithms are based on mathematical functions rather than operations on bit patterns.
They also involve the use of two separate keys
In contrast to conventional encryption, which uses only one key.
The use of two keys can have profound consequences in the areas of confidentiality, key
distribution, and authentication.
The two keys used for public key encryption are referred to as the public key and the
private key.
The essential steps are as follows:
Each user generates a pair of keys to be used for the encryption and decryption of
messages.
Each user places one of the two keys in a public register or other accessible file. This is
the public key. The companion key is kept private (private key).
If a sender wishes to send a private message to a receiver, the sender encrypts the
message using the receiver's public key.
When the receiver receives the message, he or she decrypts it using the receiver's private key.
No other recipient can decrypt the message because only the receiver knows his or her private key.
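These steps can be illustrated with a small RSA-style example using tiny textbook primes. The values are insecure toys for illustration only; real keys are thousands of bits:

```python
# Tiny RSA-style public key example (insecure toy primes; illustration only).
p, q = 61, 53
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)      # Euler's totient of n
e = 17                       # public exponent, coprime with phi
d = pow(e, -1, phi)          # private exponent (modular inverse; Python 3.8+)

message = 65
ciphertext = pow(message, e, n)    # sender encrypts with the receiver's PUBLIC key
plaintext = pow(ciphertext, d, n)  # receiver decrypts with the PRIVATE key
```

Decryption recovers the original message because exponentiation by e and then by d modulo n is an identity for messages smaller than n; only the holder of d can reverse the encryption.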
Digital Signatures:
A digital signature is an example of using encryption techniques to provide authentication
services in e-commerce applications.
A digital signature is a means of associating a mark unique to an individual with a body of text.
The mark should be unforgeable, and others should be able to check that the
signature does come from the originator.
A digital signature consists of a string of symbols.
Signature must be different for each use.
This can be achieved by making each digital signature a function of the message that it is signing,
together with a time stamp.
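A minimal sketch of this idea signs a hash of the message together with a timestamp, using a toy RSA key pair. This is a hypothetical illustration: real schemes use standardized padding and full-size keys, and the primes below are insecure toys:

```python
# Toy digital signature: sign a hash of (message, timestamp) with a private key.
import hashlib

p, q, e = 61, 53, 17                 # insecure toy parameters, illustration only
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (Python 3.8+)

def sign(message, timestamp):
    digest = int(hashlib.sha256(f"{message}|{timestamp}".encode()).hexdigest(), 16) % n
    return pow(digest, d, n)         # signer uses the PRIVATE key

def verify(message, timestamp, signature):
    digest = int(hashlib.sha256(f"{message}|{timestamp}".encode()).hexdigest(), 16) % n
    return pow(signature, e, n) == digest  # anyone can check with the PUBLIC key
```

Because the digest depends on both the message text and the timestamp, the signature differs for each use, as required above.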
Public key techniques are the means of creating digital signatures.
13


Explain types of database security and database security issues.(May/June2016)


Database security issues
Types of Security
Legal and ethical issues
Policy issues
System-related issues
The need to identify multiple security levels
Threats to databases
Loss of integrity
Loss of availability
Loss of confidentiality
To protect databases against these types of threats, four kinds of countermeasures can be implemented:
Access control
The security mechanism of a DBMS must include provisions for restricting access to the
database as a whole
This function is called access control and is handled by creating user accounts and
passwords to control login process by the DBMS.
Inference control
A security problem specific to databases is that of controlling access to a
statistical database, which is used to provide statistical information or summaries of
values based on various criteria.
The countermeasures to the statistical database security problem are called inference control
measures.
Flow control

Another security issue is flow control, which prevents information from flowing in such a
way that it reaches unauthorized users.

Encryption
A final security issue is data encryption, which is used to protect sensitive data (such as credit card
numbers) that is being transmitted via some type of communication network.
The data is encoded using some encoding algorithm.
Types of database security
A DBMS typically includes a database security and authorization subsystem that is responsible for ensuring
the security of portions of a database against unauthorized access.
Types of database security mechanisms:

Discretionary Access Control


Mandatory Access Control
Role based Access Control.
Discretionary Access Control
The typical method of enforcing discretionary access control in a database system is based on
the granting and revoking of privileges.
Types of Discretionary Privileges
The account level:
At this level, the DBA specifies the particular privileges that each account holds
independently of the relations in the database.
The privileges at the account level apply to the capabilities provided to the account itself
and can include
The CREATE SCHEMA or CREATE TABLE privilege, to create a schema or base
relation;
The CREATE VIEW privilege;
The ALTER privilege, to apply schema changes such as adding or removing attributes from
relations;
The DROP privilege, to delete relations or views;
The MODIFY privilege, to insert, delete, or update tuples;
The SELECT privilege, to retrieve information from the database by using a SELECT
query.
The relation level (or table level):
At this level, the DBA can control the privilege to access each individual relation or view
in the database.
This includes base relations and virtual (view) relations.
To control the granting and revoking of relation privileges, each relation R in a database is
assigned an owner account, which is typically the account that was used when the relation was
created in the first place.
The owner of a relation is given all privileges on that relation.
Suppose that the DBA creates four accounts
A1, A2, A3, A4
And wants only A1 to be able to create base relations. Then the DBA must issue the following
GRANT command in SQL
GRANT CREATETAB TO A1;
In SQL2 the same effect can be accomplished by having the DBA issue a CREATE SCHEMA
command as follows:
CREATE SCHEMA EXAMPLE AUTHORIZATION A1;
Suppose that A1 wants to allow A3 to retrieve information from either of the two tables and also
to be able to propagate the SELECT privilege to other accounts.
A1 can issue the command:
GRANT SELECT ON EMPLOYEE, DEPARTMENT
TO A3 WITH GRANT OPTION;
Mandatory Access Control and Role-Based Access Control for Multilevel Security
The discretionary access control techniques of granting and revoking privileges on relations have
traditionally been the main security mechanism for relational database systems.
This is an all-or-nothing method:
A user either has or does not have a certain privilege.
In many applications, an additional security policy is needed that classifies data and users based
on security classes.
This approach, known as mandatory access control, would typically be combined with the
discretionary access control mechanisms.
Typical security classes are top secret (TS), secret (S), confidential (C), and unclassified (U),
where TS is the highest level and U the lowest: TS ≥ S ≥ C ≥ U
The commonly used model for multilevel security, known as the Bell-LaPadula model, classifies
each subject (user, account, program) and object (relation, tuple, column, view, operation) into
one of the security classifications, TS, S, C, or U:
We refer to the clearance (classification) of a subject S as class(S) and to the classification of an
object O as class(O).
Two restrictions are enforced on data access based on the subject/object classifications:
Simple security property: A subject S is not allowed read access to an object O unless
class(S) ≥ class(O).
A subject S is not allowed to write an object O unless class(S) ≤ class(O). This is known as
the star property (or * property).
Role-based access control
Its basic notion is that permissions are associated with roles, and users are assigned to appropriate
roles.
Roles can be created using the CREATE ROLE and DESTROY ROLE commands.
The GRANT and REVOKE commands discussed under DAC can then be used to assign
and revoke privileges from roles.

