Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Department of IT/CSE
2016-2017
UNIT I
INTRODUCTION TO DBMS
File Systems Organization - Sequential, Pointer, Indexed, Direct - Purpose of Database System-Database System
Terminologies-Database characteristics- Data models Types of data models Components of DBMS- Relational Algebra.
LOGICAL DATABASE DESIGN: Relational DBMS -Codd's Rule - Entity-Relationship model - Extended ER
Normalization Functional Dependencies, Anomaly- 1NF to 5NF- Domain Key Normal Form Denormalization
UNIT-I / PART-A
1
What is first normal form?
The domain of attribute must include only atomic (simple, indivisible) values.
2
What is 2NF?
Relation schema R is in 2NF if it is in 1NF and every non-prime attribute An in R is fully functionally dependent on
primary key.
3
Define Functional dependency. (Apr/May 2015)
In a given relation R, X and Y are attributes. Attribute Y is functionally dependent on attribute X if each value
of X determines EXACTLY ONE value of Y, which is represented as X Y (X can be composite in nature).
We say here x determines y or y is functionally dependent on x Empid Ename
4
What is a data model? List the types of data models used?
A data model is a collection of conceptual tools for describing data, data relationships, data semantics and
consistency constraints.
5
Define full functional dependency.
The removal of any attribute A from X means that the dependency does not hold any more.
6
Explain about partial functional dependency?
X and Y are attributes. Attribute Y is partially dependent on the attribute X only if it is dependent on a sub-set of
attribute X.
7
What you meant by transitive functional dependency?
Transitive dependency is a functional dependency which holds by virtue of transitivity. A transitive dependency can
occur only in a relation that has three or more attributes. Let A, B, and C designates three distinct attributes (or
distinct collections of attributes) in the relation.
Suppose all three of the following conditions hold:
1.A B
2. It is not the case that BA
3. BC
Then the functional dependency AC (Which follows from 1 and 3 by the axiom of transitivity) is a transitive
dependency.
8
What is meant by domain key normal form?
Domain/key normal form (DKNF) is a normal form used in database normalization which requires that the database
contains no constraints other than domain constraints and key constraints.
9
Define database management system?
Database management system (DBMS) is a collection of interrelated data and a set of programs to access those data.
10 List any five applications of DBMS.
Banking, Airlines, Universities, Credit card transactions, Tele communication, Finance, Sales, Manufacturing,
Human resources.
11 What is the purpose of Database Management System? (Nov/Dec 2014)
Data redundancy and inconsistency , Difficulty in accessing data, Data isolation, Integrity problems, Atomicity
problems and Concurrent access anomalies
12 Define instance and schema?
Instance: Collection of data stored in the data base at a particular moment is called an Instance of the database
Schema: The overall design of the data base is called the data base schema.
13 Explain entity relationship model?(May/June 2016)
The entity relationship model is a collection of basic objects called entities and relationship among those objects. An
entity is a thing or object in the real world that is distinguishable from other objects.
14 What is relationship? Give examples
A relationship is an association among several entities. Example: A depositor relationship associates a customer with
each account that he/she has.
15 What are stored and derived attributes?
Stored attributes: The attributes stored in a data base are called stored attributes.
Derived attributes: The attributes that are derived from the stored attributes are called derived attributes
Page 1 of 97
17
18
19
20
21
22
23
24
25
26
27
28
29
Department of IT/CSE
2016-2017
Page 2 of 97
30
31
32
33
34
35
36
37
Department of IT/CSE
2016-2017
Page 3 of 97
Department of IT/CSE
2016-2017
Security Problems,
Difficulty in accessing data,
Data isolation.
UNIT-I / PART-B
Explain ER model by taking Hospital management /University Database Management as case study
ENTITY-RELATIONSHIP(ER) MODELING:
ER modeling: A graphical technique for understanding and organizing the data independent of
the actual database implementation
Entity: Any thing that may have an independent existence and about which we intend to collect
data.Also known as Entity type.
Entity instance: a particular member of the entity type e.g. a particular student
Attributes: Properties/characteristics that describe entities
Relationships: Associations between entities
Attributes
The set of possible values for an attribute is called the domainof the attribute
Example:
The domain of attribute marital status is just the four values: single, married, divorced,
widowed
The domain of the attribute month is the twelve values ranging from January to
December
Key attribute: The attribute (or combination of attributes) that is unique for every entity instance
E.g the account number of an account, the employee id of an employee etc.
If the key consists of two or more attributes in combination, it is called a composite key
Simple Vs composite attribute
Simple attribute: cannot be divided into simpler components
E.g age of an employee
Composite attribute: can be split into components
E.g Date of joining of the employee.
Can be split into day, month and year
Single Vs Multi-valued Attributes
St. Josephs College of Engineering & St. Josephs Institute of Technology
Page 4 of 97
Page 5 of 97
Department of IT/CSE
2016-2017
One instance of entity type Person is related to one instance of the entity type Chair.
Cardinality One to- Many
One instance of entity type Organization is related to multiple instances of entity type Employee.
St. Josephs College of Engineering & St. Josephs Institute of Technology
Page 6 of 97
Department of IT/CSE
2016-2017
Multiple instances of one Entity are related to multiple instances of another Entity.
Relationship Participation
Total: Every entity instance must be connected through the relationship to another instance of
the other participating entity types
Partial: All instances need not participate
E.g.: Employee Head-of Department
Employee: partial
Department: total
All employees will not be head-of some department. So only few instances of employee entity
participate in the above relationship. But each department will be headed by some employee. So
department entitys participation is total and employee entitys participation is partial in the above
relationship.
3
Heap File Organization: When a file is created using Heap File Organization mechanism, the
Operating Systems allocates memory area to that file without any further accounting details. File
records can be placed anywhere in that memory area. It is the responsibility of software to manage the
records. Heap File does not support any ordering, sequencing or indexing on its own.
Sequential File Organization: Every file record contains a data field (attribute) to uniquely identify
St. Josephs College of Engineering & St. Josephs Institute of Technology
Page 7 of 97
Database Users
Users are differentiated by the way they expect to interact with the system
Application programmers
Sophisticated users
Nave users
Database Administrator
Specialized users etc,.
Application programmers:
Professionals who write application programs and using these application programs they
interact with the database system
Sophisticated users :
These user interact with the database system without writing programs, But they submit
queries to retrieve the information
Specialized users:
Who write specialized database applications to interact with the database system.
Nave users:
Interacts with the database system by invoking some application programs that have been
written previously by application programmers
Eg : people accessing database over the web
Page 8 of 97
Department of IT/CSE
2016-2017
Database Administrator:
Coordinates all the activities of the database system; the database administrator has a good
understanding of the enterprises information resources and needs.
Schema definition
Access method definition
Schema and physical organization modification
Granting user authority to access the database
Monitoring performance
Storage Manager
Page 9 of 97
List out the disadvantages of File system over DB & explain it in detail.
In the early days, File-Processing system is used to store records. It uses various files for storing
the records.
Drawbacks of using file systems to store data:
Data redundancy and inconsistency
Multiple file formats, duplication of information in different files
Difficulty in accessing data
Need to write a new program to carry out each new task
Data isolation multiple files and formats
Integrity problems
Hard to add new constraints or change existing ones
Atomicity problem
Failures may leave database in an inconsistent state with partial updates carried
out
E.g. transfer of funds from one account to another should either complete or not
happen at all
Concurrent access anomalies
Concurrent accessed needed for performance
Security problems
Database systems offer solutions to all the above problems
Discuss about (i) Data Models (ii) Mapping cardinalities. (Nov/Dec 2014)
Page 10 of 97
2016-2017
4) Entity-Relationship model
The entity-relationship (ER) data model allows us to describe the data involved in a real-world
enterprise in terms of objects and their relationships. It is widely used to develop an initial
database design.
(ii) Mapping cardinalities
One to one(1:1)
An entity in A is associated with at most one entity in B and an entity in B is
associated with at most one entity in A.
One to many(1:M)
An entity in A is associated with any number of entities in B and an entity in B is
associated with at most one entity in A.
Many to one(M:1)
An entity in A is associated with at most one entity in B and an entity in B is
associated with any number of entities in A.
Many to many(M:N)
An entity in A is associated with any number of entities in B and an entity in B is
associated with any number of entities in A.
List out the operations of the relational algebra and explain with suitable examples.
Relational Algebra:
Procedural language
The Operations can be classified as
Basic Operations
Additional Operations
Extended Operations
Basic Operations
select
project
union
set difference
Cartesian product
rename
Select Operation
Page 11 of 97
Department of IT/CSE
2016-2017
It is used to find tuples that are in one relation but are not in another relation.
Notation r s
Defined as:
r s = {t | t r and t s}
Set differences must be taken between compatible relations.
r and s must have the same arity
Cartesian-Product Operation
It is used to combine information from two relations.
Notation r x s
Defined as:
r x s = {t q | t r and q s}
Rename Operation
Allows us to name, and therefore to refer to, the results of relational-algebra expressions.
Allows us to refer to a relation by more than one name.
Example:
x (E)
returns the expression E under the name X
8
Functional dependency
In a given relation R, X and Y are attributes. Attribute Y is functionally
dependent on attribute X if each value of X determines EXACTLY ONE value
of Y, which is represented as X -> Y (X can be composite in nature).
We say here x determines y or y is functionally dependent on x XY does
not imply YX
If the value of an attribute Marks is known then the value of an attribute
Grade is determined since Marks Grade
Full functional dependency
X and Y are attributes X Functionally determines Y Note: Subset of X should not
functionally determine Y
Marks is fully functionally dependent on STUDENT# COURSE# and not on sub
set of STUDENT# COURSE#. This means Marks can not be determined either
by STUDENT# OR COURSE# alone. It can be determined only using
STUDENT# AND COURSE# together. Hence Marks is fully functionally
St. Josephs College of Engineering & St. Josephs Institute of Technology
Page 12 of 97
2016-2017
Define anomalies and explain 1st normal form & 2nd normal form with example. (Nov/Dec 2014)
Discuss third normal form & BCNF with example. (Nov/Dec 2014)
Third normal form: (3 NF) A relation R is said to be in the Third Normal Form (3NF) if and only if
It is in 2NF and
No transitive dependency exists between non-key attributes and key attributes.
STUDENT# and COURSE# are the key attributes.
All other attributes, except grade are non-partially, non-transitively dependent on key attributes.
Student#, Course# - > Marks
Marks -> Grade
Note : - All transitive dependencies are eliminated
Boyce-Codd Normal form BCNF
A relation is said to be in Boyce Codd Normal Form (BCNF)
- if and only if all the determinants are candidate keys.BCNF relation is a strong 3NF, but not every
3NF relation is BCNF.
Page 13 of 97
Department of IT/CSE
2016-2017
Explain the 4NF and 5NF in detail with example. (Nov/Dec 2014)
Fourth NF:
Fourth normal form (4NF) is a normal form used in database normalization. Introduced by Ronald
Fagin in 1977, 4NF is the next level of normalization after BoyceCodd normal form (BCNF). Whereas
the second, third, and BoyceCodd normal forms are concerned with functional dependencies, 4NF is
concerned with a more general type of dependency known as a multivalued dependency. A table is in 4NF
if and only if, for every one of its non-trivial multivalued dependencies X
Y, X is a super key that is, X
is either a candidate key or a superset thereof.
Multivalued dependencies
If the column headings in a relational database table are divided into three disjoint groupings X, Y, and Z,
then, in the context of a particular row, we can refer to the data beneath each group of headings as x, y,
and z respectively. A multivalued dependency X
Y signifies that if we choose any x actually occurring
in the table (call this choice xc), and compile a list of all the xcyz combinations that occur in the table, we
will find that xc is associated with the same y entries regardless of z.A trivial multivalued dependency X
Y is one where either Y is a subset of X, or X and Y together form the whole set of attributes of the
relation.A functional dependency is a special case of multivalued dependency. In a functional dependency
X Y, every x determines exactly one y, never more than one.
Page 14 of 97
Department of IT/CSE
2016-2017
Fifth normal form (5NF), also known as Project-join normal form (PJ/NF) is a level of database
normalization designed to reduce redundancy in relational databases recording multi-valued facts by
isolating semantically related multiple relationships. A table is said to be in the 5NF if and only if every
join dependency in it is implied by the candidate keys.
A join dependency *{A, B, Z} on R is implied by the candidate key(s) of R if and only if each of A, B,
, Z is a super key for R
12
i)
With the help of a neat block diagram explain the basic architecture of a database management system .
(May/June 2016)
Database Users
Users are differentiated by the way they expect to interact with the system
Application programmers
Sophisticated users
St. Josephs College of Engineering & St. Josephs Institute of Technology
Page 15 of 97
Database Administrator:
Coordinates all the activities of the database system; the database administrator has a good
understanding of the enterprises information resources and needs.
Schema definition
Access method definition
Schema and physical organization modification
Granting user authority to access the database
Monitoring performance
Storage Manager
Page 16 of 97
Data integrity is maximised and data redundancy is minimised, as the single storing place of all
the data also implies that a given set of data only has one primary record. This aids in the
maintaining of data as accurate and as consistent as possible and enhances data reliability.
Generally bigger data security, as the single data storage location implies only a one possible
place from which the database can be attacked and sets of data can be stolen or tampered with.
Better data preservation than other types of databases due to often-included fault-tolerant setup.
Easier for using by the end-user due to the simplicity of having a single database design.
Generally easier data portability and database administration.
More cost effective than other types of database systems as labour, power supply and maintenance
costs are all minimised.
Data kept in the same location is easier to be changed, re-organised, mirrored, or analysed.
All the information can be accessed at the same time from the same location.
Updates to any given set of data are immediately received by every end-user.
Example:
Think about a banking application which uses centralized database. In this case the data is
stored in a common place. Applications running in various banks may communicate to the
common database over network to access or insert/update/delete information.
Page 17 of 97
13
Department of IT/CSE
2016-2017
A car rental company maintains a database for all vehicles in its current fleet. For all vehicles, it includes the vehicle
identification number, license number, manufacturer, model, date of purchase, and color. Special data are included
for certain types of vehicles.
Trucks: cargo capacity.
Sports cars: horsepower, renter age requirement.
Vans: number of passengers.
Off-road vehicles: ground clearance, drivetrain (four- or two-wheel drive).
Construct an ER model for the car rental company database. (Nov/Dec 2015)
Page 18 of 97
Department of IT/CSE
2016-2017
14
Draw E R Diagram for the Restaurant Menu Ordering System, which will facilitate the food items ordering and
services within a restaurant. The entire restaurant scenario is detailed as follows. The Customer is able to view the
food items menu, call the waiter, place orders and obtain the final bill through the computer kept in their table. The
waiters through their wireless tablet PC are able to initialize a table for customers, control the table functions to
assist customers, orders, send orders to food preparation staff (chef) and finalize the customers bill. The food
preparation staffs (Chefs), with their touch-display interface to the system, are able to view orders sent to the kitchen
by waiters. During preparation, they are able to let the waiter know the status of each item, and can send notification
when items are completed. The system should have full accountability and logging facilities, and should support
supervisor actions to account for exceptional circumstances, such as a meal being refunded or walked out on.
(Apr/May 2015)
15
State the need for Normalization of a Database and Explain the various Normal Forms (1 st, 2nd, 3rd, BCNF, 4th, 5th and
Domain- Key) with suitable examples. (Apr/May 2015)
Page 19 of 97
Page 20 of 97
Department of IT/CSE
2016-2017
Page 21 of 97
16
Department of IT/CSE
2016-2017
Page 22 of 97
10
11
12
13
14
15
16
17
18
Department of IT/CSE
2016-2017
1) Char (n) ,2) varchar (n), 3) int , 4) numeric (p,d) ,5) float(n) , 6) date.
What is the use of integrity constraints?
Integrity constraints ensure that changes made to the database by authorized users do not result in a loss of data
consistency. Thus integrity constraints guard against accidental damage to the database
Mention the 2 forms of integrity constraints in ER model?
Key declarations, Form of a relationship
List some security violations (or) name any forms of malicious access.
1) Unauthorized reading of data
2) Unauthorized modification of data
3) Unauthorized destruction of data.
What is called query processing?
Query processing refers to the range of activities involved in extracting data from a database.
What is called a query evaluation plan?
A sequence of primitive operations that can be used to evaluate be query is a query evaluation plan or a query
execution plan.
What is called as an N-way merge?
The merge operation is a generalization of the two-way merge used by the standard in-memory sort-merge
algorithm. It merges N runs, so it is called an N-way merge.
What is a primary key?
Primary key is a set of one or more attributes that can uniquely identify record from the relation; it will not accept
null values and redundant values. A relation can have only one primary key.
What is a super key?
A super key is a set of one or more attributes that collectively allows us to identify uniquely an entity in the entity
set.
What is foreign key?
A relation schema r1 derived from an ER schema may include among its attributes the primary key of another
relation schema r2.this attribute is called a foreign key from r1 referencing r2.
What is the difference between unique and primary key?
Page 23 of 97
20
21
22
23
24
25
26
27
28
Department of IT/CSE
2016-2017
Unique and primary key are keys which are used to uniquely identify record from the relation. But unique key
accepts null values; primary key does not accept null values.
What is the difference between char and varchar2 data type?
Char and varchar2 are data types which are used to store character values.
Char is static memory allocation; varchar2 is dynamic memory allocation.
How to add primary key to a table with suitable query?
Alter table <table name> add primary key(column);
Explain Query optimization?(May/June 2016)
Query optimization refers to the process of finding the lowest cost method of evaluating a given query.
Define Candidate key
A Candidate key is a set of one or more attributes that can uniquely identify a row in a given table. A relation can
have more than one candidate keys.
How to create composite primary key with suitable query?
Syntax: Create table <table name>(Var1 datatype, Var2 datatype, primary key(Var1,Var2));
Example:
Create table course(course_id int, student_id int, primary key(course_id, student_id));
Why does SQL allow duplicate tuples in a table or in a query result? (Nov/Dec 2015)
If key constraint is not set on a relation every result in a relation will be considered as a tuple and hence SQL allows
duplicate tuples in a table. Distinct keyword is used to avoid duplicate tuples in the result.
State the need for Query Optimization (Apr/May 2015)
The query optimizer attempts to determine the most efficient way to execute a given query by considering the
possible query plans.
Differentiate static and dynamic SQL. (Nov/Dec 2014) (Apr/May 2015) (Nov/Dec 2015)
Static SQL
Dynamic SQL
The SQL statements do not change each time the
The program must dynamically allocate memory to
program is run is called Static SQL.
receive query result
Static SQL is compiled and optimized prior to its
Dynamic SQL is compiled and optimized during
execution
execution
In Static SQL, the statement is prepared before the
Dynamic SQL components of SQL allows programs
program is executed and the operational form of the to construct and submit SQL Queries at runtime
statement persists beyond the execution of the
program.
Define: DDL, DML, DCL and TCL. (Nov/Dec 2014) (Apr/May 2015)
DDL COMMANDS:
Create
Alter
Add
Modify
Drop
Rename
Drop
DML Commands:
Insert
Select
Update
Delete
DCL commands
Grant - Provide access privilege to user
Revoke - Get back access privilege from user
TCL commands
Commit
Rollback
Save point
What is embedded SQL? What are its advantages?
The SQL standard defines embedded of SQL in a variety of programming languages such as C, Java, and Cobol. A
Page 24 of 97
Department of IT/CSE
2016-2017
language to which SQL queries are embedded is referred to as a host language, and the SQL structures permitted in
the host language comprise embedded SQL.
The basic form of these languages follows that of the System R embedding of SQL into PL/I.
EXEC SQL statement is used to identify embedded SQL request to the preprocessor
EXEC SQL <embedded SQL statement > END_EXEC
UNIT-II / PART-B
Explain about data definition language in SQL with examples. (Nov/Dec 2014)(May/June 2016)
Create
Alter
Add
Modify
Drop
Rename
Drop
SYNTAX:
CREATE:
Create table <table name> (column name1 datatype1 constraints, column2 datatype2 . . .);
ALTER:
ADD:
Alter table <table name> add(column name1 datatype1);
MODIFY:
Alter table <table name> modify(column name1 datatype1);
DROP:
Alter table <table name> drop (column name);
RENAME:
Rename <old table name> to <new table name>;
DROP:
Drop table <table name>;
2
Consider a student registration database comprising of the below given table schema.
Student File
Student Number
Student Name
Address
Course File
Course Number
Description
Hours
Telephone
Professor Number
Professor File
Professor Number
Name
Office
Registration file
Student Number
Course Number
Date
Consider a suitable sample of tuples/records for the above mentioned tables and write DML statements (SQL) to
answer for the queries listed below.
1. Which courses does a specific professor teach?
2. What courses does specific professors?
3. Who teaches a specific course and where is his/her office?
4. For a specific student number, in which courses is the student registered and what is his/her name?
5. Who are the professors for a specific student?
6. Who is the student registered in a specific course? (Apr/May 2015)
Page 25 of 97
Explain about data manipulation language in SQL with examples. (Nov/Dec 2014)
Insert
Select
Update
Delete
SYNTAX:
INSERT:
Single level:
Insert into <table name> values (attributes1, attributes2);
Multilevel:
Insert into <table name> values (&attributes1,&attributes2.);
SELECT:
Single level:
Select <column name> from <table name>;
Multilevel:
Select * from <table name> where <condition>;
UPDATE:
Single level:
Update <table name> set <column name>=values where <condition>;
Multilevel:
Update <table name> set <column name>=values;
DELETE:
Single level:
Delete from <table name> where <column name>=values;
Multilevel:
Delete from <table name>;
4
Page 26 of 97
2016-2017
GRANT privilege_name
ON object_name
TO {user_name |PUBLIC |role_name}
[WITH GRANT OPTION];
privilege_name is the access right or privilege granted to the user. Some of the access rights
are ALL, EXECUTE, and SELECT.
object_name is the name of an database object like TABLE, VIEW, STORED PROC and
SEQUENCE.
user_name is the name of the user to whom an access right is being granted.
user_name is the name of the user to whom an access right is being granted.
PUBLIC is used to grant access rights to all users.
ROLES are a set of privileges grouped together.
WITH GRANT OPTION - allows a user to grant access rights to other users.
REVOKE
Used to revoke privileges from the user
REVOKE privilege_name
ON object_name
FROM {user_name |PUBLIC |role_name}
1) System privileges - This allows the user to CREATE, ALTER, or DROP database objects.
2) Object privileges - This allows the user to EXECUTE, SELECT, INSERT, UPDATE, or DELETE data
from database objects to which the privileges apply.
5
Page 27 of 97
2016-2017
Constraints
Constraints within a database are rules which control values allowed in columns.
Integrity
Integrity refers to requirement that information be protected from improper modification.
Integrity constraints
Integrity constraints provide a way of ensuring that changes made to the database by
authorized users do not result in a loss of data consistency
Constraints
- Primary key
- Referential integerity
- Check constraint
- Unique Constraint
- Not Null/Null
Primary key
The primary key of a relational table uniquely identifies each record in the table
-Unique
-Not null
Referential integrity
We ensure that a value appears in one relation for a given set of attribute also appears for
a certain set of attribute in another relation.
eg : branch_name varchar2(10) references branch(branch_name)
The foreign key identifies a column or set of columns in one (referencing) table that refers to a
column or set of columns in another (referenced) table
Check contraint
Eg :
Create table branch(
Balance number check (balance >500));
7
Design an employee detail relation and explain referential integrity using SQL queries.
Page 28 of 97
Department of IT/CSE
2016-2017
Dept table
Create table dept(dept_id varchar2(20) primary key, dept_name varchar2(20))
Employee table
Create table employee(emp_id varchar2(20) primary key,emp_name varchar2(20),phone_no
number,dept_id varchar2(20) references emp(dept_id));
8
Query Optimization:
Among all equivalent evaluation plans choose the one with lowest cost.
Cost is estimated using statistical information from the database catalog
e.g. number of tuples in each relation, size of tuples, etc.
Embedded SQL
-The SQL standards defines embedding of SQL in variety of programming language such as
VB,C,C++,Java.
-A language to which SQL queries are embedded is referred to as a host language and the SQL
structures permitted in the language comprise embedded SQL.
Two reasons
Not all queries expressed in SQL,since SQL does not provide the full expressive power of
a general purpose language.
Nondeclarative actions- printing a report,intracting with user
Dynamic SQL
The dynamic SQL components of SQL allows programs to construct and submit SQL Queries at
runtime.
In contrast the embedded SQL statement must be completely present at compile time they are
compiled by the embedded SQL pre processor
Using dynamic SQL the programs can create SQL queries at runtime and can either have them
executed immediately or have them prepared for subsequent use.
10
Explain about different data types in data base language with an example.
User defined types useful in situations where the same data type is used in several columns from
different tables.
The names of the user-defined types provides the extra information.
St. Josephs College of Engineering & St. Josephs Institute of Technology
Page 29 of 97
Explain in detail about Heuristics and Cost estimates in Query Optimization (Nov/Dec 2014).
Describe the six clauses in the syntax of an SQL query, and show what type of constructs can be specified in each of
the six clauses. Which of the six clauses are required and which are optional? (Nov/Dec 2015)
WHERE Clause
The WHERE clause defines the criteria to limit the records affected by SELECT, UPDATE, and DELETE
statements.
Syntax:
SELECT column1, column2, columnN
FROM table_name
WHERE [condition]
Eg:
SQL> SELECT ID, NAME, SALARY
FROM CUSTOMERS
WHERE SALARY > 2000;
Having clause
Specifies a search condition for a group or an aggregate. HAVING can be used only with the SELECT statement.
HAVING is typically used in a GROUP BY clause.
Page 30 of 97
Department of IT/CSE
2016-2017
Syntax:
SELECT column1, column2
FROM table1, table2
WHERE [ conditions ]
GROUP BY column1, column2
HAVING [ conditions ];
Eg:
SQL > SELECT ID, NAME, AGE, ADDRESS, SALARY
FROM CUSTOMERS
GROUP BY age
HAVING COUNT(age) >= 2;
Group by clause
Group by clause is used to group the results of a SELECT query based on one or more columns.
Syntax:
SELECT column1, column2
FROM table_name
WHERE [ conditions ]
GROUP BY column1, column2
Eg:
SELECT NAME, SUM(SALARY) FROM CUSTOMERS GROUP BY NAME;
Order by
The SQL ORDER BY clause is used to sort the data in ascending or descending order, based on one or more
columns. Some database sorts query results in ascending order by default.
Syntax:
The basic syntax of ORDER BY clause is as follows:
SELECT column-list
FROM table_name
[WHERE condition]
[ORDER BY column1, column2, .. columnN] [ASC | DESC];
Eg:
SELECT * FROM CUSTOMER ORDER BY NAME DESC;
Into clause
The SELECT INTO statement copies data from one table and inserts it into a new table.
Syntax:
SELECT column_name(s)
INTO newtable [IN externaldb]
FROM table1;
Eg:
SELECT CustomerName, ContactName
INTO Customers1
FROM Customers;
From clause
The SQL FROM clause is used to list the tables and any joins required for the SQL statement.
Syntax
FROM table1
[ { INNER JOIN
St. Josephs College of Engineering & St. Josephs Institute of Technology
Page 31 of 97
Department of IT/CSE
2016-2017
Eg:
SELECT products.product_name, categories.category_name
FROM products
INNER JOIN categories
ON products.category_id = categories.category_id;
13
Discuss about join order optimization and heuristic optimization algorithm. (Apr/May 2015)
Page 33 of 97
14
Department of IT/CSE
2016-2017
Create
Alter
Add
Modify
Drop
Rename
Drop
SYNTAX:
CREATE:
Create table <table name> (column name1 datatype1 constraints, column2 datatype2 . . .);
ALTER:
ADD:
Alter table <table name> add(column name1 datatype1);
MODIFY:
Alter table <table name> modify(column name1 datatype1);
DROP:
Alter table <table name> drop (column name);
RENAME:
Rename <old table name> to <new table name>;
DROP:
Drop table <table name>;
DATA MANIPULATION LANGUAGE
DML Commands:
Insert
St. Josephs College of Engineering & St. Josephs Institute of Technology
Page 34 of 97
Department of IT/CSE
2016-2017
SYNTAX:
INSERT:
Single level:
Insert into <table name> values (attributes1, attributes2);
Multilevel:
Insert into <table name> values (&attributes1,&attributes2.);
SELECT:
Single level:
Select <column name> from <table name>;
Multilevel:
Select * from <table name> where <condition>;
UPDATE:
Single level:
Update <table name> set <column name>=values where <condition>;
Multilevel:
Update <table name> set <column name>=values;
DELETE:
Single level:
Delete from <table name> where <column name>=values;
Multilevel:
Delete from <table name>;
DCL (Data Control Language)
GRANT
Used to grant privileges to the user
GRANT privilege_name
ON object_name
TO {user_name |PUBLIC |role_name}
[WITH GRANT OPTION];
privilege_name is the access right or privilege granted to the user. Some of the access rights
are ALL, EXECUTE, and SELECT.
object_name is the name of an database object like TABLE, VIEW, STORED PROC and
SEQUENCE.
user_name is the name of the user to whom an access right is being granted.
user_name is the name of the user to whom an access right is being granted.
PUBLIC is used to grant access rights to all users.
ROLES are a set of privileges grouped together.
WITH GRANT OPTION - allows a user to grant access rights to other users.
REVOKE
Used to revoke privileges from the user
REVOKE privilege_name
ON object_name
FROM {user_name |PUBLIC |role_name}
1) System privileges - This allows the user to CREATE, ALTER, or DROP database objects.
2) Object privileges - This allows the user to EXECUTE, SELECT, INSERT, UPDATE, or DELETE data
St. Josephs College of Engineering & St. Josephs Institute of Technology
Page 35 of 97
2016-2017
Page 36 of 97
Department of IT/CSE
2016-2017
Select degcode from candidate minus select degcode from degree order by degcode;
16
(ii) Write a SELECT statement to display the name of all the candidates who have got less than 40 marks in exactly 2
subjects.
Select cand_name from candidate c,degree d,mark m group by cand_name having (c.degcode =m.degcode) and
m.mark<40 and count(subject)=2;
(iii)Write a SELECT statement to display the name, subject and number of candidates for all degrees in which there
are less than 5 candidates.
Select name,subject,count(cand_name) from degree d,candidate c having (d.degcode =c.candidate )and
count(cand_name)<5 group by subject;
(iv) Write a SELECT statement to display the names of all the candidates who have got highest total marks in MSc.,
(Maths) (Nov/Dec 2015)
Select cand_name,subject sum(mark) from candidate c , degree d,mark m where d.degcode=m.degcode and
subject=MSc.,(maths) group by mark;
Briefly explain about Query Processing(May/June 2016)
The process of choosing a suitable execution strategy for processing a query is called query
optimization.
Two internal representations of a query
Query Tree
a tree data structure that corresponds to a relational algebra expression. It
represents the input relations of the query as leaf nodes of the tree, and
represents the relational algebra operations as internal nodes.
Query Graph
a graph data structure that corresponds to a relational calculus expression.
It does not indicate an order on which operations to perform first. There is
only a single graph corresponding to each query.
Page 37 of 97
Department of IT/CSE
2016-2017
Cost function
Number of execution strategies to be considered
Page 38 of 97
Department of IT/CSE
2016-2017
o CS1a = b;
o For an equality condition on a key, CS1a = (b/2) if the record is found; otherwise
CS1a = b.
S2. Binary search:
o CS2 = log2b + (s/bfr) 1
o For an equality condition on a unique (key) attribute, CS2 =log2b
S3. Using a primary index (S3a) or hash key (S3b) to retrieve a single record
o CS3a = x + 1; CS3b = 1 for static or linear hashing;
o CS3b = 1 for extendible hashing;
S4. Using an ordering index to retrieve multiple records:
o For the comparison condition on a key field with an ordering index, CS4 = x +
(b/2)
S5. Using a clustering index to retrieve multiple records:
o CS5 = x + (s/bfr)
S6. Using a secondary (B+-tree) index:
o For an equality comparison, CS6a = x + s;
o For an comparison condition such as >, <, >=, or <=,
o CS6a = x + (bI1/2) + (r/2)
S7. Conjunctive selection:
o Use either S1 or one of the methods S2 to S6 to solve.
o For the latter case, use one condition to retrieve the records and then check in the
memory buffer whether each retrieved record satisfies the remaining conditions in
the conjunction.
S8. Conjunctive selection using a composite index:
o Same as S3a, S5 or S6a, depending on the type of index.
Page 39 of 97
2
3
4
9
10
11
12
13
14
15
16
17
18
19
Department of IT/CSE
2016-2017
Page 40 of 97
22
23
24
25
26
27
Department of IT/CSE
2016-2017
What is transaction?
Collections of operations that form a single logical unit of work are called transactions.
What are the properties of transaction? Or Write the ACID properties of Transaction. (Nov/Dec 2014)
(Apr/May 2015)(May/June 2016)
Atomicity
Consistency
Isolation
Durability
What is recovery management component?
Ensuring durability is the responsibility of a software component of the base system called the recovery management
component.
When is a transaction rolled back?
Any changes that the aborted transaction made to the database must be undone. Once the changes caused by an
aborted transaction have been undone, then the transaction has been rolled back.
What are the states of transaction?
The states of transaction are
Active
Partially committed
Failed
Aborted
Committed
Terminated
What is a shadow copy scheme?
It is simple, but efficient, scheme called the shadow copy schemes. It is based on making copies of the database
called shadow copies that one transaction is active at a time. The scheme also assumes that the database is simply a
file on disk.
What is serializability? How it is tested? (Nov/Dec 2014)
A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Precedence graph is used to
test the serializability
Mention the approaches of deadlock recovery
The common solution is to roll back one or more transactions to break the deadlock
28
29
Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there exists some
item Q accessed by both li and lj, and at least one of these instructions wrote Q.
1. li = read(Q), lj = read(Q).li and lj dont conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict
4. li = write(Q), lj = write(Q). They conflict
If a schedule S can be transformed into a schedule S by a series of swaps of non-conflicting
instructions, we say that S and S are conflict equivalent.
We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule
Page 41 of 97
Department of IT/CSE
2016-2017
Let S and S be two schedules with the same set of transactions. S and S are view equivalent if
the following three conditions are met, for each data item Q,
If in schedule S, transaction Ti reads the initial value of Q, then in schedule S also
transaction Ti must read the initial value of Q.
If in schedule S transaction Ti executes read(Q), and that value was produced by
transaction Tj (if any), then in schedule S also transaction Ti must read the value of Q
that was produced by the same write(Q) operation of transaction Tj .
The transaction (if any) that performs the final write(Q) operation in schedule S must also
perform the final write(Q) operation in schedule S.
Write short notes on Transaction State and discuss the properties of transaction.
Active the initial state; the transaction stays in this state while it is executing
Partially committed after the final statement has been executed.
Failed -- after the discovery that normal execution can no longer proceed.
Aborted after the transaction has been rolled back and the database restored to its state prior to
the start of the transaction. Two options after it has been aborted:
restart the transaction
kill the transaction
Committed after successful completion.
ACID Properties
- Atomicity
-Consistency
-Isolation
-Durability
3
Briefly describe two phase locking in concurrency control techniques. (Nov/Dec 2014)
This protocol requires that each transaction issue lock and unlock request in two phases
Growing phase
Shrinking phase
Growing phase
-During this phase new locks can be occurred but none can be released
Shrinking phase
-During which existing locks can be released and no new locks can be occurred
Types
Strict two phase locking protocol
Rigorous two phase locking protocol
Strict two phase locking protocol
This protocol requires not only that locking be two phase, but also all exclusive locks
taken by a transaction be held until that transaction commits.
Rigorous two phase locking protocol
This protocol requires that all locks be held until all transaction commits
4
Explain the concepts of concurrent execution in Transaction processing system. (Nov/Dec 2014)
Page 42 of 97
Simultaneous execution of transactions over a shared database can create several data
integrity and consistency problems.
- lost updated problem
-Temporary updated problem
-Incorrect summery problem
Lost updated problem
T1
T2
Read(X)
X:=X-N
Write(X)
Read(Y)
Y:=Y+N
Write(Y)
Read(X)
X:=X+M
Write(X)
T2
Read(X)
X:=X-N
Write(X)
Rollback
Read(Y)
Read(X)
X:=X-M
Write(X)
T2
Read(B)
B:=B-N
Write(B)
Read(X)
X:=X-N
Write(X)
Sum:=0
Read(A)
Sum:=sum+A
Read(B)
Sum:=sum+B
.
.
.
Read(X)
Sum:=sum+X
Page 43 of 97
Department of IT/CSE
2016-2017
The common solution is to roll back one or more transactions to break the deadlock.
Three action need to be taken
a. Selection of victim
b. Rollback
c. Starvation
Selection of victim
i. Set of deadlocked transations,must determine which transaction to roll back to
break the deadlock.
ii. Consider the factor minimum cost
Rollback
- once we decided that a particular transaction must be rolled back, must determine how far this
transaction should be rolled back
- Total rollback
- Partial rollback
Starvation
Ensure that a transaction can be picked as victim only a finite number of times.
7
Intent locking
Intent locks are put on all the ancestors of a node before that node is locked explicitly.
If a node is locked in an intention mode, explicit locking is being done at a lower level of the tree.
Intent shared lock(IS)
If a node is locked in indent shared mode, explicit locking is being done at a lower level
of the tree, but with only shared-mode lock
Suppose the transaction T1 reads record ra2 in file Fa. Then,T1 needs to lock the database,
area A1,and Fa in IS mode, and finally lock ra2 in S mode.
Intent exclusive lock(IX)
if a node is locked in intent locking is being done at a lower level of the tree, but with exclusive
mode or shared-mode locks.
Suppose the transaction T2 modifies record ra9 in file Fa. Then,T2 needs to lock the database, area
A1,and Fa in IX mode, and finally to lock ra9 in X mode
Shared Intent exclusive lock (SIX)
If the node is locked in Shared Intent exclusive mode, the subtree rooted by that node is
locked explicitly in shared mode, and that explicit locking is being done at lower level
with exclusive mode.
Shared lock (S)
-T can tolerate concurrent readers but not concurrent updaters in R.
Exclusive lock (X)
-T cannot tolerate any concurrent access to R at all.
8
Crash Recovery
Though we are living in highly technologically advanced era where hundreds of satellite monitor the earth
and at every second billions of people are connected through information technology, failure is expected
but not every time acceptable.
St. Josephs College of Engineering & St. Josephs Institute of Technology
Page 44 of 97
Department of IT/CSE
2016-2017
DBMS is highly complex system with hundreds of transactions being executed every second. Availability
of DBMS depends on its complex architecture and underlying hardware or system software. If it fails or
crashes amid transactions being executed, it is expected that the system would follow some sort of
algorithm or techniques to recover from crashes or failures.
Failure Classification
To see where the problem has occurred we generalize the failure into various categories, as follows:
TRANSACTION FAILURE
When a transaction is failed to execute or it reaches a point after which it cannot be completed
successfully it has to abort. This is called transaction failure. Where only few transaction or process are
hurt.
Reason for transaction failure could be:
Logical errors: where a transaction cannot complete because of it has some code error or any
internal error condition
System errors: where the database system itself terminates an active transaction because DBMS
is not able to execute it or it has to stop because of some system condition. For example, in case of
deadlock or resource unavailability systems aborts an active transaction.
SYSTEM CRASH
There are problems, which are external to the system, which may cause the system to stop abruptly and
cause the system to crash. For example interruption in power supply, failure of underlying hardware or
software failure.
Examples may include operating system errors.
DISK FAILURE:
In early days of technology evolution, it was a common problem where hard disk drives or storage drives
used to fail frequently.
Disk failures include formation of bad sectors, unreachability to the disk, disk head crash or any other
failure, which destroys all or part of disk storage
Storage Structure
We have already described storage system here. In brief, the storage structure can be divided in various
categories:
Volatile storage: As name suggests, this storage does not survive system crashes and mostly
placed very closed to CPU by embedding them onto the chipset itself for examples: main memory, cache
memory. They are fast but can store a small amount of information.
Nonvolatile storage: These memories are made to survive system crashes. They are huge in data
storage capacity but slower in accessibility. Examples may include, hard disks, magnetic tapes, flash
memory, non-volatile (battery backed up) RAM.
Recovery and Atomicity
When a system crashes, it many have several transactions being executed and various files opened for
them to modifying data items. As we know that transactions are made of various operations, which are
atomic in nature. But according to ACID properties of DBMS, atomicity of transactions as a whole must
be maintained that is, either all operations are executed or none.
When DBMS recovers from a crash it should maintain the following:
Page 45 of 97
Department of IT/CSE
2016-2017
It should check the states of all transactions, which were being executed.
A transaction may be in the middle of some operation; DBMS must ensure the atomicity of
transaction in this case.
It should check whether the transaction can be completed now or needs to be rolled back.
There are two types of techniques, which can help DBMS in recovering as well as maintaining the
atomicity of transaction:
Maintaining the logs of each transaction, and writing them onto some stable storage before
actually modifying the database.
Maintaining shadow paging, where are the changes are done on a volatile memory and later the
actual database is updated.
Log-Based Recovery
Log is a sequence of records, which maintains the records of actions performed by a transaction. It is
important that the logs are written prior to actual modification and stored on a stable storage media, which
is failsafe.
Log based recovery works as follows:
When a transaction enters the system and starts execution, it writes a log about it
<Tn, Start>
Page 46 of 97
Department of IT/CSE
2016-2017
Keeping and maintaining logs in real time and in real environment may fill out all the memory space
available in the system. At time passes log file may be too big to be handled at all. Checkpoint is a
mechanism where all the previous logs are removed from the system and stored permanently in storage
disk. Checkpoint declares a point before which the DBMS was in consistent state and all the transactions
were committed.
RECOVERY
When system with concurrent transaction crashes and recovers, it does behave in the following manner:
The recovery system reads the logs backwards from the end to the last Checkpoint.
If the recovery system sees a log with <T n, Start> and <Tn, Commit> or just <Tn, Commit>, it puts
the transaction in redo-list.
If the recovery system sees a log with <T n, Start> but no commit or abort log found, it puts the
transaction in undo-list.
All transactions in undo-list are then undone and their logs are removed. All transaction in redolist, their previous logs are removed and then redone again and log saved.
What is concurrency control? How is it implemented in DBMS? Illustrate with a suitable example.
In a multiprogramming environment where multiple transactions can be executed simultaneously, it is highly
important to control the concurrency of transactions. We have concurrency control protocols to ensure atomicity,
isolation, and serializability of concurrent transactions. Concurrency control protocols can be broadly divided into
two categories
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any transaction cannot read or
write data until it acquires an appropriate lock on it. Locks are of two kinds
Page 47 of 97
Department of IT/CSE
2016-2017
Binary Locks A lock on a data item can be in two states; it is either locked or unlocked.
Shared/exclusive This type of locking mechanism differentiates the locks based on their uses. If a lock is
acquired on a data item to perform a write operation, it is an exclusive lock. Allowing more than one
transaction to write on the same data item would lead the database into an inconsistent state. Read locks
are shared because no data value is being changed.
Two-phase locking has two phases, one is growing, where all the locks are being acquired by the transaction; and
the second phase is shrinking, where the locks held by the transaction are being released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then upgrade it to an
Page 48 of 97
Department of IT/CSE
2016-2017
exclusive lock.
Strict Two-Phase Locking
The first phase of Strict-2PL is same as 2PL. After acquiring all the locks in the first phase, the transaction
continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a lock after using it. Strict-2PL
holds all the locks until the commit point and releases all the locks at a time.
Page 49 of 97
Operation rejected.
2016-2017
Department of IT/CSE
Operation executed.
Operation rejected.
Operation rejected and Ti rolled back.
What is deadlock? How does it occur? How transactions be written to (i) Avoid deadlock (ii) guarantee correct
execution. Illustrate with suitable example. (Nov/Dec 2015) (Nov/Dec 2014)
deadlock
System is deadlocked if there is a set of transactions such that every transaction in the set is waiting for another
transaction in the set.
Consider the following two transactions:
T1: write (A)
T2: write(A)
write(B)
write(B)
Schedule with deadlock
Avoid deadlock:
St. Josephs College of Engineering & St. Josephs Institute of Technology
Page 50 of 97
Department of IT/CSE
2016-2017
Briefly explain about Two phase commit and three phase commit protocols. (Apr/May 2015) (May/June 2016)
(Nov/Dec 2014)
Page 51 of 97
Possible Failures
Site Failure
Coordinator Failure
Network Partition
Three phase commit protocols.
Assumptions:
No network partitioning
At any point, at least one site must be up.
At most K sites (participants as well as coordinator) can fail
Phase 1: Obtaining Preliminary Decision: Identical to 2PC Phase 1.
Every site is ready to commit if instructed to do so
Under 2PC each site is obligated to wait for decision from coordinator
Under 3PC, knowledge of pre-commit decision can be used to commit despite coordinator failure
Phase 2: Recording the Preliminary Decision
Coordinator adds a decision record (<abort T> or <precommit T>) in its log and forces record
to stable storage. Coordinator sends a message to each participant informing it of the decision.
Participant records decision in its log
If abort decision reached then participant aborts locally
If pre-commit decision reached then participant replies with
<acknowledge T>
Phase 3: Recording Decision in the Database
Executed only if decision in phase 2 was to pre commit
Coordinator collects acknowledgments. It sends <commit T> message to the participants as soon as it
receives K acknowledgments.
Coordinator adds the record <commit T> in its log and forces record to stable storage.
Coordinator sends a message to each participant to <commit T>.
Participants take appropriate action locally
Under 3PC, knowledge of pre-commit decision can be used to commit despite coordinator failure
Avoids blocking problem as long as < K sites fail
Drawbacks:
higher overheads
Assumptions may not be satisfied in practice.
12
Consider the following schedules. The actions are listed in the order they are schedule, and prefixed with transaction
name.
S1: T1: R(X), T2: R(x), T1: W(Y), T2: W(Y), T1: R(Y), T2: R(Y)
S2:T3: R(X), T1: R(X), T1: W(Y), T2: R (Z), T2: W (Z), T3: R (Z)
For each of the schedules, answer the following questions:
i.
What is the precedence graph for the schedule?
ii.
Is the schedule conflict-serializable? If so, what are all the conflict equivalent serial schedules?
iii.
Is the schedule view-serializable? If so, what are all the view equivalent serial schedules? (Apr/May 2015)
Page 52 of 97
Department of IT/CSE
2016-2017
1. Serizability (or view) cannot be decided but NOT conflict serizability. It is recoverable and
avoid cascading aborts; NOT strict
2. It is serializable, conflict-serializable, and view-serializable regardless which action (commit or
abort) follows It is NOT avoid cascading aborts, NOT strict; We can not decide whether it's
recoverable or not, since the abort/commit sequence of these two transactions are not specified.
3. It is the same with 2.
4. Serizability (or view) cannot be decided but NOT conflict serizability. It is NOT avoided
cascading aborts, NOT strict; we cannot decide whether it's recoverable or not, since the
abort/commit sequence of these transactions are not specified.
5. It is serializable, conflict-serializable, and view-serializable; It is recoverable and avoid
cascading aborts; it is NOT strict.
6. It is NOT serializable, NOT view-serializable, and NOT conflict-serializable; it is recoverable
and avoids cascading aborts; It is NOT strict.
7. It belongs to all classes
8. It is serializable, NOT view-serializable, NOT conflict-serializable; it is NOT recoverable,
therefore NOT avoid cascading aborts, NOT strict.
9. It is serializable, view-serializable, and conflict-serializable; It is NOT recoverable, therefore
NOT avoid cascading aborts, NOT strict.
10. It belongs to all above classes.
11. It is NOT serializable and NOT view-serializable, NOT conflict-serializable; it is recoverable,
avoid cascading aborts and strict.
12. It is NOT serializable and NOT view-serializable, NOT conflict-serializable; it is recoverable,
but NOT avoid cascading aborts, NOT strict.
13
Page 53 of 97
Department of IT/CSE
2016-2017
Binary lock
Read/write(shared / Exclusive) lock
Binary lock
Page 54 of 97
Department of IT/CSE
2016-2017
This protocol requires that each transaction issue lock and unlock request in two phases
Growing phase
Shrinking phase
Growing phase
-During this phase new locks can be occurred but none can be released
Shrinking phase
-During which existing locks can be released and no new locks can be occurred
Types
Strict two phase locking protocol
Rigorous two phase locking protocol
Strict two phase locking protocol
This protocol requires not only that locking be two phase, but also all exclusive locks
taken by a transaction be held until that transaction commits.
Rigorous two phase locking protocol
This protocol requires that all locks be held until all transaction commits
UNIT IV
TRENDS IN DATABASE TECHNOLOGY
Overview of Physical Storage Media Magnetic Disks RAID Tertiary storage File Organization Organization of
Records in Files Indexing and Hashing Ordered Indices B+ tree Index Files B tree Index Files Static Hashing
Dynamic Hashing - Introduction to Distributed Databases- Client server technology- Multidimensional and Parallel
databases- Spatial and multimedia databases- Mobile and web databases- Data Warehouse-Mining- Data marts.
UNIT-IV / PART-A
1
What is B-Tree?
A B-tree eliminates the redundant storage of search-key values .It allows search key values to appear only once.
2
What is a B+-Tree index?
A B+-Tree index takes the form of a balanced tree in which every path from the root of the root of the root of the tree
to a leaf of the tree is of the same length.
3
What is a hash index?
A hash index organizes the search keys, with their associated pointers, into a hash file structure
4
Define seek time.
The time for repositioning the arm is called the seek time and it increases with the distance that the arm is called the
seek time.
5
Define rotational latency time.
The time spent waiting for the sector to be accessed to appear under the head is called the rotational latency time.
6
What is called mirroring?
The simplest approach to introducing redundancy is to duplicate every disk. This technique is called mirroring or
shadowing.
7
What are the two main goals of parallelism?
Load balance multiple small accesses, so that the throughput of such accesses increases.
Parallelize large accesses so that the response time of large accesses is reduced.
8
What are the factors to be taken into account when choosing a RAID level?
Monetary cost of extra disk storage requirements.
Performance requirements in terms of number of I/O operations
Performance when a disk has failed.
Performances during rebuild.
9
What is an index?
Page 55 of 97
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
Department of IT/CSE
2016-2017
An index is a structure that helps to locate desired records of a relation quickly, without examining all records
What are the types of storage devices?
Primary storage, Secondary storage, Tertiary storage, Volatile storage, Nonvolatile storage
What is called remapping of bad sectors?
If the controller detects that a sector is damaged when the disk is initially formatted, or when an attempt is made to
write the sector, it can logically map the sector to a different physical location.
Define software and hardware RAID systems?(May/June 2016)
RAID can be implemented with no change at the hardware level, using only software modification. Such RAID
implementations are called software RAID systems and the systems with special hardware support are called
hardware RAID systems.
Define hot swapping?
Hot swapping permits the removal of faulty disks and replaces it by new ones without turning power off. Hot
swapping reduces the mean time to repair.
What are the ways in which the variable-length records arise in database systems?
Storage of multiple record types in a file, Record types that allow variable lengths for one or more fields, Record
types that allow repeating fields.
What are the two types of blocks in the fixed length representation? Define them.
Anchor block: Contains the first record of a chain.
Overflow block: Contains the records other than those that are the first record of a chain.
What is hashing file organization?
In the hashing file organization, a hash function is computed on some attribute of each record. The result of the hash
function specifies in which block of the file the record should be placed.
What are called index-sequential files?
The files that are ordered sequentially with a primary index on the search key are called index-sequential files.
Define Primary index and Secondary Index
It is in a sequentially ordered file, the index whose search key specifies the sequential order of the file. Also called
clustering index. The search key of a primary index is usually but not necessarily the primary key
It is an index whose search key specifies an order different from the sequential order of the file. Also called non
clustering index.
Define Data marts.
A data mart is the access layer of the data warehouse environment that is used to get data out to the users. The data
mart is a subset of the data warehouse that is usually oriented to a specific business line or team. Data marts are
small slices of the data warehouse.
Define Distributed Database Systems
Database spread over multiple machines (also referred to as sites or nodes).Network interconnects the machines.
Database shared by users on multiple machines is called Distributed Database Systems
Define Data mining and Data warehouse (Nov/Dec 2014)(May/June 2016)
Data mining: It refers to the mining or discovery of new information in terms of patterns or rules from vast amounts
of data.
Data warehouse: Data warehouses provide access to data for complex analysis, knowledge discovery, and decision
making
What are the advantages of web-DBMS?
Platform independence, graphical user interface, transparent network access, scalable deployment
What are the issues to be considered for mobile database?
Data distribution , replication, recovery and fault tolerance
What is spatial data?
Spatial data include geographic data, such as maps and associated information and computer aided design data
What is multimedia data? Explain about multimedia database
Multimedia data typically means digital images, audio, video, animation and graphics together with text data.The
acquisition, generation, storage and processing of multimedia data in computers and transmission over networks
have grown tremendously in the recent past. Consistency, concurrency, integrity, security and availability of data.
What are the types of Distributed Database
Homogenous distributed DB
Heterogeneous distributed DB
Define fragmentation in Distributed Database
The system partitions the relation into several fragment and stores each fragment at different sites
Two approaches : Horizontal fragmentation and Vertical fragmentation
Page 56 of 97
Department of IT/CSE
2016-2017
29
Give an example of a join that is not a simple equi-join for which partitioned parallelism can be used.
(Nov/Dec 2015)
30
Write about the four types (Star, Snowflake, Galaxy and Fast constellation) of Data warehouse schemas.
(Apr/May 2015)
Star schema: Consists of a fact table with a single table for each dimension. This dimension table contains the set of
attributes
Snowflake Schema: It is a variation of star schema, in which the dimensional tables from a star schema are
organized into a hierarchy by normalizing them.
Fact Constellation Fact constellation is a set of tables that share some dimension tables. However, fact
constellations limit the possible queries for the warehouse.
Galaxy Combination of both Snowflake and star schema
UNIT-IV / PART-B
Describe File Organization.
Page 57 of 97
Department of IT/CSE
2016-2017
Free Lists
Store the address of the first deleted record in the file header.
Use this first record to store the address of the second deleted record, and so on
Pointer method
A variable-length record is represented by a list of fixed-length records, chained together
via pointers.
Can be used even if the maximum record length is not known
Disadvantage to pointer structure; space is wasted in all records except the first in a a chain.
Solution is to allow two kinds of block in file:
Anchor block contains the first records of chain
Overflow block contains records other than those that are the first records of chains.
Define RAID and Briefly Explain RAID techniques. (Nov/Dec 2014) (Apr/May 2015) (Nov/Dec 2015)(May/June 2016)
RAID Level 5: Block-Interleaved Distributed Parity; partitions data and parity among all N + 1
disks, rather than storing data in N disks and parity in 1 disk.
Page 59 of 97
Department of IT/CSE
2016-2017
RAID Level 6: P+Q Redundancy scheme; similar to Level 5, but stores extra redundant
information to guard against multiple disk failures.
Better reliability than Level 5 at a higher cost; not used as widely.
Page 60 of 97
Static hashing:
A bucket is a unit of storage containing one or more records.
In a hash file organization we obtain the bucket of a record directly from its search-key value
using a hash function.
Hash function h is a function from the set of all search-key values K to the set of all
bucket addresses B.
Hash function is used to locate records for access, insertion as well as deletion.
Example:
-There are 10 buckets,
-The hash function returns the sum of the binary representations of the characters modulo 10
E.g. h(Perryridge) = 5 h(Round Hill) = 3 h(Brighton) = 3
Dynamic hashing:
Good for database that grows and shrinks in size
Allows the hash function to be modified dynamically
Extendable hashing one form of dynamic hashing
Hash function generates values over a large range typically b-bit integers, with
b = 32.
At any time use only a prefix of the hash function to index into a table of bucket
St. Josephs College of Engineering & St. Josephs Institute of Technology
Page 61 of 97
Page 62 of 97
Department of IT/CSE
2016-2017
In relational table format, this array translates into a table of 1000 records
PERFORMANCE ADVANTAGES
- A user wants to know the Sales Volume figure when
CAR TYPE=SEDAN, COLOR=BLUE, and
DEALERSHIP=GLEASON.
- A relational system might have to search through all 1000
records just to find the right record.
- In contrast the multidimensional structure has more
"knowledge" about where a particular piece of data lies.
-If we assume that the average search equals half the maximum search, we are looking at 15 versus 500 searchesa
3300% improvement in performance!
FEATURES OF MULTIDIMENSIONAL DATABASES
Multidimensional Data Views:
Rotation
Ranging
Hierarchies, roll-ups & drill-downs
Advantages
Supports end user business queries
Powerful and efficient for comparative analysis
Handles large volumes of data
Easily maintainable
Parallel DB:
Parallel machines are becoming quite common and affordable
Prices of microprocessors, memory and disks have dropped sharply
Page 63 of 97
Department of IT/CSE
2016-2017
Types of architectures
Shared-memory
Shared-disc
Shared-nothing
Parallelism in Databases:
Data can be partitioned across multiple disks for parallel I/O.
Individual relational operations (e.g., sort, join, aggregation) can be executed in parallel
data can be partitioned and each processor can work independently on its own partition.
Queries are expressed in high level language (SQL, translated to relational algebra)
makes parallelization easier.
Different queries can be run in parallel with each other.
Concurrency control takes care of conflicts.
Thus, databases naturally lend themselves to parallelism.
I/OParallelism:
Horizontal partitioning
-Round-robin
-Hash partitioning:
Range partitioning
Interquery Parallelism:
Queries/transactions execute in parallel with one another.
Increases transaction throughput; used primarily to scale up a transaction processing system to support a
larger number of transactions per second.
Easiest form of parallelism to support, particularly in a shared-memory parallel database, because even
sequential database systems support concurrent processing.
More complicated to implement on shared-disk or shared-nothing architectures
Locking and logging must be coordinated by passing messages between processors.
Data in a local buffer may have been updated at another processor.
Cache-coherency has to be maintained reads and writes of data in buffer must find latest
version of data.
Intraquery Parallelism:
Execution of a single query in parallel on multiple processors/disks; important for speeding up long-running
queries.
Two complementary forms of intraquery parallelism :
Intraoperation Parallelism parallelize the execution of each individual operation in the query.
Interoperation Parallelism execute the different operations in a query expression in parallel.
Interoperator Parallelism:
Pipelined parallelism
Independent parallelism
6
In an ordered index, index entries are stored sorted on the search key value.
Primary index: in a sequentially ordered file, the index whose search key specifies the sequential
order of the file.
Secondary index: an index whose search key specifies an order different from the sequential
order of the file.
Types of Ordered Indices
Dense index
Sparse index
Dense index Index record appears for every search-key value in the file.
St. Josephs College of Engineering & St. Josephs Institute of Technology
Page 64 of 97
Department of IT/CSE
2016-2017
Sparse Index: contains index records for only some search-key values.
Applicable when records are sequentially ordered on search-key
To locate a record with search-key value K we:
Find index record with largest search-key value < K
Search file sequentially starting at the record to which the index record points
Multilevel index
If primary index does not fit in memory, access becomes expensive.
To reduce number of disk accesses to index records, treat primary index kept on disk as a
sequential file and construct a sparse index on it.
outer index a sparse index of primary index
inner index the primary index file
If even outer index is too large to fit in main memory, yet another level of index can be
created, and so on.
Secondary Indices
- Index record points to a bucket that contains pointers to all the actual records with that
particular searchkey value.
- Secondary indices have to be dense
7
Explain about B+ trees indexing concepts with an example (Nov/Dec 2014)(May/June 2016)
Disadvantage of indexed-sequential files: performance degrades as file grows, since many overflow
blocks get created. Periodic reorganization of entire file is required.
Advantage of B+-tree index files: automatically reorganizes itself with small, local, changes, in the
face of insertions and deletions. Reorganization of entire file is not required to maintain performance.
St. Josephs College of Engineering & St. Josephs Institute of Technology
Page 65 of 97
Similar to B+-tree, but B-tree allows search-key values to appear only once; eliminates redundant
storage of search keys.
Search keys in nonleaf nodes appear nowhere else in the B-tree; an additional pointer field for
each search key in a nonleaf node must be included.
Page 66 of 97
Department of IT/CSE
2016-2017
Explain about Distributed Databases and their characteristics, also list out advantages and disadvantages.
In distributed database system data reside in several location where as centralized database system
the data reside in single location
Classification
- Homogenous distributed DB
- Heterogeneous distributed DB
Homogenous distributed DB
All sites have identical database management software, are aware of one another.
Agree to cooperate in processing users request.
Heterogeneous distributed DB
different sites may use different schemas and different DBMS software.
The sites may not be aware of one another
Provide only limited facilities for cooperation in transaction processing.
Consider a relation r, there are two approaches to store this relation in the distributed DB.
Replication
Fragmentation
Replication
The system maintains several identical replicas(copies) of the relation at different site.
Full replication- copy is stored in every site in the system.
Advantages and disadvantages
Availability
Increased parallelism
Increased overhead update
Fragmentation
The system partitions the relation into several fragment and stores each fragment at
different sites
Two approaches
Horizontal fragmentation
Vertical fragmentation
Horizontal fragmentation
Splits the relation by assigning each tuple of r to one or more fragments
relation r is partitioned into a number of subsets, r 1 ,r2,..rn and can be reconstruct the
original relation using union of all fragments, that is
Page 67 of 97
10
2016-2017
Vertical fragmentation
Splits the relation by decomposing scheme R of relation and reconstruct the original
relation by using natural join of all fragments. that is
r = r1 r2 rn
Illustrate indexing and hashing techniques with suitable examples. (Nov/Dec 2015)
In an ordered index, index entries are stored sorted on the search key value.
Primary index: in a sequentially ordered file, the index whose search key specifies the sequential
order of the file.
Secondary index: an index whose search key specifies an order different from the sequential
order of the file.
Types of Ordered Indices
Dense index
Sparse index
Dense index Index record appears for every search-key value in the file.
Sparse Index: contains index records for only some search-key values.
Applicable when records are sequentially ordered on search-key
To locate a record with search-key value K we:
Find index record with largest search-key value < K
Search file sequentially starting at the record to which the index record points
Multilevel index
If primary index does not fit in memory, access becomes expensive.
To reduce number of disk accesses to index records, treat primary index kept on disk as a
sequential file and construct a sparse index on it.
outer index a sparse index of primary index
inner index the primary index file
If even outer index is too large to fit in main memory, yet another level of index can be
created, and so on.
Page 68 of 97
Department of IT/CSE
2016-2017
Secondary Indices
- Index record points to a bucket that contains pointers to all the actual records with that
particular searchkey value.
- Secondary indices have to be dense
Hashing:
Static hashing:
A bucket is a unit of storage containing one or more records.
In a hash file organization we obtain the bucket of a record directly from its search-key value
using a hash function.
Hash function h is a function from the set of all search-key values K to the set of all
bucket addresses B.
Hash function is used to locate records for access, insertion as well as deletion.
Example:
-There are 10 buckets,
-The hash function returns the sum of the binary representations of the characters modulo 10
E.g. h(Perryridge) = 5 h(Round Hill) = 3 h(Brighton) = 3
Dynamic hashing:
Good for database that grows and shrinks in size
Allows the hash function to be modified dynamically
Extendable hashing one form of dynamic hashing
Hash function generates values over a large range typically b-bit integers, with
b = 32.
At any time use only a prefix of the hash function to index into a table of bucket
addresses.
Let the length of the prefix be i bits, 0 i 32.
Bucket address table size = 2i. Initially i = 0
Value of i grows and shrinks as the size of the database grows and shrinks.
Multiple entries in the bucket address table may point to a bucket.
Thus, actual number of buckets is < 2i
The number of buckets also changes dynamically due to coalescing and splitting
St. Josephs College of Engineering & St. Josephs Institute of Technology
Page 69 of 97
Department of IT/CSE
2016-2017
11
Write short notes on (i) Spatial and multimedia databases (ii) Mobile and web databases. (Nov/Dec 2015)
Spatial DB:
A spatial database is a database that is optimized to store and query data related to objects in space,
including points, lines and polygons
Keep track of objects in a multi-dimensional space
Maps
Geographical Information Systems (GIS)
Weather
Examples of Spatial data
Census Data
NASA satellites imagery - terabytes of data per day
Weather and Climate Data
Rivers, Farms, ecological impact
Medical Imaging
R-trees
Quad trees
Multimedia databases
Page 70 of 97
12
Explain the architectural components of a data warehouse and write Data marts. (Apr/May 2015)
Page 71 of 97
Page 72 of 97
Department of IT/CSE
2016-2017
5
6
10
11
12
13
14
15
UNIT-V / PART-A
Define Database security.
Database security concerns the use of a broad range of information security controls to protect databases (potentially
including the data, the database applications or stored functions, the database systems, the database servers and the
associated network links) against compromises of their confidentiality, integrity and availability.
What are the Security risks to database systems?
Unauthorized or unintended activity or misuse by authorized database users, database administrators, or
network/systems managers, or by unauthorized users or hackers. For example inappropriate access to sensitive data,
metadata or functions within databases, or inappropriate changes to the database programs, structures or security
configurations
List out the types of information security.
Access control ,Auditing, Authentication, Encryption, Integrity controls, Backups, Application security, Database
Security applying Statistical Method
Define Authentication.
Authentication is the act of determining the identity of a user and of the host that they are using. The goal of
authentication is to first verify that the user, either a person or system, which is attempting to interact with your
system is allowed to do so. The second goal of authentication is to gather information regarding the way that the user
is accessing your system
What is Authorization?
Authorization is the act of determining the level of access that an authorized user has to behaviour and data.
Define Check Point.
This is the place to validate users and to make appropriate decisions when dealing with security breaches. Also
known as Access Verification, Validation and Penalization, and Holding off Hackers.
Define OLTP.
Online transaction processing, or OLTP, is a class of information systems that facilitate and manage transactionoriented applications, typically for data entry and retrieval transaction processing.
Define Privileges.
A separate database administrator (DBA) account should be created and protected that has full privileges to
create/drop databases, create user accounts, and update user privileges. This simple means of separation of
responsibility helps prevent accidental mis-configuration, lowers risk and lowers scope of compromise.
What is Cryptography in database?
Cryptography is the art of "extreme information security." It is extreme in the sense that once treated with a
cryptographic algorithm; a message (or a database field) is expected to remain secure even if the adversary has full
access to the treated message. The adversary may even know which algorithm was used. If the cryptography is good,
the message will remain secure.
What is Benefits of Database Encryption?
Ensure guaranteed access to encrypted data by authorized users by automating storage and back-up for mission
critical master encryption keys. Simplify data privacy compliance obligations and reporting activities through the use
of a security-certified encryption and key management to enforce critical best practices and other standards of due
care.
What is statistical database?
A statistical database is a database used for statistical analysis purposes. It is an OLAP (online analytical processing),
instead of OLTP (online transaction processing) system. Statistical databases contain parameter data and the
measured data for these parameters.
Define Database replication.
Database replication can be used on many database management systems, usually with a master/slave relationship
between the original and the copies. The master logs the updates, which then ripple through to the slaves. The slave
outputs a message stating that it has received the update successfully, thus allowing the sending of subsequent
updates.
What is homogeneous distributed database and heterogeneous distributed database
A homogeneous distributed database has identical software and hardware running all databases instances, and may
appear through a single interface as if it were a single database. A heterogeneous distributed database may have
different hardware, operating systems, database management systems, and even data models for different databases.
What are benefits of a data ware house
Congregate data from multiple sources into a single database so a single query engine can be used to present data.
Mitigate the problem of database isolation level lock contention in transaction processing systems caused by
attempts to run large, long running, analysis queries in transaction processing databases.
Define offline data warehouse and Online database.
Page 73 of 97
16
17
18
19
20
21
22
23
24
25
26
27
28
29
Department of IT/CSE
2016-2017
Data warehouses at this stage are updated from data in the operational systems on a regular basis and the data
warehouse data are stored in a data structure designed to facilitate reporting. Online Integrated Data Warehousing
represent the real time Data warehouses stage data in the warehouse is updated for every transaction performed on
the source data
Write about integrated data warehouse
These data warehouses assemble data from different areas of business, so users can look up the information they
need across other systems.
Define spatial data mining.
Spatial data mining is the application of data mining methods to spatial data. The end objective of spatial data
mining is to find patterns in data with respect to geography.
What is the advantage of OODB?
An integrated repository of information that is shared by multiple users, multiple products, multiple applications on
multiple platforms.
Define Text mining.
Text mining is also referred to as text data mining, it is equivalent to text analytics, refers to the process of deriving
high-quality information from text. High-quality information is typically derived through the devising of patterns
and trends through means such as statistical pattern learning.
Define XML Database.
An XML database is a data persistence software system that allows data to be stored in XML format. These data can
then be queried, exported and serialized into the desired format. XML databases are usually associated with
document-oriented databases.
Define Clustering.
Clustering techniques consider data tuples as objects. They partition the objects into groups or clusters so that objects
within a cluster are similar to one another and dissimilar to objects in other clusters.
Define Association Rule Mining. (Apr/May 2015)
Association rule mining is discovering frequent patterns, associations and correlations among items which are
meaningful to the users and can generate strong rules on the basis of these frequent patterns, which helps in decision
support system.
Define Information Retrieval.
It is an activity of obtaining information resources relevant to an information need from a collection of information
resources.
Define Crawling and indexing the web. (Nov/Dec 2014)
Web Crawling is the process of search engines combing through web pages in order to properly index them. These
web crawlers systematically crawl pages and look at the keywords contained on the page, the kind of content, all
the links on the page, and then returns that information to the search engines server for indexing. Then they follow
all the hyperlinks on the website to get to other websites. When a search engine user enters a query, the search engine
will go to its index and return the most relevant search results based on the keywords in the search term. Web
crawling is an automated process and provides quick, up to date data.
Define Relevance Ranking. (Nov/Dec 2014)
A system in which the search engine tries to determine the theme of a site that a link is coming from.
Can we have more than one constructor in a class? If yes, explain the need for such a situation . (Nov/Dec 2015)
Yes, default constructor and constructor with parameter
List the types of privileges used in database access control. (Nov/Dec 2015)
Types of privileges used in database access control are Read, Write and Update.
Define Threats and Risks (Apr/May 2015)
Threats: Accidental loss and corruption, and from deliberate unauthorized attempts to access or alter that data.Loss
of integrity, Loss of availability, Loss of confidentiality
Risks: Unauthorized or unintended activity or misuse by authorized database users, database administrators, or
network/systems managers, or by unauthorized users or hackers.
Explain Data Classification. (May/June 2016)
Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of
classification is to accurately predict the target class for each case in the data. For example, a classification model
could be used to identify loan applicants as low, medium, or high credit risks.
UNIT-V / PART-B
Explain about Object Oriented Databases and XML Databases.
Complex Objects
Complex objects are built by applying constructors to simpler objects including: sets, lists and tuples. An
example is illustrated below:
Encapsulation
Encapsulation is derived from the notion of Abstract Data Type (ADT). It is motivated by the need to
make a clear distinction between the specification and the implementation of an operation. It reinforces
modularity and provides a form of logical data independence.
Page 75 of 97
Department of IT/CSE
2016-2017
Class
A class object is an object which acts as a template.
It specifies:
A structure that is the set of attributes of the instances
A set of operations
A set of methods which implement the operations
Instantiation means generating objects, Ex. 'new' operation in C++
Persistence of objects: Two approaches
An implicit characteristic of all objects
An orthogonal characteristic - insert the object into a persistent collection of objects
Inheritance
A mechanism of reusability, the most powerful concept of OO programming
Page 76 of 97
Department of IT/CSE
2016-2017
Association
Association is a link between entities in an application
In OODB, associations are represented by means of references between objects
a representation of a binary association
a representation of a ternary association
reverse reference
Page 77 of 97
Department of IT/CSE
2016-2017
ADVANTAGES OF OODB
-An integrated repository of information that is shared by multiple users, multiple products, multiple
applications on multiple platforms.
St. Josephs College of Engineering & St. Josephs Institute of Technology
Page 78 of 97
2016-2017
2. Impedance mismatch: Programming languages and database systems must be interfaced to solve
application problems. But the language style, data structures, of a programming language (such as C) and
the DBMS (such as Oracle) are different. The OODB supports general purpose programming in the
OODB framework.
3. New application requirements: Especially in OA, CAD, CAM, CASE, object-orientation is the most
natural and most convenient.
XML DB
XML Database is used to store the huge amount of information in the XML format. As the use
of XML is increasing in every field, it is required to have the secured place to store the XML
documents. The data stored in the database can be queried using XQuery, serialized, and
exported into desired format.
XML- enabled
Explain in detail (i) Clustering (ii) Information Retrieval (iii) Transaction processing.
(Nov/Dec 2014)
(i) Clustering
Unsupervised learning or clustering builds models from data without predefined classes.
The goal is to place records into groups where the records in a group are highly similar to each
other and dissimilar to records in other groups.
The k-Means algorithm is a simple yet effective clustering technique
Agglomerative clustering algorithms
Build small clusters, then cluster small clusters into bigger clusters, and
so on
Divisive clustering algorithms
Start with all items in a single cluster, repeatedly refine (break) clusters
into smaller ones
St. Josephs College of Engineering & St. Josephs Institute of Technology
Page 79 of 97
privilege is a right to execute a particular type of SQL statement or to access another user's object. Some
examples of privileges include the right to:
You grant privileges to users so these users can accomplish tasks required for their jobs. You should grant
a privilege only to a user who requires that privilege to accomplish the necessary work. Excessive
granting of unnecessary privileges can compromise security. A user can receive a privilege in two
different ways:
You can grant privileges to users explicitly. For example, you can explicitly grant to user SCOTT
the privilege to insert records into the employees table.
You can also grant privileges to a role (a named group of privileges), and then grant the role to
one or more users. For example, you can grant the privileges to select, insert, update, and delete
records from the employees table to the role named clerk, which in turn you can grant to
users scott and brian.
Types:
System Privileges
Schema Object Privileges
Table Privileges
View Privileges
Procedure Privileges
Page 80 of 97
Suppose that A1 creates the two base relations EMPLOYEE and DEPARTMENT
A1 is then owner of these two relations and hence all the relation privileges on each of them.
Suppose that A1 wants to grant A2 the privilege to insert and delete tuples in both of these
relations, but A1 does not want A2 to be able to propagate these privileges to additional accounts:
Page 81 of 97
Department of IT/CSE
2016-2017
Data Warehouse
Databases
Cleaning
Reformatting
OLAP
Data
Metadata
DSSI
EIS
Data
Mining
Updates/New Data
Page 82 of 97
Snowflake Schema:
It is a variation of star schema, in which the dimensional tables from a
star schema are organized into a hierarchy by normalizing them.
Fact Constellation
Fact constellation is a set of tables that share some dimension tables.
However, fact constellations limit the possible queries for the
warehouse.
Page 83 of 97
Data Mining:
The discovery of new information in terms of patterns or rules from vast amounts of data.
The process of finding interesting structure in data.
The process of employing one or more computer learning techniques to automatically analyze and
extract knowledge from data.
Goals of Data Mining:
Prediction:
Determine how certain attributes will behave in the future.
Identification:
Identify the existence of an item, event, or activity.
Classification:
Partition data into classes or categories.
Optimization:
Optimize the use of limited resources.
Types of Discovered Knowledge
Association Rules
Classification Hierarchies
Sequential Patterns
Patterns Within Time Series
Clustering
Additional Data Mining Methods:
Sequential pattern analysis
Time Series Analysis
Regression
Neural Networks
Genetic Algorithms
Association Rules
Page 84 of 97
Department of IT/CSE
2016-2017
Algorithm used
Market-Basket Model, Support, and Confidence
Apriori Algorithm
Sampling Algorithm
Frequent-Pattern Tree Algorithm
Partition Algorithm
Example
Retail shops are often interested in associations between different items that people buy.
Someone who buys bread is quite likely also to buy milk
A person who bought the book Database System Concepts is quite likely also to buy the
book Operating System Concepts.
Associations information can be used in several ways.
E.g. when a customer buys a particular book, an online shop may suggest associated
books.
Association rules:
bread milk
DB-Concepts, OS-Concepts Networks
Left hand side: antecedent, right hand side: consequent
An association rule must have an associated population; the population consists of a set of
instances
Page 85 of 97
Department of IT/CSE
2016-2017
Page 86 of 97
2016-2017
Web crawlers are programs that locate and gather information on the Web
Recursively follow hyperlinks present in known documents, to find other documents
Starting from a seed set of documents
Fetched documents
Handed over to an indexing system
Can be discarded after indexing, or store as a cached copy
Crawling the entire Web would take a very large amount of time
Search engines typically cover only a part of the Web, not all of it
Take months to perform a single crawl
Association Rules
Algorithm used
Market-Basket Model, Support, and Confidence
Apriori Algorithm
Sampling Algorithm
Frequent-Pattern Tree Algorithm
Partition Algorithm
Example
Page 87 of 97
Department of IT/CSE
2016-2017
Retail shops are often interested in associations between different items that people buy.
Someone who buys bread is quite likely also to buy milk
A person who bought the book Database System Concepts is quite likely also to buy the
book Operating System Concepts.
Associations information can be used in several ways.
E.g. when a customer buys a particular book, an online shop may suggest associated
books.
Association rules:
bread milk
DB-Concepts, OS-Concepts Networks
Left hand side: antecedent, right hand side: consequent
An association rule must have an associated population; the population consists of a set of
instances
Support and Confidence
Support count:
The support count of an itemset X, denoted by X.count, in a data set T is the number of
transactions in T that contain X. Assume T has n transactions.
Then,
( X Y ).count
n Y ).count
(X
confidence
X .count
support
Apriori principle:
If an itemset is frequent, then all of its subsets must also be frequent
Page 88 of 97
Support of an
Xitemset
, Y : ( Xnever
Y exceeds
) s ( Xthe
) support
s (Y ) of its subsets
This is known as the anti-monotone property of support
Apriori Alogorithm
N= Nomber of attributes
Attribute_list = all attributes
For i=1 to N
Frequent_item_set = generate item set of i attributes using attribute_list
Frequent_item_set = remove all infrequent itemsets
Rule = Rule U generate rule using Frequent_item_set
Attribute_list = Only attributes contained in Frequent_item_set
End For
Brute-force approach;
Given a set of transactions T, the goal of association rule mining is to find all rules having
support minsup threshold
confidence minconf threshold
Brute-force approach:
List all possible association rules
Compute the support and confidence for each rule
Discard rules that fail the minsup and minconf thresholds
Two steps approach
The general algorithm for generating association rules is a two-step process.
Generate all itemsets that have a support exceeding the given threshold. Itemsets with this
property are called large or frequent itemsets.
Generate rules for each itemset as follows:
For itemset X and Y a subset of X, let Z = X Y;
If support(X)/Support(Z) > minimum confidence, the rule Z=>Y is a valid rule.
Page 89 of 97
Department of IT/CSE
2016-2017
Describe the GRANT functions and explain how it relates to security. What types of privileges may be granted? How
rights could be revoked? (Nov/Dec 2015)
Granting and revoking privileges
Page 90 of 97
Suppose an Object Oriented database had an object A, which references object B, which in turn references object C.
Assume all objects are on disk initially? Suppose a program first dereferences A, then dereferences B by following
the reference from A, and then finally dereferences C. Show the objects that are represented in memory after each
dereference, along with their state. (Nov/Dec 2015)
Page 91 of 97
11
Department of IT/CSE
2016-2017
Neatly write the K-Mean algorithm and show the intermediate results in clustering the below given points into two
clusters using K-Means algorithm.
P1: (0,0), P2:(1,10), P3:(2,20), P4:(1,15), P5:(1000,2000),P6:(1500,1500),P7:(1000,1250). (Apr/May 2015)
K-Means algorithm
K-Means clustering intends to partition n objects into k clusters in which each object belongs to the
cluster with the nearest mean. This method produces exactly k different clusters of greatest possible
distinction. The best number of clusters k leading to the greatest separation (distance) is not known as a
priori and must be computed from the data. The objective of K-Means clustering is to minimize total
intra-cluster variance, or, the squared error function:
Algorithm
1. Clusters the data into k groups where k is predefined.
2. Select k points at random as cluster centers.
3. Assign objects to their closest cluster center according to the Euclidean distance function.
4. Calculate the centroid or mean of all objects in each cluster.
5. Repeat steps 2, 3 and 4 until the same points are assigned to each cluster in consecutive rounds
Problem
Consider the following data set consisting of the scores of two variables on each of seven
individuals:
This data set is to be grouped into two clusters. As a first step in finding a sensible initial partition, let the
A & B values of the two individuals furthest apart (using the Euclidean distance measure), define the
initial cluster means, giving:
The remaining individuals are now examined in sequence and allocated to the cluster to which they are
closest, in terms of Euclidean distance to the cluster mean. The mean vector is recalculated each time a
new member is added. This leads to the following series of steps:
Page 92 of 97
Department of IT/CSE
2016-2017
Now the initial partition has changed, and the two clusters at this stage having the following
characteristics:
But we cannot yet be sure that each individual has been assigned to the right cluster. So, we compare each
individuals distance to its own cluster mean and to that of the opposite cluster. And we find:
Only individual 3 is nearer to the mean of the opposite cluster (Cluster 2) than its own (Cluster 1). In other
words, each individual's distance to its own cluster mean should be smaller than the distance to the other
cluster's mean (which is not the case with individual 3). Thus, individual 3 is relocated to Cluster 2
resulting in the new partition:
The iterative relocation would now continue from this new partition until no more
relocationoccurs. However, in this example each individual is now nearer its own cluster mean than that
of theother cluster and the iteration stops, choosing the latest partitioning as the final cluster solution.Also,
it is possible that the k-means algorithm won't find a final solution. In this case it would bea good idea to
consider stopping the algorithm after a pre-chosen maximum of iterations
St. Josephs College of Engineering & St. Josephs Institute of Technology
Page 93 of 97
Department of IT/CSE
2016-2017
Discuss about the Access control mechanism and Cryptography methods to secure the Database. (Apr/May 2015)
Page 94 of 97
2016-2017
Another security is that of flow control, which prevents information from flowing in such a
way that it reaches unauthorized users.
Encryption
A final security issue is data encryption, which is used to protect sensitive data (such as credit card
numbers) that is being transmitted via some type communication network.
The data is encoded using some encoding algorithm.
Types of database security
A DBMS typically includes a database security and authorization subsystem that is responsible for ensuring
the security portions of a database against unauthorized access.
Types of database security mechanisms:
Page 97 of 97