
ANNA UNIVERSITY- CHENNAI-JUNE 2010

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


SUB CODE/SUB NAME: CS9221 / DATABASE TECHNOLOGY
ANSWER KEY

Part A – (10*2=20 Marks)

1. What is fragmentation?

Fragmentation is a database server feature that allows you to control where data is stored
at the table level. Fragmentation enables you to define groups of rows or index keys
within a table according to some algorithm or scheme. You use SQL statements to create
the fragments and assign them to dbspaces.
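As an illustration of such a scheme (a sketch only: the dbspace names and the use of CRC32 hashing are assumptions for illustration, not Informix syntax), a hash fragmentation rule that assigns each row to a dbspace might look like:

```python
import zlib

# Hypothetical dbspace names; a real system would configure these.
DBSPACES = ["dbspace1", "dbspace2", "dbspace3"]

def fragment_for(row_key: str) -> str:
    """Deterministically map a row key to one of the dbspaces,
    the kind of rule a FRAGMENT BY clause expresses declaratively."""
    return DBSPACES[zlib.crc32(row_key.encode()) % len(DBSPACES)]
```

The same key always maps to the same fragment, so the server can route both inserts and lookups without scanning every dbspace.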

2. What is Concurrency control?

Concurrency control is the activity of coordinating concurrent accesses to a database in a multiuser system. Concurrency control allows users to access a database in a multiprogrammed fashion while preserving the consistency of the data.

3. What is Persistence?

Persistence is the property of an object through which its existence transcends time (i.e. the object continues to exist after its creator ceases to exist) and/or space (i.e. the object's location moves from the address space in which it was created).

4. What is Transaction Processing?

A Transaction Processing System (TPS) is an information system that processes data transactions in a database system and monitors transaction programs (a special kind of program). For example, when an electronic payment is made, the amount must be both withdrawn from one account and added to the other; the system cannot complete only one of those steps. Either both must occur, or neither. In case of a failure preventing transaction completion, the partially executed transaction must be rolled back by the TPS.
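The all-or-nothing behaviour described above can be sketched with SQLite transactions (the account table and amounts are illustrative, not from the answer key):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Withdraw from src and deposit to dst: both steps or neither."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            (bal,) = conn.execute("SELECT balance FROM account WHERE id = ?",
                                  (src,)).fetchone()
            if bal < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
        return True
    except ValueError:
        return False
```

A failed transfer leaves both balances exactly as they were, because the rollback undoes the partial withdrawal.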

5. What is Client/Server model?


• The server in a client/server model is simply the DBMS, whereas the client is the database application serviced by the DBMS.

• The client/server model of a database system is classified into basic and distributed client/server models.

6. What is the difference between data warehousing and data mining?

Data warehousing: It is the process that is used to integrate and combine data from
multiple sources and format into a single unified schema. So it provides the enterprise
with a storage mechanism for its huge amount of data.

Data mining: It is the process of extracting interesting patterns and knowledge from
huge amount of data. So we can apply data mining techniques on the data warehouse of
an enterprise to discover useful patterns.

7. Why do we need Normalization?

Normalization is a process for eliminating redundant data and establishing meaningful relationships among tables, based on rules, in order to maintain the integrity of data. It is done to save storage space and also for performance tuning.
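A small sketch of the redundancy that normalization removes (the schema is illustrative, not from the answer key): in a flat table the publisher's city would repeat on every book row, while after decomposition each publisher fact is stored exactly once and the original rows can still be reconstructed with a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- normalized: each publisher fact appears exactly once
CREATE TABLE publisher (name TEXT PRIMARY KEY, city TEXT);
CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT,
                   publisher TEXT REFERENCES publisher(name));
""")
conn.execute("INSERT INTO publisher VALUES ('Acme', 'Chennai')")
conn.executemany("INSERT INTO book VALUES (?, ?, 'Acme')",
                 [("111", "DB Basics"), ("222", "DB Advanced")])

# the flat (redundant) view is recoverable with a join
rows = conn.execute("""SELECT b.isbn, b.title, p.name, p.city
                       FROM book b JOIN publisher p ON b.publisher = p.name
                       ORDER BY b.isbn""").fetchall()
```

Changing the publisher's city is now a one-row update instead of an update to every book row.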

8. What is Integrity?

Integrity refers to the process of ensuring that a database remains an accurate reflection of the universe of discourse it is modeling or representing. In other words, there is a close correspondence between the facts stored in the database and the real world it models.
9. Give two features of Multimedia Databases

• Multimedia database systems are used when it is required to administer huge amounts of multimedia data objects of different media types (optical storage, video, tapes, audio records, etc.) so that they can be used (that is, efficiently accessed and searched) by as many applications as needed.

• The objects of multimedia data are: text, images, graphics, sound recordings, video recordings, signals, etc., that are digitized and stored.

10. What are Deductive Databases?

A Deductive Database is the combination of a conventional database containing facts, a knowledge base containing rules, and an inference engine which allows the derivation of information implied by the facts and rules.
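A toy sketch of these three components (the predicate names and the single grandparent rule are made up for illustration): stored facts, one rule (grandparent(X, Z) :- parent(X, Y), parent(Y, Z)), and a forward-chaining inference engine that derives the implied facts.

```python
# Facts: a set of (predicate, arg1, arg2) tuples.
facts = {("parent", "ann", "bob"), ("parent", "bob", "cal")}

def infer(facts):
    """Apply the grandparent rule repeatedly until no new fact is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        parents = [f for f in derived if f[0] == "parent"]
        for (_, x, y) in parents:
            for (_, y2, z) in parents:
                if y == y2 and ("grandparent", x, z) not in derived:
                    derived.add(("grandparent", x, z))
                    changed = True
    return derived
```

The derived fact grandparent(ann, cal) is never stored explicitly; it exists only as a consequence of the facts and the rule, which is exactly the deductive-database idea.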

Part B – (5*16=80 Marks)

11. (a)Explain the architecture of Distributed Databases. (16)


(b)Write notes on the following:
(i) Query processing. (8)
(ii) Transaction processing. (8)

A transaction is a collection of actions that make consistent transformations of system states while preserving system consistency.

• concurrency transparency
• failure transparency

Example Transaction – SQL Version


Begin_transaction Reservation
begin
input(flight_no, date, customer_name);
EXEC SQL UPDATE FLIGHT
SET STSOLD = STSOLD + 1
WHERE FNO = flight_no AND DATE = date;
EXEC SQL INSERT
INTO FC(FNO, DATE, CNAME, SPECIAL)
VALUES (flight_no, date, customer_name, null);
output("reservation completed")
end. {Reservation}

Properties of Transactions

ATOMICITY
• all or nothing

CONSISTENCY
• no violation of integrity constraints

ISOLATION
• concurrent changes invisible ⇒ serializable

DURABILITY
• committed updates persist

These are the ACID properties of a transaction.

Atomicity

Either all or none of the transaction's operations are performed. Atomicity requires that if a
transaction is interrupted by a failure, its partial results must be undone. The activity of
preserving the transaction's atomicity in presence of transaction aborts due to input errors,
system overloads, or deadlocks is called transaction recovery. The activity of ensuring atomicity
in the presence of system crashes is called crash recovery.

Consistency

Internal consistency

• A transaction which executes alone against a consistent database leaves it in a consistent state.

• Transactions do not violate database integrity constraints.

Transactions are correct programs.

Isolation

Degree 0

• Transaction T does not overwrite dirty data of other transactions.

• Dirty data refers to data values that have been updated by a transaction prior to its commitment.

Degree 1

• T does not overwrite dirty data of other transactions.

• T does not commit any writes before EOT.

Degree 2

• T does not overwrite dirty data of other transactions.

• T does not commit any writes before EOT.

• T does not read dirty data from other transactions.

Degree 3

• T does not overwrite dirty data of other transactions.

• T does not commit any writes before EOT.

• T does not read dirty data from other transactions.

• Other transactions do not dirty any data read by T before T completes.

Isolation

Serializability

• If several transactions are executed concurrently, the results must be the same as if they were executed serially in some order.

Incomplete results

• An incomplete transaction cannot reveal its results to other transactions before its commitment.

• Necessary to avoid cascading aborts.
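The serializability requirement above can be checked mechanically with a precedence graph: draw an edge Ti → Tj for each pair of conflicting operations (same item, at least one write) where Ti's operation comes first, and test the graph for cycles. A minimal sketch, assuming a schedule is a list of (transaction, operation, item) triples (this format is an assumption, not from the answer key):

```python
def conflict_edges(schedule):
    """Precedence-graph edges from conflicting operation pairs."""
    edges = set()
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges.add((t1, t2))
    return edges

def is_conflict_serializable(schedule):
    """Serializable iff the precedence graph is acyclic (Kahn's algorithm)."""
    edges = conflict_edges(schedule)
    nodes = {t for t, _, _ in schedule}
    while nodes:
        # transactions with no incoming edge from a remaining transaction
        free = {n for n in nodes
                if not any(a in nodes and b == n for a, b in edges)}
        if not free:
            return False  # a cycle remains: not serializable
        nodes -= free
    return True
```

For example, the schedule r1(x) w2(x) w1(x) produces edges T1 → T2 and T2 → T1, a cycle, so it is not conflict-serializable.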

Durability: Once a transaction commits, the system must guarantee that the results of its
operations will never be lost, in spite of subsequent failures.

Database recovery

Transaction transparency: ensures all distributed transactions maintain the distributed database's integrity and consistency.

• A distributed transaction accesses data stored at more than one location.

• Each transaction is divided into a number of subtransactions, one for each site that has to be accessed.

• The DDBMS must ensure the indivisibility of both the global transaction and each of the subtransactions.

Concurrency transparency: all transactions must execute independently and be logically consistent with the results obtained if the transactions were executed in some arbitrary serial order.

• Replication makes concurrency more complex.

Failure transparency: the DDBMS must ensure the atomicity and durability of the global transaction.

• This means ensuring that the subtransactions of the global transaction either all commit or all abort.

• Classification of transactions: In IBM's Distributed Relational Database Architecture (DRDA), there are four types of transactions:

• Remote request

• Remote unit of work

• Distributed unit of work

• Distributed request.

12. (a)Discuss the Modeling and design approaches for Object Oriented
Databases

MODELING AND DESIGN


Basically, an OODBMS is an object database that provides DBMS capabilities to objects that
have been created using an object-oriented programming language (OOPL). The basic
principle is to add persistence to objects. Consequently,
application programmers who use OODBMSs typically write programs in a native OOPL
such as Java, C++ or Smalltalk, and the language has some kind of Persistent class, Database
class, Database Interface, or Database API that provides DBMS functionality as, effectively,
an extension of the OOPL.
Object-oriented DBMSs, however, go much beyond simply adding persistence to any one
object-oriented programming language. This is because, historically, many object-oriented
DBMSs were built to serve the market for computer-aided design/computer-aided
manufacturing (CAD/CAM) applications in which features like fast navigational access,
versions, and long transactions are extremely important.
Object-oriented DBMSs, therefore, support advanced object-oriented database applications
with features like support for persistent objects from more than one programming language,
distribution of data, advanced transaction models, versions, schema evolution, and dynamic
generation of new types.
Object data modeling
An object consists of three parts: structure (attributes, and relationships to other objects such as aggregation and association), behavior (a set of operations), and characteristics of types (generalization/specialization). An object is similar to an entity in the ER model; therefore we begin with an example to demonstrate structure and relationships.

Attributes are like the fields in a relational model. However, in the Book example we have, for the attributes publishedBy and writtenBy, complex types Publisher and Author, which are also objects. In an RDBMS, attributes with complex objects are usually other tables linked by keys to the Book table.

Relationships: publish and writtenBy are associations with 1:N and 1:1 relationships; composed of is an aggregation (a Book is composed of chapters). The 1:N relationship is usually realized through complex-typed attributes and at the behavioral level.

Generalization/Specialization is the is-a relationship, which is supported in OODBs through the class hierarchy. An ArtBook is a Book; therefore the ArtBook class is a subclass of the Book class. A subclass inherits all the attributes and methods of its superclass.

Message: the means by which objects communicate; it is a request from one object to another to execute one of its methods. For example: Publisher_object.insert("Rose", 123…), i.e. a request to execute the insert method on a Publisher object.

Method: defines the behavior of an object. Methods can be used to change an object's state by modifying its attribute values, or to query the values of selected attributes. The method that responds to the message in the example is the method insert defined in the Publisher class. The main differences between relational database design and object-oriented database design include:

• Many-to-many relationships must be removed before entities can be translated into relations; many-to-many relationships can be implemented directly in an object-oriented database.

• Operations are not represented in the relational data model; operations are one of the main components in an object-oriented database.

• In the relational data model, relationships are implemented by primary and foreign keys; in the object model, objects communicate through their interfaces. The interface describes the data (attributes) and operations (methods) that are visible to other objects.
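A hedged Python sketch of the Book example above, showing structure (attributes, association, aggregation), behavior (methods), and generalization (ArtBook is-a Book). The attribute names publishedBy and writtenBy follow the text; everything else is illustrative.

```python
class Publisher:
    """Complex type referenced by Book.publishedBy."""
    def __init__(self, name):
        self.name = name

class Author:
    """Complex type referenced by Book.writtenBy."""
    def __init__(self, name):
        self.name = name

class Chapter:
    def __init__(self, title):
        self.title = title

class Book:
    def __init__(self, title, published_by, written_by):
        self.title = title
        self.publishedBy = published_by   # association to a Publisher object
        self.writtenBy = written_by       # association to an Author object
        self.chapters = []                # aggregation: composed of Chapters

    def add_chapter(self, chapter):       # behavior: a method
        self.chapters.append(chapter)

class ArtBook(Book):                      # generalization: ArtBook is-a Book
    def __init__(self, title, published_by, written_by, art_style):
        super().__init__(title, published_by, written_by)
        self.art_style = art_style        # hypothetical subclass attribute
```

Note how the 1:N and aggregation relationships become object references rather than foreign keys, and how ArtBook inherits add_chapter from its superclass.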
(b) Explain the Multi-Version Locks and Recovery in Query Languages. (16)

Multi-Version Locks

Multiversion concurrency control (abbreviated MCC or MVCC) is, in the database field of computer science, a concurrency control method commonly used by database management systems to provide concurrent access to the database, and in programming languages to implement transactional memory.

For instance, a database will implement updates not by deleting an old piece of data and
overwriting it with a new one, but instead by marking the old data as obsolete and adding the
newer "version." Thus there are multiple versions stored, but only one is the latest. This allows
the database to avoid overhead of filling in holes in memory or disk structures but requires
(generally) the system to periodically sweep through and delete the old, obsolete data objects.
For a document-oriented database such as CouchDB, Riak or MarkLogic Server, it also allows the system to optimize documents by writing entire documents onto contiguous sections of disk: when updated, the entire document can be re-written rather than bits and pieces cut out or maintained in a linked, non-contiguous database structure.

MVCC also provides potential "point in time" consistent views. In fact read transactions under
MVCC typically use a timestamp or transaction ID to determine what state of the DB to read,
and read these "versions" of the data.

This avoids managing locks for read transactions, because writes can be isolated by virtue of the old versions being maintained rather than through a process of locks or mutexes. Writes create a newer version, but at the transaction ID the read is working at, everything is guaranteed to be consistent because the writes occur at a later transaction ID.
In other words, MVCC provides each user connected to the database with a "snapshot" of the
database for that person to work with. Any changes made will not be seen by other users of the
database until the transaction has been committed.

MVCC uses timestamps or increasing transaction IDs to achieve transactional consistency. MVCC ensures a transaction never has to wait for a database object by maintaining several versions of an object. Each version has a write timestamp, and a transaction (Ti) is allowed to read the most recent version of an object which precedes the transaction's timestamp (TS(Ti)).

If a transaction (Ti) wants to write to an object, and if there is another transaction (Tk), the
timestamp of Ti must precede the timestamp of Tk (i.e., TS(Ti) < TS(Tk)) for the object write
operation to succeed, which is to say a write cannot complete if there are outstanding
transactions with an earlier timestamp.

Every object would also have a read timestamp, and if a transaction Ti wanted to write to object
P, and the timestamp of that transaction is earlier than the object's read timestamp (TS(Ti) <
RTS(P)), the transaction Ti is aborted and restarted. Otherwise, Ti creates a new version of P and
sets the read/write timestamps of P to the timestamp of the transaction TS (Ti).
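The read and write rules just described can be sketched as follows (a toy model; the class names and the single read timestamp per object are illustrative assumptions, not any particular DBMS's implementation):

```python
class Aborted(Exception):
    """Raised when a write arrives too late and must restart."""

class MVCCObject:
    def __init__(self):
        self.versions = []   # (write_ts, value) pairs, kept sorted
        self.read_ts = 0     # largest timestamp that has read this object

    def read(self, ts):
        """Return the most recent version written at or before ts."""
        candidates = [(wts, val) for wts, val in self.versions if wts <= ts]
        if not candidates:
            return None
        self.read_ts = max(self.read_ts, ts)   # record RTS(P)
        return max(candidates)[1]

    def write(self, ts, value):
        """Abort if a later transaction has already read this object,
        i.e. TS(Ti) < RTS(P); otherwise create a new version."""
        if ts < self.read_ts:
            raise Aborted(f"write at {ts} conflicts with read at {self.read_ts}")
        self.versions.append((ts, value))
        self.versions.sort()
```

Readers never block: they simply pick the newest version older than their own timestamp, while a writer that would invalidate an already-performed read is aborted and restarted.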

The obvious drawback to this system is the cost of storing multiple versions of objects in the
database. On the other hand reads are never blocked, which can be important for workloads
mostly involving reading values from the database. MVCC is particularly adept at implementing
true snapshot isolation, something which other methods of concurrency control frequently do
either incompletely or with high performance costs.

At t1 the state of the database could be:

Time   Object1   Object2
t1     "Hello"   "Bar"
t0     "Foo"     "Bar"

This indicates that the current set of this database (perhaps a key-value store database) is Object1="Hello", Object2="Bar". Previously, Object1 was "Foo", but that value has been superseded. It is not deleted, because the database holds "multiple versions", but it will be deleted later.

If a long running transaction starts a read operation, it will operate at transaction "t1" and see this
state. If there is a concurrent update (during that long-running read transaction) which deletes
Object 2 and adds Object 3 = “foo-bar” the database will look like

Time   Object1   Object2    Object3
t2     "Hello"   (deleted)  "Foo-Bar"
t1     "Hello"   "Bar"
t0     "Foo"     "Bar"

Now there is a new version as of transaction ID t2. Note, critically, that the long-running read transaction still has access to a coherent snapshot of the system at t1, even though the write transaction added data as of t2; the read transaction is able to run in isolation from the update transaction that created the t2 values. This is how MVCC allows isolated, ACID reads without any locks.

Recovery
13. (a) Discuss in detail Data Warehousing and Data Mining.
