
Subject : Advanced Databases

ORAL QUESTIONS

GENERAL DBMS QUESTIONS

1. What is data abstraction ?


Data abstraction is the enforcement of a clear separation between
the abstract properties of a data type and the concrete details of its
implementation. The abstract properties are those that are visible to
client code that makes use of the data type--the interface to the data
type--while the concrete implementation is kept entirely private, and
indeed can change, for example to incorporate efficiency
improvements over time. The idea is that such changes are not
supposed to have any impact on client code, since they involve no
difference in the abstract behaviour.
For example, one could define an abstract data type called lookup table,
where keys are uniquely associated with values, and values may be
retrieved by specifying their corresponding keys. Such a lookup table
may be implemented in various ways: as a hash table, a binary search tree, or
even a simple linear list. As far as client code is concerned, the abstract
properties of the type are the same in each case.
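A minimal sketch of the lookup table idea, assuming Python and hypothetical class names (`LookupTable`, `HashLookupTable`, `ListLookupTable`). The client function sees only the abstract interface, so either implementation can be swapped in without changing it.

```python
from abc import ABC, abstractmethod


class LookupTable(ABC):
    """Abstract interface: each key is associated with exactly one value."""

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def get(self, key): ...


class HashLookupTable(LookupTable):
    """Implementation backed by a hash table (a Python dict)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class ListLookupTable(LookupTable):
    """Implementation backed by a simple linear list of (key, value) pairs."""

    def __init__(self):
        self._pairs = []

    def put(self, key, value):
        for i, (k, _) in enumerate(self._pairs):
            if k == key:
                self._pairs[i] = (key, value)  # overwrite to keep keys unique
                return
        self._pairs.append((key, value))

    def get(self, key):
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)


def client(table):
    """Client code: uses only put/get and never looks at the storage."""
    table.put("a", 1)
    table.put("b", 2)
    table.put("a", 3)
    return table.get("a"), table.get("b")
```

Both `client(HashLookupTable())` and `client(ListLookupTable())` return the same pair, which is the point of the abstraction.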

2. What are 3 levels of data abstraction ?


Since many users of database systems are not deeply familiar with
computer data structures, database developers often hide complexity
through the following levels:

Data abstraction levels of a database system


Physical level: The lowest level of abstraction describes how the data is
actually stored. The physical level describes complex low-level data
structures in detail.
Logical level: The next higher level of abstraction describes what data are
stored in the database, and what relationships exist among those data. The
logical level thus describes an entire database in terms of a small number of
relatively simple structures. Although implementation of the simple
structures at the logical level may involve complex physical level structures,
the user of the logical level does not need to be aware of this complexity.
Database administrators, who must decide what information to keep in a
database, use the logical level of abstraction.
View level: The highest level of abstraction describes only part of the entire
database. Even though the logical level uses simpler structures, complexity
remains because of the variety of information stored in a large database.
Many users of a database system do not need all this information; instead,
they need to access only a part of the database. The view level of abstraction
exists to simplify their interaction with the system. The system may provide
many views for the same database.

3. Explain Normalization, 1NF, 2NF, 3NF, Boyce-Codd NF, 4NF


Database normalization, sometimes referred to as canonical
synthesis, is a technique for designing relational database tables to
minimize duplication of information and, in so doing, to safeguard the
database against certain types of logical or structural problems,
namely data anomalies. For example, when multiple instances of a
given piece of information occur in a table, the possibility exists that
these instances will not be kept consistent when the data within the
table is updated, leading to a loss of data integrity. A table that is
sufficiently normalized is less vulnerable to problems of this kind,
because its structure reflects the basic assumptions for when multiple
instances of the same information should be represented by a single
instance only.
1NF: A table is in first normal form (1NF) if and only if it represents a
relation. Given that database tables embody a relation-like form, the
defining characteristic of one in first normal form is that it does not
allow duplicate rows or nulls. Simply put, a table with a unique key
(which, by definition, prevents duplicate rows) and without any
nullable columns is in 1NF.
2NF: The table must be in 1NF.
• None of the non-prime attributes of the table are functionally
dependent on a part (proper subset) of a candidate key; in other words,
all functional dependencies of non-prime attributes on candidate keys
are full functional dependencies. For example, consider an
"Employees' Skills" table whose attributes are Employee ID,
Employee Name, and Skill; and suppose that the combination of
Employee ID and Skill uniquely identifies records within the table.
Given that Employee Name depends on only one of those attributes –
namely, Employee ID – the table is not in 2NF.
• In simple terms, a table is 2NF if it is in 1NF and all fields are
dependent on the whole of the primary key, or a relation is in 2NF if it
is in 1NF and every non-key attribute is fully dependent on each
candidate key of the relation.
• Note that if none of a 1NF table's candidate keys are composite – i.e.
every candidate key consists of just one attribute – then we can say
immediately that the table is in 2NF.
• All columns must be a fact about the entire key, and not a subset of
the key.
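A small sketch, assuming Python, of how the partial dependency in the "Employees' Skills" example can be checked on sample rows. The function name `fd_holds` and the sample data are hypothetical. Note that a dependency holding in one sample does not prove it holds for the schema; the check can only refute a dependency.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val the first time and returns the stored value after
        if seen.setdefault(key, val) != val:
            return False
    return True


# Candidate key is (emp_id, skill); the rows are made up for illustration.
employee_skills = [
    {"emp_id": 1, "emp_name": "Asha", "skill": "SQL"},
    {"emp_id": 1, "emp_name": "Asha", "skill": "Java"},
    {"emp_id": 2, "emp_name": "Ravi", "skill": "SQL"},
]

# emp_name depends on emp_id alone, a proper subset of the key: a 2NF violation.
partial_dependency = fd_holds(employee_skills, ["emp_id"], ["emp_name"])
# skill is not determined by emp_id alone, so emp_id is not a key by itself.
emp_id_is_key = fd_holds(employee_skills, ["emp_id"], ["skill"])
```

Moving Employee ID and Employee Name to their own table removes the partial dependency.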

3NF: The criteria for third normal form (3NF) are:


• The table must be in 2NF.
• Transitive dependencies must be eliminated. All attributes must
rely only on the primary key. So, if a database has a table with
columns Student ID, Student, Company, and Company Phone
Number, it is not in 3NF. This is because the Phone number relies on
the Company. So, for it to be in 3NF, there must be a second table
with Company and Company Phone Number columns; the Phone
Number column in the first table would be removed.

Boyce-Codd NF: A table is in Boyce-Codd normal form (BCNF) if
and only if, for every one of its non-trivial functional dependencies X
→ Y, X is a superkey, that is, X is either a candidate key or a superset
thereof.
4NF: A table is in fourth normal form (4NF) if and only if, for every
one of its non-trivial multivalued dependencies X ↠ Y, X is a superkey,
that is, X is either a candidate key or a superset thereof.
• For example, if a person can have several phone numbers and several
email addresses, independently of each other, then the two should not be
stored in the same table.

4. What is Object Oriented Databases ?


In an object database (also object oriented database), information is
represented in the form of objects as used in object-oriented programming.
When database capabilities are combined with object programming
language capabilities, the result is an object database management
system (ODBMS). An ODBMS makes database objects appear as
programming language objects in one or more object programming
languages. An ODBMS extends the programming language with
transparently persistent data, concurrency control, data recovery,
associative queries, and other capabilities.

5. What is ORDBMS ?
An object-relational database (ORD) or object-relational database
management system (ORDBMS) is a database management system
(DBMS) similar to a relational database, but with an object-oriented
database model: objects, classes and inheritance are directly supported
in database schemas and in the query language. In addition, it supports
extension of the data model with custom data-types and methods.
One aim for this type of system is to bridge the gap between
conceptual data modeling techniques such as Entity-relationship
diagram (ERD) and object-relational mapping (ORM), which often use
classes and inheritance, and relational databases, which do not directly
support them.
Another, related, aim is to bridge the gap between relational databases
and the object-oriented modeling techniques used in programming
languages such as Java, C++ or C#. However, a more popular
alternative for achieving such a bridge is to use a standard relational
database system with some form of ORM software.

6. Explain

7. What are Acid Properties ?


In computer science, ACID (Atomicity, Consistency, Isolation,
Durability) is a set of properties that guarantee that database transactions
are processed reliably. In the context of databases, a single logical
operation on the data is called a transaction.

Atomicity
Atomicity refers to the ability of the DBMS to guarantee that either all of the
tasks of a transaction are performed or none of them are. For example, the
transfer of funds can be completed or it can fail for a multitude of reasons,
but atomicity guarantees that one account won't be debited if the other is not
credited. Atomicity states that database modifications must follow an “all or
nothing” rule. Each transaction is said to be “atomic.” If one part of the
transaction fails, the entire transaction fails. It is critical that the database
management system maintain the atomic nature of transactions in spite of
any DBMS, operating system or hardware failure. Note that this is a different
sense of the word from an atomic attribute, which is one whose value can no
longer be broken down any further.

Consistency
The Consistency property ensures that the database remains in a consistent
state before the start of the transaction and after the transaction is over
(whether successful or not).
Consistency states that only valid data will be written to the database. If, for
some reason, a transaction is executed that violates the database’s
consistency rules, the entire transaction will be rolled back and the database
will be restored to a state consistent with those rules. On the other hand, if a
transaction successfully executes, it will take the database from one state
that is consistent with the rules to another state that is also consistent with
the rules.

Isolation
Isolation refers to the requirement that other operations cannot access or see
the data in an intermediate state during a transaction. This constraint is
required to maintain the performance as well as the consistency between
transactions in a DBMS system.

Durability
Durability refers to the guarantee that once the user has been notified of
success, the transaction will persist, and not be undone. This means it will
survive system failure, and that the database system has checked the integrity
constraints and won't need to abort the transaction. Many databases
implement durability by writing all transactions into a log that can be played
back to recreate the system state right before a failure. A transaction can
only be deemed committed after it is safely in the log.
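
The funds-transfer example can be sketched with Python's standard `sqlite3` module. The table name `account`, the helper names and the balances are assumptions made for illustration. The connection is used as a context manager, which commits if the block succeeds and rolls back if it raises, so a failed transfer leaves both balances untouched.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Debit src and credit dst as one all-or-nothing transaction."""
    try:
        with conn:  # commit on success, roll back on any exception
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            (lowest,) = conn.execute("SELECT MIN(balance) FROM account").fetchone()
            if lowest < 0:
                # consistency rule (assumed here): no negative balances
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {account id: balance} for inspection."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()
```

An in-memory database does not demonstrate durability; that would need a file-backed database and a crash test.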

UNIT 1
1. Explain architecture of parallel database And explain with
example
4 types of PDB architectures based on arrangement of processors,
disks and memory:
Shared memory, shared disk, shared nothing and hierarchical
2. Explain speedup and scale up w.r.to parallel databases
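Speedup : the same task runs in less time as resources (processors, disks)
are added. Speedup = Ts/Tl, where Ts is the time on the smaller system and
Tl the time on the larger one; it is linear if an N-times larger system
gives speedup N.
Scaleup : an N-times larger task runs on an N-times larger system in the
same time. Scaleup = Ts/Tl, where Ts is the time for the small task on the
small system and Tl the time for the large task on the large system; it is
linear if Ts/Tl = 1.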
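
A worked example of the two ratios, assuming Python and made-up timings. Here Ts is the elapsed time on the smaller system and Tl the elapsed time on the larger one; for scaleup the larger system also runs a proportionally larger task.

```python
def speedup(t_small_system, t_large_system):
    """Same task on both systems: Ts / Tl."""
    return t_small_system / t_large_system


def scaleup(t_small_task_small_system, t_large_task_large_system):
    """N-times larger task on an N-times larger system: Ts / Tl."""
    return t_small_task_small_system / t_large_task_large_system


# Hypothetical measurements, in seconds.
s = speedup(100.0, 25.0)    # 1 processor vs 4 processors, same task
u = scaleup(100.0, 125.0)   # 4x task on 4x processors takes a bit longer
```

With these numbers the speedup is 4 (linear for a 4x system) and the scaleup is 0.8 (sublinear, since linear scaleup would be 1).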

3. Explain different partitioning techniques


1. round robin 2. hash partitioning 3. range partitioning
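
The three techniques can be sketched as follows, assuming Python, with in-memory lists standing in for disks. The function names and the `key` extractor are hypothetical.

```python
import bisect


def round_robin(tuples, n):
    """The i-th tuple goes to disk i mod n."""
    parts = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        parts[i % n].append(t)
    return parts


def hash_partition(tuples, n, key):
    """A tuple goes to disk hash(partitioning attribute) mod n."""
    parts = [[] for _ in range(n)]
    for t in tuples:
        parts[hash(key(t)) % n].append(t)
    return parts


def range_partition(tuples, vector, key):
    """vector is a sorted partitioning vector [v0, v1, ...].

    Disk 0 gets key < v0, disk i gets v(i-1) <= key < v(i),
    and the last disk gets key >= the last vector entry.
    """
    parts = [[] for _ in range(len(vector) + 1)]
    for t in tuples:
        parts[bisect.bisect_right(vector, key(t))].append(t)
    return parts
```

Round robin balances load best for full scans, hash partitioning suits point lookups on the partitioning attribute, and range partitioning suits range queries but is the most exposed to skew.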

4. Explain intraquery and interquery parallelism


intraquery : a single query is executed in parallel on multiple processors
and disks (its operations, or parts of one operation, run in parallel).
This reduces the response time of long-running queries (speedup).
interquery : different queries or transactions are executed in parallel
with one another.
This increases throughput but not the response time of an individual
query (scaleup).
5. Describe the good way to parallelize
a. The difference operation
b. Count, avg : 1. Partition the relation on grouping attributes
2. Compute the aggregate value locally on each
processor.
c. Join : any of the 4 methods
1. partitioned join
2. fragment and replicate join
3. partitioned parallel hash join
4. parallel nested loop join
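
A sketch of the partitioned join for an equi-join, assuming Python. The loop over partitions is sequential here; each iteration stands for the work one processor would do locally. All names and sample relations are hypothetical.

```python
from collections import defaultdict


def partition(relation, attr, n):
    """Hash-partition a relation (list of dicts) on the join attribute."""
    parts = [[] for _ in range(n)]
    for t in relation:
        parts[hash(t[attr]) % n].append(t)
    return parts


def local_hash_join(r_part, s_part, attr):
    """Join one pair of partitions with an in-memory hash index on s."""
    index = defaultdict(list)
    for s in s_part:
        index[s[attr]].append(s)
    return [{**r, **s} for r in r_part for s in index.get(r[attr], [])]


def partitioned_join(r, s, attr, n):
    """Tuples with equal join values land in the same partition number,
    so partition i of r only needs to be joined with partition i of s."""
    r_parts, s_parts = partition(r, attr, n), partition(s, attr, n)
    out = []
    for i in range(n):  # stands for processor i
        out.extend(local_hash_join(r_parts[i], s_parts[i], attr))
    return out


r = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 3, "a": "z"}]
s = [{"id": 1, "b": 10}, {"id": 3, "b": 30}, {"id": 3, "b": 31}]
joined = sorted(partitioned_join(r, s, "id", 2), key=lambda t: (t["id"], t["b"]))
```

This works only for equi-joins; other join conditions need fragment and replicate.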
6. Explain skew handling w r to P.D
1. balanced range partitioning vector can be constructed by sorting
2. Use virtual processors to distribute the work.

UNIT 2

7. What is distributed database system ? explain with example


A distributed database management system is a software system that permits
the management of a distributed database and makes the distribution
transparent to the users. A distributed database is a collection of
multiple, logically interrelated databases distributed over a computer
network. Sometimes "distributed database system" is used to refer
jointly to the distributed database and the distributed DBMS.
A distributed database is a database that is under the control of a central
database management system (DBMS) in which storage devices are not all
attached to a common CPU. It may be stored in multiple computers
located in the same physical location, or may be dispersed over a
network of interconnected computers.
Collections of data (eg. in a database) can be distributed across multiple
physical locations. A distributed database is distributed into separate
partitions/fragments. Each partition/fragment of a distributed database may be
replicated (ie. redundant fail-overs, RAID like).
Besides distributed database replication and fragmentation, there are many
other distributed database design technologies. For example, local
autonomy, synchronous and asynchronous distributed database technologies.
These technologies' implementation can and does depend on the needs of the
business and the sensitivity/confidentiality of the data to be stored in the
database, and hence the price the business is willing to spend on ensuring
data security, consistency and integrity.

8. What is homogeneous and heterogeneous distributed system ?


Homogeneous DDBS : all sites use the same schema and DBMS software, and
are aware of each other.
Heterogeneous DDBS : sites may use different schemas and DBMS software,
and may be unaware of each other.

9. What is distributed data storage ?


How a relation is stored at different sites: by replication, fragmentation
or both.

10.What is the role of transaction manager in distributed system ?


To manage access to data stored at that site
To maintain a log for recovery purpose.
Concurrency control scheme to control concurrent execution of
transactions at THAT site.

11.What is the role of transaction coordinator in distributed system ?


To coordinate execution of transactions(local/global) initiated at that
site.
To start transaction.
Divide into subtransactions and distribute subtransactions to
appropriate sites.
To coordinate termination of transactions.

12.What are system failure modes in d.s.


Site failure.
Lost messages.
Communication link failure.
Network partition.

13.Explain availability w.r.to d.s.


System should continue normal functioning even if some site fails.
Concurrency protocols can be modified to allow availability:
1. Majority based approach 2. Read one, write all available approach.
Failed site should be reintegrated properly.
In case of coordinator failure : backup coordinator can be used
or new coordinator can be selected by election algo.

14.Consider a d.s with 2 sites A and B Can site A distinguish among


the following
a. B goes down
b. The link between A and B goes down
c. B is extremely loaded
No. It cannot distinguish between the above cases. Site A can
detect failure, but it cannot determine the reason of failure.

15.Explain working of election algorithm


Every site is given a unique ID and site with highest ID becomes
coordinator.
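
One common way to realise this is the bully algorithm; the sketch below assumes Python and models message exchange as a simple membership test on a set of live sites. The function name and the site IDs are hypothetical.

```python
def bully_election(initiator, sites, alive):
    """Return the ID of the new coordinator.

    The initiator asks every site with a higher ID. If none of them is
    alive to answer, the initiator wins; otherwise a higher live site
    takes over the election and repeats the same step.
    """
    higher = [s for s in sorted(sites) if s > initiator and s in alive]
    if not higher:
        return initiator
    return bully_election(higher[0], sites, alive)


sites = {1, 2, 3, 4, 5}
alive = {1, 2, 4}          # sites 3 and 5 (the old coordinator) are down
coordinator = bully_election(1, sites, alive)
```

The result is always the live site with the highest ID, whichever live site starts the election.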

16.What are directories?


Directory is a listing of info about some class of objects.

17.Give examples of directories


telephone directory.
Favourites in web browser.
18.What is directory system ?
In software engineering, a directory is similar to a dictionary; it enables
the look up of a name and information associated with that name. As a
word in a dictionary may have multiple definitions, in a directory, a
name may be associated with multiple, different, pieces of
information. Likewise, as a word may have different parts and
different definitions, a name in a directory may have many different
types of data. Based on this rudimentary explanation of a directory, a
directory service is simply the software system that stores, organizes
and provides access to information in a directory.
Directories may be very narrow in scope, supporting only a small set of node
types and data types, or they may be very broad, supporting an arbitrary or
extensible set of types. In a telephone directory, the nodes are names and the
data items are telephone numbers. In the DNS the nodes are domain names or
internet addresses. In a directory used by a network operating system, the
nodes represent resources that are managed by the OS, including users,
computers, printers and other shared resources. Many different directory
services have been used since the advent of the Internet, including those
that have descended from the X.500 directory service.

19.What is LDAP ?
The Lightweight Directory Access Protocol, or LDAP (IPA: [ˈɛl dæp]),
is an application protocol for querying and modifying directory services
running over TCP/IP.
A directory is a set of objects with similar attributes organised in a logical
and hierarchical manner. The most common example is the telephone
directory, which consists of a series of names (either of persons or
organizations) organized alphabetically, with each name having an address
and phone number attached.
An LDAP directory tree often reflects various political, geographic, and/or
organizational boundaries, depending on the model chosen. LDAP
deployments today tend to use Domain name system (DNS) names for
structuring the topmost levels of the hierarchy. Deeper inside the directory
might appear entries representing people, organizational units, printers,
documents, groups of people or anything else that represents a given tree
entry (or multiple entries).

20.Explain LDIF format for LDAP


LDAP data interchange format.
21.How querying mechanism works in LDAP ?
Consists of simple selections and projections, no joins.
Query can be fired directly or API can be used.
Query consists of search condition, base, return attributes, limit and
scope.
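
The parts of a query (base, scope, search condition, return attributes, limit) can be illustrated with an in-memory model, assuming Python. This is not a real LDAP client: the directory is a plain dict keyed by distinguished name, the function `ldap_search` and all entries are hypothetical, and escaped commas in names are ignored.

```python
def ldap_search(directory, base, scope, condition, attrs, limit=None):
    """Select entries under base that satisfy condition, project onto attrs."""
    results = []
    for dn in sorted(directory):
        entry = directory[dn]
        if scope == "base":      # only the base entry itself
            in_scope = dn == base
        elif scope == "one":     # only immediate children of the base
            in_scope = dn.endswith("," + base) and "," not in dn[: -len(base) - 1]
        else:                    # "sub": the base and everything below it
            in_scope = dn == base or dn.endswith("," + base)
        if in_scope and condition(entry):                 # selection
            results.append((dn, {a: entry[a] for a in attrs if a in entry}))  # projection
            if limit is not None and len(results) >= limit:
                break
    return results


directory = {
    "dc=example,dc=com": {"objectClass": "domain"},
    "ou=people,dc=example,dc=com": {"objectClass": "organizationalUnit"},
    "cn=alice,ou=people,dc=example,dc=com": {
        "objectClass": "person", "cn": "alice", "mail": "alice@example.com"},
    "cn=bob,ou=people,dc=example,dc=com": {
        "objectClass": "person", "cn": "bob", "mail": "bob@example.com"},
}

people = ldap_search(
    directory, "dc=example,dc=com", "sub",
    lambda e: e["objectClass"] == "person", ["mail"],
)
```

There is no join: a query only filters entries and picks attributes from each one.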

22.How LDAP works at client side ?


Client uses API to access LDAP server. A query is transparently
processed using referrals.

23.How does LDAP work ?

24.What are LDAP backends ?


LDAP servers??

25.What are LDAP objects ?

26.What are LDAP attributes ?


Entries in LDAP can have attributes.

27.How access control mechanism works in LDAP ?

28.Explain conf. file sections

29.What are benefits of LDAP ?


Simple n/w protocol to access directory info.
Referrals allow transparent access to a distributed LDAP tree.

UNIT 3
30.What is XML?
Extensible Markup Language.
Provides standard data format for data exchange between applications
over the web.

31.Name the XML parser and working of each in brief?


???
32.What is two, three, multi tier architecture?
2 tier architecture : tier 1 : web server and application server are
combined
tier 2 : data server
3 tier architecture : tier 1 : web server
tier 2 : application server
tier 3 : data server
N tier architecture : client has presentation GUI
tier 1 : presentation logic
tier 2 : business logic tier/proxy tier (SOAP)
tier 3 : database access tier
tier 4 : data tier

33.Explain XML DTD


Document type definition (DTD) specifies the schema of XML documents.
It specifies : 1. what elements may occur
2. how they may be nested
3. what their attributes are
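
A small illustration, assuming Python's standard `xml.etree.ElementTree` module. The DTD below declares which elements may occur, how they nest and which attribute `book` carries; the element names and values are made up. The standard-library parser reads the DTD but does not validate against it, so checking conformance would need a validating parser.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE library [
  <!ELEMENT library (book*)>
  <!ELEMENT book (title, author+)>
  <!ATTLIST book isbn CDATA #REQUIRED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
]>
<library>
  <book isbn="hypothetical-001">
    <title>Database Notes</title>
    <author>A. Writer</author>
  </book>
</library>
"""

root = ET.fromstring(DOCUMENT)

# Walk the structure the DTD describes: library -> book -> title, author+
books = [
    (b.get("isbn"), b.findtext("title"), [a.text for a in b.findall("author")])
    for b in root.findall("book")
]
```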

34.Explain SOAP
Simple Object Access Protocol : a protocol for invoking procedures by
specifying a standard XML format for procedure parameters and return
values, which are embedded in the SOAP XML envelope.
SOAP procedures can be invoked by any application and are
programming language independent.
SOAP typically uses HTTP as transport protocol.

UNIT 4
35.What is the need of Data warehousing ?
1. normal processing in operational database will get slowed down if
time is spent processing analytical queries
2. Analysis and decision making needs historic data

36.Explain OLAP
It is an interactive system which provides summary about
multidimensional data.

37.Explain OLTP
Interactive system which handles storing records about data created,
stored and used by business transactions.

38.What is difference between OLTP and OLAP


1. OLTP : views current data OLAP : views historic data
2. OLTP : handles operational data OLAP : handles informational data
3. OLTP : transaction processing OLAP : decision making
4. OLTP : E-R model OLAP : multidimensional data model

39.What is ROLAP,MOLAP,Hybrid OLAP


Relational OLAP : extension of a relational database acting between
front-end tools and the back-end relational DB, highly scalable
MOLAP : multidimensional data model
very fast computation
Hybrid OLAP : combination of both

40.What are different OLAP operation?


1. Roll up
2. Drill Down
3. Slice and dice
4. Pivot
5. Drill across
6. Drill through
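
A sketch of roll up, drill down, slice and dice on a tiny fact table, assuming Python. The `sales` rows, dimension names and helper names are hypothetical.

```python
from collections import defaultdict

sales = [
    {"city": "Pune", "state": "MH", "item": "pen", "year": 2007, "qty": 10},
    {"city": "Pune", "state": "MH", "item": "ink", "year": 2007, "qty": 4},
    {"city": "Mumbai", "state": "MH", "item": "pen", "year": 2008, "qty": 7},
    {"city": "Surat", "state": "GJ", "item": "pen", "year": 2008, "qty": 5},
]


def aggregate(facts, dims, measure="qty"):
    """Sum the measure for each combination of the given dimensions."""
    totals = defaultdict(int)
    for f in facts:
        totals[tuple(f[d] for d in dims)] += f[measure]
    return dict(totals)


def slice_(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    return [f for f in facts if f[dim] == value]


def dice(facts, **allowed):
    """Dice: restrict several dimensions to sets of values."""
    return [f for f in facts if all(f[d] in vals for d, vals in allowed.items())]


by_city = aggregate(sales, ["city"])     # finer level of the location hierarchy
by_state = aggregate(sales, ["state"])   # roll up: city -> state
pens_in_mh = dice(sales, state={"MH"}, item={"pen"})
```

Drill down is the reverse of roll up: going from `by_state` back to `by_city`.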

41.What are schemas for multidimensional Databases?


Star, snowflake, fact constellation

42.What is decision support system?


Helps managers make decisions by providing statistical
data.

43.What is data cube?


A data cube is the generalization of a cross-tab to n dimensions.

44.Explain architecture of data warehouse


3-tier architecture of data warehouse :
Bottom tier : Data warehouse server
Middle tier : OLAP server
Top tier : front end tools
45.What is Data mart?
It is a subset of corporate wide data that has value to a specific group
of users.

46.What are the different phases of data warehouse? Explain each


??
Extract data
Clean data
transform data
load
refresh

47.What are the forms of data pre-processing ?


Data cleaning
Data integration
Data transformation
data reduction

48.What is the need of cleaning Data?


Data collected in a warehouse may be incomplete, noisy and inconsistent.
Hence it needs to be cleaned before it is used for data mining.

UNIT 5
49.What is materialized view ?
A materialized view is a database object that contains the results of a query. They
are local copies of data located remotely, or are used to create summary tables
based on aggregations of a table's data. Materialized views, which store data
based on remote tables, are also known as snapshots. From Oracle 8i onward,
snapshots are called materialized views and the query rewrite feature is available.

Unlike an ordinary view, whose query is re-evaluated on every access, a
materialized view takes a different approach in which the query result is
cached as a concrete table that may be updated from the original base tables from
time to time. This enables much more efficient access, at the cost of some data
being potentially out-of-date. It is most useful in data warehousing scenarios,
where frequent queries of the actual base tables can be extremely expensive.
50.What is Data mining?
It is the process of extracting knowledge from large amounts of data.

51.Explain architecture of Data mining?


1. data cleaning, integration and selection
2. Data warehouse server
3. data mining engine
4. knowledge base
5. pattern evaluation
6. User interface

52.What is frequent pattern ?


53.What is sequential pattern ?
54.Explain support and confidence w .r . to Association Rule Mining
D : set of transactions
A, B : itemsets in an association rule A => B
Support : P(A U B)
: percentage of transactions in D containing (A U B), i.e. both A and B
Confidence : P(B|A)
: percentage of transactions in D containing A that also
contain B.
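
A worked example of both measures for a rule A => B, assuming Python and a made-up set of four transactions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, a, b):
    """Fraction of the transactions containing A that also contain B."""
    return support(transactions, set(a) | set(b)) / support(transactions, a)


D = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

sup = support(D, {"bread", "milk"})          # 2 of 4 transactions
conf = confidence(D, {"bread"}, {"milk"})    # 2 of the 3 bread transactions
```

So the rule bread => milk has support 50% and confidence about 67% on this data.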

55.What is association rule mining ?


Associations among data in large transactional databases can be found
by performing frequent itemset mining.
Association rule mining is a 2 step process :
1. Find all frequent itemsets
2. Generate strong association rules from frequent itemsets : these
rules must satisfy minimum support and minimum confidence.

56.What is frequent itemsets?


It is an itemset which occurs at least as frequently as predetermined
minimum support count.

57.What is closed itemsets ?


An itemset X is closed in dataset D if there exists no proper super-
itemset Y, which has the same support count as X, in D.
58.What is closed frequent itemset?
If X is both closed and frequent in D.

59.Explain Apriori algorithm


It is an algorithm for finding frequent itemsets using candidate
generation.

Principle : All non-empty subsets of a frequent itemset must also be
frequent.

Input : D : set of transactions
min_sup : minimum support count threshold

Output : frequent Itemsets in D

Steps : 1. Join
2. prune
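
A compact sketch of the join and prune steps, assuming Python. The join here is a simplified union of pairs of frequent (k-1)-itemsets rather than the prefix-based join of the textbook version; it yields the same candidates after pruning. The sample transactions are made up.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {frequent itemset: support count} for a minimum support count."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidate):
        return sum(candidate <= t for t in transactions)

    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items if count(frozenset([i])) >= min_sup}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = count(itemset)
        k += 1
        # Join: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {c for c in candidates if count(c) >= min_sup}
    return frequent


D = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(D, min_sup=2)
```

In this run the candidate {bread, milk, butter} is pruned without counting, because its subset {milk, butter} is not frequent.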

60.Explain generation of association rule from frequent itemset


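Once the frequent itemsets are found, strong association rules are
generated from them as follows :
1. For each frequent itemset l, generate all non-empty proper subsets s of l.
2. For each s, output the rule s => (l - s) if
support_count(l) / support_count(s) >= minimum confidence.
The rules satisfy minimum support automatically, since they come from
frequent itemsets.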
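
Rule generation from frequent itemsets can be sketched as below, assuming Python. The input maps each frequent itemset to its support count, and the counts shown are hypothetical sample values. For each frequent itemset, every non-empty proper subset is tried as the left-hand side.

```python
from itertools import combinations


def generate_rules(frequent, min_conf):
    """Return (lhs, rhs, confidence) for every rule meeting min_conf.

    Every subset of a frequent itemset is itself frequent, so its
    support count is available in the same dictionary.
    """
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                conf = count / frequent[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


# Hypothetical support counts over four transactions.
frequent = {
    frozenset({"bread"}): 3,
    frozenset({"milk"}): 3,
    frozenset({"butter"}): 2,
    frozenset({"bread", "milk"}): 2,
    frozenset({"bread", "butter"}): 2,
}
rules = generate_rules(frequent, min_conf=0.8)
```

Only butter => bread survives at 80% minimum confidence, since both transactions containing butter also contain bread.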

61.What is correlation analysis ?


??

62.What is classification ?
Given past instances and classes to which they belong, the problem is
to find the class to which a new item belongs.

63.What is prediction ?
Prediction models a continuous valued function, unlike classification,
which gives categorical values.

64.What is difference between classification and prediction ?


Classification finds categorical labels, while prediction models a
continuous valued function.
Accuracy of a classifier refers to its ability to find the accurate class
label, while accuracy of a predictor is in how accurately the predictor can
guess the value.

65.Explain decision tree induction with algorithm


Input : 1. set of training tuples and their associated class labels.
2. Attribute list
3. Attribute selection method.
Output : Decision tree giving classification rules
Method : find nodes by selecting attributes from the list which are best
splitting attributes that partition the tuples into distinct
classes.
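
A minimal sketch of the method, assuming Python and assuming information gain as the attribute selection method (other measures such as gain ratio or Gini index could be plugged in instead). The training tuples are made up, and the tree is a nested `(attribute, {value: subtree})` structure with class labels at the leaves.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, attr):
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [l for r, l in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs):
    if len(set(labels)) == 1:          # all tuples in one class: leaf
        return labels[0]
    if not attrs:                      # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    branches = {}
    for value in sorted({r[best] for r in rows}):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx],
            [a for a in attrs if a != best],
        )
    return (best, branches)


def classify(tree, row):
    while isinstance(tree, tuple):     # descend until a leaf label is reached
        attr, branches = tree
        tree = branches[row[attr]]
    return tree


rows = [
    {"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"}, {"outlook": "overcast", "windy": "yes"},
    {"outlook": "rain", "windy": "no"}, {"outlook": "rain", "windy": "yes"},
]
labels = ["play", "stay", "play", "play", "stay", "stay"]
tree = build_tree(rows, labels, ["outlook", "windy"])
```

Each root-to-leaf path reads as one classification rule, for example "outlook = sunny and windy = no => play".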

66.What is Bayesian classification?


It is a statistical classification method that predicts class membership
probabilities.

67.What is bayes theorem?


Let H be a hypothesis that X belongs to a specific class C.
Then the posterior probability of H conditioned on X is
P(H|X) = P(X|H) P(H) / P(X)

68.How to predict a class label using naïve bayesian classification


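X is a tuple
C1, C2.......Cn are classes.
Then X belongs to the class having the highest posterior probability,
i.e. X is assigned to Ci if P(Ci|X) > P(Cj|X) for all j = 1 to n, j ≠ i.
The naive assumption is that attributes are independent given the class, so
P(X|Ci) is the product of P(xk|Ci) over the attribute values xk of X.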
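
A sketch for categorical attributes, assuming Python, made-up training tuples and no smoothing of zero counts. Since P(X) is the same for every class, it is enough to compare P(X|Ci) P(Ci), with P(X|Ci) taken as the product of the per-attribute probabilities under the naive independence assumption.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    priors = Counter(labels)
    counts = defaultdict(Counter)      # (class, attribute) -> value counts
    for row, c in zip(rows, labels):
        for attr, value in row.items():
            counts[(c, attr)][value] += 1
    return priors, counts


def predict(model, row):
    """Return the class with the highest score P(X|Ci) P(Ci), and all scores."""
    priors, counts = model
    total = sum(priors.values())
    scores = {}
    for c, n_c in priors.items():
        score = n_c / total                              # P(Ci)
        for attr, value in row.items():
            score *= counts[(c, attr)][value] / n_c      # P(xk | Ci)
        scores[c] = score
    return max(scores, key=scores.get), scores


rows = [
    {"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"}, {"outlook": "overcast", "windy": "yes"},
    {"outlook": "rain", "windy": "no"}, {"outlook": "rain", "windy": "yes"},
]
labels = ["play", "stay", "play", "play", "stay", "stay"]
label, scores = predict(train(rows, labels), {"outlook": "sunny", "windy": "no"})
```

Here the score for "play" is 1/2 x 1/3 x 2/3 = 1/9 and for "stay" it is 1/2 x 1/3 x 1/3 = 1/18, so the tuple is labelled "play".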

69.What is cluster analysis?


Clustering is the grouping of physical or abstract objects into classes
of similar objects.

70.Explain centroid based technique – K-mean algorithm


Partitions set of n objects into K clusters so that resulting intracluster
similarity is high, and intercluster similarity is low.
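
A minimal sketch of the k-means loop, assuming Python, squared Euclidean distance, and initial centroids supplied by the caller (real implementations usually pick them at random). The sample points are made up.

```python
def kmeans(points, centroids, iterations=10):
    """Alternate assignment and centroid update until nothing changes."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
```

The result depends on the initial centroids, so the algorithm is often run several times with different starting points.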

71.What is outlier analysis?


It is the detection and analysis of outlier data.
72.What is Text Mining?
It is the process of mining information stored in text documents.

UNIT 6
73.What is information retrieval system?
Systems which are used to store and query unstructured textual data
such as documents.

74.What is difference between IR and Database system


Database systems handle structured data which has a complex data
model. Querying data is relatively easier.
IR systems handle unstructured data which follows a simple data
model. Problems lie in approximate keyword searching and relevance
ranking.

75.What is the need of relevance ranking?


User cannot state the query precisely.
Also, keyword search returns a large no. of documents which match.
Hence, IR system needs to order answer on the basis of relevance.

76.What are functionality of web search engine?


Finds web pages relevant to given keywords which are ranked
according to relevance.

77.Explain architecture of web search engine


1. Search engine software
2. Web crawler
3. Index database
4. Relevance Ranking algorithm
78.Explain different ways of relevance ranking?
1. relevance ranking using terms
2. relevance ranking using hyperlinks

79.Explain Pagerank algorithm?


Measure of popularity of page is based on popularity of pages that
link to that page.
Page rank of a page gives the probability that a random walker
will visit that page.
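
The random walker model can be sketched as a power iteration, assuming Python. The damping factor 0.85, the iteration count and the three-page link graph are assumptions for illustration: with probability `damping` the walker follows a link from the current page, otherwise it jumps to a random page.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}   # random jump share
        for p, outs in links.items():
            targets = outs or pages                   # dangling page: jump anywhere
            for q in targets:
                new[q] += damping * rank[p] / len(targets)
        rank = new
    return rank


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
rank = pagerank(links)
```

Page C is linked from both A and B and ends up with the highest rank; A is next because the popular page C links to it.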

80.Explain HITS algorithm


Compute popularity of pages using only pages that contain keyword.

81. What is hub & authorities w. r. to HITS algorithm


Hub is a page that stores links to many related pages; but may not in
itself contain information on that topic; but it points to pages that
contain actual information.
Authority is page which contains actual information on that topic,
although it may not store links to many related pages.
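
A sketch of the hub and authority computation, assuming Python and a made-up link graph among pages that match the keyword. A page's authority score is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of pages it links to; scores are normalised after each step.

```python
from math import sqrt


def hits(links, iterations=20):
    """links maps each page to the pages it links to. Returns (hub, auth)."""
    pages = set(links) | {q for outs in links.values() for q in outs}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: add up hub scores of pages that point to p.
        auth = {p: sum(hub[q] for q in pages if p in links.get(q, ())) for p in pages}
        norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub update: add up authority scores of pages that p points to.
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


links = {"portal": ["paper1", "paper2"], "blog": ["paper1"]}
hub, auth = hits(links)
```

Here "portal" is the strongest hub and "paper1" the strongest authority.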

82.how to evaluate ranked list ?


???

83.What are the measure of text retrieval?


Precision : Percentage of retrieved pages that are relevant.
Recall : Percentage of relevant pages that are retrieved.
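
A worked example of both measures, assuming Python and hypothetical document identifiers.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# 4 documents retrieved, 3 documents actually relevant, 2 in common.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
```

Precision is 2/4 = 50% and recall is 2/3, about 67%.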

84.What is web crawler?


It is a process which recursively follows hyperlinks and stores indexes
and information about web pages.
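
A sketch of the link-following loop, assuming Python. No network access is made: `fetch_links` is a hypothetical callback that returns the outgoing links of a page, and the tiny `web` dict stands in for real pages. The crawl is breadth-first and remembers visited pages so that cycles do not loop forever.

```python
from collections import deque


def crawl(seed, fetch_links, max_pages=100):
    """Return pages in the order they were visited, starting from seed."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)          # a real crawler would index the page here
        for nxt in fetch_links(url):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


web = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
visited = crawl("a", lambda url: web.get(url, []))
```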

85.What is indexing? name the data structure used for


Indexing is storing key-pointer pairs for fast retrieval of data.
Here, Inverted index is used for storing list Si of document identifiers
which contain a particular keyword Ki.
B+ tree is used.

86.Explain inverted list for indexing?


Inverted index is used for storing list Si of document identifiers which
contain a particular keyword Ki.
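
A sketch of building and querying an inverted index, assuming Python, whitespace tokenisation and made-up documents. Each keyword Ki maps to the set Si of identifiers of documents containing it; a query with several keywords intersects the sets.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """docs maps document id -> text. Returns keyword -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_all(index, keywords):
    """Documents that contain every keyword (intersection of the lists Si)."""
    sets = [index.get(k.lower(), set()) for k in keywords]
    return sorted(set.intersection(*sets)) if sets else []


docs = {
    1: "parallel database systems",
    2: "distributed database systems",
    3: "parallel query processing",
}
index = build_inverted_index(docs)
matches = search_all(index, ["parallel", "database"])
```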
87.What is need of context based querying?
The problem of homonyms can be solved by concept based querying.
Here, the concept that each word in the document represents is determined
and the word is replaced by that concept.
This disambiguation is done by looking at surrounding words in the
document.

88.What is ontology?
These are hierarchical structures which represent relationships
between concepts.

89.What is synonym, homonym?


Synonyms : words with same meaning
Homonyms : same word having different meaning.

PRACTICAL ASSN QUESTIONS

90.Explain the architecture of MySQL, Oracle, SQL Server


91.Name ETL tools
92.Explain ETL tool working
93.Compare different ETL tools available
