
Master of Computer Application (MCA) Semester II MC0067 Database Management 4 Credits

(Book ID: B0716 & BO717)

Assignment Set 1
BOOK ID: B0716 & BO717

1. Write about Linear Search and Collision Chain with respect to Hashing Techniques.

Answer:

Linear Search:


Linear search, also known as sequential search, means starting at the beginning of the data and checking each item in turn until either the desired item is found or the end of the data is reached. It is a search algorithm suitable for searching a list of data for a particular value: it operates by checking every element of the list, one at a time and in sequence, until a match is found.

The linear search is not very efficient. If the item of data to be found is at the end of the list, then all previous items must be read and checked before the item that matches the search criteria is found. The implementation is a straightforward loop comparing every element in the array with the key. As soon as an equal value is found, it returns; if the loop finishes without finding a match, the search has failed and -1 is returned. For small arrays, linear search is a good solution because it is so straightforward, but in an array of a million elements linear search will on average take 500,000 comparisons to find the key. For a much faster search, take a look at binary search.

Algorithm:
    for each item in the database
        if the item matches the wanted info
            exit with this item
    the wanted item is not in the database
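In a DBMS the same trade-off appears as a full table scan (a linear search over every row) versus an index lookup. A minimal SQLite sketch, assuming a hypothetical items table, makes the difference visible with EXPLAIN QUERY PLAN:

    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);

    -- No index on name: the engine must scan every row (linear search).
    EXPLAIN QUERY PLAN SELECT * FROM items WHERE name = 'widget';
    -- reports something like: SCAN items

    CREATE INDEX idx_items_name ON items(name);

    -- With the index, the engine jumps directly to the matching rows.
    EXPLAIN QUERY PLAN SELECT * FROM items WHERE name = 'widget';
    -- reports something like: SEARCH items USING INDEX idx_items_name (name=?)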

Collision Chain: In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys (e.g., a person's name), to their associated values (e.g., their telephone number). Thus, a hash table implements an associative array. The hash function is used to transform the key into the index (the hash) of an array element (the slot or bucket) where the corresponding value is to be sought. Ideally, the hash function should map each possible key to a unique slot index, but this ideal is rarely achievable in practice (unless the hash keys are fixed, i.e. new entries are never added to the table after it is created). Instead, most hash table designs assume that hash collisions (different keys that map to the same hash value) will occur and must be accommodated in some way; in the collision chaining approach, each bucket holds a chain (linked list) of all the entries that hash to it.

2. Write about Integrity Rules and Relational Operators.

Answer:

Integrity Rules:


These are the rules which a relational database follows in order to stay accurate and accessible. They govern which operations can be performed on the data and on the structure of the database. There are three integrity rules defined for a relational database, which are:

Distinct Rows in a Table: this rule says that all the rows of a table should be distinct, to avoid ambiguity while accessing the rows of that table. Most modern database management systems can be configured to reject duplicate rows.

Entity Integrity (a Primary Key, or any part of it, cannot be null): this rule says that 'null' is a special value in a relational database; it does not mean blank or zero. It means the unavailability of data, and hence a 'null' primary key would not be a complete identifier.

Referential Integrity: this rule says that if a foreign key is defined on a table, then a value matching that foreign key value must exist as the primary key of a row in some other table.

The following are the integrity rules to be satisfied by any relation:

No component of the primary key can be null.

The database must not contain any unmatched foreign key values. This is called the referential integrity rule. Unlike the case of primary keys, there is no integrity rule saying that no component of the foreign key can be null. This can be logically explained with the help of the following example. Consider the relations Employee and Account as given below.

Employee
Emp#   EmpName   EmpCity   EmpAcc#
X101   Shekhar   Bombay    120001
X102   Raj       Pune      120002
X103   Sharma    Nagpur    Null
X104   Vani      Bhopal    120003

Account
Acc#     OpenDate      BalAmt
120001   30-Aug-1998   5000
120002   29-Oct-1998   1200
120003   01-Jan-1999   3000
120004   04-Mar-1999   500

EmpAcc# in the Employee relation is a foreign key creating a reference from Employee to Account. Here, a Null value in the EmpAcc# attribute is logically possible if an employee does not have a bank account. If the business rules allow an employee to exist in the system without opening an account, a Null value can be allowed for EmpAcc# in the Employee relation. In the case example given, Cust# in Ord_Aug cannot accept Null if the business rule insists that the customer number must be stored for every order placed.
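As a sketch of how these rules are declared in SQL (based on the Employee and Account example above; the column types are assumptions, and '#' is dropped from the names since it is not valid in plain SQL identifiers):

    CREATE TABLE Account (
        AccNo    INTEGER PRIMARY KEY,   -- entity integrity: the key cannot be null
        OpenDate TEXT,
        BalAmt   NUMERIC
    );

    CREATE TABLE Employee (
        EmpNo   TEXT PRIMARY KEY,
        EmpName TEXT,
        EmpCity TEXT,
        EmpAcc  INTEGER REFERENCES Account(AccNo)  -- referential integrity; NULL allowed
    );

    -- Allowed: an employee without a bank account (EmpAcc is NULL).
    INSERT INTO Employee VALUES ('X103', 'Sharma', 'Nagpur', NULL);

    -- Rejected where foreign keys are enforced: 999999 is an unmatched value.
    INSERT INTO Employee VALUES ('X105', 'Kiran', 'Delhi', 999999);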

Relational Operators: In the relational model, the database objects seen so far have specific names:

Name                Meaning
Relation            Table
Tuple               Record (Row)
Attribute           Field (Column)
Cardinality         Number of Records (Rows)
Degree (or Arity)   Number of Fields (Columns)
View                Query/Answer table

On these objects, a set of operators (relational operators) is provided to manipulate them: 1. Restrict 2. Project 3. Union 4. Difference 5. Product 6. Intersection 7. Join 8. Divide

Restrict: Restrict simply extracts records from a table. It is also known as Select, but it is not the same SELECT as defined in SQL.
Project: Project selects zero or more fields from a table and generates a new table that contains all of the records and only the selected fields (with no duplications).
Union: Union creates a new table by adding the records of one table to another. The tables must be compatible: they must have the same number of fields, and each pair of corresponding fields must have values in the same domain.
Difference: The difference of two tables is a third table which contains the records which appear in the first BUT NOT in the second.
Product: The product of two tables is a third table which contains all of the records in the first one combined with each of the records in the second.
Intersection: The intersection of two tables is a third table which contains the records which are common to both.
Join: The join of two tables is a third table which contains all of the records in the first and the second which are related.
Divide: Dividing a table by another table gives all the records in the first which have values in their fields matching ALL the records in the second.
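Before the detailed examples that follow, it may help to see how each operator can be expressed in SQL. This mapping is a sketch; R and S are hypothetical tables, assumed union-compatible where the operator requires it:

    -- Restrict (rows):        SELECT * FROM R WHERE city = 'London';
    -- Project (columns):      SELECT DISTINCT name, city FROM R;
    -- Union:                  SELECT * FROM R UNION SELECT * FROM S;
    -- Difference:             SELECT * FROM R EXCEPT SELECT * FROM S;
    -- Product:                SELECT * FROM R CROSS JOIN S;
    -- Intersection:           SELECT * FROM R INTERSECT SELECT * FROM S;
    -- Join (common column c): SELECT * FROM R JOIN S ON R.c = S.c;
    -- Divide: no direct SQL keyword; see the NOT EXISTS sketch at the end of this answer.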

The eight relational algebra operators are:

1. SELECT To retrieve specific tuples/rows from a relation.

Ord#   OrdDate    Cust#
101    02-08-94   002
104    18-09-94   002

2. PROJECT To retrieve specific attributes/columns from a relation.

Description    Price
Power Supply   4000
101-Keyboard   2000
Mouse          800
MS-DOS 6.0     5000
MS-Word 6.0    8000

3. PRODUCT To obtain all possible combination of tuples from two relations.

Ord#   OrdDate    O.Cust#   C.Cust#   CustName     City
101    02-08-94   002       001       Shah         Bombay
101    02-08-94   002       002       Srinivasan   Madras
101    02-08-94   002       003       Gupta        Delhi
101    02-08-94   002       004       Banerjee     Calcutta
101    02-08-94   002       005       Apte         Bombay
102    11-08-94   003       001       Shah         Bombay
102    11-08-94   003       002       Srinivasan   Madras

4. UNION To retrieve tuples appearing in either or both the relations participating in the UNION.

Eg: Consider the relation Ord_Jul as follows:

Table: Ord_Jul
Ord#   OrdDate    Cust#
101    03-07-94   001
102    27-07-94   003

Ord_Jul UNION Ord_Aug gives:
Ord#   OrdDate    Cust#
101    03-07-94   001
102    27-07-94   003
101    02-08-94   002
102    11-08-94   003
103    21-08-94   003
104    28-08-94   002
105    30-08-94   005

Note: The union operation shown above logically implies retrieval of records of orders placed in July or in August.

5. INTERSECT To retrieve tuples appearing in both the relations participating in the INTERSECT.

Eg: To retrieve Cust# of Customers who've placed orders in July and in August:

Cust#
003

6. DIFFERENCE To retrieve tuples appearing in the first relation participating in the DIFFERENCE but not in the second.

Eg: To retrieve Cust# of Customers who've placed orders in July but not in August:

Cust#
001

7. JOIN To retrieve combinations of tuples in two relations based on a common field in both the relations.

Eg: ORD_AUG join CUSTOMERS (here, the common column is Cust#):

Ord#   OrdDate    Cust#   CustName     City
101    02-08-94   002     Srinivasan   Madras
102    11-08-94   003     Gupta        Delhi
103    21-08-94   003     Gupta        Delhi
104    28-08-94   002     Srinivasan   Madras
105    30-08-94   005     Apte         Bombay

Note: The above join operation logically implies retrieval of the details of all orders and the details of the corresponding customers who placed the orders. Such a join operation, where only those rows having corresponding rows in both the relations are retrieved, is called the natural join or inner join. This is the most common join operation.

Consider the example of the EMPLOYEE and ACCOUNT relations:

EMPLOYEE
Emp#   EmpName   EmpCity   Acc#
X101   Shekhar   Bombay    120001
X102   Raj       Pune      120002
X103   Sharma    Nagpur    Null
X104   Vani      Bhopal    120003

ACCOUNT
Acc#     OpenDate      BalAmt
120001   30-Aug-1998   5000
120002   29-Oct-1998   1200
120003   01-Jan-1999   3000
120004   04-Mar-1999   500

A join can be formed between the two relations based on the common column Acc#. The result of the (inner) join is:

Emp#   EmpName   EmpCity   Acc#     OpenDate      BalAmt
X101   Shekhar   Bombay    120001   30-Aug-1998   5000
X102   Raj       Pune      120002   29-Oct-1998   1200
X104   Vani      Bhopal    120003   01-Jan-1999   3000

Note that, from each table, only those records which have corresponding records in the other table appear in the result set. This means that the result of the inner join shows the details of those employees who hold an account, along with the account details. The other type of join is the outer join, which has three variations: the left outer join, the right outer join and the full outer join. These three joins are explained as follows:

The left outer join retrieves all rows from the left-side (of the join operator) table. If there are corresponding or related rows in the right-side table, the correspondence will be shown. Otherwise, columns of the right-side table will take null values.

EMPLOYEE left outer join ACCOUNT gives:

Emp#   EmpName   EmpCity   Acc#     OpenDate      BalAmt
X101   Shekhar   Bombay    120001   30-Aug-1998   5000
X102   Raj       Pune      120002   29-Oct-1998   1200
X103   Sharma    Nagpur    NULL     NULL          NULL
X104   Vani      Bhopal    120003   01-Jan-1999   3000

The right outer join retrieves all rows from the right-side (of the join operator) table. If there are corresponding or related rows in the left-side table, the correspondence will be shown. Otherwise, columns of the left-side table will take null values.

EMPLOYEE right outer join ACCOUNT gives:

Emp#   EmpName   EmpCity   Acc#     OpenDate      BalAmt
X101   Shekhar   Bombay    120001   30-Aug-1998   5000
X102   Raj       Pune      120002   29-Oct-1998   1200
X104   Vani      Bhopal    120003   01-Jan-1999   3000
NULL   NULL      NULL      120004   04-Mar-1999   500

(Assume that Acc# 120004 belongs to someone who is not an employee and hence the details of the Account holder are not available here) The full outer join retrieves all rows from both the tables. If there is a correspondence or relation between rows from the tables of either side, the correspondence will be shown. Otherwise, related columns will take null values.

EMPLOYEE full outer join ACCOUNT gives:

Emp#   EmpName   EmpCity   Acc#     OpenDate      BalAmt
X101   Shekhar   Bombay    120001   30-Aug-1998   5000
X102   Raj       Pune      120002   29-Oct-1998   1200
X103   Sharma    Nagpur    NULL     NULL          NULL
X104   Vani      Bhopal    120003   01-Jan-1999   3000
NULL   NULL      NULL      120004   04-Mar-1999   500
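In SQL these joins can be sketched as follows, reusing the EMPLOYEE and ACCOUNT relations above (AccNo stands in for Acc#, and note that not every engine supports FULL OUTER JOIN):

    -- Inner join: only employees who hold an account.
    SELECT e.EmpNo, e.EmpName, a.AccNo, a.BalAmt
    FROM Employee e JOIN Account a ON e.EmpAcc = a.AccNo;

    -- Left outer join: all employees, with NULLs where no account exists.
    SELECT e.EmpNo, e.EmpName, a.AccNo, a.BalAmt
    FROM Employee e LEFT OUTER JOIN Account a ON e.EmpAcc = a.AccNo;

    -- Full outer join: all employees and all accounts.
    SELECT e.EmpNo, e.EmpName, a.AccNo, a.BalAmt
    FROM Employee e FULL OUTER JOIN Account a ON e.EmpAcc = a.AccNo;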

8. DIVIDE Consider the following three relations:

R1 divided by R2 per R3 gives: a. Thus the result contains those values from R1 whose corresponding R2 values in R3 include all R2 values.
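Since the contents of R1, R2 and R3 are not reproduced above, here is a generic sketch of relational division in SQL using the classic double NOT EXISTS formulation (the tables R1(a), R2(b) and R3(a, b) and their columns are hypothetical):

    -- Values of a in R1 that are paired in R3 with ALL values of b in R2.
    SELECT DISTINCT r1.a
    FROM R1 r1
    WHERE NOT EXISTS (
        SELECT * FROM R2 r2
        WHERE NOT EXISTS (
            SELECT * FROM R3 r3
            WHERE r3.a = r1.a AND r3.b = r2.b));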

3. Write about Schemas and the levels of data abstraction.

Answer: Data are actually stored as bits, or numbers and strings, but it is difficult to work with data at this level. It is necessary to view data at different levels of abstraction.

Schema: a description of the data at some level. Each level of abstraction has its own schema. We will be concerned with three forms of schemas: physical, conceptual, and external.

4. Explain the SQL syntax for CREATE TABLE, aggregate functions, and subquery expressions.

Answer: The syntax is defined by the grammar elements create-table-stmt, column-def, type-name, column-constraint, table-constraint and foreign-key-clause (the corresponding syntax diagrams are not reproduced here).

The "CREATE TABLE" command is used to create a new table in an SQLite database. A CREATE TABLE command specifies the following attributes of the new table:

The name of the new table.
The database in which the new table is created. Tables may be created in the main database, the temp database, or in any attached database.
The name of each column in the table.
The declared type of each column in the table.
A default value or expression for each column in the table.
A default collation sequence to use with each column.
Optionally, a PRIMARY KEY for the table. Both single column and composite (multiple column) primary keys are supported.
A set of SQL constraints for each table. SQLite supports UNIQUE, NOT NULL, CHECK and FOREIGN KEY constraints.
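For instance, a sketch of a CREATE TABLE statement exercising most of these attributes (the table and column names, and the referenced customers table, are invented for illustration):

    CREATE TABLE IF NOT EXISTS orders (
        order_id INTEGER PRIMARY KEY,         -- single-column primary key
        cust_no  TEXT NOT NULL,               -- NOT NULL constraint
        ord_date TEXT DEFAULT CURRENT_DATE,   -- default value
        amount   NUMERIC CHECK (amount >= 0), -- CHECK constraint
        UNIQUE (cust_no, ord_date),           -- table-level UNIQUE constraint
        FOREIGN KEY (cust_no) REFERENCES customers(cust_no)
    );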

Every CREATE TABLE statement must specify a name for the new table. Table names that begin with "sqlite_" are reserved for internal use. It is an error to attempt to create a table with a name that starts with "sqlite_".

If a <database-name> is specified, it must be either "main", "temp", or the name of an attached database. In this case the new table is created in the named database. If the "TEMP" or "TEMPORARY" keyword occurs between the "CREATE" and "TABLE" then the new table is created in the temp database. It is an error to specify both a <database-name> and the TEMP or TEMPORARY keyword, unless the <database-name> is "temp". If no database name is specified and the TEMP keyword is not present then the table is created in the main database.

It is usually an error to attempt to create a new table in a database that already contains a table, index or view of the same name. However, if the "IF NOT EXISTS" clause is specified as part of the CREATE TABLE statement and a table or view of the same name already exists, the CREATE TABLE command simply has no effect (and no error message is returned). An error is still returned if the table cannot be created because of an existing index, even if the "IF NOT EXISTS" clause is specified. It is not an error to create a table that has the same name as an existing trigger. Tables are removed using the DROP TABLE statement.

CREATE TABLE ... AS SELECT Statements

A "CREATE TABLE ... AS SELECT" statement creates and populates a database table based on the results of a SELECT statement. The table has the same number of columns as the rows returned by the SELECT statement. The name of each column is the same as the name of the corresponding column in the result set of the SELECT statement. The declared type of each column is determined by the expression affinity of the corresponding expression in the result set of the SELECT statement, as follows:
Expression Affinity   Column Declared Type
TEXT                  "TEXT"
NUMERIC               "NUM"
INTEGER               "INT"
REAL                  "REAL"
NONE                  "" (empty string)

A table created using CREATE TABLE AS has no PRIMARY KEY and no constraints of any kind. The default value of each column is NULL. The default collation sequence for each column of the new table is BINARY. Tables created using CREATE TABLE AS are initially populated with the rows of data returned by the SELECT statement. Rows are assigned contiguously ascending rowid values, starting with 1, in the order that they are returned by the SELECT statement.

Column Definitions

Unless it is a CREATE TABLE ... AS SELECT statement, a CREATE TABLE includes one or more column definitions, optionally followed by a list of table constraints. Each column definition consists of the name of the column, optionally followed by the declared type of the column, then one or more optional column constraints. Included in the definition of "column constraints" for the purposes of the previous statement are the COLLATE and DEFAULT clauses, even though these are not really constraints in the sense that they do not restrict the data that the table may contain. The other constraints (NOT NULL, CHECK, UNIQUE, PRIMARY KEY and FOREIGN KEY constraints) impose restrictions on the table's data, and are described under SQL Data Constraints below.

Unlike most SQL databases, SQLite does not restrict the type of data that may be inserted into a column based on the column's declared type. Instead, SQLite uses dynamic typing. The declared type of a column is used to determine the affinity of the column only.

The DEFAULT clause specifies a default value to use for the column if no value is explicitly provided by the user when doing an INSERT. If there is no explicit DEFAULT clause attached to a column

definition, then the default value of the column is NULL. An explicit DEFAULT clause may specify that the default value is NULL, a string constant, a blob constant, a signed number, or any constant expression enclosed in parentheses. An explicit default value may also be one of the special case-independent keywords CURRENT_TIME, CURRENT_DATE or CURRENT_TIMESTAMP. For the purposes of the DEFAULT clause, an expression is considered constant provided that it does not contain any sub-queries or string constants enclosed in double quotes.

Each time a row is inserted into the table by an INSERT statement that does not provide explicit values for all table columns, the values stored in the new row are determined by their default values, as follows:

If the default value of the column is a constant NULL, text, blob or signed-number value, then that value is used directly in the new row.
If the default value of a column is an expression in parentheses, then the expression is evaluated once for each row inserted and the results used in the new row.
If the default value of a column is CURRENT_TIME, CURRENT_DATE or CURRENT_TIMESTAMP, then the value used in the new row is a text representation of the current UTC date and/or time. For CURRENT_TIME, the format of the value is "HH:MM:SS". For CURRENT_DATE, "YYYY-MM-DD". The format for CURRENT_TIMESTAMP is "YYYY-MM-DD HH:MM:SS".
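A small sketch of the three kinds of defaults (the log table is hypothetical; hex(randomblob(8)) is an SQLite expression evaluated once per inserted row):

    CREATE TABLE log (
        id      INTEGER PRIMARY KEY,
        level   TEXT DEFAULT 'info',                -- constant default
        created TEXT DEFAULT CURRENT_TIMESTAMP,     -- "YYYY-MM-DD HH:MM:SS" in UTC
        token   TEXT DEFAULT (hex(randomblob(8)))   -- parenthesized expression
    );

    -- Only id is supplied; level, created and token take their default values.
    INSERT INTO log (id) VALUES (1);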

The COLLATE clause specifies the name of a collating sequence to use as the default collation sequence for the column. If no COLLATE clause is specified, the default collation sequence is BINARY. The number of columns in a table is limited by the SQLITE_MAX_COLUMN compile-time parameter. A single row of a table cannot store more than SQLITE_MAX_LENGTH bytes of data. Both of these limits can be lowered at runtime using the sqlite3_limit() C/C++ interface. SQL Data Constraints

Each table in SQLite may have at most one PRIMARY KEY. If the keywords PRIMARY KEY are added to a column definition, then the primary key for the table consists of that single column. Or, if a PRIMARY KEY clause is specified as a table-constraint, then the primary key of the table consists of the list of columns specified as part of the PRIMARY KEY clause. If there is more than one PRIMARY KEY clause in a single CREATE TABLE statement, it is an error.

If a table has a single column primary key, and the declared type of that column is "INTEGER", then the column is known as an INTEGER PRIMARY KEY. See below for a description of the special properties and behaviors associated with an INTEGER PRIMARY KEY.

Each row in a table with a primary key must feature a unique combination of values in its primary key columns. For the purposes of determining the uniqueness of primary key values, NULL values are considered distinct from all other values, including other NULLs. If an INSERT or UPDATE statement attempts to modify the table content so that two or more rows feature identical primary key values, it is a constraint violation.

According to the SQL standard, PRIMARY KEY should always imply NOT NULL. Unfortunately, due to a long-standing coding oversight, this is not the case in SQLite. Unless the column is an INTEGER PRIMARY KEY, SQLite allows NULL values in a PRIMARY KEY column. We could change SQLite to conform to the standard (and we might do so in the future), but by the time the oversight was discovered, SQLite was in such wide use that we feared breaking legacy code if we fixed the problem. So for now we have chosen to continue allowing NULLs in PRIMARY KEY columns. Developers should be aware, however, that we may change SQLite to conform to the SQL standard in future and should design new programs accordingly.

A UNIQUE constraint is similar to a PRIMARY KEY constraint, except that a single table may have any number of UNIQUE constraints. For each UNIQUE constraint on the table, each row must feature a unique combination of values in the columns identified by the UNIQUE constraint. As with PRIMARY KEY constraints, for the purposes of UNIQUE constraints NULL values are considered distinct from all other values (including other NULLs). If

an INSERT or UPDATE statement attempts to modify the table content so that two or more rows feature identical values in a set of columns that are subject to a UNIQUE constraint, it is a constraint violation.

INTEGER PRIMARY KEY columns aside, both UNIQUE and PRIMARY KEY constraints are implemented by creating an index in the database (in the same way as a "CREATE UNIQUE INDEX" statement would). Such an index is used like any other index in the database to optimize queries. As a result, there is often no advantage (but significant overhead) in creating an index on a set of columns that are already collectively subject to a UNIQUE or PRIMARY KEY constraint.

A CHECK constraint may be attached to a column definition or specified as a table constraint. In practice it makes no difference. Each time a new row is inserted into the table or an existing row is updated, the expression associated with each CHECK constraint is evaluated and cast to a NUMERIC value in the same way as a CAST expression. If the result is zero (integer value 0 or real value 0.0), then a constraint violation has occurred. If the CHECK expression evaluates to NULL, or any other non-zero value, it is not a constraint violation. The expression of a CHECK constraint may not contain a subquery. CHECK constraints have been supported since version 3.3.0. Prior to version 3.3.0, CHECK constraints were parsed but not enforced.

A NOT NULL constraint may only be attached to a column definition, not specified as a table constraint. Not surprisingly, a NOT NULL constraint dictates that the associated column may not contain a NULL value. Attempting to set the column value to NULL when inserting a new row or updating an existing one causes a constraint violation.

Exactly how a constraint violation is dealt with is determined by the constraint conflict resolution algorithm. Each PRIMARY KEY, UNIQUE, NOT NULL and CHECK constraint has a default conflict resolution algorithm. PRIMARY KEY, UNIQUE and NOT NULL constraints may be explicitly assigned a default conflict resolution

algorithm by including a conflict-clause in their definitions. Or, if a constraint definition does not include a conflict-clause, or it is a CHECK constraint, the default conflict resolution algorithm is ABORT. Different constraints within the same table may have different default conflict resolution algorithms. See the section titled ON CONFLICT for additional information.
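Tying these rules together, a sketch with invented names showing column-level and table-level constraints plus an explicit conflict clause:

    CREATE TABLE stock (
        part_no  TEXT,
        location TEXT,
        qty      INTEGER NOT NULL CHECK (qty >= 0),        -- column constraints
        PRIMARY KEY (part_no, location) ON CONFLICT ABORT  -- composite key with conflict clause
    );

    -- The CHECK expression evaluates to 0, so this is a constraint violation
    -- and the default ABORT algorithm backs out the statement:
    INSERT INTO stock VALUES ('P-100', 'A1', -5);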
Aggregate functions compute a single result value from a set of input values. The built-in aggregate functions (as provided by PostgreSQL) are listed in Table 9-37 and Table 9-38. The special syntax considerations for aggregate functions are explained in the PostgreSQL documentation.

Table 9-37. General-Purpose Aggregate Functions

avg(expression)
    Argument type: smallint, int, bigint, real, double precision, numeric, or interval
    Return type: numeric for any integer-type argument, double precision for a floating-point argument, otherwise the same as the argument data type
    Description: the average (arithmetic mean) of all input values

bit_and(expression)
    Argument type: smallint, int, bigint, or bit
    Return type: same as argument data type
    Description: the bitwise AND of all non-null input values, or null if none

bit_or(expression)
    Argument type: smallint, int, bigint, or bit
    Return type: same as argument data type
    Description: the bitwise OR of all non-null input values, or null if none

bool_and(expression)
    Argument type: bool
    Return type: bool
    Description: true if all input values are true, otherwise false

bool_or(expression)
    Argument type: bool
    Return type: bool
    Description: true if at least one input value is true, otherwise false

count(*)
    Return type: bigint
    Description: number of input rows

count(expression)
    Argument type: any
    Return type: bigint
    Description: number of input rows for which the value of expression is not null

every(expression)
    Argument type: bool
    Return type: bool
    Description: equivalent to bool_and

max(expression)
    Argument type: any array, numeric, string, or date/time type
    Return type: same as argument type
    Description: maximum value of expression across all input values

min(expression)
    Argument type: any array, numeric, string, or date/time type
    Return type: same as argument type
    Description: minimum value of expression across all input values

sum(expression)
    Argument type: smallint, int, bigint, real, double precision, numeric, or interval
    Return type: bigint for smallint or int arguments, numeric for bigint arguments, double precision for floating-point arguments, otherwise the same as the argument data type
    Description: sum of expression across all input values

It should be noted that except for count, these functions return a null value when no rows are selected. In particular, sum of no rows returns null, not zero as one might expect. The coalesce function may be used to substitute zero for null when necessary.
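A brief illustration of these aggregates, including the coalesce idiom from the note above (the orders table and its columns are hypothetical):

    SELECT cust_no,
           count(*)                 AS order_count,
           coalesce(sum(amount), 0) AS total,     -- zero instead of null
           avg(amount)              AS average
    FROM orders
    GROUP BY cust_no;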

Table 9-38 shows aggregate functions typically used in statistical analysis. (These are separated out merely to avoid cluttering the listing of more-commonly-used aggregates.) Where the description mentions N, it means the number of input rows for which all the input expressions are non-null. In all cases, null is returned if the computation is meaningless, for example when N is zero.

Table 9-38. Aggregate Functions for Statistics

corr(Y, X)
    Argument type: double precision
    Return type: double precision
    Description: correlation coefficient

covar_pop(Y, X)
    Argument type: double precision
    Return type: double precision
    Description: population covariance

covar_samp(Y, X)
    Argument type: double precision
    Return type: double precision
    Description: sample covariance

regr_avgx(Y, X)
    Argument type: double precision
    Return type: double precision
    Description: average of the independent variable (sum(X)/N)

regr_avgy(Y, X)
    Argument type: double precision
    Return type: double precision
    Description: average of the dependent variable (sum(Y)/N)

regr_count(Y, X)
    Argument type: double precision
    Return type: bigint
    Description: number of input rows in which both expressions are nonnull

regr_intercept(Y, X)
    Argument type: double precision
    Return type: double precision
    Description: y-intercept of the least-squares-fit linear equation determined by the (X, Y) pairs

regr_r2(Y, X)
    Argument type: double precision
    Return type: double precision
    Description: square of the correlation coefficient

regr_slope(Y, X)
    Argument type: double precision
    Return type: double precision
    Description: slope of the least-squares-fit linear equation determined by the (X, Y) pairs

regr_sxx(Y, X)
    Argument type: double precision
    Return type: double precision
    Description: sum(X^2) - sum(X)^2/N ("sum of squares" of the independent variable)

regr_sxy(Y, X)
    Argument type: double precision
    Return type: double precision
    Description: sum(X*Y) - sum(X)*sum(Y)/N ("sum of products" of independent times dependent variable)

regr_syy(Y, X)
    Argument type: double precision
    Return type: double precision
    Description: sum(Y^2) - sum(Y)^2/N ("sum of squares" of the dependent variable)

stddev(expression)
    Argument type: smallint, int, bigint, real, double precision, or numeric
    Return type: double precision for floating-point arguments, otherwise numeric
    Description: historical alias for stddev_samp

stddev_pop(expression)
    Argument type: smallint, int, bigint, real, double precision, or numeric
    Return type: double precision for floating-point arguments, otherwise numeric
    Description: population standard deviation of the input values

stddev_samp(expression)
    Argument type: smallint, int, bigint, real, double precision, or numeric
    Return type: double precision for floating-point arguments, otherwise numeric
    Description: sample standard deviation of the input values

variance(expression)
    Argument type: smallint, int, bigint, real, double precision, or numeric
    Return type: double precision for floating-point arguments, otherwise numeric
    Description: historical alias for var_samp

var_pop(expression)
    Argument type: smallint, int, bigint, real, double precision, or numeric
    Return type: double precision for floating-point arguments, otherwise numeric
    Description: population variance of the input values (square of the population standard deviation)

var_samp(expression)
    Argument type: smallint, int, bigint, real, double precision, or numeric
    Return type: double precision for floating-point arguments, otherwise numeric
    Description: sample variance of the input values (square of the sample standard deviation)
This section describes the SQL-compliant subquery expressions available in PostgreSQL. All of the expression forms documented in this section return Boolean (true/false) results.

EXISTS
EXISTS (subquery)

The argument of EXISTS is an arbitrary SELECT statement, or subquery. The subquery is evaluated to determine whether it returns any rows. If it returns at least one row, the result of EXISTS is "true"; if the subquery returns no rows, the result of EXISTS is "false". The subquery can refer to variables from the surrounding query, which will act as constants during any one evaluation of the subquery. The subquery will generally only be executed far enough to determine whether at least one row is returned, not all the way to completion. It is unwise to write a subquery that has any side effects (such as calling sequence functions); whether the side effects occur or not may be difficult to predict. Since the result depends only on whether any rows are returned, and not on the contents of those rows, the output list of the subquery is normally uninteresting. A common coding convention is to write all EXISTS tests in the form EXISTS(SELECT 1 WHERE ...). There are exceptions to this rule however, such as subqueries that use INTERSECT. This simple example is like an inner join on col2, but it produces at most one output row for each tab1 row, even if there are multiple matching tab2 rows:

SELECT col1 FROM tab1 WHERE EXISTS(SELECT 1 FROM tab2 WHERE col2 = tab1.col2);

IN
expression IN (subquery)

The right-hand side is a parenthesized subquery, which must return exactly one column. The left-hand expression is evaluated and compared to each row of the subquery result. The result of IN is "true" if any equal subquery row is found. The result is "false" if no equal row is found (including the special case where the subquery returns no rows). Note that if the left-hand expression yields null, or if there are no equal right-hand values and at least one right-hand row yields null, the result of the IN construct will be null, not false. This is in accordance with SQL's normal rules for Boolean combinations of null values. As with EXISTS, it's unwise to assume that the subquery will be evaluated completely.

row_constructor IN (subquery)
The left-hand side of this form of IN is a row constructor, as described in Section 4.2.11. The right-hand side is a parenthesized subquery, which must return exactly as many columns as there are expressions in the left-hand row. The left-hand expressions are evaluated and compared row-wise to each row of the subquery result. The result of IN is "true" if any equal subquery row is found. The result is "false" if no equal row is found (including the special case where the subquery returns no rows). As usual, null values in the rows are combined per the normal rules of SQL Boolean expressions. Two rows are considered equal if all their corresponding members are non-null and equal; the rows are unequal if any corresponding members are non-null and unequal; otherwise the result of that row comparison is unknown (null). If all the per-row results are either unequal or null, with at least one null, then the result of IN is null.

NOT IN
expression NOT IN (subquery)
The right-hand side is a parenthesized subquery, which must return exactly one column. The left-hand expression is evaluated and compared to each row of the subquery result. The result of NOT IN is "true" if only unequal subquery rows are found (including the special case where the subquery returns no rows). The result is "false" if any equal row is found. Note that if the left-hand expression yields null, or if there are no equal right-hand values and at least one right-hand row yields null, the result of the NOT IN construct will be null, not true. This is in accordance with SQL's normal rules for Boolean combinations of null values. As with EXISTS, it's unwise to assume that the subquery will be evaluated completely.
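This null behaviour is a frequent source of surprise. A tiny sketch (an inline value list behaves like a subquery result here):

    -- 1 is not equal to 2 or 3, but the NULL makes the result null, not true,
    -- so the WHERE clause rejects the row and no row is returned:
    SELECT 'kept' WHERE 1 NOT IN (2, 3, NULL);

    -- Without the null, one row is returned:
    SELECT 'kept' WHERE 1 NOT IN (2, 3);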

row_constructor NOT IN (subquery)


The left-hand side of this form of NOT IN is a row constructor, as described in Section 4.2.11. The right-hand side is a parenthesized subquery, which must return exactly as many columns as there are expressions in the left-hand row. The left-hand expressions are evaluated and compared row-wise to each row of the subquery result. The result of NOT IN is "true" if only unequal subquery rows are found (including the special case where the subquery returns no rows). The result is "false" if any equal row is found. As usual, null values in the rows are combined per the normal rules of SQL Boolean expressions. Two rows are considered equal if all their corresponding members are non-null and equal; the rows are unequal

if any corresponding members are non-null and unequal; otherwise the result of that row comparison is unknown (null). If all the per-row results are either unequal or null, with at least one null, then the result of NOT IN is null.

ANY/SOME
expression operator ANY (subquery) expression operator SOME (subquery)
The right-hand side is a parenthesized subquery, which must return exactly one column. The left-hand expression is evaluated and compared to each row of the subquery result using the given operator, which must yield a Boolean result. The result of ANY is "true" if any true result is obtained. The result is "false" if no true result is found (including the special case where the subquery returns no rows).

SOME is a synonym for ANY. IN is equivalent to = ANY.


Note that if there are no successes and at least one right-hand row yields null for the operator's result, the result of the ANY construct will be null, not false. This is in accordance with SQL's normal rules for Boolean combinations of null values. As with EXISTS, it's unwise to assume that the subquery will be evaluated completely.
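For example, a sketch with a hypothetical products table:

    -- Each product cheaper than at least one product in category 9:
    SELECT name
    FROM products
    WHERE price < ANY (SELECT price FROM products WHERE category = 9);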

row_constructor operator ANY (subquery) row_constructor operator SOME (subquery)

The left-hand side of this form of ANY is a row constructor, as described in Section 4.2.11. The right-hand side is a parenthesized subquery, which must return exactly as many columns as there are expressions in the left-hand row. The left-hand expressions are evaluated and compared row-wise to each row of the subquery result, using the given operator. The result of ANY is "true" if the comparison returns true for any subquery row. The result is "false" if the comparison returns false for every subquery row (including the special case where the subquery returns no rows). The result is NULL if the comparison does not return true for any row, and it returns NULL for at least one row.

ALL
expression operator ALL (subquery)
The right-hand side is a parenthesized subquery, which must return exactly one column. The left-hand expression is evaluated and compared to each row of the subquery result using the given operator, which must yield a Boolean result. The result of ALL is "true" if all rows yield true (including the special case where the subquery returns no rows). The result is "false" if any false result is found. The result is NULL if the comparison does not return false for any row, and it returns NULL for at least one row.

NOT IN is equivalent to <> ALL.


As with EXISTS, it's unwise to assume that the subquery will be evaluated completely.

row_constructor operator ALL (subquery)


The left-hand side of this form of ALL is a row constructor, as described in Section 4.2.11. The right-hand side is a parenthesized subquery, which must return exactly as many columns as there are expressions in the left-hand row. The left-hand expressions are evaluated and compared row-wise to each row of the subquery result, using the given operator. The result of ALL is "true" if the comparison returns true for all subquery rows (including the special case where the subquery returns no rows). The result is "false" if the comparison returns false for any subquery row. The result is NULL if the comparison does not return false for any subquery row, and it returns NULL for at least one row.

Row-wise Comparison

row_constructor operator (subquery)


The left-hand side is a row constructor, as described in Section 4.2.11. The right-hand side is a parenthesized subquery, which must return exactly as many columns as there are expressions in the left-hand row. Furthermore, the subquery cannot return more than one row. (If it returns zero rows, the result is taken to be null.) The left-hand side is evaluated and compared row-wise to the single subquery result row.
5. Compare and Contrast the Centralized and Client/Server Architectures for DBMS.

Answer: In centralized database systems, the database system, application programs, and user interface are all
executed on a single system, and dumb terminals are connected to it. The processing power of the single system is utilized, and the dumb terminals are used only to display the information. As personal computers became faster, more powerful, and cheaper, database systems started to exploit the available processing power of the system at the user's side, which led to the development of client/server architecture.

In client/server architecture, the processing power of the computer system at the user's end is utilized by processing the user interface on that system. A client is a computer system that sends a request to the server connected to the network, and a server is a computer system that receives the request, processes it, and returns the requested information back to the client. Client and server are usually present at different sites. The end users (remote database users) work on the client computer system and the database system runs on the server. Servers can be of several types, for example, file servers, printer servers, web servers, database servers, etc. The client machines have user interfaces that help users utilize the servers; the client also provides users the local processing power to run local applications.

There are two approaches to implement client/server architecture. In the first approach, the user interface and application programs are placed on the client side and the database system on the server side. This architecture is called two-tier architecture. The application programs that reside at the client side invoke the DBMS at the server side. Application program interface standards like Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC) are used for interaction between client and server. The figure below shows the two-tier architecture.

(Figure: two-tier architecture - the END USER and APPLICATION PROGRAMMER interact with the CLIENT, which communicates with the SERVER running the database system.)

The second approach, that is, three-tier architecture, is primarily used for web-based applications. It adds an intermediate layer known as the application server (or web server) between the client and the database server. The client communicates with the application server, which in turn communicates with the database server. The application server stores the business rules (procedures and constraints) used for accessing data from the database server. It checks the client's credentials before forwarding a request to the database server; hence, it improves database security. When a client requests information, the application server accepts the request, processes it, and sends the corresponding database commands to the database server. The database server sends the result back to the application server, which converts it into GUI format and presents it to the client. The figure below shows the three-tier architecture.

(Figure: three-tier architecture - a CLIENT running GUI / web interface application programs, an APPLICATION SERVER / WEB SERVER hosting application programs and web pages, and a DATABASE SERVER running the database system.)

6. Taking an example Enterprise System, list out the Entity Types, Entity Sets, Attributes and Keys.

Answer: Entity Types, Entity Sets, Attributes, and Keys

Entities and Their Attributes: The basic object that the ER model represents is an entity, which is a "thing" in the real world with an independent existence. An entity may be an object with a physical existence (for example, a particular person, car, house, or employee) or it may be an object with a conceptual existence (for example, a company, a job, or a university course). Each entity has attributes: the particular properties that describe it. For example, an employee entity may be described by the employee's name, age, address, salary, and job. A particular entity will have a value for each of its attributes. The attribute values that describe each entity become a major part of the data stored in the database.

Figure 2.3 shows two entities and the values of their attributes. The employee entity e1 has four attributes: Name, Address, Age, and HomePhone; their values are "Mahesh Kumar", "2311 Ameerpet, Hyderabad, AP 500001", "55", and "402459672", respectively. The company entity c1 has three attributes: Name, Headquarters, and President; their values are "CDAC", "Hyderabad", and "Mahesh Kumar", respectively.

(Figure: E-R schema diagram for the company database.)

Several types of attributes occur in the ER model: simple versus composite, single-valued versus multivalued, and stored versus derived. We first define these attribute types and illustrate their use via examples. We then introduce the concept of a null value for an attribute.

Composite versus Simple (Atomic) Attributes: Composite attributes can be divided into smaller subparts, which represent more basic attributes with independent meanings. For example, the Address attribute of the employee entity shown in Figure 2.3 can be subdivided into StreetAddress, City, State, and Zip, with the values "2311 Ameerpet", "Hyderabad", "AP" and "500001". Attributes that are not divisible are called simple or atomic attributes. Composite attributes can form a hierarchy. For example, StreetAddress can be further subdivided into three simple attributes: Number, Street, and ApartmentNumber, as shown in Figure 2.4. The value of a composite attribute is the concatenation of the values of its constituent simple attributes.

A hierarchy of composite attributes. Composite attributes are useful to model situations in which a user sometimes refers to the composite attribute as a unit but at other times refers specifically to its components. If the composite attribute is referenced only as a whole, there is no need to subdivide it into component attributes. For example, if there is no need to refer to the individual components of an address (zip code, street, and so on), then the whole address can be designated as a simple attribute.

Single-Valued versus Multivalued Attributes: Most attributes have a single value for a particular entity; such attributes are called single-valued. For example, Age is a single-valued attribute of a person. In some cases an attribute can have a set of values for the same entity, for example, a Colors attribute for a car, or a CollegeDegrees attribute for a person. Cars with one color have a single value, whereas two-tone cars have two values for Colors. Similarly, one person may not have a college degree, another person may have one, and a third person may have two or more degrees; therefore, different persons can have different numbers of values for the CollegeDegrees attribute. Such attributes are called multivalued. A multivalued attribute may have lower and upper bounds to constrain the number of values allowed for each individual entity. For example, the Colors attribute of a car may have between one and three values, if we assume that a car can have at most three colors.

Stored versus Derived Attributes: In some cases, two (or more) attribute values are related, for example, the Age and DOB attributes of a person. For a particular person entity,

the value of Age can be determined from the current (today's) date and the value of that person's date of birth (DOB). The Age attribute is hence called a derived attribute and is said to be derivable from the DOB attribute, which is called a stored attribute. Some attribute values can be derived from related entities. For example, an attribute NumberOfEmployees of a department entity can be derived by counting the number of employees related to (working for) that department.

Null Values: In some cases a particular entity may not have an applicable value for an attribute. For example, the ApartmentNumber attribute of an address applies only to addresses that are in apartment buildings and not to other types of residences, such as single-family homes. Similarly, a CollegeDegrees attribute applies only to persons with college degrees. For such situations, a special value called null is created. An address of a single-family home would have null for its ApartmentNumber attribute, and a person with no college degree would have null for CollegeDegrees. Null can also be used if we do not know the value of an attribute for a particular entity, for example, if we do not know the home phone of "Mahesh Kumar". The meaning of the former type of null is not applicable, whereas the meaning of the latter is unknown. The "unknown" category of null can be further classified into two cases. The first case arises when it is known that the attribute value exists but is missing, for example, if the Height attribute of a person is listed as null. The second case arises when it is not known whether the attribute value exists, for example, if the HomePhone attribute of a person is null.

Complex Attributes: Notice that composite and multivalued attributes can be nested in an arbitrary way. We can represent arbitrary nesting by grouping components of a composite attribute between parentheses () and separating the components with commas, and by displaying multivalued attributes between braces {}. Such attributes are called complex attributes. For example, if a person can have more than one residence and each residence can have multiple phones, an attribute AddressPhone for a person can be specified as shown in Figure 2.5.

A complex attribute: AddressPhone.

Entity Types, Entity Sets, Keys, and Value Sets

Entity Types and Entity Sets: A database usually contains groups of entities that are similar. For example, a company employing hundreds of employees may want to store similar information concerning each of the employees. These employee entities share the same attributes, but each entity has its own value(s) for each attribute. An entity type defines a collection (or set) of entities that have the same attributes. Each entity type in the database is described by its name and attributes. Figure 2.6 shows two entity types, named EMPLOYEE and COMPANY, and a list of attributes for each. A few individual entities of each type are also illustrated, along with the values of their attributes. The collection of all entities of a particular

entity type in the database at any point in time is called an entity set; the entity set is usually referred to using the same name as the entity type. For example, EMPLOYEE refers both to a type of entity and to the current set of all employee entities in the database. An entity type is represented in ER diagrams as a rectangular box enclosing the entity type name. Attribute names are enclosed in ovals and are attached to their entity type by straight lines. Composite attributes are attached to their component attributes by straight lines. Multivalued attributes are displayed in double ovals. An entity type describes the schema or intension for a set of entities that share the same structure. The collection of entities of a particular entity type is grouped into an entity set, which is also called the extension of the entity type.

Key Attributes of an Entity Type: An important constraint on the entities of an entity type is the key or uniqueness constraint on attributes. An entity type usually has an attribute whose values are distinct for each individual entity in the entity set. Such an attribute is called a key attribute, and its values can be used to identify each entity uniquely. For example, the Name attribute is a key of the COMPANY entity type in Figure 2.6, because no two companies are allowed to have the same name. Sometimes, several attributes together form a key, meaning that the combination of the attribute values must be distinct for each entity. If a set of attributes possesses this property, the proper way to represent this in the ER model described here is to define a composite attribute and designate it as a key attribute of the entity type. Specifying that an attribute is a key of an entity type means that the preceding uniqueness property must hold for every entity set of the entity type. Hence, it is a constraint that prohibits any two entities from having the same value for the key attribute at the same time. It is not the property of a particular extension; rather, it is a constraint on all extensions of the entity type. This key constraint (and other constraints we discuss later) is derived from the constraints of the miniworld that the database represents.

Some entity types have more than one key attribute. For example, each of the VehicleID and Registration attributes of the entity type CAR is a key in its own right. The Registration attribute is an example of a composite key formed from two simple component attributes, RegistrationNumber and State, neither of which is a key on its own. An entity type may also have no key, in which case it is called a weak entity type.

Value Sets (Domains) of Attributes: Each simple attribute of an entity type is associated with a value set (or domain of values), which specifies the set of values that may be assigned to that attribute for each individual entity. We can specify the value set for the Name attribute as being the set of strings of alphabetic characters separated by blank characters, and so on. Value sets are not displayed in ER diagrams. Value sets are typically specified using the basic data types available in most

programming languages, such as integer, string, boolean, float, enumerated type, subrange, and so on. Additional data types to represent date, time, and other concepts are also employed.

Mathematically, an attribute A of entity type E whose value set is V can be defined as a function from E to the power set P(V) of V:

    A : E → P(V)

We refer to the value of attribute A for entity e as A(e). The previous definition covers both single-valued and multivalued attributes, as well as nulls. A null value is represented by the empty set. For single-valued attributes, A(e) is restricted to being a singleton set for each entity e in E, whereas there is no restriction on multivalued attributes. For a composite attribute A, the value set V is the Cartesian product of P(V1), P(V2), ..., P(Vn), where V1, V2, ..., Vn are the value sets of the simple component attributes that form A:

    V = P(V1) × P(V2) × ... × P(Vn)

7. Illustrate with an example of your own the Relational Model Notations.

Answer: Relational Data Model: The model uses the concept of a mathematical relation (which looks somewhat like a table of values) as its basic building block, and has its theoretical basis in set theory and first-order predicate logic. The relational model represents the database as a collection of relations. Each relation resembles a table of values or, to some extent, a flat file of records. When a relation is thought of as a table of values, each row in the table represents a collection of related data values. In the relational model, each row in the table represents a fact that typically corresponds to a real-world entity or relationship. The table name and column names are used to help in interpreting the meaning of the values in each row. In the formal relational model terminology, a row is called a tuple, a column header is called an attribute, and the table is called a relation. The data type describing the types of values that can appear in each column is represented by a domain of possible values.

ER Model: An entity-relationship model (ERM) is an abstract and conceptual representation of data. Entity-relationship modeling is a database modeling method, used to produce a type of conceptual schema or semantic data model of a system, often a relational database, and its requirements in a top-down fashion. Diagrams created by this process are called entity-relationship diagrams, ER diagrams, or ERDs. The first stage of information system design uses these models during requirements analysis to describe information needs or the type of information that is to be stored in a database. In the case of the design of an information system that is based on a database, the conceptual data model is, at a later stage (usually called logical design), mapped to a logical data model, such as the relational model; this in turn is mapped to a

physical model during physical design. Sometimes, both of these phases are referred to as "physical design". We create a relational schema from an entity-relationship (ER) schema. Key elements of this model are entities, attributes, identifiers and relationships.

Correspondence between ER and Relational Models:

ER Model                       Relational Model
Entity type                    Entity relation
1:1 or 1:N relationship type   Foreign key
M:N relationship type          Relationship relation and two foreign keys
n-ary relationship type        Relationship relation and n foreign keys
Simple attributes              Attributes
Composite attributes           Set of simple component attributes
Multivalued attributes         Relation and foreign key
Value set                      Domain
Key attribute                  Primary key or secondary key

The COMPANY ER schema is below (diagram not reproduced here):

Result of mapping the COMPANY ER schema into a relational database schema:

EMPLOYEE(FNAME, INITIAL, LNAME, ENO, DOB, ADDRESS, SEX, SALARY, SUPERENO, DNO)

DEPARTMENT(DNAME, DNUMBER, MGRENO, MGRSTARTDATE)

DEPT_LOCATIONS(DNUMBER, DLOCATION)

PROJECT(PNAME, PNUMBER, PLOCATION, DNUM)

WORKS_ON(EENO, PNO, HOURS)

DEPENDENT(EENO, DEPENDENT_NAME, SEX, DOB, RELATIONSHIP)
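As an illustration of how part of this mapping might be declared in SQL (the column types are assumptions; only two of the relations are shown):

    CREATE TABLE DEPARTMENT (
        DNAME        TEXT,
        DNUMBER      INTEGER PRIMARY KEY,
        MGRENO       TEXT,
        MGRSTARTDATE TEXT
    );

    -- The multivalued attribute DLOCATION becomes its own relation
    -- with a foreign key back to DEPARTMENT.
    CREATE TABLE DEPT_LOCATIONS (
        DNUMBER   INTEGER REFERENCES DEPARTMENT(DNUMBER),
        DLOCATION TEXT,
        PRIMARY KEY (DNUMBER, DLOCATION)
    );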

8. Consider a University Database System and develop the ER Conceptual Schema diagram, i.e. the E-R Diagram, for the same.
Answer:

Schemas, Instances, and Database State

In any data model, it is important to distinguish between the description of the database and the database itself. The description of a database is called the database schema, which is specified during database design and is not expected to change frequently. Most data models have certain conventions for displaying schemas as diagrams. A displayed schema is called a schema diagram. Figure 1.1 shows a schema diagram for a database; the diagram displays the structure of each record type but not the actual instances of records. We call each object in the schema, such as STUDENT or COURSE, a schema construct. A schema diagram displays only some aspects of a schema, such as the names of record types and data items, and some types of constraints. Other aspects are not specified in the schema diagram, and many types of constraints are not represented in schema diagrams at all. A constraint such as "students majoring in computer science must take CS1310 before the end of their second year" is quite difficult to represent.

The actual data in a database may change quite frequently. The data in the database at a particular moment in time is called a database state or snapshot. It is also called the current set of occurrences or instances in the database. In a given database state, each schema construct has its own current set of instances. Many database states can be constructed to correspond to a particular database schema. Every time we insert or delete a record, or change the value of a data item in a record, we change one state of the database into another state.

The distinction between database schema and database state is very important. When we define a new database, we specify its database schema only to the DBMS. At this point, the corresponding database state is the empty state with no data. We get the initial state of the database when the database is first populated or loaded with the initial data. From then on, every time an update operation is applied to the database, we get another database state. At any point in time, the database has a current state. The DBMS is partly responsible for ensuring that every state of the database is a valid state, that is, a state that satisfies the structure and constraints specified in the schema. Hence, specifying a correct schema to the DBMS is extremely important, and the schema must be designed with the utmost care. The DBMS stores the descriptions of the schema constructs and constraints (also called the meta-data) in the DBMS catalog so that DBMS software can refer to the schema whenever it needs to. The schema is sometimes called the intension, and a database state an extension of the schema.

Although, as mentioned earlier, the schema is not supposed to change frequently, it is not uncommon that changes need to be applied to the schema occasionally as the application requirements change. Most modern DBMSs include some operations for schema evolution that can be applied while the database is operational.

MC0067 Database Management


Assignment Set 2
BOOK ID: B0716 & BO717

1. Write about:

Answer : Physical Storage Structure of DBMS:

The physical design of the database specifies the physical configuration of the database on the storage media. This includes detailed specification of data elements, data types, indexing options and other parameters residing in the DBMS data dictionary. It is the detailed design of a system that includes modules and the hardware and software specifications of the system. Physical structures are those that can be seen and operated on from the operating system, such as the physical files that store data on a disk.

Basic Storage Concepts (Hard Disk):

disk access time = seek time + rotational delay (+ transfer time for the data itself)

Disk access times are much slower than access to main memory, so the overriding DBMS performance objective is to minimise the number of disk accesses (disk I/Os).
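As a rough worked example (figures assumed purely for illustration): with an average seek time of 8 ms and a 7,200 rpm disk, the average rotational delay is half a revolution, i.e. (60 / 7200) / 2, approximately 4.2 ms, so a single random disk access costs roughly 8 + 4.2 = 12 ms. A main-memory access takes on the order of 100 ns, roughly five orders of magnitude faster, which is why minimising disk I/Os dominates physical database design.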

Indexing: A data structure allowing a DBMS to locate particular records more quickly and hence speed up queries. A book index has an index term (stored in alphabetic order) with a page number; a database index (on a particular attribute) has an attribute value (stored in order) with a memory address. An index gives direct access to a record and prevents having to scan every record sequentially to find the one required.

Using SUPPLIER(Supp#, SName, SCity), consider the query "Get all the suppliers in a certain city" (e.g. London). There are two possible strategies:

a. Search the entire supplier file for records with city 'London'
b. Create an index on cities, access it for 'London' entries and follow the pointers to the corresponding records

SUPPLIER
Supp#   SName   SCity
S1      Smith   London
S2      Jones   Paris
S3      Brown   Paris
S4      Clark   London
S5      Ellis   Dublin

SCity Index (each entry points to the matching SUPPLIER record)
Dublin
London
London
Paris
Paris
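A minimal SQL sketch of strategy (b), with the index name assumed for illustration and column names taken from the example above:

CREATE INDEX idx_supplier_scity ON SUPPLIER (SCity);

SELECT Supp#, SName
FROM SUPPLIER
WHERE SCity = 'London';

Once the index exists, the DBMS can locate the 'London' entries in idx_supplier_scity and follow their record pointers directly instead of scanning the whole SUPPLIER file.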

2. Write about:

Answer : Application Logic


Database architectures can be distinguished by examining the way application logic is distributed throughout the system. Application logic consists of three components: Presentation Logic, Processing Logic, and Storage Logic. The presentation logic component is responsible for formatting and presenting data on the user's screen. The processing logic component handles data processing logic, business rules logic, and data management logic. Finally, the storage logic component is responsible for storage and retrieval from actual devices such as a hard drive or RAM. By determining which tier(s) these components are processed on, we can get a good idea of what type of architecture and subtype we are dealing with.

One Tier Architectures

Imagine a person on a desktop computer who uses Microsoft Access to load up a list of personal addresses and phone numbers saved in the MS Windows "My Documents" folder. This is an example of a one-tier database architecture. The program (Microsoft Access) runs on the user's local machine, and references a file that is stored on that machine's hard drive, thus using a single physical resource to access and process information.

Another example of a one-tier architecture is a file server architecture. In this scenario, a workgroup database is stored in a shared location on a single machine. Workgroup members use a software package such as Microsoft Access to load the data and then process it on their local machine. In this case, the data may be shared among different users, but all of the processing occurs on the local machine. Essentially, the file server is just an extra hard drive from which to retrieve files.

Yet another way one-tier architectures have appeared is in mainframe computing. In this outdated system, large machines provide directly connected, unintelligent terminals with the means necessary to access, view and manipulate data. Even though this is considered a client-server system, since all of the processing power (for both data and applications) occurs on a single machine, we have a one-tier architecture. One-tier architectures can be beneficial when we are dealing with data that is relevant to a single user (or a small number of users) and we have a relatively small amount of data. They are somewhat inexpensive to deploy and maintain.

Client/Server (Two Tier) Architectures

A two-tier architecture is one that is familiar to many of today's computer users. A common implementation of this type of system is a Microsoft Windows based client program that accesses a server database such as Oracle or SQL Server. Users interact through a GUI (Graphical User Interface) to communicate with the database server across a network via SQL (Structured Query Language). In two-tier architectures it is important to note that two configurations exist. A thin-client (fat-server) configuration exists when most of the processing occurs on the server tier. Conversely, a fat-client (thin-server) configuration exists when most of the processing occurs on the client machine.

Another example of a two-tier architecture can be seen in web-based database applications. In this case, users interact with the database through applications that are hosted on a web server and displayed through a web browser such as Internet Explorer. The web server processes the web application, which can be written in a language such as PHP or ASP. The web application connects to a database server to pass along SQL statements, which in turn are used to access, view, and modify data. The database server then passes back the requested data, which is then formatted by the web server for the user. Although this appears to be a three-tier system because of the number of machines required to complete the process, it is not: the web server does not normally house any of the business rules and therefore should be considered part of the client tier in partnership with the web browser. Two-tier architectures can prove to be beneficial when we have a relatively small number of users on the system (100-150) and we desire an increased level of scalability.

(Figure: Client-Server Architecture)

3. Write about: E-R Modeling and E-R Notations, with examples

Answer : Basic Constructs of E-R Modeling:

The ER model views the real world as a construct of entities and associations between entities. The basic constructs of ER modeling are entities, attributes, and relationships.

Entity: An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order. Although the term entity is the one most commonly used, following Chen we should really distinguish between an entity and an entity-type. An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type. There are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for it.

Entities can be thought of as nouns. Examples: a computer, an employee, a song, a mathematical theorem.

Relationship: A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs, linking two or more nouns. Examples: an "owns" relationship between a company and a computer, a "supervises" relationship between an employee and a department, a "performs" relationship between an artist and a song, a "proved" relationship between a mathematician and a theorem.

Attributes: Entities and relationships can both have attributes. Examples: an employee entity might have a Social Security Number (SSN) attribute; the "proved" relationship may have a date attribute.

ER Notation

There is no standard for representing data objects in ER diagrams. Each modeling methodology uses its own notation. The original notation used by Chen is widely used in academic texts and journals but rarely seen in either CASE tools or publications by non-academics. Today, there are a number of notations in use; among the more common are Bachman, crow's foot, and IDEF1X. All notational styles represent entities as rectangular boxes and relationships as lines connecting boxes. Each style uses a special set of symbols to represent the cardinality of a connection. The notation used in this document is from Martin. The symbols used for the basic ER constructs are:

Entities are represented by labeled rectangles. The label is the name of the entity. Entity names should be singular nouns.

Relationships are represented by a solid line connecting two entities. The name of the relationship is written above the line. Relationship names should be verbs.

Attributes, when included, are listed inside the entity rectangle. Attributes which are identifiers are underlined. Attribute names should be singular nouns.

Cardinality of many is represented by a line ending in a crow's foot. If the crow's foot is omitted, the cardinality is one.

Existence is represented by placing a circle or a perpendicular bar on the line. Mandatory existence is shown by the bar (it looks like a 1) next to the entity whose instances are required. Optional existence is shown by placing a circle next to the entity that is optional.

(Figure: examples of these ER Notation symbols)
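Although the notation itself is graphical, the cardinality and existence rules translate directly into relational DDL. A minimal sketch (table and column names assumed for illustration) of a one-to-many relationship in which every EMPLOYEE must work in a DEPARTMENT:

CREATE TABLE DEPARTMENT (
    DeptNo INTEGER PRIMARY KEY,
    DName  VARCHAR(30) NOT NULL
);

CREATE TABLE EMPLOYEE (
    EmpNo  INTEGER PRIMARY KEY,
    EName  VARCHAR(30) NOT NULL,
    DeptNo INTEGER NOT NULL REFERENCES DEPARTMENT (DeptNo)
    -- NOT NULL makes existence mandatory (the bar); a nullable DeptNo
    -- would correspond to optional existence (the circle)
);

The crow's foot (many) side of the relationship becomes the table that carries the foreign key.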

4. Write about:

with appropriate examples for each

Answer : Types of Discretionary Privileges:

The concept of an authorization identifier is used to refer to a user account. The DBMS must provide selective access to each relation in the database based on specific accounts. There are two levels for assigning privileges to use the database system:

The account level: At this level, the DBA specifies the particular privileges that each account holds independently of the relations in the database.

The relation (or table) level: At this level, the DBA can control the privilege to access each individual relation or view in the database.

The privileges at the account level apply to the capabilities provided to the account itself and can include the CREATE SCHEMA or CREATE TABLE privilege, to create a schema or base relation; the CREATE VIEW privilege; the ALTER privilege, to apply schema changes such as adding or removing attributes from relations; the DROP privilege, to delete relations or views; the MODIFY privilege, to insert, delete, or update tuples; and the SELECT privilege, to retrieve information from the database by using a SELECT query.

The second level of privileges applies to the relation level, whether the relations are base relations or virtual (view) relations. The granting and revoking of privileges generally follow an authorization model for discretionary privileges known as the access matrix model, where the rows of a matrix M represent subjects (users, accounts, programs) and the columns represent objects (relations, records, columns, views, operations). Each position M(i,j) in the matrix represents the types of privileges (read, write, update) that subject i holds on object j.

To control the granting and revoking of relation privileges, each relation R in a database is assigned an owner account, which is typically the account that was used when the relation was created in the first place. The owner of a relation is given all privileges on that relation. The owner account holder can pass privileges on R to other users by granting privileges to their accounts. In SQL the following types of privileges can be granted on each individual relation R:

SELECT (retrieval or read) privilege on R: Gives the account retrieval privilege. In SQL this gives the account the privilege to use the SELECT statement to retrieve tuples from R.

MODIFY privileges on R: This gives the account the capability to modify tuples of R. In SQL this privilege is further divided into UPDATE, DELETE, and INSERT privileges to apply the corresponding SQL command to R. In addition, both the INSERT and UPDATE privileges can specify that only certain attributes can be updated or inserted by the account.

REFERENCES privilege on R: This gives the account the capability to reference relation R when specifying integrity constraints. The privilege can also be restricted to specific attributes of R.

Propagation of Privileges using the GRANT OPTION: Whenever the owner A of a relation R grants a privilege on R to another account B, the privilege can be given to B with or without the GRANT OPTION. If the GRANT OPTION is given, this means that B can also grant that privilege on R to other accounts. Suppose that B is given the GRANT OPTION by A and that B then grants the privilege on R to a third account C, also with GRANT OPTION. In this way, privileges on R can propagate to other accounts without the knowledge of the owner of R. If the owner account A now revokes the privilege granted to B, all the privileges that B propagated based on that privilege should automatically be revoked by the system.
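A minimal SQL sketch of this propagation (account and table names assumed for illustration):

GRANT SELECT ON EMPLOYEE TO userB WITH GRANT OPTION;
-- userB may now pass the privilege on to another account:
GRANT SELECT ON EMPLOYEE TO userC;
-- If the owner later revokes userB's privilege, the revocation
-- cascades to the privilege userB granted to userC:
REVOKE SELECT ON EMPLOYEE FROM userB CASCADE;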

5. Describe the following Association Rules:

Classification: Classification is the process of learning a model that describes different classes of data. The classes are predetermined. For example, in a banking application, customers who apply for a credit card may be classified as a poor risk, a fair risk, or a good risk. Hence this type of activity is also called supervised learning. Once the model is built, it can be used to classify new data. The first step, of learning the model, is accomplished by using a training set of data that has already been classified. Each record in the training data contains an attribute, called the class label, that indicates which class the record belongs to. The model that is produced is usually in the form of a decision tree or a set of rules. Some of the important issues with regard to the model and the algorithm that produces the model include the model's ability to predict the correct class of new data, the computational cost associated with the algorithm, and the scalability of the algorithm. We will examine the approach where our model is in the form of a decision tree. A decision tree is simply a graphical representation of the description of each class or, in other words, a representation of the classification rules.

Clustering: Association rules and clustering are fundamental data mining techniques used for different goals. We propose a unifying theory by proving association support and rule confidence can be bounded and estimated from clusters on binary dimensions. Three support metrics are introduced: lower, upper and average support. Three confidence metrics are proposed: lower, upper and average confidence. Clusters represent a simple model that allows understanding and approximating association rules, instead of searching for them in a large transaction data set.
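Support and confidence for a single association rule can be computed directly in SQL. A minimal sketch, assuming a hypothetical Transactions(tid, item) table and the rule {bread} => {butter}:

WITH both_items AS (
    SELECT DISTINCT t1.tid
    FROM Transactions t1
    JOIN Transactions t2 ON t1.tid = t2.tid
    WHERE t1.item = 'bread' AND t2.item = 'butter'
)
SELECT
    (SELECT COUNT(*) FROM both_items) * 1.0
        / (SELECT COUNT(DISTINCT tid) FROM Transactions) AS support,
    (SELECT COUNT(*) FROM both_items) * 1.0
        / (SELECT COUNT(DISTINCT tid) FROM Transactions
           WHERE item = 'bread') AS confidence;

Support is the fraction of all transactions containing both items; confidence is the fraction of the 'bread' transactions that also contain 'butter'.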
6. Write about:

With an example for each

Answer : Categories of Data Models


A model is a representation of reality, real world objects and events, and their associations. It is an abstraction that concentrates on the essential, inherent aspects of an organization and ignores the accidental properties. A data model represents the organization itself. It should provide the basic concepts and notations that will allow database designers and end users to communicate their understanding of the organizational data unambiguously and accurately.

A Data Model can be defined as an integrated collection of concepts for describing and manipulating data, relationships between data, and constraints on the data in an organization. A data model comprises three components:

A structural part, consisting of a set of rules according to which databases can be constructed.

A manipulative part, defining the types of operation that are allowed on the data (this includes the operations that are used for updating or retrieving data from the database and for changing the structure of the database).

Possibly a set of integrity rules, which ensures that the data is accurate.

The purpose of a data model is to represent data and to make the data understandable. Many data models have been proposed in the literature. They fall into three broad categories:

Object Based Data Models
Record Based Data Models
Physical Data Models

The object based and record based data models are used to describe data at the conceptual and external levels; the physical data model is used to describe data at the internal level.

A) Object Based Logical Models: These models are used to describe data at the logical and view levels. The following are the well known models in this group.

Entity Relationship Model
The Object-Oriented Model
The Semantic Data Model
The Functional Data Model

Entity Relationship Model:

Entity: An entity is an object or a thing, such as a person or a place, about which an organization keeps information. Any two objects or things are distinguishable. E.g.: each student is an entity.

Attribute: The describing properties of an entity are called attributes. E.g.: for a student entity, name, sex and date of birth are attributes.

Relationship: An association among entities is called a relationship. The data model that consists of a set of entities and a set of relationships among those entities is called the ER Model. The set of all entities of the same type is called an entity set, and the set of all relationships of the same type is called a relationship set.

The Object-Oriented Model: The object oriented model is a data model based on a collection of objects. Each object has a unique identity. A group of objects containing the same types of values and the same methods is called a class.

The Semantic Data Model: These models were based on semantic networks. Interdependencies among the entities can be expressed in this data model.

The Functional Data Model: In this model objects, properties of objects and their relationships are viewed uniformly and are defined as functions.

B) Record Based Logical Models: These models are used to describe data at the logical and view levels. The database is structured in fixed-format records of different types. Each record type has a fixed number of fields, and each field is of fixed length. The following are the three important record based logical models:

Relational Model
Network Model
Hierarchical Model

Relational Model: A data model in which both data and their relationships are represented by means of tables is called the Relational Model. The relation is the only data structure used in this model to represent both entities and their interrelationships. A relation is a two dimensional table with a unique name. Each row of a table is called a tuple and each column of a table is called an attribute. The set of all possible values in an attribute is called the domain of the attribute.

Network Model: The network model uses two different structures. The data are represented by a collection of records and the relationships among data are represented by links.

Hierarchical Model: In the Hierarchical Model, data are represented by records and relationships among data are represented by links. But unlike the Network Model, data are organized in an ordered tree structure, which is called a hierarchical structure.

C) Physical Data Models: These models are used to represent data at the lowest level. Two important physical data models are:

Unifying Model
Frame Memory Model

Schemas
A database schema is described in a formal language supported by the database management system (DBMS). In a relational database, the schema defines the tables, the fields in each table, and the relationships between fields and tables. Schemas are generally stored in a data dictionary. Although a schema is defined in a text-based database language, the term is often used to refer to a graphical depiction of the database structure.

Levels of database schema:
1. Conceptual schema, a map of concepts and their relationships
2. Logical schema, a map of entities and their attributes and relations
3. Physical schema, a particular implementation of a logical schema
4. Schema object, an Oracle database object
5. Schema as the overall structure of the database

The following examples illustrate common schema designs based on the design considerations that are essential to usability and performance. This example illustrates a multi-star schema, in which the primary and foreign keys are not composed of the same set of columns. The design contains a family of fact tables: a Bookings table, an Actual table, and a Promo_Schedule table. This database tracks reservations (bookings) and actual accommodations rented for a chain of hotels, as well as various promotions. It also maintains information about customers, promotions, and each hotel in the chain.
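A minimal DDL sketch of how one fact table in such a design might be declared (all column names are assumptions for illustration, not the actual design):

CREATE TABLE Customers (
    Customer_ID INTEGER PRIMARY KEY,
    Name        VARCHAR(40)
);

CREATE TABLE Hotels (
    Hotel_ID INTEGER PRIMARY KEY,
    City     VARCHAR(40)
);

CREATE TABLE Bookings (
    Booking_Date DATE,
    Customer_ID  INTEGER REFERENCES Customers (Customer_ID),
    Hotel_ID     INTEGER REFERENCES Hotels (Hotel_ID),
    Amount       DECIMAL(10,2),
    PRIMARY KEY (Booking_Date, Customer_ID, Hotel_ID)
);

The Bookings fact table's primary key combines foreign keys to the dimension tables; the Actual and Promo_Schedule fact tables would be declared similarly, each with its own key columns, which is what makes the design "multi-star" rather than a single star.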

In cases where payment is received in advance (for example, reservation deposits, cable TV subscriptions, automobile insurance), in accordance with proper accounting procedures, transactions must be made to reflect income as it is earned, not when it is received, and the database must be designed to accommodate such transactions.

Instances

Every running Oracle database is associated with an Oracle instance. When a database is started on a database server (regardless of the type of computer), Oracle allocates a memory area called the System Global Area (SGA) and starts one or more Oracle processes. This combination of the SGA and the Oracle processes is called an Oracle instance. The memory and processes of an instance manage the associated database's data efficiently and serve the one or multiple users of the database. The figure below shows an Oracle instance.

Example: An organization with an employees database might have three different instances: production (used to contain live data), pre-production (used to test new functionality prior to release into production) and development (used by database developers to create new functionality).

Database States

When a configuration is enabled, its databases can be in one of several states that direct the behavior of Data Guard, for example transmitting redo data or applying redo data. The broker does not manage the state of the database (that is, mounted or opened). The table below describes the various database states.

Database States and Descriptions

Database Role: Primary
State Name: TRANSPORT-ON
Description: Redo transport services are set up to transmit redo data to the standby databases when the primary database is open for read/write access. If this is an Oracle RAC database, all instances open in read/write mode will have redo transport services running. This is the default state for a primary database when it is enabled for the first time.

Database Role: Primary
State Name: TRANSPORT-OFF
Description: Redo transport services are stopped on the primary database. If this is an Oracle RAC database, redo transport services are not running on any instances.

Database Role: Physical standby
State Name: APPLY-ON
Description: Redo Apply is started on a physical standby database. If the standby database is an Oracle RAC database, the broker starts Redo Apply on exactly one standby instance, called the apply instance. If this instance fails, the broker automatically chooses another instance that is either mounted or open read-only; this new instance then becomes the apply instance. This is the default state for a physical standby database when it is enabled for the first time. If a license for the Oracle Active Data Guard option has been purchased, a physical standby database can be open while Redo Apply is active; this capability is known as real-time query.

Database Role: Physical standby
State Name: APPLY-OFF
Description: Redo Apply is stopped. If this is an Oracle RAC database, there is no instance running Apply Services until you change the database state to APPLY-ON.

Database Role: Snapshot standby
State Name: APPLY-OFF
Description: Redo data is received from the primary database but is not applied. The database is opened for read/write access.

Database Role: Logical standby
State Name: APPLY-ON
Description: SQL Apply is started on the logical standby database when it is opened and the logical standby database guard is on. If this is an Oracle RAC database, SQL Apply is running on one instance, the apply instance. If this instance fails, the broker automatically chooses another open instance; this new instance becomes the apply instance. This is the default state for a logical standby database when it is enabled for the first time.

Database Role: Logical standby
State Name: APPLY-OFF
Description: SQL Apply is not running on the logical standby database. The logical standby database guard is on. If this is an Oracle RAC database, there is no instance running SQL Apply until you change the state to APPLY-ON.

We can use the DGMGRL EDIT DATABASE command to explicitly change the state of a database. For example, the EDIT DATABASE command in the following example changes the state of the North_Sales database to TRANSPORT-OFF.
DGMGRL> EDIT DATABASE 'North_Sales' SET STATE='TRANSPORT-OFF';
Succeeded.
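The broker's view of the database can then be checked with the SHOW DATABASE command (continuing the same example):

DGMGRL> SHOW DATABASE 'North_Sales';

Among other properties, the output reports the intended state of the database, which after the command above should read TRANSPORT-OFF.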

7. Taking an example Enterprise System, list out the Relationship types, Relationship sets and roles

Answer : There are three types of relationships:

1) One to one
2) One to many
3) Many to many

Say we have table1 and table2. For a one to one relationship, a record (row) in table1 will have at most one matching record in table2, i.e. it must not have two or more matching records in table2. For one to many, a record in table1 can have more than one matching record in table2, but not vice versa.

Let's take an example. Say we have a database which saves information about Guys and whom they are dating. We have two tables in our database, Guys and Girls:

Guys
Guy id   Guy name
1        Andrew
2        Bob
3        Craig

Girls
Girl id  Girl name
1        Girl1
2        Girl2
3        Girl3

Here in the above example Guy id and Girl id are the primary keys of their respective tables. Say Andrew is dating Girl1, Bob is dating Girl2 and Craig is dating Girl3, so we have a one to one relationship. In this case we modify the Girls table to carry a Guy id foreign key:

Girls
Girl id  Girl name  Guy id
1        Girl1      1
2        Girl2      2
3        Girl3      3

Now let's say one guy has started dating more than one girl, i.e. Andrew has started dating Girl1 and a new Girl4. That takes us to a one to many relationship from Guys to Girls. To accommodate this change we can modify our Girls table like this:

Girls
Girl Id  Girl Name  Guy Id
1        Girl1      1
2        Girl2      2
3        Girl3      3
4        Girl4      1

Now say after a few days comes a time where girls have also started dating more than one boy, i.e. a many to many relationship. The thing to do here is to add another table, called a Junction Table, Associative Table or Linking Table, which will contain the primary key columns of both the Girls and Guys tables.

Let's see it with an example:

Guys
Guy id   Guy name
1        Andrew
2        Bob
3        Craig

Girls
Girl id  Girl name
1        Girl1
2        Girl2
3        Girl3

Andrew is now dating Girl1 and Girl2, and Girl3 has started dating Bob and Craig, so our junction table will look like this:

Guy ID  Girl ID
1       1
1       2
2       2
2       3
3       3

It contains the primary keys of both the Guys and Girls tables.
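A minimal DDL sketch of this many-to-many design (table and column names assumed for illustration):

CREATE TABLE Guys (
    Guy_ID   INTEGER PRIMARY KEY,
    Guy_Name VARCHAR(30)
);

CREATE TABLE Girls (
    Girl_ID   INTEGER PRIMARY KEY,
    Girl_Name VARCHAR(30)
);

CREATE TABLE Dating (
    Guy_ID  INTEGER REFERENCES Guys (Guy_ID),
    Girl_ID INTEGER REFERENCES Girls (Girl_ID),
    PRIMARY KEY (Guy_ID, Girl_ID)  -- composite key: each pairing recorded once
);

The composite primary key lets each guy appear with many girls and vice versa, while preventing duplicate pairs.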

A relationship type R among n entity types E1, E2, ..., En is a set of associations among entities from these types. Formally, R is a set of relationship instances ri, where each ri is an n-tuple of entities (e1, e2, ..., en), and each entity ej in ri is a member of entity type Ej, 1 <= j <= n. Hence, a relationship type is a mathematical relation on E1, E2, ..., En; alternatively, it can be defined as a subset of the Cartesian product E1 x E2 x ... x En. The current set of such instances among the entity types E1, E2, ..., En is called a relationship set.

Relationship instance: Each relationship instance ri in R is an association of entities, where the association includes exactly one entity from each participating entity type. Each such relationship instance ri represents the fact that the entities participating in ri are related in some way in the corresponding miniworld situation. For example, the relationship type WORKS_FOR associates EMPLOYEE and DEPARTMENT, associating each employee with the department for which the employee works. Each relationship instance in the relationship set WORKS_FOR associates one EMPLOYEE and one DEPARTMENT.

8. With an example show that Serializability can be guaranteed with two phase locking

Answer : In databases and transaction processing, two-phase locking (2PL) is a concurrency control method that guarantees serializability. It is also the name of the resulting set of database transaction schedules (histories). The protocol utilizes locks, applied by a transaction to data, which may block (interpreted as signals to stop) other transactions from accessing the same data during the transaction's life.

By the 2PL protocol, locks are applied and removed in two phases:

Expanding phase: locks are acquired and no locks are released.
Shrinking phase: locks are released and no locks are acquired.

Two types of locks are utilized by the basic protocol: Shared and Exclusive locks. Refinements of the basic protocol may utilize more lock types. Using locks that block processes, 2PL may be subject to deadlocks that result from the mutual blocking of two or more transactions.

2PL is a superset of strong strict two-phase locking (SS2PL), also called rigorousness, which has been widely utilized for concurrency control in general-purpose database systems since the 1970s. SS2PL implementations have many variants. SS2PL was formerly called strict 2PL, but this name usage is not recommended now: strict 2PL (S2PL) is the intersection of strictness and 2PL, which is different from SS2PL. SS2PL is also a special case of commitment ordering (CO), and inherits many of CO's useful properties. SS2PL actually comprises only one phase: phase 2 does not exist, and all locks are released only after transaction end; thus this useful 2PL type is not two-phased at all.

Neither 2PL nor S2PL in their general forms are known to be used in practice. Thus 2PL by itself does not seem to have much practical importance, and whenever 2PL or S2PL utilization has been mentioned in the literature, the intention has been SS2PL. What has made SS2PL so popular (probably the most utilized serializability mechanism) is the effective and efficient locking-based combination of two ingredients (the first does not exist in general 2PL or S2PL; the second does not exist in general 2PL):

Commitment ordering, which provides serializability as well as effective distributed serializability and global serializability, and
Strictness, which provides cascadelessness (ACA, cascade-less recoverability) and (independently) allows efficient database recovery from failure.

Additionally, SS2PL is easier and has less overhead to implement than both 2PL and S2PL; it provides exactly the same locking, but sometimes releases locks later. Practically (though not in theory), such later lock release occurs only slightly later, and this apparent disadvantage is insignificant next to the advantages of SS2PL. Thus, the importance of general two-phase locking (2PL) is historic only, while strong strict two-phase locking (SS2PL) is practically the important mechanism and resulting schedule property. For schedule classes (and their respective properties) that have common schedules, either one contains the other (strictly contains if they are not equal) or they are incomparable; the containment relationships among the 2PL classes and other major schedule classes can be summarized in a class-containment diagram (not reproduced here). 2PL and its subclasses are inherently blocking, which means that no optimistic implementations for them exist (and whenever "Optimistic 2PL" is mentioned it refers to a different mechanism with a class that also includes schedules not in the 2PL class).
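A minimal illustrative schedule (data items, values and operations assumed for illustration) showing how 2PL forces a serializable order for two transactions, T1 and T2, that both update items A and B:

T1: lock-X(A); read(A); write(A)              -- expanding phase of T1
T2: requests lock-X(A)                        -- blocked: T1 holds the exclusive lock on A
T1: lock-X(B); read(B); write(B)              -- T1 still acquiring, no lock released yet
T1: unlock(A); unlock(B); commit              -- shrinking phase begins only now
T2: lock-X(A) granted; read(A); write(A);
    lock-X(B); read(B); write(B);
    unlock(A); unlock(B); commit

Every pair of conflicting operations on A and on B is ordered T1 before T2, so the interleaved schedule is conflict-equivalent to the serial schedule T1 followed by T2. Had T2 acquired its locks first, the result would be equivalent to T2 followed by T1. Because neither transaction may acquire a lock after releasing one, 2PL never admits a mixed, non-serializable ordering of the conflicts, which is exactly the serializability guarantee the question asks for.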
Deadlocks in 2PL

Locks block data-access operations. Mutual blocking between transactions results in a deadlock, where execution of these transactions is stalled and no completion can be reached. Thus deadlocks need to be resolved to complete these transactions' executions and release related computing resources. A deadlock is a reflection of a potential cycle in the precedence graph that would occur without the blocking. A deadlock is resolved by aborting a transaction involved with such a potential cycle, thereby breaking the cycle. It is often detected using a wait-for graph, which indicates which transaction is "waiting for" a lock release by which transaction; a cycle in this graph means a deadlock. (The wait-for graph is a graph of conflicts that are blocked by locks from being materialized; conflicts not materialized in the database due to blocked operations are not reflected in the precedence graph and do not affect serializability.) Aborting one transaction per cycle is sufficient to break the cycle. Transactions aborted due to deadlock resolution are executed again immediately.

In a distributed environment, an atomic commitment protocol, typically the two-phase commit (2PC) protocol, is utilized for atomicity. When recoverable data (data under transaction control) are partitioned among 2PC participants (i.e., each data object is controlled by a single 2PC participant), then distributed (global) deadlocks, i.e. deadlocks involving two or more participants in 2PC, are resolved automatically as follows.

When SS2PL is effectively utilized in a distributed environment, global deadlocks due to locking generate voting deadlocks in 2PC and are resolved automatically by 2PC (see Commitment ordering (CO), in particular the exact characterization of voting deadlocks by global cycles; no reference except the CO articles is known to note this). For the general case of 2PL, global deadlocks are similarly resolved automatically by the synchronization point protocol of phase-1 end in a distributed transaction: the synchronization point is achieved by "voting" (notifying local phase-1 end) and is propagated to the participants in a distributed transaction the same way as a decision point in atomic commitment. In analogy to the decision point in CO, a conflicting operation in 2PL cannot happen before the phase-1 end synchronization point, resulting in the same voting deadlock in the case of a global data-access deadlock. The voting deadlock (which is also a locking-based global deadlock) is automatically resolved by the protocol aborting some transaction involved, with a missing vote, typically using a timeout.
