Advance Concept in Data Bases Unit-1 by Arun Pratap Singh

PREPARED BY ARUN PRATAP SINGH MTECH 2nd SEMESTER

DBMS CONCEPT INTRODUCTION :
A database is a collection of related data. By data we mean known facts that can be recorded
and that have implicit meaning.
A database has the following implicit properties :
A database represents some aspect of the real world, sometimes called the miniworld or
the Universe of Discourse (UoD). Changes to the miniworld are reflected in the database.
A database is a logically coherent collection of data with some inherent
meaning.
A database is designed, built and populated with data for a specific purpose. It has an
intended group of users and some preconceived applications in which
these users are interested.
Data: Known facts that can be recorded and have an implicit meaning.
Database: A collection of related data
Mini-world: Some part of the real world about which data is stored in a database. For
example, student grades in a university.
Database Management System (DBMS): A software package/system to facilitate the
creation and maintenance of a computerized database.
Database System: The DBMS software together with the data itself. Sometimes, the
applications are also included.

UNIT : I

[Image: Architecture of a DBMS, layers / three-level architecture diagram]

[Image: Architecture of DBMS]

DATA MODEL :
Mapping cardinalities :
One to one
One to many
Many to one
Many to many

A hierarchical database model is a data model in which the data is organized into a tree-like
structure. The structure allows representing information using parent/child relationships: each
parent can have many children, but each child has only one parent (also known as a 1-to-many
relationship). All attributes of a specific record are listed under an entity type.
In a database an entity type is the equivalent of a table. Each individual record is represented as
a row, and each attribute as a column. Entity types are related to each other using 1:N mappings,
also known as one-to-many relationships. This model is recognized as the first database model
created by IBM in the 1960s.
Currently the most widely used hierarchical databases are IMS, developed by IBM, and the Windows
Registry by Microsoft.
System 2000 from Intel is another package based on the hierarchical data model.
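
The parent/child rule described above can be sketched in a few lines of Python. This is only an illustration: the `Node` class and the sample record names are hypothetical and are not taken from IMS or any other product.

```python
class Node:
    """Hypothetical record in a hierarchy: exactly one parent, many children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent          # a child has only one parent
        self.children = []            # a parent can have many children
        if parent is not None:
            parent.children.append(self)

    def path(self):
        # Walk up through the single parent of each record to the root.
        node, parts = self, []
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


university = Node("University")
dept = Node("Computer Science", parent=university)
course = Node("Databases", parent=dept)
```

Because every record has a single parent, `course.path()` has exactly one answer; this is the 1:N restriction of the hierarchical model.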



The network model is a database model conceived as a flexible way of representing
objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in
which object types are nodes and relationship types are arcs, is not restricted to being a hierarchy
or lattice.
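
As a rough sketch of the difference from a hierarchy, the schema graph can be stored as a mapping from each record type to the set of record types that own it. The names `owners`, `add_relationship` and the sample record types are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical schema graph: record types are nodes, relationship types are arcs.
owners = defaultdict(set)   # member record type -> set of owner record types


def add_relationship(owner, member):
    """Add an arc from an owner record type to a member record type."""
    owners[member].add(owner)


add_relationship("Department", "Employee")
add_relationship("Project", "Employee")   # a second owner: no longer a tree
```

In a hierarchical schema `Employee` could have only one owner; here it has two, which is what makes the schema a general graph.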



The relational model is the conceptual basis of relational databases. Proposed by E.F. Codd in
1969, it is a method of structuring data using relations, which are grid-like mathematical structures
consisting of columns and rows. Codd proposed the relational model while working at IBM, and his work
went on to become the basis of relational databases.
Most of us are familiar with the physical manifestation of a relation in a database: it is called
a table.
In the relational model, all data must be stored in relations (tables), and each relation consists of
rows and columns. Each relation must have a header and a body. The header is simply the list of
columns in the relation. The body is the set of data that actually populates the relation, organized
into rows. Each row of the body is called a tuple, and the junction of one row and one column holds
a single attribute value.
The second major characteristic of the relational model is the usage of keys. These are specially
designated columns within a relation, used to identify rows or relate data to other relations. One of
the most important keys is the primary key, which is used to uniquely identify each row of data.
To make querying for data easier, many relational databases go further and physically order the
data by the primary key. Foreign keys relate data in one relation to the primary key of another
relation.



In the relational data model, relations are saved in the format of tables. This format stores the relation
among entities. A table has rows and columns, where rows represent records and columns
represent the attributes.
Tuple: A single row of a table, which contains a single record for that relation, is called a tuple.
Relation instance: A finite set of tuples in the relational database system represents a relation
instance. Relation instances do not have duplicate tuples.
Relation schema: This describes the relation name (table name), its attributes and their names.
Relation key: One or more attributes that can identify a row in the relation
(table) uniquely are called the relation key.
Attribute domain: Every attribute has some pre-defined value scope, known as the attribute domain.
Constraints
Every relation has some conditions that must hold for it to be a valid relation. These conditions
are called Relational Integrity Constraints. There are three main integrity constraints.
Key constraints
Domain constraints
Referential integrity constraints
KEY CONSTRAINTS:
There must be at least one minimal subset of attributes in the relation which can identify a tuple
uniquely. This minimal subset of attributes is called a key for that relation. If there is more than
one such minimal subset, these are called candidate keys.
Key constraints enforce that:
in a relation with a key attribute, no two tuples can have identical values for the key attributes.
a key attribute cannot have NULL values.
Key constraints are also referred to as entity constraints.
DOMAIN CONSTRAINTS
Attributes have specific values in real-world scenarios. For example, age can only be a non-negative
integer. The same kind of constraint is applied to the attributes of a relation. Every
attribute is bound to have a specific range of values. For example, age cannot be less than zero
and a telephone number cannot contain characters outside the digits 0-9.
REFERENTIAL INTEGRITY CONSTRAINTS
This integrity constraint works on the concept of a foreign key. A key attribute of a relation can be
referred to in another relation, where it is called a foreign key.
The referential integrity constraint states that if a relation refers to a key attribute of a different or
the same relation, that key element must exist.
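
The three integrity constraints can be tried out with Python's built-in `sqlite3` module. The sketch below is a minimal illustration, not a prescribed design: the `department` and `student` tables, their columns and the helper `violates` are all hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,                    -- key constraint
        name    TEXT NOT NULL,
        age     INTEGER CHECK (age >= 0),               -- domain constraint
        dept_id INTEGER REFERENCES department(dept_id)  -- referential integrity
    )
""")
conn.execute("INSERT INTO department VALUES (1, 'Computer Science')")
conn.execute("INSERT INTO student VALUES (101, 'Mira', 20, 1)")


def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


key_violation = violates("INSERT INTO student VALUES (101, 'Ravi', 21, 1)")     # duplicate key
domain_violation = violates("INSERT INTO student VALUES (102, 'Ravi', -5, 1)")  # negative age
ref_violation = violates("INSERT INTO student VALUES (103, 'Ravi', 21, 99)")    # no department 99
```

Each rejected statement leaves the table unchanged, so only the first, valid student row remains.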

ER MODEL BASIC CONCEPTS :

The entity-relationship (ER) model defines the conceptual view of a database. It works around real-world
entities and the associations among them. At the view level, the ER model is considered well suited for designing
databases.

Now we shall learn how the ER model is represented by means of an ER diagram. Every object, such as an
entity, the attributes of an entity, a relationship set, and the attributes of a relationship set, can be
represented with the tools of the ER diagram.



ENTITY :
An entity is a real-world thing, either animate or inanimate, that can be easily identified and distinguished.
For example, in a school database, students, teachers, classes and courses offered can be considered
as entities. All entities have some attributes or properties that give them their identity.
An entity set is a collection of similar types of entities. An entity set may contain entities whose attributes
share similar values. For example, a Students set may contain all the students of a school; likewise
a Teachers set may contain all the teachers of the school from all faculties. Entity sets need not be
disjoint.
Entities are represented by means of rectangles. Rectangles are named with the entity set they
represent.

o A database can be modeled as:
o a collection of entities,
o relationship among entities.
o An entity is an object that exists and is distinguishable from other objects.
o Example: specific person, company, event, plant
o Entities have attributes
o Example: people have names and addresses
o An entity set is a set of entities of the same type that share the same properties.
o Example: set of all persons, companies, trees, holidays


ATTRIBUTES :

o An entity is represented by a set of attributes, that is descriptive properties
possessed by all members of an entity set.

Attributes are, simply put, the characteristics of entities. Some entities can have many attributes
while others may only have a couple. Attributes are classified into five categories.
The following simple table will be used to explain how each attribute can be a different type of attribute:

Student (stu_LastName, stu_MiddleName, stu_FirstName, stu_Age, stu_Phone, stu_Email).

Required or Optional Attributes
A required attribute is an attribute that must have a value, while an optional attribute may not
have a value and can be left blank. The reason for making an attribute required is to put
emphasis on what is important in that entity and what makes it stand out from other entities.

Example: Consider the entity Student above; stu_LastName and stu_FirstName would be
required attributes, since we assume all students have a first and
last name. Optional attributes in the table Student could be stu_MiddleName, stu_Email, and
stu_Phone since some students may not have a middle name, a phone number, or an email
address.

Keys and non-keys Attributes
In every entity an attribute or group of attributes uniquely identifies that entity. These attributes are
the key attributes, and range from a single-attribute primary key to a composite key
(multi-attribute identifier). The remaining attributes are considered the non-key
attributes or descriptors, which just describe the entity.

Example: In the table Student above, this example treats stu_LastName as the only unique identifier, making it
the primary key of the table. The rest of the attributes are descriptors.

Single and Composite Attributes
Attributes can be classified as having many parts to them or just a single unbreakable attribute.
The composite attribute is an attribute that can be subdivided into other single attributes with
meanings of their own. A single attribute is just an attribute that cannot be subdivided into parts.

Example: Imagine that the entity Student, instead of having the three attributes
stu_LastName, stu_MiddleName, stu_FirstName, had one attribute called stu_Name. The
attribute stu_Name would be considered a composite attribute since it can be subdivided into the
other three attributes: stu_LastName, stu_MiddleName, stu_FirstName. The rest of the attributes
would be considered single attributes since they cannot be subdivided into parts.

Single-valued and multi-valued Attributes
Attributes can be classified as single-valued or multi-valued. A single-valued attribute can only have one
value, while a multi-valued attribute can store multiple values.

Example: In the entity Student, stu_Address could be considered a multi-valued attribute since a
student could have multiple addresses. An example of a single-valued attribute
would be stu_LastName since a student usually has only one last name.



Derived Attributes
The last category is the derived attribute, where one attribute
is calculated from another attribute. The derived attribute may not be stored in the database but
rather calculated using an algorithm.

Example: In the entity Student, stu_Age would be considered a derived attribute since it could be
calculated from the student's date of birth and the current date.

Other examples of derived attributes are a salary figure computed from other stored values, or age computed from DOB.
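
A derived attribute such as stu_Age can be computed on demand instead of being stored. The function below is a minimal sketch; the name `age_on` and the sample dates are made up for illustration.

```python
from datetime import date


def age_on(dob: date, today: date) -> int:
    """Derive age in whole years from a stored date of birth."""
    # Subtract one year if the birthday has not yet occurred in today's year.
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


# One day before the 24th birthday the derived age is still 23.
stu_age = age_on(date(2000, 6, 15), date(2024, 6, 14))
```

Storing only the date of birth avoids having to update the age every year.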

IN OTHER WORDS :
Entities are represented by means of their properties, called attributes. All attributes have values.
For example, a student entity may have name, class, and age as attributes.
There exists a domain or range of values that can be assigned to attributes. For example, a
student's name cannot be a numeric value. It has to be alphabetic. A student's age cannot be
negative, etc.
TYPES OF ATTRIBUTES:

Simple attribute:
Simple attributes are atomic values, which cannot be divided further. For example, a student's
phone number is an atomic value of 10 digits.
Composite attribute:
Composite attributes are made of more than one simple attribute. For example, a student's
complete name may have first_name and last_name.
Derived attribute:
Derived attributes are attributes which do not exist physically in the database, but whose values are
derived from other attributes present in the database. For example, average_salary in a
department should not be saved in the database; instead it can be derived. As another example, age can
be derived from date_of_birth.
Single-valued attribute:
Single-valued attributes contain a single value. For example: Social_Security_Number.
Multi-valued attribute:
Multi-valued attributes may contain more than one value. For example, a person can have more
than one phone number, email address, etc.
These attribute types can come together in combinations such as:
simple single-valued attributes
simple multi-valued attributes
composite single-valued attributes
composite multi-valued attributes
Attributes are properties of entities. Attributes are represented by means of ellipses. Every
ellipse represents one attribute and is directly connected to its entity (rectangle).

[Image: Simple Attributes]

If the attributes are composite, they are further divided in a tree-like structure. Every node is then
connected to its attribute. That is, composite attributes are represented by ellipses that are
connected to an ellipse.

[Image: Composite Attributes]

Multivalued attributes are depicted by a double ellipse.

[Image: Multivalued Attributes]

Derived attributes are depicted by a dashed ellipse.

[Image: Derived Attributes]


RELATIONSHIP :

A relationship is a defined connection between the rows of two tables. This connection is generally
determined by values in selected columns from a parent table that correspond to values in the
child table.
For example, some database management systems have the following requirements for
relationships:
The parent table must have a primary key that is related to the foreign key in the child
table.
Corresponding columns must have identical data types and attributes.
The association among entities is called a relationship. For example, an employee entity has the relationship
works_at with a department. Another example is a student who enrolls in some course. Here,
Works_at and Enrolls are called relationships.
RELATIONSHIP SET:
Relationships of a similar type form a relationship set. Like entities, a relationship too can have
attributes. These attributes are called descriptive attributes.
DEGREE OF RELATIONSHIP
The number of participating entities in a relationship defines the degree of the relationship.
Binary = degree 2
Ternary = degree 3
n-ary = degree n



One-to-Many Relationship :

A one-to-many relationship is the most common type of relationship. In this type of relationship,
a row in table A can have many matching rows in table B, but a row in table B can have only one
matching row in table A. For example, the publishers and titles tables have a one-to-many
relationship: each publisher produces many titles, but each title comes from only one publisher.
Make a one-to-many relationship if only one of the related columns is a primary key or has a
unique constraint.
The primary key side of a one-to-many relationship is denoted by a key symbol. The foreign key
side of a relationship is denoted by an infinity symbol.

One entity from entity set A can be associated with more than one entity of entity set B, but
an entity from entity set B can be associated with at most one entity of entity set A.



[Image: One-to-many relation]
Many-to-Many Relationships :

In a many-to-many relationship, a row in table A can have many matching rows in table B, and
vice versa. You create such a relationship by defining a third table, called a junction table, whose
primary key consists of the foreign keys from both table A and table B. For example,
the authors table and the titles table have a many-to-many relationship that is defined by a one-
to-many relationship from each of these tables to the titleauthors table. The primary key of
the titleauthors table is the combination of the au_id column (the authors table's primary key) and
the title_id column (the titles table's primary key).

One entity from A can be associated with more than one entity from B and vice versa.
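
The junction-table idea can be sketched with Python's built-in `sqlite3` module. The table names and the au_id / title_id columns follow the text above; the au_name and title columns and all sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (au_id INTEGER PRIMARY KEY, au_name TEXT);
    CREATE TABLE titles  (title_id INTEGER PRIMARY KEY, title TEXT);
    -- Junction table: its primary key combines the two foreign keys.
    CREATE TABLE titleauthors (
        au_id    INTEGER REFERENCES authors(au_id),
        title_id INTEGER REFERENCES titles(title_id),
        PRIMARY KEY (au_id, title_id)
    );
    INSERT INTO authors VALUES (1, 'Author A'), (2, 'Author B');
    INSERT INTO titles  VALUES (10, 'Title X'), (20, 'Title Y');
    INSERT INTO titleauthors VALUES (1, 10), (1, 20), (2, 10);
""")

# Resolve the many-to-many relationship by joining through the junction table.
rows = conn.execute("""
    SELECT a.au_name, t.title
    FROM titleauthors ta
    JOIN authors a ON a.au_id = ta.au_id
    JOIN titles  t ON t.title_id = ta.title_id
    ORDER BY a.au_name, t.title
""").fetchall()
```

Here 'Author A' has two titles and 'Title X' has two authors, which a single foreign key column in either table could not express.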

[Image: Many-to-many relation]

One-to-One Relationships :

In a one-to-one relationship, a row in table A can have no more than one matching row in table
B, and vice versa. A one-to-one relationship is created if both of the related columns are primary
keys or have unique constraints.


This type of relationship is not common because most information related in this way would be
all in one table. You might use a one-to-one relationship to:
Divide a table with many columns.
Isolate part of a table for security reasons.
Store data that is short-lived and could be easily deleted by simply deleting the table.
Store information that applies only to a subset of the main table.
The primary key side of a one-to-one relationship is denoted by a key symbol. The foreign key
side is also denoted by a key symbol.

One entity from entity set A can be associated with at most one entity of entity set B and vice
versa.

[Image: One-to-one relation]

Many-to-One Relationships : More than one entity from entity set A can be associated
with at most one entity of entity set B, but one entity from entity set B can be associated
with more than one entity from entity set A.

[Image: Many-to-one relation]



Relationships are represented by a diamond-shaped box. The name of the relationship is written in the
diamond box. All entities (rectangles) participating in the relationship are connected to it by a line.
BINARY RELATIONSHIP AND CARDINALITY
A relationship in which two entities participate is called a binary relationship. Cardinality is
the number of instances of an entity that can be associated with the relationship.

One-to-one
When only one instance of an entity is associated with the relationship, it is marked as '1'. The image
below shows that only one instance of each entity should be associated with the relationship. It
depicts a one-to-one relationship.

[Image: One-to-one]
One-to-many
When more than one instance of an entity is associated with the relationship, it is marked as 'N'. The
image below shows that only one instance of the entity on the left and more than one instance of the entity
on the right can be associated with the relationship. It depicts a one-to-many relationship.

[Image: One-to-many]
Many-to-one
When more than one instance of an entity is associated with the relationship, it is marked as 'N'. The
image below shows that more than one instance of the entity on the left and only one instance of the
entity on the right can be associated with the relationship. It depicts a many-to-one relationship.



[Image: Many-to-one]
Many-to-many
The image below shows that more than one instance of the entity on the left and more than one
instance of the entity on the right can be associated with the relationship. It depicts a many-to-many
relationship.

[Image: Many-to-many]

PARTICIPATION CONSTRAINTS

Total participation: Each entity in the entity set is involved in the relationship. Total participation is
represented by double lines.
Partial participation: Not all entities are involved in the relationship. Partial participation is
represented by a single line.

[Image: Participation Constraints]


The ER model has the power of expressing database entities in a conceptual hierarchical manner such
that, as we go up the hierarchy the view of entities becomes more general, and as we go deeper in the
hierarchy we get the details of every entity included.
Going up in this structure is called generalization, where entities are clubbed together to represent
a more generalized view. For example, a particular student named Mira can be generalized along
with all the students into the entity student, and further a student is a person. The reverse is
called specialization, where a person is a student, and that student is Mira.
Generalization
As mentioned above, the process of generalizing entities, where the generalized entity contains
the properties of all the entities it generalizes, is called generalization. In generalization, a number
of entities are brought together into one generalized entity based on their similar characteristics.
For example, pigeon, house sparrow, crow and dove can all be generalized as Birds.

[Image: Generalization]
Specialization
Specialization is the process opposite to generalization, as mentioned above. In
specialization, a group of entities is divided into sub-groups based on their characteristics. Take
the group Person for example. A person has a name, date of birth, gender, etc. These properties are
common to all persons. But in a company, a person can be identified as an employee,
employer, customer or vendor based on what role they play in the company.

[Image: Specialization]


Similarly, in a school database, a person can be specialized as teacher, student or staff, based
on what role they play in the school as entities.
Inheritance
We use all the above features of the ER model in order to create classes of objects in object-oriented
programming. This makes it easier for the programmer to concentrate on what she is
programming. Details of entities are generally hidden from the user; this process is known as
abstraction.
One of the important features of generalization and specialization is inheritance, that is, the
attributes of higher-level entities are inherited by the lower-level entities.

[Image: Inheritance]
For example, attributes of a person like name, age, and gender can be inherited by lower level
entities like student and teacher etc.
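
Attribute inheritance maps directly onto class inheritance in an object-oriented language. The following is a small sketch using Python dataclasses; the attributes roll_no and subject are hypothetical additions used to show what the lower-level entities add.

```python
from dataclasses import dataclass


@dataclass
class Person:            # higher-level (generalized) entity
    name: str
    age: int
    gender: str


@dataclass
class Student(Person):   # lower-level entity: inherits name, age, gender
    roll_no: int         # hypothetical attribute specific to students


@dataclass
class Teacher(Person):   # lower-level entity: inherits name, age, gender
    subject: str         # hypothetical attribute specific to teachers


mira = Student(name="Mira", age=20, gender="F", roll_no=101)
```

The object `mira` carries the inherited attributes of Person plus the one attribute declared on Student.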


SYMBOLS AND NOTATION :-


RELATIONAL DATA MODEL :

In the relational model, data is organized in two-dimensional tables called relations. The tables or
relations are, however, related to each other.

In the relational database management system (RDBMS), the data is represented as a set of
relations.

A relation appears as a two-dimensional table. The RDBMS organizes the data so that its external
view is a set of relations or tables. This does not mean that data is stored as tables: the physical
storage of the data is independent of the way in which the data is logically organized.

The relational data model is the primary data model, which is used widely around the world for data
storage and processing. This model is simple and has all the properties and capabilities required
to process data with storage efficiency.



RELATIONAL ALGEBRA AND RELATIONAL CALCULUS :

Relational database systems are expected to be equipped with a query language that can assist
users in querying the database instances. This lets users retrieve the results they require.
There are two kinds of query languages, relational algebra and relational
calculus.
Relational algebra
Relational algebra is a procedural query language, which takes instances of relations as input
and yields instances of relations as output. It uses operators to perform queries. An operator can
be either unary or binary. Operators accept relations as their input and yield relations as their output.
Relational algebra is performed recursively on a relation, and intermediate results are also
considered relations.

Fundamental operations of Relational algebra:
Select
Project
Union
Set difference
Cartesian product
Join
Rename
These are defined briefly as follows:
Select Operation (σ) :
The SELECT operator, written with the symbol σ (sigma), is used as an expression to choose the tuples that meet
the selection condition:
σ_<selection condition>(R)
-> The select operation selects tuples that satisfy a given predicate.
Ex:- find all employees born after 1st Jan 1950:
σ_dob > '01/JAN/1950'(employee)
OR
Selects tuples that satisfy the given predicate from a relation.
Notation: σ_p(r)
Where p stands for the selection predicate and r stands for the relation. p is a propositional logic formula
which may use connectors like and, or and not. These terms may use relational operators like:
=, ≠, ≥, <, >, ≤.
For example:
σ_subject="database"(Books)
Output : Selects tuples from Books where subject is 'database'.
σ_subject="database" and price="450"(Books)
Output : Selects tuples from Books where subject is 'database' and price is 450.
σ_subject="database" and price < "450" or year > "2010"(Books)
Output : Selects tuples from Books where subject is 'database' and price is less than 450, or the publication
year is greater than 2010, that is, published after 2010.
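
The select operation can be imitated in Python by filtering a list of tuples with a predicate. This is a minimal sketch: the function name `select`, the dictionary representation of tuples and the sample Books rows are all assumptions.

```python
# Hypothetical sample relation; each tuple is a dict keyed by attribute name.
books = [
    {"title": "DB Basics", "subject": "database", "price": 450, "year": 2008},
    {"title": "Query Tuning", "subject": "database", "price": 300, "year": 2012},
    {"title": "Graph Theory", "subject": "maths", "price": 500, "year": 2015},
]


def select(relation, predicate):
    """sigma_p(r): keep only the tuples for which predicate(t) is true."""
    return [t for t in relation if predicate(t)]


# sigma_{subject = "database" and price < 450}(Books)
cheap_db = select(books, lambda t: t["subject"] == "database" and t["price"] < 450)
```

The result has the same attributes as the input relation; only the number of tuples changes.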
Project Operation (π) :
The symbol π (pi) is used to choose attributes from a relation.
This operator lists the attributes that we wish to appear in the result; the
remaining attributes are eliminated from the table.
π_<attribute list>(relation)
Projects the column(s) named in the attribute list.
Notation: π_A1, A2, ..., An (r)
Where A1, A2, ..., An are attribute names of relation r.
Duplicate rows are automatically eliminated, as a relation is a set.
For example:
π_subject, author (Books)
Projects the columns named subject and author from relation Books.
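
Projection can be sketched in the same style: keep the named columns and drop rows that become duplicates. The function name `project` and the sample rows below are hypothetical.

```python
def project(relation, attributes):
    """pi_attributes(r): keep the named columns and drop duplicate rows."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:           # a relation is a set, so duplicates collapse
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result


# Hypothetical sample relation.
books = [
    {"subject": "database", "author": "Asha", "price": 450},
    {"subject": "database", "author": "Asha", "price": 300},
    {"subject": "maths", "author": "Ravi", "price": 500},
]
subject_author = project(books, ["subject", "author"])
```

The first two rows differ only in price, so after projection they collapse into a single tuple.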

Union Operation (∪) :
UNION is symbolized by ∪ and includes all tuples that are in R or in S, eliminating
duplicate tuples. Therefore set R UNION set S would be expressed as:
RESULT ← R ∪ S
The union operation performs binary union between two given relations and is defined as:
r ∪ s = { t | t ∈ r or t ∈ s }
Notation: r ∪ s
Where r and s are either database relations or relation result sets (temporary relations).
For a union operation to be valid, the following conditions must hold:
r and s must have the same number of attributes.
Attribute domains must be compatible.
Duplicate tuples are automatically eliminated.
π_author (Books) ∪ π_author (Articles)
Output : Projects the names of authors who have written either a book or an article or both.

Set Difference (−) :
The MINUS operation includes tuples from one relation that are not in another relation
and is symbolized by the − (minus) symbol. Therefore R MINUS S would be expressed as
RESULT ← R − S
The result of the set difference operation is the set of tuples which are present in one relation but not in the
second relation.
Notation: r − s
Finds all tuples that are present in r but not in s.
π_author (Books) − π_author (Articles)
Output: Yields the names of authors who have written books but not articles.
EXAMPLE :

Intersection (∩) : The INTERSECTION of relation R and relation S,
symbolized by R ∩ S, includes only the tuples that are in both R and S.
RESULT ← R ∩ S
EXAMPLE :
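
Since relations are sets of tuples, union, set difference and intersection correspond to Python's built-in set operators. The sketch below uses made-up author names and stands in for π_author (Books) and π_author (Articles).

```python
# Hypothetical author names; each relation is a set of 1-tuples (author,).
book_authors = {("Asha",), ("Ravi",), ("Mira",)}
article_authors = {("Asha",), ("Kiran",)}

# Union compatibility: both relations must have the same number of attributes.
assert {len(t) for t in book_authors} == {len(t) for t in article_authors}

either = book_authors | article_authors       # r ∪ s
books_only = book_authors - article_authors   # r − s
both = book_authors & article_authors         # r ∩ s
```

Duplicates disappear automatically because a set cannot hold the same tuple twice.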


Cartesian Product (×) :
Creates a relation that has all the attributes of R and S, allowing all the attainable
combinations of tuples from R and S in the result. The notation used is ×.
C = R × S
Combines information of two different relations into one.
Notation: r × s
Where r and s are relations and their output will be defined as:
r × s = { q t | q ∈ r and t ∈ s }
σ_author = 'tutorialspoint'(Books × Articles)
Output : yields a relation which shows all books and articles written by tutorialspoint.
EXAMPLE :
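
A Cartesian product pairs every tuple of one relation with every tuple of the other. The sketch below uses `itertools.product` from the standard library; the relations `r` and `s` and their values are hypothetical.

```python
from itertools import product

# Hypothetical relations as lists of tuples.
r = [("b1", "Databases"), ("b2", "Networks")]
s = [("a1",), ("a2",), ("a3",)]

# r × s: concatenate every tuple q of r with every tuple t of s.
cross = [q + t for q, t in product(r, s)]
```

With 2 tuples in `r` and 3 in `s`, the result has 2 × 3 = 6 tuples, each with 2 + 1 = 3 attributes.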

JOIN :
The JOIN operation is denoted by the ⋈ symbol (written R|X|S in plain text) and is used to combine related
tuples from two relations into single longer tuples.
A join is, in general, the cross product of two relations restricted to the tuples that satisfy the join condition.
The notation used is
R ⋈_<join condition> S


EXAMPLE :

Types of join :
Natural Join
Outer Join
Natural Join :
The JOIN involves an equality test, and thus is often described as an equi-join. Such
joins result in two attributes in the resulting relation having exactly the same value. A
`natural join' will remove the duplicate attribute(s).
In most systems a natural join will require that the attributes have the same name to
identify the attribute(s) to be used in the join. This may require a renaming mechanism.
If you do use natural joins make sure that the relations do not have two attributes with
the same name by accident.
Outer Join :
There are three forms of the outer join, depending on which unmatched tuples are to be kept.
LEFT OUTER JOIN - keep all tuples from the left-hand table, padding unmatched ones with NULLs
RIGHT OUTER JOIN - keep all tuples from the right-hand table, padding unmatched ones with NULLs
FULL OUTER JOIN - keep all tuples from both tables
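
The difference between a natural join and a left outer join can be sketched as follows. The functions, the relations and the attribute names are hypothetical, the code assumes both relations are non-empty, and Python's None plays the role of NULL.

```python
# Hypothetical relations as lists of dicts; dept_id is the shared attribute.
employees = [
    {"emp": "Asha", "dept_id": 1},
    {"emp": "Ravi", "dept_id": 2},
    {"emp": "Mira", "dept_id": 9},   # no matching department
]
departments = [{"dept_id": 1, "dept": "Sales"}, {"dept_id": 2, "dept": "HR"}]


def natural_join(r, s):
    """Match tuples that agree on every shared attribute; keep one copy of it."""
    common = set(r[0]) & set(s[0])
    return [{**q, **t} for q in r for t in s if all(q[a] == t[a] for a in common)]


def left_outer_join(r, s):
    """Natural join that also keeps unmatched left tuples, padded with None."""
    common = set(r[0]) & set(s[0])
    s_only = [a for a in s[0] if a not in common]
    result = []
    for q in r:
        matches = [{**q, **t} for t in s if all(q[a] == t[a] for a in common)]
        result.extend(matches or [{**q, **dict.fromkeys(s_only)}])
    return result


inner = natural_join(employees, departments)
left = left_outer_join(employees, departments)
```

The natural join drops Mira because department 9 does not exist, while the left outer join keeps her row with an empty dept value. A right outer join is the same idea with the roles of the two relations swapped.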
LEFT & RIGHT OUTER JOIN Example :



Full OUTER JOIN Example :

Rename operation (ρ) :
Results of relational algebra are also relations, but without any name. The rename operation
allows us to rename the output relation. The rename operation is denoted by the small Greek letter rho (ρ).
Notation: ρ_x (E)
Where the result of expression E is saved with the name x.



RELATIONAL CALCULUS :
In contrast with Relational Algebra, Relational Calculus is non-procedural query language, that is,
it tells what to do but never explains the way, how to do it.
Relational calculus exists in two forms:
Tuple relational calculus (TRC)
Filtering variable ranges over tuples
Notation: { T | Condition }
Returns all tuples T that satisfies condition.
For Example:
{ T.name | Author(T) AND T.article = 'database' }
Output: returns tuples with 'name' from Author who has written article on 'database'.
TRC can be quantified also. We can use Existential ( )and Universal Quantifiers ( ).
For example:
{ R| T Authors(T.article='database' AND R.name=T.name)}
Output : the query will yield the same result as the previous one.
Domain relational calculus (DRC)
In DRC the filtering variable uses domain of attributes instead of entire tuple values (as done in
TRC, mentioned above).
Notation:
{ a1, a2, a3, ..., an | P (a1, a2, a3, ... ,an)}
where a1, a2 are attributes and P stands for formulae built by inner attributes.
For example:
{< article, page, subject > | TutorialsPoint subject = 'database'}
Output: Yields Article, Page and Subject from relation TutorialsPoint where Subject is database.
Just like TRC, DRC also can be written using existential and universal quantifiers. DRC also
involves relational operators.
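
A domain-calculus query binds one variable per attribute rather than one per tuple. A minimal sketch of the TutorialsPoint example, with hypothetical rows:

```python
# Hypothetical rows of the TutorialsPoint relation: (article, page, subject).
tutorials_point = {
    ("Indexing", 12, "database"),
    ("Sorting", 40, "algorithms"),
    ("Joins", 27, "database"),
}

# Each of article, page and subject is a domain variable.
result = {
    (article, page, subject)
    for (article, page, subject) in tutorials_point
    if subject == "database"
}
```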


Restricted to safe expressions, the expressive power of tuple relational calculus and domain
relational calculus is equivalent to that of relational algebra.

NORMALIZATION AND NORMAL FORM :



DOMAIN TUPLES :

A relation is defined as a set of tuples that have the same attributes. A tuple usually represents
an object and information about that object. Objects are typically physical objects or concepts. A
relation is usually described as a table, which is organized into rows and columns. All the data
referenced by an attribute are in the same domain and conform to the same constraints.


The relational model specifies that the tuples of a relation have no specific order and that the
tuples, in turn, impose no order on the attributes. Applications access data by specifying queries,
which use operations such as select to identify tuples, project to identify attributes, and join to
combine relations. Relations can be modified using the insert, delete, and update operators. New
tuples can supply explicit values or be derived from a query. Similarly, queries identify tuples for
updating or deleting.
Tuples are unique by definition. A tuple that contains a candidate or primary key is obviously
unique, but a primary key need not be defined for a row or record to be a tuple: the definition
requires uniqueness, not a declared key. Because every tuple is unique, the full set of
attributes of a relation always constitutes a superkey.
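
These properties can be seen by modelling a relation as a Python set of tuples. This is a sketch under assumptions: the Car data and the attribute order (reg_no, model, rate) are hypothetical.

```python
# A relation as a set of tuples; attribute order is (reg_no, model, rate).
car = {
    ("ABC-112", "Focus", 40),
    ("ABC-122", "Focus", 40),
    ("XYZ-900", "Golf", 55),
}

# Inserting a tuple that already exists leaves the relation unchanged.
size_before = len(car)
car.add(("ABC-112", "Focus", 40))
size_after = len(car)

# select: identify tuples by a condition.
focus_cars = {t for t in car if t[1] == "Focus"}

# project: keep only (model, rate); duplicate results collapse into one tuple.
model_rates = {(t[1], t[2]) for t in car}
```

A set has no order and no duplicates, which matches the definition of a relation given above.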



KEY :



SCHEMA :

A database schema of a database system is its structure described in a formal
language supported by the database management system (DBMS) and refers to the organization
of data as a blueprint of how a database is constructed (divided into database tables in case
of Relational Databases). The formal definition of database schema is a set of formulas
(sentences) called integrity constraints imposed on a database. These integrity constraints ensure
compatibility between parts of the schema. All constraints are expressible in the same language.
A database can be considered a structure in realization of the database language.[1] The states
of a created conceptual schema are transformed into an explicit mapping, the database schema.
This describes how real world entities are modeled in the database.



INTEGRITY CONSTRAINTS :

Before one can start to implement the database tables, one must define the integrity constraints.
Integrity means that the data is correct and consistent. The data in a database must be right
and in good condition.

There are four kinds: domain integrity, entity integrity, referential integrity and foreign key
integrity constraints.

Domain Integrity :-
Domain integrity means the definition of a valid set of values for an attribute. For each
attribute you define
- the data type,
- the length or size,
- whether null values are allowed,
- whether the value must be unique.

You may also define the default value, the range (values in between) and/or specific values for
the attribute. Some DBMS allow you to define the output format and/or input mask for the
attribute.

These definitions ensure that a specific attribute will have a right and proper value in the
database.
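
A minimal sketch of domain integrity using Python's built-in sqlite3 module. The Colour column, the length limit and the rate range are hypothetical choices made for illustration; only Reg_No and Rate come from the Car example in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Car (
        Reg_No TEXT NOT NULL UNIQUE CHECK (length(Reg_No) <= 7),
        Colour TEXT NOT NULL DEFAULT 'white'
               CHECK (Colour IN ('white', 'black', 'red')),
        Rate   REAL CHECK (Rate BETWEEN 10 AND 500)
    )
""")
conn.execute("INSERT INTO Car (Reg_No, Rate) VALUES ('ABC-112', 40)")

def try_insert(sql):
    """Return 'accepted' or 'rejected' depending on the declared constraints."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

out_of_range = try_insert("INSERT INTO Car (Reg_No, Rate) VALUES ('ABC-113', 9999)")
duplicate = try_insert("INSERT INTO Car (Reg_No, Rate) VALUES ('ABC-112', 40)")
bad_value = try_insert("INSERT INTO Car (Reg_No, Colour) VALUES ('ABC-114', 'green')")
null_rate = try_insert("INSERT INTO Car (Reg_No) VALUES ('ABC-115')")
default_colour = conn.execute(
    "SELECT Colour FROM Car WHERE Reg_No = 'ABC-112'"
).fetchone()[0]
```

The data type, size, null rule, uniqueness, default value, range and list of specific values are each declared once in the table definition, and the DBMS then checks every insert against them.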
Entity Integrity Constraint
The entity integrity constraint states that primary keys can't be null. There must be a proper
value in the primary key field.

This is because the primary key value is used to identify individual rows in a table. If there were
null values for primary keys, it would mean that we could not identify those rows.


On the other hand, fields other than the primary key may contain null values. A null value means
that the value for that field is not known. Null is different from zero or a space.

In the Car Rental database in the Car table each car must have a proper and unique Reg_No.
There might be a car whose rate is unknown - maybe the car is broken or it is brand new - i.e.
the Rate field has a null value. See the picture below.

The entity integrity constraints assure that a specific row in a table can be identified.
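
A minimal sketch of the entity integrity constraint with sqlite3, using a cut-down Car table (only Reg_No and Rate; other columns are omitted as a simplification). Note that SQLite, for historical reasons, accepts NULL in a non-integer primary key unless NOT NULL is stated, so it is written out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Car (Reg_No TEXT NOT NULL PRIMARY KEY, Rate REAL)")

# A car whose rate is unknown: the non-key field may be null.
conn.execute("INSERT INTO Car VALUES ('ABC-112', NULL)")

def rejected(sql):
    """Return True if the DBMS refuses the statement."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

null_key_rejected = rejected("INSERT INTO Car VALUES (NULL, 40)")
duplicate_key_rejected = rejected("INSERT INTO Car VALUES ('ABC-112', 55)")
row_count = conn.execute("SELECT COUNT(*) FROM Car").fetchone()[0]
```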

Picture. Car and CarType tables in the Rent database
Referential Integrity Constraint
The referential integrity constraint is specified between two tables and it is used to maintain the
consistency among rows between the two tables.

The rules are:
1. You can't delete a record from a primary table if matching records exist in a related table.
2. You can't change a primary key value in the primary table if that record has related records.
3. You can't enter a value in the foreign key field of the related table that doesn't exist in the
primary key of the primary table.
4. However, you can enter a Null value in the foreign key, specifying that the records are
unrelated.

Examples

Rule 1. You can't delete any of the rows in the CarType table that are visible in the picture since
all the car types are in use in the Car table.

Rule 2. You can't change any of the model_ids in the CarType table since all the car types are
in use in the Car table.

Rule 3. The values that you can enter in the model_id field in the Car table must be in the
model_id field in the CarType table.

Rule 4. The model_id field in the Car table can have a null value, which means that the car type
of that car is not known.
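
The four rules can be tried out with sqlite3. This is a sketch with cut-down Car and CarType tables; the column list and the registration numbers ABC-998 and ABC-999 are assumptions. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE CarType (model_id INTEGER PRIMARY KEY, model TEXT)")
conn.execute("""
    CREATE TABLE Car (
        Reg_No   TEXT NOT NULL PRIMARY KEY,
        model_id INTEGER REFERENCES CarType (model_id)
    )
""")
conn.execute("INSERT INTO CarType VALUES (1, 'Ford Focus')")
conn.execute("INSERT INTO Car VALUES ('ABC-112', 1)")

def allowed(sql):
    """Return True if the statement succeeds, False if integrity is violated."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# Rule 1: deleting a car type that is in use.
rule1 = allowed("DELETE FROM CarType WHERE model_id = 1")
# Rule 2: changing a primary key value that is referenced.
rule2 = allowed("UPDATE CarType SET model_id = 100 WHERE model_id = 1")
# Rule 3: a foreign key value with no matching primary key.
rule3 = allowed("INSERT INTO Car VALUES ('ABC-999', 42)")
# Rule 4: a null foreign key, meaning the car type is unknown.
rule4 = allowed("INSERT INTO Car VALUES ('ABC-998', NULL)")
```

Only the last statement is accepted; the first three are refused by the DBMS.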
Foreign Key Integrity Constraint
There are two foreign key integrity constraints: cascade update related fields and cascade
delete related rows. These constraints affect the referential integrity constraint.

Cascade Update Related Fields

Any time you change the primary key of a row in the primary table, the foreign key values are
updated in the matching rows in the related table. This constraint overrules rule 2 in the
referential integrity constraints.

If this constraint is defined in the relationship between the tables Car and CarType, it is possible
to change the model_id in the CarType table. If model_id 1 (Ford Focus) is changed
to model_id 100 in the CarType table, the model_ids in the Car table change from 1 to
100 (cars ABC-112, ABC-122, ABC-123).

Cascade Delete Related Rows

Any time you delete a row in the primary table, the matching rows are automatically deleted in
the related table. This constraint overrules rule 1 in the referential integrity constraints.

If this constraint is defined in the relationship between the tables Car and CarType, it is possible
to delete rows from the CarType table. If the Ford Focus row is deleted from the
CarType table, the cars ABC-112, ABC-122 and ABC-123 are deleted from the Car table, too.
Source: Gillette Cynthia. 2001. MSCE SQL 2000 Database Design. Chapter 2: Data Modelling.
Coriolis Group.
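
The two cascade rules described above can be sketched with sqlite3 as follows. The table definitions are cut-down assumptions; the model_id values and registration numbers follow the Ford Focus example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE CarType (model_id INTEGER PRIMARY KEY, model TEXT)")
conn.execute("""
    CREATE TABLE Car (
        Reg_No   TEXT NOT NULL PRIMARY KEY,
        model_id INTEGER REFERENCES CarType (model_id)
                 ON UPDATE CASCADE ON DELETE CASCADE
    )
""")
conn.execute("INSERT INTO CarType VALUES (1, 'Ford Focus')")
conn.executemany(
    "INSERT INTO Car VALUES (?, 1)",
    [("ABC-112",), ("ABC-122",), ("ABC-123",)],
)

# Cascade update: changing the primary key updates the matching foreign keys.
conn.execute("UPDATE CarType SET model_id = 100 WHERE model_id = 1")
ids_after_update = [r[0] for r in conn.execute("SELECT DISTINCT model_id FROM Car")]

# Cascade delete: deleting the car type deletes the matching cars.
conn.execute("DELETE FROM CarType WHERE model_id = 100")
cars_after_delete = conn.execute("SELECT COUNT(*) FROM Car").fetchone()[0]
```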

SOME SOLVED QUESTIONS

Q.1 Why is the normalisation process necessary for a good database design ? Discuss in
detail the Boyce-Codd normal form with a suitable example.

Ans :
Normalisation is necessary for a good database design for the following reasons:
- It eliminates data redundancy: the same data does not occur in more than one place.
- It makes queries easier to process.
- Data entry time is saved because the tables are broken down into repeating and non-repeating
fields.
- Data modification becomes easy.
- The database becomes more flexible.
- Inconsistent dependencies are eliminated.

Boyce-Codd normal form :

Boyce-Codd normal form (BCNF, or 3.5NF) is a normal form used in database normalization.
It is a slightly stronger version of the third normal form (3NF). BCNF was developed in 1974
by Raymond F. Boyce and Edgar F. Codd to address certain types of anomaly not dealt with by
3NF as originally defined.
If a relational schema is in BCNF then all redundancy based on functional dependency has
been removed, although other types of redundancy may still exist. A relational schema R is in
Boyce-Codd normal form if and only if for every one of its dependencies X → Y, at least one of
the following conditions holds:[2]

- X → Y is a trivial functional dependency (Y ⊆ X)
- X is a superkey for schema R

Example :

Patient No   Patient Name   Appointment Id   Time    Doctor
1            John           0                09:00   Zorro
2            Kerr           0                09:00   Killer
3            Adam           1                10:00   Zorro
4            Robert         0                13:00   Killer
5            Zane           1                14:00   Zorro
Let's consider the database extract shown above. This depicts a special dieting clinic where
each patient has 4 appointments. On the first they are weighed, on the second they are exercised,
on the third their fat is removed by surgery, and on the fourth their mouth is stitched closed. Not all
patients need all four appointments! If the Patient Name begins with a letter before P they get a
morning appointment, otherwise they get an afternoon appointment. Appointment 1 is either 09:00
or 13:00, appointment 2 is either 10:00 or 14:00, and so on. From this (hopefully) make-believe
scenario we can extract the following determinants:

DB(Patno,PatName,appNo,time,doctor)
Patno -> PatName
Patno,appNo -> Time,doctor
Time -> appNo


Now we have to decide what the primary key of DB is going to be. From the information we
have, we could choose:
DB(Patno,PatName,appNo,time,doctor) (example 1a)
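
The determinants listed above can be checked mechanically against the BCNF definition. The sketch below computes attribute closures, enumerates the candidate keys and lists the dependencies whose left-hand side is not a superkey. The helper names closure, is_superkey and candidate_keys are hypothetical, and the attribute spelling "time" is used throughout.

```python
from itertools import combinations

ATTRS = frozenset({"Patno", "PatName", "appNo", "time", "doctor"})
FDS = [
    (frozenset({"Patno"}), frozenset({"PatName"})),
    (frozenset({"Patno", "appNo"}), frozenset({"time", "doctor"})),
    (frozenset({"time"}), frozenset({"appNo"})),
]

def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def is_superkey(attrs):
    return closure(attrs, FDS) == ATTRS

def candidate_keys():
    """Minimal superkeys, found by trying attribute sets in order of size."""
    keys = []
    for size in range(1, len(ATTRS) + 1):
        for combo in combinations(sorted(ATTRS), size):
            attrs = frozenset(combo)
            if is_superkey(attrs) and not any(k <= attrs for k in keys):
                keys.append(attrs)
    return keys

# A dependency violates BCNF if it is non-trivial and its left side is not a superkey.
violations = [(lhs, rhs) for lhs, rhs in FDS if not rhs <= lhs and not is_superkey(lhs)]
```

Under these dependencies the candidate keys are {Patno, appNo} and {Patno, time}, and the dependencies Patno -> PatName and time -> appNo both violate BCNF, so DB as it stands is not in BCNF.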
Q . 2 Explain the basic relational algebra operations with the symbol used and an example
for each.
Ans : Explained above.
Q . 3 What is an ER diagram ? Construct an ER diagram for a hospital with a set of patients
and a set of doctors. Associate with each patient a log of the various tests and examinations
conducted.
Ans : ER diagram : Explained above.
ER diagram for a hospital with a set of patients and a set of doctors, where each patient has an
associated log of the various tests and examinations conducted :

Q . 4 Construct an E-R diagram for a car-insurance company whose customers own one
or more cars each. Each car has associated with it zero to any number of recorded
accidents.

Ans :



Q . 5 Construct an E-R diagram for the registrar's office. Document all assumptions you
make about the mapping constraints.

Ans :

Assumptions:
o A class meets only at one particular place and time. This diagram does not attempt
to model a class meeting at different places or at different times
o There is no guarantee that the database does not have two classes meeting at the
same place and time
o Each class has a unique instructor

Q . 6 Consider a database used to record the marks that students get in different exams of
different course offerings.


(a) Construct an E-R diagram for the database modeling exams as entities and using a
ternary relationship
(b)

(c) Construct an alternative E-R diagram that uses only a binary relationship
between students and course-offerings. Make sure that only one relationship exists
between a particular student and course-offering pair, yet you can represent the
marks that a student gets in different exams of a course offering

Q . 7 Explain the difference between physical and logical data independence.

Ans :

Physical data independence is the ability to modify the physical schema without making it
necessary to rewrite application programs, e.g., changing from unblocked to blocked
record storage, or from sequential to random-access files.

Logical data independence is the ability to modify the conceptual schema without making
it necessary to rewrite application programs, e.g., adding a new field to a record. An
application program's view hides this change from the program.
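
Both kinds of independence can be sketched with sqlite3. The Student table, the StudentView view, the index name and the email column are all hypothetical; adding an index stands in here for a change to the physical schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO Student VALUES (1, 'Asha'), (2, 'Ravi')")
conn.execute("CREATE VIEW StudentView AS SELECT roll_no, name FROM Student")

# The application only ever issues this query against its view.
APP_QUERY = "SELECT roll_no, name FROM StudentView ORDER BY roll_no"
before = conn.execute(APP_QUERY).fetchall()

# Physical change: a new access path; the query text is untouched.
conn.execute("CREATE INDEX idx_student_name ON Student (name)")
after_physical = conn.execute(APP_QUERY).fetchall()

# Logical change: a new field in the record; the view hides it.
conn.execute("ALTER TABLE Student ADD COLUMN email TEXT")
after_logical = conn.execute(APP_QUERY).fetchall()
```

The application query returns the same rows after each change, so the program does not need to be rewritten.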

Q . 8 Explain the distinctions among the terms primary key, candidate key, and superkey.

Ans :

A superkey is any set of attributes such that the values of the attributes (taken together)
uniquely identify one entity in the entity set.
A candidate key is a minimal superkey -- a superkey with no redundant attributes. In other
words, if any one of the attributes is removed, the set of attributes that remain no longer
form a superkey.
A primary key is one of the candidate keys, designated by the database designer.
Every primary key is also a candidate key; every candidate key is also a superkey, but not
vice versa.
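
The containment between the three kinds of key can be seen on a small example. This is a sketch under a strong assumption: keys are really a property of the schema (all legal states), so testing one sample of rows can only rule attribute sets out. The Student rows and attribute names below are hypothetical.

```python
from itertools import combinations

ATTRS = ("roll_no", "email", "name")
rows = [
    {"roll_no": 1, "email": "a@x.org", "name": "Asha"},
    {"roll_no": 2, "email": "r@x.org", "name": "Ravi"},
    {"roll_no": 3, "email": "a2@x.org", "name": "Asha"},
]

def identifies_rows(attrs):
    """True if the values of `attrs` are different for every row."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

# Superkeys: every attribute set that identifies rows uniquely.
superkeys = [
    set(combo)
    for size in range(1, len(ATTRS) + 1)
    for combo in combinations(ATTRS, size)
    if identifies_rows(combo)
]

# Candidate keys: superkeys with no smaller superkey inside them.
candidate_keys = [k for k in superkeys if not any(other < k for other in superkeys)]

# Primary key: the designer picks one of the candidate keys.
primary_key = {"roll_no"}
```

Here there are six superkeys, two candidate keys ({roll_no} and {email}), and one primary key chosen from the candidates.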

Q . 9 What are the database design challenges ?

Ans :

Database designers often must make design compromises that are triggered by conflicting
goals, such as adherence to design standards (design elegance), processing speed, and
information requirements.

Design standards:
The database design must conform to design standards. Such standards have guided you in
developing logical structures that minimize data redundancies, thereby minimizing the likelihood
that destructive data anomalies will occur. You have also learned how standards prescribe
avoiding nulls to the greatest extent possible. In short, design standards allow you to work with
well-defined components and to evaluate the interaction of those components with some
precision. Without design standards, it is nearly impossible to formulate a proper
design process, to evaluate an existing design, or to trace the likely logical impact of changes in
design.

Processing speed:
In many organizations, particularly those generating large numbers of transactions, high
processing speeds are often a top priority in database design. High processing speed means
minimal access time, which may be achieved by minimizing the number and complexity of
logically desirable relationships. For example, a perfect design might use a 1:1 relationship to
avoid nulls, while a higher transaction-speed design might combine the two tables to avoid the
use of an additional relationship, using dummy entries to avoid the nulls. If the focus is on data-
retrieval speed, you might also be forced to include derived attributes in the design.

Information requirements:
The quest for timely information might be the focus of database design. Complex information
requirements may dictate data transformations, and they may expand the number of entities
and attributes within the design. Therefore, the database may have to sacrifice some of its
clean design structures and/or some of its high transaction speed to ensure maximum
information generation.


A design that meets all logical requirements and design conventions is an important goal.
However, if this perfect design fails to meet the customer's transaction speed and/or information
requirements, the designer will not have done a proper job from the end user's point of view.
Compromises are a fact of life in the real world of database design.

Even while focusing on the entities, attributes, relationships, and constraints, the designer
should begin thinking about end-user requirements such as performance, security, shared
access, and data integrity. The designer must consider processing requirements and verify that
all update, retrieval, and deletion options are available. Finally, a design is of little value unless
the end product is capable of delivering all specified query and reporting requirements.

Finally, document, document, and document! Put all design activities in writing. Then review
what you've written. Documentation not only helps you stay on track during the design process,
but also enables you (or those following you) to pick up the design thread when the time comes
to modify the design. Although the need for documentation should be obvious, one of the most
vexing problems in database and systems analysis work is that the "put it in writing" rule is often
not observed in all of the design and implementation stages. The development of organizational
documentation standards is a very important aspect of ensuring data compatibility and
coherence.

Q . 10 What is an entity super type and sub type and why it is used ?

Ans :



Q . 11 Explain PJNF with example.

Ans :

Fifth normal form (5NF), also known as project-join normal form (PJ/NF) is a level
of database normalization designed to reduce redundancy in relational databases recording
multi-valued facts by isolating semantically related multiple relationships. A table is said to be in
the 5NF if and only if every non-trivial join dependency in it is implied by the candidate keys.
A join dependency *{A, B, …, Z} on R is implied by the candidate key(s) of R if and only if each
of A, B, …, Z is a superkey for R.


Consider the following example:
Traveling Salesman Product Availability By Brand

Traveling Salesman   Brand     Product Type
Jack Schneider       Acme      Vacuum Cleaner
Jack Schneider       Acme      Breadbox
Willy Loman          Robusto   Pruning Shears
Willy Loman          Robusto   Vacuum Cleaner
Willy Loman          Robusto   Breadbox
Willy Loman          Robusto   Umbrella Stand
Louis Ferguson       Robusto   Vacuum Cleaner
Louis Ferguson       Robusto   Telescope
Louis Ferguson       Acme      Vacuum Cleaner
Louis Ferguson       Acme      Lava Lamp
Louis Ferguson       Nimbus    Tie Rack

The table's predicate is: Products of the type designated by Product Type, made by the brand
designated by Brand, are available from the traveling salesman designated by Traveling
Salesman.
In the absence of any rules restricting the valid possible combinations of Traveling Salesman,
Brand, and Product Type, the three-attribute table above is necessary in order to model the
situation correctly.
Suppose, however, that the following rule applies: A Traveling Salesman has certain Brands and
certain Product Types in his repertoire. If Brand B1 and Brand B2 are in his repertoire, and Product
Type P is in his repertoire, then (assuming Brand B1 and Brand B2 both make Product Type P),
the Traveling Salesman must offer products of Product Type P, both those made by Brand B1 and
those made by Brand B2.
In that case, it is possible to split the table into three:
Product Types By Traveling Salesman

Traveling Salesman   Product Type
Jack Schneider       Vacuum Cleaner
Jack Schneider       Breadbox
Willy Loman          Pruning Shears
Willy Loman          Vacuum Cleaner
Willy Loman          Breadbox
Willy Loman          Umbrella Stand
Louis Ferguson       Telescope
Louis Ferguson       Vacuum Cleaner
Louis Ferguson       Lava Lamp
Louis Ferguson       Tie Rack

Brands By Traveling Salesman

Traveling Salesman   Brand
Jack Schneider       Acme
Willy Loman          Robusto
Louis Ferguson       Robusto
Louis Ferguson       Acme
Louis Ferguson       Nimbus

Product Types By Brand

Brand     Product Type
Acme      Vacuum Cleaner
Acme      Breadbox
Acme      Lava Lamp
Robusto   Pruning Shears
Robusto   Vacuum Cleaner
Robusto   Breadbox
Robusto   Umbrella Stand
Robusto   Telescope
Nimbus    Tie Rack

In this case, it's impossible for Louis Ferguson to refuse to offer Vacuum Cleaners made by Acme
(assuming Acme makes Vacuum Cleaners) if he sells anything else made by Acme (Lava Lamp)
and he also sells Vacuum Cleaners made by any other brand (Robusto).
Note how this setup helps to remove redundancy. Suppose that Jack Schneider starts selling
Robusto's Breadboxes and Vacuum Cleaners. In the previous setup we would have to
add two new entries, one for each product type (<Jack Schneider, Robusto, Breadboxes>, <Jack
Schneider, Robusto, Vacuum Cleaners>). With the new setup we need to add only a single entry
(<Jack Schneider, Robusto>) in Brands By Traveling Salesman.
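
The decomposition can be checked with a short sketch: project the three-attribute table onto its three pairs of attributes, join the projections back, and compare with the original. The variable names and the helper join3 are hypothetical; the rows are the ones from the example.

```python
# (salesman, brand, product type) rows of the original table.
original = {
    ("Jack Schneider", "Acme", "Vacuum Cleaner"),
    ("Jack Schneider", "Acme", "Breadbox"),
    ("Willy Loman", "Robusto", "Pruning Shears"),
    ("Willy Loman", "Robusto", "Vacuum Cleaner"),
    ("Willy Loman", "Robusto", "Breadbox"),
    ("Willy Loman", "Robusto", "Umbrella Stand"),
    ("Louis Ferguson", "Robusto", "Vacuum Cleaner"),
    ("Louis Ferguson", "Robusto", "Telescope"),
    ("Louis Ferguson", "Acme", "Vacuum Cleaner"),
    ("Louis Ferguson", "Acme", "Lava Lamp"),
    ("Louis Ferguson", "Nimbus", "Tie Rack"),
}

# The three projections.
types_by_salesman = {(s, p) for s, b, p in original}
brands_by_salesman = {(s, b) for s, b, p in original}
types_by_brand = {(b, p) for s, b, p in original}

def join3(sp, sb, bp):
    """Natural join of the three two-attribute tables."""
    return {
        (s, b, p)
        for (s, b) in sb
        for (s2, p) in sp
        if s2 == s and (b, p) in bp
    }

rejoined = join3(types_by_salesman, brands_by_salesman, types_by_brand)

# Jack Schneider starts selling Robusto: one new row in Brands By Traveling Salesman.
extended = join3(
    types_by_salesman,
    brands_by_salesman | {("Jack Schneider", "Robusto")},
    types_by_brand,
)
new_rows = extended - original
```

The join of the three projections gives back exactly the original eleven rows, and the single added row produces both Robusto entries for Jack Schneider.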

Q . 12 What are the different components of DBMS ? Draw ER diagram for PG student
registration system for RGPV University.
Ans :
A database management system (DBMS) consists of several components. Each component
plays a very important role in the database management system environment. The major
components of a database management system are:
Software
Hardware
Data
Procedures
Database Access Language
Software
The main component of a DBMS is the software. It is the set of programs used to handle the
database and to control and manage the overall computerized database. It includes:
1. The DBMS software itself, which is the most important software component in the overall
system.
2. The operating system, including the network software used to share the data of the
database among multiple users.
3. Application programs, developed in programming languages such as C++ or Visual
Basic, that are used to access the database through the DBMS. Each program contains
statements that request the DBMS to perform operations on the database, such as
retrieving, updating or deleting data. Application programs may be conventional
programs or may be used online from workstations or terminals.
Hardware
Hardware consists of a set of physical electronic devices such as computers (together with
associated I/O devices like disk drives), storage devices, I/O channels, and electromechanical
devices that interface between computers and real-world systems. It is impossible
to implement the DBMS without the hardware devices. In a network, a powerful computer with
high data processing speed and a storage device with large storage capacity are required as the
database server.
Data
Data is the most important component of the DBMS. The main purpose of DBMS is to process
the data. In DBMS, databases are defined, constructed and then data is stored, updated and
retrieved to and from the databases. The database contains both the actual (or operational) data
and the metadata (data about data or description about data).
Procedures
Procedures refer to the instructions and rules that help to design the database and to use the
DBMS. The users that operate and manage the DBMS require documented procedures on how to
use or run the database management system. These may include procedures:
1. To install the new DBMS.
2. To log on to the DBMS.
3. To use the DBMS or application program.
4. To make backup copies of the database.
5. To change the structure of the database.
6. To generate reports of data retrieved from the database.
Database Access Language
The database access language is used to move data to and from the database. Users
use the database access language to enter new data, change the existing data in the database, and
retrieve required data from databases. The user writes a set of appropriate commands in a
database access language and submits these to the DBMS. The DBMS translates the user
commands and sends them to a specific part of the DBMS called the Database Jet Engine. The
database engine generates a set of results according to the commands submitted by the user,
converts these into a user-readable form called an Inquiry Report, and then displays them on the
screen. Administrators may also use the database access language to create and maintain
the databases.
The most popular database access language is SQL (Structured Query Language). Relational
databases are required to have a database query language.
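
A minimal sketch of these uses of a database access language, written as SQL statements submitted through Python's sqlite3 module. The Student table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create the database structure (an administrator's task).
conn.execute(
    "CREATE TABLE Student (roll_no INTEGER PRIMARY KEY, name TEXT, branch TEXT)"
)

# Enter new data.
conn.execute("INSERT INTO Student VALUES (?, ?, ?)", (1, "Asha", "CSE"))

# Change existing data.
conn.execute("UPDATE Student SET branch = ? WHERE roll_no = ?", ("IT", 1))

# Retrieve the required data.
report = conn.execute("SELECT roll_no, name, branch FROM Student").fetchall()
```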
Users
The users are the people who manage the databases and perform different operations on the
databases in the database system. There are three kinds of people who play different roles in a
database system:

1. Application Programmers
2. Database Administrators
3. End-Users
Application Programmers
The people who write application programs in programming languages (such as Visual Basic,
Java, or C++) to interact with databases are called application programmers.
Database Administrators
A person who is responsible for managing the overall database management system is called
database administrator or simply DBA.
End-Users
The end-users are the people who interact with database management system to perform
different operations on database such as retrieving, updating, inserting, deleting data etc.
ER diagram for PG student registration system for RGPV University :

