A Database Management System allows a person or user to organize, store, and retrieve data from a
computer. A database, as a collection of information, can be organized so a Database Management System
can access and pull specific information.
A DBMS is software that allows the creation, definition and manipulation of databases, letting users store,
process and analyse data easily. It provides an interface or tool for operations such as creating a database,
creating tables in it, and storing and updating data. A DBMS also provides protection and security for its
databases and maintains data consistency when multiple users access them. Some examples of popular
DBMSs in use today:
MySQL
Oracle
SQL Server
IBM DB2
PostgreSQL
Amazon SimpleDB (cloud based) etc.
Applications where we use Database Management Systems are:
Telecom: A database keeps track of information about calls made, network usage, customer details
etc. Without database systems it would be hard to maintain such a huge amount of data, which is
updated constantly.
Industry: Whether it is a manufacturing unit, warehouse or distribution centre, each one needs a
database to keep records of what comes in and goes out. For example, a distribution centre should
keep track of the product units supplied to the centre as well as the products delivered out of it
each day; this is where a DBMS comes into the picture.
Banking System: For storing customer information, tracking day-to-day credit and debit transactions,
generating bank statements etc. All this work is done with the help of database management
systems.
Education sector: Database systems are frequently used in schools and colleges to store and
retrieve data about students, staff, courses, exams, payroll, attendance, fees etc. There is a very
large amount of inter-related data that needs to be stored and retrieved efficiently.
Online shopping: Online shopping websites such as Amazon and Flipkart store product information,
your addresses and preferences, and credit details, and provide you with a relevant list of products
based on your query. All this involves a database management system.
Components of DBMS:
A database management system can be divided into five major components:
1. Hardware
2. Software
3. Data
4. Procedures
5. Database Access Language
Let's have a simple diagram to see how they all fit together to form a database management system.
DBMS Components: Hardware
Hardware means the computer, hard disks, I/O channels for data, and any other physical
component involved before data is successfully stored.
When we run Oracle or MySQL on a personal computer, the computer's hard disk, the
keyboard used to type in commands, and the computer's RAM and ROM all become part
of the DBMS hardware.
DBMS Components: Software
This is the main component, as this is the program which controls everything. The
DBMS software is more like a wrapper around the physical database, which provides
us with an easy-to-use interface to store, access and update data.
The DBMS software is capable of understanding the Database Access Language
and interpreting it into actual database commands to execute on the DB.
DBMS Components: Data
Data is the resource for which the DBMS was designed. The motive behind the
creation of the DBMS was to store and utilise data.
A typical database holds both the data saved by the user and metadata.
Metadata is data about the data. This is information stored by the DBMS to better
understand the data stored in it.
For example: when I store my name in a database, the DBMS also records when the
name was stored, what its size is, and whether it is related to some other data or is
independent. All this information is metadata.
DBMS Components: Procedures
Procedures are the general instructions for using a database management system.
They include procedures to set up and install a DBMS, to log in to and log out of the DBMS
software, to manage databases, to take backups, to generate reports etc.
DBMS Components: Database Access Language
Database Access Language is a simple language designed to write commands to
access, insert, update and delete data stored in any database.
A user can write commands in the Database Access Language and submit them to the
DBMS, which then translates and executes them.
Using the access language, a user can create new databases and tables, insert data,
fetch stored data, update data and delete data.
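As a minimal sketch of these commands in practice, the example below uses SQL as the access language through Python's built-in sqlite3 module. The student table, its columns and the sample rows are assumptions made only for illustration; other DBMS products accept similar statements with their own drivers.

```python
import sqlite3

# In-memory database; the "student" table and its columns are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table, then insert two records.
cur.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
cur.execute("INSERT INTO student (id, name, age) VALUES (?, ?, ?)", (1, "John", 25))
cur.execute("INSERT INTO student (id, name, age) VALUES (?, ?, ?)", (2, "Mathew", 28))

# Fetch the stored data.
rows_after_insert = cur.execute(
    "SELECT id, name, age FROM student ORDER BY id").fetchall()

# Update one record and delete another.
cur.execute("UPDATE student SET age = ? WHERE name = ?", (26, "John"))
cur.execute("DELETE FROM student WHERE name = ?", ("Mathew",))

rows_after_changes = cur.execute(
    "SELECT id, name, age FROM student ORDER BY id").fetchall()

conn.commit()
conn.close()
```

The DBMS translates each statement into the low-level operations needed; the user never touches the stored files directly.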
Abstraction is one of the main features of database systems. Hiding irrelevant details from users and
giving them an abstract view of the data makes user-database interaction easy and efficient.
Understanding the view of data requires a basic knowledge of two topics:
1. Instance and schema
2. Data abstraction
The differences between schema and instance are given in the table below:
Aspect | Schema | Instance
Definition | The overall logical design of the database. | The collection of information stored in the database at a particular moment.
Includes | Table names, column names, datatypes and sizes of columns, and various constraints at the logical level. | Actual data or information stored in tables in the form of records.
Change | Changes infrequently. | Changes frequently.
Cause of change | Insertion of tables or columns, and change in datatype, size or constraints on any column. | Insert, delete or update operations on data stored in the database.
Analogy | Variable declaration | Value of the variable
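The distinction can be observed directly. In the sketch below (Python with sqlite3; the employee table is an assumed example), inserting rows changes the instance but leaves the schema untouched, while ALTER TABLE changes the schema itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

def schema(connection):
    """Return the stored CREATE statements, i.e. the schema."""
    return [row[0] for row in connection.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' ORDER BY name")]

def instance(connection):
    """Return the rows stored right now, i.e. the instance."""
    return connection.execute(
        "SELECT emp_id, name FROM employee ORDER BY emp_id").fetchall()

schema_before = schema(conn)
instance_before = instance(conn)

# Data operations change the instance only.
conn.execute("INSERT INTO employee VALUES (1, 'Joseph')")
conn.execute("INSERT INTO employee VALUES (2, 'Rose')")
schema_after_insert = schema(conn)
instance_after_insert = instance(conn)

# A structural operation changes the schema.
conn.execute("ALTER TABLE employee ADD COLUMN age INTEGER")
schema_after_alter = schema(conn)
```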
A database schema can be divided broadly into two categories –
Physical Database Schema − this schema pertains to the actual storage of data and its form of
storage like files, indices, etc. It defines how the data will be stored in a secondary storage.
Logical Database Schema − this schema defines all the logical constraints that need to be
applied on the data stored. It defines tables, views, and integrity constraints.
The three levels of abstraction (internal, conceptual and external) provide data abstraction, which
means hiding low-level complexity from end users. A database system should be efficient in
performance and convenient to use. With these three levels, it is possible to use complex structures
at the internal level for efficient operations and to provide a simpler, more convenient interface at
the external level.
1. Internal level:
This is the lowest level of data abstraction.
It describes how the data are actually stored on storage devices.
It is also known as physical level.
It provides internal view of physical storage of data.
It deals with complex low level data structures, file structures and access methods in detail.
It also deals with Data Compression and Encryption techniques, if used.
2. Conceptual level:
This is the next higher level than internal level of data abstraction.
It describes what data are stored in the database and what relationships exist among those
data.
It is also known as logical level.
It hides low level complexities of physical storage.
Database administrator and designers work at this level to determine what data to keep in
database.
Application developers also work on this level.
3. External Level:
This is the highest level of data abstraction.
It describes only the part of the entire database that concerns an end user.
It is also known as the view level.
End users need to access only part of the database rather than the entire database.
Different users need different views of the database, so there can be many view-level
abstractions of the same database.
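In SQL systems the external level is commonly realised with views. The following sketch (Python with sqlite3) assumes a hypothetical employee table and two hypothetical user groups; each group sees only the part of the database that concerns it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual level: the full table (names and values are illustrative).
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT,
        phone  TEXT,
        salary INTEGER
    );
    INSERT INTO employee VALUES (1, 'Joseph', '2453545', 50000);
    INSERT INTO employee VALUES (2, 'Rose',   '5645677', 62000);

    -- External view for a phone directory: no salary information.
    CREATE VIEW directory_view AS SELECT name, phone FROM employee;

    -- External view for payroll: no contact details.
    CREATE VIEW payroll_view AS SELECT emp_id, salary FROM employee;
""")

directory_rows = conn.execute(
    "SELECT * FROM directory_view ORDER BY name").fetchall()
payroll_rows = conn.execute(
    "SELECT * FROM payroll_view ORDER BY emp_id").fetchall()
```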
Two levels of data independence arise from these levels of abstraction:
1. Physical level data independence:
It refers to the ability to modify the physical schema, usually for optimization purposes,
without any alteration to the conceptual or logical schema. For example, the conceptual
structure of the database is not affected by a change in the storage capacity of the
database server, or by a change from sequential to random access files. Such
modifications to the physical structure may include:
Utilizing new storage devices.
Modifying data structures used for storage.
Altering indexes or using alternative file organization techniques etc.
2. Logical level data independence:
It refers to the ability to modify the logical schema without affecting the external schema
or application programs. The user's view of the data is not affected by changes to the
conceptual view of the data. These changes may include the insertion or deletion of
attributes, and alterations to the table structures, entities or relationships of the logical
schema.
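Both kinds of independence can be illustrated with a small experiment. The sketch below (Python with sqlite3; the student table, the view and the index name are assumptions) first makes a physical change by adding an index, then a logical change by adding an attribute, and checks that existing queries and views return the same results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO student VALUES (1, 'John', 'Troy');
    INSERT INTO student VALUES (2, 'Mathew', 'Fraser Town');
    CREATE VIEW student_names AS SELECT roll_no, name FROM student;
""")

query = "SELECT roll_no, name FROM student WHERE city = 'Troy'"
view_query = "SELECT * FROM student_names ORDER BY roll_no"

result_before = conn.execute(query).fetchall()
view_before = conn.execute(view_query).fetchall()

# Physical change: a new index alters how rows are located, not what the query means.
conn.execute("CREATE INDEX idx_student_city ON student (city)")
result_after_index = conn.execute(query).fetchall()

# Logical change: a new attribute is added to the table behind the view.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
view_after_alter = conn.execute(view_query).fetchall()
```

Logical independence is usually harder to achieve in practice: dropping or renaming a column that a view depends on would still break that view.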
Mappings
The process of transforming requests and results between the three levels is called mapping. There are
two types of mapping:
Conceptual/Internal Mapping
External/Conceptual Mapping
1. Conceptual/Internal Mapping:
The conceptual/internal mapping defines the correspondence between the conceptual view and
the stored database.
It specifies how conceptual records and fields are represented at the internal level.
It relates the conceptual schema to the internal schema.
If the structure of the stored database is changed, that is, if a change is made to the storage
structure definition, then the conceptual/internal mapping must be changed accordingly so that
the conceptual schema can remain invariant.
There is one mapping between the conceptual and internal levels.
2. External/Conceptual Mapping:
The external/conceptual mapping defines the correspondence between a particular external view
and conceptual view.
It relates each external schema with conceptual schema.
The differences that can exist between these two levels are analogous to those that can exist
between the conceptual view and the stored database.
Example: fields can have different data types; field and record names can be changed; several
conceptual fields can be combined into a single external field.
Any number of external views can exist at the same time; any number of users can share a given
external view; different external views can overlap.
There can be several mappings between the external and conceptual levels.
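An SQL view definition is one way to write down an external/conceptual mapping. In this sketch (Python with sqlite3), the person table, the contact view and the sample row are hypothetical; the view renames a field, combines two conceptual fields into one external field, and changes a data type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual record (illustrative names and values).
    CREATE TABLE person (
        person_id  INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name  TEXT,
        age        INTEGER
    );
    INSERT INTO person VALUES (1, 'John', 'Smith', 25);

    -- External/conceptual mapping expressed as a view definition.
    CREATE VIEW contact AS
    SELECT person_id AS id,                             -- field renamed
           first_name || ' ' || last_name AS full_name, -- two fields combined
           CAST(age AS TEXT) AS age_text                -- data type changed
    FROM person;
""")

external_row = conn.execute(
    "SELECT id, full_name, age_text FROM contact").fetchall()[0]
column_names = [d[0] for d in conn.execute("SELECT * FROM contact").description]
```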
Structure of DBMS:
DBMS (Database Management System) acts as an interface between the user and the database.
The user requests the DBMS to perform various operations such as insert, delete, update and
retrieval on the database.
The components of DBMS perform these requested operations on the database and provide
necessary data to the users.
Components of a DBMS
The components of DBMS can be divided into two parts:
DDL Compiler:
The Data Definition Language (DDL) compiler processes schema definitions specified in the DDL.
It includes metadata information such as the name of the files, data items, storage
details of each file, mapping information and constraints etc.
DML Compiler and Query optimizer:
DML commands such as insert, update, delete and retrieve from the application
program are sent to the DML compiler for compilation into object code for database
access.
The query optimizer then chooses the best way to execute the query, optimizes the
object code accordingly and sends it to the data manager.
Data Manager:
The Data Manager is the central software component of the DBMS, also known as the
Database Control System.
The Main Functions Of Data Manager Are:
o It converts the operations in users' queries, which come from the application
programs or from the Query Processor (the combination of DML compiler and
query optimizer), from the user's logical view to the physical file system.
o It controls access to the DBMS information stored on disk.
o It also handles the buffers in main memory.
o It also enforces constraints to maintain consistency and integrity of the data.
o It also synchronizes the simultaneous operations performed by the concurrent
users.
o It also controls the backup and recovery operations.
Data Dictionary:
The data dictionary stores metadata about the database, in particular the schema of
the database. It contains:
Names of the tables, names of the attributes of each table, length of attributes, and number
of rows in each table.
Detailed information on physical database design such as storage structure, access
paths, files and record sizes.
Usage statistics such as the frequency of queries and transactions.
The data dictionary is used to control data integrity, database operation and
accuracy, which makes it an important part of the DBMS.
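Most DBMS products let you query the data dictionary like any other data. As a sketch, SQLite exposes its catalogue through the sqlite_master table and the table_info pragma; the department table below is an assumed example, and other systems use different catalogue names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT NOT NULL)")
conn.execute("INSERT INTO department VALUES (10, 'Accounting')")
conn.execute("INSERT INTO department VALUES (20, 'Quality')")

# Names of the tables, read from SQLite's catalogue table.
table_names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
# Keep the name, declared type, NOT NULL flag and primary-key flag.
columns = [(r[1], r[2], bool(r[3]), bool(r[5]))
           for r in conn.execute("PRAGMA table_info(department)")]

# Number of rows currently stored in the table.
row_count = conn.execute("SELECT COUNT(*) FROM department").fetchall()[0][0]
```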
Data Files:
The data files store the database itself.
Indices:
A database index is a data structure that improves the speed of data retrieval
operations on a database table at the cost of additional writes and storage space to
maintain the index data structure.
Indexes are used to quickly locate data without having to search every row in a
database table every time a database table is accessed.
Indexes can be created using one or more columns of a database table, providing the
basis for both rapid random lookups and efficient access of ordered records.
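The effect of an index can be seen in the query plan. The sketch below (Python with sqlite3) fills an assumed employee table with generated rows and compares the plan SQLite reports before and after creating an index; the exact wording of the plan text varies between SQLite versions, so only the presence of the index name is checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER)")
# 1000 generated rows spread over 10 departments (illustrative data).
conn.executemany("INSERT INTO employee (name, dept_id) VALUES (?, ?)",
                 [(f"emp{i}", i % 10) for i in range(1000)])

query = "SELECT name FROM employee WHERE dept_id = 3"

def plan(connection, sql):
    """Join the human-readable detail column of EXPLAIN QUERY PLAN."""
    return " ".join(row[-1] for row in connection.execute("EXPLAIN QUERY PLAN " + sql))

plan_without_index = plan(conn, query)      # full table scan
rows_without_index = conn.execute(query).fetchall()

conn.execute("CREATE INDEX idx_employee_dept ON employee (dept_id)")

plan_with_index = plan(conn, query)         # lookup through the index
rows_with_index = conn.execute(query).fetchall()
```

The query returns the same rows either way; the index only changes how the rows are found, at the cost of extra storage and extra work on every insert or update.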
Compiled DML:-
The DML compiler converts high-level queries into low-level file access commands
known as compiled DML.
Database Users: - Database users are the people who actually use the database and benefit
from it. There are different types of users depending on their needs and their way of accessing
the database.
Sophisticated Users - These are database developers, who write SQL queries to
select/insert/delete/update data. They do not use any application program to access
the database; they interact with it directly by means of a query language such as
SQL. These users may be scientists, engineers or analysts who have studied SQL and
DBMS thoroughly and apply the concepts to their requirements. In short, this category
includes designers and developers of DBMS and SQL.
Specialized Users - These are also sophisticated users, but they write special database
application programs. They are the developers who build complex programs to meet
specific requirements.
Naive Users - These are the users who use an existing application to interact with the
database. Examples are online library systems, ticket booking systems and ATMs, where
an application already exists and users interact with the database through it to fulfil their
requests.
Database Administrator - The person in the organization who controls the design and the
use of the database is referred to as the DBA.
o Schema Definition:
The DBA defines the logical schema of the database. A schema refers
to the overall logical structure of the database.
According to this schema, the database is developed to store the data
required by an organization.
o Storage Structure and Access Method Definition:
The DBA decides how the data is to be represented in the stored
database.
o Assisting Application Programmers:
The DBA provides assistance to application programmers to develop
application programs.
o Physical Organization Modification:
The DBA modifies the physical organization of the database to reflect the
changing needs of the organization or to improve performance.
o Approving Data Access:
The DBA determines which user needs access to which part of the
database.
According to this, various types of authorizations are granted to different
users.
o Monitoring Performance:
The DBA monitors performance of the system. The DBA ensures that
better performance is maintained by making changes in physical or logical
schema if required.
o Backup and Recovery:
The database should not be lost or damaged.
The DBA ensures this by periodically backing up the database on magnetic
tapes or remote servers.
In case of a failure, such as a virus attack, the database is recovered from this
backup.
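A backup-and-recovery cycle can be sketched with the backup method of Python's sqlite3 connections (available since Python 3.7). The account table and the in-memory "remote" copy are assumptions for illustration; a real DBA would back up to tape or a remote server using the tools of the DBMS in use.

```python
import sqlite3

# Live database (in memory for the example; account data is illustrative).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance INTEGER)")
source.execute("INSERT INTO account VALUES (1, 500)")
source.commit()

# Periodic backup: copy the whole database into a second connection.
backup_copy = sqlite3.connect(":memory:")
source.backup(backup_copy)

# Simulate a failure that destroys the live data.
source.execute("DELETE FROM account")
source.commit()
damaged_rows = source.execute("SELECT * FROM account").fetchall()

# Recovery: restore the backup into a fresh database.
restored = sqlite3.connect(":memory:")
backup_copy.backup(restored)
recovered_rows = restored.execute("SELECT * FROM account").fetchall()
```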
Query Processor Units: - Executes low level instructions generated by DML compiler.
o DDL Interpreter
o DML Compiler
o Embedded DML Pre-compiler
o Query Evaluation Engine
Storage Manager Units
o Authorization Manager: checks the authority of users to access data.
o Integrity Manager: checks that the integrity constraints are satisfied.
o Transaction Manager: preserves atomicity and controls concurrency.
o File Manager: manages the allocation of space on disk.
o Buffer Manager: fetches data from disk storage into memory for use.
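As an illustration of integrity checking, the sketch below (Python with sqlite3) declares constraints on an assumed account table and lets the DBMS reject rows that violate them. The table, the constraints and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        acc_no  INTEGER PRIMARY KEY,
        owner   TEXT NOT NULL,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO account VALUES (1, 'Rose', 100)")

rejected = []
for bad_row in [(1, "Joseph", 50),    # duplicate primary key
                (2, None, 50),        # missing owner (NOT NULL)
                (3, "Mathew", -10)]:  # negative balance (CHECK)
    try:
        conn.execute("INSERT INTO account VALUES (?, ?, ?)", bad_row)
    except sqlite3.IntegrityError:
        # The DBMS refuses the row; the stored data stays consistent.
        rejected.append(bad_row[0])

stored = conn.execute("SELECT acc_no, owner, balance FROM account").fetchall()
```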
Functions of DBMS:
o A DBMS frees programmers from worrying about the organization and
location of the data, i.e. it shields users from complex hardware-level details.
o A DBMS can organize, process and present data elements from the database. This
capability enables decision makers to search and query database contents in order
to extract answers that are not available in regular reports.
o Programming is speeded up because the programmer can concentrate on the logic of the
application.
o It includes special user-friendly query languages which are easy for
non-programming users of the system to understand.
The services provided by the DBMS include :-
o Authorization services such as logging on to the DBMS, starting the database, stopping
the database etc.
o Transaction support such as recovery, rollback etc.
o Import and export of data.
o Maintaining the data dictionary.
o User monitoring.
Database Languages:-
Database languages are used to create and maintain a database on a computer, and to read,
update and store data in it. There are several such languages; one of them is SQL (Structured
Query Language). The four main categories of SQL statements are as follows:
1. DML (Data Manipulation Language)
2. DDL (Data Definition Language)
3. DCL (Data Control Language)
4. TCL (Transaction Control Language)
Data Definition Language (DDL): DDL is used for specifying the database schema. Taking
SQL as an example, the following statements come under DDL.
To create a database or its objects (tables, views, indexes) – CREATE
To alter the structure of an existing database object – ALTER
To drop (delete) a database or its objects – DROP
To remove all rows from a table while keeping the table itself – TRUNCATE
To rename a database object – RENAME
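The DDL statements can be tried out as follows. This is a sketch in Python with sqlite3, using an assumed staff table; note that SQLite has no TRUNCATE statement and expresses renaming as ALTER TABLE ... RENAME TO, so those two steps are adapted accordingly. Other DBMS products differ in the exact syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def table_names(connection):
    """List the tables currently defined in the schema."""
    return sorted(r[0] for r in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"))

def column_names(connection, table):
    """List the column names of one table."""
    return [r[1] for r in connection.execute(f"PRAGMA table_info({table})")]

# CREATE: define a new table.
conn.execute("CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, name TEXT)")
after_create = table_names(conn)

# ALTER: change the structure of the table.
conn.execute("ALTER TABLE staff ADD COLUMN salary INTEGER")
after_alter = column_names(conn, "staff")

# RENAME: SQLite spells this as a form of ALTER TABLE.
conn.execute("ALTER TABLE staff RENAME TO employee")
after_rename = table_names(conn)

# TRUNCATE equivalent: SQLite uses DELETE without WHERE; the table itself remains.
conn.execute("INSERT INTO employee VALUES (1, 'Joseph', 50000)")
conn.execute("DELETE FROM employee")
rows_after_empty = conn.execute("SELECT COUNT(*) FROM employee").fetchall()[0][0]
still_defined = table_names(conn)

# DROP: remove the table definition together with its data.
conn.execute("DROP TABLE employee")
after_drop = table_names(conn)
```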
Data Manipulation Language (DML): DML is used for accessing and manipulating data in a
database.
To read records from table(s) – SELECT
To insert record(s) into the table(s) – INSERT
To update the data in table(s) – UPDATE
To delete records from the table(s) – DELETE (all records if no condition is given)
Data Control Language (DCL): DCL is used for granting and revoking user access on a database.
To grant access to a user – GRANT
To revoke access from a user – REVOKE
Transaction Control Language (TCL): TCL statements allow you to control and manage
transactions so that the integrity of the data is maintained across a group of SQL statements.
To open a transaction – BEGIN
To commit a transaction – COMMIT
To roll back a transaction in case of any error – ROLLBACK
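The three TCL statements can be seen working together in a money transfer. The sketch below uses Python with sqlite3; the account table, the balances and the transfer helper are assumptions for illustration. The connection is opened with isolation_level=None so that BEGIN, COMMIT and ROLLBACK are issued explicitly rather than by the driver.

```python
import sqlite3

# Autocommit mode: transactions are controlled only by the explicit statements below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("""
    CREATE TABLE account (
        acc_no  INTEGER PRIMARY KEY,
        balance INTEGER CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")

def transfer(connection, src, dst, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    connection.execute("BEGIN")
    try:
        connection.execute(
            "UPDATE account SET balance = balance + ? WHERE acc_no = ?", (amount, dst))
        connection.execute(
            "UPDATE account SET balance = balance - ? WHERE acc_no = ?", (amount, src))
        connection.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK constraint: undo the credit as well.
        connection.execute("ROLLBACK")
        return False

def balances(connection):
    return connection.execute(
        "SELECT acc_no, balance FROM account ORDER BY acc_no").fetchall()

ok = transfer(conn, 1, 2, 30)
after_commit = balances(conn)

failed = transfer(conn, 1, 2, 500)   # would overdraw account 1
after_rollback = balances(conn)
```

In the failed transfer the credit to account 2 has already been applied when the debit is rejected; ROLLBACK undoes it, so the data never shows money appearing from nowhere.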
Data Administrator and Database administrator:-
DA (Data Administrator) and DBA (Database Administrator) both are responsible for managing
database for an organization. They differ from each other in their required skills and
responsibilities.
DBA Responsibilities:
The following sections examine the responsibilities of the database administrator and how they
translate to various Microsoft SQL Server tasks.
Installing and Upgrading an SQL Server
The DBA is responsible for installing SQL Server or upgrading an existing SQL Server. In
the case of upgrading SQL Server, the DBA is responsible for ensuring that if the upgrade is
not successful, the SQL Server can be rolled back to an earlier release until the upgrade
issues can be resolved.
The DBA is also responsible for applying SQL Server service packs. A service pack is not a
true upgrade, but an installation of the current version of the software together with the bug
fixes and patches issued since the product's release.
Monitoring the Database Server's Health and Tuning Accordingly
Monitoring the health of the database server means making sure that the following is done:
The server is running with optimal performance.
The error log or event log is monitored for database errors.
Databases have routine maintenance performed on them, and the overall system has
periodic maintenance performed by the system administrator.
Data Models
“A data model is a wayfinding tool for both business and IT professionals, which uses a set
of symbols and text to precisely explain a subset of real information to improve communication
within the organization and thereby lead to a more flexible and stable application environment”.
A data model is an idea which describes how data can be represented in, and accessed from, a
software system after its complete implementation.
It is a simple abstraction of a complex real-world data gathering environment.
It defines data elements and the relationships among them for a specified system.
The main purpose of a data model is to give an idea of what the final system or software will
look like after development is completed.
Depending on the levels of data we are modeling, we have divided data models into 3 categories –
Object Based, Physical and Record based Data models.
ER model is a graphical representation of real world objects with their attributes and
relationship. It makes the system easily understandable. This model is considered as a top
down approach of designing a requirement.
Advantages
It makes the requirement simple and easy to understand by representing it in simple diagrams.
One can convert ER diagrams into a record based data model easily.
ER diagrams are easy to understand.
Disadvantages
No standard notation is available for ER diagrams. There is great flexibility in the notation, and it
all depends on the designer and how he draws it.
It is meant for high level designs. It cannot be refined into low level design such as coding.
What about all the employees above? They too have all the attributes that a person has. In
addition, they have EMPLOYEE_ID, EMPLOYEE_TYPE and DEPARTMENT_ID attributes
to identify them in the organization and their department. We have to retrieve their department
details, and hence we need the sp_getDeptDetails procedure. For now, say we need only these
attributes and functionality.
Since all employees inherit the attributes and functionalities of Person, we can re-use those
features in Employee. But how do we do that? We group the features of a person together into a
class. Hence a class has all the attributes and functionalities. For example, we would create a
Person class with name, address, age and phone as its attributes, and sp_getAddress and
sp_getPhone as its procedures. The values of these attributes at any instant of time form an
object, i.e. {John, Troy, 25, 2453545 : sp_getAddress (John), sp_getPhone (John)} forms one
person object. {Mathew, Fraser Town, 28, 5645677 : sp_getAddress (Mathew), sp_getPhone
(Mathew)} forms another person object.
Now, we will create another class called Employee which will inherit all the functionalities of the
Person class. In addition it will have the attributes EMPLOYEE_ID, EMPLOYEE_TYPE and
DEPARTMENT_ID, and the sp_getDeptDetails procedure. Different objects of the Employee class
are Engineer, Accountant, Manager and Clerk.
Here we can observe that the features of Person are available to another class only if that class
inherits from it. To any other class, Person is a black box. This feature of the model is called
encapsulation. It binds the features in one class and hides them from other classes; they are
visible only to its objects and to any inherited classes.
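The Person and Employee classes described above might look like this in Python. It is only a sketch: the methods get_address, get_phone and get_dept_details stand in for sp_getAddress, sp_getPhone and sp_getDeptDetails, and the employee id, type and department id values are invented for the example. Python marks hidden attributes by convention with a leading underscore rather than enforcing encapsulation.

```python
class Person:
    """Groups a person's attributes with the procedures that work on them."""

    def __init__(self, name, address, age, phone):
        # Leading underscores mark the attributes as internal to the class.
        self._name = name
        self._address = address
        self._age = age
        self._phone = phone

    def get_address(self):      # plays the role of sp_getAddress
        return self._address

    def get_phone(self):        # plays the role of sp_getPhone
        return self._phone


class Employee(Person):
    """Inherits everything from Person and adds employee-specific features."""

    def __init__(self, name, address, age, phone,
                 employee_id, employee_type, department_id):
        super().__init__(name, address, age, phone)
        self._employee_id = employee_id
        self._employee_type = employee_type
        self._department_id = department_id

    def get_dept_details(self):  # plays the role of sp_getDeptDetails
        return {"employee_type": self._employee_type,
                "department_id": self._department_id}


# Two objects: a plain person and an employee (ids and department are hypothetical).
john = Person("John", "Troy", 25, "2453545")
mathew = Employee("Mathew", "Fraser Town", 28, "5645677", 101, "Engineer", 10)
```

Here mathew.get_address() works even though Employee never defines it, because the method is inherited from Person.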
Advantages
Because of its inheritance property, we can re-use attributes and functionalities. It reduces the
cost of maintaining the same data multiple times. Also, this information is encapsulated, so
there is no fear of it being misused by other objects. If we need any new feature, we can easily
add a new class that inherits from the parent class and adds the new features. Hence it reduces
the overhead and maintenance costs.
Because of the above feature, it is more flexible when changes are needed.
Code is re-used because of inheritance.
Since each class binds its attributes and its functionality, it is the same as representing a real
world object. We can see each object as a real entity. Hence it is more understandable.
Disadvantages
It is not yet developed widely or completely enough for use in database systems. Hence it is not
widely accepted by users.
It is an approach for solving the requirement, not a technology. Hence it is hard to implement
in database management systems.
Physical based Data Models
A physical data model describes how data are stored in computer memory, how they are
distributed and ordered in memory, and how they are retrieved from it. Basically, a
physical data model represents the data at the data layer or internal layer. It represents
each table, its columns and specifications, and constraints such as primary key, foreign
key etc. It basically represents how each table is built and how the tables are related to
each other in the DB.
Above diagram shows how physical data model is designed. It is represented as UML
diagram along with table and its columns. Primary key is represented at the top. The
relationship between the tables is represented by interconnected arrows from table to table.
Above STUDENT table is related to CLASS and SUBJECT is related to CLASS. The above
diagram depicts CLASS as the parent table and it has 2 child tables – STUDENT and
SUBJECT.
In short we can say a physical data model has
o Tables and its specifications – table names and their columns. Columns are
represented along with their datatypes and size. In addition primary key of each
table is shown at the top of the column list.
o Foreign keys are used to represent the relationship between the tables. Mapping
between the tables are represented using arrows between them.
o Physical data model can have a denormalized structure based on the user requirement.
The tables might not be in normalized forms.
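The CLASS, STUDENT and SUBJECT tables mentioned above could be declared as in the sketch below (Python with sqlite3). The column names, datatypes and sizes are assumptions, since the original diagram is not reproduced here; the point is that the physical model states tables, columns with datatypes, primary keys and foreign keys explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when asked
conn.executescript("""
    -- Parent table (column names and sizes are illustrative).
    CREATE TABLE class (
        class_id   INTEGER PRIMARY KEY,
        class_name VARCHAR(30) NOT NULL
    );
    -- Child tables: each row points at its parent through a foreign key.
    CREATE TABLE student (
        student_id   INTEGER PRIMARY KEY,
        student_name VARCHAR(50) NOT NULL,
        class_id     INTEGER NOT NULL REFERENCES class (class_id)
    );
    CREATE TABLE subject (
        subject_id   INTEGER PRIMARY KEY,
        subject_name VARCHAR(50) NOT NULL,
        class_id     INTEGER NOT NULL REFERENCES class (class_id)
    );
""")

def foreign_keys(connection, table):
    """Return (child column, parent table, parent column) for each foreign key.

    PRAGMA foreign_key_list rows are
    (id, seq, table, from, to, on_update, on_delete, match).
    """
    return [(r[3], r[2], r[4])
            for r in connection.execute(f"PRAGMA foreign_key_list({table})")]

student_fk = foreign_keys(conn, "student")
subject_fk = foreign_keys(conn, "subject")

conn.execute("INSERT INTO class VALUES (1, 'Grade 10')")
conn.execute("INSERT INTO student VALUES (1, 'John', 1)")
try:
    conn.execute("INSERT INTO student VALUES (2, 'Rose', 99)")  # class 99 does not exist
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```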
The physical data model is dependent on the RDBMS, i.e. it varies based on the RDBMS used.
This means that data type notation varies depending on the RDBMS. For example, SQL
Server and Oracle have different datatypes. In addition, the representation of the
physical data model diagram may differ, though it contains the same information as
described above; some diagrams list primary keys and foreign keys separately at the end
of the column list. How the diagram looks depends on the designer who draws it
and on the RDBMS server. Below diagram shows different ways of representing a
table.
Hence object based data model is based on the real requirement from the user, whereas
record based data model is based on the actual relationships and data in DB. The Physical
data model is based on the table structure in the DB.
The hierarchical model can also be imagined as a root-like structure. It has only one main root,
which branches into sub-roots, each of which branches again. This type of structure is best
suited to 1:N relationships, e.g. one company has multiple departments (1:N), one
company has multiple suppliers (1:N), one department has multiple employees (1:N), each
department has multiple projects (1:N). If we have M:N relationships, then we have to duplicate
the entities and show them in the diagram. For example, if a project in the company involves
multiple departments, then our hierarchical representation changes as below:
Advantages
It helps to address the issues of flat file data storage. In flat files, data will be scattered and
there will not be any proper structuring of the data. This model groups the related data into
tables and defines the relationship between the tables, which is not addressed in flat files.
Disadvantages
Redundancy: - When data is stored in a flat file, the same data may be repeated multiple
times, and any change to the data has to be made in all those places in the flat file.
Missing the update in any one place results in incorrect data. This kind of redundancy is solved
by the hierarchical model to some extent: since records are grouped under related tables, the
flat file redundancy issue is addressed. But look at the many-to-many relationship example given
above. In such a case, we have to store the same project information for more than one
department. This is duplication of data and hence redundancy. So, this model does not reduce
redundancy to a significant level.
As we have seen above, it fails to handle many-to-many relationships efficiently. It results in
redundancy and confusion. It can handle only parent-child relationships.
If we need to fetch any data in this model, we have to start from the root of the model and
traverse through its children until we get the result. In order to perform the traversal, we must
either know the layout of the model well in advance or be a very good programmer.
Hence fetching data through this model is somewhat difficult.
Imagine the company has received details of a new project but has not yet assigned it to any
department. In this case, we cannot store the project information in the PROJECT table until the
company assigns it to some department. That means that, in order to enter any child information,
its parent information must already be known and entered.
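The hierarchical structure and two of its drawbacks, root-to-leaf traversal and duplicated children for M:N relationships, can be sketched with nested Python dictionaries. The department and project names are placeholders, and find_paths is a hypothetical helper written for this example.

```python
# One root (the company) branching into departments and projects.
# "Project X" involves two departments, so it must be stored twice.
company = {
    "name": "Company",
    "children": [
        {"name": "Department A", "children": [
            {"name": "Project X", "children": []},
        ]},
        {"name": "Department B", "children": [
            {"name": "Project X", "children": []},   # duplicated record
            {"name": "Project Y", "children": []},
        ]},
    ],
}

def find_paths(node, target, path=()):
    """Search from the root downwards and return every path that ends at target."""
    path = path + (node["name"],)
    found = [path] if node["name"] == target else []
    for child in node["children"]:
        found.extend(find_paths(child, target, path))
    return found

paths_to_x = find_paths(company, "Project X")
```

Every lookup starts at the root, and the search returns two paths for the same project, which is exactly the redundancy described above. A project with no department has no place in this structure at all.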
Observe the table structures above. They are very simple to understand, and there is no
redundant data. This addresses the major drawback of the earlier data models. This type of data
model is called the relational data model.
This model is based on the mathematical concepts of set theory. It treats each table as a
two-dimensional structure with rows and columns. It is not concerned with the physical storage
of the structure and data in memory. It considers only the data, how it can be represented in
the form of rows and columns, and how relationships to other tables can be established.
A relational data model revolves around 5 important rules.
o Order of rows / records in the table is not important. For example, displaying the records
for Joseph is independent of displaying the records for Rose or Mathew in the Employee
table. It does not change their meaning or level. Each record in the table is
independent of the others. Similarly, the order of columns in the table is not important. That
means the value in each column of a record is independent of the others. For example,
placing DEPT_ID at the end or at the beginning of the Employee table does not
have any effect.
o Each record in the table is unique. That is, no duplicate record exists in the table.
This is achieved by the use of a primary key or unique constraint.
o Each column/attribute has a single value in a row. For example, in the Department table,
the DEPT_NAME column cannot have ‘Accounting’ and ‘Quality’ together in a single cell.
Both have to be in two different rows as shown above.
o All values of an attribute should be from the same domain. That means each column should
have meaningful values. For example, the Age column cannot have dates in it. It should
contain only valid numbers that represent an individual’s age. Similarly, name columns should
have valid names, and date columns should have proper dates.
o Table names in the database should be unique. The same schema cannot
contain two or more tables with the same name. Two tables with different names can
have the same column names, but the same column name is not allowed twice in one table.
Examine the table structure below for Employee, Department and Project and see if it satisfies
the relational data model rules.
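A relational DBMS can enforce several of these rules itself. The sketch below (Python with sqlite3) declares Department and Employee tables and checks three of the rules; the column names other than DEPT_ID and DEPT_NAME, the ages and the id values are assumptions, since the original table figure is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL UNIQUE          -- one name per row, no repeats
    );
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,           -- makes every record unique
        emp_name TEXT NOT NULL,
        age      INTEGER CHECK (age BETWEEN 0 AND 150),  -- values from one domain
        dept_id  INTEGER REFERENCES department (dept_id)
    );
    INSERT INTO department VALUES (10, 'Accounting'), (20, 'Quality');
    INSERT INTO employee VALUES
        (1, 'Joseph', 30, 10), (2, 'Rose', 27, 20), (3, 'Mathew', 28, 10);
""")

# Rule: each record is unique. A second row with the same key is refused.
try:
    conn.execute("INSERT INTO employee VALUES (1, 'Joseph', 30, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Rule: table names are unique within a schema.
try:
    conn.execute("CREATE TABLE employee (x INTEGER)")
    second_table_rejected = False
except sqlite3.OperationalError:
    second_table_rejected = True

# Rule: row order carries no meaning, so a query states the order it wants.
staff = conn.execute("""
    SELECT e.emp_name, d.dept_name
    FROM employee AS e JOIN department AS d ON d.dept_id = e.dept_id
    ORDER BY e.emp_name
""").fetchall()
```

The join at the end shows how relationships between tables are established through shared column values (DEPT_ID) rather than through stored parent-child links.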
Advantages
Structural independence:- A change to the database structure does not affect the way we
access the data. For example, suppose Age is added to the Employee table. This changes neither
the relationships with the other tables nor the existing data. Hence the model provides
independence from its structure.
Simplicity:- This model is designed around the logical data. It does not consider how data are
stored physically in memory. Hence when the designer designs the database, he
concentrates on how he sees the data. This reduces the burden on the designer.
Because of its simplicity and data independence, this kind of data model is easy to maintain and
access.
This model supports the structured query language, SQL, which helps the user to retrieve and
modify the data in the database. Using SQL, the user can get any specific information from
the database.
Disadvantages
The disadvantages of this model are minor compared with the advantages above.
High hardware cost:- In order to separate the physical data information from the logical data,
more powerful system hardware, memory in particular, is required. This makes the database
more costly.
Sometimes the design is carried down to a very fine level of detail, which leads to complexity
in the database.
Comparison among the three Record Based Data Models