What is a Database?
To understand what a database is, we have to start from data, which is the basic building block of any DBMS.
Data: Facts, figures, statistics etc. that have no particular meaning on their own (e.g. 1, Sachin, 35).
Record: A collection of related data items. In the example above, the three data items had no meaning individually. But if we organize them in the following way, they collectively represent meaningful information.
Rollno   Name     Age
1        Sachin   35
Relation (Table): A collection of related records, for example:
Name     Age
Sachin   35
Saurav   32
Rahul    34
The columns of this relation are called Fields or Attributes (sometimes loosely Domains, although strictly a domain is the set of values an attribute may take). The rows are called Tuples or Records.
Database: Collection of related relations. Consider the following collection of tables:

Table-1
Rollno   Name     Age
1        Sachin   35
2        Saurav   32
3        Rahul    34

Table-2
Rollno   Address
1        Mumbai
2        Kolkatta
3        Bengaluru

Table-3
Rollno   Year
1        III
2        I
3        II

Table-4
Year     Hostel
I        H1
II       H2
III      H3

We now have a collection of four tables. They can be called a related collection because certain pairs of tables share common attributes: Table-1, Table-2 and Table-3 share Rollno, and Table-3 and Table-4 share Year. Because of these common attributes we can combine the data of two or more tables to find the complete details of a student. A question such as "Which hostel does the youngest student live in?" can now be answered, although the Age and Hostel attributes are in different tables.
In a database, data is organized strictly in row and column format. The rows are called Tuples or Records. The data items within one row may belong to different data types. The columns are called Attributes (and often, loosely, Domains). All the data items within a single attribute are of the same data type.
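The hostel question above can be answered by joining the tables on their common attributes. The following is a minimal sketch using Python's built-in sqlite3 module; the table names (student, address, enrolment, hostel) and the function name are assumptions made for illustration, since the text only calls the tables Table-1 to Table-4.

```python
import sqlite3

# In-memory database holding the four example tables.
# Table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student   (rollno INTEGER, name TEXT, age INTEGER);  -- Table-1
CREATE TABLE address   (rollno INTEGER, address TEXT);            -- Table-2
CREATE TABLE enrolment (rollno INTEGER, year TEXT);               -- Table-3
CREATE TABLE hostel    (year TEXT, hostel TEXT);                  -- Table-4
INSERT INTO student   VALUES (1, 'Sachin', 35), (2, 'Saurav', 32), (3, 'Rahul', 34);
INSERT INTO address   VALUES (1, 'Mumbai'), (2, 'Kolkatta'), (3, 'Bengaluru');
INSERT INTO enrolment VALUES (1, 'III'), (2, 'I'), (3, 'II');
INSERT INTO hostel    VALUES ('I', 'H1'), ('II', 'H2'), ('III', 'H3');
""")

def youngest_student_hostel(conn):
    """Join Table-1 -> Table-3 on rollno and Table-3 -> Table-4 on year."""
    return conn.execute("""
        SELECT s.name, h.hostel
        FROM student s
        JOIN enrolment e ON e.rollno = s.rollno
        JOIN hostel h    ON h.year = e.year
        ORDER BY s.age ASC
        LIMIT 1
    """).fetchone()

print(youngest_student_hostel(conn))  # ('Saurav', 'H1')
```

Saurav is the youngest (32), Table-3 places roll number 2 in year I, and Table-4 maps year I to hostel H1.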
Tuple / Row:
A single row in the table is called a tuple. Each row represents the data of a single entity.
Attribute / Column:
A column stores an attribute of the entity. For example, if details of students are stored then student name is an attribute; course is another attribute and so on.
Column Name:
Each column in the table is given a name. This name is used to refer to the values in the column.
Table Name:
Each table is given a name. This name is used to refer to the table, and it should describe the content of the table.
Primary Key:
A table contains the data of related entities. Take the STUDENTS table: it contains data related to students, with one row for each student. Each student's data in the table must be uniquely identifiable. To identify each entity uniquely, we use a column of the table. That column, which is used to uniquely identify entities (students) in the table, is called the primary key. In the case of the STUDENTS table (see figure 1) we can use ROLLNO as the primary key because it is never duplicated. More generally, a primary key can be defined as a set of one or more columns used to uniquely identify the rows of a table. Other examples of primary keys are the account number in a bank, the product code of a product, and the employee number of an employee.
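As a rough illustration of how a primary key enforces uniqueness, the sketch below declares ROLLNO as the primary key of a STUDENTS table in SQLite and tries to insert a duplicate roll number. The NAME column and the helper try_insert are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ROLLNO is declared as the primary key, so duplicates are rejected.
conn.execute("CREATE TABLE students (rollno INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO students VALUES (1, 'Sachin')")

def try_insert(conn, rollno, name):
    """Hypothetical helper: return True if the row was accepted, False if rejected."""
    try:
        conn.execute("INSERT INTO students VALUES (?, ?)", (rollno, name))
        return True
    except sqlite3.IntegrityError:
        # Raised when the primary key value already exists.
        return False

print(try_insert(conn, 2, "Saurav"))  # True: roll number 2 is new
print(try_insert(conn, 1, "Rahul"))   # False: roll number 1 is already taken
```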
Foreign Key:
In the relational model, we often store data in different tables and put them together to get complete information. For example, the PAYMENTS table holds only the ROLLNO of the student. To get the remaining information about the student we have to use the STUDENTS table. The relationship between the entities student and payment is one-to-many: one student may make many payments. Because the PAYMENTS table has a ROLLNO column, it can be joined with the STUDENTS table to get information about the parent entity (student). The ROLLNO column of the PAYMENTS table is called a foreign key because it is used to join the PAYMENTS table with the STUDENTS table. So, the foreign key is the key on the "many" side of the relationship.
The ROLLNO column of the PAYMENTS table must derive its values from the ROLLNO column of the STUDENTS table. A row in a child table that does not refer to a corresponding parent key is called an orphan record. We must not have orphan records, as they indicate a lack of data integrity.
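The sketch below shows one way the STUDENTS/PAYMENTS relationship could be declared in SQLite, so that payments can be joined to their student and an orphan payment is rejected. The columns other than ROLLNO (payment_id, amount, name) and both function names are assumptions. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE students (rollno INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE payments (
    payment_id INTEGER PRIMARY KEY,
    rollno     INTEGER NOT NULL REFERENCES students(rollno),  -- foreign key
    amount     INTEGER NOT NULL
);
INSERT INTO students VALUES (1, 'Sachin'), (2, 'Saurav');
-- One student, many payments (one-to-many).
INSERT INTO payments (rollno, amount) VALUES (1, 500), (1, 700), (2, 300);
""")

def payments_with_names(conn):
    """Join the child table to its parent through the foreign key."""
    return conn.execute("""
        SELECT s.name, p.amount
        FROM payments p
        JOIN students s ON s.rollno = p.rollno
        ORDER BY p.payment_id
    """).fetchall()

def orphan_rejected(conn):
    """Try to insert a payment for a roll number that has no student row."""
    try:
        conn.execute("INSERT INTO payments (rollno, amount) VALUES (99, 100)")
        return False
    except sqlite3.IntegrityError:
        return True

print(payments_with_names(conn))  # [('Sachin', 500), ('Sachin', 700), ('Saurav', 300)]
print(orphan_rejected(conn))      # True
```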
Integrity Rules:
Data integrity must be maintained at all times. Data that has lost its integrity can no longer be trusted and is effectively garbage, so every effort must be made to preserve it. The following are the main integrity rules to be followed.
Domain integrity:
Data is said to have domain integrity when every value of a column is drawn from that column's domain. A domain is the collection of potential values. For example, the column date of joining must hold a valid date. All valid dates form one domain. If the value of date of joining is an invalid date, it violates domain integrity.
Entity integrity:
This specifies that all values of a primary key must be non-null and unique, so that each entity stored in the table can be uniquely identified. Every table must therefore have a primary key.
Referential Integrity:
This specifies that a foreign key must either be null or have a value that is derived from the corresponding parent key. For example, if we have a table called BATCHES, then the ROLLNO column of that table references the ROLLNO column of the STUDENTS table. All the values of the ROLLNO column of the BATCHES table must be derived from the ROLLNO column of the STUDENTS table, because a student who is not in the STUDENTS table cannot join a batch.
Atomicity: The atomicity property states that a transaction is atomic. An atomic transaction is either fully completed or not begun at all. Any updates that a transaction makes to a system are completed in their entirety. If for any reason an error occurs and the transaction is unable to complete all of its steps, then the system is returned to the state it was in before the transaction started. An example of an atomic transaction is an account transfer. The money is removed from account A and then placed into account B. If the system fails after removing the money from account A, the transaction processing system puts the money back into account A, thus returning the system to its original state.
Consistency: The consistency property ensures that any transaction will bring the database from one valid state to another. Any data written to the database must be valid according to all defined rules. The transaction should leave the database in a consistent state, whether or not it completed successfully.
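As a small sketch of consistency, assume a rule that an account balance may never be negative, expressed as a CHECK constraint in SQLite. The ACCOUNTS table, the starting balances, and the transfer and balances helpers are assumptions for illustration. A transfer that would break the rule fails as a whole, and the database stays in its previous valid state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        name    TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- the defined rule
    )
""")
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 20)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; return False if a rule would be violated."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

print(transfer(conn, "A", "B", 50), balances(conn))   # True {'A': 50, 'B': 70}
print(transfer(conn, "A", "B", 500), balances(conn))  # False {'A': 50, 'B': 70}
```

In the second call the credit to B is applied first and then undone by the rollback, so the total across both accounts is the same before and after.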
Isolation: Data modifications made by one transaction must be isolated from the data modifications made by all other transactions. A transaction sees data either in the state it was in before another concurrent transaction modified it, or in the state after that other transaction has completed, but it never sees an intermediate state.
Durability: It means that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors. In a relational database, for instance, once a group of SQL statements has executed and committed, the results need to be stored permanently. If the database crashes immediately thereafter, it should be possible to restore the database to the state after the last committed transaction.
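Durability can only be approximated in a short example, since a real test would involve a crash or power loss. The sketch below, assuming SQLite and a temporary file named bank.db (both illustrative), commits one change, leaves a second change uncommitted, closes the connection, and reopens the file. Only the committed state is found again.

```python
import os
import sqlite3
import tempfile

def balance_after_reopen():
    """Commit one write, abandon another, then reopen the database file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bank.db")  # hypothetical file name

        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
        conn.execute("INSERT INTO accounts VALUES ('A', 100)")
        conn.commit()                        # this state must persist

        conn.execute("UPDATE accounts SET balance = 0 WHERE name = 'A'")
        conn.close()                         # closed without commit: update is lost

        conn = sqlite3.connect(path)         # stands in for a restart
        balance = conn.execute(
            "SELECT balance FROM accounts WHERE name = 'A'").fetchone()[0]
        conn.close()
        return balance

print(balance_after_reopen())  # 100
```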
Example:
Suppose a transaction has to transfer Rs. 50 from account A to account B. It will be executed in the following sequential steps:
1. Read A
2. A = A - 50
3. Write A
4. Read B
5. B = B + 50
6. Write B
Atomicity Requirement: If the transaction fails after step 3 and before step 6, the system should ensure that its updates are not reflected in the database, else an inconsistency will result.
Durability Requirement: Once the user has been notified that the transaction has been completed (i.e. the transfer of Rs. 50 has taken place), the updates made to the database by the transaction must persist despite failures.
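The six steps of the transfer and the atomicity requirement can be sketched as follows. This is an illustration using SQLite, with an assumed ACCOUNTS table, assumed starting balances of Rs. 100 each, and a simulated failure raised as a Python exception after step 3; the helper names read, write and transfer_50 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 100)")
conn.commit()

def read(conn, name):
    return conn.execute(
        "SELECT balance FROM accounts WHERE name = ?", (name,)).fetchone()[0]

def write(conn, name, value):
    conn.execute("UPDATE accounts SET balance = ? WHERE name = ?", (value, name))

def transfer_50(conn, fail_after_step_3=False):
    """Run the six steps as one transaction; roll everything back on failure."""
    try:
        with conn:                  # commit on success, rollback on exception
            a = read(conn, "A")     # 1. Read A
            a = a - 50              # 2. A = A - 50
            write(conn, "A", a)     # 3. Write A
            if fail_after_step_3:
                raise RuntimeError("simulated failure between steps 3 and 4")
            b = read(conn, "B")     # 4. Read B
            b = b + 50              # 5. B = B + 50
            write(conn, "B", b)     # 6. Write B
        return True
    except RuntimeError:
        return False

print(transfer_50(conn, fail_after_step_3=True), read(conn, "A"), read(conn, "B"))
# False 100 100  (the write to A was undone)
print(transfer_50(conn), read(conn, "A"), read(conn, "B"))
# True 50 150
```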
DDL (Data Definition Language): DDL commands are used to define, modify and delete the structure of the database and its objects. Examples: CREATE, ALTER, DROP etc.
DML (Data Manipulation Language): DML commands are used to insert, update and delete data in an existing table. Examples: INSERT, UPDATE, DELETE.
DCL (Data Control Language): DCL commands are used to grant privileges to, and revoke privileges from, users. Examples: GRANT, REVOKE.
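The DDL and DML commands listed above can be tried out with a short script. The sketch below uses SQLite through Python's sqlite3 module with an assumed STUDENTS table. DCL is not shown because SQLite does not implement GRANT and REVOKE; those commands belong to multi-user servers such as PostgreSQL or MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and modify structure.
conn.execute("CREATE TABLE students (rollno INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE students ADD COLUMN age INTEGER")

# DML: insert, update and delete data in the existing table.
conn.execute("INSERT INTO students VALUES (1, 'Sachin', 35)")
conn.execute("INSERT INTO students VALUES (2, 'Saurav', 32)")
conn.execute("UPDATE students SET age = 33 WHERE rollno = 2")
conn.execute("DELETE FROM students WHERE rollno = 1")
conn.commit()

rows = conn.execute("SELECT rollno, name, age FROM students").fetchall()
print(rows)    # [(2, 'Saurav', 33)]

# DDL again: delete the structure itself.
conn.execute("DROP TABLE students")
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)  # []
```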
1. Redundancy: Redundancy is the problem of storing the same data item in more than one place. Redundancy creates several problems, such as requiring extra storage space, entering the same data more than once during insertion, and deleting data from more than one place during deletion.
2. Sharing of Data: In paper-based record keeping, data cannot easily be shared among many users. In a computerized DBMS, many users can share the same database if they are connected via a network.
3. Data Integrity: We can maintain data integrity by specifying integrity constraints, which are rules and restrictions about what kind of data may be entered or manipulated within the database. This increases the reliability of the database, because data that violates the constraints cannot be stored in it at any point of time.
4. Data Security: We can restrict certain people from accessing the database, or allow them to see only certain portions of the database while blocking sensitive information. This is not easily possible in paper-based record keeping.
5. Data Independence: Data independence means that "the application is independent of the storage structure and access strategy of data". In other words, it is the ability to modify the schema definition at one level without affecting the schema definition at the next higher level. There are two types of Data Independence:
Physical Data Independence: Modification at the physical level should not affect the logical level.
Logical Data Independence: Modification at the logical level should not affect the view level.
6. Data Abstraction: It means hiding implementation details (i.e. low-level details) from the end user. For example, when data is stored in a database, the user can only access the database, while implementation details such as how the data is stored physically on the disk are hidden from the user.
Physical level: The lowest level of abstraction, which describes how the data are stored.
Logical level: The next higher level of abstraction, which describes what data are stored in the database and what relationships exist among those data.
View level: The highest level of abstraction, which describes only part of the entire database.
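The view level can be illustrated with an SQL view, which exposes only part of the logical schema. In the sketch below (SQLite, with an assumed STUDENTS table and a hypothetical view named student_directory), the view shows roll numbers and names but hides the age and address columns. How SQLite lays the rows out on disk, the physical level, is not visible to either query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Logical level: the full table definition.
CREATE TABLE students (rollno INTEGER PRIMARY KEY, name TEXT, age INTEGER, address TEXT);
INSERT INTO students VALUES
    (1, 'Sachin', 35, 'Mumbai'),
    (2, 'Saurav', 32, 'Kolkatta'),
    (3, 'Rahul',  34, 'Bengaluru');
-- View level: only part of the database is exposed.
CREATE VIEW student_directory AS SELECT rollno, name FROM students;
""")

cur = conn.execute("SELECT * FROM student_directory ORDER BY rollno")
columns = [d[0] for d in cur.description]  # column names visible through the view
rows = cur.fetchall()

print(columns)  # ['rollno', 'name']
print(rows)     # [(1, 'Sachin'), (2, 'Saurav'), (3, 'Rahul')]
```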