Database
Instructor Seble
M.T
2006
WWW.AAU.EDU.ET
Fundamentals of Database
CHAPTER ONE
Database System
Database systems are designed to manage large data sets in an organization. Data management involves both the definition and the manipulation of data, ranging from simple representation of the data to considerations of structures for the storage of information. Data management also considers the provision of mechanisms for the manipulation of information.
Today, databases are essential to every business. They are used to maintain internal records, to present data to customers and clients on the World Wide Web, and to support many other commercial processes. Databases are likewise found at the core of many modern organizations.
The power of databases comes from a body of knowledge and technology that has developed over several decades and is embodied in specialized software called a database management system, or DBMS. A DBMS is a powerful tool for creating and managing large amounts of data efficiently and allowing it to persist safely over long periods of time. These systems are among the most complex types of software available.
So, what is a database? In essence, a database is nothing more than a collection of shared information that exists over a long period of time, often many years. In common usage, the term database refers to a collection of data that is managed by a DBMS.
1. Manual Approach
In the manual approach, data storage and retrieval follow the primitive and traditional way of information handling, where cards and paper are used for the purpose. Files are kept for as many events and objects as the organization has. Each file, containing various kinds of information, is labeled and stored in one or more cabinets. The cabinets can be kept in safe places for security purposes, based on the sensitivity of the information they contain. Insertion and retrieval are done by searching first for the right cabinet, then for the right file, and then for the information. One could have an indexing system to facilitate access to the data.
Limitations of the manual approach:
o Prone to error
o Difficult to update, retrieve, integrate
o You have the data but it is difficult to compile the information
o Limited to small size information
o Cross referencing is difficult
An alternative approach to data handling is a computerized way of dealing with the information. The computerized approach can be either decentralized or centralized, based on where the data resides in the system.
2. Traditional File Based Approach
In this approach, computer applications with file based processing are used for the purpose of data handling. Even though the approach evolved over time, the basic structure is still similar if not identical. File based systems were an early attempt to computerize the manual filing system. This approach is the decentralized computerized data handling method. A collection of application programs performs services for the end users. In such systems, every application program that provides a service to end users defines and manages its own data. Such systems have a number of programs for each of the different applications in the organization. Since every application defines and manages its own data, the system is subject to a serious data duplication problem. A file, in the traditional file based approach, is a collection of records which contain logically related data.
Limitations of the Traditional File Based Approach
As business applications became more complex and demanded more flexible and reliable data handling methods, the shortcomings of the file based system became evident. These shortcomings include, but are not limited to:
o Separation or isolation of data: information available in one application may not be known to others.
o Limited data sharing
o Lengthy development and maintenance time
o Duplication or redundancy of data
o Data dependency on the application
o Incompatible file formats between different applications and programs, creating inconsistency
o Fixed query processing, which is defined during application development
The limitations of the traditional file based data handling approach arise from two basic reasons:
1. The definition of the data is embedded in the application program, which makes it difficult to modify the database definition easily.
2. There is no control over the access and manipulation of the data beyond that imposed by the application programs.
The most significant problem experienced by the traditional file based approach of data handling is the update anomaly. There are three types of update anomalies:
1. Modification Anomalies: a problem experienced when one or more data values are modified in one application program but not in others containing the same data set.
2. Deletion Anomalies: a problem encountered when a record set is deleted from one application but remains untouched in other application programs.
3. Insertion Anomalies: a problem encountered when one cannot decide whether the data to be inserted is valid and consistent with other similar data sets.
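A modification anomaly can be illustrated with a small sketch. It assumes two hypothetical application programs, payroll and personnel, each managing its own copy of the same employee record; the names and values are invented for illustration only.

```python
# Two hypothetical applications, each managing its own copy of the data,
# as in the traditional file based approach.
payroll_file = {"E01": {"name": "Abebe", "address": "Arat Kilo"}}
personnel_file = {"E01": {"name": "Abebe", "address": "Arat Kilo"}}

# The address is modified through the payroll application only.
payroll_file["E01"]["address"] = "Sidist Kilo"

# The personnel application still holds the old value, so the two copies
# of the same record now disagree: a modification anomaly.
inconsistent = [
    emp_id
    for emp_id in payroll_file
    if payroll_file[emp_id] != personnel_file.get(emp_id)
]
print(inconsistent)
```

Nothing in either program detects the disagreement by itself; the check at the end has to be written separately, which is the kind of control the file based approach lacks.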
3. Database Approach
Following a famous paper written by Ted Codd in 1970, database systems changed significantly. Codd proposed that database systems should present the user with a view of data organized as tables called relations. Behind the scenes, there might be a complex data structure that allowed rapid response to a variety of queries. But, unlike the user of earlier database systems, the user of a relational system would not be concerned with the storage structure. Queries could be expressed in a very high-level language, which greatly increased the efficiency of database programmers.
The database approach emphasizes the integration and sharing of data throughout the organization. Thus, in the database approach:
o A database is a computerized record keeping system, or a kind of electronic filing cabinet.
o A database is a repository for a collection of computerized data files.
o A database is a shared collection of logically related data designed to meet the information needs of an organization. Since it is a shared corporate resource, the database is integrated with a minimum amount of, or no, duplication.
o A database is a collection of logically related data, where the logically related data comprise the entities, attributes, relationships, and business rules of an organization's information.
o In addition to containing the data required by an organization, a database also contains a description of the data, which is called metadata, the data dictionary, the system catalogue, or data about data.
o Since a database contains information about the data (metadata), it is called a self-descriptive collection of integrated records.
o The purpose of a database is to store information and to allow users to retrieve and update that information on demand.
o A database is designed once and used simultaneously by many users.
o Unlike the traditional file based approach, the database approach provides program-data independence, that is, the separation of the data definition from the application. Thus the application is not affected by changes made in the data structure and file organization.
o Each database application will perform some combination of creating the database and reading, updating, and deleting data.
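The create, read, update, and delete operations can be sketched with Python's built-in sqlite3 module. This is a minimal illustration only; the student table and its contents are hypothetical and are not taken from these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Create: define a table (hypothetical) and insert a row.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, major TEXT)")
conn.execute("INSERT INTO student (id, name, major) VALUES (1, 'Abebe', 'CS')")

# Read: retrieve the stored row.
row = conn.execute("SELECT name, major FROM student WHERE id = 1").fetchone()

# Update: change one attribute of the row.
conn.execute("UPDATE student SET major = 'IS' WHERE id = 1")
updated = conn.execute("SELECT major FROM student WHERE id = 1").fetchone()[0]

# Delete: remove the row again.
conn.execute("DELETE FROM student WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(row, updated, remaining)
conn.close()
```

The application only names the table and its columns; how the rows are laid out on disk is left to the DBMS, which is the program-data independence described above.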
AAU, Computer Science Department, 2013
updating, storing, retrieving data in a database. A DBMS also provides the services of controlling data access, enforcing data integrity, managing concurrency control, and recovery. With this in mind, a full scale DBMS should provide at least the following services to the user:
1. Data storage, retrieval and update in the database
2. A user accessible catalogue
3. Transaction support service: ALL or NONE transactions, which minimize data inconsistency.
4. Concurrency control services: simultaneous access and update on the database by different users should be implemented correctly.
5. Recovery services: a mechanism for recovering the database after a failure must be available.
6. Authorization services (security): must support the implementation of access and authorization services for the database administrator and users.
7. Support for data communication: should provide the facility to integrate with data transfer software or data communication managers.
8. Integrity services: rules about data and the changes that take place on the data, the correctness and consistency of stored data, and the quality of data based on business constraints.
9. Services to promote data independence between the data and the application
10. Utility services: sets of utility service facilities such as
o Importing data
o Statistical analysis support
o Index reorganization
o Garbage collection
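The ALL or NONE behaviour of transaction support (service 3) can be sketched with Python's built-in sqlite3 module. The account table, the transfer function, and the balances are hypothetical; the point is only that a failed step undoes the steps before it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return conn.execute("SELECT balance FROM account ORDER BY id").fetchall()


failed = transfer(conn, 1, 2, 500)   # debit would violate the CHECK constraint
after_failed = balances(conn)        # the credit to account 2 was rolled back
succeeded = transfer(conn, 1, 2, 30)
after_success = balances(conn)
print(failed, after_failed, succeeded, after_success)
```

In the failed transfer the credit is executed first and only the debit violates the constraint, yet neither change survives, so the database never shows money that appeared from nowhere.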
2. Software: ... operating systems, network software, and other relevant software.
3. Data: This is the most important component to the user of the database. There are two types of data in a database approach: operational data and metadata. The structure of the data in the database is called the schema, which is composed of the entities, the properties of entities, and the relationships between entities.
4. Procedure: these are the rules and regulations on how to design and use a database. They include procedures such as how to log on to the DBMS, how to use facilities, how to start and stop a transaction, how to make a backup, how to treat hardware and software failures, and how to change the structure of the database.
5.
1. Planning: ... database solution to solve the problem.
2. Analysis: ... that concentrates more on fact finding about the problem or the opportunity. Feasibility analysis, requirement determination and structuring, and selection of the best design method are also performed at this phase.
3. Design: ... further divided into three sub-phases.
a. Conceptual Design: a concise description of the data, data types, relationships between data, and constraints on the data. There is no implementation or physical detail consideration. It is used to elicit and structure all information requirements.
b. Logical Design: a higher level conceptual abstraction with a selected specific data model to implement the data structure. It is independent of any particular DBMS and involves no other physical considerations.
c. Physical Design: the physical implementation of the upper level design of the database with respect to the internal storage and file structure of the database for the selected DBMS. It develops all technology and organizational specifications.
4. Implementation: the testing and deployment of the designed database for use.
5.
to oversee, control and manage the database resources (the database itself,
for determining and acquiring hardware and software resources for problems like poor security, poor performance of the system
This role can be further classified in big organizations that have huge amounts of data and user requirements.
1. Data Administrator (DA): is responsible for the management of data resources. Is involved in database planning, development, and the maintenance of standards, policies, and procedures at the conceptual and logical design phases.
2. Database Administrator (DBA): is a more technically oriented role. Is responsible for the physical realization of the database. Is involved in physical design, implementation, and security and integrity control of the database.
There are two kinds of database designers: one involved in the conceptual and logical design, and another involved in the physical design.
2. Physical DBD
o Takes the logical design specification as input and decides how it should be physically realized.
o Maps the logical data model onto the specified DBMS with respect to tables and integrity constraints (DBMS dependent design).
o Selects specific storage structures and access paths to the database.
o Designs the security measures required on the database.
4. End Users
Workers whose jobs require accessing the database frequently for various purposes. There are different groups of users in this category.
1. Naive Users:
o Sizable proportion of users
o Unaware of the DBMS
o Only access the database based on their access level and demand
o Use standard and pre-specified types of queries.
2. Sophisticated Users
o Are users familiar with the structure of the database and the facilities of the DBMS.
o Have complex requirements
o Have higher level queries
o Are most of the time engineers, scientists, business analysts, etc.
3. Casual Users
o Users who access the database occasionally.
o Need different information from the database each time.
o Use sophisticated database queries to satisfy their needs.
o Are most of the time middle to high level managers.
These users can be again classified as Actors on the Scene and Workers Behind the Scene.
CHAPTER TWO
the database. An attribute represents some property of interest that further describes an entity, such as the employee's name or salary. A relationship among two or more entities represents an association among them, for example, a works-on relationship between an employee and a project.
Representational or implementation data models are the models used most frequently in traditional commercial DBMSs. These include the widely used relational data model, as well as the so-called legacy data models (the network and hierarchical models) that have been widely used in the past. Representational data models represent data by using record structures and hence are sometimes called record-based data models. We can regard object data models as a new family of higher-level implementation data models that are closer to conceptual data models.
FIGURE 1.2
specified in the schema. Hence, specifying a correct schema to the DBMS is extremely important, and the schema must be designed with the utmost care. The DBMS stores the descriptions of the schema constructs and constraints, also called the meta-data, in the DBMS catalog so that DBMS software can refer to the schema whenever it needs to. The schema is sometimes called the intension, and a database state an extension of the schema.
Although, as mentioned earlier, the schema is not supposed to change frequently, it is not uncommon that changes need to be occasionally applied to the schema as the application requirements change. For example, we may decide that another data item needs to be stored for each record in a file, such as adding the DateOfBirth to the STUDENT schema in Figure 2.1. This is known as schema evolution. Most modern DBMSs include some operations for schema evolution that can be applied while the database is operational.
STUDENT: Name, StudentNumber, Class, Major
COURSE: CourseName, CourseNumber, CreditHours, Department
SECTION: SectionIdentifier, CourseNumber, Semester, Year, Instructor
FIGURE 2.1 Schema diagram for the database in Figure 1.2.
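The DateOfBirth example of schema evolution can be sketched with Python's built-in sqlite3 module. The column types and the sample row are assumptions made for illustration; only the STUDENT attribute names come from Figure 2.1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# STUDENT schema as in Figure 2.1 (column types are assumed).
conn.execute(
    "CREATE TABLE STUDENT ("
    "Name TEXT, StudentNumber INTEGER PRIMARY KEY, Class INTEGER, Major TEXT)"
)
conn.execute("INSERT INTO STUDENT VALUES ('Smith', 17, 1, 'CS')")  # sample row

# Schema evolution: add a data item while the database holds data.
conn.execute("ALTER TABLE STUDENT ADD COLUMN DateOfBirth TEXT")

# The catalog now describes the new column; existing rows get NULL for it.
columns = [info[1] for info in conn.execute("PRAGMA table_info(STUDENT)")]
row = conn.execute("SELECT Name, DateOfBirth FROM STUDENT").fetchone()
print(columns, row)
```

The existing row is kept and simply has no value yet for the new attribute, so queries written against the old schema continue to work.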
Internal Level
The internal level has an internal schema, which describes the physical storage structure of the database. The internal schema uses a physical data model and describes the complete details of data storage and access paths for the database.
Conceptual Level
The conceptual level has a conceptual schema, which describes the structure of the whole database for a community of users. The conceptual schema hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations, and constraints. Usually, a representational data model is used to describe the conceptual schema when a database system is implemented. This implementation conceptual schema is often based on a conceptual schema design in a high-level data model.
External Level
The external or view level includes a number of external schemas or user views. Each external schema describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group. As in the previous case, each external schema is typically implemented using a representational data model, possibly based on an external schema design in a high level data model.
The three-schema architecture is a convenient tool with which the user can visualize the schema levels in a database system. Most DBMSs do not separate the three levels completely, but support the three-schema architecture to some extent. Some DBMSs may include physical-level details in the conceptual schema. In most DBMSs that support user views, external schemas are specified in the same data model that describes the conceptual-level information. Some DBMSs allow different data models to be used at the conceptual and external levels.
Notice that the three schemas are only descriptions of data; the only data that actually exists is at the physical level. In a DBMS based on the three-schema architecture, each user group refers only to its own external schema. Hence, the DBMS must transform a request specified on an external schema into a request against the conceptual schema, and then into a request on the internal schema for processing over the stored database. If the request is a database retrieval, the data extracted from the stored database must be reformatted to match the user's external view. The processes of transforming requests and results between levels are called mappings. These mappings may be time-consuming, so some DBMSs, especially those that are meant to support small databases, do not support external views. Even in such systems, however, a certain amount of mapping is necessary to transform requests between the conceptual and internal levels.
Internal Level
struct STAFF {
    int Staff_No;
    int Branch_No;
    char FName[15];
    char LName[15];
    struct Date DOB;    /* date of birth; assumes a struct Date defined elsewhere */
    float salary;
};
Fig. Differences between Three Levels of ANSI-SPARC Architecture
The ANSI-SPARC architecture defines DBMS schemas at three levels:
Internal schema
o at the internal level, to describe physical storage structures and access paths. Typically uses a physical data model.
Conceptual schema
o at the conceptual level, to describe the structure and constraints for the whole database for a community of users. Uses a conceptual or an implementation data model.
External schemas
o at the external level, to describe the various user views. Usually uses the same data model as the conceptual level.
Data Independence
The three-schema architecture can be used to further explain the concept of data independence, which can be defined as the capacity to change the schema at one level of a database system without having to change the schema at the next higher level. We can define two types of data independence:
change the conceptual schema. For example, providing an access path to improve retrieval speed of SECTION records (Figure 1.2) by Semester and Year should not require a query such as "list all sections offered in fall 1998" to be changed, although the query would be executed more efficiently by the DBMS by utilizing the new access path.
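The access-path example can be sketched with Python's built-in sqlite3 module. The SECTION columns follow Figure 2.1; the column types, the rows, and the index name are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SECTION (SectionIdentifier INTEGER PRIMARY KEY, "
    "CourseNumber TEXT, Semester TEXT, Year INTEGER, Instructor TEXT)"
)
# Illustrative rows only.
conn.executemany(
    "INSERT INTO SECTION VALUES (?, ?, ?, ?, ?)",
    [
        (85, "MATH2410", "Fall", 1998, "King"),
        (92, "CS1310", "Fall", 1998, "Anderson"),
        (102, "CS3320", "Spring", 1999, "Knuth"),
    ],
)

# "List all sections offered in fall 1998", written once.
query = (
    "SELECT SectionIdentifier FROM SECTION "
    "WHERE Semester = 'Fall' AND Year = 1998 ORDER BY SectionIdentifier"
)
before = conn.execute(query).fetchall()

# Change at the internal level only: add an access path on Semester and Year.
conn.execute("CREATE INDEX idx_section_sem_year ON SECTION (Semester, Year)")

# The same query text still runs and returns the same answer.
after = conn.execute(query).fetchall()
print(before, after)
```

Whether the DBMS actually uses the new index is its own decision; the point of the sketch is that the query did not have to be rewritten.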
1. Entities/Relation/Table
The basic object that the ER model represents is an entity, which is a "thing" in the real world with an independent existence. An entity may be an object with a physical existence (for example, a particular person, car, house, or employee) or it may be an object with a conceptual existence (for example, a company, a job, or a university course). Each entity has attributes, the particular properties that describe it. For example, an employee entity may be described by the employee's name, age, address, salary, and job. A particular entity will have a value for each of its attributes.
NB: The name given to an entity should always be a singular noun descriptive of each item to be stored in it. E.g.: student NOT students.
o Every relation has a schema, which describes the columns, or fields.
o The relation itself corresponds to our familiar notion of a table.
o A relation is a collection of tuples, each of which contains values for a fixed number of attributes.
2. Attributes/Fields/Columns
Attributes are pieces of information ABOUT entities. The analysis must of course identify those which are actually relevant to the proposed application. At this level we need to know such things as:
o The attribute name (should be explanatory words or phrases)
o The domain from which attribute values are taken. A DOMAIN is a set of values from which attribute values may be taken. Each attribute has values taken from a domain. For example, the domain of Name is string and that of salary is real.
o Whether the attribute is part of the entity identifier (attributes which just describe an entity and those which help to identify it uniquely)
o Whether it is permanent or time-varying (which attributes may change their values over time)
o Whether it is required or optional for the entity (whose values will sometimes be unknown or irrelevant)
Several types of attributes occur in the ER model: simple versus composite, single-valued versus multi-valued, and stored versus derived.
attribute NumberOfEmployees of a department entity can be derived by counting the number of employees related to (working for) that department.
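As a small sketch of a derived attribute, the count below is computed from stored data rather than stored itself. The employee records and the function name number_of_employees are hypothetical.

```python
from collections import Counter

# Hypothetical stored data: each employee records the department worked for.
employees = [
    {"name": "Abebe", "department": "Research"},
    {"name": "Kebede", "department": "Research"},
    {"name": "Sara", "department": "Administration"},
]


def number_of_employees(employees):
    """Derive NumberOfEmployees per department by counting related employees."""
    return Counter(e["department"] for e in employees)


counts = number_of_employees(employees)
print(counts["Research"], counts["Administration"])
```

Because the value is recomputed on demand, it cannot drift out of step with the stored employee data.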
Null Values
In some cases a particular entity may not have an applicable value for an attribute. For example, the ApartmentNumber attribute of an address applies only to addresses that are in apartment buildings and not to other types of residences, such as single-family homes. Similarly, a CollegeDegrees attribute applies only to persons with college degrees. For such situations, a special value called null is created. An address of a single-family home would have null for its ApartmentNumber attribute, and a person with no college degree would have null for CollegeDegrees.
NB:
o NULL applies to attributes which are not applicable or which do not have values.
o You may enter the value NA (meaning not applicable).
o The value of a key attribute cannot be null.
o Default value: the value assumed if no explicit value is given.
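These rules can be sketched with Python's built-in sqlite3 module. The address table, its columns, and the default value are hypothetical. Note that SQLite needs an explicit NOT NULL on a non-integer primary key to reject nulls, which is why it is spelled out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE address (
        address_id       TEXT PRIMARY KEY NOT NULL,  -- key attribute: never null
        street           TEXT NOT NULL,              -- required attribute
        apartment_number TEXT,                       -- null when not applicable
        country          TEXT DEFAULT 'Ethiopia'     -- assumed when no value is given
    )
    """
)

# A single-family home: no apartment number and no explicit country.
conn.execute("INSERT INTO address (address_id, street) VALUES ('A1', 'Main Road')")
row = conn.execute("SELECT apartment_number, country FROM address").fetchone()

# A null key value is rejected by the DBMS.
try:
    conn.execute("INSERT INTO address (address_id, street) VALUES (NULL, 'Side Road')")
    key_null_rejected = False
except sqlite3.IntegrityError:
    key_null_rejected = True

print(row, key_null_rejected)
```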
3. Relationships
A relationship type R among n entity types E1, E2, ..., En defines a set of associations among entities. For example, consider a relationship type WORKS_FOR between two entity types EMPLOYEE and DEPARTMENT, which associates each employee with the department for which the employee works.
Degree of a Relationship
The degree of a relationship type is the number of participating entity types. Hence, the WORKS_FOR relationship is of degree two. A relationship type of degree two is called binary, and one of degree three is called ternary. Relationships can generally be of any degree, but the most common ones are binary relationships. Higher-degree relationships are ...
For example, suppose that we want to capture which employees use which skills on which project. We might try to represent this data in a database as three binary relationships between skills and project, project and employee, and employee and skill.
Fig. A ternary relationship
Works-on
o Abebe and Kebede have worked on projects A and B.
Used-on
o Abebe has used programming skills on project x.
Have skill
o An employee has a certain skill. This is different from used-on because there are some skills that an employee has that he or she may not have used on a particular project.
Needed
o A project needs a particular skill. This is different from used-on because there may be some skills for which employees have not been assigned to the project yet.
Manages
o An employee manages a project. This is a completely different dimension from skill, so it could not be captured by used-on.
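Why three binary relationships cannot always replace one ternary relationship can be shown with a small sketch. The facts below are hypothetical; the three binary sets are obtained by projecting the ternary set onto each pair of entity types.

```python
# Hypothetical ternary facts: (employee, skill, project).
used_on = {
    ("Abebe", "Programming", "A"),
    ("Abebe", "Design", "B"),
    ("Kebede", "Programming", "B"),
}

# The three binary relationships: employee-skill, skill-project, employee-project.
emp_skill = {(e, s) for e, s, p in used_on}
skill_project = {(s, p) for e, s, p in used_on}
emp_project = {(e, p) for e, s, p in used_on}

# Try to rebuild the ternary facts from the binary relationships alone.
rebuilt = {
    (e, s, p)
    for e, s in emp_skill
    for s2, p in skill_project
    if s == s2 and (e, p) in emp_project
}

# Facts that the binary relationships imply but that were never recorded.
spurious = rebuilt - used_on
print(spurious)
```

Abebe has the programming skill, programming is used on project B, and Abebe works on project B, yet it was Kebede who programmed on B. The binary relationships cannot tell these cases apart, so the ternary relationship carries information they lose.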
In some cases the same entity participates more than once in a relationship with the same entity in different roles. In such cases the role name is essential for distinguishing the meaning of each participation. Such relationship types are called recursive relationships. For example, the SUPERVISION relationship type relates an employee to a supervisor, where both employee and supervisor entities are members of the same EMPLOYEE entity.
ONE-TO-MANY: one tuple can be associated with many other tuples, but not the reverse.
E.g. 1: Department-Student: one department can have multiple students.
E.g. 2: Employee-Department: one employee can work in multiple departments.
MANY-TO-ONE: many tuples are associated with one tuple, but not the reverse.
E.g. Employee-Department: many employees belong to a single department.
MANY-TO-MANY: one tuple is associated with many other tuples and, from the other side, with a different role name, one tuple will be associated with many tuples.
E.g. Student-Course: a student can take many courses and a single course can be attended by many students.
1. Base Relation
A named relation corresponding to an entity in the conceptual schema, whose tuples are physically stored in the database.
2. View
A view is the dynamic result of one or more relational operations operating on the base relations to produce another virtual relation. A view is thus a virtually derived relation that does not necessarily exist in the database but can be produced upon request by a particular user at the time of request.
Purpose of a view
o Hides unnecessary information from users
o Provides powerful flexibility and security
o Provides a customized view of the database for users
o A view of one base relation can be updated.
o Updates on views derived from several relations are not allowed.
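A base relation and a view over it can be sketched with Python's built-in sqlite3 module. The staff table, the view name staff_public, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Base relation: its tuples are physically stored.
conn.execute(
    "CREATE TABLE staff (staff_no INTEGER PRIMARY KEY, "
    "fname TEXT, lname TEXT, salary REAL)"
)
conn.execute("INSERT INTO staff VALUES (1, 'Abebe', 'Kebede', 5000.0)")

# View: a virtual relation that hides the salary column from its users.
conn.execute(
    "CREATE VIEW staff_public AS SELECT staff_no, fname, lname FROM staff"
)
view_columns = [
    d[0] for d in conn.execute("SELECT * FROM staff_public").description
]

# The view is produced at the time of request, so it reflects later changes
# to the base relation without being updated itself.
conn.execute("INSERT INTO staff VALUES (2, 'Sara', 'Tesfaye', 6200.0)")
view_rows = conn.execute(
    "SELECT fname FROM staff_public ORDER BY staff_no"
).fetchall()
print(view_columns, view_rows)
```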
ER-Diagrams
Entity is represented by a RECTANGLE containing the name of the entity
Attributes are represented by OVALS and are connected to the entity by a line
Key
Relationships are represented by DIAMOND shaped symbols
One-to-One Relationship
Relationship Manages between Employee and Department
The multiplicity of the relationship is:
o One department can only have one manager
o One employee could manage either one or no departments
Employee  0..1  Manages  1..1  Department
One-to-Many Relationship
Relationship Leads between Employee and Project
The multiplicity of the relationship is:
o One staff member may lead one or more project(s)
o One project is led by one employee
Employee  0..*  Leads  1..1  Project
Many-to-Many Relationship
Relationship Teaches between Instructor and Course
The multiplicity of the relationship is:
o One instructor teaches one or more course(s)
o One course is taught by zero or more instructor(s)
Instructor  1..*  Teaches  0..*  Course
2. Participation of an Entity Set in a Relationship Set
Total participation (indicated by double line): every entity in the entity set participates in at least one relationship in the relationship set. The entity with total participation will be connected with the relationship using a double line.
E.g. 1: Participation of EMPLOYEE in the belongs-to relationship with DEPARTMENT is total, since every employee should belong to a department.
E.g. 2: In the manages relationship between EMPLOYEE and DEPARTMENT, DEPARTMENT will have total participation but not EMPLOYEE.
Partial participation: some entities may not participate in any relationship in the relationship set.
E.g. 1: In the manages relationship between EMPLOYEE and DEPARTMENT, EMPLOYEE will have partial participation, since not all employees are managers.
Attributes of Relationship Types
Relationship types can also have attributes, similar to those of entity types. For example, to record the number of hours per week that an employee works on a particular project, we can include an attribute Hours for the WORKS_ON relationship type. Another example is to include the date on which a manager started managing a department via an attribute StartDate for the MANAGES relationship type.
Notice that attributes of 1:1 or 1:M relationship types can be migrated to one of the participating entity types. For example, the StartDate attribute for the MANAGES relationship can be an attribute of either EMPLOYEE or DEPARTMENT, although conceptually it belongs to MANAGES. This is because MANAGES is a 1:1 relationship, so every department or employee entity participates in at most one relationship instance. Hence, the value of the StartDate attribute can be determined separately, either by the participating department entity or by the participating employee (manager) entity.
Step 3: Mapping of Binary 1:1 Relationship Types.
For each binary 1:1 relationship type R in the ER schema, identify the relations S and T that correspond to the entity types participating in R. Choose one of the relations, say S, and include as a foreign key in S the primary key of T. It is better to choose an entity type with total participation in R in the role of S. Include all the simple attributes (or simple components of composite attributes) of the 1:1 relationship type R as attributes of S.
In our example, we map the 1:1 relationship type MANAGES by choosing the participating entity type DEPARTMENT to serve in the role of S, because its participation in the MANAGES relationship type is total (every department has a manager). We include the primary key of the EMPLOYEE relation as foreign key in the DEPARTMENT relation and rename it MGRSSN. We also include the simple attribute STARTDATE of the MANAGES relationship type in the DEPARTMENT relation and rename it MGRSTARTDATE.
Note that it is possible to include the primary key of S as a foreign key in T instead. In our example, this amounts to having a foreign key attribute, say DEPARTMENT_MANAGED, in the EMPLOYEE relation, but it will have a null value for employee tuples who do not manage a department.
Step 4: Mapping of Binary 1:N Relationship Types.
For each regular binary 1:N relationship type R, identify the relation S that represents the participating entity type at the N-side of the relationship type. Include as foreign key in S the primary key of the relation T that represents the other entity type participating in R; this is done because each entity instance on the N-side is related to at most one entity instance on the 1-side of the relationship type. Include any simple attributes (or simple components of composite attributes) of the 1:N relationship type as attributes of S.
In our example, we now map the 1:N relationship types WORKS_FOR, CONTROLS, and SUPERVISION. For WORKS_FOR we include the primary key DNUMBER of the DEPARTMENT relation as foreign key in the EMPLOYEE relation and call it DNO.
For SUPERVISION we include the primary key of the EMPLOYEE relation as foreign key in the EMPLOYEE relation itself, because the relationship is recursive, and call it SUPERSSN.
The CONTROLS relationship is mapped to the foreign key attribute DNUM of PROJECT, which references the primary key DNUMBER of the DEPARTMENT relation.
Step 5: Mapping of Binary M:N Relationship Types.
For each binary M:N relationship type R, create a new relation S to represent R. Include as foreign key attributes in S the primary keys of the relations that represent the participating entity types; their combination will form the primary key of S. Also include any simple attributes of the M:N relationship type (or simple components of composite attributes) as attributes of S. Notice that we cannot represent an M:N relationship type by a single foreign key attribute in one of the participating relations (as we did for 1:1 or 1:N relationship types) because of the M:N cardinality ratio; we must create a separate relationship relation S.
In our example, we map the M:N relationship type WORKS_ON by creating the relation WORKS_ON. We include the primary keys of the PROJECT and EMPLOYEE relations as foreign keys in WORKS_ON and rename them PNO and ESSN, respectively. We also include an attribute HOURS in WORKS_ON to represent the HOURS attribute of the relationship type. The primary key of the WORKS_ON relation is the combination of the foreign key attributes {ESSN, PNO}.
Step 6: Mapping of Multivalued Attributes.
For each multivalued attribute A, create a new relation R. This relation R will include an attribute corresponding to A, plus the primary key attribute K (as a foreign key in R) of the relation that represents the entity type or relationship type that has A as an attribute. The primary key of R is the combination of A and K. If the multivalued attribute is composite, we include its simple components.
In our example, we create a relation DEPT_LOCATIONS. The attribute DLOCATION represents the multivalued attribute LOCATIONS of DEPARTMENT, while DNUMBER, as foreign key, represents the primary key of the DEPARTMENT relation. The primary key of DEPT_LOCATIONS is the combination of {DNUMBER, DLOCATION}. A separate tuple will exist in DEPT_LOCATIONS for each location that a department has.
Step 7: Mapping of N-ary Relationship Types.
For each n-ary relationship type R, where n > 2, create a new relation S to represent R. Include as foreign key attributes in S the primary keys of the relations that represent the participating entity types. Also include any simple attributes of the n-ary relationship type (or simple components of composite attributes) as attributes of S. The primary key of S is usually a combination of all the foreign keys that reference the relations representing the participating entity types.
For example, consider the relationship type SUPPLY in the figure above. It can be mapped to the relation SUPPLY, whose primary key is the combination of the three foreign keys {SNAME, PARTNO, PROJNAME}.
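The mapping steps above can be sketched as table definitions. The following is a minimal illustration, not the author's schema: it runs the DDL through Python's built-in sqlite3 module, uses the relation and foreign key names from the text (DNO, SUPERSSN, DNUM, ESSN, PNO, HOURS, DLOCATION), and assumes the key names SSN and PNUMBER and the attribute DNAME, which the text does not give.

```python
import sqlite3

# In-memory database; SSN, PNUMBER and DNAME are assumed names for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE DEPARTMENT (
    DNUMBER INT PRIMARY KEY,
    DNAME   VARCHAR(30)
);
CREATE TABLE EMPLOYEE (
    SSN      CHAR(9) PRIMARY KEY,
    SUPERSSN CHAR(9) REFERENCES EMPLOYEE(SSN),        -- Step 4: recursive SUPERVISION
    DNO      INT     REFERENCES DEPARTMENT(DNUMBER)   -- Step 4: WORKS_FOR
);
CREATE TABLE PROJECT (
    PNUMBER INT PRIMARY KEY,
    DNUM    INT REFERENCES DEPARTMENT(DNUMBER)        -- Step 4: CONTROLS
);
CREATE TABLE WORKS_ON (                                -- Step 5: M:N relationship relation
    ESSN  CHAR(9) REFERENCES EMPLOYEE(SSN),
    PNO   INT     REFERENCES PROJECT(PNUMBER),
    HOURS DECIMAL(4,1),
    PRIMARY KEY (ESSN, PNO)
);
CREATE TABLE DEPT_LOCATIONS (                          -- Step 6: multivalued attribute
    DNUMBER   INT REFERENCES DEPARTMENT(DNUMBER),
    DLOCATION VARCHAR(30),
    PRIMARY KEY (DNUMBER, DLOCATION)
);
""")

# One department with two locations: one tuple per location in DEPT_LOCATIONS.
conn.execute("INSERT INTO DEPARTMENT VALUES (5, 'Research')")
conn.executemany("INSERT INTO DEPT_LOCATIONS VALUES (?, ?)",
                 [(5, "Addis Ababa"), (5, "Jimma")])
locations = [row[0] for row in conn.execute(
    "SELECT DLOCATION FROM DEPT_LOCATIONS WHERE DNUMBER = 5 ORDER BY DLOCATION")]
print(locations)
```

The composite primary keys on WORKS_ON and DEPT_LOCATIONS are what Steps 5 and 6 describe: the combination of the foreign keys, or of the foreign key and the multivalued attribute.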
GUIDELINE 2. Design the base relation schemas so that no insertion, deletion, or modification anomalies are present in the relations. If any anomalies are present, note them clearly and make sure that the programs that update the database will operate correctly.
GUIDELINE 3. As far as possible, avoid placing attributes in a base relation whose values may frequently be null. If nulls are unavoidable, make sure that they apply in exceptional cases only and do not apply to a majority of tuples in the relation.
Normalization
A relational database is merely a collection of data, organized in a particular manner. As the father of the relational database approach, Codd created a series of rules called normal forms that help define that organization. Database normalization is a series of steps followed to obtain a database design that allows for consistent storage and efficient access of data in a relational database. These steps reduce data redundancy and the risk of data becoming inconsistent.
Normalization is the process of identifying the logical associations between data items and designing a database that represents such associations without suffering from the update anomalies, which are insertion anomalies, deletion anomalies and modification anomalies. The normalization rules are meant to remove the update anomalies that could otherwise arise during data manipulation after implementation.
The underlying ideas in normalization are simple enough. Through normalization we want to design for our relational database a set of tables that:
- Contain all the data necessary for the purposes that the database is to serve,
- Have as little redundancy as possible,
- Accommodate multiple values for types of data that require them,
- Permit efficient updates of the data in the database, and
- Avoid the danger of losing data unknowingly.
Normalization may reduce system performance since data will be cross referenced from many tables. Thus denormalization is sometimes used to improve performance, at the cost of reduced consistency guarantees.
Drawbacks of Normalization
- Requires data to see the problems
- May reduce performance of the system
- Is time consuming
- Is difficult to design and apply
- Is prone to human error
The types of problems that can occur in an insufficiently normalized table are called update anomalies, which include:
1. Insertion anomalies
An "insertion anomaly" is a failure to place information about a new database entry into all the places in the database where information about that new entry needs to be stored. In a properly normalized database, information about a new entry needs to be inserted into only one place in the database; in an inadequately normalized database, information about a new entry may need to be inserted into more than one place and, human fallibility being what it is, some of the needed additional insertions may be missed.
2. Deletion anomalies
A "deletion anomaly" is a failure to remove information about an existing database entry when it is time to remove that entry. In a properly normalized database, information about an old, to-be-gotten-rid-of entry needs to be deleted from only one place in the database; in an inadequately normalized database, information about that old entry may need to be deleted from more than one place, and, human fallibility being what it is, some of the needed additional deletions may be missed.
3. Modification anomalies
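A "modification anomaly" is a failure to change every copy of a piece of information when its value changes. A modification of a database involves changing the value of some attribute of a table. In a properly normalized database, each piece of information is stored in only one place, so a change made by the user takes effect everywhere that information is used; in an inadequately normalized database, the same value may have to be changed in several rows, and some of the needed changes may be missed. The purpose of normalization is to reduce the chances for anomalies to occur in a database.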
Example of problems related with Anomalies
EmpID  FName   LName    SkillID  Skill   SkillType    School  SchoolAddress  SkillLevel
12     Abebe   Mekuria  2        SQL     Database     AAU     Sidist_Kilo    5
16     Lemma   Alemu    5        C++     Programming  Unity   Gerji          6
28     Chane   Kebede   2        SQL     Database     AAU     Sidist_Kilo    10
25     Abera   Taye     6        VB6     Programming  Helico  Piazza         8
65     Almaz   Belay    2        SQL     Database     Helico  Piazza         9
24     Dereje  Tamiru   8        Oracle  Database     Unity   Gerji          5
51     Selam   Belay    4        Prolog  Programming  Jimma   Jimma City     8
94     Alem    Kebede   3        Cisco   Networking   AAU     Sidist_Kilo    7
18     Girma   Dereje   1        IP      Programming  Jimma   Jimma City     4
13     Yared   Gizaw    7        Java    Programming  AAU     Sidist_Kilo    6
Deletion Anomalies:
If the employee with ID 16 is deleted, then every piece of information about the skill C++ and its skill type is deleted from the database. We will then have no information about C++ and its skill type.
Insertion Anomalies:
What if we have a new employee with a skill called Pascal? We cannot decide whether Pascal is allowed as a value for Skill, and we have no clue about the type of skill that Pascal should be categorized as.
Modification Anomalies:
What if the address for Helico is changed from Piazza to Mexico? We need to look for every occurrence of Helico and change the value of SchoolAddress from Piazza to Mexico, which is prone to error.
The essence of this idea is that if the existence of something, call it A, implies that B must exist and have a certain value, then we say that "B is functionally dependent on A." We also often express this idea by saying that "A determines B," or that "B is a function of A," or that "A functionally governs B." Often, the notions of functionality and functional dependency are expressed briefly by the statement, "If A, then B." It is important to note that the value of B must be unique for a given value of A, i.e., any given value of A must imply one and only one value of B.
X → Y holds if whenever two tuples have the same value for X, they must have the same value for Y.
The notation is A → B, which is read as: B is functionally dependent on A.
In general, a functional dependency is a relationship among attributes. In relational databases, we can have a determinant that governs one other attribute or several other attributes. FDs are derived from the real-world constraints on the attributes.
Example
Since the type of Wine served depends on the type of Dinner, we say Wine is functionally dependent on Dinner.
Dinner → Wine
Since both Wine type and Fork type are determined by the Dinner type, we say Wine is functionally dependent on Dinner and Fork is functionally dependent on Dinner.
Dinner → Wine
Dinner → Fork
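The tuple-based definition of X → Y can be checked mechanically against a given set of rows. The sketch below is illustrative only: the helper name fd_holds is hypothetical, and the three sample rows are taken from the employee skills table shown earlier (EmpID 12, 16 and 65). Note that a functional dependency is a constraint on every possible state of a relation, so a check like this can refute a dependency but cannot prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs while differing on rhs."""
    seen = {}  # maps each lhs value combination to the rhs values first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X value, different Y value: X -> Y is violated
    return True

rows = [
    {"SkillID": 2, "Skill": "SQL", "School": "AAU"},
    {"SkillID": 5, "Skill": "C++", "School": "Unity"},
    {"SkillID": 2, "Skill": "SQL", "School": "Helico"},
]

print(fd_holds(rows, ["SkillID"], ["Skill"]))   # True: SkillID 2 always means SQL
print(fd_holds(rows, ["SkillID"], ["School"]))  # False: SkillID 2 appears with AAU and Helico
```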
Partial Dependency
If an attribute which is not a member of the primary key is dependent on some part of the primary key (if we have a composite primary key), then that attribute is partially functionally dependent on the primary key.
Let {A,B} be the primary key and C a non-key attribute. Then if {A,B} → C and B → C, C is partially functionally dependent on {A,B}.
Full Dependency
If an attribute which is not a member of the primary key is dependent not on some part of the primary key but on the whole key (if we have a composite primary key), then that attribute is fully functionally dependent on the primary key.
Let {A,B} be the primary key and C a non-key attribute. Then if {A,B} → C holds and neither B → C nor A → C holds, C is fully functionally dependent on {A,B}.
Transitive Dependency
In mathematics and logic, a transitive relationship is a relationship of the following form: "If A implies B, and if also B implies C, then A implies C."
Example: If Abebe is a Human, and if every Human is an Animal, then Abebe must be an Animal.
A generalized way of describing transitive dependency is: if A functionally governs B, and if B functionally governs C, then A functionally governs C.
In the normal notation: if (A → B) AND (B → C), then A → C.
Steps of Normalization
We have various levels or steps in normalization called Normal Forms. The level of complexity, the strength of the rule and the amount of decomposition increase as we move from a lower Normal Form to a higher one. A table in a relational database is said to be in a certain normal form if it satisfies certain constraints. Normalization towards a logical design consists of the following steps:
- UnNormalized Form: Identify all data elements.
- First Normal Form: Find the key with which you can find all data.
- Second Normal Form: Remove part-key dependencies. Make all data dependent on the whole key.
- Third Normal Form: Remove non-key dependencies. Make all data dependent on nothing but the key.
For most practical purposes, databases are considered normalized if they adhere to third normal form.
First Normal Form (1NF)
Requires that all column values in a table are atomic (e.g., a number is an atomic value, while a list or a set is not). We have two ways of achieving this:
1. Putting each repeating group into a separate table and connecting them with a primary key-foreign key relationship
2. Moving these repeating groups to a new row by repeating the common attributes, then finding the key with which you can find all data
A table/relation is in 1NF if:
- There are no duplicated rows in the table (each row has a unique identifier).
- Each cell is single-valued (i.e., there are no repeating groups).
- Entries in a column (attribute, field) are of the same kind.
Example for First Normal Form (1NF)
UNNORMALIZED RELATION
EmpID  FName   LName    SkillID  Skill   SkillType    School  SchoolAddress  SkillLevel
12     Abebe   Mekuria  2        SQL     Database     AAU     Sidist_Kilo    5
16     Lemma   Alemu    5        C++     Programming  Unity   Gerji          6
28     Chane   Kebede   2        SQL     Database     AAU     Sidist_Kilo    10
25     Abera   Taye     6        VB6     Programming  Helico  Piazza         8
65     Almaz   Belay    2        SQL     Database     Helico  Piazza         9
24     Dereje  Tamiru   8        Oracle  Database     Unity   Gerji          5
51     Selam   Belay    4        Prolog  Programming  Jimma   Jimma City     8
94     Alem    Kebede   3        Cisco   Networking   AAU     Sidist_Kilo    7
FIRST NORMAL FORM (1NF)
Remove all repeating groups. Distribute the multi-valued attributes into different rows and identify a unique identifier for the relation, so that it can be said to be a relation in a relational database.
EmpID  SkillID  SkillLevel
12     1        5
12     3        8
16     2        6
16     7        4
28     1        10
65     1        9
65     5        8
65     8        6
24     4        5
94     6        7
Second Normal Form (2NF)
No partial dependency of a non-key attribute on part of the primary key. This will result in a set of relations in Second Normal Form. Any table that is in 1NF and has a single-attribute (i.e., a non-composite) key is automatically also in 2NF.
A table/relation is in 2NF if it is in 1NF and all non-key attributes are dependent on the entire key, i.e., there is no partial dependency.
Since a partial dependency occurs when a non-key attribute is dependent on only a part of the (composite) key, the definition of 2NF is sometimes phrased as, "A table is in 2NF if it is in 1NF and if it has no partial dependencies".
Example for 2NF:
EMP_PROJ(EmpID, EmpName, ProjNo, ProjName, ProjLoc, ProjFund, ProjMangID)
EMP_PROJ rearranged:
EMP_PROJ(EmpID, ProjNo, EmpName, ProjName, ProjLoc, ProjFund, ProjMangID)
This schema is in 1NF since we don't have any repeating groups or attributes with the multi-valued property. To convert it to 2NF we need to remove all partial dependencies of non-key attributes on part of the primary key.
{EmpID, ProjNo} → EmpName, ProjName, ProjLoc, ProjFund, ProjMangID
But in addition to this we have the following dependencies:
EmpID → EmpName
ProjNo → ProjName, ProjLoc, ProjFund, ProjMangID
As we can see, some non-key attributes are partially dependent on part of the primary key. Thus these collections of attributes should be moved to new relations.
EMPLOYEE(EmpID, EmpName)
PROJECT(ProjNo, ProjName, ProjLoc, ProjFund, ProjMangID)
EMP_PROJ(EmpID, ProjNo)
Third Normal Form (3NF)
Eliminate columns not dependent on the key: if attributes do not contribute to a description of the key, remove them to a separate table. This level avoids update and delete anomalies.
A table/relation is in 3NF if it is in 2NF and there are no transitive dependencies between attributes.
Example for 3NF
Assumption: students of the same batch (same year) live in one building or dormitory.
STUDENT
[Table STUDENT: only the Year column survives, with values 1, 3, 3, 1, 3.]
This schema is in 2NF since the primary key is a single attribute. Let's take StudID, Year and Dormitary and see the dependencies: StudID → Year AND Year → Dormitary; then, transitively, StudID → Dormitary. To convert it to 3NF we need to remove all transitive dependencies of non-key attributes on the primary key.
STUDENT
[Table STUDENT after decomposition: five rows with Year values 1, 3, 3, 1, 3; the last two rows are (165/97, Alem, Kebede, InfoSc, 1) and (985/95, Almaz, Belay, Geog, 3).]
DORM
[Table DORM: Year values 1 and 3; the dormitory column does not survive.]
2. Design physical representation
   - Analyze transactions: to understand the functionality of the transactions that will run on the database and to analyze the important transactions
   - Choose file organization: to determine an efficient file organization for each base relation
   - Choose indexes
   - Estimate disk space and system requirement: to estimate the amount of disk space that will be required by the database
3. Design user view
   - To design the user views that were identified in the conceptual database design methodology
4. Design security mechanisms
   - Design access rules: to design the access rules to the base relations and user views
5. Consider controlled redundancy
   - To determine whether introducing redundancy in a controlled manner by relaxing the normalization rules will improve the performance of the system
6. Monitor and tune the operational system
SQL is a comprehensive database language: It has statements for data definition, query, and update. Hence, it is both a DDL and a DML. In addition, it has facilities for defining views on the database, for specifying security and authorization, for defining integrity constraints, and for specifying transaction controls. SQL uses the terms table, row, and column for the formal relational model terms relation, tuple, and attribute, respectively.
The relations declared through CREATE TABLE statements are called base tables (or base relations); this means that the relation and its tuples are actually created and stored as a file by the DBMS.
Attribute Data Types and Domains in SQL
The basic data types available for attributes include numeric, character string, bit string, boolean, date, and time.
Numeric data types include integer numbers of various sizes (INTEGER or INT) and floating-point (real) numbers of various precision (FLOAT or REAL). Formatted numbers can be declared by using DECIMAL(i,j) or NUMERIC(i,j), where i, the precision, is the total number of decimal digits and j, the scale, is the number of digits after the decimal point.
Character-string data types are either fixed length, CHAR(n), where n is the number of characters, or varying length, VARCHAR(n), where n is the maximum number of characters.
A boolean data type has the traditional values of TRUE or FALSE.
The DATE data type has ten positions, and its components are YEAR, MONTH, and DAY in the form YYYY-MM-DD. The TIME data type has at least eight positions, with the components HOUR, MINUTE, and SECOND in the form HH:MM:SS.
Defining Constraints
Because SQL allows NULLs as attribute values, a constraint NOT NULL may be specified if NULL is not permitted for a particular attribute. This is always implicitly specified for the attributes that are part of the primary key of each relation, but it can be specified for any other attributes whose values are required not to be NULL.
It is also possible to define a default value for an attribute by appending the clause DEFAULT <value> to an attribute definition. The default value is included in any new tuple if an explicit value is not provided for that attribute.
Another type of constraint can restrict attribute or domain values using the CHECK clause following an attribute or domain definition. For example, suppose that department ids are restricted to integer numbers between 1 and 20; then, we can change the attribute declaration of DNUMBER in the DEPARTMENT table to the following:
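One way such a declaration could look is sketched below, run through Python's built-in sqlite3 module so that the effect of each constraint can be observed. This is an illustration rather than the original example: the columns DNAME and DCITY and the default value are assumptions, and only the restriction of DNUMBER to the range 1 to 20 comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DEPARTMENT (
        DNUMBER INT NOT NULL CHECK (DNUMBER > 0 AND DNUMBER < 21),
        DNAME   VARCHAR(30) NOT NULL,               -- assumed column
        DCITY   VARCHAR(30) DEFAULT 'Addis Ababa'   -- assumed column and default
    )
""")

# DCITY is not supplied, so the DEFAULT value is stored in the new tuple.
conn.execute("INSERT INTO DEPARTMENT (DNUMBER, DNAME) VALUES (5, 'Research')")
default_city = conn.execute(
    "SELECT DCITY FROM DEPARTMENT WHERE DNUMBER = 5").fetchone()[0]

# DNUMBER = 25 is outside 1..20, so the CHECK constraint rejects the row.
try:
    conn.execute("INSERT INTO DEPARTMENT (DNUMBER, DNAME) VALUES (25, 'Sales')")
    check_rejected = False
except sqlite3.IntegrityError:
    check_rejected = True

print(default_city, check_rejected)
```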
Defining Key and Referential Integrity Constraints
Because keys and referential integrity constraints are very important, there are special clauses within the CREATE TABLE statement to specify them. The PRIMARY KEY clause specifies one or more attributes that make up the primary key of a relation. If a primary key has a single attribute, the clause can follow the attribute directly. For example, the primary key of a student table can be specified as follows:
CREATE TABLE student(
    ID INT PRIMARY KEY,
    Name VARCHAR(20),
    Department INT);
Referential integrity is specified via the FOREIGN KEY clause, as in the following:
Example 1:
CREATE TABLE student(
    ID INT PRIMARY KEY,
    Name VARCHAR(20),
    Sex CHAR(1),
    stdDept INT,
    FOREIGN KEY (stdDept) REFERENCES department(DID));
SQL ALTER command
The definition of a base table or of other named schema elements can be changed by using the ALTER command. For base tables, the possible alter table actions include adding or dropping a column (attribute), changing a column definition, and adding or dropping table constraints. For example, to keep track of the age of students, we can add a new age attribute to the student base relation of Example 1, using the following command:
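A minimal sketch of such a command is shown here, executed with Python's sqlite3 module. The column name age and its type INT are assumptions chosen to match the description; the FOREIGN KEY clause of Example 1 is left out so that the sketch needs only one table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        ID      INT PRIMARY KEY,
        Name    VARCHAR(20),
        Sex     CHAR(1),
        stdDept INT
    )
""")

# Add the new attribute; existing rows, if any, would get NULL for age.
conn.execute("ALTER TABLE student ADD COLUMN age INT")

# PRAGMA table_info lists one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]
print(columns)
```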
Data manipulation languages have their functional capability organized by the initial word in a statement, which is almost always a verb. In the case of SQL, these verbs are:
SELECT ... FROM ... WHERE ...
INSERT INTO ... VALUES ...
UPDATE ... SET ... WHERE ...
DELETE FROM ... WHERE ...
Adding Table Data: The INSERT Command In its simplest form, INSERT is used to add a single tuple to a relation. We must specify the relation name and a list of values for the tuple. The values should be listed in the same order in which the corresponding attributes were specified in the CREATE TABLE command.
Example 2:
INSERT INTO student (ID, name, sex, stdDept) VALUES (001, 'Abebe', 'M', 012);
OR
INSERT INTO student VALUES (001, 'Abebe', 'M', 012);
Attributes not specified in the INSERT command are set to their DEFAULT or to NULL, and the values are listed in the same order as the attributes are listed in the INSERT command itself. It is also possible to insert multiple tuples into a relation in a single INSERT command, separated by commas. The attribute values forming each tuple are enclosed in parentheses.
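The behaviour described above can be tried out with a short sketch using Python's sqlite3 module. The DEFAULT 12 on stdDept is an assumption added for illustration; the student table otherwise follows Example 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        ID      INT PRIMARY KEY,
        name    VARCHAR(20),
        sex     CHAR(1),
        stdDept INT DEFAULT 12      -- assumed default, for illustration only
    )
""")

# Two tuples in one INSERT; sex and stdDept are not listed,
# so sex becomes NULL and stdDept takes its DEFAULT.
conn.execute("""
    INSERT INTO student (ID, name)
    VALUES (1, 'Abebe'),
           (2, 'Almaz')
""")

rows = conn.execute(
    "SELECT ID, name, sex, stdDept FROM student ORDER BY ID").fetchall()
print(rows)
```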
Modifying Table Data: The UPDATE Command Use the UPDATE command to update one or more columns in an existing row of data in a table.
If you include a WHERE clause, only rows meeting the criteria specified in the condition are updated. If the WHERE clause is omitted, all rows are updated. For example, to change the department of Abebe, whose record was inserted into the database in Example 2, from 012 to 044, the following command can be used:
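A sketch of such an UPDATE follows, run with Python's sqlite3 module. It assumes Abebe's row has ID 1, as in Example 2; the second student row is hypothetical and is there only to show that the WHERE clause limits the change to one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (ID INT PRIMARY KEY, name VARCHAR(20), sex CHAR(1), stdDept INT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Abebe", "M", 12), (2, "Almaz", "F", 12)])  # second row is hypothetical

# Only the row that satisfies the WHERE condition is modified.
cursor = conn.execute("UPDATE student SET stdDept = 44 WHERE ID = 1")
updated = cursor.rowcount

depts = dict(conn.execute("SELECT name, stdDept FROM student"))
print(updated, depts)
```

Dropping the WHERE clause from the same statement would set stdDept to 44 for every row in the table.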
<attribute list> is a list of attribute names whose values are to be retrieved by the query. <table list> is a list of the relation names required to process the query. <condition> is a conditional (Boolean) expression that identifies the tuples to be retrieved by the query.
In SQL, the basic logical comparison operators for comparing attribute values with one another and with literal constants are =, <, <=, >, >=, and <>. Use an * to retrieve all non-hidden columns in the table. Otherwise, specify a comma-separated list of columns to retrieve. A missing WHERE clause indicates no condition on tuple selection; hence, all tuples of the relation specified in the FROM clause qualify and are selected for the query result. Example 6 : To retrieve every column information of every student from student table:
SELECT * FROM student;
OR
SELECT Id, name, sex, stdDept FROM student;
Example 7: To retrieve only name and student department information of every student from student table:
SELECT name, stdDept FROM student;
Example 8: To retrieve only name and student department information of male students from student table:
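A query of this kind can be sketched as follows, using Python's sqlite3 module. The three sample rows are hypothetical, and the encoding of male students as sex = 'M' is taken from Example 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (ID INT PRIMARY KEY, name VARCHAR(20), sex CHAR(1), stdDept INT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Abebe", "M", 12), (2, "Almaz", "F", 12), (3, "Dereje", "M", 44)])

# The WHERE condition keeps only the tuples whose sex value is 'M'.
males = conn.execute(
    "SELECT name, stdDept FROM student WHERE sex = 'M' ORDER BY ID").fetchall()
print(males)
```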
Chapter 5
The Relational Data Model and Relational Database Constraints
Chapter Outline
Relational Model Concepts Relational Model Constraints and Relational Database Schemas Update Operations and Dealing with Constraint Violations
A Relation is a mathematical concept based on the ideas of sets. The model was first proposed by Dr. E.F. Codd of IBM Research in 1970 in the following paper:
"A Relational Model for Large Shared Data Banks," Communications of the ACM, June 1970
The above paper caused a major revolution in the field of database management and earned Dr. Codd the coveted ACM Turing Award.
Informal Definitions
Informally, a relation looks like a table of values. A relation typically contains a set of rows. The data elements in each row represent certain facts that correspond to a real-world entity or relationship. In the formal model, rows are called tuples.
Each column has a column header that gives an indication of the meaning of the data items in that column. In the formal model, the column header is called an attribute name (or just attribute).
Example of a Relation
Informal Definitions
Key of a Relation:
Each row has a value of a data item (or set of items) that uniquely identifies that row in the table.
In the STUDENT table, SSN is the key.
Degree of a Relation:
The degree (or arity) of a relation is the number of attributes n of its relation schema. For example, the above STUDENT relation has degree seven.
Copyright 2007 Ramez Elmasri and Shamkant B. Navathe
Denoted by R(A1, A2, ..., An). R is the name of the relation. The attributes of the relation are A1, A2, ..., An.
CUSTOMER is the relation name, defined over the four attributes Cust-id, Cust-name, Address, Phone#.
For example, the domain of Cust-id is 6 digit numbers.
A tuple is an ordered set of values (enclosed in angled brackets < >). Each value is derived from an appropriate domain. A row in the CUSTOMER relation is a 4-tuple and would consist of four values, for example:
<632895, "John Smith", "101 Main St.", "(404) 894-2000">
This is called a 4-tuple as it has 4 values. It is a tuple (row) in the CUSTOMER relation.
Example: USA_phone_numbers are the set of 10 digit phone numbers valid in the U.S.
The USA_phone_numbers may have a format: (ddd)ddd-dddd where each d is a decimal digit.
Dates have various formats such as year, month, date formatted as yyyy-mm-dd, or as dd mm,yyyy etc.
The attribute name designates the role played by a domain in a relation: Used to interpret the meaning of the data elements corresponding to that attribute
Example: The domain Date may be used to define two attributes named Invoice-date and Payment-date with different meanings
Formally,
R(A1, A2, ..., An) is the schema of the relation
R is the name of the relation
A1, A2, ..., An are the attributes of the relation
r(R): a specific state (or "value" or "population") of relation R; this is a set of tuples (rows)
r(R) = {t1, t2, ..., tm} where each ti is an n-tuple
ti = <v1, v2, ..., vn> where each vj is an element of dom(Aj)
Definition Summary
Informal Terms               Formal Terms
Table                        Relation
Column Header                Attribute
All possible Column Values   Domain
Row                          Tuple
Table Definition             Schema of a Relation
Populated Table              State of the Relation
Characteristics Of Relations
Ordering of tuples in a relation r(R): The tuples are not considered to be ordered, even though they appear to be in the tabular form.
Ordering of attributes in a relation schema R (and of values within each tuple): We will consider the attributes in R(A1, A2, ..., An) and the values in t = <v1, v2, ..., vn> to be ordered.
(However, a more general alternative definition of relation does not require this ordering).
Characteristics Of Relations
Values in a tuple:
All values are considered atomic (indivisible). Each value in a tuple must be from the domain of the attribute for that column
If t = <v1, v2, ..., vn> is a tuple (row) in the relation state r of R(A1, A2, ..., An), then each vi must be a value from dom(Ai).
A special null value is used to represent values that are unknown or inapplicable to certain tuples.
Constraints are conditions that must hold on all valid relation states. There are three main types of constraints in the relational model:
- Key constraints
- Entity integrity constraints
- Referential integrity constraints
Another implicit constraint is the domain constraint: every value in a tuple must be from the domain of its attribute (or it could be null, if allowed for that attribute).
Key Constraints
If a relation has several candidate keys, one is chosen arbitrarily to be the primary key.
CAR(State, Reg#, SerialNo, Make, Model, Year) We chose SerialNo as the primary key
The primary key value is used to uniquely identify each tuple in a relation
CAR table with two candidate keys LicenseNumber chosen as Primary Key
A set S of relation schemas that belong to the same database. S is the name of the whole database schema. S = {R1, R2, ..., Rn}, where R1, R2, ..., Rn are the names of the individual relation schemas within the database S.
Entity Integrity
Entity Integrity: The primary key attributes PK of each relation schema R in S cannot have null values in any tuple of r(R).
This is because primary key values are used to identify the individual tuples: t[PK] ≠ null for any tuple t in r(R). If PK has several attributes, null is not allowed in any of these attributes.
Note: Other attributes of R may be constrained to disallow null values, even though they are not members of the primary key.
Referential Integrity
Tuples in the referencing relation R1 have attributes FK (called foreign key attributes) that reference the primary key attributes PK of the referenced relation R2.
A referential integrity constraint can be displayed in a relational database schema as a directed arc from R1.FK to R2.
The value in the foreign key column (or columns) FK of the referencing relation R1 can be either:
(1) a value of an existing primary key value of a corresponding primary key PK in the referenced relation R2, or (2) a null.
In case (2), the FK in R1 should not be a part of its own primary key.
Each relation schema can be displayed as a row of attribute names.
The name of the relation is written above the attribute names.
The primary key attribute (or attributes) will be underlined.
A foreign key (referential integrity) constraint is displayed as a directed arc (arrow) from the foreign key attributes to the referenced table.
The arrow can also point to the primary key of the referenced relation for clarity.
Each relation will have many tuples in its current relation state The relational database state is a union of all the individual relation states Whenever the database is changed, a new state arises Basic operations for changing the database:
INSERT a new tuple in a relation DELETE an existing tuple from a relation MODIFY an attribute of an existing tuple
INSERT a tuple. DELETE a tuple. MODIFY a tuple. Integrity constraints should not be violated by the update operations. Several update operations may have to be grouped together. Updates may propagate to cause other updates automatically. This may be necessary to maintain integrity constraints.
- Cancel the operation that causes the violation (RESTRICT or REJECT option)
- Perform the operation but inform the user of the violation
- Trigger additional updates so the violation is corrected (CASCADE option, SET NULL option)
- Execute a user-specified error-correction routine
Domain constraint: if one of the attribute values provided for the new tuple is not of the specified attribute domain
Key constraint: if the value of a key attribute in the new tuple already exists in another tuple in the relation
Referential integrity: if a foreign key value in the new tuple references a primary key value that does not exist in the referenced relation
Entity integrity: if the primary key value is null in the new tuple
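The four kinds of violation listed above can be provoked on purpose to see how a DBMS reacts. The sketch below uses Python's sqlite3 module with a hypothetical student/department schema. Two SQLite-specific points are assumptions of this sketch rather than part of the text: foreign key checks must be switched on with a PRAGMA, and the domain restriction is written as a CHECK clause because SQLite does not enforce declared column types strictly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE department (DID INT PRIMARY KEY NOT NULL);
CREATE TABLE student (
    ID      INT PRIMARY KEY NOT NULL,
    sex     CHAR(1) CHECK (sex IN ('M', 'F')),     -- domain of sex
    stdDept INT REFERENCES department(DID)
);
INSERT INTO department VALUES (12);
INSERT INTO student VALUES (1, 'M', 12);
""")

def violates(sql):
    """Run one INSERT and report whether the DBMS rejected it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

results = {
    "domain":      violates("INSERT INTO student VALUES (2, 'X', 12)"),     # 'X' not in domain
    "key":         violates("INSERT INTO student VALUES (1, 'F', 12)"),     # ID 1 already exists
    "referential": violates("INSERT INTO student VALUES (3, 'F', 99)"),     # no department 99
    "entity":      violates("INSERT INTO student VALUES (NULL, 'F', 12)"),  # null primary key
}
print(results)
```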
If the primary key value of the tuple being deleted is referenced from other tuples in the database
RESTRICT option: reject the deletion
CASCADE option: propagate the deletion by deleting the tuples that reference the tuple being deleted
SET NULL option: set the foreign keys of the referencing tuples to NULL
One of the above options must be specified during database design for each foreign key constraint
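The three options can be compared side by side in a small sketch using Python's sqlite3 module. The three student tables are hypothetical and differ only in the ON DELETE clause of their foreign key; enabling foreign keys with a PRAGMA is a SQLite requirement, not something the text prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (DID INT PRIMARY KEY);
CREATE TABLE student_restrict (
    ID INT PRIMARY KEY,
    stdDept INT REFERENCES department(DID) ON DELETE RESTRICT);
CREATE TABLE student_cascade (
    ID INT PRIMARY KEY,
    stdDept INT REFERENCES department(DID) ON DELETE CASCADE);
CREATE TABLE student_setnull (
    ID INT PRIMARY KEY,
    stdDept INT REFERENCES department(DID) ON DELETE SET NULL);
INSERT INTO department VALUES (1), (2), (3);
INSERT INTO student_restrict VALUES (10, 1);
INSERT INTO student_cascade  VALUES (20, 2);
INSERT INTO student_setnull  VALUES (30, 3);
""")

# RESTRICT: department 1 is still referenced, so the deletion is rejected.
try:
    conn.execute("DELETE FROM department WHERE DID = 1")
    restrict_blocked = False
except sqlite3.IntegrityError:
    restrict_blocked = True

# CASCADE: deleting department 2 also deletes the student row that references it.
conn.execute("DELETE FROM department WHERE DID = 2")
cascade_rows = conn.execute("SELECT COUNT(*) FROM student_cascade").fetchone()[0]

# SET NULL: deleting department 3 keeps the student row but nulls its foreign key.
conn.execute("DELETE FROM department WHERE DID = 3")
setnull_rows = conn.execute("SELECT ID, stdDept FROM student_setnull").fetchall()

print(restrict_blocked, cascade_rows, setnull_rows)
```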
UPDATE may violate the domain constraint and the NOT NULL constraint on an attribute being modified. Any of the other constraints may also be violated, depending on the attribute being updated:
- Updating the primary key is similar to a DELETE followed by an INSERT, so options similar to those for DELETE need to be specified.
- Updating a foreign key may violate referential integrity.
Database security is considered in relation to the following situations:
- Theft and fraud
- Loss of confidentiality (secrecy)
- Loss of privacy
- Loss of integrity
- Loss of availability
Database security: the mechanisms that protect the database against intentional or accidental threats. Database security encompasses hardware, software, people and data.
Threat: any situation or event, whether intentional or accidental, that may adversely affect a system and consequently the organization. A threat may be caused by a situation or event involving a person, action, or circumstance that is likely to bring harm to an organization. The harm to an organization may be tangible or intangible:
- Tangible: loss of hardware, software, or data
- Intangible: loss of credibility or client confidence
Examples of threats:
- Using another person's means of access
- Unauthorized amendment/modification or copying of data
- Program alteration
- Wire-tapping
- Illegal entry by hacker
- Blackmail
- Creating trapdoor into system
- Theft of data, programs, and equipment
- Failure of security mechanisms, giving greater access than normal
- Inadequate staff training
- Viewing and disclosing unauthorized data
- Data corruption owing to power loss or surge
- Fire (electrical fault, lightning strike, arson), flood, bomb
- Physical damage to equipment
- Breaking cables or disconnection of cables
- Introduction of viruses
An organization needs to identify the types of threat it may be subjected to and initiate appropriate plans and countermeasures, bearing in mind the costs of implementing them.
Backup and recovery
Backup is the process of periodically taking a copy of the database and log file (and possibly programs) on to offline storage media. A DBMS should provide backup facilities to assist with the recovery of a database following failure. Database recovery is the process of restoring the database to a correct state in the event of a failure.
Journaling is the process of keeping and maintaining a log file (or journal) of all changes made to the database to enable recovery to be undertaken effectively in the event of a failure. The advantage of journaling is that, in the event of a failure, the database can be recovered to its last known consistent state using a backup copy of the database and the information contained in the log file. If no journaling is enabled on a failed system, the only means of recovery is to restore the database using the latest backup version of the database. However, without a log file, any changes made to the database after the last backup will be lost.
Integrity
Integrity constraints contribute to maintaining a secure database system by preventing data from becoming invalid and hence giving misleading or incorrect results. The relevant constraints are domain integrity, entity integrity, referential integrity and key constraints.
Encryption
Encryption is the encoding of the data by a special algorithm that renders the data unreadable by any program without the decryption key. If a database system holds particularly sensitive data, it may be deemed necessary to encode it as a precaution against possible external threats or attempts to access it. The DBMS can access data after decoding it, although there is a degradation in performance because of the time taken to decode it. Encryption also protects data transmitted over communication lines.
Any database access request has the following three major components:
1. Requested Operation: what kind of operation is requested by a specific query?
2. Requested Object: on which resource or data of the database is the operation sought to be applied?
3. Requesting User: who is the user requesting the operation on the specified object?
The database should check all three components before processing any request. The checking is performed by the security subsystem of the DBMS.
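The three-component check can be sketched as follows. This is an illustrative Python model only: the `AccessRequest` class, the `PRIVILEGES` table, the `is_allowed` function and the user and object names are all hypothetical and do not correspond to any real DBMS interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    user: str        # requesting user
    operation: str   # requested operation, e.g. "read"
    obj: str         # requested object, e.g. a table name


# Hypothetical privilege table: (user, object) -> allowed operations.
PRIVILEGES = {
    ("admin", "student"): {"read", "insert", "update", "delete"},
    ("clerk", "student"): {"read", "insert"},
}


def is_allowed(request, privileges=PRIVILEGES):
    """Return True only if this user may perform this operation on this object."""
    allowed = privileges.get((request.user, request.obj), set())
    return request.operation in allowed


read_request = AccessRequest(user="clerk", operation="read", obj="student")
delete_request = AccessRequest(user="clerk", operation="delete", obj="student")
```

Here `is_allowed(read_request)` succeeds because the clerk holds read access on the object, while `is_allowed(delete_request)` is refused. A request from an unknown user or on an unknown object is refused as well, since the lookup falls back to an empty set of operations.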
Authentication
Authentication is the process of checking that users are who they say they are. Each user is given a unique identifier, which is used by the operating system to determine who they are. Associated with each identifier is a password, chosen by the user and known to the operating system, which must be supplied to enable the operating system to authenticate who the user claims to be. Thus the system checks whether the person presenting a specific username and password is the legitimate holder of that account. Because different users have different access levels and permissions for different data objects, a user must be authenticated before the system can decide which privileges apply.

Forms of user authorization
There are different forms of user authorization on the resources of the database. These forms are privileges that determine which operations are allowed on a specific data object.

User authorization on the data/extension:
1. Read Authorization: the user with this privilege is allowed only to read the content of the data object.
2. Insert Authorization: the user with this privilege is allowed only to insert new records or items into the data object.
3. Update Authorization: users with this privilege are allowed to modify the content of attributes but are not authorized to delete records.
4. Delete Authorization: users with this privilege are allowed only to delete a record and nothing else.
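The identifier and password check described under Authentication can be sketched in Python as below. This is an assumption-laden illustration rather than how an operating system or DBMS actually does it: the `UserStore` class and its method names are hypothetical, and storing a salted hash instead of the password itself is a common practice added here, not something stated in the text above.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this sketch


def hash_password(password, salt):
    """Derive a hash from the password and a per-user random salt."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


class UserStore:
    """Hypothetical store mapping each unique identifier to its credentials."""

    def __init__(self):
        self._users = {}  # identifier -> (salt, password hash)

    def register(self, user_id, password):
        if user_id in self._users:
            raise ValueError("identifier already in use")
        salt = os.urandom(16)
        self._users[user_id] = (salt, hash_password(password, salt))

    def authenticate(self, user_id, password):
        """Return True only if the identifier exists and the password matches."""
        record = self._users.get(user_id)
        if record is None:
            return False
        salt, stored_hash = record
        # compare_digest avoids leaking information through comparison timing
        return hmac.compare_digest(stored_hash, hash_password(password, salt))
```

Authentication only establishes who the user is. Which of the read, insert, update and delete authorizations apply is a separate check that happens afterwards.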
Different users, depending on their level of authority, can have one or a combination of the above forms of authorization on different data objects.

Role of DBA in Database Security
The database administrator (DBA) is responsible for making the database as secure as possible. For this, the DBA should have more powerful privileges than any other user. The DBA provides database users with the capabilities they need to access the content of the database. The major responsibilities of the DBA in relation to the authorization of users are:
1. Account Creation: creating different accounts for different users as well as user groups.
2. Security Level Assignment: assigning different users to different categories of access levels.
3. Privilege Grant: giving different levels of privileges to different users and user groups.
4. Privilege Revocation: denying or canceling previously granted privileges of users for various reasons.
5. Account Deletion: deleting an existing account of a user or user group. This is similar to denying all of that user's privileges on the database.
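The DBA responsibilities listed above can be modeled with a small Python sketch. The `SecurityCatalog` class, its method names and the sample account are hypothetical and only illustrate the bookkeeping involved. Security level assignment is left out to keep the example short.

```python
class SecurityCatalog:
    """Hypothetical record of accounts and the privileges granted to them."""

    FORMS = {"read", "insert", "update", "delete"}

    def __init__(self):
        self._grants = {}  # account -> {object name: set of authorization forms}

    def create_account(self, account):
        if account in self._grants:
            raise ValueError(f"account already exists: {account}")
        self._grants[account] = {}  # a new account starts with no privileges

    def grant(self, account, obj, form):
        if form not in self.FORMS:
            raise ValueError(f"unknown authorization form: {form}")
        self._grants[account].setdefault(obj, set()).add(form)

    def revoke(self, account, obj, form):
        # Revoking a privilege that was never granted is a no-op.
        self._grants[account].get(obj, set()).discard(form)

    def delete_account(self, account):
        # Removing the account removes every privilege it held.
        self._grants.pop(account, None)

    def can(self, account, obj, form):
        return form in self._grants.get(account, {}).get(obj, set())


catalog = SecurityCatalog()
catalog.create_account("clerk")
catalog.grant("clerk", "student", "read")
catalog.grant("clerk", "student", "insert")
catalog.revoke("clerk", "student", "insert")
```

After these calls the clerk account can still read the student object but can no longer insert into it. Calling `delete_account("clerk")` would leave the account with no privileges at all, which matches the remark that account deletion is similar to denying all privileges.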