
AAU

Database
Instructor: Seble M.T.

2006

WWW.AAU.EDU.ET

Fundamentals of Database


CHAPTER ONE

Database System
Database systems are designed to manage large data sets in an organization. Data management involves both the definition and the manipulation of the data, ranging from simple representation of the data to considerations of structures for the storage of information. It also covers the provision of mechanisms for the manipulation of information. Today, databases are essential to every business. They are used to maintain internal records, to present data to customers and clients on the World-Wide-Web, and to support many other commercial processes. Databases are likewise found at the core of many modern organizations. The power of databases comes from a body of knowledge and technology that has developed over several decades and is embodied in specialized software called a database management system, or DBMS. A DBMS is a powerful tool for creating and managing large amounts of data efficiently and allowing it to persist over long periods of time, safely. These systems are among the most complex types of software available. Thus, for our question: What is a database? In essence, a database is nothing more than a collection of shared information that exists over a long period of time, often many years. In common parlance, the term database refers to a collection of data that is managed by a DBMS.

Thus the DB course is about:


- How to organize data
- Supporting multiple users
- Efficient and effective data retrieval
- Secured and reliable storage of data
- Maintaining consistent data
- Making information useful for decision making

Data management has passed through different levels of development along with developments in technology and services. These levels can best be described by categorizing them into three levels of development. Even though each new level brings advantages and overcomes problems of the previous one, all methods of data handling are still in use to some extent. The three major levels are:

1. Manual Approach
2. Traditional File Based Approach
3. Database Approach

1. Manual Approach
In the manual approach, data storage and retrieval follows the primitive and traditional way of information handling, where cards and paper are used for the purpose. Files for as many events and objects as the organization has are used to store information. Each of the files containing various kinds of information is labeled and stored in one or more cabinets. The cabinets may be kept in safe places for security purposes, based on the sensitivity of the information contained in them. Insertion and retrieval are done by searching first for the right cabinet, then for the right file, and then for the information. One could have an indexing system to facilitate access to the data.

Limitations of the Manual approach

- Prone to error
- Difficult to update, retrieve, integrate
- You have the data but it is difficult to compile the information
- Limited to small size information
- Cross referencing is difficult

An alternative approach to data handling is a computerized way of dealing with the information. The computerized approach can be either decentralized or centralized, based on where the data resides in the system.

2. Traditional File Based Approach


After the introduction of computers for data processing to the business community, the need to use these devices for data storage and processing increased. There were, and still are, several
computer applications with file based processing used for the purpose of data handling. Even though the approach has evolved over time, the basic structure is still similar, if not identical. File based systems were an early attempt to computerize the manual filing system. This approach is the decentralized computerized data handling method. A collection of application programs performs services for the end-users. In such systems, every application program that provides a service to end users defines and manages its own data. Such systems have a number of programs for each of the different applications in the organization. Since every application defines and manages its own data, the system suffers from serious data duplication problems. A file, in the traditional file based approach, is a collection of records which contain logically related data.


Limitations of the Traditional File Based Approach

As business applications become more complex, demanding more flexible and reliable data handling methods, the shortcomings of the file based system became evident. These shortcomings include, but are not limited to:
- Separation or isolation of data: information available in one application may not be known to others
- Limited data sharing
- Lengthy development and maintenance time
- Duplication or redundancy of data
- Data dependency on the application
- Incompatible file formats between different applications and programs, creating inconsistency
- Fixed query processing, defined during application development

The limitations of the traditional file based data handling approach arise from two basic reasons:
1. The definition of the data is embedded in the application program, which makes it difficult to modify the database definition easily.
2. There is no control over the access and manipulation of the data beyond that imposed by the application programs.

The most significant problem experienced by the traditional file based approach of data handling is update anomalies. There are three types of update anomalies:
1. Modification anomalies: a problem experienced when one or more data values are modified in one application program but not in others containing the same data set.
2. Deletion anomalies: a problem encountered when a record set is deleted from one application but remains untouched in other application programs.
3. Insertion anomalies: a problem encountered when one cannot decide whether the data to be inserted is valid and consistent with other similar data sets.
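As a rough illustration of how this kind of duplication produces a modification anomaly, consider the following sketch in standard SQL; the table and column names are invented for this example and are not taken from any particular system. Two applications each keep their own copy of the same student data, and an update applied in only one of them leaves the copies disagreeing.

    -- Each application stores its own copy of the same student data.
    CREATE TABLE registrar_student (student_no INT PRIMARY KEY, name VARCHAR(40), address VARCHAR(60));
    CREATE TABLE library_student   (student_no INT PRIMARY KEY, name VARCHAR(40), address VARCHAR(60));

    INSERT INTO registrar_student VALUES (1, 'Abebe', 'Addis Ababa');
    INSERT INTO library_student   VALUES (1, 'Abebe', 'Addis Ababa');

    -- Modification anomaly: the address is changed in one application only,
    -- so the two copies of the same fact now disagree.
    UPDATE registrar_student SET address = 'Bahir Dar' WHERE student_no = 1;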


3. Database Approach
Following a famous paper written by Ted Codd in 1970, database systems changed significantly. Codd proposed that database systems should present the user with a view of data organized as tables called relations. Behind the scenes, there might be a complex data structure that allows rapid response to a variety of queries. But, unlike the user of earlier database systems, the user of a relational system would not be concerned with the storage structure. Queries could be expressed in a very high-level language, which greatly increased the efficiency of database programmers. The database approach emphasizes the integration and sharing of data throughout the organization. Thus, in the database approach:
- A database is just a computerized record keeping system, a kind of electronic filing cabinet.
- A database is a repository for a collection of computerized data files.
- A database is a shared collection of logically related data designed to meet the information needs of an organization. Since it is a shared corporate resource, the database is integrated with a minimum amount of, or no, duplication.
- A database is a collection of logically related data, where these logically related data comprise the entities, attributes, relationships, and business rules of an organization's information.
- In addition to containing the data required by an organization, a database also contains a description of the data, which is called metadata, the data dictionary, the systems catalogue, or data about data. Since a database contains information about the data (metadata), it is called a self-descriptive collection of integrated records.
- The purpose of a database is to store information and to allow users to retrieve and update that information on demand. A database is designed once and used simultaneously by many users.
- Unlike the traditional file based approach, in the database approach there is program-data independence, that is, the separation of the data definition from the application. Thus the application is not affected by changes made in the data structure and file organization.
Each database application will perform a combination of: creating a database, reading, updating and deleting data.

Benefits of the database approach


- Data can be shared: two or more users can access and use the same data instead of storing data redundantly for each user.
- Improved accessibility of data: by using structured query languages, users can easily access data without programming experience.
- Redundancy can be reduced: isolated data is integrated in the database to decrease the redundant data stored in different applications.
- Quality data can be maintained: the different integrity constraints in the database approach maintain data quality, leading to better decision making.
- Inconsistency can be avoided: controlled data redundancy avoids inconsistency of the data in the database to some extent.
- Transaction support can be provided: the basic demands of any transaction support system are implemented in a full scale DBMS.
- Integrity can be maintained: data from different applications is integrated together with additional constraints to facilitate a shared data resource.
- Security measures can be enforced: the shared data can be secured by having different levels of clearance and other data security mechanisms.
- Improved decision support: the database provides information useful for decision making.
- Standards can be enforced: the different ways of using and dealing with data by different units of an organization can be balanced and standardized by using the database approach.
- Compactness: since it is an electronic data handling method, the data is stored compactly (no voluminous papers).
- Speed: data storage and retrieval are fast, as they use modern, fast computer systems.
- Less labour: unlike the other data handling methods, data maintenance does not demand much resource.
- Centralized information control: since relevant data in the organization is stored in one repository, it can be controlled and managed at the central level.


Limitations and Risks of the Database Approach


- Introduction of new professional and specialized personnel
- Complexity in designing and managing data
- The cost and risk during conversion from the old to the new system
- High cost incurred to develop and maintain the system
- Complex backup and recovery services from the user's perspective
- Reduced performance due to centralization
- High impact on the system when a failure occurs


Database Management System (DBMS)


A Database Management System (DBMS) is a software package that provides EFFICIENT, CONVENIENT and SAFE MULTI-USER (many people/programs accessing the same database, or even the same data, simultaneously) storage of and access to MASSIVE amounts of PERSISTENT (data outlives the programs that operate on it) data. A DBMS also provides a systematic method for creating, updating, storing, and retrieving data in a database, as well as services for controlling data access, enforcing data integrity, managing concurrency control, and recovery. Having this in mind, a full scale DBMS should at least provide the following services to the user:

1. Data storage, retrieval and update in the database
2. A user accessible catalogue
3. Transaction support service: ALL or NONE transactions, which minimize data inconsistency
4. Concurrency control services: simultaneous access and update of the database by different users must be handled correctly
5. Recovery services: a mechanism for recovering the database after a failure must be available
6. Authorization services (security): must support the implementation of access and authorization services for database administrators and users
7. Support for data communication: should provide the facility to integrate with data transfer software or data communication managers
8. Integrity services: rules about the data and the changes that take place on the data, the correctness and consistency of stored data, and the quality of data based on business constraints
9. Services to promote data independence between the data and the application
10. Utility services: sets of utility facilities such as:
    - Importing data
    - Statistical analysis support
    - Index reorganization
    - Garbage collection
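The "ALL or NONE" property of transaction support (service 3) can be sketched with SQL transaction statements. The account table and the amounts below are hypothetical, and the exact begin-transaction syntax varies slightly between DBMSs.

    -- Group two updates so that either both take effect or neither does.
    BEGIN TRANSACTION;

    UPDATE account SET balance = balance - 500 WHERE account_no = 'A-101';
    UPDATE account SET balance = balance + 500 WHERE account_no = 'A-202';

    -- COMMIT makes both changes permanent; on an error the DBMS (or an
    -- explicit ROLLBACK) undoes the partial work instead.
    COMMIT;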

DBMS and Components of DBMS Environment


A DBMS is a software package used to design, manage, and maintain databases. It provides the following facilities:
- Data Definition Language (DDL):
  o Language used to define each data element required by the organization
  o Commands for setting up the schema of the database
- Data Manipulation Language (DML):
  o Language used by end-users and programmers to store, retrieve, and access the data, e.g. SQL
  o Also called "query language"
- Data Dictionary: a tool used to store and organize information about the data

The DBMS is software that helps to design, handle, and use data using the database approach. Taking a DBMS as a system, one can describe it with respect to its environment or the other systems interacting with it. The DBMS environment has five components:

1. Hardware: components such as personal computers, mainframe or other server computers, network infrastructure, etc.
2. Software: components like the DBMS software, application programs, operating systems, network software, and other relevant software.
3. Data: the most important component to the user of the database. There are two types of data in a database approach: operational data and metadata. The structure of the data in the database is called the schema, which is composed of entities, properties of entities, and relationships between entities.
4. Procedure: the rules and regulations on how to design and use a database. It includes procedures such as how to log on to the DBMS, how to use its facilities, how to start and stop transactions, how to make backups, how to handle hardware and software failures, and how to change the structure of the database.
5. People: the people in the organization responsible for designing, implementing, managing, administering and using the database.
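To make the DDL/DML distinction concrete, here is a small sketch in SQL (the query language the notes cite as an example of a DML). The student table and its columns are assumptions made for the example, not definitions from the text.

    -- DDL: defines a data element of the schema
    CREATE TABLE student (
        student_no INT PRIMARY KEY,
        name       VARCHAR(40),
        major      VARCHAR(30)
    );

    -- DML: stores and retrieves data through the schema defined above
    INSERT INTO student (student_no, name, major) VALUES (1, 'Abebe', 'Computer Science');
    SELECT name, major FROM student WHERE student_no = 1;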


Database Development Life Cycle


As database development is one component of most information system development tasks, there are several steps in designing a database system. Here more emphasis is given to the design phases of the system development life cycle. The major steps in database design are:

1. Planning: identifying the information gap in an organization and proposing a database solution to solve the problem.
2. Analysis: concentrates more on fact finding about the problem or the opportunity. Feasibility analysis, requirement determination and structuring, and selection of the best design method are also performed at this phase.
3. Design: in database development, most emphasis is given to this phase. The phase is further divided into three sub-phases:
   a. Conceptual Design: a concise description of the data, data types, relationships between data and constraints on the data. There is no implementation or physical detail consideration. Used to elicit and structure all information requirements.
   b. Logical Design: the conceptual design is mapped onto a selected specific data model to express the data structure. It is independent of any particular DBMS and involves no other physical considerations.
   c. Physical Design: the physical implementation of the higher-level designs of the database with respect to internal storage and file structure for the selected DBMS, developing all technology and organizational specifications.
4. Implementation: the testing and deployment of the designed database for use.
5. Operation and Support: administering and maintaining the operation of the database system and providing support to users.


Roles in Database Design and Use


As people are one of the components in the DBMS environment, there are several roles played by the different stakeholders in the design and operation of a database system.

1. DataBase Administrator (DBA)


- Responsible for overseeing, controlling and managing the database resources (the database itself, the DBMS and other related software)
- Authorizing access to the database and monitoring its use
- Coordinating and responsible for determining and acquiring hardware and software resources
- Accountable for problems like poor security or poor performance of the system
- Involved in all steps of database development

We can have further classifications of this role in big organizations having huge amounts of data and user requirements:
1. Data Administrator (DA): responsible for the management of data resources. Involved in database planning, development, and maintenance of standards, policies and procedures at the conceptual and logical design phases.
2. DataBase Administrator (DBA): a more technically oriented role, responsible for the physical realization of the database. Involved in physical design, implementation, security and integrity control of the database.

2. DataBase Designer (DBD)


- Identifies the data to be stored and chooses the appropriate structures to represent and store the data.
- Should understand the user requirements and choose how the user views the database.
- Involved in the design phase, before the implementation of the database system.

We distinguish two kinds of database designers: one involved in the logical and conceptual design and another involved in the physical design.


1. Logical and Conceptual DBD


- Identifies data (entities, attributes and relationships) relevant to the organization
- Identifies constraints on each data item
- Understands the data and business rules in the organization
- Sees the database independent of any data model at the conceptual level, and considers one specific data model at the logical design phase

2. Physical DBD
- Takes the logical design specification as input and decides how it should be physically realized.
- Maps the logical data model onto the specified DBMS with respect to tables and integrity constraints (DBMS dependent design).
- Selects specific storage structures and access paths to the database.
- Designs the security measures required on the database.

3. Application Programmer and Systems Analyst


- The systems analyst determines the user requirements and how the user wants to view the database.
- The application programmer implements these specifications as programs: codes, tests, debugs, documents and maintains the application program.
- Determines the interface for how to retrieve, insert, update and delete data in the database.
- The application could use any high level programming language according to availability, facilities and the required service.

4. End Users
Workers whose jobs require accessing the database frequently for various purposes. There are different groups of users in this category.

1. Naïve Users:
- A sizable proportion of users
- Unaware of the DBMS
- Only access the database based on their access level and demand
- Use standard and pre-specified types of queries

2. Sophisticated Users
- Users familiar with the structure of the database and the facilities of the DBMS
- Have complex requirements
- Have higher level queries
- Are most of the time engineers, scientists, business analysts, etc.

3. Casual Users
- Users who access the database occasionally
- Need different information from the database each time
- Use sophisticated database queries to satisfy their needs
- Are most of the time middle to high level managers

These users can be again classified as Actors on the Scene and Workers Behind the Scene.

Actors On the Scene:


- Data Administrator
- Database Administrator
- Database Designer
- End Users

Workers Behind the Scene


- DBMS designers and implementers: design and implement the DBMS software itself.
- Tool developers: experts who develop software packages that facilitate database system design and use. Developers of prototyping, simulation and code generator tools are examples; independent software vendors can also be categorized in this group.
- Operators and maintenance personnel: system administrators who are responsible for actually running and maintaining the hardware and software of the database system and the information technology facilities.


CHAPTER TWO

DATABASE SYSTEM ARCHITECTURE


Data Models, Schemas, and Instances
One fundamental characteristic of the database approach is that it provides some level of data abstraction by hiding details of data storage that are not needed by most database users. A data model is a collection of concepts that can be used to describe the structure of a database, and it provides the necessary means to achieve this abstraction. By structure of a database, we mean the data types, relationships, and constraints that should hold for the data. Most data models also include a set of basic operations for specifying retrievals and updates on the database. In addition to the basic operations provided by the data model, it is becoming more common to include concepts in the data model to specify the dynamic aspect or behavior of a database application. This allows the database designer to specify a set of valid user-defined operations that are allowed on the database objects. An example of a user-defined operation could be COMPUTE_GPA, which can be applied to a STUDENT object. On the other hand, generic operations to insert, delete, modify, or retrieve any kind of object are often included in the basic data model operations.

Categories of Data Models


Many data models have been proposed, which we can categorize according to the types of concepts they use to describe the database structure. High-level or conceptual data models provide concepts that are close to the way many users perceive data, whereas low-level or physical data models provide concepts that describe the details of how data is stored in the computer. Concepts provided by low-level data models are generally meant for computer specialists, not for typical end users. Between these two extremes is a class of representational (or implementation) data models, which provide concepts that may be understood by end users but that are not too far removed from the way data is organized within the computer. Representational data models hide some details of data storage but can be implemented on a computer system in a direct way. Conceptual data models use concepts such as entities, attributes, and relationships. An entity represents a real-world object or concept, such as an employee or a project that is described in
the database. An attribute represents some property of interest that further describes an entity, such as the employee's name or salary. A relationship among two or more entities represents an association among two or more entities, for example, a works-on relationship between an employee and a project. Representational or implementation data models are the models used most frequently in traditional commercial DBMSs. These include the widely used relational data model, as well as the so-called legacy data models-the network and hierarchical models-that have been widely used in the past. Representational data models represent data by using record structures and hence are sometimes called record-based data models. We can regard object data models as a new family of higher-level implementation data models that are closer to conceptual data models.

FIGURE 1.2  A database that stores student and course information.


Hierarchical Data Model


In the hierarchical data model, information is organized as a collection of inverted trees of records. The inverted trees may be of arbitrary depth. The record at the root of a tree has zero or more child records; the child records, in turn, serve as parent records for their immediate descendants. This parent-child relationship recursively continues down the tree. The records consist of fields, where each field may contain simple data values (e.g. integer, real, text) or a pointer to a record. The pointer graph is not allowed to contain cycles. Some combinations of fields may form the key for a record relative to its parent. Applications can navigate a hierarchical database by starting at a root and successively navigating downward from parent to children until the desired record is found. Applications can interleave parent-child navigation with traversal of pointers. Searching down a hierarchical tree is very fast, since the storage layer for hierarchical databases uses contiguous storage for hierarchical structures. All other types of queries require sequential search techniques. The hierarchical data model is impoverished for expressing complex information models. Often a natural hierarchy does not exist, and it is awkward to impose a parent-child relationship. Pointers partially compensate for this weakness, but it is still difficult to specify suitable hierarchical schemas for large models.

Hierarchical Data Model Summary
- The simplest data model
- A record type is referred to as a node or segment
- The top node is the root node
- Nodes are arranged in a hierarchical structure, as a sort of upside-down tree
- A parent node can have more than one child node
- A child node can only have one parent node
- The relationship between parent and child is one-to-many
- Relationships are established by creating physical links between stored records (each is stored with a predefined access path to other records)
- Corresponds to a number of naturally hierarchically organized domains, e.g. assemblies in manufacturing, personnel organization in companies

Fig. Example of Hierarchical data model

Network Data Model


In the network data model, information is organized as a collection of graphs of records that are related with pointers. Network data models represent data in a symmetric manner, unlike the hierarchical data model (with its distinction between a parent and a child). A network data model is more flexible than a hierarchical data model and still permits efficient navigation. The records consist of lists of fields (fixed or variable length, with a maximum length), where each field contains a simple value (fixed or variable size). The network data model also introduces the notion of indexes of fields and records, sets of pointers, and physical placement of records.

Network Data Model Summary
- Allows record types to have more than one parent, unlike the hierarchical model
- A network data model sees records as set members
- Each set has an owner and one or more members
- Does not allow direct many-to-many relationships between entities
- Like the hierarchical model, the network model is a collection of physically linked records
- Allows member records to have more than one owner
- The database contains a complex array of pointers that thread through a set of records


Fig. Many-to-many relationship defined using the network data model

Relational Data Model


In the relational data model, information is organized in relations (two-dimensional tables). Each relation contains a set of tuples (records), and each tuple contains a number of fields. A field may contain a simple value (fixed or variable size) from some domain (e.g. integer, real, text, etc.). The relational data model is based on a mathematical foundation, called relational algebra. This mathematical foundation is the cornerstone of some of the very attractive properties of relational databases: first of all it offers data independence, and it provides a mathematical framework for many of the optimizations possible in relational databases (e.g. query optimization).

Relational Data Model Summary
- Developed by Dr. Edgar Frank Codd in 1970 (in the famous paper "A Relational Model of Data for Large Shared Data Banks")
- Viewed as a collection of tables called relations, equivalent to a collection of record types
- Stores information or data in the form of tables, in rows and columns
- A row of a table is called a tuple, equivalent to a record
- A column of a table is called an attribute, equivalent to a field
- The tables seem to be independent but are related in some way
- Conducts searches by using data in specified columns of one table to find additional data in another table
- In conducting searches, a relational database matches information from a field in one table with information in a corresponding field of another table to produce a third table that combines the requested data from both tables
- Can define more flexible and complex relationships
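The "matching a field in one table with a corresponding field of another" behaviour described above is what a join does in SQL. The sketch below is illustrative only; the student and department tables and their columns are assumptions, not part of the original text.

    -- Matching dept_no in STUDENT against dept_no in DEPARTMENT produces a
    -- third result table combining data from both.
    SELECT s.name, d.dept_name
    FROM   student s
    JOIN   department d ON s.dept_no = d.dept_no;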

Schemas, Instances, and Database State


In any data model, it is important to distinguish between the descriptions of the database and the database itself. The description of a database is called the database schema, which is specified during database design and is not expected to change frequently. Most data models have certain conventions for displaying schemas as diagrams. A displayed schema is called a schema diagram. Figure 2.1 shows a schema diagram for the database shown in Figure 1.2; the diagram displays the structure of each record type but not the actual instances of records. We call each object in the schema-such as STUDENT or COURSE-a schema construct. A schema diagram displays only some aspects of a schema, such as the names of record types and data items, and some types of constraints. Other aspects are not specified in the schema diagram; for example, Figure 2.1 shows neither the data type of each data item nor the relationships among the various files. Many types of constraints are not represented in schema diagrams. A constraint such as "students majoring in computer science must take CS1310 before the end of their sophomore year" is quite difficult to represent. The actual data in a database may change quite frequently. For example, the database shown in Figure 1.2 changes every time we add a student or enter a new grade for a student. The data in the database at a particular moment in time is called a database state or snapshot. It is also called the current set of occurrences or instances in the database. In a given database state, each schema construct has its own current set of instances; for example, the STUDENT construct will contain the set of individual student entities (records) as its instances. Many database states can be constructed to correspond to a particular database schema. Every time we insert or delete a record or change the value of a data item in a record, we change one state of the database into another state. The distinction between database schema and database state is very important. When we define a new database, we specify its database schema only to the DBMS. At this point, the corresponding database state is the empty state with no data. We get the initial state of the database when the database is first populated or loaded with the initial data. From then on, every time an update operation is applied to the database, we get another database state. At any point in time, the database has a current state. The DBMS is partly responsible for ensuring that every state of the database is a valid state-that is, a state that satisfies the structure and constraints
specified in the schema. Hence, specifying a correct schema to the DBMS is extremely important, and the schema must be designed with the utmost care. The DBMS stores the descriptions of the schema constructs and constraints, also called the meta-data, in the DBMS catalog so that DBMS software can refer to the schema whenever it needs to. The schema is sometimes called the intension, and a database state an extension of the schema. Although, as mentioned earlier, the schema is not supposed to change frequently, it is not uncommon that changes need to be occasionally applied to the schema as the application requirements change. For example, we may decide that another data item needs to be stored for each record in a file, such as adding the DateOfBirth to the STUDENT schema in Figure 2.1. This is known as schema evolution. Most modern DBMSs include some operations for schema evolution that can be applied while the database is operational.

FIGURE 2.1  Schema diagram for the database in Figure 1.2: STUDENT (Name, StudentNumber, Class, Major), COURSE (CourseName, CourseNumber, CreditHours, Department), SECTION (SectionIdentifier, CourseNumber, Semester, Year, Instructor).
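To make the schema/state distinction concrete, the sketch below shows the STUDENT record type from Figure 2.1 as a SQL table definition (the intension) together with some inserted rows (one possible extension). The data types and the sample rows are assumptions; Figure 2.1 does not specify them.

    -- Schema (intension): defined once at design time, changes rarely.
    CREATE TABLE student (
        name           VARCHAR(40),
        student_number INT,
        class          INT,
        major          VARCHAR(20)
    );

    -- State (extension): changes every time a record is inserted, deleted
    -- or updated; this is one of many states valid for the schema above.
    INSERT INTO student VALUES ('Abebe',  17, 1, 'CS');
    INSERT INTO student VALUES ('Kebede',  8, 2, 'CS');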


THREE-SCHEMA ARCHITECTURE AND DATA INDEPENDENCE


Three of the four important characteristics of the database approach are (1) insulation of programs and data (program-data and program-operation independence), (2) support of multiple user views, and (3) use of a catalog to store the database description (schema). In this section we specify an architecture for database systems, called the three-schema architecture (also known as the ANSI/SPARC architecture), that was proposed to help achieve and visualize these characteristics. We then further discuss the concept of data independence.

The Three-Schema Architecture


The goal of the three-schema architecture, illustrated in Figure 2.2, is to separate the user applications and the physical database. In this architecture, schemas can be defined at the following three levels:

Internal Level
The internal level has an internal schema, which describes the physical storage structure of the database. The internal schema uses a physical data model and describes the complete details of data storage and access paths for the database.

Conceptual Level
The conceptual level has a conceptual schema, which describes the structure of the whole database for a community of users. The conceptual schema hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations, and constraints. Usually, a representational data model is used to describe the conceptual schema when a database system is implemented. This implementation conceptual schema is often based on a conceptual schema design in a high-level data model.

External Level
The external or view level includes a number of external schemas or user views. Each external schema describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group. As in the previous case, each external schema is typically implemented using a representational data model, possibly based on an external schema design in a high level data model.

The three-schema architecture is a convenient tool with which the user can visualize the schema levels in a database system. Most DBMSs do not separate the three levels completely, but support the three-schema architecture to some extent. Some DBMSs may include physical-level details in the conceptual schema. In most DBMSs that support user views, external schemas are specified in the same data model that describes the conceptual-level information. Some DBMSs allow different data models to be used at the conceptual and external levels. Notice that the three schemas are only descriptions of data; the only data that actually exists is at the physical level. In a DBMS based on the three-schema architecture, each user group refers only to its own external schema. Hence, the DBMS must transform a request specified on an external schema into a request against the conceptual schema, and then into a request on the internal schema for processing over the stored database. If the request is a database retrieval, the data extracted from the stored database must be reformatted to match the user's external view. The processes of transforming requests and results between levels are called mappings. These mappings may be time-consuming, so some DBMSs-especially those that are meant to support small databases-do not support external views. Even in such systems, however, a certain amount of mapping is necessary to transform requests between the conceptual and internal levels.

ANSI-SPARC Three-level Architecture


External View 1: Staff_No, FName, Salary
External View 2: FName, DOB

Conceptual Level: Staff_No, FName, LName, DOB, Salary, Branch_No

Internal Level:
    struct STAFF {
        int    Staff_No;
        int    Branch_No;
        char   FName[15];
        char   LName[15];
        struct date DOB;   /* assumes a separate struct date type */
        float  salary;
    };

Fig. Differences between the Three Levels of the ANSI-SPARC Architecture

The ANSI-SPARC architecture defines DBMS schemas at three levels:
- Internal schema
  o at the internal level, to describe physical storage structures and access paths. Typically uses a physical data model.
- Conceptual schema
  o at the conceptual level, to describe the structure and constraints for the whole database for a community of users. Uses a conceptual or an implementation data model.
- External schemas
  o at the external level, to describe the various user views. Usually use the same data model as the conceptual level.


Data Independence
The three-schema architecture can be used to further explain the concept of data independence, which can be defined as the capacity to change the schema at one level of a database system without having to change the schema at the next higher level. We can define two types of data independence:

Logical Data Independence


Logical data independence is the capacity to change the conceptual schema without having to change external schemas or application programs. We may change the conceptual schema to expand the database (by adding a record type or data item), to change constraints, or to reduce the database (by removing a record type or data item). In the last case, external schemas that refer only to the remaining data should not be affected. For example, the external schema of Figure 1.4a should not be affected by changing the GRADE_REPORT file shown in Figure 1.2 into the one shown in Figure 1.5a. Only the view definition and the mappings need be changed in a DBMS that supports logical data independence. After the conceptual schema undergoes a logical reorganization, application programs that reference the external schema constructs must work as before. Changes to constraints can be applied to the conceptual schema without affecting the external schemas or application programs.

Logical Data Independence Summary


- Refers to the immunity of external schemas to changes in the conceptual schema.
- Conceptual schema changes (e.g. addition/removal of entities) should not require changes to the external schema or rewrites of application programs.
- The capacity to change the conceptual schema without having to change the external schemas and their application programs.
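A rough SQL sketch of the idea, using the text's own example of adding DateOfBirth to STUDENT; the view name, column names and ALTER TABLE syntax are assumptions (the exact syntax varies slightly between DBMSs).

    -- An external schema (user view) defined over the conceptual STUDENT relation.
    CREATE VIEW student_roster AS
    SELECT student_number, name, major FROM student;

    -- Expanding the conceptual schema: add a new data item.
    ALTER TABLE student ADD COLUMN date_of_birth DATE;

    -- The view above never referred to the new column, so queries such as
    -- SELECT * FROM student_roster; keep working unchanged; only the
    -- view-to-conceptual mapping inside the DBMS is affected.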

Physical Data Independence


Physical data independence is the capacity to change the internal schema without having to change the conceptual schema. Hence, the external schemas need not be changed as well. Changes to the internal schema may be needed because some physical files had to be reorganized-for example, by creating additional access structures-to improve the performance of retrieval or update. If the same data as before remains in the database, we should not have to
change the conceptual schema. For example, providing an access path to improve retrieval speed of SECTION records (Figure 1.2) by Semester and Year should not require a query such as "list all sections offered in fall 1998" to be changed, although the query would be executed more efficiently by the DBMS by utilizing the new access path.

Physical Data Independence Summary


- The ability to modify the physical schema without changing the logical schema
- Applications depend on the logical schema
- In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others
- The capacity to change the internal schema without having to change the conceptual schema
- Refers to the immunity of the conceptual schema to changes in the internal schema
- Internal schema changes (e.g. using different file organizations, storage structures or devices) should not require changes to the conceptual or external schemas

Whenever we have a multiple-level DBMS, its catalog must be expanded to include information on how to map requests and data among the various levels. The DBMS uses additional software to accomplish these mappings by referring to the mapping information in the catalog. Data independence occurs because when the schema is changed at some level, the schema at the next higher level remains unchanged; only the mapping between the two levels is changed. Hence, application programs referring to the higher-level schema need not be changed. The three-schema architecture can make it easier to achieve true data independence, both physical and logical. However, the two levels of mappings create an overhead during compilation or execution of a query or program, leading to inefficiencies in the DBMS. Because of this, few DBMSs have implemented the full three-schema architecture.
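Following the text's example of adding an access path on SECTION records by Semester and Year, the sketch below shows a purely physical-level change in standard SQL; the index name and column names are assumptions.

    -- Internal (physical) schema change only: create an additional access path.
    CREATE INDEX section_semester_year_idx ON section (semester, year);

    -- The conceptual schema and existing queries are untouched; a query like
    -- the one below is simply executed more efficiently via the new index.
    SELECT * FROM section WHERE semester = 'Fall' AND year = 1998;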


Fig. Data Independence and the ANSI-SPARC Three-level Architecture


CHAPTER THREE

Relational Data Model & ER-Model


In database development life cycle, once all the requirements have been collected and analyzed, the next step is to create a conceptual schema for the database, using a high-level conceptual data model (one of which is ER-Model). This step is called conceptual design. The conceptual schema is a concise description of the data requirements of the users and includes detailed descriptions of the entity types, relationships, and constraints; these are expressed using the concepts provided by the high-level data model. Because these concepts do not include implementation details, they are usually easier to understand and can be used to communicate with nontechnical users. The high-level conceptual schema enables the database designers to concentrate on specifying the properties of the data, without being concerned with storage details. Consequently, it is easier for them to come up with a good conceptual database design. The next step in database design is the actual implementation of the database, using a commercial DBMS. Most current commercial DBMSs use an implementation data model such as the relational or the object-relational database model-so the conceptual schema is transformed from the high-level data model into the implementation data model. This step is called logical design or data model mapping, and its result is a database schema in the implementation data model of the DBMS. The last step is the physical design phase, during which the internal storage structures, indexes, access paths, and file organizations for the database files are specified. In this chapter only the basic ER model concepts for conceptual schema design are discussed.

Relational Data Model


The building blocks of the relational data model are:
- Entities: real world physical or logical objects
- Attributes: properties used to describe each entity or real world object
- Relationships: the associations between entities
- Constraints: rules that should be obeyed while manipulating the data


1. Entities/Relation/Table
The basic object that the ER model represents is an entity, which is a "thing" in the real world with an independent existence. An entity may be an object with a physical existence (for example, a particular person, car, house, or employee) or it may be an object with a conceptual existence (for example, a company, a job, or a university course). Each entity has attributes, the particular properties that describe it. For example, an employee entity may be described by the employee's name, age, address, salary, and job. A particular entity will have a value for each of its attributes.
NB: The name given to an entity should always be a singular noun descriptive of each item to be stored in it, e.g. student NOT students.
- Every relation has a schema, which describes the columns, or fields
- The relation itself corresponds to our familiar notion of a table
- A relation is a collection of tuples, each of which contains values for a fixed number of attributes

2. Attributes/Fields/Columns
Attributes are pieces of information ABOUT entities. The analysis must of course identify those which are actually relevant to the proposed application. At this level we need to know such things as:
- Attribute name (explanatory words or phrases)
- The domain from which attribute values are taken (a DOMAIN is a set of values from which attribute values may be taken). Each attribute has values taken from a domain; for example, the domain of Name is string and that of Salary is real.
- Whether the attribute is part of the entity identifier (attributes which just describe an entity versus those which help to identify it uniquely)
- Whether it is permanent or time-varying (which attributes may change their values over time)
- Whether it is required or optional for the entity (whose values will sometimes be unknown or irrelevant)

Several types of attributes occur in the ER model: simple versus composite, single-valued versus multi-valued, and stored versus derived.

Composite versus Simple (Atomic) Attributes


Composite attributes can be divided into smaller subparts, which represent more basic attributes with independent meanings. For example, Address attribute of an employee entity can be subdivided into StreetAddress, City, State, and postal code. Attributes that are not divisible are called simple or atomic attributes. Composite attributes can form a hierarchy; for example, StreetAddress can be further subdivided into three simple attributes: Number, Street, and ApartmentNumber. The value of a composite attribute is the concatenation of the values of its constituent simple attributes.

Single-Valued versus Multi-valued Attributes


Most attributes have a single value for a particular entity; such attributes are called single-valued. For example, Age is a single-valued attribute of a person. In some cases an attribute can have a set of values for the same entity, for example, a Colors attribute for a car, or a CollegeDegrees attribute for a person. Cars with one color have a single value, whereas two-tone cars have two values for Colors. Similarly, one person may not have a college degree, another person may have one, and a third person may have two or more degrees; therefore, different persons can have different numbers of values for the CollegeDegrees attribute. Such attributes are called multi-valued. A multi-valued attribute may have lower and upper bounds to constrain the number of values allowed for each individual entity. For example, the Colors attribute of a car may have between one and three values, if we assume that a car can have at most three colors.

Stored versus Derived Attributes


In some cases, two (or more) attribute values are related, for example, the Age and BirthDate attributes of a person. For a particular person entity, the value of Age can be determined from the current (today's) date and the value of that person's BirthDate. The Age attribute is hence called a derived attribute and is said to be derivable from the BirthDate attribute, which is called a stored attribute. Some attribute values can be derived from related entities; for example, an
attribute NumberOfEmployees of a department entity can be derived by counting the number of employees related to (working for) that department.
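A rough sketch of how such derived values are typically computed on demand rather than stored; the table and column names are assumptions, and date functions differ between DBMSs.

    -- Age (approximate, by year difference) derived from the stored BirthDate.
    SELECT name, EXTRACT(YEAR FROM CURRENT_DATE) - EXTRACT(YEAR FROM birth_date) AS age
    FROM   person;

    -- NumberOfEmployees of a department, derived by counting related employees.
    SELECT dept_no, COUNT(*) AS number_of_employees
    FROM   employee
    GROUP  BY dept_no;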

Null Values
In some cases a particular entity may not have an applicable value for an attribute. For example, the ApartmentNumber attribute of an address applies only to addresses that are in apartment buildings and not to other types of residences, such as single-family homes. Similarly, a CollegeDegrees attribute applies only to persons with college degrees. For such situations, a special value called null is created. An address of a single-family home would have null for its ApartmentNumber attribute, and a person with no college degree would have null for CollegeDegrees.
NB:
- NULL applies to attributes which are not applicable or which do not have values; you may enter the value NA (meaning not applicable).
- The value of a key attribute cannot be null.
- Default value: the value assumed if no explicit value is given.

Key Attributes of Entity


An important constraint on the entities of an entity type is the key or uniqueness constraint on attributes. An entity type usually has an attribute whose values are distinct for each individual entity in the entity set. Such an attribute is called a key attribute (primary key), and its values can be used to identify each entity uniquely. For example, a Name attribute can be a key for a COMPANY entity type, because no two companies are allowed to have the same name. For the PERSON entity type, a typical key attribute is SocialSecurityNumber. Sometimes, several attributes together form a key, meaning that the combination of the attribute values must be distinct for each entity. If a set of attributes possesses this property, the proper way to represent this is to define a composite attribute and designate it as a key attribute of the entity type. Notice that such a composite key must be minimal; that is, all component attributes must be included in the composite attribute to have the uniqueness property. An entity type may also have no key, in which case it is called a weak entity.


Summary of Key Attributes


o Each row of a table is uniquely identified by a PRIMARY KEY composed of one or more columns
o A group of columns that uniquely identifies a row in a table is called a CANDIDATE KEY
o A column or combination of columns that matches the primary key of another table is called a FOREIGN KEY; it is used to cross-reference tables
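A brief sketch in standard SQL of how these keys are declared; the department and employee tables and their columns are invented for the example.

    CREATE TABLE department (
        dept_no   INT PRIMARY KEY,           -- primary key of DEPARTMENT
        dept_name VARCHAR(30)
    );

    CREATE TABLE employee (
        emp_id  INT PRIMARY KEY,             -- primary key of EMPLOYEE
        name    VARCHAR(40),
        dept_no INT,
        FOREIGN KEY (dept_no) REFERENCES department (dept_no)   -- cross-references DEPARTMENT
    );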

Value Sets (Domains) of Attributes


Each simple attribute of an entity type is associated with a value set (or domain of values), which specifies the set of values that may be assigned to that attribute for each individual entity. For instance, if the range of ages allowed for employees is between 16 and 70, we can specify the value set of the Age attribute of EMPLOYEE to be the set of integer numbers between 16 and 70. Similarly, we can specify the value set of the Name attribute to be a set of character strings.

3. Relationships
A relationship type R among n entity types E1, E2, ..., En defines a set of associations among entities. For example, consider a relationship type WORKS_FOR between two entity types EMPLOYEE and DEPARTMENT, which associates each employee with the department for which the employee works.

Degree of a Relationship
The degree of a relationship type is the number of participating entity types. Hence, the WORKS_FOR relationship is of degree two. A relationship type of degree two is called binary, and one of degree three is called ternary. Relationships can generally be of any degree, but the most common ones are binary relationships. Higher-degree relationships are sometimes needed. For example, suppose that we want to capture which employees use which skills on which project. We might try to represent this data in a database as three binary relationships: between skill and project, project and employee, and employee and skill.


Fig. A ternary relationship
- Works-on: Abebe and Kebede have worked on projects A and B.
- Used-on: Abebe has used programming skills on project X.
- Has skill: an employee has a certain skill. This is different from used-on because there are some skills that an employee has that he or she may not have used on a particular project.
- Needed: a project needs a particular skill. This is different from used-on because there may be some skills for which employees have not yet been assigned to the project.
- Manages: an employee manages a project. This is a completely different dimension from skill, so it could not be captured by used-on.

Role Names and Recursive Relationships


Each entity that participates in a relationship plays a particular role in the relationship. The role name signifies the role that a participating entity from the entity type plays in each relationship instance, and helps to explain what the relationship means. For example, in the WORKS_FOR relationship type, EMPLOYEE plays the role of employee or worker and DEPARTMENT plays the role of department or employer.


In some cases the same entity participates more than once in a relationship with the same entity in different roles. In such cases the role name is essential for distinguishing the meaning of each participation. Such relationship types are called recursive relationships. For example, the SUPERVISION relationship type relates an employee to a supervisor, where both employee and supervisor entities are members of the same EMPLOYEE entity.

Cardinality Ratios for Binary Relationships


An important concept about relationships is the number of instances/tuples that can be associated with a single instance from one entity in a single relationship. The cardinality ratio for a binary relationship specifies the maximum number of relationship instances that an entity can participate in. The number of instances participating in or associated with a single instance from an entity in a relationship is called the CARDINALITY of the relationship. For example, in the WORKS_FOR binary relationship type, DEPARTMENT:EMPLOYEE is of cardinality ratio 1:N, meaning that each department can be related to (that is, employs) any number of employees, but an employee can be related to (work for) only one department. The possible cardinality ratios for binary relationship types are 1:1, 1:M, M:1, and M:M.

ONE-TO-ONE: one tuple is associated with only one other tuple.
- An example of a 1:1 binary relationship is MANAGES, which relates a department entity to the employee who manages that department. This means that at any point in time an employee can manage only one department and a department has only one manager.
- Another example of a 1:1 relationship is Building-Location, as a single building will be located in a single location and a single location will only accommodate a single building.


ONE-TO-MANY: one tuple can be associated with many other tuples, but not the reverse.
- E.g. 1: Department-Student, as one department can have multiple students.
- E.g. 2: Employee-Department, as one employee can work in multiple departments.

MANY-TO-ONE: many tuples are associated with one tuple, but not the reverse.
- E.g. Employee-Department, as many employees belong to a single department.

MANY-TO-MANY: one tuple is associated with many other tuples and, from the other side, with a different role name, one tuple will be associated with many tuples.
- E.g. Student-Course, as a student can take many courses and a single course can be attended by many students.
- E.g. Employee-Project, as an employee works on many projects and a single project can be done by many employees.

4. Relational Constraints/Integrity Rules


Relational Integrity
o Domain Integrity: no value of an attribute should be beyond the allowable limits of its domain.
o Entity Integrity: in a base relation, no attribute of a primary key can be null.
o Referential Integrity: if a foreign key exists in a relation, either the foreign key value must match a candidate key value in its home (referenced) relation, or the foreign key value must be null (foreign key to primary key match-ups).
o Enterprise Integrity: additional rules specified by the users or database administrators of a database are incorporated.
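As a rough illustration (not from the text), the first three of these rules map directly onto SQL declarations; the table and attribute names used here (department, employee, DeptID, EmpID, Salary) are invented for the sketch.

CREATE TABLE department (
    DeptID   INT PRIMARY KEY,                      -- entity integrity: a primary key value can never be null
    DeptName VARCHAR(30) NOT NULL);

CREATE TABLE employee (
    EmpID  INT PRIMARY KEY,
    Salary DECIMAL(10,2) CHECK (Salary > 0),       -- domain integrity: values kept within allowable limits
    DeptID INT,                                    -- may be null; otherwise it must match a department
    FOREIGN KEY (DeptID) REFERENCES department(DeptID));   -- referential integrity

Enterprise integrity rules are usually layered on top of these, for example with further CHECK constraints or triggers specific to the organization.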

Relational languages and views


The languages in relational database management systems are the DDL and the DML, which are used to define or create the database and to manipulate it. There are two kinds of relations in a relational database; the difference is in how the relation is created, used and updated:

1. Base Relation
A named relation corresponding to an entity in the conceptual schema, whose tuples are physically stored in the database.
2. View
The dynamic result of one or more relational operations operating on the base relations to produce another virtual relation. A view is thus a virtually derived relation that does not necessarily exist in the database but can be produced upon request, by a particular user, at the time of request.

Purpose of a view
o Hides unnecessary information from users
o Provides powerful flexibility and security
o Provides a customized view of the database for users
o A view of one base relation can be updated; updates on views derived from several relations are not allowed.
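As a minimal sketch, assuming a base relation student(ID, Name, Sex, stdDept) like the one created later in this material, a view that hides the Sex attribute could be defined and queried as follows:

CREATE VIEW student_public AS
    SELECT ID, Name, stdDept
    FROM student;                 -- Sex is not exposed by the view

SELECT * FROM student_public;     -- users query the view like an ordinary table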
Conceptual Database Design


Conceptual design revolves around discovering and analyzing organizational and user data requirements. The important activities are to identify: Entities, Attributes, Relationships and Constraints and based on these components develop the ER model using ER diagrams.

The Entity Relationship (E-R) Model


Entity-Relationship modeling is used to represent the conceptual view of the database. The main components of ER modeling are: entities, attributes, relationships and constraints. Before working on the conceptual design of the database, one has to know and answer the following basic questions:
o What are the entities and relationships in the enterprise?
o What information about these entities and relationships should we store in the database?
o What are the integrity constraints that hold? (Constraints on each data item with respect to update, retrieval and storage.)
Represent this information pictorially in ER diagrams, then map the ER diagram into a relational database schema.

ER-Diagrams
Entity is represented by a RECTANGLE containing the name of the entity

Attributes are represented by OVALS and are connected to the entity by a line

A derived attribute is indicated by a DOTTED LINE

PRIMARY KEYS are underlined

Relationships are represented by DIAMOND shaped symbols

Structural Constraints on Relationships


Relationship types usually have certain constraints that limit the possible combinations of entities that may participate in the corresponding relationship set. These constraints are determined from the miniworld situation that the relationships represent. For example, if a company has a rule that each employee must work for exactly one department, then we would like to describe this constraint in the schema. We can distinguish two main types of relationship structural constraints: cardinality ratio and participation.

1. Constraints on Relationship / Multiplicity / Cardinality Constraints
A multiplicity constraint is the number of (or range of) possible occurrences of an entity type/relation that may relate to a single occurrence/tuple of an entity type/relation through a particular relationship. These constraints are mostly used to ensure appropriate enterprise constraints. A (min, max) constraint specifies that each entity e in E participates in at least min and at most max relationship instances in R.
Default (no constraint): min = 0, max = n (signifying no limit)
Must have min ≤ max, min ≥ 0, max ≥ 1
One-to-One Relationship
Relationship Manages between Employee and Department. The multiplicity of the relationship is:
o One department can only have one manager
o One employee could manage either one or no departments

Employee --(0..1)-- Manages --(1..1)-- Department

One-to-Many Relationship
Relationship Leads between Employee and Project. The multiplicity of the relationship is:
o One staff member may lead one or more project(s)
o One project is led by one employee

Employee --(0..*)-- Leads --(1..1)-- Project

Many-to-Many Relationship
Relationship Teaches between Instructor and Course. The multiplicity of the relationship is:
o One instructor teaches one or more course(s)
o One course is taught by zero or more instructor(s)

Instructor --(1..*)-- Teaches --(0..*)-- Course

2. Participation of an Entity Set in a Relationship Set Total participation (indicated by double line): every entity in the entity set participates in at least one relationship in the relationship set. The entity with total participation will be connected with the relationship using a double line.

E.g. 1: Participation of EMPLOYEE in belongs to relationship with DEPARTMENT is total since every employee should belong to a department.

E.g. 2: Participation of EMPLOYEE in manages relationship with DEPARTMENT, DEPARTMENT will have total participation but not EMPLOYEE

Partial participation: some entities may not participate in any relationship in the relationship set E.g. 1: Participation of EMPLOYEE in manages relationship with DEPARTMENT, EMPLOYEE will have partial participation since not all employees are managers.

Attributes of Relationship Types Relationship types can also have attributes, similar to those of entity types. For example, to record the number of hours per week that an employee works on a particular project, we can include an attribute Hours for the WORKS_ON relationship type. Another example is to include the date on which a manager started managing a department via an attribute StartDate for the MANAGES relationship type. Notice that attributes of 1:1 or 1:M relationship types can be migrated to one of the participating entity types. For example, the StartDate attribute for the MANAGES relationship can be an attribute of either EMPLOYEE or DEPARTMENT, although conceptually it belongs to
MANAGES. This is because MANAGES is a 1:1 relationship, so every department or employee entity participates in at most one relationship instance. Hence, the value of the StartDate attribute can be determined separately, either by the participating department entity or by the participating employee (manager) entity.

Weak Entity Types


Entity types that do not have key attributes of their own are called weak entity types. In contrast, regular entity types that do have a key attribute are called strong entity types. Entities belonging to a weak entity type are identified by being related to specific entities from another entity type in combination with one of their attribute values. A weak entity type always has a total participation constraint (existence dependency) with respect to its identifying relationship, because a weak entity cannot be identified without an owner entity. However, not every existence dependency results in a weak entity type. For example, a DRIVER_LICENSE entity cannot exist unless it is related to a PERSON entity, even though it has its own key (LicenseNumber) and hence is not a weak entity. Consider the entity type DEPENDENT, related to EMPLOYEE, which is used to keep track of the dependents of each employee via a l:N relationship (Figure 3.2). The attributes of DEPENDENT are Name (the first name of the dependent), BirthDate, Sex, and Relationship (to the employee). Two dependents of two distinct employees may, by chance, have the same values for Name, BirthDate, Sex, and Relationship, but they are still distinct entities. They are identified as distinct entities only after determining the particular employee entity to which each dependent is related. Each employee entity is said to own the dependent entities that are related to it. A weak entity type normally has a partial key, which is the set of attributes that can uniquely identify weak entities that are related to the same owner entity. In our example, if we assume that no two dependents of the same employee ever have the same first name, the attribute Name of DEPENDENT is the partial key. In the worst case, a composite attribute of all the weak entity's attributes will be the partial key.

ER-to-Relational Mapping Algorithm


Step 1: Mapping of Regular Entity Types
For each regular (strong) entity type E in the ER schema, create a relation R that includes all the simple attributes of E. Include only the simple component attributes of a composite attribute. Choose one of the key attributes of E as the primary key for R. If the chosen key of E is composite, the set of simple attributes that form it will together form the primary key of R.
Example: For the company database ER schema (Figure 3.2):
o The regular entities are EMPLOYEE, DEPARTMENT, and PROJECT.
o The foreign key and relationship attributes, if any, are not included yet; they will be added during subsequent steps. These include the attributes SUPERSSN and DNO of EMPLOYEE, MGRSSN and MGRSTARTDATE of DEPARTMENT, and DNUM of PROJECT.
o In our example, we choose empId, DNUMBER, and PNUMBER as primary keys for the relations EMPLOYEE, DEPARTMENT, and PROJECT, respectively.

Step 2: Mapping of Weak Entity Types
For each weak entity type W in the ER schema with owner entity type E, create a relation R and include all simple attributes (or simple components of composite attributes) of W as attributes of R. In addition, include as foreign key attributes of R the primary key attribute(s) of the relation(s) that correspond to the owner entity type(s); this takes care of the identifying relationship type of W. The primary key of R is the combination of the primary key(s) of the owner(s) and the partial key of the weak entity type W, if any.
In our example, we create the relation DEPENDENT in this step to correspond to the weak entity type DEPENDENT. We include the primary key empId of the EMPLOYEE relation, which corresponds to the owner entity type, as a foreign key attribute of DEPENDENT. The primary key of the DEPENDENT relation is the combination {empId, DEPENDENT_NAME}, because DEPENDENT_NAME is the partial key of DEPENDENT.
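A sketch of the relations produced by Steps 1 and 2 for the company example, written as SQL skeletons; only a few representative attributes are shown, the data types are assumptions, and the relationship attributes (SUPERSSN, DNO, MGRSSN, MGRSTARTDATE, DNUM) are deliberately left out because they are added in the later steps.

CREATE TABLE employee (
    empId INT PRIMARY KEY,
    fname VARCHAR(20),
    lname VARCHAR(20));

CREATE TABLE department (
    DNUMBER INT PRIMARY KEY,
    DNAME   VARCHAR(20));

CREATE TABLE project (
    PNUMBER INT PRIMARY KEY,
    PNAME   VARCHAR(20));

-- Step 2: the weak entity DEPENDENT takes the owner's key as part of its own key
CREATE TABLE dependent (
    empId          INT,
    DEPENDENT_NAME VARCHAR(20),
    BirthDate      DATE,
    Sex            CHAR(1),
    Relationship   VARCHAR(15),
    PRIMARY KEY (empId, DEPENDENT_NAME),
    FOREIGN KEY (empId) REFERENCES employee(empId));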
Step 3: Mapping of Binary 1:1 Relationship Types
For each binary 1:1 relationship type R in the ER schema, identify the relations S and T that correspond to the entity types participating in R. Choose one of the relations (say S) and include as a foreign key in S the primary key of T. It is better to choose an entity type with total participation in R in the role of S. Include all the simple attributes (or simple components of composite attributes) of the 1:1 relationship type R as attributes of S.
In our example, we map the 1:1 relationship type MANAGES by choosing the participating entity type DEPARTMENT to serve in the role of S, because its participation in the MANAGES relationship type is total (every department has a manager). We include the primary key of the EMPLOYEE relation as a foreign key in the DEPARTMENT relation and rename it MGRSSN. We also include the simple attribute STARTDATE of the MANAGES relationship type in the DEPARTMENT relation and rename it MGRSTARTDATE. Note that it is possible to include the primary key of S as a foreign key in T instead. In our example, this amounts to having a foreign key attribute, say DEPARTMENT_MANAGED, in the EMPLOYEE relation, but it would have a null value for employee tuples who do not manage a department.

Step 4: Mapping of Binary 1:N Relationship Types
For each regular binary 1:N relationship type R, identify the relation S that represents the participating entity type at the N-side of the relationship type. Include as a foreign key in S the primary key of the relation T that represents the other entity type participating in R; this is done because each entity instance on the N-side is related to at most one entity instance on the 1-side of the relationship type. Include any simple attributes (or simple components of composite attributes) of the 1:N relationship type as attributes of S.
In our example, we now map the 1:N relationship types WORKS_FOR, CONTROLS, and SUPERVISION. For WORKS_FOR we include the primary key DNUMBER of the DEPARTMENT relation as a foreign key in the EMPLOYEE relation and call it DNO.

For SUPERVISION we include the primary key of the EMPLOYEE relation as foreign key in the EMPLOYEE relation itself because the relationship is recursive-and call it SUPERSSN.

The CONTROLS relationship is mapped to the foreign key attribute DNUM of PROJECT, which references the primary key DNUMBER of the DEPARTMENT relation.
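Continuing the sketch above, Steps 3 and 4 only add foreign key columns to relations that already exist; the exact ALTER TABLE syntax varies slightly between DBMSs, so treat this as an illustration rather than the mapping algorithm itself.

-- Step 3: 1:1 MANAGES, placed on the total-participation side (DEPARTMENT)
ALTER TABLE department ADD MGRSSN INT;
ALTER TABLE department ADD MGRSTARTDATE DATE;
ALTER TABLE department ADD FOREIGN KEY (MGRSSN) REFERENCES employee(empId);

-- Step 4: 1:N WORKS_FOR and the recursive SUPERVISION, both on the N-side (EMPLOYEE)
ALTER TABLE employee ADD DNO INT;
ALTER TABLE employee ADD FOREIGN KEY (DNO) REFERENCES department(DNUMBER);
ALTER TABLE employee ADD SUPERSSN INT;
ALTER TABLE employee ADD FOREIGN KEY (SUPERSSN) REFERENCES employee(empId);

-- Step 4: 1:N CONTROLS on the N-side (PROJECT)
ALTER TABLE project ADD DNUM INT;
ALTER TABLE project ADD FOREIGN KEY (DNUM) REFERENCES department(DNUMBER);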

Step 5: Mapping of Binary M:N Relationship Types
For each binary M:N relationship type R, create a new relation S to represent R. Include as foreign key attributes in S the primary keys of the relations that represent the participating entity types; their combination will form the primary key of S. Also include any simple attributes of the M:N relationship type (or simple components of composite attributes) as attributes of S. Notice that we cannot represent an M:N relationship type by a single foreign key attribute in one of the participating relations (as we did for 1:1 or 1:N relationship types) because of the M:N cardinality ratio; we must create a separate relationship relation S.
In our example, we map the M:N relationship type WORKS_ON by creating the relation WORKS_ON. We include the primary keys of the PROJECT and EMPLOYEE relations as foreign keys in WORKS_ON and rename them PNO and ESSN, respectively. We also include an attribute HOURS in WORKS_ON to represent the HOURS attribute of the relationship type. The primary key of the WORKS_ON relation is the combination of the foreign key attributes {ESSN, PNO}.

Step 6: Mapping of Multivalued Attributes
For each multivalued attribute A, create a new relation R. This relation R will include an attribute corresponding to A, plus the primary key attribute K of the relation that represents the entity type or relationship type that has A as an attribute, as a foreign key in R. The primary key of R is the combination of A and K. If the multivalued attribute is composite, we include its simple components.

In our example, we create a relation DEPT_LOCATIONS. The attribute DLOCATION represents the multivalued attribute LOCATIONS of DEPARTMENT, while DNUMBER, as a foreign key, represents the primary key of the DEPARTMENT relation. The primary key of DEPT_LOCATIONS is the combination {DNUMBER, DLOCATION}. A separate tuple will exist in DEPT_LOCATIONS for each location that a department has.
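The new relations created in Steps 5 and 6 might be declared roughly as follows, under the same assumed data types as the earlier sketch.

-- Step 5: the M:N relationship WORKS_ON becomes its own relation
CREATE TABLE works_on (
    ESSN  INT,
    PNO   INT,
    HOURS DECIMAL(4,1),
    PRIMARY KEY (ESSN, PNO),
    FOREIGN KEY (ESSN) REFERENCES employee(empId),
    FOREIGN KEY (PNO)  REFERENCES project(PNUMBER));

-- Step 6: the multivalued attribute LOCATIONS becomes its own relation
CREATE TABLE dept_locations (
    DNUMBER   INT,
    DLOCATION VARCHAR(30),
    PRIMARY KEY (DNUMBER, DLOCATION),
    FOREIGN KEY (DNUMBER) REFERENCES department(DNUMBER));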

Step 7: Mapping of N-ary Relationship Types
For each n-ary relationship type R, where n > 2, create a new relation S to represent R. Include as foreign key attributes in S the primary keys of the relations that represent the participating entity types. Also include any simple attributes of the n-ary relationship type (or simple components of composite attributes) as attributes of S. The primary key of S is usually a combination of all the foreign keys that reference the relations representing the participating entity types.

For example, consider the relationship type SUPPLY of the figure above. This can be mapped to the relation SUPPLY shown below, whose primary key is the combination of the three foreign keys {SNAME, PARTNO, PROJNAME}.
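A possible declaration of the SUPPLY relation, assuming SUPPLIER, PART, and PROJECT relations keyed on SNAME, PARTNO, and PROJNAME respectively, as in the figure; the data types are invented.

CREATE TABLE supply (
    SNAME    VARCHAR(20),
    PARTNO   INT,
    PROJNAME VARCHAR(20),
    PRIMARY KEY (SNAME, PARTNO, PROJNAME),
    FOREIGN KEY (SNAME)    REFERENCES supplier(SNAME),
    FOREIGN KEY (PARTNO)   REFERENCES part(PARTNO),
    FOREIGN KEY (PROJNAME) REFERENCES project(PROJNAME));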

CHAPTER FOUR Functional Dependency and Normalization


So far, we have assumed that attributes are grouped to form a relation schema by using the common sense of the database designer or by mapping a database schema design from a conceptual data model such as the ER or some other conceptual data model. These models make the designer identify entity types and relationship types and their respective attributes, which leads to a natural and logical grouping of the attributes into relations when the mapping procedures in Chapter 3 are followed. However, we still need some formal measure of why one grouping of attributes into a relation schema may be better than another. So far in our discussion of conceptual design and its mapping into the relational model, we have not developed any measure of appropriateness or "goodness" to measure the quality of the design, other than the intuition of the designer. In this chapter we discuss some of the theory that has been developed with the goal of evaluating relational schemas for design quality-that is, to measure formally why one set of groupings of attributes into relation schemas is better than another. In this chapter, the concept of functional dependency, a formal constraint among attributes that is the main tool for formally measuring the appropriateness of attribute groupings into relation schemas is defined. We show how functional dependencies can be used to group attributes into relation schemas that are in a normal form. A relation schema is in a normal form when it satisfies certain desirable properties. The process of normalization consists of analyzing relations to meet increasingly more stringent normal forms leading to progressively better groupings of attributes. Normal forms are specified in terms of functional dependencies-which are identified by the database designer-and key attributes of relation schemas.

Informal Measures to goodness of a Relation


The ease with which the meaning of a relation's attributes can be explained is an informal measure of how well the relation is designed. GUIDELINE 1. Design a relation schema so that it is easy to explain its meaning. Do not combine attributes from multiple entity types and relationship types into a single relation. Intuitively, if a relation schema corresponds to one entity type or one relationship type, it is straightforward to explain its meaning. Otherwise, if the relation corresponds to a mixture of multiple entities and relationships, semantic ambiguities will result and the relation cannot be easily explained.

GUIDELINE 2. Design the base relation schemas so that no insertion, deletion, or modification anomalies are present in the relations. If any anomalies are present, note them clearly and make sure that the programs that update the database will operate correctly. GUIDELINE 3. As far as possible, avoid placing attributes in a base relation whose values may frequently be null. If nulls are unavoidable, make sure that they apply in exceptional cases only and do not apply to a majority of tuples in the relation.

Normalization
A relational database is merely a collection of data, organized in a particular manner. As the father of the relational database approach, Codd created a series of rules called normal forms that help define that organization. Database normalization is a series of steps followed to obtain a database design that allows for consistent storage and efficient access of data in a relational database. These steps reduce data redundancy and the risk of data becoming inconsistent. Normalization is the process of identifying the logical associations between data items and designing a database that will represent such associations without suffering the update anomalies, which are: insertion anomalies, deletion anomalies and modification anomalies. All the normalization rules will eventually remove the update anomalies that may exist during data manipulation after the implementation. The underlying ideas in normalization are simple enough. Through normalization we want to design for our relational database a set of tables that:
o Contain all the data necessary for the purposes that the database is to serve,
o Have as little redundancy as possible,
o Accommodate multiple values for types of data that require them,
o Permit efficient updates of the data in the database, and
o Avoid the danger of losing data unknowingly.

Normalization may reduce system performance since data will be cross referenced from many tables. Thus denormalization is sometimes used to improve performance, at the cost of reduced consistency guarantees.

Drawbacks of Normalization
o Requires data to see the problems
o May reduce performance of the system
o Is time consuming
o Is difficult to design and apply
o Is prone to human error

The types of problems that can occur in an insufficiently normalized table are called update anomalies, which include:

1. Insertion anomalies
An "insertion anomaly" is a failure to place information about a new database entry into all the places in the database where information about that new entry needs to be stored. In a properly normalized database, information about a new entry needs to be inserted into only one place in the database; in an inadequately normalized database, information about a new entry may need to be inserted into more than one place and, human fallibility being what it is, some of the needed additional insertions may be missed.

2. Deletion anomalies
A "deletion anomaly" is a failure to remove information about an existing database entry when it is time to remove that entry. In a properly normalized database, information about an old, to-be-gotten-rid-of entry needs to be deleted from only one place in the database; in an inadequately normalized database, information about that old entry may need to be deleted from more than one place, and, human fallibility being what it is, some of the needed additional deletions may be missed.

3. Modification anomalies
A modification of a database involves changing some value of the attribute of a table. In a properly normalized database table, whatever information is modified by the user, the change will be effected and used accordingly. The purpose of normalization is to reduce the chances for anomalies to occur in a database.

Example of problems related with Anomalies

EmpID  FName   LName    SkillID  Skill   SkillType    School  SchoolAddress  SkillLevel
12     Abebe   Mekuria  2        SQL     Database     AAU     Sidist_Kilo    5
16     Lemma   Alemu    5        C++     Programming  Unity   Gerji          6
28     Chane   Kebede   2        SQL     Database     AAU     Sidist_Kilo    10
25     Abera   Taye     6        VB6     Programming  Helico  Piazza         8
65     Almaz   Belay    2        SQL     Database     Helico  Piazza         9
24     Dereje  Tamiru   8        Oracle  Database     Unity   Gerji          5
51     Selam   Belay    4        Prolog  Programming  Jimma   Jimma City     8
94     Alem    Kebede   3        Cisco   Networking   AAU     Sidist_Kilo    7
18     Girma   Dereje   1        IP      Programming  Jimma   Jimma City     4
13     Yared   Gizaw    7        Java    Programming  AAU     Sidist_Kilo    6

Deletion Anomalies:

If the employee with ID 16 is deleted, then every piece of information about the skill C++ and its skill type is deleted from the database; we will then have no information about C++ and its skill type.

Insertion Anomalies:

What if we have a new employee with a skill called Pascal? We cannot decide whether Pascal is allowed as a value for skill and we have no clue about the type of skill that Pascal should be categorized as. Modification Anomalies:

What if the address for Helico is changed from Piazza to Mexico? We need to look for every occurrence of Helico and change the value of School_Add from Piazza to Mexico, which is prone to error.

Functional Dependency (FD)


Before moving to the definition and application of normalization, it is important to have an understanding of "functional dependency." Data Dependency The logical association between data items that point the database designer in the direction of a good database design are referred to as determinant or dependent relationships. Two data items A and B are said to be in a determinant or dependent relationship if certain values of data item B always appears with certain values of data item A. if the data item A is the determinant data item and B the dependent data item then the direction of the association is from A to B and not vice versa.

The essence of this idea is that if the existence of something, call it A, implies that B must exist and have a certain value, then we say that "B is functionally dependent on A." We also often express this idea by saying that "A determines B," or that "B is a function of A," or that "A functionally governs B." Often, the notions of functionality and functional dependency are expressed briefly by the statement, "If A, then B." It is important to note that the value of B must be unique for a given value of A, i.e., any given value of A must imply just one and only one value of B. X → Y holds if, whenever two tuples have the same value for X, they must have the same value for Y. The notation is A → B, which is read as: B is functionally dependent on A. In general, a functional dependency is a relationship among attributes. In relational databases, we can have a determinant that governs one other attribute or several other attributes. FDs are derived from the real-world constraints on the attributes.
Example

Dinner Course   Type of Wine
Meat            Red
Fish            White
Cheese          Rose

Since the type of Wine served depends on the type of Dinner, we say Wine is functionally dependent on Dinner.
Dinner → Wine

Dinner Course   Type of Wine   Type of Fork
Meat            Red            Meat fork
Fish            White          Fish fork
Cheese          Rose           Cheese fork

Since both Wine type and Fork type are determined by the Dinner type, we say Wine is functionally dependent on Dinner and Fork is functionally dependent on Dinner.
Dinner → Wine
Dinner → Fork

Partial Dependency
If an attribute which is not a member of the primary key is dependent on some part of the primary key (when we have a composite primary key), then that attribute is partially functionally dependent on the primary key.

Let {A,B} be the primary key and C a non-key attribute.
Then if {A,B} → C and B → C, then C is partially functionally dependent on {A,B}.

Full Dependency
If an attribute which is not a member of the primary key is dependent not on some part of the primary key but on the whole key (when we have a composite primary key), then that attribute is fully functionally dependent on the primary key.

Let {A,B} be the primary key and C a non-key attribute.
Then if {A,B} → C holds, and neither A → C nor B → C holds, then C is fully functionally dependent on {A,B}.

Transitive Dependency
In mathematics and logic, a transitive relationship is a relationship of the following form: "If A implies B, and if also B implies C, then A implies C."
Example: If Abebe is a Human, and if every Human is an Animal, then Abebe must be an Animal.
A generalized way of describing transitive dependency is: if A functionally governs B, and B functionally governs C, then A functionally governs C. In the usual notation:
{(A → B) AND (B → C)} implies A → C

Steps of Normalization
We have various levels or steps in normalization called normal forms. The level of complexity, strength of the rule and degree of decomposition increase as we move from a lower-level normal form to a higher one. A table in a relational database is said to be in a certain normal form if it satisfies certain constraints. Normalization towards a logical design consists of the following steps:
o Unnormalized Form: identify all data elements
o First Normal Form: find the key with which you can find all data
o Second Normal Form: remove part-key dependencies; make all data dependent on the whole key
o Third Normal Form: remove non-key dependencies; make all data dependent on nothing but the key

For most practical purposes, databases are considered normalized if they adhere to third normal form.

First Normal Form (1NF)
Requires that all column values in a table be atomic (e.g., a number is an atomic value, while a list or a set is not). We have two ways of achieving this:
1. Putting each repeating group into a separate table and connecting them with a primary key-foreign key relationship
2. Moving these repeating groups to a new row by repeating the common attributes, then finding the key with which you can find all data
A table/relation is in 1NF if:
o There are no duplicated rows in the table (it has a unique identifier).
o Each cell is single-valued (i.e., there are no repeating groups).
o Entries in a column (attribute, field) are of the same kind.

Example for First Normal Form (1NF)

UNNORMALIZED RELATION

EmpID  FName   LName    SkillID  Skill   SkillType    School  SchoolAddress  SkillLevel
12     Abebe   Mekuria  2        SQL     Database     AAU     Sidist_Kilo    5
16     Lemma   Alemu    5        C++     Programming  Unity   Gerji          6
28     Chane   Kebede   2        SQL     Database     AAU     Sidist_Kilo    10
25     Abera   Taye     6        VB6     Programming  Helico  Piazza         8
65     Almaz   Belay    2        SQL     Database     Helico  Piazza         9
24     Dereje  Tamiru   8        Oracle  Database     Unity   Gerji          5
51     Selam   Belay    4        Prolog  Programming  Jimma   Jimma City     8
94     Alem    Kebede   3        Cisco   Networking   AAU     Sidist_Kilo    7

FIRST NORMAL FORM (1NF): Remove all repeating groups. Distribute the multi-valued attributes into different rows and identify a unique identifier for the relation, so that it can be said to be a relation in a relational database.

EmpID  FName   LName    SkillID  Skill   SkillType    School  SchoolAddress  SkillLevel
12     Abebe   Mekuria  1        SQL     Database     AAU     Sidist_Kilo    5
12     Abebe   Mekuria  3        VB6     Programming  Helico  Piazza         8
16     Lemma   Alemu    2        C++     Programming  Unity   Gerji          6
16     Lemma   Alemu    7        IP      Programming  Jimma   Jimma City     4
28     Chane   Kebede   1        SQL     Database     AAU     Sidist_Kilo    10
65     Almaz   Belay    1        SQL     Database     Helico  Piazza         9
65     Almaz   Belay    5        Prolog  Programming  Jimma   Jimma City     8
65     Almaz   Belay    8        Java    Programming  AAU     Sidist_Kilo    6
24     Dereje  Tamiru   4        Oracle  Database     Unity   Gerji          5
94     Alem    Kebede   6        Cisco   Networking   AAU     Sidist_Kilo    7

Second Normal Form (2NF)
No partial dependency of a non-key attribute on part of the primary key; removing such dependencies results in a set of relations in second normal form. Any table that is in 1NF and has a single-attribute (i.e., non-composite) key is automatically also in 2NF.
A table/relation is in 2NF if:
o It is in 1NF, and
o All non-key attributes are dependent on the entire key, i.e. there is no partial dependency.

Since a partial dependency occurs when a non-key attribute is dependent on only a part of the (composite) key, the definition of 2NF is sometimes phrased as, "A table is in 2NF if it is in 1NF and if it has no partial dependencies". Example for 2NF: EMP_PROJ

EMP_PROJ (EmpID, EmpName, ProjNo, ProjName, ProjLoc, ProjFund, ProjMangID)

Rearranged, with the composite key attributes first:

EMP_PROJ (EmpID, ProjNo, EmpName, ProjName, ProjLoc, ProjFund, ProjMangID)

This schema is in 1NF since we don't have any repeating groups or attributes with a multi-valued property. To convert it to 2NF we need to remove all partial dependencies of non-key attributes on part of the primary key.
{EmpID, ProjNo} → EmpName, ProjName, ProjLoc, ProjFund, ProjMangID
But in addition to this we have the following dependencies:
EmpID → EmpName
ProjNo → ProjName, ProjLoc, ProjFund, ProjMangID
As we can see, some non-key attributes are partially dependent on some part of the primary key. Thus these collections of attributes should be moved to new relations, as shown below.

EMPLOYEE (EmpID, EmpName)

PROJECT (ProjNo, ProjName, ProjLoc, ProjFund, ProjMangID)

EMP_PROJ (EmpID, ProjNo)
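Under the assumption that this is the decomposition we want, the three 2NF relations could be declared in SQL as follows; the data types are illustrative only.

CREATE TABLE employee (
    EmpID   INT PRIMARY KEY,
    EmpName VARCHAR(30));

CREATE TABLE project (
    ProjNo     INT PRIMARY KEY,
    ProjName   VARCHAR(30),
    ProjLoc    VARCHAR(30),
    ProjFund   DECIMAL(12,2),
    ProjMangID INT);

CREATE TABLE emp_proj (
    EmpID  INT,
    ProjNo INT,
    PRIMARY KEY (EmpID, ProjNo),
    FOREIGN KEY (EmpID)  REFERENCES employee(EmpID),
    FOREIGN KEY (ProjNo) REFERENCES project(ProjNo));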

Third Normal Form (3NF)
Eliminate columns not dependent on the key: if attributes do not contribute to a description of the key, remove them to a separate table. This level avoids update and delete anomalies.
A table/relation is in 3NF if:
o It is in 2NF, and
o There are no transitive dependencies between attributes.

Example for 3NF
Assumption: students of the same batch (same year) live in one building or dormitory.

STUDENT
StudID   Stud_F_Name  Stud_L_Name  Dept    Year  Dormitory
125/97   Abebe        Mekuria      InfoSc  1     401
654/95   Lemma        Alemu        Geog    3     403
842/95   Chane        Kebede       CompSc  3     403
165/97   Alem         Kebede       InfoSc  1     401
985/95   Almaz        Belay        Geog    3     403

This schema is in 2NF since the primary key is a single attribute. Let's take StudID, Year and Dormitory and examine the dependencies:
StudID → Year AND Year → Dormitory, so transitively StudID → Dormitory.
To convert it to 3NF we need to remove all transitive dependencies of non-key attributes on the primary key.

STUDENT

StudID   Stud_F_Name  Stud_L_Name  Dept    Year
125/97   Abebe        Mekuria      InfoSc  1
654/95   Lemma        Alemu        Geog    3
842/95   Chane        Kebede       CompSc  3
165/97   Alem         Kebede       InfoSc  1
985/95   Almaz        Belay        Geog    3

DORM
Year  Dormitory
1     401
3     403
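The same decomposition expressed as SQL table definitions; this is a sketch assuming StudID values such as 125/97 are stored as strings and that Year determines the dormitory, as stated above.

CREATE TABLE dorm (
    Year      INT PRIMARY KEY,
    Dormitory INT);

CREATE TABLE student (
    StudID      VARCHAR(10) PRIMARY KEY,
    Stud_F_Name VARCHAR(20),
    Stud_L_Name VARCHAR(20),
    Dept        VARCHAR(20),
    Year        INT,
    FOREIGN KEY (Year) REFERENCES dorm(Year));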

Physical Database Design


Physical database design is the process of producing a description of the implementation of the database on secondary storage. It describes the base relations, file organization, and indexes used to achieve effective access to the data, along with any associated integrity constraints and security measures. The design is tailored to a specific DBMS. The sources of information for the physical design process are the logical and conceptual data models. Knowledge of the DBMS that is selected to host the database system, with all its functionalities, is required, since the functionalities of current DBMSs vary widely.

Steps in physical database design


1. Translate the logical data model for the target DBMS
o To determine the file organizations and access methods that will be used to store the base relations, i.e. the way in which relations and tuples will be held on secondary storage
o To decide how to represent the base relations we have identified in the global logical data model in the target DBMS
o Design base relations
o Design representation of derived data
o Design enterprise constraints for the target DBMS

2. Design the physical representation
Analyze transactions
o To understand the functionality of the transactions that will run on the database and to analyze the important transactions
Choose file organization
o To determine an efficient file organization for each base relation

Choose indexes

Estimate disk space and system requirements
o To estimate the amount of disk space that will be required by the database

3. Design user views
o To design the user views that were identified in the conceptual database design methodology
4. Design security mechanisms
o Design access rules: to design the access rules to the base relations and user views
5. Consider controlled redundancy
o To determine whether introducing redundancy in a controlled manner, by relaxing the normalization rules, will improve the performance of the system
6. Monitor and tune the operational system

Translate logical data model for target DBMS


This phase is the translation of the logical data model to produce a relational database schema in the target DBMS. This includes creating the data dictionary based on the logical model and the information gathered. After the creation of the data dictionary, the next activity is to understand the functionality of the target DBMS so that all necessary requirements are fulfilled for the database intended to be developed. Knowledge of the DBMS includes:
o how to create base relations
o whether the system supports the definition of primary keys, foreign keys, domains, referential integrity constraints and enterprise-level constraints
So, to define a database in one of the current relational DBMSs, you have to be familiar with SQL, which is the standard and widely used relational database language.

Basic SQL(Structured Query Language)


The SQL language provides a higher-level declarative language interface, so the user only specifies what the result is to be, leaving the actual optimization and decisions on how to execute the query to the DBMS, which makes it a non-procedural language.

SQL is a comprehensive database language: It has statements for data definition, query, and update. Hence, it is both a DDL and a DML. In addition, it has facilities for defining views on the database, for specifying security and authorization, for defining integrity constraints, and for specifying transaction controls. SQL uses the terms table, row, and column for the formal relational model terms relation, tuple, and attribute, respectively.

DDL commands in SQL


In many DBMSs where no strict separation of levels is maintained, one language, called the data definition language (DDL), is used by the DBA and by database designers to define schema. The DBMS will have a DDL compiler whose function is to process DDL statements in order to identify descriptions of the schema constructs and to store the schema description in the DBMS catalog. In this section we will discuss the three basic DDL commands of SQL, namely, CREATE, ALTER and DROP. SQL CREATE command The main SQL command for data definition is the CREATE statement, which can be used to create schemas, tables (relations), and domains (as well as other constructs such as views, assertions, and triggers). The CREATE TABLE command is used to specify a new relation by giving it a name and specifying its attributes and initial constraints. The attributes are specified first, and each attribute is given a name, a data type to specify its domain of values, and any attribute constraints, such as NOT NULL. The key, entity integrity, and referential integrity constraints can be specified within the CREATE TABLE statement after the attributes are declared, or they can be added later using the ALTER TABLE command. Syntax: CREATE TABLE tblname (attribute_name1 datatype, attribute_name2 datatype, . . .);

The relations declared through CREATE TABLE statements are called base tables (or base relations); this means that the relation and its tuples are actually created and stored as a file by the DBMS.

Attribute Data Types and Domains in SQL The basic data types available for attributes include numeric, character string, bit string, boolean, date, and time. Numeric data types include integer numbers of various sizes (INTEGER or INT) and floatingpoint (real) numbers of various precision (FLOAT or REAL). Formatted numbers can be declared by using DECIMAL(i,j) or NUMERIC(i,j)-where i, the precision, is the total number of decimal digits and j, the scale, is the number of digits after the decimal point. Character-string data types are either fixed length--CHAR(n), where n is the number of characters-or varying length-VARCHAR(n), where n is the maximum number of characters. A boolean data type has the traditional values of TRUE or FALSE. The DATE data type has ten positions, and its components are YEAR, MONTH, and DAY in the form YYYY-MM-DD. The TIME data type has at least eight positions, with the components HOUR, MINUTE, and SECOND in the form HH:MM:SS. Defining Constraints Because SQL allows NULLs as attribute values, a constraint NOT NULL may be specified if NULL is not permitted for a particular attribute. This is always implicitly specified for the attributes that are part of the primary key of each relation, but it can be specified for any other attributes whose values are required not to be NULL. It is also possible to define a default value for an attribute by appending the clause DEFAULT <value> to an attribute definition. The default value is included in any new tuple if an explicit value is not provided for that attribute. Another type of constraint can restrict attribute or domain values using the CHECK clause following an attribute or domain definition. For example, suppose that department ids are restricted to integer numbers between 1 and 20; then, we can change the attribute declaration of DNUMBER in the DEPARTMENT table to the following:

CREATE TABLE department (
    Did   INT NOT NULL CHECK (Did > 0 AND Did < 21),
    Dname VARCHAR(20) UNIQUE);

Defining Key and Referential Integrity Constraints Because keys and referential integrity constraints are very important, there are special clauses within the CREATE TABLE statement to specify them. The PRIMARY KEY clause specifies one or more attributes that make up the primary key of a relation. If a primary key has a single attribute, the clause can follow the attribute directly. For example, the primary key of DEPARTMENT can be specified as follows

CREATE TABLE student(
    ID INT PRIMARY KEY,
    Name VARCHAR(20),
    Department INT);
Referential integrity is specified via the FOREIGN KEY clause like the following

Example 1:
CREATE TABLE student(
    ID INT PRIMARY KEY,
    Name VARCHAR(20),
    Sex CHAR(1),
    stdDept INT,
    FOREIGN KEY (stdDept) REFERENCES department(Did));
SQL ALTER command
The definition of a base table or of other named schema elements can be changed by using the ALTER command. For base tables, the possible ALTER TABLE actions include adding or dropping a column (attribute), changing a column definition, and adding or dropping table constraints. For example, to add an attribute for keeping track of the age of students, we can add a new age attribute to the student base relation of Example 1, using the following command:

ALTER TABLE student ADD age INT;


For example, to remove the attribute sex from the student base relation:

ALTER TABLE student DROP COLUMN sex;


SQL DROP Command The DROP command can be used to delete named schema elements, such as tables, domains, or constraints. If a base relation within a schema is not needed any longer, the relation and its definition can be deleted by using the DROP TABLE command.

DROP TABLE <tablename>;
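For example, assuming the student table of Example 1 is no longer needed, it and all of its tuples can be removed with:

DROP TABLE student;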


DML Commands in SQL


In SQL, there are four DML commands that help a user to manipulate the database: Insert, Select, Update and Delete.

Data manipulation languages have their functional capability organized by the initial word in a statement, which is almost always a verb. In the case of SQL, these verbs are:
SELECT ... FROM ... WHERE ...
INSERT INTO ... VALUES ...
UPDATE ... SET ... WHERE ...
DELETE FROM ... WHERE ...

Adding Table Data: The INSERT Command In its simplest form, INSERT is used to add a single tuple to a relation. We must specify the relation name and a list of values for the tuple. The values should be listed in the same order in which the corresponding attributes were specified in the CREATE TABLE command.

Syntax INSERT INTO table_name [ (column_name,...) ] VALUES ( value1,...);


For example, to add a new tuple to the student relation specified Example 1

Example 2:
INSERT INTO student (ID, name, sex, stdDept) VALUES (001, 'Abebe', 'M', 012);
OR
INSERT INTO student VALUES (001, 'Abebe', 'M', 012);

Attributes not specified in the attribute list are set to their DEFAULT value or to NULL, and the values are listed in the same order as the attributes are listed in the INSERT command itself. It is also possible to insert multiple tuples into a relation, separated by commas, in a single INSERT command; the attribute values forming each tuple are enclosed in parentheses, as in the sketch below.
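A sketch of such a multi-row INSERT against the student table of Example 1; the ID and department values are invented, and while this comma-separated form is standard SQL, a few DBMSs require one INSERT statement per tuple.

INSERT INTO student (ID, name, sex, stdDept)
VALUES (002, 'Almaz', 'F', 012),
       (003, 'Kebede', 'M', 044);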

Modifying Table Data: The UPDATE Command Use the UPDATE command to update one or more columns in an existing row of data in a table.

Syntax UPDATE table_name SET column_name = expression,... [ WHERE condition ];

If you include a WHERE clause, only rows meeting the criteria specified in the condition are updated. If no WHERE clause is specified, all rows are updated. For example, to change Abebe's department (whose record was inserted into the database in Example 2) from 012 to 044, the following command can be used:

Example 3:
UPDATE student SET stdDept = 044 WHERE name = 'Abebe';


To make every student department 067 the update command can be used without the where clause:

Example 4:
UPDATE student SET stdDept = 067;


Removing Table Data: The DELETE Command Use the DELETE command to delete one or more rows of data from an existing table.

Syntax DELETE FROM table_name [WHERE condition ];


If you include a WHERE clause, only rows meeting the criteria specified in the condition are deleted. If no WHERE clause is specified, all rows are deleted. For example, to delete every record from the student table of Example 1, the following command can be used:

Example 5: DELETE FROM student;


To delete students who are male the delete command can be used to with the where clause to select particular rows:

Example 6:
DELETE FROM student WHERE sex = 'M';


Viewing Table Data: The SELECT Command SQL has one basic statement for retrieving information from a database: the SELECT statement. The basic form of the SELECT statement, sometimes called a mapping or a select-from-where block, is formed of the three clauses SELECT, FROM, and WHERE and has the following form:

SELECT <attribute list>
FROM <table list>
WHERE <condition>;

where
o <attribute list> is a list of attribute names whose values are to be retrieved by the query.
o <table list> is a list of the relation names required to process the query.
o <condition> is a conditional (Boolean) expression that identifies the tuples to be retrieved by the query.

In SQL, the basic logical comparison operators for comparing attribute values with one another and with literal constants are =, <, <=, >, >=, and <>. Use an * to retrieve all non-hidden columns in the table. Otherwise, specify a comma-separated list of columns to retrieve. A missing WHERE clause indicates no condition on tuple selection; hence, all tuples of the relation specified in the FROM clause qualify and are selected for the query result.

Example 7: To retrieve all column information of every student from the student table:

SELECT *
FROM student;
OR
SELECT Id, name, sex, stdDept
FROM student;

Example 8: To retrieve only the name and department information of every student from the student table:

SELECT name, stdDept
FROM student;

Example 9: To retrieve only the name and department information of male students from the student table:

SELECT name, stdDept
FROM student
WHERE sex = 'M';
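When the FROM clause lists more than one relation, the WHERE clause also expresses the join condition between them. A hedged sketch, assuming the student table of Example 1 and the department table (with attributes Did and Dname) created earlier in this chapter:

SELECT name, Dname
FROM student, department
WHERE stdDept = Did AND sex = 'F';   -- joins each student to her department; female students only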

Chapter 5
The Relational Data Model and Relational Database Constraints

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe

Chapter Outline

Relational Model Concepts Relational Model Constraints and Relational Database Schemas Update Operations and Dealing with Constraint Violations


Relational Model Concepts

A Relation is a mathematical concept based on the ideas of sets The model was first proposed by Dr. E.F. Codd of IBM Research in 1970 in the following paper:

"A Relational Model for Large Shared Data Banks," Communications of the ACM, June 1970

The above paper caused a major revolution in the field of database management and earned Dr. Codd the coveted ACM Turing Award


Informal Definitions

Informally, a relation looks like a table of values. A relation typically contains a set of rows. The data elements in each row represent certain facts that correspond to a real-world entity or relationship In the formal model, rows are called tuples Each column has a column header that gives an indication of the meaning of the data items in that column

In the formal model, the column header is called an attribute name (or just attribute)


Example of a Relation


Informal Definitions

Key of a Relation:

Each row has a value of a data item (or set of items) that uniquely identifies that row in the table

Called the key

In the STUDENT table, SSN is the key.

Degree of a Relation:
The degree (or arity) of a relation is the number of attributes n of its relation schema. For example, the above STUDENT relation is of degree seven.

Formal Definitions - Schema

The Schema (or description) of a Relation:


Denoted by R(A1, A2, .....An) R is the name of the relation The attributes of the relation are A1, A2, ..., An

Example: CUSTOMER (Cust-id, Cust-name, Address, Phone#)


CUSTOMER is the relation name, defined over the four attributes: Cust-id, Cust-name, Address, Phone#
Each attribute has a domain or a set of valid values. For example, the domain of Cust-id is 6-digit numbers.

Formal Definitions - Tuple

A tuple is an ordered set of values (enclosed in angled brackets < >) Each value is derived from an appropriate domain. A row in the CUSTOMER relation is a 4-tuple and would consist of four values, for example:

<632895, "John Smith", "101 Main St.", "(404) 894-2000">
This is called a 4-tuple as it has 4 values; it is a tuple (row) in the CUSTOMER relation.

A relation is a set of such tuples (rows)


Formal Definitions - Domain

A domain has a logical definition:

Example: USA_phone_numbers are the set of 10 digit phone numbers valid in the U.S.

A domain also has a data-type or a format defined for it.

The USA_phone_numbers may have a format: (ddd)ddd-dddd where each d is a decimal digit.

Dates have various formats such as year, month, date formatted as yyyy-mm-dd, or as dd mm,yyyy etc.

The attribute name designates the role played by a domain in a relation: Used to interpret the meaning of the data elements corresponding to that attribute

Example: The domain Date may be used to define two attributes named Invoice-date and Payment-date with different meanings

Formal Definitions - Summary

Formally,

Given R(A1, A2, ..., An):

R(A1, A2, ..., An) is the schema of the relation
R is the name of the relation
A1, A2, ..., An are the attributes of the relation
r(R): a specific state (or "value" or population) of relation R; this is a set of tuples (rows)

r(R) = {t1, t2, ..., tn}, where each ti is an n-tuple
ti = <v1, v2, ..., vn>, where each vj is an element of dom(Aj)


Definition Summary
Informal Terms                Formal Terms
Table                         Relation
Column Header                 Attribute
All possible Column Values    Domain
Row                           Tuple
Table Definition              Schema of a Relation
Populated Table               State of the Relation

Example A relation STUDENT


Characteristics Of Relations

Ordering of tuples in a relation r(R): The tuples are not considered to be ordered, even though they appear to be in the tabular form. Ordering of attributes in a relation schema R (and of values within each tuple): We will consider the attributes in R(A1, A2, ..., An) and the values in t=<v1, v2, ..., vn> to be ordered .

(However, a more general alternative definition of relation does not require this ordering).

Same state as previous Figure (but with different order of tuples)


Characteristics Of Relations

Values in a tuple:

All values are considered atomic (indivisible). Each value in a tuple must be from the domain of the attribute for that column

If t = <v1, v2, ..., vn> is a tuple (row) in the relation state r of R(A1, A2, ..., An), then each vi must be a value from dom(Ai)

A special null value is used to represent values that are unknown or inapplicable to certain tuples.

Relational Integrity Constraints

Constraints are conditions that must hold on all valid relation states. There are three main types of constraints in the relational model:

Key constraints Entity integrity constraints Referential integrity constraints Every value in a tuple must be from the domain of its attribute (or it could be null, if allowed for that attribute)

Another implicit constraint is the domain constraint


Key Constraints

If a relation has several candidate keys, one is chosen arbitrarily to be the primary key.

The primary key attributes are underlined.

Example: Consider the CAR relation schema:


CAR(State, Reg#, SerialNo, Make, Model, Year) We chose SerialNo as the primary key

The primary key value is used to uniquely identify each tuple in a relation

Provides the tuple identity


General rule: choose as primary key the smallest of the candidate keys (in terms of size). This is not always applicable; the choice is sometimes subjective.

Also used to reference the tuple from another tuple



CAR table with two candidate keys LicenseNumber chosen as Primary Key


Relational Database Schema

Relational Database Schema:

A set S of relation schemas that belong to the same database. S is the name of the whole database schema S = {R1, R2, ..., Rn} R1, R2, , Rn are the names of the individual relation schemas within the database S

Following slide shows a COMPANY database schema with 6 relation schemas


COMPANY Database Schema


Entity Integrity

Entity Integrity: The primary key attributes PK of each relation schema R in S cannot have null values in any tuple of r(R).

This is because primary key values are used to identify the individual tuples: t[PK] ≠ null for any tuple t in r(R). If PK has several attributes, null is not allowed in any of these attributes.

Note: Other attributes of R may be constrained to disallow null values, even though they are not members of the primary key.


Referential Integrity

A constraint involving two relations

The previous constraints involve a single relation.

Used to specify a relationship among tuples in two relations:

The referencing relation and the referenced relation.


Referential Integrity

Tuples in the referencing relation R1 have attributes FK (called foreign key attributes) that reference the primary key attributes PK of the referenced relation R2.

A tuple t1 in R1 is said to reference a tuple t2 in R2 if t1[FK] = t2[PK].

A referential integrity constraint can be displayed in a relational database schema as a directed arc from R1.FK to R2.


Referential Integrity (or foreign key) Constraint

Statement of the constraint

The value in the foreign key column (or columns) FK of the referencing relation R1 can be either:

(1) a value of an existing primary key PK in the referenced relation R2, or (2) null.

In case (2), the FK in R1 should not be a part of its own primary key.

Displaying a relational database schema and its constraints

Each relation schema can be displayed as a row of attribute names.
The name of the relation is written above the attribute names.
The primary key attribute (or attributes) will be underlined.
A foreign key (referential integrity) constraint is displayed as a directed arc (arrow) from the foreign key attributes to the referenced table.

The arc can also point to the primary key of the referenced relation for clarity.

Next slide shows the COMPANY relational schema diagram



Referential Integrity Constraints for COMPANY database


Populated database state

Each relation will have many tuples in its current relation state.
The relational database state is a union of all the individual relation states.
Whenever the database is changed, a new state arises.

Basic operations for changing the database:
INSERT a new tuple in a relation
DELETE an existing tuple from a relation
MODIFY an attribute of an existing tuple
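A minimal sketch of the three basic operations against the hypothetical EMPLOYEE table used earlier (all values are invented, and the insert assumes a department with Dnumber 5 already exists):

-- INSERT a new tuple into a relation.
INSERT INTO EMPLOYEE (Fname, Lname, Ssn, Salary, Dno)
VALUES ('Sara', 'Tesfaye', '987654321', 30000, 5);

-- MODIFY an attribute of an existing tuple.
UPDATE EMPLOYEE SET Salary = 32000 WHERE Ssn = '987654321';

-- DELETE an existing tuple from a relation.
DELETE FROM EMPLOYEE WHERE Ssn = '987654321';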

Next slide shows an example state for the COMPANY database



Populated database state for COMPANY


Update Operations on Relations


INSERT a tuple.
DELETE a tuple.
MODIFY a tuple.

Integrity constraints should not be violated by the update operations.
Several update operations may have to be grouped together.
Updates may propagate to cause other updates automatically. This may be necessary to maintain integrity constraints.

In case of integrity violation, several actions can be taken:

Cancel the operation that causes the violation (RESTRICT or REJECT option)
Perform the operation but inform the user of the violation
Trigger additional updates so the violation is corrected (CASCADE option, SET NULL option)
Execute a user-specified error-correction routine


Possible violations for each operation

INSERT may violate any of the constraints:

Domain constraint: if one of the attribute values provided for the new tuple is not of the specified attribute domain
Key constraint: if the value of a key attribute in the new tuple already exists in another tuple in the relation
Referential integrity: if a foreign key value in the new tuple references a primary key value that does not exist in the referenced relation
Entity integrity: if the primary key value is null in the new tuple
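A sketch of insertions that would each violate one of these constraints, using the hypothetical CAR, EMPLOYEE and DEPARTMENT tables from the earlier examples (each statement is shown independently and all values are invented):

-- Domain constraint: 'old' is not from the integer domain of Year.
INSERT INTO CAR (State, RegNo, SerialNo, Make, Model, Year)
VALUES ('AA', 'B-12345', 'SN-0001', 'Toyota', 'Corolla', 'old');

-- Key constraint: a tuple with SerialNo 'SN-0001' already exists.
INSERT INTO CAR (State, RegNo, SerialNo, Make, Model, Year)
VALUES ('AA', 'B-99999', 'SN-0001', 'Nissan', 'Sunny', 2005);

-- Referential integrity: no DEPARTMENT tuple has Dnumber = 99.
INSERT INTO EMPLOYEE (Fname, Lname, Ssn, Dno)
VALUES ('Kebede', 'Alemu', '111222333', 99);

-- Entity integrity: the primary key value (SerialNo) is null.
INSERT INTO CAR (State, RegNo, SerialNo, Make, Model, Year)
VALUES ('AA', 'B-55555', NULL, 'Ford', 'Focus', 2010);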


DELETE may violate only referential integrity:

If the primary key value of the tuple being deleted is referenced from other tuples in the database

Can be remedied by several actions: RESTRICT, CASCADE, SET NULL


RESTRICT option: reject the deletion
CASCADE option: propagate the deletion by also deleting the referencing tuples
SET NULL option: set the foreign keys of the referencing tuples to NULL

One of the above options must be specified during database design for each foreign key constraint
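A sketch of how one of these options is attached to the foreign key declaration, continuing the hypothetical COMPANY fragment (only one action can be chosen per constraint; the alternatives are shown as comments, and the DROP syntax varies slightly between DBMSs):

-- Replace the earlier constraint with one that states the delete action.
ALTER TABLE EMPLOYEE DROP CONSTRAINT fk_emp_dept;

ALTER TABLE EMPLOYEE
    ADD CONSTRAINT fk_emp_dept
    FOREIGN KEY (Dno) REFERENCES DEPARTMENT (Dnumber)
    ON DELETE SET NULL;    -- SET NULL: referencing employees keep their rows, Dno becomes null
    -- ON DELETE RESTRICT; -- alternative: reject the deletion
    -- ON DELETE CASCADE;  -- alternative: delete the referencing employees as well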


UPDATE may violate the domain constraint and the NOT NULL constraint on an attribute being modified. Any of the other constraints may also be violated, depending on the attribute being updated:

Updating the primary key (PK): similar to a DELETE followed by an INSERT; similar options to DELETE need to be specified
Updating a foreign key (FK): may violate referential integrity
Updating an ordinary attribute (neither PK nor FK): can only violate domain constraints
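A short sketch of the three cases, again against the hypothetical COMPANY fragment (all values are invented):

-- Ordinary attribute: can only violate a domain or NOT NULL constraint.
UPDATE EMPLOYEE SET Salary = 28000 WHERE Ssn = '123456789';

-- Foreign key: violates referential integrity if no DEPARTMENT has Dnumber = 7.
UPDATE EMPLOYEE SET Dno = 7 WHERE Ssn = '123456789';

-- Primary key: behaves like a DELETE followed by an INSERT, so the same
-- RESTRICT / CASCADE / SET NULL options apply to tuples that reference it.
UPDATE DEPARTMENT SET Dnumber = 10 WHERE Dnumber = 5;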


Fundamentals of Database Systems: Data Protection

CHAPTER SIX
Database Security and Integrity


A database represents an essential corporate resource that should be properly secured using appropriate controls. Database security encompasses hardware, software, people and data. In a multi-user database system, the DBMS must provide a database security and authorization subsystem to enforce limits on individual and group access rights and privileges.

Database security and integrity is about protecting the database from becoming inconsistent and from being disrupted. We can also call this database misuse. Database misuse can be intentional or accidental, where accidental misuse is easier to cope with than intentional misuse.

Accidental inconsistency could occur due to:
System crash during transaction processing
Anomalies due to concurrent access
Anomalies due to redundancy
Logical errors

Likewise, even though there are various threats that could be categorized in this group, intentional misuse could be:
Unauthorized reading of data
Unauthorized modification of data
Unauthorized destruction of data

Most systems implement good database integrity to protect the system from accidental misuse, while there are many computer-based measures to protect the system from intentional misuse; these are termed database security measures.


Database security is considered in relation to the following situations:
Theft and fraud
Loss of confidentiality (secrecy)
Loss of privacy
Loss of integrity
Loss of availability

Database security refers to the mechanisms that protect the database against intentional or accidental threats, and it encompasses hardware, software, people and data.

A threat is any situation or event, whether intentional or accidental, that may adversely affect a system and consequently the organization. A threat may be caused by a situation or event involving a person, action, or circumstance that is likely to bring harm to an organization. The harm may be tangible or intangible:
Tangible: loss of hardware, software, or data
Intangible: loss of credibility or client confidence

Examples of threats:
Using another person's means of access
Unauthorized amendment/modification or copying of data
Program alteration
Wire-tapping
Illegal entry by a hacker
Blackmail
Creating a trapdoor into the system
Theft of data, programs, and equipment
Failure of security mechanisms, giving greater access than normal
Inadequate staff training

Viewing and disclosing unauthorized data
Data corruption owing to power loss or surge
Fire (electrical fault, lightning strike, arson), flood, bomb
Physical damage to equipment
Breaking or disconnection of cables
Introduction of viruses

An organization needs to identify the types of threat it may be subjected to and initiate appropriate plans and countermeasures, bearing in mind the costs of implementing them.


Countermeasures: Computer-based controls


The types of countermeasure to threats on computer systems range from physical controls to administrative procedures. Despite the range of computer-based controls that are available, it is worth noting that, generally, the security of a DBMS is only as good as that of the operating system, owing to their close association. The following are computer-based security controls for a multi-user environment:

Authorization
Authorization is the granting of a right or privilege that enables a subject to have legitimate access to a system or a system's object. Authorization controls can be built into the software, and govern not only what system or object a specified user can access, but also what the user may do with it. Authorization controls are sometimes referred to as access controls. The process of authorization involves authentication of subjects (i.e. a user or program) requesting access to objects (i.e. a database table, view, procedure, trigger, or any other object that can be created within the system).

Views
A view is the dynamic result of one or more relational operations on the base relations to produce another relation. A view is a virtual relation that does not actually exist in the database, but is produced upon request by a particular user. The view mechanism provides a powerful and flexible security mechanism by hiding parts of the database from certain users. Using a view is more restrictive than simply having certain privileges granted to a user on the base relation(s); a short sketch of this idea follows.
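A minimal sketch of the view mechanism as a security control; the STAFF table, its columns and the payroll_clerk account are all invented for illustration:

-- Hide sensitive columns (for example Salary) behind a view.
CREATE VIEW STAFF_PUBLIC AS
    SELECT StaffNo, Fname, Lname, Branch
    FROM STAFF;

-- The clerk may query the view but receives no privilege on the base table.
GRANT SELECT ON STAFF_PUBLIC TO payroll_clerk;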


Backup and recovery
Backup is the process of periodically taking a copy of the database and log file (and possibly programs) on to offline storage media. A DBMS should provide backup facilities to assist with the recovery of a database following failure. Database recovery is the process of restoring the database to a correct state in the event of a failure. Journaling is the process of keeping and maintaining a log file (or journal) of all changes made to the database, to enable recovery to be undertaken effectively in the event of a failure. The advantage of journaling is that, in the event of a failure, the database can be recovered to its last known consistent state using a backup copy of the database and the information contained in the log file. If no journaling is enabled on a failed system, the only means of recovery is to restore the database using the latest backup version; however, without a log file, any changes made after the last backup will be lost.

Integrity
Integrity constraints contribute to maintaining a secure database system by preventing data from becoming invalid and hence giving misleading or incorrect results. They include domain integrity, entity integrity, referential integrity and key constraints.

Encryption
Encryption is the encoding of the data by a special algorithm that renders the data unreadable by any program without the decryption key. If a database system holds particularly sensitive data, it may be deemed necessary to encode it as a precaution against possible external threats or attempts to access it. The DBMS can access the data after decoding it, although there is a degradation in performance because of the time taken to decode it. Encryption also protects data transmitted over communication lines.

Levels of Security Measures


Security measures can be implemented at several levels and for different components of the system. These levels are:

1. Physical Level: concerned with physically securing the site containing the computer system. The backup systems should also be physically protected from access except for authorized users.
2. Human Level: concerned with authorization of database users for accessing the content at different levels and privileges.
3. Operating System: concerned with the weakness and strength of the operating system's security on data files. A weakness may serve as a means of unauthorized access to the database. This also includes protection of data in primary and secondary memory from unauthorized access.
4. Database System: concerned with data access limits enforced by the database system, such as passwords and isolated transactions.

Even though we can have different levels of security and authorization on data objects and users, who accesses which data is a policy matter rather than a technical one. These policies:
should be known by the system, i.e. encoded in the system
should be remembered, i.e. saved somewhere (the catalogue)

Any database access request will have the following three major components:
1. Requested Operation: what kind of operation is requested by a specific query?
2. Requested Object: on which resource or data of the database is the operation to be applied?
3. Requesting User: who is the user requesting the operation on the specified object?

The database should be able to check all three components before processing any request. The checking is performed by the security subsystem of the DBMS.

Authentication
All users of the database will have different access levels and permissions for different data objects, and authentication is the process of checking whether the user is the one with the privilege for the access level. In other words, it is the process of checking that users are who they say they are. Each user is given a unique identifier, which is used by the operating system to determine who they are. Associated with each identifier is a password, chosen by the user and known to the operating system, which must be supplied to enable the operating system to authenticate who the user claims to be. Thus the system will check whether the user with a specific username and password is the one trying to use the resource.

Forms of user authorization
There are different forms of user authorization on the resources of the database. These forms are privileges on what operations are allowed on a specific data object. User authorization on the data/extension:
1. Read Authorization: the user with this privilege is allowed only to read the content of the data object.
2. Insert Authorization: the user with this privilege is allowed only to insert new records or items into the data object.
3. Update Authorization: users with this privilege are allowed to modify the content of attributes but are not authorized to delete records.
4. Delete Authorization: users with this privilege are only allowed to delete records and nothing else.
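In standard SQL these four forms correspond roughly to the SELECT, INSERT, UPDATE and DELETE object privileges. A minimal sketch, with the user names invented and the EMPLOYEE table carried over from the earlier examples:

GRANT SELECT ON EMPLOYEE TO report_user;  -- read authorization
GRANT INSERT ON EMPLOYEE TO data_entry;   -- insert authorization
GRANT UPDATE ON EMPLOYEE TO hr_officer;   -- update authorization
GRANT DELETE ON EMPLOYEE TO hr_manager;   -- delete authorization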


Different users, depending on their power, can have one or a combination of the above forms of authorization on different data objects.

Role of the DBA in Database Security
The database administrator is responsible for making the database as secure as possible. For this, the DBA should have more powerful privileges than every other user. The DBA provides capabilities for database users when they access the content of the database. The major responsibilities of the DBA in relation to authorization of users are listed below; a short sketch of the corresponding SQL statements follows the list.
1. Account Creation: involves creating different accounts for different users as well as user groups.
2. Security Level Assignment: involves assigning different users to different categories of access levels.
3. Privilege Grant: involves giving different levels of privileges to different users and user groups.
4. Privilege Revocation: involves denying or canceling previously granted privileges of users for various reasons.
5. Account Deletion: involves deleting an existing account of a user or user group. This is similar to denying all privileges of the user on the database.
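As a hedged illustration of these responsibilities (the account name and password are invented, CREATE USER syntax differs between DBMSs, and security level assignment is typically handled through roles or labels rather than a single statement):

-- 1. Account creation
CREATE USER clerk1 IDENTIFIED BY 'StrongPassword1';

-- 3. Privilege grant
GRANT SELECT, INSERT ON EMPLOYEE TO clerk1;

-- 4. Privilege revocation
REVOKE INSERT ON EMPLOYEE FROM clerk1;

-- 5. Account deletion (implicitly removes the user's remaining privileges)
DROP USER clerk1;

In practice only the DBA account is allowed to issue statements such as these, which is what makes it more powerful than ordinary user accounts.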
