
Notes By : Rajat Sharma

Unit 1

Data Base Management System


BBA(CAM)

Introduction to Database Systems

Database: a collection of inter-related data stored on some physical medium (even paper). A database represents some aspect of the real world: it is a model. It is developed for a specific purpose (an intended group of users and the applications those users are interested in). A random assortment of data therefore cannot correctly be called a database; a database is a coherent collection of data with some inherent meaning.

Examples:
- A personal telephone directory (perhaps a hundred items, with a simple structure)
- A library catalogue: thousands of cards, stored under different categories (author, subject, title, year of publication, etc.)
- Public-service databases (e.g. income tax offices or the NHS), with hundreds of millions of items

Databases can be generated and maintained manually or automatically.

A DBMS can be:
- Special purpose: the programmer writes a set of programs to create and maintain the database needed by one particular application
- General purpose: a software system that facilitates the management of databases for various applications

So we can highlight the DBMS in points:
- A collection of interrelated data
- A set of programs to access the data
- It contains information about a particular enterprise
- It provides an environment that is both convenient and efficient to use

Basic Definitions
- Database: a collection of related data.
- Data: known facts that can be recorded and have an implicit meaning.
- Database management system (DBMS): a software package/system that facilitates the creation and maintenance of a computerized database.
- Database system: the DBMS software together with the data itself. Sometimes the applications are also included.

File System versus a DBMS

A Data Management System (DMS) is a combination of computer software, hardware, and information designed to electronically manipulate data via computer processing. Two types of data management systems are DBMSs and FMSs. In simple terms, a File Management System (FMS) is a data management system that allows access to a single file or table at a time. FMSs accommodate flat files that have no relation to other files. The FMS was the predecessor of the Database Management System (DBMS), which allows access to multiple files or tables at a time.

File Management Systems

Advantages:
- Simpler to use
- Less expensive
- Fits the needs of many small businesses and home users
- Popular FMSs are packaged along with the operating systems of personal computers (e.g. Microsoft Office)
- Good for database solutions for hand-held devices such as the Palm Pilot

Disadvantages:
- Typically does not support multi-user access
- Limited to smaller databases
- Limited functionality (i.e. no support for complicated transactions, recovery, etc.)
- Decentralization of data
- Redundancy and integrity issues

Database Management Systems

Advantages:
- Greater flexibility
- Good for larger databases
- Greater processing power
- Fits the needs of many medium to large-sized organizations
- Storage for all relevant data
- Ensures data integrity by managing transactions (ACID test = atomicity, consistency, isolation, durability)
- Supports simultaneous access
- Provides backup and recovery controls
- Advanced security

Disadvantages:
- Difficult to learn
- Packaged separately from the operating system (e.g. Oracle, Microsoft Access)
- Slower processing speeds
- Requires skilled administrators
- Expensive
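
The atomicity part of the ACID test can be sketched with Python's built-in sqlite3 module. The account table, the account numbers, and the transfer function below are hypothetical and chosen only for illustration; this is a minimal sketch of all-or-nothing behaviour, not a description of how any particular DBMS implements transactions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " account_number TEXT PRIMARY KEY NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A-101", 500), ("A-102", 100)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move money between two accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_number = ?",
                (amount, target),
            )
            # If this debit violates the CHECK constraint, the credit above is undone too.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# A-102 holds only 100, so a transfer of 300 must fail as a whole.
ok = transfer(conn, "A-102", "A-101", 300)
balances = dict(conn.execute("SELECT account_number, balance FROM account"))
```

After the failed transfer both balances are unchanged: the credit to A-101 does not survive without the matching debit from A-102.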

Requirement of databases

The following questions decide the requirements of a database:
- For which company is the database being designed, and what is the working profile of the company?
- What groups of users will use the system?
- Do they need different levels of access to each part of the system (e.g. administration, manager, clerical, read-only)?
- What data will members of each group input into the system, and where will it come from?
- What information will each type of user need to extract from the system?
- What types of reports will be needed by each type of user?

Characteristics of a DBMS
- Data redundancy is reduced
- Data consistency can be maintained
- The data and the mechanism to use them are independent
- A logical conceptualisation of data is enforced, leading to a more meaningful organisation
- Complex relations among data may be represented
- Users can have a personalised view of data
- Data sharing is easier
- Security can be implemented more easily, e.g. by restricting unauthorized access

People who deal with databases

Users are differentiated by the way they expect to interact with the system:
- Application programmers interact with the system through DML calls
- Sophisticated users form requests in a database query language
- Specialized users write specialized database applications that do not fit into the traditional data processing framework
- Naive users invoke one of the permanent application programs that have been written previously, e.g. people accessing a database over the web, bank tellers, and clerical staff
- Database administrator: coordinates all the activities of the database system; the database administrator has a good understanding of the enterprise's information resources and needs

Introduction to Data Models

A DBMS provides users with a conceptual representation of data, without technical details. A file processing application is interested in how long each file is, the length of its records, its position in memory, etc., whereas a user should only be concerned with names of things, references, etc.

A data model is a type of data abstraction used to provide a conceptual representation of the data. It provides a logical and structured organization of the data:
- Data is easier to manage, define and manipulate
- It allows a separation of the physical and logical organization of data

The Importance of Data Models
- Good database design uses an appropriate data model as its foundation
- End-users have different views of and needs for data
- A data model organizes data for various users

Data Models
- E-R Data Model*
- Relational Data Model*
- Network Data Model
- Hierarchical Data Model
- Object-oriented Data Model
- Semantic Data Model

Network Model: Schemas and instances

Schema: The schema is a program written by the DBA using DDL statements. It describes the structure of the database, defining all record types, set types, areas, and data items in the database. The DBA writes the schema independently of any application run unit. Only one schema can exist for a database.

Subschema: The subschema is a subset of the schema; it is the users' view of the database.

Instances: The term instance is typically used to describe a complete database environment at a particular moment. A database keeps changing over time through modifications, insertions, deletions, etc., so the database as it stands at a given moment is called an instance of the database.

Architecture of a DBMS

- Two-tier architecture: e.g. client programs using ODBC/JDBC to communicate with a database
- Three-tier architecture: e.g. web-based applications, and applications built using middleware

Levels of Abstraction

Physical level: describes how a record (e.g. customer) is stored.

Logical level: describes what data is stored in the database, and the relationships among the data.

    type customer = record
        name : string;
        street : string;
        city : integer;
    end;

View level: application programs hide details of data types. Views can also hide information (e.g. salary) for security purposes.

Schemas are defined using DDL; data is modified/queried using DML. Data viewed is controlled by DCL.
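
The view level can be sketched with Python's built-in sqlite3 module: a view exposes employee names while leaving the salary column out. The employee table, its columns, and the view name employee_public are hypothetical. This is a minimal sketch; SQLite has no user accounts, so in a multi-user DBMS the view would be combined with DCL permissions that deny direct access to the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (              -- logical level: what data is stored
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  INTEGER NOT NULL
);
INSERT INTO employee VALUES (1, 'Asha', 52000);
INSERT INTO employee VALUES (2, 'Ravi', 48000);

-- view level: expose identifiers and names only, leaving salary out
CREATE VIEW employee_public AS
    SELECT emp_id, name FROM employee;
""")

cursor = conn.execute("SELECT * FROM employee_public ORDER BY emp_id")
visible_columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
```

A program that reads only employee_public sees two columns per row and never learns that a salary column exists.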

Data Independence
- Logical data independence: the ability to change the logical schema without affecting the view schemas (and the application programs built on them).
- Physical data independence: the ability to change the physical schema without affecting the logical schema.

Database Utilities

Typical areas in which databases are used:
- Banking: all transactions
- Airlines: reservations, schedules
- Universities: registration, grades
- Sales: customers, products, purchases
- Manufacturing: production, inventory, orders, supply chain
- Human resources: employee records, salaries, tax deductions
- Hospitals: reservations, patients, records

CODD'S RULES TO CONVERT DBMS TO RDBMS

The relational DBMS is based on the relational model devised by E. F. Codd. Codd set out the requirements for a relational DBMS in 12 rules, popularly known as Codd's 12 rules.

(1) Information Rule: All information in a relational database, including table names and column names, is represented explicitly by values in tables. Knowledge of only one language is necessary to access all data, such as descriptions of the tables and attribute definitions, integrity constraints, actions to be taken when constraints are violated, and security information.

(2) Guaranteed Access Rule: Every piece of data in the relational database can be accessed by using a combination of a table name, a primary key value that identifies the row, and a column name that identifies the cell. User productivity is improved since there is no need to resort to physical pointers or addresses. This provides data independence.

(3) Systematic Treatment of Nulls Rule: The RDBMS handles records that have unknown or inapplicable values in a predefined fashion. The RDBMS distinguishes between zeros, blanks and nulls in records and handles such values in a consistent manner that produces correct answers, comparisons and calculations.

(4) Active On-Line Catalog Based on the Relational Model: The description of a database and its contents is itself stored in database tables and can therefore be queried online via the data language. The DBA's productivity is improved since changes and additions to the catalog can be made with the same commands that are used to access any other table. All queries and reports can be done as with other tables.

(5) Comprehensive Data Sublanguage Rule: An RDBMS may support several languages, but at least one of them allows the user to do all of the following: define tables and views, query and update data, set integrity constraints, set authorization, and define transactions.

(6) View Updating Rule: Any view that is theoretically updatable can be updated by the system, provided changes can be made to the underlying tables that produce the desired changes in the view. Data consistency is ensured since changes in the underlying tables are transmitted to the views they support. Logical data independence reduces maintenance cost.

(7) High-Level Insert, Update and Delete: The RDBMS supports insertion, update and deletion at the table level. With this the RDBMS can improve performance by optimizing the path taken to execute the action. Ease of use is improved since commands act on sets of records.

(8) Physical Data Independence: The execution of ad hoc requests and application programs is not affected by changes in the physical data access and storage methods. Database administrators can change the physical access and storage methods to improve performance without requiring changes in the application programs or ad hoc requests. This reduces maintenance costs.

(9) Logical Data Independence: Logical changes in tables and views, such as adding/deleting columns or changing field lengths, do not necessitate modifications in application programs or in the format of ad hoc requests.

(10) Integrity Independence: Like table/view definitions, integrity constraints are stored in the on-line catalog and can therefore be changed without necessitating changes in application programs or in the format of ad hoc requests. The following two integrity constraints must be supported:
(a) Entity integrity: No component of a primary key is allowed to have a null value.
(b) Referential integrity: For each distinct non-null foreign key value in a relational database, there must exist a matching primary key value from the same domain.

(11) Distribution Independence: Application programs and ad hoc requests are not affected by changes in the distribution of the physical data.

(12) Nonsubversion Rule: If the RDBMS has a low-level language that accesses the information one record at a time, this language cannot be used to bypass the integrity constraints. In order to adhere to this rule the RDBMS must have an active catalog that contains the constraints, and must have logical data independence.
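
Rule 4 (the active on-line catalog) can be illustrated with Python's built-in sqlite3 module. The customer and purchase tables are hypothetical, and sqlite_master and PRAGMA table_info are specific to SQLite; other systems expose their catalog under different names. This is only a sketch of the idea that the description of the database is queried with the same language as the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE purchase (purchase_id INTEGER PRIMARY KEY, customer_id INTEGER)")

# The catalog is itself a table, so it is read with an ordinary SELECT.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# Column descriptions are also available as rows: (cid, name, type, notnull, default, pk).
customer_columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
```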

Normalization (covered in the last unit)

Unit 2
Relational Model

"Relation" is a mathematical term for "table", and thus "relational" roughly means "based on tables". A table may represent an entity or a relationship from an entity-relationship diagram. The header represents the metadata about what is represented by the table. The rows are also called tuples; each tuple represents one occurrence of the relation. The columns are the attributes of the relation.

Domain

Each attribute is associated with a set of values that it is permitted to take. This set is called the domain of the attribute. The values in a simple domain are atomic: basic elements that can't be broken down further. In practice, the domain corresponds to the data type of a column and defines all of the attribute's possible values.

[Figure: an example of the relational model]

Primary key

The primary key is a unique identifier of a tuple (record). There may be more than one field or combination of fields that are unique identifiers. Any combination of attributes is a candidate key if:
1. It is a unique identifier; and
2. It contains the minimum number of fields necessary to make a unique key.

One of the candidate keys is chosen to be the primary key; the others are called alternate keys or secondary keys.
Example: Social Security Number, Vehicle Identification Number

Secondary key

Keys in the table that are not the primary key are called secondary keys.
Example: Phone Number, in a CUSTOMER entity keyed by CustomerID

Foreign key

The idea of a foreign key is to identify a reference to one tuple, by its primary key, made inside another tuple.
Example: SalesPersonID, in a PURCHASE entity keyed by PurchaseID

A foreign key is an attribute or combination of attributes in relation R2 whose values must match primary key values in R1. When referring to an R1 tuple by its primary key from R2, you must be sure that the tuple exists in R1.

Super key

A super key of an entity set is a set of one or more attributes whose values uniquely determine each entity.

Composite attribute

An attribute that can be broken down into multiple attributes.
Example: "5000 Forbes Avenue, Pittsburgh, PA 15213" can be broken into at least 4 attributes: street address, city, state, and zip.

Candidate key

Any attribute or set of attributes that uniquely identifies each record. The key needs to be irreducible, meaning that you cannot remove any attribute from it without losing the ability to identify all records uniquely.
Example: The composite key made up of SSN, Lname, Fname is reducible to SSN alone, so SSN is a candidate key.

Unique key

Ensures that a value appears in a column only once. Unique constraints are specified using the UNIQUE option.

Not null

A primary key column cannot be left empty: every row must have a value in it. A foreign key column may be null unless it is declared NOT NULL (or is part of the primary key). Any other column can also be declared NOT NULL.

Constraints

Constraints guard against accidental damage to the database by ensuring that authorized changes to the database do not result in a loss of data consistency. Domain constraints, discussed above, are the most elementary form of constraint.
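
The key and not-null ideas above can be sketched with Python's built-in sqlite3 module. The customer and purchase tables, their columns, and the rejected helper are hypothetical names used only for illustration. Two SQLite-specific assumptions are labelled in the comments: foreign keys are checked only after PRAGMA foreign_keys = ON, and NOT NULL is written explicitly on the key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY NOT NULL,   -- primary key
    ssn         TEXT NOT NULL UNIQUE,           -- alternate (candidate) key
    phone       TEXT                            -- ordinary attribute, may be null
);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY NOT NULL,
    customer_id INTEGER NOT NULL
        REFERENCES customer (customer_id)       -- foreign key into customer
);
INSERT INTO customer VALUES (1, '123-45-6789', '555-0100');
""")


def rejected(sql):
    """Return True if the DBMS refuses the statement because a constraint is violated."""
    try:
        with conn:  # commit on success, roll back on failure
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_key = rejected("INSERT INTO customer VALUES (1, '999-99-9999', NULL)")
duplicate_ssn = rejected("INSERT INTO customer VALUES (2, '123-45-6789', NULL)")
missing_ssn = rejected("INSERT INTO customer VALUES (3, NULL, NULL)")
dangling_ref = rejected("INSERT INTO purchase VALUES (10, 42)")  # no customer 42
valid_ref = rejected("INSERT INTO purchase VALUES (11, 1)")      # customer 1 exists
```

The first four statements are refused (duplicate primary key, duplicate unique value, null in a NOT NULL column, foreign key with no matching primary key); only the last one is accepted.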


Check constraint

Ensures that the contents of a column fulfil user-specified criteria (for example, salary > 0). Check constraints are specified using the CHECK option. The check clause also permits domains to be restricted: for example, use a check clause to ensure that an hourly-wage domain allows only values greater than a specified value.

Referential integrity

Referential integrity ensures that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes in another relation. In other words, every value present in the foreign key must also be present in the primary key it refers to.

For example, suppose Table B has a foreign key that points to a field in Table A. Referential integrity would prevent you from adding a record to Table B that cannot be linked to Table A. In addition, the referential integrity rules might specify that whenever you delete a record from Table A, any records in Table B that are linked to the deleted record are also deleted. This is called a cascading delete. Finally, the referential integrity rules could specify that whenever you modify the value of a linked field in Table A, all records in Table B that are linked to it are modified accordingly. This is called a cascading update.

Trigger

A trigger is a statement that is executed automatically by the system as a side effect of a modification to the database. To design a trigger mechanism, we must:
- Specify the conditions under which the trigger is to be executed
- Specify the actions to be taken when the trigger executes

Example: suppose that instead of allowing negative account balances, the bank deals with overdrafts by:
- Setting the account balance to zero
- Creating a loan in the amount of the overdraft
- Giving this loan a loan number identical to the account number of the overdrawn account

Here the condition is an update that leaves a negative balance, and the three steps are the actions.

Forms of authorization on parts of the database
- Read authorization: allows reading, but not modification, of data
- Insert authorization: allows insertion of new data, but not modification of existing data
- Update authorization: allows modification, but not deletion, of data
- Delete authorization: allows deletion of data

SQL Data Types

Each DBMS defines its own SQL types, such as varchar, char, number, float, real, date, integer, etc.
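
The check constraint, the cascading delete, and the overdraft trigger described above can be sketched together using Python's built-in sqlite3 module. The table layouts, the depositor table, the trigger name overdraft, and the sample values are assumptions made for illustration. Trigger syntax differs between systems, so this is a sketch in SQLite's dialect rather than a portable definition; it also assumes no loan with the same number already exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed in SQLite for ON DELETE CASCADE
conn.executescript("""
CREATE TABLE account (
    account_number TEXT PRIMARY KEY NOT NULL,
    balance        INTEGER NOT NULL
);
CREATE TABLE loan (
    loan_number TEXT PRIMARY KEY NOT NULL,
    amount      INTEGER NOT NULL CHECK (amount > 0)             -- check constraint
);
CREATE TABLE depositor (
    customer_name  TEXT NOT NULL,
    account_number TEXT NOT NULL
        REFERENCES account (account_number) ON DELETE CASCADE   -- cascading delete
);

-- Condition: an update leaves a negative balance.
-- Actions: create a loan for the overdraft, then reset the balance to zero.
CREATE TRIGGER overdraft AFTER UPDATE OF balance ON account
WHEN NEW.balance < 0
BEGIN
    INSERT INTO loan (loan_number, amount)
        VALUES (NEW.account_number, -NEW.balance);
    UPDATE account SET balance = 0
        WHERE account_number = NEW.account_number;
END;

INSERT INTO account VALUES ('A-102', 100);
INSERT INTO depositor VALUES ('Hayes', 'A-102');
""")

# Withdrawing 150 from a balance of 100 fires the trigger.
conn.execute("UPDATE account SET balance = balance - 150 WHERE account_number = 'A-102'")
balance = conn.execute("SELECT balance FROM account").fetchone()[0]
loan = conn.execute("SELECT loan_number, amount FROM loan").fetchone()

# Deleting the account removes the linked depositor row as well.
conn.execute("DELETE FROM account WHERE account_number = 'A-102'")
depositors_left = conn.execute("SELECT COUNT(*) FROM depositor").fetchone()[0]
```

After the update the balance is 0 and a loan of 50 numbered A-102 exists; after the delete no depositor rows remain.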


DDL (data definition language)
- Specification notation for defining the database schema, e.g.
      create table account (account-number char(10), balance integer)
- The DDL compiler generates a set of tables stored in a data dictionary
- The data dictionary contains metadata (i.e. data about data), including the database schema

Data storage and definition language
- The language in which the storage structure and access methods used by the database system are specified
- Usually an extension of the data definition language

DML (data manipulation language)
- Language for accessing and manipulating the data organized by the appropriate data model
- DML is also known as a query language
- Two classes of languages:
  o Procedural: the user specifies what data is required and how to get those data
  o Nonprocedural: the user specifies what data is required without specifying how to get those data
- SQL is the most widely used query language

DCL (data control language)

DCL statements are used to control access permissions on the tables, indexes, views and other elements of the DBMS. DCL handles the authorization aspects of data and lets the user control who can see or manipulate data within the database. The DCL statements are:
- GRANT: used to grant privileges to other users or roles
- REVOKE: used to take back privileges granted to other users and roles

Indexing

Indexes allow a DBMS to access data more quickly (please note: this feature is nonstandard and not available on all systems). The system creates an internal data structure (the index) that makes the selection of rows faster when the selection is based on indexed columns. With indexing, we are concerned with finding the data we actually want quickly and efficiently, without having to request and read more disk blocks than absolutely necessary.

A database index is a data structure that improves the speed of operations on a database table. Indices can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access to ordered records. The disk space required to store the index is typically less than that required by the table (since indices usually contain only the key fields according to which the table is to be arranged, and exclude all the other details in the table), which makes it possible to keep an index in memory that would be too small for the entire table. In a relational database an index is a copy of part of a table. Some databases extend the power of indexing by allowing indices to be created on functions or expressions.

Index architectures can be classified as clustered or non-clustered. A non-clustered index normally contains a reference to the block that contains the row data for which the particular index item has been constructed. Clustering re-orders the data blocks in the same order as the index, so it is an operation on the data storage blocks as well as on the index. The exact operation of database systems varies, but because the row data can be physically stored in only one order, only one clustered index may be created on a given database table. Clustered indices can greatly increase access speed, but usually only where the data is accessed sequentially in the same or reverse order of the clustered index, or when a range of items is selected.
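
The DDL, DML, and indexing ideas above can be tried out with Python's built-in sqlite3 module. The sketch reuses the account table from the DDL example, with account_number written with an underscore because a hyphen is not valid in an unquoted SQL name; the index name idx_account_number, the generated rows, and the plan helper are assumptions. EXPLAIN QUERY PLAN is specific to SQLite, and SQLite has no GRANT or REVOKE, so DCL is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema.
conn.execute("CREATE TABLE account (account_number CHAR(10), balance INTEGER)")

# DML: insert data, then query it.
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [(f"A-{n}", n * 10) for n in range(1000)],
)
query = "SELECT * FROM account WHERE account_number = 'A-500'"
result = conn.execute(query).fetchall()


def plan(sql):
    """Return SQLite's description of how it would execute the statement."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_without_index = plan(query)   # every row of the table has to be examined
conn.execute("CREATE INDEX idx_account_number ON account (account_number)")
plan_with_index = plan(query)      # the planner now names the index it searches
```

The query returns the same row before and after the index is created; only the access path reported by the planner changes.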

Basic relational algebra operations, additional relational operations, SQL queries, sub-queries, working with views (all the SQL part is covered in other notes)


Unit 3
E-R Model


The E-R model is a database model that describes the attributes of entities and the relationships among them. An entity corresponds to a file (table). Today, ER models are often created graphically, and software converts the graphical representations of the tables into the SQL code required to create the data structures in the database.

In ER modeling, the structure of a database is portrayed as a diagram, called an entity-relationship diagram (or ER diagram), that resembles the graphical breakdown of a sentence into its grammatical parts. Entities are rendered as points, polygons, circles, or ovals. Relationships are portrayed as lines connecting the points, polygons, circles, or ovals. Any ER diagram has an equivalent relational table, and any relational table has an equivalent ER diagram.

[Figure: E-R diagram symbols]

Topics for this unit:
- Entity Relationship Model: overview of database design; entities, attributes, and entity sets; relationships and relationship sets; additional features of the ER model; conceptual database design with the ER model; entity versus attribute; entity versus relationship
- Relational model: introduction to the relational model; foreign key constraints; enforcing integrity constraints; querying relational data; logical database design: ER to relational; introduction to views; destroying/altering tables and views; Codd rules


Entity

An entity is an object that exists and is distinguishable from other objects. Example: a specific person, company, event, plant. An entity set that does not have a primary key is referred to as a weak entity set.

Attributes

An entity is represented by a set of attributes, that is, descriptive properties possessed by all members of an entity set.

Domain: the set of permitted values for each attribute.

Attribute types:
- Simple and composite attributes
- Single-valued and multi-valued attributes, e.g. multi-valued attribute: phone-numbers
- Derived attributes: can be computed from other attributes, e.g. age, given date of birth
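
The attribute types listed above can be sketched in Python. The Person class, its field names, and the sample values are hypothetical; the point is only that a derived attribute such as age is computed from a stored attribute instead of being stored itself. The reference date is passed in explicitly so that the result does not depend on the day the code is run.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Person:
    name: str                       # simple, single-valued attribute
    date_of_birth: date             # stored attribute
    phone_numbers: list = field(default_factory=list)  # multi-valued attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth rather than stored."""
        birthday_reached = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if birthday_reached else 1)


person = Person("Hayes", date(1990, 6, 15), ["555-0100", "555-0101"])
age_before_birthday = person.age(date(2024, 6, 14))
age_on_birthday = person.age(date(2024, 6, 15))
```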


Mapping Cardinalities

Mapping cardinalities express the number of entities to which another entity can be associated via a relationship set. They are most useful in describing binary relationship sets. For a binary relationship set the mapping cardinality must be one of the following types:
- One to one
- One to many
- Many to one
- Many to many
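
The four types can be made concrete with a small Python sketch that inspects the pairs of a binary relationship set between entity sets A and B. The function name and the sample pairs are hypothetical. Note the assumption: a set of pairs can only show the tightest cardinality consistent with the data at hand, whereas the real mapping cardinality is a design decision about which pairs are allowed at all.

```python
from collections import defaultdict


def mapping_cardinality(pairs):
    """Classify a binary relationship set given as (a, b) pairs, read from A to B."""
    a_to_b = defaultdict(set)
    b_to_a = defaultdict(set)
    for a, b in pairs:
        a_to_b[a].add(b)
        b_to_a[b].add(a)
    a_has_many = any(len(bs) > 1 for bs in a_to_b.values())  # some A linked to several B
    b_has_many = any(len(as_) > 1 for as_ in b_to_a.values())  # some B linked to several A
    if a_has_many and b_has_many:
        return "many to many"
    if a_has_many:
        return "one to many"
    if b_has_many:
        return "many to one"
    return "one to one"


one_to_one = mapping_cardinality([("c1", "a1"), ("c2", "a2")])
one_to_many = mapping_cardinality([("c1", "l1"), ("c1", "l2"), ("c2", "l3")])
many_to_one = mapping_cardinality([("l1", "c1"), ("l2", "c1"), ("l3", "c2")])
many_to_many = mapping_cardinality([("c1", "l1"), ("c1", "l2"), ("c2", "l1")])
```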

Specialization

A top-down design process: we designate subgroupings within an entity set that are distinctive from other entities in the set. These subgroupings become lower-level entity sets that have attributes or participate in relationships that do not apply to the higher-level entity set.

Generalization

A bottom-up design process: we combine a number of entity sets that share the same features into a higher-level entity set.

Specialization and generalization are simple inversions of each other; they are represented in an E-R diagram in the same way.


Query by Example (QBE)

Query by Example (QBE) is a method of creating database queries using examples based on a text string, the name of a document, or a list of documents. The QBE system converts the user input into a formal database query. This approach allows the user to perform powerful searches without having to learn a more formal query mechanism such as Structured Query Language (SQL).

QBE is a language for querying relational data (and, like SQL, also for inserting and updating data). The user creates a query by creating example tables, hence the name.

Basic idea: the user formulates the query by entering an example of a possible answer in the appropriate place in an empty table.

QBE vs. SQL: difference in approach
- SQL query: you describe to the DBMS how to get the data you want through the SQL query syntax.
- QBE query: you describe the data itself through example tables.
- How to get? (SQL) vs. What to get? (QBE)

The QBE language involves relatively few concepts, so a user needs minimal information to get started. QBE is especially suited to queries that are not too complex and can be expressed in terms of a few tables.
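
The conversion of an example table into a formal query can be sketched in Python. This is a toy illustration only: the function qbe_to_sql, the account table, and the sample rows are hypothetical, the example row is limited to equality conditions, and the sketch does not reproduce the actual QBE language or any real QBE product.

```python
import sqlite3


def qbe_to_sql(table, columns, example):
    """Turn a partly filled example row into a parameterised SELECT statement.

    `example` maps a column name to the value typed into that column of the
    empty table; columns left blank place no condition on the result.
    """
    unknown = set(example) - set(columns)
    if unknown:
        raise ValueError(f"unknown columns in example: {sorted(unknown)}")
    filled = [col for col in columns if col in example]
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if filled:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filled)
    return sql, [example[col] for col in filled]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [("A-101", "Downtown", 500), ("A-102", "Perryridge", 400), ("A-215", "Perryridge", 700)],
)

columns = ["account_number", "branch_name", "balance"]
# The user types "Perryridge" under branch_name and leaves the other columns blank.
sql, params = qbe_to_sql("account", columns, {"branch_name": "Perryridge"})
matches = conn.execute(sql + " ORDER BY account_number", params).fetchall()
```

The user only fills in a cell of the example row; the generated SQL text is an implementation detail that the user never has to write.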

Domain relational calculus, introduction to RDBMS packages (Oracle, SQL Server) (to be covered from the books in class)


Unit 4


Schema Refinement & Normal Forms

Introduction to schema refinement

Relational database design requires that we find a good collection of relation schemas. A bad design may lead to:
- Repetition of information
- Inability to represent certain information

Design goals:
- Avoid redundant data
- Ensure that relationships among attributes are represented
- Facilitate the checking of updates for violation of database integrity constraints

Functional dependencies

Consider a relation R that has two attributes A and B. The attribute B is functionally dependent on the attribute A if and only if each value of A is associated with no more than one value of B. In other words, the value of attribute A uniquely determines the value of attribute B, written R.A -> R.B. For example, the social security number uniquely determines a name: SSN -> Name.

Examples motivating schema refinement
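
The definition of a functional dependency can be checked mechanically against a set of rows. The sketch below is in Python; the function name holds and the sample rows are hypothetical. One caveat is an assumption worth stating: sample data can only refute a dependency, since a dependency is a statement about every legal instance of the relation, not just the rows at hand.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent is not violated by the given rows."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in determinant)
        value = tuple(row[attr] for attr in dependent)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False  # the same determinant value maps to two different values
    return True


people = [
    {"ssn": "111", "name": "Hayes", "city": "Harrison"},
    {"ssn": "222", "name": "Hayes", "city": "Rye"},
    {"ssn": "111", "name": "Hayes", "city": "Harrison"},
]

ssn_determines_name = holds(people, ["ssn"], ["name"])
name_determines_ssn = holds(people, ["name"], ["ssn"])
```

Here SSN -> Name is consistent with the rows, while Name -> SSN is violated because two different SSNs share the name Hayes.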

Consider the relation schema: Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)

Redundancy: data for branch-name, branch-city, and assets are repeated for each loan that a branch makes. This:
- Wastes space
- Complicates updating, introducing the possibility of inconsistent assets values

Decompositions

Decompose the relation schema Lending-schema into:
    Branch-schema = (branch-name, branch-city, assets)
    Loan-info-schema = (customer-name, loan-number, branch-name, amount)

All attributes of the original schema (R) must appear in the decomposition (R1, R2): R = R1 ∪ R2

Normalization

Normalization is a technique of database design that suggests certain criteria be used when constructing a table layout (deciding what columns each table will have, and creating the key structure), where the idea is to eliminate redundancy of non-key data across tables. Normalization is usually referred to in terms of forms.
- First Normal Form refers to moving data into separate tables where the data in each table is of a similar type, and giving each table a primary key.
- Putting data in Second Normal Form involves moving data that is dependent on only part of the key out to other tables.
- Third Normal Form involves getting rid of anything in the tables that doesn't depend solely on the primary key. Only include information that is dependent on the key; move data that is independent of the primary key to other tables, and create primary keys for the new tables.

Normal Forms

Converting to First Normal Form

A table in a relational database must be in 1NF.
- Repeating groups must be eliminated
- The primary key must be determined
  o It uniquely identifies attribute values (rows)
  o All attributes are dependent on the primary key
  o In the example: the combination of PROJ_NUM and EMP_NUM
- Dependencies can be depicted with the help of a diagram
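
The decomposition of Lending-schema into Branch-schema and Loan-info-schema described earlier in this section can be sketched in Python. The helper functions project and natural_join and the three sample rows are hypothetical. The sketch shows that projecting removes the repeated branch data, and that, for this sample, joining the two projections on branch_name gives back exactly the original rows.

```python
def project(rows, attributes):
    """Relational projection: keep the given attributes and drop duplicate tuples."""
    unique = {tuple(row[attr] for attr in attributes) for row in rows}
    return [dict(zip(attributes, values)) for values in sorted(unique)]


def natural_join(left, right):
    """Combine rows that agree on every attribute the two relations share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[attr] == r_row[attr] for attr in shared):
                joined.append({**l_row, **r_row})
    return joined


lending = [
    {"branch_name": "Downtown", "branch_city": "Brooklyn", "assets": 9000000,
     "customer_name": "Jones", "loan_number": "L-17", "amount": 1000},
    {"branch_name": "Downtown", "branch_city": "Brooklyn", "assets": 9000000,
     "customer_name": "Williams", "loan_number": "L-14", "amount": 1500},
    {"branch_name": "Perryridge", "branch_city": "Horseneck", "assets": 1700000,
     "customer_name": "Hayes", "loan_number": "L-15", "amount": 1500},
]

branch = project(lending, ["branch_name", "branch_city", "assets"])
loan_info = project(lending, ["customer_name", "loan_number", "branch_name", "amount"])
rejoined = natural_join(loan_info, branch)


def as_set(rows):
    """Order-independent form of a relation, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


same_information = as_set(rejoined) == as_set(lending)
```

The Downtown branch data is stored once in branch instead of once per loan, so a change to its assets value is made in a single place.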


- Dependency diagram:
  o Depicts all dependencies found within a given table structure
  o Helpful in getting a bird's-eye view of all relationships among a table's attributes
  o Its use makes it much less likely that an important dependency will be overlooked
- Desirable dependencies are based on the entire primary key
- Less desirable dependencies:
  o Partial: based on part of a composite primary key
  o Transitive: one nonprime attribute depends on another nonprime attribute

[Figure: dependency diagram]

1NF: Definition

Tabular format in which:
  o All key attributes are defined
  o There are no repeating groups in the table
  o All attributes are dependent on the primary key

All relational tables must satisfy the 1NF requirements.

Some tables contain partial dependencies:
  o Dependencies based on only part of the primary key
  o Sometimes used for performance reasons, but should be used with caution
  o Still subject to data redundancies


Second Normal Form

1. Identify all key components
   o Write each key component on a separate line
   o Write the original key on the last line
   o Write the dependent attributes after each key
2. Each line will become a new table

[Figure: second normal form conversion results]

Second Normal Form Defined

A table is in second normal form (2NF) if:
- It is in 1NF, and
- It includes no partial dependencies: no attribute is dependent on only a portion of the primary key


Converting to Third Normal Form

- Resolve transitive dependencies (attributes dependent on non-key attributes)
- Create a separate table for each transitive dependency

[Figure: 3NF results]
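
The distinction between full, partial, and transitive dependencies that drives the 2NF and 3NF conversions can be sketched in Python. Only the key attributes PROJ_NUM and EMP_NUM come from these notes; the function classify, the other attribute names (HOURS, PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR), and the listed dependencies are assumptions made up for illustration. "Transitive" is used here in the simplified sense given above: a non-key attribute determined by other non-key attributes.

```python
def classify(primary_key, dependencies):
    """Label each dependency (determinant -> attribute) relative to the primary key."""
    key = set(primary_key)
    kinds = {}
    for determinant, attribute in dependencies:
        lhs = set(determinant)
        if lhs == key:
            kinds[attribute] = "full"        # depends on the entire primary key
        elif lhs < key:
            kinds[attribute] = "partial"     # depends on part of the key: breaks 2NF
        elif not lhs & key:
            kinds[attribute] = "transitive"  # depends on non-key attributes: breaks 3NF
        else:
            kinds[attribute] = "other"
    return kinds


primary_key = ("PROJ_NUM", "EMP_NUM")
dependencies = [
    (("PROJ_NUM", "EMP_NUM"), "HOURS"),   # hypothetical attributes from here on
    (("PROJ_NUM",), "PROJ_NAME"),
    (("EMP_NUM",), "EMP_NAME"),
    (("EMP_NUM",), "JOB_CLASS"),
    (("JOB_CLASS",), "CHG_HOUR"),
]

kinds = classify(primary_key, dependencies)
breaks_2nf = sorted(attr for attr, kind in kinds.items() if kind == "partial")
breaks_3nf = sorted(attr for attr, kind in kinds.items() if kind == "transitive")
```

Under these assumptions, the attributes in breaks_2nf would move to tables keyed by PROJ_NUM or EMP_NUM alone, and the attribute in breaks_3nf would move to a table keyed by JOB_CLASS.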

Concepts of the ER model are already covered in Unit 3.

