
UNIT I INTRODUCTION Database Management Systems


A computer database is a structured collection of records or data that is stored in a computer system. The structure is achieved by organizing the data according to a database model. The model in most common use today is the relational model. Other models, such as the hierarchical model and the network model, use a more explicit representation of relationships (see below for an explanation of the various database models).

A computer database relies upon software to organize the storage of data. This software is known as a database management system (DBMS). Database management systems are categorized according to the database model that they support. The model tends to determine the query languages that are available to access the database. A great deal of the internal engineering of a DBMS, however, is independent of the data model, and is concerned with managing factors such as performance, concurrency, integrity, and recovery from hardware failures. In these areas there are large differences between products.

1. A database management system (DBMS), or simply a database system (DBS), consists of
   o A collection of interrelated and persistent data (usually referred to as the database (DB)).
   o A set of application programs used to access, update and manage that data (which form the data management system (MS)).

2. The goal of a DBMS is to provide an environment that is both convenient and efficient to use in
   o Retrieving information from the database.
   o Storing information into the database.

3. Databases are usually designed to manage large bodies of information. This involves
   o Definition of structures for information storage (data modeling).
   o Provision of mechanisms for the manipulation of information (file and systems structure, query processing).
   o Providing for the safety of information in the database (crash recovery and security).
   o Concurrency control if the system is shared by users.

Purpose of Database Systems


1. To see why database management systems are necessary, consider a typical "file-processing system" supported by a conventional operating system. The application is a savings bank:
   o Savings account and customer records are kept in permanent system files.
   o Application programs are written to manipulate files to perform the following tasks:
     - Debit or credit an account.
     - Add a new account.
     - Find an account balance.
     - Generate monthly statements.

2. Development of the system proceeds as follows:
   o New application programs must be written as the need arises.
   o New permanent files are created as required.
   o Over a long period of time, files may end up in different formats.
   o Application programs may be in different languages.

3. So we can see there are problems with the straight file-processing approach:
   o Data redundancy and inconsistency
     - Same information may be duplicated in several places.
     - All copies may not be updated properly.
   o Difficulty in accessing data
     - May have to write a new application program to satisfy an unusual request, e.g. find all customers with the same postal code.
     - Could generate this data manually, but that is a long job.
   o Data isolation
     - Data in different files.
     - Data in different formats.
     - Difficult to write new application programs.
   o Multiple users
     - Want concurrency for faster response time.
     - Need protection for concurrent updates. E.g. two customers withdrawing funds from the same account at the same time: the account has $500 in it, and they withdraw $100 and $50. The result could be $350, $400 or $450 if there is no protection.
   o Security problems
     - Every user of the system should be able to access only the data they are permitted to see. E.g. payroll people only handle employee records, and cannot see customer accounts; tellers only access account data and cannot see payroll data.
     - Difficult to enforce this with application programs.
   o Integrity problems
     - Data may be required to satisfy constraints, e.g. no account balance below $25.00.
     - Again, difficult to enforce or to change constraints with the file-processing approach.

These problems and others led to the development of database management systems.
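The concurrent-withdrawal example above ($500 balance, withdrawals of $100 and $50) can be reproduced with a small deterministic simulation. This is only a sketch: the function names and the way the interleaving is modeled are assumptions made for illustration, not a description of how any real file-processing system or DBMS schedules its work.

```python
# Hypothetical helpers: they model two tellers that read the same old
# balance before either one writes its result back.

def withdraw_unprotected(start_balance, amounts, write_order):
    """All tellers read first; the writes then land in write_order."""
    reads = [start_balance for _ in amounts]   # every teller sees the stale balance
    balance = start_balance
    for i in write_order:                      # each write overwrites the previous one
        balance = reads[i] - amounts[i]
    return balance


def withdraw_serialized(start_balance, amounts):
    """Each withdrawal reads and writes before the next one starts."""
    balance = start_balance
    for amount in amounts:
        balance -= amount
    return balance


amounts = [100, 50]
outcomes = {
    "protected (serialized)": withdraw_serialized(500, amounts),
    "unprotected, $100 write lands last": withdraw_unprotected(500, amounts, [1, 0]),
    "unprotected, $50 write lands last": withdraw_unprotected(500, amounts, [0, 1]),
}
for label, balance in outcomes.items():
    print(f"{label}: ${balance}")
```

Only the serialized run leaves the correct $350; in the unprotected runs one withdrawal is silently lost, which is the kind of anomaly concurrency control in a DBMS is meant to prevent.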

Data Abstraction

1. The major purpose of a database system is to provide users with an abstract view of the system. The system hides certain details of how data is stored, created and maintained. Complexity should be hidden from database users.

2. There are several levels of abstraction:

   1. Physical Level:
      o How the data are stored, e.g. index, B-tree, hashing.
      o Lowest level of abstraction.
      o Complex low-level structures described in detail.

      Features of the physical data model include:
      o Specification of all tables and columns.
      o Foreign keys are used to identify relationships between tables.
      o Denormalization may occur based on user requirements.
      o Physical considerations may cause the physical data model to be quite different from the logical data model.

      At this level, the data modeler specifies how the logical data model will be realized in the database schema. The steps for physical data model design are as follows:
      1. Convert entities into tables.
      2. Convert relationships into foreign keys.
      3. Convert attributes into columns.
      4. Modify the physical data model based on physical constraints / requirements.

   2. Conceptual Level:
      o Next highest level of abstraction.
      o Describes what data are stored.
      o Describes the relationships among data.
      o Database administrator level.

      Features of the logical data model include:
      o Includes all entities and relationships among them.
      o All attributes for each entity are specified.
      o The primary key for each entity is specified.
      o Foreign keys (keys identifying the relationship between different entities) are specified.
      o Normalization occurs at this level.

      At this level, the data modeler attempts to describe the data in as much detail as possible, without regard to how they will be physically implemented in the database. In data warehousing, it is common for the conceptual data model and the logical data model to be combined into a single step (deliverable).

      The steps for designing the logical data model are as follows:
      1. Identify all entities.
      2. Specify primary keys for all entities.
      3. Find the relationships between different entities.
      4. Find all attributes for each entity.
      5. Resolve many-to-many relationships.
      6. Normalization.

   3. View Level:
      o Highest level.
      o Describes part of the database for a particular group of users.
      o Can be many different views of a database. E.g. tellers in a bank get a view of customer accounts, but not of payroll data.
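The physical-design steps listed for the physical level (entities to tables, relationships to foreign keys, attributes to columns, then physical tuning) can be sketched with Python's built-in `sqlite3` module. The `customer` and `account` tables, their columns and the index name are hypothetical, chosen to match the savings-bank example; a real design would depend on the target DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
-- Steps 1 and 3: each entity becomes a table, each attribute a column.
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    postal_code TEXT
);
-- Step 2: the customer/account relationship becomes a foreign key.
CREATE TABLE account (
    account_no  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    balance     NUMERIC NOT NULL
);
-- Step 4: a purely physical choice, an index to speed up lookups by customer.
CREATE INDEX idx_account_customer ON account(customer_id);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ada', '10001')")
conn.execute("INSERT INTO account VALUES (100, 1, 500)")

# An account that points at a non-existent customer violates the relationship.
try:
    conn.execute("INSERT INTO account VALUES (101, 99, 25)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables, "orphan account rejected:", orphan_rejected)
```

The index is the only statement that belongs to the physical level alone: removing it would change performance but not the meaning of the schema.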

Fig. 1.1 illustrates the three levels.

Figure 1.1: The three levels of data abstraction

LOGICAL DATA INDEPENDENCE AND PHYSICAL DATA INDEPENDENCE

Logical Data Independence: Logical data independence is the ability to modify the conceptual schema without having to alter external schemas or application programs. Alterations to the conceptual schema may include the addition or removal of entities, attributes or relationships, and should be possible without altering existing external schemas or rewriting application programs.

Physical Data Independence: Physical data independence is the ability to modify the internal schema without having to alter the conceptual schema or application programs. Alterations to the internal schema might include:
o Using new storage devices.
o Using different data structures.
o Switching from one access method to another.
o Using different file organizations or storage structures.
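Both kinds of independence can be observed in a small experiment. The sketch below uses `sqlite3` and assumes a hypothetical `account` table with a `teller_view` acting as the external schema; it runs the same application query before and after a physical change (a new index) and a conceptual change (a new column).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_no INTEGER PRIMARY KEY, owner TEXT, balance NUMERIC)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)", [(1, "Ada", 500), (2, "Bob", 25)]
)
# External schema: the only thing the (hypothetical) teller application sees.
conn.execute("CREATE VIEW teller_view AS SELECT account_no, balance FROM account")

APP_QUERY = "SELECT account_no, balance FROM teller_view ORDER BY account_no"
before = conn.execute(APP_QUERY).fetchall()

# Physical change: a new access path. The query text is untouched.
conn.execute("CREATE INDEX idx_account_balance ON account(balance)")
after_physical = conn.execute(APP_QUERY).fetchall()

# Conceptual change: a new attribute on the entity. The view hides it.
conn.execute("ALTER TABLE account ADD COLUMN branch TEXT")
after_logical = conn.execute(APP_QUERY).fetchall()

print(before == after_physical == after_logical)
```

The application query never changes, which is the practical meaning of independence: storage and schema can evolve behind a stable interface.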

Conceptual, Logical, Physical Models

The terms "conceptual", "logical", and "physical" are frequently used in data modeling to differentiate levels of abstraction versus detail in the model. Although there is no general agreement, let alone an accepted authority, that defines these terms, data modelers nevertheless generally understand the approximate scope of each.

Conceptual E-R Model

A conceptual entity-relationship model shows how the business world sees information. It suppresses non-critical details in order to emphasize business rules and user objects. It typically includes only significant entities which have business meaning, along with their relationships. Many-to-many relationships are acceptable to represent entity associations. A conceptual model might discover that there is a need to house information about each person in an organization. While considerable thought is given to discovering and describing the relevant properties of each person, the designers accept implicitly that each person is distinct and unique. A conceptual model may include a few significant attributes to augment the definition and visualization of entities. No effort need be made to inventory the full attribute population of such a model. A conceptual model may have some identifying concepts or candidate keys noted, but it explicitly does not include a complete scheme of identity, since identifiers are logical choices made from a deeper context.

Logical E-R Model

A logical entity-relationship model is provable in the mathematics of data science. Given the current predominance of relational databases, logical models generally conform to relational theory. Thus a logical model contains only fully normalized entities. Some of these may represent logical domains rather than potential physical tables. For a logical data model to be normalized, it must include the full population of attributes to be implemented, and those attributes must be defined in terms of their domains or logical data types (e.g., character, number, date, picture, etc.). A logical data model requires a complete scheme of identifiers or candidate keys for unique identification of each occurrence in every entity. Since there are choices of identifiers for many entities, the logical model indicates the current selection of identity. Propagation of identifiers as foreign keys may be explicit or implied. Since relational storage cannot support many-to-many concepts, a logical data model resolves all many-to-many relationships into associative entities, which may acquire independent identifiers and possibly other attributes as well.

Physical Database Schema

A physical data model is a single logical model instantiated in a specific database management product (e.g., Sybase, Oracle, Informix, etc.) in a specific installation. The physical data model specifies implementation details which may be features of a particular product or version, as well as configuration choices for that database instance. These include index construction, alternate key declarations, modes of referential integrity (declarative or procedural), constraints, views, and physical storage objects such as tablespaces.

In Summary

The conceptual model is concerned with the real-world view and understanding of data; the logical model is a generalized formal structure in the rules of information science; the physical model specifies how this will be executed in a particular instance. Various data modeling methodologies and products provide these layers of abstraction in different ways. Some address only the physical implementation; some model only the logical structure; others may provide elements of all three but not necessarily in three separate views. In each case it helps the data modeler to understand the level of abstraction to which a particular feature or task belongs.
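The logical-model step of resolving a many-to-many relationship into an associative entity can be sketched as follows. The `student`, `course` and `enrollment` tables and their sample rows are hypothetical; the point is only that the associative table carries the two foreign keys as its own identifier and may pick up attributes of its own (here, `grade`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- Associative entity: one row per (student, course) pairing.
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    course_id  INTEGER NOT NULL REFERENCES course(course_id),
    grade      TEXT,                       -- attribute of the relationship itself
    PRIMARY KEY (student_id, course_id)
);

INSERT INTO student VALUES (1, 'Ada'), (2, 'Bob');
INSERT INTO course  VALUES (10, 'Computer'), (20, 'Accounts');
INSERT INTO enrollment VALUES (1, 10, 'A'), (1, 20, 'B'), (2, 10, NULL);
""")

# One student, many courses ...
courses_of_ada = [
    row[0]
    for row in conn.execute(
        "SELECT c.title FROM enrollment e JOIN course c ON c.course_id = e.course_id "
        "WHERE e.student_id = 1 ORDER BY c.title"
    )
]
# ... and one course, many students: both directions use the same table.
students_in_computer = [
    row[0]
    for row in conn.execute(
        "SELECT s.name FROM enrollment e JOIN student s ON s.student_id = e.student_id "
        "WHERE e.course_id = 10 ORDER BY s.name"
    )
]
print(courses_of_ada, students_in_computer)
```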

Data Storage Characteristics

For a significant amount of data, we require persistent, inexpensive, reliable and sharable storage methods with relatively rapid access time.

Persistent - Data persists (lives on) after power is removed.
Inexpensive - Typically measured on a $ per megabyte basis.
Reliable - Should not have to be replaced due to excessive errors.
Sharable - Should facilitate sharing of data among many users.
Access time - Data should be accessible in a relatively short period of time.

Advantages

The advantages of database management systems can be enumerated as follows:

Warehouse of Information: Database management systems are warehouses of information, where large amounts of data can be stored. Common examples in commercial applications are inventory data, personnel data, etc. It often happens that an ordinary person uses a database management system without even realizing it. The best examples are the address book of a cell phone, digital diaries, etc. Both of these devices store data in an internal database.

Defining Attributes: The unique data field in a table is assigned a primary key. The primary key helps in the identification of data. It also checks for duplicates within the same table, thereby reducing data redundancy. Some tables have a secondary key in addition to the primary key. The secondary key is also called a 'foreign key'. It refers to the primary key of another table, thus establishing a relationship between the two tables.

Systematic Storage: The data is stored in the form of tables. The tables consist of rows and columns. The primary and secondary keys help to eliminate data redundancy, enabling systematic storage of data.

Changes to Schema: The table schema can be changed, and it is not platform dependent. Therefore, the tables in the system can be edited to add new columns and rows without hampering the applications that depend on that particular database.

No Language Dependence: Database management systems are not language dependent. Therefore, they can be used with various languages and on various platforms.

Table Joins: The data in two or more tables can be combined into a single result table. Because related data does not have to be stored more than once, this helps reduce the size of the database and also makes retrieval of data easier.

Multiple Simultaneous Usage: The database can be used simultaneously by a number of users. Various users can retrieve the same data simultaneously. The data in the database can also be modified, based on the privileges assigned to users.

Data Security: Data is the most important asset. Therefore, there is a need for data security. Database management systems help to keep the data secured.

Privileges: Different privileges can be given to different users. For example, some users can edit the database, but are not allowed to delete the contents of the database.

Abstract View of Data and Easy Retrieval: A DBMS enables easy and convenient retrieval of data. A database user sees only the abstract form of the data; the complexities of the internal structure of the database are hidden from the user. The data fetched is in a user-friendly format.

Data Consistency: Data consistency ensures a consistent view of data to every user. It includes the accuracy, validity and integrity of related data. The data in the database must satisfy certain consistency constraints; for example, the age of a candidate appearing for an exam should be of a number datatype and in the range of 20-25. When the database is updated, these constraints are checked by the database system.
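The primary-key and consistency-constraint behavior described above can be tried out directly. The sketch below assumes a hypothetical `candidate` table and uses SQLite's `CHECK` clause to express the age rule from the text (numeric, between 20 and 25); other systems spell such constraints slightly differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE candidate (
    candidate_id INTEGER PRIMARY KEY,          -- duplicates are rejected
    name         TEXT NOT NULL,
    age          INTEGER NOT NULL
                 CHECK (typeof(age) = 'integer' AND age BETWEEN 20 AND 25)
)
""")


def try_insert(row):
    """Attempt one insert and report whether the DBMS accepted it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO candidate VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    try_insert((1, "Ada", 22)),   # valid row
    try_insert((2, "Bob", 31)),   # age outside 20-25
    try_insert((1, "Cy", 23)),    # primary key 1 is already taken
]
for outcome in results:
    print(outcome)

stored = conn.execute("SELECT COUNT(*) FROM candidate").fetchone()[0]
```

The checks run inside the database on every update, so no application program can bypass them by accident.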

The most commonly used kind of database management system is the relational database management system (RDBMS). The most important advantage of database management systems is the systematic storage of data, achieved by maintaining the relationships between the data members. The data is stored as tuples in an RDBMS.

The advent of object-oriented programming gave rise to the concept of object-oriented database management systems. These systems combine properties like inheritance, encapsulation, polymorphism and abstraction with atomicity, consistency, isolation and durability, also called the ACID properties of a DBMS.

Database management systems have brought about systematization in data storage, along with data security.

1. Controlling Data Redundancy - In the conventional file processing system, every user group maintains its own files for handling its data. This may lead to
   o Duplication of the same data in different files.
   o Wastage of storage space, since duplicated data is stored.
   o Errors, which may be generated when the same data is updated in different files.
   o Wasted time in entering the same data again and again.
   o Needless use of computer resources.
   o Difficulty in combining information.

2. Elimination of Inconsistency - In the file processing system, information is duplicated throughout the system, so changes made in one file may need to be carried over to another file. If they are not, the data becomes inconsistent. So we need to remove this duplication of data in multiple files to eliminate inconsistency.

For example: consider a student result system. Suppose that the STUDENT file indicates that Roll No. 10 has opted for the 'Computer' course, but the RESULT file indicates that Roll No. 10 has opted for the 'Accounts' course. The two entries for this particular student do not agree with each other, so the database is said to be in an inconsistent state. To eliminate this conflicting information we need to centralize the database. On centralizing the database, duplication is controlled and hence inconsistency is removed.

Data inconsistency is often encountered in everyday life. As another example, we have all come across situations where a new address is communicated to an organization that we deal with (e.g. telecom, gas company, bank), and we find that some of the communications from that organization are received at the new address while others continue to be mailed to the old address. Combining all the data in a database reduces redundancy as well as inconsistency, so it is likely to reduce the costs of collecting, storing and updating data.

Let us again consider the example of the result system. Suppose that a student with Roll No. 201 changes his course from 'Computer' to 'Arts'. The change is made in the SUBJECT file but not in the RESULT file. This may lead to inconsistency of the data. So we need to centralize the database so that changes, once made, are reflected in all the tables where a particular field is stored. The update is then applied automatically, which is known as propagating updates.
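The result-system example can be turned into a short experiment. In the sketch below the course is stored exactly once, in a hypothetical `student` table, and the result report obtains it through a join; the table layout, the name 'Ravi' and the marks are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, course TEXT);
-- The result table does NOT repeat the course; it only refers to the student.
CREATE TABLE result  (roll_no INTEGER REFERENCES student(roll_no), marks INTEGER);

INSERT INTO student VALUES (201, 'Ravi', 'Computer');
INSERT INTO result  VALUES (201, 78);
""")

REPORT = (
    "SELECT s.roll_no, s.course, r.marks "
    "FROM result r JOIN student s ON s.roll_no = r.roll_no"
)

before = conn.execute(REPORT).fetchall()

# The student changes course: one update, in one place.
conn.execute("UPDATE student SET course = 'Arts' WHERE roll_no = 201")

after = conn.execute(REPORT).fetchall()
print(before, after)
```

Because there is no second copy of the course to forget about, the report cannot disagree with the student record, which is the effect the text calls propagating updates.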

3. Better service to the users - A DBMS is often used to provide better services to the users. In a conventional system, availability of information is often poor, since it is normally difficult to obtain information that the existing systems were not designed for. Once several conventional systems are combined to form one centralized database, the availability of information and how up to date it is are likely to improve, since the data can now be shared and the DBMS makes it easy to respond to unanticipated information requests. Centralizing the data in the database also means that users can easily obtain new and combined information that would have been impossible to obtain otherwise. Use of a DBMS should also allow users who do not know programming to interact with the data more easily, unlike a file processing system where a programmer may need to write new programs to meet every new demand.

4. Flexibility of the System is Improved - Since changes are often necessary to the contents of the data stored in any system, these changes are made more easily in a centralized database than in a conventional system. Application programs need not be changed when the data in the database changes.

5. Integrity can be improved - Since the data of an organization using the database approach is centralized and is used by a number of users at a time, it is essential to enforce integrity constraints. In conventional systems, because the data is duplicated in multiple files, updates or changes may sometimes lead to entry of incorrect data in some of the files where it exists.

For example: consider the result system that we have already discussed. Since multiple files have to be maintained, you may sometimes enter a value for course which does not exist. Suppose course can have the values (Computer, Accounts, Economics, and Arts) but we enter the value 'Hindi' for it; this leads to inconsistent data, and so to a lack of integrity.

Even if we centralize the database it may still contain incorrect data. For example:
   o The salary of a full-time employee may be entered as Rs. 500 rather than Rs. 5000.
   o A student may be shown to have borrowed books but has no enrollment.
   o A list of employee numbers for a given department may include a number of non-existent employees.
These problems can be avoided by defining validation procedures that run whenever any update operation is attempted.

6. Standards can be enforced - Since all access to the database must be through the DBMS, standards are easier to enforce. Standards may relate to the naming of data, format of data, structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose of data interchange or migration between systems.

7. Security can be improved - In conventional systems, applications are developed in an ad hoc / temporary manner. Often different systems of an organization access different components of the operational data, and in such an environment enforcing security can be quite difficult. Setting up a database makes it easier to enforce security restrictions since data is now centralized. It is easier to control who has access to what parts of the database. Different checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece of information in the database.

Consider an example from banking, in which employees at different levels may be given access to different types of data in the database. A clerk may be given the authority to know only the names of all the customers who have a loan in the bank, but not the details of each loan the customer may have. This can be accomplished by giving the appropriate privileges to each employee.

8. Organization's requirements can be identified - All organizations have sections and departments, and each of these units often considers its own work, and therefore its own needs, to be the most important. Once a database has been set up with centralized control, it will be necessary to identify the organization's requirements and to balance the needs of the competing units. So it may become necessary to ignore some requests for information if they conflict with a higher-priority need of the organization. It is the responsibility of the DBA (Database Administrator) to structure the database system to provide the overall service that is best for the organization.

For example: a DBA must choose the best file structure and access method to give fast response for highly critical applications as compared to less critical applications.

9. Overall cost of developing and maintaining systems is lower - It is much easier to respond to unanticipated requests when data is centralized in a database than when it is stored in a conventional file system. Although the initial cost of setting up a database can be large, one normally expects the overall cost of setting up the database and developing and maintaining application programs to be far lower than for a similar service using conventional systems, since the productivity of programmers can be higher when using the non-procedural languages that have been developed with DBMSs than when using procedural languages.

10. Data Model must be developed - Perhaps the most important advantage of setting up a database system is the requirement that an overall data model for the organization be built. In conventional systems, it is more likely that files will be designed to meet the needs of particular applications. The overall view is often not considered. Building an overall view of an organization's data is usually cost effective in the long term.

11. Provides backup and Recovery - Centralizing a database provides schemes such as backup and recovery from failures, including disk crashes, power failures and software errors, which may help the database recover from an inconsistent state to the state that existed prior to the occurrence of the failure, though the methods are very complex.
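One building block behind recovery is the transaction: a group of updates that either all take effect or are all undone. The sketch below is a minimal illustration using `sqlite3`; the `account` table, the `transfer` helper and the amounts are hypothetical, and a failed constraint stands in for the failures mentioned above. It does not attempt to model disk-level crash recovery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "account_no INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500), (2, 100)])
conn.commit()


def balances():
    return conn.execute(
        "SELECT account_no, balance FROM account ORDER BY account_no"
    ).fetchall()


def transfer(src, dst, amount):
    """Move money between accounts; undo everything if any step fails."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_no = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_no = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first_ok = transfer(1, 2, 200)        # fits within the balance
after_first = balances()
second_ok = transfer(1, 2, 1000)      # would overdraw account 1 halfway through
after_second = balances()
print(first_ok, after_first, second_ok, after_second)
```

In the failed transfer the credit to account 2 had already been applied when the debit was rejected; the rollback removes it, so the database returns to the state it had before the transfer began.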

DISADVANTAGES OF DBMS

1. Cost of Hardware & Software
A processor with high data-processing speed and a large memory is required to run the DBMS software. This means that you have to upgrade the hardware used for the file-based system. Similarly, DBMS software is also very costly.

2. Cost of Data Conversion
When a computer file-based system is replaced with a database system, the data stored in data files must be converted into database files. Converting data files into a database is difficult and costly. You have to hire database and system designers along with application programmers. Alternatively, you have to take the services of a software house. So a lot of money has to be paid for developing software.

3. Cost of Staff Training
Most DBMSs are complex systems, so training is required for users to use the DBMS. Training is required at all levels, including programming, application development, and database administration. The organization has to pay a large amount for training staff to run the DBMS.

4. Appointing Technical Staff
Trained technical persons such as database administrators, application programmers, data entry operators, etc. are required to handle the DBMS. You have to pay handsome salaries to these persons. Therefore, the system cost increases.

5. Database Damage
In most organizations, all data is integrated into a single database. If the database is damaged due to an electrical failure, or is corrupted on the storage media, then your valuable data may be lost forever.

HISTORY OF DATABASE SYSTEMS

Data are raw facts that constitute the building blocks of information. A database is a collection of information and a means to manipulate data in a useful way; it must provide proper storage for large amounts of data, easy and fast access, and facilities for processing the data. A Database Management System (DBMS) is a set of software that is used to define, store, manipulate and control the data in a database. From early flat-file systems to relational and object-relational systems, database technology has gone through several generations over its 40-year history.

THE EVOLUTION OF THE DATABASE

Ancient History: Data are not stored on disk; the programmer defines both the logical data structure and the physical structure, such as storage structure, access methods, I/O modes, etc. One data set per program: high data redundancy. There is no persistence; random access memory (RAM) is expensive and limited; programmer productivity is low.

1968 File-Based: Predecessor of the database. Data are maintained in a flat file. Processing characteristics are determined by the common use of magnetic tape as the medium.
o Data are stored in files, with an interface between programs and files.
o Mapping happens between logical files and physical files; one file corresponds to one or several programs.
o Various access methods exist, e.g., sequential, indexed, random.
o Requires extensive programming in a third-generation language such as COBOL or BASIC.
o Limitations:
  - Separation and isolation: Each program maintains its own set of data; users of one program may not be aware of data held or blocked by other programs.
  - Duplication: The same data is held by different programs, which wastes space and resources.
  - High maintenance costs, such as ensuring data consistency and controlling access.
  - Sharing granularity is very coarse.
  - Weak security.

1968-1980 Era of non-relational database: A database provides an integrated and structured collection of stored operational data which can be used or shared by application systems. The prominent hierarchical database model was IBM's first DBMS, called IMS. The prominent network database model was the CODASYL DBTG model; IDMS was the most popular network DBMS.

Hierarchical data model

o Mid 1960s: Rockwell partners with IBM to create the Information Management System (IMS). IMS DB/DC led the mainframe database market in the 70's and early 80's.
o Based on trees. Logically represented by an upside-down tree, with a one-to-many relationship between parent and child records.
o Advantages: Efficient searching; less redundant data; data independence; database security and integrity.
o Disadvantages:
  - Complex implementation.
  - Difficult to manage and lacking standards; for example, it is a problem to add empty nodes, and many-to-many relationships cannot be handled easily.
  - Lacks structural independence, which adds to application programming and usage complexity.

Network data model

o Early 1960s: Charles Bachman developed the first DBMS, the Integrated Data Store (IDS), at General Electric (whose computer business was later acquired by Honeywell).
o It was standardized in 1971 by the CODASYL group (Conference on Data Systems Languages).
o Directed acyclic graph with nodes and edges.
o Identified 3 database components: network schema (database organization); subschema (views of the database per user); and data management language (low-level and procedural).
o Each record can have multiple parents:
  - Composed of set relationships; a set represents a one-to-many relationship between the owner and the member.
  - Each set has an owner record and member records.
  - A member may have several owners.
o Main problems: System complexity, difficult to design and maintain; lack of structural independence.

The distinction between storing data in files and in databases is that databases are intended to be used by multiple programs and types of users.

1970-present Era of relational database and Database Management System (DBMS): Based on relational calculus; a shared collection of logically related data and a description of this data, designed to meet the information needs of an organization. The system catalog / metadata provides a description of the data to enable program-data independence. Logically related data comprises the entities, attributes, and relationships of an organization's information. Data abstraction allows a view level (a way of presenting data to a group of users) and a logical level (how data is understood to be when writing queries).

1970: Ted Codd at IBM's San Jose Lab proposed the relational model.

Two major projects started, and both were operational in the late 1970s:
o INGRES at the University of California, Berkeley became commercial and was followed by POSTGRES, which was incorporated into Informix.
o System R at IBM's San Jose Lab later evolved into DB2, which became one of the first DBMS products based on the relational model. (Oracle produced a similar product just prior to DB2.)

1976: Peter Chen defined the Entity-Relationship (ER) model.

1980s: Maturation of relational database technology; more relational DBMSs were developed, and the SQL standard was adopted by ISO and ANSI.

1985: Object-oriented DBMS (OODBMS) develops. Little commercial success, because the advantages did not justify the cost of converting billions of bytes of data to a new format.

1990s: Incorporation of object-orientation in relational DBMSs; new application areas such as data warehousing and OLAP, web and Internet, interest in text and multimedia, enterprise resource planning (ERP) and manufacturing resource planning (MRP).
o 1991: Microsoft ships Access, a personal DBMS created as an element of Windows; it gradually supplanted all other personal DBMS products.
o 1995: First Internet database applications.
o 1997: XML applied to database processing, which solves long-standing database problems. Major vendors begin to integrate XML into DBMS products.

Database Architecture

1. Database systems are partitioned into modules for different functions. Some functions (e.g. file systems) may be provided by the operating system.

2. Components include:
   o File manager: manages allocation of disk space and data structures used to represent information on disk.
   o Database manager: the interface between low-level data and application programs and queries.
   o Query processor: translates statements in a query language into low-level instructions the database manager understands. (May also attempt to find an equivalent but more efficient form.)
   o DML precompiler: converts DML statements embedded in an application program to normal procedure calls in a host language. The precompiler interacts with the query processor.
   o DDL compiler: converts DDL statements to a set of tables containing metadata stored in a data dictionary.

In addition, several data structures are required for physical system implementation:
   o Data files: store the database itself.
   o Data dictionary: stores information about the structure of the database. It is used heavily. Great emphasis should be placed on developing a good design and efficient implementation of the dictionary.
   o Indices: provide fast access to data items holding particular values.

1igure shows these components.

('

((

"igure % atabase system structure.

Codd's Rules
Dr. E. F. Codd's 12 rules for defining a fully relational database

Note that, based on these rules, there is no fully relational database management system available today. In particular, rules 6, 9, 10, 11 and 12 are difficult to satisfy.

1. Foundation Rule
   A relational database management system must manage its stored data using only its relational capabilities.
2. Information Rule
   All information in the database should be represented in one and only one way: as values in a table.
3. Guaranteed Access Rule
   Each and every datum (atomic value) is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name.
4. Systematic Treatment of Null Values
   Null values (distinct from the empty character string or a string of blank characters, and distinct from zero or any other number) are supported in the fully relational DBMS for representing missing information in a systematic way, independent of data type.
5. Dynamic On-line Catalog Based on the Relational Model
   The database description is represented at the logical level in the same way as ordinary data, so authorized users can apply the same relational language to its interrogation as they apply to regular data.

6. Comprehensive Data Sublanguage Rule
   A relational system may support several languages and various modes of terminal use. However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings, and whose ability to support all of the following is comprehensive:
      a. data definition
      b. view definition
      c. data manipulation (interactive and by program)
      d. integrity constraints
      e. authorization
      f. transaction boundaries (begin, commit, and rollback).

7. View Updating Rule
   All views that are theoretically updateable are also updateable by the system.
8. High-level Insert, Update, and Delete
   The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data but also to the insertion, update, and deletion of data.
9. Physical Data Independence
   Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representation or access methods.
10. Logical Data Independence
   Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit unimpairment are made to the base tables.
11. Integrity Independence
   Integrity constraints specific to a particular relational database must be definable in the relational data sublanguage and storable in the catalog, not in the application programs.
12. Distribution Independence
   The data manipulation sublanguage of a relational DBMS must enable application programs and terminal activities to remain logically unimpaired whether and whenever data are physically centralized or distributed.
13. Nonsubversion Rule
   If a relational system has or supports a low-level (single-record-at-a-time) language, that low-level language cannot be used to subvert or bypass the integrity rules or constraints expressed in the higher-level (multiple-records-at-a-time) relational language.
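A few of these rules can be observed in a small experiment. The sketch below uses Python's built-in sqlite3 module purely as an illustration; SQLite is not claimed to be fully relational, and the account table, its columns and its rows are made up for the example. It touches on guaranteed access (rule 3), nulls being distinct from zero and from the empty string (rule 4), and a catalog that is queried with the same language as ordinary data (rule 5).

```python
import sqlite3

# Hypothetical table and data, used only to illustrate the rules.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (acct_no TEXT PRIMARY KEY, bname TEXT, balance REAL)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [("A-101", "Downtown", 500.0), ("A-102", "", 0.0), ("A-103", None, None)],
)

# Rule 3: one datum is reached by table name + primary key value + column name.
balance = conn.execute(
    "SELECT balance FROM account WHERE acct_no = ?", ("A-101",)
).fetchone()[0]

# Rule 4: NULL is not the same thing as zero or the empty string.
null_rows = conn.execute(
    "SELECT acct_no FROM account WHERE balance IS NULL"
).fetchall()
zero_rows = conn.execute(
    "SELECT acct_no FROM account WHERE balance = 0"
).fetchall()

# Rule 5: the catalog is itself a table, interrogated with ordinary SQL.
catalog = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

Here the row with a NULL balance is returned only by the IS NULL query and the row with a zero balance only by the equality query, which is the distinction rule 4 asks for.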

"ile structures and inde-ing "ile Organi;ation


1. A file is organized logically as a sequence of records.
2. Records are mapped onto disk blocks.
3. Files are provided as a basic construct in operating systems, so we assume the existence of an underlying file system.
4. Blocks are of a fixed size determined by the operating system.
5. Record sizes vary.
6. In a relational database, tuples of distinct relations may be of different sizes.
7. One approach to mapping a database to files is to store records of one length in a given file.
8. An alternative is to structure files to accommodate variable-length records. (Fixed-length is easier to implement.)

Fixed-Length Records
1. Consider a file of deposit records of the form:

       type deposit = record
           bname : char(22);
           account# : char(10);
           balance : real;
       end

2. If we assume that each character occupies one byte, an integer occupies 4 bytes, and a real 8 bytes, our deposit record is 40 bytes long (22 + 10 + 8).
3. The simplest approach is to use the first 40 bytes for the first record, the next 40 bytes for the second, and so on.
4. However, there are two problems with this approach.
   o It is difficult to delete a record from this structure. The space it occupied must somehow be reclaimed, or we need to mark deleted records so that they can be ignored.
   o Unless the block size is a multiple of 40, some records will cross block boundaries. It would then require two block accesses to read or write such a record.
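The 40-byte layout and the block-boundary problem can be checked with a short sketch. It uses Python's struct module; the format string, the helper names and the sample values are assumptions made for illustration, not part of the original notes.

```python
import struct

# Assumed layout from the text: 22-byte bname, 10-byte account number,
# 8-byte real. "<" turns off alignment padding, so the size is 22 + 10 + 8.
DEPOSIT = struct.Struct("<22s10sd")
RECORD_SIZE = DEPOSIT.size


def pack_deposit(bname, account, balance):
    """Encode one record; short strings are padded with zero bytes."""
    return DEPOSIT.pack(bname.encode("ascii"), account.encode("ascii"), balance)


def unpack_deposit(raw):
    """Decode one record and strip the zero-byte padding."""
    bname, account, balance = DEPOSIT.unpack(raw)
    return (
        bname.rstrip(b"\x00").decode("ascii"),
        account.rstrip(b"\x00").decode("ascii"),
        balance,
    )


def record_offset(i):
    """Byte offset of record i when records are stored back to back."""
    return i * RECORD_SIZE


def blocks_touched(i, block_size):
    """Number of disk blocks that record i overlaps."""
    first = record_offset(i) // block_size
    last = (record_offset(i) + RECORD_SIZE - 1) // block_size
    return last - first + 1
```

With a 512-byte block (not a multiple of 40), record 12 starts at byte 480 and ends at byte 519, so it straddles two blocks. With a 4000-byte block every record fits inside a single block.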

5. When a record is deleted, we could move all successive records up one (Figure 10.7), which may require moving a lot of records.
   o We could instead move the last record into the "hole" created by the deleted record (Figure 10.8).
   o This changes the order the records are in.
   o It turns out to be undesirable to move records to occupy freed space, as moving requires block accesses.
   o Also, insertions tend to be more frequent than deletions.
   o It is acceptable to leave the space open and wait for a subsequent insertion.
   o This leads to a need for additional structure in our file design.
6. So one solution is:
   o At the beginning of a file, allocate some bytes as a file header.
   o This header for now need only be used to store the address of the first record whose contents are deleted.
   o This first record can then store the address of the second available record, and so on (Figure 10.9).
   o To insert a new record, we use the record pointed to by the header, and change the header pointer to the next available record.
   o If no deleted records exist, we add our new record to the end of the file.
7. Note: Use of pointers requires careful programming. If a record pointed to is moved or deleted, and that pointer is not corrected, the pointer becomes a dangling pointer. Records pointed to are called pinned.
8. Fixed-length file insertions and deletions are relatively simple because "one size fits all". For variable length, this is not the case.
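The file-header scheme described above amounts to a linked list of free slots. The following is a minimal in-memory sketch, assuming slot numbers stand in for disk addresses; the class name FixedLengthFile and its methods are hypothetical. It pushes each newly freed slot onto the front of the chain, which is one possible policy among several.

```python
class FixedLengthFile:
    """In-memory stand-in for a file of fixed-length record slots."""

    def __init__(self):
        self.free_head = None  # file header: address of the first deleted slot
        self.slots = []        # each slot is ("rec", data) or ("free", next_free)

    def insert(self, record):
        """Reuse the slot named by the header, else append at end of file."""
        if self.free_head is None:
            self.slots.append(("rec", record))
            return len(self.slots) - 1
        addr = self.free_head
        _, next_free = self.slots[addr]
        self.free_head = next_free  # header now points to the next free slot
        self.slots[addr] = ("rec", record)
        return addr

    def delete(self, addr):
        """Mark the slot free and link it in at the front of the free chain."""
        kind, _ = self.slots[addr]
        if kind != "rec":
            raise ValueError("slot is already free")
        self.slots[addr] = ("free", self.free_head)
        self.free_head = addr

    def records(self):
        """Live records in slot order; deleted slots are skipped."""
        return [data for kind, data in self.slots if kind == "rec"]
```

No record is ever moved, so pointers to live (pinned) records stay valid. The cost is that the physical order of records no longer reflects insertion order once slots are reused.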

"ile Operations

Consider four basic file operations:

    Operation    Similar SQL statement
    Find         Select
    Insert       Insert
    Modify       Update
    Delete       Delete

Unordered file: a new record is inserted at the end of the file.
   o Insert takes constant time.
   o Select, Update and Delete take n/2 time on average (n is the number of records).

Ordered file: a new record is inserted in order in the file.
   o Insert takes log2 n plus the time to re-organize records.
   o Select, Update and Delete take at least log2 n.

Indexed file: a new record is inserted at the end of the file. An index is maintained that points to the location on disk where the record is found.
   o Insert takes constant time for the data itself plus log2 n for the index.
   o Select, Update and Delete take a log2 n lookup on the index followed by constant time to access the data record.
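The n/2 and log2 n figures can be made concrete by counting how many records a lookup examines. This is a rough sketch that counts comparisons in memory rather than real disk accesses; the function names and the choice of n = 1024 are assumptions for the example.

```python
def linear_probes(records, key):
    """Unordered file: scan from the start, counting records examined."""
    for probes, k in enumerate(records, start=1):
        if k == key:
            return probes
    return len(records)


def binary_probes(sorted_records, key):
    """Ordered file: binary search, counting records examined."""
    lo, hi, probes = 0, len(sorted_records) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_records[mid] == key:
            return probes
        if sorted_records[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes


n = 1024
keys = list(range(n))
# Average over every possible successful lookup in the unordered file.
avg_linear = sum(linear_probes(keys, k) for k in keys) / n
# Worst case over every possible successful lookup in the ordered file.
max_binary = max(binary_probes(keys, k) for k in keys)
```

For 1024 records the unordered scan examines about n/2 records on average, while the ordered search never examines more than log2 n + 1.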

INDEXING
• Mechanism for efficiently locating row(s) without having to scan the entire table
• Based on a search key: rows having a particular value for the search key attributes can be quickly located
• Don't confuse candidate key with search key:
   o Candidate key: set of attributes; guarantees uniqueness
   o Search key: sequence of attributes; does not guarantee uniqueness, just used for search

Index Structure
• An index structure contains:
   o Index entries, each of which can contain either
      - the data tuple itself (index and table are integrated in this case), or
      - a search key value and a pointer to a row having that value; the table is stored separately in this case (unintegrated index)
   o A location mechanism
      - Algorithm + data structure for locating an index entry with a given search key value
• Index entries are stored in accordance with the search key value
   o Entries with the same search key value are stored together (hash, B-tree)
   o Entries may be sorted on search key value (B-tree)

Types of Indexing

An index is made up of two components: a key and a pointer. The key is typically the key value for the relation and is mainly used to identify and look up records. The pointer is an address on disk where the rest of the data in the record can be found. Two types of index are discussed here: ordered index and hashing.

Ordered Index

Records are stored as they are inserted. The key attribute is stored in order in the index.
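A small sketch may help to picture the key-plus-pointer arrangement. It assumes an in-memory list as the data file and list positions as the disk addresses; the class name OrderedIndex and its fields are hypothetical.

```python
import bisect


class OrderedIndex:
    """Heap-ordered data file plus a separate index sorted on the key."""

    def __init__(self):
        self.heap = []  # data file: records in the order they were inserted
        self.keys = []  # index: key values, kept sorted
        self.ptrs = []  # index: ptrs[i] is the heap position for keys[i]

    def insert(self, key, record):
        """Append the record to the data file and add a sorted index entry."""
        self.heap.append(record)
        pos = bisect.bisect_right(self.keys, key)
        self.keys.insert(pos, key)
        self.ptrs.insert(pos, len(self.heap) - 1)

    def find(self, key):
        """Binary-search the index, then follow the pointers to the records."""
        lo = bisect.bisect_left(self.keys, key)
        hi = bisect.bisect_right(self.keys, key)
        return [self.heap[p] for p in self.ptrs[lo:hi]]
```

Only the index is kept in key order. The data file itself is never reorganized, which is why an insert costs a constant-time append plus the index update.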

Storage Structure
• Structure of the file containing a table:
   o Heap file (no index, not integrated)
   o Sorted file (no index, not integrated)
   o Integrated file containing index and rows (index entries contain rows in this case)
      - ISAM
      - B+ tree
      - Hash

Figure: Index file with separate storage structure.

Clustered Index
• Clustered index: index entries and rows are ordered in the same way
   o An integrated storage structure is always clustered (since rows and index entries are the same)
   o The particular index structure (e.g., hash, tree) dictates how the rows are organized in the storage structure
• There can be at most one clustered index on a table
   o CREATE TABLE generally creates an integrated, clustered (main) index on the primary key
• Good for range searches when a range of search key values is requested
   o Use the location mechanism to locate the index entry at the start of the range
      - This locates the first row.
   o Subsequent rows are stored in successive locations if the index is clustered (not so if unclustered)
   o Minimizes page transfers and maximizes likelihood of cache hits

Figure: Clustered main index.

Figure: Clustered secondary index.

Unclustered Index
• Unclustered (secondary) index: index entries and rows are not ordered in the same way
• A secondary index might be clustered or unclustered with respect to the storage structure it references
   o It is generally unclustered (since the organization of rows in the storage structure depends on the main index)
   o There can be many secondary indices on a table
   o An index created by CREATE INDEX is generally an unclustered, secondary index

Figure: Unclustered secondary index.

Sparse vs. Dense Index
• Dense index: has an index entry for each data record
   o Unclustered index must be dense
   o Clustered index need not be dense
• Sparse index: has an index entry for each page of the data file
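The difference can be sketched over a tiny sorted data file. The page size, the key values and the helper name sparse_lookup below are invented for the example, and the sparse index is assumed to hold the first key of each page.

```python
import bisect

PAGE_SIZE = 3  # records per page, chosen only to keep the example small
rows = [(k, f"row{k}") for k in (5, 8, 12, 17, 21, 30, 34, 41)]  # sorted on key
pages = [rows[i:i + PAGE_SIZE] for i in range(0, len(rows), PAGE_SIZE)]

# Dense index: one entry per record, as (key, page number, slot within page).
dense = [(k, p, s) for p, page in enumerate(pages) for s, (k, _) in enumerate(page)]

# Sparse index: one entry per page, holding the first key stored on that page.
sparse = [page[0][0] for page in pages]


def sparse_lookup(key):
    """Pick the last page whose first key is <= key, then scan that page."""
    p = bisect.bisect_right(sparse, key) - 1
    if p < 0:
        return None
    for k, row in pages[p]:
        if k == key:
            return row
    return None
```

The sparse lookup only works because the rows are stored in key order, so knowing the first key of each page is enough to pick the right page. That is the reason an unclustered index has to be dense.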

Multiple Attribute Search Key
• CREATE INDEX Inx ON Tbl (Att1, Att2)
• Search key is a sequence of attributes; index entries are lexically ordered
• Supports finer granularity equality search:
   o "Find row with value (A1, A2)"
• Supports range search (tree index only):
   o "Find rows with values between (A1, A2) and (A1', A2')"
• Supports partial key searches (tree index only):
   o Find rows with values of Att1 between A1 and A1'
   o But not "Find rows with values of Att2 between A2 and A2'"
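Python tuples compare lexically, so a sorted list of tuples behaves like the ordered entries of a two-attribute index. The sketch below relies on that; the entries and function names are hypothetical, and Att1 is assumed to be an integer so that the upper bound of a partial-key range can be written as a1_high + 1.

```python
import bisect

# Entries of a hypothetical index on (Att1, Att2), kept in lexical order.
entries = sorted([(1, "b"), (1, "d"), (2, "a"), (2, "c"), (3, "a"), (3, "e")])


def equality(a1, a2):
    """Equality search on the full key (A1, A2)."""
    i = bisect.bisect_left(entries, (a1, a2))
    return i < len(entries) and entries[i] == (a1, a2)


def key_range(low, high):
    """Range search between two full keys, inclusive."""
    lo = bisect.bisect_left(entries, low)
    hi = bisect.bisect_right(entries, high)
    return entries[lo:hi]


def att1_range(a1_low, a1_high):
    """Partial-key search on the leading attribute: matches are contiguous.
    Assumes integer Att1 values so that a1_high + 1 is the next key."""
    lo = bisect.bisect_left(entries, (a1_low,))
    hi = bisect.bisect_left(entries, (a1_high + 1,))
    return entries[lo:hi]


def att2_range(a2_low, a2_high):
    """Matches on Att2 alone are scattered, so every entry is examined."""
    return [e for e in entries if a2_low <= e[1] <= a2_high]
```

A search on Att2 alone gets no help from the ordering: the matching entries for att2_range("a", "a") sit at non-adjacent positions, so the whole list has to be scanned.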

Locating an Index Entry
• Use binary search (index entries sorted)
   o If there are Q pages of index entries, then log2 Q page transfers (a big improvement over binary search of the data pages of an F-page data file, since F >> Q)
• Use a multilevel index: a sparse index on the sorted list of index entries

Two-Level Index
• Separator level is a sparse index over pages of index entries
• Leaf level contains index entries
• Cost of searching the separator level << cost of searching the index level, since the separator level is sparse
• Cost of retrieving a row once the index entry is found is 0 (if integrated) or 1 (if not)

Multilevel Index
• Search cost = number of levels in the tree
• If Φ is the fanout of a separator page, cost is log_Φ Q + 1
• Example: if Φ = 100 and Q = 10,000, cost = 3 (reduced to 2 if the root is kept in main memory)
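The cost formula can be checked by building the levels one at a time. In this sketch Q is the number of leaf pages of index entries and the fanout is the number of pointers per separator page; the function names are assumptions, and the final + 1 is taken to be the read of the leaf page itself.

```python
def separator_levels(q_pages, fanout):
    """Number of separator levels needed until the top level is one page."""
    levels, pages = 0, q_pages
    while pages > 1:
        pages = -(-pages // fanout)  # ceiling division: pages at next level up
        levels += 1
    return levels


def search_cost(q_pages, fanout):
    """Page transfers for one lookup: separator levels plus the leaf page."""
    return separator_levels(q_pages, fanout) + 1


def binary_search_cost(q_pages):
    """Page transfers for a binary search over q_pages sorted index pages,
    computed as ceil(log2 Q) for Q >= 1."""
    return (q_pages - 1).bit_length()
```

For Q = 10,000 and a fanout of 100 this gives two separator levels and a cost of 3, matching the example above, against roughly 14 page transfers for a plain binary search over the same pages.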

Index Sequential Access Method (ISAM)
• Generally an integrated storage structure
   o Clustered; index entries contain rows
• Separator entry = (k_i, p_i); k_i is a search key value; p_i is a pointer to a lower level page
• k_i separates the sets of search key values in the two subtrees pointed at by p_(i-1) and p_i

Figure: Index Sequential Access Method.

• The index is static:
   o Once the separator levels have been constructed, they never change
   o The number and position of leaf pages in the file stays fixed
• Good for equality and range searches
   o Leaf pages are stored sequentially in the file when the storage structure is created, to support range searches
      - If, in addition, pages are positioned on disk to support a scan, a range search can be very fast (the static nature of the index makes this possible)
• Supports multiple attribute search keys and partial key searches
