
Chapter 9

Database Management Systems

Objectives for Chapter 9


- Problems in the flat-file approach to data management
- Why these problems gave rise to the database concept
- Relationships among the elements of the database environment
- Characteristics of the relational database model
- Three stages in database design:
  - Conceptual design
  - Logical design
  - Physical design
- Anomalies caused by unnormalized databases and the need for data normalization
- Features of distributed databases
- Considerations in deciding on a particular database configuration

Flat-File Versus Database Environments


Computer processing involves two components: data and instructions (programs). Conceptually, there are two methods for designing the interface between program instructions and data:
- File-oriented processing: a specific data file is created for each application.
- Data-oriented processing: a single data repository is created to support numerous applications.

Disadvantages of file-oriented processing include redundant data and programs, and varying formats for storing the redundant data. The format of similar fields may vary because different programmers used inconsistent field formats.

Flat-File Environment
User 1 transactions -> Program 1 -> data file A, B, C
User 2 transactions -> Program 2 -> data file X, B, Y
User 3 transactions -> Program 3 -> data file L, B, M

Data Redundancy & Flat-File Problems


- Data storage: redundant data creates excessive storage costs, whether on paper documents or in magnetic form.
- Data updating: any changes or additions must be performed multiple times.
- Currency of information: there is a risk of failing to update all affected files.
- Task-data dependency: users cannot obtain additional information as their needs change.

Database Approach
User 1 transactions -> Program 1 -> DBMS
User 2 transactions -> Program 2 -> DBMS
User 3 transactions -> Program 3 -> DBMS
DBMS -> single database: A, B, C, X, Y, L, M

Advantages of the Database Approach


Data sharing through a centralized database resolves the flat-file problems:
- No data redundancy: data is stored only once, eliminating data redundancy and reducing storage costs.
- Single update: because data is in only one place, it requires only a single update procedure, reducing the time and cost of keeping the database current.
- Current values: a change to the database made by any user yields current data values for all other users.
- Task-data independence: as users' information needs expand beyond their immediate domain, the new needs can be satisfied more easily than under the flat-file approach.
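A minimal Python sketch of the currency problem, assuming hypothetical values for data elements A, B, C, X, Y, L, and M: each flat-file program keeps its own copy of the shared element B, so an update made through one program leaves the other copies stale, while a single shared repository needs only one update.

```python
# Hypothetical values for data elements A, B, C, X, Y, L, M;
# B is the element shared by all three applications.

# Flat-file approach: each program owns a private data file with its own copy of B.
flat_files = {
    "program_1": {"A": 1, "B": "12 Elm St", "C": 3},
    "program_2": {"X": 7, "B": "12 Elm St", "Y": 8},
    "program_3": {"L": 4, "B": "12 Elm St", "M": 5},
}

# User 1 changes B, but only Program 1's file is updated.
flat_files["program_1"]["B"] = "99 Oak Ave"
stale = [name for name, data in flat_files.items() if data["B"] != "99 Oak Ave"]
print(stale)  # ['program_2', 'program_3']: these programs now report obsolete data

# Database approach: one shared repository, so B exists exactly once.
database = {"A": 1, "B": "12 Elm St", "C": 3, "X": 7, "Y": 8, "L": 4, "M": 5}
database["B"] = "99 Oak Ave"  # a single update procedure
seen_by = {name: database["B"] for name in flat_files}
print(seen_by)  # every program sees '99 Oak Ave'
```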

Disadvantages of the Database Approach


- Can be costly to implement: additional hardware, software, storage, and network resources are required.
- Can run only in certain operating environments, which may make it unsuitable for some system configurations.
- Because it is so different from the file-oriented approach, the database approach requires training users, and there may be inertia or resistance.

Elements of the Database Approach


The diagram shows these elements of the database environment: users, user programs, transactions, and user queries; applications and the system development process; the database administrator; the DBMS, with its data definition language, data manipulation language, and query language; the host operating system; and the physical database.

DBMS Features
- User programs: make the presence of the DBMS transparent to the user.
- Direct query: allows authorized users to access data without programming.
- Application development: lets users create their own applications.
- Backup and recovery: copies the database periodically so it can be recovered after a failure.
- Database usage reporting: captures statistics on database usage (who, when, etc.).
- Database access: authorizes access to sections of the database.

Internal Controls and DBMS


The purpose of the DBMS is to provide controlled access to the database. The DBMS is a special software system programmed to know which data elements each user is authorized to access and to deny unauthorized requests for data.
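The DBMS's access control can be sketched as a lookup in an authorization table before any data is returned. The table, the user names, and the function `dbms_read` are hypothetical; a real DBMS keeps such rules in its own catalog and enforces them internally.

```python
# Hypothetical authorization table: the data elements each user may access.
AUTHORIZED = {
    "payroll_clerk": {"employee_name", "pay_rate"},
    "sales_clerk": {"customer_name", "credit_limit"},
}

# Hypothetical stored data elements.
DATABASE = {
    "employee_name": "J. Smith",
    "pay_rate": 25.0,
    "customer_name": "Acme Corp.",
    "credit_limit": 5000,
}

def dbms_read(user: str, element: str):
    """Return a data element only if the user is authorized; otherwise deny the request."""
    if element not in AUTHORIZED.get(user, set()):
        raise PermissionError(f"{user} is not authorized to access {element}")
    return DATABASE[element]

print(dbms_read("payroll_clerk", "pay_rate"))  # 25.0
try:
    dbms_read("sales_clerk", "pay_rate")
except PermissionError as err:
    print("denied:", err)
```

Unknown users fall back to an empty permission set, so every request they make is denied.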

Data Definition Language (DDL)


DDL is a programming language used to define the database to the DBMS. The DDL identifies the names and the relationships of all data elements, records, and files that constitute the database. It defines three viewing levels:
- Internal view: the physical arrangement of records (one per database).
- Conceptual view: the representation of the entire database (one per database).
- User view: the portion of the database each user views (many).

ANSI Model
External user views -> conceptual model -> internal model -> physical database
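As an illustration only, the sketch below uses SQLite (through Python's standard `sqlite3` module) as a stand-in DBMS; the table, view, and index names are hypothetical. `CREATE TABLE` defines records and data elements, `CREATE VIEW` defines one user view, and `CREATE INDEX` adds a physical access structure. Treating the index as the internal level is a simplification, since how much of the physical layout a DDL controls varies by product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Conceptual level: record types, data elements, and keys (hypothetical names).
CREATE TABLE customer (
    customer_no INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    address     TEXT,
    balance     REAL DEFAULT 0
);

-- User view: the portion of the database one group of users sees.
CREATE VIEW customer_contact AS
    SELECT customer_no, name, address FROM customer;

-- Physical access structure (a simplification of the internal level).
CREATE INDEX idx_customer_name ON customer(name);
""")

defined = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type IN ('table', 'view') ORDER BY name"
)]
print(defined)  # ['customer', 'customer_contact']
```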

Data Manipulation Language (DML)


DML is the proprietary programming language that a particular DBMS uses to retrieve, process, and store data. Entire user programs may be written in the DML, or selected DML commands can be inserted into programs written in general-purpose languages such as COBOL and FORTRAN.
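In this sketch Python stands in for a host language such as COBOL, with SQL DML statements embedded in the program through the standard `sqlite3` module. The `inventory` table and the `record_receipt` routine are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (item_no INTEGER PRIMARY KEY, description TEXT, qty_on_hand INTEGER)"
)

def record_receipt(conn, item_no, description, qty):
    """Hypothetical host-language routine with embedded DML: add stock, inserting new items."""
    cur = conn.execute(
        "UPDATE inventory SET qty_on_hand = qty_on_hand + ? WHERE item_no = ?",
        (qty, item_no),
    )
    if cur.rowcount == 0:  # no existing row was updated, so store a new one
        conn.execute("INSERT INTO inventory VALUES (?, ?, ?)", (item_no, description, qty))
    conn.commit()

record_receipt(conn, 1, "Widget", 10)
record_receipt(conn, 1, "Widget", 5)
qty = conn.execute("SELECT qty_on_hand FROM inventory WHERE item_no = 1").fetchone()[0]
print(qty)  # 15
```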

Query Language
The query capability permits end users and professional programmers to access data in the database without the need for conventional programs. IBM's Structured Query Language (SQL) is a fourth-generation language that has emerged as the standard query language.
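An ad hoc SQL query, run here against a hypothetical `sales` table in SQLite, shows the idea: the request states what data is wanted, and no conventional program is written to retrieve it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (invoice_no INTEGER PRIMARY KEY, customer TEXT, amount REAL);
INSERT INTO sales VALUES (1, 'Acme', 100.0);
INSERT INTO sales VALUES (2, 'Baker', 250.0);
INSERT INTO sales VALUES (3, 'Acme', 50.0);
""")

# Ad hoc request: total sales per customer, largest first.
query = """
    SELECT customer, SUM(amount) AS total
    FROM sales
    GROUP BY customer
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Baker', 250.0), ('Acme', 150.0)]
```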

Three Steps in Designing a Database


1. Prepare the conceptual model:
   - Identify the entities.
   - Identify the relationships between the entities.
   - Prepare the ER diagram.
2. Specify the logical design:
   - Select the logical database model (relational).
   - Transform the conceptual data model using the logical database model.
3. Implement the physical design:
   - Physical structures
   - Access methods

Phase 1
Prepare the Conceptual Model

Draw an entity-relationship diagram (ERD) to capture the process.

ER-Diagram Symbols


Entity

Relationship

Attribute

Primary Key

Example of a Relationship Linking Two Entities

CUSTOMER (attributes: Number, Name) places ORDER (attributes: Order Number, Item #), with M (many) on the ORDER side.

An Entity
...is an individual object, concept, or event.
...may be a specific tangible object or an intangible object.

An entity class is a collection of entities with similar attributes.

Attributes
An attribute is a property of an entity that the organization chooses to record because it is of interest.

CUSTOMER (entity)    PRODUCT (entity)
customer #           product #
name                 description
address              finish
telephone no.        price
balance              qty. on hand

Cardinalities
Entity        Relationship   Entity
Salesperson   Assigned       Car
Customer      Places         Order
Vendor        Supply         Inventory

ER-Diagram using REA Model


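Resources (R): Inventory, Cash
Events (E): Sales, Cash Collections
Agents (A): Salesperson, Customer, Cashier

Relationships:
- Inventory - Line items - Sales
- Salesperson - Party to - Sales
- Sales - Made to - Customer
- Cash Collections - Pays for - Sales
- Cash Collections - Increases - Cash
- Cash Collections - Received from - Customer
- Cash Collections - Received by - Cashier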

Phase 2
Specify the Logical Design
Create relational tables.

Logical Data Structures


The particular method used to organize records in a database is called the database's structure. The objective is to develop this structure efficiently so that data can be accessed quickly and easily. Four types of structures are:
- hierarchical (AKA the tree structure)
- network
- relational
- object-oriented

The Relational Model


The relational model portrays data in the form of two-dimensional tables:
- relation: the database table
- attributes (data elements): form the columns
- tuples (records): form the rows
- data: the intersection of rows and columns

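RESTRICT: extracts specified rows from a table.

PROJECT: extracts specified columns from a table.

JOIN: builds a new table from two tables by combining the rows that match on a common attribute. Joining

X1 Y1          Y1 Z1
X2 Y2   with   Y2 Z2
X3 Y1          Y3 Z3

on the common Y attribute yields

X1 Y1 Z1
X2 Y2 Z2
X3 Y1 Z1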
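The three operations can be sketched in plain Python, with each table as a list of rows and each row a dict keyed by attribute name. The tables reuse the X, Y, Z values from the JOIN example; the function names are illustrative. As in relational algebra, `project` drops duplicate rows.

```python
# Tables from the JOIN example: each row is a dict keyed by attribute name.
table_1 = [{"X": "X1", "Y": "Y1"}, {"X": "X2", "Y": "Y2"}, {"X": "X3", "Y": "Y1"}]
table_2 = [{"Y": "Y1", "Z": "Z1"}, {"Y": "Y2", "Z": "Z2"}, {"Y": "Y3", "Z": "Z3"}]

def restrict(table, condition):
    """RESTRICT: keep only the rows that satisfy the condition."""
    return [row for row in table if condition(row)]

def project(table, columns):
    """PROJECT: keep only the named columns, dropping duplicate rows."""
    result = []
    for row in table:
        reduced = {col: row[col] for col in columns}
        if reduced not in result:
            result.append(reduced)
    return result

def join(left, right, on):
    """JOIN: concatenate each pair of rows that agree on the common attribute."""
    return [{**l_row, **r_row} for l_row in left for r_row in right if l_row[on] == r_row[on]]

print(restrict(table_1, lambda row: row["Y"] == "Y1"))
# [{'X': 'X1', 'Y': 'Y1'}, {'X': 'X3', 'Y': 'Y1'}]
print(project(table_1, ["Y"]))
# [{'Y': 'Y1'}, {'Y': 'Y2'}]
print(join(table_1, table_2, "Y"))
# [{'X': 'X1', 'Y': 'Y1', 'Z': 'Z1'}, {'X': 'X2', 'Y': 'Y2', 'Z': 'Z2'}, {'X': 'X3', 'Y': 'Y1', 'Z': 'Z1'}]
```

The row with Y3 has no partner in `table_1`, so it does not appear in the joined result.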

Properly Designed Relational Tables
- No repeating values: all occurrences at the intersection of a row and column are a single value.
- The attribute values in any column must all be of the same class.
- Each column in a given table must be uniquely named.
- Each row in the table must be unique in at least one attribute, which is the primary key.
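The four rules can be checked mechanically. This sketch takes a table as column names plus row tuples; `check_table` is a hypothetical helper, and it approximates "same class" by requiring one Python type per column and "single value" by rejecting lists, tuples, and sets inside a cell.

```python
def check_table(columns, rows, primary_key):
    """Return the rule violations found in a table given as column names and row tuples."""
    problems = []
    if len(set(columns)) != len(columns):
        problems.append("column names are not unique")
    for i, col in enumerate(columns):
        values = [row[i] for row in rows]
        # Single values only: a list, tuple, or set in a cell is a repeating group.
        if any(isinstance(v, (list, tuple, set)) for v in values):
            problems.append(f"{col} holds repeating values")
        # Same class: approximated here by requiring one Python type per column.
        if len({type(v) for v in values}) > 1:
            problems.append(f"{col} mixes value classes")
    keys = [row[columns.index(primary_key)] for row in rows]
    if len(set(keys)) != len(keys):
        problems.append(f"primary key {primary_key} is not unique")
    return problems

good = check_table(["cust_no", "name"], [(1, "Acme"), (2, "Baker")], "cust_no")
bad = check_table(["cust_no", "items"], [(1, ["A", "B"]), (1, "C")], "cust_no")
print(good)  # []
print(bad)
# ['items holds repeating values', 'items mixes value classes', 'primary key cust_no is not unique']
```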

Relational Model Data Linkages (>1 table)


No explicit pointers are present. The data are viewed as a collection of independent tables. Relations are formed by an attribute that is common to both tables in the relation. Foreign keys are assigned as follows:
- 1:1 association: either table's primary key may be embedded as the foreign key in the other table.
- 1:M association: the primary key on the 1 side is embedded as the foreign key on the M side.
- M:M association: foreign keys may be embedded, or a separate linking table may be created.
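A sketch of these foreign key rules in SQLite, assuming hypothetical `customer`, `sales`, `inventory`, and `line_item` tables: the 1:M link embeds the customer key in `sales`, and the M:M link between sales and inventory uses the separate linking table `line_item`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE customer  (cust_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE inventory (part_no INTEGER PRIMARY KEY, description TEXT);

-- 1:M (a customer places many sales): the customer's primary key is
-- embedded as a foreign key on the M side.
CREATE TABLE sales (
    invoice_no INTEGER PRIMARY KEY,
    cust_no    INTEGER NOT NULL REFERENCES customer(cust_no)
);

-- M:M (sales and inventory): a separate linking table holds both keys.
CREATE TABLE line_item (
    invoice_no INTEGER REFERENCES sales(invoice_no),
    part_no    INTEGER REFERENCES inventory(part_no),
    qty        INTEGER,
    PRIMARY KEY (invoice_no, part_no)
);

INSERT INTO customer  VALUES (1, 'Acme');
INSERT INTO inventory VALUES (10, 'Bolt');
INSERT INTO inventory VALUES (20, 'Nut');
INSERT INTO sales     VALUES (100, 1);
INSERT INTO line_item VALUES (100, 10, 5);
INSERT INTO line_item VALUES (100, 20, 8);
""")

# Relations are formed through the common attributes, not through pointers.
rows = conn.execute("""
    SELECT c.name, s.invoice_no, i.description, li.qty
    FROM customer c
    JOIN sales s      ON s.cust_no = c.cust_no
    JOIN line_item li ON li.invoice_no = s.invoice_no
    JOIN inventory i  ON i.part_no = li.part_no
    ORDER BY i.part_no
""").fetchall()
print(rows)  # [('Acme', 100, 'Bolt', 5), ('Acme', 100, 'Nut', 8)]

# A sale for a nonexistent customer violates the foreign key and is rejected.
try:
    conn.execute("INSERT INTO sales VALUES (101, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```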

Three Types of Anomalies


- Insertion anomaly: a new item cannot be added to the table until at least one entity uses that attribute item.
- Deletion anomaly: if the only entity that uses a particular attribute item is deleted, all information about that attribute item is lost.
- Update anomaly: a modification to an attribute must be made in each of the rows in which the attribute appears.

Anomalies can be corrected by splitting the data into properly designed relational tables.
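The anomalies can be reproduced on a small unnormalized table. The data below is hypothetical: supplier details are repeated on every inventory row that the supplier provides.

```python
# Hypothetical unnormalized inventory table: supplier data repeats on every part row.
inventory = [
    {"part_no": 1, "description": "Bolt", "supplier": "S1", "supplier_addr": "1 Main St"},
    {"part_no": 2, "description": "Nut", "supplier": "S1", "supplier_addr": "1 Main St"},
    {"part_no": 3, "description": "Gear", "supplier": "S2", "supplier_addr": "9 Mill Rd"},
]

# Update anomaly: changing S1's address in only one row leaves conflicting values.
inventory[0]["supplier_addr"] = "5 Elm St"
s1_addresses = {row["supplier_addr"] for row in inventory if row["supplier"] == "S1"}
print(len(s1_addresses))  # 2 conflicting addresses for one supplier

# Deletion anomaly: deleting part 3, the only part S2 supplies, also loses S2's address.
inventory = [row for row in inventory if row["part_no"] != 3]
s2_still_known = any(row["supplier"] == "S2" for row in inventory)
print(s2_still_known)  # False

# Insertion anomaly: a new supplier cannot be recorded until it supplies a part,
# because every row needs a part_no (the primary key).
```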

Advantages of Relational Tables


- Removes all three anomalies.
- Various items of interest (customers, inventory, sales) are stored in separate tables.
- Space is used efficiently.
- Very flexible: users can form ad hoc relationships.

The Normalization Process


A process that systematically splits unnormalized complex tables into smaller tables that meet two conditions:
- all nonkey (secondary) attributes in the table are dependent on the primary key
- all nonkey attributes are independent of the other nonkey attributes

When unnormalized tables are split and reduced to third normal form, they must then be linked together by foreign keys.
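Using a hypothetical inventory table whose supplier address depends on the supplier rather than on the part number (a transitive dependency), this sketch splits it into two tables linked by the foreign key `supplier` and checks that joining them again loses no information.

```python
# Hypothetical unnormalized rows: (part_no, description, supplier, supplier_addr).
# supplier_addr depends on supplier, not on the key part_no (a transitive dependency).
unnormalized = [
    (1, "Bolt", "S1", "1 Main St"),
    (2, "Nut", "S1", "1 Main St"),
    (3, "Gear", "S2", "9 Mill Rd"),
]

# Split into two tables whose nonkey attributes depend only on their own key.
parts = {part_no: (desc, supplier) for part_no, desc, supplier, _ in unnormalized}
suppliers = {supplier: addr for _, _, supplier, addr in unnormalized}

def rejoin(parts, suppliers):
    """Link the tables through the foreign key 'supplier' stored in parts."""
    return [(p, d, s, suppliers[s]) for p, (d, s) in sorted(parts.items())]

assert rejoin(parts, suppliers) == unnormalized  # the split loses no information

suppliers["S1"] = "5 Elm St"    # update: one row changes, no conflicting copies
suppliers["S3"] = "7 Pine Ave"  # insertion: a supplier can exist without a part
print(rejoin(parts, suppliers))
# [(1, 'Bolt', 'S1', '5 Elm St'), (2, 'Nut', 'S1', '5 Elm St'), (3, 'Gear', 'S2', '9 Mill Rd')]
```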

Steps in Normalization
Table with repeating groups
-> Remove repeating groups: First normal form (1NF)
-> Remove partial dependencies: Second normal form (2NF)
-> Remove transitive dependencies: Third normal form (3NF)
-> Remove remaining anomalies: Higher normal forms

Accountants and Data Normalization


- The update anomaly can generate conflicting and obsolete database values.
- The insertion anomaly can result in unrecorded transactions and incomplete audit trails.
- The deletion anomaly can cause the loss of accounting records and the destruction of audit trails.

Accountants should have an understanding of the data normalization process and be able to determine whether a database is properly normalized.

Phase 3
Implement the Physical Design
Decide about software and hardware.

Physical Database Design


- Transition from the theoretical to the physical aspects of the database
- IS/IT decisions about software and hardware
- Implementation:
  - Populate the database with data.
  - Produce physical user views (multiple).

Data Structures
Data structures allow records to be located, stored, and retrieved, and allow movement through the database. They have two components:
- The organization of a file is the physical arrangement of records.
- The access method is the technique used to locate records and to navigate through the database.
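The slides do not name specific access methods, so this sketch uses three common ones as examples: a sequential scan, an index searched with binary search (the standard `bisect` module), and a hash table (a Python dict). The records are hypothetical; the sorted list plays the role of the file organization.

```python
import bisect

# File organization: records stored in primary-key order (a sequential file).
records = [(101, "Acme"), (205, "Baker"), (310, "Carter"), (415, "Dunn")]

def sequential_access(key):
    """Read records one after another until the key is found; also count the reads."""
    for reads, (k, name) in enumerate(records, start=1):
        if k == key:
            return name, reads
    return None, len(records)

# Indexed access: binary search over a sorted index of keys gives the record's position.
key_index = [k for k, _ in records]

def indexed_access(key):
    pos = bisect.bisect_left(key_index, key)
    if pos < len(key_index) and key_index[pos] == key:
        return records[pos][1]
    return None

# Hashed access: a hash table maps each key directly to its record.
hash_index = dict(records)

print(sequential_access(310))  # ('Carter', 3)
print(indexed_access(415))     # Dunn
print(hash_index[205])         # Baker
```

The same file organization supports all three access methods; they differ only in how many records must be examined to find one.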

Distributed Data Processing

A central site holds the centralized database and serves Site A, Site B, and Site C.

Distributed Data Processing


DDP organizes data processing around several information processing units (IPUs) distributed throughout the organization and placed under the control of end users. DDP does NOT mean decentralization: the IPUs are connected to one another and coordinated.

Potential Advantages of DDP


- Cost reductions in hardware and data entry tasks
- Improved cost control responsibility
- Improved user satisfaction, since control is closer to the user level
- Improved backup of data through the use of multiple data storage sites

Potential Disadvantages of DDP


- Loss of control
- Mismanagement of organization-wide resources
- Hardware and software incompatibility
- Redundant tasks and data
- Consolidating incompatible tasks
- Difficulty attracting qualified personnel
- Lack of standards

Centralized Databases in DDP Environment


- The data is retained in a central location.
- Remote IPUs send requests for data.
- The central site services the needs of the remote IPUs.
- The actual processing of the data is performed at the remote IPU.

Data Currency
Data currency is a concern in DDP with a centralized database. During transaction processing, the data will temporarily be inconsistent as a record is being read and updated. Database lockout procedures are necessary to keep IPUs from reading inconsistent data and from writing over a transaction being written by another IPU.
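A database lockout can be sketched with Python threads standing in for IPUs and a `threading.Lock` standing in for the DBMS's record lock; the account record and the `post_deposits` routine are hypothetical. Without the lock, two IPUs could read the same balance and one update would overwrite the other.

```python
import threading

balance = {"acct_100": 0}        # hypothetical record in the central database
record_lock = threading.Lock()   # stands in for a DBMS record lock

def post_deposits(amount, times):
    """Read-modify-write a shared record; the lock keeps other IPUs out meanwhile."""
    for _ in range(times):
        with record_lock:                           # lock the record before reading it
            current = balance["acct_100"]           # read
            balance["acct_100"] = current + amount  # update and write
        # lock released: other IPUs may now read the consistent value

ipus = [threading.Thread(target=post_deposits, args=(1, 10_000)) for _ in range(4)]
for t in ipus:
    t.start()
for t in ipus:
    t.join()
print(balance["acct_100"])  # 40000
```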

Distributed Databases: Partitioning


Partitioning splits the central database into segments that are distributed to their primary users. Advantages:
- users' control is increased by having data stored at local sites
- transaction processing response time is improved
- the volume of transmitted data between IPUs is reduced
- the potential for data loss from a disaster is reduced
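Partitioning can be sketched as grouping records by the site of their primary user. The `site` field, the customer records, and the helper names are hypothetical; a real system would also need a directory to route requests for data held at other sites.

```python
from collections import defaultdict

# Hypothetical customer records, each tagged with the site of its primary user.
customers = [
    {"cust_no": 1, "name": "Acme", "site": "A"},
    {"cust_no": 2, "name": "Baker", "site": "B"},
    {"cust_no": 3, "name": "Carter", "site": "A"},
    {"cust_no": 4, "name": "Dunn", "site": "C"},
]

def partition(records, key="site"):
    """Split the central database into segments, one per primary-user site."""
    segments = defaultdict(list)
    for rec in records:
        segments[rec[key]].append(rec)
    return dict(segments)

sites = partition(customers)

def local_lookup(site, cust_no):
    """A local transaction searches only its own site's segment."""
    return next((r for r in sites[site] if r["cust_no"] == cust_no), None)

print(sorted(sites))                     # ['A', 'B', 'C']
print(local_lookup("A", 3)["name"])      # Carter
print(local_lookup("B", 3))              # None: customer 3 is held at site A
```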

The Deadlock Phenomenon


- Especially a problem with partitioned databases
- Occurs when multiple sites lock each other out of data that they are currently using: one site needs data locked by another site.
- Special software is needed to analyze and resolve conflicts.
- Transactions may be terminated and have to be restarted.

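Example with three sites:
- Site holding A, B: locked A, waiting for C
- Site holding C, D: locked C, waiting for E
- Site holding E, F: locked E, waiting for A

Each site waits for data locked by another, so none of them can proceed.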
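Deadlock detection software commonly builds a wait-for graph and looks for a cycle. This sketch encodes the three-site example with hypothetical site names (`site_AB`, `site_CD`, `site_EF`) and assumes each site waits for at most one item at a time.

```python
# Hypothetical site names; each site has locked one item and waits for another.
locked_by = {"A": "site_AB", "C": "site_CD", "E": "site_EF"}    # item -> site holding its lock
waiting_for = {"site_AB": "C", "site_CD": "E", "site_EF": "A"}  # site -> item it needs

def build_wait_for_graph(locked_by, waiting_for):
    """Edge site -> other site whenever the site needs an item the other site has locked."""
    return {site: locked_by[item] for site, item in waiting_for.items()
            if item in locked_by and locked_by[item] != site}

def find_cycle(graph):
    """Follow wait-for edges from each site; returning to a site on the path means deadlock."""
    for start in graph:
        path, node = [], start
        while node in graph and node not in path:
            path.append(node)
            node = graph[node]
        if node in path:
            return path[path.index(node):]
    return None

graph = build_wait_for_graph(locked_by, waiting_for)
cycle = find_cycle(graph)
print(cycle)  # ['site_AB', 'site_CD', 'site_EF']

# Terminating one transaction in the cycle (the victim) removes its wait-for edge,
# which breaks the cycle; the victim is restarted later.
victim = cycle[-1]
graph.pop(victim)
print(find_cycle(graph))  # None
```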

Distributed Databases: Replication


Replication duplicates the entire database at multiple IPUs. This method is effective in situations with a high degree of data sharing but no primary user, and it supports read-only queries. The data traffic between sites is reduced considerably.

Concurrency Problems and Control Issues


Database concurrency is the presence of complete and accurate data at all IPU sites. With replicated databases, maintaining current data at all locations is a difficult task. Time stamping may be used to serialize transactions and to prevent and resolve potential conflicts created by updating data at various IPUs.
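The slides do not specify how time stamping works; this sketch assumes the basic timestamp-ordering rules, one common approach, in which each data item remembers the latest read and write timestamps and an operation that arrives "too late" is rejected so its transaction can restart with a new timestamp. `DataItem`, `read`, and `write` are hypothetical names.

```python
class DataItem:
    """A hypothetical data item that remembers the latest read and write timestamps."""

    def __init__(self, value):
        self.value = value
        self.read_ts = 0   # timestamp of the youngest transaction that read the item
        self.write_ts = 0  # timestamp of the youngest transaction that wrote the item

def read(item, ts):
    """Assumed basic timestamp ordering: reject a read if a younger transaction already wrote."""
    if ts < item.write_ts:
        return False, None
    item.read_ts = max(item.read_ts, ts)
    return True, item.value

def write(item, ts, value):
    """Reject a write if a younger transaction has already read or written the item."""
    if ts < item.read_ts or ts < item.write_ts:
        return False
    item.value, item.write_ts = value, ts
    return True

qty = DataItem(100)
ok_young, _ = read(qty, ts=5)            # transaction 5 reads the item
ok_old = write(qty, ts=3, value=90)      # older transaction 3 then tries to write
print(ok_young, ok_old)                  # True False: transaction 3 must restart
ok_restart = write(qty, ts=6, value=90)  # restarted with a new, later timestamp
print(ok_restart, qty.value)             # True 90
```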
