
RDBMS

Introduction

Any collection of objects intentionally organized for a purpose and for efficient retrieval is a “database”. In that sense, an entire library is a “database”. The term is used casually, and usually incorrectly, to refer to the computer-based “relational database management systems” or “RDBMS” discussed below. An RDBMS refers to the entire concept of business requirements, rules for describing data, commands for creating and manipulating the data, and generating reports, though you should note that most people use the term to refer solely to the computer program.

An RDBMS has several major components: the business logic (the rules that govern what data are to be collected, by whom, when, and how they’re to be used), the data themselves (grouped by function based on the business needs, each with a specific data type; see below), and the commands to manipulate the data. What is informally called a “database” is usually the computer application that carries out the business rules and manipulates the data. Some database applications, such as FileMaker Pro, MS Access, and phpMyAdmin, provide GUI tools to facilitate defining databases and data tables and creating reports from the data. Other tools, such as MySQL, Oracle, and Sybase, are just the tool for creating and manipulating data - the programmer must create the rest of the computer application for input, reports, etc.

See also the .pdf (Database1.pdf)

For students new to RDBMS, it is helpful to think of it as a spreadsheet. There are columns and rows. Every cell in the spreadsheet has a unique row and column address (e.g., row 5, column 2). If we know the row and column, we can find the data. RDBMS are computer implementations of the relational algebra that makes it possible to identify uniquely every cell and every set of related cells. This is the key: we can identify a single cell by some unique value, such as a record number, or by a set of values, such as lastname + firstname + date_of_birth + shoe_size.
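The two ways of identifying a cell can be sketched in a few lines. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for a full RDBMS, and the table name "people" and its three rows are made up for the example.

```python
import sqlite3

# In-memory SQLite database; the table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE people (
        record_no     INTEGER PRIMARY KEY,  -- unique record number (1, 2, 3, ...)
        lastname      TEXT,
        firstname     TEXT,
        date_of_birth TEXT,
        shoe_size     REAL
    )
    """
)
conn.executemany(
    "INSERT INTO people (lastname, firstname, date_of_birth, shoe_size) "
    "VALUES (?, ?, ?, ?)",
    [
        ("Smith", "Jane", "2000-03-14", 7.5),
        ("Smith", "Tom", "1998-11-02", 10.0),
        ("De la Rosa", "Jane", "2000-03-14", 8.0),
    ],
)

# 1) Identify one cell by a unique value: record number (row) + column name.
by_record_no = conn.execute(
    "SELECT firstname FROM people WHERE record_no = ?", (2,)
).fetchone()[0]

# 2) Identify a row by a set of values that, taken together, are unique.
by_values = conn.execute(
    "SELECT record_no FROM people "
    "WHERE lastname = ? AND firstname = ? AND date_of_birth = ? AND shoe_size = ?",
    ("De la Rosa", "Jane", "2000-03-14", 8.0),
).fetchall()

print(by_record_no)
print(by_values)
```

Note that no single column here is unique on its own (two people are named Smith, two are named Jane, two share a birthday); only the record number, or the combination of values, pins down one row.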

Steps in creating an RDBMS

In both systems analysis and information architecture, one of the first phases of creating an RDBMS is to gather information about what data need to be collected, how they will be used and by whom, and the technical concerns about storing, retrieving, and presenting the data.

Data collection: only data that are actually needed ought to be collected. This is determined by interviews, by reviewing emails and other files that document the kinds of problems people have with their current system, and by studying how people in an organization use the data. Ultimately a report is generated that outlines the problem and presents candidate solutions. From that report, a database administrator or programmer considers the data problem from three perspectives: the conceptual, logical, and physical.

Conceptual phase: in this phase, the programmer or analyst breaks down the needs into functions and the relationships among the data and the people who use the data. For example, say you’re creating a payroll system (this becomes the “payroll database”). You need to know the names of the staff, their pay rates, hours worked, whether the check has been printed, how much tax and insurance is to be deducted, and so on. In the conceptual phase, you’d decide how the data should be grouped by function: e.g., a record-hours-worked function, a print-paycheck function, a check-data-for-accuracy function, and so on. At this high level, only the major functions are defined and related; they will become the “tables” that belong to the single database. How the data move between these functions is called “data flow analysis.” The analyst also determines who gets to see what data and when (this is called the “view of the data”). For example, Joe cannot see Tom’s paycheck. Joe cannot change his pay rate but his manager can. These controls have to be specified and documented.

From this we can move to the logical phase. Here the main activities are “data decomposition” and “data normalization.” Data decomposition means breaking down the concepts into something closer to how a computer might use them. The concept of “staff name” must be broken down, say, into last name, first name, middle initial, job title. In addition, the analyst tries to keep redundant data from being collected. For example, we want to gather the staff member’s name only once and at the right time (when hired). This task is called normalization. There are many forms of normalization but the goal for most of us is to reach “third normal form” or 3NF. FileMaker Pro’s and MS Access’s interfaces help enforce 3NF. Otherwise the programmer is required to enforce normalization. [There are times when redundant data must be captured, but this has to be justified in the design of the database system.] All the data that are to be captured must be defined by being given a data type (see the Topic Data_Types) and all the data gathered into a document called the data dictionary. The data dictionary is literally that - the names of all the data to be captured, their data types, their aliases, how the data are used, what table contains what fields, etc. It is critical to the success of any project to have and maintain the data dictionary.

Physical phase. Once the data have been defined, the programmer uses the logical design phase documents to construct the actual database and tables in software. If you use MS Access or a similar product, the program guides you, but it is still easy to make mistakes if you don’t understand how RDBMS work. We’ll focus on MySQL to demonstrate.
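To make the step from the logical phase to the physical phase concrete, here is a minimal sketch of a data dictionary entry held as a Python structure and rendered into a MySQL-style CREATE TABLE statement. Everything in it is an assumption for illustration: the "staff" table, its field names and sizes, and the helper build_create_table are hypothetical, not part of the course materials.

```python
# Hypothetical data-dictionary entry for a payroll "staff" table.
# Field names, types, and sizes are illustrative assumptions.
staff_entry = {
    "table": "staff",
    "primary_key": "record_no",
    "fields": [
        {"name": "record_no", "type": "int unsigned", "rules": "not null auto_increment"},
        {"name": "last_name", "type": "varchar(25)", "rules": "not null"},
        {"name": "first_name", "type": "varchar(15)"},
        {"name": "middle_initial", "type": "char(1)"},
        {"name": "job_title", "type": "varchar(25)"},
    ],
}


def build_create_table(entry):
    """Render one data-dictionary entry as a MySQL-style CREATE TABLE statement."""
    lines = []
    for field in entry["fields"]:
        # Join name, type, and optional rules, skipping empty parts.
        parts = [field["name"], field["type"], field.get("rules", "")]
        lines.append("    " + " ".join(part for part in parts if part))
    lines.append(f"    primary key ({entry['primary_key']})")
    return f"CREATE TABLE {entry['table']} (\n" + ",\n".join(lines) + "\n);"


ddl = build_create_table(staff_entry)
print(ddl)
```

The point of the sketch is the direction of travel: the dictionary is the source of truth, and the physical tables are derived from it, so a change to a field is made in one documented place.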

MySQL is a software program that helps you to create other software programs, specifically the data part of an RDBMS, by issuing various commands. You’ll still need some kind of programming or scripting to interact with the database and tables. [In LIS488, we’ll use php and Java.] All SQL programs cluster their commands into two groups: Data Definition Language (or DDL) and Data Manipulation Language (DML).

DDL commands include CREATE database …, CREATE table … and others

DML commands include INSERT into …, SELECT …, UPDATE … and others
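The split between the two groups can be seen in a short session. This is a sketch under stated assumptions: it runs against SQLite through Python's sqlite3 module rather than MySQL, the "pay_rates" table and its rows are invented, and SQLite has no CREATE DATABASE (opening the connection plays that role). UPDATE is the standard SQL statement for changing existing rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for CREATE DATABASE + USE

# DDL: define the structure that will hold the data.
conn.execute("CREATE TABLE pay_rates (staff_name TEXT NOT NULL, hourly_rate REAL)")

# DML: add, change, remove, and read rows inside that structure.
conn.execute(
    "INSERT INTO pay_rates (staff_name, hourly_rate) VALUES (?, ?)", ("Joe", 18.50)
)
conn.execute(
    "INSERT INTO pay_rates (staff_name, hourly_rate) VALUES (?, ?)", ("Tom", 21.00)
)
conn.execute(
    "UPDATE pay_rates SET hourly_rate = ? WHERE staff_name = ?", (19.25, "Joe")
)
conn.execute("DELETE FROM pay_rates WHERE staff_name = ?", ("Tom",))
rows = conn.execute("SELECT staff_name, hourly_rate FROM pay_rates").fetchall()

# DDL again: ALTER changes the table's structure, not its contents.
conn.execute("ALTER TABLE pay_rates ADD COLUMN hours_worked REAL")
columns = [info[1] for info in conn.execute("PRAGMA table_info(pay_rates)")]

print(rows)
print(columns)
```

A useful rule of thumb: if a command changes what columns or tables exist, it is DDL; if it changes or reads what is stored in them, it is DML.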

Example: Let’s say you’re creating a database about yourself and you want to insert into and retrieve data from the database over the Internet. As a first try, we create a database called “Transcripts” and then a table called “grades”. This is a possible data definition:

field name      data type  size           example
last_name       String     25 characters  Smith, De la Rosa
first_name      String     15 characters  Jane, Tom
middle_initial  char       1              M
collegeName     String     25             Wellesley College
major           String     25             Art History
age             int                       24
course_1        String     6              LIS488
grade_1         float      4              4.0
course_2        String     6              LIS458
grade_2         float      4              3.7
[and so on]

Relational Database Management Systems


If we know these data, we can plan the kind of report (screen) design we want:

Welcome to Jane Doe’s online resume

About me

I’m a 24-year-old Art History major from Wellesley College, now enrolled in GSLIS at Simmons College.

My classes

LIS488    4.0    A
LIS458    3.7    A-

GPA: 3.52

To get these data, we issue the command to SQL to use the database we want:

USE Transcripts;

Then we issue the command to the table that is part of the database (“grades”):

SELECT * FROM grades;

The * means “all columns”, and because there is no WHERE clause to filter the rows, the command is “get all the records from the grades table”. Because there is only 1 record in the grades table, we retrieve all our data (in this case only the 1 record). If everyone in the class wanted to post their grades online, then it would make more sense to separate (decompose) the data into different functions: the student name and personal data in one table, the grades in another table, and the information about the courses in another table.
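A small sketch shows both the single-record retrieval and why the first-try layout does not scale. It assumes SQLite (via Python's sqlite3) in place of MySQL and a cut-down version of the grades table with only two course/grade pairs; the row for Jane Doe is taken from the sample screen above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for "USE Transcripts"

# Cut-down version of the first-try "grades" table: one wide row per person.
conn.execute(
    """
    CREATE TABLE grades (
        last_name TEXT, first_name TEXT,
        course_1 TEXT, grade_1 REAL,
        course_2 TEXT, grade_2 REAL
    )
    """
)
conn.execute(
    "INSERT INTO grades VALUES (?, ?, ?, ?, ?, ?)",
    ("Doe", "Jane", "LIS488", 4.0, "LIS458", 3.7),
)

# SELECT * with no WHERE clause: every column of every record.
all_rows = conn.execute("SELECT * FROM grades").fetchall()

# Averaging means naming each grade_N column by hand; a third course
# would require changing both the table and this query.
average = conn.execute("SELECT (grade_1 + grade_2) / 2 FROM grades").fetchone()[0]

print(all_rows)
print(round(average, 2))
```

The hand-written (grade_1 + grade_2) / 2 is the warning sign: with one row per grade in a separate table, the same question becomes a single AVG(grade) regardless of how many courses a student has taken.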

database: studentInfo

table 1: student_names
table 2: grades
table 3: class_info

So, let’s redefine our database tables and update the data dictionary.

Database: studentInfo
Tables: student_names, grades, class_info

table: student_names
field name      data type  size  notes
record_no       int              not null auto_increment
last_name       String     25
first_name      String     15
middle_initial  char       1
major           String     25
college         String     25

table: grades
field name      data type  size  notes
record_no       int              not null auto_increment
class_number    String     6     not null
grade           String     2

table: class_info
field name      data type  size  notes
class_number    String     6
class_name      String     25
desc            Text


Note that we define Strings but must choose the right kind of String for our database, MySQL. Strings are “varchar()” [a variable-length field, here from 0-255 characters long; “char()” is the fixed-length type]. A “text” field holds up to 65,535 bytes.

Using this data dictionary, the programmer creates the physical form of the database and tables. The underline means the field is indexed. More on that shortly. SQL commands end with a semicolon (;).

CREATE DATABASE studentinfo;

Now, let’s use the empty database and add tables:

USE studentinfo;

To create tables, we add these commands:

CREATE TABLE student_names (
    record_no int unsigned not null auto_increment,
    last_name varchar(25),
    first_name varchar(15),
    middle_initial char(1),
    major varchar(25),
    college varchar(25),
    primary key (record_no)
);

[press the return key]

CREATE TABLE grades (
    record_no int unsigned not null,
    class_number varchar(6) not null,
    grade varchar(2)
);

-- desc is a reserved word in MySQL, so the field name must be quoted with backticks
CREATE TABLE class_info (
    class_number varchar(6),
    class_name varchar(25),
    `desc` text
);

Notice that the student_names.record_no field is “int unsigned” - this means the record number is an integer between 0 and 4,294,967,295. It has to be an integer because the computer performs an arithmetic function on the record number every time a new record is added. [The old record number is reviewed and 1 added to it to create the new record number.] It is “not null” because it is the primary key and we want to index that field for fast retrieval. A primary key cannot be a missing value, so the addition of “not null” forces the SQL program to give us an error if we try to save the record with a null in that field. We know there will be an index (and the primary key) by the last line of the create statement. In the grades table, we see two fields cannot be null. Later we would create indices on these fields, too. Finally, notice that the class_info table has a field called “desc” with a type of Text. Because DESC is a reserved word in MySQL, the field name has to be quoted with backticks (or renamed, e.g., to “description”). The Text type means we can enter up to 65,535 bytes in this field - far longer than we need for a course description.
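The behavior of an auto-incremented record number and of a “not null” field can be tried out directly. The sketch below is an approximation, not the MySQL session itself: it uses SQLite through Python's sqlite3 module, where the closest equivalent of MySQL's "int unsigned not null auto_increment" primary key is "INTEGER PRIMARY KEY AUTOINCREMENT", and the two student rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student_names (
        -- SQLite spelling of MySQL's "int unsigned not null auto_increment"
        record_no  INTEGER PRIMARY KEY AUTOINCREMENT,
        last_name  VARCHAR(25),
        first_name VARCHAR(15)
    );
    CREATE TABLE grades (
        record_no    INTEGER NOT NULL,
        class_number VARCHAR(6) NOT NULL,
        grade        VARCHAR(2)
    );
    """
)

# No record_no is supplied: the database assigns the next number itself.
first = conn.execute(
    "INSERT INTO student_names (last_name, first_name) VALUES (?, ?)",
    ("Doe", "Jane"),
).lastrowid
second = conn.execute(
    "INSERT INTO student_names (last_name, first_name) VALUES (?, ?)",
    ("Smith", "Tom"),
).lastrowid

# Leaving out a NOT NULL field (class_number) is rejected with an error.
try:
    conn.execute("INSERT INTO grades (record_no, grade) VALUES (?, ?)", (first, "A"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(first, second, rejected)
```

Because the record number is generated by the database, the application never has to invent one, and the NOT NULL error is the database refusing to store an incomplete grade record.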

Assignment or Lab - Due before the next class.

1. Practice the various commands listed below and compare their behavior with different options.


Bridge from Entity Relationship modeling to creating SQL databases, tables, & relations

No doubt, you’ve learned already some of the concepts & terminology of relational database modeling. To get up to speed with what you’ve covered, here are some quick notes that may help us harmonize our perspectives.

Components of a DBMS:

Many roles and activities are involved, not all of which you may have encountered. Here’s a view of the components of a DBMS. Notice the different contributions of programmers, users, and the DBA - the database administrator. Note, too, the functions that constitute a DBMS (and where the DDL and DML fit in). Ultimately the commands and functions must be communicated to the computer system itself - via the file manager, various other access methods, and buffers - before reaching the actual data stored in the relational databases and tables.


Query Processor: transforms queries into a series of low-level instructions directed to the database manager.

Database Manager (DM): interacts with the user-submitted application programs and queries. The DM accepts queries and examines the external and conceptual schema to determine what conceptual records are required to satisfy the request. The DM places a call to the File Manager to perform the request.

File Manager: manipulates the underlying storage files and manages the allocation of storage space on the disk. Actual physical manipulation of the data is passed to the appropriate access method.

DML preprocessor: converts DML statements embedded in an application program into standard function calls in the host language (e.g., php or Java).

DDL compiler: converts DDL statements into a set of tables containing metadata. These tables are then stored in the catalog while control info is stored in the data file headers.

Catalog manager: [not the library kind!] manages access to and maintains the system catalog; the system catalog is accessed by most of the DBMS components.

There are other important functions in SQL software such as authorization control, command processor, integrity checker, query optimizer, transaction manager, scheduler, recovery manager, and buffer manager.

Most of these functions are performed by the third-party application (such as MS Access) or can be manipulated at the command line or in other resource files. Very large systems, such as those that usually use Oracle, have lots of these helper resources.
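The idea that DDL statements end up as metadata in a catalog can be observed in practice. The sketch below is an analogy rather than a picture of MySQL's internals: it uses SQLite through Python's sqlite3 module, whose built-in sqlite_master table plays the role of the system catalog (MySQL exposes comparable metadata through INFORMATION_SCHEMA). The index name idx_class_number is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two DDL statements: each one adds an entry to the catalog.
conn.execute(
    "CREATE TABLE class_info (class_number VARCHAR(6), class_name VARCHAR(25))"
)
conn.execute("CREATE INDEX idx_class_number ON class_info (class_number)")

# The catalog is itself a table, so it is queried with ordinary DML.
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY type, name"
).fetchall()

for entry in catalog:
    print(entry)
```

No data rows were inserted, yet the catalog already has two entries: it describes the structure of the database, not its contents.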


Terms:

Relation:

a relation is a table with columns & rows

Attribute:

an attribute is a named column of a relation

Domain:

is the set of allowable values for one or more attributes.

Figure 2: Instances of two tables (branch and staff relations)

What to know

Database: Shared collection of logically related data (and a description of these data), designed to meet the info needs of an organization.
Entity: Distinct object (person, place, thing, concept, or event) represented in the database.
Attribute: A property that describes some aspect of the object we wish to record; also a named column of a relation.
Domain: Set of allowable values for one or more attributes.
Tuple: Row of a relation.
Degree: The degree of a relation is the number of attributes it contains.
Cardinality: The number of rows in a relation.
Superkey: Attribute or a set of attributes that identifies uniquely a tuple within a relation.
Candidate key: Superkey such that no proper subset is a superkey within the relation.
Relational db: Collection of normalized relations.
Primary key: Candidate key that is selected to identify tuples uniquely within the relation.
Foreign key: Attribute or set of attributes within one relation that matches the candidate key of some (possibly the same) relation.
Null: A value for an attribute that is currently unknown or is not applicable for this tuple.


Referential integrity: If a foreign key exists in a relation, either the foreign key value must match a candidate key value of some tuple in its home relation or the foreign key value must be wholly null.
Join: The combination of rows from two or more tables, matched on related column values.
Relationship: Association between several entities.
Relation: A table with columns and rows.
DDL: Data Definition Language - set of commands to define the data and constraints on the data stored in the database.
DML: Data Manipulation Language - provides a general enquiry facility to the data (aka query language).
View of the data: Dynamic result of one or more relational operations on the base relations to produce another relation. A view is a virtual relation that doesn’t actually exist in the DB.
Logical db design: The layout of the relationships of data, using specific design techniques and tools that document the needs and uses of data, according to an organization’s specific data needs.
Business rules: The articulation of an organization’s data needs.
Physical db design: The implementation of the logical database design, usually expressed as the creation of databases, tables, indices, etc.
OODBMS: Object-Oriented Database Management System.
External view: Users’ view of the db.
Conceptual level: What data are stored in the db and the relationship among the data.
Internal level: Physical representation of the db on the computer.
DB Schema: The definition of a database; the structure and content in each data element within the structure. Often created using visualization tools.
Data independence: Techniques that allow data to be changed without affecting the applications that process it.
5GL: Fifth Generation Language - expression referring to computer languages that with each iteration are closer to human language by generation, e.g., 4GL, 5GL.
Data normalization: The process of analyzing data into record groups for more efficient processing. There are many stages, the most standard result being “3NF” (third normal form) where data are identified only by the key field in their record.
3NF: The main purpose is to eliminate having to store a single datum in more than one place. The standard [...]
Information system: Resources that enable the collection, management, control, and dissemination of info throughout an organization.
DB app lifecycle: DB planning, system definition, requirements collection and analysis, db design, application design and prototyping, etc.
Requirements analysis: Process of collecting and analyzing info about the part of the organization that is to be supported by the DB application and using this info to identify the users’ requirements of the new system.
CASE: Computer-Aided Software Engineering - usually software that helps in the development of an information system, including analysis, design, and programming.
Data dictionary: The document (or database) that defines the databases, tables, data types, sources, etc., for an information system.


Entity type: An object or concept that is identified by the organization as having an independent existence.
Weak entity type: Entity that is existence-dependent on some other entity.
Strong entity type: Entity that is not existence-dependent on some other entity.
Composite attribute: Attribute composed of components, each with an independent existence.
Multi-valued attribute: Attribute that holds multiple values for a single entity.
Derived attribute: Attribute that represents a value that is derivable from the value of a related attribute or set of attributes not necessarily in the same entity.
Superclass: Entity type that includes distinct subclasses that are required to be represented in the data model.
Normalization: The process of producing a set of relations with desirable properties, given the data requirements of an enterprise.
1:M: Representation of the one-to-many relationship, one data element related to many others.
M:N: Representation of many-to-many data relationships; database normalization breaks down M:N to 1:M.
1:1: One-to-one data relationship.
Oracle: Popular very large scale relational database product.
SELECT: Database command, or statement, to select data from a table.
ALTER: Database command, or statement, to modify the structure of a database table.
UPDATE: Database command to change the data contents of an existing row.
INSERT: Database command to add new data (add a new row) into a table.
phpMyAdmin: Web-based GUI for working with MySQL.
Data decomposition: Breaking down of work functions into discrete units of data.
ER: Entity-relationship diagram, a graphic representation of data relationships.
UML: Unified Modeling Language, an object-oriented analysis and design language; has 12 diagrams (four structural, five behavioral, and three model management (packages, subsystems, and models)).
Web-enabled DB: A relational database that has been linked to the Internet.
Object-oriented programming: Writing software that supports a model wherein the data and associated processing (“methods”) are defined as self-contained “objects.” OOP has three major features: encapsulation, inheritance, and polymorphism.
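Several of the terms above (primary key, foreign key, referential integrity, M:N broken down to 1:M, join, view, degree, cardinality) can be seen working together in one small schema. This is a sketch under stated assumptions: it uses SQLite through Python's sqlite3 module rather than MySQL, and the table names, class names, the class number LIS400, and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript(
    """
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,          -- primary key
        last_name  TEXT NOT NULL
    );
    CREATE TABLE classes (
        class_number TEXT PRIMARY KEY,           -- primary key
        class_name   TEXT
    );
    -- Students and classes are M:N. This table breaks that into two 1:M
    -- relationships, each carried by a foreign key.
    CREATE TABLE enrollments (
        student_id   INTEGER NOT NULL REFERENCES students (student_id),
        class_number TEXT    NOT NULL REFERENCES classes (class_number),
        grade        TEXT,
        PRIMARY KEY (student_id, class_number)   -- composite primary key
    );
    INSERT INTO students VALUES (1, 'Doe'), (2, 'Smith');
    INSERT INTO classes VALUES
        ('LIS488', 'Example class A'),
        ('LIS458', 'Example class B'),
        ('LIS400', 'Example class C');
    INSERT INTO enrollments VALUES
        (1, 'LIS488', 'A'), (1, 'LIS458', 'A-'), (2, 'LIS488', 'B+');
    -- View: a virtual relation, computed from the base relations by a join.
    CREATE VIEW transcript AS
        SELECT s.last_name, e.class_number, e.grade
        FROM students AS s
        JOIN enrollments AS e ON e.student_id = s.student_id;
    """
)

# Degree = number of attributes; cardinality = number of tuples.
degree = len(conn.execute("SELECT * FROM classes").description)
cardinality = conn.execute("SELECT COUNT(*) FROM classes").fetchone()[0]

# Query the view as if it were a table.
doe_rows = conn.execute(
    "SELECT class_number, grade FROM transcript "
    "WHERE last_name = ? ORDER BY class_number",
    ("Doe",),
).fetchall()

# Referential integrity: a grade for a student who does not exist is refused.
try:
    conn.execute("INSERT INTO enrollments VALUES (?, ?, ?)", (99, "LIS488", "C"))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(degree, cardinality, doe_rows, orphan_rejected)
```

The enrollments table is the design choice worth noticing: it holds nothing but two foreign keys and the one fact (the grade) that belongs to the pairing of a student with a class.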
