RDBMS

Introduction

Any collection of objects intentionally organized for a purpose and for efficient retrieval is
a “database”. In that sense, an entire library is a “database”. The term “database” is used
casually, and usually incorrectly, to refer to the computer-based “relational database
management systems” or “RDBMS”, discussed below. A RDBMS refers to the entire
concept: the business requirements, the rules for describing data, the commands for creating
and manipulating the data, and the generation of reports. Note, though, that most
people use the term to refer solely to the computer program.

A RDBMS has several major components: the business logic (the rules that govern what data are to be collected,
by whom, when, and how they’re to be used), the data themselves (grouped by function based on the business
needs, and by the specific types of data; see below), and the commands to manipulate the data. What is
informally called a “database” is usually the computer application that carries out the business rules and
manipulates the data. Some database applications, such as FileMaker Pro, MS Access, and phpMyAdmin,
provide GUI tools to help define databases and data tables and to create reports from the data. Other tools,
such as MySQL, Oracle, and Sybase, are only the tools for creating and manipulating data - the programmer must
create the rest of the computer application for input, reports, etc. See also the .pdf (Database1.pdf).

For students new to RDBMS, it is helpful to think of it as a spreadsheet. There are columns and rows. Every cell
in the spreadsheet has a unique row and column address (e.g., row 5, column 2). If we know the row and
column, we can find the data. RDBMS are computer implementations of the relational algebra that makes it
possible to identify uniquely every cell and every set of related cells. This is the key: we can identify a single cell
by some unique value, such as a record number, or by a set of values, such as
lastname+firstname+date_of_birth+shoe_size.
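
As a rough sketch of that idea, the snippet below looks up one “cell” first by a record number and then by a
set of values. It uses Python’s built-in sqlite3 module as a stand-in for MySQL; the table name people, the
column names, and the sample rows are all made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for a MySQL server (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE people (
        record_no     INTEGER PRIMARY KEY,  -- a single unique value per row
        lastname      TEXT,
        firstname     TEXT,
        date_of_birth TEXT,
        shoe_size     REAL
    )
    """
)
# Hypothetical sample rows; note that two people share the same name.
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Smith", "Jane", "1990-04-12", 7.5),
        (2, "Smith", "Jane", "1985-11-02", 8.0),
        (3, "De la Rosa", "Tom", "1990-04-12", 10.0),
    ],
)

# The record number picks the row and the column name picks the column.
by_record_no = conn.execute(
    "SELECT shoe_size FROM people WHERE record_no = ?", (2,)
).fetchall()

# The same row-finding idea with a set of values instead of a record number.
# The name alone would match two rows; adding date_of_birth narrows it to one.
by_values = conn.execute(
    "SELECT record_no FROM people "
    "WHERE lastname = ? AND firstname = ? AND date_of_birth = ?",
    ("Smith", "Jane", "1990-04-12"),
).fetchall()

print(by_record_no)
print(by_values)
```

The record number is the simpler choice because it is unique by construction; a set of values works only as
long as no two rows ever share the whole combination.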

Steps in creating a RDBMS


In both systems analysis and information architecture, one of the first phases of creating a RDBMS is to gather
information about what data need to be collected, how they will be used and by whom, and the technical
concerns about storing, retrieving, and presenting the data.
Data collection: only data that are actually needed ought to be collected. What is needed is determined by
interviews, by reviewing emails and other files that document the kinds of problems people have with their
current system, and by studying how people in the organization use the data. Ultimately a report is generated
that outlines the problem and presents candidate solutions. From that report, a database administrator or
programmer considers the data problem from three perspectives: the conceptual, the logical, and the physical.

Conceptual phase: in this phase, the programmer or analyst breaks down the needs into functions and
identifies the relationships among the data and the people who use the data. For example, say you’re creating
a payroll system (this becomes the “payroll database”). You need to know the names of the staff, their pay
rates, the hours worked, whether the check has been printed, how much tax and insurance are to be deducted,
and so on. In the conceptual phase, you’d decide how the data should be grouped by function: e.g., a record
hours worked function, a print paycheck function, a check data for accuracy function, and so on. At this high
level, only the major functions are defined and related; they will become the “tables” that belong to the
single database. How the data move between these functions is called “data flow analysis.” The analyst also
determines who gets to see what data and when (this is called the “view of the data”). For example, Joe cannot
see Tom’s paycheck. Joe cannot change his pay rate but his manager can. These controls have to be specified
and documented.

From this we can move to the logical phase. Here the main activities are “data decomposition” and
“data normalization.” Data decomposition means breaking down the concepts into something closer to how a
computer might use them. The concept of “staff name” must be broken down, say, into last name, first name,
middle initial, and job title. In addition, the analyst tries to keep redundant data from being collected. For
example, we want to gather the staff member’s name only once and at the right time (when hired). This task is
called normalization. There are many forms of normalization, but the goal for most of us is to reach “third
normal form” or 3NF. GUI tools such as FileMaker Pro and MS Access can guide you toward a normalized design,
but they do not guarantee 3NF; the programmer remains responsible for enforcing normalization. [There are
times when redundant data must be captured, but this has to be justified in the design of the database system.]
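
To make the redundancy point concrete, here is a small sketch (not a full 3NF derivation). It stores the same
timesheet data twice: once in a flat table that repeats the staff name on every row, and once split into two
tables so the name is captured only once. It uses Python’s sqlite3 module in place of MySQL, and every table
name, column name, and value is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (an assumption)
conn.executescript(
    """
    -- Unnormalized: the staff name is stored again on every timesheet row.
    CREATE TABLE timesheet_flat (last_name TEXT, first_name TEXT, week INTEGER, hours REAL);
    INSERT INTO timesheet_flat VALUES
        ('Smith', 'Jane', 1, 40), ('Smith', 'Jane', 2, 38), ('Smith', 'Jane', 3, 41);

    -- Normalized: the name is captured once; timesheets refer to it by staff_id.
    CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT);
    CREATE TABLE timesheet (staff_id INTEGER NOT NULL, week INTEGER, hours REAL);
    INSERT INTO staff VALUES (1, 'Smith', 'Jane');
    INSERT INTO timesheet VALUES (1, 1, 40), (1, 2, 38), (1, 3, 41);
    """
)

# How many stored copies of the name exist in each design?
copies_flat = conn.execute(
    "SELECT COUNT(*) FROM timesheet_flat WHERE last_name = 'Smith'"
).fetchone()[0]
copies_normalized = conn.execute(
    "SELECT COUNT(*) FROM staff WHERE last_name = 'Smith'"
).fetchone()[0]

# A name change is a single-row update in the normalized design,
# and every timesheet row sees it through the join.
conn.execute("UPDATE staff SET last_name = 'Jones' WHERE staff_id = 1")
names_after_update = conn.execute(
    "SELECT DISTINCT s.last_name "
    "FROM timesheet t JOIN staff s ON s.staff_id = t.staff_id"
).fetchall()

print(copies_flat, copies_normalized, names_after_update)
```

In the flat table the same change would have to be applied to every row, and missing one row would leave the
data inconsistent.
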

All the data that are to be captured must be defined by being given a data type (see the Topic
Data_Types), and all the definitions are gathered into a document called the data dictionary. The data
dictionary is literally that - the names of all the data to be captured, their data types, their aliases, how
the data are used, what table contains what fields, etc. It is critical to the success of any project to have
and maintain the data dictionary.

Physical phase: once the data have been defined, the programmer uses the documents from the logical design
phase to construct the actual database and tables in software. If you use MS Access or a similar product,
the program guides you, but it is still easy to make mistakes if you don’t understand how RDBMS work.
We’ll focus on MySQL to demonstrate.

MySQL is a software program that helps you to create other software programs, specifically the data
part of a RDBMS, by issuing various commands. You’ll still need some kind of programming or scripting to
interact with the database and tables. [In LIS488, we’ll use PHP and Java.] SQL programs cluster most of their
commands into two groups: Data Definition Language (DDL) and Data Manipulation Language (DML).
DDL commands include CREATE DATABASE …, CREATE TABLE …, and others.
DML commands include INSERT INTO …, SELECT …, UPDATE …, and others.
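
The split between the two groups can be sketched as follows. This is an illustration only: it runs the
statements through Python’s sqlite3 module rather than MySQL, so there is no CREATE DATABASE or USE step
(opening the connection plays that role), and the table and values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (an assumption)

# DDL: define the structure that will hold the data.
conn.execute("CREATE TABLE grades (class_number VARCHAR(6) NOT NULL, grade VARCHAR(2))")

# DML: add, change, remove, and read the data inside that structure.
conn.execute("INSERT INTO grades (class_number, grade) VALUES ('LIS488', 'A')")
conn.execute("INSERT INTO grades (class_number, grade) VALUES ('LIS458', 'B+')")
conn.execute("UPDATE grades SET grade = 'A-' WHERE class_number = 'LIS458'")
conn.execute("DELETE FROM grades WHERE class_number = 'LIS488'")
rows = conn.execute("SELECT class_number, grade FROM grades").fetchall()

# DDL again: ALTER changes the structure of the table, not its contents.
conn.execute("ALTER TABLE grades ADD COLUMN term VARCHAR(10)")
columns = [d[0] for d in conn.execute("SELECT * FROM grades").description]

print(rows)
print(columns)
```

A quick way to tell the groups apart: DDL statements change what columns and tables exist, while DML
statements change or read the rows stored in them.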

Example: Let’s say you’re creating a database about yourself and you want to insert into and retrieve data from
the database over the Internet. As a first try, we create a database called “Transcripts” and then a table called
“grades”. This is a possible data definition:
field name       data type   size            example
last_name        String      25 characters   Smith, De la Rosa
first_name       String      15 characters   Jane, Tom
middle_initial   char        1               M
collegeName      String      25              Wellesley College
major            String      25              Art History
age              int                         24
course_1         String      6               LIS488
grade_1          float       4               4.0
course_2         String      6               LIS458
grade_2          float       4               3.7
[and so on...]

Relational Database Management Systems


If we know these data, we can plan the kind of report (screen) design we want:

Welcome to Jane Doe’s online resume
About me     I’m a 24-year-old Art Major from Wellesley College now enrolled in GSLIS at Simmons College.
My classes   LIS488   4.0   A
             LIS458   3.7   A-    GPA: 3.52

To get these data, we first tell SQL which database we want to use:
USE Transcripts;
Then we issue a command against the table that is part of that database (“grades”):
SELECT * FROM grades;

* means “all columns”, and because the command has no WHERE clause it returns every record, so the command is
“get all the fields of all the records from the grades table”. Because there is only 1 record in the grades
table, we retrieve all our data (in this case only the 1 record).
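
A minimal sketch of that retrieval, turning the single record into part of the report text. It uses Python’s
sqlite3 module in place of MySQL (so there is no USE statement), follows the field names of the first data
definition, and fills in sample values taken from the examples in that definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (an assumption)
conn.row_factory = sqlite3.Row      # lets us read fields by column name

conn.execute(
    """
    CREATE TABLE grades (
        last_name VARCHAR(25), first_name VARCHAR(15), middle_initial CHAR(1),
        collegeName VARCHAR(25), major VARCHAR(25), age INT,
        course_1 VARCHAR(6), grade_1 FLOAT,
        course_2 VARCHAR(6), grade_2 FLOAT
    )
    """
)
conn.execute(
    "INSERT INTO grades VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("Smith", "Jane", "M", "Wellesley College", "Art History", 24,
     "LIS488", 4.0, "LIS458", 3.7),
)

# "*" asks for every column; no WHERE clause means every row.
rows = conn.execute("SELECT * FROM grades").fetchall()
record = rows[0]

about_me = (
    f"I'm a {record['age']}-year-old {record['major']} major "
    f"from {record['collegeName']}."
)
classes = [
    (record["course_1"], record["grade_1"]),
    (record["course_2"], record["grade_2"]),
]

print(about_me)
print(classes)
```

Notice how awkward the numbered columns (course_1, course_2, ...) are to loop over; a design with one row per
course avoids that.
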
If everyone in the class wanted to post their grades online, then it would make more sense to separate
(decompose) the data into different functions: the student name and personal data in one table, the grades in
another table, and the information about the courses into another table.
database: studentInfo
table 1          table 2   table 3
student_names    grades    class_info

So, let’s redefine our database tables and update the data dictionary.
Database: studentInfo
tables: student_names, grades, class_info

table: student_names
record_no        int      not null auto_increment
last_name        String   25
first_name       String   15
middle_initial   char     1
major            String   25
college          String   25

table: grades
record_no        int      not null auto_increment
class_number     String   6    not null
grade            String   2

table: class_info
class_number     String   6
class_name       String   25
desc             Text


Note that we define Strings but must choose the right kind of String for our database, MySQL. Strings
are “varchar()” [a variable-length field; older MySQL versions allow 0-255 characters, and versions since
MySQL 5.0.3 allow up to 65,535 bytes]. A “text” holds up to 65,535 bytes.

Using this data dictionary, the programmer creates the physical form of the database and tables. In the
data dictionary, an underlined field name means the field is indexed. More on that shortly. SQL commands end
with a semicolon (;).

CREATE database studentinfo;

Now, let’s use the empty database and add tables:
USE studentinfo;
To create tables, we add these commands:
CREATE TABLE student_names (
record_no int unsigned not null auto_increment,
last_name varchar(25),
first_name varchar(15),
middle_initial char(1),
major varchar(25),
college varchar(25),
primary key (record_no)
);
[press the return key]

CREATE TABLE grades (
record_no int unsigned not null,
class_number varchar(6) not null,
grade varchar(2)
);

CREATE TABLE class_info (
class_number varchar(6),
class_name varchar(25),
-- desc is a reserved word in MySQL, so the field name is quoted with backticks
`desc` Text
);

Notice that the student_names.record_no field is “int unsigned” - this means the record number is an integer
between 0 and 4,294,967,295. [A “smallint unsigned” covers 0 - 65,535.] It has to be an integer because the
computer performs an arithmetic function on the record number every time a new record is added. [The last
record number is looked up and 1 is added to it to create the new record number.] It is “not null” because it
is the primary key, which is indexed for fast retrieval. A primary key cannot contain a missing value, so
“not null” forces the SQL program to give us an error if we try to save a record without a value. We know
there will be an index (and the primary key) by the last line of the create statement. In the grades table, we
see two fields that cannot be null. Later we would create indices on these fields, too. Finally, notice that
the class_info table has a field called “desc” with a type of Text. This means we can enter up to 65,535
characters in this field - far more than we need for a course description.
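
The behavior of auto_increment and “not null” can be tried out with the sketch below. It is an approximation,
not the MySQL session itself: it uses Python’s sqlite3 module, where the counterpart of “int unsigned not null
auto_increment” plus “primary key” is written INTEGER PRIMARY KEY AUTOINCREMENT. Treating grades.record_no as
the student’s record number is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (an assumption)
conn.executescript(
    """
    CREATE TABLE student_names (
        record_no      INTEGER PRIMARY KEY AUTOINCREMENT,  -- SQLite spelling of auto_increment
        last_name      varchar(25),
        first_name     varchar(15),
        middle_initial char(1),
        major          varchar(25),
        college        varchar(25)
    );
    CREATE TABLE grades (
        record_no    int not null,
        class_number varchar(6) not null,
        grade        varchar(2)
    );
    CREATE TABLE class_info (
        class_number varchar(6),
        class_name   varchar(25),
        `desc`       text
    );
    """
)

# We never supply record_no; the database adds 1 to the last number used.
first_id = conn.execute(
    "INSERT INTO student_names (last_name, first_name) VALUES ('Smith', 'Jane')"
).lastrowid
second_id = conn.execute(
    "INSERT INTO student_names (last_name, first_name) VALUES ('De la Rosa', 'Tom')"
).lastrowid

# Leaving out a "not null" field makes the database refuse the record.
try:
    conn.execute("INSERT INTO grades (record_no, grade) VALUES (?, 'A')", (first_id,))
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True

print(first_id, second_id, null_rejected)
```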

Assignment or Lab - Due before the next class.


1. Practice the various commands listed below and compare their behavior with different options.


Bridge from Entity Relationship modeling to creating SQL databases, tables, & relations

No doubt, you’ve learned already some of the concepts & terminology of relational database modeling. To get
up to speed with what you’ve covered, here are some quick notes that may help us harmonize our perspectives.

Components of a DBMS:
Many roles and activities are involved, not all of which you may have encountered. Here’s a view of the
components of a DBMS. Notice the different contributions of programmers, users, and the DBA - the database
administrator. Note, too, the functions that constitute a DBMS (and where the DDL and DML fit in). Ultimately
the commands and functions must be communicated to the computer system itself - via the file manager and
various other access methods and buffers - before reaching the actual data stored in the relational databases
and tables.

Query Processor: transforms queries into a series of low-level instructions directed to the database manager.
Database Manager (DM): interacts with the user-submitted application programs and queries. The DM accepts
queries and examines the external and conceptual schemas to determine what conceptual records are required to
satisfy the request. The DM then places a call to the File Manager to perform the request.
File Manager: manipulates the underlying storage files and manages the allocation of storage space on the
disk. Actual physical manipulation of the data is passed to the appropriate access method.
DML preprocessor: converts DML statements embedded in an application program into standard function calls in
the host language (e.g., PHP or Java).
DDL compiler: converts DDL statements into a set of tables containing metadata. These tables are then stored
in the catalog, while control information is stored in the data file headers.
Catalog manager: [not the library kind!] manages access to and maintains the system catalog; the system
catalog is accessed by most of the DBMS components.

There are other important functions in SQL software such as authorization control, command processor,
integrity checker, query optimizer, transaction manager, scheduler, recovery manager, and buffer manager.
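
The idea that DDL statements end up as metadata in a catalog can be seen directly in a small sketch. The
example uses Python’s sqlite3 module, so the catalog shown is SQLite’s own (the sqlite_master table and the
table_info pragma); MySQL exposes comparable metadata through its INFORMATION_SCHEMA database instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (an assumption)

# A DDL statement: nothing is stored in the table yet, but the
# definition itself is recorded in the system catalog.
conn.execute(
    "CREATE TABLE class_info ("
    "class_number VARCHAR(6), class_name VARCHAR(25), `desc` TEXT)"
)

# SQLite's catalog lists every object that DDL has created.
catalog_rows = conn.execute("SELECT type, name FROM sqlite_master").fetchall()

# Column-level metadata: each row is (cid, name, type, notnull, default, pk).
column_info = conn.execute("PRAGMA table_info(class_info)").fetchall()
columns = [(name, col_type) for _cid, name, col_type, *_rest in column_info]

print(catalog_rows)
print(columns)
```

This is the same kind of information a hand-written data dictionary records, kept by the database software
itself.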

Most of these functions are performed by the 3rd party application (such as MS Access) or can be manipulated
at the command line or in other resource files. Very large systems, such as those that usually use Oracle, have
lots of these helper resources.


Terms:
Relation: a relation is a table with columns & rows
Attribute: an attribute is a named column of a relation
Domain: a domain is the set of allowable values for one or more attributes.
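
These three terms can be illustrated with a short sketch, again using Python’s sqlite3 module as a stand-in
for MySQL. The list of allowable grades is invented for the example; a CHECK constraint is one way (not the
only way) to state a domain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (an assumption)

# The relation is the table; its attributes are the named columns.
# The CHECK constraint spells out the domain of the grade attribute.
conn.execute(
    """
    CREATE TABLE grades (
        class_number TEXT NOT NULL,
        grade        TEXT CHECK (grade IN ('A', 'A-', 'B+', 'B', 'B-', 'C', 'F'))
    )
    """
)
conn.execute("INSERT INTO grades VALUES ('LIS488', 'A')")

cursor = conn.execute("SELECT * FROM grades")
attributes = [d[0] for d in cursor.description]  # the named columns
rows = cursor.fetchall()                         # the rows of the relation

# A value outside the domain is refused by the database.
try:
    conn.execute("INSERT INTO grades VALUES ('LIS458', 'Z')")
    outside_domain_rejected = False
except sqlite3.IntegrityError:
    outside_domain_rejected = True

print(attributes, rows, outside_domain_rejected)
```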

Figure 2: Instances of two tables (branch and staff relations):

What to know
Database: Shared collection of logically related data (and a description of these data), designed to meet the info needs of an organization.
Entity: Distinct object (person, place, thing, concept, or event) represented in the database.
Attribute: A property that describes some aspect of the object we wish to record; also a named column of a relation.
Domain: Set of allowable values for one or more attributes.
Tuple: Row of a relation.
Degree: The degree of a relation is the number of attributes it contains.
Cardinality: The number of rows in a relation.
Superkey: Attribute or set of attributes that uniquely identifies a tuple within a relation.
Candidate key: Superkey such that no proper subset is a superkey within the relation.
Relational db: Collection of normalized relations.
Primary key: Candidate key that is selected to identify tuples uniquely within the relation.
Foreign key: Attribute or set of attributes within one relation that matches the candidate key of some (possibly the same) relation.
Null: A value for an attribute that is currently unknown or is not applicable for this tuple.


Referential integrity: If a foreign key exists in a relation, either the foreign key value must match a candidate key value of some tuple in its home relation or the foreign key value must be wholly null.
Join: An operation that combines rows from two or more tables on the basis of related column values.
Relationship: Association between several entities.
Relation: A table with columns and rows.
DDL: Data Definition Language - set of commands to define the data and the constraints on the data stored in the database.
DML: Data Manipulation Language - provides a general enquiry facility to the data (aka query language).
View of the data: Dynamic result of one or more relational operations on the base relations to produce another relation. A view is a virtual relation that doesn’t actually exist in the DB.
Logical db design: The layout of the relationships of data, using specific design techniques and tools that document the needs and uses of data, according to an organization’s specific data needs.
Business rules: The articulation of an organization’s data needs.
Physical db design: The implementation of the logical database design, usually expressed as the creation of databases, tables, indices, etc.
OODBMS: Object-Oriented Database Management System.
External view: Users’ view of the db.
Conceptual level: What data are stored in the db and the relationships among the data.
Internal level: Physical representation of the db on the computer.
DB Schema: The definition of a database; the structure and content of each data element within the structure. Often created using visualization tools.
Data independence: Techniques that allow data to be changed without affecting the applications that process it.
5GL: Fifth Generation Language - one of a series of labels (e.g., 4GL, 5GL) for computer languages, each generation said to be closer to human language than the one before.
Data normalization: The process of analyzing data into record groups for more efficient processing. There are many stages, the most standard result being “3NF” (third normal form), where data are identified only by the key field in their record. The main purpose is to eliminate having to store a single datum in more than one place.
3NF: Third normal form, the standard target of data normalization (see Data normalization).
Information system: Resources that enable the collection, management, control, and dissemination of info throughout an organization.
DB app lifecycle: DB planning, system definition, requirements collection and analysis, db design, application design and prototyping, etc.
Requirements analysis: Process of collecting and analyzing info about the part of the organization that is to be supported by the DB application, and using this info to identify the users’ requirements of the new system.
CASE: Computer-Aided Software Engineering - usually software that helps in the development of an information system, including analysis, design, and programming.
Data dictionary: The document (or database) that defines the databases, tables, data types, sources, etc., for an information system.


Entity type: An object or concept that is identified by the organization as having an independent existence.
Weak entity type: Entity that is existence-dependent on some other entity.
Strong entity type: Entity that is not existence-dependent on some other entity.
Composite attribute: Attribute composed of components, each with an independent existence.
Multi-valued attribute: Attribute that holds multiple values for a single entity.
Derived attribute: Attribute that represents a value that is derivable from the value of a related attribute or set of attributes, not necessarily in the same entity.
Superclass: Entity type that includes distinct subclasses that need to be represented in the data model.
Normalization: The process of producing a set of relations with desirable properties, given the data requirements of an enterprise.
1:M: Representation of the one-to-many relationship, one data element related to many others.
M:N: Representation of many-to-many data relationships; database normalization breaks down M:N to 1:M.
1:1: One-to-one data relationship.
Oracle: Popular very-large-scale relational database product.
SELECT: Database command, or statement, to select data from a table.
ALTER: Database command, or statement, to modify the structure of a database table.
UPDATE: Database command to change the data contents of an existing row.
INSERT: Database command to add new data (add a new row) into a table.
phpMyAdmin: Web-based GUI for working with MySQL.
Data decomposition: Breaking down of work functions into discrete units of data.
ER: Entity-relationship diagram, a graphic representation of data relationships.
UML: Unified Modeling Language, an object-oriented analysis and design language; has 12 diagrams (four structural, five behavioral, and three model management: packages, subsystems, and models).
Web-enabled DB: A relational database that has been linked to the Internet.
Object-oriented programming: Writing software that supports a model wherein the data and associated processing (“methods”) are defined as self-contained “objects.” OOP has three major features: encapsulation, inheritance, and polymorphism.
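
Several of the terms above (foreign key, referential integrity, join, view) are easier to grasp in action. The
sketch below uses Python’s sqlite3 module as a stand-in for MySQL; note that SQLite checks foreign keys only
after the PRAGMA shown. The course name is a placeholder, not the real title of LIS488.

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # SQLite stand-in for MySQL (an assumption)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch for foreign key checks
conn.executescript(
    """
    CREATE TABLE class_info (
        class_number TEXT PRIMARY KEY,
        class_name   TEXT
    );
    CREATE TABLE grades (
        record_no    INTEGER NOT NULL,
        -- foreign key: must match a class_number that exists in class_info
        class_number TEXT NOT NULL REFERENCES class_info(class_number),
        grade        TEXT
    );
    INSERT INTO class_info VALUES ('LIS488', 'Intro to Databases');  -- placeholder name
    INSERT INTO grades VALUES (1, 'LIS488', 'A');

    -- view: a stored query (here a join) that behaves like a virtual table
    CREATE VIEW transcript AS
        SELECT g.record_no, c.class_name, g.grade
        FROM grades g JOIN class_info c ON c.class_number = g.class_number;
    """
)

# Referential integrity: a grade for a class that does not exist is refused.
try:
    conn.execute("INSERT INTO grades VALUES (1, 'LIS999', 'B')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

transcript = conn.execute("SELECT * FROM transcript").fetchall()
print(orphan_rejected, transcript)
```

Each class can appear in many grade rows while each grade row points at exactly one class, which is the 1:M
relationship described above.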
