
FIT2094

Databases - Notes

Contents

RELATIONAL MODEL

RELATIONAL DATABASE

RELATIONAL OPERATORS

DATABASE DESIGN LIFE CYCLE

ENTITY RELATIONSHIP DIAGRAMS

LOGICAL MODELLING

NORMALISATION

STRUCTURED QUERY LANGUAGE

BASIC DATA MANIPULATION COMMANDS

SEQUENCES

OPERATORS

ADDITIONAL SELECT KEYWORDS

JOINING TABLES

TRANSACTION MANAGEMENT

SUBQUERIES

SQL FUNCTIONS

RELATIONAL SET OPERATORS

VIRTUAL TABLES

DATABASE CONNECTIVITY

Relational Model

The relational model was introduced in 1970 and contains the fundamentals of a relational
DBMS’s structure. A database management system (DBMS) is the software used to store a
collection of data and to manage how that data is organised and accessed.

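• A relation is an abstract object: a named set of tuples that all share the same attributes.

• A domain is the set of atomic (indivisible) values that an attribute may take. It is described
by a name, a data type and a data format, and it restricts the size of the values allowed.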

A relation has two parts:

• A relation heading, also called a relation schema, consists of a fixed set of attributes:

‣ R(A1, A2,…, An)

‣ R is the relation name and each Ai is an attribute name

‣ Each attribute corresponds to a domain

• A relation body, also known as an instance or state, consists of a time-varying set of tuples.

‣ A tuple is an ordered list of values, similar to an array.

In a tabular representation, the relation heading forms the column headings and the relation body
forms the entries or rows.

For a relation to exist, there must be no duplicate tuples, which means no two tuples may
contain exactly the same set of values.

• Tuples are unordered within a relation, and there is no ordering of attributes within a
tuple.

• Tuple values are atomic, meaning they cannot be divided into further elements.

• By comparison, in a table the rows and columns are ordered and duplicate rows are not removed.

A candidate key K of relation R is an attribute or set of attributes which exhibit the following
properties:

• No two tuples of R have the same value for K (unique)

• No proper subset of K has the uniqueness property (minimality). This means there are no
unnecessary attributes chosen for K.

One candidate key is chosen to be the primary key of the relation, but a relation may have
multiple candidate keys. The remaining candidate keys are termed alternate keys. A superkey is an
attribute or combination of attributes that exhibits the uniqueness property but is not
necessarily minimal.
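The two candidate key properties can be checked against sample data. The sketch below is only an illustration in Python: the function names and the staff rows are hypothetical, and a check against one instance can only show that a key is consistent with the current data, not that it holds for every future tuple.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows share the same values for attrs (uniqueness)."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def is_candidate_key(rows, attrs):
    """True if attrs is unique and no proper subset of attrs is unique."""
    if not is_superkey(rows, attrs):
        return False
    for size in range(1, len(attrs)):
        for subset in combinations(attrs, size):
            if is_superkey(rows, subset):
                return False  # a smaller key exists, so attrs is not minimal
    return True


# Hypothetical sample data for the staff relation
staff = [
    {"staffID": 1, "surname": "Lee", "phone": "555-0101"},
    {"staffID": 2, "surname": "Lee", "phone": "555-0102"},
    {"staffID": 3, "surname": "Wong", "phone": "555-0103"},
]

print(is_candidate_key(staff, ("staffID",)))            # unique and minimal
print(is_candidate_key(staff, ("staffID", "surname")))  # superkey, but not minimal
print(is_superkey(staff, ("surname",)))                 # duplicate surnames exist
```

Here (staffID, surname) is a superkey because it is unique, but it is not a candidate key because staffID alone is already unique.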

A primary key must be chosen considering the data that may be added to the table in the future.
It has a number of desirable characteristics:

• Unique values - must be unique and cannot be duplicated

• Non-intelligent - must not have any semantic meaning behind it (i.e. a string of numbers is
preferred)

• No change over time - the primary key is fixed from the moment of tuple creation

• Single-attribute - contains the minimal number of attributes possible

• Numeric - a numeric key is desired as it can be ordered

• Security-compliant - a social security number, for example, should not be used as a
primary key value

When writing relations, the following format is used, with the primary key underlined:

staff ( staffID, surname, initials, address, phone )

Relational Database

A relational database is a collection of normalised relations. Examples include the following:

• Order ( order_id, order_date )

• Order-Line ( order_id, product_id, quantity )

• Product ( product_id, description, unit_price )

A foreign key is an attribute (or set of attributes) in a table that exists as a primary key in
the same, or another, table. Each foreign key value must either match a primary key value in the
referenced table or be NULL. This pairing between primary keys (PK) and foreign keys (FK) creates
a relationship between tables.

To ensure data integrity, PK values must be unique and not be NULL. The values of the FK must
either match a value of the PK in the related relation or be NULL. All values in the column must
come from the same domain (same data type and range).
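The foreign key rule can be tried out with a small sketch. It assumes Python's built-in sqlite3 module (SQLite only enforces foreign keys once the pragma is switched on), and the department and staff tables are hypothetical examples rather than part of these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE staff (
        staff_id INTEGER PRIMARY KEY,
        surname  TEXT NOT NULL,
        dept_id  INTEGER REFERENCES department (dept_id)
    );
    INSERT INTO department VALUES (10, 'Sales');
""")

conn.execute("INSERT INTO staff VALUES (1, 'Lee', 10)")     # FK matches a PK value
conn.execute("INSERT INTO staff VALUES (2, 'Wong', NULL)")  # a NULL FK is allowed

try:
    conn.execute("INSERT INTO staff VALUES (3, 'Kim', 99)")  # no department 99
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

staff_count = conn.execute("SELECT COUNT(*) FROM staff").fetchone()[0]
print(fk_rejected, staff_count)
```

Only the two rows whose FK value matches a PK value or is NULL are stored; the third insert is refused.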

Relational Operators

Relational operators work similarly to mathematical operators: each takes one or two relations
as input and produces a new relation. They are procedural, so a query is written as a series of
operations. They are:

• Select

• Project

• Product

• Join

• Union

• Intersection

• Difference

• Division

The project operator (π) returns only the chosen columns. Given a list of attributes, it
returns the values of those attributes from all records in the table.

The select operator (σ) returns the records that satisfy a given condition, with all of their
attributes. It is the main operator used in RDBMS systems.

The join operator combines data from two or more relations, based on a common attribute or
attributes:

• The theta join (θ) uses one of the standard comparison operators ( <, ≤, =, ≥, > ) to
connect two relations. The condition acts as a boolean test; a pair of tuples is included
only if the condition is met.

• A natural join (⋈) compares all columns of two tables which have the same column name
and then joins them together and links attributes of the same name (a primary key ID for
example).

• An outer join returns the set of records (or rows) that a normal (or inner) join would
include, but it also includes rows for which no corresponding match is found in the other
table. There are three types:

‣ Left Outer Join or Left Join

‣ Right Outer Join or Right Join

‣ Full Outer Join or Full Join
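The difference between an inner join and a left outer join can be seen in a short sketch. It assumes Python's built-in sqlite3 module, and the customer and orders tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders   (order_id INTEGER PRIMARY KEY, customer_no INTEGER);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders   VALUES (10, 1), (11, 1);   -- Bob has no orders
""")

# Inner join: only customers that have at least one matching order
inner = conn.execute("""
    SELECT c.name, o.order_id
    FROM customer c JOIN orders o ON c.customer_no = o.customer_no
    ORDER BY c.name, o.order_id
""").fetchall()

# Left outer join: every customer, with NULL where no order matches
left = conn.execute("""
    SELECT c.name, o.order_id
    FROM customer c LEFT OUTER JOIN orders o ON c.customer_no = o.customer_no
    ORDER BY c.name, o.order_id
""").fetchall()

print(inner)
print(left)
```

The left outer join keeps Bob even though he has no order; the missing order_id comes back as NULL (None in Python).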

Example:
Suppose we have the following 4 relations:

• Hotel ( Hotel-No, Name, Address )

• Room ( Room-No, Hotel-No, Type, Price )

• Booking ( Hotel-No, Guest-No, Date-From, Date-To, Room-No )

• Guest ( Guest-No, Name, Address )

1) List the name and addresses of all hotels

π Name, Address (Hotel)

This projects the name and address of each tuple found in the Hotel relation. The π
operator is the project symbol.

2) List all single rooms with a price below $50

π Hotel.Name, Room.Room-No ( σ Type = ‘Single’ ∧ Price < 50 (Hotel ⋈ Room))

When implementing multiple relations in a single query, the relation name is often used,
followed by .attribute. Here, it selects all tuples that have the appropriate type and price (∧
is the logical AND symbol) from the natural join of Hotel and Room (⋈). In this case, the
natural join connects Hotel with Room on their common attribute, Hotel-No.

3) List the names and addresses of all guests

π Name, Address (Guest)

Similar to question 1.

4) List the price and type of all rooms at the Grosvenor Hotel

π Room-No, Price, Type (σ Hotel.Name = ‘Grosvenor’ (Hotel ⋈ Room))

Here we project the room number, price and type attributes from the selection of all
hotels with the name ‘Grosvenor’. This is selected from the natural join of the two
relations, similar to question 2. Another approach applies the selection to Hotel before the
join, so that fewer tuples take part in the join. This saves time when few hotels match the
condition, but has no major effect when most of them do.

π Room-No, Price, Type (Room ⋈ σ Name = ‘Grosvenor’ (Hotel))

5) List all names and addresses of guests currently staying at the Grosvenor Hotel (assume that
if the guest has a tuple in the BOOKING relation, then they are currently staying in the hotel)

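π Name, Address (Guest ⋈ π Guest-No (σ Date-From ≤ TODAY ∧ Date-To ≥ TODAY ∧ Name = ‘Grosvenor’ (Hotel ⋈ Booking)))

TODAY stands for the current date and is compared with the booking start and end dates.
Once again, a natural join connects the relation Hotel with Booking on the attribute
Hotel-No. Projecting Guest-No before joining with Guest means the second natural join
matches on Guest-No only, rather than also on the Name and Address attributes that Hotel
and Guest share.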
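Questions 2 and 4 can be checked by translating the algebra into SQL. This is a sketch only: it assumes Python's built-in sqlite3 module, the sample rows are made up, and hyphenated names such as Hotel-No are written as hotel_no.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hotel (hotel_no INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE room  (room_no INTEGER, hotel_no INTEGER, type TEXT, price REAL,
                        PRIMARY KEY (room_no, hotel_no));
    INSERT INTO hotel VALUES (1, 'Grosvenor', 'London'), (2, 'Plaza', 'Leeds');
    INSERT INTO room  VALUES (101, 1, 'Single', 45), (102, 1, 'Double', 90),
                             (201, 2, 'Single', 40);
""")

# Question 4, first form: join, then select, then project
join_then_select = conn.execute("""
    SELECT r.room_no, r.price, r.type
    FROM hotel h JOIN room r ON h.hotel_no = r.hotel_no
    WHERE h.name = 'Grosvenor'
    ORDER BY r.room_no
""").fetchall()

# Question 4, second form: select the hotel first, then join with room
select_then_join = conn.execute("""
    SELECT r.room_no, r.price, r.type
    FROM room r
    JOIN (SELECT hotel_no FROM hotel WHERE name = 'Grosvenor') g
      ON g.hotel_no = r.hotel_no
    ORDER BY r.room_no
""").fetchall()

# Question 2: a selection with two conditions combined by AND
cheap_singles = conn.execute("""
    SELECT h.name, r.room_no
    FROM hotel h JOIN room r ON h.hotel_no = r.hotel_no
    WHERE r.type = 'Single' AND r.price < 50
    ORDER BY r.room_no
""").fetchall()

print(join_then_select == select_then_join)
print(cheap_singles)
```

Both forms of question 4 return the same rows; they differ only in how much data takes part in the join.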

Database Design Life Cycle

When designing a database, there is a sequence of steps that is usually followed before the
product is completed. They are:

Requirements Definition → Conceptual Design → Logical Design → Physical Design

The requirements definition identifies and analyses user views. A user view may be a report to
be produced or a particular type of transaction that should be supported. The output is a
statement of specifications which describes the user views’ particular requirements and
constraints.

During the conceptual design step, the data model of the database is designed. There are
various methodologies that can be employed, but the most common one is the ER diagram.

• An entity relationship (ER) diagram gives a visual indication of the design

• Contains basic components:

‣ Entity

‣ Attribute

‣ Relationship

• They can be used to display both the keys and/or attributes of each table and how they
are connected by relationships.

The logical design develops a data model which targets a particular database model (e.g.
relational, hierarchical, network, etc.) and is independent of the DBMS implementation package.
Normalisation techniques are used to test the correctness of the logical design.

The physical design process develops a strategy for the physical implementation of the logical
data model. It is dependent upon the DBMS environment chosen.

Entity Relationship Diagrams

An ER diagram (ERD) connects relations and attributes in a visual way. It uses the keys defined
for each relation to create a relationship and is often employed during the conceptual or logical
stages of the database design process.

A connection links two or more relations together and is often connected with a key. There are
multiple types of connections in an ERD:

• A one-to-one connection states that a single unique primary key is connected to another
relation with the same unique primary key (often foreign). For example,

Customer (Customer_No) → Addresses (Customer_No)

• A one-to-many connection will have a unique key in one relation that is connected to an
attribute in another table with the same value. For example,

Customer (Customer_No) → Order (Customer_No)

• A many-to-many connection connects an attribute in one relation that could occur
multiple times to the same attribute in another relation that may also have multiple entries.
These cannot be implemented directly in an RDBMS and are usually resolved with a composite
(bridge) entity.

An entity is an object in the system that is modelled and about which information is stored. In
the physical design of a database, an entity is represented by a table. Entities can also have
different properties:

• A strong entity has a key which may be defined without reference to other entities. For
example an EMPLOYEE entity does not require other entities to make sense.

• A weak entity has a key which requires the existence of one or more other entities. For
example, a FAMILY entity must include the key from EMPLOYEE to identify a family. It is
dependent on another entity and its primary key is partially, or totally, derived from its
parent entity.

• A unary relationship exists when an association is maintained within a single entity.
Similarly, binary and ternary relationships exist when two or three entities are
related, respectively.

• A composite or ‘bridge’ entity is used to handle a many-to-many relationship. Its primary
key is made up of the primary keys of both connected tables.

Relationships can be classified into two kinds; identifying and non-identifying:

• An identifying relationship is when the existence of a tuple in an entity depends on a
tuple in a parent entity. Formally, the correct way to create this is to make the foreign key
part of the child entity’s primary key. For example, a PERSON may have multiple phone
numbers, and the primary key of the PHONE_NUMBER entity must include the person_id.
Usually, it is the relationship that connects a strong to a weak entity. It is shown with a
solid line.

• A non-identifying relationship is when the primary key attributes of the parent do not
become primary key attributes of the child. For example, an OWNER can own a
BOOK, but the BOOK can exist without the OWNER. It is shown with a dotted line.

For each attribute, the domain specifies the set of all possible values. There are also many types
of attributes, each with a unique purpose:

• Simple attributes cannot be subdivided:

‣ Siblings, Sex, Marital Status

• Composite attributes can be subdivided into additional attributes:

‣ Address into street, city, zip

• Single-Valued attributes can only have a single value:

‣ Person only has one social security number

• Multi-Valued attributes can have many values:

‣ Person may have several degrees

‣ Car colour may consist of body colour, trim colour, etc.

• Derived attributes can be derived with an algorithm and do not need to be stored:

‣ Age can be derived from date of birth

Logical Modelling

Cardinality describes how many instances of one entity can be related to an instance of another
entity; it refers to the relationships between two entities. In ER diagrams, these relationship
connections are drawn using the Crow’s Foot symbols.

• A circle at the terminal of the relationship line indicates that that particular entity is not
required for the relationship of a particular tuple to exist.

‣ Dotted lines represent a strong-strong entity relationship; a non-identifying one.

‣ Normal lines represent a strong-weak entity relationship; an identifying one.

At different levels, there are different terminologies for terms like relationships and entities.

Conceptual      Logical        Physical
Entity          Relation       Table
Attribute       Attribute      Column
Instance        Tuple          Row
Identifier      Primary Key    Primary Key
Relationship    -              -
-               Foreign Key    Foreign Key

A relation is the logical-level term for an entity. Relations have the following properties:

• Each relation has a unique name in the database

• Each row is unique - duplicate tuples are not allowed

• Each column has a meaningful and unique name

• The order of attributes is unimportant

• The order of tuples is unimportant

• Each entry is atomic; this means each cell can only contain one entry

Mapping is the process of transferring conceptual data, in the form of entities and attributes, into
logical models.

Mapping composite attributes into a relation requires only the simple components to be included.
This improves data accessibility and helps maintain data quality. For example, ADDRESS could
be mapped into STREET, CITY, STATE and ZIP, as four separate attributes.

Mapping multivalued attributes requires a different process:

• When the regular entity type contains a multivalued attribute, two new relations are
created.

• The first relation contains all the attributes of the entity type except the multivalued
attribute itself.

• The second relation contains two new attributes that form the primary key. One of the
attributes is the PK from the first relation, which becomes the foreign key (FK) in the
second relation and the other is the multivalued attribute.

• For example, if an EMPLOYEE has several SKILLS, the second relation pairs the employee’s
PK with one skill per tuple.
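The multivalued mapping can be sketched as two tables. The example assumes Python's built-in sqlite3 module; the table names, column names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- First relation: every attribute except the multivalued one
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL
    );
    -- Second relation: PK of the first relation plus the multivalued attribute
    CREATE TABLE employee_skill (
        emp_id INTEGER NOT NULL REFERENCES employee (emp_id),
        skill  TEXT NOT NULL,
        PRIMARY KEY (emp_id, skill)
    );
    INSERT INTO employee VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO employee_skill VALUES (1, 'SQL'), (1, 'Python'), (2, 'SQL');
""")

skills_of_ana = [
    row[0]
    for row in conn.execute(
        "SELECT skill FROM employee_skill WHERE emp_id = 1 ORDER BY skill"
    )
]

# The composite PK stops the same skill being stored twice for one employee
try:
    conn.execute("INSERT INTO employee_skill VALUES (1, 'SQL')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(skills_of_ana, duplicate_rejected)
```

Each skill becomes its own tuple, so every cell stays atomic.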

To map a weak entity, create a new relation and include all the simple attributes in the relation.
The PK of the identifying relation is included as a FK in the weak relation. For example, a
FAMILY relation would include the PK of EMPLOYEE as a FK that also forms part of its own PK.

To map a binary relationship, such as a one-to-many (1:M) relationship, first create the relation for
each of the two entity types participating in the relationship. Then include the PK attribute(s) of the
entity on the ‘one’ side of the relationship as a FK in the relation on the ‘many’ side.

A many-to-many (M:N) relationship is mapped in a similar way, except that a new relation is
created whose PK combines the PKs of the two participating relations, each acting as a FK.
One-to-one (1:1) relationships mean that every single tuple in one relation is connected to, at
most, one tuple in the other relation and vice versa.

• The primary key on the mandatory side of the relationship becomes the foreign key on the
optional side of the relationship.

• Where both sides are optional, place the FK on the side which causes the fewest NULL
values.

• If both sides of the relationship are mandatory, then it is likely the two entities can be
merged into one relation.

Unary relationships are ones where a relationship exists between two tuples within the same
relation. For example, an EMPLOYEE could supervise multiple EMPLOYEES. If this is the case:

• Create a relation for the entity type

• Add a FK within the same relation that references the PK of the relation

• A recursive foreign key is a FK in a relation that references the PK values of the same
relation.
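A recursive foreign key can be sketched as follows, assuming Python's built-in sqlite3 module. The supervisor_id column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (
        emp_id        INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        supervisor_id INTEGER REFERENCES employee (emp_id)  -- recursive FK
    );
    INSERT INTO employee VALUES (1, 'Dana', NULL);  -- nobody supervises Dana
    INSERT INTO employee VALUES (2, 'Eli', 1);
    INSERT INTO employee VALUES (3, 'Fay', 1);
""")

# Join the table to itself: e is the employee, s is their supervisor
pairs = conn.execute("""
    SELECT e.name, s.name
    FROM employee e JOIN employee s ON e.supervisor_id = s.emp_id
    ORDER BY e.name
""").fetchall()

print(pairs)
```

The FK is left NULL for the employee at the top of the hierarchy, which is why the column cannot be declared NOT NULL in this design.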

Ternary relationships (between three entities) must be turned into three separate
binary relationships between the three relations. Often, another relation can be created to help
facilitate this. For example, a ternary relationship PATIENT TREATMENT between the three
entities PATIENT, PHYSICIAN and TREATMENT could become an additional relation that holds a
foreign key to each of the three relations.

Normalisation

Normalisation is a process that assigns attributes to entities so that data redundancies are
reduced or eliminated. It corrects table structure to reduce the likelihood of data anomalies
occurring. In databases, this occurs in different normal forms, namely 1NF, 2NF and 3NF (there are
others, but they become increasingly complicated and require more and more joins to query).
Normalisation operates on the logical level.

Denormalisation is the process of lowering the normal form of the database to meet
performance requirements, whilst still producing the desired output. For example, 3NF could be
converted to 2NF if needed.

There are three types of data anomalies:

• An update anomaly exists when one or more instances of duplicated data is updated, but
not all. For example, in a table with duplicate entries of the same person, if the home
address is changed on one person, it must also be changed on all the other entries with
the same person.

• An insert anomaly occurs when certain attributes cannot be inserted into the database
without the presence of other attributes.

• A delete anomaly occurs when the deletion of one attribute causes a loss of data
from other attributes.

The object of normalisation is to produce a set of relations and data that conform to the following
properties:

• Each table represents a single subject - for example, COURSE will only contain
information about the courses, not the students doing them.

• No data item will be unnecessarily stored in more than one table - this stops update
anomalies from occurring.

• All nonprime attributes in a table are dependent on the primary key - the entire primary key
and only the primary key. This ensures data can be uniquely identified.

‣ A prime attribute is a key attribute, usually associated with the primary key.

‣ A nonprime attribute is not part of a key.

• Each table is void of insertion, update and delete anomalies - this ensures the integrity of
the database is maintained and data is consistent.

• The primary key is the candidate key (the minimal, irreducible, superkey) selected to
identify the rows of each table.

Dependency describes the extent to which the value of one attribute determines the value of
another attribute in the same table:

• Functional dependency is a relationship that exists when one attribute uniquely
determines another attribute. For example, attribute A determines attribute B (that is, B is
functionally dependent on A) if all of the rows in the table that agree in value for attribute A
also agree in value for attribute B.

• Full functional dependency occurs if attribute B is functionally dependent on a
composite key A but not on any subset of that composite key. The attribute B is then fully
functionally dependent on A.

• Partial dependency is a condition in which an attribute is dependent on only a portion of
the primary key.

• A transitive dependency exists when there are functional dependencies such that X → Y,
Y → Z and X is the primary key. This means that X determines Z via Y. The transitive
dependency in this case is X → Z. It is a condition in which an attribute is dependent on
another attribute that is not part of the primary key.
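The definition of functional dependency (rows that agree on A also agree on B) translates directly into a check over sample rows. The sketch below is illustrative only; the function name holds and the enrolment data are hypothetical, and a check on sample data can disprove a dependency but cannot prove it for all future data.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs
        if seen.setdefault(left, right) != right:
            return False
    return True


# Hypothetical enrolment data with composite key (student_id, unit_code)
enrolment = [
    {"student_id": 1, "unit_code": "DB101", "unit_name": "Databases", "mark": 80},
    {"student_id": 2, "unit_code": "DB101", "unit_name": "Databases", "mark": 65},
    {"student_id": 1, "unit_code": "AL101", "unit_name": "Algorithms", "mark": 72},
]

key = ("student_id", "unit_code")

# mark depends on the whole key and on no single part of it: full dependency
full = holds(enrolment, key, ("mark",)) and not any(
    holds(enrolment, (part,), ("mark",)) for part in key
)

# unit_name depends on unit_code alone, which is only part of the key: partial
partial = holds(enrolment, ("unit_code",), ("unit_name",))

print(full, partial)
```

The partial dependency of unit_name on unit_code is the kind of dependency that conversion to 2NF removes.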

First normal form (1NF) describes the tabular format in which:

• All of the key attributes are defined.

• There are no repeating groups in the table (each row/column intersection contains one and
only one value, not a set of values).

• All attributes are dependent on the primary key.

1NF holds all data in a single table. As such, all relational tables must satisfy the 1NF
requirements. To convert to 1NF the following steps must be taken:

• Eliminate the repeating groups:

‣ A repeating group is a group of multiple entries of the same type that can exist for
any single key attribute occurrence. For example, a car can have multiple colours for
its top, interior, bottom, trim, etc.

‣ Start by presenting the data in a tabular format, where each cell contains a single
value and there are no repeating groups.

‣ Eliminate nulls by making sure each group has a suitable data value.

• Identify the primary key:

‣ An adequate candidate key must be chosen to be the primary key that can uniquely
identify all tuples in a table.

‣ Primary keys can be composed of a combination of attributes if needed.

• Identify all dependencies:

‣ A dependency diagram maps out all data dependencies (primary key, partial and
transitive) that occur within a table structure.

However, the problem with 1NF is that it can still contain partial dependencies, which are based
on only part of the primary key. This means it is still subject to data anomalies.

Second normal form (2NF) is the tabular format which:

• Is in 1NF.

• Includes no partial dependencies - no attribute is dependent on a portion of the primary
key.

Conversion to 2NF is needed only when the 1NF table has a composite primary key. If the 1NF
table has a single-attribute primary key, then the table is already in 2NF. If the primary key is
composite, the following steps are taken:

• Make new tables to eliminate partial dependencies:

‣ For each component of the primary key that acts as a determinant in a partial
dependency, create a new table with a copy of that component as the primary key.

‣ These components must be copied to the new table, but must still exist in the old
table, where they will become foreign keys.

• Reassign corresponding dependent attributes:

‣ The attributes that are dependent in a partial dependency are removed from the
original table and placed in the new table with the dependency’s determinant.

‣ All attributes that are not dependent in a partial dependency are left in the old
table.


At this point, most anomalies are removed, as duplicate items no longer exist. However, it is still
possible for transitive dependencies to exist in 2NF. This means the primary key may rely on one
or more nonprime attributes to functionally determine other nonprime attributes, as indicated by
a functional dependence among the nonprime attributes.

Third normal form (3NF) is the tabular format which:

• Is in 2NF.

• Contains no transitive dependencies.

To create 3NF the following steps are taken:

• Make new tables to eliminate transitive dependencies:

‣ For every transitive dependency, write a copy of its determinant as a primary key for a
new table.

‣ A determinant is any attribute whose value determines other values within a row. For
example, if you have three different transitive dependencies, you have three different
determinants.

‣ The determinants are still required to be within the original table as a foreign key.

• Reassign corresponding dependent attributes:

‣ Place the dependent attributes in the new tables with their determinants and remove
them from their original tables.

Similarly to the creation of 2NF, when there is a dependency (whether partial or transitive), the
solution is to create a new table with the dependent attributes and to connect it with the original
via a primary-foreign key pair. It is important that 2NF is achieved before 3NF can be achieved.
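Removing a transitive dependency can be sketched in a few lines of Python. The relation, attribute names and helper functions below are hypothetical; the example assumes staff_id → dept_id and dept_id → dept_name, and shows that joining the two new tables on the PK-FK pair gives back the original rows.

```python
def project(rows, attrs):
    """Relational projection: keep attrs and drop duplicate tuples."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(unique)]


def natural_join(left, right):
    """Join rows that agree on every attribute name the relations share."""
    if not left or not right:
        return []
    shared = [a for a in left[0] if a in right[0]]
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in shared)
    ]


# 2NF table: dept_name depends on dept_id, which is not part of the key
staff_2nf = [
    {"staff_id": 1, "name": "Ana", "dept_id": 10, "dept_name": "Sales"},
    {"staff_id": 2, "name": "Ben", "dept_id": 10, "dept_name": "Sales"},
    {"staff_id": 3, "name": "Cy", "dept_id": 20, "dept_name": "IT"},
]

# New table keyed by the determinant; the determinant stays behind as a FK
department = project(staff_2nf, ("dept_id", "dept_name"))
staff = project(staff_2nf, ("staff_id", "name", "dept_id"))

rejoined = natural_join(staff, department)
print(department)
print(rejoined == staff_2nf)
```

After the split, the name 'Sales' is stored once instead of once per staff member, which removes the update anomaly for department names.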

Structured Query Language

Structured Query Language (SQL) combines a data definition language (DDL) for creating databases,
tables, indexes and views with a data manipulation language (DML) for updating and inserting data
in the database. It has a basic command set of under 100 unique commands. In SQL, a query
covers both a question asked of and an action done to the database; this could be creating a table
or retrieving a set of cells.

Using a standard RDBMS, you must be authenticated before tables can be created.
Authentication is the process the DBMS uses to verify only registered users can access the data.
This usually is encompassed by a username/password login.

A schema is a logical group of database objects, such as tables and indexes, that are related to
each other. Usually, a schema belongs to a single user or application. A single database can hold
multiple schemas that belong to different users. They allow the database to group tables by
owner or function and enforce a level of security by allowing each user to only see the tables that
belong to their particular schema. The following is the SQL code for creating a schema.

CREATE SCHEMA AUTHORIZATION {creator};

Usually, this command is optional and not required.

Data types are required to be selected for each column, and these are strict; all cells in each
column must adhere to the correct data type. For names and text, varchars are used, while for
numeric values, integers or decimals could be used. It depends on the use of the column, and
this does not change after the creation of the table. Commonly used data types supported by SQL
include INTEGER, DECIMAL, CHAR, VARCHAR and DATE.

Some other types include TIME, TIMESTAMP, DOUBLE, CURRENCY and LOGICAL.

To create a table using SQL, the CREATE command is used, and each column must be specified
with both a name and its appropriate data type.

CREATE TABLE T_Table (

column1 INTEGER NOT NULL UNIQUE,

column2 INTEGER,

column3 VARCHAR (25) NOT NULL,

column4 CHAR (3) NOT NULL,

PRIMARY KEY (column1),

FOREIGN KEY (column2) REFERENCES T_OtherTable ON UPDATE CASCADE

);

In the example above, the table is created with four columns with the following properties:

• The NOT NULL specification means that a data entry must be made for that particular field.
Validation at the application end can also be added so that a value must be entered.

• The UNIQUE specification creates a unique index on that attribute. It avoids having
duplicated values in that specific column.

• The primary key attributes carry both a not null and a unique specification.

• The REFERENCES keyword connects the foreign key column with the primary key of
another table.

• ON UPDATE CASCADE allows the table to be updated correctly if the value in the
connected table of the foreign key is changed. Some RDBMS programs do not support
this clause.

• The entire column list is enclosed in parentheses and the column definitions are separated
by commas.

• The entire command sequence ends with a semicolon (usually, but depends on the
RDBMS program being used).

In table names and column names, reserved keywords may not be used. Reserved keywords are
words that are used by SQL to perform specific functions, such as UPDATE or SUM.

Constraints are a set of rules that help protect the integrity of the database. The
foreign key is constrained by the ON UPDATE specification and by the table it is
referencing. A change in one table must be reflected automatically in a connected table. Besides
the PK and FK constraints, the ANSI SQL standard defines the following constraints as
well:

• The NOT NULL constraint ensures that a column does not accept null values.

• The UNIQUE constraint ensures that all values in a column are unique.

• The DEFAULT constraint assigns a value to an attribute when a new row is added to a
table. The end user may, of course, enter a value other than the default value.

• The CHECK constraint is used to validate data when an attribute value is entered, such as
checking for a minimum value or maximum date. The data is only accepted if it meets the
specified condition.

For example, the following SQL could be made for the creation of a table:

CREATE TABLE T_Table (

column1 INTEGER NOT NULL UNIQUE,

column2 INTEGER,

column3 VARCHAR (25) DEFAULT 'My Database' NOT NULL,

column4 CHAR (3) NOT NULL CHECK (column4 IN ('000', '111')),

PRIMARY KEY (column1),

FOREIGN KEY (column2) REFERENCES T_OtherTable ON UPDATE CASCADE

);
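The effect of each constraint can be observed by trying inserts that break it. This sketch assumes Python's built-in sqlite3 module rather than the RDBMS used in these notes, and it leaves out the foreign key column so that the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE t_table (
        column1 INTEGER NOT NULL UNIQUE,
        column3 VARCHAR(25) DEFAULT 'My Database' NOT NULL,
        column4 CHAR(3) NOT NULL CHECK (column4 IN ('000', '111')),
        PRIMARY KEY (column1)
    )
""")

# column3 is omitted, so the DEFAULT value is stored
conn.execute("INSERT INTO t_table (column1, column4) VALUES (1, '000')")
default_value = conn.execute(
    "SELECT column3 FROM t_table WHERE column1 = 1"
).fetchone()[0]


def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


check_rejected = rejected("INSERT INTO t_table (column1, column4) VALUES (2, '222')")
unique_rejected = rejected("INSERT INTO t_table (column1, column4) VALUES (1, '111')")
null_rejected = rejected("INSERT INTO t_table (column1, column4) VALUES (3, NULL)")

print(default_value, check_rejected, unique_rejected, null_rejected)
```

Only the first insert succeeds; the other three each violate one of the CHECK, UNIQUE and NOT NULL constraints.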

Basic Data Manipulation Commands

To insert a row into a particular table, SQL uses the INSERT command. Below is an example of
how to add a new row, with N number of columns:

INSERT INTO T_Table VALUES (value1, value2, …, valueN);

In some cases, provided the column does not have a NOT NULL constraint, a value may be left
out for a particular column. Here, the NULL keyword can be used in place of the value:

INSERT INTO T_Table VALUES (value1, NULL, value3, …, valueN);

If only some of the values need to be inserted, and the others are to be left empty, then
the columns that receive values can be listed explicitly. In the example below, the
only columns that have a value inserted are column1 and column3. The rest are left null.

INSERT INTO T_Table(column1, column3) VALUES (value1, value3);

In many RDBMS products, changes made to a table’s contents are not saved permanently until
they have been committed. The COMMIT command saves all work:

COMMIT;

Commit commands also help preserve the integrity of the database by making all inserted,
updated or deleted data permanent together.

To produce a query, the SELECT command retrieves data from one or more tables and displays it
as a result set. For example, to list the entire contents of a particular table, the following SQL
command will be executed:

SELECT * FROM T_Table;

The (*) is a wildcard character and means ‘all columns’. So the above command will select every
column of every row from the table and output them. In contrast, to only view a couple of columns
from a table, the following command could be executed:

SELECT column1, column3 FROM T_Table;

The FROM clause of the query specifies which table or tables the data is to be retrieved from.

Once data is inserted, the UPDATE command can be used to modify data:

UPDATE T_Table SET column2 = 'AAA' WHERE column3 = '3';

In the example above, the query will check every row in the table and, only in those rows where
column3 has a value of '3', it will update the value of column2 to 'AAA'. Similarly, updates can
be made to multiple columns at once:

UPDATE T_Table SET column2 = 'AAA', column4 = 'BBB' WHERE column3 = '3';

If the new data has not been committed yet, the ROLLBACK command undoes any changes
made to the database and returns the state back to the last commit.

ROLLBACK;
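The interaction between COMMIT and ROLLBACK can be sketched as follows. The example assumes Python's built-in sqlite3 module, where conn.commit() and conn.rollback() play the role of the two SQL commands; the table and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_table (column2 TEXT, column3 TEXT)")
conn.execute("INSERT INTO t_table VALUES ('old', '3')")
conn.commit()  # the inserted row is now permanent

conn.execute("UPDATE t_table SET column2 = 'AAA' WHERE column3 = '3'")
during = conn.execute("SELECT column2 FROM t_table").fetchone()[0]

conn.rollback()  # undo everything done since the last commit
after = conn.execute("SELECT column2 FROM t_table").fetchone()[0]

print(during, after)
```

The update is visible inside the open transaction, but the rollback returns the row to its last committed state.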

The DELETE statement can be used to delete a row from a table. For the example below, every
row which contains a ‘3’ in column3 will be removed:

DELETE FROM T_Table WHERE column3 = '3';

Alternatively, to delete every row from a table, the where clause is not needed:

DELETE FROM T_Table;

Note, this command does NOT delete the entire table, just the contents of it.

Data can be inserted into a table using a select query, known as a subquery. A subquery is a
query embedded (or nested) inside another query, and is also known as a nested query or an
inner query. For example, all data in a particular column of a table can be added to another
table with the command:

INSERT INTO T_TableNEW SELECT column1 FROM T_Table2;

The subquery can be as complicated as needed and the insert line will still work appropriately:

INSERT INTO T_TableNEW SELECT column1 FROM T_Table WHERE column3 = '3';
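A runnable version of the filtered insert, assuming Python's built-in sqlite3 module and made-up rows, looks like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_table    (column1 INTEGER, column3 TEXT);
    CREATE TABLE t_tablenew (column1 INTEGER);
    INSERT INTO t_table VALUES (1, '3'), (2, '4'), (3, '3');
""")

# The subquery picks the rows; the outer INSERT stores its result
conn.execute(
    "INSERT INTO t_tablenew SELECT column1 FROM t_table WHERE column3 = '3'"
)

copied = [
    row[0]
    for row in conn.execute("SELECT column1 FROM t_tablenew ORDER BY column1")
]
print(copied)
```

The subquery must return the same number of columns, with compatible types, as the table being inserted into.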

Any changes to the table’s structure can be made using the ALTER TABLE command, followed
by a clause describing the change. There are three main alterations that can be made: add,
modify and drop.

The ADD clause adds one or more new columns to the table.

ALTER TABLE T_Table ADD (column6 CHAR(1));

The column will be added to the table and, unless a default clause is given, the value will
be NULL for all existing rows. For this reason, a NOT NULL clause should not be given to a new
column without a default when the table already contains rows, as this will give an error
message. Other column properties can be added to the new column, and multiple columns can be
added too:

ALTER TABLE T_Table ADD (column6 CHAR(1) DEFAULT 'A', column7 CHAR(2));

The MODIFY command can change the properties of a particular column in a table. For example,
to change the datatype of a column, the following command could be executed:

ALTER TABLE T_Table MODIFY (column3 CHAR(3));

In most cases, this is only allowed if the column being changed is already empty. If it has data in
it, only adjustments to the length of data can be made. For example, the number of digits
stored in a decimal field can be updated:

ALTER TABLE T_Table MODIFY (column5 DECIMAL(9,2));

The DROP command removes a column from a table. Columns that are involved in a foreign key
relationship cannot be dropped, nor can a column that is the only one in its table. Similarly, a
whole table can only be dropped if it is not on the ‘one’ side of a relationship, as other tables
would still reference it.

ALTER TABLE T_Table DROP (column6, column7);

A separate command, DROP TABLE, removes an entire table from the database:

DROP TABLE T_Table;

Sometimes it is necessary to break a table up into smaller tables, and SQL provides a way that
avoids manual copying of data. This is known as a partial table.

CREATE TABLE T_Part AS SELECT column1, column2 FROM T_Table;

However, when creating a new table based on another table, the foreign and primary keys are not
added. To define a primary key, use:

ALTER TABLE T_Part ADD PRIMARY KEY (column1);

Similarly, foreign keys can also be added and reference other tables.

ALTER TABLE T_Part ADD FOREIGN KEY (column2) REFERENCES T_Table;
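The loss of key definitions in a table created from a SELECT can be observed with a short sketch. It uses Python's sqlite3 module and SQLite rather than Oracle, and the table contents are hypothetical. SQLite does not support adding a primary key with ALTER TABLE, so only the copy step is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_Table (column1 INTEGER PRIMARY KEY, column2 TEXT, column3 TEXT)")
conn.execute("INSERT INTO T_Table VALUES (1, 'AAA', '3')")

# Create the partial table from a SELECT; the data is copied, the keys are not.
conn.execute("CREATE TABLE T_Part AS SELECT column1, column2 FROM T_Table")

def primary_key_columns(table):
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})") if row[5] > 0]

source_keys = primary_key_columns("T_Table")
copy_keys = primary_key_columns("T_Part")
copied_rows = conn.execute("SELECT column1, column2 FROM T_Part").fetchall()

print(source_keys, copy_keys, copied_rows)
```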

Sequences

Oracle does not support AutoNumber data types for creating an auto generated primary key. A
sequence can be used to assign values to a column on a table. A basic sequence can be created
by:

CREATE SEQUENCE seqName [START WITH n] [INCREMENT BY n] [CACHE | NO CACHE];


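For example, a sequence for a key that increments by 1 from zero could be created as follows. In
Oracle, an ascending sequence has a default minimum value of 1, so MINVALUE 0 is needed for a
start value of zero:

CREATE SEQUENCE column_seq START WITH 0 INCREMENT BY 1 MINVALUE 0;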


CACHE specifies whether or not the database will preallocate sequence numbers in memory. In
SQL Server, NO CACHE is two words, but in other RDBMSs such as Oracle the clause is one word
(NOCACHE). In Oracle, you can check all of the sequences created using the following SQL command:

SELECT * FROM USER_SEQUENCES;


Once a sequence has been created, its values can be used as the primary key when inserting rows
into a table. For example, inserting a row into a table with an auto-incremented primary key
could use the following command:

INSERT INTO T_Table VALUES (column_seq.nextval, ‘2’, NULL, ‘AAA’);

The .nextval pseudocolumn retrieves the next value in the specified sequence and advances the
sequence, so that the next insert receives a new value. Once a sequence value has been used, it
cannot be used again, even if the row that received it has been deleted from the table. You can
drop a sequence from a database using:

DROP SEQUENCE column_seq;

Dropping a sequence does not remove values from a table that previously used the sequence
numbers.

Operators

A partial table can be created by restricting what has been selected from one or more other
tables. The WHERE clause is used to add conditional restrictions. If no rows match the
conditions, the output table will be empty. For example:

SELECT column1, column2 FROM T_Table WHERE column3 = ‘345’;

For some conditions, the following operators can be used:

As SQL compares text character by character, the operators can also be applied to text and
characters. For example, selecting all values where a column < ‘C’ would select all text that began
with A or B. The characters in a string are compared from left to right, meaning that the word ‘Be’
has a greater value than the word ‘Adjudication’. Hence, if numbers are stored in a text field, the
value ‘5’ will be interpreted as being greater than ‘44’.
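The left-to-right comparison of text can be checked with a minimal sketch. It uses Python's sqlite3 module with SQLite, which is an assumption made for convenience; the comparisons involve only literal values, so no table is needed. SQLite reports true as 1 and false as 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Text is compared character by character, so '5' sorts after '44'.
text_cmp = conn.execute(
    "SELECT '5' > '44', 'Be' > 'Adjudication', 'Apple' < 'C'"
).fetchone()

# The same digits compared as numbers give the opposite answer.
num_cmp = conn.execute("SELECT 5 > 44").fetchone()[0]

print(text_cmp, num_cmp)
```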

Dates can also be compared using these operators, with date values written in day, month, year
order. For example:

SELECT column1 FROM T_Table WHERE columnDate >= ’20-Jan-2016’;

SELECT column1 FROM T_Table WHERE columnDate >= ’20-01-2016’;

SELECT column1 FROM T_Table WHERE columnDate >= ’20/01/2016’;

If columnDate is a date formatted field, and the database is configured to accept each of these
date formats, then the above three SQL queries will output the same result.

Additional columns can be created from expressions. For example, if you wanted to multiply the
value of one column to another column in a table, the following could be done:

SELECT column1, column2 * column3 FROM T_Table;

The output of this example would produce two columns: the column1 value for each row and an
additional column holding the result of the expression for each row. Such a column is known as a
computed column. By default, this computed column will have a name such as ‘column2 * column3’,
but it can be given an alias, which is an alternative name for a column or table in an SQL
statement. An alias is assigned using the AS keyword followed by the new name of the column:

SELECT column1, column2 * column3 AS totValue FROM T_Table;

The rules of precedence are the rules that establish the order in which computations are
performed. The operations are computed in the following order:

• Parentheses ( )

• Power Operations

• Multiplication and Division

• Addition and Subtraction

For multiple conditions, the logical operators are used to combine conditions into one larger
statement. The OR operator will return a row if at least one condition is met. For example:

SELECT * FROM T_Table WHERE column1 > ‘2’ OR column2 = ’45’;

In this query, each row in the output will have a column1 value greater than 2, a column2 value
of 45, or both. The OR operator is not exclusive, so rows that meet both conditions are also
returned.

The AND operator requires all of the conditions to be met for the row to be outputted. For
example:

SELECT * FROM T_Table WHERE column1 > ‘2’ AND column2 = ’45’ AND column3 = ‘A’;

In this example, all three conditions must be satisfied to be used in the query output. The AND
and OR conditions can also be combined:

SELECT * FROM T_Table WHERE column1 > ‘2’ AND column2 = ’45’ OR column3 = ‘A’;

The AND operator takes precedence over the OR operator, so in this example the column1 and
column2 conditions are grouped first. To group the column2 and column3 conditions instead,
parentheses can be used.

SELECT * FROM T_Table WHERE column1 > ‘2’ AND (column2 = ’45’ OR column3 = ‘A’);

These two queries will produce a different set of data.
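The difference between the two groupings can be seen on a small hypothetical table. The sketch below uses Python's sqlite3 module with SQLite instead of Oracle, and stores column1 as a number so that the comparison with 2 is numeric.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_Table (column1 INTEGER, column2 TEXT, column3 TEXT)")
conn.executemany(
    "INSERT INTO T_Table VALUES (?, ?, ?)",
    [(1, "45", "A"), (3, "45", "B"), (5, "10", "B")],
)

def column1_values(where):
    # Return the column1 values of the rows that satisfy the given condition.
    sql = f"SELECT column1 FROM T_Table WHERE {where} ORDER BY column1"
    return [row[0] for row in conn.execute(sql)]

# No parentheses: AND is evaluated before OR.
ungrouped = column1_values("column1 > 2 AND column2 = '45' OR column3 = 'A'")
# Same grouping written out explicitly.
and_first = column1_values("(column1 > 2 AND column2 = '45') OR column3 = 'A'")
# Parentheses force the OR to be evaluated first.
or_first = column1_values("column1 > 2 AND (column2 = '45' OR column3 = 'A')")

print(ungrouped, and_first, or_first)
```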

The NOT operator negates a value of a conditional expression. For example:

SELECT * FROM T_Table WHERE NOT (column1 = ‘3’);

In this example, all rows where column1 has a value excluding ‘3’ will be selected.

The BETWEEN operator can check if an attribute has a value within a range of two values, including
the two end values. For example:

SELECT * FROM T_Table WHERE column1 BETWEEN ’50’ AND ‘100’;

Some databases do not support the BETWEEN operator. In this case, the following query is
equivalent, as the range includes both end values:

SELECT * FROM T_Table WHERE column1 >= ‘50’ AND column1 <= ‘100’;
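Whether the end values are included can be checked with a short sketch. It uses Python's sqlite3 module with SQLite, and hypothetical numeric values for column1, so it illustrates the inclusive behaviour of BETWEEN rather than any Oracle-specific detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_Table (column1 INTEGER)")
conn.executemany("INSERT INTO T_Table VALUES (?)", [(49,), (50,), (75,), (100,), (101,)])

def matching(where):
    sql = f"SELECT column1 FROM T_Table WHERE {where} ORDER BY column1"
    return [row[0] for row in conn.execute(sql)]

between = matching("column1 BETWEEN 50 AND 100")
inclusive = matching("column1 >= 50 AND column1 <= 100")  # same rows as BETWEEN
strict = matching("column1 > 50 AND column1 < 100")       # drops both end values

print(between, inclusive, strict)
```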

To check for a null attribute value, the IS NULL keyword can be used. For example, the following
query can check all rows for a null value and update it with an actual value:

UPDATE T_Table SET column1 = ‘AAA’ WHERE column1 IS NULL;

It is important not to test whether a value is equal to NULL (column1 = NULL), as NULL is not a
specific value but rather marks the absence of a value, and such a comparison is never true.
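A minimal sketch of this behaviour, using Python's sqlite3 module with SQLite and a hypothetical one-column table. The equality test finds no rows even though two rows hold NULL, while IS NULL finds both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_Table (column1 TEXT)")
conn.executemany("INSERT INTO T_Table VALUES (?)", [("x",), (None,), (None,)])

def count(where):
    return conn.execute(f"SELECT COUNT(*) FROM T_Table WHERE {where}").fetchone()[0]

equals_null = count("column1 = NULL")  # comparison with NULL is never true
is_null = count("column1 IS NULL")     # the correct test

# Replace the missing values, as in the UPDATE statement above.
conn.execute("UPDATE T_Table SET column1 = 'AAA' WHERE column1 IS NULL")
updated = count("column1 = 'AAA'")

print(equals_null, is_null, updated)
```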

The LIKE operator is used in conjunction with wildcards to find patterns within string attributes.
Standard SQL allows the wildcards % and _ to be used in LIKE conditions (some products, such as
Microsoft Access, use * and ? instead).

• % matches any sequence of zero or more characters in that position.

‣ ‘J%’ will include Jim, James, John, Jack and Johnson

‣ ‘Ja%’ will include James and Jack

‣ ‘%n’ will include John and Johnson

• _ means any one character may be substituted for the underscore.

‣ ‘_23-456’ will include ‘123-456’, ‘223-456’ and ‘323-456’

‣ ‘_23-_56’ will include ‘123-456’ and ‘723-756’

‣ ‘_o_es’ will include Jones, Cones and Roles

• They can also be used in combination

‣ ‘_23-456%’ will include ‘123-456’ and ‘123-456-789’

A query using the LIKE operator can be written as:

SELECT * FROM T_Table WHERE column1 LIKE ‘J%’;

Keep in mind that string comparisons are case sensitive in many databases, including Oracle, so
‘J%’ will not yield the value ‘jim’. To fix this, the UPPER (or alternatively LOWER) function can
transform the characters in the string to upper (or lower) case characters before comparing.

SELECT * FROM T_Table WHERE UPPER (column1) LIKE ‘J%’;

In the expression above, a value of ‘jim’ in the column1 field will return a true condition. The
logical operators (NOT, OR and AND) can also be used in conjunction with the LIKE operator.
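The wildcard examples listed earlier can be reproduced with a short sketch. It uses Python's sqlite3 module with SQLite and a hypothetical T_Names table. Note that LIKE in SQLite ignores case for ASCII letters by default, unlike Oracle, so this sketch only illustrates the % and _ wildcards and not the case-sensitivity point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_Names (name TEXT)")
names = ["Jim", "James", "John", "Jack", "Johnson", "Jones", "Cones", "Roles"]
conn.executemany("INSERT INTO T_Names VALUES (?)", [(n,) for n in names])

def like(pattern):
    # Return the names matching the LIKE pattern, sorted alphabetically.
    rows = conn.execute("SELECT name FROM T_Names WHERE name LIKE ?", (pattern,))
    return sorted(row[0] for row in rows)

starts_with_j = like("J%")      # % stands for any run of characters
starts_with_ja = like("Ja%")
ends_with_n = like("%n")
one_char_gaps = like("_o_es")   # _ stands for exactly one character

print(starts_with_j, starts_with_ja, ends_with_n, one_char_gaps)
```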

Many queries that require multiple OR operators to check if a value is in a set of values can be
replaced with the IN operator. This operator will return true if a value exists in a set of fixed values.

SELECT * FROM T_Table WHERE column1 IN (‘2’, ‘3’, ‘5’, ‘7’, ’11’);

Only if the value of column1 is equal to one of these elements will the condition yield true. The IN
operator is particularly useful to check if a row exists in a subquery created. For example:

SELECT * FROM T_Table WHERE column1 IN (SELECT column1 FROM T_Table2);

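The EXISTS operator can be used to check whether a subquery returns any rows. It is written
without a column name in front of it, and the subquery normally refers to a column of the outer
query. For example:

SELECT * FROM T_Table WHERE EXISTS

(SELECT * FROM T_Table2 WHERE T_Table2.column1 = T_Table.column1);

In this case, only the rows in T_Table where the column1 value exists in T_Table2 will be outputted.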
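The relationship between IN and EXISTS can be sketched as follows, using Python's sqlite3 module with SQLite and two hypothetical single-column tables. The last query shows why the EXISTS subquery normally refers back to the outer row: without that link, the subquery returns rows for every outer row and nothing is filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_Table (column1 INTEGER)")
conn.execute("CREATE TABLE T_Table2 (column1 INTEGER)")
conn.executemany("INSERT INTO T_Table VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO T_Table2 VALUES (?)", [(2,), (3,), (4,)])

def values(sql):
    return [row[0] for row in conn.execute(sql)]

with_in = values(
    "SELECT column1 FROM T_Table "
    "WHERE column1 IN (SELECT column1 FROM T_Table2) ORDER BY column1"
)
with_exists = values(
    "SELECT column1 FROM T_Table WHERE EXISTS "
    "(SELECT * FROM T_Table2 WHERE T_Table2.column1 = T_Table.column1) ORDER BY column1"
)
# No link to the outer row: true for every row, as T_Table2 is not empty.
unlinked_exists = values(
    "SELECT column1 FROM T_Table WHERE EXISTS (SELECT * FROM T_Table2) ORDER BY column1"
)

print(with_in, with_exists, unlinked_exists)
```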

Additional SELECT Keywords

SQL provides useful keywords and functions that can sort the output, count rows, find the minimum
or maximum values, calculate averages and so on. The ORDER BY clause is useful when the listing
order is important. Both ascending and descending order are offered, and ascending order is used
by default.

SELECT * FROM T_Table ORDER BY column2;

To produce a list in descending order, the DESC keyword can be used (ASC for ascending but is
usually unnecessary).

SELECT * FROM T_Table ORDER BY column2 DESC;

A cascading order sequence is a nested ordering sequence for a set of rows, such as a list in
which all last names are alphabetically ordered and, within each last name, the first names are
ordered. For this example:

SELECT * FROM T_Names ORDER BY last_name, first_name, middle_name;

The order in which the column names are entered in the ORDER BY clause is the nested order
that the rows will be arranged.

To select only the distinct values that exist in a column, the DISTINCT clause can be used. For
example, if two entries had a value of ‘Jim’ in first_name, only one Jim will be outputted:

SELECT DISTINCT first_name FROM T_Names;

The aggregate functions perform calculations on a set of rows. The COUNT function creates a tally
of the number of non-null values that a query outputs. For example:

SELECT COUNT( DISTINCT column1 ) FROM T_Table;

This query will output just the tally of unique column1 values. It will not include the actual
values in the column. To count every row regardless of null values, an asterisk can be used in
place of a column name:

SELECT COUNT (*) FROM T_Table;

There are other functions that can do similar calculations, as seen in the table below:

The MIN and MAX functions return the lowest and highest value of a specified column respectively.

SELECT MIN( column1 ) FROM T_Table;

Similarly to the COUNT function, the table values are not included, just the output of the function.
In the same example, if it is required to select the entire row (rather than just the minimum value)
of the entry that has the minimum value for a particular field, the minimum must be computed in a
nested query and compared against the column in the WHERE clause.

SELECT * FROM T_Table WHERE column1 = ( SELECT MIN( column1 ) FROM T_Table );

The MAX function uses the same syntax.

The SUM function will return the total value of a particular column from all rows queried. It can
be combined with the AS clause to name the output. For example:

SELECT SUM( column1) AS totalBalance FROM T_Table;

The AVG function is performed in the same way and finds the mean value for a specific column.

Rows can be grouped into smaller collections quickly and can be accessed using the GROUP BY
clause. It is generally used when you have columns combined with aggregate functions in the
SELECT statement. For example, to determine the minimum value of all rows with distinct values
in a particular column:

SELECT column1, MIN( column2 ) FROM T_Table GROUP BY column1;

This will output the minimum value for each distinct value of column1. It is important to note that
every attribute in the SELECT list that is not inside an aggregate function must also appear in the
GROUP BY clause.

A particularly useful clause that comes with the GROUP BY clause is the HAVING clause. It is
applied to the output of the GROUP BY operation and restricts which groups are selected, in the
same way that WHERE restricts individual rows. For example:

SELECT column1, AVG( column2 ) FROM T_Table GROUP BY column1 HAVING AVG( column2 ) < 10;

In this case, only the groups that had an average value from column2 less than 10 will be
selected.
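Grouping and the HAVING filter can be illustrated with a small sketch. It uses Python's sqlite3 module with SQLite and hypothetical data in which column1 takes the values a, b and c.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_Table (column1 TEXT, column2 INTEGER)")
conn.executemany(
    "INSERT INTO T_Table VALUES (?, ?)",
    [("a", 4), ("a", 8), ("b", 20), ("b", 10), ("c", 9)],
)

# One output row per distinct column1 value, with the minimum of its group.
min_per_group = conn.execute(
    "SELECT column1, MIN(column2) FROM T_Table GROUP BY column1 ORDER BY column1"
).fetchall()

# HAVING keeps only the groups whose average is below 10.
low_average_groups = conn.execute(
    "SELECT column1, AVG(column2) FROM T_Table "
    "GROUP BY column1 HAVING AVG(column2) < 10 ORDER BY column1"
).fetchall()

print(min_per_group, low_average_groups)
```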

Joining Tables

The most important distinction between a relational database and other databases is the
ability to combine or join tables on common attributes. A join is performed when data is retrieved
from more than one table at a time. To join tables, list the tables in the FROM clause of the
SELECT statement and match the common columns in the WHERE clause. Without such a condition,
every row of one table is paired with every row of the other.

SELECT T_Table.column1, T_Table2.column1, T_Table.column2

FROM T_Table, T_Table2

WHERE T_Table.column3 = T_Table2.column3;

In this example, the query will join the two tables on matching values of column3. When using
joins, it is important to specify which table each column comes from (TABLE.COLUMN). When
joining three or more tables in SQL, a join condition is needed for each pair of tables being
connected, so n tables need n - 1 conditions. For example:

SELECT T_Table.column1, T_Table2.column1, T_Table3.column1

FROM T_Table, T_Table2, T_Table3

WHERE T_Table.column3 = T_Table2.column3

AND T_Table.column3 = T_Table3.column3;

Circular joins should be avoided. In the above example, a further join condition between
T_Table2 and T_Table3 must not be added, as it is already implied by the two conditions given.

An alias may be used to identify the source table from which data is taken. For example, a
shorthand of T_Table can be created.

SELECT T1.column1, T2.column1

FROM T_Table T1, T_Table2 T2

WHERE T1.column3 = T2.column3;

This shortens the query and minimises the chance of spelling mistakes.

A recursive query is a query that joins a table to itself. For example, in an employee table in
which each row also records the employee’s manager, a recursive query can join a manager to an
employee within the same table:

SELECT E.num, E.lastName, E.manager, M.lastName

FROM T_Employee E, T_Employee M

WHERE E.manager = M.num

ORDER BY E.manager;
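The self-join can be tried on a tiny hypothetical employee table. The sketch uses Python's sqlite3 module with SQLite, and adds E.num to the ORDER BY clause only to make the output order predictable. The employee with no manager does not appear, because a NULL manager matches no row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_Employee (num INTEGER, lastName TEXT, manager INTEGER)")
conn.executemany(
    "INSERT INTO T_Employee VALUES (?, ?, ?)",
    [(1, "Lee", None), (2, "Ng", 1), (3, "Paz", 1)],
)

# The same table is listed twice under two aliases: E (employee) and M (manager).
employees_with_manager = conn.execute(
    "SELECT E.num, E.lastName, E.manager, M.lastName "
    "FROM T_Employee E, T_Employee M "
    "WHERE E.manager = M.num "
    "ORDER BY E.manager, E.num"
).fetchall()

print(employees_with_manager)
```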

The relational join operator merges rows from two tables and returns the rows with the following
conditions:

• Have common values in common columns (natural join)

• Meet a given join condition (equality or inequality)

• Have common values in common columns or have no matching values (outer join)

The joining syntax used above to connect two tables together is the old syntax and is no longer
in common use. The more common way is to use the JOIN clause.

Join operations can be classified as inner joins or outer joins. An inner join is a join operation
in which only rows that meet a given criterion are selected. The join criterion can be an equality
condition (natural join or equijoin) or an inequality condition (theta join). It is the most
commonly used type of join.

The outer join is an operation that produces a table in which all unmatched pairs are retained;
unmatched values in the related table are left null. Below is a table with different join types.

A cross join performs a relational product (also known as the cartesian product) of two tables.

SELECT * FROM T_Table CROSS JOIN T_Table2;

If there are 8 rows in T_Table and 12 rows in T_Table2, then the cross join will output 96 rows.
Each row from one table will be mapped with each row from the other table. It is also equivalent
to the old style join syntax:

SELECT * FROM T_Table, T_Table2;

A natural join returns all rows with matching values in the matching columns and eliminates
duplicate columns. This style of query is used when the tables share one or more common
attributes with common names. It will perform the following tasks:

• Determine the common attributes

• Select only the rows with common values in common attributes

• If there are no common attributes, return the cross join of the two tables

It is important that the matching attributes have the same name for the natural join to work
correctly.

SELECT * FROM T_Table NATURAL JOIN T_Table2;

Multiple tables can also be joined (assuming a column that matches in all three exists):

SELECT * FROM T_Table NATURAL JOIN T_Table2 NATURAL JOIN T_Table3;

Another way to express a join is with the USING keyword. This query returns only the rows with
matching values in the column named in the USING clause, which must exist in both tables with
the same name. The column name is written in parentheses.

SELECT * FROM T_Table JOIN T_Table2 USING (column1);

In this case, even if there are multiple matching columns, only the rows with matching column1
values will be selected.

When the tables have no common attribute names, a join can be expressed with the JOIN ON
clause. The query will return only the rows that meet the indicated join condition. This
way, the columns do not need to share the same name, but must have comparable data types.

SELECT * FROM T_Table JOIN T_Table2 ON T_Table.column1 = T_Table2.column2;

An outer join returns not only the rows matching the join condition, but returns the rows with
unmatched values too. There are three types; left, right and full. The left and right types reflect the
order in which the join operations are processed. The left table is the first table named, and the
right table is the second one.

• The left outer join returns all the rows from the first table and all the matching rows from
the second table:

SELECT * FROM T_Table LEFT JOIN T_Table2 ON T_Table.column1 = T_Table2.column2;


• The right outer join returns all the rows from the second table and all the matching rows
from the first table:

SELECT * FROM T_Table RIGHT JOIN T_Table2 ON T_Table.column1 = T_Table2.column2;

• The full join will return the rows from both tables including those that meet the condition
and those that don’t:

SELECT * FROM T_Table FULL JOIN T_Table2 ON T_Table.column1 = T_Table2.column2;
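The row counts and NULL padding described above can be checked with a short sketch. It uses Python's sqlite3 module with SQLite and two hypothetical three-row tables. RIGHT JOIN and FULL JOIN are only available in newer SQLite releases, so they are left out here and only the cross, inner and left joins are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_Table (column1 INTEGER)")
conn.execute("CREATE TABLE T_Table2 (column2 INTEGER)")
conn.executemany("INSERT INTO T_Table VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO T_Table2 VALUES (?)", [(2,), (3,), (4,)])

# Cross join: every row paired with every row, so 3 x 3 rows.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM T_Table CROSS JOIN T_Table2"
).fetchone()[0]

# Inner join: only the rows that satisfy the join condition.
inner_rows = conn.execute(
    "SELECT * FROM T_Table JOIN T_Table2 ON T_Table.column1 = T_Table2.column2 "
    "ORDER BY T_Table.column1"
).fetchall()

# Left outer join: unmatched rows of the first table are kept, padded with NULL.
left_rows = conn.execute(
    "SELECT * FROM T_Table LEFT JOIN T_Table2 ON T_Table.column1 = T_Table2.column2 "
    "ORDER BY T_Table.column1"
).fetchall()

print(cross_count, inner_rows, left_rows)
```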

Transaction Management

A transaction is a logical unit of work that must be entirely completed or aborted; no intermediate
states are allowed. Transactions must display certain properties:

• Atomicity requires all operations (SQL requests) of the transaction to be completed. The
transaction is treated as a single, indivisible, logical unit of work.

• Consistency indicates the permanence of the database’s consistent state - the
transaction will take the database from one consistent state to another. If any part of the
transaction violates an integrity constraint, the transaction must be aborted.

• Isolation means that the data used during the execution of a transaction cannot be used
by a second transaction until the first has been completed. This property is useful in
multiuser databases and protects data integrity.

• Durability ensures that once transaction changes are done and committed, they cannot
be undone or lost (even in the event of system failure).

• Serialisability exists if the result of running transactions concurrently is the same as the
result of running the same transactions one after another in some order.

All of the SQL statements of a transaction must run successfully, otherwise the entire transaction
must be rolled back to the previous state. If the transaction is successful, a commit is made to
the database. The most common data integrity and consistency problem is the lost update, which
occurs when two transactions try to update the same data item at the same time and one of the
updates is overwritten by the other.

A consistent database state is one in which all the integrity constraints are satisfied. Most real
world transactions are formed by two or more database requests. A database request is the
equivalent of a single SQL statement in an application program. For example, if a transaction
includes two UPDATE statements and one INSERT statement, then three database requests have been
performed.

Once a transaction is initiated by the user or application, the sequence must continue through
the following SQL statements until one of the following events is encountered:

• A COMMIT statement is reached, which ends the transaction and permanently records all of its
changes in the database.

• A ROLLBACK statement is reached, which automatically aborts the process and returns
the database to the last consistent state.

• The end of a program is reached successfully, in which all changes are permanently
recorded to the database - equivalent to a commit.

• The program is abnormally terminated due to a program crash or other issues. The
database changes must also be aborted and the state must return to the latest safe state -
equivalent to a rollback.

SQL Server requires the following statement to initiate a transaction:

BEGIN TRANSACTION;

A database also uses a transaction log to keep track of all transactions that update the
database. The DBMS uses the information stored in this log for any form of recovery triggered by
a ROLLBACK statement. It stores the following:

• A record for the beginning of a transaction

• For each transaction component, it stores:

‣ The type of operation performed (INSERT, UPDATE, DELETE)

‣ The names of the objects affected by the transaction (table name)

‣ The “before” and “after” values of the fields being updated

‣ Pointers to previous and next transaction log entries for the same transaction

• The ending (commit) of the transaction

A log increases the overhead of the database, but it is required to ensure that a corrupted
database can return to the previous saved state. It is important for recovery purposes.

A soft crash is a loss of volatile storage, but no damage to disks is made. A restart facility is
required to assist with this issue. A hard crash is caused when the disk becomes unreadable and
must be recovered from a previous saved state.

Restart Process - Once the cause of the soft crash has been rectified, and the database is being
restarted:

• The last checkpoint before the crash in the log file is identified. It is then read forward, and
two lists are constructed:

‣ A REDO list containing the transaction-ids of transactions that were committed.

‣ An UNDO list containing the transaction-ids of transactions that never committed.

The database is then rolled forward, using REDO logic and the after-images and rolled back, using
UNDO logic and the before-images.
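The construction of the two lists can be sketched in Python. The log format used here, a list of (transaction-id, action) pairs, and the list of transactions still active at the checkpoint are both assumptions made for illustration; a real DBMS log holds far more detail.

```python
def build_restart_lists(active_at_checkpoint, log_after_checkpoint):
    """Read the log forward from the checkpoint and return (redo, undo) lists."""
    undo = list(active_at_checkpoint)  # unfinished until a commit is seen
    redo = []
    for tx_id, action in log_after_checkpoint:
        if action == "begin" and tx_id not in undo:
            undo.append(tx_id)
        elif action == "commit" and tx_id in undo:
            undo.remove(tx_id)   # it committed, so it must be redone
            redo.append(tx_id)
    return redo, undo

# Hypothetical log: T1 was running at the checkpoint, T3 never commits.
log = [("T2", "begin"), ("T1", "commit"), ("T3", "begin"), ("T2", "commit")]
redo_list, undo_list = build_restart_lists(["T1"], log)

print(redo_list, undo_list)
```

Transactions in the REDO list would then have their after-images reapplied, and those in the UNDO list would have their before-images restored.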

Recovery Process - A hard crash involves physical damage to the disk, rendering it unreadable.
This may occur in a number of ways:

• Head-crash. The read/write head, which normally “flies” a few microns off the disk surface,
for some reason actually contacts the disk surface, and damages it.

• Accidental impact damage, vandalism or fire, all of which can cause the disk drive and
disk to be damaged.

After a hard crash, the disk unit and disk must be replaced, reformatted, and then re-loaded with
the database.

A backup is a copy of the database stored on a different device to the database, and therefore
less likely to be subjected to the same catastrophe that damages the database. Ideally, two
copies of each backup are held: an on-site copy, and an off-site copy to cater for severe
catastrophes, such as building destruction.

There are two types of transactions between two users; serial and interleaved. Serial transactions
occur when user1 alters the transaction database and once completed and committed, only then
will user2 access the database. Interleaved transactions have both users accessing the database
between commits. In the diagram below, assume T0 is user1 and T1 is user2.

Without caution, interleaved transactions can lead to lost updates and invalid data entries.

A schedule is a list, in time order, of the operations performed on the database by a set of
transactions. Usually, an r refers to a read operation, a w refers to a write operation and a c
refers to a commit operation. The table below is an example of a serial schedule with two users
and is ordered in increasing time.

Alternatively, the same example can be written as:

S1: r0(X); w0(X); c0; r1(Y); w1(Y); r1(X); w1(X); c1;

Where the numbers following the operation refer to the transaction number (user).

A given interleaved execution of some set of transactions is said to be serializable if and only if it
produces the same result as some serial execution of those same transactions. For interleaved
schedules, we must determine whether the schedules are serializable by creating a precedence
graph.
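A precedence graph test can be sketched in Python. An edge Ti -> Tj is added whenever an operation of Ti comes before a conflicting operation of Tj (same data item, different transactions, at least one of them a write), and the schedule passes the test if the graph has no cycle. The function names and the second, interleaved schedule are assumptions made for illustration.

```python
import re

def parse_schedule(text):
    """Turn 'r0(X); w0(X); c0;' into [('r', 0, 'X'), ('w', 0, 'X')]; commits are dropped."""
    return [(op, int(tx), item) for op, tx, item in re.findall(r"([rw])(\d+)\((\w+)\)", text)]

def precedence_edges(ops):
    edges = set()
    for i, (op1, tx1, item1) in enumerate(ops):
        for op2, tx2, item2 in ops[i + 1:]:
            # Two operations conflict if they touch the same item and one writes.
            if tx1 != tx2 and item1 == item2 and "w" in (op1, op2):
                edges.add((tx1, tx2))
    return edges

def has_cycle(edges):
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node already on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph[node])
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in graph)

serial = parse_schedule("r0(X); w0(X); c0; r1(Y); w1(Y); r1(X); w1(X); c1;")
interleaved = parse_schedule("r0(X); r1(X); w0(X); w1(X); c0; c1;")

serial_edges = precedence_edges(serial)
interleaved_edges = precedence_edges(interleaved)
print(serial_edges, has_cycle(serial_edges))
print(interleaved_edges, has_cycle(interleaved_edges))
```

The serial schedule S1 gives a single edge from transaction 0 to transaction 1, so there is no cycle. The interleaved schedule has edges in both directions, which corresponds to the lost update problem.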

Locks are required to prevent another transaction from reading inconsistent data, and to prevent
corruption and invalidation of data when multiple users try to write to the
database. Any single user can only modify those database records to which they have applied a
lock that gives them exclusive access to the record (until the lock has been released). A
transaction must acquire a lock prior to accessing a data item, and locks are released when the
transaction is completed. Locks are usually controlled by the lock manager of the DBMS.

The granularity of locking refers to the size of the units that are, or can be, locked. It can be done
at the following levels:

• Database

• Table

• Page (or section of table)

• Record - allows concurrent transactions to access different rows of the same table, even
if the rows are located on the same page

• Attribute - allows concurrent transactions to access the same row, as long as they require
the use of different attributes within that row

There are two lock types:

• A shared lock can be held simultaneously by multiple processes, allowing them to read
without updating.

‣ If a transaction Ti has obtained a shared lock (denoted by S) on data item Q, then Ti
can read this item but not write to this item.

‣ They improve the amount of concurrency in a system.

‣ If T1 and T2 only wished to read P1 with no subsequent update they could both apply
an SLock on P1 and continue

• A process that needs to update a record must obtain an exclusive lock. Its application for
a lock will not proceed until all current locks are released.

‣ If a transaction Ti has obtained an exclusive lock (denoted X) on data item Q, then Ti
can both read and write to item Q.

A wait-for graph (WFG) can be created, which describes which transactions are waiting for locks
held by other transactions. Below is an example of a WFG for three transactions. An S stands for
shared access, where other transactions can still read the data item at the same time, and an X
stands for exclusive access, which prevents two transactions from accessing the item concurrently.

A problem that may occur is a deadlock, also known as a deadly embrace. A scenario that may
occur could be:

• Transaction 1 has an exclusive lock on data item A, and requests a lock on data item B.

• Transaction 2 has an exclusive lock on data item B, and requests a lock on data item A.

Each transaction holds the lock that the other one needs, so the result is a deadlock where
neither transaction can run while it waits for the other to complete. To prevent deadlocks, a
transaction must acquire all the locks it needs before it updates any records, and if it cannot
obtain one of them, it will release all of its locks and try again later.
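The two-transaction scenario can be modelled with a small wait-for graph in Python. The lock table (which transaction holds which item) and the list of pending requests are hypothetical structures chosen for illustration. A transaction is reported as blocked if it waits, directly or indirectly, on a cycle of waiting transactions.

```python
def wait_for_graph(holders, requests):
    """Map each requesting transaction to the set of transactions it waits for."""
    graph = {}
    for tx, item in requests:
        holder = holders.get(item)
        if holder is not None and holder != tx:
            graph.setdefault(tx, set()).add(holder)
    return graph

def blocked_by_deadlock(graph):
    """Repeatedly release transactions that wait on nobody; what is left can never run."""
    remaining = {tx: set(waits) for tx, waits in graph.items()}
    changed = True
    while changed:
        changed = False
        for tx in list(remaining):
            # Keep only the waits on transactions that are themselves still stuck.
            remaining[tx] = {t for t in remaining[tx] if t in remaining}
            if not remaining[tx]:
                del remaining[tx]
                changed = True
    return set(remaining)

# Transaction 1 holds A and wants B; transaction 2 holds B and wants A.
deadlock_graph = wait_for_graph({"A": "T1", "B": "T2"}, [("T1", "B"), ("T2", "A")])
deadlocked = blocked_by_deadlock(deadlock_graph)

# Only one transaction waits here, so it can proceed once T1 finishes.
simple_wait = blocked_by_deadlock(wait_for_graph({"A": "T1"}, [("T2", "A")]))

print(deadlocked, simple_wait)
```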

Subqueries

The use of joins in a database allows you to get information from two or more tables. It is often
necessary to process data based on other processed data. A subquery can generate this
information, which is then used by another statement to perform an action (insert, update, etc.).
It has some basic characteristics:

• It is a SELECT statement inside another query

• It is normally expressed inside parentheses

• The first query in the SQL statement is known as the outer query

• The query inside the SQL statement is known as the inner query

• The inner query is executed first

• The output of the inner query is used as the input to the outer query

• The entire SQL statement is sometimes referred to as a nested query

A subquery can return one or more values:

• One single value - For example an average price can be calculated and used to update a
value in another table

UPDATE T_Table SET column1 = ( SELECT AVG(price) FROM T_Table2 )

WHERE column2 = ‘abc’;

• A list of values - This type of subquery is used when a list of values is expected, such as
using an IN clause.

• A virtual table - Can be used when a table is expected, such as using a FROM clause

If the subquery returns no values at all, it returns a NULL and depending on the outer query, this
may cause an error or another NULL value.

The most common type of subquery uses an inner SELECT subquery on the right side of a
WHERE comparison expression. For example, to find a list of all items that have a price greater
than the average price of all the items, you could use the following SQL:

SELECT * FROM T_Table WHERE price > ( SELECT AVG( price ) FROM T_Table );

This type of expression (using comparison operators such as =, < and >) requires the inner query
to return a single value. If it returns more than one value, the DBMS will produce an error.

Another common subquery type uses the IN clause to check if a value exists in another table. For
example, to find a list of all customers who have purchased an ‘apple’, the following SQL could
be executed:

SELECT * FROM T_Customer

WHERE id IN ( SELECT id FROM T_Purchase WHERE item_name = ‘apple’ );

For these expressions, the inner query can output a set of values (in this case a column).

Just as WHERE subqueries exist, a subquery can be used with a HAVING clause to restrict the
output of a GROUP BY query by applying additional criteria to the new grouped rows. For
example to list all products with a total quantity sold greater than the average quantity sold:

SELECT id, SUM( units ) FROM T_Product GROUP BY id

HAVING SUM( units ) > ( SELECT AVG( units ) FROM T_Product );

The IN clause can allow subqueries to check if a value exists within a list of values. However, this
does not work for inequality expressions ( < or > ). The ALL operator allows you to compare a
single value with a list of values returned by a subquery using a comparison operator (other
than equals). For example, to select every product that is more expensive than all ‘apple’
products that exist:

SELECT * FROM T_Product

WHERE price > ALL ( SELECT price FROM T_Product

WHERE id IN ( SELECT id FROM T_Purchase WHERE item_name = ‘apple’ ) );

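A similar operator is the ANY operator, which compares a single value with each value in the list
and is true if the comparison holds for at least one of them. Used with the equals sign ( = ANY ),
it performs the same function as the IN clause.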

If the output of the subquery can produce a table of values, this subquery can be called upon by
the FROM clause and be used to analyse data with. For this to work, the output must be a virtual
table. For example, if you wanted to know all customers who have bought the products ‘apple’
and ‘banana’:

SELECT DISTINCT T_Customer.ID, T_Customer.Name

FROM T_Customer,
( SELECT T_Invoice.cusID FROM T_Invoice NATURAL JOIN T_Product
WHERE id = ‘apple’ ) CP1,
( SELECT T_Invoice.cusID FROM T_Invoice NATURAL JOIN T_Product
WHERE id = ‘banana’ ) CP2
WHERE T_Customer.ID = CP1.cusID AND
T_Customer.ID = CP2.cusID;

A correlated subquery is a subquery that executes once for each row in the outer query. This
process is similar to a nested loop in a programming language.

SELECT * FROM T_Product PS

WHERE PS.units > ( SELECT AVG( units ) FROM T_Product PA WHERE PA.id = PS.id );

In these subqueries, the inner query must make use of an attribute from the outer query. These
should be handled carefully, as the computation time increases due to the looping within each
query.
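The difference between a correlated subquery and an ordinary one can be seen with a short sketch. It uses Python's sqlite3 module with SQLite and hypothetical T_Product rows. In the first query the average is recomputed for the id of each outer row, while in the second a single overall average is computed once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_Product (id TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO T_Product VALUES (?, ?)",
    [("a", 2), ("a", 6), ("b", 7), ("b", 7), ("b", 10)],
)

# Correlated: the inner query refers to PS.id from the outer query.
above_own_average = conn.execute(
    "SELECT id, units FROM T_Product PS "
    "WHERE PS.units > (SELECT AVG(units) FROM T_Product PA WHERE PA.id = PS.id) "
    "ORDER BY id, units"
).fetchall()

# Not correlated: one average (6.4) is used for every row.
above_overall_average = conn.execute(
    "SELECT id, units FROM T_Product "
    "WHERE units > (SELECT AVG(units) FROM T_Product) "
    "ORDER BY id, units"
).fetchall()

print(above_own_average, above_overall_average)
```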

SQL Functions

Functions in SQL are very similar to those in other languages. A function takes a number of
parameters (or none) and returns an output. It can be called from any location in the SQL code
where an attribute value could be used.

There are a range of date and time functions that can be used in SQL Server:

• The YEAR function returns a four digit value from a date:

SELECT emp_name, emp_DOB, YEAR( emp_DOB ) AS year FROM …

• Similarly, the MONTH and DAY functions take a date parameter and return the numerical
value for the month and day from the date.

• The SYSDATE function (for Oracle) gets today’s current date. In Oracle, a SELECT without
a table is usually written against the DUAL dummy table:

SELECT SYSDATE FROM DUAL;

• To add a specified number of date-parts to a given date, use the DATEADD function. For
example, to add 90 days to a date from a table attribute:

SELECT DATEADD( day, 90, product_date ) AS DueDate FROM PRODUCT;

• To find the difference between two dates, use the DATEDIFF function. Again, the day
parameter specifies the unit of the output. Month and year can also be used, as can
hours, minutes and seconds. In SQL Server, the current date is obtained with GETDATE.

SELECT DATEDIFF( day, product_date, GETDATE() ) AS DaysAgo FROM PRODUCT;

• To convert a date into another format (such as a varchar), the CONVERT function can be
used. This is for SQL Server. It has three parameters: the target data type, the date and
the format type.

SELECT product_code, CONVERT( VARCHAR(8), P_INDATE, 1 ) FROM PRODUCT;

In this case, the format used is format 1. This corresponds to MM/DD/YY. Other formats
include:

‣ 101: MM/DD/YYYY

‣ 2: YY.MM.DD

‣ 102: YYYY.MM.DD

‣ 3: DD/MM/YY

‣ 103: DD/MM/YYYY

• For Oracle and SQL Developer, the TO_CHAR function returns the character
representation of a date or set of date parts, given a set format.

SELECT product_code FROM PRODUCT WHERE TO_CHAR( p_date, 'YYYY' ) = '2018';

The format used can be one of the following:

‣ MONTH: name of month

‣ MON: three-letter month name

‣ MM: two-digit month number

‣ D: number for day of week

‣ DD: number for day of month

‣ DAY: name of day of week

‣ YYYY: four-digit year value

‣ YY: two-digit year value
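The date functions above are specific to SQL Server and Oracle. As a rough runnable sketch of the same ideas, the code below uses SQLite's own date functions (strftime, date and julianday) through Python's sqlite3 module. These are SQLite substitutes rather than the functions named above, and the two sample dates are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

row = conn.execute(
    "SELECT CAST(strftime('%Y', :d) AS INTEGER), "    # similar to YEAR( d )
    "       CAST(strftime('%m', :d) AS INTEGER), "    # similar to MONTH( d )
    "       date(:d, '+90 days'), "                   # similar to DATEADD( day, 90, d )
    "       CAST(julianday(:today) - julianday(:d) AS INTEGER)",  # similar to DATEDIFF( day, d, today )
    {"d": "2018-01-15", "today": "2018-03-01"},
).fetchone()

print(row)
```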

Numeric functions can be grouped into algebraic, trigonometric and logarithmic functions:

• The ABS function returns the absolute value of the passed parameter (if negative, make it
positive)

SELECT ABS( -1.94 );

• The ROUND function rounds a value to a specified precision (number of digits). For
rounding to the nearest integer value, use the precision value of 0.

SELECT ROUND( -1.94, 1 );

• The CEILING and FLOOR functions return the nearest integer to the given value. The
ceiling function returns the smallest integer that is greater than or equal to the value,
while the floor function returns the largest integer that is less than or equal to the value.
An input that is already an integer is returned unchanged. So, CEILING( 10.5 ) = 11,
FLOOR( 2.99 ) = 2, CEILING ( 2.000 ) = 2.

SELECT FLOOR( -1.94 );
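The numeric examples above can be checked with a short sketch using Python's sqlite3 module. ABS and ROUND are built into SQLite. CEILING and FLOOR are registered here from Python's math module, an assumption made so the sketch does not depend on SQLite's optional math extension.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register CEILING and FLOOR as user-defined SQL functions backed by Python.
conn.create_function("CEILING", 1, math.ceil)
conn.create_function("FLOOR", 1, math.floor)

row = conn.execute(
    "SELECT ABS(-1.94), ROUND(-1.94, 1), CEILING(10.5), "
    "       FLOOR(2.99), FLOOR(-1.94), CEILING(2.000)"
).fetchone()

print(row)
```

Note that FLOOR( -1.94 ) is -2, because flooring always moves towards the lower integer, even for negative values.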

String manipulation functions are some of the most useful SQL functions and can convert strings
to uppercase, concatenate them, etc.:

• To concatenate two strings, use the || operator. This joins the two strings on either
side of the operator together. Multiple concatenations can be used, as given in the
example below:

SELECT emp_Lname || ', ' || emp_Fname AS name …

• To convert the entire string to upper or lower case, use the UPPER and LOWER functions
respectively.

SELECT UPPER( emp_Lname || ', ' || emp_Fname ) AS name …

• The SUBSTR function (SUBSTRING in some systems) returns a portion of the string given
in the first parameter. The first parameter is the input string, the second is the starting
index (counting from 1) and the third is the number of characters to return.

SELECT phone, SUBSTR( phone, 1, 3 ) AS prefix …

• To get the number of characters in a particular string, the LENGTH function is used
(for Oracle systems). In other database systems, it may be LEN instead. Similar differences
exist for other functions.

SELECT LENGTH( emp_Lname ) AS nameSize …
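The string functions above can be tried together in one query. The sketch below uses Python's sqlite3 module, where ||, UPPER, SUBSTR and LENGTH are all available. The EMPLOYEE table name and the sample row are assumptions, since the notes leave the FROM clause elided.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and row for illustration only.
conn.execute("CREATE TABLE EMPLOYEE (emp_Lname TEXT, emp_Fname TEXT, phone TEXT)")
conn.execute("INSERT INTO EMPLOYEE VALUES ('Smith', 'Anna', '0399051234')")

row = conn.execute(
    "SELECT emp_Lname || ', ' || emp_Fname, "           # concatenation
    "       UPPER(emp_Lname || ', ' || emp_Fname), "    # upper case
    "       SUBSTR(phone, 1, 3), "                      # first three characters
    "       LENGTH(emp_Lname) "                         # character count
    "FROM EMPLOYEE"
).fetchone()

print(row)
```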

Another useful function is the NVL function (Oracle), which takes a parameter and returns a
specified substitute value if the parameter is NULL. This way, two fields can be added and even if
one field is null, the result will not become null.

SELECT number1 + NVL( number2, 0 ) FROM T_TableName;

In the example above, even if number2 is null (assuming number1 always has a value), the
query will still output a value (as if number2 = 0). If number2 is not null, then its actual
value is used in the calculation.
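The effect of the NULL substitution can be seen by running both forms side by side. NVL is Oracle-specific, so this sketch (Python's sqlite3 module) swaps in the standard COALESCE function, which behaves the same way for two arguments. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_TableName (number1 INTEGER, number2 INTEGER)")
# The second row has a NULL in number2.
conn.executemany("INSERT INTO T_TableName VALUES (?, ?)", [(5, 3), (7, None)])

rows = conn.execute(
    "SELECT number1 + number2, "                 # NULL propagates through the addition
    "       number1 + COALESCE(number2, 0) "     # NULL replaced by 0 before adding
    "FROM T_TableName ORDER BY number1"
).fetchall()

print(rows)
```

Without the substitution the second sum is NULL (None in Python); with it, the sum falls back to number1 alone.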

Relational Set Operators

Most SQL commands are set oriented. This means that they deal with groups of things and
specific sets of data - they operate over entire sets of rows or columns at once. The union,
intersection and difference relational operators can be used to combine the results of queries,
but the relations must be union compatible. This means that the tables must share the same
number of columns and have columns with the same data types corresponding with each other (the
actual column name is not important).

The UNION statement combines rows from two or more queries without including duplicate rows.
The syntax for such a command is query UNION query. If one table has 5 fields, while the second
table has an additional 6th field, the two are not union compatible and the query will fail, so
each query must select only the fields that the two tables have in common. An example SQL code
would be:

SELECT * FROM T_Customer UNION SELECT * FROM T_Customer_2;

It can be used with more than one query, and can combine the output of multiple queries into one
larger output.

The UNION ALL clause works like UNION but keeps duplicate rows. It combines the output of two
queries into one (provided they are union compatible) without removing duplicates.

SELECT * FROM T_Customer UNION ALL SELECT * FROM T_Customer_2;

Similarly, the INTERSECT clause returns the rows that the two queries have in common. Only the
rows that appear in both sets of data will be shown, and each is shown once (not duplicated in
the output).

SELECT cus_code FROM T_Customer INTERSECT SELECT cus_code FROM T_Customer_2;

The MINUS (or EXCEPT in some systems) clause will output all rows that appear in the first set
(from the first query) but not in the second set (in the second query).

SELECT * FROM T_Customer MINUS SELECT * FROM T_Customer_2;
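The four set operators can be compared on the same pair of tables. This sketch uses Python's sqlite3 module, where the difference operator is spelled EXCEPT rather than MINUS. The single-column tables and their sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_Customer (cus_code INTEGER)")
conn.execute("CREATE TABLE T_Customer_2 (cus_code INTEGER)")
conn.executemany("INSERT INTO T_Customer VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO T_Customer_2 VALUES (?)", [(2,), (3,), (4,)])

def codes(operator):
    """Run both queries combined with the given set operator."""
    sql = (f"SELECT cus_code FROM T_Customer {operator} "
           "SELECT cus_code FROM T_Customer_2 ORDER BY cus_code")
    return [r[0] for r in conn.execute(sql)]

union = codes("UNION")            # all codes, duplicates removed
union_all = codes("UNION ALL")    # all codes, duplicates kept
intersect = codes("INTERSECT")    # codes present in both tables
difference = codes("EXCEPT")      # codes only in the first table

print(union, union_all, intersect, difference)
```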

Virtual Tables

The output of a relational operator, such as SELECT, is another relation (or table). If the output
of such a query needs to be saved, a relational view can be formed. A view is a virtual table
based on a SELECT query. It is saved as an object in the database and can contain columns,
computed columns, aliases and aggregate functions from one or more tables. The tables on which
the view is based are called the base tables.

To create a view, the CREATE VIEW command is used, followed by the view name. Before the
select query, the AS keyword is used:

CREATE VIEW product_stats AS SELECT * FROM …

A view can be used anywhere a table name is expected in an SQL statement. Views are also
dynamically updated - this means they are recreated each time they are used. So if the underlying
data is updated, the view will reflect the new data the next time it is used. Views can provide
security, as a company can give each department a specific view from which to run its data
analysis, rather than access to the whole database.
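The dynamic behaviour of a view can be demonstrated in a few lines. The sketch below uses Python's sqlite3 module; the PRODUCT columns, the sample rows and the view name stock_summary are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT (code TEXT, quantity INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO PRODUCT VALUES (?, ?, ?)", [("AAA", 10, 2.0), ("BBB", 4, 5.0)]
)

# The view stores the query, not the data.
conn.execute(
    "CREATE VIEW stock_summary AS "
    "SELECT COUNT(*) AS product_count, SUM(quantity * price) AS stock_value FROM PRODUCT"
)

before = conn.execute("SELECT * FROM stock_summary").fetchone()
# Change the base table, then query the view again.
conn.execute("UPDATE PRODUCT SET quantity = quantity - 1 WHERE code = 'AAA'")
after = conn.execute("SELECT * FROM stock_summary").fetchone()

print(before, after)
```

The second read of the view reflects the updated base table without the view being redefined.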

A batch update routine is a routine that pools transactions into a single group to update a master
table in a single operation. It can involve data from two or more tables. Normally, the database
system will produce an error when an UPDATE attempts to modify data through a JOIN of tables.
A solution to this is to create an updatable view, which is a view that can update attributes in
the base tables used in the view. An updatable view has no unique syntax, but has a range of
restrictions in place:

• GROUP BY expressions or aggregate functions cannot be used

• Operators such as UNION, INTERSECT or EXCEPT cannot be used

• The base tables being updated must be key-preserved, meaning that the values in the
primary keys of the tables must be unique and remain that way.

Once the view has been created, simply update the values in the view and all base tables will be
updated:

UPDATE product_stats SET quantity = quantity - 1 WHERE code = 'AAA';

Database Connectivity

Database connectivity refers to the mechanisms through which application programs connect and
communicate with data repositories. The DBMS is the intermediary between the stored data and
the user’s applications. For online databases, a client/server approach must be used.
It can be broken down into three fundamental layers:

• A data layer where the data resides

• The middle layer that manages connectivity and data transformation issues. It is
responsible for translating the language into code that the database can respond to.

• The top layer has the interface of the DBMS application.

The database connectivity software is known as database middleware and it provides an
interface between the application program and the data repository.

Native SQL connectivity refers to the connection interface that is provided by the database
vendor and is unique to that vendor. For an Oracle database, the Oracle SQL*Net interface must
be installed on the client’s device to access the database on the online server. To ensure that
applications can access different DBMSs in a standard way, the SQL Access Group developed the
Call Level Interface (CLI), a database access standard for vendors to follow.

Developed in the early ‘90s, Open Database Connectivity (ODBC) is Microsoft’s implementation
of the Call Level Interface standard for database access. It allows any Windows application to
access relational data sources, using SQL via a standard API. However, over time, ODBC did not
provide significant functionality beyond the ability to execute SQL queries, so other data
interfaces were developed.

There are many APIs adopted by databases to provide specific functions. One such API is data
access objects (DAO), which is an object oriented API used to access desktop databases such
as MS Access and provides optimised interfaces for such programs. Another major API is
remote data objects (RDO), which is a higher level, object oriented API used to access remote,
server based databases.

ODBC executes on the Windows operating system through dynamic-link libraries (DLLs),
which are stored as files with a .dll extension. Running as a DLL speeds up load and run
times. The ODBC architecture has three main components:

• A high level ODBC API, through which applications access the ODBC functionality

• A driver manager that is in charge of managing all database connections

• An ODBC driver that communicates directly with the DBMS.

Java Database Connectivity (JDBC) is an application programming interface that allows a Java
program to interact with a wide range of data sources, including relational databases, tabular data
sources, spreadsheets, and text files. JDBC allows a Java program to establish a connection with
a data source, prepare and send the SQL code to the database server, and process the result set.
One advantage of JDBC over other middleware is that it requires no configuration on the client
side. It also provides a way to connect to the database through an ODBC driver.
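The connect, send and process cycle described above is common to most database APIs. As a rough analogue (not JDBC itself), the sketch below walks through the same steps with Python's built-in sqlite3 module. The PRODUCT table, its rows and the price threshold are made up.

```python
import sqlite3

# 1. Establish a connection with the data source (an in-memory database here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT (product_code TEXT, price REAL)")
conn.executemany(
    "INSERT INTO PRODUCT VALUES (?, ?)", [("AAA", 2.0), ("BBB", 5.0), ("CCC", 9.5)]
)

# 2. Prepare and send the SQL; the ? placeholder keeps the value out of the SQL text.
cursor = conn.execute(
    "SELECT product_code, price FROM PRODUCT WHERE price > ? ORDER BY product_code",
    (4.0,),
)

# 3. Process the result set row by row.
expensive = {code: price for code, price in cursor}

# 4. Release the connection.
conn.close()

print(expensive)
```

Passing the threshold as a parameter rather than building it into the SQL string is the same idea as a prepared statement in JDBC.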

Providers are objects that manage the connection with a data source and provide data to the
consumers. They can be broken into two categories:

• Data providers provide data to other processes and are used to create the functionality of
the underlying data source.

• Service providers provide additional functionality to users and are located between the
data provider and the consumer. The service provider requests the data from the data
provider, transforms the data and then sends the transformed data to the end user. These
include transaction management services and querying services (such as SQL).

A session is a connection period between the two providers and a command is used to
manipulate the interaction between the two to create objects.

A script is written in a programming language that is not compiled, but interpreted and executed
at run time. A connection object is used to set up and establish a connection with a data source
of any type. A recordset object contains data generated by the execution of a command. It will
also contain any new data to be written to the data source. The recordset can be disconnected
from the data source. The DataSet is a disconnected, memory-resident representation of the
database and stores the data that has been read by the data provider, usually in XML format.

Internet database connectivity allows for a range of services. The benefits of internet
technologies include, but are not limited to:

• Hardware and software independence - savings on equipment and no need for multiple
platform development.

• Common and simple user interface - reduced training time and cost and reduced end-user
support cost.

• Location independence - Global access through Internet infrastructure and mobile smart
devices and reduced requirements and cost for dedicated connections.

• Rapid development - Availability of multiple development tools, reducing development
times. Allows for free client web tools and the availability of open API software.

In general, a web server is the main hub through which all Internet services are accessed. To view
a page, the client requests it from a web server. For a static page, when the web server receives
the request, it finds the page on the hard disk and sends it back to the client. Most modern
websites use dynamic webpages, in which the web server generates the contents of the page before
sending it to the client. However, to gather data from a database in a dynamic webpage, neither
the client nor the web server knows how to connect to the database and retrieve the data required.

A server-side extension is required to allow this to occur. It is a program that interacts directly
with the server process to handle specific types of requests. Server-side extensions add
significant functionality to web servers and intranets. A database server-side extension is also
known as web-to-database middleware, which retrieves data from the database and gives it
to the web server to send to the client. This is done through the following process:

1. The client browser sends a page request to the web server

2. The web server receives and passes the request to the web-to-database middleware
for processing

3. The requested page might contain some kind of scripting which the web server passes
to the middleware.

4. The middleware reads the data, validates it and then executes the script. It then
connects to the database and passes the query using the database connectivity layer.

5. The database server executes the query and passes the result back to the middleware.

6. The middleware then compiles the result and dynamically generates an HTML-formatted
page that includes the data retrieved from the database and sends it to the web server.

7. The web server returns the HTML page, including the query result, back to the client.

8. The client’s browser displays the page on the local device.
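The middleware part of the steps above (roughly steps 4 to 6) can be sketched as a single function. This is a simplified illustration using Python's sqlite3 and html modules, not a real server-side extension: the handle_request function, the PRODUCT table and the request parameter name are all hypothetical, and the web server and browser are left out.

```python
import sqlite3
from html import escape

def setup_database():
    """Create a small in-memory database standing in for the database server."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PRODUCT (code TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO PRODUCT VALUES (?, ?)", [("AAA", "apple"), ("BBB", "banana")]
    )
    return conn

def handle_request(conn, params):
    """Validate the request, query the database and build an HTML page."""
    code = params.get("code", "")
    if not code.isalnum():  # step 4: validate the incoming data
        return "<html><body><p>Invalid product code</p></body></html>"
    rows = conn.execute(    # step 5: the database executes the query
        "SELECT code, name FROM PRODUCT WHERE code = ?", (code,)
    ).fetchall()
    # Step 6: generate an HTML page that includes the retrieved data.
    items = "".join(f"<li>{escape(c)}: {escape(n)}</li>" for c, n in rows)
    return f"<html><body><ul>{items}</ul></body></html>"

conn = setup_database()
page = handle_request(conn, {"code": "AAA"})
rejected = handle_request(conn, {"code": "AAA; DROP TABLE PRODUCT"})

print(page)
print(rejected)
```

In a real deployment the returned string would be handed back to the web server, which sends it on to the client's browser (steps 7 and 8).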

A web server interface defines a standard way to exchange messages with the external programs.
There are two well-defined web server interfaces:

• Common Gateway Interface (CGI)

• Application Programming Interface (API)

The Common Gateway Interface (CGI) uses script files that perform specific functions based on
a client’s parameters that are passed to the web server. The script file is a small program
containing the commands written in a programming language, usually Perl, C++ or Basic. The
script will convert the retrieved data to an HTML format. The main disadvantage of CGI is that
the executable program is external to the web server and runs separately for each request,
affecting performance.

A newer web server interface is the Application Programming Interface (API), which is more
efficient and faster than a CGI script. APIs are implemented as shared code or dynamic-link
libraries (DLLs), meaning the API is treated as part of the web server program and is invoked
when needed. Their code resides in memory, so there is no need to run an external program, as
CGI does. Another advantage is that an API can use a shared connection to the database instead
of creating a new one every time, as in the case of CGI scripts. However, because the API shares
the same memory space as the web server, an error in the API can bring down the whole web server.
APIs are also specific to the web server and operating system.

The web browser is software that allows end users to navigate the web from the client computer.
Each time the end user clicks a hyperlink, the browser generates an HTTP GET page request that
is sent to the designated web server using the TCP/IP protocol. The browser’s job is to interpret
the HTML code and display it visually on the screen.

The web is a stateless system, meaning at any given time a web server does not know the status
of any of the clients communicating with it. The web does not reserve memory to maintain an
open communications state between the client and the server. The server does not know what the
client does with the webpage sent, as the page is stored in the client’s cache.

As opposed to server-side extensions allowing the database to be read, a client-side extension
adds functionality to the web browser. Common extensions are:

• Plug-Ins: An external application that is automatically invoked by the browser when
needed. It is associated with a data object, generally a file extension, to allow the server to
properly handle data that is not supported. For example, clicking on a .pdf file may
invoke an Adobe program to open it.

• JavaScript: A scripting language that allows web authors to design interactive sites. It is
embedded in the web page code that the client’s device can access and will be executed
by the browser and not on the web-server.

• ActiveX: This is more specific for Microsoft clients and allows programs to be written
inside webpages, similar to JavaScript. It adds controls such as drop down windows and
calendars to webpages.

• VBScript: Another Microsoft product that is used to extend browser functionality, derived
from Visual Basic. It is similar to JavaScript and the code is stored inside the HTML
document and executed by the web browser.

A web application server is a middleware application that expands the functionality of web
servers by linking them to a wide range of services such as databases, directory systems and
search engines. They are used to perform some of the following tasks:

• Connect to and query a database from a webpage

• Present database data in a webpage using various formats

• Create dynamic web search pages

• Create webpages to update, insert and delete database data

A cloud computing model provides ubiquitous, on-demand access to a shared pool of
configurable resources that can be rapidly provisioned. It provides a range of services that can be
accessed from any location, such as applications, storage, servers, processing power and
databases. There are three main types that are used, depending on the target customers:

• Public cloud: This infrastructure is built by third party organisations to sell cloud services
to the general public, such as Amazon Web Services (AWS), Google App Engine and
Microsoft Azure. In this model, cloud consumers share resources with other consumers
transparently.

• Private cloud: This is an internal infrastructure built by an organisation for the sole
purpose of servicing its own needs. It can be managed by the IT staff of the organisation
or by an external third party.

• Community cloud: This type of cloud is built by and for a specific group of organisations
that share a common trade, such as agencies of the federal government, the military or
higher education.
