
Introduction to SQL

What is SQL?
SQL stands for Structured Query Language
SQL allows you to access a database
SQL is an ANSI standard computer language
SQL can execute queries against a database
SQL can retrieve data from a database
SQL can insert new records in a database
SQL can delete records from a database
SQL can update records in a database
SQL is easy to learn

SQL is a Standard - BUT....


SQL is an ANSI (American National Standards Institute) standard computer language for accessing and manipulating database systems. SQL statements are used to retrieve and update data in a database. SQL works with database programs like MS Access, DB2, Informix, MS SQL Server, Oracle, Sybase, etc. Unfortunately, there are many different versions of the SQL language, but to be in compliance with the ANSI standard, they must support the same major keywords in a similar manner (such as SELECT, UPDATE, DELETE, INSERT, WHERE, and others). Note: Most of the SQL database programs also have their own proprietary extensions in addition to the SQL standard!

SQL Servers RDBMS


Modern SQL Servers are built on RDBMS.

DBMS - Database Management System


A Database Management System (DBMS) is a computer program that can access data in a database. The DBMS program enables you to extract, modify, or store information in a database. Different DBMS programs provide different functions for querying data, reporting data, and modifying data.

RDBMS - Relational Database Management System

A Relational Database Management System (RDBMS) is a Database Management System (DBMS) where the database is organized and accessed according to the relationships between data. The relational model behind the RDBMS was proposed by E. F. Codd at IBM in 1970. RDBMS is the basis for SQL, and for all modern database systems like Oracle, SQL Server, IBM DB2, Sybase, MySQL, and Microsoft Access.

The Transact-SQL statements


The Structured Query Language (SQL) is the primary way of interacting with most relational database management systems. SQL Server is no exception. While we assume you have a base level of understanding, we will quickly cover the basics of the language before we get into how SQL Server has extended it to provide the developer with better ways of using the database. SQL is the interface for managing information in relational databases. With it we tell the database what we want to select, insert, update and delete. Each of these commands follows a standard syntax and uses various clauses to determine exactly which rows are affected. Taken together, these form what is referred to as the Data Manipulation Language or DML.

SQL Data Manipulation Language (DML)


SQL (Structured Query Language) is a syntax for executing queries. But the SQL language also includes a syntax to update, insert, and delete records. These query and update commands together form the Data Manipulation Language (DML) part of SQL:

SELECT - extracts data from a database table
UPDATE - updates data in a database table
DELETE - deletes data from a database table
INSERT INTO - inserts new data into a database table

SQL Data Definition Language (DDL)


The Data Definition Language (DDL) part of SQL permits database tables to be created or deleted. We can also define indexes (keys), specify links between tables, and impose constraints between database tables. The most important DDL statements in SQL are:

CREATE TABLE - creates a new database table
ALTER TABLE - alters (changes) a database table
DROP TABLE - deletes a database table
CREATE INDEX - creates an index (search key)
DROP INDEX - deletes an index

Declarative Referential Integrity (DRI)

The business rules that define how our application behaves, such as requiring users of our application to set up a customer before they place an order, are implemented using foreign keys and constraints.

Constraints

ALTER TABLE dbo.Table1 ADD CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED ( CustID ) ON [PRIMARY]

Default values and extended properties

We can further define default values and extended properties for the tables we create, using statements such as:

ALTER TABLE dbo.Table1 ADD CONSTRAINT DF_Table1_CreditLimit DEFAULT 0 FOR CreditLimit

Data Control Language (DCL)
Data Control Language (DCL) allows the owner and administrator of the database objects to define what rights users have to the data and objects in the database. The DBA can manage access to the database by granting and denying users rights to objects. The term CRUD Matrix refers to the rights to Create, Read, Update, and/or Delete information in a given table.

GRANT

Grant is used to give access to someone who needs to use the information stored in the objects. SQL Server allows us to specify user based or group based rights, and to apply them to objects or to groups of objects within a database.

REVOKE
We use the REVOKE statement to reverse the effect of the GRANT statement.

DENY

The DENY statement will prevent a user from accessing objects in the database.

SQL Database Tables


A database most often contains one or more tables. Each table is identified by a name (e.g. "Customers" or "Orders"). Tables contain records (rows) with data. Below is an example of a table called "Persons":

LastName    FirstName   Address        City
Hansen      Ola         Timoteivn 10   Sandnes
Svendson    Tove        Borgvn 23      Sandnes
Pettersen   Kari        Storgt 20      Stavanger

The table above contains three records (one for each person) and four columns (LastName, FirstName, Address, and City).

SQL Queries
With SQL, we can query a database and have a result set returned.

A query like this:

SELECT LastName FROM Persons


Gives a result set like this:

LastName
Hansen
Svendson
Pettersen

Note: Some database systems require a semicolon at the end of the SQL statement. We don't use the semicolon in our tutorials.

SQL SELECT Statement


The SQL SELECT Statement
The SELECT statement is used to select data from a table. The tabular result is stored in a result table (called the result-set).

Syntax

SELECT column_name(s) FROM table_name


Note: SQL keywords are not case sensitive. SELECT is the same as select.

SQL SELECT Example


To select the content of columns named "LastName" and "FirstName", from the database table called "Persons", use a SELECT statement like this:

SELECT LastName,FirstName FROM Persons


The database table "Persons":

LastName    FirstName   Address        City
Hansen      Ola         Timoteivn 10   Sandnes
Svendson    Tove        Borgvn 23      Sandnes
Pettersen   Kari        Storgt 20      Stavanger

The result:

LastName    FirstName
Hansen      Ola
Svendson    Tove
Pettersen   Kari

Select All Columns

To select all columns from the "Persons" table, use a * symbol instead of column names, like this:

SELECT * FROM Persons


Result

LastName    FirstName   Address        City
Hansen      Ola         Timoteivn 10   Sandnes
Svendson    Tove        Borgvn 23      Sandnes
Pettersen   Kari        Storgt 20      Stavanger
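The two SELECT forms above can be tried out with a minimal sketch like the following. It uses Python's built-in sqlite3 module with an in-memory database, which is an assumption made only so the example can run on its own; the tutorial itself targets MS Access and SQL Server. The ORDER BY rowid clause is SQLite-specific and is there only to make the row order predictable.

```python
import sqlite3

# In-memory SQLite database, used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons (LastName TEXT, FirstName TEXT, Address TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?)",
    [
        ("Hansen", "Ola", "Timoteivn 10", "Sandnes"),
        ("Svendson", "Tove", "Borgvn 23", "Sandnes"),
        ("Pettersen", "Kari", "Storgt 20", "Stavanger"),
    ],
)

# Select two named columns.
names = conn.execute(
    "SELECT LastName, FirstName FROM Persons ORDER BY rowid"
).fetchall()

# Select every column with *.
everything = conn.execute("SELECT * FROM Persons ORDER BY rowid").fetchall()

print(names)
print(len(everything[0]))  # number of columns returned by SELECT *
```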

The Result Set


The result from a SQL query is stored in a result-set. Most database software systems allow navigation of the result set with programming functions, like: Move-To-First-Record, Get-RecordContent, Move-To-Next-Record, etc.

Semicolon after SQL Statements?


Semicolon is the standard way to separate each SQL statement in database systems that allow more than one SQL statement to be executed in the same call to the server. Some SQL tutorials end each SQL statement with a semicolon. Is this necessary? We are using MS Access and SQL Server 2000 and we do not have to put a semicolon after each SQL statement, but some database programs force you to use it.

The SELECT DISTINCT Statement


The DISTINCT keyword is used to return only distinct (different) values. The SELECT statement returns information from table columns. But what if we only want to select distinct elements? With SQL, all we need to do is to add a DISTINCT keyword to the SELECT statement:

Syntax

SELECT DISTINCT column_name(s) FROM table_name

Using the DISTINCT keyword


To select ALL values from the column named "Company" we use a SELECT statement like this:

SELECT Company FROM Orders

"Orders" table

Company     OrderNumber
Sega        3412
W3Schools   2312
Trio        4678
W3Schools   6798

Result

Company
Sega
W3Schools
Trio
W3Schools

Note that "W3Schools" is listed twice in the result-set. To select only DIFFERENT values from the column named "Company" we use a SELECT DISTINCT statement like this:

SELECT DISTINCT Company FROM Orders


Result:

Company
Sega
W3Schools
Trio

Now "W3Schools" is listed only once in the result-set.
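A small sketch of the difference between SELECT and SELECT DISTINCT, again assuming Python's sqlite3 module and an in-memory database as a stand-in for whatever database system you actually use. The DISTINCT result is sorted in Python because SQL does not promise any row order without ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (Company TEXT, OrderNumber INTEGER)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [("Sega", 3412), ("W3Schools", 2312), ("Trio", 4678), ("W3Schools", 6798)],
)

# Plain SELECT returns one value per row, duplicates included.
all_companies = [row[0] for row in conn.execute("SELECT Company FROM Orders")]

# SELECT DISTINCT collapses the duplicate "W3Schools" rows into one.
distinct_companies = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT Company FROM Orders")
)

print(len(all_companies), distinct_companies)
```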

SQL WHERE Clause


The WHERE clause is used to specify a selection criterion.

The WHERE Clause


To conditionally select data from a table, a WHERE clause can be added to the SELECT statement.

Syntax

SELECT column FROM table WHERE column operator value


With the WHERE clause, the following operators can be used:

Operator   Description
=          Equal
<>         Not equal
>          Greater than
<          Less than
>=         Greater than or equal
<=         Less than or equal
BETWEEN    Between an inclusive range
LIKE       Search for a pattern
IN         Match any one of a list of exact values

Note: In some versions of SQL the <> operator may be written as !=

Using the WHERE Clause


To select only the persons living in the city "Sandnes", we add a WHERE clause to the SELECT statement:

SELECT * FROM Persons WHERE City='Sandnes'

"Persons" table

LastName    FirstName   Address        City        Year
Hansen      Ola         Timoteivn 10   Sandnes     1951
Svendson    Tove        Borgvn 23      Sandnes     1978
Svendson    Stale       Kaivn 18       Sandnes     1980
Pettersen   Kari        Storgt 20      Stavanger   1960

Result

LastName    FirstName   Address        City        Year
Hansen      Ola         Timoteivn 10   Sandnes     1951
Svendson    Tove        Borgvn 23      Sandnes     1978
Svendson    Stale       Kaivn 18       Sandnes     1980

Using Quotes
Note that we have used single quotes around the conditional values in the examples. SQL uses single quotes around text values (some database systems will also accept double quotes, although in standard SQL double quotes delimit identifiers such as column names). Numeric values should not be enclosed in quotes.

For text values:

This is correct: SELECT * FROM Persons WHERE FirstName='Tove'
This is wrong: SELECT * FROM Persons WHERE FirstName=Tove

For numeric values:

This is correct: SELECT * FROM Persons WHERE Year>1965
This is wrong: SELECT * FROM Persons WHERE Year>'1965'
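The quoting rules can be checked with a short sketch. It assumes Python's sqlite3 module and an in-memory copy of the "Persons" table; neither is part of the original tutorial, and ORDER BY rowid is a SQLite-specific way to keep insertion order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons "
    "(LastName TEXT, FirstName TEXT, Address TEXT, City TEXT, Year INTEGER)"
)
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?, ?)",
    [
        ("Hansen", "Ola", "Timoteivn 10", "Sandnes", 1951),
        ("Svendson", "Tove", "Borgvn 23", "Sandnes", 1978),
        ("Svendson", "Stale", "Kaivn 18", "Sandnes", 1980),
        ("Pettersen", "Kari", "Storgt 20", "Stavanger", 1960),
    ],
)

# Text value: enclosed in single quotes.
in_sandnes = [
    row[0]
    for row in conn.execute(
        "SELECT FirstName FROM Persons WHERE City='Sandnes' ORDER BY rowid"
    )
]

# Numeric value: no quotes.
born_after_1965 = [
    row[0]
    for row in conn.execute(
        "SELECT FirstName FROM Persons WHERE Year>1965 ORDER BY rowid"
    )
]

print(in_sandnes, born_after_1965)
```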

The LIKE Condition


The LIKE condition is used to specify a search for a pattern in a column.

Syntax

SELECT column FROM table WHERE column LIKE pattern


A "%" sign can be used to define wildcards (missing letters in the pattern) both before and after the pattern.

Using LIKE
The following SQL statement will return persons with first names that start with an 'O':

SELECT * FROM Persons WHERE FirstName LIKE 'O%'


The following SQL statement will return persons with first names that end with an 'a':

SELECT * FROM Persons WHERE FirstName LIKE '%a'


The following SQL statement will return persons with first names that contain the pattern 'la':

SELECT * FROM Persons WHERE FirstName LIKE '%la%'
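The three patterns above can be exercised with the sketch below, which assumes Python's sqlite3 module and the four first names from the earlier "Persons" table. Note that SQLite's LIKE ignores case for ASCII letters by default, so other systems may match differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (LastName TEXT, FirstName TEXT)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?)",
    [
        ("Hansen", "Ola"),
        ("Svendson", "Tove"),
        ("Svendson", "Stale"),
        ("Pettersen", "Kari"),
    ],
)


def first_names(pattern):
    """Return the first names matching a LIKE pattern, in insertion order."""
    query = "SELECT FirstName FROM Persons WHERE FirstName LIKE ? ORDER BY rowid"
    return [row[0] for row in conn.execute(query, (pattern,))]


starts_with_o = first_names("O%")   # names beginning with 'O'
ends_with_a = first_names("%a")     # names ending with 'a'
contains_la = first_names("%la%")   # names containing 'la'

print(starts_with_o, ends_with_a, contains_la)
```

With this small data set all three patterns happen to match only "Ola"; "Stale" contains "al" but not "la".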

SQL INSERT INTO Statement


The INSERT INTO Statement
The INSERT INTO statement is used to insert new rows into a table.

Syntax

INSERT INTO table_name VALUES (value1, value2,....)


You can also specify the columns for which you want to insert data:

INSERT INTO table_name (column1, column2,...) VALUES (value1, value2,....)

Insert a New Row


This "Persons" table:

LastName    FirstName   Address     City
Pettersen   Kari        Storgt 20   Stavanger

And this SQL statement:

INSERT INTO Persons VALUES ('Hetland', 'Camilla', 'Hagabakka 24', 'Sandnes')

Will give this result:

LastName    FirstName   Address        City
Pettersen   Kari        Storgt 20      Stavanger
Hetland     Camilla     Hagabakka 24   Sandnes

Insert Data in Specified Columns


This "Persons" table:

LastName    FirstName   Address        City
Pettersen   Kari        Storgt 20      Stavanger
Hetland     Camilla     Hagabakka 24   Sandnes

And This SQL statement:

INSERT INTO Persons (LastName, Address) VALUES ('Rasmussen', 'Storgt 67')


Will give this result:

LastName    FirstName   Address        City
Pettersen   Kari        Storgt 20      Stavanger
Hetland     Camilla     Hagabakka 24   Sandnes
Rasmussen               Storgt 67
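The sketch below replays both INSERT forms, assuming Python's sqlite3 module and an in-memory database as the host. It shows that columns left out of the column list are stored as NULL (None in Python) when no default value is defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons (LastName TEXT, FirstName TEXT, Address TEXT, City TEXT)"
)
conn.execute(
    "INSERT INTO Persons VALUES ('Pettersen', 'Kari', 'Storgt 20', 'Stavanger')"
)

# Insert a full row: one value per column, in table order.
conn.execute(
    "INSERT INTO Persons VALUES ('Hetland', 'Camilla', 'Hagabakka 24', 'Sandnes')"
)

# Insert into named columns only; FirstName and City are left unset.
conn.execute(
    "INSERT INTO Persons (LastName, Address) VALUES ('Rasmussen', 'Storgt 67')"
)

rows = conn.execute("SELECT * FROM Persons ORDER BY rowid").fetchall()
for row in rows:
    print(row)
```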

SQL UPDATE Statement


The Update Statement
The UPDATE statement is used to modify the data in a table.

Syntax

UPDATE table_name SET column_name = new_value WHERE column_name = some_value

Person:

LastName    FirstName   Address      City
Nilsen      Fred        Kirkegt 56   Stavanger
Rasmussen               Storgt 67

Update one Column in a Row


We want to add a first name to the person with a last name of "Rasmussen":

UPDATE Person SET FirstName = 'Nina' WHERE LastName = 'Rasmussen'


Result:

LastName    FirstName   Address      City
Nilsen      Fred        Kirkegt 56   Stavanger
Rasmussen   Nina        Storgt 67

Update several Columns in a Row


We want to change the address and add the name of the city:

UPDATE Person SET Address = 'Stien 12', City = 'Stavanger' WHERE LastName = 'Rasmussen'

Result:

LastName    FirstName   Address      City
Nilsen      Fred        Kirkegt 56   Stavanger
Rasmussen   Nina        Stien 12     Stavanger
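Both UPDATE statements can be replayed with a minimal sketch, assuming Python's sqlite3 module and an in-memory "Person" table. The rowcount attribute used here belongs to the Python wrapper, not to SQL; it reports how many rows the WHERE clause matched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Person (LastName TEXT, FirstName TEXT, Address TEXT, City TEXT)"
)
conn.execute("INSERT INTO Person VALUES ('Nilsen', 'Fred', 'Kirkegt 56', 'Stavanger')")
conn.execute("INSERT INTO Person (LastName, Address) VALUES ('Rasmussen', 'Storgt 67')")

# Update one column in the matching row.
cursor = conn.execute(
    "UPDATE Person SET FirstName = 'Nina' WHERE LastName = 'Rasmussen'"
)
rows_changed = cursor.rowcount

# Update several columns in the same row.
conn.execute(
    "UPDATE Person SET Address = 'Stien 12', City = 'Stavanger' "
    "WHERE LastName = 'Rasmussen'"
)

rows = conn.execute("SELECT * FROM Person ORDER BY rowid").fetchall()
print(rows_changed, rows)
```

Without the WHERE clause, an UPDATE changes every row in the table, so it is worth checking the condition before running it.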

SQL DELETE Statement


The DELETE Statement
The DELETE statement is used to delete rows in a table.

Syntax

DELETE FROM table_name WHERE column_name = some_value

Person:

LastName    FirstName   Address      City
Nilsen      Fred        Kirkegt 56   Stavanger
Rasmussen   Nina        Stien 12     Stavanger

Delete a Row
"Nina Rasmussen" is going to be deleted:

DELETE FROM Person WHERE LastName = 'Rasmussen'


Result

LastName    FirstName   Address      City
Nilsen      Fred        Kirkegt 56   Stavanger

Delete All Rows


It is possible to delete all rows in a table without deleting the table. This means that the table structure, attributes, and indexes will be intact:

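DELETE FROM table_name

or

DELETE * FROM table_name

Note: The DELETE * form is accepted by MS Access but is not standard SQL, and many other database systems reject it.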
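As a sketch of both cases, the following deletes one row and then all rows, and checks that the table itself survives. It assumes Python's sqlite3 module; PRAGMA table_info is a SQLite-specific command used here only to list the remaining column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Person (LastName TEXT, FirstName TEXT, Address TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?, ?)",
    [
        ("Nilsen", "Fred", "Kirkegt 56", "Stavanger"),
        ("Rasmussen", "Nina", "Stien 12", "Stavanger"),
    ],
)

# Delete a single row.
conn.execute("DELETE FROM Person WHERE LastName = 'Rasmussen'")
after_one_delete = conn.execute("SELECT LastName FROM Person").fetchall()

# Delete all rows; the table definition stays in place.
conn.execute("DELETE FROM Person")
remaining = conn.execute("SELECT COUNT(*) FROM Person").fetchone()[0]
columns = [row[1] for row in conn.execute("PRAGMA table_info(Person)")]

print(after_one_delete, remaining, columns)
```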

SQL ORDER BY
The ORDER BY keyword is used to sort the result.

Sort the Rows


The ORDER BY clause is used to sort the rows.

Orders:

Company     OrderNumber
Sega        3412
ABC Shop    5678
W3Schools   6798
W3Schools   2312

Example
To display the company names in alphabetical order:

SELECT Company, OrderNumber FROM Orders ORDER BY Company


Result:

Company     OrderNumber
ABC Shop    5678
Sega        3412
W3Schools   6798
W3Schools   2312

Example

To display the company names in alphabetical order AND the OrderNumber in numerical order:

SELECT Company, OrderNumber FROM Orders ORDER BY Company, OrderNumber


Result:

Company     OrderNumber
ABC Shop    5678
Sega        3412
W3Schools   2312
W3Schools   6798

Example
To display the company names in reverse alphabetical order:

SELECT Company, OrderNumber FROM Orders ORDER BY Company DESC


Result:

Company     OrderNumber
W3Schools   6798
W3Schools   2312
Sega        3412
ABC Shop    5678

Example
To display the company names in reverse alphabetical order AND the OrderNumber in numerical order:

SELECT Company, OrderNumber FROM Orders ORDER BY Company DESC, OrderNumber ASC

Result:

Company     OrderNumber
W3Schools   2312
W3Schools   6798
Sega        3412
ABC Shop    5678

Notice that there are two equal company names (W3Schools) in the result above. The second sort column only affects the order of rows that have the same value (or NULL) in the first sort column.
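The effect of the second sort column can be seen in a short sketch, assuming Python's sqlite3 module and an in-memory "Orders" table. The text comparison here uses SQLite's default binary collation; other systems may sort text by different rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (Company TEXT, OrderNumber INTEGER)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [("Sega", 3412), ("ABC Shop", 5678), ("W3Schools", 6798), ("W3Schools", 2312)],
)

# Ascending on both columns (ASC is the default direction).
ascending = conn.execute(
    "SELECT Company, OrderNumber FROM Orders ORDER BY Company, OrderNumber"
).fetchall()

# Company descending; OrderNumber ascending breaks the W3Schools tie.
mixed = conn.execute(
    "SELECT Company, OrderNumber FROM Orders "
    "ORDER BY Company DESC, OrderNumber ASC"
).fetchall()

print(ascending)
print(mixed)
```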


SQL AND & OR


AND & OR
AND and OR join two or more conditions in a WHERE clause. The AND operator displays a row if ALL conditions listed are true. The OR operator displays a row if ANY of the conditions listed are true.

Original Table (used in the examples)


LastName    FirstName   Address        City
Hansen      Ola         Timoteivn 10   Sandnes
Svendson    Tove        Borgvn 23      Sandnes
Svendson    Stephen     Kaivn 18       Sandnes

Example
Use AND to display each person with the first name equal to "Tove", and the last name equal to "Svendson":

SELECT * FROM Persons WHERE FirstName='Tove' AND LastName='Svendson'

Result:

LastName    FirstName   Address     City
Svendson    Tove        Borgvn 23   Sandnes

Example
Use OR to display each person with the first name equal to "Tove", or the last name equal to "Svendson":

SELECT * FROM Persons WHERE firstname='Tove' OR lastname='Svendson'


Result:

LastName    FirstName   Address     City
Svendson    Tove        Borgvn 23   Sandnes
Svendson    Stephen     Kaivn 18    Sandnes

Example
You can also combine AND and OR (use parentheses to form complex expressions):

SELECT * FROM Persons WHERE (FirstName='Tove' OR FirstName='Stephen') AND LastName='Svendson'


Result:

LastName    FirstName   Address     City
Svendson    Tove        Borgvn 23   Sandnes
Svendson    Stephen     Kaivn 18    Sandnes
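Parentheses matter because AND binds more tightly than OR. The sketch below, which assumes Python's sqlite3 module, runs a variation of the query above (the 'Ola'/'Tove' condition is a hypothetical example chosen so that the two forms give different answers on this table).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons (LastName TEXT, FirstName TEXT, Address TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?)",
    [
        ("Hansen", "Ola", "Timoteivn 10", "Sandnes"),
        ("Svendson", "Tove", "Borgvn 23", "Sandnes"),
        ("Svendson", "Stephen", "Kaivn 18", "Sandnes"),
    ],
)

# With parentheses: (Ola OR Tove) AND Svendson.
with_parens = conn.execute(
    "SELECT LastName, FirstName FROM Persons "
    "WHERE (FirstName='Ola' OR FirstName='Tove') AND LastName='Svendson' "
    "ORDER BY rowid"
).fetchall()

# Without parentheses: Ola OR (Tove AND Svendson), since AND is evaluated first.
without_parens = conn.execute(
    "SELECT LastName, FirstName FROM Persons "
    "WHERE FirstName='Ola' OR FirstName='Tove' AND LastName='Svendson' "
    "ORDER BY rowid"
).fetchall()

print(with_parens)
print(without_parens)
```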

SQL IN

IN
The IN operator may be used if you know the exact values you want to match in a column. A row is returned when the column's value equals any one of the listed values.

SELECT column_name FROM table_name WHERE column_name IN (value1,value2,..)

Original Table (used in the examples)


LastName    FirstName   Address        City
Hansen      Ola         Timoteivn 10   Sandnes
Nordmann    Anna        Neset 18       Sandnes
Pettersen   Kari        Storgt 20      Stavanger
Svendson    Tove        Borgvn 23      Sandnes

Example 1
To display the persons with LastName equal to "Hansen" or "Pettersen", use the following SQL:

SELECT * FROM Persons WHERE LastName IN ('Hansen','Pettersen')


Result:

LastName    FirstName   Address        City
Hansen      Ola         Timoteivn 10   Sandnes
Pettersen   Kari        Storgt 20      Stavanger

SQL BETWEEN

BETWEEN ... AND


The BETWEEN ... AND operator selects a range of data between two values. These values can be numbers, text, or dates.

SELECT column_name FROM table_name WHERE column_name BETWEEN value1 AND value2

Original Table (used in the examples)


LastName    FirstName   Address        City
Hansen      Ola         Timoteivn 10   Sandnes
Nordmann    Anna        Neset 18       Sandnes
Pettersen   Kari        Storgt 20      Stavanger
Svendson    Tove        Borgvn 23      Sandnes

Example 1
To display the persons alphabetically between "Hansen" (inclusive) and "Pettersen" (exclusive), use the following SQL:

SELECT * FROM Persons WHERE LastName BETWEEN 'Hansen' AND 'Pettersen'


Result:

LastName    FirstName   Address        City
Hansen      Ola         Timoteivn 10   Sandnes
Nordmann    Anna        Neset 18       Sandnes

IMPORTANT! The BETWEEN...AND operator has been treated differently in different databases. The SQL standard defines BETWEEN value1 AND value2 as including both test values, so on a standard-conforming system "Pettersen" would also be listed in the result above. With some databases a person with the LastName of "Hansen" or "Pettersen" will not be listed (BETWEEN...AND only selects fields that are between and excluding the test values). With some databases a person with the last name of "Hansen" or "Pettersen" will be listed (BETWEEN...AND selects fields that are between and including the test values). With other databases a person with the last name of "Hansen" will be listed, but "Pettersen" will not be listed (BETWEEN...AND selects fields between the test values, including the first test value and excluding the last test value). Therefore: Check how your database treats the BETWEEN...AND operator!

Example 2
To display the persons outside the range used in the previous example, use the NOT operator:

SELECT * FROM Persons WHERE LastName NOT BETWEEN 'Hansen' AND 'Pettersen'

Result:

LastName    FirstName   Address     City
Pettersen   Kari        Storgt 20   Stavanger
Svendson    Tove        Borgvn 23   Sandnes
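One way to check how a particular database treats the endpoints is simply to run the query. The sketch below does this for SQLite through Python's sqlite3 module (an assumption; your database may differ, which is the point of the check). On SQLite both endpoints are included, so the results differ from the tables shown above. When the endpoint behavior matters, writing the comparison out explicitly with >= and < removes the ambiguity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (LastName TEXT, FirstName TEXT)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?)",
    [
        ("Hansen", "Ola"),
        ("Nordmann", "Anna"),
        ("Pettersen", "Kari"),
        ("Svendson", "Tove"),
    ],
)


def last_names(where_clause):
    """Run a SELECT with the given WHERE clause and return sorted last names."""
    query = f"SELECT LastName FROM Persons WHERE {where_clause} ORDER BY LastName"
    return [row[0] for row in conn.execute(query)]


between = last_names("LastName BETWEEN 'Hansen' AND 'Pettersen'")
not_between = last_names("LastName NOT BETWEEN 'Hansen' AND 'Pettersen'")

# Explicit half-open range: includes 'Hansen', excludes 'Pettersen'.
half_open = last_names("LastName >= 'Hansen' AND LastName < 'Pettersen'")

print(between, not_between, half_open)
```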

SQL Alias

With SQL, aliases can be used for column names and table names.

Column Name Alias


The syntax is:

SELECT column AS column_alias FROM table

Table Name Alias


The syntax is:

SELECT column FROM table AS table_alias

Example: Using a Column Alias


This table (Persons):

LastName    FirstName   Address        City
Hansen      Ola         Timoteivn 10   Sandnes
Svendson    Tove        Borgvn 23      Sandnes
Pettersen   Kari        Storgt 20      Stavanger

And this SQL:

SELECT LastName AS Family, FirstName AS Name FROM Persons


Returns this result:

Family      Name
Hansen      Ola
Svendson    Tove
Pettersen   Kari

Example: Using a Table Alias

This table (Persons):

LastName    FirstName   Address        City
Hansen      Ola         Timoteivn 10   Sandnes
Svendson    Tove        Borgvn 23      Sandnes
Pettersen   Kari        Storgt 20      Stavanger

And this SQL:

SELECT LastName, FirstName FROM Persons AS Employees


Returns this result:

Table Employees:

LastName    FirstName
Hansen      Ola
Svendson    Tove
Pettersen   Kari

Keys

Super Key:

Primary key: A primary key is a column with a unique value for each row. Each primary key value must be unique within the table. The purpose is to bind data together, across tables, without repeating all of the data in every table.

Foreign Key:

Unique Key:

Composite Key:

SQL JOIN
Sometimes we have to select data from two or more tables to make our result complete. We have to perform a join. Tables in a database can be related to each other with keys. A primary key is a column with a unique value for each row. Each primary key value must be unique within the table. The purpose is to bind data together, across tables, without repeating all of the data in every table. In the "Employees" table below, the "Employee_ID" column is the primary key, meaning that no two rows can have the same Employee_ID. The Employee_ID distinguishes two persons even if they have the same name. When you look at the example tables below, notice that:

The "Employee_ID" column is the primary key of the "Employees" table
The "Prod_ID" column is the primary key of the "Orders" table
The "Employee_ID" column in the "Orders" table is used to refer to the persons in the "Employees" table without using their names

Employees:

Employee_ID   Name
01            Hansen, Ola
02            Svendson, Tove
03            Svendson, Stephen
04            Pettersen, Kari

Orders:

Prod_ID   Product   Employee_ID
234       Printer   01
657       Table     03
865       Chair     03

Referring to Two Tables


We can select data from two tables by referring to two tables, like this:

Example
Who has ordered a product, and what did they order?

SELECT Employees.Name, Orders.Product FROM Employees, Orders WHERE Employees.Employee_ID=Orders.Employee_ID


Result

Name                Product
Hansen, Ola         Printer
Svendson, Stephen   Table
Svendson, Stephen   Chair

Example
Who ordered a printer?

SELECT Employees.Name FROM Employees, Orders WHERE Employees.Employee_ID=Orders.Employee_ID AND Orders.Product='Printer'


Result

Name
Hansen, Ola

Using Joins
OR we can select data from two tables with the JOIN keyword, like this:

Example INNER JOIN


Syntax

SELECT field1, field2, field3 FROM first_table INNER JOIN second_table ON first_table.keyfield = second_table.foreign_keyfield

Who has ordered a product, and what did they order?

SELECT Employees.Name, Orders.Product FROM Employees INNER JOIN Orders ON Employees.Employee_ID=Orders.Employee_ID


The INNER JOIN returns all rows from both tables where there is a match. If there are rows in Employees that do not have matches in Orders, those rows will not be listed.

Result

Name                Product
Hansen, Ola         Printer
Svendson, Stephen   Table
Svendson, Stephen   Chair

Example LEFT JOIN


Syntax

SELECT field1, field2, field3 FROM first_table LEFT JOIN second_table ON first_table.keyfield = second_table.foreign_keyfield
List all employees, and their orders - if any.

SELECT Employees.Name, Orders.Product FROM Employees LEFT JOIN Orders ON Employees.Employee_ID=Orders.Employee_ID


The LEFT JOIN returns all the rows from the first table (Employees), even if there are no matches in the second table (Orders). If there are rows in Employees that do not have matches in Orders, those rows also will be listed.

Result

Name                Product
Hansen, Ola         Printer
Svendson, Tove
Svendson, Stephen   Table
Svendson, Stephen   Chair
Pettersen, Kari

Example RIGHT JOIN


Syntax

SELECT field1, field2, field3 FROM first_table RIGHT JOIN second_table ON first_table.keyfield = second_table.foreign_keyfield

List all orders, and who has ordered - if any.

SELECT Employees.Name, Orders.Product FROM Employees RIGHT JOIN Orders ON Employees.Employee_ID=Orders.Employee_ID


The RIGHT JOIN returns all the rows from the second table (Orders), even if there are no matches in the first table (Employees). If there had been any rows in Orders that did not have matches in Employees, those rows also would have been listed.

Result

Name                Product
Hansen, Ola         Printer
Svendson, Stephen   Table
Svendson, Stephen   Chair

Example
Who ordered a printer?

SELECT Employees.Name FROM Employees INNER JOIN Orders ON Employees.Employee_ID=Orders.Employee_ID WHERE Orders.Product = 'Printer'

Result

Name
Hansen, Ola
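The difference between INNER JOIN and LEFT JOIN can be reproduced with the sketch below, which assumes Python's sqlite3 module and in-memory copies of the two example tables. RIGHT JOIN is left out because older SQLite versions do not support it. Unmatched rows from a LEFT JOIN come back with NULL in the Product column, which Python shows as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Employee_ID TEXT PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE Orders (Prod_ID INTEGER PRIMARY KEY, Product TEXT, Employee_ID TEXT)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [
        ("01", "Hansen, Ola"),
        ("02", "Svendson, Tove"),
        ("03", "Svendson, Stephen"),
        ("04", "Pettersen, Kari"),
    ],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(234, "Printer", "01"), (657, "Table", "03"), (865, "Chair", "03")],
)

# INNER JOIN: only employees that have at least one order.
inner = conn.execute(
    "SELECT Employees.Name, Orders.Product "
    "FROM Employees INNER JOIN Orders "
    "ON Employees.Employee_ID=Orders.Employee_ID "
    "ORDER BY Orders.Prod_ID"
).fetchall()

# LEFT JOIN: every employee, with NULL where no order matches.
left = conn.execute(
    "SELECT Employees.Name, Orders.Product "
    "FROM Employees LEFT JOIN Orders "
    "ON Employees.Employee_ID=Orders.Employee_ID "
    "ORDER BY Employees.Employee_ID, Orders.Prod_ID"
).fetchall()

print(inner)
print(left)
```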

SQL UNION and UNION ALL


UNION
The UNION command is used to combine the results of two SELECT statements into one result set. Unlike JOIN, which combines columns from related tables, UNION appends rows, so both SELECT statements must return the same number of columns, with compatible data types in corresponding positions. Note: With UNION, only distinct values are selected.

SQL Statement 1 UNION SQL Statement 2

Employees_Norway:

E_ID   E_Name
01     Hansen, Ola
02     Svendson, Tove
03     Svendson, Stephen
04     Pettersen, Kari

Employees_USA:

E_ID   E_Name
01     Turner, Sally
02     Kent, Clark
03     Svendson, Stephen
04     Scott, Stephen

Using the UNION Command

Example
List all different employee names in Norway and USA:

SELECT E_Name FROM Employees_Norway UNION SELECT E_Name FROM Employees_USA


Result

E_Name
Hansen, Ola
Svendson, Tove
Svendson, Stephen
Pettersen, Kari
Turner, Sally
Kent, Clark
Scott, Stephen

Note: This command cannot be used to list all employees in Norway and USA. In the example above we have two employees with equal names, and only one of them is listed. The UNION command only selects distinct values.

UNION ALL
The UNION ALL command is equal to the UNION command, except that UNION ALL selects all values.

SQL Statement 1 UNION ALL SQL Statement 2

Using the UNION ALL Command


Example
List all employees in Norway and USA:

SELECT E_Name FROM Employees_Norway UNION ALL SELECT E_Name FROM Employees_USA

Result

E_Name
Hansen, Ola
Svendson, Tove
Svendson, Stephen
Pettersen, Kari
Turner, Sally
Kent, Clark
Svendson, Stephen
Scott, Stephen
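A quick way to see the difference between UNION and UNION ALL is to count the rows each returns. The sketch below assumes Python's sqlite3 module and in-memory copies of the two employee tables; results are collected into Python lists because neither command guarantees a row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees_Norway (E_ID TEXT, E_Name TEXT)")
conn.execute("CREATE TABLE Employees_USA (E_ID TEXT, E_Name TEXT)")
conn.executemany(
    "INSERT INTO Employees_Norway VALUES (?, ?)",
    [
        ("01", "Hansen, Ola"),
        ("02", "Svendson, Tove"),
        ("03", "Svendson, Stephen"),
        ("04", "Pettersen, Kari"),
    ],
)
conn.executemany(
    "INSERT INTO Employees_USA VALUES (?, ?)",
    [
        ("01", "Turner, Sally"),
        ("02", "Kent, Clark"),
        ("03", "Svendson, Stephen"),
        ("04", "Scott, Stephen"),
    ],
)

# UNION removes duplicate rows from the combined result.
union_names = [
    row[0]
    for row in conn.execute(
        "SELECT E_Name FROM Employees_Norway UNION SELECT E_Name FROM Employees_USA"
    )
]

# UNION ALL keeps every row from both SELECT statements.
union_all_names = [
    row[0]
    for row in conn.execute(
        "SELECT E_Name FROM Employees_Norway "
        "UNION ALL SELECT E_Name FROM Employees_USA"
    )
]

print(len(union_names), len(union_all_names))
```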

SQL Create Database, Table, and Index


Create a Database
To create a database:

CREATE DATABASE database_name

Create a Table
To create a table in a database:

CREATE TABLE table_name ( column_name1 data_type, column_name2 data_type, ....... )

Example


This example demonstrates how you can create a table named "Person", with four columns. The column names will be "LastName", "FirstName", "Address", and "Age":

CREATE TABLE Person ( LastName varchar, FirstName varchar, Address varchar, Age int )
This example demonstrates how you can specify a maximum length for some columns:

CREATE TABLE Person ( LastName varchar(30), FirstName varchar, Address varchar, Age int(3) )
The data type specifies what type of data the column can hold. The table below contains the most common data types in SQL:

Data Type                          Description
integer(size), int(size),          Hold integers only. The maximum number of digits is specified in parentheses.
smallint(size), tinyint(size)
decimal(size,d), numeric(size,d)   Hold numbers with fractions. The maximum number of digits is specified in "size". The maximum number of digits to the right of the decimal is specified in "d".
char(size)                         Holds a fixed length string (can contain letters, numbers, and special characters). The fixed size is specified in parentheses.
varchar(size)                      Holds a variable length string (can contain letters, numbers, and special characters). The maximum size is specified in parentheses.
date(yyyymmdd)                     Holds a date

Create Index
Indexes are created on an existing table to locate rows more quickly and efficiently. It is possible to create an index on one or more columns of a table, and each index is given a name. Users cannot see the indexes; they are just used to speed up queries.

Note: Updating a table containing indexes takes more time than updating a table without, because the indexes also need to be updated. So, it is a good idea to create indexes only on columns that are often used for a search.

A Unique Index

Creates a unique index on a table. A unique index means that two rows cannot have the same index value.

CREATE UNIQUE INDEX index_name ON table_name (column_name)


The "column_name" specifies the column you want indexed.

A Simple Index

Creates a simple index on a table. When the UNIQUE keyword is omitted, duplicate values are allowed.

CREATE INDEX index_name ON table_name (column_name)


The "column_name" specifies the column you want indexed.

Example
This example creates a simple index, named "PersonIndex", on the LastName field of the Person table:

CREATE INDEX PersonIndex ON Person (LastName)


If you want to index the values in a column in descending order, you can add the reserved word DESC after the column name:

CREATE INDEX PersonIndex ON Person (LastName DESC)

If you want to index more than one column you can list the column names within the parentheses, separated by commas:

CREATE INDEX PersonIndex ON Person (LastName, FirstName)
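The sketch below creates a simple and a unique index and shows the unique index rejecting a duplicate row. It assumes Python's sqlite3 module; the index name PersonUnique is a hypothetical name chosen for this example, and PRAGMA index_list is a SQLite-specific command for listing a table's indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (LastName TEXT, FirstName TEXT)")

# Simple index on one column: duplicate values are allowed.
conn.execute("CREATE INDEX PersonIndex ON Person (LastName)")

# Unique index on two columns: the (LastName, FirstName) pair must be unique.
conn.execute("CREATE UNIQUE INDEX PersonUnique ON Person (LastName, FirstName)")

conn.execute("INSERT INTO Person VALUES ('Hansen', 'Ola')")
conn.execute("INSERT INTO Person VALUES ('Hansen', 'Kari')")  # same LastName is fine

try:
    conn.execute("INSERT INTO Person VALUES ('Hansen', 'Ola')")  # exact duplicate
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

index_names = sorted(row[1] for row in conn.execute("PRAGMA index_list(Person)"))
row_count = conn.execute("SELECT COUNT(*) FROM Person").fetchone()[0]

print(duplicate_rejected, index_names, row_count)
```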

SQL ALTER TABLE


ALTER TABLE
The ALTER TABLE statement is used to add or drop columns in an existing table.

ALTER TABLE table_name ADD column_name datatype

ALTER TABLE table_name DROP COLUMN column_name

Note: Some database systems don't allow the dropping of a column in a database table (DROP COLUMN column_name).

Person:

LastName    FirstName   Address
Pettersen   Kari        Storgt 20

Example
To add a column named "City" in the "Person" table:

ALTER TABLE Person ADD City varchar(30)


Result:

LastName    FirstName   Address     City
Pettersen   Kari        Storgt 20

Example
To drop the "Address" column in the "Person" table:

ALTER TABLE Person DROP COLUMN Address


Result:

LastName    FirstName   City
Pettersen   Kari
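Adding a column can be sketched as follows, assuming Python's sqlite3 module. Only the ADD form is shown, because support for DROP COLUMN varies between database systems and versions (older SQLite releases, for instance, do not have it). Existing rows receive NULL in the new column, which Python shows as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (LastName TEXT, FirstName TEXT, Address TEXT)")
conn.execute("INSERT INTO Person VALUES ('Pettersen', 'Kari', 'Storgt 20')")

# Add a new column to the existing table.
conn.execute("ALTER TABLE Person ADD City varchar(30)")

# PRAGMA table_info is SQLite-specific; the second field is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Person)")]
existing_row = conn.execute("SELECT * FROM Person").fetchone()

print(columns)
print(existing_row)
```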


SQL Functions

SQL has a lot of built-in functions for counting and calculations.

Function Syntax
The syntax for built-in SQL functions is:

SELECT function(column) FROM table

Types of Functions
There are several basic types and categories of functions in SQL. The basic types of functions are:

Aggregate Functions Scalar functions

Aggregate functions
Aggregate functions operate against a collection of values, but return a single value. Note: If an aggregate function is used alongside non-aggregated columns in the select list of a SELECT statement, the SELECT must have a GROUP BY clause that lists those columns!

"Persons" table (used in most examples)


Name              Age
Hansen, Ola       34
Svendson, Tove    45
Pettersen, Kari   19

Aggregate functions in SQL Server


Function                 Description
AVG(column)              Returns the average value of a column
BINARY_CHECKSUM
CHECKSUM
CHECKSUM_AGG
COUNT(column)            Returns the number of rows (without a NULL value) of a column
COUNT(*)                 Returns the number of selected rows
COUNT(DISTINCT column)   Returns the number of distinct results
FIRST(column)            Returns the value of the first record in a specified field (not supported in SQLServer2K)
LAST(column)             Returns the value of the last record in a specified field (not supported in SQLServer2K)
MAX(column)              Returns the highest value of a column
MIN(column)              Returns the lowest value of a column
STDEV(column)
STDEVP(column)
SUM(column)              Returns the total sum of a column
VAR(column)
VARP(column)

Scalar functions
Scalar functions operate against a single value, and return a single value based on the input value.
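
The difference between the two kinds of function can be sketched with a small runnable example. This is only an illustration, assuming Python's standard sqlite3 module as a stand-in database (not SQL Server) and the "Persons" table shown above; UPPER is used as a representative scalar function.

```python
import sqlite3

# In-memory SQLite database standing in for the tutorial's "Persons" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (Name TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO Persons (Name, Age) VALUES (?, ?)",
    [("Hansen, Ola", 34), ("Svendson, Tove", 45), ("Pettersen, Kari", 19)],
)

# Aggregate functions: many rows in, one value out per function.
aggregates = conn.execute(
    "SELECT COUNT(*), SUM(Age), MIN(Age), MAX(Age), AVG(Age) FROM Persons"
).fetchone()

# Scalar function: one value out for each input row.
upper_names = [
    row[0]
    for row in conn.execute("SELECT UPPER(Name) FROM Persons ORDER BY Name")
]

conn.close()
```

Here the aggregate query yields a single row for the whole table, while the scalar query yields one value per person.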

SQL GROUP BY and HAVING


Aggregate functions (like SUM) often need an added GROUP BY clause.

GROUP BY...
GROUP BY... was added to SQL because aggregate functions (like SUM) return the aggregate of all column values every time they are called, and without the GROUP BY clause it was impossible to find the sum for each individual group of column values. The syntax for the GROUP BY clause is:

SELECT column,SUM(column) FROM table GROUP BY column

GROUP BY Example
This "Sales" table:

Company      Amount
W3Schools    5500
IBM          4500
W3Schools    7100

And this SQL:

SELECT Company, SUM(Amount) FROM Sales


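Might be expected to return this result:

Company      SUM(Amount)
W3Schools    17100
IBM          17100
W3Schools    17100

However, the above code is invalid in standard SQL because the Company column is neither part of an aggregate nor named in a GROUP BY clause, and many database systems reject it outright. A GROUP BY clause will solve this problem: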

SELECT Company,SUM(Amount) FROM Sales GROUP BY Company


Returns this result:

Company      SUM(Amount)
W3Schools    12600
IBM          4500

HAVING...
HAVING... was added to SQL because the WHERE keyword could not be used against aggregate functions (like SUM), and without HAVING... it would be impossible to filter on aggregated results. The syntax for the HAVING clause is:

SELECT column,SUM(column) FROM table GROUP BY column HAVING SUM(column) condition value

This "Sales" table:

Company      Amount
W3Schools    5500
IBM          4500
W3Schools    7100

This SQL:

SELECT Company,SUM(Amount) FROM Sales GROUP BY Company HAVING SUM(Amount)>10000


Returns this result:

Company      SUM(Amount)
W3Schools    12600
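
The GROUP BY and HAVING examples can be checked with a short script. This is a minimal sketch, assuming Python's standard sqlite3 module as the database engine; the table and values are the "Sales" data from the text, and the ORDER BY is added only to make the row order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Company TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO Sales (Company, Amount) VALUES (?, ?)",
    [("W3Schools", 5500), ("IBM", 4500), ("W3Schools", 7100)],
)

# One summed row per company.
grouped = conn.execute(
    "SELECT Company, SUM(Amount) FROM Sales GROUP BY Company ORDER BY Company"
).fetchall()

# HAVING filters the groups after aggregation; WHERE could not do this.
filtered = conn.execute(
    "SELECT Company, SUM(Amount) FROM Sales "
    "GROUP BY Company HAVING SUM(Amount) > 10000"
).fetchall()

conn.close()
```

The first query produces one total per company, and the second keeps only the companies whose total exceeds 10000.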

SQL SELECT INTO Statement


The SELECT INTO Statement
The SELECT INTO statement is most often used to create backup copies of tables or for archiving records.

Syntax

SELECT column_name(s) INTO newtable [IN externaldatabase] FROM source

Make a Backup Copy


The following example makes a backup copy of the "Persons" table:

SELECT * INTO Persons_backup FROM Persons

The IN clause (shown here in the form used by MS Access) can be used to copy tables into another database:

SELECT Persons.* INTO Persons IN 'Backup.mdb' FROM Persons


If you only want to copy a few fields, you can do so by listing them after the SELECT statement:

SELECT LastName,FirstName INTO Persons_backup FROM Persons


You can also add a WHERE clause. The following example creates a "Persons_backup" table with two columns (LastName and FirstName) by extracting the persons who live in "Sandnes" from the "Persons" table:

SELECT LastName,FirstName INTO Persons_backup FROM Persons WHERE City='Sandnes'


Selecting data from more than one table is also possible. The following example creates a new table "Empl_Ord_backup" that contains data from the two tables Employees and Orders:

SELECT Employees.Name,Orders.Product INTO Empl_Ord_backup FROM Employees INNER JOIN Orders ON Employees.Employee_ID=Orders.Employee_ID

SQL CREATE VIEW Statement


A view is a virtual table based on the result-set of a SELECT statement.

What is a View?
In SQL, a VIEW is a virtual table based on the result-set of a SELECT statement. A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables in the database. You can add SQL functions, WHERE, and JOIN statements to a view and present the data as if the data were coming from a single table. Note: The database design and structure will NOT be affected by the functions, where, or join statements in a view.

Syntax

CREATE VIEW view_name AS SELECT column_name(s) FROM table_name WHERE condition

Note: The database does not store the view data! The database engine recreates the data, using the view's SELECT statement, every time a user queries a view.

Using Views
A view can be used from inside a query, a stored procedure, or from inside another view. Adding functions, joins, etc., to a view allows you to present exactly the data you want to the user. The sample database Northwind has some views installed by default. The view "Current Product List" lists all active products (products that are not discontinued) from the Products table. The view is created with the following SQL:

CREATE VIEW [Current Product List] AS SELECT ProductID,ProductName FROM Products WHERE Discontinued=No
We can query the view above as follows:

SELECT * FROM [Current Product List]


Another view from the Northwind sample database selects every product in the Products table that has a unit price that is higher than the average unit price:

CREATE VIEW [Products Above Average Price] AS SELECT ProductName,UnitPrice FROM Products WHERE UnitPrice>(SELECT AVG(UnitPrice) FROM Products)
We can query the view above as follows:

SELECT * FROM [Products Above Average Price]


Another example view from the Northwind database calculates the total sale for each category in 1997. Note that this view selects its data from another view called "Product Sales for 1997":

CREATE VIEW [Category Sales For 1997] AS SELECT DISTINCT CategoryName,Sum(ProductSales) AS CategorySales FROM [Product Sales for 1997] GROUP BY CategoryName
We can query the view above as follows:

SELECT * FROM [Category Sales For 1997]


We can also add a condition to the query. Now we want to see the total sale only for the category "Beverages":

SELECT * FROM [Category Sales For 1997] WHERE CategoryName='Beverages'
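
The point that a view stores no data of its own can be sketched as follows. This is an illustration only, assuming Python's standard sqlite3 module rather than the Northwind database; the three product rows are made up for the example, and Discontinued is stored as 0/1 because SQLite has no Yes/No literal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products ("
    "ProductID INTEGER PRIMARY KEY, ProductName TEXT, "
    "UnitPrice REAL, Discontinued INTEGER)"
)
# Illustrative rows, not the real Northwind data.
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?, ?)",
    [(1, "Chai", 18.0, 0), (2, "Chang", 19.0, 0), (3, "Old Product", 97.0, 1)],
)

# The view is just a stored SELECT statement.
conn.execute(
    'CREATE VIEW "Current Product List" AS '
    "SELECT ProductID, ProductName FROM Products WHERE Discontinued = 0"
)

query = 'SELECT * FROM "Current Product List" ORDER BY ProductID'
before_update = conn.execute(query).fetchall()

# Changing the base table changes what the view returns on the next query.
conn.execute("UPDATE Products SET Discontinued = 1 WHERE ProductID = 2")
after_update = conn.execute(query).fetchall()

conn.close()
```

Because the engine re-runs the view's SELECT on every query, the second result reflects the update without the view being touched.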

SQL Quick Reference

SQL Syntax
Each statement is listed with its syntax below it.

AND / OR
    SELECT column_name(s) FROM table_name WHERE condition AND|OR condition

ALTER TABLE (add column)
    ALTER TABLE table_name ADD column_name datatype

ALTER TABLE (drop column)
    ALTER TABLE table_name DROP COLUMN column_name

AS (alias for column)
    SELECT column_name AS column_alias FROM table_name

AS (alias for table)
    SELECT column_name FROM table_name AS table_alias

BETWEEN
    SELECT column_name(s) FROM table_name WHERE column_name BETWEEN value1 AND value2

CREATE DATABASE
    CREATE DATABASE database_name

CREATE INDEX
    CREATE INDEX index_name ON table_name (column_name)

CREATE TABLE
    CREATE TABLE table_name ( column_name1 data_type, column_name2 data_type, ....... )

CREATE UNIQUE INDEX
    CREATE UNIQUE INDEX index_name ON table_name (column_name)

CREATE VIEW
    CREATE VIEW view_name AS SELECT column_name(s) FROM table_name WHERE condition

DELETE FROM
    DELETE FROM table_name (Note: deletes all rows in the table; the table itself remains)
    or
    DELETE FROM table_name WHERE condition

DROP DATABASE
    DROP DATABASE database_name

DROP INDEX
    DROP INDEX table_name.index_name

DROP TABLE
    DROP TABLE table_name

GROUP BY
    SELECT column_name1,SUM(column_name2) FROM table_name GROUP BY column_name1

HAVING
    SELECT column_name1,SUM(column_name2) FROM table_name GROUP BY column_name1 HAVING SUM(column_name2) condition value

IN
    SELECT column_name(s) FROM table_name WHERE column_name IN (value1,value2,..)

INSERT INTO
    INSERT INTO table_name VALUES (value1, value2,....)
    or
    INSERT INTO table_name (column_name1, column_name2,...) VALUES (value1, value2,....)

LIKE
    SELECT column_name(s) FROM table_name WHERE column_name LIKE pattern

ORDER BY
    SELECT column_name(s) FROM table_name ORDER BY column_name [ASC|DESC]

SELECT
    SELECT column_name(s) FROM table_name

SELECT *
    SELECT * FROM table_name

SELECT DISTINCT
    SELECT DISTINCT column_name(s) FROM table_name

SELECT INTO (used to create backup copies of tables)
    SELECT * INTO new_table_name FROM original_table_name
    or
    SELECT column_name(s) INTO new_table_name FROM original_table_name

TRUNCATE TABLE (deletes only the data inside the table)
    TRUNCATE TABLE table_name

UPDATE
    UPDATE table_name SET column_name=new_value [, column_name=new_value] WHERE column_name=some_value

WHERE
    SELECT column_name(s) FROM table_name WHERE condition

Cursor
A cursor is used for processing individual rows returned by the database system for a query. It is necessary because many programming languages suffer from impedance mismatch. Programming languages are often procedural and do not offer any mechanism for manipulating whole result sets at once. Therefore, the rows in a result set must be processed sequentially by the application. In this way, a cursor can be thought of as an iterator over the collection of rows in the result set.

Several SQL statements do not require the use of cursors. That includes the INSERT statement, for example, as well as most forms of the DELETE and UPDATE statements. Even a SELECT statement may not involve a cursor if it is used in the embedded-SQL variation of SELECT INTO (not to be confused with the table-copying SELECT INTO described earlier). This form of SELECT INTO retrieves at most a single row directly into the application.

Working with Cursors

A cursor is made known to the DBMS with the DECLARE CURSOR statement. A name has to be assigned for the cursor.

DECLARE cursor_name CURSOR FOR SELECT ... FROM ...

Before being used, a cursor must be opened with the OPEN statement. As a result of the opening, the cursor is positioned before the first row in the result set.

OPEN cursor_name

A cursor is positioned on a specific row in the result set with the FETCH statement. A fetch operation transfers the data of the row into the application. Once all rows are processed or the fetch operation is to be positioned on a non-existing row (cf. scrollable cursors below), a SQLSTATE '02000' (usually accompanied by an SQLCODE +100) is returned by the DBMS to indicate the end of the result set.

FETCH cursor_name INTO ...

The last step is to close the cursor using the CLOSE statement.

CLOSE cursor_name

Once a cursor is closed it can be opened again, which implies that the query is evaluated again and a new result set is built.
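
The declare, open, fetch, close cycle can be sketched from the application side. This is an analogy rather than the SQL statements themselves: it assumes Python's standard sqlite3 module, whose DB-API cursor object plays the role of the SQL cursor, with fetchone() returning None where an embedded-SQL program would see SQLSTATE '02000'. The "Persons" data is taken from the earlier examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (Name TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO Persons (Name, Age) VALUES (?, ?)",
    [("Hansen, Ola", 34), ("Svendson, Tove", 45), ("Pettersen, Kari", 19)],
)

# Roughly DECLARE + OPEN: the query is bound to the cursor and evaluated.
cursor = conn.cursor()
cursor.execute("SELECT Name, Age FROM Persons ORDER BY Age")

fetched = []
while True:
    row = cursor.fetchone()   # roughly FETCH cursor_name INTO ...
    if row is None:           # end of result set reached
        break
    fetched.append(row)       # process one row at a time

cursor.close()                # roughly CLOSE cursor_name
conn.close()
```

The loop handles exactly one row per iteration, which is the iterator-like behavior described above.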

Scrollable Cursors
Cursors may be declared as being scrollable or not. The scrollability indicates the direction in which a cursor can move.

A non-scrollable cursor is also known as forward-only. Each row can be fetched at most once, and the cursor automatically moves to the immediately following row. A fetch operation after the last row has been retrieved positions the cursor after the last row and returns SQLSTATE 02000 (SQLCODE +100).

A scrollable cursor can be positioned anywhere in the result set using the FETCH SQL statement. The keyword SCROLL must be specified when declaring the cursor. The default is NO SCROLL, although different language bindings like JDBC may apply different defaults.

DECLARE cursor_name sensitivity SCROLL CURSOR FOR SELECT ... FROM ...

The target position for a scrollable cursor can be specified relative to the current cursor position or absolute from the beginning of the result set.

FETCH [ NEXT | PRIOR | FIRST | LAST ] FROM cursor_name
FETCH ABSOLUTE n FROM cursor_name
FETCH RELATIVE n FROM cursor_name

Scrollable cursors can potentially access the same row in the result set multiple times. Thus, data modifications (insert, update, delete operations) from other transactions could have an impact on the result set. A cursor can be SENSITIVE or INSENSITIVE to such data modifications. A sensitive cursor picks up data modifications impacting the result set of the cursor, and an insensitive cursor does not. Additionally, a cursor may be ASENSITIVE, in which case the DBMS tries to apply sensitivity as much as possible.

WITH HOLD
Cursors are usually closed automatically at the end of a transaction, i.e., when a COMMIT or ROLLBACK (or an implicit termination of the transaction) occurs. That behavior can be changed if the cursor is declared using the WITH HOLD clause. (The default is WITHOUT HOLD.) A holdable cursor is kept open over COMMIT and closed upon ROLLBACK. (Some DBMSs deviate from this standard behavior and also keep holdable cursors open over ROLLBACK.)

DECLARE cursor_name CURSOR WITH HOLD FOR SELECT ... FROM ...

When a COMMIT occurs, a holdable cursor is positioned before the next row. Thus, a positioned UPDATE or positioned DELETE statement will only succeed after a FETCH operation has occurred first in the transaction.

Note that JDBC defines cursors as holdable per default. This is done because JDBC also activates auto-commit per default. Due to the usual overhead associated with auto-commit and holdable cursors, both features should be explicitly deactivated at the connection level.

Positioned Update/Delete Statements
Cursors can not only be used to fetch data from the DBMS into an application but also to identify a row in a table to be updated or deleted. The SQL:2003 standard defines positioned update and positioned delete SQL statements for that purpose. Such statements do not use a regular WHERE clause with predicates. Instead, a cursor identifies the row. The cursor must already be opened and positioned on a row using the FETCH statement.

UPDATE table_name SET ... WHERE CURRENT OF cursor_name
DELETE FROM table_name WHERE CURRENT OF cursor_name

The cursor must operate on an updatable result set in order to successfully execute a positioned update or delete statement. Otherwise, the DBMS would not know how to apply the data changes to the underlying tables referred to in the cursor.

Cursors in Distributed Transactions
Using cursors in distributed transactions (X/Open XA Environments), which are controlled using a transaction monitor, is no different than using cursors in non-distributed transactions. One has to pay attention when using holdable cursors, however. Connections can be used by different applications. Thus, once a transaction has been ended and committed, a subsequent transaction (running in a different application) could inherit existing holdable cursors. An application developer therefore has to be aware of that situation.

Disadvantages of Cursors
The following information may vary from database system to database system. Fetching a row from the cursor may result in a network round trip each time. This uses much more network bandwidth than would ordinarily be needed for the execution of a single SQL statement like DELETE. Repeated network round trips can severely impact the speed of the operation using the cursor. Some DBMSs try to reduce this impact by using block fetch. Block fetch implies that multiple rows are sent together from the server to the client. The client stores a whole block of rows in a local buffer and retrieves the rows from there until that buffer is exhausted.

Cursors allocate resources at the server, for instance locks, packages, processes, temporary storage, etc. For example, Microsoft SQL Server implements cursors by creating a temporary table and populating it with the query's result set. If a cursor is not properly closed (deallocated), the resources will not be freed until the SQL session (connection) itself is closed. This wasting of resources on the server can not only lead to performance degradations but also to failures.

Index
A database index is a data structure that improves the speed of operations on a table. Indexes can be created using one or more columns, providing the basis for both rapid random lookups and efficient ordered access to records. The disk space required to store the index is typically less than that required by the table (since indexes usually contain only the key fields according to which the table is to be arranged, and exclude all the other details in the table), which makes it possible to hold in memory the indexes of tables that would not themselves fit into memory. In a relational database an index is a copy of part of a table.

Some databases extend the power of indexing by allowing indexes to be created on functions or expressions. For example, an index could be created on upper(last_name), which would only store the uppercase versions of the last_name field in the index. Another option sometimes supported is the use of "filtered" indexes, where index entries are created only for those records that satisfy some conditional expression. A further aspect of flexibility is to permit indexing on user-defined functions, as well as expressions formed from an assortment of built-in functions. All of these indexing refinements are supported in Visual FoxPro, for example.[1]

Indexes may be defined as unique or non-unique. A unique index acts as a constraint on the table by preventing duplicate entries in the index and thus in the indexed columns.
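
The constraint behavior of a unique index, and an index on an expression, can be sketched as follows. This is an illustration under assumptions: it uses Python's standard sqlite3 module, and the people table, the email column, and the index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT, email TEXT)")

# A unique index doubles as a constraint on the indexed column.
conn.execute("CREATE UNIQUE INDEX idx_people_email ON people (email)")

# An index on an expression: only the uppercase form is stored in the index.
conn.execute("CREATE INDEX idx_people_upper_last ON people (upper(last_name))")

conn.execute("INSERT INTO people VALUES ('Ola', 'Hansen', 'ola@example.com')")
try:
    # Same email again: the unique index rejects the row.
    conn.execute("INSERT INTO people VALUES ('Ole', 'Hansen', 'ola@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
conn.close()
```

The second insert fails even though no explicit UNIQUE constraint was declared on the table; the index alone enforces it.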

Architecture
Index architectures can be classified as clustered or non-clustered. A non-clustered index normally contains a reference to a block that contains the row data for which the particular index item has been constructed. This block will hold several other rows depending on the row size. For each index lookup on a non-clustered index, the data block that houses the row sought must also be retrieved.

Clustering re-orders the data blocks in the same order as the index, hence it is an operation on the data storage blocks as well as on the index. The exact operation of database systems varies, but because the row data can be physically stored in only one order, only one clustered index may be created on a given database table. Clustered indexes can greatly increase access speed, but usually only where the data is accessed sequentially in the same or reverse order of the clustered index, or when a range of items is selected. Since the physical records are in this sort order on disk, the next row item in the sequence is immediately before or after the last one, and so fewer data block reads are required. The primary feature of a clustered index is therefore the ordering of the physical data rows in accordance with the index blocks that point to them.

Some databases separate the data and index blocks into separate files, while others intermix the two different types of data blocks within the same physical file(s). Databases that use the latter scheme may be said to store the actual data in the leaf node of the index, whereas in fact there is still a distinction between the index and the data block, and the data blocks can be traversed without using the index by way of a linked list that links the data blocks in order. In Microsoft SQL Server, the leaf node of the clustered index corresponds to the actual data, not simply a pointer to data that resides elsewhere, as is the case with a non-clustered index.[2] Each relation can have a single clustered index and many unclustered indexes.[3]

Indexes can be implemented using a variety of data structures. Popular index structures include balanced trees, B+ trees and hashes.[4]

Column order

The order in which columns are listed in the index definition is important. It is possible to retrieve a set of row identifiers using only the first indexed column. However, it is not possible or efficient (on most databases) to retrieve the set of row identifiers using only the second or greater indexed column. For example, imagine a phone book that is organized by city first, then by last name, and then by first name. If given the city, you can easily extract the list of all phone numbers for that city. However, in this phone book it would be very tedious to find all the phone numbers for a given last name. You would have to look within each city's section for the entries with that last name. Some databases can do this; others just won't use the index.
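
The phone book example can be tried out by asking a database how it plans each query. This is a rough sketch, assuming Python's standard sqlite3 module; the phone_book table and index name are hypothetical, and the exact wording of the plan text varies between SQLite versions and differs entirely on other database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phone_book (city TEXT, last_name TEXT, first_name TEXT, phone TEXT)"
)
# Composite index: city first, then last name, then first name.
conn.execute(
    "CREATE INDEX idx_phone_book ON phone_book (city, last_name, first_name)"
)

def plan(sql):
    """Return SQLite's query plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last field is the plan text

# Filtering on the leading column lets the planner use the index.
plan_by_city = plan("SELECT phone FROM phone_book WHERE city = 'Oslo'")

# Filtering only on the second column does not, on a table without statistics.
plan_by_last_name = plan("SELECT phone FROM phone_book WHERE last_name = 'Hansen'")

conn.close()
```

Printing the two plans shows an index search for the city query and a table scan for the last-name query, which matches the phone book analogy.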

Applications and limitations


Indexes are useful for many applications but come with some limitations. Consider the following SQL statement:

SELECT first_name FROM people WHERE last_name = 'Finkelstein';

To process this statement without an index the database software must look at the last_name column on every row in the table (this is known as a full table scan). With an index the database simply follows the B-tree data structure until the Finkelstein entry has been found; this is much less computationally expensive than a full table scan.

Consider this SQL statement:

SELECT email_address FROM customers WHERE email_address LIKE '%@yahoo.com';

This query would yield an email address for every customer whose email address ends with "@yahoo.com", but even if the email_address column has been indexed the database still must perform a full table scan. This is because the index is built with the assumption that words go from left to right. With a wildcard at the beginning of the search term the database software is unable to use the underlying B-tree data structure. This problem can be solved through the addition of another index created on reverse(email_address) and a SQL query like this:

SELECT email_address FROM customers WHERE reverse(email_address) LIKE reverse('%@yahoo.com');

This puts the wildcard at the rightmost part of the pattern (now moc.oohay@%), which the index on reverse(email_address) can satisfy.

Types of Index

Bitmap index
A bitmap index is a special kind of index that stores the bulk of its data as bitmaps and answers most queries by performing bitwise logical operations on these bitmaps. The most commonly used indexes, such as B+ trees, are most effective if the values they index do not repeat or repeat only a relatively small number of times. In contrast, the bitmap index is designed for cases where the values of a variable repeat very frequently. For example, the gender field in a customer database usually contains two distinct values, male or female. For such variables, the bitmap index can have a significant performance advantage over the commonly used trees.

Dense index
A dense index in databases is a file with pairs of keys and pointers for every record in the data file. Every key in this file is associated with a particular pointer to a record in the sorted data file. In clustered indexes with duplicate keys the dense index points to the first record with that key.[5]

Sparse index
A sparse index in databases is a file with pairs of keys and pointers for every block in the data file. Every key in this file is associated with a particular pointer to the block in the sorted data file. In clustered indexes with duplicate keys the sparse index points to the lowest search key in each block.

Stored procedure
In a database management system (DBMS), a stored procedure is an SQL program that is stored in the database and executed by calling it directly from the client or from a database trigger. When the SQL procedure is stored in the database, it does not have to be replicated in each client. This saves programming effort, especially when different client user interfaces and development systems are used. Triggers and stored procedures are built into DBMSs used in client/server environments.

A stored procedure is a subroutine available to applications accessing a relational database system. Stored procedures (sometimes called sprocs or SPs) are actually stored in the database. Typical uses for stored procedures include data validation (integrated into the database) or access control mechanisms. Furthermore, stored procedures are used to consolidate and centralize logic that was originally implemented in applications. Large or complex processing that might require the execution of several SQL statements is moved into stored procedures, and all applications call the procedures only.

Stored procedures are similar to user-defined functions (UDFs). The major difference is that UDFs can be used like any other expression within SQL statements, whereas stored procedures must be invoked using the CALL statement:

CALL procedure ()

Stored procedures can return result sets, i.e. the results of a SELECT statement. Such result sets can be processed using cursors by other stored procedures by associating a result set locator, or by applications. Stored procedures may also contain declared variables for processing data and cursors that allow them to loop through multiple rows in a table. The standard Structured Query Language provides IF, WHILE, LOOP, REPEAT, and CASE statements, and more. Stored procedures can receive variables, return results or modify variables and return them, depending on how and where the variable is declared.

Implementation
The exact implementation of stored procedures varies from one database system to another. Most major database vendors support them in some form. Depending on the database system, stored procedures can be implemented in a variety of programming languages, for example SQL, Java, C, or C++. Stored procedures written in non-SQL programming languages may or may not execute SQL statements themselves.

Advantages
A variety of advantages can be obtained through the use of stored procedures.

Pre-compilation of SQL statements

SQL statements implemented as stored procedures in some cases run faster, as they can be precompiled. Execution plans for compiled statements can be stored in the database, together with the procedure. This can remove the compilation overhead that is typically required in situations where software applications send inline SQL queries to a database. However, most database systems implement statement caches to avoid repetitive compilation of dynamic SQL statements. In addition, pre-compiled SQL statements, while avoiding some overhead, add to the complexity of creating an optimal execution plan because not all arguments of the SQL statement are supplied at compile time. Depending on the specific database implementation and configuration, mixed performance results will be seen from stored procedures versus generic queries or user-defined functions.

Execution on a database server


Stored procedures can run directly within the database engine. In a production system, this typically means that the procedures run entirely on a specialized database server, which has direct access to the data being accessed. The benefit here is that network communication costs between the individual statements can be avoided. This becomes particularly important for complex series of SQL statements.

However, unnecessary or excessive procedural statement execution in the database server (typically a single shared resource) may impair overall enterprise system performance: while application servers can often be scaled horizontally for increased processing capacity, the same is not generally or as easily accomplished for database servers. For this reason, one school of thought advocates using the database mainly for what it is best at, namely acting as a very efficient file cabinet, and restricting database-local procedural execution to very specific cases. In place of ubiquitous stored procedures, this school advocates object-oriented domain class models and reusable parameterized SQL generation in the application. Many long-standing DBAs disagree, and the debate on this point is likely to continue for years.

Simplification of data management

Stored procedures allow for business logic to be embedded as an API in the database, which can simplify data management and reduce the need to encode the logic elsewhere in client programs. This may result in a lesser likelihood of data becoming corrupted through the use of faulty client programs. Thus, the database system can ensure data integrity and consistency with the help of stored procedures. Some critics claim that databases should be for storing data only, and that business logic should only be implemented by writing a business layer of code, through which client applications should access the data. However, the use of stored procedures does not preclude the use of a business layer.

Security
Carefully written stored procedures may allow for fine grained security permissions to be applied to a database. For example, client programs might be restricted from accessing the database via any means except those that are provided by the available stored procedures.

Other uses
In some systems, stored procedures can be used to control transaction management; in others, stored procedures run inside a transaction such that transactions are effectively transparent to them. Stored procedures can also be invoked from a database trigger or a condition handler. For example, a stored procedure may be triggered by an insert on a specific table, or an update of a specific field in a table, and the code inside the stored procedure would be executed. Writing stored procedures as condition handlers also allows DBAs to track errors in the system in greater detail by using stored procedures to catch the errors and record some audit information in the database or an external resource like a file.

Database trigger
The CREATE TRIGGER statement creates a DML or DDL trigger. A trigger is a special kind of stored procedure that automatically executes when an event occurs in the database server. DML triggers execute when a user tries to modify data through a data manipulation language (DML) event. DML events are INSERT, UPDATE, or DELETE statements on a table or view. DDL triggers execute in response to a variety of data definition language (DDL) events. These are primarily CREATE, ALTER, and DROP statements. DML and DDL triggers can be created in the SQL Server 2005 Database Engine directly from Transact-SQL statements or from methods of assemblies that are created in the Microsoft .NET Framework common language runtime (CLR) and uploaded to an instance of SQL Server. SQL Server allows for creating multiple triggers for any specific statement.

A database trigger is procedural code that is automatically executed in response to certain events on a particular table in a database. Triggers can restrict access to specific data, perform logging, or audit data modifications.

There are two classes of triggers: "row triggers" and "statement triggers". With row triggers you can define an action for every row of a table, while statement triggers occur only once per INSERT, UPDATE, or DELETE statement. Triggers cannot be used to audit data retrieval via SELECT statements.

Each class can be of several types. There are "BEFORE triggers" and "AFTER triggers", which identify the time of execution of the trigger. There is also an "INSTEAD OF trigger", which is a trigger that will execute instead of the triggering statement.

There are typically three triggering events that cause triggers to 'fire':

INSERT event (as a new record is being inserted into the database).
UPDATE event (as a record is being changed).
DELETE event (as a record is being deleted).

Triggers are used to automate DML condition processing. The major features and effects of database triggers are that they:

do not accept parameters or arguments (but may store affected data in temporary tables)
cannot perform commit or rollback operations because they are part of the triggering SQL statement (except through autonomous transactions)
can cause mutating table errors, if they are poorly written.
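
A row trigger used for auditing can be sketched as follows. This is a minimal illustration, assuming Python's standard sqlite3 module and SQLite's trigger syntax (which differs from the SQL Server syntax shown later); the accounts and audit_log tables and the trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE audit_log (
        account_id INTEGER, old_balance INTEGER, new_balance INTEGER
    );

    -- Row trigger: fires once for every row changed by an UPDATE of balance.
    CREATE TRIGGER trg_accounts_audit
    AFTER UPDATE OF balance ON accounts
    FOR EACH ROW
    BEGIN
        INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
    END;

    INSERT INTO accounts VALUES (1, 100);
    INSERT INTO accounts VALUES (2, 50);
    """
)

# A single UPDATE statement that touches two rows.
conn.execute("UPDATE accounts SET balance = balance + 10")

audit_rows = conn.execute(
    "SELECT account_id, old_balance, new_balance FROM audit_log ORDER BY account_id"
).fetchall()
conn.close()
```

One statement changed two rows, so the row trigger wrote two audit entries; a statement trigger would have fired only once.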

Triggers in Microsoft SQL Server


Microsoft SQL Server supports triggers either after or instead of an insert, update, or delete operation. Microsoft SQL Server supports triggers on tables and views with the constraint that a view can be referenced only by an INSTEAD OF trigger. Microsoft SQL Server 2005 introduced support for Data Definition Language (DDL) triggers, which can fire in reaction to a very wide range of events, including:

Drop table
Create table
Alter table
Login events

Trigger on an INSERT, UPDATE, or DELETE statement to a table or view (DML Trigger)

CREATE TRIGGER [ schema_name .]trigger_name
ON {table | view}
[WITH <dml_trigger_option> [...n]]
{FOR | AFTER | INSTEAD OF}
{ [ INSERT ] [ , ] [ UPDATE ] [ , ] [ DELETE ] }
[WITH APPEND]
[NOT FOR REPLICATION]
AS {sql_statement [;] [...n] | EXTERNAL NAME <method specifier [ ; ] > }

Database transaction
A database transaction is a unit of interaction with a database management system or similar system that is treated in a coherent and reliable way independent of other transactions. In general, a database transaction must be atomic, meaning that it must be either entirely completed or aborted. Ideally, a database system will guarantee the properties of Atomicity, Consistency, Isolation and Durability (ACID) for each transaction. In practice, these properties are often relaxed somewhat to provide better performance. In some systems, transactions are also called LUWs for Logical Units of Work.

Purpose of transaction
In database products the ability to handle transactions allows the user to ensure that integrity of a database is maintained. A single transaction might require several queries, each reading and/or writing information in the database. When this happens it is usually important to be sure that the database is not left with only some of the queries carried out. For example, when doing a money transfer, if the money was debited from one account, it is important that it also be credited to the depositing account. Also, transactions should not interfere with each other. For more information about desirable transaction properties, see ACID.

A simple transaction is usually issued to the database system in a language like SQL in this form:

1. Begin the transaction
2. Execute several queries (although any updates to the database aren't actually visible to the outside world yet)
3. Commit the transaction (updates become visible if the transaction is successful)

If one of the queries fails, the database system may roll back either the entire transaction or just the failed query. This behaviour depends on the DBMS in use and how it is set up. The transaction can also be rolled back manually at any time before the commit.
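The begin, execute, commit sequence and a manual rollback can be sketched with the money-transfer example. This is a minimal illustration on SQLite through Python's sqlite3 module, not SQL Server; the accounts table, the account names, and the transfer helper are hypothetical. The CHECK constraint is an assumption added so that an overdraft makes the second query fail.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN / COMMIT / ROLLBACK statements below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")


def transfer(conn, source, target, amount):
    """Move amount between two accounts; both updates apply or neither does."""
    conn.execute("BEGIN")
    try:
        # Credit first, so a failing debit leaves a partial change to undo.
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, target),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, source),
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint: undo the credit as well.
        conn.execute("ROLLBACK")
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)       # succeeds and is committed
failed = transfer(conn, "alice", "bob", 500)  # debit fails, whole transfer undone
print(ok, failed, balances(conn))
```

In the second call the credit to the receiving account has already been executed when the debit fails, and the rollback removes it again, so the database is never left with only one of the two queries carried out.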

Transactional databases
Databases that support transactions are called transactional databases. Most modern relational database management systems fall into this category.

What is normalization?
Normalization is the process of designing a data model to efficiently store data in a database. The end result is that redundant data is eliminated, and each table stores only the data that relates to its own subject. For example, let's say we store City, State and ZipCode data for Customers in the same table as other customer data. With this approach, we keep repeating the City, State and ZipCode data for all Customers in the same area. Instead of storing the same data again and again, we could normalize the data and create a related table called City. The "City" table could then store City, State and ZipCode along with IDs that relate back to the Customer table, and we can eliminate those three columns from the Customer table and add the new ID column.

Normalization rules have been broken down into several forms. People often refer to the third normal form (3NF) when talking about database design. This is what most database designers try to achieve: in the conceptual stages, data is segmented and normalized as much as possible, but for practical purposes those segments are changed during the evolution of the data model. Various normal forms may be introduced for different parts of the data model to handle the unique situations you may face.

Whether you have heard about normalization or not, your database most likely follows some of the rules, unless all of your data is stored in one giant table. We will take a look at the first three normal forms and the rules for determining the different forms here.
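The Customer and City example from the start of this section can be sketched as follows. This is a minimal illustration on SQLite through Python's sqlite3 module; the column names CustomerID, CityID and Name, and the sample rows, are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE City (
        CityID  INTEGER PRIMARY KEY,
        City    TEXT NOT NULL,
        State   TEXT NOT NULL,
        ZipCode TEXT NOT NULL
    );
    -- Customer keeps only the ID that points at the City row.
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL,
        CityID     INTEGER NOT NULL REFERENCES City(CityID)
    );
    INSERT INTO City VALUES (1, 'Springfield', 'IL', '62701');
    INSERT INTO Customer VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', 1);
""")

# The city details are stored once; a join brings them back for each customer.
customer_rows = conn.execute("""
    SELECT c.Name, ci.City, ci.State, ci.ZipCode
    FROM Customer AS c
    JOIN City AS ci ON ci.CityID = c.CityID
    ORDER BY c.CustomerID
""").fetchall()
city_count = conn.execute("SELECT COUNT(*) FROM City").fetchone()[0]
print(customer_rows, city_count)
```

Three customers share a single City row, so a correction to the city data is made in one place.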

First Normal Form (1NF)

In first normal form, every entity in the database has a primary key attribute (or set of attributes). Each attribute must have only one value, and not a set of values. For a database to be in 1NF it must not have any repeating groups. A repeating group is data in which a single instance may have multiple values for a given attribute. For example, consider a recording studio that stores data about all its artists and their albums. Table 4.1 outlines an entity that stores some basic data about the artists signed to the recording studio.

Table 4.1. Artists and Albums: Repeating Groups of Data
Artist Name | Genre | Album Name | Album Release Date
The Awkward Stage | Rock | Home | 10/01/2006
Girth | Metal | On the Sea | 5/25/1997
Wasabi Peanuts | Adult Contemporary Rock | Spicy Legumes | 11/12/2005
The Bobby Jenkins Band | R&B | Running the Game; Live! | 7/27/1985; 10/30/1988
Juices of Brazil | Latin Jazz | Long Road; White | 1/01/2003; 6/10/2005

Notice that for the first artist, there is only one album and therefore one release date. However, for the fourth and fifth artists, there are two albums and two release dates. In practice, we cannot guarantee which release date belongs to which album. Sure, it'd be easy to assume that the first release date belongs to the first album name, but how can we be sure that album names and dates are always entered in order and not changed afterward? There are two ways to eliminate the problem of the repeating group. First, we could add new attributes to handle the additional albums, as in Table 4.2.

Table 4.2. Artists and Albums: Eliminate the Repeating Group, but at What Cost?
Artist Name | Genre | Album Name 1 | Release Date 1 | Album Name 2 | Release Date 2
The Awkward Stage | Rock | Home | 10/01/2006 | NULL | NULL
Girth | Metal | On the Sea | 5/25/1997 | NULL | NULL
Wasabi Peanuts | Adult Contemporary Rock | Spicy Legumes | 11/12/2005 | NULL | NULL
The Bobby Jenkins Band | R&B | Running the Game | 7/27/1985 | Live! | 10/30/1988
Juices of Brazil | Latin Jazz | Long Road | 1/01/2003 | White | 6/10/2005

We've solved the problem of the repeating group, and because no attribute contains more than one value, this table is in 1NF. However, we've introduced a much bigger problem: what if an artist has more than two albums? Do we keep adding two attributes for each album that any artist releases? In addition to the obvious problem of adding attributes to the entity, in the physical implementation we are wasting a great deal of space for each artist who has only one album. Also, querying the resultant table for album names would require searching every album name column, something that is very inefficient. If this is the wrong way, what's the right way? Take a look at Tables 4.3 and 4.4.

Table 4.3. The Artists
ArtistName | Genre
The Awkward Stage | Rock
Girth | Metal
Wasabi Peanuts | Adult Contemporary Rock
The Bobby Jenkins Band | R&B
Juices of Brazil | Latin Jazz

Table 4.4. The Albums
AlbumName | ReleaseDate | ArtistName
White | 6/10/2005 | Juices of Brazil
Home | 10/01/2006 | The Awkward Stage
On The Sea | 5/25/1997 | Girth
Spicy Legumes | 11/12/2005 | Wasabi Peanuts
Running the Game | 7/27/1985 | The Bobby Jenkins Band
Live! | 10/30/1988 | The Bobby Jenkins Band
Long Road | 1/01/2003 | Juices of Brazil

We've solved the problem by adding another entity that stores album names as well as the attribute that represents the relationship to the artist entity. Neither of these entities has a repeating group, each attribute in both entities holds a single value, and all of the previously mentioned query problems have been eliminated. This database is now in 1NF and ready to be deployed, right? Considering there are several other normal forms, we think you know the answer.
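The two-entity design of Tables 4.3 and 4.4 can be sketched as a pair of tables. This is a minimal illustration on SQLite through Python's sqlite3 module. Using AlbumName as the primary key of Albums and storing dates as ISO 8601 text are assumptions made for the example, and only a subset of the rows is loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Artists (
        ArtistName TEXT PRIMARY KEY,
        Genre      TEXT NOT NULL
    );
    -- One row per album: no repeating group, no spare "Album 2" columns.
    CREATE TABLE Albums (
        AlbumName   TEXT PRIMARY KEY,
        ReleaseDate TEXT NOT NULL,
        ArtistName  TEXT NOT NULL REFERENCES Artists(ArtistName)
    );
    INSERT INTO Artists VALUES
        ('The Bobby Jenkins Band', 'R&B'),
        ('Juices of Brazil', 'Latin Jazz');
    INSERT INTO Albums VALUES
        ('Running the Game', '1985-07-27', 'The Bobby Jenkins Band'),
        ('Live!',            '1988-10-30', 'The Bobby Jenkins Band'),
        ('Long Road',        '2003-01-01', 'Juices of Brazil'),
        ('White',            '2005-06-10', 'Juices of Brazil');
""")


def albums_for(conn, artist):
    """Return (AlbumName, ReleaseDate) pairs for one artist, oldest first."""
    return conn.execute(
        "SELECT AlbumName, ReleaseDate FROM Albums "
        "WHERE ArtistName = ? ORDER BY ReleaseDate",
        (artist,),
    ).fetchall()


bobby_albums = albums_for(conn, "The Bobby Jenkins Band")
print(bobby_albums)
```

Each album row carries its own release date, so there is no longer any doubt about which date belongs to which album, and a third album is just a third row.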

Second Normal Form (2NF)


Second normal form (2NF) specifies that, in addition to meeting 1NF, all non-key attributes have a functional dependency on the entire primary key. A functional dependency is a one-way relationship between the primary key attribute (or attributes) and all other non-key attributes in the same entity. Referring again to Table 4.3, if ArtistName is the primary key, then all other attributes in the entity must be identified by ArtistName. So we can say, "ArtistName determines Genre" for each instance in the entity. Notice that the relationship does not necessarily hold in the reverse direction; any genre may appear multiple times throughout this entity. Nonetheless, for any given artist, there is one genre. But what if an artist crosses over to another genre?

To answer that question, let's compare 1NF to 2NF. In 1NF, we have no repeating groups, and all attributes have a single value. However, in 1NF, if we have a composite primary key, it is possible that there are attributes that rely on only one of the primary key attributes, and that can lead to strange data manipulation anomalies. Take a look at Table 4.5, in which we have solved the multiple genre problem. But we have added new attributes, and that presents a new problem.

Table 4.5. Artists: 1NF Is Met, but with Problems
PKArtist Name | PKGenre | SignedDate | Agent | AgentPrimaryPhone | AgentSecondaryPhone
The Awkward Stage | Rock | 9/01/2005 | John Doe | (777)555-1234 | NULL
Girth | Metal | 10/31/1997 | Sally Sixpack | (777)555-6789 | (777)555-0000
Wasabi Peanuts | Adult Contemporary Rock | 1/01/2005 | John Doe | (777)555-1234 | NULL
The Bobby Jenkins Band | R&B | 3/15/1985 | Johnny Jenkins | (444)555-1111 | NULL
The Bobby Jenkins Band | Soul | 3/15/1985 | Johnny Jenkins | (444)555-1111 | NULL
Juices of Brazil | Latin Jazz | 6/01/2001 | Jane Doe | (777)555-4321 | (777)555-9999
Juices of Brazil | World Beat | 6/01/2001 | Jane Doe | (777)555-4321 | (777)555-9999

In this case, we have two attributes in the primary key: Artist Name and Genre. If the studio decides to sell the Juices of Brazil albums in multiple genres to increase the band's exposure, we end up with multiple instances of the group in the entity, because one of the primary key attributes has a different value. Also, we've started storing the name of each band's agent. The problem here is that the Agent attribute is an attribute of the artist but not of the genre. So the Agent attribute is only partially dependent on the entity's primary key. If we need to update the Agent attribute for a band that has multiple entries, we must update multiple records or else risk having two different agent names listed for the same band. This practice is inefficient and risky from a data integrity standpoint. It is this type of problem that 2NF eliminates.

Tables 4.6 and 4.7 show one possible solution to our problem. In this case, we can break the entity into two different entities. The original entity still contains only information about our artists; the new entity contains information about agents and the bands they represent. This technique removes the partial dependency of the Agent attribute from the original entity, and it lets us store more information that is specific to the agent.

Table 4.6. Artists: 2NF Version of This Entity
PKArtist Name | PKGenre | SignedDate
The Awkward Stage | Rock | 9/01/2005
Girth | Metal | 10/31/1997
Wasabi Peanuts | Adult Contemporary Rock | 1/01/2005
The Bobby Jenkins Band | R&B | 3/15/1985
The Bobby Jenkins Band | Soul | 3/15/1985
Juices of Brazil | Latin Jazz | 6/01/2001
Juices of Brazil | World Beat | 6/01/2001

Table 4.7. Agents: An Additional Entity to Solve the Problem
PKAgent Name | Artist Name | AgentPrimaryPhone | AgentSecondaryPhone
John Doe | The Awkward Stage | 555-1234 | NULL
Sally Sixpack | Girth | (777)555-6789 | (777)555-0000
Johnny Jenkins | The Bobby Jenkins Band | (444)555-1111 | NULL
Jane Doe | Juices of Brazil | 555-4321 | 555-9999
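The update risk that 2NF removes can be sketched by running the same change against both designs. This is a minimal illustration on SQLite through Python's sqlite3 module. The table names are hypothetical, only one band is loaded, and the Agents table is keyed by ArtistName here for brevity, which is a simplification of Table 4.7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 1NF only: Agent depends on ArtistName alone, not on the whole key.
    CREATE TABLE ArtistGenreAgent (
        ArtistName TEXT, Genre TEXT, Agent TEXT,
        PRIMARY KEY (ArtistName, Genre)
    );
    INSERT INTO ArtistGenreAgent VALUES
        ('Juices of Brazil', 'Latin Jazz', 'Jane Doe'),
        ('Juices of Brazil', 'World Beat', 'Jane Doe');

    -- 2NF: the agent is stored once per artist in its own table.
    CREATE TABLE ArtistGenres (
        ArtistName TEXT, Genre TEXT,
        PRIMARY KEY (ArtistName, Genre)
    );
    CREATE TABLE Agents (
        ArtistName TEXT PRIMARY KEY,
        AgentName  TEXT NOT NULL
    );
    INSERT INTO ArtistGenres VALUES
        ('Juices of Brazil', 'Latin Jazz'),
        ('Juices of Brazil', 'World Beat');
    INSERT INTO Agents VALUES ('Juices of Brazil', 'Jane Doe');
""")

# A careless update that touches only one row of the 1NF table:
# the band now has two different agents on record.
conn.execute(
    "UPDATE ArtistGenreAgent SET Agent = 'Sally Sixpack' "
    "WHERE ArtistName = 'Juices of Brazil' AND Genre = 'Latin Jazz'"
)
agents_1nf = conn.execute(
    "SELECT COUNT(DISTINCT Agent) FROM ArtistGenreAgent "
    "WHERE ArtistName = 'Juices of Brazil'"
).fetchone()[0]

# The same change in the 2NF design is a single-row update.
conn.execute(
    "UPDATE Agents SET AgentName = 'Sally Sixpack' "
    "WHERE ArtistName = 'Juices of Brazil'"
)
agents_2nf = conn.execute(
    "SELECT COUNT(*) FROM Agents WHERE ArtistName = 'Juices of Brazil'"
).fetchone()[0]
print(agents_1nf, agents_2nf)
```

In the first design the band ends up with two agent names; in the second there is only one row that could hold the agent, so the two copies cannot disagree.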

Third Normal Form (3NF)


Third normal form is the form that most well-designed databases meet. 3NF extends 2NF to include the elimination of transitive dependencies. Transitive dependencies are dependencies that arise from a non-key attribute relying on another non-key attribute that relies on the primary key. In other words, if there is an attribute that doesn't rely on the primary key but does rely on another attribute, then the first attribute has a transitive dependency. As with 2NF, to resolve this issue we might simply move the offending attribute to a new entity. Coincidentally, in solving the 2NF problem in Table 4.7, we also created a 3NF entity. In this particular case, AgentPrimaryPhone and AgentSecondaryPhone are not actually attributes of an artist; they are attributes of an agent. Storing them in the Artists entity created a transitive dependency, violating 3NF.

The differences between 2NF and 3NF are very subtle. 2NF deals with partial dependency, and 3NF with transitive dependency. Basically, a partial dependency means that attributes in the entity don't rely entirely on the primary key. Transitive dependency means that attributes in the entity don't rely on the primary key at all, but they do rely on another non-key attribute in the table. In either case, removing the offending attribute (and related attributes, in the 3NF case) to another entity solves the problem.

One of the simplest ways to remember the basics of 3NF is the popular phrase, "The key, the whole key, and nothing but the key." Because the normal forms are nested, the phrase means that 1NF is met because there is a primary key ("the key"), 2NF is met because all attributes in the table rely on all the attributes in the primary key ("the whole key"), and 3NF is met because none of the non-key attributes in the entity relies on any other non-key attributes ("nothing but the key"). Often, people append the phrase, "So help me Codd." Whatever helps you keep it straight.
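One way to see why a transitive dependency is risky is to query for agents whose phone number is recorded inconsistently. This is a minimal sketch on SQLite through Python's sqlite3 module. The trimmed-down Artists table, the conflicting_agents helper, and the replacement phone number are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- AgentPrimaryPhone depends on Agent, which depends on ArtistName:
    -- a transitive dependency, so the phone is repeated per artist.
    CREATE TABLE Artists (
        ArtistName        TEXT PRIMARY KEY,
        Agent             TEXT,
        AgentPrimaryPhone TEXT
    );
    INSERT INTO Artists VALUES
        ('The Awkward Stage', 'John Doe',      '(777)555-1234'),
        ('Wasabi Peanuts',    'John Doe',      '(777)555-1234'),
        ('Girth',             'Sally Sixpack', '(777)555-6789');
""")


def conflicting_agents(conn):
    """Return agents that are listed with more than one primary phone number."""
    return [row[0] for row in conn.execute("""
        SELECT Agent
        FROM Artists
        GROUP BY Agent
        HAVING COUNT(DISTINCT AgentPrimaryPhone) > 1
        ORDER BY Agent
    """)]


before = conflicting_agents(conn)

# Changing the phone on only one of John Doe's rows makes the copies disagree.
conn.execute(
    "UPDATE Artists SET AgentPrimaryPhone = '(777)555-0001' "
    "WHERE ArtistName = 'Wasabi Peanuts'"
)
after = conflicting_agents(conn)
print(before, after)
```

Moving the phone attributes to an agent entity leaves exactly one place to store each number, so a query like this has nothing to find.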

Boyce-Codd Normal Form (BCNF)


In certain situations, you may discover that an entity has more than one potential, or candidate, primary key (single or composite). Boyce-Codd normal form simply adds a requirement, on top of 3NF, that states that if any entity has more than one possible primary key, then the entity should be split into multiple entities to separate the primary key attributes. Stated more formally, BCNF requires that every determinant (an attribute or set of attributes on which some other attribute is fully functionally dependent) be a candidate key. For the vast majority of databases, solving the problem of 3NF actually solves this problem as well, because identifying the attribute that has a transitive dependency also tends to reveal the candidate key for the new entity being created. However, strictly speaking, the original 3NF definition did not specify this requirement, so BCNF was added to the list of normal forms to ensure that this was covered.

Fourth Normal Form (4NF) and Fifth Normal Form (5NF)


You've seen that 3NF generally solves most logical problems within databases. However, there are more-complicated relationships that often benefit from 4NF and 5NF. Consider Table 4.8, which describes an alternative, expanded version of the Agents entity.

Table 4.8. Agents: More Agent Information
PKAgent Name | PKAgency | PKArtist Name | AgentPrimaryPhone | AgentSecondaryPhone
John Doe | AAA Talent | The Awkward Stage | (777)555-1234 | NULL
Sally Sixpack | A Star Is Born Agency | Girth | (777)555-6789 | (777)555-0000
John Doe | AAA Talent | Wasabi Peanuts | (777)555-1234 | NULL
Johnny Jenkins | Johnny Jenkins Talent | The Bobby Jenkins Band | (444)555-1111 | NULL
Jane Doe | BBB Talent | Juices of Brazil | (777)555-4321 | (777)555-9999

Specifically, this entity stores information that creates redundancy, because there is a multivalued dependency within the primary key. A multivalued dependency is a relationship in which a primary key attribute, because of its relationship to another primary key attribute, creates multiple tuples within an entity. In this case, John Doe represents multiple artists. The primary key requires that the Agent Name, Agency, and Artist Name uniquely define an agent; if you don't know which agency an agent works for, or if an agent quits or moves to another agency, updating this table will require multiple updates to the primary key attributes. There's a secondary problem as well: we have no way of knowing whether the phone numbers are tied to the agent or tied to the agency. As with 2NF and 3NF, the solution here is to break Agency out into its own entity. 4NF specifies that there be no multivalued dependencies in an entity. Consider Tables 4.9 and 4.10, which show a 4NF version of these entities.

Table 4.9. Agent-Only Information
PKAgent Name | AgentPrimaryPhone | AgentSecondaryPhone | Artist Name
John Doe | (777)555-1234 | NULL | The Awkward Stage
Sally Sixpack | (777)555-6789 | (777)555-0000 | Girth
John Doe | (777)555-1234 | NULL | Wasabi Peanuts
Johnny Jenkins | (444)555-1111 | NULL | The Bobby Jenkins Band
Jane Doe | (777)555-4321 | (777)555-9999 | Juices of Brazil

Table 4.10. Agency Information
PKAgency | AgencyPrimaryPhone
AAA Talent | (777)555-1234
A Star Is Born Agency | (777)555-0000
AAA Talent | (777)555-4455
Johnny Jenkins Talent | (444)555-1100
BBB Talent | (777)555-9999

Now we have a pair of entities that have relevant, unique attributes that rely on their primary keys. We've also eliminated the confusion about the phone numbers.

Often, databases that are being normalized with the target of 3NF end up in 4NF, because this multivalued dependency problem is inherently obvious when you properly identify primary keys. However, the 3NF version of these entities would have worked, although it isn't necessarily the most efficient form. Now that we have a number of 3NF and 4NF entities, we must relate these entities to one another.

The final normal form that we discuss is fifth normal form (5NF). 5NF specifically deals with relationships among three or more entities, often referred to as ternary relationships. In 5NF, the entities that have specified relationships must be able to stand alone as individual entities without dependence on the other relationships. However, because the entities relate to one another, 5NF usually requires a physical entity that acts as a resolution entity to relate the other entities to one another. This additional entity has three or more foreign keys (based on the number of entities in the relationship) that specify how the entities relate to one another. This is how many-to-many relationships (as defined in Chapter 2) are actually implemented. Thus, if a many-to-many relationship is properly implemented, the database is in 5NF.

Frequently, you can avoid the complexity of 5NF by properly implementing foreign keys in the entities that relate to one another, so 4NF plus these keys generally avoids the physical implementation of a 5NF data model. However, because this alternative is not always realistic, 5NF is defined to help formalize this scenario.
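A resolution entity with three foreign keys can be sketched as follows. This is a minimal illustration on SQLite through Python's sqlite3 module; the Representation table name is hypothetical, and the PRAGMA line is specific to SQLite, which enforces foreign keys only when asked to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement
conn.executescript("""
    CREATE TABLE Agents   (AgentName  TEXT PRIMARY KEY);
    CREATE TABLE Agencies (AgencyName TEXT PRIMARY KEY);
    CREATE TABLE Artists  (ArtistName TEXT PRIMARY KEY);

    -- Resolution entity: one row per (agent, agency, artist) relationship.
    CREATE TABLE Representation (
        AgentName  TEXT NOT NULL REFERENCES Agents(AgentName),
        AgencyName TEXT NOT NULL REFERENCES Agencies(AgencyName),
        ArtistName TEXT NOT NULL REFERENCES Artists(ArtistName),
        PRIMARY KEY (AgentName, AgencyName, ArtistName)
    );

    INSERT INTO Agents   VALUES ('John Doe');
    INSERT INTO Agencies VALUES ('AAA Talent');
    INSERT INTO Artists  VALUES ('The Awkward Stage'), ('Wasabi Peanuts');
    INSERT INTO Representation VALUES
        ('John Doe', 'AAA Talent', 'The Awkward Stage'),
        ('John Doe', 'AAA Talent', 'Wasabi Peanuts');
""")

artists_of_john = [row[0] for row in conn.execute(
    "SELECT ArtistName FROM Representation "
    "WHERE AgentName = 'John Doe' ORDER BY ArtistName"
)]

# A row that points at an unknown artist is rejected by the foreign key.
try:
    conn.execute(
        "INSERT INTO Representation VALUES ('John Doe', 'AAA Talent', 'No Such Band')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(artists_of_john, rejected)
```

The three base tables stand alone, and only the resolution table knows which agent, agency, and artist belong together.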

What does normalization have to do with SQL Server?


To be honest, the answer here is nothing. SQL Server, like any other RDBMS, couldn't care less whether your data model follows any of the normal forms. You could create one table and store all of your data in one table or you can create a lot of little, unrelated tables to store your data. SQL Server will support whatever you decide to do. The only limiting factor you might face is the maximum number of columns SQL Server supports for a table. SQL Server does not force or enforce any rules that require you to create a database in any of the normal forms. You are able to mix and match any of the rules you need, but it is a good idea to try to normalize your database as much as possible when you are designing it. People tend to spend a lot of time up front creating a normalized data model, but as soon as new columns or tables need to be added, they forget about the initial effort that was devoted to creating a nice clean model. To assist in the design of your data model, you can use the DaVinci tools that are part of SQL Server Enterprise Manager.

Advantages of normalization
1. Smaller database: By eliminating duplicate data, you will be able to reduce the overall size of the database.
2. Better performance:
   a. Narrow tables: Having more fine-tuned tables allows your tables to have fewer columns and allows you to fit more records per data page.
   b. Fewer indexes per table mean faster maintenance tasks such as index rebuilds.
   c. Only join tables that you need.

Disadvantages of normalization
1. More tables to join: By spreading out your data into more tables, you increase the need to join tables.
2. Tables contain codes instead of real data: Repeated data is stored as codes rather than meaningful data. Therefore, there is always a need to go to the lookup table for the value.
3. Data model is difficult to query against: The data model is optimized for applications, not for ad hoc querying.

Summary
Your data model design is both an art and a science. Balance what works best to support the application that will use the database against the need to store data in an efficient and structured manner. For transaction-based systems, a highly normalized database design is the way to go; it keeps data consistent throughout the database and helps the database perform well. For reporting-based systems, a less normalized database is usually the best approach. You will eliminate the need to join a lot of tables, and queries will be faster. Plus, the database will be much more user friendly for ad hoc reporting needs.

Denormalization
Generally, most online transactional processing (OLTP) systems will perform well if they've been normalized to either 3NF or BCNF. However, certain conditions may require that data be intentionally duplicated or that unrelated attributes be combined into single entities to expedite certain operations. Additionally, online analytical processing (OLAP) systems, because of the way they are used, quite often require that data be denormalized to increase performance. Denormalization, as the term implies, is the process of reversing the steps taken to achieve a normal form. Often, it becomes necessary to violate certain normalization rules to satisfy the real-world requirements of specific queries. Let's look at some examples.

In data models that have a completely normalized structure, there tend to be a great many entities and relationships. To retrieve logical sets of data, you often need a great many joins to retrieve all the pertinent information about a given object. Logically this is not a problem, but in the physical implementation of a database, joins tend to incur overhead in query processing time. For every table that is joined, there is usually a cost to scan the indexes on that table and then retrieve the matching data from each object, combine the resulting data, and deliver it to the end user (for more on indexes and query optimization, see Chapter 10). When millions of rows are being scanned and tens or hundreds of rows are being returned, it is costly. In these situations, creating a denormalized entity may offer a performance benefit, at the cost of violating one of the normal forms. The trade-off is usually a matter of having redundant data, because you are storing an additional physical table that duplicates data being stored in other tables. To mitigate the storage effects of this technique, you can often store subsets of data in the duplicate table, clearing it out and repopulating it based on the queries you know are running against it.
Additionally, this means that you have additional physical objects to maintain if there are schema changes in the original tables. In this case, accurate documentation and a managed change control process are the only practices that can ensure that all the relevant denormalized objects stay in sync.

Denormalization also can help when you're working on reporting applications. In larger environments, it is often necessary to generate reports based on application data. Reporting queries often return large historical data sets, and when you join various types of data in a single report it incurs a lot of overhead on standard OLTP systems. Running these queries on exactly the same databases that the applications are trying to use can result in an overloaded system, creating blocking situations and causing end users to wait an unacceptable amount of time for the data. Additionally, it means storing large amounts of historical data in the OLTP system, something that may have other adverse effects, both internally to the database management system and to the physical server resources. Denormalizing the data in the database to a set of tables (or even to a different physical database) specifically used for reporting can alleviate the pressure on the primary OLTP system while ensuring that the reporting needs are being met. It allows you to customize the tables being used by the reporting system to combine the data sets, thereby satisfying the queries being run in the most efficient way possible. Again, this means incurring overhead to store data that is already being stored, but often the trade-off is worthwhile in terms of performance both on the OLTP system and the reporting system.

Now let's look at OLAP systems, which are used primarily for decision support and reporting. These types of systems are based on the concept of providing a cube of data, whereby the dimensions of the cube are based on fact tables provided by an OLTP system. These fact tables are derived from the OLTP versions of data being stored in the relational database. These tables are often denormalized versions, however, and they are optimized for the OLAP system to retrieve the data that eventually is loaded into the cube. Because OLAP is outside the scope of this book, it's enough for now to know that if you're working on a system in which OLAP will be used, you will probably go through the exercise of building fact tables that are, in some respects, denormalized versions of your normalized tables.

When identifying entities that should be denormalized, you should rely heavily on the actual queries that are being used to retrieve data from these entities. You should evaluate all the existing join conditions and search arguments, and you should look closely at the data retrieval needs of the end users. Only after performing adequate analysis on these queries will you be able to correctly identify the entities that need to be denormalized, as well as the attributes that will be combined into the new entities. You'll also want to be very aware of the overhead the system will incur when you denormalize these objects. Remember that you will have to store not only the rows of data but also (potentially) index data, and keep in mind that the size of the data being backed up will increase. Overall, denormalization could be considered the final step of the normalization process.
Some OLTP systems have denormalized entities to improve the performance of very specific queries, but more than likely you will be responsible for developing an additional data model outside the actual application, which may be used for reporting, or even OLAP. Either way, understanding the normal forms, denormalization, and their implications for data storage and manipulation will help you design an efficient, logical, and scalable data model.
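The idea of a denormalized reporting table that is cleared out and repopulated with a subset of the data can be sketched as follows. This is a minimal illustration on SQLite through Python's sqlite3 module. The AlbumReport table, the refresh_album_report helper, and the cutoff date are hypothetical, and dates are stored as ISO 8601 text so that they compare correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized source tables.
    CREATE TABLE Artists (ArtistName TEXT PRIMARY KEY, Genre TEXT NOT NULL);
    CREATE TABLE Albums (
        AlbumName   TEXT PRIMARY KEY,
        ReleaseDate TEXT NOT NULL,
        ArtistName  TEXT NOT NULL REFERENCES Artists(ArtistName)
    );
    -- Denormalized copy for reporting: Genre is repeated on every album row.
    CREATE TABLE AlbumReport (
        AlbumName TEXT, ReleaseDate TEXT, ArtistName TEXT, Genre TEXT
    );

    INSERT INTO Artists VALUES ('Girth', 'Metal'), ('Juices of Brazil', 'Latin Jazz');
    INSERT INTO Albums VALUES
        ('On the Sea', '1997-05-25', 'Girth'),
        ('Long Road',  '2003-01-01', 'Juices of Brazil'),
        ('White',      '2005-06-10', 'Juices of Brazil');
""")


def refresh_album_report(conn, since):
    """Clear the reporting table and reload albums released on or after `since`."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("DELETE FROM AlbumReport")
        conn.execute("""
            INSERT INTO AlbumReport
            SELECT al.AlbumName, al.ReleaseDate, al.ArtistName, ar.Genre
            FROM Albums AS al
            JOIN Artists AS ar ON ar.ArtistName = al.ArtistName
            WHERE al.ReleaseDate >= ?
        """, (since,))


refresh_album_report(conn, "2000-01-01")

# Reports read the flat table and need no join at query time.
report_rows = conn.execute(
    "SELECT AlbumName, Genre FROM AlbumReport ORDER BY ReleaseDate"
).fetchall()
print(report_rows)
```

The join cost is paid once, during the refresh, and the price is redundant storage plus one more object to keep in step with schema changes in the source tables.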
