
SQL Aggregate Functions

o An SQL aggregate function performs a calculation over the values of a single column across multiple rows of a table and returns a single value.
o Aggregate functions are used to summarize data.

Types of SQL Aggregation Function

1. COUNT FUNCTION

o The COUNT function counts the rows in a database table. It works on both numeric and non-numeric data types.
o COUNT(*) returns the number of all rows in the specified table, including duplicate rows and rows that contain NULLs. COUNT(expression) counts only the rows in which the expression is not NULL.

Syntax

COUNT(*)
or
COUNT( [ALL|DISTINCT] expression )

Sample table:

PRODUCT_MAST

PRODUCT   COMPANY   QTY   RATE   COST
Item1     Com1      2     10     20
Item2     Com2      3     25     75
Item3     Com1      2     30     60
Item4     Com3      5     10     50
Item5     Com2      2     20     40
Item6     Com1      3     25     75
Item7     Com1      5     30     150
Item8     Com1      3     10     30
Item9     Com2      2     25     50
Item10    Com3      4     30     120

Example: COUNT( )

SELECT COUNT(*)
FROM PRODUCT_MAST;

Output:

10

Example: COUNT with WHERE

SELECT COUNT(*)
FROM PRODUCT_MAST
WHERE RATE>=20;

Output:

7

Example: COUNT( ) with DISTINCT

SELECT COUNT(DISTINCT COMPANY)
FROM PRODUCT_MAST;

Output:

3

Example: COUNT( ) with GROUP BY

SELECT COMPANY, COUNT(*)
FROM PRODUCT_MAST
GROUP BY COMPANY;

Output:

Com1 5
Com2 3
Com3 2

Example: COUNT( ) with HAVING


SELECT COMPANY, COUNT(*)
FROM PRODUCT_MAST
GROUP BY COMPANY
HAVING COUNT(*)>2;

Output:

Com1 5
Com2 3
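
The COUNT examples above can be reproduced with a minimal Python sketch that loads the sample table into an in-memory SQLite database. The helper name `scalar` and the variable names are illustrative, not part of the original notes, and Item6's company is taken to be Com1, which is the reading consistent with the GROUP BY output.

```python
import sqlite3

# Rows of the PRODUCT_MAST sample table: (PRODUCT, COMPANY, QTY, RATE, COST).
# Assumption: Item6 belongs to Com1.
ROWS = [
    ("Item1", "Com1", 2, 10, 20), ("Item2", "Com2", 3, 25, 75),
    ("Item3", "Com1", 2, 30, 60), ("Item4", "Com3", 5, 10, 50),
    ("Item5", "Com2", 2, 20, 40), ("Item6", "Com1", 3, 25, 75),
    ("Item7", "Com1", 5, 30, 150), ("Item8", "Com1", 3, 10, 30),
    ("Item9", "Com2", 2, 25, 50), ("Item10", "Com3", 4, 30, 120),
]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE PRODUCT_MAST "
    "(PRODUCT TEXT, COMPANY TEXT, QTY INT, RATE INT, COST INT)"
)
conn.executemany("INSERT INTO PRODUCT_MAST VALUES (?, ?, ?, ?, ?)", ROWS)


def scalar(sql):
    """Run a query that yields one row with one column and return that value."""
    return conn.execute(sql).fetchone()[0]


total_rows = scalar("SELECT COUNT(*) FROM PRODUCT_MAST")
rate_20_plus = scalar("SELECT COUNT(*) FROM PRODUCT_MAST WHERE RATE >= 20")
companies = scalar("SELECT COUNT(DISTINCT COMPANY) FROM PRODUCT_MAST")

# ORDER BY is added only to make the row order deterministic.
per_company = conn.execute(
    "SELECT COMPANY, COUNT(*) FROM PRODUCT_MAST "
    "GROUP BY COMPANY ORDER BY COMPANY"
).fetchall()
busy_companies = conn.execute(
    "SELECT COMPANY, COUNT(*) FROM PRODUCT_MAST "
    "GROUP BY COMPANY HAVING COUNT(*) > 2 ORDER BY COMPANY"
).fetchall()

print(total_rows, rate_20_plus, companies)
print(per_company)
print(busy_companies)
```

WHERE filters individual rows before they are counted, while HAVING filters whole groups after the counts have been computed.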

2. SUM Function

The SUM function calculates the sum of the values in the selected column. It works on numeric fields only.

Syntax

SUM( expression )
or
SUM( [ALL|DISTINCT] expression )

Example: SUM( )

SELECT SUM(COST)
FROM PRODUCT_MAST;

Output:

670

Example: SUM( ) with WHERE

SELECT SUM(COST)
FROM PRODUCT_MAST
WHERE QTY>3;

Output:

320

Example: SUM() with GROUP BY

SELECT COMPANY, SUM(COST)
FROM PRODUCT_MAST
WHERE QTY>3
GROUP BY COMPANY;

Output:

Com1 150
Com3 170

Example: SUM() with HAVING

SELECT COMPANY, SUM(COST)
FROM PRODUCT_MAST
GROUP BY COMPANY
HAVING SUM(COST)>=170;

Output:

Com1 335
Com3 170
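
As a rough check of the SUM examples, the following Python sketch recomputes them with SQLite. Only the COMPANY, QTY and COST columns of the sample table are loaded, Item6 is assumed to belong to Com1, and the variable names are illustrative.

```python
import sqlite3

# (COMPANY, QTY, COST) for Item1..Item10 of PRODUCT_MAST, in table order.
ROWS = [
    ("Com1", 2, 20), ("Com2", 3, 75), ("Com1", 2, 60), ("Com3", 5, 50),
    ("Com2", 2, 40), ("Com1", 3, 75), ("Com1", 5, 150), ("Com1", 3, 30),
    ("Com2", 2, 50), ("Com3", 4, 120),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT_MAST (COMPANY TEXT, QTY INT, COST INT)")
conn.executemany("INSERT INTO PRODUCT_MAST VALUES (?, ?, ?)", ROWS)

total_cost = conn.execute(
    "SELECT SUM(COST) FROM PRODUCT_MAST"
).fetchone()[0]

big_qty_cost = conn.execute(
    "SELECT SUM(COST) FROM PRODUCT_MAST WHERE QTY > 3"
).fetchone()[0]

# WHERE removes rows first; the surviving rows are then summed per company.
big_qty_by_company = conn.execute(
    "SELECT COMPANY, SUM(COST) FROM PRODUCT_MAST "
    "WHERE QTY > 3 GROUP BY COMPANY ORDER BY COMPANY"
).fetchall()

# HAVING is applied to the per-company sums, after grouping.
high_cost_companies = conn.execute(
    "SELECT COMPANY, SUM(COST) FROM PRODUCT_MAST "
    "GROUP BY COMPANY HAVING SUM(COST) >= 170 ORDER BY COMPANY"
).fetchall()

print(total_cost, big_qty_cost)
print(big_qty_by_company)
print(high_cost_companies)
```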

3. AVG function

The AVG function calculates the average of the values in a numeric column. It ignores NULLs and returns the average of the non-NULL values only.

Syntax

AVG( expression )
or
AVG( [ALL|DISTINCT] expression )

Example:

SELECT AVG(COST)
FROM PRODUCT_MAST;

Output:

67.00
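
To illustrate how AVG treats NULLs, here is a small Python sketch against SQLite. The SCORES table and its values are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SCORES (VALUE INT)")  # hypothetical table
conn.executemany(
    "INSERT INTO SCORES VALUES (?)",
    [(10,), (20,), (None,), (30,)],  # None is stored as NULL
)

# AVG and COUNT(VALUE) skip the NULL row; COUNT(*) does not.
avg_value, row_count, value_count = conn.execute(
    "SELECT AVG(VALUE), COUNT(*), COUNT(VALUE) FROM SCORES"
).fetchone()

print(avg_value, row_count, value_count)
```

The average is computed as 60 / 3 rather than 60 / 4, because the NULL row contributes to neither the sum nor the divisor.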

4. MAX Function

MAX function is used to find the maximum value of a certain column. This function determines the largest value of
all selected values of a column.

Syntax

MAX( expression )
or
MAX( [ALL|DISTINCT] expression )

Example:

SELECT MAX(RATE)
FROM PRODUCT_MAST;

Output:

30

5. MIN Function

The MIN function finds the minimum value of a column, that is, the smallest of all selected values of that column.

Syntax

MIN( expression )
or
MIN( [ALL|DISTINCT] expression )

Example:

SELECT MIN(RATE)
FROM PRODUCT_MAST;

Output:

10

SQL CREATE VIEW Statement

In SQL, a view is a virtual table based on the result-set of an SQL statement.

A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables
in the database.

You can add SQL functions, WHERE, and JOIN statements to a view and present the data as if the data were
coming from one single table.

CREATE VIEW Syntax


CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;

Note: A view always shows up-to-date data! The database engine recreates the data, using the view's SQL
statement, every time a user queries a view.

SQL CREATE VIEW Examples

The following SQL creates a view that shows all customers from Brazil:

Example
CREATE VIEW [Brazil Customers] AS
SELECT CustomerName, ContactName
FROM Customers
WHERE Country = 'Brazil';
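
The note that a view always shows up-to-date data can be checked with a short Python sketch using SQLite. The customer rows are made up for the illustration, and the view name is written with double quotes, which is how SQLite accepts an identifier containing a space.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CustomerName TEXT, ContactName TEXT, Country TEXT)"
)
# Made-up sample rows, not the data set used by the original example.
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [("Alpha Ltda", "Ana", "Brazil"), ("Beta GmbH", "Bernd", "Germany")],
)

conn.execute(
    """
    CREATE VIEW "Brazil Customers" AS
    SELECT CustomerName, ContactName
    FROM Customers
    WHERE Country = 'Brazil'
    """
)


def view_rows():
    """Query the view; its SELECT is re-evaluated on every call."""
    return conn.execute(
        'SELECT CustomerName FROM "Brazil Customers" ORDER BY CustomerName'
    ).fetchall()


before = view_rows()
conn.execute("INSERT INTO Customers VALUES ('Gamma SA', 'Gil', 'Brazil')")
after = view_rows()  # the new Brazilian customer appears without touching the view

print(before)
print(after)
```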

SQL INDEXES

Indexes are special lookup tables that the database search engine can use to speed up data retrieval. Simply put, an index is a pointer to data in a table. An index in a database is very similar to the index at the back of a book.

For example, if you want to find all the pages in a book that discuss a certain topic, you first refer to the index, which lists all topics alphabetically, and are then referred to one or more specific page numbers.

An index helps to speed up SELECT queries and WHERE clauses, but it slows down data input with the UPDATE and INSERT statements. Indexes can be created or dropped with no effect on the data.

Creating an index involves the CREATE INDEX statement, which allows you to name the index, to specify the table and the column or columns to index, and to indicate whether the index is in ascending or descending order.

Indexes can also be unique, like the UNIQUE constraint, in that the index prevents duplicate entries in the column or combination of columns on which it is defined.

The CREATE INDEX Command

The basic syntax of CREATE INDEX is as follows.

CREATE INDEX index_name
ON table_name (column1, column2, ...);

Single-Column Indexes

A single-column index is created on only one table column. The basic syntax is as follows.

CREATE INDEX index_name
ON table_name (column_name);

Unique Indexes

Unique indexes are used not only for performance, but also for data integrity. A unique index does not allow any duplicate values to be inserted into the table. The basic syntax is as follows.

CREATE UNIQUE INDEX index_name
ON table_name (column_name);

Composite Indexes

A composite index is an index on two or more columns of a table. Its basic syntax is as follows.

CREATE INDEX index_name
ON table_name (column1, column2);

When deciding whether to create a single-column index or a composite index, consider the column(s) that you use most frequently as filter conditions in a query's WHERE clause.

If only one column is used, a single-column index should be the choice. If two or more columns are frequently used together as filters in the WHERE clause, a composite index is the better choice.

Implicit Indexes

Implicit indexes are indexes that are automatically created by the database server when an object is created. Indexes are automatically created for primary key constraints and unique constraints.
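
The index types described above can be tried out with a minimal Python sketch against SQLite. The table `products` and the index names are hypothetical; the query-plan text returned by `EXPLAIN QUERY PLAN` is specific to SQLite and its exact wording differs between versions, so the sketch only looks for the index name in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, company TEXT, rate INT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("Item1", "Com1", 10), ("Item2", "Com2", 25), ("Item3", "Com1", 30)],
)

# Single-column, unique and composite indexes (all names are illustrative).
conn.execute("CREATE INDEX idx_company ON products (company)")
conn.execute("CREATE UNIQUE INDEX idx_name ON products (name)")
conn.execute("CREATE INDEX idx_company_rate ON products (company, rate)")

# Ask SQLite how it would run a query that filters on the indexed column.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM products WHERE company = 'Com1'"
).fetchall()
plan_text = " ".join(row[-1] for row in plan)  # last column holds the description

# The unique index rejects a second row with the same name.
try:
    conn.execute("INSERT INTO products VALUES ('Item1', 'Com9', 99)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(plan_text)
print(duplicate_rejected)
```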

The DROP INDEX Command

An index can be dropped using the SQL DROP INDEX command. Take care when dropping an index, because query performance may either slow down or improve as a result.

The basic syntax is as follows; the exact form varies between database systems.

DROP INDEX index_name;

When should indexes be avoided?

Although indexes are intended to enhance a database's performance, there are times when they should be avoided. The following guidelines indicate when the use of an index should be reconsidered.

o Indexes should not be used on small tables.
o Indexes should not be used on tables that have frequent, large batch update or insert operations.
o Indexes should not be used on columns that contain a high number of NULL values.
o Columns that are frequently manipulated should not be indexed.

The SQL CREATE TABLE Statement

o The CREATE TABLE statement is used to create a new table in a database.

Syntax
CREATE TABLE table_name (
column1 datatype,
column2 datatype,
column3 datatype,
....
);

o The column parameters specify the names of the columns of the table.
o The datatype parameter specifies the type of data the column can hold (e.g. varchar, integer, date, etc.).

The SQL DROP TABLE Statement

o The DROP TABLE statement is used to drop an existing table in a database.

Syntax
DROP TABLE table_name;

o Note: Be careful before dropping a table. Dropping a table results in the loss of all information stored in the table!

SQL TRUNCATE TABLE

The TRUNCATE TABLE statement is used to delete the data inside a table, but not the table itself.

Syntax
TRUNCATE TABLE table_name;
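
The difference between emptying a table and dropping it can be sketched in Python with SQLite. Note the assumption: SQLite has no TRUNCATE TABLE statement, so the sketch uses DELETE FROM without a WHERE clause as a stand-in. The table `persons`, its columns and the helper `table_exists` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE TABLE: column names followed by their data types.
conn.execute(
    "CREATE TABLE persons ("
    "person_id INTEGER, last_name VARCHAR(255), city VARCHAR(255))"
)
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?)",
    [(1, "Hansen", "Sandnes"), (2, "Svendson", "Stavanger")],
)


def table_exists(name):
    """Look the table up in SQLite's schema catalog."""
    row = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row[0] == 1


# Stand-in for TRUNCATE TABLE: removes every row, keeps the table definition.
conn.execute("DELETE FROM persons")
rows_after_delete = conn.execute("SELECT COUNT(*) FROM persons").fetchone()[0]
exists_after_delete = table_exists("persons")

# DROP TABLE removes the data and the definition.
conn.execute("DROP TABLE persons")
exists_after_drop = table_exists("persons")

print(rows_after_delete, exists_after_delete, exists_after_drop)
```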

First Normal Form (1NF)

o A relation is in 1NF if every attribute holds only atomic values.
o An attribute of a table cannot hold multiple values; it must hold a single value in each row.
o First normal form disallows multi-valued attributes, composite attributes, and their combinations.

Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute EMP_PHONE.

EMPLOYEE table:

EMP_ID   EMP_NAME   EMP_PHONE                EMP_STATE
14       John       7272826385, 9064738238   UP
20       Harry      8574783832               Bihar
12       Sam        7390372389, 8589830302   Punjab

The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385 UP
14 John 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389 Punjab
12 Sam 8589830302 Punjab
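
The 1NF decomposition above amounts to emitting one row per phone number. A minimal Python sketch, assuming the multi-valued EMP_PHONE field is stored as a comma-separated string (the function name `to_1nf` is illustrative):

```python
# Unnormalized EMPLOYEE rows: EMP_PHONE holds a comma-separated list.
employees = [
    (14, "John", "7272826385, 9064738238", "UP"),
    (20, "Harry", "8574783832", "Bihar"),
    (12, "Sam", "7390372389, 8589830302", "Punjab"),
]


def to_1nf(rows):
    """Return one row per phone number so that EMP_PHONE is atomic."""
    result = []
    for emp_id, name, phones, state in rows:
        for phone in phones.split(","):
            result.append((emp_id, name, phone.strip(), state))
    return result


employees_1nf = to_1nf(employees)
for row in employees_1nf:
    print(row)
```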

Second Normal Form (2NF)

o For a relation to be in 2NF, it must first be in 1NF.
o In second normal form, every non-key attribute is fully functionally dependent on the primary key.

Example: Suppose a school stores data about its teachers and the subjects they teach. A teacher can teach more than one subject.

TEACHER table

TEACHER_ID   SUBJECT     TEACHER_AGE
25           Chemistry   30
25           Biology     30
47           English     35
83           Math        38
83           Computer    38

In the given table, the candidate key is {TEACHER_ID, SUBJECT}. The non-prime attribute TEACHER_AGE depends on TEACHER_ID alone, which is a proper subset of that candidate key. This partial dependency violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID TEACHER_AGE
25 30
47 35
83 38

TEACHER_SUBJECT table:

TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
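
The partial dependency and the decomposition can be checked on the sample rows with a small Python sketch. The helper `fd_holds` is illustrative; it only tests whether a dependency holds in the given data, which is weaker than knowing it holds for the schema in general.

```python
# TEACHER rows: (TEACHER_ID, SUBJECT, TEACHER_AGE).
teacher = [
    (25, "Chemistry", 30), (25, "Biology", 30), (47, "English", 35),
    (83, "Math", 38), (83, "Computer", 38),
]


def fd_holds(rows, lhs, rhs):
    """True if rows that agree on the lhs positions also agree on the rhs positions."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        value = tuple(row[i] for i in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# TEACHER_ID -> TEACHER_AGE holds, so TEACHER_AGE depends on part of the key.
age_depends_on_id = fd_holds(teacher, lhs=[0], rhs=[2])
# TEACHER_ID -> SUBJECT does not hold, so TEACHER_ID alone is not a key.
subject_depends_on_id = fd_holds(teacher, lhs=[0], rhs=[1])

# Decompose into TEACHER_DETAIL and TEACHER_SUBJECT.
teacher_detail = sorted({(tid, age) for tid, _, age in teacher})
teacher_subject = [(tid, subject) for tid, subject, _ in teacher]

# Joining the two tables on TEACHER_ID gives back the original rows.
ages = dict(teacher_detail)
rejoined = [(tid, subject, ages[tid]) for tid, subject in teacher_subject]

print(age_depends_on_id, subject_depends_on_id)
print(teacher_detail)
print(rejoined == teacher)
```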

Third Normal Form (3NF)

o A relation is in 3NF if it is in 2NF and no non-prime attribute is transitively dependent on a key.
o 3NF is used to reduce data duplication and to preserve data integrity.
o A relation in 2NF whose non-prime attributes have no transitive dependency is in third normal form.

A relation is in third normal form if it satisfies at least one of the following conditions for every non-trivial functional dependency X → Y.

1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Example:

EMPLOYEE_DETAIL table:

EMP_ID   EMP_NAME    EMP_ZIP   EMP_STATE   EMP_CITY
222      Harry       201010    UP          Noida
333      Stephan     02228     US          Boston
444      Lan         60007     US          Chicago
555      Katharine   06389     UK          Norwich
666      John        462007    MP          Bhopal

Super keys in the table above:

{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.

Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on the super key (EMP_ID), which violates the rule of third normal form.

That's why we need to move EMP_CITY and EMP_STATE to a new <EMPLOYEE_ZIP> table, with EMP_ZIP as its primary key.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_ZIP
222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007

EMPLOYEE_ZIP table:

EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
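
The two 3NF conditions (X is a super key, or Y is prime) can be turned into a rough checker based on attribute closure. This is a sketch under stated assumptions: the helper names `fd`, `closure` and `violations_3nf` are illustrative, and the dependency EMP_ID → {EMP_NAME, EMP_ZIP} is assumed from the fact that EMP_ID is the candidate key.

```python
def fd(lhs, rhs):
    """Build a functional dependency from two space-separated attribute lists."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_3nf(all_attrs, fds, candidate_keys):
    """Return FDs X -> Y where X is not a super key and Y has a non-prime attribute."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        nontrivial = rhs - lhs
        is_superkey = closure(lhs, fds) >= all_attrs
        if nontrivial and not is_superkey and not nontrivial <= prime:
            bad.append((lhs, rhs))
    return bad


# EMPLOYEE_DETAIL before decomposition.
attrs = {"EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY"}
fds = [fd("EMP_ID", "EMP_NAME EMP_ZIP"), fd("EMP_ZIP", "EMP_STATE EMP_CITY")]
before = violations_3nf(attrs, fds, [{"EMP_ID"}])

# EMPLOYEE and EMPLOYEE_ZIP after decomposition, each with its own FD and key.
employee = violations_3nf(
    {"EMP_ID", "EMP_NAME", "EMP_ZIP"}, [fds[0]], [{"EMP_ID"}]
)
employee_zip = violations_3nf(
    {"EMP_ZIP", "EMP_STATE", "EMP_CITY"}, [fds[1]], [{"EMP_ZIP"}]
)

print(before)
print(employee, employee_zip)
```

Before the decomposition the checker reports EMP_ZIP → {EMP_STATE, EMP_CITY}; afterwards neither table has a violating dependency.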

Boyce Codd normal form (BCNF)

o BCNF is an advanced, stricter version of 3NF.
o A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key of the table.
o Equivalently, the table must be in 3NF, and the left-hand side of every functional dependency must be a super key.

Example: Let's assume there is a company where employees work in more than one department.

EMPLOYEE table:

EMP_ID   EMP_COUNTRY   EMP_DEPT     DEPT_TYPE   EMP_DEPT_NO
264      India         Designing    D394        283
264      India         Testing      D394        300
364      UK            Stores       D283        232
364      UK            Developing   D283        549

In the above table Functional dependencies are as follows:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}

The table is not in BCNF because neither EMP_ID nor EMP_DEPT alone is a super key, yet each is the left-hand side of a functional dependency.

To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID EMP_COUNTRY
264 India
364 UK

EMP_DEPT table:

EMP_DEPT DEPT_TYPE EMP_DEPT_NO
Designing D394 283
Testing D394 300
Stores D283 232
Developing D283 549

EMP_DEPT_MAPPING table:

EMP_ID EMP_DEPT
264 Designing
264 Testing
364 Stores
364 Developing

Functional dependencies:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:

For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}

Now the tables are in BCNF, because the left-hand side of each functional dependency is a key of the table it belongs to.
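
The BCNF condition can be sketched as a check that the closure of every left-hand side covers the whole table. The helpers `fd`, `closure` and `bcnf_violations` are illustrative names, and the sketch only tests the dependencies it is given; for each decomposed table the relevant dependencies are passed in by hand rather than derived.

```python
def fd(lhs, rhs):
    """Build a functional dependency from two space-separated attribute lists."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the non-trivial FDs whose left-hand side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= all_attrs
    ]


fd_country = fd("EMP_ID", "EMP_COUNTRY")
fd_dept = fd("EMP_DEPT", "DEPT_TYPE EMP_DEPT_NO")

# Original EMPLOYEE table: both dependencies violate BCNF.
employee_attrs = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
original = bcnf_violations(employee_attrs, [fd_country, fd_dept])

# The three tables of the decomposition.
emp_country = bcnf_violations({"EMP_ID", "EMP_COUNTRY"}, [fd_country])
emp_dept = bcnf_violations({"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}, [fd_dept])
emp_dept_mapping = bcnf_violations({"EMP_ID", "EMP_DEPT"}, [])

print(len(original), emp_country, emp_dept, emp_dept_mapping)
```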

Fourth normal form (4NF)

o A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.
o For a dependency A → B, if multiple values of B exist for a single value of A, independently of the other attributes, then the dependency is a multi-valued dependency.

Example

STUDENT

STU_ID   COURSE      HOBBY
21       Computer    Dancing
21       Math        Singing
34       Chemistry   Dancing
74       Biology     Cricket
59       Physics     Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities: there is no relationship between a student's courses and that student's hobbies.

In the STUDENT relation, the student with STU_ID 21 takes two courses, Computer and Math, and has two hobbies, Dancing and Singing. So there are multi-valued dependencies on STU_ID, which lead to unnecessary repetition of data.

So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE

STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics

STUDENT_HOBBY

STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
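
One way to see why the single STUDENT table is awkward is to join the two 4NF tables back together. The Python sketch below (variable names are illustrative) shows the combinations a single table would have to store if courses and hobbies really are independent of each other.

```python
# STUDENT rows: (STU_ID, COURSE, HOBBY).
student = [
    (21, "Computer", "Dancing"), (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"), (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
]

# Project onto the two independent facts about a student.
student_course = sorted({(sid, course) for sid, course, _ in student})
student_hobby = sorted({(sid, hobby) for sid, _, hobby in student})

# Natural join on STU_ID: every course of a student paired with every hobby.
rejoined = sorted(
    (sid, course, hobby)
    for sid, course in student_course
    for sid2, hobby in student_hobby
    if sid == sid2
)

# Combinations implied by independence that the single table does not list.
missing = sorted(set(rejoined) - set(student))

print(len(student), len(rejoined))
print(missing)
```

For student 21 the join produces four rows (two courses times two hobbies), which is the repetition the decomposition avoids; the two tables store each course and each hobby only once.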

FAQ: What is the distinction between BCNF and 3NF? Is there a reason to prefer one over the other?

ANSWER: 3NF is the third normal form used in relational database normalization. According to Codd's definition, a table is in 3NF if and only if it is in second normal form (2NF) and every attribute that does not belong to a candidate key depends directly (non-transitively) on every candidate key of the table.

BCNF (also known as 3.5NF) is another normal form used in relational database normalization. It was introduced to capture some of the anomalies that 3NF does not address. A table is in BCNF if and only if, for every non-trivial dependency of the form A → B, A is a super key.

BCNF acts differently from 3NF only when the relation has multiple overlapping candidate keys.

The reason is that the functional dependency X -> Y is of course true if Y is a subset of X. So any table that has only one candidate key and is in 3NF is already in BCNF, because no column (either key or non-key) is functionally dependent on anything besides that key.

Example: Assume your pizza has exactly three topping types, and you must choose:

o one type of cheese
o one type of meat
o one type of vegetable

So we order two pizzas and choose the following toppings:

Pizza    Topping     Topping Type
-------- ----------- -------------
1        mozzarella  cheese
1        pepperoni   meat
1        olives      vegetable
2        mozzarella  meat
2        sausage     cheese
2        peppers     vegetable

Wait a second, mozzarella can't be both a cheese and a meat! And sausage isn't a cheese!

Because each pizza must have exactly one of each topping type, we know that (Pizza, Topping Type) is a candidate key. We also know intuitively that a given topping cannot belong to different types simultaneously. So (Pizza, Topping) must be unique and therefore is also a candidate key. So we have two overlapping candidate keys.

We need to prevent these sorts of mistakes, to make mozzarella always be cheese. We should use a separate table for this, so we write down that fact in only one place.

Pizza    Topping
-------- ----------
1        mozzarella
1        pepperoni
1        olives
2        mozzarella
2        sausage
2        peppers

Topping     Topping Type
----------- -------------
mozzarella  cheese
pepperoni   meat
olives      vegetable
sausage     meat
peppers     vegetable

I showed an anomaly where we marked mozzarella as the wrong topping type. We know this is wrong, but the rule that makes it wrong is a dependency Topping -> Topping Type, which is not a valid dependency for BCNF for this table. It's a dependency on something other than a whole candidate key.

So to solve this, we take Topping Type out of the Pizzas table and make it a non-key attribute in a Toppings table.

Relational Calculus

o Relational calculus is a non-procedural query language. In a non-procedural query language, the user is not concerned with the details of how to obtain the end results.
o Relational calculus tells what to do but never explains how to do it.

Types of Relational calculus:

1. Tuple Relational Calculus (TRC)

o Tuple relational calculus selects tuples from a relation. In TRC, the filtering variable ranges over the tuples of a relation.
o The result can contain one or more tuples.

Notation:

{T | P (T)} or {T | Condition (T)}

Where

T is the resulting tuple variable

P(T) is the condition used to fetch T.

For example:

{ T.name | Author(T) AND T.article = 'database' }

OUTPUT: This query selects tuples from the AUTHOR relation. It returns the 'name' of every author who has written an article on 'database'.

TRC (tuple relational calculus) can be quantified. In TRC, we can use the existential (∃) and universal (∀) quantifiers.

For example:

{ R | ∃T ∈ Authors (T.article = 'database' AND R.name = T.name) }

Output: This query will yield the same result as the previous one.
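
Both TRC expressions read naturally as set comprehensions. The Python sketch below is only an analogy, with a made-up Author relation: the first comprehension mirrors the plain form, the second mirrors the existentially quantified form using `any`.

```python
from collections import namedtuple

# Hypothetical Author relation; names and articles are made up.
Author = namedtuple("Author", ["name", "article"])
authors = [
    Author("Asha", "database"),
    Author("Ben", "networks"),
    Author("Chen", "database"),
]

# { T.name | Author(T) AND T.article = 'database' }
names_direct = {t.name for t in authors if t.article == "database"}

# { R | exists T in Authors (T.article = 'database' AND R.name = T.name) }
candidate_names = {t.name for t in authors}
names_exists = {
    r
    for r in candidate_names
    if any(t.article == "database" and r == t.name for t in authors)
}

print(sorted(names_direct))
print(names_direct == names_exists)
```

The comprehension states which tuples belong to the result without prescribing an evaluation order, which is the declarative flavour the calculus is meant to capture.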

2. Domain Relational Calculus (DRC)

o The second form of relational calculus is known as domain relational calculus. In domain relational calculus, the filtering variables range over the domains of attributes.
o Domain relational calculus uses the same operators as tuple calculus. It uses the logical connectives ∧ (and), ∨ (or) and ¬ (not).
o It uses the existential (∃) and universal (∀) quantifiers to bind variables.

Notation:

{ a1, a2, a3, ..., an | P (a1, a2, a3, ... ,an)}

Where

a1, a2, ..., an are attributes

P stands for a formula built from those attributes

For example:

{< article, page, subject > | < article, page, subject > ∈ javatpoint ∧ subject = 'database'}

Output: This query yields the article, page, and subject from the relation javatpoint where the subject is 'database'.

GATE | GATE CS 2013 | Question 65


Consider the following relational schema.
Students(rollno: integer, sname: string)
Courses(courseno: integer, cname: string)
Registration(rollno: integer, courseno: integer, percent: real)
Which of the following queries are equivalent to this query in English?
"Find the distinct names of all students who score
more than 90% in the course numbered 107"
(A) I, II, III and IV
(B) I, II and III only
(C) I, II and IV only
(D) II, III and IV only

Answer: (A)

Explanation:

Query I:

This is an SQL query. It first performs a cross product of Students and Registration. The WHERE clause then keeps only those rows of the cross product in which the student is registered for course number 107 and the percentage is greater than 90. SELECT DISTINCT then returns the distinct names of those students.

Query II:

This is a relational algebra expression. It first performs a NATURAL JOIN of Students and Registration (a NATURAL JOIN implicitly joins on the common attribute, which here is rollno). The select operation (sigma) then keeps only those rows in which the student is registered for courseno 107 and the percentage is greater than 90. Finally, the projection operation (pi) projects only the distinct student names.

Note: The projection operation (pi) always gives a distinct result.

Query III:

This is a Tuple Relational Calculus (TRC) expression. TRC is not a procedural language (i.e. it only tells "what to do", not "how to do it"); it is a declarative mathematical expression. Here T is a tuple variable.

From left to right, it can be read like this: "The set of tuples T for which there exists a tuple S in relation Students and a tuple R in relation Registration such that S.rollno = R.rollno AND R.courseno = 107 AND R.percent > 90 AND T.sname = S.sname". The schema of the result is (sname), i.e. each tuple T contains only a student name, because only T.sname is defined in the expression.

Because TRC is a mathematical expression over sets, it gives only distinct results.

Query IV:

This is a Domain Relational Calculus (DRC) expression. It is also non-procedural. Here SN is a domain variable. From left to right, it can be read like this: "The set of values of the domain variable SN for which there exist a domain variable SR and a domain variable RP such that <SR, SN> is in relation Students and <SR, 107, RP> is in relation Registration AND RP > 90".

Above, SN represents the sname attribute of the Students relation, SR represents the rollno attribute of the Students relation, and RP represents the percent attribute of the Registration relation. The schema of the result set is (SN), i.e. only the student name.

Because DRC is a mathematical expression over sets, it gives only distinct results.

GATE | GATE CS 2013 | Question 54 on 1 NF and 2NF


Relation R has eight attributes ABCDEFGH. Fields of R contain only atomic values. F = {CH -> G, A -> BC, B ->
CFH, E -> A, F -> EG} is a set of functional dependencies (FDs) so that F+ is exactly the set of FDs that hold for R.
How many candidate keys does the relation R have?
(A) 3
(B) 4
(C) 5
(D) 6

Answer: (B)

Explanation: A+ is ABCEFGH, which is all attributes except D.
B+ is also ABCEFGH, which is all attributes except D.
E+ is also ABCEFGH, which is all attributes except D.
F+ is also ABCEFGH, which is all attributes except D.
D does not appear on the right-hand side of any FD, so every candidate key must contain D. So there are 4 candidate keys in total: AD, BD, ED and FD.
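
The closures and candidate keys can be double-checked by brute force. The Python sketch below (function names are illustrative) computes attribute closures and then enumerates attribute sets from smallest to largest, keeping those whose closure is all of ABCDEFGH and that contain no smaller key.

```python
from itertools import combinations

# F = {CH -> G, A -> BC, B -> CFH, E -> A, F -> EG}
FDS = [("CH", "G"), ("A", "BC"), ("B", "CFH"), ("E", "A"), ("F", "EG")]
ATTRS = "ABCDEFGH"


def closure(attrs, fds=FDS):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs=ATTRS, fds=FDS):
    """Enumerate minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return sorted("".join(sorted(key)) for key in keys)


print("".join(sorted(closure("A"))))  # closure of A, written as a string
print(candidate_keys())               # keys are printed with letters sorted
```

The keys come out with their letters in alphabetical order, so ED and FD appear as DE and DF.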

GATE | GATE CS 2013 | Question 55

Consider the FDs given in the above question. The relation R is
(A) in 1NF, but not in 2NF.
(B) in 2NF, but not in 3NF.
(C) in 3NF, but not in BCNF.
(D) in BCNF

Answer: (A)

Explanation: The relation is not in second normal form because non-prime attributes depend on proper subsets of candidate keys.
The candidate keys are AD, BD, ED and FD, so the prime attributes are A, B, D, E and F, and the non-prime attributes are C, G and H. In each of the following FDs, a non-prime attribute depends on only part of a candidate key.
A -> BC
B -> CFH
F -> EG
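
The 2NF violation can also be confirmed mechanically. The sketch below, with illustrative names, takes the four candidate keys as given and lists which non-prime attributes each single key attribute determines on its own; since every key here has exactly two attributes, its proper non-empty subsets are just its individual attributes.

```python
# F = {CH -> G, A -> BC, B -> CFH, E -> A, F -> EG}
FDS = [("CH", "G"), ("A", "BC"), ("B", "CFH"), ("E", "A"), ("F", "EG")]
KEYS = ["AD", "BD", "ED", "FD"]  # candidate keys from the previous question


def closure(attrs, fds=FDS):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


prime = set("".join(KEYS))
non_prime = set("ABCDEFGH") - prime


def partial_dependencies():
    """Map each single key attribute to the non-prime attributes it determines."""
    found = {}
    for key in KEYS:
        for attr in key:  # proper subsets of a two-attribute key
            determined = closure(attr) & non_prime
            if determined:
                found[attr] = "".join(sorted(determined))
    return found


partial = partial_dependencies()
in_2nf = not partial  # 2NF requires that no partial dependency exists

print(sorted(non_prime))
print(partial)
print(in_2nf)
```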
