
Introduction to Analytical Functions in SQL Server

July 24th, 2013 amit


Analytic functions compute an aggregate value based on a group of rows. They differ from aggregate
functions in that they can return multiple rows for each group. Let us look at analytical functions in SQL
Server.
An analytical clause has four parts:
the analytical function, for example AVG, LEAD, PERCENT_RANK
the partitioning clause, for example PARTITION BY job or PARTITION BY dept, job
the order by clause, for example ORDER BY job (Oracle also accepts NULLS LAST here; SQL Server does not)
the windowing clause, for example RANGE UNBOUNDED PRECEDING or ROWS BETWEEN CURRENT ROW AND
UNBOUNDED FOLLOWING
Note: when there is neither an ORDER BY nor a windowing clause, the window is equal to the partition.
When there is an ORDER BY but no windowing clause, the default window is RANGE BETWEEN UNBOUNDED
PRECEDING AND CURRENT ROW.
Aggregate functions perform a calculation on a set of values and return a single value. With the
exception of COUNT(*), aggregate functions ignore null values. Aggregate functions are often used with
the GROUP BY clause of the SELECT statement.
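Aggregate functions in T-SQL
AVG, BINARY_CHECKSUM, CHECKSUM, CHECKSUM_AGG, COUNT, COUNT_BIG, GROUPING,
MAX, MIN, STDEV, STDEVP, SUM, VAR, VARP
(BINARY_CHECKSUM and CHECKSUM are computed over the columns of a single row rather than across a
set of rows.)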

Example for aggregate function


SELECT MAX(st1.field3) FROM table1 AS st1
Example for Analytical function
SELECT ename, job, sal, hiredate,
       FIRST_VALUE(sal) OVER (PARTITION BY job
                              ORDER BY hiredate
                              RANGE BETWEEN CURRENT ROW
                                        AND UNBOUNDED FOLLOWING) AS first_sal_range,
       FIRST_VALUE(sal) OVER (PARTITION BY job
                              ORDER BY hiredate
                              ROWS BETWEEN CURRENT ROW
                                       AND 2 FOLLOWING) AS first_sal_rows
FROM emp
WHERE sal < 2500
ORDER BY job
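
The difference between an aggregate and an analytical function is easiest to see on a handful of rows. The following is only a sketch: it runs from Python against an in-memory SQLite database (assuming SQLite 3.25 or later, which supports the OVER clause) rather than SQL Server, and the emp rows are made-up sample data.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, job TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    ("SMITH", "CLERK", 800), ("ADAMS", "CLERK", 1100),
    ("ALLEN", "SALESMAN", 1600), ("WARD", "SALESMAN", 1250),
])

# Aggregate with GROUP BY: one row per job.
grouped = conn.execute(
    "SELECT job, MAX(sal) FROM emp GROUP BY job ORDER BY job").fetchall()

# Same aggregate used analytically: every employee row is kept,
# and the per-job maximum is repeated alongside it.
windowed = conn.execute(
    "SELECT ename, job, sal, MAX(sal) OVER (PARTITION BY job) AS job_max "
    "FROM emp ORDER BY job, ename").fetchall()

print(grouped)   # 2 rows
print(windowed)  # 4 rows
```

The GROUP BY query collapses the four employees into two rows, while the OVER query returns all four rows with the job maximum attached to each.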

SQL Server Analytical Functions


Analytical functions were introduced in SQL Server 2005.
Analytical Functions available in SQL Server 2005-2008
RANK: Assigns a number to each row starting with 1. Rows that have duplicate values receive the
same rank, and a gap appears in the sequence after each group of duplicates.
DENSE_RANK: The same as RANK(), except that it returns ranks without gaps.
ROW_NUMBER: Returns the sequential row number within the result set, starting at 1 for the first
row in each partition. For rows that have duplicate values, numbers are arbitrarily assigned.
NTILE: Divides an ordered partition into a specified number of groups. Each group is assigned
a number. If the number of rows isn't divisible by the number of groups, the first few groups
will each have one more row than the later groups. Otherwise, each group will have the same
number of rows.
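
The four ranking functions can be compared side by side on one tied value. This is a sketch under assumptions: it uses SQLite 3.25+ from Python instead of SQL Server, and the results table and its scores are invented for illustration. ROW_NUMBER and NTILE are ordered by score and then id so that the tie is broken deterministically.

```python
import sqlite3

# Hypothetical table with one duplicate score (20 appears twice).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (id INTEGER, score INTEGER)")
conn.executemany("INSERT INTO results VALUES (?, ?)",
                 [(1, 10), (2, 20), (3, 20), (4, 30), (5, 40)])

rows = conn.execute("""
    SELECT id, score,
           RANK()       OVER (ORDER BY score)     AS rnk,
           DENSE_RANK() OVER (ORDER BY score)     AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY score, id) AS row_num,
           NTILE(2)     OVER (ORDER BY score, id) AS tile
    FROM results
    ORDER BY score, id
""").fetchall()

for row in rows:
    print(row)
# (1, 10, 1, 1, 1, 1)
# (2, 20, 2, 2, 2, 1)
# (3, 20, 2, 2, 3, 1)
# (4, 30, 4, 3, 4, 2)
# (5, 40, 5, 4, 5, 2)
```

RANK skips 3 after the tie while DENSE_RANK does not, and NTILE(2) puts three of the five rows in the first group and two in the second.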

New Analytical functions available in SQL Server 2012


CUME_DIST: Calculates the cumulative distribution of a value in a group of values. The value
returned by CUME_DIST is greater than 0 and less than or equal to 1. For ascending order, it is the
number of rows with a value less than or equal to the current row's value, divided by the total
number of rows in the partition or result set.
Syntax:
Without Partition Clause
SELECT SalesOrderID, OrderQty,
       CUME_DIST() OVER (ORDER BY SalesOrderID) AS CDist
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY CDist DESC

With Partition Clause
SELECT SalesOrderID, OrderQty, ProductID,
       CUME_DIST() OVER (PARTITION BY SalesOrderID
                         ORDER BY ProductID) AS CDist
FROM Sales.SalesOrderDetail s
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY s.SalesOrderID DESC, CDist DESC
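
To check the CUME_DIST arithmetic by hand, a tiny data set helps. The sketch below assumes SQLite 3.25+ driven from Python as a stand-in for SQL Server; the order_lines table and its values are hypothetical and chosen so that every fraction is exact.

```python
import sqlite3

# Hypothetical data: order 1 has four lines (one tie), order 2 has two.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_lines (order_id INTEGER, qty INTEGER)")
conn.executemany("INSERT INTO order_lines VALUES (?, ?)",
                 [(1, 10), (1, 20), (1, 20), (1, 30), (2, 5), (2, 7)])

rows = conn.execute("""
    SELECT order_id, qty,
           CUME_DIST() OVER (PARTITION BY order_id ORDER BY qty) AS cdist
    FROM order_lines
    ORDER BY order_id, qty
""").fetchall()

for row in rows:
    print(row)
# (1, 10, 0.25)  -> 1 of 4 rows has qty <= 10
# (1, 20, 0.75)  -> 3 of 4 rows have qty <= 20 (tied rows share the value)
# (1, 20, 0.75)
# (1, 30, 1.0)
# (2, 5, 0.5)
# (2, 7, 1.0)
```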

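FIRST_VALUE: Returns the first value in an ordered set of values (available from SQL Server 2012).
Syntax:
1) FIRST_VALUE (column) OVER (PARTITION BY column ORDER BY column)
2) FIRST_VALUE (column) IGNORE NULLS OVER (PARTITION BY column ORDER BY column)
The second form skips null values. IGNORE NULLS is not accepted by SQL Server 2012; it was added
to T-SQL in SQL Server 2022.
3) FIRST_VALUE (column) OVER (PARTITION BY column ORDER BY column ROWS UNBOUNDED
PRECEDING)
LAST_VALUE: Returns the last value in the ordered window, which is not necessarily the highest value.
Syntax:
1) LAST_VALUE (column) OVER (PARTITION BY column ORDER BY column)
2) LAST_VALUE (column) IGNORE NULLS OVER (PARTITION BY column ORDER BY column)
The second form skips null values, with the same version restriction as for FIRST_VALUE.
3) LAST_VALUE (column) OVER (PARTITION BY column ORDER BY column ROWS UNBOUNDED
PRECEDING)
With the default window, or with ROWS UNBOUNDED PRECEDING, the window ends at the current row, so
LAST_VALUE returns the value of the current row (or of its last peer under the default RANGE window).
To get the last row of the whole partition, use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED
FOLLOWING.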
SELECT s.SalesOrderID,s.SalesOrderDetailID,s.OrderQty,
FIRST_VALUE(SalesOrderDetailID) OVER (PARTITION BY SalesOrderID
ORDER BY SalesOrderDetailID
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) FstValue,
LAST_VALUE(SalesOrderDetailID) OVER (PARTITION BY SalesOrderID
ORDER BY SalesOrderDetailID
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) LstValue
FROM Sales.SalesOrderDetail s
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY s.SalesOrderID,s.SalesOrderDetailID,s.OrderQty

Notes
When you order a set of rows in an analytic function, you can specify a range of rows to consider,
ignoring the others. You can do this using the ROWS clause:
UNBOUNDED PRECEDING: the range starts at the first row of the partition.
UNBOUNDED FOLLOWING: the range ends at the last row of the partition.
CURRENT ROW: the range begins or ends at the current row.
n PRECEDING or n FOLLOWING: the range starts or ends n rows before or after the current row.
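
The effect of each ROWS boundary is clearer with numbers. The following sketch assumes SQLite 3.25+ run from Python rather than SQL Server, and a hypothetical order_lines table. It computes a running total, a sum over the neighbouring rows, and LAST_VALUE with and without an explicit window.

```python
import sqlite3

# Hypothetical data: four rows with unique ids.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_lines (id INTEGER, qty INTEGER)")
conn.executemany("INSERT INTO order_lines VALUES (?, ?)",
                 [(1, 5), (2, 3), (3, 8), (4, 2)])

rows = conn.execute("""
    SELECT id, qty,
           SUM(qty) OVER (ORDER BY id
                          ROWS BETWEEN UNBOUNDED PRECEDING
                                   AND CURRENT ROW)         AS running_total,
           SUM(qty) OVER (ORDER BY id
                          ROWS BETWEEN 1 PRECEDING
                                   AND 1 FOLLOWING)         AS neighbour_sum,
           LAST_VALUE(qty) OVER (ORDER BY id)               AS last_default,
           LAST_VALUE(qty) OVER (ORDER BY id
                          ROWS BETWEEN UNBOUNDED PRECEDING
                                   AND UNBOUNDED FOLLOWING) AS last_in_set
    FROM order_lines
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
# (1, 5, 5, 8, 5, 2)
# (2, 3, 8, 16, 3, 2)
# (3, 8, 16, 13, 8, 2)
# (4, 2, 18, 10, 2, 2)
```

Without an explicit window, LAST_VALUE simply echoes the current row's qty, because the default window stops at the current row; only the full window returns the last row of the set.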
LEAD and LAG: The LAG/LEAD functions return a value from a previous/following row in
the partition, with respect to the current row, as specified by the row offset in the function
(1 if omitted), without the use of a self-join. Where no such row exists, they return NULL
unless a default value is supplied.
SELECT s.SalesOrderID,s.SalesOrderDetailID,s.OrderQty,
LEAD(SalesOrderDetailID) OVER (ORDER BY SalesOrderDetailID
) LeadValue,
LAG(SalesOrderDetailID) OVER (ORDER BY SalesOrderDetailID
) LagValue
FROM Sales.SalesOrderDetail s
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY s.SalesOrderID,s.SalesOrderDetailID,s.OrderQty
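
What LAG and LEAD return at the edges of the set is worth seeing once. This sketch assumes SQLite 3.25+ from Python as a stand-in for SQL Server, with a hypothetical three-row order_lines table; the third column passes an explicit offset of 1 and a default of 0.

```python
import sqlite3

# Hypothetical data: three rows ordered by id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_lines (id INTEGER, qty INTEGER)")
conn.executemany("INSERT INTO order_lines VALUES (?, ?)",
                 [(1, 5), (2, 3), (3, 8)])

rows = conn.execute("""
    SELECT id, qty,
           LAG(qty)       OVER (ORDER BY id) AS prev_qty,
           LEAD(qty)      OVER (ORDER BY id) AS next_qty,
           LAG(qty, 1, 0) OVER (ORDER BY id) AS prev_qty_or_zero
    FROM order_lines
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
# (1, 5, None, 3, 0)   -> no previous row: NULL, or the default 0
# (2, 3, 5, 8, 5)
# (3, 8, 3, None, 3)   -> no following row: NULL
```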

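PERCENTILE_CONT (): Calculates a percentile based on a continuous distribution of the
column value, interpolating between adjacent values where necessary, so the result may not be a
value that exists in the column. PERCENTILE_CONT(0.5) gives the median. The return type is
float(53) and the percentile argument should be between 0 and 1.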


PERCENTILE_DISC (): Computes a specific percentile for sorted values in an entire rowset or
within distinct partitions of a rowset. For a given percentile value P, PERCENTILE_DISC sorts
the values of the expression in the ORDER BY clause and returns the value with the smallest
CUME_DIST value (with respect to the same sort specification) that is greater than or equal to
P.
SELECT SalesOrderID, OrderQty, ProductID,
CUME_DIST() OVER(PARTITION BY SalesOrderID
ORDER BY ProductID ) AS CDist,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY ProductID)
OVER (PARTITION BY SalesOrderID) AS PercentileDisc
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY SalesOrderID DESC
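
The difference between the two percentile functions can be sketched in plain Python. These helpers are illustrations, not SQL Server's implementation: percentile_disc follows the CUME_DIST rule described above, and percentile_cont assumes the usual linear interpolation at row position 1 + P * (N - 1). The function names and sample values are made up.

```python
import math
from bisect import bisect_right

def percentile_disc(values, p):
    """Smallest value whose cumulative distribution is >= p."""
    ordered = sorted(values)
    n = len(ordered)
    for v in ordered:
        # CUME_DIST of v: rows with value <= v, divided by total rows.
        if bisect_right(ordered, v) / n >= p:
            return v
    return ordered[-1]

def percentile_cont(values, p):
    """Linear interpolation between the two rows around position p*(n-1)."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)          # 0-based fractional position
    lo, hi = math.floor(pos), math.ceil(pos)
    if lo == hi:
        return float(ordered[lo])
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

data = [1, 2, 3, 4]
print(percentile_disc(data, 0.5))  # 2   (an actual value from the set)
print(percentile_cont(data, 0.5))  # 2.5 (interpolated between 2 and 3)
```

With an even number of rows the two functions disagree at P = 0.5: the discrete version picks an existing value, the continuous version interpolates.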

PERCENT_RANK(): This function returns the relative standing of a value within a query result set
or partition.
The formula for PERCENT_RANK() is as follows:

PERCENT_RANK() = (RANK() - 1) / (Total Rows - 1)


SELECT SalesOrderID, OrderQty, ProductID,
RANK() OVER(PARTITION BY SalesOrderID
ORDER BY ProductID ) Rnk,
PERCENT_RANK() OVER(PARTITION BY SalesOrderID
ORDER BY ProductID ) AS PctDist
FROM Sales.SalesOrderDetail s
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY PctDist DESC
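
The PERCENT_RANK formula can be verified against the engine's own output. The sketch below assumes SQLite 3.25+ from Python in place of SQL Server and uses a hypothetical results table; it recomputes (RANK - 1) / (Total Rows - 1) for each row and compares it with PERCENT_RANK().

```python
import sqlite3

# Hypothetical data with one tie.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (score INTEGER)")
conn.executemany("INSERT INTO results VALUES (?)",
                 [(10,), (20,), (20,), (30,)])

rows = conn.execute("""
    SELECT score,
           RANK()         OVER (ORDER BY score) AS rnk,
           PERCENT_RANK() OVER (ORDER BY score) AS pct_rank,
           COUNT(*)       OVER ()               AS total_rows
    FROM results
    ORDER BY score
""").fetchall()

for score, rnk, pct_rank, total_rows in rows:
    by_formula = (rnk - 1) / (total_rows - 1)
    # The engine's value and the hand-computed value should agree.
    print(score, rnk, round(pct_rank, 4), round(by_formula, 4))
# 10 1 0.0 0.0
# 20 2 0.3333 0.3333
# 20 2 0.3333 0.3333
# 30 4 1.0 1.0
```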

More Analytical functions


There is a commercial add-in package for SQL Server that adds hundreds of analytic functions
to SQL Server 2005/2008/2012, including PERCENTILE and PERCENTRANK.
http://westclintech.com/Products/XLeratorDBstatistics/XLeratorDBstatisticsDocumentation/tabid/159/topic/PERCENTILE/Default.aspx

Analytical Functions in Oracle


Here is the list of analytic functions available in Oracle up to 10g Release 2 (10.2).
AVG
CORR
COVAR_POP
COVAR_SAMP
COUNT
CUME_DIST
DENSE_RANK
FIRST
FIRST_VALUE
LAG
LAST
LAST_VALUE
LEAD
MAX
MIN
NTILE
PERCENT_RANK
PERCENTILE_CONT
PERCENTILE_DISC
RANK
RATIO_TO_REPORT
REGR_ (Linear Regression) Functions
ROW_NUMBER
STDDEV
STDDEV_POP
STDDEV_SAMP
SUM
VAR_POP
VAR_SAMP
VARIANCE
New analytical functions in 11g
NTH_VALUE
LISTAGG
Details:
http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions004.htm
Reference Links
http://msdn.microsoft.com/en-us/library/hh213234.aspx
http://technology.amis.nl/2004/10/04/analytical-sql-functions-theory-and-examples-part-2-on-the-order-by-and-windowing-clauses/2/
http://blog.sqlauthority.com/2007/10/09/sql-server-2005-sample-example-of-ranking-functions-row_number-rank-dense_rank-ntile/
