DESCRIPTIVE STATISTICS

STATISTICAL MEASUREMENT
OF DATA
Location (central tendency)
Dispersion (spread)
Skewness (symmetry)
Kurtosis (peakedness)
MEASURES OF LOCATION
Arithmetic mean
Geometric mean
Harmonic mean
Median
Percentiles
Mode
ARITHMETIC MEAN
(DISCRETE DATA)
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
where n is the number of observations.
ARITHMETIC MEAN
(GROUPED DATA)
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$
where $x_i$ is the mid-class value (MCV) and $f_i$ is the frequency of
the i-th class, and n is the number of classes.
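The two mean formulas can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function names are hypothetical, the raw observations are those of the median example, and the mid-class values 2, 7, 12, 17, 22 are derived from the marks table (classes 0-4 to 20-24) used in the grouped median example.

```python
def arithmetic_mean(values):
    """Mean of raw (discrete) observations: sum of x_i divided by n."""
    return sum(values) / len(values)


def grouped_mean(mid_values, frequencies):
    """Mean of a frequency distribution: sum of f_i * x_i over sum of f_i."""
    weighted_total = sum(f * x for f, x in zip(frequencies, mid_values))
    return weighted_total / sum(frequencies)


# Raw observations taken from the discrete median example.
observations = [27, 13, 62, 5, 44, 29, 16]
print(arithmetic_mean(observations))  # 28.0

# Marks table: the MCV of each class is the midpoint of its limits.
mcv = [2, 7, 12, 17, 22]
freq = [2, 8, 14, 17, 9]
print(grouped_mean(mcv, freq))  # 14.3
```

The grouped result is only an estimate, because every observation in a class is treated as if it sat at the mid-class value.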
GEOMETRIC MEAN
The geometric mean is used where
relative changes (especially percentages)
are being considered.
HARMONIC MEAN
The harmonic mean is used when the data
consist of rates such as prices ($/kg),
speeds (km/h) or production (output/man-
hour).
GEOMETRIC MEAN
$$\text{Geometric mean} = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$$
where n is the number of observations.
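A short Python sketch of both means is given below. The geometric mean follows the formula above. The slides give no formula for the harmonic mean, so the standard definition (n divided by the sum of reciprocals) is assumed here. Function names and sample figures are hypothetical.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive observations."""
    return math.prod(values) ** (1 / len(values))


def harmonic_mean(values):
    """n divided by the sum of the reciprocals of n positive observations.

    Assumption: this is the usual textbook definition, not a formula
    taken from the slides.
    """
    return len(values) / sum(1 / x for x in values)


# Hypothetical growth factors: a quantity doubles, then grows eightfold.
print(geometric_mean([2, 8]))  # 4.0

# Hypothetical trip: the same distance at 60 km/h and then at 40 km/h.
# The average speed is the harmonic mean (about 48 km/h), not 50 km/h.
print(harmonic_mean([60, 40]))
```

Both functions assume strictly positive observations. Python's standard `statistics` module also provides `geometric_mean` and `harmonic_mean`.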
MEDIAN
The median is the middle observation
of a set of arranged data (ascending
or descending order), i.e., it divides
the set of data into two equal parts in
terms of the number of observations.
MEDIAN
The rank of the median is given by
$$\frac{n + 1}{2}$$
where n is the total number of
observations.
MEDIAN (DISCRETE DATA)
When n is odd, the median is the
middle observation.

When n is even, the median is the
average or midpoint of the two
middle observations.
MEDIAN (DISCRETE DATA)
Example 1:
27 13 62 5 44 29 16
Rearranged:
5 13 16 27 29 44 62

Rank of median = (7 + 1) / 2 = 4
Median = 27
MEDIAN (DISCRETE DATA)
Example 2:
5 13 16 27 29 44

Rank of median = (6 + 1) / 2 = 3.5
Median = (16 + 27) / 2 = 21.5
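The rank rule and the two examples above can be checked with a small Python sketch. The function name is hypothetical. It follows the (n + 1) / 2 rank convention used in these slides.

```python
def median(values):
    """Median of raw observations using the rank (n + 1) / 2."""
    ordered = sorted(values)
    n = len(ordered)
    rank = (n + 1) / 2           # 1-based rank; ends in .5 when n is even
    lower = ordered[int(rank) - 1]
    if rank == int(rank):        # n odd: the rank points at one observation
        return lower
    upper = ordered[int(rank)]   # n even: average the two middle observations
    return (lower + upper) / 2


print(median([27, 13, 62, 5, 44, 29, 16]))  # 27
print(median([5, 13, 16, 27, 29, 44]))      # 21.5
```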
MEDIAN (GROUPED DATA)
In this case, the value of the median
can only be estimated since the
identity of each observation is
unknown in the whole frequency
distribution.
MEDIAN (GROUPED DATA)
We proceed as follows:
1. Determine the rank of the median.
2. Locate the cell in which the median is found.
3. Use linear interpolation or simple proportion to evaluate the median.
MEDIAN (GROUPED DATA)
The method of linear interpolation
assumes that the observations within
each cell are evenly spread or uniformly
distributed.
MEDIAN (GROUPED DATA)
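$$\text{Median} = \text{LCB} + \left(\frac{R_{\text{median}}}{\text{frequency}}\right) \times \text{cell width}$$
where LCB is the lower cell boundary of the cell containing the median,
frequency is the frequency of that cell, and $R_{\text{median}}$ is the
rank of the median in its cell. This rank is obtained by taking the
overall rank of the median and subtracting the cumulative frequency of
the previous cell.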
MEDIAN (GROUPED DATA EXAMPLE)

Marks    No. of students    Less-than CF
0-4             2                  2
5-9             8                 10
10-14          14                 24
15-19          17                 41
20-24           9                 50
Total          50
MEDIAN (GROUPED DATA EXAMPLE)
n = 50
Rank of median = (50 + 1) / 2 = 25.5
Location of median: cell 15-19

$$\text{Median} = 14.5 + \left(\frac{25.5 - 24}{17}\right) \times 5 = 14.94$$
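The three steps (rank, cell, interpolation) can be sketched in Python as follows. The function name and argument layout are assumptions made for illustration. The lower cell boundaries -0.5, 4.5, 9.5, 14.5 and 19.5 are derived from the class limits of the marks table.

```python
def grouped_median(lower_boundaries, frequencies, cell_width):
    """Estimate the median of a frequency table by linear interpolation."""
    n = sum(frequencies)
    rank = (n + 1) / 2                  # overall rank of the median
    cumulative = 0                      # cumulative frequency before this cell
    for lcb, f in zip(lower_boundaries, frequencies):
        if cumulative + f >= rank:      # the median falls in this cell
            rank_in_cell = rank - cumulative
            return lcb + (rank_in_cell / f) * cell_width
        cumulative += f
    raise ValueError("empty frequency table")


lcb = [-0.5, 4.5, 9.5, 14.5, 19.5]
freq = [2, 8, 14, 17, 9]
print(round(grouped_median(lcb, freq, 5), 2))  # 14.94
```

The sketch assumes equal cell widths, as in the example table.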
MEDIAN (GROUPED DATA EXAMPLE)
[Figure: linear interpolation of the median (Q2) between the cell boundaries 14.5 and 19.5, with ranks plotted against values.]

PERCENTILES
Percentiles are statistics which divide a
distribution into 100 equal parts in terms
of the number of observations.

The best-known related measures are quartiles
(four equal parts) and deciles (ten equal parts).
QUARTILES
The rank of the first or lower quartile ($Q_1$) is given by
$$\frac{n + 1}{4}$$
The rank of the third or upper quartile ($Q_3$) is given by
$$\frac{3(n + 1)}{4}$$
where n is the total number of
observations.
QUARTILES (DISCRETE DATA)
Example 1:
27 13 62 5 44 29 16
Rearranged:
5 13 16 27 29 44 62

Rank of Q1 = (7 + 1) / 4 = 2
Q1 = 13
Rank of Q3 = 3(7 + 1) / 4 = 6
Q3 = 44

QUARTILES (DISCRETE DATA)
Example 2:
5 13 16 27 29 44

Rank of Q1 = (6 + 1) / 4 = 1.75
Q1 = 5 + 0.75(13 - 5) = 11

Rank of Q3 = 3(6 + 1) / 4 = 5.25
Q3 = 29 + 0.25(44 - 29) = 32.75
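Both quartile examples can be reproduced with the sketch below, which reads off the value at a fractional rank by interpolating between the two neighbouring observations. The helper names are hypothetical. The code assumes at least three observations so that both ranks fall inside the data.

```python
def value_at_rank(ordered, rank):
    """Value at a 1-based, possibly fractional, rank in sorted data."""
    whole = int(rank)              # rank of the observation just below
    fraction = rank - whole
    below = ordered[whole - 1]
    if fraction == 0:              # the rank points exactly at one observation
        return below
    above = ordered[whole]
    return below + fraction * (above - below)


def quartiles(values):
    """Return (Q1, Q3) using the ranks (n + 1) / 4 and 3(n + 1) / 4."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_rank(ordered, (n + 1) / 4)
    q3 = value_at_rank(ordered, 3 * (n + 1) / 4)
    return q1, q3


print(quartiles([27, 13, 62, 5, 44, 29, 16]))  # (13, 44)
print(quartiles([5, 13, 16, 27, 29, 44]))      # (11.0, 32.75)
```

Statistical packages use several different rank conventions for quartiles, so library results may differ slightly from the (n + 1) rule used here.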

PERCENTILES
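$$P_k = \text{LCB} + \left(\frac{R_k}{\text{frequency}}\right) \times \text{cell width}$$
where LCB is the lower cell boundary of the cell containing the
percentile, frequency is the frequency of that cell, and $R_k$ is the
rank of the k-th percentile in its cell. This rank is obtained by taking
the overall rank of the percentile and subtracting the cumulative
frequency of the previous cell.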
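The percentile formula can be sketched in Python as below. The slides do not state the overall rank of the k-th percentile. The sketch assumes k(n + 1) / 100, which agrees with the quartile ranks (n + 1) / 4 and 3(n + 1) / 4. The function name is hypothetical and the data are the marks table from the grouped median example.

```python
def grouped_percentile(k, lower_boundaries, frequencies, cell_width):
    """Estimate the k-th percentile of a frequency table by interpolation."""
    n = sum(frequencies)
    rank = k * (n + 1) / 100           # assumed overall rank of P_k
    cumulative = 0                     # cumulative frequency before this cell
    for lcb, f in zip(lower_boundaries, frequencies):
        if f > 0 and cumulative + f >= rank:
            rank_in_cell = rank - cumulative
            return lcb + (rank_in_cell / f) * cell_width
        cumulative += f
    raise ValueError("rank lies beyond the last cell")


lcb = [-0.5, 4.5, 9.5, 14.5, 19.5]
freq = [2, 8, 14, 17, 9]
for k in (25, 50, 75):
    print(k, round(grouped_percentile(k, lcb, freq, 5), 2))
# 25 10.48
# 50 14.94
# 75 18.69
```

With k = 50 the function reproduces the grouped median of 14.94.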
PERCENTILES
Percentiles can be estimated from a
cumulative frequency ogive by
interpolation.
MODE (DISCRETE DATA)
The mode is the observation occurring the
most or which has the highest frequency. It
can be easily located by visual inspection.
NOTE: If more than one observation shares the
highest frequency, we say that there are several
modes, but we can also say that there is no mode.
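A minimal Python sketch for locating the mode of discrete data is shown below. It returns every value that shares the highest frequency, so that several modes are reported together. The function name and sample data are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every observation that shares the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([3, 5, 5, 7, 9]))     # [5]
print(modes([3, 5, 5, 7, 7, 9]))  # [5, 7]
```

Whether a result with several values is reported as several modes or as no mode is a matter of convention, as the note above says.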
MODE (GROUPED DATA)
In this case, we talk about a modal class,
which is the class with the highest frequency.
A rough approximation for a single value of
the mode is the MCV of the modal class.

The mode can be found quite accurately by
using a formula or from a histogram.
MODE (GROUPED DATA)
A useful formula for finding the mode is

Mode = mean - 3(mean - median)
MODE (GROUPED DATA)
$$\text{Mode} = \text{LCB} + \left(\frac{f_1}{f_1 + f_2}\right) \times \text{cell width}$$
where LCB is the lower cell boundary of the modal class, $f_1$ is the
difference in frequencies between the modal class and the class
preceding it, and $f_2$ is the difference in frequencies between the
modal class and the class immediately after it.
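The modal-class formula can be sketched in Python as follows, applied to the marks table from the grouped median example (modal class 15-19). The function name is hypothetical. Treating a missing neighbouring class as having frequency 0 is an assumption of this sketch, not something stated in the slides.

```python
def grouped_mode(lower_boundaries, frequencies, cell_width):
    """Estimate the mode from the modal class and its two neighbours."""
    i = frequencies.index(max(frequencies))      # first class with top frequency
    # Assumption: a missing neighbour counts as a class with frequency 0.
    before = frequencies[i - 1] if i > 0 else 0
    after = frequencies[i + 1] if i + 1 < len(frequencies) else 0
    f1 = frequencies[i] - before                 # modal class minus preceding class
    f2 = frequencies[i] - after                  # modal class minus following class
    return lower_boundaries[i] + (f1 / (f1 + f2)) * cell_width


lcb = [-0.5, 4.5, 9.5, 14.5, 19.5]
freq = [2, 8, 14, 17, 9]
print(round(grouped_mode(lcb, freq, 5), 2))  # 15.86
```

Here f1 = 17 - 14 = 3 and f2 = 17 - 9 = 8, so the mode is 14.5 + (3 / 11) x 5.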
MODE (GROUPED DATA)
We can also use a histogram to find the
mode. We simply represent the modal
class and the classes preceding it and
immediately after it.
MODE (GROUPED DATA)
[Figure: histogram of frequency (axis from 0 to 50) against age of people for the classes 20-40, 40-60 and 60-80.]
MEASURES OF DISPERSION
Range
Quartile deviation
Standard deviation
Coefficient of variation
RANGE (DISCRETE DATA)
The range is the numerical difference
between the maximum and the
minimum observations.
RANGE (GROUPED DATA)
The range is the numerical difference
between the upper cell limit of the last
cell and lower cell limit of the first
cell.
QUARTILE DEVIATION
The quartile deviation, or semi-interquartile
range, is defined as
$$\text{Quartile deviation} = \frac{Q_3 - Q_1}{2}$$
This quantity is not affected by outliers and
extreme values, since it depends only on the
middle half of the data.
STANDARD DEVIATION
AND VARIANCE
The standard deviation is the positive
square root of the variance. All
formulae are given in terms of the
variance, which is equal to
$$s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \bar{x}^2$$
STANDARD DEVIATION
(discrete data)
The standard deviation is the best
measure of spread since it can be used
for further statistical processing.
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$
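The sketch below computes the variance in both of its equivalent forms (mean squared deviation, and mean of squares minus squared mean) and then the standard deviation. Function names are hypothetical. The data are the seven observations from the discrete median example. The divisor is n, as in these slides, not n - 1.

```python
import math


def variance(values):
    """Mean of the squared deviations from the mean (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def variance_shortcut(values):
    """Mean of the squares minus the square of the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(x * x for x in values) / n - mean ** 2


def standard_deviation(values):
    """Positive square root of the variance."""
    return math.sqrt(variance(values))


data = [27, 13, 62, 5, 44, 29, 16]
print(round(variance(data), 2))            # 330.29
print(round(variance_shortcut(data), 2))   # 330.29
print(round(standard_deviation(data), 2))  # 18.17
```

The shortcut form is convenient by hand, but it can lose precision in floating-point arithmetic when the mean is large relative to the spread.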
STANDARD DEVIATION
(grouped data)
$$s = \sqrt{\frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{n} f_i}}$$
with the usual definitions of $x_i$ and $f_i$.
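For grouped data, the same calculation weights each squared deviation by its class frequency. The sketch below uses a hypothetical function name and the marks table from the grouped median example, with mid-class values 2, 7, 12, 17 and 22.

```python
import math


def grouped_standard_deviation(mid_values, frequencies):
    """Standard deviation of a frequency table (divisor: total frequency)."""
    total = sum(frequencies)
    mean = sum(f * x for f, x in zip(frequencies, mid_values)) / total
    squared = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, mid_values))
    return math.sqrt(squared / total)


mcv = [2, 7, 12, 17, 22]
freq = [2, 8, 14, 17, 9]
print(round(grouped_standard_deviation(mcv, freq), 2))  # 5.4
```

The grouped mean here is 14.3 and the variance is 29.21, whose square root is about 5.40.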
COEFFICIENT OF VARIATION
The purpose of the coefficient of
variation is to compare dispersions in
various distributions.
$$\text{Coefficient of variation} = \frac{s}{\bar{x}}$$
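The sketch below illustrates why the coefficient of variation is suited to comparing distributions: it is a ratio, so it does not change when every observation is expressed in a different unit. The function name and the unit conversion are hypothetical. The data are the observations from the discrete median example.

```python
import math


def coefficient_of_variation(values):
    """Standard deviation (divisor n) divided by the mean."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return s / mean


data = [27, 13, 62, 5, 44, 29, 16]
rescaled = [1000 * x for x in data]   # same data in a unit 1000 times smaller

# The standard deviation grows by a factor of 1000, the ratio does not.
print(coefficient_of_variation(data))
print(coefficient_of_variation(rescaled))
```

The ratio is only meaningful when the mean is not zero, and it is usually applied to positive-valued data.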
SKEWNESS
Skewness is a measure of symmetry. It
indicates whether there is a concentration
of low or high observations.

A distribution having a lot of low
observations is positively skewed whereas
one which has more high observations
displays negative skewness.
SKEWNESS
A distribution which is symmetrical
has zero skewness (e.g. the
normal distribution).
MEASURE OF SKEWNESS
Coefficient of skewness:
$$\frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} \quad \text{or} \quad \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$$
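The median-based form of the coefficient can be sketched in Python as below. The function name is hypothetical. The standard deviation uses the divisor n, as elsewhere in these slides, and `statistics.median` from the standard library supplies the median.

```python
import math
import statistics


def skewness_coefficient(values):
    """Coefficient of skewness: 3 (mean - median) / standard deviation."""
    n = len(values)
    mean = sum(values) / n
    median = statistics.median(values)
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return 3 * (mean - median) / s


# Observations from the discrete median example: mean 28, median 27.
print(round(skewness_coefficient([27, 13, 62, 5, 44, 29, 16]), 3))  # 0.165

# Hypothetical symmetrical data: mean and median coincide.
print(skewness_coefficient([1, 2, 3, 4, 5]))  # 0.0
```

A positive value indicates positive skewness (mean above the median), a negative value indicates negative skewness.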
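POSITIVE SKEWNESS
[Figure: positively skewed curve with the mode, Q2 (median) and mean marked.]
NEGATIVE SKEWNESS
[Figure: negatively skewed curve with the mean, Q2 (median) and mode marked.]
ZERO SKEWNESS (SYMMETRY)
[Figure: symmetrical curve on which the mode, mean and median coincide.]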
KURTOSIS
Kurtosis indicates the degree of
peakedness of a unimodal
frequency distribution.
Kurtosis usually indicates the extent
to which a curve (distribution) departs
from the bell-shaped or normal curve.
KURTOSIS
Platykurtic: flatter than the normal curve.
Mesokurtic: peaked like the normal curve.
Leptokurtic: more sharply peaked than the normal curve.
KURTOSIS
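The formulae for calculating kurtosis
are given by
$$\text{Kurtosis} = \frac{\sum (x - \bar{x})^4}{n s^4}$$
for discrete data and
$$\text{Kurtosis} = \frac{\sum f (x - \bar{x})^4}{n s^4}$$
for grouped data, where n is the total number of
observations (the total frequency in the grouped case).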
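The discrete formula can be sketched in Python as follows. The function name and sample data are hypothetical. The variance uses the divisor n, so that s^4 is the square of the variance. For reference, this measure equals 3 for the normal curve, so smaller values suggest a flatter (platykurtic) shape and larger values a more peaked (leptokurtic) one.

```python
def kurtosis(values):
    """Sum of fourth-power deviations divided by n * s^4."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    fourth = sum((x - mean) ** 4 for x in values)
    return fourth / (n * variance ** 2)   # s^4 is the variance squared


# Hypothetical evenly spread data: flatter than the normal curve.
print(kurtosis([1, 2, 3, 4, 5]))  # 1.7
```

The function divides by the squared variance, so it is undefined when all observations are equal.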
