
Measures of Variability

I. Range

The range for a set of data items is the difference between the largest and smallest
values. Although the range is the easiest of the numerical measures of variability
to compute, it is not widely used because it is based on only two of the items in the
data set and thus is influenced too much by extreme data values.

II. Interquartile Range

A form of the range that avoids the dependence on extreme values in the data set is
the interquartile range (IQR), or Q-spread. This descriptive measure of variability
is simply the difference between the third quartile ($Q_3$), or 75%-tile data item, and
the first quartile ($Q_1$), or 25%-tile data item. In effect, it shows the range of
the middle 50% of the data and, as such, is not affected by the extreme values in the
data set. To calculate $Q_3$, arrange the data in ascending order and let
$i = \frac{3}{4}N$, where $N$ is the number of data items. If $i$ is not an integer,
then the next integer greater than $i$ denotes the position of the 75%-tile; if $i$ is
an integer, then the 75%-tile is the average of the data values in positions $i$ and
$i + 1$. Similarly, to calculate $Q_1$, let $i = \frac{1}{4}N$ and follow the same
guidelines as above.

Example 1: Given the following data: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29. Find the
IQR.

$N = 10 \Rightarrow i = \frac{3}{4}(10) = 7.5 \Rightarrow Q_3$ is the 8th data item
$\Rightarrow Q_3 = 19$. Next, $i = \frac{1}{4}(10) = 2.5 \Rightarrow Q_1$ is the 3rd
data item $\Rightarrow Q_1 = 5$. Therefore, IQR $= 19 - 5 = 14$.

Example 2: Given the following data: 2, 3, 5, 7, 11, 13, 17, 19. Find the IQR.

$N = 8 \Rightarrow i = \frac{3}{4}(8) = 6 \Rightarrow Q_3$ is the average of the data
values in the 6th and 7th positions $\Rightarrow Q_3 = \frac{13 + 17}{2} = 15$. Next,
$i = \frac{1}{4}(8) = 2 \Rightarrow Q_1$ is the average of the values in the 2nd and
3rd positions $\Rightarrow Q_1 = \frac{3 + 5}{2} = 4$. Therefore, IQR $= 15 - 4 = 11$.
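The position rule used in Examples 1 and 2 can be sketched in Python. This is a minimal illustration of the rule as stated in this section, not a standard library routine; the function names `quartile` and `iqr` are hypothetical, and other textbooks and software define quartiles slightly differently.

```python
def quartile(sorted_data, quarters):
    """Quartile by the handout's rule; `quarters` is 1 for Q1 or 3 for Q3."""
    n = len(sorted_data)
    # i = (quarters / 4) * N, kept as an integer quotient and remainder.
    whole, remainder = divmod(quarters * n, 4)
    if remainder != 0:
        # i is not an integer: use the next integer position (1-based).
        return sorted_data[whole]  # 1-based position whole + 1
    # i is an integer: average the values in positions i and i + 1 (1-based).
    return (sorted_data[whole - 1] + sorted_data[whole]) / 2


def iqr(data):
    """Interquartile range Q3 - Q1 of the data, sorted first."""
    ordered = sorted(data)
    return quartile(ordered, 3) - quartile(ordered, 1)


example_1 = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
example_2 = [2, 3, 5, 7, 11, 13, 17, 19]
print(iqr(example_1))  # 14
print(iqr(example_2))  # 11.0
```

Working with the integer quotient and remainder of 3N/4 (or N/4) avoids any floating-point test for "is i an integer".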

III. Average Absolute Deviation from the Mean

Obviously, there are limitations in using range or interquartile range as measures of
variability. It would seem reasonable that any useful measure of variability should
measure the spread around the mean since the mean is the “balance point” of a
distribution. If you find the difference between each data item and the mean, you
will get negative values for items that are less than the mean and positive values for
items greater than the mean. If you then sum up all of these differences, you will get
zero; this illustrates a special property of the mean. However, by taking the absolute
value of each difference, you will get the distance of each item from the mean, and
the sum of these distances would measure the total spread around the mean. If you
were to include more data items, equally spread around the mean, you would
increase the total of the distances even though the new distribution might be less
variable. Therefore, it is important to divide the total absolute deviation by the
number of data items; this will give an average absolute deviation from the mean.

$$\text{Average Absolute Deviation} = \frac{\sum |X - \bar{X}|}{N}$$

This average absolute deviation gives the average distance of any data item from the
mean and thus is a good measure of spread.
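As a small illustration of the reasoning above, the following Python sketch shows that the signed differences from the mean cancel out while the absolute differences do not. The function name `average_absolute_deviation` is a hypothetical helper, and the data set is borrowed from Example 1 purely for demonstration.

```python
def average_absolute_deviation(data):
    """Mean of the absolute differences between each item and the mean."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)


data = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
mean = sum(data) / len(data)

# Signed differences cancel: their total is zero (up to rounding error).
signed_total = sum(x - mean for x in data)

# Absolute differences measure distance, so they accumulate instead.
aad = average_absolute_deviation(data)

print(round(signed_total, 9), round(aad, 2))
```

For this data set the mean is 12.9 and the average absolute deviation works out to about 7.3.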

IV. Standard Deviation

If you were to calculate the average absolute deviation of a distribution using a value
other than the mean, you could possibly get a smaller average absolute deviation.
This result is one of the reasons that the average absolute deviation is not the best
measure of variability. Instead, calculate the average of the squared differences from
the mean; this is the variance of a distribution. If you were to calculate the average
of the squared differences of a distribution by using a value other than the mean,
you would always get a larger value. The mean is the one number that minimizes
the average of the squared differences in a distribution.

$$\text{Variance} = \sigma^2 = \frac{\sum (X - \bar{X})^2}{N}$$
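The claim that the mean minimizes the average of the squared differences can be checked with a short derivation. In this sketch, $c$ stands for any comparison value used in place of the mean; the symbol is introduced here for illustration.

```latex
\begin{align*}
\frac{1}{N}\sum (X - c)^2
  &= \frac{1}{N}\sum \bigl[(X - \bar{X}) + (\bar{X} - c)\bigr]^2 \\
  &= \frac{1}{N}\sum (X - \bar{X})^2
     + \frac{2(\bar{X} - c)}{N}\sum (X - \bar{X})
     + (\bar{X} - c)^2 \\
  &= \sigma^2 + (\bar{X} - c)^2
\end{align*}
```

The middle term vanishes because the differences from the mean sum to zero, so the result exceeds $\sigma^2$ whenever $c \neq \bar{X}$.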

There are still two slight inconveniences in using variance as our measure of
variability. First, variance does not give an estimate of the distance of a typical data
item from the mean; it is too big, because the differences have been squared. Second, if
the data items have a unit of measurement associated with them, then the variance would
not have the same unit of measurement; it would have square units. By taking the square
root of variance, we get standard deviation, which is the measure of variability that we
want.

$$\text{Standard Deviation} = \sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$

The standard deviation can be calculated in an alternative way.

$$\text{Standard Deviation} = \sigma = \sqrt{\frac{\sum X^2}{N} - \bar{X}^2}$$
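The two forms agree because the squared differences can be expanded term by term. A brief sketch of the algebra, using only the definitions of $\bar{X}$ and $\sigma^2$ given above:

```latex
\begin{align*}
\sigma^2 = \frac{\sum (X - \bar{X})^2}{N}
  &= \frac{\sum X^2}{N} - 2\bar{X}\,\frac{\sum X}{N} + \bar{X}^2 \\
  &= \frac{\sum X^2}{N} - 2\bar{X}^2 + \bar{X}^2 \\
  &= \frac{\sum X^2}{N} - \bar{X}^2
\end{align*}
```

Taking the square root of the last line gives the alternative formula for the standard deviation.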

Example: Given the following histogram, estimate the standard deviation.

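[Histogram: density (% per cigarette) against number of cigarettes, with class
intervals 0-10, 10-20, 20-40, and 40-80 whose block areas are 10%, 30%, 40%, and 20%,
respectively.]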

Recall that the mean of a histogram can be determined by calculating a “weighted”
average using the midpoints of the class intervals and the areas of the blocks. Thus,
$\bar{X} = 0.1(5) + 0.3(15) + 0.4(30) + 0.2(60) = 0.5 + 4.5 + 12 + 12 = 29$ cigarettes.
The standard deviation of a histogram can also be calculated using the midpoints of the
class intervals, the areas of the blocks, and the “weighted” average.

Using the first formula, we get:

$$\text{SD} = \sigma = \sqrt{\frac{0.1(5 - 29)^2 + 0.3(15 - 29)^2 + 0.4(30 - 29)^2 + 0.2(60 - 29)^2}{0.1 + 0.3 + 0.4 + 0.2}} \approx 17.6 \text{ cig}$$

Using the second formula, we get:

$$\text{SD} = \sigma = \sqrt{\frac{0.1(5^2) + 0.3(15^2) + 0.4(30^2) + 0.2(60^2)}{0.1 + 0.3 + 0.4 + 0.2} - 29^2} \approx 17.6 \text{ cig}$$
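The histogram calculation can be reproduced with a few lines of Python. This is a sketch that takes the class midpoints and block areas from the example as given; the variable names are chosen here for illustration.

```python
import math

# Class midpoints (cigarettes) and block areas (fractions of the total area).
midpoints = [5, 15, 30, 60]
areas = [0.1, 0.3, 0.4, 0.2]

total_area = sum(areas)

# Weighted mean: each midpoint counts in proportion to its block's area.
mean = sum(a * m for a, m in zip(areas, midpoints)) / total_area

# First formula: weighted average of squared differences from the mean.
variance_1 = sum(a * (m - mean) ** 2 for a, m in zip(areas, midpoints)) / total_area

# Second formula: weighted average of the squares, minus the squared mean.
variance_2 = sum(a * m ** 2 for a, m in zip(areas, midpoints)) / total_area - mean ** 2

sd_1 = math.sqrt(variance_1)
sd_2 = math.sqrt(variance_2)
print(round(mean, 1), round(sd_1, 1), round(sd_2, 1))  # 29.0 17.6 17.6
```

Both variances come to 309 square cigarettes before the square root is taken, which is why the two formulas give the same standard deviation.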

Important Note:

Some textbooks will give the following formulas for variance and standard deviation:

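$$\text{Variance} = s^2 = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{\sum X^2 - N\bar{X}^2}{N - 1}$$

$$\text{Standard Deviation} = s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} = \sqrt{\frac{\sum X^2 - N\bar{X}^2}{N - 1}}$$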

These formulas should be used when N data items are taken as a sample from a
larger population in which the variance and standard deviation of that population are
unknown. These formulas give good approximations of the variance and standard
deviation of the population.
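To see the two conventions side by side, the sketch below uses Python's standard `statistics` module, where `pvariance` and `pstdev` divide by N and `variance` and `stdev` divide by N - 1. The data set is reused from Example 2 and is treated as a sample purely for illustration.

```python
import statistics

data = [2, 3, 5, 7, 11, 13, 17, 19]
n = len(data)

# Population formulas: divide the sum of squared differences by N.
population_variance = statistics.pvariance(data)
population_sd = statistics.pstdev(data)

# Sample formulas: divide the same sum by N - 1 instead.
sample_variance = statistics.variance(data)
sample_sd = statistics.stdev(data)

# The two variances differ only by the factor N / (N - 1).
rescaled = population_variance * n / (n - 1)
print(round(sample_variance, 6) == round(rescaled, 6))  # True
```

Because N / (N - 1) is greater than 1, the sample versions are always slightly larger, and the gap shrinks as N grows.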

Practice Sheet – Measures of Variability


I. The following are 25 final averages in a math class:

46 64 72 79 89
49 66 74 79 91
53 66 75 80 94
60 67 76 83 95
61 71 79 88 98

(1) What is the range?


(2) What is the interquartile range?

II. Given the following data: 5, 7, 11, 12, 13, 18.

(1) What is the mean?


(2) What is the average absolute deviation from the mean?
(3) What is the median?
(4) What is the average absolute deviation from the median?
(5) What is the standard deviation?
(6) Add 8 to each item. What is the new SD?
(7) Subtract 7 from each item. What is the new SD?
(8) Multiply each item by 7. What is the new SD?
(9) Divide each item by 5. What is the new SD?

III. In the histogram given below, the class intervals include the right endpoint, not the
left:

[Histogram: density (% per $1000, vertical axis marked from 0 to 1.25 in steps of 0.25)
against income (in $1000), with class boundaries at 0, 20, 40, 80, 100, and 120.]

(1) What is the estimated mean?


(2) What is the estimated standard deviation?
(3) What is the estimated interquartile range?

IV.              Class A    Class B

     $N$            20         30
     $\bar{X}$      70         80
     $\sigma_X$     10          6

(1) Find $\sum X$ for class A.
(2) Find $\sum X$ for class B.
(3) Find $\sum X$ for the two classes combined.
(4) Find $\bar{X}$ for the combined classes.
(5) Find $\sum X^2$ for class A. [Hint: Use the alternative formula for SD.]
(6) Find $\sum X^2$ for class B.
(7) Find $\sum X^2$ for the combined classes.
(8) Find $\sigma_X$ for the combined classes.

Solution Key for Measures of Variability


I. (1) 98 – 46 = 52
(2) 83 – 66 = 17

II. (1) 11
(2) 3 1/3
(3) 11.5
(4) 3 1/3
(5) 4.2
(6) 4.2
(7) 4.2
(8) 29.4
(9) .84

III. (1) 56
(2) 26
(3) 76 – 35 = 41

IV. (1) 1400


(2) 2400
(3) 3800
(4) 76
(5) 100,000
(6) 193,080
(7) 293,080
(8) 9.25
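
The steps in Part IV can be checked with a short Python sketch. It assumes the alternative SD formula rearranged to recover the sum of squares from N, the mean, and the SD; the helper name `sums_from_summary` is hypothetical.

```python
import math


def sums_from_summary(n, mean, sd):
    """Recover (sum of X, sum of X squared) from N, the mean, and the SD."""
    total = n * mean
    # From sd^2 = (sum of X^2) / N - mean^2, solved for the sum of squares.
    total_squares = n * (sd ** 2 + mean ** 2)
    return total, total_squares


sum_a, squares_a = sums_from_summary(20, 70, 10)  # class A
sum_b, squares_b = sums_from_summary(30, 80, 6)   # class B

# Combine the classes by adding counts, sums, and sums of squares.
n_all = 20 + 30
mean_all = (sum_a + sum_b) / n_all
sd_all = math.sqrt((squares_a + squares_b) / n_all - mean_all ** 2)

print(sum_a, sum_b, squares_a, squares_b)  # 1400 2400 100000 193080
print(mean_all, round(sd_all, 2))  # 76.0 9.25
```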
