LESSON 4 Measures of Central Tendency
Introduction
Histograms and frequency polygons give a general idea of how data are distributed. When data sets are to be compared, or further statistical analysis is to be carried out, exact numerical measures are required to describe the characteristics of the data. These numerical measures are also referred to as summary statistics.
In this lesson and the next, we discuss two kinds of measures that describe the characteristics of a distribution: measures of central tendency and measures of dispersion.
LEARNING OUTCOMES
Upon the completion of this lesson, you should be able to:
a) discuss mean, mode and median as measures of central tendency;
b) discuss the advantages and disadvantages of each of the central tendency values;
c) find the mean, mode and median for ungrouped and grouped data from a given data set.
Measure of Central Tendency
When statisticians study a group of measurements, they try to determine which value is most representative of the group. A measure of central tendency is a value about which most of the other values tend to cluster; it locates the center of the data. Such measures are commonly known as averages. We will discuss three averages. They are the
i) arithmetic mean (or simply the mean),
ii) median, and
iii) mode.
Let us see how these measures are calculated from raw data and from ungrouped and grouped frequency distributions. Note that all the measures presented here are computed from sample data.
Raw Data and Ungrouped Frequency Distributions
Arithmetic mean
If we have a sample of n observations, $x_1, x_2, x_3, \ldots, x_n$, the sample mean, denoted by $\bar{X}$, is defined as the sum of all observations divided by the sample size.

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Referring to example 3, the mean number of train stations a passenger passes before alighting from the train is

$$\bar{X} = \frac{3+4+4+2+2+4+1+2+2+0+3+2+3+2+1+3+2+2+1+2}{20} = \frac{45}{20} = 2.25$$
This means that on average, a passenger would pass approximately 2 train stations before
getting off the train.
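As a quick check of the arithmetic above, the same calculation can be sketched in Python. The names `stations` and `sample_mean` are illustrative choices, not part of the lesson.

```python
# Number of train stations passed by each of the 20 passengers (values from the text).
stations = [3, 4, 4, 2, 2, 4, 1, 2, 2, 0, 3, 2, 3, 2, 1, 3, 2, 2, 1, 2]


def sample_mean(values):
    """Return the sum of the observations divided by the sample size."""
    return sum(values) / len(values)


print(sum(stations), len(stations))  # 45 20
print(sample_mean(stations))         # 2.25
```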
From the ungrouped frequency distribution in Table 1, we calculate the mean using the following formula:

$$\bar{X} = \frac{\sum_{i=1}^{k} f_i x_i}{n},$$

where $n = \sum_{i=1}^{k} f_i$ and k = the number of classes (here, the number of distinct values of X).
Table 1
Number of train stations passed, X    Number of passengers, f    fX
0                                     1                          0
1                                     3                          3
2                                     9                          18
3                                     4                          12
4                                     3                          12
Total                                 Σf = 20                    ΣfX = 45
Hence, we have the mean,

$$\bar{X} = \frac{45}{20} = 2.25$$
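The frequency-weighted formula can be sketched in a few lines of Python. The dictionary `table1` simply restates Table 1 as value-to-frequency pairs; the function name `mean_from_frequencies` is an illustrative assumption.

```python
# Table 1 restated as {number of stations passed: number of passengers}.
table1 = {0: 1, 1: 3, 2: 9, 3: 4, 4: 3}


def mean_from_frequencies(freq):
    """Mean of an ungrouped frequency distribution: sum(f * x) / sum(f)."""
    n = sum(freq.values())                        # total frequency
    total = sum(x * f for x, f in freq.items())   # sum of f * x
    return total / n


print(mean_from_frequencies(table1))  # 2.25
```

The result agrees with the mean obtained from the raw data, because an ungrouped frequency distribution loses no information.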
In obtaining the mean, all the observations from the sample or population are considered. Therefore, if there are extreme values (either very large or very small), the mean is pulled towards them and is not a suitable measure to represent the distribution of the data. The median would then be a better measure of central tendency.
Example 1
The marks of five candidates in a mathematics test with a maximum possible mark of 20 are given below.
15, 13, 19, 18, 14
Find the mean value.
Solution:

$$\bar{X} = \frac{15 + 13 + 19 + 18 + 14}{5} = \frac{79}{5} = 15.8$$

So, the mean value is 15.8.
Example 2
A survey was taken in Mathematics class regarding the number of story books read by each student in January. The table shows the class data with the frequency of responses. The mean of this data is 2.5. Find the value of k in the table.
Books        1    2    3    4    5
Frequency    5    k    8    4    1

Solution

$$\frac{1(5) + 2(k) + 3(8) + 4(4) + 5(1)}{5 + k + 8 + 4 + 1} = 2.5$$

$$\frac{50 + 2k}{18 + k} = 2.5$$

$$50 + 2k = 45 + 2.5k$$

$$5 = 0.5k$$

$$k = 10$$
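The value of k can also be checked by brute force. The sketch below assumes that k is a whole number between 0 and 100 (a search range chosen only for illustration) and tests each candidate with exact fractions, so no rounding is involved. The name `mean_books` is hypothetical.

```python
from fractions import Fraction


def mean_books(k):
    """Mean number of books read when the frequency of '2 books' is k."""
    freq = {1: 5, 2: k, 3: 8, 4: 4, 5: 1}
    total = sum(books * f for books, f in freq.items())
    n = sum(freq.values())
    return Fraction(total, n)  # exact arithmetic


# Assumed search range: whole numbers from 0 to 100.
solutions = [k for k in range(101) if mean_books(k) == Fraction(5, 2)]
print(solutions)  # [10]
```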
Median
The median, denoted by $\tilde{X}$, is the middle value of the observations after they have been arranged in ascending or descending order. If the number of observations is odd, the median is the middle value; if the number of observations is even, the median is the mean of the two middle values.
Let’s take the data from example 3 and arrange them in ascending order as follows.
0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4
In this case, the number of observations is even; therefore the median lies midway between the tenth and the eleventh values. Hence,

$$\tilde{X} = \frac{2 + 2}{2} = 2.$$

In other words, at least half of the observations are at or below the median and at least half are at or above it. Because the median depends only on the middle of the ordered data, it is not affected by extreme values.
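In an ungrouped frequency distribution, the median is obtained by accumulating the frequencies until 50% of the total frequency is reached. Applying this to the ungrouped frequency distribution in Table 1 gives the same value: the cumulative frequency reaches 13 at X = 2, so both the tenth and the eleventh observations equal 2.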
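Both routes to the median can be sketched in Python. The first function sorts the raw data; the second walks through the cumulative frequencies of an ungrouped frequency distribution. All names (`median`, `median_from_frequencies`, `stations`, `table1`) are illustrative, and the sketch assumes a non-empty data set.

```python
stations = [3, 4, 4, 2, 2, 4, 1, 2, 2, 0, 3, 2, 3, 2, 1, 3, 2, 2, 1, 2]
table1 = {0: 1, 1: 3, 2: 9, 3: 4, 4: 3}  # value -> frequency


def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def median_from_frequencies(freq):
    """Median of an ungrouped frequency distribution {value: frequency}."""
    n = sum(freq.values())
    # 1-based positions of the middle observation(s); equal when n is odd.
    lower_pos, upper_pos = (n + 1) // 2, n // 2 + 1
    cumulative = 0
    lower = upper = None
    for x in sorted(freq):
        cumulative += freq[x]
        if lower is None and cumulative >= lower_pos:
            lower = x
        if cumulative >= upper_pos:
            upper = x
            break
    return (lower + upper) / 2


print(median(stations))                 # 2.0
print(median_from_frequencies(table1))  # 2.0
```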
Example 3
The marks of five candidates in a geography test for which the maximum possible mark was 20 are given below:
19, 18, 16, 15, 20
Find the median mark.
Solution:
Arrange the marks in ascending order of magnitude:
15, 16, 18, 19, 20
The third score, 18, is the middle one in this arrangement.
Median = 18
Note:
In general:
If the number of values in the data set is even, then the median is the average of the two
middle values.
Example 4
Find the median of the following scores:
11, 17, 15, 20, 9, 12
Solution:
Arrange the score values in ascending order of magnitude:
9, 11, 12, 15, 17, 20
There are 6 scores in the data set. The third and fourth scores, 12 and 15, are in the middle; that is, there is no single middle value. The median is therefore the mean of these two values:

$$\text{Median} = \frac{12 + 15}{2} = 13.5$$
Note:
Half of the values in the data set lie below the median and half lie above the median.
Mode
The mode, denoted by $\hat{X}$, is the most frequently occurring value among the observations. For the data in example 3, the most frequently occurring value is 2. This means that more passengers pass 2 train stations before alighting than any other number of stations.
In an ungrouped frequency distribution, we determine the mode by looking for the highest frequency, in this case 9. Hence the mode is 2.
The concept of the mode is easy to understand and its value is simple to obtain. The mode is not affected by extreme values. Its disadvantage as a measure of central tendency is that it might not exist or might not be unique: a set of observations can have no mode, one mode, or two or more modes.
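The three possibilities (no mode, one mode, several modes) can be illustrated with a short Python sketch. It assumes the common convention that a data set in which every distinct value occurs equally often has no mode; the function name `modes` and the small test lists are illustrative, and the input is assumed to be non-empty.

```python
from collections import Counter


def modes(values):
    """Return all modes in ascending order, or [] when there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    # Assumed convention: if several distinct values all occur equally often,
    # no value stands out, so the data set is treated as having no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(x for x, c in counts.items() if c == highest)


stations = [3, 4, 4, 2, 2, 4, 1, 2, 2, 0, 3, 2, 3, 2, 1, 3, 2, 2, 1, 2]
print(modes(stations))              # [2]
print(modes([1, 1, 2, 2, 3]))       # [1, 2]
print(modes([15, 13, 19, 18, 14]))  # []
```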
Example 5
The marks awarded to seven pupils for an assignment were as follows:
19, 15, 19, 16, 13, 20, 19
a. Find the median mark.
b. State the mode.
Solution:
a. Arrange the marks in ascending order of magnitude:
13, 15, 16, 19, 19, 19, 20
The fourth score, 19, is the middle data value in this arrangement.
Median = 19
b. 19 is the score that occurs most often.
Mode = 19
Grouped Frequency Distributions
For data that have been summarized in grouped frequency distributions, the measures of
central tendency computed are only estimates of the true value. This is because accuracy
has been lost when summarizing the data.
Arithmetic Mean
In finding the mean from a grouped frequency distribution, we choose one value from
each class interval as a representative. This value is the class mark. We denote the class
mark as m. The formula for the mean is given below.

$$\bar{X} = \frac{\sum_{i=1}^{k} f_i m_i}{n},$$

where $n = \sum_{i=1}^{k} f_i$ and k = number of class intervals
Using the grouped frequency distribution in Table 1, the mean speed of the 55 cars can be estimated. We add more columns to the table so that the values to be substituted into the formula are easy to read off.
Speed (km/h), X    Number of cars, f    Class mark, x    fx
45 – 49            4                    47               188
50 – 54            14                   52               728
55 – 59            19                   57               1083
60 – 64            7                    62               434
65 – 69            5                    67               335
70 – 74            4                    72               288
75 – 79            2                    77               154
Total              Σf = 55                               Σfx = 3210

$$\bar{X} = \frac{3210}{55} \approx 58.36 \text{ km/h}$$
Notice that when we compare this value with the actual sample mean, there is a difference. This is due to the loss of accuracy that occurs when the data are grouped.
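The grouped-data estimate of the mean can be sketched in Python by taking the midpoint of each class as its class mark. The list `speed_classes` restates the speed table from the text; the function name `grouped_mean` is an illustrative assumption.

```python
# (lower limit, upper limit, frequency) for each speed class in km/h.
speed_classes = [
    (45, 49, 4), (50, 54, 14), (55, 59, 19), (60, 64, 7),
    (65, 69, 5), (70, 74, 4), (75, 79, 2),
]


def grouped_mean(classes):
    """Estimate the mean as sum(f * class mark) / sum(f)."""
    n = sum(f for _, _, f in classes)
    # The class mark is the midpoint of the lower and upper class limits.
    total = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total / n


print(round(grouped_mean(speed_classes), 2))  # 58.36
```

Because every observation in a class is replaced by the class mark, the result is only an estimate of the mean of the raw data.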
Example 6
Work out an estimate for the mean height.
Height (cm), X    Number of people, f    Midpoint, x    fx
101 – 120         1                      110.5          110.5
121 – 130         3                      125.5          376.5
131 – 140         5                      135.5          677.5
141 – 150         7                      145.5          1018.5
151 – 160         4                      155.5          622
161 – 170         2                      165.5          331
171 – 190         1                      180.5          180.5
Total             Σf = 23                               Σfx = 3316.5

$$\text{mean} = \bar{X} = \frac{\sum fx}{\sum f} = \frac{3316.5}{23} \approx 144 \text{ cm (3 s.f.)}$$
Median
To estimate the median from a grouped frequency distribution, we must first locate the class interval containing the $\frac{n}{2}$th observation. We call this class interval the median class. The estimated value of the median can be obtained using the following formula.

$$\tilde{X} = L + \left( \frac{\frac{n}{2} - \sum f_{m-1}}{f_m} \right) \cdot c$$

where
L = lower class boundary of the median class
$\sum f_{m-1}$ = cumulative frequency of classes before the median class
$f_m$ = frequency of the median class
c = width of the median class
Referring to the grouped frequency distribution in Table 1, the median class is 55 – 59.
Speed (km/h), X    Number of cars, f    Class mark, x    fx
45 – 49            4                    47               188
50 – 54            14                   52               728
55 – 59            19                   57               1083
60 – 64            7                    62               434
65 – 69            5                    67               335
70 – 74            4                    72               288
75 – 79            2                    77               154
Total              Σf = 55                               Σfx = 3210
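The following values are obtained and substituted into the formula for the median.
$\frac{n}{2} = \frac{55}{2} = 27.5$
Lower class boundary of the median class, L = 54.5
Cumulative frequency of the classes before the median class, $\sum f_{m-1}$ = 4 + 14 = 18
Frequency of the median class, $f_m$ = 19
Width of the median class, c = 5
Median speed,

$$\tilde{X} = 54.5 + \left( \frac{27.5 - 18}{19} \right) \cdot 5 = 57 \text{ km/h}.$$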
This means that, according to this estimate, about half of the cars are being driven at 57 km/h or less and about half are being driven above 57 km/h.
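The interpolation for the grouped median can be sketched in Python as follows. The sketch assumes the class limits are whole numbers, so that each class boundary lies half a unit beyond the stated limit (the `gap` parameter, an illustrative name, holds the distance between one class's upper limit and the next class's lower limit). The function name `grouped_median` is also an assumption.

```python
speed_classes = [
    (45, 49, 4), (50, 54, 14), (55, 59, 19), (60, 64, 7),
    (65, 69, 5), (70, 74, 4), (75, 79, 2),
]


def grouped_median(classes, gap=1):
    """Estimate the median of grouped data by linear interpolation.

    classes: (lower limit, upper limit, frequency) tuples in ascending order.
    gap: assumed distance between adjacent class limits (1 for whole numbers).
    """
    n = sum(f for _, _, f in classes)
    half = n / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lo, hi, f in classes:
        if cumulative + f >= half:  # this is the median class
            lower_boundary = lo - gap / 2
            width = (hi + gap / 2) - lower_boundary
            return lower_boundary + (half - cumulative) / f * width
        cumulative += f


print(grouped_median(speed_classes))  # 57.0
```

For the speed data the loop stops at the class 55 – 59 with a cumulative frequency of 18 before it, which matches the hand calculation.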
Mode
The class interval with the highest frequency is identified as the modal class. The formula for the mode of a grouped frequency distribution is

$$\hat{X} = L + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) \cdot c$$

where
L = lower class boundary of the modal class
$\Delta_1$ = frequency of the modal class – frequency of the class before the modal class
$\Delta_2$ = frequency of the modal class – frequency of the class after the modal class
c = width of the modal class
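The modal class for Table 1 is 55 – 59. It is a coincidence that the median class and the modal class are the same in this particular example; do not generalize from it. There are times when the median class and the modal class differ. We obtain the following values to be substituted into the formula for the mode.
Lower class boundary of the modal class, L = 54.5
$\Delta_1$ (modal class frequency minus the frequency of the class before it) = 19 – 14 = 5
$\Delta_2$ (modal class frequency minus the frequency of the class after it) = 19 – 7 = 12
Width of the modal class, c = 5

$$\hat{X} = 54.5 + \left( \frac{5}{5 + 12} \right) \cdot 5 \approx 55.97 \text{ km/h}$$

This means that the most common speed is approximately 56 km/h, which is below the city speed limit.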
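The grouped-mode formula can be sketched in Python in the same style. As before, whole-number class limits are assumed (the `gap` parameter), and a missing neighbouring class at either end of the table is treated as having frequency 0; both are assumptions of this sketch rather than statements from the lesson. The function name `grouped_mode` is illustrative.

```python
speed_classes = [
    (45, 49, 4), (50, 54, 14), (55, 59, 19), (60, 64, 7),
    (65, 69, 5), (70, 74, 4), (75, 79, 2),
]


def grouped_mode(classes, gap=1):
    """Estimate the mode of grouped data from the modal class and its neighbours."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))  # first class with the highest frequency
    # Assumption: a missing neighbour (first or last class) counts as frequency 0.
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = freqs[i] - before
    d2 = freqs[i] - after
    lo, hi, _ = classes[i]
    lower_boundary = lo - gap / 2
    width = (hi + gap / 2) - lower_boundary
    # Note: d1 + d2 is zero when both neighbours match the modal frequency;
    # the formula is undefined in that case and this sketch does not handle it.
    return lower_boundary + d1 / (d1 + d2) * width


print(round(grouped_mode(speed_classes), 2))  # 55.97
```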
When computing the mean, all the observations are taken into consideration, but computing the median involves only the scores in the middle of the distribution. Hence, whenever there are extreme scores in the distribution, or when the distribution is skewed, the median is a better measure of the average.
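The effect of an extreme score can be seen in a small Python sketch. The data below are hypothetical and chosen only for illustration; `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical data, not taken from the lesson.
typical = [12, 14, 15, 16, 18]
with_outlier = typical + [150]  # the same scores plus one extreme value

mean_before, median_before = mean(typical), median(typical)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(mean_before, median_before)
print(mean_after, median_after)
```

Adding the single extreme value moves the mean from 15 to 37.5, while the median only shifts from 15 to 15.5.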
Advantages of the MEAN:
- The calculation of the arithmetic mean is based on all values given in the data set.
- The calculation of the arithmetic mean is simple and the result is unique; that is, every data set has one and only one mean.
- The arithmetic mean is a reliable single value that reflects all values in the data set.
- It can be used for further statistical calculations and mathematical manipulations.
Disadvantages of the MEAN
- It is easily affected by extreme values.
- In grouped data with open-ended class intervals, the mean cannot be computed.
- It is not appropriate for highly skewed data.
Advantages of the MEDIAN
- It is very simple to understand and easy to calculate. In some cases it is obtained simply by inspection.
- The median lies at the middle part of the series and hence it is not affected by the extreme values.
- It can be computed even for grouped data with open-ended class intervals.
- In a grouped frequency distribution it can be located graphically by drawing ogives.
- It is especially useful in open-ended distributions, since it is the position rather than the value of an item that matters for the median.
Disadvantages of the MEDIAN
- In a simple data set, the item values have to be arranged in order. If the series contains a large number of items, the process becomes tedious.
- It is a less representative average because it does not depend on all the items in the series.
- Observations from different data sets have to be merged to obtain a new median, whether grouped or ungrouped data are involved.
- In a simple data set with an even number of items, the median cannot be located exactly; it is taken as the mean of the two middle values.
- Moreover, the interpolation formula applied to grouped data is based on the unrealistic assumption that the frequency of the median class is evenly spread over the width of the class interval.
Advantages of the MODE
- The mode is easy to understand and to calculate. The modal class can also be located by inspection.
- The mode is not affected by the extreme values in the distribution. It can also be calculated for open-ended frequency distributions.
- The mode can be used to describe quantitative as well as qualitative data. For example, it is used for comparing consumer preferences for various types of products, say cigarettes, soaps, toothpastes, or other products.
Disadvantages of the MODE
- The mode is not a rigidly defined measure, as there are several methods for calculating its value.
- It is difficult to locate the modal class in the case of multimodal frequency distributions.
- The mode is not suitable for algebraic manipulations.
Skewness Definition
Skewness is defined in relation to symmetry; in fact, it is the lack of symmetry. Symmetry describes the shape of a distribution when it is represented graphically. A distribution is said to be symmetric if it looks the same on the left and right sides of the center point (refer figure 1). The vertical line through the center point is called the axis of symmetry. Figure 1 shows an example of a symmetric distribution.
Figure 1
In a symmetric distribution with a single peak, the mean, median and mode are equal to each other, and the axis of symmetry, which is the ordinate at the mean, divides the distribution into two equal parts such that one side is a mirror image of the other.
So we can define skewness as a measure of the asymmetry of a distribution; that is, it measures how far the distribution departs from symmetry. It also indicates which side of the distribution has the longer tail.
On the basis of its shape, the skewness of a distribution can be classified in three ways:
Positive skewness: If the right tail is longer than the left tail in the graph of the distribution, the distribution is said to have positive skewness (refer figure 2). The presence of extreme observations on the right-hand side of a distribution makes it positively skewed.
So, if mean > median > mode in a distribution, it can be said to be positively skewed.
Figure 2
When the distribution is skewed to the right (positively skewed), the mean is the largest of the three averages, followed by the median and then the mode. Why is this so? The mean is pulled towards the extreme values in the right tail, whereas the median and the mode are not. Therefore, when we have a positively skewed distribution, the mean is not a suitable average to describe the distribution. The median and the mode would be more appropriate.
Negative skewness: If the left tail is longer than the right tail in the graph of the distribution, then the distribution has negative skewness (refer figure 3). The presence of extreme observations on the left-hand side of a distribution makes it negatively skewed.
So, if mean < median < mode in a distribution, it can be said to be negatively skewed.
Figure 3
Zero skewness: If the two tails are of the same length and shape, then we say that the distribution has zero skewness (figure 4). The distribution is then symmetric; the normal distribution is one example.
Figure 4
It is useful to report all three measures because their relative positions give some idea about the shape of the distribution. The common cases encountered, and the corresponding graphical representations of skewness, are summarized below.
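The comparison of the three averages can be sketched in Python. The function below applies only the ordering rules stated in this lesson (mean > median > mode, mean < median < mode, or all three equal); it is a rough indication of shape, not a formal measure of skewness. The data sets are hypothetical and each has a single mode, and the name `describe_shape` is illustrative.

```python
from statistics import mean, median, mode


def describe_shape(values):
    """Classify the shape of a data set from the order of its three averages."""
    m, md, mo = mean(values), median(values), mode(values)
    if m > md > mo:
        return "positively skewed"
    if m < md < mo:
        return "negatively skewed"
    if m == md == mo:
        return "symmetric (zero skewness)"
    return "inconclusive"


# Hypothetical data sets, each with a single mode.
right_tail = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
left_tail = [15 - x for x in right_tail]  # mirror image of right_tail
balanced = [1, 2, 2, 3, 3, 3, 4, 4, 5]

print(describe_shape(right_tail))  # positively skewed
print(describe_shape(left_tail))   # negatively skewed
print(describe_shape(balanced))    # symmetric (zero skewness)
```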
Exercise 1
1. The table displays the frequency of scores on a Science quiz (max score 10). Find the median of the scores.
Score        5    6    7    8     9     10
Frequency    1    5    8    14    12    7
2. The table displays the number of cars owned in a family among students in Form 5 Science 1. Find the mean, median and mode of the cars owned per family for this data set.
Express answers to the nearest hundredth.
Cars owned    0    1    2    3    4     5
Frequency     2    5    4    6    10    8
3. Four students take an IQ test. Their scores are 96, 100, 106, 114. Find the mean and median scores.
4. Find the mean, median and mode for this grouped data of test scores.
Scores    Frequency
65        2
70        3
75        2
80        5
85        8
90        7
95        5
100       3
5. The table shows the number of hours (x) children spent watching television in a week.
Class Interval    Midpoint (x)    Frequency (f)    fx
10 ≤ x < 15                       42
15 ≤ x < 20                       38
20 ≤ x < 25                       45
25 ≤ x < 30                       38
                                  Σf =             Σfx =
First complete the table. Using the information, answer the questions:
(a) Find the modal class.
(b) Estimate the median hours of watching TV. (whole number)
(c) Estimate the mean hours of watching TV. (2 d.p.)
6. The table shows the number of visits (x) to the doctor that patients at a surgery make in a year.
Class Interval    Midpoint (x)    Frequency (f)    fx
0 ≤ x < 5                         5
5 ≤ x < 10                        47
10 ≤ x < 15                       11
                                  Σf =             Σfx =
First complete the table. Using this information, answer the following questions.
(a) What is the modal class?
(b) Estimate the median number of visits made to the surgery in a year. (whole number)
(c) Estimate the mean number of visits made to the surgery in a year. (2 d.p.)
7. The following frequency distribution shows the quiz scores of a sample of students:
Score      Frequency
14 – 18    2
19 – 23    5
24 – 28    12
29 – 33    1
For the above data, compute the following.
a. The mean
b. The standard deviation