Lesson 4 Measure of Central Tendency

LESSON 4
Measures of Central Tendency

Introduction

Histograms and polygons give a general idea of how a data set is distributed. When
data sets are to be compared, or further statistical analysis is to be done,
exact measures are required to describe the characteristics of the data. These numerical
measures are also referred to as summary statistics.

In this lesson and the next, we will discuss two kinds of measures that can be used to
describe the characteristics of a distribution: measures of central tendency and
measures of dispersion.

LEARNING OUTCOMES

Upon the completion of this lesson, you should be able to:

a) discuss mean, mode and median as measures of central tendency;
b) discuss the advantages and disadvantages of each of the central tendency values;
c) find the mean, mode and median for ungrouped and grouped data from a given data set.

Measure of Central Tendency

When statisticians study a group of measurements, they try to determine which value
is most representative of the group. A measure of central tendency is the score about
which most of the other scores tend to cluster: a single value that represents the data
and pinpoints its center. These measures are commonly known as averages.
We will discuss three averages. They are the
i) arithmetic mean (or simply the mean),
ii) median, and
iii) mode.
Let us see how these measures are calculated from raw data and from ungrouped and grouped
frequency distributions. Note that all the measures presented here are computed from
sample data.

Raw Data and Ungrouped Frequency Distributions

Arithmetic mean
If we have a sample of n observations, $x_1, x_2, x_3, \ldots, x_n$, the sample mean, denoted
by $\bar{X}$, is defined as the sum of all observations divided by the sample size.

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Referring to example 3, the mean number of train stations a passenger passes before
alighting from the train is

$$\bar{X} = \frac{2+1+2+2+3+1+2+3+2+3+0+2+2+1+4+2+2+4+4+3}{20} = \frac{45}{20} = 2.25$$

This means that on average, a passenger would pass approximately 2 train stations before
getting off the train.
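
The calculation above can be reproduced with a few lines of Python. This is only a minimal sketch: the names `stations` and `sample_mean` are chosen here for illustration and do not come from the lesson.

```python
# Number of train stations passed by each of the 20 passengers (data from the text).
stations = [2, 1, 2, 2, 3, 1, 2, 3, 2, 3, 0, 2, 2, 1, 4, 2, 2, 4, 4, 3]


def sample_mean(values):
    """Sum of all observations divided by the sample size."""
    return sum(values) / len(values)


mean_stations = sample_mean(stations)
print(sum(stations), len(stations), mean_stations)  # 45 20 2.25
```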

From the ungrouped frequency distribution in Table 1, we calculate the mean using the
following formula:

$$\bar{X} = \frac{\sum_{i=1}^{k} f_i x_i}{n}, \quad \text{where } n = \sum_{i=1}^{k} f_i \text{ and } k = \text{the number of class intervals.}$$
Table 1

Number of train stations passed, X    Number of passengers, f    fX
0                                     1                          0
1                                     3                          3
2                                     9                          18
3                                     4                          12
4                                     3                          12
Total                                 Σf = 20                    ΣfX = 45

Hence, we have the mean, $\bar{X} = \frac{45}{20} = 2.25$.
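
The same result can be obtained directly from the (value, frequency) pairs of Table 1. The sketch below is illustrative only; `table1` and `frequency_mean` are names assumed here, not part of the lesson.

```python
# Table 1 as (number of stations passed, number of passengers) pairs.
table1 = [(0, 1), (1, 3), (2, 9), (3, 4), (4, 3)]


def frequency_mean(pairs):
    """Mean of an ungrouped frequency distribution: sum of f*x divided by sum of f."""
    n = sum(f for _, f in pairs)
    total = sum(x * f for x, f in pairs)
    return total / n


table_mean = frequency_mean(table1)
print(table_mean)  # 2.25
```
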
In obtaining the mean, all the observations from the sample or population are considered.
Therefore, if there are extreme values (either very large or very small), the mean is
pulled toward them and is not a suitable measure to represent the distribution of the
data. The median would be a better measure of central tendency.
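
To see the effect of an extreme value, the following sketch compares the mean and the median before and after one very large observation is added. The data are hypothetical and chosen only for illustration; the functions come from Python's standard `statistics` module.

```python
import statistics

# Hypothetical data, not taken from the lesson.
values = [2, 3, 3, 4, 5]
with_outlier = values + [50]  # the same data plus one extreme value

mean_before = statistics.mean(values)
median_before = statistics.median(values)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(mean_before, median_before)           # 3.4 3
print(round(mean_after, 2), median_after)   # 11.17 3.5
```

The single extreme value moves the mean from 3.4 to about 11.17, while the median only moves from 3 to 3.5.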

Example 1
The marks of five candidates in a mathematics test with a maximum possible mark of 20
are given below.
15 13 19 18 14
Find the mean value.
Solution:

$$\bar{X} = \frac{15 + 13 + 19 + 18 + 14}{5} = \frac{79}{5} = 15.8$$

So, the mean value is 15.8.

Example 2
A survey was taken in a Mathematics class regarding the number of story books read by
each student in January. The table shows the class data with the frequency of responses.
The mean of these data is 2.5. Find the value of k in the table.
Books 1 2 3 4 5
Frequency 5 k 8 4 1
Solution
$$\frac{1(5) + 2(k) + 3(8) + 4(4) + 5(1)}{5 + k + 8 + 4 + 1} = 2.5$$

$$\frac{50 + 2k}{18 + k} = 2.5$$

$$50 + 2k = 45 + 2.5k$$

$$0.5k = 5$$

$$k = 10$$
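
The answer can be checked numerically. The sketch below tries each whole-number frequency in an assumed search range of 0 to 100 and keeps the values of k for which the mean is exactly 2.5; `class_mean` is a helper name introduced here for illustration.

```python
from fractions import Fraction

books = [1, 2, 3, 4, 5]


def class_mean(k):
    """Exact mean number of books when the unknown frequency is k."""
    freqs = [5, k, 8, 4, 1]
    return Fraction(sum(b * f for b, f in zip(books, freqs)), sum(freqs))


target = Fraction(5, 2)  # the stated mean, 2.5
# The range 0..100 is an assumption made for this check only.
solutions = [k for k in range(0, 101) if class_mean(k) == target]
print(solutions)  # [10]
```

Exact fractions are used so that the comparison with 2.5 is not affected by floating-point rounding.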

Median
The median, denoted by $\tilde{X}$, is the middle value of the observations after they have
been arranged in ascending or descending order. If the number of observations is odd, the
median is the middle value; if the number of observations is even, the median is
the mean of the two middle values.

Let us take the data from example 3 and arrange them in ascending order as follows.
0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4
In this case, the number of observations is even; therefore the median lies midway
between the tenth and the eleventh values. Hence, $\tilde{X} = \frac{2 + 2}{2} = 2$. In other
words, 50% of the observations lie at or below the median and the other 50% lie at or
above it. Since the median only divides the ordered observations into two halves, it is
not affected by extreme values.
In an ungrouped frequency distribution, the median is obtained by looking at the point
where 50% of the cumulative frequency lies. The same value is obtained using the ungrouped
frequency distribution in Table 1.
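
The odd/even rule for the median can be sketched as follows. The function names are chosen here for illustration; the frequency-table version simply expands the table back into a list of observations, which is one straightforward way to apply the rule, not the only one.

```python
stations = [2, 1, 2, 2, 3, 1, 2, 3, 2, 3, 0, 2, 2, 1, 4, 2, 2, 4, 4, 3]
table1 = [(0, 1), (1, 3), (2, 9), (3, 4), (4, 3)]  # (value, frequency)


def median(values):
    """Middle value of the ordered data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def frequency_median(pairs):
    """Median of an ungrouped frequency distribution, found by expanding the table."""
    expanded = [x for x, f in sorted(pairs) for _ in range(f)]
    return median(expanded)


print(median(stations))          # 2.0
print(frequency_median(table1))  # 2.0
```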

Example 3
The marks of five candidates in a geography test for which the maximum possible mark
was 20 are given below:
19 18 16 15 20
Find the median mark.
Solution:
Arrange the marks in ascending order of magnitude:
15 16 18 19 20
The third score, 18, is the middle one in this arrangement.
Median = 18
Note:

In general:

If the number of values in the data set is even, then the median is the average of the two
middle values.

Example 4
Find the median of the following scores:
11 17 15 20 9 12
Solution:
Arrange the score values in ascending order of magnitude:
9 11 12 15 17 20
There are 6 scores in the data set.

The third and fourth scores, 12 and 15, are in the middle. That is, there is no single
middle value, so the median is the mean of these two values:

$$\text{Median} = \frac{12 + 15}{2} = 13.5$$

Note:
Half of the values in the data set lie below the median and half lie above the median.

Mode
The mode, denoted by $\hat{X}$, is the most frequently occurring value among the observations.
For the data in example 3, we find that the most frequently occurring value is 2. This
means that more passengers pass 2 train stations before alighting than any other number.

In an ungrouped frequency distribution, we determine the mode by looking at the highest
frequency, in this case 9. Hence the mode is 2.

The concept of mode is easy to understand and simple to obtain. Mode is not affected by
extreme values. The disadvantage of this measure for central tendency is that it might not
exist. A set of observations can have no mode, one mode, two or more modes.
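
The three situations (no mode, one mode, several modes) can be illustrated with a short sketch. The function `modes` is a name assumed here, and it adopts one common convention: when every value occurs equally often, the data set is reported as having no mode.

```python
from collections import Counter

stations = [2, 1, 2, 2, 3, 1, 2, 3, 2, 3, 0, 2, 2, 1, 4, 2, 2, 4, 4, 3]


def modes(values):
    """Return the list of modes; an empty list means 'no mode'."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    # Convention assumed here: if all distinct values tie, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes(stations))         # [2]
print(modes([1, 2, 3]))        # []
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
```
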
Example 5
The marks awarded to seven pupils for an assignment were as follows:
19 15 19 16 13 20 19
a. Find the median mark.
b. State the mode.

Solution:
a. Arrange the marks in ascending order of magnitude:
13 15 16 19 19 19 20

Note:
The fourth score, 19, is the middle data value in this arrangement.
Median = 19 [19 is the middle data value]
b. 19 is the score that occurs most often.
Mode = 19

Grouped Frequency Distributions

For data that have been summarized in grouped frequency distributions, the measures of
central tendency computed are only estimates of the true value. This is because accuracy
has been lost when summarizing the data.

Arithmetic Mean
In finding the mean from a grouped frequency distribution, we choose one value from
each class interval as a representative. This value is the class mark, the midpoint of
the class interval. We denote the class mark as m. The formula for the mean is given below.

$$\bar{X} = \frac{\sum_{i=1}^{k} f_i m_i}{n}, \quad \text{where } n = \sum_{i=1}^{k} f_i \text{ and } k = \text{number of class intervals}$$
Using the grouped frequency distribution in Table 1, the mean speed of the 55 cars can be
estimated. We add more columns to the table so that the values to be substituted into
the formula are easy to read off.

Speed (km/h), X    Number of cars, f    Class mark, x    fx
45 - 49            4                    47               188
50 - 54            14                   52               728
55 - 59            19                   57               1083
60 - 64            7                    62               434
65 - 69            5                    67               335
70 - 74            4                    72               288
75 - 79            2                    77               154
Total              Σf = 55                               Σfx = 3210

$$\bar{X} = \frac{3210}{55} = 58.36 \text{ km/h}$$
Notice that when we compare this value with the actual sample mean, there is a
difference. This is due to the loss of accuracy when the data are grouped.
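
The estimate can be reproduced by taking the midpoint of each class as its class mark. This is a minimal sketch; `speed_classes` and `grouped_mean` are illustrative names, not part of the lesson.

```python
# Speed classes as ((lower limit, upper limit), number of cars).
speed_classes = [
    ((45, 49), 4), ((50, 54), 14), ((55, 59), 19), ((60, 64), 7),
    ((65, 69), 5), ((70, 74), 4), ((75, 79), 2),
]


def grouped_mean(classes):
    """Estimated mean: sum of f * class mark divided by sum of f."""
    n = sum(f for _, f in classes)
    total = sum((lo + hi) / 2 * f for (lo, hi), f in classes)
    return total / n


estimated_mean = grouped_mean(speed_classes)
print(round(estimated_mean, 2))  # 58.36
```
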
Example 6
Work out an estimate for the mean height.

Height (cm), X    Number of people, f    Midpoint, x    fx
101 - 120         1                      110.5          110.5
121 - 130         3                      125.5          376.5
131 - 140         5                      135.5          677.5
141 - 150         7                      145.5          1018.5
151 - 160         4                      155.5          622
161 - 170         2                      165.5          331
171 - 190         1                      180.5          180.5
Total             Σf = 23                               Σfx = 3316.5

$$\bar{X} = \frac{\sum fx}{\sum f} = \frac{3316.5}{23} \approx 144 \text{ cm (3 s.f.)}$$
Median
To estimate the median from a grouped frequency distribution, we must first locate the
class interval containing the $\frac{n}{2}$th observation. We call this class interval the
median class. The estimated value of the median can be obtained using the following formula.

$$\tilde{X} = L + \left(\frac{\frac{n}{2} - f_{m-1}}{f_m}\right) c$$

where
$L$ = lower class boundary of the median class
$f_{m-1}$ = cumulative frequency of the classes before the median class
$f_m$ = frequency of the median class
$c$ = width of the median class

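Referring to the grouped frequency distribution in Table 1, the median class is 55 - 59:
18 cars fall in the classes below it and 37 cars fall in the classes up to and including
it, so the 27.5th observation lies in this class.

Speed (km/h), X    Number of cars, f    Class mark, x    fx
45 - 49            4                    47               188
50 - 54            14                   52               728
55 - 59            19                   57               1083
60 - 64            7                    62               434
65 - 69            5                    67               335
70 - 74            4                    72               288
75 - 79            2                    77               154
Total              Σf = 55                               Σfx = 3210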

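The following values are obtained and substituted into the formula for the median.

$\frac{n}{2} = \frac{55}{2} = 27.5$
$L = 54.5$ (lower class boundary of the median class)
$f_{m-1} = 4 + 14 = 18$ (cumulative frequency before the median class)
$f_m = 19$ (frequency of the median class)
$c = 5$ (width of the median class)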
Median speed, $\tilde{X} = 54.5 + \left(\frac{27.5 - 18}{19}\right)(5) = 57$ km/h.
This means that about 50% of the cars are being driven at 57 km/h or less and about
50% of the cars are being driven above 57 km/h.
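
The interpolation for the grouped median can be sketched in code. The function name `grouped_median` and the parameter `gap` are assumptions made for this illustration: `gap` is the distance between one class's upper limit and the next class's lower limit (1 for classes such as 45 - 49 and 50 - 54), so the class boundaries lie half a gap beyond the limits.

```python
# Speed classes as ((lower limit, upper limit), number of cars), in ascending order.
speed_classes = [
    ((45, 49), 4), ((50, 54), 14), ((55, 59), 19), ((60, 64), 7),
    ((65, 69), 5), ((70, 74), 4), ((75, 79), 2),
]


def grouped_median(classes, gap=1):
    """Estimate the median by linear interpolation inside the median class."""
    n = sum(f for _, f in classes)
    half = n / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for (lo, hi), f in classes:
        if cumulative + f >= half:
            lower_boundary = lo - gap / 2
            width = (hi + gap / 2) - lower_boundary
            return lower_boundary + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("the distribution has no observations")


print(grouped_median(speed_classes))  # 57.0
```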

Mode
The class interval with the highest frequency is identified as the modal class. The formula
for the mode of a grouped frequency distribution is

$$\hat{X} = L + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) c$$

where
$L$ = lower class boundary of the modal class
$\Delta_1$ = frequency of the modal class minus the frequency of the class before the modal class
$\Delta_2$ = frequency of the modal class minus the frequency of the class after the modal class
$c$ = width of the modal class

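The modal class for Table 1 is 55 - 59. It is a coincidence that the median class and the
modal class are the same in this particular example; do not generalize from it. There are
times when the median class and the modal class are not the same. We obtain the following
values to be substituted into the formula for the mode.

$L = 54.5$ (lower class boundary of the modal class)
$\Delta_1 = 19 - 14 = 5$ (modal class frequency minus the frequency of the class before it)
$\Delta_2 = 19 - 7 = 12$ (modal class frequency minus the frequency of the class after it)
$c = 5$ (width of the modal class)

$$\hat{X} = 54.5 + \left(\frac{5}{5 + 12}\right)(5) = 55.97 \text{ km/h}$$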

This means that most cars are being driven at approximately 56 km/h, which is below the
city speed limit.
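
The grouped mode calculation can be sketched in the same way. The names `grouped_mode` and `gap` are assumptions for this illustration, and so is the choice to treat a missing neighbouring class (when the modal class is the first or last one) as having frequency 0.

```python
# Speed classes as ((lower limit, upper limit), number of cars), in ascending order.
speed_classes = [
    ((45, 49), 4), ((50, 54), 14), ((55, 59), 19), ((60, 64), 7),
    ((65, 69), 5), ((70, 74), 4), ((75, 79), 2),
]


def grouped_mode(classes, gap=1):
    """Estimate the mode from the modal class and its two neighbouring classes."""
    freqs = [f for _, f in classes]
    i = freqs.index(max(freqs))  # first class with the highest frequency
    (lo, hi), f_modal = classes[i]
    before = freqs[i - 1] if i > 0 else 0               # assumption: 0 if no class before
    after = freqs[i + 1] if i < len(freqs) - 1 else 0   # assumption: 0 if no class after
    d1 = f_modal - before
    d2 = f_modal - after
    lower_boundary = lo - gap / 2
    width = (hi + gap / 2) - lower_boundary
    # If both neighbours have the same frequency as the modal class, d1 + d2 is 0
    # and the formula is undefined; that case is not handled in this sketch.
    return lower_boundary + d1 / (d1 + d2) * width


print(round(grouped_mode(speed_classes), 2))  # 55.97
```
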
When computing the mean, all the observations are taken into consideration, but
computing the median only involves the scores in the middle of the distribution. Hence,
whenever we have extreme scores in the distribution, or when the distribution is skewed,
the median is a better measure for the average.

Advantages of the MEAN:

- The calculation of the arithmetic mean is based on all values given in the data set.
- The calculation of the arithmetic mean is simple and the result is unique, that is,
  every data set has one and only one mean.
- The arithmetic mean is a reliable single value that reflects all values in the data
  set.
- It can be used for further statistical calculations and mathematical manipulations.

Disadvantages of the MEAN

- It is easily affected by extreme values.
- In grouped data with open-ended class intervals, the mean cannot be computed, because
  an open-ended class has no class mark.
- It is not appropriate for highly skewed data.

Advantages of the MEDIAN

- It is very simple to understand and easy to calculate. In some cases it is obtained
  simply by inspection.
- The median lies at the middle of the series and hence it is not affected by
  extreme values.
- It can be computed even for grouped data with open-ended class intervals, since it is
  the position rather than the value of an item that matters for the median.
- In a grouped frequency distribution it can be located graphically by drawing ogives.

Disadvantages of the MEDIAN

- In a simple data set, the item values have to be arranged in order. If the series
  contains a large number of items, this process becomes tedious.
- It is a less representative average because it does not depend on all the items in
  the series.
- Observations from different data sets have to be merged to obtain a new median,
  whether grouped or ungrouped data are involved.
- In a simple data set with an even number of items, the median is not one of the
  observed values; it has to be taken as the mean of the two middle values.
- The interpolation formula applied to grouped data is based on the assumption, often
  unrealistic, that the observations in the median class are evenly spread over the
  class interval.
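
The contrast between the mean and the median when data sets are combined can be illustrated with a short sketch. The two groups below are hypothetical data chosen for illustration: the combined mean follows from the group means and sizes alone, whereas the combined median requires merging the observations.

```python
import statistics

group_a = [1, 2, 3]          # hypothetical data
group_b = [10, 20, 30, 40]   # hypothetical data

# Combined mean from the two group means and group sizes only.
mean_a = statistics.mean(group_a)
mean_b = statistics.mean(group_b)
n_a, n_b = len(group_a), len(group_b)
combined_mean = (n_a * mean_a + n_b * mean_b) / (n_a + n_b)

# Combined median: the observations themselves must be merged first.
merged = group_a + group_b
combined_median = statistics.median(merged)

print(statistics.median(group_a), statistics.median(group_b))  # 2 25.0
print(combined_median)                                         # 10
```

Here the separate medians are 2 and 25, but the median of the merged data is 10, a value that cannot be deduced from 2 and 25 alone.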

Advantages of the MODE

- The mode is easy to understand and to calculate. The modal class can also be
  located by inspection.
- The mode is not affected by extreme values in the distribution. It can also be
  calculated for open-ended frequency distributions.
- The mode can be used to describe qualitative as well as quantitative data. For
  example, it is used for comparing consumer preferences for various types of
  products, say cigarettes, soaps, toothpastes, or other products.

Disadvantages of the MODE

- The mode is not a rigidly defined measure, as there are several methods for
  calculating its value.
- It is difficult to locate the modal class in multi-modal frequency
  distributions.
- The mode is not suitable for algebraic manipulation.
Skewness Definition

Skewness is defined with respect to symmetry; it describes a lack of symmetry. Symmetry
is a property of the shape of a distribution when it is drawn as a graph. A distribution
is said to be symmetric if it looks the same on the left and right sides of the center
point (refer to Figure 1). The vertical line through the center point is called the axis
of symmetry. This graph shows an example of a symmetric distribution.

Figure 1

Here the mean, median and mode (for a distribution with a single peak) are equal to each
other, and the axis of symmetry, which is the ordinate at the mean, divides the
distribution into two equal parts such that one side is a mirror image of the other.

So we can define skewness as a measure of the asymmetry of a distribution; that is, it
measures how far the distribution departs from symmetry.
It also describes which side of the distribution has the longer tail.

Based on the shape of the distribution, skewness can be classified in three ways:

Positive skewness: If the right tail is longer than the left tail in the graph of the
distribution, the distribution is said to have positive skewness (refer to Figure 2). The
presence of extreme observations on the right-hand side of a distribution makes it
positively skewed.
So, if mean > median > mode in a distribution, it can be said to be positively
skewed.

Figure 2

When the distribution is skewed to the right (positively skewed), the mean is the
largest of the three averages, followed by the median and then the mode. Why is this
so? This is because the mean is affected by the extreme values more than the
median and the mode are. Therefore, when we have a positively skewed distribution, the
mean is not a suitable average to describe the distribution. The median and the mode
would be more appropriate.

Negative skewness: If the left tail is longer than the right tail in the graph of the
distribution, the distribution is said to have negative skewness (refer to Figure 3). The
presence of extreme observations on the left-hand side of a distribution makes it
negatively skewed.

So, if mean < median < mode in a distribution, it can be said to be negatively
skewed.

Figure 3

Zero skewness: If the two tails are of the same length and shape, then we say that the
distribution has zero skewness (Figure 4). The distribution is then symmetric; the normal
distribution is one example.

Figure 4
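
The rule of thumb relating the order of the three averages to the direction of skew can be sketched as a small function. The name `skew_direction` and the label "undetermined" for orderings not covered by the rule are assumptions made for this illustration; the example values are the estimates for the car speed data obtained earlier in the lesson.

```python
def skew_direction(mean, median, mode):
    """Classify skewness from the relative positions of the three averages."""
    if mean > median > mode:
        return "positive"
    if mean < median < mode:
        return "negative"
    if mean == median == mode:
        return "zero"
    return "undetermined"  # ordering not covered by the rule of thumb


# Estimated mean, median and mode of the car speeds (km/h).
print(skew_direction(58.36, 57, 55.97))  # positive
```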

It is useful to report all three measures because their relative positions can provide some
idea about the shape of the distribution. We give some of the common cases encountered.
In summary the different types of graphical representation of skewness are presented
below

Exercise 1
1. The table displays the frequency of scores on a Science quiz (max score 10). Find the
median of the scores.
Score 5 6 7 8 9 10
Frequency 1 5 8 14 12 7
2. The table displays the number of cars owned in a family among students in Form
5 Science 1. Find the mean, median and mode of the cars owned per family for
this data set.
Express answers to the nearest hundredth.
Cars owned 0 1 2 3 4 5
Frequency 2 5 4 6 10 8
3. Four students take an IQ test. Their scores are 96, 100, 106, 114. Find the mean
and median scores.
4. Find the mean, median and mode for this grouped data of test scores.
Scores Frequency
65 2
70 3
75 2
80 5
85 8
90 7
95 5
100 3

5. The table shows the number of hours (x) children spent watching television in a
week.

Class Interval    Mid-point (x)    Frequency (f)    fx
10 ≤ x < 15                        42
15 ≤ x < 20                        38
20 ≤ x < 25                        45
25 ≤ x < 30                        38

Σf =
Σfx =

First complete the table. Using the information answer the questions:
(a) Find the modal class.
(b) Estimate the median hours of watching TV. (whole number)
(c) Estimate the mean hours of watching TV (2 d.p.)

6. The table shows the number of visits (v) to the doctor that patients at a surgery make
in a year.

Class Interval    Mid-point (x)    Frequency (f)    fx
0 ≤ x < 5                          5
5 ≤ x < 10                         47
10 ≤ x < 15                        11

Σf =
Σfx =

First complete the table. Using this information answer the following questions.
(a) What is the modal class?
(b) Estimate the median number of visits made to the surgery in a year. (whole number)
(c) Estimate the mean number of visits made to the surgery in a year. (2 d.p.)

7. The following frequency distribution shows the quiz scores of a sample of students:
Score Frequency
14 - 18 2
19 - 23 5
24 - 28 12
29 - 33 1
For the above data, compute the following.
a. The mean b. The standard deviation