
LESSON 4 Measures of Central Tendency

Introduction

Histograms and polygons provide a general idea of how data are distributed. When comparisons are to be made between data sets, or further statistical analysis is to be done, exact measures are required to describe the characteristics of the data. These numerical measures are also referred to as summary statistics.

In this lesson and the next, we discuss two kinds of measures that can be used to describe the characteristics of a distribution: measures of central tendency and measures of dispersion.

LEARNING OUTCOMES

Upon the completion of this lesson, you should be able to:

a) discuss mean, mode and median as measures of central tendency;

b) discuss the advantages and disadvantages of each of the central tendency values;

c) find the mean, mode and median for ungrouped and grouped data from a given data set.

Measures of Central Tendency

When statisticians study a group of measurements, they try to determine which measure is most representative of the group. The score about which most of the other scores tend to cluster is a measure of central tendency. A measure of central tendency is an average that represents the data: it pinpoints the center of the data. These measures are commonly known as averages. We will discuss three averages. They are the

i) arithmetic mean (or simply the mean),

ii) median, and

iii) mode.

Let us see how these measures are calculated from raw data and from ungrouped and grouped frequency distributions. Note that all measures presented here are computed from sample data.

Raw Data and Ungrouped Frequency Distributions

Arithmetic mean

If we have a sample of n observations, $x_1, x_2, x_3, \ldots, x_n$, the sample mean, denoted by $\bar{X}$, is defined as the sum of all observations divided by the sample size.

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Referring to example 3 (the train-station data), the mean number of train stations a passenger passes before alighting from the train is

$$\bar{X} = \frac{3+4+4+2+2+4+1+2+2+0+3+2+3+2+1+3+2+2+1+2}{20} = \frac{45}{20} = 2.25$$

This means that, on average, a passenger passes approximately 2 train stations before getting off the train.

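From the ungrouped frequency distribution in Table 1, we calculate the mean using the following formula:

$$\bar{X} = \frac{\sum_{i=1}^{k} f_i x_i}{n},$$

where $n = \sum_{i=1}^{k} f_i$ and k = the number of class intervals (for an ungrouped distribution, the number of distinct values).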

Table 1

Number of train stations passed, X    Number of passengers, f    fX
0                                     1                          0
1                                     3                          3
2                                     9                          18
3                                     4                          12
4                                     3                          12
Total                                 Σf = 20                    Σfx = 45

Hence, we have the mean, $\bar{X} = \frac{45}{20} = 2.25$.
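As a quick check of the two formulas, the sketch below computes the mean once from the 20 raw observations and once from the frequency table. The function names `mean_raw` and `mean_from_frequencies` are illustrative choices, not part of the lesson.

```python
# Raw observations: stations passed by each of the 20 passengers
# (the same data that Table 1 summarizes).
stations = [3, 4, 4, 2, 2, 4, 1, 2, 2, 0, 3, 2, 3, 2, 1, 3, 2, 2, 1, 2]

# Table 1 as {value: frequency}.
table_1 = {0: 1, 1: 3, 2: 9, 3: 4, 4: 3}


def mean_raw(values):
    """Sample mean: sum of all observations divided by the sample size."""
    return sum(values) / len(values)


def mean_from_frequencies(freq):
    """Mean from an ungrouped frequency table: sum(f * x) / sum(f)."""
    n = sum(freq.values())
    total = sum(x * f for x, f in freq.items())
    return total / n


print(mean_raw(stations))              # 2.25
print(mean_from_frequencies(table_1))  # 2.25
```

Both routes give the same value because the frequency table loses no information: each observation is still counted exactly once.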

In obtaining the mean, all the observations from the sample or population are considered. Therefore, if there are extreme values (either very large or very small), the mean is not a suitable measure to represent the distribution of the data. The median would be a better measure of central tendency.

Example 1

The marks of five candidates in a mathematics test with a maximum possible mark of 20 are given below.

15  13  19  18  14

Find the mean value.

Solution:

$$\bar{X} = \frac{15+13+19+18+14}{5} = \frac{79}{5} = 15.8$$

So, the mean value is 15.8.

Example 2

A survey was taken in a Mathematics class regarding the number of story books read by each student in January. The table shows the class data with the frequency of responses. The mean of this data is 2.5. Find the value of k in the table.

Books        1    2    3    4    5
Frequency    5    k    8    4    1

Solution

$$\frac{1(5) + 2(k) + 3(8) + 4(4) + 5(1)}{5 + k + 8 + 4 + 1} = 2.5$$

$$\frac{50 + 2k}{18 + k} = 2.5$$

$$50 + 2k = 45 + 2.5k$$

$$0.5k = 5$$

$$k = 10$$

Median

The median, denoted by $\tilde{X}$, is the middle value of the observations that have been arranged in ascending or descending order. If the number of observations is odd, the median is the middle value, but if the number of observations is even, then the median is the mean of the two middle values.

Let's take the data from example 3 and arrange them in ascending order as follows.

0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4

In this case, the number of observations is even; therefore the median is midway between the tenth and the eleventh values. Hence, $\tilde{X} = \frac{2+2}{2} = 2$. In other words, 50% of the observations lie at or below the median and the other 50% lie at or above it. Since the median divides the observations into two halves, it is not affected by extreme values.

In an ungrouped frequency distribution, the median is obtained by looking at the point where 50% of the cumulative frequency lies. The same value is obtained using the ungrouped frequency distribution in Table 1: the cumulative frequencies are 1, 4, 13, 17 and 20, so the tenth and eleventh observations both take the value 2.
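The odd/even rule and the frequency-table route described above can be sketched as follows. This is a minimal illustration using the train-station data; the names `median_raw` and `median_from_frequencies` are assumptions made for the example.

```python
stations = [3, 4, 4, 2, 2, 4, 1, 2, 2, 0, 3, 2, 3, 2, 1, 3, 2, 2, 1, 2]
table_1 = {0: 1, 1: 3, 2: 9, 3: 4, 4: 3}  # {value: frequency}


def median_raw(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def median_from_frequencies(freq):
    """Rebuild the ordered observations from the table, then apply the same rule."""
    expanded = [x for x, f in sorted(freq.items()) for _ in range(f)]
    return median_raw(expanded)


print(median_raw(stations))              # 2.0
print(median_from_frequencies(table_1))  # 2.0
```

Expanding the table is the simplest approach for small data sets; for large tables one would walk the cumulative frequencies instead of rebuilding every observation.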

Example 3

The marks of five candidates in a geography test for which the maximum possible mark was 20 are given below:

19  18  16  15  20

Find the median mark.

Solution:

Arrange the marks in ascending order of magnitude:

15  16  18  19  20

The third score, 18, is the middle one in this arrangement.

Median = 18


Note: In general, if the number of values in the data set is even, then the median is the average of the two middle values.

Example 4

Find the median of the following scores:

11  17  15  20  9  12

Solution:

Arrange the score values in ascending order of magnitude:

9  11  12  15  17  20

There are 6 scores in the data set. The third and fourth scores, 12 and 15, are in the middle. That is, there is no single middle value, so the median is the mean of these two scores.

Median = (12 + 15) / 2 = 13.5

Note: Half of the values in the data set lie below the median and half lie above the median.

Mode

The mode, denoted by $\hat{X}$, is the most frequently occurring value among the observations. For the data in example 3, we find that the most frequently occurring value is 2. This means that most passengers pass 2 train stations before alighting.

In an ungrouped frequency distribution, we determine the mode by looking at the highest frequency, in this case 9. Hence the mode is 2.

The concept of the mode is easy to understand and its value is simple to obtain. The mode is not affected by extreme values. The disadvantage of this measure of central tendency is that it might not exist or might not be unique: a set of observations can have no mode, one mode, or two or more modes.
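The three possibilities just mentioned (no mode, one mode, several modes) can be illustrated with a short sketch. The helper name `modes` is an assumption, and so is the convention that a data set in which every value occurs exactly once has no mode; the small lists other than the train-station data are made up for illustration.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the most frequent values.

    Convention assumed here: if every value occurs exactly once
    (and there is more than one value), the data set has no mode.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []
    return sorted(x for x, c in counts.items() if c == top)


stations = [3, 4, 4, 2, 2, 4, 1, 2, 2, 0, 3, 2, 3, 2, 1, 3, 2, 2, 1, 2]

print(modes(stations))         # [2]     one mode
print(modes([1, 2, 3, 4]))     # []      no mode
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  two modes
```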

Example 5

The marks awarded to seven pupils for an assignment were as follows:

19  15  19  16  13  20  19

a. Find the median mark.
b. State the mode.

Solution:

a. Arrange the marks in ascending order of magnitude:

13  15  16  19  19  19  20

The fourth score, 19, is the middle data value in this arrangement.

Median = 19

b. 19 is the score that occurs most often.

Mode = 19

Grouped Frequency Distributions

For data that have been summarized in grouped frequency distributions, the measures of central tendency computed are only estimates of the true value. This is because accuracy has been lost when summarizing the data.

Arithmetic Mean

In finding the mean from a grouped frequency distribution, we choose one value from each class interval as a representative. This value is the class mark. We denote the class mark as m. The formula for the mean is given below.

$$\bar{X} = \frac{\sum_{i=1}^{k} f_i m_i}{n},$$

where $n = \sum_{i=1}^{k} f_i$ and k = number of class intervals.

Using the grouped frequency distribution in Table 1, the mean speed of the 55 cars can be estimated. We add more columns to the table so that the values to be substituted into the formula are easy to read off (the class mark, denoted m in the formula, is listed as x in the table).

Speed (km/h), X    Number of cars, f    Class mark, x    fx
45 - 49            4                    47               188
50 - 54            14                   52               728
55 - 59            19                   57               1083
60 - 64            7                    62               434
65 - 69            5                    67               335
70 - 74            4                    72               288
75 - 79            2                    77               154
Total              Σf = 55                               Σfx = 3210

$$\bar{X} = \frac{3210}{55} \approx 58.36 \text{ km/h}$$

Notice that this value differs from the actual sample mean computed from the raw data. The difference is due to the loss of accuracy that occurs when the data are grouped.
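The class-mark calculation for the car speeds can be reproduced with a few lines of code. This is a sketch only: the tuple layout `(lower limit, upper limit, frequency)` and the name `grouped_mean` are choices made for this example.

```python
# Each class is (lower limit, upper limit, frequency), taken from the speed table.
speed_classes = [
    (45, 49, 4),
    (50, 54, 14),
    (55, 59, 19),
    (60, 64, 7),
    (65, 69, 5),
    (70, 74, 4),
    (75, 79, 2),
]


def grouped_mean(classes):
    """Estimate the mean as sum(f * m) / sum(f), with m the class mark (midpoint)."""
    n = sum(f for _, _, f in classes)
    total = sum(((lo + hi) / 2) * f for lo, hi, f in classes)
    return total / n


print(round(grouped_mean(speed_classes), 2))  # 58.36
```

Because every observation in a class is replaced by the class mark, the result is an estimate rather than the exact sample mean.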

Example 6

Work out an estimate for the mean height.

Height (cm), X    Number of people, f    Mid-point, x    fx
101 - 120         1                      110.5           110.5
121 - 130         3                      125.5           376.5
131 - 140         5                      135.5           677.5
141 - 150         7                      145.5           1018.5
151 - 160         4                      155.5           622
161 - 170         2                      165.5           331
171 - 190         1                      180.5           180.5
Total             Σf = 23                                Σfx = 3316.5

mean = $\bar{X} = \frac{\sum fx}{\sum f} = \frac{3316.5}{23} \approx 144$ cm (3 s.f.)

Median

To estimate the median from a grouped frequency distribution, we must first locate the class interval containing the $\frac{n}{2}$th observation. We call this class interval the median class. The estimated value of the median can be obtained using the following formula.

$$\tilde{X} = L + \left( \frac{\frac{n}{2} - \sum f_{m-1}}{f_m} \right) \cdot c$$

where

$L$ = lower class boundary of the median class
$\sum f_{m-1}$ = cumulative frequency of classes before the median class
$f_m$ = frequency of the median class
$c$ = width of the median class

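Referring to the grouped frequency distribution in Table 1, the median class is 55 - 59: with n = 55, the 27.5th observation lies beyond the first two classes (cumulative frequency 4 + 14 = 18) and within the first three (18 + 19 = 37).

Speed (km/h), X    Number of cars, f    Class mark, x    fx
45 - 49            4                    47               188
50 - 54            14                   52               728
55 - 59            19                   57               1083
60 - 64            7                    62               434
65 - 69            5                    67               335
70 - 74            4                    72               288
75 - 79            2                    77               154
Total              Σf = 55                               Σfx = 3210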

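The following values are obtained and substituted into the formula for the median.

n/2 = 55/2 = 27.5
Lower class boundary of the median class = 54.5
Cumulative frequency of the classes before the median class = 4 + 14 = 18
Frequency of the median class = 19
Width of the median class, c = 5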

Median speed,

$$\tilde{X} = 54.5 + \left( \frac{27.5 - 18}{19} \right) \cdot 5 = 57 \text{ km/h}.$$

This means that about 50% of the cars are being driven at 57 km/h or less and about 50% of the cars are being driven above 57 km/h.

Mode

The class interval with the highest frequency is identified as the modal class. The formula for the mode of a grouped frequency distribution is

$$\hat{X} = L + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) \cdot c$$

where

$L$ = lower class boundary of the modal class
$\Delta_1$ = frequency of the modal class - frequency of the class before the modal class
$\Delta_2$ = frequency of the modal class - frequency of the class after the modal class
$c$ = width of the modal class

The modal class for Table 1 is 55 - 59. It is a coincidence that the median class and the modal class are the same in this particular example; do not generalize from it. There are times when the median class and the modal class differ. We obtain the following information to be substituted into the formula for the mode.

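Lower class boundary of the modal class = 54.5
$\Delta_1$ = 19 - 14 = 5
$\Delta_2$ = 19 - 7 = 12
c = 5

$$\hat{X} = 54.5 + \left( \frac{5}{5 + 12} \right) \cdot 5 \approx 55.97 \text{ km/h}$$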

This means that most cars are being driven at approximately 56 km/h, which is below the city speed limit.
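The two interpolation formulas for grouped data (median and mode) can be checked against the car-speed figures with the sketch below. It assumes, as in the worked example, that class limits are whole numbers so that the lower class boundary is the lower limit minus 0.5 and the class width is upper limit minus lower limit plus 1. The function names are illustrative only.

```python
# Each class is (lower limit, upper limit, frequency), taken from the speed table.
speed_classes = [
    (45, 49, 4),
    (50, 54, 14),
    (55, 59, 19),
    (60, 64, 7),
    (65, 69, 5),
    (70, 74, 4),
    (75, 79, 2),
]


def grouped_median(classes):
    """Lower boundary + ((n/2 - cumulative frequency before) / f_m) * width."""
    n = sum(f for _, _, f in classes)
    half = n / 2
    cumulative = 0
    for lo, hi, f in classes:
        if cumulative + f >= half:      # first class reaching n/2 is the median class
            lower_boundary = lo - 0.5   # assumption: whole-number class limits
            width = hi - lo + 1
            return lower_boundary + (half - cumulative) / f * width
        cumulative += f


def grouped_mode(classes):
    """Lower boundary + (d1 / (d1 + d2)) * width, for the class with highest frequency."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))         # first class with the highest frequency
    lo, hi, f = classes[i]
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = f - before
    d2 = f - after
    # Note: d1 + d2 is zero only if both neighbours match the modal frequency;
    # that case is not handled in this sketch.
    return (lo - 0.5) + d1 / (d1 + d2) * (hi - lo + 1)


print(grouped_median(speed_classes))          # 57.0
print(round(grouped_mode(speed_classes), 2))  # 55.97
```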

When computing the mean, all the observations are taken into consideration, but computing the median only involves the scores in the middle of the distribution. Hence, whenever we have extreme scores in the distribution, or when the distribution is skewed, the median is a better measure of the average.
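To see how differently the two averages react to an extreme score, the sketch below adds one made-up extreme observation (a passenger who passes 40 stations, a purely hypothetical value) to the train-station data and recomputes both measures with Python's standard `statistics` module.

```python
import statistics

stations = [3, 4, 4, 2, 2, 4, 1, 2, 2, 0, 3, 2, 3, 2, 1, 3, 2, 2, 1, 2]

# Hypothetical extreme value, added only to illustrate the effect.
with_extreme = stations + [40]

print(statistics.mean(stations), statistics.median(stations))
print(round(statistics.mean(with_extreme), 2), statistics.median(with_extreme))
```

The mean rises from 2.25 to about 4.05, while the median stays at 2, which is why the median is preferred when extreme scores are present.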

Advantages of the arithmetic mean

The calculation of the arithmetic mean is based on all values given in the data set.

The calculation of the arithmetic mean is simple and the result is unique; that is, every data set has one and only one mean.

The arithmetic mean is a reliable single value that reflects all values in the data set.

The arithmetic mean can be used for further statistical calculations and mathematical manipulations.

Disadvantages of the arithmetic mean

It is easily affected by extreme values.

In grouped data with open-ended class intervals, the mean cannot be computed.

It is not appropriate for highly skewed data.

Advantages of the median

It is very simple to understand and easy to calculate. In some cases it is obtained simply by inspection.

The median lies in the middle of the series and hence is not affected by extreme values.

It can be computed even for grouped data with open-ended class intervals.

In a grouped frequency distribution it can be located graphically by drawing ogives.

It is especially useful in open-ended distributions, since it is the position rather than the value of an item that matters for the median.

Disadvantages of the median

In a simple data set, the item values have to be arranged in order. If the series contains a large number of items, the process becomes tedious.

It is a less representative average because it does not depend on all the items in the series.

Observations from different data sets have to be merged to obtain a new median, whether grouped or ungrouped data are involved.

In a simple data set with an even number of items, the median cannot be found exactly.

Moreover, the interpolation formula applied to a continuous series is based on the unrealistic assumption that the frequency of the median class is evenly spread over the class interval.

Advantages of the mode

The mode is easy to understand and to calculate. The modal class can also be located by inspection.

The mode is not affected by extreme values in the distribution. The mode can also be calculated for open-ended frequency distributions.

The mode can be used to describe quantitative as well as qualitative data. For example, it is used for comparing consumer preferences for various types of products, say cigarettes, soaps, toothpastes, or other products.

Disadvantages of the mode

The mode is not a rigidly defined measure, as there are several methods for calculating its value.

It is difficult to locate the modal class in the case of multi-modal frequency distributions.

The mode is not suitable for algebraic manipulations.

Skewness

Definition

Skewness in statistics has been developed with respect to symmetry; in fact, it is the opposite of symmetry. Symmetry is a concept that is used in defining a distribution in terms of its graphical representation. A distribution is said to be symmetric if it looks the same on the left and right sides of the center point (refer to Figure 1). The center point is called the axis of symmetry.

Figure 1: An example of a symmetric distribution

In a symmetric distribution with a single peak, the mean, median and mode are equal to each other, and the axis of symmetry, which is the ordinate at the mean, divides the distribution into two equal parts such that one side is a mirror image of the other.

So we can define skewness as a measure of the asymmetry of a distribution; that is, it measures how far the distribution departs from symmetry. It also describes which side of the distribution has the longer tail.

On the basis of shape, skewness can be classified into three types:

Positive skewness: If the right tail is longer than the left tail in the graph of the distribution, the function is said to have positive skewness (refer to Figure 2). The presence of extreme observations on the right-hand side of a distribution makes it positively skewed.

So, if mean > median > mode in a distribution, it can be said to be positively skewed.

Figure 2: A positively skewed distribution

When the distribution is skewed to the right (positively skewed), the mean is the largest of the three averages, followed by the median and then the mode. Why is this so? It is because the mean is more affected by extreme values than the median and the mode are. Therefore, when we have a positively skewed distribution, the mean is not a suitable average to describe the distribution. The median and the mode would be more appropriate.

Negative skewness: If the left tail is longer than the right tail in the graph of the distribution, then the function will have negative skewness (refer to Figure 3). The presence of extreme observations on the left-hand side of a distribution makes it negatively skewed.

So, if mean < median < mode in a distribution, it can be said to be negatively skewed.

Figure 3: A negatively skewed distribution

Zero skewness: If the two tails are of the same length and shape, then we say that the function has zero skewness (refer to Figure 4). The distribution is then symmetric, as is the case for the normal distribution.

Figure 4: A distribution with zero skewness
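The ordering rules for the three cases above can be turned into a rough check. The sketch below is a rule of thumb only, not a formal skewness coefficient; the function name `describe_shape` and the three small data sets are made up for illustration.

```python
import statistics


def describe_shape(values):
    """Compare mean, median and mode and return them with a rough shape label."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    mode = statistics.mode(values)  # first most-frequent value
    if mean > median > mode:
        label = "positively skewed"
    elif mean < median < mode:
        label = "negatively skewed"
    elif mean == median == mode:
        label = "symmetric"
    else:
        label = "inconclusive"      # the simple ordering rule does not apply
    return mean, median, mode, label


# Hypothetical data sets chosen to show each case.
right_tail = [1, 1, 1, 2, 2, 3, 4, 10]
left_tail = [1, 7, 8, 9, 9, 10, 10, 10]
balanced = [1, 2, 2, 3, 3, 3, 4, 4, 5]

for data in (right_tail, left_tail, balanced):
    print(describe_shape(data))
```

With real measurements the three averages are rarely exactly equal, so in practice the "symmetric" case would be judged approximately rather than by strict equality.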


It is useful to report all three measures because their relative positions can provide some idea about the shape of the distribution. In summary, the different types of graphical representation of skewness are presented below.

[Figure: summary of the graphical representations of skewness]

Exercise 1

1. The table displays the frequency of scores on a Science quiz (max score 10). Find the median of the scores.

Score        5    6    7    8    9    10
Frequency    1    5    8    14   12   7

2. The table displays the number of cars owned per family among students in Form 5 Science 1. Find the mean, median and mode of the cars owned per family for this data set. Express answers to the nearest hundredth.

Cars owned   0    1    2    3    4    5
Frequency    2    5    4    6    10   8

3. Four students take an IQ test. Their scores are 96, 100, 106, 114. Find the mean and median scores.

4. Find the mean, median and mode for this grouped data of test scores.

Scores    Frequency
65        2
70        3
75        2
80        5
85        8
90        7
95        5
100       3

5. The table shows the number of hours (x) children spent watching television in a week.

Class Interval    Mid-point (x)    Frequency (f)    fx
10 ≤ x < 15                        42
15 ≤ x < 20                        38
20 ≤ x < 25                        45
25 ≤ x < 30                        38
                                   Σf =             Σfx =

First complete the table. Using the information, answer the questions:

(a) Find the modal class.
(b) Estimate the median hours of watching TV. (whole number)
(c) Estimate the mean hours of watching TV. (2 d.p.)

6. The table shows the number of visits (x) that patients at a surgery make to the doctor in a year.

Class Interval    Mid-point (x)    Frequency (f)    fx
0 ≤ x < 5                          5
5 ≤ x < 10                         47
10 ≤ x < 15                        11
                                   Σf =             Σfx =

First complete the table. Using this information, answer the following questions.

(a) What is the modal class?
(b) Estimate the median number of visits made to the surgery in a year. (whole number)
(c) Estimate the mean number of visits made to the surgery in a year. (2 d.p.)

7. The following frequency distribution shows the quiz scores of a sample of students:

Score      Frequency
14 - 18    2
19 - 23    5
24 - 28    12
29 - 33    1

For the above data, compute the following.

a. The mean
b. The standard deviation