Measures of central tendency: Mean, Median and Mode

MEAN
Mean is the most commonly used measure of central tendency. There are different types of
mean, viz. arithmetic mean, weighted mean, geometric mean (GM) and harmonic mean (HM). If
mentioned without an adjective (as mean), it generally refers to the arithmetic mean.

Arithmetic mean
Arithmetic mean (or, simply, “mean”) is the ordinary average. It is computed by adding all
the values in the data set and dividing the sum by the number of observations in it. If we have
the raw data, the mean is given by the formula

Mean = ∑X / n

where ∑ (the uppercase Greek letter sigma) refers to summation, X refers to the individual
value and n is the number of observations in the sample (sample size). Research articles
published in journals usually do not provide raw data and, in such a situation, the readers can
compute the mean from the frequency distribution (if provided):

Mean = ∑fX / n

where f is the frequency, X is the midpoint of the class interval and n is the number of
observations.
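
The two calculations described above can be sketched in Python. This is a minimal illustration, not part of the original article; the function names and the sample numbers are made up for the example.

```python
def mean_raw(values):
    """Arithmetic mean of raw data: sum of the values divided by n."""
    return sum(values) / len(values)


def mean_grouped(midpoints, frequencies):
    """Mean from a frequency distribution: sum(f * X) / n, where n = sum(f)."""
    n = sum(frequencies)
    return sum(f * x for f, x in zip(frequencies, midpoints)) / n


# Illustrative raw data (hypothetical values).
print(mean_raw([2, 4, 4, 5, 10]))            # 5.0

# Hypothetical classes 0-10, 10-20, 20-30 with midpoints 5, 15, 25.
print(mean_grouped([5, 15, 25], [2, 5, 3]))  # 16.0
```

The grouped version is only an approximation of the raw-data mean, because every observation in a class is treated as if it sat at the class midpoint.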

ADVANTAGES
The mean uses every value in the data and hence is a good representative of the data. Ironically,
the mean itself often does not appear as a value in the raw data.
Repeated samples drawn from the same population tend to have similar means. The mean is
therefore the measure of central tendency that best resists the fluctuation between different
samples.
It is closely related to standard deviation, the most common measure of dispersion.

DISADVANTAGES
The important disadvantage of mean is that it is sensitive to extreme values/outliers, especially
when the sample size is small. Therefore, it is not an appropriate measure of central tendency for
skewed distributions.
Mean cannot be calculated for nominal or non-numerical ordinal data. Even though mean can be
calculated for numerical ordinal data, many times it does not give a meaningful value, e.g. stage
of cancer.
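
The sensitivity of the mean to a single extreme value in a small sample can be seen in a short sketch. The data are hypothetical lengths of stay chosen only for illustration; `statistics` is Python's standard library module.

```python
import statistics

# Hypothetical lengths of hospital stay in days (illustrative only).
stays = [3, 4, 4, 5, 6]
stays_with_outlier = stays + [60]  # one extreme observation added

mean_before = statistics.mean(stays)
median_before = statistics.median(stays)
mean_after = statistics.mean(stays_with_outlier)
median_after = statistics.median(stays_with_outlier)

print(mean_before, median_before)          # 4.4 4
print(round(mean_after, 2), median_after)  # 13.67 4.5
```

One outlier roughly triples the mean, while the median barely moves.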

Weighted mean
Weighted mean is calculated when certain values in a data set are more important than the
others. A weight wi is attached to each of the values xi to reflect this importance.

Weighted mean = ∑(wi xi) / ∑wi

For example, when the weighted mean is used to represent the average duration of stay by a
patient in a hospital, the total number of cases presenting to each ward is taken as the weight.
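
A minimal sketch of the hospital-stay example, assuming the usual definition of the weighted mean as ∑(w·x) / ∑w. The ward figures are invented for illustration.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Hypothetical wards: mean stay (days) per ward and number of cases per ward.
mean_stay_per_ward = [4.0, 6.0, 10.0]
cases_per_ward = [50, 30, 20]

overall = weighted_mean(mean_stay_per_ward, cases_per_ward)
unweighted = sum(mean_stay_per_ward) / len(mean_stay_per_ward)

print(overall)               # 5.8
print(round(unweighted, 2))  # 6.67
```

The unweighted average overstates the typical stay here because it gives the smallest ward the same influence as the largest one.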

Geometric Mean
It is defined as the antilog of the arithmetic mean of the values taken on a log scale. It is also
expressed as the nth root of the product of the n observations.

GM = (x1 × x2 × ... × xn)^(1/n)

GM is an appropriate measure when values change exponentially and in case of a skewed
distribution that can be made symmetrical by a log transformation. GM is more commonly used
in microbiological and serological research. One important disadvantage of GM is that it cannot
be used if any of the values are zero or negative.
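
The log-scale definition can be sketched as follows. The function name and the titre values are illustrative assumptions; the check for non-positive values mirrors the limitation stated above.

```python
import math


def geometric_mean(values):
    """Antilog of the arithmetic mean of the logs of the values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    mean_of_logs = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_of_logs)


# Hypothetical antibody titres that differ by factors of ten.
titres = [10, 100, 1000]
print(round(geometric_mean(titres), 6))  # 100.0
print(sum(titres) / len(titres))         # 370.0 (arithmetic mean, for contrast)
```

Python 3.8 and later also provide `statistics.geometric_mean`, which could be used instead of the hand-written function.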

Harmonic mean
It is the reciprocal of the arithmetic mean of the reciprocals of the observations.

HM = n / ∑(1/xi)

HM is appropriate in situations where the reciprocals of values are more useful. HM is used
when we want to determine the average sample size of a number of groups, each of which has a
different sample size.
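
A short sketch of the average-sample-size use case, assuming the standard definition HM = n / ∑(1/x). The group sizes are hypothetical.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    if any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs strictly positive values")
    return len(values) / sum(1 / v for v in values)


# Hypothetical sample sizes of three study groups.
group_sizes = [10, 20, 40]
print(round(harmonic_mean(group_sizes), 2))  # 17.14
print(sum(group_sizes) / len(group_sizes))   # 23.333... (arithmetic mean, for contrast)
```

The harmonic mean is pulled toward the smallest group, which is why it is the more conservative "average" sample size.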

DEGREE OF VARIATION BETWEEN THE MEANS


If all the values in a data set are the same, then all the three means (arithmetic mean, GM and
HM) will be identical. As the variability in the data increases, the difference among these means
also increases. For positive values that are not all equal, the arithmetic mean is always greater
than the GM, which in turn is always greater than the HM.
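
The ordering of the three means can be checked numerically. This sketch uses the standard library functions `statistics.mean`, `statistics.geometric_mean` and `statistics.harmonic_mean` (Python 3.8 or later); the helper name `three_means` and the data are illustrative.

```python
import statistics


def three_means(values):
    """Return (arithmetic mean, geometric mean, harmonic mean) of positive values."""
    return (
        statistics.mean(values),
        statistics.geometric_mean(values),
        statistics.harmonic_mean(values),
    )


# Values with some spread: the three means separate.
am, gm, hm = three_means([2, 4, 8])
print(round(am, 3), round(gm, 3), round(hm, 3))  # 4.667 4.0 3.429

# Identical values: the three means coincide (up to floating-point rounding).
print([round(m, 6) for m in three_means([5, 5, 5])])  # [5.0, 5.0, 5.0]
```
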
Apart from the mean, median and mode are the two other commonly used measures of central
tendency. The median is sometimes referred to as a measure of location as it tells us where the
data are. The following sections describe the median and the mode, and give guidelines for
selecting the appropriate measure of central tendency.

MEDIAN
Median is the value which occupies the middle position when all the observations are arranged in
an ascending/descending order. It divides the frequency distribution exactly into two halves.
Fifty percent of observations in a distribution have scores at or below the median. Hence median
is the 50th percentile. Median is also known as the ‘positional average’.
It is easy to calculate the median. If the number of observations is odd, then the (n + 1)/2th
observation (in the ordered set) is the median. When the total number of observations is even, it
is given by the mean of the n/2th and (n/2 + 1)th observations.

Advantages
1. It is easy to compute and comprehend.
2. It is not distorted by outliers/skewed data.
3. It can be determined for ratio, interval, and ordinal scale.

Disadvantages
1. It does not take into account the precise value of each observation and hence does not use
all information available in the data.
2. Unlike mean, median is not amenable to further mathematical calculation and hence is
not used in many statistical tests.
3. If we pool the observations of two groups, median of the pooled group cannot be
expressed in terms of the individual medians of the pooled groups.

Calculation of Median

Median for Individual series

In individual series, where data is given in the raw form, the first step towards median calculation is
to arrange the data in ascending or descending order. Next, count the number of observations,
denoted by N. The next step depends on whether the value of N is even or odd.

1. If the value of N is odd, then the value of the (N+1)/2 th item is the median for the data.

2. If the value of N is even, then use this formula: Median = [size of (N/2)th term + size of
(N/2 + 1)th term]÷2
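
The procedure for an individual series can be sketched in Python. For even N the sketch averages the N/2-th and (N/2 + 1)-th ordered items; the function name and data are illustrative. Note that the text counts positions from 1 while Python lists are indexed from 0.

```python
def median_individual(values):
    """Median of raw data using the positional rules for odd and even N."""
    ordered = sorted(values)          # step 1: arrange in ascending order
    n = len(ordered)                  # step 2: count the observations
    if n % 2 == 1:
        # Odd N: the (N + 1)/2-th item (1-based), i.e. index (N + 1)//2 - 1.
        return ordered[(n + 1) // 2 - 1]
    # Even N: mean of the N/2-th and (N/2 + 1)-th items (1-based).
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median_individual([7, 3, 9, 5, 1]))  # 5
print(median_individual([7, 3, 9, 5]))     # 6.0
```
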

Median for Discrete Series


The first step for calculation of median here also involves arranging the data in ascending or
descending order. This is followed by conversion of simple frequencies into cumulative frequencies.
Hence another column for cumulative frequency needs to be constructed, wherein the last value is
labeled as the value of N (i.e ∑f).
Next, we need to find the value of (N+1)/2. Lastly, the value corresponding to the cumulative
frequency equal to or just greater than (N+1)/2 is the median for the data.
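
A minimal sketch of the cumulative-frequency rule for a discrete series. It returns the first value whose cumulative frequency reaches (N+1)/2; the function name and the data are assumptions made for illustration.

```python
def median_discrete(values, frequencies):
    """Median of a discrete series via cumulative frequencies."""
    pairs = sorted(zip(values, frequencies))   # arrange values in ascending order
    n = sum(f for _, f in pairs)               # N = sum of frequencies
    target = (n + 1) / 2
    cumulative = 0
    for value, f in pairs:
        cumulative += f                        # running cumulative frequency
        if cumulative >= target:
            return value
    raise ValueError("the series contains no observations")


# Hypothetical series: value -> frequency. Cumulative frequencies: 3, 8, 16, 20.
print(median_discrete([1, 2, 3, 4], [3, 5, 8, 4]))  # 3  (N = 20, target = 10.5)
```
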

Median for Frequency Distribution


As in all other types of distributions, here also initially we arrange the classes in either ascending or
descending order. Next, we need to find the cumulative frequencies. The last value in the
cumulative frequency column which is ∑f is labeled as N. This is followed by the calculation of the
value of N/2.
Further, the class corresponding to the cumulative frequency equal to or just greater than this value
is known as the median class. Lastly, the median value is calculated by applying the following formula:

Median = l + (i/f) [N/2 - C]

Here, l = The lower limit of the median class
i = size of the class
f = Frequency corresponding to the median class
N = Summation of frequencies
C = The cumulative frequency corresponding to the class just before the median class
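
The grouped-data calculation can be sketched as follows, assuming the standard interpolation formula Median = l + (i/f)(N/2 - C) with the symbols defined above and classes of equal width. The function name and the frequency table are hypothetical.

```python
def grouped_median(lower_limits, width, frequencies):
    """Median of a frequency distribution with equal-width classes."""
    n = sum(frequencies)                 # N
    if n <= 0:
        raise ValueError("the distribution contains no observations")
    half = n / 2
    cumulative = 0                       # C: cumulative frequency before the current class
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= half:       # first class whose cumulative frequency reaches N/2
            return lower + (width / f) * (half - cumulative)
        cumulative += f
    raise ValueError("median class not found")


# Hypothetical classes 0-10, 10-20, 20-30, 30-40.
# N = 30, N/2 = 15, median class 20-30 (l = 20, f = 12, C = 13).
result = grouped_median([0, 10, 20, 30], 10, [5, 8, 12, 5])
print(round(result, 2))  # 21.67
```
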

MODE
Mode is defined as the value that occurs most frequently in the data. Some data sets do not have
a mode because each value occurs only once. On the other hand, some data sets can have more
than one mode. This happens when the data set has two or more values of equal frequency which
is greater than that of any other value. Mode is rarely used as a summary statistic except to
describe a bimodal distribution. In a bimodal distribution, the taller peak is called the major
mode and the shorter one is the minor mode.

Advantages
1. It is the only measure of central tendency that can be used for data measured in a nominal
scale.
2. It can be calculated easily.

Disadvantages
1. It is rarely used in statistical analysis because it is not algebraically defined, and it
fluctuates considerably from sample to sample when the sample size is small.

Calculation of Mode

Mode for Individual Series


In the case of an individual series, we simply identify the item that occurs most frequently in
the distribution. This item is the mode of the series.
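
A short sketch using `collections.Counter` from the standard library. It returns a list so that it can represent the cases mentioned earlier: no mode when every value occurs once, and more than one mode when several values share the highest frequency. The function name and data are illustrative.

```python
from collections import Counter


def modes(values):
    """Return all modes in ascending order; empty list if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                      # every value occurs only once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 2, 3]))         # [2]
print(modes([1, 2, 2, 3, 3, 4]))   # [2, 3]
print(modes([1, 2, 3]))            # []
```
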

Mode for Discrete Series


In discrete series, we have values of items with their corresponding frequencies. In essence, here
the value of the item with the highest frequency will be the mode for the distribution.

Mode for Frequency Distribution


Lastly, for a frequency distribution, the method for mode calculation is somewhat different. Here
we have to find the modal class, which is the one with the highest frequency. The class just
before the modal class is called the pre-modal class, whereas the class just after the modal
class is known as the post-modal class. The following formula is then applied for the
calculation of mode:
Mode = l + i [(f1-f0)/(2f1-f0-f2)]
Here, l = The lower limit of the modal class
i = size of the class,
f1 = Frequency corresponding to the modal class,
f2 = Frequency corresponding to the post-modal class,
and f0 = Frequency corresponding to the pre-modal class
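
The formula can be sketched in Python for classes of equal width i. Two choices in the sketch are assumptions not stated in the text: a missing pre-modal or post-modal class is treated as having frequency 0, and when several classes tie for the highest frequency the first one is used. The function name and frequency table are hypothetical.

```python
def grouped_mode(lower_limits, width, frequencies):
    """Mode of a frequency distribution: l + i * (f1 - f0) / (2*f1 - f0 - f2)."""
    # Index of the modal class (first class with the highest frequency).
    k = max(range(len(frequencies)), key=lambda idx: frequencies[idx])
    f1 = frequencies[k]
    f0 = frequencies[k - 1] if k > 0 else 0                     # pre-modal class (assumed 0 if absent)
    f2 = frequencies[k + 1] if k + 1 < len(frequencies) else 0  # post-modal class (assumed 0 if absent)
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula undefined: modal class does not stand out from its neighbours")
    return lower_limits[k] + width * (f1 - f0) / denominator


# Hypothetical classes 0-10, 10-20, 20-30, 30-40.
# Modal class 20-30: l = 20, f1 = 12, f0 = 8, f2 = 5.
result = grouped_mode([0, 10, 20, 30], 10, [5, 8, 12, 5])
print(round(result, 2))  # 23.64
```
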

POSITION OF MEASURES OF CENTRAL TENDENCY


The relative position of the three measures of central tendency (mean, median, and mode)
depends on the shape of the distribution. All three measures are identical in a normal distribution
[Figure 1a]. As mean is always pulled toward the extreme observations, the mean is shifted to the
tail in a skewed distribution [Figure 1b and c]. Mode is the most frequently occurring score and
hence it lies in the hump of the skewed distribution. Median lies in between the mean and the
mode in a skewed distribution.
Figure 1
The relative position of the various measures of central tendency. (a) Normal distribution (b)
Positively (right) skewed distribution (c) Negatively (left) skewed distribution
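
The ordering described for a positively skewed distribution can be checked on a small sample. The data below are invented to have a long right tail; `statistics` is Python's standard library module.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, a few are large.
sample = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

sample_mode = statistics.mode(sample)
sample_median = statistics.median(sample)
sample_mean = statistics.mean(sample)

print(sample_mode, sample_median, sample_mean)  # 2 3.0 4.6
```

The mode sits in the hump, the mean is pulled toward the tail, and the median lies between them. This ordering is typical for skewed data, though small or irregular samples need not follow it exactly.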

SELECTING THE APPROPRIATE MEASURE


Mean is generally considered the best measure of central tendency and the most frequently used
one. However, there are some situations where the other measures of central tendency are
preferred.
Median is preferred to mean when
1. There are a few extreme scores in the distribution.
2. Some scores have undetermined values.
3. There is an open-ended distribution.
4. Data are measured in an ordinal scale.
Mode is the preferred measure when data are measured in a nominal scale. Geometric mean is the
preferred measure of central tendency when data are measured in a logarithmic scale.