MEAN

Mean is the most commonly used measure of central tendency. There are different types of

mean, viz. arithmetic mean, weighted mean, geometric mean (GM) and harmonic mean (HM). If

mentioned without an adjective (as mean), it generally refers to the arithmetic mean.

Arithmetic mean

Arithmetic mean (or, simply, “mean”) is nothing but the average. It is computed by adding all

the values in the data set and dividing the sum by the number of observations in it. If we have the raw data,

the mean is given by the formula

Mean = ∑X / n

where ∑ (the uppercase Greek letter sigma) refers to summation, X refers to the individual

value and n is the number of observations in the sample (sample size). Research articles

published in journals often do not provide raw data and, in such a situation, readers can compute

the mean from the frequency distribution (if provided):

Mean = ∑fX / n

where f is the frequency, X is the midpoint of the class interval and n is the number of

observations.
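
The two calculations described above can be sketched in Python. This is a minimal illustration, not part of the original text: the function names and the sample numbers are hypothetical, and the grouped version assumes each class is represented by its midpoint.

```python
def mean_raw(values):
    """Arithmetic mean of raw observations: sum(X) / n."""
    return sum(values) / len(values)


def mean_grouped(midpoints, frequencies):
    """Arithmetic mean from a frequency distribution: sum(f * X) / n,
    where X is the class midpoint and n = sum(f)."""
    n = sum(frequencies)
    return sum(f * x for f, x in zip(frequencies, midpoints)) / n


# Hypothetical raw data
raw = [2, 4, 4, 5, 10]
print(mean_raw(raw))  # 25 / 5 = 5.0

# Hypothetical classes 0-10, 10-20, 20-30 with midpoints 5, 15, 25
print(mean_grouped([5, 15, 25], [2, 5, 3]))  # (10 + 75 + 75) / 10 = 16.0
```

The grouped result is only an approximation of the raw-data mean, because every observation in a class is treated as if it sat at the midpoint.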

ADVANTAGES

The mean uses every value in the data and hence is a good representative of the data. The irony

in this is that, most of the time, this value does not appear in the raw data.

Repeated samples drawn from the same population tend to have similar means. The mean is

therefore the measure of central tendency that best resists the fluctuation between different

samples.

It is closely related to standard deviation, the most common measure of dispersion.

DISADVANTAGES

The important disadvantage of mean is that it is sensitive to extreme values/outliers, especially

when the sample size is small. Therefore, it is not an appropriate measure of central tendency for

skewed distribution.
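
A small sketch of this sensitivity, using Python's standard `statistics` module. The sample values are hypothetical and chosen only to show the effect of a single extreme observation on a small sample.

```python
from statistics import mean, median

# Hypothetical small sample, then the same sample with one extreme value added
sample = [10, 12, 11, 13, 12]
with_outlier = sample + [100]

print(mean(sample), median(sample))              # 11.6 and 12
print(mean(with_outlier), median(with_outlier))  # about 26.33 and 12.0
```

The mean more than doubles when the outlier is added, while the median does not move.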

Mean cannot be calculated for nominal or non-numerical ordinal data. Even though mean can be

calculated for numerical ordinal data, many times it does not give a meaningful value, e.g. stage

of cancer.

Weighted mean

Weighted mean is calculated when certain values in a data set are more important than the

others. A weight wi is attached to each of the values xi to reflect this importance:

Weighted mean = ∑(wi xi) / ∑wi

For example, when the weighted mean is used to represent the average duration of stay by a patient

in a hospital, the total number of cases presenting to each ward is taken as the weight.
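
The hospital example can be sketched as follows, taking the weighted mean as ∑(wi xi) / ∑wi. The ward figures are invented for illustration and the function name is hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical mean stay (days) per ward and number of cases per ward
mean_stay = [3.0, 5.0, 10.0]
cases = [50, 30, 20]

print(weighted_mean(mean_stay, cases))  # (150 + 150 + 200) / 100 = 5.0
print(sum(mean_stay) / len(mean_stay))  # unweighted mean of ward means: 6.0
```

The unweighted figure overstates the average stay here because the ward with the longest stay sees the fewest cases.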

Geometric Mean

It is defined as the arithmetic mean of the values taken on a log scale, transformed back to the

original scale (the antilog of the mean of the logs). It is also expressed as the

nth root of the product of the n observations:

GM = (x1 × x2 × … × xn)^(1/n)

GM is an appropriate measure for a skewed

distribution that can be made symmetrical by a log transformation. GM is commonly used

in microbiological and serological research. One important disadvantage of GM is that it cannot

be used if any of the values are zero or negative.
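
Both descriptions of the GM (mean on a log scale, and nth root of the product) can be checked against each other in a short sketch. The titre values are hypothetical, and the function name is illustrative only.

```python
import math


def geometric_mean(values):
    """Antilog of the arithmetic mean of the natural logs of the values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical antibody titres spanning several orders of magnitude
titres = [10, 100, 1000]

gm_from_logs = geometric_mean(titres)
gm_from_root = math.prod(titres) ** (1 / len(titres))

# Both routes give approximately 100 (up to floating-point rounding)
print(gm_from_logs, gm_from_root)
```

The explicit check for non-positive values mirrors the restriction stated in the text: the logarithm is undefined for zero and negative numbers.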

Harmonic mean

It is the reciprocal of the arithmetic mean of the reciprocals of the observations:

HM = n / ∑(1/xi)

HM is appropriate in situations where the reciprocals of values are more useful. HM is used

when we want to determine the average sample size of a number of groups, each of which has a

different sample size.
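
A minimal sketch of the average-sample-size use case, taking the harmonic mean as the reciprocal of the arithmetic mean of the reciprocals, i.e. n / ∑(1/x). The group sizes are hypothetical.

```python
def harmonic_mean(values):
    """Harmonic mean: n divided by the sum of the reciprocals."""
    if any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs strictly positive values")
    return len(values) / sum(1 / v for v in values)


# Hypothetical sample sizes of three study groups
group_sizes = [10, 20, 40]

print(harmonic_mean(group_sizes))           # 3 / 0.175, about 17.14
print(sum(group_sizes) / len(group_sizes))  # arithmetic mean, about 23.33
```

The harmonic mean is pulled toward the smallest group, which is why it is the more conservative "average" sample size.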

If all the values in a data set are the same, then all the three means (arithmetic mean, GM and

HM) will be identical. As the variability in the data increases, the difference among these means

also increases. For positive values that are not all equal, the arithmetic mean is always greater than the GM, which in turn is always greater

than the HM.
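
The relationship among the three means can be illustrated with the standard `statistics` module (`geometric_mean` requires Python 3.8 or later). The two data sets are hypothetical; `three_means` is a helper introduced only for this sketch.

```python
from statistics import mean, geometric_mean, harmonic_mean


def three_means(values):
    """Return (arithmetic mean, geometric mean, harmonic mean)."""
    return mean(values), geometric_mean(values), harmonic_mean(values)


identical = [4, 4, 4]  # no variability
spread = [1, 4, 16]    # same geometric mean, much more variability

print(three_means(identical))  # all three approximately 4
print(three_means(spread))     # approximately 7, 4 and 2.29
```

With no variability the three values coincide; as the spread grows, the arithmetic mean moves up and the harmonic mean moves down around the geometric mean.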

Apart from the mean, the median and the mode are the two other commonly used measures of central

tendency. The median is sometimes referred to as a measure of location as it tells us where the

data are. The following sections describe the median, the mode, and the guidelines for selecting the

appropriate measure of central tendency.

MEDIAN

Median is the value which occupies the middle position when all the observations are arranged in

an ascending/descending order. It divides the frequency distribution exactly into two halves.

Fifty percent of observations in a distribution have scores at or below the median. Hence median

is the 50th percentile. Median is also known as the ‘positional average’.

It is easy to calculate the median. If the number of observations is odd, then the ((n + 1)/2)th

observation (in the ordered set) is the median. When the total number of observations is even, it

is given by the mean of the (n/2)th and (n/2 + 1)th observations.

Advantages

1. It is easy to compute and comprehend.

2. It is not distorted by outliers/skewed data.

3. It can be determined for ratio, interval, and ordinal scale.

Disadvantages

1. It does not take into account the precise value of each observation and hence does not use

all information available in the data.

2. Unlike mean, median is not amenable to further mathematical calculation and hence is

not used in many statistical tests.

3. If we pool the observations of two groups, median of the pooled group cannot be

expressed in terms of the individual medians of the pooled groups.

Calculation of Median

In an individual series, where data are given in the raw form, the first step towards median calculation is

to arrange the data in ascending or descending order. Then count the number of observations,

denoted by N. The next step is decided by whether the value of N is even or odd.

1. If the value of N is odd, then the value of the (N+1)/2 th item is the median for the data.

2. If the value of N is even, then use this formula: Median = [size of (N/2)th term + size of

(N/2 + 1)th term] ÷ 2
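
The procedure for raw data can be sketched as below: sort, then take the middle item for odd N, or average the (N/2)th and (N/2 + 1)th items for even N. The function name and data are hypothetical; note that Python lists are 0-based, so the kth item in the text is index k - 1.

```python
def median_raw(values):
    """Median of raw observations (individual series)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd N: the (N+1)/2 th item, which is index mid in 0-based terms
        return ordered[mid]
    # Even N: mean of the (N/2)th and (N/2 + 1)th items
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_raw([7, 1, 5]))     # ordered: 1, 5, 7 -> 5
print(median_raw([7, 1, 5, 3]))  # ordered: 1, 3, 5, 7 -> (3 + 5) / 2 = 4.0
```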

In a discrete series, the first step in calculating the median also involves arranging the data in ascending or

descending order. The simple frequencies are then converted into cumulative frequencies.

Hence another column for cumulative frequency needs to be constructed, wherein the last value is

labeled as the value of N (i.e. ∑f).

Next, we need to find the value of (N+1)/2. Lastly, the value whose cumulative

frequency is equal to or just greater than (N+1)/2 is taken as the median for the data.
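
A sketch of the cumulative-frequency rule for values given with their frequencies, assuming the median is the first value whose cumulative frequency reaches (N+1)/2. The function name and the data are hypothetical.

```python
def median_discrete(values, frequencies):
    """Median of a discrete series via cumulative frequencies."""
    pairs = sorted(zip(values, frequencies))  # arrange values in ascending order
    n = sum(f for _, f in pairs)              # N = sum of frequencies
    target = (n + 1) / 2
    cumulative = 0
    for value, f in pairs:
        cumulative += f                       # running cumulative frequency
        if cumulative >= target:
            return value
    raise ValueError("frequencies must sum to a positive number")


# Hypothetical data: value -> frequency, N = 11, (N + 1) / 2 = 6
values = [1, 2, 3, 4]
frequencies = [2, 3, 4, 2]
print(median_discrete(values, frequencies))  # cumulative: 2, 5, 9 -> 3
```

When N is even this rule returns a single observed value rather than averaging the two middle items, so it can differ slightly from the raw-data median.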

In a continuous series (grouped frequency distribution), as in the other types of series, we first arrange the classes in either ascending or

descending order. Next, we need to find the cumulative frequencies. The last value in the

cumulative frequency column, which is ∑f, is labeled as N. This is followed by the calculation of the

value of N/2.

Further, the class whose cumulative frequency is equal to or just greater than this value is known as

the median class. Lastly, the median value is calculated by applying the following formula:

Median = l + i [(N/2 - C)/f]

Here, l = The lower limit of the median class

i = size of the class, f = Frequency corresponding to the median class

N = Summation of frequencies

C = The cumulative frequency corresponding to the class just before the median class
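
The grouped-data steps can be sketched as follows, assuming the usual interpolation formula Median = l + i (N/2 - C) / f with the symbols defined above, classes of equal size listed in ascending order, and hypothetical class limits and frequencies.

```python
def median_grouped(lower_limits, class_size, frequencies):
    """Median of a grouped frequency distribution: l + i * (N/2 - C) / f."""
    n = sum(frequencies)
    if n <= 0:
        raise ValueError("frequencies must sum to a positive number")
    half = n / 2
    cumulative = 0  # C: cumulative frequency before the current class
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= half:
            # This is the median class; interpolate within it
            return lower + class_size * (half - cumulative) / f
        cumulative += f
    raise ValueError("lower_limits and frequencies must have the same length")


# Hypothetical classes 0-10, 10-20, 20-30 with frequencies 5, 10, 5
# N = 20, N/2 = 10, median class is 10-20, C = 5, f = 10
print(median_grouped([0, 10, 20], 10, [5, 10, 5]))  # 10 + 10 * 5 / 10 = 15.0
```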

MODE

Mode is defined as the value that occurs most frequently in the data. Some data sets do not have

a mode because each value occurs only once. On the other hand, some data sets can have more

than one mode. This happens when the data set has two or more values of equal frequency which

is greater than that of any other value. Mode is rarely used as a summary statistic except to

describe a bimodal distribution. In a bimodal distribution, the taller peak is called the major

mode and the shorter one is the minor mode.

Advantages

1. It is the only measure of central tendency that can be used for data measured in a nominal

scale.

2. It can be calculated easily.

Disadvantages

1. It is not used in statistical analysis as it is not algebraically defined and the fluctuation in

the frequency of observation is more when the sample size is small.

Calculation of Mode

In the case of an individual series, we simply identify the item that occurs most frequently in the

distribution; this item is the mode of the series.

In discrete series, we have values of items with their corresponding frequencies. In essence, here

the value of the item with the highest frequency will be the mode for the distribution.
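
For individual and discrete series, finding the mode is a counting exercise. The sketch below uses `collections.Counter` and returns a list so that data sets with no mode or with several modes are both handled; the function names and data are hypothetical.

```python
from collections import Counter


def modes_individual(values):
    """All modes of raw data; empty list if every value occurs only once."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # no value repeats, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)


def modes_discrete(values, frequencies):
    """All values whose frequency equals the highest frequency."""
    highest = max(frequencies)
    return sorted(v for v, f in zip(values, frequencies) if f == highest)


print(modes_individual([1, 2, 2, 3]))            # [2]
print(modes_individual([1, 1, 2, 2, 3]))         # [1, 2]  (two modes)
print(modes_individual([1, 2, 3]))               # []      (no mode)
print(modes_discrete([10, 20, 30], [4, 9, 2]))   # [20]
```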

Lastly, for frequency distribution, the method for mode calculation is somewhat different. Here

we have to find a modal class. The modal class is the one with the highest frequency value. The

class just before the modal class is called the pre-modal class, whereas the class just after the

modal class is known as the post-modal class. The following formula is then applied for

calculation of mode:

Mode = l + i [(f1-f0)/(2f1-f0-f2)]

Here, l = The lower limit of the modal class,

i = size of the class,

f1 = Frequency corresponding to the modal class,

f2 = Frequency corresponding to the post-modal class,

and f0 = Frequency corresponding to the pre-modal class
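
A sketch of the grouped-data formula Mode = l + i [(f1-f0)/(2f1-f0-f2)], where i is the class size. Two choices here are assumptions made for the sketch and are not stated in the text: if the modal class is the first or last class, the missing neighbouring frequency is taken as 0, and if several classes tie for the highest frequency, the first one is used. The class limits and frequencies are hypothetical.

```python
def mode_grouped(lower_limits, class_size, frequencies):
    """Mode of a grouped frequency distribution."""
    # Index of the modal class (first class with the highest frequency)
    k = max(range(len(frequencies)), key=frequencies.__getitem__)
    f1 = frequencies[k]
    # Assumption: a missing pre- or post-modal class counts as frequency 0
    f0 = frequencies[k - 1] if k > 0 else 0
    f2 = frequencies[k + 1] if k + 1 < len(frequencies) else 0
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula undefined: modal class does not stand out")
    return lower_limits[k] + class_size * (f1 - f0) / denominator


# Hypothetical classes 0-10, 10-20, 20-30, 30-40 with frequencies 4, 10, 6, 2
# Modal class 10-20: l = 10, f1 = 10, f0 = 4, f2 = 6
print(mode_grouped([0, 10, 20, 30], 10, [4, 10, 6, 2]))  # 10 + 10 * 6 / 10 = 16.0
```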

The relative position of the three measures of central tendency (mean, median, and mode)

depends on the shape of the distribution. All three measures are identical in a normal distribution

[Figure 1a]. As mean is always pulled toward the extreme observations, the mean is shifted to the

tail in a skewed distribution [Figure 1b and c]. Mode is the most frequently occurring score and

hence it lies in the hump of the skewed distribution. Median lies in between the mean and the

mode in a skewed distribution.
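
The ordering described above can be seen in a small hypothetical data set. This is only an illustration with invented numbers; it shows the pattern for one right-skewed sample and its mirror image, not a proof that the ordering holds for every skewed distribution.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed sample: most values are small, with a long upper tail
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

# Mirror image of the same sample, which is left-skewed
left_skewed = [20 - v for v in right_skewed]

print(mode(right_skewed), median(right_skewed), mean(right_skewed))  # 2, 3.0, 5
print(mode(left_skewed), median(left_skewed), mean(left_skewed))     # 18, 17.0, 15
```

In the right-skewed sample the order is mode < median < mean; mirroring the data reverses it.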

Figure 1

The relative position of the various measures of central tendency. (a) Normal distribution (b)

Positively (right) skewed distribution (c) Negatively (left) skewed distribution

Mean is generally considered the best measure of central tendency and the most frequently used

one. However, there are some situations where the other measures of central tendency are

preferred.

Median is preferred to mean when

1. There are a few extreme scores in the distribution.

2. Some scores have undetermined values.

3. There is an open-ended distribution.

4. Data are measured in an ordinal scale.

Mode is the preferred measure when data are measured in a nominal scale. Geometric

mean is the preferred measure of central tendency when data are measured in a

logarithmic scale.