
Measures of Central Tendency & Dispersion
By Twino Ivan

Session Content
Explain measures of central tendency as used in quantitative methods.
State the commonly used measures of central tendency.
Compute some measures of central tendency using standard formulae.
Discuss their application in management.

Measures of Central Tendency
Measures of central tendency (or location) describe the centre of the entire data set.
They indicate the location around which items in the data set tend to concentrate, that is, where the centre of the distribution lies on the scale being used.
The most commonly used measures of central tendency are the mean (arithmetic mean), the median and the mode.

Measures of Central Tendency
Any statistical measure that indicates the centre of a set of data arranged in increasing or decreasing order of magnitude is therefore called a measure of central tendency (measure of central location).
Measures of central tendency support quick decision making, among other uses in management.

NOTE:
To get the full picture from measures of central tendency and dispersion as descriptive statistics, the investigator first needs to organize and present the raw data in a frequency distribution table. Descriptive statistics can then be computed from the table to simplify data analysis, interpretation and decision making in relation to the problem being researched. Refer to the case study on the next slide.

QM Case Study
The Ministry of Works is planning to construct a dual carriageway from Kampala to Entebbe International Airport. A pilot survey found that many people would have to be compensated if the project is implemented. To estimate the level of compensation, the ministry collected information from a sample of 50 households on the value of their property (in $000), as shown in Table 1 below.

Table 1: Property values (in $000) of the 50 sampled households
Group/organize this data and construct the frequency distribution table.
70 41 34 55 45 66 73 77 80 30
50 45 72 50 27 70 55 70 85 70
30 50 60 53 40 45 35 55 20 81
25 51 35 62 60 30 45 35 50 89
53 23 28 65 68 50 65 34 35 76

Class Work
Using Table 1 above, starting with the class 20-29 and using a constant class width, construct a cumulative frequency distribution table and:
(a) Determine the mean, median and mode of the given raw data and comment on the nature of the distribution of compensation.
(b) Determine the range, quartile deviation, standard deviation and variance of the given raw data.
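The grouping step of the class work can be sketched in Python. This is a minimal illustration, not a prescribed method: the function name `frequency_table` and the constants `START` and `WIDTH` are hypothetical, and the class width of 10 is inferred from the first class 20-29.

```python
from collections import Counter

# Raw property values (in $000) for the 50 households in Table 1.
values = [
    70, 41, 34, 55, 45, 66, 73, 77, 80, 30,
    50, 45, 72, 50, 27, 70, 55, 70, 85, 70,
    30, 50, 60, 53, 40, 45, 35, 55, 20, 81,
    25, 51, 35, 62, 60, 30, 45, 35, 50, 89,
    53, 23, 28, 65, 68, 50, 65, 34, 35, 76,
]

START, WIDTH = 20, 10  # assumption: first class is 20-29, constant width 10


def frequency_table(data, start=START, width=WIDTH):
    """Return rows of (class label, mid-mark, frequency, cumulative frequency)."""
    # Map each value to the index of the class it falls in.
    counts = Counter((v - start) // width for v in data)
    rows, cumulative = [], 0
    for k in range(max(counts) + 1):
        lower = start + k * width
        upper = lower + width - 1
        f = counts.get(k, 0)
        cumulative += f
        rows.append((f"{lower}-{upper}", (lower + upper) / 2, f, cumulative))
    return rows


table = frequency_table(values)
for label, mid, f, cf in table:
    print(f"{label:>6}  x={mid:5.1f}  f={f:2d}  cf={cf:2d}")
```

The last cumulative frequency should equal the sample size (50), which is a quick check that no observation was missed while tallying.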

Format of cumulative frequency distribution table for organizing data
Class interval | Class boundary | Class mid-mark (x) | Tally bars | Frequency (f) | Cumulative frequency (cf) | fx | fx^2
Total | - | - | - | $\sum f =$ | - | $\sum fx =$ | $\sum fx^2 =$

MEAN (Arithmetic Mean)
The mean is the sum of the observations divided by the number of observations.
It is the most widely used measure of central tendency.
It indicates where the centre of the distribution is located on the scale.
The mean is a single value that is representative of all items in the population or sample.

Mean for discrete data (ungrouped data)
If a set of data $x_1, x_2, \dots, x_n$, not necessarily all distinct, represents a finite sample of size $n$, its mean is given by:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}$

Mean for discrete data (ungrouped data): example
Given 10, 22, 31, 9, 24, 27, 29, 9, 23, 12:
$\bar{x} = \frac{\sum x}{n} = \frac{196}{10} = 19.6$
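The same calculation can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative.

```python
# The ten observations from the ungrouped example.
data = [10, 22, 31, 9, 24, 27, 29, 9, 23, 12]

total = sum(data)          # sum of observations
n = len(data)              # number of observations
mean_value = total / n     # mean = sum of observations / n

print(f"sum = {total}, n = {n}, mean = {mean_value}")
```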

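Arithmetic Mean: grouped data
The mean (average) for grouped data is given by the formula:
$\bar{x} = \frac{\sum fX}{\sum f}$
where $X$ is the class mid-mark and $f$ is the class frequency.
Equivalently, in index notation over the $n$ classes:
$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$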
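As a hedged illustration of the grouped-data formula, the sketch below uses the midpoints of Table 2 (the aptitude-test example later in this deck). The frequencies are an assumption recovered as fx divided by x from that table, and the function name `grouped_mean` is hypothetical.

```python
# Class mid-marks (x) from Table 2; frequencies (f) are assumed,
# recovered as fx / x from the fx column of that table.
midpoints = [22, 27, 32, 37, 42, 47]
frequencies = [2, 3, 6, 5, 10, 4]


def grouped_mean(x, f):
    """Mean of grouped data: sum(f * x) / sum(f)."""
    sum_fx = sum(fi * xi for fi, xi in zip(f, x))
    return sum_fx / sum(f)


mean_score = grouped_mean(midpoints, frequencies)
print(f"sum f = {sum(frequencies)}, mean = {mean_score}")
```

With these inputs the totals agree with the table (sum of f is 30 and sum of fx is 1110), so the mean score is 37.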

Mean
The simple or arithmetic average of a set of values or quantities, computed by dividing the total of all values by the number of values. For example, the mean of 1, 2, 3, 4 and 5 is 15 divided by 5, which is 3.
It is the most common and best general-purpose measure of the mid-point (around which all other values cluster) of a set of values, but it is prone to distortion by extreme values and may need to be reported together with a measure of dispersion (such as the mean deviation or standard deviation). Also called the arithmetic mean. See also median and mode.

Application/uses of mean
For decision making: used as a decision-making tool, e.g. in estimating per capita incomes, average wage payments, factor productivity, average expenditures or profit/cost per unit of output.
For performance assessment/analysis, and for generalization/inference on certain business indicators.
For comparison purposes (comparative studies of different distributions or communities/countries).
For describing the distribution in a concise manner (the value around which the data is distributed).
For monitoring and evaluation purposes given the baseline information (use the mean before and after intervention).
For research purposes and problem identification.
For computing other basic characteristics of a mass of data, e.g. variances and standard deviations.
For auditing, planning and quality control, etc.

THE MEDIAN
The median is another measure of central tendency. It corresponds to half of the total frequency of the distribution of organized data.
For discrete data arranged in increasing or decreasing order of magnitude, the median is the middle value when the number of observations is odd, and the arithmetic mean of the two middle values when the number of observations is even.

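Median for discrete data (ungrouped data)
The median is the middle value of the ordered distribution, found at position (N+1)/2.
When N is even, this position falls between two observations, and the median is the mean of those two values.
Given 10, 22, 31, 9, 24, 27, 29, 9, 23, 12.
Arranged in ascending order: 9, 9, 10, 12, 22, 23, 24, 27, 29, 31
Position (N+1)/2 = 11/2 = 5.5, i.e. between the 5th and 6th values.
Then median = (22 + 23)/2 = 22.5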
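The odd/even rule can be written as a short function. This is a minimal sketch; the name `median_ungrouped` is illustrative.

```python
def median_ungrouped(data):
    """Middle value if n is odd, mean of the two middle values if n is even."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even n: position (n + 1) / 2 falls between two observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [10, 22, 31, 9, 24, 27, 29, 9, 23, 12]
median_value = median_ungrouped(data)
print(sorted(data), median_value)
```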

Median for Grouped data
$\text{Median} = L + \left(\frac{\frac{N}{2} - cf}{f}\right) C$
Where:
L = lower class boundary of the median class
f = frequency of the median class
cf = cumulative frequency of the class before the median class
C = class width of the median class
N/2 = position of the middle item
(N/2) - cf gives the position of the median within the median class.

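Median: grouped data (same formula with subscripts)
$\text{Median} = L_m + \left(\frac{\frac{N}{2} - CF_{bm}}{f_m}\right) C_m$
Where
$L_m$ is the lower class boundary of the median class
$N$ is the total number of observations
$CF_{bm}$ is the cumulative frequency of the class before the median class
$f_m$ is the frequency of the median class
$C_m$ is the class width (class interval) of the median class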
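A hedged sketch of the grouped-median formula follows, using the Table 2 classes from the example later in this deck. Two things are assumptions: the frequencies (recovered as fx divided by x) and the lower class boundary taken as the lower class limit minus 0.5, which holds only for integer class limits such as 20-24, 25-29. The function name is hypothetical.

```python
def grouped_median(lower_limits, width, freqs):
    """Median = L + ((N/2 - cf) / f) * C for a grouped frequency table."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty frequency table")
    half = n / 2
    cf_before = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cf_before + f >= half:
            boundary = lower - 0.5  # assumption: integer class limits
            return boundary + (half - cf_before) / f * width
        cf_before += f
    raise ValueError("median class not found")


# Table 2 classes 20-24, ..., 45-49 (frequencies are an assumption, see above).
lower_limits = [20, 25, 30, 35, 40, 45]
frequencies = [2, 3, 6, 5, 10, 4]

median_score = grouped_median(lower_limits, 5, frequencies)
print(median_score)
```

Here N/2 = 15 falls in the class 35-39 (cumulative frequency 16), so L = 34.5, cf = 11, f = 5 and C = 5.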

THE MODE
The mode is the value in the distribution that occurs most frequently, that is, the value with a greater frequency than any other.
For grouped data, the mode can be estimated from the class with the highest frequency (the modal class).
It is used to answer research questions about the most common responses, occurrences or observations.

Estimation of Mode
$\text{Mode} = L + \left(\frac{f_a}{f_a + f_b}\right) C$
Where:
L = lower class boundary of the modal class
$f_a$ = frequency of the post-modal class (the class after the modal class)
$f_b$ = frequency of the pre-modal class (the class before the modal class)
C = class width

Estimation of Mode (alternative)
Alternatively, the mode can be given as:
$\text{Mode} = L + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) C$
Where:
L = lower class boundary of the modal class
$f_m$ = frequency of the modal class
$\Delta_1 = f_m - f_b$ = difference between the frequency of the modal class and that of the pre-modal (before) class
$\Delta_2 = f_m - f_a$ = difference between the frequency of the modal class and that of the post-modal (after) class
C = class width of the modal class
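The two estimates of the mode can be compared side by side. The sketch below again uses the Table 2 classes from the later example, with frequencies assumed (recovered as fx divided by x) and the lower boundary assumed to be the lower limit minus 0.5. The function name is illustrative, and the code does not handle the edge cases where a denominator is zero.

```python
def grouped_mode(lower_limits, width, freqs):
    """Return (mode by fa/(fa+fb), mode by d1/(d1+d2)) for a grouped table."""
    m = max(range(len(freqs)), key=lambda i: freqs[i])  # index of modal class
    fm = freqs[m]
    fb = freqs[m - 1] if m > 0 else 0                   # pre-modal frequency
    fa = freqs[m + 1] if m + 1 < len(freqs) else 0      # post-modal frequency
    L = lower_limits[m] - 0.5                           # assumption: integer limits
    d1, d2 = fm - fb, fm - fa
    mode_simple = L + fa / (fa + fb) * width
    mode_diff = L + d1 / (d1 + d2) * width
    return mode_simple, mode_diff


lower_limits = [20, 25, 30, 35, 40, 45]
frequencies = [2, 3, 6, 5, 10, 4]  # assumed, see lead-in

mode_simple, mode_diff = grouped_mode(lower_limits, 5, frequencies)
print(round(mode_simple, 2), round(mode_diff, 2))
```

Both estimates fall inside the modal class 40-44 but differ slightly, because the first weights only the neighbouring frequencies while the second uses their differences from the modal frequency.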

Common shapes for explaining the nature of a distribution (frequency vs observations)
Normal distribution (symmetric, bell-shaped): observations are spread symmetrically, and the distribution can be divided into two almost equal parts. For such distributions the mean, mode and median are the same or very close to each other.
Positive skewness (right skewness): most observations (the mode) cluster around relatively low values, resulting in a long right tail compared to a short left tail. In this case: mean > median > mode.
Negative skewness (left skewness): most observations (the mode) cluster around relatively high values, resulting in a long left tail compared to a short right tail. In this case: mean < median < mode.

Pearson's coefficient of skewness
$Sk = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$
OR coefficient of skewness $= \frac{\text{mean} - \text{mode}}{SD}$
It describes the nature of the distribution of the data. A value equal or close to zero implies a symmetric (approximately normal) distribution; otherwise the distribution is positively or negatively skewed according to the sign (+ or -) of the value. The median-based form always lies between -3 and +3, and for moderately skewed data it usually falls within about -1 to +1.
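A minimal sketch of the median-based coefficient, applied to the ten ungrouped observations used in the mean and median examples. It relies on Python's standard `statistics` module and uses the population standard deviation (`pstdev`, dividing by n), which is an assumption chosen to match the variance formula used in this deck. The function name is illustrative.

```python
from statistics import mean, median, pstdev


def pearson_skewness(data):
    """Sk = 3 * (mean - median) / standard deviation (population SD)."""
    return 3 * (mean(data) - median(data)) / pstdev(data)


data = [10, 22, 31, 9, 24, 27, 29, 9, 23, 12]
sk = pearson_skewness(data)
print(round(sk, 3))
```

The mean (19.6) is below the median (22.5), so the coefficient is negative, indicating left skewness. Its magnitude is slightly above 1 for this small sample, which shows that -1 to +1 is a typical range rather than a strict bound.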

Estimated relationship between Mean, Median and Mode
Mode = 3(Median) - 2(Mean)


Measures of Central Tendency: The Shape of Distributions
With perfectly bell-shaped distributions, the mean, median and mode are identical.
With positively skewed data, the mode is lowest, followed by the median and the mean.
With negatively skewed data, the mean is lowest, followed by the median and the mode.

Median vs. Mean Values (cont.)
If the distribution of the data is skewed, the mean is pulled (relative to the median) in the direction of the long thin tail.
For example, income is distributed in a highly skewed fashion, with a long thin tail in the direction of higher income. Mean income is therefore typically considerably higher than median income, because the majority earn low incomes, for example classroom teachers in MOES or police in IA.

Session II: Measures of Dispersion

Measures of Dispersion
Measures of dispersion are descriptive statistics that describe the nature of variation (deviation), or the degree of scatter, of data around a measure of central tendency (mainly the mean).
The mean, median and mode only estimate the location (value) where data tend to concentrate (the centre of the distribution); they do not describe how the data is spread about them. Hence the need for measures of dispersion.
The degree to which numerical data tend to spread about an average value (mean) is called the variation or dispersion of the data.

Measures of Dispersion
They show how the observations are spread out from the average or mean across the entire data set, in the sample or population.
For instance, they show how wage payments and household incomes are spread around the average wage rate and average income respectively (they depict the degree of inequality or variability within the set of data under consideration).

Measures of Dispersion
They enable the investigator to know the extent to which the values in the distribution are dispersed around the measure of central tendency, for management, monitoring, evaluation, policy and decision-making purposes.
Such measures help in comparing the distributions of different phenomena, or two sets of data for two communities whose SD values are given (note: the one with the lower SD is often preferred to the one with higher disparities and risks).
The larger the measure of dispersion, the greater the deviation of values about the mean.

Examples of Measures of Dispersion
Some important descriptive statistics for measuring the dispersion or variability of a set of data are:
Range
Variance
Standard deviation
Coefficient of variation (the standard deviation as a percentage of the mean; a good measure for comparing variability between two sets of data): C.V. = (standard deviation / mean) × 100
Mean absolute deviation
Quartile deviation
NOTE: Use Table 1 to compute the above measures.

RANGE
The range R of a set of (ungrouped) data is the difference between the highest and the lowest values in the set: R = H - L
Where R = range
H = highest value of observation
L = lowest value of observation
However, the range is a poor measure of variation, particularly when the sample is large, because it depends only on the two extreme values.

Range For Grouped Data
For ungrouped data, the range is the difference between the largest and smallest values among all observations.
For grouped data, the range R is the difference between the upper class limit of the last (highest) class interval and the lower class limit of the first (lowest) class interval of the frequency distribution table.

Variance
Variance is the arithmetic mean of the squared deviations $(X - \bar{x})^2$ of the individual observations from their arithmetic mean $\bar{x}$.
It is denoted by $S^2$ for a sample distribution and $\sigma^2$ for a population.
For ungrouped data:
$\sigma^2 = \frac{\sum (X - \bar{x})^2}{n}$
For grouped data:
$\sigma^2 = \frac{\sum f (X - \bar{x})^2}{\sum f}$

STANDARD DEVIATION
The standard deviation ($\sigma$) is the positive square root of the variance.
It is also known as the root-mean-square deviation: the square root of the mean of the squared deviations of the observations from the mean of the distribution.
It indicates how far above or below the mean the observations typically are, that is, the spread or variability of the data around the mean.

Standard Deviation (for grouped data)
Standard deviation is expressed as:
$\sigma = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2}$
Alternatively,
$SD = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f}}$
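The sketch below computes the variance of grouped data in two ways, from the squared deviations and from the fx and fx^2 column totals, then the standard deviation and the coefficient of variation. It uses the Table 2 midpoints from the example later in this deck; the frequencies are an assumption (recovered as fx divided by x), and the function name is hypothetical.

```python
from math import sqrt

midpoints = [22, 27, 32, 37, 42, 47]
frequencies = [2, 3, 6, 5, 10, 4]  # assumed, see lead-in


def grouped_dispersion(x, f):
    """Return (mean, variance by deviations, variance by totals, SD, CV%)."""
    n = sum(f)
    mean = sum(fi * xi for fi, xi in zip(f, x)) / n
    # Definition: mean of the squared deviations from the mean.
    var_dev = sum(fi * (xi - mean) ** 2 for fi, xi in zip(f, x)) / n
    # Computational form: sum(f x^2) / sum(f) - (sum(f x) / sum(f))^2.
    var_tot = sum(fi * xi ** 2 for fi, xi in zip(f, x)) / n - mean ** 2
    sd = sqrt(var_dev)
    cv = sd / mean * 100
    return mean, var_dev, var_tot, sd, cv


mean, var_dev, var_tot, sd, cv = grouped_dispersion(midpoints, frequencies)
print(f"mean={mean:.2f} variance={var_dev:.2f} sd={sd:.2f} cv={cv:.1f}%")
```

The two variance forms agree up to rounding, which is a useful check when the fx and fx^2 columns of the table are filled in by hand.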

Importance of SD
It is an important measure of the degree of dispersion or variation of items in the distribution from the mean.
It makes the distribution of observations easier to understand.
The larger the SD, the greater the degree of dispersion from the mean. Hence it is applicable in analysing the degree and nature of the distribution of incomes or wages around the average figure or per capita income in an economy.
NOTE: If there is no dispersion, all values in the distribution are the same (SD = 0), i.e. equality.
It is used for comparison purposes given two sets of data for two communities (whose SD values have been obtained).
For quality control, quality assurance and auditing against set standards.
Applied in risk analysis and decision making.

For decision making purposes.
For comparison of different distributions or situations for communities/countries before and after interventions.
For describing the nature of the distribution in a concise manner.
For monitoring and evaluation purposes given the baseline information (use the mean before and after intervention).
For research purposes and problem identification.
For auditing and quality control purposes.
For planning and policy purposes.

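Quartiles
For n observations arranged in order of size, the lower quartile, Q1, is the value 25% of the way through the ordered data.
The upper quartile, Q3, is the value 75% of the way through the ordered data.
The quartiles together with the median divide the distribution into 4 equal parts.
For instance, given the ungrouped data (n = 12): 3, 3, 5, 6, 8, 9, 12, 14, 19, 20, 24, 30
Taking the values at positions n/4 = 3, n/2 = 6 and 3n/4 = 9:
Q1 (lower quartile) = 5
Q2 (median) = 9
Q3 (upper quartile) = 19
Note that the rule given earlier for ungrouped data (the mean of the two middle values when n is even) gives a median of (9 + 12)/2 = 10.5; the simpler position rule is used here because it matches the positions N/4, N/2 and 3N/4 used for grouped data.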

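The Quartile for grouped data (apply the formula for median)
$Q_1 = L_{Q1} + \left(\frac{\frac{N}{4} - CF_{bQ1}}{f_{Q1}}\right) C_{Q1}$
and
$Q_3 = L_{Q3} + \left(\frac{\frac{3N}{4} - CF_{bQ3}}{f_{Q3}}\right) C_{Q3}$
Where, for the class containing the quartile, L is the lower class boundary, CFb is the cumulative frequency of the class before it, f is its frequency and C is its class width.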

The Inter-quartile range and Quartile deviation
The inter-quartile range is the difference between the upper quartile and the lower quartile, i.e. $Q_3 - Q_1$.
The quartile deviation is half of the inter-quartile range:
$QD = \frac{Q_3 - Q_1}{2}$
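The grouped quartile formulas and the quartile deviation can be sketched as follows, using the Table 2 classes from the example that follows. The frequencies are an assumption (recovered as fx divided by x), the lower boundary is assumed to be the lower class limit minus 0.5 (integer class limits), and the function name is illustrative.

```python
def grouped_quantile(position, lower_limits, width, freqs):
    """Value at a cumulative position: L + ((position - CFb) / f) * C."""
    cf_before = 0  # cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cf_before + f >= position:
            return (lower - 0.5) + (position - cf_before) / f * width
        cf_before += f
    raise ValueError("position exceeds total frequency")


lower_limits = [20, 25, 30, 35, 40, 45]
frequencies = [2, 3, 6, 5, 10, 4]  # assumed, see lead-in
N = sum(frequencies)

q1 = grouped_quantile(N / 4, lower_limits, 5, frequencies)      # position N/4
q3 = grouped_quantile(3 * N / 4, lower_limits, 5, frequencies)  # position 3N/4
iqr = q3 - q1
qd = iqr / 2
print(f"Q1={q1:.2f} Q3={q3:.2f} IQR={iqr:.2f} QD={qd:.2f}")
```

Passing `N / 2` as the position gives the grouped median, which is why the slide says to apply the median formula.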

Example
Table 2 below shows the scores of 30 applicants for secretarial positions in a large project, obtained in an aptitude test marked out of 50 and designed to measure clerical skills. Complete the table and compute the descriptive statistics.

Illustration (Table 2)
Class interval | Midpoint (x) | Frequency (f) | fx
20 - 24 | 22 | 2 | 44
25 - 29 | 27 | 3 | 81
30 - 34 | 32 | 6 | 192
35 - 39 | 37 | 5 | 185
40 - 44 | 42 | 10 | 420
45 - 49 | 47 | 4 | 188
Total | | $\sum f = 30$ | $\sum fx = 1110$

Group Work
Using Table 2:
(a) Determine the mean, median and mode scores and state the managerial implication of each where possible.
(b) Estimate the range, variance, standard deviation, quartile deviation, coefficient of variation and degree of skewness, and state the managerial implication of each.

Conclusion
Since today's decisions are driven by data, measures of central tendency and dispersion are vital to statistical thinking for managerial and business decisions.
Such statistical measures help managers and professionals to solve problems in a diversity of contexts, add substance to decisions, and reduce guesswork and decisions based entirely on personal opinions or beliefs.

Importance of Quantitative Methods in Management
Important for decision making based on relevant and reliable quantitative data and findings (evidence-based decisions, statistical thinking for managerial decisions) rather than only on personal opinions, attitudes or beliefs.
Statistics helps in predicting future trends (forecasting socio-economic, business, political and organizational performance trends and outcomes) in an economy or country, in organizations and in markets.
Quantitative methods are important in determining the rates of change in various socio-economic variables over time.
They are used in quality control, planning, budgeting for scarce resources and control of activities subject to resource constraints.

Contd.
They help to measure the effect of a change in one variable (the independent variable, IV) on another (the dependent variable, DV), and the rates of change.
They are applied for comparison purposes that are vital for policy formulation, policy and project evaluation and monitoring, management purposes and comparative studies of different distributions or communities/countries, e.g. per capita income, standard of living (SOL), inflation and exchange rates.
Regression and correlation techniques help managers, planners and firms to know the relationship between two or more variables (e.g. sales and advertising, rewards and organizational performance), on which management decisions and policies are based.

Quantitative approaches are faster in providing solutions to the managerial problems of what, when, how and for whom to produce using scarce resources.
Linear programming and optimization models are used by firms and organizations to estimate optimal resource combinations for minimizing cost and maximizing profits or returns, and hence for prioritizing activities subject to resource constraints.
For describing the distribution of data in a concise manner (e.g. the average is the value around which data is distributed).
For monitoring and evaluation purposes given the baseline information (for performance assessment/analysis).
Used for problem identification, or identification of knowledge/information gaps, for research purposes.

Contd.
Probability theory (as part of management science) is important in risk analysis for decision making and management.
Statistics helps a lot in policy making. To formulate a policy, statistics provides baseline information with adequate numerical data relevant to the phenomenon, e.g. policy on education depends on data related to educational issues, and policy on health depends on medical data.
HR management: the study and proper analysis of statistical data on wage rates, employment trends, cost of living (COL) indices, employee grievances, labour turnover rates, performance appraisal records, etc. assists employers and the personnel department in formulating personnel policies and strategies for manpower planning and organizational development.
Index numbers help government, individuals and firms to analyse changes in certain variables (prices, exchange rates, COL, poverty levels, etc.) over time for appropriate interventions.