
In the Name of ALLAH, the Beneficent, the Merciful,

O Allah, send Your salutations upon Muhammad (PBUH) & on the Family of Muhammad (PBUH), as You sent Your salutations upon Ibrahim & on the Family of Ibrahim; verily You are Most Praiseworthy & Glorious.

Quantitative Methods
for Decision Making
A Practical and Philosophical approach
By,
Yaseen Ahmed Meenai
Faculty, FCS-IBA
ymeenai@iba.edu.pk

What is Statistics (A science or an art?)
Statistics is the activity of obtaining data and then compiling, summarizing, presenting, analyzing and interpreting it, and finally drawing conclusions from it.
In short, it is:
Data → Process → Information/Conclusions
Statistics is a mixture of science and art: up to the processing stage it is a SCIENCE, while drawing conclusions is an individual's ART.

What is DATA (A word or a Keyword?)
DATA is a group of raw facts and figures which may VARY from person to person, object to object, distance to distance and time to time.
Only the absence of VARIATION can produce a CONSTANT, and such an absence doesn't exist in our physical world. Only spiritualism can define a CONSTANT.

Data v/s Variable
A variable is the storage of data; it is represented by letters such as X, Y, Z, etc.
There are two types of variables:
Qualitative Variable: It deals with data that vary by kind and provide labels, or names, for categories of like items, i.e. a set of observations where any single observation is a word or code that represents a class or category. Gender, complexion, weather and type are some examples.
Quantitative Variable: It deals with numeric data, which measure either how much or how many of something, i.e. a set of observations where any single observation is a number that represents an amount or a count. Age, height, number and price are some examples of quantitative variables.
Source: http://www.microbiologybytes.com/maths/1011-17.html

Inactivity breaker
Objective: Take a blank page from your writing material and divide it into two columns in the following manner:

Qualitative Variables | Quantitative Variables
1- Gender             | 1- Age
2- Complexion         | 2- Height
3- Qualification      | 3- Weight
4- Weather            | 4- Price
...                   | ...
20.                   | 20.

Try to write at least 20 variables in each column by observing several fields like management, agriculture, medicine, engineering, geology, etc. Submit the same sheet with your full name written at the top.

Data Sources
There are three major sources of data:
1. Survey/Census: An official, usually periodic, enumeration of a population, often including the collection of related demographic information, is called a census. A survey means inspecting and determining the conditions of interest.
2. Experiment: Any activity which is usually conducted within an isolated (controlled) environment and produces results is called an experiment.
3. Simulation: An artificial way of collecting data.

Question of the Day.
What do you think about the quality of the following at IBA??
1- Teaching 1, 2, 3, 4, 5
2- Administration 1, 2, 3, 4, 5
3- Structure 1, 2, 3, 4, 5
where 1 = Very Poor and 5 = Excellent

Data Collection/Compilation
Teaching ranks, where 1 = Very Poor and 5 = Excellent:
4.5  3.7  4.3  3.3  2.7  4.7  3.8  4.5  3.4  4.0
3.8  2.7  4.3  3.4  3.2  3.7  3.9  3.8  3.8  3.7
3.6  5.0  4.2  4.1  4.2  4.1  3.9  4.5  5.0  3.7
4.8  3.2  4.2  4.5  4.2  5.0  2.9
Data collection/compilation is needed to capture the actual behavior of the variable.
Note: The above data is a simulated version of the actual data.

Data Tabulation (Grouping Exercise)
Step # 01: Finding the range
Range = Max − Min = 5.0 − 2.7 = 2.3
Step # 02: Finding the number of classes (log to base 10)
No. of classes = 1 + 3.3 log(n) = 1 + 3.3 log(37) = 6.175 ≈ 6
Step # 03: Finding the width or height (h)
h = Range/No. of classes = 2.3/6.175 = 0.372 ≈ 0.4
Class Interval: One of the intervals into which the range of a variable of a distribution is divided, especially one of the divisions of the base line of a bar chart or histogram.
After forming the structure of class intervals and frequencies using tally marks, we can observe the actual behavior of the data.
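The three steps above can be sketched in Python (the language is our choice; the slides work by hand and in MS EXCEL). The sketch applies the slide's rule, No. of classes = 1 + 3.3 log10(n), to the 37 simulated teaching ranks; the function name `class_plan` is hypothetical.

```python
import math

# The 37 simulated teaching ranks from the Data Collection/Compilation slide
ranks = [4.5, 3.7, 4.3, 3.3, 2.7, 4.7, 3.8, 4.5, 3.4, 4.0,
         3.8, 2.7, 4.3, 3.4, 3.2, 3.7, 3.9, 3.8, 3.8, 3.7,
         3.6, 5.0, 4.2, 4.1, 4.2, 4.1, 3.9, 4.5, 5.0, 3.7,
         4.8, 3.2, 4.2, 4.5, 4.2, 5.0, 2.9]


def class_plan(data):
    """Return (range, number of classes, class width) for grouping `data`."""
    data_range = max(data) - min(data)       # Step 01: Range = Max - Min
    k = 1 + 3.3 * math.log10(len(data))      # Step 02: 1 + 3.3 log(n)
    h = data_range / k                       # Step 03: h = Range / No. of classes
    return data_range, k, h


r, k, h = class_plan(ranks)
print(f"Range = {r:.1f}, classes = {k:.3f}, width = {h:.3f}")
```

Rounding the width up to 0.4 (rather than down) is a deliberate choice: 6 classes of width 0.4 span 2.4, which still covers the range of 2.3.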


Data → Process → Information
[Figure: frequency distribution table (Ranks vs. Frequency) and histogram of the teaching ranks; x-axis: Ranks, y-axis: Frequency]
The frequency distribution table and histogram above reveal the shape of the thoughts generated in the students' minds. If we find a mathematical model that describes this shape, it is called a probability distribution.
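A tally-style frequency count for the 37 ranks can be sketched as follows. The class boundaries are an assumption: six classes of width 0.4 starting at the minimum 2.7, each including its lower limit and excluding its upper limit, so the counts may differ from a table built with other boundaries (for example, Excel's Histogram tool treats each bin number as an inclusive upper boundary). Ranks are recorded to one decimal place, so the sketch works in whole tenths to avoid floating-point errors at class boundaries.

```python
ranks = [4.5, 3.7, 4.3, 3.3, 2.7, 4.7, 3.8, 4.5, 3.4, 4.0,
         3.8, 2.7, 4.3, 3.4, 3.2, 3.7, 3.9, 3.8, 3.8, 3.7,
         3.6, 5.0, 4.2, 4.1, 4.2, 4.1, 3.9, 4.5, 5.0, 3.7,
         4.8, 3.2, 4.2, 4.5, 4.2, 5.0, 2.9]

START, WIDTH, K = 27, 4, 6   # in tenths: first lower limit 2.7, width 0.4, 6 classes

counts = [0] * K
for x in ranks:
    tenths = round(x * 10)                      # e.g. 3.9 -> 39 exactly
    j = min((tenths - START) // WIDTH, K - 1)   # class index; last class absorbs the max
    counts[j] += 1

for j, c in enumerate(counts):
    lower = (START + j * WIDTH) / 10
    upper = (START + (j + 1) * WIDTH) / 10
    print(f"{lower:.1f} - {upper:.1f} | {'/' * c:<10} {c}")   # tally marks, then count
```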

Grouping the data (MS EXCEL)
The Data Analysis option is located in the Data menu; if it is not present there, we can activate it through the Add-Ins section of Excel Options.

Grouping the data (MS EXCEL) cont…
After providing the data range and ticking the Labels and Chart Output options, we can find the histogram either in a new worksheet or at a specific place in the existing sheet.
Bin numbers: These numbers represent the intervals that you want the Histogram tool to use for measuring the input data in the data analysis.

Statistical Measures (An introduction)
The phrase descriptive statistics is used generically in place of statistical measures.
These statistics describe or summarize the qualities of data.
Another name is summary statistics, which we mostly use to enrich our reports/cases/research.
They are beneficial when a graphical summary alone is not sufficient for the final conclusions.
Data → Processing by Graph / Processing by Measure → Conclusions

Statistical Measures (An Example)
Consider the following grouped data:

Class Intervals | Frequency (f) | Relative Frequency (R.F.) | Cumulative Relative Frequency (C.R.F.)
2–4             | 2             | 2/25 = 0.08               | 0.08
4–6             | 5             | 5/25 = 0.20               | 0.28
6–8             | 9             | 9/25 = 0.36               | 0.64
8–10            | 7             | 7/25 = 0.28               | 0.92
10–12           | 2             | 2/25 = 0.08               | 1.00
                | Σf = 25       | ΣR.F. = 1                 |

The above data shows the income (in 1000s of Rupees) of some individuals in the late 1980s.
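The R.F. and C.R.F. columns can be reproduced with a short Python sketch. Here C.R.F. is computed from cumulative counts rather than by adding rounded R.F. values, which guarantees the last entry is exactly 1.

```python
from itertools import accumulate

# Income (Rs. '000) frequency distribution from the example above
classes = ["2-4", "4-6", "6-8", "8-10", "10-12"]
freq = [2, 5, 9, 7, 2]

n = sum(freq)                                        # total frequency
rel_freq = [f / n for f in freq]                     # R.F. = f / total
cum_rel_freq = [c / n for c in accumulate(freq)]     # C.R.F. = cumulative f / total

for c, f, rf, crf in zip(classes, freq, rel_freq, cum_rel_freq):
    print(f"{c:>6} | {f:>2} | {rf:.2f} | {crf:.2f}")
```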

Statistical Measures (Quartiles)
These are 3 values, represented by Q1, Q2 and Q3 respectively, which divide the data into 4 equal parts.
Each part contains 25% of the observations.
Quartiles usually highlight 4 different classes, i.e. Lower Class, Lower Middle, Upper Middle and Upper Class.
Min | 25% Lower Class | Q1 | 25% Lower Middle | Q2 | 25% Upper Middle | Q3 | 25% Upper Class | Max

Computing Quartiles
In order to compute quartile values, we need the same frequency distribution with an additional column of Cumulative Frequency.

Class Intervals | Frequency (f) | Cumulative Frequency (C.F.)
2–4             | 2             | 2
4–6             | 5             | 7
6–8             | 9             | 16
8–10            | 7             | 23
10–12           | 2             | 25
                | Σf = 25       |

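Computing Quartiles (Procedure)
For any grouped data, quartiles can be computed by following two simple steps:
Step-1: Finding the location of the ith quartile (where i = 1, 2 and 3):
$$\text{Location of } Q_i = \frac{i \sum f}{4}$$
Step-2: Finding the value of the ith quartile:
$$Q_i = l + \frac{h}{f}\left(\frac{i \sum f}{4} - C.F.\right)$$
where l = lower limit of the captured class (the first class whose C.F. reaches the location), h = class width, f = frequency of that class, and C.F. = cumulative frequency of the previous class.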

Computing Quartiles (Demo)
Using the frequency distribution with cumulative frequencies above (class width h = 2):
Step-1 (for Q1): (1 × 25)/4 = 6.25. The first class whose C.F. reaches 6.25 is 4–6 (C.F. = 7), so 4–6 is the 1st quartile class.
Step-2: Q1 = 4 + (2/5)(6.25 − 2) = 5.7
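The two-step procedure can be sketched in Python as below; `grouped_quartile` is a hypothetical helper name. It scans the C.F. column to find the captured class and then applies the Step-2 formula, reproducing Q1 = 5.7 as well as Q2 and Q3 for the income data.

```python
# Income classes (Rs. '000): lower limits, frequencies, class width h = 2
lower = [2, 4, 6, 8, 10]
freq = [2, 5, 9, 7, 2]
h = 2


def grouped_quartile(i, lower, freq, h):
    """Q_i = l + (h / f) * (i * sum(f) / 4 - C.F. of the previous class)."""
    location = i * sum(freq) / 4                  # Step-1: location of Q_i
    cf_prev = 0
    for l, f in zip(lower, freq):
        if cf_prev + f >= location:               # captured class found
            return l + h / f * (location - cf_prev)   # Step-2
        cf_prev += f
    raise ValueError("location lies beyond the last class")


q1, q2, q3 = (grouped_quartile(i, lower, freq, h) for i in (1, 2, 3))
print(round(q1, 3), round(q2, 3), round(q3, 3))
```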

Quartiles (Income Classes)
Min = 2000 | 25% Lower Class | Q1 = 5700 | 25% Lower Middle | Q2 = 7222 | 25% Upper Middle | Q3 = 8786 | 25% Upper Class | Max = 12000
Quartiles can also be computed using MS EXCEL, where the ungrouped form of the data is needed. The syntax is given below:
=QUARTILE(Data Range,i)
where i = 1, 2, 3 is the quartile number.
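For ungrouped data in Python, the standard `statistics.quantiles` function with `method="inclusive"` uses the same linear interpolation as Excel's `=QUARTILE(Data Range,i)` to our understanding (worth cross-checking in Excel on your own data). It is applied here to the 37 simulated teaching ranks, since only the grouped form of the income data is available.

```python
from statistics import quantiles

ranks = [4.5, 3.7, 4.3, 3.3, 2.7, 4.7, 3.8, 4.5, 3.4, 4.0,
         3.8, 2.7, 4.3, 3.4, 3.2, 3.7, 3.9, 3.8, 3.8, 3.7,
         3.6, 5.0, 4.2, 4.1, 4.2, 4.1, 3.9, 4.5, 5.0, 3.7,
         4.8, 3.2, 4.2, 4.5, 4.2, 5.0, 2.9]

# n=4 gives the three cut points Q1, Q2, Q3; "inclusive" treats the
# minimum and maximum as the 0th and 100th percentiles.
q1, q2, q3 = quantiles(ranks, n=4, method="inclusive")
print(q1, q2, q3)
```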

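Quartiles, Deciles and Percentiles
Quartiles: To divide the data into 4 equal parts. Quartiles are three values: Q1, Q2 and Q3.
Step-1: Location = i Σf / 4, where i = 1, 2, 3
Step-2: $Q_i = l + \frac{h}{f}\left(\frac{i \sum f}{4} - C.F.\right)$
Deciles: To divide the data into 10 equal parts. Deciles are nine values: D1, D2, D3, …, D9.
Step-1: Location = i Σf / 10, where i = 1, 2, 3, …, 9
Step-2: $D_i = l + \frac{h}{f}\left(\frac{i \sum f}{10} - C.F.\right)$
Percentiles: To divide the data into 100 equal parts. Percentiles are ninety-nine values: P1, P2, …, P99.
Step-1: Location = i Σf / 100, where i = 1, 2, 3, …, 99
Step-2: $P_i = l + \frac{h}{f}\left(\frac{i \sum f}{100} - C.F.\right)$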

Practice Questions
Q. What is the interval of income which covers the middle 50% of individuals?
Ans. 5700 to 8786 (from Q1 to Q3)
Q. What is the interval of income which covers the middle 40% of individuals?
Min | 30% | D3 | 40% | D7 | 30% | Max   (100% in total)
Q. What is the interval of income which covers the middle 30% of individuals?
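The same Step-1/Step-2 procedure generalizes to any number of equal parts, so one helper can answer all three practice questions. The middle 40% lies between D3 and D7 (30% on each side) and, by the same reasoning, the middle 30% lies between P35 and P65. This is a sketch; `grouped_quantile` is a hypothetical name.

```python
# Income classes (Rs. '000): lower limits, frequencies, class width h = 2
lower = [2, 4, 6, 8, 10]
freq = [2, 5, 9, 7, 2]
h = 2


def grouped_quantile(i, parts, lower, freq, h):
    """i-th cut point when grouped data is split into `parts` equal parts
    (parts=4 -> quartiles, parts=10 -> deciles, parts=100 -> percentiles)."""
    location = i * sum(freq) / parts                  # Step-1
    cf_prev = 0
    for l, f in zip(lower, freq):
        if cf_prev + f >= location:                   # captured class
            return l + h / f * (location - cf_prev)   # Step-2
        cf_prev += f
    raise ValueError("location lies beyond the last class")


middle_50 = (grouped_quantile(1, 4, lower, freq, h), grouped_quantile(3, 4, lower, freq, h))
middle_40 = (grouped_quantile(3, 10, lower, freq, h), grouped_quantile(7, 10, lower, freq, h))
middle_30 = (grouped_quantile(35, 100, lower, freq, h), grouped_quantile(65, 100, lower, freq, h))

for label, (a, b) in [("50%", middle_50), ("40%", middle_40), ("30%", middle_30)]:
    print(f"middle {label}: Rs. {a * 1000:.0f} to Rs. {b * 1000:.0f}")
```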

Exploratory Data Analysis (EDA) by Sir John Wilder Tukey
There are two types of studies:
Hypothetical Study
Exploratory Study
In an exploratory study, we can perform our analysis without following conventional methodologies. In EDA, we observe the trend of the data by applying different processes to it.
The box-plot is a very useful part of EDA.

The Box-Plot
[Figure: Boxplot of Teaching Ranks (axis: Teaching Ranks, 3 to 5), marking Min, Q1, Q2, Q3 and Max; Inter-quartile Range = Q3 − Q1]
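The numbers behind a box plot (the five-number summary and the IQR) can be sketched in Python. Two assumptions are worth flagging: the quartiles use `statistics.quantiles` with its default `"exclusive"` method (the common (n+1)p position convention), which may differ slightly from whatever convention drew the plot, and the whisker fences use Tukey's usual 1.5 × IQR rule, which the slide does not state.

```python
from statistics import quantiles

ranks = [4.5, 3.7, 4.3, 3.3, 2.7, 4.7, 3.8, 4.5, 3.4, 4.0,
         3.8, 2.7, 4.3, 3.4, 3.2, 3.7, 3.9, 3.8, 3.8, 3.7,
         3.6, 5.0, 4.2, 4.1, 4.2, 4.1, 3.9, 4.5, 5.0, 3.7,
         4.8, 3.2, 4.2, 4.5, 4.2, 5.0, 2.9]

q1, q2, q3 = quantiles(ranks, n=4)           # default method="exclusive"
iqr = q3 - q1                                 # Inter-quartile Range = Q3 - Q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # Tukey's 1.5 x IQR rule
outliers = [x for x in ranks if x < low_fence or x > high_fence]

print("Min", min(ranks), "Q1", round(q1, 2), "Q2", q2, "Q3", round(q3, 2), "Max", max(ranks))
print("IQR", round(iqr, 2), "outliers", outliers)
```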

Processing Data using Box-Plots
[Figure: Boxplots of Female Ages and Male Ages (means are indicated by solid circles); age axis from 25 to 45]
Annotations on the figure: Males are younger than females. One group is marked more variable, less consistent, heterogeneous and more diverse; the other is marked less variable, more consistent, homogeneous and less diverse.

Exploratory Analysis for Quality ranks from Aventis Field Managers
[Figure: Boxplots of Teaching, Administration & Structure ranks (means are indicated by solid circles)]

Statistical Measures (Central Tendency)
(Mean, Median and Mode)
Mean: The main problem associated with the mean of some data is that it is sensitive to outliers.
Median: The median is simply the middle value among the scores of a variable. It is the 2nd quartile (Q2) of any data.
Mode: The most frequent response or value for a variable. Multiple modes are possible: bimodal or multimodal.
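The three measures, and the mean's sensitivity to outliers, can be illustrated with Python's `statistics` module on the 37 simulated teaching ranks. The data-entry error below (the first 4.5 keyed in as 45) is hypothetical and only serves to show the effect of one extreme value.

```python
from statistics import mean, median, multimode

ranks = [4.5, 3.7, 4.3, 3.3, 2.7, 4.7, 3.8, 4.5, 3.4, 4.0,
         3.8, 2.7, 4.3, 3.4, 3.2, 3.7, 3.9, 3.8, 3.8, 3.7,
         3.6, 5.0, 4.2, 4.1, 4.2, 4.1, 3.9, 4.5, 5.0, 3.7,
         4.8, 3.2, 4.2, 4.5, 4.2, 5.0, 2.9]

print("Mean:", round(mean(ranks), 3))
print("Median (Q2):", median(ranks))
print("Mode(s):", multimode(ranks))      # several values tie, so the data is multimodal

# Hypothetical outlier: the first 4.5 mistyped as 45
typo = [45.0] + ranks[1:]
print("Mean with typo:", round(mean(typo), 3))   # pulled up by the outlier
print("Median with typo:", median(typo))          # unchanged
```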

Mean, Median and Mode
[Figure: a distribution with measurements on the x-axis and frequencies on the y-axis]
The mode is based on the principle of democracy (majority), while the median (Q2) follows the rule of moderation. The mean takes its place after being influenced by the higher values of the measurements. The distribution shown above is positively skewed.

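Mean and Mode (Computations)

Class Intervals   | Frequency (fi) | Mid-Points (xi) | fi xi
2–4               | 2              | (2+4)/2 = 3     | 2 × 3 = 6
4–6               | f1 = 5         | (4+6)/2 = 5     | 5 × 5 = 25
6–8 (Modal Class) | fm = 9         | (6+8)/2 = 7     | 9 × 7 = 63
8–10              | f2 = 7         | (8+10)/2 = 9    | 7 × 9 = 63
10–12             | 2              | (10+12)/2 = 11  | 2 × 11 = 22
                  | Σfi = 25       |                 | Σfi xi = 179

$$\text{Mean} = \bar{X} = \frac{\sum f_i x_i}{\sum f_i} = \frac{179}{25} = 7.16$$
i.e. Rs. 7160/- is the Average Income.
$$\text{Mode} = l + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times h = 6 + \frac{9 - 5}{18 - 5 - 7} \times 2 = 7.333$$
i.e. Rs. 7333/- is the Majority's Income.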
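Both grouped formulas can be sketched in Python. One assumption is labeled in the code: if the modal class were the first or last class, the missing neighbouring frequency is taken as 0 (a common convention; it does not arise here because the modal class 6–8 has neighbours on both sides).

```python
# Income classes (Rs. '000) with class width h = 2
lower = [2, 4, 6, 8, 10]
freq = [2, 5, 9, 7, 2]
h = 2

mid = [l + h / 2 for l in lower]                            # mid-points 3, 5, 7, 9, 11
mean = sum(f * x for f, x in zip(freq, mid)) / sum(freq)    # sum(f x) / sum(f)

m = freq.index(max(freq))                                   # modal class (first one if tied)
fm = freq[m]
f1 = freq[m - 1] if m > 0 else 0                            # assumption: 0 if no class before
f2 = freq[m + 1] if m + 1 < len(freq) else 0                # assumption: 0 if no class after
mode = lower[m] + (fm - f1) / (2 * fm - f1 - f2) * h

print(f"Mean = {mean:.2f} (Rs. {mean * 1000:.0f})")
print(f"Mode = {mode:.3f} (Rs. {mode * 1000:.0f})")
```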

Empirical relationship b/w Mean, Median and Mode
Following are the values for the mean, median and mode obtained from the income data:
$$\text{Mean} = \frac{\sum f_i x_i}{\sum f_i} = \frac{179}{25} = 7.160$$
$$\text{Median} = Q_2 = 7.222$$
$$\text{Mode} = l + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times h = 7.333$$
Mean < Median < Mode
(Thus the data is slightly negatively skewed.)
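A commonly quoted empirical relationship, due to Karl Pearson, is Mode ≈ 3 Median − 2 Mean for moderately skewed distributions. The slide does not state it, so treat this sketch as a supplementary check: with the income data it gives about 7.347, close to the computed mode of 7.333.

```python
# Grouped income data (Rs. '000): lower limits, frequencies, class width h = 2
lower = [2, 4, 6, 8, 10]
freq = [2, 5, 9, 7, 2]
h = 2
n = sum(freq)

mean = sum(f * (l + h / 2) for l, f in zip(lower, freq)) / n   # 179 / 25
median = 6 + h / 9 * (n / 2 - 7)     # Q2: class 6-8 (f = 9), C.F. of previous class = 7
mode = 6 + (9 - 5) / (2 * 9 - 5 - 7) * h   # modal class 6-8, f1 = 5, f2 = 7

pearson_mode = 3 * median - 2 * mean        # Pearson's empirical approximation
print(round(mean, 3), round(median, 3), round(mode, 3), round(pearson_mode, 3))
print("negatively skewed" if mean < median < mode else "not negatively skewed")
```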

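Arithmetic Mean, Geometric Mean and Harmonic Mean
For any ungrouped data, the Arithmetic Mean is:
$$A.M. = \frac{\sum x_i}{n}$$
where xi are the observations and n is the sample size.
For any ungrouped data, the Geometric Mean is:
$$G.M. = \left(x_1 \times x_2 \times \cdots \times x_n\right)^{1/n}$$
For any ungrouped data, the Harmonic Mean is:
$$H.M. = \frac{n}{\sum \frac{1}{x_i}}$$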

Arithmetic Mean, Geometric Mean and Harmonic Mean
Consider the following ungrouped data and compute the A.M., G.M. and H.M.:
Xi: 1, 2, 3, 4, 5; n = 5
A.M. = (1+2+3+4+5)/5 = 15/5 = 3.0
G.M. = $(1 \times 2 \times 3 \times 4 \times 5)^{1/5} = (120)^{1/5} = 2.6052$
H.M. = 5 / (1/1 + 1/2 + 1/3 + 1/4 + 1/5) = 5/2.2833 = 2.1898
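Python's standard `statistics` module provides all three means directly, which gives a quick check of the hand computation above.

```python
from statistics import fmean, geometric_mean, harmonic_mean

x = [1, 2, 3, 4, 5]
am = fmean(x)             # (1+2+3+4+5) / 5
gm = geometric_mean(x)    # (1*2*3*4*5) ** (1/5)
hm = harmonic_mean(x)     # 5 / (1/1 + 1/2 + 1/3 + 1/4 + 1/5)
print(round(am, 4), round(gm, 4), round(hm, 4))
```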

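Theorems related to AM, GM & HM
Empirically prove the following theorems:
Theorem No. 1: AM > GM > HM
(In general AM ≥ GM ≥ HM for positive data, with equality only when all observations are equal.)
3.0 > 2.6052 > 2.1898
Theorem No. 2: AM × HM ≈ GM²
3.0 × 2.1898 ≈ 2.6052²
6.569 ≈ 6.787 (diff. = 0.22)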
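Both theorems can be checked numerically with a short sketch (`three_means` is a hypothetical helper). For exactly two positive numbers a and b, AM × HM = ((a+b)/2)(2ab/(a+b)) = ab = GM², so Theorem No. 2 is exact there; for the five values 1 to 5 it is only approximate, which is the 0.22 gap above.

```python
from statistics import fmean, geometric_mean, harmonic_mean


def three_means(data):
    """Return (AM, GM, HM) for a list of positive numbers."""
    return fmean(data), geometric_mean(data), harmonic_mean(data)


am5, gm5, hm5 = three_means([1, 2, 3, 4, 5])
am2, gm2, hm2 = three_means([2, 8])

for am, gm, hm in [(am5, gm5, hm5), (am2, gm2, hm2)]:
    theorem1 = am >= gm >= hm          # Theorem No. 1
    gap = am * hm - gm ** 2            # Theorem No. 2: distance of AM x HM from GM^2
    print(theorem1, round(gap, 4))
```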

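Arithmetic Mean, Geometric Mean and Harmonic Mean for Group Data
For any grouped data, the Arithmetic Mean is:
$$A.M. = \frac{\sum f_i x_i}{\sum f_i}$$
where xi are the mid-points and fi are the class frequencies.
For any grouped data, the Geometric Mean is:
$$G.M. = \left(x_1^{f_1} \times x_2^{f_2} \times \cdots \times x_k^{f_k}\right)^{1/\sum f_i}$$
For any grouped data, the Harmonic Mean is:
$$H.M. = \frac{\sum f_i}{\sum \frac{f_i}{x_i}}$$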

AM, GM & HM (Computations)

Class Intervals | Frequency (fi) | Mid-Points (xi) | For A.M.: fi xi | For G.M.: xi^fi | For H.M.: fi/xi
2–4             | 2              | 3               | 2 × 3           | 3^2             | 2/3
4–6             | 5              | 5               | 5 × 5           | 5^5             | 5/5
6–8             | 9              | 7               | 9 × 7           | 7^9             | 9/7
8–10            | 7              | 9               | 7 × 9           | 9^7             | 7/9
10–12           | 2              | 11              | 2 × 11          | 11^2            | 2/11
                | Σfi = 25       |                 | Σfi xi = 179    | Π xi^fi         | Σ fi/xi
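The table's three columns can be turned into the grouped means with a short Python sketch. The G.M. is computed through logarithms, G.M. = exp(Σ fi ln xi / Σfi), which is algebraically the same as the product formula; a direct product is included as a cross-check.

```python
import math

# Income classes (Rs. '000): mid-points x_i and frequencies f_i
mid = [3, 5, 7, 9, 11]
freq = [2, 5, 9, 7, 2]
n = sum(freq)

am = sum(f * x for f, x in zip(freq, mid)) / n                          # sum(f x) / sum(f)
gm = math.exp(sum(f * math.log(x) for f, x in zip(freq, mid)) / n)     # via logarithms
gm_direct = math.prod(x ** f for f, x in zip(freq, mid)) ** (1 / n)     # (prod x^f)^(1/n)
hm = n / sum(f / x for f, x in zip(freq, mid))                          # sum(f) / sum(f/x)

print(round(am, 4), round(gm, 4), round(hm, 4))
```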

Mean, Median and Mode
MS EXCEL syntaxes for finding the three measures of central tendency are:
=AVERAGE(Data Range)       For Mean
=QUARTILE(Data Range,2)    For Median
=MODE(Data Range)          For Mode

Statistical Measures (Dispersion)
What is DISPERSION??
A dart game can help us in this.
Based on visual observation, we can declare Player-A as the winner because:
Player A is more consistent / less variable / homogeneous / less dispersed, and
Player B is less consistent / more variable / heterogeneous / more dispersed.

Measures of Dispersion
Some Important Measures of
Dispersion are:
Range=Max-Min
Variance
Standard Deviation
Mean Deviation
Inter-quartile Range
Coefficient of Variation (C.V.)

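Dispersion Measures (Cont…)
$$\text{Variance } V(X) = \sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}$$
Variance of the following ungrouped data:
X: 1, 2, 3, 4, 5; Mean = 3
V(X) = [(1−3)² + (2−3)² + (3−3)² + (4−3)² + (5−3)²]/5 = 10/5 = 2
Standard Deviation = σ = √V(X) = √2 = 1.414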
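The slide divides by n, which corresponds to Python's `statistics.pvariance` and `pstdev` (population variance and standard deviation). The sample versions, which divide by n − 1, are shown for comparison because calculators and spreadsheets often offer both.

```python
from statistics import pstdev, pvariance, variance

x = [1, 2, 3, 4, 5]
v = pvariance(x)            # sum((xi - mean)^2) / n, as on the slide
sd = pstdev(x)              # sqrt(V(X))
sample_v = variance(x)      # divides by n - 1 instead, for comparison
print(v, round(sd, 3), sample_v)
```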

Coefficient of Variation (Consistency Check)
In order to check whether a variable is consistent or not, we need to compute the coefficient of variation:
$$C.V. = \frac{\sqrt{V(X)}}{\bar{X}} \times 100 = \frac{\sigma}{\bar{X}} \times 100$$
For any consistent variable, C.V. < 100%.
C.V. is a unit-less measure of dispersion.
Variance & Standard deviation (group-data)

Class Intervals | Frequency (fi) | Mid-Points (xi) | fi xi        | fi (xi − mean)²
2–4             | 2              | (2+4)/2 = 3     | 2 × 3        | 2(3 − 7.16)² = 34.61
4–6             | 5              | (4+6)/2 = 5     | 5 × 5        | 5(5 − 7.16)² = 23.33
6–8             | 9              | (6+8)/2 = 7     | 9 × 7        | 9(7 − 7.16)² = 0.23
8–10            | 7              | (8+10)/2 = 9    | 7 × 9        | 7(9 − 7.16)² = 23.70
10–12           | 2              | (10+12)/2 = 11  | 2 × 11       | 2(11 − 7.16)² = 29.49
                | Σfi = 25       |                 | Σfi xi = 179 | Σ = 111.36

$$\text{Variance } V(X) = \frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i} = \frac{111.36}{25} = 4.45$$
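The grouped variance can be sketched in Python as below. Carried without intermediate rounding, the sum of fi(xi − mean)² is exactly 111.36, giving V(X) = 4.4544 and S.D. ≈ 2.111. The shortcut form Σfx²/Σf − mean² is included as an independent check.

```python
import math

# Income classes (Rs. '000): mid-points and frequencies
mid = [3, 5, 7, 9, 11]
freq = [2, 5, 9, 7, 2]
n = sum(freq)

mean = sum(f * x for f, x in zip(freq, mid)) / n                   # 7.16
ss = sum(f * (x - mean) ** 2 for f, x in zip(freq, mid))            # sum of f (x - mean)^2
var = ss / n                                                        # V(X)
sd = math.sqrt(var)                                                 # standard deviation

var_short = sum(f * x * x for f, x in zip(freq, mid)) / n - mean ** 2   # shortcut check
print(round(ss, 2), round(var, 4), round(sd, 3))
```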

Variable Comparison (Property of C.V.)
Coefficient of Variation for 1, 2, 3, 4, 5 (n = 5) is:
$$C.V. = \frac{\sqrt{V(X)}}{\bar{X}} \times 100 = \frac{1.414}{3} \times 100 = 47.1\%$$
And for the Income data (Σf = 25), it is:
$$C.V. = \frac{\sqrt{V(X)}}{\bar{X}} \times 100 = \frac{2.111}{7.16} \times 100 = 29.48\%$$
So technically, the income data is more consistent than the first five natural numbers.
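The comparison can be sketched in Python; `cv_percent` is a hypothetical helper. For the income data it plugs in the grouped variance 4.4544 and mean 7.16 from the preceding computations, so only the first five natural numbers are recomputed from raw values.

```python
import math
from statistics import fmean, pstdev


def cv_percent(sd, mean):
    """Coefficient of variation in percent: (S.D. / Mean) x 100."""
    return sd / mean * 100


naturals = [1, 2, 3, 4, 5]
cv_naturals = cv_percent(pstdev(naturals), fmean(naturals))   # about 47.1%
cv_income = cv_percent(math.sqrt(4.4544), 7.16)               # grouped income data, about 29.48%

print(round(cv_naturals, 1), round(cv_income, 2))
print("income data more consistent" if cv_income < cv_naturals else "natural numbers more consistent")
```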

Hand-Profile Analysis
(An exploratory approach)
[Figure: outline of a hand showing the measurements Thumb (X1), X2, X3, X4, X5, Span (X6) and Length (X7), in cms]
S.No. | X1 | X2 | X3 | X4 | X5 | X6 | X7
Determine the mean, standard deviation and coefficient of variation.

Computing Mean and Standard Deviation Using Scientific Calculators
New Models (ES Series):
Press MODE
Select STAT
Select 1-VAR
Enter the data in the data column that appears.
For finding the mean and standard deviation:
Press SHIFT and then press 1
Select VAR
Select x̄ for mean
Select xσn for standard deviation
Previous Models (MS Series):
Press MODE
Select SD
Entering the data:
Obs1 M+
Obs2 M+
Obs3 M+
Do it for all remaining data observations.
For finding the mean and standard deviation:
Press SHIFT and press 2
Select x̄ for mean
Select xσn for standard deviation

Why Bell-Shaped Symmetrical Distribution??
There are several symmetrical distributions.
In a bell-shaped distribution, extreme values come with less frequency.
The majority of observations fall within one standard deviation of the mean.
It's Nature's Distribution. God created almost all natural measures with a bell-shaped distribution.

Empirical Proof for the Approx. Confidence Intervals
Bring one neem leaf and measure its length in cms.
Obtain the mean and standard deviation.
Empirically prove the following theorems:
1) μ ± σ will cover approximately 68% of the observations
2) μ ± 2σ will cover approximately 95% of the observations
3) μ ± 3σ will cover approximately 99.7% of the observations
(Group the data and prove that it is bell-shaped and symmetric in nature.)
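The theoretical percentages and the empirical check can be sketched with Python's `statistics.NormalDist`. No leaf measurements are given, so the sample below is a simulated stand-in with hypothetical parameters (mean 10 cm, S.D. 1.5 cm); replace `leaves` with the class's real lengths. The helper name `coverage` is also hypothetical.

```python
import random
from statistics import NormalDist, fmean, pstdev

# Theoretical share within mean +/- k S.D. for a normal distribution
Z = NormalDist()
theory = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}


def coverage(data, k):
    """Share of observations within mean +/- k * S.D. of the data itself."""
    m, s = fmean(data), pstdev(data)
    return sum(m - k * s <= x <= m + k * s for x in data) / len(data)


# Simulated stand-in for leaf lengths (hypothetical parameters, fixed seed)
rng = random.Random(42)
leaves = [rng.gauss(10, 1.5) for _ in range(1000)]

for k in (1, 2, 3):
    print(k, round(theory[k], 4), round(coverage(leaves, k), 3))
```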

The Normal (Gaussian) Distribution
(Distribution of a continuous random variable)
Bell-shaped distribution or curve.
Perfectly symmetrical about the mean.
Mean = median = mode.
Tails are asymptotic: closer and closer to the horizontal axis but never reach it.
Approximate domain formula is $\mu - 3\sigma \le X \le \mu + 3\sigma$.

The Normal Probability Density Function
The PDF is written as:
$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}}, \qquad -\infty < X < \infty$$
where μ and σ are two parameters, which are the mean and standard deviation, respectively.
Simplify f(X) if μ = 0 and σ = 1.
The simplified form is said to be the Standard Normal Distribution.
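The PDF above translates directly into Python; `normal_pdf` is a hypothetical helper, cross-checked against the standard library's `NormalDist.pdf`. Evaluating at x = 0 with μ = 0 and σ = 1 gives 1/√(2π) ≈ 0.3989, the peak height of the standard normal curve.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


# Cross-check against the standard library at a few points
for x, mu, sigma in [(0, 0, 1), (1.18, 0, 1), (215, 220, 5)]:
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```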

Normal curves and probability

Finding Area Under the Standard Normal Curve
The Standard Normal Table comprises all possible areas under the standard normal curve.
These areas are to the left of Z = z, i.e. P(Z ≤ z).
For example, the area to the left of z = 1.18 can be written as P(Z ≤ 1.18) = 0.8810.

Finding Area Under the Standard Normal Curve
Determine the following areas/probabilities using the Standard Normal Table:
1- P(Z ≤ 1.25) =
2- P(Z < −1.00) =
3- P(Z = −1.00) =
4- P(Z ≥ +1.00) =
Solution:
P(Z ≥ +1.00) = 1 − P(Z < +1.00) = 1 − 0.8413 = 0.1587
Theorem: P(Z ≥ +1.00) = P(Z ≤ −1.00)

Finding Area Under the Standard Normal Curve
Determine the following areas/probabilities using the Standard Normal Table:
5- P(−1.00 ≤ Z ≤ +1.00) =
Solution:
P(−1.00 ≤ Z ≤ +1.00) = P(Z ≤ +1.00) − P(Z < −1.00)
Theorem:
P(a ≤ Z ≤ b) = P(Z ≤ b) − P(Z < a)
6- P(−2.25 ≤ Z ≤ −1.00) =
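The table look-ups for items 1 to 6 can be checked with `statistics.NormalDist`, whose `cdf` returns the same left-tail area the Standard Normal Table lists. Item 1 is read as a left-tail area P(Z ≤ 1.25), which is an assumption about the original inequality sign; item 3 is zero because a single point has no area under a continuous curve.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mu = 0, sigma = 1

answers = {
    "1- P(Z <= 1.25)": Z.cdf(1.25),               # read as a left-tail area (assumption)
    "2- P(Z < -1.00)": Z.cdf(-1.00),
    "3- P(Z = -1.00)": 0.0,                       # single point: zero area
    "4- P(Z >= +1.00)": 1 - Z.cdf(1.00),
    "5- P(-1.00 <= Z <= +1.00)": Z.cdf(1.00) - Z.cdf(-1.00),
    "6- P(-2.25 <= Z <= -1.00)": Z.cdf(-1.00) - Z.cdf(-2.25),
}
for question, area in answers.items():
    print(f"{question} = {area:.4f}")
```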

Observing Quantiles (Inverse consideration of Standard Normal Table)
Determine the following quantiles/percentage points/Z-scores using the Standard Normal Table:
7- P(Z ≤ a) = 0.025
Locate 0.025 in the body of the table: it lies in the row Z = −1.9 and the column 0.06.
Therefore, the answer will be a = −1.96.

Observing Quantiles (Inverse consideration of Standard Normal Table)
8- P(Z ≤ b) = 0.05
In the row Z = −1.6, the value 0.05 lies between 0.0505 (column 0.04) and 0.0495 (column 0.05).
Therefore, b = −[1.6 + (0.04 + 0.05)/2] = −1.645.
Otherwise, we can also take the nearest value.
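The inverse look-up corresponds to the inverse CDF, available as `NormalDist.inv_cdf` in Python's standard library. It confirms both table answers, including that −1.645 is a rounded midpoint of the exact quantile (about −1.6449).

```python
from statistics import NormalDist

Z = NormalDist()
a = Z.inv_cdf(0.025)   # 7- P(Z <= a) = 0.025
b = Z.inv_cdf(0.05)    # 8- P(Z <= b) = 0.05
print(round(a, 3), round(b, 4))
```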

Normal Distribution (Cases)
Soft-drink Analysis from KU canteens
The amount of soft drink in a glass follows a Normal Distribution with μ = 220 ml and σ = 5 ml.
If a student purchases one glass of soft drink, determine the probability that he will get less than 215 ml in his glass:
P(X < 215) = ??
We must use the z-transformation, Z = (X − μ)/σ, so:
P[(X − μ)/σ < (215 − 220)/5] = P(Z < −1.00) = 0.1587

Normal Distribution (Cases)
Soft-drink Analysis from KU canteens
P(X < 215) = 15.87%
1- There is a 16% chance that he will get less than 215 ml in his glass.
2- We are 16% confident that he will get less than 215 ml in his glass.
3- If 50 students purchase 50 glasses of soft drink, then approximately 50 × 0.1587 ≈ 8 of them will have less than 215 ml in their glasses.
Find: P(215 ≤ X ≤ 225) = P(X ≤ 225) − P(X < 215)
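The soft-drink case can be sketched with `statistics.NormalDist`, which applies the z-transformation internally. The sketch also evaluates the closing "Find" question, P(215 ≤ X ≤ 225), which is the μ ± σ band and so comes to about 0.6827.

```python
from statistics import NormalDist

drink = NormalDist(mu=220, sigma=5)              # ml of soft drink per glass

p_less_215 = drink.cdf(215)                      # P(X < 215)
expected_short = 50 * p_less_215                 # expected count out of 50 glasses
p_between = drink.cdf(225) - drink.cdf(215)      # P(215 <= X <= 225)

print(round(p_less_215, 4), round(expected_short), round(p_between, 4))
```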

Normal Probabilities Using MS EXCEL
For any Normal distribution with μ = 250 and σ = 5, we can obtain P(X < 245) using the following syntax:
=NORMDIST(x,μ,σ,cumulative)
=NORMDIST(245,250,5,1)
And for P(X > 255):
=1-NORMDIST(255,250,5,1)
We can apply the same approach to the soft-drink case study.
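For readers working outside Excel, a stand-in with the same argument order can be sketched in Python. `normdist` is a hypothetical helper, not an existing library function: a true/1 cumulative flag returns the area to the left of x, and false/0 returns the height of the density curve.

```python
from statistics import NormalDist


def normdist(x, mean, sd, cumulative):
    """Hypothetical Python stand-in for Excel's =NORMDIST(x, mean, sd, cumulative)."""
    d = NormalDist(mean, sd)
    return d.cdf(x) if cumulative else d.pdf(x)


p_below_245 = normdist(245, 250, 5, True)        # =NORMDIST(245,250,5,1)
p_above_255 = 1 - normdist(255, 250, 5, True)    # =1-NORMDIST(255,250,5,1)
print(round(p_below_245, 4), round(p_above_255, 4))
```

Both answers equal 0.1587, as the symmetry of the normal curve about μ = 250 suggests.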

Index Numbers
Index numbers are RELATIVE measures.
Index numbers could be price relatives or quantity relatives.
Index numbers have two major types: 1) Simple Index 2) Composite Index
1) A simple index number can be obtained using this formula:
$$I_n = \frac{P_n}{P_0} \times 100$$
where Pn is the price in the current year (time) and P0 is the price in the base year (time).

Simple Index (Example)
Consider the following table comprising prices of a commodity in different years:

Years | Price (Rs.) | Fixed Base: In = Pn/54 × 100 | Chain Base: In = Pn/Pn−1 × 100
2006  | 54          | 54/54 × 100 = 100.0%         | 54/54 × 100 = 100.0%
2007  | 60          | 60/54 × 100 = 111.1%         | 60/54 × 100 = 111.1%
2008  | 67          | 67/54 × 100 = 124.1%         | 67/60 × 100 = 111.7%

If we want to use the fixed base method by fixing the base year as 2006, then the possible indices will be computed by dividing all price values by 54.
In the chain base method, the preceding year's price is used as the base.
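Both methods can be sketched in a few lines of Python: the fixed base divides every price by the 2006 price, while the chain base divides each price by the preceding year's price, with the first year set to 100%.

```python
years = [2006, 2007, 2008]
prices = [54, 60, 67]          # Rs.

fixed = [p / prices[0] * 100 for p in prices]                                   # base year 2006
chain = [100.0] + [prices[t] / prices[t - 1] * 100 for t in range(1, len(prices))]   # preceding year

for y, fb, cb in zip(years, fixed, chain):
    print(y, f"{fb:.1f}%", f"{cb:.1f}%")
```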

Composite Index (Example)
Consider the following table comprising prices of a commodity in different years for three different cities:

Years | Price City 1 | Price City 2 | Price City 3 | Sum ΣP | Fixed Base: In = ΣPn/156 × 100 | Chain Base: In = ΣPn/ΣPn−1 × 100
2006  | 54           | 52           | 50           | 156    | 100%                           | 100%
2007  | 60           | 65           | 62           | 187    | 119.9%                         | 119.9%
2008  | 67           | 65           | 68           | 200    | 128.2%                         | 107.0%

Before computing the fixed base or chain base index numbers, we have to obtain the sum of all prices in the next column (ΣP).
Finally, we can compute both fixed base and chain base indices for the ΣP column using the same procedures.
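The composite (aggregate) version only adds one step before the simple-index procedure: sum the city prices for each year, then apply the fixed base and chain base formulas to that sum column.

```python
prices = {                 # Rs., for City 1, City 2, City 3
    2006: [54, 52, 50],
    2007: [60, 65, 62],
    2008: [67, 65, 68],
}
years = sorted(prices)
totals = [sum(prices[y]) for y in years]          # the Sum (ΣP) column

fixed = [t / totals[0] * 100 for t in totals]                                    # base 2006
chain = [100.0] + [totals[i] / totals[i - 1] * 100 for i in range(1, len(totals))]   # preceding year

for y, t, fb, cb in zip(years, totals, fixed, chain):
    print(y, t, f"{fb:.1f}%", f"{cb:.1f}%")
```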