
QUANTITATIVE ANALYSIS FOR

MANAGEMENT – II

QAM – II by Gaurav Garg (IIM Lucknow)


• COURSE OUTLINE
➢ Sampling Distributions
➢ Interval Estimation
➢ Sample Size Decision
➢ Testing of Hypothesis
  • single population and
  • two populations
➢ Measures of Association for Qualitative Data and Contingency Tables
➢ Chi-square Test for Goodness of Fit
➢ Analysis of Variance – one way and two way
➢ Multiple Regression Analysis

Sampling Distributions

• Concept of Sampling Distribution


• Distributions of Sample Mean and
Sample Proportion
• Central Limit Theorem
• t, Chi-Square and F distributions.



Parameter and Statistic
• Parameter:
➢ A statistical measure computed from the population observations.
➢ Let $X_1, X_2, \ldots, X_N$ be the population units.
➢ Population mean: $\mu = \bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$
➢ Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \bar{X})^2$

• Statistic:
➢ A statistical measure computed from the sample observations.
➢ Let $x_1, x_2, \ldots, x_n$ be the sample units.
➢ Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
➢ Sample variance: $s^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$ or $s_1^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$


• In practice, parameter values are not known.
• They are estimated using sample observations.
• Parameter values are fixed.
• The value of a statistic varies from sample to sample.
• Unbiased Estimate
➢ If E(statistic) = parameter,
➢ then the statistic is said to be an unbiased estimate of the parameter.
➢ The sample mean is an unbiased estimate of the population mean.


• Let us consider the following population of size 4:
• 18, 20, 22, 24
• Population mean = (18 + 20 + 22 + 24) / 4 = 21
• Population variance = $[(18-21)^2 + (20-21)^2 + (22-21)^2 + (24-21)^2] / 4 = 5$
• Consider all possible samples of size 2, drawn with replacement (16 ordered samples).
• Obtain the sample mean and sample variance of each sample.
• The sample mean is an unbiased estimate of the population mean.
• This means that the average of all sample means equals the population mean.

$\mu = 21$, $\sigma^2 = 5$, $n = 2$

Samples   x̄     s²    s₁²
18, 18    18    0     0
20, 18    19    1     2
22, 18    20    4     8
24, 18    21    9     18
18, 20    19    1     2
20, 20    20    0     0
22, 20    21    1     2
24, 20    22    4     8
18, 22    20    4     8
20, 22    21    1     2
22, 22    22    0     0
24, 22    23    1     2
18, 24    21    9     18
20, 24    22    4     8
22, 24    23    1     2
24, 24    24    0     0
Average   21    2.5   5

where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, $s^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$ and $s_1^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$.

• $E(\bar{x}) = \mu$
• $E(s^2) \neq \sigma^2$
• $E(s_1^2) = \sigma^2$
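
The enumeration above can be checked with a short script. This is only a sketch: the variable and function names are illustrative choices, and it assumes, as in the table, that the 16 ordered samples of size 2 are drawn with replacement.

```python
from itertools import product
from statistics import mean

population = [18, 20, 22, 24]
n = 2

# Population mean and variance (divisor N).
mu = mean(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

# All ordered samples of size n drawn with replacement (16 samples).
samples = list(product(population, repeat=n))

def s2(sample):
    """Sample variance with divisor n."""
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)

def s1_2(sample):
    """Sample variance with divisor n - 1."""
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)

# Averages over all samples, i.e. the expectations of the three statistics.
avg_mean = mean(mean(s) for s in samples)
avg_s2 = mean(s2(s) for s in samples)
avg_s1_2 = mean(s1_2(s) for s in samples)

print("mu, sigma^2:", mu, sigma2)
print("average of x-bar, s^2, s1^2:", avg_mean, avg_s2, avg_s1_2)
```

The average of the sample means matches the population mean, and only the divisor n − 1 version of the sample variance averages to the population variance.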
Sampling Distributions
• Unknown parameters are estimated using sample observations.
• Parameter values are fixed.
• The value of a statistic varies from sample to sample.
• Each sample has some probability of being chosen.
• Each value of a statistic is therefore associated with a probability.
• A statistic is a random variable.
• The distribution of a statistic is called its sampling distribution.
• The distribution of a statistic may not be the same as the distribution of the population.

Sampling Distribution of Mean
(or Distribution of Sample Mean)
• Consider the previous example again.
• Histogram of population units:
  [Figure: histogram of the population values 18, 20, 22, 24 on the x-axis, each with relative frequency 0.25.]
• Each value occurs exactly once.
• The population distribution is a discrete uniform distribution.

Samples                                   Sample mean   Frequency   Probability = relative frequency
(18, 18)                                  18            1           1/16
(20, 18), (18, 20)                        19            2           2/16
(22, 18), (18, 22), (20, 20)              20            3           3/16
(24, 18), (18, 24), (20, 22), (22, 20)    21            4           4/16
(20, 24), (24, 20), (22, 22)              22            3           3/16
(22, 24), (24, 22)                        23            2           2/16
(24, 24)                                  24            1           1/16
Total                                                   16          1

[Figure: histogram of the sample mean over the values 18 to 24, with heights 1/16, 2/16, 3/16, 4/16, 3/16, 2/16, 1/16 (no longer uniform).]

• The value of the sample mean depends on the chosen
sample.
• Each sample is chosen with certain probability.
• So, each possible value of sample mean is associated
with some probability.
• Distribution of sample mean is the list of all possible
values along with corresponding probabilities.

Sample mean   18     19     20     21     22     23     24
Probability   1/16   2/16   3/16   4/16   3/16   2/16   1/16


• In other words, the statistic $\bar{x}$ (sample mean) can be considered as a random variable.
• The distribution of $\bar{x}$ is given by the following table:

x̄       Prob    x̄ × Prob   x̄² × Prob
18      1/16    1.125      20.250
19      2/16    2.375      45.125
20      3/16    3.750      75.000
21      4/16    5.250      110.250
22      3/16    4.125      90.750
23      2/16    2.875      66.125
24      1/16    1.500      36.000
Total   1       21.000     443.500

• $E(\bar{x}) = 21$ and $E(\bar{x}^2) = 443.5$.
• The variance follows from $\mathrm{Var}(\bar{x}) = E(\bar{x}^2) - [E(\bar{x})]^2 = 443.5 - 21^2 = 2.5$.
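
The table calculation can be reproduced exactly with rational arithmetic. A minimal sketch, assuming samples of size 2 drawn with replacement as before; the names `dist`, `e_mean`, `e_mean_sq` and `var_mean` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [18, 20, 22, 24]
samples = list(product(population, repeat=2))  # 16 ordered samples

# Count how often each value of the sample mean occurs.
counts = Counter(Fraction(sum(s), len(s)) for s in samples)

# Sampling distribution: value of x-bar -> probability.
dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

e_mean = sum(m * p for m, p in dist.items())          # E(x-bar)
e_mean_sq = sum(m ** 2 * p for m, p in dist.items())  # E(x-bar^2)
var_mean = e_mean_sq - e_mean ** 2                    # Var(x-bar)

for m, p in dist.items():
    print(m, p)
print("E(x-bar) =", e_mean, " Var(x-bar) =", var_mean)
```

Here Var(x̄) comes out as 5/2, which equals σ²/n = 5/2 for this population.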



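• In general, $E(\bar{x}) = \mu$ and $\mathrm{Var}(\bar{x}) = \sigma^2/n$.
• $E(\bar{x})$ and $\mathrm{Var}(\bar{x})$ can also be obtained as follows:

$E(\bar{x}) = E\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(x_i) = \frac{1}{n}\sum_{i=1}^{n} \mu = \frac{1}{n}\, n\mu = \mu$

$\mathrm{Var}(\bar{x}) = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(x_i) = \frac{1}{n^2}\sum_{i=1}^{n} \sigma^2 = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}$

• The second step of the variance derivation uses $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$ and the independence of the sample units, so that the variance of their sum is the sum of their variances.
• Common notation: $\mu_{\bar{x}} = E(\bar{x}) = \mu$, $\sigma^2_{\bar{x}} = \mathrm{Var}(\bar{x}) = \sigma^2/n$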


Standard Error
• Different samples of the same size from the same population will yield different sample means.
• A measure of the variability in the different values of the sample mean is given by the standard error of the sample mean:
  $\text{standard error}(\bar{x}) = \sigma_{\bar{x}} = \sqrt{\mathrm{Var}(\bar{x})} = \sigma/\sqrt{n}$
• The standard error of a statistic is the standard deviation of its sampling distribution.
• It is not the same as the population standard deviation: in our example, $\sigma = \sqrt{5} \approx 2.2361$, whereas $\sigma_{\bar{x}} = \sqrt{2.5} \approx 1.5811$.
• The standard error decreases as the sample size increases.
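
To see how the standard error shrinks with the sample size, the formula σ/√n can be evaluated for a few values of n. This is a small sketch: the function name and the list of sample sizes are illustrative, and σ = √5 is taken from the 18, 20, 22, 24 example.

```python
from math import sqrt

def standard_error_of_mean(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return sigma / sqrt(n)

sigma = sqrt(5)  # population standard deviation of 18, 20, 22, 24

# Quadrupling n halves the standard error.
for n in (2, 8, 32, 128):
    print(n, round(standard_error_of_mean(sigma, n), 4))
```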
Central Limit Theorem
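• When the population distribution is N(μ, σ), where the second argument denotes the standard deviation, then $\bar{x} \sim N(\mu, \sigma/\sqrt{n})$ for any sample size n.
• When the population distribution is not normal, $\bar{x}$ is still approximately $N(\mu, \sigma/\sqrt{n})$ provided n is large (the result holds in the limit n → ∞).
• In practice, the approximation is usually considered adequate for n ≥ 30.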
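
The theorem can be illustrated by simulation. The sketch below is an assumption-laden illustration, not part of the original slides: it draws from an exponential population with mean 1 and standard deviation 1, uses an arbitrary fixed seed, and compares the simulated mean and standard deviation of x̄ with μ and σ/√n for n = 2, 5 and 30.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(0)  # arbitrary seed, only for reproducibility

def simulate_sample_means(n, reps=5000, rate=1.0):
    """Draw `reps` samples of size n from Exp(rate) and return their means."""
    return [sum(random.expovariate(rate) for _ in range(n)) / n
            for _ in range(reps)]

results = {}
for n in (2, 5, 30):
    means = simulate_sample_means(n)
    results[n] = (fmean(means), pstdev(means))
    print(f"n={n:2d}  mean of x-bar={results[n][0]:.3f}  "
          f"sd of x-bar={results[n][1]:.3f}  sigma/sqrt(n)={1 / sqrt(n):.3f}")
```

Plotting a histogram of `means` for each n would show the skewed shape for n = 2 becoming close to a bell curve for n = 30.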



[Figure: histogram of 1,800 randomly selected values from an exponential distribution, followed by histograms of the distribution of the sample mean for n = 2, n = 5 and n = 30.]


[Figure: frequency histogram of 1,800 randomly selected values from a uniform distribution (X from 0.0 to 5.0), followed by frequency histograms of the distribution of the sample mean for n = 2, n = 5 and n = 30.]
• Question: does this mean that, whatever the population distribution, we can estimate the population mean and standard deviation by drawing a large sample and treating the sample mean as normally distributed?

• Example:
• Suppose a population has mean μ = 8 and standard deviation σ = 3.
• Suppose a random sample of size n = 36 is selected.
• What is the probability that the sample mean is between 7.75 and 8.25?
• Even if the population is not normally distributed, the central limit theorem can be used (n > 30).
• So, the distribution of the sample mean is approximately N(8, 3/6), i.e., $\bar{x} \sim N(8, 3/\sqrt{36}) = N(8, 0.5)$.
• $P[7.75 \le \bar{x} \le 8.25] = ?$
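
One way to evaluate this probability is to standardize both limits and use the standard normal c.d.f. The sketch below builds the c.d.f. from `math.erf`; the helper name `normal_cdf` is an illustrative choice rather than anything from the slides.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal c.d.f., written in terms of the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma, n = 8.0, 3.0, 36
se = sigma / sqrt(n)           # standard error of the mean = 0.5

z_low = (7.75 - mu) / se       # -0.5
z_high = (8.25 - mu) / se      # +0.5
prob = normal_cdf(z_high) - normal_cdf(z_low)

print(round(prob, 4))          # about 0.3829
```

So, under the normal approximation, roughly 38% of samples of size 36 give a sample mean within 0.25 of μ.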
Sampling Distribution of Proportion
(or Distribution of Sample Proportion)
• Let us consider a population divided into two mutually exclusive and collectively exhaustive classes.
• One class possesses a particular attribute.
• The other class does not possess that attribute.
• For example, the people in a city could be divided into “Smokers” and “Non-smokers”.
• Let
➢ N = population size
➢ X = no. of people out of N possessing a particular attribute
➢ π = actual proportion of the people possessing a particular attribute = X/N

• Let a sample be selected from this population.
➢ n = sample size
➢ x = no. of people in the sample possessing a particular attribute
➢ p = x/n = sample proportion


• X and π are population parameters.
• x and p are sample statistics.
• p provides an estimate of π.
• Note that x ~ B(n, π): assuming sampling with replacement (or a population much larger than the sample), each of the n sampled units independently possesses the attribute with probability π, so x counts the successes in n trials.
• E(x) = nπ, Var(x) = nπ(1 − π).
• This implies that
• E(p) = E(x/n) = π,
• Var(p) = Var(x/n) = nπ(1 − π)/n² = π(1 − π)/n.
• Standard error(p) = √[Var(p)] = √[π(1 − π)/n]


• When the sample size n is large enough, standardizing x by its mean nπ and standard deviation √[nπ(1 − π)] gives, approximately,
  $Z = \frac{x - n\pi}{\sqrt{n\pi(1-\pi)}} \sim N(0,1)$
  or, dividing numerator and denominator by n,
  $Z = \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}} \sim N(0,1)$
• This is a particular case of the central limit theorem.
• In practice, this result is used for n ≥ 30,
• or when nπ ≥ 5 as well as n(1 − π) ≥ 5.


• Example:
• Suppose the true proportion of voters who support ABC party is 0.4.
• What is the probability that a sample of size 200 yields a sample proportion between 0.40 and 0.45?
• π = 0.4, 1 − π = 1 − 0.4 = 0.6
• n = 200.
• Pr[0.40 < p < 0.45] = ?
• Use $Z = \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}} \sim N(0,1)$.
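
The voter example can be worked through numerically with the normal approximation to the sample proportion. A minimal sketch: `normal_cdf` is an illustrative helper built on `math.erf`, and no continuity correction is applied.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal c.d.f., written in terms of the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

pi_true, n = 0.4, 200
se_p = sqrt(pi_true * (1 - pi_true) / n)  # standard error of p

z_low = (0.40 - pi_true) / se_p           # 0, since 0.40 equals pi
z_high = (0.45 - pi_true) / se_p          # roughly 1.44
prob = normal_cdf(z_high) - normal_cdf(z_low)

print(round(se_p, 4), round(z_high, 2), round(prob, 2))
```

Since nπ = 80 and n(1 − π) = 120 are both well above 5, the rule of thumb for using the normal approximation is satisfied here.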


Finite Population Correction
• For the application of the central limit theorem, we assumed that the sample size n is large.
• If the population size N is small, the sample size n cannot be sufficiently large,
• and we cannot apply the central limit theorem directly.
• In this situation, we multiply the standard error by the Finite Population Correction (fpc), which is given by
  $\mathrm{fpc} = \sqrt{\frac{N-n}{N-1}}$
• Clearly, when N → ∞, fpc → 1.
• Question: is this applicable when the population is not normal and the sample size is small?
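• Thus
  $\bar{x} \sim N\left(\mu,\ \mathrm{fpc} \cdot \sigma/\sqrt{n}\right)$ or $\frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}} \sim N(0,1)$.
• And
  $\frac{p - \pi}{\mathrm{fpc} \cdot \sqrt{\frac{\pi(1-\pi)}{n}}} = \frac{p - \pi}{\sqrt{\frac{\pi(1-\pi)}{n}}\sqrt{\frac{N-n}{N-1}}} \sim N(0,1)$.
• fpc should be used when n/N > 0.05.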
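
As a sketch of how the correction might be applied in practice, the function below multiplies the standard error by fpc only when n/N exceeds 0.05. The function names and the numbers N = 200, n = 50, σ = 3 are hypothetical, chosen purely for illustration.

```python
from math import sqrt

def fpc(N, n):
    """Finite population correction sqrt((N - n) / (N - 1))."""
    if N < 2 or not 1 <= n <= N:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    return sqrt((N - n) / (N - 1))

def standard_error_of_mean(sigma, n, N=None):
    """sigma / sqrt(n), times fpc when the sampling fraction n/N > 0.05."""
    se = sigma / sqrt(n)
    if N is not None and n / N > 0.05:
        se *= fpc(N, n)
    return se

# Hypothetical values: population of 200, sample of 50, sigma = 3.
print(round(fpc(200, 50), 4))
print(round(standard_error_of_mean(3, 50), 4))         # no correction
print(round(standard_error_of_mean(3, 50, N=200), 4))  # corrected, smaller
```

For a very large N the correction factor is practically 1, which matches the limit stated on the previous slide.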



Degree of Freedom
• Assume four numbers: a, b, c, and d,
• such that a + b + c + d = m.
• You are free to vary any 3 numbers.
• But the 4th one must be chosen so that the sum is m.
• Thus your degrees of freedom are 3.
• The number of independent observations which make up a statistic is known as the degrees of freedom (d.f.) associated with that statistic.
• d.f. is the number of values in the final calculation of a statistic that are free to vary.
• d.f. of a statistic = (no. of independent observations) − (no. of parameters estimated)

Student’s t Distribution
• Let us take a sample $x_1, x_2, \ldots, x_n$ from N(μ, σ).
• Define the statistic
  $T = \frac{\bar{x} - \mu}{s_1/\sqrt{n}}$, where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $s_1^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$.
• Then T follows Student’s t distribution with (n − 1) d.f. and range (−∞, ∞).
• It is denoted as $T \sim t_{(n-1)}$.
• If $T \sim t_{(k)}$, where k is the degrees of freedom, then $E(T) = 0$ and $\mathrm{Var}(T) = \frac{k}{k-2}$ $(k > 2)$.
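
The statistic T defined above can be computed directly from a sample. In this sketch the function names are illustrative, `statistics.stdev` supplies s₁ (divisor n − 1), and the values 18, 20, 22, 24 are reused as a hypothetical sample with a hypothesised mean of 20.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu):
    """T = (x-bar - mu) / (s1 / sqrt(n)), with s1 using divisor n - 1."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    s1 = stdev(sample)  # sample standard deviation, divisor n - 1
    return (mean(sample) - mu) / (s1 / sqrt(n))

def t_variance(k):
    """Variance k / (k - 2) of a t distribution with k d.f. (k > 2)."""
    if k <= 2:
        raise ValueError("variance is only defined for k > 2")
    return k / (k - 2)

sample = [18, 20, 22, 24]  # hypothetical sample
print(round(t_statistic(sample, mu=20), 4), "with", len(sample) - 1, "d.f.")
print(t_variance(5), t_variance(30))
```

The variance k/(k − 2) approaches 1 as k grows, in line with the t distribution approaching N(0, 1).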


pdf of Student’s t distribution with k d.f.

• This distribution is symmetric about 0.


• Mean=Median=Mode=0
• Note: $t_{(n-1)} \to N(0,1)$ as n increases.
• t-distributions are bell-shaped and symmetric, but have ‘fatter’ tails than the normal.
[Figure: density curves of the standard normal (t with df = ∞), t (df = 13) and t (df = 5), plotted against t and centred at 0.]

Chi Square Distribution
• Let us take a sample $x_1, x_2, \ldots, x_n$ from N(μ, σ).
• Define the statistic $\chi^2 = \sum_{i=1}^{n} \left(\frac{x_i - \mu}{\sigma}\right)^2$.
• The symbol $\chi^2$ is read as Chi-Square; the statistic has a Chi-Square distribution with n degrees of freedom and range (0, ∞).
• This distribution is denoted as $\chi^2_{(n)}$.
• If we define the statistic as $\chi^2 = \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{\sigma}\right)^2$,
• the distribution of this statistic is $\chi^2_{(n-1)}$.


• If $X \sim \chi^2_{(k)}$, then $E(X) = k$ and $\mathrm{Var}(X) = 2k$.

• Plot of p.d.f. of Chi-Square distribution with d.f. k

• The mode is at k − 2 (for k ≥ 2).
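
The mean and variance of the Chi-Square distribution can be checked by simulating the defining sum of squared standardized normal observations. This is only a sketch: k = 5, the seed and the number of replications are arbitrary illustrative choices.

```python
import random
from statistics import fmean, pvariance

random.seed(1)  # arbitrary seed, only for reproducibility

def chi_square_draw(k, mu=0.0, sigma=1.0):
    """Sum of k squared standardized N(mu, sigma) observations."""
    return sum(((random.gauss(mu, sigma) - mu) / sigma) ** 2 for _ in range(k))

k = 5
draws = [chi_square_draw(k) for _ in range(20000)]

sim_mean = fmean(draws)     # should be close to k
sim_var = pvariance(draws)  # should be close to 2k

print(f"simulated mean = {sim_mean:.2f} (theory {k})")
print(f"simulated variance = {sim_var:.2f} (theory {2 * k})")
```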


Distribution of Sample Variance

• Let us take a sample $x_1, x_2, \ldots, x_n$ from N(μ, σ).
• Sample variance:
  $s^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$ or $s_1^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
• Using the Chi-Square distribution,
  $\frac{n s^2}{\sigma^2} = \frac{(n-1) s_1^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{\sigma^2} = \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{\sigma}\right)^2 \sim \chi^2_{(n-1)}$
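
A simulation can make this result concrete: for normal samples, the scaled sample variance should average to its degrees of freedom, n − 1. The sketch below uses hypothetical values μ = 10, σ = 2 and n = 6 and an arbitrary seed; all names are illustrative.

```python
import random
from statistics import fmean

random.seed(2)  # arbitrary seed, only for reproducibility

mu, sigma, n = 10.0, 2.0, 6  # hypothetical population and sample size

def scaled_variances(sample, sigma):
    """Return (n * s^2 / sigma^2, (n - 1) * s1^2 / sigma^2) for one sample."""
    m = len(sample)
    xbar = fmean(sample)
    ss = sum((x - xbar) ** 2 for x in sample)  # sum of squared deviations
    s2 = ss / m          # divisor n
    s1_2 = ss / (m - 1)  # divisor n - 1
    return m * s2 / sigma ** 2, (m - 1) * s1_2 / sigma ** 2

values = []
for _ in range(20000):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    a, b = scaled_variances(sample, sigma)
    assert abs(a - b) < 1e-9  # the two forms are the same quantity
    values.append(a)

avg = fmean(values)
print(f"average of n*s^2/sigma^2 = {avg:.2f} (theory {n - 1})")
```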


Snedecor’s F Distribution
• Let X and Y be two independent random variables such that $X \sim \chi^2_{(d_1)}$ and $Y \sim \chi^2_{(d_2)}$.
• Define the statistic $F = \frac{X/d_1}{Y/d_2}$.
• F follows Snedecor’s F distribution with d1 and d2 d.f. and range (0, ∞).
• It is denoted as $F \sim F_{(d_1, d_2)}$.
• $E(F) = \frac{d_2}{d_2 - 2}$, $d_2 > 2$, and $\mathrm{Var}(F) = \frac{2 d_2^2 (d_1 + d_2 - 2)}{d_1 (d_2 - 2)^2 (d_2 - 4)}$, $d_2 > 4$.
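
The moment formulas above translate directly into code. A minimal sketch with illustrative function names; the degrees of freedom d1 = 5 and d2 = 10 are hypothetical values used only to exercise the formulas.

```python
def f_mean(d1, d2):
    """E(F) = d2 / (d2 - 2), defined for d2 > 2 (d1 does not enter)."""
    if d2 <= 2:
        raise ValueError("mean is only defined for d2 > 2")
    return d2 / (d2 - 2)

def f_variance(d1, d2):
    """Var(F) = 2 d2^2 (d1 + d2 - 2) / (d1 (d2 - 2)^2 (d2 - 4)), d2 > 4."""
    if d2 <= 4:
        raise ValueError("variance is only defined for d2 > 4")
    return 2 * d2 ** 2 * (d1 + d2 - 2) / (d1 * (d2 - 2) ** 2 * (d2 - 4))

d1, d2 = 5, 10  # hypothetical degrees of freedom
print(f_mean(d1, d2))                 # 10 / 8 = 1.25
print(round(f_variance(d1, d2), 4))   # 2600 / 1920
```

Note that the mean depends only on the denominator degrees of freedom d2 and tends to 1 as d2 grows.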


pdf of Snedecor’s F distribution with d1 and d2 d.f.



Summary
• Parameter and Statistic
• Unbiasedness
• Distribution of sample mean
• Distribution of sample proportion
• Distribution of sample variance
• Central limit theorem
• Finite population correction
• Degree of Freedom
• Student’s t, Chi-Square and Snedecor’s F distributions
