
Sampling Theory

MODULE II

LECTURE - 4
SIMPLE RANDOM SAMPLING

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR

Estimation of population mean and population variance

One of the main objectives after the selection of a sample is to know about the tendency of the data to cluster
around a central value and the scatter of the data around that central value.

Among various indicators of central tendency and dispersion, the popular choices are arithmetic mean and
variance. So the population mean and population variability are generally measured by arithmetic mean (or
weighted arithmetic mean) and variance.

There are various popular estimators for estimating the population mean and population variance. Among
them, the sample arithmetic mean and sample variance are more popular than other estimators.

One of the reasons to use these estimators is that they possess nice statistical properties. Moreover, they are
also obtained through well-established statistical estimation procedures like maximum likelihood estimation,
least squares estimation, the method of moments etc. under several standard statistical distributions.

One may also consider other indicators like the median, mode, geometric mean and harmonic mean for measuring the
central tendency, and the mean deviation, absolute deviation, Pitman nearness etc. for measuring the dispersion.
The properties of such estimators can be studied by numerical procedures like bootstrapping.

1. Estimation of population mean
Let us consider the sample arithmetic mean
\[
\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i
\]
as an estimator of the population mean
\[
\bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i
\]
and verify whether \(\bar{y}\) is an unbiased estimator of \(\bar{Y}\) under the two cases.

• SRSWOR
\[
E(\bar{y}) = \frac{1}{n}\, E\!\left( \sum_{i=1}^{n} y_i \right)
= \frac{1}{n} \cdot \frac{1}{\binom{N}{n}} \sum_{i=1}^{\binom{N}{n}} t_i
\qquad \left( \text{where } t_i = \sum_{j=1}^{n} y_j \text{ is the total of the } i\text{th sample} \right)
\]
\[
= \frac{1}{n} \cdot \frac{1}{\binom{N}{n}} \sum_{i=1}^{\binom{N}{n}} \left( \sum_{j=1}^{n} y_j \right).
\]

When n units are sampled from N units without replacement, each unit of the population can occur
with \((n-1)\) other units selected out of the remaining \((N-1)\) units in the population, and so each unit occurs in
\(\binom{N-1}{n-1}\) of the \(\binom{N}{n}\) possible samples. So
\[
\sum_{i=1}^{\binom{N}{n}} \left( \sum_{j=1}^{n} y_j \right) = \binom{N-1}{n-1} \sum_{i=1}^{N} y_i .
\]
Now
\[
E(\bar{y}) = \frac{1}{n} \cdot \frac{(N-1)!\, n!\, (N-n)!}{(n-1)!\, (N-n)!\, N!} \sum_{i=1}^{N} y_i
= \frac{1}{N} \sum_{i=1}^{N} y_i = \bar{Y} .
\]

Thus \(\bar{y}\) is an unbiased estimator of \(\bar{Y}\).
Alternatively, the following approach can also be adopted to show that the sample mean is an unbiased estimator
of the population mean:

\[
E(\bar{y}) = \frac{1}{n} \sum_{j=1}^{n} E(y_j)
= \frac{1}{n} \sum_{j=1}^{n} \left( \sum_{i=1}^{N} Y_i \, P_j(i) \right)
= \frac{1}{n} \sum_{j=1}^{n} \left( \sum_{i=1}^{N} Y_i \cdot \frac{1}{N} \right)
= \frac{1}{n} \sum_{j=1}^{n} \bar{Y} = \bar{Y} ,
\]
where \(P_j(i)\) denotes the probability of selection of the \(i\)th unit at the \(j\)th stage.

• SRSWR

\[
E(\bar{y}) = E\!\left( \frac{1}{n} \sum_{i=1}^{n} y_i \right)
= \frac{1}{n} \sum_{i=1}^{n} E(y_i)
= \frac{1}{n} \sum_{i=1}^{n} \left( Y_1 P_1 + \ldots + Y_N P_N \right)
= \frac{1}{n} \sum_{i=1}^{n} \bar{Y} = \bar{Y} ,
\]
where \(P_i = \frac{1}{N}\) for all \(i = 1, 2, \ldots, N\) is the probability of selection of a unit. Thus \(\bar{y}\) is an unbiased estimator of the
population mean under SRSWR also.
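Both unbiasedness results can be checked numerically. The sketch below (a minimal illustration with made-up population values) enumerates every possible SRSWOR sample of a small population and confirms that the average of the sample means equals the population mean:

```python
from itertools import combinations

# Illustrative population (values are made up for the check)
Y = [3, 7, 11, 15, 24]
N, n = len(Y), 2

pop_mean = sum(Y) / N

# Under SRSWOR each of the C(N, n) samples is equally likely, so E(y-bar)
# is simply the average of y-bar over all possible samples.
sample_means = [sum(s) / n for s in combinations(Y, n)]
expected_ybar = sum(sample_means) / len(sample_means)

print(pop_mean, expected_ybar)  # both equal 12.0
```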

Variance of the estimate
Assume that each observation has the same variance \(\sigma^2\). Then
\[
V(\bar{y}) = E(\bar{y} - \bar{Y})^2
= E\!\left[ \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{Y}) \right]^2
\]
\[
= E\!\left[ \frac{1}{n^2} \sum_{i=1}^{n} (y_i - \bar{Y})^2 + \frac{1}{n^2} \mathop{\sum\sum}_{i \neq j}^{n} (y_i - \bar{Y})(y_j - \bar{Y}) \right]
\]
\[
= \frac{1}{n^2} \sum_{i=1}^{n} E(y_i - \bar{Y})^2 + \frac{1}{n^2} \mathop{\sum\sum}_{i \neq j}^{n} E(y_i - \bar{Y})(y_j - \bar{Y})
= \frac{1}{n^2} \sum_{i=1}^{n} \sigma^2 + \frac{K}{n^2}
= \frac{N-1}{Nn} S^2 + \frac{K}{n^2}
\]
(using \(\sigma^2 = \frac{N-1}{N} S^2\)), where
\[
K = \mathop{\sum\sum}_{i \neq j}^{n} E(y_i - \bar{Y})(y_j - \bar{Y}) .
\]

Now we find the value of K under the setups of SRSWOR and SRSWR.

• SRSWOR

\[
K = \mathop{\sum\sum}_{i \neq j}^{n} E(y_i - \bar{Y})(y_j - \bar{Y})
\]
Consider
\[
E(y_i - \bar{Y})(y_j - \bar{Y}) = \frac{1}{N(N-1)} \mathop{\sum\sum}_{k \neq l}^{N} (y_k - \bar{Y})(y_l - \bar{Y}) .
\]
Since
\[
\left[ \sum_{k=1}^{N} (y_k - \bar{Y}) \right]^2 = \sum_{k=1}^{N} (y_k - \bar{Y})^2 + \mathop{\sum\sum}_{k \neq l}^{N} (y_k - \bar{Y})(y_l - \bar{Y})
\]
\[
0 = (N-1) S^2 + \mathop{\sum\sum}_{k \neq l}^{N} (y_k - \bar{Y})(y_l - \bar{Y}) ,
\]
we get
\[
\frac{1}{N(N-1)} \mathop{\sum\sum}_{k \neq l}^{N} (y_k - \bar{Y})(y_l - \bar{Y})
= \frac{1}{N(N-1)} \left[ -(N-1) S^2 \right]
= -\frac{S^2}{N} .
\]
Thus
\[
K = -\, n(n-1) \frac{S^2}{N}
\]

and substituting the value of K, the variance of \(\bar{y}\) under SRSWOR is

\[
V(\bar{y}_{WOR}) = \frac{N-1}{Nn} S^2 - \frac{1}{n^2}\, n(n-1) \frac{S^2}{N}
= \frac{N-n}{Nn} S^2 .
\]

• SRSWR

Now we obtain the value of K under SRSWR.


\[
K = \mathop{\sum\sum}_{i \neq j}^{n} E(y_i - \bar{Y})(y_j - \bar{Y})
= \mathop{\sum\sum}_{i \neq j}^{n} E(y_i - \bar{Y}) \, E(y_j - \bar{Y})
= 0
\]
because the \(i\)th and \(j\)th draws \((i \neq j)\) are independent. Thus the variance of \(\bar{y}\) under SRSWR is
\[
V(\bar{y}_{WR}) = \frac{N-1}{Nn} S^2 .
\]
Nn
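Both variance formulas can likewise be verified by exact enumeration on a small, made-up population: average \((\bar{y} - \bar{Y})^2\) over all \(\binom{N}{n}\) SRSWOR samples and over all \(N^n\) ordered SRSWR samples, and compare with \(\frac{N-n}{Nn}S^2\) and \(\frac{N-1}{Nn}S^2\):

```python
from itertools import combinations, product

# Illustrative population (values are made up for the check)
Y = [3, 7, 11, 15, 24]
N, n = len(Y), 2
Ybar = sum(Y) / N
S2 = sum((y - Ybar) ** 2 for y in Y) / (N - 1)  # mean square S^2

# Exact V(y-bar) under SRSWOR: all C(N, n) unordered samples are equally likely
means_wor = [sum(s) / n for s in combinations(Y, n)]
var_wor = sum((m - Ybar) ** 2 for m in means_wor) / len(means_wor)

# Exact V(y-bar) under SRSWR: all N^n ordered samples are equally likely
means_wr = [sum(s) / n for s in product(Y, repeat=n)]
var_wr = sum((m - Ybar) ** 2 for m in means_wr) / len(means_wr)

print(var_wor, (N - n) / (N * n) * S2)  # both 19.5
print(var_wr, (N - 1) / (N * n) * S2)   # both 26.0
```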

It is to be noted that if \(N\) is infinite (large enough), then
\[
V(\bar{y}) = \frac{S^2}{n}
\]
in both the cases of SRSWOR and SRSWR. So the factor \(\frac{N-n}{N}\) is responsible for changing the variance of \(\bar{y}\) when
the sample is drawn from a finite population in comparison to an infinite population. This is why \(\frac{N-n}{N}\) is called the
finite population correction (fpc).

It may be noted that \(\frac{N-n}{N} = 1 - \frac{n}{N}\), so \(\frac{N-n}{N}\) is close to 1 if the ratio of the sample size to the population size, \(\frac{n}{N}\),
is very small or negligible. In such a case, the size of the population has no direct effect on the variance of \(\bar{y}\).

The term \(\frac{n}{N}\) is called the sampling fraction.

In practice, the fpc can be ignored whenever \(\frac{n}{N} < 5\%\), and for many purposes even if it is as high as 10%.

Ignoring the fpc will result in overestimation of the variance of \(\bar{y}\).
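A small numeric sketch of how the fpc enters the variance and why dropping it overestimates \(V(\bar{y})\) (the values of \(N\), \(n\) and \(S^2\) are made up for illustration):

```python
# Made-up values for illustration
N, n, S2 = 1000, 50, 65.0

fpc = (N - n) / N              # finite population correction = 1 - n/N
var_srswor = fpc * S2 / n      # V(y-bar) under SRSWOR
var_no_fpc = S2 / n            # variance with fpc ignored (infinite-population form)

print(fpc)                     # 0.95 (sampling fraction n/N = 5%)
print(var_srswor, var_no_fpc)  # 1.235 vs 1.3: ignoring fpc overestimates
```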

Efficiency of \(\bar{y}\) under SRSWOR over SRSWR:
Now we compare the variances of the sample means under SRSWOR and SRSWR.

\[
V(\bar{y})_{WOR} = \frac{N-n}{Nn} S^2
\]
\[
V(\bar{y})_{WR} = \frac{N-1}{Nn} S^2
= \frac{N-n}{Nn} S^2 + \frac{n-1}{Nn} S^2
= V(\bar{y})_{WOR} + \text{a positive quantity} ,
\]
so
\[
V(\bar{y})_{WR} > V(\bar{y})_{WOR}
\]
and hence SRSWOR is more efficient than SRSWR.

Estimation of variance from a sample

Since the expressions for the variances of the sample mean involve \(S^2\), which is based on population values, these
expressions cannot be used in real-life applications. In order to estimate the variance of \(\bar{y}\) on the basis of a sample, an
estimator of \(S^2\) (or equivalently \(\sigma^2\)) is needed. Consider \(s^2\) as an estimator of \(S^2\) (or \(\sigma^2\)) and investigate its
biasedness for \(S^2\) in the cases of SRSWOR and SRSWR.

Consider
\[
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2
= \frac{1}{n-1} \sum_{i=1}^{n} \left[ (y_i - \bar{Y}) - (\bar{y} - \bar{Y}) \right]^2
= \frac{1}{n-1} \left[ \sum_{i=1}^{n} (y_i - \bar{Y})^2 - n (\bar{y} - \bar{Y})^2 \right] .
\]
Then
\[
E(s^2) = \frac{1}{n-1} \left[ \sum_{i=1}^{n} E(y_i - \bar{Y})^2 - n\, E(\bar{y} - \bar{Y})^2 \right]
= \frac{1}{n-1} \left[ \sum_{i=1}^{n} \mathrm{Var}(y_i) - n \, \mathrm{Var}(\bar{y}) \right]
= \frac{1}{n-1} \left[ n \sigma^2 - n \, \mathrm{Var}(\bar{y}) \right] .
\]

In case of SRSWOR,
\[
\mathrm{Var}(\bar{y})_{WOR} = \frac{N-n}{Nn} S^2
\]
and so
\[
E(s^2) = \frac{n}{n-1} \left[ \sigma^2 - \frac{N-n}{Nn} S^2 \right]
= \frac{n}{n-1} \left[ \frac{N-1}{N} S^2 - \frac{N-n}{Nn} S^2 \right]
= S^2 .
\]

In case of SRSWR,
\[
\mathrm{Var}(\bar{y})_{WR} = \frac{N-1}{Nn} S^2
\]
and so
\[
E(s^2) = \frac{n}{n-1} \left[ \sigma^2 - \frac{N-1}{Nn} S^2 \right]
= \frac{n}{n-1} \left[ \frac{N-1}{N} S^2 - \frac{N-1}{Nn} S^2 \right]
= \frac{N-1}{N} S^2 = \sigma^2 .
\]

Hence
\[
E(s^2) =
\begin{cases}
S^2 & \text{in SRSWOR} \\
\sigma^2 & \text{in SRSWR.}
\end{cases}
\]
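The case distinction for \(E(s^2)\) can also be confirmed by exact enumeration on a made-up population: averaging \(s^2\) over all SRSWOR samples recovers \(S^2\), while averaging it over all ordered SRSWR samples recovers \(\sigma^2\):

```python
from itertools import combinations, product
from math import comb

# Illustrative population (values are made up for the check)
Y = [3, 7, 11, 15, 24]
N, n = len(Y), 3
Ybar = sum(Y) / N
S2 = sum((y - Ybar) ** 2 for y in Y) / (N - 1)   # S^2 = 65.0
sigma2 = (N - 1) / N * S2                        # sigma^2 = 52.0

def s2(sample):
    """Sample variance with divisor n - 1."""
    m = sum(sample) / len(sample)
    return sum((y - m) ** 2 for y in sample) / (len(sample) - 1)

# E(s^2) over all C(N, n) SRSWOR samples and over all N^n SRSWR samples
e_s2_wor = sum(s2(s) for s in combinations(Y, n)) / comb(N, n)
e_s2_wr = sum(s2(s) for s in product(Y, repeat=n)) / N ** n

print(e_s2_wor, S2)      # equal (up to floating-point rounding)
print(e_s2_wr, sigma2)   # equal (up to floating-point rounding)
```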

An unbiased estimate of \(\mathrm{Var}(\bar{y})\) is
\[
\widehat{\mathrm{Var}}(\bar{y})_{WOR} = \frac{N-n}{Nn} s^2
\]
in case of SRSWOR, and
\[
\widehat{\mathrm{Var}}(\bar{y})_{WR} = \frac{N-1}{Nn} \cdot \frac{N}{N-1} \, s^2 = \frac{s^2}{n}
\]
in case of SRSWR.
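In practice one computes \(\bar{y}\) and \(s^2\) from the drawn sample and plugs \(s^2\) into the appropriate formula. A minimal sketch (the population and sample size are made up; `random.sample` performs the SRSWOR draw):

```python
import random

random.seed(42)  # only for a reproducible illustration

population = list(range(1, 101))  # made-up population, N = 100
N, n = len(population), 10

sample = random.sample(population, n)  # SRSWOR draw of n units
ybar = sum(sample) / n
s2 = sum((y - ybar) ** 2 for y in sample) / (n - 1)

var_hat_wor = (N - n) / (N * n) * s2  # unbiased for V(y-bar) under SRSWOR
var_hat_wr = s2 / n                   # the SRSWR form; larger, since fpc < 1

print(ybar, var_hat_wor, var_hat_wr)
```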
