
Sampling Theory

MODULE VIII
LECTURE - 28

DOUBLE SAMPLING
(TWO PHASE SAMPLING)

DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR

The ratio and regression methods of estimation require knowledge of the population mean of the auxiliary variable ($\bar{X}$) to estimate the population mean of the study variable ($\bar{Y}$). If information on the auxiliary variable is not available, there are two options. One is to collect a sample on the study variable alone and use the sample mean as an estimator of the population mean.
An alternative solution is to use part of the budget for collecting information on the auxiliary variable: collect a large preliminary sample in which $x_i$ alone is measured. The purpose of this sampling is to furnish a good estimate of $\bar{X}$.

This method is appropriate when the information about $x_i$ is on file cards that have not been tabulated. After collecting a large preliminary sample of $n'$ units from the population, select a smaller sample of size $n$ from it and collect the information on $y$. These two estimates are used to obtain an estimator of the population mean $\bar{Y}$. This procedure of selecting a large sample for collecting information on the auxiliary variable $x$, and then selecting a subsample from it for collecting information on the study variable $y$, is called double sampling or two-phase sampling. It is useful when it is considerably cheaper and quicker to collect data on $x$ than on $y$, and when there is high correlation between $x$ and $y$.

In this sampling, the randomization is done twice. First a random sample of size $n'$ is drawn from a population of size $N$, and then again a random sample of size $n$ is drawn from the first sample of size $n'$.

So the sample mean in this sampling is a function of the two phases of sampling. If SRSWOR is utilized to draw the samples at both the phases, then

- the number of possible samples at the first phase, when a sample of size $n'$ is drawn from a population of size $N$, is $\binom{N}{n'} = M_0$, say;
- the number of possible samples at the second phase, where a sample of size $n$ is drawn from the first-phase sample of size $n'$, is $\binom{n'}{n} = M_1$, say.

[Schematic: Population of $X$ ($N$ units) → first-phase sample (large, $n'$ units; $M_0$ possible samples) → subsample (small, $n$ units; $M_1$ possible samples).]

Then the sample mean is a function of the two phases of sampling. Let $\hat\theta$ be the statistic calculated at the second phase such that $\hat\theta = \hat\theta_{ij}$, $i = 1, 2, \ldots, M_0$, $j = 1, 2, \ldots, M_1$, with $P_{ij}$ being the probability that the $i$th sample is chosen at the first phase and the $j$th sample is chosen at the second phase. Then

$$E(\hat\theta) = E_1\big[E_2(\hat\theta)\big],$$

where $E_2$ denotes the expectation over the second phase and $E_1$ denotes the expectation over the first phase. Thus

$$E(\hat\theta) = \sum_{i=1}^{M_0}\sum_{j=1}^{M_1} P_{ij}\,\hat\theta_{ij}
= \sum_{i=1}^{M_0}\sum_{j=1}^{M_1} P_i\,P_{j|i}\,\hat\theta_{ij}
\qquad \big(\text{using } P(A\cap B) = P(A)\,P(B \mid A)\big)$$

$$= \sum_{i=1}^{M_0} \underbrace{P_i}_{\text{1st phase}} \sum_{j=1}^{M_1} \underbrace{P_{j|i}}_{\text{2nd phase}} \hat\theta_{ij}.$$

Variance of $\hat\theta$:

$$\begin{aligned}
Var(\hat\theta) &= E\big[\hat\theta - E(\hat\theta)\big]^2 \\
&= E\big[(\hat\theta - E_2(\hat\theta)) + (E_2(\hat\theta) - E(\hat\theta))\big]^2 \\
&= E\big[\hat\theta - E_2(\hat\theta)\big]^2 + E\big[E_2(\hat\theta) - E(\hat\theta)\big]^2 + 0 \\
&= E_1E_2\big[\hat\theta - E_2(\hat\theta)\big]^2 + E_1E_2\big[E_2(\hat\theta) - E(\hat\theta)\big]^2 \\
&= E_1\big[V_2(\hat\theta)\big] + E_1\big[E_2(\hat\theta) - E_1(E_2(\hat\theta))\big]^2 \qquad \big(E(\hat\theta)\text{ is constant for }E_2\big) \\
&= E_1\big[V_2(\hat\theta)\big] + V_1\big[E_2(\hat\theta)\big].
\end{aligned}$$

Note: Two-phase sampling can be extended to more than two phases, depending upon the need and objective of the experiment. The various expectations can be extended along similar lines.
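Because SRSWOR at each phase makes every first-phase sample and every subsample equally likely, both the iterated expectation and the variance decomposition above can be verified by exhaustive enumeration on a toy population. A minimal Python sketch (the population values and sample sizes are arbitrary choices for illustration):

```python
from itertools import combinations
from statistics import mean

# Toy population of y-values (made-up numbers for illustration).
population = [3.0, 7.0, 4.0, 9.0, 2.0]
N = len(population)
n_prime, n = 3, 2  # first-phase and second-phase sample sizes

# Enumerate all M0 first-phase samples; within each, all M1 subsamples.
# Under SRSWOR each first-phase sample has probability 1/M0 and each
# subsample has conditional probability 1/M1, so expectations are averages.
first_phase = list(combinations(population, n_prime))          # M0 samples
theta = {s: [mean(sub) for sub in combinations(s, n)]          # theta_ij
         for s in first_phase}

# E(theta) = E1[E2(theta)]: inner average per first-phase sample, then outer.
E2 = {s: mean(vals) for s, vals in theta.items()}
E_theta = mean(E2.values())

# Total variance and its two-phase decomposition Var = E1[V2] + V1[E2].
all_vals = [v for vals in theta.values() for v in vals]
var_total = mean((v - E_theta) ** 2 for v in all_vals)
E1_V2 = mean(mean((v - E2[s]) ** 2 for v in theta[s]) for s in first_phase)
V1_E2 = mean((E2[s] - E_theta) ** 2 for s in first_phase)

print(E_theta, mean(population))   # subsample mean is unbiased for pop. mean
print(var_total, E1_V2 + V1_E2)    # the two sides of the decomposition agree
```

Running it shows the subsample mean is unbiased for the population mean and that the total variance splits exactly into $E_1[V_2(\hat\theta)] + V_1[E_2(\hat\theta)]$.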

Double sampling in ratio method of estimation

If the population mean $\bar{X}$ is not known, then the double sampling technique is applied. Take a large initial sample of size $n'$ by SRSWOR to estimate the population mean $\bar{X}$ as

$$\hat{\bar{X}} = \bar{x}' = \frac{1}{n'}\sum_{i=1}^{n'} x_i.$$

Then a second sample is a subsample of size $n$ selected from the initial sample by SRSWOR. Let $\bar{y}$ and $\bar{x}$ be the means of $y$ and $x$ based on the subsample. Then

$$E(\bar{x}') = \bar{X}, \qquad E(\bar{x}) = \bar{X}, \qquad E(\bar{y}) = \bar{Y}.$$
The ratio estimator under double sampling now becomes

$$\hat{\bar Y}_{Rd} = \frac{\bar y}{\bar x}\,\bar x'.$$

The exact expressions for the bias and the mean squared error of $\hat{\bar Y}_{Rd}$ are difficult to derive, so we find their approximate expressions using the same approach as in the ratio method of estimation.
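As a sketch, the estimator can be computed directly from the two phases of data. The function name and all numbers below are hypothetical, not from the notes:

```python
import random

def ratio_estimate_double(xs_first_phase, ys_sub, xs_sub):
    """Double-sampling ratio estimate: (ybar / xbar) * xbar'."""
    xbar_prime = sum(xs_first_phase) / len(xs_first_phase)  # large sample
    ybar = sum(ys_sub) / len(ys_sub)                        # subsample
    xbar = sum(xs_sub) / len(xs_sub)                        # subsample
    return ybar / xbar * xbar_prime

# Hypothetical data: x observed on a first-phase sample of n' = 8 units,
# y measured only on a subsample of n = 4 of them.
random.seed(0)
x_prime = [12, 15, 9, 20, 14, 11, 18, 13]
sub_idx = random.sample(range(len(x_prime)), 4)
x_sub = [x_prime[i] for i in sub_idx]
y_sub = [2.1 * x_prime[i] + random.uniform(-1, 1) for i in sub_idx]

print(ratio_estimate_double(x_prime, y_sub, x_sub))
```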
Let

$$\varepsilon_0 = \frac{\bar y - \bar Y}{\bar Y}, \qquad \varepsilon_1 = \frac{\bar x - \bar X}{\bar X}, \qquad \varepsilon_2 = \frac{\bar x' - \bar X}{\bar X}.$$

Then

$$E(\varepsilon_0) = E(\varepsilon_1) = E(\varepsilon_2) = 0,$$

$$E(\varepsilon_1^2) = \left(\frac{1}{n} - \frac{1}{N}\right)C_x^2,$$

$$\begin{aligned}
E(\varepsilon_1\varepsilon_2) &= \frac{1}{\bar X^2}\,E\big[(\bar x - \bar X)(\bar x' - \bar X)\big] \\
&= \frac{1}{\bar X^2}\,E_1\Big[E_2\big\{(\bar x - \bar X)(\bar x' - \bar X) \mid n'\big\}\Big] \\
&= \frac{1}{\bar X^2}\,E_1(\bar x' - \bar X)^2 \\
&= \left(\frac{1}{n'} - \frac{1}{N}\right)\frac{S_x^2}{\bar X^2} \\
&= \left(\frac{1}{n'} - \frac{1}{N}\right)C_x^2 \\
&= E(\varepsilon_2^2).
\end{aligned}$$

[Note that only those values are used which are common to both the phases: $E_2(\bar x \mid n') = \bar x'$.]
$$\begin{aligned}
E(\varepsilon_0\varepsilon_2) &= \frac{1}{\bar X\bar Y}\,Cov(\bar y, \bar x') \\
&= \frac{1}{\bar X\bar Y}\Big[Cov\big\{E(\bar y \mid n'),\, E(\bar x' \mid n')\big\} + E\big\{Cov(\bar y, \bar x' \mid n')\big\}\Big] \\
&= \frac{1}{\bar X\bar Y}\Big[Cov(\bar y', \bar x') + 0\Big] \\
&= \left(\frac{1}{n'} - \frac{1}{N}\right)\frac{S_{xy}}{\bar X\bar Y} \\
&= \left(\frac{1}{n'} - \frac{1}{N}\right)\rho\,\frac{S_x}{\bar X}\frac{S_y}{\bar Y} \\
&= \left(\frac{1}{n'} - \frac{1}{N}\right)\rho\,C_xC_y,
\end{aligned}$$

where $\bar y'$ is the sample mean of the $y$'s based on the sample of size $n'$. [Only those units which are common to both the phases are considered.]

$$\begin{aligned}
E(\varepsilon_0\varepsilon_1) &= \frac{1}{\bar X\bar Y}\,Cov(\bar y, \bar x) \\
&= \left(\frac{1}{n} - \frac{1}{N}\right)\frac{S_{xy}}{\bar X\bar Y} \\
&= \left(\frac{1}{n} - \frac{1}{N}\right)\rho\,\frac{S_x}{\bar X}\frac{S_y}{\bar Y} \\
&= \left(\frac{1}{n} - \frac{1}{N}\right)\rho\,C_xC_y.
\end{aligned}$$

$$\begin{aligned}
E(\varepsilon_0^2) &= \frac{1}{\bar Y^2}\,Var(\bar y) \\
&= \frac{1}{\bar Y^2}\Big[V_1\big\{E_2(\bar y \mid n')\big\} + E_1\big\{V_2(\bar y \mid n')\big\}\Big] \\
&= \frac{1}{\bar Y^2}\left[V_1(\bar y') + E_1\left\{\left(\frac{1}{n} - \frac{1}{n'}\right)s_y'^2\right\}\right] \\
&= \frac{1}{\bar Y^2}\left[\left(\frac{1}{n'} - \frac{1}{N}\right)S_y^2 + \left(\frac{1}{n} - \frac{1}{n'}\right)S_y^2\right] \\
&= \left(\frac{1}{n} - \frac{1}{N}\right)\frac{S_y^2}{\bar Y^2} \\
&= \left(\frac{1}{n} - \frac{1}{N}\right)C_y^2,
\end{aligned}$$

where $s_y'^2$ is the mean sum of squares of $y$ based on the initial sample of size $n'$, so that $E_1(s_y'^2) = S_y^2$.

Alternatively, $E(\varepsilon_1\varepsilon_2)$ can be obtained by conditioning on the first-phase sample:

$$\begin{aligned}
E(\varepsilon_1\varepsilon_2) &= \frac{1}{\bar X^2}\,Cov(\bar x, \bar x') \\
&= \frac{1}{\bar X^2}\Big[Cov\big\{E(\bar x \mid n'),\, E(\bar x' \mid n')\big\} + 0\Big] \\
&= \frac{1}{\bar X^2}\,Var(\bar x') \\
&= \left(\frac{1}{n'} - \frac{1}{N}\right)C_x^2,
\end{aligned}$$

where $Var(\bar x')$ is the variance of the mean of $x$ based on the initial sample of size $n'$.

Estimation error of $\hat{\bar Y}_{Rd}$

Write $\hat{\bar Y}_{Rd}$ as

$$\begin{aligned}
\hat{\bar Y}_{Rd} &= \frac{\bar Y(1+\varepsilon_0)}{\bar X(1+\varepsilon_1)}\;\bar X(1+\varepsilon_2) \\
&= \bar Y(1+\varepsilon_0)(1+\varepsilon_2)(1+\varepsilon_1)^{-1} \\
&= \bar Y(1+\varepsilon_0)(1+\varepsilon_2)(1-\varepsilon_1+\varepsilon_1^2-\ldots) \\
&\approx \bar Y\big(1+\varepsilon_0+\varepsilon_2+\varepsilon_0\varepsilon_2-\varepsilon_1-\varepsilon_0\varepsilon_1-\varepsilon_1\varepsilon_2+\varepsilon_1^2\big),
\end{aligned}$$

up to terms of order two; terms of degree greater than two are assumed to be negligible.

Bias of $\hat{\bar Y}_{Rd}$

$$E(\hat{\bar Y}_{Rd}) = \bar Y\Big[1 + E(\varepsilon_0\varepsilon_2) - E(\varepsilon_0\varepsilon_1) - E(\varepsilon_1\varepsilon_2) + E(\varepsilon_1^2)\Big]$$

$$\begin{aligned}
Bias(\hat{\bar Y}_{Rd}) &= E(\hat{\bar Y}_{Rd}) - \bar Y \\
&= \bar Y\Big[E(\varepsilon_0\varepsilon_2) - E(\varepsilon_0\varepsilon_1) - E(\varepsilon_1\varepsilon_2) + E(\varepsilon_1^2)\Big] \\
&= \bar Y\left[\left(\frac{1}{n'}-\frac{1}{N}\right)\rho C_xC_y - \left(\frac{1}{n}-\frac{1}{N}\right)\rho C_xC_y - \left(\frac{1}{n'}-\frac{1}{N}\right)C_x^2 + \left(\frac{1}{n}-\frac{1}{N}\right)C_x^2\right] \\
&= \bar Y\left(\frac{1}{n}-\frac{1}{n'}\right)\big(C_x^2 - \rho C_xC_y\big) \\
&= \bar Y\left(\frac{1}{n}-\frac{1}{n'}\right)C_x\big(C_x - \rho C_y\big).
\end{aligned}$$

The bias is negligible if $n$ is large, and the relative bias vanishes if $C_x^2 = C_{xy}$, where $C_{xy} = \rho C_xC_y$, i.e., if the regression line of $y$ on $x$ passes through the origin.
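The simplification from the four moment terms to the final expression can be spot-checked numerically (all parameter values below are made up):

```python
# Compare the four-moment form of the bias with its simplified form
# at arbitrary (made-up) parameter values; the two must agree.
n, n_prime, N = 30, 120, 5000
Cx, Cy, rho, Ybar = 0.4, 0.5, 0.6, 50.0

f = lambda m: 1.0 / m - 1.0 / N   # finite-population factor (1/m - 1/N)

e02 = f(n_prime) * rho * Cx * Cy  # E(eps0 * eps2)
e01 = f(n) * rho * Cx * Cy        # E(eps0 * eps1)
e12 = f(n_prime) * Cx * Cx        # E(eps1 * eps2)
e11 = f(n) * Cx * Cx              # E(eps1^2)

bias_moment_form = Ybar * (e02 - e01 - e12 + e11)
bias_final_form = Ybar * (1.0 / n - 1.0 / n_prime) * Cx * (Cx - rho * Cy)

print(bias_moment_form, bias_final_form)  # the two forms coincide
```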

Mean squared error of $\hat{\bar Y}_{Rd}$:

$$\begin{aligned}
MSE(\hat{\bar Y}_{Rd}) &= E(\hat{\bar Y}_{Rd} - \bar Y)^2 \\
&= \bar Y^2\,E(\varepsilon_0 + \varepsilon_2 - \varepsilon_1)^2 \qquad \text{(retaining the terms up to order two)} \\
&= \bar Y^2\,E\big(\varepsilon_0^2 + \varepsilon_1^2 + \varepsilon_2^2 + 2\varepsilon_0\varepsilon_2 - 2\varepsilon_0\varepsilon_1 - 2\varepsilon_1\varepsilon_2\big) \\
&= \bar Y^2\Bigg[\left(\frac{1}{n}-\frac{1}{N}\right)C_y^2 + \left(\frac{1}{n}-\frac{1}{N}\right)C_x^2 + \left(\frac{1}{n'}-\frac{1}{N}\right)C_x^2 \\
&\qquad\quad + 2\left(\frac{1}{n'}-\frac{1}{N}\right)\rho C_xC_y - 2\left(\frac{1}{n}-\frac{1}{N}\right)\rho C_xC_y - 2\left(\frac{1}{n'}-\frac{1}{N}\right)C_x^2\Bigg] \\
&= \bar Y^2\left(\frac{1}{n}-\frac{1}{N}\right)\big(C_y^2 + C_x^2 - 2\rho C_xC_y\big) + \bar Y^2\left(\frac{1}{n'}-\frac{1}{N}\right)C_x\big(2\rho C_y - C_x\big) \\
&= MSE(\text{ratio estimator}) + \bar Y^2\left(\frac{1}{n'}-\frac{1}{N}\right)\big(2\rho C_xC_y - C_x^2\big).
\end{aligned}$$

The second term is the contribution of the second phase of sampling. Collecting the terms in $C_x$ differently gives

$$MSE(\hat{\bar Y}_{Rd}) = \left(\frac{1}{n}-\frac{1}{N}\right)C_y^2\,\bar Y^2 + \bar Y^2\left(\frac{1}{n}-\frac{1}{n'}\right)C_x\big(C_x - 2\rho C_y\big),$$

so double sampling with the ratio method is preferred over using the sample mean $\bar y$ alone if $2\rho C_xC_y - C_x^2 > 0$, or

$$\rho > \frac{1}{2}\,\frac{C_x}{C_y}.$$
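The algebra above collapses six moment terms into compact decompositions; a numeric spot-check at arbitrary made-up parameter values:

```python
# Check that the six-term MSE expansion equals its two compact
# decompositions at arbitrary (made-up) parameter values.
n, n_prime, N = 25, 100, 4000
Cx, Cy, rho, Ybar = 0.35, 0.45, 0.7, 80.0

f = lambda m: 1.0 / m - 1.0 / N   # finite-population factor

# Six-term expansion of Y^2 * E(eps0 + eps2 - eps1)^2.
mse = Ybar**2 * (f(n) * Cy**2 + f(n) * Cx**2 + f(n_prime) * Cx**2
                 + 2 * f(n_prime) * rho * Cx * Cy
                 - 2 * f(n) * rho * Cx * Cy
                 - 2 * f(n_prime) * Cx**2)

# Split 1: MSE of the known-Xbar ratio estimator plus a second-phase term.
mse_ratio = Ybar**2 * f(n) * (Cy**2 + Cx**2 - 2 * rho * Cx * Cy)
second_phase = Ybar**2 * f(n_prime) * (2 * rho * Cx * Cy - Cx**2)

# Split 2: variance of the sample mean plus a correction term.
var_srs = Ybar**2 * f(n) * Cy**2
corr = Ybar**2 * (1.0 / n - 1.0 / n_prime) * Cx * (Cx - 2 * rho * Cy)

print(mse, mse_ratio + second_phase, var_srs + corr)  # all three coincide
```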

Choice of $n$ and $n'$

Write

$$MSE(\hat{\bar Y}_{Rd}) = \frac{V}{n} + \frac{V'}{n'},$$

where $V$ and $V'$ contain all the terms in $n$ and $n'$, respectively.

The cost function is $C_0 = nC + n'C'$, where $C$ and $C'$ are the costs per unit for selecting the samples of sizes $n$ and $n'$, respectively. Now we find the optimum sample sizes $n$ and $n'$ for fixed cost $C_0$. The Lagrangian function is

$$\phi = \frac{V}{n} + \frac{V'}{n'} + \lambda\big(nC + n'C' - C_0\big).$$

Setting the partial derivatives to zero,

$$\frac{\partial\phi}{\partial n} = 0 \;\Rightarrow\; \lambda C = \frac{V}{n^2} \;\Rightarrow\; n = \sqrt{\frac{V}{\lambda C}} \quad\text{or}\quad \sqrt{\lambda}\,nC = \sqrt{VC}.$$

Similarly,

$$\frac{\partial\phi}{\partial n'} = 0 \;\Rightarrow\; \lambda C' = \frac{V'}{n'^2} \;\Rightarrow\; \sqrt{\lambda}\,n'C' = \sqrt{V'C'}.$$


Thus

$$\sqrt{\lambda}\,\big(nC + n'C'\big) = \sqrt{VC} + \sqrt{V'C'}
\quad\Rightarrow\quad
\sqrt{\lambda} = \frac{\sqrt{VC} + \sqrt{V'C'}}{C_0},$$

and so

$$\text{Optimum } n:\quad n_{opt} = \frac{C_0}{\sqrt{VC}+\sqrt{V'C'}}\sqrt{\frac{V}{C}}, \text{ say},$$

$$\text{Optimum } n':\quad n'_{opt} = \frac{C_0}{\sqrt{VC}+\sqrt{V'C'}}\sqrt{\frac{V'}{C'}}, \text{ say}.$$

The optimum variance is

$$Var_{opt}(\hat{\bar Y}_{Rd}) = \frac{V}{n_{opt}} + \frac{V'}{n'_{opt}} = \frac{\big(\sqrt{VC}+\sqrt{V'C'}\big)^2}{C_0}.$$
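The optimum allocation is easy to compute once $V$, $V'$ and the unit costs are given. A sketch (the function name and all numbers are hypothetical):

```python
from math import sqrt

def optimum_allocation(V, V_prime, C, C_prime, C0):
    """Optimum n, n' minimizing V/n + V'/n' subject to n*C + n'*C' = C0."""
    denom = sqrt(V * C) + sqrt(V_prime * C_prime)
    n_opt = C0 / denom * sqrt(V / C)
    np_opt = C0 / denom * sqrt(V_prime / C_prime)
    var_opt = denom**2 / C0
    return n_opt, np_opt, var_opt

# Hypothetical values: measuring y is 10x as costly per unit as measuring x.
n_opt, np_opt, var_opt = optimum_allocation(V=4.0, V_prime=1.5,
                                            C=10.0, C_prime=1.0, C0=1000.0)
print(n_opt, np_opt, var_opt)
print(10.0 * n_opt + 1.0 * np_opt)   # the optimum exhausts the budget C0
print(4.0 / n_opt + 1.5 / np_opt)    # and attains V/n + V'/n' = var_opt
```

Real designs would round $n_{opt}$ and $n'_{opt}$ to integers, which changes the attained variance slightly.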

Comparison with SRS

If $X$ is ignored and all the resources are used to estimate $\bar Y$ by $\bar y$, then the attainable sample size is $C_0/C$, so that

$$Var(\bar y) = \frac{S_y^2}{C_0/C} = \frac{C\,S_y^2}{C_0}.$$

The relative efficiency is therefore

$$\frac{Var(\bar y)}{Var_{opt}(\hat{\bar Y}_{Rd})} = \frac{C\,S_y^2}{\big(\sqrt{VC}+\sqrt{V'C'}\big)^2}.$$
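The relative efficiency follows directly; a sketch with made-up inputs (note that $C_0$ cancels out of the ratio):

```python
from math import sqrt

def relative_efficiency(V, V_prime, C, C_prime, Sy2):
    """Var(ybar under SRS, same budget) / Var_opt of the double-sampling
    ratio estimator; values > 1 favor double sampling."""
    return C * Sy2 / (sqrt(V * C) + sqrt(V_prime * C_prime))**2

# Hypothetical parameter values.
print(relative_efficiency(V=4.0, V_prime=1.5, C=10.0, C_prime=1.0, Sy2=9.0))
```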
