
STAA 567 Lec 10: Bootstrap estimates of bias and confidence intervals
(Instructor: Nishant Panda)

Additional References

1. (IB): An Introduction to the Bootstrap, Efron and Tibshirani, Chapman & Hall/CRC

Introduction

Last lecture we used the bootstrap technique to estimate the standard error of an estimator. In this lecture, we will go further and see how we can use the bootstrap to estimate bias and construct confidence intervals. The following notation from last lecture is written down for completeness.

Let $X \sim F$ be a random variable (could be multidimensional!) whose c.d.f. is given by $F(x)$. Let us denote the expectation and variance of $X$ by $\mu_F$ and $\sigma_F^2$ to emphasize the distribution. These are all parameters of the distribution $F$. In general, we denote a parameter as a function of the distribution $F$, written $\theta = T(F)$ or $\theta_F$ for short. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ of $X$. An estimator $\hat{\theta}$ is a function of the sample, i.e. $\hat{\theta} = h(X_1, X_2, \ldots, X_n)$. For example, if $\theta_F = \mu_F$, then $\hat{\theta} = \bar{X}$ is an estimator for the mean. Similarly, if $\theta_F = \sigma_F^2$, then $\hat{\theta} = S^2$ is an estimator for the variance. Let $\hat{F}$ be an estimated c.d.f.; then $\hat{\theta}_{\hat{F}}$ is the plug-in estimator for $\theta$.

As usual, if $\theta_F = \sigma_F^2$, then $S^2$ is an estimator for the variance, given by

$$\hat{\theta} = S^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2.$$

But the plug-in estimator using the empirical c.d.f. is

$$\hat{\theta}_{\hat{F}} = \hat{\sigma}^2_{\hat{F}} = \frac{1}{n} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2.$$
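A quick check in R makes the distinction concrete (a minimal sketch; the vector x here is just an arbitrary placeholder sample): var() uses the $n-1$ divisor, so the plug-in estimator is simply a rescaling of $S^2$.

x <- rnorm(10)                       # any sample
s2 <- var(x)                         # S^2: divides by n - 1
plug.in <- mean((x - mean(x))^2)     # plug-in: divides by n
all.equal(plug.in, s2 * (length(x) - 1) / length(x))  # TRUE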

Estimating Bias through Bootstrapping


Let $X \sim F$. Let $X_1, X_2, \ldots, X_n$ be a random sample of $X$. Let $\hat{F}$ be an estimated c.d.f., and say we are interested in estimating some parameter $\theta_F$ using the sample through an estimator $\hat{\theta}$. The plug-in estimator is denoted by $\hat{\theta}_{\hat{F}}$. We define the bias of the estimator as

$$\mathrm{Bias}_F(\hat{\theta}, \theta) = E_F\left[\hat{\theta}\right] - \theta_F.$$

As you must have guessed by now, the bootstrap estimate of the bias is given by the plug-in estimator,

$$\mathrm{Bias}_{\hat{F}}(\hat{\theta}, \theta) = E_{\hat{F}}\left[\hat{\theta}^*\right] - \theta_{\hat{F}}.$$

Note that the expectation can be approximated by Monte Carlo,

$$E_{\hat{F}}\left[\hat{\theta}^*\right] \approx \frac{1}{R} \sum_{j=1}^{R} \hat{\theta}(x^*_j)$$

where the notation is from last lecture. Thus, the bootstrap bias estimate of $\hat{\theta}$ is

$$\mathrm{Bias}_{\hat{F}}(\hat{\theta}, \theta) \approx \frac{1}{R} \sum_{j=1}^{R} \hat{\theta}(x^*_j) - \theta_{\hat{F}}.$$

Example 1: Estimate the bias of $S^2$ from last lecture using the bootstrap.


# Observed data from last lecture
set.seed(42)
n <- 40
obs.data <- rnorm(n, mean = 5, sd = 2)

# First let us create our estimator
theta.hat <- function(sample){
  return(var(sample))
}

# Set the number of bootstrap samples R
R <- 1e3

# Create a vector for the sampling distribution
theta.hat.dist <- rep(0, R)

# Do the bootstrap to get the sampling distribution
for (boot in 1:R){
  # get a sample of size n from F hat
  x.star <- sample(obs.data, n, replace = TRUE)
  # get the estimated value from the estimator
  theta.hat.dist[boot] <- theta.hat(x.star)
}

# Create the plug-in estimator function
g <- function(x, x_bar){
  (x - x_bar)^2
}

# Compute the bias

# Get the expected value of the estimator from the bootstrap
mu.Fhat <- mean(theta.hat.dist)

# Get the plug-in estimator
theta.Fhat <- sum(sapply(obs.data, g, x_bar = mean(obs.data)))/n

# Compute the bias as the difference
bias.boot <- mu.Fhat - theta.Fhat
bias.boot
## [1] -0.03825479
Let us check if we did the above correctly with the boot package in R
library(boot)
# our test statistic is the sample variance
sample.var <- function(x, d){
  return(var(x[d]))
}

# get the bootstrap object
b <- boot(obs.data, sample.var, R)
b
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = obs.data, statistic = sample.var, R = R)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 5.976929 -0.1527511 1.195859
The bias seems different! The boot package computes the bias as

$$\mathrm{Bias}_{\hat{F}}(\hat{\theta}, \theta) = E_{\hat{F}}\left[\hat{\theta}^*\right] - \hat{\theta}.$$

That is, it does not use the plug-in estimator in the last term but uses the original estimator. If $\hat{\theta}$ were a plug-in estimator, the boot package would have given the same answer. (There is no way for code to know what the plug-in estimator for a general estimator is. It can only take the estimator you supply as an argument.)
# Compute the bias as the difference between
# the bootstrap expectation and the original estimator
bias.boot.new <- mu.Fhat - var(obs.data)
print(bias.boot.new)
## [1] -0.187678
This should not be a problem when the size of your observed data is large, but it is something to keep in mind.

(Home Assignment!) For the obs.data here, assume that your estimator is not $S^2$ but in fact the plug-in estimator

$$\hat{\sigma}^2_{\hat{F}} = \frac{1}{n} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2.$$

That is, you are estimating $\sigma^2_F$ using $\hat{\sigma}^2_{\hat{F}}$. Mimic the code above and compute the standard error and bias of $\hat{\sigma}^2_{\hat{F}}$. Also check with the boot package.

Bootstrap Confidence Interval

Percentile Confidence Intervals


The simplest bootstrap confidence interval can be described as follows. If $\theta$ is the parameter of interest, and $\{\hat{\theta}(x^*_j)\}$ are the bootstrap estimates for $1 \le j \le R$, then the sampling distribution of $\hat{\theta}$ is approximated by the histogram of $\{\hat{\theta}(x^*_j)\}$. Say we want a 90% confidence interval. Then using the histogram we get the 5th and 95th percentiles, and these form the lower and upper ends of the interval. In math lingo, if we seek a confidence level $(1 - \alpha)$, let

$$m = \left[\frac{\alpha}{2} R\right]$$

where $[\,\cdot\,]$ denotes the greatest integer function. Then, to compute the $(1 - \alpha)$ confidence interval,

1. Order the bootstrap estimates $\{\hat{\theta}(x^*_j)\}$ in increasing order:

$$\hat{\theta}_{(1)}, \hat{\theta}_{(2)}, \ldots, \hat{\theta}_{(R)}$$

2. The confidence interval is given by

$$\left(\hat{\theta}_{(m)}, \hat{\theta}_{(R+1-m)}\right)$$

This is easier to code than to write!

Example 2: Construct a 95% confidence interval for $\hat{\theta}$ as in Example 1.


# We already did all of the work here.
bounds <- quantile(theta.hat.dist, c(0.025, (1 - 0.025)))
sprintf("(%s, %s)", round(bounds[1],2), round(bounds[2],2))
## [1] "(3.54, 8.23)"
As a sanity check (that is, using the actual order statistics):
m <- floor(R*0.025)
sorted.thetaH <- sort(c(theta.hat.dist, var(obs.data)))
sprintf("(%s, %s)", round(sorted.thetaH[m], 2), round(sorted.thetaH[(R+1-m)], 2))
## [1] "(3.54, 8.23)"
Let us check this using the boot package:
# b was our bootstrap object from before
boot.ci(b, type = "perc")
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = b, type = "perc")
##
## Intervals :
## Level Percentile
## 95% ( 3.523, 8.216 )
## Calculations and Intervals on Original Scale

Studentized Bootstrap
The percentile bootstrap can be too narrow in practice (undercoverage). The studentized bootstrap (also known as bootstrap-t) is a technique that prevents this flaw of the percentile bootstrap. First, some theory. For some unbiased estimator $\hat{\theta}$ of $\theta$, if the C.L.T. holds, then

$$Z = \frac{\hat{\theta} - \theta}{SE} \sim AN(0, 1).$$

But $SE$ is typically not known. Student found that if $\hat{\theta} = \bar{X}$ for data drawn from a normal distribution, then

$$T = \frac{\hat{\theta} - \theta}{\widehat{SE}} \sim t_{n-1}.$$

For an arbitrary estimator $\hat{\theta}$, this may not be true. The bootstrap-t method tries to get an approximation of $T$ directly from the data, using the fact that $\hat{\theta} - \theta$ is distributionally close to $\hat{\theta}^* - \hat{\theta}$, where $\hat{\theta}^*$ is the bootstrap distribution of $\hat{\theta}$, i.e.

$$T \approx \frac{\hat{\theta}^* - \hat{\theta}}{\widehat{SE}^*}$$

where $\widehat{SE}^*$ is the plug-in estimate of $SE^*$. Here is an algorithm to get the distribution of $T^*$:

1. Get a bootstrap sample $x^* = (x^*_1, x^*_2, \ldots, x^*_n)$. Suppose (and this is a big assumption) you can estimate the $SE$ of $\hat{\theta}^*$ for this sample $x^*$; then

$$T(x^*) = \frac{\hat{\theta}(x^*) - \hat{\theta}}{\widehat{SE}(x^*)}.$$

2. Repeat this R times to get the distribution of $T^*$.


3. The confidence interval is then given by

$$\left(\hat{\theta} - t^*_{1-\alpha/2}\, \widehat{SE},\ \hat{\theta} - t^*_{\alpha/2}\, \widehat{SE}\right)$$

where $t^*_q$ denotes the $q$-th quantile of the $T^*$ distribution. A minimal code sketch of this algorithm is given below.
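The sketch below assumes, per the big assumption in step 1, that you already have a function that can estimate the standard error of your estimator on any given sample. The names boot.t.ci, theta.fn, and se.fn are hypothetical placeholders, not functions from the boot package.

# Hedged sketch of the bootstrap-t confidence interval
# theta.fn : the estimator, a function of a sample
# se.fn    : estimates the SE of theta.fn on a given sample (the big assumption!)
boot.t.ci <- function(data, theta.fn, se.fn, R = 1000, alpha = 0.05){
  n <- length(data)
  theta.hat <- theta.fn(data)
  se.hat <- se.fn(data)
  t.star <- rep(0, R)
  for (boot in 1:R){
    x.star <- sample(data, n, replace = TRUE)
    # studentize each bootstrap replicate with its own SE estimate
    t.star[boot] <- (theta.fn(x.star) - theta.hat) / se.fn(x.star)
  }
  # note the reversed quantiles: the lower bound uses t*_{1 - alpha/2}
  theta.hat - quantile(t.star, c(1 - alpha/2, alpha/2)) * se.hat
}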

The big question here is how do we get $\widehat{SE}^*$? This is mind-bending (inception). Let us go step by step.

$$\widehat{SE}^* = \sqrt{\mathrm{var}_{\hat{F}^*}\left[\hat{\theta}^*\right]}$$

Now,

$$\mathrm{var}_{\hat{F}^*}\left[\hat{\theta}^*\right] = E_{\hat{F}^*}\left[(\hat{\theta}^*)^2\right] - \left(E_{\hat{F}^*}\left[\hat{\theta}^*\right]\right)^2.$$

Note the $\hat{F}^*$! If $\hat{\theta}^*$ is complicated, we need to use the bootstrap to estimate $\mathrm{var}_{\hat{F}^*}\left[\hat{\theta}^*\right]$! This is the double bootstrap.

In order to get the distribution of a statistic, we take bootstrap samples from the data. In order to get the distribution of the bootstrap statistic, we take bootstrap samples from the bootstrap data!

Population: c.d.f. is $F$, parameter is $\theta_F$. You sample from $F$, you get data!
Data: approximate c.d.f. is $\hat{F}$, estimator is $\hat{\theta}$. You sample from $\hat{F}$, you get the bootstrap sample.
Bootstrap sample: approximate c.d.f. is $\hat{F}^*$, bootstrap estimator is $\hat{\theta}^*$.
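In code, this hierarchy is just nested resampling. A minimal sketch, reusing obs.data and n from Example 1:

# First level: resample from F hat (the observed data)
x.star <- sample(obs.data, n, replace = TRUE)
# Second level: resample from F hat star (the bootstrap sample itself)
x.star.star <- sample(x.star, n, replace = TRUE)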

Example 3: Let $X \sim F$ be a random variable and let $X_1, X_2, \ldots, X_n$ be a random sample. Suppose you are interested in estimating the mean of $X$ by the sample mean. That is, your parameter is $\theta_F = \mu_F$ and your estimator is $\hat{\theta} = \bar{X}$. Let $\hat{F}$ be the empirical c.d.f. What are the standard error $SE$, the plug-in estimator $\hat{\theta}_{\hat{F}}$, the plug-in estimate of the standard error $\widehat{SE}$, the bootstrap estimator $\hat{\theta}^*$, and the plug-in estimator of the bootstrap standard error $\widehat{SE}^*$?

1. The standard error $SE$:

$$SE(\bar{X}) = \sqrt{\mathrm{var}_F\left(\bar{X}\right)}.$$

Using the fact that

$$\mathrm{var}_F\left(\bar{X}\right) = \frac{\sigma_F^2}{n},$$

we get

$$SE(\bar{X}) = \frac{\sigma_F}{\sqrt{n}}.$$
2. The plug-in estimator $\hat{\theta}_{\hat{F}}$: by definition, the plug-in estimator is the parameter evaluated w.r.t. $\hat{F}$,

$$\hat{\theta}_{\hat{F}} = \theta_{\hat{F}} = \mu_{\hat{F}}.$$

Note that, since $\mu_F = E_F[X]$,

$$\mu_{\hat{F}} = E_{\hat{F}}[X].$$

This is easy to calculate:

$$E_{\hat{F}}[X] = \frac{1}{n} \sum_{i=1}^{n} X_i = \bar{X}.$$

3. The plug-in estimate of the standard error $\widehat{SE}$:

$$\widehat{SE} = SE_{\hat{F}} = \sqrt{\mathrm{var}_{\hat{F}}\left(\bar{X}\right)}.$$

Using the fact that

$$\mathrm{var}_{\hat{F}}\left(\bar{X}\right) = \mathrm{var}_{\hat{F}}\left[\frac{1}{n} \sum_{i=1}^{n} X_i\right]$$

along with the fact that the $X_i$ are an i.i.d. sample of $X$, we get

$$\mathrm{var}_{\hat{F}}\left(\bar{X}\right) = \frac{1}{n}\, \mathrm{var}_{\hat{F}}[X] = \frac{1}{n}\left( E_{\hat{F}}\left[X^2\right] - \left(E_{\hat{F}}[X]\right)^2 \right).$$

Thus,

$$\mathrm{var}_{\hat{F}}\left(\bar{X}\right) = \frac{1}{n^2} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2$$

and

$$\text{plug-in } \widehat{SE} = \frac{\sqrt{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2}}{n}.$$
4. The bootstrap estimator $\hat{\theta}^*$: let $X^*_1, X^*_2, \ldots, X^*_n$ be a bootstrap sample and let $\hat{F}^*$ be the empirical c.d.f. of the bootstrap sample. Then, if $\hat{\theta} = h(X_1, X_2, \ldots, X_n)$, then $\hat{\theta}^* = h(X^*_1, X^*_2, \ldots, X^*_n)$. Thus,

$$\hat{\theta}^* = \bar{X}^* = \frac{1}{n} \sum_{i=1}^{n} X^*_i.$$

5. Convince yourself that

$$\text{plug-in } \widehat{SE}^* = \frac{\sqrt{\sum_{i=1}^{n} \left(X^*_i - \bar{X}^*\right)^2}}{n}.$$

(A quick numerical check of the plug-in SE formula is given below.)
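As a quick numerical check of the formula in step 3 (a sketch reusing obs.data and n from Example 1; the names plugin.se and usual.se are just illustrative): the plug-in $\widehat{SE}$ divides by $n$ rather than $n-1$, so it is slightly smaller than the familiar estimate $S/\sqrt{n}$.

x.bar <- mean(obs.data)
plugin.se <- sqrt(sum((obs.data - x.bar)^2)) / n   # plug-in SE from step 3
usual.se <- sd(obs.data) / sqrt(n)                 # S / sqrt(n), for comparison
c(plugin.se, usual.se)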

(Home Assignment!): With the same observed data obs.data as in the examples, say you are now interested in the mean and your estimator is $\bar{X}$. Get the studentized bootstrap confidence interval for this estimator. You will need Example 3, i.e. you don't need to do a double bootstrap.

In the next lecture we will see how to implement the double bootstrap to construct a studentized bootstrap confidence interval for $S^2$.
