
STAT 135 Solutions to Homework 4: 30 points

Spring 2015

Problem 1: 10 points
In each of the following cases, (i) write down the likelihood function of θ, (ii) show that the corresponding T(X) is a sufficient statistic, (iii) compute θ̂_MLE and (iv) compute E(θ̂_MLE).

1. X1, ..., Xn iid Poisson random variables with rate λ = θ + 1, with T(X) = Σ_i X_i
(i)
The likelihood is given by

$$\mathrm{lik}(\theta) = \prod_{i=1}^{n} \frac{e^{-(\theta+1)}(\theta+1)^{X_i}}{X_i!} = \frac{e^{-n(\theta+1)}(\theta+1)^{\sum_{i=1}^{n} X_i}}{\prod_{i=1}^{n} X_i!}$$

(ii)

Recall that a necessary and sufficient condition for T to be sufficient for θ is that

$$f_\theta(x_1, \dots, x_n) = g_\theta(T)\, h(x_1, \dots, x_n)$$

i.e. that the density can be factored into a product such that one factor, h, does not depend on θ, and the other
factor, which does depend on θ, depends on (x1 , ..., xn ) only through T .
In this case, we have that

$$f_\theta(x_1, \dots, x_n) = \frac{e^{-n(\theta+1)}(\theta+1)^{\sum_{i=1}^{n} x_i}}{\prod_{i=1}^{n} x_i!} = e^{-n(\theta+1)}(\theta+1)^{\sum_{i=1}^{n} x_i} \cdot \frac{1}{\prod_{i=1}^{n} x_i!} = e^{-n(\theta+1)}(\theta+1)^{T} \cdot \frac{1}{\prod_{i=1}^{n} x_i!}$$

So our density satisfies the theorem, since g_θ(T) = e^{-n(θ+1)}(θ+1)^T depends on (x1, ..., xn) only through T(x) = Σ_i x_i, and h(x1, ..., xn) = 1/∏_i x_i! does not depend on θ.

(iii)

The log-likelihood is given by
$$\ell(\theta) = \log(\mathrm{lik}(\theta)) = -n(\theta+1) + \sum_i X_i \log(\theta+1) - \sum_i \log(X_i!)$$

So differentiating with respect to θ gives


$$\frac{\partial \ell}{\partial \theta} = -n + \frac{\sum_i X_i}{\theta + 1}$$
Setting to zero to identify the value of θ that maximizes the log-likelihood, we get that
$$-n + \frac{\sum_i X_i}{\hat\theta_{MLE} + 1} = 0$$
Thus, rearranging, we get

$$\hat\theta_{MLE} = \frac{\sum_i X_i}{n} - 1$$
(iv)
$$\begin{aligned}
E(\hat\theta_{MLE}) &= E\left(\frac{\sum_i X_i}{n} - 1\right) \\
&= \frac{E\left(\sum_i X_i\right)}{n} - 1 \\
&= \frac{\sum_i E(X_i)}{n} - 1 \\
&= \frac{\sum_i (\theta + 1)}{n} - 1 \\
&= \frac{n(\theta + 1)}{n} - 1 \\
&= \theta + 1 - 1 = \theta
\end{aligned}$$

So our MLE is unbiased.
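As a quick check, we can verify this by simulation, in the same spirit as the R code used for Problem 2 below; the true value θ = 2, the sample size n = 50, and the number of replications are arbitrary illustrative choices, not part of the problem:

# sketch: check by simulation that the Poisson MLE mean(X) - 1 is unbiased for theta
set.seed(1)
theta <- 2                            # arbitrary true value, for illustration only
n <- 50                               # sample size per replication
mle <- replicate(1000, {
  x <- rpois(n, lambda = theta + 1)   # Poisson with rate theta + 1
  mean(x) - 1                         # the MLE of theta derived above
})
mean(mle)                             # should be close to theta = 2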

2. X1, ..., Xn iid random variables with exponential distribution, i.e. with density f(x|θ) = θe^{−θx}, x > 0. T(X) = Σ_i X_i
(i)

$$\mathrm{lik}(\theta) = \prod_{i=1}^{n} \theta e^{-\theta X_i} = \theta^{n} e^{-\theta \sum_{i=1}^{n} X_i}$$

(ii)

Again, recall the necessary and sufficient condition for T(X) to be a sufficient statistic and note that our likelihood
function can be written as such a product, where

$$g_\theta(T) = \theta^{n} e^{-\theta T}$$

and
h(x1 , ..., xn ) = 1
(iii)
The log-likelihood function is given by
$$\ell(\theta) = \log(\mathrm{lik}(\theta)) = n \log\theta - \theta \sum_{i=1}^{n} X_i$$

Differentiating, we have
$$\frac{\partial \ell}{\partial \theta} = \frac{n}{\theta} - \sum_{i=1}^{n} X_i$$

And setting to zero to identify the value of θ which maximizes the log-likelihood, we have that
$$\frac{n}{\hat\theta_{MLE}} - \sum_{i=1}^{n} X_i = 0$$

which, rearranging, yields

$$\hat\theta_{MLE} = \frac{n}{\sum_{i=1}^{n} X_i}$$
(iv)

 
$$E(\hat\theta_{MLE}) = E\left(\frac{n}{\sum_{i=1}^{n} X_i}\right) = n\, E\left(\frac{1}{\sum_{i=1}^{n} X_i}\right)$$

The hint tells us that Σ_{i=1}^n X_i ∼ Γ(n, θ), so if we let Y = Σ_{i=1}^n X_i, we simply need to find E(1/Y) where Y ∼ Γ(n, θ). We can do this as follows:


$$\begin{aligned}
E\left(\frac{1}{Y}\right) &= \int_{0}^{\infty} \frac{1}{y} \cdot \frac{\theta^{n}}{\Gamma(n)}\, y^{n-1} e^{-\theta y}\, dy \\
&= \int_{0}^{\infty} \frac{\theta^{n}}{\Gamma(n)}\, y^{n-2} e^{-\theta y}\, dy \\
&= \frac{\Gamma(n-1)\,\theta}{\Gamma(n)} \int_{0}^{\infty} \frac{\theta^{n-1}}{\Gamma(n-1)}\, y^{n-2} e^{-\theta y}\, dy \\
&= \frac{\Gamma(n-1)\,\theta}{\Gamma(n)} \qquad \text{(since the integrand is just the } \Gamma(n-1, \theta) \text{ density)} \\
&= \frac{\theta}{n-1} \qquad \text{(since } \Gamma(k) = (k-1)!\text{)}
\end{aligned}$$
Thus

$$E(\hat\theta_{MLE}) = \frac{n}{n-1}\,\theta$$
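In other words the MLE overestimates θ by a factor of n/(n − 1), although the bias disappears as n → ∞. As a quick check, the following simulation sketch (with an arbitrary illustrative choice of θ and a deliberately small n so the bias factor is visible) estimates E(θ̂_MLE):

# sketch: check by simulation that E(theta.hat) is about n/(n-1) * theta
set.seed(1)
theta <- 2                         # arbitrary true rate, for illustration only
n <- 10                            # small n makes the bias factor visible
mle <- replicate(10000, {
  x <- rexp(n, rate = theta)       # exponential with density theta * exp(-theta * x)
  n / sum(x)                       # the MLE of theta derived above
})
mean(mle)                          # should be close to n/(n-1) * theta = (10/9) * 2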

Problem 2: Let X1 , ..., Xn be iid random variables, uniformly distributed over
(θ, 2θ). 10 points
1. Show that a sufficient statistic for θ is T (X) = (mini Xi , maxi Xi )
Note that we can write the density for the Uniform(θ, 2θ) distribution as

$$f_\theta(x) = \frac{1}{\theta}\,\mathbf{1}(\theta \le x \le 2\theta)$$

where $\mathbf{1}(A) = \begin{cases} 1 & \text{if } A \text{ is true} \\ 0 & \text{otherwise} \end{cases}$
so

$$f_\theta(x_1, \dots, x_n) = \prod_{i=1}^{n} \frac{1}{\theta}\,\mathbf{1}(\theta \le x_i \le 2\theta)$$

where we note that for the product to be non-zero, we need θ ≤ x_i ≤ 2θ for all i = 1, 2, ..., n, which is equivalent to max_i x_i ≤ 2θ and min_i x_i ≥ θ. So

$$f_\theta(x_1, \dots, x_n) = \frac{1}{\theta^{n}}\,\mathbf{1}\left(\max_i x_i \le 2\theta,\ \min_i x_i \ge \theta\right)$$

Thus T = (min_i X_i, max_i X_i) is a sufficient statistic by the factorization theorem, where g_θ(T) = (1/θ^n) 1(max_i x_i ≤ 2θ, min_i x_i ≥ θ) depends on the x_i's only through T = (min_i x_i, max_i x_i), and h(x1, ..., xn) = 1.

2. Show that an unbiased estimator for θ is θ̂ = (2/3) X1


Note that E(X_i) = (1/2)(θ + 2θ) = 3θ/2, so

$$E(\hat\theta) = E\left(\frac{2}{3} X_1\right) = \frac{2}{3} E(X_1) = \frac{2}{3} \cdot \frac{3\theta}{2} = \theta$$

so θ̂ is an unbiased estimator for θ.

3. Compute θ̂_MLE
Note that the likelihood is given by
$$\mathrm{lik}(\theta) = \frac{1}{\theta^{n}}\,\mathbf{1}\left(\max_i X_i \le 2\theta,\ \min_i X_i \ge \theta\right)$$

which is clearly maximized when θ takes its smallest possible value such that θ ≤ X_i ≤ 2θ for all i = 1, ..., n, specifically the smallest θ satisfying X_i/2 ≤ θ for all i = 1, ..., n. Thus

$$\hat\theta_{MLE} = \frac{\max_i X_i}{2}$$

4. Using T (X), find an unbiased estimator of θ whose mean-squared error is at least as good
as that of θ̂
Recall that the Rao-Blackwell theorem tells us that, given an estimator θ̂, we can find an estimator whose MSE
is at least as good as that of θ̂ if we know a sufficient statistic T . In particular, that estimator is θ̃, which can be
calculated using
$$\tilde\theta = E(\hat\theta \mid T) = \frac{2}{3}\, E\left(X_1 \,\Big|\, \min_i X_i,\ \max_i X_i\right)$$
we can write this as
$$\tilde\theta = \frac{2}{3}\, E\left(X_1 \,\Big|\, a = \min_i X_i,\ b = \max_i X_i\right)$$

Note that using the law of total probability, we can separate the expectation into several cases as follows:

$$\begin{aligned}
E\left(X_1 \,\Big|\, a = \min_i X_i,\ b = \max_i X_i\right)
&= E\left(X_1 \,\Big|\, \min_i X_i = a,\ \max_i X_i = b,\ X_1 = a\right) \times P\left(X_1 = \min_i X_i\right) \\
&\quad + E\left(X_1 \,\Big|\, \min_i X_i = a,\ \max_i X_i = b,\ X_1 = b\right) \times P\left(X_1 = \max_i X_i\right) \\
&\quad + E\left(X_1 \,\Big|\, \min_i X_i = a,\ \max_i X_i = b,\ a < X_1 < b\right) \times P\left(X_1 \ne \min_i X_i,\ X_1 \ne \max_i X_i\right) \\
&= a \times \frac{1}{n} + b \times \frac{1}{n} + \frac{a+b}{2} \times \left(1 - \frac{2}{n}\right) \\
&= \frac{a+b}{2}
\end{aligned}$$

(The third conditional expectation equals (a + b)/2 because, given that X_1 is neither the minimum nor the maximum, X_1 is uniformly distributed on (a, b).)
Thus,

$$\tilde\theta = \frac{2}{3} \cdot \frac{\min_i X_i + \max_i X_i}{2} = \frac{\min_i X_i + \max_i X_i}{3}$$
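To see the Rao-Blackwell improvement numerically, the short simulation sketch below (with arbitrary illustrative values θ = 2 and n = 20) compares the MSE of the crude unbiased estimator (2/3)X1 from part 2 with the MSE of the conditioned estimator (min_i X_i + max_i X_i)/3; the latter should be far smaller:

# sketch: compare the MSE of (2/3)*X1 with the MSE of (min + max)/3
set.seed(1)
theta <- 2                                  # arbitrary true value, for illustration only
n <- 20
crude <- rb <- numeric(10000)
for (i in 1:10000) {
  x <- runif(n, theta, 2 * theta)
  crude[i] <- (2 / 3) * x[1]                # unbiased estimator from part 2
  rb[i] <- (min(x) + max(x)) / 3            # Rao-Blackwellized estimator from part 4
}
mean((crude - theta)^2)                     # MSE of the crude estimator
mean((rb - theta)^2)                        # MSE of the Rao-Blackwellized estimator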

5. Can you use T(X) to improve θ̂_MLE as well? Why?


Note that since the MLE is a function of the sufficient statistic, the conditional expectation does not change the estimator. In particular, the improved estimator is simply

$$E\left(\hat\theta_{MLE} \,\Big|\, T\right) = E\left(\frac{\max_i X_i}{2} \,\Big|\, (\min_i X_i,\ \max_i X_i)\right) = \frac{\max_i X_i}{2}$$

which is just the ML estimator. Thus in this case Rao-Blackwell does not improve the MLE.

6. How do you think the resulting estimator from (4) compares to θ̂_MLE in terms of the mean-squared error? You can simulate the experiment to help you interpret.
Note that for this example the regularity conditions underlying the usual asymptotic theory for the MLE (a density that is differentiable as a function of θ, with a derivative that is jointly continuous in x and θ, and a support, i.e. the range over which the distribution is defined, that does not depend on θ) are certainly not satisfied: our support is (θ, 2θ), which clearly depends on the parameter θ. Consequently the asymptotic properties of the MLE (the properties which make the MLE so nice) do not apply, which is why the MLE histogram below does not look normal. As a result, we might expect the improved estimator from (4) to achieve a smaller MSE than the MLE.

set.seed(123)
library(ggplot2)
library(grid)
library(gridExtra)

# let's simulate for theta = 2
theta <- 2

mle <- c()
est <- c()

# do 1000 simulations
for (i in 1:1000) {
  # draw a sample of size 500 from the uniform(theta, 2*theta) distribution
  sample <- runif(500, theta, 2 * theta)
  # calculate the MLE estimate
  mle[i] <- max(sample) / 2
  # calculate the theta.hat estimate from (4)
  est[i] <- (min(sample) + max(sample)) / 3
}

# put results into a data frame
est.df <- data.frame(mle = mle, est = est)

# plot histograms
gg.mle <- ggplot(est.df) +
  geom_histogram(aes(x = mle), col = "white", binwidth = 0.0015) +
  scale_x_continuous(limits = c(1.985, 2.015)) + ggtitle("MLE")
gg.est <- ggplot(est.df) +
  geom_histogram(aes(x = est), col = "white", binwidth = 0.0015) +
  scale_x_continuous(limits = c(1.985, 2.015)) + ggtitle("Estimate from (4)")
grid.arrange(gg.mle, gg.est, ncol = 2)

[Figure: side-by-side histograms of the 1000 simulated estimates, "MLE" (left, x-axis mle) and "Estimate from (4)" (right, x-axis est), with counts on the y-axis and x-axis values ranging from about 1.99 to 2.01.]

# calculate the bias for the MLE
bias.mle <- mean(mle) - theta
bias.mle

## [1] -0.001982282

# calculate the bias for the estimator from (4)
bias.est <- mean(est) - theta
bias.est

## [1] -2.870242e-05

# the estimate from (4) has much smaller bias (MLE has a bias 69 times larger than the estimate!)
bias.mle/bias.est

## [1] 69.06322

# calculate the MSE for the MLE
mse.mle <- (theta - mean(mle))^2 + var(mle)
# calculate the MSE for the estimate from (4)
mse.est <- (theta - mean(est))^2 + var(est)

# the MLE has MSE at least twice the size of that of the estimate from (4)
mse.mle/mse.est

## [1] 2.263549

Problem 3: We want to compute the probability that the Sun will rise tomorrow, given that we know it has risen every day for the last 500 years. Let us denote by Xi = 1 the event that the sun rose on day i and Xi = 0 otherwise, i = 1, ..., n. Given a value p ∈ [0, 1], we model Xi ∼ B(p) and assume that the Xi are independent conditionally on p; thus we do not consider any cosmological model whatsoever. 10 points
1. Show that the likelihood of observing Xi = xi, i = 1, ..., n given p is

$$\mathrm{lik}(p) = P(X_1 = x_1, \dots, X_n = x_n \mid p) = p^{s}(1-p)^{n-s},$$

where s = Σ_i x_i.
We have that

$$\mathrm{lik}(p) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i} = p^{\sum_i x_i}(1-p)^{\sum_i (1-x_i)} = p^{s}(1-p)^{n-s}$$
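As an aside, this likelihood is easy to inspect numerically; the sketch below uses arbitrary illustrative values (n = 10 observed days with s = 9 successes, not the numbers from the problem) and shows that p^s (1 − p)^(n − s) is maximized at p = s/n:

# sketch: evaluate the Bernoulli likelihood p^s * (1 - p)^(n - s) on a grid of p
n <- 10                                     # illustrative values, not from the problem
s <- 9
p <- seq(0, 1, by = 0.001)
lik <- p^s * (1 - p)^(n - s)
plot(p, lik, type = "l", ylab = "lik(p)")   # likelihood curve as a function of p
p[which.max(lik)]                           # maximized at p = s/n = 0.9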

2. We make no prior assumptions on p, except for the fact that the experiment (Sun rising or not) is allowed to succeed or fail. Therefore we choose as prior distribution p ∼ U[0, 1]. Show that in that case the posterior distribution for p is

$$f(p \mid X_1 = x_1, \dots, X_n = x_n) = \frac{p^{s}(1-p)^{n-s}}{\int_0^1 p'^{\,s}(1-p')^{n-s}\, dp'}$$

Recall that the posterior distribution for Θ given X is given by

$$f_{\Theta \mid X = x}(\theta \mid x) = \frac{f_{X \mid \Theta=\theta}(x \mid \theta)\, f_{\Theta}(\theta)}{\int f_{X \mid \Theta=\theta'}(x \mid \theta')\, f_{\Theta}(\theta')\, d\theta'}$$

Thus, since θ = p ∼ U[0, 1], we have f(p) = 1 and f(x|p) = p^s (1 − p)^{n−s}, and it follows immediately that the posterior distribution for p given X is given by

$$f(p \mid X_1 = x_1, \dots, X_n = x_n) = \frac{p^{s}(1-p)^{n-s}}{\int_0^1 p'^{\,s}(1-p')^{n-s}\, dp'}$$
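The normalizing integral in the denominator can also be checked numerically against the closed form quoted in the next part; the sketch below does so for arbitrary illustrative values of n and s:

# sketch: check the normalizing constant of the posterior numerically
n <- 10                                                    # illustrative values, not from the problem
s <- 9
integrate(function(p) p^s * (1 - p)^(n - s), 0, 1)$value   # numerical value of the integral
beta(s + 1, n - s + 1)                                     # closed form B(s+1, n-s+1)
factorial(s) * factorial(n - s) / factorial(n + 1)         # equals s!(n-s)!/(n+1)!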

3. Since $\int_0^1 p'^{\,s}(1-p')^{n-s}\, dp' = \frac{s!(n-s)!}{(n+1)!}$, it results that p | (Xi = xi) is distributed following a Beta distribution with parameters α = s + 1 and β = n − s + 1. Using the fact that if Y ∼ Beta(α, β) then E(Y) = α/(α + β), show that

$$P\left(X_{n+1} = 1 \,\Big|\, \sum_{i=1}^{n} X_i = s\right) = \frac{s+1}{n+2}$$

Using the law of total probability, we have that

$$\begin{aligned}
P\left(X_{n+1} = 1 \,\Big|\, \sum_{i=1}^{n} X_i = s\right)
&= \int_0^1 P\left(X_{n+1} = 1 \,\Big|\, p,\ \sum_{i=1}^{n} X_i = s\right) f\left(p \,\Big|\, \sum_{i=1}^{n} X_i = s\right) dp \\
&= \int_0^1 P\left(X_{n+1} = 1 \mid p\right) f\left(p \,\Big|\, \sum_{i=1}^{n} X_i = s\right) dp
\end{aligned}$$

where P(X_{n+1} = 1 | p, Σ_{i=1}^n X_i = s) = P(X_{n+1} = 1 | p) since the X_i's are independent when conditioning on p (given in the question set-up). Continuing on, since P(X_{n+1} = 1 | p) = p, we have
$$\begin{aligned}
P\left(X_{n+1} = 1 \,\Big|\, \sum_{i=1}^{n} X_i = s\right)
&= \int_0^1 p\, f\left(p \,\Big|\, \sum_{i=1}^{n} X_i = s\right) dp \\
&= E\left(p \,\Big|\, \sum_{i=1}^{n} X_i = s\right) \\
&= \frac{s+1}{n+2}
\end{aligned}$$

since p | (Xi = xi) follows a Beta(s + 1, n − s + 1) distribution.
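This is the classical rule of succession. Plugging in numbers for the sunrise problem: if we take 500 years to be roughly 500 × 365 = 182,500 observed days (an approximation that ignores leap years), with the Sun having risen on every one of them, a couple of lines of R give the posterior probability that it rises tomorrow:

# sketch: the rule of succession applied to 500 years of sunrises
n <- 500 * 365           # approximate number of observed days (leap years ignored)
s <- n                   # the Sun rose on every observed day
(s + 1) / (n + 2)        # posterior probability the Sun rises tomorrow; about 0.9999945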
