Stat 211, 2014

Homework 2

Joe Blitzstein and Tirthankar Dasgupta

Due: Thursday Feb 27 at the beginning of class

Collaboration policy: You are free to discuss the problems with others, though it is strongly recommended that you try the problems on your own first. Copying is not allowed, and write-ups must be your own explanations in your own words.

1. (Multidimensional Cramér-Rao Lower Bound) In this problem, we develop the multidimensional version of the CRLB. For simplicity, assume that $\theta = (\theta_1, \theta_2)$ is a two-dimensional parameter. Assume regularity (the functions below are differentiable, and differentiation under the integral sign is legal) and nondegeneracy (e.g., variances are positive). The score function is now defined as the gradient of the log-likelihood function $l(\theta)$, and the observed information is now the negative of the Hessian of $l(\theta)$. The Fisher information matrix is the 2 by 2 matrix $I(\theta)$ with

$$I_{ij}(\theta) = -E\left[ \frac{\partial^2 l(\theta)}{\partial \theta_i \, \partial \theta_j} \right].$$

Analogously to the 1-dimensional case, $I(\theta)$ is the variance-covariance matrix of the score function.

(a) Find $I(\theta)$ for the $N(\mu, \sigma^2)$ problem where $\theta = (\mu, \sigma^2)$ (with $\mu$ and $\sigma^2$ unknown). (A numerical sanity-check sketch in R follows this problem.)

(b) Let $g: \mathbb{R}^2 \to \mathbb{R}$, and suppose that $T$ is an unbiased estimator of $g(\theta)$. Show that (under regularity conditions allowing us to differentiate under the integral sign)

$$\mathrm{Var}(T) \geq (\nabla g(\theta))^T I(\theta)^{-1} (\nabla g(\theta)).$$

Hint: first explain why $I(\theta)$ is invertible. Then prove the following generalization of the fact that correlation is between $-1$ and $1$: for any random vectors $X_1$ and $X_2$ (possibly of different dimensions) such that the covariance matrix of $X_1$ is invertible,

$$\mathrm{Cov}(X_2, X_2) \geq \mathrm{Cov}(X_2, X_1)\,\mathrm{Cov}(X_1, X_1)^{-1}\,\mathrm{Cov}(X_1, X_2),$$

in the sense that the difference is nonnegative definite.

(c) Suppose we are only interested in estimating $\theta_1$, with $\theta_2$ an unknown nuisance parameter. Let $\hat{\theta}_1$ be an unbiased estimator of $\theta_1$. Show that

$$\mathrm{Var}(\hat{\theta}_1) \geq I^{11}(\theta),$$

where $I^{ij}(\theta)$ is the $(i, j)$ component of $I(\theta)^{-1}$. Is this bound larger or smaller than $1/I_{11}(\theta)$? Justify your answer using matrix theory (e.g., the result for the inverse of a partitioned matrix) and explain your answer intuitively (e.g., what if $\theta_2$ were assumed known?).
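For part (a), once you have derived $I(\theta)$ analytically, a quick Monte Carlo check can catch algebra slips: average the observed information (the negative Hessian of the log-likelihood of a single observation, computed numerically) over many simulated draws and compare it with your formula. The R sketch below uses arbitrary illustrative values of $\mu$ and $\sigma^2$ and base R's optimHess for the numerical Hessian; it is a sanity check, not part of the required derivation.

# Monte Carlo check of the Fisher information matrix for one N(mu, sigma^2)
# observation, parameterized as theta = (mu, sigma^2). Illustrative values only.
set.seed(1)
mu <- 2; sigma2 <- 3
negloglik <- function(par, y) {
  # negative log-likelihood of a single observation y at par = (mu, sigma^2)
  -dnorm(y, mean = par[1], sd = sqrt(par[2]), log = TRUE)
}
reps <- 20000
H <- matrix(0, 2, 2)
for (r in 1:reps) {
  y <- rnorm(1, mu, sqrt(sigma2))
  H <- H + optimHess(c(mu, sigma2), negloglik, y = y)  # observed information at theta
}
H / reps  # Monte Carlo estimate of I(theta) for a single observation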

2. (Conditions for attaining the Cramér-Rao lower bound)

(a) Let $f_\theta(y)$, $\theta \in \mathbb{R}$, be the PDF of data vector $Y$. Suppose that the regularity conditions needed for the Cramér-Rao Theorem are satisfied. Show that if $T(Y)$ is an unbiased estimator of $g(\theta)$, then $T(Y)$ attains the Cramér-Rao lower bound if and only if

$$a(\theta)\,[T(y) - g(\theta)] = \frac{\partial}{\partial \theta} \log f_\theta(y)$$

for some $a(\theta)$. Also state the condition in words, in terms of exponential families.

(b) Let $Y_1, \ldots, Y_n$ be i.i.d. random variables with PDF $f_\theta(y) = \theta y^{\theta - 1}$, $0 < y < 1$, $\theta > 0$. Is there a function of $\theta$, say $g(\theta)$, for which there exists an unbiased estimator which attains the Cramér-Rao lower bound?
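For part (b), whatever $g(\theta)$ and estimator you end up considering, it can be reassuring to compare a candidate's simulated variance against the Cramér-Rao bound before committing to a proof. The R sketch below does this for a placeholder choice, the sample mean as an estimator of $E(Y)$, with the per-observation Fisher information estimated by the sample variance of the score (the score expression is just the routine derivative of $\log f_\theta(y)$). The placeholder target and estimator are illustrative assumptions, not the intended answer.

# Monte Carlo comparison of a candidate unbiased estimator's variance with the
# Cramér-Rao bound for its own estimand, under f_theta(y) = theta * y^(theta - 1).
# The candidate (sample mean, targeting g(theta) = E(Y)) is only a placeholder.
set.seed(2)
theta <- 2; n <- 50; reps <- 5000
score1 <- function(y, theta) 1 / theta + log(y)    # per-observation score d/dtheta log f
sim_one <- function() {
  y <- runif(n)^(1 / theta)                        # inverse-CDF draws from f_theta
  c(est = mean(y), info1 = var(score1(y, theta)))  # candidate estimate, info estimate
}
out <- replicate(reps, sim_one())
var_candidate <- var(out["est", ])                 # simulated Var of the candidate
info_hat <- mean(out["info1", ])                   # estimated per-observation Fisher info
g <- function(th) th / (th + 1)                    # estimand of the placeholder: E(Y)
h <- 1e-5
gprime <- (g(theta + h) - g(theta - h)) / (2 * h)  # numerical g'(theta)
c(var_candidate = var_candidate, crlb = gprime^2 / (n * info_hat))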
3. (Fisher weighting) Assume that we are interested in estimating a scalar parameter $\theta$. We have available $K$ estimators $\hat{\theta}_1, \ldots, \hat{\theta}_K$, all of which are unbiased, but with possibly different variances $\mathrm{Var}(\hat{\theta}_k) = \sigma_k^2$. Our goal is to utilize all the information to get a more efficient combined estimator. Throughout the problem, efficiency is measured by inverted variance. Let $\hat{\theta} = (\hat{\theta}_1, \ldots, \hat{\theta}_K)$ (viewed as a column vector). Use matrix notation whenever possible.

(a) (Independent estimators) If the $\hat{\theta}_k$'s are from completely separate experiments or studies, it may be reasonable to assume that all of them are independent; assume that for this part. A natural combination rule is $\hat{\theta}_C = \sum_{k=1}^K w_k \hat{\theta}_k$. Using Lagrange multipliers, find the optimal weights $w = (w_1, \ldots, w_K)$, i.e., the weights that yield the most efficient estimator, under the constraint that $\sum_{k=1}^K w_k = 1$. Explain why the constraint is reasonable, and why the answer makes sense intuitively.

(b) (Dependent estimators) In practice, the estimators $\hat{\theta}_k$ may come from the same study, and they may not be independent. Assume $\mathrm{Cov}(\hat{\theta}) = \Sigma = (\sigma_{ij})$. Using matrix notation and Lagrange multipliers, find the optimal $w$ to minimize the variance of $\hat{\theta}_C$, subject to the same constraint as (a). Note: For example, suppose $Y_1, \ldots, Y_n \overset{\text{i.i.d.}}{\sim} \mathrm{Gamma}(\theta)$. We may have two unbiased moment estimators $\hat{\theta}_1 = \bar{Y}$ and $\hat{\theta}_2 = S^2$. Unfortunately, $\bar{Y} \not\perp S^2$ (see the short simulation sketch after this problem).

(c) ((a) continued) Although (a) seems intuitive, the answer is not completely satisfactory. Why should we use a linear combination, since quadratic functions or cubic functions of $\hat{\theta}$ may have smaller variance or mean squared error? In order to answer this question, we should introduce a new statistical model. We treat $\hat{\theta}$ as data, and assume that $\hat{\theta}_k \overset{\text{ind}}{\sim} N(\theta, \sigma_k^2)$. Write down the likelihood for $\theta$, and find the MLE of $\theta$. How can we justify Fisher weighting in (a) based on the likelihood?

Note: Normality is a very reasonable assumption in many cases. Under regularity conditions, many estimators are asymptotically Normal.

(d) (MVN setting) Assume that $\hat{\theta} \sim N_K(\theta u, \Sigma)$, where $u = (1, \ldots, 1)$. Is the likelihood function log-concave and unimodal? Find the MLE, and explain how it relates to (b).

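Relating to the note in 3(b): a short simulation makes the dependence between $\bar{Y}$ and $S^2$ for Gamma data concrete, and the estimated covariance matrix is exactly the kind of $\Sigma$ that the weights in (b) would have to account for. This is only an illustrative R sketch; the shape value and sample size are arbitrary (with rate 1, both $\bar{Y}$ and $S^2$ are unbiased for the shape parameter).

# Estimate Cov(Ybar, S^2) for i.i.d. Gamma(theta) samples by simulation,
# illustrating the note in 3(b) that the two moment estimators are dependent.
set.seed(3)
theta <- 2; n <- 20; reps <- 10000
ests <- t(replicate(reps, {
  y <- rgamma(n, shape = theta)   # rate = 1 by default, so E(Y) = Var(Y) = theta
  c(ybar = mean(y), s2 = var(y))
}))
cov(ests)   # empirical covariance matrix of (Ybar, S^2); off-diagonal is clearly nonzero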
4. (Maximum likelihood, Newton's method, and Fisher scoring) Let $S_1$ and $S_2$ be independent with $S_j \sim (c_j + \theta)\,\chi^2(n_j)$, with $\theta \geq 0$ an unknown parameter and with $0 < c_1 \leq c_2$ known constants. A statistician wishes to estimate $\theta$, having observed $S_1$ and $S_2$.

(a) For this part and the next, assume that $c_1 = c_2$. Construct the MLE of $\theta$ and the UMVUE, keeping in mind that $\theta$ is constrained to be nonnegative (so the domain of the likelihood function is $[0, \infty)$). Are these necessarily the same? Does the score function necessarily have a first derivative? Discuss whether the MLE or the UMVUE is better. Note: No calculations are needed to answer the part about which is better; do not assume large $n_1, n_2$, as this is not a question about asymptotic behavior.

(b) For the MLE, determine expressions for the observed information (assuming that the data $S_1, S_2$ make that possible) and for the expected information (evaluated at the MLE). Are these two informations necessarily the same?

(c) Now and for the rest of the problem, suppose $c_1 < c_2$. Show that neither $(S_1, S_2)$ nor any linear combination $a_1 S_1 + a_2 S_2$ is a complete sufficient statistic.

(d) Evaluate the score function and the expected information $I(\theta)$. Assuming that the likelihood function has a stationary point at some $\hat{\theta} > 0$, show how one can solve for the root by Newton's method, i.e., give the updating equation for how to get the next estimate as a function of the current estimate. Now show how one solves for the root by Fisher's method of scoring. Show that root finding by Fisher's method of scoring is equivalent to iteratively calculating the best weighted average of the separate unbiased estimates of $\theta$, one based on $S_1$ and the other based on $S_2$. That is, calling these unbiased estimates $\hat{\theta}_1$ and $\hat{\theta}_2$ respectively, show that if $\hat{\theta}$ is the most recent estimate of $\theta$, then the updated estimate mimics computing the best linear unbiased estimator, taking the form $W_1 \hat{\theta}_1 + W_2 \hat{\theta}_2$ with $W_j$ proportional to the Fisher information for $\theta$ in $\hat{\theta}_j$, evaluated at the most recent $\hat{\theta}$.

(e) Assume $c_1 = 1$, $c_2 = 5$, $n_1 = 6$, $n_2 = 8$ and that it is observed that $S_1 = 30$, $S_2 = 56$. In R (or other software), plot the likelihood function for $\theta$. Use Fisher scoring to solve iteratively for the MLE, starting with the initial guess $\theta = 0$. Write down the first 5 iterated estimates, and the final converged value, and note how many iterations it took with these data to achieve 5-digit accuracy.
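For the plotting step in (e), the likelihood can be evaluated directly from scaled chi-square densities: if $S_j \sim (c_j + \theta)\chi^2(n_j)$, then $S_j/(c_j + \theta) \sim \chi^2(n_j)$, so the density of $S_j$ is $\mathtt{dchisq}(s/(c_j+\theta), n_j)/(c_j+\theta)$. The R sketch below only sets up and plots this likelihood on a grid (the grid range is an arbitrary choice); the Fisher-scoring iteration itself is left to your answer to (d).

# Starter for 4(e): evaluate and plot the likelihood L(theta) for the observed
# data, using S_j / (c_j + theta) ~ chi^2(n_j). Fisher scoring is not done here.
c1 <- 1; c2 <- 5; n1 <- 6; n2 <- 8
S1 <- 30; S2 <- 56
lik <- function(theta) {
  dchisq(S1 / (c1 + theta), df = n1) / (c1 + theta) *
    dchisq(S2 / (c2 + theta), df = n2) / (c2 + theta)
}
theta_grid <- seq(0, 20, length.out = 500)   # plotting range is an arbitrary choice
plot(theta_grid, sapply(theta_grid, lik), type = "l",
     xlab = expression(theta), ylab = "likelihood")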
