
M3471806F1PV2

M347/F

Module Examination 2018


Mathematical statistics

Thursday 7 June 2018 2.30 pm – 5.30 pm

Time allowed: 3 hours

The examination is in TWO parts. You should attempt both parts.


Part 1
This part consists of 25 computer-marked questions. You should attempt
ALL questions in Part 1. Each question is worth 2 marks. Part 1 as a
whole therefore carries 50% of the total marks for the examination.
Record your answers for Part 1 on the computer-marked examination
(CME) form provided, following the instructions given on the next page.
Part 2
This part consists of seven longer questions. Throughout this part you
should show all the main steps in your calculations.
You should answer FIVE questions from Part 2. If you answer more than
five questions, then the marks from your best five answers will be counted.
Each question in Part 2 carries 10 marks in total, and the marks allocated
to each part of each question are indicated in the paper. Part 2 as a whole
carries 50% of the total marks for the examination. Record your answers
for Part 2 in the answer book(s) provided. Please write on the front of
your answer book(s) the numbers of the questions you have attempted in
Part 2.
A booklet containing a list of useful results and formulae from the module
can be found enclosed in this examination paper.
At the end of the examination
Make sure that you have completed the CME form, including Part 2 of the
form. Check that you have written your personal identifier and
examination number on each answer book used. Failure to do so will
mean that your work cannot be identified.
Put all your used answer books together with your signed desk record on
top. Fasten them in the top left corner with the round paper fastener.
Attach this question paper and your completed CME form to the back of
the answer books with the flat paper clip.

Copyright © 2018 The Open University
PART 1
• This part of the paper carries 50% of the total marks. (Allow some
time to check that your selections have been correctly entered on the
CME form.)
• You should attempt ALL the questions in this part of the
examination.
• You should note that for some of these questions you may be
required to select more than one answer from the options given. All
such questions include an instruction like ‘You should select TWO
options for this question’.

Instructions for completing the computer-marked examination


(CME) form
1. You will find one CME form provided with this paper.
2. You should use a pencil to make entries on the CME form. If you
make any smudges or other marks on the form that you cannot
cancel out clearly, then you should ask the invigilator for a new
form, and transfer your entries onto it.
3. Please note that for each question you should pencil across either
the required number of cells or the ‘don’t know’ cell (denoted by
a ‘?’).
4. If you think that a question is unsound in any way, pencil across
the ‘unsound’ cell (U) in addition to pencilling across either an
answer cell or the ‘don’t know’ cell.
5. On Part 1 of the CME form, write your name and personal
identifier (not your examination number), and the assignment
number for the examination (M347 81). You should also pencil
across the cells in the two blocks in Part 1 of the form
corresponding to your personal identifier and the assignment
number given above.
6. Please note that you will not be allowed extra time at the end of
the examination to fill in your CME form.
7. On the sample CME form opposite, Part 1 has been completed for
a fictitious student so that you can see how to complete this part of
the form.
Failure to follow the above instructions may mean that we are unable to
identify your work and award a mark for this part of the examination.



Sample CME form



Question 1
Let X follow the distribution with cdf
F(x) = x²/4 on 0 < x < 2.
Choose the option that gives the value of the upper quartile of X.

Options for Question 1



A 3/4   B −3   C 3
D √3   E √(3/2)   F √(3/4)

Question 2
Let X follow the beta distribution with pdf
f(x) = K(1 − x)³ on 0 < x < 1.
Choose the option that gives the value of K.

Options for Question 2


A 1/2 B 1/3 C 1/4
D 2 E 3 F 4

Question 3
Let X follow a binomial distribution with pmf
 
p(x) = C(n, x) p^x (1 − p)^(n−x) on x = 0, 1, . . . , n,
where C(n, x) is the binomial coefficient.
If E(X) = 4 and V(X) = 0.8, choose the option that gives the value of p.

Options for Question 3


A 0.2 B 0.4 C 0.5
D 0.6 E 0.8 F 0.9

Question 4
Suppose that, marginally, X follows an exponential, M(1), distribution
and that, conditionally, Y | X = x follows an (exponential) M(x)
distribution.
Select the option that gives the conditional distribution of X | Y = y.

Options for Question 4


A Gamma(1, y + 1) B Gamma(2, y + 1) C M(x)
D M(x + 1) E M(y) F M(y + 1)



Question 5
Let X and Y be random variables such that
V (X + Y ) = 80 and V (X − Y ) = 40.

Choose from the following options the value of Cov(X, Y ).

Options for Question 5


A 40 B 20 C 10
D −10 E −20 F −40

Question 6
Suppose that X and Y follow a bivariate normal distribution with
μ_X = 1/2, μ_Y = 1/2, σ_X = 1, σ_Y = 1 and ρ = 1/2.
Choose from the following options the one statement that is TRUE.
Options for Question 6
A X − Y is normally distributed and V (X − Y ) = 1.
B X − Y is normally distributed and V (X − Y ) = 3/2.
C X − Y is normally distributed and Pr(X > Y ) = 0.
D X − Y is not normally distributed.
E X and Y are independent and Cov(X, Y ) = 0.
F X and Y are correlated and Cov(X, Y ) = 1.

Question 7
Consider the estimator of the population mean μ that has the form
μ̃ = ((1/2)√n + Σᵢ₌₁ⁿ Xᵢ) / (n + √n).

Choose from the following options the bias of μ̃.

Options for Question 7


A 0   B (1/2)√n   C 1/(2(1 + √n))
D μ/(n + √n)   E μ√n/(2(n + √n))   F (1 − 2μ)/(2(1 + √n))



Question 8
Suppose independent observations x1 , x2 , . . . , xn are available from a
shifted Pareto distribution, for which the information based on a single
observation is
i(θ) = 1/(1 + θ)².

Choose from the following options the Cramér-Rao lower bound for the
variance of any unbiased estimator of θ.

Options for Question 8



A (1 + θ)²/√n   B (1 + θ)²   C n(1 + θ)²
D (1 + θ)²/n   E n/(1 + θ)²   F (1 + θ)/√n

Question 9
Suppose that independent observations x1 , x2 , . . . , xn are available from
the gamma distribution with parameters 3 and b, which has pdf
f(x) = (1/2) b³x² e^(−bx).
The log-likelihood is

ℓ(b) = −n log 2 + 3n log b + 2 Σᵢ₌₁ⁿ log xᵢ − bnx̄,
where x̄ is the sample mean. The MLE of b is b̂ = 3/x̄.


A likelihood ratio test is to be performed of H0 : b = 2 against the
alternative hypothesis, H1 : b ≠ 2.
Choose the correct value of the test statistic from the following options.

Options for Question 9


A 3n{log(3/x̄) − 1}   B n[3{log(3/(2x̄)) − 1} + 2x̄]
C 2n[3{log(3/(2x̄)) − 1} + 2x̄]   D n[3 log(2/x̄) − 2/x̄ + 2x̄]
E 2n[3 log(2/x̄) − 2/x̄ + 2x̄]   F 3n{log(3/x̄) − 1/2}



Question 10
Suppose that X ∼ U(0, θ), the uniform distribution on (0, θ). Given
independent random variables X = (X1 , X2 , . . . , Xn ) from this
distribution, let Xmax denote the maximum value. It can be shown
that Xmax /θ has the power distribution with parameter n, which has
density
f(x) = nx^(n−1) on 0 < x < 1.
Select the two correct statements from the following options.
Note: you should select TWO options for this question.
Options for Question 10
A Xmax /θ is a pivot.
B Xmax is a pivot.
C θXmax is a pivot.
D If t(X, θ) is a pivot, a 100(1 − α)% confidence interval for θ
consists of those values of θ such that
pα/2 < t(X, θ) < p1−(α/2)
where pα/2 and p1−(α/2) are the (α/2)- and (1 − (α/2))-quantiles of
the pivot’s distribution.
E Pivots are used in conjunction with their asymptotic χ2
distributions to produce approximate confidence intervals.
F If t(X, θ) is a pivot, a 100(1 − α)% confidence interval for θ
consists of those values of θ such that
tα/2 < t(X, θ) < t1−(α/2)
where tα/2 and t1−(α/2) are the (α/2)- and (1 − (α/2))-quantiles of
the t(n − 1) distribution.
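The pivot-based interval described in option D can be made concrete for this distribution: since Xmax/θ has cdf x^n on (0, 1), its q-quantile is q^(1/n). A minimal Python sketch, assuming NumPy and a simulated sample purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta_true, alpha = 20, 5.0, 0.05
x = rng.uniform(0, theta_true, size=n)   # simulated U(0, theta) sample
x_max = x.max()

# Quantiles of the power(n) distribution: F(x) = x^n, so the q-quantile is q**(1/n).
p_lo, p_hi = (alpha / 2) ** (1 / n), (1 - alpha / 2) ** (1 / n)

# p_lo < X_max/theta < p_hi rearranges to X_max/p_hi < theta < X_max/p_lo.
print(x_max / p_hi, x_max / p_lo)        # 95% confidence interval for theta
```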

Question 11
Let X be a random variable that has a symmetric distribution with
mean 0 and upper quartile q3/4 . Use Chebyshev’s inequality to choose
from the following options the correct inequality for the variance V (X)
relative to the upper quartile q3/4 .

Options for Question 11


A V(X) ≤ (q3/4)²/4   B V(X) ≤ q3/4/4   C V(X) ≤ (q3/4)²/2
D V(X) ≥ (q3/4)²/4   E V(X) ≥ q3/4/4   F V(X) ≥ (q3/4)²/2



Question 12
Suppose x1 , x2 , . . . , xn are independent observations from the
Poisson(μ) distribution with pmf
p(x) = μ^x e^(−μ)/x! on x = 0, 1, . . . .
Then i(μ), the Fisher information contained in a single observation, is
equal to 1/μ.
Choose from the following options the approximate distribution of the
MLE μ̂ₙ of μ.

Options for Question 12


A N(μ, 1/(nμ)) B N(μ, μ/n) C N(0, 1/(nμ))
D N(0, μ/n) E N(μ, n/μ) F N(0, 1)

Question 13
Let X be an observation from the distribution with pmf
f(x|θ) = (x − 1)θ²(1 − θ)^(x−2), on x = 2, 3, . . .
where 0 < θ < 1.
A prior is required for θ which represents a lack of any idea about the
value of θ.
Choose from the following options the most suitable choice of prior
for θ.

Options for Question 13


A Beta(1, 1)   B N(0, 1)   C U(0, ∞)
D Beta(1/2, 1/2)   E Gamma(1, 1)   F Poisson(1)

Question 14
The number of phone calls received by a specific help-line in a day is
assumed to follow a Poisson distribution with mean λ. The prior
knowledge about the value of λ corresponds to the gamma distribution
f(λ) = (1/54) λ² e^(−λ/3), on λ > 0.
In the next 4 days the help-line received 3, 7, 6 and 10 phone calls,
respectively.
Choose from the following options the posterior distribution of λ.

Options for Question 14


A Poisson(1) B Poisson(3) C Poisson(9)
D Gamma(28, 13/3) E Gamma(29, 13/3) F Gamma(29, 7)
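The conjugate update behind this question is the gamma/Poisson model listed in the formula booklet: a Gamma(a, b) prior and Poisson counts x1, . . . , xn give a Gamma(a + Σxᵢ, b + n) posterior. A quick numerical sketch in Python (the prior parameters are read off from f(λ) above):

```python
# Gamma/Poisson conjugate update: prior Gamma(a, b) (rate parametrisation),
# Poisson data x1..xn, posterior Gamma(a + sum(x), b + n).
a, b = 3, 1 / 3           # f(lambda) = (1/54) lambda^2 e^(-lambda/3) is Gamma(3, 1/3)
calls = [3, 7, 6, 10]     # observed counts over 4 days
a_post = a + sum(calls)   # shape: 3 + 26 = 29
b_post = b + len(calls)   # rate: 1/3 + 4 = 13/3
print(a_post, b_post)
```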



Question 15
Suppose that 5 independent random variables X1 , X2 , . . . , X5 are each
modelled by a normal distribution with unknown mean μ and known
variance σ₀² = 1. Suppose further that a normal prior is used for μ, with
mean a = 0 and variance b = 1, so that μ ∼ N(0, 1). For this model it
is known that the predictive distribution for a future observation Y
given the observed data x = (x1 x2 . . . x5)ᵀ is of the form
Y | x ∼ N( (bnx̄ + σ₀²a)/(bn + σ₀²), σ₀²{1 + b/(bn + σ₀²)} ),
where x̄ is the sample mean.
Choose from the following options the best prediction of Y under the
absolute loss function.

Options for Question 15

A 0   B x̄   C (Σᵢ₌₁⁵ xᵢ)/6
D (x̄ + 1)/5   E (x̄ + 1)/6   F (5x̄ + 1)/6

Question 16
Consider the following events.
• A : X1 , X2 , . . . , Xn are independent and come from a geometric
distribution with pmf
f(x|θ₁) = θ₁(1 − θ₁)^x, on x = 0, 1, . . .
with 0 < θ1 < 1;
• Ac : X1 , X2 , . . . , Xn are independent and come from a geometric
distribution with pmf
f(x|θ₂) = θ₂(1 − θ₂)^x, on x = 0, 1, . . .
with 0 < θ2 < 1.
Assuming that you have observed a sample mean x̄ = 3, choose from
the following options the correct statement about the Bayes factor for
A against Ac when θ1 = 1/3 and θ2 = 2/3.

Options for Question 16


A The data values equally support both events.
B The data values favour A over Ac .
C The data values favour Ac over A.
D The Bayes factor cannot be used for comparing these two events.
E The given information is not enough to calculate the Bayes factor.
F The Bayes factor suggests that the geometric distribution is a poor
model for these data.



Question 17
Choose from the set of statements below the two options that are
correct statements for a continuous Markov chain. Note: you should
select TWO options for this question.
Options for Question 17
A Given the past, the future is independent of the present.
B Given the present, the future is independent of the past.
C Given the future, the present is independent of the past.
D The transition distribution of the Markov chain is necessarily a
continuous distribution.
E The transition kernel of the Markov chain is given by a transition
matrix.
F The Markov chain does not necessarily have an equilibrium
distribution.

Question 18
The matrix
P = [ a     b
      1/4   3/4 ]
is the transition matrix of an irreducible and aperiodic Markov chain.
Suppose that the equilibrium distribution of this Markov chain is
πᵀ = (1/3, 2/3).
Choose from the options below the true values of a and b.

Options for Question 18


A a = 1/3, b = 2/3   B a = 1/4, b = 3/4
C a = 1/2, b = 1     D a = 1/4, b = 1/4
E a = 1/6, b = 5/6   F a = 1/2, b = 1/2
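A candidate pair (a, b) can be checked directly against the two conditions it must satisfy: each row of P sums to 1, and πᵀP = πᵀ. A minimal sketch assuming NumPy:

```python
import numpy as np

def is_equilibrium(a, b, pi=np.array([1 / 3, 2 / 3])):
    """Check that P is a valid transition matrix and that pi^T P = pi^T."""
    P = np.array([[a, b], [1 / 4, 3 / 4]])
    return np.allclose(P.sum(axis=1), 1.0) and np.allclose(pi @ P, pi)

# Try each option's (a, b) in turn, for example:
print(is_equilibrium(1 / 3, 2 / 3))
```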



Question 19
Suppose that values x1 , x2 , . . . , xt have been simulated for
X1 , X2 , . . . , Xt using the Metropolis-Hastings algorithm, and that
xt = 0. At the (t + 1)th stage of the Metropolis-Hastings algorithm, a
candidate value x∗ = 1 for Xt+1 is sampled from the proposal density
q(xt+1 |xt ) which is that of a standard normal distribution. Suppose
that the target pdf is f(x) = (1/√π) exp[−(x − 1)²]. (You might like to note
that the value of e is approximately 2.718.)
Choose from the options below the resulting value of xt+1 , and the
reason for that value.
Options for Question 19
A The acceptance probability α(x∗ |xt ) < 1, so xt+1 = 1.
B The acceptance probability α(x∗ |xt ) = 1, so xt+1 = 1.
C The acceptance probability α(x∗ |xt ) > 1, so xt+1 = 1.
D The acceptance probability α(x∗ |xt ) < 1, so xt+1 = 0.
E The acceptance probability α(x∗ |xt ) = 1, so xt+1 = 0.
F The acceptance probability α(x∗ |xt ) > 1, so xt+1 = 0.
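The acceptance ratio here can be evaluated numerically. Note that the proposal is an independence proposal (a standard normal regardless of xₜ), so q(xₜ₊₁|xₜ) = φ(xₜ₊₁), the N(0, 1) density. A sketch in Python:

```python
import math

def phi(z):
    """Standard normal density; this proposal does not depend on the current state."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def f(x):
    """Target pdf: f(x) = exp(-(x - 1)^2) / sqrt(pi)."""
    return math.exp(-(x - 1) ** 2) / math.sqrt(math.pi)

x_t, x_star = 0.0, 1.0
ratio = (phi(x_t) * f(x_star)) / (phi(x_star) * f(x_t))  # q(x_t|x*) f(x*) / {q(x*|x_t) f(x_t)}
print(ratio, min(ratio, 1.0))                            # ratio works out to e^(3/2) here
```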

Question 20
Where appropriate, linear regression with one explanatory variable can
be performed ‘through the origin’, that is, with the intercept α fixed
equal to zero. The log-likelihood in this reduced model is
ℓ(β, σ) = constant − n log σ − S₀(β)/(2σ²),
where
S₀(β) = Σᵢ₌₁ⁿ (yᵢ − βxᵢ)².

Solve the equation dS₀(β)/dβ = 0 to provide the candidate value β̂ for
the value of β that minimises S₀(β). (It can be confirmed that β̂ thus
found is the MLE for β, but you need not do so.)
Choose the correct value of β̂ from the following options.

Options for Question 20


A (Σᵢ₌₁ⁿ xᵢyᵢ)/(Σᵢ₌₁ⁿ xᵢ²)   B 2(Σᵢ₌₁ⁿ xᵢyᵢ)/(Σᵢ₌₁ⁿ xᵢ²)
C {Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − α)}/{Σᵢ₌₁ⁿ (xᵢ − x̄)²}   D (Σᵢ₌₁ⁿ xᵢyᵢ)/{Σᵢ₌₁ⁿ (xᵢ − x̄)²}
E {Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ)}/{Σᵢ₌₁ⁿ (xᵢ − x̄)²}   F {Σᵢ₌₁ⁿ xᵢ(yᵢ − ȳ)}/(Σᵢ₌₁ⁿ xᵢ²)



Question 21
Let RSS, TSS and ESS be the residual, total and explained sums of
squares in multiple regression. Choose from the following options the
one that equals the following:
1 − (y − Xβ̂)ᵀ(y − Xβ̂) / Σᵢ₌₁ⁿ (yᵢ − ȳ)².

Options for Question 21


A ESS   B RSS/TSS   C ESS/RSS
D ESS/TSS   E 1 − ESS/RSS   F 1 − ESS/TSS

Question 22
In the multiple regression model, the covariance matrix of β̂ is
σ²(XᵀX)⁻¹. The fitted values are given by Ŷ = Xβ̂. Choose from the
following options the covariance matrix of Ŷ.

Options for Question 22


A σ²I_{d+1}   B σ²I_n   C σ²
D σ²(XᵀX)⁻¹Xᵀ   E σ²X(XᵀX)⁻¹Xᵀ   F σ²X(XᵀX)⁻¹

Question 23
Let Y be the sum of three independent variables with distributions
M(λ), so Y ∼ Gamma(3, λ). This distribution is a member of the
exponential dispersion family with a(λ) = −λ and mean μ = 3/λ.
Select the formula for its canonical link.

Options for Question 23


A 3/μ B log(3/μ) C −3/μ
D log(1/μ) E 1/μ F log μ



Question 24
Listed below are the supposed response distribution, link function and
linear predictor, respectively, of three regression models:
(a) Y ∼ N(μ, 1), g(μ) = μ²,
η(x1, x2) = α + β1x1 + β2 log x2.
(b) Y ∼ Poisson(λ), g(λ) = (λ + 1)²,
η(x1, x2) = α + βx1 + β²x2.
(c) Y ∼ Beta(θ, θ), g(θ) = log θ,
η(x) = α + β1x + β2x².
In the above, θ > 0, λ > 0 and μ ∈ R. None of these models is,
however, a generalised linear model with a response variable from the
exponential dispersion family (EDF). In each case, one element of the
Response, Link, or Predictor does not fit the definition.
Choose from the options below the correct reason for each of the
models to not be a GLM with EDF response distribution.
Options for Question 24
A (a) Link, (b) Response, (c) Predictor
B (a) Link, (b) Predictor, (c) Response
C (a) Predictor, (b) Response, (c) Link
D (a) Predictor, (b) Link, (c) Response
E (a) Response, (b) Link, (c) Predictor
F (a) Response, (b) Predictor, (c) Link



Question 25
Consider the Bayesian analysis of a multiple linear regression model
and that of a generalised linear model.
Choose the two statements from the following options that are false.
Note: you should select TWO options for this question.
Options for Question 25
A A conjugate prior distribution could be used with both types of
model.
B With a generalised linear model, there is no general family of
conjugate prior distributions that always leads to analytically
tractable posteriors.
C With both types of model, independent priors always result in
independent posteriors.
D A Bayesian analysis of a multiple regression model does not always
need to use MCMC methods.
E Bayesian analysis of a generalised linear model can require the
Metropolis–Hastings algorithm.
F It is not possible to determine for certain whether the output of an
MCMC algorithm has converged after any finite number of steps.



PART 2
Please answer FIVE questions from Questions 26–32 in Part 2.
Throughout this part you should show all the main steps in your
calculations. Each question carries 10 marks.

Question 26
Let X be a continuous random variable following the distribution with
cdf
F_X(x) = 1 − 9/x² on x > 3.
(a) What is the value of Pr(X > 9)? [2]
(b) Find the pdf, f (x). [2]
(c) Show that the quantile function can be written
Q(α) = 3/√(1 − α). [2]

(d) Show that the interquartile range can be written
IQR = 2(3 − √3). [2]

(e) Define Y = 1/X. Find the cdf of Y . (You may take it for granted
that the support of Y is 0 < y < 1/3). [2]
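Parts (a), (c) and (d) can be sanity-checked by simulation, using the quantile function from part (c) to sample by inversion. A sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.uniform(size=100_000)
x = 3 / np.sqrt(1 - u)                  # inverse-cdf sampling via Q(alpha) = 3/sqrt(1 - alpha)

print((x > 9).mean())                   # compare with the probability asked for in part (a)
q25, q75 = np.quantile(x, [0.25, 0.75])
print(q75 - q25, 2 * (3 - np.sqrt(3)))  # sample IQR vs the claimed 2(3 - sqrt(3))
```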

Question 27
Let (X, Y ) follow the bivariate distribution with pdf
f(x, y) = 1/y on 0 < x < y, 0 < y < 1.
(a) Show that the Y -marginal of this distribution is the U(0, 1)
distribution, that is, the uniform distribution with parameters 0
and 1. [2]
(b) Show that, for each value of y with 0 < y < 1, the conditional
density of X | Y = y is
f_X|Y(x|y) = 1/y on 0 < x < y. [2]

(c) Name the distribution whose pdf is obtained in part (b), and state
its parameters. Hence, give the formula for E(X|Y = y). [2]
(d) Use your answer in part (c) to show that E(X) = 1/4. [2]
(e) Find E(XY ). [2]



Question 28
Suppose that x1 , x2 , . . . , xn is a sample of independent observations
from the Rayleigh distribution, which has pdf
f(x|β) = (2x/β) exp(−x²/β) on x > 0,
with β > 0. Write
t = (1/n) Σᵢ₌₁ⁿ log xᵢ   and   r = (1/n) Σᵢ₌₁ⁿ xᵢ².

(a) Show that the log-likelihood is
ℓ(β) = n{log 2 − log β + t − r/β}. [3]

(b) Find ℓ′(β) and hence show that the candidate MLE is β̂ = r. [2]
(c) Confirm that β̂ = r is indeed the MLE of β. [3]
(d) It turns out that for this Rayleigh distribution, E(β̂) = β and
V(β̂) = β²/n. What is the formula for the mean squared error of β̂
as an estimator of β? How does this formula relate to V(β̂), and
why? [2]
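The candidate MLE can be checked numerically by evaluating ℓ(β) on a grid around r. A sketch assuming NumPy (it uses the fact that X² is exponential with mean β under this pdf, which gives one way to simulate Rayleigh data):

```python
import numpy as np

rng = np.random.default_rng(2)
beta_true, n = 4.0, 5000
x = np.sqrt(rng.exponential(scale=beta_true, size=n))  # X^2 ~ exponential with mean beta

r = np.mean(x ** 2)
t = np.mean(np.log(x))

def loglik(b):
    return n * (np.log(2) - np.log(b) + t - r / b)

grid = np.linspace(0.5 * r, 1.5 * r, 2001)
print(r, grid[np.argmax(loglik(grid))])  # the grid maximiser should sit at (about) r
```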

Question 29
Let independent positive random variables X1 , X2 , . . . , Xn be modelled
by a gamma distribution with parameters a = 2 (known) and θ > 0
(unknown), so that for i = 1, 2, . . . , n,
f(xᵢ | θ) = θ²xᵢ exp(−θxᵢ) on xᵢ > 0.
(a) Show that L(θ), the likelihood for θ based on observed data
x = (x1 , x2 , . . . , xn )T , can be written
L(θ) ∝ θ^(2n) exp(−θnx̄). [2]

(b) Suppose that a Gamma(a, b) prior is specified for θ. Show that the
posterior f (θ|x) can be written
f(θ|x) ∝ θ^(a+2n−1) exp{−(b + nx̄)θ}. [3]

(c) The posterior corresponds to which distribution for θ? [1]


(d) The prior mean is E(θ) = a/b; the MLE, θ̂, of θ under the
Gamma(2, θ) model turns out to be θ̂ = 2/x̄; and the posterior
mean turns out to be
E(θ|x) = (a + 2n)/(b + nx̄).
Show that E(θ|x) can be written
E(θ|x) = tθ̂ + (1 − t)E(θ),
where 0 < t < 1, and give the expressions for t and 1 − t. [4]
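The weighted-average form in part (d) can be verified numerically for any data set. The sketch below assumes NumPy, illustrative values of a and b, and simulated Gamma(2, θ) data; the weight t used is one choice that makes the identity hold:

```python
import numpy as np

a, b = 2.0, 1.0                                   # illustrative prior parameters
rng = np.random.default_rng(3)
x = rng.gamma(shape=2.0, scale=1 / 1.5, size=50)  # Gamma(2, theta) data, rate theta = 1.5
n, xbar = len(x), x.mean()

post_mean = (a + 2 * n) / (b + n * xbar)
t = n * xbar / (b + n * xbar)                     # candidate weight; 1 - t = b/(b + n*xbar)
print(post_mean, t * (2 / xbar) + (1 - t) * (a / b))  # the two numbers should agree
```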



Question 30
Let f (λ|x) be the posterior density of a parameter λ, given data x. To
simulate samples from this posterior distribution, the
Metropolis–Hastings algorithm generates values for the sequence of
random variables, λ1, λ2, . . . . Let λ_t denote the value generated at the
tth iteration, and suppose that the proposal distribution for λ_{t+1} | λ_t
is N(λ_t, 3). Denote the proposal density function by q(λ_{t+1}|λ_t).
(a) Let λ1 = 0.25 be the starting value for the algorithm. What
distribution will the Metropolis–Hastings algorithm use to generate
the candidate value λ∗ for λ2 ? [1]
(b) Explain why, in this case, the acceptance probability for λ∗ can be
written
 
α(λ∗ | λ1) = min{ f(λ∗|x)/f(λ1|x), 1 }. [2]

(c) Suppose that the candidate value generated from the distribution
with density q(λ2 |λ1 ) is λ∗ = 0.1 and that the acceptance
probability in part (b) turns out to be 0.492. Suppose also that u is
simulated from U(0, 1) so that u = 0.477. What is the value of λ2 ?
From what distribution will the candidate value λ∗∗ for λ3 be
generated? [2]
(d) After several thousand sampled values of λ have been generated
according to the above algorithm, an autocorrelation function is
calculated. It takes the value 1 when the lag (the time between
observations) is zero, 0.4 when the lag is one, 0.15 when the lag is
two, and 0 for all higher lags. It is decided to thin the series of
values obtained. Say what is meant by thinning in this context,
why one would do it, and give an appropriate value for how much
to thin the sample. [3]
(e) When all full conditionals of the posterior distribution of a
multidimensional vector of parameters are available, it is
appropriate to use a method that is a special case of the
Metropolis–Hastings algorithm. Explain what is meant by a “full
conditional” and name that special case method. [2]
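For reference, the whole scheme described in this question can be sketched in a few lines of Python. The target density below is a stand-in (unnormalised, proportional to a Gamma(29, 13/3) density, chosen purely for illustration since the question does not specify f(λ|x)), and the "3" in N(λ_t, 3) is read as a variance:

```python
import numpy as np

def target(lam):
    """Unnormalised stand-in posterior density, proportional to Gamma(29, 13/3)."""
    return lam ** 28 * np.exp(-13 * lam / 3) if lam > 0 else 0.0

rng = np.random.default_rng(4)
lam, draws = 0.25, []                           # lambda_1 = 0.25, as in part (a)
for _ in range(20_000):
    cand = rng.normal(lam, np.sqrt(3))          # N(lambda_t, 3) proposal
    alpha = min(target(cand) / target(lam), 1)  # q cancels: the proposal is symmetric
    if rng.uniform() < alpha:
        lam = cand                              # accept; otherwise keep the current value
    draws.append(lam)

thinned = draws[::3]  # thin: keep every 3rd value, the lag at which the ACF is roughly 0
```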



Question 31
The standard multiple linear regression model with d explanatory
variables and a sample of size n states that the conditional distribution
of Y |X = X is
Y | X = X ∼ Nₙ(Xβ, σ²Iₙ).
(a) Explain why β is a d + 1 vector when there are only d explanatory
variables. Also, describe the matrix X, including its dimensions. [3]
(b) The MLE of β turns out to be
β̂ = (XᵀX)⁻¹Xᵀy.
Show that, conditional on X = X,
β̂ ∼ N_{d+1}(β, σ²(XᵀX)⁻¹). [4]

(c) The trace of a square matrix is the sum of its diagonal elements.
For example, tr(Iₙ) = n. Also, tr(AB) = tr(BA) for any A and
B whose product is a square matrix. Use these facts to evaluate
tr(H), where the hat matrix
H = X(XᵀX)⁻¹Xᵀ
gives the fitted values ŷ in terms of the observed values y through
ŷ = Hy.
The expectation of the residual sum of squares, R, in the multiple
regression model can be written as
E(R) = σ²{tr(Iₙ) − tr(H)}.
Hence give an unbiased estimator for σ² in terms of R, n and d. [3]
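The trace identity in part (c) is easy to confirm numerically for any design matrix; a sketch assuming NumPy, with an arbitrary simulated X:

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 30, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, d))])  # intercept column + d variables

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
print(np.trace(H))                     # tr(H) = tr((X^T X)^{-1} X^T X) = tr(I_{d+1}) = d + 1
```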

Question 32
Suppose the random variable Y follows the geometric distribution with
parameter p ∈ (0, 1):
f(y|p) = p(1 − p)^(y−1) on y = 1, 2, . . . .
Then Y has a distribution belonging to the exponential dispersion
family (EDF) with θ = p and dispersion parameter φ = 1. (You can
therefore set d(φ) = 1 below.)
(a) Identify the functions a(p) and b(p) in the standard formula for the
pmf of a discrete member of the EDF. [4]
(b) Use general properties of the EDF to show that E(Y) = 1/p and
V(Y) = (1 − p)/p². [5]
(c) Identify the canonical link for the geometric distribution. [1]
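The moments in part (b) can be checked by simulation; NumPy's geometric generator uses the same support y = 1, 2, . . . as this pmf:

```python
import numpy as np

rng = np.random.default_rng(6)
p = 0.3
y = rng.geometric(p, size=200_000)  # pmf p(1 - p)^(y - 1) on y = 1, 2, ...
print(y.mean(), 1 / p)              # sample mean vs E(Y) = 1/p
print(y.var(), (1 - p) / p ** 2)    # sample variance vs V(Y) = (1 - p)/p^2
```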

[END OF QUESTION PAPER]



M347
Some useful results and formulae
Univariate pdfs and pmfs, with means and variances

• The normal distribution, N(μ, σ²), with μ ∈ R and σ > 0:
f(x) = {1/(σ√(2π))} exp{−(1/2)((x − μ)/σ)²} on x ∈ R; E(X) = μ, V(X) = σ².

• The exponential distribution, M(λ), with λ > 0:
f(x) = λ exp(−λx) on x > 0; E(X) = 1/λ, V(X) = 1/λ².
• The uniform distribution, U(a, b), with a < b:
f(x) = 1/(b − a) on a < x < b; E(X) = (a + b)/2, V(X) = (b − a)²/12.
• The power distribution with β > 0:
f(x) = βx^(β−1) on 0 < x < 1; E(X) = β/(β + 1), V(X) = β/{(β + 2)(β + 1)²}.

• The gamma distribution, Gamma(a, b), with a, b > 0:
f(x) = {b^a/Γ(a)} x^(a−1) e^(−bx) on x > 0; E(X) = a/b, V(X) = a/b².

• The beta distribution, Beta(a, b), with a, b > 0:
f(x) = {1/B(a, b)} x^(a−1)(1 − x)^(b−1) on 0 < x < 1;
E(X) = a/(a + b), V(X) = ab/{(a + b + 1)(a + b)²}.

• The Bernoulli distribution, Bernoulli(p), with 0 < p < 1:
p(x) = p^x(1 − p)^(1−x) on x = 0, 1; E(X) = p, V(X) = p(1 − p).

• The binomial distribution, B(n, p), with 0 < p < 1:
p(x) = C(n, x) p^x(1 − p)^(n−x) on x = 0, 1, . . . , n; E(X) = np, V(X) = np(1 − p).

• The Poisson distribution, Poisson(μ), with μ > 0:
p(x) = μ^x e^(−μ)/x! on x = 0, 1, . . . ; E(X) = μ, V(X) = μ.

Bivariate and multivariate normal distribution

• The bivariate normal distribution with μ_X, μ_Y ∈ R, σ_X, σ_Y > 0
and −1 < ρ < 1: for (x, y) ∈ R²,
f(x, y) = 1/{2πσ_Xσ_Y√(1 − ρ²)}
× exp[ −{σ_Y²(x − μ_X)² − 2ρσ_Xσ_Y(x − μ_X)(y − μ_Y) + σ_X²(y − μ_Y)²} / {2σ_X²σ_Y²(1 − ρ²)} ].
E(X) = μ_X, E(Y) = μ_Y, V(X) = σ_X², V(Y) = σ_Y², Corr(X, Y) = ρ.
X ∼ N(μ_X, σ_X²), Y ∼ N(μ_Y, σ_Y²),
Y | X = x ∼ N(μ_Y + ρ(σ_Y/σ_X)(x − μ_X), σ_Y²(1 − ρ²)),
X | Y = y ∼ N(μ_X + ρ(σ_X/σ_Y)(y − μ_Y), σ_X²(1 − ρ²)).
• This is a special case of the multivariate normal distribution:
for x ∈ R^d, the N_d(μ, Σ) density is
f(x) = 1/√{(2π)^d det(Σ)} × exp{−(1/2)(x − μ)ᵀΣ⁻¹(x − μ)}.

Some moment and normality formulae

• E(Y) = E{E(Y|X)}, V(Y) = E{V(Y|X)} + V{E(Y|X)}.
• V(aX + bY) = a²V(X) + 2ab Cov(X, Y) + b²V(Y).
• Cov(aX + bY, cX + dY) = ac V(X) + (bc + ad) Cov(X, Y) + bd V(Y).
• If E(X) = μ and V(X) = Σ, then
E(aᵀX + b) = aᵀμ + b, V(aᵀX + b) = aᵀΣa.
• If X1, X2, . . . , Xm are independent random variables with
Xᵢ ∼ N(μᵢ, σᵢ²), i = 1, 2, . . . , m, and a1, a2, . . . , am are constants, then
Y = Σᵢ₌₁ᵐ aᵢXᵢ ∼ N(Σᵢ₌₁ᵐ aᵢμᵢ, Σᵢ₌₁ᵐ aᵢ²σᵢ²).

• If Z ∼ N_p(μ, Σ) and V = AZ, where A is a q × p matrix, then
V ∼ N_q(Aμ, AΣAᵀ).
• In the regression case where, independently of one another,
Yᵢ | Xᵢ = xᵢ ∼ N(α + βxᵢ, σ²),
Z = Σⱼ₌₁ⁿ ℓⱼYⱼ ∼ N(α Σⱼ₌₁ⁿ ℓⱼ + β Σⱼ₌₁ⁿ ℓⱼxⱼ, σ² Σⱼ₌₁ⁿ ℓⱼ²).
If, in addition, W = Σⱼ₌₁ⁿ mⱼYⱼ, then
Cov(Z, W) = σ² Σⱼ₌₁ⁿ ℓⱼmⱼ.

Gamma and beta functions

• For a > 0,
Γ(a) = ∫₀^∞ x^(a−1) e^(−x) dx,
Γ(a + 1) = aΓ(a), Γ(1/2) = √π,
and, for integer n = 1, 2, . . ., Γ(n) = (n − 1)!.
• For a, b > 0,
B(a, b) = ∫₀¹ x^(a−1)(1 − x)^(b−1) dx,
B(a, b) = Γ(a)Γ(b)/Γ(a + b).

Score and information functions, and the Cramér–Rao lower bound

• The score function is U(θ) = ℓ′(θ).
• The information function is
I(θ) = n i(θ) = E{−ℓ″(θ)} = E{U(θ)²}.
• The minimum possible variance of any unbiased estimator, θ̂, of θ
is given by the Cramér–Rao lower bound, or CRLB:
V(θ̂) ≥ 1/I(θ).

Some asymptotics

• Xₙ converges in probability to X, denoted Xₙ →ᵖ X, if
P(|Xₙ − X| > ε) → 0 as n → ∞, for any ε > 0.
• Xₙ converges in mean square to X, denoted Xₙ →ᵐˢ X, if
E{(Xₙ − X)²} → 0 as n → ∞.
• Xₙ converges in distribution to X, denoted Xₙ →ᴰ X, if
Fₙ(x) → F(x) as n → ∞,
for all values x where F(x) does not jump.
• Let θ̂ₙ be the MLE of θ based on a random sample of size n.
Subject to some regularity conditions,
√n(θ̂ₙ − θ) →ᴰ W ∼ N(0, 1/i(θ)).
Some important theorems

Bayes’ Theorem
For conditional and marginal distributions,
f_Y|X(y|x) = f_X|Y(x|y) f_Y(y) / f_X(x);
for prior and posterior pdfs or pmfs,
f(θ|x) = f(x|θ) f(θ) / f(x).

Fisher–Neyman Factorisation Theorem


T (X) is a sufficient statistic for θ if and only if there exist two
functions g and h such that the likelihood can be written
L(θ) = g(T (x), θ) × h(x),
where h is a function of x and/or other known constants which does
not involve θ.

Neyman–Pearson Lemma
When performing a hypothesis test between H0 : θ = θ0 and
H1 : θ = θ1 , the likelihood ratio test with rejection region of the form
R = {x : L(θ1 )/L(θ0 ) > c} is the most powerful test with a given size.

Weak and Strong Laws of Large Numbers


Weak Law:
X̄ₙ →ᵖ μ,
where X̄ₙ is the mean of n independent random variables, each with
the same distribution with finite mean μ and finite variance σ².
For the Strong Law, replace X̄ₙ →ᵖ μ with X̄ₙ →ᵐˢ μ.

Central Limit Theorem


If X1 , X2 , . . . , Xn is a sample of independent random variables each
with the same distribution with finite mean μ and finite variance σ²,
then
(X̄ₙ − μ)/(σ/√n) →ᴰ Z ∼ N(0, 1).

Gauss–Markov Theorem
If kᵀθ is an estimable function and θ̂ is any solution of the normal
equations, then kᵀθ̂ has minimum variance in the class of linear
unbiased estimators of kᵀθ.

Test statistics

Let θ̂ be the MLE of θ, and ℓ the log-likelihood. For testing H0 : θ = θ0
against H1 : θ ≠ θ0, four test statistics based on likelihood theory are:

• Likelihood ratio test: 2 log(LR) = 2{ℓ(θ̂) − ℓ(θ0)}.
• Wald test (version 1): W1 = (θ0 − θ̂)² E{−ℓ″(θ)}|_θ̂.
• Wald test (version 2): W2 = (θ0 − θ̂)² E{−ℓ″(θ)}|_θ0.
• Score test: S = {ℓ′(θ0)}² / E{−ℓ″(θ)}|_θ0.

Inequalities

For any random variable X and any a > 0,


• P (|X| ≥ a) ≤ E{|X|}/a (Markov inequality);
• P(|X| ≥ a) ≤ E{X²}/a² (Chebyshev inequality).

One- and two-parameter conjugate models


• In the beta/binomial model, X is modelled by a binomial
distribution, X ∼ B(n, θ), and a beta prior is used for θ, so that
θ ∼ Beta(a, b). The posterior distribution for θ, given data x, is
then
θ|x ∼ Beta(a + x, b + n − x).
• In the gamma/Poisson model, independent X1 , X2 , . . . , Xn are
modelled by a Poisson distribution, Xi ∼ Poisson(λ), and a
gamma prior is used for λ, so that λ ∼ Gamma(a, b). The
posterior distribution for λ, given x = (x1 x2 · · · xn )T , is then
λ|x ∼ Gamma(a + nx̄, b + n).
• In the normal-gamma/normal model, independent X1 , X2 , . . . , Xn
are modelled by a normal distribution, N(μ, 1/τ), and a
normal-gamma prior is used for (μ, τ), so that
μ, τ ∼ Ngamma(a, b, c, d), with
μ|τ ∼ N(a, b/τ) and τ ∼ Gamma(c, d).
The posterior distribution for (μ, τ), given x = (x1 x2 · · · xn )T , is
then
μ, τ | x ∼ Ngamma(a1 , b1 , c1 , d1 ),
where
a₁ = (a + bnx̄)/(1 + bn),   b₁ = b/(1 + bn),
c₁ = c + n/2,   d₁ = d + (1/2) Σᵢ₌₁ⁿ (xᵢ − x̄)² + n(x̄ − a)²/{2(1 + bn)}.
The marginal posterior for μ|x is
μ | x ∼ t(2c₁; a₁, b₁/E(τ|x)).
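As a numerical illustration of the first two updates above (the values are chosen arbitrarily):

```python
# Beta/binomial: theta ~ Beta(a, b), x successes out of n trials.
a, b, n, x = 2, 2, 10, 7
print("beta/binomial posterior:", (a + x, b + n - x))   # Beta(a + x, b + n - x)

# Gamma/Poisson: lambda ~ Gamma(a, b), counts x1..xn.
a, b, counts = 3, 1.0, [2, 0, 4, 1]
print("gamma/Poisson posterior:", (a + sum(counts), b + len(counts)))
```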

Bayesian and Markovian miscellany
• The Bayes factor for event A against event Ac is given by
B(x, A) = f (x|A)/f (x|Ac ).
• The equilibrium distribution in a discrete Markov chain satisfies
πT = πT P .
• The detailed balance equation for a continuous Markov chain
with transition kernel k(xt+1 |xt ) and equilibrium density π is, for
all x, y ∈ S,
k(y|x) π(x) = k(x|y) π(y);
and for a discrete Markov chain, it is, for all i, j ∈ S,
P (Xt+1 = j | Xt = i) πi = P (Xt+1 = i | Xt = j) πj .
• The acceptance probability in the Metropolis–Hastings algorithm is
α(x∗|xₜ) = min{ q(xₜ|x∗) f(x∗) / {q(x∗|xₜ) f(xₜ)}, 1 }.

Some results in regression

• In linear regression with one explanatory variable,
α̂ ∼ N(α, σ²{1/n + x̄²/Sxx}),   β̂ ∼ N(β, σ²/Sxx).
Also,
rᵢ = Yᵢ − Ŷᵢ ∼ N(0, σ²{1 − 1/n − (xᵢ − x̄)²/Sxx})
and
α̂ + β̂x₀ ∼ N(α + βx₀, σ²{1/n + (x₀ − x̄)²/Sxx}).
• In linear regression with one explanatory variable using the
improper prior f(α, β, τ) ∝ τ⁻¹, the posteriors are
α | τ, y ∼ N(ȳ − (Sxy/Sxx)x̄, (1/τ){1/n + x̄²/Sxx}),
β | τ, y ∼ N(Sxy/Sxx, 1/(τSxx)),
τ | y ∼ Gamma((n − 2)/2, R/2),
so that
α | y ∼ t(n − 2; α̂, S²{1/n + x̄²/Sxx}),
β | y ∼ t(n − 2; β̂, S²/Sxx).
• In multiple regression,
β̂ = (XᵀX)⁻¹Xᵀy ∼ N_{d+1}(β, σ²(XᵀX)⁻¹).

• In the general linear model, the normal equations are
(AᵀA)θ̂ = Aᵀy.
• In multiple regression, the multivariate normal-gamma prior is a
conjugate prior for θ = (βᵀ τ)ᵀ, so that
β, τ ∼ MNgamma(a, B, c, d), with
β | τ ∼ N(a, B/τ) and τ ∼ Gamma(c, d).
The posterior distribution is given by
θ | y ∼ MNgamma(a₁, B₁, c₁, d₁),
where
B₁ = (B⁻¹ + XᵀX)⁻¹,   a₁ = B₁(B⁻¹a + Xᵀy),
c₁ = c + n/2,   d₁ = d + (1/2)(yᵀy + aᵀB⁻¹a − a₁ᵀB₁⁻¹a₁).

Exponential families and generalised linear models

• The pdf or pmf of a member of the exponential dispersion family is
of the form
f(y|θ, φ) = exp{ (y a(θ) − b(θ))/d(φ) + c(y, φ) }.
When the dispersion parameter, φ, is known, this is the natural
exponential family.
• If Y ∼ f(y|θ, φ), then
E(Y) = μ = b′(θ)/a′(θ),
V(Y) = d(φ) h(θ), where h(θ) = {1/a′(θ)} × (d/dθ){b′(θ)/a′(θ)},
the variance function is
Var(μ) = h(θ(μ)),
and the canonical link function is the function g which satisfies
g(μ) = a(θ).
• Inference for β in generalised linear models using exponential
dispersion families is based on
β̂ ≈ N_{d+1}(β, φ Σ(β)).
• If a generalised linear model with d(φ) = φ/wᵢ has been fitted to
data y1, y2, . . . , yn and MLEs μ̂ᵢ obtained, the scaled deviance is
D∗(y, μ̂, φ) = 2{ℓ(y, φ) − ℓ(μ̂, φ)}.
Writing θ̂ᵢ = θ(μ̂ᵢ) and θ̃ᵢ = θ(yᵢ),
D(y, μ̂) = 2 Σᵢ₌₁ⁿ wᵢ[yᵢ{a(θ̃ᵢ) − a(θ̂ᵢ)} − {b(θ̃ᵢ) − b(θ̂ᵢ)}]
is the deviance, and D∗(y, μ̂, φ) = D(y, μ̂)/φ.
