5 Maximum Likelihood
  5.1 Examples and definitions
    5.1.1 Non-normality
    5.1.2 Probability Model
    5.1.3 The Likelihood Function
  5.2 Maximum Likelihood Estimation
    5.2.1 The Estimator
    5.2.2 Identification
    5.2.3 The Score
    5.2.4 The Information Matrix
    5.2.5 The Fréchet-Darmois-Cramer-Rao Lower Bound
  5.3 Asymptotic Properties of MLE
    5.3.1 Consistency and Asymptotic Normality
    5.3.2 Asymptotic Efficiency
    5.3.3 Variance Estimation
  5.4 Binary Dependent Variable
    5.4.1 Linear Probability Model
    5.4.2 Probit and Logit
    5.4.3 Interpretation of Results
  5.5 Tests
    5.5.1 Wald Test
    5.5.2 Score Test
    5.5.3 Likelihood Ratio Test
    5.5.4 Invariance
    5.5.5 Consistency
    5.5.6 Confidence Region
Chapter 5
Maximum Likelihood
↪ Laplace distribution:
$$f(y|x;\beta,\sigma^2) = \frac{1}{\sqrt{2\sigma^2}} \exp\left(-\sqrt{2}\,\frac{|y - x'\beta|}{\sigma}\right)$$
P. Lavergne — F. Poinas — S. Sinha Econometrics M1
It measures how likely a parameter value θ is when we observe the sample {y1 , . . . , yn },
given {x1 , . . . , xn }.
Some examples:
↪ Normal data generating process: $y|x \sim N(x'\beta, \sigma^2)$
– Likelihood of individual $i$:
$$l(\beta,\sigma^2; y_i|x_i) = f(y_i|x_i;\beta,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - x_i'\beta)^2}{2\sigma^2}\right)$$
– Log-likelihood of individual $i$:
$$L(\beta,\sigma^2; y_i|x_i) = \log l(\beta,\sigma^2; y_i|x_i) = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{(y_i - x_i'\beta)^2}{2\sigma^2}$$
– Log-likelihood of the sample:
$$L(\beta,\sigma^2; y|X) = \sum_{i=1}^n L(\beta,\sigma^2; y_i|x_i) \quad \text{because of independence}$$
$$= -\frac{n}{2}\log(2\pi\sigma^2) - \sum_{i=1}^n \frac{(y_i - x_i'\beta)^2}{2\sigma^2} = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{(y - X\beta)'(y - X\beta)}{2\sigma^2}$$
– Note: the density may not be defined for some values of the parameters.
Here, we must have σ > 0.
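As a small numerical sketch (ours, not part of the notes; numpy/scipy assumed, all names illustrative), the matrix form of the Gaussian sample log-likelihood can be checked against the sum of individual log-densities:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta0, sigma2_0 = np.array([1.0, -0.5, 2.0]), 1.5
y = X @ beta0 + rng.normal(scale=np.sqrt(sigma2_0), size=n)

def sample_loglik(beta, sigma2):
    # -n/2 log(2 pi sigma^2) - (y - X beta)'(y - X beta) / (2 sigma^2)
    e = y - X @ beta
    return -0.5 * n * np.log(2 * np.pi * sigma2) - e @ e / (2 * sigma2)

# Sum of individual Gaussian log-densities: equal to the matrix form
# by independence across observations.
ll_sum = norm.logpdf(y, loc=X @ beta0, scale=np.sqrt(sigma2_0)).sum()
```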
$$E_n\left[L(\beta,\sigma^2; y|x)\right] = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{E_n\left[(y - x'\beta)^2\right]}{2\sigma^2}$$
$$= -\frac{1}{2}\log(2\pi\sigma^2) - \frac{n^{-1}\sum_{i=1}^n (y_i - x_i'\beta)^2}{2\sigma^2}$$
$$= -\frac{1}{2}\log(2\pi\sigma^2) - \frac{n^{-1}\|y - X\beta\|^2}{2\sigma^2}$$
$$= -\frac{1}{2}\log(2\pi\sigma^2) - \frac{n^{-1}(y - X\beta)'(y - X\beta)}{2\sigma^2}.$$
$$\frac{\partial}{\partial\beta} E_n\left[L(\beta,\sigma^2; y|x)\right] = \frac{n^{-1}X'(y - X\beta)}{\sigma^2}$$
$$\frac{\partial}{\partial\sigma^2} E_n\left[L(\beta,\sigma^2; y|x)\right] = -\frac{1}{2\sigma^2} + \frac{n^{-1}\|y - X\beta\|^2}{2\sigma^4}.$$
This yields the First-Order Conditions (FOC)
$$\frac{\partial}{\partial\beta} E_n\left[L(\hat\beta,\hat\sigma^2; y|x)\right] = \frac{n^{-1}X'(y - X\hat\beta)}{\hat\sigma^2} = 0$$
$$\frac{\partial}{\partial\sigma^2} E_n\left[L(\hat\beta,\hat\sigma^2; y|x)\right] = -\frac{1}{2\hat\sigma^2} + \frac{n^{-1}\|y - X\hat\beta\|^2}{2\hat\sigma^4} = 0.$$
Therefore,
↪ $\hat\beta_{ML} = (X'X)^{-1}X'y$
↪ $\hat\sigma^2_{ML} = n^{-1}\|y - X\hat\beta\|^2 = \dfrac{n-K}{n}\, s^2$
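As a cross-check (ours; numpy/scipy assumed), a numerical maximization of the Gaussian log-likelihood should recover the closed-form solutions of the first-order conditions:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.0]) + rng.normal(size=n)

# Closed-form solutions of the first-order conditions.
beta_ml = np.linalg.solve(X.T @ X, X.T @ y)
sigma2_ml = np.sum((y - X @ beta_ml) ** 2) / n

# Numerical maximization; we parametrize by log(sigma^2) so that
# sigma^2 > 0 holds automatically during the search.
def negloglik(theta):
    beta, log_s2 = theta[:2], theta[2]
    e = y - X @ beta
    return 0.5 * n * (np.log(2 * np.pi) + log_s2) + e @ e / (2 * np.exp(log_s2))

res = minimize(negloglik, np.zeros(3), method="BFGS")
```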
$$\frac{\partial^2}{\partial\theta\,\partial\theta'} E_n\left[L(\hat\theta_{ML}; y|x)\right] = \begin{pmatrix} -\dfrac{X'X}{n\hat\sigma^2} & 0 \\ 0 & -\dfrac{1}{2\hat\sigma^4} \end{pmatrix}$$
is negative definite if $X$ has full rank. So $\hat\theta_{ML}$ is a global, and thus local, maximum.
$$E_n\left[L(\beta,\sigma^2; y|x)\right] = -\frac{1}{2}\log(2\sigma^2) - \sqrt{2}\,\frac{E_n|y - x'\beta|}{\sigma} = -\frac{1}{2}\log(2\sigma^2) - \sqrt{2}\,\frac{n^{-1}\sum_{i=1}^n |y_i - x_i'\beta|}{\sigma}.$$
Then $\hat\beta_{ML} = \arg\min_\beta n^{-1}\sum_{i=1}^n |y_i - x_i'\beta|$. This is the same as the Minimum Absolute Deviations (MAD) estimator.
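The Laplace-MLE/MAD equivalence can be illustrated numerically (our sketch, not from the notes; numpy/scipy assumed): minimizing the mean absolute residual recovers the regression coefficients under Laplace errors.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.laplace(scale=0.5, size=n)

# Maximizing the Laplace log-likelihood over beta is equivalent to
# minimizing the mean absolute residual (a nonsmooth convex problem,
# hence the derivative-free Nelder-Mead search).
def mean_abs_resid(beta):
    return np.mean(np.abs(y - X @ beta))

res = minimize(mean_abs_resid, np.zeros(2), method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-10})
beta_mad = res.x
```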
Exercise 5.1. Consider the (unconditional) exponential model where
$$f(y;\theta) = \frac{1}{\theta}\exp\left(-\frac{y}{\theta}\right), \quad y > 0, \qquad E(y) = \theta, \quad Var(y) = \theta^2.$$
Show that the MLE is the sample average.
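A numerical check of this fact (ours; numpy/scipy assumed) is to maximize the exponential log-likelihood directly and compare with the sample mean:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
y = rng.exponential(scale=2.0, size=1000)

# Negative sample log-likelihood of the exponential model:
# n log(theta) + sum(y)/theta.
def negloglik(theta):
    return len(y) * np.log(theta) + y.sum() / theta

res = minimize_scalar(negloglik, bounds=(1e-3, 50.0), method="bounded")
theta_ml = res.x
```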
5.2.2 Identification
Definition 1. $\theta_0$ is globally identified in $\Theta$ if no other value of $\theta$ gives the same model, i.e. $f(y|x;\theta) \neq f(y|x;\theta_0)$ for all $\theta \in \Theta$ different from $\theta_0$.
NB: When we write $E$, it means expectation with respect to the true distribution $f(y|x;\theta_0)$.
↪ The expected log-likelihood can be written, for any value of $\beta$ and $\sigma^2$, as:
$$E\left[L(\beta,\sigma^2; y|x)\,|\,x\right] = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{E\left[(y - x'\beta)^2|x\right]}{2\sigma^2} = -\frac{1}{2}\left[\log(2\pi\sigma^2) + \frac{\sigma_0^2 + (x'(\beta_0 - \beta))^2}{\sigma^2}\right].$$
Score function

The score function is defined as
$$L_\theta(\theta) = \frac{\partial}{\partial\theta} L(\theta).$$
Lemma 5.2.2. Under Assumption Differentiability, E [Lθ (θ0 ) |x] = 0.
(Non-conditional version also holds.)
$$E\left[L_\beta(\beta,\sigma^2; y|x)\,|\,x\right] = \frac{\partial}{\partial\beta} E\left[L(\beta,\sigma^2; y|x)\,|\,x\right] = \frac{(xx')(\beta_0 - \beta)}{\sigma^2}$$
$$E\left[L_{\sigma^2}(\beta,\sigma^2; y|x)\,|\,x\right] = \frac{\partial}{\partial\sigma^2} E\left[L(\beta,\sigma^2; y|x)\,|\,x\right] = -\frac{1}{2\sigma^2} + \frac{\sigma_0^2 + (x'\beta_0 - x'\beta)^2}{2\sigma^4}.$$
Both are equal to zero for $\beta = \beta_0$ and $\sigma^2 = \sigma_0^2$.
Exercise 5.4. Consider the (unconditional) exponential model. Derive the score function. Check that the lemma holds in this case.
$$\frac{\partial^2}{\partial\theta\,\partial\theta'} E\left[L(\theta_0; y|x)\,|\,x\right] = E\left[L_{\theta\theta}(\theta_0; y|x)\,|\,x\right] \quad \text{is negative definite}.$$
This matrix is related to the information matrix.
$$Var\left[L_\theta(\theta_0)\right] = E\left[Var\left[L_\theta(\theta_0)|x\right]\right] + Var\left[E\left[L_\theta(\theta_0)|x\right]\right] \quad \text{by the l.i.e.}$$
$$= E\left[Var\left[L_\theta(\theta_0)|x\right]\right] \quad \text{because } E\left[L_\theta(\theta_0)|x\right] = 0.$$
More generally, one can define the conditional information matrix (of the sample) as
$$I(\theta|X) = -\frac{1}{n}\sum_{i=1}^n E\left(L_{\theta\theta}(\theta)\,|\,x_i\right).$$
Exercise 5.5. Consider the (unconditional) exponential model. Derive the information matrix.
The Fréchet-Darmois-Cramer-Rao lower bound states that, for any unbiased estimator $\tilde\theta$ of $\theta_0$, the matrix $Var(\tilde\theta) - (nI(\theta_0))^{-1}$ is positive semi-definite.
The conditional version is also true, and we can show that the Gauss-Markov Theorem is a consequence of this latter result.
↪ We don't always have an analytic formula for the MLE, so small-sample properties are difficult to establish.
↪ The MLE is not necessarily unbiased, but it has nice large-sample properties.
If $\gamma = h(\theta)$ is a reparametrization, then
$$\hat\gamma = \arg\max_\gamma E_n\left[L(\gamma)\right]$$
is such that $\hat\gamma = h(\hat\theta)$.
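Invariance is easy to see numerically (our sketch; numpy/scipy assumed). In the exponential model with $\hat\theta = \bar y$, maximizing the log-likelihood written in the rate $\gamma = h(\theta) = 1/\theta$ gives $\hat\gamma = 1/\bar y$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)
y = rng.exponential(scale=2.0, size=500)
theta_hat = y.mean()                  # MLE in the original parametrization

# Reparametrize by the rate gamma = 1/theta and maximize directly:
# L(gamma) = n log(gamma) - gamma * sum(y).
res = minimize_scalar(lambda g: -(len(y) * np.log(g) - g * y.sum()),
                      bounds=(1e-3, 50.0), method="bounded")
gamma_hat = res.x
```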
Asymptotic Analysis
Convergence
$$\sqrt{n}\left(\hat\theta - \theta_0\right) = -\left[E_n L_{\theta\theta}(\bar\theta)\right]^{-1} \sqrt{n}\, E_n\left[L_\theta(\theta_0)\right].$$
Now
1. $\sqrt{n}\,E_n\left[L_\theta(\theta_0)\right]$ is a sample average, so it is asymptotically normal.
$$\sqrt{n}\,E_n\left[L_\theta(\theta_0)\right] = \sqrt{n}\,\frac{1}{n}\sum_{i=1}^n L_\theta(\theta_0; y_i|x_i)$$
with $E\left[L_\theta(\theta_0; y|x)\right] = 0$ and $Var\left[L_\theta(\theta_0; y|x)\right] = I(\theta_0)$ (here we consider the unconditional variance). Hence the CLT yields
$$\sqrt{n}\,E_n\left[L_\theta(\theta_0)\right] \xrightarrow{d} N\left(0, I(\theta_0)\right).$$
2. $-E_n L_{\theta\theta}(\bar\theta)$ converges to the information matrix. Since $\bar\theta \xrightarrow{p} \theta_0$, the LLN yields $-E_n L_{\theta\theta}(\bar\theta) \xrightarrow{p} I(\theta_0)$.
3. By Slutsky's theorem,
$$\sqrt{n}\left(\hat\theta - \theta_0\right) \xrightarrow{d} N\left(0, I^{-1}(\theta_0)\right).$$
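A Monte Carlo sketch of this result (ours; numpy assumed). In the exponential model $\hat\theta = \bar y$ and, by direct computation, $I(\theta_0) = 1/\theta_0^2$, so $\sqrt{n}(\hat\theta - \theta_0)$ should be approximately $N(0, \theta_0^2)$:

```python
import numpy as np

rng = np.random.default_rng(5)
theta0, n, reps = 2.0, 400, 2000

# Simulate reps independent samples of size n and form the standardized
# deviations sqrt(n)(ybar - theta0); their spread should be near theta0.
samples = rng.exponential(scale=theta0, size=(reps, n))
draws = np.sqrt(n) * (samples.mean(axis=1) - theta0)
```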
1. $\hat I(\hat\theta)$, if we know the exact form of the information matrix $I(\theta)$. In the case where $y|x \sim N(x'\beta, \sigma^2)$, since
$$I(\theta_0|X) = \begin{pmatrix} \dfrac{X'X}{n\sigma_0^2} & 0 \\ 0 & \dfrac{1}{2\sigma_0^4} \end{pmatrix},$$
we can estimate it by
$$\hat I(\hat\theta|X) = \begin{pmatrix} \dfrac{X'X}{n\hat\sigma^2} & 0 \\ 0 & \dfrac{1}{2\hat\sigma^4} \end{pmatrix}.$$
2. The variance of the score evaluated at $\hat\theta$:
$$\widehat{Var}_n\left[L_\theta(\hat\theta; y|x)\right] = E_n\left[L_\theta(\hat\theta) L_\theta(\hat\theta)'\right] = n^{-1}\sum_{i=1}^n \left[\frac{\partial}{\partial\theta}\log f(y_i|x_i;\hat\theta)\right]\left[\frac{\partial}{\partial\theta}\log f(y_i|x_i;\hat\theta)\right]'.$$
In the case where $y|x \sim N(x'\beta, \sigma^2)$, this yields the same estimator as the first one.
3. Minus the average Hessian, $-E_n\left[L_{\theta\theta}(\hat\theta)\right]$.
The three estimators are generally different. Careful, as the third one is not
necessarily positive definite.
Exercise 5.7. Check that in the exponential model where
$$f(y;\theta) = \frac{1}{\theta}\exp\left(-\frac{y}{\theta}\right), \quad y > 0, \qquad E(y) = \theta, \quad Var(y) = \theta^2,$$
the three estimators are respectively
$$\hat I(\hat\theta) = \frac{1}{\bar y^2}$$
$$\widehat{Var}_n\left[L_\theta(\hat\theta)\right] = \frac{1}{n}\sum_{i=1}^n \frac{(y_i - \bar y)^2}{\bar y^4} \neq \hat I(\hat\theta)$$
$$E_n\left[-L_{\theta\theta}(\hat\theta)\right] = -\frac{1}{\bar y^2} + 2\frac{\bar y}{\bar y^3} = \frac{1}{\bar y^2}.$$
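The three estimators in the exercise can be verified numerically (our sketch; numpy assumed), using the exponential score $L_\theta = -1/\theta + y/\theta^2$ and Hessian $L_{\theta\theta} = 1/\theta^2 - 2y/\theta^3$ evaluated at $\hat\theta = \bar y$:

```python
import numpy as np

rng = np.random.default_rng(6)
y = rng.exponential(scale=2.0, size=1000)
ybar = y.mean()

# Score and Hessian of the exponential log-density at theta_hat = ybar.
score = -1 / ybar + y / ybar**2
hess  =  1 / ybar**2 - 2 * y / ybar**3

I_plug = 1 / ybar**2            # plug-in I(theta_hat)
I_opg  = np.mean(score**2)      # outer product of the score
I_hess = -np.mean(hess)         # minus the average Hessian
```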
Bernoulli model

If $\tilde y$ is Bernoulli with parameter $p$,
$$\Pr(\tilde y = y; p) = p^y (1-p)^{1-y}.$$
The likelihood is $l(p; y) = \Pr(\tilde y = y; p)$ and the log-likelihood is
$$L(p; y) = y \log p + (1-y)\log(1-p).$$
Exercise 5.8. Check that $\hat p_{ML} = \bar y$. Derive the score function and the information matrix.
↪ $\varepsilon$ cannot be homoscedastic: as $y = y^2$, $Var(y|x) = x'\beta(1 - x'\beta)$ depends on $x$.
↪ $\widehat{\Pr}(y = 1|x) = x'\hat\beta$
Probit

$u \sim N(0,1) \rightarrow F(\cdot) = \Phi(\cdot)$.
$\sigma_u^2 = 1$ is a normalization, because if $\sigma^2$ is unrestricted,
$$F(x'\beta) = \Phi\left(\frac{x'\beta}{\sigma}\right).$$

Logit

$u$ is Logistic, $F(t) = \Lambda(t) = \dfrac{1}{1 + \exp(-t)}$, and $Var(u) = \pi^2/3 \approx 3.29$.
Formulas

Note that $Var(y|x) = F(x'\beta)\left(1 - F(x'\beta)\right)$.
↪ Log-likelihood:
$$L(\beta; y|x) = y \log F(x'\beta) + (1-y)\log\left(1 - F(x'\beta)\right)$$
Exercise 5.9. Find the score function and check that its conditional expectation is zero. Find the conditional information and the unconditional version. Check that the information matrix equality holds.
↪ Since $F(\cdot)$ is strictly increasing (and $f(\cdot)$ always positive), $\beta_j > 0$ implies a positive relation between $x_j$ and $y$ (respectively, $\beta_j < 0$ means a negative relation).
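The binary-model log-likelihood has no closed-form maximizer, but it is easy to maximize numerically. A minimal logit sketch (ours, not from the notes; numpy/scipy assumed, all names illustrative):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.3, 1.0])
p = 1 / (1 + np.exp(-X @ beta_true))      # Pr(y=1|x) = Lambda(x'beta)
y = (rng.uniform(size=n) < p).astype(float)

def negloglik(beta):
    xb = X @ beta
    # minus sum of [y log F + (1-y) log(1-F)] with F = Lambda,
    # written with logaddexp for numerical stability
    return np.sum(np.logaddexp(0.0, xb) - y * xb)

res = minimize(negloglik, np.zeros(2), method="BFGS")
beta_ml = res.x
p_hat = 1 / (1 + np.exp(-X @ beta_ml))
```

At the maximum the score condition $\sum_i (y_i - \Lambda(x_i'\hat\beta))x_i \approx 0$ should hold.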
when other explanatory variables are fixed at some values (e.g. median
values).
5.5 Tests
We will assume that the null hypothesis of interest is $H_0: r(\theta_0) = 0$ for some function $r(\cdot)$ from $\mathbb{R}^p$ into $\mathbb{R}^q$; that is, there are $q$ restrictions on the $p$ parameters in $\theta_0$. We will look at the trinity of tests.
The Jacobian $R(\theta) = \partial r(\theta)/\partial\theta'$ is such that $R(\theta_0)$ has full rank, which means that there are no redundant constraints.
Then the delta method yields
$$Var\left[r(\hat\theta)\right] \approx n^{-1} R(\theta_0)\, I^{-1}(\theta_0)\, R(\theta_0)'.$$
We use
$$\widehat{Var}\left[r(\hat\theta)\right] = n^{-1} R(\hat\theta)\, \hat I^{-1} R(\hat\theta)'.$$
Then
$$W = n\, r(\hat\theta)' \left[R(\hat\theta)\, \hat I^{-1} R(\hat\theta)'\right]^{-1} r(\hat\theta).$$
In the linear regression model, this coincides with
$$W = qF = \frac{RSS_R - RSS}{RSS/(n-K)}.$$
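The identity between the Wald statistic and $qF$ in the linear model can be checked numerically (our sketch; numpy assumed), here for $q = 1$ restriction:

```python
import numpy as np

rng = np.random.default_rng(8)
n, K = 120, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

# Unrestricted OLS (= Gaussian MLE for beta) and its residual sum of squares.
beta_u = np.linalg.lstsq(X, y, rcond=None)[0]
rss = np.sum((y - X @ beta_u) ** 2)

# Restricted model under H0: beta_3 = 0 (q = 1 restriction).
Xr = X[:, :2]
beta_r = np.linalg.lstsq(Xr, y, rcond=None)[0]
rssr = np.sum((y - Xr @ beta_r) ** 2)

W_from_rss = (rssr - rss) / (rss / (n - K))   # q F form

# Direct Wald form with variance estimator s^2 (X'X)^{-1}.
s2 = rss / (n - K)
V = s2 * np.linalg.inv(X.T @ X)
W_direct = beta_u[2] ** 2 / V[2, 2]
```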
The test is also known as the Lagrange Multiplier test, because S can be rewritten
as a function of the Lagrange multipliers of the constrained problem. The test
statistic is then also labeled LM .
We usually use for Ib a consistent estimator that depends only on the restricted
estimator θbR . This estimator will then be consistent under H0 (but not necessarily
under H1 ).
One can show that under $H_0$,
$$S - W \xrightarrow{p} 0 \quad \Rightarrow \quad S \xrightarrow{d} \chi^2_q.$$
The rejection region is $S > \chi^2_{q,1-\alpha}$.
In the linear regression model under $H_0: R\beta = c$, the average score at the restricted estimator is
$$E_n\left[L_\beta(\hat\theta_R)\right] = \frac{1}{n\hat\sigma_R^2}\sum_{i=1}^n x_i\left(y_i - x_i'\hat\beta_R\right) = \frac{1}{n\hat\sigma_R^2}\, R'\left(R(X'X)^{-1}R'\right)^{-1}\left(R\hat\beta - c\right),$$
$$E_n\left[L_{\sigma^2}(\hat\theta_R)\right] = 0.$$
Then
$$S = \frac{\left(R\hat\beta - c\right)'\left(R(X'X)^{-1}R'\right)^{-1}\left(R\hat\beta - c\right)}{\hat\sigma_R^2} = n\,\frac{RSS_R - RSS}{RSS_R}$$
where $\hat\sigma_R^2 = RSS_R/n$ is the MLE of $\sigma^2$ in the restricted model.
Note that in practice some of the components of the average score $E_n\left[L_\beta(\hat\theta_R)\right]$ will be zero. For instance, if we impose $\beta_2 = 0$, then
$$E_n\left[L_{\beta_1}(\hat\beta_{1R})\right] = 0.$$
We need to deal with the two constrained and unconstrained problems, but we
don’t need to choose how to evaluate the information I (θ0 ).
One can show that under $H_0$,
$$LR - W \xrightarrow{p} 0 \quad \Rightarrow \quad LR \xrightarrow{d} \chi^2_q.$$
Exercise 5.10. In the exponential model, consider testing the null hypothesis that $\theta = \theta_0$ against $\theta \neq \theta_0$. Show that
$$W = n\,\frac{(\bar y - \theta_0)^2}{\bar y^2}$$
$$S = n\,\frac{(\bar y - \theta_0)^2}{\theta_0^2}$$
$$LR = 2n\left[\log\frac{\theta_0}{\bar y} + \frac{\bar y - \theta_0}{\theta_0}\right].$$
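The three statistics in the exercise are easy to compute on a simulated sample (our sketch; numpy assumed); in particular, the $LR$ formula can be checked against twice the log-likelihood difference computed directly:

```python
import numpy as np

rng = np.random.default_rng(9)
theta0 = 2.0
y = rng.exponential(scale=theta0, size=200)   # H0: theta = theta0 holds here
n, ybar = len(y), y.mean()

W  = n * (ybar - theta0) ** 2 / ybar ** 2
S  = n * (ybar - theta0) ** 2 / theta0 ** 2
LR = 2 * n * (np.log(theta0 / ybar) + (ybar - theta0) / theta0)

def loglik(theta):
    # exponential sample log-likelihood: -n log(theta) - sum(y)/theta
    return -n * np.log(theta) - y.sum() / theta
```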
5.5.4 Invariance
There may be different ways of writing the restrictions, e.g. exp(β1 ) − 1 = 0 ⇔
β1 = 0. One would like to obtain the same decision in any case: this property is
called invariance. But this is not always the case. It is true for the LR test, not
true for the Wald test, and true in some cases for the Score test.
For two equivalent restrictions $r_1(\theta) = 0$ and $r_2(\theta) = 0$, the value of the maximum average log-likelihood $E_n\left[L(\hat\theta_R)\right]$ is also the same.
This is related to the invariance of MLE.
5.5.5 Consistency
The Wald test is consistent. To see it, assume $r(\theta_0) \neq 0$; then
↪ $r(\hat\theta) \xrightarrow{p} r(\theta_0) \neq 0$
↪ $R(\hat\theta)\,\hat I^{-1} R(\hat\theta)' \xrightarrow{p} R(\theta_0)\, I^{-1}(\theta_0)\, R(\theta_0)'$, which is positive definite.
↪ Hence $W \xrightarrow{p} \infty$.
↪ Hence $\Pr\left(W > \chi^2_{q,1-\alpha}\right) \to 1$.
$H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$.
↪ We can consider confidence regions associated with any test of the trinity.
Appendix A
$$\Rightarrow\; E\left[L_\theta(\theta_0)\,|\,x\right] = \int \frac{f_\theta(y|x;\theta_0)}{f(y|x;\theta_0)}\, f(y|x;\theta_0)\, dy = \int f_\theta(y|x;\theta_0)\, dy = \frac{\partial}{\partial\theta}\int f(y|x;\theta_0)\, dy = 0.$$
Differentiating this expression with respect to $\theta$ (and using shortcut notations) gives:
$$\frac{\partial(L_\theta f)}{\partial\theta'} = \frac{\partial L_\theta}{\partial\theta'} f + L_\theta \frac{\partial f}{\partial\theta'} = L_{\theta\theta} f + L_\theta (f_\theta)' = L_{\theta\theta} f + L_\theta L_\theta' f \quad \left(\text{since } L_\theta = \frac{f_\theta}{f}\right) = \left(L_{\theta\theta} + L_\theta L_\theta'\right) f,$$
which is equivalent to
$$E\left[L_{\theta\theta}(\theta_0)\,|\,x\right] = -E\left[L_\theta(\theta_0) L_\theta(\theta_0)'\,|\,x\right] = -Var\left[L_\theta(\theta_0)\,|\,x\right].$$
Since the Hessian is equal to minus a variance matrix (which is positive semidefinite), it is a negative semidefinite matrix.
$$W = n\hat\theta^2/\omega^2.$$
Choose some $\alpha > 0$; then $H_0': \dfrac{\theta_0}{\alpha - \theta_0} = 0$ is equivalent to $H_0$. What is the Wald test statistic for this hypothesis?
As $\dfrac{\partial}{\partial\theta}\dfrac{\theta}{\alpha - \theta} = \dfrac{\alpha}{(\alpha - \theta)^2}$, the delta method yields
$$\sqrt{n}\left(\frac{\hat\theta}{\alpha - \hat\theta} - \frac{\theta_0}{\alpha - \theta_0}\right) \xrightarrow{d} N\left(0,\; \frac{\alpha^2\omega^2}{(\alpha - \theta_0)^4}\right).$$
The Wald test statistic for $H_0': \dfrac{\theta_0}{\alpha - \theta_0} = 0$ is then
$$W' = n\,\frac{\left(\hat\theta/(\alpha - \hat\theta)\right)^2}{\alpha^2\omega^2/(\alpha - \hat\theta)^4} = W\left(1 - \frac{\hat\theta}{\alpha}\right)^2.$$
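The lack of invariance is concrete with numbers (illustrative values of ours, not from the notes): for $\alpha$ close to $\hat\theta$ the factor $(1 - \hat\theta/\alpha)^2$ can shrink $W'$ below the critical value even when $W$ rejects, while $W' \to W$ as $\alpha \to \infty$.

```python
import numpy as np

# Illustrative numbers: theta_hat is the point estimate, omega2 the
# estimated asymptotic variance, and we test H0: theta0 = 0.
n, theta_hat, omega2 = 100, 0.5, 4.0
W = n * theta_hat ** 2 / omega2                 # Wald statistic for H0

crit = 3.84                                     # chi2(1) 5% critical value
for alpha in (0.6, 5.0, 100.0):
    W_prime = W * (1 - theta_hat / alpha) ** 2  # Wald statistic for H0'
    print(f"alpha={alpha}: W={W:.3f}, W'={W_prime:.3f}, reject H0'? {W_prime > crit}")
```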