Econometrics II
Generalized Method of Moments
(GMM) Estimation
Morten Nyboe Tabor
University of Copenhagen, Department of Economics
Learning Outcomes
1 Explain why the OLS estimator is inconsistent for linear regression models
with endogenous regressors (e.g. when estimating Taylor rules). Explain
the role of instruments in these models.
2 Give an account of the principle of MM and GMM estimation. Explain
the notions of under-, exact, and over-identification.
3 Explain how the GMM estimator may be computed in practice.
4 Give an account of the assumptions needed to obtain consistency
and asymptotic normality of the GMM estimator.
5 Explain the role of the weight matrix.
6 Explain the notion of efficient GMM, and how this is obtained.
7 Explain the Hansen J-test.
8 Be able to determine the functions $f(w_t, z_t, \theta)$, $g(\theta)$, $g_T(\theta)$, and $Q_T(\theta)$
for specific models, such as the linear regression model (e.g. when
estimating a Taylor rule) and the C-CAPM. Explain how the functions are
computed in practice (if possible).
9 Explain the relationship between GMM and 2SLS.
10 Explain the notion of weak instruments and weak identification.
In the seminal paper, John Taylor (1993) suggested that the central bank sets
the short-term interest rate, $r_t$, according to a simple monetary policy rule in
which $r_t$ responds to inflation and the output gap.
Consider the monetary policy rule for the interest rate, $r_t$, given by the Taylor
rule with rational expectations:
$$r_t = \alpha_0 + \alpha_1 \left( E[\pi_{t+12} \mid I_t] - \pi^* \right) + \alpha_2 E[y_t \mid I_t].$$
In general, we do not observe the market's expectations of future inflation, and
the current output gap is also unobserved. Replacing the expectations with the
realized values gives the estimable equation
$$r_t = \alpha_0^* + \alpha_1 \pi_{t+12} + \alpha_2 y_t + u_t,$$
where
$$u_t = \alpha_1 \left( E[\pi_{t+12} \mid I_t] - \pi_{t+12} \right) + \alpha_2 \left( E[y_t \mid I_t] - y_t \right), \qquad (3)$$
and $\alpha_0^* = \alpha_0 - \alpha_1 \pi^*$.
$$\pi_{t+12} = E[\pi_{t+12} \mid I_t] + v_t, \qquad E[\pi_{t+12} \mid I_t] - \pi_{t+12} = -v_t, \qquad \text{where } E[v_t \mid I_t] = 0,$$
$$y_t = E[y_t \mid I_t] + w_t, \qquad E[y_t \mid I_t] - y_t = -w_t, \qquad \text{where } E[w_t \mid I_t] = 0.$$
• Assuming rational expectations implies: $E[u_t \mid I_t] = 0$.
• Introducing $R$ instruments, $z_t \in I_t$, we have the moment conditions
$$E[z_t \cdot u_t] = 0.$$
The moment conditions hold for all variables $z_t$ in the information set $I_t$!
• We need $R \geq 3$, as we have 3 parameters to estimate, $\theta = (\alpha_0^*, \alpha_1, \alpha_2)'$.
• For GMM estimation, we consider the sample moments
$$g_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} z_t \cdot \left( r_t - \alpha_0^* - \alpha_1 \pi_{t+12} - \alpha_2 y_t \right).$$
$$z_t = (1, r_{t-1}, r_{t-2}, \pi_{t-1}, \pi_{t-2}, y_{t-1}, y_{t-2}, b_{t-1}, b_{t-2}, x_{t-1}, x_{t-2})',$$
$$f(w_t, z_t, \theta) = \underbrace{u(w_t, \theta)}_{(1 \times 1)} \cdot \underbrace{z_t}_{(R \times 1)},$$
$$g(\theta_0) = E\left( u(w_t, \theta_0) \cdot z_t \right) = 0,$$
stating that the instruments are uncorrelated with the error term of the
model.
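As an illustration of how $g_T(\theta)$ and $Q_T(\theta)$ can be computed in practice, the following is a minimal sketch (not from the lecture) for the Taylor-rule moment conditions. The data arrays r, pi_lead ($\pi_{t+12}$), y and the instrument matrix Z are assumptions here, simulated purely for illustration, and an identity weight matrix is used in the first step.

```python
# Minimal sketch (not from the lecture): sample moments g_T(theta) and
# criterion Q_T(theta) for the Taylor-rule moment conditions.
import numpy as np
from scipy.optimize import minimize

def g_T(theta, r, pi_lead, y, Z):
    """g_T(theta) = (1/T) sum_t z_t * (r_t - a0 - a1*pi_{t+12} - a2*y_t)."""
    a0, a1, a2 = theta
    u = r - a0 - a1 * pi_lead - a2 * y   # u_t(theta), shape (T,)
    return Z.T @ u / len(u)              # shape (R,)

def Q_T(theta, r, pi_lead, y, Z, W):
    """GMM criterion Q_T(theta) = g_T(theta)' W g_T(theta)."""
    g = g_T(theta, r, pi_lead, y, Z)
    return g @ W @ g

# Illustration with simulated data (purely hypothetical numbers):
rng = np.random.default_rng(0)
T = 500
pi_lead = rng.normal(2.0, 1.0, T)
y = rng.normal(0.0, 1.0, T)
r = 1.0 + 1.5 * pi_lead + 0.5 * y + rng.normal(0.0, 0.5, T)
Z = np.column_stack([np.ones(T), pi_lead, y])   # R = 3 instruments (exogenous case)

W = np.eye(Z.shape[1])                          # first-step weight matrix
res = minimize(Q_T, x0=np.zeros(3), args=(r, pi_lead, y, Z, W), method="BFGS")
print(res.x)                                    # estimates of (a0*, a1, a2)
```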
where $f(y_t, \mu_0) = y_t - \mu_0$.
We found that, for a given sample $\{y_1, \ldots, y_T\}$, the ML estimator was the
sample average:
$$\hat{\lambda}_{ML} = \frac{1}{T} \sum_{t=1}^{T} y_t.$$
$$y_t = x_t' \beta_0 + \epsilon_t,$$
Q. What is the relevant expression for the moment condition, $g(\theta_0)$, and the
function, $f(w_t, z_t, \theta_0)$?
$$y_t = x_t' \beta_0 + \epsilon_t. \qquad (**)$$
$$g_T(\hat{\beta}) = \frac{1}{T} \sum_{t=1}^{T} x_t \left( y_t - x_t' \hat{\beta} \right) = \frac{1}{T} \sum_{t=1}^{T} x_t y_t - \frac{1}{T} \sum_{t=1}^{T} x_t x_t' \hat{\beta} = 0.$$
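Solving these $K$ sample moment conditions gives the OLS estimator. A minimal sketch (not from the lecture), assuming a regressor matrix X of shape (T, K) and an outcome vector Y:

```python
# Minimal sketch: the MM estimator solving (1/T) sum_t x_t (y_t - x_t' beta) = 0,
# i.e. the OLS estimator beta_hat = (X'X)^{-1} X'Y.
import numpy as np

def mm_ols(Y, X):
    return np.linalg.solve(X.T @ X, X.T @ Y)
```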
Example: Underidentification
$$y_t = x_t' \beta_0 + \epsilon_t = x_{1t}' \gamma_0 + x_{2t}' \delta_0 + \epsilon_t,$$
where the $K_1$ variables in $x_{1t}$ are uncorrelated with $\epsilon_t$, while the $K_2$ variables in
$x_{2t}$ are correlated with $\epsilon_t$. Assume that we have $K_2$ valid instruments, $z_{2t}$.
Finally, define $x_t = (x_{1t}', x_{2t}')'$ and $z_t = (x_{1t}', z_{2t}')'$.
Q. How can the instruments be used to derive a method of moments estimator
of β0 ?
(A) $g(\beta_0) = E(x_t \epsilon_t) = 0 \;\Rightarrow\; \hat{\beta}_{MM} = \left( \frac{1}{T} \sum_{t=1}^{T} x_t x_t' \right)^{-1} \frac{1}{T} \sum_{t=1}^{T} x_t y_t$.
(B) $g(\beta_0) = E(z_t \epsilon_t) = 0 \;\Rightarrow\; \hat{\beta}_{MM} = \left( \frac{1}{T} \sum_{t=1}^{T} z_t x_t' \right)^{-1} \frac{1}{T} \sum_{t=1}^{T} z_t y_t$.
(C) $g(\beta_0) = E(z_t \epsilon_t) = 0 \;\Rightarrow\; \hat{\beta}_{MM} = \left( \frac{1}{T} \sum_{t=1}^{T} z_t z_t' \right)^{-1} \frac{1}{T} \sum_{t=1}^{T} z_t y_t$.
(D) $g(\beta_0) = E(z_{2t} \epsilon_t) = 0 \;\Rightarrow\; \hat{\beta}_{MM} = \left( \frac{1}{T} \sum_{t=1}^{T} z_{2t} x_t' \right)^{-1} \frac{1}{T} \sum_{t=1}^{T} z_{2t} y_t$.
• Assume $K_2$ new variables, $z_{2t}$, that are correlated with $x_{2t}$ but
uncorrelated with $\epsilon_t$:
$$E(z_{2t} \epsilon_t) = 0.$$
These $K_2$ moment conditions can replace the invalid conditions for the
endogenous regressors, $E(x_{2t} \epsilon_t) = 0$. To simplify notation, we define
$$x_t = \begin{pmatrix} x_{1t} \\ x_{2t} \end{pmatrix} \quad (K \times 1) \qquad \text{and} \qquad z_t = \begin{pmatrix} x_{1t} \\ z_{2t} \end{pmatrix} \quad (K \times 1).$$
Here $x_t$ contains the model variables, $z_{2t}$ the new instruments, and $z_t$ the full
vector of instruments. We say that $x_{1t}$ are instruments for themselves.
$$g_T(\hat{\beta}) = \frac{1}{T} \sum_{t=1}^{T} z_t \left( y_t - x_t' \hat{\beta} \right) = 0.$$
Solving for $\hat{\beta}$ gives
$$\hat{\beta}_{MM} = \left( \frac{1}{T} \sum_{t=1}^{T} z_t x_t' \right)^{-1} \frac{1}{T} \sum_{t=1}^{T} z_t y_t,$$
provided that $\frac{1}{T} \sum_{t=1}^{T} z_t x_t'$ is non-singular.
• Note the following:
1 We need the instruments to identify the parameters.
2 The MM estimator coincides with the simple IV estimator.
3 The procedure only works with $K_2$ new instruments (i.e., $R = K$).
4 Non-singularity of $\sum_{t=1}^{T} z_t x_t'$ requires relevant instruments.
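For reference, a minimal sketch (not from the lecture) of the simple IV/MM estimator in the exactly identified case, assuming NumPy arrays Y (T,), X (T × K) and Z (T × K):

```python
# Minimal sketch: the MM / simple IV estimator beta_hat = (Z'X)^{-1} Z'Y
# for the exactly identified case R = K; requires Z'X to be non-singular.
import numpy as np

def iv_mm(Y, X, Z):
    return np.linalg.solve(Z.T @ X, Z.T @ Y)
```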
$$\hat{\theta}_{GMM}(W_T) = \arg\min_{\theta} \; g_T(\theta)' W_T g_T(\theta).$$
Q. Which of the three assumptions are needed for consistency of the GMM
estimator? (And, more importantly, why?)
(A) Consistency requires Assumption 0.
• Assume that a law of large numbers (LLN) applies to $f(w_t, z_t, \theta)$, i.e.,
$$T^{-1} \sum_{t=1}^{T} f(w_t, z_t, \theta) \xrightarrow{p} E\left( f(w_t, z_t, \theta) \right) \quad \text{for } T \to \infty.$$
$$\sqrt{T} \cdot g_T(\theta_0) = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} f(w_t, z_t, \theta_0) \xrightarrow{D} N(0, S),$$
where
$$D = E\left( \left. \frac{\partial f(w_t, z_t, \theta)}{\partial \theta'} \right|_{\theta = \theta_0} \right).$$
Recall that the GMM estimator depends on the symmetric and positive
definite weight matrix $W_T$.
Q. Why does the choice of weight matrix matter?
(A) Because the asymptotic variance of the GMM estimator depends on the
weight matrix.
(B) Because the GMM estimator is only consistent for some weight matrices.
(C) Because the asymptotic variance of the sample moments depends on the
weight matrix.
(D) Because the GMM estimator is only asymptotically normal for some
weight matrices.
(E) Don’t know.
5. Efficient GMM
Socrative Question 9
Under Assumptions 1 and 2, the asymptotic distribution of $\hat{\theta}_{GMM}$ is
$$\sqrt{T}(\hat{\theta} - \theta_0) \xrightarrow{D} N(0, V),$$
where
$$V = (D' W D)^{-1} D' W S W D (D' W D)^{-1},$$
and $S$ is the asymptotic variance of the sample moment conditions and $D$ is
the expected value of the $R \times K$ matrix of first derivatives of $f(w_t, z_t, \theta)$.
$W$ is an $(R \times R)$ positive definite weight matrix which attaches different
weights to the $R$ sample moments.
Q. Efficient GMM uses the optimal weight matrix, $W_T^{opt}$, which minimizes the
asymptotic variance $V$. How is that achieved?
• The variance of $\hat{\theta}_{GMM}$ depends on the weight matrix, $W_T$.
The efficient GMM estimator has the smallest possible (asymptotic)
variance.
• Intuition: a moment with small variance is informative and should have
large weight.
• It can be shown that the optimal weight matrix, $W_T^{opt}$, has the property
that
$$\text{plim} \, W_T^{opt} = S^{-1}.$$
With the optimal weight matrix, $W = S^{-1}$, the asymptotic variance
simplifies to
$$V = \left( D' S^{-1} D \right)^{-1} D' S^{-1} S S^{-1} D \left( D' S^{-1} D \right)^{-1} = \left( D' S^{-1} D \right)^{-1}.$$
Estimation of the weight matrix is typically the most tricky part of GMM.
Computational Issues
We need an optimal weight matrix, $W_T^{opt}$, but that depends on the parameters!
Two-step efficient GMM:
1 Choose an initial weight matrix, e.g. $W_{[1]} = I_R$, and find a consistent but
inefficient first-step GMM estimator,
$$\hat{\theta}_{[1]} = \arg\min_{\theta} \; g_T(\theta)' W_{[1]} g_T(\theta).$$
2 Find the optimal weight matrix, $W_{[2]}^{opt}$, based on $\hat{\theta}_{[1]}$. Find the efficient
estimator
$$\hat{\theta}_{[2]} = \arg\min_{\theta} \; g_T(\theta)' W_{[2]}^{opt} g_T(\theta).$$
The two-step estimator is not unique, as it depends on the initial weight matrix $W_{[1]}$.
Iterated GMM:
• From the estimator $\hat{\theta}_{[2]}$ it is natural to update the weights, $W_{[3]}^{opt}$, and
update $\hat{\theta}_{[3]}$.
We can switch between estimating $W_{[\cdot]}^{opt}$ and $\hat{\theta}_{[\cdot]}$ until convergence.
• Iterated GMM does not depend on the initial weight matrix.
The two approaches are asymptotically equivalent.
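A minimal sketch (not from the lecture) of two-step efficient GMM in the linear IV model, assuming NumPy arrays Y (T,), X (T × K) and Z (T × R) with R ≥ K; the closed-form expression used here is derived later in the lecture.

```python
# Minimal sketch: two-step efficient GMM in the linear IV model
# y_t = x_t' beta + e_t with instruments z_t.
import numpy as np

def linear_gmm(Y, X, Z, W):
    """Closed-form GMM estimator (X'Z W Z'X)^{-1} X'Z W Z'Y."""
    XZ = X.T @ Z                                   # (K x R)
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ Y))

def two_step_gmm(Y, X, Z):
    T, R = Z.shape
    # Step 1: consistent but inefficient estimator with W_[1] = I_R.
    beta_1 = linear_gmm(Y, X, Z, np.eye(R))
    # Step 2: estimate S_T = (1/T) sum_t e_t^2 z_t z_t' and use W_[2] = S_T^{-1}.
    e = Y - X @ beta_1
    S_T = (Z * (e ** 2)[:, None]).T @ Z / T
    W_opt = np.linalg.inv(S_T)
    return linear_gmm(Y, X, Z, W_opt)
```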
Continuously updated GMM estimator:
• A third approach is to recognize from the outset that the weight matrix
depends on the parameters, and minimize
$$Q_T(\theta) = g_T(\theta)' S_T(\theta)^{-1} g_T(\theta)$$
directly over $\theta$.
Empirical example: The Taylor rule is estimated using iterated GMM with HAC
standard errors and the set of instruments
$$z_t = (1, r_t, \ldots, r_{t-6}, \pi_t, \pi_{t-1}, \ldots, \pi_{t-6}, y_{t-1}, \ldots, y_{t-6})'.$$
The figure shows the interest rate (red line), $r_t$, and the predicted interest rate
(blue line),
$$\hat{r}_t = 1.12 + 1.19 \cdot \pi_{t+12} + 0.49 \cdot y_t.$$
[Figure: actual and predicted interest rates, $r_t$ and $\hat{r}_t$, over the sample.]
The consumer's first-order condition (Euler equation) is
$$u'(c_t) = E\left[ \delta \cdot u'(c_{t+1}) \cdot R_{t+1} \mid I_t \right],$$
where $u'(\cdot)$ is the derivative of the utility function, and $R_{t+1} = 1 + r_{t+1}$ is the return factor.
Now assume a constant relative risk aversion (CRRA) utility function:
$$u(c_t) = \frac{c_t^{1-\gamma}}{1-\gamma}, \qquad \gamma < 1.$$
Q. Which moment conditions can be used for GMM estimation of $\theta = (\delta, \gamma)'$?
(A) $c_t^{-\gamma} - E\left[ \delta \cdot c_{t+1}^{-\gamma} \cdot R_{t+1} \, z_t \right] = 0$.
(B) $E\left[ \delta \cdot \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} \cdot R_{t+1} \, z_t \right] = 0$.
(C) $E\left[ \left( \delta \cdot \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} \cdot R_{t+1} - 1 \right) z_t \right] = 0$.
(D) $E\left[ \delta \cdot c_{t+1}^{-\gamma} \cdot R_{t+1} \, z_t \right] = 0$.
(E) Don't know.
(And how many instruments do we need? And which ones would you suggest?)
$$u(c_t) = \frac{c_t^{1-\gamma}}{1-\gamma}, \qquad \gamma < 1,$$
$$c_t^{-\gamma} - E\left[ \delta \cdot c_{t+1}^{-\gamma} \cdot R_{t+1} \mid I_t \right] = 0.$$
$$f(w_t, z_t; \theta) = u(w_t; \theta) \cdot z_t,$$
$$u(w_t; \theta) = \delta \cdot \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} \cdot R_{t+1} - 1.$$
$$g(\theta_0) = E\left[ f(w_t, z_t; \theta_0) \right] = 0,$$
i.e.
$$E\left[ \left( \delta_0 \cdot \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma_0} \cdot R_{t+1} - 1 \right) \cdot 1 \right] = 0$$
$$E\left[ \left( \delta_0 \cdot \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma_0} \cdot R_{t+1} - 1 \right) \cdot \frac{c_t}{c_{t-1}} \right] = 0$$
$$E\left[ \left( \delta_0 \cdot \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma_0} \cdot R_{t+1} - 1 \right) \cdot R_t \right] = 0,$$
for $t = 1, 2, \ldots, T$.
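As a minimal sketch (not from the lecture) of how these sample moments could be computed, assume aligned NumPy arrays c_growth ($c_{t+1}/c_t$), R_next ($R_{t+1}$) and an instrument matrix Z whose columns are, e.g., a constant, lagged consumption growth and the lagged return factor:

```python
# Minimal sketch: C-CAPM sample moments for theta = (delta, gamma) with CRRA utility.
import numpy as np

def ccapm_moments(theta, c_growth, R_next, Z):
    """g_T(theta) = (1/T) sum_t (delta*(c_{t+1}/c_t)^(-gamma)*R_{t+1} - 1) * z_t."""
    delta, gamma = theta
    u = delta * c_growth ** (-gamma) * R_next - 1.0   # disturbance u(w_t; theta)
    return Z.T @ u / len(u)
```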
• Given a choice of weight matrix, $W_T$, the GMM estimator is found by
minimizing $Q_T(\theta) = g_T(\theta)' W_T g_T(\theta)$ numerically over $\theta = (\delta, \gamma)'$.
• In the actual data, the variables $c_{t+1}/c_t$ and $R_{t+1}$ are approximately 1 for all $t$.
Hence the variation is small.
• Consequently, if $\delta = 1$,
$$u(w_t; \theta) = \delta \cdot \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} \cdot R_{t+1} - 1 \approx \delta \cdot (1)^{-\gamma} \cdot 1 - 1 \approx 0$$
for all $\gamma$! The moment conditions are then close to zero whatever the value of
$\gamma$, so $\gamma$ is only weakly identified.
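A minimal numerical sketch (not from the lecture) of this weak-identification problem, using simulated consumption growth and return factors close to one; the names and magnitudes below are purely illustrative assumptions.

```python
# Minimal sketch: with delta = 1 and c_{t+1}/c_t and R_{t+1} close to one,
# the sample mean of u(w_t; theta) stays small over a wide range of gamma,
# so the moment conditions carry little information about gamma.
import numpy as np

rng = np.random.default_rng(1)
T = 200
c_growth = 1.0 + 0.005 * rng.standard_normal(T)   # c_{t+1}/c_t near 1
R_next = 1.002 + 0.01 * rng.standard_normal(T)    # R_{t+1} near 1

for gamma in (0.5, 2.0, 10.0, 50.0):
    u = 1.0 * c_growth ** (-gamma) * R_next - 1.0  # delta = 1
    print(gamma, u.mean())                          # all small in magnitude
```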
(A) $S = \frac{1}{T} \sum_{t=1}^{T} E\left( \epsilon_t^2 z_t z_t' \right)$.
(B) $S = \frac{1}{T} \sum_{t=1}^{T} V\left( \epsilon_t^2 z_t z_t' \right)$.
(C) $S = \frac{1}{T} \sum_{t=1}^{T} E\left( z_t \epsilon_t \right)$.
(D) $S = \frac{1}{T} \sum_{t=1}^{T} \epsilon_t^2 E\left( z_t z_t' \right)$.
(E) Don't know.
A natural estimator is
$$S_T = \frac{1}{T} \sum_{t=1}^{T} f_t^2. \qquad (*)$$
Define the autocovariances
$$\gamma_j = \frac{1}{T} \sum_{t=j+1}^{T} E\left( f_t \cdot f_{t-j} \right).$$
We can write $S$ as
$$S = \gamma_0 + 2 \cdot \gamma_1 + 2 \cdot \gamma_2 + 2 \cdot \gamma_3 + \cdots + 2 \cdot \gamma_{T-1} = \gamma_0 + 2 \cdot \sum_{j=1}^{T-1} \gamma_j,$$
with the natural estimators
$$\hat{\gamma}_j = \frac{1}{T} \sum_{t=j+1}^{T} f_t \cdot f_{t-j}.$$
Truncating at lag $q - 1$ gives
$$S_T = \hat{\gamma}_0 + 2 \cdot \sum_{j=1}^{q-1} \hat{\gamma}_j.$$
[Figure: Bartlett kernel weights, $1 - |j|/q$, plotted against lags $-8, \ldots, 8$ for $q = 6$ (weights $5/6, 4/6, \ldots, 1/6, 0$).]
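A minimal sketch (not from the lecture) of such a long-run variance estimate for a scalar moment series, here with the Bartlett weights $w_j = 1 - j/q$ shown in the figure (a Newey-West-type estimator); the array f of sample moments, evaluated at the estimated parameters, is an assumed input.

```python
# Minimal sketch: HAC (Newey-West-type) estimate of the long-run variance S
# of a scalar moment series f_t, using Bartlett weights w_j = 1 - j/q.
import numpy as np

def hac_long_run_variance(f, q):
    T = len(f)
    S = np.sum(f * f) / T                       # hat(gamma)_0
    for j in range(1, q):
        gamma_j = np.sum(f[j:] * f[:-j]) / T    # hat(gamma)_j = (1/T) sum f_t f_{t-j}
        S += 2.0 * (1.0 - j / q) * gamma_j      # Bartlett weight w_j = 1 - j/q
    return S
```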
$$Y = X\beta + \epsilon.$$
Step 1. Regress $X$ on $Z$:
$$X = Z\gamma + U,$$
$$\hat{\gamma} = \; ???$$
$$\hat{X} = Z\hat{\gamma} = \; ???$$
Step 2. Regress $Y$ on $\hat{X}$:
$$Y = \hat{X} B + E,$$
$$\hat{B} = (\hat{X}'\hat{X})^{-1} \hat{X}' Y = \; ???$$
Q. What are $\hat{\gamma}$ and $\hat{B}$?
(A) $\hat{\gamma} = (ZZ')^{-1} Z Y$ and $\hat{B} = \left( X'Z(Z'Z)^{-1}Z'X \right)^{-1} X'Z(Z'Z)^{-1}Z'Y$.
(B) $\hat{\gamma} = (Z'Z)^{-1} Z'Y$ and $\hat{B} = \left( Z'X(X'X)^{-1}X'Z \right)^{-1} Z'X(X'X)^{-1}X'Y$.
(C) $\hat{\gamma} = (Z'Z)^{-1} Z'Y$ and $\hat{B} = \left( XZ'(Z'Z)^{-1}ZX' \right)^{-1} XZ'(Z'Z)^{-1}ZY$.
(D) $\hat{\gamma} = (Z'Z)^{-1} Z'X$ and $\hat{B} = \left( X'Z(Z'Z)^{-1}Z'X \right)^{-1} X'Z(Z'Z)^{-1}Z'Y$.
(E) Don't know.
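A minimal sketch (not from the lecture) of the two OLS steps, assuming NumPy arrays Y (T,), X (T × K) and Z (T × R) with R ≥ K:

```python
# Minimal sketch: 2SLS computed literally in two OLS steps.
import numpy as np

def two_stage_least_squares(Y, X, Z):
    # Step 1: regress X on Z and form the fitted values X_hat = Z * gamma_hat.
    gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)          # (Z'Z)^{-1} Z'X
    X_hat = Z @ gamma_hat
    # Step 2: regress Y on X_hat.
    return np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ Y)   # (X_hat'X_hat)^{-1} X_hat'Y
```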
$$y_t = x_t'\beta_0 + \epsilon_t = x_{1t}'\gamma_0 + x_{2t}'\delta_0 + \epsilon_t,$$
$$\underbrace{g_T(\beta)}_{(R \times 1)} = \frac{1}{T} \sum_{t=1}^{T} z_t \left( y_t - x_t'\beta \right) = \frac{1}{T} Z'(Y - X\beta),$$
Q. What is the closed-form expression for $\hat{\beta}_{GMM}(W_T)$?
(A) $\hat{\beta}_{GMM}(W_T) = \left( Z'X W_T X'Z \right)^{-1} Z'X W_T X'Y$.
(B) $\hat{\beta}_{GMM}(W_T) = -2T^{-2} \left( X'Z W_T Z'X \right)^{-1} X'Z W_T Z'Y$.
(C) $\hat{\beta}_{GMM}(W_T) = \left( X'Z W_T Z'X \right)^{-1} X'Z W_T Z'Y$.
(D) $\hat{\beta}_{GMM}(W_T) = \left( X'Z(Z'Z)^{-1}Z'X \right)^{-1} X'Z(Z'Z)^{-1}Z'Y$.
(E) Don't know.
From Lecture Note 3
Let $A$ be an $(n \times k)$ matrix, $V$ a $(k \times k)$ matrix, and $\beta$ a $(k \times 1)$ vector of
parameters. Then
$$\frac{\partial (\beta' A')}{\partial \beta} = A'. \qquad (7*)$$
$$\frac{\partial (\beta' V \beta)}{\partial \beta} = (V + V')\beta. \qquad (8*)$$
• With $Q_T(\beta) = g_T(\beta)' W_T g_T(\beta) = T^{-2} (Y - X\beta)' Z W_T Z' (Y - X\beta)$,
we take the first derivative, and the GMM estimator is the solution to
$$\frac{\partial Q_T(\beta)}{\partial \beta} = -2T^{-2} X'Z W_T Z'Y + 2T^{-2} X'Z W_T Z'X \beta = 0.$$
• We find the general closed-form GMM estimator in the linear model,
$$\hat{\beta}_{GMM}(W_T) = \left( X'Z W_T Z'X \right)^{-1} X'Z W_T Z'Y,$$
$$y_t = x_t'\beta_0 + \epsilon_t, \qquad t = 1, 2, \ldots, T,$$
• To estimate the optimal weight matrix, $W_T^{opt} = S_T^{-1}$, we use the estimator
$$S_T = \frac{1}{T} \sum_{t=1}^{T} f(w_t, z_t, \hat{\theta}) f(w_t, z_t, \hat{\theta})' = \frac{1}{T} \sum_{t=1}^{T} \hat{\epsilon}_t^2 z_t z_t',$$
and the estimated variance of the efficient GMM estimator becomes
$$\hat{V}\left[ \hat{\beta}_{GMM} \right] = T^{-1} \left( \left( -T^{-1} \sum_{t=1}^{T} x_t z_t' \right) \left( T^{-1} \sum_{t=1}^{T} \hat{\epsilon}_t^2 z_t z_t' \right)^{-1} \left( -T^{-1} \sum_{t=1}^{T} z_t x_t' \right) \right)^{-1}$$
$$= \left( \sum_{t=1}^{T} z_t x_t' \right)^{-1} \left( \sum_{t=1}^{T} \hat{\epsilon}_t^2 z_t z_t' \right) \left( \sum_{t=1}^{T} x_t z_t' \right)^{-1}.$$
Q. What does the efficient GMM estimator, $\hat{\beta}_{GMM}(W_T^{opt})$, simplify to?
(A) $\hat{\beta}_{GMM}(W_T) = \left( Z'X Z'Z X'Z \right)^{-1} Z'X Z'Z X'Y$.
(B) $\hat{\beta}_{GMM}(W_T) = \left( X'Z(\hat{\sigma}^2 Z'Z) Z'X \right)^{-1} X'Z(\hat{\sigma}^2 Z'Z) Z'Y$.
(C) $\hat{\beta}_{GMM}(W_T) = \left( X'Z(Z'Z)^{-1}Z'X \right)^{-1} X'Z(Z'Z)^{-1}Z'Y$.
(D) $\hat{\beta}_{GMM}(W_T) = T^{-2} \left( X'Z(\hat{\sigma}^2 Z'Z)^{-1}Z'X \right)^{-1} X'Z(\hat{\sigma}^2 Z'Z)^{-1}Z'Y$.
(E) Don't know.
$$y_t = x_t'\beta_0 + \epsilon_t, \qquad t = 1, 2, \ldots, T,$$
• If we assume that the error terms are IID, the optimal weight matrix
simplifies to
$$S_T = \frac{\hat{\sigma}^2}{T} \sum_{t=1}^{T} z_t z_t' = T^{-1} \hat{\sigma}^2 Z'Z,$$
so that the efficient GMM estimator coincides with the 2SLS estimator.
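A minimal numerical check (not from the lecture) of this equivalence, assuming conformable NumPy arrays Y, X, Z; the scale $\hat{\sigma}^2$ cancels in the closed form, so any positive value can be used.

```python
# Minimal sketch: with W_T = (sigma2 * Z'Z)^{-1} the closed-form GMM estimator
# equals the 2SLS estimator (the scalar sigma2 cancels).
import numpy as np

def gmm_iid_equals_2sls(Y, X, Z, sigma2=1.0):
    W = np.linalg.inv(sigma2 * (Z.T @ Z))
    XZ = X.T @ Z
    beta_gmm = np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ Y))
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)              # first-stage fitted values
    beta_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ Y)  # regress Y on X_hat
    return np.allclose(beta_gmm, beta_2sls)
```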
$$y_t = x_t'\beta + \epsilon_t, \qquad t = 1, 2, \ldots, T,$$
Q. What are the asymptotic properties of $\hat{\beta}_{ML}$ if $\epsilon_t$ is i.i.d. but NOT normally
distributed?
$$y_t = \theta + \epsilon_t, \qquad \epsilon_t \sim \text{IID}(0, 1).$$
Robustness:
• ML: The first-order conditions should hold! Use the larger PML variance.
• GMM: The moment conditions should hold! Weights and variances can be made robust.
PML is a GMM interpretation of ML.