Econometrics II
Generalized Method of Moments
(GMM) Estimation
Morten Nyboe Tabor
University of Copenhagen, Department of Economics
Learning Outcomes
1 Explain why the OLS estimator is inconsistent for linear regression models
with endogenous regressors (e.g. when estimating Taylor rules). Explain
the role of instruments in these models.
2 Give an account of the principle of MM and GMM estimation. Explain
the notions of under-, exact, and over-identification.
3 Explain how the GMM estimator may be computed in practice.
4 Give an account of the assumptions needed to obtain consistency
and asymptotic normality of the GMM estimator.
5 Explain the role of the weight matrix.
6 Explain the notion of efficient GMM, and how this is obtained.
7 Explain the Hansen J-test.
8 Be able to determine the functions $f(w_t, z_t, \theta)$, $g(\theta)$, $g_T(\theta)$, and $Q_T(\theta)$
for specific models, such as the linear regression model (e.g. when
estimating a Taylor rule) and the C-CAPM. Explain how the functions are
computed in practice (if possible).
9 Explain the relationship between GMM and 2SLS.
10 Explain the notion of weak instruments and weak identification.
In the seminal paper, John Taylor (1993) suggested that the central bank sets
the short-term interest rate, $r_t$, according to a simple monetary policy rule in
which $r_t$ responds to inflation and the output gap.
Consider the monetary policy rule for the interest rate, $r_t$, given by the Taylor
rule with rational expectations:
$$r_t = \alpha_0 + \alpha_1 \left( E[\pi_{t+12} \mid I_t] - \pi^* \right) + \alpha_2 E[y_t \mid I_t].$$
In general, we do not observe the market's expectations of future inflation, and
the current output gap is also unobserved. Replacing the expectations with the
realized values gives the estimable equation
$$r_t = \alpha_0^* + \alpha_1 \pi_{t+12} + \alpha_2 y_t + u_t,$$
where
$$u_t = \alpha_1 \left( E[\pi_{t+12} \mid I_t] - \pi_{t+12} \right) + \alpha_2 \left( E[y_t \mid I_t] - y_t \right), \qquad (3)$$
and $\alpha_0^* = \alpha_0 - \alpha_1 \pi^*$.
$$\pi_{t+12} = E[\pi_{t+12} \mid I_t] + v_t, \qquad E[\pi_{t+12} \mid I_t] - \pi_{t+12} = -v_t, \qquad \text{where } E[v_t \mid I_t] = 0,$$
$$y_t = E[y_t \mid I_t] + w_t, \qquad E[y_t \mid I_t] - y_t = -w_t, \qquad \text{where } E[w_t \mid I_t] = 0.$$
• Assuming rational expectations implies: $E[u_t \mid I_t] = 0$.
• Introducing $R$ instruments, $z_t \in I_t$, we have the moment conditions
$$E[z_t \cdot u_t] = 0.$$
The moment conditions hold for all variables $z_t$ in the information set $I_t$!
• We need $R \geq 3$, as we have 3 parameters to estimate, $\theta = (\alpha_0^*, \alpha_1, \alpha_2)'$.
• For GMM estimation, we consider the sample moments
$$g_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} z_t \cdot \left( r_t - \alpha_0^* - \alpha_1 \pi_{t+12} - \alpha_2 y_t \right).$$
$$z_t = (1, r_{t-1}, r_{t-2}, \pi_{t-1}, \pi_{t-2}, y_{t-1}, y_{t-2}, b_{t-1}, b_{t-2}, x_{t-1}, x_{t-2})',$$
$$f(w_t, z_t, \theta) = \underbrace{u(w_t, \theta)}_{(1 \times 1)} \cdot \underbrace{z_t}_{(R \times 1)},$$
$$g(\theta_0) = E\left( u(w_t, \theta_0) \cdot z_t \right) = 0,$$
stating that the instruments are uncorrelated with the error term of the
model.
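As an illustration of how $g_T(\theta)$ and $Q_T(\theta)$ can be computed in practice, the following is a minimal sketch (not from the lecture) for the Taylor-rule moment conditions. The data arrays r, pi_lead ($\pi_{t+12}$), y and the instrument matrix Z are assumptions here, simulated purely for illustration, and an identity weight matrix is used in the first step.

```python
# Minimal sketch (not from the lecture): sample moments g_T(theta) and
# criterion Q_T(theta) for the Taylor-rule moment conditions.
import numpy as np
from scipy.optimize import minimize

def g_T(theta, r, pi_lead, y, Z):
    """g_T(theta) = (1/T) sum_t z_t * (r_t - a0 - a1*pi_{t+12} - a2*y_t)."""
    a0, a1, a2 = theta
    u = r - a0 - a1 * pi_lead - a2 * y   # u_t(theta), shape (T,)
    return Z.T @ u / len(u)              # shape (R,)

def Q_T(theta, r, pi_lead, y, Z, W):
    """GMM criterion Q_T(theta) = g_T(theta)' W g_T(theta)."""
    g = g_T(theta, r, pi_lead, y, Z)
    return g @ W @ g

# Illustration with simulated data (purely hypothetical numbers):
rng = np.random.default_rng(0)
T = 500
pi_lead = rng.normal(2.0, 1.0, T)
y = rng.normal(0.0, 1.0, T)
r = 1.0 + 1.5 * pi_lead + 0.5 * y + rng.normal(0.0, 0.5, T)
Z = np.column_stack([np.ones(T), pi_lead, y])   # R = 3 instruments (exogenous case)

W = np.eye(Z.shape[1])                          # first-step weight matrix
res = minimize(Q_T, x0=np.zeros(3), args=(r, pi_lead, y, Z, W), method="BFGS")
print(res.x)                                    # estimates of (a0*, a1, a2)
```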
where $f(y_t, \mu_0) = y_t - \mu_0$.
We found that, for a given sample $\{y_1, \ldots, y_T\}$, the ML estimator was the
sample average:
$$\hat{\lambda}_{ML} = \frac{1}{T} \sum_{t=1}^{T} y_t.$$
$$y_t = x_t' \beta_0 + \epsilon_t,$$
Q. What is the relevant expression for the moment condition, $g(\theta_0)$, and the
function, $f(w_t, z_t, \theta_0)$?
$$y_t = x_t' \beta_0 + \epsilon_t. \qquad (**)$$
$$g_T(\hat{\beta}) = \frac{1}{T} \sum_{t=1}^{T} x_t \left( y_t - x_t' \hat{\beta} \right) = \frac{1}{T} \sum_{t=1}^{T} x_t y_t - \frac{1}{T} \sum_{t=1}^{T} x_t x_t' \hat{\beta} = 0.$$
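Solving these $K$ sample moment conditions gives the OLS estimator. A minimal sketch (not from the lecture), assuming a regressor matrix X of shape (T, K) and an outcome vector Y:

```python
# Minimal sketch: the MM estimator solving (1/T) sum_t x_t (y_t - x_t' beta) = 0,
# i.e. the OLS estimator beta_hat = (X'X)^{-1} X'Y.
import numpy as np

def mm_ols(Y, X):
    return np.linalg.solve(X.T @ X, X.T @ Y)
```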
Example: Underidentification
$$y_t = x_t' \beta_0 + \epsilon_t = x_{1t}' \gamma_0 + x_{2t}' \delta_0 + \epsilon_t,$$
where the $K_1$ variables in $x_{1t}$ are uncorrelated with $\epsilon_t$, while the $K_2$ variables in
$x_{2t}$ are correlated with $\epsilon_t$. Assume that we have $K_2$ valid instruments, $z_{2t}$.
Finally, define $x_t = (x_{1t}', x_{2t}')'$ and $z_t = (x_{1t}', z_{2t}')'$.
Q. How can the instruments be used to derive a method of moments estimator
of β0 ?
(A) $g(\beta_0) = E(x_t \epsilon_t) = 0 \;\Rightarrow\; \hat{\beta}_{MM} = \left( \frac{1}{T} \sum_{t=1}^{T} x_t x_t' \right)^{-1} \frac{1}{T} \sum_{t=1}^{T} x_t y_t$.
(B) $g(\beta_0) = E(z_t \epsilon_t) = 0 \;\Rightarrow\; \hat{\beta}_{MM} = \left( \frac{1}{T} \sum_{t=1}^{T} z_t x_t' \right)^{-1} \frac{1}{T} \sum_{t=1}^{T} z_t y_t$.
(C) $g(\beta_0) = E(z_t \epsilon_t) = 0 \;\Rightarrow\; \hat{\beta}_{MM} = \left( \frac{1}{T} \sum_{t=1}^{T} z_t z_t' \right)^{-1} \frac{1}{T} \sum_{t=1}^{T} z_t y_t$.
(D) $g(\beta_0) = E(z_{2t} \epsilon_t) = 0 \;\Rightarrow\; \hat{\beta}_{MM} = \left( \frac{1}{T} \sum_{t=1}^{T} z_{2t} x_t' \right)^{-1} \frac{1}{T} \sum_{t=1}^{T} z_{2t} y_t$.
• Assume $K_2$ new variables, $z_{2t}$, that are correlated with $x_{2t}$ but
uncorrelated with $\epsilon_t$:
$$E(z_{2t} \epsilon_t) = 0.$$
These $K_2$ moment conditions can replace the invalid conditions for the
endogenous regressors, $E(x_{2t} \epsilon_t) = 0$. To simplify notation, we define
$$x_t = \begin{pmatrix} x_{1t} \\ x_{2t} \end{pmatrix} \quad (K \times 1) \qquad \text{and} \qquad z_t = \begin{pmatrix} x_{1t} \\ z_{2t} \end{pmatrix} \quad (K \times 1).$$
Here $x_t$ contains the model variables, $z_{2t}$ the new instruments, and $z_t$ the full
vector of instruments. We say that $x_{1t}$ are instruments for themselves.
$$g_T(\hat{\beta}) = \frac{1}{T} \sum_{t=1}^{T} z_t \left( y_t - x_t' \hat{\beta} \right) = 0.$$
Solving for $\hat{\beta}$ gives
$$\hat{\beta}_{MM} = \left( \frac{1}{T} \sum_{t=1}^{T} z_t x_t' \right)^{-1} \frac{1}{T} \sum_{t=1}^{T} z_t y_t,$$
provided that $\frac{1}{T} \sum_{t=1}^{T} z_t x_t'$ is non-singular.
• Note the following:
1 We need the instruments to identify the parameters.
2 The MM estimator coincides with the simple IV estimator.
3 The procedure only works with $K_2$ new instruments (i.e., $R = K$).
4 Non-singularity of $\sum_{t=1}^{T} z_t x_t'$ requires relevant instruments.
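For reference, a minimal sketch (not from the lecture) of the simple IV/MM estimator in the exactly identified case, assuming NumPy arrays Y (T,), X (T × K) and Z (T × K):

```python
# Minimal sketch: the MM / simple IV estimator beta_hat = (Z'X)^{-1} Z'Y
# for the exactly identified case R = K; requires Z'X to be non-singular.
import numpy as np

def iv_mm(Y, X, Z):
    return np.linalg.solve(Z.T @ X, Z.T @ Y)
```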
$$\hat{\theta}_{GMM}(W_T) = \arg\min_{\theta} \; g_T(\theta)' W_T g_T(\theta).$$
Q. Which of the three assumptions are needed for consistency of the GMM
estimator? (And, more importantly, why?)
(A) Consistency requires Assumption 0.
• Assume that a law of large numbers (LLN) applies to $f(w_t, z_t, \theta)$, i.e.,
$$T^{-1} \sum_{t=1}^{T} f(w_t, z_t, \theta) \xrightarrow{p} E\left( f(w_t, z_t, \theta) \right) \quad \text{for } T \to \infty.$$
$$\sqrt{T} \cdot g_T(\theta_0) = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} f(w_t, z_t, \theta_0) \xrightarrow{D} N(0, S),$$
where
$$D = E\left( \left. \frac{\partial f(w_t, z_t, \theta)}{\partial \theta'} \right|_{\theta = \theta_0} \right).$$
Recall that the GMM estimator depends on the symmetric and positive
definite weight matrix $W_T$.
Q. Why does the choice of weight matrix matter?
(A) Because the asymptotic variance of the GMM estimator depends on the
weight matrix.
(B) Because the GMM estimator is only consistent for some weight matrices.
(C) Because the asymptotic variance of the sample moments depends on the
weight matrix.
(D) Because the GMM estimator is only asymptotically normal for some
weight matrices.
(E) Don’t know.
5. Efficient GMM
Socrative Question 9
Under Assumptions 1 and 2, the asymptotic distribution of $\hat{\theta}_{GMM}$ is
$$\sqrt{T}(\hat{\theta} - \theta_0) \xrightarrow{D} N(0, V),$$
where
$$V = (D' W D)^{-1} D' W S W D (D' W D)^{-1},$$
and $S$ is the asymptotic variance of the sample moment conditions and $D$ is
the expected value of the $R \times K$ matrix of first derivatives of $f(w_t, z_t, \theta)$.
$W$ is an $(R \times R)$ positive definite weight matrix which attaches different
weights to the $R$ sample moments.
Q. Efficient GMM uses the optimal weight matrix, $W_T^{opt}$, which minimizes the
asymptotic variance $V$. How is that achieved?
• The variance of $\hat{\theta}_{GMM}$ depends on the weight matrix, $W_T$.
The efficient GMM estimator has the smallest possible (asymptotic)
variance.
• Intuition: a moment with small variance is informative and should have
large weight.
• It can be shown that the optimal weight matrix, $W_T^{opt}$, has the property
that
$$\text{plim} \, W_T^{opt} = S^{-1}.$$
With the optimal weight matrix, $W = S^{-1}$, the asymptotic variance
simplifies to
$$V = \left( D' S^{-1} D \right)^{-1} D' S^{-1} S S^{-1} D \left( D' S^{-1} D \right)^{-1} = \left( D' S^{-1} D \right)^{-1}.$$
Estimation of the weight matrix is typically the most tricky part of GMM.
Computational Issues
We need an optimal weight matrix, $W_T^{opt}$, but that depends on the parameters!
Two-step efficient GMM:
1 Choose an initial weight matrix, e.g. $W_{[1]} = I_R$, and find a consistent but
inefficient first-step GMM estimator,
$$\hat{\theta}_{[1]} = \arg\min_{\theta} \; g_T(\theta)' W_{[1]} g_T(\theta).$$
2 Find the optimal weight matrix, $W_{[2]}^{opt}$, based on $\hat{\theta}_{[1]}$. Find the efficient
estimator
$$\hat{\theta}_{[2]} = \arg\min_{\theta} \; g_T(\theta)' W_{[2]}^{opt} g_T(\theta).$$
The two-step estimator is not unique, as it depends on the initial weight matrix $W_{[1]}$.
Iterated GMM:
• From the estimator $\hat{\theta}_{[2]}$ it is natural to update the weights, $W_{[3]}^{opt}$, and
update $\hat{\theta}_{[3]}$.
We can switch between estimating $W_{[\cdot]}^{opt}$ and $\hat{\theta}_{[\cdot]}$ until convergence.
• Iterated GMM does not depend on the initial weight matrix.
The two approaches are asymptotically equivalent.
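A minimal sketch (not from the lecture) of two-step efficient GMM in the linear IV model, assuming NumPy arrays Y (T,), X (T × K) and Z (T × R) with R ≥ K; the closed-form expression used here is derived later in the lecture.

```python
# Minimal sketch: two-step efficient GMM in the linear IV model
# y_t = x_t' beta + e_t with instruments z_t.
import numpy as np

def linear_gmm(Y, X, Z, W):
    """Closed-form GMM estimator (X'Z W Z'X)^{-1} X'Z W Z'Y."""
    XZ = X.T @ Z                                   # (K x R)
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ Y))

def two_step_gmm(Y, X, Z):
    T, R = Z.shape
    # Step 1: consistent but inefficient estimator with W_[1] = I_R.
    beta_1 = linear_gmm(Y, X, Z, np.eye(R))
    # Step 2: estimate S_T = (1/T) sum_t e_t^2 z_t z_t' and use W_[2] = S_T^{-1}.
    e = Y - X @ beta_1
    S_T = (Z * (e ** 2)[:, None]).T @ Z / T
    W_opt = np.linalg.inv(S_T)
    return linear_gmm(Y, X, Z, W_opt)
```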
Continuously updated GMM estimator:
• A third approach is to recognize from the outset that the weight matrix
depends on the parameters, and minimize
$$Q_T(\theta) = g_T(\theta)' S_T(\theta)^{-1} g_T(\theta)$$
directly over $\theta$.
Empirical example: The Taylor rule is estimated using iterated GMM with HAC
standard errors and the set of instruments
$$z_t = (1, r_t, \ldots, r_{t-6}, \pi_t, \pi_{t-1}, \ldots, \pi_{t-6}, y_{t-1}, \ldots, y_{t-6})'.$$
The figure shows the interest rate (red line), $r_t$, and the predicted interest rate
(blue line),
$$\hat{r}_t = 1.12 + 1.19 \cdot \pi_{t+12} + 0.49 \cdot y_t.$$
[Figure: actual and predicted interest rates, $r_t$ and $\hat{r}_t$, over the sample.]
The consumer's first-order condition (Euler equation) is
$$u'(c_t) = E\left[ \delta \cdot u'(c_{t+1}) \cdot R_{t+1} \mid I_t \right],$$
where $u'(\cdot)$ is the derivative of the utility function, and $R_{t+1} = 1 + r_{t+1}$ is the return factor.
Now assume a constant relative risk aversion (CRRA) utility function:
$$u(c_t) = \frac{c_t^{1-\gamma}}{1-\gamma}, \qquad \gamma < 1.$$
Q. Which moment conditions can be used for GMM estimation of $\theta = (\delta, \gamma)'$?
(A) $c_t^{-\gamma} - E\left[ \delta \cdot c_{t+1}^{-\gamma} \cdot R_{t+1} \, z_t \right] = 0$.
(B) $E\left[ \delta \cdot \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} \cdot R_{t+1} \, z_t \right] = 0$.
(C) $E\left[ \left( \delta \cdot \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} \cdot R_{t+1} - 1 \right) z_t \right] = 0$.
(D) $E\left[ \delta \cdot c_{t+1}^{-\gamma} \cdot R_{t+1} \, z_t \right] = 0$.
(E) Don't know.
(And how many instruments do we need? And which ones would you suggest?)
$$u(c_t) = \frac{c_t^{1-\gamma}}{1-\gamma}, \qquad \gamma < 1,$$
$$c_t^{-\gamma} - E\left[ \delta \cdot c_{t+1}^{-\gamma} \cdot R_{t+1} \mid I_t \right] = 0.$$
$$f(w_t, z_t; \theta) = u(w_t; \theta) \cdot z_t,$$
$$u(w_t; \theta) = \delta \cdot \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} \cdot R_{t+1} - 1.$$
$$g(\theta_0) = E\left[ f(w_t, z_t; \theta_0) \right] = 0,$$
i.e.
$$E\left[ \left( \delta_0 \cdot \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma_0} \cdot R_{t+1} - 1 \right) \cdot 1 \right] = 0$$
$$E\left[ \left( \delta_0 \cdot \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma_0} \cdot R_{t+1} - 1 \right) \cdot \frac{c_t}{c_{t-1}} \right] = 0$$
$$E\left[ \left( \delta_0 \cdot \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma_0} \cdot R_{t+1} - 1 \right) \cdot R_t \right] = 0,$$
for $t = 1, 2, \ldots, T$.
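As a minimal sketch (not from the lecture) of how these sample moments could be computed, assume aligned NumPy arrays c_growth ($c_{t+1}/c_t$), R_next ($R_{t+1}$) and an instrument matrix Z whose columns are, e.g., a constant, lagged consumption growth and the lagged return factor:

```python
# Minimal sketch: C-CAPM sample moments for theta = (delta, gamma) with CRRA utility.
import numpy as np

def ccapm_moments(theta, c_growth, R_next, Z):
    """g_T(theta) = (1/T) sum_t (delta*(c_{t+1}/c_t)^(-gamma)*R_{t+1} - 1) * z_t."""
    delta, gamma = theta
    u = delta * c_growth ** (-gamma) * R_next - 1.0   # disturbance u(w_t; theta)
    return Z.T @ u / len(u)
```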
• Given a choice of weight matrix, $W_T$, the GMM estimator is found by
minimizing $Q_T(\theta) = g_T(\theta)' W_T g_T(\theta)$ numerically over $\theta = (\delta, \gamma)'$.
• In the actual data, the variables $c_{t+1}/c_t$ and $R_{t+1}$ are approximately 1 for all $t$.
Hence the variation is small.
• Consequently, if $\delta = 1$,
$$u(w_t; \theta) = \delta \cdot \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} \cdot R_{t+1} - 1 \approx \delta \cdot (1)^{-\gamma} \cdot 1 - 1 \approx 0$$
for all $\gamma$! The moment conditions are then close to zero whatever the value of
$\gamma$, so $\gamma$ is only weakly identified.
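A minimal numerical sketch (not from the lecture) of this weak-identification problem, using simulated consumption growth and return factors close to one; the names and magnitudes below are purely illustrative assumptions.

```python
# Minimal sketch: with delta = 1 and c_{t+1}/c_t and R_{t+1} close to one,
# the sample mean of u(w_t; theta) stays small over a wide range of gamma,
# so the moment conditions carry little information about gamma.
import numpy as np

rng = np.random.default_rng(1)
T = 200
c_growth = 1.0 + 0.005 * rng.standard_normal(T)   # c_{t+1}/c_t near 1
R_next = 1.002 + 0.01 * rng.standard_normal(T)    # R_{t+1} near 1

for gamma in (0.5, 2.0, 10.0, 50.0):
    u = 1.0 * c_growth ** (-gamma) * R_next - 1.0  # delta = 1
    print(gamma, u.mean())                          # all small in magnitude
```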
(A) $S = \frac{1}{T} \sum_{t=1}^{T} E\left( \epsilon_t^2 z_t z_t' \right)$.
(B) $S = \frac{1}{T} \sum_{t=1}^{T} V\left( \epsilon_t^2 z_t z_t' \right)$.
(C) $S = \frac{1}{T} \sum_{t=1}^{T} E\left( z_t \epsilon_t \right)$.
(D) $S = \frac{1}{T} \sum_{t=1}^{T} \epsilon_t^2 E\left( z_t z_t' \right)$.
(E) Don't know.
A natural estimator is
$$S_T = \frac{1}{T} \sum_{t=1}^{T} f_t^2. \qquad (*)$$
Define the autocovariances
$$\gamma_j = \frac{1}{T} \sum_{t=j+1}^{T} E\left( f_t \cdot f_{t-j} \right).$$
We can write $S$ as
$$S = \gamma_0 + 2 \cdot \gamma_1 + 2 \cdot \gamma_2 + 2 \cdot \gamma_3 + \cdots + 2 \cdot \gamma_{T-1} = \gamma_0 + 2 \cdot \sum_{j=1}^{T-1} \gamma_j,$$
with the natural estimators
$$\hat{\gamma}_j = \frac{1}{T} \sum_{t=j+1}^{T} f_t \cdot f_{t-j}.$$
Truncating at lag $q - 1$ gives
$$S_T = \hat{\gamma}_0 + 2 \cdot \sum_{j=1}^{q-1} \hat{\gamma}_j.$$
[Figure: Bartlett kernel weights, $1 - |j|/q$, plotted against lags $-8, \ldots, 8$ for $q = 6$ (weights $5/6, 4/6, \ldots, 1/6, 0$).]
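A minimal sketch (not from the lecture) of such a long-run variance estimate for a scalar moment series, here with the Bartlett weights $w_j = 1 - j/q$ shown in the figure (a Newey-West-type estimator); the array f of sample moments, evaluated at the estimated parameters, is an assumed input.

```python
# Minimal sketch: HAC (Newey-West-type) estimate of the long-run variance S
# of a scalar moment series f_t, using Bartlett weights w_j = 1 - j/q.
import numpy as np

def hac_long_run_variance(f, q):
    T = len(f)
    S = np.sum(f * f) / T                       # hat(gamma)_0
    for j in range(1, q):
        gamma_j = np.sum(f[j:] * f[:-j]) / T    # hat(gamma)_j = (1/T) sum f_t f_{t-j}
        S += 2.0 * (1.0 - j / q) * gamma_j      # Bartlett weight w_j = 1 - j/q
    return S
```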
$$Y = X\beta + \epsilon.$$
Step 1. Regress $X$ on $Z$:
$$X = Z\gamma + U,$$
$$\hat{\gamma} = \; ???$$
$$\hat{X} = Z\hat{\gamma} = \; ???$$
Step 2. Regress $Y$ on $\hat{X}$:
$$Y = \hat{X} B + E,$$
$$\hat{B} = (\hat{X}'\hat{X})^{-1} \hat{X}' Y = \; ???$$
Q. What are $\hat{\gamma}$ and $\hat{B}$?
(A) $\hat{\gamma} = (ZZ')^{-1} Z Y$ and $\hat{B} = \left( X'Z(Z'Z)^{-1}Z'X \right)^{-1} X'Z(Z'Z)^{-1}Z'Y$.
(B) $\hat{\gamma} = (Z'Z)^{-1} Z'Y$ and $\hat{B} = \left( Z'X(X'X)^{-1}X'Z \right)^{-1} Z'X(X'X)^{-1}X'Y$.
(C) $\hat{\gamma} = (Z'Z)^{-1} Z'Y$ and $\hat{B} = \left( XZ'(Z'Z)^{-1}ZX' \right)^{-1} XZ'(Z'Z)^{-1}ZY$.
(D) $\hat{\gamma} = (Z'Z)^{-1} Z'X$ and $\hat{B} = \left( X'Z(Z'Z)^{-1}Z'X \right)^{-1} X'Z(Z'Z)^{-1}Z'Y$.
(E) Don't know.
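A minimal sketch (not from the lecture) of the two OLS steps, assuming NumPy arrays Y (T,), X (T × K) and Z (T × R) with R ≥ K:

```python
# Minimal sketch: 2SLS computed literally in two OLS steps.
import numpy as np

def two_stage_least_squares(Y, X, Z):
    # Step 1: regress X on Z and form the fitted values X_hat = Z * gamma_hat.
    gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)          # (Z'Z)^{-1} Z'X
    X_hat = Z @ gamma_hat
    # Step 2: regress Y on X_hat.
    return np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ Y)   # (X_hat'X_hat)^{-1} X_hat'Y
```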
$$y_t = x_t'\beta_0 + \epsilon_t = x_{1t}'\gamma_0 + x_{2t}'\delta_0 + \epsilon_t,$$
$$\underbrace{g_T(\beta)}_{(R \times 1)} = \frac{1}{T} \sum_{t=1}^{T} z_t \left( y_t - x_t'\beta \right) = \frac{1}{T} Z'(Y - X\beta),$$
Q. What is the closed-form expression for $\hat{\beta}_{GMM}(W_T)$?
(A) $\hat{\beta}_{GMM}(W_T) = \left( Z'X W_T X'Z \right)^{-1} Z'X W_T X'Y$.
(B) $\hat{\beta}_{GMM}(W_T) = -2T^{-2} \left( X'Z W_T Z'X \right)^{-1} X'Z W_T Z'Y$.
(C) $\hat{\beta}_{GMM}(W_T) = \left( X'Z W_T Z'X \right)^{-1} X'Z W_T Z'Y$.
(D) $\hat{\beta}_{GMM}(W_T) = \left( X'Z(Z'Z)^{-1}Z'X \right)^{-1} X'Z(Z'Z)^{-1}Z'Y$.
(E) Don't know.
From Lecture Note 3
Let $A$ be an $(n \times k)$ matrix, $V$ a $(k \times k)$ matrix, and $\beta$ a $(k \times 1)$ vector of
parameters. Then
$$\frac{\partial (\beta' A')}{\partial \beta} = A'. \qquad (7*)$$
$$\frac{\partial (\beta' V \beta)}{\partial \beta} = (V + V')\beta. \qquad (8*)$$
• With $Q_T(\beta) = g_T(\beta)' W_T g_T(\beta) = T^{-2} (Y - X\beta)' Z W_T Z' (Y - X\beta)$,
we take the first derivative, and the GMM estimator is the solution to
$$\frac{\partial Q_T(\beta)}{\partial \beta} = -2T^{-2} X'Z W_T Z'Y + 2T^{-2} X'Z W_T Z'X \beta = 0.$$
• We find the general closed-form GMM estimator in the linear model,
$$\hat{\beta}_{GMM}(W_T) = \left( X'Z W_T Z'X \right)^{-1} X'Z W_T Z'Y,$$
$$y_t = x_t'\beta_0 + \epsilon_t, \qquad t = 1, 2, \ldots, T,$$
• To estimate the optimal weight matrix, $W_T^{opt} = S_T^{-1}$, we use the estimator
$$S_T = \frac{1}{T} \sum_{t=1}^{T} f(w_t, z_t, \hat{\theta}) f(w_t, z_t, \hat{\theta})' = \frac{1}{T} \sum_{t=1}^{T} \hat{\epsilon}_t^2 z_t z_t',$$
and the estimated variance of the efficient GMM estimator becomes
$$\hat{V}\left[ \hat{\beta}_{GMM} \right] = T^{-1} \left( \left( -T^{-1} \sum_{t=1}^{T} x_t z_t' \right) \left( T^{-1} \sum_{t=1}^{T} \hat{\epsilon}_t^2 z_t z_t' \right)^{-1} \left( -T^{-1} \sum_{t=1}^{T} z_t x_t' \right) \right)^{-1}$$
$$= \left( \sum_{t=1}^{T} z_t x_t' \right)^{-1} \left( \sum_{t=1}^{T} \hat{\epsilon}_t^2 z_t z_t' \right) \left( \sum_{t=1}^{T} x_t z_t' \right)^{-1}.$$
Q. What does the efficient GMM estimator, $\hat{\beta}_{GMM}(W_T^{opt})$, simplify to?
(A) $\hat{\beta}_{GMM}(W_T) = \left( Z'X Z'Z X'Z \right)^{-1} Z'X Z'Z X'Y$.
(B) $\hat{\beta}_{GMM}(W_T) = \left( X'Z(\hat{\sigma}^2 Z'Z) Z'X \right)^{-1} X'Z(\hat{\sigma}^2 Z'Z) Z'Y$.
(C) $\hat{\beta}_{GMM}(W_T) = \left( X'Z(Z'Z)^{-1}Z'X \right)^{-1} X'Z(Z'Z)^{-1}Z'Y$.
(D) $\hat{\beta}_{GMM}(W_T) = T^{-2} \left( X'Z(\hat{\sigma}^2 Z'Z)^{-1}Z'X \right)^{-1} X'Z(\hat{\sigma}^2 Z'Z)^{-1}Z'Y$.
(E) Don't know.
$$y_t = x_t'\beta_0 + \epsilon_t, \qquad t = 1, 2, \ldots, T,$$
• If we assume that the error terms are IID, the optimal weight matrix
simplifies to
$$S_T = \frac{\hat{\sigma}^2}{T} \sum_{t=1}^{T} z_t z_t' = T^{-1} \hat{\sigma}^2 Z'Z,$$
so that the efficient GMM estimator coincides with the 2SLS estimator.
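A minimal numerical check (not from the lecture) of this equivalence, assuming conformable NumPy arrays Y, X, Z; the scale $\hat{\sigma}^2$ cancels in the closed form, so any positive value can be used.

```python
# Minimal sketch: with W_T = (sigma2 * Z'Z)^{-1} the closed-form GMM estimator
# equals the 2SLS estimator (the scalar sigma2 cancels).
import numpy as np

def gmm_iid_equals_2sls(Y, X, Z, sigma2=1.0):
    W = np.linalg.inv(sigma2 * (Z.T @ Z))
    XZ = X.T @ Z
    beta_gmm = np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ Y))
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)              # first-stage fitted values
    beta_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ Y)  # regress Y on X_hat
    return np.allclose(beta_gmm, beta_2sls)
```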
$$y_t = x_t'\beta + \epsilon_t, \qquad t = 1, 2, \ldots, T,$$
Q. What are the asymptotic properties of $\hat{\beta}_{ML}$ if $\epsilon_t$ is i.i.d. but NOT normally
distributed?
$$y_t = \theta + \epsilon_t, \qquad \epsilon_t \sim \text{IID}(0, 1).$$
Robustness:
• ML: The first-order conditions should hold! Use the larger PML variance.
• GMM: The moment conditions should hold! Weights and variances can be made robust.
PML is a GMM interpretation of ML.