
university of copenhagen department of economics

Econometrics II
Generalized Method of Moments
(GMM) Estimation
Morten Nyboe Tabor

Learning Outcomes
1 Explain why the OLS estimator is inconsistent for linear regression models
with endogenous regressors (e.g. when estimating Taylor rules). Explain
the role of instruments in these models.
2 Give an account of the principle of MM and GMM estimation. Explain
the notion of under-, exact, and over-identification.
3 Explain how the GMM estimator may be computed in practice.
4 Give an account of the assumptions needed in order to obtain consistency
and asymptotic normality of the GMM estimator.
5 Explain the role of the weight matrix.
6 Explain the notion of efficient GMM, and how this is obtained.
7 Explain the Hansen J-test.
8 Be able to determine the functions $f(w_t, z_t, \theta)$, $g(\theta)$, $g_T(\theta)$, and $Q_T(\theta)$
for specific models, such as the linear regression model (e.g. when
estimating a Taylor rule) and the C-CAPM. Explain how the functions are
computed in practice (if possible).
9 Explain the relationship between GMM and 2SLS.
10 Explain the notion of weak instruments and weak identification.

Econometrics II — Generalized Method of Moments — Slide 2/52



Course Outline: Generalized Method of Moments


1 Example: Estimation of a Taylor Rule
2 Introduction to GMM
3 Method of Moments (MM) Estimation
Principle
Examples: MM Estimator of the Mean, OLS, Underidentification and IV
4 Generalized Method of Moments (GMM) Estimation
Principle
Properties: Consistency and Asymptotic Distribution
5 Efficient GMM
Principle
Computational Issues
Test of Overidentifying Moment Conditions
6 Examples
The C-CAPM Model
Weight Matrix Estimation
2SLS
7 Pseudo-Maximum Likelihood Estimation
Pseudo-ML (PML) Estimation
8 Concluding Remarks
Comparison ML/GMM
1. Example: Estimation of a Taylor Rule

Example: The Taylor Rule

In his seminal paper, John Taylor (1993) suggested that the central bank sets
the short-term interest rate, $r_t$, according to the simple monetary policy rule

$$r_t = \alpha_0 + \alpha_1 \cdot E[\pi_{t+12} - \pi^* \mid I_t] + \alpha_2 \cdot E[y_t \mid I_t]. \qquad (1)$$

• $E[\cdot \mid I_t]$ denotes the rational expectation conditional on the information
set available at time $t$.
• $\pi_t$ is the current inflation (year-on-year).
• $\pi^*$ is the constant inflation target.
• $y_t$ is the output gap.


Example: The Interpretation

In the Taylor rule (1), the coefficients have a direct interpretation:

• $\alpha_1 > 1$, so that the nominal rate rises more than one-for-one with expected inflation and the real interest rate increases.
• $\alpha_2 > 0$, in order to cool the economy when the output gap is positive.
• Values suggested in Taylor’s original paper: $\alpha_1 = 1.5$ and $\alpha_2 = 0.5$.



Socrative Question 1

Consider the monetary policy rule for the interest rate, $r_t$, given by the Taylor
rule with rational expectations:

$$r_t = \alpha_0 + \alpha_1 \cdot E[\pi_{t+12} - \pi^* \mid I_t] + \alpha_2 \cdot E[y_t \mid I_t]. \qquad (1)$$

We want to estimate the coefficients $\theta = (\alpha_0, \alpha_1, \alpha_2)'$. Because we do not
observe the expected inflation $E[\pi_{t+12} \mid I_t]$ and the expected output gap $E[y_t \mid I_t]$,
we consider the model

$$r_t = \alpha_0^* + \alpha_1 \pi_{t+12} + \alpha_2 y_t + u_t, \qquad (2)$$

where $\alpha_0^* = \alpha_0 - \alpha_1 \pi^*$ and $u_t$ contains the forecast errors.

Q. Is the OLS estimator $\hat\theta_{OLS}$ in (2) consistent?

(A) $\hat\theta_{OLS}$ is consistent as the moment condition is fulfilled.
(B) $\hat\theta_{OLS}$ is consistent if the rational expectations are correct on average.
(C) $\hat\theta_{OLS}$ is inconsistent as the moment condition is violated.
(D) $\hat\theta_{OLS}$ is inconsistent as (2) contains the future variable $\pi_{t+12}$.
(E) Don’t know.

Example: OLS Estimation

In general, we do not observe the market’s expectations of future inflation and
of the current output gap (which is itself not observed).

• Replace the expectations with the observed values:

$$r_t = \alpha_0^* + \alpha_1 \pi_{t+12} + \alpha_2 y_t + u_t, \qquad (2)$$

where

$$u_t = \alpha_1 \left( E[\pi_{t+12} \mid I_t] - \pi_{t+12} \right) + \alpha_2 \left( E[y_t \mid I_t] - y_t \right), \qquad (3)$$

and $\alpha_0^* = \alpha_0 - \alpha_1 \pi^*$.

• As the explanatory variables, $\pi_{t+12}$ and $y_t$, are correlated with $u_t$, the
moment conditions required for OLS are violated: $E[\pi_{t+12} u_t] \neq 0$ and $E[y_t u_t] \neq 0$.

Hence, the OLS estimator is inconsistent.
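
The inconsistency can be illustrated with a small simulation. The sketch below is not part of the lecture: it assumes a stripped-down rule $r_t = \alpha_1 E[\pi_{t+12} \mid I_t]$ without intercept and output gap, standard normal expected inflation and forecast errors, and hypothetical helper names (`simulate_rule`, `ols_slope`). Under these assumptions the OLS slope converges to $\alpha_1 / 2 = 0.75$ rather than to $\alpha_1 = 1.5$.

```python
import random


def simulate_rule(T=50000, alpha1=1.5, seed=1):
    """Simulate r_t = alpha1 * E[pi | I_t] together with realized inflation."""
    rng = random.Random(seed)
    r, pi_realized = [], []
    for _ in range(T):
        expected = rng.gauss(0.0, 1.0)        # E[pi_{t+12} | I_t], known at time t
        forecast_error = rng.gauss(0.0, 1.0)  # v_t, unknown at time t
        r.append(alpha1 * expected)
        pi_realized.append(expected + forecast_error)
    return r, pi_realized


def ols_slope(y, x):
    """OLS slope in a regression without intercept: sum(x*y) / sum(x*x)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


r, pi_realized = simulate_rule()
alpha1_ols = ols_slope(r, pi_realized)
print(f"OLS estimate of alpha1: {alpha1_ols:.3f} (true value 1.5)")
```

The regressor contains the forecast error, which also enters the error term with the opposite sign, so the estimate is pulled towards zero no matter how large the sample is.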



Socrative Question 2
Consider the Taylor rule in terms of observable variables:

$$r_t = \alpha_0^* + \alpha_1 \pi_{t+12} + \alpha_2 y_t + u_t, \qquad (2)$$

where $\alpha_0^* = \alpha_0 - \alpha_1 \pi^*$ and

$$u_t = \alpha_1 \left( E[\pi_{t+12} \mid I_t] - \pi_{t+12} \right) + \alpha_2 \left( E[y_t \mid I_t] - y_t \right). \qquad (3)$$

We can always decompose $\pi_{t+12}$ (or $y_t$) into a conditional expectation given $I_t$
and a forecast error, which is orthogonal to $I_t$:

$$\pi_{t+12} = E[\pi_{t+12} \mid I_t] + v_t, \quad \text{where } E[v_t \mid I_t] = 0.$$

Q. If the model is correct, what is the implication for the conditional
expectation $E[u_t \mid I_t]$?

(A) $E[u_t \mid I_t] = 0$.
(B) $E[u_t \mid I_t] = \varepsilon_t$.
(C) $E[u_t \mid I_t] = \alpha_2 \left( E[y_t \mid I_t] - y_t \right)$.
(D) $E[u_t \mid I_t] = \alpha_2$.
(E) Don’t know.

Example: Rational Expectations


The Taylor rule in terms of observable variables:

$$r_t = \alpha_0^* + \alpha_1 \pi_{t+12} + \alpha_2 y_t + u_t, \qquad (2)$$

where

$$u_t = \alpha_1 \left( E[\pi_{t+12} \mid I_t] - \pi_{t+12} \right) + \alpha_2 \left( E[y_t \mid I_t] - y_t \right), \qquad (3)$$

and $\alpha_0^* = \alpha_0 - \alpha_1 \pi^*$.

Under rational expectations (model-consistent expectations), the time-$t$
expectation of $\pi_{t+12}$ is represented as the conditional expectation of $\pi_{t+12}$
given the information set $I_t$.
• We can always decompose $\pi_{t+12}$ (or $y_t$) into a conditional expectation
given $I_t$ and a forecast error, which is orthogonal to $I_t$:

$$\pi_{t+12} = E[\pi_{t+12} \mid I_t] + v_t, \quad E[\pi_{t+12} \mid I_t] - \pi_{t+12} = -v_t, \quad \text{where } E[v_t \mid I_t] = 0,$$
$$y_t = E[y_t \mid I_t] + w_t, \quad E[y_t \mid I_t] - y_t = -w_t, \quad \text{where } E[w_t \mid I_t] = 0.$$

• Consequently, under rational expectations the expectation error, $u_t$, is
orthogonal to the information set $I_t$ and has conditional expectation zero:

$$u_t = -\alpha_1 v_t - \alpha_2 w_t, \quad \text{where } E[u_t \mid I_t] = 0.$$



Socrative Question 3
Consider the Taylor rule in terms of observable variables:

$$r_t = \alpha_0^* + \alpha_1 \pi_{t+12} + \alpha_2 y_t + u_t, \qquad (2)$$

where

$$u_t = \alpha_1 \left( E[\pi_{t+12} \mid I_t] - \pi_{t+12} \right) + \alpha_2 \left( E[y_t \mid I_t] - y_t \right). \qquad (3)$$

By introducing a set of $R$ instruments, denoted $z_t$, we can estimate the
parameters from the moment condition:

$$E[z_t \cdot u_t] = 0.$$

Q. What is required for the instruments $z_t$ to be valid instruments? (And why?)

(A) They must be in the information set at time $t$, $z_t \in I_t$.
(B) They must be in the information set at time $t$, $z_t \in I_t$, and correlated
with $r_t$.
(C) They must be in the information set at time $t$, $z_t \in I_t$, and correlated
with $\pi_{t+12}$ and $y_t$.
(D) They must be correlated with the forecast errors $E[\pi_{t+12} \mid I_t] - \pi_{t+12}$ and
$E[y_t \mid I_t] - y_t$.
(E) Don’t know.
Bonus question: What are natural candidates for the instruments $z_t$?

Example: Introducing Instruments


The Taylor rule in terms of observable variables:

$$r_t = \alpha_0^* + \alpha_1 \pi_{t+12} + \alpha_2 y_t + u_t, \qquad (2)$$

where

$$u_t = \alpha_1 \left( E[\pi_{t+12} \mid I_t] - \pi_{t+12} \right) + \alpha_2 \left( E[y_t \mid I_t] - y_t \right), \qquad (3)$$

and $\alpha_0^* = \alpha_0 - \alpha_1 \pi^*$.
• Assuming rational expectations implies $E[u_t \mid I_t] = 0$.
• Introducing $R$ instruments, $z_t \in I_t$, we have the moment conditions:

$$E[z_t \cdot u_t] = E[z_t \cdot (r_t - \alpha_0^* - \alpha_1 \pi_{t+12} - \alpha_2 y_t)] = 0.$$

The moment conditions hold for all variables $z_t$ in the information set $I_t$!
• We need $R \geq 3$, as we have 3 parameters to estimate, $\theta = (\alpha_0^*, \alpha_1, \alpha_2)'$.
• For GMM estimation, we consider the sample moments

$$g_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} z_t \cdot (r_t - \alpha_0^* - \alpha_1 \pi_{t+12} - \alpha_2 y_t).$$
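
As an illustration of how the sample moments could be evaluated for a candidate parameter vector, the following sketch computes $g_T(\theta)$ directly from its definition. The function name `taylor_sample_moments` and the tiny data set are made up for the example and are not taken from the lecture.

```python
def taylor_sample_moments(theta, r, pi_lead, y, Z):
    """Return g_T(theta) = (1/T) * sum_t z_t * u_t(theta) as a list of length R.

    theta   : (alpha0_star, alpha1, alpha2)
    r       : interest rates r_t
    pi_lead : realized inflation pi_{t+12}, aligned with r_t
    y       : output gaps y_t
    Z       : list of instrument vectors z_t, each of length R
    """
    a0, a1, a2 = theta
    T = len(r)
    R = len(Z[0])
    g = [0.0] * R
    for t in range(T):
        u_t = r[t] - a0 - a1 * pi_lead[t] - a2 * y[t]  # residual of equation (2)
        for j in range(R):
            g[j] += Z[t][j] * u_t / T
    return g


# Made-up data: T = 2 observations and R = 3 instruments (the first is a constant).
r = [2.0, 3.0]
pi_lead = [1.0, 2.0]
y = [0.0, 1.0]
Z = [[1.0, 0.5, 2.0], [1.0, 1.5, 4.0]]

g_zero = taylor_sample_moments((1.0, 1.0, 0.0), r, pi_lead, y, Z)
g_other = taylor_sample_moments((0.0, 1.0, 0.0), r, pi_lead, y, Z)
print(g_zero, g_other)
```

For the first parameter vector the residuals are exactly zero, so all sample moments vanish; for the second they do not, and a GMM criterion would penalize it.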


Example: Introducing Instruments

• Consider the 11 instruments,

$$z_t = (1, r_{t-1}, r_{t-2}, \pi_{t-1}, \pi_{t-2}, y_{t-1}, y_{t-2}, b_{t-1}, b_{t-2}, x_{t-1}, x_{t-2})',$$

where $b_t$ is a bond yield and $x_t$ is the unemployment rate.

• The instruments are included in the information set at time $t$, but it is
assumed that the central bank does not react directly to these variables
when deciding on the policy rate.

• In relation to 2SLS/IV estimation:
  • Step 1: Use the instruments to forecast $E[\pi_{t+12} \mid I_t]$ and $E[y_t \mid I_t]$.
  This is done by regressing $\pi_{t+12}$ and $y_t$ on $z_t$, which yields $\hat\pi_{t+12}$ and $\hat y_t$.
  • Step 2: Regress $r_t$ on $\hat x_t = (1, \hat\pi_{t+12}, \hat y_t)'$.
  By construction, $E[\hat x_t u_t] = 0$.
  • Weak instruments:
  The forecasts of the endogenous variables based on the instruments are
  "poor", meaning that there is a low correlation between the instruments
  and the endogenous variables.
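
The two steps can be sketched in a simulation. The setup below is an assumption made for illustration only: a rule $r_t = \alpha_1 E[\pi_{t+12} \mid I_t]$ without intercept and output gap, a single instrument $z_t$, and hypothetical helper names (`simulate`, `slope`). With one instrument and one endogenous regressor, 2SLS reproduces the simple IV estimator.

```python
import random


def simulate(T=50000, alpha1=1.5, seed=2):
    """Simulate r_t = alpha1 * E[pi | I_t] with one instrument z_t in I_t."""
    rng = random.Random(seed)
    r, pi_realized, z = [], [], []
    for _ in range(T):
        z_t = rng.gauss(0.0, 1.0)                   # instrument, known at time t
        expected = 0.8 * z_t + rng.gauss(0.0, 0.6)  # E[pi_{t+12} | I_t]
        pi_t = expected + rng.gauss(0.0, 1.0)       # realized inflation
        z.append(z_t)
        pi_realized.append(pi_t)
        r.append(alpha1 * expected)
    return r, pi_realized, z


def slope(y, x):
    """No-intercept regression slope of y on x."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


r, pi_realized, z = simulate()

# Step 1: regress the endogenous variable on the instrument, form fitted values.
first_stage = slope(pi_realized, z)
pi_hat = [first_stage * z_t for z_t in z]

# Step 2: regress r_t on the fitted values.
alpha1_2sls = slope(r, pi_hat)

# Simple IV estimator for comparison: sum(z*r) / sum(z*pi).
alpha1_iv = (sum(a * b for a, b in zip(z, r))
             / sum(a * b for a, b in zip(z, pi_realized)))
print(f"2SLS: {alpha1_2sls:.3f}, IV: {alpha1_iv:.3f} (true value 1.5)")
```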


Example: On the Timing of Instruments

• If $b_t, x_t \in z_t$, we would have to assume that $b_t, x_t \in I_t$.
This means that these two variables should be observed by the central bank
when $r_t$ is determined.

• If $y_t \in z_t$, we would assume that $y_t \in I_t$, meaning that we could replace
$E[y_t \mid I_t]$ with $y_t$ in the policy rule.
In that case we would not need an instrument for $y_t$.



2. Introduction to GMM

Introduction to Generalized Method of Moments

Generalized method of moments (GMM) is a general estimation principle.
Estimators are derived from moment conditions.
Three main motivations:
1 Maximum likelihood estimators have the smallest variance in the class of
consistent and asymptotically normal estimators.
But: We need a full description of the DGP and correct specification.
GMM is an alternative based on minimal assumptions.

2 GMM estimation is often possible where a likelihood analysis is extremely
difficult.
We only need a partial specification of the model.
Example: forward-looking models under rational expectations.

3 Many estimators can be seen as special cases of GMM, which gives a unifying
framework for comparison: MM, OLS, IV, 2SLS, ML.


Comparison of ML and GMM

                   Maximum Likelihood                              Generalized Method of Moments

Assumptions        Full specification.                             Partial specification/weak assumptions.
                   Know Density($\theta_0$) apart from $\theta_0$.  Moment conditions: $E(f(\text{data}; \theta_0)) = 0$.
                                                                   Strong economic assumptions.

Typical approach   Statistical description of the data.            Estimate relevant parameters of
                   Misspecification testing.                       economic model.
                   Restrictions recover economics.



3. Method of Moments (MM) Estimation

Moment Conditions and Identification

• A moment condition is a statement involving the data and the parameters:

$$g(\theta_0) = E(f(w_t, z_t, \theta_0)) = 0, \qquad (*)$$

where $\theta$ is a $K \times 1$ vector of parameters with true value $\theta_0$; $f(\cdot)$ is an
$R \times 1$ vector of (non-linear) functions; $w_t$ contains model variables; and $z_t$
contains instruments.
• If we knew the expectation, we could solve the equations in $(*)$ to
find $\theta_0$.
• If there is a unique solution, so that

$$E(f(w_t, z_t, \theta)) = 0 \quad \text{if and only if} \quad \theta = \theta_0,$$

then we say that the system is identified.

• Identification is essential in econometrics. Two ideas:
1 Is the model constructed so that $\theta_0$ is unique (identification)?
2 Are the data informative enough to determine $\theta_0$ (empirical identification)?


Models With Instrumental Variables

• In many applications, the moment condition has the specific form:

$$f(w_t, z_t, \theta) = \underbrace{u(w_t, \theta)}_{(1 \times 1)} \cdot \underbrace{z_t}_{(R \times 1)},$$

where the $R$ instruments in $z_t$ are multiplied by the disturbance term,
$u(w_t, \theta)$.

• You can think of $u(w_t, \theta)$ as the equivalent of an error term.
The moment condition becomes

$$g(\theta_0) = E(u(w_t, \theta_0) \cdot z_t) = 0,$$

stating that the instruments are uncorrelated with the error term of the
model.

• This class of estimators is referred to as instrumental variables estimators.
The function $u(w_t, \theta)$ may be linear or non-linear in $\theta$.


Method of Moments (MM) Estimator

• For a given sample, $w_t$ and $z_t$ ($t = 1, 2, ..., T$), we cannot calculate the
expectation.
We replace it with sample averages to obtain the analogous sample
moments:

$$g_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} f(w_t, z_t, \theta).$$

We can derive an estimator, $\hat\theta_{MM}$, as the solution to $g_T(\hat\theta_{MM}) = 0$.

• To find a unique estimator, we need at least as many equations as
parameters.
The order condition for identification is $R \geq K$.
• $R = K$ is called exact identification.
The estimator is denoted the method of moments estimator, $\hat\theta_{MM}$.
• $R > K$ is called overidentification.
The estimator is denoted the generalized method of moments estimator,
$\hat\theta_{GMM}$.
• What about the case $R < K$? $\Rightarrow$ Underidentification.


Example: MM Estimator of the Mean

• Assume that $y_t$ is a random variable drawn from a population with
expectation $\mu_0$.
We have a single moment condition:

$$g(\mu_0) = E(f(y_t, \mu_0)) = E(y_t - \mu_0) = 0,$$

where $f(y_t, \mu_0) = y_t - \mu_0$.

• For a sample, $y_1, y_2, ..., y_T$, we state the corresponding sample moment
condition:

$$g_T(\hat\mu) = \frac{1}{T} \sum_{t=1}^{T} (y_t - \hat\mu) = 0.$$

The MM estimator of the mean $\mu_0$ is the solution, i.e.,

$$\hat\mu_{MM} = \frac{1}{T} \sum_{t=1}^{T} y_t,$$

which is the sample average.


Example: MM Estimator of the Mean

Recall the count data models and the Poisson model.

The Poisson distribution, $Y \sim \text{Poisson}(\lambda)$, is defined as:

$$\text{Prob}(Y = y \mid \lambda) = \frac{\lambda^y \exp(-\lambda)}{y!}, \quad y \in \{0, 1, 2, 3, \ldots\},$$
$$E(Y) = V(Y) = \lambda.$$

We found that, for a given sample $\{y_1, \ldots, y_T\}$, the ML estimator was the
sample average:

$$\hat\lambda_{ML} = \frac{1}{T} \sum_{t=1}^{T} y_t.$$

Therefore, the MM estimator of the mean coincides with the ML estimator in the Poisson model.



Socrative Question 4

Consider the linear regression model,

$$y_t = x_t' \beta_0 + \varepsilon_t,$$

where $x_t$ and $\beta_0$ are $K$-dimensional vectors. Assuming that the model
represents the conditional expectation, $E(y_t \mid x_t) = x_t' \beta_0$, an estimator of the
true parameters $\beta_0$ can be derived based on a moment condition of the general
form:

$$g(\theta_0) = E(f(w_t, z_t, \theta_0)) = 0.$$

Q. What is the relevant expression for the moment condition, $g(\theta_0)$, and the
function, $f(w_t, z_t, \theta_0)$?

(A) $g(\beta) = E(x_t \varepsilon_t) = E(x_t (y_t - x_t' \beta)) = 0$.
(B) $g(\beta) = E(z_t \varepsilon_t) = E(z_t (y_t - x_t' \beta)) = 0$.
(C) $g(\beta) = E(\varepsilon_t \mid x_t) = E(y_t - x_t' \beta \mid x_t) = 0$.
(D) $g(\beta) = E(y_t \mid x_t) = E(x_t' \beta) = 0$.
(E) Don’t know.

Example: OLS as a MM Estimator

• Consider the linear regression model of $y_t$ on $x_t$ ($K \times 1$):

$$y_t = x_t' \beta_0 + \varepsilon_t. \qquad (**)$$

Assume that $(**)$ represents the conditional expectation:

$$E(y_t \mid x_t) = x_t' \beta_0 \quad \text{so that} \quad E(\varepsilon_t \mid x_t) = 0.$$

• That implies the $K$ unconditional moment conditions

$$g(\beta_0) = E(x_t \varepsilon_t) = E\left( x_t (y_t - x_t' \beta_0) \right) = 0,$$

which we recognize as the minimal assumption for consistency of the OLS
estimator.


Example: OLS as a MM Estimator

• We define the corresponding sample moment conditions as

$$g_T(\hat\beta) = \frac{1}{T} \sum_{t=1}^{T} x_t \left( y_t - x_t' \hat\beta \right) = \frac{1}{T} \sum_{t=1}^{T} x_t y_t - \frac{1}{T} \sum_{t=1}^{T} x_t x_t' \hat\beta = 0.$$

The MM estimator is derived as the unique solution:

$$\hat\beta_{MM} = \left( \frac{1}{T} \sum_{t=1}^{T} x_t x_t' \right)^{-1} \frac{1}{T} \sum_{t=1}^{T} x_t y_t,$$

provided that $\frac{1}{T} \sum_{t=1}^{T} x_t x_t'$ is non-singular.

• Method of moments is one way to motivate the OLS estimator.
It highlights the minimal (or identifying) assumptions for OLS.
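
A minimal sketch of this calculation for a regression with a constant and one regressor, so that $x_t = (1, x_t)'$ and the $2 \times 2$ system can be solved by hand. The function name `ols_by_moments` and the data are hypothetical and only serve the illustration.

```python
def ols_by_moments(x, y):
    """Solve (1/T) sum x_t (y_t - x_t' b) = 0 for x_t = (1, x_t)'."""
    T = len(x)
    # Elements of (1/T) sum x_t x_t' and (1/T) sum x_t y_t.
    m_x = sum(x) / T
    m_xx = sum(v * v for v in x) / T
    m_y = sum(y) / T
    m_xy = sum(a * b for a, b in zip(x, y)) / T
    det = m_xx - m_x * m_x  # determinant of [[1, m_x], [m_x, m_xx]]
    if det == 0:
        raise ValueError("(1/T) sum x_t x_t' is singular")
    b0 = (m_xx * m_y - m_x * m_xy) / det
    b1 = (m_xy - m_x * m_y) / det
    return b0, b1


# Made-up data generated exactly by y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
b0, b1 = ols_by_moments(x, y)

# The sample moment conditions hold at the solution.
resid = [yt - b0 - b1 * xt for xt, yt in zip(x, y)]
g_T = [sum(resid) / len(x), sum(xt * e for xt, e in zip(x, resid)) / len(x)]
print(b0, b1, g_T)
```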


Example: Underidentification

• Consider again a regression model

$$y_t = x_t' \beta_0 + \varepsilon_t = x_{1t}' \gamma_0 + x_{2t}' \delta_0 + \varepsilon_t.$$

• Assume that the $K_1$ variables in $x_{1t}$ are predetermined, while the
$K_2 = K - K_1$ variables in $x_{2t}$ are endogenous. This implies

$$E(x_{1t} \varepsilon_t) = 0 \quad (K_1 \times 1),$$
$$E(x_{2t} \varepsilon_t) \neq 0 \quad (K_2 \times 1).$$

• We have $K$ parameters in $\beta_0 = (\gamma_0', \delta_0')'$, but only $K_1 < K$ moment
conditions (i.e., $K_1$ equations to determine $K$ unknowns).
The parameters are not identified and cannot be estimated consistently.



Socrative Question 5

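Consider the linear regression model,

$$y_t = x_t' \beta_0 + \varepsilon_t = x_{1t}' \gamma_0 + x_{2t}' \delta_0 + \varepsilon_t,$$

where the $K_1$ variables in $x_{1t}$ are uncorrelated with $\varepsilon_t$, while the $K_2$ variables in
$x_{2t}$ are correlated with $\varepsilon_t$. Assume that we have $K_2$ valid instruments, $z_{2t}$.
Finally, define $x_t = (x_{1t}', x_{2t}')'$ and $z_t = (x_{1t}', z_{2t}')'$.
Q. How can the instruments be used to derive a method of moments estimator
of $\beta_0$?

(A) $g(\beta_0) = E(x_t \varepsilon_t) = 0 \Rightarrow \hat\beta_{MM} = \left( \frac{1}{T} \sum_{t=1}^{T} x_t x_t' \right)^{-1} \frac{1}{T} \sum_{t=1}^{T} x_t y_t$.
(B) $g(\beta_0) = E(z_t \varepsilon_t) = 0 \Rightarrow \hat\beta_{MM} = \left( \frac{1}{T} \sum_{t=1}^{T} z_t x_t' \right)^{-1} \frac{1}{T} \sum_{t=1}^{T} z_t y_t$.
(C) $g(\beta_0) = E(z_t \varepsilon_t) = 0 \Rightarrow \hat\beta_{MM} = \left( \frac{1}{T} \sum_{t=1}^{T} z_t z_t' \right)^{-1} \frac{1}{T} \sum_{t=1}^{T} z_t y_t$.
(D) $g(\beta_0) = E(z_{2t} \varepsilon_t) = 0 \Rightarrow \hat\beta_{MM} = \left( \frac{1}{T} \sum_{t=1}^{T} z_{2t} x_t' \right)^{-1} \frac{1}{T} \sum_{t=1}^{T} z_{2t} y_t$.
(E) Don’t know.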



Example: Simple IV Estimator

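• Assume $K_2$ new variables, $z_{2t}$, that are correlated with $x_{2t}$ but
uncorrelated with $\varepsilon_t$:

$$E(z_{2t} \varepsilon_t) = 0.$$

These $K_2$ moment conditions can replace the invalid conditions for the endogenous variables $x_{2t}$. To simplify
notation, we define

$$\underset{(K \times 1)}{x_t} = \begin{pmatrix} x_{1t} \\ x_{2t} \end{pmatrix} \quad \text{and} \quad \underset{(K \times 1)}{z_t} = \begin{pmatrix} x_{1t} \\ z_{2t} \end{pmatrix}.$$

$x_t$ are model variables, $z_{2t}$ are new instruments, and $z_t$ are instruments.
We say that $x_{1t}$ are instruments for themselves.

• Combining the conditions for $x_{1t}$ and $z_{2t}$, we have $K$ moment conditions:

$$g(\beta_0) = \begin{pmatrix} E(x_{1t} \varepsilon_t) \\ E(z_{2t} \varepsilon_t) \end{pmatrix} = E(z_t \varepsilon_t) = E\left( z_t (y_t - x_t' \beta_0) \right) = 0,$$

which are sufficient to identify the $K$ parameters in $\beta_0$.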


Example: Simple IV Estimator

• The corresponding sample moment conditions are given by

$$g_T(\hat\beta) = \frac{1}{T} \sum_{t=1}^{T} z_t \left( y_t - x_t' \hat\beta \right) = 0.$$

• The method of moments estimator is the unique solution:

$$\hat\beta_{MM} = \left( \frac{1}{T} \sum_{t=1}^{T} z_t x_t' \right)^{-1} \frac{1}{T} \sum_{t=1}^{T} z_t y_t,$$

provided that $\frac{1}{T} \sum_{t=1}^{T} z_t x_t'$ is non-singular.
• Note the following:
1 We need the instruments to identify the parameters.
2 The MM estimator coincides with the simple IV estimator.
3 The procedure only works with exactly $K_2$ new instruments (i.e., $R = K$).
4 Non-singularity of $\sum_{t=1}^{T} z_t x_t'$ requires relevant instruments.
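
The role of relevance in the last point can be illustrated by simulation. The sketch below is an assumption-laden example, not the lecture's method: it uses a scalar model $y_t = \beta_0 x_t + \varepsilon_t$ with one instrument, a made-up data generating process, and hypothetical helper names (`iv_estimate`, `median_abs_deviation`). It compares the dispersion of the simple IV estimator across replications for a strong and a nearly irrelevant instrument.

```python
import random
import statistics


def iv_estimate(T, strength, rng, beta0=1.0):
    """Simple IV estimate sum(z*y) / sum(z*x) in y = beta0*x + eps."""
    s_zy = 0.0
    s_zx = 0.0
    for _ in range(T):
        z = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        x = strength * z + v                 # relevance is governed by `strength`
        eps = 0.8 * v + rng.gauss(0.0, 1.0)  # x is endogenous through v
        y = beta0 * x + eps
        s_zy += z * y
        s_zx += z * x
    return s_zy / s_zx


def median_abs_deviation(strength, reps=200, T=200, seed=0):
    """Robust measure of dispersion of the IV estimates across replications."""
    rng = random.Random(seed)
    draws = [iv_estimate(T, strength, rng) for _ in range(reps)]
    center = statistics.median(draws)
    return statistics.median([abs(d - center) for d in draws])


mad_strong = median_abs_deviation(strength=1.0)
mad_weak = median_abs_deviation(strength=0.05)
print(f"strong instrument: {mad_strong:.3f}, weak instrument: {mad_weak:.3f}")
```

A median-based dispersion measure is used because the IV estimator with a very weak instrument has heavy tails, so a sample standard deviation would be dominated by a few extreme draws.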



Socrative Question 6
Consider the Taylor rule

$$r_t = \alpha_0 + \alpha_1 \cdot E[\pi_{t+12} - \pi^* \mid I_t] + \alpha_2 \cdot E[y_t \mid I_t], \qquad (1)$$

which can be written in terms of observable variables,

$$r_t = \alpha_0^* + \alpha_1 \pi_{t+12} + \alpha_2 y_t + u_t, \qquad (2)$$

where

$$u_t = \alpha_1 \left( E[\pi_{t+12} \mid I_t] - \pi_{t+12} \right) + \alpha_2 \left( E[y_t \mid I_t] - y_t \right). \qquad (3)$$

To estimate the parameters in (2), we consider the moment conditions:

$$E[z_t \cdot u_t] = E[z_t (r_t - \alpha_0^* - \alpha_1 \pi_{t+12} - \alpha_2 y_t)] = 0.$$

Q. What is required for the instruments $z_t$ to be valid and relevant?

(A) $z_t$ must be uncorrelated with $E[\pi_{t+12} \mid I_t]$, $E[y_t \mid I_t]$, $\pi_{t+12}$, and $y_t$.
(B) $z_t$ must be uncorrelated with $E[\pi_{t+12} \mid I_t]$ and $E[y_t \mid I_t]$,
but correlated with $\pi_{t+12}$ and $y_t$.
(C) $z_t$ must be uncorrelated with $(E[\pi_{t+12} \mid I_t] - \pi_{t+12})$, $(E[y_t \mid I_t] - y_t)$,
$\pi_{t+12}$, and $y_t$.
(D) $z_t$ must be uncorrelated with $(E[\pi_{t+12} \mid I_t] - \pi_{t+12})$ and $(E[y_t \mid I_t] - y_t)$,
but correlated with $\pi_{t+12}$ and $y_t$.
(E) Don’t know.
4. Generalized Method of Moments (GMM) Estimation

Generalized Method of Moments Estimation

• The case $R > K$ is called overidentification.
Note that this is a fortunate situation, not a problem!
There are more equations than parameters, so in general there is no solution to $g_T(\theta) = 0$.
• Instead we minimize the distance from $g_T(\theta)$ to zero.
The distance is measured by the quadratic form

$$Q_T(\theta) = g_T(\theta)' W_T g_T(\theta),$$

where $W_T$ is an $R \times R$ symmetric and positive definite weight matrix.

• The GMM estimator depends on the weight matrix:

$$\hat\theta_{GMM}(W_T) = \arg\min_{\theta} \left\{ g_T(\theta)' W_T g_T(\theta) \right\}.$$

We can find the estimator by solving the $K$ equations

$$\frac{\partial Q_T(\theta)}{\partial \theta} = \frac{\partial (g_T(\theta)' W_T g_T(\theta))}{\partial \theta} = \underset{(K \times 1)}{0}.$$

This can sometimes be done analytically, but often requires numerical methods.
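
To make the two routes concrete, the sketch below considers an assumed special case: a linear model $y_t = \beta x_t + \varepsilon_t$ with $K = 1$ parameter and $R = 2$ instruments, where the first-order condition has the closed-form solution $\hat\beta = (s_{zx}' W s_{zy}) / (s_{zx}' W s_{zx})$ with $s_{zx} = T^{-1} \sum z_t x_t$ and $s_{zy} = T^{-1} \sum z_t y_t$. A crude grid search stands in for a numerical optimizer. The data and function names are made up for illustration.

```python
def sample_moments(beta, y, x, Z):
    """g_T(beta) = (1/T) sum_t z_t (y_t - beta * x_t)."""
    T, R = len(y), len(Z[0])
    return [sum(Z[t][j] * (y[t] - beta * x[t]) for t in range(T)) / T
            for j in range(R)]


def criterion(beta, y, x, Z, W):
    """Q_T(beta) = g_T(beta)' W g_T(beta)."""
    g = sample_moments(beta, y, x, Z)
    R = len(g)
    return sum(g[i] * W[i][j] * g[j] for i in range(R) for j in range(R))


def gmm_closed_form(y, x, Z, W):
    """Analytical minimizer of Q_T for the linear model with a scalar beta."""
    T, R = len(y), len(Z[0])
    s_zy = [sum(Z[t][j] * y[t] for t in range(T)) / T for j in range(R)]
    s_zx = [sum(Z[t][j] * x[t] for t in range(T)) / T for j in range(R)]
    num = sum(s_zx[i] * W[i][j] * s_zy[j] for i in range(R) for j in range(R))
    den = sum(s_zx[i] * W[i][j] * s_zx[j] for i in range(R) for j in range(R))
    return num / den


# Made-up data with T = 4 and instruments z_t = (1, z_2t)'.
y = [2.1, 3.9, 6.2, 7.8]
x = [1.0, 2.0, 3.0, 4.0]
Z = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
W = [[1.0, 0.0], [0.0, 1.0]]

beta_hat = gmm_closed_form(y, x, Z, W)

# Numerical alternative: evaluate Q_T on a grid and pick the minimizer.
grid = [k / 1000 for k in range(0, 4001)]
beta_grid = min(grid, key=lambda b: criterion(b, y, x, Z, W))
print(beta_hat, beta_grid)
```

In non-linear models no closed form is available and the grid search would be replaced by a proper numerical optimizer.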


Distances and Weight Matrices


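Consider a simple example with 2 moment conditions

$$g_T(\theta) = \begin{pmatrix} g_a \\ g_b \end{pmatrix},$$

where the dependence on $T$ and $\theta$ is suppressed.

How to specify the weight matrix $W_T$ when:
• The two moment conditions are equally important? Consider $W_T = I_2$:

$$Q_T(\theta) = g_T(\theta)' W_T g_T(\theta) = \begin{pmatrix} g_a & g_b \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} g_a \\ g_b \end{pmatrix} = g_a^2 + g_b^2,$$

which is the square of the simple distance from $g_T(\theta)$ to zero.
Here the coordinates are equally important.
• The moment condition $g_a$ is more important than $g_b$? Alternatively, look
at a different weight matrix:

$$Q_T(\theta) = g_T(\theta)' W_T g_T(\theta) = \begin{pmatrix} g_a & g_b \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} g_a \\ g_b \end{pmatrix} = 2 \cdot g_a^2 + g_b^2,$$

which attaches more weight to the first coordinate in the distance.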
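
The two distances can be checked numerically. The values $g_a = 0.3$ and $g_b = -0.4$ below are made up for the example, and `quadratic_form` is a hypothetical helper name.

```python
def quadratic_form(g, W):
    """Return g' W g for a vector g and a square matrix W (lists of floats)."""
    R = len(g)
    return sum(g[i] * W[i][j] * g[j] for i in range(R) for j in range(R))


g = [0.3, -0.4]  # (g_a, g_b)
q_identity = quadratic_form(g, [[1.0, 0.0], [0.0, 1.0]])  # g_a^2 + g_b^2
q_weighted = quadratic_form(g, [[2.0, 0.0], [0.0, 1.0]])  # 2*g_a^2 + g_b^2
print(q_identity, q_weighted)
```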


Socrative Question 7
Consider the derived expression for the GMM estimator,

$$\hat\theta = \theta_0 - (D_T' W_T D_T)^{-1} D_T' W_T g_T(\theta_0),$$

and the three assumptions:

Assumption 0. The moment condition holds: $g(\theta_0) = 0$.
Assumption 1. A LLN applies to $f(w_t, z_t, \theta)$:

$$T^{-1} \sum_{t=1}^{T} f(w_t, z_t, \theta) \to E(f(w_t, z_t, \theta)) \quad \text{for } T \to \infty.$$

Assumption 2. A CLT applies to $f(w_t, z_t, \theta)$:

$$\sqrt{T} \cdot T^{-1} \sum_{t=1}^{T} f(w_t, z_t, \theta_0) \to N(0, S) \quad \text{for } T \to \infty.$$

Q. Which of the three assumptions are needed for consistency of the GMM
estimator? (And, more importantly, why?)
(A) Consistency requires Assumption 0.
(B) Consistency requires Assumption 1.
(C) Consistency requires Assumptions 1 and 2.
(D) Consistency requires Assumptions 0, 1 and 2.
(E) Don’t know.



Consistency: Why Does it Work? (Box 1 in the lecture note)

• Assume that a law of large numbers (LLN) applies to $f(w_t, z_t, \theta)$, i.e.,

$$T^{-1} \sum_{t=1}^{T} f(w_t, z_t, \theta) \overset{p}{\to} E(f(w_t, z_t, \theta)) \quad \text{for } T \to \infty.$$

That requires IID or stationarity and weak dependence.

• If the moment conditions are correct, $g(\theta_0) = 0$, then GMM is consistent,

$$\hat\theta_{GMM}(W_T) \overset{p}{\to} \theta_0 \quad \text{as } T \to \infty,$$

for any positive definite $W_T$.

• Intuition: If a LLN applies, then $g_T(\theta)$ converges to $g(\theta)$.
Since $\hat\theta_{GMM}(W_T)$ minimizes the distance from $g_T(\theta)$ to zero, it will be a
consistent estimator of the solution to $g(\theta_0) = 0$.

• The weight matrix, $W_T$, has to be positive definite, so that we put a
positive and non-zero weight on all moment conditions.


Asymptotic Distribution (Box 1 in the lecture note)


• Assume a central limit theorem for $f(w_t, z_t, \theta)$, i.e.:

$$\sqrt{T} \cdot g_T(\theta_0) = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} f(w_t, z_t, \theta_0) \overset{D}{\to} N(0, S),$$

where $S$ is the asymptotic variance.

• Then it holds that for any positive definite weight matrix, $W$, the
asymptotic distribution of the GMM estimator is given by

$$\sqrt{T} \left( \hat\theta_{GMM} - \theta_0 \right) \overset{D}{\to} N(0, V).$$

The asymptotic variance is given by

$$V = (D' W D)^{-1} D' W S W D (D' W D)^{-1},$$

where

$$D = E\left( \left. \frac{\partial f(w_t, z_t, \theta)}{\partial \theta'} \right|_{\theta = \theta_0} \right)$$

is the expected value of the $R \times K$ matrix of first derivatives of $f(w_t, z_t, \theta)$.

Note: The variance depends on the choice of weight matrix.
Socrative Question 8

Recall that the GMM estimator depends on the symmetric and positive
definite weight matrix $W_T$.

Q. Why do we care about the weight matrix $W_T$?

(A) Because the asymptotic variance of the GMM estimator depends on the
weight matrix.
(B) Because the GMM estimator is only consistent for some weight matrices.
(C) Because the asymptotic variance of the sample moments depends on the
weight matrix.
(D) Because the GMM estimator is only asymptotically normal for some
weight matrices.
(E) Don’t know.
5. Efficient GMM
Socrative Question 9
Under Assumptions 1 and 2, the asymptotic distribution of $\hat\theta_{GMM}$ is

$$\sqrt{T} (\hat\theta - \theta_0) \to N(0, V),$$

where

$$V = (D' W D)^{-1} D' W S W D (D' W D)^{-1},$$

$S$ is the asymptotic variance of the sample moment conditions, and $D$ is
the expected value of the $R \times K$ matrix of first derivatives of $f(w_t, z_t, \theta)$.
$W$ is an $R \times R$ positive definite weight matrix which attaches different
weights to the $R$ sample moments.
Q. Efficient GMM uses the optimal weight matrix, $W_T^{opt}$, which minimizes the
asymptotic variance $V$. How is that achieved?

(A) By using the weight matrix $W = S$.
(B) By using the weight matrix $W = S^{-1}$.
(C) By using the weight matrix $W = D$.
(D) By using the weight matrix $W = D^{-1}$.
(E) Don’t know.
(And what is the expression for the optimal weight matrix?)

Efficient GMM Estimation

• The variance of $\hat\theta_{GMM}$ depends on the weight matrix, $W_T$.
The efficient GMM estimator has the smallest possible (asymptotic)
variance.
• Intuition: a moment with small variance is informative and should have
large weight.
It can be shown that the optimal weight matrix, $W_T^{opt}$, has the property
that

$$\text{plim}\, W_T^{opt} = S^{-1}.$$

With the optimal weight matrix, $W = S^{-1}$, the asymptotic variance
simplifies to

$$V = (D' S^{-1} D)^{-1} D' S^{-1} S S^{-1} D (D' S^{-1} D)^{-1} = (D' S^{-1} D)^{-1}.$$

• The best moment conditions have small $S$ and large $D$.
  • A small $S$ means that the sample variation of the moment (noise) is small.
  • A large $D$ means that the moment condition is strongly violated if $\theta \neq \theta_0$.
  The moment is then very informative about the true values, $\theta_0$.
  This is related to the curvature of the criterion function, as in ML.
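
The gain from the optimal weights can be seen in a small numerical example. The numbers are assumptions chosen for illustration: one parameter ($K = 1$), two moments ($R = 2$), $D = (1, 1)'$ and $S = \mathrm{diag}(1, 4)$, so the second moment is four times as noisy as the first. The helper name `asymptotic_variance` is hypothetical.

```python
def asymptotic_variance(D, S, W):
    """Sandwich variance (D'WD)^-1 D'WSWD (D'WD)^-1 for a scalar parameter.

    D is a list of length R (the R x 1 derivative), S and W are R x R lists.
    """
    R = len(D)
    dwd = sum(D[i] * W[i][j] * D[j] for i in range(R) for j in range(R))
    WD = [sum(W[i][j] * D[j] for j in range(R)) for i in range(R)]
    dwswd = sum(WD[i] * S[i][j] * WD[j] for i in range(R) for j in range(R))
    return dwswd / dwd ** 2


D = [1.0, 1.0]
S = [[1.0, 0.0], [0.0, 4.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
S_inverse = [[1.0, 0.0], [0.0, 0.25]]  # inverse of the diagonal matrix S

V_identity = asymptotic_variance(D, S, identity)    # equal weights
V_efficient = asymptotic_variance(D, S, S_inverse)  # optimal weights
print(V_identity, V_efficient)
```

With equal weights the variance is 1.25; down-weighting the noisy moment with $W = S^{-1}$ reduces it to $(D' S^{-1} D)^{-1} = 0.8$.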


• Hypothesis testing can be based on the asymptotic distribution:

$$\hat\theta_{GMM} \overset{a}{\sim} N(\theta_0, T^{-1} \hat V).$$

• An estimator of the asymptotic variance is given by

$$\hat V = (D_T' S_T^{-1} D_T)^{-1}.$$

Recall that $D = E(\partial f(\cdot) / \partial \theta')$, which we can estimate by the average

$$\underset{(R \times K)}{D_T} = \frac{\partial g_T(\theta)}{\partial \theta'} = \frac{1}{T} \sum_{t=1}^{T} \left. \frac{\partial f(w_t, z_t, \theta)}{\partial \theta'} \right|_{\theta = \hat\theta}.$$

Recall that $S = T \cdot V(g_T(\theta))$. If the observations are independent, a
consistent estimator is

$$S_T = \frac{1}{T} \sum_{t=1}^{T} f(w_t, z_t, \hat\theta) f(w_t, z_t, \hat\theta)'.$$

Estimation of the weight matrix is typically the trickiest part of GMM.


Computational Issues

We need an optimal weight matrix, $W_T^{opt}$, but that depends on the parameters!
Two-step efficient GMM:
1 Choose an initial weight matrix, e.g. $W_{[1]} = I_R$, and find a consistent but
inefficient first-step GMM estimator

$$\hat\theta_{[1]} = \arg\min_{\theta} g_T(\theta)' W_{[1]} g_T(\theta).$$

2 Find the optimal weight matrix, $W_{[2]}^{opt}$, based on $\hat\theta_{[1]}$. Find the efficient
estimator

$$\hat\theta_{[2]} = \arg\min_{\theta} g_T(\theta)' W_{[2]}^{opt} g_T(\theta).$$

The estimator is not unique as it depends on the initial weight matrix $W_{[1]}$.
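
The two steps can be sketched for an assumed special case: a linear model $y_t = \beta x_t + \varepsilon_t$ with one endogenous regressor and $R = 2$ instruments, where each minimization has the closed form $\hat\beta = (s_{zx}' W s_{zy}) / (s_{zx}' W s_{zx})$. The weight matrix in step 2 is the inverse of $S_T = T^{-1} \sum \hat u_t^2 z_t z_t'$, which is appropriate for independent observations. The simulated data and the names `two_step_gmm` and `simulate` are made up for illustration.

```python
import random


def two_step_gmm(y, x, Z):
    """Two-step efficient GMM for y_t = beta*x_t + e_t with R = 2 instruments."""
    T = len(y)
    s_zy = [sum(z[j] * yt for z, yt in zip(Z, y)) / T for j in range(2)]
    s_zx = [sum(z[j] * xt for z, xt in zip(Z, x)) / T for j in range(2)]

    def solve(W):
        # Closed-form minimizer of g_T(beta)' W g_T(beta).
        num = sum(s_zx[i] * W[i][j] * s_zy[j] for i in range(2) for j in range(2))
        den = sum(s_zx[i] * W[i][j] * s_zx[j] for i in range(2) for j in range(2))
        return num / den

    # Step 1: identity weights give a consistent but inefficient estimate.
    beta_1 = solve([[1.0, 0.0], [0.0, 1.0]])

    # Step 2: estimate S from first-step residuals and use W = S^{-1}.
    u = [yt - beta_1 * xt for yt, xt in zip(y, x)]
    S = [[sum(ut * ut * z[i] * z[j] for ut, z in zip(u, Z)) / T
          for j in range(2)] for i in range(2)]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    W_opt = [[S[1][1] / det, -S[0][1] / det],
             [-S[1][0] / det, S[0][0] / det]]
    beta_2 = solve(W_opt)
    return beta_1, beta_2, W_opt


def simulate(T=5000, beta0=2.0, seed=3):
    """Made-up DGP: x is endogenous, z_1 and z_2 are valid instruments."""
    rng = random.Random(seed)
    y, x, Z = [], [], []
    for _ in range(T):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        xt = z1 + 0.5 * z2 + v
        eps = 0.8 * v + rng.gauss(0.0, 1.0)
        y.append(beta0 * xt + eps)
        x.append(xt)
        Z.append((z1, z2))
    return y, x, Z


y, x, Z = simulate()
beta_1, beta_2, W_opt = two_step_gmm(y, x, Z)
print(f"first step: {beta_1:.3f}, second step: {beta_2:.3f} (true value 2.0)")
```

Repeating step 2 with residuals based on the latest estimate until the estimates stop changing would give an iterated version of the same procedure.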


Iterated GMM estimator:

• From the estimator $\hat\theta_{[2]}$ it is natural to update the weights, $W_{[3]}^{opt}$, and
then update $\hat\theta_{[3]}$.
We can switch between estimating $W_{[\cdot]}^{opt}$ and $\hat\theta_{[\cdot]}$ until convergence.
Iterated GMM does not depend on the initial weight matrix.
The two approaches are asymptotically equivalent.
Continuously updated GMM estimator:
• A third approach is to recognize from the outset that the weight matrix
depends on the parameters, and minimize

$$Q_T(\theta) = g_T(\theta)' W_T(\theta) g_T(\theta).$$

In general this cannot be solved analytically and requires numerical optimization.


Test of Overidentifying Moment Conditions


• Recall that $K$ moment conditions are sufficient to estimate the $K$
parameters in $\theta$.
• If $R > K$, we can test the validity of the $R - K$ overidentifying moment
conditions.
• By MM estimation we can set $K$ moment conditions equal to zero.
If all $R$ conditions are valid, then the remaining $R - K$ moments should also be close
to zero.
• From the CLT we have

$$g_T(\theta_0) \overset{a}{\sim} N(0, T^{-1} S).$$

If we use the optimal weights, $W_T^{opt} \to S^{-1}$, then

$$\xi_J = T \cdot g_T(\hat\theta_{GMM})' W_T^{opt} g_T(\hat\theta_{GMM}) = T \cdot Q_T(\hat\theta_{GMM}) \to \chi^2(R - K).$$

• This is the J-test or the Hansen test for overidentifying restrictions.
In linear models it is often referred to as the Sargan test.
$\xi_J$ is not a test of the validity of the model or the underlying economic theory.
$\xi_J$ considers whether the $R - K$ moments are in line with the $K$
identifying moments.
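
A minimal sketch of the test decision, assuming the sample moments at the estimate and the optimal weight matrix are already available. The input numbers are hypothetical, as are the names `j_statistic` and `reject_overidentifying`; the 5% critical values of the $\chi^2$ distribution for 1 to 3 degrees of freedom are standard table values.

```python
# 5% critical values of the chi-squared distribution by degrees of freedom.
CHI2_CRITICAL_5PCT = {1: 3.841, 2: 5.991, 3: 7.815}


def j_statistic(T, g, W):
    """xi_J = T * g' W g, with g the sample moments at the GMM estimate."""
    R = len(g)
    return T * sum(g[i] * W[i][j] * g[j] for i in range(R) for j in range(R))


def reject_overidentifying(xi_j, R, K):
    """Reject the R - K overidentifying conditions at the 5% level?"""
    return xi_j > CHI2_CRITICAL_5PCT[R - K]


# Hypothetical inputs: T = 100, R = 2 moments, K = 1 parameter.
xi_j = j_statistic(100, [0.1, -0.1], [[1.0, 0.0], [0.0, 1.0]])
rejected = reject_overidentifying(xi_j, R=2, K=1)
print(xi_j, rejected)
```

Here the statistic is about 2.0, below the critical value 3.841 for one degree of freedom, so the overidentifying condition is not rejected.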



6. Examples
Socrative Question 10
Consider the estimated Taylor rule for the US economy for 1988(1)-2004(12):

$$r_t = \underset{(0.72)}{1.21} + \underset{(0.21)}{1.19} \cdot E(\pi_{t+12} \mid I_t) + \underset{(0.07)}{0.49} \cdot E(y_t \mid I_t),$$

using iterative GMM with HAC standard errors and the set of instruments
$z_t = (1, r_t, ..., r_{t-6}, \pi_t, \pi_{t-1}, ..., \pi_{t-6}, y_{t-1}, ..., y_{t-6})'$.
The figure shows the interest rate (red line), $r_t$, and the predicted interest rate
(blue line),

$$\hat r_t = 1.12 + 1.19 \cdot \pi_{t+12} + 0.49 \cdot y_t.$$

[Figure: actual interest rate $r_t$ and predicted rate $\hat r_t$; horizontal axis 1975 to 2005.]

Q. Is there empirical evidence for the Taylor rule?

(A) That depends on whether $\hat u_t$ passes the misspecification tests.
(B) Yes, the estimates are significant and in line with the theory.
(C) No, the predicted interest rate $\hat r_t$ does not match the actual rate $r_t$.
(D) There might be, but are the estimates robust?
(E) Don’t know.

Example: The C-CAPM Model

• Consider the consumption-based capital asset pricing (C-CAPM) model of
Hansen and Singleton (1982).

• A representative agent maximizes the discounted value of lifetime utility
subject to a budget constraint:

$$\max \sum_{s=1}^{\infty} E\left( \delta^s \cdot u(c_{t+s}) \mid I_t \right),$$
$$A_{t+1} = (1 + r_{t+1}) A_t + y_{t+1} - c_{t+1},$$

where $A_t$ is financial wealth, $y_t$ is income, $0 \leq \delta \leq 1$ is a discount factor,
and $I_t$ is the information set at time $t$.

• The first order condition is given by the Euler equation:

$$u'(c_t) = E\left[ \delta \cdot u'(c_{t+1}) \cdot R_{t+1} \mid I_t \right],$$

where $u'(\cdot)$ is the derivative, and $R_{t+1} = 1 + r_{t+1}$ is the return factor.



Socrative Question 16
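The C-CAPM model yields the Euler equation:

$$u'(c_t) = E\left[ \delta \cdot u'(c_{t+1}) \cdot R_{t+1} \mid I_t \right],$$

where $u'(\cdot)$ is the derivative, and $R_{t+1} = 1 + r_{t+1}$ is the return factor.
Now assume a constant relative risk aversion (CRRA) utility function:

$$u(c_t) = \frac{c_t^{1-\gamma}}{1-\gamma}, \quad \gamma < 1.$$

Q. For the set of instruments, $z_t \in I_t$, which of the following is a valid set of
moment conditions?

(A) $c_t^{-\gamma} - E\left[ \delta \cdot c_{t+1}^{-\gamma} \cdot R_{t+1} z_t \right] = 0$.
(B) $E\left[ \delta \cdot \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} \cdot R_{t+1} z_t \right] = 0$.
(C) $E\left[ \left( \delta \cdot \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} \cdot R_{t+1} - 1 \right) z_t \right] = 0$.
(D) $E\left[ \delta \cdot c_{t+1}^{-\gamma} \cdot R_{t+1} z_t \right] = 0$.
(E) Don’t know.
(And how many instruments do we need? And which ones would you suggest?)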

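• Now assume a constant relative risk aversion (CRRA) utility function:

$$u(c_t) = \frac{c_t^{1-\gamma}}{1-\gamma}, \quad \gamma < 1,$$

so that $u'(c_t) = c_t^{-\gamma}$. That gives the explicit Euler equation:

$$c_t^{-\gamma} - E\left[ \delta \cdot c_{t+1}^{-\gamma} \cdot R_{t+1} \mid I_t \right] = 0.$$

• To ensure stationarity, we reformulate:

$$E\left[ \delta \cdot \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} \cdot R_{t+1} - 1 \;\middle|\; I_t \right] = 0,$$

which is a conditional moment condition.

• That implies the unconditional moment conditions

$$E\left[ f\left( \frac{c_{t+1}}{c_t}, R_{t+1}; z_t; \delta, \gamma \right) \right] = E\left[ \left( \delta \cdot \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} \cdot R_{t+1} - 1 \right) z_t \right] = 0,$$

for all variables $z_t \in I_t$ included in the information set.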


• The model variables are $w_t = (c_{t+1}/c_t, R_{t+1})'$.

• To estimate the ($K = 2$) parameters, $\theta = (\delta, \gamma)'$, we need at least $R = 2$
instruments in $z_t$.

• We consider the $R = 3$ instruments $z_t = (1, c_t/c_{t-1}, R_t)'$.

• The function $f(\cdot)$ is

$$f(w_t, z_t; \theta) = u(w_t; \theta) \cdot z_t,$$
$$u(w_t; \theta) = \delta \cdot \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} \cdot R_{t+1} - 1.$$


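• The moment conditions are

$$g(\theta_0) = E\left[f(w_t, z_t; \theta_0)\right] = 0,$$

i.e.

$$\begin{aligned}
E\left[\left(\delta_0 \cdot \left(\frac{c_{t+1}}{c_t}\right)^{-\gamma_0} \cdot R_{t+1} - 1\right) \cdot 1\right] &= 0 \\
E\left[\left(\delta_0 \cdot \left(\frac{c_{t+1}}{c_t}\right)^{-\gamma_0} \cdot R_{t+1} - 1\right) \frac{c_t}{c_{t-1}}\right] &= 0 \\
E\left[\left(\delta_0 \cdot \left(\frac{c_{t+1}}{c_t}\right)^{-\gamma_0} \cdot R_{t+1} - 1\right) R_t\right] &= 0,
\end{aligned}$$

for $t = 1, 2, ..., T$.

• Given a choice of weight matrix, $W_T$,

$$\hat\theta_{GMM} = \arg\min_{\theta} \left\{ g_T(\theta)' W_T \, g_T(\theta) \right\}, \qquad g_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} f(w_t, z_t; \theta).$$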
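The sample moments and the criterion can be sketched in a few lines. The code below is only an illustration: the data are a deterministic toy series, the weight matrix is the identity, and a coarse grid search stands in for a proper numerical optimiser. The names `g_T`, `Q_T`, and `theta_hat` are chosen here for readability.

```python
import numpy as np

def g_T(theta, growth, ret, z):
    """Sample moments g_T(theta) = (1/T) * sum_t f(w_t, z_t; theta), an R-vector."""
    delta, gamma = theta
    u = delta * growth ** (-gamma) * ret - 1.0
    return (u[:, None] * z).mean(axis=0)

def Q_T(theta, growth, ret, z, W):
    """GMM criterion g_T(theta)' W_T g_T(theta)."""
    g = g_T(theta, growth, ret, z)
    return float(g @ W @ g)

# Hypothetical deterministic toy data, not an empirical data set
t = np.arange(1, 52)
c_growth = 1.0 + 0.01 * np.sin(t)        # consumption growth, 51 periods
returns = 1.01 + 0.02 * np.cos(t)        # return factor, 51 periods
growth, ret = c_growth[1:], returns[1:]  # c_{t+1}/c_t and R_{t+1} (T = 50)
z = np.column_stack([np.ones(50), c_growth[:-1], returns[:-1]])  # (1, c_t/c_{t-1}, R_t)

W = np.eye(3)  # simple first-step weight matrix
grid = [(d, g) for d in np.linspace(0.95, 1.05, 41) for g in np.linspace(-2.0, 2.0, 41)]
theta_hat = min(grid, key=lambda th: Q_T(th, growth, ret, z, W))
```

In practice the grid would be replaced by a numerical optimiser, and the identity matrix by an estimate of the optimal weight matrix.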


Results for US data, 1959:3 to 1978:12. (Data: hs.xls)
Standard errors in parentheses.

Method         Lags   δ          γ          T     ξ_J     DF   p-val
2-Step HC      1      0.9987     0.8770     237   0.434   1    0.510
                      (0.0086)   (3.6792)
Iterated HC    1      0.9982     1.0249     237   1.068   1    0.301
                      (0.0044)   (1.8614)
CU HC          1      0.9981     0.9549     237   1.067   1    0.302
                      (0.0044)   (1.8629)

2-Step HAC     1      0.9987     0.8876     237   0.429   1    0.513
                      (0.0092)   (4.0228)
Iterated HAC   1      0.9980     0.8472     237   1.091   1    0.296
                      (0.0045)   (1.8757)
CU HAC         1      0.9977     0.7093     237   1.086   1    0.297
                      (0.0045)   (1.8815)

2-Step HC      2      0.9975     0.0149     236   1.597   3    0.660
                      (0.0066)   (2.6415)
Iterated HC    2      0.9968     −0.0210    236   3.579   3    0.311
                      (0.0045)   (1.7925)
CU HC          2      0.9958     −0.5526    236   3.501   3    0.321
                      (0.0046)   (1.8267)

2-Step HAC     2      0.9970     −0.1872    236   1.672   3    0.643
                      (0.0068)   (2.7476)
Iterated HAC   2      0.9965     −0.2443    236   3.685   3    0.298
                      (0.0047)   (1.8571)
CU HAC         2      0.9952     −0.9094    236   3.591   3    0.309
                      (0.0048)   (1.9108)


• Empirically: The model is formally identified but $\gamma$ is poorly determined.
Weak instruments, little variation in the data, or wrong model?

• If the instruments are weak, it is hard to identify the parameters (Verbeek, Sections 5.5.4 and 5.6.4).
Recall that identification states that $g(\theta) = 0 \Leftrightarrow \theta = \theta_0$.
Loosely speaking, weak identification means that $g(\theta) \approx 0$ for some $\theta \neq \theta_0$.

• In the actual data, $c_{t+1}/c_t \approx 1$ and $R_t \approx 1$ for all $t$.
Hence the variation is small.
Consequently, if $\delta = 1$,

$$u(w_t; \theta) = \delta \cdot \left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} \cdot R_{t+1} - 1 \approx \delta (1)^{-\gamma} - 1 \approx 0$$

for all $\gamma$!
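The flatness in $\gamma$ can be illustrated numerically. The sketch below uses made-up deterministic series that stay within half a percent (consumption growth) and one percent (returns) of unity; these magnitudes are assumptions chosen for illustration, not estimates from the data.

```python
import numpy as np

t = np.arange(1, 238)                 # 237 made-up observations
growth = 1.0 + 0.005 * np.sin(t)      # c_{t+1}/c_t close to 1
ret = 1.0 + 0.01 * np.cos(t)          # R_{t+1} close to 1

delta = 1.0
gammas = np.linspace(-5.0, 5.0, 11)
# Largest absolute Euler residual |u(w_t; theta)| over all t, one entry per gamma
max_abs_u = np.array(
    [np.max(np.abs(delta * growth ** (-g) * ret - 1.0)) for g in gammas]
)
```

Every entry of `max_abs_u` stays below 0.04 even though $\gamma$ ranges from $-5$ to $5$, so the moment conditions carry very little information about $\gamma$ in this setting.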



Socrative Question 11

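Consider the asymptotic variance of the moments, $S$, given by:

$$S = V\left(\sqrt{T} \, g_T(\theta)\right) = V\left(\frac{1}{\sqrt{T}} \sum_{t=1}^{T} f(w_t, z_t, \theta)\right).$$

Assume that the moments $f_t = f(w_t, z_t, \theta) = z_t \epsilon_t$ are independent over time.

Q. What does the expression for $S$ simplify to?

(A) $S = \frac{1}{T} \sum_{t=1}^{T} E\left(\epsilon_t^2 z_t z_t'\right)$.
(B) $S = \frac{1}{T} \sum_{t=1}^{T} V\left(\epsilon_t^2 z_t z_t'\right)$.
(C) $S = \frac{1}{T} \sum_{t=1}^{T} E\left(z_t \epsilon_t\right)$.
(D) $S = \frac{1}{T} \sum_{t=1}^{T} \epsilon_t^2 E\left(z_t z_t'\right)$.
(E) Don’t know.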

Weight Matrix Estimation (Univariate Case)


• The optimal weight matrix is $W_T^{opt} = S_T^{-1}$, where $S_T$ is a consistent estimator of

$$S = V\left(\sqrt{T} \cdot g_T(\theta)\right) = V\left(\frac{\sqrt{T}}{T} \sum_{t=1}^{T} f_t\right) = \frac{1}{T} \cdot V\left(\sum_{t=1}^{T} f_t\right),$$

where $f_t = f(w_t, z_t, \theta)$.

• If $f_t$ and $f_s$ are independent, then the variance of the sum is the sum of the variances:

$$S = \frac{1}{T} \cdot V\left(\sum_{t=1}^{T} f_t\right) = \frac{1}{T} \sum_{t=1}^{T} V(f_t) = \frac{1}{T} \sum_{t=1}^{T} E\left(f_t^2\right).$$

A natural estimator is

$$S_T = \frac{1}{T} \sum_{t=1}^{T} f_t^2. \qquad (*)$$

• This is robust to heteroskedasticity by construction and is often referred to as the heteroskedasticity consistent (HC) covariance estimator.



Socrative Question 12

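Consider the asymptotic variance of the moments, $S$, given by:

$$S = V\left(\sqrt{T} \, g_T(\theta)\right) = V\left(\frac{1}{\sqrt{T}} \sum_{t=1}^{T} f(w_t, z_t, \theta)\right).$$

Let $f(w_t, z_t, \theta) = z_t \epsilon_t$ and assume that $\epsilon_t$ is conditionally i.i.d.:

$$E(\epsilon_t \mid z_t) = 0, \quad E(\epsilon_t^2 \mid z_t) = \sigma^2, \quad E(\epsilon_t \epsilon_s \mid z_t, z_s) = 0 \text{ for any } t \neq s.$$

Q. What does $S$ simplify to? And why?

(A) $S = \frac{1}{T} \sum_{t=1}^{T} E\left(\epsilon_t^2 z_t z_t'\right)$.
(B) $S = \frac{\sigma^2}{T} \sum_{t=1}^{T} E\left(z_t z_t'\right)$.
(C) $S = \frac{\epsilon_t^2}{T} \sum_{t=1}^{T} E\left(z_t z_t'\right)$.
(D) $S = \frac{\sigma^2}{T} \sum_{t=1}^{T} V\left(z_t\right)$.
(E) Don’t know.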

• If $f_t$ and $f_s$ are correlated, the variance includes the covariances:

$$\begin{aligned}
S &= \frac{1}{T} \cdot V\left(\sum_{t=1}^{T} f_t\right) \\
&= T^{-1} \cdot V(f_1 + f_2 + ... + f_t + ... + f_T) \\
&= T^{-1} \cdot E\left[(f_1 + f_2 + ... + f_t + ... + f_T)(f_1 + f_2 + ... + f_t + ... + f_T)\right] \\
&= T^{-1} \{\, E(f_1^2) + E(f_1 \cdot f_2) + E(f_1 \cdot f_3) + ... + E(f_1 \cdot f_T) \\
&\qquad + E(f_2 \cdot f_1) + E(f_2^2) + E(f_2 \cdot f_3) + ... + E(f_2 \cdot f_T) \\
&\qquad + ... \\
&\qquad + E(f_T \cdot f_1) + E(f_T \cdot f_2) + E(f_T \cdot f_3) + ... + E(f_T^2) \,\}.
\end{aligned}$$

• Defining the autocovariances of the moments as

$$\gamma_j = \frac{1}{T} \sum_{t=j+1}^{T} E(f_t \cdot f_{t-j}),$$

we can write $S$ as

$$S = \gamma_0 + 2 \cdot \gamma_1 + 2 \cdot \gamma_2 + 2 \cdot \gamma_3 + ... + 2 \cdot \gamma_{T-1} = \gamma_0 + 2 \cdot \sum_{j=1}^{T-1} \gamma_j,$$

which is known as the long-run variance.


• Estimators are derived by using

$$\hat\gamma_j = \frac{1}{T} \sum_{t=j+1}^{T} f_t \cdot f_{t-j}.$$

Note that $\hat\gamma_0$ is the HC estimator in $(*)$.
Also note that $\hat\gamma_{T-1}, \hat\gamma_{T-2}, ...$ are based on very few observations: Not consistent!

• One heteroskedasticity and autocorrelation consistent (HAC) variance estimator is

$$S_T = \hat\gamma_0 + 2 \cdot \sum_{j=1}^{T-1} \hat\gamma_j.$$

Problem: Not consistent because it uses $\hat\gamma_{T-1}$.

• If $\gamma_j = 0$ for $j \geq q$, then we can use the truncated estimator

$$S_T = \hat\gamma_0 + 2 \cdot \sum_{j=1}^{q-1} \hat\gamma_j.$$

Problem: Not necessarily positive definite.
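A small numerical sketch of the sample autocovariances and the truncated estimator, for a scalar moment series. The helper names `autocov` and `truncated_lrv` and the four-observation alternating series are invented for illustration; the series is chosen so that the positive-definiteness problem shows up.

```python
import numpy as np

def autocov(f, j):
    """gamma_hat_j = (1/T) * sum_{t=j+1}^{T} f_t * f_{t-j} for a scalar series f."""
    f = np.asarray(f, dtype=float)
    T = f.size
    return float(np.sum(f[j:] * f[:T - j]) / T)

def truncated_lrv(f, q):
    """Truncated estimator: gamma_hat_0 + 2 * sum_{j=1}^{q-1} gamma_hat_j."""
    return autocov(f, 0) + 2.0 * sum(autocov(f, j) for j in range(1, q))

f = [1.0, -1.0, 1.0, -1.0]       # strongly negatively autocorrelated toy series
gamma0 = autocov(f, 0)           # HC estimator: (1 + 1 + 1 + 1) / 4 = 1
gamma1 = autocov(f, 1)           # (-1 - 1 - 1) / 4 = -0.75
S_trunc = truncated_lrv(f, q=2)  # 1 + 2 * (-0.75) = -0.5
```

The truncated estimate is negative here, which is not admissible as a variance.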


• A more general trick is to use a weight $w_j \to 0$ on covariance $j$.
This class of so-called kernel estimators can be written as

$$S_T = \hat\gamma_0 + \sum_{j=1}^{T-1} w_j \cdot 2 \cdot \hat\gamma_j,$$

where $w_j$ is a kernel weight depending on the lag, $j$, and a bandwidth parameter, $q$.

• Example: Bartlett kernel (Newey-West estimator):

[Figure: (A) Weights in the Bartlett kernel, q = 6. Horizontal axis: lags from −8 to 8. Vertical axis: weights 0, 1/6, 2/6, 3/6, 4/6, 5/6, 1.]
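The Bartlett weights and the resulting estimator can be sketched as follows for a scalar moment series. The weight formula $w_j = 1 - j/q$ for $j < q$ (and zero otherwise) is an assumption read off the figure's vertical axis for $q = 6$; other texts use slightly different bandwidth conventions. Function names are illustrative.

```python
import numpy as np

def bartlett_weights(q):
    """Weights w_j = 1 - j/q for lags j = 1, ..., q-1 (w_j = 0 for j >= q)."""
    return np.array([1.0 - j / q for j in range(1, q)])

def newey_west(f, q):
    """S_T = gamma_hat_0 + 2 * sum_j w_j * gamma_hat_j for a scalar series f."""
    f = np.asarray(f, dtype=float)
    T = f.size
    w = bartlett_weights(q)

    def gamma_hat(j):
        return np.sum(f[j:] * f[:T - j]) / T

    # Only lags below both the bandwidth q and the sample size T contribute
    return float(gamma_hat(0) + 2.0 * sum(w[j - 1] * gamma_hat(j) for j in range(1, min(q, T))))

weights_q6 = bartlett_weights(6)   # 5/6, 4/6, 3/6, 2/6, 1/6
f = [1.0, -1.0, 1.0, -1.0]         # made-up alternating series
S_nw = newey_west(f, q=2)          # 1 + 2 * (1/2) * (-0.75) = 0.25
```

For this alternating series the unweighted sum with the same lag would be $1 + 2 \cdot (-0.75) = -0.5$, whereas the down-weighted estimate stays positive.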



Socrative Question 13
Consider the two-stage least squares (2SLS) estimator of $\beta$ in the model:

$$Y = X\beta + \epsilon.$$

Step 1. Regress the endogenous variables $X$ on the instruments $Z$ to get $\hat X$:

$$X = Z\gamma + U, \qquad \hat\gamma = \,???, \qquad \hat X = Z\hat\gamma = \,???$$

Step 2. Regress $Y$ on $\hat X$:

$$Y = \hat X B + E, \qquad \hat B = (\hat X'\hat X)^{-1}\hat X'Y = \,???$$

Q. What are $\hat\gamma$ and $\hat B$?
(A) $\hat\gamma = (ZZ')^{-1}ZX$ and $\hat B = (X'Z(Z'Z)^{-1}Z'X)^{-1}X'Z(Z'Z)^{-1}Z'Y$.
(B) $\hat\gamma = (Z'Z)^{-1}Z'X$ and $\hat B = (Z'X(X'X)^{-1}X'Z)^{-1}Z'X(X'X)^{-1}X'Y$.
(C) $\hat\gamma = (Z'Z)^{-1}Z'X$ and $\hat B = (XZ'(Z'Z)^{-1}ZX')^{-1}XZ'(Z'Z)^{-1}ZY$.
(D) $\hat\gamma = (Z'Z)^{-1}Z'X$ and $\hat B = (X'Z(Z'Z)^{-1}Z'X)^{-1}X'Z(Z'Z)^{-1}Z'Y$.
(E) Don’t know.

Example: The Linear Model

• Consider again a regression model

$$y_t = x_t'\beta_0 + \epsilon_t = x_{1t}'\gamma_0 + x_{2t}'\delta_0 + \epsilon_t,$$

where $E(x_{1t}\epsilon_t) = 0$ and $E(x_{2t}\epsilon_t) \neq 0$.
Assume that you have $R > K$ valid instruments in $z_t$ so that

$$g(\beta_0) = E(z_t \epsilon_t) = E\left(z_t (y_t - x_t'\beta_0)\right) = 0.$$

• The corresponding sample moments are given by

$$\underset{(R \times 1)}{g_T(\beta)} = \frac{1}{T} \sum_{t=1}^{T} z_t \left(y_t - x_t'\beta\right) = \frac{1}{T} Z'(Y - X\beta),$$

where $Y$ $(T \times 1)$, $X$ $(T \times K)$, and $Z$ $(T \times R)$ are the stacked matrices.

• In this case we cannot solve $g_T(\beta) = 0$ directly; $Z'X$ is $R \times K$ and not invertible.



Socrative Question 14
Consider the linear model

$$Y = X\beta + \epsilon,$$

with moment conditions, $E(z_t \epsilon_t) = 0$, and the quadratic form

$$Q_T(\beta) = g_T(\beta)' W_T \, g_T(\beta) = T^{-2}\left(Y'ZW_TZ'Y - 2\beta'X'ZW_TZ'Y + \beta'X'ZW_TZ'X\beta\right).$$

Given some weight matrix, the GMM estimator solves:

$$\frac{\partial Q_T(\beta)}{\partial \beta} = 0.$$

Q. What is the closed-form solution for $\hat\beta_{GMM}(W_T)$?

(A) $\hat\beta_{GMM}(W_T) = (Z'XW_TX'Z)^{-1} Z'XW_TX'Y$.
(B) $\hat\beta_{GMM}(W_T) = -2T^{-2}(X'ZW_TZ'X)^{-1} X'ZW_TZ'Y$.
(C) $\hat\beta_{GMM}(W_T) = (X'ZW_TZ'X)^{-1} X'ZW_TZ'Y$.
(D) $\hat\beta_{GMM}(W_T) = \left(X'Z(Z'Z)^{-1}Z'X\right)^{-1} X'Z(Z'Z)^{-1}Z'Y$.
(E) Don’t know.
From Lecture Note 3
Let $A$ be a $(n \times k)$ matrix, $V$ a $(k \times k)$ matrix, and $\beta$ a $(k \times 1)$ vector of parameters.

$$\frac{\partial(\beta'A')}{\partial\beta} = A'. \qquad (7*)$$

$$\frac{\partial(\beta'V\beta)}{\partial\beta} = (V + V')\beta. \qquad (8*)$$

• Instead, we want to derive the GMM estimator by minimizing the criterion function

$$\begin{aligned}
Q_T(\beta) &= g_T(\beta)' W_T \, g_T(\beta) \\
&= \left(T^{-1}Z'(Y - X\beta)\right)' W_T \left(T^{-1}Z'(Y - X\beta)\right) \\
&= T^{-2}\left(Y'ZW_TZ'Y - 2\beta'X'ZW_TZ'Y + \beta'X'ZW_TZ'X\beta\right).
\end{aligned}$$

• We take the first derivative, and the GMM estimator is the solution to

$$\frac{\partial Q_T(\beta)}{\partial\beta} = -2T^{-2}X'ZW_TZ'Y + 2T^{-2}X'ZW_TZ'X\beta = 0.$$

• We find the general closed-form GMM estimator in the linear model

$$\hat\beta_{GMM}(W_T) = \left(X'ZW_TZ'X\right)^{-1} X'ZW_TZ'Y,$$

given some symmetric and positive definite weight matrix $W_T$.
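The closed-form estimator translates directly into matrix code. The sketch below uses simulated data with one endogenous regressor and three instruments; the data-generating process, the seed, and the function name `linear_gmm` are assumptions made for illustration.

```python
import numpy as np

def linear_gmm(Y, X, Z, W):
    """beta_hat(W) = (X'Z W Z'X)^{-1} X'Z W Z'Y."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ Y))

# Simulated (hypothetical) data: K = 1 endogenous regressor, R = 3 instruments
rng = np.random.default_rng(0)
T = 500
Z = rng.standard_normal((T, 3))
v = rng.standard_normal(T)
eps = 0.8 * v + 0.6 * rng.standard_normal(T)  # correlated with v, so x is endogenous
X = (Z @ np.ones(3) + v)[:, None]
Y = 2.0 * X[:, 0] + eps                       # true beta_0 = 2

W = np.eye(3)                                 # any symmetric positive definite W works
beta_hat = linear_gmm(Y, X, Z, W)
# First-order condition (gradient of Q_T up to the factor -2), evaluated at beta_hat
foc = X.T @ Z @ W @ Z.T @ (Y - X @ beta_hat) / T**2
```

The vector `foc` is numerically zero at `beta_hat`, confirming that the closed form solves the first-order condition for the chosen weight matrix.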


Example Case 1: Linear Model with Conditional Heteroskedasticity

We now consider the linear model

$$y_t = x_t'\beta_0 + \epsilon_t, \quad t = 1, 2, ..., T,$$

where we allow for heteroskedasticity of the moments $f(w_t, z_t, \theta) = z_t \epsilon_t$.

• From before, we have the closed-form solution GMM estimator:

$$\hat\beta_{GMM}(W_T) = \left(X'ZW_TZ'X\right)^{-1} X'ZW_TZ'Y,$$

given some symmetric and positive definite weight matrix $W_T$.

• To estimate the optimal weight matrix, $W_T^{opt} = S_T^{-1}$, we use the estimator

$$S_T = \frac{1}{T} \sum_{t=1}^{T} f(w_t, z_t, \theta) f(w_t, z_t, \theta)' = \frac{1}{T} \sum_{t=1}^{T} \hat\epsilon_t^2 z_t z_t',$$

which allows for general heteroskedasticity of the disturbance term.


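• For the asymptotic distribution, we recall that

$$\hat\beta_{GMM} \overset{a}{\sim} N\left(\beta_0, \; T^{-1}\left(D'S^{-1}D\right)^{-1}\right).$$

The derivative is given by

$$\underset{(R \times K)}{D_T} = \frac{\partial g_T(\beta)}{\partial\beta'} = \frac{\partial\left(T^{-1}\sum_{t=1}^{T} z_t (y_t - x_t'\beta)\right)}{\partial\beta'} = -T^{-1}\sum_{t=1}^{T} z_t x_t',$$

so the variance of the estimator becomes

$$\begin{aligned}
V\left(\hat\beta_{GMM}\right) &= T^{-1}\left(D_T' W_T^{opt} D_T\right)^{-1} \\
&= T^{-1}\left(\left(-T^{-1}\sum_{t=1}^{T} x_t z_t'\right)\left(T^{-1}\sum_{t=1}^{T} \hat\epsilon_t^2 z_t z_t'\right)^{-1}\left(-T^{-1}\sum_{t=1}^{T} z_t x_t'\right)\right)^{-1} \\
&= \left(\sum_{t=1}^{T} x_t z_t' \left(\sum_{t=1}^{T} \hat\epsilon_t^2 z_t z_t'\right)^{-1} \sum_{t=1}^{T} z_t x_t'\right)^{-1}.
\end{aligned}$$

• Note that this is the heteroskedasticity consistent (HC) variance estimator (White).
GMM with allowance for heteroskedastic errors automatically produces heteroskedasticity consistent standard errors!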
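A two-step version of this procedure might look as follows. The simulated heteroskedastic data, the seed, and the choice of the identity matrix in the first step are all assumptions made for illustration; the residuals used in $S_T$ are the first-step residuals.

```python
import numpy as np

def linear_gmm(Y, X, Z, W):
    """beta_hat(W) = (X'Z W Z'X)^{-1} X'Z W Z'Y."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ Y))

# Simulated (hypothetical) data with heteroskedastic errors
rng = np.random.default_rng(1)
T = 400
Z = rng.standard_normal((T, 3))
v = rng.standard_normal(T)
eps = (0.8 * v + 0.6 * rng.standard_normal(T)) * (1.0 + 0.5 * np.abs(Z[:, 0]))
X = (Z @ np.ones(3) + v)[:, None]
Y = 2.0 * X[:, 0] + eps

# Step 1: consistent but inefficient estimate with W_T = I
beta_1 = linear_gmm(Y, X, Z, np.eye(3))
e = Y - X @ beta_1
# HC estimate S_T = (1/T) * sum_t e_t^2 z_t z_t'
S_T = (Z * e[:, None] ** 2).T @ Z / T
# Step 2: efficient GMM with W_T^opt = S_T^{-1}
beta_2 = linear_gmm(Y, X, Z, np.linalg.inv(S_T))

# Variance in "sum" form: (sum x z' (sum e^2 z z')^{-1} sum z x')^{-1}
V_hat = np.linalg.inv(X.T @ Z @ np.linalg.inv(T * S_T) @ Z.T @ X)
# Same quantity via T^{-1} (D_T' S_T^{-1} D_T)^{-1} with D_T = -(1/T) Z'X
D_T = -Z.T @ X / T
V_alt = np.linalg.inv(D_T.T @ np.linalg.inv(S_T) @ D_T) / T
se = np.sqrt(np.diag(V_hat))  # HC standard errors
```

`V_hat` and `V_alt` agree up to rounding, which mirrors the cancellation of the factors of $T$ in the derivation.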
Socrative Question 15
Consider the linear model

$$Y = X\beta + \epsilon,$$

with moment conditions, $E(z_t \epsilon_t) = 0$, and closed-form solution for the GMM estimator

$$\hat\beta_{GMM}(W_T) = \left(X'ZW_TZ'X\right)^{-1} X'ZW_TZ'Y.$$

Assume IID errors, such that we estimate the optimal weight matrix, $W_T^{opt} = S_T^{-1}$, based on

$$S_T = \frac{\hat\sigma^2}{T} \sum_{t=1}^{T} z_t z_t' = T^{-1}\hat\sigma^2 Z'Z, \qquad \hat\sigma^2 = \frac{1}{T} \sum_{t=1}^{T} \hat\epsilon_t^2.$$

Q. What does the efficient GMM estimator, $\hat\beta_{GMM}(W_T^{opt})$, simplify to?

(A) $\hat\beta_{GMM}(W_T) = (Z'XZ'ZX'Z)^{-1} Z'XZ'ZX'Y$.
(B) $\hat\beta_{GMM}(W_T) = \left(X'Z(\hat\sigma^2 Z'Z)Z'X\right)^{-1} X'Z(\hat\sigma^2 Z'Z)Z'Y$.
(C) $\hat\beta_{GMM}(W_T) = \left(X'Z(Z'Z)^{-1}Z'X\right)^{-1} X'Z(Z'Z)^{-1}Z'Y$.
(D) $\hat\beta_{GMM}(W_T) = T^{-2}\left(X'Z(\hat\sigma^2 Z'Z)^{-1}Z'X\right)^{-1} X'Z(\hat\sigma^2 Z'Z)^{-1}Z'Y$.
(E) Don’t know.

Example Case 2: Linear Model with Conditionally IID Errors

We now consider the linear model

$$y_t = x_t'\beta_0 + \epsilon_t, \quad t = 1, 2, ..., T,$$

with moment conditions, $E(z_t \epsilon_t) = 0$, and assuming conditional homoskedasticity and independence over time (IID) of the error term $\epsilon_t$:

$$E(\epsilon_t \mid z_t) = 0, \quad E(\epsilon_t^2 \mid z_t) = \sigma^2, \quad E(\epsilon_t \epsilon_s \mid z_t, z_s) = 0 \text{ for any } t \neq s.$$

• From before, we have the closed-form solution GMM estimator:

$$\hat\beta_{GMM}(W_T) = \left(X'ZW_TZ'X\right)^{-1} X'ZW_TZ'Y,$$

given some symmetric and positive definite weight matrix $W_T$.


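• If we assume that the error terms are IID, the estimator $S_T$, whose inverse is the optimal weight matrix, simplifies to

$$S_T = \frac{\hat\sigma^2}{T} \sum_{t=1}^{T} z_t z_t' = T^{-1}\hat\sigma^2 Z'Z,$$

where $\hat\sigma^2$ is a consistent estimator for $\sigma^2$.

• In this case the efficient GMM estimator becomes

$$\begin{aligned}
\hat\beta_{GMM} &= \left(X'ZS_T^{-1}Z'X\right)^{-1} X'ZS_T^{-1}Z'Y \\
&= \left(X'Z\left(T^{-1}\hat\sigma^2 Z'Z\right)^{-1}Z'X\right)^{-1} X'Z\left(T^{-1}\hat\sigma^2 Z'Z\right)^{-1}Z'Y \\
&= \left(X'Z\left(Z'Z\right)^{-1}Z'X\right)^{-1} X'Z\left(Z'Z\right)^{-1}Z'Y,
\end{aligned}$$

which is identical to the two stage least squares (2SLS) estimator.

• The variance of the estimator is

$$V\left(\hat\beta_{GMM}\right) = T^{-1}\left(D_T' S_T^{-1} D_T\right)^{-1} = \hat\sigma^2 \left(X'Z\left(Z'Z\right)^{-1}Z'X\right)^{-1},$$

which again coincides with the 2SLS variance.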
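The equivalence can be checked numerically by computing the GMM closed form with $W_T \propto (Z'Z)^{-1}$ and, separately, 2SLS as two explicit least squares regressions. The simulated data and the seed are assumptions made for illustration.

```python
import numpy as np

# Simulated (hypothetical) data: one endogenous regressor, three instruments
rng = np.random.default_rng(2)
T = 300
Z = rng.standard_normal((T, 3))
v = rng.standard_normal(T)
X = (Z @ np.array([1.0, 0.5, -0.5]) + v)[:, None]
Y = 2.0 * X[:, 0] + 0.8 * v + 0.6 * rng.standard_normal(T)

# Efficient GMM under IID errors: the scalar T / sigma^2 cancels, so W = (Z'Z)^{-1}
W = np.linalg.inv(Z.T @ Z)
XZ = X.T @ Z
beta_gmm = np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ Y))

# 2SLS as two explicit OLS regressions
gamma_hat = np.linalg.lstsq(Z, X, rcond=None)[0]      # first stage: X on Z
X_hat = Z @ gamma_hat
beta_2sls = np.linalg.lstsq(X_hat, Y, rcond=None)[0]  # second stage: Y on X_hat

# Variance estimate sigma_hat^2 * (X'Z (Z'Z)^{-1} Z'X)^{-1}
e = Y - X @ beta_gmm
sigma2_hat = e @ e / T
V_hat = sigma2_hat * np.linalg.inv(XZ @ W @ XZ.T)
```

Note that the residuals in `sigma2_hat` use the original regressors $X$, not the fitted values $\hat X$.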



7. Pseudo-Maximum Likelihood Estimation
Socrative Question 17

Consider the linear regression model

$$y_t = x_t'\beta + \epsilon_t, \quad t = 1, 2, ..., T,$$

where we assume $\epsilon_t \sim iidN(0, \sigma^2)$ and estimate the parameters $\beta$ by maximum likelihood:

$$\hat\beta_{ML} = \left(\sum_{t=1}^{T} x_t x_t'\right)^{-1} \sum_{t=1}^{T} x_t y_t.$$

Q. What are the asymptotic properties of $\hat\beta_{ML}$ if $\epsilon_t$ is i.i.d. but NOT normally distributed?

(A) $\hat\beta_{ML}$ is consistent, but not asymptotically normally distributed.
(B) $\hat\beta_{ML}$ is inconsistent and not asymptotically normally distributed.
(C) $\hat\beta_{ML}$ is consistent and asymptotically normally distributed.
(D) $\hat\beta_{ML}$ is asymptotically normally distributed, but inconsistent.
(E) Don’t know.

Pseudo-ML (PML) Estimation (Box 2 in the lecture note)


• The first order conditions for ML estimation can be seen as a sample counterpart to a moment condition. With $s_t(\theta)$ the first derivative of the log-likelihood contribution,

$$\frac{1}{T} s(\theta) = \frac{1}{T} \sum_{t=1}^{T} s_t(\theta) = 0 \quad \text{corresponds to} \quad E(s_t(\theta)) = 0,$$

and ML becomes a special case of GMM.

• $\hat\theta_{ML}$ is consistent under weaker assumptions than those maintained by ML.
E.g.: The FOC for a normal regression model corresponds to

$$E\left(x_t (y_t - x_t'\beta)\right) = 0,$$

which is weaker than the assumption that the entire distribution is correctly specified. OLS is consistent even if $\epsilon_t$ (the error term) is not normal.

• An ML estimator that maximizes a likelihood function different from the true model likelihood is referred to as a pseudo-ML or quasi-ML estimator.
Note that the variance matrix is, in general, no longer the inverse information.

Quasi-Maximum Likelihood Estimation: The Location Model

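Consider the location model

$$y_t = \theta + \epsilon_t, \quad \epsilon_t \sim IID(0, 1).$$

We may consider the pseudo-log-likelihood function based on the assumption that $\epsilon_t$ is normal:

$$L_T(\theta) = \sum_{t=1}^{T} \left(-\frac{1}{2}\log(2\pi) - \frac{(y_t - \theta)^2}{2}\right).$$

The QMLE maximizes $L_T(\theta)$, and the FOC yields

$$\frac{\partial L_T(\theta)}{\partial\theta} = 0 \Leftrightarrow \sum_{t=1}^{T} (y_t - \theta) = 0.$$

This is exactly the sample moment condition corresponding to $E[y_t - \theta] = 0$.

Even if $\epsilon_t$ is not normal, we obtain $\bar y = \frac{1}{T}\sum_{t=1}^{T} y_t$ as the estimator for $\theta$.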
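As a quick numerical check, the Gaussian pseudo-log-likelihood can be maximised over a grid for a small sample that is clearly not normal. The three data points and the grid are made up for illustration.

```python
import numpy as np

def pseudo_loglik(theta, y):
    """Gaussian pseudo-log-likelihood L_T(theta) with unit variance."""
    return float(np.sum(-0.5 * np.log(2.0 * np.pi) - 0.5 * (y - theta) ** 2))

y = np.array([0.0, 1.0, 5.0])       # made-up, skewed sample with mean 2
grid = np.linspace(0.0, 5.0, 501)   # candidate values for theta, step 0.01
theta_hat = grid[np.argmax([pseudo_loglik(th, y) for th in grid])]
```

The maximiser on the grid coincides with the sample mean of `y`, regardless of the shape of the data.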


Quasi-Maximum Likelihood Estimation: The General Case


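With $\theta$ the parameter vector and $l_t(\theta)$ the log-likelihood contribution, recall that the ML estimator satisfies

$$\sqrt{T}\left(\hat\theta_{ML} - \theta_0\right) \to N\left(0, J^{-1}\right),$$

where

$$J = -E\left[\frac{\partial^2 l_t(\theta_0)}{\partial\theta\,\partial\theta'}\right], \text{ positive definite (the information matrix)}.$$

Suppose that $l_t(\theta)$ is not necessarily based on the true model likelihood. Then we introduce the quasi-maximum likelihood estimator (QMLE),

$$\hat\theta_{QML} = \arg\max_{\theta} \sum_{t=1}^{T} l_t(\theta).$$

Under suitable conditions,

$$\sqrt{T}\left(\hat\theta_{QML} - \theta_0\right) \to N\left(0, J^{-1}\Sigma J^{-1}\right),$$

where

$$\Sigma = E\left[\frac{\partial l_t(\theta_0)}{\partial\theta} \frac{\partial l_t(\theta_0)}{\partial\theta'}\right], \text{ positive definite (the variance of the score)}.$$

If $l_t(\theta)$ is the true model likelihood, $J = \Sigma$, and the asymptotic variance is $J^{-1}$.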
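A worked special case may help fix ideas. The sketch below takes the location model $y_t = \theta_0 + \epsilon_t$ but assumes $\epsilon_t \sim IID(0, \sigma^2)$ with $\sigma^2 \neq 1$, an assumption added here for illustration, while the pseudo-likelihood still imposes normality with unit variance.

```latex
\begin{aligned}
l_t(\theta) &= -\tfrac{1}{2}\log(2\pi) - \tfrac{1}{2}(y_t - \theta)^2, \\
\frac{\partial l_t(\theta)}{\partial\theta} &= y_t - \theta, \qquad
\frac{\partial^2 l_t(\theta)}{\partial\theta^2} = -1, \\
J &= 1, \qquad \Sigma = E\left[(y_t - \theta_0)^2\right] = \sigma^2, \\
\sqrt{T}\,(\hat\theta_{QML} - \theta_0) &\to N\left(0, J^{-1}\Sigma J^{-1}\right) = N(0, \sigma^2).
\end{aligned}
```

In this case the inverse information $J^{-1} = 1$ misstates the variance unless $\sigma^2 = 1$, whereas the sandwich form recovers the familiar asymptotic variance of the sample mean.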

Quasi-Maximum Likelihood Estimation: ARCH


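Consider a simple version of the ARCH model:

$$y_t = \sigma_t z_t, \quad z_t \sim IID(0, 1), \qquad \sigma_t^2 = \omega + \alpha y_{t-1}^2.$$

The model parameters are $\theta = (\omega, \alpha)'$. The quasi-log-likelihood function based on the normal distribution is

$$L_T(\theta) = \sum_{t=1}^{T} l_t(\theta), \qquad l_t(\theta) = -\frac{1}{2}\log(2\pi) - \frac{1}{2}\log\left(\sigma_t^2(\theta)\right) - \frac{y_t^2}{2\sigma_t^2(\theta)}.$$

Under suitable conditions,

$$\sqrt{T}\left(\hat\theta_{QML} - \theta_0\right) \to N\left(0, J^{-1}\Sigma J^{-1}\right).$$

It holds that

$$J = \frac{1}{2} E\left[\frac{1}{\sigma_t^4(\theta_0)} \frac{\partial\sigma_t^2(\theta_0)}{\partial\theta} \frac{\partial\sigma_t^2(\theta_0)}{\partial\theta'}\right] \quad \text{and} \quad \Sigma = \frac{\kappa}{2} J, \quad \text{where } \kappa = E\left[(z_t^2 - 1)^2\right].$$

Hence,

$$\sqrt{T}\left(\hat\theta_{QML} - \theta_0\right) \to N\left(0, \frac{\kappa}{2} J^{-1}\right).$$

Note that if $z_t \sim N(0, 1)$, then $E[z_t^4] = 3$ and $\kappa = E[z_t^4] + 1 - 2E[z_t^2] = 2$.
Hence if $z_t$ is normal, such that we have chosen the correct likelihood, we have that the asymptotic variance is $J^{-1}$.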
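To see how much the variance can change when the normal likelihood is not the true one, consider as an illustration (an assumption made here, not part of the slides) that $z_t$ follows a Student-$t$ distribution with $\nu = 5$ degrees of freedom, scaled to unit variance, and that the regularity conditions still hold.

```latex
\begin{aligned}
E[z_t^4] &= \frac{3(\nu - 2)}{\nu - 4} = 9 \quad (\nu = 5), \\
\kappa &= E[z_t^4] + 1 - 2E[z_t^2] = 9 + 1 - 2 = 8, \\
\sqrt{T}\,(\hat\theta_{QML} - \theta_0) &\to N\left(0, \tfrac{\kappa}{2} J^{-1}\right) = N\left(0, 4 J^{-1}\right).
\end{aligned}
```

The asymptotic variance is then four times the inverse information, so standard errors based on $J^{-1}$ alone would be too small by a factor of two.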
8. Concluding Remarks

Comparison of ML and GMM

               Maximum Likelihood                     Generalized Method of Moments

Assumptions:   Full specification.                    Partial specification/weak assumptions.
               Know Density(θ0) apart from θ0.        Moment conditions: E(f(data; θ0)) = 0.
                                                      Strong economic assumptions.

Efficiency:    Cramér-Rao lower bound.                Efficient based on moment condition.
               (Smallest possible variance).          Never smaller than Cramér-Rao.

Typical        Statistical description of the data.   Estimate relevant parameters of
approach:      Misspecification testing.              economic model.
               Restrictions recover economics.        (Not much attention to stationarity.)

Robustness:    First order conditions should hold!    Moment conditions should hold!
               PML is a GMM interpretation of ML.     Weights and variances can
               Use larger PML variance.               be made robust.
