
The endogeneity problem

Proxy variables
Instrumental variables
STATA

Endogeneity

Gabriel V. Montes-Rojas

The endogeneity problem

Consider the wage equation model

log(wage) = β0 + β1 educ + β2 exper + β3 abil + u

Our primary interest is in the estimation of β1 and β2.

However, if abil is not observed and is therefore left out of the regression, the OLS estimators of β1 and β2 are biased whenever abil is correlated with educ or exper (see Omitted Variables Bias).
In practice, we can only estimate the model

log(wage) = β0 + β1 educ + β2 exper + v

where v = β3 abil + u. The main problem is that educ and exper are then not exogenous; they are endogenous:
Cov(educ, v) ≠ 0, Cov(exper, v) ≠ 0.
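The bias from omitting abil can be illustrated with simulated data. The following is a minimal sketch, not part of the original slides: every number in the data-generating process is an assumption chosen for illustration, and exper is generated independently of abil, so only educ is endogenous here. Under these assumed values the short regression should overstate the return to education, since its probability limit is 0.08 + 0.10 × 2/5 = 0.12.

```stata
* Simulated illustration of omitted-variable bias (all values are made up).
clear
set seed 12345
set obs 5000

* Unobserved ability; schooling rises with ability
generate abil  = rnormal()
generate educ  = 12 + 2*abil + rnormal()
* Experience is independent of ability in this sketch
generate exper = 10 + 3*rnormal()
generate lwage = 1 + 0.08*educ + 0.02*exper + 0.10*abil + 0.3*rnormal()

* Infeasible "long" regression that controls for ability
regress lwage educ exper abil
scalar b_long = _b[educ]

* Feasible "short" regression: abil is absorbed into the error term v
regress lwage educ exper
scalar b_short = _b[educ]

display "educ coefficient, controlling for abil: " b_long
display "educ coefficient, abil omitted:         " b_short
```

With real data the first regression is not available, which is why the remaining slides look at proxies and instruments.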
There are 3 possible solutions to this problem:

1. Measure abil. But this is almost impossible: how would you measure it?
2. Find a proxy variable for ability.
3. Find an instrumental variable for the endogenous variables.

Proxy variables

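Consider the wage equation model

log(wage) = β0 + β1 educ + β2 exper + β3 abil + u

A proxy variable for abil is IQ.

A proxy variable should satisfy:
1. abil = δ0 + δ3 IQ + v3, where v3 is uncorrelated with educ, exper and IQ.
2. u is uncorrelated with educ, exper and abil, and also with IQ (IQ adds nothing to the wage equation once abil is controlled for).
Then, substituting (1) into the wage equation, we can estimate
log(wage) = (β0 + β3 δ0) + β1 educ + β2 exper + β3 δ3 IQ + u + β3 v3
by OLS. The composite error u + β3 v3 is uncorrelated with all the regressors, so β1 and β2 are estimated consistently; the coefficient on IQ estimates β3 δ3, not β3.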


Example 9.3: IQ as a Proxy for Ability (WAGE2 database)


Dependent variable: log(wage). Standard errors in parentheses.

Ind. vars      (1)        (2)        (3)
educ          .065       .054       .018
             (.006)     (.007)     (.041)
exper         .014       .014       .014
             (.002)     (.002)     (.003)
tenure        .012       .011       .011
             (.002)     (.002)     (.002)
married       .199       .200       .201
             (.039)     (.039)     (.039)
south        -.091      -.080      -.080
             (.026)     (.026)     (.026)
urban         .184       .182       .184
             (.027)     (.027)     (.027)
black        -.188      -.143      -.147
             (.038)     (.039)     (.040)
IQ              -        .0036     -.0009
                        (.0010)    (.0052)
educ·IQ         -          -        .00034
                                   (.00038)
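The three columns correspond to three OLS regressions. A minimal sketch of the commands follows, assuming the WAGE2 data are already loaded in memory and that the variables are named lwage, educ, exper, tenure, married, south, urban, black and IQ. Those names, and the new variable educIQ, are assumptions for illustration and should be checked against your copy of the file.

```stata
* Assumes the WAGE2 data set is already in memory (variable names are assumptions).

* Column (1): ability omitted
regress lwage educ exper tenure married south urban black

* Column (2): IQ added as a proxy for ability
regress lwage educ exper tenure married south urban black IQ

* Column (3): IQ plus its interaction with education
generate educIQ = educ*IQ
regress lwage educ exper tenure married south urban black IQ educIQ
```

Comparing the educ coefficient across columns (1) and (2) shows how much of the estimated return to education is attributed to the proxy once IQ is included.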
Bias when using a proxy


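Assume instead that the proxy equation also involves educ and exper:

abil = δ0 + δ1 educ + δ2 exper + δ3 IQ + v3

⇒ log(wage) = (β0 + β3 δ0) + (β1 + β3 δ1) educ + (β2 + β3 δ2) exper + β3 δ3 IQ + u + β3 v3

OLS then estimates β1 + β3 δ1 and β2 + β3 δ2 rather than β1 and β2: the proxy removes the bias only if δ1 = δ2 = 0.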

Instrumental variables

Consider the model

y = β0 + β1 x + u
where Cov(x, u) ≠ 0 (i.e. x is endogenous).
A good instrumental variable (say z) satisfies these two conditions:
1. It is not correlated with the error term: Cov(z, u) = 0
2. It is correlated with the endogenous variable: Cov(x, z) ≠ 0

How can we estimate β 1 using z?


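Note that

β1 = Cov(z, y) / Cov(z, x)

Why?

Cov(z, y) = Cov(z, β0 + β1 x + u)
          = Cov(z, β0) + Cov(z, β1 x) + Cov(z, u)
          = 0 + β1 Cov(z, x) + 0

because β0 is a constant and Cov(z, u) = 0. Dividing by Cov(z, x) ≠ 0 gives the result. Replacing the population covariances by their sample counterparts gives the IV estimator of β1.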
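The covariance ratio can be checked on a tiny hand-made data set. This is a sketch with made-up numbers, not an example from the slides: the four values of z and u are chosen so that the sample covariance of z and u is exactly zero while x and u are correlated. By hand, the sums of cross-products give Cov(z, y)/Cov(z, x) = 8/4 = 2 (the true slope), whereas the OLS slope is 20/8 = 2.5.

```stata
* Tiny made-up data set: z is uncorrelated with u in the sample, x is not.
clear
input z u
-1  1
 1  1
-1 -1
 1 -1
end
generate x = z + u
generate y = 1 + 2*x + u

* IV slope as a ratio of sample covariances
quietly correlate z y, covariance
scalar cov_zy = r(cov_12)
quietly correlate z x, covariance
scalar cov_zx = r(cov_12)
scalar b_ratio = cov_zy/cov_zx
display "Cov(z,y)/Cov(z,x) = " b_ratio

* OLS is pulled away from 2 because Cov(x,u) is not zero
regress y x
scalar b_ols = _b[x]

* The built-in IV estimator should reproduce the covariance ratio
ivregress 2sls y (x = z)
scalar b_iv = _b[x]
```

With a single instrument and a single endogenous regressor, the 2SLS coefficient and the covariance ratio are the same estimator.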

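IV as a two-stage least squares estimator

x = γ0 + γ1 z + v          (1)   first stage: regress x on z

x̂ = γ̂0 + γ̂1 z             (2)   fitted values from the first stage

y = β0 + β1 x̂ + error      (3)   second stage: regress y on x̂

The OLS slope in (3) is the IV estimator of β1.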
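To see why the second-stage slope in (3) coincides with the covariance-ratio form of the IV estimator, the following short derivation may help. It is a sketch for the single-instrument case only, with hats on Cov and Var denoting sample moments (notation introduced here, not in the slides), and it uses the first-stage OLS slope $\hat\gamma_1 = \widehat{\mathrm{Cov}}(z,x)/\widehat{\mathrm{Var}}(z)$.

```latex
\begin{aligned}
\hat\beta_1^{\,2SLS}
  &= \frac{\widehat{\mathrm{Cov}}(\hat x, y)}{\widehat{\mathrm{Var}}(\hat x)}
   = \frac{\widehat{\mathrm{Cov}}(\hat\gamma_0 + \hat\gamma_1 z,\; y)}
          {\widehat{\mathrm{Var}}(\hat\gamma_0 + \hat\gamma_1 z)} \\
  &= \frac{\hat\gamma_1\,\widehat{\mathrm{Cov}}(z, y)}
          {\hat\gamma_1^{2}\,\widehat{\mathrm{Var}}(z)}
   = \frac{\widehat{\mathrm{Cov}}(z, y)}
          {\hat\gamma_1\,\widehat{\mathrm{Var}}(z)}
   = \frac{\widehat{\mathrm{Cov}}(z, y)}{\widehat{\mathrm{Cov}}(z, x)} .
\end{aligned}
```

The last step substitutes $\hat\gamma_1\,\widehat{\mathrm{Var}}(z) = \widehat{\mathrm{Cov}}(z,x)$, which requires the first-stage slope to be nonzero, i.e. the instrument must be relevant.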

Instrumental variables: multiple regression model

Consider the model

y = β0 + β1 x1 + β2 x2 + u
where Cov(x1, u) ≠ 0 (i.e. x1 is endogenous) and Cov(x2, u) = 0.
A good instrumental variable (say z) satisfies these two conditions:
1. It is not correlated with the error term: Cov(z, u) = 0
2. It is correlated with the endogenous variable: Cov(x1, z) ≠ 0. More precisely, z must still be correlated with x1 after controlling for x2: in the first stage x1 = π0 + π1 z + π2 x2 + v, we need π1 ≠ 0.

Testing for endogeneity

The 2SLS estimator is less efficient (i.e. it has a larger variance) than OLS when the explanatory variables are exogenous.
Therefore, it is important to test for endogeneity first, in order to avoid using an IV estimator that is:
1. more computationally intensive (two stages instead of one)
2. less efficient

Consider the model

y1 = β0 + β1 y2 + β2 z1 + β3 z2 + u
where y2 is (possibly) endogenous, z1 and z2 are exogenous, and z3 and z4 are IVs for y2.
In order to test for endogeneity:

I. Estimate the reduced form y2 = π0 + π1 z1 + π2 z2 + π3 z3 + π4 z4 + v2 by OLS and compute the residuals v̂2.
II. Estimate y1 = β0 + β1 y2 + β2 z1 + β3 z2 + δ1 v̂2 + error by OLS.
III. Test for the significance of v̂2 in the latter model. If we reject H0: δ1 = 0, there is evidence that u and v2 are correlated, and therefore that y2 is endogenous.
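Steps I to III can be sketched in Stata as follows. The data are simulated purely for illustration: the coefficients, the sample size and the way u is built from v2 (which is what makes y2 endogenous) are all assumptions, not part of the original slides.

```stata
* Simulated data in which y2 is endogenous by construction (values are made up).
clear
set seed 12345
set obs 2000
generate z1 = rnormal()
generate z2 = rnormal()
generate z3 = rnormal()
generate z4 = rnormal()
generate v2 = rnormal()
* u depends on v2, so Cov(y2, u) is not zero
generate u  = 0.8*v2 + rnormal()
generate y2 = 1 + 0.5*z1 + 0.5*z2 + z3 + z4 + v2
generate y1 = 1 + 2*y2 + z1 + z2 + u

* Step I: reduced form for y2 and its residuals
regress y2 z1 z2 z3 z4
predict double v2hat, residuals

* Step II: add the residuals to the structural equation
regress y1 y2 z1 z2 v2hat
scalar d1 = _b[v2hat]

* Step III: test H0: delta1 = 0
test v2hat
scalar p_endog = r(p)
display "coefficient on v2hat = " d1 "   p-value = " p_endog
```

A small p-value is evidence against exogeneity of y2. After ivregress, the command estat endogenous reports closely related tests without the manual steps.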
Testing for the validity of the instruments

Overidentification restrictions → Hansen or Sargan tests

This test tells you whether the instruments are uncorrelated with the error term, an essential condition for the validity of the IVs.
Requirement: you need more IVs than endogenous variables.
In the model above, we can run 2SLS with z3 as the only IV; compute the residuals û3 = y1 − β̂0 − β̂1 y2 − β̂2 z1 − β̂3 z2; and then estimate the regression û3 = δ0 + δ1 z4 + error and test the significance of z4.
This is a valid test of z4 as an IV, BUT it requires assuming that z3 is a valid IV.

Overidentification restrictions → Hansen test

1. Estimate the full 2SLS model with all IVs and obtain the residuals û.
2. Regress û on ALL exogenous variables (i.e. the exogenous explanatory variables and the IVs).
3. Test the joint significance of this regression. A common statistic is nR² (the Sargan form of the test), which is asymptotically χ² with degrees of freedom equal to the number of IVs minus the number of endogenous variables. H0 is that all the IVs are uncorrelated with the error term; if you reject H0, one (or more) of your IVs is not exogenous.
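The three steps can be carried out by hand and compared with Stata's built-in test. This is a sketch on simulated data in which both instruments are valid by construction. The data-generating values are assumptions made for illustration, and the nR² form of the statistic is used in step 3, with 2 IVs − 1 endogenous variable = 1 degree of freedom.

```stata
* Simulated data with two valid instruments, z3 and z4 (values are made up).
clear
set seed 12345
set obs 2000
generate z1 = rnormal()
generate z2 = rnormal()
generate z3 = rnormal()
generate z4 = rnormal()
generate v2 = rnormal()
generate u  = 0.8*v2 + rnormal()
generate y2 = 1 + 0.5*z1 + 0.5*z2 + z3 + z4 + v2
generate y1 = 1 + 2*y2 + z1 + z2 + u

* Step 1: 2SLS with all IVs; built-in overidentification test for comparison
ivregress 2sls y1 z1 z2 (y2 = z3 z4)
estat overid
predict double uhat, residuals

* Step 2: regress the 2SLS residuals on all exogenous variables
regress uhat z1 z2 z3 z4

* Step 3: nR2 statistic, chi-squared with 1 degree of freedom
scalar nR2 = e(N)*e(r2)
display "nR2 = " nR2 "   p-value = " chi2tail(1, nR2)
```

The hand-computed nR² should match the Sargan statistic that estat overid reports. Because the instruments are valid in this simulation, a large p-value is the expected outcome.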

How to do it in STATA?
Assume that x1 is endogenous and x2 is exogenous. Moreover, assume that you have two instruments available: z1 and z2.

ivregress 2sls y (x1=z1 z2) x2
    instrumental variables (2SLS) estimation
ivregress 2sls y (x1=z1 z2) x2, first
    same, and requests that the first-stage regression results are shown

The following commands are run after ivregress:
estat firststage
    test for the significance of the instruments (rule of thumb: F > 10)
estat overid
    test for the validity of the instruments (needs more instruments than endogenous variables)
estat endogenous
    test of whether x1 can be treated as exogenous

The first stage can also be run by hand:
reg x1 z1 z2 x2
test z1 z2
    joint test for the significance of the instruments (rule of thumb: F > 10)
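An intuitive way of understanding the IV estimator is to run a two-stage regression by hand. For instance, suppose you want to obtain:
ivregress 2sls y (x1=z1 z2) x2
You can obtain the same coefficient estimates by:
reg x1 z1 z2 x2
predict x1hat
reg y x1hat x2
The standard errors reported by the last regression are not valid, however, because they ignore that x1hat is itself estimated; use ivregress for inference.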
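The equivalence of the coefficients can be checked on simulated data. The sketch below uses a made-up data-generating process in which the shock e enters both x1 and y, so x1 is endogenous; all numbers are assumptions for illustration.

```stata
* Simulated data: e enters both x1 and y, so x1 is endogenous (values are made up).
clear
set seed 2024
set obs 1000
generate z1 = rnormal()
generate z2 = rnormal()
generate x2 = rnormal()
generate e  = rnormal()
generate x1 = z1 + 0.5*z2 + 0.3*x2 + e
generate y  = 1 + 2*x1 - x2 + e + rnormal()

* One-step 2SLS
ivregress 2sls y (x1 = z1 z2) x2
scalar b_iv  = _b[x1]
scalar se_iv = _se[x1]

* Two stages by hand
regress x1 z1 z2 x2
predict double x1hat, xb
regress y x1hat x2
scalar b_manual  = _b[x1hat]
scalar se_manual = _se[x1hat]

display "coefficient: ivregress = " b_iv "   by hand = " b_manual
display "std. error:  ivregress = " se_iv "   by hand = " se_manual
```

The two coefficients should agree up to rounding, while the two standard errors generally differ, which is why the manual version is useful for intuition only.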


See the worked Stata examples for Chapter 15 (Wooldridge):

http://fmwww.bc.edu/gstat/examples/wooldridge/wooldridge15.html
