Instrumental Variables (IV) and Two-Stage Least Squares (2SLS)

Wooldridge Chapter 15, Dougherty Chapter 10 & Stock and Watson Chapter 12.
Econometrics II
Winter, 2019
Simultaneous Equations
A simultaneous equations model (SEM) is a system of equations that
determines a set of endogenous variables as a function of exogenous
variables and unobserved shocks.

Endogenous = determined within the system.

Exogenous = determined outside of it.

We focus on linear SEMs, i.e., linear parametric functions.

Coefficients and shocks have a structural interpretation.
Simultaneous Equations
Many of the most important models in economics and business are simultaneous in nature;
for example, demand and supply jointly determine the market price. In simultaneous
models, Classical Assumption 3 fails because the error term is correlated with some of the
regressors. As a result, an alternative estimation method, two-stage least squares
(2SLS), is usually called for.

The economic world is full of feedback effects and dual causality that require the
application of simultaneous equations:

• quantity demanded and price


• joint determination of wages and prices
• the interaction between foreign exchange rates and international trade and capital
flows
• wages and the years of schooling
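
The failure of Assumption 3 in the demand-and-supply case can be illustrated with a small simulation. This is only a sketch under assumed values: the intercepts, the slopes `alpha1` and `beta1`, and the unit-variance shocks are all hypothetical choices, not figures from the lecture. Because price is determined by both shocks, an OLS regression of quantity on price recovers neither the demand slope nor the supply slope.

```python
import random

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(0)
n = 20000
alpha0, alpha1 = 10.0, -1.0  # assumed demand intercept and slope
beta0, beta1 = 2.0, 1.0      # assumed supply intercept and slope

prices, quantities = [], []
for _ in range(n):
    u_d = random.gauss(0, 1)  # demand shock
    u_s = random.gauss(0, 1)  # supply shock
    # Market clearing: alpha0 + alpha1*P + u_d = beta0 + beta1*P + u_s
    p = (alpha0 - beta0 + u_d - u_s) / (beta1 - alpha1)
    q = alpha0 + alpha1 * p + u_d
    prices.append(p)
    quantities.append(q)

# Price depends on u_d, so it is correlated with the demand error.
slope = ols_slope(prices, quantities)
print(f"OLS slope of Q on P: {slope:.3f} (demand slope is {alpha1}, supply slope is {beta1})")
```

With equal shock variances, as assumed here, the OLS slope lands between the two structural slopes, which is why it cannot be read as either curve.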

Structural and Reduced-Form Equations:
The Nature of Simultaneous Equations Systems

To begin with, we examine a typical econometric equation:

A simultaneous system is one in which Y clearly has an effect on
at least one of the Xs in addition to the effect that the Xs have on Y.

Endogenous and Exogenous Variables

We have to distinguish between variables that are simultaneously
determined (the Y's, called endogenous variables) and those
that are not (the X's, called exogenous variables).

Endogenous and Exogenous Variables:
A Supply-Demand Example
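
A supply-and-demand system of this kind is commonly written along the following lines. The notation is an assumption made for illustration (Q for quantity, P for price, X1 and X2 for exogenous shifters such as income and input costs, and the α and β coefficients); it is not taken from the original slides.

```latex
\begin{align*}
\text{Demand:} \quad & Q_{Dt} = \alpha_0 + \alpha_1 P_t + \alpha_2 X_{1t} + \varepsilon_{Dt} \\
\text{Supply:} \quad & Q_{St} = \beta_0 + \beta_1 P_t + \beta_2 X_{2t} + \varepsilon_{St} \\
\text{Equilibrium:} \quad & Q_{Dt} = Q_{St} = Q_t
\end{align*}
```

In this sketch Q and P are endogenous, because both are determined by market clearing, while X1 and X2 are exogenous, because they are determined outside the system.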

Simultaneous Systems can violate
Classical Assumption 3

Structural and Reduced Form Equations

Structural and Reduced Form Equations:
An Example
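
As a hedged illustration of the distinction, take a two-equation market model with demand Q = α0 + α1 P + α2 X1 + ε_D and supply Q = β0 + β1 P + β2 X2 + ε_S. These symbols are assumed for the sketch and do not come from the original slides. The two equations are the structural equations; solving them for the endogenous price gives a reduced-form equation that has only exogenous variables and shocks on the right-hand side.

```latex
\begin{align*}
\alpha_0 + \alpha_1 P + \alpha_2 X_1 + \varepsilon_D &= \beta_0 + \beta_1 P + \beta_2 X_2 + \varepsilon_S \\
(\alpha_1 - \beta_1) P &= (\beta_0 - \alpha_0) - \alpha_2 X_1 + \beta_2 X_2 + (\varepsilon_S - \varepsilon_D) \\
P &= \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1}
   - \frac{\alpha_2}{\alpha_1 - \beta_1} X_1
   + \frac{\beta_2}{\alpha_1 - \beta_1} X_2
   + \frac{\varepsilon_S - \varepsilon_D}{\alpha_1 - \beta_1}
\end{align*}
```

The reduced-form error contains ε_D, so P is correlated with the demand error whenever α1 ≠ β1. That correlation is the source of simultaneity bias when the demand equation is estimated by OLS.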

Reasons to use the reduced-form equations

Simultaneity Bias

Instrumental Variables
Potential solution: use an instrumental variable.

We want to split $X_i$ into two parts:
1) a part that is correlated with the error term
2) a part that is uncorrelated with the error term

Instrumental Variables
Instrumental Variables (IV) estimation is used
when your model has endogenous X's.
➢ That is, whenever $\mathrm{Cov}(X, \mu) \neq 0$

IV can be used to address the problem of omitted
variable bias. Additionally, IV can be used to
solve the classic errors-in-variables problem.

Instrumental Variables
In order for a variable, Z, to serve as a valid
instrument for X, the following must be true:
➢ Exogeneity: The instrument must be
exogenous. That is, $\mathrm{Cov}(Z, \mu) = 0$

➢ Relevance: The instrument must be
correlated with the endogenous variable X.
That is, $\mathrm{Cov}(Z, X) \neq 0$
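
A short derivation shows why both conditions are needed. It assumes the simple one-regressor model Y = β0 + β1 X + μ, where the outcome symbol Y and this functional form are chosen here for illustration.

```latex
\begin{align*}
\mathrm{Cov}(Z, Y) &= \mathrm{Cov}(Z, \beta_0 + \beta_1 X + \mu)
                    = \beta_1 \mathrm{Cov}(Z, X) + \mathrm{Cov}(Z, \mu) \\
\beta_1 &= \frac{\mathrm{Cov}(Z, Y)}{\mathrm{Cov}(Z, X)}
\quad \text{when } \mathrm{Cov}(Z, \mu) = 0 \text{ and } \mathrm{Cov}(Z, X) \neq 0
\end{align*}
```

Exogeneity removes the second term in the first line, and relevance keeps the denominator away from zero. If Cov(Z, X) is close to zero, even a small violation of exogeneity is magnified in the ratio.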

Validity of Instruments
➢ We have to use common sense and economic theory to
decide if it makes sense to assume $\mathrm{Cov}(Z, \mu) = 0$
➢ We can test if $\mathrm{Cov}(Z, X) \neq 0$ by estimating
$X = \pi_0 + \pi_1 Z + v_1$
and testing $H_0: \pi_1 = 0$.
This regression is sometimes referred to as the first-stage
regression.
➢ It is also possible to have multiple instruments:
$X = \pi_0 + \pi_1 Z_1 + \pi_2 Z_2 + v_2$
Here we are assuming that both instruments are valid, that is,
that they are uncorrelated with the structural error term.
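
The relevance test can be sketched in a few lines of plain Python. The helper name `first_stage_f` and the simulated data (a true first-stage coefficient of 0.5 and 500 observations) are assumptions made for this example. With a single instrument, the F-statistic for H0: π1 = 0 is the square of the t-statistic on π1.

```python
import math
import random

def first_stage_f(z, x):
    """F-statistic for H0: pi1 = 0 in the regression x = pi0 + pi1*z + v.

    With one instrument this equals the squared t-statistic on pi1.
    """
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((a - mz) ** 2 for a in z)
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    pi1 = szx / szz
    pi0 = mx - pi1 * mz
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    rss = sum((b - pi0 - pi1 * a) ** 2 for a, b in zip(z, x))
    se_pi1 = math.sqrt(rss / (n - 2) / szz)
    return (pi1 / se_pi1) ** 2

random.seed(2)
z = [random.gauss(0, 1) for _ in range(500)]           # hypothetical instrument
x = [1.0 + 0.5 * zi + random.gauss(0, 1) for zi in z]  # hypothetical endogenous regressor
f_stat = first_stage_f(z, x)
print(f"first-stage F = {f_stat:.1f}")
```

With several instruments the analogous check is a joint F-test of all the π coefficients on the instruments, which this single-instrument sketch does not cover.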
Example
In the research paper "The Colonial Origins of Comparative
Development: An Empirical Investigation" (Acemoglu, Johnson, and
Robinson, 2001), the authors tested the following relationship:
$\text{GDP per capita} = \beta_0 + \beta_1\,\text{institutions} + \mu$
Simultaneity bias exists in the above equation.
Thus, the authors used settler mortality rates as an instrument for
current institutions.
(potential) settler mortality → settlements → early institutions →
current institutions → current performance

Instrumental Variables (Validity)
INSTRUMENT (Settler Mortality) → Endogenous X variable (Current Institutions) → Outcome (GDP per capita)
OTHER FACTORS ($\mu$) → Outcome (GDP per capita)

Exclusion Restriction:
The instrument is correlated with our outcome BUT does not directly CAUSE changes in our outcome;
it affects the outcome only through the endogenous X variable.

Relevance:
$\mathrm{Cov}(\text{mortality}, \text{institutions}) \neq 0$
Highly correlated with our problematic variable and CAUSES direct changes in this variable.

Exogeneity:
$\mathrm{Cov}(\text{mortality}, \mu) = 0$
Uncorrelated with any unobserved/uncontrolled variables that exist
in our error term and cause changes in our outcome.
Two-Stage Least Squares (2SLS)
First Stage Regression:
Regress $X_i$ on $Z_i$ and obtain the predicted values:
$\hat{X}_i = \hat{\pi}_0 + \hat{\pi}_1 Z_i$

Second Stage Regression:
Regress $Y_i$ on $\hat{X}_i$ to obtain the consistent 2SLS estimator of $X_i$'s
exogenous effect on $Y_i$:
$Y_i = \beta_0 + \beta_1 \hat{X}_i$
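
The two stages can be carried out by hand on simulated data. The sketch below makes several assumptions for illustration: a true effect of 0.5, a first-stage coefficient of 0.8, and an unobserved factor that enters both X and Y and so makes X endogenous. The helper `ols` is a hypothetical name for a simple one-regressor OLS routine.

```python
import random

def ols(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

random.seed(1)
n = 20000
true_beta1 = 0.5  # assumed true effect of X on Y

z, x, y = [], [], []
for _ in range(n):
    zi = random.gauss(0, 1)  # instrument, independent of the error
    ai = random.gauss(0, 1)  # unobserved factor that ends up in the error
    xi = 1.0 + 0.8 * zi + ai + random.gauss(0, 1)
    yi = 2.0 + true_beta1 * xi + ai + random.gauss(0, 1)
    z.append(zi)
    x.append(xi)
    y.append(yi)

# Naive OLS of Y on X is biased because X is correlated with the error.
_, ols_slope = ols(x, y)

# First stage: regress X on Z and form the predicted values.
pi0_hat, pi1_hat = ols(z, x)
x_hat = [pi0_hat + pi1_hat * zi for zi in z]

# Second stage: regress Y on the predicted values of X.
_, tsls_slope = ols(x_hat, y)

print(f"OLS slope:  {ols_slope:.3f}")
print(f"2SLS slope: {tsls_slope:.3f} (true value {true_beta1})")
```

The standard errors from a manually run second stage are not valid, because they are computed from residuals based on the predicted values rather than the actual X. Dedicated IV routines correct for this, so the manual version is useful only for seeing where the point estimate comes from.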

Example
For example, let us take the returns to education again:
$\log(\text{wage}) = \beta_0 + \beta_1\,\text{edu} + \dots + \mu$
where $\mu$ may be correlated with edu because ability is omitted.
Consider a variable near_college, a dummy variable equal to
1 if individual i grew up near a four-year college.
1. Relevance?
An individual is more likely to be educated if there is a college
within a reasonable distance. That is,
$\mathrm{Cov}(\text{near\_college}, \text{edu}) \neq 0$.
2. Exogeneity?
Is it reasonable to assume $\mathrm{Cov}(\text{near\_college}, \text{ability}) = 0$?
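
Because near_college is a dummy variable, the IV estimator takes a particularly simple form. The expression below is a sketch that assumes a model with no other control variables; with controls the same idea applies after partialling them out.

```latex
\[
\hat{\beta}_1^{IV}
= \frac{\overline{\log(\text{wage})}_{\,\text{near}=1} - \overline{\log(\text{wage})}_{\,\text{near}=0}}
       {\overline{\text{edu}}_{\,\text{near}=1} - \overline{\text{edu}}_{\,\text{near}=0}}
\]
```

The numerator is the difference in average log wages between people who grew up near a college and those who did not, and the denominator is the corresponding difference in average education. Relevance requires the denominator to be nonzero; exogeneity requires that the wage gap in the numerator arises only through education.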

Example
[Annotated Stata regression output; the annotations read as follows.]
➢ Dependent Variable and Independent Variable: labels for the variables in the regression.
➢ Model Sum of Squares tells you how much of the variation in the dependent variable
your model explains. The closer to the TSS, the better the fit.
➢ Residual Sum of Squares tells you how much of the dependent variable's variation
your model did not explain.
➢ Total Sum of Squares (TSS) tells you how much variation there is in the dependent variable.
➢ The p-value of the model tests whether R2 is different from 0. Usually we need a
p-value lower than 0.05 to show a statistically significant relationship between X and Y.
➢ R-square shows the proportion of the variance of Y explained by X.
➢ Root Mean Squared Error is the standard deviation of the regression. It shows the
typical distance of the observed values from the fitted values. The closer to zero, the
better the fit.
➢ The t-values and the two-tail p-values test the null hypothesis that each coefficient
equals 0. To reject it at the 5% level, the p-value has to be lower than 0.05 or,
equivalently in large samples, the absolute t-value has to be greater than 1.96.
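
The quantities reported in such an output can be reproduced by hand. The following sketch uses a hypothetical helper `fit_summary` and a made-up four-point data set; it computes the sums of squares, R-squared, and root mean squared error for a one-regressor model.

```python
import math

def fit_summary(x, y):
    """Sums of squares, R-squared and root MSE for a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    fitted = [intercept + slope * a for a in x]
    total_ss = sum((b - my) ** 2 for b in y)
    residual_ss = sum((b - f) ** 2 for b, f in zip(y, fitted))
    model_ss = total_ss - residual_ss
    return {
        "model_ss": model_ss,
        "residual_ss": residual_ss,
        "total_ss": total_ss,
        "r_squared": model_ss / total_ss,
        # Root MSE divides by n - 2: two parameters are estimated.
        "root_mse": math.sqrt(residual_ss / (n - 2)),
    }

summary = fit_summary([0, 1, 2, 3], [0, 1, 2, 4])  # made-up data
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

The model and residual sums of squares always add up to the total sum of squares when the regression includes an intercept, which is why R-squared can be read as a share of explained variation.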

Example- Testing for IV relevance
(First Stage)
reg education near_college

A first-stage F-statistic above 10 (preferably above 12) is commonly
taken as evidence that the instrument is relevant.
Example- 2SLS (Second Stage)
Dependent Variable Endogenous Variable IV

"Instrumented" indicates the endogenous X.
"Instruments" indicates the IV for this X.
The key point is that the education variable in this regression is
actually the prediction of education obtained in the first stage.
Stata computes these predictions itself and substitutes them in place
of the endogenous X variable.
Properties of Two-Stage Least Squares (2SLS)

An Example of 2SLS

Trick for figuring out endogenous variables in
simultaneous equation system

To Summarize
➢ Instrumental Variables (IV) estimation is used when
your model has endogenous X variables.
▪ That is, when 𝐶𝑜𝑣(𝑋, 𝜇) ≠ 0
➢ IV can be used to address the problem of omitted
variable bias and systematic measurement errors in X
variables.
➢ We use two-stage least squares (2SLS) to estimate a
regression with IV.

➢ In order for a variable, Z, to serve as a valid
instrument for X, the following must be true:
▪ The instrument must be exogenous. That is, $\mathrm{Cov}(Z, \mu) = 0$.
▪ The instrument must be correlated with the endogenous
variable X. That is, $\mathrm{Cov}(Z, X) \neq 0$.
➢ We have to use common sense and economic theory
to decide if it makes sense to assume $\mathrm{Cov}(Z, \mu) = 0$.
➢ We can test if $\mathrm{Cov}(Z, X) \neq 0$.
