
Lecture 6

Panel Data Econometrics

What are Panel Data?


Panel data are a form of longitudinal data, where observations on cross-section units are
regularly repeated
o Cross-section units can be individuals, households, plants, firms, municipalities,
states or countries
o Repeat observations are usually time periods (e.g. five-year intervals, annual,
quarterly, weekly, daily, etc.) or units within clusters (e.g. siblings within a family,
firms within an industry, workers within a firm, etc.)
An important characteristic of panel data is that we cannot assume that the observations
are independently distributed across time
o e.g. unobserved factors that affect a person's wage in 1990 will also affect that
person's wage in 1991
Independently pooled cross-section data are obtained by sampling randomly from a large
population at different points in time.
o Such data consist of independently sampled observations, which rules out
correlation in the error terms across different observations

Examples of Panel Data


o Firm or company data
o Longitudinal data on patterns of individual behaviour over the life-cycle.
o Comparative country-specific macroeconomic data over time.
Examples of Panel Datasets
o Panel Study of Income Dynamics (PSID)
o National Longitudinal Surveys of Labor Market Experience (NLS)
o German Socio-Economic Panel (GSOEP)
o The British Household Panel Survey (BHPS)
o Swedish Agriculture Farm Level Survey (JEU)
o Finnish Company Database (Yritystietokanta)
o Luxembourg Income Study (LIS)

Common features of Panel Data:


o The sample of individuals N is typically relatively large
o The number of time periods T is generally short
o The time-series dimension of aggregate data tends to be longer (e.g. the Penn World
Tables, World Development Indicators)

Why use panel data methods?


o Increased precision of regression estimates
o Repeated observations on individuals allow for possibility of isolating effects of
unobserved differences between individuals
o We can study dynamics
o The ability to make causal inference is enhanced by temporal ordering
o The ability to model temporal effects and control for variables that vary over time
o Some phenomena are inherently longitudinal (e.g. poverty persistence, unstable
employment)

But there are limits to the benefits of panel data:


o Variation between people usually far exceeds variation over time for an individual
o A panel with T waves doesn't give T times the information of a cross-section
o Variation over time may not exist for some important variables or may be inflated
by measurement error
o Panel data impose a fixed timing structure; continuous-time survival analysis may
be more informative
o We still need very strong assumptions to draw clear inferences from panels:
sequencing in time does not necessarily reflect causation

Advantages of panel estimation methods


o Large number of data points (observations)
o Increased degrees of freedom
o Reduces the collinearity among the explanatory variables
o Improved efficiency of econometric estimates
o More variability, less aggregation over firms and individuals
o Better able to study dynamics of adjustment in unemployment, income mobility,
etc
o More reliable and stable parameter estimates
o Identify and measure effects not detectable in pure cross-section (CS) or time-series
(TS) data. Control for unobservable individual heterogeneity and dynamics not
possible in TS (N=1) and CS (T=1) data. Example: a married-women labour-force
participation rate of 50% can be interpreted as each woman having a 50% chance of
being in the labour force in any given year, or alternatively as 50% always working
and 50% never working
o Dynamic effects cannot be estimated using CS data.

Disadvantages of panel estimation methods


o Complicated survey design, stratification
o Changing structure of population (use of rotating panel data)
o Incomplete coverage of the population of interest
o Data collection and management problem
o Distortions due to measurement error (faulty responses, unclear questions, etc.)
o Selectivity problems (e.g. self-selection out of work because the reservation
wage exceeds the offered wage)
o Non-response (partial or complete) due to lack of cooperation
o Attrition: non-response increases over time
o Short time-series dimension; increasing N is costly, while increasing T worsens attrition
o New estimation problems
o Imputation of unit non-response/missing values

Pooled Data
Pooling Independent Cross-Sections across Time
o Since a random sample is drawn at each time period, pooling the resulting random
samples gives us an independently pooled cross-section
o As such, we can use standard OLS methods
o Advantage of pooling is to increase the sample size, thereby obtaining more precise
estimates and test statistics with greater power
o Pooling is only useful in this regard if the relationship between the dependent variable
and at least some of the independent variables remains constant over time
o To reflect the fact that the population may have different distributions in different time
periods, the intercept is usually allowed to differ across time periods (can be
accomplished by including year dummies)
o The coefficients on the year dummies may be of interest (e.g. after controlling for other
factors, has the pattern of fertility changed over time?)
o Year dummies can also be interacted with other explanatory variables to see if the
effect of that variable has changed over time
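
To make this concrete, a minimal Stata sketch (the variable names kids, educ and year are hypothetical, not from the dataset used later):

. reg kids educ i.year          // year dummies allow period-specific intercepts
. reg kids c.educ##i.year       // interactions: has the effect of educ changed over time?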

Testing for Structural Change across Time


o Consider a pooled dataset covering two time periods, t = 1 and t = 2
o Interact each variable with a year dummy for t = 2
o Test for the joint significance of the year dummy and all of the interaction terms (see
the sketch below)
o Since the intercept in a regression model often changes over time, the Chow test can
detect such changes. It is usually more interesting to allow for an intercept difference and
then to test whether certain slope coefficients change over time
o This can be extended to more than two time periods
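
A minimal Stata sketch of such a test for two periods, where d2 is a dummy for the second period and y, x1, x2 are hypothetical names:

. reg y c.x1 c.x2 i.d2 i.d2#c.x1 i.d2#c.x2
. testparm i.d2 i.d2#c.x1 i.d2#c.x2    // joint significance of the year dummy and all interactions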

Policy Analysis with Pooled Cross-Sections


Difference-in-Difference Estimation
o Methodology
 Examine the effect of some sort of treatment by comparing the treatment group
after treatment both to the treatment group before treatment and to some other
control group.
 Standard case: outcomes are observed for two groups for two time periods. One of
the groups is exposed to a treatment in the second period but not in the first period.
The second group is not exposed to the treatment during either period. Structure
can apply to repeated cross sections or panel data.
- Usually related to a so-called natural (or quasi-) experiment, when some exogenous
event (often a change in government policy) changes the environment in which
individuals, families, firms or cities operate.

Example
o A state offers a tax break to firms providing employees with health insurance. To estimate
the impact of the bill on the percentage of firms offering health insurance we could use
data on a state that didn't implement such a law as a control group. It is not correct just
to compare pre- and post-law changes in the percentage of firms offering health
insurance, i.e.

$y = \beta_0 + \delta_0 d2 + u$    (1)

where $d2$ is a dummy for period two.

o Here the coefficient estimate $\hat\delta_0$ gives an estimate of the difference in the percentage of
firms offering health insurance between periods one and two
o The coefficient doesn't necessarily provide a (causal) estimate of the impact of the tax
break, however, since there could be a trend towards more employers offering health
insurance over time

With repeated cross sections, let A be the control group and B the treatment group. Write

$y = \beta_0 + \beta_1 dB + \delta_0 d2 + \delta_1 (d2 \cdot dB) + u$    (2)

where:
y is the outcome of interest (e.g. percentage of firms offering health insurance in each state)
dB captures possible differences between the treatment and control groups prior to the
policy change (e.g. State A versus State B)
d2 captures aggregate factors that would cause changes in y over time even in the absence
of a policy change, i.e. for both states (e.g. time dummies)
The coefficient of interest is $\delta_1$, which gives an estimate of the change in health insurance
take-up for firms in State B, and which is called the difference-in-differences estimator.

Using (2), the expected outcomes in each state and year are:

                    Year 1                Year 2                                        Year 2 - Year 1
State A             $\beta_0$             $\beta_0 + \delta_0$                          $\delta_0$
State B             $\beta_0 + \beta_1$   $\beta_0 + \beta_1 + \delta_0 + \delta_1$     $\delta_0 + \delta_1$
B - A               $\beta_1$             $\beta_1 + \delta_1$                          $\delta_1$

The difference-in-differences (DD) estimator can be written as:

$\hat\delta_1 = (\bar y_{B,2} - \bar y_{B,1}) - (\bar y_{A,2} - \bar y_{A,1})$    (3)

In other words, $\hat\delta_1$ represents the difference in the changes over time.


Assuming that both states would have the same health insurance trends over time in the
absence of the policy, we have now controlled for a possible national time trend, and can
identify the true impact of the tax break on the share of employers offering insurance.
Inference based on moderate sample sizes in each of the four groups is straightforward, and
is easily made robust to different group/time-period variances in a regression framework.

Can refine the definition of treatment and control groups.


o Example: change in state health care policy aimed at elderly. Could use data only on
people in the state with the policy change, both before and after the change, with the
control group being people 55 to 65 (say) and the treatment group being people over
65
o This DD analysis assumes that the paths of health outcomes for the younger and older
groups would not be systematically different in the absence of intervention
o Alternatively, use the over-65 population from another state as an additional control
Let dE be a dummy equal to one for someone over 65:

$y = \beta_0 + \beta_1 dB + \beta_2 dE + \beta_3 (dB \cdot dE) + \delta_0 d2 + \delta_1 (d2 \cdot dB) + \delta_2 (d2 \cdot dE) + \delta_3 (d2 \cdot dB \cdot dE) + u$    (4)

The OLS estimate $\hat\delta_3$ is

$\hat\delta_3 = \left[(\bar y_{B,E,2} - \bar y_{B,E,1}) - (\bar y_{B,N,2} - \bar y_{B,N,1})\right] - \left[(\bar y_{A,E,2} - \bar y_{A,E,1}) - (\bar y_{A,N,2} - \bar y_{A,N,1})\right]$    (5)

where the A subscript means the state not implementing the policy and the N subscript
represents the non-elderly. This is the difference-in-difference-in-differences (DDD) estimate.

Can add covariates to either the DD or DDD analysis to control for compositional changes.
Can use multiple time periods and groups.
This methodology has a number of applications, particularly when the data arise from a
natural experiment (or quasi-experiment)
o This occurs when some exogenous event (often a change in government policy) changes
the environment in which individuals, families, firms or cities operate
A natural experiment always has a control group, which is not affected by the policy change,
and a treatment group thought to be affected by the policy change
Unlike a true experiment, the control and treatment groups in natural experiments
arise from the particular policy change and are not randomly assigned

If C is the control group and T the treatment group, let dT equal one for those in
the treatment group and zero otherwise. Then, letting d2 denote a dummy for the
second (post-policy change) time period, the equation of interest is:

$y = \beta_0 + \delta_0 d2 + \beta_1 dT + \delta_1 (d2 \cdot dT) + \text{other factors} + u$

where $\delta_1$ measures the effect of the policy

Without other factors in the regression, $\hat\delta_1$ will be the difference-in-differences estimator:

$\hat\delta_1 = (\bar y_{2,T} - \bar y_{2,C}) - (\bar y_{1,T} - \bar y_{1,C})$

where the bar denotes the average

The expected outcomes in each group and period are:

                         Before                After                                       After - Before
Control                  $\beta_0$             $\beta_0 + \delta_0$                        $\delta_0$
Treatment                $\beta_0 + \beta_1$   $\beta_0 + \beta_1 + \delta_0 + \delta_1$   $\delta_0 + \delta_1$
Treatment - Control      $\beta_1$             $\beta_1 + \delta_1$                        $\delta_1$

The parameter $\delta_1$ (sometimes called the average treatment effect) can be estimated in
two ways:
1. Compute the differences in averages between the treatment and control groups in
each time period, and then difference the results over time
2. Compute the change in averages over time for each of the treatment and control
groups, and then difference these changes, i.e. write

$\hat\delta_1 = (\bar y_{2,T} - \bar y_{1,T}) - (\bar y_{2,C} - \bar y_{1,C})$

When explanatory variables are added to the regression, the OLS estimate of $\delta_1$ no longer
has a simple form, but its interpretation is similar
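
In regression form the DD estimate is simply the interaction coefficient. A hypothetical Stata sketch (y, treat and post are illustrative names):

. reg y i.treat##i.post, vce(robust)   // coefficient on 1.treat#1.post is the DD estimate of delta_1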

Panel Data
A balanced panel has the same number of time observations (T) for each of the N individuals
An unbalanced panel has different numbers of time observations (T_i) on each individual
A compact panel covers only consecutive time periods for each individual; there are no
gaps
Attrition is the process of drop-out of individuals from the panel, leading to an unbalanced
(and possibly non-compact) panel
A short panel has a large number of individuals but few time observations on each (e.g. the
British Household Panel Survey has 5,500 households and 14 waves)
A long panel has a long run of time observations on each individual, permitting separate
time-series analysis for each (e.g. the Penn World Tables have data from 1960)

While panel data can be analyzed using standard OLS techniques, it is better to use some
techniques specifically designed to take advantage of panel data.
o Specifically, you know that in a panel data set there is a special relationship between
the multiple observations of a particular individual.
Consider the following regression specification:

$y_{it} = x_{it}'\beta + u_{it}$

A common assumption used in panel data is that we can write the error term as:

$u_{it} = v_{it} + \alpha_i$

where $\alpha_i$ is called a fixed (or random) effect that doesn't vary over time
One assumption of OLS is that $E(u_{it}|x_{it}) = 0$. If the $\alpha_i$ are correlated with $x_{it}$, therefore, OLS
will provide inconsistent estimates of the parameters
Panel data methods allow us to estimate the parameters consistently using so-called fixed
effects (or related) methods
We replace the assumption that $E(u_{it}|x_{it}) = E(v_{it} + \alpha_i|x_{it}) = 0$ with the weaker
assumption that $E(v_{it}|x_i) = 0$

Individual Effects in Panel Data


It may look as though there is a positive relationship between X and Y and you would be
tempted to draw some sort of positively sloped linear relationship

However, if you know that this is panel data, you might consider which points in the graph
are observations on the same individuals and, perhaps, circle observations of particular
individuals. In this case, you might find the following:

This reveals a very different relationship between X and Y. The knowledge of which
observations came from the same person, company, plant, unit, state, city or whatever can
dramatically impact the results of your research.
In this case, we see a negative relationship between the variables, but we might imagine
that there is a separate intercept for each person. This is one type of panel data model.

Panel Data Methods


Distinguish between Fixed Effects and Random Effects Models
o In fixed effects models $\alpha_i$ and $x_{it}$ are allowed to be correlated
o In random effects models $\alpha_i$ and $x_{it}$ are assumed to be uncorrelated
The Fixed effects panel data model
Consider the following model

$y_{it} = \alpha_i + x_{it}'\beta + v_{it}$    (1)

for $i = 1, \dots, N$ individuals over $t = 1, \dots, T$ periods

The model includes
o An individual effect, $\alpha_i$ (constant over time).
o Marginal effects $\beta$ for $x_{it}$ (common across $i$ and $t$)

The model can be estimated using the (pooled) Ordinary Least Squares (OLS) estimator
o The simplest approach to the estimation.
o Individual effects $\alpha_i$ are fixed and common across economic agents, such that $\alpha_i = \alpha$
for all $i = 1, \dots, N$
o OLS produces consistent and efficient estimates of $\alpha$ and $\beta$.
One assumption of OLS is that there is a zero correlation between the error terms of any
two observations.
The problem with panel data is that we would expect there to be correlations between error
terms for a particular individual across different time periods.
So, if unobserved variables for an individual tend to make its error term positive in one
period, they will tend to make its error term positive in other periods as well. For example, if
a county has a particularly high rate of unemployment in one year, it is likely to have a high
rate the next year, too.
This correlation between error terms is a violation of one of the assumptions of OLS. This
violation means that OLS is not the best estimator.

Bias from Ignoring Fixed Effects

Accounting for Fixed Effects


First Differencing the Data
Is the easiest way of dealing with the fixed effects
The lagged value of $y_{it}$ is:

$y_{i,t-1} = \alpha_i + x_{i,t-1}'\beta + v_{i,t-1}$

Taking first differences gives us:

$(y_{it} - y_{i,t-1}) = (\alpha_i - \alpha_i) + (x_{it} - x_{i,t-1})'\beta + (v_{it} - v_{i,t-1})$

or:

$\Delta y_{it} = \Delta x_{it}'\beta + \Delta v_{it}$

OLS on this transformed equation will yield consistent estimates of $\beta$, since the $\alpha_i$ have been
removed through first-differencing
$\Delta x_{it}$ and $\Delta v_{it}$ are assumed uncorrelated because of the assumption that $E(v_{it}|x_{it}) = 0$

First differencing tends to introduce a negative correlation across observations since:

$cov(\Delta v_{it}, \Delta v_{i,t-1}) = cov(v_{it} - v_{i,t-1},\; v_{i,t-1} - v_{i,t-2}) = -var(v_{i,t-1})$
First-differenced regression is less efficient than other (fixed effects) methods (except when
errors follow a random walk)
In addition to first-differences, one could use longer differences (e.g. five-year differences).
Such estimators often have useful properties (e.g. robust to measurement error)

The Within-Groups (WG) estimator

o Can be used if the individual effects $\alpha_i$ are fixed but not common across $i = 1, \dots, N$
o Eliminates the fixed effect $\alpha_i$ by differencing with respect to the mean
Let $\bar y_i = T^{-1}\sum_{t=1}^{T} y_{it}$ and $\bar x_i = T^{-1}\sum_{t=1}^{T} x_{it}$

Define $\tilde x_{it} = x_{it} - \bar x_i$ and $\tilde y_{it} = y_{it} - \bar y_i$
Then $\bar y_i = \alpha_i + \bar x_i'\beta + \bar v_i$
Subtracting from (1) gives:

$y_{it} - \bar y_i = (\alpha_i - \alpha_i) + (x_{it} - \bar x_i)'\beta + (v_{it} - \bar v_i)$

or $\tilde y_{it} = \tilde x_{it}'\beta + \tilde v_{it}$
which can then be estimated by OLS
The individual effects can be estimated as $\hat\alpha_i = \bar y_i - \bar x_i'\hat\beta$

The estimator of the slope parameters, $\hat\beta$, is consistent if either $N$ or $T$ becomes large
The estimator of the individual effects, $\hat\alpha_i$, is consistent only if $T$ becomes large
The number of degrees of freedom needs to be adjusted.
o Usually the degrees of freedom would be $NT - K$, but with individual effects we have
$NT - N - K$ (software packages usually make this correction when running their panel
commands)
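
To make the mechanics concrete, here is an illustrative sketch of the within transformation done by hand, using the wage panel (lwage, exper, nr) analysed later in these notes; note that the standard errors from this manual version lack the degrees-of-freedom correction discussed above:

. bysort nr: egen m_lwage = mean(lwage)   // individual means
. bysort nr: egen m_exper = mean(exper)
. gen t_lwage = lwage - m_lwage           // demeaned (within) variables
. gen t_exper = exper - m_exper
. reg t_lwage t_exper, noconstant         // slope matches xtreg, fe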


Drawback with the Within-Groups estimator


o Eliminates time-invariant characteristics from a model of the form

$y_{it} = \alpha_i + x_{it}'\beta + z_i'\gamma + v_{it}$

o As such, we cannot distinguish between observed and unobserved time-invariant heterogeneity

The Least Squares Dummy Variable (LSDV) Model


Define a series of group-specific dummy variables $d_{jit} = 1(j = i)$

This gives:

$y_{it} = \alpha_i + x_{it}'\beta + v_{it}$    (2)

$y_{it} = \alpha_1 d_{1it} + \alpha_2 d_{2it} + \dots + \alpha_N d_{Nit} + x_{it}'\beta + v_{it}$

Estimate by standard OLS (excluding a constant)
Here the constant terms vary by individual, but the slopes are the same for all individuals

A test for individual effects:

$H_0: \alpha_1 = \alpha_2 = \dots = \alpha_N$

which can be tested using an F-test.
o Note: equation (2) can be written as $y_{it} = \alpha + \mu_i + x_{it}'\beta + v_{it}$, where $\alpha$ is the average
individual effect and $\mu_i$ is the deviation from the average
o The model can thus be estimated by including a constant and $N - 1$ individual
dummies
Problems
o Incidental parameters: the number of dummies grows as $N$ increases, so the usual proof
of consistency does not hold for LSDV models
o Inverting an $(N + K) \times (N + K)$ matrix can be impossible, and even when possible impractical
and/or inaccurate

Random Effects Models


In the random effects model the $\alpha_i$ are treated as random variables, rather than fixed
constants
o The $\alpha_i$ are usually assumed to be independent of the errors $v_{it}$ and also mutually
independent, i.e.
o $\alpha_i \sim IID(0, \sigma_\alpha^2)$
o $v_{it} \sim IID(0, \sigma_v^2)$
o $\alpha_i$ and $v_{it}$ are independently distributed
Since the $\alpha_i$ are now random, the errors take the following form: $u_{it} = \alpha_i + v_{it}$
The presence of $\alpha_i$ produces a correlation among the errors of the same cross-section unit
(i.e. $cov(u_{it}, u_{is}) \neq 0$), though the errors from different cross-section units are
independent (i.e. $cov(u_{it}, u_{jt}) = 0$)
o OLS is thus inefficient in the random effects model, and yields incorrect standard errors
Since the errors are correlated, we use Generalised Least Squares (GLS) to estimate the
model
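
A standard way to see what GLS does here (a known result, stated without derivation): GLS is equivalent to OLS on quasi-demeaned data,

$y_{it} - \theta\bar y_i = (x_{it} - \theta\bar x_i)'\beta + (u_{it} - \theta\bar u_i), \qquad \theta = 1 - \sqrt{\sigma_v^2 / (\sigma_v^2 + T\sigma_\alpha^2)}$

so that $\theta = 0$ reduces to pooled OLS, while $\theta \to 1$ approaches the within (fixed effects) estimator.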

One possibility is to estimate a regression model with panel data using OLS. This imposes the assumption that the fixed
effects are the same for each individual.

. regress lwage educ exper expersq black married hisp, vce(robust)


Linear regression                                      Number of obs =    4360
                                                       F(6, 4353)    =  141.60
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1659
                                                       Root MSE      =  .48676

------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0989097   .0045823    21.59   0.000     .0899261    .1078932
       exper |   .0940639   .0101683     9.25   0.000     .0741289    .1139989
     expersq |  -.0032342    .000678    -4.77   0.000    -.0045634   -.0019051
       black |  -.1142367   .0248047    -4.61   0.000    -.1628665   -.0656069
     married |   .1165982   .0154429     7.55   0.000     .0863222    .1468742
        hisp |   .0266568   .0198779     1.34   0.180     -.012314    .0656276
       _cons |  -.0065678   .0649791    -0.10   0.919      -.13396    .1208244
------------------------------------------------------------------------------

In this case it is often useful to cluster standard errors on the individual. Stata's vce(cluster) option specifies
that the standard errors allow for intragroup correlation, relaxing the usual requirement that the observations be
independent. That is, the observations are independent across groups (clusters) but not necessarily within groups.

. regress lwage educ exper expersq black married hisp, vce(cluster nr)
Linear regression                                      Number of obs =    4360
                                                       F(6, 544)     =   58.28
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1659
                                                       Root MSE      =  .48676

                               (Std. Err. adjusted for 545 clusters in nr)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0989097   .0092261    10.72   0.000     .0807865    .1170328
       exper |   .0940639    .012361     7.61   0.000     .0697827    .1183451
     expersq |  -.0032342   .0008643    -3.74   0.000    -.0049321   -.0015364
       black |  -.1142367   .0520134    -2.20   0.028    -.2164083    -.012065
     married |   .1165982   .0266349     4.38   0.000     .0642784     .168918
        hisp |   .0266568   .0403928     0.66   0.510    -.0526881    .1060018
       _cons |  -.0065678    .119913    -0.05   0.956    -.2421171    .2289815
------------------------------------------------------------------------------

To allow for differences in the distribution in different time periods it is often desirable to allow for differences
in the intercept over time. This can be achieved by including a set of time dummies.

. regress lwage educ exper expersq black married hisp d8*, vce(cluster nr)
Linear regression                                      Number of obs =    4360
                                                       F(13, 544)    =   43.29
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1682
                                                       Root MSE      =  .48649

                               (Std. Err. adjusted for 545 clusters in nr)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0920318   .0111277     8.27   0.000     .0701732    .1138903
       exper |   .0789065   .0201169     3.92   0.000     .0393902    .1184228
     expersq |  -.0030804   .0010397    -2.96   0.003    -.0051227   -.0010381
       black |  -.1101816   .0524171    -2.10   0.036    -.2131463   -.0072168
     married |   .1173324   .0266045     4.41   0.000     .0650723    .1695925
        hisp |   .0272366   .0403107     0.68   0.500    -.0519471    .1064203
         d81 |   .0500808   .0285791     1.75   0.080    -.0060582    .1062197
         d82 |   .0496073   .0378311     1.31   0.190    -.0247058    .1239203
         d83 |   .0417228   .0473305     0.88   0.378    -.0512501    .1346957
         d84 |   .0677402   .0594756     1.14   0.255    -.0490897    .1845702
         d85 |    .079509   .0686687     1.16   0.247    -.0553792    .2143973
         d86 |   .1092778   .0782723     1.40   0.163    -.0444753    .2630308
         d87 |   .1512288   .0872289     1.73   0.084     -.020118    .3225756
       _cons |   .0958252   .1620766     0.59   0.555    -.2225475    .4141978
------------------------------------------------------------------------------

To estimate the model using panel techniques (i.e. fixed and random effects) it is necessary to tell Stata which is the
group and which is the time identifier.

. tsset nr year
       panel variable:  nr (strongly balanced)
        time variable:  year, 1980 to 1987
                delta:  1 unit

Running a fixed effects model is then straightforward using the xtreg command.
. xtreg lwage educ exper expersq married black hisp, fe vce(robust)
note: educ omitted because of collinearity
note: black omitted because of collinearity
note: hisp omitted because of collinearity
Fixed-effects (within) regression               Number of obs      =      4360
Group variable: nr                              Number of groups   =       545

R-sq:  within  = 0.1741                         Obs per group: min =         8
       between = 0.0014                                        avg =       8.0
       overall = 0.0534                                        max =         8

                                                F(3,544)           =    135.44
corr(u_i, Xb)  = -0.1289                        Prob > F           =    0.0000

                               (Std. Err. adjusted for 545 clusters in nr)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  (omitted)
       exper |   .1169371   .0106843    10.94   0.000     .0959496    .1379245
     expersq |  -.0043329   .0006857    -6.32   0.000    -.0056799   -.0029859
     married |   .0473384   .0211735     2.24   0.026     .0057465    .0889303
       black |  (omitted)
        hisp |  (omitted)
       _cons |   1.085044   .0367222    29.55   0.000      1.01291    1.157179
-------------+----------------------------------------------------------------
     sigma_u |  .40387667
     sigma_e |  .35204264
         rho |  .56824994   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Similarly, a random effects model can be estimated as:


. xtreg lwage educ exper expersq married black hisp, re vce(robust)

Random-effects GLS regression                   Number of obs      =      4360
Group variable: nr                              Number of groups   =       545

R-sq:  within  = 0.1739                         Obs per group: min =         8
       between = 0.1548                                        avg =       8.0
       overall = 0.1635                                        max =         8

                                                Wald chi2(6)       =    517.71
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

                               (Std. Err. adjusted for 545 clusters in nr)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1011033   .0088919    11.37   0.000     .0836756    .1185311
       exper |   .1128358   .0104931    10.75   0.000     .0922696    .1334019
     expersq |  -.0041483   .0006719    -6.17   0.000    -.0054652   -.0028314
     married |    .065336   .0192047     3.40   0.001     .0276956    .1029765
       black |  -.1269633   .0514673    -2.47   0.014    -.2278373   -.0260892
        hisp |    .026507   .0407199     0.65   0.515    -.0533025    .1063165
       _cons |  -.0845855   .1154202    -0.73   0.464     -.310805    .1416339
-------------+----------------------------------------------------------------
     sigma_u |  .33561173
     sigma_e |  .35204264
         rho |  .47611949   (fraction of variance due to u_i)
------------------------------------------------------------------------------

It is also possible to estimate the fixed effects model using the least squares dummy variable approach. First we
have to define a set of dummy variables, one for each individual.
. tab(nr), gen(dumi)
Then estimate the regression model including the dummy variables using OLS
. regress lwage educ exper expersq married black hisp dumi*, noc vce(robust)
note: dumi375 omitted because of collinearity
note: dumi395 omitted because of collinearity
note: dumi462 omitted because of collinearity
Linear regression                                      Number of obs =    4360
                                                       F(548, 3812)  = 1048.92
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9639
                                                       Root MSE      =  .35204

------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0826707   .0040599    20.36   0.000     .0747108    .0906305
       exper |   .1169371   .0091402    12.79   0.000     .0990168    .1348573
     expersq |  -.0043329   .0005988    -7.24   0.000    -.0055069   -.0031589
     married |   .0473384   .0181832     2.60   0.009     .0116886    .0829882
       black |   .1943954   .0733538     2.65   0.008     .0505789    .3382118
        hisp |   .4369434   .0835889     5.23   0.000     .2730601    .6008268
       dumi1 |  -.3174653   .3215929    -0.99   0.324     -.947976    .3130454
       dumi2 |  -.0474871   .0700694    -0.68   0.498    -.1848643       .0898
      ..................
     dumi542 |   .5402538    .078385     6.89   0.000     .3865731    .6939344
     dumi543 |  -.3221958   .1511144    -2.13   0.033    -.6184686     -.025923
     dumi544 |   .7369184   .0553026    13.33   0.000     .6284928    .8453439
     dumi545 |  -.0496064   .0986855    -0.50   0.615    -.2430879    .1438751
------------------------------------------------------------------------------
Warning: we now have estimates for educ, black, etc, but things are not as they appear!

Fixed effects can also be eliminated by differencing.


. regress d.lwage d.educ d.exper d.expersq d.black d.married d.hisp, vce(robust)
note: _delete omitted because of collinearity
note: _delete omitted because of collinearity
note: _delete omitted because of collinearity
note: _delete omitted because of collinearity

Linear regression                                      Number of obs =    3815
                                                       F(2, 3812)    =    5.08
                                                       Prob > F      =  0.0063
                                                       R-squared     =  0.0030
                                                       Root MSE      =  .44326

------------------------------------------------------------------------------
             |               Robust
     D.lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |
         D1. |  (omitted)
             |
       exper |
         D1. |  (omitted)
             |
     expersq |
         D1. |  -.0038615   .0013887    -2.78   0.005    -.0065841   -.0011388
             |
       black |
         D1. |  (omitted)
             |
     married |
         D1. |   .0383504   .0233253     1.64   0.100    -.0073809    .0840818
             |
        hisp |
         D1. |  (omitted)
             |
       _cons |   .1155318   .0211452     5.46   0.000     .0740747    .1569889
------------------------------------------------------------------------------

The Two-Way Fixed Effects Model


In the one-way model, we assume that there exists unobserved individual heterogeneity,
but that the model is homogeneous over time
Two-way panel models allow for unobserved heterogeneity across both time and individuals

The two-way panel model can be written as

$y_{it} = \alpha_i + \lambda_t + x_{it}'\beta + v_{it}$

or,

$y_{it} = \alpha + \mu_i + \tau_t + x_{it}'\beta + v_{it}$

where $\sum_i \mu_i = 0$ and $\sum_t \tau_t = 0$

We can then define:

o The individual/time effect: $\alpha_{it} = \alpha + \mu_i + \tau_t$
o The average effect: $\bar\alpha = \alpha = (NT)^{-1}\sum_{i}\sum_{t}\alpha_{it}$
o The individual effect: $\alpha + \mu_i = \alpha_i = T^{-1}\sum_{t}\alpha_{it}$
o The time effect: $\alpha + \tau_t = \alpha_t = N^{-1}\sum_{i}\alpha_{it}$

Using these we can write $\alpha_{it} - \alpha_i - \alpha_t + \alpha = 0$

The two-way fixed effects panel model can be estimated using the LSDV approach by
including time dummies $f_{sit} = 1(s = t)$ in addition to individual dummies, thus estimating:

$y_{it} = \alpha_1 d_{1it} + \alpha_2 d_{2it} + \dots + \alpha_N d_{Nit} + \lambda_2 f_{2it} + \dots + \lambda_T f_{Tit} + x_{it}'\beta + v_{it}$

One problem with estimating the two-way panel model using dummy variables is that there
is an incidental parameters problem as either $N$ or $T$ goes to infinity
o A new within transformation can remove these:
$\tilde y_{it} = y_{it} - \bar y_i - \bar y_t + \bar y$
o The two-way within model can then be written as
$\tilde y_{it} = \tilde x_{it}'\beta + \tilde v_{it}$
The average, individual and time effects can now be estimated as
o $\hat\alpha = \bar y - \bar x'\hat\beta$
o $\hat\alpha_i = \bar y_i - \bar x_i'\hat\beta$
o $\hat\alpha_t = \bar y_t - \bar x_t'\hat\beta$

Consistency:
o $\hat\alpha$ and $\hat\beta$ are consistent as either $N$ or $T$ tends to infinity
o $\hat\alpha_i$ is only T-consistent
o $\hat\alpha_t$ is only N-consistent
The two-way within transformation removes both observed and unobserved heterogeneity
for both individual and time effects
The two-way model can also be estimated using a random effects model by GLS
In one-way models the individual effects are either fixed or random. In a two-way model the
individual and time effects can each be fixed or random
o i.e. we may have mixed random effects / fixed effects models where, for example, the time
effect is assumed fixed and the individual effect random
o if $T$ is small, for example, one may estimate a one-way random effects model on a set
of exogenous variables and time dummies (see the sketch below)
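
On the wage panel used earlier, one common way to fit a two-way model in Stata is individual fixed effects plus time dummies (a sketch; with T = 8 here the time effects are treated as fixed):

. xtreg lwage exper expersq married i.year, fe vce(robust)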

In certain cases a dataset might have more than 2 dimensions: e.g. firm-industry-year;
country region-year; individual-household-year; employee-firm-year; farm-region-year.
This class of data can be analyzed using nested error component models.
$y_{ijt} = \alpha + x_{ijt}'\beta + u_{ijt}, \qquad u_{ijt} = \mu_i + \nu_j + \tau_t + \eta_{ij} + \varepsilon_{ijt}$

which can be estimated using LSDV methods.

Other Estimators
Between Effects Regression
Between Groups estimation involves OLS on the cross-section equation:

$\bar y_i = \bar x_i'\beta + (\alpha_i + \bar u_i)$

i.e. we average out all of the within-individual variation, leaving only the between-individual
variation
The model can be estimated using OLS by either
o Using one group-mean observation per individual
o Or using $T_i$ copies of the individual group-mean data for individual $i$
The latter is equivalent to a weighted regression of $\bar y_i$ on $\bar x_i$, with the weights given by $T_i$ for
individual $i$. It is often desirable to give more weight to individuals with longer time series.
Consistency requires that $E(x_{it}\alpha_i) = 0$
Between groups estimation is not efficient
It is usually only used to obtain an estimate of $\sigma_\alpha^2$ when using feasible GLS
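
In Stata the between estimator is available directly through xtreg (this is what produces the BE column of the comparison of estimators table further below):

. xtreg lwage educ exper expersq married black hisp, be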

Seemingly Unrelated Regression


Seemingly unrelated regression (SUR) involves estimating a different model for each
individual within the data set and estimating them jointly as a system (though there is a bit more to it).
For example, if you are trying to estimate high school graduation rates and you have data on
all fifty states in the U.S. over a particular time period (1985 through 2005, for example),
you estimate a separate model for each state (see the sketch below).
o This generates some coefficient estimates and some error terms.
o However, error terms are likely to be correlated between states for any particular year.
 For example, there might have been some national event in 1990 that caused
graduation rates across the country to be unusually high, so we would expect to
see that error terms are correlated across observations in this year.
o This correlation is a violation of the assumptions of OLS, and the SUR takes advantage
of this to improve upon OLS.
The process uses the information about the correlation between the error terms to improve
upon the OLS estimates and come up with improved coefficient estimates.
The statistical technique that is used to compute these improved estimates is called
generalized least squares.

One potential problem with the SUR model is that you might have more explanatory
variables than you have observations for any individual in the data, which would essentially
make this approach impossible.
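
A minimal sureg sketch with three state-level equations estimated jointly by feasible GLS; the names grad_ca, grad_ny, grad_tx, x1 and x2 are hypothetical:

. sureg (grad_ca x1 x2) (grad_ny x1 x2) (grad_tx x1 x2)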

Mixed Models or Random Coefficient Models


In linear random-intercept models, the overall level of the response, conditional on 2, could
vary across clusters
In random coefficients models, we also allow the marginal effect of the covariates to vary
across clusters
Consider the model:

$y_{it} = (\beta_1 + \beta_{1i}) + (\beta_2 + \beta_{2i}) x_{it} + u_{it}$
This allows for the intercept and slope coefficients to vary across individuals (we could also
allow for the coefficients to change across time also)
Such models are in many cases not estimable due to a degrees of freedom problem
One solution is to assume that each regression coefficient is a random variable with a
probability distribution
This reduces the number of parameters to be estimated significantly

In particular we may assume that:

$\beta_{1i} \sim (0, \sigma_1^2)$
$\beta_{2i} \sim (0, \sigma_2^2)$
$cov(\beta_{1i}, \beta_{2i}) = \sigma_{12}$

In the above equation we can consider $(\beta_1, \beta_2)$ to be the common mean coefficient vector and
the $\beta_{1i}, \beta_{2i}$ as the individual deviations from the mean.
Rewriting the above equation we have:

$y_{it} = (\beta_1 + \beta_2 x_{it}) + (\beta_{1i} + \beta_{2i} x_{it}) + u_{it}$
$\eta_{it} = (\beta_{1i} + \beta_{2i} x_{it}) + u_{it}$

$var(\eta_{it}) = \sigma_1^2 + 2\sigma_{12} x_{it} + \sigma_2^2 x_{it}^2 + \sigma_u^2$

Since the variance of $\eta_{it}$ depends on $x_{it}$ there is heteroscedasticity

The model can be estimated by Generalised Least Squares

Example of Snijders and Bosker


Dataset allows one to consider verbal IQ as a predictor of language scores
Data is collected on individuals within schools
To estimate a Random Coefficients model we can use the xtmixed command in Stata
To let Stata know that we want the covariance between the intercept and slope to be estimated we specify covariance(unstructured)

. xtmixed langpost iqvc || schoolnr: iqvc, mle covariance(unstructured)


Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood = -7615.9951
Iteration 1:   log likelihood = -7615.3893
Iteration 2:   log likelihood = -7615.3887

Computing standard errors:

Mixed-effects ML regression                     Number of obs      =      2287
Group variable: schoolnr                        Number of groups   =       131

                                                Obs per group: min =         4
                                                               avg =      17.5
                                                               max =        35

                                                Wald chi2(1)       =    962.03
Log likelihood = -7615.3887                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
    langpost |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        iqvc |    2.52637   .0814522    31.02   0.000     2.366726    2.686013
       _cons |   40.70956   .3042423   133.81   0.000     40.11325    41.30586
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolnr: Unstructured       |
                    sd(iqvc) |   .4583713   .1100965       .286264    .7339526
                   sd(_cons) |   3.058354   .2491357      2.607043    3.587791
            corr(iqvc,_cons) |  -.8168636   .1743621     -.9744848   -.1196644
-----------------------------+------------------------------------------------
                sd(Residual) |    6.44051   .1004244      6.246659    6.640377
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(3) =   246.91   Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

The expected language score for a child with average IQ now averages 40.7 across schools, with a standard deviation of 3.1.
The expected gain in language score per point of IQ averages 2.5, with a standard deviation of 0.46.
The intercept and slope have a negative correlation of -0.82 across schools, so schools with higher language scores for a child with average verbal IQ tend
to show smaller average gains.

Although random effects are not directly estimated, you can form best linear unbiased predictions (BLUPs) of them (and standard errors) using predict after
xtmixed.
The next step is to predict fitted values as well as the random effects (we can also verify that we can reproduce the fitted values)
. predict yhat2, fitted          // yhat for model 2

. predict rb2 ra2, reffects      // residual slope and intercept for model 2

. gen check = (_b[_cons]+ra2) + (_b[iqvc]+rb2)*iqvc


. list yhat2 check in 1/10

     +---------------------+
     |    yhat2      check |
     |---------------------|
  1. | 20.78043   20.78043 |
  2. | 24.53934   24.53934 |
  3. | 27.04527   27.04527 |
  4. | 30.80418   30.80418 |
  5. | 33.31012   33.31012 |
     |---------------------|
  6. | 34.56309   34.56309 |
  7. | 34.56309   34.56309 |
  8. | 34.56309   34.56309 |
  9. | 34.56309   34.56309 |
 10. | 35.81606   35.81606 |
     +---------------------+

The graph of fitted lines shows clearly how school differences are more pronounced at lower than at higher verbal IQs.

Fixed or Random Effects


o Random Effects is more efficient
o Can estimate all parameters (e.g. time invariant variables) with random effects
o Random effects is inconsistent if fixed effects are present
o Fixed effects allow for arbitrary correlation between the individual effects and the
regressors
o The group effect can be thought of as random if we can think of the sample as being
drawn from a larger population.
o The fixed effects model is appropriate when differences between individuals may be viewed
as parametric shifts in the regression function (considered reasonable when the sample is a
broadly exhaustive sample of the population)
o Random effects more applicable when we want to draw inferences for the whole
population
o Random effects preferred when there is no correlation between the individual effects and
the regressors (can test this using the Hausman test)
o LSDV model often results in a large loss in degrees of freedom
o Fixed effects model eliminates a large portion of the total variation if the between sum of
squares are large relative to the within sum of squares

o The $\alpha_i$ are the sum of several factors specific to the cross-section units and thus represent
"specific ignorance"; they can be treated as random variables in the same way as $u_{it}$,
which represents "general ignorance", is treated as random

Comparison of Estimators
                    (1)           (2)              (3)           (4)           (5)
                    OLS           OLS              FE            RE            BE
                                  (clustered s.e.)
educ             0.0989***     0.0989***       (omitted)      0.101***      0.0941***
                (0.00458)     (0.00923)                      (0.00889)     (0.0112)
exper            0.0941***     0.0941***        0.117***      0.113***     -0.00271
                (0.0102)      (0.0124)         (0.0107)      (0.0105)      (0.0511)
expersq         -0.00323***   -0.00323***      -0.00433***   -0.00415***    0.00212
                (0.000678)    (0.000864)       (0.000686)    (0.000672)    (0.00327)
married          0.117***      0.117***         0.0473**      0.0653***     0.161***
                (0.0154)      (0.0266)         (0.0212)      (0.0192)      (0.0423)
black           -0.114***     -0.114**         (omitted)     -0.127**      -0.0963*
                (0.0248)      (0.0520)                       (0.0515)      (0.0498)
hisp             0.0267        0.0267          (omitted)      0.0265        0.0233
                (0.0199)      (0.0404)                       (0.0407)      (0.0439)
Observations     4360          4360             4360          4360          4360

Standard errors in parentheses.

Chow Test
o Provides a test of the pooled (restricted model) versus the fixed effects (unrestricted)
model
o This is simply a joint test of whether the fixed effects are significant
$CHOW = \dfrac{(RRSS - URSS)/(N-1)}{URSS/(NT - N - K)}$

where RRSS and URSS are the residual sums of squares from the restricted and unrestricted
models respectively. This is distributed $F_{N-1,\,NT-N-K}$ under the null of no fixed effects.
o If there are a number of observed individual specific variables in the model, these are
included in the pooled model, but not the fixed effects model (i.e. we want to test for
unobserved heterogeneity)

. regress lwage educ exper expersq black married hisp i.nr


note: 11892.nr omitted because of collinearity
note: 12220.nr omitted because of collinearity
note: 12548.nr omitted because of collinearity
      Source |       SS       df       MS              Number of obs =    4360
-------------+------------------------------           F(547, 3812)  =   11.27
       Model |   764.09314   547  1.3968796            Prob > F      =  0.0000
    Residual |  472.436482  3812  .123934019           R-squared     =  0.6179
-------------+------------------------------           Adj R-squared =  0.5631
       Total |  1236.52962  4359  .283672774           Root MSE      =  .35204

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0290989   .0352816     0.82   0.410    -.0400737    .0982715
       exper |   .1169371   .0084385    13.86   0.000     .1003926    .1334815
     expersq |  -.0043329   .0006066    -7.14   0.000    -.0055222   -.0031436
       black |   .8273663   .1536651     5.38   0.000     .5260925     1.12864
     married |   .0473384   .0183445     2.58   0.010     .0113725    .0833043
        hisp |   .6551428   .1539299     4.26   0.000     .3533498    .9569357
             |
          nr |
         17  |   .2164064   .1614937     1.34   0.180    -.1002159    .5330287
         18  |   .5947677   .1539649     3.86   0.000     .2929061    .8966293
         45  |    .496684   .1534798     3.24   0.001     .1957735    .7975946
         ..
      12500  |  -.1118741   .1536758    -0.73   0.467    -.4131688    .1894206
      12534  |   .8936683   .1537159     5.81   0.000      .592295    1.195042
      12548  |  (omitted)
             |
       _cons |   .4325396   .4165826     1.04   0.299    -.3842066    1.249286
------------------------------------------------------------------------------

. testparm i.nr

 (  1)  17.nr = 0
 (  2)  18.nr = 0
 (  3)  45.nr = 0
 (  4)  110.nr = 0
 (  5)  120.nr = 0
 ..
 (536)  12420.nr = 0
 (537)  12433.nr = 0
 (538)  12451.nr = 0
 (539)  12477.nr = 0
 (540)  12500.nr = 0
 (541)  12534.nr = 0

       F(541, 3812) =    8.34
            Prob > F =    0.0000

So, we reject the null that all fixed effects are zero (and thus prefer the fixed effects over the pooled model)

Hausman Test
o Usually applied to test for fixed versus random effects models
o Compares directly the random effects estimator, g! to the fixed effects estimator, h!
o In the presence of a correlation between the individual effects and the regressors the
GLS estimates are inconsistent, while the OLS fixed effects results are consistent
o If there is no correlation between the fixed effects and the regressors both estimators
are consistent, but the OLS fixed effects estimator is inefficient
o Construct i = h! g! and j (i ) = jh!  jg! 
;
o Test statistic: k = iD jl (iD)# iD distributed as a m  statistic with n degress of freedom
(where n is the dimensionality of )
o The null hypothesis is that the preferred model is a random effects model and the
alternative that the fixed effects model is preferred

. xtreg lwage educ exper expersq black married hisp, fe


note: educ omitted because of collinearity
note: black omitted because of collinearity
note: hisp omitted because of collinearity
Fixed-effects (within) regression               Number of obs      =      4360
Group variable: nr                              Number of groups   =       545

R-sq:  within  = 0.1741                         Obs per group: min =         8
       between = 0.0014                                        avg =       8.0
       overall = 0.0534                                        max =         8

                                                F(3,3812)          =    267.93
corr(u_i, Xb)  = -0.1289                        Prob > F           =    0.0000

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  (omitted)
       exper |   .1169371   .0084385    13.86   0.000     .1003926    .1334815
     expersq |  -.0043329   .0006066    -7.14   0.000    -.0055222   -.0031436
       black |  (omitted)
     married |   .0473384   .0183445     2.58   0.010     .0113725    .0833043
        hisp |  (omitted)
       _cons |   1.085044    .026295    41.26   0.000     1.033491    1.136598
-------------+----------------------------------------------------------------
     sigma_u |  .40387667
     sigma_e |  .35204264
         rho |  .56824994   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(544, 3812) =     8.29         Prob > F = 0.0000
. estimates store coeff_consistent

. xtreg lwage educ exper expersq black married hisp, re


Random-effects GLS regression                   Number of obs      =      4360
Group variable: nr                              Number of groups   =       545

R-sq:  within  = 0.1739                         Obs per group: min =         8
       between = 0.1548                                        avg =       8.0
       overall = 0.1635                                        max =         8

                                                Wald chi2(6)       =    901.13
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1011033   .0091567    11.04   0.000     .0831565    .1190501
       exper |   .1128358   .0082738    13.64   0.000     .0966195    .1290521
     expersq |  -.0041483   .0005928    -7.00   0.000    -.0053101   -.0029864
       black |  -.1269633   .0488629    -2.60   0.009    -.2227328   -.0311938
     married |    .065336   .0168465     3.88   0.000     .0323175    .0983546
        hisp |    .026507   .0437909     0.61   0.545    -.0593215    .1123355
       _cons |  -.0845855   .1135289    -0.75   0.456    -.3070982    .1379271
-------------+----------------------------------------------------------------
     sigma_u |  .33561173
     sigma_e |  .35204264
         rho |  .47611949   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. estimates store coeff_efficient

. hausman coeff_consistent coeff_efficient


                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             | coeff_cons~t coeff_effi~t    Difference          S.E.
-------------+----------------------------------------------------------------
       exper |     .1169371     .1128358        .0041013        .0016594
     expersq |    -.0043329    -.0041483       -.0001846        .0001287
     married |     .0473384      .065336       -.0179977        .0072605
------------------------------------------------------------------------------
                          b = consistent under Ho and Ha; obtained from xtreg
           B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       11.79
                Prob>chi2 =       0.0081

So the fixed effects model is preferred

Breusch and Pagan Test


o Provides a test of the random effects model against the pooled OLS model
o Tests the null hypothesis that $\sigma_\alpha^2 = 0$, which is the case where the individual effects do
not exist and OLS is applicable (i.e. the random effects model reduces to the pooled
one if the variance of the individual effects is zero)
o Denote the residuals from the OLS (pooled) regression as $\hat u_{it}$
o Define: $S_1 = \sum_{i=1}^{N}\left(\sum_{t=1}^{T}\hat u_{it}\right)^2$ and $S_2 = \sum_{i=1}^{N}\sum_{t=1}^{T}\hat u_{it}^2$
o Test statistic: $LM = \dfrac{NT}{2(T-1)}\left[\dfrac{S_1}{S_2} - 1\right]^2$, distributed as a $\chi^2$ statistic with 1 degree of
freedom under the null hypothesis

. xtreg lwage educ exper expersq black married hisp, vce(robust)


Random-effects GLS regression                   Number of obs      =      4360
Group variable: nr                              Number of groups   =       545

R-sq:  within  = 0.1739                         Obs per group: min =         8
       between = 0.1548                                        avg =       8.0
       overall = 0.1635                                        max =         8

                                                Wald chi2(6)       =    517.71
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

                               (Std. Err. adjusted for 545 clusters in nr)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1011033   .0088919    11.37   0.000     .0836756    .1185311
       exper |   .1128358   .0104931    10.75   0.000     .0922696    .1334019
     expersq |  -.0041483   .0006719    -6.17   0.000    -.0054652   -.0028314
       black |  -.1269633   .0514673    -2.47   0.014    -.2278373   -.0260892
     married |    .065336   .0192047     3.40   0.001     .0276956    .1029765
        hisp |    .026507   .0407199     0.65   0.515    -.0533025    .1063165
       _cons |  -.0845855   .1154202    -0.73   0.464     -.310805    .1416339
-------------+----------------------------------------------------------------
     sigma_u |  .33561173
     sigma_e |  .35204264
         rho |  .47611949   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xttest0
Breusch and Pagan Lagrangian multiplier test for random effects

        lwage[nr,t] = Xb + u[nr] + e[nr,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                   lwage |   .2836728       .5326094
                       e |    .123934       .3520426
                       u |   .1126352       .3356117

        Test:   Var(u) = 0
                              chibar2(01) =  3425.60
                           Prob > chibar2 =   0.0000

We reject the null hypothesis, which indicates that there are significant differences across individuals and
random effects is more appropriate

Misspecification Tests
It is difficult to investigate the time-series properties (e.g. autocorrelation, stationarity, etc.) of
panel data when $T$ is small
Testing for heteroscedasticity is possible with small $T$ using the Bickel version of the
Breusch-Pagan test
o This is a test of both within and between heterogeneity
o This is a test of $\lambda_1 = \dots = \lambda_p = 0$ in the regression model

$\hat u_{it}^2 = \lambda_0 + \lambda_1 \hat y_{it} + \dots + \lambda_p \hat y_{it}^{\,p} + \omega_{it}$

where $\hat u_{it}$ and $\hat y_{it}$ are the residuals and fitted values respectively from the within regression

For medium and larger values of $T$ a Bartlett-type test is used.

o This assumes homoscedasticity within individuals, and tests for heteroscedasticity
between individuals
o Using the residuals from the within regression, calculate the total residual variance,
$s^2 = \frac{1}{NT-N-K}\sum_{i}\sum_{t}\hat u_{it}^2$, and the within-individual variances, $s_i^2 = \frac{1}{T-1}\sum_{t}\hat u_{it}^2$
o Calculate the Bartlett statistic,

$B = \dfrac{(T-1)\left[N \ln s^2 - \sum_i \ln s_i^2\right]}{1 + (N+1)/\left(3N(T-1)\right)}$

which is distributed as $\chi^2_{N-1}$ under the null hypothesis


A test for first-order within-individual autocorrelation is calculated from the within
regression residuals as

$r = \dfrac{\sum_{i=1}^{N}\sum_{t=2}^{T}\hat u_{it}\hat u_{i,t-1}}{\sum_{i=1}^{N}\sum_{t=1}^{T}\hat u_{it}^2}$

o The simplest test is then the Breusch-Godfrey test, $BG = \sqrt{NT^2/(T-1)}\; r$, which is distributed
$N(0,1)$ under the null hypothesis
o Given the slow convergence to normality, a superior alternative due to Fisher is often
used, $F = \sqrt{NT-N-K}\cdot\tfrac{1}{2}\ln\dfrac{1+r}{1-r}$, which is also distributed $N(0,1)$ under the null

If evidence of heteroscedasticity or autocorrelation is discovered, one could try to model the
heteroscedasticity and/or correlations
o This can be difficult even for large $T$, but is generally impossible for small $T$
o An alternative is to accept the coefficient estimates, but use robust standard errors
 - If heteroscedasticity is a problem we can use White's robust standard errors
 - If heteroscedasticity and/or within-individual autocorrelation is suspected we can
use Arellano's robust standard errors
o The White method is often included in statistical packages, with the variance-covariance
matrix given by $Var(\hat\beta) = (\tilde X'\tilde X)^{-1}\left[\sum_i\sum_t \hat u_{it}^2\,\tilde x_{it}'\tilde x_{it}\right](\tilde X'\tilde X)^{-1}$, where $\tilde X$ is the
$(NT \times K)$ difference-from-mean matrix of all exogenous variables and $\tilde x_{it}$ is the $(1 \times K)$
row vector of variables for a given observation
o The Arellano method is less standard, with the variance-covariance matrix given by
$Var(\hat\beta) = (\tilde X'\tilde X)^{-1}\left[\sum_i \tilde X_i'\hat u_i\hat u_i'\tilde X_i\right](\tilde X'\tilde X)^{-1}$, where $\tilde X_i$ is the $(T \times K)$ difference-from-mean
matrix of exogenous variables, and $\hat u_i$ is the $(T \times 1)$ vector of residuals for the $i$th
individual
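
In Stata, Arellano-type (cluster-robust) standard errors for the within estimator can be requested directly; indeed, for xtreg, fe the vce(robust) option is implemented as clustering on the panel variable:

. xtreg lwage exper expersq married, fe vce(cluster nr)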

A user-written command in Stata (xttest3) allows one to test for heteroscedasticity. This tests for
groupwise heteroscedasticity, i.e. whether the residual variance differs across groups.
. ssc install xttest3
. xtreg lwage educ exper expersq black married hisp, fe
note: educ omitted because of collinearity
note: black omitted because of collinearity
note: hisp omitted because of collinearity
Fixed-effects (within) regression               Number of obs      =      4360
Group variable: nr                              Number of groups   =       545

R-sq:  within  = 0.1741                         Obs per group: min =         8
       between = 0.0014                                        avg =       8.0
       overall = 0.0534                                        max =         8

                                                F(3,3812)          =    267.93
corr(u_i, Xb)  = -0.1289                        Prob > F           =    0.0000

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  (omitted)
       exper |   .1169371   .0084385    13.86   0.000     .1003926    .1334815
     expersq |  -.0043329   .0006066    -7.14   0.000    -.0055222   -.0031436
       black |  (omitted)
     married |   .0473384   .0183445     2.58   0.010     .0113725    .0833043
        hisp |  (omitted)
       _cons |   1.085044    .026295    41.26   0.000     1.033491    1.136598
-------------+----------------------------------------------------------------
     sigma_u |  .40387667
     sigma_e |  .35204264
         rho |  .56824994   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(544, 3812) =     8.29         Prob > F = 0.0000

. xttest3

Modified Wald test for groupwise heteroskedasticity
in fixed effect regression model

H0: sigma(i)^2 = sigma^2 for all i

chi2 (545)  =    2.2e+05
Prob>chi2   =     0.0000

So, we reject the null of homoscedasticity


Note: the power of the test is weak in large N, small T panels.

. net sj 3-2 st0039
. net install st0039

. xtserial lwage educ exper expersq black married hisp

Wooldridge test for autocorrelation in panel data
H0: no first-order autocorrelation
    F(  1,     544) =     24.809
           Prob > F =      0.0000

Linear Dynamic Panel Models


Why model dynamics?
o Many economic relationships are dynamic in nature and one of the advantages of panel
data is that they allow the researcher to better understand the dynamics of adjustment
o Current outcomes might depend on past values of the explanatory variables, i.e. include
lags of the $x$s in the model (a distributed lag model)
 - In this case we can use similar techniques to those described above
o A simple dynamic model regresses $y_{it}$ on a polynomial in time
 - e.g. the growth curve of child height or IQ with age
 - use the previous models with $x_{it}$ a polynomial in time or age
o Adjustment might be partial: this year's outcome $y$ depends not only on $x$, but also on
last year's outcome, i.e. include lags of $y$
 - Note: this is equivalent to including an infinite number of lagged $x$s
 - This further implies that we have in the equation the entire history of the RHS
variables, meaning that any measured influence is conditioned on this history

Linear Dynamic Panel Models with Individual Effects


It is common to consider an AR(1) model with individual fixed effects

$y_{it} = \gamma y_{i,t-1} + x_{it}'\beta + \alpha_i + u_{it}$

though more general models can be used (e.g. error correction models, ARMA models).

Consider the within-group transformation (i.e. the mean difference):

$y_{it} - \bar y_i = (x_{it} - \bar x_i)'\beta + \gamma(y_{i,t-1} - \bar y_{i,-1}) + (u_{it} - \bar u_i)$

where $\bar y_{i,-1} = \dfrac{1}{T-1}\sum_{t=2}^{T} y_{i,t-1}$

We have got rid of the individual effect. But what are the statistical properties of a
regression of $y_{it} - \bar y_i$ on $(x_{it} - \bar x_i)$ and $(y_{i,t-1} - \bar y_{i,-1})$?

Properties of the Within-Group Estimator


Obtain an expression for $y_{it}$ that involves only $\alpha_i$, the $x$s, the $u$s and $y_{i0}$ (the starting value
or initial condition of $y$):

$y_{it} = \gamma y_{i,t-1} + x_{it}'\beta + \alpha_i + u_{it}$

By substitution:

$y_{it} = \gamma\left[\gamma y_{i,t-2} + x_{i,t-1}'\beta + \alpha_i + u_{i,t-1}\right] + x_{it}'\beta + \alpha_i + u_{it}$
$\quad\;\; = \gamma^2 y_{i,t-2} + \gamma x_{i,t-1}'\beta + (1+\gamma)\alpha_i + \gamma u_{i,t-1} + x_{it}'\beta + u_{it}$

and so on until we arrive at $t = 0$ (i.e. initial conditions are important). Hence the statement
that this is essentially estimating a model including an infinite number of lags of the $x$s.

Distributed lag form:

$y_{it} = \sum_{s=0}^{t-1}\gamma^s\left(\alpha_i + x_{i,t-s}'\beta + u_{i,t-s}\right) + \gamma^t y_{i0}$

$y_{it} = \dfrac{1-\gamma^t}{1-\gamma}\,\alpha_i + \sum_{s}\gamma^s x_{i,t-s}'\beta + \left(u_{it} + \gamma u_{i,t-1} + \dots + \gamma^{t-1}u_{i1}\right) + \gamma^t y_{i0}$

$y_{i,t-1}$ is therefore a function of $u_{i,t-1}, \dots, u_{i1}$ and $y_{i0}$

$\bar y_{i,-1} = \dfrac{1}{T-1}\sum_{t=2}^{T} y_{i,t-1}$ is a function of $u_{i,T-1}, \dots, u_{i1}$ and $y_{i0}$

Hence $y_{i,t-1} - \bar y_{i,-1}$ is correlated with $u_{it} - \bar u_i$

Bias in within-group regression coefficients


o The bias of the within-groups estimator is caused by eliminating the individual effect from
the equation. This creates a correlation between the transformed error term and the
transformed lagged dependent variable
o The bias is generally negative for small $T$
o For large $T$ the bias is small, but in panel data $T$ tends to be small

As with the within-groups estimator, there is a bias in the estimator when first-differencing
the model
This is also true for other models (e.g. pooled OLS, random effects, ...)
Consider first-differencing to eliminate the individual effects:

$(y_{it} - y_{i,t-1}) = (x_{it} - x_{i,t-1})'\beta + \gamma(y_{i,t-1} - y_{i,t-2}) + (u_{it} - u_{i,t-1})$

OLS is inconsistent since $(y_{i,t-1} - y_{i,t-2})$ is correlated with $(u_{it} - u_{i,t-1})$ (even under the
assumption that $u_{it}$ is serially uncorrelated)
o The transformed error term $(u_{it} - u_{i,t-1})$ is an MA(1) process which contains $u_{i,t-1}$, and
is thus correlated with $(y_{i,t-1} - y_{i,t-2})$

There are several IV estimators which correct for endogeneity of the lagged dependent
variable.
o Similar to the method of Hausman and Taylor (see below) the instruments come from
within the model.
o Examples include Anderson and Hsiao, Arellano and Bond, and Blundell and Bond
What we need is a set of instruments that are correlated with $(y_{i,t-1} - y_{i,t-2})$, but not with
$(u_{it} - u_{i,t-1})$
o All lagged $x_{it}$ and $y_{i,t-2}, \dots, y_{i0}$ are valid instruments if $\{u_{it}\}$ is serially independent

Since $y_{i,t-2}$ is not correlated with $(u_{it} - u_{i,t-1})$, Anderson and Hsiao suggested using $y_{i,t-2}$
as an instrument for $(y_{i,t-1} - y_{i,t-2})$, alongside $x_{it}$, $x_{i,t-1}$ and $x_{i,t-2}$
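
A minimal Anderson and Hsiao sketch on the wage panel used earlier (illustrative, and requires the tsset declaration made above): first-difference the model and instrument the lagged differenced outcome with the second lag of its level:

. ivregress 2sls D.lwage D.expersq D.married (LD.lwage = L2.lwage), vce(cluster nr)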

Problems
If $y_{it}$ is (or is close to) a random walk then $y_{i,t-2}$ is not correlated with $(y_{i,t-1} - y_{i,t-2})$ and
is not a valid instrument
Methods based solely on the differenced equation ignore potentially valuable information
contained in the initial conditions
What is the optimal point on the trade-off between the number of lags used as instruments
and the number of time periods retained in the estimation sample?

System Estimators
The time-differenced model:

    y_it − y_{i,t-1} = (x_it − x_{i,t-1})β + ρ(y_{i,t-1} − y_{i,t-2}) + (ε_it − ε_{i,t-1})

    Δy_it = Δx_it β + ρ Δy_{i,t-1} + Δε_it,   t = 2, …, T    (1)

can be considered a system of T − 1 linear equations with cross-correlated errors (since Δε_it is correlated with Δε_{i,t-1} and Δε_{i,t+1})
There is also some related process generating the initial conditions, y_i0 and x_i0, which could provide further equations
A different number of instruments is available for each of the equations in (1), for example:
o The equation for t = 2 has only (x_i1, …, x_iT, y_i0) available
o The equation for t = T has (x_i1, …, x_iT, y_i0, …, y_{i,T-2}) available

The Method of Moments

The method of moments (MM) is a way of getting consistent estimates of model parameters
Specify moment conditions (e.g. means, covariances) implied by the model as a function of its parameters (population moments)
Write down the sample analogues of these moment conditions, i.e. expressions into which you can plug the sample data, as a function of the parameter estimates
Choose values for the parameter estimates which solve the sample moment conditions

Consider the mean of a random variable y:

o The mean of y is defined as μ = E(y)
o Rearrange this as a moment condition: m(y; μ) = E(y) − μ = 0
o The sample analogue is: m̂(y; μ̂) = (1/N) Σ_{i=1}^{N} (y_i − μ̂) = 0
o Solve to get the MM estimator: μ̂ = (1/N) Σ_{i=1}^{N} y_i
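As an illustration, the same MM estimator of a mean can be obtained with Stata's gmm command (a minimal sketch; the dataset and the parameter name {mu} are arbitrary, and with one parameter and the default constant instrument the model is exactly identified):

    sysuse auto, clear
    gmm (price - {mu}), onestep    // moment condition E(price - mu) = 0
    mean price                     // check: the GMM estimate equals the sample mean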

Often there are more moment conditions than parameters to be estimated. Then the moment conditions don't have a unique solution
In this case, we minimise a weighted sum of squares of the sample moments. In vector notation this is written in the general case as

    m̂(θ)′ W m̂(θ)

This is called the generalised method of moments (GMM)
IV estimators are members of the class of GMM estimators (e.g. 2SLS)

System Estimation of Dynamic Panel Models

Arellano and Bond is a variation of Anderson and Hsiao that uses an unbalanced set of instruments, with further lags as instruments
Instead of regarding (1) as one equation, think of it as a system of T − 1 equations:

    t = 3:  Δy_i3 = Δx_i3 β + ρ Δy_i2 + Δε_i3,   instruments: Z_i3 = (y_i1, x_i)
    t = 4:  Δy_i4 = Δx_i4 β + ρ Δy_i3 + Δε_i4,   instruments: Z_i4 = (Z_i3, y_i2, x_i)
    ⋮
    t = T:  Δy_iT = Δx_iT β + ρ Δy_{i,T-1} + Δε_iT,   instruments: Z_iT = (Z_{i,T-1}, y_{i,T-2}, x_i)

It is the use of different instruments for equations of different time periods that distinguishes the A&B method from conventional IV estimation, which uses the same instrument set for all endogenous variables
Conventional instruments can also be used in the analysis
A problem arises with the Arellano and Bond method if the variables are close to a random walk, since lagged levels are then poor instruments for the first differences

Arellano and Bover (1995) and Blundell and Bond (1998) show that adding the original equation in levels to the system increases the number of moment conditions and can increase efficiency
o In the levels equations the endogenous variables are instrumented with lags of their first differences
XTABOND2 in Stata fits both the Arellano and Bond difference GMM estimator and the Blundell and Bond system (i.e. levels and differences) GMM estimator

Specification Testing in Dynamic Panel Models

Tests for Overidentifying Restrictions (i.e. whether the instruments appear exogenous)
o Can be carried out using the standard Sargan test
o Stata also reports the Hansen J test (since the Sargan test is not robust to heteroscedasticity or autocorrelation)

Testing for Residual Serial Correlation

o If the ε_it are serially independent, then

    E(Δε_it Δε_{i,t-1}) = E[(ε_it − ε_{i,t-1})(ε_{i,t-1} − ε_{i,t-2})] = −E(ε²_{i,t-1}) = −σ²_ε

o Thus, we would expect first order serial correlation in the differenced residuals
o We would not, however, expect any second order serial correlation, i.e.

    E(Δε_it Δε_{i,t-2}) = E[(ε_it − ε_{i,t-1})(ε_{i,t-2} − ε_{i,t-3})] = 0

o One should therefore test for second order serial correlation
o The presence of second order serial correlation indicates a specification error
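A hedged sketch of how these tests are obtained with Stata's built-in xtabond (the specification here is illustrative, not the one estimated below):

    use http://www.stata-press.com/data/r7/abdata.dta, clear
    tsset id year
    xtabond n w k, lags(1)
    estat sargan    // Sargan test of the overidentifying restrictions
    estat abond     // AR(1) and AR(2) tests in the first-differenced residuals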

. use http://www.stata-press.com/data/r7/abdata.dta, clear

. xtabond2 n l.n w k l.w l.k yr1980 yr1981 yr1982 yr1983 yr1984, gmm(l.n) iv(yr1980-yr1984) noleveleq
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.

Dynamic panel-data estimation, one-step difference GMM
------------------------------------------------------------------------------
Group variable: id                              Number of obs      =       751
Time variable : year                            Number of groups   =       140
Number of instruments = 33                      Obs per group: min =         5
Wald chi2(10) = 1235.04                                        avg =      5.36
Prob > chi2   = 0.000                                          max =         7
------------------------------------------------------------------------------
           n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           n |
         L1. |    .554094   .1037099     5.34   0.000     .3508264   .7573616
             |
           w |  -.7749952   .2021376    -3.83   0.000    -1.171178   -.3788128
           k |     .48439   .1422565     3.41   0.001     .2055723   .7632077
             |
           w |
         L1. |   .3597913   .1214988     2.96   0.003      .121658   .5979247
             |
           k |
         L1. |   -.334291   .1447846    -2.31   0.021    -.6180637   -.0505183
             |
      yr1980 |  -.0139178   .0134693    -1.03   0.301    -.0403172    .0124815
      yr1981 |  -.0466677   .0231872    -2.01   0.044    -.0921137   -.0012217
      yr1982 |   -.038262   .0386544    -0.99   0.322    -.1140232    .0374993
      yr1983 |  -.0311078   .0519977    -0.60   0.550    -.1330214    .0708058
      yr1984 |  -.0303459   .0642001    -0.47   0.636    -.1561758     .095484
------------------------------------------------------------------------------
Instruments for first differences equation
  Standard
    D.(yr1980 yr1981 yr1982 yr1983 yr1984)
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(1/.).L.n
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -4.29  Pr > z =  0.000
Arellano-Bond test for AR(2) in first differences: z =  -0.27  Pr > z =  0.788
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(23)   =  49.90  Prob > chi2 =  0.001
  (Not robust, but not weakened by many instruments.)
Difference-in-Sargan tests of exogeneity of instrument subsets:
  iv(yr1980 yr1981 yr1982 yr1983 yr1984)
    Sargan test excluding group:     chi2(18)   =  40.82  Prob > chi2 =  0.002
    Difference (null H = exogenous): chi2(5)    =   9.08  Prob > chi2 =  0.106

. xtabond2 n l.n w k l.w l.k yr1980 yr1981 yr1982 yr1983 yr1984, gmm(l.n) iv(yr1980-yr1984)
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.

Dynamic panel-data estimation, one-step system GMM
------------------------------------------------------------------------------
Group variable: id                              Number of obs      =       891
Time variable : year                            Number of groups   =       140
Number of instruments = 41                      Obs per group: min =         6
Wald chi2(10) = 6066.80                                        avg =      6.36
Prob > chi2   = 0.000                                          max =         8
------------------------------------------------------------------------------
           n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           n |
         L1. |   .8658989   .0572771    15.12   0.000     .7536379   .9781598
             |
           w |  -.7153568   .1246113    -5.74   0.000    -.9595905   -.4711232
           k |   .5904417    .091638     6.44   0.000     .4108345     .770049
             |
           w |
         L1. |   .5637949   .1067572     5.28   0.000     .3545547    .7730351
             |
           k |
         L1. |  -.4850106   .0941143    -5.15   0.000    -.6694714   -.3005499
             |
      yr1980 |   -.000474   .0123318    -0.04   0.969    -.0246438    .0236958
      yr1981 |  -.0172628   .0170324    -1.01   0.311    -.0506458    .0161201
      yr1982 |   .0163149   .0195558     0.83   0.404    -.0220137    .0546435
      yr1983 |   .0206385   .0204995     1.01   0.314    -.0195397    .0608167
      yr1984 |   .0126074   .0295783     0.43   0.670    -.0453651    .0705799
       _cons |   .6428872   .3137909     2.05   0.040     .0278683    1.257906
------------------------------------------------------------------------------
Instruments for first differences equation
  Standard
    D.(yr1980 yr1981 yr1982 yr1983 yr1984)
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(1/.).L.n
Instruments for levels equation
  Standard
    yr1980 yr1981 yr1982 yr1983 yr1984
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    D.L.n
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -8.39  Pr > z =  0.000
Arellano-Bond test for AR(2) in first differences: z =  -0.20  Pr > z =  0.838
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(30)   =  60.24  Prob > chi2 =  0.001
  (Not robust, but not weakened by many instruments.)
Difference-in-Sargan tests of exogeneity of instrument subsets:
  GMM instruments for levels
    Sargan test excluding group:     chi2(23)   =  54.29  Prob > chi2 =  0.000
    Difference (null H = exogenous): chi2(7)    =   5.95  Prob > chi2 =  0.546
  iv(yr1980 yr1981 yr1982 yr1983 yr1984)
    Sargan test excluding group:     chi2(25)   =  49.46  Prob > chi2 =  0.002
    Difference (null H = exogenous): chi2(5)    =  10.77  Prob > chi2 =  0.056

Endogeneity Revisited
Consider the following wage regression:

    w_it = α_i + β₁ ed_i + β₂ fem_i + β₃ age_it + β₄ tenure_it + ε_it

We can think of two possible forms of endogeneity:
o Two-way causation: tenure is rewarded with high pay, and workers tend to stay in high-paid jobs
o Unobserved common factors: ability is rewarded with high pay, and high-ability people stay in education longer

Two-way causation
Tenure model:

    tenure_it = ρ w_it + ν_it

    tenure_it = ρ(α_i + β₁ ed_i + β₂ fem_i + β₃ age_it + β₄ tenure_it + ε_it) + ν_it

    tenure_it = [ρ(α_i + β₁ ed_i + β₂ fem_i + β₃ age_it + ε_it) + ν_it] / (1 − ρβ₄)

so that:

    Cov(tenure_it, α_i) = ρ σ²_α / (1 − ρβ₄) ≠ 0

    Cov(tenure_it, ε_it) = ρ σ²_ε / (1 − ρβ₄) ≠ 0

To deal with this kind of endogeneity we can estimate a within-group IV regression model
o The within-group transformation eliminates the α_i's and the IV deals with the covariance between tenure_it and ε_it

Unobserved common factors: α_i represents high ability, and high-ability people stay in education longer:

    ed_i = δ α_i + other variables,   (δ > 0)

    Cov(ed_i, α_i) = δ σ²_α ≠ 0

    Cov(ed_i, ε_it) = 0

To deal with this kind of endogeneity we can estimate a within-group regression model
o The within-group transformation eliminates the α_i's
o It also eliminates time-invariant variables, but there are approaches (e.g. Hausman-Taylor) to obtain coefficients on these variables

Instrumental Variables Regression with Panel Data

The standard 2SLS estimator for cross-section data can easily be extended to the panel context
Consider the model:

    y_it = α_i + x_it β + ε_it

where a subset of the x_it's are considered to be endogenous

Partition x_it, i.e. x_it = (x1_it, x2_it), where x2_it represents the endogenous covariates:

    Cov(x1_it, ε_it) = 0 and Cov(x2_it, ε_it) ≠ 0

Obtain a set of instruments z2_it (at least as many as there are variables in x2_it)
o where Cov(z2_it, ε_it) = 0
The full set of instruments is thus z_it = (x1_it, z2_it)
Within-group transformation:

    y_it − ȳ_i = (x_it − x̄_i)β + (ε_it − ε̄_i)

The within-groups IV estimator then uses (z_it − z̄_i) as instruments

Other IV Estimators
By applying the between-group transformation or the random-effects GLS transformation to the model and instruments, we can define between-group and random-effects IV estimators analogous to the regression case
o As with standard regression, these estimators are not robust to correlation between the α_i and x_it
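As a minimal sketch of one of these variants on the nlswork data analysed below (the be option requests the between-groups 2SLS estimator; the fe, re and fd IV variants appear in the examples that follow):

    webuse nlswork, clear
    xtivreg ln_wage age not_smsa (tenure = union south), be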

. webuse nlswork, clear

. xtreg ln_wage tenure age not_smsa, fe

Fixed-effects (within) regression               Number of obs      =     28093
Group variable: idcode                          Number of groups   =      4699

R-sq:  within  = 0.1335                         Obs per group: min =         1
       between = 0.2484                                        avg =       6.0
       overall = 0.1862                                        max =        15

                                                F(3,23391)         =   1201.75
corr(u_i, Xb)  = 0.1840                         Prob > F           =    0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tenure |   .0209418   .0008001    26.17   0.000     .0193735      .02251
         age |   .0123481   .0004125    29.93   0.000     .0115395    .0131566
    not_smsa |   -.099398   .0097221   -10.22   0.000     -.118454     -.080342
       _cons |   1.280688   .0112142   114.20   0.000     1.258707    1.302668
-------------+----------------------------------------------------------------
     sigma_u |  .38143467
     sigma_e |  .29745202
         rho |  .62184184   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:  F(4698, 23391) = 7.33                Prob > F = 0.0000

. xtivreg ln_wage age not_smsa (tenure = union south), fe

Fixed-effects (within) IV regression            Number of obs      =     19007
Group variable: idcode                          Number of groups   =      4134

R-sq:  within  =      .                         Obs per group: min =         1
       between = 0.1277                                        avg =       4.6
       overall = 0.0879                                        max =        12

                                                Wald chi2(3)       = 141873.28
corr(u_i, Xb)  = -0.6875                        Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tenure |   .2452291   .0386354     6.35   0.000     .1695051    .3209531
         age |  -.0651322   .0127701    -5.10   0.000    -.0901611   -.0401034
    not_smsa |  -.0159519   .0346643    -0.46   0.645    -.0838927    .0519888
       _cons |   2.831893   .2443845    11.59   0.000     2.352908    3.310878
-------------+----------------------------------------------------------------
     sigma_u |  .71942007
     sigma_e |  .64359089
         rho |  .55546192   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:  F(4133,14870) = 1.38                 Prob > F = 0.0000
------------------------------------------------------------------------------
Instrumented:   tenure
Instruments:    age not_smsa union south
------------------------------------------------------------------------------

. xtreg ln_wage tenure age not_smsa, re

Random-effects GLS regression                   Number of obs      =     28093
Group variable: idcode                          Number of groups   =      4699

R-sq:  within  = 0.1322                         Obs per group: min =         1
       between = 0.2638                                        avg =       6.0
       overall = 0.1979                                        max =        15

                                                Wald chi2(3)       =   4879.47
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tenure |    .025503   .0007467    34.15   0.000     .0240395    .0269666
         age |   .0118608   .0003859    30.74   0.000     .0111044    .0126171
    not_smsa |  -.1580611   .0077541   -20.38   0.000    -.1732589   -.1428633
       _cons |   1.289867   .0116069   111.13   0.000     1.267118    1.312616
-------------+----------------------------------------------------------------
     sigma_u |  .32162515
     sigma_e |  .29745202
         rho |  .53898759   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtivreg ln_wage age not_smsa (tenure = union south), re

G2SLS random-effects IV regression              Number of obs      =     19007
Group variable: idcode                          Number of groups   =      4134

R-sq:  within  = 0.0607                         Obs per group: min =         1
       between = 0.1725                                        avg =       4.6
       overall = 0.1192                                        max =        12

                                                Wald chi2(3)       =    929.08
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tenure |   .1768498   .0110283    16.04   0.000     .1552346    .1984649
         age |  -.0333235   .0030544   -10.91   0.000      -.03931     -.027337
    not_smsa |  -.2135208   .0129503   -16.49   0.000    -.2389029   -.1881386
       _cons |    2.20578   .0581674    37.92   0.000     2.091774    2.319787
-------------+----------------------------------------------------------------
     sigma_u |  .32796027
     sigma_e |  .64359089
         rho |  .20614163   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Instrumented:   tenure
Instruments:    age not_smsa union south
------------------------------------------------------------------------------

. xtivreg ln_wage age not_smsa (tenure = union south), fd

First-differenced IV regression
Group variable: idcode                          Number of obs      =      5934
Time variable:  year                            Number of groups   =      3461

R-sq:  within  = 0.1235                         Obs per group: min =         1
       between = 0.2071                                        avg =       4.3
       overall = 0.0892                                        max =        11

                                                Wald chi2(3)       =      5.83
corr(u_i, Xb)  = -0.4766                        Prob > chi2        =    0.1203

------------------------------------------------------------------------------
   D.ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tenure |
         D1. |   .1365949   .0778382     1.75   0.079    -.0159652     .289155
             |
         age |
         D1. |  -.0048762   .0135226    -0.36   0.718      -.03138    .0216277
             |
    not_smsa |
         D1. |  -.0633273   .0382332    -1.66   0.098     -.138263    .0116083
             |
       _cons |  -.0694077   .0598777    -1.16   0.246    -.1867658    .0479503
-------------+----------------------------------------------------------------
     sigma_u |  .53692033
     sigma_e |  .28615582
         rho |  .77878957   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Instrumented:   tenure
Instruments:    age not_smsa union south
------------------------------------------------------------------------------

Simultaneity involving only individual effects: the Hausman-Taylor case

Consider the model:

    y_it = α_i + x_it β + z_i γ + ε_it    (1)

where the z_i are a set of individual-specific (non-time-varying) variables (e.g. education)
Partition x_it and z_i:

    x_it = (x1_it, x2_it),   z_i = (z1_i, z2_i)

where:

    E(α_i | x1_it) = 0, E(α_i | z1_i) = 0:  x1_it, z1_i are exogenous

    E(α_i | x2_it) ≠ 0, E(α_i | z2_i) ≠ 0:  x2_it, z2_i are endogenous

We must assume that

    E(ε_it | x_it) = 0, E(ε_it | z_i) = 0 for all x- and z-variables

Identification condition: there must be at least as many x1-variables as z2-variables

Method: use the x1_it as instruments for the z2_i

Hausman-Taylor IV Estimator
Uses the exogenous time-varying regressors x1_it from periods other than the current one as instruments
One benefit of this approach is that it allows the estimation of coefficients on time-invariant regressors in a fixed-effects framework (which is not possible using the standard FE model)
Step 1: compute the within-group estimator of β:
o Regress (y_it − ȳ_i) on (x_it − x̄_i), which gives us the estimate β̂ (a consistent estimate of the parameters)
Step 2: construct the within-group residuals and estimate σ²_ε:

    ε̂_it = (y_it − ȳ_i) − (x_it − x̄_i)β̂

    σ̂²_ε = Σ_i Σ_t ε̂²_it / (N(T − 1) − K)

Step 3: estimate a model for d̂_i = ȳ_i − x̄_i β̂:

    d̂_i = γ₀ + z_i γ + residual

o To do this, stack the group means in a full sample-length data vector
o Use as IVs (x1_it, z1_i) (which requires that the number of x1's is at least the number of z2's)
o This provides a consistent estimator of the γ's
Step 4: construct ê_i = ȳ_i − z_i γ̂ − x̄_i β̂; estimate σ²_α from ε̂_it and ê_i. These form the weights in the GLS (random-effects) estimation
Step 5: estimate (1) as a random-effects model using as IVs z_it = (z1_i, (x1_it − x̄1_i), (x2_it − x̄2_i), x̄1_i)
This estimator was first proposed as a way of estimating wage regressions:
o Given that unobserved ability is omitted from the regression model, random-effects estimation will suffer from endogeneity bias
o Fixed-effects estimation eliminates this bias, but also prevents us from estimating the coefficients on schooling and other time-invariant (dummy) variables

. webuse psidextract, clear

. regress lwage wks south smsa ms exp exp2 occ ind union fem blk ed, vce(cluster id)

Linear regression                               Number of obs      =      4165
                                                F( 12,   594)      =     65.91
                                                Prob > F           =    0.0000
                                                R-squared          =    0.4286
                                                Root MSE           =    .34936

                                  (Std. Err. adjusted for 595 clusters in id)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         wks |   .0042161    .001542     2.73   0.006     .0011877    .0072444
       south |  -.0556374   .0261593    -2.13   0.034    -.1070134   -.0042614
        smsa |   .1516671   .0241026     6.29   0.000     .1043303    .1990039
          ms |   .0484485   .0409438     1.18   0.237    -.0319638    .1288608
         exp |   .0401046   .0040764     9.84   0.000     .0320987    .0481106
        exp2 |  -.0006734   .0000913    -7.37   0.000    -.0008527     -.000494
         occ |  -.1400093   .0272428    -5.14   0.000    -.1935133   -.0865054
         ind |   .0467886   .0236627     1.98   0.048     .0003159    .0932614
       union |   .0926267   .0236719     3.91   0.000      .046136    .1391175
         fem |  -.3677852   .0455743    -8.07   0.000    -.4572917   -.2782788
         blk |  -.1669376   .0443291    -3.77   0.000    -.2539986   -.0798767
          ed |   .0567042   .0055646    10.19   0.000     .0457756    .0676328
       _cons |   5.251124   .1235461    42.50   0.000     5.008483    5.493764
------------------------------------------------------------------------------

. xtreg lwage wks south smsa ms exp exp2 occ ind union fem blk ed, fe vce(cluster id)

Fixed-effects (within) regression               Number of obs      =      4165
Group variable: id                              Number of groups   =       595

R-sq:  within  = 0.6581                         Obs per group: min =         7
       between = 0.0261                                        avg =       7.0
       overall = 0.0461                                        max =         7

                                                F(9,594)           =    377.62
corr(u_i, Xb)  = -0.9100                        Prob > F           =    0.0000

                                  (Std. Err. adjusted for 595 clusters in id)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         wks |   .0008359   .0008658     0.97   0.335    -.0008644    .0025363
       south |  -.0018612   .0893013    -0.02   0.983    -.1772459    .1735235
        smsa |  -.0424691   .0294829    -1.44   0.150    -.1003726    .0154343
          ms |  -.0297259   .0268702    -1.11   0.269    -.0824979    .0230462
         exp |   .1132083   .0040499    27.95   0.000     .1052543    .1211622
        exp2 |  -.0004184   .0000824    -5.07   0.000    -.0005803   -.0002564
         occ |  -.0214765   .0189947    -1.13   0.259    -.0587815    .0158285
         ind |   .0192101   .0226818     0.85   0.397    -.0253361    .0637564
       union |   .0327849   .0250658     1.31   0.191    -.0164436    .0820133
         fem |  (omitted)
         blk |  (omitted)
          ed |  (omitted)
       _cons |   4.648767   .0780057    59.60   0.000     4.495567    4.801968
-------------+----------------------------------------------------------------
     sigma_u |  1.0338102
     sigma_e |  .15199444
         rho |  .97884144   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xthtaylor lwage wks south smsa ms exp exp2 occ ind union fem blk ed, endog(exp exp2 occ ind union ed) constant(fem blk ed)

Hausman-Taylor estimation                       Number of obs      =      4165
Group variable: id                              Number of groups   =       595

                                                Obs per group: min =         7
                                                               avg =         7
                                                               max =         7

Random effects u_i ~ i.i.d.                     Wald chi2(12)      =   6874.89
                                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
TVexogenous  |
         wks |    .000909   .0005988     1.52   0.129    -.0002647    .0020827
       south |   .0071377    .032548     0.22   0.826    -.0566553    .0709306
        smsa |  -.0417623   .0194019    -2.15   0.031    -.0797893   -.0037352
          ms |   -.036344   .0188576    -1.93   0.054    -.0733041    .0006161
TVendogenous |
         exp |   .1129718   .0024697    45.74   0.000     .1081313    .1178122
        exp2 |  -.0004191   .0000546    -7.68   0.000    -.0005261   -.0003121
         occ |  -.0213946   .0137801    -1.55   0.121     -.048403    .0056139
         ind |   .0188416   .0154404     1.22   0.222     -.011421    .0491043
       union |   .0303548   .0148964     2.04   0.042     .0011583    .0595513
TIexogenous  |
         fem |  -.1368468   .1272797    -1.08   0.282    -.3863104    .1126169
         blk |  -.2818287   .1766269    -1.60   0.111     -.628011    .0643536
TIendogenous |
          ed |   .1405254   .0658715     2.13   0.033     .0114197    .2696311
             |
       _cons |   2.884418   .8527775     3.38   0.001     1.213004    4.555831
-------------+----------------------------------------------------------------
     sigma_u |  .94172547
     sigma_e |  .15180273
         rho |  .97467381   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Note: TV refers to time varying; TI refers to time invariant.

Binary Response Models with Panel Data

A Reminder
o Whenever the variable that we want to model is binary, it is natural to think in terms of probabilities; examples include:
  - What is the probability that a firm with given characteristics exports?
  - If a female has a child, what is the effect on the probability of her being in the labour force?
o Binary response models can be estimated in a number of ways
  - The most straightforward is to use the Linear Probability Model (LPM)
o In this type of model the probability of success (i.e. y = 1) is a linear function of the explanatory variables in the vector x
  - The model is estimated using linear regression techniques
  - e.g. the model can be estimated using OLS or the within-groups estimator
  - The particular linear estimator used in the case of panel data will depend on the relationship between the observed explanatory variables and the unobserved individual effects

Properties of the LPM

One undesirable property of the LPM is that we can get predicted "probabilities" either less than zero or greater than one
A related problem is that, conceptually, it does not make sense to say that a probability is linearly related to a continuous independent variable for all possible values. If it were, then continually increasing this explanatory variable would eventually drive P(y = 1|x) above one or below zero
A third problem with the LPM is that the residuals are heteroscedastic. The easiest way of dealing with this is to obtain estimates of the standard errors that are robust to heteroscedasticity
A fourth and related problem is that the residual is not normally distributed. This implies that inference in small samples cannot be based on the usual normality-based distributions such as the t-distribution
The LPM is easy to estimate and its results are easy to interpret (e.g. the marginal effects are simply given by the estimated coefficients)

Certain econometric problems are easier to address within the LPM framework than with
probit and logit models (e.g. when using instrumental variables whilst controlling for fixed
effects)
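As a minimal sketch of the LPM in practice (illustrative dataset and covariates, not part of the lecture's example):

    webuse union, clear
    regress union age grade south, vce(robust)   // LPM: coefficients are the marginal effects
    probit union age grade south
    margins, dydx(*)                             // probit AMEs, comparable to the LPM coefficients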

The two main problems with the LPM were: nonsense predictions are possible (there is nothing to bind the predicted value to the (0,1) interval); and linearity doesn't make much sense conceptually. To address these problems we can use a nonlinear binary response model
A general index model has the form:

    P(y = 1|x) = G(xβ)

for some function G with 0 < G(z) < 1. In most cases, G(·) is the cumulative distribution function of a continuous random variable with density g(·). Then G(·) is strictly increasing, and the estimates are easier to interpret.
The leading cases are:

o The Probit Model: G(z) = Φ(z), the standard normal CDF, with density φ(z) = (2π)^(−1/2) exp(−z²/2)

o The Logit Model: G(z) = Λ(z) = exp(z)/[1 + exp(z)], i.e. the cumulative distribution function of a standard logistic random variable

Such models can be estimated using Maximum Likelihood (ML) techniques
One can think about these models in terms of an underlying latent variable model

Estimating the effect of x_k on the probability of success P(y = 1|x) is complicated by the non-linear nature of G(·)
o For continuous variables this effect can be calculated using calculus
If x_k is a roughly continuous variable, its partial effect on p(x) = P(y = 1|x) is obtained from

    ∂p(x)/∂x_k = g(xβ) β_k,   where g(z) ≡ dG(z)/dz

Suppose we estimate a probit modelling the probability that a firm does some exporting as a function of firm size.
For simplicity, abstract from other explanatory variables. Our model is thus:

    P(export = 1|size) = Φ(β₀ + β₁ size)

where size is defined as the natural logarithm of employment.

The probit results are:

                Coefficient    t-statistic
    constant       -2.85          16.6
    size            0.54          13.4

Since the coefficient on size is positive, we know that the marginal effect must be positive.
Treating size as a continuous variable, it follows that the marginal effect is equal to:

    ∂P(export = 1|size)/∂size = φ(β₀ + β₁ size) β₁

    ∂P(export = 1|size)/∂size = φ(−2.85 + 0.54 size) × 0.54

where φ(·) is the standard normal density function:

    φ(z) = (2π)^(−1/2) exp(−z²/2)

Assume that the mean value of log employment is 3.4 (i.e. about 30 workers); we can then evaluate the marginal effect at the mean (this is the partial effect at the average):

    PEA_k = β_k g(β₁ x̄₁ + β₂ x̄₂ + … + β_K x̄_K)

We then have:

    ∂P(export = 1|size = 3.4)/∂size = (2π)^(−1/2) exp(−(−2.85 + 0.54 × 3.4)²/2) × 0.54

    ∂P(export = 1|size = 3.4)/∂size ≈ 0.13

So, evaluated at log employment = 3.4, the results imply that a small increase in log employment raises the probability of exporting by about 0.13
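A one-line numerical check of this calculation (using the coefficients assumed above):

    display normalden(-2.85 + 0.54*3.4)*0.54    // ≈ .129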
A second way of calculating the marginal effect is the average partial effect (APE):

    APE_k = β_k N^(−1) Σ_{i=1}^{N} g(x_i β̂)

i.e. we average the scale factor g(x_i β̂) across the sample rather than evaluating it at the sample means
With discrete variables we do not need to rely on calculus
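For example, for a discrete change in log employment from 3.4 to 4.4 in the assumed export probit, the effect is a difference in predicted probabilities rather than a derivative:

    display normal(-2.85 + 0.54*4.4) - normal(-2.85 + 0.54*3.4)    // ≈ .16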

Binary Choice Models for Panel Data

We now consider the estimation of logit and probit models in the case of a panel dataset, where we allow for unobserved individual effects.
Using the latent variable framework, we can write the panel binary choice model as:

    y*_it = x_it β + α_i + u_it
    y_it = 1[y*_it > 0]

and

    P(y_it = 1|x_it, α_i) = G(x_it β + α_i)

where G(·) is either the standard normal CDF (probit) or the logistic CDF (logit)

In linear panel models it is easy to eliminate the individual effects (i.e. the α_i's) by first differencing or by using the within-groups transformation
This is not possible here because of the non-linear nature of the model (i.e. the G(·) function)
If we attempt to estimate the α_i's through the inclusion of dummy variables in the probit or logit specification we will get biased estimates of β unless T is large.
This is the incidental parameters problem:
o With small T the estimates of the α_i's are inconsistent (and increasing N doesn't solve this problem)
o Unlike in the linear model, the inconsistency of the α_i's has a knock-on effect, in the sense that the estimate of β becomes inconsistent too

Example:
Consider the logit model in which T = 2, β is a scalar, and x_it is a time dummy such that x_i1 = 0, x_i2 = 1. Thus,

    P(y_i1 = 1|x_i1, α_i) = exp(α_i)/[1 + exp(α_i)] = Λ(α_i)

    P(y_i2 = 1|x_i2, α_i) = exp(β + α_i)/[1 + exp(β + α_i)] = Λ(β + α_i)

Suppose we attempt to estimate this model with N dummy variables included to control for the individual fixed effects. There would thus be N + 1 parameters to estimate. It can be shown that in this case the probability limit of the estimate of our parameter of interest, β, is:

    plim β̂ = 2β

That is, the probability limit of the logit dummy variable estimator for this admittedly very special case is double the true value of β. With a bias of 100% even in very large (infinite) samples, this is not a very useful approach. This form of inconsistency also holds in more general cases: unless T is large, the logit dummy variable estimator will not work.

So how can we proceed?

One possibility is to use the fixed-effects or first-differenced LPM
o Wooldridge (2010) argues that this may provide reasonable estimates of the APEs and has one or two useful properties
A further possibility is to employ a pooled probit or logit model
o Wooldridge (2010) notes problems with this approach, in particular that a robust variance matrix is required to account for serial correlation
Three more common approaches:
o The traditional random effects (RE) probit (or logit) model
o The conditional fixed effects logit model
o The Mundlak-Chamberlain approach

The Traditional Random Effects (RE) Probit

Model:

    y*_it = x_it β + α_i + u_it
    y_it = 1[y*_it > 0]

and

    P(y_it = 1|x_it, α_i) = Φ(x_it β + α_i)

The key assumptions underlying this estimator are:

o The α_i and x_it are independent
o The x_it are strictly exogenous (this is necessary for it to be possible to write the likelihood of observing a given series of outcomes as the product of individual likelihoods)
o α_i has a normal distribution with zero mean and variance σ²_α (note: homoscedasticity)
o y_i1, …, y_iT are independent conditional on (x_i, α_i); this rules out serial correlation in y_it, conditional on (x_i, α_i). This assumption enables us to write the likelihood of observing a given series of outcomes as the product of individual likelihoods. The assumption can easily be relaxed (see Wooldridge, 2002)

These are restrictive assumptions, especially since endogeneity of the explanatory variables is ruled out. The only advantage over a simple pooled probit model is that the RE model allows for serial correlation in the unobserved factors determining y_it, i.e. in (α_i + u_it)
However, it is fairly straightforward to extend the model and allow for correlation between α_i and x_it; this is what the Mundlak-Chamberlain approach does (see below)
If α_i had been observed, the likelihood of observing individual i would have been:

    Π_{t=1}^{T} Φ(x_it β + α_i)^{y_it} [1 − Φ(x_it β + α_i)]^{(1−y_it)}

and it would be straightforward to maximize the sample likelihood conditional on x_it, α_i and y_it
Because the α_i are unobserved, however, they cannot be included in the likelihood function.
As discussed above, a dummy-variables approach cannot be used unless T is large.

What can we do?

We must make an additional assumption about the relationship between α_i and x_i, namely:

    α_i | x_i ~ Normal(0, σ²_α)

This is a strong assumption, as it implies that α_i and x_i are independent and that α_i has a normal distribution
The assumption that E(α_i) = 0 is without loss of generality provided x_it contains an intercept
With the above assumption we can integrate the α_i out of the likelihood function

Recall from basic statistics (Bayes' theorem for probability densities) that, in general,

    f_{Y|X}(y|x) = f_{XY}(x, y) / f_X(x)

where f_{Y|X}(y|x) is the conditional density of Y given X = x, f_{XY}(x, y) is the joint density of the random variables X, Y, and f_X(x) is the marginal density of X. Thus,

    f_{XY}(x, y) = f_{Y|X}(y|x) f_X(x)

The marginal density of Y can be obtained by integrating out x from the joint density:

    f_Y(y) = ∫ f_{XY}(x, y) dx = ∫ f_{Y|X}(y|x) f_X(x) dx

We can think about f_Y(y) as a likelihood contribution. For a linear model, we might write:

    f_Y(y) = ∫ f_{Y,α}(y, α) dα = ∫ f_{Y|α}(y|α) f_α(α) dα

where the error is ε_it = y_it − (x_it β + α_i)

In the context of the traditional RE probit, we integrate α_i out of the likelihood as follows:

    L_i(y_i1, …, y_iT | x_i1, …, x_iT; β, σ²_α)
      = ∫ { Π_{t=1}^{T} Φ(x_it β + α)^{y_it} [1 − Φ(x_it β + α)]^{(1−y_it)} } (1/σ_α) φ(α/σ_α) dα

which can be maximised with respect to β and σ_α (or σ²_α)

In general there is no analytical solution here, and so numerical methods have to be used. The most common approach is Gauss-Hermite quadrature
To form the sample log likelihood, we compute weighted sums in this fashion for each individual in the sample, and then add up all the individual likelihoods expressed in natural logarithms:

    log L = Σ_{i=1}^{N} log L_i(y_i1, …, y_iT | x_i1, …, x_iT; β, σ²_α)

This model can be estimated in Stata using the xtprobit command

There is a logit counterpart to this, but it is generally less desirable than the probit case

Since β and σ²_α can be estimated, the partial effects at α = 0 as well as the APEs can be estimated
Marginal effects at α_i = 0 can be computed using standard techniques, with the APE again a useful effect to calculate
Since α_i ~ Normal(0, σ²_α), the APE for a continuous x_tk is:

    APE_k = [β_k /(1 + σ²_α)^(1/2)] E{ φ( x_t β /(1 + σ²_α)^(1/2) ) }

i.e. the coefficients are scaled down by (1 + σ²_α)^(−1/2)
Whilst perhaps elegant, the above model does not allow for correlation between α_i and the explanatory variables, and so does not achieve anything in terms of addressing an endogeneity problem. We now turn to more useful models in that context.

The "Fixed Effects" Logit Model


Now return to the panel logit model:
(01 = 1|201 , 40 ) = (201  + 40 )

One important advantage of this model over the probit model is that it will be possible to
obtain a consistent estimator of  without making any assumptions about how 40 is related
to 201 (however, we need strict exogeneity to hold).
This is possible, because the logit functional form enables us to eliminate 40 from the
estimating equation
What we do is find the joint distribution of 0 (0 , , 0- ) conditional on 20 , 40 and
/0 -1? 01
It turns out in the logit case that this conditional distribution does not depend upon 20 , so
that it is also the distribution of 0 given 20 and /0

To see this, assume T = 2, and consider the following conditional probabilities:


(0 = 0, 0 = 1|20 , 20 , 40 , 0 + 0 = 1)

The key thing to note here is that we condition on 0 + 0 = 1, i.e. that 01 changes
between the two time periods. For the logit functional form, we have:
(0 + 0 = 1|20 , 20 , 40 )
1
exp(20  + 40 )
1
exp(20  + 40 )
=
+
1 + exp(20  + 40 ) 1 + exp(20  + 40 ) 1 + exp(20  + 40 ) 1 + exp(20  + 40 )
Or simply:
( zQy )z(ys zQy )
yr zQy )z(ys zQy )

yr
(0 + 0 = 1|20 , 20 , 40 ) =
z(

Furthermore:
(0 = 0, 0 = 1|20 , 20 , 40 ) =

exp(20  + 40 )
1
1 + exp(20  + 40 ) 1 + exp(20  + 40 )

Hence conditional on 0 + 0 = 1:

exp(20  + 40 )
(0 = 0, 0 = 1|20 , 20 , 40 , 0 + 0 = 1) =
exp(20  + 40 ) + exp(20  + 40 )
(0 = 0, 0 = 1|20 , 20 , 40 , 0 + 0 = 1) =

exp(20  )
1 + exp(20  )

The key result is that the 40 have been eliminated. It follows that:

1
(0 = 1, 0 = 0|20 , 20 , 40 , 0 + 0 = 1) =
1 + exp(20  )

Notes:
o These probabilities condition on 0 + 0 = 1
o These probabilities are independent of 40

Hence, by maximizing the following conditional log likelihood function:


"

exp(20  )
1
log } = 0 ln
+  0 ln

1 + exp(20  )
1 + exp(20  )
?

We obtain consistent estimates of , regardless of whether 40 and 201 are correlated


The trick is thus to condition the likelihood on the outcome series (0 , 0 ), and in the more
general case (0 , 0 , , 0- ). For example, if  = 3, we can condition on 1 01 = 1, with
possible sequences {1,0,0|, {0,1,0| and {0,0,1|, or on 1 01 = 2 with possible sequences
{1,1,0|, {1,0,1| and {0,1,1|.
Stata does this for us, of course. This estimator is requested in Stata by using xtlogit with the
fe option.
Note that the logit functional form is crucial for it to be possible to eliminate the 40 in this
fashion. It wont be possible with probit. So this approach is not really very general.

Another awkward issue concerns the interpretation of the results. The estimation procedure just outlined implies that we do not obtain estimates of the α_i, which means we can't compute marginal effects:
o We can't estimate the partial effects on the response probabilities unless we plug in a value for α
o Because the distribution of α_i is unrestricted (in particular, E(α_i) is not necessarily zero), it is hard to know what to plug in for α
o We can't estimate APEs either, since doing so would require finding E[Λ(x_t β + α_i)], a task that requires specifying a distribution for α_i
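A minimal sketch of the estimator (illustrative dataset; note that xtlogit, fe automatically drops groups whose outcome never changes, since they contribute nothing to the conditional likelihood):

    webuse union, clear
    xtset idcode year
    xtlogit union age grade not_smsa south, fe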

Modelling the Random Effect as a Function of the x-variables

The previous two methods are useful, but:
o The traditional random effects probit/logit model requires strict exogeneity and zero correlation between the explanatory variables and α_i
o The fixed effects logit relaxes the latter assumption, but we can't obtain consistent estimates of the α_i and hence we can't compute the conventional marginal effects in general

We will now discuss an approach which, in some ways, can be thought of as representing a middle way. Start from the latent variable model:

    y*_it = x_it β + α_i + u_it
    y_it = 1[y*_it > 0]

Consider writing the α_i as an explicit function of the x-variables (i.e. allowing for correlation between α_i and x_i), for example as follows:

    α_i = ψ + x̄_i ξ + a_i    (1)

or

    α_i = ψ + x_i λ + a_i    (2)

where x̄_i is the average of x_it over time for individual i (hence time-invariant); x_i contains x_it for all t; and a_i is assumed uncorrelated with x_i. Equation (1) is easier to implement, and so we will focus on it.

Assume that Var(a_i) = σ²_a is constant (i.e. there is homoscedasticity) and that a_i is normally distributed; the model that then results is known as Chamberlain's random effects probit model.
The latent variable formulation can be written as:

    y*_it = x_it β + ψ + x̄_i ξ + a_i + u_it

Equation (1) may be considered restrictive, in the sense that functional form assumptions are made, but it at least allows for non-zero correlation between α_i and the regressors x_it.
The probability that y_it = 1 can now be written as:

    P(y_it = 1|x_it, α_i) = P(y_it = 1|x_it, x̄_i, a_i) = Φ(x_it β + ψ + x̄_i ξ + a_i)

We now see that, after having added x̄_i to the RHS, we arrive back at the traditional random effects probit model:

    L_i(y_i1, …, y_iT | x_i1, …, x_iT; β, ψ, ξ, σ²_a)
      = ∫ { Π_{t=1}^{T} Φ(x_it β + ψ + x̄_i ξ + a)^{y_it} [1 − Φ(x_it β + ψ + x̄_i ξ + a)]^{(1−y_it)} } (1/σ_a) φ(a/σ_a) da

This can be estimated using standard RE probit software

Effectively, we are adding x̄_i as control variables to allow for some correlation between the random effect α_i and the regressors.
If x_it contains time-invariant variables, then clearly they will be collinear with their mean values for individual i, thus preventing separate identification of β-coefficients on time-invariant variables.
Notice also that this model nests the simpler and more restrictive traditional random effects probit: under the (easily testable) null hypothesis ξ = 0, the model reduces to the traditional model discussed earlier.

We can easily compute marginal effects at the mean of α_i, since:

    E(α_i) = ψ + E(x̄_i)ξ

APEs can be evaluated using:

    N^(−1) Σ_{i=1}^{N} φ(x_t β_a + ψ_a + x̄_i ξ_a)

where the a-subscripts indicate that the coefficients have been scaled by (1 + σ²_a)^(−1/2)
For a discrete variable, the above expression can be evaluated at two different values of x and differenced
For a continuous variable x_k, the APE can be evaluated by using the average across i of β_ak φ(x_t β_a + ψ_a + x̄_i ξ_a) to get the approximate APE of a one-unit increase in x_k

Linear Fixed Effects Model

. xtreg lfp kids lhinc educ black age agesq per1 per2 per3 per4 per5, robust fe

Fixed-effects (within) regression               Number of obs      =     28315
Group variable: id                              Number of groups   =      5663

R-sq:  within  = 0.0031                         Obs per group: min =         5
       between = 0.0103                                        avg =       5.0
       overall = 0.0091                                        max =         5

                                                F(6,5662)          =      5.61
corr(u_i, Xb)  = -0.0073                        Prob > F           =    0.0000

                                 (Std. Err. adjusted for 5663 clusters in id)
------------------------------------------------------------------------------
             |               Robust
         lfp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |  -.0388976   .0091682    -4.24   0.000    -.0568708   -.0209244
       lhinc |  -.0089439   .0045947    -1.95   0.052    -.0179513    .0000635
        educ |  (omitted)
       black |  (omitted)
         age |  (omitted)
       agesq |  (omitted)
        per1 |   .0176797   .0048541     3.64   0.000     .0081637    .0271957
        per2 |   .0133998   .0045121     2.97   0.003     .0045544    .0222453
        per3 |   .0067844   .0039786     1.71   0.088    -.0010152     .014584
        per4 |   .0053795   .0032723     1.64   0.100    -.0010354    .0117944
        per5 |  (omitted)
       _cons |   .7913419   .0373148    21.21   0.000     .7181905    .8644933
-------------+----------------------------------------------------------------
     sigma_u |  .42247488
     sigma_e |  .21363541
         rho |  .79636335   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Pooled Probit MLE

. probit lfp kids lhinc educ black age agesq per1 per2 per3 per4 per5
note: per5 omitted because of collinearity

Iteration 0:   log likelihood = -17709.021
Iteration 1:   log likelihood = -16561.609
Iteration 2:   log likelihood = -16556.671
Iteration 3:   log likelihood = -16556.671

Probit regression                               Number of obs      =     28315
                                                LR chi2(10)        =   2304.70
                                                Prob > chi2        =    0.0000
Log likelihood = -16556.671                     Pseudo R2          =    0.0651

------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |  -.1989144   .0074815   -26.59   0.000    -.2135779   -.1842509
       lhinc |  -.2110738   .0130489   -16.18   0.000    -.2366493   -.1854984
        educ |   .0796863    .003201    24.89   0.000     .0734125    .0859601
       black |   .2209396   .0334141     6.61   0.000     .1554492    .2864301
         age |   .1449159   .0061536    23.55   0.000      .132855    .1569767
       agesq |  -.0019912   .0000756   -26.34   0.000    -.0021393     -.001843
        per1 |   .0577767    .025249     2.29   0.022     .0082896    .1072637
        per2 |   .0453522   .0252187     1.80   0.072    -.0040756    .0947799
        per3 |   .0252589   .0251707     1.00   0.316    -.0240749    .0745926
        per4 |   .0116797    .025157     0.46   0.642    -.0376272    .0609865
        per5 |  (omitted)
       _cons |  -1.122226   .1369621    -8.19   0.000    -1.390667     -.853785
------------------------------------------------------------------------------

. margins , dydx(kids lhinc)

Average marginal effects                        Number of obs      =     28315
Model VCE    : OIM

Expression   : Pr(lfp), predict()
dy/dx w.r.t. : kids lhinc
------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |  -.0660184    .002395   -27.56   0.000    -.0707126   -.0613242
       lhinc |   -.070054   .0042791   -16.37   0.000    -.0784409   -.0616671
------------------------------------------------------------------------------

Chamberlain's RE Probit: Pooled MLE

. sort id period
. by id: egen kids_bar = mean(kids)
. by id: egen lhinc_bar = mean(lhinc)
. probit lfp kids lhinc kids_bar lhinc_bar educ black age agesq per1 per2 per3 per4 per5
note: per5 omitted because of collinearity

Iteration 0:   log likelihood = -17709.021
Iteration 1:   log likelihood = -16521.245
Iteration 2:   log likelihood = -16516.437
Iteration 3:   log likelihood = -16516.436

Probit regression                               Number of obs      =     28315
                                                LR chi2(12)        =   2385.17
                                                Prob > chi2        =    0.0000
Log likelihood = -16516.436                     Pseudo R2          =    0.0673

------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |  -.1173749   .0372874    -3.15   0.002    -.1904569     -.044293
       lhinc |  -.0288098   .0248077    -1.16   0.246     -.077432    .0198125
    kids_bar |  -.0856913   .0380322    -2.25   0.024     -.160233   -.0111495
   lhinc_bar |  -.2501781   .0290625    -8.61   0.000    -.3071396   -.1932167
        educ |   .0841338   .0032539    25.86   0.000     .0777562    .0905114
       black |   .2030668   .0335069     6.06   0.000     .1373945     .268739
         age |   .1516424   .0062081    24.43   0.000     .1394748    .1638101
       agesq |  -.0020672   .0000762   -27.13   0.000    -.0022166   -.0019179
        per1 |   .0552425   .0252773     2.19   0.029     .0056999    .1047851
        per2 |   .0416724   .0252544     1.65   0.099    -.0078254    .0911701
        per3 |   .0220434   .0252037     0.87   0.382     -.027355    .0714417
        per4 |   .0162108   .0251878     0.64   0.520    -.0331564    .0655779
        per5 |  (omitted)
       _cons |  -.7812987   .1426149    -5.48   0.000    -1.060819   -.5017785
------------------------------------------------------------------------------

. margins , dydx(kids lhinc)

Average marginal effects                        Number of obs      =     28315
Model VCE    : OIM

Expression   : Pr(lfp), predict()
dy/dx w.r.t. : kids lhinc
------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |   -.038852   .0123363    -3.15   0.002    -.0630307   -.0146734
       lhinc |  -.0095363    .008211    -1.16   0.245    -.0256296    .0065571
------------------------------------------------------------------------------

. xtprobit lfp kids lhinc kids_bar lhinc_bar educ black age agesq per1 per2 per3 per4 per5

Random-effects probit regression                Number of obs      =     28315
Group variable: id                              Number of groups   =      5663

Random effects u_i ~ Gaussian                   Obs per group: min =         5
                                                               avg =       5.0
                                                               max =         5

                                                Wald chi2(12)      =    623.40
Log likelihood = -8609.9002                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |  -.3970102   .0701298    -5.66   0.000     -.534462   -.2595584
       lhinc |    -.10034   .0469979    -2.13   0.033    -.1924541   -.0082258
    kids_bar |  -.4085664   .0898875    -4.55   0.000    -.5847427   -.2323901
   lhinc_bar |  -.8941069   .1199703    -7.45   0.000    -1.129244   -.6589695
        educ |   .3189079    .024327    13.11   0.000     .2712279     .366588
       black |   .6388783   .1903525     3.36   0.001     .2657943    1.011962
         age |   .7282056   .0445623    16.34   0.000      .640865    .8155461
       agesq |  -.0098358   .0005747   -17.11   0.000    -.0109623   -.0087094
        per1 |    .200357    .049539     4.04   0.000     .1032624    .2974515
        per2 |   .1551917   .0499822     3.10   0.002     .0572284     .253155
        per3 |   .0756514   .0499737     1.51   0.130    -.0222952     .173598
        per4 |   .0646736    .049747     1.30   0.194    -.0328288    .1621759
        per5 |  (omitted)
       _cons |  -5.559732   1.000528    -5.56   0.000     -7.52073   -3.598733
-------------+----------------------------------------------------------------
    /lnsig2u |   2.947234   .0435842                      2.861811    3.032657
-------------+----------------------------------------------------------------
     sigma_u |   4.364995   .0951224                      4.182484     4.55547
         rho |   .9501326    .002065                       .945926    .9540279
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 1.6e+04  Prob >= chibar2 = 0.000

From the output, σ̂_α = 4.364995, so σ̂²_α = 19.05318

So, (1 + σ̂²_α)^(−1/2) = 0.22331

gen temp = normalden(_b[kids]*0.22331*kids + _b[lhinc]*0.22331*lhinc + _b[kids_bar]*0.22331*kids_bar + _b[lhinc_bar]*0.22331*lhinc_bar + _b[educ]*0.22331*educ + _b[black]*0.22331*black + _b[age]*0.22331*age + _b[agesq]*0.22331*agesq + _b[per1]*0.22331*per1 + _b[per2]*0.22331*per2 + _b[per3]*0.22331*per3 + _b[per4]*0.22331*per4 + _b[_cons]*0.22331)
egen temp1 = mean(temp)

temp1 = 0.3250374 (this is the scale factor)

So, the APE for kids is (−0.3970102 × 0.22331) × 0.3250374 = −0.0288

and the APE for lhinc is (−0.10034 × 0.22331) × 0.3250374 = −0.00728
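A quick check of the scaled-APE arithmetic above:

    display -0.3970102*0.22331*0.3250374    // ≈ -.0288 (kids)
    display -0.10034*0.22331*0.3250374      // ≈ -.0073 (lhinc)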

Estimating the LPM by FE gives estimated coefficients of roughly −0.039 and −0.009 on the kids and lhinc variables respectively. That is, each additional child reduces the labour-force participation probability by about 0.039 (3.9 percentage points), while a 10% increase in the husband's income lowers the probability by about 0.0009.
The APEs become much larger when we use the probit and assume that α_i is independent of x_i

Using Chamberlain's model gives results similar to the LPM

There is not too much difference between the pooled and the full random-effects versions of the Chamberlain model

. xtlogit lfp kids lhinc educ black age agesq per1 per2 per3 per4 per5, fe

Conditional fixed-effects logistic regression   Number of obs      =      5275
Group variable: id                              Number of groups   =      1055

                                                Obs per group: min =         5
                                                               avg =       5.0
                                                               max =         5

                                                LR chi2(6)         =     57.27
Log likelihood = -2003.4184                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |  -.6438386   .1247828    -5.16   0.000    -.8884084   -.3992688
       lhinc |  -.1842911   .0826019    -2.23   0.026    -.3461878   -.0223943
        educ |  (omitted)
       black |  (omitted)
         age |  (omitted)
       agesq |  (omitted)
        per1 |   .3563745   .0888354     4.01   0.000     .1822604    .5304886
        per2 |   .2635706   .0886977     2.97   0.003     .0897262     .437415
        per3 |   .1315756   .0880899     1.49   0.135    -.0410774    .3042286
        per4 |   .1084422   .0879067     1.23   0.217    -.0638517    .2807361
        per5 |  (omitted)
------------------------------------------------------------------------------
The coefficient estimates are difficult to interpret directly
The relative size of the two coefficients is 0.644/0.184 = 3.5, which is not too different from the corresponding ratio in, for example, the pooled MLE Chamberlain model

Dynamic Unobserved Effects Models

Dynamic models that also contain unobserved effects are important in testing theories and evaluating policies
We have seen that using lagged dependent variables as explanatory variables complicates the estimation of standard linear panel data models.
Conceptually, similar problems arise for nonlinear models, but since we don't rely on differencing, the steps involved in dealing with the problems are a little different.

Suppose we date our observations at t = 0, …, T, so that y_i0 is the first observation on y. For t = 1, …, T we are interested in the dynamic unobserved effects model:

    P(y_it = 1|y_{i,t-1}, …, y_i0, z_i, α_i) = G(z_it δ + ρ y_{i,t-1} + α_i)    (1)

where z_it is a vector of contemporaneous explanatory variables, z_i = (z_i1, …, z_iT), and G is the probit or logit function. In the case of the probit model we have:

    P(y_it = 1|y_{i,t-1}, …, y_i0, z_i, α_i) = Φ(z_it δ + ρ y_{i,t-1} + α_i)

The z_it are assumed to be strictly exogenous (conditional on α_i)
The probability of success at time t is allowed to depend on the outcome in t − 1 as well as on unobserved heterogeneity, α_i.
The unobserved effect α_i is correlated with y_{i,t-1} by definition
The coefficient ρ is often referred to as the state dependence parameter. If ρ ≠ 0, the outcome y_{i,t-1} influences the outcome in period t, y_it
If Var(α_i) > 0, so that there is unobserved heterogeneity, we cannot use a pooled probit to test H₀: ρ = 0. The reason is that under Var(α_i) > 0 there will be serial correlation in the y_it.

We can write the density function as:

    f(y_1, …, y_T | y_0, z, α; θ) = Π_{t=1}^{T} f(y_t | y_{t-1}, …, y_0, z_t, α; θ)

    f(y_1, …, y_T | y_0, z, α; θ)
      = Π_{t=1}^{T} G(z_t δ + ρ y_{t-1} + α)^{y_t} [1 − G(z_t δ + ρ y_{t-1} + α)]^{(1−y_t)}    (2)

Due to the presence of the unobserved effects it is not possible to construct a log-likelihood function that can be used to estimate θ consistently.
Treating the α_i as parameters to be estimated (i.e. including N individual dummies) does not result in consistent estimators of ρ and δ
We need to integrate α_i out of the distribution

After integrating out the α_i, the likelihood function in the dynamic probit model is:

    L_i(y_i1, …, y_iT | z_i; θ, σ²_α)
      = ∫ { Π_{t=1}^{T} Φ(z_it δ + ρ y_{i,t-1} + α)^{y_it} [1 − Φ(z_it δ + ρ y_{i,t-1} + α)]^{(1−y_it)} }
          f(y_i0 | z_i, α) (1/σ_α) φ(α/σ_α) dα

There remains an endogeneity problem: in f(y_i0 | z_i, α) the initial observation y_i0 is correlated with the unobserved random effect.
o This is called the initial conditions problem.
As T gets large the problem posed by the initial conditions becomes less serious (since there will be a smaller weight on the problematic term), but with T small it can cause substantial bias.

How to deal with this endogeneity?

o i.e. how do we treat the initial observations, y_i0?
Heckman (1981) suggests approximating the conditional density of y_i0 given (z_i, α_i) and then specifying a density for α_i given z_i
o e.g. we might specify that y_i0 follows a probit model with success probability Φ(η + z_i π + γ α_i), and specify the density of α_i given z_i as normal
Once these two densities are specified they can be multiplied by (2), and α can be integrated out to approximate the density of (y_i0, y_i1, …, y_iT) given z_i

An alternative approach, suggested by Wooldridge (2005), is to obtain the joint distribution of (y_i1, …, y_iT) conditional on (y_i0, z_i). This approach allows one to remain agnostic about the distribution of y_i0 given (z_i, α_i)
If we can find the density of (y_i1, …, y_iT) given (y_i0, z_i) in terms of θ and other parameters, then we can use standard conditional MLE methods
To obtain f(y_1, …, y_T | y_i0, z_i) we need to propose a density for α_i given (y_i0, z_i)
This approach is similar to Chamberlain's in the static probit case with unobserved effects (except that now we also condition on y_i0)
Given a density h(α | y_0, z, γ), which depends on the parameters γ, we have:

    f(y_1, …, y_T | y_0, z; θ, γ) = ∫ f(y_1, …, y_T | y_0, z, α; θ) h(α | y_0, z, γ) dα

The integral can be replaced by a weighted average if the distribution of α is discrete

o When G = Φ in (1), a convenient choice for h(α | y_0, z, γ) is Normal(ψ + ξ₀ y_0 + z ξ, σ²_a), which follows from writing α_i = ψ + ξ₀ y_i0 + z_i ξ + a_i, where a_i ~ Normal(0, σ²_a) and is independent of (y_i0, z_i)

Then we can write:

    y_it = 1[ ψ + z_it δ + ρ y_{i,t-1} + ξ₀ y_i0 + z_i ξ + a_i + u_it > 0 ]

so that y_it given (y_{i,t-1}, …, y_i0, z_i, a_i) follows a probit model, and a_i given (y_i0, z_i) is distributed as Normal(0, σ²_a)
This gives a density in exactly the same form as that for the traditional RE probit conditional MLE, with a_i and σ²_a replacing α_i and σ²_α
This means that we can use standard RE probit commands to estimate these dynamic models
We simply expand the list of explanatory variables to include y_i0 and z_i in each time period
It is then simple to test ρ = 0, meaning that there is no state dependence once we control for an unobserved effect

In estimating the dynamic model it is important to remember that it is not possible to obtain consistent estimates of the parameters using a pooled probit of y_it on 1, z_it, y_{i,t-1}, y_i0, z_i.

o While P(y_it = 1|y_{i,t-1}, …, y_i0, z_i, a_i) = Φ(ψ + z_it δ + ρ y_{i,t-1} + ξ₀ y_i0 + z_i ξ + a_i), it is not true that P(y_it = 1|y_{i,t-1}, …, y_i0, z_i) = Φ(ψ_a + z_it δ_a + ρ_a y_{i,t-1} + ξ_a0 y_i0 + z_i ξ_a) unless a_i is identically zero.
o Correlation between y_{i,t-1} and a_i means that P(y_it = 1|y_{i,t-1}, …, y_i0, z_i) does not follow a probit model with an index that depends on the scaled coefficients of interest

We can estimate Average Partial Effects, but we must now average out the initial condition along with the leads and lags of all strictly exogenous variables.
Let z_t and y_{t-1} be given values of the explanatory variables
Then the Average Structural Function:

    ASF(z_t, y_{t-1}) = E_{α_i}[ Φ(z_t δ + ρ y_{t-1} + α_i) ] = E[ Φ(ψ_a + z_t δ_a + ρ_a y_{t-1} + ξ_a0 y_i0 + z_i ξ_a) ]

can be consistently estimated as:

    ASF̂(z_t, y_{t-1}) = N^(−1) Σ_{i=1}^{N} Φ(ψ̂_a + z_t δ̂_a + ρ̂_a y_{t-1} + ξ̂_a0 y_i0 + z_i ξ̂_a)

where the a-subscript denotes that the original coefficients have been multiplied by (1 + σ̂²_a)^(−1/2), and ψ̂, δ̂, ρ̂, ξ̂₀, ξ̂ and σ̂²_a are the estimates reported by the statistical package
We can then take derivatives of this expression with respect to continuous elements of z_t, or take differences with respect to discrete elements
A particularly interesting case is to set y_{t-1} = 1 and then y_{t-1} = 0 and obtain the change in the probability that y_it = 1 when y_{t-1} goes from zero to one
To obtain a single APE we can also average across all time periods

Example: Dynamic Women's Labour Force Participation

To estimate a dynamic women's labour force participation equation using the method described above, we estimate the following model:

    P(lfp_it = 1 | kids_it, lhinc_it, lfp_{i,t-1}, α_i)

We further include the time-constant variables black, educ, age, age² and a full set of time dummies
We include among the regressors: lfp_i0, kids_i1 through kids_i4, and lhinc_i1 through lhinc_i4

. by id: gen lfp_0 = lfp[_n] if _n == 1
(22652 missing values generated)
. by id: replace lfp_0 = sum(lfp_0)
(22652 real changes made)
. by id: gen lhinc_1 = lhinc[_n] if _n == 2
(22652 missing values generated)
. by id: gen lhinc_2 = lhinc[_n] if _n == 3
(22652 missing values generated)
. by id: gen lhinc_3 = lhinc[_n] if _n == 4
(22652 missing values generated)
. by id: gen lhinc_4 = lhinc[_n] if _n == 5
(22652 missing values generated)
. by id: replace lhinc_1 = sum(lhinc_1)
(22652 real changes made)
. by id: replace lhinc_2 = sum(lhinc_2)
(22652 real changes made)
. by id: replace lhinc_3 = sum(lhinc_3)
(22652 real changes made)
. by id: replace lhinc_4 = sum(lhinc_4)
(22652 real changes made)
. by id: gen kids_1 = kids[_n] if _n == 2
(22652 missing values generated)
. by id: gen kids_2 = kids[_n] if _n == 3
(22652 missing values generated)
. by id: gen kids_3 = kids[_n] if _n == 4
(22652 missing values generated)
. by id: gen kids_4 = kids[_n] if _n == 5
(22652 missing values generated)
. by id: replace kids_1 = sum(kids_1)
(22652 real changes made)
. by id: replace kids_2 = sum(kids_2)
(22652 real changes made)
. by id: replace kids_3 = sum(kids_3)
(22652 real changes made)
. by id: replace kids_4 = sum(kids_4)
(22652 real changes made)

. xtprobit lfp l.lfp lfp_0 kids kids_1 kids_2 kids_3 kids_4 lhinc lhinc_1 lhinc_2 lhinc_3 lhinc_4 educ black age agesq per2 per3 per4 per5

Random-effects probit regression                Number of obs      =     22652
Group variable: id                              Number of groups   =      5663

Random effects u_i ~ Gaussian                   Obs per group: min =         4
                                                               avg =       4.0
                                                               max =         4

                                                Wald chi2(19)      =   4093.48
Log likelihood = -5039.9867                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lfp |
         L1. |   1.536826   .0665669    23.09   0.000     1.406358    1.667295
             |
       lfp_0 |   2.545303   .1556541    16.35   0.000     2.240227     2.85038
        kids |  -.3591127   .0646669    -5.55   0.000    -.4858574   -.2323679
      kids_1 |   .2595521   .0665754     3.90   0.000     .1290668    .3900374
      kids_2 |   .0206725   .0355721     0.58   0.561    -.0490476    .0903925
      kids_3 |   .0024968   .0358429     0.07   0.944    -.0677541    .0727477
      kids_4 |    .047252   .0366343     1.29   0.197      -.02455     .119054
       lhinc |  -.0745293   .0485506    -1.54   0.125    -.1696867    .0206281
     lhinc_1 |  -.0747431   .0531001    -1.41   0.159    -.1788173    .0293311
     lhinc_2 |  -.0080621   .0501491    -0.16   0.872    -.1063524    .0902283
     lhinc_3 |   .0088362   .0511642     0.17   0.863    -.0914437    .1091162
     lhinc_4 |  -.1189348   .0610491    -1.95   0.051    -.2385888    .0007193
        educ |   .0459694   .0098917     4.65   0.000      .026582    .0653569
       black |   .1281378   .0984119     1.30   0.193    -.0647459    .3210216
         age |   .1383024    .019357     7.14   0.000     .1003634    .1762414
       agesq |  -.0017838   .0002402    -7.43   0.000    -.0022545   -.0013131
        per2 |  -.7521025   .5635128    -1.33   0.182    -1.856567    .3523623
        per3 |  -.7700739   .5206839    -1.48   0.139    -1.790596    .2504478
        per4 |  -.8158966   .4836915    -1.69   0.092    -1.763915    .1321213
        per5 |  (omitted)
       _cons |  -2.818611   .5587894    -5.04   0.000    -3.913818   -1.723403
-------------+----------------------------------------------------------------
    /lnsig2u |   .1151956   .1209124                     -.1217884    .3521796
-------------+----------------------------------------------------------------
     sigma_u |   1.059289   .0640406                      .9409228    1.192545
         rho |   .5287671    .030128                      .4695905     .587146
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 164.94  Prob >= chibar2 = 0.000
From the above, $\hat\sigma_a = 1.059289$, so $\hat\sigma_a^2 = 1.059289^2 = 1.122093$
and $(1 + \hat\sigma_a^2)^{-1/2} = 0.686464$.
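Rather than hard-coding this number, the scale factor can also be recovered from the stored estimation results (a small sketch; it assumes the xtprobit fit above is still the active estimation, so that e(sigma_u) is available):

* (1 + sigma_a^2)^(-1/2), built from the sigma_u stored by xtprobit
scalar scale = (1 + e(sigma_u)^2)^(-0.5)
display scale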
* scale factor (1 + sigma_a^2)^(-1/2) computed above
scalar scale = 0.686464

* linear index with lfp(t-1) = 0, i.e. excluding the lag term
gen xb0 = _b[lfp_0]*lfp_0 + _b[kids]*kids + _b[kids_1]*kids_1 + _b[kids_2]*kids_2 ///
    + _b[kids_3]*kids_3 + _b[kids_4]*kids_4 + _b[lhinc]*lhinc + _b[lhinc_1]*lhinc_1 ///
    + _b[lhinc_2]*lhinc_2 + _b[lhinc_3]*lhinc_3 + _b[lhinc_4]*lhinc_4 + _b[educ]*educ ///
    + _b[black]*black + _b[age]*age + _b[agesq]*agesq + _b[per2]*per2 + _b[per3]*per3 ///
    + _b[per4]*per4 + _b[_cons]

* change in P(lfp=1) when lfp(t-1) goes from 0 to 1, with all
* coefficients (including the constant) scaled by the factor above
gen temp = normprob(scale*(xb0 + _b[L1.lfp])) - normprob(scale*xb0)
. summarize temp

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------------
        temp |     28315    .0830207    .2166758   -.3857821   .3968095

Averaged across all women and all time periods, the probability of being in the labour force
at time $t$ is about 0.08 higher if the woman was in the labour force at time $t-1$

It is instructive to compare this APE with the estimate from a dynamic probit model that ignores $c_i$:
. xtprobit lfp l.lfp kids lhinc educ black age agesq per1 per2 per3 per4 per5, re

Random-effects probit regression                Number of obs      =     22652
Group variable: id                              Number of groups   =      5663

Random effects u_i ~ Gaussian                   Obs per group: min =         4
                                                               avg =       4.0
                                                               max =         4

Log likelihood  =  -5332.529                    Wald chi2(10)      =  12071.51
                                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lfp |
         L1. |   2.875683   .0269811   106.58   0.000     2.822801    2.928565
             |
        kids |  -.0607933    .012217    -4.98   0.000    -.0847381   -.0368484
       lhinc |  -.1143188   .0211668    -5.40   0.000    -.1558051   -.0728325
        educ |   .0291874   .0052362     5.57   0.000     .0189246    .0394501
       black |    .079251   .0536696     1.48   0.140    -.0259396    .1844415
         age |    .084404   .0099983     8.44   0.000     .0648076    .1040004
       agesq |  -.0010991   .0001236    -8.90   0.000    -.0013413    -.000857
        per1 |  (omitted)
        per2 |   .0304145    .037152     0.82   0.413     -.042402     .103231
        per3 |  -.0036646   .0369207    -0.10   0.921    -.0760278    .0686986
        per4 |   .0326971   .0371438     0.88   0.379    -.0401035    .1054977
        per5 |  (omitted)
       _cons |  -2.201223   .2218053    -9.92   0.000    -2.635954   -1.766493
-------------+----------------------------------------------------------------
    /lnsig2u |  -15.70567   14.44481                     -44.01697    12.60564
-------------+----------------------------------------------------------------
     sigma_u |   .0003886    .002807                      2.77e-10     546.109
         rho |   1.51e-07   2.18e-06                      7.65e-20    .9999966
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) =     0.00 Prob >= chibar2 = 1.000

* with sigma_u essentially zero there is nothing to scale here
gen temp = normprob(_b[kids]*kids + _b[lhinc]*lhinc + _b[educ]*educ + _b[black]*black ///
    + _b[age]*age + _b[agesq]*agesq + _b[per2]*per2 + _b[per3]*per3 + _b[per4]*per4 ///
    + _b[_cons] + _b[L1.lfp]) ///
    - normprob(_b[kids]*kids + _b[lhinc]*lhinc + _b[educ]*educ + _b[black]*black ///
    + _b[age]*age + _b[agesq]*agesq + _b[per2]*per2 + _b[per3]*per3 + _b[per4]*per4 ///
    + _b[_cons])

. summarize temp

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------------
        temp |     28315    .8375263    .0121556    .6019519    .849521

The APE for state dependence is much higher in this case than when heterogeneity is
controlled for.
o Averaged across all women and all time periods, the probability of being in the labour
force at time $t$ is about 0.84 higher if the woman was in the labour force at time $t-1$
Therefore, much of the persistence in labour force participation of married women is
accounted for by unobserved heterogeneity.
There is some state dependence, but its value is much smaller than a simple dynamic probit
indicates.
