Lecture 15
Panel Data Models
RS-15
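Panel data layout (cross section across columns, time series down rows):
$$
\begin{bmatrix}
y_{11} & y_{21} & \cdots & y_{i1} & \cdots & y_{N1} \\
y_{12} & y_{22} & \cdots & y_{i2} & \cdots & y_{N2} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
y_{1t} & y_{2t} & \cdots & y_{it} & \cdots & y_{Nt} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
y_{1T} & y_{2T} & \cdots & y_{iT} & \cdots & y_{NT}
\end{bmatrix}
$$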
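A standard panel data set model stacks the $y_i$'s and the $x_i$'s:
$$y = X\beta + c + \varepsilon$$
$X$ is a $\sum_i T_i \times k$ matrix
$\beta$ is a $k \times 1$ matrix
$c$ is a $\sum_i T_i \times 1$ matrix, associated with unobservable variables.
$y$ and $\varepsilon$ are $\sum_i T_i \times 1$ matrices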
Note that if the $X_j$'s are so comprehensive that they capture all the
relevant characteristics of the individual, there will be no relevant
unobserved characteristics.
=> In that case, c can be dropped. Pooled OLS may be used, treating
all the observations for all of the time periods as a single sample.
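As a rough illustration of pooling, the sketch below stacks a simulated panel and runs a single OLS regression on it. The dimensions, coefficient values, and noise level are hypothetical choices made for the example, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, k = 50, 4, 2                      # hypothetical panel dimensions
beta_true = np.array([1.0, -0.5])       # hypothetical slopes

# Stack all N*T observations into one sample (no unobserved effect c).
X = rng.normal(size=(N * T, k))
y = 2.0 + X @ beta_true + rng.normal(scale=0.1, size=N * T)

# Pooled OLS: regress y on X and a constant, using every (i, t) pair.
X_star = np.column_stack([X, np.ones(N * T)])
b_pooled, *_ = np.linalg.lstsq(X_star, y, rcond=None)
print(b_pooled)  # slopes first, intercept last
```

Because the simulated data contain no individual effect, the pooled estimates should land close to the assumed coefficients.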
Long history: Rao (1965) and Chow (1975) worked on these models.
Compact Notation
$$y_i = X_i\beta + c_i + \varepsilon_i$$
$X_i$ is a $T_i \times k$ matrix
$\beta$ is a $k \times 1$ matrix
$c_i$ is a $T_i \times 1$ matrix
$y_i$ and $\varepsilon_i$ are $T_i \times 1$ matrices
Rank of matrices. X must have full column rank. (Xi may not, if Ti <
K.)
Strict exogeneity and dynamics. If $x_{it}$ contains $y_{i,t-1}$, then $x_{it}$ cannot be
strictly exogenous: $x_{it}$ will be correlated with the unobservables in
period t-1, so the OLS estimates are inconsistent. (To be revisited later.)
Simple idea: Aggregate over the clusters. The key is how to cluster.
where $\omega_{ig,jg} = E[\varepsilon_{ig}\varepsilon_{jg} | X_g]$ is the error covariance for the $i_g$-th and $j_g$-th
observations. We use $S_0$ to calculate clustered White SE.
These new robust SE are generally called panel-corrected SEs (PCSE),
though, technically speaking, they are HAC and cluster-robust SE.
Pooled Model
General DGP: $y_{it} = x_{it}'\beta + c_i + \varepsilon_{it}$, and (A2)-(A4) apply.
Pooled Model
We can re-write the pooled equation model as:
$y = X^*\beta^* + \varepsilon$, with $X^* = [X \;\; \iota]$, a $\sum_i T_i \times (k+1)$ matrix,
and $\beta^* = [\beta' \;\; \alpha]'$, a $(k+1) \times 1$ matrix.
$$y_{it} - \bar{y}_i = \sum_{j=2}^{k} \beta_j (x_{jit} - \bar{x}_{ji}) + \varepsilon_{it} - \bar{\varepsilon}_i$$
$$\bar{y}_i = \beta_1 + \sum_{j=2}^{k} \beta_j \bar{x}_{ji} + c_i + \bar{\varepsilon}_i$$
Interpretation:
- Within group variation: Measures variation of individuals over time.
- Between group variation: Measures variation of the means across
individuals.
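The two sources of variation can be made concrete with a small numerical check. The sketch below, on a hypothetical balanced panel of one simulated variable, verifies that the total sum of squares splits exactly into a within-group part and a between-group part.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 30, 6                                   # hypothetical balanced panel
# Rows are individuals, columns are periods; the first term is an individual shift.
x = rng.normal(size=(N, 1)) + rng.normal(size=(N, T))

xbar_i = x.mean(axis=1, keepdims=True)         # individual means over time
xbar = x.mean()                                # grand mean

within = ((x - xbar_i) ** 2).sum()             # variation of individuals over time
between = (T * (xbar_i - xbar) ** 2).sum()     # variation of the means across individuals
total = ((x - xbar) ** 2).sum()
print(within, between, total)
```

A variable with almost no within variation (for example, one that is nearly constant over time for each individual) leaves little information for estimators that rely on deviations from individual means.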
Driscoll and Kraay (1998): Average the $x_t e_t$ over the clusters. Then,
the KxK sandwich matrix is defined as
$$\widehat{X'\Omega X} = \sum_{t=j+1}^{T} h_t(b)\, h_{t-j}(b)' \quad \text{with} \quad h_t(b) = \sum_{i=1}^{N(t)} h_{it}(b)$$
Practical rules
- If aggregate variables (say, by industry, or zip code) are used in the
model, clustering should be done at that level.
- When the data correlates in more than one way, we have two cases:
- If nested (say, city and state), cluster at highest level of aggregation
- If not nested (e.g., time and industry), use multi-level clustering.
For FGLS, use the pooled OLS residuals $e_i$ and $e_j$ to estimate the
covariance $\sigma_{ij}$. Note that
$$\hat{\Sigma} = \frac{1}{T}\sum_{t=1}^{T} e_t e_t' = \frac{1}{T} E'E$$
where E is a TxN matrix and $e_t = [e_{1t} \; e_{2t} \; \ldots \; e_{Nt}]'$ is an Nx1 vector. We need
to invert $\hat{\Sigma}$ (an NxN matrix).
Note: In general, rank(E) $\le$ T. Then, if T < N, rank($\hat{\Sigma}$) $\le$ T < N =>
singularity, and FGLS cannot be computed. This is a problem of the data,
not the model.
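The rank problem is easy to see numerically. In the sketch below, random numbers stand in for the pooled OLS residuals (an assumption made only to keep the example short), with hypothetical dimensions T = 5 and N = 8.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N = 5, 8                       # hypothetical short panel with T < N
E = rng.normal(size=(T, N))       # stand-in for the T x N matrix of pooled OLS residuals

Sigma_hat = E.T @ E / T           # N x N estimate of the cross-sectional covariance
# Same matrix built as the average of outer products e_t e_t'.
Sigma_sum = sum(np.outer(E[t], E[t]) for t in range(T)) / T

rank = np.linalg.matrix_rank(Sigma_hat)
print(Sigma_hat.shape, rank)      # rank cannot exceed T, so the N x N matrix is singular
```

With T < N the estimated covariance matrix cannot be inverted, whatever the residuals are.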
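Why? The error is no longer $\varepsilon_{it}$, but $u_{it}$. The Var[u] is given by:
$$
\mathrm{Var}\begin{bmatrix}
\varepsilon_{i,2} - \varepsilon_{i,1} \\
\varepsilon_{i,3} - \varepsilon_{i,2} \\
\vdots \\
\varepsilon_{i,T_i} - \varepsilon_{i,T_i-1}
\end{bmatrix}
=
\begin{bmatrix}
2\sigma_\varepsilon^2 & -\sigma_\varepsilon^2 & 0 & \cdots & 0 \\
-\sigma_\varepsilon^2 & 2\sigma_\varepsilon^2 & -\sigma_\varepsilon^2 & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & -\sigma_\varepsilon^2 & 2\sigma_\varepsilon^2 & -\sigma_\varepsilon^2 \\
0 & \cdots & 0 & -\sigma_\varepsilon^2 & 2\sigma_\varepsilon^2
\end{bmatrix}
\quad \text{(Toeplitz form)}
$$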
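$\hat{\beta}_1 = \Delta\bar{y}\,|\,\text{treatment} - \Delta\bar{y}\,|\,\text{control}$
= "difference in differences" estimator.
With a treatment dummy equal to 1 for treated units, $\hat{\beta}_0$ is the average change in $y_i$ for the control group, and $\hat{\beta}_0 + \hat{\beta}_1$ is the average change in $y_i$ for the "treated".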
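A minimal sketch of this estimator follows. It assumes a two-period setting in which the change in y is regressed on a constant and a treatment dummy D; the sample size, the treatment effect of 1.5, and the common trend of 0.3 are hypothetical values chosen for the simulation.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
D = rng.integers(0, 2, size=n)                  # 1 = treated, 0 = control (hypothetical)
effect = 1.5                                    # hypothetical treatment effect
# dy is the change in y between the "before" and "after" periods.
dy = 0.3 + effect * D + rng.normal(scale=0.5, size=n)

# Difference in differences: change for treated minus change for control.
did = dy[D == 1].mean() - dy[D == 0].mean()

# The same number comes out of OLS of dy on a constant and D.
Z = np.column_stack([np.ones(n), D])
b0, b1 = np.linalg.lstsq(Z, dy, rcond=None)[0]
print(did, b0, b1)
```

In this specification the slope reproduces the difference in mean changes exactly, and the intercept equals the mean change of the group with D = 0.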
Finding a good exogenous event may help in the study of the effect of
Xt on yt. There are a lot of natural, exogenous events: asteroids hitting
the Pacific Ocean, earthquakes in New Zealand, etc.
But few of them have an effect on $y_t$, the financial variables we are
studying (say, analysts' forecasting skills).
Now, we have two periods: Before and after the natural experiment.
H0: $\beta_1 = 0$.
Intuition: Suppose that within s,t groups the errors are perfectly
correlated. Then, we only have S*T independent observations!
If individuals do not change much (or at all) across time, a FEM may
not work very well. We need within-individuals variability in the
variables if we are to use individuals as their own controls.
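Stacking:
$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}
=
\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix}\beta
+
\begin{bmatrix}
d_1 & 0 & \cdots & 0 \\
0 & d_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & d_N
\end{bmatrix}\alpha
+ \varepsilon
= [X, D]\begin{bmatrix} \beta \\ \alpha \end{bmatrix} + \varepsilon
= Z\delta + \varepsilon
$$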
FEM: Estimation
The FEM is the CLM, but with many independent variables: k+N.
FEM: Estimation
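$$
M_D = \begin{bmatrix}
M_D^1 & 0 & \cdots & 0 \\
0 & M_D^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & M_D^N
\end{bmatrix}
\quad \text{(The dummy variables are orthogonal)}
$$
$$M_D^i = I_{T_i} - d_i (d_i' d_i)^{-1} d_i' = I_{T_i} - (1/T_i)\, d_i d_i'$$
$$X' M_D X = \sum_{i=1}^{N} X_i' M_D^i X_i, \qquad \left(X_i' M_D^i X_i\right)_{k,l} = \sum_{t=1}^{T_i} (x_{it,k} - \bar{x}_{i.,k})(x_{it,l} - \bar{x}_{i.,l})$$
$$X' M_D y = \sum_{i=1}^{N} X_i' M_D^i y_i, \qquad \left(X_i' M_D^i y_i\right)_{k} = \sum_{t=1}^{T_i} (x_{it,k} - \bar{x}_{i.,k})(y_{it} - \bar{y}_{i.})$$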
That is, we subtract the group mean from each individual observation.
Then, the individual effects disappear. Now, OLS can easily be used to
estimate the k parameters, using the demeaned data.
This method of estimating the FEM is known as within-groups
estimation.
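The equivalence between demeaning and including the full set of dummies can be checked numerically. The sketch below simulates a hypothetical balanced panel in which the regressors are correlated with the individual effect, then compares the within-groups slopes with the slopes from the dummy-variable regression. All dimensions and parameter values are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, k = 20, 5, 2                               # hypothetical balanced panel
ids = np.repeat(np.arange(N), T)                 # rows sorted by individual, then time
c = rng.normal(size=N)                           # individual effects
X = rng.normal(size=(N * T, k)) + c[ids, None]   # regressors correlated with c_i
beta_true = np.array([0.8, -1.2])
y = X @ beta_true + c[ids] + rng.normal(scale=0.1, size=N * T)

# Within transformation: subtract each individual's mean from its observations.
Xw = X - X.reshape(N, T, k).mean(axis=1)[ids]
yw = y - y.reshape(N, T).mean(axis=1)[ids]
b_within = np.linalg.lstsq(Xw, yw, rcond=None)[0]

# Same slopes from OLS on X plus one dummy per individual (Z = [X, D]).
D = (ids[:, None] == np.arange(N)).astype(float)
coef = np.linalg.lstsq(np.column_stack([X, D]), y, rcond=None)[0]
b_dummies, a_i = coef[:k], coef[k:]
print(b_within, b_dummies)
```

The demeaned regression only has k columns, which is why it is the practical route when N is large.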
Note:
- This is simple algebra: the estimator is just OLS.
- Again, LS is an estimator, not a model. This particular OLS
estimator is called the LSDV estimator.
- Note what $a_i$ is when $T_i = 1$. Follow this with $y_{it} - a_i - x_{it}'b = 0$ if $T_i = 1$.
$$\Delta Y_{it} = \sum_{j=2}^{k} \beta_j \Delta X_{jit} + \varepsilon_{it} - \varepsilon_{it-1}$$
FD Estimator
Each variable is differenced once over time, so we are effectively
estimating the relationship between changes of variables.
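A minimal sketch of first differencing, assuming a single regressor and a hypothetical balanced panel in which the regressor is correlated with the individual effect. The slope of 0.7 and the other simulation settings are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 200, 4                                   # hypothetical balanced panel
c = rng.normal(size=(N, 1))                     # unobserved individual effect
x = rng.normal(size=(N, T)) + c                 # regressor correlated with c_i
beta_true = 0.7
y = beta_true * x + c + rng.normal(scale=0.2, size=(N, T))

# First differences over time remove c_i from the equation.
dx = np.diff(x, axis=1).ravel()
dy = np.diff(y, axis=1).ravel()
b_fd = (dx @ dy) / (dx @ dx)                    # OLS slope on the differenced data

# Pooled OLS on levels (with intercept) for comparison; it picks up c_i.
xc, yc = x.ravel() - x.mean(), y.ravel() - y.mean()
b_levels = (xc @ yc) / (xc @ xc)
print(b_fd, b_levels)
```

In this setup the levels regression overstates the slope because the omitted effect moves together with x, while the differenced regression does not.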
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T_i} (y_{it} - a_i - x_{it}'b)^2}{\sum_{i=1}^{N} T_i - N - K}$$
(Note the degrees of freedom correction)
FEM: PCSE
All previous comments and remarks apply to the FEM.
Different tests:
- F-test based on the LSDV dummy variable model: constant or
zero coefficients for D. Test follows an F(N-1, NT-N-K) distribution.
- F-test based on FEM (the unrestricted model) vs. pooled model
(the restricted model). Test follows an F(N-1, NT-N-K) distribution.
- An LR test can also be done, usually assuming normality. Test follows
a $\chi^2_{N-1}$ distribution.
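The F-test of the FEM against the pooled model can be sketched from two residual sums of squares. The example below uses a simulated balanced panel with hypothetical dimensions and strong individual effects; the helper `ssr` is an illustrative name, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(6)
N, T, K = 25, 6, 2                               # hypothetical balanced panel
ids = np.repeat(np.arange(N), T)
c = 2.0 * rng.normal(size=N)                     # sizeable individual effects
X = rng.normal(size=(N * T, K))
y = X @ np.array([1.0, 0.5]) + c[ids] + rng.normal(size=N * T)

def ssr(Z, y):
    """Residual sum of squares from OLS of y on Z (hypothetical helper)."""
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return resid @ resid

D = (ids[:, None] == np.arange(N)).astype(float)
ssr_pooled = ssr(np.column_stack([np.ones(N * T), X]), y)   # restricted model
ssr_fem = ssr(np.column_stack([X, D]), y)                   # unrestricted model

df1, df2 = N - 1, N * T - N - K
F = ((ssr_pooled - ssr_fem) / df1) / (ssr_fem / df2)
s2 = ssr_fem / df2            # error variance with the degrees of freedom correction
print(F, df1, df2, s2)
```

The statistic is then compared with the F(N-1, NT-N-K) distribution; a large value speaks against the pooled model.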
Conditions:
(1) It is possible to treat each of the unobserved $Z_p$ variables as being
drawn randomly from a given distribution.
(2) The $Z_p$ variables are distributed independently of all of the $X_j$
variables.
Note: We would have to use the FEM, even if the first condition seems
to be satisfied.
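If (1) and (2) are satisfied, we can use the REM, but there is a
complication. Now, $w_{it}$ will not have a scalar covariance matrix: the errors of a given individual are correlated over time through the common effect.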
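$$
\begin{aligned}
y_1 &= X_1\beta + \varepsilon_1 + u_1 i_1 && T_1 \text{ observations} \\
y_2 &= X_2\beta + \varepsilon_2 + u_2 i_2 && T_2 \text{ observations} \\
&\;\;\vdots \\
y_N &= X_N\beta + \varepsilon_N + u_N i_N && T_N \text{ observations} \\
y &= X\beta + \varepsilon + u && \textstyle\sum_{i=1}^{N} T_i \text{ observations} \\
&= X\beta + w
\end{aligned}
$$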
In all that follows, except where explicitly noted, $X$, $X_i$
and $x_{it}$ contain a constant term as the first element.
To avoid notational clutter, in those cases, $x_{it}$ etc. will
simply denote the counterpart without the constant term.
Use of the symbol K for the number of variables will thus
be context specific but will usually include the constant term.
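$$\mathrm{Var}[\varepsilon_i + u_i i_i \,|\, X_i] = \sigma_\varepsilon^2 I_{T_i} + \sigma_u^2\, i_i i_i' \quad (T_i \times T_i) \quad = \Omega_i$$
$$
\mathrm{Var}[w \,|\, X] = \begin{bmatrix}
\Omega_1 & 0 & \cdots & 0 \\
0 & \Omega_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \Omega_N
\end{bmatrix}
\quad \text{(Note these differ only in the dimension } T_i\text{)}
$$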
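$$\frac{X'X}{\sum_{i=1}^{N} T_i} = \sum_{i=1}^{N} f_i \frac{X_i'X_i}{T_i}, \quad f_i = \frac{T_i}{\sum_{j=1}^{N} T_j}: \text{ a weighted sum of individual moment matrices}$$
$$\frac{X'\Omega X}{\sum_{i=1}^{N} T_i} = \sum_{i=1}^{N} f_i \frac{X_i'\Omega_i X_i}{T_i}: \text{ a weighted sum of individual moment matrices}$$
$$= \sigma_\varepsilon^2 \sum_{i=1}^{N} f_i \frac{X_i'X_i}{T_i} + \sigma_u^2 \sum_{i=1}^{N} f_i\, T_i\, \bar{x}_i \bar{x}_i'$$
Note asymptotics are with respect to N. Each matrix $X_i'X_i/T_i$ is the
moments for the $T_i$ observations. Should be 'well behaved' in micro
level data. The average of N such matrices should be likewise.
T or $T_i$ is assumed to be fixed (and small).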
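We can use pooled OLS, but for inferences we need the true
variance, i.e., the sandwich estimator:
$$\mathrm{Var}[b \,|\, X] = \frac{1}{\sum_{i=1}^{N} T_i} \left[\frac{X'X}{\sum_{i=1}^{N} T_i}\right]^{-1} \left[\frac{X'\Omega X}{\sum_{i=1}^{N} T_i}\right] \left[\frac{X'X}{\sum_{i=1}^{N} T_i}\right]^{-1}$$
$$\to 0 \times Q^{-1} Q^{*} Q^{-1} \to 0 \text{ as } N \to \infty \text{ with our convergence assumptions}$$
$$\frac{X'\Omega X}{\sum_{i=1}^{N} T_i} = \sum_{i=1}^{N} f_i \frac{X_i'\Omega_i X_i}{T_i}, \quad \text{where } \Omega_i = E[w_i w_i' \,|\, X_i]$$
In the spirit of the White estimator, use
$$\frac{X'\hat{\Omega} X}{\sum_{i=1}^{N} T_i} = \sum_{i=1}^{N} f_i \frac{X_i'\hat{w}_i \hat{w}_i' X_i}{T_i}, \quad \hat{w}_i = y_i - X_i b$$
Hypothesis tests are then based on Wald statistics.
THIS IS THE 'CLUSTER' ESTIMATOR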
$$\mathrm{Est.Var}[b \,|\, X] = (X'X)^{-1} \left[\sum_{i=1}^{N} X_i'\hat{w}_i \hat{w}_i' X_i\right] (X'X)^{-1}$$
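The cluster estimator above can be sketched directly from its formula. The example below simulates a hypothetical balanced panel with a common individual effect in the error and compares the cluster-robust standard errors with the usual OLS ones. No small-sample correction is applied, which is a simplifying assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(7)
N, T = 100, 5                                    # hypothetical balanced panel
ids = np.repeat(np.arange(N), T)
u = rng.normal(size=N)                           # common effect inside each cluster
x = rng.normal(size=N * T) + rng.normal(size=N)[ids]
X = np.column_stack([np.ones(N * T), x])
y = X @ np.array([1.0, 2.0]) + u[ids] + rng.normal(size=N * T)

b = np.linalg.solve(X.T @ X, X.T @ y)            # pooled OLS
w_hat = y - X @ b                                # pooled OLS residuals
XtX_inv = np.linalg.inv(X.T @ X)

# Middle matrix: sum over individuals of X_i' w_i w_i' X_i.
meat = np.zeros((2, 2))
for i in range(N):
    g = X[ids == i].T @ w_hat[ids == i]          # X_i' w_i, a K-vector
    meat += np.outer(g, g)

V_cluster = XtX_inv @ meat @ XtX_inv
V_ols = (w_hat @ w_hat) / (N * T - 2) * XtX_inv  # conventional OLS variance
se_cluster = np.sqrt(np.diag(V_cluster))
se_ols = np.sqrt(np.diag(V_ols))
print(se_cluster, se_ols)
```

Because the errors of a given individual share the component u, the conventional formula understates the uncertainty about the intercept in this simulation.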
REM: GLS
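Standard results for GLS in a GR model
- Consistent
- Unbiased
- Efficient (if functional form for $\Omega$ correct)
$$\hat{\beta} = [X'\Omega^{-1}X]^{-1}[X'\Omega^{-1}y] = \left[\sum_{i=1}^{N} X_i'\Omega_i^{-1}X_i\right]^{-1} \left[\sum_{i=1}^{N} X_i'\Omega_i^{-1}y_i\right]$$
$$\Omega_i^{-1} = \frac{1}{\sigma_\varepsilon^2}\left[I_{T_i} - \frac{\sigma_u^2}{\sigma_\varepsilon^2 + T_i\sigma_u^2}\, i i'\right]$$
(note, depends on i only through $T_i$)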
REM: GLS
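The matrix $\Omega^{-1/2} = P$ is used to transform the data. That is,
$$y_{it} - \theta_i \bar{y}_{i.} = (x_{it} - \theta_i \bar{x}_{i.})'\beta + \nu_{it}, \quad \text{where } \theta_i = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T_i\sigma_u^2}}$$
and $\nu_{it}$ denotes the transformed error.
$$\mathrm{Asy.Var}[\hat{\beta}_{GLS}] = (X'\Omega^{-1}X)^{-1} = \sigma_\varepsilon^2 (X^{*\prime}X^{*})^{-1}$$
Then, the bigger (smaller) the variance of the unobserved effect, i.e., the greater (smaller) the
individual heterogeneity, the closer $\theta_i$ is to 1 (0) and the closer the estimator is to FE (pooled OLS).
Also, when T is large, it becomes more like FE.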
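$$\hat{\sigma}_u^2 = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T_i} (y_{it} - a_{OLS} - x_{it}'b_{OLS})^2}{\sum_{i=1}^{N} T_i} - \frac{\sum_{i=1}^{N}\sum_{t=1}^{T_i} (y_{it} - a_i - x_{it}'b_{LSDV})^2}{\sum_{i=1}^{N} T_i} \ge 0$$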
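The quasi-demeaning transformation and the variance-component estimate above can be combined into a feasible GLS sketch. The example assumes a balanced panel with one regressor, simulated with hypothetical parameter values, and it assumes that the idiosyncratic variance is estimated from the LSDV residuals with the same denominator as the formula above. `group_mean` is an illustrative helper.

```python
import numpy as np

rng = np.random.default_rng(8)
N, T = 300, 5                                   # hypothetical balanced panel
u = rng.normal(scale=1.0, size=N)               # individual effects, independent of x
x = rng.normal(size=N * T)
y = 1.0 + 0.5 * x + np.repeat(u, T) + rng.normal(scale=0.5, size=N * T)

def group_mean(a):
    """Individual means repeated T times (rows sorted by i, then t)."""
    return np.repeat(a.reshape(N, T).mean(axis=1), T)

# Residuals from pooled OLS and from the within (LSDV) fit.
Z = np.column_stack([np.ones(N * T), x])
e_ols = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
xw, yw = x - group_mean(x), y - group_mean(y)
e_lsdv = yw - xw * (xw @ yw) / (xw @ xw)

s2_eps = (e_lsdv @ e_lsdv) / (N * T)            # assumption: same denominator as s2_u
s2_u = (e_ols @ e_ols) / (N * T) - (e_lsdv @ e_lsdv) / (N * T)

theta = 1.0 - np.sqrt(s2_eps / (s2_eps + T * s2_u))

# Quasi-demean every column (the constant becomes 1 - theta) and run OLS.
Zs = Z - theta * np.column_stack([np.ones(N * T), group_mean(x)])
ys = y - theta * group_mean(y)
b_re = np.linalg.lstsq(Zs, ys, rcond=None)[0]
print(theta, b_re)
```

Setting theta to 0 in the last step gives pooled OLS, and letting it approach 1 gives the within estimator, which is the sense in which the REM sits between the two.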
----------------------------------------------------------------------
Least Squares with Group Dummy Variables..........
LHS=LWAGE Mean = 6.67635
Residuals Sum of squares = 82.34912
Standard error of e = .15205
These 2 variables have no within group variation.
FEM ED
F.E. estimates are based on a generalized inverse.
--------+-------------------------------------------------------------
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X
--------+-------------------------------------------------------------
EXP| .11346*** .00247 45.982 .0000 19.8538
EXPSQ| -.00042*** .544864D-04 -7.789 .0000 514.405
OCC| -.02106 .01373 -1.534 .1251 .51116
SMSA| -.04209** .01934 -2.177 .0295 .65378
MS| -.02915 .01897 -1.536 .1245 .81441
FEM| .000 ......(Fixed Parameter).......
UNION| .03413** .01491 2.290 .0220 .36399
ED| .000 ......(Fixed Parameter).......
--------+-------------------------------------------------------------
Breusch and Pagan Lagrange Multiplier statistic
Assuming normality (and for convenience now, a
balanced panel)
$$LM = \frac{NT}{2(T-1)} \left[\frac{\sum_{i=1}^{N} (T\bar{e}_i)^2}{\sum_{i=1}^{N}\sum_{t=1}^{T} e_{it}^2} - 1\right]^2 = \frac{NT}{2(T-1)} \left[\frac{\sum_{i=1}^{N} \left[(T\bar{e}_i)^2 - e_i'e_i\right]}{\sum_{i=1}^{N} e_i'e_i}\right]^2$$
Converges to chi-squared[1] under the null hypothesis
of no common effects. (For unbalanced panels, the
scale in front becomes $(\sum_{i=1}^{N} T_i)^2 / [2\sum_{i=1}^{N} T_i(T_i - 1)]$.)
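A minimal sketch of the balanced-panel statistic follows. It computes LM from pooled OLS residuals on two simulated data sets, one without and one with a common individual effect. The function names and simulation settings are hypothetical.

```python
import numpy as np

def bp_lm(e):
    """Breusch-Pagan LM statistic from an N x T array of pooled OLS residuals."""
    N, T = e.shape
    ratio = ((T * e.mean(axis=1)) ** 2).sum() / (e ** 2).sum()
    return N * T / (2.0 * (T - 1)) * (ratio - 1.0) ** 2

def pooled_resid(y, x):
    """Residuals from pooled OLS of y on a constant and x, reshaped to N x T."""
    Z = np.column_stack([np.ones(y.size), x.ravel()])
    b = np.linalg.lstsq(Z, y.ravel(), rcond=None)[0]
    return (y.ravel() - Z @ b).reshape(y.shape)

rng = np.random.default_rng(9)
N, T = 100, 5                                   # hypothetical balanced panel
x = rng.normal(size=(N, T))
eps = rng.normal(size=(N, T))
u = rng.normal(size=(N, 1))                     # common effect for each individual

lm_none = bp_lm(pooled_resid(1.0 + 0.5 * x + eps, x))       # no common effects
lm_re = bp_lm(pooled_resid(1.0 + 0.5 * x + u + eps, x))     # with common effects
print(lm_none, lm_re)
```

Each value is compared with a chi-squared[1] critical value; the statistic grows quickly once the residuals of an individual share a common component.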
FE vs. RE
Q: RE estimation or FE estimation?
Case for RE:
- Under no omitted variables or if the omitted variables are
uncorrelated with the Xit in the model then a REM is probably best:
It produces unbiased estimates of the coefficients, is efficient, uses all
the data available, and produces the smallest standard errors.
- RE can deal with observed characteristics that remain constant for
each individual. In FE, they have to be dropped from model.
- In contrast with FE, RE estimates a small number of parameters
- We do not lose N degrees of freedom.
- Philosophically speaking, a REM is more attractive: Why should we
assume one set of unobservables fixed and the other random?
FE vs. RE
Case against RE:
- If either of the conditions for using RE is violated, we should use FE.
FE vs. RE
FE estimation remains consistent whether or not condition (2) holds. On the other hand, a violation of
condition (2) causes inconsistency in the RE estimation.
That is, if there are omitted variables, which are correlated with the Xit
in the model, then the FEM provides a way for controlling for omitted
variable bias. In a FEM, individuals serve as their own controls.
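Wald Criterion: $\hat{q} = \hat{\beta}_{FE} - \hat{\beta}_{RE}$; $W = \hat{q}'[\mathrm{Var}(\hat{q})]^{-1}\hat{q}$
$$\sqrt{nT}\,[\hat{\beta}_{FE} - \beta] \xrightarrow{d} N[0, V_{FE}] \quad \text{(inefficient)}$$
Note: $\hat{q} = (\hat{\beta}_{FE} - \beta) - (\hat{\beta}_{RE} - \beta)$. The lemma states that in the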
invariant variables in X.
Note: Pooled OLS is consistent, but inefficient under H0. Then, the
RE estimation is GLS.
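The Wald criterion comparing the FE and RE estimates can be sketched for a single regressor on a balanced panel. The sketch assumes the standard Hausman result that Var(q) equals the FE variance minus the RE variance, and it estimates the variance components in a simple way chosen for this example; the function name `hausman` and all simulation settings are hypothetical.

```python
import numpy as np

def hausman(y, x, N, T):
    """Hausman W for one regressor (rows sorted by individual, then time)."""
    xm = np.repeat(x.reshape(N, T).mean(axis=1), T)
    ym = np.repeat(y.reshape(N, T).mean(axis=1), T)
    # FE (within) slope and its residual sum of squares.
    xw, yw = x - xm, y - ym
    b_fe = (xw @ yw) / (xw @ xw)
    ssr_fe = ((yw - b_fe * xw) ** 2).sum()
    s2_eps = ssr_fe / (N * T - N - 1)
    # Variance of the individual effect from pooled OLS residuals (floored at 0).
    Z = np.column_stack([np.ones(N * T), x])
    e = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    s2_u = max((e @ e) / (N * T) - ssr_fe / (N * T), 0.0)
    theta = 1.0 - np.sqrt(s2_eps / (s2_eps + T * s2_u))
    # RE slope from quasi-demeaned data.
    Zs = Z - theta * np.column_stack([np.ones(N * T), xm])
    b_re = np.linalg.lstsq(Zs, y - theta * ym, rcond=None)[0][1]
    v_fe = s2_eps / (xw @ xw)
    v_re = s2_eps * np.linalg.inv(Zs.T @ Zs)[1, 1]
    q = b_fe - b_re
    return q * q / (v_fe - v_re), b_fe, b_re     # assumes Var(q) = V_FE - V_RE

rng = np.random.default_rng(10)
N, T = 300, 5                                    # hypothetical balanced panel
u = np.repeat(rng.normal(size=N), T)             # individual effects
eps = rng.normal(scale=0.5, size=N * T)
x_exo = rng.normal(size=N * T)                   # independent of the effects
x_endo = rng.normal(size=N * T) + u              # correlated with the effects

W_exo, _, _ = hausman(1.0 + 0.5 * x_exo + u + eps, x_exo, N, T)
W_endo, b_fe, b_re = hausman(1.0 + 0.5 * x_endo + u + eps, x_endo, N, T)
print(W_exo, W_endo, b_fe, b_re)
```

With one regressor, W is compared with a chi-squared[1] critical value. When the regressor is correlated with the effects, the RE slope drifts away from the FE slope and W becomes large.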
Since the errors and the unobserved effect may not be i.i.d. white noise,
Wooldridge (2009) suggests using PCSE.
--> matr;bm=b(1:8);vm=varb(1:8,1:8)$
--> matr;list;wutest=bm'<vm>bm$
Now, you cannot reject the REM at the 5% level. Here you can say,
after accounting for cross-sectional and temporal dependence, the
Hausman test indicates that the coefficient estimates from pooled OLS
estimation are consistent.
Measurement Error
Heteroskedasticity - Review
Given that there is a cross-section component to panel data, there will
always be a potential for heteroskedasticity.
Autocorrelation - Review
Although testing for autocorrelation in panels differs from the usual univariate
setting, a version of the Breusch-Pagan LM test can be used.
To deal with autocorrelated errors, we can use the usual methods, say
pseudo-differencing. In general, we will estimate using the LSDV
residuals.
As usual, OLS plus Newey-West (NW) PCSE can help you avoid a complicated
FGLS estimation. (The usual problems with HAC SE apply.)
PCSE - Review
With only heteroskedasticity, it is usually White's PCSE (Rogers SE)
that are used. It is convenient to work with group mean deviation form!
In general, we find that PCSE tend to be bigger than the usual OLS
SE. The bigger the cross-sectional correlation, the bigger the SE. That
is, NW SE tend to be smaller than Driscoll and Kraay SE.
PCSE - Review
Key Assumption
Correlations within a cluster (a group of firms, a region, different
years for the same firm, different years for the same region) are
the same for different observations.
Procedure
(1) Identify clusters using economic theory (clustered by industry,
year, industry and year)
(2) Calculate clustered standard errors
(3) Try different ways of defining clusters and see how the
estimated SE are affected. Be conservative, report largest SE.
Performance
Not many studies; some simulations have been done for simple DGPs.
PCSE coverage rates are not very good (typically below their
nominal level).
Criticisms:
- Angrist and Pischke (2009): Assumptions are not always plausible.
- Allison (2009)
- Bollen and Brand (2010): Hard to compare models.
General remarks:
- Ignoring dynamics (i.e., lags) is not a good idea: it creates an omitted variables
problem.
- It is important to think carefully about dynamic processes:
How long does it take things to unfold?
What lags does it make sense to include?
With huge datasets, we can just throw lots in
With smaller datasets, it is important to think things
through.
$$y_{it} = x_{it}'\beta + Y_{it}'\gamma + c_i + \varepsilon_{it}$$
X = exogenous covariates
Y = other endogenous covariates (may be related to $\varepsilon_{it}$)
$c_i$ = unobserved unit-specific characteristic
$\varepsilon_{it}$ = idiosyncratic error
Treat ci as random, fixed, or use differencing to wipe it out
Use contemporaneous or lagged X and (appropriate) lags of Y as
instruments in two-stage estimation of yit.
Strategies:
Tests for unit roots in time series & panel data
Differencing as a solution
A reason to try FD models.