University of Wisconsin Press and The Board of Regents of the University of Wisconsin System are
collaborating with JSTOR to digitize, preserve and extend access to The Journal of Human Resources.
http://www.jstor.org
Limited Dependent Variable Models
Using Panel Data
G. S. Maddala
ABSTRACT
I. Introduction
The present paper reviews some problems that arise in the analysis of panel data when the dependent variables are truncated, censored, or qualitative. The paper discusses panel logit, panel probit, and panel tobit models with fixed and random effects. Hazard rate models and duration models are excluded (for these models, see Amemiya 1985, Chapter 11, and Heckman 1984). The purpose of this paper is to present an overview and comparative assessment of the panel logit, probit, and tobit models which would aid empirical researchers in this area in choosing the appropriate model. For a thorough discussion of particular models, readers can refer to the several papers by Chamberlain, Heckman, and others referred to at the end of the paper.
Before we proceed with our discussion of the limited dependent variable
The author is a professor of economics at the University of Florida. He gratefully acknowledges financial support from the National Science Foundation and thanks James D. Adams for helpful comments. He claims responsibility for any errors.
THE JOURNAL
OF HUMAN RESOURCES
* XXII * 3
(1) $y_{it} = \alpha_i + \beta' x_{it} + u_{it}$, $i = 1, 2, \ldots, N$; $t = 1, 2, \ldots, T$
where $y_{it}$ is the output and $x_{it}$ the vector of inputs for the ith firm in the tth period, $\alpha_i$ captures the firm specific unobserved inputs assumed to be constant over time, and $u_{it}$ is the error term. We assume $u_{it} \sim \mathrm{IID}(0, \sigma_u^2)$. This model can be estimated by including N intercept dummy variables or by differencing out the $\alpha_i$'s.
The next important step was the model with random effects by Balestra and Nerlove (1966) where $\alpha_i$ in (1) was treated as a random variable just like $u_{it}$. The model they considered was a dynamic model where $y_{i,t-1}$ is used as an explanatory variable. Since the introduction of $y_{i,t-1}$ creates some problems we discuss later, we will drop it for the present. Denoting $\bar{y}_i = (1/T) \sum_t y_{it}$ and $\bar{y} = (1/N) \sum_i \bar{y}_i$ we can decompose the total sum of squares $T_{yy} = \sum_{i,t} (y_{it} - \bar{y})^2$ into two components as
$T_{yy} = \sum_{i,t} (y_{it} - \bar{y}_i)^2 + \sum_{i,t} (\bar{y}_i - \bar{y})^2 = W_{yy} + B_{yy}$.
$W_{yy}$ measures within group variation and $B_{yy}$ measures between group variation in y. Using a similar decomposition for all the variances and covariances we get the estimator of $\beta$ from (1) as $\hat\beta_W = W_{xx}^{-1} W_{xy}$ where $W_{xy} = \sum_{i,t} (x_{it} - \bar{x}_i)(y_{it} - \bar{y}_i)$. This is known as the "within group estimator." Assuming $\alpha_i \sim \mathrm{IID}(0, \sigma_\alpha^2)$ and $u_{it} \sim \mathrm{IID}(0, \sigma_u^2)$ we get the generalized least squares (GLS) estimator
(2) $\hat\beta_{GLS} = (W_{xx} + \theta B_{xx})^{-1} (W_{xy} + \theta B_{xy})$
where
$\theta = \dfrac{\sigma_u^2}{\sigma_u^2 + T \sigma_\alpha^2}$
Fuller and Battese (1973) show that this is the same as using the ordinary least squares estimation with the transformed data:
(3) $y_{it} - \lambda \bar{y}_i$ and $x_{it} - \lambda \bar{x}_i$ where $\lambda = 1 - \sqrt{\theta}$
This transformation is worth noting because:
(i) it puts the model in a form that is easily estimated and
(ii) it collapses to the fixed effects model if $\lambda = 1$.
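The transformation can be illustrated with a minimal sketch for a balanced panel with a single regressor. The function names and the list-of-lists data layout are assumptions made for this illustration, and the variance components are taken as known rather than estimated. Setting lam = 1 reproduces the within-group slope, and lam = 0 reproduces pooled OLS.

```python
import math

def group_means(panel):
    """panel: list of N rows, each holding T observations -> N group means."""
    return [sum(row) / len(row) for row in panel]

def quasi_demean(panel, lam):
    """Replace every v_it by v_it - lam * (mean of group i)."""
    means = group_means(panel)
    return [[v - lam * m for v in row] for row, m in zip(panel, means)]

def slope_with_intercept(x_rows, y_rows):
    """OLS slope of y on x (intercept included) over all stacked cells."""
    xs = [v for row in x_rows for v in row]
    ys = [v for row in y_rows for v in row]
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
    sxx = sum((a - xbar) ** 2 for a in xs)
    return sxy / sxx

def within_slope(x, y):
    """Within-group estimator W_xy / W_xx for one regressor."""
    xm, ym = group_means(x), group_means(y)
    wxy = sum((a - mx) * (b - my)
              for xr, yr, mx, my in zip(x, y, xm, ym)
              for a, b in zip(xr, yr))
    wxx = sum((a - mx) ** 2 for xr, mx in zip(x, xm) for a in xr)
    return wxy / wxx

def fuller_battese_lambda(sigma_u2, sigma_a2, T):
    """lambda = 1 - sqrt(theta), theta = sigma_u^2 / (sigma_u^2 + T sigma_alpha^2)."""
    theta = sigma_u2 / (sigma_u2 + T * sigma_a2)
    return 1.0 - math.sqrt(theta)

def transformed_ols_slope(x, y, lam):
    """OLS on the quasi-demeaned data y_it - lam*ybar_i and x_it - lam*xbar_i."""
    return slope_with_intercept(quasi_demean(x, lam), quasi_demean(y, lam))
```

In practice $\sigma_u^2$ and $\sigma_\alpha^2$ are unknown and must be estimated first; the sketch leaves that step out.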
The arguments for using "random effects" models instead of the fixed effects models are several. If we have a large number of cross-section units, instead of estimating N of the $\alpha_i$ as in the fixed effects models, we estimate only the mean and variance in the random effects models. This saves a lot of degrees of freedom. Another argument is the one in Maddala (1971) that the $\alpha_i$ measures firm specific effects that we are ignorant about just the same way that $u_{it}$ measures effects for the ith cross-section unit in the tth period that we are ignorant about. Thus, if $u_{it}$ is treated as a random variable, then there is no reason why $\alpha_i$ should not also be treated as random. Another argument used in the analysis of variance literature is that if we want to make inferences about only this set of cross-section units, then we should treat $\alpha_i$ as fixed. On the other hand, if we want to make inferences about the population from which these cross-section data came, we should treat $\alpha_i$ as random. In most of the applied work, the latter is the case. Finally, very often we have some time-invariant observations as well, e.g., years of schooling and family background variables in studies of wages. In this case the model is:
(4) $y_{it} = \gamma' z_i + \beta' x_{it} + \alpha_i + u_{it}$
If we use the fixed effects model we cannot estimate the parameters $\gamma$, because $\alpha_i$ captures the effect of all the time-invariant variables. In this case one has to use the random effects model.
The choice between the random and the fixed effects formulations should also depend upon the statistical properties of the implied estimators. Later, in our discussion of dynamic models, we will show that with large values of N and small values of T, the fixed effects model gives inconsistent estimates of the parameters. This is because the estimation of the fixed-effects model, which amounts to differencing out the $\alpha_i$, produces a linear regression model with lagged dependent variables and serially correlated errors. Note that this problem arises with linear regression models, not just the limited dependent variable models discussed later on.
An interesting result occurs if the $\alpha_i$'s are not independent of the $x_{it}$'s. Mundlak (1978) argues that the dichotomy between fixed effects and random
'Xi + Wi
XHi)+ Vit
'(Xit-
= (Ti
i -
_
-p
The fixed effects and random effects models also yield the same behavior
if instead of (5) we have
(7)
as the
TlX2i -+ T'Z2i + Wi
Yit = PXlit
2Z2i
+ Wi + Uit
We then have
(9)
Yi = Pi
+ Y'i
+
li + P22
'
2Z2i +
2i + 'TTZ2 + Wi + Ui
Subtracting (9) from (8) we see that the within-group estimator $\hat\beta_W = W_{xx}^{-1} W_{xy}$ gives us a consistent estimator of $\beta$. The question is about the estimation of the parameters $\gamma_1$ and $\gamma_2$. Note that if we use the standard methods of estimation for random effects models with Equation (8) we can get estimates of $\gamma_1$ and $\pi_1$ but we cannot get separate estimates of $\gamma_2$ and $\pi_2$. Hausman and Taylor suggest an instrumental variable procedure to get a separate estimate of $\gamma_2$. This procedure can be applied if $k_1 \geq g_2$ in which case $x_{1i}$ can be used as instruments for the endogenous variables $z_{2i}$. Thus, the paper by Hausman and Taylor shows that one can estimate random effects models with instrumental variable methods if the random effects are correlated with some of the explanatory variables, provided some conditions are satisfied.
B. Specification Tests
Before undertaking the elaborate analysis of models where $E(\alpha_i \mid x_{it}, z_i) \neq 0$, we might estimate some simplified models and test the hypothesis $E(\alpha_i \mid x_{it}, z_i) = 0$. Complicated models can be estimated if this hypothesis is rejected. For this purpose Hausman (1978) suggests the following specification test: if the null hypothesis is correct, then the GLS estimator from the random effects model is both consistent and efficient. On the other hand, the within-group estimator $\hat\beta_W$ is consistent regardless of whether the null hypothesis is valid, since all time invariant effects cancel. Thus, we can construct the difference $\hat{q} = \hat\beta_W - \hat\beta_{GLS}$, with variance $V(\hat{q}) = V(\hat\beta_W) - V(\hat\beta_{GLS})$. Hence we can use $m = \hat{q}' [\hat{V}(\hat{q})]^{-1} \hat{q}$ as a $\chi^2$ statistic with k degrees of freedom, where k is the dimensionality of $\beta$. The random effects model is rejected in favor of the fixed effects model if m is sufficiently high.
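For a single coefficient (k = 1) the statistic reduces to a ratio of scalars, which the following minimal sketch computes. The function name and its arguments are hypothetical, and the estimates and their variances are assumed to have been obtained already from the within-group and GLS fits.

```python
def hausman_statistic(b_within, b_gls, var_within, var_gls):
    """Scalar Hausman statistic m = q^2 / V(q) for one coefficient.

    q = b_within - b_gls and V(q) = V(b_within) - V(b_gls).
    Under the null, m is compared with a chi-square with 1 degree of freedom.
    """
    q = b_within - b_gls
    var_q = var_within - var_gls
    if var_q <= 0.0:
        # In finite samples the estimated difference can fail to be positive;
        # the statistic is then not usable as computed here.
        raise ValueError("estimated V(q) is not positive")
    return q * q / var_q
```

With illustrative inputs of 1.0 and 0.8 for the two estimates and 0.02 and 0.01 for their variances, m is about 4.0, which exceeds the 5 percent critical value of 3.84 for one degree of freedom. With k > 1 the same construction uses the quadratic form in the text.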
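Yet another specification test is to test $\sigma_\alpha^2 = 0$. This is the case where the individual components do not exist and we can use the OLS method. A test for this hypothesis is given in Breusch and Pagan (1980) who show that under the null hypothesis, if we denote the residuals from the least squares regression by $\hat{u}_{it}$, then
$m = \dfrac{NT}{2(T-1)} \left[ \dfrac{\sum_{i=1}^{N} \left( \sum_{t=1}^{T} \hat{u}_{it} \right)^2}{\sum_{i=1}^{N} \sum_{t=1}^{T} \hat{u}_{it}^2} - 1 \right]^2$
has a $\chi^2$ distribution with one degree of freedom.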
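The Breusch and Pagan statistic depends only on the OLS residuals arranged by cross-section unit, as the following minimal sketch shows. It assumes a balanced panel stored as a list of N rows of T residuals; the function name is hypothetical.

```python
def breusch_pagan_lm(resid):
    """LM statistic for sigma_alpha^2 = 0 from pooled OLS residuals.

    resid: list of N rows, each with the T residuals of one cross-section unit.
    Returns NT / (2(T-1)) * [sum_i (sum_t u_it)^2 / sum_i sum_t u_it^2 - 1]^2.
    """
    n = len(resid)
    t = len(resid[0])
    sum_of_squared_row_sums = sum(sum(row) ** 2 for row in resid)
    sum_of_squares = sum(e * e for row in resid for e in row)
    ratio = sum_of_squared_row_sums / sum_of_squares
    return n * t / (2.0 * (t - 1)) * (ratio - 1.0) ** 2
```

Residuals that move together within each unit push the ratio above one and the statistic away from zero, which is the pattern the individual effects would produce.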
we can write it as
$y_t = \rho y_{t-1} + \beta x_t - \rho \beta x_{t-1} + e_t$
This is the same as the dynamic regression equation
(11) $y_t = \gamma y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + e_t$
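Then
$\mathrm{Prob}(0,1) = \dfrac{1}{1 + \exp(\beta' x_{i1} + \alpha_i)} \cdot \dfrac{\exp(\beta' x_{i2} + \alpha_i)}{1 + \exp(\beta' x_{i2} + \alpha_i)}$
and
$\mathrm{Prob}(1,0) = \dfrac{\exp(\beta' x_{i1} + \alpha_i)}{1 + \exp(\beta' x_{i1} + \alpha_i)} \cdot \dfrac{1}{1 + \exp(\beta' x_{i2} + \alpha_i)}$
Hence
$\mathrm{Prob}(1,0 \mid \sum_t y_{it} = 1) = \dfrac{\mathrm{Prob}(1,0)}{\mathrm{Prob}(1,0) + \mathrm{Prob}(0,1)} = \dfrac{\exp[\beta'(x_{i1} - x_{i2})]}{D}$
and
$\mathrm{Prob}(0,1 \mid \sum_t y_{it} = 1) = \dfrac{1}{D}$
where
$D = 1 + \exp[\beta'(x_{i1} - x_{i2})]$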
The $\alpha_i$'s have been eliminated and we have a standard logit model to estimate, in which changes in the $x_{it}$'s are used to explain changes in the dichotomous dependent variables. Björklund (1985) uses this two-period model to study the relationship between unemployment and mental health in Sweden.
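The cancellation of the individual effect in the two-period case can be checked numerically. The sketch below assumes a scalar regressor; the function names are hypothetical. The conditional probability computed from the two unconditional logit probabilities agrees with the closed form $\exp[\beta(x_{i1} - x_{i2})]/D$ whatever value of $\alpha_i$ is used.

```python
import math

def logistic(z):
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-z))

def prob_10_given_one_event(beta, x1, x2, alpha):
    """Prob(1,0 | y_i1 + y_i2 = 1) built from the unconditional probabilities."""
    p1 = logistic(beta * x1 + alpha)
    p2 = logistic(beta * x2 + alpha)
    p10 = p1 * (1.0 - p2)
    p01 = (1.0 - p1) * p2
    return p10 / (p10 + p01)

def conditional_logit_formula(beta, x1, x2):
    """Closed form exp[beta (x1 - x2)] / D with D = 1 + exp[beta (x1 - x2)]."""
    e = math.exp(beta * (x1 - x2))
    return e / (1.0 + e)
```

Calling `prob_10_given_one_event` with several values of `alpha` and comparing with `conditional_logit_formula` shows that only the change in x matters for the conditional probability.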
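For the case of T = 3, we have to consider two different sets: $\sum_t y_{it} = 1$ and $\sum_t y_{it} = 2$. For the set $\sum_t y_{it} = 1$ we get, since $\exp(\alpha_i)$ cancels,
$\mathrm{Prob}(1,0,0 \mid \sum_t y_{it} = 1) = \exp[\beta'(x_{i1} - x_{i3})]/D$
$\mathrm{Prob}(0,1,0 \mid \sum_t y_{it} = 1) = \exp[\beta'(x_{i2} - x_{i3})]/D$
where
$D = 1 + \exp[\beta'(x_{i1} - x_{i3})] + \exp[\beta'(x_{i2} - x_{i3})]$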
$i = 1, 2, \ldots, N$; $t = 1, 2, \ldots, T$
with
(12) $y_{it} = 1$ if $y_{it}^* > 0$
$= 0$ otherwise
-1
>
-"0to
>" -Xit
raqi
(Tu
o'u
If we now define
it
_-
PXit
p1/2
(1 - p)l2
Yit= 0
vit I ai
oo
f
-oo
i=l
r [1
t=l
-/2
F(ait)]yit[F(ait)1
it 1 -
2'
q2/2
$y_i^* = \beta' x_i + u_i$, $u_i \sim \mathrm{IN}(0, 1)$
$y_i = 1$ if $y_i^* > 0$
$= 0$ otherwise
We get
$E(y_i \mid x_i) = F(\beta' x_i)$ and $\mathrm{Var}(y_i \mid x_i) = F(\beta' x_i)[1 - F(\beta' x_i)]$
where $F(\cdot)$ is the cumulative distribution function of the standard normal.
The error
$\varepsilon_i = y_i - F(\beta' x_i)$
is orthogonal to functions of the vector $x_i$. Note that $E(\varepsilon_i \mid x_i) = 0$ and that $\varepsilon_i$ are heteroskedastic with variance depending on $\beta' x_i$.
Analogous to the least squares method, the method of moments suggests estimating $\beta$ by solving the equation
(13) $\sum_{i=1}^{n} [y_i - F(\beta' x_i)] \, g(x_i, \beta) = 0$
Consider the choice
$g(x_i, \beta) = \dfrac{\partial F(\beta' x_i)/\partial \beta}{\mathrm{Var}(y_i \mid x_i)} = \dfrac{f(\beta' x_i) \, x_i}{F(\beta' x_i)[1 - F(\beta' x_i)]}$
With this choice Equation (13) gives the first-order conditions for the ML
estimation of the probit model [see Maddala (1983), 26, for the first-order
conditions]. We can also consider g(xi, P) as a weighted nonlinear instrumental variable. Thus, the method of moments is related to GLS estimation method and weighted nonlinear instrumental variable methods.
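The moment equation with this weight can be solved numerically, as in the following minimal sketch for a probit with one regressor and no intercept. The simulated data, the function names, the bisection bracket [-3, 3], and the clamping of F away from 0 and 1 are all assumptions made for the illustration. Bisection is used because the probit log-likelihood is concave in $\beta$, so the weighted moment is decreasing in $\beta$.

```python
import math
import random

def norm_cdf(z):
    """Standard normal distribution function F."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density f."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def moment(beta, xs, ys):
    """Sample average of [y - F(beta x)] * f(beta x) x / (F (1 - F))."""
    total = 0.0
    for x, y in zip(xs, ys):
        F = min(max(norm_cdf(beta * x), 1e-12), 1.0 - 1e-12)  # guard the weight
        total += (y - F) * norm_pdf(beta * x) * x / (F * (1.0 - F))
    return total / len(xs)

def solve_beta(xs, ys, lo=-3.0, hi=3.0, tol=1e-10):
    """Find the root of the moment equation by bisection on [lo, hi]."""
    if moment(lo, xs, ys) <= 0.0 or moment(hi, xs, ys) >= 0.0:
        raise ValueError("root is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if moment(mid, xs, ys) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulate(n, beta, seed):
    """Draw x uniform on [-2, 2] and y = 1 if beta*x + u > 0, u standard normal."""
    rng = random.Random(seed)
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    ys = [1 if beta * x + rng.gauss(0.0, 1.0) > 0.0 else 0 for x in xs]
    return xs, ys
```

A possible use is `xs, ys = simulate(400, 1.0, 0)` followed by `solve_beta(xs, ys)`; the root of the moment equation is then the probit ML estimate for this one-parameter model.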
When we come to panel data, the derivations get more complicated.
AHH start with the first-order conditions for GLS estimation and proceed
from them. They first write the model as:
$y_{it}^* = \beta' x_{it} + \delta_{it}$
$y_{it} = 1$ if $y_{it}^* > 0$
$= 0$ otherwise
where $\delta_{it}$ is the composite error term $\alpha_i + u_{it}$. They assume a general covariance matrix $\Sigma$ for the error terms $\delta_{it}$ ($i = 1, 2, \ldots, N$). In the Heckman and Willis model the correlation between $\delta_{it}$ and $\delta_{it'}$ ($t \neq t'$) is $\rho$. In the AHH model it is $\rho_{tt'}$, unrestricted. AHH argue that if $\Sigma$ is incorrectly constrained to have equal off-diagonal terms, then the ML estimator of $\beta$ is inconsistent. It will be consistent only if we assume no correlation (i.e., $\rho = 0$). AHH also make the assumption (for most of their discussion),
$E(\delta_i \mid x_i) = 0$
where
$\delta_i = (\delta_{i1}, \delta_{i2}, \ldots, \delta_{iT})'$ and $x_i = (x_{i1}, x_{i2}, \ldots, x_{iT})$.
This implies that $\delta_{it}$ is uncorrelated with functions of current, past and future x's or that x's are strictly exogenous. Also, the distribution of each element of $\delta_i$ conditional on $x_i$ is assumed to be normal with mean zero and unit variance.
Under these assumptions we have
$E(y_{it} \mid x_i) = E(y_{it} \mid x_{it}) = F(\beta' x_{it})$
where $F(\cdot)$ is the cumulative distribution function of the standard normal.
The forecast error
$\varepsilon_{it} = y_{it} - F(\beta' x_{it})$
t = F(3'xit) + dt
di = (dil, d ,
i,
)'
- I
Ni=1
H fi-
Hi where fl = E(did)
N lH;'
n,-lHi=
ap
can be solved. However, in practice $\Omega_i$ is not known and it has to be estimated from the residuals which are themselves functions of $\beta$. Instead of considering the detailed expressions, AHH note that condition (16) implies a linear weighting of the cross products of sample residuals and their derivatives with respect to $\beta$. Thus, condition (16) can be represented as:
(17) $\dfrac{1}{N} \sum_{i=1}^{N} A_i G_i(y_i, x_i, \beta) = 0$
where $G_i(\cdot)$ is a $T^2 k$ dimensional vector containing all possible cross
ai-= 'xi + vi
vi - IN(O, C2) and xl = (Xi1,Xi2, ...
,XiT)
YT}[3I+ 1-']
1/2and 1 is a
where Ytare the normalization factors given by Yt= (att + r2)v
where
$y_t = A z_t + v_t$, $A = -B^{-1} C$ and $v_t = B^{-1} u_t$
Let
$E(v_t v_t') = \Omega$.
$\sum_t (y_t - A z_t)' \Omega^{-1} (y_t - A z_t)$
lXlij + Elij
li + Ulij
= O2i +
U2ij
$y_{it}^* = \alpha_i + \beta' x_{it} + u_{it}$
$y_{it} = y_{it}^*$ if $y_{it}^* > 0$
$= 0$ otherwise.
$d_{it} = 1$ if $y_{it}^* > 0$
$= 0$ otherwise.
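Let $\Phi(\cdot)$ denote the distribution function of the standard normal. The log-likelihood is
(21) $\mathrm{Log}\, L = \sum_{i,t} (1 - d_{it}) \, \mathrm{Log}\, \Phi\!\left( -\dfrac{\alpha_i + \beta' x_{it}}{\sigma} \right) + \sum_{i,t} d_{it} \left\{ -\tfrac{1}{2} \mathrm{Log}(2\pi\sigma^2) - \dfrac{1}{2\sigma^2} (y_{it} - \alpha_i - \beta' x_{it})^2 \right\}$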
Unlike the case of the linear model, in this model it is not possible to devise estimators of $\beta$ and $\sigma$ that are not functions of the fixed effects $\alpha_i$. Since the number of observations per cross-section unit is fixed (T is fixed and usually small) it is not possible to consistently estimate the fixed effects $\alpha_i$ and this inconsistency carries through to the estimates of $\beta$ and $\sigma$.
Heckman and MaCurdy (1980) estimate this model by iterative methods.
They argue that even though the estimates are inconsistent, from the practical point of view this might not be a serious problem if there are no lagged
dependent variables (1980, 59). This observation was based on the results of
a Monte Carlo study of the multivariate probit model with fixed effects and
T = 8 done by Heckman. Though no Monte Carlo study was done for the
fixed effects tobit model, one can presume that the same results would carry
through for the tobit model.
The procedure used by Heckman and MaCurdy is as follows: start with some initial values of $\beta$ and $\sigma$, maximize the likelihood function (21) with respect to $\alpha_i$, substitute this ML estimate in (21) and maximize it with respect to $\beta$ and $\sigma$ to obtain new estimates of $\beta$ and $\sigma$. This procedure is iterated till convergence. Heckman and MaCurdy report that they obtained rapid convergence (1980, 69). If $d_{it} = 1$ for all t, there is no problem in this model because we observe $y_{it}$ but if $d_{it} = 0$ for all t, then the corresponding estimate of $\alpha_i$ is infinite. Thus, these cross-section units have to be discarded. Let $I_1$ be the subset of the sample with these units discarded. For the ith cross-section unit the probability $d_{it} = 1$ for at least one t is:
Xit
Cr
Hit =
If $H_{it} > 0$ the person is in the labor force and we observe $H_{it}$ and $W_{it}$. If $H_{it} < 0$ the person is not in the labor force. The likelihood function for this model is, therefore,
(25) $L = \prod_{H_{it} > 0} f(W_{it}, H_{it}) \prod_{H_{it} < 0} \Phi(A_{it})$
where
Ait = y2xlit-
t Xit + U2it
Looking at the likelihood function (25) we see that the second expression does not involve $\alpha_{2i}$ since it depends on $H_{it}$ only. As for the first expression in (25), we can factor $f(W, H)$ into $f_1(W \mid H) \cdot f_2(H)$. Then it is only $f_1(W \mid H)$ that involves $\alpha_{2i}$. This conditional density is normal. Maximizing the likelihood function with respect to $\alpha_{2i}$, Heckman and MaCurdy (1980, 68) obtain
oti
i = 1,2, ..,
='Xit+ E.it
t = 1,2
+ Uit
IN(O, a2 ),
Uit -
IN(O, Ca2).
(26) $\mathrm{Prob}(y_{it} = 1 \mid \alpha_i, y_{i,t-1}) = \dfrac{\exp(\alpha_i + \gamma y_{i,t-1})}{1 + \exp(\alpha_i + \gamma y_{i,t-1})}$, $i = 1, 2, \ldots, N$; $t = 1, 2, \ldots, T$
$\gamma$ is the increase in the log-odds ratio of being in State 1 due to being in State 1 in the preceding period (after allowing for the individual specific effect).
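The interpretation of $\gamma$ as a shift in the log-odds can be verified directly from the logit form of the model. In the minimal sketch below the function names and parameter values are hypothetical. The difference in log-odds between the cases $y_{i,t-1} = 1$ and $y_{i,t-1} = 0$ equals $\gamma$ for any $\alpha_i$, so the odds themselves are multiplied by $\exp(\gamma)$.

```python
import math

def prob_state1(alpha, gamma, y_prev):
    """Prob(y_it = 1 | alpha_i, y_{i,t-1}) in the autoregressive logit model."""
    z = alpha + gamma * y_prev
    return math.exp(z) / (1.0 + math.exp(z))

def log_odds(p):
    """Log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

def log_odds_shift(alpha, gamma):
    """Change in log-odds of State 1 from having been in State 1 last period."""
    return log_odds(prob_state1(alpha, gamma, 1)) - log_odds(prob_state1(alpha, gamma, 0))
```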
Again, maximizing the joint likelihood function over $\alpha_i$ and $\gamma$ will not give a consistent estimate of $\gamma$ as $N \to \infty$ for fixed T. Intuitively the number of parameters to be estimated grows with N. Inconsistency is likely to be particularly serious in the autoregressive case. Chamberlain, therefore, suggests working with the conditional likelihood function. The sufficient
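$\dfrac{\exp\left( \gamma \sum_{t=2}^{T} y_{it} y_{i,t-1} \right)}{\sum_{d \in B_i} \exp\left( \gamma \sum_{t=2}^{T} d_t d_{t-1} \right)}$
where
$B_i = \{ d = (d_1, d_2, \ldots, d_T) \mid d_t = 1 \text{ or } 0,\ d_1 = y_{i1},\ \sum_t d_t = \sum_t y_{it},\ d_T = y_{iT} \}$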
Note that $\sum_t y_t = 1$ and $\sum_t y_t = 4$ give only one case each and hence the conditional probabilities do not involve $\gamma$. Even with more cases in a set, conditional probabilities need not involve $\gamma$. For instance, consider $y_1 = 1$,
year. Corcoran and Hill obtain these odds as 12.1 if the $\alpha_i$ are ignored and equal to 3.6 making allowance for $\alpha_i$ using the Chamberlain technique. Thus, making allowance for the individual specific effects $\alpha_i$ reduces the effect of past unemployment on current unemployment considerably.
There are two major limitations of the autoregressive model (26). The first is that the use of the alternative set and the conditional logit method might involve discarding a large number of sample observations and using a very small portion of the data, as in the paper by Corcoran and Hill (1985). The second problem is that the model does not permit the use of exogenous variables (this is a computational problem with this technique) and thus has limited use for policy analysis. All it tells is whether $y_t$ depends on $y_{t-1}$ or not, after allowing for individual specific effects $\alpha_i$. Both these problems can be solved by going to the random effects (probit) models. An alternative model, suggested by Chamberlain (1978, 30) that allows for the use of exogenous variables is the following:
(26a)
Prob(yit = 1 i,i,
t-
exp(o) + YYi,t-1
exp(oti) + exp( - 'xit)
T=1Yitand Sil
T=2YitYi,t-1.Sill gives
t- 1 +Uit
yit = i + 3xit + YYi,
= 0 otherwise
The methods of estimation for fixed effects and random effects tobit models carry through for this type of model, though again there is the problem of the initial conditions. It is better to start with a probability distribution for $y_{i0}$ rather than treat them as fixed.
One other problem that arises is the issue of lagged index vs. lagged dummy (or censored) variable models. For instance, in models (26) or (27) it is the lagged value of the realized variable $y_{it}$ rather than the latent variable $y_{it}^*$ that occurs. A lagged index model would be to specify
(28)
Yit= oti
'YxYi,t-1
+
+ Uit
That is, it is the latent variable $y_{i,t-1}^*$ that influences $y_{it}^*$, not the realized value. In studies of unemployment, this implies that it is the "propensity to be unemployed" in the last period rather than the actual state of employment or unemployment that determines the current probability of unemployment.
An example of the lagged index model is in Grether and Maddala (1982) where a model like (28), without the $\alpha_i$, has been estimated with data from the 1972, 1974, 1976 election panel study administered by the Center for Political Studies at the University of Michigan. Though the example is from the political science area, there are likely to be several examples in economics where the lagged index model makes sense. Some Monte Carlo evidence is also presented in Grether and Maddala (1982) on the performance of the suggested estimator.
In model (30), however, this probability depends on the entire history of the
process. Thus, model (29) implies a first order Markov chain but model (30)
does not, for the discrete sequence it generates. Hence, one could test
whether the effect of $y_{t-2}$ on $y_t$ is zero or not. But what if we specify (29) with
$y_{t-2}$ included? Now the distinction depends on whether the coefficient of
$y_{t-3}$ is zero, and so on. Chamberlain says that basing the test on the order of
autoregression in (29) is not attractive. With the availability of data on $x_t$,
one can reformulate the test as asking whether there is a dynamic response
to changes in $x_t$ or not. Consider
(31)
If y = 0 then
Prob(Yit = xit, xi,t-,
Thus, a test for state dependence is carried out by including lagged x's and
testing whether their coefficients are significant or not. Chamberlain suggests using this simple test in practice allowing for individual specific effects.
One problem with serial correlation is that it depends crucially on the
sampling interval. The smaller this interval, the higher the serial correlation.
Suppose our period of observation is one day. The probability that a person
who worked yesterday would work today would be very close to one.
Hence, Chamberlain argues that finding a significant coefficient for $\gamma$ in (31)
may say very little about the underlying process. The underlying process is a
complete description of the amount of time spent by the individual in each
state (say, employment and unemployment). The analysis now depends on
whether the sample is generated by point sampling or interval sampling. In
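$\mathrm{Prob}\left( y_1, y_2, \ldots, y_T \mid y_1, y_2, \sum_t y_t, \sum_t y_t y_{t-1}, y_{T-1}, y_T \right) = \dfrac{\exp\left( \gamma_2 \sum_{t=3}^{T} y_t y_{t-2} \right)}{\sum_{d \in B} \exp\left( \gamma_2 \sum_{t=3}^{T} d_t d_{t-2} \right)}$
where
$B = \{ d = (d_1, d_2, \ldots, d_T) \mid d_t = 0 \text{ or } 1,\ d_1 = y_1,\ d_2 = y_2,\ \sum_t d_t = \sum_t y_t,\ \sum_t d_t d_{t-1} = \sum_t y_t y_{t-1},\ d_{T-1} = y_{T-1},\ d_T = y_T \}$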
1 + exp(y2)
References
Amemiya, Takeshi. 1985. Advanced Econometrics. Cambridge: Harvard University Press.
Andersen, E. B. 1973. Conditional Inference and Models for Measuring. Copenhagen: Mentalhygiejnisk Forlag.
Anderson, T. W., and C. Hsiao. 1982. "Formulation and Estimation of Dynamic Models Using Panel Data." Journal of Econometrics 18(1):67-82.
24(1):21-35.
Balestra, P., and M. Nerlove. 1966. "Pooling Cross-Section and Time-Series Data in the Estimation of a Dynamic Model: The Demand for Natural Gas." Econometrica 34(4):585-612.
Chicago: University of Chicago Press.
Corcoran, Mary, and Martha S. Hill. 1985. "Reoccurence of Unemployment Among Young Adult Men." The Journal of Human Resources 20(2):165-83.
Fuller, W. A., and G. E. Battese. 1973. "Transformations for Estimation of Linear Models With Nested Error Structure." Journal of the American Statistical Association 68:626-32.