
The Board of Regents of the University of Wisconsin System

Limited Dependent Variable Models Using Panel Data


Author(s): G. S. Maddala
Source: The Journal of Human Resources, Vol. 22, No. 3 (Summer, 1987), pp. 307-338
Published by: University of Wisconsin Press
Stable URL: http://www.jstor.org/stable/145742 .

This content downloaded from 204.235.148.92 on Sat, 24 Jan 2015 06:07:53 AM


All use subject to JSTOR Terms and Conditions

Limited Dependent Variable Models Using Panel Data

G. S. Maddala

ABSTRACT

This paper presents a survey of the methods used in the estimation of limited dependent variable models with panel data. It first reviews some issues in the analysis of panel data when the dependent variables are continuous. The problems of fixed effects vs. random effects and serial correlation vs. state dependence are discussed with reference to continuous data. The paper then discusses these problems with reference to the panel logit, panel probit, and panel tobit models. The paper presents a comparative assessment of these models.

I. Introduction
The present paper reviews some problems that arise in the analysis of panel data when the dependent variables are truncated, censored, or qualitative. The paper discusses panel logit, panel probit, and panel tobit models with fixed and random effects. Hazard rate models and duration models are excluded (for these models, see Amemiya 1985, Chapter 11, and Heckman 1984). The purpose of this paper is to present an overview and comparative assessment of the panel logit, probit, and tobit models which would aid empirical researchers in this area in choosing the appropriate model. For a thorough discussion of particular models, readers can refer to the several papers by Chamberlain, Heckman, and others referred to at the end of the paper.
Before we proceed with our discussion of the limited dependent variable models, it would be helpful to review some issues in the analysis of panel data when the dependent variables are continuous. This will give us an idea of what problems one needs to address when using the panel data with limited dependent variables.
Since the panel data that we frequently encounter have a large number N of cross-section units but extend over short time periods T, the asymptotic results we would be interested in would be for fixed T and $N \to \infty$.

The author is a professor of economics at the University of Florida. He gratefully acknowledges financial support from the National Science Foundation and thanks James D. Adams for helpful comments. He claims responsibility for any errors.

II. Some Issues in the Analysis of Panel Data with Continuous Variables

A. Random Effects and Fixed Effects Models

One of the early uses of panel data in economics was in the context of estimation of production functions where allowance had to be made for unobserved effects specific to each production unit. The model used is now referred to as the "fixed effects" model and is given by

(1)  $y_{it} = \alpha_i + \beta' x_{it} + u_{it}, \quad i = 1, 2, \ldots, N, \quad t = 1, 2, \ldots, T$

where $y_{it}$ is the output and $x_{it}$ the vector of inputs for the ith firm in the tth period, $\alpha_i$ captures the firm specific unobserved inputs assumed to be constant over time, and $u_{it}$ is the error term. We assume $u_{it} \sim \mathrm{IID}(0, \sigma_u^2)$. This model can be estimated by including N intercept dummy variables or by differencing out the $\alpha_i$'s.
The next important step was the model with random effects by Balestra and Nerlove (1966) where $\alpha_i$ in (1) was treated as a random variable just like $u_{it}$. The model they considered was a dynamic model where $y_{i,t-1}$ is used as an explanatory variable. Since the introduction of $y_{i,t-1}$ creates some problems we discuss later, we will drop it for the present. Denoting $\bar{y}_i = \frac{1}{T}\sum_t y_{it}$ and $\bar{y} = \frac{1}{N}\sum_i \bar{y}_i$ we can decompose the total sum of squares $T_{yy} = \sum_{i,t}(y_{it} - \bar{y})^2$ into two components as

$T_{yy} = \sum_{i,t}(y_{it} - \bar{y}_i)^2 + \sum_{i,t}(\bar{y}_i - \bar{y})^2 = W_{yy} + B_{yy}$.

$W_{yy}$ measures within group variation and $B_{yy}$ measures between group variation in y. Using a similar decomposition for all the variances and covariances we get the estimator of $\beta$ from (1) as $\hat{\beta}_W = W_{xx}^{-1} W_{xy}$ where $W_{xy} = \sum_{i,t}(x_{it} - \bar{x}_i)(y_{it} - \bar{y}_i)$. This is known as the "within group estimator." Assuming $\alpha_i \sim \mathrm{IID}(0, \sigma_\alpha^2)$ and $u_{it} \sim \mathrm{IID}(0, \sigma_u^2)$ we get the generalized least squares estimator of $\beta$ in the random effects model as (see Maddala 1971):

(2)  $\hat{\beta}_{GLS} = (W_{xx} + \theta B_{xx})^{-1}(W_{xy} + \theta B_{xy})$

where

$\theta = \dfrac{\sigma_u^2}{\sigma_u^2 + T\sigma_\alpha^2}$


Fuller and Battese (1973) show that this is the same as using the ordinary least squares estimation with the transformed data:

(3)  $y_{it} - \lambda\bar{y}_i$ and $x_{it} - \lambda\bar{x}_i$ where $\lambda = 1 - \sqrt{\theta}$

This transformation is worth noting because:
(i) it puts the model in a form that is easily estimated and
(ii) it collapses to the fixed effects model if $\lambda = 1$.
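
The link between the within-group estimator, the GLS formula (2), and the transformation (3) can be checked numerically. The following is a minimal sketch, assuming a balanced panel with a single regressor and an intercept in the transformed regression (the between-group terms are measured around the grand mean). The function name `panel_estimators`, the simulated data, and the value of $\theta$ are illustrative assumptions, not part of the original exposition.

```python
import random

def panel_estimators(y, x, theta):
    """Within, GLS (Equation 2) and Fuller-Battese OLS slopes for a balanced
    panel with one regressor. y and x are lists of N lists of length T."""
    N, T = len(y), len(y[0])
    ybar_i = [sum(row) / T for row in y]            # unit means of y
    xbar_i = [sum(row) / T for row in x]            # unit means of x
    ybar, xbar = sum(ybar_i) / N, sum(xbar_i) / N   # grand means
    idx = [(i, t) for i in range(N) for t in range(T)]

    # Within-group and between-group sums of squares and cross products.
    W_xx = sum((x[i][t] - xbar_i[i]) ** 2 for i, t in idx)
    W_xy = sum((x[i][t] - xbar_i[i]) * (y[i][t] - ybar_i[i]) for i, t in idx)
    B_xx = sum((xbar_i[i] - xbar) ** 2 for i, t in idx)
    B_xy = sum((xbar_i[i] - xbar) * (ybar_i[i] - ybar) for i, t in idx)

    within = W_xy / W_xx
    gls = (W_xy + theta * B_xy) / (W_xx + theta * B_xx)

    # Fuller-Battese: OLS with an intercept on the quasi-demeaned data of (3).
    lam = 1.0 - theta ** 0.5
    ys = [y[i][t] - lam * ybar_i[i] for i, t in idx]
    xs = [x[i][t] - lam * xbar_i[i] for i, t in idx]
    my, mx = sum(ys) / len(ys), sum(xs) / len(xs)
    ols = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
           / sum((a - mx) ** 2 for a in xs))
    return within, gls, ols

# Hypothetical simulated panel (all numbers are made up for illustration).
random.seed(0)
N, T = 40, 4
alpha = [random.gauss(0, 1) for _ in range(N)]
x = [[random.gauss(0, 1) + 0.5 * alpha[i] for _ in range(T)] for i in range(N)]
y = [[alpha[i] + 2.0 * x[i][t] + random.gauss(0, 1) for t in range(T)]
     for i in range(N)]

within, gls, ols = panel_estimators(y, x, theta=0.2)
print(within, gls, ols)   # gls and ols coincide up to rounding
```

Setting `theta=0.0` gives $\lambda = 1$, in which case all three slopes reduce to the within-group estimator.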
The arguments for using "random effects" models instead of the fixed effects models are several. If we have a large number of cross-section units, instead of estimating N of the $\alpha_i$ as in the fixed effects models, we estimate only the mean and variance in the random effects models. This saves a lot of degrees of freedom. Another argument is the one in Maddala (1971) that the $\alpha_i$ measures firm specific effects that we are ignorant about just the same way that $u_{it}$ measures effects for the ith cross-section unit in the tth period that we are ignorant about. Thus, if $u_{it}$ is treated as a random variable, then there is no reason why $\alpha_i$ should not also be treated as random. Another argument used in the analysis of variance literature is that if we want to make inferences about only this set of cross-section units, then we should treat $\alpha_i$ as fixed. On the other hand, if we want to make inferences about the population from which these cross-section data came, we should treat $\alpha_i$ as random. In most of the applied work, the latter is the case. Finally, very often we have some time-invariant observations as well, e.g., years of schooling and family background variables in studies of wages. In this case the model is:

(4)  $y_{it} = \gamma' z_i + \beta' x_{it} + \alpha_i + u_{it}$

If we use the fixed effects model we cannot estimate the parameters $\gamma$, because $\alpha_i$ captures the effect of all the time-invariant variables. In this case one has to use the random effects model.
The choice between the random and the fixed effects formulations should also depend upon the statistical properties of the implied estimators. Later, in our discussion of dynamic models, we will show that with large values of N and small values of T, the fixed effects model gives inconsistent estimates of the parameters. This is because the estimation of the fixed-effects model, which amounts to differencing out the $\alpha_i$, produces a linear regression model with lagged dependent variables and serially correlated errors. Note that this problem arises with linear regression models, not just the limited dependent variable models discussed later on.
An interesting result occurs if the $\alpha_i$'s are not independent of the $x_{it}$'s. Mundlak (1978) argues that the dichotomy between fixed effects and random effects models disappears if we assume that $\alpha_i$ depend on the mean values of $x_{it}$, an assumption he regards as reasonable in many problems. In this case, the random effects and fixed effects models give the same estimator. We have
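(5)  $\alpha_i = \pi'\bar{x}_i + w_i$

Substituting this in (1), we get

$y_{it} = \pi'\bar{x}_i + \beta' x_{it} + w_i + u_{it}$

Using the Fuller and Battese argument, we note that the estimator of $\beta$ from the random effects model is obtained by the use of ordinary least squares for the equation:

(6)  $y_{it} - \lambda\bar{y}_i = \pi'(\bar{x}_i - \lambda\bar{x}_i) + \beta'(x_{it} - \lambda\bar{x}_i) + v_{it} = \beta'(x_{it} - \bar{x}_i) + \delta'\bar{x}_i + v_{it}$

where $\delta = (\pi + \beta)(1 - \lambda)$ and $\lambda$ is defined in (3). Since $\bar{x}_i$ is orthogonal to $x_{it} - \bar{x}_i$ and since $\mathrm{Cov}[(x_{it} - \bar{x}_i)(y_{it} - \lambda\bar{y}_i)] = W_{xy}$ we get the result that $\hat{\beta} = W_{xx}^{-1}W_{xy}$, the within-group estimator. Thus, in this particular case the random effects model gives us the same estimator as the fixed effects model.
We can also obtain an estimate of $\pi$. We have $\hat{\delta} = (\sum_i \bar{x}_i\bar{x}_i')^{-1}(\sum_i \bar{x}_i\bar{y}_i)(1 - \lambda)$ and hence we get

$\hat{\pi} = (\sum_i \bar{x}_i\bar{x}_i')^{-1}(\sum_i \bar{x}_i\bar{y}_i) - \hat{\beta}$
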
The fixed effects and random effects models also yield the same behavior if instead of (5) we have

(7)  $\alpha_i = \pi'\bar{x}_i + \gamma' z_i + w_i$

because in this case Equation (6) can be written as

$y_{it} - \lambda\bar{y}_i = \beta'(x_{it} - \bar{x}_i) + \delta'\bar{x}_i + (1 - \lambda)\gamma' z_i + v_{it}$

and since $\frac{1}{T}\sum_t (x_{it} - \bar{x}_i) z_i = 0$ we again get the estimator of $\beta$ as the within-group estimator $W_{xx}^{-1}W_{xy}$.

However, we still have the problem that the coefficients of time invariant variables cannot be estimated. Hausman and Taylor (1981) specify a general model like (4) and consider the case where $\alpha_i$ can be correlated with some of the $z_i$ and $x_{it}$. They decompose $z_i$ into two groups of $g_1$ variables $z_{1i}$ which are not correlated with $\alpha_i$ and $g_2$ variables $z_{2i}$ which are correlated with $\alpha_i$. Similarly, they decompose the variables $x_{it}$ into two groups of $k_1$ variables $x_{1it}$ which are not correlated with $\alpha_i$ and $k_2$ variables $x_{2it}$ which are correlated with $\alpha_i$. We can, therefore, write

$\alpha_i = \pi_1'\bar{x}_{2i} + \pi_2' z_{2i} + w_i$

Substituting this in (4) we get (noting that $x_{it}$ and $z_i$ are now being partitioned)

(8)  $y_{it} = \beta_1' x_{1it} + \beta_2' x_{2it} + \gamma_1' z_{1i} + \gamma_2' z_{2i} + \pi_1'\bar{x}_{2i} + \pi_2' z_{2i} + w_i + u_{it}$

We then have

(9)  $\bar{y}_i = \beta_1'\bar{x}_{1i} + \beta_2'\bar{x}_{2i} + \gamma_1' z_{1i} + \gamma_2' z_{2i} + \pi_1'\bar{x}_{2i} + \pi_2' z_{2i} + w_i + \bar{u}_i$

Subtracting (9) from (8) we see that the within-group estimator $\hat{\beta}_W = W_{xx}^{-1}W_{xy}$ gives us a consistent estimator of $\beta$. The question is about the estimation of the parameters $\gamma_1$ and $\gamma_2$. Note that if we use the standard methods of estimation for random effects models with Equation (8) we can get estimates of $\gamma_1$ and $\pi_1$ but we cannot get separate estimates of $\gamma_2$ and $\pi_2$. Hausman and Taylor suggest an instrumental variable procedure to get a separate estimate of $\gamma_2$. This procedure can be applied if $k_1 \geq g_2$, in which case $\bar{x}_{1i}$ can be used as instruments for the endogenous variables $z_{2i}$. Thus, the paper by Hausman and Taylor shows that one can estimate random effects models with instrumental variable methods if the random effects are correlated with some of the explanatory variables, provided some conditions are satisfied.
B. Specification Tests
Before undertaking the elaborate analysis of models where $E(\alpha_i \mid x_{it}, z_i) \neq 0$, we might estimate some simplified models and test the hypothesis $E(\alpha_i \mid x_{it}, z_i) = 0$. Complicated models can be estimated if this hypothesis is rejected. For this purpose Hausman (1978) suggests the following specification test: if the null hypothesis is correct, then the GLS estimator from the random effects model is both consistent and efficient. On the other hand, the within-group estimator $\hat{\beta}_W$ is consistent regardless of whether the null hypothesis is valid, since all time invariant effects cancel. Thus, we can construct the difference $\hat{q} = \hat{\beta}_W - \hat{\beta}_{GLS}$, with variance $V(\hat{q}) = V(\hat{\beta}_W) - V(\hat{\beta}_{GLS})$. Hence we can use $m = \hat{q}'[\hat{V}(\hat{q})]^{-1}\hat{q}$ as a $\chi^2$ statistic with k degrees of freedom, where k is the dimensionality of $\beta$. The random effects model is rejected in favor of the fixed effects model if m is sufficiently high.
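
The statistic m is a quadratic form and is simple to compute once the two estimators and their covariance matrices are available. The following is a minimal sketch using only the Python standard library; the helper names `solve` and `hausman_statistic` and the input numbers are hypothetical, and the estimates themselves are assumed to come from elsewhere.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]      # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):                     # back substitution
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z

def hausman_statistic(b_within, b_gls, V_within, V_gls):
    """m = q' [V(b_W) - V(b_GLS)]^{-1} q with q = b_W - b_GLS."""
    k = len(b_within)
    q = [b_within[j] - b_gls[j] for j in range(k)]
    Vq = [[V_within[r][c] - V_gls[r][c] for c in range(k)] for r in range(k)]
    z = solve(Vq, q)                                 # z = Vq^{-1} q
    return sum(q[j] * z[j] for j in range(k))

# Made-up estimates for k = 2, for illustration only.
m = hausman_statistic([1.2, 0.4], [1.0, 0.5],
                      [[0.05, 0.0], [0.0, 0.04]],
                      [[0.01, 0.0], [0.0, 0.02]])
print(m)   # compare with a chi-square critical value with k = 2 degrees of freedom
```

In finite samples the estimated difference of covariance matrices need not be positive definite, in which case the statistic computed this way is not meaningful; the sketch does not guard against that.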
Yet another specification test is to test $\sigma_\alpha^2 = 0$. This is the case where the individual components do not exist and we can use the OLS method. A test for this hypothesis is given in Breusch and Pagan (1980) who show that under the null hypothesis, if we denote the residuals from the least squares regression by $\hat{u}_{it}$, then

$m = \dfrac{NT}{2(T-1)}\left[\dfrac{\sum_{i=1}^{N}\left(\sum_{t=1}^{T}\hat{u}_{it}\right)^2}{\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{u}_{it}^2} - 1\right]^2$

has a $\chi^2$-distribution with 1 degree of freedom. Whether this test precedes or follows the Hausman test depends upon the type of testing strategy we follow. If we start with a simple model and then progressively go to more complicated models, we first apply the test $\sigma_\alpha^2 = 0$ and if it is rejected we next test the hypothesis $E(\alpha_i \mid x_{it}, z_i) = 0$. On the other hand, if we start with a general model and then progressively simplify it, we would follow the reverse procedure.
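
The Breusch and Pagan statistic depends only on the pooled OLS residuals. A minimal sketch of the computation follows, assuming a balanced panel whose residuals are stored as N lists of length T; the function name `breusch_pagan_lm` and the toy residuals are hypothetical.

```python
def breusch_pagan_lm(resid):
    """LM statistic for sigma_alpha^2 = 0 from pooled OLS residuals.
    resid is a list of N lists, each holding the T residuals of one unit."""
    N, T = len(resid), len(resid[0])
    sum_sq_of_sums = sum(sum(row) ** 2 for row in resid)    # sum_i (sum_t u_it)^2
    sum_of_sq = sum(u * u for row in resid for u in row)    # sum_i sum_t u_it^2
    return N * T / (2.0 * (T - 1)) * (sum_sq_of_sums / sum_of_sq - 1.0) ** 2

# Toy residuals: each unit's residuals share a sign, as a unit effect would imply.
resid = [[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]]
print(breusch_pagan_lm(resid))   # compare with a chi-square(1) critical value
```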
From the computational point of view, it is more convenient to start with a simple model and then progressively go to more complicated models. This is also the modeling strategy suggested by Milton Friedman, T. C. Koopmans, and Karl Popper, who have argued that the usefulness of models depends on our ability to explain complex phenomena in terms of simple models. However, from the statistical point of view, the significance levels for testing any hypotheses are not well known for this type of modeling strategy. On the other hand, the statistical theory of testing is more tractable if we start with a general model and progressively simplify it by introducing a series of restrictions on the parameters. Many empirical researchers prefer the procedure of starting with simple models and progressively complicating them. A common illustration of this procedure is the estimation of a regression model assuming no serial correlation in the errors, and then estimating an equation with serial correlation if the Durbin-Watson test statistic is significant. We will presently discuss the limitations of this procedure [in Equations (10) and (11)] when we discuss dynamic models.
C. Dynamic Models

One final issue we have to discuss is that of dynamic models. In the case of dynamic models with fixed T, the fixed effects model gives inconsistent estimates. It is easy to see this if we consider T = 2 as illustrated in Chamberlain (1980, 227). Consider

$y_{it} = \beta y_{i,t-1} + \alpha_i + u_{it}$

and condition it on $y_{i0}$. The fixed effects model in this case amounts to estimating the regression equation:

$y_{i2} - y_{i1} = \beta(y_{i1} - y_{i0}) + u_{i2} - u_{i1}$.

Since the error term is correlated with $y_{i1}$ we get the result that the estimator of $\beta$ is inconsistent.
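
A small simulation makes the size of this inconsistency concrete. The sketch below assumes unit-variance normal $\alpha_i$ and $u_{it}$ and a stationary initial condition for $y_{i0}$; these are assumptions of the sketch, not of the text. Under them the probability limit of the differenced OLS estimator works out to $(\beta - 1)/2$, however large N is. The function name `first_difference_estimate` is hypothetical.

```python
import random

def first_difference_estimate(beta, N, seed=0):
    """OLS slope of (y_i2 - y_i1) on (y_i1 - y_i0) for a simulated T = 2 panel."""
    rng = random.Random(seed)
    sxy = sxx = 0.0
    for _ in range(N):
        a = rng.gauss(0, 1)                           # fixed effect alpha_i
        # Stationary start for y_i0 (an assumption of this sketch).
        y0 = a / (1 - beta) + rng.gauss(0, 1) / (1 - beta ** 2) ** 0.5
        y1 = beta * y0 + a + rng.gauss(0, 1)
        y2 = beta * y1 + a + rng.gauss(0, 1)
        dx, dy = y1 - y0, y2 - y1                     # alpha_i is differenced out
        sxy += dx * dy
        sxx += dx * dx
    return sxy / sxx

beta = 0.5
estimate = first_difference_estimate(beta, N=50000)
print(estimate, (beta - 1) / 2)   # the estimate stays far from beta = 0.5
```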
Regarding the random effects model, Balestra and Nerlove (1966) encountered some problems with maximum likelihood (ML) estimation. Based on the results of a Monte Carlo study, Nerlove (1971) argues against the use of the ML method in dynamic models with error components (i.e., random effects) and discusses alternatives to the ML method. However, the problems he encounters with the ML method are due to using the ML method conditional on $y_{i0}$ and not taking account of the distribution of $y_{i0}$.¹
An important issue that arises in dynamic models is that of serial correlation vs. "state dependence," that is, whether any direct effects of the lagged
dependent variable exist apart from those generated indirectly by serial
correlation of the errors. An alternative terminology for the "serial correlation model" vs. "state dependence model" is models with "error dynamics"
and "system dynamics," respectively. This issue is not special to panel data
and arises in the usual regression models as well. To clarify this problem, we
will drop the subscript i and consider a single cross-section unit. For example, if we consider the regression model with no lagged variables but serially
correlated errors:
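(10)  $y_t = \beta x_t + u_t, \quad u_t = \rho u_{t-1} + e_t$

we can write it as

$y_t = \rho y_{t-1} + \beta x_t - \rho\beta x_{t-1} + e_t$

This is the same as the dynamic regression equation

(11)  $y_t = \gamma y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + e_t$

with the restriction $\gamma\beta_0 + \beta_1 = 0$.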
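
The algebra linking (10) and (11) can be verified on simulated data. The sketch below generates a series from (10) and checks that it satisfies (11) exactly with $\gamma = \rho$, $\beta_0 = \beta$ and $\beta_1 = -\rho\beta$, so that the restriction holds. The function name `check_common_factor` and the parameter values are illustrative assumptions.

```python
import random

def check_common_factor(beta=1.5, rho=0.6, n=200, seed=1):
    """Generate data from (10) and return the largest violation of (11),
    together with the value of gamma*beta0 + beta1."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    e = [rng.gauss(0, 1) for _ in range(n)]
    u = [e[0]]
    for t in range(1, n):
        u.append(rho * u[t - 1] + e[t])               # AR(1) errors
    y = [beta * x[t] + u[t] for t in range(n)]        # Equation (10)

    gamma, beta0, beta1 = rho, beta, -rho * beta      # implied coefficients of (11)
    gap = max(abs(y[t] - (gamma * y[t - 1] + beta0 * x[t]
                          + beta1 * x[t - 1] + e[t]))
              for t in range(1, n))
    return gap, gamma * beta0 + beta1

print(check_common_factor())   # both numbers are zero up to rounding error
```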


The two models thus differ in this restriction. If the restriction $\gamma\beta_0 + \beta_1 = 0$ holds, the apparent effect of $y_{t-1}$ on $y_t$ is due to serial correlation in the errors. On the other hand, if this restriction does not hold then $y_{t-1}$ has an effect on $y_t$ and we have what is known as "state dependence." Thus, an estimation of Equation (11) and a test of the restriction $\gamma\beta_0 + \beta_1 = 0$ will enable us to discriminate between the "serial correlation model" and the "state dependence model."
Note that a common procedure that is adopted is to estimate the regression model

$y_t = \beta x_t + u_t$

and to estimate the serial correlation model if the Durbin-Watson statistic is significant. This procedure is wrong because the Durbin-Watson statistic could be significant not because we have serial correlation in the errors but because we have omitted $y_{t-1}$ and $x_{t-1}$ from the explanatory variables. This omission is called "mis-specified dynamics." Thus, what we are doing is ascribing to "error dynamics" what is really "system dynamics." The proper procedure is to first estimate Equation (11) and test for the restriction $\gamma\beta_0 + \beta_1 = 0$. If this is not rejected, then we test for serial correlation by testing $\rho = 0$. Thus, the test for the serial correlation should be undertaken after we have determined that what we have is perhaps a serial correlation model. The use of the Durbin-Watson statistic at the beginning is not a correct procedure.

1. In his Monte Carlo study, Nerlove generated 20 observations and discarded the first ten and used the last ten observations in the estimation using $y_{i0}$ as the initial observation and using the ML method assuming $y_{i0}$ as known. However, from the way the data are generated in his study, one can write down the probability distribution of $y_{i0}$. When this was incorporated into the likelihood function, I found that there were no more boundary solutions with the ML method, as was found in Nerlove's Monte Carlo study. Thus, what the Nerlove Monte Carlo study shows is that when T is small, one has to be very careful in defining the likelihood function. The result will depend crucially on whether this is taken conditionally on the initial values or whether the distribution of the initial value is taken into account. This point has also been emphasized in the papers by Anderson and Hsiao (1982) and Bhargava and Sargan (1983).

All these comments apply to panel data as well. To simplify the exposition we have considered a single cross-section unit and omitted the subscript i.
Returning to the case of panel data, the serial correlation model and the state dependence model corresponding to Equation (4) are:
(i) The serial correlation model:

$y_{it} = \gamma' z_i + \beta' x_{it} + \alpha_i + w_{it}, \quad w_{it} = \rho w_{i,t-1} + u_{it}$

(ii) The state dependence model:

$y_{it} - \delta y_{i,t-1} = \gamma' z_i + \beta' x_{it} + \alpha_i + u_{it}$


Anderson and Hsiao (1982) discuss the estimation of these models under eight different assumptions. The most reasonable assumption, however, is what the model itself implies about the distribution of $y_{i0}$. The estimation of the models under this assumption is discussed in Anderson and Hsiao's paper as well as in Lee (1981) and Bhargava and Sargan (1983).
Thus, in dynamic models the more important issues are those of distinguishing between serial correlation and "state dependence" and the problem of estimating the model under the assumption about $y_{i0}$ that is not ad hoc but is derived by implication from the model itself. Given the complexity of the estimation procedure (which is, however, feasible) for models with continuous variables, it is to be expected that it would be almost impossible to implement a similar procedure for the logit, probit, and tobit models.
We have reviewed some issues in the estimation of continuous variable models based on panel data. We will see how many of these results carry through for the logit, probit, and tobit models and what computational complexities arise.
In the remaining sections of this paper we will discuss the following models:
(i) Fixed effects logit and probit models. Here we argue that for large N and small T (which is typically the case) ML estimation of the fixed effects model gives inconsistent estimates of the parameters. However, in the logit model one can obtain consistent estimates using the conditional ML method (conditioning on sufficient statistics for the fixed effects). Such conditioning is not possible with the probit model. Hence, for the analysis of the fixed effects model, the logit model is the appropriate one.
(ii) Random effects logit and probit models. For the analysis of random effects, the probit model is the appropriate one. The random effects produce correlations among the errors and the multivariate logistic distribution is too restrictive for this purpose (since it implies that all correlations are 0.5). The probit model is based on the multivariate normal distribution which is more flexible. For the analysis of random effects probit models we consider three models suggested by Heckman and Willis (1976), Avery, Hansen, and Hotz (1983), and Chamberlain (1985).
(iii) Fixed effects and random effects tobit models. These are extensions of the previous models. Illustrative examples are in Heckman and MaCurdy (1980) and Hausman and Wise (1979).
(iv) Autoregressive logit and probit models. These are extensions of models in (i) and (ii) to the case where lagged y's are included as explanatory variables. The conditional logit model can be extended to this case but only under some restrictive assumptions. The random effects probit model can be easily extended to this case.
We will now discuss these four categories of models.

III. Fixed Effects Logit and Probit Models


In the case of continuous variables and no autoregressions, the fixed effects model gives consistent estimates of the slope parameters $\beta$ in Equation (1). This is not the case when the dependent variable $y_{it}$ is observed only as a qualitative variable and there are only a few time-series observations per individual as is usually the case. Andersen (1973) and Chamberlain (1980) demonstrate this for the logit model and suggest a conditional likelihood approach. The idea is to consider the likelihood function conditional on sufficient statistics for the incidental parameters $\alpha_i$. In the logit model, as in model (1), these sufficient statistics are $\sum_t y_{it}$ for $\alpha_i$. For the logit model the conditional likelihood approach results in a computationally convenient estimator. The conditional ML estimator of $\beta$ is consistent, provided that the conditional likelihood function satisfies regularity conditions, which impose mild restrictions on the $\alpha_i$. Chamberlain (1980) also shows that the standard errors obtained by the usual conditional logit programs can be used as the asymptotic standard errors for the conditional ML estimator of $\beta$.
In the use of the conditional logit approach we discard alternative sets for which $\sum_t y_{it} = 0$ or $\sum_t y_{it} = T$ (i.e., persons who never change states) because they contribute nothing to the conditional likelihood function (their conditional probability is one). We will illustrate the method for T = 2 and 3. Let the logit model be:

$\mathrm{Prob}(y_{it} = 1) = \dfrac{\exp(\beta' x_{it} + \alpha_i)}{1 + \exp(\beta' x_{it} + \alpha_i)}$

Then

$\mathrm{Prob}(0,1) = \dfrac{1}{1 + \exp(\beta' x_{i1} + \alpha_i)} \cdot \dfrac{\exp(\beta' x_{i2} + \alpha_i)}{1 + \exp(\beta' x_{i2} + \alpha_i)}$

and

$\mathrm{Prob}(1,0) = \dfrac{\exp(\beta' x_{i1} + \alpha_i)}{1 + \exp(\beta' x_{i1} + \alpha_i)} \cdot \dfrac{1}{1 + \exp(\beta' x_{i2} + \alpha_i)}$

Thus, since (1,0) and (0,1) are mutually exclusive,

$\mathrm{Prob}[(1,0) \mid (1,0) \text{ or } (0,1)] = \dfrac{\mathrm{Prob}(1,0)}{\mathrm{Prob}(1,0) + \mathrm{Prob}(0,1)} = \dfrac{\exp[\beta'(x_{i1} - x_{i2})]}{D}$

and

$\mathrm{Prob}[(0,1) \mid (1,0) \text{ or } (0,1)] = \dfrac{1}{D}$

where

$D = 1 + \exp[\beta'(x_{i1} - x_{i2})]$

The $\alpha_i$'s have been eliminated and we have a standard logit model to estimate, in which changes in the $x_{it}$'s are used to explain changes in the dichotomous dependent variables. Björklund (1985) uses this two-period model to study the relationship between unemployment and mental health in Sweden.
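
The cancellation of $\alpha_i$ in the two-period case is easy to confirm numerically. The following is a minimal sketch with a scalar regressor; the function names and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def logit_prob(index):
    """Prob(y = 1) in the logit model for a given index beta*x + alpha."""
    return 1.0 / (1.0 + math.exp(-index))

def cond_prob_10(beta, x1, x2, alpha):
    """Prob[(1,0) | (1,0) or (0,1)] computed from the unconditional logit
    probabilities, which do depend on alpha."""
    p1 = logit_prob(beta * x1 + alpha)
    p2 = logit_prob(beta * x2 + alpha)
    p10 = p1 * (1.0 - p2)
    p01 = (1.0 - p1) * p2
    return p10 / (p10 + p01)

def cond_prob_10_closed_form(beta, x1, x2):
    """exp[beta (x1 - x2)] / D with D = 1 + exp[beta (x1 - x2)]."""
    e = math.exp(beta * (x1 - x2))
    return e / (1.0 + e)

beta, x1, x2 = 0.8, 1.2, -0.4
for alpha in (-3.0, 0.0, 2.5):
    # The two columns agree for every value of alpha.
    print(cond_prob_10(beta, x1, x2, alpha), cond_prob_10_closed_form(beta, x1, x2))
```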
For the case of T = 3, we have to consider two different sets: $\sum_t y_{it} = 1$ and $\sum_t y_{it} = 2$. For the set $\sum_t y_{it} = 1$ we get, since $\exp(\alpha_i)$ cancels,

$\mathrm{Prob}(1,0,0 \mid \sum_t y_{it} = 1) = \exp[\beta'(x_{i1} - x_{i3})]/D_1$
$\mathrm{Prob}(0,1,0 \mid \sum_t y_{it} = 1) = \exp[\beta'(x_{i2} - x_{i3})]/D_1$
$\mathrm{Prob}(0,0,1 \mid \sum_t y_{it} = 1) = 1/D_1$

where $D_1 = 1 + \exp[\beta'(x_{i1} - x_{i3})] + \exp[\beta'(x_{i2} - x_{i3})]$.
For the set $\sum_t y_{it} = 2$, we get, by cancelling $\exp(2\alpha_i)$,

$\mathrm{Prob}(1,1,0 \mid \sum_t y_{it} = 2) = \exp[\beta'(x_{i2} - x_{i3})]/D_2$
$\mathrm{Prob}(0,1,1 \mid \sum_t y_{it} = 2) = \exp[\beta'(x_{i2} - x_{i1})]/D_2$
$\mathrm{Prob}(1,0,1 \mid \sum_t y_{it} = 2) = 1/D_2$

where $D_2 = 1 + \exp[\beta'(x_{i2} - x_{i3})] + \exp[\beta'(x_{i2} - x_{i1})]$.
For general T, we have to consider the sets $\sum_t y_{it} = 1, 2, \ldots, (T - 1)$.
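
For general T the conditional probabilities can be obtained by enumerating every 0/1 sequence with the given sum. The sketch below does this for a scalar regressor; the function name `conditional_probs` and the numbers are hypothetical. Direct enumeration is chosen for transparency and becomes expensive for large T.

```python
import math
from itertools import combinations

def conditional_probs(beta, x, s):
    """Prob(sequence | sum_t y_t = s) for all 0/1 sequences of length T = len(x).
    The fixed effect enters every such sequence as exp(s * alpha) and cancels."""
    T = len(x)
    weights = {}
    for ones in combinations(range(T), s):            # periods in which y_t = 1
        seq = tuple(1 if t in ones else 0 for t in range(T))
        weights[seq] = math.exp(beta * sum(x[t] for t in ones))
    total = sum(weights.values())
    return {seq: w / total for seq, w in weights.items()}

beta, x = 0.8, [0.2, -1.0, 0.7]
for seq, p in sorted(conditional_probs(beta, x, 1).items()):
    print(seq, p)
for seq, p in sorted(conditional_probs(beta, x, 2).items()):
    print(seq, p)
```

For T = 3 these values coincide with the expressions in terms of $D_1$ and $D_2$ given above.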

Chamberlain (1980, 231) shows that the conditional ML method can be extended to the multinomial logit model as well as the log-linear model.
By contrast, the fixed effects probit model is difficult to implement computationally. The conditional ML method does not produce computational simplifications as in the logit model because the fixed effects do not cancel out. This implies that all N fixed effects must be estimated as part of the estimation procedure. Further, this also implies that, since the estimates of the fixed effects are inconsistent for small T, the fixed effects probit model gives inconsistent estimates for $\beta$ as well. Thus, in applying the fixed effects models to qualitative dependent variables based on panel data, the logit model and the log-linear models seem to be the only choices. However, in the case of random effects models it is the probit model that is computationally tractable rather than the logit model. We now, therefore, turn to the random effects probit model.

IV. Random Effects Probit Models


We discussed earlier the arguments in favor of random effects models in panel data. With random effects, the composite error term in (1) is correlated over time within each cross-section unit even if the $u_{it}$ are IID. With the logit model, where the errors are assumed to have a logistic distribution, we need to use the multivariate logistic distribution, whereas with the probit model we need to use the multivariate normal distribution. The multivariate logistic distribution has the disadvantage that the correlations are all constrained to be 1/2 (see Johnson and Kotz 1972, 293-94). Though some generalizations are possible, the multivariate logistic distribution does not permit much flexibility. Hence, when we consider random effects models we will confine ourselves to random effects probit models.
Two important properties of the random effects probit models are worth mentioning:
(i) Unlike the estimates from the fixed effects probit model, the estimates from the random effects probit model are consistent.
(ii) The random effects probit model is based on the multivariate normal distribution. However, ignoring the correlations among the errors and using a standard probit estimation method with pooled data produces consistent (though inefficient) estimates. [Robinson (1982) proves this for the tobit model but the result holds for the probit model as well.] These estimates can be used as initial values in any iterative method to compute the ML estimates.
A. The Heckman and Willis Model

The first application of the random effects probit model is that of Heckman and Willis (1976). Their model is:

$y^*_{it} = \beta' x_{it} + \alpha_i + u_{it}, \quad i = 1, 2, \ldots, N, \quad t = 1, 2, \ldots, T$

with

(12)  $y_{it} = 1$ if $y^*_{it} > 0$, $y_{it} = 0$ otherwise

$\alpha_i \sim IN(0, \sigma_\alpha^2)$, $u_{it} \sim IN(0, \sigma_u^2)$ and $\alpha_i$ and $u_{it}$ are mutually independent as well as independent of $x_{it}$. Define $\sigma^2 = \sigma_u^2 + \sigma_\alpha^2$ and $\rho = \sigma_\alpha^2/\sigma^2$. Also define $v_{it} = u_{it}/\sigma_u$, $q_i = \alpha_i/\sigma_\alpha$ and $\beta^* = \beta/\sigma$. Then

$y_{it} = 1 \iff \dfrac{u_{it}}{\sigma_u} > -\dfrac{\beta' x_{it}}{\sigma_u} - \dfrac{\sigma_\alpha q_i}{\sigma_u}$

If we now define

$a_{it} = -\dfrac{\beta^{*\prime} x_{it} + \rho^{1/2} q_i}{(1 - \rho)^{1/2}}$

then we can restate the above condition as

$y_{it} = 1 \iff v_{it} > a_{it}$
$y_{it} = 0 \iff v_{it} \leq a_{it}$

Conditional on given values of $\alpha_i$, the $y^*_{it}$ are independent normal. Hence, conditional on given values of $\alpha_i$, we can easily write down the joint probability density of $y_{it}$. To get the unconditional probability density, we multiply this density by the density function for $\alpha_i$ and integrate with respect to $\alpha_i$. Note that $\beta$ and $\sigma_\alpha$ are estimable only up to a scale factor.
The joint density of the $y_{it}$ is, therefore,

$\prod_{i=1}^{N} \int_{-\infty}^{\infty} \prod_{t=1}^{T} [1 - F(a_{it})]^{y_{it}} [F(a_{it})]^{1 - y_{it}} \dfrac{1}{\sqrt{2\pi}} e^{-q_i^2/2} \, dq_i$

where $F(\cdot)$ is the cumulative distribution function of the standard normal. Thus, the expression is reduced to the evaluation of the expressions $F(a_{it})$ and a single integral for which good approximations are available. Butler and Moffitt (1982) provide an efficient computational algorithm for the evaluation of this integral and thus the ML estimation of the random effects probit model. Their program is much faster than the one used by Heckman and Willis. The use of the probit model with the entire set of NT observations ignoring the correlations will give an initial consistent estimate of $\beta$.
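
The single integral in the likelihood can be evaluated by any one-dimensional numerical rule. The sketch below computes the likelihood contribution of one cross-section unit using composite Simpson's rule on a truncated range; this rule is used here only for transparency and is not the algorithm of Butler and Moffitt. The function names, the truncation limit, and the inputs are assumptions of the sketch.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def re_probit_likelihood(y, xb, rho, n_steps=400, lim=8.0):
    """Likelihood contribution of one unit in the equicorrelated probit model.
    y: list of T outcomes (0 or 1); xb: list of T index values beta*'x_it;
    rho: correlation between composite errors, 0 <= rho < 1."""
    def integrand(q):
        val = math.exp(-0.5 * q * q) / math.sqrt(2.0 * math.pi)  # density of q_i
        for y_t, xb_t in zip(y, xb):
            a = -(xb_t + math.sqrt(rho) * q) / math.sqrt(1.0 - rho)
            F = norm_cdf(a)
            val *= (1.0 - F) if y_t == 1 else F
        return val

    # Composite Simpson's rule on [-lim, lim]; n_steps must be even.
    h = 2.0 * lim / n_steps
    total = integrand(-lim) + integrand(lim)
    for j in range(1, n_steps):
        total += (4 if j % 2 else 2) * integrand(-lim + j * h)
    return total * h / 3.0

# Hypothetical unit with T = 3 observations.
print(re_probit_likelihood([1, 0, 1], [0.3, -0.2, 0.5], rho=0.4))
```

Two simple checks follow from the model: with $\rho = 0$ the value is a product of independent probit probabilities, and with T = 1 it equals $F(\beta^{*\prime} x_{it})$ whatever the value of $\rho$.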
B. The Avery-Hansen-Hotz (AHH) Model
The random effects model given by (12) assumes that the correlation between successive disturbances for the same individual unit is a constant $\rho$.
This specification is often referred to as the specification of "equicorrelation." If we relax this assumption, then the estimation of the model involves
T-fold integrals and the simplification suggested by Heckman and Willis
does not hold.
Avery, Hansen, and Hotz (1983), henceforth referred to as AHH, suggest
using the method of moments (MOM) estimators for this case. To understand what the method of moments is, let us first consider the usual probit
model. Thus, we consider a single time period and drop the subscript t.
The model now is,

$y^*_i = \beta' x_i + u_i, \quad u_i \sim IN(0, 1)$
$y_i = 1$ if $y^*_i > 0$, $y_i = 0$ otherwise

We get

$E(y_i \mid x_i) = F(\beta' x_i)$ and $\mathrm{Var}(y_i \mid x_i) = F(\beta' x_i)[1 - F(\beta' x_i)]$

where $F(\cdot)$ is the cumulative distribution function of the standard normal. The error

$\varepsilon_i = y_i - F(\beta' x_i)$

is orthogonal to functions of the vector $x_i$. Note that $E(\varepsilon_i \mid x_i) = 0$ and that $\varepsilon_i$ are heteroskedastic with variance depending on $\beta' x_i$.
Analogous to the least squares method, the method of moments suggests estimating $\beta$ by solving the equation

(13)  $\sum_{i=1}^{n} [y_i - F(\beta' x_i)] \, g(x_i, \beta) = 0$

where $g(\cdot)$ is a suitably chosen function. One can choose $g(x_i, \beta) = x_i$ but this cannot be justified since $F(\beta' x_i)$ is a nonlinear function. Another alternative is

$\dfrac{\partial F(\beta' x_i)}{\partial \beta} = f(\beta' x_i) x_i$

where $f(\cdot)$ is the density function of the standard normal. However, since $\varepsilon_i$ are heteroskedastic we have to weight this in inverse proportion to $\mathrm{Var}(\varepsilon_i)$. The GLS method implies choosing

(14)  $g(x_i, \beta) = \dfrac{\partial F(\beta' x_i)/\partial \beta}{\mathrm{Var}(y_i \mid x_i)} = \dfrac{f(\beta' x_i) x_i}{F(\beta' x_i)[1 - F(\beta' x_i)]}$

With this choice Equation (13) gives the first-order conditions for the ML estimation of the probit model [see Maddala (1983), 26, for the first-order conditions]. We can also consider $g(x_i, \beta)$ as a weighted nonlinear instrumental variable. Thus, the method of moments is related to GLS estimation method and weighted nonlinear instrumental variable methods.
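
The equivalence between Equation (13) with the weight (14) and the probit first-order condition can be checked numerically. The sketch below uses a scalar regressor without an intercept, a small made-up data set, and bisection to solve the moment equation; the root finder, its bracket, the function names, and the data are all assumptions of the sketch rather than part of the method described above.

```python
import math

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def moment(beta, y, x):
    """Left-hand side of (13) with g(x_i, beta) chosen as in (14)."""
    total = 0.0
    for y_i, x_i in zip(y, x):
        z = beta * x_i
        F, S = norm_cdf(z), norm_cdf(-z)              # S = 1 - F, computed stably
        g = norm_pdf(z) * x_i / (F * S)
        total += (y_i - F) * g
    return total

def log_likelihood(beta, y, x):
    """Probit log-likelihood, used only to check the moment condition."""
    return sum(math.log(norm_cdf(beta * x_i if y_i == 1 else -beta * x_i))
               for y_i, x_i in zip(y, x))

def solve_moment(y, x, lo=0.0, hi=3.0, iters=100):
    """Bisection on (13); assumes moment(lo) > 0 > moment(hi) for this data."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if moment(mid, y, x) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Made-up data (not perfectly separated, so the estimate is finite).
x = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
y = [0, 0, 0, 1, 0, 1, 1, 1]
beta_hat = solve_moment(y, x)
print(beta_hat, moment(beta_hat, y, x))   # the moment is zero up to rounding
```

Because the moment function equals the derivative of the probit log-likelihood, the solution of (13) is the probit ML estimate for this data set.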
When we come to panel data, the derivations get more complicated. AHH start with the first-order conditions for GLS estimation and proceed from them. They first write the model as:
$$y^*_{it} = \beta'x_{it} + \delta_{it}$$
$$y_{it} = 1 \text{ if } y^*_{it} > 0, \qquad y_{it} = 0 \text{ otherwise}$$
where $\delta_{it}$ is the composite error term $\alpha_i + u_{it}$. They assume a general covariance matrix $\Sigma$ for the error terms $\delta_{it}$ ($i = 1, 2, \ldots, N$). In the Heckman and Willis model the correlation between $\delta_{it}$ and $\delta_{it'}$ ($t \neq t'$) is $\rho$. In the AHH model it is $\rho_{tt'}$, unrestricted. AHH argue that if $\Sigma$ is incorrectly constrained to have equal off-diagonal terms, then the ML estimator of $\beta$ is inconsistent. It will be consistent only if we assume no correlation (i.e., $\rho = 0$). AHH also make the assumption (for most of their discussion),
$$E(\delta_i \mid x_i) = 0$$
where $\delta_i = (\delta_{i1}, \delta_{i2}, \ldots, \delta_{iT})$ and $x_i = (x_{i1}, x_{i2}, \ldots, x_{iT})$.
This implies that $\delta_{it}$ is uncorrelated with functions of current, past and future x's or that x's are strictly exogenous. Also, the distribution of each element of $\delta_i$ conditional on $x_i$ is assumed to be normal with mean zero and unit variance.
Under these assumptions we have
$$E(y_{it} \mid x_i) = E(y_{it} \mid x_{it}) = F(\beta'x_{it})$$
where $F(\cdot)$ is the cumulative distribution function of the standard normal. The forecast error
$$\varepsilon_{it} = y_{it} - F(\beta'x_{it})$$
is orthogonal to functions of the entire vector $x_i$ as long as the random vector has a finite second moment. This implies the following regression equation:
$$y_{it} = F(\beta'x_{it}) + d_{it} \tag{15}$$
where $d_{it}$ is a disturbance term orthogonal to functions of $x_i$.

One can think of estimating this equation by nonlinear GLS but the disturbances $d_{it}$ are heteroskedastic with variances involving the parameters $\beta$ and $\Sigma$. A consistent estimator for $\beta$ can be obtained by estimating a standard probit model with all the NT observations ignoring the correlations. However, obtaining an estimator for $\Sigma$ would be quite cumbersome. AHH, therefore, propose the method of moments, which is essentially a weighted nonlinear instrumental variable method. This method starts with the first-order conditions implied by the GLS estimation of (15). Using $d_{it} = y_{it} - F(\beta'x_{it})$, define
$$d_i = (d_{i1}, d_{i2}, \ldots, d_{iT})' = H_i(y_i, x_i, \beta)$$
Then GLS estimation of (15) amounts to minimizing the weighted sum of squares of residuals
$$\frac{1}{N} \sum_{i=1}^{N} H_i'\, \Omega_i^{-1} H_i \quad \text{where } \Omega_i = E(d_i d_i')$$

If $\Omega_i$ is known, the first-order condition
$$\frac{1}{N} \sum_{i=1}^{N} \frac{\partial H_i'}{\partial \beta}\, \Omega_i^{-1} H_i = 0 \tag{16}$$
can be solved. However, in practice $\Omega_i$ is not known and it has to be estimated from the residuals which are themselves functions of $\beta$. Instead of considering the detailed expressions, AHH note that condition (16) implies a linear weighting of the cross products of sample residuals and their derivatives with respect to $\beta$. Thus, condition (16) can be represented as:
$$\frac{1}{N} \sum_{i=1}^{N} A_i\, G_i(y_i, x_i, \beta) = 0 \tag{17}$$
where $G_i(\cdot)$ is a $T^2k$ dimensional vector containing all possible cross products of elements in the $T$ dimensional vector $H_i$ and the $T \times k$ matrix $\partial H_i/\partial \beta$, and $A_i$ is a $k \times T^2k$ matrix consisting of zeroes and elements of $\Omega_i^{-1}$.
Given the difficulty of estimating $A_i$, AHH suggest some tractable alternatives. In essence, what they use are expressions analogous to those presented for the simple probit model in (14) but taking into account the fact that in the case of panel data the product of $\partial H_i/\partial \beta$ and $H_i$ involves leads and lags. Let us, for the sake of compactness, denote $F(\beta'x_{it})$ by $F_{it}$. Then with no leads or lags, the weight used is the inverse of $F_{it}(1 - F_{it})$ as in (14). With leads and lags, for two different time periods $t$ and $s$, the weight used is the inverse of $[F_{it}(1 - F_{it})]^{1/2}[F_{is}(1 - F_{is})]^{1/2}$. The AHH method thus corrects
for serial correlation in a linear manner, whereas Equations (16) or (17) involve corrections in a complicated nonlinear way.
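The weights described above can be tabulated for one cross-section unit. This is only a sketch of the weighting idea, assuming the index values $\beta'x_{it}$ are already available; the function name `ahh_weight_matrix` and the numbers are hypothetical, and this is not the HOTZTRAN implementation.

```python
import math

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ahh_weight_matrix(index_values):
    """Weights 1 / ([F_it(1-F_it)]^(1/2) [F_is(1-F_is)]^(1/2)) for all pairs (t, s).

    index_values[t] is beta'x_it for one unit i (assumed known here).
    """
    F = [norm_cdf(v) for v in index_values]
    sd = [math.sqrt(f * (1.0 - f)) for f in F]
    return [[1.0 / (sd_t * sd_s) for sd_s in sd] for sd_t in sd]

# Hypothetical index values for T = 3 periods.
weights = ahh_weight_matrix([-0.4, 0.1, 0.9])
```

The diagonal entries reduce to the inverse of $F_{it}(1 - F_{it})$, the weight in (14), while the off-diagonal entries are the lead and lag weights.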
The detailed expressions derived by AHH (1983, 27-28) are quite complicated and need not be reproduced here. However, a computer program HOTZTRAN can be obtained to implement their procedure for the probit as well as the tobit models. The above discussion gives a rough idea of the method of moments that they use.
AHH apply their procedure to study labor force participation of married women. They compare their results with those from the specification of equicorrelation and reject the latter.
C. The Chamberlain Model
One major limitation of the preceding models is that we assumed the random effects to be uncorrelated with the explanatory variables $x_i$. This is a serious limitation and needs to be relaxed. Avery, Hansen, and Hotz (1983, 29) test for the exogeneity assumption (13) which can be performed with the HOTZTRAN program. However, a question arises as to what alternative model to consider if we drop this assumption. Earlier, in our discussion of models with continuous variables, we discussed Mundlak's assumptions (5) which led to the within-group estimator for $\beta$ or the estimator from the fixed effects model. In the case of qualitative variables, this specification does not lead to the estimator from the fixed effects model.
Chamberlain (1980, 1985) considers the specification
$$\alpha_i = \pi'x_i + v_i \tag{18}$$
where $v_i \sim IN(0, \sigma_v^2)$ and $x_i' = (x_{i1}, x_{i2}, \ldots, x_{iT})$.

This specification leads to the random effects model
$$y^*_{it} = \beta'x_{it} + \pi'x_i + v_i + u_{it} \tag{19}$$
$$y_{it} = 1 \text{ if } y^*_{it} > 0, \qquad y_{it} = 0 \text{ otherwise}$$

The correlation structure of the errors in this model is the same as in the Heckman and Willis model (12). However, the addition of $\pi'x_i$ produces some cross-equation restrictions on the coefficients. We can consider probit equations for each $t$ ($t = 1, 2, \ldots, T$) which involves the use of all the leads and lags of x. If we denote the matrix of these multivariate probit coefficients by D, then D has the structure
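$$D = \mathrm{diag}\{\gamma_1, \gamma_2, \ldots, \gamma_T\}\,[\beta I + \mathbf{1}\pi'] \tag{20}$$
where $\gamma_t$ are the normalization factors given by $\gamma_t = (\sigma_{tt} + \sigma_v^2)^{-1/2}$ and $\mathbf{1}$ is a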
$T \times 1$ vector of ones. Chamberlain suggests the estimation of the probit coefficients D by running the T probit equations separately. Then he proposes (1985, 1252) estimating $\beta$ and $\pi$ by imposing the constraints (20), using a minimum distance estimator.
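To see what the restrictions in (20) look like, the matrix D can be written out for a small case. The sketch below assumes a single scalar regressor, so that $\beta$ is a scalar and $\pi$ is a $T$-vector; the function name `chamberlain_d_matrix` and all parameter values are hypothetical.

```python
def chamberlain_d_matrix(beta, pi, gamma):
    """Build D = diag(gamma) [beta * I + 1 pi'] for a scalar regressor.

    Row t holds the period-t probit coefficients on (x_i1, ..., x_iT):
    D[t][s] = gamma[t] * (beta * 1{s == t} + pi[s]).
    """
    T = len(pi)
    if len(gamma) != T:
        raise ValueError("gamma and pi must both have length T")
    return [
        [gamma[t] * ((beta if s == t else 0.0) + pi[s]) for s in range(T)]
        for t in range(T)
    ]

# Hypothetical parameter values for T = 3.
D = chamberlain_d_matrix(beta=0.5, pi=[0.1, 0.2, 0.3], gamma=[1.0, 0.9, 0.8])
```

In this layout the off-diagonal entries of row $t$ are $\gamma_t \pi_s$, and the diagonal entries add $\gamma_t \beta$; these are the cross-equation restrictions that the minimum distance step imposes on the unrestricted probit estimates.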
The minimum distance estimator is just a GLS estimator. To see how it is used when there are restrictions on the parameters, consider a simultaneous equations model
$$By_t + Cz_t = u_t$$
where $y_t$ is a vector of endogenous variables and $z_t$ is a vector of exogenous variables. The reduced form is
$$y_t = Az_t + v_t$$
where $A = -B^{-1}C$ and $v_t = B^{-1}u_t$. Let
$$E(v_t v_t') = \Omega.$$
The estimates of B and C can be obtained by minimizing
$$\sum_{t=1}^{T} (y_t - Az_t)'\, \Omega^{-1} (y_t - Az_t)$$
subject to the restrictions $A = -B^{-1}C$ or $BA + C = 0$. However, $\Omega$ is not known. But we can substitute $\hat{\Omega}$ for $\Omega$ where $\hat{\Omega}$ is obtained from the residuals from the OLS estimation of the reduced form equations (i.e., ignoring the restrictions $BA + C = 0$). The GLS estimation with $\hat{\Omega}$ substituted for $\Omega$ and subject to the restrictions $BA + C = 0$ is called the "minimum distance method" of estimation. It is discussed in Malinvaud (1970, 675-78). Malinvaud shows that the estimators for B and C are consistent and if the errors $u_t$ are normal, they are asymptotically efficient.
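A much simpler toy case conveys the idea of minimum distance as GLS under restrictions. The sketch below is an assumption-laden illustration rather than the simultaneous equations case above: the unrestricted estimates `d_hat[t]` are restricted to equal `g[t] * theta` for known `g`, and the weighting matrix is taken to be diagonal with known variances `var[t]`. All names and numbers are hypothetical.

```python
def distance(theta, d_hat, g, var):
    """Weighted distance between unrestricted estimates and the restricted form."""
    return sum((d - gt * theta) ** 2 / v for d, gt, v in zip(d_hat, g, var))

def minimum_distance_scalar(d_hat, g, var):
    """Closed-form minimiser of distance(theta, ...) for a linear restriction.

    Setting the derivative to zero gives
    theta = sum(g_t d_t / v_t) / sum(g_t^2 / v_t).
    """
    num = sum(d * gt / v for d, gt, v in zip(d_hat, g, var))
    den = sum(gt * gt / v for gt, v in zip(g, var))
    return num / den

# Three unrestricted estimates of a common coefficient, equal precision.
theta_equal = minimum_distance_scalar([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
# Two estimates with unequal precision: the more precise one gets more weight.
theta_weighted = minimum_distance_scalar([1.0, 3.0], [1.0, 1.0], [1.0, 3.0])
```

With nonlinear restrictions such as (20) or $BA + C = 0$, and a full weighting matrix, the same distance would be minimised numerically instead of in closed form.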
Chamberlain (1985) uses a similar minimum distance method to estimate
the relationship between labor force participation of married women and
the presence of young children, based on the data from the Michigan Panel
Study of Income Dynamics. He also estimates the fixed effects conditional
logit model with the same data. The results from the two models are,
however, conflicting. The random effects probit model with random effects
specified as linear functions of xi suggested that the cross-section estimates
overstate the negative effects of young children on the woman's participation probability. The fixed effects conditional logit model, on the other
hand, gives the result that the cross-section estimates underestimate the
negative effect. Both models control for the unobserved individual effects,
the logit model leaving them unspecified, and the probit model using a
special specification. It is possible that the special specification for ai is not
the appropriate one or that the fixed effects logit model should include leads
and lags of the x's as part of the structural model.

In summary, there are three models that have been suggested for the random effects probit models:
(i) The Heckman and Willis model (Equation 12), for which Butler and Moffitt (1982) have an efficient computer algorithm.
(ii) The Avery-Hansen-Hotz (1983) model, for which there is the HOTZTRAN computer program. Since this model is a generalization of the Heckman-Willis model, one might wonder why we should consider the Heckman-Willis model at all. If the equicorrelation hypothesis and the exogeneity hypothesis are not rejected, we might consider the Butler and Moffitt procedure because it is more efficient than the method of moments considered by AHH. The AHH procedure, on the other hand, is also robust against other forms of serial correlation in the errors.
(iii) The Chamberlain model (1985). This allows for the random effects to depend on the current, future and past x's. A computer program to implement this procedure can be obtained from Professor George Jakubson at Cornell University.
All these procedures, however, assume that the slope coefficients $\beta$ do not change over time. Since the estimation of a probit model is not costly, it is worthwhile estimating probit models for each cross-section and for the pooled sample before any analysis is performed using the above three models. Note that the pooled probit model gives consistent estimators of $\beta$ even when there is serial correlation in the errors due to the random effects.
When it comes to an extension of the random effects probit model to more complicated situations like simultaneous equations models, the models by Avery, Hansen and Hotz and by Chamberlain are more complicated to implement. A generalization of the Heckman-Willis model to simultaneous equations is in Sickles and Taubman (1986) who employ the Butler and Moffitt (1982) procedure for estimating the model. The model consists of two equations.
$$y_{1ij} = \beta_1'x_{1ij} + \varepsilon_{1ij}$$
$$y_{2ij} = \gamma y_{1ij} + \beta_2'x_{2ij} + \varepsilon_{2ij}$$
$$\varepsilon_{1ij} = \alpha_{1i} + u_{1ij}$$
$$\varepsilon_{2ij} = \alpha_{2i} + u_{2ij}$$
For $y_{1ij}$ they have a polytomous probit model with ordered responses and for $y_{2ij}$ they have a binary probit model. The paper by Sickles and Taubman shows that it is feasible to estimate simultaneous equations models with random effects in each equation.

V. Fixed Effects Tobit Model


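The fixed effects tobit model can be written as:
$$y^*_{it} = \alpha_i + \beta'x_{it} + u_{it}, \qquad u_{it} \sim IN(0, \sigma^2)$$
$$y_{it} = y^*_{it} \text{ if } y^*_{it} > 0, \qquad y_{it} = 0 \text{ otherwise.}$$
Let
$$d_{it} = 1 \text{ if } y^*_{it} > 0, \qquad d_{it} = 0 \text{ otherwise.}$$
The log-likelihood function is:
$$\log L = \sum_{i,t} (1 - d_{it}) \log \Phi\!\left(-\frac{\alpha_i + \beta'x_{it}}{\sigma}\right) + \sum_{i,t} d_{it} \left\{ \log \frac{1}{\sigma} - \frac{1}{2\sigma^2}(y_{it} - \alpha_i - \beta'x_{it})^2 \right\} \tag{21}$$

Unlike the case of the linear model, in this model it is not possible to devise estimators of $\beta$ and $\sigma$ that are not functions of the fixed effects $\alpha_i$. Since the number of observations per cross-section unit is fixed (T is fixed and usually small) it is not possible to consistently estimate the fixed effects $\alpha_i$ and this inconsistency carries through to the estimates of $\beta$ and $\sigma$.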
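The log-likelihood (21) is straightforward to evaluate for given parameter values. The following is a minimal sketch assuming a scalar regressor and nested lists `y[i][t]`, `x[i][t]`; the function name `fe_tobit_loglik` is hypothetical. The code keeps the constant $-\tfrac{1}{2}\log 2\pi$ of the normal density, which does not affect the maximization.

```python
import math

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fe_tobit_loglik(y, x, alpha, beta, sigma):
    """Fixed effects tobit log-likelihood with censoring at zero.

    y[i][t], x[i][t]: observed outcome and scalar regressor for unit i, period t.
    alpha[i]: fixed effect of unit i; beta, sigma: common parameters.
    """
    ll = 0.0
    for i, (y_i, x_i) in enumerate(zip(y, x)):
        for y_it, x_it in zip(y_i, x_i):
            mean = alpha[i] + beta * x_it
            if y_it > 0:  # uncensored observation, d_it = 1
                resid = (y_it - mean) / sigma
                ll += -math.log(sigma) - 0.5 * math.log(2.0 * math.pi) - 0.5 * resid * resid
            else:  # censored observation, d_it = 0
                ll += math.log(norm_cdf(-mean / sigma))
    return ll

# One unit, two periods: one censored and one uncensored observation.
example_ll = fe_tobit_loglik([[0.0, 1.0]], [[0.0, 0.0]], [0.0], 0.0, 1.0)
```

A function of this kind could serve as the objective in the alternating maximization over the $\alpha_i$ and over $(\beta, \sigma)$; no optimizer is included here.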
Heckman and MaCurdy (1980) estimate this model by iterative methods. They argue that even though the estimates are inconsistent, from the practical point of view this might not be a serious problem if there are no lagged dependent variables (1980, 59). This observation was based on the results of a Monte Carlo study of the multivariate probit model with fixed effects and T = 8 done by Heckman. Though no Monte Carlo study was done for the fixed effects tobit model, one can presume that the same results would carry through for the tobit model.
The procedure used by Heckman and MaCurdy is as follows: start with some initial values of $\beta$ and $\sigma$, maximize the likelihood function (21) with respect to $\alpha_i$, substitute this ML estimate in (21) and maximize it with respect to $\beta$ and $\sigma$ to obtain new estimates of $\beta$ and $\sigma$. This procedure is iterated till convergence. Heckman and MaCurdy report that they obtained rapid convergence (1980, 69). If $d_{it} = 1$ for all $t$, there is no problem in this model because we observe $y_{it}$, but if $d_{it} = 0$ for all $t$, then the corresponding estimate of $\alpha_i$ is not finite (the likelihood keeps increasing as $\alpha_i \to -\infty$). Thus, these cross-section units have to be discarded. Let $I_1$ be the subset of the sample with these units discarded. For the $i$th cross-section unit the probability that $d_{it} = 1$ for at least one $t$ is:

$$P_i = 1 - \mathrm{Prob}(d_{it} = 0 \text{ for all } t) = 1 - \prod_{t=1}^{T} \Phi\!\left(-\frac{\alpha_i + \beta'x_{it}}{\sigma}\right)$$
Conditional on the information that $d_{it} = 1$ for at least one $t$, the logarithm of the likelihood function is now the same as (21) but with the subscript $i$ used only for subset $I_1$ and with $\sum_{i \in I_1} \log P_i$ subtracted. The maximization of this likelihood function again can be done by the same procedure as the maximization of (21).
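The correction term for discarding units that are censored in every period can be computed directly. This is a small sketch under the same assumptions as the tobit model above (normal errors, censoring at zero, scalar regressor); the function name `prob_any_uncensored` is hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_any_uncensored(x_i, alpha_i, beta, sigma):
    """P_i: probability that unit i has d_it = 1 in at least one period."""
    p_all_censored = 1.0
    for x_it in x_i:
        p_all_censored *= norm_cdf(-(alpha_i + beta * x_it) / sigma)
    return 1.0 - p_all_censored

# With a zero index in both of two periods, each period is censored with
# probability 0.5, so the unit is retained with probability 1 - 0.25.
p_i = prob_any_uncensored([0.0, 0.0], alpha_i=0.0, beta=1.0, sigma=1.0)
log_correction = math.log(p_i)  # the term subtracted for this unit
```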
The model considered by Heckman and MaCurdy is actually a two equation model but the estimation involves fixed effects in only one tobit type equation because of the way the likelihood function is factored. The model they use is a generalization of Heckman's labor supply model to panel data. We start with a shadow wage equation
$$S_{it} = \gamma_1 H_{it} + \gamma_2'x_{1it} + v_{1it} \tag{22}$$
and a market wage equation
$$W_{it} = \beta'x_{2it} + v_{2it} \tag{23}$$
where H is hours of work and $x_1$ and $x_2$ are sets of exogenous variables. Heckman assumes that hours of work adjust so that $S_{it} = W_{it}$. Hence, we get
$$H_{it} = \frac{\beta'x_{2it} - \gamma_2'x_{1it}}{\gamma_1} + \frac{v_{2it} - v_{1it}}{\gamma_1} \tag{24}$$
If $H_{it} > 0$ the person is in the labor force and we observe $H_{it}$ and $W_{it}$. If $H_{it} < 0$ the person is not in the labor force. The likelihood function for this model is, therefore,
$$L = \prod_{H_{it} > 0} f(W_{it}, H_{it}) \prod_{H_{it} < 0} \Phi(A_{it}) \tag{25}$$
where $A_{it} = (\gamma_2'x_{1it} - \beta'x_{2it})/\sigma$ and $\sigma^2 = \mathrm{Var}(v_{2it} - v_{1it})$
and $f(W, H)$ is the joint density of W and H. One can introduce individual specific effects in the two Equations (22) and (23) for shadow wage and market wage and reparametrize these so that we have individual specific effects in the wage and hours worked equations. Heckman and MaCurdy derive the individual specific effects from a model of life-cycle labor supply. Thus, these effects have a specific meaning and are not postulated in an ad hoc fashion. Heckman and MaCurdy argue that since these effects cannot be assumed to be independent of the explanatory variables, the model should be estimated by fixed effects rather than by the random effects model.
Though we start with individual-specific effects in Equations (22) and (23), we can reparametrize and write the model with individual specific
effects in (23) and (24), which are the equations relevant for the likelihood function (25). Thus, we will write
$$H_{it} = \alpha_{1i} + \gamma'z_{it} + u_{1it}$$
$$W_{it} = \alpha_{2i} + \beta'x_{2it} + u_{2it}$$
Looking at the likelihood function (25) we see that the second expression does not involve $\alpha_{2i}$ since it depends on $H_{it}$ only. As for the first expression in (25), we can factor $f(W, H)$ into $f_1(W \mid H) \cdot f_2(H)$. Then it is only $f_1(W \mid H)$ that involves $\alpha_{2i}$. This conditional density is normal. Maximizing the likelihood function with respect to $\alpha_{2i}$, Heckman and MaCurdy (1980, 68) obtain the ML estimates which just involve the sample means of the variables (as in the usual linear fixed effects model, except that these means are over the subset of observations for which $H_{it} > 0$). Substituting these estimates of $\alpha_{2i}$ in the likelihood function, Heckman and MaCurdy obtain a concentrated likelihood function that involves $\alpha_{1i}$ only and is similar in structure to that of a single equation tobit model with fixed effects like (21).
Thus, though the model is a two-equation model with fixed effects in the wages and hours worked equation, by factoring $f(W, H)$ into $f_1(W \mid H)$ and $f_2(H)$, the fixed effects $\alpha_{2i}$ can be eliminated as in the linear model. We are left with only one set of fixed effects $\alpha_{1i}$ and a tobit model.

VI. Random Effects Tobit Models


The three models that we considered under random effects probit models can all be generalized to the case of tobit models. The HOTZTRAN program computes the method of moments estimators for tobit models. The Heckman and Willis model again involves a univariate integral and the Butler and Moffitt program can be suitably modified. Note that the estimation of the tobit model ignoring the serial correlation problem does give consistent estimates for $\beta$ (see Robinson 1982). These can be used as initial starting values. The Chamberlain model now involves tobit estimation for each cross-section (including the current, future, and past x's). The only difference is that in the probit model the parameters are estimable up to a scale factor only, whereas all parameters are estimable in the tobit model.
There are very few applications of the random effects tobit model with panel data. As mentioned earlier, Heckman and MaCurdy argued against it in their application on the grounds that the $\alpha_i$ were expected to be correlated with the explanatory variables. There is, however, one example of the random effects tobit model, but this is with self-selection and with T = 2. Though the computational problems for such models are different and do not illustrate the modifications of the Heckman-Willis, AHH, and Chamberlain models to the tobit case, this example is worth citing in this context.

An example of the use of random effects tobit model with self-selection is the paper by Hausman and Wise (1979). The model is a two-period model of earnings $y_{it}$.
$$y_{it} = \beta'x_{it} + \varepsilon_{it}, \qquad i = 1, 2, \ldots, N, \quad t = 1, 2$$
$$\varepsilon_{it} = \alpha_i + u_{it}, \qquad \alpha_i \sim IN(0, \sigma_\alpha^2), \quad u_{it} \sim IN(0, \sigma_u^2).$$
The problem is that $y_{i2}$ is observed only if an index of attrition $A_i \geq 0$ where $A_i$ is defined by
$$A_i = R_i\delta + \varepsilon_{i3}.$$
A common procedure is to discard observations for which $y_{i2}$ is zero. Hausman and Wise argue that this is an incorrect procedure if the probability of observing $y_{i2}$ varies with its value and the value of other variables. The details of the estimation by Hausman and Wise need not be reproduced here because the detailed structure of models with self-selection differs case by case (see Maddala 1983, Chapter 9). The paper by Hausman and Wise, which is an example of an estimation of random effects models with self-selection, can provide guidance in such models.

VII. Autoregressive Logit Models


An additional problem occurs if there is "state dependence" in which an individual's past state $y_{i,t-1}$ will help in predicting his or her current state $y_{it}$ after allowing for the individual effects $\alpha_i$. This problem has frequently been posed in connection with the effect of past unemployment on current unemployment (see Corcoran 1982, Corcoran and Hill 1985, Heckman and Borjas 1980). A model due to Cox (1958), analyzed further by Chamberlain (1978, 1985), is the autoregressive logit model:
$$\mathrm{Prob}(y_{it} = 1 \mid \alpha_i, y_{i,t-1}) = \frac{\exp(\alpha_i + \gamma y_{i,t-1})}{1 + \exp(\alpha_i + \gamma y_{i,t-1})}, \qquad i = 1, 2, \ldots, N, \quad t = 1, 2, \ldots, T \tag{26}$$
$\gamma$ is the increase in the log-odds ratio of being in State 1 due to being in State 1 in the preceding period (after allowing for the individual specific effect).
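The log-odds interpretation of $\gamma$ in (26) can be verified with a few lines of code. This is a small sketch with hypothetical parameter values; the function names are illustrative only.

```python
import math

def prob_state_one(alpha_i, gamma, y_prev):
    """Prob(y_it = 1 | alpha_i, y_{i,t-1}) in the autoregressive logit model (26)."""
    z = alpha_i + gamma * y_prev
    return math.exp(z) / (1.0 + math.exp(z))

def log_odds(p):
    """Log-odds of an event with probability p."""
    return math.log(p / (1.0 - p))

alpha_i, gamma = -0.7, 1.2  # hypothetical values
p_after_one = prob_state_one(alpha_i, gamma, 1)
p_after_zero = prob_state_one(alpha_i, gamma, 0)
log_odds_gap = log_odds(p_after_one) - log_odds(p_after_zero)
```

Whatever the value of $\alpha_i$, the gap in log-odds between the two lagged states equals $\gamma$, which is why the individual effect can be allowed for without changing the interpretation of $\gamma$.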
Again, maximizing the joint likelihood function over $\alpha_i$ and $\gamma$ will not give a consistent estimate of $\gamma$ as $N \to \infty$ for fixed T. Intuitively the number of parameters to be estimated grows with N. Inconsistency is likely to be particularly serious in the autoregressive case. Chamberlain, therefore, suggests working with the conditional likelihood function. The sufficient
statistics for $\alpha_i$ are $\sum_t y_{it}$ and $y_{iT}$. In addition he deals with initial conditions by conditioning on $y_{i1}$. Then,
$$\mathrm{Prob}\Big(y_{i1}, y_{i2}, \ldots, y_{iT} \,\Big|\, y_{i1}, \sum_t y_{it}, y_{iT}\Big) = \frac{\exp\Big(\gamma \sum_{t=2}^{T} y_{it} y_{i,t-1}\Big)}{\sum_{d \in B_i} \exp\Big(\gamma \sum_{t=2}^{T} d_t d_{t-1}\Big)}$$
where
$$B_i = \Big\{ d = (d_1, d_2, \ldots, d_T) \,\Big|\, d_t = 1 \text{ or } 0,\ d_1 = y_{i1},\ \sum_t d_t = \sum_t y_{it},\ d_T = y_{iT} \Big\}$$
The conditional probability for the individual observation is given by the exponential of $\gamma$ times the sum of products of adjacent y's for the individual, divided by the sum of such terms over all sequences in that set (with the same $y_{i1}$, $y_{iT}$ and $\sum_t y_{it}$). Some such sets will not involve $\gamma$ at all and we have to omit them. This model is in the form of a conditional logit model and can be estimated easily since the $\alpha_i$ have been eliminated. It has been used by Chamberlain (1978, 35-36), Corcoran (1982), and Corcoran and Hill (1985), all to study labor force participation.
To apply this method we need to determine the sets that result in probabilities that depend on $\gamma$. For instance, consider T = 5, $y_1 = 0$, $y_5 = 1$. Then if $\sum y_t = 2$ we get 3 cases:
0 1 0 0 1 with $\sum y_t y_{t-1} = 0$ and hence Prob $= e^0/D = 1/D$
0 0 1 0 1 with $\sum y_t y_{t-1} = 0$ and hence Prob $= e^0/D = 1/D$
0 0 0 1 1 with $\sum y_t y_{t-1} = 1$ and hence Prob $= e^{\gamma}/D$
where $D = 1 + 1 + e^{\gamma} = e^{\gamma} + 2$.
If $\sum y_t = 3$ we again get 3 cases
0 1 1 0 1 with $\sum y_t y_{t-1} = 1$ and hence Prob $= e^{\gamma}/D^*$
0 1 0 1 1 with $\sum y_t y_{t-1} = 1$ and hence Prob $= e^{\gamma}/D^*$
0 0 1 1 1 with $\sum y_t y_{t-1} = 2$ and hence Prob $= e^{2\gamma}/D^*$
where $D^* = e^{\gamma} + e^{\gamma} + e^{2\gamma}$. Cancelling $e^{\gamma}$ throughout we get the probabilities as $1/D$, $1/D$ and $e^{\gamma}/D$ where $D = e^{\gamma} + 2$.
Note that $\sum y_t = 1$ and $\sum y_t = 4$ give only one case each and hence the conditional probabilities do not involve $\gamma$. Even with more cases in a set, conditional probabilities need not involve $\gamma$. For instance, consider $y_1 = 1$,
$y_5 = 1$, $\sum y_t = 4$. There are three cases: 1 1 1 0 1, 1 1 0 1 1, 1 0 1 1 1. However, for each we have $\sum y_t y_{t-1} = 2$. Hence, the conditional probabilities are $e^{2\gamma}/3e^{2\gamma} = 1/3$ which does not involve $\gamma$. For the case T = 5, there are only six sets that result in conditional probabilities that involve $\gamma$. These are:
$y_1 = 0$, $y_5 = 1$ and $\sum y_t = 2$ or $\sum y_t = 3$
$y_1 = 0$, $y_5 = 0$ and $\sum y_t = 2$
$y_1 = 1$, $y_5 = 1$ and $\sum y_t = 3$
$y_1 = 1$, $y_5 = 0$ and $\sum y_t = 2$ or $\sum y_t = 3$
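The count of informative sets for T = 5 can be reproduced by brute-force enumeration. The sketch below groups all binary sequences by the conditioning statistics (first value, last value, sum) and keeps the groups in which the sum of adjacent products varies; the function names are illustrative and the value of $\gamma$ used at the end is hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

def adjacent_products(seq):
    """Sum over t of y_t * y_{t-1}."""
    return sum(a * b for a, b in zip(seq[1:], seq[:-1]))

def conditioning_sets(T):
    """Group all 0/1 sequences of length T by (y_1, y_T, sum of y_t)."""
    sets = defaultdict(list)
    for seq in product((0, 1), repeat=T):
        sets[(seq[0], seq[-1], sum(seq))].append(seq)
    return sets

def informative_sets(T):
    """Keep only the sets whose conditional probabilities depend on gamma."""
    return {
        key: seqs
        for key, seqs in conditioning_sets(T).items()
        if len({adjacent_products(s) for s in seqs}) > 1
    }

def conditional_probs(seqs, gamma):
    """Conditional logit probabilities of each sequence within one set."""
    weights = [math.exp(gamma * adjacent_products(s)) for s in seqs]
    total = sum(weights)
    return [w / total for w in weights]

info = informative_sets(5)
probs = conditional_probs(info[(0, 1, 2)], gamma=1.0)  # hypothetical gamma
```

Individuals whose sequences fall outside `info` contribute nothing to the conditional likelihood, which is the source of the sample attrition in the applications of this method.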


This conditional approach would usually result in using only a small percentage of the sample observations. In their application Corcoran and Hill used T = 5 but from Table A.2 of their paper, we can infer that the proportion of the sample used in estimation was 7.1 percent. With N = 1,251 this means that only 89 observations were used in the estimation. Chamberlain (1978, 34-36) considers a data set with T = 5 and N = 1,583. He uses more observations than Corcoran and Hill. The percentage of the sample observations used was 17 percent, which implies that about 270 observations were used. Chamberlain gets an estimate $\hat{\gamma} = 1.96$ which implies that the odds that an individual unemployed last year is unemployed this year are $e^{1.96} = 7.1$ times higher than if the individual was employed last year. Corcoran and Hill obtain these odds as 12.1 if the $\alpha_i$ are ignored and equal to 3.6 making allowance for $\alpha_i$ using the Chamberlain technique. Thus, making allowance for the individual specific effects $\alpha_i$ reduces the effect of past unemployment on current unemployment considerably.
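The arithmetic behind these figures is easy to reproduce. The snippet below only recomputes the numbers quoted in the text; the variable names are illustrative.

```python
import math

# Odds multiplier implied by Chamberlain's estimate of gamma.
odds_ratio = math.exp(1.96)

# Approximate number of observations actually used in estimation.
n_used_corcoran_hill = 0.071 * 1251  # 7.1 percent of N = 1,251
n_used_chamberlain = 0.17 * 1583     # 17 percent of N = 1,583
```

Since the reported percentages are themselves rounded, the implied observation counts are approximate.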
There are two major limitations of the autoregressive model (26). The first is that the use of the alternative set and the conditional logit method might involve discarding a large number of sample observations and using a very small portion of the data, as in the paper by Corcoran and Hill (1985). The second problem is that the model does not permit the use of exogenous variables (this is a computational problem with this technique) and thus has limited use for policy analysis. All it tells is whether $y_t$ depends on $y_{t-1}$ or not, after allowing for individual specific effects $\alpha_i$. Both these problems can be solved by going to the random effects (probit) models. An alternative model, suggested by Chamberlain (1978, 30), that allows for the use of exogenous variables is the following:
$$\mathrm{Prob}(y_{it} = 1 \mid x_i, \alpha_i, y_{i,t-1}) = \frac{\exp(\alpha_i) + \gamma y_{i,t-1}}{\exp(\alpha_i) + \exp(-\beta'x_{it})} \tag{26a}$$
By multiplying through by $\exp(\beta'x_{it})$ it can be seen that if $\gamma = 0$ this reduces to the fixed effects logit model with no "state dependence." For this model,
sufficient statistics for $\alpha_i$ are $S_{i1} = \sum_{t=1}^{T} y_{it}$ and $S_{i11} = \sum_{t=2}^{T} y_{it} y_{i,t-1}$. $S_{i11}$ gives the number of times a one is preceded by a one. Conditioning on $y_{i1}$ to deal with initial conditions, we get the alternative set as
$$B_i = \Big\{ d = (d_1, d_2, \ldots, d_T) \,\Big|\, d_t = 0 \text{ or } 1,\ d_1 = y_{i1},\ \sum_t d_t = \sum_t y_{it},\ \text{and } \sum_t d_t d_{t-1} = \sum_t y_{it} y_{i,t-1} \Big\}.$$
Chamberlain shows that the model now reduces to the usual conditional logit model with no $\alpha_i$. There do not appear to be any empirical applications of model (26a) though Solomon Polachek in his comment on Corcoran (1982) argued that this model be estimated.
Another problem that also arises with model (26a) is that it does not ensure that $\mathrm{Prob}(y_{it} = 1)$ will necessarily lie in the range (0, 1). Also, Chamberlain suggested model (26a) for testing the fixed effects model, rather than as a generalization of (26). His idea was that a test of the hypothesis $\gamma = 0$ would enable us to judge whether the fixed effects model is adequate.
The problem of allowing for the influence of observed heterogeneous time varying explanatory variables $x_{it}$ in dynamic models can be solved by going to random effects probit models. Also, they can be handled in the framework of log-linear models as pointed out by Lee (forthcoming). A discussion of the log-linear model approach by Lee would be too lengthy to be included here. There do not appear to be any empirical applications of this approach as yet.

VIII. Autoregressive Probit Models


With the probit models, the conditioning approach, used with the logit models, is not feasible because the fixed effects do not factor out. Hence, we consider random effects models. We can generalize the random effects probit models that we considered earlier to the case that includes "state dependence" or lagged values of y. The one extra problem involved here is the problem of the initial conditions. As we mentioned earlier in the case of continuous variables, Balestra and Nerlove (1966) encountered problems with the ML estimation treating $y_{i0}$ as fixed and Anderson and Hsiao (1982) and Bhargava and Sargan (1983) showed the importance of specifying the distribution of $y_{i0}$. The best specification is the one that is directly derived from the model itself. In the case of the autoregressive probit models this involves very complicated expressions. A less preferable but more tractable procedure is to assume that the $y_{i0}$'s are random variables with a probability distribution
$$\mathrm{Prob}(y_{i0} = 1) = F(x_{i0}'\delta)$$
where $\delta$ is a set of unknown parameters to be estimated. This is the assumption used in Heckman (1981).

IX. Other Types of Autoregressive Models


Analogous to the autoregressive logit and probit models, we can think of autoregressive tobit models of the following form:
$$y^*_{it} = \alpha_i + \beta'x_{it} + \gamma y_{i,t-1} + u_{it} \tag{27}$$
$$y_{it} = y^*_{it} \text{ if } y^*_{it} > 0, \qquad y_{it} = 0 \text{ otherwise}$$
The methods of estimation for fixed effects and random effects tobit models carry through for this type of model, though again there is the problem of the initial conditions. It is better to start with a probability distribution for $y_{i0}$ rather than treat them as fixed.
One other problem that arises is the issue of lagged index vs. lagged dummy (or censored) variable models. For instance, in models (26) or (27) it is the lagged value of the realized variable $y_{it}$ rather than the latent variable $y^*_{it}$ that occurs. A lagged index model would be to specify
$$y^*_{it} = \alpha_i + \beta'x_{it} + \gamma y^*_{i,t-1} + u_{it} \tag{28}$$
That is, it is the latent variable $y^*_{i,t-1}$ that influences $y^*_{it}$, not the realized value. In studies of unemployment, this implies that it is the "propensity to be unemployed" in the last period rather than the actual state of employment or unemployment that determines the current probability of unemployment.
An example of the lagged index model is in Grether and Maddala (1982) where a model like (28), without the $\alpha_i$, has been estimated with data from the 1972, 1974, 1976 election panel study administered by the Center for Political Studies at the University of Michigan. Though the example is from the political science area, there are likely to be several examples in economics where the lagged index model makes sense. Some Monte Carlo evidence is also presented in Grether and Maddala (1982) on the performance of the suggested estimator.

X. Serial Correlation or State Dependence


The issue of serial correlation vs. state dependence discussed with continuous-variable models in Equations (10) and (11) has also been
raised in the case of dummy variable models. In the case of continuous variable models this distinction can be made if there are explanatory variables x. In the case of dummy variable models Heckman (1978) argues that this distinction can be made even without any explanatory variables. He uses some tests based on the observed runs of 0 and 1. Chamberlain (1978, 1984) and Lee (forthcoming) point out the limitations of these tests and suggest corrections. It is not possible to review these tests in detail here, but some simple procedures suggested by Chamberlain are as follows:
Consider the two models:
$$y^*_t = \alpha + \beta y_{t-1} + e_t \tag{29}$$
and
$$y^*_t = \alpha + u_t, \qquad u_t = \rho u_{t-1} + e_t \tag{30}$$
In both models, $y_t = 1$ if $y^*_t > 0$, $y_t = 0$ otherwise. Model (29) is a pure state dependence model and model (30) is a serial correlation model with no state dependence. In model (29)
$$\mathrm{Prob}(y_t = 1 \mid \alpha, y_{t-1}, y_{t-2}, \ldots) = \mathrm{Prob}(y_t = 1 \mid \alpha, y_{t-1})$$
In model (30), however, this probability depends on the entire history of the process. Thus, model (29) implies a first order Markov chain but model (30) does not, for the discrete sequence it generates. Hence, one could test whether the effect of $y_{t-2}$ on $y_t$ is zero or not. But what if we specify (29) with $y_{t-2}$ included? Now the distinction depends on whether the coefficient of $y_{t-3}$ is zero, and so on. Chamberlain says that basing the test on the order of autoregression in (29) is not attractive. With the availability of data on $x_t$, one can reformulate the test as asking whether there is a dynamic response to changes in $x_t$ or not. Consider
$$y^*_{it} = \beta'x_{it} + \gamma y_{i,t-1} + u_{it} \tag{31}$$
If $\gamma = 0$ then
$$\mathrm{Prob}(y_{it} = 1 \mid x_{it}, x_{i,t-1}, \ldots) = \mathrm{Prob}(y_{it} = 1 \mid x_{it})$$
Thus, a test for state dependence is carried out by including lagged x's and testing whether their coefficients are significant or not. Chamberlain suggests using this simple test in practice allowing for individual specific effects.
One problem with serial correlation is that it depends crucially on the sampling interval. The smaller this interval, the higher the serial correlation. Suppose our period of observation is one day. The probability that a person who worked yesterday would work today would be very close to one. Hence, Chamberlain argues that finding a significant coefficient for $\gamma$ in (31) may say very little about the underlying process. The underlying process is a complete description of the amount of time spent by the individual in each state (say, employment and unemployment). The analysis now depends on whether the sample is generated by point sampling or interval sampling. In
point sampling we observe what state the individual is in at each point in
time. In interval sampling we ask whether the individual was ever in one of
the states during the previous time period (say, year). An example of this is
the question, "Did you work last year?"
Chamberlain (1984) calls the tests he derives tests for "duration dependence" rather than "state dependence." The tests are based on conditional
logit models. For point sampling, he suggests estimating $\gamma_2$ from the following model and testing $\gamma_2 = 0$. (We will drop the i-subscript, since the model
is to be used for each i. We obtain $\gamma_2$ for each i and then get a weighted
average.)

$$\operatorname{Prob}\Big(y_1, y_2, \ldots, y_T \,\Big|\, y_1, y_2, \sum_t y_t, \sum_t y_t y_{t-1}, y_{T-1}, y_T\Big) = \frac{\exp\big(\gamma_2 \sum_{t=3}^{T} y_t y_{t-2}\big)}{\sum_{d \in B} \exp\big(\gamma_2 \sum_{t=3}^{T} d_t d_{t-2}\big)}$$

where

$$B = \Big\{ d = (d_1, d_2, \ldots, d_T) \;\Big|\; d_t = 0 \text{ or } 1,\ d_1 = y_1,\ d_2 = y_2,\ \sum_t d_t = \sum_t y_t,\ \sum_t d_t d_{t-1} = \sum_t y_t y_{t-1},\ d_{T-1} = y_{T-1},\ d_T = y_T \Big\}$$
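The point-sampling conditional probability can be evaluated by brute-force enumeration when T is small. The sketch below is an illustration, not Chamberlain's own procedure: it lists every 0/1 sequence that shares the conditioning statistics of an observed sequence (first two and last two values, the number of ones, and the number of adjacent pairs of ones) and then weights each sequence by the exponential of the coefficient times the sum of products of values two periods apart. The function names are hypothetical.

```python
from itertools import product
from math import exp


def conditioning_set(y):
    """All 0/1 sequences d that share the conditioning statistics of y."""
    T = len(y)
    total = sum(y)
    adjacent = sum(y[t] * y[t - 1] for t in range(1, T))
    return [
        d for d in product((0, 1), repeat=T)
        if d[0] == y[0] and d[1] == y[1]
        and d[-2] == y[-2] and d[-1] == y[-1]
        and sum(d) == total
        and sum(d[t] * d[t - 1] for t in range(1, T)) == adjacent
    ]


def lag2_stat(d):
    """Sum over t = 3..T of d_t * d_{t-2} (0-based indices 2..T-1)."""
    return sum(d[t] * d[t - 2] for t in range(2, len(d)))


def conditional_prob(y, gamma2):
    """Probability of y given its conditioning statistics."""
    y = tuple(y)
    denom = sum(exp(gamma2 * lag2_stat(d)) for d in conditioning_set(y))
    return exp(gamma2 * lag2_stat(y)) / denom


y = (1, 0, 1, 0, 0, 0)
print(conditioning_set(y))
print(conditional_prob(y, gamma2=0.5))
```

Enumerating all 2^T sequences is practical only for short panels. For longer panels the denominator would need a smarter recursion, which is outside the scope of this sketch.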

For interval sampling Chamberlain suggests a similar test for $\gamma_2$ except
that the conditioning set B is different (details are in Chamberlain 1984).
The method requires $T \geq 6$ in order to generate any conditional probabilities that depend on $\gamma_2$. For the case T = 6, there is only one set that produces
conditional probabilities that depend on $\gamma_2$. This set includes two sequences:
S1 = (101000) and S2 = (100100). The conditional probabilities are:

$$\operatorname{Prob}(S_1 \mid S_1 \text{ or } S_2) = \frac{\exp(\gamma_2)}{1 + \exp(\gamma_2)}$$

and

$$\operatorname{Prob}(S_2 \mid S_1 \text{ or } S_2) = \frac{1}{1 + \exp(\gamma_2)}$$

An estimate of $\exp(\gamma_2)$ is obtained by dividing the number of individuals
with sequence S1 by the number of individuals with sequence S2. Corcoran
(1982) and Corcoran and Hill (1985) use Chamberlain's model with interval
sampling. It is not clear, however, how many observations were
available for the estimation of $\gamma_2$. As with most of the conditional logit
models, it appears that only a small portion of the data set can be used.
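For the T = 6 case, the estimate described above reduces to a ratio of two counts. A minimal sketch follows, using a made-up panel (the sequences and their frequencies are hypothetical, chosen only for illustration). It also reports how many individuals actually contribute to the estimate, which makes the point about discarded data concrete.

```python
from collections import Counter
from math import log

S1 = (1, 0, 1, 0, 0, 0)
S2 = (1, 0, 0, 1, 0, 0)


def estimate_gamma2(sequences):
    """Return (estimate of exp(gamma2), estimate of gamma2, n1, n2).

    n1 and n2 are the numbers of individuals with sequences S1 and S2.
    All other sequences are ignored by the conditional approach.
    """
    counts = Counter(tuple(s) for s in sequences)
    n1, n2 = counts[S1], counts[S2]
    if n1 == 0 or n2 == 0:
        raise ValueError("need at least one S1 and one S2 sequence")
    ratio = n1 / n2
    return ratio, log(ratio), n1, n2


# Hypothetical panel of 100 individuals observed for six periods.
panel = (
    [S1] * 12
    + [S2] * 8
    + [(1, 1, 1, 1, 1, 1)] * 50
    + [(0, 0, 0, 0, 0, 0)] * 30
)
ratio, gamma2_hat, n1, n2 = estimate_gamma2(panel)
print(f"exp(gamma2) estimate: {ratio:.3f}, gamma2 estimate: {gamma2_hat:.3f}")
print(f"individuals used: {n1 + n2} of {len(panel)}")
```

The ratio is the value that maximizes the binomial conditional likelihood built from the two probabilities above. In this invented panel only 20 of the 100 individuals enter the estimate.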

XI. Summary and Conclusions


We started with a review of some problems arising in the
estimation of continuous time models with panel data. We reviewed the
issues of fixed effects vs. random effects, the problem of initial conditions in
dynamic models, and the issue of serial correlation vs. state dependence.
We next reviewed these issues with qualitative and limited dependent
variable models. With qualitative variables, for fixed effects models we use
the conditional logit model of Chamberlain. In dynamic models this usually
involves discarding a large proportion of the sample data. All in all, estimation of random effects models appears to be more advantageous. They use
all the observations and are more flexible. Here the model we consider is the
probit model rather than the logit model. We considered three methods for
the estimation of random effects probit models: the Heckman-Willis model,
the Avery-Hansen-Hotz model, and the Chamberlain model.
With tobit models, one can estimate both the fixed effects and random
effects models. Also, there are no extra problems with dynamic models
except the problem of how to deal with initial conditions. Here it is desirable
to treat the initial variables as random rather than as fixed.
With dynamic models, one should consider whether the lagged index
model or the lagged dummy (or censored) variables model is appropriate for
the problem at hand. The estimation of the former is more straightforward
than the latter.
Since serial correlation is dependent on the interval of observation, one
should study "duration dependence." Chamberlain suggests some conditional logit models for this, suitable for point and interval sampling.
Besides the models considered here, there are other types of models
based on panel data and using the methods of limited dependent variables.
An example of this is the estimation of "frontier production functions" with
panel data. Illustrations of this are Pitt and Lee (1981) and Schmidt and
Sickles (1984).
One problem that needs further discussion is that of specification testing.
Tests for exogeneity are available in the Avery-Hansen-Hotz method (the
HOTZTRAN program). Chamberlain's models are concerned with tests for
exogeneity as well as omitted variable bias. The Breusch and Pagan test
needs to be extended to cover limited dependent variables and the tests
discussed in Lee and Maddala (1985) need to be extended to cover panel
data.

References
Amemiya, Takeshi. 1985. Advanced Econometrics. Cambridge: Harvard University Press.
Andersen, E. B. 1973. Conditional Inference and Models for Measuring. Copenhagen: Mentalhygiejnisk Forlag.
Anderson, T. W., and C. Hsiao. 1982. "Formulation and Estimation of Dynamic Models Using Panel Data." Journal of Econometrics 18(1):67-82.

Avery, R. B., L. P. Hansen, and V. J. Hotz. 1983. "Multiperiod Probit Models and Orthogonality Condition Estimation." International Economic Review 24(1):21-35.
Balestra, P., and M. Nerlove. 1966. "Pooling Cross-Section and Time-Series Data in the Estimation of a Dynamic Model: The Demand for Natural Gas." Econometrica 34(4):585-612.
Bhargava, A., and J. D. Sargan. 1983. "Estimating Dynamic Random Effects Models From Panel Data Covering Short Time Periods." Econometrica 51(6):1635-59.
Bjorklund, Anders. 1985. "Unemployment and Mental Health: Some Evidence From Panel Data." The Journal of Human Resources 20(4):469-83.
Breusch, T., and A. R. Pagan. 1980. "The Lagrange Multiplier Test and Its Applications to Model Specification in Econometrics." Review of Economic Studies 47:239-53.
Butler, J. S., and R. Moffitt. 1982. "A Computationally Efficient Quadrature Procedure for the One-Factor Multinomial Probit Model." Econometrica 50(3):761-64.
Chamberlain, Gary. 1978. "On the Use of Panel Data." Manuscript. Harvard University.
Chamberlain, Gary. 1980. "Analysis of Covariance With Qualitative Data." Review of Economic Studies 47:225-38.
Chamberlain, Gary. 1982. "Multivariate Regression Models for Panel Data." Journal of Econometrics 18:5-46.
Chamberlain, Gary. 1984. "Heterogeneity, Omitted Variable Bias and Duration Dependence." In Longitudinal Analyses of Labor Market Data, ed. J. Heckman and B. Singer. New York: Academic Press.
Chamberlain, Gary. 1985. "Panel Data." In Handbook of Econometrics, ed. Z. Griliches and M. D. Intriligator, vol. 2, 1248-1318. Amsterdam: North-Holland Publishing Co.
Corcoran, Mary. 1982. "The Employment, Wage and Fertility Consequences of Teenage Women's Nonemployment." In The Youth Labor Market Problem: Its Nature, Causes and Consequences, ed. R. B. Freeman and D. A. Wise. Chicago: University of Chicago Press.
Corcoran, Mary, and Martha S. Hill. 1985. "Reoccurrence of Unemployment Among Young Adult Men." The Journal of Human Resources 20(2):165-83.
Cox, D. R. 1958. "The Regression Analysis of Binary Sequences" (with Discussion). Journal of the Royal Statistical Society Series B 20:215-42.
Fuller, W. A., and G. E. Battese. 1973. "Transformations for Estimation of Linear Models With Nested Error Structure." Journal of the American Statistical Association 68:626-32.
Grether, D. M., and G. S. Maddala. 1982. "A Time Series Model With Qualitative Variables." In Games, Economic Dynamics and Time Series Analysis, ed. M. Diestler, E. Furst, and G. Schwodiauer, 291-305. Vienna-Wurzburg: Physica-Verlag.
Hansen, Lars P. 1982. "Large Sample Properties of Generalized Method of Moments Estimators." Econometrica 50(4):1029-54.

Hausman, Jerry. 1978. "Specification Tests in Econometrics." Econometrica 46:1252-72.
Hausman, Jerry, and William E. Taylor. 1981. "Panel Data and Unobservable Individual Effects." Econometrica 49(6):1377-98.
Hausman, Jerry, and David Wise. 1979. "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment." Econometrica 47(2):455-73.
Heckman, James J. 1978. "Simple Statistical Models for Discrete Panel Data Developed and Applied to Test the Hypothesis of True State Dependence Against the Hypothesis of Spurious State Dependence." Annales de L'INSEE 30/31:227-69.
Heckman, James J. 1981. "Statistical Models for Discrete Panel Data." In Structural Analysis of Discrete Data With Econometric Applications, ed. C. F. Manski and D. McFadden, 114-78. Cambridge: MIT Press.
Heckman, James J. 1981. "The Incidental Parameters Problem and the Problem of Initial Conditions in Estimating a Discrete Time Discrete Data Stochastic Process." In Structural Analysis of Discrete Data With Econometric Applications, ed. C. F. Manski and D. McFadden, 179-95. Cambridge: MIT Press.
Heckman, James J. 1981. "Heterogeneity and State Dependence." In Studies in Labor Markets, ed. S. Rosen, 91-139. Chicago: University of Chicago Press.
Heckman, James J. 1984. "Econometric Duration Analysis." Journal of Econometrics 24(1/2):63-132.
Heckman, James J., and George J. Borjas. 1980. "Does Unemployment Cause Future Unemployment? Definitions, Questions and Answers From a Continuous Time Model of Heterogeneity and State Dependence." Economica 47(187):247-83.
Heckman, James J., and Thomas E. MaCurdy. 1980. "A Life-Cycle Model of Female Labor Supply." Review of Economic Studies 47:47-74.
Heckman, James J., and Robert Willis. 1976. "Estimation of a Stochastic Model of Reproduction: An Econometric Approach." In Household Production and Consumption, ed. N. Terleckyj. New York: National Bureau of Economic Research.
Johnson, Norman L., and Samuel Kotz. 1972. Continuous Multivariate Distributions. New York: Wiley.
Lee, Lung-fei. 1981. "Efficient Estimation of Dynamic Error Components Models With Panel Data." In Time-Series Analysis, ed. O. D. Anderson and M. R. Perryman, 267-85. Amsterdam: North-Holland.
Lee, Lung-fei. Forthcoming. "Analysis of Econometric Models for Discrete Panel Data in the Multivariate Log-Linear Probability Models." Discussion Paper #23, October 1980, Center for Econometrics and Decision Sciences, University of Florida. Journal of Econometrics.
Lee, Lung-fei, and G. S. Maddala. 1985. "The Common Structure of Tests for Selectivity Bias, Serial Correlation, Heteroscedasticity and Non-Normality in the Tobit Model." International Economic Review 26(1):1-20.
Maddala, G. S. 1971. "The Use of Variance Components Models in Pooling Cross-Section and Time-Series Data." Econometrica 39(2):341-58.

Maddala, G. S. 1983. Limited Dependent and Qualitative Variables in Econometrics. New York: Cambridge University Press.
Malinvaud, E. 1970. Statistical Methods of Econometrics. Amsterdam: North-Holland Publishing Co.
Mundlak, Yair. 1978. "On the Pooling of Time-Series and Cross-Section Data." Econometrica 46(1):69-85.
Nerlove, Marc. 1971. "Further Evidence on the Estimation of Dynamic Economic Relations From a Time-Series of Cross-Sections." Econometrica 39(2):359-82.
Pitt, M. M., and Lung-fei Lee. 1981. "The Measurement of Sources of Technical Inefficiency in the Indonesian Weaving Industry." Journal of Development Economics 9(1):43-64.
Robinson, P. M. 1982. "On the Asymptotic Properties of Estimators of Models Containing Limited Dependent Variables." Econometrica 50(1):27-41.
Schmidt, Peter, and Robin Sickles. 1984. "Production Frontiers and Panel Data." Journal of Business and Economic Statistics 2:367-74.
Sickles, Robin C., and Paul Taubman. 1986. "A Multivariate Error Components Analysis of the Health and Retirement Status of the Elderly." Econometrica 54(6):1339-56.
