J. R. Statist. Soc. B (1992)
54, No. 1, pp. 273-284

Likelihood, Quasi-likelihood and Pseudolikelihood: Some Comparisons

By J. A. NELDER† (Imperial College of Science, Technology and Medicine, London, UK)
and Y. LEE (Hallym University, Chunchon, Korea)

[Received April 1990. Revised December 1990]

SUMMARY
There is considerable interest in the fitting of models jointly to the mean and dispersion of a response. For the mean parameter, the Wedderburn estimating equations are widely accepted. However, there is some controversy about estimating the dispersion parameters. Finite sampling properties of several dispersion estimators are investigated for three models by simulation. We compare the maximum extended quasi-likelihood estimator, the maximum pseudolikelihood estimator and the maximum likelihood estimator, if it exists. Of these estimators, the maximum extended quasi-likelihood estimator is usually superior in minimizing the mean-squared error.

Keywords: DISPERSION PARAMETER; EXPONENTIAL FAMILY; EXTENDED QUASI-LIKELIHOOD; MEAN-SQUARE ERROR; PSEUDOLIKELIHOOD; SIMULATION; VARIANCE FUNCTION

1. INTRODUCTION
There is considerable interest in the fitting of models jointly to the mean and dispersion of a response (Nelder and Pregibon, 1987; Davidian and Carroll, 1987, 1988; Breslow, 1990), and several approaches have been developed. If it is possible to specify a likelihood then maximum likelihood (ML) can be used; see Aitkin (1987) for a discussion of the normal case when both μ and σ² are modelled as functions of covariates. For non-normal errors a common extension is to the class of generalized linear models (GLMs) (McCullagh and Nelder, 1989); however, the GLMs with Poisson and binomial errors have the variance as a fixed function of the mean. To allow the dispersion to vary independently it is usually necessary to move to models for which there is no exact likelihood, but for which a quasi-likelihood (QL) can be defined, based on the first two moments only of the distribution (McCullagh and Nelder (1989), chapter 9). For some models, there may exist a true likelihood with a distribution not belonging to the GLM family, and also an alternative QL formulation. The latter is usually easier to fit, and hence one question to arise concerns the properties of the maximum quasi-likelihood (MQL) estimators relative to the ML estimators for such models.
Wedderburn (1974) gave estimating equations for the parameters in the model for the mean, and these are widely accepted. For GLMs with a full likelihood they are the ML equations; otherwise they are MQL equations. Working from the viewpoint of optimum estimating equations, Godambe and Thompson (1989) gave another form for these equations; however, as Nelder (1989) pointed out, for distributions whose third and fourth cumulants follow the exponential family pattern, these reduce to the Wedderburn equations. There is thus little disagreement about the estimating equations for the model for the mean. For modelling the dispersion, some possibilities are discussed in McCullagh and Nelder (1989), chapter 10; the main dichotomy lies in the choice of response variable for the dispersion analysis. One approach is to use d, the deviance component for each observation, and the other to use X², the Pearson χ²-component. Associated respectively with these two alternatives are two criteria, extended quasi-likelihood (EQL) and pseudolikelihood (PL). EQL was introduced by Nelder and Pregibon (1987) and leads to the use of d, while PL (Carroll and Ruppert, 1982; Davidian and Carroll, 1987; Breslow, 1990) uses a normal likelihood with σ² a function of μ and leads to the use of X². For normal errors, all three criteria, the likelihood, EQL and PL, are equivalent. However, they become different when we move to models with non-normal errors, in particular the class of GLMs, and here there is considerable disagreement about the best form for the estimating equations. On the basis of asymptotic theory, Davidian and Carroll (1988) have preferred PL to EQL for some problems. Thus a second question concerns the finite sample properties of the two corresponding estimators.

†Address for correspondence: Department of Mathematics, Imperial College of Science, Technology and Medicine, Huxley Building, 180 Queen's Gate, London, SW7 2BZ, UK.
Breslow (1990) is interested in developing robust standard errors and tests for means that do not depend for their validity on a correct specification of the dispersion parameters. However, for the analysis of quality improvement experiments, for example, estimation of dispersion parameters is itself of interest. In this paper we present simulations for three models. In the first we consider a model involving a distribution for the errors which does not belong to the GLM family of distributions, but for which there exists a QL. The model therefore has a likelihood from which ML estimators can be derived. Hence, we compare the likelihood, EQL and PL for the dispersion parameters. In the second example we have a model specified by the first two moments only, and we compare QL and PL for estimating a parameter in the variance function; this example contains an example of Davidian and Carroll (1988). The third example is taken from Dean et al. (1989) and involves a Poisson mixture with the inverse Gaussian distribution, the latter being parameterized so that the resulting variance function has the negative binomial form μ + αμ². We compare the likelihood, EQL and PL for the joint estimation of β, the parameters in the model for μ, and α.

Section 2 describes the estimating equations for the three criteria; in Section 3 we present the simulations, and in Section 4 we discuss the results.

2. THE THREE CRITERIA


2.1. Generalized Linear Models
For the class of GLMs we have a response variable y for which the log-likelihood can be written in the form

l(θ; y) = {yθ − b(θ)}/a(φ) + c(y, φ),   (2.1)

where θ is the canonical parameter and a(φ) has the form φ/m, where φ is the dispersion parameter and m is the prior weight. For the following discussion we shall take m = 1. The mean and variance of y are given by

E(y) = μ = b′(θ)

and

var(y) = φ b″(θ).
In a GLM we assume η = g(μ) = Σ xⱼβⱼ, where the xⱼ are covariates and g(·) is a monotonic function known as the link function. The variance is the product of two terms, the dispersion parameter φ and the variance function b″(θ), which we write in the form V(μ) = dμ/dθ. In the standard form of GLM we model μ as a function of unknown parameters β, assuming φ fixed and with V(·) containing no unknown parameters. Of the distributions with likelihoods of the form (2.1), the Poisson and binomial have φ = 1, i.e. fixed a priori, while the normal, gamma and inverse Gaussian distributions have φ variable and usually unknown. The negative binomial distribution, whose variance function can be written in the form V(μ) = μ + μ²/k, gives an example of a variance function containing an unknown parameter.
For φ fixed the ML equations for the β are independent of φ and are given by

Σ W (y − μ) (dη/dμ) xⱼ = 0,   (2.2)

where W is the weight function defined by

W⁻¹ = (dη/dμ)² V(μ).

As a measure of discrepancy of a fit for one observation, we may use the deviance component d, given by

d = 2 ∫ (y − u)/V(u) du,

the integral being taken from μ to y. The deviance D is the sum of d over the observations.
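As an aside (ours, not part of the original paper), the deviance component can be evaluated directly from any supplied variance function by numerical integration. The minimal sketch below assumes NumPy and SciPy are available and checks the result against the familiar closed form for the Poisson case V(μ) = μ.

```python
import numpy as np
from scipy.integrate import quad

def deviance_component(y, mu, V):
    """d(y, mu) = 2 * integral from mu to y of (y - u)/V(u) du."""
    val, _ = quad(lambda u: (y - u) / V(u), mu, y)
    return 2.0 * val

# Check against the Poisson closed form 2*{y*log(y/mu) - (y - mu)}, for which V(u) = u.
y, mu = 7.0, 4.2
d_num = deviance_component(y, mu, V=lambda u: u)
d_poisson = 2.0 * (y * np.log(y / mu) - (y - mu))
print(d_num, d_poisson)   # the two values should agree closely
```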
For all GLMs, we have the relation

∂l/∂μ = (y − μ)/{φ V(μ)},   (2.3)

so that this first derivative of the likelihood depends only on the first two moments of y. This led Wedderburn (1974) to define a QL (more strictly a quasi-log-likelihood) q by the relation

∂q/∂μ = (y − μ)/{φ V(μ)}.   (2.4)

The use of q as a criterion for fitting allows the class of GLMs to be extended to models defined only by the properties of the first two moments. The QL q will be a true likelihood if there is a distribution of the GLM type having var(y) = φ V(μ). QLs allow two kinds of extension to GLMs. In the first, GLMs with fixed φ = 1 can be extended to allow a variable φ; for example log-linear models with Poisson errors for which var(y) = μ can be enlarged to allow overdispersion with var(y) = φμ. In the second extension, V(μ) takes a form which does not correspond to that of a standard GLM, for example V(μ) = μ^α for variable α.

2.2. Quasi-distributions
A quasi-(log-)likelihood q can be made into a distribution by normalizing it. Such a distribution we call a quasi-distribution. If q has parameter vector μ, then the quasi-distribution has frequency function

f_q = exp(q)/ω,

where ω = ∫ exp(q) dy, and likelihood

l_q = q − log ω.

The ML equations for the quasi-distribution, ∂l_q/∂μ = 0, and the MQL equations, ∂q/∂μ = 0, will differ by a term

∂(log ω)/∂μ = (1/ω) ∂/∂μ ∫ exp(q) dy
(assuming that we can differentiate inside the integral)
= (1/ω) ∫ exp(q) (y − μ)/{φ V(μ)} dy
= (μ* − μ)/{φ V(μ)},

where μ* is the quasi-mean ∫ y f_q dy. If μ* − μ is small compared with y − μ we can expect the MQL estimates to approximate closely the ML estimates from the quasi-distribution.
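As an illustration of this construction (ours, not the authors'), the quasi-mean μ* can be computed numerically for a count response by normalizing exp(q) over a grid of y values. The sketch below takes q in the EQL form for an overdispersed-Poisson variance φμ, using the Stirling-type constant y + 1/6 introduced in Section 2.3 so that y = 0 is allowed; both choices are assumptions made only for this example.

```python
import numpy as np

def poisson_dev(y, mu):
    """Deviance components for V(mu) = mu, with y = 0 handled."""
    ylogy = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0) / mu), 0.0)
    return 2.0 * (ylogy - (y - mu))

def q_eql(y, mu, phi):
    """EQL form of q for var(y) = phi*mu; the constant y + 1/6 follows the
    amended Stirling approximation (an assumption for this illustration)."""
    return -0.5 * (poisson_dev(y, mu) / phi
                   + np.log(2 * np.pi * phi * (y + 1.0 / 6.0)))

def quasi_mean(mu, phi, ymax=1000):
    """mu* = sum of y*f_q(y) after normalizing exp(q) over y = 0, 1, ..., ymax."""
    y = np.arange(ymax + 1, dtype=float)
    f = np.exp(q_eql(y, mu, phi))
    f /= f.sum()                      # normalized quasi-distribution
    return float((y * f).sum())

for mu in (1.0, 5.0, 20.0):
    print(mu, quasi_mean(mu, phi=2.0))   # mu* - mu indicates how far MQL and ML may differ
```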

2.3. Extended Quasi-likelihood and Pseudolikelihood
The Wedderburn form of QL can be used to compare different linear predictors or different link functions on the same data. It cannot, however, be used to compare different variance functions on the same data. For this we need the EQL of Nelder and Pregibon (1987). This is most simply written in the form of the corresponding extended quasi-deviance D⁺, which takes the form

D⁺ = Σ dᵢ/φᵢ + Σ log{2π φᵢ V(yᵢ)},   (2.5)

where φ is the dispersion parameter and V(y) is the variance function applied to the observation. For simplicity of notation, we shall suppress subscripts. When there exists a distribution of the exponential family with a given variance function, it turns out that the EQL is the saddlepoint approximation to that distribution. The approximation is exact for the normal and inverse Gaussian distributions, differs for the gamma distribution by a function that depends only on the shape parameter and for the discrete distributions by the replacement of all the factorials by their Stirling approximations. Because the Stirling approximation fails for y = 0, it is necessary to amend it to take the form
n! ≈ √{2π(n + 1/6)} nⁿ exp(−n).
For details see Nelder and Pregibon (1987), section 4. In equation (2.5), we may either regard φ as known, in which case we have a deviance for a GLM with φ as dispersion, or we can regard μ, and hence d, as known, in which case we have a QL for the dispersion model of gamma form. Thus we can use the extended quasi-deviance D⁺ as a criterion to be minimized for models involving both mean and dispersion. Such estimates we call MEQL.
PL gives rise to an analogous extended deviance of the form

Dₚ = Σ X²ᵢ/φᵢ + Σ log{2π φᵢ V(μᵢ)},   (2.6)

where X² is the Pearson χ²-component for the observation. It can be obtained by writing down a normal likelihood in which the variance is a function of the mean. It differs from the EQL by having X² in place of d and V(μ) in place of V(y). The analogous estimates will be called MPL.
The third criterion, likelihood, needs no description here.
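To make the comparison concrete, here is a small sketch (ours, with hypothetical helper names) that evaluates both criteria for a supplied variance function; it uses gamma-distributed data with V(μ) = μ², for which the deviance component has the closed form d = 2{(y − μ)/μ − log(y/μ)}.

```python
import numpy as np

def extended_quasi_deviance(y, mu, phi, V, dev):
    """D+ of equation (2.5): sum d_i/phi_i + sum log(2*pi*phi_i*V(y_i))."""
    return np.sum(dev(y, mu) / phi) + np.sum(np.log(2 * np.pi * phi * V(y)))

def pseudo_deviance(y, mu, phi, V):
    """D_P of equation (2.6): Pearson X^2 in place of d, V(mu) in place of V(y)."""
    X2 = (y - mu) ** 2 / V(mu)
    return np.sum(X2 / phi) + np.sum(np.log(2 * np.pi * phi * V(mu)))

# Gamma-type example: V(mu) = mu**2 and d = 2*{(y - mu)/mu - log(y/mu)}.
V = lambda m: m ** 2
dev = lambda y, m: 2.0 * ((y - m) / m - np.log(y / m))

rng = np.random.default_rng(0)
mu = np.full(50, 3.0)
y = rng.gamma(shape=4.0, scale=mu / 4.0)   # gamma data with dispersion phi = 1/4
phi = 0.25
print(extended_quasi_deviance(y, mu, phi, V, dev), pseudo_deviance(y, mu, phi, V))
```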

3. SIMULATIONS
3.1. Example 1: NBα-distribution
The standard negative binomial model assumes a gamma distribution for u, the mean of the Poisson distribution. If we write this gamma distribution in the form

u^(ν−1) exp(−u/α)/{Γ(ν) α^ν} du,   (3.1)

then, for the mixture distribution,

E(y) = μ = αν

and

var(y) = αν + α²ν.

The standard negative binomial model for GLMs arises from assuming that ν is fixed as μ varies. This gives

var(y) = μ + μ²/ν.   (3.2)

If, however, we assume that μ varies with ν and α remains constant, we obtain a distribution that we call the NBα-distribution; for this

var(y) = μ(1 + α),   (3.3)

so that we obtain a model with an error distribution resembling an overdispersed Poisson distribution, with dispersion parameter φ = 1 + α. This distribution is not a standard GLM-type exponential family, so that the ML and MQL equations are different. The NBα-distribution has found applications in social science; see, for example, Hausman et al. (1984) and King (1988).
Let dlg, ddg and dtg be differences of log-gamma, digamma and trigamma functions respectively, defined by

dlg(y, ν) = log Γ(y + ν) − log Γ(ν),

ddg(y, ν) = 0 if y = 0, and 1/ν + 1/(ν + 1) + … + 1/(ν + y − 1) if y = 1, 2, …,

and

dtg(y, ν) = 0 if y = 0, and −{1/ν² + 1/(ν + 1)² + … + 1/(ν + y − 1)²} if y = 1, 2, ….

The log-likelihood of the NBα-distribution can be written

l(y; μ, α) = dlg(y, μ/α) + y log α − (y + μ/α) log(1 + α) − log y!.
Hence, the ML estimating equations for a log-link become

∂l/∂β_r = Σ x_r (μ/α){ddg(y, μ/α) − log(1 + α)} = 0   (3.4)

for the parameters in the linear predictor and

∂l/∂α = −Σ (μ/α²){ddg(y, μ/α) − log(1 + α)} + Σ (y − μ)/{α(1 + α)} = 0

for the non-linear parameter α. If the intercept term is included in the modelling of the mean μ, ∂l/∂α = 0 reduces to the equation

Σ (y − μ)/α = 0.   (3.5)
For given α we solve equation (3.4) for β by the Newton method. The Hessian-type matrix used in the Newton step is of the form

H_rs = −Σ x_r x_s (μ/α){ddg(y, μ/α) − log(1 + α)} − Σ x_r x_s (μ/α)² dtg(y, μ/α).

However, H may have a non-positive eigenvalue; if so, we replace H by

−Σ x_r x_s (μ/α)² dtg(y, μ/α),

since the expectation of the first term of H is 0 because E(∂l/∂β_r) = 0. This prevents misbehaviour of the Newton method if H should become non-positive-definite, and is similar to Fisher scoring, which replaces the Hessian matrix by its expectation. The estimation of α was done by a simple one-dimensional search. The denominator α in equation (3.5) makes the solution locus more quadratic and hence helps convergence. The estimation procedures for β and α are carried out alternately until convergence. If the likelihood has a negative slope at α = 0, we truncate the estimate to 0. If the estimate is small and positive, we avoid the pole in equation (3.5) by giving a penalty value for non-positive values of α during the search. This ML estimator is numerically difficult to obtain compared with the QL and PL estimators.
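For concreteness, a minimal sketch of this likelihood and of the one-dimensional search for α is given below (our illustration, not the authors' code); it writes dlg directly in terms of log-gamma functions, holds the fitted means fixed rather than alternating with the β step, and uses a bounded scalar minimizer in place of the penalized search described above.

```python
import numpy as np
from scipy.special import gammaln
from scipy.optimize import minimize_scalar

def nba_loglik(y, mu, alpha):
    """Log-likelihood of the NB-alpha distribution: a Poisson mixed with a gamma
    of shape nu = mu/alpha and scale alpha, so that var(y) = mu*(1 + alpha)."""
    nu = mu / alpha
    return np.sum(gammaln(y + nu) - gammaln(nu) - gammaln(y + 1)
                  + y * np.log(alpha) - (y + nu) * np.log(1.0 + alpha))

# One-dimensional search for alpha-hat with the means held fixed (illustrative values).
rng = np.random.default_rng(1)
mu, alpha_true, n = 5.0, 1.0, 50
u = rng.gamma(shape=mu / alpha_true, scale=alpha_true, size=n)
y = rng.poisson(u)
res = minimize_scalar(lambda a: -nba_loglik(y, np.full(n, mu), a),
                      bounds=(1e-6, 20.0), method="bounded")
print(res.x)   # ML estimate of alpha for fixed mu
```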

The QL and PL estimators of β do not require knowledge of α, and those for α can be easily obtained from equations (2.5) and (2.6): since we estimate the mean by using explanatory variables, the estimators (adjusted for sample size) become

α̂_EQL = Σ d/(n − p) − 1

and

α̂_PL = Σ X²/(n − p) − 1.

Using leverages, as Davidian and Carroll (1987) suggest, leads to another possible modification giving PL equations for scale parameters based on

Σ {X²ᵢ − (1 − hᵢ)(1 + α)}/(1 + α)² = 0,

where the hᵢ are the leverages. In this paper, for simplicity, we consider only adjustment for sample size; this was also suggested by Breslow (1990). The QL and PL estimators are also truncated at 0, as above.
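A small sketch of these two sample-size-adjusted estimators follows (ours; the parameter p denotes the number of fitted mean parameters, and the means are treated as known purely for illustration).

```python
import numpy as np

def alpha_estimates(y, mu, p):
    """Sample-size-adjusted MEQL and MPL estimators of alpha for the
    overdispersed-Poisson variance var(y) = (1 + alpha)*mu, truncated at 0."""
    ylogy = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0) / mu), 0.0)
    d = 2.0 * (ylogy - (y - mu))          # Poisson deviance components
    X2 = (y - mu) ** 2 / mu               # Pearson components
    n = len(y)
    a_eql = max(d.sum() / (n - p) - 1.0, 0.0)
    a_pl = max(X2.sum() / (n - p) - 1.0, 0.0)
    return a_eql, a_pl

rng = np.random.default_rng(2)
mu = np.full(50, 5.0)
y = rng.poisson(rng.gamma(shape=mu / 1.0, scale=1.0))   # NB-alpha data with alpha = 1
print(alpha_estimates(y, mu, p=1))
```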
To investigate the properties of these estimators, we did an experiment with five factors A, B, C, D and E, each at two levels. These were chosen to reflect aspects of the configuration of the means which we thought might be important. The factors and their levels were as follows:

A - the minimum value of the μᵢ = 1, 5;
B - the range of μ, defined as max{μᵢ}/min{μᵢ} = 5, 10;
C - the sample size, 20, 50;
D - the evenness of spread for the explanatory variable x; the level is 0 if x-values are equally concentrated at end points and is 1 if x-values are equally spread over the range;
E - the magnitude of α = 1, 5, i.e. slight or marked overdispersion.

We used a 2^(5−1) fractional factorial design with alias I = ABCDE, from which we can estimate all main effects and two-factor interactions. When x-values are equally concentrated at end points, there are no solutions for the ML, MPL and MEQL estimators if all the values of y in one group are 0. Such data sets were discarded and replacement sets generated.
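By way of illustration only (our reconstruction of the set-up, not the authors' simulation code), one data set under this design can be generated as follows; the factor coding and the gamma-Poisson mixing follow the description above.

```python
import numpy as np

def simulate_nba(min_mu, ratio, n, spread_even, alpha, rng):
    """Generate one data set under the NB-alpha model with a log-linear mean,
    following factors A-E above (an illustrative reconstruction)."""
    # Factor D: x either split equally between the two end points or spread evenly.
    x = np.linspace(0.0, 1.0, n) if spread_even else np.repeat([0.0, 1.0], n // 2)
    # Factors A and B fix the minimum mean and the ratio max(mu)/min(mu).
    beta0, beta1 = np.log(min_mu), np.log(ratio)
    mu = np.exp(beta0 + beta1 * x)
    # NB-alpha mixture: u ~ Gamma(shape = mu/alpha, scale = alpha), y | u ~ Poisson(u).
    u = rng.gamma(shape=mu / alpha, scale=alpha)
    return x, rng.poisson(u)

rng = np.random.default_rng(3)
x, y = simulate_nba(min_mu=1, ratio=5, n=20, spread_even=False, alpha=1, rng=rng)
```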
The results for the estimators of α are best expressed in terms of standardized bias and standardized mean-squared error (MSE). These are defined by

standardized bias = (α̂ − α)/(1 + α)

and

standardized MSE = MSE × df/(1 + α)²,

where df is the degrees of freedom. The details of the 16 runs, each with 500 simulations, are given in Table 1.

TABLE 1
Standardized bias, MSE and relative efficiency for three estimators of α

A B C D E   Samples      Standardized bias            Standardized MSE         Relative efficiency
            discarded    ML       MEQL     MPL        ML     MEQL   MPL        MEQL    MPL
            (from 500)
1 1 1 1 1        1      -0.0983  -0.0621  -0.0658    2.209  1.490  1.983      1.482   1.114
2 2 1 1 1        0      -0.0982   0.0020  -0.0188    1.719  1.799  1.813      0.956   0.948
1 1 2 2 1        0      -0.0488  -0.0073  -0.0346    2.473  1.753  2.664      1.411   0.928
2 2 2 2 1        0      -0.0506  -0.0098  -0.0176    2.017  1.992  2.081      1.013   0.969
2 1 1 1 2        0      -0.0998  -0.0920  -0.0705    2.139  1.735  2.263      1.233   0.945
1 2 1 1 2       11      -0.0992  -0.2943  -0.1992    3.135  2.878  3.197      1.089   0.980
2 1 2 2 2        0      -0.0457  -0.0635  -0.0328    2.343  2.101  2.884      1.115   0.812
1 2 2 2 2        0      -0.0402  -0.1665  -0.0688    3.039  3.051  4.456      0.996   0.682
2 1 2 1 1        0      -0.0494  -0.0033  -0.0200    2.096  2.003  2.212      1.047   0.948
1 2 2 1 1        0      -0.0498  -0.0632  -0.0382    2.545  1.843  2.783      1.381   0.917
2 1 1 2 1        0      -0.1016  -0.0070  -0.0208    1.755  1.845  1.836      0.952   0.956
1 2 1 2 1        0      -0.0910  -0.0042  -0.0378    2.044  1.787  1.930      1.144   1.060
1 1 2 1 2        0      -0.0433  -0.3147  -0.1223    4.631  6.415  6.320      0.722   0.733
2 2 2 1 2        0      -0.0492  -0.0823  -0.0405    2.420  2.177  3.056      1.112   0.792
1 1 1 2 2        0      -0.0918  -0.2683  -0.1688    3.475  2.619  3.565      1.327   0.975
2 2 1 2 2        0      -0.1058  -0.0518  -0.0452    1.977  1.821  2.077      1.086   0.952

Mean                    -0.0727  -0.0930  -0.0626    2.501  2.332  2.820      1.129   0.919

The mean standardized biases of the three estimators are all fairly small and in mean value not very different. However, the variation in bias between runs is considerably different, with the ML estimator being least variable and the MEQL estimator most variable. Furthermore, the different factors in the experiment affect the estimators in different ways. The ML standardized bias is primarily affected by C (sample size), being less as the sample size increases; by comparison sample size does not affect the MEQL estimator, the largest effects coming from non-additive effects of A (minimum μ) and E (α). The MPL estimator also shows effects of A and E, but also some effect of sample size.
The standardized MSE for the MPL estimator is almost uniformly worse than that for the MEQL estimator, the ratios for MPL/MEQL varying between 0.985 and 1.52. The relation between the ML estimator and the MEQL estimator is more complex; for 12 out of the 16 treatments the MEQL estimator has smaller standardized MSE, with ratios for ML/MEQL varying from 0.722 to 1.482 with average 1.129. The MEQL estimator is least favourable compared with the ML estimator for treatment 13, with the combination of small minimum μ, large sample size and large α; it is just in this region that the quasi-distribution derived from the QL differs most from the true distribution. Whereas the true distribution is unimodal here, the quasi-distribution has an additional spike at the origin.

3.2. Example 2: Counts with Power Variance Function
Davidian and Carroll (1988) have produced some arguments which, when comparing the estimators based on EQL and PL, indicate some advantages for the latter. In particular, although the estimators are often very close, those based on EQL may be asymptotically biased whereas those based on PL are not. In their example the data are generated by the Poisson distribution. However, the model postulates a variance function of the form

V(μ) = μ^α,

and an estimate is required of α, the true value being 1. The variance of y can be modelled as var(y) = φμ^α, where φ can either be assumed to be 1 or be estimated from the data. In this example, the main interest is the estimation of α.
The finite-sample-adjusted QL and PL estimators for α can be obtained by minimizing respectively the following expressions:

D⁺ = Σ dᵢ/φ + {(n − p)/n} Σ log{2πφ (yᵢ + 1/6)^α}

and

Dₚ = Σ X²ᵢ/φ + {(n − p)/n} Σ log(2πφ μᵢ^α).
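A sketch of the MEQL calculation for this example follows (ours, written under the reconstruction of the adjusted criterion given above, including the y + 1/6 amendment for zero counts; none of the names below come from the paper). It uses two Poisson samples with means 1 and 4, the configuration discussed in the next paragraph, and treats the group means as known.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def power_dev(y, mu, a, eps=1e-6):
    """Deviance component 2*int_mu^y (y-u)/u**a du for the power variance V(u) = u**a.
    The a = 1 (Poisson) case is handled separately to avoid the removable singularity."""
    y = np.asarray(y, dtype=float)
    if abs(a - 1.0) < eps:
        ylogy = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0) / mu), 0.0)
        return 2.0 * (ylogy - (y - mu))
    t1 = (y ** (2.0 - a) - y * mu ** (1.0 - a)) / (1.0 - a)
    t2 = (y ** (2.0 - a) - mu ** (2.0 - a)) / (2.0 - a)
    return 2.0 * (t1 - t2)

def d_plus(a, y, mu, phi=1.0, n_par=2):
    """Sample-size-adjusted extended quasi-deviance for V(mu) = mu**a, with the
    amended constant y + 1/6 in V(y) so that zero counts are allowed (assumptions)."""
    n = len(y)
    d = power_dev(y, mu, a)
    return (np.sum(d) / phi
            + (n - n_par) / n * np.sum(np.log(2 * np.pi * phi * (y + 1.0 / 6.0) ** a)))

rng = np.random.default_rng(4)
mu = np.repeat([1.0, 4.0], 25)
y = rng.poisson(mu).astype(float)
res = minimize_scalar(lambda a: d_plus(a, y, mu), bounds=(0.1, 1.9), method="bounded")
print(res.x)    # MEQL estimate of the power alpha (true value 1)
```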

Davidian and Carroll (1988) discuss a particular example where the data consist of a combination of two Poisson samples with means 1 and 4. However, their asymptotic arguments are irrelevant here since they assume that the dispersion parameter φ converges to 0, while in this example φ stays at the value 1. Because the PL estimator is based on the second moment of the Pearson χ²-residuals it will be asymptotically consistent under appropriate regularity conditions. By contrast, asymptotic properties for the QL estimator are often based on the asymptotics relative to μ rather than to the sample size, and hence it would be consistent for large μ and so expected to give a good performance in that region.
For this example we did a 2³ factorial experiment with 500 simulations for each treatment, with factors A, the range of μ, (1, 4) or (5, 20), B, the sample size, 20 or 50, and C, dispersion fixed at 1 or estimated. The main results are shown in Table 2. As predicted by Davidian and Carroll (1988), the negative bias of the MEQL estimator is larger than that of the MPL estimator, but the differences are not large, and in run 6 the smaller positive bias of MEQL is an advantage. The most striking effect is the large gain in efficiency from estimating the dispersion when the μ values are small (runs 2 and 4). The effect is still visible for large μ, but much less in size. The MEQL estimator is, apart from a trivial difference in run 5, uniformly better than the MPL estimator.

TABLE 2
Sampling statistics for the MEQL and MPL estimators of α

A B C      Bias                 Variance             MSE                  MSE ratio
           MEQL      MPL        MEQL      MPL        MEQL      MPL        MPL/MEQL
1 1 1     -0.147    -0.085     0.0566    0.1267     0.0783    0.1339       1.71
1 1 2     -0.111     0.190     0.0570    0.3333     0.0693    0.3694       5.33
1 2 1     -0.090    -0.032     0.0167    0.0428     0.0247    0.0439       1.78
1 2 2     -0.109     0.052     0.0117    0.0928     0.0235    0.0955       4.06
2 1 1     -0.024    -0.028     0.0200    0.0197     0.0206    0.0205       1.00
2 1 2      0.441     0.545     0.2738    0.2953     0.4681    0.5922       1.27
2 2 1     -0.005    -0.009     0.0074    0.0073     0.0074    0.0074       1.00
2 2 2      0.123     0.185     0.0691    0.0876     0.0843    0.1219       1.45
3.3. Example 3: Poisson-Inverse Gaussian Mixture
Dean et al. (1989) discuss a model in which a Poisson distribution is mixed with an inverse Gaussian distribution such that the resulting variance function is μ + αμ², i.e. has the same form as the gamma mixture that gives rise to the negative binomial distribution. Starting with the three-parameter form of the inverse Gaussian distribution (Folks and Chhikara, 1978), whose log-likelihood is given by

−(a − νy)²/(2σ²y) − ½ log(2πσ²y³/a²),

we consider a version in which the product aν stays constant as the mean μ varies; this gives the same distribution as Dean et al. consider, and α is given by the formula α = σ²/(aν). They give asymptotic efficiencies for two estimators of α, with different patterns for the means. These estimators are

(a) the ML estimate and
(b) a modified MPL estimator, which is the solution of the equation (a sketch of solving it numerically is given below)

Σ {(yᵢ − μᵢ)² − (μᵢ + αμᵢ²)}/(1 + αμᵢ)² = 0.
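Below is a brief sketch (ours, with illustrative values) of solving this estimating equation for α with a root finder, using simulated Poisson-inverse Gaussian data of roughly the kind described next; if a sample happened to show no overdispersion the root would not exist and the estimate would be truncated at 0.

```python
import numpy as np
from scipy.optimize import brentq

def mpl_equation(alpha, y, mu):
    """Left-hand side of the modified MPL estimating equation for alpha
    under the variance function mu + alpha*mu**2."""
    w = (1.0 + alpha * mu) ** 2
    return np.sum(((y - mu) ** 2 - (mu + alpha * mu ** 2)) / w)

# Illustrative data in the spirit of model (e): mu = exp(beta0 + beta1*x),
# beta1 = 0.5, exp(beta0) = 10, with x taking the values -1, 0 and 1.
rng = np.random.default_rng(5)
x = np.repeat([-1.0, 0.0, 1.0], 7)          # sample size 21
mu = 10.0 * np.exp(0.5 * x)
alpha_true = 0.2
# Inverse Gaussian mixing variable with mean 1 and variance alpha_true,
# so that var(y) = mu + alpha_true*mu**2 (one convenient parameterization).
z = rng.wald(mean=1.0, scale=1.0 / alpha_true, size=x.size)
y = rng.poisson(mu * z)
print(brentq(mpl_equation, 1e-6, 5.0, args=(y, mu)))   # root = alpha estimate
```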

We have taken their model (e), which has μ = exp(β₀ + β₁x) with β₁ = 0.5, exp β₀ = 10 and a third of the x-values at each of −1, 0 and 1. From simulated data we derived 100 values of three estimators: the ML estimator, the MPL estimator of Dean et al. and the MEQL estimator for the variance function μ + αμ². We have modified the last two to allow for the degrees of freedom lost in fitting the two parameters for the means. The four simulations used sample sizes 21 and 51, combined with values 0.2 and 0.5 for α.
The results for α are summarized in Table 3. For small sample size all three estimators have negative bias, that for PL being smallest. However, this estimator has the largest variance and also the largest MSE. In terms of MSE the EQL estimator emerges as best, with ratios to the ML estimator as low as 73%. For large sample size the differences are smaller, but with the MEQL estimator still giving smaller MSE than the ML estimator. For large overdispersion the estimators give very similar results.

TABLE 3
Sampling statistics for α from model (e) of Dean et al. (1989) with 100 runs

Sample size    α      Estimator    Bias        Variance     MSE          Relative efficiency
    21        0.2     ML          -0.0446      0.00976      0.01175
                      MPL         -0.0236      0.01164      0.01182          0.994
                      MEQL        -0.0302      0.00823      0.00914          1.286
    21        0.5     ML          -0.0936      0.04470      0.05346
                      MPL         -0.0797      0.05244      0.05879          0.909
                      MEQL        -0.1197      0.02473      0.03901          1.370
    51        0.2     ML          -0.0093      0.004855     0.004941
                      MPL         -0.0049      0.004548     0.004572         1.081
                      MEQL        -0.0141      0.003990     0.004189         1.180
    51        0.5     ML          -0.0195      0.02368      0.02406
                      MPL         -0.0305      0.02169      0.02262          1.064
                      MEQL        -0.0917      0.01482      0.02323          1.036
The results for β₁ show small negative biases for all three estimators and little difference in MSE, though that for ML is slightly larger than for the other two.
The results for α are strikingly different from the asymptotic results given by Dean et al. (1989). They obtained asymptotic relative efficiencies for the MPL estimator of 0.860 for α = 0.2 and 0.641 for α = 0.5, whereas for our four simulations the finite sample relative efficiency (in terms of MSE) varies between 0.909 and 1.081. For the MEQL estimator, which they did not consider, the results are even more striking: for all runs this estimator has relative efficiency exceeding 1, the range being from 1.036 to 1.370.

4. DISCUSSION
Our studies of the finite sample properties of estimates of dispersion parameters show the marked limitations of asymptotic arguments. Two estimators, one of which is biased and the other unbiased in the limit, may have very similar biases in finite samples; similarly, asymptotic relative efficiencies may be quite misleading when extrapolated to finite samples. Taguchi-type experiments, used in quality improvement, are examples of data sets where dispersion is to be modelled with fairly small amounts of data.
As Pierce and Schafer (1986) have pointed out, deviance residuals generally are very nearly the same as those based on the best possible normalizing transformation. Their arguments, in relation to the saddlepoint approximation, are based on asymptotics relative to μ. They also present Monte Carlo results showing that this normalization argument seems to hold even for small values of μ. However, bias correction is necessary when μ is small; this depends on the distribution and hence requires estimation of higher order cumulants. Hence, we may expect that the MEQL estimator would have smaller variance than other estimators, but would have non-trivial bias when μ is small. The PL estimator, by contrast, is based on the second moment of the Pearson χ²-residuals and hence would be consistent even for small μ with increasing sample size under appropriate regularity conditions. However, the Pearson residuals may, in finite samples, differ considerably from those based on the best normalizing transformation. This suggests that the MPL estimator has smaller bias when μ is small but the sample size is large, although its variance is always larger than that of the MEQL estimator. Our simulation study shows that the variance inflation dominates, and hence the MSE of the MPL estimator is larger than that of the MEQL estimator.
The efficiency of the ML estimator is an asymptotic property, but under the normality assumption the efficiency of the ML estimator holds for finite samples. Hence, if there is a normalizing transformation, the best estimating equation would be obtained. So we may expect that in finite samples the MEQL estimator would have the smallest variance, and our simulation study shows this. Our study seems to indicate that the approximate normality of the deviance residual holds quite generally, and this would explain why the MEQL estimator quite often has smaller MSE than that of the ML estimator.
Although the scope of the simulations reported is necessarily limited, we can conclude that, of the ML, MPL and MEQL estimators, the MEQL estimator in these examples is never appreciably inferior to the MPL estimator and is often much better (in terms of MSE). The relations between the ML and MEQL estimators are more complex, but the MEQL estimator has a smaller MSE over a wide range of conditions. Support for the MQL estimator in estimating the parameter in the variance function of the negative binomial distribution comes from Piegorsch (1990).

ACKNOWLEDGEMENTS
The research of the second author was supported by the Ministry of Education, Korean Government. We thank Dr N. Breslow for helpful comments on our initial draft.

REFERENCES

Aitkin, M. (1987) Modelling variance heterogeneity in normal regression using GLIM. Appl. Statist., 36, 332-339.
Breslow, N. (1990) Tests of hypotheses in overdispersed Poisson regression and other quasilikelihood models. J. Am. Statist. Ass., 85, 565-571.
Carroll, R. J. and Ruppert, D. (1982) Robust estimation in heteroscedastic linear models. Ann. Statist., 10, 429-441.
Davidian, M. and Carroll, R. J. (1987) Variance function estimation. J. Am. Statist. Ass., 82, 1079-1091.
Davidian, M. and Carroll, R. J. (1988) A note on extended quasi-likelihood. J. R. Statist. Soc. B, 50, 74-82.
Dean, C., Lawless, J. F. and Willmot, G. E. (1989) A mixed Poisson-inverse-Gaussian regression model. Can. J. Statist., 17, 171-181.
Folks, J. L. and Chhikara, R. S. (1978) The inverse Gaussian distribution and its statistical application - a review. J. R. Statist. Soc. B, 40, 263-275.
Godambe, V. P. and Thompson, M. E. (1989) An extension of quasi-likelihood estimation. J. Statist. Planng Inf., 22, 137-152.
Hausman, J., Hall, B. H. and Griliches, Z. (1984) Econometric models for count data with an application to the patents-R&D relationship. Econometrica, 52, 909-938.
King, G. (1988) Statistical models for political science event counts: bias in conventional procedures and evidence for the exponential Poisson regression model. Am. J. Polit. Sci., 32, 838-868.
McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models, 2nd edn. London: Chapman and Hall.
Nelder, J. A. (1989) Discussion on An extension of quasi-likelihood estimation (by V. P. Godambe and M. E. Thompson). J. Statist. Planng Inf., 22, 158-160.
Nelder, J. A. and Pregibon, D. (1987) An extended quasi-likelihood function. Biometrika, 74, 221-231.
Piegorsch, W. W. (1990) Maximum likelihood estimation for the negative-binomial dispersion parameter. Biometrics, 46, 863-867.
Pierce, D. A. and Schafer, D. W. (1986) Residuals in generalized linear models. J. Am. Statist. Ass., 81, 977-986.
Wedderburn, R. W. M. (1974) Quasi-likelihood functions, generalized linear models and the Gauss-Newton method. Biometrika, 61, 439-447.
