
PHARMACEUTICAL STATISTICS

Pharmaceut. Statist. 2003; 2: 127–132 (DOI: 10.1002/pst.051)

Negative binomial control limits for count data with extra-Poisson variation
David Hoffman*,†
Preclinical and Research Statistics, Sanofi-Synthelabo Research Division, Malvern, PA, USA

Traditional techniques for calculating control limits for processes with discrete responses are based on the Poisson distribution. However, for many processes, the assumption of a Poisson distribution is violated. In such cases, use of traditional Poisson control limits may result in an inflated risk of Type I error. The negative binomial distribution is a natural extension of the Poisson distribution and allows for over-dispersion relative to the Poisson distribution. A simple approach to calculating exact and approximate control limits for count data based on the negative binomial distribution is described. The approach is illustrated by application to water bacteria count data taken from a water purification system. Copyright © 2003 John Wiley & Sons, Ltd.

1. INTRODUCTION

The Poisson is often the standard distribution considered for modelling random counts. As such, traditional techniques for calculating control limits for processes with discrete responses are based on the Poisson distribution [1]. The Poisson distribution has the well-known property that the mean of the distribution is equal to the variance.

However, for many processes, the Poisson distribution provides an inadequate model. Various types of processes can produce distributions of counts which are not adequately modelled by the Poisson distribution. Such processes include situations where counts tend to occur in clusters, or situations where the intensity rate of the counts varies randomly over time [2].

The negative binomial distribution is a natural and more flexible extension of the Poisson distribution and allows for over-dispersion relative to the Poisson. The negative binomial distribution can be derived from several models. Accordingly, there are a variety of definitions in the literature. Though typically derived as a generalization of the geometric distribution, the negative binomial can also be derived as a mixture of Poisson distributions.

Applications of the negative binomial distribution are wide-ranging. The negative binomial distribution has been shown to have applicability in accident statistics, birth-death processes, market research, econometrics, biometrics, and ecology, among others [3]. However, there is little in the literature to fully illustrate the use of the negative binomial distribution for control chart purposes, particularly with regard to the use of modern statistical software for negative binomial model-fitting.

*Correspondence to: D. Hoffman, Preclinical and Research Statistics, Sanofi-Synthelabo Research Division, 9 Great Valley Parkway, Malvern, PA 19355, USA
†E-mail: david.hoffman@sanofi-synthelabo.com

Copyright © 2003 John Wiley & Sons, Ltd.


This paper considers the negative binomial model as a simple and flexible alternative to the Poisson model for count data. A simple approach to constructing both exact and approximate control limits for count data based on the negative binomial distribution is described. The utility of the negative binomial approach is illustrated by application to water bacteria count data taken from a water purification system.

2. THE NEGATIVE BINOMIAL DISTRIBUTION

2.1. Probability function

The negative binomial distribution is characterized by two parameters, $\mu$ and $k$, where $k$ is typically termed the negative binomial dispersion parameter. The probability function for the negative binomial distribution with parameters $(\mu, k)$ is given by:

$$p(x \mid \mu, k) = \frac{(k\mu)^x}{(1 + k\mu)^{x + 1/k}} \cdot \frac{\Gamma(x + 1/k)}{x!\,\Gamma(1/k)}, \qquad x = 0, 1, 2, \ldots;\quad \mu, k > 0 \qquad (1)$$

The mean and variance of the negative binomial distribution are given by:

$$E[X] = \mu, \qquad \mathrm{var}[X] = \mu + k\mu^2 \qquad (2)$$

Because $k\mu^2 > 0$, the variance always exceeds the mean, so the negative binomial distribution may be particularly useful for over-dispersed data, where $\mathrm{var}[X] > E[X]$.

2.2. A derivation of the negative binomial

Suppose that $X \sim \mathrm{Poisson}(\lambda)$ and that $\lambda$ itself is a random variable with $\lambda \sim \mathrm{gamma}(\alpha, \beta)$. Then the unconditional distribution of $X$ is negative binomial, as follows. Let $p(x \mid \lambda)$ denote the conditional distribution of $X$, given $\lambda$. Likewise, let $f(\lambda)$ denote the distribution of $\lambda$. Then

$$p(x \mid \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots$$

and

$$f(\lambda) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\,\lambda^{\alpha - 1} e^{-\lambda/\beta}, \qquad \lambda > 0$$

So

$$\begin{aligned}
p(x) &= \int_0^\infty p(x \mid \lambda)\, f(\lambda)\, d\lambda \\
&= \int_0^\infty \frac{1}{x!\,\Gamma(\alpha)\,\beta^{\alpha}}\, \lambda^{x + \alpha - 1} e^{-\lambda(1 + 1/\beta)}\, d\lambda \\
&= \frac{\Gamma(x + \alpha)\, \beta^{x}}{x!\,\Gamma(\alpha)\,(1 + \beta)^{x + \alpha}} \\
&= \binom{x + \alpha - 1}{\alpha - 1} \left(\frac{\beta}{1 + \beta}\right)^{x} \left(\frac{1}{1 + \beta}\right)^{\alpha}
\end{aligned}$$

The third line follows from the gamma integral $\int_0^\infty \lambda^{c-1} e^{-\lambda/\theta}\, d\lambda = \Gamma(c)\,\theta^{c}$ with $c = x + \alpha$ and $\theta = \beta/(1 + \beta)$. Note that this is simply a reparameterization of the negative binomial probability function given in (1), with $k = 1/\alpha$ and $\mu = \beta/k$ (that is, $\mu = \alpha\beta$).

3. CONSTRUCTION OF CONTROL LIMITS

3.1. Parameter estimation

Several methods exist for estimating the parameters of the negative binomial distribution [3]. The simplest is the method of moments. Equating the moments in (2) to their sample counterparts gives the method-of-moments estimators of $\mu$ and $k$:

$$\hat\mu = \bar X, \qquad \hat k = \frac{s^2 - \bar X}{\bar X^2}$$

where $\bar X$ and $s^2$ are the sample mean and variance, respectively.

The maximum likelihood estimate of $\mu$ is simply the sample mean, while the estimate for $k$ is the solution of a nonlinear equation and must be obtained iteratively. However, the maximum likelihood estimator for $k$ is preferable, as it tends to have smaller mean square error than the moment estimator [4]. Many statistical software packages, such as the SAS GENMOD procedure [5], are capable of obtaining the maximum likelihood estimates of $\mu$ and $k$.

3.2. Exact control limits

After obtaining estimates for $\mu$ and $k$, these estimates can be used to obtain a fitted negative
binomial distribution $\hat p(x \mid \hat\mu, \hat k)$. Since the negative binomial, like the Poisson, is an asymmetric distribution, probability limits are more appropriate than typical $3\sigma$ limits, such as those used for control charts based on the normal distribution. For a specified level $\alpha$ Type I error probability in one direction, an upper control limit (UCL) and lower control limit (LCL) are limits such that $P(X > \mathrm{UCL}) \le \alpha$ and $P(X < \mathrm{LCL}) \le \alpha$. Thus, an exact level $\alpha$ upper control limit and exact level $\alpha$ lower control limit are given by:

$$\mathrm{UCL} = \inf\Big\{ n : \sum_{x=0}^{n} \hat p(x \mid \hat\mu, \hat k) \ge 1 - \alpha \Big\}$$

$$\mathrm{LCL} = \inf\Big\{ n : \sum_{x=0}^{n} \hat p(x \mid \hat\mu, \hat k) > \alpha \Big\}$$

These limits have no closed form, but can be easily determined by iterative calculation: accumulate $\hat p(0), \hat p(1), \ldots$ until the cumulative sum first satisfies the inequality. Also note that these limits are truly exact only if the true distribution of the process response is indeed negative binomial and the parameters $\mu$ and $k$ are known. In most cases, the true distribution of the process response will not be known well enough to verify this.

3.3. Approximate control limits

Approximate control limits can be obtained based on a chi-square approximation to the negative binomial. Let $X$ have a negative binomial distribution with parameters $(\mu, k)$. Then, from a result by Guenther [6],

$$P(X \le r) \approx P\left(\chi^2_\nu \le \frac{2r + 1}{1 + \mu k}\right)$$

where $\chi^2_\nu$ is a chi-square variate with $\nu = 2\mu/(1 + \mu k)$ degrees of freedom. (The scaled variate $(1 + \mu k)\chi^2_\nu/2$ has mean $\mu$ and variance $\mu + k\mu^2$, matching (2), and the $+1$ in the numerator acts as a continuity correction.) Guenther shows the approximation to be quite good over a range of $\mu$ and $k$ values, particularly for $r > 1$.

Based on this approximation, approximate upper and lower control limits can be determined. For a nominal level $\alpha$ Type I error probability in one direction, an approximate upper control limit is a limit $\mathrm{UCL}^*$ such that:

$$P(X > \mathrm{UCL}^*) \approx 1 - P\left(\chi^2_\nu \le \frac{2\,\mathrm{UCL}^* + 1}{1 + \mu k}\right) \le \alpha$$

Likewise, an approximate lower control limit for a nominal level $\alpha$ Type I error probability is a limit $\mathrm{LCL}^*$ such that:

$$P(X < \mathrm{LCL}^*) \approx 1 - P\left(\chi^2_\nu \ge \frac{2\,\mathrm{LCL}^* + 1}{1 + \mu k}\right) \le \alpha$$

Thus, an approximate level $\alpha$ upper control limit and approximate level $\alpha$ lower control limit are given by:

$$\mathrm{UCL}^* = \frac{\chi^2_{\nu,1-\alpha}\,(1 + \hat\mu\hat k) - 1}{2}, \qquad \mathrm{LCL}^* = \frac{\chi^2_{\nu,\alpha}\,(1 + \hat\mu\hat k) - 1}{2}$$

where $\chi^2_{\nu,1-\alpha}$ ($\chi^2_{\nu,\alpha}$) is the upper (lower) $\alpha$ percentile of the chi-square distribution with $\nu = 2\hat\mu/(1 + \hat\mu\hat k)$ degrees of freedom. Negative lower control limits can simply be set to zero.

4. AN EXAMPLE

4.1. Data

The use of exact and approximate negative binomial control limits will be illustrated with the following example. The data consist of water bacteria counts (per millilitre) for water samples taken from a water purification system. Eighteen sequential samples were taken and analysed for the number of bacterial colonies. The purification system was assumed to be functioning normally and in control. It was desired to calculate an upper control limit for the number of bacteria in future samples, based on these data. The data are listed in Table I.

4.2. Negative binomial model fitting

Parameters of the negative binomial model were estimated using the SAS GENMOD procedure
(SAS version 8.2). The following SAS code performs the model-fitting:

    PROC GENMOD;                               /* uses the most recently created data set */
       MODEL COUNT = / DIST=NEGBIN LINK=LOG;   /* intercept-only negative binomial model  */
    RUN;

The GENMOD procedure yields the following maximum likelihood estimates: $\hat\mu = 10.611$; $\hat k = 0.7902$.

Also note that the goodness of fit for a generalized linear model can be roughly assessed by the deviance criterion. If the model is correct, the deviance has an asymptotic chi-square distribution with $n - p$ degrees of freedom, where $n$ is the number of observations and $p$ the number of parameters. Thus, a scaled deviance value (deviance divided by degrees of freedom, denoted $\phi$) close to 1 is indicative of model adequacy. From the GENMOD procedure, the scaled deviance for the negative binomial model is $\phi = 1.199$, indicating adequate model fit.

Table I. Water bacteria counts.

Sample no.   Count   Sample no.   Count
1            20      10           7
2            5       11           30
3            23      12           4
4            19      13           16
5            13      14           2
6            12      15           4
7            14      16           1
8            17      17           2
9            2       18           0

4.3. Control limits

A one-sided $\alpha = 0.001$ upper control limit will be constructed. The choice of $\alpha = 0.001$ is somewhat arbitrary, but is chosen here to agree closely with standard normal-theory $3\sigma$ limits, which correspond to a one-sided Type I error probability of 0.00135.

Figure 1. Control chart with exact UCL for water bacteria count.

Let $\hat p(x)$ denote the fitted negative binomial model for the water bacteria data. An exact upper control limit is given by the smallest integer $n$ such that $\sum_{x=0}^{n} \hat p(x) \ge 1 - 0.001$. This yields $P(X \le 67) = 0.99909$. Thus, an exact one-sided $\alpha = 0.00091$ upper control limit is given by $\mathrm{UCL} = 67$. Figure 1 gives the control chart with center line ($\bar c$) and
exact one-sided $\alpha = 0.00091$ upper control limit for the bacteria counts.

An approximate one-sided nominal $\alpha = 0.001$ upper control limit ($\mathrm{UCL}^*$) using the chi-square approximation to the negative binomial can be determined by:

$$\mathrm{UCL}^* = \frac{\chi^2_{\nu,1-0.001}\,(1 + 10.611 \times 0.7902) - 1}{2}$$

where $\chi^2_{\nu,1-0.001}$ is the upper $\alpha = 0.001$ percentile of a chi-square distribution with $\nu = 2 \times 10.611/(1 + 10.611 \times 0.7902) = 2.26$ degrees of freedom. This yields an approximate upper control limit of $\mathrm{UCL}^* = 67.5$. Note that in this case, the approximate and exact upper control limits yield an identical $\alpha$ level, because an integer count exceeds 67.5 exactly when it exceeds 67; this will not be true in general.

4.4. Comparison with Poisson model

As noted earlier, traditional control limits for count data are based on the Poisson distribution. For comparative purposes, the water bacteria count data were also fitted to a Poisson model using the GENMOD procedure. The Poisson model yields a scaled deviance value of $\phi = 8.052$, indicating very poor model fit.

An upper control limit based on the Poisson distribution can be obtained with exact or approximate methods. An exact limit can be obtained from the fitted Poisson distribution with $\lambda = 10.611$. We have $P(X \le 22) = 0.99935$. Thus, an exact one-sided $\alpha = 0.00065$ Poisson upper control limit is given by $\mathrm{UCL}_{\mathrm{Poisson}} = 22$.

An approximate control limit can be found using the chi-square approximation to the Poisson [3], which rests on the identity $P(X \le r) = P(\chi^2_{2(r+1)} > 2\lambda)$ for integer $r$. An approximate upper control limit $\mathrm{UCL}^*$ is found by setting:

$$1 - P\left(\chi^2_{2(\mathrm{UCL}^* + 1)} \ge 2\hat\mu\right) = \alpha$$

and solving for $\mathrm{UCL}^*$ (treated as continuous), where $\chi^2_{2(\mathrm{UCL}^* + 1)}$ is a chi-square variate with $2(\mathrm{UCL}^* + 1)$ degrees of freedom. This yields an approximate one-sided $\alpha = 0.001$ Poisson upper control limit of $\mathrm{UCL}^*_{\mathrm{Poisson}} = 21.4$.

Figure 2. Observed versus estimated cumulative distribution functions.

Note that two (samples 3 and 11) of the eighteen samples cross both Poisson upper control limits. This illustrates the increased risk of a Type I error associated with utilizing traditional Poisson control limits when the data exhibit extra-Poisson
variation. Assuming the water bacteria counts truly follow a negative binomial distribution with parameters $\mu = 10.611$ and $k = 0.7902$, the exact and approximate one-sided $\alpha = 0.001$ Poisson upper control limits yield true Type I error risks of $\alpha = 0.114$ and $\alpha = 0.127$, respectively.

A more illustrative comparison of the performance of the negative binomial and Poisson models for the water bacteria data can be seen by visual inspection of the estimated cumulative probability distributions. Figure 2 gives the observed cumulative distribution function versus the estimated negative binomial and Poisson cumulative distribution functions. Figure 2 clearly shows the superiority of the negative binomial model. The Poisson model poorly characterizes the water bacteria data.

5. CONCLUSIONS

Traditional techniques for calculating control limits for processes with discrete responses are based on the Poisson distribution. However, for many processes, the Poisson distribution provides an inadequate model. In such cases, use of traditional Poisson control limits may result in an increased risk of Type I error.

The negative binomial distribution provides a simple and flexible alternative to the Poisson distribution for modelling count data. Negative binomial model-fitting can be easily performed with available statistical software, such as the SAS GENMOD procedure. Exact control limits can be obtained via iterative calculation. Approximate control limits can be easily obtained based on a chi-square approximation to the negative binomial.

REFERENCES

1. Montgomery D. Introduction to statistical quality control (3rd edn). Wiley: New York, 1992.
2. Rice J. Mathematical statistics and data analysis (2nd edn). Duxbury: Belmont, CA, 1995.
3. Johnson N, Kotz S, Kemp A. Univariate discrete distributions (2nd edn). Wiley: New York, 1992.
4. Wilson L, Folks J, Young J. Multistage estimation compared with fixed-sample-size estimation of the negative binomial parameter k. Biometrics 1984; 40:109–117.
5. SAS Institute, Inc. SAS/STAT® User's Guide, Version 8. SAS Institute: Cary, NC, 1999.
6. Guenther W. A simple approximation to the negative binomial (and regular binomial). Technometrics 1972; 14:385–389.
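As a supplementary illustration, the negative binomial calculations of Sections 3 and 4 can also be reproduced outside SAS. The following is a minimal, standard-library Python sketch under stated assumptions, not the author's implementation: it takes the counts from Table I, estimates $\mu$ by the sample mean and $k$ by maximum likelihood (bisection on the score equation for $r = 1/k$, used here as a stand-in for the GENMOD fit), accumulates the probabilities of (1) to obtain the exact UCL of Section 3.2, and inverts Guenther's chi-square approximation for $\mathrm{UCL}^*$ of Section 3.3. The function names (`fit_negbin`, `exact_ucl`, `gamma_p`, `chi2_quantile`, `approx_ucl`) and the series/continued-fraction routine for the chi-square distribution are our own hypothetical choices.

```python
import math

COUNTS = [20, 5, 23, 19, 13, 12, 14, 17, 2, 7, 30, 4, 16, 2, 4, 1, 2, 0]  # Table I


def nb_pmf(x, mu, k):
    """Negative binomial probability p(x | mu, k) of equation (1), via log-gamma."""
    r = 1.0 / k
    return math.exp(math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
                    + x * math.log(k * mu) - (x + r) * math.log(1.0 + k * mu))


def fit_negbin(counts):
    """Hypothetical ML fit: mu_hat = sample mean; k_hat = 1/r, where r solves
    sum_i sum_{j<y_i} 1/(r+j) + n*log(r/(r+mu)) = 0. Assumes variance > mean."""
    n = len(counts)
    mu = sum(counts) / n

    def score(r):
        return (sum(1.0 / (r + j) for y in counts for j in range(y))
                + n * math.log(r / (r + mu)))

    lo, hi = 1e-8, 1e6  # score > 0 near r = 0 and < 0 for very large r
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # bisection on a log scale
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return mu, 1.0 / math.sqrt(lo * hi)


def exact_ucl(mu, k, alpha):
    """Section 3.2: smallest n with sum_{x<=n} p(x | mu, k) >= 1 - alpha."""
    n, cdf = 0, nb_pmf(0, mu, k)
    while cdf < 1.0 - alpha:
        n += 1
        cdf += nb_pmf(n, mu, k)
    return n, cdf


def gamma_p(a, x):
    """Regularized lower incomplete gamma P(a, x): series or continued fraction."""
    if x <= 0.0:
        return 0.0
    log_front = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1.0:  # power series converges quickly here
        term = total = 1.0 / a
        denom = a
        while term > 1e-16 * total:
            denom += 1.0
            term *= x / denom
            total += term
        return total * math.exp(log_front)
    tiny = 1e-300  # modified Lentz evaluation of the continued fraction for Q = 1 - P
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 500):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_front) * h


def chi2_quantile(p, nu):
    """p-quantile of a chi-square with nu (possibly non-integer) df, by bisection."""
    lo, hi = 0.0, 1.0
    while gamma_p(nu / 2.0, hi / 2.0) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_p(nu / 2.0, mid / 2.0) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def approx_ucl(mu, k, alpha):
    """Section 3.3: UCL* = (chi2_{nu,1-alpha} (1 + mu k) - 1) / 2, nu = 2 mu / (1 + mu k)."""
    nu = 2.0 * mu / (1.0 + mu * k)
    return (chi2_quantile(1.0 - alpha, nu) * (1.0 + mu * k) - 1.0) / 2.0


mu_hat, k_hat = fit_negbin(COUNTS)
ucl, cdf = exact_ucl(mu_hat, k_hat, 0.001)
ucl_star = approx_ucl(mu_hat, k_hat, 0.001)
print(f"mu_hat={mu_hat:.3f} k_hat={k_hat:.4f} UCL={ucl} (P={cdf:.5f}) UCL*={ucl_star:.1f}")
```

With the Table I data the printed values should be close to those reported in Sections 4.2 and 4.3 ($\hat k \approx 0.790$, UCL = 67, $\mathrm{UCL}^* \approx 67.5$); small differences in the last digit are possible because these numerical routines differ from SAS. Bisection is used for both root-finding steps because it needs only a sign change, at the cost of more function evaluations than Newton's method.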

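A second sketch, under the same caveats, covers the model-adequacy and Poisson comparisons of Sections 4.2 and 4.4. It plugs in the reported estimates $\hat\mu = 10.611$ and $\hat k = 0.7902$ rather than re-fitting, computes scaled deviances from the standard Poisson and negative binomial deviance formulas divided by $n - p = 17$ degrees of freedom (an assumption about how the reported values were scaled, taking one mean parameter), finds the exact Poisson limit, and evaluates the true Type I error risk of the Poisson limits 22 and 21.4 under the fitted negative binomial. The helper names are hypothetical.

```python
import math

COUNTS = [20, 5, 23, 19, 13, 12, 14, 17, 2, 7, 30, 4, 16, 2, 4, 1, 2, 0]  # Table I
MU_HAT, K_HAT = 10.611, 0.7902  # estimates reported in Section 4.2


def nb_pmf(x, mu, k):
    """Negative binomial probability p(x | mu, k) of equation (1), via log-gamma."""
    r = 1.0 / k
    return math.exp(math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
                    + x * math.log(k * mu) - (x + r) * math.log(1.0 + k * mu))


def poisson_pmf(x, lam):
    """Poisson probability lam**x * exp(-lam) / x!."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))


def y_log_ratio(y, m):
    """y * log(y / m), with the usual convention 0 * log(0) = 0."""
    return y * math.log(y / m) if y > 0 else 0.0


def scaled_deviances(counts, mu, k):
    """Deviance / df for intercept-only negative binomial and Poisson fits.
    Assumes df = n - 1 (one mean parameter)."""
    df = len(counts) - 1
    r = 1.0 / k
    dev_nb = 2.0 * sum(y_log_ratio(y, mu) - (y + r) * math.log((y + r) / (mu + r))
                       for y in counts)
    dev_pois = 2.0 * sum(y_log_ratio(y, mu) - (y - mu) for y in counts)
    return dev_nb / df, dev_pois / df


def exact_ucl(pmf, alpha):
    """Smallest n with sum_{x<=n} pmf(x) >= 1 - alpha, plus that cumulative probability."""
    n, cdf = 0, pmf(0)
    while cdf < 1.0 - alpha:
        n += 1
        cdf += pmf(n)
    return n, cdf


def nb_exceedance(limit, mu, k):
    """P(X > limit) under the negative binomial; limit may be non-integer."""
    return 1.0 - sum(nb_pmf(x, mu, k) for x in range(math.floor(limit) + 1))


phi_nb, phi_pois = scaled_deviances(COUNTS, MU_HAT, K_HAT)
ucl_pois, cdf_pois = exact_ucl(lambda x: poisson_pmf(x, MU_HAT), 0.001)
risk_exact = nb_exceedance(ucl_pois, MU_HAT, K_HAT)
risk_approx = nb_exceedance(21.4, MU_HAT, K_HAT)  # approximate Poisson limit, Section 4.4
print(f"scaled deviance: NB={phi_nb:.3f} Poisson={phi_pois:.3f}")
print(f"exact Poisson UCL={ucl_pois} (P={cdf_pois:.5f})")
print(f"true Type I error of Poisson limits: exact={risk_exact:.3f} approx={risk_approx:.3f}")
```

Under these assumptions the output should be close to the reported scaled deviances ($\phi = 1.199$ and $8.052$), $\mathrm{UCL}_{\mathrm{Poisson}} = 22$, and true Type I error risks near 0.114 and 0.127. Because the counts are integers, `nb_exceedance` only needs the integer part of a non-integer limit such as 21.4.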