
Power and Sample Size Calculations

A Review and Computer Program

William D. Dupont, PhD, and Walton D. Plummer Jr., BS


Department of Preventive Medicine, Vanderbilt University School of Medicine,
Nashville, Tennessee

ABSTRACT: Methods of sample size and power calculations are reviewed for the most common study designs. The sample size and power equations for these designs are shown to be special cases of two generic formulae for sample size and power calculations. A computer program is available that can be used for studies with dichotomous, continuous, or survival response measures. The alternative hypotheses of interest may be specified either in terms of differing response rates, means, or survival times, or in terms of relative risks or odds ratios. Studies with dichotomous or continuous outcomes may involve either a matched or independent study design. The program can determine the sample size needed to detect a specified alternative hypothesis with the required power, the power with which a specific alternative hypothesis can be detected with a given sample size, or the specific alternative hypotheses that can be detected with a given power and sample size. The program can generate help messages on request that facilitate the use of this software. It writes a log file of all calculated estimates and can produce an output file for plotting power curves. It is written in FORTRAN-77 and is in the public domain.

KEY WORDS: Power and sample size calculations, cohort studies, case-control studies, dichotomous
or continuous outcomes

INTRODUCTION
Sample size and power calculations for clinical trials and observational studies are typically performed either by hand [1-5], through the use of published graphs or tables [6-10], or through the use of specialized computer programs [11-17]. Selecting the sample size for a study inevitably requires a compromise balancing the needs for power, economy, and timeliness. Investigators must determine their study's sample size, power, and detectable alternative hypotheses. To do this, it is useful to have a program that, given any two of the preceding parameters, can calculate the third.
The purpose of this article is to introduce such a program (POWER) and to review the power and sample size calculations that are required for the most common study designs. For each design considered in this article, POWER calculates the sample size needed to detect a particular difference in treatment efficacy with a specified power, the power with which a particular difference

Address reprint requests to: William D. Dupont, S-3301 Medical Center North, Department of Pre-
ventive Medicine, Vanderbilt University School of Medicine, Nashville, TN 37232-2637.
Received May 10, 1989; revised October 11, 1989.
116 Controlled Clinical Trials 11:116-128 (1990)
0197-2456/1990/$3.50 Elsevier Science Publishing Co., Inc. 1990
655 Avenue of the Americas, New York, New York 10010
Review of Power and Sample Size Calculations 117

can be detected with a given sample size, and the difference that can be
detected with a specified power and sample size.
The study designs that can be evaluated by this program are summarized
in Table 1. In this table, independent study designs refer to those in which
subjects are independently selected at random from some target population.
Matched designs are ones in which one or more control subjects are matched
to each case patient with respect to certain attributes. Paired designs are
matched designs with one control per case. In cohort studies, subjects are
followed forward in time until some event occurs [18]. All clinical trials are
cohort studies. Case-control studies look for risk factors in samples of case
patients with a specific disease and control patients who do not have this
disease [2]. A survival outcome variable consists of the time until death or
some morbid event occurs, or the total follow-up time for a patient who does
not suffer this event. Continuous outcome variables like weight or serum
creatinine may take a wide range of values. Dichotomous outcomes take only
two values such as success or failure, or the presence or absence of some risk
factor.
In justifying our study design, the actual power that will be achieved with
the selected sample size is more relevant than the power that would have
been achieved with other sample sizes that were considered but ultimately
rejected. The power associated with the selected sample size can be most
effectively demonstrated by plotting the power curve as a function of the true
value of the parameter of interest under different alternative hypotheses. The
coordinates for such curves can be generated by POWER for input into graph-
ics software packages.
POWER is in the public domain and is available from the authors on request
for the cost of distribution. It is written in ANSI standard FORTRAN-77 and
has been run successfully on both PC computers running under MS-DOS and
VAX computers running VMS.

METHODS
Generic Power and Sample Size Formulas
All the methods discussed in this paper are variations on a familiar theme [3, sect. 5.2]. Suppose that we observe responses on $n$ patients (or groups of patients) that are dependent on some parameter $\theta$. Let $f(\theta)$ be a known monotonic function of $\theta$ and let $S$ be a statistic derived from the $n$ responses that has a normal distribution with mean $\sqrt{n}\,f(\theta)$ and standard deviation $\sigma(\theta)$. Let $\Phi[z]$ be the cumulative probability distribution for a standard normal random variable and let $z_\alpha = \Phi^{-1}[1 - \alpha]$ denote the critical value that is exceeded by a standard normal random variable with probability $\alpha$. Let $\theta_0$ and $\theta_a$ denote the values of $\theta$ under the null and a specific alternative hypothesis, respectively. Let $\sigma_0 = \sigma(\theta_0)$, $\sigma_a = \sigma(\theta_a)$, and let $\delta = \{f(\theta_a) - f(\theta_0)\}/\sigma_a$ denote the difference between $f(\theta_a)$ and $f(\theta_0)$ expressed in standard deviations of $S$ under the alternative hypothesis. Testing the null hypothesis against a two-sided alternative hypothesis with type I error probability $\alpha$ leads to rejection of the null hypothesis when

$$|S - \sqrt{n}\,f(\theta_0)| > \sigma_0 z_{\alpha/2}. \tag{1}$$


Table 1  Study Designs That Can Be Evaluated by the POWER Computer Program(a)

Method  Outcome      Test Statistic                  Independent   Cohort vs.     Reference
Number  Variable                                     vs. Matched   Case-Control
1       Survival     log rank                        Independent   Cohort         Schoenfeld and Richter [10]
2       Continuous   t                               Paired        Either         Pearson and Hartley [9]
3       Continuous   t                               Independent   Either         Pearson and Hartley [9]
4       Dichotomous  χ²                              Matched       Case-control   Dupont [6]
5       Dichotomous  χ²                              Paired        Cohort         This paper
6       Dichotomous  uncorrected χ²                  Independent   Case-control   Schlesselman [2]
7       Dichotomous  Fisher's exact or corrected χ²  Independent   Either         Casagrande et al. [5]
8       Dichotomous  uncorrected χ²                  Independent   Cohort         Meinert [1]

(a) The terms used in this table are defined in the Introduction.

When $\theta = \theta_a$, the probability that $S$ will satisfy Eq. 1 equals the power associated with this alternative hypothesis, which is

$$1 - \beta = \Phi[\delta\sqrt{n} - (\sigma_0/\sigma_a) z_{\alpha/2}] + \Phi[-\delta\sqrt{n} - (\sigma_0/\sigma_a) z_{\alpha/2}]. \tag{2}$$

The first and second terms on the right-hand side of Eq. 2 give the probabilities under the alternative hypothesis of obtaining $S > \sqrt{n}\,f(\theta_0) + \sigma_0 z_{\alpha/2}$ and $S < \sqrt{n}\,f(\theta_0) - \sigma_0 z_{\alpha/2}$, respectively. One or the other of these terms is usually negligible for relevant values of $\alpha$ and $\beta$. The smaller of these two terms will be less than 0.001 as long as $2(\sigma_0/\sigma_a) z_{\alpha/2} + z_\beta \ge 3.1$ (see Appendix). Hence, approximating this probability by zero in Eq. 2 yields

$$n = [(\sigma_0/\sigma_a) z_{\alpha/2} + z_\beta]^2/\delta^2. \tag{3}$$
To illustrate the use of Eq. 3, consider a sample of size $n$ randomly drawn from a normal population with mean $\mu$ and known variance $\sigma^2$. Let $\bar{y}$ denote the sample mean, $S = \sqrt{n}\,\bar{y}$, $f(\mu) = \mu$ be the identity function, $\sigma_0 = \sigma_a = \sigma$, and $\delta = \{f(\mu_a) - f(\mu_0)\}/\sigma = (\mu_a - \mu_0)/\sigma$. Then $S$ has a normal distribution with mean $\sqrt{n}\,\mu$ and standard deviation $\sigma$. Substituting $\sigma_0$, $\sigma_a$, and $\delta$ into Eq. 3 gives $n = (z_{\alpha/2} + z_\beta)^2\sigma^2/(\mu_a - \mu_0)^2$, which is Eq. 5.34 in Ref. 3.
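The generic recipe in Eqs. 2 and 3 is easy to check numerically. The following Python sketch (standard library only) evaluates both equations and reproduces the normal-mean example; the function names and the numbers $\mu_0 = 100$, $\mu_a = 105$, $\sigma = 10$ are hypothetical illustrations, not part of POWER.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal: PHI.cdf is Phi, PHI.inv_cdf is Phi^{-1}


def z(alpha):
    """Critical value z_alpha = Phi^{-1}[1 - alpha], exceeded with probability alpha."""
    return PHI.inv_cdf(1 - alpha)


def power_eq2(delta, n, sigma0, sigma_a, alpha):
    """Eq. 2: power of the two-sided test for a standardized difference delta."""
    r = sigma0 / sigma_a
    shift = delta * n ** 0.5
    return PHI.cdf(shift - r * z(alpha / 2)) + PHI.cdf(-shift - r * z(alpha / 2))


def sample_size_eq3(delta, sigma0, sigma_a, alpha, power):
    """Eq. 3: n = [(sigma0/sigma_a) z_{alpha/2} + z_beta]^2 / delta^2, with beta = 1 - power."""
    beta = 1 - power
    return ((sigma0 / sigma_a) * z(alpha / 2) + z(beta)) ** 2 / delta ** 2


# Hypothetical one-sample example: mu_0 = 100, mu_a = 105, known sigma = 10.
mu0, mu_a, sigma = 100.0, 105.0, 10.0
delta = (mu_a - mu0) / sigma
n = sample_size_eq3(delta, sigma, sigma, alpha=0.05, power=0.80)
print(f"n = {n:.2f}, power at that n = {power_eq2(delta, n, sigma, sigma, 0.05):.4f}")
```

With these inputs the sketch gives n of about 31.4, and plugging that n back into Eq. 2 returns a power of about 0.80, as the derivation predicts.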
The power and sample size formulas for the study designs considered in this article can now be obtained by substituting the appropriate definitions of $\sigma_0$, $\sigma_a$, and $\delta$ into Eqs. 2 and 3. Equation 3 can also be used to find the specific alternative hypothesis that can be detected with power $1 - \beta$ and sample size $n$. These equations do not yield a closed-form solution when $\sigma_0 \ne \sigma_a$, but can be readily solved by a computer using iterative methods [19].
POWER provides a warning message with sample size estimates whenever $2(\sigma_0/\sigma_a) z_{\alpha/2} + z_\beta < 3.1$. When this happens, Eq. 3 will provide a sample size estimate whose power exceeds $1 - \beta$ by no more than $\alpha/2$. POWER assumes a two-sided alternative hypothesis for all study designs. Sample size calculations for one-sided tests may be obtained by doubling the value of $\alpha$.
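When $\sigma_0 \ne \sigma_a$, Eq. 3 has no closed-form inverse, but a simple bisection on Eq. 2 finds the detectable alternative. The sketch below illustrates this with the two-proportion definitions given in the χ² section of this paper. The pooled proportion $p = (p_1 + m p_0)/(1 + m)$ is our assumed reading of "all subjects combined", and bisection is just one convenient iterative method, not necessarily the one POWER uses; function names are hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist()


def power_two_proportions(p0, p1, n, m, alpha):
    """Eq. 2 for independent proportions: n experimental and m * n control patients."""
    p = (p1 + m * p0) / (1 + m)  # assumed pooled event probability
    sigma0 = (p * (1 - p) * (1 + 1 / m)) ** 0.5  # SD of S under the null hypothesis
    sigma_a = (p0 * (1 - p0) / m + p1 * (1 - p1)) ** 0.5  # SD of S under the alternative
    delta = (p1 - p0) / sigma_a
    za = PHI.inv_cdf(1 - alpha / 2)
    r = sigma0 / sigma_a
    shift = delta * n ** 0.5
    return PHI.cdf(shift - r * za) + PHI.cdf(-shift - r * za)


def detectable_p1(p0, n, m, alpha, power, tol=1e-10):
    """Bisection for the p1 > p0 that is detected with the requested power."""
    lo, hi = p0, 1.0 - 1e-12
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if power_two_proportions(p0, mid, n, m, alpha) < power:
            lo = mid  # power too low: the detectable p1 is further from p0
        else:
            hi = mid
    return (lo + hi) / 2


p1 = detectable_p1(p0=0.2, n=81.22, m=1, alpha=0.05, power=0.80)
print(f"detectable p1 = {p1:.4f}")
```

For $p_0 = 0.2$, $m = 1$, $\alpha = 0.05$ and $n = 81.22$ per group, the search returns $p_1 \approx 0.40$, consistent with the forward sample size calculation for those proportions. For a one-sided test at level 0.05, pass alpha=0.10, following the doubling rule above.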

Log Rank Tests of Survival Data


Suppose that n patients are to be recruited into each of two treatment
groups during an accrual period of length A w h o are then followed for an
additional follow-up period F. (In other words, follow-up for all patients ends
on the same day, with follow-up intervals ranging from F t h r o u g h A + F
days.) Assume that recruitment follows a uniform distribution over the accrual
interval A and that the survival times for patients on treatments 1 and 2 have
exponential distributions with medians m~ and m2, respectively.
Let R = m2/m~ be the ratio of median survival times on the two treatments.
(R is also the relative hazard or instantaneous relative risk for patients on
treatment 1 relative to patients on treatment 2.) Let t i be the total n u m b e r of
patient days of follow-up on treatment j and let ai be the n u m b e r of observed
events on treatment j: j = 1,2. Let S = ~ log(t2adha2)), m = (rn~ + m2)/2,
P ( A ) = {1 - e x p ( - l o g ( 2 ) A / m ) } / ( l o g ( 2 ) A / m ) , G(F) = e x p ( - log(2)F/m), and p
= 1 - P ( A ) G ( F ) . Then Schoenfeld and Richter [10_] have s h o w n that S has
an asymptotically normal distribution with mean V'n log(R) and approximate
standard deviation ~r = k/~p. We wish to test the null hypothesis that R =
1. Letting f(R) = log(R), or0 = ~r~ = or, and ~) = {f(R) - f(1)}/~r,, = log(R)/~r,
120 W.D. Dupont and W.D. Plummer, Jr.

and substituting $\delta$, $\sigma_0$, and $\sigma_a$ into Eqs. 2 and 3 gives the power and sample size formulas associated with the specific alternative hypothesis that the ratio of median survival times equals $R$. This version of Eq. 3 is identical to the sample size formula derived by Schoenfeld and Richter [10, p. 169]. These formulas are appropriate for studies that will be analyzed using the log-rank test [20] as well as the parametric test of Schoenfeld and Richter [10].
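A minimal sketch of the Schoenfeld and Richter calculation as written above: it evaluates $p$ at the average median $m$, uses the standard deviation $\sqrt{2/p}$, and applies Eq. 3. Function names are our own; the inputs are those of the POWER session shown later in this paper.

```python
import math
from statistics import NormalDist

PHI = NormalDist()


def event_probability(m1, m2, accrual, followup):
    """p = 1 - P(A) G(F), evaluated at the average median m = (m1 + m2) / 2."""
    m = (m1 + m2) / 2
    lam = math.log(2) / m  # exponential hazard with median m
    if accrual > 0:
        p_a = (1 - math.exp(-lam * accrual)) / (lam * accrual)
    else:
        p_a = 1.0  # limit of P(A) as A -> 0 (uniform follow-up of length F)
    g_f = math.exp(-lam * followup)
    return 1 - p_a * g_f


def logrank_sample_size(alpha, power, m1, m2, accrual, followup):
    """Eq. 3 with sigma0 = sigma_a = sqrt(2/p) and delta = log(R)/sigma, R = m2/m1."""
    p = event_probability(m1, m2, accrual, followup)
    sigma = math.sqrt(2 / p)
    delta = math.log(m2 / m1) / sigma
    za = PHI.inv_cdf(1 - alpha / 2)
    zb = PHI.inv_cdf(power)  # z_beta = Phi^{-1}[1 - beta]
    return (za + zb) ** 2 / delta ** 2


n = logrank_sample_size(0.10, 0.80, m1=11, m2=16.5, accrual=24, followup=12)
print(f"patients per group before rounding: {n:.2f}")
```

It returns roughly 110.1 patients per group, matching the 110 reported by POWER for the same inputs; how POWER rounds non-integer results is not stated here.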

t Tests of Paired Continuous Response Data


Let $T_k(x)$ denote the cumulative probability distribution for a $t$ statistic with $k$ degrees of freedom. Let $t_{k,\alpha} = T_k^{-1}(1 - \alpha)$ be the critical value that will be exceeded with probability $\alpha$ by a $t$ statistic with $k$ degrees of freedom. Suppose that a continuous response measure is observed on $n$ patients before and after treatment and that the difference of these measures on a given patient has a normal distribution with mean $\Delta$ and unknown standard deviation $\sigma$. Let $\bar{d}$ denote the average difference in these response measures and let $S = \sqrt{n}\,\bar{d}$. We wish to test the null hypothesis that $\Delta = 0$. Letting $\delta = \Delta/\sigma$ and approximating the sample standard deviation by $\sigma$ gives

$$1 - \beta = T_{n-1}[\delta\sqrt{n} - t_{n-1,\alpha/2}] + T_{n-1}[-\delta\sqrt{n} - t_{n-1,\alpha/2}] \tag{4}$$

and

$$n = (t_{n-1,\alpha/2} + t_{n-1,\beta})^2/\delta^2 \tag{5}$$

by precisely the same argument used to derive Eqs. 2 and 3. Equation 5 must now be solved using iterative methods because $n$ appears on both sides of the equal sign [19].
Exact power and sample size calculations for $t$ tests can be derived in terms of the noncentral $t$ distribution [21]. Table 10 of Pearson and Hartley [9] provides graphs for power calculations that are based on this derivation. Their graphs are in close agreement with Eqs. 4 and 5.
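Equation 5 can be solved by fixed-point iteration starting from the normal-theory value of Eq. 3. The Python standard library has no t distribution, so this sketch integrates the t density with Simpson's rule and inverts it by bisection; both numerical choices are our assumptions, adequate for illustration but not a substitute for a statistics library. It assumes $0 < \alpha < 1$ and a power of at least 0.5, so that both critical values are non-negative; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def t_cdf(x, k, steps=1000):
    """T_k(x): Student t CDF via Simpson's rule on the density over [0, |x|]."""
    c = math.exp(math.lgamma((k + 1) / 2) - math.lgamma(k / 2)) / math.sqrt(k * math.pi)

    def pdf(u):
        return c * (1 + u * u / k) ** (-(k + 1) / 2)

    a = abs(x)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    half = total * h / 3  # probability mass between 0 and |x|
    return 0.5 + half if x >= 0 else 0.5 - half


def t_crit(alpha, k):
    """t_{k,alpha} = T_k^{-1}(1 - alpha) for 0 < alpha <= 0.5, by bisection on [0, 64]."""
    lo, hi = 0.0, 64.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, k) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def paired_t_sample_size(delta, alpha, power, tol=1e-6):
    """Solve Eq. 5, n = (t_{n-1,alpha/2} + t_{n-1,beta})^2 / delta^2, by fixed-point iteration."""
    beta = 1 - power
    z = NormalDist().inv_cdf
    n = (z(1 - alpha / 2) + z(power)) ** 2 / delta ** 2  # normal-theory start from Eq. 3
    for _ in range(100):
        k = max(n - 1, 1.0)  # degrees of freedom, kept at 1 or more
        n_new = (t_crit(alpha / 2, k) + t_crit(beta, k)) ** 2 / delta ** 2
        if abs(n_new - n) < tol:
            return n_new
        n = n_new
    return n


n = paired_t_sample_size(delta=0.5, alpha=0.05, power=0.80)
print(f"pairs needed before rounding up: {n:.2f}")
```

For a hypothetical standardized difference $\delta = 0.5$, two-sided $\alpha = 0.05$ and 80% power, the iteration settles between 33 and 34 pairs, slightly above the normal-theory value of about 31.4.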

t Tests for Independent Continuous Response Data

Suppose that independent normal response measures are observed on patients who receive either an experimental or a control treatment. Suppose further that $n$ patients receive the experimental treatment and that the ratio of control to experimental subjects is $m$. Let the mean response for experimental and control patients be $\mu_1$ and $\mu_2$, respectively, and assume that the standard deviation of responses within each treatment group is $\sigma_*$. Let $\bar{x}_1$ and $\bar{x}_2$ denote the sample mean responses of the experimental and control groups, respectively, and let $S = \sqrt{n}(\bar{x}_1 - \bar{x}_2)$. Then $S$ has mean $\sqrt{n}(\mu_1 - \mu_2)$ and standard deviation $\sigma = \sigma_*\sqrt{1 + 1/m}$. Thus, if we let $\delta = (\mu_1 - \mu_2)/\sigma$, then the analogous power and sample size formulas corresponding to Eqs. 4 and 5 are

$$1 - \beta = T_{n(m+1)-2}[\delta\sqrt{n} - t_{n(m+1)-2,\alpha/2}] + T_{n(m+1)-2}[-\delta\sqrt{n} - t_{n(m+1)-2,\alpha/2}] \tag{6}$$

and

$$n = (t_{n(m+1)-2,\alpha/2} + t_{n(m+1)-2,\beta})^2/\delta^2. \tag{7}$$

Equations 6 and 7 give power and sample size estimates that are in close agreement with Table 10 of Pearson and Hartley [9].
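The power in Eq. 6 for a design with $m$ controls per experimental patient can be evaluated with numerical t helpers. This sketch repeats a Simpson's-rule t CDF and a bisection quantile (our numerical choices, so the block runs on its own); the inputs, a difference of half a within-group standard deviation with 64 experimental patients, are hypothetical.

```python
import math


def t_cdf(x, k, steps=1000):
    """T_k(x): Student t CDF via Simpson's rule on the density over [0, |x|]."""
    c = math.exp(math.lgamma((k + 1) / 2) - math.lgamma(k / 2)) / math.sqrt(k * math.pi)

    def pdf(u):
        return c * (1 + u * u / k) ** (-(k + 1) / 2)

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    half = total * h / 3
    return 0.5 + half if x >= 0 else 0.5 - half


def t_crit(alpha, k):
    """t_{k,alpha} = T_k^{-1}(1 - alpha) for 0 < alpha <= 0.5, by bisection on [0, 64]."""
    lo, hi = 0.0, 64.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, k) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def independent_t_power(mu1, mu2, sd_within, n, m, alpha):
    """Eq. 6: power with n experimental and m * n control patients."""
    sigma = sd_within * math.sqrt(1 + 1 / m)  # SD of S = sqrt(n) (x1bar - x2bar)
    delta = (mu1 - mu2) / sigma
    df = n * (m + 1) - 2
    tc = t_crit(alpha / 2, df)
    shift = delta * math.sqrt(n)
    return t_cdf(shift - tc, df) + t_cdf(-shift - tc, df)


power = independent_t_power(mu1=0.5, mu2=0.0, sd_within=1.0, n=64, m=1, alpha=0.05)
print(f"power = {power:.3f}")
```

For $m = 1$ this gives a power of about 0.80; raising $m$ to 2 shrinks $\sigma$ and adds degrees of freedom, so the power rises.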

χ² Contingency Table Tests of Independent Dichotomous Response Data

Suppose that independent dichotomous responses are observed on patients who receive either an experimental or a control treatment. Suppose further that $n$ patients receive the experimental treatment and that the ratio of control to experimental subjects is $m$. Let $p_0$ and $p_1$ denote the probabilities that a patient on the control or experimental treatment, respectively, will have an event (positive response). Let $p$ denote this probability for all subjects combined and let $\hat{p}_0$ and $\hat{p}_1$ denote the observed proportions of events in the two treatment groups. Let $q = 1 - p$ and $q_i = 1 - p_i$: $i = 0, 1$. Then $S = \sqrt{n}(\hat{p}_1 - \hat{p}_0)$ will have an asymptotically normal distribution with mean $\sqrt{n}(p_1 - p_0)$. Under the null hypothesis that $p_1 = p_0 = p$, the variance of $S$ will be $\sigma_0^2 = pq(1 + 1/m)$. Under the alternative hypothesis that $p_1 - p_0 = \Delta$, this variance will be $\sigma_a^2 = p_0 q_0/m + p_1 q_1$. Thus, the power and sample size formulas associated with the specific alternative hypothesis that $p_1 - p_0 = \Delta$ may be obtained by substituting $\delta = \Delta/\sigma_a$, $\sigma_0$, and $\sigma_a$ into Eqs. 2 and 3. This version of Eq. 3 is identical to Eq. 6.6 of Schlesselman [2] and to Eq. 9.7 of Meinert [1]. When $m = 1$, this equation corresponds to the sample size formula given by Friedman et al. [8, p. 75].
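The uncorrected two-proportion version of Eq. 3 is sketched below. The pooled proportion $p = (p_1 + m p_0)/(1 + m)$ is our assumption for "all subjects combined" (it weights each group by its expected size); the function name and the proportions in the example are hypothetical.

```python
from statistics import NormalDist


def two_proportion_n(p0, p1, m, alpha, power):
    """Eq. 3 for independent proportions: experimental-group size n (controls = m * n)."""
    z = NormalDist().inv_cdf
    pbar = (p1 + m * p0) / (1 + m)  # assumed pooled proportion
    sigma0 = (pbar * (1 - pbar) * (1 + 1 / m)) ** 0.5  # SD of S under the null
    sigma_a = (p0 * (1 - p0) / m + p1 * (1 - p1)) ** 0.5  # SD of S under the alternative
    delta = (p1 - p0) / sigma_a
    return ((sigma0 / sigma_a) * z(1 - alpha / 2) + z(power)) ** 2 / delta ** 2


n = two_proportion_n(p0=0.2, p1=0.4, m=1, alpha=0.05, power=0.80)
print(f"experimental patients (uncorrected, before rounding): {n:.2f}")
```

With $p_0 = 0.2$, $p_1 = 0.4$, $m = 1$, $\alpha = 0.05$ and 80% power, it returns about 81.2 experimental (and as many control) patients before rounding up. Recruiting two controls per experimental patient lowers the experimental-group requirement.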
Equations 2 and 3 may also be used for case-control studies. In these studies $p_0$ and $p_1$ denote the probability of exposure in control and case patients, respectively. The alternative hypothesis for such studies may also be expressed in terms of $p_0$ and the odds ratio $\psi$. In this case $p_1 = p_0\psi/(1 + p_0(\psi - 1))$. For prospective studies, it is generally more useful to express the alternative hypothesis in terms of $p_0$ and the relative risk $R = p_1/p_0$.
The preceding version of Eq. 3 is appropriate for studies that can be adequately assessed with an uncorrected χ² statistic. Casagrande et al. [5] proposed a continuity correction that should be used for studies that will be analyzed with Fisher's exact test or with a χ² test that uses a continuity correction. The method of Casagrande et al. has been generalized to the case of unequal sample sizes by Fleiss [4, Eq. 3.18]. Let $n'$ denote the estimated number of experimental subjects obtained from Eq. 3. Then the corresponding corrected sample size is

$$n = \frac{n'}{4}\left[1 + \sqrt{1 + \frac{2(m+1)}{n' m |p_0 - p_1|}}\right]^2. \tag{8}$$
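Equation 8 applied to an uncorrected estimate, as a minimal sketch with a hypothetical function name. The value $n' = 81.22$ is what the uncorrected formula yields for $p_0 = 0.2$, $p_1 = 0.4$, $m = 1$, $\alpha = 0.05$ and 80% power (our arithmetic).

```python
import math


def continuity_corrected_n(n_prime, m, p0, p1):
    """Eq. 8: corrected experimental-group size from the uncorrected n' of Eq. 3."""
    inner = 1 + 2 * (m + 1) / (n_prime * m * abs(p0 - p1))
    return n_prime / 4 * (1 + math.sqrt(inner)) ** 2


n_corr = continuity_corrected_n(81.22, m=1, p0=0.2, p1=0.4)
print(f"corrected n = {n_corr:.1f}")
```

The correction raises the requirement from about 81.2 to about 90.9 experimental patients.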

χ² Tests for Matched Case-Control Studies

Suppose we have $n$ matched sets, each of which consists of a case patient and $m$ matched controls. Let $p_0$ and $p_1$ denote the probability of exposure to some risk factor of interest among control and case patients, respectively, let $\phi$ denote the correlation coefficient for exposure between a case and one of his matched controls, and let $\psi$ denote the odds ratio for exposure in cases and controls. Let $q_1 = 1 - p_1$, $q_0 = 1 - p_0$, $p_{11} = p_1 p_0 + \phi\sqrt{p_1 q_1 p_0 q_0}$, $p_{01} = q_1 p_0 - \phi\sqrt{p_1 q_1 p_0 q_0}$, $p_{0+} = p_{11}/p_1$, $p_{0-} = p_{01}/q_1$, $q_{0+} = 1 - p_{0+}$, $q_{0-} = 1 - p_{0-}$, and

$$t_k = p_1 \binom{m}{k-1} p_{0+}^{k-1} q_{0+}^{m-k+1} + q_1 \binom{m}{k} p_{0-}^{k} q_{0-}^{m-k}, \quad k = 1, \ldots, m.$$

Let $n_{ij}$ denote the number of matched sets of subjects in which the case patient was ($i = 1$) or was not ($i = 0$) exposed and $j$ of the $m$ matched controls were exposed. Let $T_k = n_{1,k-1} + n_{0,k}$ be the number of sets in which $k$ subjects were exposed. Let $S = \sum_{k=1}^{m} n_{1,k-1}/\sqrt{n}$,

$$f(\psi) = \sum_{k=1}^{m} \frac{k t_k \psi}{k\psi + m - k + 1}, \qquad \sigma^2(\psi) = \sum_{k=1}^{m} \frac{k t_k \psi (m - k + 1)}{(k\psi + m - k + 1)^2}.$$

Then Dupont [6] shows that the conditional distribution of $S$ given the ancillary statistics $T_k$: $k = 1, \ldots, m$ is asymptotically normal with mean $\sqrt{n}\,f(\psi)$ and standard deviation $\sigma(\psi)$. We wish to test the null hypothesis that $\psi = 1$. Substituting $\sigma_0 = \sigma(1)$, $\sigma_a = \sigma(\psi)$, and $\delta = \{f(\psi) - f(1)\}/\sigma_a$ into Eqs. 2 and 3 gives the power and sample size formulas associated with the specific alternative hypothesis that the odds ratio equals $\psi$. These versions of Eqs. 2 and 3 are identical to Eqs. 6 and 7 of Dupont [6], respectively.
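A sketch of the matched-set definitions above, plugged into Eq. 3. The paper obtains $p_1$ from $p_0$, $\phi$ and $\psi$ following [6], and that step is not reproduced here, so this sketch takes $p_1$ and $\psi$ as inputs that the caller must keep mutually consistent. For $m = 1$, our own algebra gives the consistency condition $\psi = p_{10}/p_{01}$ with $p_{10} = p_1 - p_{11}$, which the example uses; the function name and all numbers are hypothetical.

```python
import math
from statistics import NormalDist


def matched_sample_size(p0, p1, phi, psi, m, alpha, power):
    """Eq. 3 with the matched-set definitions: n sets of one case and m controls."""
    q0, q1 = 1 - p0, 1 - p1
    root = math.sqrt(p1 * q1 * p0 * q0)
    p11 = p1 * p0 + phi * root  # case exposed and a given control exposed
    p01 = q1 * p0 - phi * root  # case unexposed and a given control exposed
    p0p, p0m = p11 / p1, p01 / q1  # p_{0+} and p_{0-}
    q0p, q0m = 1 - p0p, 1 - p0m
    # t_k: probability that a matched set contains exactly k exposed subjects
    t = {
        k: p1 * math.comb(m, k - 1) * p0p ** (k - 1) * q0p ** (m - k + 1)
        + q1 * math.comb(m, k) * p0m ** k * q0m ** (m - k)
        for k in range(1, m + 1)
    }

    def f(x):
        return sum(k * t[k] * x / (k * x + m - k + 1) for k in t)

    def sigma(x):
        return math.sqrt(sum(k * t[k] * x * (m - k + 1) / (k * x + m - k + 1) ** 2 for k in t))

    sigma0, sigma_a = sigma(1.0), sigma(psi)
    delta = (f(psi) - f(1.0)) / sigma_a
    z = NormalDist().inv_cdf
    return ((sigma0 / sigma_a) * z(1 - alpha / 2) + z(power)) ** 2 / delta ** 2


# Paired example (m = 1) with psi chosen consistent with p0, p1 and phi.
p0, p1, phi = 0.2, 0.4, 0.2
root = math.sqrt(p1 * (1 - p1) * p0 * (1 - p0))
p11, p01 = p1 * p0 + phi * root, (1 - p1) * p0 - phi * root
psi = (p1 - p11) / p01
n = matched_sample_size(p0, p1, phi, psi, m=1, alpha=0.05, power=0.80)
print(f"psi = {psi:.3f}, matched pairs = {n:.1f}")
```

For these hypothetical inputs the sketch asks for about 64 matched pairs.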

McNemar's Test for Paired Dichotomous Response Data from Prospective Studies

The test statistic discussed in the preceding section reduces to McNemar's test for paired studies with a single control per case (Eq. 5.5 of Breslow and Day [22]). Thus, the power calculations of Dupont [6] are also appropriate for paired case-control studies that are evaluated using McNemar's test. For paired prospective studies with dichotomous response variables, we may be primarily interested in the relative risk of failure in experimental subjects relative to controls. Let $p_0$ and $p_1$ denote the probability of failure among control and experimental patients, respectively, and let $R = p_1/p_0$. The sample size and power calculations associated with a specific relative risk $R$ can then be derived in an analogous fashion to those of the preceding section. These calculations differ only in that now $p_1 = Rp_0$, whereas in the paired case-control study $p_1$ is a function of $p_0$, $\phi$, and $\psi$ [6].
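For the paired prospective design, $m = 1$ and $p_1 = Rp_0$. Working through the matched-set definitions of the preceding section (our algebra, not a formula printed in this paper) gives $t_1 = p_{10} + p_{01}$, $\sigma_0^2 = t_1/4$, $\sigma_a^2 = p_{10}p_{01}/t_1$ and $f(\psi) - f(1) = (p_{10} - p_{01})/2$, so Eq. 3 collapses to $n = [\sqrt{t_1}\,z_{\alpha/2} + 2\sqrt{p_{10}p_{01}/t_1}\,z_\beta]^2/(p_{10} - p_{01})^2$. The sketch below evaluates this with a hypothetical function name and inputs.

```python
import math
from statistics import NormalDist


def mcnemar_prospective_n(p0, rr, phi, alpha, power):
    """Number of pairs for relative risk R = p1 / p0 (m = 1 matched-set formulas)."""
    p1 = rr * p0
    root = math.sqrt(p1 * (1 - p1) * p0 * (1 - p0))
    p11 = p1 * p0 + phi * root  # both members of the pair fail
    p01 = (1 - p1) * p0 - phi * root  # only the control fails
    p10 = p1 - p11  # only the experimental patient fails
    t = p10 + p01  # probability of a discordant pair
    z = NormalDist().inv_cdf
    num = math.sqrt(t) * z(1 - alpha / 2) + 2 * math.sqrt(p10 * p01 / t) * z(power)
    return num ** 2 / (p10 - p01) ** 2


n = mcnemar_prospective_n(p0=0.2, rr=2.0, phi=0.2, alpha=0.05, power=0.80)
print(f"pairs needed before rounding up: {n:.1f}")
```

With $p_0 = 0.2$, $R = 2$ and $\phi = 0.2$ it gives about 64 pairs. Note that $p_{10} - p_{01} = p_1 - p_0$, so the denominator is simply the squared difference in failure rates.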

USING THE POWER PROGRAM: AN EXAMPLE

The power and sample size formulas that are reviewed in the preceding section have been implemented in a program called POWER. The power and sample size calculations generated by POWER agree with examples published by other authors [2, 4, 6, 8-10]. Users of this program may obtain help messages at any time by typing a question mark (?). These messages give the definitions of the terms that the user must enter into the program and eliminate the need for a separate reference manual. An illustration of the use of this program for survival cohort studies follows. In this example, the user's input appears at the end of each prompt line:
$ RUN POWER
POWER program. Type ? for help, R for references, (ctrl)z to exit
Please enter a log file name
(The default log file name is POWER.LOG):
Type of outcome variable?
(1 = survival, 2 = continuous, 3 = dichotomous): 1
What do you want to know?
(1 = sample size, 2 = power, 3 = detectable alternative): 1
How is the alternative hypothesis expressed?
(1 = two survival times, 2 = hazard ratio or relative risk): 1
Enter ALPHA, POWER, M1, M2, A, AND F: ?
Input required from the user:
ALPHA   Type I error probability for two-sided test
POWER   The desired statistical power
M1      Median survival time on control treatment
M2      Median survival time on experimental treatment
A       Accrual time during which patients are recruited
F       Additional follow-up time after end of recruitment
Output:
N       Number of patients per group that must be recruited to detect a
        true ratio of median survival times M2/M1 with power 1-BETA and
        type I error probability ALPHA.
Type E to edit previously entered values
Enter ALPHA, POWER, M1, M2, A and F: 0.1, 0.8, 11, 16.5, 24, 12
ALPHA = 0.1000   POWER = 0.8000   M1 = 11.0000
M2 = 16.5000   A = 24.0000   F = 12.0000
Required sample size: 110
Enter ALPHA, POWER, M1, M2, A, AND F:

Answering "2" to the second question in the preceding example permits the derivation of the power curve associated with a range of different alternative hypotheses. The coordinates of these curves can be written to a data file for subsequent use by graphics software packages. Figure 1 shows such a curve for survival data with a sample size of 110 patients per group, a median control survival time of 11 months, a two-sided type I error probability of 0.1 (one-sided α = 0.05), and accrual and follow-up times of 24 and 12 months, respectively. Note that this figure is in agreement with the sample size calculations of the preceding example.
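Coordinates like those plotted in Figure 1 can be generated by evaluating Eq. 2 with the log-rank definitions over a grid of experimental medians. The sketch below is our illustration of that idea (POWER's own output file format is not described here); the function name is hypothetical and the grid of 5 to 25 months follows the figure's horizontal axis.

```python
import math
from statistics import NormalDist

PHI = NormalDist()


def logrank_power(n, alpha, m1, m2, accrual, followup):
    """Eq. 2 with sigma0 = sigma_a = sqrt(2/p) and delta = log(m2/m1) / sigma."""
    m = (m1 + m2) / 2
    lam = math.log(2) / m
    p_a = (1 - math.exp(-lam * accrual)) / (lam * accrual) if accrual > 0 else 1.0
    p = 1 - p_a * math.exp(-lam * followup)
    delta = math.log(m2 / m1) / math.sqrt(2 / p)
    za = PHI.inv_cdf(1 - alpha / 2)
    shift = delta * math.sqrt(n)
    return PHI.cdf(shift - za) + PHI.cdf(-shift - za)


# Curve coordinates: 110 patients per group, control median 11 months, A = 24, F = 12.
curve = [(m2, logrank_power(110, 0.10, 11, m2, 24, 12)) for m2 in range(5, 26)]
for m2, pw in curve:
    print(f"{m2:3d}  {pw:.3f}")
```

The power equals $\alpha = 0.1$ where the two medians coincide (11 months) and is just under 0.80 at 16.5 months, in line with the sample size example.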
The questions asked by POWER to specify the sample size and power
calculation method used are given in Table 2. This table also shows the ac-
ceptable answers to these questions and the resulting method that is used.
[Figure 1: power (vertical axis) plotted against median survival time for experimental patients in months (horizontal axis, 5 to 25). Annotations: patients per treatment group = 110; two-sided type I error probability = 0.1; patient accrual time = 24 months; additional follow-up time = 12 months; median survival time for control patients = 11 months.]

Figure 1 Power curve for a clinical trial in which 110 patients are randomized into each of two treatments. The coordinates of this curve were generated by the POWER computer program using the method of Schoenfeld and Richter [10].

ADDITIONAL EXAMPLES AND COMMENTS
Survival Data
Consider a clinical trial in which patients are randomized to one of two treatments and then followed for some specified length of time, or until death. A statistic that is commonly used to assess treatment efficacy for such trials is the log-rank test [20]. This test makes no assumptions about how mortal risk varies with time since randomization in either group, and is the optimal test with respect to alternative hypotheses in which the hazard ratio (instantaneous relative risk) between the treatment groups remains constant over time [23]. The POWER program uses the method of Schoenfeld and Richter [10] to assess the power of trials that will be analyzed with the log-rank test. In this method, the alternative hypothesis of interest is specified by the median survival times on the two treatments. If preliminary data are available, these times may be estimated from Kaplan-Meier survival curves [20, 24] as the times at which 50% of patients in each group will have died. These curves must be extrapolated beyond the available follow-up period if fewer than 50% of patients have died during this time. If the survival curves follow an exponential distribution, then the ratio of the median survival times of the experimental patients relative to the controls will equal the hazard ratio of controls relative to experimental patients. Thus, if we only have preliminary data on the control group, we can base our power calculations on the expected median survival time among control patients and the relative risk or hazard ratio between experimental and control subjects that we wish to detect. POWER permits power calculations for survival studies to be formulated in this way. In the preceding example, this could be done by answering "2" to the third question to specify that the alternative hypothesis is to be expressed as a hazard ratio. POWER will then ask for this ratio, which, in this example, is 16.5/11 = 1.5. In other words, a trial with 110 patients per group will be able to detect the alternative hypothesis of 50% greater morbidity on the control treatment with 80% power and a 10% type I error.

Table 2  Questions Asked by the POWER Program, the Acceptable Answers, and the Resulting Sample Size and Power Calculation Methods That Are Used(a)

Question                        Answers
Type of outcome variable?       Survival   Continuous  Continuous   Dichotomous        Dichotomous        Dichotomous  Dichotomous
What is the study design?       Q.N.A.(b)  Paired      Independent  Matched or paired  Matched or paired  Independent  Independent
Is this a case-control study?   Q.N.A.     Q.N.A.      Q.N.A.       Yes                No                 Yes          No
Method number                   1          2           3            4                  5                  6, 7         7, 8

(a) The method number given above is defined in Table 1.
(b) Question not asked.
The Schoenfeld and Richter [10] method permits the follow-up interval to
be specified as an accrual interval A when patients are recruited plus an
additional follow-up interval F. Note that if all patients are followed for the
same length of time then A equals zero and F equals the uniform follow-up
interval.

Continuous Response Data


The power of studies of continuous response data is affected by the variation of patient responses within treatment groups. For this reason, it is necessary to estimate the standard deviation of patient responses. Although this can be very difficult to do in the absence of good pilot data, it is helpful to bear in mind that about 95% of normally distributed patient responses lie within a range of four standard deviations (two on either side of the mean).
Paired study designs will be more powerful than independent designs if
the matching variables account for a sizable amount of the patient variation.
It can, however, be difficult to find suitable pairs of patients if there are several
matching variables and the matching criteria are sufficiently strict. In obser-
vational studies it is often easier to recruit control patients than case patients.
The power of such studies may be increased by recruiting multiple controls
per case.
A common clinical trial design involves measuring a response variable on patients before and after treatment. Suppose that patients are randomized to a control or experimental treatment, that the response measure (say weight) is normally distributed, that we measure each patient's weight before and after treatment, and that we wish to determine whether the change in weight differs between the experimental and control groups. Such studies can be analyzed by an independent t test on the change in weight for each patient. To perform power calculations for such a study using POWER, one must specify a continuous independent study design and then estimate the standard deviation of the change in weight among patients who receive the same treatment. If it is easier to specify the standard deviation $\sigma_b$ of patients' baseline weight and the correlation coefficient $\rho$ between baseline and follow-up weight on the same patient, then, assuming equal standard deviations at baseline and follow-up, the standard deviation of a patient's weight change can be calculated as $\sigma = \sigma_b\sqrt{2(1 - \rho)}$ [1, sect. 9.4.2.2]. This standard deviation is then entered as S after specifying a continuous independent study design to the POWER program.
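Both rules of thumb in this subsection are one-liners. The second follows from $\mathrm{Var}(Y_2 - Y_1) = 2\sigma_b^2(1 - \rho)$ when baseline and follow-up share the same standard deviation $\sigma_b$, an assumption the sketch makes explicit; the helper names and the weights in the example are hypothetical.

```python
import math


def sd_from_range(low, high):
    """Rough SD when about 95% of responses should fall between low and high (range / 4)."""
    return (high - low) / 4


def sd_of_change(sd_baseline, rho):
    """SD of (follow-up - baseline), assuming equal SDs at both times and correlation rho."""
    return sd_baseline * math.sqrt(2 * (1 - rho))


# Hypothetical weights: 95% expected between 60 and 100 kg; baseline/follow-up correlation 0.8.
sd_b = sd_from_range(60, 100)
print(sd_b, sd_of_change(sd_b, 0.8))
```

A strong baseline/follow-up correlation makes the change score much less variable than a single measurement, which is what makes the change-in-weight analysis attractive.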

Dichotomous Response Data


Suppose that patients are randomized to an experimental or control treatment whose outcome is either success or failure. The success rates of these treatments may be compared using a χ² test or Fisher's exact test [4]. When the numbers of successes and failures on each treatment are large, all of these methods are essentially equivalent. There has, however, been considerable controversy over the most appropriate method when the number of patients in one of the four cells of the outcome table becomes moderate or small [25]. Meinert [1] recommends using the uncorrected χ² statistic if there are at least 15 patients in each cell of this table. Many authors recommend that Yates's corrected χ² statistic be used for tables with moderate minimum cell size and that Fisher's exact test be used when the minimum expected cell size is less than five [4]. The correct sample size calculation for such studies depends on the test statistic that will be used. POWER performs the appropriate sample size calculations for each of these test statistics.
The alternative hypothesis is often expressed in terms of relative risks. For prospective studies this is simply the ratio of failure rates of patients on the two treatments. For case-control studies, relative risk is estimated by the odds ratio if the disease under study is rare [4]. POWER asks whether the user is planning a case-control study in order to allow the user to express the alternative hypothesis as an odds ratio. The choice of the test statistic is not affected by whether the study is prospective or retrospective.

We thank Robert A. Parker, George W. Reed, Gordon R. Bernard, Curtis L. Meinert, and the
referees for helpful advice, and Janelle Steele and Virginia McKinney for assistance in preparing
this manuscript. This research was supported in part by NIH grants and contracts HL-14192,
N01-AI-52593, R01-CA40517, and R01-CA46492.

REFERENCES
1. Meinert CL: Clinical Trials: Design, Conduct, and Analysis. New York: Oxford
University Press, 1986
2. Schlesselman JJ: Case-Control Studies: Design, Conduct, Analysis. New York:
Oxford University Press, 1982
3. Steel RGD, Torrie JH: Principles and Procedures of Statistics: A Biometrical Ap-
proach, 2nd ed. New York: McGraw-Hill, 1980
4. Fleiss JL: Statistical Methods for Rates and Proportions, 2nd ed. New York: Wiley,
1981
5. Casagrande JT, Pike MC, Smith PG: An improved approximate formula for cal-
culating sample sizes for comparing two binomial distributions. Biometrics 34:483-
486, 1978
6. Dupont WD: Power calculations for matched case-control studies. Biometrics 44:
1157-1168, 1988
7. Feigl P: A graphical aid for determining sample size when comparing two inde-
pendent proportions. Biometrics 34:111-122, 1978
8. Friedman LM, Furberg CD, DeMets DL: Fundamentals of Clinical Trials. Boston:
John Wright PSG, 1982
9. Pearson ES, Hartley HO: Biometrika Tables for Statisticians, 3rd ed. Cambridge:
Cambridge University Press, 1970, vol I
10. Schoenfeld DA, Richter JR: Nomograms for calculating the number of patients
needed for a clinical trial with survival as an endpoint. Biometrics 38:163-170,
1982
11. Gross AJ, Hunt HH, Cantor AB, Clark BC: Sample size determination in clinical
trials with an emphasis on exponentially distributed responses. Biometrics 43:875-
883, 1987
12. Halpern J, Brown BW Jr: Designing clinical trials with arbitrary specification of
survival functions and for the log rank or generalized Wilcoxon test. Controlled
Clin Trials 8:177-189, 1987
13. Lachin JM, Foulkes MA: Evaluation of sample size and power for analysis of
survival with allowance for nonuniform patient entry, losses to follow-up, non-
compliance and stratification. Biometrics 42:507-519, 1986
14. Lakatos E: Sample sizes based on the log rank statistic in complex clinical trials.
Biometrics 44:229-241, 1988
15. Parker RA, Bregman DJ: Sample size for individually matched case-control studies.
Biometrics 42:919-926, 1986
16. Self SG, Mauritsen RH: Power/sample size for generalized linear models. Bio-
metrics 44:79-86, 1988
17. Taulbee JD, Symons MJ: Sample size and duration for cohort studies of survival
time with covariables. Biometrics 39:351-360, 1983
18. Kelsey JL, Thompson WD, Evans AS: Methods in Observational Epidemiology.
New York: Oxford University Press, 1986
19. Ralston A: A First Course in Numerical Analysis. New York: McGraw-Hill, 1965
20. Peto R, Pike MC, Armitage P, et al: Design and analysis of randomized clinical
trials requiring prolonged observation of each patient: II. Analysis and examples.
Br J Cancer 35:1-39, 1977
21. Johnson NL, Kotz S: Distributions in Statistics: Continuous Univariate Distribu-
tions-2. New York: Wiley, 1970
22. Breslow NE, Day NE: Statistical Methods in Cancer Research: Vol I--The Analysis
of Case-Control Studies. Lyon: International Agency for Research on Cancer, 1980
23. Peto R: Rank tests of maximal power against Lehmann-type alternatives. Bio-
metrika 59:472-475, 1972
24. Lee ET: Statistical Methods for Survival Data Analysis. Belmont, CA: Lifetime
Learning, 1980, pp 76-87
25. Dupont WD: Sensitivity of Fisher's exact test to minor perturbations in 2 x 2
contingency tables. Stat Med 5:629-635, 1986

APPENDIX
Suppose that $2(\sigma_0/\sigma_a) z_{\alpha/2} + z_\beta \ge 3.1$ and we select $n$ according to Eq. 3. Then

$$|\delta|\sqrt{n} = (\sigma_0/\sigma_a) z_{\alpha/2} + z_\beta$$

and hence

$$-|\delta|\sqrt{n} - (\sigma_0/\sigma_a) z_{\alpha/2} \le -3.1 < \Phi^{-1}[0.001]. \tag{9}$$

When $\delta > 0$, Eq. 9 implies that the right-most term in Eq. 2 is less than 0.001. When $\delta < 0$, the other term in Eq. 2 will be less than 0.001. Hence the approximation used to derive Eq. 3 from Eq. 2 is excellent as long as $2(\sigma_0/\sigma_a) z_{\alpha/2} + z_\beta \ge 3.1$.
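The Appendix bound can be checked numerically: with $n$ from Eq. 3, the neglected term of Eq. 2 equals $\Phi[-(2(\sigma_0/\sigma_a) z_{\alpha/2} + z_\beta)]$. A small sketch with a hypothetical function name:

```python
from statistics import NormalDist

PHI = NormalDist()


def neglected_term(ratio, alpha, power):
    """Smaller term of Eq. 2 when n comes from Eq. 3; ratio = sigma0 / sigma_a."""
    za = PHI.inv_cdf(1 - alpha / 2)
    zb = PHI.inv_cdf(power)
    # With n from Eq. 3, |delta| sqrt(n) = ratio * za + zb, so the smaller term
    # is Phi[-|delta| sqrt(n) - ratio * za] = Phi[-(2 * ratio * za + zb)].
    return PHI.cdf(-(2 * ratio * za + zb))


print(f"Phi(-3.1) = {PHI.cdf(-3.1):.6f}")  # the bound used above, just under 0.001
print(f"sigma0 = sigma_a, alpha = 0.05, power = 0.80: {neglected_term(1.0, 0.05, 0.80):.2e}")
```

For $\sigma_0 = \sigma_a$, $\alpha = 0.05$ and 80% power, the neglected term is below $10^{-5}$; it only becomes appreciable for unusually small $\alpha$-$\beta$ combinations such as $\alpha = 0.2$ with 50% power.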
