
Journal of Clinical Epidemiology 66 (2013) 197–201

Sample size calculations for pilot randomized trials: a confidence interval approach
Kim Cocks, David J. Torgerson*
York Trials Unit, Department of Health Sciences, University of York, Heslington, York YO10 5DD, UK
Accepted 5 September 2012; Published online 27 November 2012
* Corresponding author. E-mail address: djt6@york.ac.uk (D.J. Torgerson).
Conflict of interest statement: We confirm that we have no conflict of interest. Competing interests: Both authors have no competing interests.
http://dx.doi.org/10.1016/j.jclinepi.2012.09.002

Abstract
Objectives: To describe a method using confidence intervals (CIs) to estimate the sample size for a pilot randomized trial.
Study Design: Using one-sided CIs and the estimated effect size that would be sought in a large trial, we calculated the sample size
needed for pilot trials.
Results: Using an 80% one-sided CI, we estimated that a pilot trial should have at least 9% of the sample size of the planned main trial.
Conclusion: Using the estimated effect size difference for the main trial together with a one-sided CI allows us to calculate a sample
size for a pilot trial, which will make its results more useful than at present. © 2013 Elsevier Inc. All rights reserved.
Keywords: Sample size; Pilot trials; Confidence intervals; Statistical power; Review; Randomised trials

1. Background
Randomized controlled trials (RCTs) are often complex,
time consuming, and expensive. Ideally, before a large RCT
is undertaken, a pilot or feasibility study that informs the
design of the main trial should be conducted. It is useful,
at this stage, to distinguish between a pilot trial and a feasibility study. A pilot trial replicates, in miniature, a planned
larger study [1], whereas a feasibility study may help in the
development of the intervention and/or outcome measures.
Consequently, in the definitive trial, the intervention may
be quite different, and the outcomes may have changed.
In this article, we are only discussing pilot trials: studies
that mimic, in all the major essentials, the future definitive
trial. Our arguments apply to both pilot trials run before the
main study (external pilot trials) and those run as the first
stage of the main trial (internal pilot studies).
Often, researchers justify a pilot study to help with the
calculation of the sample size for the main trial. Estimates
of treatment effects and their variance from pilot studies
may be used to generate possible sample size requirements,
but there is a problem with this approach. Effect sizes from
any small trial will be subject to a high degree of uncertainty. Consequently, if one were to plan a definitive trial's sample size based on the estimate obtained from a small pilot, then it is likely that the main trial will be underpowered, as many published small trials overestimate treatment
effects. For example, in the design of a trial of yoga for low
back pain, three small previously published pilot trials
returned an average difference in back pain scores of 0.98
of a standard deviation [2]. However, this difference was
not used to plan the main trial as the researchers judged
it to be unexpectedly large. Indeed, the main trial showed
a much smaller difference of around 0.5 of a standard deviation [3]. Therefore, although pilot trials are useful in many areas of trial design, they should be used with caution for determining the sample size of the main trial, and clinical relevance must also be considered. There is often the scenario that the difference
to be detected for the main trial can be reasonably informed
by known clinical or economic importance; however, a pilot
study may still be desirable to assess whether such a difference is likely. We propose that if careful attention is paid to
the sample size calculation in these pilot trials, they could
be much more informative, resulting in cost savings and
more efficient use of patients in trials.
Sample size calculations for pilot trials are sometimes
not undertaken. Indeed, many journal editors publishing
pilot trials either do not expect them or suggest that they
should not be done [1,4]. In our experience, this view is
widely held, but we will argue that it is mistaken.
What is new?

• Many randomized controlled pilot trials do not have an a priori sample size calculation. In this article, we argue that sample size calculations are beneficial.

• We suggest a novel approach in which the anticipated main study informs the pilot trial's sample size via a confidence interval approach.

• We argue that choosing a sample size such that a one-sided 80% confidence interval excludes the minimum clinically important effect size for the main study enhances a pilot study's utility.

The argument for not undertaking a sample size calculation for pilot trials hinges on the problem of a type II error, which is where one concludes, because of the small sample size, that there is no worthwhile difference between the groups when in fact there is. By definition, a pilot trial is small and not intended to be large enough to identify a meaningful difference between the treatment groups that can be statistically significant. Consequently, how would
one undertake a meaningful sample size calculation? Nevertheless, some authors have suggested sample sizes for pilot trials, with figures of 12 [5], 10 [6], and 15 [7] per group, 32 in total [8] for a two-arm trial, or 50% of the total main trial's sample size [9] (see Table 1 for fuller details). Justifications for these figures include
precision around the mean and variance or having the
power to show a large difference (1 SD) between groups
if it were present.
In this article, we suggest an alternative approach,
whereby the sample size calculation for the pilot trial is
driven by the proposed sample size of the main trial. We
suggest that we can use objective criteria for establishing
a sample size for pilot studies by using a confidence interval (CI) approach [10] rather than using the more usual
power and statistical significance method.

2. The role of CIs for informing more research


Most medical journals nowadays insist that reports of trials (and other quantitative studies) include CIs or their Bayesian counterparts, credible intervals. A 95% interval, the most commonly used, gives an estimate of where the true, but unknown, clinical difference lies. For large trials, the interval surrounding any estimate is relatively narrow, and
we can be confident that the true clinical difference does
not depart very much from the observed difference seen in
the trial. The most important use of measures of uncertainty
is to inform the need for further research. Very narrow intervals, in the context of a robust study, suggest that further replication of the study is not required, whereas wider

intervals mean a greater uncertainty and act as a pointer for


further research. Usually, two-sided intervals are generated
because, for most treatments, we are interested in whether
the intervention is harmful or beneficial.
We can use CIs or the Bayesian equivalent, credible intervals, in pilot trials to inform the decision to go forward
with the main study. Because uncertainty intervals (i.e.,
confidence or credible intervals) are used to inform the need for
further research, we would argue that they should be
produced in the analysis of pilot trials.

3. Aim and rationale for the use of the 80% one-sided CI


We suggest identifying a sample size for our pilot trial such that, if the observed difference between the two groups in the pilot trial is zero, then the upper confidence limit will exclude the estimate that is considered clinically important in the planned
definitive trial. If we are to use uncertainty estimates in the analysis of pilot trials, then it follows that we should calculate sample sizes for pilot studies to optimize their utility.
This would give us a statistical basis for our sample size
calculation, which will ensure that we have a sufficient
sample size to aid our decision as to whether we should
move forward into the main trial.
We want to identify a sample size that makes our pilot trial big enough to give us reasonable confidence that we are making the right decision in proceeding, or not, to a larger trial. However, we would not
require too large a sample size as this increases the cost,
time taken to conduct the pilot, and potential for more patients to be exposed to an ineffective treatment. These all
negate some of the advantages of undertaking a pilot before
our main trial. Consequently, we would not use a 95% interval but a smaller one: we propose an 80% interval, which will satisfy the need for reasonable certainty for trial decision making but would be
small enough to deliver a study within a reasonable budget
and timeframe, although some may feel more comfortable
using a 90% interval. Furthermore, we propose to use
a one-sided CI as we are only interested in proceeding
toward the main trial if there is some evidence of effectiveness. If the intervention appeared to be harmful, even if this
were not statistically significant, we would not proceed.
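
To see where the roughly 9% figure reported later comes from, the following derivation may help. It is our sketch, not the authors' published algebra: it assumes an observed pilot difference of zero, two equal arms, a known variance, and the usual normal-approximation formula for the main trial's size (the exact t-based figures in Table 2 differ slightly). Here z_q denotes the standard normal quantile at q, the pilot interval is one-sided at level 1 - alpha_1, and the main trial has power 1 - beta at two-sided significance alpha:

```latex
% Upper one-sided limit when the observed pilot difference is zero,
% with n participants per pilot arm:
\[
  d_U = z_{1-\alpha_1}\,\sqrt{\frac{2}{n}}
  \quad\Longrightarrow\quad
  d_U < d_{\mathrm{main}} \;\;\text{requires}\;\;
  n > 2\left(\frac{z_{1-\alpha_1}}{d_{\mathrm{main}}}\right)^{2}.
\]
% The main trial needs approximately, per arm,
\[
  n_{\mathrm{main}} \approx \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^{2}}{d_{\mathrm{main}}^{2}}
  \quad\text{so}\quad
  \frac{n}{n_{\mathrm{main}}}
  = \frac{z_{1-\alpha_1}^{2}}{(z_{1-\alpha/2} + z_{1-\beta})^{2}}
  = \frac{0.8416^{2}}{(1.9600 + 0.8416)^{2}}
  \approx 0.09.
\]
```

Note that the ratio does not depend on the effect size sought, which is why the 9% figure is flat across Table 2. The same calculation for a 90% one-sided interval gives 1.2816²/2.8016² ≈ 21%, consistent with the 90% columns of Table 2.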

4. Sample size estimation


In the following examples, CIs for the standardized
effect size have been calculated using the inversion CI principle via SAS software (SAS Institute Inc., Cary, NC, USA)
(NONCT function) [11]. Sample sizes for the main trial
have been calculated using PS Power and Sample Size software (Vanderbilt University) [12]. Table 2 and Fig. 1 show
the recommended pilot sample sizes for various standardized effect sizes (using one-sided 80% and 90% confidence limits). Table 3 indicates the sample size requirements for binary outcomes. Note that the sample size calculations here make no allowance for attrition.
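
The inversion calculation is straightforward to reproduce. The authors used SAS's noncentral t function; the sketch below is our own illustrative translation using scipy, not the published code (the function name and the bracketing interval for the root search are our assumptions):

```python
import math

from scipy.optimize import brentq
from scipy.stats import nct

def upper_limit(t_obs: float, n_per_group: int, level: float = 0.80) -> float:
    """Upper one-sided confidence limit for a standardized effect size,
    by CI inversion: find the noncentrality parameter under which the
    observed t statistic sits at the (1 - level) quantile, then rescale
    that noncentrality parameter to an effect size."""
    df = 2 * n_per_group - 2
    ncp = brentq(lambda nc: nct.cdf(t_obs, df, nc) - (1.0 - level), -50.0, 50.0)
    return ncp * math.sqrt(2.0 / n_per_group)

# A pilot of 32 (16 per group) observing no difference (t = 0) gives an
# upper 80% limit of about 0.2976, reproducing the d = 0.30 row of Table 2.
print(round(upper_limit(0.0, 16), 4))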


Table 1. Summary of recommendations for sample size calculations for pilot trials

Wittes and Brittain [9]
  Recommended sample size: 50% of total sample size of main trial.
  Justification: To confirm estimates of variance used to power the main study. The study should be large enough to be confident in the variance estimates.
  Internal/external pilot: Internal.
  Method of analysis and findings: Estimate of variance. If larger than expected, increase sample for main trial; if lower, retain original sample size. No estimate of treatment effect.

Birkett and Day [6]
  Recommended sample size: 10 patients per group, 40 maximum. Larger sample sizes for larger main studies.
  Justification: Extension to Wittes and Brittain, but argued that good estimates of study variance can be obtained with a smaller sample size.
  Internal/external pilot: Internal.
  Method of analysis and findings: Estimate of variance. No estimate of treatment effect.

Browne [7]
  Recommended sample size: Not clear.
  Justification: Sample size rule of thumb of 30 is too small if one uses the pilot sample estimate of variance; need to be more conservative and use upper confidence intervals.
  Internal/external pilot: Internal/external.
  Method of analysis and findings: Calculate 80% and 90% confidence intervals of the standard deviation, and use these to plan the main study's sample size.

Sandvik et al. (a)
  Recommended sample size: Minimum of 20 patients in total.
  Justification: Sample size depends on the sample of the original study that provided the estimate of the standard deviation.
  Internal/external pilot: Internal.
  Method of analysis and findings: Estimate of a new standard deviation.

Julious [5]
  Recommended sample size: 12 patients per group.
  Justification: Number is divisible by 2, 3, 4, and 6, which can be used as block sizes in restricted randomization. Further reductions in variance decline after 12.
  Internal/external pilot: Internal/external.
  Method of analysis and findings: Use point estimate of variance.

Sim and Lewis [13]
  Recommended sample size: At least 55.
  Justification: Smaller sample sizes are likely to underestimate the variance and lead to an underpowered main trial.
  Internal/external pilot: Internal/external.
  Method of analysis and findings: Use upper 95% one-sided confidence limit of pilot standard deviation.

Present study
  Recommended sample size: At least 9% of main trial's sample size.
  Justification: This allows a one-sided 80% confidence interval to exclude a clinically important difference.
  Internal/external pilot: Internal/external.
  Method of analysis and findings: Obtain treatment estimate; if above zero, proceed to main trial. No hypothesis tests.

(a) Sandvik L, Erikssen J, Mowinckel P, Rodland EA. A method for determining size of internal pilot studies. Statistics in Medicine 1996;1587–90.


5. Some worked examples


How might this work? Suppose we wanted to undertake
a study in which we felt that 0.3 of a standard deviation
between two groups was worthwhile. Such a study would
require about 350 participants (assuming 80% power and
a two-sided alpha of 5%) in the final analysis (Table 2).
However, the funding agency recommends that the researchers undertake a pilot trial to test the recruitment rate
and assess whether such a difference is likely to be realistic.
Using a CI approach, we would calculate the pilot sample
size required to produce an upper limit of a one-sided
80% CI which excludes 0.3, assuming that the treatment
estimate from the pilot was zero or less. Thirty-two
participants (approximately 9% of main sample size) would
be required to produce a one-sided 80% confidence limit,
which would exclude this estimate (i.e., upper 80%
CI = 0.2976). Consequently, if we undertook such a pilot with that sample size, found an estimate larger than zero, and the pilot also showed that it was feasible to recruit and retain the participants, and so forth, then the recommendation would be to move forward with the main study. In
Table 2, we show some examples of sample size calculations for continuous outcome measures.
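
Because the noncentral t inversion reduces exactly to z√(2/n) when the observed t statistic is zero (as in the sketch above), the pilot sizes in Table 2 can be recovered in a few lines. This is again our illustration, not the authors' code:

```python
import math
from scipy.stats import norm

def pilot_size(target_d: float, level: float = 0.80) -> int:
    """Smallest total pilot size (two equal arms) whose upper one-sided
    confidence limit excludes the effect size sought in the main trial,
    given an observed pilot difference of zero."""
    n_per_arm = math.ceil(2.0 * (norm.ppf(level) / target_d) ** 2)
    return 2 * n_per_arm

print(pilot_size(0.30))  # 32, about 9% of the 344 needed for the main trial
print(pilot_size(0.25))  # 46, matching the RMDQ example that follows
```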
An example of using this approach might be in the field
of low back pain. The main outcome measure used in this
area is the Roland and Morris disability questionnaire
(RMDQ), which has a standard deviation of about 4 points
and an average score of 8 [3]. Let us suppose that we want
to evaluate an inexpensive intervention such that a modest
difference of 1 point is considered worthwhile (i.e., a difference of a quarter of a standard deviation). To have an 80%
power to detect such a difference (alpha = 0.05), we would
require 504 participants in the analysis. If we recruited, randomized, and analyzed 46 participants (i.e., 23 in each
group), we could produce a one-sided 80% confidence
limit, which would exclude a 1 point difference on the
RMDQ, if the point estimate from the pilot study were 0.
Similarly, let us suppose that we want to undertake a trial to
reduce the proportion of older people who are at risk of falling
from 50% down to 40%. To show this difference with an 80%
power (alpha = 0.05), we would need to randomize and analyze about 800 participants. However, if we wish to undertake a pilot, then recruiting, randomizing, and analyzing 72 participants, and assuming that 18 fell in each of the two groups (i.e., 50%), would produce a one-sided 80% confidence limit that would exclude the 10 percentage point difference that would be statistically significant in a larger trial.
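
As a check on the falls example, here is a minimal self-contained sketch for binary outcomes. It assumes, as the example does, that both pilot arms observe the control-group proportion (so the estimated difference is zero) and uses the normal approximation; the function name is ours:

```python
import math
from scipy.stats import norm

def upper_limit_binary(p_control: float, n_per_group: int,
                       level: float = 0.80) -> float:
    """Upper one-sided confidence limit for a difference in proportions,
    assuming both pilot arms observe the control-group proportion, so
    the estimated difference is zero (normal approximation)."""
    se = math.sqrt(2.0 * p_control * (1.0 - p_control) / n_per_group)
    return norm.ppf(level) * se

# Falls example: 72 participants (36 per group), 18 fallers in each arm:
print(round(upper_limit_binary(0.50, 36), 4))  # ~0.0992, excluding 0.10
```

The same expression, solved for n, reproduces the pilot sizes in Table 3.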


Table 2. Recommended pilot sample size for continuous outcome measures

Standardized effect    Sample size for    Pilot sample size    Upper 80% one-sided    Pilot sample size    Upper 90% one-sided
size for main trial    main trial (a)     (80% level)          confidence limit       (90% level)          confidence limit
0.50                   128                12                   0.4859                 28                   0.4844
0.45                   158                14                   0.4499                 34                   0.4397
0.40                   198                18                   0.3967                 42                   0.3955
0.35                   258                24                   0.3436                 54                   0.3488
0.30                   344                32                   0.2976                 74                   0.2980
0.25                   504                46                   0.2482                 106                  0.2490
0.20                   786                72                   0.1984                 166                  0.1989
0.15                   1398               126                  0.14995                292                  0.14999
0.10                   3142               284                  0.0999                 658                  0.0999

NB: Sample sizes not corrected for attrition.
(a) Assuming 80% power and 5% significance level using t-test.

In terms of the analysis of the pilot study, we are only
interested in whether the treatment estimate is larger or
smaller than zero. Consequently, it is not necessary to formally undertake a hypothesis test of the results.

6. Recommendations for pilot study sample size


There are alternative approaches to estimate the sample
sizes for pilot studies, and some of these are summarized in
Table 1. As Table 1 shows, most previous studies looking
at sample size estimation hinge on trying to estimate
an unknown variance. In contrast, our approach assumes that
the variance is known; it is the likelihood of the main study
finding a minimum clinically important effect size that
drives the sample size in our approach. If the researchers
are unsure of the variance, then they should consider using
one of the other approaches listed in the table.
Using our methods, if we choose to use a one-sided 80%
confidence limit, then we will need to recruit, retain, and analyze about 9% of the total sample size needed for the main
trial (for continuous or dichotomous outcomes). Clearly, in
some instances, larger pilots may be warranted. If we wanted
to estimate the main study's standard deviation, for instance,
we should probably seek a sample size of at least 50 [13]. Furthermore, recruiting only 9% of the total estimated sample

size of the main trial may be insufficient to be sure that recruitment targets are achievable. Therefore, we would see
that 9% of the main study's sample size is potentially the minimum size of a pilot rather than the maximum. Indeed, we would suggest that, as a minimum, at least 20 participants should be included in a pilot study, as this seems to be the smallest number that is reasonable from statistical modeling
studies (Table 1). For pilot studies that aim both to estimate the value of a parameter, such as a standard deviation, and to assess whether the main trial is worthwhile, we suggest using the larger of the two sample size estimates, if they differ.
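
Purely as a hedged illustration for a continuous outcome, one reading of this section's recommendations can be combined as follows (the helper name and interface are ours, not the authors'):

```python
import math
from scipy.stats import norm

def recommended_pilot_size(target_d: float,
                           estimate_sd: bool = False,
                           level: float = 0.80) -> int:
    """One reading of this section: take the CI-based size (two equal
    arms), never fewer than 20 participants in total, and at least 50
    if the pilot must also estimate the main study's standard
    deviation [13]; use the largest applicable figure."""
    per_arm = math.ceil(2.0 * (norm.ppf(level) / target_d) ** 2)
    floor = 50 if estimate_sd else 20
    return max(floor, 2 * per_arm)

print(recommended_pilot_size(0.30))                    # 32
print(recommended_pilot_size(0.50))                    # 20 (CI rule alone gives 12)
print(recommended_pilot_size(0.30, estimate_sd=True))  # 50
```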

7. Discussion
In our experience, there is a current belief among some
journal editors, researchers, and funders that it is not necessary or desirable to undertake sample size calculations for pilot trials. We disagree and have argued that formally undertaking an a priori sample size calculation for a pilot study will enhance its utility. We believe that using either our suggested approach or one advocated by other authors is better than not doing any sample size estimation for pilot trials.
It is not the case that if an appropriately sized pilot study
showed a zero or negative effect size, this would automatically preclude the main trial from going forward. It may be that
the study, although planned as a pilot, actually behaved more
like a feasibility study, in that during the study, the elements
of the intervention were found to require change, which may have explained a lack of effect.

Fig. 1. Sample size requirements for one-sided 80% confidence interval to exclude required standardized effect size.


Table 3. Recommended pilot sample size for binary outcomes

Control group    Difference to be    Sample size for    Pilot sample size    Upper one-sided 80%    Pilot sample size    Upper one-sided 90%
proportion       detected (%)        main trial         (80% level)          confidence limit       (90% level)          confidence limit
0.50             5                   3,130              284                  0.0499                 658                  0.0500
0.50             10                  774                72                   0.0992                 166                  0.0995
0.50             15                  338                32                   0.1488                 74                   0.1490
0.30             5                   2,754              238                  0.0500                 552                  0.0500
0.30             10                  712                60                   0.0996                 138                  0.09999
0.30             15                  324                28                   0.1458                 62                   0.1492
0.20             5                   2,188              182                  0.0499                 422                  0.0499
0.20             10                  586                46                   0.0993                 106                  0.0996
0.20             15                  276                22                   0.1435                 48                   0.1480

Similarly, even if the pilot trial identified a positive effect, it may also have found that
recruitment was simply too difficult to make it possible to
successfully recruit to the main trial. Therefore, all the other
reasons for doing a pilot trial remain. However, undertaking
a formal sample size calculation will ensure that the pilot
produces much more added value than is commonly the case.
Small trials on average appear to generate larger estimates of
treatment effects than bigger trials, and if this phenomenon were true, then pilot trials using our approach would tend to recommend going ahead with the main trial more often than is optimal. However, this depends on why small trials seem to show
larger effects. Large effects may be because of either publication bias (small positive trials are more likely to be published
than small negative studies) or quality differences. In a meta-analysis of small and large trials, Kjaergard et al. [14] found
that the exaggerated effect sizes of small trials disappeared
when the analysis adjusted for quality of the randomization.
Thus, small, or pilot, trials will not exaggerate treatment effects
if undertaken rigorously. Therefore, as long as the pilot study pays good attention to rigorous randomization procedures and other quality criteria, it is unlikely to overestimate the need for a definitive trial. In conclusion, we believe that it is
helpful to undertake a priori sample size calculations for pilot
trials and that they should be encouraged.

Acknowledgment
The authors would like to thank Julie Oates and Andrew
Thorpe for assisting with the programming.

References
[1] Arain M, Campbell MJ, Cooper CL, Lancaster GA. What is a pilot or feasibility study? A review of current practice and editorial policy. BMC Med Res Methodol 2010;10:67.
[2] Cox H, Tilbrook H, Aplin J, Chuang LH, Hewitt C, Jayakody S, et al. A pragmatic multi-centred randomised controlled trial of yoga for chronic low back pain: trial protocol. Complement Ther Clin Pract 2010;16:76–80.
[3] Tilbrook HE, Cox H, Hewitt CE, Kangombe AR, Chuang LH, Jayakody S, et al. Yoga for chronic low back pain. A randomized trial. Ann Intern Med 2011;155:569–78.
[4] Thabane L, Ma J, Chu R, Cheng J, Ismaila A, Rios LP, et al. A tutorial on pilot studies: the what, why and how. BMC Med Res Methodol 2010;10:1.
[5] Julious SA. Sample size of 12 per group rule of thumb for a pilot study. Pharm Stat 2005;4:287–91.
[6] Birkett MA, Day SJ. Internal pilot studies for estimating sample size. Stat Med 1994;13:2455–63.
[7] Browne RH. On the use of a pilot sample for sample size determination. Stat Med 1995;14:1933–40.
[8] Torgerson DJ, Torgerson CJ. Designing randomised trials in health, education and the social sciences. Basingstoke, UK: Palgrave Macmillan; 2008.
[9] Wittes J, Brittain E. The role of internal pilot studies in increasing the efficiency of clinical trials. Stat Med 1990;9:65–72.
[10] Bland JM. The tyranny of power: is there a better way to calculate sample size? BMJ 2009;339:b3985.
[11] Smithson M. Confidence intervals. Thousand Oaks, CA: Sage Publications, Inc; 2003.
[12] Dupont WD, Plummer WD. Power and sample size calculations: a review and computer program. Control Clin Trials 1990;11:116–28.
[13] Sim J, Lewis M. The size of a pilot study for a clinical trial should be calculated in relation to considerations of precision and efficiency. J Clin Epidemiol 2012;65:301–8.
[14] Kjaergard LL, Villumsen J, Gluud C. Reported methodologic quality and discrepancies between large and small randomized trials in meta-analyses. Ann Intern Med 2001;135:982–9.
