Sample size calculations for pilot randomized trials: a confidence interval approach
Kim Cocks, David J. Torgerson *
York Trials Unit, Department of Health Sciences, University of York, Heslington, York YO10 5DD, UK
Accepted 5 September 2012; Published online 27 November 2012
Abstract
Objectives: To describe a method using confidence intervals (CIs) to estimate the sample size for a pilot randomized trial. Study Design: Using one-sided CIs and the estimated effect size that would be sought in a large trial, we calculated the sample size needed for pilot trials. Results: Using an 80% one-sided CI, we estimated that a pilot trial should have at least 9% of the sample size of the main planned trial. Conclusion: Using the estimated effect size for the main trial together with a one-sided CI allows us to calculate a sample size for a pilot trial, which will make its results more useful than at present. © 2013 Elsevier Inc. All rights reserved.
Keywords: Sample size; Pilot trials; Confidence intervals; Statistical power; Review; Randomised trials
What is new?
Many randomized controlled pilot trials do not have an a priori sample size calculation. In this article, we argue that sample size calculations are beneficial.
In this study, we suggest a novel approach of using the anticipated main study to inform the pilot trial's sample size using a confidence interval approach.
We argue that a pilot study's utility is enhanced by choosing a sample size that gives a one-sided 80% confidence interval excluding the minimum clinically important effect size for the main study.

Conflict of interest statement: We confirm that we have no conflict of interest. Competing interests: Both authors have no competing interests.
* Corresponding author. Email address: djt6@york.ac.uk (D.J. Torgerson).
0895-4356/$ - see front matter © 2013 Elsevier Inc. All rights reserved.
K. Cocks, D.J. Torgerson / Journal of Clinical Epidemiology 66 (2013) 197-201

1. Background
Randomized controlled trials (RCTs) are often complex, time consuming, and expensive. Ideally, before a large RCT is undertaken, a pilot or feasibility study that informs the design of the main trial should be conducted. It is useful, at this stage, to distinguish between a pilot trial and a feasibility study. A pilot trial replicates, in miniature, a planned larger study [1], whereas a feasibility study may help in the development of the intervention and/or outcome measures. Consequently, in the definitive trial, the intervention may be quite different, and the outcomes may have changed. In this article, we are only discussing pilot trials: studies that mimic, in all the major essentials, the future definitive trial. Our arguments apply to both pilot trials run before the main study (external pilot trials) and those run as the first stage of the main trial (internal pilot studies).

Often, researchers justify a pilot study as a way to help calculate the sample size for the main trial. Estimates of treatment effects and their variance from pilot studies may be used to generate possible sample size requirements, but there is a problem with this approach. Effect sizes from any small trial will be surrounded by a high degree of uncertainty. Consequently, if one were to plan a definitive trial's sample size based on the estimate obtained from a small pilot, then it is likely that the main trial will be underpowered, as many published small trials overestimate treatment effects. For example, in the design of a trial of yoga for low back pain, three small previously published pilot trials returned an average difference in back pain scores of 0.98 of a standard deviation [2]. However, this difference was not used to plan the main trial as the researchers judged it to be unexpectedly large. Indeed, the main trial showed a much smaller difference of around 0.5 of a standard deviation [3]. Therefore, although pilot trials are useful in many areas of trial design, their use for determining the sample size of the main trial should be approached with caution, and the clinical relevance must also be considered.

Often, the difference to be detected in the main trial can be reasonably informed by known clinical or economic importance; however, a pilot study may still be desirable to assess whether such a difference is likely. We propose that if careful attention is paid to the sample size calculation in these pilot trials, they could be much more informative, resulting in cost savings and more efficient use of patients in trials.

Sample size calculations for pilot trials are sometimes not undertaken. Indeed, many journal editors publishing pilot trials either do not expect them or suggest that they should not be done [1,4]. In our experience, this view is widely held, but we will argue that it is mistaken. The argument for not undertaking a sample size calculation for pilot trials hinges on the problem of a type II error, which is where one concludes, because of the small sample size, that there is no worthwhile difference between the groups when in fact there is. By definition, a pilot trial is small and not intended to be large enough to identify a meaningful difference between the treatment groups that can be statistically significant. Consequently, how would one undertake a meaningful sample size calculation? Nevertheless, some authors have made suggestions of sample sizes for pilot trials, with figures of 12 [5], 10 [6], and 15 [7] per group, 32 in total [8] for a two-arm trial, or 50% of the total main trial's sample size [9] (see Table 1 for fuller details). Justifications for these figures include precision around the mean and variance or having the power to show a large difference (1 SD) between groups if it were present.

In this article, we suggest an alternative approach, whereby the sample size calculation for the pilot trial is driven by the proposed sample size of the main trial. We suggest that we can use objective criteria for establishing a sample size for pilot studies by using a confidence interval (CI) approach [10] rather than the more usual power and statistical significance method.
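To illustrate why planning from a pilot estimate can underpower the main trial, the following sketch uses the yoga figures quoted above (0.98 SD from the pilot trials, about 0.5 SD in the main trial). It relies on the standard normal-approximation formula for a two-arm comparison of means, which is an assumption of this illustration and not the method used in the yoga trial; the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

_norm = NormalDist()  # standard normal distribution

def n_per_group(effect_size, power=0.80, alpha=0.05):
    """Approximate per-group size for a two-sided, two-arm comparison of means."""
    z_alpha = _norm.inv_cdf(1 - alpha / 2)
    z_beta = _norm.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

def achieved_power(effect_size, n, alpha=0.05):
    """Approximate power with n participants per group (normal approximation)."""
    z_alpha = _norm.inv_cdf(1 - alpha / 2)
    return _norm.cdf(effect_size * sqrt(n / 2) - z_alpha)

n_optimistic = n_per_group(0.98)   # planned from the pilot estimate
n_realistic = n_per_group(0.50)    # planned from the effect later observed
power_if_true_effect_is_half = achieved_power(0.50, n_optimistic)

print(n_optimistic, n_realistic, round(power_if_true_effect_is_half, 2))
```

Under these assumptions, a trial sized for 0.98 SD needs 17 participants per group, whereas 0.5 SD needs 63 per group; the smaller trial would have only about 31% power if the true effect were 0.5 SD.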
2. The role of CIs for informing more research
Most medical journals nowadays insist that reports of trials (and other quantitative studies) include CIs or their Bayesian counterparts, credibility intervals. A 95% interval, the most commonly used, gives an estimate of where the true, but unknown, clinical difference lies. For large trials, the uncertainty surrounding any estimate is relatively narrow, and we can be confident that the "true" clinical difference does not depart very much from the observed difference seen in the trial. The most important use of measures of uncertainty is to inform the need for further research. Very narrow intervals, in the context of a robust study, suggest that further replication of the study is not required, whereas wider intervals mean greater uncertainty and act as a pointer for further research. Usually, two-sided intervals are generated because, for most treatments, we are interested in whether the intervention is harmful or beneficial. We can use CIs or the Bayesian equivalent, credibility intervals, in pilot trials to inform the decision to go forward with the main study. Because uncertainty intervals (i.e., confidence or credibility) are used to inform the need for further research, we would argue that they should be produced in the analysis of pilot trials.
3. Aim and rationale for the use of the 80% one-sided CI
What we suggest is that we identify a sample size for our pilot trial that is large enough that, if the observed difference between the two groups in the pilot trial is zero, the upper confidence limit will exclude the estimate that is considered "clinically significant" in the planned definitive trial. If we are to use uncertainty estimates in the analysis of pilot trials, then it follows that we should calculate sample sizes for pilot studies to optimize their utility. This would give us a statistical basis for our sample size calculation, which will ensure that we have a sufficient sample size to aid our decision as to whether we should move forward into the main trial.

We want a sample size large enough to give us reasonable confidence that we are making the right decision in proceeding, or not, to a larger trial. However, we would not require too large a sample size as this increases the cost, the time taken to conduct the pilot, and the potential for more patients to be exposed to an ineffective treatment. These all negate some of the advantages of undertaking a pilot before our main trial. Consequently, rather than a 95% interval, we would use a lower confidence level. We propose an 80% interval, which will satisfy the need for reasonable certainty for trial decision making but would be small enough to deliver a study within a reasonable budget and timeframe, although some may feel more comfortable using a 90% interval.

Furthermore, we propose to use a one-sided CI as we are only interested in proceeding toward the main trial if there is some evidence of effectiveness. If the intervention appeared to be harmful, even if this were not statistically significant, we would not proceed.
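The cost of choosing a higher confidence level can be sketched numerically. The sketch assumes that the upper limit is proportional to a standard normal quantile divided by the square root of the sample size (an assumption of this illustration), so that the required pilot size scales with the square of the one-sided quantile. The helper name is hypothetical.

```python
from statistics import NormalDist

def relative_pilot_size(level, reference_level=0.80):
    """Pilot size at `level` relative to the size at `reference_level`.

    Assumes the one-sided upper limit is z * c / sqrt(N) for a constant c,
    so the required N is proportional to z squared.
    """
    z = NormalDist().inv_cdf(level)
    z_ref = NormalDist().inv_cdf(reference_level)
    return (z / z_ref) ** 2

for level in (0.80, 0.90, 0.95):
    print(f"{level:.0%} one-sided: {relative_pilot_size(level):.2f} x the 80% pilot size")
```

On this approximation, a 90% one-sided interval needs roughly 2.3 times as many participants as an 80% one, and a 95% one-sided interval nearly four times as many.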
Table 1. Summary of recommendations for sample size calculations for pilot trials

| Study | Recommended sample size | Justification for sample size | Internal/external pilot | Method of analysis and findings |
|---|---|---|---|---|
| Wittes and Brittain [9] | 50% of total sample size of main trial. | To confirm estimates of variance used to power main study. The study should be large enough to be confident in the variance estimates. | Internal | Estimate of variance. If larger than expected, increase sample for main trial, and if lower, retain original sample size. No estimate of treatment effect. |
| Birkett and Day [6] | 10 patients per group; 40 maximum. Larger sample sizes for larger main studies. | Extension to Wittes and Brittain but argued that good estimates of study variance can be obtained with a smaller sample size. | Internal | Estimate of variance. No estimate of treatment effect. |
| Browne [7] | Not clear. | Sample size "rule of thumb" of 30 is too small if one uses pilot sample estimate of variance. Need to be more conservative and use upper confidence intervals. | Internal/external | Calculate 80% and 90% confidence interval of standard deviation, and use this to plan main study's sample size. |
| Sandvik et al. (a) | Minimum of 20 patients in total. | Sample size depends on sample of original study that provided estimate of standard deviation. | Internal | Estimate of a new standard deviation. |
| Julious [5] | 12 patients per group. | Number is divisible by 2, 3, 4, and 6, which can be used as block sizes in restricted randomization. Further reductions in variance decline after 12. | Internal/external | Use point estimate of variance. |
| Sim and Lewis [13] | At least 55. | Smaller sample sizes are likely to underestimate the variance and lead to an underpowered main trial. | Internal/external | Use upper 95% one-sided confidence limit of pilot standard deviation. |
| Present study | At least 9% of main trial's sample size. | This allows a one-sided 80% confidence interval to exclude a clinically important difference. | Internal/external | Obtain treatment estimate; if above zero, proceed to main trial. No hypothesis tests. |

(a) Sandvik L, Erikssen J, Mowinckel P, Rodland EA. A method for determining size of internal pilot studies. Statistics in Medicine 1996; 1587-90.

4. Sample size estimation
In the following examples, CIs for the standardized effect size have been calculated using the CI inversion principle via SAS software (SAS Institute Inc., Cary, NC, USA) (NONCT function) [11]. Sample sizes for the main trial have been calculated using PS Power and Sample Size software (Vanderbilt University) [12]. Table 2 and Fig. 1 show the recommended pilot sample sizes for various standardized effect sizes (using one-sided 80% and 90% confidence limits). Table 3 indicates the sample size requirements for binary outcomes. Note that the sample size calculations here make no allowance for attrition.
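The calculation can be sketched without SAS for the special case used throughout this article, an observed difference of exactly zero. In that case, inverting the noncentral t distribution reduces to a standard normal quantile, because a noncentral t variable is below zero exactly when its normal numerator is. The sketch relies on that simplification and on two equal groups; the function names are hypothetical, and the code is an illustration rather than the authors' program.

```python
from math import ceil, sqrt
from statistics import NormalDist

def upper_limit_at_zero(total_n, level=0.80):
    """One-sided upper confidence limit for a standardized effect size
    when the observed difference is zero (two equal groups, total_n in all)."""
    z = NormalDist().inv_cdf(level)
    return z * 2 / sqrt(total_n)

def pilot_total_n(effect_size, level=0.80):
    """Smallest even total sample size whose upper limit falls below effect_size."""
    z = NormalDist().inv_cdf(level)
    n = ceil((2 * z / effect_size) ** 2)
    n += n % 2                      # round up to an even total for two equal arms
    while upper_limit_at_zero(n, level) >= effect_size:
        n += 2                      # guard against boundary and rounding cases
    return n

for d in (0.50, 0.30, 0.25):
    n80 = pilot_total_n(d, 0.80)
    print(d, n80, round(upper_limit_at_zero(n80, 0.80), 4))
```

For effect sizes of 0.50, 0.30, and 0.25, this gives totals of 12, 32, and 46 with upper limits of 0.4859, 0.2976, and 0.2482, in line with Table 2.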
5. Some worked examples
How might this work? Suppose we wanted to undertake a study in which we felt that a difference of 0.3 of a standard deviation between two groups was worthwhile. Such a study would require about 350 participants (assuming 80% power and a two-sided alpha of 5%) in the final analysis (Table 2). However, the funding agency recommends that the researchers undertake a pilot trial to test the recruitment rate and assess whether such a difference is likely to be realistic. Using a CI approach, we would calculate the pilot sample size required to produce an upper limit of a one-sided 80% CI that excludes 0.3, assuming that the treatment estimate from the pilot was zero or less. Thirty-two participants (approximately 9% of the main sample size) would be required to produce a one-sided 80% confidence limit that would exclude this estimate (i.e., upper 80% CI = 0.2976). Consequently, if we undertook such a pilot with that sample size and found an estimate larger than zero, and the pilot also showed that it was feasible to recruit and retain the participants, and so forth, then the recommendation would be to move forward with the main study.

In Table 2, we show some examples of sample size calculations for continuous outcome measures. An example of using this approach might be in the field of low back pain. The main outcome measure used in this area is the Roland and Morris disability questionnaire (RMDQ), which has a standard deviation of about 4 points and an average score of 8 [3]. Let us suppose that we want to evaluate an inexpensive intervention such that a modest difference of 1 point is considered worthwhile (i.e., a difference of a quarter of a standard deviation). To have 80% power to detect such a difference (alpha = 0.05), we would require 504 participants in the analysis. If we recruited, randomized, and analyzed 46 participants (i.e., 23 in each group), we could produce a one-sided 80% confidence limit that would exclude a 1-point difference on the RMDQ, if the point estimate from the pilot study were 0.

Similarly, let us suppose that we want to undertake a trial to reduce the proportion of older people who are at risk of falling from 50% down to 40%. To show this difference with 80% power (alpha = 0.05), we would need to randomize and analyze about 800 participants. However, if we wish to undertake a pilot, then recruiting, randomizing, and analyzing 72 participants, and assuming that 18 fell in each of the two groups (i.e., 50%), would produce a one-sided 80% confidence limit that excludes the 10 percentage point difference that the larger trial is designed to detect.

In terms of the analysis of the pilot study, we are only interested in whether the treatment estimate is larger or smaller than zero. Consequently, it is not necessary to formally undertake a hypothesis test of the results.

Table 2. Recommended pilot sample size for continuous outcome measures

| Standardized effect size for main trial | Sample size for main trial (a) | Pilot sample size (80% level) | Upper 80% one-sided confidence limit | Pilot sample size (90% level) | Upper 90% one-sided confidence limit |
|---|---|---|---|---|---|
| 0.50 | 128 | 12 | 0.4859 | 28 | 0.4844 |
| 0.45 | 158 | 14 | 0.4499 | 34 | 0.4397 |
| 0.40 | 198 | 18 | 0.3967 | 42 | 0.3955 |
| 0.35 | 258 | 24 | 0.3436 | 54 | 0.3488 |
| 0.30 | 344 | 32 | 0.2976 | 74 | 0.2980 |
| 0.25 | 504 | 46 | 0.2482 | 106 | 0.2490 |
| 0.20 | 786 | 72 | 0.1984 | 166 | 0.1989 |
| 0.15 | 1398 | 126 | 0.14995 | 292 | 0.14999 |
| 0.10 | 3142 | 284 | 0.0999 | 658 | 0.0999 |

NB: Sample sizes not corrected for attrition.
(a) Assuming 80% power and 5% significance level using t test.
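The falls example can be checked with a similar sketch for binary outcomes. It assumes that both pilot arms show the control proportion (a zero difference), equal group sizes, and a normal approximation to the difference in proportions. These assumptions reproduce the limit quoted in Table 3 for this case but are not confirmed as the authors' exact method; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def upper_limit_binary(total_n, control_prop, level=0.80):
    """One-sided upper limit for a difference in proportions when both
    arms of the pilot show control_prop (observed difference of zero)."""
    n_per_group = total_n / 2
    se = sqrt(2 * control_prop * (1 - control_prop) / n_per_group)
    return NormalDist().inv_cdf(level) * se

def pilot_total_n_binary(control_prop, difference, level=0.80):
    """Smallest even total whose upper limit is below the target difference."""
    n = 2
    while upper_limit_binary(n, control_prop, level) >= difference:
        n += 2
    return n

# Falls example: 50% at risk in the control arm, 10 percentage points sought.
n_pilot = pilot_total_n_binary(0.50, 0.10)
print(n_pilot, round(upper_limit_binary(n_pilot, 0.50), 4))
```

With a control proportion of 0.50 and a target difference of 0.10, the sketch returns 72 participants and an upper limit of 0.0992, matching the corresponding row of Table 3.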
6. Recommendations for pilot study sample size
There are alternative approaches to estimating the sample sizes for pilot studies, and some of these are summarized in Table 1. As Table 1 shows, most previous studies looking at sample size estimation hinge on trying to estimate an unknown variance. In contrast, our approach assumes that the variance is known; it is the likelihood of the main study finding a minimum clinically important effect size that drives the sample size in our approach. If the researchers are unsure of the variance, then they should consider using one of the other approaches listed in the table.

Using our methods, if we choose to use a one-sided 80% confidence limit, then we will need to recruit, retain, and analyze about 9% of the total sample size needed for the main trial (for continuous or dichotomous outcomes). Clearly, in some instances, larger pilots may be warranted. If we wanted to estimate the main study's standard deviation, for instance, we should probably seek a sample size of at least 50 [13]. Furthermore, recruiting only 9% of the total estimated sample size of the main trial may be insufficient to be sure that recruitment targets are achievable. Therefore, we see 9% of the main study's sample size as potentially the minimum size of a pilot rather than the maximum. Indeed, we would suggest that, as a minimum, at least 20 participants should be included in a pilot study as this seems to be the smallest number that is reasonable from statistical modeling studies (Table 1). For pilot studies that aim both to estimate the value of a parameter, such as a standard deviation, and to assess whether the main trial is worthwhile, we suggest using the larger of the two sample size estimates if they differ.
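The 9% figure can be motivated by a short derivation, sketched here under a normal approximation with two equal groups and a known standard deviation. These are assumptions of the sketch; the main trial sizes in Table 2 were obtained with a t test, so the tabulated ratios differ slightly. In the notation below, \delta is the standardized effect size sought in the main trial, N_pilot and N_main are total sample sizes, and z_p is the p-th quantile of the standard normal distribution.

```latex
\begin{align*}
\text{Pilot criterion (observed difference } 0\text{):}\quad
  & z_{0.80}\,\frac{2}{\sqrt{N_{\text{pilot}}}} < \delta
  \;\Longrightarrow\;
  N_{\text{pilot}} > \frac{4\,z_{0.80}^{2}}{\delta^{2}}, \\
\text{Main trial (80\% power, two-sided } \alpha = 0.05\text{):}\quad
  & N_{\text{main}} \approx \frac{4\,(z_{0.975} + z_{0.80})^{2}}{\delta^{2}}, \\
\text{Ratio:}\quad
  & \frac{N_{\text{pilot}}}{N_{\text{main}}}
    \approx \frac{z_{0.80}^{2}}{(z_{0.975} + z_{0.80})^{2}}
    = \frac{0.8416^{2}}{(1.9600 + 0.8416)^{2}}
    \approx 0.09 .
\end{align*}
```

The effect size cancels in the ratio, which is why the proportion is roughly constant across the rows of Table 2. Replacing the numerator quantile with z_{0.90} = 1.2816 gives a ratio of about 0.21 for a one-sided 90% limit.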
Fig. 1. Sample size requirements for one-sided 80% confidence interval to exclude required standardized effect size.

Table 3. Recommended pilot sample size for binary outcomes

| Control group proportion | Difference to be detected (%) | Sample size for main trial | Pilot sample size (80% level) | Upper one-sided 80% confidence limit | Pilot sample size (90% level) | Upper one-sided 90% confidence limit |
|---|---|---|---|---|---|---|
| 0.50 | 5 | 3,130 | 284 | 0.0499 | 658 | 0.0500 |
| 0.50 | 10 | 774 | 72 | 0.0992 | 166 | 0.0995 |
| 0.50 | 15 | 338 | 32 | 0.1488 | 74 | 0.1490 |
| 0.30 | 5 | 2,754 | 238 | 0.0500 | 552 | 0.0500 |
| 0.30 | 10 | 712 | 60 | 0.0996 | 138 | 0.09999 |
| 0.30 | 15 | 324 | 28 | 0.1458 | 62 | 0.1492 |
| 0.20 | 5 | 2,188 | 182 | 0.0499 | 422 | 0.0499 |
| 0.20 | 10 | 586 | 46 | 0.0993 | 106 | 0.0996 |
| 0.20 | 15 | 276 | 22 | 0.1435 | 48 | 0.1480 |

7. Discussion
In our experience, there is a current belief among some journal editors, researchers, and funders that it is not necessary or desirable to undertake sample size calculations for pilot trials. We disagree and have argued that formally undertaking an a priori sample size calculation for a pilot study will enhance its utility. We believe that using either our suggested approach or one advocated by other authors is better than not doing a sample size estimation for pilot trials.

It is not the case that if an appropriately sized pilot study showed a zero or negative effect size, this would automatically preclude the main trial going forward. It may be that the study, although planned as a pilot, actually behaved more like a feasibility study, in that during the study, the elements of the intervention were found to require change, which may have explained a lack of effect. Similarly, even if the pilot trial identified a positive effect, it may also have found that recruitment was simply too difficult to make it possible to successfully recruit to the main trial. Therefore, all the other reasons for doing a pilot trial remain. However, undertaking a formal sample size calculation will ensure that the pilot produces much more added value than is commonly the case.

Small trials on average appear to generate larger estimates of treatment effects than bigger trials, and if this phenomenon were true, then pilot trials using our approach would tend to recommend going ahead with the main trial more often than is optimum. However, this depends on why small trials seem to show larger effects. Large effects may be because of either publication bias (small positive trials are more likely to be published than small negative studies) or quality differences. In a meta-analysis of small and large trials, Kjaergaard et al. [14] found that the exaggerated effect sizes of small trials disappeared when the analysis adjusted for quality of the randomization. Thus, small, or pilot, trials will not exaggerate treatment effects if undertaken rigorously. Therefore, as long as pilot studies pay good attention to rigorous randomization procedures and other quality criteria, it is unlikely that they will overestimate the need for definitive trials.

In conclusion, we believe that it is helpful to undertake a priori sample size calculations for pilot trials and that they should be encouraged.
Acknowledgment
The authors would like to thank Julie Oates and Andrew Thorpe for assisting with the programming.
References
[1] Arain M, Campbell MJ, Cooper CL, Lancaster GA. What is a pilot or feasibility study? A review of current practice and editorial policy. BMC Med Res Methodol 2010;10:678.
[2] Cox H, Tilbrook H, Aplin J, Chuang LH, Hewitt C, Jayakody S, et al. A pragmatic multicentred randomised controlled trial of yoga for chronic low back pain: trial protocol. Complement Ther Clin Pract 2010;16:76-80.
[3] Tilbrook HE, Cox H, Hewitt CE, Kang’ombe AR, Chuang LH, Jayakody S, et al. Yoga for chronic low back pain. A randomized trial. Ann Intern Med 2011;155:569-78.
[4] Thabane L, Ma J, Chu R, Cheng J, Ismaila A, Rios LP, et al. A tutorial on pilot studies: the what, why and how. BMC Med Res Methodol 2010;10:1.
[5] Julious SA. Sample size of 12 per group rule of thumb for a pilot study. Pharm Stat 2005;4:287-91.
[6] Birkett MA, Day SJ. Internal pilot studies for estimating sample size. Stat Med 1994;13:2455-63.
[7] Browne RH. On the use of a pilot sample for sample size determination. Stat Med 1995;14:1933-40.
[8] Torgerson DJ, Torgerson CJ. Designing randomised trials in health, education and the social sciences. Basingstoke, UK: Palgrave Macmillan; 2008.
[9] Wittes J, Brittain E. The role of internal pilot studies in increasing the efficiency of clinical trials. Stat Med 1990;9:65-72.
[10] Bland JM. The tyranny of power: is there a better way to calculate sample size? BMJ 2009;339:b3985.
[11] Smithson M. Confidence intervals. Thousand Oaks, CA: Sage Publications, Inc; 2003.
[12] Dupont WD, Plummer WD. Power and sample size calculations: a review and computer program. Control Clin Trials 1990;11:116-28.
[13] Sim J, Lewis M. The size of a pilot study for a clinical trial should be calculated in relation to considerations of precision and efficiency. J Clin Epidemiol 2012;65:301-8.
[14] Kjaergaard LL, Villumsen J, Gluud C. Reported methodologic quality and discrepancies between large and small randomized trials in meta-analyses. Ann Intern Med 2001;135:982-9.