Journal of Clinical Epidemiology 66 (2013) 197-201

Sample size calculations for pilot randomized trials: a confidence interval approach

Kim Cocks, David J. Torgerson *

York Trials Unit, Department of Health Sciences, University of York, Heslington, York YO10 5DD, UK

Accepted 5 September 2012; Published online 27 November 2012

Abstract

Objectives: To describe a method using confidence intervals (CIs) to estimate the sample size for a pilot randomized trial. Study Design: Using one-sided CIs and the estimated effect size that would be sought in a large trial, we calculated the sample size needed for pilot trials. Results: Using an 80% one-sided CI, we estimated that a pilot trial should have at least 9% of the sample size of the main planned trial. Conclusion: Using the estimated effect size for the main trial together with a one-sided CI allows us to calculate a sample size for a pilot trial, which will make its results more useful than at present. © 2013 Elsevier Inc. All rights reserved.

Keywords: Sample size; Pilot trials; Confidence intervals; Statistical power; Review; Randomised trials

1. Background

Randomized controlled trials (RCTs) are often complex, time consuming, and expensive. Ideally, before a large RCT is undertaken, a pilot or feasibility study that informs the design of the main trial should be conducted. It is useful, at this stage, to distinguish between a pilot trial and a feasibility study. A pilot trial replicates, in miniature, a planned larger study [1], whereas a feasibility study may help in the development of the intervention and/or outcome measures. Consequently, in the definitive trial, the intervention may be quite different, and the outcomes may have changed. In this article, we discuss only pilot trials: studies that mimic, in all the major essentials, the future definitive trial. Our arguments apply to both pilot trials run before the main study (external pilot trials) and those run as the first stage of the main trial (internal pilot studies).

Often, researchers justify a pilot study to help with the calculation of the sample size for the main trial. Estimates of treatment effects and their variance from pilot studies may be used to generate possible sample size requirements, but there is a problem with this approach. Effect sizes from any small trial will be surrounded by a high degree of uncertainty. Consequently, if one were to plan a definitive trial's sample size based on the estimate obtained from a small pilot, then it is likely that the main trial will be underpowered, as many published small trials overestimate treatment effects. For example, in the design of a trial of yoga for low back pain, three small previously published pilot trials returned an average difference in back pain scores of 0.98 of a standard deviation [2]. However, this difference was not used to plan the main trial as the researchers judged it to be unexpectedly large. Indeed, the main trial showed a much smaller difference of around 0.5 of a standard deviation [3]. Therefore, although pilot trials are useful in many areas of trial design, their use for determining the sample size of the main trial should be approached with caution, and the clinical relevance must also be considered.

Often, the difference to be detected in the main trial can be reasonably informed by known clinical or economic importance; however, a pilot study may still be desirable to assess whether such a difference is likely. We propose that if careful attention is paid to the sample size calculation in these pilot trials, they could be much more informative, resulting in cost savings and more efficient use of patients in trials.

Sample size calculations for pilot trials are sometimes not undertaken. Indeed, many journal editors publishing pilot trials either do not expect them or suggest that they should not be done [1,4]. In our experience, this view is widely held, but we will argue that it is mistaken.

The argument for not undertaking a sample size calculation for pilot trials hinges on the problem of a type II error, which is where one concludes, because of the small sample size, that there is no worthwhile difference between the groups when in fact there is. By definition, a pilot trial is small and not intended to be large enough to identify a meaningful difference between the treatment groups that can be statistically significant. Consequently, how would one undertake a meaningful sample size calculation? Nevertheless, some authors have suggested sample sizes for pilot trials, with figures of 12 [5], 10 [6], and 15 [7] per group, 32 in total [8] for a two-arm trial, or 50% of the total main trial's sample size [9] (see Table 1 for fuller details). Justifications for these figures include precision around the mean and variance or having the power to show a large difference (1 SD) between groups if it were present.

In this article, we suggest an alternative approach, whereby the sample size calculation for the pilot trial is driven by the proposed sample size of the main trial. We suggest that we can use objective criteria for establishing a sample size for pilot studies by using a confidence interval (CI) approach [10] rather than the more usual power and statistical significance method.

What is new?

Many randomized controlled pilot trials do not have an a priori sample size calculation. In this article, we argue that sample size calculations are beneficial.

In this study, we suggest a novel approach of using the anticipated main study to inform the pilot trial's sample size using a confidence interval approach.

We argue that using a sample size such that it gives a one-sided 80% confidence interval which excludes the minimum important clinical effect size for the main study enhances a pilot study's utility.

Conflict of interest statement: We confirm that we have no conflict of interest. Competing interests: Both authors have no competing interests.
* Corresponding author. E-mail address: djt6@york.ac.uk (D.J. Torgerson).
0895-4356/$ - see front matter © 2013 Elsevier Inc. All rights reserved.

2. The role of CIs for informing more research

Most medical journals nowadays insist that reports of trials (and other quantitative studies) include CIs or their Bayesian counterparts, credibility intervals. A 95% interval, the most commonly used, gives an estimate of where the true, but unknown, clinical difference lies. For large trials, the uncertainty surrounding any estimate is relatively narrow, and we can be confident that the "true" clinical difference does not depart very much from the observed difference seen in the trial. The most important use of measures of uncertainty is to inform the need for further research. Very narrow intervals, in the context of a robust study, suggest that further replication of the study is not required, whereas wider intervals mean greater uncertainty and act as a pointer for further research. Usually, two-sided intervals are generated because, for most treatments, we are interested in whether the intervention is harmful or beneficial. We can use CIs or the Bayesian equivalent, credibility intervals, in pilot trials to inform the decision to go forward with the main study. Because uncertainty intervals (i.e., confidence or credibility) are used to inform the need for further research, we would argue that they should be produced in the analysis of pilot trials.

3. Aim and rationale for the use of the 80% one-sided CI

What we suggest is that we identify a sample size for our pilot trial that is large enough such that, if the observed difference between the two groups in the pilot trial is zero, the upper confidence limit will exclude the estimate that is considered "clinically significant" in the planned definitive trial. If we are to use uncertainty estimates in the analysis of pilot trials, then it follows that we should calculate sample sizes for pilot studies to optimize their utility. This would give us a statistical basis for our sample size calculation, which will ensure that we have a sufficient sample size to aid our decision as to whether we should move forward into the main trial.

We want to identify a sample size that gives us reasonable confidence that we are making the right decision in proceeding to a larger trial or not. However, we would not require too large a sample size, as this increases the cost, the time taken to conduct the pilot, and the potential for more patients to be exposed to an ineffective treatment. These all negate some of the advantages of undertaking a pilot before our main trial. Consequently, we would not use a 95% interval; rather, we would use a smaller interval. We propose an 80% interval, which will satisfy the need for reasonable certainty for trial decision making but would be small enough to deliver a study within a reasonable budget and timeframe, although some may feel more comfortable using a 90% interval. Furthermore, we propose to use a one-sided CI as we are only interested in proceeding toward the main trial if there is some evidence of effectiveness. If the intervention appeared to be harmful, even if this were not statistically significant, we would not proceed.
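The criterion can be written down explicitly. As a sketch, assume a two-arm trial with $n$ participants per group, a standardized effect size $\delta$ sought in the main trial, and a normal approximation to the sampling distribution of the observed standardized difference $\hat{d}$. The normal approximation and the symbols are assumptions introduced here for illustration; they are not the authors' stated computation. With an observed difference of zero, the upper limit $U$ of a one-sided $100(1-\alpha)\%$ CI must fall below $\delta$:

$$
U = \hat{d} + z_{1-\alpha}\sqrt{\frac{2}{n}}, \qquad \hat{d} = 0
\;\Longrightarrow\;
z_{1-\alpha}\sqrt{\frac{2}{n}} < \delta
\;\Longleftrightarrow\;
n > \frac{2\,z_{1-\alpha}^{2}}{\delta^{2}}
$$

For an 80% one-sided interval, $z_{0.80} \approx 0.8416$, so with $\delta = 0.3$ the condition gives $n > 15.7$, that is, 16 per group or 32 in total.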

4. Sample size estimation

In the following examples, CIs for the standardized effect size have been calculated using the inversion CI principle via SAS software (SAS Institute Inc., Cary, NC, USA) (NONCT function) [11]. Sample sizes for the main trial have been calculated using PS Power and Sample Size software (Vanderbilt University) [12]. Table 2 and Fig. 1 show the recommended pilot sample sizes for various standardized effect sizes (using one-sided 80% and 90% confidence limits). Table 3 indicates the sample size requirements for binary outcomes. Note that the sample size calculations here make no allowance for attrition.

Table 1. Summary of recommendations for sample size calculations for pilot trials

| Study | Recommended sample size | Justification for sample size | Internal/external pilot | Method of analysis and findings |
|---|---|---|---|---|
| Wittes and Brittain [9] | 50% of total sample size of main trial. | To confirm estimates of variance used to power main study. The study should be large enough to be confident in the variance estimates. | Internal | Estimate of variance. If larger than expected, increase sample for main trial, and if lower, retain original sample size. No estimate of treatment effect. |
| Birkett and Day [6] | 10 patients per group, 40 maximum. Larger sample sizes for larger main studies. | Extension to Wittes and Brittain but argued that good estimates of study variance can be obtained with a smaller sample size. | Internal | Estimate of variance. No estimate of treatment effect. |
| Browne [7] | Not clear. | Sample size "rule of thumb" of 30 is too small if one uses pilot sample estimate of variance. Need to be more conservative and use upper confidence intervals. | Internal/external | Calculate 80% and 90% confidence interval of standard deviation, and use this to plan main study's sample size. |
| Sandvik et al. (a) | Minimum of 20 patients in total. | Sample size depends on sample of original study that provided estimate of standard deviation. | Internal | Estimate of a new standard deviation. |
| Julious [5] | 12 patients per group. | Number is divisible by 2, 3, 4, and 6, which can be used as block sizes in restricted randomization. Further reductions in variance decline after 12. | Internal/external | Use point estimate of variance. |
| Sim and Lewis [13] | At least 55. | Smaller sample sizes are likely to underestimate the variance and lead to an underpowered main trial. | Internal/external | Use upper 95% one-sided confidence limit of pilot standard deviation. |
| Present study | At least 9% of main trial's sample size. | This allows a one-sided 80% confidence interval to exclude a clinically important difference. | Internal/external | Obtain treatment estimate; if above zero, proceed to main trial. No hypothesis tests. |

(a) Sandvik L, Erikssen J, Mowinckel P, Rodland EA. A method for determining size of internal pilot studies. Statistics in Medicine 1996; 1587-90.
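As a rough cross-check of the continuous-outcome calculation, the following sketch finds the smallest two-arm pilot whose upper one-sided confidence limit, at an observed standardized difference of zero, falls below the target effect size. It uses a normal approximation rather than the noncentral t inversion described above, and the function name `pilot_size_continuous` is hypothetical, so the results should be read as an approximation and not as the authors' SAS computation.

```python
from math import floor, sqrt
from statistics import NormalDist


def pilot_size_continuous(effect_size, level=0.80):
    """Return (total pilot size, upper one-sided confidence limit).

    Assumes two equal arms, an observed standardized difference of
    zero, and a normal approximation (an assumption of this sketch).
    """
    z = NormalDist().inv_cdf(level)
    # Smallest per-group n strictly greater than 2 * (z / effect_size)^2,
    # so that the upper limit lies strictly below effect_size.
    n_per_group = floor(2 * (z / effect_size) ** 2) + 1
    upper_limit = z * sqrt(2 / n_per_group)
    return 2 * n_per_group, upper_limit


if __name__ == "__main__":
    for d in (0.50, 0.30, 0.25):
        total, upper = pilot_size_continuous(d)
        print(f"effect size {d:.2f}: pilot n = {total}, upper limit = {upper:.4f}")
```

For effect sizes of 0.50, 0.30, and 0.25 at the 80% level, this approximation gives totals of 12, 32, and 46, in line with the corresponding rows of Table 2.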

5. Some worked examples

How might this work? Suppose we wanted to undertake a study in which we felt that a difference of 0.3 of a standard deviation between two groups was worthwhile. Such a study would require about 350 participants (assuming 80% power and a two-sided alpha of 5%) in the final analysis (Table 2). However, the funding agency recommends that the researchers undertake a pilot trial to test the recruitment rate and assess whether such a difference is likely to be realistic. Using a CI approach, we would calculate the pilot sample size required to produce an upper limit of a one-sided 80% CI that excludes 0.3, assuming that the treatment estimate from the pilot was zero or less. Thirty-two participants (approximately 9% of the main sample size) would be required to produce a one-sided 80% confidence limit that would exclude this estimate (i.e., upper 80% CI = 0.2976). Consequently, if we undertook such a pilot with that sample size and found an estimate larger than zero, and the pilot also showed that it was feasible to recruit and retain the participants, and so forth, then the recommendation would be to move forward with the main study.

In Table 2, we show some examples of sample size calculations for continuous outcome measures. An example of using this approach might be in the field of low back pain. The main outcome measure used in this area is the Roland and Morris disability questionnaire (RMDQ), which has a standard deviation of about 4 points and an average score of 8 [3]. Let us suppose that we want to evaluate an inexpensive intervention such that a modest difference of 1 point is considered worthwhile (i.e., a difference of a quarter of a standard deviation). To have 80% power to detect such a difference (alpha = 0.05), we would require 504 participants in the analysis. If we recruited, randomized, and analyzed 46 participants (i.e., 23 in each group), we could produce a one-sided 80% confidence limit that would exclude a 1 point difference on the RMDQ, if the point estimate from the pilot study were 0.

Similarly, let us suppose that we want to undertake a trial to reduce the proportion of older people who are at risk of falling from 50% down to 40%. To show this difference with 80% power (alpha = 0.05), we would need to randomize and analyze about 800 participants. However, if we wish to undertake a pilot, then recruiting, randomizing, and analyzing 72 participants and assuming that 18 fell in each of the two groups (i.e., 50%) would produce a one-sided 80% confidence limit that would exclude the 10 percentage point difference that the larger trial would be designed to detect as statistically significant.

In terms of the analysis of the pilot study, we are only interested in whether the treatment estimate is larger or smaller than zero. Consequently, it is not necessary to formally undertake a hypothesis test of the results.

Table 2. Recommended pilot sample size for continuous outcome measures

| Standardized effect size for main trial | Sample size for main trial (a) | Pilot sample size (80% level) | Upper 80% one-sided confidence limit | Pilot sample size (90% level) | Upper 90% one-sided confidence limit |
|---|---|---|---|---|---|
| 0.50 | 128 | 12 | 0.4859 | 28 | 0.4844 |
| 0.45 | 158 | 14 | 0.4499 | 34 | 0.4397 |
| 0.40 | 198 | 18 | 0.3967 | 42 | 0.3955 |
| 0.35 | 258 | 24 | 0.3436 | 54 | 0.3488 |
| 0.30 | 344 | 32 | 0.2976 | 74 | 0.2980 |
| 0.25 | 504 | 46 | 0.2482 | 106 | 0.2490 |
| 0.20 | 786 | 72 | 0.1984 | 166 | 0.1989 |
| 0.15 | 1398 | 126 | 0.14995 | 292 | 0.14999 |
| 0.10 | 3142 | 284 | 0.0999 | 658 | 0.0999 |

NB: Sample sizes not corrected for attrition.
(a) Assuming 80% power and 5% significance level using t-test.
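The binary-outcome case can be sketched in the same way. The example below assumes that both pilot arms show the control-group proportion (so the observed difference is zero) and uses a simple normal-approximation (Wald) limit for the difference in proportions. Both the interval type and the function name `pilot_size_binary` are assumptions of this sketch, since the article does not state which interval was used for binary outcomes.

```python
from math import floor, sqrt
from statistics import NormalDist


def pilot_size_binary(control_proportion, difference, level=0.80):
    """Return (total pilot size, upper one-sided confidence limit).

    Assumes two equal arms that both show control_proportion, and a
    Wald-type limit for the difference in proportions (an assumption
    of this sketch).
    """
    z = NormalDist().inv_cdf(level)
    # Variance of the difference in proportions is variance_factor / n.
    variance_factor = 2 * control_proportion * (1 - control_proportion)
    # Smallest per-group n that puts the upper limit strictly below difference.
    n_per_group = floor(variance_factor * (z / difference) ** 2) + 1
    upper_limit = z * sqrt(variance_factor / n_per_group)
    return 2 * n_per_group, upper_limit


if __name__ == "__main__":
    total, upper = pilot_size_binary(0.50, 0.10)
    print(f"falls example: pilot n = {total}, upper limit = {upper:.4f}")
```

For the falls example (control proportion 0.50, difference 0.10), this gives a pilot of 72 participants, the figure quoted in the text.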

6. Recommendations for pilot study sample size

There are alternative approaches to estimating the sample sizes for pilot studies, and some of these are summarized in Table 1. As Table 1 shows, most previous work on sample size estimation hinges on trying to estimate an unknown variance. In contrast, our approach assumes that the variance is known; it is the likelihood of the main study finding a minimum clinically important effect size that drives the sample size in our approach. If the researchers are unsure of the variance, then they should consider using one of the other approaches listed in the table.

Using our methods, if we choose to use a one-sided 80% confidence limit, then we will need to recruit, retain, and analyze about 9% of the total sample size needed for the main trial (for continuous or dichotomous outcomes). Clearly, in some instances, larger pilots may be warranted. If we wanted to estimate the main study's standard deviation, for instance, we should probably seek a sample size of at least 50 [13]. Furthermore, recruiting only 9% of the total estimated sample size of the main trial may be insufficient to be sure that recruitment targets are achievable. Therefore, we would see 9% of the main study's sample size as the minimum size of a pilot rather than the maximum. Indeed, we would suggest that, as a minimum, at least 20 participants should be included in a pilot study, as this seems to be the smallest number that is reasonable from statistical modeling studies (Table 1). For pilot studies that aim both to estimate the value of a parameter, such as a standard deviation, and to assess whether the main trial is worthwhile, we suggest using the larger of the resulting sample size estimates, if these differ.
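One way to see where the figure of about 9% comes from is to compare normal-approximation formulas for the two sample sizes. Under that approximation (an assumption of this sketch, as is the function name `pilot_fraction`), the pilot needs roughly 2z_CI²/δ² per group and the main trial roughly 2(z_alpha + z_power)²/δ² per group, so the effect size cancels and the ratio depends only on the chosen confidence level, power, and alpha.

```python
from statistics import NormalDist


def pilot_fraction(level=0.80, power=0.80, alpha=0.05):
    """Approximate pilot size as a fraction of the main trial size.

    Uses normal-approximation sample size formulas for a two-arm trial
    with a continuous outcome and ignores rounding to whole participants.
    """
    nd = NormalDist()
    z_ci = nd.inv_cdf(level)             # one-sided confidence level of the pilot
    z_alpha = nd.inv_cdf(1 - alpha / 2)  # two-sided significance level of the main trial
    z_power = nd.inv_cdf(power)          # power of the main trial
    return (z_ci / (z_alpha + z_power)) ** 2


if __name__ == "__main__":
    print(f"80% one-sided CI: {pilot_fraction(0.80):.1%} of the main trial")
    print(f"90% one-sided CI: {pilot_fraction(0.90):.1%} of the main trial")
```

With 80% power and a two-sided alpha of 5% for the main trial, the fraction is about 9% for an 80% one-sided interval and about 21% for a 90% interval, which is broadly consistent with the pilot and main trial sizes in Table 2.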

7. Discussion

In our experience, there is a current belief among some journal editors, researchers, and funders that it is not necessary or desirable to undertake sample size calculations for pilot trials. We disagree and have argued that formally undertaking an a priori sample size calculation for a pilot study will enhance its utility. Whether one uses our suggested approach or one advocated by other authors, we believe that either is better than not doing a sample size estimation for pilot trials.

It is not the case that if an appropriately sized pilot study showed a zero or negative effect size, this would automatically preclude the main trial going forward. It may be that the study, although planned as a pilot, actually behaved more like a feasibility study, in that during the study, the elements of the intervention were found to require change, which may have explained a lack of effect. Similarly, even if the pilot trial identified a positive effect, it may also have found that recruitment was simply too difficult to make it possible to successfully recruit to the main trial. Therefore, all the other reasons for doing a pilot trial remain. However, undertaking a formal sample size calculation will ensure that the pilot produces much more added value than is commonly the case.

Small trials on average appear to generate larger estimates of treatment effects than bigger trials, and if this phenomenon is real, then pilot trials using our approach will tend to recommend going ahead with the main trial more often than is optimal. However, this depends on why small trials seem to show larger effects. Large effects may be because of either publication bias (small positive trials are more likely to be published than small negative studies) or quality differences. In a meta-analysis of small and large trials, Kjaergaard et al. [14] found that the exaggerated effect sizes of small trials disappeared when the analysis adjusted for quality of the randomization. Thus, small, or pilot, trials will not exaggerate treatment effects if undertaken rigorously. Therefore, as long as pilot studies pay good attention to rigorous randomization procedures and other quality criteria, it is unlikely that they will overestimate the need for definitive trials.

In conclusion, we believe that it is helpful to undertake a priori sample size calculations for pilot trials and that they should be encouraged.

Fig. 1. Sample size requirements for one-sided 80% confidence interval to exclude required standardized effect size.

Table 3. Recommended pilot sample size for binary outcomes

| Control group proportion | Difference to be detected (%) | Sample size for main trial | Pilot sample size (80% level) | Upper one-sided 80% confidence limit | Pilot sample size (90% level) | Upper one-sided 90% confidence limit |
|---|---|---|---|---|---|---|
| 0.50 | 5 | 3,130 | 284 | 0.0499 | 658 | 0.0500 |
| 0.50 | 10 | 774 | 72 | 0.0992 | 166 | 0.0995 |
| 0.50 | 15 | 338 | 32 | 0.1488 | 74 | 0.1490 |
| 0.30 | 5 | 2,754 | 238 | 0.0500 | 552 | 0.0500 |
| 0.30 | 10 | 712 | 60 | 0.0996 | 138 | 0.09999 |
| 0.30 | 15 | 324 | 28 | 0.1458 | 62 | 0.1492 |
| 0.20 | 5 | 2,188 | 182 | 0.0499 | 422 | 0.0499 |
| 0.20 | 10 | 586 | 46 | 0.0993 | 106 | 0.0996 |
| 0.20 | 15 | 276 | 22 | 0.1435 | 48 | 0.1480 |

Acknowledgment

The authors would like to thank Julie Oates and Andrew Thorpe for assisting with the programming.

References

[1] Arain M, Campbell MJ, Cooper CL, Lancaster GA. What is a pilot or feasibility study? A review of current practice and editorial policy. BMC Med Res Methodol 2010;10:678.

[2] Cox H, Tilbrook H, Aplin J, Chuang LH, Hewitt C, Jayakody S, et al. A pragmatic multi-centred randomised controlled trial of yoga for chronic low back pain: trial protocol. Complement Ther Clin Pract 2010;16:76-80.

[3] Tilbrook HE, Cox H, Hewitt CE, Kang'ombe AR, Chuang LH, Jayakody S, et al. Yoga for chronic low back pain. A randomized trial. Ann Intern Med 2011;155:569-78.

[4] Thabane L, Ma J, Chu R, Cheng J, Ismaila A, Rios LP, et al. A tutorial on pilot studies: the what, why and how. BMC Med Res Methodol 2010;10:1.

[5] Julious SA. Sample size of 12 per group rule of thumb for a pilot study. Pharm Stat 2005;4:287-91.

[6] Birkett MA, Day SJ. Internal pilot studies for estimating sample size. Stat Med 1994;13:2455-63.

[7] Browne RH. On the use of a pilot sample for sample size determination. Stat Med 1995;14:1933-40.

[8] Torgerson DJ, Torgerson CJ. Designing randomised trials in health, education and the social sciences. Basingstoke, UK: Palgrave Macmillan; 2008.

[9] Wittes J, Brittain E. The role of internal pilot studies in increasing the efficiency of clinical trials. Stat Med 1990;9:65-72.

[10] Bland JM. The tyranny of power: is there a better way to calculate sample size? BMJ 2009;339:b3985.

[11] Smithson M. Confidence intervals. Thousand Oaks, CA: Sage Publications, Inc; 2003.

[12] Dupont WD, Plummer WD. Power and sample size calculations: a review and computer program. Control Clin Trials 1990;11:116-28.

[13] Sim J, Lewis M. The size of a pilot study for a clinical trial should be calculated in relation to considerations of precision and efficiency. J Clin Epidemiol 2012;65:301-8.

[14] Kjaergaard LL, Villumsen J, Gluud C. Reported methodologic quality and discrepancies between large and small randomized trials in meta-analyses. Ann Intern Med 2001;135:982-9.