
SAMPLING                                                NTI Bulletin 2006, 42/3&4, 55-62

Sample size determination in health studies

VK Chadha*

Summary

One of the most important factors to consider in the design of an intervention trial is the choice of an appropriate study size. Studies that are too small may fail to detect important effects on the outcomes of interest, or may estimate those effects too imprecisely. Studies that are larger than necessary are a waste of resources. Statistical methods are available for estimation of an appropriate sample size depending upon the type of outcome measure, expected disease rates or size of effects, study design and the requirements of confidence interval/precision or power. These concepts, along with the methods of estimating sample size in varied situations, are presented in this article.

Key words: Sample size, proportion, mean, confidence interval, precision, null hypothesis, power, Type I and Type II errors.

Introduction

One of the most important factors to consider in the design of an intervention trial is the choice of an appropriate study size. Studies that are too small may fail to detect important effects on the outcomes of interest, or may estimate those effects too imprecisely. Studies that are larger than necessary are a waste of resources.

Before calculating sample size, one has to decide on the following:

- Study design
- Type of outcome measure
- Guess at the likely result
- Required level of significance
- Required precision / power

The procedures for sample size estimation provide a rough estimate of the required study size, as they are often based on approximate estimates of expected disease rates and subjective decisions about the size of effects. However, a rough estimate of the necessary size of a study is generally all that is needed.

One should be familiar with the following concepts before embarking on the estimation of sample size.

Types of outcome measures

The statistical methods for sample size determination depend on which type of outcome is expected. The three most common types of outcomes in surveys, studies and trials are:

i. Proportions: For example, in a trial of a new measles vaccine, an outcome measure of interest may be the proportion of vaccinated subjects who develop high levels of antibodies.

ii. Means: For example, in a trial of an anti-malarial intervention, it may be of interest to compare the mean packed cell volume (PCV) at the end of the malaria season among those in the intervention group and those in the comparison group.

iii. Rates: For example, in a trial of multi-drug therapy for leprosy, the incidence rates of relapse following treatment may be compared in the different study groups under consideration.

* Sr. Epidemiologist, National Tuberculosis Institute, Bangalore

Whatever the type of outcome measure, only rough estimates of the proportions, means or rates are required for estimating sample size.

Sampling error

An estimate of an outcome measure calculated in an intervention study is subject to sampling error, because it is based on a sample of individuals and not on the whole population of interest. The term does not mean that the sampling procedure was applied incorrectly, but that when sampling is used to decide which individuals are in which group, there will be an element of random variation in the results. Sampling error is reduced when the study size is increased, and vice versa.

Confidence interval

The methods of statistical inference allow the investigator to draw conclusions about the true value of the outcome measure on the basis of the information in the sample. In general, the observed value of the outcome measure gives the best estimate of the true value. In addition, it is useful to have some indication of the precision of this estimate, and this is done by attaching a confidence interval to the estimate. The confidence interval is a range of plausible values for the true value of the outcome measure. It is conventional to quote the 95 percent confidence interval (also called 95 percent confidence limits). This is calculated in such a way that there is a 95 percent probability that it includes the true value.

If the outcome measure is a proportion estimated from the sample data as p̂, the 95 percent confidence interval is p̂ ± 1.96 × SE, where SE denotes the standard error of the estimate. Similarly, if the outcome measure is a mean, 95 percent of the means derived from different samples are expected to fall within 1.96 standard errors of the true mean.

One of the factors influencing the width of the confidence interval is the sample size. The larger the sample size, the narrower the confidence interval.

The multiplying factor 1.96 is used when calculating the 95 percent confidence interval. In some circumstances, confidence intervals other than 95 percent limits may be required, in which case the values of the multiplying factor are as under:

Confidence interval (%)    Multiplying factor
90                         1.64
95                         1.96
99                         2.58
99.9                       3.29

Confidence intervals and their corresponding multiplying factors, based on the Normal distribution.

Precision of effect measures - The narrower the confidence interval, the greater the precision of the estimate.

Significance tests & P value

In some instances, before calculating a confidence interval to indicate a range of plausible values of the outcome measure of interest, it may be appropriate to test a specific hypothesis about the outcome measure. In the context of an intervention trial, this will often be the hypothesis that there is no true difference between the outcomes in the groups under comparison - the null hypothesis. The objective is thus to assess whether any observed difference in outcomes between the study groups may have occurred just by chance due to sampling error. The methods for testing the null hypothesis are known as significance tests. The sample data are used to calculate a quantity (called a statistic) which gives a measure of the difference between the groups with respect to the outcome(s) of interest. Once the statistic has been calculated,
its value is referred to an appropriate set of statistical tables, in order to determine the p-value (probability value) or 'significance' of the results.

For example, suppose a difference in mean PCV of 1.5 percent is observed at the end of the malaria season between two groups of individuals, one of which was supplied with mosquito-nets. A p-value of 0.03 would indicate that, if nets had no true effect on PCV levels (i.e. if the null hypothesis were true), there would be only a 3 percent chance of obtaining an observed difference of 1.5 percent or greater.

The smaller the p-value, the less plausible the null hypothesis seems as an explanation of the observed data. For example, a p-value of 0.001 means that the null hypothesis is highly implausible, and this can be interpreted as very strong evidence of a real difference between the groups. On the other hand, a p-value of 0.20 means that a difference of the observed magnitude could quite easily have occurred by chance, even if there was no real difference between the groups. Conventionally, p-values of 0.05 and below have been regarded as sufficiently low to be taken as reasonable evidence against the null hypothesis, and have been referred to as indicating a 'statistically significant difference'.

While a small p-value can be interpreted as evidence for a real difference between the groups, a larger 'non-significant' p-value must not be interpreted as indicating that there is no difference. It merely indicates that there is insufficient evidence to reject the null hypothesis; a true difference between the groups may still exist.

Power of study

The concept of power comes into play when the focus of the study is to find out whether a significant difference exists between the two groups. Because of the variations resulting from sampling error, we cannot always be certain of obtaining a significant result of a study, even if there is a real difference. It is necessary to consider the probability of obtaining a statistically significant result in a trial, and this probability is called the power of the study. In other words, if a true difference exists, the power of the study indicates the probability of finding a statistically significant difference between the two groups. Thus a power of 80 percent to detect a difference of a specified size means that if the study were to be conducted repeatedly, a statistically significant result would be obtained four times out of five if the true difference was really of the specified size. When designing a study, the objective is to ensure that the study size is large enough to give high power if the true effect of the intervention is large enough to be of practical importance.

The power of a study depends on:

1. The value of the true difference between the study groups; in other words, the true effect of the intervention (effect size). The greater the effect, the higher the power to detect the effect as statistically significant for a study of a given size.

2. The study size: the larger the study size, the higher the power.

3. The probability level at which a difference will be regarded as 'statistically significant'.

The power also depends on whether a one-sided or two-sided significance test is to be performed and on the underlying variability of the data.

One sided and two sided tests

If it is accepted that the null hypothesis is false, that means the alternate hypothesis is true. For example, if the claim is about the superiority of a new drug, this is a one-sided alternative. If the claim is not of superiority or inferiority but only that the treatments are different, the alternate hypothesis is two-sided. This means that when
the p-value is computed, it measures the probability (if the null hypothesis is true) of observing a difference as great as that actually observed in either direction (i.e. positive or negative). It is usual to assume that tests are two-sided.

Wrongly rejecting a true null hypothesis is called a type I error. The probability of this error is referred to as the P value, as already discussed. The maximum P value allowed in a problem is called the level of significance (α). In a diagnostic set-up, this is the probability of declaring a person sick when he is actually not. In a clinical trial set-up, it is the probability that the drug is declared effective or better when it is actually not.

Diagnosis            Disease actually present
                     No                 Yes
Disease present      Mis-diagnosis      Correct
Disease absent       Correct            Missed diagnosis

In a court set-up, this corresponds to convicting an innocent person.

Judgment                 Assumption of innocence
                         True               False
Pronounced guilty        Serious error      Correct
Pronounced not guilty    Correct            Error

In a clinical trial set-up, the P value is the probability that the drug is declared effective or better when it is actually not.

Statistical decision     Null hypothesis
                         True               False
Rejected                 Type I error       Correct
Not rejected             Correct            Type II error

This wrong conclusion can allow an ineffective drug to be marketed as being effective. In a court set-up, it corresponds to convicting an innocent person. This clearly is unacceptable and needs to be guarded against. For this reason, the P value is kept at a low level (<5%). When the P value is small, it is safe to conclude that the groups are different. This threshold, 0.05, is the level of significance.

The second type of error is failure to reject the null hypothesis when it is actually false. This corresponds to a missed diagnosis, or to pronouncing a criminal not guilty. The probability of this error is denoted by β. In a clinical trial set-up, this is equivalent to declaring a drug ineffective when it is actually effective. The complementary probability of the type II error is the statistical power (1−β). Thus the power of a statistical test is the probability of correctly rejecting a null hypothesis when it is false.

Choice of criterion

The choice of which of the above two criteria (precision or power) should be used in any particular instance depends on the objectives of the study. If it is known unambiguously that the intervention has some effect, it makes little sense to test the null hypothesis, and the objective may be to estimate the magnitude of the effect, and to do this with acceptable precision.

In studies of new interventions, it is often not known whether there will be any impact at all on the outcomes of interest. In these circumstances, it may be sufficient to ensure that there will be a good chance of obtaining a significant result if there is indeed an effect of some specified magnitude. It should be emphasized, however, that if this course is adopted, the estimates obtained may be very imprecise.

Usually, it is more important to estimate the effect of the intervention and to specify a confidence interval around the estimate to indicate the likely range, than to test a specific hypothesis. Therefore, in many situations it may be more appropriate to choose the sample size by setting the width of the confidence interval, rather than to rely on power calculations.
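The interplay of significance level and power described above can be illustrated with a small simulation. This is a minimal sketch, not from the article: it assumes normally distributed outcomes with a known variance of 1 and uses a simple two-sample z test; all sample sizes, effect sizes and names are illustrative.

```python
import math
import random

random.seed(42)  # fixed seed so the illustration is reproducible


def z_statistic(group1, group2):
    """Two-sample z statistic assuming a known common variance of 1."""
    n = len(group1)
    diff = sum(group1) / n - sum(group2) / n
    return diff / math.sqrt(2.0 / n)


def rejection_rate(true_diff, n_per_group, trials=2000, z_crit=1.96):
    """Fraction of simulated trials declared significant at the 5% level.

    With true_diff = 0 this estimates alpha (the type I error rate);
    with true_diff > 0 it estimates the power (1 - beta).
    """
    rejections = 0
    for _ in range(trials):
        a = [random.gauss(true_diff, 1) for _ in range(n_per_group)]
        b = [random.gauss(0, 1) for _ in range(n_per_group)]
        if abs(z_statistic(a, b)) > z_crit:
            rejections += 1
    return rejections / trials


alpha_hat = rejection_rate(true_diff=0.0, n_per_group=50)   # near 0.05
power_hat = rejection_rate(true_diff=0.4, n_per_group=100)  # roughly 0.8
```

Increasing n_per_group raises the power while leaving the type I error rate at about the 5 percent level, which mirrors the discussion above: the significance level is fixed by the analyst, whereas power is bought with sample size.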
Allowance for losses

Losses to follow-up occur in most longitudinal studies. Individuals may be lost for various reasons, such as refusal, migration, or death from a cause unrelated to the outcome of interest. Such losses may produce bias, as the individuals who are lost often differ in important respects from those who remain in the study. The losses also reduce the size of the sample available for analysis, and this decreases the precision or power of the study.

For these reasons, it is important to make every attempt to reduce the number of losses to a minimum. However, it is rarely possible to avoid losses completely. The reduced power or precision resulting from losses may be avoided by increasing the initial sample size in order to compensate for the expected number of losses. A 20% allowance is generally considered appropriate.

Practical constraints

Resources in terms of staff, vehicles, laboratory capacity, time or money may limit the potential size of a study, and it is often necessary to compromise between the estimated study size and what can be managed with the available resources. Trying to do a study that is beyond the capacity of the available resources is likely to be unfruitful, as data quality is likely to suffer and the results may be subject to serious bias, or the study may even collapse completely, thus wasting the effort and money that has already been put into it.

If the calculations indicate that a study of manageable size will yield power and/or precision that is unacceptably low, it is probably better not to conduct the study at all.

Consequences of studies that are too small

Suppose first that the intervention under study has little or no effect on the outcome of interest. The difference observed in the study is therefore likely to be non-significant. However, the width of the confidence interval for the effect measure, for example the relative risk, will depend upon the sample size. If the sample is small, the confidence interval will be very wide, and even though it will probably include the null value, it will extend to include large values of the effect measure. In other words, the study will have failed to establish that the intervention has no appreciable effect.

In case the intervention does have an appreciable effect, a study that is too small will have low power, i.e. it will have little chance of giving a statistically significant difference.

FORMULAE FOR SAMPLE SIZE ESTIMATION

I. Estimating a population proportion: With specified absolute precision

Required information for estimating the sample size is as under:

- Anticipated population proportion: P (a rough estimate of P is sufficient)

- Desired confidence level

- Absolute precision (d): the total percentage points of error that can be tolerated on each side of the figure obtained. For example, if the anticipated prevalence of TB infection in a population is 10%, we would be satisfied if our sample gives a figure of 8-12%. In this case, d = 2% = 0.02.

Sample size can be estimated using the following formula:

n = Z² P(1-P) / d²    ----- (1)

where P = anticipated proportion and d = absolute precision required on either side of the proportion. P and d are expressed as fractions; Z (= Z1-α/2) is a constant whose value for a two-sided test is 1.96 for 95%
confidence, 1.645 for 90% confidence and 2.576 for 99% confidence.

Example: For a survey aimed at estimating vaccination coverage, P is usually anticipated at 0.5, d = 0.1 and the level of significance = 5%. The estimated sample size using the above formula is 96 for random sampling.

The estimated sample size is applicable only in the case of simple random sampling (SRS). If another sampling method is used, a larger sample size is likely to be needed because of the design effect. For a cluster sampling strategy, the estimated sample size as above is multiplied by the design effect, which is defined as the ratio of the variance obtained in a cluster survey to the variance for the same sample size under SRS:

Design effect = sample design variance / variance using SRS

In most sample surveys adopting a cluster-sampling strategy, a design effect of 2 is taken. This means twice as many individuals would have to be studied to obtain the same precision as with SRS. Thus, the estimated sample size in the above example would be 96 x 2 = 192. However, the design effect should ideally be estimated from previously available data or from pilot studies.

II. Estimating a population proportion: With specified relative precision

Required information:

- Anticipated population proportion: P

- Confidence level

- Relative precision (ε): the sample result should fall within ε% of the true value.

Sample size can be estimated using the following formula:

n = Z² (1-P) / (ε² P)    ----- (2)

Example: For estimating the prevalence of tuberculous infection among children of 0-9 years of age in a locality, how many should be included in the sample so that the prevalence may be obtained within 10% of the true value with 95% confidence? The anticipated P is 5-10%.

P = 0.05 (the lower limit is taken, to give the larger sample and better precision)
Confidence level is 95%
Relative precision (ε) = 10% = 0.1

The estimated sample size using formula 2 is 1.96² x (1 - 0.05) / (0.1² x 0.05) ≈ 7299. For cluster sampling, the sample size will be 7299 x D (design effect).

III. Estimating the difference between two population proportions with specified absolute precision (two-sample situation, equal n in the two groups)

Required information:

- Anticipated population proportions: P1 and P2

- Confidence level: 95%

- Absolute precision required on either side of the true value of the difference between the proportions: d

Sample size can be estimated using the following formula:

n = Z² [P1(1-P1) + P2(1-P2)] / d²    ----- (3)

where P1 and P2 are the anticipated values of the proportions in the two populations.

Example: What sample size should be selected from each of two groups of people to estimate a risk difference to within 3 percentage points of the true difference at 95% confidence, when the anticipated P1 and P2 are 40% and 32% respectively?

n = 1.96² x [(0.4 x 0.6) + (0.32 x 0.68)] / (0.03)² = 1953

IV. Hypothesis testing for two population proportions

Required information:

- Anticipated values of the population proportions: P1 and P2

- Level of significance

- Power of the test: 100(1-β)%

Sample size (per group) can be estimated using the following formula:

n = {Z1-α/2 √[2P̄(1-P̄)] + Z1-β √[P1(1-P1) + P2(1-P2)]}² / (P1 - P2)²    ----- (4)

where P̄ = (P1 + P2)/2, and Z1-α/2 is as in formula (1). Values of Z1-β are 1.28 for 90% power and 0.84 for 80% power.

V. Estimating a population mean: With specified absolute precision

Required information:

- Variance: σ², known or estimated from a pilot study

- Absolute precision: d

Sample size can be estimated using the formula:

n = Z² σ² / d²    ----- (5)

where σ is the standard deviation, estimated from a pilot study if not otherwise known.

VI. Estimating a population mean: With specified relative precision

Required information:

- Population mean: µ

- Variance: σ²

- Relative precision: ε

Sample size can be estimated using the formula:

n = Z² σ² / (ε² µ²)    ----- (6)

VII. Estimating the difference between means of two populations with specified precision

Required information:

- Difference in means: µ1 - µ2

- Standard deviations: s1, s2

- Absolute precision: d

First calculate the pooled variance, then estimate the sample size:

Pooled variance σ² = [(n1 - 1)s1² + (n2 - 1)s2²] / [(n1 - 1) + (n2 - 1)]

n = Z² (2σ²) / d²    ----- (7)

Example: Suppose you want to know the sample sizes (equal groups) required for detecting a mean difference of 0.3 mg/ml between well-nourished and under-nourished children. This difference is considered to be clinically important. Suppose the mean and SD of haemoglobin (Hb) levels available from an earlier study in random samples of well-nourished and under-nourished groups were as follows:

Well-nourished (group 1): n1 = 100, x̄1 = 10.1 and SD1 = 0.9
Under-nourished (group 2): n2 = 70, x̄2 = 9.7 and SD2 = 1.1

The SDs do not differ too much and we can pool them. Thus,

Pooled variance σ² = (99 x 0.9² + 69 x 1.1²) / (99 + 69) = 0.97

n = 1.96² x (2 x 0.97) / (0.3)² ≈ 83
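The formulae above translate directly into code. The sketch below is ours (the function names are illustrative, not from the article); it implements formulas (1), (2), (3), (4) and (8) and checks two of the worked examples:

```python
import math


def n_proportion_absolute(p, d, z=1.96):
    """Formula (1): estimating a single proportion to absolute precision d."""
    return z ** 2 * p * (1 - p) / d ** 2


def n_proportion_relative(p, eps, z=1.96):
    """Formula (2): estimating a single proportion to relative precision eps."""
    return z ** 2 * (1 - p) / (eps ** 2 * p)


def n_diff_proportions(p1, p2, d, z=1.96):
    """Formula (3): difference of two proportions to absolute precision d."""
    return z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / d ** 2


def n_test_proportions(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Formula (4): per-group size for testing two proportions
    (defaults: 5% two-sided significance, 80% power)."""
    p_bar = (p1 + p2) / 2
    term = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
            + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return term ** 2 / (p1 - p2) ** 2


def n_test_means(mu1, mu2, s1, s2, z_alpha=1.96, z_beta=0.84):
    """Formula (8): per-group size for testing two means."""
    pooled_var = (s1 ** 2 + s2 ** 2) / 2
    return 2 * pooled_var * (z_alpha + z_beta) ** 2 / (mu1 - mu2) ** 2


# Worked examples from the text:
print(round(n_proportion_absolute(0.5, 0.1)))      # vaccination coverage: 96
print(round(n_diff_proportions(0.4, 0.32, 0.03)))  # risk difference: 1953
```

In practice the result is rounded up to the next whole number, multiplied by the design effect for cluster sampling, and inflated for expected losses to follow-up, as discussed earlier.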
VIII. Hypothesis testing for two population means

Required information:

- Anticipated values of the population means: µ1 and µ2

- Standard deviations: s1, s2

- Level of significance

- Power of the test: 100(1-β)%

Sample size (per group) can be estimated using the formula:

Pooled variance σ² = (s1² + s2²) / 2

n = 2σ² (Z1-α/2 + Z1-β)² / (µ1 - µ2)²    ----- (8)

STUDIES WITH MULTIPLE OUTCOMES

In most studies, several different outcomes are measured. For example, in a study of the efficacy of insecticide-treated mosquito-nets against childhood malaria, there may be interest in the effect of the intervention on deaths, deaths attributable to malaria, episodes of clinical malaria over a period of time, spleen sizes at the end of the malaria season, and packed cell volumes at the end of the season.

The investigator should first focus attention on a few key outcomes and calculate the required study size for each of them. The outcome that results in the largest study size would then be used to determine the size of the study.

SAMPLE SIZE ESTIMATION FOR OTHER SITUATIONS

For more complex study designs, for example two groups of unequal size, comparison of more than two groups, incidence studies, and interventions allocated to communities, the methods of sample size estimation are more complex and may be found in a standard statistical textbook.

There are computer programs available that perform sample size calculations. In particular, this facility is available in the package 'Epi Info', though it does not cover the full range of possibilities.

SAMPLE SIZE IN QUALITATIVE RESEARCH

Sample size in qualitative research, for example Knowledge, Attitude and Practice (KAP) studies, will depend upon the expected response, for example the proportion of doctors using an intermittent regimen. Based on the expected response, the usual method for estimating sample size can be employed. However, in general, assessment of KAP cannot be performed on the basis of a single parameter. If we use an approach based on proportions, then we need to calculate the sample size for each parameter separately. In such situations, usually a score is assigned to the correct response to each item, and a total score over all the correct responses of each individual is obtained. The total score can then be treated as a continuous or a dichotomous response for analysis. Therefore, the usual approach to sample size estimation, viz. estimating proportion(s) or mean(s), can be employed.
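The scoring approach just outlined can be sketched as follows. The response data, number of items and precision target are invented for illustration; the final step applies the mean-based formula (5) to the total score:

```python
import math

# Hypothetical KAP responses: one row per respondent, one column per
# item, scored 1 for a correct response and 0 otherwise.
responses = [
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
]

# Total score per respondent, treated as a continuous outcome.
totals = [sum(row) for row in responses]
n = len(totals)
mean = sum(totals) / n
sd = math.sqrt(sum((t - mean) ** 2 for t in totals) / (n - 1))

# Formula (5), n = Z^2 * sigma^2 / d^2, then gives the survey size
# needed to estimate the mean score to within d = 0.5 points.
required_n = 1.96 ** 2 * sd ** 2 / 0.5 ** 2
```

In a real KAP survey the standard deviation would come from pilot data rather than a toy table like this one, and the result would be inflated for the design effect and for expected losses, as described above.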
