
Calculating Sample Size for Prevalence Studies

Source: Naing L, et al. Practical Issues in Calculating the
Sample Size for Prevalence Studies. Arch of Orofac Sci
2006; 1:9-14

2/18/2018
How big a sample do I require for
prevalence studies?
• Aim of sample size calculation is to
determine an adequate sample size to
estimate the population prevalence with
good precision.
• Need to decide which values of the parameters are
appropriate to use in the formula.

2/18/2018 Asri Adisasmita


How to Calculate the Sample Size

For a 95% CI → Z value is 1.96; for a 99% CI → Z value is 2.58

Prevalence should be presented with its CI, for example: 40%, 95% CI
30% - 50% → d for this estimate is 10%. The width of the CI = 2d
(here the width is 20%) → poor estimate.
If a narrower CI is wanted → need to calculate the sample size with
a smaller d, e.g. d = 0.05 → CI width = 10%
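
The formula shown on this slide did not survive extraction. A minimal sketch, assuming the standard single-proportion formula n = Z² P(1-P) / d², which is consistent with the worked figures quoted later in these slides (n = 385 and n = 323). The function name `sample_size` is illustrative, not from the source.

```python
import math

def sample_size(p, d, z=1.96):
    """Sample size needed to estimate a prevalence p with precision d.

    Assumes n = Z^2 * P * (1 - P) / d^2, rounded up to a whole subject.
    """
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

# Expected prevalence of 40%, as in the example above
n_wide = sample_size(0.40, 0.10)    # d = 10%, CI width 20%
n_narrow = sample_size(0.40, 0.05)  # d = 5%, CI width 10%
```

Because d enters the formula squared, halving d roughly quadruples the required sample size.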
Practical Issues in Determining Sample
Size Parameters
• Determine d
– If the disease prevalence is between 10% and 90% →
it’s OK to have d = 5%.
– If the disease prevalence is <10% or >90%, d = 5% will
be problematic (results in meaningless lower-bound
values below 0 or upper-bound values above 1,
i.e. >100%).
– If prevalence <10% → d = 0.5P, and if prevalence
>90% → d = 0.5(1-P)
– Ex. If P = 0.04, then d = 0.02; if P = 0.98, then d = 0.01
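
The rule above can be sketched as a small helper. The names `choose_precision` and `ci_bounds` are hypothetical, and the 10% and 90% cut-offs are taken directly from the bullets above.

```python
def choose_precision(p, default_d=0.05):
    """Pick d from the expected prevalence p, following the rule on this slide."""
    if p < 0.10:
        return 0.5 * p          # rare disease: d is half of P
    if p > 0.90:
        return 0.5 * (1 - p)    # very common disease: d is half of (1 - P)
    return default_d            # otherwise d = 5% is acceptable

def ci_bounds(p, d):
    """Lower and upper bound of the interval p +/- d."""
    return p - d, p + d

# With P = 0.04, a fixed d = 0.05 pushes the lower bound below zero,
# while the scaled d = 0.02 keeps the whole interval inside 0..1.
low_fixed, high_fixed = ci_bounds(0.04, 0.05)
low_scaled, high_scaled = ci_bounds(0.04, choose_precision(0.04))
```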



Estimating d

95% CI of rare diseases (P ≤ 0.05) and common diseases (P ≥ 0.95) with a precision
(d) set at 0.05
Practical Issues in Determining Sample
Size Parameters
• Estimating P
– May get several P from previous studies in the
literature → preferably use P from the most recent
studies with a similar study design and study
population.
– If the range is between 20% - 30% → use 30%; but if
between 60% - 80% → use 60%. This will give us a
larger sample size. If the range is 40% - 60% → use
50% (will give the larger sample size).
– If there is doubt about P → best to use P = 50% → will
give the largest sample size if the prevalence is
between 10% - 90% (Figure below).
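
The choice of P from a reported range amounts to picking the value closest to 50%, since P(1-P) is largest at P = 0.5 when d is held fixed. A minimal sketch of that rule; the helper name `conservative_p` is hypothetical and a fixed d is assumed.

```python
def conservative_p(low, high):
    """Return the prevalence within [low, high] that is closest to 0.5.

    For a fixed d this is the value that gives the largest sample size.
    """
    if low <= 0.5 <= high:
        return 0.5
    # Whole range lies on one side of 0.5: take the end nearest to 0.5
    return high if high < 0.5 else low

p_a = conservative_p(0.20, 0.30)  # range below 50%: upper end
p_b = conservative_p(0.60, 0.80)  # range above 50%: lower end
p_c = conservative_p(0.40, 0.60)  # range spans 50%
```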



Estimating P



Largest Sample Size
• Many books and guides suggest that if it is impossible to
have a good estimate for P, just set P = 0.5 to get the
maximum sample size.
• Setting P = 50% (0.5) does not necessarily provide the
biggest sample size → look at the Figure in the previous
slide.
• Arguments: ex. P = 0.5 and d = 0.05 → sample size = 385
– If the real P = 1%, we may get only 3 or 4 cases, or no cases at
all. The sample size is too small, and the assumption of
normality is not met.
– If the real P = 99%, we may get few or no non-cases (almost all
would be cases), and the assumption of normality is not met.
• A very crude pilot study with a sample size of 20-30 can
also be used to get a rough estimate of the prevalence.
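
The argument above can be checked numerically. A minimal sketch, assuming the formula n = Z² P(1-P) / d² and the rule d = 0.5P for rare diseases from the earlier slide; `sample_size` is an illustrative name.

```python
import math

def sample_size(p, d, z=1.96):
    """Assumed formula: n = Z^2 * P * (1 - P) / d^2, rounded up."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

# "Worst case" guess: P = 0.5 with d = 0.05
n_half = sample_size(0.5, 0.05)

# If the real prevalence is 1%, that sample is expected to contain
# fewer than 5 cases
expected_cases = n_half * 0.01

# Planning for P = 1% with d = 0.5P requires a larger sample
n_rare = sample_size(0.01, 0.5 * 0.01)
```

Under these assumptions `n_rare` is several times larger than `n_half`, which is why P = 0.5 is not a safe default when the disease may be rare.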



Assumption of Normal Approximation

• The above sample size calculation formula is based
on the normal approximation assumption.
• It says: nP and n(1-P) must both be >5 → both cases
and non-cases in the selected sample must number
more than 5.
• Small sample sizes might not fulfill this
assumption → need to check this assumption
after calculating the sample size.
• Using d = 0.5P and d = 0.5(1-P) will also meet this
assumption → look at the next table.
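
The check described above is simple to automate. A minimal sketch; the function name `normal_approx_ok` is hypothetical, and the threshold of 5 comes from the slide.

```python
def normal_approx_ok(n, p, threshold=5):
    """True if both expected cases and expected non-cases exceed the threshold."""
    expected_cases = n * p
    expected_non_cases = n * (1 - p)
    return expected_cases > threshold and expected_non_cases > threshold

# n = 385 (from P = 0.5, d = 0.05) fails if the real prevalence is 1%
ok_small = normal_approx_ok(385, 0.01)

# A sample planned with d = 0.5P for P = 1% passes
ok_scaled = normal_approx_ok(1522, 0.01)
```

With d = 0.5P the assumed formula gives nP = 4Z²(1-P), which is about 15(1-P) at Z = 1.96, so the expected number of cases stays above 5 for any P below 10%.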



Checking assumption for calculated
sample sizes



Normal approximation assumption

• The basic reason for speculating/estimating P
(prevalence) is that sample size depends on the
standard error (SE) of the distribution of the
sample prevalence (p).
• Actually, the sample size calculation formula is
derived from: d = Z x SE(p), where
SE(p) = sqrt(P(1-P)/n), so n = Z² P(1-P) / d².
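
The relation d = Z x SE(p) can also be read in the other direction: given a sample size, what precision does it buy? A minimal sketch, assuming the usual standard error of a proportion, SE(p) = sqrt(P(1-P)/n); the function names are illustrative.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion (assumed: sqrt(P(1-P)/n))."""
    return math.sqrt(p * (1 - p) / n)

def margin_of_error(p, n, z=1.96):
    """Precision d achieved with n subjects: d = Z * SE(p)."""
    return z * standard_error(p, n)

# With P = 0.3 and n = 323 the achieved precision is just under 0.05
d_achieved = margin_of_error(0.3, 323)
```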



Normal approximation assumption
• Example:
Prevalence of smokers in the village = 0.3, d = 0.05, Z = 1.96 →
the calculated n is 323 → means that if we take a random
sample of 323 villagers and measure smoking prevalence,
we expect to get a smoking prevalence of 25% to 35%
(30% ±5%). And if we repeat this 100 times → about 95 of
the estimates would be between 25% and 35%.
All these p (prevalence of the sample) will be normally
distributed.

[Figure: Normally distributed sample estimates (p)]
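
The repeated-sampling interpretation can be illustrated with a small simulation. This is a sketch under stated assumptions: each villager is drawn independently with a true smoking prevalence of 0.3, and the number of repeats and the seed are arbitrary choices, not from the source.

```python
import random

def simulate_coverage(n=323, true_p=0.3, d=0.05, repeats=2000, seed=1):
    """Fraction of repeated samples whose prevalence falls within true_p +/- d."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(repeats):
        # Draw n villagers; each one is a smoker with probability true_p
        smokers = sum(rng.random() < true_p for _ in range(n))
        p_hat = smokers / n
        if true_p - d <= p_hat <= true_p + d:
            inside += 1
    return inside / repeats

coverage = simulate_coverage()
```

Under these assumptions `coverage` should come out close to 0.95, matching the "95 out of 100" statement above.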
Finite Population Correction

• The above sample size formula is valid if the sample
size is ≤ 5% of the population size (n/N ≤ 0.05). If the
sampling fraction n/N is > 5% → use the finite population
correction:
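
The correction formula itself did not survive extraction. A minimal sketch, assuming the commonly used form n' = n / (1 + (n - 1)/N), where n is the uncorrected sample size and N the population size; the slide's exact formula may be written differently, and `apply_fpc` is a hypothetical name.

```python
import math

def apply_fpc(n, population_size):
    """Apply a finite population correction when n/N exceeds 5%.

    Assumed form: n' = n / (1 + (n - 1) / N), rounded up.
    """
    if n / population_size <= 0.05:
        return n  # sampling fraction is small enough, no correction needed
    return math.ceil(n / (1 + (n - 1) / population_size))

# n = 385 from a village of 2,000 people: sampling fraction is about 19%
n_corrected = apply_fpc(385, 2000)

# Same n from a population of 100,000: no correction applied
n_unchanged = apply_fpc(385, 100000)
```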

Cluster or Multistage Sampling
• The above sample sizes are valid only if
using simple random or systematic
random sampling methods.
• Cluster or multistage sampling methods
require a larger sample size to achieve the
same precision → the sample size has to be
multiplied by the design effect, deff (in
immunization surveys, deff = 2).
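
Applying the design effect is a single multiplication. A minimal sketch; `apply_design_effect` is a hypothetical name, and the default deff = 2 is the immunization example from the slide, not a general recommendation.

```python
import math

def apply_design_effect(n_srs, deff=2.0):
    """Inflate a simple-random-sampling sample size by the design effect."""
    return math.ceil(n_srs * deff)

# n = 385 under simple random sampling, cluster design with deff = 2
n_cluster = apply_design_effect(385)
```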

“The larger the sample size the better
the study”
• That is not always true.
• The aim of sample size calculation is not to
obtain the biggest sample size possible, but to get
an optimum or adequate sample size.
• Too large a sample size is not cost-effective and
could be unethical → in drug trials, a very large
sample could lead to a conclusion that the new
drug is significantly better than the old drug in
the statistical sense, although the difference is
clinically insignificant.
• Look at the table below.

“The larger the sample size the better
the study”

A two-fold increase in sample size improves precision by about 30%
(d shrinks by a factor of 1/√2). If the sample size is
quadrupled, d is halved.
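
The diminishing return follows from d being proportional to 1/sqrt(n). A minimal sketch, assuming d = Z x sqrt(P(1-P)/n) with P = 0.5; the starting sample size of 100 is an arbitrary illustration.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Precision d for a given sample size (assumed: d = Z * sqrt(P(1-P)/n))."""
    return z * math.sqrt(p * (1 - p) / n)

d_base = margin_of_error(0.5, 100)
d_doubled = margin_of_error(0.5, 200)
d_quadrupled = margin_of_error(0.5, 400)

# Relative narrowing of d when the sample size is doubled
improvement_doubling = 1 - d_doubled / d_base
# Ratio of d after quadrupling to the original d
ratio_quadrupling = d_quadrupled / d_base
```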

Other Objectives of the Study
• Many studies have more than 1 objective
→ it is recommended to also calculate the sample
size required for the other objectives. The
biggest sample size should then be taken
as the sample size that would accommodate all
study objectives.

Anticipating Non-Response or Missing
Data
• The calculated sample size above
assumes that there is no problem with
non-response or missing values.
• If non-response or missing data occur →
the desired precision will not be achieved.
• Need to over-sample by 10% to 20% of
the calculated required sample size.
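
The over-sampling step can be sketched as below. It follows the slide's wording (add 10% to 20% of the calculated n); `oversample` is a hypothetical name, and dividing by (1 - rate) is a different convention that gives slightly larger numbers.

```python
import math

def oversample(n, extra_rate):
    """Inflate n by a fraction of itself to allow for non-response."""
    return math.ceil(n * (1 + extra_rate))

# Calculated n = 385, allowing for 10% and 20% non-response or missing data
n_plus_10 = oversample(385, 0.10)
n_plus_20 = oversample(385, 0.20)
```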
