Population & Sample

Sudigdo Sastroasmoro
s_sudigdo@yahoo.com/2010
Population is a large group of study subjects (humans, animals, tissues, blood specimens, medical records, etc.) with defined characteristics. [A population is a group of study subjects defined by the researcher as the population.]
Sample is a subset of the population which will be directly investigated. The sample should be (or be assumed to be) representative of the population; otherwise all statistical analyses will be invalid.
Investigations are always performed in the sample, and the results are then applied to the population.

Avoid using ambiguous terms
Sample population
Sampled population
Populasi sampel (Indonesian for "sample population")
Study population ~ sample


[Diagram: the research process. Steps: gap between Das Sein & Das Sollen; literature study; research question(s) / hypothesis; methods / design; data collection & analyses; conclusions. Labels: In the real world (population); In the sample; Infer.]

The sample is assumed to be representative of the population.
In research, measurements are always done in the sample, and the results are applied to the population.
[Diagram: sample (S) and population (P)]

[Diagram: population (P), sampling, sample (S), investigation, results, inference from S back to P]

Target population
Accessible population
Intended sample
Actual study subjects

Target population = domain = the population to which the results of the study will be applied. In clinical research it is usually characterized by demographic & clinical characteristics, e.g. normal infants, teens with epilepsy, post-menopausal women with osteoporosis.
Accessible population = the subset of the target population which can be accessed by the investigator. Frame: time & place. Examples: teens with epilepsy in RSCM, 2000-2005; women with osteoporosis in RSGS, 2002.
Intended sample = subjects who meet the eligibility criteria and are selected to be included in the study.
Actual study subjects = subjects who actually completed their participation in the study.

Target population (demographic, clinical)
  to accessible population: usually based on practical purposes
Accessible population (+ time, place)
  to intended sample: appropriate sampling technique
Intended sample [subjects selected for study]
  to actual study subjects: [non-response, drop-outs, withdrawals, loss to follow-up]
Actual study subjects [subjects who completed the study]

Target population (domain)
Accessible population
Intended sample
Actual study subjects
External validity II: does the accessible population (AP) represent the target population (TP)?
External validity I: does the intended sample (IS) represent the accessible population (AP)?
Internal validity: do the actual study subjects (ASS) represent the intended sample (IS)?

Validity: internal & external
Internal validity: how well the study was done (usually measurement, but also including whether the actual study subjects represent the intended sample). Many drop-outs? Loss to follow-up? Low compliance?
External validity I: whether the intended sample represents the accessible population (random sampling? convenience sampling?)
External validity II: whether the accessible population represents the target population. This cannot be calculated, but can be judged by common sense & general knowledge.

Sampling methods
A. Probability sampling
Simple random sampling (random number table, computer generated)
Stratified random sampling
Systematic sampling
Cluster sampling
Others: two-stage cluster sampling, etc.
B. Non-probability sampling
Consecutive sampling
Convenience sampling
Judgmental sampling / purposive sampling

Predicting the 1936 Election
In 1936, Literary Digest mailed questionnaires to 10 million people, asking whom they would vote for in the upcoming presidential election. The list was compiled from magazine subscribers, car owners and telephone directories. Based on the 2.3 million responses, they predicted a victory for the Republican Landon over Roosevelt by a 60 to 40 margin.
Roosevelt won with 61% of the vote, to 36% for Landon.
George Gallup correctly predicted the election, and the results of the Literary Digest poll to within 1 percent, using random samples.

Probability sampling (1)
Simple random sampling:
Select 50 out of 900 students
1. Using a random number table:
o Example (asterisks mark the 3-digit numbers read off the table): 146*72 2*238*9 12*970 *127*63 8*759*0 29*874 *390*48 6*8301
2. Using computer-generated (pseudo-random) numbers
Prompts: How many subjects do you have? 900
How many do you want to select? 50
Press Enter: 017, 068, 113, 142, etc.
Repeating exactly the same procedure will give a completely different set of numbers
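The computer-generated selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `simple_random_sample` and its optional `seed` argument are hypothetical and are not part of the program the slide refers to.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct subject numbers from 1..population_size."""
    # seed=None gives a different draw on every run; a fixed seed repeats the draw
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


# Select 50 out of 900 students, as in the slide
selected = simple_random_sample(900, 50)
print([f"{number:03d}" for number in selected])
```

Every student has the same chance of being selected, and no student can be selected twice, because `random.sample` draws without replacement.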
Simple Random Sample:
n = 20, N= 2000

Probability sampling (2)
Systematic sampling:
Every m-th subject is selected
Starting number: k
Example: k = 3, m = 10:
3, 13, 23, 33, 43, etc.
Better (more representative) than simple random sampling (SRS) if there are no natural trends or strata

Systematic sample:
N = 2000, n = 20, m = 100, k = 45
45, 145, 245, ..., 1945
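As a rough sketch of the systematic sample above (N = 2000, m = 100, k = 45), the selection can be written as a simple range. The function name `systematic_sample` is hypothetical; in practice the starting number k would normally be chosen at random between 1 and m, for example with `random.randint(1, m)`.

```python
def systematic_sample(population_size, interval, start):
    """Select every interval-th subject number, beginning at start (1-based)."""
    if not 1 <= start <= interval:
        raise ValueError("start must lie between 1 and interval")
    return list(range(start, population_size + 1, interval))


# N = 2000, m = 100, k = 45, as in the slide
sample = systematic_sample(population_size=2000, interval=100, start=45)
print(len(sample), sample[:3], sample[-1])  # 20 [45, 145, 245] 1945
```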

Probability sampling (2)
Stratified [random] sampling:
Random sampling is done in each stratum separately, e.g., by sex, age group, stage of disease, etc.
The results are then combined
Stratified sample of 20 from 4
strata
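A minimal sketch of a stratified sample of 20 from 4 strata might look like the following. The strata themselves (four groups of 500 subject IDs) and the equal allocation of 5 subjects per stratum are assumptions made for illustration; the slide does not say how the 20 subjects are divided among the strata.

```python
import random


def stratified_sample(strata, per_stratum, seed=None):
    """Draw a simple random sample of per_stratum subjects from every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        # Random sampling is done separately within each stratum
        sample[name] = rng.sample(members, per_stratum)
    return sample


# Hypothetical population: 4 strata of 500 subject IDs each (N = 2000)
strata = {f"stratum_{i}": list(range(i * 500 + 1, (i + 1) * 500 + 1)) for i in range(4)}
chosen = stratified_sample(strata, per_stratum=5)

# The per-stratum results are then combined into one sample
combined = [subject for members in chosen.values() for subject in members]
print(len(combined))  # 20
```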

Probability sampling (3)
Cluster sampling

Subjects are selected by cluster or place (RT, RW [Indonesian neighbourhood units], district, etc.) rather than individually
Cluster Sample of 20
(cluster size = 4)
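The cluster sample of 20 with a cluster size of 4 can be sketched as below. The population layout (500 clusters of 4 subjects) and the function name `cluster_sample` are assumptions for illustration only; this is one-stage cluster sampling, in which every member of a selected cluster enters the sample.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters whole clusters and include every member of each."""
    rng = random.Random(seed)
    picked = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in picked}


# Hypothetical population: 500 clusters of 4 subjects each (N = 2000)
clusters = {c: list(range(c * 4 + 1, c * 4 + 5)) for c in range(500)}
chosen = cluster_sample(clusters, n_clusters=5)

subjects = [s for members in chosen.values() for s in members]
print(len(subjects))  # 20
```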

Non-probability sampling (1)
Consecutive sampling:
Subjects are selected in the order in which they appear on the list
Most commonly used in clinical studies
Can be expected to resemble random sampling if the time span is long enough
This is the best of the non-probability sampling methods

Non-probability sampling (2)
Convenience sampling
Judgmental sampling
They are rarely justified except under certain conditions, e.g. studies of normal values

Note
All statistical analyses (inferences) are based on (simple) random sampling.
Whether or not a sample is representative of the population depends on whether or not it resembles the sample that random sampling would have produced.

How to generalize results in the sample to the population:
Introduction to statistical inference

IMPORTANT!!!
Statistical significance vs. clinical importance
A negligible clinical difference may be statistically very significant if the number of subjects is very large, e.g. a difference in reduction of cholesterol level of 3 mg/dl, n1 = n2 = 10,000; p = 0.00002
A large clinical difference may be statistically non-significant if the number of subjects is very small, e.g. a 30% difference in cure rate, n1 = n2 = 10; p = 0.74
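
Clinical importance vs. statistical significance
[Diagram: randomized trial (R) of standard treatment vs. new treatment; outcome is cholesterol level, mg/dl. Both groups start at a mean of 300 mg/dl, with n = 10000 in each group. After treatment the group means are 200 and 197. t-test: df = 9998, p = 0.00002 (the t value is not shown). Annotations: Clinical, Statistical.]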

Clinical significance vs. statistical significance

              Cured   Died
Standard Rx     0     10 (100%)
New Rx          3      7 (70%)

Fisher exact test: p = 0.211
Absolute risk reduction = 30%
[Annotations: Clinical, Statistical]
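The Fisher exact p value for the table above can be checked with a short sketch that uses only the standard library. The function name `fisher_exact_two_sided` is hypothetical, and the two-sided rule used here (summing the probabilities of all tables that are no more likely than the observed one) is an assumption about how the slide's value was obtained.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x "cured" subjects in the first row
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over all tables that are at most as probable as the observed one
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Standard Rx: 0 cured, 10 died; New Rx: 3 cured, 7 died
p_value = fisher_exact_two_sided(0, 10, 3, 7)
print(round(p_value, 3))  # 0.211
```

A 30% absolute risk reduction is clinically large, yet with only 10 subjects per group the p value stays well above 0.05.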

Abstract
Objectives:
Methods:
Results: After 2 months of treatment, there was a significant difference in LDL (P = 0.0032) and HDL (P = 0.048), but no significant difference in triglyceride (P = 0.073) between the 2 groups.
Conclusion:

Introduction to statistical inference
Can the results of the study (in the sample) be applied to the accessible or target population?
Hypothesis testing & confidence interval

Statistic and Parameter
An observed value drawn from the sample is called a statistic (cf. statistics, the science).
The corresponding value in the population is called a parameter.
We measure and analyze statistics, and use them to infer parameters.
Examples of statistics:
Proportion
Percentage
Mean
Median
Mode
Difference in
proportion/mean

OR
RR
Sensitivity
Specificity
Kappa
LR
NNT

There are 2 ways of inferring a parameter from a statistic:
Hypothesis testing: p value
Estimation: confidence interval (CI)

The p value & the CI convey the same concept in different ways

P value
The probability of obtaining the observed results (or more extreme ones) by chance alone, i.e. if the null hypothesis (H0) were true

Group   Success    Failure    Total
C       30 (60%)   20 (40%)   50
E       40 (80%)   10 (20%)   50
χ² = (value not shown); df = 1; p = 0.0432

Group   Success    Failure    Total
C       30 (60%)   20 (40%)   50
E       40 (80%)   10 (20%)   50
χ² = (value not shown); df = 1; p = 0.0432
If drugs E and C were equally effective, we could still obtain the above result (a difference in success rate of 20%), but the probability is small (4.32%).
In other words, if drugs E and C were equally effective, the probability of obtaining such a result merely by chance is 4.32%.
If we define in advance that p < 0.05 is significant, then the result is called statistically significant.

A similar interpretation applies to ALL hypothesis testing: t-test, ANOVA, non-parametric tests, Pearson correlation, multivariate tests, etc.:
If the null hypothesis were true, the probability of obtaining the result would be ... (for example 0.02 or 2%, etc.)
Confidence Intervals
Estimate the range of values
(parameter) in the population using a
statistic in the sample (as point
estimate)

If the observed result in the sample is X, what is the figure in the population?
[Diagram: a statistic X (point estimate) in the sample (S), and the confidence interval (CI) around X in the population (P)]

Most commonly used CI:
90% CI corresponds to a significance level (p) of 0.10
95% CI corresponds to a significance level (p) of 0.05
99% CI corresponds to a significance level (p) of 0.01

Note:
p value: only for analytical studies
CI: for descriptive and analytical studies

How to calculate CI
General formula:

$$\mathrm{CI} = p \pm Z_\alpha \times SE$$

p = point estimate, a value drawn from the sample (a statistic)
$Z_\alpha$ = standard normal deviate for $\alpha$; if $\alpha = 0.05$, $Z_\alpha = 1.96$ (~ 95% CI)
SE = standard error of the point estimate
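As a small illustration of where the value 1.96 comes from, the standard normal deviate for the commonly used confidence levels can be computed with Python's standard library. The helper name `z_for_alpha` is hypothetical.

```python
from statistics import NormalDist


def z_for_alpha(alpha):
    """Two-sided standard normal deviate Z_alpha for significance level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.10, 0.05, 0.01):
    print(f"{100 * (1 - alpha):.0f}% CI: Z = {z_for_alpha(alpha):.2f}")
```

For alpha = 0.05 this gives 1.96, the multiplier used in the worked examples of this deck.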

Example 1
Of 100 FKUI (Faculty of Medicine, Universitas Indonesia) students, 60 are female (p = 0.6).
What is the proportion of females among Indonesian FK (medical faculty) students? (assuming FKUI represents FK in Indonesia)


Example 1

$$SE(p) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.6 \times 0.4}{100}} \approx \frac{0.5}{10} = 0.05$$

$$95\%\ \mathrm{CI} = 0.6 \pm 1.96 \times 0.05 = 0.6 \pm 0.1 = 0.5\ ;\ 0.7$$
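The calculation in Example 1 can be reproduced with a short sketch. The function name `proportion_ci` is hypothetical, and the normal approximation with Z = 1.96 is taken from the general formula given earlier.

```python
from math import sqrt


def proportion_ci(successes, n, z=1.96):
    """Normal-approximation confidence interval for a single proportion."""
    p = successes / n
    se = sqrt(p * (1 - p) / n)  # SE(p) = sqrt(pq / n)
    return p - z * se, p + z * se


# 60 females among 100 students
low, high = proportion_ci(60, 100)
print(f"{low:.2f} to {high:.2f}")  # 0.50 to 0.70
```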

Example 2: CI of the mean
100 newborn babies, mean BW = 3000 (SD = 400) grams; what is the 95% CI?

$$SEM = \frac{SD}{\sqrt{n}}$$

$$95\%\ \mathrm{CI} = \bar{x} \pm 1.96 \times SEM = 3000 \pm 1.96 \times \frac{400}{\sqrt{100}} \approx 3000 \pm 80 = (3000 - 80)\ ;\ (3000 + 80) = 2920\ ;\ 3080$$
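Example 2 can be checked the same way. The function name `mean_ci` is hypothetical; the sketch keeps the unrounded margin of 78.4 grams, which the slide rounds to 80.

```python
from math import sqrt


def mean_ci(mean, sd, n, z=1.96):
    """Confidence interval of a mean, using SEM = SD / sqrt(n)."""
    sem = sd / sqrt(n)
    return mean - z * sem, mean + z * sem


# 100 newborns, mean birth weight 3000 g, SD 400 g
low, high = mean_ci(3000, 400, 100)
print(f"{low:.1f} to {high:.1f}")  # 2921.6 to 3078.4
```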

Example 3: CI of the difference between proportions (p1 - p2)
50 patients with drug A, 30 cured (p1 = 0.6)
50 patients with drug B, 40 cured (p2 = 0.8)

$$95\%\ \mathrm{CI}(p_1 - p_2) = (p_1 - p_2) \pm 1.96 \times SE(p_1 - p_2)$$

$$SE(p_1 - p_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = \sqrt{\frac{0.6 \times 0.4}{50} + \frac{0.8 \times 0.2}{50}} = \sqrt{\frac{0.4}{50}} = 0.09$$

$$95\%\ \mathrm{CI}(p_1 - p_2) = -0.2 \pm 1.96 \times 0.09 = -0.2 \pm 0.18 = -0.38\ ;\ -0.02$$

The cure rate with drug B is therefore estimated to be 0.02 to 0.38 higher than with drug A.
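A minimal sketch of the calculation for a difference between two proportions, using the formula SE(p1 - p2) = sqrt(p1q1/n1 + p2q2/n2) and Z = 1.96. The function name `diff_proportions_ci` is hypothetical.

```python
from math import sqrt


def diff_proportions_ci(x1, n1, x2, n2, z=1.96):
    """Normal-approximation CI for the difference p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Drug A: 30 of 50 cured; drug B: 40 of 50 cured
low, high = diff_proportions_ci(30, 50, 40, 50)
print(f"{low:.2f} to {high:.2f}")  # -0.38 to -0.02
```

The interval does not include 0, which agrees with a p value below 0.05 for the same data.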

Example 4: CI for the difference between 2 means
Mean systolic BP:
50 smokers = 146.4 (SD 18.5) mmHg
50 non-smokers = 140.4 (SD 16.8) mmHg

$$\bar{x}_1 - \bar{x}_2 = 6.0\ \text{mmHg}$$

$$95\%\ \mathrm{CI}(\bar{x}_1 - \bar{x}_2) = (\bar{x}_1 - \bar{x}_2) \pm 1.96 \times SE(\bar{x}_1 - \bar{x}_2)$$

$$SE(\bar{x}_1 - \bar{x}_2) = s \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Example 4: CI for the difference between 2 means

$$s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(49 \times 18.5^2) + (49 \times 16.8^2)}{98}} = 17.7$$

$$SE(\bar{x}_1 - \bar{x}_2) = 17.7 \times \sqrt{\frac{1}{50} + \frac{1}{50}} \approx 3.53$$

$$95\%\ \mathrm{CI} = 6.0 \pm (1.96 \times 3.53) = -0.9\ ;\ 12.9$$
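The pooled-SD calculation in Example 4 can be sketched as follows, using the figures from the example (146.4, SD 18.5 vs. 140.4, SD 16.8, with 50 subjects in each group). The function name `diff_means_ci` is hypothetical, and Z = 1.96 is used as in the slide rather than a t quantile.

```python
from math import sqrt


def diff_means_ci(mean1, sd1, n1, mean2, sd2, n2, z=1.96):
    """CI for the difference between two means, using the pooled SD."""
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    se = pooled_sd * sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return diff - z * se, diff + z * se


# Smokers vs. non-smokers, systolic BP in mmHg
low, high = diff_means_ci(146.4, 18.5, 50, 140.4, 16.8, 50)
print(f"{low:.1f} to {high:.1f}")  # -0.9 to 12.9
```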
Other commonly supplied CI
Relative risk (RR)
Odds ratio (OR)
Sensitivity, specificity (Se, Sp)
Likelihood ratio (LR)
Relative risk reduction (RRR)
Number needed to treat (NNT)
Altman & Gore
Statistics with confidence

Suggested CI presentation:
95% CI: 1.5 to 4.5
95% CI: -2.5 to 4.3
95% CI: -12 to -6

Not recommended: 3 + 1.5
Not recommended: -9 + -3

In contrast to the CIs for a proportion, a mean, or a difference between proportions/means, which are symmetrical around the point estimate, the CIs for RR, OR, LR and NNT are asymmetrical because the calculations involve logarithms

Examples
RR = 5.6 (95% CI 1.2 ; 23.7)
OR = 12.8 (95% CI 3.6 ; 44.2)
NNT = 12 (95% CI 9 ; 26)

If the p value is < 0.05, then the 95% CI:
excludes 0 (for a difference), because if A = B then A - B = 0 and p > 0.05
excludes 1 (for a ratio), because if A = B then A/B = 1 and p > 0.05

For small numbers of subjects, a computer-calculated CI may not follow this rule because of the continuity correction automatically applied by the computer

Concluding remarks
In every study the sample should be (or be assumed to be) representative of the population; otherwise all statistical calculations are invalid.
The p value (hypothesis testing) gives the probability that the result in the sample is merely caused by chance; it does not give the magnitude and direction of the difference.
The confidence interval (estimation) gives an estimate of the value in the population, given one result in the sample; it gives the magnitude and direction of the difference.

Concluding remarks
Using the p value alone tends to equate statistical significance with clinical importance.
The CI avoids this confusion because it provides an estimate of the clinical values and also indicates statistical significance.
Whenever applicable, supply the CI, especially for the main results of the study.
In critical appraisal of study results, the focus should be on the CI rather than on the p value.
