diagnostic testing
Correlated Observations
Correlated data arise when pairs or clusters
of observations are related and thus are
more similar to each other than to other
observations in the dataset.
Ignoring correlations will:
– overestimate p-values for within-person or
within-cluster comparisons
– underestimate p-values for between-person or
between-cluster comparisons
Pair Matching: Why match?
Pairing can control for extraneous sources
of variability and increase the power of a
statistical test.
Match 1 control to 1 case based on potential
confounders, such as age, gender, and
smoking.
Example
Johnson and Johnson (NEJM 287: 1122-1125,
1972) selected 85 Hodgkin’s patients who had a
sibling of the same sex who was free of the
disease and whose age was within 5 years of the
patient’s…they presented the data as….
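Group            Tonsillectomy   None
Hodgkin's             41          44
Sib control           33          52

The same data as matched pairs (rows: Hodgkin's patient, columns: sibling control):

Patient \ Sib    Tonsillectomy   None
Tonsillectomy         26          15
None                   7          37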
MI cases \ MI controls   Diabetes   No diabetes   Total
Diabetes                     9          37          46
No diabetes                 16          82          98
Total                       25         119         144

$$\hat{p} = \frac{b}{b+c} = \frac{37}{37+16} = \frac{37}{53}$$
$$OR = \frac{b}{c} = \frac{37}{16}$$
$$p\text{-value} = \binom{53}{37}(.5)^{37}(.5)^{16} + \binom{53}{38}(.5)^{38}(.5)^{15} + \binom{53}{39}(.5)^{39}(.5)^{14} + \dots$$
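The binomial sum above can be evaluated directly. The following is a minimal sketch, assuming the discordant counts b = 37 and c = 16 from the MI table; the helper name `upper_tail_binomial` is made up for illustration, and doubling the one-sided tail to get a two-sided value is one common convention rather than the only one.

```python
from math import comb

def upper_tail_binomial(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

b, c = 37, 16             # discordant pairs from the MI table
n_discordant = b + c      # 53 pairs carry all the information

p_hat = b / n_discordant  # share of discordant pairs in which the case is exposed
odds_ratio = b / c        # matched-pairs odds ratio

p_one_sided = upper_tail_binomial(b, n_discordant)
p_two_sided = min(1.0, 2 * p_one_sided)  # assumption: two-sided by doubling

print(f"p_hat = {p_hat:.3f}, OR = {odds_ratio:.2f}")
print(f"one-sided p = {p_one_sided:.5f}, two-sided p = {p_two_sided:.5f}")
```

Only the 53 discordant pairs enter the calculation; the 9 + 82 concordant pairs do not affect the p-value.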
McNemar’s Test
MI cases \ MI controls   Diabetes   No diabetes
Diabetes                     9          37
No diabetes                 16          82

McNemar's Test:

$$\chi^2_1 = \frac{(37-16)^2}{37+16} = \frac{21^2}{53} = 8.32 = 2.88^2; \quad p < .01$$
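As a quick check of the arithmetic, here is a small sketch of McNemar's chi-square test using only the standard library. The function name `mcnemar_chi2` is illustrative. The p-value uses the identity that a chi-square with 1 degree of freedom is a squared standard normal, so its upper tail equals erfc(sqrt(x/2)).

```python
from math import erfc, sqrt

def mcnemar_chi2(b, c):
    """McNemar's chi-square statistic (1 df, no continuity correction) and its p-value."""
    stat = (b - c) ** 2 / (b + c)
    p_value = erfc(sqrt(stat / 2))  # P(chi-square with 1 df > stat)
    return stat, p_value

stat, p_value = mcnemar_chi2(37, 16)
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
```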
Example: McNemar’s EXACT
test
Split-face trial:
– Researchers assigned 56 subjects to apply SPF
85 sunscreen to one side of their faces and SPF
50 to the other prior to engaging in 5 hours of
outdoor sports during mid-day. The outcome is
sunburn (yes/no).
– Unit of observation = side of a face
– Are the observations correlated? Yes.
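SPF-85 side \ SPF-50 side   Sunburned   Not sunburned
Sunburned                       1             0
Not sunburned                   7            48

McNemar's exact test:
Null hypothesis: X ~ binomial(n = 7, p = .5), where X counts the discordant pairs falling in one of the two discordant cells.

$$P(X = 0) = \binom{7}{0}(.5)^{0}(.5)^{7} = .0078$$

$$P(X = 7) = \binom{7}{7}(.5)^{7}(.5)^{0} = .0078$$

Two-sided p-value = .0156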
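The exact test can be sketched in a few lines. This is a hedged illustration, not a library implementation: the function name `mcnemar_exact` is invented here, and the two-sided p-value is taken as twice the smaller binomial tail (capped at 1), which matches the .0078 + .0078 calculation above.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c."""
    n = b + c                      # number of discordant pairs
    k = min(b, c)                  # the more extreme (smaller) discordant count
    one_tail = sum(comb(n, x) for x in range(k + 1)) * 0.5 ** n
    return min(1.0, 2 * one_tail)  # double the tail, never above 1

# Split-face trial: 7 discordant faces in one cell, 0 in the other
p_value = mcnemar_exact(7, 0)
print(f"two-sided exact p = {p_value:.4f}")
```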
RECALL: 95% confidence
interval for a difference in
INDEPENDENT proportions
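Standard error can be estimated by:

$$\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

For matched data, with $n = n_{cases} = n_{controls}$:

$$\mathrm{Var}(p_{E/D}) = \frac{p_{E/D}(1-p_{E/D})}{n}$$

$$\mathrm{Var}(p_{E/\sim D}) = \frac{p_{E/\sim D}(1-p_{E/\sim D})}{n}$$

$$\mathrm{Cov}(p_{E/\sim D},\, p_{E/D}) = \frac{p_{E\&D}\, p_{\sim E\&\sim D} - p_{\sim E\&D}\, p_{E\&\sim D}}{n}$$

$$\mathrm{Var}(p_{E/D} - p_{E/\sim D}) = \frac{p_{E/D}(1-p_{E/D})}{n} + \frac{p_{E/\sim D}(1-p_{E/\sim D})}{n} - 2\left(\frac{p_{E\&D}\, p_{\sim E\&\sim D} - p_{\sim E\&D}\, p_{E\&\sim D}}{n}\right)$$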
95% CI for difference in
dependent proportions
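MI cases \ MI controls   Diabetes   No diabetes   Total
Diabetes                     9          37          46
No diabetes                 16          82          98
Total                       25         119         144

$$p_{E/D} - p_{E/\sim D} = \frac{46}{144} - \frac{25}{144} = .32 - .17 = .15$$

$$\mathrm{Var}(p_{E/D} - p_{E/\sim D}) = \frac{p_{E/D}(1-p_{E/D})}{n} + \frac{p_{E/\sim D}(1-p_{E/\sim D})}{n} - 2\left(\frac{p_{E\&D}\, p_{\sim E\&\sim D} - p_{\sim E\&D}\, p_{E\&\sim D}}{n}\right)$$

$$= \frac{\left(\frac{46}{144}\right)\left(1-\frac{46}{144}\right) + \left(\frac{25}{144}\right)\left(1-\frac{25}{144}\right) - 2\left(\frac{9}{144}\cdot\frac{82}{144} - \frac{37}{144}\cdot\frac{16}{144}\right)}{144} = .0024$$

$$95\%\ CI: 0.15 \pm 1.96\sqrt{.0024} = 0.05 \text{ to } 0.24$$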
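The interval can be reproduced numerically. This is a minimal sketch under the variance formula above; the function name `paired_difference_ci` and the cell labels a, b, c, d (both exposed, case only, control only, neither) are illustrative choices.

```python
from math import sqrt

def paired_difference_ci(a, b, c, d, z=1.96):
    """Difference in dependent proportions with a Wald-type interval.

    a = both exposed, b = case only, c = control only, d = neither.
    """
    n = a + b + c + d                  # number of matched pairs
    p_case = (a + b) / n               # exposure proportion among cases
    p_control = (a + c) / n            # exposure proportion among controls
    cov_term = (a / n) * (d / n) - (b / n) * (c / n)
    var = (p_case * (1 - p_case) + p_control * (1 - p_control) - 2 * cov_term) / n
    diff = p_case - p_control
    half_width = z * sqrt(var)
    return diff, var, (diff - half_width, diff + half_width)

diff, var, (lower, upper) = paired_difference_ci(9, 37, 16, 82)
print(f"difference = {diff:.3f}, variance = {var:.4f}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Working from unrounded values gives essentially the same limits as the hand calculation, which rounds the difference to .15 first.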
The connection between McNemar
and Cochran-Mantel-Haenszel Tests
View each pair as its own
“age-gender” stratum
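Example: a pair concordant for exposure (cell “a” from before)

Exposure       Case (MI)   Control
Diabetes           1          1
No diabetes        0          0

All 144 pairs, by stratum type:

x 9
Exposure       Case (MI)   Control
Diabetes           1          1
No diabetes        0          0

x 37
Exposure       Case (MI)   Control
Diabetes           1          0
No diabetes        0          1

x 16
Exposure       Case (MI)   Control
Diabetes           0          1
No diabetes        1          0

x 82
Exposure       Case (MI)   Control
Diabetes           0          0
No diabetes        1          1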
Mantel-Haenszel for pair-
matched data
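Exposure       Case   Control
Exposed          a       b
Not exposed      c       d

For each pair, T = 2 (the stratum total).

x 9
Exposure       Case (MI)   Control
Diabetes           1          1        ad/T = 0
No diabetes        0          0        bc/T = 0

x 37
Exposure       Case (MI)   Control
Diabetes           1          0        ad/T = 1/2
No diabetes        0          1        bc/T = 0

x 16
Exposure       Case (MI)   Control
Diabetes           0          1        ad/T = 0
No diabetes        1          0        bc/T = 1/2

x 82
Exposure       Case (MI)   Control
Diabetes           0          0        ad/T = 0
No diabetes        1          1        bc/T = 0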
Mantel-Haenszel Summary OR
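$$OR_{MH} = \frac{\sum_{i=1}^{144} \frac{a_i d_i}{2}}{\sum_{i=1}^{144} \frac{b_i c_i}{2}} = \frac{37 \times \frac{1}{2}}{16 \times \frac{1}{2}} = \frac{37}{16}$$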
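To see the summary odds ratio fall out of the strata, the sketch below expands the 144 pairs into one 2x2 table each and applies the Mantel-Haenszel formula. The function name `mantel_haenszel_or` and the tuple layout (a, b, c, d) = (case exposed, control exposed, case unexposed, control unexposed) are assumptions made for this illustration.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio over an iterable of (a, b, c, d) strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# One stratum per matched pair, using the MI/diabetes counts
strata = (
    [(1, 1, 0, 0)] * 9     # both exposed
    + [(1, 0, 0, 1)] * 37  # only the case exposed
    + [(0, 1, 1, 0)] * 16  # only the control exposed
    + [(0, 0, 1, 1)] * 82  # neither exposed
)

or_mh = mantel_haenszel_or(strata)
print(f"OR_MH = {or_mh:.4f}, b/c = {37 / 16:.4f}")
```

The concordant strata add zero to both sums, so the result reduces to b/c.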
Mantel-Haenszel Test Statistic
(same as McNemar’s)
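$$\frac{\left[\sum_{k=1}^{K} \left(a_k - E(a_k)\right)\right]^2}{\sum_{k=1}^{K} \mathrm{Var}(a_k)} \sim \chi^2_1$$

recall:

$$E(a_k) = \frac{(a_k + b_k)(a_k + c_k)}{n_k}$$

$$\mathrm{Var}(a_k) = \frac{(a_k + b_k)(c_k + d_k)(a_k + c_k)(b_k + d_k)}{n_k^2 (n_k - 1)}$$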
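Concordant cells contribute nothing to the Mantel-Haenszel statistic (observed = expected):

Exposure       Case (MI)   Control
Diabetes           1          1
No diabetes        0          0

$$E(a_k) = \frac{(2)(1)}{2} = 1$$

$$a_k - E(a_k) = 1 - 1 = 0$$

$$\mathrm{Var}(a_k) = \frac{(2)(0)(1)(1)}{2^2 (2-1)} = 0$$

Discordant cells do contribute:

Exposure       Case (MI)   Control
Diabetes           0          1
No diabetes        1          0

$$E(a_k) = \frac{(1)(1)}{2} = \frac{1}{2}$$

$$a_k - E(a_k) = 0 - \frac{1}{2} = -\frac{1}{2}$$

$$\mathrm{Var}(a_k) = \frac{(1)(1)(1)(1)}{2^2 (2-1)} = \frac{1}{4}$$

recall:

$$E(a_k) = \frac{(row_1)(col_1)}{n_k}$$

$$\mathrm{Var}(a_k) = \frac{(row_1)(row_2)(col_1)(col_2)}{n_k^2 (n_k - 1)}$$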
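$$\chi^2_{CMH} = \frac{\left[\sum_{k} \left(a_k - E(a_k)\right)\right]^2}{\sum_{k} \mathrm{Var}(a_k)} = \frac{\left[\sum_{\text{case disc. cells}} .5 \; - \sum_{\text{control disc. cells}} .5\right]^2}{\sum_{\text{disc. cells}} .25}$$

$$= \frac{[.5(b) - .5(c)]^2}{(b+c)(.25)} = \frac{.5^2 (b-c)^2}{.25(b+c)} = \frac{(b-c)^2}{b+c} = \text{McNemar's} \sim \chi^2_1$$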
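The algebra above can be checked numerically by computing the Cochran-Mantel-Haenszel statistic stratum by stratum and comparing it with McNemar's formula. This is a sketch only: `cmh_statistic` is an illustrative name, and each tuple is (a, b, c, d) = (case exposed, control exposed, case unexposed, control unexposed) for one pair, using the MI/diabetes counts.

```python
def cmh_statistic(strata):
    """CMH chi-square (1 df, no continuity correction) over 2x2 strata (a, b, c, d)."""
    observed_minus_expected = 0.0
    variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        expected = (a + b) * (a + c) / n
        observed_minus_expected += a - expected
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n**2 * (n - 1))
    return observed_minus_expected**2 / variance

strata = (
    [(1, 1, 0, 0)] * 9     # concordant, both exposed: contributes 0 and 0
    + [(1, 0, 0, 1)] * 37  # discordant, case exposed: +0.5 and 0.25
    + [(0, 1, 1, 0)] * 16  # discordant, control exposed: -0.5 and 0.25
    + [(0, 0, 1, 1)] * 82  # concordant, neither exposed: contributes 0 and 0
)

cmh = cmh_statistic(strata)
mcnemar = (37 - 16) ** 2 / (37 + 16)
print(f"CMH = {cmh:.4f}, McNemar = {mcnemar:.4f}")
```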
Example: Salmonella
Outbreak in France, 1996
Cases \ Controls   Exposed   None   Total
Exposed              23       23     46
None                  6        7     13
Total                29       30     59

$$OR = \frac{b}{c} = \frac{23}{6} = 3.8$$
In 2x2 table form: Brand A Goat's cheese

Cases \ Controls    Goat's cheese B   None   Total
Goat's cheese B           8            24     32
None                      2            25     27
Total                    10            49     59

$$OR = \frac{b}{c} = \frac{24}{2} = 12.0$$
x 8
Exposure    Case   Control
Brand A       1       1
None          0       0

x 24
Exposure    Case   Control
Brand A       1       0
None          0       1

x 2
Exposure    Case   Control
Brand A       0       1
None          1       0

x 25
Exposure    Case   Control
Brand A       0       0
None          1       1
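Using Agresti notation here!

8 concordant exposed:

$$\mu_{11k} = E(n_{11k}) = \frac{n_{1+k}\, n_{+1k}}{n_{++k}} = \frac{2 \cdot 1}{2} = 1$$

$$\text{Observed}(n_{11k}) - \mu_{11k} = 1 - 1 = 0$$

$$\mathrm{Var}(n_{11k}) = \frac{n_{1+k}\, n_{+1k}\, n_{2+k}\, n_{+2k}}{n_{++k}^2 (n_{++k} - 1)} = \frac{2 \cdot 1 \cdot 0 \cdot 1}{4(2-1)} = 0$$

Summary: 8 concordant-exposed pairs (= strata) contribute nothing to the numerator (observed - expected = 0) and nothing to the denominator (variance = 0).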
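25 concordant unexposed:

$$\mu_{11k} = E(n_{11k}) = \frac{n_{1+k}\, n_{+1k}}{n_{++k}} = \frac{0 \cdot 1}{2} = 0$$

$$\text{Observed}(n_{11k}) - \mu_{11k} = 0 - 0 = 0$$

$$\mathrm{Var}(n_{11k}) = \frac{n_{1+k}\, n_{+1k}\, n_{2+k}\, n_{+2k}}{n_{++k}^2 (n_{++k} - 1)} = \frac{0 \cdot 1 \cdot 2 \cdot 1}{4(2-1)} = 0$$

Summary: 25 concordant-unexposed pairs contribute nothing to the numerator (observed - expected = 0) and nothing to the denominator (variance = 0).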
2 discordant cells favor control:

$$\mu_{11k} = \frac{(1)(1)}{2} = \frac{1}{2}$$

$$\text{Observed}(n_{11k}) - \mu_{11k} = 0 - .5 = -.5$$

$$\mathrm{Var}(n_{11k}) = \frac{n_{1+k}\, n_{+1k}\, n_{2+k}\, n_{+2k}}{n_{++k}^2 (n_{++k} - 1)} = \frac{1 \cdot 1 \cdot 1 \cdot 1}{4(2-1)} = \frac{1}{4}$$

Summary: 2 discordant “control-exposed” pairs contribute -.5 each to the numerator (observed - expected = -.5) and .25 each to the denominator (variance = .25).
24 discordant cells favor case:

$$\mu_{11k} = \frac{(1)(1)}{2} = \frac{1}{2}$$

$$\text{Observed}(n_{11k}) - \mu_{11k} = 1 - .5 = +.5$$

$$\mathrm{Var}(n_{11k}) = \frac{n_{1+k}\, n_{+1k}\, n_{2+k}\, n_{+2k}}{n_{++k}^2 (n_{++k} - 1)} = \frac{1 \cdot 1 \cdot 1 \cdot 1}{4(2-1)} = \frac{1}{4}$$

Summary: 24 discordant “case-exposed” pairs contribute +.5 each to the numerator (observed - expected = +.5) and .25 each to the denominator (variance = .25).
$$\chi^2_{CMH} = \frac{[8(0) + 25(0) + 24(.5) + 2(-.5)]^2}{0 + 0 + 24(.25) + 2(.25)} = \frac{22^2 (.25)}{26(.25)} = \frac{22^2}{26} = \frac{(24-2)^2}{24+2} = \frac{(b-c)^2}{b+c}$$
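The per-pair contributions for the Brand A data can be tallied in code. This is a sketch under stated assumptions: `stratum_contribution` is an illustrative helper, the variable names mirror the Agresti margins (n1p for the first row total, np1 for the first column total), and each pair type is weighted by its count (8, 24, 2, 25) instead of being expanded into 59 separate strata.

```python
def stratum_contribution(n11, n12, n21, n22):
    """Return (observed - expected, variance) of n11 for one 2x2 stratum."""
    n1p, n2p = n11 + n12, n21 + n22   # row totals
    np1, np2 = n11 + n21, n12 + n22   # column totals
    n = n1p + n2p                     # stratum total (2 for a matched pair)
    mu = n1p * np1 / n
    var = n1p * n2p * np1 * np2 / (n**2 * (n - 1))
    return n11 - mu, var

# (number of pairs, cells of one such pair); rows = Brand A / None, columns = case / control
pair_types = [
    (8, (1, 1, 0, 0)),    # concordant exposed
    (24, (1, 0, 0, 1)),   # discordant, case exposed
    (2, (0, 1, 1, 0)),    # discordant, control exposed
    (25, (0, 0, 1, 1)),   # concordant unexposed
]

numerator = sum(count * stratum_contribution(*cells)[0] for count, cells in pair_types)
denominator = sum(count * stratum_contribution(*cells)[1] for count, cells in pair_types)
cmh = numerator**2 / denominator
print(f"numerator = {numerator}, denominator = {denominator}, CMH = {cmh:.2f}")
```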
Diagnostic Testing and
Screening Tests
Characteristics of a diagnostic test
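Sensitivity = Probability that, if you truly have the disease, the diagnostic test will catch it.
Specificity = Probability that, if you truly do not have the disease, the test will be negative.

Disease \ Test    +      -
+                 a      b
-                 c      d
Total            a+c    b+d

$$PPV = \frac{a}{a+c}$$

Among those who test positive, how many truly have the disease?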
Breast cancer (on biopsy) \ Mammography    +      -
+                                           9      1
-                                         109    881
Total                                     118    882

PPV = 9/118 = 7.6%
NPV = 881/882 = 99.9%

What if disease was more prevalent?

Breast cancer (on biopsy) \ Mammography    +      -
+                                          18      2
-                                         108    872
Total                                     126    874

sensitivity = 18/20 = .90
specificity = 872/980 = .89
Sensitivity and specificity are characteristics of the test, so they don't change!

PPV = 18/126 = 14.3%
NPV = 872/874 = 99.8%
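The effect of prevalence on predictive values can be illustrated with a short sketch. The function name `diagnostic_summary` is illustrative. For the lower-prevalence table, the true-positive and false-negative counts (9 and 1) are inferred from the ratios PPV = 9/118 and NPV = 881/882 quoted above, so treat them as an assumption.

```python
def diagnostic_summary(tp, fn, fp, tn):
    """Basic diagnostic-test measures from the four cells of a 2x2 table."""
    total = tp + fn + fp + tn
    return {
        "prevalence": (tp + fn) / total,
        "sensitivity": tp / (tp + fn),   # P(test + | disease)
        "specificity": tn / (tn + fp),   # P(test - | no disease)
        "ppv": tp / (tp + fp),           # P(disease | test +)
        "npv": tn / (tn + fn),           # P(no disease | test -)
    }

lower_prevalence = diagnostic_summary(tp=9, fn=1, fp=109, tn=881)
higher_prevalence = diagnostic_summary(tp=18, fn=2, fp=108, tn=872)

for label, result in [("lower", lower_prevalence), ("higher", higher_prevalence)]:
    print(label, {name: round(value, 3) for name, value in result.items()})
```

Sensitivity and specificity are nearly identical in the two tables, while PPV roughly doubles when prevalence doubles.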