Analysis of Clustered Binary Data

Publisher Informa Healthcare
Encyclopedia of Biopharmaceutical Statistics
Valerie Durkalski
Department of Biostatistics, Bioinformatics and Epidemiology, Medical University of South Carolina,
Charleston, South Carolina, U.S.A.
Online Publication Date: 13 April 2005
To cite this Section: Durkalski, Valerie (2005) 'Analysis of Clustered Binary Data', Encyclopedia of Biopharmaceutical Statistics, 1:1, 1-6
INTRODUCTION
Clinical trial designs often incorporate binary outcomes
(i.e., proportion of responders; success probabilities) or
change continuous outcomes into binary outcomes for
interpretation purposes. When a subject contributes
more than one binary response (paired or unpaired), it
results in clustered data in which the subject is the
cluster and each response within a cluster is a unit. In
this scenario, responses are considered dependent within
a cluster, and independent between clusters.
One can imagine several distinct contexts in which
clustered binary data might arise. Examples include
ophthalmology studies, in which the unit is each eye,
but more generally the number of units may vary
across clusters. This more general formulation then
includes dental studies, in which the unit of analysis
is each tooth, oncology studies, in which the units are
infected nodes or lesions within a patient, and commu-
nity intervention studies, in which the unit is the indi-
vidual within the community. When dealing with two
treatments, it would be possible for some clusters,
and every unit in those clusters, to be assigned to one
treatment group, and for the other clusters, and every
unit in these other clusters, to be assigned to the other
treatment group. For example, to examine the success
of two teaching programs on the graduation rate in
one school, one may randomize classrooms (cluster)
to the treatment and have students be the unit of ana-
lysis. Fear of contamination would preclude one from
randomizing students within the same classroom to
different teaching conditions. This is also the context
for repeated measures studies, in which the cluster is
the patient and the units are the sampling times. Such
a design allows for the separation of effects due to
between-subject variability from effects due to
within-subject variability.[1] It is also possible that
each cluster receives both treatments, but each unit
within each cluster receives one treatment or the other,
such as in some ophthalmology studies, where one eye is
given one treatment and the other is given the comparison
treatment. A third context is one in which each unit
receives both treatments, such as in matched-pair
designs, where clusters (i.e., subjects) act as their own
control and receive both procedures/interventions, or
clusters are matched with each receiving one of the
procedures/interventions.
At the unit level, the correlation between units
within a cluster violates the independence assumption
of many of the statistical methods for analyzing binary
outcomes, such as Pearson's chi-square, McNemar's
test, and logistic regression. This violation may lead
to underestimated standard errors and to p-values that
are too small, thereby yielding misleading statistical
conclusions. The challenge that arises when analyzing
clustered binary data, therefore, is handling the
correlation among units within a cluster.
While cluster summary approaches such as collapsing
data within a cluster or focusing the design on clus-
ter-level analyses are valid in some cases, they do not
use all available data, and so more informative analysis
techniques would be expected to yield more precise
results. Unit-level analyses are the focus of this entry.
METHODS OF ANALYSIS
Methods for the analysis of clustered binary data
(paired and unpaired) are well developed.[2,3] Most
research directly addresses important theoretical and
applied issues regarding the effect of clustering and
the consequential effect on the variance and on the
statistical conclusions. Adjustments for correlated binary
responses within a cluster have been incorporated into
statistical tests using a correlated binomial model,[4]
Fisher's permutation test,[5] binomial ratio estimators,[6,7]
pooled estimators,[8] method of moments estimators,[9]
and the intracluster correlation (ICC).[10] Another
approach that is widely applied is marginal regression
modeling using the generalized estimating equation
(GEE) approach of Zeger and Liang.[11] Although
these comparison methods have been developed for
clustered binary data, not all tests are consistent in
terms of performance. Depending on the cluster sizes
and the correlation structure within clusters, the tests'
performance may vary in terms of size and power.
This entry focuses on the ICC, pooled estimators,
method of moments estimators, and GEE
approaches.
Encyclopedia of Biopharmaceutical Statistics DOI: 10.1081/E-EBS-120029859
Copyright © 2005 by Taylor & Francis. All rights reserved.

To review different analysis methods, three subscripts
are defined in the case of two proportions, as
follows. Let the response variable $Y_{ijk}$ be dichotomous
(1 = success, 0 = failure), where $i$ (= 1, 2) is the
treatment or intervention, $j$ (= 1, 2, ..., $n_k$) is the unit
within the cluster, and $k$ (= 1, 2, ..., $K$) is the cluster.
Let

$$\hat{p}_{ik} = n_{ik}^{-1} \sum_{j=1}^{n_k} Y_{ijk}$$

be the event rate (i.e., proportion of successes) in
group $i$ for cluster $k$; $K$ is the total number of
clusters in the study population, and
$N = \sum_{k=1}^{K} n_k$ is the total number of units across
all the clusters for both treatments. The $n_k$ units in the
$k$th cluster are assumed to be fixed for all $K$ clusters.
With this framework, the data from each unit within
each cluster can be summarized as a standard 2 × 2
contingency table, as both treatment and outcome
vary, and are binary variables (assuming that there
are only two treatments). The frequencies of the
responses per group may be summed over all $K$ clusters.
This data display is presented in Table 1 for
independent bivariate random variables and in Table 2
for dependent bivariate variables (matched-pair data).
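The tabulation described above can be sketched in a few lines of Python. This is only an illustration: the records, the cluster labels, and the helper names `cluster_event_rates` and `pooled_table` are hypothetical and do not come from the entry. Each record carries the treatment $i$, the cluster $k$, and the binary response of one unit.

```python
from collections import defaultdict

# Hypothetical unit-level records: (treatment i, cluster k, response y).
records = [
    (1, "A", 1), (1, "A", 0), (1, "A", 1),
    (1, "B", 1), (1, "B", 1),
    (2, "C", 0), (2, "C", 0), (2, "C", 1),
    (2, "D", 1), (2, "D", 0),
]


def cluster_event_rates(records):
    """Return {(i, k): proportion of successes} per treatment-cluster pair."""
    successes = defaultdict(int)
    units = defaultdict(int)
    for i, k, y in records:
        successes[(i, k)] += y
        units[(i, k)] += 1
    return {key: successes[key] / units[key] for key in units}


def pooled_table(records):
    """Return {i: (successes, failures)} summed over all clusters."""
    table = defaultdict(lambda: [0, 0])
    for i, _, y in records:
        table[i][0 if y == 1 else 1] += 1
    return {i: tuple(counts) for i, counts in table.items()}


rates = cluster_event_rates(records)
table = pooled_table(records)
```

The pooled table discards the cluster labels, which is exactly why a test applied to it alone cannot account for within-cluster correlation.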
Hypothesis Testing
When dealing with bivariate random variables, one for
each of two treatment conditions, it is often of interest
to test the hypothesis of equality of the probabilities of
a positive response across the treatment conditions.
Doing so is fairly straightforward when subjects are
allocated to treatment groups by random allocation,
with either no restrictions at all on the randomization
or with the only restriction being that the group totals
are fixed. In either case (conditioning on the random
group totals in the first case), the design-based analysis
is Fisher's exact test.[12] When clustering is present,
however, this would not be an appropriate analysis,
because it would ignore the clustering.
To avoid underestimating the variance of the
parameter in the presence of clustering, Donner
and colleagues explore a model using the intracluster
correlation coefficient (ICC), which adjusts the
chi-square test for correlation within clustered data.[13,14]
The ICC, originally applied to interobserver agreement
analyses as an alternative to Cohen's kappa, an index
of agreement between two or more raters, is estimated
using the mean square errors of analysis of variance
(ANOVA) for mixed models, where the treatment is
constant and the cluster is random. The estimated
ICC represents the proportion of variation due to
between-cluster differences. Donner presents a detailed
explanation of how a consistent estimate of the ICC is
calculated under the assumption of a constant ICC
across clusters.[5] The ICC adjustment is applied to
the Pearson chi-square test,[13] chi-square test for linear
trend,[14] Mantel-Haenszel test,[15] and McNemar
test.[16]
The test statistic is adjusted by dividing it by
an inflation factor, $C_i = 1 + (m - 1)\hat{\rho}$, where $m$
is the average number of units per cluster for treatment
group $i$ and $\hat{\rho}$ is the estimated ICC for clustering,

$$\hat{\rho} = \frac{\mathrm{BMS} - \mathrm{WMS}}{\mathrm{BMS} + (S_0 - 1)\,\mathrm{WMS}}$$

In this equation, BMS is the mean squared error
between subjects, WMS is the mean squared error
within subjects, and $S_0$ is the adjusted mean cluster
size. The ICC approach for adjustment of a chi-square
test is an extension of a standard chi-square test,
because when $\rho = 0$, the adjusted test reduces to the
standard test with one degree of freedom. Therefore,
if there is only one unit per cluster, the standard test
to compare overall event rates can be adopted,

$$\chi^2 = \sum_{i=1}^{2} \frac{n_i (\hat{p}_i - \hat{p})^2}{C_i \, \hat{p}(1 - \hat{p})}$$

where $\hat{p}_i = \sum_{k=1}^{K} p_{ik}$ and $\hat{p}$ is the overall event rate.
In addition, the test can be extended to the comparison
of more than two treatment groups.[14]
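A minimal Python sketch of the ICC adjustment follows. It assumes that the mean squares BMS and WMS and the adjusted mean cluster size $S_0$ have already been obtained from an ANOVA, and it takes the overall event rate as the unit-weighted mean of the group rates. The function names and all numeric inputs are hypothetical and are not taken from Donner's papers.

```python
def icc_anova(bms, wms, s0):
    """ANOVA estimator of the ICC from between/within mean squares."""
    return (bms - wms) / (bms + (s0 - 1) * wms)


def inflation_factor(m, rho):
    """C_i = 1 + (m - 1) * rho for a group with average cluster size m."""
    return 1 + (m - 1) * rho


def adjusted_chi2(n, p, c):
    """Sum over groups of n_i (p_i - p)^2 / (C_i p (1 - p)).

    n, p, c: per-group unit counts, event rates and inflation factors.
    The overall rate p is the unit-weighted mean of the group rates
    (an assumption of this sketch).
    """
    p_all = sum(ni * pi for ni, pi in zip(n, p)) / sum(n)
    denom = p_all * (1 - p_all)
    return sum(ni * (pi - p_all) ** 2 / (ci * denom)
               for ni, pi, ci in zip(n, p, c))


# Hypothetical inputs, for illustration only.
rho_hat = icc_anova(bms=0.50, wms=0.20, s0=4.0)
c = [inflation_factor(4.0, rho_hat), inflation_factor(4.0, rho_hat)]
unadjusted = adjusted_chi2([50, 50], [0.6, 0.4], [1.0, 1.0])
adjusted = adjusted_chi2([50, 50], [0.6, 0.4], c)
```

Setting every inflation factor to 1 recovers the ordinary Pearson statistic, which mirrors the remark that the adjusted test reduces to the standard test when the ICC is zero.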
Although the ICC appears to be quite adaptable to
adjusting a variety of chi-square tests, this method has
limitations regarding the required size of the study
population and the test assumptions. Donald and Donner[13]
suggest that the number of units per cluster should
be greater than 10 and the ratio of the number of units
to the number of observations per unit should be
greater than two when applying the adjusted chi-square
to Pearson's chi-square test for homogeneity
of proportions. They also note that the method's
assumption, that the correlation between responses
within a cluster is equal, may not be appropriate when
the probability of a correct response varies between
units within a cluster or when the correlation is
dependent upon the size of the cluster. However, Jung
et al.[17] illustrate through simulation studies that the
adjustment method performs well even if the assumption
of a common intracluster correlation is not met,
as long as the intracluster correlations are only
moderately different. Nonetheless, certain data situations
where this assumption may not hold include lesion detection,
where the number of lesions per subject can vary
widely and the detection of a lesion can depend on
several factors, including its size and location.

Table 1 Contingency table for clustered binary data

|         | Response: Success | Response: Failure | Total |
|---------|-------------------|-------------------|-------|
| Group 1 | $\sum_{k=1}^{K}\sum_{j=1}^{n_k} y_{1jk}$ | $\sum_{k=1}^{K}\sum_{j=1}^{n_k} (n_{1jk} - y_{1jk})$ | $\sum_{k=1}^{K}\sum_{j=1}^{n_k} n_{1jk}$ |
| Group 2 | $\sum_{k=1}^{K}\sum_{j=1}^{n_k} y_{2jk}$ | $\sum_{k=1}^{K}\sum_{j=1}^{n_k} (n_{2jk} - y_{2jk})$ | $\sum_{k=1}^{K}\sum_{j=1}^{n_k} n_{2jk}$ |
| Total   | $\sum_{k=1}^{K}\sum_{j=1}^{n_k} (y_{1jk} + y_{2jk})$ | $\sum_{k=1}^{K}\sum_{j=1}^{n_k} [(n_{1jk} - y_{1jk}) + (n_{2jk} - y_{2jk})]$ | $\sum_{k=1}^{K}\sum_{j=1}^{n_k} (n_{1jk} + n_{2jk})$ |
To avoid the limitation inherent in the aforementioned
analysis, other techniques would need to be
implemented. For example, Rao and Scott[6] have
developed a method that avoids having to calculate
an intracluster correlation and thus avoids the assumption
of a constant correlation between units within a
cluster. Based on a sampling technique, an inflation
factor is defined that accounts for clustered observations.
This inflation factor is equal to the variance
estimate of the ratio of positive responses,

$$\operatorname{var}(\hat{p}_i) = \frac{K_i}{K_i - 1} \cdot \frac{\sum_{k=1}^{K} (y_{ik} - n_{ik}\hat{p}_i)^2}{\left(\sum_{k=1}^{K} n_{ik}\right)^2}$$

divided by the estimated binomial variance of the
proportion of positive responses, $n_i^{-1}\hat{p}_i(1 - \hat{p}_i)$. Rao
and Scott refer to the inflation factor as the design
effect and apply it to the standard chi-square test
with $I - 1$ degrees of freedom,

$$\tilde{\chi}^2 = \sum_{i=1}^{I} \frac{(\tilde{y}_i - \tilde{n}_i\tilde{p})^2}{\tilde{n}_i\,\tilde{p}(1 - \tilde{p})}$$

The number of positive responses ($y_i$), number of units
($n_i$) and event rate ($p$) in the chi-square equation are
each adjusted by the design effect and applied to the
one degree of freedom chi-square statistic. The design
effect approach is best utilized when the mean cluster
size differs between comparison groups. The limiting
factor of this sampling approach is that a relatively
large number of clusters is suggested in order to obtain
a consistent ratio estimator.
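The design effect calculation can be sketched as follows. Two points are assumptions of this sketch rather than statements from the entry: the adjusted counts are obtained by dividing $y_i$ and $n_i$ by the group's design effect, and the pooled adjusted rate is the ratio of the summed adjusted counts. The function names and the per-cluster counts are hypothetical.

```python
def ratio_variance(y, n):
    """Variance estimate of the ratio estimator sum(y) / sum(n).

    y[k], n[k]: number of successes and of units in cluster k.
    """
    k_clusters = len(y)
    n_total = sum(n)
    p_hat = sum(y) / n_total
    resid_sq = sum((yk - nk * p_hat) ** 2 for yk, nk in zip(y, n))
    return k_clusters / (k_clusters - 1) * resid_sq / n_total ** 2


def design_effect(y, n):
    """Ratio variance divided by the binomial variance p(1 - p) / n."""
    n_total = sum(n)
    p_hat = sum(y) / n_total
    return ratio_variance(y, n) / (p_hat * (1 - p_hat) / n_total)


def rao_scott_chi2(groups):
    """Chi-square on design-effect-adjusted counts; groups = [(y, n), ...]."""
    y_adj, n_adj = [], []
    for y, n in groups:
        d = design_effect(y, n)
        y_adj.append(sum(y) / d)
        n_adj.append(sum(n) / d)
    p_pool = sum(y_adj) / sum(n_adj)  # assumed pooling rule
    return sum((yi - ni * p_pool) ** 2 / (ni * p_pool * (1 - p_pool))
               for yi, ni in zip(y_adj, n_adj))


# Hypothetical per-cluster counts for two treatment groups.
group1 = ([3, 1, 4, 0, 2], [4, 3, 5, 2, 3])
group2 = ([1, 0, 2, 1, 1], [4, 2, 5, 3, 3])
chi2 = rao_scott_chi2([group1, group2])
```

The sketch breaks down when a group's event rate is exactly 0 or 1, or when every cluster in a group has the same event rate, because the design effect is then undefined or zero.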
Following a similar approach, Obuchowski[8] has
developed a test for clustered matched-pair data that
pools the probability of a success across all clusters
and all tests, where $p$ in the above equation is replaced
with $\bar{p}$, which is the average of the proportions of
success for each procedure: $\bar{p} = (\hat{p}_i + \hat{p}_{i'})/2$. The
homogeneity hypothesis and the numerator of the test
statistic remain the same as in the McNemar test,
except that the overall sample proportion of positive
units detected by procedure $i$ ($\hat{p}_i$) is estimated by
counting the number of positive responses by procedure
$i$ summed over all $K$ clusters, e.g.,
$\sum_{k=1}^{K} (a_k + b_k)$ for Procedure 1 and
$\sum_{k=1}^{K} (a_k + c_k)$ for Procedure 2.
The test is asymptotically distributed as a chi-square
with one degree of freedom under the null hypothesis,

$$\chi^2_0 = \frac{(\hat{p}_i - \hat{p}_{i'})^2}{\operatorname{var}_{\bar{p}}(\hat{p}_i - \hat{p}_{i'})}$$

where $\operatorname{var}_{\bar{p}}(\hat{p}_i - \hat{p}_{i'}) = \operatorname{var}_{\bar{p}}(\hat{p}_i) + \operatorname{var}_{\bar{p}}(\hat{p}_{i'}) - 2\operatorname{cov}_{\bar{p}}(\hat{p}_i, \hat{p}_{i'})$.
Obuchowski compares this method to
the ICC adjustment. Under specific conditions (i.e., a
cluster size 5, three different correlation structures,
and various sample sizes), the proposed method
performs similarly to the ICC in terms of maintaining a
Type I error rate below 0.05. The main difference
between the two procedures is that the sampling
approach does not take into account within-cluster
correlation.
Table 2 Contingency table for clustered matched-pair binary data

|                      | Procedure 1: Success | Procedure 1: Failure | Total |
|----------------------|----------------------|----------------------|-------|
| Procedure 2: Success | $\sum_{k=1}^{K}\sum_{j=1}^{n_k} a_{jk}$ | $\sum_{k=1}^{K}\sum_{j=1}^{n_k} b_{jk}$ | $\sum_{k=1}^{K}\sum_{j=1}^{n_k} (a_{jk} + b_{jk})$ |
| Procedure 2: Failure | $\sum_{k=1}^{K}\sum_{j=1}^{n_k} c_{jk}$ | $\sum_{k=1}^{K}\sum_{j=1}^{n_k} d_{jk}$ | $\sum_{k=1}^{K}\sum_{j=1}^{n_k} (c_{jk} + d_{jk})$ |
| Total                | $\sum_{k=1}^{K}\sum_{j=1}^{n_k} (a_{jk} + c_{jk})$ | $\sum_{k=1}^{K}\sum_{j=1}^{n_k} (b_{jk} + d_{jk})$ | $\sum_{k=1}^{K} n_k = N$ |

The joint probabilities of success and failure between the two procedures are
$p_{11} = N^{-1} \sum_{k=1}^{K} E(a_k)$, $p_{10} = N^{-1} \sum_{k=1}^{K} E(b_k)$,
$p_{01} = N^{-1} \sum_{k=1}^{K} E(c_k)$, and $p_{00} = N^{-1} \sum_{k=1}^{K} E(d_k)$.

An alternative approach proposed by Durkalski
and colleagues utilizes a method of moments approach
for the analysis of clustered matched-pair data.[9] The
proposed test statistic is

$$\chi^2_V = \frac{\left[\sum_{k=1}^{K} (1/n_k)(b_k - c_k)\right]^2}{\sum_{k=1}^{K} \left[(b_k - c_k)/n_k\right]^2}$$

because under the null hypothesis, the estimated
variance of the difference in success rates is

$$\operatorname{var}(\hat{p}_{10} - \hat{p}_{01}) = \frac{1}{K^2} \sum_{k=1}^{K} \left(\frac{b_k - c_k}{n_k}\right)^2$$

This estimate of the variance is consistent according to
large sample theory. Based on simulations, the method
of moments approach performs as well in terms of size
and power as the McNemar test adjusted by the ICC
and Obuchowski's approach for clustered matched-pair
data. Moreover, this method has been extended to
account for non-inferiority study designs.[18]
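The method of moments statistic is simple enough to compute by hand, and the following Python sketch does so for the per-cluster discordant counts $b_k$, $c_k$ and cluster sizes $n_k$ listed in Table 3 of this entry. The function names are hypothetical. The p-value helper relies on the identity that, for one degree of freedom, the chi-square upper tail equals `erfc(sqrt(x / 2))`.

```python
import math


def method_of_moments_chi2(b, c, n):
    """[sum_k (b_k - c_k) / n_k]^2 / sum_k [(b_k - c_k) / n_k]^2.

    Raises ZeroDivisionError if no cluster has b_k != c_k.
    """
    diffs = [(bk - ck) / nk for bk, ck, nk in zip(b, c, n)]
    return sum(diffs) ** 2 / sum(d * d for d in diffs)


def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-square variate with 1 df."""
    return math.erfc(math.sqrt(x / 2))


# Per-cluster values from Table 3 (21 patients).
n = [3, 3, 3, 1, 3, 4, 3, 2, 2, 1, 3, 2, 3, 2, 2, 3, 3, 3, 2, 1, 2]
b = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
c = [2, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0, 0, 0]

stat = method_of_moments_chi2(b, c, n)
p_value = chi2_1df_pvalue(stat)
print(round(stat, 2), round(p_value, 3))  # 2.32 0.128
```

The printed values agree with the test result the entry reports for this approach in its Example section.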
Generalized Estimating Equations
Although adjusting the chi-square test for the equality
of probabilities is popular, more complex analyses that
incorporate covariate effects may also be of interest.
These analyses require sophisticated models that
account for the correlation within clusters. A popular
method due to the availability of statistical software
is the application of generalized linear models.[19]
Based on this model, marginal regression modeling
using GEEs was developed as a semiparametric
approach to fitting logistic regression models for
binary clustered data.[11] The logistic regression model
using the GEE estimating function yields results
similar to those of the adjusted chi-square tests previously
discussed in this entry when covariates are not present.
This model is defined as

$$\operatorname{logit} \Pr(Y_{ijk} = 1) = \beta_0 + \beta_1 x_{ijk}$$

where $x_{ijk}$ identifies the $i$th treatment variable from the
$j$th unit of the $k$th cluster (in the case of two
treatments) and can be run using PROC GENMOD in
SAS.[20] The p-value is generated from a comparison
of the log odds ratio to the estimated variance. Because
clustering affects the standard errors of the parameter
estimates rather than the parameter estimates
themselves, the parameter estimates can be obtained by
running a regression analysis on each response. The
logit link expresses the linear relationship between a
cluster's responses and the corresponding covariates.
The correlation within a cluster is accounted for
in the variance-covariance matrix. To estimate the
variance of the response, a specified working
correlation matrix that defines the association among
units within a cluster substitutes for the true, often
unknown, correlation matrix. Although the working
correlation matrix may be mis-specified, which could
possibly result in a decrease of efficiency, the GEE
method can produce consistent estimates.[11]
Furthermore, the decreased efficiency may be minimal if the
number of clusters is large.[21] Prentice[22] extends
Zeger and Liang's model to incorporate the modeling
of correlations within a cluster in order to improve
upon the efficiency of the GEE estimates of the
regression parameters. Although the GEE method is more
complex than the straightforward comparison of event
rates, the attractiveness of this approach is that it easily
incorporates adjustments for covariates when needed.
Methods for binary regression using clustered data
deserve an entry of their own and are not discussed
further.
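To make the point about standard errors concrete, the sketch below fits the two-treatment model under an independence working correlation. That is one particular choice, not the only one the GEE approach allows. With a single binary covariate the model is saturated, so the point estimate is the ordinary log odds ratio, while the cluster-robust (sandwich) variance is built from per-cluster score sums. The function name, the record layout, and the data are hypothetical, no small-sample correction is applied, and this is not a substitute for PROC GENMOD.

```python
import math
from collections import defaultdict


def logit(q):
    return math.log(q / (1 - q))


def gee_independence(records):
    """records: list of (cluster_id, x, y) with x and y coded 0/1.

    Fits logit Pr(Y = 1) = b0 + b1 * x under an independence working
    correlation; returns (b1, naive_se, robust_se) for the treatment effect.
    Both arms must have an observed event rate strictly between 0 and 1.
    """
    n, s = [0, 0], [0, 0]
    for _, x, y in records:
        n[x] += 1
        s[x] += y
    p = [s[0] / n[0], s[1] / n[1]]   # fitted probabilities per arm
    b1 = logit(p[1]) - logit(p[0])   # log odds ratio

    # Information matrix A = [[w0 + w1, w1], [w1, w1]]; only the second
    # row of its inverse is needed for the variance of b1.
    w0 = n[0] * p[0] * (1 - p[0])
    w1 = n[1] * p[1] * (1 - p[1])
    row = (-1 / w0, 1 / w0 + 1 / w1)
    naive_se = math.sqrt(row[1])

    # Sum the score contributions (1, x) * (y - mu) within each cluster.
    score = defaultdict(lambda: [0.0, 0.0])
    for cid, x, y in records:
        r = y - p[x]
        score[cid][0] += r
        score[cid][1] += r * x
    robust_var = sum((row[0] * u[0] + row[1] * u[1]) ** 2
                     for u in score.values())
    return b1, naive_se, math.sqrt(robust_var)


# Hypothetical data: 8 clusters, each contributing two identical responses.
outcomes = [(0, 1), (0, 0), (0, 0), (0, 0), (1, 1), (1, 1), (1, 1), (1, 0)]
clustered = [(cid, x, y) for cid, (x, y) in enumerate(outcomes) for _ in range(2)]
b1, naive_se, robust_se = gee_independence(clustered)
```

In this extreme case, where the second unit in each cluster adds no information, the robust standard error exceeds the naive one by a factor of the square root of 2, while the estimate of the log odds ratio is unchanged.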
EXAMPLE
To illustrate the clinical application of commonly used
methods for analyzing clustered binary data and the
importance of accounting for the clustered nature of
the data, Donner and Klar[23] consider data collected
from schools randomly allocated to one of two
interventions. The outcome of interest is the proportion
of children who use smokeless tobacco after two years
of follow up. The performance of various methods
for testing the equality of clustered binary data is
observed, including the test statistics and p-values of
the unadjusted Pearson chi-square, the ICC inflation
factor, the design effect inflation factor, and the GEE
approach. The test statistics and p-values illustrate that
statistical conclusions can be false when clustering is
ignored (using the standard chi-square test), while all
methods that adjust for clustering give relatively the
same conclusions with some variability in the actual
test values and significance levels.
For matched-pair data, a data set containing
clustered matched-pair data that appeared in
Obuchowski's paper[8] is assessed with the unadjusted
McNemar test, the ICC adjustment, pooled estimators,
and the method of moments approach. A trial in
diagnostic methods for hyperparathyroidism is designed to
compare the sensitivity and specificity of two
techniques, positron emission tomography (PET) and a
single photon emission CT (SPECT) scan. The data
consist of 21 patients whose glands were examined
for the presence of hyperparathyroidism (Table 3).
Of the 21 patients, a total of 72 glands were evaluated
by both diagnostic tools. The specificity of the two
scanners is considered to be of interest and, therefore,
only the glands that are confirmed negative by surgery
(considered the gold standard) are evaluated. Of the 72
glands among the 21 patients, a total of 51 glands were
confirmed negative. The estimated intracluster
correlation is equal to 0.46. Obuchowski's, Donner's, and
Durkalski's test results are chi-square values of 2.86
(p = 0.091), 3.66 (p = 0.056), and 2.32 (p = 0.128),
respectively. The unadjusted McNemar test statistic is
4.5 (p = 0.034).
All the tests that account for clustering fail to reject
the null hypothesis of no difference between the two
procedures (at the 0.05 level), whereas the McNemar
test (which does not account for the clustering)
concludes that a statistically significant difference does
exist (at the 0.05 level) in the specificity of PET and
SPECT. This is not surprising, because as seen in
Donner and Klar's comparisons, accounting for
clustering means assigning less weight to multiple
observations from the same cluster. Without doing this, these
multiple observations would have more influence than
they perhaps should, and this could easily lead to
pseudo-power, or a false rejection of a true null hypothesis.
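The contrast between the unadjusted and a cluster-adjusted analysis can be checked against the Table 3 data with the short Python sketch below. The McNemar statistic uses only the pooled discordant counts. For the pooled-estimator test, the entry does not spell out the variance and covariance terms, so the explicit forms used here (squared and cross-product deviations of cluster totals from $n_k\bar{p}$, scaled by $K/[(K-1)N^2]$) are an assumption of this sketch, adopted because they follow the ratio-estimator pattern described earlier. The function names are hypothetical.

```python
import math

# Per-cluster values from Table 3 (21 patients, 51 confirmed-negative glands).
n  = [3, 3, 3, 1, 3, 4, 3, 2, 2, 1, 3, 2, 3, 2, 2, 3, 3, 3, 2, 1, 2]
y1 = [0, 2, 3, 1, 2, 4, 3, 2, 2, 1, 2, 2, 3, 2, 0, 2, 2, 2, 2, 1, 2]
y2 = [2, 3, 3, 1, 3, 4, 3, 2, 1, 1, 2, 2, 3, 2, 2, 2, 2, 3, 2, 1, 2]
b_total, c_total = 1, 7  # pooled discordant counts b and c


def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-square variate with 1 df."""
    return math.erfc(math.sqrt(x / 2))


def mcnemar_chi2(b, c):
    """Unadjusted McNemar statistic, which ignores the clustering."""
    return (b - c) ** 2 / (b + c)


def pooled_chi2(y1, y2, n):
    """Pooled-estimator statistic under the assumed variance forms."""
    k, n_total = len(n), sum(n)
    p1, p2 = sum(y1) / n_total, sum(y2) / n_total
    p_bar = (p1 + p2) / 2
    scale = k / ((k - 1) * n_total ** 2)
    r1 = [y - nk * p_bar for y, nk in zip(y1, n)]
    r2 = [y - nk * p_bar for y, nk in zip(y2, n)]
    var1 = scale * sum(r * r for r in r1)
    var2 = scale * sum(r * r for r in r2)
    cov = scale * sum(u * v for u, v in zip(r1, r2))
    return (p1 - p2) ** 2 / (var1 + var2 - 2 * cov)


unadjusted = mcnemar_chi2(b_total, c_total)
pooled = pooled_chi2(y1, y2, n)
print(unadjusted, round(chi2_1df_pvalue(unadjusted), 3))     # 4.5 0.034
print(round(pooled, 2), round(chi2_1df_pvalue(pooled), 3))   # 2.86 0.091
```

Under these assumptions the sketch returns the values reported above for the unadjusted McNemar test and for Obuchowski's test, and only the unadjusted statistic crosses the 0.05 threshold.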
CONCLUSIONS
Clustered binary data are prevalent in medical
research. Ophthalmology studies, dental research,
oncology studies, and community research programs
often involve multiple layers, which violate the basic
assumption of independence for common statistical
methods. Methods for analyzing clustered binary data
are available and continue to be developed and
enhanced. All methods account for the within-cluster
correlation; yet, it is the approach to doing so that
makes them different. The appropriate strategy is
dependent on the research question being explored
and the data structure in terms of cluster size and
cluster covariates.
Due to the availability of current statistical
software, the majority of methods discussed in this entry
are simple to implement in practice. Therefore, the
collapsing of the data to a single-level model should
be avoided. Ignoring the cluster effect or collapsing
of data has been shown throughout the literature
to create biased estimates, which can lead to false
statistical conclusions (as was the case in our
examples). If a simple test of the equality of event rates
is being performed and a relatively small difference
in mean cluster size between comparison groups is
present, then the ICC approach is convenient to
implement and can be extended to a number of data
scenarios, including stratified or matched-pair data.
For more complex study designs, such as those that
incorporate covariates or multiple cluster levels
(perhaps classrooms within school districts or designs that
have a combination of clustered and longitudinal
data), hierarchical modeling approaches are available
and continue to be explored.[1,3] It is worthwhile to
mention that during the planning stages of a study that
involves clustered binary data, the statistician needs to
consider inflating the sample size to offset the loss of
information that occurs due to the clustering. Lee
and Dubin offer considerations for computing these
inflation factors in practice.[24]
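As a rough planning illustration, the sketch below inflates a sample size computed under independence by the factor $1 + (m - 1)\rho$, the same form as the inflation factor $C_i$ used earlier in this entry. This is an assumption made for illustration, not the procedure of Lee and Dubin, and the function name and numeric inputs are hypothetical.

```python
import math


def inflated_units(n_independent, mean_cluster_size, rho):
    """Units needed under clustering, given the number needed if all
    units were independent, using the factor 1 + (m - 1) * rho."""
    factor = 1 + (mean_cluster_size - 1) * rho
    # Subtract a tiny tolerance so floating-point noise does not
    # push an exact integer up to the next one.
    return math.ceil(n_independent * factor - 1e-9)


# Hypothetical planning inputs: 200 independent units required,
# clusters of 5 units on average, anticipated ICC of 0.25.
units_needed = inflated_units(200, 5, 0.25)
```

With an ICC of zero the requirement is unchanged, and it grows linearly with both the mean cluster size and the ICC.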
ACKNOWLEDGMENT
The author would like to thank Dr. Vance Berger
for his constructive comments, which have added to
this entry.
REFERENCES
1. Neuhaus, J. Assessing change with longitudinal and clustered binary data. Annu. Rev. Public Health 2001, 22, 115-128.
2. Ashby, M.; Neuhaus, J.M.; Hauck, W.W.; Bacchetti, P.; Heilbron, D.C.; Jewell, N.P.; Segal, M.R.; Fusaro, R.E. An annotated bibliography of methods for analyzing correlated categorical data. Stat. Med. 1992, 11, 67-99.
3. Pendergast, J.F.; Gange, S.J.; Newton, M.A.; Lindstrom, M.J.; Palta, M.; Fisher, M.R. A survey of methods for analyzing clustered binary response data. Int. Stat. Rev. 1996, 64 (1), 89-118.
4. Hujoel, P.P.; Moulton, L.H.; Loesche, W.J. Estimation of sensitivity and specificity of site-specific diagnostic tests. J. Periodont. Res. 1990, 25, 193-196.
5. Donner, A. Statistical methodology for paired cluster designs. Am. J. Epidemiol. 1987, 126 (5), 972-979.
6. Rao, J.N.K.; Scott, A.J. A simple method for the analysis of clustered binary data. Biometrics 1992, 48, 577-585.
7. Lee, E.W.; Dubin, N. Estimation and sample size considerations for clustered binary responses. Stat. Med. 1994, 13, 1241-1252.
8. Obuchowski, N.A. On the comparison of correlated proportions for clustered data. Stat. Med. 1998, 17, 1495-1507.
9. Durkalski, V.; Palesch, Y.; Lipsitz, S.; Rust, P. The analysis of clustered matched-pair data. Stat. Med. 2003, 22, 2417-2428.
10. Donner, A. The analysis of intraclass correlation in multiple samples. Ann. Hum. Genet. 1985, 49, 75-82.
11. Zeger, S.L.; Liang, K. Longitudinal data analysis for discrete and continuous outcomes. Biometrics 1986, 42, 121-130.
12. Berger, V.W. Pros and cons of permutation tests in clinical trials. Stat. Med. 2000, 19, 1319-1328.
13. Donner, A.; Donald, A. The statistical analysis of multiple binary measurements. J. Clin. Epidemiol. 1988, 41 (9), 899-905.
14. Donner, A.; Banting, D. Adjustment of frequently used chi-square procedures for the effect of site-to-site dependencies in the analysis of dental data. J. Dent. Res. 1989, 68 (9), 1350-1354.
15. Donald, A.; Donner, A. Adjustments to the Mantel-Haenszel chi-square statistic and odds ratio variance estimator when the data are clustered. Stat. Med. 1987, 6, 491-499.
16. Donner, A.; Eliasziw, M. Application of matched pair procedures to site-specific data in periodontal research. J. Clin. Periodontol. 1991, 18, 755-759.
17. Jung, S.; Ahn, C.; Donner, A. Evaluation of an adjusted chi-square statistic as applied to observational studies involving clustered binary data. Stat. Med. 2001, 20, 2149-2161.
18. Durkalski, V.; Palesch, Y.; Lipsitz, S.; Rust, P. The analysis of clustered matched-pair data under a non-inferiority study design. Stat. Med. 2003, 22, 279-290.
19. Nelder, J.A.; Wedderburn, R.W.M. Generalized linear models. J. R. Stat. Soc. A 1972, 135, 370-384.
20. SAS Statistical Software V8 or V9, Cary, North Carolina.
21. Ahn, C. Statistical methods for the estimation of sensitivity and specificity of site-specific diagnostic tests. J. Periodont. Res. 1997, 32, 351-354.
22. Prentice, R.L. Correlated binary regression with covariates specific to each binary observation. Biometrics 1988, 44, 1033-1048.
23. Donner, A.; Klar, N. Analysis of binary outcomes. In Cluster Randomization Trials; Arnold, Hodder Headline Group: London, 2000; 86-95 (Chapter 6).
24. Lee, E.W.; Dubin, N. Estimation and sample size considerations for clustered binary responses. Stat. Med. 1994, 13, 1241-1252.

Table 3 Hyperparathyroidism data

| $k$ | $n_k$ | $y_{ik}$ | $y_{i'k}$ | $a_k$ | $b_k$ | $c_k$ | $d_k$ |
|-----|-------|----------|-----------|-------|-------|-------|-------|
| 1   | 3 | 0 | 2 | 0 | 0 | 2 | 1 |
| 2   | 3 | 2 | 3 | 2 | 0 | 1 | 0 |
| 3   | 3 | 3 | 3 | 3 | 0 | 0 | 0 |
| 4   | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
| 5   | 3 | 2 | 3 | 2 | 0 | 1 | 0 |
| 6   | 4 | 4 | 4 | 4 | 0 | 0 | 0 |
| 7   | 3 | 3 | 3 | 3 | 0 | 0 | 0 |
| 8   | 2 | 2 | 2 | 2 | 0 | 0 | 0 |
| 9   | 2 | 2 | 1 | 1 | 1 | 0 | 0 |
| 10  | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
| 11  | 3 | 2 | 2 | 2 | 0 | 0 | 1 |
| 12  | 2 | 2 | 2 | 2 | 0 | 0 | 0 |
| 13  | 3 | 3 | 3 | 3 | 0 | 0 | 0 |
| 14  | 2 | 2 | 2 | 2 | 0 | 0 | 0 |
| 15  | 2 | 0 | 2 | 0 | 0 | 2 | 0 |
| 16  | 3 | 2 | 2 | 2 | 0 | 0 | 1 |
| 17  | 3 | 2 | 2 | 2 | 0 | 0 | 1 |
| 18  | 3 | 2 | 3 | 2 | 0 | 1 | 0 |
| 19  | 2 | 2 | 2 | 2 | 0 | 0 | 0 |
| 20  | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
| 21  | 2 | 2 | 2 | 2 | 0 | 0 | 0 |

$K = 21$, $N = \sum_{k=1}^{K} n_k = 51$, $y_{ik} = \sum_{j=1}^{n_k} y_{ijk}$ (column total 40), $y_{i'k} = \sum_{j=1}^{n_k} y_{i'jk}$ (column total 46),
$a = \sum_{k=1}^{K}\sum_{j=1}^{n_k} a_{jk} = 39$, $b = \sum_{k=1}^{K}\sum_{j=1}^{n_k} b_{jk} = 1$,
$c = \sum_{k=1}^{K}\sum_{j=1}^{n_k} c_{jk} = 7$, $d = \sum_{k=1}^{K}\sum_{j=1}^{n_k} d_{jk} = 4$.
(From Ref. [8].)