Doctor of Philosophy
Department of Biostatistics
UMI 3554514
Published by ProQuest LLC (2013). Copyright in the Dissertation held by the Author.
Microform Edition ProQuest LLC.
All rights reserved. This work is protected against
unauthorized copying under Title 17, United States Code
ProQuest LLC.
789 East Eisenhower Parkway
P.O. Box 1346
Ann Arbor, MI 48106 - 1346
Dissertation Committee
Albert Vexler, Ph.D. (Advisor)
Alan Hutson, Ph.D.
Chang-Xing Ma, Ph.D.
Jihnhee Yu, Ph.D.
ACKNOWLEDGEMENTS
I would like to express my gratitude to all the people who have helped me along the journey
of my Ph.D. study.
Firstly, I would like to express my deepest appreciation to my advisor, Professor
Albert Vexler, for his guidance and encouragement. This dissertation would not have
been possible without him. I am also deeply thankful to Professors Alan Hutson, Chang-Xing Ma, and Jihnhee Yu, who kindly agreed to serve on my dissertation committee, for
their interest and valuable comments on my work.
I would like to thank all of my professors, fellow students, and department staff for
their kind help and support during my graduate study at the University at Buffalo. My
sincere thanks also go to Dr. Gregory Gurevich and Dr. Yaakov Malinovsky, as well as to the reviewers of the manuscripts submitted to journals, for their constructive suggestions that improved the quality of the manuscripts and this dissertation.
I would also like to thank all of my friends for always being there with a word of encouragement or a listening ear.
Last but not least, I would like to thank my beloved parents, brother and sisters.
Without their strong love and support, I would not have had a chance to go abroad to
pursue my graduate study. I love my family beyond expression for their support and
encouragement along the way. I would like to dedicate this work to them.
TABLE OF CONTENTS
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1. Introduction
Chapter 2. Estimation and testing based on data subject to measurement errors: from parametric to non-parametric likelihood methods
    2.1 Introduction
    2.2 Parametric inferences
        2.2.1 Parametric likelihood functions
            2.2.1.1 Parametric likelihood based on repeated data
            2.2.1.2 Parametric likelihood based on pooled and unpooled data
        2.2.2 Normal case
            2.2.2.1 Maximum likelihood estimators based on repeated measures
            2.2.2.2 Maximum likelihood estimators following the hybrid design
    5.2.2.1 Test 1
    5.2.2.2 Test 2
    5.2.2.3 Test 3
    A.3.3.1 Maximum likelihood ratio test statistic for test 1
    A.3.3.2 Maximum likelihood ratio test statistic for test 2
    A.3.3.3 Maximum likelihood ratio test statistic for test 3
LIST OF TABLES
2.1 The Monte Carlo evaluations of the maximum likelihood estimates based on repeated measurements and the hybrid design
2.2 Coverage probabilities and confidence intervals based on repeated measurements and the hybrid design
2.3 The Monte Carlo type I errors and powers of the empirical likelihood ratio test statistics (2.3) and (2.7), including the hybrid design
2.4 The Monte Carlo type I errors and powers of the empirical likelihood ratio test statistics (2.3) and (2.7), including the hybrid design
2.5
3.3 The Monte Carlo Type I errors of the proposed test with critical values obtained by Proposition 2.2
3.4 An empirical power comparison for the density-based empirical likelihood ratio test
3.5 Test for the IG based on the data introduced in Folks and Chhikara (1978)
3.6 Bootstrap type proportion of rejection of the tests for the IG at a 5% level of significance
4.1
4.2 Type I error control of the proposed test statistic (4.6)
4.3 Designs of the alternative hypothesis to be applied to the following Monte Carlo evaluations of the powers of the proposed test (4.6)
4.4 Proportion of rejection based on the bootstrap method for each considered test
5.3 The Monte Carlo powers of Test 1 by (5.9) vs. the MLR test for different sample sizes
5.4 The Monte Carlo powers of Test 2 by (5.15) vs. the MLR test for different sample sizes
5.5 The Monte Carlo powers of Test 3 by (5.17) vs. the MLR test for different sample sizes
5.6 The Monte Carlo type I errors of the MLR tests
5.7 The Monte Carlo powers of the proposed test (5.9) vs. the combined nonparametric test (the two Wilcoxon signed rank tests and one Kolmogorov-Smirnov test)
5.8 The Monte Carlo powers of the proposed test (5.15) vs. the combined nonparametric test (the one Wilcoxon signed rank test and one Kolmogorov-Smirnov test)
5.9 The Monte Carlo powers of the proposed test (5.17) vs. the Wilcoxon signed rank test
5.10 The proportions of rejections based on the bootstrap method for each considered test
6.1 Means and standard deviations of the %TS data and the TIBC data in each group
LIST OF FIGURES
2.1 The histogram and the normal Q-Q plot of cholesterol data
4.1 3-D plots of powers of the considered tests via all 28 designs (K1-K28) with different sample sizes
4.2 Histograms of the differences in CDRS-Rts at baseline and endpoint in group 1
4.3 Plot of sample sizes vs. p-value using a bootstrap method
5.1 Histograms of CDRS-Rts related to the baseline and endpoint in group 1 and in group 2, with sample sizes (9, 6), that were sampled from the original data set
6.1 Plots and histograms of %TS data (left-hand side) and of TIBC data (right-hand side)
6.2 Histograms of %TS data in each group
6.3 Histograms of TIBC data in each group
ABSTRACT
The likelihood approach provides a basis for many important procedures and methods in
statistical inference. When data distributions are completely known, the parametric
likelihood approach is unarguably a powerful statistical tool that can provide optimal
statistical inference. In such cases, by virtue of the Neyman-Pearson lemma, the likelihood
ratio tests are the most powerful decision rules. The parametric likelihood methods cannot
be applied properly if assumptions on the forms of distributions of data do not hold. Often,
in the context of likelihood applications, the use of the misspecified parametric forms of
data distributions may result in inaccurate statistical conclusions. The empirical likelihood
(EL) methodology has been well addressed in the literature as a nonparametric counterpart
of the powerful parametric likelihood approach. The objective of this dissertation is to
develop several powerful parametric likelihood methods and nonparametric approaches
using the EL concept. Measurement error (ME) problems can cause bias or inconsistency
of statistical inferences. When investigators are unable to obtain correct measurements of
biological assays, special techniques to quantify MEs need to be applied. In this
dissertation, we present both parametric likelihood and EL methods for dealing with data
subject to MEs based on repeated measures sampling strategies and hybrid sampling
designs (a mixture of pooled and unpooled data). Utilizing the density-based EL
methodology, we also propose different efficient nonparametric tests that approximate
most powerful Neyman-Pearson test statistics. We first introduce the EL ratio based
goodness-of-fit test for the inverse Gaussian model. Then we extend and adapt the density-based EL approach to compare two samples based on paired data. We present exact
nonparametric tests for composite hypotheses to detect various differences related to
treatment effects in study groups based on paired measurements. Next, we review and
extend parametric retrospective and sequential Shiryaev-Roberts based policies, examining the non-asymptotic optimality properties of the procedures in different contexts. We propose techniques to construct novel and efficient retrospective tests for multiple change-point detection problems. Finally, directions for future work are discussed.
CHAPTER 1
INTRODUCTION
Likelihood methods are powerful statistical tools for parametric statistical inference. The
parametric likelihood methodology provides optimal statistical inferential procedures
when data distributions are known. However, parametric forms of data distributions are
oftentimes unknown. When the key assumptions regarding the underlying distribution of
data are not met, the parametric likelihood approaches may be extremely biased and
inefficient when compared to their robust nonparametric counterparts. In the
nonparametric context, the classical empirical likelihood (EL) approach is often applied
in order to efficiently approximate properties of parametric likelihoods, using an
approach based on substituting empirical distribution functions for their population
counterparts. Thus, the objective of this work is to develop efficient parametric likelihood
approaches as well as nonparametric methods using the EL concept to deal with problems
arising from epidemiological and medical studies.
The EL ratios were first used by Thomas & Grunkemeier (1975) in the context of
estimating survival probabilities. Owen (1988, 1990), building on earlier work of Thomas
& Grunkemeier (1975), introduced the EL approach for constructing confidence regions
in nonparametric problems. The approach has since been extended to various situations. For
example, Owen (1991) and Chen (1993, 1994) extended the method to regression
problems, Chen and Hall (1993) considered the case of quantiles, and Kolaczyk (1994)
made the extension to generalized linear models. Qin and Lawless (1994) have shown
that the EL methodology can be utilized to make inference on parameters of interest defined through general estimating equations.

The pooling design, on the other hand, involves randomly grouping and physically mixing individual biological samples. Assays are then performed on the small number of pooled samples. Therefore, the pooling design reduces the number of measurements without ignoring any individual
biospecimens. Schisterman et al. (2010) proposed a pooled-unpooled hybrid design that
combines the advantages of both pooling and random sampling strategies. The hybrid
designs are implemented by assaying some randomly selected individual biospecimens
and one pooled set that is constituted by the remaining individual biospecimens. The
pooled sample in the hybrid design is used for estimation of the mean of the biomarker
distribution, whereas the random sample proportion in the hybrid design (unpooled data)
is utilized for estimation of the variance. One advantage of using the hybrid design is that
by utilizing a pooled sample, one can estimate not only mean and variance of the
biomarker distribution, but also measurement error, without requiring repeated measures.
The repeated measurements sampling strategy has been well addressed in the literature
under parametric assumptions. In the context of the hybrid sampling design, Schisterman
et al. (2010) have proposed and evaluated a parametric approach for normally distributed
data in the presence of measurement errors. However, nonparametric methods have not
been well investigated to analyze repeated measures data or pooled data subject to
measurement errors. In Chapter 2, we consider general cases of parametric and
nonparametric assumptions, comparing efficiency of pooled-unpooled samples and data
consisting of repeated measures. The applications of the proposed methods are illustrated
using the cholesterol biomarker data from a study of myocardial infarction.
The classical EL approach reflects distribution-based interpretation of the notion of
likelihood (Owen, 2001). Note that the Neyman-Pearson concept to show the optimality
of statistical decision rules utilizes density functions structures of likelihood type tests.
Recently, Vexler and Gurevich (2010) proposed the density-based EL methodology to
construct efficient nonparametric tests that approximate Neyman-Pearson test statistics.
In the following chapters, we extend and adapt the density-based EL approach for
different testing problems, including tests of goodness-of-fit (Chapters 3 and 7.1) and
two-sample comparisons based on paired data (Chapters 4 and 5).
Testing for distributional assumptions has been a major area of continuing statistical
research. Chapter 3 introduces a new and powerful density-based EL goodness-of-fit test
for the inverse Gaussian distribution. The inverse Gaussian distribution is commonly
introduced to model and examine right skewed data having positive support. It originates
as the distribution of the first passage time of Brownian motion with drift (Henze and
Klar, 2012). It has useful applications include reliability problems (Padgett and Tsoi,
1986), lifetime models (Chhikara and Folks, 1977), and accelerated life testing
(Bhattacharyya and Fries, 1982). Folks and Chhikara (1978, 1989) have presented several
properties and applications of this distribution. When applying the inverse Gaussian
model, it is critical to develop efficient goodness-of-fit tests. Mudholkar and Tian (2002)
presented the entropy-based goodness-of-fit test for the inverse Gaussian distribution that
strongly depends on values of the integer parameter m. The problem of selecting the
optimal parameter m causes the difficulty of efficiently implementing the test in practice.
In this chapter, we propose an EL based goodness-of-fit test for the inverse Gaussian
distribution that improves Mudholkar and Tian's (2002) test by eliminating the dependence on the integer parameter m.
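As a side illustration of fitting the IG model (this is only the estimation step, not the proposed EL goodness-of-fit test), the IG(μ, λ) maximum likelihood estimates have the well-known closed forms μ̂ = X̄ and λ̂ = n/∑(1/X_i − 1/X̄); a minimal sketch, with purely hypothetical sample parameters:

```python
import numpy as np

def ig_mle(x):
    """Closed-form ML estimates (mu_hat, lambda_hat) for the inverse Gaussian
    IG(mu, lambda) model: mu_hat is the sample mean and
    lambda_hat = n / sum(1/x_i - 1/mu_hat)."""
    x = np.asarray(x, dtype=float)
    mu_hat = x.mean()
    lam_hat = x.size / np.sum(1.0 / x - 1.0 / mu_hat)
    return mu_hat, lam_hat

rng = np.random.default_rng(seed=1)
x = rng.wald(2.0, 5.0, size=5000)   # IG(mu=2, lambda=5) sample (numpy's "wald")
mu_hat, lam_hat = ig_mle(x)
```

With a large sample the estimates land close to the generating values, which makes the sketch a convenient starting point before any formal test of fit.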
within each therapy group as well as detecting between-group differences. To this end, in
a nonparametric setting, one can consider combining relevant nonparametric standard
procedures, for example, the Kolmogorov-Smirnov test and the Wilcoxon signed rank
test. The former test is a known procedure to compare distributions of two study groups,
whereas the latter one can be applied to detect treatment effects within each study group.
However, the use of the classical procedures commonly requires complex considerations
to combine the known nonparametric tests, e.g., considerations of combined p-values. In
Chapter 5, we propose simple nonparametric tests for three composite hypotheses related
to treatment effects to provide efficient tools that compare study groups utilizing paired
data. We adapt and extend the density-based EL methodology to deal with various testing
scenarios involved in the two-sample comparisons based on paired data. The proposed
technique is applied to compare two therapy strategies for treating children's attention
deficit/hyperactivity disorder and severe mood dysregulation.
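To make the classical route concrete, the following sketch combines the standard tests with a Bonferroni-type correction; the paired scores are simulated placeholders, and this is the conventional combined procedure being contrasted, not the proposed density-based EL test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=7)
# Hypothetical paired (baseline, endpoint) scores for two therapy groups.
base1 = rng.normal(50, 10, size=30)
end1 = base1 - rng.normal(5, 3, size=30)   # clear treatment effect in group 1
base2 = rng.normal(50, 10, size=30)
end2 = base2 - rng.normal(1, 3, size=30)   # weak effect in group 2

# Within-group treatment effects: Wilcoxon signed rank tests on the paired scores.
p1 = stats.wilcoxon(base1, end1).pvalue
p2 = stats.wilcoxon(base2, end2).pvalue
# Between-group difference: Kolmogorov-Smirnov test on the paired differences.
p3 = stats.ks_2samp(base1 - end1, base2 - end2).pvalue

alpha = 0.05
reject = min(p1, p2, p3) < alpha / 3       # crude Bonferroni combination
```

The need to pick a combination rule for the three p-values is exactly the "complex consideration" that the proposed composite-hypothesis tests are designed to avoid.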
Parametric change point detection schemes based on the Shiryaev-Roberts approach
have been well addressed in the statistics and engineering literature. High efficiency of
such procedures can be partially explained by their known asymptotic optimal properties.
Recently, Shiryaev-Roberts based procedures were proposed and examined in applications
to the standard AMOC (at most one change) retrospective change point detection problems.
In Chapter 6, we review and extend parametric retrospective and sequential Shiryaev-Roberts based policies, examining the non-asymptotic optimality properties of the procedures in different contexts. We utilize the general principle of the Neyman-Pearson
fundamental lemma to show that the Shiryaev-Roberts approach implies the average most
powerful procedures. We also introduce techniques to construct efficient retrospective tests
for multiple change-points detection. A real data example based on biomarker
measurements is provided to demonstrate implementation and effectiveness of new tests in
practice.
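The procedures build on the Shiryaev-Roberts statistic, which satisfies the well-known recursion R_n = (1 + R_{n−1})Λ_n, where Λ_n = f_1(X_n)/f_0(X_n) is the likelihood ratio of the n-th observation; a minimal sketch, with the change point, densities, and alarm threshold all chosen hypothetically:

```python
import numpy as np
from scipy.stats import norm

def shiryaev_roberts_path(x, f0, f1):
    """Shiryaev-Roberts statistic via the recursion R_n = (1 + R_{n-1}) * L_n,
    where L_n = f1(x_n) / f0(x_n) and R_0 = 0."""
    r, path = 0.0, []
    for xi in x:
        r = (1.0 + r) * (f1(xi) / f0(xi))
        path.append(r)
    return np.array(path)

# Hypothetical stream: N(0,1) pre-change, N(1,1) post-change at n = 50.
rng = np.random.default_rng(seed=3)
x = np.concatenate([rng.normal(0, 1, 50), rng.normal(1, 1, 50)])
path = shiryaev_roberts_path(x, norm(0, 1).pdf, norm(1, 1).pdf)
# Sequential use: raise an alarm the first time the path crosses a threshold.
alarm = int(np.argmax(path >= 500.0)) + 1 if path.max() >= 500.0 else None
```

Under the pre-change regime R_n grows roughly linearly in n, while after a change it grows geometrically, which is what a threshold rule exploits.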
The last chapter of this dissertation addresses directions for future work and outlines
two possible topics of future work: 1) developing an EL ratio based goodness-of-fit test of normality based on several independent samples and errors in regression models; and 2) developing a simple likelihood ratio type test for independence between two random
variables, without requiring the specification of any kind of dependence and any
assumptions on the forms of data distributions.
CHAPTER 2
ESTIMATION AND TESTING BASED ON DATA
SUBJECT TO MEASUREMENT ERRORS: FROM
PARAMETRIC TO NON-PARAMETRIC
LIKELIHOOD METHODS
2.1 INTRODUCTION
Commonly, many biological and epidemiological studies deal with data subject to
measurement errors (MEs) attributed to instrumentation inaccuracies, within-subject
variation resulting from random fluctuations over time, etc. Ignoring the presence of ME
in data can result in the bias or inconsistency of estimation or testing. The statistical
literature proposed different methods for ME bias correction (e.g., Carroll et al., 1984,
1999; Carroll and Wand, 1991; Fuller, 1987; Liu and Liang, 1992; Schafer, 2001;
Stefanski, 1985; Stefanski and Carroll, 1987, 1990). Among others, one of the common
methods is to consider repeated measurements of biospecimens collecting sufficient
information for statistical inferences adjusted for ME effects (e.g., Hasabelnaby et al.,
1989). In practice, measurement processes based on bioassays can be costly and time-consuming and can restrict the number of replicates of each individual available for
analysis or the number of individual biospecimens that can be used. It can follow that
investigators may not have enough observations to achieve the desired power or
efficiency in statistical inferences.
Dorfman (1943), Faraggi et al. (2003), Liu and Schisterman (2003), Liu et al. (2004),
Mumford et al. (2006), Schisterman et al. (2008, 2010), and Vexler et al. (2006, 2008,
2010, 2011) addressed pooling sampling strategies as an efficient approach to reduce the
overall cost of epidemiological studies. The basic idea of the pooling design is to pool
together individual biological samples (e.g., blood, plasma, serum or urine) and then
measure the pooled samples instead of each individual biospecimen. Since the pooling
design reduces the number of measurements without ignoring individual biospecimens,
the cost of the measurement process is reduced, but relevant information can still be
derived. Recently, it has been found that we can utilize a hybrid design that takes a
sample of both pooled and unpooled biospecimens to efficiently estimate unknown
parameters, allowing for MEs presence in the data without requiring repeated measures
(Schisterman et al., 2010).
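A minimal simulation sketch of how the hybrid design spends a fixed budget of N measurements; the budget split, pool size, and parameter values below are hypothetical illustrative choices, not a recommended design:

```python
import numpy as np

rng = np.random.default_rng(seed=11)
T, N, p = 10_000, 100, 2            # available specimens, measurement budget, pool size
mu, var_x, var_e = 1.0, 1.0, 0.4    # biomarker mean/variance and ME variance

x = rng.normal(mu, np.sqrt(var_x), size=T)       # true individual biomarker values
n_unpooled = N // 2                              # spend half the budget on unpooled assays
unpooled = x[:n_unpooled] + rng.normal(0.0, np.sqrt(var_e), size=n_unpooled)

# Each remaining measurement assays a physical pool of p specimens: the observation
# is the average of the p true values plus a single ME term.
n_pooled = N - n_unpooled
rest = x[n_unpooled:n_unpooled + n_pooled * p]
pooled = rest.reshape(n_pooled, p).mean(axis=1) + rng.normal(0.0, np.sqrt(var_e), size=n_pooled)

mu_hat = pooled.mean()   # each pooled observation has variance var_x / p + var_e
```

Note that the pooled observations carry reduced between-subject variance (var_x/p) at the same per-assay cost, which is the source of the efficiency gain for estimating the mean.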
In the context of the hybrid strategy, Schisterman et al. (2010) evaluated data that
follow normal distribution functions. In this chapter, we consider general cases of
parametric and nonparametric assumptions, comparing efficiency of pooled-unpooled
samples and data consisting of repeated measures. It should be noted that the repeated
measurement technique collects a large amount of information regarding just the nuisance parameters related to the distribution functions of the ME, whereas the pooled-unpooled design provides observations that are informative regarding the target variables
allowing for ME. Therefore, we show that the pooled-unpooled sampling strategy is more
efficient than the repeated measurement sampling procedure. We construct parametric
2.2 PARAMETRIC INFERENCES
In this section, we derive general forms of the relevant likelihood functions. In each case,
we assume that the total measurements of the biomarkers are fixed, say N, for example, N
is the total number of measurements that a study budget allows us to execute.
2.2.1 Parametric likelihood functions

2.2.1.1 Parametric likelihood based on repeated data

Suppose that each of t individuals provides a true biomarker value X_i that is n times repeatedly measured, so that the observations are Z_ij = X_i + ε_ij, i = 1, …, t, j = 1, …, n, where the X_i's and the MEs ε_ij's are independent. In this case, we can define the total number of available individual bioassays to be T, T > t, when we can consider obtaining a large number of individual biospecimens to have a low cost with respect to the high cost of measurement processes. Firstly, we consider the simple normal case, say X_i ~ N(μ, σ_X²) and ε_ij ~ N(0, σ_ε²), and note that without repeated measures the variances σ_X² and σ_ε² cannot be estimated separately (non-identifiability). The observations Z_ij's in each group i are dependent because they are measured using the same bioassay. Note that if we fix the value of X_i, the Z_ij's are independent of each other; for example, conditioned on X_i, we have Z_ij | X_i ~ N(X_i, σ_ε²). Conditioning on the X_i's, we can derive closed forms of the likelihood functions, and further, we can also derive the maximum likelihood estimators of μ, σ_X², and σ_ε².
2.2.1.2 Parametric likelihood based on pooled and unpooled data

Let N be the number of measurements we can obtain due to a limited study budget. We obtain the pooled samples by randomly grouping individual samples into groups of size p, the number of individual samples in a pooling group, defined via the integer part [·] of the corresponding ratio of available specimens to measurements. The pooling design requires a physical combination of specimens of the same group and a test of each pooled specimen, obtaining a single observation when the pooled sample is measured. Since the measurements are generally per unit of volume, we assume that the true measurement for a pooled set is the average of the true individual marker values in that group. In this case, taking into account that instruments applied to the measurement process can be sensitive and subject to some random exposure ME, we define a single observation to be a sum of the average of individual marker values and a value of ME. Note that, in accordance with the pooling literature, we assume that analysis of the biomarkers is restricted by the high cost of the measurement process, whereas access to a large number of individual biospecimens can be considered to have a relatively low cost.

In the hybrid design, we assume that T individual biospecimens are available, but still we can provide just N measurements, split between pooled and unpooled samples.
We randomly split the T individual bioassays into pooled and unpooled groups (see, e.g., Faraggi et al., 2003; Liu and Schisterman, 2003; Liu et al., 2004; Schisterman et al., 2008, 2010; Vexler et al., 2006, 2008, 2010, 2011). Hence, we can obtain that the pooled observations are i.i.d. with the mean μ, and, similarly, the unpooled observations are i.i.d. with the mean μ.

Note that the pooled and unpooled samples are independent of each other. As a result, the likelihood function based on the combination of pooled and unpooled data has the form of the product of the likelihood based on the pooled observations and the likelihood based on the unpooled observations. Since the estimators follow the maximum likelihood methodology, we can easily show the asymptotic properties of the estimators.
2.2.2 Normal case

Under the normality assumption, one can obtain closed-form analytical solutions for the maximum likelihood estimators of the unknown parameters μ, σ_X², and σ_ε².

2.2.2.1 Maximum likelihood estimators based on repeated measures

Assume that X_i ~ N(μ, σ_X²) and ε_ij ~ N(0, σ_ε²). Referring to Searle et al. (1992), the likelihood function is a well-known result that can be expressed in a closed form via the joint normal distribution of the repeated measurements Z_ij. Differentiating the log-likelihood function with respect to μ, σ_X², and σ_ε², and setting the equations equal to zero, we obtain the maximum likelihood equations, whose roots define the maximum likelihood estimators of μ, σ_X², and σ_ε² in closed forms.
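Under these normal assumptions, each subject's replicate vector is multivariate normal with compound-symmetry covariance σ_X²J + σ_ε²I, so the likelihood can also be maximized numerically; a sketch using generic optimization rather than the closed-form roots, with purely hypothetical parameter values:

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.optimize import minimize

def neg_loglik(theta, z):
    """Negative log-likelihood of repeated measures z (t subjects x n replicates)
    under Z_ij = X_i + eps_ij with X_i ~ N(mu, vx), eps_ij ~ N(0, ve)."""
    mu, log_vx, log_ve = theta
    vx, ve = np.exp(log_vx), np.exp(log_ve)      # log-parametrize: variances stay > 0
    n = z.shape[1]
    cov = vx * np.ones((n, n)) + ve * np.eye(n)  # compound-symmetry covariance
    return -multivariate_normal(np.full(n, mu), cov).logpdf(z).sum()

rng = np.random.default_rng(seed=0)
t, n = 200, 5
z = rng.normal(1.0, 1.0, size=(t, 1)) + rng.normal(0.0, np.sqrt(0.4), size=(t, n))

fit = minimize(neg_loglik, x0=np.array([z.mean(), 0.0, -1.0]),
               args=(z,), method="Nelder-Mead")
mu_hat, vx_hat, ve_hat = fit.x[0], np.exp(fit.x[1]), np.exp(fit.x[2])
```

This generic route is useful as a check on closed-form derivations, and it carries over to settings where closed forms are cumbersome.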
2.2.2.2 Maximum likelihood estimators following the hybrid design

Under the normality assumption, the pooled and the unpooled observations are also normally distributed, and the likelihood function based on pooled-unpooled data then takes the form of a product of the corresponding normal densities. Maximizing the log-likelihood with respect to μ, σ_X², and σ_ε² yields the maximum likelihood estimators, denoted as the estimators (2.1).

By virtue of the properties of the maximum likelihood estimators, the distribution of the estimators (2.1) is asymptotically normal, with the asymptotic covariance matrix given by the inverse of the Fisher information matrix I.
As shown above, when biomarker values and MEs are normally distributed, the
maximum likelihood estimators exist and can be easily obtained. It is also clear that we
can consider these estimators as the least square estimators in a nonparametric context.
However, when data are not from normal distributions, it may be very complicated or
even be infeasible to extract the distributions of repeated measures data or pooled and
unpooled data (e.g., Vexler et al., 2011). For example, in various situations, closed
analytical forms of the likelihood functions cannot be found based on pooled data
because the density function of the pooled biospecimen values involves complex
convolutions of p-individual biospecimen values. Consequently, it is reasonable to
consider efficient nonparametric inference methodologies based on the repeated measures
data or pooled-unpooled data.
2.3
In this section, we apply the EL methodology to the statement of the problem in this
chapter. The EL technique has been extensively proposed as a nonparametric
approximation of the parametric likelihood approach (e.g., DiCiccio et al., 1989; Owen,
1988, 1991, 2001; Vexler et al. 2009, 2010; Vexler and Gurevich, 2010; Yu et al., 2010).
We begin by outlining the EL ratio method and then modify the EL ratio test to construct confidence interval estimators and tests based on repeated measures data and pooled-unpooled data.
Consider i.i.d. observations Z_1, …, Z_N and the hypothesis H_0: E(Z_1) = μ_0 of equation (2.2), where μ_0 is fixed and known. To test for the hypothesis in equation (2.2), we can write the EL function as the product of probability weights p_i, where we assume the weights satisfy empirical constraints that correspond to the hypotheses settings. Then, under the null hypothesis, the maximum EL function is obtained by maximizing the product of the p_i's subject to ∑ p_i = 1 and ∑ p_i Z_i = μ_0, where ∑ p_i Z_i = μ_0 is an empirical form of E(Z_1) = μ_0; the maximizing weights are p_i = N⁻¹{1 + λ(Z_i − μ_0)}⁻¹, where λ is a root of ∑ (Z_i − μ_0)/{1 + λ(Z_i − μ_0)} = 0. Similarly, under the alternative hypothesis, the maximum EL function has the simple form N⁻ᴺ, with p_i = N⁻¹. The resulting 2log EL ratio test statistic, 2∑ log{1 + λ(Z_i − μ_0)}, follows asymptotically a χ² distribution with one degree of freedom as N → ∞, and we reject H_0 if the statistic exceeds the corresponding critical value. (Here, the critical value is the 100(1 − α)th percentile of a χ² distribution with one degree of freedom.)
In a similar manner, the EL method can be applied to the repeated measures data, with the weights defined via a root of the corresponding estimating equation; the resulting 2log EL ratio test statistic asymptotically follows, as N → ∞, a χ² distribution, and the null hypothesis is rejected when the statistic exceeds the 100(1 − α)th percentile of a χ² distribution with one degree of freedom.
Now let the observations be i.i.d. values obtained from the pooled and the unpooled biospecimens, respectively. Under the null hypothesis, the EL function for the combined data is the product of the EL functions based on the pooled sample and on the unpooled sample, each maximized subject to its own empirical constraints. Finally, the 2log EL ratio test statistic can be given in the form of the sum of the 2log EL ratios based on the pooled data and on the unpooled data. In a similar manner to common EL considerations, one can show that each of these two 2log EL ratio statistics follows asymptotically a χ² distribution; since the pooled and unpooled samples are independent, their sum, the combined 2log EL ratio, has an asymptotic χ² distribution, and the null hypothesis is rejected when the statistic exceeds the corresponding 100(1 − α)th percentile of a χ² distribution.
In practice, to execute the procedure above, we can directly use standard programs
related to the classical EL ratio tests; for example, the function el.test in the R software can be utilized to compute the EL confidence interval estimator (2.4).
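In the same spirit, the classical EL ratio statistic for a single mean (the standard Owen (1988) construction underlying el.test, not the pooled-data estimator (2.4) itself) can be sketched directly; the sample below is a hypothetical placeholder:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def el_mean_stat(x, mu0):
    """-2 log empirical likelihood ratio for H0: E[X] = mu0 (Owen's construction).
    Weights p_i = 1 / (n * (1 + lam * (x_i - mu0))), with lam the root of
    sum (x_i - mu0) / (1 + lam * (x_i - mu0)) = 0 on the feasible interval."""
    z = np.asarray(x, dtype=float) - mu0
    if z.min() >= 0 or z.max() <= 0:
        return np.inf                      # mu0 outside the convex hull of the data
    eps = 1e-10
    lo, hi = -1.0 / z.max() + eps, -1.0 / z.min() - eps
    lam = brentq(lambda l: np.sum(z / (1.0 + l * z)), lo, hi)
    return 2.0 * np.sum(np.log1p(lam * z))

rng = np.random.default_rng(seed=5)
x = rng.normal(1.0, 1.0, size=200)         # hypothetical biomarker sample
stat = el_mean_stat(x, mu0=1.0)
reject = stat > chi2.ppf(0.95, df=1)       # asymptotic chi-square(1) calibration
```

The statistic is exactly zero at μ_0 equal to the sample mean and increases as μ_0 moves away, which is how the χ²-calibrated confidence interval is traced out.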
The EL technique mentioned above does not use an empirical version of the rule that connects the second moments derived from the pooled and unpooled observations. Intuitively, using a constraint related to equation (2.5), one can increase the power of the EL approach: the EL function is maximized subject to the mean constraints together with the additional second-moment constraint, with the Lagrange multipliers defined as roots of the corresponding estimating equations. The resulting statistic is asymptotically equivalent to the maximum log EL ratio test statistic. By virtue of results mentioned in Qin and Lawless (1994), the 2log EL ratio with the additional constraint asymptotically follows a χ² distribution, and the null hypothesis is rejected when the statistic exceeds the corresponding percentile of that χ² distribution.

The Monte Carlo simulation study presented in the next section examines the performance of each EL method mentioned above.
2.4
In this section, we conducted an extensive Monte Carlo study to evaluate the performance
of the parametric and nonparametric likelihood methods proposed in Sections 2.2 and 2.3.
We let the true biomarker values have the mean μ = 1 and the variance σ_X² = 1, and generated normally distributed MEs. For simplicity, we assumed that each subject had the same number of replicates.

To obtain the hybrid samples, we first generated a sample of true biomarker values of size T and then formed each pooled observation as the average of the values of the pooled biospecimens plus a generated ME. The simulation settings were: μ = 1 and σ_X² = 1; σ_ε² = 0.4, 1; the number of replicates n = 2, 5, 10; the pooling group size p = 2, 5, 10; the pooling proportion = 0.5; the total sample size N = 100, 300. For each set of parameters, there were 10,000 data generations (Monte Carlo). In this section, following the pooling literature, we assumed that the simulated analysis of biomarkers was restricted to execute just N measurements, so that the hybrid design could be compared with the repeated measures sampling method. The Monte Carlo simulation results are presented in the next subsection.
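As one concrete instance of such a data generation (a single parameter set from the grid above, with simple moment estimators standing in for the full maximum likelihood solutions):

```python
import numpy as np

rng = np.random.default_rng(seed=2)
mu, var_x, var_e = 1.0, 1.0, 0.4       # one parameter set from the simulation grid
t, n = 150, 2                           # N = t * n = 300 total measurements

x = rng.normal(mu, np.sqrt(var_x), size=t)                   # true biomarker values
z = x[:, None] + rng.normal(0.0, np.sqrt(var_e), size=(t, n))  # repeated measures

# Moment (one-way random effects) estimators: the within-subject spread identifies
# var_e, the between-subject spread identifies var_x + var_e / n.
mu_hat = z.mean()
var_e_hat = z.var(axis=1, ddof=1).mean()
var_x_hat = z.mean(axis=1).var(ddof=1) - var_e_hat / n
```

Repeating this generation 10,000 times and tabulating the empirical standard errors of the estimates reproduces the kind of summary reported in Table 2.1.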
The estimates of σ_ε² appear to be better as the number of replicates increases. Apparently, however, the Monte Carlo standard errors of the estimators of μ and σ_X² increase with the number of replicates.
To accomplish the efficiency comparison between the repeated measures strategy and
the hybrid design strategy, we provide the Monte Carlo properties of the maximum
likelihood estimates based on pooled-unpooled data in Table 2.1. Table 2.1 shows that
the Monte Carlo standard errors of the estimates of the mean μ based on pooled-unpooled data are clearly less than those of the corresponding estimates that utilize repeated measures. That is, the estimation of μ based on pooled-unpooled data is very accurate when the total number of measurements is fixed at the same level. Another advantage is that this gain in efficiency holds for each considered number of replicates and pooling group size.
Table 2.1: The Monte Carlo evaluations of the maximum likelihood estimates based on repeated
measurements and the hybrid design
Sample Size; Replicates n / Pooling Size p; (μ, σ_X², σ_ε²): Estimates μ̂, σ̂_X², σ̂_ε²; Standard Errors SE(μ̂), SE(σ̂_X²), SE(σ̂_ε²)

Repeated Measurements:
N=100  n=2   (1, 1, 0.4):  1.0021  0.9781  0.3997  |  0.1553  0.2410  0.0790
N=100  n=2   (1, 1, 1.0):  1.0006  0.9688  0.9984  |  0.1726  0.3106  0.1994
N=100  n=5   (1, 1, 0.4):  0.9966  0.9462  0.3990  |  0.2328  0.3305  0.0623
N=100  n=5   (1, 1, 1.0):  1.0015  0.9362  0.9998  |  0.2442  0.3688  0.1570
N=100  n=10  (1, 1, 0.4):  1.0026  0.8951  0.3999  |  0.3209  0.4346  0.0597
N=100  n=10  (1, 1, 1.0):  1.0044  0.8917  0.9995  |  0.3299  0.4690  0.1501
N=300  n=2   (1, 1, 0.4):  0.9987  0.9921  0.3999  |  0.0889  0.1405  0.0455
N=300  n=2   (1, 1, 1.0):  1.0005  0.9883  0.9999  |  0.0995  0.1803  0.1162
N=300  n=5   (1, 1, 0.4):  0.9995  0.9797  0.3998  |  0.1356  0.1950  0.0365
N=300  n=5   (1, 1, 1.0):  0.9990  0.9766  0.9990  |  0.1409  0.2181  0.0906
N=300  n=10  (1, 1, 0.4):  0.9985  0.9682  0.3997  |  0.1864  0.2633  0.0344
N=300  n=10  (1, 1, 1.0):  0.9985  0.9660  1.0002  |  0.1914  0.2782  0.0861
Hybrid Design:
N=100  p=2   (1, 1, 0.4):  1.0015  1.0160  0.4365  |  0.1048  0.6712  0.4579
N=100  p=2   (1, 1, 1.0):  1.0007  1.0754  1.0098  |  0.1327  1.0058  0.7275
N=100  p=5   (1, 1, 0.4):  0.9994  1.0045  0.3889  |  0.0924  0.3857  0.1662
N=100  p=5   (1, 1, 1.0):  1.0008  1.0053  0.9880  |  0.1240  0.5932  0.3217
N=100  p=10  (1, 1, 0.4):  0.9993  1.0049  0.3918  |  0.0871  0.3341  0.1164
N=100  p=10  (1, 1, 1.0):  0.9996  1.0050  0.9836  |  0.1197  0.5082  0.2486
N=300  p=2   (1, 1, 0.4):  0.9999  0.9974  0.4066  |  0.0608  0.3868  0.2652
N=300  p=2   (1, 1, 1.0):  1.0002  1.0069  0.9982  |  0.0758  0.5788  0.4179
N=300  p=5   (1, 1, 0.4):  0.9995  1.0013  0.3969  |  0.0534  0.2197  0.0954
N=300  p=5   (1, 1, 1.0):  0.9993  1.0076  0.9910  |  0.0711  0.3386  0.1819
N=300  p=10  (1, 1, 0.4):  0.9995  0.9995  0.3972  |  0.0497  0.1935  0.0671
N=300  p=10  (1, 1, 1.0):  0.9992  1.0059  0.9922  |  0.0688  0.2928  0.1436
Table 2.2 displays the coverage probabilities of the confidence interval estimators
constructed by the parametric likelihood and EL method based on repeated measures data
and the mixed data, respectively.
Table 2.2: Coverage probabilities and confidence intervals based on repeated measurements and
the hybrid design

Sample   Replicates n /   Parameters     Parametric Likelihood           Empirical Likelihood
Size     Pooling Size p   (μ, σ, σ_ε)    Coverage  CI                    Coverage  CI
Repeated Measurements:
N=100    n=2              (1, 1, 0.4)    0.9420   (0.7028, 1.3014)       0.9496   (0.6980, 1.3049)
                          (1, 1, 1.0)    0.9423   (0.6665, 1.3347)       0.9466   (0.6613, 1.3394)
         n=5              (1, 1, 0.4)    0.9305   (0.5584, 1.4348)       0.9327   (0.5519, 1.4466)
                          (1, 1, 1.0)    0.9289   (0.5404, 1.4626)       0.9353   (0.5298, 1.4752)
         n=10             (1, 1, 0.4)    0.9044   (0.4193, 1.5859)       0.8985   (0.4158, 1.5876)
                          (1, 1, 1.0)    0.9042   (0.4040, 1.6047)       0.9030   (0.4054, 1.6065)
N=300    n=2              (1, 1, 0.4)    0.9477   (0.8243, 1.1731)       0.9517   (0.8240, 1.1753)
                          (1, 1, 1.0)    0.9469   (0.8056, 1.1955)       0.9479   (0.8034, 1.1962)
         n=5              (1, 1, 0.4)    0.9400   (0.7401, 1.2588)       0.9467   (0.7360, 1.2628)
                          (1, 1, 1.0)    0.9448   (0.7257, 1.2722)       0.9467   (0.7210, 1.2763)
         n=10             (1, 1, 0.4)    0.9396   (0.6422, 1.3547)       0.9379   (0.6318, 1.3582)
                          (1, 1, 1.0)    0.9336   (0.6321, 1.3648)       0.9417   (0.6245, 1.3712)
Hybrid Design:
N=100    p=2              (1, 1, 0.4)    0.9512   (0.7939, 1.2090)       0.9492   (0.7725, 1.2303)
                          (1, 1, 1.0)    0.9463   (0.7422, 1.2592)       0.9421   (0.7146, 1.2869)
         p=5              (1, 1, 0.4)    0.9424   (0.8230, 1.1757)       0.9490   (0.7978, 1.2010)
                          (1, 1, 1.0)    0.9439   (0.7644, 1.2372)       0.9509   (0.7314, 1.2703)
         p=10             (1, 1, 0.4)    0.9393   (0.8339, 1.1646)       0.9498   (0.8099, 1.1887)
                          (1, 1, 1.0)    0.9431   (0.7701, 1.2290)       0.9478   (0.7376, 1.2614)
N=300    p=2              (1, 1, 0.4)    0.9482   (0.8817, 1.1182)       0.9551   (0.8660, 1.1337)
                          (1, 1, 1.0)    0.9469   (0.8525, 1.1479)       0.9520   (0.8334, 1.1672)
         p=5              (1, 1, 0.4)    0.9478   (0.8963, 1.1026)       0.9532   (0.8822, 1.1166)
                          (1, 1, 1.0)    0.9463   (0.8616, 1.1371)       0.9506   (0.8433, 1.1556)
         p=10             (1, 1, 0.4)    0.9462   (0.9030, 1.0961)       0.9584   (0.8896, 1.1095)
                          (1, 1, 1.0)    0.9484   (0.8652, 1.1332)       0.9532   (0.8475, 1.5080)
Table 2.2 shows that the EL ratio test statistic is as efficient as the traditional
parametric likelihood approach in the context of constructing confidence intervals
because the coverage probabilities and the interval widths of the two methods are very
close.
It is clearly shown that when sample sizes are greater than 100, the coverage
probabilities obtained via the pooled-unpooled design are closer to the expected 0.95
value than those based on repeated measurements. This, again, demonstrates that mixed
data are more efficient than repeated measures data.
To compare the Monte Carlo type I errors and powers of the tests based on the test
statistics (2.3) and (2.7), we conducted Monte Carlo simulations for each parametric
setting and sample size. Table 2.3 shows that, in the considered cases, the Monte Carlo
type I errors of the two test statistics are close to the expected 0.05 level, while
the powers of the test based on the statistic (2.3) are higher than those corresponding
to the test based on the simple statistic (2.7).
Table 2.4 displays the Monte Carlo simulation results of testing the null hypothesis
under alternatives indexed by an effect size a. Again, in this case, it is obvious that the
type I errors of the test based on the statistic (2.3) are closer to the expected 0.05 level
than those of the test based on (2.7), and the test (2.3) is clearly more powerful when the
effect size a is as large as 0.5. On the contrary, when the effect size a is small, such as 0.1 and 0.2,
the Monte Carlo powers of the tests based on the test statistic (2.7) are frequently higher
than those based on (2.3). This shows that when the effect size a is large, the test based on
(2.3) outperforms the test based on (2.7).
Table 2.3: The Monte Carlo type I errors and powers of the EL ratio test statistics (2.3)
and (2.7) for testing the null hypothesis based on data following the hybrid design, where
a denotes the effect size under the alternative. The pooling proportion = 0.5; the expected
significance level was 0.05.

Sample    Pooling                 ELR Test Statistic (2.3)         ELR Test Statistic (2.7)
Size (N)  Size (p)  σ_ε           Type I   Power                   Type I   Power
                                  Error                            Error
                                  a=0      a=0.5    a=1.0          a=0      a=0.5    a=1.0
N=100     p=2       0.4           0.0587   0.9919   1.0000         0.0580   0.9718   0.9990
                    1.0           0.0558   0.9373   1.0000         0.0611   0.9230   0.9986
          p=5       0.4           0.0555   0.9989   1.0000         0.0530   0.9512   0.9992
                    1.0           0.0587   0.9626   1.0000         0.0604   0.9446   0.9976
          p=10      0.4           0.0588   0.9995   1.0000         0.0595   0.9531   0.9999
                    1.0           0.0556   0.9684   1.0000         0.0621   0.9680   0.9992
N=200     p=2       0.4           0.0495   0.9999   1.0000         0.0594   0.9990   0.9985
                    1.0           0.0536   0.9991   1.0000         0.0593   0.9983   0.9995
          p=5       0.4           0.0540   1.0000   1.0000         0.0524   0.9952   0.9996
                    1.0           0.0511   0.9997   1.0000         0.0543   0.9981   0.9999
          p=10      0.4           0.0549   1.0000   1.0000         0.0546   0.9950   0.9996
                    1.0           0.0536   0.9999   1.0000         0.0551   0.9979   1.0000
Table 2.4: The Monte Carlo type I errors and powers of the EL ratio test statistics (2.3)
and (2.7) for testing the null hypothesis based on data following the hybrid design, where
a is an effect size. The pooling proportion = 0.5; the expected significance level was 0.05.

Sample    Pooling             ELR Test Statistic (2.3)                ELR Test Statistic (2.7)
Size (N)  Size (p)  σ_ε       Type I   Power                          Type I   Power
                              Error                                   Error
                              a=0      a=0.1    a=0.2    a=0.5        a=0      a=0.1    a=0.2    a=0.5
N=100     p=2       0.4       0.0687   0.0862   0.2021   0.8567       0.0724   0.0989   0.2144   0.7446
                    1.0       0.0654   0.0792   0.1579   0.6978       0.0690   0.0936   0.1696   0.6417
          p=5       0.4       0.0649   0.1123   0.3133   0.9808       0.0985   0.1583   0.3552   0.9084
                    1.0       0.0670   0.0906   0.1901   0.8141       0.0862   0.1131   0.2338   0.8171
          p=10      0.4       0.0646   0.1454   0.4460   0.9990       0.1016   0.1724   0.4416   0.9269
                    1.0       0.0623   0.0916   0.2253   0.8776       0.0933   0.1241   0.2693   0.8744
N=200     p=2       0.4       0.0587   0.1137   0.3381   0.9907       0.0555   0.1210   0.3294   0.8227
                    1.0       0.0544   0.0940   0.2583   0.9427       0.0534   0.1044   0.2612   0.8147
          p=5       0.4       0.0559   0.1699   0.5557   0.9998       0.0857   0.1903   0.5215   0.8944
                    1.0       0.0538   0.1134   0.3366   0.9860       0.0770   0.1468   0.3687   0.9436
          p=10      0.4       0.0572   0.2279   0.7539   1.0000       0.0801   0.2027   0.6158   0.8875
                    1.0       0.0568   0.1233   0.3988   0.9935       0.0815   0.1629   0.4219   0.9527
2.5
DATA EXAMPLE
In this section, we illustrate the proposed methods via data from the Cedars-Sinai
Medical Center. This study on coronary heart disease investigated the discriminatory
ability of a cholesterol biomarker for myocardial infarction (MI). We had 80 individual
measurements of cholesterol biomarker in total. Half of these were collected on patients
who recently survived a MI (cases), and the other half on controls who had normal rest
electrocardiograms and were free of symptoms, having no previous cardiovascular
procedures or MIs. Additionally, the blood specimens were randomly pooled in groups,
keeping cases and controls separate, and then re-measured. Consequently, we had pooled
assays in addition to the individual measurements.
We conducted a bootstrap type study to compare the performance of the EL method
with that of the parametric method. To execute the bootstrap study,
we proceeded as follows. We randomly selected 10 pooled assays
with replacement. We then randomly sampled 20 assays from the individual assays,
excluding those performed on individual biospecimens that contributed to the 10 chosen
pooled assays. With our 20 sampled individuals and 10 pooled assays, we applied a
parametric likelihood method assuming a normal distributional assumption and an EL
ratio test (2.3) to calculate the 95% confidence interval of the mean of cholesterol
biomarkers. We repeatedly sampled and calculated the confidence interval of the
cholesterol mean 5,000 times, obtaining 5,000 values for the confidence interval of the
mean value of cholesterol measurements for both cases and controls. Then we took the
average of these 5,000 obtained values. Table 2.5 depicts the outputs of the bootstrap
evaluation.
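The resampling scheme just described can be sketched as follows. This is a simplified illustration with hypothetical data and a plain normal-approximation interval standing in for the parametric and EL intervals; it also omits the exclusion of the individual assays that contributed to the selected pools.

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_mean_ci(individual, pooled, n_ind=20, n_pool=10, B=1000):
    """Average 95% normal-approximation CIs for the mean over B resamples."""
    lows, highs = [], []
    for _ in range(B):
        pool_bs = rng.choice(pooled, n_pool, replace=True)      # 10 pooled assays
        ind_bs = rng.choice(individual, n_ind, replace=False)   # 20 individual assays
        sample = np.concatenate([ind_bs, pool_bs])
        half = 1.96 * sample.std(ddof=1) / np.sqrt(len(sample))
        lows.append(sample.mean() - half)
        highs.append(sample.mean() + half)
    return np.mean(lows), np.mean(highs)

# hypothetical cholesterol-like assay values (not the study data)
individual = rng.normal(207.0, 30.0, 40)
pooled_assays = rng.normal(207.0, 22.0, 20)
lo, hi = bootstrap_mean_ci(individual, pooled_assays)
print(round(hi - lo, 1))  # average interval width
```

Averaging the endpoints over the B resampling rounds, as done here, mirrors the dissertation's averaging of the 5,000 computed confidence intervals.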
Table 2.5: Bootstrap evaluations of the confidence interval estimators based on the
parametric likelihood ratio test and the EL ratio test

                       Healthy                            MI
                       CI                     Length      CI                     Length
Parametric (Normal)    (192.5738, 220.8708)   28.29704    (210.0585, 239.4560)   29.39748
Empirical              (192.9715, 221.1471)   28.17561    (210.4337, 240.5975)   30.16376
In accordance with the results above, the confidence intervals of estimators of the
cholesterol mean via the EL ratio method are close to those corresponding to the
parametric approach; therefore, we cannot observe a significant difference in the
confidence intervals related to the approaches. This result shows that, in this example, the
proposed EL approach is as efficient as the traditional parametric likelihood approach in
the context of constructing confidence intervals.
2.6
CONCLUSIONS
CHAPTER 3
AN EMPIRICAL LIKELIHOOD RATIO BASED
GOODNESS-OF-FIT TEST FOR INVERSE
GAUSSIAN DISTRIBUTIONS
3.1
INTRODUCTION
The Inverse Gaussian (IG) distribution has a probability density function of the form

f(x | μ, λ) = ( λ / (2πx³) )^{1/2} exp{ −λ(x − μ)² / (2μ²x) },  x > 0,

where μ > 0 is the mean and λ > 0 is a scale parameter. The IG distribution is widely used for
modeling and analyzing right skewed data with positive support across several different
fields of science, e.g., demography, electrical networks, meteorology, hydrology, ecology,
entomology, physiology, and cardiology (see, for example, Chhikara and Folks, 1977;
Bardsley, 1980; Seshadri, 1993, 1999; Johnson et al., 1994; Barndorff-Nielsen, 1994).
Given the utility of this distribution, it is meaningful to develop a corresponding
goodness-of-fit test, which has satisfactory statistical properties. Towards this end, we
propose constructing a distribution-free goodness-of-fit test for the IG distribution,
which is based on approximating the appropriate parametric likelihood ratio test statistic.
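For concreteness, the IG density and its closed-form maximum likelihood estimators (μ̂ = X̄ and λ̂ = n/∑(1/Xᵢ − 1/X̄), a standard result) can be coded directly; numpy's Wald generator draws inverse Gaussian variates. This is an illustrative sketch, not code from the dissertation.

```python
import numpy as np

def ig_pdf(x, mu, lam):
    """Inverse Gaussian density f(x | mu, lambda) for x > 0."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(lam / (2.0 * np.pi * x**3)) * np.exp(-lam * (x - mu)**2 / (2.0 * mu**2 * x))

def ig_mle(x):
    """Closed-form maximum likelihood estimators of the IG parameters."""
    x = np.asarray(x, dtype=float)
    mu_hat = x.mean()
    lam_hat = len(x) / np.sum(1.0 / x - 1.0 / mu_hat)
    return mu_hat, lam_hat

rng = np.random.default_rng(2)
sample = rng.wald(1.0, 1.0, size=5000)  # numpy's Wald == inverse Gaussian
mu_hat, lam_hat = ig_mle(sample)
print(round(mu_hat, 2), round(lam_hat, 2))  # both close to 1.0
```
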
The parametric likelihood approach is a powerful statistical tool, which provides
optimal statistical tests under well-known conditions; e.g. see Lehmann and Romano
(2005); Vexler and Wu (2009); Vexler et al. (2010) and Vexler and Tarima (2010). When
key parametric assumptions cannot be justified, the empirical likelihood (EL) methodology
provides a nonparametric counterpart of the parametric likelihood approach (e.g., Lazar,
2003; Qin and Lawless, 1994; Owen, 2001; Vexler et al., 2009; Vexler et al., 2010; Yu et
al., 2010). The main advantage of the EL approach is that it is based on the maximum
likelihood methodology when given a set of well-defined empirical constraints. The EL
function of n i.i.d. observations X₁, …, Xₙ has the form L = ∏ᵢ₌₁ⁿ pᵢ, where the probability
weights p₁, …, pₙ maximize L subject to ∑ᵢ pᵢ = 1 and empirical constraints that correspond
to the hypothesis of interest (see, for details, Owen, 2001). For example, if the null hypothesis
specifies the mean of the observations, say E(X) = μ₀, then ∑ᵢ pᵢXᵢ = μ₀ is an empirical
version of E(X) = μ₀. The components pᵢ can then be obtained using Lagrange
multipliers.
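The Lagrange-multiplier computation behind the classical EL weights can be illustrated for a mean constraint. The helper `el_weights` and the bisection solver below are illustrative choices, not the dissertation's code; the closed form p_i = 1/(n(1 + t(x_i − μ₀))) is the standard EL solution.

```python
import numpy as np

def el_weights(x, mu0, tol=1e-10):
    """EL weights p_i maximizing prod(p_i) subject to sum(p_i) = 1
    and sum(p_i * (x_i - mu0)) = 0.

    Solution: p_i = 1 / (n * (1 + t*(x_i - mu0))), where the Lagrange
    multiplier t solves sum((x_i - mu0) / (1 + t*(x_i - mu0))) = 0.
    """
    x = np.asarray(x, dtype=float)
    n, d = len(x), x - mu0
    g = lambda t: np.sum(d / (1.0 + t * d))   # decreasing in t on the bracket
    lo = -1.0 / d.max() + 1e-9                # keep all 1 + t*d_i > 0
    hi = -1.0 / d.min() - 1e-9
    while hi - lo > tol:                      # bisection for the root of g
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return 1.0 / (n * (1.0 + t * d))

x = np.array([0.3, 1.2, 2.1, 0.8, 1.6])
p = el_weights(x, mu0=1.0)
print(round(p.sum(), 6), round(float(np.dot(p, x)), 6))  # sums to 1, reproduces mu0
```

This requires μ₀ to lie strictly inside the range of the data, which is also the condition for the EL to be well defined.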
Vexler and Gurevich (2010), as well as Gurevich and Vexler (2011), utilized the main
idea of the classical EL methodology to approximate parametric likelihood ratios. The
authors proposed a nonparametric approach based on approximate density functions. In
this chapter, we derive the EL ratio test for the IG distribution, using the density-based
distribution-free likelihood approach by Vexler and Gurevich (2010). Following
Mudholkar and Tian (2002), we transform the observations and present the likelihood
function under the alternative hypothesis as a product of values of a density function f
evaluated at the transformed data. The likelihood function
above is then utilized for the purpose of developing our goodness-of-fit test statistic.
Since the proposed test statistic approximates the most powerful parametric likelihood
ratio, the density-based EL ratio test is shown to have very efficient characteristics. We
also demonstrate the proposed test statistic improves upon the decision rule of Mudholkar
and Tian (2002) and maintains good type I error control for finite samples.
In Section 3.2, we create the EL test based on densities for the IG. Here, theoretical
propositions depict properties of the test. A Monte Carlo study of the power of the test is
presented in Section 3.3. Section 3.4 is given to real data examples employing the
proposed test. Section 3.5 consists of concluding remarks.
3.2
METHOD
Suppose we observe i.i.d. data points X₁, X₂, …, Xₙ. The problem of interest is to test the
null hypothesis that the sample (X₁, …, Xₙ) is from an IG
population.
To approximate the optimal parametric likelihood ratio, we note the following issues.
In the context of testing for an IG, we must assume that the null density function, say
f₀(x | μ, λ), of the observations under the null hypothesis is known up to the parameters μ
and λ. Under the alternative hypothesis, the density function, say f₁, of the transformed
observations Y₁, …, Yₙ is unknown. (The transformation of the observations applied here
was proposed by Mudholkar and Tian (2002) in the context of the entropy-based test for the
IG.) In the following, we present the method for approximating the non-parametric
likelihood function ∏ᵢ f₁(Yᵢ) by deriving the values of f₁ that maximize it.
Obviously, the values of f₁ at the order statistics Y₍₁₎ ≤ Y₍₂₎ ≤ … ≤ Y₍ₙ₎ of the transformed
sample maximize ∏ᵢ f₁(Y₍ᵢ₎) subject to the constraint that f₁ is a density function. Thus, we
approximate the condition ∫ f₁(u) du = 1 empirically. For an integer m, 1 ≤ m < n/2, we have

∫ f₁(u) du over [Y₍ᵢ₋ₘ₎, Y₍ᵢ₊ₘ₎] ≈ f₁(Y₍ᵢ₎) (Y₍ᵢ₊ₘ₎ − Y₍ᵢ₋ₘ₎),

where Y₍ⱼ₎ = Y₍₁₎, if j < 1, and Y₍ⱼ₎ = Y₍ₙ₎, if j > n.
Note that, using the empirical approximation to the remainder term in Lemma 3.1, we
have (2m)⁻¹ ∑ᵢ₌₁ⁿ ∫ f₁(u) du over [Y₍ᵢ₋ₘ₎, Y₍ᵢ₊ₘ₎] → 1, when m/n → 0 as n → ∞. For simplicity, by
applying the approximate analog to the mean value integration theorem, we can write

(2m)⁻¹ ∑ᵢ₌₁ⁿ f₁(Y₍ᵢ₎) (Y₍ᵢ₊ₘ₎ − Y₍ᵢ₋ₘ₎) ≈ 1.

Therefore, by virtue of (3.1), the empirical constraint under the alternative hypothesis is
given by

∑ᵢ₌₁ⁿ f₁(Y₍ᵢ₎) (Y₍ᵢ₊ₘ₎ − Y₍ᵢ₋ₘ₎) = 2m.

Consequently, under the empirical constraint (3.2), the Lagrangian function of the log EL
is

∑ᵢ₌₁ⁿ log f₁(Y₍ᵢ₎) − η ( ∑ᵢ₌₁ⁿ f₁(Y₍ᵢ₎) (Y₍ᵢ₊ₘ₎ − Y₍ᵢ₋ₘ₎) − 2m ),

where η is a Lagrange multiplier. Maximizing the Lagrangian with respect to the values
f₁(Y₍ᵢ₎) then yields

f₁(Y₍ᵢ₎) = 2m / ( n (Y₍ᵢ₊ₘ₎ − Y₍ᵢ₋ₘ₎) ),

where Y₍ⱼ₎ = Y₍₁₎, if j < 1, and Y₍ⱼ₎ = Y₍ₙ₎, if j > n.
Thus, using the maximum EL method, the likelihood ratio test statistic can be
constructed as

∏ᵢ₌₁ⁿ [ 2m / ( n (Y₍ᵢ₊ₘ₎ − Y₍ᵢ₋ₘ₎) ) ] / ∏ᵢ₌₁ⁿ f₀(Xᵢ | μ̂, λ̂).  (3.6)

The density f₀(x | μ̂, λ̂), where μ̂ = X̄ and λ̂ = n [ ∑ᵢ (1/Xᵢ − 1/X̄) ]⁻¹ denote the maximum
likelihood estimators of μ and λ, can be applied to equation (3.6). Now the test statistic
can be written in a form closely related to the sample-entropy statistic
proposed by Mudholkar and Tian (2002), who arrived at it in a different manner via
entropy-based consideration (for details, see Appendix 2.2). The step-by-step derivation
of the EL-based method given above demonstrates how the test statistic is an
approximation to the optimal likelihood ratio. Thus, we expect directly that a test based
on it will provide highly efficient characteristics. The distribution of the test statistic
depends strongly on values of the integer parameter m. To efficiently execute the test
based on sample entropy, the optimal values of m should be evaluated. In accordance
with Mudholkar and Tian (2002), these optimal values of m can be presented using
information regarding the alternative distribution. We take this one step forward and can
improve upon the test statistic by removing its dependence on a fixed value of the
integer parameter m, reconsidering the test construction with respect to the EL concept.
The constraint (3.1) is taken into account in order to derive the EL ratio test based on
densities.
However, values of f₁(Y₍ᵢ₎) of the form 2m/(n(Y₍ᵢ₊ₘ₎ − Y₍ᵢ₋ₘ₎)) satisfy the empirical
constraint (3.2) for each integer value of m for which m/n → 0 as n → ∞; that is, the
approximation to the likelihood can be carried out over a range of values of m. (Here, if
j < 1, then Y₍ⱼ₎ = Y₍₁₎, and if j > n, then Y₍ⱼ₎ = Y₍ₙ₎.) Equations (3.8) and (3.9) conclude
the approximation to the likelihood, which can be defined by minimizing over values of m
satisfying m ≤ n^{1−δ}, for some 0 < δ < 1. This conditional
bound on m is also mentioned in the literature with respect to proving the consistency of
entropy-based tests (e.g., Vasicek, 1976; Tusnady, 1977; Vexler and Gurevich, 2010).
Thus, we propose the test statistic in the form of

log Vₙ = min over 1 ≤ m ≤ n^{1−δ} of { ∑ᵢ₌₁ⁿ log [ 2m / ( n (Y₍ᵢ₊ₘ₎ − Y₍ᵢ₋ₘ₎) ) ] − ∑ᵢ₌₁ⁿ log f₀(Xᵢ | μ̂, λ̂) },  (3.10)

where 0 < δ < 1. (Section 3.3 shows that the power demonstrated by the proposed test is
not strongly affected by this choice.) The null hypothesis is rejected for large values of
the statistic, i.e. when log Vₙ > C, where C is a
test threshold. The next proposition depicts the asymptotic consistency of the test.
Proposition 3.1. Under the null hypothesis, n⁻¹ log Vₙ → 0 in probability as n → ∞,
while, under the alternative hypothesis, n⁻¹ log Vₙ converges in probability to a positive
constant that has the form of a Kullback-Leibler-type distance between the alternative
distribution and the hypothesized IG distribution. Hence the proposed test is consistent.
The proof of Proposition 3.1 is based on Proposition 2.2 of Vexler and Gurevich (2010).
To obtain critical values of the proposed test, we conducted a Monte Carlo study in
which samples of various sizes were generated from the IG(1,1) distribution. The generated
values of the test statistic log Vₙ were
used to determine the critical values at the
significance level α. The results of the experiment are presented in Table 3.1.
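As a rough computational sketch of a density-based EL ratio statistic of the type constructed in this section: the numerator uses the spacings-based values 2m/(n(Y₍ᵢ₊ₘ₎ − Y₍ᵢ₋ₘ₎)) minimized over m, and the denominator is the fitted IG likelihood at the MLEs. For simplicity this version applies the spacings to the raw sorted sample and omits the Mudholkar-Tian transformation, so it is an illustrative approximation, not the dissertation's exact statistic.

```python
import numpy as np

def ig_log_pdf(x, mu, lam):
    return 0.5 * np.log(lam / (2.0 * np.pi * x**3)) - lam * (x - mu)**2 / (2.0 * mu**2 * x)

def log_Vn(x, delta=0.5):
    """Density-based EL ratio statistic, minimized over m <= n^(1-delta)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    mu_hat = x.mean()
    lam_hat = n / np.sum(1.0 / x - 1.0 / mu_hat)   # IG MLEs
    log_f0 = ig_log_pdf(x, mu_hat, lam_hat).sum()
    idx = np.arange(n)
    best = np.inf
    for m in range(1, max(1, int(n ** (1.0 - delta))) + 1):
        hi = np.minimum(idx + m, n - 1)            # order-statistic index i+m, capped
        lo = np.maximum(idx - m, 0)                # order-statistic index i-m, capped
        spacings = np.maximum(x[hi] - x[lo], 1e-12)
        best = min(best, np.sum(np.log(2.0 * m / (n * spacings))))
    return best - log_f0

rng = np.random.default_rng(3)
ig_sample = rng.wald(1.0, 1.0, size=100)     # null: IG data
exp_sample = rng.exponential(1.0, size=100)  # alternative: exponential data
print(round(log_Vn(ig_sample), 2), round(log_Vn(exp_sample), 2))
```
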
In order to evaluate the accuracy of the obtained critical values, we depict in Table 3.2
the estimated type I error control using the 5% critical values of the test statistic from
Table 3.1 for samples generated from several IG distributions. A
selection of the results is displayed in Table 3.2. It can be seen that the empirical
percentiles given in Table 3.1 provide excellent type I error control and thus can be
confidently recommended to be used in practice.
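The critical values in Table 3.1 were obtained by simulating the null distribution; that Monte Carlo recipe can be sketched generically as below. The `toy_stat` is a hypothetical placeholder standing in for the actual test statistic; numpy's Wald sampler generates inverse Gaussian variates.

```python
import numpy as np

def mc_critical_value(stat, n, alpha=0.05, reps=2000, seed=0):
    """Estimate the upper-alpha critical value of `stat` under the
    IG(1,1) null hypothesis by Monte Carlo simulation."""
    rng = np.random.default_rng(seed)
    vals = np.array([stat(rng.wald(1.0, 1.0, size=n)) for _ in range(reps)])
    return float(np.quantile(vals, 1.0 - alpha))

# hypothetical placeholder statistic (a scaled deviation of the sample mean)
toy_stat = lambda x: abs(x.mean() - 1.0) * np.sqrt(len(x))
c = mc_critical_value(toy_stat, n=50)
print(round(c, 2))
```

Tabulating such quantiles over a grid of sample sizes and significance levels is exactly how a table like Table 3.1 is produced.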
Table 3.1: The critical values¹ Cα(n) of the proposed test, i.e. Pr{ log Vn > Cα(n) } = α,
for samples of size n generated from IG(1,1).

Sample
size \ α  0.01     0.02     0.03     0.04     0.05     0.06     0.07     0.08
10        7.1106   6.5853   6.2781   6.0392   5.8605   5.7016   5.5559   5.4365
15        8.4823   7.8319   7.4314   7.1291   6.9042   6.7207   6.5589   6.4104
20        9.1504   8.4316   7.9990   7.6968   7.4769   7.2840   7.1336   6.9831
25        9.6813   8.9327   8.5005   8.1984   7.9547   7.7582   7.5782   7.4365
30        10.3156  9.5426   9.1027   8.7600   8.4807   8.2622   8.0810   7.9190
35        10.7687  9.9797   9.5197   9.1735   8.9102   8.6944   8.4981   8.3395
40        11.1102  10.3427  9.8696   9.5152   9.2475   9.0397   8.8636   8.7066
45        11.4799  10.7279  10.2458  9.8953   9.6262   9.3990   9.2114   9.0504
50        11.7863  11.0235  10.5751  10.2299  9.9838   9.7577   9.5647   9.3945
55        12.2253  11.4291  10.9412  10.5814  10.3017  10.0863  9.8780   9.6964
60        12.4767  11.6485  11.1331  10.7850  10.5217  10.3099  10.1081  9.9422
65        12.7206  11.8821  11.4015  11.0314  10.7672  10.5457  10.3398  10.1695
70        13.0735  12.2269  11.7186  11.3325  11.0447  10.7979  10.5989  10.4137
75        13.3515  12.5373  12.0057  11.6373  11.3509  11.1066  10.9014  10.7156
80        13.6866  12.8380  12.3101  11.9135  11.5901  11.3451  11.1249  10.9333
85        13.8912  12.9608  12.5013  12.1090  11.8101  11.5501  11.3309  11.1470
90        14.1715  13.2555  12.7261  12.3453  12.0278  11.7836  11.5773  11.3801
95        14.3226  13.3931  12.8802  12.5106  12.2058  11.9523  11.7369  11.5396
100       14.5004  13.5990  13.0714  12.6594  12.3785  12.1138  11.8898  11.6932
120       15.3873  14.4181  13.8550  13.4787  13.1542  12.8996  12.6652  12.4551
150       16.3815  15.3858  14.8241  14.3910  14.0375  13.7557  13.5141  13.2955
200       17.6377  16.7383  16.1220  15.6491  15.3042  15.0077  14.7280  14.5083
250       19.0457  18.0246  17.3325  16.8719  16.4938  16.1581  15.8500  15.5890
300       20.0134  18.8813  18.1871  17.7225  17.3100  16.9705  16.6827  16.4113
Table 3.2: Type I error control¹ of the proposed test: α = 0.05

Sample size  IG(1,0.5)  IG(1,2)  IG(1,4)  IG(1,8)
10           0.0445     0.0532   0.0601   0.0654
20           0.0424     0.0503   0.0541   0.0573
30           0.0486     0.0534   0.0524   0.0553
40           0.0504     0.0514   0.0529   0.0585
50           0.0446     0.0527   0.0520   0.0489
¹ Simulation estimates based on 10,000 replications.
In this chapter, we also present an asymptotic result that insures the appropriate
significance level of our test is maintained as n → ∞. The next proposition provides
an asymptotic upper bound on the significance level of the test based on the statistic
(3.10).
Proposition 3.2. The significance level of the proposed test satisfies asymptotically
(n → ∞) an inequality that bounds the type I error probability from above by an explicit
function of the critical value C, the sample size n, and constants a and b.
For the constants a and b, we suggest applying the values a = 29.42109 and b = -29.87852. These
values were obtained empirically based on a broad Monte Carlo study. For example,
when n = 100, by virtue of Proposition 3.2, the recommended asymptotic critical
value is 12.44114 at α = 0.05, which corresponds to the actual type I error, 0.0507, obtained
using IG(1,1) random samples. Table 3.3 presents the actual type I errors of the
proposed test, when the critical values were chosen with respect to Proposition 3.2 for
α = 0.05. The results presented in Table 3.3 are based on generated samples from
IG(1,1).
Table 3.3: The Monte Carlo Type I errors of the proposed test with critical values
obtained by Proposition 3.2 to guarantee α = 0.05.

Sample size  Monte Carlo Type I error
100          0.0507
150          0.0621
200          0.0829
250          0.0309
300          0.0459
500          0.0334
Table 3.3 demonstrates that the upper bound obtained by Proposition 3.2 can be used
in practice.
3.3
POWER PROPERTIES
We compared the power of the proposed test with the powers of the goodness-of-fit tests for the IG found in Mudholkar and Tian (2002). Table 3.4 shows the estimated power of our EL
goodness-of-fit test (3.10) as compared to other goodness-of-fit tests presented by
Mudholkar and Tian (2002), Mudholkar et al. (2001), Edgeman et al. (1988) and by
Edgeman (1990). Because of the problem of choosing an optimal m, we represent the test
proposed by Mudholkar and Tian (2002) for different values of m as seen in Table 3.4.
Table 3.4: Monte Carlo powers¹ of the tests for the IG at the 5% level of significance.

Alternative    n   Test(3.10)  Kn,m=2   Kn,m=3  Kn,m=4  Kn,m=5  Z       KS1    KS2
Uniform(0,1)   10  0.1814      0.2060   …       …       …       0.0206  0.262  0.280
               20  0.426       0.4636   …       …       …       0.0226  0.518  0.525
               30  0.5764      0.6446   …       …       …       0.0328  0.654  0.668
Weibull(1,2)   10  0.0721      …        …       …       …       …       …      …
               20  0.1611      …        …       …       …       …       …      …
               30  0.277       …        …       …       …       …       …      …
LogNor(0.5,1)  10  0.0467²     0.0464   0.0437  0.0318  0.0407  0.048   0.055  0.0460
               20  0.0588      0.0535²  0.0441  0.0351  0.042   0.068   0.080  0.0541
               30  0.0707      0.0679   0.0591  0.0517  0.0397  0.082   0.111  0.0547

¹ Simulation estimates based on 10,000 replications. ² Values corresponded to the optimal
m found empirically by Mudholkar and Tian (2002) given known alternatives. Kn,m:
Entropy-based test (Mudholkar and Tian, 2002). Z: Independence characterization test
(Mudholkar et al., 2001). KS1: Modified Kolmogorov-Smirnov test (Edgeman et al.,
1988). KS2: Kolmogorov-Smirnov test using transformation (Edgeman, 1990).
The Monte Carlo study demonstrates the power of the EL goodness-of-fit test is
superior to or about equal to the Mudholkar and Tian (2002) test with values of m that
were selected by Mudholkar and Tian empirically, and given known alternatives (see
Remark 4.2 in Mudholkar and Tian, 2002). Table 3.4 depicts how the selection of m
strongly affects the powers of the test statistic proposed by Mudholkar and Tian (2002).
The wrong choice of m can lead to a 50% reduction in the power of the entropy-based
test by Mudholkar and Tian (2002). We investigate the power in the next section utilizing
the real data examples.
3.4
DATA EXAMPLES
In this section, we use the proposed EL goodness-of-fit test described above and the test
proposed by Mudholkar and Tian (2002) to evaluate the appropriateness of the IG
distribution to data from four different studies that were analyzed in Folks and Chhikara
(1978). The data sets were composed of shelflife (days) of a food product (say, Dataset 1),
fracture toughness of MIG (metal inert gas) welds (say, Dataset 2), precipitation (inches)
from Jug Bridge, Maryland (say, Dataset 3), and runoff amounts at Jug Bridge, Maryland
(say, Dataset 4). Table 3.5 presents the p-values obtained via the EL goodness-of-fit test
and the entropy-based test by Mudholkar and Tian (2002) for our example data.
Table 3.5: Test for the IG based on the data introduced in Folks and Chhikara (1978).

Dataset  Sample size  Test             p-value
1        26           Proposed (3.10)  0.0063
                      K26, m=2         0.0061
                      K26, m=3         0.0058
                      K26, m=4         0.0090
                      K26, m=5         0.0041
2        19           Proposed (3.10)  0.1791
                      K19, m=2         0.1565
                      K19, m=3         0.1516
                      K19, m=4         0.2663
                      K19, m=5         0.3379
3        25           Proposed (3.10)  0.0099
                      K25, m=2         0.0197
                      K25, m=3         0.0159
                      K25, m=4         0.0098
                      K25, m=5         0.0063
4        25           Proposed (3.10)  0.9271
                      K25, m=2         0.9393
                      K25, m=3         0.9538
                      K25, m=4         0.8738
                      K25, m=5         0.8001
Based on the results from Table 3.5, our test and the test by Mudholkar and Tian
(2002) provide identical conclusions about the goodness-of-fit test of the IG distribution
at the 5% level of significance. However, at the 1% level of significance, the
Mudholkar and Tian (2002) test provides different decisions depending on values of m
for Dataset 3. The density-based EL ratio goodness-of-fit test demonstrates high
sensitivity in these data examples given non-IG alternatives.
In addition to our illustration, a bootstrap type study was conducted to examine the
proposed EL-based goodness-of-fit test and the test by Mudholkar and Tian (2002). From
each of Datasets 1, 3 and 4, two samples with the sizes 15 and 20 were randomly selected,
respectively, in order to be tested for an IG fit at a 5% level of significance. Two samples
of sizes 10 and 15 were randomly selected from Dataset 2, respectively, to be tested for
an IG fit at a 5% level of significance. We repeated this strategy 10,000 times calculating
the frequencies of the events in which the proposed test statistic exceeded the
corresponding 5% critical values from Table 3.1 (e.g., 5.8605 for n = 10 and
6.9042 for n = 15). In a
similar manner to the structure above, we examined the test proposed by Mudholkar and
Tian (2002). For Dataset 2, the proposed test rejected the IG assumption in 2625 (the case
of n = 10) and 4059 (the case of n = 15) events, while the test proposed by Mudholkar
and Tian (2002) rejected the IG assumption in 2583 and 3762 events, respectively. Thus,
this indicates that our proposed method is more sensitive as compared to the Mudholkar
and Tian's test. The results of the bootstrap type studies are presented in Table 3.6.
Table 3.6: Bootstrap type proportion of rejection1 of the tests for the IG at a 5% level of
significance.
Sample size  Test (3.10)  Kn,m=2  Kn,m=3  Kn,m=4  Kn,m=5
Dataset 1
15
0.7199 0.6655 0.6799 0.7024 0.6796
20
0.9023 0.8574 0.8468 0.8662 0.8499
Dataset 2
10
0.2961 0.2583 0.2784 0.2226 0.2625
15
0.5649 0.3762 0.3302 0.3422 0.4059
Dataset 3
15
0.6880 0.6486 0.5989 0.6507 0.6389
20
0.8422 0.8168 0.7961 0.7893 0.8082
Dataset 4
15
0.1898 0.0997 0.0728 0.0767 0.0938
20
0.2820 0.1318 0.0889 0.0761 0.0948
1
Bootstrap type estimates based on 10,000 replications.
Table 3.6 confirms the practical applicability of the proposed test.
3.5
CONCLUSIONS
In this chapter, we developed a density-based EL ratio goodness-of-fit test for the IG
distribution. The proposed test approximates the optimal likelihood ratio and removes the
entropy-based test's dependence on the parameter m. Theoretical support for the proposed EL ratio test is
obtained by proving consistency of the new test and an asymptotic proposition regarding
the null distribution of the density-based EL ratio test statistic. The data examples
demonstrated that the proposed test is reasonable when applied to real data. In general,
the methodology presented in this chapter can be easily adapted to construct different
nonparametric tests, approximating the optimal likelihood ratios via the EL concept.
CHAPTER 4
AN EXTENSIVE POWER EVALUATION OF A
NOVEL TWO-SAMPLE DENSITY-BASED
EMPIRICAL LIKELIHOOD RATIO TEST FOR
PAIRED DATA
4.1
INTRODUCTION
Group-comparison methods have been well addressed in clinical field trials and various
biostatistical applications. In many cases, investigators execute a design of experiment
that yields pre-treatment and post-treatment measurements in order to evaluate different
treatment effects. Therefore, it is desirable to have an efficient statistical tool that can be
utilized to compare treatments in studies based on paired data related to different
populations.
To operate with these two independent sets of paired observations, classical
procedures can be applied to the two separate sets of paired values, including the
independent two-sample t-test and the Wilcoxon test (Wilcoxon rank sum test). The
statistical literature has extensively pointed out several issues related to these classical
tests. For example, Albers et al. (2001) indicated that in the case of nonconstant shift
alternatives, the Wilcoxon test may break down completely. The independent two-sample
t-test is commonly known to have very good properties based on data that are
close to normally distributed, but this test cannot be recommended for
observations from strongly skewed distributions (see, e.g. Vexler et al., 2009). Note that
the type I error of the t-test based on non-normally distributed data can be controlled only
for large sample sizes using the asymptotic distribution of the t-test statistic. The t-test,
which is mainly used to detect a change of mean, may be inadequate to detect changes of
other measures of location, e.g. the medians. The two-sample Kolmogorov-Smirnov test,
in several situations, is recognized to show relatively lower powers against the other
classical tests. These classical tests are developed to solve general problems to compare
distributions of two populations.
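These classical procedures are all available in scipy; the snippet below applies them to hypothetical pre/post data for two therapy groups (the shifts and sample sizes are made up for illustration), comparing the two groups' within-pair differences.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# hypothetical pre/post measurements for two therapy groups
pre_a, post_a = rng.normal(10, 2, 50), rng.normal(14.0, 2, 50)
pre_b, post_b = rng.normal(10, 2, 50), rng.normal(10.5, 2, 50)
d_a = post_a - pre_a  # within-pair changes, group A (large treatment effect)
d_b = post_b - pre_b  # within-pair changes, group B (small treatment effect)

t_p = stats.ttest_ind(d_a, d_b).pvalue   # independent two-sample t-test
w_p = stats.ranksums(d_a, d_b).pvalue    # Wilcoxon rank sum test
ks_p = stats.ks_2samp(d_a, d_b).pvalue   # two-sample Kolmogorov-Smirnov test
print(round(t_p, 4), round(w_p, 4), round(ks_p, 4))
```

Each test addresses a different aspect of the comparison: the t-test targets a shift in means, the rank sum test a stochastic ordering, and the Kolmogorov-Smirnov test any difference between the two distributions.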
This chapter aims to provide a powerful test to detect differences between
distributions of two therapy groups using the paired data. Towards this end, we propose
to adapt the test introduced by Gurevich and Vexler (2011) and to reconstruct the form of
the Gurevich and Vexler (2011) test statistic based on paired data rather than independent
observations. In a similar manner to the test presented by Gurevich and Vexler (2011),
we approximate nonparametrically the most powerful Neyman-Pearson test statistics to
solve the stated problem. By virtue of the Neyman-Pearson lemma, the likelihood ratio
tests are most powerful decision rules when the forms of distribution functions are known
under the null and alternative hypotheses, respectively (e.g. Lehmann and Romano, 2005;
Vexler and Wu, 2009; Vexler et al., 2010). However, in some cases, required parametric
forms of distributions can be unavailable. Thus, we provide a distribution-free alternative
approach that approximates the optimal parametric likelihood ratio test, utilizing the
density-based empirical likelihood methodology.
The empirical likelihood (EL) methodology has been discussed considerably as one of
the principal nonparametric techniques in the statistical literature (e.g. Qin and Lawless,
1994; Owen, 1988, 1990, 2001). The EL ratio approach provides researchers with
methods that are asymptotically close to the parametric likelihood techniques without
parametric assumptions (e.g. Lazar and Mykland, 1998). This approach is based on the
likelihood in the form of ∏ᵢ₌₁ⁿ pᵢ, where the probability weights p₁, …, pₙ are
chosen to maximize the likelihood, given empirical constraints (e.g. Owen, 1988,
1990, 2001). For example, a null hypothesis regarding the mean of the underlying
distribution declares the empirical constraint ∑ᵢ pᵢXᵢ = μ₀ under the assumption
E(X) = μ₀. Nevertheless,
according to the Neyman-Pearson lemma, the most powerful test statistic for testing the
null hypothesis against an alternative is the likelihood ratio ∏ᵢ f₁(Xᵢ) / f₀(Xᵢ), where the
density functions f₀ and f₁ correspond to the distributions of
the observations under the null and alternative hypotheses, respectively.
Recently, it has been shown that one-sample density-based EL ratio tests for
goodness-of-fit introduced by Vexler and Gurevich (2010) can be efficiently applied to
construct goodness-of-fit tests. Gurevich and Vexler (2011) and Vexler and Yu (2011)
extended the density-based EL ratio tests for goodness-of-fit to tests that can be applied
to two-sample problems. In this chapter, we extend and modify the Gurevich and
Vexler (2011) test to compare two independent samples, where each sample is based on
paired data instead of independent observations. The contributions of this chapter lie
mainly in three directions. First, the proposed test is constructed to detect between-group
differences with respect to treatment effects based on paired data. Note that the
two-sample tests mentioned in the literature (e.g., Gurevich and Vexler, 2011) can be shown
to have good characteristics based on independent observations; however, they might
have a relatively weak efficiency to compare treatment effects. Second, we relax the
bounds of the test parameters and demonstrate the robustness of the proposed method
with respect to values of test parameters. The density-based EL literature states that
properties of density-based EL ratio procedures do not depend significantly on values of
the parameters. Thus, one of the objectives of this chapter is to confirm this fact by
evaluating the proposed test for different values of parameters via an extensive Monte
Carlo study. Lastly, we analyze powers of the proposed density-based EL ratio procedure
based on paired data in an extensive Monte Carlo study, where we utilize various
scenarios of distributions and sample sizes. It should be noted that in nonparametric
settings, commonly there are no statistical tools that can be recommended uniformly for
all possible scenarios. Therefore, in the context of applied statistics, extensive empirical
evaluations of the powers of the proposed nonparametric test and the relevant classical
nonparametric procedures would be very helpful to provide investigators with
information regarding the efficiency of the proposed nonparametric technique as
compared to standard procedures.
The applicability of the proposed test in practice is illustrated via the motivating
example based on a dataset from the Center for Children and Families, University at
Buffalo, the State University of New York. The purpose of the study is to investigate the
feasibility and efficacy of a group-based therapy program for children with Attention-Deficit/Hyperactivity Disorder (ADHD) and Severe Mood Dysregulation (SMD). ADHD
is a common diagnosed psychiatric disorder in children (e.g. Biederman, 1998; Nair et al.,
2006). SMD, a category recently created by Leibenluft's laboratory in the National Institute of
Mental Health's intramural program, refers to children with hyperarousal, an abnormal
baseline mood, and increased reactivity to negative emotional stimuli (e.g. Brotman et al.,
2006; Carlson, 2007; Leibenluft et al., 2003; Waxmonsky et al., 2008). Both ADHD and
SMD can significantly impair childrens behavioral
and psychophysiological
60
evaluate the treatment effects of ADHD and SMD in children using the proposed densitybased EL ratio test. To evaluate efficiency of the proposed density-based EL ratio test, we
construct appropriate tests based on the classical procedures that are the independent twosample t-test, the Wilcoxons test, and the two-sample Kolmogorov-Smirnov test.
The rest of this chapter is organized as follows. In Section 4.2, we propose the two-sample
density-based EL ratio test based on paired data to compare treatment effects in two
groups. The asymptotic consistency of the proposed test is also presented in Section 4.2.
Section 4.3 provides Monte Carlo comparisons between the proposed test and classical
procedures using various designs of data distributions. This section also confirms
experimentally the high efficiency of the density-based EL ratio technique in
scenarios of data distributions that have not been addressed in the EL literature.
The application of the proposed procedure to the treatment study of ADHD and SMD is
demonstrated in Section 4.4. In Section 4.5, we provide some concluding remarks.
4.2
METHOD
In this section, we extend and adapt the Gurevich and Vexler (2011) test to develop an
efficient density-based EL ratio method for the two-sample testing problem in paired data
settings. We begin by formalizing the testing problem. Suppose that we have independent
paired observations (X11, X12), ..., (Xn1 1, Xn1 2) obtained from the first study group and
(Y11, Y12), ..., (Yn2 1, Yn2 2) obtained from the second, where the components of each pair
are pre- and post-treatment measurements on the same subject. The classic one-sample tests
for paired data, such as the paired t-test and the Wilcoxon signed rank test, consider the
within-pair differences. In this manner, we define Z1j = Xj2 - Xj1, j = 1, ..., n1, and
Z2j = Yj2 - Yj1, j = 1, ..., n2, denote the corresponding distribution functions by F1(z)
and F2(z), and state the hypotheses

    H0: F1 = F2  vs.  H1: F1 ≠ F2.    (4.1)

In the statement of problem (4.1), we compare two samples based on pre- and post-treatment
measurements. In the context of two-sample comparisons of data distributions, Gurevich and
Vexler (2011) proposed to approximate optimal likelihood ratios using the EL concept. The
test proposed by Gurevich and Vexler (2011) was not evaluated when paired data were
utilized. We extend and modify this approach to apply to the statement of problem (4.1).
In accordance with the stated problem, the corresponding parametric likelihood ratio test
statistic for (4.1) is the ratio of the joint likelihoods of the differences under H1 and
H0, which involves the unknown density functions of Z1j and Z2j.

Following the density-based EL methodology, we approximate this likelihood ratio
nonparametrically. The unknown density values at the ordered differences are treated as
free components of the EL and are estimated by maximizing the EL with respect to these
values, subject to an empirical version of the constraint that each density integrates to
one; a Lagrange multiplier argument then yields closed-form EL estimates expressed through
the empirical distribution functions of the samples of differences evaluated over spacings
of 2m neighboring order statistics, where m is an integer smoothing parameter (with the
convention that indices are truncated at the boundaries of the ordered samples). In
practice, we need to adapt the test structure by eliminating the dependence on the integer
parameter m. Applying arguments similar to those presented in the Appendix of Gurevich and
Vexler (2011) and using the maximum EL concept, the modified test statistic (4.4) is
obtained by optimizing the log EL over m within bounds determined by a pair of parameters,
say (δ1, δ2), with 0 ≤ δ1 < δ2 < 1; these bounds are declared by Proposition 4.1 mentioned
below to present the consistency of the proposed test. Likewise, following the same
density-based EL technique, we can obtain the density-based EL estimator (4.5) of the
likelihood under H0, based on the combined sample of differences, with a corresponding
integer parameter k. Finally, taking the logarithm of the equations (4.4) and (4.5) and
combining them together yields our new density-based EL ratio test statistic, say
log Vn1,n2, defined at (4.6). Since the test statistic approximates the optimal likelihood
ratio test statistic, the proposed test is expected to be very powerful. The decision rule
developed for hypothesis testing (4.1) is to reject the null hypothesis if

    log Vn1,n2 > Cα,    (4.7)

where Cα is a test threshold controlling the type I error at a significance level α;
whenever an empirical distribution function inside a logarithm turns out to be zero, the
corresponding term is arbitrarily defined so that the statistic remains well defined.
Proposition 4.1 below describes the asymptotic behavior of the test statistic under the
selected bounds on m and k, which yields the consistency of the proposed test.
The proof of this proposition is based on the proof scheme of Proposition 4.1 of
Gurevich and Vexler (2011) with additional applications of theorems of Bahadur (1966).
We omit the lengthy and technical proof of Proposition 4.1 for brevity.
Proposition 4.1 demonstrates that the proposed test is asymptotically consistent. Note
that the structure of the proposed test includes the pair of parameters (δ1, δ2) that
bound the smoothing parameters. One of the aims of this chapter is to show that the
properties of the proposed test do not depend significantly on the selected values of
δ1 and δ2. Moreover, the null distribution of the test statistic does not depend on the
underlying (continuous) distribution of the differences, so critical values under H0 can
be tabulated exactly, using, e.g., Monte Carlo methods.
4.3
SIMULATION STUDY
To evaluate the performance of the proposed method, the following Monte Carlo
experiment was carried out. We begin by tabulating the critical values of the null
distribution of the test statistic. Samples of differences were generated under H0 from
the standard normal distribution N(0, 1). Repeating the data generation 50,000 times, we
obtained 50,000 replicate values of the test statistic, which were used to compute the
critical values Cα at the levels α = 0.01, 0.05, and 0.1 for various sample sizes
(n1, n2) and different selected values of the pair (δ1, δ2). The simulation results of
the experiment are shown in Table 4.1.
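The tabulation scheme just described is generic: simulate data under the null, evaluate the statistic, and take upper quantiles of its replicated values. A sketch of the scheme, with the two-sample Kolmogorov-Smirnov statistic standing in for the density-based EL ratio statistic of (4.6) (whose definition is not reproduced here), and with a reduced replication count for speed:

```python
import numpy as np
from scipy import stats

def mc_critical_values(stat_fn, n1, n2, alphas=(0.01, 0.05, 0.10),
                       reps=50_000, seed=1):
    """Tabulate upper critical values of stat_fn under H0, generating both
    samples of differences from N(0, 1)."""
    rng = np.random.default_rng(seed)
    vals = np.empty(reps)
    for r in range(reps):
        vals[r] = stat_fn(rng.standard_normal(n1), rng.standard_normal(n2))
    return {a: float(np.quantile(vals, 1.0 - a)) for a in alphas}

# Illustration: KS statistic as a stand-in for the EL ratio statistic.
crit = mc_critical_values(lambda x, y: stats.ks_2samp(x, y).statistic,
                          n1=10, n2=10, reps=2_000)
print(crit)
```

Because the null distribution of a distribution-free statistic does not depend on the generating continuous distribution, N(0, 1) can be used without loss of generality.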
Table 4.1: The critical values Cα of the proposed test at (4.7) with various (δ1, δ2)
values for sample sizes (n1, n2) at the significance levels α = 0.01, 0.05, and 0.1.

(δ1, δ2)    α     (10,10)  (15,15)  (15,25)  (25,25)  (35,35)  (45,45)  (50,50)
(0, 0.5)   0.01    9.3453  10.3877  11.1116  11.9537  13.1402  14.2593  14.7108
           0.05    7.8281   8.7486   9.3777  10.1647  11.3360  12.3041  12.7678
           0.10    7.1484   7.9681   8.5862   9.3214  10.4848  11.4243  11.8860
(0, 0.6)   0.01    9.2927  10.3000  11.1116  11.9506  13.1462  14.1090  14.7456
           0.05    7.7415   8.7194   9.3777  10.2037  11.2840  12.2720  12.7199
           0.10    7.0639   7.9550   8.5862   9.3785  10.4399  11.4064  11.8339
(0, 0.7)   0.01    9.2329  10.2949  11.1237  11.8968  13.1292  14.1711  14.6170
           0.05    7.7251   8.7133   9.3523  10.2180  11.2918  12.2998  12.7341
           0.10    7.0720   7.9764   8.5521   9.3738  10.4552  11.4066  11.8278
(0, 0.8)   0.01    9.2685  10.3264  11.0694  11.9343  13.1525  14.0960  14.6327
           0.05    7.7005   8.7036   9.3184  10.1623  11.3200  12.3017  12.7315
           0.10    7.0513   7.9291   8.5367   9.3326  10.4634  11.4200  11.8356
(0, 0.9)   0.01    9.2810  10.2860  11.0481  11.8869  13.1518  14.2269  14.6962
           0.05    7.7377   8.6926   9.3487  10.1585  11.3087  12.2990  12.7476
           0.10    7.0594   7.9694   8.5270   9.3615  10.4593  11.4061  11.8665
(0.1, 0.5) 0.01    9.3505  10.4619  11.1104  11.8833  13.0643  14.1751  14.6041
           0.05    7.8262   8.7432   9.4116  10.1560  11.2571  12.2677  12.7245
           0.10    7.1352   7.9729   8.6065   9.3454  10.4272  11.4030  11.8335
(0.1, 0.6) 0.01    9.2892  10.4074  11.0543  11.9135  13.0473  14.0862  14.6985
           0.05    7.7570   8.7067   9.3638  10.1363  11.2593  12.2328  12.7414
           0.10    7.0718   7.9626   8.5585   9.3592  10.4239  11.3824  11.8510
(0.1, 0.7) 0.01    9.2044  10.3085  11.1186  11.8887  13.1386  14.1869  14.6367
           0.05    7.7193   8.7452   9.3878  10.1763  11.3269  12.3182  12.6919
           0.10    7.0600   7.9890   8.5748   9.3506  10.4776  11.4320  11.8422
(0.1, 0.8) 0.01    9.3381  10.3016  10.9595  11.8469  13.1968  14.1934  14.5721
           0.05    7.7256   8.6902   9.3178  10.1685  11.3018  12.3011  12.7192
           0.10    7.0513   7.9656   8.5535   9.3372  10.4385  11.4067  11.8361
(0.1, 0.9) 0.01    9.3553  10.2482  11.0056  11.9070  13.3179  14.1376  14.6761
           0.05    7.7695   8.6913   9.2939  10.1758  11.3176  12.3093  12.7526
           0.10    7.0639   7.9372   8.5332   9.3347  10.4670  11.4092  11.8606
(0.3, 0.5) 0.01    9.8085  10.6346  11.4540  12.3162  13.3237  14.2501  14.7420
           0.05    8.2263   8.9543   9.7283  10.5765  11.4876  12.3624  12.7922
           0.10    7.5314   8.1834   8.9023   9.7645  10.6432  11.5093  11.8779
(0.3, 0.6) 0.01    9.7274  10.5522  11.4267  12.3644  13.3133  14.3610  14.6681
           0.05    8.1814   8.9218   9.6847  10.5418  11.4443  12.3914  12.7798
           0.10    7.4760   8.1521   8.8954   9.7147  10.6206  11.5111  11.9154
(0.3, 0.7) 0.01    9.7781  10.5874  11.4992  12.3325  13.3398  14.2964  14.7754
           0.05    8.1341   8.9217   9.7175  10.5579  11.5310  12.3872  12.7980
           0.10    7.4574   8.1852   8.9117   9.7226  10.6658  11.5029  11.9111
(0.3, 0.8) 0.01    9.7059  10.5922  11.4186  12.2692  13.3090  14.2615  14.7270
           0.05    8.1505   8.9444   9.6586  10.5353  11.4801  12.3402  12.8266
           0.10    7.4614   8.2003   8.8859   9.7096  10.6150  11.4671  11.9277
(0.3, 0.9) 0.01    9.7991  10.5570  11.4600  12.3332  13.4335  14.2825  14.6613
           0.05    8.1529   8.8921   9.7241  10.5610  11.5100  12.3721  12.7856
           0.10    7.4731   8.1300   8.8893   9.7366  10.6499  11.5001  11.9121
(0.4, 0.5) 0.01   10.6112  11.2950  12.1616  13.0224  13.7326  15.0571  15.3855
           0.05    8.9814   9.5875  10.3334  11.1941  11.9243  13.2592  13.5104
           0.10    8.2424   8.7705   9.5210  10.3597  11.0983  12.4152  12.7124
(0.4, 0.6) 0.01   10.5291  11.2244  12.1387  12.9591  13.7107  15.0017  15.5087
           0.05    8.8678   9.5175  10.2938  11.1354  11.9068  13.2433  13.5428
           0.10    8.1257   8.7307   9.4957  10.3177  11.0790  12.4149  12.6994
(0.4, 0.7) 0.01   10.3935  11.1481  12.0465  12.9445  13.7722  15.0430  15.4970
           0.05    8.8601   9.5108  10.2899  11.1424  11.9444  13.2021  13.5794
           0.10    8.1440   8.7274   9.4846  10.3336  11.0964  12.3860  12.7211
(0.4, 0.8) 0.01   10.6310  11.2006  12.0671  12.8801  13.6517  15.1367  15.4345
           0.05    8.9170   9.5283  10.3300  11.1162  11.9248  13.2400  13.5226
           0.10    8.1706   8.7469   9.5189  10.3191  11.1131  12.4186  12.6900
(0.4, 0.9) 0.01   10.5404  11.1530  12.1275  12.9025  13.7767  15.0916  15.5587
           0.05    8.8990   9.4997  10.2749  11.1423  11.9325  13.2249  13.5991
           0.10    8.1549   8.7254   9.4806  10.3266  11.0896  12.4138  12.7083
(0.5, 0.6) 0.01   10.5463  12.0270  12.9370  13.6077  15.1018  16.5377  16.8149
           0.05    8.8759  10.2588  11.1353  11.8599  13.3568  14.7771  15.0410
           0.10    8.1643   9.4938  10.2954  11.1037  12.5803  14.0069  14.2730
(0.5, 0.7) 0.01   10.5436  12.0173  12.8799  13.6496  15.2383  16.5816  16.8500
           0.05    8.8847  10.2115  11.0909  11.8876  13.3845  14.8188  15.0384
           0.10    8.1368   9.4756  10.2961  11.1079  12.6004  14.0539  14.2531
(0.5, 0.8) 0.01   10.5858  11.9449  12.9062  13.7680  15.1518  16.5732  16.7982
           0.05    8.9052  10.2432  11.1300  11.9260  13.3821  14.7714  15.0326
           0.10    8.1799   9.4871  10.3134  11.1472  12.5850  14.0195  14.2518
(0.5, 0.9) 0.01   10.5665  11.9446  12.9200  13.7055  15.1155  16.5425  16.8217
           0.05    8.8779  10.2538  11.0918  11.8834  13.3538  14.7624  15.0570
           0.10    8.1761   9.4856  10.2965  11.1167  12.6088  14.0037  14.2668
(0.6, 0.7) 0.01   11.5131  12.8077  14.2238  15.3984  16.9911  19.3162  19.5837
           0.05    9.7490  11.0887  12.4651  13.7289  15.2774  17.7350  17.9512
           0.10    9.0446  10.3768  11.7368  13.0363  14.5681  17.0535  17.2144
(0.6, 0.8) 0.01   11.5090  12.7709  14.2371  15.3914  16.9450  19.3345  19.6321
           0.05    9.7469  11.0969  12.4816  13.7190  15.2892  17.7199  17.9117
           0.10    9.0276  10.3841  11.7555  13.0406  14.5705  17.0355  17.2180
(0.6, 0.9) 0.01   11.6090  12.7505  14.1758  15.3425  16.9624  19.3088  19.5333
           0.05    9.7483  11.0864  12.4518  13.7218  15.2435  17.7318  17.9099
           0.10    9.0256  10.3713  11.7241  13.0369  14.5397  17.0605  17.2074
(0.7, 0.8) 0.01   12.5478  14.7903  16.5983  18.4280  20.9676  23.6376  24.8691
           0.05   10.7823  13.2178  15.0707  16.9354  19.5255  22.1146  23.4102
           0.10   10.0446  12.5575  14.4402  16.3259  18.9088  21.4800  22.7600
(0.7, 0.9) 0.01   12.6427  14.7093  16.6315  18.4181  20.9917  23.6172  24.8397
           0.05   10.7848  13.1639  15.0734  16.9273  19.5463  22.1180  23.3599
           0.10   10.0603  12.5262  14.4476  16.3094  18.9148  21.4977  22.7322
In order to show the appropriateness of the critical values displayed in Table 4.1, we
present in Table 4.2 the corresponding type I error rates of the proposed test, which were
obtained by applying critical values from Table 4.1 to 10,000 samples generated under the
null hypothesis. As can be seen from Table 4.2, the type I error rates of the proposed
test are well controlled. These results were expected since the proposed test is exact.
Therefore, critical values of the proposed test can be exactly calculated and the
associated estimated type I error rates can be well maintained at the nominal level α.

Table 4.2: Type I error control of the proposed test statistic (4.6) at the significance
level α = 0.05, based on 10,000 replications under the null hypothesis.

Design  (10,10)  (15,15)  (15,25)  (25,25)  (35,35)  (45,45)  (50,50)
1       0.0503   0.0500   0.0508   0.0512   0.0490   0.0524   0.0503
2       0.0511   0.0483   0.0511   0.0508   0.0515   0.0495   0.0498
3       0.0509   0.0493   0.0487   0.0488   0.0513   0.0494   0.0514
4       0.0505   0.0485   0.0494   0.0498   0.0501   0.0493   0.0494
5       0.0502   0.0504   0.0500   0.0495   0.0516   0.0526   0.0507
6       0.0479   0.0503   0.0493   0.0513   0.0519   0.0500   0.0507
To evaluate the power of the proposed test, we generated samples of differences from
alternative distributions that can be categorized as the following cases: (i) constant
location shifts, such as design K1 - N(0, 1) vs. N(0.5, 1) presented in Table 4.3;
(ii) constant versus nonconstant location shifts, such as design K2 - N(0, 1) vs.
N(0.5, 1.3²); and (iii) skewed data, such as design K27 - Exp(0.1) vs. a chi-squared
distribution. Table 4.3 displays all considered alternative distributions.
Table 4.3: Designs of the alternative hypothesis to be applied to the following Monte
Carlo evaluations of the powers of the proposed test (4.6).

Design   Group 1 distribution   Group 2 distribution
K1       N(0, 1)                N(0.5, 1)
K2       N(0, 1)                N(0.5, 1.3²)
K3       N(0, 1)                N(0.5, 1.5²)
K4       N(0, 1)                N(0.5, 2.25²)
K5       N(0, 1)                N(0, 1.5²)
K6       N(0, 1)                Unif[-1, 1]
K7       N(0, 1)                Cauchy(0, 1)
K8       Exp(1)                 LogNorm(0, 1)
K9       Exp(1)                 LogNorm(0, 2²)
K10      Beta(0.7, 1)           Exp(2)
K11      LogNorm(0, 1)          LogNorm(0.5, 1.5²)
K12      LogNorm(0, 1)          LogNorm(1, 2.5²)
K13      LogNorm(0, 1)          LogNorm(0, 1.2²)
K14      LogNorm(0, 1)          Unif[1, 2]
K15      Gamma(1, 1)            Gamma(1, 2)
K16      Gamma(1, 1)            Gamma(1, 0.5)
K17      Gamma(1, 1)            Gamma(1, 1.25)
K18      Gamma(1, 1)            Gamma(3, 1.5)
K19      Gamma(1, 1)            Gamma(1, 5)
K20      Gamma(1, 1)            Gamma(1, 10)
K21      Gamma(1, 1)            Gamma(1, 50)
K22      Gamma(1, 1)            Gamma(5, 2.5)
K23      Gamma(1, 1)            Gamma(10, 2.5)
K24      χ²                     χ²
K25      LogNorm(0, 4²)         χ²
K26      LogNorm(0, 3²)         χ²
K27      Exp(0.1)               χ²
K28      Exp(0.1)               χ²
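The Monte Carlo power evaluations for the classical procedures can be sketched as follows (design K1 of Table 4.3 is used; the proposed EL statistic is omitted, and the replication count is reduced relative to the study for speed):

```python
import numpy as np
from scipy import stats

def mc_power(gen1, gen2, test, reps=2_000, alpha=0.05, seed=2):
    """Monte Carlo power: the fraction of replications in which the given
    two-sample test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        rejections += test(gen1(rng), gen2(rng)).pvalue < alpha
    return rejections / reps

# Design K1: N(0, 1) vs. N(0.5, 1) with sample sizes (25, 25).
g1 = lambda r: r.normal(0.0, 1.0, 25)
g2 = lambda r: r.normal(0.5, 1.0, 25)
power_t = mc_power(g1, g2, stats.ttest_ind)
power_ks = mc_power(g1, g2, stats.ks_2samp)
print(power_t, power_ks)
```

Under a pure location shift with normal data, as discussed below, the t-test is expected to dominate the distribution-oriented KS test.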
A summary of the power study is displayed in Figure 4.1. The presented
three-dimensional plots represent the following: the x-axis corresponds to sample sizes
(n1, n2) = (50, 50), (45, 45), (35, 35), (25, 25), (15, 25), (25, 15), (15, 15); the
y-axis stands for different values of the pair (δ1, δ2); the z-axis represents the Monte
Carlo powers of the considered tests. Each test utilized in the Monte Carlo study is
displayed with its own plotting symbol: the proposed test and, among the classical
procedures, the two-sample Kolmogorov-Smirnov test, Wilcoxon's test, and the independent
two-sample t-test. Note that some of the considered parameter values do not satisfy the
conditions of Proposition 4.1 as pointed out in the previous section. However, we
investigate these cases to show that the proposed test has stable operating
characteristics with respect to different values of δ1 and δ2.
In design K1, the alternative has a constant location shift. When observations are from
a normal distribution, it is anticipated that the t-test has the greatest power among all
considered tests. In this case, the t-test represents the maximum likelihood ratio test
based on correct assumptions regarding sample distributions. Wilcoxon's test is also
known to be efficient when a constant shift is in effect under the alternative, so it is
expected that the classical tests perform better for such types of alternatives. In these
situations, the proposed test has 10% to 20% lower power as compared to the classical
procedures.
Designs K2, K3, and K4 have constant and nonconstant location shifts in normal
distributions. In these cases, the power differences between the proposed test and the
classical procedures become larger with the increase of the scale parameter of the
alternative distribution. As can be seen in design K2, the powers of the proposed test
are close to those of the classical procedures. When the scale parameter of the
alternative distribution increases from 1.5² in design K3 to 2.25² in design K4, the
powers of the proposed test become much larger than those of the classical tests. For
design K5 with a scale shift, design K6 with the symmetric Uniform[-1, 1] distribution,
and design K7 with the heavy-tailed symmetric Cauchy distribution in the role of the
alternative distribution, it is obvious that the proposed test is superior to the
classical approaches. When we have nonconstant shifts, such as the exponential and
lognormal cases in designs K8 and K9 as well as the beta and exponential cases in design
K10, the proposed test clearly outperforms the classical tests.
In the skewed lognormal case of design K11, the proposed test is more powerful than
the classical procedures in most situations; only in the cases of large sample sizes
(e.g., 50 or 45) do the classical tests attain higher power than the proposed test. In
the remaining skewed lognormal cases of designs K12-K14, the proposed test obviously has
higher power than the classical procedures. In the skewed gamma cases of designs K15-K23,
the proposed test does not always perform better than the classical tests; however, the
power differences are not significant. In the skewed chi-squared cases of designs
K24-K28, the proposed test is superior to the classical tests except for design K24,
where the powers of the proposed test are still close to those of the classical tests. We
note again that the type I error control related to the t-test is asymptotic, i.e., the
powers demonstrated for the t-test may not correspond exactly to the significance level
of 0.05.
In summary, the Monte Carlo outputs report that the powers of the proposed test do
not depend significantly on the values of (δ1, δ2). As anticipated, the proposed test
works quite well in general, outperforming the classical procedures in many cases. For a
few of the considered sample sizes and designs, the powers of the proposed test and the
classical procedures are comparable. For most of the sample sizes and alternative designs
considered, the proposed test is found to be superior to the classical tests for the
two-sample problems based on paired data.
4.4
In this section, we applied our new method to a study conducted at the Center for
Children and Families, University at Buffalo, State University of New York. A novel
group therapy for children with Attention-Deficit/Hyperactivity Disorder (ADHD) and
Severe Mood Dysregulation (SMD) symptoms was created to develop effective treatments for
children with ADHD and SMD. The subjects recruited in this study were 32 children ages 7
to 12 with ADHD and SMD. They were randomly assigned to receive either the experimental
11-week group therapy program or community psychosocial treatment. The former was defined
as the therapy group, with sample size n1 = 17, whereas the latter was referred to as the
control group, with sample size n2 = 15. Measurements were taken at two time points:
baseline (week 0) and endpoint (week 11). The paired data, constituted by the changes in
the Children's Depression Rating Scale-Revised total score (CDRS-Rts) between baseline
and endpoint, were utilized as our outcome measures of interest (see, e.g., Poznanski et
al., 1979, 1984).
Let (Xj1, Xj2), j = 1, ..., 17, and (Yj1, Yj2), j = 1, ..., 15, be the paired CDRS-Rts
observations obtained at baseline and endpoint in the therapy group (group 1) and the
control group (group 2), respectively. We consider the differences Z1j = Xj2 - Xj1 and
Z2j = Yj2 - Yj1 in CDRS-Rts and test for differences between the new therapy and control
groups with respect to treatment effects of ADHD and SMD in children. Specifically, we
are interested in testing if there are differences in the distributions of paired
CDRS-Rts data between the two treatment groups. The empirical histograms of the two
samples of differences are presented in Figure 4.2.
The mean and standard deviation of the paired differences in group 1 are -6.8235 and
4.6534, respectively, while those based on the paired differences in group 2 are -3.8667
and 4.4540, respectively. In Section 4.3, the powers of the proposed test were shown
experimentally to be insensitive to the values of (δ1, δ2); without loss of generality,
in this example we utilize a fixed pair (δ1, δ2), with the corresponding critical value
taken from Table 4.1. The proposed test does not reject the null hypothesis at the
significance level α = 0.05; applying the two-sample Kolmogorov-Smirnov test, Wilcoxon's
test, and the t-test, we obtained p-values also larger than the significance level.
According to these testing results based on the full dataset, our new test and the
classical tests lead to the identical conclusion that the distributions of paired
CDRS-Rts data have no statistically significant differences between the two therapy
groups at α = 0.05.

To examine how this conclusion depends on the amount of available data, we conducted a
resampling study with increasing subsample sizes (n1, n2) = (7, 5), (9, 7), (11, 9),
(13, 11), (15, 13), (17, 15). To obtain a p-value for the proposed test in each case, the
test statistic was referred to its null distribution generated from a standard normal
distribution N(0, 1). We repeated this
resampling with replacement 1,000 times and obtained a p-value in each resample. An
average p-value was computed by taking the average of the obtained 1,000 p-values. The
resulting average p-values of the considered tests are displayed in Figure 4.3. It can be
seen from Figure 4.3 that, for all tests, the average p-value decreases as the sample
sizes of the groups increase, suggesting that with more data the p-values would probably
drop below the significance level α = 0.05.

We also conducted a bootstrap-type study with sample sizes (n1, n2) = (10, 10): two
samples of size 10 were selected at random, with replacement, from the original data, the
considered tests were applied to each pair of resamples (with the critical value of the
proposed test chosen from Table 4.1), and this procedure was repeated 10,000 times. Table
4.4 depicts the results. The proposed test rejects the null hypothesis in 4,273 cases,
while the two-sample Kolmogorov-Smirnov test, Wilcoxon's test, and the independent
two-sample t-test reject the null hypothesis in 1,226, 2,786, and 2,889 cases,
respectively. The number of rejections in each test is not large compared to the 10,000
resamples (i.e., the rejection rate of each test is small). This again demonstrates that
when we do not have enough data, the null hypothesis is more likely not to be rejected.
Note that the proposed test has the largest proportion of rejections, showing that our
new test is the most sensitive, among the considered procedures, for detecting the
differences between two samples based on paired data.
Table 4.4: Proportion of rejections* based on the bootstrap method for each considered test.

Considered Test             Proportion of rejections, (n1, n2) = (10, 10)
Proposed test               0.4273
Kolmogorov-Smirnov test     0.1226
Wilcoxon's test             0.2786
t-test                      0.2889

*The proportion of rejections of each test from the bootstrap method was computed based
on sample sizes (n1, n2) = (10, 10) and 10,000 replications.
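The resampling scheme behind Table 4.4 can be sketched as follows. The differences below are simulated to mimic the reported group summary statistics (the raw CDRS-Rts data are not reproduced here), and the t-test stands in for the full battery of considered tests:

```python
import numpy as np
from scipy import stats

def bootstrap_rejection_rate(z1, z2, test, m=10, reps=10_000,
                             alpha=0.05, seed=3):
    """Repeatedly draw subsamples of size m with replacement from each group
    and record how often the two-sample test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(reps):
        b1 = rng.choice(z1, size=m, replace=True)
        b2 = rng.choice(z2, size=m, replace=True)
        count += test(b1, b2).pvalue < alpha
    return count / reps

rng = np.random.default_rng(0)
z1 = rng.normal(-6.8, 4.7, 17)  # mimics the group 1 summary statistics
z2 = rng.normal(-3.9, 4.5, 15)  # mimics the group 2 summary statistics
rate = bootstrap_rejection_rate(z1, z2, stats.ttest_ind, reps=2_000)
print(rate)
```

A higher rejection rate under subsampling indicates greater sensitivity of a test at the reduced sample size.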
4.5
CONCLUSIONS
In this chapter, we extended and adapted the Gurevich and Vexler (2011) test to
develop a nonparametric approach for comparing treatment effects between two study
groups of individuals involved in biomedical studies. In contrast to the Gurevich and
Vexler (2011) test, we relaxed the boundaries on the values of the test parameters
(considering, in particular, lower-bound values between 0 and 0.5) and used Monte Carlo
simulations to see how that
affects the performance of the proposed test. The simulation results demonstrated that the
proposed test has stable operating characteristics with respect to the values of the test
parameters. Moreover, we extensively examined the power properties of the proposed test and
relevant classical tests. The study results demonstrated that the proposed test is very
efficient, even outperforming the standard procedures in many cases. We showed that
when the underlying data are normally distributed and only a location shift is assumed
under the alternative hypothesis, the proposed test has high and stable power to detect
differences in location, resulting in a relatively small power loss as compared to the
classical Wilcoxon's test, the two-sample Kolmogorov-Smirnov test, and the independent
two-sample t-test. By contrast, in the case of nonconstant location-shift alternatives
with normally distributed data, the proposed test achieves a substantial power gain in
comparison to the standard procedures. Furthermore, we applied the proposed method to a
real data example, showing that the proposed procedure helped to confirm the decision
regarding the treatment effect of ADHD and SMD in children. This illustrates the
practical applicability of the proposed test. Therefore, the proposed test can be
utilized as a very powerful tool in nonparametric statistical inference applied to
two-sample problems based on paired data. The proposed approach can be easily extended to
k-sample problems.
Figure 4.1: 3-D plots of powers of the considered tests for all 28 designs (K1-K28) with
different sample sizes (n1, n2) = (50, 50), (45, 45), (35, 35), (25, 25), (15, 25),
(25, 15), (15, 15) and parameter settings (δ1, δ2) = (0, 0.5), (0, 0.6), (0, 0.7),
(0, 0.8), (0, 0.9); (0.1, 0.5), (0.1, 0.6), (0.1, 0.7), (0.1, 0.8), (0.1, 0.9);
(0.3, 0.5), (0.3, 0.6), (0.3, 0.7), (0.3, 0.8), (0.3, 0.9); (0.4, 0.5), (0.4, 0.6),
(0.4, 0.7), (0.4, 0.8), (0.4, 0.9); (0.5, 0.6), (0.5, 0.7), (0.5, 0.8), (0.5, 0.9);
(0.6, 0.7), (0.6, 0.8), (0.6, 0.9); (0.7, 0.8), (0.7, 0.9), at the significance level
α = 0.05.
Figure 4.2: Histograms of the differences in CDRS-Rts at baseline and endpoint in
group 1 (n1 = 17 samples) and group 2 (n2 = 15 samples), respectively.
Figure 4.3: Plot of sample sizes vs. average p-value using a bootstrap method, for the
proposed test, the Kolmogorov-Smirnov test, Wilcoxon's test, and the t-test (y-axis:
average p-value from 0 to 0.5; x-axis: sample sizes from (7, 5) to (17, 15)).
CHAPTER 5
TWO-SAMPLE DENSITY-BASED EMPIRICAL
LIKELIHOOD RATIO TESTS BASED ON PAIRED
DATA, WITH APPLICATION TO A TREATMENT
STUDY OF ATTENTION-DEFICIT/
HYPERACTIVITY DISORDER AND SEVERE
MOOD DYSREGULATION
5.1
INTRODUCTION
Often, investigators in various fields of medical studies deal with paired data to compare
different population groups. In this chapter, we propose a paired data-based methodology
motivated by the following comparative study of Attention-Deficit/Hyperactivity
Disorder (ADHD) and Severe Mood Dysregulation (SMD). ADHD is a commonly
diagnosed psychiatric disorder in children (e.g., Biederman, 1998; Nair et al., 2006).
SMD is a diagnostic label recently created by the Leibenluft laboratory in the National
Institute of Mental Health's intramural program to refer to children with an abnormal
baseline mood, hyperarousal, and increased reactivity to negative emotional stimuli (e.g.,
Brotman et al., 2006; Carlson, 2007; Leibenluft et al., 2003; Waxmonsky et al., 2008). A
novel group therapy study at University at Buffalo enrolled 32 children aged 7-12 with
ADHD and SMD. These children were treated for 11 weeks. The study participants were
randomized between two therapy groups: experimental group therapy program (case; new
therapy group) and community psychosocial treatment (control; old therapy group). An
objective of the study was to compare the feasibility and efficacy of these two treatments
using the Children's Depression Rating Scale-Revised total score (CDRS-Rts). The
Children's Depression Rating Scale, revised version (CDRS-R), is a clinician-rated
instrument for the diagnosis of childhood depression and the assessment of the severity
of depression in children 6-12 years of age (Poznanski et al., 1979, 1984). The CDRS-R
consists of 17 clinician-rated items, with four items based on the child's self-report or
reports from the parents or teachers and three items based on the child's nonverbal
behavior during the interviews. The CDRS-R provides more reliable depression ratings
compared to other children's depression rating scales, since it collects information from
more sources by interviewing the child, parents, or school teachers independently,
considers the child's behavior during the interview, and uses lengthened scales to
capture slight differences in symptomatology. On the basis of clinical experience, a
CDRS-Rts below 40, between 40 and 60, and above 60 corresponds to none-to-mild, moderate,
and severe depression, respectively (Poznanski et al., 1979, 1984; Ying et al., 2006).
Thus, a significant drop in CDRS-Rts over the course of the study implies that a
treatment is effective. To record the paired data of this study, two measurements were
taken from the same subjects: the paired data were constituted by the observed values of
CDRS-Rts at week 0 (baseline) and week 11 (endpoint).
In this medical study, the main research problems are to test for differences between
the distributions of the two therapy groups as well as to detect treatment effects within
each group. Suppose we observe independent and identically distributed (i.i.d.) pairs of
observations within a subject j from sample i, where i = 1, 2 are referred to as
treatments and j = 1, ..., ni indexes the subjects in group i. The classic one-sample
tests for paired data, e.g., the paired t-test and the Wilcoxon signed rank test, are
based on the within-pair differences, say Z1j and Z2j, where Zij denotes a within-pair
difference for subject j under treatment i. Note that {Z1j} and {Z2j} are samples with
distribution functions, say, F1(z) and F2(z), and one can detect a treatment effect with
tests for symmetry of the corresponding distribution about zero (e.g., Wilcoxon, 1945).
Note that the Kolmogorov-Smirnov test is a known procedure to compare distributions of
populations, whereas the standard testing procedures such as the paired t-test, the sign
test, and the Wilcoxon signed rank test can be applied to the symmetry problem, i.e., to
test whether the distribution of the within-pair differences is symmetric about zero.
Testing for equality of the two distributions as well as detecting treatment effects may
be based on multiple hypotheses tests. To this end, one can create relevant tests
combining, for example, the Kolmogorov-Smirnov test and the Wilcoxon signed rank test.
The use of the classical procedures commonly requires complex considerations to combine
the known nonparametric tests. Alternatively, we will develop a direct distribution-free
method for analyzing the two-sample problems. The proposed method can be easily applied
to test nonparametrically for different composite hypotheses. The proposed approach
approximates nonparametrically most powerful Neyman-Pearson test rules, providing
efficiency of the proposed procedures.
When parametric forms of the relevant distributions are known, the corresponding
parametric likelihood ratios can be easily applied to test the problems mentioned above.
According to the Neyman-Pearson lemma, the parametric likelihood ratio tests are optimal
decision rules (e.g., Lehmann and Romano, 2005). We propose to approximate the
corresponding likelihood ratios using the empirical likelihood (EL) concept. The EL
methodology has been addressed in the statistical literature as one of the powerful
nonparametric techniques (e.g., Owen, 1988, 1991, 2001). The EL methodology allows
researchers to use distribution-free procedures with efficient characteristics that are
asymptotically close to those of related parametric likelihood approaches (e.g., Lazar
and Mykland, 1998). The classical EL approach is developed in terms of cumulative
distribution functions (e.g., Owen, 2001). Vexler and Yu (2011) demonstrated that the
classical EL method based on distribution functions is well suited to testing hypotheses
about parameters; however, the EL technique based on density functions performs more
efficiently in tests for distributions. To approximate Neyman-Pearson test statistics,
Vexler and Gurevich (2010) and Gurevich and Vexler (2011) proposed to focus on the
density-based EL, in which the likelihood is written as a product of density values at
the ordered observations and these values are estimated by maximizing the EL subject to
an empirical version of the constraint that the density integrates to one.
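The density-based EL constructions discussed above are built from spacings-type estimates of density values at the order statistics. A minimal sketch of this classical ingredient, a Vasicek-type estimate with a crude truncation at the sample boundaries (this illustrates the general device, not the exact estimator of Gurevich and Vexler, 2011):

```python
import numpy as np

def spacings_density_at_order_stats(x, m):
    """Estimate f at each order statistic X_(j) by 2m / (n * (X_(j+m) - X_(j-m))),
    truncating the indices j+m and j-m at the sample boundaries."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    j = np.arange(n)
    hi = np.minimum(j + m, n - 1)
    lo = np.maximum(j - m, 0)
    return 2.0 * m / (n * (x[hi] - x[lo]))

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
f_hat = spacings_density_at_order_stats(x, m=14)
print(f_hat.mean())
```

Such estimates enter the density-based EL through products over the ordered sample, with the integer window m playing the role of the smoothing parameter whose admissible range the test parameters bound.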
We extend and adapt the density-based EL approach to the two-sample testing issues,
carrying out multiple testing problems in paired data settings. Although many statistical
inference procedures have been developed for two-sample problems, to our knowledge,
relevant nonparametric likelihood techniques for the presented two-sample issues based on
paired data have not been well addressed in the literature. The proposed density-based EL
tests are exact, which ensures accurate computation of the relevant p-values based on
data with small sample sizes.
The rest of this chapter is organized as follows. In Section 5.2, we address the purpose
of each testing hypothesis considered in this chapter, and then we develop corresponding
density-based EL ratio test statistics. The theoretical results will be presented to show the
asymptotic consistency of the proposed tests. To evaluate the proposed approaches,
extensive Monte Carlo studies are carried out in Section 5.3. An application to analyze
the CDRS-Rts data is presented in Section 5.4. In Section 5.5, we provide some
concluding remarks.
5.2
5.2.1
HYPOTHESES SETTING
To test for equality of the distributions of the new therapy group and the control
therapy group based on the paired observations, one can state the hypotheses

    H0: F1 = F2  vs.  H1: F1 ≠ F2,

where F1 and F2 are the distribution functions of the within-pair differences {Z1j} and
{Z2j} defined in Section 5.1. In order to incorporate evaluation of the treatment effect
on each therapy group, we point out three tests related to null hypotheses that combine
1) the equality of the distributions of the two therapy groups and 2) no treatment effect
in each group, where no treatment effect in group i is formalized as symmetry of the
distribution of Zij about zero. The corresponding alternatives assert that the
distributions of the two groups differ, or that a treatment effect is present in at least
one group (i.e., the distribution of Zij is not symmetric about zero for i = 1 or 2), or
both. One of the considered formulations asserts that both therapy groups have the same
treatment effect. Let Test 1, Test 2, and Test 3 refer to the hypothesis tests for these
composite hypotheses, respectively.
TEST 1: We test the composite null hypothesis that the distributions of the within-pair
differences coincide across the two groups and are symmetric about zero (no treatment
effect) against the alternative that at least one of these statements fails, based on the
differences Zij, i = 1, 2, j = 1, ..., ni. The corresponding likelihood ratio is the
ratio of the joint likelihoods of the differences under the alternative and the null
hypotheses; it involves the unknown density functions, say f1 and f2, of {Z1j} and {Z2j},
respectively. The main novelty of the proposed method for developing the nonparametric
test statistic is that we modify the maximum EL concept to obtain directly estimated
values of f1 and f2 at the ordered differences, maximizing the empirical likelihood. To
obtain the associated empirical constraint, we utilize the fact that the values of each
density should be restricted by the requirement that the density integrates to one; an
empirical version of this integral, written over spacings of the ordered differences,
provides the constraint (5.3).
By virtue of Proposition 2.1 in Vexler and Gurevich (2010), we have that, for all
integer values of the smoothing parameter m, the maximum EL solution expresses the
estimated density values through differences of distribution function values taken over
2m neighboring order statistics, with indices truncated at the sample boundaries. Using
the empirical distribution function

    Fn1(u) = n1^{-1} Σ_{j=1}^{n1} I(Z1j ≤ u),

the empirical version of the equation (5.3) then has the form of a sum, over the ordered
differences Z1(1) ≤ ... ≤ Z1(n1), of terms involving Fn1(Z1(j+m)) - Fn1(Z1(j-m)). This
leads to the empirical approximation (5.5) of the constraint.
Now, by the equations (5.1), (5.2), and (5.5), the resulting empirical constraint for values
of
is
(
To find values of
)(
where
)(
where
( ),
if
, and
),
if
can be
formulated by
values of the integer parameter m. Attending to this issue, we eliminate the dependence
97
on the integer parameter m. Towards this end, we utilize the maximum EL concept in a
similar manner to arguments proposed in Vexler and Gurevich (2010). Thus, the
modified test statistic can be written as
(
is
where
(
))
) )),
is defined in (5.2).
Finally, the proposed test statistic for Test 1 has the form of
if
(
where
is a test threshold. Proposition 5.1 in Section 5.2.3 will demonstrate that the
proposed test
bounds for the integer parameters m and k in definitions of (5.7) and (5.8), respectively,
were selected to provide the asymptotic consistency. Note that, to test for the composite
98
hypotheses
vs.
the Kolmogorov-Smirnov test and the Wilcoxon signed rank test can be applied (see, for
example, Section 5.3.2). Alternatively, the test (5.9) uses measurements from the therapy
groups, in an approximate Neyman-Person manner, providing a simple procedure to
evaluate the treatment effect on each therapy group. Section 5.3 shows, in various
situations, the test (5.9) is superior to the combinations of the classic tests based on
Kolmogorov-Smirnov and Wilcoxon procedures. It is also shown that the proposed
nonparametric test has power comparable with that of correct parametric likelihood ratio
tests. Thus, in contexts of the study described in Section 5.1, the direct application of the
density-based EL ratio test (5.9) provides an efficient evaluation of treatment effects with
ADHD and SMD in children.
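The m-spacings density estimates at the core of the construction above can be sketched as follows. This is an illustrative single-sample simplification, assuming a continuous sample with no ties; the function names are ours, and the exact statistic (5.9) additionally combines both groups and the symmetry structure under H0.

```python
import math
import random

def el_density_values(z, m):
    # Density-based EL estimates: f(Z_(j)) ~ 2m / (n * (Z_(j+m) - Z_(j-m))),
    # with out-of-range order-statistic indices clipped at the sample edges.
    z = sorted(z)
    n = len(z)
    return [2 * m / (n * (z[min(j + m, n - 1)] - z[max(j - m, 0)]))
            for j in range(n)]

def log_el(z, delta=0.1):
    # Eliminate the dependence on m by minimizing the summed log density
    # estimates over integer m <= n^(1 - delta).
    n = len(z)
    m_max = max(1, int(n ** (1 - delta)))
    return min(sum(math.log(f) for f in el_density_values(z, m))
               for m in range(1, m_max + 1))

random.seed(1)
z = [random.gauss(0.0, 1.0) for _ in range(50)]
print(round(log_el(z), 4))
```

The clipping at the edges mirrors the convention Zi(j) = Zi(1) for j < 1 and Zi(j) = Zi(ni) for j > ni used in the text.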
5.2.2.2 TEST 2: H0 vs. H2

In a similar manner to Test 1, the likelihood ratio for H0 vs. H2 can be defined through the unknown density values of the differences, and the corresponding density-based EL ratio can be obtained in the form of (5.14), in which the density values are replaced by their m-spacings estimates. Finally, taking into account (5.10) and (5.14), the proposed test statistic for Test 2 can be constructed as the statistic (5.15). In this case, the decision rule developed for Test 2 is to reject the null hypothesis if the statistic (5.15) exceeds a test threshold.
5.2.2.3 TEST 3: H0 vs. H3

Analogously, to construct the test statistic for H0 vs. H3, we derive the estimated density values from the ordered differences of {Z1j} and {Z2j}, respectively. The resulting density-based EL ratio test statistic for Test 3 is given by (5.17). Thus, the decision rule for Test 3 is to reject the null hypothesis if the statistic (5.17) exceeds a test threshold.
5.2.3 ASYMPTOTIC CONSISTENCY OF THE PROPOSED TESTS

Proposition 5.1. Under H0, the proposed test (5.9) attains the prespecified significance level asymptotically, whereas under H1 its power tends to one as min(n1, n2) goes to infinity; that is, Test 1 is asymptotically consistent.

Proposition 5.2. Analogous consistency results hold for the tests (5.15) and (5.17).

Proof. We omit the proof, since it is similar to the proof of Proposition 5.1.

Under H0, the distributions of the proposed test statistics are independent of the distributions of observations and hence, the critical values of the proposed tests can be exactly computed. For each proposed test, we conducted the following procedure to determine the critical values: we generated samples from the standard normal distribution N(0,1) and then calculated the test statistics corresponding to each proposed test. At each sample size (n1, n2), we obtained 50,000 generated values of the test statistics (5.9), (5.15), and (5.17), with delta = 0.1, tabulating the critical values for the null distributions of the test statistics at the significance levels alpha = 0.01; 0.05; 0.1 (see Table 5.2).
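The critical-value tabulation described above amounts to simulating the distribution-free null; a minimal sketch, with a placeholder statistic standing in for (5.9), (5.15), or (5.17) and fewer replications than the 50,000 used in the chapter:

```python
import random

def mc_critical_values(stat, n1, n2, alphas, reps=2000, seed=0):
    # Simulate the distribution-free null: draw N(0,1) samples of sizes
    # (n1, n2), evaluate the statistic, and tabulate upper quantiles.
    rng = random.Random(seed)
    sims = sorted(
        stat([rng.gauss(0, 1) for _ in range(n1)],
             [rng.gauss(0, 1) for _ in range(n2)])
        for _ in range(reps))
    # Critical value exceeded with probability ~ alpha under H0.
    return {a: sims[min(reps - 1, int((1 - a) * reps))] for a in alphas}

def toy_stat(z1, z2):
    # Placeholder statistic (NOT the EL statistic): |mean difference|.
    return abs(sum(z1) / len(z1) - sum(z2) / len(z2))

print(mc_critical_values(toy_stat, 10, 10, [0.01, 0.05, 0.1]))
```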
Table 5.2: The critical values for Test 1 by (5.9) (Test 2 by (5.15)) [Test 3 by (5.17)] with delta = 0.1 for different sample sizes (n1, n2) and significance levels alpha.

(n1 = n2)        10                  15                  20                  25                  30                  35
alpha = 0.01     7.44 (5.75) [4.17]  7.18 (5.85) [4.18]  7.39 (6.02) [4.18]  7.33 (6.17) [4.16]  7.45 (6.33) [4.25]  7.47 (6.20) [4.42]
Remark 5.1: The definitions (5.9), (5.15), and (5.17) of the proposed test statistics include the parameter delta (0 < delta < 1). We set up delta = 0.1. To examine the sensitivity of the tests to values of delta, we conducted an extensive Monte Carlo study. The Monte Carlo powers of the proposed tests were not found to be significantly dependent on values of delta. These experimental results are similar to those shown in Gurevich and Vexler (2011).
5.3

SIMULATION STUDY

In this section, we examine the power properties of the proposed tests in various cases using Monte Carlo simulations. The proposed tests based on (5.9), (5.15), and (5.17), with delta = 0.1, are compared with the common test procedures: the maximum likelihood ratio (MLR) tests, assuming parametric conditions on distributions of observations (for details of the constructions and definitions of the MLR tests, see Appendix A.3.3), and combined classic nonparametric tests with a structure based on the Wilcoxon signed rank test and/or the Kolmogorov-Smirnov test. We fixed the significance level of the tests to be 0.05 in all considered cases. The critical values of the proposed tests were computed as described in Section 5.2.3, based on N(0,1)-distributed observations Z. To study the powers of the tests, 10,000 samples for each size (n1, n2) were generated from a variety of distributions. Tables 5.3-5.5 depict the Monte Carlo powers of the proposed tests and those of the corresponding MLR tests.

When observations are normally distributed, as anticipated, the MLR tests would be more powerful than the proposed nonparametric tests. The tables show the powers of the proposed tests are very close to those of the MLR tests, demonstrating that the density-based EL tests are comparable to the parametric methods that utilize the correct information regarding distributions of observations. Table 5.6 displays the actual type I
errors of the MLR tests under the misspecification of underlying distributions, i.e., when observations were simulated from t distributions with different degrees of freedom, a logistic distribution with parameters (0,1), a Laplace distribution with parameters (0,1), and the Unif[0, 1] distribution under H0. As can be seen from Table 5.6, the type I errors of the MLR tests for Tests 1 and 2 are not under control unless the degrees of freedom of the t distribution are large (close to 200). For the cases of the logistic and the Laplace distributions, the type I errors of the MLR tests are not well controlled. When the observations are from Unif[0, 1], the impact of the misspecification of the model on the type I errors of the MLR tests is more significant. This illustrates that the considered MLR tests are strongly dependent on assumptions regarding distributions of observations.
We also compared the proposed tests with combined classic nonparametric tests, using the Bonferroni method to control the overall significance level. Let W-test denote the Wilcoxon signed rank test and K-S test denote the Kolmogorov-Smirnov test. The combined nonparametric test for Test 1 consists of two W-tests for symmetry and one K-S test based on {Z1j} and {Z2j}. The former tests are employed to assess a treatment effect of each therapy group, whereas the latter test is conducted to detect the group difference. Similarly, we performed the combined nonparametric test for Test 2 that includes one W-test and one K-S test; the classical procedure for Test 3 is based on the W-test. In this manner, we obtained comparisons of the two different testing procedures: the proposed procedures and the nonparametric testing procedures based on the W-test and/or K-S test using the Bonferroni approach.

The Monte Carlo outputs shown in Tables 5.7-5.9 indicate that the new tests have higher powers as compared to the combined nonparametric tests. In particular, for the cases of small sample sizes (e.g., (n1, n2) = (10, 10)), the proposed tests are significantly superior to the classic tests. In several cases, the powers of the proposed tests have values that are 3-4 times larger than those of the combined nonparametric tests.
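The Bonferroni-combined classic procedure can be sketched with large-sample approximations for both component tests. This is an illustrative stand-in (our own helper names); a production analysis would use exact small-sample versions of the W-test and K-S test.

```python
import math

def norm_sf(x):
    # Standard normal survival function via the complementary error function.
    return 0.5 * math.erfc(x / math.sqrt(2))

def wilcoxon_p(z):
    # Wilcoxon signed rank test of symmetry about zero, two-sided, using the
    # large-sample normal approximation (ties in |z| are ignored here).
    n = len(z)
    order = sorted(range(n), key=lambda i: abs(z[i]))
    w_plus = sum(rank + 1 for rank, i in enumerate(order) if z[i] > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    return 2 * norm_sf(abs(w_plus - mean) / sd)

def ks_p(z1, z2):
    # Two-sample Kolmogorov-Smirnov test with the asymptotic p-value.
    n1, n2 = len(z1), len(z2)
    d = max(abs(sum(x <= t for x in z1) / n1 - sum(x <= t for x in z2) / n2)
            for t in z1 + z2)
    lam = d * math.sqrt(n1 * n2 / (n1 + n2))
    if lam < 0.4:  # the alternating series is unstable near zero; p ~ 1 there
        return 1.0
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2)
                for k in range(1, 101))
    return max(0.0, min(1.0, p))

def combined_test1(z1, z2, alpha=0.05):
    # Bonferroni combination for Test 1: two W-tests for symmetry of each
    # difference sample plus one K-S test for the group difference.
    pvals = [wilcoxon_p(z1), wilcoxon_p(z2), ks_p(z1, z2)]
    return min(pvals) <= alpha / len(pvals)
```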
5.4

DATA ANALYSIS

In this section, we apply the proposed method to the study described in Section 5.1, which evaluates treatment effects of ADHD and SMD in children. Study subjects were randomized to receive either the experimental 11-week group therapy program or community psychosocial treatment; we refer to the former as group 1 and the latter as group 2. For each child enrolled in the study, the CDRS-Rts was taken at the baseline (week 0) and endpoint (week 11). Specifically, we computed the differences of CDRS-Rts, Zij = Xij - Yij, for i = 1, 2; j = 1, ..., ni, where Xij represents the CDRS-Rts at the baseline before subject j receives treatment i and Yij represents the CDRS-Rts at the endpoint after subject j receives treatment i. The empirical histograms of the CDRS-Rts at baseline and endpoint for each group are shown in Figure 5.1. As can be seen from Figure 5.1, it appears that both therapy groups have a decline in the CDRS-Rts after baseline, but the decrease in the CDRS-Rts seems to be more significant in group 1.

In the context of the study's interest to test a claim that the distributions of the changes in CDRS-Rts are not equivalent with respect to the therapy groups or at least one therapy group has a treatment effect, we performed the proposed Test 1. In this case, the observed value of the test statistic by (5.9), with delta = 0.1, is 22.8217 and the corresponding p-value is 0.00002, indicating the null hypothesis of no group differences and the lack of treatment effects in both groups is rejected. The combined nonparametric test (the two W-tests and one K-S test) also rejects the null hypothesis with the p-value 0.000005. Based on these results, there is strong evidence to reject the null hypothesis.

In addition, to demonstrate the applicability of the proposed tests, we carried out Test 2, which might be appropriate to test an assertion that there is a treatment effect in one group and no such effect in the other, besides a group difference. The observed value of the test statistic by (5.15), with delta = 0.1, is 11.9370 and the corresponding p-value is 4x10^-5. The combined nonparametric test (the W-test and one K-S test) with the Bonferroni method also supports the result to reject the null hypothesis, with the p-value of 5x10^-7. These results show that the proposed procedures are in agreement with the classic tests, demonstrating that our proposed tests can be utilized in the ADHD and SMD study.
To further examine the performance of the tests at small sample sizes, we employed a bootstrap strategy, drawing samples of sizes (n1, n2) = (9, 6), (9, 7), (11, 9), (13, 10), (13, 11), (15, 13) from the original dataset. Then we calculated the corresponding test statistic by (5.9), where delta = 0.1. We repeated this strategy 10,000 times, calculating the proportion of rejections at the 0.05 significance level, i.e., the percentage of times when the observed test statistic exceeded the corresponding critical value. The combined nonparametric tests were also carried out following the same procedures as described above. The results regarding the proportion of the rejections of the null hypothesis for each considered test are provided in Table 5.10.

Table 5.10 demonstrates that the proposed procedures have a larger proportion of the rejections in comparison with the combined nonparametric tests. In particular, when the sample sizes are relatively small (e.g., (n1, n2) = (9, 6)), the differences in the proportions of the rejections between the two approaches are strongly recognizable. For example, we selected a sample of size 9 from group 1 and a sample of size 6 from group 2. This sub-dataset was tested for the hypotheses H0 vs. H1. In contrast to the result that the combined nonparametric test based on the two W-tests and one K-S test is not significant (the p-values of these tests are 0.0617, 0.1050, and 0.9873, respectively), the proposed Test 1, with delta = 0.1, is statistically significant (p-value = 0.0005). Figure 5.2 shows the empirical histograms of Z1j and Z2j from the sub-dataset. All these results indicate that the proposed methods for Tests 1 and 2 are more sensitive to detect the difference between the null hypothesis and the alternative.
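The bootstrap power comparison described above can be sketched generically; `toy_decide` below is a placeholder decision rule, not the EL test (5.9).

```python
import random

def rejection_proportion(z1, z2, sizes, decide, reps=1000, seed=0):
    # Resample with replacement at the given (m1, m2) sizes from the observed
    # difference samples, apply the decision rule, and record rejections.
    rng = random.Random(seed)
    m1, m2 = sizes
    hits = 0
    for _ in range(reps):
        b1 = [rng.choice(z1) for _ in range(m1)]
        b2 = [rng.choice(z2) for _ in range(m2)]
        hits += bool(decide(b1, b2))
    return hits / reps

def toy_decide(b1, b2):
    # Placeholder decision rule (NOT the EL test): reject when the combined
    # mean shift is large on a crude absolute scale.
    return abs(sum(b1) / len(b1)) + abs(sum(b2) / len(b2)) > 0.5

random.seed(2)
z1 = [random.gauss(-1.0, 1.0) for _ in range(35)]
z2 = [random.gauss(-0.3, 1.0) for _ in range(30)]
print(rejection_proportion(z1, z2, (9, 6), toy_decide))
```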
5.5
CONCLUSIONS
In this chapter, we proposed and examined two-sample density-based EL ratio tests based on paired observations. While constructing the tests, we used approximations to the most powerful test statistics with respect to the stated problems, providing efficient nonparametric procedures. The proposed tests are shown to be exact and simple to perform. Extensive Monte Carlo studies confirmed the power properties of the proposed tests. We showed that our tests outperform different tests with a structure based on the Wilcoxon signed rank test and/or the Kolmogorov-Smirnov test, and outperform the parametric likelihood ratio tests when the underlying distributions are misspecified. The data example illustrated that the proposed tests can be easily and efficiently used in practice.
Table 5.3: The Monte Carlo powers of Test 1 by (5.9) vs. the MLR test for H0 vs. H1 with different sample sizes (n1, n2) at the significance level 0.05.

Alternative distributions                                  (n1, n2)   Proposed test (5.9)   MLR
N(0, 1), N(0.2, 0.25²), N(0.1, 0.5²), N(0.5, 1)            (10, 10)   0.1541                0.1579
                                                           (50, 50)   0.6232                0.6921
N(1, 1), N(3, 2²), N(1, 2²), N(3, 1.5²)                    (10, 10)   0.8245                0.8319
                                                           (25, 25)   0.9993                0.9996
N(2.5, 0.8²), N(1.5, 0.5²), N(1, 1.5²), N(1.5, 0.6²)       (10, 10)   0.7267                0.7351
                                                           (25, 25)   0.9956                0.9978
N(2, 1.5²), N(5, 2.5²), N(0, 3²), N(3, 1.5²)               (10, 10)   0.9174                0.9199
                                                           (25, 25)   0.9998                1.0000
N(0.3, 0.5²), N(0.5, 1), N(0.25, 0.25²), N(0.5, 0.5²)      (10, 10)   0.2818                0.4497
                                                           (50, 50)   0.9764                0.9993
N(0, 1), N(0.5, 1), N(0.5, 1.1²), N(1, 0.5²)               (10, 10)   0.2450                0.2461
                                                           (50, 50)   0.8731                0.9274
N(0.1, 1), N(0.5, 1), N(1.5, 1.1²), N(1, 1.3²)             (10, 10)   0.1605                0.1732
                                                           (50, 50)   0.6476                0.7450
N(0.5, 0.5²), N(1, 1), N(0, 1), N(0, 1)                    (10, 10)   0.1723                0.1764
                                                           (50, 50)   0.7171                0.7942
N(0, 1), N(0, 1), N(1.5, 1.1²), N(1, 1.3²)                 (10, 10)   0.1125                0.1300
                                                           (50, 50)   0.4321                0.5531
N(0, 1), N(0.5, 1), N(0.5, 1.2²), N(1, 0.5²)               (10, 10)   0.2208                0.2230
                                                           (50, 50)   0.8348                0.8785
Table 5.4: The Monte Carlo powers of Test 2 by (5.15) vs. the MLR test for H0 vs. H2 with different sample sizes (n1, n2) at the significance level 0.05.

Alternative distributions          (n1, n2)   Proposed test (5.15)   MLR
N(0, 0.5²), N(0.5, 1), N(0, 1)     (10, 10)   0.2325                 0.2351
                                   (50, 50)   0.7989                 0.8560
N(0, 0.5²), N(1.5, 1), N(0, 1)     (10, 10)   0.9564                 0.9593
                                   (15, 15)   0.9957                 0.9967
N(0, 1), N(1.5, 1), N(0, 1)        (10, 10)   0.8770                 0.8991
                                   (25, 25)   0.9995                 0.9997
N(0, 1.5²), N(3, 2²), N(0, 1)      (10, 10)   0.9795                 0.9971
                                   (15, 15)   0.9989                 1.0000
N(1, 0.5²), N(2, 1.5²), N(0, 1)    (10, 10)   0.5219                 0.6194
                                   (50, 50)   0.9975                 0.9997
Table 5.5: The Monte Carlo powers of Test 3 by (5.17) vs. the MLR test for H0 vs. H3 with different sample sizes (n1, n2) at the significance level 0.05.

Alternative distributions                              (n1, n2)   Proposed test (5.17)   MLR
N(1, 1), N(1.5, 1.5²), N(1, 1), N(1.5, 1.5²)           (10, 10)   0.2058                 0.2191
                                                       (50, 50)   0.6709                 0.7919
N(1, 1), N(1.5, 1), N(1, 1), N(1.5, 1)                 (10, 10)   0.3059                 0.3235
                                                       (50, 50)   0.8661                 0.9407
N(0, 0.5²), N(0.6, 1), N(0, 0.5²), N(0.6, 1)           (10, 10)   0.5922                 0.6377
                                                       (50, 50)   0.9974                 0.9996
N(0.5, 0.25²), N(1, 1), N(0.5, 0.25²), N(1, 1)         (10, 10)   0.5072                 0.5444
                                                       (50, 50)   0.9869                 0.9977
N(2.5, 1.25²), N(2, 0.5²), N(2.5, 1.25²), N(2, 0.5²)   (10, 10)   0.3383                 0.3509
                                                       (50, 50)   0.9004                 0.9582
Table 5.6: The Monte Carlo type I errors of the MLR tests (observations simulated under H0 from t distributions with different degrees of freedom, Logistic(0,1), Laplace(0,1), and Unif[0, 1]).

                MLR test for Test 1    MLR test for Test 2    MLR test for Test 3
Distribution    n=10      n=50         n=10      n=50         n=10      n=50
t               0.1599    0.2934       0.1838    0.3260       0.0428    0.0465
t               0.1154    0.1939       0.1322    0.2201       0.0434    0.0483
t               0.0955    0.1447       0.1066    0.1687       0.0458    0.0488
t               0.0507    0.0503       0.0493    0.0506       0.0499    0.0502
Logistic(0,1)   0.0717    0.0855       0.0759    0.0968       0.0463    0.0508
Laplace(0,1)    0.1108    0.1446       0.1293    0.1645       0.0438    0.0488
Unif[0, 1]      1.0000    -            0.9993    -            1.0000    -

(The t rows are ordered by increasing degrees of freedom; Unif[0, 1] entries are shown for n = 10.)
Table 5.7: The Monte Carlo powers of the proposed test (5.9) vs. the combined nonparametric test (the two Wilcoxon signed rank tests and one Kolmogorov-Smirnov test) at the significance level 0.05.

Alternative distributions                              (n1, n2)   Proposed test (5.9)   W and K-S tests
Exp(1), Lognorm(0, 2²), N(0, 1), N(0.5, 1.5²)          (10, 10)   0.2238                0.0946
                                                       (50, 50)   0.9616                0.6273
Lognorm(1, 1), Lognorm(1, 0.5²), N(0, 1), N(1.5, 2²)   (10, 10)   0.3646                0.2754
                                                       (50, 50)   0.9953                0.9849
Exp(3), Lognorm(0, 2²), Gamma(5, 1), Gamma(1, 5)       (10, 10)   0.6321                0.5016
Gamma(1, 10), ( ), N(0, 1), N(0.5, 2²)                 (10, 10)   0.3815                0.1179
                                                       (50, 50)   1.0000                0.8576
Exp(1), Cauchy(1, 1), N(0.5, 1), N(1.5, 2²)            (10, 10)   0.1819                0.1255
                                                       (50, 50)   0.7928                0.7426
Exp(1), Lognorm(0, 2²), Unif[-1, 1], Unif[-1, 1]       (10, 10)   0.2325                0.0836
                                                       (50, 50)   0.9981                0.6939
Table 5.8: The Monte Carlo powers of the proposed test (5.15) vs. the combined nonparametric test (the one Wilcoxon signed rank test and one Kolmogorov-Smirnov test) at the significance level 0.05.

Alternative distributions                           (n1, n2)   Proposed test (5.15)   W and K-S tests
Exp(3), N(1.5, 2²), N(0, 1)                         (10, 10)   0.5638                 0.2876
                                                    (50, 50)   0.9995                 0.9816
Lognorm(1, 1), Lognorm(1.3, 1.5²), N(0, 1)          (10, 10)   0.6164                 0.1279
Exp(1), Beta(1, 1), N(0, 1)                         (10, 10)   0.2256                 0.1193
                                                    (50, 50)   0.9983                 0.8046
Gamma(1, 5), ( ), N(0, 1)                           (10, 10)   0.5582                 0.1454
                                                    (25, 25)   0.9949                 0.9789
Exp(1), Cauchy(1, 1), N(0, 1)                       (10, 10)   0.1613                 0.0401
                                                    (50, 50)   0.9736                 0.2323
Exp(1.5), N(0.5, 1), Unif[-1, 1]                    (10, 10)   0.1677                 0.0384
                                                    (50, 50)   0.9984                 0.3042
Lognorm(1, 0.5²), Lognorm(1.1, 0.5²), Unif[-1, 1]   (10, 10)   0.4484                 0.0808
                                                    (25, 25)   0.9963                 0.9574
Exp(1.5), Beta(3, 1), Unif[-1, 1]                   (10, 10)   0.1328                 0.0934
                                                    (50, 50)   0.8774                 0.4128
Gamma(2, 1), ( ), Unif[-1, 1]                       (10, 10)   0.7688                 0.4005
Exp(1), Cauchy(1, 1), Unif[-1, 1]                   (10, 10)   0.4052                 0.0608
Exp(3), N(1.5, 2²)                                  (10, 10)   0.4198                 0.2553
Lognorm(1, 1), Lognorm(1.2, 1)                      (10, 10)   0.2892                 0.0657
                                                    (50, 50)   0.8807                 0.6635
Exp(1.5), Beta(2, 1)                                (10, 10)   0.2204                 0.0648
                                                    (50, 50)   0.9997                 0.4062
Gamma(10, 1), ( )                                   (10, 10)   0.9044                 0.6699
Exp(1), Cauchy(1, 1)                                (10, 10)   0.0792                 0.0316
                                                    (50, 50)   0.2891                 0.0877
Table 5.9: The Monte Carlo powers of the proposed test (5.17) vs. the Wilcoxon signed rank test at the significance level 0.05.

Alternative distributions                                          (n1, n2)   Proposed test (5.17)   W test
Exp(1), Lognorm(0, 2²), Exp(1), Lognorm(0, 2²)                     (10, 10)   0.4136                 0.2933
                                                                   (50, 50)   0.9992                 0.9223
Lognorm(1, 1), Lognorm(1, 0.5²), Lognorm(1, 1), Lognorm(1, 0.5²)   (10, 10)   0.1218                 0.0731
                                                                   (50, 50)   0.7906                 0.2208
Gamma(5, 1), Gamma(1, 5), Gamma(5, 1), Gamma(1, 5)                 (10, 10)   0.1218                 0.0886
                                                                   (50, 50)   0.8074                 0.2294
Gamma(1, 10), ( ), Gamma(1, 10), ( )                               (10, 10)   0.3125                 0.2344
                                                                   (50, 50)   0.9629                 0.8517
Beta(0, 0.8), Exp(1.5), Beta(0, 0.8), Exp(1.5)                     (10, 10)   0.2094                 0.1518
                                                                   (50, 50)   0.9928                 0.6003
Table 5.10: The proportions of rejections(a) based on the bootstrap method for each considered test.

Bootstrapped     Test 1                             Test 2
sample sizes     Proposed        Classic test(b)    Proposed         Classic test(c)
(n1, n2)         test (5.9)                         test (5.15)
(9, 6)           0.9858          0.7135             0.9755           0.9134
(9, 7)           0.9870          0.7172             0.9800           0.9165
(11, 9)          0.9989          0.9795             0.9955           0.9844
(13, 10)         0.9998          0.9962             0.9993           0.9972
(13, 11)         0.9999          0.9965             0.9997           0.9975
(15, 13)         1.0000          0.9997             0.9999           0.9994

a. The proportion of rejections of each test from the bootstrap method was computed based on sample sizes (n1, n2) and 10,000 replications; b. The combined classic test for H0 vs. H1 is based on two W-tests and one K-S test; c. The combined classic test for H0 vs. H2 is based on one W-test and one K-S test.
Figure 5.1: Histograms of the CDRS-Rts related to the baseline and endpoint in group 1 and in group 2.

Figure 5.2: Histograms of the differences Z1j and Z2j based on the sub-dataset with sample sizes (n1, n2) = (9, 6) that were sampled from the original data set.
CHAPTER 6
OPTIMAL PROPERTIES OF PARAMETRIC
SHIRYAEV-ROBERTS STATISTICAL CONTROL
PROCEDURES
6.1
INTRODUCTION
In this chapter, we study parametric Shiryaev-Roberts type procedures that can be applied to key problems of statistical process control, including retrospective and sequential change point detection. These problems are very important in the context of quality and reliability control, special topics of statistical inference, as well as in experimental and mathematical sciences (e.g., Lai, 1995; Gurevich and Vexler, 2010).
Firstly, we outline a main principle of the proof related to the Neyman-Pearson fundamental lemma (e.g., Vexler and Gurevich, 2011). To this end, let us define delta in [0, 1] and A, B to be any real numbers. Then, it is clear that

delta (A - B) - I(A >= B)(A - B) <= 0,   (6.1)

where I(.) is the indicator function. This inequality can be easily applied to evaluate optimal properties of decision rules. For example, consider the simple classification problem where, given a sample of k independent and identically distributed observations X1, ..., Xk, we want to test the hypothesis

H0: X1, ..., Xk ~ F0 vs. H1: X1, ..., Xk ~ F1.   (6.2)

Here F0 and F1 are known distributions with the density functions f0(x) and f1(x), respectively. The inequality (6.1) determines that the most powerful test for (6.2) is the likelihood ratio test that rejects H0 if prod_{i=1}^{k} f1(Xi) / prod_{i=1}^{k} f0(Xi) >= C, where C is a fixed threshold. This classical proposition directly follows from (6.1), when A = prod_{i=1}^{k} f1(Xi) / prod_{i=1}^{k} f0(Xi), B = C, delta is any decision rule based on the observed sample, and the expectation, under H0, is derived from both the sides of (6.1).
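The elementary inequality (6.1) is easy to verify numerically; the sketch below simply exercises delta(A - B) - I(A >= B)(A - B) <= 0 over random triples:

```python
import random

def lhs(delta, a, b):
    # delta * (A - B) - I(A >= B) * (A - B), which (6.1) asserts is <= 0
    # for any delta in [0, 1] and any real A, B.
    return delta * (a - b) - (1 if a >= b else 0) * (a - b)

rng = random.Random(0)
worst = max(lhs(rng.random(), rng.uniform(-5, 5), rng.uniform(-5, 5))
            for _ in range(100000))
print(worst <= 0)  # prints True: if A >= B the slack is (delta - 1)(A - B) <= 0,
                   # otherwise it is delta (A - B) <= 0
```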
Although the example mentioned above is very simple, the inequality (6.1) can be applied to show different aspects of optimality related to operating characteristics of complex test statistics. In this chapter, we utilize the trivial inequality (6.1) to provide simple proofs of non-asymptotic optimal properties of retrospective Shiryaev-Roberts procedures. We consider situations related to retrospective change point detection, proposing accordingly adjusted forms of the retrospective Shiryaev-Roberts procedure. The problem of detecting more than one change point is also analyzed in this chapter. A real data example is provided to demonstrate the applicability of the proposed approach in practice. Considering sequential change point problems, we show that any given sequential test can be evaluated via an application of (6.1)-type inequalities that provide optimal properties of this test, although relating this optimality to the classical operating characteristics of tests is a complicated task. The presented analysis of the sequential Shiryaev-Roberts procedure and its non-asymptotic optimal property clearly demonstrates this issue.
All sections of this chapter are supplied with brief introductions related to the corresponding problem statements. In Section 6.2, we consider the retrospective AMOC (at most one change) change point problem and review the techniques addressed in the literature. Theoretical results, which show a non-asymptotic optimal property of the retrospective Shiryaev-Roberts procedures, are also presented. In Section 6.3, we propose and analyze in detail adjusted forms of the retrospective Shiryaev-Roberts procedure for detecting two changes in distributions of independent observations. This section clearly demonstrates how this procedure can be adapted for multiple change-points detection. A real data example introduced in Section 6.3 demonstrates that the proposed generalized Shiryaev-Roberts procedures can be easily applied in practice. Section 6.4 contributes results related to sequential change point problems. We outline there the proof of a non-asymptotic optimality of the sequential Shiryaev-Roberts procedure. We present main conclusions in Section 6.5.
6.2

THE RETROSPECTIVE AMOC CHANGE POINT PROBLEM

Let X1, ..., Xn be independent observations with density functions g1, ..., gn. Using notation related to hypothesis testing, we want to test the null hypothesis

H0: gi = f0 for all i = 1, ..., n.   (6.3)

Several authors have proposed decision rules based on likelihood ratios and recursive residuals; Gombay and Horvath (1994) considered the general case, defining density functions f0(u) = f(u; theta0), f1(u) = f(u; theta1), theta0 != theta1, where the parameters theta0, theta1 are unknown. They suggested to use the statistic

Zn = max_{1<k<=n} [sup_{theta0} prod_{i=1}^{k-1} f(xi; theta0) * sup_{theta1} prod_{j=k}^{n} f(xj; theta1)] / sup_{theta0} prod_{i=1}^{n} f(xi; theta0).

Gombay and Horvath's test rule is to reject H0 for large values of Zn.
Following the aims of this chapter, let us begin with a consideration related to a simple situation, where the density functions f0 and f1 are known. In this case, the maximum likelihood estimation of the change point parameter employed in the likelihood ratio leads to the CUSUM test that rejects H0 if

max_{1<=k<=n} Lambda_k >= C,  where Lambda_k = prod_{i=k}^{n} f1(xi)/f0(xi),   (6.4)

and C > 0 is a threshold. Optimal properties of the CUSUM procedure have not been addressed in the retrospective change point literature. Alternatively, the retrospective Shiryaev-Roberts test rejects H0 if

(1/n) Rn >= C,  where Rn = sum_{k=1}^{n} Lambda_k.   (6.5)

Vexler (2006) and Vexler and Gurevich (2011) showed the following non-asymptotic optimal property of the test (6.5) for the problem (6.3). Let Pk and Ek (k = 0, ..., n) respectively denote probability and expectation given that the change occurred at the point k (k = 0 corresponding to H0). For any decision rule delta in [0, 1], the inequality (6.1) with A = Rn/n and B = C implies

delta (Rn/n - C) - I(Rn/n >= C)(Rn/n - C) <= 0.   (6.6)

Since

E0[(Rn/n) I(Rn/n >= C)] = (1/n) sum_{k=1}^{n} int...int I(Rn/n >= C) prod_{i=1}^{k-1} f0(xi) prod_{i=k}^{n} f1(xi) prod_{i=1}^{n} dxi = (1/n) sum_{k=1}^{n} Pk(Rn/n >= C),

the derivations of the H0-expectation applied to both the left and right side of (6.6) provide the next proposition.

Proposition 6.1. The test (6.5) is the average most powerful test, i.e.,

(1/n) sum_{k=1}^{n} Pk(Rn/n >= C) - C P0(Rn/n >= C) >= (1/n) sum_{k=1}^{n} Pk(delta rejects H0) - C P0(delta rejects H0),

for any decision rule delta in [0, 1] based on the observations X1, ..., Xn.
When the density f1 depends on an unknown parameter theta, one can modify the statistic in (6.5) following the mixture methodology. That is, we choose a prior pi(theta) and pretend that theta ~ pi(theta). Hence, the mixture Shiryaev-Roberts type statistic has the form of

Rn^(1)/n = (1/n) sum_{k=1}^{n} int prod_{i=k}^{n} [f1(Xi; theta)/f0(Xi)] dpi(theta).

This definition allows us to show the following property of the adapted change point detection scheme:

(1/n) sum_{k=1}^{n} int Pk(Rn^(1)/n >= C | {Xj}_{j>=k} are from f1(.; theta)) dpi(theta) - C P0(Rn^(1)/n >= C) >= (1/n) sum_{k=1}^{n} int Pk(delta rejects H0 | {Xj}_{j>=k} are from f1(.; theta)) dpi(theta) - C P0(delta rejects H0),

for any decision rule delta in [0, 1] based on the observations X1, ..., Xn. This optimality is again obtained using the inequality (6.1). In this case, the meaning of optimality mentioned in Proposition 6.1 is modified to be integrated over values of the unknown parameter theta with respect to the function pi.
A different approach for the case where f0(u) = f(u; theta0), f1(u) = f(u; theta1), theta0 != theta1, is to adapt the CUSUM and Shiryaev-Roberts tests to be the next rules: to reject H0 if

max_{2<=k<=n} Lambda_k >= C1   (6.7)

and

(1/n) sum_{k=2}^{n} Lambda_k >= C2,   (6.8)

respectively, where C1, C2 > 0 are thresholds and the ratios Lambda_k, k = 2, ..., n, are denoted in (6.4) with the unknown parameters replaced by their estimates. Gurevich and Vexler (2010) conducted an extensive Monte Carlo study to compare various change point procedures. The powers of the modified CUSUM test (6.7) and the modified Shiryaev-Roberts test (6.8) were compared for different families of the null and alternative distributions. It was shown that the test (6.8) is more powerful (not just more powerful on average) than the test (6.7) in most of the considered scenarios. However, when the change point location is very close to 1, the test (6.7) is better than the test (6.8). Monte Carlo experiments presented in Gurevich and Vexler (2010) also confirmed that the modified Shiryaev-Roberts test statistic (6.8) is usually more robust than the CUSUM test statistic (6.7) with respect to misspecifications regarding the data distributions.
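For known f0 = N(0, 1) and f1 = N(1, 1) (an illustrative choice of ours), the likelihood ratios in (6.4)-(6.5) reduce to exponentiated partial sums of the log-likelihood ratios l_i = x_i - 1/2, and both retrospective statistics follow from one backward pass:

```python
import math
import random

def retro_stats(x):
    # Lambda_k = prod_{i=k}^n f1(x_i)/f0(x_i) with f0 = N(0,1), f1 = N(1,1),
    # so log Lambda_k = sum_{i=k}^n (x_i - 0.5); accumulate from the tail.
    n = len(x)
    tail = 0.0
    lams = []
    for xi in reversed(x):
        tail += xi - 0.5
        lams.append(math.exp(tail))
    lams.reverse()            # lams[k-1] = Lambda_k
    cusum = max(lams)         # CUSUM decision statistic, as in (6.4)
    sr = sum(lams) / n        # Shiryaev-Roberts statistic R_n / n, as in (6.5)
    return cusum, sr

random.seed(3)
x = [random.gauss(0, 1) for _ in range(30)] + [random.gauss(1, 1) for _ in range(20)]
cusum, sr = retro_stats(x)
print(sr <= cusum)  # prints True: an average never exceeds a maximum
```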
Remark 6.2: Gurevich and Vexler (2010) proposed distribution-free forms of the CUSUM and Shiryaev-Roberts procedures, approximating nonparametrically the likelihood ratio components of (6.7) and (6.8). Gurevich and Vexler (2010) used Monte Carlo studies to show that comparisons of the nonparametric CUSUM and Shiryaev-Roberts tests give results similar to those of the parametric test comparisons. The nonparametric form of the Shiryaev-Roberts test is more powerful (and always more powerful on average) than that of the CUSUM test in most scenarios.
The Shiryaev-Roberts procedure (6.5) can be easily adapted to be a multiple change-points detection procedure. In the next section, we propose an extended Shiryaev-Roberts procedure for the two change-points detection problem, presenting in detail its non-asymptotic properties. We also demonstrate an application of the proposed procedure to a real data example.
6.3

RETROSPECTIVE DETECTION OF TWO CHANGE POINTS

Consider testing the null hypothesis (6.3) versus

H1: g1 = ... = g_{nu1-1} = f1; g_{nu1} = ... = g_{nu2-1} = f2; g_{nu2} = ... = gn = f3,   (6.9)

where nu1 <= nu2 are unknown change points.
The proposed generalized Shiryaev-Roberts statistic for this problem is

Rn^(2)/n = (1/n) sum_{k1=1}^{n} sum_{k2=k1}^{n} [prod_{i=1}^{k1-1} f1(Xi) prod_{j=k1}^{k2-1} f2(Xj) prod_{l=k2}^{n} f3(Xl)] / prod_{i=1}^{n} f0(Xi).   (6.10)

Then, we reject H0 if

Rn^(2)/n >= C.   (6.11)

Proposition 6.2. The test (6.11) is the average most powerful test, i.e.,

(1/n) sum_{k1=1}^{n} sum_{k2=k1}^{n} Pk1,k2(Rn^(2)/n >= C) - C PH0(Rn^(2)/n >= C) >= (1/n) sum_{k1=1}^{n} sum_{k2=k1}^{n} Pk1,k2(delta rejects H0) - C PH0(delta rejects H0),   (6.13)

for any decision rule delta in [0, 1] with fixed PH0(delta rejects H0) based on the observations X1, ..., Xn, where Pk1,k2 denotes probability given that the changes occurred at nu1 = k1 and nu2 = k2.

Proof. The corresponding proof scheme is similar to that of Proposition 6.1. That is, using Equation (6.1) with A = Rn^(2)/n and B = C, we can write

delta (Rn^(2)/n - C) - I(Rn^(2)/n >= C)(Rn^(2)/n - C) <= 0.   (6.12)

Taking the H0-expectation of both sides of (6.12) yields

EH0[(Rn^(2)/n) I(Rn^(2)/n >= C)] - C EH0 I(Rn^(2)/n >= C) >= EH0[(Rn^(2)/n) delta] - C EH0 delta.

It is enough to note that, utilizing Equation (6.12), we can complete the proof, since

EH0[(Rn^(2)/n) delta] = (1/n) sum_{k1=1}^{n} sum_{k2=k1}^{n} int...int delta prod_{i=1}^{k1-1} f1(xi) prod_{j=k1}^{k2-1} f2(xj) prod_{l=k2}^{n} f3(xl) prod_{i=1}^{n} dxi = (1/n) sum_{k1=1}^{n} sum_{k2=k1}^{n} Pk1,k2(delta rejects H0).
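For known segment densities, the double-sum statistic of the two change-point procedure can be computed directly; a brief O(n^3) sketch in log space with the illustrative choice f0 = f1 = N(0, 1), f2 = N(1, 1), f3 = N(2, 1):

```python
import math

def log_norm_pdf(x, mu):
    # Log density of N(mu, 1).
    return -0.5 * math.log(2 * math.pi) - 0.5 * (x - mu) ** 2

def sr_two_changes(x, mus=(0.0, 0.0, 1.0, 2.0)):
    # R_n^(2)/n = (1/n) * sum_{k1<=k2} prod_{i<k1} f1 * prod_{k1<=j<k2} f2
    #             * prod_{l>=k2} f3 / prod_i f0, with f_r = N(mus[r], 1)
    # and 1-based change-point indices (k1, k2).
    n = len(x)
    base = sum(log_norm_pdf(xi, mus[0]) for xi in x)   # log prod f0
    total = 0.0
    for k1 in range(1, n + 1):
        for k2 in range(k1, n + 1):
            ll = (sum(log_norm_pdf(x[i], mus[1]) for i in range(k1 - 1))
                  + sum(log_norm_pdf(x[j], mus[2]) for j in range(k1 - 1, k2 - 1))
                  + sum(log_norm_pdf(x[l], mus[3]) for l in range(k2 - 1, n))
                  - base)
            total += math.exp(ll)
    return total / n
```

On data that actually shift from 0 to 1 to 2, the (k1, k2)-terms matching the true segmentation dominate the sum, which is what drives the power of the test (6.11).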
When the segment densities depend on unknown parameters, fj(u) = fj(u; thetaj), we again apply the mixture methodology, choosing a prior pi(theta1, theta2, theta3) and defining

Rn^(3)/n = (1/n) sum_{k1=1}^{n} sum_{k2=k1}^{n} int [prod_{i=1}^{k1-1} f1(Xi; theta1) prod_{j=k1}^{k2-1} f2(Xj; theta2) prod_{l=k2}^{n} f3(Xl; theta3) / prod_{i=1}^{n} f0(Xi; theta0-hat)] dpi(theta1, theta2, theta3),   (6.14)

where theta0-hat = arg max_theta prod_{i=1}^{n} f0(Xi; theta) is the maximum likelihood estimator, under the null hypothesis, of the parameter theta0. The corresponding test (6.15) rejects H0 for large values of Rn^(3)/n. In a similar manner to (6.13), the test (6.15) is the average integrated most powerful test with respect to the prior pi(theta1, theta2, theta3) for a fixed estimate of the significance level, i.e.,

(1/n) sum_{k1=1}^{n} sum_{k2=k1}^{n} int Pk1,k2((6.15) rejects H0) dpi(theta1, theta2, theta3) - C PH0((6.15) rejects H0) >= (1/n) sum_{k1=1}^{n} sum_{k2=k1}^{n} int Pk1,k2(delta rejects H0) dpi(theta1, theta2, theta3) - C PH0(delta rejects H0).
Assume, for example, that the priors for the parameters thetaj, j = 1, 2, 3, under the alternative hypothesis are normal densities, i.e., thetaj ~ N(muj, sigmaj^2), j = 1, 2, 3. Then the mixture Shiryaev-Roberts statistic (6.14) has a closed form (6.16): the (k1, k2)-term of the double sum depends on the data only through the segment sample means

x0-bar = sum_{i=1}^{n} xi/n,  x1-bar = sum_{i=1}^{k1-1} xi/(k1 - 1),  x2-bar = sum_{j=k1}^{k2-1} xj/(k2 - k1),  x3-bar = sum_{l=k2}^{n} xl/(n - k2 + 1),

entering normalizing factors based on (k1 - 1)/sigma1^2, (k2 - k1)/sigma2^2, and (n - k2 + 1)/sigma3^2, multipliers A_{k1,k2} that collect the exponents of the differences (x1-bar - mu1)^2, (x2-bar - mu2)^2, (x3-bar - mu3)^2, and the null-fit factor (sigma0-hat)^(-n) exp{-sum_{i=1}^{n} (xi - x0-bar)^2 / (2 sigma0-hat^2)}.
When prior information is unavailable, the unknown parameters can instead be estimated by maximum likelihood within each segment, yielding the statistic

Rn^(4)/n = (1/n) sum_{k1=1}^{n} sum_{k2=k1}^{n} [prod_{i=1}^{k1-1} f1(Xi; theta1-hat) prod_{j=k1}^{k2-1} f2(Xj; theta2-hat) prod_{l=k2}^{n} f3(Xl; theta3-hat)] / sup_{theta0} prod_{i=1}^{n} f0(Xi; theta0),   (6.18)

and the test (6.17) rejects H0 for large values of Rn^(4)/n. In the normal case, the estimators are the segment-wise sample means and variances

mu0-hat = sum_{i=1}^{n} Xi/n,  sigma0-hat^2 = sum_{i=1}^{n} (Xi - mu0-hat)^2/n;
mu1-hat = sum_{i=1}^{k1-1} Xi/(k1 - 1),  sigma1-hat^2 = sum_{i=1}^{k1-1} (Xi - mu1-hat)^2/(k1 - 1);
mu2-hat = sum_{j=k1}^{k2-1} Xj/(k2 - k1),  sigma2-hat^2 = sum_{j=k1}^{k2-1} (Xj - mu2-hat)^2/(k2 - k1);
mu3-hat = sum_{l=k2}^{n} Xl/(n - k2 + 1),  sigma3-hat^2 = sum_{l=k2}^{n} (Xl - mu3-hat)^2/(n - k2 + 1).
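For fixed candidate change points (k1, k2), the segment-wise normal MLEs entering (6.18) can be computed as below (a sketch with our own helper names; variances use the MLE divisor as in the definitions above):

```python
def segment_mles(x, k1, k2):
    # Normal MLEs (mean, MLE variance) for the three alternative segments
    # given 1-based change points k1 <= k2, plus the null fit on all of x.
    def mle(seg):
        m = sum(seg) / len(seg)
        return m, sum((v - m) ** 2 for v in seg) / len(seg)
    segs = {"null": x, "seg3": x[k2 - 1:]}
    if k1 > 1:
        segs["seg1"] = x[:k1 - 1]        # observations before the first change
    if k2 > k1:
        segs["seg2"] = x[k1 - 1:k2 - 1]  # observations between the changes
    return {name: mle(seg) for name, seg in segs.items()}

out = segment_mles([0.0, 0.1, 2.0, 2.1, 4.0, 4.2], 3, 5)
print(out["seg2"])  # (mean, variance) of the middle segment
```

Empty segments (k1 = 1 or k2 = k1) are simply omitted, since their products in (6.18) are empty.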
Figure 6.1: The left-hand side shows plots and histograms of the %TS data; the right-hand side shows plots and histograms of the TIBC data.
Tian et al. (2011) categorized the study subjects into three groups based on the results of ferritin concentration, which provides a useful screening test for iron deficiency anemia (IDA). Non-pregnant women with anemia and a ferritin concentration less than 20 (μg/L) were assigned to the IDA group, while those with anemia and a ferritin concentration greater than 240 (μg/L) were assigned to the anemia of chronic disease (ACD) group. The intermediate group consists of the women with ferritin concentration between 20 and 240 (μg/L). There were 29, 14, and 12 female anemia patients in the IDA, intermediate, and ACD groups, respectively. The histograms of the %TS data and those of the TIBC measurements in each group are shown in Figures 6.2 and 6.3, respectively.
Our interest is to detect whether the underlying distributions of the %TS data, as well as the distributions of the TIBC measurements, change at two different points. In this section, we formally test the assumption made by Tian et al. (2011), which suggested considering the %TS measurements as three groups, with the TIBC measurements likewise split into three groups; i.e., there are two change points in the distributions of the %TS data and also two change points in the distributions of the TIBC measurements.

Following the publications mentioned above in this section, we assume the %TS and TIBC data to be normally distributed. Thus, we apply the test based on the statistic (6.18). The mean and standard deviation of the %TS data are 4.55 and 2.59, respectively, whereas the mean and standard deviation of the TIBC observations are 345 and 120 (μg/L), respectively. The means and standard deviations of the %TS and TIBC data in each group are presented in Table 6.1.
Table 6.1: Means and standard deviations of the %TS data and the TIBC data in each group

Group          Sample size n   Mean (%TS)   Standard deviation (%TS)
IDA            29              3.5276       1.8820
intermediate   14              5.0714       2.5859
ACD            12              5.7500       2.0505
To approximate the p-value of the test (6.17), where the statistic Rn^(4)/n is defined by (6.18), we propose the following methods.

6.3.2.1 THE METHODS FOR P-VALUE APPROXIMATION RELATED TO TEST (6.17)

In this section, we propose and apply two different methods for the p-value approximation related to the test (6.17) with the statistic Rn^(4)/n by (6.18).
1) The Monte Carlo technique. Given that observations follow a normal distribution, the null distribution of the test statistic (6.18) does not depend on the parameters mu0 and sigma0^2 of the null normal distribution; hence, we can conduct a Monte Carlo study to obtain the p-value of the test. To execute the Monte Carlo experiment, we first drew 50,000 replicate samples of 55 observations Xi ~ N(0,1), i = 1, ..., 55, and evaluated the generated values of the test statistic, say, rj = R55^(4)/55 at one generation of X1, ..., X55, j = 1, ..., 50,000. Let r be the observed test statistic value based on the data. Then we determined the approximate p-value of the test as the proportion of cases when values of rj, j = 1, ..., 50,000, exceed the value of r. Following the procedures mentioned above, we obtained the p-value of 0.0244 based on the %TS data and the p-value close to zero based on the TIBC measurements (p-value < 0.0001). Both p-values are less than the significance level of alpha = 0.05; therefore, we recommend rejecting the null hypothesis, implying that there are changes at two points in both the %TS and TIBC data distributions.
2) Bootstrap calibration. The procedure of the bootstrap calibration (e.g., Owen, 2001) is
defined as follows. Let X_i^(*b), i = 1,...,n, b = 1,...,B, be independent random vectors
sampled from the empirical distribution function F_n of the data X_i, i = 1,...,n. This
resampling can be implemented by drawing n random integers π(i,b) independently from
the uniform distribution on {1,...,n} and setting X_i^(*b) = X_π(i,b). We use n = 55 and
B = 10,000. Now let H_b = R_55^(4)(X_1^(*b),...,X_55^(*b))/55. This defines the order
statistics H_(1) ≤ H_(2) ≤ ... ≤ H_(B). Then the upper empirical quantile of
H_(1),...,H_(B) provides the critical value of the test at the significance level α, and the
proportion of the values H_b, b = 1,...,B, that exceed the observed statistic approximates
the p-value. The bootstrap procedure gives the corresponding p-values based on the %TS
data and the TIBC measurements as 0.003 and 0.0001, respectively. Both p-values are
less than the significance level of 0.05, supporting the conclusion that the underlying
distributions of the %TS and TIBC measurements have significant changes at two
different points.
Therefore, both methods 1) and 2) suggest rejecting the null hypothesis. Note that, for
method 1), it is important that the observations under the null hypothesis are independent
and identically normally distributed, whereas for method 2) the observations, under the
null hypothesis, are assumed only to be independent and identically distributed (i.i.d.).
Consequently, in the case where the data are close to normally distributed, the type I
error rates of method 2) are expected to be very close to those of method 1).
6.3.2.2 ADDITIONAL STUDY
In this subsection, we consider a situation when no change is expected in the real data
distributions. Now we test the hypotheses (6.9) based on the %TS observations in the
IDA group (n=29). By using the Monte Carlo study and the bootstrap calibration as
mentioned above, we obtain that the corresponding p-values are 0.4118 and 0.1136,
respectively. These results suggest that the distribution of the %TS data in the IDA group
has no significant change in this case. Similarly, by applying the Monte Carlo study and
the bootstrap calibration, the p-values based on the TIBC observations in the IDA group,
are 0.5376 and 0.6533, respectively. These p-values indicate that there is no significant
change in the distribution of the TIBC data in the IDA group in this case.
6.4
SEQUENTIAL CHANGE POINT DETECTION
There are extensive references in the statistics and engineering literature on the subject of
quick detection, with low false alarm rate, of changes in stochastic systems on the basis
of sequential observations from the system. These problems are very important in the
context of quality and reliability controls (e.g., Lai, 1995).
In many common situations, we assume that we survey sequentially independent
observations X_1, X_2, .... Initially, the observations follow an in-control distribution
with a density function f_0. It is possible that at ν, an unknown point in time, an accident
takes effect, causing the distribution of the observations to change to an out-of-control
distribution with a density function f_1.
A common performance measure for any inspection scheme is the in-control average
run length (ARL). Let T be the random variable corresponding to the time when the
scheme signals that the process is out of control (the distribution of the observations has
changed), which henceforth will be referred to as the stopping time. Thus, T is the
number of observations until the alarm signal. The in-control ARL is defined by E_{f0}T,
whereas the out-of-control ARL is defined by E_{f1}T. Additionally, we denote by E_f T
the expectation of the stopping time T under the assumption that the observations come
from a distribution with a density function f. Clearly, one desires E_{f0}T to be large and
E_{f1}T to be small, as well as small values of E_ν(T − ν + 1 | T ≥ ν). The latter is the
expectation of the delay in detection given that the change is at point ν in time, and
given that the stopping time T is not smaller than ν.
In this section, we consider the observations X_1, X_2, ..., X_{ν−1} to be distributed
according to a density function f_0, whereas X_ν, X_{ν+1}, ... follow a density function
f_1, with an unknown change point ν (1 ≤ ν ≤ ∞). The case ν = ∞ indicates the situation
when all observations are distributed according to f_0. In this case, the notations P_∞ and
E_∞ denote probability and expectation, respectively, when all observations are
distributed according to f_0. The sequential change point detection procedures are
assumed to raise an alarm as soon as possible after the change, avoiding false alarms. It
is well known that the CUSUM and Shiryaev-Roberts procedures are efficient detection
methods for this stated problem (e.g., Moustakides, 1986; Mei, 2006; Gurevich and
Vexler, 2011). The CUSUM policy is: we stop sampling of the X's and report that a
change in the distribution of X has been detected at the first time n ≥ 1 that
max_{1≤k≤n} Π_{i=k}^{n} f_1(X_i)/f_0(X_i) ≥ C, for a given threshold C; similarly,
the Shiryaev-Roberts procedure stops and reports a change at the first time n ≥ 1 that
R_n ≥ C, where

R_n = Σ_{k=1}^{n} Π_{i=k}^{n} f_1(X_i)/f_0(X_i).
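Both statistics admit simple recursions, W_n = max(W_{n−1}, 1)·l_n for CUSUM and R_n = (R_{n−1} + 1)·l_n for Shiryaev-Roberts, with l_n = f_1(X_n)/f_0(X_n). The sketch below assumes, purely for illustration, the densities N(0,1) before and N(1,1) after the change, for which l_n = exp(X_n − 1/2).

```python
import numpy as np

rng = np.random.default_rng(2)

def detect(x, threshold):
    # CUSUM: W_n = max(W_{n-1}, 1) * l_n equals max_{1<=k<=n} prod_{i=k}^n l_i.
    # Shiryaev-Roberts: R_n = (R_{n-1} + 1) * l_n.
    # Assumed likelihood ratio for N(0,1) -> N(1,1): l_n = exp(x_n - 0.5).
    w = r = 0.0
    t_cusum = t_sr = None
    for n, xn in enumerate(x, start=1):
        l = np.exp(xn - 0.5)
        w = max(w, 1.0) * l
        r = (r + 1.0) * l
        if t_cusum is None and w >= threshold:
            t_cusum = n
        if t_sr is None and r >= threshold:
            t_sr = n
        if t_cusum is not None and t_sr is not None:
            break
    return t_cusum, t_sr

nu = 100                                  # change point of the simulated stream
x = np.concatenate([rng.normal(0.0, 1.0, nu - 1),
                    rng.normal(1.0, 1.0, 200)])
t_cusum, t_sr = detect(x, threshold=100.0)
assert t_cusum is not None and t_sr is not None
```

With a mean shift of one standard deviation, both rules typically raise the alarm within a handful of observations after ν.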
The sequential CUSUM detection procedure has a non-asymptotic optimal property (e.g.,
Moustakides, 1986). That is, if the initial and the final distributions of the observations
are known, then the CUSUM control procedure most rapidly detects a change in
distribution among all procedures with a common bound specifying an acceptable rate of
false alarms, i.e., the in-control ARL. For the Shiryaev-Roberts procedure, an asymptotic
(as C → ∞) optimality has been shown (Pollak, 1985). To demonstrate the optimality of the
However, in the context of a simple application of the inequality (6.1), the procedure (6.19)
declares loss functions for which that detection policy is optimal. That is, setting
A = R_{min(T_C, n)} and B = C in (6.1) leads to

(R_{min(T_C, n)} − C) I(R_{min(T_C, n)} ≥ C) ≥ 0,

which can be rewritten as

Σ_{k=1}^{n} (R_k − C) I(T_C = k) + (C − R_n) I(T_C > n) ≥ 0.     (6.21)

It is clear that (6.21) can report an optimal property of the detection rule T_C. For
simplicity, noting that every summand on the left side of the inequality (6.21) is
nonnegative, we can focus only on (C − R_n) I(T_C > n) ≥ 0. Thus, if τ is defined to be a
stopping time, then, since R_n < C on the event {T_C > n},

E_∞ (C − R_n) I(τ > n, T_C > n) ≥ 0.     (6.22)

Therefore, summing over n and using E_∞ R_n I(A) = Σ_{k=1}^{n} P_k(A) for events A
determined by X_1, ..., X_n,

Σ_{n} [C P_∞(min(τ, T_C) > n) − Σ_{k=1}^{n} P_k(min(τ, T_C) > n)] ≥ 0,

where

Σ_{n} Σ_{k=1}^{n} P_k(min(τ, T_C) > n) = Σ_{k} Σ_{n≥k} P_k(min(τ, T_C) > n)
= Σ_{k} E_k(min(τ, T_C) − k)^+,     (6.23)

with a^+ = a I(a > 0). The inequality (6.22) with (6.23) gives the next proposition.

Proposition 6.4. For any stopping time τ,

Σ_{n} E_n(min(τ, T_C) − n)^+ ≤ C E_∞ min(τ, T_C);

in particular, Σ_{n} E_n(T_C − n)^+ ≤ C E_∞ T_C.

Here, E_∞ τ presents the average run length to false alarm of a stopping rule τ. Small
values of E_n(τ − n)^+ are preferable (because E_n(τ − n)^+ relates to the fallibility of
the sequential detection in the case ν = n). It is clear that, if τ < T_C, then min(τ, T_C)
detects the change faster than the stopping time T_C. Consequently, Proposition 6.4 states
the non-asymptotic optimal property of the Shiryaev-Roberts sequential procedure in the
context of series of delays in the detection, considering the expectation E_ν(T − ν + 1)
as the index of the speed of the detection.
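The in-control behavior of the Shiryaev-Roberts rule can be checked by simulation: drawing all observations from f_0 (i.e., ν = ∞) and averaging the stopping times estimates E_∞ T_C. The densities below, N(0,1) in control versus N(1,1) out of control, are again only an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(3)

def sr_stopping_time(threshold, max_n=20_000):
    # T_C = min{n : R_n >= C} with R_n = (R_{n-1} + 1) * l_n, where all
    # observations are drawn from f0 = N(0,1) (nu = infinity), and the
    # assumed likelihood ratio for N(0,1) -> N(1,1) is l_n = exp(x_n - 0.5).
    r = 0.0
    for n in range(1, max_n + 1):
        xn = rng.normal(0.0, 1.0)
        r = (r + 1.0) * np.exp(xn - 0.5)
        if r >= threshold:
            return n
    return max_n

C = 50.0
arl = float(np.mean([sr_stopping_time(C) for _ in range(200)]))
assert arl >= 1.0     # the in-control ARL is at least one observation
```

Since R_n − n is a P_∞-martingale, the in-control ARL of the Shiryaev-Roberts rule satisfies E_∞ T_C ≥ C, which the simulated average reflects.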
6.5
CONCLUSIONS
In this chapter, we introduced the general principles related to the retrospective change
point problems. We provided schemes to construct the Shiryaev-Roberts type procedures
corresponding to different change point problems. Although we considered the relatively
simple statement of the problem (6.3), with independent observations, in a similar
CHAPTER 7
FUTURE WORKS
Based on the results of the previous studies mentioned above, the newly created
density-based EL methodology has been shown to be very efficient. This was expected,
since the proposed test statistics approximate nonparametrically the most powerful
likelihood ratios.
There remain several open problems in biomedical research that can be investigated via
EL approaches. Thus, my future work will continue along this line of research by
developing different efficient EL ratio based tests that can be applied to important
biomedical studies. Two possible topics of future work are briefly outlined as follows.
efficiently capture all possible structures of dependency between two random variables.
The Pearson correlation coefficient is a measure of the strength of the linear relationship
between two random variables (e.g., Pearson, 1920; Hauke and Kossowski, 2011). The
Spearman correlation coefficient is a measure of a monotonic association between two
random variables (e.g., Spearman, 1904a; Hauke and Kossowski, 2011). The correlation
coefficient rs is commonly utilized when assumptions required to use the Pearson
correlation coefficient do not hold. Neither the Pearson nor the Spearman correlation
coefficient is well suited to analyzing nonlinear forms of dependence between two
random variables (e.g., Embrechts et al., 2002). Practical data issues motivate the
development of a general coefficient that can efficiently measure both linear and
non-linear dependencies between random variables. The Kendall correlation coefficient (τ) is a
well-known measure of the concordance between two rankings associated with two sets
of observations (Kendall, 1938, 1948). However, in many cases, the Kendall correlation
coefficient shows relatively lower power compared with the former two classical measures
r and r_s (e.g., Mudholkar and Wilding, 2003). Note that, for example, in the context of a
lung cancer study, Gu et al. (2012) proposed to consider relationships of the two
polymorphisms rs1051730 and rs8034191 in random-effect-type forms that should be
tested. In addition to linear/nonlinear dependence structures, the present applied
biostatistical literature introduces random-effect-type associations between two sets of
observations. We plan to develop a simple density-based EL test that is applicable to
all general cases of dependency, including linear, non-linear, and/or random effect type
associations, between two random variables. Towards this end, we will construct an exact
nonparametric likelihood ratio type test.
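The limitation discussed above is easy to demonstrate numerically. The sketch below, using simulated data as an illustrative assumption, shows that r, r_s, and τ all sit near zero for a purely quadratic (non-monotonic) dependence while a monotonic dependence is readily detected.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau

rng = np.random.default_rng(4)
x = rng.uniform(-1.0, 1.0, size=500)

# Monotonic dependence: all three classical coefficients detect it.
y_mono = np.exp(2.0 * x) + 0.1 * rng.standard_normal(500)
# Quadratic (non-monotonic) dependence: r, r_s, and tau are all near zero,
# although y is a deterministic function of x plus small noise.
y_quad = x ** 2 + 0.1 * rng.standard_normal(500)

r_mono = pearsonr(x, y_mono)[0]
r_quad = pearsonr(x, y_quad)[0]
rs_quad = spearmanr(x, y_quad)[0]
tau_quad = kendalltau(x, y_quad)[0]

assert r_mono > 0.5
assert max(abs(r_quad), abs(rs_quad), abs(tau_quad)) < 0.3
```

A general dependence measure should report a large value in both cases, which is exactly what the planned density-based EL test aims to achieve.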
APPENDICES
A.1.1
Under the normal assumption, the log likelihood function based on the pooled-unpooled
data is written out and differentiated with respect to each parameter. Setting these first
derivatives equal to zero, we obtain the system of equations whose solutions are the
maximum likelihood estimators. To present the matrix, we obtain the second derivatives
of the log likelihood function. The asymptotic distribution of the maximum likelihood
estimators (2.1) can be derived
A.1.2
A.1.2.1
Under the null hypothesis, the empirical likelihood function based on repeated measures
data is constructed, where the Lagrange multiplier is a root of the corresponding
constraint equation. We then consider the log empirical likelihood ratio statistic. In a
similar manner to Section A.1.2.1, we present the empirical likelihood ratio based on the
pooled-unpooled data, where the Lagrange multipliers are roots of the respective
constraint equations. Consequently, we obtain (2.3). By applying the Taylor expansion
around zero, we obtain the stated approximation.
A.2.1
Vexler and Gurevich (2010) proved the first equality of Lemma 3.1. Now we obtain the
second equation of Lemma 3.1, using the empirical distribution functions of the samples.
A.2.2
This proof employs the test for the inverse Gaussian distribution based on sample
entropy proposed by Mudholkar and Tian (2002). The density-based EL ratio test
statistic is defined accordingly, and the maximum likelihood estimator of the parameter
is formulated.
A.2.3
The proof of Proposition 3.1 is based on Proposition 2.2 of Vexler and Gurevich (2010).
In order to show the consistency of the proposed test, let us check the conditions for
Proposition 3.1 that are present in Vexler and Gurevich (2010). Let us outline the proof
of Proposition 3.1, using the proof scheme of Proposition 2.2 of Vexler and Gurevich
(2010). Consider the test statistic and expand the third term of (A.2.2) in a Taylor series
up to the first derivative; combining (A.2.2)-(A.2.5), the stated limits follow under the
null and alternative hypotheses.
A.2.4
The result follows from (3.7) and the choice of the corresponding terms.

A.3.1
Under the alternative hypothesis, the right-hand side of equation (5.11) can be estimated;
we denote the estimated value of the right-hand side of equation (5.11) accordingly.
A.3.2
The hypothesis of interest in Test 2 is given by the null against the alternative. We first
investigate the first term of the right-hand side of the equation (A.3.4). To this end, we
define the corresponding distribution functions, so that the terms in (A.3.4) can be
reformulated. The first term in the right-hand side of the equation (A.3.6) can then be
expressed through sums over the ordered observations (mod 2m). It follows that each of
these sums converges in probability, uniformly over the relevant range, by virtue of the
continuity of the distribution functions involved. Similarly, we have the analogous
uniform convergence for the remaining components of the first term.
Now, we consider the second term in the right-hand side of the equation (A.3.6). By the
definition of the empirical distribution functions and their uniform closeness to the true
distribution functions, this term converges to zero in probability, uniformly over the
relevant range. Finally, using the result of Lemma 1 of Vasicek (1976), the last term in
the right-hand side of the equality (A.3.6) also converges to zero in probability,
uniformly over the relevant range.
Combining the convergence results above, we obtain the limiting behavior of the test
statistic under the null hypothesis and under the alternative hypothesis, uniformly over
the relevant range, as the sample sizes increase.
Since the statistic is defined by (5.12), the rest of the proof is similar to the proof shown
in Section A.3.2.1 regarding the test statistic of the proposed Test 1. Since the remainder
term vanishes as the sample sizes increase, we focus on the remaining terms of the
equation (A.3.16), which can be reorganized. The distribution of the relevant component,
which is symmetric under the null hypothesis, can be taken as the uniform distribution
on the interval [-1, 1]. Thus, we obtain the stated asymptotic result.
Hence, the corresponding hypothesis of interest for Test 1 using the maximum
likelihood ratio test is the null against the alternative that the null does not hold.
Analogously, the maximum likelihood ratio for Test 2 can be formulated, which yields
the corresponding maximum likelihood ratio test statistic for Test 2.
BIBLIOGRAPHY
Albers, W., Kallenberg, W. C. M., and Martini, F. (2001). Data-Driven Rank Tests for
Classes of Tail Alternatives. Journal of the American Statistical Association, 96, 685-696.
Bahadur, R. R. (1966). A Note on Quantiles in Large Samples, Ann. Math. Statist, 37,
577-580.
Bardsley, W. E. (1980). Note on the use of the inverse Gaussian distribution for wind
energy applications. Journal of Applied Meteorology, 19, 1126-1130.
Barndorff-Nielsen, O. E. (1994). A note on electrical networks and the inverse Gaussian
distribution. Advances in Applied Probability, 26, 63-67.
Bhattacharyya, G., and Fries, A. (1982). Fatigue failure models: Birnbaum-Saunders vs.
inverse Gaussian. IEEE Transactions on Reliability, 31, 439-441.
Biederman, J. (1998). Attention-deficit/hyperactivity disorder: a life-span perspective.
The Journal of Clinical Psychiatry, 59, 4-16.
Brotman, M. A., Schmajuk, M., Rich, B., Dickstein, D. P., Guyer, A. E., Costello, E. J.,
Egger, H. L., Angold, A., and Leibenluf, E. (2006). Prevalence, clinical correlates
and longitudinal course of severe mood dysregulation in children. Biological
Psychiatry, 60, 991-997.
Canner, P. L. (1975). A simulation study of one- and two-sample Kolmogorov-Smirnov
statistics with a particular weight function. Journal of the American Statistical
Association, 70, 209-211.
Carlin, B., and Louis, T. A. (2008). Bayes and Empirical Bayes Methods for Data
Analysis. Chapman & Hall/CRC, New York.
Carlson, G. A. (2007). Who Are the Children with Severe Mood Dysregulation, a.k.a.
"Rages"? American Journal of Psychiatry, 164, 1140-1142.
Carroll, R. J., Roeder, K., and Wasserman, L. (1999). Flexible Parametric Measurement
Error Models. Biometrics, 55, 44-54.
Carroll, R. J., Spiegelman, C. H., Lan, K. K., Bailey, K. T., and Abbott, R. D. (1984). On
errors-in-variables for binary regression models. Biometrika, 71, 19-25.
Embrechts, P., McNeil, A., and Straumann, D. (2002). Correlation and dependence in
risk management: properties and pitfalls. In Risk Management: Value at Risk and
Beyond, ed. M.A.H. Dempster, Cambridge University Press, Cambridge, 176-223.
Faraggi, D., Reiser, B., and Schisterman, E. (2003). ROC curve analysis for biomarkers
based on pooled assessments. Statistics in Medicine, 22, 2515-2527.
Folks, J. L., and Chhikara, R. S. (1978). The inverse Gaussian distribution and its
statistical application: a review. Journal of the Royal Statistical Society, Series B,
40, 263-289.
Folks, J. L., and Chhikara, R. S. (1989). The Inverse Gaussian Distribution, Theory,
Methodology and Applications. Marcel Dekker, New York.
Freedman, L. S., Fainberg, V., Kipnis, V., Midthune, D., and Carroll, R. J. (2004). A new
method for dealing with measurement error in explanatory variables of regression
models. Biometrics, 60, 172-181.
Fuller, W. A. (1987). Measurement Error Models. Wiley, New York.
Gombay, E., and Horvath, L. (1994). An application of the maximum likelihood test to
the change-point problem. Stochastic Processes and their Applications, 50, 161-171.
Gombay, E. (2001). U-statistics for Change under Alternatives. Journal of Multivariate
Analysis, 78, 139-158.
Good, P. (2005). Permutation, Parametric, and Bootstrap Tests of Hypotheses. Springer,
New York.
Gu, M., Dong, X., Zhang, X., Wang, X., Qi, Y., Yu, J., and Niu, W. (2012). Strong
Association between Two Polymorphisms on 15q25.1 and Lung Cancer Risks: A
Meta-Analysis. PLoS ONE, 7, e37970. DOI: 10.1371/journal.pone.0037970
Gurevich, G. (2006). Nonparametric AMOC change point tests for stochastically ordered
alternatives. Communications in Statistics - Theory and Methods, 35, 887-903.
Gurevich, G. (2007). Retrospective parametric tests for homogeneity of data.
Communications in Statistics-Theory and Methods, 36, 2841-2862.
Gurevich, G., and Vexler, A. (2005). Change point problems in the model of logistic
regression. Journal of Statistical Planning and Inference, 131, 313-331.
Gurevich, G., and Vexler, A. (2010). Retrospective change point detection: from
parametric to distribution free policies. Communications in Statistics-Simulation and
Computation, 39, 899-920.
Gurevich, G., and Vexler, A. (2011). Non-asymptotic optimal properties of Shiryaev-Roberts
statistical control procedures. Proceedings of the 1st International
Symposium & 10th Balkan Conference on Operational Research (BALCOR 2011), 1,
242-246.
Gurevich, G., and Vexler, A. (2011). A two-sample empirical likelihood ratio test based
on samples entropy. Statistics and Computing, 21, 657-670.
Hall, P. (1984). Limit theorems for sums of general functions of m-spacings.
Mathematical Proceedings of the Cambridge Philosophical Society, 96, 517-532.
Hall, P. (1986). On powerful distributional tests on sample spacings. Journal of
Multivariate Analysis, 19, 201-255.
Hall, P., and La Scala, B. (1990). Methodology and algorithms of empirical likelihood.
International Statistical Review, 58, 109-127.
Hasabelnaby, N. A., Ware, J. H., and Fuller, W. A. (1989). Indoor air pollution and
pulmonary performance: investigating errors in exposure assessment (with
comments). Statistics in Medicine, 8, 1109-1126.
Hauke, J., and Kossowski, T. (2011). Comparison of values of Pearson's and Spearman's
correlation coefficients on the same sets of data. Quaestiones Geographicae, 30, 87-93.
Henze, N., and Klar, B. (2002). Goodness-of-Fit Tests for the Inverse Gaussian
Distribution Based on the Empirical Laplace Transform. Annals of the Institute of
Statistical Mathematics, 54, 425-444.
Iyengar, S., and Patwardhan, G. (1988). Recent developments in the inverse Gaussian
distribution. In: Krishnaiah, P.R. and Rao, C.R. (Eds.) Handbooks of Statistics, 7,
479-480.
James, B., James, K. L., and Siegmund, D. (1987). Tests for a change-point. Biometrika,
74, 71-83.
Johnson, N. L., Kotz, S., and Balakrishnan, N. (1994). Continuous Univariate
Distributions 1 & 2. Wiley, New York.
Kander, Z., and Zacks, S. (1966). Test procedures for possible changes in parameters of
statistical distributions occurring at unknown time points. Annals of Mathematical
Statistics, 37, 1196-1210.
Kendall, M. G. (1938). A new measure of rank correlation. Biometrika, 30, 81-89.
Kendall, M. G. (1948). Rank Correlation Methods, London, Griffin.
Mudholkar, G. S., and Natarajan, R. (2002). The inverse Gaussian models: analogues of
symmetry, skewness and kurtosis. Annals of the Institute of Statistical Mathematics,
54, 138-154.
Mudholkar, G. S., Natarajan, R., and Chaubey, Y. P. (2001). A goodness-of-fit test for
the inverse Gaussian distribution using its independence characterization. Sankhya,
Series B, 63, 362-374.
Mudholkar, G. S., and Tian, L. (2001). On the null distribution of entropy tests for the
Gaussian and inverse Gaussian models. Communications in Statistics - Theory and
Methods, 30, 1507-1520.
Mudholkar, G. S., and Tian L. (2002). An entropy characterization of the inverse
Gaussian distribution and related goodness-of-fit test. Journal of Statistical Planning
and Inference, 102, 211-221.
Mudholkar, G. S., and Tian, L. (2004). A test for homogeneity of ordered means of
inverse Gaussian population. Journal of Statistical Planning and Inference, 118, 37-49.
Mudholkar, G. S., and Wang, H. (2007). IG-symmetry and R-symmetry: Interrelations
and applications to the inverse Gaussian theory. Journal of Statistical Planning and
Inference, 137, 3655-3671.
Mudholkar, G. S., and Wilding, G. E. (2003). On the conventional wisdom regarding two
consistent tests of bivariate independence. Journal of the Royal Statistical Society,
Series D, 52, 41-57.
Mumford, S. L., Schisterman, E. F., Vexler, A., and Liu, A. (2006). Pooling
biospecimens and limits of detection: effects on ROC curve analysis. Biostatistics, 7,
585-598.
Nair, J., Ehimare, U., Beitman, B. D., Nair, S. S., Lavin, A. (2006). Clinical review:
evidence-based diagnosis and treatment of ADHD in children. Missouri Medicine,
103, 617-621.
Natarajan, R., and Mudholkar, G. S. (2004). Moment based goodness-of-fit tests for
inverse Gaussian distribution. Technometrics, 46, 339-347.
Obuchowski, N. (2006). An ROC-type measure of diagnostic accuracy when the gold
standard is continuous-scale, Statistics in Medicine, 25, 481-493.
Owen, A. B. (1988). Empirical Likelihood Ratio Confidence Intervals for a Single
Functional. Biometrika, 75, 237-249.
Schisterman, E. F., and Vexler, A. (2008). To pool or not to pool, from whether to when:
applications of pooling to biospecimens subject to a limit of detection. Pediatric and
Perinatal Epidemiology, 22, 486-496.
Schisterman, E. F., Vexler, A., Mumford, S. L., and Perkins, N. J. (2010). Hybrid
pooled-unpooled design for cost-efficient measurement of biomarkers. Statistics in
Medicine, 29, 597-613.
Schuster, E. F. (1975). Estimating the distribution function of a symmetric distribution.
Biometrika, 62, 631-635.
Sen, A., and Srivastava, M. S. (1975). On tests for detecting change in mean. The Annals
of Statistics, 3, 98-108.
Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New
York.
Seshadri, V. (1993). The Inverse Gaussian Distribution: A Case Study in Exponential
Families. Clarendon Press, Oxford.
Seshadri, V. (1999). The Inverse Gaussian Distribution: Statistical Theory and
Applications. Springer, New York.
Song, K.-S. (2000). Limit theorems for nonparametric sample entropy estimators.
Statistics & Probability Letters, 49, 9-18.
Song, K.-S. (2002). Goodness-of-fit Tests Based on Kullback-Leibler Discrimination
Information. IEEE Transactions on information theory, 48, 1103-1117.
Spearman, C. E. (1904a). The proof and measurement of association between two things.
American Journal of Psychology, 15, 72-101.
Stefanski, L. A. (1985). The effects of measurement error on parameter estimation.
Biometrika, 72, 583-592.
Stefanski, L. A., and Carroll, R. J. (1987). Conditional scores and optimal scores in
generalized linear measurement-error models. Biometrika, 74, 703-716.
Stefanski, L. A., and Carroll, R. J. (1990). Score Tests in Generalized Linear
Measurement Error Models. Journal of the Royal Statistical Society, Series B
(Methodological), 52, 345-359.
Stigler, S. M. (1977). Do robust estimators work with real data? The Annals of Statistics,
5, 1055-1098.
Vexler, A., Liu, S., Kang, L., and Hutson, A. D. (2009). Modifications of the Empirical
Likelihood Interval Estimation with Improved Coverage Probabilities.
Communications in Statistics (Simulation and Computation), 38, 2171-2183.
Vexler, A., Liu, A., and Schisterman, E. F. (2006). Efficient Design and Analysis of
Biospecimens with Measurements Subject to Detection Limit. Biometrical Journal,
48, 780-791.
Vexler, A., Liu, A., and Schisterman, E. F. (2010). Nonparametric deconvolution of
density estimation based on observed sums. Journal of Nonparametric Statistics, 22,
23-39.
Vexler, A., Liu, S., and Schisterman, E. F. (2011). Nonparametric-likelihood inference
based on cost-effectively-sampled-data. Journal of Applied Statistics, 38, 769-783.
Vexler, A., Schisterman, E. F., and Liu, A. (2008). Estimation of ROC curves based on
stably distributed biomarkers subject to measurement error and pooling mixtures.
Statistics in Medicine, 27, 280-296.
Vexler, A., and Tarima, S. (2010). An optimal approach for hypothesis testing in the
presence of incomplete data. Annals of the Institute of Statistical Mathematics, 63,
1141-1163.
Vexler, A., and Wu, C. (2009). An Optimal Retrospective Change Point Detection Policy.
Scandinavian Journal of Statistics, 36, 542-558.
Vexler, A., Wu, C., and Yu, K. F. (2010). Optimal hypothesis testing: from semi to fully
Bayes factors. Metrika, 71, 125-138.
Vexler, A., and Yu, J. (2011). Two-sample density-based empirical likelihood tests for
incomplete data in application to a pneumonia study. Biometrical Journal, 53, 628-651.
Vexler, A., Yu, J., Tian, L., and Liu, S. (2010). Two-sample nonparametric likelihood
inference based on incomplete data with an application to a pneumonia study.
Biometrical Journal, 52, 348-361.
Waxmonsky, J., Pelham, W. E., Gnagy, E., Cummings, M. R., O'Connor, B., Majumdar,
A., Verley, J., Hoffman, M. T., Massetti, G. A., Burrows-MacLean, L., Fabiano, G.
A., Waschbusch, D. A., Chacko, A., Arnold, F. W., Walker, K. S., Garefino, A. C.,
and Robb, J. A. (2008). The efficacy and tolerability of methylphenidate and
behavior modification in children with attention-deficit/hyperactivity disorder and
severe mood dysregulation. J Child Adolesc Psychopharmacol, 18, 573-588.
Weinberg, C. R., and Umbach, D. M. (1999). Using pooled exposure assessment to
improve efficiency in case-control studies. Biometrics, 55, 718-726.
Wians, F. H., Urban, J. E., Keffer, J. H., and Kroft, S. H. (2001). Discriminating between
iron deficiency anemia and anemia of chronic disease using traditional indices of
iron status vs transferrin receptor concentration. American Journal of Clinical
Pathology, 115, 112-118.
Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1, 80-83.
Wolfe, D. A., and Schechtman, E. (1984). Nonparametric statistical procedures for the
change point problem. Journal of Statistical Planning and Inference, 9, 389-396.
Ying, G., Mary, E. N., John, H., Michael, G. W., and Graham, E. (2006). An Exploratory
Factor Analysis of the Children's Depression Rating Scale-Revised. J Child Adol
Psychop, 16, 482-491.
Yu, J., Vexler, A., and Tian, L. (2010). Analyzing Incomplete Data Subject to a
Threshold Using Empirical Likelihood Methods: An Application to a Pneumonia
Risk Study in an ICU Setting. Biometrics, 66, 123-130.
Yu, J., Vexler, A., Kim, S., and Hutson, A. D. (2011). Two-sample empirical likelihood
ratio tests for medians in application to biomarker evaluations. The Canadian Journal
of Statistics, 39, 671-689.