
ADVANCED AND NOVEL PARAMETRIC AND NONPARAMETRIC
LIKELIHOOD STATISTICAL TECHNIQUES
WITH APPLICATIONS IN EPIDEMIOLOGY
by
Wan-Min Tsai
December 17, 2012

A dissertation submitted to the


Faculty of the Graduate School of
the University at Buffalo, State University of New York
in partial fulfillment of the requirements for the
degree of

Doctor of Philosophy

Department of Biostatistics


Dissertation Committee
Albert Vexler, Ph.D. (Advisor)
Alan Hutson, Ph.D.
Chang-Xing Ma, Ph.D.
Jihnhee Yu, Ph.D.


ACKNOWLEDGEMENTS
I would like to express my gratitude to all the people who have helped along the journey
of my Ph.D. study.
Firstly, I would like to express my deepest appreciation to my advisor, Professor
Albert Vexler, for his guidance and encouragement. This dissertation would not have
been possible without him. I am also deeply thankful to Professors Alan Hutson, Chang-Xing Ma, and Jihnhee Yu, who kindly agreed to serve on my dissertation committee, for
their interest and valuable comments on my work.
I would like to thank all of my professors, fellow students, and department staff for
their kind help and support during my graduate study at the University at Buffalo. My
sincere thanks also go to Dr. Gregory Gurevich and Dr. Yaakov Malinovsky, as well as to the reviewers of the manuscripts submitted to journals, for their constructive suggestions that improved the quality of the manuscripts and this dissertation.
I would also like to thank all of my friends for always being there with a word of
encouragement or a listening ear.
Last but not least, I would like to thank my beloved parents, brother and sisters.
Without their strong love and support, I would not have had a chance to go abroad to
pursue my graduate study. I love my family beyond expression for their support and
encouragement along the way. I would like to dedicate this work to them.


TABLE OF CONTENTS

Acknowledgements .......................................................................................................iii
List of Tables....................................................................................................................x
List of Figures ............................................................................................................... xiii
Abstract ...................................................................................................................... xiv
Chapter 1. Introduction .................................................................................................. 1
Chapter 2. Estimation and testing based on data subject to measurement errors: from parametric to non-parametric likelihood methods ........................................................... 9
2.1 Introduction ..............................................................................................................................9
2.2 Parametric inferences............................................................................................................11
2.2.1 Parametric likelihood functions .......................................................................... 12
2.2.1.1 Parametric likelihood based on repeated data .................................... 12
2.2.1.2 Parametric likelihood based on pooled and unpooled data ................ 13
2.2.2 Normal case ........................................................................................... 15
2.2.2.1 Maximum likelihood estimators based on repeated measures ........... 15
2.2.2.2 Maximum likelihood estimators following the hybrid design ........... 17
2.2.2.3 Remarks on the normal case ............................................................... 18

2.3 Empirical likelihood method ...............................................................................................19
2.3.1 The empirical likelihood ratio test ...................................................................... 19
2.3.2 The empirical likelihood method based on repeated measures data................... 21
2.3.3 The empirical likelihood method based on pooled-unpooled data ..................... 22
2.4 Monte Carlo experiments .....................................................................................................25
2.4.1 Simulation settings .............................................................................................. 25
2.4.2 Monte Carlo outputs ........................................................................................... 27
2.5 A real data example ..............................................................................................................32
2.6 Conclusions ............................................................................................................................35

Chapter 3. An empirical likelihood ratio based goodness-of-fit test for inverse Gaussian distributions ................................................................................................... 36
3.1 Introduction ............................................................................................................................36
3.2 Method ....................................................................................................................................39
3.2.1 Density-based empirical likelihood ratio goodness-of-fit test ............................ 40
3.2.2 Null distribution .................................................................................................. 46
3.3 Power properties ....................................................................................................................50
3.4 Data examples........................................................................................................................52
3.5 Conclusions ............................................................................................................................54

Chapter 4. An extensive power evaluation of a novel two-sample density-based empirical likelihood ratio test for paired data ............................................................... 56
4.1 Introduction ............................................................................................................................56
4.2 Method ....................................................................................................................................61
4.3 Simulation study....................................................................................................................67
4.3.1 Null distribution .................................................................................................. 67
4.3.2 Power study ......................................................................................................... 71
4.4 A real data example ..............................................................................................................75
4.5 Conclusions ............................................................................................................................78

Chapter 5. Two-sample density-based empirical likelihood ratio tests based on paired data, with application to a treatment study of Attention-Deficit/Hyperactivity Disorder and Severe Mood Dysregulation ................................................................... 88
5.1 Introduction ............................................................................................................................88
5.2 Statement of problems and Methods ..................................................................................93
5.2.1 Hypotheses setting .............................................................................................. 93
5.2.2 Test statistics ....................................................................................................... 94
5.2.2.1 Test 1 ................................................................................................... 94
5.2.2.2 Test 2 ................................................................................................... 99
5.2.2.3 Test 3 ................................................................................................. 101

5.2.3 Asymptotic consistency of the tests ..................................................... 102


5.2.4 Null distributions of the proposed test statistics ................................................ 103
5.3 Simulation study..................................................................................................................105
5.3.1 Power comparison with the parametric method................................................. 105
5.3.2 Power comparison with classic nonparametric methods ................................... 106
5.4 Data analysis ........................................................................................................................107
5.5 Conclusions ..........................................................................................................................110

Chapter 6. Optimal properties of parametric Shiryaev-Roberts statistical control procedures .................................................................................................................... 117
6.1 Introduction ..........................................................................................................................117
6.2 Retrospective change point detection ...............................................................................119
6.3 Retrospective detection of two change points .................................................................124
6.3.1 Non-asymptotic optimal properties of the Shiryaev-Roberts test (6.10) ......... 125
6.3.2 A real data example .......................................................................................... 129
6.3.2.1 The methods for p-value approximation related to test (6.16) ......... 132
6.3.2.2 Additional study ................................................................................ 134
6.4 Sequential change point detection.....................................................................................134
6.5 Conclusions ..........................................................................................................................138


Chapter 7. Future works .............................................................................................. 140
7.1 Testing normality based on independent samples and errors in regression models ........................................................................................................................... 140
7.2 A simple density-based empirical likelihood ratio test for independence ............ 141

Appendices ................................................................................................................. 143


A.1.1 Parametric likelihood based on the hybrid design ......................................................143
A.1.2 The asymptotic distribution of the log empirical likelihood ratio test .....................145
A.1.2.1 The repeated measurements .......................................................................... 145
A.1.2.2 The hybrid design ......................................................................................... 147
A.2.1 Proof of Lemma 3.1 ........................................................................................................149
A.2.2 To show the proposed test statistic is identical to the test statistic presented by Mudholkar and Tian (2002) ......................................................................................... 150
A.2.3 Proof of Proposition 3.1 .................................................................................................152
A.2.4 Proof of Proposition 3.2 ................................................................................................ 155
A.3.1 Computing the empirical constraint (5.11) utilized to develop test 2 ......................156
A.3.2 Proof of Proposition 5.1 ................................................................................................ 157
A.3.2.1 Proposed test 1 .............................................................................................. 157
A.3.2.2 Proposed test 2 .............................................................................................. 165
A.3.3 Mathematical derivation of maximum likelihood ratio tests.................................... 168


A.3.3.1 Maximum likelihood ratio test statistic for test 1 ......................................... 168
A.3.3.2 Maximum likelihood ratio test statistic for test 2 ......................................... 169
A.3.3.3 Maximum likelihood ratio test statistic for test 3 ......................................... 170

Bibliography .............................................................................................................. 171


LIST OF TABLES

2.1 The Monte Carlo evaluations of the maximum likelihood estimates based on repeated measurements and the hybrid design ............................................................... 28
2.2 Coverage probabilities and confidence intervals based on repeated measurements and the hybrid design ..................................................................................................... 29
2.3 The Monte Carlo type I errors and powers of the empirical likelihood ratio test statistics (2.3) and (2.7) for testing the mean based on data following the hybrid design .............................................................................................................................. 31
2.4 The Monte Carlo type I errors and powers of the empirical likelihood ratio test statistics (2.3) and (2.7) for testing the mean based on data following the hybrid design .............................................................................................................................. 32
2.5 Bootstrap evaluations of the confidence interval estimators based on the parametric likelihood ratio test and the empirical likelihood ratio test ......................... 34
3.1 The critical values of the proposed test at the significance level α ......................... 47
3.2 Type I error control of the proposed test: α = 0.05 .................................................. 48
3.3 The Monte Carlo Type I errors of the proposed test with critical values obtained by Proposition 2.2 to guarantee the significance level .................................................. 50
3.4 An empirical power comparison for the density-based empirical likelihood ratio test ................................................................................................................................... 51
3.5 Test for the IG based on the data introduced in Folks and Chhikara (1978) ........... 52
3.6 Bootstrap type proportion of rejection of the tests for the IG at a 5% level of significance ..................................................................................................................... 54
4.1 The critical values of the proposed test at (4.7) with various sample sizes at the significance levels 0.01, 0.05, and 0.1, respectively ...................................................... 68
4.2 Type I error control of the proposed test statistic (4.6) at the significance level α ...................................................................................................................................... 70
4.3 Designs of the alternative hypothesis to be applied to the following Monte Carlo evaluations of the powers of the proposed test (4.6) ..................................................... 72
4.4 Proportion of rejection based on the bootstrap method for each considered test .... 78
5.1 Hypotheses of interest to be tested based on paired data ......................................... 94
5.2 The critical values for Test 1 by (5.9) (Test 2 by (5.15)) [Test 3 by (5.17)] for different sample sizes and significance levels .............................................................. 104
5.3 The Monte Carlo powers of Test 1 by (5.9) vs. the MLR test for different sample sizes at the significance level α ................................................................................... 111
5.4 The Monte Carlo powers of Test 2 by (5.15) vs. the MLR test for different sample sizes at the significance level α ....................................................................... 112
5.5 The Monte Carlo powers of Test 3 by (5.17) vs. the MLR test for different sample sizes at the significance level α ....................................................................... 112
5.6 The Monte Carlo type I errors of the MLR tests ................................................... 113
5.7 The Monte Carlo powers of the proposed test (5.9) vs. the combined nonparametric test (the two Wilcoxon signed rank tests and one Kolmogorov-Smirnov test) ................................................................................................................. 113
5.8 The Monte Carlo powers of the proposed test (5.15) vs. the combined nonparametric test (the one Wilcoxon signed rank test and one Kolmogorov-Smirnov test) ................................................................................................................. 114
5.9 The Monte Carlo powers of the proposed test (5.17) vs. the Wilcoxon signed rank test at the significance level α .............................................................................. 115
5.10 The proportions of rejections based on the bootstrap method for each considered test .............................................................................................................. 115
6.1 Means and standard deviations of the %TS data and the TIBC data in each group ............................................................................................................................. 132

LIST OF FIGURES
2.1 The histogram and the normal Q-Q plot of cholesterol data ................................... 33
4.1 3-D plots of powers of the considered tests via all 28 designs (K1-K28) with different sample sizes at the significance level α ........................................................... 80
4.2 Histograms of the differences in CDRS-Rts at baseline and endpoint in group 1 and group 2, respectively ............................................................................................... 87
4.3 Plot of sample sizes vs. p-value using a bootstrap method ...................................... 87
5.1 Histograms of CDRS-Rts related to the baseline and endpoint in group 1 and in group 2 ......................................................................................................................... 116
5.2 Histograms of the paired observations based on the CDRS-Rts data, with sample sizes (9, 6), that were sampled from the original data set ........................................... 116
6.1 Plots and histograms of %TS data (left-hand side) and of TIBC data (right-hand side) ............................................................................................................................. 130
6.2 Histograms of %TS data in each group ................................................................. 131
6.3 Histograms of TIBC data in each group ................................................................ 131

ABSTRACT
The likelihood approach provides a basis for many important procedures and methods in
statistical inference. When data distributions are completely known, the parametric
likelihood approach is unarguably a powerful statistical tool that can provide optimal
statistical inference. In such cases, by virtue of the Neyman-Pearson lemma, the likelihood
ratio tests are the most powerful decision rules. The parametric likelihood methods cannot
be applied properly if assumptions on the forms of distributions of data do not hold. Often,
in the context of likelihood applications, the use of misspecified parametric forms of
data distributions may result in inaccurate statistical conclusions. The empirical likelihood
(EL) methodology has been well addressed in the literature as a nonparametric counterpart
of the powerful parametric likelihood approach. The objective of this dissertation is to
develop several powerful parametric likelihood methods and nonparametric approaches
using the EL concept. Measurement error (ME) problems can cause bias or inconsistency
of statistical inferences. When investigators are unable to obtain correct measurements of
biological assays, special techniques to quantify MEs need to be applied. In this
dissertation, we present both parametric likelihood and EL methods for dealing with data
subject to MEs based on repeated measures sampling strategies and hybrid sampling
designs (a mixture of pooled and unpooled data). Utilizing the density-based EL
methodology, we also propose different efficient nonparametric tests that approximate
most powerful Neyman-Pearson test statistics. We first introduce the EL ratio based
goodness-of-fit test for the inverse Gaussian model. Then we extend and adapt the
density-based EL approach to compare two samples based on paired data. We present exact
nonparametric tests for composite hypotheses to detect various differences related to
treatment effects in study groups based on paired measurements. Next, we review and
extend parametric retrospective and sequential Shiryaev-Roberts based policies, examining
different aspects of the non-asymptotic optimal properties of these procedures. We propose
techniques to construct novel and efficient retrospective tests for multiple change-point
detection problems. Finally, directions for future work are discussed.


CHAPTER 1
INTRODUCTION
Likelihood methods are powerful statistical tools for parametric statistical inference. The
parametric likelihood methodology provides optimal statistical inferential procedures
when data distributions are known. However, parametric forms of data distributions are
oftentimes unknown. When the key assumptions regarding the underlying distribution of
data are not met, the parametric likelihood approaches may be extremely biased and
inefficient when compared to their robust nonparametric counterparts. In the
nonparametric context, the classical empirical likelihood (EL) approach is often applied
in order to efficiently approximate properties of parametric likelihoods, using an
approach based on substituting empirical distribution functions for their population
counterparts. Thus, the objective of this work is to develop efficient parametric likelihood
approaches as well as nonparametric methods using the EL concept to deal with problems
arising from epidemiological and medical studies.
The EL ratios were first used by Thomas & Grunkemeier (1975) in the context of
estimating survival probabilities. Owen (1988, 1990), building on earlier work of Thomas
& Grunkemeier (1975), introduced the EL approach for constructing confidence regions
in nonparametric problems. The approach has since been extended to various situations. For
example, Owen (1991) and Chen (1993, 1994) extended the method to regression
problems, Chen and Hall (1993) considered the case of quantiles, and Kolaczyk (1994)
made the extension to generalized linear models. Qin and Lawless (1994) have shown
that the EL methodology can be utilized to make inference on parameters of interest

under a semiparametric model. The EL method has several advantages compared to
contemporary methods such as the bootstrap (Hall and La Scala, 1990). The EL approach
not only provides confidence regions whose shapes and orientations are determined
entirely by the data, but it can also easily incorporate known constraints on parameters
(Chen, 1994). Moreover, DiCiccio et al. (1989) showed that, in a very general setting, the
EL method for constructing confidence regions is Bartlett correctable, which yields very
low coverage error and thereby improves the accuracy of inferences. For a more
comprehensive literature review, see, e.g., Owen (2001).
In this dissertation, we mainly focus on the developments of novel and powerful
distribution-free approaches to approximate optimal parametric likelihood ratio test
statistics via the EL methodology. In Chapter 2, we introduce both parametric and
empirical likelihood methods for dealing with measurement error problems. In Chapters 3-5, we develop several robust EL-based methods motivated by important biomedical studies.
In Chapter 6, we propose techniques to construct efficient parametric retrospective tests for
multiple change-point detection problems. In the last chapter, we discuss directions for
future work. In this introductory chapter, we begin with a brief description of the background
and motivation of each study and a review of the related literature, which are essential for
comprehending the following chapters of this dissertation.
In Chapter 2, we propose and examine parametric and nonparametric tests for data
subject to measurement errors based on two sampling schemes---a common sampling
strategy based on repeated measures and a hybrid design (a mixture of pooled and
unpooled data). Biomarkers are distinctive biochemical indicators of biological processes
or events that help measure the progress of disease or the effects of treatment. The

imprecision of measurement of exposures can cause bias or inconsistency of estimation


or testing of unknown parameters of the biomarker distribution. The sampling based on
repeated measurements is a common method to deal with measurement error problems
(e.g., Dunn, 1989; Freedman et al., 2004). However, because of financial cost and other
resource constraints, obtaining and assaying the desired number of replications can be
infeasible. For example, the cost of a single assay to measure polychlorinated biphenyl
(PCB) is between $500 and $1000. The high assay expense restricts the number of
biological samples that can be measured in the study related to examination of the
association between the marker PCB and cancer or endometriosis (Louis et al., 2005;
Mumford et al., 2006). Take the Seveso cohort as another example. In 1976, due to a
severe industrial accident, serious dioxin contamination of a large resident population and
of farm animals occurred in Seveso, Italy. Investigators needed to assay the frozen stored
blood specimens from the resident population of the dioxin-exposed area in order to
examine whether dioxin contamination adversely affects human health. However, each
determination of blood dioxin currently costs more than $1000, so that only a limited
number of subjects can be considered in this study project (Weinberg and Umbach, 1999).
Pooling and simple random sampling designs have been suggested as cost-efficient
sampling approaches (e.g., Dorfman, 1943; Weinberg and Umbach, 1999; Faraggi et al.,
2003, Vexler et al., 2006, Schisterman and Vexler, 2008). Simple random sampling
designs are conducted by selecting and assaying a random subset of available specimens.
In this case, several individual biospecimens are ignored; however, estimation of the
operating characteristics of these individual biospecimens is simple. Pooling, on the other

hand, involves randomly grouping and physically mixing individual biological samples.
Assays are then performed on the small number of pooled samples. Therefore, the pooling
design reduces the number of measurements without ignoring any individual
biospecimens. Schisterman et al. (2010) proposed a pooled-unpooled hybrid design that
combines the advantages of both pooling and random sampling strategies. The hybrid
designs are implemented by assaying some randomly selected individual biospecimens
and one pooled set that is constituted by the remaining individual biospecimens. The
pooled sample in the hybrid design is used for estimation of the mean of the biomarker
distribution, whereas the random sample proportion in the hybrid design (unpooled data)
is utilized for estimation of the variance. One advantage of using the hybrid design is that
by utilizing a pooled sample, one can estimate not only mean and variance of the
biomarker distribution, but also measurement error, without requiring repeated measures.
The repeated measurements sampling strategy has been well addressed in the literature
under parametric assumptions. In the context of the hybrid sampling design, Schisterman
et al. (2010) have proposed and evaluated a parametric approach for normally distributed
data in the presence of measurement errors. However, nonparametric methods have not
been well investigated to analyze repeated measures data or pooled data subject to
measurement errors. In Chapter 2, we consider general cases of parametric and
nonparametric assumptions, comparing efficiency of pooled-unpooled samples and data
consisting of repeated measures. The applications of the proposed methods are illustrated
using the cholesterol biomarker data from a study of myocardial infarction.
The classical EL approach reflects a distribution-based interpretation of the notion of
likelihood (Owen, 2001). Note that the Neyman-Pearson concept for establishing the
optimality of statistical decision rules utilizes density-based structures of likelihood ratio
type tests.
Recently, Vexler and Gurevich (2010) proposed the density-based EL methodology to
construct efficient nonparametric tests that approximate Neyman-Pearson test statistics.
In the following chapters, we extend and adapt the density-based EL approach for
different testing problems, including tests of goodness-of-fit (Chapters 3 and 7.1) and
two-sample comparisons based on paired data (Chapters 4 and 5).
Testing for distributional assumptions has been a major area of continuing statistical
research. Chapter 3 introduces a new and powerful density-based EL goodness-of-fit test
for the inverse Gaussian distribution. The inverse Gaussian distribution is commonly
introduced to model and examine right skewed data having positive support. It originates
as the distribution of the first passage time of Brownian motion with drift (Henze and
Klar, 2012). Its useful applications include reliability problems (Padgett and Tsoi,
1986), lifetime models (Chhikara and Folks, 1977), and accelerated life testing
(Bhattacharyya and Fries, 1982). Folks and Chhikara (1978, 1989) have presented several
properties and applications of this distribution. When applying the inverse Gaussian
model, it is critical to develop efficient goodness-of-fit tests. Mudholkar and Tian (2002)
presented the entropy-based goodness-of-fit test for the inverse Gaussian distribution that
strongly depends on values of the integer parameter m. The problem of selecting the
optimal parameter m causes the difficulty of efficiently implementing the test in practice.
In this chapter, we propose an EL based goodness-of-fit test for the inverse Gaussian
distribution that improves upon Mudholkar and Tian's (2002) test in the context of
eliminating the dependence on the integer parameter m.

It is a common practice to conduct medical trials in order to compare a new therapy


with a standard-of-care based on paired data consisting of pre- and post-treatment
measurements. In the subsequent chapters (Chapters 4 and 5), we focus on developments
of efficient nonparametric likelihood methods for comparing two study groups based on
paired observations. In Chapter 4, we present an efficient distribution-free test to compare
two independent samples, where each is based on paired data. The statistical literature
has pointed out several issues related to the well-known classical tests for paired data
such as the t-test and Wilcoxon's test. For instance, the t-test is not appropriate for data
from strongly skewed distributions (see, e.g., Vexler et al., 2009); Wilcoxon's test may
break down completely when a nonconstant change is in effect under the alternative
hypothesis (see, e.g., Albers et al., 2001). In addition, the classical
Kolmogorov-Smirnov test is not designed to be based on paired data directly; therefore, it
may not be well suited to compare two samples based on paired observations. Due to the
drawbacks of these classical statistical procedures, Chapter 4 aims to provide an
alternative powerful nonparametric test based on paired data to compare treatment effects
between two study groups. We extend and modify the density-based EL ratio test
presented by Gurevich and Vexler (2011) to formulate an appropriate parametric
likelihood ratio test statistic corresponding to the hypothesis of our interest and then to
approximate the test statistic nonparametrically.
In many case-control studies, a great interest often lies in identifying treatment effects

within each therapy group as well as detecting between-group differences. To this end, in
a nonparametric setting, one can consider combining relevant nonparametric standard
procedures, for example, the Kolmogorov-Smirnov test and the Wilcoxon signed rank

test. The former test is a known procedure to compare distributions of two study groups,
whereas the latter one can be applied to detect treatment effects within each study group.
However, the use of the classical procedures commonly requires complex considerations
to combine the known nonparametric tests, e.g., considerations of combined p-values. In
Chapter 5, we propose simple nonparametric tests for three composite hypotheses related
to treatment effects to provide efficient tools that compare study groups utilizing paired
data. We adapt and extend the density-based EL methodology to deal with various testing
scenarios involved in the two-sample comparisons based on paired data. The proposed
technique is applied for comparing two therapy strategies to treat children's
attention-deficit/hyperactivity disorder and severe mood dysregulation.
Parametric change point detection schemes based on the Shiryaev-Roberts approach
have been well addressed in the statistics and engineering literature. High efficiency of
such procedures can be partially explained by their known asymptotic optimal properties.
Recently, Shiryaev-Roberts based procedures were proposed and examined in applications
to the standard AMOC (at most one change) retrospective change point detection problems.
In Chapter 6, we review and extend parametric retrospective and sequential Shiryaev-Roberts based policies, examining different aspects of the non-asymptotic optimal
properties of the procedures. We utilize the general principle of the Neyman-Pearson
fundamental lemma to show that the Shiryaev-Roberts approach implies the average most
powerful procedures. We also introduce techniques to construct efficient retrospective tests
for multiple change-points detection. A real data example based on biomarker
measurements is provided to demonstrate implementation and effectiveness of new tests in
practice.

The last chapter of this dissertation addresses directions for future work and outlines
two possible topics: (1) developing an EL ratio based goodness-of-fit test of normality
based on several independent samples and errors in regression models; and (2)
developing a simple likelihood ratio type test for independence between two random
variables, without requiring the specification of any kind of dependence structure or any
assumptions on the forms of the data distributions.

CHAPTER 2
ESTIMATION AND TESTING BASED ON DATA
SUBJECT TO MEASUREMENT ERRORS: FROM
PARAMETRIC TO NON-PARAMETRIC
LIKELIHOOD METHODS

2.1 INTRODUCTION

Commonly, many biological and epidemiological studies deal with data subject to
measurement errors (MEs) attributed to instrumentation inaccuracies, within-subject
variation resulting from random fluctuations over time, etc. Ignoring the presence of ME
in data can result in the bias or inconsistency of estimation or testing. The statistical
literature proposed different methods for ME bias correction (e.g., Carroll et al., 1984,
1999; Carroll and Wand, 1991; Fuller, 1987; Liu and Liang, 1992; Schafer, 2001;
Stefanski, 1985; Stefanski and Carroll, 1987, 1990). Among others, one of the common
methods is to consider repeated measurements of biospecimens collecting sufficient
information for statistical inferences adjusted for ME effects (e.g., Hasabelnaby et al.,
1989). In practice, measurement processes based on bioassays can be costly and time-consuming and can restrict the number of replicates of each individual available for
analysis or the number of individual biospecimens that can be used. It can follow that

investigators may not have enough observations to achieve the desired power or
efficiency in statistical inferences.
Dorfman (1943), Faraggi et al. (2003), Liu and Schisterman (2003), Liu et al. (2004),
Mumford et al. (2006), Schisterman et al. (2008, 2010), and Vexler et al. (2006, 2008,
2010, 2011) addressed pooling sampling strategies as an efficient approach to reduce the
overall cost of epidemiological studies. The basic idea of the pooling design is to pool
together individual biological samples (e.g., blood, plasma, serum or urine) and then
measure the pooled samples instead of each individual biospecimen. Since the pooling
design reduces the number of measurements without ignoring individual biospecimens,
the cost of the measurement process is reduced, but relevant information can still be
derived. Recently, it has been found that we can utilize a hybrid design that takes a
sample of both pooled and unpooled biospecimens to efficiently estimate unknown
parameters, allowing for MEs presence in the data without requiring repeated measures
(Schisterman et al., 2010).
In the context of the hybrid strategy, Schisterman et al. (2010) evaluated data that
follow normal distribution functions. In this chapter, we consider general cases of
parametric and nonparametric assumptions, comparing efficiency of pooled-unpooled
samples and data consisting of repeated measures. It should be noted that the repeated
measurement technique collects a large amount of information regarding just
nuisance parameters related to the distribution functions of ME, whereas the pooled-unpooled design provides observations that are informative regarding the target variables
allowing for ME. Therefore, we show that the pooled-unpooled sampling strategy is more
efficient than the repeated measurement sampling procedure. We construct parametric


likelihoods based on both sampling methods. Additionally, in order to preserve


efficiencies of both strategies without parametric assumptions, we consider a
nonparametric approach using the empirical likelihood (EL) methodology (e.g., DiCiccio
et al., 1989; Owen, 1988, 1991, 2001; Vexler et al., 2009, 2010; Vexler and Gurevich,
2010; Yu et al., 2010). We develop and apply novel EL ratio test statistics creating the
confidence interval estimation based on pooled-unpooled data and repeated measures
data. Despite the fact that many statistical inference procedures have been developed to
operate with data subject to ME, to our knowledge, relevant nonparametric likelihood
techniques and parametric likelihood methods have not been well addressed in the
literature.
The rest of this chapter is organized as follows. In Section 2.2, we present a general
form of the likelihood function based on repeated measures data and pooled-unpooled
data. We propose the EL methodology to make nonparametric inferences based on
repeated measures data and pooled-unpooled data in Section 2.3. We claim that the EL
technique based on the hybrid design provides a valuable tool to construct statistical
tests and estimators of parameters when MEs are present. To evaluate the proposed
approaches, we utilize Monte Carlo simulations in Section 2.4. We present an application
to cholesterol biomarker data from a study of coronary heart disease in Section 2.5. In
Section 2.6, we provide some concluding remarks.

2.2 PARAMETRIC INFERENCES

In this section, we derive general forms of the relevant likelihood functions. In each case,
we assume that the total number of biomarker measurements is fixed, say N; for example,
N is the total number of measurements that a study budget allows us to execute.

2.2.1 PARAMETRIC LIKELIHOOD FUNCTIONS


2.2.1.1 PARAMETRIC LIKELIHOOD BASED ON REPEATED MEASURES DATA

Suppose that we measure a biospecimen, observing the score Z_ij = X_i + ε_ij, where the
true values of the biomarker measurements X_1, ..., X_t are independent identically
distributed (i.i.d.) and the ε_ij, j = 1, ..., n_i, are i.i.d. values of ME. Thus, we assume that
there is a subset of t distinct bioassays, and each of them is n_i times repeatedly
measured. In this case, we can define the total number of available individual bioassays
to be T, T > t, when we can consider obtaining a large number of individual biospecimens
to have a low cost relative to the high cost of the measurement processes. We assume that
X and ε are independent. Firstly, we consider the simple normal case, say X_i ~ N(μ, σ²)
and ε_ij ~ N(0, σ_ε²). Accordingly, we observe Z_ij ~ N(μ, σ² + σ_ε²). In this case, one can
show that if n_i = 1 for all i, there are no unique solutions for the estimation of σ² and σ_ε²
(non-identifiability). The observations Z_ij's in each group i are dependent because they are
measured using the same bioassay. Note that if we fix the value of X_i, the Z_ij's are
independent of each other; for example, conditioned on X_i = x, we have
Z_ij | X_i = x ~ N(x, σ_ε²). In a general case, the likelihood function based on the repeated
measures data has the general form of

    L_R(Z | θ) = ∏_{i=1}^{t} ∫ [ f_X(x) ∏_{j=1}^{n_i} f_{Z|X}(Z_ij | x) ] dx,

where f_X is the density function of the true biomarker values and f_{Z|X} is the conditional
density of a measurement given the true value. When the distributions of X and ε are
known, we can obtain the specific likelihood functions, and further, we can also derive the
maximum likelihood estimators of μ, σ², and σ_ε². Well-known asymptotic results related
to maximum likelihood estimation give evaluations of the properties of the estimators
based on the likelihood L_R(Z | θ).

2.2.1.2 PARAMETRIC LIKELIHOOD BASED ON POOLED AND UNPOOLED DATA

We briefly address the basic concept of the pooling design. Let T be the number of
individual biospecimens available and N be the total number of measurements that we
can obtain due to a limited study budget. We obtain the pooled samples by randomly
grouping individual samples into groups of size p, where p = [T/N] is the number of
individual samples in a pooling group and [·] is the integer part. The pooling design
requires a physical combination of specimens of the same group and a test of each pooled
specimen, obtaining a single observation when the pooled sample is measured. Since the
measurements are generally per unit of volume, we assume that the true measurement for
a pooled set is the average of the true individual marker values in that group. In this case,
taking into account that instruments applied to the measurement process can be sensitive
and subject to some random exposure ME, we define a single observation to be the sum of
the average of the individual marker values and a value of ME. Note that, in accordance with
the pooling literature, we assume that analysis of the biomarkers is restricted by the high
cost of the measurement process, whereas access to a large number of individual
biospecimens can be considered to have a relatively low cost.
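As a small illustration of this construction, the following R sketch (hypothetical
parameter values; T, N, and p as defined above) forms pooled observations as
within-group averages of the true individual values plus an ME term:

    set.seed(1)
    T <- 100; N <- 20                          # available specimens; affordable measurements
    p <- floor(T / N)                          # pooling group size, p = [T/N]
    x   <- rnorm(T, mean = 1, sd = 1)          # true individual biomarker values
    eps <- rnorm(N, mean = 0, sd = sqrt(0.4))  # one ME per measured pooled specimen

    # randomly group the T specimens into N pools of size p and average within pools
    groups   <- matrix(sample(x), nrow = N, byrow = TRUE)
    z_pooled <- rowMeans(groups) + eps         # observed pooled measurements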
In this subsection on the hybrid design, we assume that T distinct individual bioassays are
available, but still we can provide just N measurements (N < T). The split of pooled and
unpooled samples is determined by the pooling proportion q: qN of the measurements are
based on pooled specimens and (1 − q)N on unpooled specimens, where the pooling group
size is p. Namely, T = qNp + (1 − q)N. Specifically, we can obtain pooled data by mixing
p individual bioassays together; we therefore divide qNp of the T bioassays into qN
groups and measure the grouped biospecimens as single observations. Let Z^p_j,
j = 1, ..., qN, denote the measurements of pooled bioassays. In accordance with the
literature, we have

    Z^p_j = p^{−1} Σ_{i=(j−1)p+1}^{jp} X_i + ε_j

(see, e.g., Faraggi et al., 2003; Liu and Schisterman, 2003; Liu et al., 2004; Schisterman
et al., 2008, 2010; Vexler et al., 2006, 2008, 2010, 2011). Hence, we can obtain that the
Z^p_j are i.i.d. with mean μ and variance σ²/p + σ_ε², namely, Z^p_j ~ i.i.d. (μ, σ²/p + σ_ε²).
The unpooled samples are based on (1 − q)N independent observations X̃_k = X_k + ε_k,
k = 1, ..., (1 − q)N. In this case, we have X̃_k ~ i.i.d. (μ, σ² + σ_ε²).
Note that the pooled and unpooled samples are independent of each other. As a result,
the likelihood function based on the combination of pooled and unpooled data has the
form of

    L_H(Z^p, X̃ | θ) = ∏_{j=1}^{qN} f_{Z^p}(Z^p_j) ∏_{k=1}^{(1−q)N} f_{X̃}(X̃_k).

If we know the distribution functions of X and ε, we can derive the likelihood functions
according to the distributions of Z^p and X̃. Therefore, we can also obtain the
corresponding theoretical maximum likelihood estimators of (μ, σ², σ_ε²). Since the
estimators follow the maximum likelihood methodology, we can easily show the
asymptotic properties of the estimators.

2.2.2 NORMAL CASE

In this subsection, we assume X_i ~ N(μ, σ²) and ε_ij ~ N(0, σ_ε²). Then we obtain closed-form analytical solutions for the maximum likelihood estimators of the unknown
parameters μ, σ², and σ_ε².

2.2.2.1 MAXIMUM LIKELIHOOD ESTIMATORS BASED ON REPEATED MEASURES

Assume that X_i ~ N(μ, σ²) and ε_ij ~ N(0, σ_ε²). By the additive property of the
normal distributions, we have Z_ij ~ N(μ, σ² + σ_ε²). Referring to Searle et al. (1992), the
likelihood function is a well-known result that can be expressed by

    L_R(Z | μ, σ², σ_ε²) = ∏_{i=1}^{t} (2π)^{−n_i/2} |V_i|^{−1/2} exp{ −(Z_i − μ1)ᵀ V_i^{−1} (Z_i − μ1)/2 },

where Z_i = (Z_i1, ..., Z_{i n_i})ᵀ and V_i = σ_ε² I + σ² J is the covariance matrix of Z_i, with
I denoting the identity matrix and J the matrix of ones. Under the assumption that the
n_i's are equal (i.e., assuming balanced data, n_i = n), the log likelihood function is in the
form of

    log L_R = −(tn/2) log(2π) − {t(n − 1)/2} log σ_ε² − (t/2) log(σ_ε² + nσ²)
              − SSE/(2σ_ε²) − {SSB + tn(Z̄·· − μ)²}/{2(σ_ε² + nσ²)},

where Z̄_i· = n^{−1} Σ_j Z_ij, Z̄·· = (tn)^{−1} Σ_i Σ_j Z_ij, SSE = Σ_i Σ_j (Z_ij − Z̄_i·)², and
SSB = n Σ_i (Z̄_i· − Z̄··)². By taking the partial derivatives of log L_R with respect to μ, σ²,
and σ_ε² and setting the equations equal to zero, we obtain the maximum likelihood
equations. Thus, the maximum likelihood estimator of μ is μ̂ = Z̄··, and the maximum
likelihood estimators of σ_ε² and σ² are

    σ̂_ε² = SSE/{t(n − 1)} and σ̂² = {SSB/t − σ̂_ε²}/n,

respectively, when σ̂² ≥ 0. Also, the large-sample variances and covariance of the
estimators are given by

    Var(μ̂) = (σ_ε² + nσ²)/(tn), Var(σ̂_ε²) = 2σ_ε⁴/{t(n − 1)},
    Var(σ̂²) = 2{(σ_ε² + nσ²)²/t + σ_ε⁴/(t(n − 1))}/n², Cov(σ̂², σ̂_ε²) = −2σ_ε⁴/{tn(n − 1)}

(for details, see Searle et al., 1992).

By the property of the maximum likelihood estimators, it is clear that asymptotically
those estimators follow a multivariate normal distribution,

    (μ̂, σ̂², σ̂_ε²)ᵀ ≈ N((μ, σ², σ_ε²)ᵀ, Σ) as t → ∞,

where Σ is the covariance matrix with the elements displayed above.
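As a numerical check of these closed-form solutions, one may simulate balanced
repeated measures under the normal model above and evaluate the estimators directly;
the R sketch below is a minimal illustration (hypothetical parameter values in line with
the settings of Section 2.4):

    set.seed(1)
    t <- 150; n <- 2                             # t subjects, n replicates each
    mu <- 1; sigma2 <- 1; sigma_e2 <- 0.4
    x <- rnorm(t, mu, sqrt(sigma2))              # true biomarker values
    z <- matrix(rep(x, each = n), nrow = t, byrow = TRUE) +
         matrix(rnorm(t * n, 0, sqrt(sigma_e2)), nrow = t)

    zbar_i <- rowMeans(z); zbar <- mean(z)
    SSE <- sum((z - zbar_i)^2)                   # within-subject sum of squares
                                                 # (column-wise recycling subtracts row means)
    SSB <- n * sum((zbar_i - zbar)^2)            # between-subject sum of squares

    mu_hat       <- zbar                         # maximum likelihood estimate of mu
    sigma_e2_hat <- SSE / (t * (n - 1))          # estimate of sigma_epsilon^2
    sigma2_hat   <- max((SSB / t - sigma_e2_hat) / n, 0)  # estimate of sigma^2
    c(mu_hat, sigma2_hat, sigma_e2_hat)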

2.2.2.2 MAXIMUM LIKELIHOOD ESTIMATORS FOLLOWING THE HYBRID DESIGN
Since we assume that X_i ~ N(μ, σ²) and ε ~ N(0, σ_ε²), we can write
Z^p_j ~ N(μ, σ²/p + σ_ε²) and X̃_k ~ N(μ, σ² + σ_ε²). The likelihood function based on
pooled-unpooled data then takes the form

    L_H(Z^p, X̃ | μ, σ², σ_ε²) = ∏_{j=1}^{qN} φ(Z^p_j; μ, σ²/p + σ_ε²) ∏_{k=1}^{(1−q)N} φ(X̃_k; μ, σ² + σ_ε²),

where φ(·; μ, v) denotes the normal density function with mean μ and variance v.
Differentiating the log likelihood function, log L_H(Z^p, X̃ | μ, σ², σ_ε²), with respect to μ,
σ², and σ_ε², respectively, we obtain the maximum likelihood estimators of μ, σ², and σ_ε²
as the roots of the score equations. Writing v_p = σ²/p + σ_ε² and v_u = σ² + σ_ε², these
equations reduce to the system

    μ̂ = { v̂_p^{−1} Σ_j Z^p_j + v̂_u^{−1} Σ_k X̃_k } / { qN v̂_p^{−1} + (1 − q)N v̂_u^{−1} },
    v̂_p = (qN)^{−1} Σ_j (Z^p_j − μ̂)², v̂_u = {(1 − q)N}^{−1} Σ_k (X̃_k − μ̂)²,        (2.1)

from which σ̂² = p(v̂_u − v̂_p)/(p − 1) and σ̂_ε² = v̂_u − σ̂².
Note that the estimator of μ has a structure that weighs the estimations based on pooled and
unpooled data, in a similar manner to a Bayes point estimator used in normal-normal
models (see Carlin and Louis, 2008). In this case, we show that we can obtain inference
regarding the parameters by using this hybrid approach without repeating measures on
the same individual, which is the most common strategy to solve ME problems.
By virtue of the properties of the maximum likelihood estimators, the asymptotic
distribution of the estimators (2.1) is multivariate normal, N((μ, σ², σ_ε²)ᵀ, I^{−1}), as
N → ∞, where I^{−1} is the inverse of the Fisher information matrix I with entries

    I_{rs} = −E[ ∂² log L_H(Z^p, X̃ | θ) / ∂θ_r ∂θ_s ], θ = (μ, σ², σ_ε²)ᵀ,

where log L_H(Z^p, X̃ | θ) is the corresponding log likelihood function (for
details, see Appendix A.1.1).
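Alternatively to solving the system (2.1) iteratively, the hybrid log likelihood can be
maximized numerically. The following R sketch (a minimal illustration with hypothetical
values of N, q, and p) uses optim, parameterizing the variances on the log scale so that
they stay positive:

    set.seed(1)
    N <- 100; q <- 0.5; p <- 2
    mu <- 1; sigma2 <- 1; sigma_e2 <- 0.4
    z_p <- rnorm(q * N, mu, sqrt(sigma2 / p + sigma_e2))    # pooled sample
    x_u <- rnorm((1 - q) * N, mu, sqrt(sigma2 + sigma_e2))  # unpooled sample

    negloglik <- function(par) {
      m <- par[1]; s2 <- exp(par[2]); se2 <- exp(par[3])
      -sum(dnorm(z_p, m, sqrt(s2 / p + se2), log = TRUE)) -
        sum(dnorm(x_u, m, sqrt(s2 + se2), log = TRUE))
    }
    fit <- optim(c(0, 0, 0), negloglik)
    c(mu = fit$par[1], sigma2 = exp(fit$par[2]), sigma_e2 = exp(fit$par[3]))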


2.2.2.3 REMARKS ON THE NORMAL CASE

As shown above, when the biomarker values and MEs are normally distributed, the
maximum likelihood estimators exist and can be easily obtained. It is also clear that we
can consider these estimators as the least squares estimators in a nonparametric context.
However, when data are not from normal distributions, it may be very complicated or
even infeasible to derive the distributions of repeated measures data or pooled and
unpooled data (e.g., Vexler et al., 2011). For example, in various situations, closed
analytical forms of the likelihood functions cannot be found based on pooled data
because the density function of the pooled biospecimen values involves complex
convolutions of p individual biospecimen values. Consequently, it is reasonable to
consider efficient nonparametric inference methodologies based on the repeated measures
data or pooled-unpooled data.

2.3 EMPIRICAL LIKELIHOOD METHOD

In this section, we apply the EL methodology to the statement of the problem in this
chapter. The EL technique has been extensively proposed as a nonparametric
approximation of the parametric likelihood approach (e.g., DiCiccio et al., 1989; Owen,
1988, 1991, 2001; Vexler et al. 2009, 2010; Vexler and Gurevich, 2010; Yu et al., 2010).
We begin by outlining the EL ratio method and then modifying the EL ratio test to apply
to construct confidence interval estimations and tests based on data with repeated
measures and pooled-unpooled data.

2.3.1 THE EMPIRICAL LIKELIHOOD RATIO TEST


Consider the following simple testing problem that is stated nonparametrically. Suppose
i.i.d. random variables X_1, ..., X_n with E(X_1) = μ and E|X_1|³ < ∞ are observable. The
problem of interest, for example, is to test the hypothesis

    H_0: μ = μ_0 vs. H_1: μ ≠ μ_0,                                        (2.2)

where μ_0 is fixed and known. To test the hypothesis in equation (2.2), we can write
the EL function as L = ∏_{i=1}^{n} p_i, where we assume the p_i's to have values that maximize
L given empirical constraints. The empirical constraints correspond to the hypotheses
settings. Then, under the null hypothesis in equation (2.2), we maximize L subject to
Σ_{i=1}^{n} p_i = 1 and Σ_{i=1}^{n} p_i X_i = μ_0. Here the condition Σ p_i X_i = μ_0 is an
empirical form of E(X_1) = μ_0. Using the Lagrange multipliers, one can show that the
maximum EL function has the form of

    L_0 = ∏_{i=1}^{n} [n{1 + λ(X_i − μ_0)}]^{−1},

where λ is a root of Σ_{i=1}^{n} (X_i − μ_0)/{1 + λ(X_i − μ_0)} = 0.
Similarly, under the alternative hypothesis, the maximum EL function has the simple
form L_1 = ∏_{i=1}^{n} n^{−1} = n^{−n}. As a consequence, the 2log EL ratio test statistic for
equation (2.2) is

    ELR(μ_0) = 2 log(L_1/L_0) = 2 Σ_{i=1}^{n} log{1 + λ(X_i − μ_0)}.

It is proven in Owen (2001) that the 2log EL ratio, ELR(μ_0), follows asymptotically a χ²
distribution with one degree of freedom as n → ∞. Thus, we reject the null hypothesis at a
significance level α if ELR(μ_0) > χ²_{1,1−α}.
Furthermore, we can construct the confidence interval estimator of μ as

    {μ : ELR(μ) ≤ χ²_{1,1−α}}.

(Here, χ²_{1,1−α} is the 100(1 − α)th percentile of a χ² distribution with one degree of
freedom.)
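For a concrete illustration, the 2log EL ratio and the χ²-based decision can be computed
with the el.test function of the R package emplik (assuming that package is available);
the sample below is hypothetical:

    library(emplik)                        # provides el.test()
    set.seed(1)
    x <- rnorm(100, mean = 1, sd = 1.2)    # hypothetical i.i.d. sample

    out <- el.test(x, mu = 1)              # EL ratio test of H0: E(X) = 1
    out$"-2LLR"                            # the 2log EL ratio statistic
    out$"-2LLR" > qchisq(0.95, df = 1)     # reject H0 at the 5% level?

The confidence interval estimator above can then be obtained by evaluating el.test over
a grid of candidate values of μ and retaining the points that are not rejected.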

2.3.2 THE EMPIRICAL LIKELIHOOD METHOD BASED ON REPEATED MEASURES DATA

Following the statement mentioned in Section 2.2, we have correlated data with repeated
measures. In order to obtain an i.i.d. sample, we utilize the fact that the block sample mean
Z̄_i· = n_i^{−1} Σ_{j=1}^{n_i} Z_ij is independent of Z̄_l·, l ≠ i, in a similar manner to the
blockwise EL method given in Kitamura (1997). Then, the random variables become
Z̄_1·, ..., Z̄_t·, and the corresponding EL function for μ is given by

    L_R(μ) = ∏_{i=1}^{t} [t{1 + λ(Z̄_i· − μ)}]^{−1},

where λ is a root of Σ_{i=1}^{t} (Z̄_i· − μ)/{1 + λ(Z̄_i· − μ)} = 0.
In this case, the 2log EL ratio test statistic is in the form of

    ELR_R(μ) = 2 Σ_{i=1}^{t} log{1 + λ(Z̄_i· − μ)}.

Proposition 2.1. Assume E|Z̄_1·|³ < ∞. Then the 2log EL ratio, ELR_R(μ), is
asymptotically χ²-distributed with one degree of freedom when μ = E(Z̄_1·), as t → ∞.

Proof. See Appendix A.1.2.1.

The associated confidence interval estimator is then given by {μ : ELR_R(μ) ≤ χ²_{1,1−α}},
where χ²_{1,1−α} is the 100(1 − α)th percentile of a χ² distribution with one degree of
freedom.
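In other words, the blockwise procedure reduces to applying the one-sample EL
machinery of Section 2.3.1 to the block means; a minimal R sketch based on hypothetical
balanced data (again assuming the emplik package) is:

    library(emplik)
    set.seed(1)
    t <- 50; n <- 5                        # t blocks (bioassays), n repeated measures each
    z <- matrix(rep(rnorm(t, 1, 1), each = n), nrow = t, byrow = TRUE) +
         matrix(rnorm(t * n, 0, sqrt(0.4)), nrow = t)

    zbar <- rowMeans(z)                    # i.i.d. block sample means
    el.test(zbar, mu = 1)$"-2LLR"          # 2log EL ratio for H0: mu = 1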

2.3.3 THE EMPIRICAL LIKELIHOOD METHOD BASED ON POOLED-UNPOOLED DATA

In this section, we consider two distribution-free alternatives to the parametric likelihood
method mentioned in Section 2.2.1.2. To this end, we apply the EL technique. Note that,
in contrast to data that consist of repeated measures, in this section we use data that are
based on independent observations. Consequently, we can introduce a combined EL
function for the mean μ based on two independent samples, that is, i.i.d. Z^p_1, ..., Z^p_{qN}
and i.i.d. X̃_1, ..., X̃_{(1−q)N}, representing measurements that correspond to
pooled and unpooled biospecimens, respectively. Under the null hypothesis, the EL
function for μ can be presented in the form of

    L_H(μ) = ∏_{j=1}^{qN} [qN{1 + λ_1(Z^p_j − μ)}]^{−1} ∏_{k=1}^{(1−q)N} [(1 − q)N{1 + λ_2(X̃_k − μ)}]^{−1},

where λ_1 and λ_2 are roots of the equations

    Σ_{j=1}^{qN} (Z^p_j − μ)/{1 + λ_1(Z^p_j − μ)} = 0 and Σ_{k=1}^{(1−q)N} (X̃_k − μ)/{1 + λ_2(X̃_k − μ)} = 0.

Finally, the 2log EL ratio test statistic can be given in the form of

    ELR_H(μ) = 2 Σ_{j=1}^{qN} log{1 + λ_1(Z^p_j − μ)} + 2 Σ_{k=1}^{(1−q)N} log{1 + λ_2(X̃_k − μ)}.   (2.3)

In a similar manner to common EL considerations, one can show that the statistics
2 Σ_j log{1 + λ_1(Z^p_j − μ)} and 2 Σ_k log{1 + λ_2(X̃_k − μ)} each follow asymptotically a χ²
distribution with one degree of freedom. By virtue of the additive property of χ² distributions, the
2log EL ratio, ELR_H(μ), has an asymptotic χ² distribution with two degrees of freedom.
Thus, we formulate the next proposition.

Proposition 2.2. Let E|Z^p_1|³ < ∞ and E|X̃_1|³ < ∞. Then the 2log EL ratio, ELR_H(μ), has a χ²
distribution with two degrees of freedom when μ = E(X̃_1), as N → ∞.

Proof. See Appendix A.1.2.2.

The corresponding confidence interval estimator is

    {μ : ELR_H(μ) ≤ χ²_{2,1−α}},                                          (2.4)

where χ²_{2,1−α} is the 100(1 − α)th percentile of a χ² distribution with two degrees of
freedom.
In practice, to execute the procedure above, we can directly use standard programs
related to the classical EL ratio tests; for example, the code el.test of the R software can
be utilized to conduct the EL confidence interval estimator (2.4).
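For instance, with z_p and x_u generated as in the illustrative sketch of Section 2.2.2.2,
the combined statistic (2.3) and the χ² decision with two degrees of freedom can be
computed as follows (a minimal sketch, again assuming the emplik package):

    library(emplik)
    elr_H <- el.test(z_p, mu = 1)$"-2LLR" + el.test(x_u, mu = 1)$"-2LLR"
    elr_H > qchisq(0.95, df = 2)           # reject H0: mu = 1 at the 5% level?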
The EL technique mentioned above does not use an empirical version of the rule

    Var(X̃_1) − Var(Z^p_1) = σ²(1 − 1/p)                                    (2.5)

that connects the second moments derived from pooled and unpooled observations.
Intuitively, using a constraint related to equation (2.5), one can increase the power of the
EL approach. Consider, as an alternative to the simple EL function above, the EL function
for μ under the null hypothesis,

    L*_H(μ) = max ∏_{j=1}^{qN} p_j ∏_{k=1}^{(1−q)N} q_k,

where the maximum is taken over probability weights p_j and q_k that satisfy

    Σ_j p_j = 1, Σ_k q_k = 1, Σ_j p_j Z^p_j = μ, Σ_k q_k X̃_k = μ,
    Σ_k q_k (X̃_k − μ)² − Σ_j p_j (Z^p_j − μ)² = σ̂²(1 − 1/p).              (2.6)

Here, σ̂² is the estimator from equation (2.1) that is defined under the null hypothesis,
and the Lagrange multipliers in the definition of L*_H(μ) are roots of the equations
corresponding to the constraints (2.6). Likewise, under the alternative hypothesis, we
maximize the EL function ∏_j p_j ∏_k q_k subject to the analogous constraints, in which
the common mean μ is replaced by sample-specific means (μ_1, μ_2). Thus, the EL under
the alternative hypothesis, L*_H((μ_1, μ_2)), depends on (μ_1, μ_2), where we should
numerically derive the Lagrange multipliers as well as the maximizing values (μ̂_1, μ̂_2)
using the equations (2.6). As a result, the corresponding 2log EL ratio test statistic is

    ELR*_H(μ) = 2 log{L*_H((μ̂_1, μ̂_2))/L*_H(μ)}.                          (2.7)

Note that, following Qin and Lawless (1994), ELR*_H(μ) is asymptotically equivalent to
the maximum log EL ratio test statistic. By virtue of results mentioned in Qin and
Lawless (1994), ELR*_H(μ) asymptotically follows a χ² distribution. Then we reject the null
hypothesis at a significance level α when ELR*_H(μ_0) exceeds the corresponding
100(1 − α)th χ² percentile. Moreover, the corresponding confidence interval is the set of
values of μ for which ELR*_H(μ) does not exceed that percentile.

The Monte Carlo simulation study presented in the next section examines the
performance of each EL method mentioned above.

2.4 MONTE CARLO EXPERIMENTS

In this section, we conducted an extensive Monte Carlo study to evaluate the performance
of the parametric and nonparametric likelihood methods proposed in Sections 2.2 and 2.3.

2.4.1 SIMULATION SETTINGS


Examining the repeated measures sampling method, we randomly generated samples of
values X_i from a normal distribution with mean μ = E(X) and variance σ² = Var(X). We
let n denote the number of replicates for each subject. For simplicity, we assumed that
each subject had the same number of replicates n (i.e., assuming balanced data). Then, in
a similar manner, we randomly generated normally distributed MEs, ε_ij's, having
E(ε) = 0 and Var(ε) = σ_ε². Therefore, we conducted samples of Z_ij = X_i + ε_ij. Each
sample had N = tn observations.

To obtain the hybrid samples, we first generated a sample of size T, where
T = qNp + (1 − q)N, to represent the available individual bioassays. Then we proceeded
to generate pooled data. To this end, we pooled groups of p of the X_i's to constitute
pooled data, where we assumed qN to be an integer and the MEs, ε_j's, to be i.i.d. random
samples from a normal distribution with mean E(ε) = 0 and Var(ε) = σ_ε². Following the
pooling literature, if there were no MEs, we assumed the average values of the pooled
biospecimens, p^{−1} Σ X_i, to be observed, and we could represent them as the
measurements of pooled bioassays. We took the remaining (1 − q)N observations,
X̃_k = X_k + ε_k, as individual measurements; for each such observation, we randomly
generated a ME ε_k from a normal distribution. Combining the pooled samples, Z^p_j,
j = 1, ..., qN, with the unpooled samples, X̃_k, k = 1, ..., (1 − q)N, we obtained
pooled-unpooled data with the total sample size N equal to that in the Monte Carlo
evaluations related to the repeated measures approaches.

To evaluate the performance of the proposed methods, we applied the following
simulation settings: the fixed significance level was 0.05; μ = 1 and σ = 1; σ_ε² = 0.4, 1;
the number of replicates n = 2, 5, 10; the pooling group size p = 2, 5, 10; the pooling
proportion q = 0.5; the total sample size N = 100, 300. For each set of parameters, there
were 10,000 Monte Carlo data generations. In this section, following the pooling
literature, we assumed that the simulated analysis of biomarkers was restricted to execute
just N measurements while qNp + (1 − q)N individual biospecimens were available,
when the hybrid design was compared with the repeated measures sampling method. The
Monte Carlo simulation results are presented in the next subsection.
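For reference, one Monte Carlo replication of both designs under a single parameter
configuration can be generated as in the R sketch below (our reading of the sampling
scheme described above; parameter values taken from one of the listed settings):

    set.seed(1)
    N <- 100; mu <- 1; sigma <- 1; sigma_e2 <- 0.4

    # repeated measures design: t = N/n subjects, n replicates each
    n <- 2; t <- N / n
    z_rep <- rep(rnorm(t, mu, sigma), each = n) + rnorm(N, 0, sqrt(sigma_e2))

    # hybrid design: qN pooled measurements of group size p, (1-q)N unpooled
    p <- 2; q <- 0.5
    T <- q * N * p + (1 - q) * N           # individual bioassays consumed
    x <- rnorm(T, mu, sigma)
    z_pool <- rowMeans(matrix(x[1:(q * N * p)], nrow = q * N, byrow = TRUE)) +
              rnorm(q * N, 0, sqrt(sigma_e2))
    x_unp  <- x[(q * N * p + 1):T] + rnorm((1 - q) * N, 0, sqrt(sigma_e2))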

2.4.2 MONTE CARLO OUTPUTS


Table 2.1 shows the estimated parameters based on the repeated measures data using the parametric likelihood method. The results show that as the number of replicates increases, the standard errors of the estimates of sigma^2_e decrease, indicating that the estimation of the ME variance improves as the number of replicates increases. Apparently, however, the Monte Carlo standard errors of the estimators of mu and sigma^2_X increase when we increase the number of replicates.
To accomplish the efficiency comparison between the repeated measures strategy and the hybrid design strategy, we provide the Monte Carlo properties of the maximum likelihood estimates based on pooled-unpooled data in Table 2.1. Table 2.1 shows that the Monte Carlo standard errors of the estimates for mu based on pooled-unpooled data are clearly less than those of the corresponding estimates that utilize repeated measures. One observed advantage is that the estimation for sigma^2_X based on pooled-unpooled data is very accurate when the total number of measurements is fixed at the same level. Another advantage is that the standard errors of the estimates for mu based on pooled-unpooled data are much smaller than those of the corresponding estimates using repeated measures data, as shown in Table 2.1.
Table 2.1: The Monte Carlo evaluations of the maximum likelihood estimates based on repeated measurements and the hybrid design. Parameters are (mu, sigma^2_X, sigma^2_e); columns give the Monte Carlo means of the estimates and their standard errors.

Sample  Replicates n;   Parameters           Estimates                      Standard Errors
Size    Pooling Size p  (mu, s2_X, s2_e)     mu^     s2_X^   s2_e^          SE(mu^)  SE(s2_X^)  SE(s2_e^)

Repeated Measurements:
N=100   n=2    (1, 1, 0.4)   1.0021  0.9781  0.3997   0.1553  0.2410  0.0790
               (1, 1, 1.0)   1.0006  0.9688  0.9984   0.1726  0.3106  0.1994
        n=5    (1, 1, 0.4)   0.9966  0.9462  0.3990   0.2328  0.3305  0.0623
               (1, 1, 1.0)   1.0015  0.9362  0.9998   0.2442  0.3688  0.1570
        n=10   (1, 1, 0.4)   1.0026  0.8951  0.3999   0.3209  0.4346  0.0597
               (1, 1, 1.0)   1.0044  0.8917  0.9995   0.3299  0.4690  0.1501
N=300   n=2    (1, 1, 0.4)   0.9987  0.9921  0.3999   0.0889  0.1405  0.0455
               (1, 1, 1.0)   1.0005  0.9883  0.9999   0.0995  0.1803  0.1162
        n=5    (1, 1, 0.4)   0.9995  0.9797  0.3998   0.1356  0.1950  0.0365
               (1, 1, 1.0)   0.9990  0.9766  0.9990   0.1409  0.2181  0.0906
        n=10   (1, 1, 0.4)   0.9985  0.9682  0.3997   0.1864  0.2633  0.0344
               (1, 1, 1.0)   0.9985  0.9660  1.0002   0.1914  0.2782  0.0861

Hybrid Design:
N=100   p=2    (1, 1, 0.4)   1.0015  1.0160  0.4365   0.1048  0.6712  0.4579
               (1, 1, 1.0)   1.0007  1.0754  1.0098   0.1327  1.0058  0.7275
        p=5    (1, 1, 0.4)   0.9994  1.0045  0.3889   0.0924  0.3857  0.1662
               (1, 1, 1.0)   1.0008  1.0053  0.9880   0.1240  0.5932  0.3217
        p=10   (1, 1, 0.4)   0.9993  1.0049  0.3918   0.0871  0.3341  0.1164
               (1, 1, 1.0)   0.9996  1.0050  0.9836   0.1197  0.5082  0.2486
N=300   p=2    (1, 1, 0.4)   0.9999  0.9974  0.4066   0.0608  0.3868  0.2652
               (1, 1, 1.0)   1.0002  1.0069  0.9982   0.0758  0.5788  0.4179
        p=5    (1, 1, 0.4)   0.9995  1.0013  0.3969   0.0534  0.2197  0.0954
               (1, 1, 1.0)   0.9993  1.0076  0.9910   0.0711  0.3386  0.1819
        p=10   (1, 1, 0.4)   0.9995  0.9995  0.3972   0.0497  0.1935  0.0671
               (1, 1, 1.0)   0.9992  1.0059  0.9922   0.0688  0.2928  0.1436
Table 2.2 displays the coverage probabilities of the confidence interval estimators
constructed by the parametric likelihood and EL method based on repeated measures data
and the mixed data, respectively.
Table 2.2: Coverage probabilities and confidence intervals based on repeated measurements and the hybrid design.

Sample  Replicates n;   Parameters         Parametric Likelihood          Empirical Likelihood
Size    Pooling Size p  (mu, s2_X, s2_e)   Coverage  CI                   Coverage  CI

Repeated Measurements:
N=100   n=2    (1, 1, 0.4)   0.9420  (0.7028, 1.3014)   0.9496  (0.6980, 1.3049)
               (1, 1, 1.0)   0.9423  (0.6665, 1.3347)   0.9466  (0.6613, 1.3394)
        n=5    (1, 1, 0.4)   0.9305  (0.5584, 1.4348)   0.9327  (0.5519, 1.4466)
               (1, 1, 1.0)   0.9289  (0.5404, 1.4626)   0.9353  (0.5298, 1.4752)
        n=10   (1, 1, 0.4)   0.9044  (0.4193, 1.5859)   0.8985  (0.4158, 1.5876)
               (1, 1, 1.0)   0.9042  (0.4040, 1.6047)   0.9030  (0.4054, 1.6065)
N=300   n=2    (1, 1, 0.4)   0.9477  (0.8243, 1.1731)   0.9517  (0.8240, 1.1753)
               (1, 1, 1.0)   0.9469  (0.8056, 1.1955)   0.9479  (0.8034, 1.1962)
        n=5    (1, 1, 0.4)   0.9400  (0.7401, 1.2588)   0.9467  (0.7360, 1.2628)
               (1, 1, 1.0)   0.9448  (0.7257, 1.2722)   0.9467  (0.7210, 1.2763)
        n=10   (1, 1, 0.4)   0.9396  (0.6422, 1.3547)   0.9379  (0.6318, 1.3582)
               (1, 1, 1.0)   0.9336  (0.6321, 1.3648)   0.9417  (0.6245, 1.3712)

Hybrid Design:
N=100   p=2    (1, 1, 0.4)   0.9512  (0.7939, 1.2090)   0.9492  (0.7725, 1.2303)
               (1, 1, 1.0)   0.9463  (0.7422, 1.2592)   0.9421  (0.7146, 1.2869)
        p=5    (1, 1, 0.4)   0.9424  (0.8230, 1.1757)   0.9490  (0.7978, 1.2010)
               (1, 1, 1.0)   0.9439  (0.7644, 1.2372)   0.9509  (0.7314, 1.2703)
        p=10   (1, 1, 0.4)   0.9393  (0.8339, 1.1646)   0.9498  (0.8099, 1.1887)
               (1, 1, 1.0)   0.9431  (0.7701, 1.2290)   0.9478  (0.7376, 1.2614)
N=300   p=2    (1, 1, 0.4)   0.9482  (0.8817, 1.1182)   0.9551  (0.8660, 1.1337)
               (1, 1, 1.0)   0.9469  (0.8525, 1.1479)   0.9520  (0.8334, 1.1672)
        p=5    (1, 1, 0.4)   0.9478  (0.8963, 1.1026)   0.9532  (0.8822, 1.1166)
               (1, 1, 1.0)   0.9463  (0.8616, 1.1371)   0.9506  (0.8433, 1.1556)
        p=10   (1, 1, 0.4)   0.9462  (0.9030, 1.0961)   0.9584  (0.8896, 1.1095)
               (1, 1, 1.0)   0.9484  (0.8652, 1.1332)   0.9532  (0.8475, 1.5080)
Table 2.2 shows that the EL ratio test statistic is as efficient as the traditional
parametric likelihood approach in the context of constructing confidence intervals
because the coverage probabilities and the interval widths of the two methods are very
close.
It is clearly shown that when sample sizes are greater than 100, the coverage
probabilities obtained via the pooled-unpooled design are closer to the expected 0.95
value than those based on repeated measurements. This, again, demonstrates that mixed
data are more efficient than repeated measures data.
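As a purely illustrative aside (this sketch is ours; `ci_method` is a placeholder for either the parametric or the EL interval constructor, and the textbook t-interval stands in only as an example), the coverage probabilities reported in Table 2.2 can be estimated by counting how often a nominal 95% interval covers the true mean.

import numpy as np
from scipy.stats import t

def coverage(ci_method, data_gen, true_mu, mc=10_000, rng=None):
    """Monte Carlo coverage: fraction of simulated intervals containing true_mu."""
    rng = rng or np.random.default_rng(1)
    hits = 0
    for _ in range(mc):
        lo, hi = ci_method(data_gen(rng))
        hits += (lo <= true_mu <= hi)
    return hits / mc

def t_interval(x, level=0.95):        # placeholder normal-theory interval
    m, se = x.mean(), x.std(ddof=1) / len(x) ** 0.5
    h = t.ppf(0.5 + level / 2, df=len(x) - 1) * se
    return m - h, m + h

gen = lambda rng: rng.normal(1.0, 1.0, size=100)
print(coverage(t_interval, gen, true_mu=1.0, mc=2000))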
To compare the Monte Carlo type I errors and powers of the tests based on the test statistics defined by equations (2.3) and (2.7), we performed 10,000 simulations for each parametric setting and sample size. To test the null hypothesis regarding the biomarker mean, we used the statistics given by equations (2.3) and (2.7). Table 2.3 depicts results that correspond to the case of normally distributed data with mean shifts of 0.5 and 1.0 under the alternative. The outputs show that the Monte Carlo type I errors and powers of the test statistic (2.3) are slightly better than those corresponding to the test statistic (2.7). This indicates that the test based on the simple statistic (2.3) performs better than that based on the statistic (2.7), in the considered cases.
Table 2.4 displays the Monte Carlo simulation results of testing the null hypothesis when the alternative distribution is a chi-squared distribution with two degrees of freedom and a is an effect size. Again, in this case, it is obvious that the type I errors (when a = 0) of the test statistic (2.3) are much better controlled around 0.05 than those based on the test statistic (2.7). Moreover, the Monte Carlo powers of the test based on the test statistic (2.3) are higher than those based on the statistic (2.7) when the effect size a is larger than 0.5. On the contrary, when the effect size a is small, such as 0.1 and 0.2, the Monte Carlo powers of the tests based on the test statistic (2.7) seem higher than those based on the statistic (2.3). This shows that when the effect size a is large, the test based on the simple statistic (2.3) is preferable to that based on the statistic (2.7).
Table 2.3: The Monte Carlo type I errors and powers of the EL ratio test statistics (2.3) and (2.7) for testing the null hypothesis based on data following the hybrid design (normally distributed data; the pooling proportion = 0.5; the expected significance level was 0.05). Powers correspond to mean shifts of 0.5 and 1.0.

Sample    Pooling     Parameter   ELR Test Statistic (2.3)              ELR Test Statistic (2.7)
Size (N)  Group       s2_e        Type I   Power              Type I   Power
          Size (p)                Error    shift=0.5  shift=1.0  Error    shift=0.5  shift=1.0
N=100     p=2         0.4         0.0587   0.9919     1.0000     0.0580   0.9718     0.9990
                      1.0         0.0558   0.9373     1.0000     0.0611   0.9230     0.9986
          p=5         0.4         0.0555   0.9989     1.0000     0.0530   0.9512     0.9992
                      1.0         0.0587   0.9626     1.0000     0.0604   0.9446     0.9976
          p=10        0.4         0.0588   0.9995     1.0000     0.0595   0.9531     0.9999
                      1.0         0.0556   0.9684     1.0000     0.0621   0.9680     0.9992
N=200     p=2         0.4         0.0495   0.9999     1.0000     0.0594   0.9990     0.9985
                      1.0         0.0536   0.9991     1.0000     0.0593   0.9983     0.9995
          p=5         0.4         0.0540   1.0000     1.0000     0.0524   0.9952     0.9996
                      1.0         0.0511   0.9997     1.0000     0.0543   0.9981     0.9999
          p=10        0.4         0.0549   1.0000     1.0000     0.0546   0.9950     0.9996
                      1.0         0.0536   0.9999     1.0000     0.0551   0.9979     1.0000
Table 2.4: The Monte Carlo type I errors and powers of the EL ratio test statistics (2.3) and (2.7) for testing based on data following the hybrid design, where the alternative follows a chi-squared distribution with two degrees of freedom and a denotes the effect size (the pooling proportion = 0.5; the expected significance level was 0.05).

Sample    Pooling     Parameter   ELR Test Statistic (2.3)                    ELR Test Statistic (2.7)
Size (N)  Group       s2_e        a=0      a=0.1   a=0.2   a=0.5              a=0      a=0.1   a=0.2   a=0.5
          Size (p)
N=100     p=2         0.4         0.0687   0.0862  0.2021  0.8567             0.0724   0.0989  0.2144  0.7446
                      1.0         0.0654   0.0792  0.1579  0.6978             0.0690   0.0936  0.1696  0.6417
          p=5         0.4         0.0649   0.1123  0.3133  0.9808             0.0985   0.1583  0.3552  0.9084
                      1.0         0.0670   0.0906  0.1901  0.8141             0.0862   0.1131  0.2338  0.8171
          p=10        0.4         0.0646   0.1454  0.4460  0.9990             0.1016   0.1724  0.4416  0.9269
                      1.0         0.0623   0.0916  0.2253  0.8776             0.0933   0.1241  0.2693  0.8744
N=200     p=2         0.4         0.0587   0.1137  0.3381  0.9907             0.0555   0.1210  0.3294  0.8227
                      1.0         0.0544   0.0940  0.2583  0.9427             0.0534   0.1044  0.2612  0.8147
          p=5         0.4         0.0559   0.1699  0.5557  0.9998             0.0857   0.1903  0.5215  0.8944
                      1.0         0.0538   0.1134  0.3366  0.9860             0.0770   0.1468  0.3687  0.9436
          p=10        0.4         0.0572   0.2279  0.7539  1.0000             0.0801   0.2027  0.6158  0.8875
                      1.0         0.0568   0.1233  0.3988  0.9935             0.0815   0.1629  0.4219  0.9527

2.5 A REAL DATA EXAMPLE
In this section, we illustrate the proposed methods via data from the Cedars-Sinai Medical Center. This study on coronary heart disease investigated the discriminatory ability of a cholesterol biomarker for myocardial infarction (MI). We had 80 individual measurements of the cholesterol biomarker in total. Half of these were collected on patients who recently survived an MI (cases), and the other half on controls who had normal rest electrocardiograms and were free of symptoms, having no previous cardiovascular procedures or MIs. Additionally, the blood specimens were randomly pooled in groups of two, keeping cases and controls separate, and then re-measured. Consequently, we had measurements for 20 samples of pooled cases and 20 samples of pooled controls, allowing us to form the hybrid design.
The p-value of 0.8662 for the Shapiro-Wilk test indicates that we can assume the cholesterol biomarker follows a normal distribution. A histogram and a normal Q-Q plot in Figure 2.1 confirm that the normal distributional assumption for the data is reasonable.
Figure 2.1: The histogram and the normal Q-Q plot of cholesterol data
We formed hybrid samples by taking combinations of 20 unpooled samples and 10 pooled samples from different individuals for cases and controls, separately. In this example, we focused on the means of cholesterol measurements and, therefore, we calculated these means based on 40 individual samples for cases and controls, separately. The obtained means were 226.7877 and 205.5290, respectively. Using a bootstrap strategy, we compared the confidence interval estimators and the coverage probabilities of the EL method with those of the parametric method. To execute the bootstrap study, we proceeded as follows. We randomly selected 10 pooled assays of group size two with replacement. We then randomly sampled 20 assays from the individual assays, excluding those performed on individual biospecimens that contributed to the 10 chosen pooled assays. With our 20 sampled individuals and 10 pooled assays, we applied a parametric likelihood method assuming a normal distribution and the EL ratio test (2.3) to calculate the 95% confidence interval of the mean of cholesterol biomarkers. We repeatedly sampled and calculated the confidence interval of the cholesterol mean 5,000 times, obtaining 5,000 values for the confidence interval of the mean value of cholesterol measurements for both cases and controls. Then we took the average of these obtained 5,000 values. Table 2.5 depicts the outputs of the bootstrap evaluation.
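A schematic version of this resampling loop is sketched below (this is our own illustration: the placeholder arrays stand in for the actual assay values, `t_ci` stands in for either the parametric or the EL interval constructor, and the donor-exclusion bookkeeping is simplified).

import numpy as np

rng = np.random.default_rng(7)
individual = rng.normal(226.8, 30.0, size=40)   # placeholder for 40 individual case assays
pooled = rng.normal(226.8, 22.0, size=20)       # placeholder for 20 pooled case assays

def one_bootstrap_ci(ci_method):
    pick_pool = rng.choice(len(pooled), size=10, replace=True)
    # In the study, individual assays whose biospecimens fed the chosen pools are
    # excluded; that index bookkeeping is study-specific and simplified here.
    pick_ind = rng.choice(len(individual), size=20, replace=False)
    sample = np.concatenate([pooled[pick_pool], individual[pick_ind]])
    return ci_method(sample)

def t_ci(x):                                    # placeholder parametric 95% CI
    m, h = x.mean(), 1.96 * x.std(ddof=1) / len(x) ** 0.5
    return m - h, m + h

cis = np.array([one_bootstrap_ci(t_ci) for _ in range(5000)])
print(cis.mean(axis=0))  # averaged lower/upper limits, as reported in Table 2.5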
Table 2.5: Bootstrap evaluations of the confidence interval estimators based on the parametric likelihood ratio test and the EL ratio test.

                      Health                           MI
                      CI                     Length    CI                     Length
Parametric (Normal)   (192.5738, 220.8708)   28.29704  (210.0585, 239.4560)   29.39748
Empirical             (192.9715, 221.1471)   28.17561  (210.4337, 240.5975)   30.16376
In accordance with the results above, the confidence intervals of estimators of the
cholesterol mean via the EL ratio method are close to those corresponding to the
parametric approach; therefore, we cannot observe a significant difference in the
confidence intervals related to the approaches. This result shows that, in this example, the
proposed EL approach is as efficient as the traditional parametric likelihood approach in
the context of constructing confidence intervals.
2.6 CONCLUSIONS
In this chapter, we proposed and examined different parametric and distribution-free likelihood methods to evaluate data subject to MEs. We evaluated the common sampling strategy based on repeated measures and the novel hybrid sampling procedure. When the ME problem is in effect, we pointed out that the repeated measurements strategy might not perform well. The proposed hybrid design utilizes the cost-efficient pooling approach and combines the pooled and unpooled samples.

The study presented in this chapter has confirmed that the repeated-measures strategy provides information mainly about the ME distribution, reducing the efficiency of this procedure compared with the hybrid design in the context of evaluating biomarker characteristics. We proposed the application of the EL techniques, very efficient nonparametric methods, to data subject to MEs.

To verify the efficiency of the hybrid design and the EL methodology, we provided theoretical propositions as well as Monte Carlo simulation results. The numerical studies have supported our arguments that the likelihoods based on pooled-unpooled data are more efficient than those based on the repeated measures data. We showed that the EL method can be utilized as a very powerful tool in statistical inference involving MEs.
CHAPTER 3
AN EMPIRICAL LIKELIHOOD RATIO BASED GOODNESS-OF-FIT TEST FOR INVERSE GAUSSIAN DISTRIBUTIONS

3.1 INTRODUCTION
The Inverse Gaussian (IG) distribution has a probability density function of the form

f(x | mu, lambda) = {lambda / (2 pi x^3)}^{1/2} exp{-lambda (x - mu)^2 / (2 mu^2 x)}, x > 0,

where mu > 0 and lambda > 0 are parameters. The IG(mu, lambda) distribution is extensively known for modeling and analyzing right-skewed data with positive support across several different fields of science, e.g., demography, electrical networks, meteorology, hydrology, ecology, entomology, physiology, and cardiology (see, for example, Chhikara and Folks, 1977; Bardsley, 1980; Seshadri, 1993, 1999; Johnson et al., 1994; Barndorff-Nielsen, 1994). Given the utility of this distribution, it is meaningful to develop a corresponding goodness-of-fit test, which has satisfactory statistical properties. Towards this end, we propose constructing a distribution-free goodness-of-fit test for the IG distribution, which is based on approximating the appropriate parametric likelihood ratio test statistic.
The parametric likelihood approach is a powerful statistical tool, which provides optimal statistical tests under well-known conditions; e.g., see Lehmann and Romano (2005), Vexler and Wu (2009), Vexler et al. (2010) and Vexler and Tarima (2010). By virtue of the Neyman-Pearson lemma, the likelihood ratio L_1/L_0, where L_0 and L_1 correspond to the likelihoods under the hypotheses H_0 and H_1, respectively, is the most powerful test statistic when L_0 and L_1 are completely known. In the nonparametric context considered in this chapter, the forms of L_0 and L_1 are unknown, but are estimable. In this case, the Neyman-Pearson lemma motivates us to approximate the optimal likelihood ratios using an empirical approach.
In this chapter, we develop a simple approach that approximates the most powerful likelihood ratio goodness-of-fit test for the IG distribution. The method is efficient and utilizes adapted principles of the empirical likelihood (EL) methodology. The EL approach allows investigators to employ methods with properties that are close, in the asymptotic sense, to those of parametric techniques without having to assume a parametric form for the likelihood functions under either hypothesis H_0 or H_1 (e.g., Lazar, 2003; Qin and Lawless, 1994; Owen, 2001; Vexler et al., 2009; Vexler et al., 2010; Yu et al., 2010). The main advantage of the EL approach is that it is based on the maximum likelihood methodology when given a set of well-defined empirical constraints. The EL function of n i.i.d. observations X_1, ..., X_n has the form L = prod_{i=1}^n p_i, where the p_i's maximize prod p_i and satisfy empirical constraints corresponding to hypotheses of interest. This approach is based on the likelihood methodology involving cumulative distribution functions F; in this case, the likelihood has the form prod_{i=1}^n {F(X_i) - F(X_i-)} (see, for details, Owen, 2001). For example, if the null hypothesis specifies the mean, E(X) = mu_0, then the values of the p_i's in L should be chosen to maximize prod p_i given the constraint sum p_i X_i = mu_0 (with sum p_i = 1), which is an empirical version of the rule int x dF(x) = mu_0. The components p_i's of the EL can be obtained using Lagrange multipliers.
Vexler and Gurevich (2010), as well as Gurevich and Vexler (2011), utilized the main idea of the classical EL methodology to approximate parametric likelihood ratios. The authors proposed a nonparametric approach based on approximate density functions. In this chapter, we derive the EL ratio test for the IG distribution, using the density-based distribution-free likelihood approach of Vexler and Gurevich (2010). Following Mudholkar and Tian (2002), we transform the observations and present the likelihood function under the alternative hypothesis in the form prod_{i=1}^n f(Y_(i)), where f is a density function and Y_(i) is the i-th order statistic based on the transformed observations. Values of f(Y_(i)) can be estimated by maximizing prod f(Y_(i)) given an empirical version of the constraint int f(u) du = 1. The nonparametric likelihood method mentioned above is then utilized for the purpose of developing our goodness-of-fit test statistic. Since the proposed test statistic approximates the most powerful parametric likelihood ratio, the density-based EL ratio test is shown to have very efficient characteristics. We also demonstrate that the proposed test statistic improves upon the decision rule of Mudholkar and Tian (2002) and maintains good type I error control for finite samples.
In Section 3.2, we create the EL test based on densities for the IG. Here, theoretical
propositions depict properties of the test. A Monte Carlo study of the power of the test is
presented in Section 3.3. Section 3.4 is devoted to real data examples employing the
proposed test. Section 3.5 consists of concluding remarks.
3.2 METHOD
In this section, we derive the EL ratio goodness-of-fit test based on approximating densities for the IG distribution under the null and alternative hypotheses. The proposed test is shown to be consistent as n goes to infinity. Consider the problem of testing the composite hypothesis that X_1, ..., X_n are distributed as IG with unknown parameters mu and lambda.

Mudholkar and Tian (2002) presented an entropy characterization property of the IG family. The authors constructed a goodness-of-fit test for the IG distribution based on a test statistic involving the sample entropy. Their test statistic contains an integer parameter with unknown optimal values. The power of the entropy-based test statistic is strongly dependent on values of the unknown parameter. Mudholkar et al. (2001) also developed a goodness-of-fit test for the IG model by using a characterization theorem for the IG distribution, for which it is known that the sample mean and sum_{i=1}^n (1/X_i - 1/Xbar) are independent when the random sample (X_1, ..., X_n) is from an IG population.
To approximate the optimal parametric likelihood ratio, we note the following issues. In the context of testing for an IG, we must assume that the null density function of X, say f_0(x | mu, lambda), is known up to the parameters mu and lambda, whereas under the alternative hypothesis, X has a completely unknown distribution. In this case, maximum likelihood estimation can be applied to evaluate the unknown parameters mu and lambda, approximating the H_0-likelihood function. However, the H_1-likelihood function, say prod_{i=1}^n f(Y_i), should be approximated nonparametrically, where Y_1, ..., Y_n denote transformed observations. (The transformation used here was proposed by Mudholkar and Tian (2002) in the context of the entropy-based test for the IG.) Towards this end, we propose to apply a density-based EL technique (e.g., Vexler and Gurevich, 2010); according to this methodology, which is based on densities rather than the classical maximum EL methodology based on the empirical distribution function, we derive an approximation of the parametric likelihood. In the following section, we present the method for approximating the H_1-parametric likelihood function with its empirical counterpart.
3.2.1 DENSITY-BASED EL RATIO GOODNESS-OF-FIT TEST
The distribution-free EL ratio tests are comparable asymptotically with the powerful parametric likelihood ratio tests over a variety of statistical inference problems. In Section 3.1, we outlined the classical EL approach, which has been dealt with extensively in the literature (e.g., Lazar, 2003; Qin and Lawless, 1994; Owen, 2001; Yu et al., 2010; Vexler and Gurevich, 2010). In accordance with the EL literature, the EL approach is a cumulative-distribution-based method. However, the optimal parametric likelihood ratio tests are density-based. Hence, some modifications are necessary for applying this method to the problem at hand.

Utilizing the main idea of the classical EL technique, density-based EL ratio tests were derived to provide efficient tests for normality and uniformity (Vexler and Gurevich, 2010). Following the maximum EL methodology, we can derive values of f(Y_(i)), i = 1, ..., n, that maximize prod_{i=1}^n f(Y_(i)) and satisfy the empirical constraints under the alternative hypothesis. Obviously, values of f should be restricted by the equation int f(u) du = 1. Thus, we need an empirical form of the constraint int f(u) du = 1. This empirical constraint can be obtained by the following lemma.
Lemma 3.1. Let f(u) be a density function with corresponding distribution function F(u), and let Y_(1) <= ... <= Y_(n) be the order statistics of a sample from f. Then

sum_{i=1}^n {F(Y_(i+m)) - F(Y_(i-m))} = m{F(Y_(n)) - F(Y_(1))} + sum_{k=n-m+1}^n F(Y_(k)) - sum_{k=1}^m F(Y_(k)),

where Y_(i) = Y_(1), if i < 1, and Y_(i) = Y_(n), if i > n.

Proof. We outline the proof in Appendix 2.1.

By virtue of Lemma 3.1, since F(Y_(k)) estimates k/(n + 1), it is obvious that

sum_{i=1}^n {F(Y_(i+m)) - F(Y_(i-m))} = 2m (1 + o_p(1)),

provided that m/n goes to 0 as n goes to infinity; that is, using the empirical approximation to the remainder term in Lemma 3.1, the sum above behaves as 2m. For simplicity, by applying the approximate analog of the mean value integration theorem, we can write

F(Y_(i+m)) - F(Y_(i-m)) = int from Y_(i-m) to Y_(i+m) of f(u) du ~ f(Y_(i)) (Y_(i+m) - Y_(i-m)).   (3.1)

Therefore, by virtue of (3.1), the empirical constraint under the alternative hypothesis is given by

(2m)^{-1} sum_{i=1}^n f(Y_(i)) (Y_(i+m) - Y_(i-m)) = 1.   (3.2)

Consequently, under the empirical constraint (3.2), the Lagrangian function of the log EL is

sum_{i=1}^n log f(Y_(i)) + eta [1 - (2m)^{-1} sum_{i=1}^n f(Y_(i)) (Y_(i+m) - Y_(i-m))],   (3.3)

where eta is a Lagrange multiplier. Values of f(Y_(i)) maximizing equation (3.3) given the constraint equation (3.2) satisfy the equation

1 / f(Y_(i)) = eta (Y_(i+m) - Y_(i-m)) / (2m).   (3.4)

Then, multiplying by f(Y_(i)) and taking the summation over i, we conclude, using equation (3.2), that eta = n. Finally, by equation (3.4), the target values of f(Y_(i)) have the form of

f~(Y_(i)) = 2m / {n (Y_(i+m) - Y_(i-m))},   (3.5)

where Y_(i) = Y_(1), if i < 1, and Y_(i) = Y_(n), if i > n. Thus, using the maximum EL method, the likelihood ratio test statistic can be constructed as

V_{n,m} = prod_{i=1}^n f~(Y_(i)) / prod_{i=1}^n f_0(X_i | mu, lambda).   (3.6)
The density f_0(x | mu, lambda) has the form of the IG density defined in Section 3.1. When the parameters mu and lambda are unknown, the maximum likelihood estimators mu^ and lambda^ can be applied to equation (3.6). Now the test statistic can be written as

V^_{n,m} = prod_{i=1}^n [2m / {n (Y_(i+m) - Y_(i-m))}] / prod_{i=1}^n f_0(X_i | mu^, lambda^).   (3.7)

This test statistic is equivalent to the entropy-based test statistic that was proposed by Mudholkar and Tian (2002), who arrived at it in a different manner via an entropy-based consideration (for details, see Appendix 2.2). The step-by-step derivation of the EL-based method given above demonstrates how the test statistic (3.7) is an approximation to the optimal likelihood ratio. Thus, we directly expect that a test based on (3.7) will provide highly efficient characteristics. The distribution of the test statistic depends strongly on values of the integer parameter m. To efficiently execute the test based on sample entropy, the optimal values of m should be evaluated. In accordance with Mudholkar and Tian (2002), these optimal values of m can be presented using information regarding the alternative distribution. We take this one step further and can improve upon the test statistic (3.7) in the context of eliminating the dependence on the integer parameter m, reconsidering the test construction with respect to the EL concept. The constraint (3.1) is taken into account in order to derive the EL ratio test based on densities.
However, the values f~(Y_(i)) of (3.5) do not automatically respect Lemma 3.1 for all integer m: if the estimated values of f are too large, the sum in Lemma 3.1 would exceed its admissible bound, which is inadmissible. Thus, we restrict the values of f(Y_(i)) so that the empirical constraint (3.2) holds for all considered m. Because the constraint (3.2) approximates equation (3.1), if the values of f~(Y_(i)) obtained with some integer m satisfy this restriction, we can expect that the values obtained with neighboring integers are subject to the restriction too. Thus, combining these restrictions, written as equations (3.8) and (3.9), concludes the approximation to the H_1-likelihood, which can be defined as the minimum of prod_{i=1}^n f~(Y_(i)) over the admissible values of m.
The occurrence of the minimum over m in the approximation is strongly justified in the proof of the proposition that shows the asymptotic consistency of the proposed test; this proposition will be introduced below. Since the remainder term in the equation below (3.1) should vanish as n goes to infinity, we also require an upper bound on m of the form n^{1-delta}, delta in (0, 1). This condition on m is also mentioned in the literature with respect to proving the consistency of entropy-based tests (e.g., Vasicek, 1976; Tusnady, 1977; Vexler and Gurevich, 2010). Thus, we propose the test statistic in the form of

log T_n = min over 1 <= m < n^{1-delta} of [ sum_{i=1}^n log(2m / {n (Y_(i+m) - Y_(i-m))}) - sum_{i=1}^n log f_0(X_i | mu^, lambda^) ],   (3.10)

where delta in (0, 1) is fixed. (Section 3.3 shows that the power demonstrated by the proposed test is relatively the same for different values of delta in (0, 1).) Finally, we introduce the test for the IG distribution having the form: reject the null IG hypothesis if log T_n > C, where C is a test threshold. The next proposition depicts the asymptotic consistency of the test.
Proposition 3.1. Under H_0, n^{-1} log T_n converges to 0 in probability as n goes to infinity, while, under H_1, n^{-1} log T_n converges in probability to a positive constant of Kullback-Leibler type, based on the expected log ratio of the true density to the hypothesized IG density.

Proof. See Appendix 2.3.

The proof of Proposition 3.1 is based on Proposition 2.2 of Vexler and Gurevich (2010), where the minimum over m in the test statistic is formally justified.
Proposition 3.1 demonstrates that, under the alternative hypothesis, log T_n exceeds any fixed critical value C_alpha, chosen to satisfy the type I error requirement, with probability tending to one as n goes to infinity. Therefore, the proposed test is consistent (i.e., it has asymptotic power one).
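For illustration, a minimal sketch of the statistic in (3.10), as reconstructed above, is given below (the sketch is ours): Vasicek-type spacings approximate the H_1 log-likelihood, the IG log-likelihood is evaluated at the standard MLEs mu^ = Xbar and lambda^ = n / sum(1/X_i - 1/Xbar), and the minimum is taken over 1 <= m < n^{1-delta}. The transformation step of Mudholkar and Tian (2002) is omitted for brevity, so this illustrates the construction rather than reproducing the proposed test exactly; delta = 0.5 is an assumption.

import numpy as np

def ig_logpdf(x, mu, lam):
    """log density of IG(mu, lam)."""
    return (0.5 * (np.log(lam) - np.log(2 * np.pi) - 3 * np.log(x))
            - lam * (x - mu) ** 2 / (2 * mu ** 2 * x))

def el_ig_log_stat(x, delta=0.5):
    """log T_n of (3.10): min over m of spacings log-likelihood minus IG log-likelihood."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    mu = x.mean()
    lam = n / np.sum(1.0 / x - 1.0 / mu)      # standard IG MLEs
    null_ll = ig_logpdf(x, mu, lam).sum()
    idx = np.arange(n)
    best = np.inf
    for m in range(1, int(np.ceil(n ** (1 - delta)))):
        hi = x[np.minimum(idx + m, n - 1)]    # boundary convention Y_(i) = Y_(n), i > n
        lo = x[np.maximum(idx - m, 0)]        # boundary convention Y_(i) = Y_(1), i < 1
        best = min(best, np.sum(np.log(2 * m / (n * (hi - lo)))) - null_ll)
    return best

rng = np.random.default_rng(3)
print(el_ig_log_stat(rng.wald(1.0, 1.0, size=50)))   # numpy's wald is IG(mean, scale)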
3.2.2 NULL DISTRIBUTION
Certain lines of research have developed around the asymptotic distribution problems involving Vasicek's entropy estimator and different entropy-based statistics (e.g., Cressie, 1976; Dudewicz and van der Meulen, 1981; Hall, 1984, 1986; Khashimov, 1989; Van Es, 1992). It is generally recognized that the asymptotic distribution of the test statistic log T_n, which includes estimates of the nuisance parameters mu and lambda in the IG case, is analytically difficult. Practical applications motivated us to consider critical values for fixed sample sizes. Thus, in this chapter, we tabulate the critical values for fixed sample sizes using a broad set of Monte Carlo simulations. The asymptotic results presented in this section assist in controlling p-values of the proposed test for large samples.

To tabulate the Monte Carlo percentiles of the null distribution, we conducted the following Monte Carlo experiment. The experiment draws 50,000 replicate samples of the test statistic log T_n at each sample size n. In this experiment, data were generated from the IG(1,1) distribution. The generated values of the test statistic were used to determine the critical values C_alpha of the null distribution of log T_n at the significance level alpha. The results of the experiment are presented in Table 3.1.
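The tabulation loop itself is only a few lines; the sketch below (ours) reuses `el_ig_log_stat` from the sketch in Section 3.2 and, as an illustration, uses 2,000 replications instead of 50,000 to keep the run short.

import numpy as np
# assumes el_ig_log_stat() from the sketch in Section 3.2 is in scope

rng = np.random.default_rng(42)
for n in (10, 20, 50):
    stats = np.array([el_ig_log_stat(rng.wald(1.0, 1.0, size=n))
                      for _ in range(2000)])
    # the critical value C_alpha is the upper alpha-quantile of the null statistic
    for alpha in (0.01, 0.05, 0.10):
        print(n, alpha, round(float(np.quantile(stats, 1 - alpha)), 4))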
In order to evaluate the accuracy of the obtained critical values, we depict in Table 3.2 the estimated type I error control of the test statistic log T_n at alpha = 0.05. Towards this end, we generated random samples from IG populations. A selection of the results is displayed in Table 3.2. It can be seen that the empirical percentiles given in Table 3.1 provide excellent type I error control and thus can be confidently recommended for use in practice.
Table 3.1: The critical values¹, C_alpha(n), of the proposed test at the significance level alpha, i.e., P{log T_n > C_alpha(n)} = alpha.

Sample size\alpha   0.01     0.02     0.03     0.04     0.05     0.06     0.07     0.08
10                  7.1106   6.5853   6.2781   6.0392   5.8605   5.7016   5.5559   5.4365
15                  8.4823   7.8319   7.4314   7.1291   6.9042   6.7207   6.5589   6.4104
20                  9.1504   8.4316   7.9990   7.6968   7.4769   7.2840   7.1336   6.9831
25                  9.6813   8.9327   8.5005   8.1984   7.9547   7.7582   7.5782   7.4365
30                  10.3156  9.5426   9.1027   8.7600   8.4807   8.2622   8.0810   7.9190
35                  10.7687  9.9797   9.5197   9.1735   8.9102   8.6944   8.4981   8.3395
40                  11.1102  10.3427  9.8696   9.5152   9.2475   9.0397   8.8636   8.7066
45                  11.4799  10.7279  10.2458  9.8953   9.6262   9.3990   9.2114   9.0504
50                  11.7863  11.0235  10.5751  10.2299  9.9838   9.7577   9.5647   9.3945
55                  12.2253  11.4291  10.9412  10.5814  10.3017  10.0863  9.8780   9.6964
60                  12.4767  11.6485  11.1331  10.7850  10.5217  10.3099  10.1081  9.9422
65                  12.7206  11.8821  11.4015  11.0314  10.7672  10.5457  10.3398  10.1695
70                  13.0735  12.2269  11.7186  11.3325  11.0447  10.7979  10.5989  10.4137
75                  13.3515  12.5373  12.0057  11.6373  11.3509  11.1066  10.9014  10.7156
80                  13.6866  12.8380  12.3101  11.9135  11.5901  11.3451  11.1249  10.9333
85                  13.8912  12.9608  12.5013  12.1090  11.8101  11.5501  11.3309  11.1470
90                  14.1715  13.2555  12.7261  12.3453  12.0278  11.7836  11.5773  11.3801
95                  14.3226  13.3931  12.8802  12.5106  12.2058  11.9523  11.7369  11.5396
100                 14.5004  13.5990  13.0714  12.6594  12.3785  12.1138  11.8898  11.6932
120                 15.3873  14.4181  13.8550  13.4787  13.1542  12.8996  12.6652  12.4551
150                 16.3815  15.3858  14.8241  14.3910  14.0375  13.7557  13.5141  13.2955
200                 17.6377  16.7383  16.1220  15.6491  15.3042  15.0077  14.7280  14.5083
250                 19.0457  18.0246  17.3325  16.8719  16.4938  16.1581  15.8500  15.5890
300                 20.0134  18.8813  18.1871  17.7225  17.3100  16.9705  16.6827  16.4113

Table 3.1: (cont'd)

Sample size\alpha   0.09     0.1      0.2      0.25     0.3      0.5
10                  5.3403   5.2464   4.5818   4.3537   4.1537   3.5341
15                  6.2808   6.1572   5.3583   5.0968   4.8658   4.1954
20                  6.8429   6.7233   5.8916   5.6168   5.3909   4.6884
25                  7.3021   7.1786   6.3610   6.0822   5.8399   5.0826
30                  7.7906   7.6603   6.8005   6.5054   6.2528   5.4606
35                  8.2009   8.0782   7.1993   6.8813   6.6158   5.7813
40                  8.5511   8.4212   7.4996   7.1925   6.9176   6.0574
45                  8.9051   8.7682   7.8237   7.4973   7.2154   6.3309
50                  9.2391   9.0930   8.1252   7.7896   7.5094   6.5824
55                  9.5344   9.3925   8.3903   8.0455   7.7480   6.7965
60                  9.7673   9.6370   8.6481   8.3002   8.0015   7.0258
65                  10.0041  9.8645   8.8596   8.5034   8.1955   7.1992
70                  10.2722  10.1242  9.1086   8.7367   8.4193   7.3990
75                  10.5448  10.3932  9.3044   8.9402   8.6249   7.5767
80                  10.7531  10.6066  9.5426   9.1530   8.8196   7.7530
85                  10.9747  10.8349  9.7473   9.3604   9.0177   7.8937
90                  11.2128  11.0538  9.9233   9.5282   9.1813   8.0560
95                  11.3760  11.2151  10.0996  9.7067   9.3485   8.2065
100                 11.5174  11.3670  10.2358  9.8300   9.4805   8.3179
120                 12.2712  12.0964  10.8948  10.4702  10.0877  8.8112
150                 13.1050  12.9199  11.6778  11.2222  10.8089  9.4228
200                 14.3100  14.1224  12.7396  12.2268  11.7565  10.1912
250                 15.3619  15.1536  13.6324  13.0803  12.5864  10.8283
300                 16.1683  15.9364  14.3229  13.7106  13.1711  11.2812

¹ Simulation estimates based on 50,000 replications of data at each n and alpha.
Table 3.2: Type I error control¹ of the proposed log T_n test; alpha = 0.05.

Sample size  IG(1,0.5)  IG(1,2)  IG(1,4)  IG(1,8)
10           0.0445     0.0532   0.0601   0.0654
20           0.0424     0.0503   0.0541   0.0573
30           0.0486     0.0534   0.0524   0.0553
40           0.0504     0.0514   0.0529   0.0585
50           0.0446     0.0527   0.0520   0.0489

¹ Simulation estimates based on 10,000 replications.
In this chapter, we also present an asymptotic result that ensures that the appropriate significance level of our test is maintained as n goes to infinity. The following proposition provides an asymptotic upper bound on the significance level of the test based on the statistic (3.10), expressed in terms of the standard normal distribution function and constants a and b.

Proposition 3.2. The significance level of the proposed test satisfies, asymptotically (n goes to infinity), an explicit upper-bound inequality (stated formally in Appendix 2.4), where Phi is the standard normal distribution function, gamma = 0.5772... is the Euler constant, R_m = sum_{j=1}^m 1/j, and [d] is the integer part of d.

Proof. See Appendix 2.4.

Note that we chose the constants a and b to minimize a distance between the asymptotic distribution in the bound and the distribution of the statistic defined by equation (3.7). Regarding the constants a and b, we suggest applying the values a = 29.42109 and b = -29.87852. These values were obtained empirically based on a broad Monte Carlo study. For example, when n = 100, by virtue of Proposition 3.2, the recommended asymptotic critical value is 12.44114 at alpha = 0.05, which corresponds to an actual type I error of 0.0507 obtained using IG(1,1) random samples. Table 3.3 represents the actual type I errors of the proposed test when the critical values were chosen with respect to Proposition 3.2 for alpha = 0.05. The results presented in Table 3.3 are based on generated samples from IG(1,1).
Table 3.3: The Monte Carlo type I errors of the proposed test with critical values obtained by Proposition 3.2 to guarantee alpha = 0.05.

Sample size   Monte Carlo type I error
100           0.0507
150           0.0621
200           0.0829
250           0.0309
300           0.0459
500           0.0334

Table 3.3 demonstrates that the upper bound obtained by Proposition 3.2 can be used in practice.
3.3 POWER PROPERTIES
The proposed goodness-of-fit test statistic approximates nonparametrically the optimal likelihood ratio. Thus, we expected our test statistic (3.10) to provide a powerful test as compared to its competitors. The following Monte Carlo study was conducted to investigate the power properties of the proposed IG goodness-of-fit test for moderate sample sizes. In this study, 10,000 repetitions of data with sizes n = 10, 20, 30 were generated from each of the following populations: exponential with mean 1, uniform [0, 1], Weibull (1, 2) with scale parameter 1 and shape parameter 2, and lognormal (0.5, 1) with mean e and standard deviation e(e - 1)^{1/2}. The power study follows a line similar to that found in Mudholkar and Tian (2002). Table 3.4 shows the estimated power of our EL goodness-of-fit test (3.10) as compared to other goodness-of-fit tests presented by Mudholkar and Tian (2002), Mudholkar et al. (2001), Edgeman et al. (1988) and Edgeman (1990). Because of the problem of choosing an optimal m, we represent the test proposed by Mudholkar and Tian (2002) for different values of m, as seen in Table 3.4.
Table 3.4: An empirical power comparison¹ for the density-based empirical likelihood ratio test at alpha = 0.05.

Distribution    n    Kn,m=2   Kn,m=3   Kn,m=4   Kn,m=5   Z        KS1     KS2     Proposed
Exponential     10   0.2075   0.1991   0.1499   0.0643   0.0206   0.262   0.280   0.2060
                20   0.4668   0.4675²  0.4347   0.3781   0.0226   0.518   0.525   0.4636
                30   0.6347   0.6597²  0.6563   0.6169   0.0328   0.654   0.668   0.6446
Uniform(0,1)    10   0.4768²  0.4759   0.3987   0.1857   0.1814   0.342   0.356   0.5078
                20   0.8682   0.8802²  0.8645   0.8407   0.4260   0.616   0.630   0.8826
                30   0.9687   0.9789²  0.9833   0.9772   0.5764   0.776   0.782   0.9815
Weibull(1,2)    10   0.1251²  0.1256   0.1064   0.0428   0.0721   0.060   0.074   0.1354
                20   0.2536   0.2707²  0.2332   0.2105   0.1611   0.128   0.080   0.2565
                30   0.3564   0.3895²  0.4028   0.3636   0.2770   0.168   0.111   0.3847
LogNor(0.5,1)   10   0.0467²  0.0464   0.0437   0.0318   0.0407   0.048   0.055   0.0460
                20   0.0588   0.0535²  0.0441   0.0351   0.0420   0.068   0.080   0.0541
                30   0.0707²  0.0679   0.0591   0.0517   0.0397   0.082   0.111   0.0547

¹ Simulation estimates based on 10,000 replications. ² Values correspond to the optimal m found empirically by Mudholkar and Tian (2002) given known alternatives. Kn,m: entropy-based test (Mudholkar and Tian, 2002). Z: independence characterization test (Mudholkar et al., 2001). KS1: modified Kolmogorov-Smirnov test (Edgeman et al., 1988). KS2: Kolmogorov-Smirnov test using a transformation (Edgeman, 1990). Proposed: the density-based EL ratio test (3.10).
The Monte Carlo study demonstrates that the power of the EL goodness-of-fit test is superior to or about equal to that of the Mudholkar and Tian (2002) test with values of m that were selected by Mudholkar and Tian empirically, given known alternatives (see
Remark 4.2 in Mudholkar and Tian, 2002). Table 3.4 depicts how the selection of m
strongly affects the powers of the test statistic proposed by Mudholkar and Tian (2002).
The wrong choice of m can lead to a 50% reduction in the power of the entropy-based
test by Mudholkar and Tian (2002). We investigate the power in the next section utilizing
the real data examples.
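For illustration only (this sketch is ours), the power entries of Table 3.4 are rejection frequencies under non-IG alternatives at the tabulated critical value; the sketch reuses `el_ig_log_stat` from the sketch in Section 3.2, uses the Table 3.1 critical value 5.8605 for n = 10 at alpha = 0.05, and runs 2,000 replications as an assumption to keep it short.

import numpy as np
# assumes el_ig_log_stat() from the sketch in Section 3.2 is in scope

rng = np.random.default_rng(5)
n, c_crit = 10, 5.8605                # critical value from Table 3.1 (alpha = 0.05)
alternatives = {
    "exponential":  lambda: rng.exponential(1.0, n),
    "uniform(0,1)": lambda: rng.uniform(0.0, 1.0, n),
    "weibull(1,2)": lambda: rng.weibull(2.0, n),      # shape 2, scale 1
    "lognormal":    lambda: rng.lognormal(0.5, 1.0, n),
}
for name, draw in alternatives.items():
    power = np.mean([el_ig_log_stat(draw()) > c_crit for _ in range(2000)])
    print(name, round(float(power), 4))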
3.4 DATA EXAMPLES
In this section, we use the proposed EL goodness-of-fit test described above and the test
proposed by Mudholkar and Tian (2002) to evaluate the appropriateness of the IG
distribution to data from four different studies that were analyzed in Folks and Chhikara
(1978). The data sets were composed of the shelf life (days) of a food product (say, Dataset 1),
fracture toughness of MIG (metal inert gas) welds (say, Dataset 2), precipitation (inches)
from Jug Bridge, Maryland (say, Dataset 3), and runoff amounts at Jug Bridge, Maryland
(say, Dataset 4). Table 3.5 presents the p-values obtained via the EL goodness-of-fit test
and the entropy-based test by Mudholkar and Tian (2002) for our example data.
Table 3.5: Tests for the IG based on the data introduced in Folks and Chhikara (1978).

Dataset  Sample size  Test       p-value
1        26           Proposed   0.0063
                      K26,m=2    0.0061
                      K26,m=3    0.0058
                      K26,m=4    0.0090
                      K26,m=5    0.0041
2        19           Proposed   0.1791
                      K19,m=2    0.1565
                      K19,m=3    0.1516
                      K19,m=4    0.2663
                      K19,m=5    0.3379
3        25           Proposed   0.0099
                      K25,m=2    0.0197
                      K25,m=3    0.0159
                      K25,m=4    0.0098
                      K25,m=5    0.0063
4        25           Proposed   0.9271
                      K25,m=2    0.9393
                      K25,m=3    0.9538
                      K25,m=4    0.8738
                      K25,m=5    0.8001
Based on the results from Table 3.5, our test and the test by Mudholkar and Tian (2002) provide identical conclusions about the goodness-of-fit of the IG distribution at alpha = 0.05. The IG distribution is rejected for Datasets 1 and 3. The EL-based goodness-of-fit test indicates that Datasets 2 and 4 are well described by an IG distribution. Note that Folks and Chhikara (1978) did not reject the IG distribution for Dataset 1 using the Kolmogorov-Smirnov test. In this case, both the proposed test and the entropy-based test proposed by Mudholkar and Tian (2002) rejected the assumption that Dataset 1 follows an IG distribution. If the rejection level is set at alpha = 0.01, the Mudholkar and Tian (2002) test provides different decisions depending on values of m for Dataset 3. The density-based EL ratio goodness-of-fit test demonstrates high sensitivity in these data examples given non-IG alternatives.
In addition to our illustration, a bootstrap type study was conducted to examine the proposed EL-based goodness-of-fit test and the test by Mudholkar and Tian (2002). From each of Datasets 1, 3 and 4, two samples with the sizes 15 and 20 were randomly selected, respectively, in order to be tested for an IG fit at a 5% level of significance. Two samples of sizes 10 and 15 were randomly selected from Dataset 2, respectively, to be tested for an IG fit at a 5% level of significance. We repeated this strategy 10,000 times, calculating the frequencies of the events {log T_10 > 5.8605}, {log T_15 > 6.9042} and {log T_20 > 7.4769} (the critical values were chosen from Table 3.1 at alpha = 0.05). In a similar manner to the structure above, we examined the test proposed by Mudholkar and Tian (2002). For Dataset 2, the proposed test rejected the IG assumption in 2625 (the case of n = 10) and 4059 (the case of n = 15) events, while the test proposed by Mudholkar and Tian (2002) rejected the IG assumption in 2583 and 3762 events, respectively. Thus,
this indicates that our proposed method is more sensitive as compared to the Mudholkar and Tian test. The results of the bootstrap type studies are presented in Table 3.6.
Table 3.6: Bootstrap type proportions of rejection¹ of the tests for the IG at a 5% level of significance.

Dataset    Sample size  Kn,m=2  Kn,m=3  Kn,m=4  Kn,m=5  Proposed
Dataset 1  15           0.7199  0.6655  0.6799  0.7024  0.6796
           20           0.9023  0.8574  0.8468  0.8662  0.8499
Dataset 2  10           0.2961  0.2583  0.2784  0.2226  0.2625
           15           0.5649  0.3762  0.3302  0.3422  0.4059
Dataset 3  15           0.6880  0.6486  0.5989  0.6507  0.6389
           20           0.8422  0.8168  0.7961  0.7893  0.8082
Dataset 4  15           0.1898  0.0997  0.0728  0.0767  0.0938
           20           0.2820  0.1318  0.0889  0.0761  0.0948

¹ Bootstrap type estimates based on 10,000 replications.

Table 3.6 confirms the practical applicability of the proposed test.
3.5 CONCLUSIONS

In this chapter, we have developed the EL ratio methodology based on approximating
densities nonparametrically under the alternative hypothesis in order to test for the IG
distributions. We have proven that the proposed density-based EL ratio test has the
structure that is similar to that of the entropy-based goodness-of-fit test for IG presented
by Mudholkar and Tian (2002). Since the goodness-of-fit test (3.10) is a nonparametric
approximation to the traditional likelihood ratio test, we anticipated good power
characteristics. Utilizing a broad Monte Carlo study, we showed that the proposed
density-based EL ratio test is powerful when compared with the known goodness-of-fit
tests. Also note that the test presented by Mudholkar and Tian (2002) depends on values
of an integer parameter m. The optimal values of the Mudholkar and Tian test are
generally only known when information regarding parametric forms for the alternative
distributions is available. Utilizing the EL concept, we eliminate this restrictive
dependence on the parameter m. Theoretical support for the proposed EL ratio test is
obtained by proving consistency of the new test and an asymptotic proposition regarding
the null distribution of the density-based EL ratio test statistic. The data examples
demonstrated that the proposed test is reasonable when applied to real data. In general,
the methodology presented in this chapter can be easily adapted to construct different
nonparametric tests, approximating the optimal likelihood ratios via the EL concept.
CHAPTER 4
AN EXTENSIVE POWER EVALUATION OF A NOVEL TWO-SAMPLE DENSITY-BASED EMPIRICAL LIKELIHOOD RATIO TEST FOR PAIRED DATA

4.1 INTRODUCTION
Group-comparison methods have been well addressed in clinical field trials and different biostatistical applications. In many cases, investigators execute a design of experiment that yields pre-treatment and post-treatment measurements in order to evaluate different treatment effects. Therefore, it is desirable to have an efficient statistical tool that can be utilized to compare treatments in studies based on paired data related to different populations.
To operate with these two independent sets of paired observations, classical procedures can be applied to the two separate sets of paired values, including the independent two-sample t-test and Wilcoxon's test (the Wilcoxon rank sum test). The statistical literature has extensively pointed out several issues related to these classical tests. For example, Albers et al. (2001) indicated that in the case of nonconstant shift alternatives, Wilcoxon's test may break down completely. The independent two-sample t-test is commonly known to have very good properties based on data that are close to normally distributed, but this test cannot be recommended for observations from strongly skewed distributions (see, e.g., Vexler et al., 2009). Note that the type I error of the t-test based on non-normally distributed data can be controlled only for large sample sizes, using the asymptotic distribution of the t-test statistic. The t-test, which is mainly used to detect a change of mean, may be inadequate to detect changes of other measures of location, e.g., the medians. The two-sample Kolmogorov-Smirnov test, in several situations, is recognized to show relatively low power against the other classical tests. These classical tests were developed to solve the general problem of comparing the distributions of two populations.
This chapter aims to provide a powerful test to detect differences between the distributions of two therapy groups using the paired data. Towards this end, we propose to adapt the test introduced by Gurevich and Vexler (2011) and to reconstruct the form of the Gurevich and Vexler test statistic based on paired data rather than independent observations. In a similar manner to the test presented by Gurevich and Vexler (2011), we approximate nonparametrically the most powerful Neyman-Pearson test statistics to solve the stated problem. By virtue of the Neyman-Pearson lemma, the likelihood ratio tests are the most powerful decision rules when the forms of the distribution functions are known under the null and alternative hypotheses, respectively (e.g., Lehmann and Romano, 2005; Vexler and Wu, 2009; Vexler et al., 2010). However, in some cases, the required parametric forms of distributions can be unavailable. Thus, we provide a distribution-free alternative approach that approximates the optimal parametric likelihood ratio test, utilizing the density-based empirical likelihood methodology.
The empirical likelihood (EL) methodology has been discussed considerably as one of the principal nonparametric techniques in the statistical literature (e.g., Qin and Lawless, 1994; Owen, 1988, 1990, 2001). The EL ratio approach provides researchers with methods that are asymptotically close to the parametric likelihood techniques without parametric assumptions (e.g., Lazar and Mykland, 1998). This approach is based on the likelihood in the form prod_{i=1}^n {F(X_i) - F(X_i-)}, where X_1, ..., X_n are independent and identically distributed observations and F is a distribution function. This implies that the EL method is a distribution-function-based technique. Following the EL methodology, approximate values of the increments F(X_i) - F(X_i-) should be found to maximize the EL, given empirical constraints (e.g., Owen, 1988, 1990, 2001). For example, an assumption regarding the mean of the data declares the corresponding empirical constraint on a weighted sum of the observations. Nevertheless, according to the Neyman-Pearson lemma, the most powerful test statistic for testing a hypothesis regarding the data distribution is the likelihood ratio prod_{i=1}^n f_1(X_i)/f_0(X_i), where F_0 and F_1 are distribution functions with corresponding density functions f_0 and f_1, respectively. The density-based structure of the likelihood ratios is strongly involved in the proof of the Neyman-Pearson lemma. To create a nonparametric test that approximates the Neyman-Pearson test statistic, Vexler and Gurevich (2010) as well as Gurevich and Vexler (2011) followed the principal idea of the classical EL methodology, applying density functions instead of empirical distribution functions to construct empirical likelihoods. The structure of the density-based EL has the form prod_{i=1}^n f(X_(i)), where f is a density function of the observations and X_(1) <= ... <= X_(n) are the order statistics derived from X_1, ..., X_n. To obtain estimated values of f(X_(i)) maximizing prod f(X_(i)), we employ the maximum EL concept subject to an empirical constraint related to the rule int f(u) du = 1.
Recently, it has been shown that the one-sample density-based EL ratio tests for goodness-of-fit introduced by Vexler and Gurevich (2010) can be applied efficiently to construct goodness-of-fit tests. Gurevich and Vexler (2011) and Vexler and Yu (2011) extended the density-based EL ratio tests for goodness-of-fit to tests that can be applied to two-sample problems. In this chapter, we extend and modify the Gurevich and Vexler (2011) test to compare two independent samples, where each sample is based on paired data instead of independent observations. The contributions of this chapter lie mainly in three directions. First, the proposed test is constructed to detect between-group differences with respect to treatment effects based on paired data. Note that the two-sample tests mentioned in the literature (e.g., Gurevich and Vexler, 2011) can be shown
to have good characteristics based on independent observations; however, they might
have a relatively weak efficiency to compare treatment effects. Second, we relax the
bounds of the test parameters and demonstrate the robustness of the proposed method
with respect to values of test parameters. The density-based EL literature states that
properties of density-based EL ratio procedures do not depend significantly on values of
the parameters. Thus, one of the objectives of this chapter is to confirm this fact by
evaluating the proposed test for different values of parameters via an extensive Monte
Carlo study. Lastly, we analyze powers of the proposed density-based EL ratio procedure
based on paired data in an extensive Monte Carlo study, where we utilize various
scenarios of distributions and sample sizes. It should be noted that in nonparametric
settings, commonly there are no statistical tools that can be recommended uniformly for
all possible scenarios. Therefore, in the context of applied statistics, extensive empirical
evaluations of the powers of the proposed nonparametric test and the relevant classical
nonparametric procedures would be very helpful to provide investigators with
information regarding the efficiency of the proposed nonparametric technique as
compared to standard procedures.
The applicability of the proposed test in practice is illustrated via the motivating
example based on a dataset from the Center for Children and Families, University at
Buffalo, the State University of New York. The purpose of the study is to investigate the
feasibility and efficacy of a group-based therapy program for children with Attention-Deficit/Hyperactivity Disorder (ADHD) and Severe Mood Dysregulation (SMD). ADHD is a commonly diagnosed psychiatric disorder in children (e.g., Biederman, 1998; Nair et al.,
SMD, a category recently created by Leibenluft's laboratory in the National Institute of Mental Health's intramural program, refers to children with hyperarousal, an abnormal baseline mood, and increased reactivity to negative emotional stimuli (e.g., Brotman et al., 2006; Carlson, 2007; Leibenluft et al., 2003; Waxmonsky et al., 2008). Both ADHD and SMD can significantly impair children's behavioral and psychophysiological
performance; therefore, it is critical to develop an effective treatment for children with


ADHD and SMD. In this research program, study subjects were randomized between two
treatment groups: an experimental 11-week group therapy program (therapy group) and community psychosocial treatment (control group). The changes in Children's Depression Rating Scale Revised total score (CDRS-Rts) observations at week 0
(baseline) and week 11 (endpoint) are utilized as our outcome measures (see, e.g.,
Poznanki et al., 1979, 1984). We compare these paired observations in each group to evaluate the treatment effects for ADHD and SMD in children using the proposed density-based EL ratio test. To evaluate the efficiency of the proposed density-based EL ratio test, we construct appropriate tests based on the classical procedures, namely the independent two-sample t-test, Wilcoxon's test, and the two-sample Kolmogorov-Smirnov test; a small sketch of these classical comparators applied to within-pair differences is given below.
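In our setting, each classical comparator reduces to a two-sample procedure applied to the within-pair differences. The sketch below is our own illustration with simulated scores (the group sizes and score distributions are placeholder assumptions, not the study data).

import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
# simulated CDRS-R total scores at week 0 and week 11 (illustrative numbers only)
pre_t, post_t = rng.normal(60, 10, 35), rng.normal(52, 10, 35)   # therapy group
pre_c, post_c = rng.normal(60, 10, 30), rng.normal(58, 10, 30)   # control group

w = post_t - pre_t        # within-pair differences, group 1
v = post_c - pre_c        # within-pair differences, group 2

print(stats.ttest_ind(w, v))    # independent two-sample t-test
print(stats.ranksums(w, v))     # Wilcoxon rank sum test
print(stats.ks_2samp(w, v))     # two-sample Kolmogorov-Smirnov test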
The rest of this chapter is organized as follows. In Section 4.2, we propose the two-sample density-based EL ratio test based on paired data to compare treatment effects in two groups. The asymptotic consistency of the proposed test is presented in Section 4.2.
Section 4.3 provides Monte Carlo comparisons between the proposed test and classical
procedures using various designs of data distributions. This section also confirms
experimentally the high efficiency of the density-based EL ratio technique utilizing
different scenarios of data distributions that have not been addressed in the EL literature.
The application of the proposed procedure to the treatment study of ADHD and SMD is
demonstrated in Section 4.4. In Section 4.5, we provide some concluding remarks.
4.2 METHOD
In this section, we extend and adapt the Gurevich and Vexler (2011) test to develop an efficient density-based EL ratio method for the two-sample testing problem in paired data settings. We begin by formalizing the testing problem. Suppose that we have independent paired observations (X_{1i}, X_{2i}), i = 1, ..., n_1, from group 1 with the sample size n_1 and (Y_{1j}, Y_{2j}), j = 1, ..., n_2, from group 2 with the sample size n_2. Note that classical one-sample tests for paired data, such as the paired t-test and the Wilcoxon signed rank test, consider within-pair differences. Consequently, we also consider the two-sample comparison based on the within-pair differences, denoted by W_i = X_{2i} - X_{1i}, i = 1, ..., n_1, and V_j = Y_{2j} - Y_{1j}, j = 1, ..., n_2. In this case, W_i and V_j are independent random variables with the distribution functions F_W(u) and F_V(u), respectively. Our interest lies in testing the hypothesis

H_0: F_W = F_V  against  H_1: F_W is not equal to F_V,   (4.1)

where F_W and F_V are assumed to be unknown distribution functions with corresponding density functions f_W and f_V, respectively. Note that, in the context of (4.1), we compare two samples based on pre- and post-treatment measurements. In the context of two-sample comparisons of data distributions, Gurevich and Vexler (2011) proposed to approximate optimal likelihood ratios using the EL concept. The test proposed by Gurevich and Vexler (2011) was not evaluated when paired data were utilized. We extend and modify this approach to be applied to the statement of problem (4.1). In accordance with the stated problem, the corresponding parametric likelihood ratio test statistic for (4.1) takes the form of
Lambda_{n_1,n_2} = prod_{i=1}^{n_1} f_W(W_(i)) prod_{j=1}^{n_2} f_V(V_(j)) / [ prod_{i=1}^{n_1} f_{H_0}(W_(i)) prod_{j=1}^{n_2} f_{H_0}(V_(j)) ],

where f_{H_0} denotes the common density of the within-pair differences under the null hypothesis, and W_(1) <= ... <= W_(n_1) and V_(1) <= ... <= V_(n_2) are the order statistics based on the observed samples (W_1, ..., W_{n_1}) and (V_1, ..., V_{n_2}), respectively. Utilizing the idea of the density-based EL method, we begin by estimating the values of f_W(W_(i)), maximizing prod_{i=1}^{n_1} f_W(W_(i)) given an empirical version of the constraint int f_W(u) du = 1. By virtue of Proposition 2.1 in Gurevich and Vexler (2011), the telescoping identity analogous to Lemma 3.1 holds for the sums sum_{i=1}^{n_1} {F_W(W_(i+m)) - F_W(W_(i-m))}, for all integer m, where W_(i) = W_(1) if i < 1 and W_(i) = W_(n_1) if i > n_1; hence these sums behave as 2m when m/n_1 goes to 0, as n_1 goes to infinity. The increments F_W(W_(i+m)) - F_W(W_(i-m)) can be estimated by applying the approximate analog of the integral mean value theorem. It turns out that

F_W(W_(i+m)) - F_W(W_(i-m)) ~ f_W(W_(i)) (W_(i+m) - W_(i-m)).
Using the empirical distribution function F_{n_2}(u) = n_2^{-1} sum_{j=1}^{n_2} I(V_j <= u) as an estimator of the distribution function corresponding to the differences under the null hypothesis, we present the empirical constraint in the form of

sum_{i=1}^{n_1} f_W(W_(i)) (W_(i+m) - W_(i-m)) = sum_{i=1}^{n_1} {F_{n_2}(W_(i+m)) - F_{n_2}(W_(i-m))}.   (4.2)

In this case, taking into account the empirical constraint (4.2), the Lagrange function of the log EL can be written as

sum_{i=1}^{n_1} log f_W(W_(i)) + lambda [ sum_{i=1}^{n_1} {F_{n_2}(W_(i+m)) - F_{n_2}(W_(i-m))} - sum_{i=1}^{n_1} f_W(W_(i)) (W_(i+m) - W_(i-m)) ],   (4.3)

where lambda is a Lagrange multiplier. Then the values of f_W(W_(i)) maximizing (4.3) given the constraint (4.2) take the form of

f~_W(W_(i)) = [ sum_{k=1}^{n_1} {F_{n_2}(W_(k+m)) - F_{n_2}(W_(k-m))} ] / [ n_1 (W_(i+m) - W_(i-m)) ],

with the boundary conventions stated above. As a result, the density-based EL approximation to the H_0-likelihood of the first sample is given by prod_{i=1}^{n_1} f~_W(W_(i)). It is clear that this approximation strongly depends on the integer parameter m, with unknown optimal values of m, making it difficult to implement effectively in practice. This kind of problem is related to known goodness-of-fit tests based on sample entropy (e.g., Vasicek, 1976). To improve the test with regard to the efficiency of the test performance in
practice, we need to adapt the test structure by eliminating the dependence on the integer parameter m. Applying arguments similar to those presented in the Appendix of Gurevich and Vexler (2011) and using the maximum EL concept, the modified test statistic can be written in the form of

min over n_1^delta <= m <= n_1^gamma of prod_{i=1}^{n_1} f~_W(W_(i)),   (4.4)

where 0 <= delta < gamma <= 1. The bounds n_1^delta and n_1^gamma are declared by Proposition 4.1, mentioned below, to present the consistency of the proposed test. Likewise, following the same density-based EL technique, we can obtain the density-based EL estimator of prod_{j=1}^{n_2} f_V(V_(j)), given by

min over n_2^delta <= r <= n_2^gamma of prod_{j=1}^{n_2} f~_V(V_(j)),   (4.5)

where f~_V(V_(j)) is defined in the same manner as f~_W(W_(i)), with the roles of the two samples interchanged. Finally, taking the logarithms of the equations (4.4) and (4.5) and combining them together yields our new density-based EL ratio test statistic,
log V_{n_1,n_2}(delta, gamma).   (4.6)

Since the test statistic log V_{n_1,n_2}(delta, gamma) is based on the approximation to the optimal parametric likelihood ratio, it would be expected that a test based on log V_{n_1,n_2}(delta, gamma) is very powerful. The decision rule developed for hypothesis testing (4.1) is to reject the null hypothesis if

log V_{n_1,n_2}(delta, gamma) > C_alpha,   (4.7)

where C_alpha is a critical value at the significance level alpha. Referring to Canner (1975), we arbitrarily define the empirical increments entering the statistic whenever a term turns out to be zero, so that the statistic remains well defined. The following proposition demonstrates the consistency of the test.
Proposition 4.1. Let f(u) be a density function with finite expectations of the quantities entering the definitions (4.4) and (4.5). Then, under H_0, (n_1 + n_2)^{-1} log V_{n_1,n_2}(delta, gamma) converges in probability to 0 as n_1, n_2 go to infinity, while, under H_1, the normalized statistic converges in probability to a positive constant of Kullback-Leibler type.

The proof of this proposition is based on a proof scheme of Proposition 4.1 by Gurevich and Vexler (2011), with additional applications of Bahadur's theorems (1966). We omit the lengthy and technical proof of Proposition 4.1 for brevity.
Proposition 4.1 demonstrates that the proposed test statistic T_{n_1, n_2}(δ, γ) is asymptotically consistent. Note that the structure of the proposed test includes the parameters δ and γ. One of the aims of this chapter is to show that the properties of the proposed procedure do not depend significantly on the values of these parameters. Although Proposition 4.1 claims that the proposed test is consistent for 0 < δ < 0.5 < γ < 1, we will also examine cases of (δ, γ) outside this range for the sake of completeness.

The proposed test is exact, i.e., its null distribution is independent of the distributions of Z_1 and Z_2. Accordingly, critical values C_α can be easily tabulated for fixed α, n_1, and n_2 under H_0, using, e.g., Monte Carlo simulations; see the next section.
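To make the structure of the statistic concrete, the following minimal Python sketch computes the statistic (4.6) as reconstructed above. The function and parameter names are illustrative only, and the clipping of zero empirical increments is a hypothetical stand-in for the Canner (1975)-type convention discussed in the text.

```python
import numpy as np

def el_component(z_self, z_other, delta, gamma):
    """One component of the reconstructed statistic (4.6):
    min over integer m in [n^delta, min(n^gamma, n/2)] of
    sum_j log( 2m / ( n * [F_other(Z_(j+m)) - F_other(Z_(j-m))] ) )."""
    z = np.sort(np.asarray(z_self, dtype=float))
    other = np.sort(np.asarray(z_other, dtype=float))
    n, n2 = z.size, other.size
    lo = max(1, int(np.floor(n ** delta)))
    hi = max(lo, min(int(np.ceil(n ** gamma)), n // 2))
    j = np.arange(n)
    best = np.inf
    for m in range(lo, hi + 1):
        up = z[np.minimum(j + m, n - 1)]   # boundary convention Z_(j) = Z_(n), j > n
        lw = z[np.maximum(j - m, 0)]       # boundary convention Z_(j) = Z_(1), j < 1
        dF = (np.searchsorted(other, up, side="right")
              - np.searchsorted(other, lw, side="right")) / n2
        dF = np.maximum(dF, 1.0 / n2)      # crude stand-in for the zero-denominator convention
        best = min(best, float(np.sum(np.log(2.0 * m / (n * dF)))))
    return best

def test_statistic(z1, z2, delta=0.1, gamma=0.9):
    """Reconstructed two-sample density-based EL ratio statistic (4.6)."""
    return el_component(z1, z2, delta, gamma) + el_component(z2, z1, delta, gamma)
```

Because only rank comparisons between the two samples enter the computation, the statistic is distribution-free under H_0, in line with the exactness property used below.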

4.3

SIMULATION STUDY

To evaluate the performance of the proposed method, the following Monte Carlo
experiment was carried out. We begin by tabulating the critical values of the null
distribution.

4.3.1 NULL DISTRIBUTION


Since the test statistic involves only the indicator functions I(Z_{2j} ≤ Z_{1(i)}) and I(Z_{1j} ≤ Z_{2(i)}), whose joint behavior under H_0 is determined by ranks, the null distribution of the test statistic is independent of the distributions of observations. As a consequence, the proposed test is an exact test, implying that the critical values of the test can be exactly computed. Here we used the standard normal distribution of Z_1 and Z_2 to tabulate the Monte Carlo percentiles of the null distribution; that is, to derive C_α satisfying

  Pr_{H_0} { T_{n_1, n_2}(δ, γ) > C_α } = α, Z_{ij} ~ N(0, 1).
To execute an extensive Monte Carlo experiment, we proceeded as follows. We first generated data Z_1 and Z_2 from the standard normal distribution N(0, 1) and then calculated the test statistic T_{n_1, n_2}(δ, γ). By repeatedly conducting the same procedure 50,000 times, we obtained 50,000 replicate values of the test statistic, which were used to determine the critical values of the null distribution of T_{n_1, n_2}(δ, γ) at the level of significance α. The experiment was conducted at each sample size n_1 and n_2 with different selected values of the pair (δ, γ). The simulation results of the experiment are shown in Table 4.1.
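Because the test is exact, the critical values C_α in Table 4.1 can in principle be reproduced by simulating the null distribution from any continuous law. A minimal sketch of the tabulation step, reusing test_statistic from the sketch in Section 4.2 (all names illustrative; reduce mc for a quick check):

```python
import numpy as np

def critical_value(n1, n2, alpha=0.05, mc=50_000, delta=0.1, gamma=0.9, seed=1):
    """Tabulate C_alpha as the (1 - alpha) Monte Carlo percentile of the null
    distribution; N(0, 1) data are used, as in the text."""
    rng = np.random.default_rng(seed)
    stats = [test_statistic(rng.standard_normal(n1), rng.standard_normal(n2),
                            delta, gamma)
             for _ in range(mc)]
    return float(np.quantile(stats, 1.0 - alpha))
```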
Table 4.1: The critical values C_α of the proposed test at (4.7) with various (δ, γ) values for sample sizes (n_1, n_2) at the significance levels α = 0.01, 0.05, and 0.1, respectively.

(δ, γ)      α      (10,10)   (15,15)   (15,25)   (25,25)   (35,35)   (45,45)   (50,50)
(0, 0.5)    0.01   9.3453    10.3877   11.1116   11.9537   13.1402   14.2593   14.7108
            0.05   7.8281    8.7486    9.3777    10.1647   11.3360   12.3041   12.7678
            0.1    7.1484    7.9681    8.5862    9.3214    10.4848   11.4243   11.8860
(0, 0.6)    0.01   9.2927    10.3000   11.1116   11.9506   13.1462   14.1090   14.7456
            0.05   7.7415    8.7194    9.3777    10.2037   11.2840   12.2720   12.7199
            0.1    7.0639    7.9550    8.5862    9.3785    10.4399   11.4064   11.8339
(0, 0.7)    0.01   9.2329    10.2949   11.1237   11.8968   13.1292   14.1711   14.6170
            0.05   7.7251    8.7133    9.3523    10.2180   11.2918   12.2998   12.7341
            0.1    7.0720    7.9764    8.5521    9.3738    10.4552   11.4066   11.8278
(0, 0.8)    0.01   9.2685    10.3264   11.0694   11.9343   13.1525   14.0960   14.6327
            0.05   7.7005    8.7036    9.3184    10.1623   11.3200   12.3017   12.7315
            0.1    7.0513    7.9291    8.5367    9.3326    10.4634   11.4200   11.8356
(0, 0.9)    0.01   9.2810    10.2860   11.0481   11.8869   13.1518   14.2269   14.6962
            0.05   7.7377    8.6926    9.3487    10.1585   11.3087   12.2990   12.7476
            0.1    7.0594    7.9694    8.5270    9.3615    10.4593   11.4061   11.8665
(0.1, 0.5)  0.01   9.3505    10.4619   11.1104   11.8833   13.0643   14.1751   14.6041
            0.05   7.8262    8.7432    9.4116    10.1560   11.2571   12.2677   12.7245
            0.1    7.1352    7.9729    8.6065    9.3454    10.4272   11.4030   11.8335
(0.1, 0.6)  0.01   9.2892    10.4074   11.0543   11.9135   13.0473   14.0862   14.6985
            0.05   7.7570    8.7067    9.3638    10.1363   11.2593   12.2328   12.7414
            0.1    7.0718    7.9626    8.5585    9.3592    10.4239   11.3824   11.8510
(0.1, 0.7)  0.01   9.2044    10.3085   11.1186   11.8887   13.1386   14.1869   14.6367
            0.05   7.7193    8.7452    9.3878    10.1763   11.3269   12.3182   12.6919
            0.1    7.0600    7.9890    8.5748    9.3506    10.4776   11.4320   11.8422
(0.1, 0.8)  0.01   9.3381    10.3016   10.9595   11.8469   13.1968   14.1934   14.5721
            0.05   7.7256    8.6902    9.3178    10.1685   11.3018   12.3011   12.7192
            0.1    7.0513    7.9656    8.5535    9.3372    10.4385   11.4067   11.8361
(0.1, 0.9)  0.01   9.3553    10.2482   11.0056   11.9070   13.3179   14.1376   14.6761
            0.05   7.7695    8.6913    9.2939    10.1758   11.3176   12.3093   12.7526
            0.1    7.0639    7.9372    8.5332    9.3347    10.4670   11.4092   11.8606
(0.3, 0.5)  0.01   9.8085    10.6346   11.4540   12.3162   13.3237   14.2501   14.7420
            0.05   8.2263    8.9543    9.7283    10.5765   11.4876   12.3624   12.7922
            0.1    7.5314    8.1834    8.9023    9.7645    10.6432   11.5093   11.8779
(0.3, 0.6)  0.01   9.7274    10.5522   11.4267   12.3644   13.3133   14.3610   14.6681
            0.05   8.1814    8.9218    9.6847    10.5418   11.4443   12.3914   12.7798
            0.1    7.4760    8.1521    8.8954    9.7147    10.6206   11.5111   11.9154
(0.3, 0.7)  0.01   9.7781    10.5874   11.4992   12.3325   13.3398   14.2964   14.7754
            0.05   8.1341    8.9217    9.7175    10.5579   11.5310   12.3872   12.7980
            0.1    7.4574    8.1852    8.9117    9.7226    10.6658   11.5029   11.9111
(0.3, 0.8)  0.01   9.7059    10.5922   11.4186   12.2692   13.3090   14.2615   14.7270
            0.05   8.1505    8.9444    9.6586    10.5353   11.4801   12.3402   12.8266
            0.1    7.4614    8.2003    8.8859    9.7096    10.6150   11.4671   11.9277
(0.3, 0.9)  0.01   9.7991    10.5570   11.4600   12.3332   13.4335   14.2825   14.6613
            0.05   8.1529    8.8921    9.7241    10.5610   11.5100   12.3721   12.7856
            0.1    7.4731    8.1300    8.8893    9.7366    10.6499   11.5001   11.9121
(0.4, 0.5)  0.01   10.6112   11.2950   12.1616   13.0224   13.7326   15.0571   15.3855
            0.05   8.9814    9.5875    10.3334   11.1941   11.9243   13.2592   13.5104
            0.1    8.2424    8.7705    9.5210    10.3597   11.0983   12.4152   12.7124
(0.4, 0.6)  0.01   10.5291   11.2244   12.1387   12.9591   13.7107   15.0017   15.5087
            0.05   8.8678    9.5175    10.2938   11.1354   11.9068   13.2433   13.5428
            0.1    8.1257    8.7307    9.4957    10.3177   11.0790   12.4149   12.6994
(0.4, 0.7)  0.01   10.3935   11.1481   12.0465   12.9445   13.7722   15.0430   15.4970
            0.05   8.8601    9.5108    10.2899   11.1424   11.9444   13.2021   13.5794
            0.1    8.1440    8.7274    9.4846    10.3336   11.0964   12.3860   12.7211
(0.4, 0.8)  0.01   10.6310   11.2006   12.0671   12.8801   13.6517   15.1367   15.4345
            0.05   8.9170    9.5283    10.3300   11.1162   11.9248   13.2400   13.5226
            0.1    8.1706    8.7469    9.5189    10.3191   11.1131   12.4186   12.6900
(0.4, 0.9)  0.01   10.5404   11.1530   12.1275   12.9025   13.7767   15.0916   15.5587
            0.05   8.8990    9.4997    10.2749   11.1423   11.9325   13.2249   13.5991
            0.1    8.1549    8.7254    9.4806    10.3266   11.0896   12.4138   12.7083
(0.5, 0.6)  0.01   10.5463   12.0270   12.9370   13.6077   15.1018   16.5377   16.8149
            0.05   8.8759    10.2588   11.1353   11.8599   13.3568   14.7771   15.0410
            0.1    8.1643    9.4938    10.2954   11.1037   12.5803   14.0069   14.2730
(0.5, 0.7)  0.01   10.5436   12.0173   12.8799   13.6496   15.2383   16.5816   16.8500
            0.05   8.8847    10.2115   11.0909   11.8876   13.3845   14.8188   15.0384
            0.1    8.1368    9.4756    10.2961   11.1079   12.6004   14.0539   14.2531
(0.5, 0.8)  0.01   10.5858   11.9449   12.9062   13.7680   15.1518   16.5732   16.7982
            0.05   8.9052    10.2432   11.1300   11.9260   13.3821   14.7714   15.0326
            0.1    8.1799    9.4871    10.3134   11.1472   12.5850   14.0195   14.2518
(0.5, 0.9)  0.01   10.5665   11.9446   12.9200   13.7055   15.1155   16.5425   16.8217
            0.05   8.8779    10.2538   11.0918   11.8834   13.3538   14.7624   15.0570
            0.1    8.1761    9.4856    10.2965   11.1167   12.6088   14.0037   14.2668
(0.6, 0.7)  0.01   11.5131   12.8077   14.2238   15.3984   16.9911   19.3162   19.5837
            0.05   9.7490    11.0887   12.4651   13.7289   15.2774   17.7350   17.9512
            0.1    9.0446    10.3768   11.7368   13.0363   14.5681   17.0535   17.2144
(0.6, 0.8)  0.01   11.5090   12.7709   14.2371   15.3914   16.9450   19.3345   19.6321
            0.05   9.7469    11.0969   12.4816   13.7190   15.2892   17.7199   17.9117
            0.1    9.0276    10.3841   11.7555   13.0406   14.5705   17.0355   17.2180
(0.6, 0.9)  0.01   11.6090   12.7505   14.1758   15.3425   16.9624   19.3088   19.5333
            0.05   9.7483    11.0864   12.4518   13.7218   15.2435   17.7318   17.9099
            0.1    9.0256    10.3713   11.7241   13.0369   14.5397   17.0605   17.2074
(0.7, 0.8)  0.01   12.5478   14.7903   16.5983   18.4280   20.9676   23.6376   24.8691
            0.05   10.7823   13.2178   15.0707   16.9354   19.5255   22.1146   23.4102
            0.1    10.0446   12.5575   14.4402   16.3259   18.9088   21.4800   22.7600
(0.7, 0.9)  0.01   12.6427   14.7093   16.6315   18.4181   20.9917   23.6172   24.8397
            0.05   10.7848   13.1639   15.0734   16.9273   19.5463   22.1180   23.3599
            0.1    10.0603   12.5262   14.4476   16.3094   18.9148   21.4977   22.7322

In order to show the appropriateness of the critical values displayed in Table 4.1, we present in Table 4.2 the corresponding type I errors of the proposed test, obtained by utilizing critical values from Table 4.1 at α = 0.05 and differently distributed samples under the null hypothesis. As can be seen from Table 4.2, the type I error rates of the proposed test are well controlled. These results were expected since the proposed test is exact. Therefore, critical values of the proposed test can be exactly calculated and the associated estimated type I error rates can be well maintained at the nominal level α.

Table 4.2: Type I error control of the proposed test statistic (4.6) at the significance level α = 0.05.
Design (Z_1 vs. Z_2)                 (10,10)  (15,15)  (15,25)  (25,25)  (35,35)  (45,45)  (50,50)
N(0.5, 1) vs. N(0.5, 1)              0.0503   0.0500   0.0508   0.0512   0.0490   0.0524   0.0503
N(1, 1.5²) vs. N(1, 1.5²)            0.0511   0.0483   0.0511   0.0508   0.0515   0.0495   0.0498
LogNorm(0, 1) vs. LogNorm(0, 1)      0.0509   0.0493   0.0487   0.0488   0.0513   0.0494   0.0514
Exp(2) vs. Exp(2)                    0.0505   0.0485   0.0494   0.0498   0.0501   0.0493   0.0494
Gamma(1, 0.5) vs. Gamma(1, 0.5)      0.0502   0.0504   0.0500   0.0495   0.0516   0.0526   0.0507
χ²(·) vs. χ²(·)                      0.0479   0.0503   0.0493   0.0513   0.0519   0.0500   0.0507

4.3.2 POWER STUDY


The focus of this chapter is to analyze the proposed density-based EL ratio technique in different scenarios related to practical situations. Towards this end, in this section, we examine the powers of the considered tests in various cases using Monte Carlo simulations. Our interest in this study is to compare the proposed test with the classical procedures, namely the independent two-sample t-test, the Wilcoxon test, and the two-sample Kolmogorov-Smirnov test, for two-sample comparisons based on paired data. The following Monte Carlo study was performed for different sample sizes and values of (δ, γ) in the definition (4.6). In this Monte Carlo power study, the critical values depicted in Table 4.1 were utilized to guarantee the type I error rate at 0.05 (α = 0.05). 10,000 samples of each sample size n_1 and n_2 were generated from various population distributions that can be categorized into the following cases: (i) constant location shifts, such as design K1 - N(0, 1) vs. N(0.5, 1) presented in Table 4.3; (ii) constant versus nonconstant location shifts, such as design K2 - N(0, 1) vs. N(0.5, 1.3²); (iii) skewed data, such as design K27, which contrasts Exp(0.1) with a chi-squared distribution. Table 4.3 lists the considered scenarios of alternative distributions.

Table 4.3: Designs of the alternative hypothesis used in the following Monte Carlo evaluations of the powers of the proposed test (4.6).

Design   Z_1               Z_2
K1       N(0, 1)           N(0.5, 1)
K2       N(0, 1)           N(0.5, 1.3²)
K3       N(0, 1)           N(0.5, 1.5²)
K4       N(0, 1)           N(0.5, 2.25²)
K5       N(0, 1)           N(0, 1.5²)
K6       N(0, 1)           Unif[-1, 1]
K7       N(0, 1)           Cauchy(0, 1)
K8       Exp(1)            LogNorm(0, 1)
K9       Exp(1)            LogNorm(0, 2²)
K10      Beta(0.7, 1)      Exp(2)
K11      LogNorm(0, 1)     LogNorm(0.5, 1.5²)
K12      LogNorm(0, 1)     LogNorm(1, 2.5²)
K13      LogNorm(0, 1)     LogNorm(0, 1.2²)
K14      LogNorm(0, 1)     Unif[1, 2]
K15      Gamma(1, 1)       Gamma(1, 2)
K16      Gamma(1, 1)       Gamma(1, 0.5)
K17      Gamma(1, 1)       Gamma(1, 1.25)
K18      Gamma(1, 1)       Gamma(3, 1.5)
K19      Gamma(1, 1)       Gamma(1, 5)
K20      Gamma(1, 1)       Gamma(1, 10)
K21      Gamma(1, 1)       Gamma(1, 50)
K22      Gamma(1, 1)       Gamma(5, 2.5)
K23      Gamma(1, 1)       Gamma(10, 2.5)
K24      χ²(·)             χ²(·)
K25      LogNorm(0, 4²)    χ²(·)
K26      LogNorm(0, 3²)    χ²(·)
K27      Exp(0.1)          χ²(·)
K28      Exp(0.1)          χ²(·)
A summary of the power study is displayed in Figure 4.1. The presented three-dimensional plots represent the following: the x-axis corresponds to sample sizes (n_1, n_2) = (50, 50), (45, 45), (35, 35), (25, 25), (15, 25), (25, 15), (15, 15); the y-axis stands for different values of the pair (δ, γ); the z-axis represents the Monte Carlo powers of the considered tests. A distinct plotting symbol identifies each test utilized in the Monte Carlo study: the proposed test, the two-sample Kolmogorov-Smirnov test, the Wilcoxon test, and the independent two-sample t-test. Recall that the cases of (δ, γ) outside the range 0 < δ < 0.5 < γ < 1 do not satisfy the conditions of Proposition 4.1, as pointed out in the previous section. However, we investigate these cases to show that the proposed test has stable operating characteristics with respect to different values of δ and γ.
In design K1, the alternative has a constant location shift. When observations are from a normal distribution, it is anticipated that the t-test has the greatest power among all considered tests. In this case, the t-test represents the maximum likelihood ratio test based on correct assumptions regarding sample distributions. The Wilcoxon test is also known to be efficient when a constant shift is in effect under H_1. Consequently, it was expected that the classical tests perform better for such types of alternatives. In these situations, the power of the proposed test is 10-20% lower than that of the classical procedures.

Designs K2, K3, and K4 have constant and nonconstant location shifts in normal distributions. In these cases, the power differences between the proposed test and the classical procedures grow with the scale parameter of the alternative distribution. As can be seen in design K2, the powers of the proposed test are close to those of the classical procedures. When the scale parameter of the alternative distribution increases from 1.5² in design K3 to 2.25² in design K4, the powers of the proposed test become much larger than those of the classical tests. For design K5 with a scale shift, design K6 with the symmetric Unif[-1, 1] distribution, and design K7 with the heavy-tailed symmetric Cauchy distribution in the role of the alternative, it is obvious that the proposed test is superior to the classical approaches. When we have nonconstant shifts, such as the exponential and lognormal cases in designs K8 and K9 as well as the beta and exponential cases in design K10, the proposed test clearly outperforms the classical tests.

In the skewed lognormal case of design K11, the proposed test is more powerful than the classical procedures in most situations. Only in the cases of large sample sizes (e.g., 50 or 45) or sample sizes of (n_1, n_2) = (15, 25) does the t-test have slightly more power than the proposed test. In the remaining skewed lognormal cases of designs K12-K14, the proposed test clearly has higher power than the classical procedures. In the skewed gamma cases of designs K15-K23, the proposed test does not always perform better than the classical tests; however, the power differences are not substantial. In the skewed chi-squared cases of designs K24-K28, the proposed test is superior to the classical tests except for design K24, and even there the powers of the proposed test remain close to those of the classical tests. We would like to note again that the type I error control related to the t-test holds only asymptotically, i.e., the powers demonstrated for the t-test may not correspond exactly to the significance level of 0.05.

In summary, the Monte Carlo outputs report that the powers of the proposed test do not depend significantly on the values of (δ, γ). The various outputs show that, as anticipated, the proposed test works quite well, in general outperforming the classical procedures in many cases. For a few considered sample sizes and designs, the powers of the proposed test and the classical procedures are comparable. For most of the sample sizes and alternative designs considered, the proposed test is found to be superior to the classical tests for two-sample problems based on paired data.
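The Monte Carlo powers summarized in Figure 4.1 correspond to rejection frequencies of the rule (4.7) over simulated alternative samples. A hedged sketch of the power computation, reusing test_statistic and critical_value from the sketches above (samplers and names are illustrative):

```python
import numpy as np

def monte_carlo_power(sampler1, sampler2, n1, n2, c_alpha,
                      reps=10_000, delta=0.1, gamma=0.9, seed=2):
    """Estimate power as the frequency of T > C_alpha under the alternative."""
    rng = np.random.default_rng(seed)
    rejections = sum(
        test_statistic(sampler1(rng, n1), sampler2(rng, n2), delta, gamma) > c_alpha
        for _ in range(reps)
    )
    return rejections / reps

# e.g. design K1, N(0, 1) vs. N(0.5, 1), at (n1, n2) = (25, 25):
# power = monte_carlo_power(lambda r, n: r.standard_normal(n),
#                           lambda r, n: r.standard_normal(n) + 0.5,
#                           25, 25, c_alpha=critical_value(25, 25))
```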


4.4

A REAL DATA EXAMPLE

In this section, we applied our new method to a study conducted at the Center for Children and Families, University at Buffalo, State University of New York. A novel group therapy for children with Attention-Deficit/Hyperactivity Disorder (ADHD) and severe Mood Dysregulation (SMD) symptoms was created to develop effective treatments for children with ADHD and SMD. The subjects recruited in this study were 32 children ages 7 to 12 with ADHD and SMD. They were randomly assigned to receive either the experimental 11-week group therapy program or community psychosocial treatment. The former was defined as the therapy group, with sample size n_1 = 17, whereas the latter was referred to as the control group, with sample size n_2 = 15. Measurements were taken at 2 time points: baseline (week 0) and endpoint (week 11). The paired data constituted by the amount of change in the Children's Depression Rating Scale Revised total score (CDRS-Rts) between baseline and endpoint were utilized as our outcome measures of interest (see, e.g., Poznanski et al., 1979, 1984).
Let (X_{11}, Y_{11}), …, (X_{1n_1}, Y_{1n_1}) and (X_{21}, Y_{21}), …, (X_{2n_2}, Y_{2n_2}) be the CDRS-Rts observations obtained at baseline and endpoint in the therapy group (group 1) and the control group (group 2), respectively. We consider the within-pair differences Z_{ij} = Y_{ij} − X_{ij}. Define Z_{11}, …, Z_{1n_1} to stand for observations in group 1 based on the within-pair differences in CDRS-Rts, and define Z_{21}, …, Z_{2n_2} to represent observations in group 2 based on the within-pair differences in CDRS-Rts. The goal of this study is to detect differences between the new therapy and control groups with respect to treatment effects on ADHD and SMD in children. Specifically, we are interested in testing if there are
differences in the distributions of paired CDRS-Rts data between the two treatment groups. The empirical histograms of the Z_1-sample and the Z_2-sample are presented in Figure 4.2. The mean and standard deviation of the paired data in group 1 are -6.8235 and 4.6534, respectively, while those based on the paired data in group 2 are -3.8667 and 4.4540, respectively. In Section 4.3, the powers of the proposed test were shown experimentally to be insensitive to the values of (δ, γ). Without loss of generality, in this example, we utilize a single fixed pair (δ, γ) in the test statistic (4.6). The observed test statistic T_{n_1, n_2}(δ, γ) is 10.1786, with the corresponding p-value of 0.1292 at the significance level α = 0.05. Compared with the two-sample Kolmogorov-Smirnov test, the Wilcoxon test, and the t-test, we also obtained p-values larger than the significance level α = 0.05, namely 0.2397, 0.0863, and 0.0765, respectively.

According to these testing results based on the full dataset, our new test and the classical tests lead to the identical conclusion that the distributions of paired CDRS-Rts data have no statistically significant differences between the two therapy groups at α = 0.05, implying that there is no strong evidence of differential effects of the two treatments. As a consequence, the proposed test supports the conclusions regarding the treatment study of ADHD and SMD in conjunction with the classical procedures. Note that the Wilcoxon test may break down completely when a nonconstant change is in effect under the alternative hypothesis (e.g., Albers et al., 2001). The independent two-sample t-test is asymptotic, not an exact test, while the proposed test is exact. The Monte Carlo study indicates that the proposed density-based EL ratio test is very powerful when a nonconstant shift is in effect.

In accordance with ideas introduced by Stigler (1977), we performed a bootstrap-type study to examine the proposed test. The procedure proceeded as follows. We randomly selected samples with the following sizes for the two groups from the original dataset: (n_1, n_2) = (7, 5), (9, 7), (11, 9), (13, 11), (15, 13), (17, 15), and calculated the corresponding test statistic T_{n_1, n_2}(δ, γ). To obtain a p-value from the bootstrap method, we located the obtained test statistic T_{n_1, n_2}(δ, γ) in the null distribution generated from the standard normal distribution N(0, 1). We repeated this resampling with replacement 1,000 times and obtained a p-value in each resample. An average p-value was computed by taking the average of the obtained 1,000 p-values. The results for the average p-values of the considered tests in this study are displayed in Figure 4.3. It can be seen from Figure 4.3 that, for every test, the p-value decreases as the sample size of each group increases, suggesting that with more data the p-values would likely drop below the significance level α = 0.05, in which case we would reject the null hypothesis.
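A minimal sketch of this bootstrap-type average p-value computation, assuming a pre-simulated null sample of statistic values and reusing test_statistic from Section 4.2 (all names illustrative):

```python
import numpy as np

def bootstrap_average_pvalue(z1, z2, m1, m2, null_stats,
                             reps=1_000, delta=0.1, gamma=0.9, seed=3):
    """Average bootstrap p-value: resample (m1, m2) observations with
    replacement, compute the statistic, and locate it within a pre-simulated
    null distribution `null_stats` (e.g., 50,000 N(0,1)-based statistics)."""
    rng = np.random.default_rng(seed)
    null_stats = np.sort(np.asarray(null_stats, dtype=float))
    pvals = []
    for _ in range(reps):
        t = test_statistic(rng.choice(z1, size=m1, replace=True),
                           rng.choice(z2, size=m2, replace=True), delta, gamma)
        # p-value = proportion of null statistics exceeding t
        pvals.append(1.0 - np.searchsorted(null_stats, t) / null_stats.size)
    return float(np.mean(pvals))
```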

Finally, we conducted a new bootstrap-type study with sample sizes (n_1, n_2) = (10, 10). Two samples of size 10 were selected at random from our original data to be tested at the significance level α = 0.05. We repeated this strategy 10,000 times, calculating the frequencies of the events T_{n_1, n_2}(δ, γ) > C_α (the critical value C_α was chosen from Table 4.1). Table 4.4 depicts the results. The proposed test rejects the null hypothesis in 4,273 cases, while the two-sample Kolmogorov-Smirnov test, the Wilcoxon test, and the independent two-sample t-test reject the null hypothesis in 1,226, 2,786, and 2,889 cases, respectively. The number of rejections for each test is not large compared to the 10,000 resampling replications (i.e., the rejection rate of each test is small). This again demonstrates that, when we do not have enough data, the null hypothesis is more likely not to be rejected. Note that the proposed test has the largest proportion of rejections, showing that our new test is reliable and the most sensitive at detecting differences between two samples based on paired data in comparison to the classical procedures.
Table 4.4: Proportion of rejections* based on the bootstrap method for each considered test.

Considered test             Bootstrap-type study based on (n_1, n_2) = (10, 10)
Proposed test               0.4273
Kolmogorov-Smirnov test     0.1226
Wilcoxon test               0.2786
t-test                      0.2889

*The proportion of rejections of each test from the bootstrap method was computed based on sample sizes (n_1, n_2) = (10, 10) and 10,000 replications.

4.5

CONCLUSIONS

In this chapter, we extended and adapted the Gurevich and Vexler (2011) test to develop a nonparametric approach for comparing treatment effects between two study groups of individuals involved in biomedical studies. In contrast to the Gurevich and Vexler (2011) test, we relaxed the bounds on the values of the test parameters (considering the case of 0 < δ < 0.5) and used Monte Carlo simulations to see how this affects the performance of the proposed test. The simulation results demonstrated that the proposed test has stable operating characteristics with respect to the values of the parameters δ and γ. Moreover, we extensively examined the power properties of the proposed test and relevant classical tests. The study results demonstrated that the proposed test is very efficient, even outperforming the standard procedures in many cases. We showed that when the underlying data are normally distributed and only a location shift is assumed under the alternative hypothesis, the proposed test has high and stable power to detect differences in location, resulting in a relatively small power loss as compared to the classical Wilcoxon test, the two-sample Kolmogorov-Smirnov test, and the independent two-sample t-test. On the contrary, in the case of nonconstant location-shift alternatives with normally distributed data, it turns out that the proposed test achieves a substantial power gain in comparison to the standard procedures. Furthermore, we applied the proposed method to the real data example, showing that the proposed procedure helped to confirm the decision regarding the treatment effect of ADHD and SMD in children. This illustrates the practical applicability of the proposed test. Therefore, the proposed test can be utilized as a very powerful tool in nonparametric statistical inference applied to two-sample problems based on paired data. The proposed approach can be easily extended to k-sample problems.

Figure 4.1: 3-D plots of powers of the considered tests for all 28 designs (K1-K28) with different sample sizes (n_1, n_2) = (50, 50), (45, 45), (35, 35), (25, 25), (15, 25), (25, 15), (15, 15) and parameter settings of (δ, γ) = (0, 0.5), (0, 0.6), (0, 0.7), (0, 0.8), (0, 0.9); (0.1, 0.5), (0.1, 0.6), (0.1, 0.7), (0.1, 0.8), (0.1, 0.9); (0.3, 0.5), (0.3, 0.6), (0.3, 0.7), (0.3, 0.8), (0.3, 0.9); (0.4, 0.5), (0.4, 0.6), (0.4, 0.7), (0.4, 0.8), (0.4, 0.9); (0.5, 0.6), (0.5, 0.7), (0.5, 0.8), (0.5, 0.9); (0.6, 0.7), (0.6, 0.8), (0.6, 0.9); (0.7, 0.8), (0.7, 0.9) at the significance level α = 0.05.

Figure 4.2: Histograms of the differences in CDRS-Rts between baseline and endpoint in group 1 (n_1 = 17 observations) and group 2 (n_2 = 15 observations), respectively.

Figure 4.3: Plot of sample sizes vs. average p-value using the bootstrap method, for sample sizes (n_1, n_2) = (7, 5), (9, 7), (11, 9), (13, 11), (15, 13), (17, 15); the curves correspond to the proposed test, the Kolmogorov-Smirnov test, the Wilcoxon test, and the t-test.

CHAPTER 5
TWO-SAMPLE DENSITY-BASED EMPIRICAL
LIKELIHOOD RATIO TESTS BASED ON PAIRED
DATA, WITH APPLICATION TO A TREATMENT
STUDY OF ATTENTION-DEFICIT/
HYPERACTIVITY DISORDER AND SEVERE
MOOD DYSREGULATION

5.1

INTRODUCTION

Often, investigators in various fields of medical studies deal with paired data to compare different population groups. In this chapter, we propose a paired-data-based methodology motivated by the following comparative study of Attention-Deficit/Hyperactivity Disorder (ADHD) and Severe Mood Dysregulation (SMD). ADHD is a commonly diagnosed psychiatric disorder in children (e.g., Biederman, 1998; Nair et al., 2006). SMD is a diagnostic label recently created by Leibenluft's laboratory in the National Institute of Mental Health's intramural program to refer to children with an abnormal baseline mood, hyperarousal, and increased reactivity to negative emotional stimuli (e.g., Brotman et al., 2006; Carlson, 2007; Leibenluft et al., 2003; Waxmonsky et al., 2008). A novel group therapy study at the University at Buffalo enrolled 32 children aged 7-12 with ADHD and SMD. These children were treated for 11 weeks. The study participants were randomized between two therapy groups: an experimental group therapy program (case; new therapy group) and community psychosocial treatment (control; old therapy group). An objective of the study was to compare the feasibility and efficacy of these two treatments using the Children's Depression Rating Scale Revised total score (CDRS-Rts). The Children's Depression Rating Scale, revised version (CDRS-R), is a clinician-rated instrument for the diagnosis of childhood depression and the assessment of the severity of depression in children 6-12 years of age (Poznanski et al., 1979, 1984). The CDRS-R
consists of 17 clinician-rated items, with 14 items based on the child's self-report or reports from the parents or teachers and three items based on the child's nonverbal behavior during the interviews. The CDRS-R provides more reliable depression ratings than other children's depression rating scales, since it collects information from more sources by interviewing the child, parents, or school teachers independently, considers the child's behavior during the interview, and uses lengthened scales to capture slight differences in symptomatology. On the basis of clinical experience, a CDRS-Rts below 40, between 40 and 60, and above 60 corresponds to none-to-mild, moderate, and severe depression, respectively (Poznanski et al., 1979, 1984; Ying et al., 2006). Thus, a significant drop in the CDRS-Rts over the course of the study implies the effectiveness of a treatment. To record the paired data of this study, two measurements were taken from the same subjects. The paired data were constituted by the observed values of CDRS-Rts at week 0 (baseline) and week 11 (endpoint).
In this medical study, the main research problems are to test differences between the distributions of the two therapy groups as well as to detect treatment effects within each group. Testing the hypothesis of no difference between distributions of the two therapy groups using the paired data is only one aspect of the comparisons between treatments. For example, in the context of the treatment study of ADHD and SMD, Waxmonsky et al. (2008) previously carried out a study to examine the tolerability and efficacy of methylphenidate (MPH) and behavior modification therapy, where multiple comparisons with the Bonferroni technique using independent-sample t-tests and pairwise t-tests were conducted. The former tests were implemented to evaluate between-group differences in baseline characteristics and measures of tolerability (the Pittsburgh Side Effect Rating Scale, PSERS). The latter were undertaken in the SMD group to compare pre- and post-treatment differences in CDRS-Rts and PSERS from low-dose MPH to high-dose MPH. In this chapter, we avoid considerations of combined p-values, proposing a simple and efficient way to create nonparametric tests that attend to specific alternative hypotheses directly, in analogy to parametric likelihood ratio tests. Note that, with respect to controlling the type I error, the t-tests used are known to be inefficient when the utilized data are skewed, and the applied Bonferroni method tends to be conservative. The nonparametric statistical analyses of two populations described above require consideration of more versatile testing methods than those well addressed in the classic literature. In this chapter, we propose and examine distribution-free tests for multiple hypotheses to detect various differences related to treatment effects in study groups based on paired data.
To formalize the testing problems, let (X_{ij}, Y_{ij}) be independent identically distributed (i.i.d.) pairs of observations within subject j from sample i, where i = 1, 2 refers to treatments and j = 1, …, n_i refers to subjects. In the nonparametric setting, the classic one-sample tests for paired data, e.g., the paired t-test and the Wilcoxon signed rank test, are based on the differences Z_{ij} = Y_{ij} − X_{ij}, where Z_{ij} denotes the within-pair difference of subject j from sample i, i = 1, 2; j = 1, …, n_i. Note that {Z_{1j}, j = 1, …, n_1} and {Z_{2j}, j = 1, …, n_2} consist of i.i.d. observations from populations Z_1 and Z_2 with distribution functions, say, F_1(·) and F_2(·), respectively. In the context of treatment evaluations, Z_{ij} can be defined as the difference of measurements between pre- and post-treatment. In this chapter, we consider different hypotheses simultaneously for the symmetry of F_1 and/or F_2 (detecting a treatment effect within groups) as well as for the equivalence F_1 = F_2. Here we refer to the nonparametric literature to connect the term treatment effect with tests for symmetry (e.g., Wilcoxon, 1945). Note that the Kolmogorov-Smirnov test is a known procedure to compare distributions of populations, whereas standard testing procedures such as the paired t-test, the sign test, and the Wilcoxon signed rank test can be applied to the symmetry problem, i.e., to test for F_i(u) = 1 − F_i(−u), for all u. Comparisons between the distributions of the new therapy and control groups, as well as detection of treatment effects, may be based on multiple hypothesis tests. To this end, one can create relevant tests combining, for example, the Kolmogorov-Smirnov test and the Wilcoxon signed rank test. The use of the classical procedures commonly requires complex considerations to combine the known nonparametric tests. Alternatively, we will develop a direct distribution-free method for analyzing the two-sample problems. The proposed method can be easily applied to test nonparametrically for different composite hypotheses. The proposed approach approximates nonparametrically most powerful Neyman-Pearson test rules, providing efficiency of the proposed procedures.
When parametric forms of the relevant distributions are known, corresponding parametric likelihood ratios can easily be applied to test the problems mentioned above. According to the Neyman-Pearson lemma, parametric likelihood ratio tests are optimal decision rules (e.g., Lehmann and Romano, 2005). We propose to approximate the corresponding likelihood ratios using the empirical likelihood (EL) concept. The EL methodology has been addressed in the statistical literature as one of the most powerful nonparametric techniques (e.g., Owen, 1988, 1991, 2001). The EL methodology allows researchers to use distribution-free procedures with efficient characteristics that are asymptotically close to those of related parametric likelihood approaches (e.g., Lazar and Mykland, 1998). The classical EL approach is developed in terms of cumulative distribution functions (e.g., Owen, 2001). Vexler and Yu (2011) demonstrated that the classical EL method based on distribution functions is well suited for testing parameters; however, the EL technique based on density functions performs more efficiently in tests for distributions. To approximate Neyman-Pearson test statistics, Vexler and Gurevich (2010) and Gurevich and Vexler (2011) proposed to focus on the density-based EL,

  L_f = Π_{i=1}^{n} f(X_{(i)}),

where f(·) is an unknown density function of the observations {X_1, …, X_n} and X_{(i)} denotes the ith order statistic based on {X_1, …, X_n}. In this case, approximate values of f(X_{(i)}) are obtained by maximizing L_f subject to an empirical version of the constraint ∫ f(u) du = 1.
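As an illustration of this maximization, under a spacings-based empirical version of the integral constraint the EL values take a Vasicek-type form, f(X_{(i)}) = 2m / { n ( X_{(i+m)} − X_{(i−m)} ) }, as in Vexler and Gurevich (2010). A minimal sketch, with illustrative names and the boundary conventions X_{(j)} = X_{(1)} for j < 1 and X_{(j)} = X_{(n)} for j > n:

```python
import numpy as np

def density_el_values(x, m):
    """Density-based EL values f(X_(i)) = 2m / (n * (X_(i+m) - X_(i-m)))
    maximizing prod f(X_(i)) under the empirical integral constraint."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(n)
    spacings = x[np.minimum(i + m, n - 1)] - x[np.maximum(i - m, 0)]
    return 2.0 * m / (n * spacings)
```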

We extend and adapt the density-based EL approach to two-sample testing issues, carrying out multiple testing problems in paired-data settings. Despite the fact that many statistical inference procedures have been developed for two-sample problems, to our knowledge, relevant nonparametric likelihood techniques dealing with the presented two-sample issues based on paired data have not been well addressed in the literature. The proposed density-based EL tests are exact, which ensures accurate computation of the relevant p-values based on data with small sample sizes.

The rest of this chapter is organized as follows. In Section 5.2, we state the purpose of each testing hypothesis considered in this chapter and then develop the corresponding density-based EL ratio test statistics. Theoretical results are presented to show the asymptotic consistency of the proposed tests. To evaluate the proposed approaches, extensive Monte Carlo studies are carried out in Section 5.3. An application to the analysis of the CDRS-Rts data is presented in Section 5.4. In Section 5.5, we provide some concluding remarks.

5.2

STATEMENT OF PROBLEMS AND METHODS

5.2.1

HYPOTHESES SETTING

To test for equality of the distributions of the new therapy group and the control therapy group based on the paired observations {Z_{1j}, j = 1, …, n_1} and {Z_{2j}, j = 1, …, n_2}, one may consider the hypotheses

  F_1(u) = F_2(u), for all u, vs. F_1(u) ≠ F_2(u), for some u.

In order to incorporate evaluation of the treatment effect in each therapy group, we point out two components of the null hypothesis: 1) the equality of the distributions of the two therapy groups, and 2) no treatment effect in each group. This can be presented as 1) F_1(u) = F_2(u), for all u, and 2) F_i(u) = 1 − F_i(−u), for all u, i = 1, 2 (i.e., Z_1 and Z_2 are symmetric about zero). Against this null hypothesis, we can set up three different alternative hypotheses, namely:

(i) The distributions of the two groups differ, or at least one group exhibits a treatment effect, i.e., F_1(u) ≠ F_2(u) or F_i(u) ≠ 1 − F_i(−u), for i = 1 or 2;

(ii) There is a treatment effect in one therapy group while there is no treatment effect in the other;

(iii) One asserts that both therapy groups have the same treatment effect. In this case, since the distributions of the two groups are assumed to be identical under both the null and the alternative hypotheses, a one-sample test for symmetry can be applied.

The cases (i)-(iii) are formally noted in Table 5.1.

Table 5.1: Hypotheses of interest to be tested based on paired data.

  Null hypothesis: H_0: F_1(u) = F_2(u) = 1 − F_2(−u), for all u.

  Alternative hypotheses:
  H_1: F_1(u) ≠ F_2(u), or F_i(u) ≠ 1 − F_i(−u) for i = 1 or 2 (i.e., not H_0);
  H_2: F_1(u) ≠ 1 − F_1(−u) and F_2(u) = 1 − F_2(−u) (or vice versa);
  H_3: F_1(u) = F_2(u) ≠ 1 − F_2(−u).

Let Test 1, Test 2, and Test 3 refer to the hypothesis tests for the composite hypotheses H_0 vs. H_1, H_0 vs. H_2, and H_0 vs. H_3, respectively.

5.2.2 TEST STATISTICS


In this section, we develop test statistics for Tests 1-3. The proposed three tests will be
shown to be exact.
5.2.2.1  TEST 1: H_0 vs. H_1
Consider the scenario where one is interested in testing

  H_0: F_1(u) = F_2(u) = 1 − F_2(−u), for all u, vs. H_1: not H_0.

The likelihood ratio test statistic based on the observations Z_{ij}, i = 1, 2; j = 1, …, n_i, is given by

  LR = Π_{i=1}^{2} Π_{j=1}^{n_i} f_i(Z_{ij}) / Π_{i=1}^{2} Π_{j=1}^{n_i} f_s(Z_{ij}) = Π_{i=1}^{2} Π_{j=1}^{n_i} f_i(Z_{i(j)}) / f_s(Z_{i(j)}),

where f_i, i = 1, 2, are density functions related to F_1 and F_2, f_s is a density function related to a symmetric distribution F_s, and Z_{1(1)} ≤ … ≤ Z_{1(n_1)}, Z_{2(1)} ≤ … ≤ Z_{2(n_2)} are the order statistics based on {Z_{1j}} and {Z_{2j}}, respectively. The main novelty of the proposed method for developing the nonparametric test statistic is that we modify the maximum EL concept to obtain directly estimated values of f_i(Z_{i(j)}) and f_s(Z_{i(j)}), maximizing the corresponding ELs subject to an empirical constraint.

This constraint controls the estimated values of f_s(Z_{i(j)}), preserving the main property of the density function f_s, symmetry about zero, under the complex structure of the tested hypothesis. To obtain the associated empirical constraint, we utilize the fact that the values of f_s should be restricted by the equation ∫ f_s(u) du = 1. By applying the mean value theorem to approximate the constraint, we have

  F_s(Z_{1(j+m)}) − F_s(Z_{1(j−m)}) = ∫_{Z_{1(j−m)}}^{Z_{1(j+m)}} f_s(u) du ≈ f_s(Z_{1(j)}) ( Z_{1(j+m)} − Z_{1(j−m)} ).   (5.1)

Since, under the null hypothesis H_0, the distribution function F_s is assumed to be symmetric, the idea presented by Schuster (1975) can be adapted to estimate F_s at (5.1) by using the following estimator, denoted as F̃,
  F̃(u) = [ 2(n_1 + n_2) ]^{−1} Σ_{i=1}^{2} Σ_{j=1}^{n_i} [ I(Z_{ij} ≤ u) + I(−Z_{ij} ≤ u) ],   (5.2)

where I(·) is the indicator function; under H_0 the two samples share the symmetric distribution F_s, so F̃ pools all n_1 + n_2 differences.
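A minimal sketch of the Schuster (1975)-type estimator (5.2), assuming symmetry about zero (names illustrative):

```python
import numpy as np

def symmetrized_ecdf(sample):
    """Schuster-type estimator of a distribution function assumed symmetric
    about zero: the average of F_n(u) and P(-Z <= u)."""
    s = np.sort(np.asarray(sample, dtype=float))
    neg = np.sort(-s)
    def F_tilde(u):
        u = np.asarray(u, dtype=float)
        right = np.searchsorted(s, u, side="right") / s.size    # F_n(u)
        refl = np.searchsorted(neg, u, side="right") / s.size   # P(-Z <= u)
        return 0.5 * (right + refl)
    return F_tilde
```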
By virtue of Proposition 2.1 in Vexler and Gurevich (2010), we have that, for all integer m < n_1/2,

  Σ_{j=1}^{n_1} [ F_1(Z_{1(j+m)}) − F_1(Z_{1(j−m)}) ] ≤ 2m,   (5.3)

where Z_{1(j)} = Z_{1(1)} if j < 1, and Z_{1(j)} = Z_{1(n_1)} if j > n_1. Since

  (2m)^{−1} Σ_{j=1}^{n_1} [ F_1(Z_{1(j+m)}) − F_1(Z_{1(j−m)}) ] → 1, when m/n_1 → 0 as n_1 → ∞,

the equation (5.3) holds asymptotically as an equality.

By replacing the distribution functions in (5.3) by their empirical counterparts,

  F_{n_1}(u) = n_1^{−1} Σ_{j=1}^{n_1} I( Z_{1j} ≤ u ),

and applying the mean value theorem, the empirical version of the equation (5.3) then has the form of

  Σ_{j=1}^{n_1} f_1(Z_{1(j)}) ( Z_{1(j+m)} − Z_{1(j−m)} ) ≈ 2m.   (5.4)

This leads, by virtue of (5.1) and (5.2), to the approximation

  f_s(Z_{1(j)}) ( Z_{1(j+m)} − Z_{1(j−m)} ) ≈ F̃(Z_{1(j+m)}) − F̃(Z_{1(j−m)}).   (5.5)

Now, by the equations (5.1), (5.2), and (5.5), the resulting empirical constraint for the values of f_1(Z_{1(j)}) is

  Σ_{j=1}^{n_1} f_1(Z_{1(j)}) ( Z_{1(j+m)} − Z_{1(j−m)} ) = 2m.   (5.6)

To find the values of f_1(Z_{1(j)}) that maximize the likelihood Π_{j=1}^{n_1} f_1(Z_{1(j)}) provided that the condition (5.6) holds, we formalize the Lagrange function as

  Σ_{j=1}^{n_1} log f_1(Z_{1(j)}) + λ [ 2m − Σ_{j=1}^{n_1} f_1(Z_{1(j)}) ( Z_{1(j+m)} − Z_{1(j−m)} ) ],

where λ is a Lagrange multiplier. Maximizing the equation above, the values of f_1(Z_{1(j)}) have the form of

  f̂_1(Z_{1(j)}) = 2m / [ n_1 ( Z_{1(j+m)} − Z_{1(j−m)} ) ],

where Z_{1(j)} = Z_{1(1)} if j < 1, and Z_{1(j)} = Z_{1(n_1)} if j > n_1. Combining f̂_1 with the estimator f̂_s(Z_{1(j)}) = [ F̃(Z_{1(j+m)}) − F̃(Z_{1(j−m)}) ] / ( Z_{1(j+m)} − Z_{1(j−m)} ) obtained from (5.5), the density-based EL estimator of the ratio Π_{j=1}^{n_1} f_1(Z_{1(j)}) / f_s(Z_{1(j)}) can be formulated as

  Π_{j=1}^{n_1} 2m / { n_1 [ F̃(Z_{1(j+m)}) − F̃(Z_{1(j−m)}) ] }.   (5.7)

One can show that properties of the statistic (5.7) strongly depend on the selection of values of the integer parameter m. Attending to this issue, we eliminate the dependence

on the integer parameter m. Towards this end, we utilize the maximum EL concept in a similar manner to arguments proposed in Vexler and Gurevich (2010). Thus, the modified test statistic can be written as

  V_1 = min_{ a(n_1) ≤ m ≤ b(n_1) } Π_{j=1}^{n_1} 2m / { n_1 [ F̃(Z_{1(j+m)}) − F̃(Z_{1(j−m)}) ] },

where a(n) and b(n) are integer bounds specified in Section 5.2.3. Likewise, the approximation to the likelihood ratio component based on the Z_2-sample is

  V_2 = min_{ a(n_2) ≤ k ≤ b(n_2) } Π_{j=1}^{n_2} 2k / { n_2 [ F̃(Z_{2(j+k)}) − F̃(Z_{2(j−k)}) ] },   (5.8)

where F̃ is defined in (5.2).

Finally, the proposed test statistic for Test 1 has the form of

  T_1 = log V_1 + log V_2,   (5.9)

which approximates the likelihood ratio LR. Consequently, the decision rule is to reject H_0 if

  T_1 > C_1,

where C_1 is a test threshold. Proposition 5.1 in Section 5.2.3 will demonstrate that the proposed test based on (5.9) is asymptotically consistent. The upper and lower bounds for the integer parameters m and k in the definitions (5.7) and (5.8), respectively, were selected to provide the asymptotic consistency. Note that, to test the composite hypotheses H_0 vs. H_1, a complex consideration regarding a reasonable combination of the Kolmogorov-Smirnov test and the Wilcoxon signed rank test can be applied (see, for example, Section 5.3.2). Alternatively, the test (5.9) uses measurements from the therapy groups, in an approximate Neyman-Pearson manner, providing a simple procedure to evaluate the treatment effect in each therapy group. Section 5.3 shows that, in various situations, the test (5.9) is superior to the combinations of the classic tests based on the Kolmogorov-Smirnov and Wilcoxon procedures. It is also shown that the proposed nonparametric test has power comparable with that of correct parametric likelihood ratio tests. Thus, in the context of the study described in Section 5.1, the direct application of the density-based EL ratio test (5.9) provides an efficient evaluation of treatment effects for ADHD and SMD in children.
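To fix ideas, the following sketch evaluates the reconstructed Test 1 statistic (5.9), with the symmetrized empirical distribution function built from the pooled sample as assumed in the reconstruction above. The bounds a(n) = n^{0.5+δ} and b(n) = min(n^{1−δ}, n/2) and the zero-increment guard are assumptions of this sketch rather than quotations of the original definitions; it reuses symmetrized_ecdf from the sketch above.

```python
import numpy as np

def test1_statistic(z1, z2, delta=0.1):
    """Reconstructed Test 1 statistic (5.9): for each group, a density-based
    EL component evaluated against the symmetrized ECDF of the pooled sample;
    large values reject H0 (equal, symmetric distributions)."""
    pooled = np.concatenate([np.asarray(z1, float), np.asarray(z2, float)])
    F_tilde = symmetrized_ecdf(pooled)
    total = 0.0
    for z in (np.sort(np.asarray(z1, dtype=float)),
              np.sort(np.asarray(z2, dtype=float))):
        n = z.size
        lo = max(1, int(np.floor(n ** (0.5 + delta))))
        hi = max(lo, min(int(np.ceil(n ** (1.0 - delta))), n // 2))
        j = np.arange(n)
        best = np.inf
        for m in range(lo, hi + 1):
            dF = (F_tilde(z[np.minimum(j + m, n - 1)])
                  - F_tilde(z[np.maximum(j - m, 0)]))
            dF = np.maximum(dF, 1.0 / (2 * pooled.size))  # guard zero increments
            best = min(best, float(np.sum(np.log(2.0 * m / (n * dF)))))
        total += best
    return total
```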
5.2.2.2  TEST 2: H_0 vs. H_2
Our goal is to test

  H_0: F_1(u) = F_2(u) = 1 − F_2(−u), for all u, vs. H_2: F_1(u) ≠ 1 − F_1(−u) and F_2(u) = 1 − F_2(−u).

In a similar manner to the development of the density-based EL approximation to the ratio considered in Section 5.2.2.1, the EL ratio related to the test for H_0 vs. H_2 can be defined as

  LR_2 = Π_{i=1}^{2} Π_{j=1}^{n_i} f_i(Z_{i(j)}) / f_s(Z_{i(j)}),   (5.10)

where the estimators F̃ and F_{n_1} needed below are defined in (5.2) and (5.4). Consider the density-based EL approximation to the corresponding ratio LR_2. The empirical constraint for the values of f_1(Z_{1(j)}) can be constructed based on the symmetric property of f_2 under H_2. By analogy with the equations (5.1)-(5.6), one can show that the resulting empirical constraint on the values of f_1(Z_{1(j)}) in Test 2 has the form of

  Σ_{j=1}^{n_1} f_1(Z_{1(j)}) ( Z_{1(j+m)} − Z_{1(j−m)} ) = 2mc,   (5.11)

where the constant c is expressed through increments of the estimators F̃ and F_{n_1} over windows of the order statistics, i.e., through terms of the forms [F̃(Z_{1(j+m)}) − F̃(Z_{1(j−m)})] and [F_{n_1}(Z_{1(j+m)}) − F_{n_1}(Z_{1(j−m)})] (the formal derivation of the constant is given in Appendix A.3.1). Then the corresponding Lagrange function can be formulated as

  Σ_{j=1}^{n_1} log f_1(Z_{1(j)}) + λ [ 2mc − Σ_{j=1}^{n_1} f_1(Z_{1(j)}) ( Z_{1(j+m)} − Z_{1(j−m)} ) ],   (5.12)

where λ is a Lagrange multiplier. Thus approximate values of f_1(Z_{1(j)}) are

  f̂_1(Z_{1(j)}) = 2mc / [ n_1 ( Z_{1(j+m)} − Z_{1(j−m)} ) ],   (5.13)

where Z_{1(j)} = Z_{1(1)} if j < 1, and Z_{1(j)} = Z_{1(n_1)} if j > n_1. Similarly to (5.7) and (5.8), the density-based EL estimator of the ratio Π_{j=1}^{n_1} f_1(Z_{1(j)}) / f_s(Z_{1(j)}) can be presented as

  V_1^{(2)} = min_{ a(n_1) ≤ m ≤ b(n_1) } Π_{j=1}^{n_1} 2mc / { n_1 [ F̃(Z_{1(j+m)}) − F̃(Z_{1(j−m)}) ] }.   (5.14)
Finally, taking into account (5.10) and (5.14), the proposed test statistic for Test 2 can be constructed as

  T_2 = log V_1^{(2)} + log V_2,   (5.15)

where V_2 is defined in (5.8). In this case, the decision rule developed for Test 2 is to reject the null hypothesis if

  T_2 > C_2,   (5.16)

where C_2 is a test threshold.

5.2.2.3  TEST 3: H_0 vs. H_3

Consider the following hypotheses of interest:

  H_0: F_1(u) = F_2(u) = 1 − F_2(−u), for all u, vs. H_3: F_1(u) = F_2(u) ≠ 1 − F_2(−u).

The corresponding likelihood ratio test statistic based on the pooled observations Z can be written as

  LR_3 = Π_{j=1}^{n_1+n_2} g(W_{(j)}) / g_s(W_{(j)}),

where g and g_s denote the density functions of the observations Z under H_3 and H_0, respectively, and W_{(1)} ≤ … ≤ W_{(n_1+n_2)} are the order statistics based on the pooled sample of {Z_{1j}} and {Z_{2j}}. Using the same technique as in Section 5.2.2.1, we derive the values of g(W_{(j)}) and g_s(W_{(j)}), j = 1, …, n_1 + n_2, that maximize the log likelihood given a constraint, an empirical form of ∫ g(u) du = 1. The proposed test statistic for Test 3 is

  T_3 = min_{ a(N) ≤ m ≤ b(N) } Σ_{j=1}^{N} log ( 2m / { N [ F̃(W_{(j+m)}) − F̃(W_{(j−m)}) ] } ), N = n_1 + n_2,   (5.17)

where F̃ is the symmetrized empirical distribution function (5.2) based on the pooled sample, and W_{(j)} = W_{(1)} if j < 1, W_{(j)} = W_{(N)} if j > N. Thus, the decision rule for Test 3 is to reject the null hypothesis if

  T_3 > C_3,   (5.18)

where C_3 is a test threshold.
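A companion sketch for the reconstructed Test 3 statistic (5.17), which applies the same symmetry component to the pooled differences (same assumptions and illustrative names as in the Test 1 sketch; reuses symmetrized_ecdf):

```python
import numpy as np

def test3_statistic(z1, z2, delta=0.1):
    """Reconstructed Test 3 statistic (5.17): a one-sample symmetry statistic
    applied to the pooled differences, using the symmetrized pooled ECDF."""
    w = np.sort(np.concatenate([z1, z2]).astype(float))
    F_tilde = symmetrized_ecdf(w)
    n = w.size
    lo = max(1, int(np.floor(n ** (0.5 + delta))))
    hi = max(lo, min(int(np.ceil(n ** (1.0 - delta))), n // 2))
    j = np.arange(n)
    best = np.inf
    for m in range(lo, hi + 1):
        dF = (F_tilde(w[np.minimum(j + m, n - 1)])
              - F_tilde(w[np.maximum(j - m, 0)]))
        dF = np.maximum(dF, 1.0 / (2 * n))  # guard zero increments
        best = min(best, float(np.sum(np.log(2.0 * m / (n * dF)))))
    return best
```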

5.2.3 ASYMPTOTIC CONSISTENCY OF THE TESTS


In this section, we present the following propositions to demonstrate the asymptotic consistency of the proposed tests.

Proposition 5.1. Let f_i(·), i = 1, 2, be density functions with the finite expectations E|log f_i(Z_{i1})| and E|log f_i(−Z_{i1})|, and let the integer bounds in the definitions (5.7) and (5.8) satisfy a(n)/n^{1/2} → ∞ and b(n)/n → 0, as n → ∞. Then, under H_0,

  (n_1 + n_2)^{−1} T_1 → c, as n_1, n_2 → ∞ (c is a constant),

while, under H_1,

  (n_1 + n_2)^{−1} T_1 → q_1 > c, as n_1, n_2 → ∞,

where q_1 is a constant expressed through the expectations E{log f_i(Z_{i1})} and E{log f_i(−Z_{i1})}, i = 1, 2, and through ρ = lim n_1/(n_1 + n_2).

Proof. We outline the proof in Appendix A.3.2.

Consider the testing problem H_0 vs. H_3.

Proposition 5.2. Let the pooled sample Z_{11}, …, Z_{1n_1}, Z_{21}, …, Z_{2n_2} have the density function g(·) with the finite expectations E|log g(Z)| and E|log g(−Z)|. Then, under H_0,

  (n_1 + n_2)^{−1} T_3 → c_3, as n_1, n_2 → ∞ (c_3 is a constant),

and, under H_3,

  (n_1 + n_2)^{−1} T_3 → q_3 > c_3, as n_1, n_2 → ∞,

where q_3 is a constant expressed through the expectations E{log g(Z)} and E{log g(−Z)}.

Proof. We omit the proof, since it is similar to the proof of Proposition 5.1.

5.2.4 NULL DISTRIBUTIONS OF THE PROPOSED TEST


STATISTICS
To obtain critical values of the proposed tests, we utilize the fact that the proposed test statistics are based only on the indicator functions I(Z_{ij} ≤ u) and I(−Z_{ij} ≤ u) evaluated at the order statistics, i.e., on the random variables F_{n_i}(Z_{i(j)}) and F̃(Z_{i(j)}), whose joint behavior under H_0 is that of functions of uniform, Unif[0, 1], random variables. Thus, the distributions of the proposed test statistics are independent of the distributions of observations, and hence the critical values of the proposed tests can be exactly computed. For each proposed test, we conducted the following procedure to determine the critical values, C_1, C_2, and C_3, of the null distributions. We first generated data Z_1 and Z_2 from the standard normal distribution N(0, 1) and then calculated the test statistics corresponding to each proposed test. At each sample size n_1 and n_2, we obtained 50,000 generated values of the test statistics (5.9), (5.15), and (5.17), with δ = 0.1, tabulating the critical values for the null distributions of the test statistics at the significance levels α = 0.01, 0.05, 0.1 (see Table 5.2).

Table 5.2: The critical values for Test 1 by (5.9) (Test 2 by (5.15)) [Test 3 by (5.17)] with δ = 0.1 for different sample sizes (n_1, n_2) and significance levels α.

n_1   α      n_2 = 10            n_2 = 15            n_2 = 20            n_2 = 25            n_2 = 30            n_2 = 35
10    0.01   7.44 (5.75) [4.17]  7.18 (5.85) [4.18]  7.39 (6.02) [4.18]  7.33 (6.17) [4.16]  7.45 (6.33) [4.25]  7.47 (6.20) [4.42]
      0.05   5.22 (3.86) [2.68]  5.06 (3.93) [2.82]  5.25 (4.09) [2.84]  5.35 (4.28) [2.85]  5.38 (4.37) [2.97]  5.42 (4.43) [3.11]
      0.1    4.27 (3.11) [2.14]  4.19 (3.17) [2.27]  4.39 (3.35) [2.30]  4.48 (3.52) [2.36]  4.56 (3.60) [2.46]  4.59 (3.68) [2.60]
15    0.01                       6.83 (5.39) [4.15]  6.96 (5.57) [4.16]  6.98 (5.68) [4.24]  6.88 (5.80) [4.33]  6.96 (5.81) [4.39]
      0.05                       4.90 (3.73) [2.83]  5.02 (3.89) [2.85]  5.15 (4.04) [2.97]  5.14 (4.11) [3.12]  5.20 (4.18) [3.13]
      0.1                        4.09 (3.07) [2.30]  4.25 (3.24) [2.34]  4.38 (3.38) [2.45]  4.36 (3.45) [2.59]  4.44 (3.52) [2.63]
20    0.01                                           6.86 (5.44) [4.24]  6.96 (5.55) [4.38]  6.99 (5.69) [4.41]  7.01 (5.73) [4.42]
      0.05                                           5.11 (3.94) [2.96]  5.24 (4.11) [3.11]  5.25 (4.19) [3.16]  5.27 (4.26) [3.19]
      0.1                                            4.35 (3.33) [2.45]  4.48 (3.47) [2.58]  4.51 (3.55) [2.64]  4.52 (3.62) [2.69]
25    0.01                                                               7.08 (5.62) [4.40]  7.01 (5.76) [4.44]  7.10 (5.85) [4.56]
      0.05                                                               5.33 (4.18) [3.15]  5.36 (4.25) [3.22]  5.37 (4.32) [3.34]
      0.1                                                                4.56 (3.55) [2.64]  4.59 (3.63) [2.70]  4.62 (3.69) [2.82]
30    0.01                                                                                   6.89 (5.63) [4.55]  6.99 (5.75) [4.68]
      0.05                                                                                   5.31 (4.23) [3.32]  5.33 (4.27) [3.44]
      0.1                                                                                    4.59 (3.65) [2.80]  4.61 (3.68) [2.93]
35    0.01                                                                                                       6.91 (5.68) [4.64]
      0.05                                                                                                       5.35 (4.28) [3.47]
      0.1                                                                                                        4.64 (3.69) [2.98]

Remark 5.1: The definitions (5.9), (5.15), and (5.17) of the proposed test statistics include the parameter δ that determines the integer bounds a(n) and b(n). We set δ = 0.1. To investigate the test statistics with different values of δ, we conducted an extensive Monte Carlo study. The Monte Carlo powers of the proposed tests were not found to depend significantly on the values of δ. These experimental results are similar to those shown in Gurevich and Vexler (2011).

5.3

SIMULATION STUDY

In this section, we examine the power properties of the proposed tests in various cases using Monte Carlo simulations. The proposed tests based on (5.9), (5.15), and (5.17), with δ = 0.1, are compared with common test procedures: the maximum likelihood ratio (MLR) tests, assuming parametric conditions on the distributions of observations (for details of the constructions and definitions of the MLR tests, see Appendix A.3.3), and combined classic nonparametric tests with a structure based on the Wilcoxon signed rank test and/or the Kolmogorov-Smirnov test. We fixed the significance level of the tests at α = 0.05 in all considered cases.

5.3.1 POWER COMPARISON WITH THE PARAMETRIC


METHOD
In order to present the comparative power of the proposed tests versus the corresponding MLR tests, we performed the following Monte Carlo study. Critical values of the MLR test statistics were obtained based on 50,000 simulations under H_0 with N(0,1)-distributed observations Z. To study the powers of the tests, 10,000 samples of each size (n_1, n_2) were generated from a variety of distributions. Tables 5.3-5.5 depict the Monte Carlo powers of the proposed tests and those of the corresponding MLR tests.

When observations are normally distributed, as anticipated, the MLR tests are more powerful than the proposed nonparametric tests. The tables show that the powers of the proposed tests are very close to those of the MLR tests, demonstrating that the density-based EL tests are comparable to the parametric methods that utilize the correct information regarding the distributions of observations. Table 5.6 displays the actual type I errors of the MLR tests under misspecification of the underlying distributions, i.e., when observations were simulated from t distributions with different degrees of freedom, a logistic distribution with parameters (0, 1), a Laplace distribution with parameters (0, 1), and Unif[0, 1] under H_0. As can be seen from Table 5.6, the type I errors of the MLR tests for H_0 vs. H_1 and H_0 vs. H_2 are not under control when the degrees of freedom of the t distribution are less than 200. For the cases of the logistic and Laplace distributions, the type I errors of these MLR tests are not well controlled either. When the observations are from Unif[0, 1], the impact of the misspecification of the model on the type I errors of the MLR tests is even more significant. This illustrates that the considered MLR tests strongly depend on assumptions regarding the distributions of observations.

5.3.2 POWER COMPARISON WITH CLASSIC


NONPARAMETRIC METHODS
In this section, we compare the power of the proposed tests to the power of procedures based on the classic nonparametric tests. Since Tests 1 and 2 are based on composite hypotheses regarding between-group differences and treatment effects, the Kolmogorov-Smirnov test and the Wilcoxon signed rank test cannot be directly applied to test these hypotheses. In this case, one can perform combined tests based on the Kolmogorov-Smirnov test and the Wilcoxon signed rank test for H_0 vs. H_1 and H_0 vs. H_2. For the comparison, we used combined nonparametric tests with the Bonferroni method. Let W-test denote the Wilcoxon signed rank test and K-S test denote the Kolmogorov-Smirnov test. The combined nonparametric test for H_0 vs. H_1 consists of two W-tests for symmetry and one K-S test based on Z_1 and Z_2. The former tests are employed to assess a treatment effect in each therapy group, whereas the latter test is conducted to detect the group difference. Similarly, we performed the combined nonparametric test for H_0 vs. H_2 that includes one W-test and one K-S test. The classical procedure for H_0 vs. H_3 is the W-test for symmetry.

To test H_0 vs. H_1 and H_0 vs. H_2, we assigned different distributions to the baseline measurements X and the endpoint observations Y in each group under the alternative hypothesis, whereas when testing H_0 vs. H_3, we directly generated observations Z from three cases of symmetric distributions under H_0: N(0, 1); Unif[-1, 1]; and a third symmetric distribution. Tables 5.7-5.9 contain the results of the power comparisons of the two different testing procedures: the proposed procedures and the nonparametric testing procedures based on the W-test and/or K-S test using the Bonferroni approach.

The Monte Carlo outputs shown in Tables 5.7-5.9 indicate that the new tests have higher power than the combined nonparametric tests. In particular, for the cases of small sample sizes (e.g., (n_1, n_2) = (10, 10), (25, 25)), the proposed tests are markedly superior to the classic tests. In several cases, the powers of the proposed tests are 3-4 times larger than those of the combined nonparametric tests.

5.4

DATA ANALYSIS

In this section, we apply the proposed method to the study described in Section 5.1, which evaluates treatment effects of ADHD and SMD in children. Study subjects were randomized to receive either the experimental 11-week group therapy program (n_1 = 17) or community psychosocial treatment (n_2 = 15). We defined the former as group 1 and the latter as group 2. For each child enrolled in the study, CDRS-Rts was taken at the baseline (week 0) and endpoint (week 11). Specifically, we computed the differences of CDRS-Rts, Z_{ij} = Y_{ij} − X_{ij}, for i = 1, 2; j = 1, …, n_i, where X_{ij} stands for the CDRS-Rts assessed at baseline before subject j receives treatment i, and Y_{ij} represents the CDRS-Rts at the endpoint after subject j receives treatment i. The empirical histograms of the CDRS-Rts at baseline and endpoint for each group are shown in Figure 5.1. As can be seen from Figure 5.1, both therapy groups show a decline in the CDRS-Rts after baseline, but the decrease appears to be more pronounced in group 1.

In the context of the study's interest in testing the claim that the distributions of the changes in CDRS-Rts are not equivalent with respect to the therapy groups, or that at least one therapy group has a treatment effect, we performed the proposed Test 1. In this case, the observed value of the test statistic (5.9), with δ = 0.1, is 22.8217 and the corresponding p-value is 0.00002, indicating that the null hypothesis of no group differences and the lack of treatment effects in both groups is rejected. The combined nonparametric test (the two W-tests and one K-S test) also rejects the null hypothesis, with the p-value 0.000005. Based on these results, there is strong evidence to reject the null hypothesis.

In addition, to demonstrate the applicability of the proposed tests, we carried out Test 2, which might be appropriate to test the assertion that there is a treatment effect in one group and no such effect in the other, besides a group difference. The observed value of the test statistic (5.15), with δ = 0.1, is 11.9370 and the corresponding p-value is 4·10^{-5}. The combined nonparametric test (the W-test and one K-S test) with the Bonferroni method also supports the result to reject the null hypothesis, with a p-value of 5·10^{-7}. These results show that the proposed procedures are in agreement with the classic tests, demonstrating that our proposed tests can be utilized in the ADHD and SMD study.
In addition to the analysis above, we also conducted a bootstrap-type study. To perform this study for Test 1, we executed the following procedure. We randomly selected samples with sample sizes of (n_1, n_2) = (9, 6), (9, 7), (11, 9), (13, 10), (13, 11), (15, 13) from the original dataset. Then we calculated the corresponding test statistic T_1 by (5.9), where δ = 0.1. We repeated this strategy 10,000 times, calculating the proportion of rejections of the null hypothesis at α = 0.05; that is, we computed the percentage of times when T_1 > C_1. The bootstrap-type study for Test 2 was also carried out following the same procedure as described above. The results regarding the proportion of rejections of the null hypothesis for each considered test are provided in Table 5.10.
Table 5.10 demonstrates that the proposed procedures have a larger proportion of rejections in comparison with the combined nonparametric tests. In particular, when the sample sizes are relatively small (e.g., (n_1, n_2) = (9, 6), (9, 7)), the differences in the proportions of rejections between the two approaches are strongly recognizable. For example, we selected a sample of size 9 from group 1 and a sample of size 6 from group 2. This sub-dataset was tested for the hypotheses H_0 vs. H_1 (Test 1). In contrast to the result that the nonparametric test based on the two W-tests and one K-S test for H_0 vs. H_1 is not statistically significant (the Bonferroni-adjusted p-values of these classic tests are 0.0617, 0.1050, and 0.9873, respectively), the proposed Test 1, with δ = 0.1, is statistically significant (p-value = 0.0005). Figure 5.2 shows the empirical histograms of Z_1 and Z_2 from the sub-dataset. All these results indicate that the proposed methods for Tests 1 and 2 are more sensitive at detecting the difference between the null hypothesis and the alternative hypotheses involved in Tests 1 and 2, compared to the corresponding combined nonparametric tests.

5.5

CONCLUSIONS

In this chapter, we proposed and examined two-sample density-based EL ratio tests based on paired observations. In constructing the tests, we used approximations to the most powerful test statistics with respect to the stated problems, providing efficient nonparametric procedures. The proposed tests are shown to be exact and simple to perform. The extensive Monte Carlo studies confirmed the powerful properties of the proposed tests. We showed that our tests outperform different tests with a structure based on the Wilcoxon signed rank test and/or the Kolmogorov-Smirnov test, and outperform the parametric likelihood ratio tests when the underlying distributions are misspecified. The data example illustrated that the proposed tests can be easily and efficiently used in practice.

Table 5.3: The Monte Carlo powers of Test 1 by (5.9) vs. the MLR test for H_0 vs. H_1 with different sample sizes (n_1, n_2) at the significance level α = 0.05. Columns X_1, Y_1 and X_2, Y_2 give the baseline and endpoint distributions in groups 1 and 2 under the alternative.

X_1            Y_1             X_2             Y_2             n_1  n_2  Proposed (5.9)  MLR
N(0, 1)        N(0.2, 0.25²)   N(0.1, 0.5²)    N(0.5, 1)       10   10   0.1541          0.1579
                                                               50   50   0.6232          0.6921
N(1, 1)        N(3, 2²)        N(1, 2²)        N(3, 1.5²)      10   10   0.8245          0.8319
                                                               25   25   0.9993          0.9996
N(2.5, 0.8²)   N(1.5, 0.5²)    N(1, 1.5²)      N(1.5, 0.6²)    10   10   0.7267          0.7351
                                                               25   25   0.9956          0.9978
N(2, 1.5²)     N(5, 2.5²)      N(0, 3²)        N(3, 1.5²)      10   10   0.9174          0.9199
                                                               25   25   0.9998          1.0000
N(0.3, 0.5²)   N(0.5, 1)       N(0.25, 0.25²)  N(0.5, 0.5²)    10   10   0.2818          0.4497
                                                               50   50   0.9764          0.9993
N(0, 1)        N(0.5, 1)       N(0.5, 1.1²)    N(1, 0.5²)      10   10   0.2450          0.2461
                                                               50   50   0.8731          0.9274
N(0.1, 1)      N(0.5, 1)       N(1.5, 1.1²)    N(1, 1.3²)      10   10   0.1605          0.1732
                                                               50   50   0.6476          0.7450
N(0.5, 0.5²)   N(1, 1)         N(0, 1)         N(0, 1)         10   10   0.1723          0.1764
                                                               50   50   0.7171          0.7942
N(0, 1)        N(0, 1)         N(1.5, 1.1²)    N(1, 1.3²)      10   10   0.1125          0.1300
                                                               50   50   0.4321          0.5531
N(0, 1)        N(0.5, 1)       N(0.5, 1.2²)    N(1, 0.5²)      10   10   0.2208          0.2230
                                                               50   50   0.8348          0.8785

Table 5.4: The Monte Carlo powers of Test 2 by (5.15) vs. the MLR test for H_0 vs. H_2 with different sample sizes (n_1, n_2) at the significance level α = 0.05. Columns X_1, Y_1 give the baseline and endpoint distributions in group 1; Z_2 is the symmetric distribution of the group-2 differences.

X_1          Y_1           Z_2       n_1  n_2  Proposed (5.15)  MLR
N(0, 0.5²)   N(0.5, 1)     N(0, 1)   10   10   0.2325           0.2351
                                     50   50   0.7989           0.8560
N(0, 0.5²)   N(1.5, 1)     N(0, 1)   10   10   0.9564           0.9593
                                     15   15   0.9957           0.9967
N(0, 1)      N(1.5, 1)     N(0, 1)   10   10   0.8770           0.8991
                                     25   25   0.9995           0.9997
N(0, 1.5²)   N(3, 2²)      N(0, 1)   10   10   0.9795           0.9971
                                     15   15   0.9989           1.0000
N(1, 0.5²)   N(2, 1.5²)    N(0, 1)   10   10   0.5219           0.6194
                                     50   50   0.9975           0.9997
Table 5.5: The Monte Carlo powers of Test 3 by (5.17) vs. the MLR test for H_0 vs. H_3 with different sample sizes (n_1, n_2) at the significance level α = 0.05. Both groups share the same baseline (X) and endpoint (Y) distributions under the alternative.

X_1 = X_2       Y_1 = Y_2       n_1  n_2  Proposed (5.17)  MLR
N(1, 1)         N(1.5, 1.5²)    10   10   0.2058           0.2191
                                50   50   0.6709           0.7919
N(1, 1)         N(1.5, 1)       10   10   0.3059           0.3235
                                50   50   0.8661           0.9407
N(0, 0.5²)      N(0.6, 1)       10   10   0.5922           0.6377
                                50   50   0.9974           0.9996
N(0.5, 0.25²)   N(1, 1)         10   10   0.5072           0.5444
                                50   50   0.9869           0.9977
N(2.5, 1.25²)   N(2, 0.5²)      10   10   0.3383           0.3509
                                50   50   0.9004           0.9582

Table 5.6: The Monte Carlo type I errors of the MLR tests for the hypotheses of Tests 1, 2 and 3 when the underlying distributions are misspecified (e.g., Logistic(0,1), Laplace(0,1), Unif[0, 1]), with sample sizes n = 10 and n = 50. [The row-by-row layout of this table did not survive the text extraction. The recoverable entries show that the Monte Carlo type I errors of the MLR tests for the hypotheses of Tests 1 and 2 inflate well above the nominal level in the misspecified scenarios (e.g., reaching 0.2934 and 0.3260 at n = 50, and 0.9993-1.0000 in one scenario), whereas the corresponding entries for the MLR test for Test 3 remain between 0.0428 and 0.0508 throughout; one scenario keeps all three tests near the nominal 0.05 level.]

Table 5.7: The Monte Carlo powers of the proposed test (5.9) vs. the combined nonparametric test (the two Wilcoxon signed rank tests and one Kolmogorov-Smirnov test) at the significance level $\alpha$.

Underlying distributions                                    (n1, n2)   Proposed test at (5.9)   W and K-S tests
Exp(1), Lognorm(0, 2²); N(0, 1), N(0.5, 1.5²)               (10, 10)   0.2238                   0.0946
                                                            (50, 50)   0.9616                   0.6273
Lognorm(1, 1), Lognorm(1, 0.5²); N(0, 1), N(1.5, 2²)        (10, 10)   0.3646                   0.2754
                                                            (50, 50)   0.9953                   0.9849
Exp(3), Lognorm(0, 2²); Gamma(5, 1), Gamma(1, 5)            (10, 10)   0.6321                   0.5016
Gamma(1, 10), [label lost]; N(0, 1), N(0.5, 2²)             (10, 10)   0.3815                   0.1179
                                                            (50, 50)   1.0000                   0.8576
Exp(1), Cauchy(1, 1); N(0.5, 1), N(1.5, 2²)                 (10, 10)   0.1819                   0.1255
                                                            (50, 50)   0.7928                   0.7426
Exp(1), Lognorm(0, 2²); Unif[-1, 1], Unif[-1, 1]            (10, 10)   0.2325                   0.0836
                                                            (50, 50)   0.9981                   0.6939

Table 5.8: The Monte Carlo powers of the proposed test (5.15) vs. the combined nonparametric test (the one Wilcoxon signed rank test and one Kolmogorov-Smirnov test) at the significance level $\alpha$.

Underlying distributions                                    (n1, n2)   Proposed test at (5.15)   W and K-S tests
Exp(3), N(1.5, 2²), N(0, 1)                                 (10, 10)   0.5638                    0.2876
                                                            (50, 50)   0.9995                    0.9816
Lognorm(1, 1), Lognorm(1.3, 1.5²), N(0, 1)                  (10, 10)   0.6164                    0.1279
Exp(1), Beta(1, 1), N(0, 1)                                 (10, 10)   0.2256                    0.1193
                                                            (50, 50)   0.9983                    0.8046
Gamma(1, 5), [label lost], N(0, 1)                          (10, 10)   0.5582                    0.1454
                                                            (25, 25)   0.9949                    0.9789
Exp(1), Cauchy(1, 1), N(0, 1)                               (10, 10)   0.1613                    0.0401
                                                            (50, 50)   0.9736                    0.2323
Exp(1.5), N(0.5, 1), Unif[-1, 1]                            (10, 10)   0.1677                    0.0384
                                                            (50, 50)   0.9984                    0.3042
Lognorm(1, 0.5²), Lognorm(1.1, 0.5²), Unif[-1, 1]           (10, 10)   0.4484                    0.0808
                                                            (25, 25)   0.9963                    0.9574
Exp(1.5), Beta(3, 1), Unif[-1, 1]                           (10, 10)   0.1328                    0.0934
                                                            (50, 50)   0.8774                    0.4128
Gamma(2, 1), [label lost], Unif[-1, 1]                      (10, 10)   0.7688                    0.4005
Exp(1), Cauchy(1, 1), Unif[-1, 1]                           (10, 10)   0.4052                    0.0608
Exp(3), N(1.5, 2²), [label lost]                            (10, 10)   0.4198                    0.2553
Lognorm(1, 1), Lognorm(1.2, 1), [label lost]                (10, 10)   0.2892                    0.0657
                                                            (50, 50)   0.8807                    0.6635
Exp(1.5), Beta(2, 1), [label lost]                          (10, 10)   0.2204                    0.0648
                                                            (50, 50)   0.9997                    0.4062
Gamma(10, 1), [label lost], [label lost]                    (10, 10)   0.9044                    0.6699
Exp(1), Cauchy(1, 1), [label lost]                          (10, 10)   0.0792                    0.0316
                                                            (50, 50)   0.2891                    0.0877

Table 5.9: The Monte Carlo powers of the proposed test (5.17) vs. the Wilcoxon signed rank test at the significance level $\alpha$.

Underlying distributions                                            (n1, n2)   Proposed test at (5.17)   W test
Exp(1), Lognorm(0, 2²); Exp(1), Lognorm(0, 2²)                      (10, 10)   0.4136                    0.2933
                                                                    (50, 50)   0.9992                    0.9223
Lognorm(1, 1), Lognorm(1, 0.5²); Lognorm(1, 1), Lognorm(1, 0.5²)    (10, 10)   0.1218                    0.0731
                                                                    (50, 50)   0.7906                    0.2208
Gamma(5, 1), Gamma(1, 5); Gamma(5, 1), Gamma(1, 5)                  (10, 10)   0.1218                    0.0886
                                                                    (50, 50)   0.8074                    0.2294
Gamma(1, 10), [label lost]; Gamma(1, 10), [label lost]              (10, 10)   0.3125                    0.2344
                                                                    (50, 50)   0.9629                    0.8517
Beta(0, 0.8), Exp(1.5); Beta(0, 0.8), Exp(1.5)                      (10, 10)   0.2094                    0.1518
                                                                    (50, 50)   0.9928                    0.6003

Table 5.10: The proportions of rejections^a based on the bootstrap method for each considered test.

Bootstrapped sample sizes   Test 1                                Test 2
(n1, n2)                    Proposed test (5.9)   Classic test^b  Proposed test (5.15)   Classic test^c
(9, 6)                      0.9858                0.7135          0.9755                 0.9134
(9, 7)                      0.9870                0.7172          0.9800                 0.9165
(11, 9)                     0.9989                0.9795          0.9955                 0.9844
(13, 10)                    0.9998                0.9962          0.9993                 0.9972
(13, 11)                    0.9999                0.9965          0.9997                 0.9975
(15, 13)                    1.0000                0.9997          0.9999                 0.9994

a. The proportion of rejections of each test from the bootstrap method was computed based on sample sizes (n1, n2) and 10,000 replications; b. The combined classic test for $H_0$ vs. $H_1$ is based on two W-tests and one K-S test; c. The combined classic test for $H_0$ vs. $H_1$ is based on one W-test and one K-S test.



Figure 5.1: Histograms of CDRS-Rts related to the baseline and endpoint in group 1 and in group 2.

Figure 5.2: Histograms of the paired observations based on the CDRS-Rts data, with sample sizes (n1, n2) = (9, 6), that were sampled from the original data set.


CHAPTER 6

OPTIMAL PROPERTIES OF PARAMETRIC SHIRYAEV-ROBERTS STATISTICAL CONTROL PROCEDURES

6.1 INTRODUCTION

In this chapter, we study parametric Shiryaev-Roberts type procedures that can be applied to key problems of statistical process control, including retrospective and sequential change point detection problems. Considerations of these problems are very important in the context of quality and reliability controls, special topics of statistical inference, as well as in experimental and mathematical sciences (e.g., Lai, 1995; Gurevich and Vexler, 2010).
Firstly, we outline a main principle of the proof related to the Neyman-Pearson fundamental lemma (e.g., Vexler and Gurevich, 2011). To this end, let us define $\delta \in [0,1]$ and $A$, $B$ to be any real numbers. Then, it is clear that
\[
(A - B)\big(I\{A \ge B\} - \delta\big) \ge 0, \tag{6.1}
\]
where $I\{\cdot\}$ is the indicator function. This inequality can be easily applied to evaluate optimal properties of decision rules. For example, consider the simple classification problem where, given a sample of $k$ independent and identically distributed observations $X_1,\dots,X_k$, we want to test the hypothesis
\[
H_0 : X_1,\dots,X_k \sim F_0 \quad \text{versus} \quad H_1 : X_1,\dots,X_k \sim F_1. \tag{6.2}
\]
Here $F_0$ and $F_1$ are known distributions with the density functions $f_0(x)$ and $f_1(x)$, respectively. The inequality (6.1) determines that the most powerful test for (6.2) is the likelihood ratio test that rejects $H_0$ if $\prod_{i=1}^{k} f_1(X_i)\big/\prod_{i=1}^{k} f_0(X_i) \ge C$, where $C$ is a fixed threshold. This classical proposition directly follows from (6.1), when $A = \prod_{i=1}^{k} f_1(X_i)\big/\prod_{i=1}^{k} f_0(X_i)$, $B = C$, $\delta$ is considered as any decision rule based on the observed sample, and the expectation, under $H_0$, is derived from both sides of (6.1).
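As a minimal numerical illustration of this proposition, the following Python sketch computes the likelihood ratio for (6.2) on the log scale and calibrates the threshold $C$ by Monte Carlo under $H_0$; the normal densities, sample size and level are assumptions of the example, not part of the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_lr(x, logf0, logf1):
    """Log of prod f1(X_i) / prod f0(X_i): the most powerful statistic for (6.2)."""
    return np.sum(logf1(x)) - np.sum(logf0(x))

# Illustrative choice: F0 = N(0, 1), F1 = N(1, 1), k = 10 observations.
logf0 = lambda t: -0.5 * t**2 - 0.5 * np.log(2 * np.pi)
logf1 = lambda t: -0.5 * (t - 1.0)**2 - 0.5 * np.log(2 * np.pi)

k, mc = 10, 100_000
# Calibrate log C so that P_{H0}(log LR >= log C) = alpha = 0.05.
null_stats = np.array([log_lr(rng.normal(0.0, 1.0, k), logf0, logf1)
                       for _ in range(mc)])
log_C = np.quantile(null_stats, 0.95)

x = rng.normal(1.0, 1.0, k)              # data actually drawn from F1
print(log_lr(x, logf0, logf1) >= log_C)  # reject H0?
```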
Although the example mentioned above is very simple, the inequality (6.1) can be applied to show different aspects of optimality related to the operating characteristics of complex test statistics. In this chapter, we utilize the trivial inequality (6.1) to provide simple proofs of non-asymptotic optimal properties of retrospective Shiryaev-Roberts procedures. We consider situations related to retrospective change point detection, proposing accordingly adjusted forms of the retrospective Shiryaev-Roberts procedure. The problem of detecting more than one change point is also analyzed in this chapter. A real data example is provided to demonstrate the applicability of the proposed approach in practice. Considering sequential change point problems, we show that any given sequential test can be evaluated via an application of (6.1)-type inequalities that provide optimal properties of this test, although explaining this optimality in terms of the classical operating characteristics of tests is a complicated task. The presented analysis of the sequential Shiryaev-Roberts procedure and its non-asymptotic optimal property clearly demonstrates this issue.
All sections of this chapter are supplied with brief introductions related to the corresponding problem statements. In Section 6.2, we consider the retrospective AMOC (at most one change) change point problem and review the techniques addressed in the literature. Theoretical results, which show a non-asymptotic optimal property of the retrospective Shiryaev-Roberts procedures, are also presented. In Section 6.3, we propose and analyze in detail adjusted forms of the retrospective Shiryaev-Roberts procedure for detecting two changes in distributions of independent observations. This section clearly demonstrates how this procedure can be adapted for use in multiple change-points detection. A real data example introduced in Section 6.3 demonstrates that the proposed generalized Shiryaev-Roberts procedures can be easily applied in practice. Section 6.4 presents results related to sequential change point problems. We outline here the proof of a non-asymptotic optimality of the sequential Shiryaev-Roberts procedure. We present the main conclusions in Section 6.5.

6.2 RETROSPECTIVE CHANGE POINT DETECTION

The scientific literature has shown a significant interest in investigations related to retrospective change point detection problems (e.g., Page, 1954, 1955; Chernoff and Zacks, 1964; James et al., 1987; Gombay and Horvath, 1994; Gurevich and Vexler, 2005, 2010). These problems are directly associated with process capability and are important in biostatistics, engineering, education, economics and other fields (see, e.g., Sen and Srivastava, 1975). The literature presents change point detection problems as key issues that belong to testing statistical hypotheses. This section focuses on the problem of detecting a change in the distribution of independent data points. These data points are assumed to be observed before we run the requested procedure to analyze the homogeneity of the data points.

Thus, let us set up $X_1,\dots,X_n$ to be independent observations with density functions $g_1,\dots,g_n$, respectively. The retrospective change point problem can be formulated via the notation related to hypothesis testing, when we want to test the null hypothesis
\[
H_0 : g_i = f_0 \ \text{for all}\ i = 1,\dots,n,
\]
versus the alternative hypothesis
\[
H_1 : g_1 = \dots = g_{\nu-1} = f_0,\quad g_\nu = \dots = g_n = f_1,\quad \nu\ \text{is unknown}. \tag{6.3}
\]
The unknown parameter $\nu$, $2 \le \nu \le n$, is called a change point. The statistical literature has investigated the problem (6.3) in parametric and nonparametric settings. In the parametric case of (6.3), it is assumed that the density functions $f_0$ and $f_1$ have known forms that can contain certain unknown parameters (e.g., James et al., 1987; Gombay and Horvath, 1994; Chernoff and Zacks, 1964; Kander and Zacks, 1966; Sen and Srivastava, 1975). In the nonparametric case of (6.3), the functions $f_0$, $f_1$ are assumed to be completely unknown (e.g., Wolfe and Schechtman, 1984; Gurevich, 2006; Gurevich and Vexler, 2010). The common distribution-free procedures for change point detection are based on signs and/or ranks and/or U-statistics (e.g., Gombay, 2001; Gurevich, 2006).
In this chapter, we attend to the parametric case of the change point problem (6.3). Such situations are widely addressed in both the theoretical and applied literature. Chernoff and Zacks (1964) considered the problem (6.3) with normally distributed observations. They assumed a uniform prior distribution for the change point and suggested a Bayesian approach to construct the detection rule. Kander and Zacks (1966) adapted Chernoff and Zacks's method to a case based on data from one-parameter exponential families. In this framework, Sen and Srivastava (1975) presented a test statistic utilizing the maximum likelihood methodology; James et al. (1987) proposed decision rules based on likelihood ratios and recursive residuals; Gombay and Horvath (1994) considered the general case, defining density functions $f_0(u) = f(u;\theta_0)$, $f_1(u) = f(u;\theta_1)$, $\theta_0 \ne \theta_1$, where the parameters $\theta_0$, $\theta_1$ are unknown. They suggested using the maximal likelihood ratio $Z_n = \max_{2\le k\le n} 2\log\Lambda_k$, where
\[
\Lambda_k = \frac{\sup_{\theta_0}\prod_{i=1}^{k-1} f(x_i;\theta_0)\,\sup_{\theta_1}\prod_{j=k}^{n} f(x_j;\theta_1)}{\sup_{\theta_0}\prod_{i=1}^{n} f(x_i;\theta_0)}. \tag{6.4}
\]
Gombay and Horvath's test rule is to reject $H_0$ for large values of $Z_n$.
Following the aims of this chapter, let us begin with a consideration related to a simple situation, where the density functions $f_0$ and $f_1$ are known. In this case, the maximum likelihood estimation of the change point parameter employed in the likelihood ratio $\prod_{i=\nu}^{n} f_1(X_i)/f_0(X_i)$ leads us to the well-known CUSUM test (e.g., Gurevich and Vexler, 2010). That is, we should reject $H_0$ if and only if $\max_{1\le k\le n}\prod_{i=k}^{n} f_1(X_i)/f_0(X_i) \ge C$, where $C > 0$ is a threshold. Alternatively, Vexler (2006) proposed and examined a test based on the Shiryaev-Roberts approach: reject $H_0$ if
\[
\frac{1}{n}R_n = \frac{1}{n}\sum_{k=1}^{n}\prod_{i=k}^{n}\frac{f_1(X_i)}{f_0(X_i)} \ge C, \tag{6.5}
\]
where $C > 0$ is a threshold. Optimal properties of the CUSUM procedure have not been addressed in the retrospective change point literature. Vexler (2006) and Vexler and Gurevich (2011) showed the following non-asymptotic optimal property of the test (6.5) for the problem (6.3). Let $P_k$ and $E_k$ ($k = 0,\dots,n$) respectively denote probability and expectation conditional on $\nu = k$ (the case $k = 0$ corresponds to $H_0$). Setting $A = R_n/n$ and $B = C$ in (6.1) implies
\[
\Big(\frac{1}{n}R_n - C\Big)\Big(I\Big\{\frac{1}{n}R_n \ge C\Big\} - \delta\Big) \ge 0. \tag{6.6}
\]
Without loss of generality, we suppose that $\delta \in [0,1]$ is any decision rule based on $X_1,\dots,X_n$ such that the event $\delta = 1$ leads us to reject $H_0$. Because
\[
E_0\Big[\frac{1}{n}R_n\,\delta\Big]
= \frac{1}{n}\sum_{k=1}^{n}\int\cdots\int \delta \prod_{i=k}^{n}\frac{f_1(x_i)}{f_0(x_i)}\prod_{i=1}^{n} f_0(x_i)\,dx_i
= \frac{1}{n}\sum_{k=1}^{n}\int\cdots\int I\{\delta = 1\}\prod_{i=1}^{k-1} f_0(x_i)\prod_{i=k}^{n} f_1(x_i)\prod_{i=1}^{n} dx_i
= \frac{1}{n}\sum_{k=1}^{n} P_k(\delta = 1),
\]
the derivations of the $H_0$-expectation applied to both the left and right sides of (6.6) provide the next proposition.

Proposition 6.1. The test (6.5) is the average most powerful test, i.e.
\[
\frac{1}{n}\sum_{k=1}^{n} P_k\Big(\frac{1}{n}R_n \ge C\Big) - C\,P_0\Big(\frac{1}{n}R_n \ge C\Big)
\ge \frac{1}{n}\sum_{k=1}^{n} P_k(\delta\ \text{rejects}\ H_0) - C\,P_0(\delta\ \text{rejects}\ H_0),
\]
for any decision rule $\delta \in [0,1]$ based on the observations $X_1, X_2,\dots,X_n$.
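A minimal sketch of how the statistic (6.5) and the CUSUM maximum can be computed for known $f_0$, $f_1$ follows; the normal densities and the change location are assumptions of this illustration, and working with tail sums of log likelihood ratios is purely an implementation device for numerical stability.

```python
import numpy as np

def retro_stats(x, logf0, logf1):
    """Retrospective statistics for known f0, f1: the CUSUM maximum
    max_{1<=k<=n} prod_{i=k}^n f1(X_i)/f0(X_i) and the Shiryaev-Roberts
    average (1/n) R_n of (6.5), via tail sums of log likelihood ratios."""
    llr = logf1(x) - logf0(x)            # log f1(X_i)/f0(X_i), i = 1..n
    tails = np.cumsum(llr[::-1])[::-1]   # sum_{i=k}^{n} llr_i for k = 1..n
    return np.exp(tails.max()), np.exp(tails).mean()

# Illustration: f0 = N(0,1), f1 = N(1,1), change at nu = 31 in a series of 50.
rng = np.random.default_rng(0)
x = np.r_[rng.normal(0, 1, 30), rng.normal(1, 1, 20)]
logf0 = lambda t: -0.5 * t**2            # common (2*pi)^{-1/2} factors cancel
logf1 = lambda t: -0.5 * (t - 1.0)**2
cusum_stat, sr_stat = retro_stats(x, logf0, logf1)
```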


Remark 6.1: The test statistic (6.5) can be easily modified when the density functions $f_0$ and $f_1$ are known up to parameters. For example, one can use an approach to adapt (6.5) to this case via the mixture technique described below (e.g., Krieger et al., 2003). For instance, consider the problem (6.3) with a known density function $f_0$, and $f_1(u) = f_1(u;\theta)$, where $\theta$ is an unknown parameter. In this case, we can represent the test (6.5) following the mixture methodology. That is, we choose a prior $\pi(\theta)$ and pretend that $\theta \sim \pi(\theta)$. Hence, the mixture Shiryaev-Roberts type statistic has the form of
\[
\frac{1}{n}R_n^{(1)} = \frac{1}{n}\sum_{k=1}^{n}\int \prod_{i=k}^{n}\frac{f_1(X_i;\theta)}{f_0(X_i)}\,d\pi(\theta).
\]
This definition provides the following property of the adapted change point detection scheme:
\[
\frac{1}{n}\sum_{k=1}^{n}\int P_k\Big(\frac{1}{n}R_n^{(1)} \ge C \,\Big|\, X_j,\ j \ge k,\ \text{are from}\ f_1(\cdot;\theta)\Big)\,d\pi(\theta) - C\,P_0\Big(\frac{1}{n}R_n^{(1)} \ge C\Big)
\]
\[
\ge \frac{1}{n}\sum_{k=1}^{n}\int P_k\Big(\delta\ \text{rejects}\ H_0 \,\Big|\, X_j,\ j \ge k,\ \text{are from}\ f_1(\cdot;\theta)\Big)\,d\pi(\theta) - C\,P_0(\delta\ \text{rejects}\ H_0),
\]
for any decision rule $\delta \in [0,1]$ based on the observations $X_1,\dots,X_n$. This optimality is again obtained using the inequality (6.1). In this case, the meaning of the optimality mentioned in Proposition 6.1 is modified to be integrated over values of the unknown parameter with respect to the function $\pi$.
A different approach for the case where $f_0(u) = f(u;\theta_0)$, $f_1(u) = f(u;\theta_1)$, $\theta_0 \ne \theta_1$, is to adapt the CUSUM and Shiryaev-Roberts tests to be the following rules: reject $H_0$ if
\[
\max_{2\le k\le n}\Lambda_k \ge C_1, \tag{6.7}
\]
and, respectively,
\[
\frac{1}{n}\sum_{k=2}^{n}\Lambda_k \ge C_2, \tag{6.8}
\]
where $C_1, C_2 > 0$ are thresholds and the ratios $\Lambda_k$, $k = 2,\dots,n$, are defined in (6.4).
(6.4). Gurevich and Vexler (2010) conducted an extensive Monte Carlo study to compare
various change point procedures. The powers of the modified CUSUM test (6.7) and the
modified Shiryaev-Roberts test (6.8) were compared for different families of the null-and
the alternative-distributions. It was shown that the test (6.8) is more powerful (not just
more powerful in average) than the test (6.7) in most of considered scenarios. However,
123

when the change point location is relatively very close to 1, the test (6.7) is better than
the test (6.8). Monte Carlo experiments presented in Gurevich and Vexler (2010) also
confirmed that the modified Shiryaev-Roberts test statistic (6.8) is usually more robust
than the CUSUM test statistic (6.7) with respect to misclassifications regarding the data
distributions.
Remark 6.2: Gurevich and Vexler (2010) proposed distribution-free forms of the CUSUM and Shiryaev-Roberts procedures, approximating nonparametrically the likelihood ratio components of (6.7) and (6.8). Gurevich and Vexler (2010) used Monte Carlo studies to show that comparisons of the nonparametric CUSUM and Shiryaev-Roberts tests give results similar to those related to the parametric tests' comparisons. The nonparametric form of the Shiryaev-Roberts test is more powerful (and always more powerful on average) than that of the CUSUM test in most scenarios.
The Shiryaev-Roberts procedure (6.5) can be easily adapted to be a multiple change-points detection procedure. In the next section, we propose an extended Shiryaev-Roberts procedure for the two change-points detection problem, presenting its non-asymptotic properties in detail. We also demonstrate an application of the proposed procedure to a real data example in that section.

6.3 RETROSPECTIVE DETECTION OF TWO CHANGE POINTS

In this section, we consider the problem of developing a test for
\[
H_0 : g_i = f_0 \ \text{for all}\ i = 1,\dots,n
\]
versus
\[
H_1 : g_1 = \dots = g_{\nu_1-1} = f_1;\quad g_{\nu_1} = \dots = g_{\nu_2-1} = f_2;\quad g_{\nu_2} = \dots = g_n = f_3, \tag{6.9}
\]
where $\nu_1$, $\nu_2$ are unknown change points, $2 \le \nu_1 \le \nu_2 \le n$. The density functions $f_0, f_1, f_2$ and $f_3$ are not necessarily known. We propose to apply the following adjusted Shiryaev-Roberts statistic for the problem (6.9):
\[
\frac{1}{n}R_n^{(2)} = \frac{1}{n}\sum_{k_1=1}^{n}\sum_{k_2=k_1}^{n}
\frac{\prod_{i=1}^{k_1-1} f_1(X_i)\prod_{j=k_1}^{k_2-1} f_2(X_j)\prod_{l=k_2}^{n} f_3(X_l)}{\prod_{i=1}^{n} f_0(X_i)}.
\]
Then, we reject $H_0$ if
\[
\frac{1}{n}R_n^{(2)} \ge C_\alpha, \tag{6.10}
\]
where $C_\alpha$ is a test threshold at the significance level $\alpha$.
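A direct $O(n^2)$ evaluation of this statistic for known densities can be sketched as follows; the vectorized log-density callables and the use of prefix sums are implementation choices of the example rather than part of the definition above.

```python
import numpy as np

def sr_two_changes(x, logf0, logf1, logf2, logf3):
    """(1/n) R_n^{(2)}: average over 1 <= k1 <= k2 <= n of the likelihood
    ratio with segments [1, k1-1] ~ f1, [k1, k2-1] ~ f2, [k2, n] ~ f3
    (when k1 = k2 the middle f2-segment is empty)."""
    n = len(x)
    c1 = np.concatenate(([0.0], np.cumsum(logf1(x))))   # prefix sums of log f1
    c2 = np.concatenate(([0.0], np.cumsum(logf2(x))))
    c3 = np.concatenate(([0.0], np.cumsum(logf3(x))))
    c0 = np.sum(logf0(x))
    total = 0.0
    for k1 in range(1, n + 1):
        for k2 in range(k1, n + 1):
            log_ratio = (c1[k1 - 1]                     # sum_{i < k1} log f1
                         + (c2[k2 - 1] - c2[k1 - 1])    # sum_{k1 <= j < k2} log f2
                         + (c3[n] - c3[k2 - 1])         # sum_{l >= k2} log f3
                         - c0)
            total += np.exp(log_ratio)
    return total / n
```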

6.3.1 NON-ASYMPTOTIC OPTIMAL PROPERTIES OF THE SHIRYAEV-ROBERTS TEST (6.10)

Let $P_{k_1,k_2}$ denote probability conditional on $\nu_1 = k_1$ and $\nu_2 = k_2$. The next proposition presents a non-asymptotic property of the proposed test (6.10).

Proposition 6.2. The proposed test (6.10) is the average most powerful test for (6.9) with known density functions $f_0, f_1, f_2$ and $f_3$, i.e.
\[
\frac{1}{n}\sum_{k_1=1}^{n}\sum_{k_2=k_1}^{n} P_{k_1,k_2}\Big(\frac{1}{n}R_n^{(2)} \ge C_\alpha\Big)
\ge \frac{1}{n}\sum_{k_1=1}^{n}\sum_{k_2=k_1}^{n} P_{k_1,k_2}(\delta\ \text{rejects}\ H_0),
\]
for any decision rule $\delta \in [0,1]$ with fixed $P_{H_0}(\delta\ \text{rejects}\ H_0) = \alpha$ based on the observations $X_1,\dots,X_n$.

Proof. The corresponding proof scheme is similar to that of Proposition 6.1. That is, using Equation (6.1) with $A = R_n^{(2)}/n$ and $B = C_\alpha$, we can write
\[
\Big(\frac{1}{n}R_n^{(2)} - C_\alpha\Big)\Big(I\Big\{\frac{1}{n}R_n^{(2)} \ge C_\alpha\Big\} - \delta\Big) \ge 0. \tag{6.11}
\]
Taking the $H_0$-expectation on both sides of (6.11), we have
\[
E_{H_0}\Big[\frac{1}{n}R_n^{(2)} I\Big\{\frac{1}{n}R_n^{(2)} \ge C_\alpha\Big\}\Big] - C_\alpha P_{H_0}\Big(\frac{1}{n}R_n^{(2)} \ge C_\alpha\Big)
\ge E_{H_0}\Big[\frac{1}{n}R_n^{(2)}\,\delta\Big] - C_\alpha E_{H_0}\delta. \tag{6.12}
\]
It is enough to note that, utilizing Equation (6.12), we can complete the proof, since
\[
E_{H_0}\Big[\frac{1}{n}R_n^{(2)}\,\delta\Big]
= \frac{1}{n}\sum_{k_1=1}^{n}\sum_{k_2=k_1}^{n}\int\cdots\int \delta\,
\frac{\prod_{i=1}^{k_1-1} f_1(x_i)\prod_{j=k_1}^{k_2-1} f_2(x_j)\prod_{l=k_2}^{n} f_3(x_l)}{\prod_{i=1}^{n} f_0(x_i)}
\prod_{i=1}^{n} f_0(x_i)\,dx_i
\]
\[
= \frac{1}{n}\sum_{k_1=1}^{n}\sum_{k_2=k_1}^{n}\int\cdots\int I\{\delta = 1\}
\prod_{i=1}^{k_1-1} f_1(x_i)\prod_{j=k_1}^{k_2-1} f_2(x_j)\prod_{l=k_2}^{n} f_3(x_l)\prod_{i=1}^{n} dx_i
= \frac{1}{n}\sum_{k_1=1}^{n}\sum_{k_2=k_1}^{n} P_{k_1,k_2}(\delta\ \text{rejects}\ H_0).
\]

When the forms of the density functions $f_0, f_1, f_2$ and $f_3$ depend on unknown parameters, one can apply:

a. The mixture approach, in which a class of likelihood ratio test statistics is constructed via the Bayesian methodology (see Remark 6.1). Let
\[
f_s(u) = f(u;\theta_s),\quad s = 0,\dots,3, \tag{6.13}
\]
where $\theta_0$ is an unknown parameter and the vector of unknown parameters $(\theta_1,\theta_2,\theta_3)$ has a known prior $\pi(\theta_1,\theta_2,\theta_3)$. Then the mixture Shiryaev-Roberts statistic takes the form of
\[
\frac{1}{n}R_n^{(3)} = \frac{1}{n}\sum_{k_1=1}^{n}\sum_{k_2=k_1}^{n}\int
\frac{\prod_{i=1}^{k_1-1} f_1(X_i;\theta_1)\prod_{j=k_1}^{k_2-1} f_2(X_j;\theta_2)\prod_{l=k_2}^{n} f_3(X_l;\theta_3)}{\prod_{i=1}^{n} f_0(X_i;\hat{\theta}_0)}\,d\pi(\theta_1,\theta_2,\theta_3), \tag{6.14}
\]
where $\hat{\theta}_0 = \arg\max_{\theta}\prod_{i=1}^{n} f_0(X_i;\theta)$ is the maximum likelihood estimator, under the null hypothesis, of $\theta_0$ based on the observations $X_1,\dots,X_n$. The appropriate test rejects $H_0$ if
\[
\frac{1}{n}R_n^{(3)} \ge C_\alpha, \tag{6.15}
\]
where $C_\alpha$ is a test threshold at the significance level $\alpha$. The following proposition presents an optimal property of the test (6.15).

Proposition 6.3. In the class of all detection rules $\delta \in [0,1]$ based on the observations $X_1,\dots,X_n$ for the problem (6.9) with the density functions $f_0, f_1, f_2, f_3$ presented at (6.13), the test (6.15) is the average integrated most powerful test with respect to the prior $\pi(\theta_1,\theta_2,\theta_3)$ for a fixed estimate of the significance level $\int \delta(x_1,\dots,x_n)\, f_0(x_1,\dots,x_n;\hat{\theta}_0)\prod_{i=1}^{n} dx_i$, i.e.
\[
\frac{1}{n}\sum_{k_1=1}^{n}\sum_{k_2=k_1}^{n}\int P_{k_1,k_2}\Big(\frac{1}{n}R_n^{(3)} \ge C_\alpha\Big)\,d\pi(\theta_1,\theta_2,\theta_3) - C_\alpha\,P_{H_0}\Big(\frac{1}{n}R_n^{(3)} \ge C_\alpha\Big)
\]
\[
\ge \frac{1}{n}\sum_{k_1=1}^{n}\sum_{k_2=k_1}^{n}\int P_{k_1,k_2}(\delta\ \text{rejects}\ H_0)\,d\pi(\theta_1,\theta_2,\theta_3) - C_\alpha\,P_{H_0}(\delta\ \text{rejects}\ H_0).
\]
Proof. The proof is similar to those of Propositions 6.1 and 6.2.


Example 6.1. Let $X_r \sim N(\theta_0, \sigma_0^2)$, $r = 1,\dots,n$, under the null hypothesis; $X_i \sim N(\theta_1, \sigma_1^2)$, $i = 1,\dots,k_1-1$; $X_j \sim N(\theta_2, \sigma_2^2)$, $j = k_1,\dots,k_2-1$; and $X_l \sim N(\theta_3, \sigma_3^2)$, $l = k_2,\dots,n$, where $\sigma_s^2$, $s = 0,\dots,3$, are fixed known parameters, and $\theta_0$ is an unknown parameter. We assume that the priors for the parameters $\theta_j$, $j = 1,2,3$, under the alternative hypothesis, are normal densities, i.e. $\theta_j \sim N(\mu_j, \tau_j^2)$, $j = 1,2,3$. Then the mixture Shiryaev-Roberts statistic (6.14) has a closed form: each $(k_1, k_2)$-summand combines normalizing factors depending on the segment lengths $k_1 - 1$, $k_2 - k_1$ and $n - k_2 + 1$ through the ratios $\tau_s^2/\sigma_s^2$, the term $\exp(A_{k_1,k_2}/2)$ with
\[
A_{k_1,k_2} = \frac{(\bar{x}_1 - \mu_1)^2}{\sigma_1^2/(k_1-1) + \tau_1^2}
+ \frac{(\bar{x}_2 - \mu_2)^2}{\sigma_2^2/(k_2-k_1) + \tau_2^2}
+ \frac{(\bar{x}_3 - \mu_3)^2}{\sigma_3^2/(n-k_2+1) + \tau_3^2},
\]
and the estimated null likelihood $(\sigma_0\sqrt{2\pi})^{-n}\exp\{-\sum_{i=1}^{n}(x_i - \bar{x}_0)^2/(2\sigma_0^2)\}$, where
\[
\bar{x}_0 = \frac{1}{n}\sum_{i=1}^{n} x_i,\quad
\bar{x}_1 = \frac{1}{k_1-1}\sum_{i=1}^{k_1-1} x_i,\quad
\bar{x}_2 = \frac{1}{k_2-k_1}\sum_{j=k_1}^{k_2-1} x_j,\quad
\bar{x}_3 = \frac{1}{n-k_2+1}\sum_{l=k_2}^{n} x_l.
\]
[The exact arrangement of these factors in the original display did not survive the text extraction.]

b. The estimation approach, in which a class of likelihood ratio test statistics is constructed via maximum likelihood estimation of the parameters (see Remark 6.1). Let $f_s(u) = f(u;\theta_s)$, $s = 0,\dots,3$. Then the proposed modified Shiryaev-Roberts statistic has the form of
\[
\frac{1}{n}R_n^{(4)} = \frac{1}{n}\sum_{k_1=1}^{n}\sum_{k_2=k_1}^{n}
\frac{\sup_{\theta_1}\prod_{i=1}^{k_1-1} f_1(X_i;\theta_1)\,\sup_{\theta_2}\prod_{j=k_1}^{k_2-1} f_2(X_j;\theta_2)\,\sup_{\theta_3}\prod_{l=k_2}^{n} f_3(X_l;\theta_3)}{\sup_{\theta_0}\prod_{i=1}^{n} f_0(X_i;\theta_0)}. \tag{6.16}
\]
The appropriate test rejects $H_0$ if
\[
\frac{1}{n}R_n^{(4)} \ge C_\alpha, \tag{6.17}
\]
where $C_\alpha$ is a test threshold at the significance level $\alpha$.

Example 6.2. Assume $X_r \sim N(\mu_0, \sigma_0^2)$, $r = 1,\dots,n$, under the null hypothesis; $X_i \sim N(\mu_1, \sigma_1^2)$, $i = 1,\dots,k_1-1$; $X_j \sim N(\mu_2, \sigma_2^2)$, $j = k_1,\dots,k_2-1$; and $X_l \sim N(\mu_3, \sigma_3^2)$, $l = k_2,\dots,n$, where the expectations $\mu_s$ and variances $\sigma_s^2$, $s = 0,\dots,3$, are unknown. Then, the statistic (6.16) has the form of
\[
\frac{1}{n}R_n^{(4)} = \frac{1}{n}\sum_{k_1=1}^{n}\sum_{k_2=k_1}^{n}
\frac{(\hat{\sigma}_0^2)^{n/2}}{(\hat{\sigma}_1^2)^{(k_1-1)/2}\,(\hat{\sigma}_2^2)^{(k_2-k_1)/2}\,(\hat{\sigma}_3^2)^{(n-k_2+1)/2}}, \tag{6.18}
\]
where
\[
\hat{\sigma}_0^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \hat{\mu}_0)^2,\quad \hat{\mu}_0 = \frac{1}{n}\sum_{i=1}^{n}X_i;\qquad
\hat{\sigma}_1^2 = \frac{1}{k_1-1}\sum_{i=1}^{k_1-1}(X_i - \hat{\mu}_1)^2,\quad \hat{\mu}_1 = \frac{1}{k_1-1}\sum_{i=1}^{k_1-1}X_i;
\]
\[
\hat{\sigma}_2^2 = \frac{1}{k_2-k_1}\sum_{j=k_1}^{k_2-1}(X_j - \hat{\mu}_2)^2,\quad \hat{\mu}_2 = \frac{1}{k_2-k_1}\sum_{j=k_1}^{k_2-1}X_j;\qquad
\hat{\sigma}_3^2 = \frac{1}{n-k_2+1}\sum_{l=k_2}^{n}(X_l - \hat{\mu}_3)^2,\quad \hat{\mu}_3 = \frac{1}{n-k_2+1}\sum_{l=k_2}^{n}X_l.
\]
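A minimal Python sketch of (6.18) follows. Restricting each segment to length at least two, so that every maximum likelihood variance estimate is positive, is an implementation choice of this sketch rather than part of the definition, whose sums formally run over all $1 \le k_1 \le k_2 \le n$.

```python
import numpy as np

def sr_normal_mle(x):
    """A sketch of (1/n) R_n^{(4)} in (6.18): normal model, all means and
    variances unknown and replaced by segment-wise MLEs; the double sum
    is restricted to segments of length >= 2 (implementation choice)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s0 = np.mean((x - x.mean()) ** 2)            # MLE variance under H0
    total = 0.0
    for k1 in range(3, n - 2):                    # left segment size k1-1 >= 2
        for k2 in range(k1 + 2, n):               # middle, right segments >= 2
            a, b, c = x[:k1 - 1], x[k1 - 1:k2 - 1], x[k2 - 1:]
            s1 = np.mean((a - a.mean()) ** 2)
            s2 = np.mean((b - b.mean()) ** 2)
            s3 = np.mean((c - c.mean()) ** 2)
            log_ratio = (0.5 * n * np.log(s0)
                         - 0.5 * len(a) * np.log(s1)
                         - 0.5 * len(b) * np.log(s2)
                         - 0.5 * len(c) * np.log(s3))
            total += np.exp(log_ratio)
    return total / n
```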

6.3.2 A REAL DATA EXAMPLE

In this example, we demonstrate that the proposed test (6.17) can be easily applied in practice. We apply the proposed method to analyze data from a study presented in Wians et al. (2001). The same data set was utilized by Obuchowski (2006) and Tian et al. (2011). These authors compared the diagnostic abilities of different rapid blood test scores, including per cent transferrin saturation (%TS) and total iron binding capacity (TIBC), for determining blood iron concentrations. The data set was composed of 134 patients (55 females and 79 males) with anemia who underwent the series of blood tests. Following the previous works of Obuchowski (2006) and Tian et al. (2011), we focus on only the %TS and TIBC blood test scores and limit the analysis to the 55 female anemia patients. The plots and the empirical histograms based on the %TS and TIBC data are displayed in Figure 6.1.

Figure 6.1: The left-hand side of Figure 6.1 shows plots and histograms of the %TS data; the right-hand side of Figure 6.1 shows plots and histograms of the TIBC data.

Tian et al. (2011) categorized the study subjects into three groups based on the results of ferritin concentration, which provides a useful screening test for iron deficiency anemia (IDA). Non-pregnant women with anemia and a ferritin concentration less than 20 (μg/L) were assigned to the IDA group, while those with anemia and a ferritin concentration greater than 240 (μg/L) were assigned to the anemia of chronic disease (ACD) group. The intermediate group consists of the women with ferritin concentration between 20 and 240 (μg/L). There were 29, 14, and 12 female anemia patients in the IDA, intermediate, and ACD groups, respectively. The histograms of the %TS data and those of the TIBC measurements in each group are shown in Figures 6.2 and 6.3, respectively.

Figure 6.2: Histograms of %TS data in each group

Figure 6.3: Histograms of TIBC data in each group

Our interest is to detect whether the underlying distributions of the %TS data as well as the distributions of the TIBC measurements change at two different points. In this section, we formally test the assumption made by Tian et al. (2011), which suggested considering the %TS measurements as three groups, as well as the TIBC measurements split into three groups; i.e., there are two change points in the distributions of the %TS measurements and also two change points in the TIBC measurements' distributions.
Following the publications mentioned above in this section, we assume the %TS and TIBC data are distributed normally. Thus, we apply the test based on the statistic (6.18). The mean and standard deviation of the %TS data are 4.55 and 2.59, respectively, whereas the mean and standard deviation of the TIBC observations are 345 and 120 (μg/L), respectively. The means and standard deviations of the %TS and TIBC data in each group are presented in Table 6.1.
Table 6.1: Means and standard deviations of the %TS data and the TIBC data in each group

Group          Sample size n   Mean     Standard deviation
IDA            29              3.5276   1.8820
Intermediate   14              5.0714   2.5859
ACD            12              5.7500   2.0505

[Only one set of mean/standard deviation columns survived the text extraction.]

To approximate the p-value of the test (6.17), where the statistic $R_n^{(4)}/n$ is defined by (6.18), we propose the following methods.

6.3.2.1 THE METHODS FOR P-VALUE APPROXIMATION RELATED TO THE TEST (6.17)

In this section, we propose and apply two different methods for the p-value approximation related to the test (6.17) with the statistic $R_n^{(4)}/n$ given by (6.18).
1) The Monte Carlo technique. Since, given that the observations follow a normal distribution, the null distribution of the test statistic (6.18) does not depend on the parameters $\mu_0$ and $\sigma_0^2$ of the null normal distribution, we can conduct a Monte Carlo study to obtain the p-value of the test. To execute the Monte Carlo experiment, we first drew 50,000 replicate samples of 55 observations $X_i \sim N(0,1)$, $i = 1,\dots,55$, and evaluated the generated values of the test statistic, say, $r_j = R_{55}^{(4)}/55$ at one generation of $X_1,\dots,X_{55}$, $j = 1,\dots,50{,}000$. Let $r$ be the observed test statistic value based on the data. Then we determined the approximate p-value of the test as the proportion of cases in which the values of $r_j$, $j = 1,\dots,50{,}000$, exceed the value of $r$. Following the procedures mentioned above, we obtained a p-value of 0.0244 based on the %TS data and a p-value close to zero based on the TIBC measurements (p-value < 0.0001). Both p-values are less than the significance level of $\alpha = 0.05$; therefore, we recommend rejecting the null hypothesis, implying that there are changes at two points in both the %TS and TIBC data distributions.
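A sketch of this Monte Carlo scheme follows; the function `sr_normal_mle` is the hypothetical implementation of (6.18) from the previous sketch, and the generator seed and replication count are arbitrary choices of the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_pvalue(stat_fn, x, mc=50_000):
    """Monte Carlo p-value: the null distribution of (6.18) does not depend
    on (mu_0, sigma_0^2), so simulate standard normal samples of the same
    size and count how often the simulated statistic exceeds the observed one."""
    x = np.asarray(x, dtype=float)
    r_obs = stat_fn(x)
    null = np.array([stat_fn(rng.standard_normal(len(x))) for _ in range(mc)])
    return float(np.mean(null >= r_obs))

# Usage, with ts_data denoting the 55 %TS values:
# p_ts = mc_pvalue(sr_normal_mle, ts_data)
```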
2) Bootstrap calibration. The procedure of the bootstrap calibration (e.g., Owen, 2001) is defined as follows. Let $X_i^{*b}$, $i = 1,\dots,n$, $b = 1,\dots,B$, be independent random variables sampled from the empirical distribution function $F_n$ of the data $X_i$, $i = 1,\dots,n$. This resampling can be implemented by drawing $n$ random integers $\pi(i,b)$ independently from the discrete uniform distribution on $\{1,\dots,n\}$ and setting $X_i^{*b} = X_{\pi(i,b)}$. We use $n = 55$ and $B = 10{,}000$. Now let $H_b = R_{55}^{(4)}(X_1^{*b},\dots,X_{55}^{*b})/55$. This defines the order statistics $H_{(1)} \le H_{(2)} \le \dots \le H_{(B)}$. Then, the critical value of the test at the significance level $\alpha = 0.05$ is $H_{(9{,}500)}$. The p-value of the test can be evaluated by obtaining $q : H_{(q+1)} \ge r \ge H_{(q)}$, where $r$ is the value of the test statistic based on the original data set, and $1 - q/B$ approximates the p-value. The bootstrap procedure gives the corresponding p-values based on the %TS data and the TIBC measurements as 0.003 and 0.0001, respectively. Both p-values are less than the significance level $\alpha = 0.05$, supporting the conclusion that the underlying distributions of the %TS and TIBC measurements have significant changes at two different points.
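A sketch of the bootstrap calibration under the same assumptions (the statistic function is again the hypothetical `sr_normal_mle`):

```python
import numpy as np

rng = np.random.default_rng(2)

def bootstrap_pvalue(stat_fn, x, B=10_000, alpha=0.05):
    """Bootstrap calibration: resample n points with replacement from the
    empirical distribution F_n, sort the resampled statistics
    H_(1) <= ... <= H_(B), read off the critical value, and approximate
    the p-value by 1 - q/B."""
    x = np.asarray(x, dtype=float)
    r_obs = stat_fn(x)
    H = np.sort([stat_fn(rng.choice(x, size=len(x), replace=True))
                 for _ in range(B)])
    crit = H[int(np.ceil((1 - alpha) * B)) - 1]   # e.g. H_(9500) for B = 10,000
    q = int(np.searchsorted(H, r_obs, side="right"))
    return 1.0 - q / B, crit
```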
Therefore, both methods 1) and 2) suggest rejecting the null hypothesis. Note that, for method 1), it is important that the observations under the null hypothesis are independent and identically normally distributed, whereas for method 2), the observations, under the null hypothesis, are assumed to be just independent and identically distributed (i.i.d.). Consequently, in the case where the data are close to normally distributed, the type I errors of method 2) are expected to be very close to those of method 1).
6.3.2.2 ADDITIONAL STUDY

In this subsection, we consider a situation where no change is expected in the real data distributions. We now test the hypotheses (6.9) based on the %TS observations in the IDA group (n = 29). Using the Monte Carlo study and the bootstrap calibration mentioned above, we obtain corresponding p-values of 0.4118 and 0.1136, respectively. These results suggest that the distribution of the %TS data in the IDA group has no significant change in this case. Similarly, applying the Monte Carlo study and the bootstrap calibration, the p-values based on the TIBC observations in the IDA group are 0.5376 and 0.6533, respectively. These p-values indicate that there is no significant change in the distribution of the TIBC data in the IDA group in this case.

6.4 SEQUENTIAL CHANGE POINT DETECTION

There are extensive references in the statistics and engineering literature on the subject of quick detection, with a low false alarm rate, of changes in stochastic systems on the basis of sequential observations from the system. These problems are very important in the context of quality and reliability controls (e.g., Lai, 1995).
In many common situations, we assume that we survey sequentially independent observations $X_1, X_2,\dots$. Initially, the observations follow an in-control distribution with a density function $f_0$. It is possible that at time $\nu$, an unknown point in time, an accident takes effect, causing the distribution of the observations to change to an out-of-control distribution with a density function $f_1$.
A common performance measure for any inspection scheme is the in-control average run length (ARL). Let $T$ be the random variable corresponding to the time when the scheme signals that the process is out of control (the distribution of the observations has changed), which henceforth will be referred to as the stopping time. Thus, $T$ is the number of observations until the alarm signal. The in-control ARL is defined by $E_{f_0}T$, whereas the out-of-control ARL is defined by $E_{f_1}T$. Additionally, we define by $E_f T$ the expectation of the stopping time $T$ under the assumption that the observations come from a distribution with a density function $f$. Clearly, one desires $E_{f_0}T$ to be large and $E_{f_1}T$ to be small. In the literature, a proposed index of the speed of detection is $E_\nu(T - \nu + 1 \mid T \ge \nu)$, the expectation of the delay in detection given that the change is at point $\nu$ in time and given that the stopping time $T$ is larger than $\nu$.
In this section, we consider the observations $X_1, X_2,\dots,X_{\nu-1}$ to be distributed according to a density function $f_0$, whereas $X_\nu, X_{\nu+1},\dots$ come from a density function $f_1$, with an unknown $\nu$ $(1 \le \nu \le \infty)$. The case $\nu = \infty$ indicates the situation when all observations are distributed according to $f_0$. In this case, the notations $P_\infty$ and $E_\infty$ denote probability and expectation, respectively, when all observations are distributed according to $f_0$. The sequential change point detection procedures are expected to raise an alarm as soon as possible after the change, while avoiding false alarms. It is well known that the CUSUM and Shiryaev-Roberts procedures are efficient detection methods for this stated problem (e.g., Moustakides, 1986; Mei, 2006; Gurevich and Vexler, 2011). The CUSUM policy is: we stop sampling of $X$'s and report that a change in the distribution of $X$ has been detected at the first time $n \ge 1$ such that $\max_{1\le k\le n}\prod_{i=k}^{n} f_1(X_i)/f_0(X_i) \ge C$, for a given threshold $C$; similarly,
the Shiryaev-Roberts procedure can be defined via the stopping time
\[
T_C = \inf\{n \ge 1 : R_n \ge C\}, \tag{6.19}
\]
where the Shiryaev-Roberts test statistic $R_n$ is
\[
R_n = \sum_{k=1}^{n}\prod_{i=k}^{n}\frac{f_1(X_i)}{f_0(X_i)}. \tag{6.20}
\]
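For computation, $R_n$ in (6.20) satisfies the standard recursion $R_n = (1 + R_{n-1})\,f_1(X_n)/f_0(X_n)$ with $R_0 = 0$, so the rule (6.19) can be run online. A minimal Python sketch follows; the density callables are assumptions of the example.

```python
import numpy as np

def sr_stopping_time(stream, logf0, logf1, C):
    """Sequential Shiryaev-Roberts rule (6.19)-(6.20) via the recursion
    R_n = (1 + R_{n-1}) * f1(X_n)/f0(X_n), R_0 = 0; stop when R_n >= C."""
    R = 0.0
    for n, xn in enumerate(stream, start=1):
        R = (1.0 + R) * np.exp(logf1(xn) - logf0(xn))
        if R >= C:
            return n          # alarm time T_C
    return None               # no alarm within the observed stream
```

The threshold $C$ controls the trade-off described above: a larger $C$ yields a larger in-control ARL $E_\infty T_C$ at the price of longer detection delays.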

The sequential CUSUM detection procedure has a non-asymptotic optimal property (e.g., Moustakides, 1986). That is, if the initial and the final distributions of the observations are known, then the CUSUM control procedure detects a change in distribution most rapidly among all procedures with a common bound specifying an acceptable rate of false alarms, i.e., the in-control ARL. For the Shiryaev-Roberts procedure, an asymptotic (as $C \to \infty$) optimality has been shown (Pollak, 1985). To demonstrate the optimality of the Shiryaev-Roberts detection scheme (6.19)-(6.20), Pollak (1985) proved an asymptotic closeness of the expected loss to that of a Bayes rule for the change point problem with a known prior distribution of $\nu$.

However, in the context of a simple application of the inequality (6.1), the procedure (6.19) declares loss functions for which that detection policy is optimal. That is, setting $A = R_{\min(T_C,n)}$ and $B = C$ in (6.1) leads to
\[
\big(R_{\min(T_C,n)} - C\big)\big(I\{R_{\min(T_C,n)} \ge C\} - \delta\big) \ge 0,
\]
for all $\delta \in [0,1]$. Because $\{R_{\min(T_C,n)} \ge C\} = \{T_C \le n\}$, we have
\[
\big(R_{\min(T_C,n)} - C\big)\big(I\{R_{\min(T_C,n)} \ge C\} - \delta\big)
= \sum_{k=1}^{n}(R_k - C)(1-\delta)\,I\{T_C = k\} + (C - R_n)\,\delta\,I\{T_C > n\} \ge 0. \tag{6.21}
\]
It is clear that (6.21) can report an optimal property of the detection rule $T_C$. For simplicity, noting that every summand on the left side of the inequality (6.21) is non-negative, we can focus only on $(C - R_n)\,\delta\,I\{T_C > n\} \ge 0$. Thus, if $\tau$ is defined to be a stopping time and $\delta = I\{\tau \le n\}$, then
\[
E_\infty\big[(C - R_n)\,I\{\tau \le n,\ T_C > n\}\big] \ge 0. \tag{6.22}
\]
This and the definition (6.20) imply that
\[
C\big[P_\infty(T_C > n) - P_\infty(\min(\tau, T_C) > n)\big] - \sum_{k=1}^{n}\big[P_k(T_C > n) - P_k(\min(\tau, T_C) > n)\big] \ge 0.
\]
Therefore,
\[
C\sum_{n\ge 1}\big[P_\infty(T_C > n) - P_\infty(\min(\tau, T_C) > n)\big]
\ge \sum_{n\ge 1}\sum_{k=1}^{n}\big[P_k(T_C > n) - P_k(\min(\tau, T_C) > n)\big], \tag{6.23}
\]
where
\[
\sum_{n\ge 1}\sum_{k=1}^{n}\big[P_k(T_C > n) - P_k(\min(\tau, T_C) > n)\big]
= \sum_{k\ge 1}\sum_{n\ge k}\big[P_k(T_C > n) - P_k(\min(\tau, T_C) > n)\big]
= \sum_{k\ge 1}\big[E_k(T_C - k + 1)^{+} - E_k(\min(\tau, T_C) - k + 1)^{+}\big],
\]
and $a^{+} = a\,I\{a > 0\}$. The inequality (6.22) with (6.23) gives the next proposition.

Proposition 6.4. The Shiryaev-Roberts policy (6.19) satisfies
\[
\sum_{n=1}^{\infty}E_n(T_C - n + 1)^{+} - C\,E_\infty T_C
\le \min_{\tau}\Big\{\sum_{n=1}^{\infty}E_n(\min(\tau, T_C) - n + 1)^{+} - C\,E_\infty \min(\tau, T_C)\Big\}.
\]
Here, $E_\infty\tau$ presents the average run length to false alarm of a stopping rule $\tau$. Large values of $E_\infty\tau$ are preferable, whereas small values of $E_n(\tau - n + 1)^{+}$ are also preferable (because $E_n(\tau - n + 1)^{+}$ relates to the fallibility of the sequential detection in the case $\nu = n$). It is clear that, if $\tau \le T_C$, then $\min(\tau, T_C)$ detects the change faster than the stopping time $T_C$. Consequently, Proposition 6.4 states the non-asymptotic optimal property of the Shiryaev-Roberts sequential procedure in the context of a series of delays in the detection, considering the expectation $E_\nu(T - \nu + 1)^{+}$ as the index of the speed of the detection.

6.5 CONCLUSIONS

In this chapter, we introduced the general principles related to retrospective change point problems. We provided schemes to construct Shiryaev-Roberts type procedures corresponding to different change point problems. Although we considered the relatively simple statement of the problem (6.3), with independent observations, complex regression models (see, e.g., Vexler and Gurevich, 2009) can be evaluated in a manner similar to the constructions of the Shiryaev-Roberts procedures mentioned in this chapter. The Shiryaev-Roberts based procedures are appropriate replacements for the classical CUSUM policies in many practical applications, because the Shiryaev-Roberts based procedures are shown to demonstrate optimal properties. We also proposed the Shiryaev-Roberts procedure for detecting two changes in a sequence of independent observations, representing a way of developing Shiryaev-Roberts type procedures for multiple change-points detection. In this chapter, the real data example presented the applicability of the proposed technique for detecting two possible changes in the distributions of biomarker measurements. In this example, we pointed out two methods for estimating the p-value of the Shiryaev-Roberts type test. These methods are general and can be applied to estimate the p-values of other Shiryaev-Roberts type tests.


CHAPTER 7

FUTURE WORKS

Based on the results of the previous studies mentioned above, the newly created density-based EL methodology has been shown to be very efficient. This was expected, since the proposed test statistics approximate nonparametrically the most powerful likelihood ratios. There remain several open problems in biomedical research that can be investigated via EL approaches. Thus, my future work will continue along this line of research by developing different efficient EL ratio based tests that can be applied to important biomedical studies. Two possible topics of future work are briefly outlined as follows.

7.1 TESTING NORMALITY BASED ON INDEPENDENT SAMPLES AND ERRORS IN REGRESSION MODELS
In many biostatistical applications, researchers commonly depend on statistical models with distributional assumptions to conduct data analysis. The normal distribution is the most widely used for this purpose and plays a critical role in various areas. For instance, in biomedical studies, investigators often utilize the t-test or Wilcoxon's test to conduct two-sample comparisons, depending on whether the normality assumption is satisfied.
Joint normality of a set of independent samples is a crucial assumption necessary for the validity of many inferential procedures. Therefore, it is important to have an efficient method for combining normality assessments in relation to statistically independent samples. In the statistical literature, several classical procedures have been proposed for this testing problem; for example, we can mention Shapiro and Wilk (1968), Pettitt (1977), and Quesenberry et al. (1976). We plan to develop an efficient density-based EL test for the joint normality of a set of k independent samples, where the means and variances of the populations supposed to underlie the samples may not all be the same. The proposed test is an exact test, which is simple and readily used in practice. Although the formulation of the proposed test is for k-sample problems, our primary interest is in small values of k, particularly as small as two or three (i.e., k = 2 or 3).
In addition to the development of the previous test for the assessment of the joint normality of a collection of independent samples, we will make the extension to regression problems. We will develop a density-based EL ratio test to evaluate the validity of the assumption of normally distributed regression errors. We will investigate the power properties of the proposed tests in various alternative scenarios via extensive Monte Carlo studies and illustrate the practical applicability of the proposed tests using real data sets.

7.2 A SIMPLE DENSITY-BASED EMPIRICAL LIKELIHOOD RATIO TEST FOR INDEPENDENCE
Independence is a key concept in various critical statistical procedures. In many applications, the research interest often lies in exploring potential relationships between two sets of observations. For example, in the context of studies related to lung cancer, a complex multifactorial disease, statistical tests for relations between genetic factors and environmental factors play vital roles (e.g., Gu et al., 2012). Both the theoretical and applied statistical literature well addresses the classical measures of dependence for pairs of random variables in the forms of the Pearson ($r$), Spearman ($r_s$), and Kendall ($\tau$) correlation coefficients. These measures of dependence are not designed to efficiently capture all possible structures of dependency between two random variables. The Pearson correlation coefficient is a measure of the strength of the linear relationship between two random variables (e.g., Pearson, 1920; Hauke and Kossowski, 2011). The Spearman correlation coefficient is a measure of a monotonic association between two random variables (e.g., Spearman, 1904a; Hauke and Kossowski, 2011). The correlation coefficient $r_s$ is commonly utilized when the assumptions required to use the Pearson correlation coefficient do not hold. Neither the Pearson nor the Spearman correlation coefficient serves well to analyze nonlinear forms of dependence between two random variables (e.g., Embrechts et al., 2002). Practical data issues pose the problem of developing a general coefficient that can efficiently measure both linear and non-linear dependencies between random variables. The Kendall correlation coefficient ($\tau$) is a well-known measure of the concordance between two rankings associated with two sets of observations (Kendall, 1938, 1948). However, in many cases, the Kendall correlation coefficient shows relatively lower power compared to the former two classical measures $r$ and $r_s$ (e.g., Mudholkar and Wilding, 2003). Note that, for example, in the context of a lung cancer study, Gu et al. (2012) proposed considering relationships of the two polymorphisms rs1051730 and rs8034191 in random-effect-type forms that should be tested. In addition to linear/nonlinear dependence structures, the present applied biostatistical literature introduces random-effect-type associations between two sets of observations. We plan to develop a simple density-based EL test that can be applicable to all general cases of dependency, including linear, non-linear, and/or random-effect-type associations, between two random variables. Towards this end, we will construct an exact nonparametric likelihood ratio type test.
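As a small numerical illustration of the limitations discussed above (a sketch assuming numpy and scipy; the quadratic model is an arbitrary choice), all three classical coefficients stay near zero for a purely nonlinear dependence:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = x**2 + rng.normal(scale=0.5, size=200)   # strong but purely nonlinear link

r, _ = stats.pearsonr(x, y)      # linear association
rs, _ = stats.spearmanr(x, y)    # monotone association
tau, _ = stats.kendalltau(x, y)  # rank concordance
print(r, rs, tau)                # all three remain near zero here
```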

APPENDICES

A.1.1 PARAMETRIC LIKELIHOOD BASED ON THE HYBRID DESIGN

Under the normal assumption, the log likelihood function based on the pooled-unpooled data can be written out in closed form. Taking the corresponding first derivatives of the log likelihood function and setting them equal to zero, we obtain the system of equations whose solutions are the maximum likelihood estimators. Obtaining the second derivatives of the log likelihood function yields the Hessian matrix, and the Fisher information matrix $I$ can be calculated as the expectation of its negative; from this, the asymptotic distribution of the maximum likelihood estimators (2.1) can be derived. [The displayed formulas of this appendix did not survive the text extraction.]

A.1.2 THE ASYMPTOTIC DISTRIBUTION OF THE LOG EMPIRICAL LIKELIHOOD RATIO TEST

A.1.2.1 THE REPEATED MEASUREMENTS

Under the null hypothesis, the empirical likelihood function based on repeated measures data is obtained by maximizing the product of the weights subject to the empirical constraints. Thus, the EL ratio based on repeated measures data is given in terms of a Lagrange multiplier $\lambda$, where $\lambda$ is a root of the corresponding estimating equation. By applying the Taylor expansion, we obtain an approximate solution for $\lambda$, up to a remainder term (for details, see Owen, 2001). Now consider the log empirical likelihood ratio statistic. Again, applying the Taylor expansion of the logarithmic terms around zero and substituting the approximate solution of $\lambda$ that we obtained yields the asymptotic quadratic form of the statistic, so the log EL ratio is asymptotically chi-square distributed. Therefore, we complete the outline of the proof of Proposition 2.1. [The displayed formulas of this appendix did not survive the text extraction.]
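For the reader's convenience, the standard form of the expansion sketched above, stated for a mean parameter following Owen (2001), is reproduced below; this is a reconstruction of the generic argument, not the exact display of this appendix.

```latex
% Empirical likelihood ratio for a mean and its chi-square limit (Owen, 2001)
\[
R(\mu_0)=\sup\Big\{\prod_{i=1}^n np_i:\ p_i\ge 0,\ \sum_{i=1}^n p_i=1,\
\sum_{i=1}^n p_i\,(X_i-\mu_0)=0\Big\},\qquad
p_i=\frac{1}{n}\,\frac{1}{1+\lambda(X_i-\mu_0)},
\]
\[
-2\log R(\mu_0)
=\frac{\big\{\sum_{i=1}^n(X_i-\mu_0)\big\}^2}{\sum_{i=1}^n(X_i-\mu_0)^2}
+o_p(1)\ \xrightarrow{\ d\ }\ \chi^2_1 .
\]
```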


A.1.2.2 THE HYBRID DESIGN

In a manner similar to Section A.1.2.1, we present the empirical likelihood ratio based on the pooled-unpooled data as a product of two empirical likelihood components with Lagrange multipliers $\lambda_1$ and $\lambda_2$, where $\lambda_1$ and $\lambda_2$ are roots of the corresponding estimating equations, respectively. By applying the Taylor expansion, we obtain approximate solutions for $\lambda_1$ and $\lambda_2$, up to remainder terms (for details, see Owen, 2001). Now consider the log EL ratio statistic presented in equation (2.3). Applying the Taylor expansion of the two logarithmic terms around zero and substituting the approximate solutions of $\lambda_1$ and $\lambda_2$ yields the asymptotic quadratic form of the statistic. Thus, we complete the sketch of the proof of Proposition 2.2. [The displayed formulas of this appendix did not survive the text extraction.]

A.2.1 PROOF OF LEMMA 3.1

Vexler and Gurevich (2010) proved the first equality of Lemma 3.1. The second equality of Lemma 3.1 is obtained by rewriting the statistic through the empirical distribution functions: each spacing term is expressed via increments of the empirical distribution function and the corresponding increments of the theoretical distribution function, where $F$ and $F_n$ are the theoretical and empirical distribution functions, respectively. [The displayed derivation of this appendix did not survive the text extraction.]

A.2.2 TO SHOW THE PROPOSED TEST STATISTIC IS IDENTICAL TO THE TEST STATISTIC PRESENTED BY MUDHOLKAR AND TIAN (2002)

The test statistic can be shown to be equivalent to the test statistic for the IG distribution based on sample entropy proposed by Mudholkar and Tian (2002). Writing out the density-based EL ratio test statistic, substituting the maximum likelihood estimator of the parameter, and simplifying, the proposed statistic is seen to be identical to the statistic presented by Mudholkar and Tian (2002), utilizing the maximum likelihood estimator instead of the sample UMVU estimator of the parameter. [The displayed formulas of this appendix did not survive the text extraction.]

A.2.3 PROOF OF PROPOSITION 3.1

The proof of Proposition 3.1 is based on Proposition 2.2 of Vexler and Gurevich (2010). In order to show the consistency of the proposed test, we verify the conditions presented in Vexler and Gurevich (2010):

Conditions for Proposition 3.1:
(C1) a moment condition on the logarithm of the density evaluated at the true parameters;
(C2) under the null hypothesis $H_0$, the maximum likelihood estimators are consistent, by a standard property of maximum likelihood estimation;
(C3) under the alternative hypothesis $H_1$, the normalized log-likelihood converges to its expectation, by virtue of the law of large numbers;
(C4) there are open intervals containing the true parameters, and there exists a function $t(y)$, with finite expectation, dominating the derivative of the log-density on these intervals; it is clear that (C4) is satisfied.

We then outline the proof of Proposition 3.1, using the proof scheme of Proposition 2.2 of Vexler and Gurevich (2010). Following Vasicek (1976), the entropy-type component of the test statistic converges in probability, uniformly, to the entropy of the distribution $F$. The remaining components are handled by expanding the parametric log-likelihood terms in a Taylor series up to the first derivative and applying the consistency of the maximum likelihood estimators together with the dominating function $t(y)$ of condition (C4). Combining these convergence results, under $H_0$ and under $H_1$, completes the proof of Proposition 3.1. [The displayed formulas of this appendix did not survive the text extraction.]

A.2.4 PROOF OF PROPOSITION 3.2

Following Song (2000, 2002), the relevant normalized statistic is asymptotically normal, since $[m_0]$ satisfies the conditions $[m_0]/\log n \to \infty$ and $[m_0](\log n)^{2/3}/n^{1/3} \to 0$ as $n \to \infty$ (here, $[m_0]$ is given by (3.7)). Then the corresponding remainder is asymptotically negligible, and we choose $m$ to minimize the argument of the form $(6mn)^{1/2}\big(\log(TK_n)\,m/n + \log(2m) + R_2 m^{-1}\big)$, for all appropriate values of $m$. [Parts of the displayed derivation did not survive the text extraction.]

A.3.1 COMPUTING THE EMPIRICAL CONSTRAINT (5.11) UTILIZED TO DEVELOP TEST 2

Similarly to the equations (5.1)-(5.4), the expectation in (5.11) can be expressed through the distribution functions defined in (5.8). Here, due to the symmetry property of the distribution under the alternative hypothesis, and applying the estimation proposed by Schuster (1975), the relevant distribution function can be estimated by its symmetrized empirical counterpart. Denoting the estimated value of the right-hand side of equation (5.11) accordingly, it can then be easily shown that the resulting empirical constraint on the values used in Test 2 follows. [The displayed formulas of this appendix did not survive the text extraction.]

A.3.2 PROOF OF PROPOSITION 5.1

A.3.2.1 PROPOSED TEST 1

Consider the case of testing $H_0$ vs. $H_1$ in Proposition 5.1. Towards this end, the proposed test statistic at (5.9) is decomposed into a sum of entropy-type terms (the decomposition (A.3.4)). We first investigate the first term of the right-hand side of (A.3.4). To this end, we define a distribution function composed of the underlying distribution functions of the two samples, together with its empirical counterpart based on the corresponding empirical distribution functions; the first term is then reformulated via increments of these functions evaluated at the relevant order statistics (the representation (A.3.6)).

The first term on the right-hand side of (A.3.6) is treated via the result shown in Theorem 1 of Vasicek (1976): introducing a density that approximates the underlying density on any interval where it is positive and continuous, the Stieltjes sums of the logarithmic increments converge in probability, uniformly over the considered range, as the sample sizes increase (the results (A.3.7)-(A.3.10), combined in (A.3.12)). The second term on the right-hand side of (A.3.6) converges to zero in probability by Theorem A of Serfling (1980), which bounds the deviations between the empirical and true distribution functions (the result (A.3.13)). Finally, using the result of Lemma 1 of Vasicek (1976), the last term on the right-hand side of (A.3.6) also converges to zero in probability (the result (A.3.14)). Following the same procedure for the second component of the test statistic and combining (A.3.12)-(A.3.14), we obtain the convergence of the test statistic under $H_0$ and under $H_1$. We complete the proof of Proposition 5.1 for the case of Test 1, i.e. the consistency related to the proposed Test 1.

A.3.2.2 PROPOSED TEST 2

Here, we consider the case of Test 2. It is clear that once the convergence of the quantity defined by (5.12) is established, the rest of the proof is similar to that shown in Section A.3.2.1 regarding the test statistic of the proposed Test 1. To this end, we apply Theorem A of Serfling (1980) to the empirical distribution function of the observations; here the symmetry of the $Z$ distribution under $H_0$ is used (the representation (A.3.16), reorganized in (A.3.17) and (A.3.18)). Since the distribution of the relevant transform under $H_0$ can be taken as the uniform distribution on the interval $[-1, 1]$, the deviations are bounded through order statistics of a standard uniformly distributed, Unif[0,1], random variable, and applying Chebyshev's inequality yields the convergence to zero in probability. Combining the results of (A.3.15), (A.3.19), and (A.3.20) completes the proof for Test 2. [The displayed formulas of this appendix did not survive the text extraction.]
A.3.3 MATHEMATICAL DERIVATION OF MAXIMUM LIKELIHOOD RATIO TESTS

A.3.3.1 MAXIMUM LIKELIHOOD RATIO TEST STATISTIC FOR TEST 1

Assume normally distributed samples whose means and variances ($i = 1, 2$) are unknown. The null hypothesis formulated in terms of the normal parameters is equivalent to the null hypothesis presented in the article. Hence, the corresponding hypothesis of interest for Test 1 using the maximum likelihood ratio test is $H_0$ vs. $H_1$: not $H_0$. Under the normal assumptions, the MLR test statistic is given by the ratio of the maximized likelihoods, where the associated maximum likelihood estimators (MLEs) of the means and variances are the sample means and the maximum likelihood sample variances, respectively.

A.3.3.2 MAXIMUM LIKELIHOOD RATIO TEST STATISTIC FOR TEST 2

Under the assumption of normally distributed samples with unknown parameters ($i = 1, 2$), the hypotheses of Test 2 are equivalent to the corresponding restrictions on the normal parameters. Thus, the maximum likelihood ratio for Test 2 can be formulated as the ratio of the maximized likelihoods; substituting the associated MLEs into this likelihood ratio yields the maximum likelihood ratio test statistic for Test 2.

A.3.3.3 MAXIMUM LIKELIHOOD RATIO TEST STATISTIC FOR TEST 3

Assume normally distributed samples with unknown parameters ($i = 1, 2$). The hypotheses of Test 3 are equivalent to the corresponding restrictions on the normal parameters. Accordingly, replacing the parameters of the likelihood ratio by their MLEs, the maximum likelihood ratio test statistic for Test 3 can be formulated. [The displayed formulas of this appendix did not survive the text extraction.]

BIBLIOGRAPHY

Albers, W., Kallenberg, W. C. M., and Martini, F. (2001). Data-Driven Rank Tests for
Classes of Tail Alternatives. Journal of the American Statistical Association, 96, 685-696.
Bahadur, R. R. (1966). A Note on Quantiles in Large Samples. Ann. Math. Statist., 37,
577-580.
Bardsley, W. E. (1980). Note on the use of the inverse Gaussian distribution for wind
energy applications. Journal of Applied Meteorology, 19, 1126-1130.
Barndorff-Nielsen, O. E. (1994). A note on electrical networks and the inverse Gaussian
distribution. Advances in Applied Probability, 26, 63-67.
Bhattacharyya, G., and Fries, A. (1982). Fatigue failure models - Birnbaum-Saunders vs.
inverse Gaussian. IEEE Trans. Reliab., 31, 439-441.
Biederman, J. (1998). Attention-deficit/hyperactivity disorder: a life-span perspective.
The Journal of Clinical Psychiatry, 59, 4-16.
Brotman, M. A., Schmajuk, M., Rich, B., Dickstein, D. P., Guyer, A. E., Costello, E. J.,
Egger, H. L., Angold, A., and Leibenluft, E. (2006). Prevalence, clinical correlates
and longitudinal course of severe mood dysregulation in children. Biological
Psychiatry, 60, 991-997.
Canner, P. L. (1975). A simulation study of one- and two-sample Kolmogorov-Smirnov
statistics with a particular weight function. Journal of the American Statistical
Association, 70, 209-211.
Carlin, B., and Louis, T. A. (2008). Bayes and Empirical Bayes Methods for Data
Analysis. Chapman & Hall/CRC, New York.
Carlson, G. A. (2007). Who Are the Children with Severe Mood Dysregulation, a.k.a.
"Rages"? American Journal of Psychiatry, 164, 1140-1142.
Carroll, R. J., Roeder, K., and Wasserman, L. (1999). Flexible Parametric Measurement
Error Models. Biometrics, 55, 44-54.
Carroll, R. J., Spiegelman, C. H., Lan, K. K., Bailey, K. T., and Abbott, R. D. (1984). On
errors-in-variables for binary regression models. Biometrika, 71, 19-25.


Carroll, R. J., and Wand, M. P. (1991). Semiparametric Estimation in Logistic
Measurement Error Models. Journal of the Royal Statistical Society, Series B
(Methodological), 53, 573-585.
Chen, S. X. (1993). On the accuracy of empirical likelihood confidence regions for linear
regression models. Ann. Inst. Statist. Math., 45, 621-638.
Chen, S. X. (1994). Empirical likelihood confidence intervals for linear regression
coefficients. Journal of Multivariate Analysis, 45, 621-638.
Chen, S. X. (1994). Comparing empirical likelihood and bootstrap hypothesis tests.
Journal of Multivariate Analysis, 51, 277-293.
Chen, S. X., and Hall, P. (1993). Smoothed empirical likelihood confidence intervals for
quantiles. The Annals of Statistics, 21, 1166-1181.
Chernoff, H., and Zacks, S. (1964). Estimating the current mean of a normal distribution
which is subjected to changes in time. Annals of Mathematical Statistics, 35, 999-1018.
Chhikara, R., and Folks, J. (1977). The inverse Gaussian distribution as a lifetime model.
Technometrics, 19, 461-468.
Claeskens, G., and Hjort, N. L. (2004). Goodness of fit via non-parametric likelihood
ratios. Scandinavian Journal of Statistics, 31, 487-513.
Cressie, N. (1976). On the Logarithms of High-Order Spacings. Biometrika, 63, 343-355.
DiCiccio, T., Hall, P., and Romano, J. (1989). Comparison of parametric and empirical
likelihood functions. Biometrika, 76, 465-476.
Dorfman, R. (1943). The Detection of Defective Members of Large Populations. Annals of
Mathematical Statistics, 14, 436-440.
Dudewicz, E. J., and van der Meulen, E. C. (1981). Entropy-Based Tests of Uniformity.
Journal of the American Statistical Association, 76, 967-974.
Dunn, G. (1989). Design and Analysis of Reliability Studies: The Statistical Evaluation of
Measurement Errors. Oxford University Press, New York.
Edgeman, R. L., Scott, R. C., and Pavur, R. J. (1988). A modified Kolmogorov-Smirnov
test for the inverse Gaussian density with unknown parameters. Communications in
Statistics - Simulation and Computation, 17, 1203-1212.
Edgeman, R. L. (1990). Assessing the Inverse Gaussian Distribution Assumption. IEEE
Transactions on Reliability, 39, 352-355.

Embrechts, P., McNeil, A., and Straumann, D. (2002). Correlation and dependence in
risk management: properties and pitfalls. In Risk Management: Value at Risk and
Beyond, ed. M.A.H. Dempster, Cambridge University Press, Cambridge, 176-223.
Faraggi, D., Reiser, B., and Schisterman, E. (2003). ROC curve analysis for biomarkers
based on pooled assessments. Statistics in Medicine, 22, 2515-2527.
Folks, J. L., and Chhikara, R. S. (1978). The inverse Gaussian distribution and its
statistical application: a review. Journal of the Royal Statistical Society, Series B,
40, 263-289.
Folks, J. L., and Chhikara, R. S. (1989). The Inverse Gaussian Distribution, Theory,
Methodology and Applications. Marcel Dekker, New York.
Freedman, L. S., Fainberg, V., Kipnis, V., Midthune, D., and Carroll, R. J. (2004). A new
method for dealing with measurement error in explanatory variables of regression
models. Biometrics, 60, 172-181.
Fuller, W. A. (1987). Measurement Error Models. Wiley, New York.
Gombay, E., and Horvath, L. (1994). An application of the maximum likelihood test to
the change-point problem. Stochastic Processes and their Applications, 50, 161-171.
Gombay, E. (2001). U-statistics for Change under Alternatives. Journal of Multivariate
Analysis, 78, 139-158.
Good, P. (2005). Permutation, Parametric, and Bootstrap Tests of Hypotheses. Springer,
New York.
Gu, M., Dong, X., Zhang, X., Wang, X., Qi, Y., Yu, J., and Niu, W. (2012). Strong
Association between Two Polymorphisms on 15q25.1 and Lung Cancer Risks: A
Meta-Analysis. PLoS ONE, 7, e37970. DOI: 10.1371/journal.pone.0037970
Gurevich, G. (2006). Nonparametric AMOC change point tests for stochastically ordered
alternatives. Communications in Statistics - Theory and Methods, 35, 887-903.
Gurevich, G. (2007). Retrospective parametric tests for homogeneity of data.
Communications in Statistics-Theory and Methods, 36, 2841-2862.
Gurevich, G., and Vexler, A. (2005). Change point problems in the model of logistic
regression. Journal of Statistical Planning and Inference, 131, 313-331.
Gurevich, G., and Vexler, A. (2010). Retrospective change point detection: from
parametric to distribution free policies. Communications in Statistics-Simulation and
Computation, 39, 899-920.

Gurevich, G., and Vexler, A. (2011). Non-asymptotic optimal properties of Shiryaev-Roberts statistical control procedures. Proceedings of the 1st International
Symposium & 10th Balkan Conference on Operational Research (BALCOR 2011), 1,
242-246.
Gurevich, G., and Vexler, A. (2011). A two-sample empirical likelihood ratio test based
on samples entropy. Statistics and Computing, 21, 657-670.
Hall, P. (1984). Limit theorems for sums of general functions of m-spacings.
Mathematical Proceedings of the Cambridge Philosophical Society, 96, 517-532.
Hall, P. (1986). On powerful distributional tests on sample spacings. Journal of
Multivariate Analysis, 19, 201-255.
Hall, P., and La Scala, B. (1990). Methodology and algorithms of empirical likelihood.
International Statistical Review, 58, 109-127.
Hasabelnaby, N. A., Ware, J. H., and Fuller, W. A. (1989). Indoor air pollution and
pulmonary performance: investigating errors in exposure assessment (with
comments). Statistics in Medicine, 8, 1109-1126.
Hauke, J., and Kossowski, T. (2011). Comparison of values of Pearson's and Spearman's
correlation coefficients on the same sets of data. Quaestiones Geographicae, 30, 87-93.
Henze, N., and Klar, B. (2002). Goodness-of-Fit Tests for the Inverse Gaussian
Distribution Based on the Empirical Laplace Transform. Annals of the Institute of
Statistical Mathematics, 54, 425-444.
Iyengar, S., and Patwardhan, G. (1988). Recent developments in the inverse Gaussian
distribution. In: Krishnaiah, P. R. and Rao, C. R. (Eds.) Handbook of Statistics, 7,
479-480.
James, B., James, K. L., and Siegmund, D. (1987). Tests for a change-point. Biometrika,
74, 71-83.
Johnson, N. L., Kotz, S., and Balakrishnan, N. (1994). Continuous Univariate
Distributions 1 & 2. Wiley, New York.
Kander, Z., and Zacks, S. (1966). Test procedures for possible changes in parameters of
statistical distributions occurring at unknown time points. Annals of Mathematical
Statistics, 37, 1196-1210.
Kendall, M. G. (1938). A new measure of rank correlation. Biometrika, 30, 81-89.
Kendall, M. G. (1948). Rank Correlation Methods. Griffin, London.

Khashimov, Sh. A. (1989). Asymptotic properties of functions of spacings. Theory of
Probability and Its Applications, 34, 298-306.
Kolaczyk, E. D. (1994). Empirical Likelihood and Generalized Linear Models. Statistica
Sinica, 4, 199-218.
Krieger, A. M., Pollak, M., and Yakir, B. (2003). Surveillance of a simple linear
regression. Journal of the American Statistical Association, 98, 456-469.
Lai, T. L. (1995). Sequential change point detection in quality control and dynamical
systems. Journal of the Royal Statistical Society, Series B, 57, 613-658.
Lazar, N. A. (2003). Bayesian Empirical Likelihood. Biometrika, 90, 319-326.
Lazar, N., and Mykland, P. A. (1998). An evaluation of the power and conditionality
properties of empirical likelihood. Biometrika, 85, 523-534.
Lehmann, E. L., and Romano, J. P. (2005). Testing Statistical Hypotheses. Springer, New
York.
Leibenluft, E., Charney, D. S., Towbin, K. E., Bhangoo, R. K., and Pine, D. S. (2003).
Defining Clinical Phenotypes of Juvenile Mania. American Journal of Psychiatry,
160, 430-437.
Liu, X., and Liang, K.-Y. (1992). Efficacy of Repeated Measures in Regression Models
with Measurement Error. Biometrics, 48, 645-654.
Liu, A., and Schisterman, E. F. (2003). Comparison of Diagnostic Accuracy of
Biomarkers with Pooled Assessments. Biometrical Journal, 45, 631-644.
Liu, A., Schisterman, E. F., and Teoh, E. (2004). Sample Size and Power Calculation in
Comparing Diagnostic Accuracy of Biomarkers with Pooled Assessments. Journal
of Applied Statistics, 31, 49-59.
Louis, G. M., Weiner, J. M., Whitcomb, B. W., Sperrazza, R., Schisterman, E. F.,
Lobdell, D. T., Crickard, K., Greizerstein, H., and Kostyniak, P. J. (2005).
Environmental PCB exposure and risk of endometriosis. Human Reproduction, 20,
279-285.
Mei, Y. (2006). Comments on "A note on optimal detection of a change in distribution"
by Benjamin Yakir, The Annals of Statistics (1997), 25, 2117-2126. The Annals of
Statistics, 34, 1570-1576.
Moustakides, G.V. (1986). Optimal stopping times for detecting changes in distributions.
The Annals of Statistics, 14, 1379-1387.

Mudholkar, G. S., and Natarajan, R. (2002). The inverse Gaussian models: analogues of
symmetry, skewness and kurtosis. Annals of the Institute of Statistical Mathematics,
54, 138-154.
Mudholkar, G. S., Natarajan, R., and Chaubey, Y. P. (2001). A goodness-of-fit test for
the inverse Gaussian distribution using its independence characterization. Sankhyā, Series B,
63, 362-374.
Mudholkar, G. S., and Tian, L. (2001). On the null distribution of entropy tests for the
Gaussian and inverse Gaussian models. Communications in Statistics - Theory and Methods, 30, 1507-1520.
Mudholkar, G. S., and Tian L. (2002). An entropy characterization of the inverse
Gaussian distribution and related goodness-of-fit test. Journal of Statistical Planning
and Inference, 102, 211-221.
Mudholkar, G. S., and Tian, L. (2004). A test for homogeneity of ordered means of
inverse Gaussian population. Journal of Statistical Planning and Inference, 118, 37-49.
Mudholkar, G. S., and Wang, H. (2007). IG-symmetry and R-symmetry: Interrelations
and applications to the inverse Gaussian theory. Journal of Statistical Planning and
Inference, 137, 3655-3671.
Mudholkar, G. S., and Wilding, G. E. (2003). On the conventional wisdom regarding two
consistent tests of bivariate independence. Journal of the Royal Statistical Society,
Series D, 52, 41-57.
Mumford, S. L., Schisterman, E. F., Vexler, A., and Liu, A. (2006). Pooling
biospecimens and limits of detection: effects on ROC curve analysis. Biostatistics, 7,
585-598.
Nair, J., Ehimare, U., Beitman, B. D., Nair, S. S., and Lavin, A. (2006). Clinical review:
evidence-based diagnosis and treatment of ADHD in children. Missouri Medicine,
103, 617-621.
Natarajan, R., and Mudholkar, G. S. (2004). Moment based goodness-of-fit tests for
inverse Gaussian distribution. Technometrics, 46, 339-347.
Obuchowski, N. (2006). An ROC-type measure of diagnostic accuracy when the gold
standard is continuous-scale. Statistics in Medicine, 25, 481-493.
Owen, A. B. (1988). Empirical Likelihood Ratio Confidence Intervals for a Single
Functional. Biometrika, 75, 237-249.


Owen, A. B. (1990). Empirical Likelihood Ratio Confidence Regions. The Annals of
Statistics, 18, 90-120.
Owen, A. B. (1991). Empirical Likelihood for Linear Models. The Annals of Statistics, 19,
1725-1747.
Owen, A. B. (2001). Empirical Likelihood. Chapman and Hall, New York.
Padgett, W., and Tsoi, S. (1986). Prediction intervals for future observations from the
inverse Gaussian distribution. IEEE Transactions on Reliability, 35, 406-408.
Page, E. S. (1954). Continuous inspection schemes. Biometrika, 41, 100-114.
Page, E. S. (1955). A test for a change in a parameter occurring at an unknown point.
Biometrika, 42, 523-526.
Pearson, K. (1920). Notes on the history of correlation. Biometrika, 13, 25-45.
Pettitt, A. N. (1977). Testing the normality of several independent samples using the
Anderson-Darling statistic. Journal of the Royal Statistical Society, Series C, 26,
156-161.
Pollak, M. (1985). Optimal detection of a change in distribution. The Annals of Statistics,
13, 206-227.
Poznanski, E. O., Cook, S. C., and Carroll, B. J. (1979). A depression rating scale for
children. Pediatrics, 64, 442-450.
Poznanski, E. O., Grossman, J. A., Buchsbaum, Y., Banegas, M., Freeman, L., and
Gibbons, R. (1984). Preliminary studies of the reliability and validity of the
children's depression rating scale. J Am Acad Child Psychiatry, 23, 191-197.
Qin, J., and Lawless, J. (1994). Empirical Likelihood and General Estimating Equations.
The Annals of Statistics, 22, 300-325.
Quesenberry, C. P., Whitaker, T. B., and Dickens, J. W. (1976). On Testing Normality
Using Several Samples: An Analysis of Peanut Aflatoxin Data. Biometrics, 32, 753-759.
Shapiro, S. S., and Wilk, M. B. (1968). Approximations for the Null Distribution of the
W Statistic. Technometrics, 10, 861-866.
Schafer, D. W. (2001). Semiparametric Maximum Likelihood for Measurement Error
Model Regression. Biometrics, 57, 53-61.


Schisterman, E. F., and Vexler, A. (2008). To pool or not to pool, from whether to when:
applications of pooling to biospecimens subject to a limit of detection. Pediatric and
Perinatal Epidemiology, 22, 486-496.
Schisterman, E. F., Vexler, A., Mumford, S. L., and Perkins, N. J. (2010). Hybrid pooled-unpooled design for cost-efficient measurement of biomarkers. Statistics in Medicine,
29, 597-613.
Schuster, E. F. (1975). Estimating the distribution function of a symmetric distribution.
Biometrika, 62, 631-635.
Sen, A., and Srivastava, M. S. (1975). On tests for detecting change in mean. The Annals
of Statistics, 3, 98-108.
Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New
York.
Seshadri, V. (1993). The Inverse Gaussian Distribution: A Case Study in Exponential
Families. Clarendon Press, Oxford.
Seshadri, V. (1999). The Inverse Gaussian Distribution: Statistical Theory and
Applications. Springer, New York.
Song, K.-S. (2000). Limit theorems for nonparametric sample entropy estimators.
Statistics & Probability Letters, 49, 9-18.
Song, K.-S. (2002). Goodness-of-fit Tests Based on Kullback-Leibler Discrimination
Information. IEEE Transactions on Information Theory, 48, 1103-1117.
Spearman, C. E. (1904). The proof and measurement of association between two things.
American Journal of Psychology, 15, 72-101.
Stefanski, L. A. (1985). The effects of measurement error on parameter estimation.
Biometrika, 72, 583-592.
Stefanski, L. A., and Carroll, R. J. (1987). Conditional scores and optimal scores in
generalized linear measurement-error models. Biometrika, 74, 703-716.
Stefanski, L. A., and Carroll, R. J. (1990). Score Tests in Generalized Linear
Measurement Error Models. Journal of the Royal Statistical Society, Series B
(Methodological), 52, 345-359.
Stigler, S. M. (1977). Do robust estimators work with real data? The Annals of Statistics,
5, 1055-1098.


Thomas, D. R., and Grunkemeier, G. L. (1975). Confidence interval estimation of survival
probabilities for censored data. Journal of the American Statistical Association, 70,
865-871.
Tian, L., and Mudholkar, G. S. (2003). The likelihood ratio tests for homogeneity of
inverse Gaussian means under simple order and simple tree order. Communications in
Statistics - Theory and Methods, 32, 791-805.
Tian, L., Xiong, C., Lai, C.-Y., and Vexler, A. (2011). Exact confidence interval estimation
for the difference in diagnostic accuracy with three ordinal diagnostic groups.
Journal of Statistical Planning and Inference, 141, 549-558.
Tusnady, G. (1977). On Asymptotically Optimal Tests. The Annals of Statistics, 5, 385-393.
Tweedie, M. C. K. (1957). Statistical Properties of Inverse Gaussian Distributions. The
Annals of Mathematical Statistics, 28, 362-377.
Van Es, B. (1992). Estimating functionals related to a density by a class of statistics
based on spacings. Scandinavian Journal of Statistics, 19, 61-72.
Vasicek, O. (1976). A test for normality based on sample entropy. Journal of the Royal
Statistical Society. Series B (Methodological), 38, 54-59.
Vexler, A. (2006). Guaranteed testing for epidemic changes of a linear regression model.
Journal of Statistical Planning and Inference, 136, 3101-3120.
Vexler, A., Liu, A., and Schisterman, E. F. (2006). Efficient Design and Analysis of
Biospecimens with Measurements Subject to Detection Limit. Biometrical Journal,
48, 780-791.
Vexler, A., and Gurevich, G. (2009). Average most powerful tests for a segmented
regression. Communications in Statistics-Theory and Methods, 38, 2214-2231.
Vexler, A., and Gurevich, G. (2010). Density-based empirical likelihood ratio change
point detection policies. Communications in Statistics-Simulation and Computation,
39, 1709-1725.
Vexler, A., and Gurevich, G. (2010). Empirical likelihood ratios applied to goodness-of-fit
tests based on sample entropy. Computational Statistics and Data Analysis, 54, 531-545.
Vexler, A., and Gurevich, G. (2011). A note on optimality of hypothesis testing.
Mathematics in Engineering, Science and Aerospace, 2, 243-250.


Vexler, A., Liu, S., Kang, L., and Hutson, A. D. (2009). Modifications of the Empirical
Likelihood Interval Estimation with Improved Coverage Probabilities.
Communications in Statistics - Simulation and Computation, 38, 2171-2183.
Vexler, A., Liu, A., and Schisterman, E. F. (2010). Nonparametric deconvolution of
density estimation based on observed sums. Journal of Nonparametric Statistics, 22,
23-39.
Vexler, A., Liu, S., and Schisterman, E. F. (2011). Nonparametric-likelihood inference
based on cost-effectively-sampled-data. Journal of Applied Statistics, 38, 769-783.
Vexler, A., Schisterman, E. F., and Liu, A. (2008). Estimation of ROC curves based on
stably distributed biomarkers subject to measurement error and pooling mixtures.
Statistics in Medicine, 27, 280-296.
Vexler, A., and Tarima, S. (2010). An optimal approach for hypothesis testing in the
presence of incomplete data. Annals of the Institute of Statistical Mathematics, 63,
1141-1163.
Vexler, A., and Wu, C. (2009). An Optimal Retrospective Change Point Detection Policy.
Scandinavian Journal of Statistics, 36, 542-558.
Vexler, A., Wu, C., and Yu, K. F. (2010). Optimal hypothesis testing: from semi to fully
Bayes factors. Metrika, 71, 125-138.
Vexler, A., and Yu, J. (2011). Two-sample density-based empirical likelihood tests for
incomplete data in application to a pneumonia study. Biometrical Journal, 53, 628-651.
Vexler, A., Yu, J., Tian, L., and Liu, S. (2010). Two-sample nonparametric likelihood
inference based on incomplete data with an application to a pneumonia study.
Biometrical Journal, 52, 348-361.
Waxmonsky, J., Pelham, W. E., Gnagy, E., Cummings, M. R., O'Connor, B., Majumdar,
A., Verley, J., Hoffman, M. T., Massetti, G. A., Burrows-MacLean, L., Fabiano, G.
A., Waschbusch, D. A., Chacko, A., Arnold, F. W., Walker, K. S., Garefino, A. C.,
and Robb, J. A. (2008). The efficacy and tolerability of methylphenidate and
behavior modification in children with attention-deficit/hyperactivity disorder and
severe mood dysregulation. J Child Adolesc Psychopharmacol, 18, 573-588.
Weinberg, C. R., and Umbach, D. M. (1999). Using pooled exposure assessment to
improve efficiency in case-control studies. Biometrics, 55, 718-726.

Wians, F. H., Urban, J. E., Keffer, J. H., and Kroft, S. H. (2001). Discriminating between
iron deficiency anemia and anemia of chronic disease using traditional indices of
iron status vs transferrin receptor concentration. American Journal of Clinical
Pathology, 115, 112-118.
Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1, 80-83.
Wolfe, D. A., and Schechtman, E. (1984). Nonparametric statistical procedures for the
change point problem. Journal of Statistical Planning and Inference, 9, 389-396.
Ying, G., Mary, E. N., John, H., Michael, G. W., and Graham, E. (2006). An Exploratory
Factor Analysis of the Children's Depression Rating Scale-Revised. J Child Adol
Psychop, 16, 482-491.
Yu, J., Vexler, A., and Tian, L. (2010). Analyzing Incomplete Data Subject to a
Threshold Using Empirical Likelihood Methods: An Application to a Pneumonia
Risk Study in an ICU Setting. Biometrics, 66, 123-130.
Yu, J., Vexler, A., Kim, S., and Hutson, A. D. (2011). Two-sample empirical likelihood
ratio tests for medians in application to biomarker evaluations. The Canadian Journal
of Statistics, 39, 671-689.
