George G. Roussas
Intercollege Division of Statistics
University of California
Davis, California
ACADEMIC PRESS
San Diego London Boston
New York Sydney Tokyo Toronto
ACADEMIC PRESS
525 B Street, Suite 1900, San Diego, CA 92101-4495, USA
1300 Boylston Street, Chestnut Hill, MA 02167, USA
http://www.apnet.com
Roussas, George G.
A course in mathematical statistics / George G. Roussas. 2nd ed.
p. cm.
Rev. ed. of: A first course in mathematical statistics. 1973.
Includes index.
ISBN 0-12-599315-3
1. Mathematical statistics. I. Roussas, George G. First course in mathematical
statistics. II. Title.
QA276.R687 1997 96-42115
519.5dc20 CIP
Contents

17.3 Two-way Layout (Classification) with K (≥ 2) Observations Per Cell
Exercises
17.4 A Multicomparison Method
Exercises
Index
Preface to the Second Edition

This is the second edition of a book published for the first time in 1973 by
Addison-Wesley Publishing Company, Inc., under the title A First Course in
Mathematical Statistics. The first edition has been out of print for a number of
years now, although its reprint in Taiwan is still available. That issue, however,
is meant for circulation only in Taiwan.
The first issue of the book was very well received from an academic
viewpoint. I have had the pleasure of hearing colleagues telling me that the
book filled an existing gap between a plethora of textbooks of lower math-
ematical level and others of considerably higher level. A substantial number of
colleagues, holding senior academic appointments in North America and else-
where, have acknowledged to me that they made their entrance into the
wonderful world of probability and statistics through my book. I have also
heard of the book as being in a class of its own, and also as a collector's
item, after it went out of print. Finally, throughout the years, I have received
numerous inquiries as to the possibility of having the book reprinted. It is in
response to these comments and inquiries that I have decided to prepare a
second edition of the book.
This second edition preserves the unique character of the first issue of the
book, while some adjustments are effected. The changes in this issue consist
in correcting some rather minor factual errors and a considerable number of
misprints, either kindly brought to my attention by users of the book or
located by my students and myself. Also, the reissuing of the book has pro-
vided me with an excellent opportunity to incorporate certain rearrangements
of the material.
One change occurring throughout the book is the grouping of exercises of
each chapter in clusters added at the end of sections. Associating exercises
with material discussed in sections clearly makes their assignment easier. In
the process of doing this, a handful of exercises were omitted, as being too
complicated for the level of the book, and a few new ones were inserted. In
Chapters 1 through 8, some of the materials were pulled out to form separate
sections. These sections have also been marked by an asterisk (*) to indicate
the fact that their omission does not jeopardize the flow of presentation and
understanding of the remaining material.
Specifically, in Chapter 1, the concepts of a field and of a σ-field, and basic
results on them, have been grouped together in Section 1.2*. They are still
readily available for those who wish to employ them to add elegance and rigor
in the discussion, but their inclusion is not indispensable. In Chapter 2, the
number of sections has been doubled from three to six. This was done by
discussing independence and product probability spaces in separate sections.
Also, the solution of the problem of the probability of matching is isolated in a
section by itself. The section on the problem of the probability of matching and
the section on product probability spaces are also marked by an asterisk for the
reason explained above. In Chapter 3, the discussion of random variables as
measurable functions and related results is carried out in a separate section,
Section 3.5*. In Chapter 4, two new sections have been created by discussing
separately marginal and conditional distribution functions and probability
density functions, and also by presenting, in Section 4.4*, the proofs of two
statements, Statements 1 and 2, formulated in Section 4.1; this last section is
also marked by an asterisk. In Chapter 5, the discussion of covariance and
correlation coefficient is carried out in a separate section; some additional
material is also presented for the purpose of further clarifying the interpreta-
tion of correlation coefficient. Also, the justification of relation (2) in Chapter 2
is done in a section by itself, Section 5.6*. In Chapter 6, the number of sections
has been expanded from three to five by discussing in separate sections charac-
teristic functions for the one-dimensional and the multidimensional case, and
also by isolating in a section by itself definitions and results on moment-
generating functions and factorial moment generating functions. In Chapter 7,
the number of sections has been doubled from two to four by presenting the
proof of Lemma 2, stated in Section 7.1, and related results in a separate
section; also, by grouping together in a section marked by an asterisk defini-
tions and results on independence. Finally, in Chapter 8, a new theorem,
Theorem 10, especially useful in estimation, has been added in Section 8.5.
Furthermore, the proof of Pólya's lemma and an alternative proof of the Weak
Law of Large Numbers, based on truncation, are carried out in a separate
section, Section 8.6*, thus increasing the number of sections from five to six.
In the remaining chapters, no changes were deemed necessary, except that
in Chapter 13, the proof of Theorem 2 in Section 13.3 has been facilitated by
the formulation and proof in the same section of two lemmas, Lemma 1 and
Lemma 2. Also, in Chapter 14, the proof of Theorem 1 in Section 14.1 has been
somewhat simplified by the formulation and proof of Lemma 1 in the same
section.
Finally, a table of some commonly met distributions, along with their
means, variances and other characteristics, has been added. The value of such
a table for reference purposes is obvious, and needs no elaboration.
This book contains enough material for a year course in probability and
statistics at the advanced undergraduate level, or for first-year graduate stu-
dents not having been exposed before to a serious course on the subject
matter. Some of the material can actually be omitted without disrupting the
continuity of presentation. This includes the sections marked by asterisks,
perhaps, Sections 13.4–13.6 in Chapter 13, and all of Chapter 14. The instructor
can also be selective regarding Chapters 11 and 18. As for Chapter 19, it
has been included in the book for completeness only.
The book can also be used independently for a one-semester (or even one
quarter) course in probability alone. In such a case, one would strive to cover
the material in Chapters 1 through 10 with the exclusion, perhaps, of the
sections marked by an asterisk. One may also be selective in covering the
material in Chapter 9.
In either case, presentation of results involving characteristic functions
may be perfunctory only, with emphasis placed on moment-generating func-
tions. One should mention, however, why characteristic functions are intro-
duced in the first place, and therefore what one may be missing by not utilizing
this valuable tool.
In closing, it is to be mentioned that this author is fully aware of the fact
that the audience for a book of this level has diminished rather than increased
since the time of its first edition. He is also cognizant of the trend of having
recipes of probability and statistical results parading in textbooks, depriving
the reader of the challenge of thinking and reasoning, instead delegating the
thinking to a computer. It is hoped that there is still room for a book of the
nature and scope of the one at hand. Indeed, the trend and practices just
described should make the availability of a textbook such as this one exceed-
ingly useful if not imperative.
G. G. Roussas
Davis, California
May 1996
Preface to the First Edition
the text. Also, there are more than 530 exercises, appearing at the end of the
chapters, which are of both theoretical and practical importance.
The careful selection of the material, the inclusion of a large variety of
topics, the abundance of examples, and the existence of a host of exercises of
both theoretical and applied nature will, we hope, satisfy people of both
theoretical and applied inclinations. All the application-oriented reader has to
do is to skip some fine points of some of the proofs (or some of the proofs
altogether!) when studying the book. On the other hand, the careful handling
of these same fine points should offer some satisfaction to the more math-
ematically inclined readers.
The material of this book has been presented several times to classes
of the composition mentioned earlier; that is, classes consisting of relatively
mathematically immature, eager, and adventurous sophomores, as well as
juniors and seniors, and statistically unsophisticated graduate students. These
classes met three hours a week over the academic year, and most of the
material was covered in the order in which it is presented with the occasional
exception of Chapters 14 and 20, Section 5 of Chapter 5, and Section 3 of
Chapter 9. We feel that there is enough material in this book for a three-
quarter session if the classes meet three or even four hours a week.
At various stages and times during the organization of this book several
students and colleagues helped improve it by their comments. In connection
with this, special thanks are due to G. K. Bhattacharyya. His meticulous
reading of the manuscripts resulted in many comments and suggestions that
helped improve the quality of the text. Also thanks go to B. Lind, K. G.
Mehrotra, A. Agresti, and a host of others, too many to be mentioned here. Of
course, the responsibility in this book lies with this author alone for all omis-
sions and errors which may still be found.
As the teaching of statistics becomes more widespread and its level of
sophistication and mathematical rigor (even among those with limited math-
ematical training but yet wishing to know why and how) more demanding,
we hope that this book will fill a gap and satisfy an existing need.
G. G. R.
Madison, Wisconsin
November 1972
Chapter 1
Basic Concepts of Set Theory

1.1 Some Definitions and Notation

Figure 1.1 S′ ⊆ S; in fact, S′ ⊂ S, since s2 ∈ S, but s2 ∉ S′.

Figure 1.2 Aᶜ is the shaded region.
is defined by

⋃_{j=1}^n Aj = {s ∈ S; s ∈ Aj for at least one j = 1, 2, . . . , n},

and the intersection is defined by

⋂_{j=1}^n Aj = {s ∈ S; s ∈ Aj for all j = 1, 2, . . . , n}.

A1 − A2 = {s ∈ S; s ∈ A1, s ∉ A2}.

Symmetrically,

A2 − A1 = {s ∈ S; s ∈ A2, s ∉ A1}.

Note that A1 − A2 = A1 ∩ A2ᶜ, A2 − A1 = A2 ∩ A1ᶜ, and that, in general, A1 − A2 ≠ A2 − A1. (See Fig. 1.5.)

Figure 1.5 A1 − A2 is ////; A2 − A1 is \\\\.

A1 Δ A2 = (A1 − A2) ∪ (A2 − A1).

Note that

A1 Δ A2 = (A1 ∪ A2) − (A1 ∩ A2).
Pictorially, this is shown in Fig. 1.6. It is worthwhile to observe that
operations (4) and (5) can be expressed in terms of operations (1), (2), and
(3).
will usually be either the (finite) set {1, 2, . . . , n}, or the (infinite) set
{1, 2, . . .}.
Figure 1.7 A1 and A2 are disjoint; that is, A1 ∩ A2 = ∅. Also A1 ∪ A2 = A1 + A2 for the same reason.

5. A1 ∪ A2 = A2 ∪ A1, A1 ∩ A2 = A2 ∩ A1 (Commutative laws)
6. A ∪ (⋂_j Aj) = ⋂_j (A ∪ Aj), A ∩ (⋃_j Aj) = ⋃_j (A ∩ Aj) (Distributive laws)
There are two more important properties of the operations on sets which
relate complementation to union and intersection. They are known as De
Morgan's laws:

i) (⋃_j Aj)ᶜ = ⋂_j Ajᶜ,

ii) (⋂_j Aj)ᶜ = ⋃_j Ajᶜ.
We will then, by definition, have verified the desired equality of the two
sets.

a) Let s ∈ (⋃_j Aj)ᶜ. Then s ∉ ⋃_j Aj, hence s ∉ Aj for any j. Thus s ∈ Ajᶜ for every
j and therefore s ∈ ⋂_j Ajᶜ.

b) Let s ∈ ⋂_j Ajᶜ. Then s ∈ Ajᶜ for every j and hence s ∉ Aj for any j. Then
s ∉ ⋃_j Aj and therefore s ∈ (⋃_j Aj)ᶜ.

The proof of (ii) is quite similar.
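Both laws are easy to check mechanically on finite examples. A quick illustrative sketch in Python (the sets below are arbitrary choices, not from the text):

```python
# Check De Morgan's laws on a small finite space:
# (i)  (U_j A_j)^c = ∩_j A_j^c    (ii) (∩_j A_j)^c = U_j A_j^c
S = set(range(10))                       # the whole space S
A = [{1, 2, 3}, {3, 4, 5}, {5, 6, 7}]    # an arbitrary finite family

def complement(B):
    return S - B

union = set().union(*A)
intersection = set(S).intersection(*A)

assert complement(union) == set(S).intersection(*(complement(B) for B in A))
assert complement(intersection) == set().union(*(complement(B) for B in A))
```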
ii) If An ↓, then lim An = ⋂_{n=1}^∞ An as n → ∞.

and

Ā = lim sup An = ⋂_{n=1}^∞ ⋃_{j=n}^∞ Aj.

The sets A̲ and Ā are called the inferior limit and superior limit,
respectively, of the sequence {An}. The sequence {An} has a limit if A̲ = Ā.
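For an eventually periodic sequence, both limits can be read off a single period: lim sup An collects the points lying in infinitely many An, and lim inf An the points lying in all but finitely many. An illustrative sketch (the sets chosen are arbitrary):

```python
# A_1, A_2, A_3, ... = {1,2}, {2,3}, {1,2}, {2,3}, ... (period 2).
# A point is in lim sup iff it lies in some set of the period, and in
# lim inf iff it lies in every set of the period.
period = [{1, 2}, {2, 3}]

lim_sup = set().union(*period)                   # in infinitely many A_n
lim_inf = set(period[0]).intersection(*period)   # in all but finitely many

assert lim_sup == {1, 2, 3}
assert lim_inf == {2}
assert lim_sup != lim_inf   # so this sequence has no limit
```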
Exercises
1.1.1 Let Aj, j = 1, 2, 3 be arbitrary subsets of S. Determine whether each of
the following statements is correct or incorrect.
i) (A1 ∪ A2) ∩ A2 = A2;
ii) (A1 ∪ A2) − A1 = A2;
iii) (A1 − A2) ∩ (A1 ∩ A2) = ∅;
iv) (A1 ∪ A2) ∩ (A2 ∪ A3) ∩ (A3 ∪ A1) = (A1 ∩ A2) ∪ (A2 ∩ A3) ∪ (A3 ∩ A1).
A1 = {(x, y) ∈ S; x = y};
A2 = {(x, y) ∈ S; x = −y};
A3 = {(x, y) ∈ S; x² = y²};
A4 = {(x, y) ∈ S; x² ≥ y²};
A5 = {(x, y) ∈ S; x² + y² ≤ 4};
A6 = {(x, y) ∈ S; x ≤ y²};
A7 = {(x, y) ∈ S; x² ≤ y}.
i) A1 ∩ (⋃_{j=2}^7 Aj) = ⋃_{j=2}^7 (A1 ∩ Aj);

ii) A1 ∪ (⋂_{j=2}^7 Aj) = ⋂_{j=2}^7 (A1 ∪ Aj);

iii) (⋃_{j=1}^7 Aj)ᶜ = ⋂_{j=1}^7 Ajᶜ;

iv) (⋂_{j=1}^7 Aj)ᶜ = ⋃_{j=1}^7 Ajᶜ

by listing the members of each one of the eight sets appearing on either side of
each one of the relations (i)–(iv).
(⋂_j Aj)ᶜ = ⋃_j Ajᶜ.
An = {(x, y) ∈ ℝ²; 3 + 1/n ≤ x < 6 − 1/n, 0 ≤ y ≤ 2 − 2/n},

Bn = {(x, y) ∈ ℝ²; x² + y² ≤ 3 − 1/n}.

An = {x ∈ ℝ; 5 + 1/n < x < 20 − 1/n},  Bn = {x ∈ ℝ; 0 < x < 7 + 3/n}.
⋃_{j=1}^k Aj = (⋃_{j=1}^{k−1} Aj) ∪ Ak ∈ F,

and by induction, the statement for unions is true for any finite n. By observing
that

⋂_{j=1}^n Aj = (⋃_{j=1}^n Ajᶜ)ᶜ,
* The reader is reminded that sections marked by an asterisk may be omitted without jeopardizing the understanding of the remaining material.
1.2* Fields and σ-Fields
we see that (F2) and the above statement for unions imply that if A1, . . . , An
∈ F, then ⋂_{j=1}^n Aj ∈ F for any finite n.
(⋃_{j=1}^∞ Aj)ᶜ = ⋂_{j=1}^∞ Ajᶜ,

which is countable, since it is the intersection of sets, one of which is
countable.
σ(C) = ⋂ {all σ-fields containing C}.

By Theorem 3, σ(C) is a σ-field which obviously contains C. Uniqueness
and minimality again follow from the definition of σ(C). Hence A = σ(C).
C0 = {all intervals in ℝ}
   = {(−∞, x), (−∞, x], (x, ∞), [x, ∞), (x, y), (x, y], [x, y), [x, y]; x, y ∈ ℝ, x < y}.

it the Borel σ-field (over the real line). The pair (ℝ, B) is called the Borel real
line.
THEOREM 5 Each one of the following classes generates the Borel σ-field:

C1 = {(x, y]; x, y ∈ ℝ, x < y},
C2 = {[x, y); x, y ∈ ℝ, x < y},
C5 = {(x, ∞); x ∈ ℝ},
C6 = {[x, ∞); x ∈ ℝ},
C7 = {(−∞, x); x ∈ ℝ},
C8 = {(−∞, x]; x ∈ ℝ}.

Also the classes Cj′, j = 1, . . . , 8 generate the Borel σ-field, where for j = 1, . . . ,
8, Cj′ is defined the same way as Cj is, except that x, y are restricted to the
rational numbers.
⋂_{n=1}^∞ (−∞, x + 1/n) = (−∞, x].

(x, ∞) = (−∞, x]ᶜ,  [x, ∞) = (−∞, x)ᶜ,

(x, y) = (−∞, y) − (−∞, x] = (−∞, y) ∩ (x, ∞) ∈ σ(C7),

[x, y] = (−∞, y] ∩ [x, ∞) ∈ σ(C7).
C0 = {all rectangles in ℝ²}
   = {(−∞, x) × (−∞, x′), (−∞, x) × (−∞, x′],
      (−∞, x] × (−∞, x′), (−∞, x] × (−∞, x′],
      (x, ∞) × (x′, ∞), . . . , [x, ∞) × [x′, ∞), . . . ,
      (x, y) × (x′, y′), . . . , [x, y] × [x′, y′],
      x, y, x′, y′ ∈ ℝ, x < y, x′ < y′}.
Exercises
1.2.1 Verify that the classes defined in Examples 1, 2 and 3 on page 9 are
fields.
1.2.2 Show that in the definition of a field (Definition 2), property (F3) can
be replaced by (F3′) which states that if A1, A2 ∈ F, then A1 ∩ A2 ∈ F.
1.2.3 Show that in Definition 3, property (A3) can be replaced by (A3′),
which states that if

Aj ∈ A, j = 1, 2, . . . , then ⋂_{j=1}^∞ Aj ∈ A.
1.2.4 Refer to Example 4 on σ-fields on page 10 and explain why S was taken
to be uncountable.
1.2.5 Give a formal proof of the fact that the class AA defined in Remark 1
is a σ-field.
1.2.6 Refer to Definition 1 and show that all three sets A̲, Ā, and lim An as
n → ∞, whenever it exists, belong to A, provided An, n ≥ 1, belong to A.
1.2.7 Let S = {1, 2, 3, 4} and define the class C of subsets of S as follows:

C = {∅, {1}, {2}, {3}, {4}, {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4},
Chapter 2
Some Probabilistic Concepts and Results

2.1 Probability Functions and Some Basic Properties and Results
(In the terminology of Section 1.2, we require that the events associated with
a sample space form a σ-field of subsets in that space.) It follows then that ⋂_j Aj
is also an event, and so is A1 − A2, etc. If the random experiment results in s and
s ∈ A, we say that the event A occurs or happens. The union ⋃_j Aj occurs if at least
one of the Aj occurs, the intersection ⋂_j Aj occurs if all Aj occur, A1 − A2 occurs if A1 occurs
but A2 does not, etc.
The next basic quantity to be introduced here is that of a probability
function (or of a probability measure).
DEFINITION 1 A probability function denoted by P is a (set) function which assigns to each
event A a number denoted by P(A), called the probability of A, and satisfies
the following requirements:
(P1) P is non-negative; that is, P(A) ≥ 0, for every event A.
(P2) P is normed; that is, P(S) = 1.
(P3) P is σ-additive; that is, for every collection of pairwise (or mutually)
disjoint events Aj, j = 1, 2, . . . , we have P(⋃_j Aj) = Σ_j P(Aj).
This is the axiomatic (Kolmogorov) definition of probability. The triple
(S, class of events, P) (or (S, A, P)) is known as a probability space.
REMARK 1 If S is finite, then every subset of S is an event (that is, A is taken
to be the discrete σ-field). In such a case, there are only finitely many events
and hence, in particular, finitely many pairwise disjoint events. Then (P3) is
reduced to:

(P3′) P is finitely additive; that is, for every collection of pairwise disjoint
events Aj, j = 1, 2, . . . , n, we have

P(⋃_{j=1}^n Aj) = Σ_{j=1}^n P(Aj).

Actually, in such a case it is sufficient to assume that (P3′) holds for any two
disjoint events; (P3′) follows then from this assumption by induction.
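On a finite sample space the requirements (P1), (P2), and (P3′) are easy to verify directly. A minimal sketch for the uniform ("equally likely") assignment P(A) = |A|/|S| on one roll of a die (an illustration, not part of the text):

```python
from fractions import Fraction

S = frozenset(range(1, 7))          # one roll of a die

def P(A):
    # Uniform probability function: P(A) = |A ∩ S| / |S|
    return Fraction(len(set(A) & S), len(S))

assert all(P(A) >= 0 for A in [set(), {1}, {1, 2, 3}])   # (P1) non-negative
assert P(S) == 1                                          # (P2) normed
A1, A2 = {1, 2}, {5}                                      # disjoint events
assert P(A1 | A2) == P(A1) + P(A2)                        # (P3') additive
```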
P(S) = P(S + ∅ + ∅ + · · ·) = P(S) + P(∅) + P(∅) + · · · ,

or

1 = 1 + P(∅) + P(∅) + · · ·  and  P(∅) = 0,

since P(∅) ≥ 0. (So P(∅) = 0. Any event, possibly ≠ ∅, with probability 0 is
called a null event.)
(C2) P is finitely additive; that is, for any events Aj, j = 1, 2, . . . , n such that
Ai ∩ Aj = ∅, i ≠ j,

P(⋃_{j=1}^n Aj) = Σ_{j=1}^n P(Aj).

Indeed, for Aj = ∅, j ≥ n + 1, P(⋃_{j=1}^n Aj) = P(⋃_{j=1}^∞ Aj) = Σ_{j=1}^∞ P(Aj) =
Σ_{j=1}^n P(Aj).
(C3) For every event A, P(Aᶜ) = 1 − P(A). In fact, since A + Aᶜ = S,

P(A + Aᶜ) = P(S), or P(A) + P(Aᶜ) = 1.

(C4) P is nondecreasing; that is, if A1 ⊆ A2, then P(A1) ≤ P(A2). In fact,

A2 = A1 + (A2 − A1),

hence

P(A2) = P(A1) + P(A2 − A1),

and therefore P(A2) ≥ P(A1).
REMARK 2 If A1 ⊆ A2, then P(A2 − A1) = P(A2) − P(A1), but this is not true,
in general.
(C5) 0 ≤ P(A) ≤ 1 for every event A. This follows from (C1), (P2), and (C4).

(C6) For any events A1, A2, P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2).
In fact,

A1 ∪ A2 = A1 + (A2 − A1 ∩ A2).

Hence

P(A1 ∪ A2) = P(A1) + P(A2 − A1 ∩ A2) = P(A1) + P(A2) − P(A1 ∩ A2),

since A1 ∩ A2 ⊆ A2 implies

P(A2 − A1 ∩ A2) = P(A2) − P(A1 ∩ A2).
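Property (C6) can be confirmed by direct counting on a finite space; a brief sketch with the uniform probability on a die and two arbitrarily chosen events:

```python
from fractions import Fraction

S = frozenset(range(1, 7))
P = lambda A: Fraction(len(set(A) & S), len(S))   # uniform probability

A1, A2 = {1, 2, 3}, {3, 4}
# P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2): 4/6 = 3/6 + 2/6 − 1/6
assert P(A1 | A2) == P(A1) + P(A2) - P(A1 & A2)
```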
(C7) P is subadditive; that is,

P(⋃_{j=1}^∞ Aj) ≤ Σ_{j=1}^∞ P(Aj)

and also

P(⋃_{j=1}^n Aj) ≤ Σ_{j=1}^n P(Aj).
P(⋃_{j=1}^n Aj) = Σ_{j=1}^n P(Aj) − Σ_{1≤j1<j2≤n} P(Aj1 ∩ Aj2)
    + Σ_{1≤j1<j2<j3≤n} P(Aj1 ∩ Aj2 ∩ Aj3) − · · ·
    + (−1)^{n+1} P(A1 ∩ A2 ∩ · · · ∩ An).
We have

P(⋃_{j=1}^{k+1} Aj) = P((⋃_{j=1}^k Aj) ∪ A_{k+1})
    = P(⋃_{j=1}^k Aj) + P(A_{k+1}) − P((⋃_{j=1}^k Aj) ∩ A_{k+1})
    = Σ_{j=1}^k P(Aj) − Σ_{1≤j1<j2≤k} P(Aj1 ∩ Aj2)
      + Σ_{1≤j1<j2<j3≤k} P(Aj1 ∩ Aj2 ∩ Aj3) − · · ·
      + (−1)^{k+1} P(A1 ∩ A2 ∩ · · · ∩ Ak)
      + P(A_{k+1}) − P(⋃_{j=1}^k (Aj ∩ A_{k+1})).     (1)

But

P(⋃_{j=1}^k (Aj ∩ A_{k+1})) = Σ_{j=1}^k P(Aj ∩ A_{k+1}) − Σ_{1≤j1<j2≤k} P(Aj1 ∩ Aj2 ∩ A_{k+1})
      + Σ_{1≤j1<j2<j3≤k} P(Aj1 ∩ Aj2 ∩ Aj3 ∩ A_{k+1}) − · · ·
      + (−1)^k Σ_{1≤j1<···<j_{k−1}≤k} P(Aj1 ∩ · · · ∩ Aj_{k−1} ∩ A_{k+1})
      + (−1)^{k+1} P(A1 ∩ · · · ∩ Ak ∩ A_{k+1}).

Replacing this in (1) and grouping the terms of the same order, we get

P(⋃_{j=1}^{k+1} Aj) = Σ_{j=1}^{k+1} P(Aj) − Σ_{1≤j1<j2≤k+1} P(Aj1 ∩ Aj2)
      + Σ_{1≤j1<j2<j3≤k+1} P(Aj1 ∩ Aj2 ∩ Aj3) − · · ·
      + (−1)^{k+2} P(A1 ∩ · · · ∩ A_{k+1}),

which completes the induction.
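The inclusion-exclusion formula can also be checked by brute force on a finite space; a sketch with four arbitrarily chosen events under the uniform probability:

```python
from fractions import Fraction
from itertools import combinations

S = frozenset(range(10))
P = lambda A: Fraction(len(A), len(S))           # uniform probability on S
events = [{0, 1, 2}, {2, 3, 4}, {4, 5, 6}, {1, 6, 7}]

def inclusion_exclusion(events):
    # Sum (-1)^(r+1) P(A_{j1} ∩ ... ∩ A_{jr}) over all 1 <= j1 < ... < jr <= n.
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        for combo in combinations(events, r):
            inter = set(combo[0]).intersection(*combo)
            total += (-1) ** (r + 1) * P(inter)
    return total

assert inclusion_exclusion(events) == P(set().union(*events))
```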
P(lim_{n→∞} An) = lim_{n→∞} P(An).

PROOF Let us first assume that An ↑. Then

lim An = ⋃_{j=1}^∞ Aj.

We recall that

⋃_{j=1}^∞ Aj = A1 + (A2 − A1) + (A3 − A2) + · · · ,

by the assumption that An ↑. Hence

P(lim An) = P(⋃_{j=1}^∞ Aj) = P(A1) + P(A2 − A1)
    + P(A3 − A2) + · · · + P(An − A_{n−1}) + · · ·
    = lim [P(A1) + P(A2 − A1) + · · · + P(An − A_{n−1})]
    = lim [P(A1) + P(A2) − P(A1)
      + P(A3) − P(A2) + · · · + P(An) − P(A_{n−1})]
    = lim P(An).

Thus

P(lim An) = lim P(An).

Now let An ↓. Then Anᶜ ↑, so that

lim Anᶜ = ⋃_{j=1}^∞ Ajᶜ.

Hence

P(lim Anᶜ) = P(⋃_{j=1}^∞ Ajᶜ) = lim P(Anᶜ),

or equivalently,

P((⋂_{j=1}^∞ Aj)ᶜ) = lim [1 − P(An)], or 1 − P(⋂_{j=1}^∞ Aj) = 1 − lim P(An).

Thus

lim P(An) = P(⋂_{j=1}^∞ Aj) = P(lim An),

and the theorem is established.
This theorem will prove very useful in many parts of this book.
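On a finite space a monotone sequence of events is eventually constant, so the theorem can be illustrated (though of course not proved) by direct computation. A sketch with the increasing sequence An = {1, . . . , n}:

```python
from fractions import Fraction

S = frozenset(range(1, 101))
P = lambda A: Fraction(len(A), len(S))                # uniform probability

A_n = [set(range(1, n + 1)) for n in range(1, 101)]   # A_n increasing to S
limit_set = set().union(*A_n)                         # lim A_n = U_n A_n

probs = [P(A) for A in A_n]
assert probs == sorted(probs)           # P(A_n) is nondecreasing
assert probs[-1] == P(limit_set) == 1   # lim P(A_n) = P(lim A_n)
```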
Exercises
2.1.1 If the events Aj, j = 1, 2, 3 are such that A1 ⊂ A2 ⊂ A3 and P(A1) = 1/4,
P(A2) = 5/12, P(A3) = 7/12, compute the probability of the following events:
2.1.2 If two fair dice are rolled once, what is the probability that the total
number of spots shown is
i) Equal to 5?
ii) Divisible by 3?
2.1.3 Twenty balls numbered from 1 to 20 are mixed in an urn and two balls
are drawn successively and without replacement. If x1 and x2 are the numbers
written on the first and second ball drawn, respectively, what is the probability
that:
i) x1 + x2 = 8?
ii) x1 + x2 ≤ 5?
2.1.4 Let S = {x integer; 1 ≤ x ≤ 200} and define the events A, B, and C by:

A = {x ∈ S; x is divisible by 7},
B = {x ∈ S; x = 3n + 10 for some positive integer n},
C = {x ∈ S; x² + 1 ≤ 375}.
Compute P(A), P(B), P(C), where P is the equally likely probability function
on the events of S.
2.1.5 Let S be the set of all outcomes when flipping a fair coin four times and
let P be the uniform probability function on the events of S. Define the events
A, B as follows:
A = {s ∈ S; s contains more T's than H's},
B = {s ∈ S; any T in s precedes every H in s}.
Σ_{j=1}^∞ P(Aj) < ∞.

P(A̲) ≤ lim inf P(An) ≤ lim sup P(An) ≤ P(Ā).

2.2 Conditional Probability
DEFINITION 2 Let A be an event such that P(A) > 0. Then the conditional probability, given
A, is the (set) function denoted by P(· | A) and defined for every event B as
follows:

P(B | A) = P(A ∩ B) / P(A).
P(⋃_{j=1}^∞ Aj | A) = P((⋃_{j=1}^∞ Aj) ∩ A) / P(A) = P(⋃_{j=1}^∞ (Aj ∩ A)) / P(A)
    = Σ_{j=1}^∞ P(Aj ∩ A) / P(A) = Σ_{j=1}^∞ P(Aj | A),

and

P(S | A) = P(S ∩ A) / P(A) = P(A) / P(A) = 1, since S ∩ A = A.
P(⋂_{j=1}^{n−1} Aj) > 0.

Then

P(⋂_{j=1}^n Aj) = P(An | A1 ∩ A2 ∩ · · · ∩ A_{n−1})
    × P(A_{n−1} | A1 ∩ · · · ∩ A_{n−2}) · · · P(A2 | A1) P(A1).
(The proof of this theorem is left as an exercise; see Exercise 2.2.4.)
REMARK 3 The value of the above formula lies in the fact that, in general, it
is easier to calculate the conditional probabilities on the right-hand side. This
point is illustrated by the following simple example.
EXAMPLE 1 An urn contains 10 identical balls (except for color) of which five are black,
three are red and two are white. Four balls are drawn without replacement.
Find the probability that the first ball is black, the second red, the third white
and the fourth black.
Let A1 be the event that the first ball is black, A2 be the event that the
second ball is red, A3 be the event that the third ball is white and A4 be the
event that the fourth ball is black. Then
P(A1 ∩ A2 ∩ A3 ∩ A4)
    = P(A4 | A1 ∩ A2 ∩ A3) P(A3 | A1 ∩ A2) P(A2 | A1) P(A1),

and by using the uniform probability function, we have

P(A1) = 5/10,  P(A2 | A1) = 3/9,  P(A3 | A1 ∩ A2) = 2/8,  P(A4 | A1 ∩ A2 ∩ A3) = 4/7.

Thus the desired probability is (5/10)(3/9)(2/8)(4/7) = 1/42.
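The computation of Example 1 can be confirmed by enumerating all ordered draws of four balls (colors encoded as B, R, W below; the enumeration is an illustration, not part of the text):

```python
from fractions import Fraction
from itertools import permutations

balls = "B" * 5 + "R" * 3 + "W" * 2        # 5 black, 3 red, 2 white
# Count ordered draws of 4 distinct balls whose colors are black-red-white-black.
favorable = sum(1 for p in permutations(range(10), 4)
                if [balls[i] for i in p] == list("BRWB"))
total = 10 * 9 * 8 * 7                      # all ordered draws of 4 balls

assert Fraction(favorable, total) == \
       Fraction(5, 10) * Fraction(3, 9) * Fraction(2, 8) * Fraction(4, 7)
assert Fraction(favorable, total) == Fraction(1, 42)
```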
B = ⋃_j (B ∩ Aj).

Hence

P(B) = Σ_j P(B ∩ Aj) = Σ_j P(B | Aj) P(Aj),

provided P(Aj) > 0, for all j. Thus we have the following theorem.

THEOREM 4 (Total Probability Theorem) Let {Aj, j = 1, 2, . . . } be a partition of S with
P(Aj) > 0, all j. Then for B ∈ A, we have P(B) = Σ_j P(B|Aj)P(Aj).
This formula gives a way of evaluating P(B) in terms of P(B|Aj) and
P(Aj), j = 1, 2, . . . . Under the condition that P(B) > 0, the above formula
also gives

P(Aj | B) = P(B | Aj) P(Aj) / Σ_i P(B | Ai) P(Ai).

Thus

THEOREM 5 (Bayes' Formula) If {Aj, j = 1, 2, . . .} is a partition of S and P(Aj) > 0, j = 1,
2, . . . , and if P(B) > 0, then

P(Aj | B) = P(B | Aj) P(Aj) / Σ_i P(B | Ai) P(Ai).
P(A | B) = P(B | A) P(A) / [P(B | A) P(A) + P(B | Aᶜ) P(Aᶜ)]
    = 1 · p / [1 · p + (1/5)(1 − p)] = 5p / (4p + 1).
Furthermore, it is easily seen that P(A|B) = P(A) if and only if p = 0 or 1.
For example, for p = 0.7, 0.5, 0.3, we find, respectively, that P(A|B) is
approximately equal to: 0.92, 0.83 and 0.68.
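The closed form P(A|B) = 5p/(4p + 1) and the values just quoted are easy to reproduce; a sketch assuming, as in the example, that a wrong guess turns out correct with probability 1/5:

```python
def posterior(p, n=5):
    # Bayes' formula with P(B|A) = 1 and P(B|A^c) = 1/n:
    # P(A|B) = p / (p + (1 - p)/n), which equals 5p/(4p + 1) for n = 5.
    return p / (p + (1 - p) / n)

for p, quoted in [(0.7, 0.92), (0.5, 0.83), (0.3, 0.68)]:
    assert abs(posterior(p) - quoted) < 0.005      # matches the text's values
    assert abs(posterior(p) - 5 * p / (4 * p + 1)) < 1e-12
```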
Of course, there is no reason to restrict ourselves to one partition of S
only. We may consider, for example, two partitions {Ai, i = 1, 2, . . .} and
{Bj, j = 1, 2, . . . }. Then, clearly,

Ai = ⋃_j (Ai ∩ Bj),  i = 1, 2, . . . ,

Bj = ⋃_i (Bj ∩ Ai),  j = 1, 2, . . . ,

and

{Ai ∩ Bj, i = 1, 2, . . . , j = 1, 2, . . .}

is a partition of S. In fact,

(Ai ∩ Bj) ∩ (Ai′ ∩ Bj′) = ∅ if (i, j) ≠ (i′, j′)

and

⋃_{i,j} (Ai ∩ Bj) = ⋃_i ⋃_j (Ai ∩ Bj) = ⋃_i Ai = S.
The expression P(Ai ∩ Bj) is called the joint probability of Ai and Bj. On the
other hand, from

Ai = ⋃_j (Ai ∩ Bj) and Bj = ⋃_i (Ai ∩ Bj),

we get

P(Ai) = Σ_j P(Ai ∩ Bj) = Σ_j P(Ai | Bj) P(Bj),

provided P(Bj) > 0, j = 1, 2, . . . , and

P(Bj) = Σ_i P(Ai ∩ Bj) = Σ_i P(Bj | Ai) P(Ai),

provided P(Ai) > 0, i = 1, 2, . . . . The probabilities P(Ai), P(Bj) are called
marginal probabilities. We have analogous expressions for the case of more
than two partitions of S.
Exercises
2.2.1 If P(A|B) > P(A), then show that P(B|A) > P(B) (P(A)P(B) > 0).
2.2.2 Show that:
i) P(Aᶜ|B) = 1 − P(A|B);
ii) P(A ∪ B|C) = P(A|C) + P(B|C) − P(A ∩ B|C).
Also show, by means of counterexamples, that the following equations need
not be true:
iii) P(A|Bᶜ) = 1 − P(A|B);
iv) P(C|A + B) = P(C|A) + P(C|B).
2.2.3 If A ∩ B = ∅ and P(A + B) > 0, express the probabilities P(A|A + B)
and P(B|A + B) in terms of P(A) and P(B).
2.2.4 Use induction to prove Theorem 3.
2.2.5 Suppose that a multiple choice test lists n alternative answers of which
only one is correct. Let p, A and B be defined as in Example 2 and find Pn(A|B)
in terms of n and p. Next show that if p is fixed but different from 0 and 1, then
Pn(A|B) increases as n increases. Does this result seem reasonable?
2.2.6 If Aj, j = 1, 2, 3 are any events in S, show that {A1, A1ᶜ ∩ A2, A1ᶜ ∩ A2ᶜ
∩ A3, (A1 ∪ A2 ∪ A3)ᶜ} is a partition of S.
2.2.7 Let {Aj, j = 1, . . . , 5} be a partition of S and suppose that P(Aj) = j/15
and P(A|Aj) = (5 − j)/15, j = 1, . . . , 5. Compute the probabilities P(Aj|A),
j = 1, . . . , 5.
2.2.8 A girls' club has on its membership rolls the names of 50 girls with the
following descriptions:
20 blondes, 15 with blue eyes and 5 with brown eyes;
25 brunettes, 5 with blue eyes and 20 with brown eyes;
5 redheads, 1 with blue eyes and 4 with green eyes.
If one arranges a blind date with a club member, what is the probability that:
i) The girl is blonde?
ii) The girl is blonde, if it was only revealed that she has blue eyes?
2.2.9 Suppose that the probability that both of a pair of twins are boys is 0.30
and that the probability that they are both girls is 0.26. Given that the probabil-
ity of a child being a boy is 0.52, what is the probability that:
i) The second twin is a boy, given that the first is a boy?
ii) The second twin is a girl, given that the first is a girl?
2.2.10 Three machines I, II and III manufacture 30%, 30% and 40%, respec-
tively, of the total output of certain items. Of them, 4%, 3% and 2%, respec-
tively, are defective. One item is drawn at random, tested and found to be
defective. What is the probability that the item was manufactured by each one
of the machines I, II and III?
2.2.11 A shipment of 20 TV tubes contains 16 good tubes and 4 defective
tubes. Three tubes are chosen at random and tested successively. What is the
probability that:
i) The third tube is good, if the first two were found to be good?
ii) The third tube is defective, if one of the other two was found to be good
and the other one was found to be defective?
2.2.12 Suppose that a test for diagnosing a certain heart disease is 95%
accurate when applied to both those who have the disease and those who do
not. If it is known that 5 of 1,000 in a certain population have the disease in
question, compute the probability that a patient actually has the disease if the
test indicates that he does. (Interpret the answer by intuitive reasoning.)
2.2.13 Consider two urns Uj, j = 1, 2, such that urn Uj contains mj white balls
and nj black balls. A ball is drawn at random from each one of the two urns and
is placed into a third urn. Then a ball is drawn at random from the third urn.
Compute the probability that the ball is black.
2.2.14 Consider the urns of Exercise 2.2.13. A balanced die is rolled and if
an even number appears, a ball, chosen at random from U1, is transferred to
urn U2. If an odd number appears, a ball, chosen at random from urn U2, is
transferred to urn U1. What is the probability that, after the above experiment
is performed twice, the number of white balls in the urn U2 remains the same?
2.2.15 Consider three urns Uj, j = 1, 2, 3 such that urn Uj contains mj white
balls and nj black balls. A ball, chosen at random, is transferred from urn U1 to
urn U2 (color unnoticed), and then a ball, chosen at random, is transferred
from urn U2 to urn U3 (color unnoticed). Finally, a ball is drawn at random
from urn U3. What is the probability that the ball is white?
2.2.16 Consider the urns of Exercise 2.2.15. One urn is chosen at random
and one ball is drawn from it also at random. If the ball drawn was white, what
is the probability that the urn chosen was urn U1 or U2?
2.2.17 Consider six urns Uj, j = 1, . . . , 6 such that urn Uj contains mj (≥ 2)
white balls and nj (≥ 2) black balls. A balanced die is tossed once and if the
number j appears on the die, two balls are selected at random from urn Uj.
Compute the probability that one ball is white and one ball is black.
2.2.18 Consider k urns Uj, j = 1, . . . , k, each of which contains m white balls
and n black balls. A ball is drawn at random from urn U1 and is placed in urn
U2. Then a ball is drawn at random from urn U2 and is placed in urn U3, etc.
Finally, a ball is chosen at random from urn U_{k−1} and is placed in urn Uk. A ball
is then drawn at random from urn Uk. Compute the probability that this last
ball is black.
2.3 Independence
For any events A, B with P(A) > 0, we defined P(B|A) = P(A ∩ B)/P(A). Now
P(B|A) may be > P(B), < P(B), or = P(B). As an illustration, consider an urn
containing 10 balls, seven of which are red, the remaining three being black.
Except for color, the balls are identical. Suppose that two balls are drawn
successively and without replacement. Then (assuming throughout the uniform
probability function) the conditional probability that the second ball is
red, given that the first ball was red, is 6/9, whereas the conditional probability
that the second ball is red, given that the first was black, is 7/9. Without any
knowledge regarding the first ball, the probability that the second ball is red is
7/10. On the other hand, if the balls are drawn with replacement, the probability
that the second ball is red, given that the first ball was red, is 7/10. This probability
is the same even if the first ball was black. In other words, knowledge of the
event which occurred in the first drawing provides no additional information in
calculating the probability of the event that the second ball is red. Events like
these are said to be independent.
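The conditional probabilities quoted in this passage (6/9, 7/9, 7/10) can be verified by enumerating equally likely ordered pairs of draws; an illustrative sketch:

```python
from fractions import Fraction
from itertools import permutations, product

balls = list("R" * 7 + "B" * 3)            # 7 red, 3 black

def cond_second_red(pairs, first_color):
    # P(second red | first is first_color), by counting equally likely pairs.
    given = [d for d in pairs if d[0] == first_color]
    return Fraction(sum(1 for d in given if d[1] == "R"), len(given))

without = [(balls[i], balls[j]) for i, j in permutations(range(10), 2)]
with_repl = list(product(balls, repeat=2))

assert cond_second_red(without, "R") == Fraction(6, 9)     # without replacement
assert cond_second_red(without, "B") == Fraction(7, 9)
assert cond_second_red(with_repl, "R") == Fraction(7, 10)  # with replacement
assert cond_second_red(with_repl, "B") == Fraction(7, 10)  # independence
```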
As another example, revisit the two-children families example considered
earlier, and define the events A and B as follows: A = "children of both
genders," B = "older child is a boy." Then P(A) = P(B) = P(B|A) = 1/2. Again,
knowledge of the event A provides no additional information in calculating
the probability of the event B. Thus A and B are independent.
More generally, let A, B be events with P(A) > 0. Then if P(B|A) = P(B),
we say that the event B is (statistically or stochastically or in the probability
sense) independent of the event A. If P(B) is also > 0, then it is easily seen that
A is also independent of B. In fact,

P(A|B) = P(A ∩ B)/P(B) = P(B|A)P(A)/P(B) = P(B)P(A)/P(B) = P(A).

That is, if P(A), P(B) > 0, and one of the events is independent of the other,
then this second event is also independent of the first. Thus, independence is
a symmetric relation, and we may simply say that A and B are independent. In
this case P(A ∩ B) = P(A)P(B), and we may take this relationship as the
definition of independence of A and B. That is,

DEFINITION 3    The events A, B are said to be (statistically or stochastically or in the probability sense) independent if P(A ∩ B) = P(A)P(B).

Notice that this relationship is true even if one or both of P(A), P(B) = 0.
As was pointed out in connection with the examples discussed above,
independence of two events simply means that knowledge of the occurrence of
one of them helps in no way in re-evaluating the probability that the other
event happens. This is true for any two independent events A and B, as follows
from the equation P(A|B) = P(A), provided P(B) > 0, or P(B|A) = P(B),
provided P(A) > 0. Events which are intuitively independent arise, for exam-
ple, in connection with the descriptive experiments of successively drawing
balls with replacement from the same urn with always the same content, or
drawing cards with replacement from the same deck of playing cards, or
repeatedly tossing the same or different coins, etc.
What actually happens in practice is to consider events which are inde-
pendent in the intuitive sense, and then define the probability function P
appropriately to reflect this independence.
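The urn illustration above lends itself to a direct computational check; the sketch below enumerates the ordered draws (7 red, 3 black balls, two successive draws without replacement) with exact rational arithmetic:

```python
from fractions import Fraction
from itertools import permutations

# Urn from the example: 7 red (R) and 3 black (B) balls, drawn twice
# without replacement; all 10 * 9 ordered pairs are equally likely.
balls = ["R"] * 7 + ["B"] * 3
pairs = list(permutations(range(10), 2))

def prob(event):
    favorable = sum(1 for p in pairs if event(p))
    return Fraction(favorable, len(pairs))

p_first_red = prob(lambda p: balls[p[0]] == "R")
p_both_red = prob(lambda p: balls[p[0]] == "R" and balls[p[1]] == "R")
p_second_red = prob(lambda p: balls[p[1]] == "R")

# Conditional probability P(second red | first red) = 6/9,
# while the unconditional P(second red) = 7/10.
assert p_both_red / p_first_red == Fraction(6, 9)
assert p_second_red == Fraction(7, 10)
```

Repeating the same enumeration over draws with replacement gives 7/10 in every case, as the text asserts.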
The definition of independence generalizes to any finite number of events.
Thus:
DEFINITION 4    The events Aj, j = 1, 2, . . . , n are said to be (mutually or completely) independent if the following relationships hold:

P(A_{j1} ∩ · · · ∩ A_{jk}) = P(A_{j1}) · · · P(A_{jk})

for any k = 2, . . . , n and any indices 1 ≤ j1 < j2 < · · · < jk ≤ n. The number of these relationships is

\binom{n}{2} + \binom{n}{3} + · · · + \binom{n}{n} = 2^n − n − 1.

For example, for n = 3 these relationships are:

P(A1 ∩ A2 ∩ A3) = P(A1)P(A2)P(A3),
P(A1 ∩ A2) = P(A1)P(A2),
P(A1 ∩ A3) = P(A1)P(A3),
P(A2 ∩ A3) = P(A2)P(A3).
That these four relations are necessary for the characterization of indepen-
dence of A1, A2, A3 is illustrated by the following examples:
Let S = {1, 2, 3, 4}, P({1}) = · · · = P({4}) = 1/4, and set A1 = {1, 2}, A2 = {1, 3},
A3 = {1, 4}. Then

A1 ∩ A2 = A1 ∩ A3 = A2 ∩ A3 = {1}, and A1 ∩ A2 ∩ A3 = {1}.

Thus

P(A1 ∩ A2) = P(A1 ∩ A3) = P(A2 ∩ A3) = P(A1 ∩ A2 ∩ A3) = 1/4.

Next,

P(A1 ∩ A2) = 1/4 = (1/2)(1/2) = P(A1)P(A2),
P(A1 ∩ A3) = 1/4 = (1/2)(1/2) = P(A1)P(A3),
P(A2 ∩ A3) = 1/4 = (1/2)(1/2) = P(A2)P(A3),

but

P(A1 ∩ A2 ∩ A3) = 1/4 ≠ (1/2)(1/2)(1/2) = P(A1)P(A2)P(A3).
Next, let S = {1, 2, 3, 4, 5}, and define P by

P({1}) = 1/8, P({2}) = P({3}) = P({4}) = 3/16, P({5}) = 5/16.

Let

A1 = {1, 2, 3}, A2 = {1, 2, 4}, A3 = {1, 3, 4}.

Then

A1 ∩ A2 = {1, 2}, A1 ∩ A2 ∩ A3 = {1}.

Thus

P(A1 ∩ A2 ∩ A3) = 1/8 = (1/2)(1/2)(1/2) = P(A1)P(A2)P(A3),

but

P(A1 ∩ A2) = 5/16 ≠ (1/2)(1/2) = P(A1)P(A2).
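Both counterexamples can be verified mechanically; a small sketch using exact fractions:

```python
from fractions import Fraction

def P(event, weights):
    # Probability of an event (a set of sample points) under given weights.
    return sum(weights[s] for s in event)

# First example: uniform probabilities on S = {1, 2, 3, 4}.
w1 = {s: Fraction(1, 4) for s in (1, 2, 3, 4)}
A1, A2, A3 = {1, 2}, {1, 3}, {1, 4}
# Pairwise independence holds ...
assert P(A1 & A2, w1) == P(A1, w1) * P(A2, w1)
assert P(A1 & A3, w1) == P(A1, w1) * P(A3, w1)
assert P(A2 & A3, w1) == P(A2, w1) * P(A3, w1)
# ... but the triple product relation fails.
assert P(A1 & A2 & A3, w1) != P(A1, w1) * P(A2, w1) * P(A3, w1)

# Second example: S = {1, ..., 5} with the stated non-uniform weights.
w2 = {1: Fraction(2, 16), 2: Fraction(3, 16), 3: Fraction(3, 16),
      4: Fraction(3, 16), 5: Fraction(5, 16)}
B1, B2, B3 = {1, 2, 3}, {1, 2, 4}, {1, 3, 4}
# The triple product relation holds ...
assert P(B1 & B2 & B3, w2) == P(B1, w2) * P(B2, w2) * P(B3, w2)
# ... but pairwise independence fails.
assert P(B1 & B2, w2) != P(B1, w2) * P(B2, w2)
```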
The following result, regarding independence of events, is often used by
many authors without any reference to it. It is the theorem below.

THEOREM 6    If the events A1, . . . , An are independent, so are the events A′1, . . . , A′n, where A′j is either Aj or Aj^c, j = 1, . . . , n.

PROOF    The proof is done by (a double) induction. For n = 2, we have to
show that P(A′1 ∩ A′2) = P(A′1)P(A′2). Indeed, let A′1 = A1 and A′2 = A2^c. Then

P(A′1 ∩ A′2) = P(A1 ∩ A2^c) = P[A1 ∩ (S − A2)] = P(A1 − A1 ∩ A2)
= P(A1) − P(A1 ∩ A2) = P(A1) − P(A1)P(A2) = P(A1)[1 − P(A2)]
= P(A1)P(A2^c) = P(A′1)P(A′2).

Similarly if A′1 = A1^c and A′2 = A2. For A′1 = A1^c and A′2 = A2^c,

P(A′1 ∩ A′2) = P(A1^c ∩ A2^c) = P[(S − A1) ∩ A2^c] = P(A2^c − A1 ∩ A2^c)
= P(A2^c) − P(A1 ∩ A2^c) = P(A2^c) − P(A1)P(A2^c)
= P(A2^c)[1 − P(A1)] = P(A2^c)P(A1^c) = P(A′1)P(A′2).
Next, assume the assertion to be true for k events and show it to be true
for k + 1 events. That is, we suppose that P(A′1 ∩ · · · ∩ A′k) = P(A′1) · · · P(A′k),
and we shall show that P(A′1 ∩ · · · ∩ A′_{k+1}) = P(A′1) · · · P(A′_{k+1}). First, assume
that A′_{k+1} = A_{k+1}, and we have to show that

P(A′1 ∩ · · · ∩ A′_{k+1}) = P(A′1 ∩ · · · ∩ A′k ∩ A_{k+1}) = P(A′1) · · · P(A′k)P(A_{k+1}).

Indeed, if A′1 = A1^c and A′j = Aj for j = 2, . . . , k, then

P(A1^c ∩ A2 ∩ · · · ∩ Ak ∩ A_{k+1}) = P[(S − A1) ∩ A2 ∩ · · · ∩ Ak ∩ A_{k+1}]
= P(A2 ∩ · · · ∩ Ak ∩ A_{k+1} − A1 ∩ A2 ∩ · · · ∩ Ak ∩ A_{k+1})
= P(A2 ∩ · · · ∩ Ak ∩ A_{k+1}) − P(A1 ∩ A2 ∩ · · · ∩ Ak ∩ A_{k+1})
= P(A2) · · · P(Ak)P(A_{k+1}) − P(A1)P(A2) · · · P(Ak)P(A_{k+1})
= P(A2) · · · P(Ak)P(A_{k+1})[1 − P(A1)] = P(A1^c)P(A2) · · · P(Ak)P(A_{k+1}).

This is, clearly, true if A1^c is replaced by any other Ai^c, i = 2, . . . , k. Now, for
ℓ < k, assume that

P(A1^c ∩ · · · ∩ Aℓ^c ∩ A_{ℓ+1} ∩ · · · ∩ A_{k+1}) = P(A1^c) · · · P(Aℓ^c)P(A_{ℓ+1}) · · · P(A_{k+1}),

and show that

P(A1^c ∩ · · · ∩ Aℓ^c ∩ A_{ℓ+1}^c ∩ A_{ℓ+2} ∩ · · · ∩ A_{k+1})
= P(A1^c) · · · P(Aℓ^c)P(A_{ℓ+1}^c)P(A_{ℓ+2}) · · · P(A_{k+1}).

Indeed,

P(A1^c ∩ · · · ∩ Aℓ^c ∩ A_{ℓ+1}^c ∩ A_{ℓ+2} ∩ · · · ∩ A_{k+1})
= P[A1^c ∩ · · · ∩ Aℓ^c ∩ (S − A_{ℓ+1}) ∩ A_{ℓ+2} ∩ · · · ∩ A_{k+1}]
= P(A1^c ∩ · · · ∩ Aℓ^c ∩ A_{ℓ+2} ∩ · · · ∩ A_{k+1} − A1^c ∩ · · · ∩ Aℓ^c ∩ A_{ℓ+1} ∩ A_{ℓ+2} ∩ · · · ∩ A_{k+1})
= P(A1^c) · · · P(Aℓ^c)P(A_{ℓ+2}) · · · P(A_{k+1}) − P(A1^c) · · · P(Aℓ^c)P(A_{ℓ+1})P(A_{ℓ+2}) · · · P(A_{k+1})
= P(A1^c) · · · P(Aℓ^c)P(A_{ℓ+2}) · · · P(A_{k+1})[1 − P(A_{ℓ+1})]
= P(A1^c) · · · P(Aℓ^c)P(A_{ℓ+1}^c)P(A_{ℓ+2}) · · · P(A_{k+1}),

as was to be seen. It is also, clearly, true that the same result holds if the ℓ Ai's
which are Ai^c are chosen in any one of the \binom{k}{ℓ} possible ways of choosing ℓ out
of k. Thus, we have shown that

P(A′1 ∩ · · · ∩ A′k ∩ A_{k+1}) = P(A′1) · · · P(A′k)P(A_{k+1}).

Finally, under the assumption that

P(A′1 ∩ · · · ∩ A′k) = P(A′1) · · · P(A′k),

take A′_{k+1} = A_{k+1}^c, and show that

P(A′1 ∩ · · · ∩ A′k ∩ A_{k+1}^c) = P(A′1) · · · P(A′k)P(A_{k+1}^c).

In fact,

P(A′1 ∩ · · · ∩ A′k ∩ A_{k+1}^c) = P[A′1 ∩ · · · ∩ A′k ∩ (S − A_{k+1})]
= P(A′1 ∩ · · · ∩ A′k − A′1 ∩ · · · ∩ A′k ∩ A_{k+1})
= P(A′1 ∩ · · · ∩ A′k) − P(A′1 ∩ · · · ∩ A′k ∩ A_{k+1})
= P(A′1) · · · P(A′k) − P(A′1) · · · P(A′k)P(A_{k+1})
= P(A′1) · · · P(A′k)[1 − P(A_{k+1})] = P(A′1) · · · P(A′k)P(A_{k+1}^c).
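The n = 2 case of Theorem 6 can be illustrated on a concrete sample space; the die and the two events below are illustrative choices, not taken from the text:

```python
from fractions import Fraction

# A fair die: S = {1, ..., 6} with the uniform probability function.
S = {1, 2, 3, 4, 5, 6}
def P(event):
    return Fraction(len(event), len(S))

A1 = {2, 4, 6}   # "even outcome", P(A1) = 1/2
A2 = {1, 2}      # "outcome at most 2", P(A2) = 1/3
A1c = S - A1     # complement of A1

# A1 and A2 are independent:
assert P(A1 & A2) == P(A1) * P(A2)
# Theorem 6 (n = 2 case): A1^c is then also independent of A2, via
# P(A1^c ∩ A2) = P(A2) - P(A1 ∩ A2) = [1 - P(A1)] P(A2).
assert P(A1c & A2) == P(A1c) * P(A2)
```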
The events Bj are subsets of S, and the experiments Ej, j = 1,
2, . . . , n are said to be independent if, for all events Bj associated with experiment Ej alone, j = 1, 2, . . . , n, it holds that

P(B1 ∩ · · · ∩ Bn) = P(B1) · · · P(Bn).

Again, the probability function P is defined, in terms of the Pj, j = 1, 2, . . . , n, on
the class of events in S so as to reflect the intuitive independence of the
experiments Ej, j = 1, 2, . . . , n.
In closing this section, we mention that events and experiments which are
not independent are said to be dependent.
Exercises
2.3.1 If A and B are disjoint events, then show that A and B are independent
if and only if at least one of P(A), P(B) is zero.
2.3.2 Show that if the event A is independent of itself, then P(A) = 0 or 1.
2.3.3 If A, B are independent, A, C are independent and B ∩ C = ∅, then A,
B + C are independent. Show, by means of a counterexample, that the conclusion need not be true if B ∩ C ≠ ∅.
2.3.4 For each j = 1, . . . , n, suppose that the events A1, . . . , Am, Bj are
independent and that Bi ∩ Bj = ∅, i ≠ j. Then show that the events A1, . . . , Am,
Σ_{j=1}^n Bj are independent.
2.3.5 If Aj, j = 1, . . . , n, are independent events, show that

P(∪_{j=1}^n Aj) = 1 − Π_{j=1}^n P(Aj^c).
2.3.6 Jim takes the written and road driver's license tests repeatedly until he
passes them. Given that the probability that he passes the written test is 0.9
and the road test is 0.6 and that tests are independent of each other, what is the
probability that he will pass both tests on his nth attempt? (Assume that
the road test cannot be taken unless he passes the written test, and that once
he passes the written test he does not have to take it again, no matter whether
he passes or fails his next road test. Also, the written and the road tests are
considered distinct attempts.)
2.3.7 The probability that a missile fired against a target is not intercepted by
an antimissile missile is 2/3. Given that the missile has not been intercepted, the
probability of a successful hit is 3/4. If four missiles are fired independently,
what is the probability that:
what is the probability that:
i) All will successfully hit the target?
ii) At least one will do so?
How many missiles should be fired, so that:
2.4 Combinatorial Results
P_{n,k} / k! = C_{n,k} = \binom{n}{k} = n! / [k!(n − k)!]

if the sampling is done without replacement; and is equal to

N(n, k) = \binom{n + k − 1}{k}

if the sampling is done with replacement; that is,

\binom{n + k − 1}{k} = N(n, k),

as was to be seen.
For the sake of illustration of Theorem 8, let us consider the following
examples.
EXAMPLE 4    (i) Form all possible three-digit numbers by using the numbers 1, 2, 3, 4, 5.
(ii) Find the number of all subsets of the set S = {1, . . . , N}.
In part (i), clearly, the order in which the numbers are selected is relevant.
Then the required number is P_{5,3} = 5 · 4 · 3 = 60 without repetitions, and 5^3 = 125
with repetitions.
In part (ii) the order is, clearly, irrelevant and the required number is
\binom{N}{0} + \binom{N}{1} + · · · + \binom{N}{N} = 2^N, as already found in Example 3.
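Both counts in Example 4 are easy to confirm by enumeration:

```python
from itertools import permutations, product
from math import comb

digits = [1, 2, 3, 4, 5]

# Part (i): three-digit numbers formed from 1-5.
no_repeats = len(list(permutations(digits, 3)))      # P_{5,3}
with_repeats = len(list(product(digits, repeat=3)))  # 5^3
assert no_repeats == 60 and with_repeats == 125

# Part (ii): the number of subsets of {1, ..., N} is
# C(N,0) + C(N,1) + ... + C(N,N) = 2^N.
N = 10
assert sum(comb(N, k) for k in range(N + 1)) == 2 ** N
```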
EXAMPLE 5    An urn contains 8 balls numbered 1 to 8. Four balls are drawn. What is the
probability that the smallest number is 3?
Assuming the uniform probability function, the required probabilities are
as follows for the respective four possible sampling cases:

Order does not count / replacements not allowed:

\binom{5}{3} / \binom{8}{4} = 1/7 ≈ 0.14;

Order does not count / replacements allowed:

\binom{6 + 3 − 1}{3} / \binom{8 + 4 − 1}{4} = 28/165 ≈ 0.17;

1,098,240 / 2,598,960 ≈ 0.42.
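The first two sampling cases of Example 5 can be confirmed by brute force:

```python
from fractions import Fraction
from itertools import combinations, combinations_with_replacement
from math import comb

balls = range(1, 9)  # balls numbered 1 to 8; four are drawn

# Order does not count / replacements not allowed:
draws = list(combinations(balls, 4))
p1 = Fraction(sum(1 for d in draws if min(d) == 3), len(draws))
assert p1 == Fraction(comb(5, 3), comb(8, 4)) == Fraction(1, 7)

# Order does not count / replacements allowed:
draws = list(combinations_with_replacement(balls, 4))
p2 = Fraction(sum(1 for d in draws if min(d) == 3), len(draws))
assert p2 == Fraction(comb(6 + 3 - 1, 3), comb(8 + 4 - 1, 4)) == Fraction(28, 165)
```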
THEOREM 9
i) The number of ways in which n distinct balls can be distributed into k
distinct cells is k^n.
ii) The number of ways that n distinct balls can be distributed into k distinct
cells so that the jth cell contains nj balls (nj ≥ 0, j = 1, . . . , k, Σ_{j=1}^k nj = n)
is

n! / (n1! n2! · · · nk!) = \binom{n}{n1, n2, . . . , nk}.

iii) The number of ways that n indistinguishable balls can be distributed into
k distinct cells is

\binom{k + n − 1}{n}.

Furthermore, the number of ways that the n balls can be distributed into the k
cells so that no cell is empty is

\binom{n − 1}{k − 1}.

PROOF
i) Obvious, since there are k places to put each of the n balls.
ii) This problem is equivalent to partitioning the n balls into k groups, where
the jth group contains exactly nj balls with nj as above. This can be done in
the following number of ways:

\binom{n}{n1} \binom{n − n1}{n2} · · · \binom{n − n1 − · · · − n_{k−1}}{nk} = n! / (n1! n2! · · · nk!).

iii) We represent the k cells by the k spaces between k + 1 vertical bars and the
n balls by n stars. By fixing the two extreme bars, we are left with k + n −
1 bars and stars which we may consider as k + n − 1 spaces to be filled in
by a bar or a star. Then the problem is that of selecting n spaces for the n
stars, which can be done in \binom{k + n − 1}{n} ways. As for the second part, we now
have the condition that there should not be two adjacent bars. The n stars
create n − 1 spaces between them, and the result follows by selecting k − 1
of these spaces for the k − 1 remaining bars.
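Theorem 9 can be checked by enumerating all assignments of n distinct balls to k distinct cells for small n and k:

```python
from itertools import product
from math import comb, factorial

n, k = 4, 3  # 4 balls into 3 cells: small enough to enumerate

# Part (i): all assignments of n distinct balls to k distinct cells.
assignments = list(product(range(k), repeat=n))
assert len(assignments) == k ** n

# Part (ii): occupancy numbers (2, 1, 1) -> n! / (n1! n2! n3!) ways.
target = (2, 1, 1)
count = sum(1 for a in assignments
            if tuple(a.count(c) for c in range(k)) == target)
assert count == factorial(n) // (factorial(2) * factorial(1) * factorial(1))

# Part (iii): indistinguishable balls = distinct occupancy vectors,
# counted by C(k + n - 1, n); with no cell empty, C(n - 1, k - 1).
occupancies = {tuple(a.count(c) for c in range(k)) for a in assignments}
assert len(occupancies) == comb(k + n - 1, n)
assert sum(1 for o in occupancies if min(o) > 0) == comb(n - 1, k - 1)
```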
REMARK 5
i) The numbers nj, j = 1, . . . , k in the second part of the theorem are called
occupancy numbers.
ii) The answer to (ii) is also the answer to the following different question:
Consider n balls such that, for each j = 1, . . . , k, nj of them are identical among
themselves and distinct from all others (nj ≥ 0, Σ_{j=1}^k nj = n). Then the number of
different permutations is

\binom{n}{n1, n2, . . . , nk}.
Now consider the following examples for the purpose of illustrating the
theorem.
EXAMPLE 7    Find the probability that, in dealing a bridge hand, each player receives one ace.
The number of possible bridge hands is

N = \binom{52}{13, 13, 13, 13} = 52! / (13!)^4.

Our sample space S is a set with N elements, and we assign the uniform probability
measure. Next, the number of sample points for which each player, North,
South, East and West, has one ace can be found as follows:
a) Deal the four aces, one to each player. This can be done in

\binom{4}{1, 1, 1, 1} = 4! / (1! 1! 1! 1!) = 4! ways.

b) Deal the remaining 48 cards, 12 to each player. This can be done in

\binom{48}{12, 12, 12, 12} = 48! / (12!)^4 ways.

Then the required probability is [4! × 48! / (12!)^4] / N ≈ 0.105.
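The count in Example 7 can be evaluated directly with integer arithmetic:

```python
from math import factorial

# Number of possible bridge deals: 52! / (13!)^4.
N = factorial(52) // factorial(13) ** 4

# a) Deal the four aces, one to each player: 4! ways.
# b) Deal the remaining 48 cards, 12 to each: 48! / (12!)^4 ways.
favorable = factorial(4) * (factorial(48) // factorial(12) ** 4)

p = favorable / N
assert abs(p - 0.1055) < 0.0005  # each player gets one ace about 10.5% of the time
```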
EXAMPLE 8    The eleven letters of the word MISSISSIPPI are scrambled and then arranged
in some order.
i) What is the probability that the four I's are consecutive letters in the
resulting arrangement?
There are eight possible positions for the first I, and the remaining
seven letters can be arranged in \binom{7}{1, 4, 2} distinct ways. Thus the required
probability is

8 \binom{7}{1, 4, 2} / \binom{11}{1, 4, 4, 2} = 4/165 ≈ 0.02.

ii) What is the conditional probability that the four I's are consecutive (event
A), given B, where B is the event that the arrangement starts with M and
ends with S?
Since there are only six positions for the first I, we clearly have

P(A|B) = 6 \binom{5}{3, 2} / \binom{9}{4, 3, 2} = 1/21 ≈ 0.05.

iii) What is the conditional probability of A, as defined above, given C, where
C is the event that the arrangement ends with four consecutive S's?
Since there are only four positions for the first I, it is clear that

P(A|C) = 4 \binom{3}{1, 2} / \binom{7}{1, 4, 2} = 4/35 ≈ 0.11.
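The three probabilities of Example 8 can be recomputed from the multinomial coefficients (the helper `multinomial` below is ad hoc, not a library function):

```python
from math import factorial

def multinomial(n, parts):
    # n! / (n1! n2! ... nk!)
    out = factorial(n)
    for p in parts:
        out //= factorial(p)
    return out

total = multinomial(11, (1, 4, 4, 2))  # arrangements of MISSISSIPPI

# i) Four consecutive I's: 8 positions for the block of I's, and the
#    other 7 letters (M, 4 S's, 2 P's) arranged freely.
pi = 8 * multinomial(7, (1, 4, 2)) / total
assert abs(pi - 4 / 165) < 1e-12

# ii) Given "starts with M, ends with S": 9 middle letters (4 I's, 3 S's, 2 P's).
pii = 6 * multinomial(5, (3, 2)) / multinomial(9, (4, 3, 2))
assert abs(pii - 1 / 21) < 1e-12

# iii) Given "ends with four consecutive S's": M, 4 I's, 2 P's remain.
piii = 4 * multinomial(3, (1, 2)) / multinomial(7, (1, 4, 2))
assert abs(piii - 4 / 35) < 1e-12
```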
Exercises
2.4.1 A combination lock can be unlocked by switching it to the left and
stopping at digit a, then switching it to the right and stopping at digit b and,
finally, switching it to the left and stopping at digit c. If the distinct digits a, b
and c are chosen from among the numbers 0, 1, . . . , 9, what is the number of
possible combinations?
2.4.2 How many distinct groups of n symbols in a row can be formed, if each
symbol is either a dot or a dash?
2.4.3 How many different three-digit numbers can be formed by using the
numbers 0, 1, . . . , 9?
2.4.4 Telephone numbers consist of seven digits, three of which are grouped
together, and the remaining four are also grouped together. How many num-
bers can be formed if:
i) No restrictions are imposed?
ii) If the first three numbers are required to be 752?
2.4.5 A certain state uses five symbols for automobile license plates such that
the first two are letters and the last three numbers. How many license plates
can be made, if:
i) All letters and numbers may be used?
ii) No two letters may be the same?
2.4.6 Suppose that the letters C, E, F, F, I and O are written on six chips and
placed into an urn. Then the six chips are mixed and drawn one by one without
replacement. What is the probability that the word OFFICE is formed?
2.4.7 The 24 volumes of the Encyclopaedia Britannica are arranged on a
shelf. What is the probability that:
i) All 24 volumes appear in ascending order?
ii) All 24 volumes appear in ascending order, given that volumes 14 and 15
appeared in ascending order and that volumes 1–13 precede volume 14?
2.4.8 If n countries exchange ambassadors, how many ambassadors are
involved?
2.4.9 From among n eligible draftees, m men are to be drafted so that all
possible combinations are equally likely to be chosen. What is the probability
that a specified man is not drafted?
2.4.10 Show that

\binom{n + 1}{m + 1} / \binom{n}{m} = (n + 1)/(m + 1).
2.4.11 Consider five line segments of length 1, 3, 5, 7 and 9 and choose three
of them at random. What is the probability that a triangle can be formed by
using these three chosen line segments?
2.4.12 From 10 positive and 6 negative numbers, 3 numbers are chosen at
random and without repetitions. What is the probability that their product is
a negative number?
2.4.13 In how many ways can a committee of 2n + 1 people be seated along
one side of a table, if the chairman must sit in the middle?
2.4.14 Each of the 2n members of a committee flips a fair coin in deciding
whether or not to attend a meeting of the committee; a committee member
attends the meeting if an H appears. What is the probability that a majority
will show up in the meeting?
2.4.15 If the probability that a coin falls H is p (0 < p < 1), what is the
probability that two people obtain the same number of Hs, if each one of
them tosses the coin independently n times?
42 2 Some Probabilistic Concepts and Results
2.4.16
i) Six fair dice are tossed once. What is the probability that all six faces
appear?
ii) Seven fair dice are tossed once. What is the probability that every face
appears at least once?
2.4.17 A shipment of 2,000 light bulbs contains 200 defective items and 1,800
good items. Five hundred bulbs are chosen at random, are tested and the
entire shipment is rejected if more than 25 bulbs from among those tested are
found to be defective. What is the probability that the shipment will be
accepted?
2.4.18 Show that

\binom{M}{m} = \binom{M − 1}{m} + \binom{M − 1}{m − 1},

where M, m are positive integers and m < M.
2.4.19 Show that

Σ_{x=0}^r \binom{m}{x} \binom{n}{r − x} = \binom{m + n}{r},

where \binom{k}{x} = 0 if x > k.
2.4.20 Show that

i) Σ_{j=0}^n \binom{n}{j} = 2^n;    ii) Σ_{j=0}^n (−1)^j \binom{n}{j} = 0.
iii) The committee contains the same number of students from each class;
iv) The committee contains two male students and one female student from
each class;
v) The committee chairman is required to be a senior;
vi) The committee chairman is required to be both a senior and male;
vii) The chairman, the secretary and the treasurer of the committee are all
required to belong to different classes.
2.4.23 Refer to Exercise 2.4.22 and suppose that the committee is formed by
choosing its members at random. Compute the probability that the committee
to be chosen satisfies each one of the requirements (i)–(vii).
2.4.24 A fair die is rolled independently until all faces appear at least once.
What is the probability that this happens on the 20th throw?
2.4.27 Suppose that each one of n sticks is broken into one long and one
short part. Two parts are chosen at random. What is the probability that:
i) One part is long and one is short?
ii) Both parts are either long or short?
The 2n parts are arranged at random into n pairs from which new sticks are
formed. Find the probability that:
iii) The parts are joined in the original order;
iv) All long parts are paired with short parts.
2.4.28 Derive the third part of Theorem 9 from Theorem 8(ii).
2.4.29 Three cards are drawn at random and with replacement from a stan-
dard deck of 52 playing cards. Compute the probabilities P(Aj), j = 1, . . . , 5,
where the events Aj, j = 1, . . . , 5 are defined as follows:
A1 = {s ∈ S ; all 3 cards in s are black},
A2 = {s ∈ S ; at least 2 cards in s are red},
A3 = {s ∈ S ; exactly 1 card in s is an ace},
A4 = {s ∈ S ; the first card in s is a diamond, the second is a heart and the third is a club},
A5 = {s ∈ S ; 1 card in s is a diamond, 1 is a heart and 1 is a club}.
A1 = {s ∈ S ; s consists of cards of 1 color},
A2 = {s ∈ S ; s consists only of diamonds},
A3 = {s ∈ S ; s consists of 5 diamonds, 3 hearts, 2 clubs and 3 spades},
A4 = {s ∈ S ; s consists of cards of exactly 2 suits},
A5 = {s ∈ S ; s contains at least 2 aces},
A6 = {s ∈ S ; s does not contain aces, tens and jacks},
A7 = {s ∈ S ; s consists of 3 aces, 2 kings and exactly 7 red cards},
A8 = {s ∈ S ; s consists of cards of all different denominations}.

Aj = {s ∈ S ; s contains exactly j tens},
A = {s ∈ S ; s contains exactly 7 red cards}.
A = {s ∈ S ; s begins with a specific letter},
B = {s ∈ S ; s has the specified letter (mentioned in the definition of A) in the middle entry},
C = {s ∈ S ; s has exactly two of its letters the same}.

Then show that:
i) P(A ∩ B) = P(A)P(B);
ii) P(A ∩ C) = P(A)P(C);
iii) P(B ∩ C) = P(B)P(C);
iv) P(A ∩ B ∩ C) ≠ P(A)P(B)P(C).

Thus the events A, B, C are pairwise independent but not mutually
independent.
2.5 Product Probability Spaces

C = {A1 × A2 ; A1 ∈ A1, A2 ∈ A2},

where A1 × A2 = {(s1, s2); s1 ∈ A1, s2 ∈ A2}.

C = {A1 × · · · × An ; Aj ∈ Aj, j = 1, 2, . . . , n},

and P is the unique probability measure defined on A through the
relationships

P(A1 × · · · × An) = P1(A1) · · · Pn(An),    Aj ∈ Aj, j = 1, 2, . . . , n.

The probability measure P is usually denoted by P1 × · · · × Pn and is called the
product probability measure (with factors Pj, j = 1, 2, . . . , n), and the probability
space (S, A, P) is called the product probability space (with factors (Sj, Aj,
Pj), j = 1, 2, . . . , n). Then the experiments Ej, j = 1, 2, . . . , n are said to be
independent if P(B1 ∩ · · · ∩ Bn) = P(B1) · · · P(Bn), where Bj is defined by

Bj = S1 × · · · × S_{j−1} × Aj × S_{j+1} × · · · × Sn,    j = 1, 2, . . . , n.
Exercises
2.5.1 Form the Cartesian products A × B, A × C, B × C, A × B × C, where
A = {stop, go}, B = {good, defective}, C = {(1, 1), (1, 2), (2, 2)}.
2.6* The Probability of Matchings

S0 = 1,

S1 = Σ_{j=1}^M P(Aj),

S2 = Σ_{1≤j1<j2≤M} P(Aj1 ∩ Aj2),

and, in general,

Sr = Σ_{1≤j1<j2<···<jr≤M} P(Aj1 ∩ Aj2 ∩ · · · ∩ Ajr),    r = 1, 2, . . . , M,

SM = P(A1 ∩ A2 ∩ · · · ∩ AM).

Let also

Bm = "exactly m of the events Aj, j = 1, 2, . . . , M occur,"
Cm = "at least m of the events Aj, j = 1, 2, . . . , M occur,"
Dm = "at most m of the events Aj, j = 1, 2, . . . , M occur."

Then we have

P(Bm) = Σ_{r=m}^M (−1)^{r−m} \binom{r}{m} Sr,    (2)

and, in particular,

P(B0) = S0 − S1 + S2 − · · · + (−1)^M SM,    (3)

and

P(Cm) = P(Bm) + P(B_{m+1}) + · · · + P(BM),    (4)

and

P(Dm) = P(B0) + P(B1) + · · · + P(Bm).    (5)
For the proof of this theorem, all that one has to establish is (2), since (4)
and (5) follow from it. This will be done in Section 5.6 of Chapter 5. For a proof
where S is discrete the reader is referred to the book An Introduction to
Probability Theory and Its Applications, Vol. I, 3rd ed., 1968, by W. Feller, pp.
99100.
The following examples illustrate the above theorem.
EXAMPLE 9    The matching problem (case of sampling without replacement). Suppose that
we have M urns, numbered 1 to M. Let M balls numbered 1 to M be inserted
randomly in the urns, with one ball in each urn. If a ball is placed into the urn
bearing the same number as the ball, a match is said to have occurred.
i) Show that the probability of at least one match is

1 − 1/2! + 1/3! − · · · + (−1)^{M+1} 1/M! ≈ 1 − e^{−1} ≈ 0.63

for large M, and
ii) the probability that exactly m matches will occur, for m = 0, 1, 2, . . . , M, is

(1/m!) [1 − 1 + 1/2! − 1/3! + · · · + (−1)^{M−m} 1/(M − m)!]
= (1/m!) Σ_{k=0}^{M−m} (−1)^k / k! ≈ e^{−1}/m!    for M − m large.
DISCUSSION    To describe the distribution of the balls among the urns, write
an M-tuple (z1, z2, . . . , zM) whose jth component represents the number of the
ball inserted in the jth urn. For k = 1, 2, . . . , M, the event Ak that a match will
occur in the kth urn may be written Ak = {(z1, . . . , zM) ∈ ℜ^M; zj integer, 1 ≤ zj ≤ M, zk = k}. It is then easily seen that

P(Ak1 ∩ Ak2 ∩ · · · ∩ Akr) = (M − r)! / M!,

so that

Sr = \binom{M}{r} (M − r)! / M! = 1/r!.

This implies the desired results.
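For a modest M, the matching probability of Example 9(i) can be compared against a full enumeration of the M! placements:

```python
from fractions import Fraction
from itertools import permutations
from math import exp, factorial

M = 6  # small enough to enumerate all M! placements

# Brute force: placement z has a match in urn j whenever z[j] == j.
perms = list(permutations(range(M)))
at_least_one = sum(1 for z in perms if any(z[j] == j for j in range(M)))
p_match = Fraction(at_least_one, factorial(M))

# Formula from part (i): 1 - 1/2! + 1/3! - ... + (-1)^(M+1)/M!.
formula = sum(Fraction((-1) ** (k + 1), factorial(k)) for k in range(1, M + 1))

assert p_match == formula
assert abs(float(formula) - (1 - exp(-1))) < 0.01  # already close to 0.63 at M = 6
```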
EXAMPLE 10    Coupon collecting (case of sampling with replacement). Suppose that a manufacturer gives away in packages of his product certain items (which we take to
be coupons), each bearing one of the integers 1 to M, in such a way that each
of the M items is equally likely to be found in any package purchased. If n
packages are bought, show that the probability that exactly m of the integers,
1 to M, will not be obtained is equal to

\binom{M}{m} Σ_{k=0}^{M−m} (−1)^k \binom{M − m}{k} (1 − (m + k)/M)^n.
Many variations and applications of the above problem are described in
the literature, one of which is the following. If n distinguishable balls are
distributed among M urns, numbered 1 to M, what is the probability that there
will be exactly m urns in which no ball was placed (that is, exactly m urns
remain empty after the n balls have been distributed)?
DISCUSSION    To describe the coupons found in the n packages purchased,
we write an n-tuple (z1, z2, . . . , zn), whose jth component zj represents the
number of the coupon found in the jth package purchased. We now define the
events A1, A2, . . . , AM. For k = 1, 2, . . . , M, Ak is the event that the number k
will not appear in the sample, that is,

Ak = {(z1, . . . , zn) ∈ ℜ^n ; zj integer, 1 ≤ zj ≤ M, zj ≠ k, j = 1, 2, . . . , n}.

It is easy to see that we have the following results:

P(Ak) = ((M − 1)/M)^n = (1 − 1/M)^n,    k = 1, 2, . . . , M,

P(Ak1 ∩ Ak2) = ((M − 2)/M)^n = (1 − 2/M)^n,    1 ≤ k1 < k2 ≤ M,

and, in general,

P(Ak1 ∩ Ak2 ∩ · · · ∩ Akr) = ((M − r)/M)^n = (1 − r/M)^n,    1 ≤ k1 < k2 < · · · < kr ≤ M.    (6)
Let Bm be the event that exactly m of the integers 1 to M will not be found in
the sample. Clearly, Bm is the event that exactly m of the events A1, . . . , AM
will occur. By relations (2) and (6), we have

P(Bm) = Σ_{r=m}^M (−1)^{r−m} \binom{r}{m} \binom{M}{r} (1 − r/M)^n
= \binom{M}{m} Σ_{k=0}^{M−m} (−1)^k \binom{M − m}{k} (1 − (m + k)/M)^n,    (7)

where, in the second step, we set r = m + k and use the identity

\binom{m + k}{m} \binom{M}{m + k} = \binom{M}{m} \binom{M − m}{k}.    (8)
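Relation (7) can be sanity-checked against a brute-force count over all M^n coupon sequences for small M and n:

```python
from itertools import product
from math import comb

def p_exactly_m_missing(M, n, m):
    # Formula (7): C(M,m) * sum_k (-1)^k C(M-m,k) (1-(m+k)/M)^n.
    return comb(M, m) * sum((-1) ** k * comb(M - m, k) * (1 - (m + k) / M) ** n
                            for k in range(M - m + 1))

# Brute force: enumerate every possible sequence of n coupons from 1..M
# and count how many of the M integers never appear.
M, n = 4, 6
for m in range(M + 1):
    count = sum(1 for z in product(range(M), repeat=n)
                if M - len(set(z)) == m)
    assert abs(p_exactly_m_missing(M, n, m) - count / M ** n) < 1e-9
```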
P(A occurs before B occurs) = P(A) / [P(A) + P(B)].

Indeed,
P(A occurs before B occurs)
= P[A1 + (A1^c ∩ B1^c ∩ A2) + (A1^c ∩ B1^c ∩ A2^c ∩ B2^c ∩ A3)
+ · · · + (A1^c ∩ B1^c ∩ · · · ∩ An^c ∩ Bn^c ∩ A_{n+1}) + · · ·]
= P(A1) + P(A1^c ∩ B1^c ∩ A2) + P(A1^c ∩ B1^c ∩ A2^c ∩ B2^c ∩ A3)
+ · · · + P(A1^c ∩ B1^c ∩ · · · ∩ An^c ∩ Bn^c ∩ A_{n+1}) + · · ·
= P(A1) + P(A1^c ∩ B1^c)P(A2) + P(A1^c ∩ B1^c)P(A2^c ∩ B2^c)P(A3) + · · ·
= P(A) + P(A^c ∩ B^c)P(A) + P^2(A^c ∩ B^c)P(A) + · · · + P^n(A^c ∩ B^c)P(A) + · · ·
= P(A)[1 + P(A^c ∩ B^c) + P^2(A^c ∩ B^c) + · · · + P^n(A^c ∩ B^c) + · · ·]
= P(A) · 1/[1 − P(A^c ∩ B^c)].

But

P(A^c ∩ B^c) = P[(A ∪ B)^c] = 1 − P(A ∪ B) = 1 − P(A + B) = 1 − P(A) − P(B),

so that

1 − P(A^c ∩ B^c) = P(A) + P(B).

Therefore

P(A occurs before B occurs) = P(A) / [P(A) + P(B)],

as asserted.
It is possible to interpret B as a catastrophic event, and A as an event
consisting of taking certain precautionary and protective actions upon the
energizing of a signaling device. Then the significance of the above probability
becomes apparent. As a concrete illustration, consider the following simple
example (see also Exercise 2.6.3).
Then P(A) = 4/52 = 1/13 and P(B) = 12/52 = 3/13, so that

P(A occurs before B occurs) = (1/13) / (1/13 + 3/13) = 1/4.
Exercises
2.6.1 Show that

\binom{m + k}{m} \binom{M}{m + k} = \binom{M}{m} \binom{M − m}{k},

as asserted in relation (8).
2.6.2 Verify the transition in (7) and that the resulting expression is indeed
the desired result.
2.6.3 Consider the following game of chance. Two fair dice are rolled repeat-
edly and independently. If the sum of the outcomes is either 7 or 11, the player
wins immediately, while if the sum is either 2 or 3 or 12, the player loses
immediately. If the sum is either 4 or 5 or 6 or 8 or 9 or 10, the player continues
rolling the dice until either the same sum appears before a sum of 7 appears in
which case he wins, or until a sum of 7 appears before the original sum appears
in which case the player loses. It is assumed that the game terminates the first
time the player wins or loses. What is the probability of winning?
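Exercise 2.6.3 yields to the result just derived: for each possible "point," the probability that the point recurs before a 7 is P(A)/[P(A) + P(B)]. A sketch:

```python
from fractions import Fraction

# Probability that a roll of two fair dice sums to s.
def p_sum(s):
    ways = sum(1 for a in range(1, 7) for b in range(1, 7) if a + b == s)
    return Fraction(ways, 36)

# Immediate win on 7 or 11; immediate loss on 2, 3 or 12.
p_win = p_sum(7) + p_sum(11)
# For each "point" s, the player wins if s appears before 7:
# P(A before B) = P(A) / (P(A) + P(B)) with A = {sum s}, B = {sum 7}.
for s in (4, 5, 6, 8, 9, 10):
    p_win += p_sum(s) * p_sum(s) / (p_sum(s) + p_sum(7))

assert p_win == Fraction(244, 495)  # about 0.4929
```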
Chapter 3
On Random Variables and Their Distributions

3.1 Some General Concepts
fX(xj) = PX({xj}) (= P(X = xj)) for x = xj,
fX(x) ≥ 0 for all x, and Σ_j fX(xj) = 1.

P(X ∈ B) = Σ_{xj ∈ B} fX(xj).

fX(x) ≥ 0 for all x ∈ ℜ^k, and P(X ∈ J) = ∫_J fX(x) dx
for any sub-rectangle J of I. The function fX is the p.d.f. of X. The distribution
of a k-dimensional r. vector is also referred to as a k-dimensional discrete or
(absolutely) continuous distribution, respectively, for a discrete or (abso-
lutely) continuous r. vector. In Sections 3.2 and 3.3, we will discuss two repre-
sentative multidimensional distributions; namely, the Multinomial (discrete)
distribution, and the (continuous) Bivariate Normal distribution.
We will write f rather than fX when no confusion is possible. Again, when
one is presented with a function f and is asked whether f is a p.d.f. (of some r.
vector), all one has to check is non-negativity of f, and that the sum of its values
or its integral (over the appropriate space) is equal to 1.
3.2 Discrete Random Variables (and Random Vectors)

3.2.1 Binomial

S = {S, F} × · · · × {S, F}    (n copies).

In particular, for n = 1, we have the Bernoulli or Point Binomial r.v. The r.v.
X may be interpreted as representing the number of S's (successes) in
the compound experiment E × · · · × E (n copies), where E is the experiment
resulting in the sample space {S, F} and the n experiments are independent
(or, as we say, the n trials are independent). f(x) is the probability that exactly
x S's occur. In fact, f(x) = P(X = x) = P(all n-sequences of S's and F's
with exactly x S's). The probability of one such sequence is p^x q^{n−x} by the
independence of the trials, and this also does not depend on the particular
sequence we are considering. Since there are \binom{n}{x} such sequences, the result
follows.
The distribution of X is called the Binomial distribution and the quantities
n and p are called the parameters of the Binomial distribution. We denote the
Binomial distribution by B(n, p). Often the notation X ∼ B(n, p) will be used
to denote the fact that the r.v. X is distributed as B(n, p). Graphs of the p.d.f.
of the B(n, p) distribution for selected values of n and p are given in Figs. 3.1
and 3.2.
Figure 3.1 Graph of the p.d.f. of the Binomial distribution for n = 12, p = 1/4.
f(0) = 0.0317     f(7) = 0.0115
f(1) = 0.1267     f(8) = 0.0024
f(2) = 0.2323     f(9) = 0.0004
f(3) = 0.2581     f(10) = 0.0000
f(4) = 0.1936     f(11) = 0.0000
f(5) = 0.1032     f(12) = 0.0000
f(6) = 0.0401
Figure 3.2 Graph of the p.d.f. of the Binomial distribution for n = 10, p = 1/2.
f(0) = 0.0010     f(6) = 0.2051
f(1) = 0.0097     f(7) = 0.1172
f(2) = 0.0440     f(8) = 0.0440
f(3) = 0.1172     f(9) = 0.0097
f(4) = 0.2051     f(10) = 0.0010
f(5) = 0.2460
3.2.2 Poisson

X(S) = {0, 1, 2, . . .},    P(X = x) = f(x) = e^{−λ} λ^x / x!,

where λ > 0 is the parameter of the distribution. That f is, in fact, a p.d.f. follows from

Σ_{x=0}^∞ f(x) = e^{−λ} Σ_{x=0}^∞ λ^x / x! = e^{−λ} e^{λ} = 1.
Poisson probabilities.
Figure 3.3 Graph of the p.d.f. of the Poisson distribution with λ = 5.
f(0) = 0.0067     f(9) = 0.0363
f(1) = 0.0337     f(10) = 0.0181
f(2) = 0.0843     f(11) = 0.0082
f(3) = 0.1403     f(12) = 0.0035
f(4) = 0.1755     f(13) = 0.0013
f(5) = 0.1755     f(14) = 0.0005
f(6) = 0.1462     f(15) = 0.0001
f(7) = 0.1044
f(8) = 0.0653

f(n) is negligible for n ≥ 16.
3.2.3 Hypergeometric

X(S) = {0, 1, 2, . . . , r},    f(x) = \binom{m}{x} \binom{n}{r − x} / \binom{m + n}{r}.
3.2.4 Negative Binomial

1 / (1 − x)^n = Σ_{j=0}^∞ \binom{n + j − 1}{j} x^j,    |x| < 1.
{all (r + x)-sequences of S's and F's such that the rth S is at the end of the
sequence}, x = 0, 1, . . . , and f(x) = P(X = x) = P[all (r + x)-sequences as above
for a specified x]. The probability of one such sequence is p^{r−1} q^x p by the
independence assumption, and hence

f(x) = \binom{r + x − 1}{x} p^{r−1} q^x p = p^r \binom{r + x − 1}{x} q^x.

The above interpretation also justifies the name of the distribution. For r = 1,
we get the Geometric (or Pascal) distribution, namely f(x) = p q^x, x = 0, 1, 2, . . . .
3.2.5 Discrete Uniform

X(S) = {0, 1, . . . , n − 1},    f(x) = 1/n,    x = 0, 1, . . . , n − 1.
Figure 3.4 Graph of the p.d.f. of a Discrete Uniform distribution (n = 5).
3.2.6 Multinomial

Here

X(S) = {x = (x1, . . . , xk); xj ≥ 0, j = 1, 2, . . . , k, Σ_{j=1}^k xj = n},

f(x) = [n! / (x1! x2! · · · xk!)] p1^{x1} p2^{x2} · · · pk^{xk},    pj > 0, j = 1, 2, . . . , k, Σ_{j=1}^k pj = 1.

That f is, in fact, a p.d.f. follows from

Σ_x f(x) = Σ_{x1, . . . , xk} [n! / (x1! · · · xk!)] p1^{x1} · · · pk^{xk} = (p1 + · · · + pk)^n = 1^n = 1,
where the summation extends over all xj's such that xj ≥ 0, j = 1, 2, . . . , k,
Σ_{j=1}^k xj = n. The distribution of X is also called the Multinomial distribution and
n, p1, . . . , pk are called the parameters of the distribution. This distribution
occurs in situations like the following. A Multinomial experiment E with k
possible outcomes Oj, j = 1, 2, . . . , k, and hence with sample space S = {all
n-sequences of Oj's}, is carried out n independent times. The probability of
the outcome Oj occurring is pj, j = 1, 2, . . . , k, with pj > 0 and Σ_{j=1}^k pj = 1. Then X is the
random vector whose jth component Xj represents the number of times
the outcome Oj occurs, j = 1, 2, . . . , k. By setting x = (x1, . . . , xk), f(x) is the
probability that the outcome Oj occurs exactly xj times, j = 1, 2, . . . , k. In fact, f(x) = P(X = x)
= P(all n-sequences which contain exactly xj Oj's, j = 1, 2, . . . , k). The probability of each one of these sequences is p1^{x1} · · · pk^{xk} by independence, and since
there are n! / (x1! · · · xk!) such sequences, the result follows.
The fact that the r. vector X has the Multinomial distribution with parameters n and p1, . . . , pk may be denoted thus: X ∼ M(n; p1, . . . , pk).
REMARK 1 When the tables given in the appendices are not directly usable
because the underlying parameters are not included there, we often resort to
linear interpolation. As an illustration, suppose X ∼ B(25, 0.3) and we wish
to calculate P(X = 10). The value p = 0.3 is not included in the Binomial
Tables in Appendix III. However, 4/16 = 0.25 < 0.3 < 0.3125 = 5/16 and the
probabilities P(X = 10), for p = 4/16 and p = 5/16 are, respectively, 0.9703
and 0.8756. Therefore linear interpolation produces the value:
0.9703 − (0.9703 − 0.8756) × (0.3 − 0.25)/(0.3125 − 0.25) = 0.8945.
Likewise for other discrete distributions. The same principle also applies
appropriately to continuous distributions.
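The interpolation step in Remark 1 can be written out directly; a minimal sketch, where the endpoint probabilities 0.9703 and 0.8756 are the tabulated values quoted above:

```python
# Linear interpolation between two tabulated Binomial probabilities, as in
# Remark 1. The endpoint values 0.9703 (at p = 4/16) and 0.8756 (at p = 5/16)
# are the table entries quoted in the text for X ~ B(25, p).

def linear_interpolation(p, p_lo, p_hi, v_lo, v_hi):
    """Interpolate linearly between (p_lo, v_lo) and (p_hi, v_hi) at p."""
    return v_lo - (v_lo - v_hi) * (p - p_lo) / (p_hi - p_lo)

approx = linear_interpolation(0.3, 0.25, 0.3125, 0.9703, 0.8756)
print(round(approx, 4))  # 0.8945, agreeing with the value in the text
```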
REMARK 2 In discrete distributions, we are often faced with calculations of
the form Σ_{x=1}^∞ xλ^x. Under appropriate conditions, we may apply the following
approach:
Σ_{x=1}^∞ xλ^x = λ Σ_{x=1}^∞ xλ^{x−1} = λ Σ_{x=1}^∞ (d/dλ)λ^x = λ (d/dλ) Σ_{x=1}^∞ λ^x
= λ (d/dλ)[λ/(1 − λ)] = λ/(1 − λ)².
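The closed form obtained in Remark 2 is easy to confirm numerically; a small sketch with a hypothetical value λ = 0.4:

```python
# Numerical check of Remark 2: sum_{x=1}^inf x * lam**x = lam / (1 - lam)**2
# for 0 < lam < 1. The value lam = 0.4 is hypothetical.

def series_sum(lam, terms=10_000):
    """Partial sum of sum x * lam**x; converges fast for |lam| < 1."""
    return sum(x * lam**x for x in range(1, terms + 1))

lam = 0.4
closed_form = lam / (1 - lam)**2
assert abs(series_sum(lam) - closed_form) < 1e-12
print(round(closed_form, 6))  # 1.111111
```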
Exercises
3.2.1 A fair coin is tossed independently four times, and let X be the r.v.
defined on the usual sample space S for this experiment as follows:
X(s) = the number of H's in s.
i) X ≥ 1?
ii) X ≤ 20?
iii) 5 ≤ X ≤ 20?
3.2.3 A manufacturing process produces certain articles such that the prob-
ability of each article being defective is p. What is the minimum number, n, of
articles to be produced, so that at least one of them is defective with probabil-
ity at least 0.95? Take p = 0.05.
3.2.4 If the r.v. X is distributed as B(n, p) with p > 1/2, the Binomial Tables
in Appendix III cannot be used directly. In such a case, show that:
i) P(X = x) = P(Y = n − x), where Y ∼ B(n, q), x = 0, 1, . . . , n, and q = 1 − p;
ii) Also, for any integers a, b with 0 ≤ a < b ≤ n, one has: P(a ≤ X ≤ b) =
P(n b Y n a), where Y is as in part (i).
3.2.5 Let X be a Poisson distributed r.v. with parameter λ. Given that
P(X = 0) = 0.1, compute the probability that X > 5.
3.2.6 Refer to Exercise 3.2.5 and suppose that P(X = 1) = P(X = 2). What is
the probability that X < 10? If P(X = 1) = 0.1 and P(X = 2) = 0.2, calculate the
probability that X = 0.
3.2.7 It has been observed that the number of particles emitted by a radio-
active substance which reach a given portion of space during time t follows
closely the Poisson distribution with parameter λ. Calculate the probability
that:
i) No particles reach the portion of space under consideration during
time t;
ii) Exactly 120 particles do so;
iii) At least 50 particles do so;
iv) Give the numerical values in (i)–(iii) if λ = 100.
3.2.8 The phone calls arriving at a given telephone exchange within one
minute follow the Poisson distribution with parameter λ = 10. What is the
probability that in a given minute:
i) No calls arrive?
ii) Exactly 10 calls arrive?
iii) At least 10 calls arrive?
3.2.9 (Truncation of a Poisson r.v.) Let the r.v. X be distributed as Poisson
with parameter λ and define the r.v. Y as follows:
Y = X if X ≥ k (a given positive integer) and Y = 0 otherwise.
Find:
P(Xj = xj, j = 1, . . . , k) = [Π_{j=1}^k C(nj, xj)] / C(n1 + · · · + nk, n),
0 ≤ xj ≤ nj, j = 1, . . . , k, Σ_{j=1}^k xj = n
(here C(n, x) denotes the binomial coefficient "n choose x").
3.2.12 Refer to the manufacturing process of Exercise 3.2.3 and let Y be the
r.v. denoting the minimum number of articles to be manufactured until the
first two defective articles appear.
i) Show that the distribution of Y is given by
P(Y = y) = p²(y − 1)(1 − p)^{y−2}, y = 2, 3, . . . ;
f(x) = cα^x I_A(x), where A = {0, 1, 2, . . .} (0 < α < 1).
3.2.15 Suppose that the r.v. X takes on the values 0, 1, . . . with the following
probabilities:
f(j) = P(X = j) = c/3^j, j = 0, 1, . . . ;
3.2.16 There are four distinct types of human blood denoted by O, A, B and
AB. Suppose that these types occur with the following frequencies: 0.45, 0.40,
0.10, 0.05, respectively. If 20 people are chosen at random, what is the prob-
ability that:
i) All 20 people have blood of the same type?
ii) Nine people have blood type O, eight of type A, two of type B and one of
type AB?
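Part (ii) of Exercise 3.2.16 is a direct Multinomial computation; a sketch (not the book's solution) using only the factorial formula of Section 3.2.6:

```python
from math import factorial

# Multinomial p.d.f. f(x) = n!/(x1!...xk!) * p1^x1 ... pk^xk, applied to the
# blood-type data of Exercise 3.2.16(ii): 9 of type O, 8 of A, 2 of B, 1 of AB.

def multinomial_pmf(counts, probs):
    n = sum(counts)
    coef = factorial(n)
    for x in counts:
        coef //= factorial(x)          # n! / (x1! x2! ... xk!)
    value = float(coef)
    for x, p in zip(counts, probs):
        value *= p**x
    return value

prob = multinomial_pmf([9, 8, 2, 1], [0.45, 0.40, 0.10, 0.05])
print(round(prob, 4))  # 0.0206
```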
3.2.17 A balanced die is tossed (independently) 21 times and let Xj be the
number of times the number j appears, j = 1, . . . , 6.
i) What is the joint p.d.f. of the Xj's?
ii) Compute the probability that X1 = 6, X2 = 5, X3 = 4, X4 = 3, X5 = 2, X6 = 1.
3.2.18 Suppose that three coins are tossed (independently) n times and
define the r.v.s Xj, j = 0, 1, 2, 3 as follows:
f(x + 1) = [(m − x)(r − x) / ((n − r + x + 1)(x + 1))] f(x), x = 0, 1, . . . , min{m, r}.
3.2.21
i) Suppose the r.v.s X1, . . . , Xk have the Multinomial distribution, and let
j be a fixed number from the set {1, . . . , k}. Then show that Xj is distributed as
B(n, pj);
ii) If m is an integer such that 2 ≤ m ≤ k − 1 and j1, . . . , jm are m distinct integers
from the set {1, . . . , k}, show that the r.v.s Xj1, . . . , Xjm have Multinomial
3.2.22 (Polyas urn scheme) Consider an urn containing b black balls and r
red balls. One ball is drawn at random, is replaced and c balls of the same color
as the one drawn are placed into the urn. Suppose that this experiment is
repeated independently n times and let X be the r.v. denoting the number of
black balls drawn. Then show that the p.d.f. of X is given by
P(X = x) = C(n, x) [b(b + c)(b + 2c) · · · (b + (x − 1)c)] [r(r + c) · · · (r + (n − x − 1)c)]
/ [(b + r)(b + r + c)(b + r + 2c) · · · (b + r + (n − 1)c)], x = 0, 1, . . . , n.
(This distribution can be used for a rough description of the spread of conta-
gious diseases. For more about this and also for a certain approximation to the
above distribution, the reader is referred to the book An Introduction to
Probability Theory and Its Applications, Vol. 1, 3rd ed., 1968, by W. Feller, pp.
120121 and p. 142.)
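The Polya-urn p.d.f. above can be sanity-checked by verifying that its values sum to 1; the parameter values below (b = 3, r = 2, c = 1, n = 4) are hypothetical:

```python
from math import comb

# P(X = x) = C(n, x) * [b(b+c)...(b+(x-1)c)] * [r(r+c)...(r+(n-x-1)c)]
#            / [(b+r)(b+r+c)...(b+r+(n-1)c)]   (Polya's urn scheme)

def polya_pmf(x, n, b, r, c):
    num = 1.0
    for i in range(x):            # black-ball factors
        num *= b + i * c
    for i in range(n - x):        # red-ball factors
        num *= r + i * c
    den = 1.0
    for i in range(n):            # total-ball factors
        den *= b + r + i * c
    return comb(n, x) * num / den

total = sum(polya_pmf(x, 4, 3, 2, 1) for x in range(5))
assert abs(total - 1.0) < 1e-12
# For c = 0 the scheme reduces to Binomial sampling with p = b/(b + r).
assert abs(polya_pmf(1, 2, 1, 1, 0) - 0.5) < 1e-12
print("ok")
```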
Here X(S) = ℝ and
f(x) = [1/(√(2π)σ)] exp[−(x − μ)²/(2σ²)], x ∈ ℝ.
We say that X is distributed as normal (μ, σ²), denoted by N(μ, σ²), where μ,
σ² are called the parameters of the distribution of X which is also called
the Normal distribution (μ = mean, μ ∈ ℝ, σ² = variance, σ > 0). For μ = 0,
σ = 1, we get what is known as the Standard Normal distribution, denoted
by N(0, 1). Clearly f(x) > 0; that I = ∫_{−∞}^{∞} f(x)dx = 1 is proved by showing that
I² = 1. In fact,
I² = [1/(√(2π)σ)] ∫_{−∞}^{∞} exp[−(x − μ)²/(2σ²)] dx × [1/(√(2π)σ)] ∫_{−∞}^{∞} exp[−(y − μ)²/(2σ²)] dy
= (1/(2π)) ∫_{−∞}^{∞} e^{−z²/2} dz ∫_{−∞}^{∞} e^{−v²/2} dv,
upon setting z = (x − μ)/σ, v = (y − μ)/σ.
Figure 3.5 Graph of the p.d.f. of the Normal distribution with μ = 1.5 and several values of σ (σ = 0.5, 1, 2).
Next,
I² = (1/(2π)) ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{−(z² + v²)/2} dz dv = (1/(2π)) ∫_0^{2π} ∫_0^{∞} e^{−r²/2} r dr dθ,
by passing to polar coordinates z = r cos θ, v = r sin θ; that is,
I² = (1/(2π)) ∫_0^{2π} dθ ∫_0^{∞} e^{−r²/2} r dr = ∫_0^{∞} e^{−r²/2} r dr = [−e^{−r²/2}]_0^{∞} = 1,
and hence I = 1, since I > 0.
From the fact that
max_{x ∈ ℝ} f(x) = f(μ) = 1/(√(2π)σ)
and the fact that
∫_{−∞}^{∞} f(x)dx = 1,
we conclude that the larger σ is, the more spread out f(x) is, and vice versa. It
is also clear that f(x) → 0 as x → ±∞. Taking all these facts into consideration,
we are led to Fig. 3.5.
The Normal distribution is a good approximation to the distribution of
grades, heights or weights of a (large) group of individuals, lifetimes of various
manufactured items, the diameters of hail hitting the ground during a storm,
the force required to puncture a cardboard, errors in numerous measure-
ments, etc. However, the main significance of it derives from the Central Limit
Theorem to be discussed in Chapter 8, Section 8.3.
3.3.2 Gamma
Here X(S) = ℝ (actually X(S) = (0, ∞)) and
f(x) = [1/(Γ(α)β^α)] x^{α−1} e^{−x/β}, x > 0 (α > 0, β > 0);
f(x) = 0, x ≤ 0,
where Γ(α) = ∫_0^{∞} y^{α−1} e^{−y} dy (which exists and is finite for α > 0). (This integral
is known as the Gamma function.) The distribution of X is also called the
Gamma distribution and , are called the parameters of the distribution.
Clearly, f(x) ≥ 0 and that ∫_{−∞}^{∞} f(x)dx = 1 is seen as follows:
∫_{−∞}^{∞} f(x)dx = [1/(Γ(α)β^α)] ∫_0^{∞} x^{α−1} e^{−x/β} dx = [1/Γ(α)] ∫_0^{∞} y^{α−1} e^{−y} dy = Γ(α)/Γ(α) = 1,
upon setting y = x/β.
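The normalization just shown can be confirmed numerically with the midpoint rule; α = 2.5 and β = 1.5 are hypothetical parameter values:

```python
from math import gamma, exp

# Numerical check that the Gamma density integrates to 1 over (0, infinity).

def gamma_pdf(x, a, b):
    """f(x) = x**(a-1) * exp(-x/b) / (Gamma(a) * b**a), x > 0."""
    return x**(a - 1) * exp(-x / b) / (gamma(a) * b**a)

a, b = 2.5, 1.5
h = 0.001
n = int(60.0 / h)                          # the tail beyond 60 is negligible
integral = sum(gamma_pdf((i + 0.5) * h, a, b) * h for i in range(n))
assert abs(integral - 1.0) < 1e-4
print(round(integral, 4))  # 1.0
```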
REMARK 3 One easily sees, by integrating by parts, that
Γ(α) = (α − 1)Γ(α − 1),
and if α is an integer, then
Γ(α) = (α − 1)(α − 2) · · · Γ(1),
where
Γ(1) = ∫_0^{∞} e^{−y} dy = 1; that is, Γ(α) = (α − 1)!
0
We often use this notation even if is not an integer, that is, we write
Γ(α) = (α − 1)! = ∫_0^{∞} y^{α−1} e^{−y} dy for α > 0. In particular, it holds that
Γ(1/2) = (−1/2)! = √π.
We have
Γ(1/2) = ∫_0^{∞} y^{−1/2} e^{−y} dy.
By setting
y^{1/2} = t/√2, so that y = t²/2, dy = t dt, t ∈ (0, ∞),
68 3 On Random Variables and Their Distributions
we get
Γ(1/2) = ∫_0^{∞} (√2/t) e^{−t²/2} t dt = √2 ∫_0^{∞} e^{−t²/2} dt = √2 × (√(2π)/2) = √π;
that is,
Γ(1/2) = (−1/2)! = √π.
From this we also get that
Γ(3/2) = (1/2)Γ(1/2) = √π/2, etc.
Graphs of the p.d.f. of the Gamma distribution for selected values of and
are given in Figs. 3.6 and 3.7.
The Gamma distribution, and in particular its special case the Negative
Exponential distribution, discussed below, serve as satisfactory models for
Figure 3.6 Graphs of the p.d.f. of the Gamma distribution for several values of α, β (α = 1, β = 1; α = 2, β = 1; α = 4, β = 1).
Figure 3.7 Graphs of the p.d.f. of the Gamma distribution for several values of α, β (α = 2, β = 1; α = 2, β = 2).
3.3.3 Chi-square
For α = r/2, r ≥ 1 integer, β = 2, we get what is known as the Chi-square
distribution, that is,
f(x) = [1/(Γ(r/2) 2^{r/2})] x^{(r/2)−1} e^{−x/2}, x > 0 (r > 0, integer);
f(x) = 0, x ≤ 0.
The distribution with this p.d.f. is denoted by χ²_r and r is called the number of
degrees of freedom (d.f.) of the distribution. The Chi-square distribution occurs
often in statistics, as will be seen in subsequent chapters.
3.3.4 Negative Exponential
For α = 1 and β = 1/λ, we get
f(x) = λe^{−λx}, x > 0 (λ > 0); f(x) = 0, x ≤ 0,
which is known as the Negative Exponential distribution. The Negative Expo-
nential distribution occurs frequently in statistics and, in particular, in waiting-
time problems. More specifically, if X is an r.v. denoting the waiting time
between successive occurrences of events following a Poisson distribution,
then X has the Negative Exponential distribution. To see this, suppose that
events occur according to the Poisson distribution P(); for example, particles
emitted by a radioactive source with the average of particles per time unit.
Furthermore, we suppose that we have just observed such a particle, and let X
be the r.v. denoting the waiting time until the next particle occurs. We shall
show that X has the Negative Exponential distribution with parameter .
To this end, it is mentioned here that the distribution function F of an r.v., to
be studied in the next chapter, is defined by F(x) = P(X ≤ x), x ∈ ℝ, and if X
is a continuous r.v., then dF(x)/dx = f(x). Thus, it suffices to determine F here.
Since X ≥ 0, it follows that F(x) = 0, x ≤ 0. So let x > 0 be the waiting time
for the emission of the next item. Then F(x) = P(X ≤ x) = 1 − P(X > x). Since
λ is the average number of emitted particles per time unit, their average
number during time x is λx. Then P(X > x) = e^{−λx}(λx)^0/0! = e^{−λx}, so that
F(x) = 1 − e^{−λx} and hence f(x) = λe^{−λx}, x > 0; that is, X has the Negative
Exponential distribution with parameter λ.

3.3.5 Uniform U(α, β) or Rectangular R(α, β)
Here X(S) = ℝ (actually X(S) = [α, β]) and
f(x) = 1/(β − α), α ≤ x ≤ β (β > α); f(x) = 0 otherwise.
Clearly f(x) ≥ 0 and ∫_{−∞}^{∞} f(x)dx = ∫_α^β dx/(β − α) = 1.
Figure 3.8 Graph of the p.d.f. of the U(α, β) distribution (a rectangle of height 1/(β − α) over [α, β]).
3.3.6 Beta
Here X(S) = ℝ (actually X(S) = (0, 1)) and
f(x) = [Γ(α + β)/(Γ(α)Γ(β))] x^{α−1}(1 − x)^{β−1}, 0 < x < 1;
f(x) = 0 elsewhere, where α > 0, β > 0.
Clearly, f(x) ≥ 0. That ∫_{−∞}^{∞} f(x)dx = 1 is seen as follows.
Γ(α)Γ(β) = ∫_0^{∞} x^{α−1} e^{−x} dx ∫_0^{∞} y^{β−1} e^{−y} dy
= ∫_0^{∞} ∫_0^{∞} x^{α−1} y^{β−1} e^{−(x+y)} dx dy,
which, upon setting
x = uy/(1 − u), dx = y du/(1 − u)², u ∈ (0, 1) and x + y = y/(1 − u),
becomes
= ∫_0^{∞} ∫_0^1 [u^{α−1} y^{α−1}/(1 − u)^{α−1}] y^{β−1} e^{−y/(1−u)} [y/(1 − u)²] du dy
= ∫_0^{∞} ∫_0^1 [u^{α−1}/(1 − u)^{α+1}] y^{α+β−1} e^{−y/(1−u)} du dy.
Setting v = y/(1 − u), so that y = v(1 − u), dy = (1 − u)dv, this in turn becomes
= ∫_0^1 ∫_0^{∞} u^{α−1} (1 − u)^{β−1} v^{α+β−1} e^{−v} dv du
= ∫_0^{∞} v^{α+β−1} e^{−v} dv ∫_0^1 u^{α−1} (1 − u)^{β−1} du
= Γ(α + β) ∫_0^1 u^{α−1} (1 − u)^{β−1} du;
that is,
Γ(α)Γ(β) = Γ(α + β) ∫_0^1 x^{α−1} (1 − x)^{β−1} dx,
and hence
∫_{−∞}^{∞} f(x)dx = [Γ(α + β)/(Γ(α)Γ(β))] ∫_0^1 x^{α−1} (1 − x)^{β−1} dx = 1.
Graphs of the p.d.f. of the Beta distribution for selected values of and are
given in Fig. 3.9.
REMARK 4 For = = 1, we get the U(0, 1), since (1) = 1 and (2) = 1.
The distribution of X is also called the Beta distribution and occurs rather
often in statistics. α, β are called the parameters of the distribution,
and the function defined by ∫_0^1 x^{α−1}(1 − x)^{β−1} dx for α, β > 0 is called the Beta
function.
Again the fact that X has the Beta distribution with parameters α and β
may be expressed by writing X ∼ B(α, β).
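The identity Γ(α)Γ(β) = Γ(α + β) B(α, β) derived above can be checked numerically; α = 2 and β = 3 are hypothetical values:

```python
from math import gamma

# B(a, b) computed by the midpoint rule, compared with Gamma(a)Gamma(b)/Gamma(a+b).

def beta_numeric(a, b, n=200_000):
    h = 1.0 / n
    return sum(((i + 0.5) * h)**(a - 1) * (1.0 - (i + 0.5) * h)**(b - 1) * h
               for i in range(n))

a, b = 2.0, 3.0
exact = gamma(a) * gamma(b) / gamma(a + b)   # = 1! * 2! / 4! = 1/12
assert abs(beta_numeric(a, b) - exact) < 1e-6
print(round(exact, 6))  # 0.083333
```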
Figure 3.9 Graphs of the p.d.f. of the Beta distribution for several values of α, β (α = 5, β = 3; α = 3, β = 3; α = 2, β = 2).
3.3.7 Cauchy
Here
Here X(S) = ℝ and
f(x) = (σ/π) × 1/[σ² + (x − μ)²], x ∈ ℝ (μ ∈ ℝ, σ > 0).
Clearly f(x) > 0, and
∫_{−∞}^{∞} f(x)dx = (σ/π) ∫_{−∞}^{∞} dx/[σ² + (x − μ)²] = (1/π) ∫_{−∞}^{∞} (1/σ) dx/[1 + ((x − μ)/σ)²]
= (1/π) ∫_{−∞}^{∞} dy/(1 + y²) = (1/π)[arctan y]_{−∞}^{∞} = 1,
upon letting
y = (x − μ)/σ, so that dx/σ = dy.
The distribution of X is also called the Cauchy distribution and μ, σ are called
the parameters of the distribution (see Fig. 3.10). We may write X ∼
Cauchy(μ, σ²) to express the fact that X has the Cauchy distribution with
parameters μ and σ².
(The p.d.f. of the Cauchy distribution looks much the same as the Normal
p.d.f., except that the tails of the former are heavier.)
3.3.8 Lognormal
Here X(S) = ℝ (actually X(S) = (0, ∞)) and
Figure 3.10 Graph of the p.d.f. of the Cauchy distribution with μ = 0, σ = 1.
f(x) = [1/(xβ√(2π))] exp[−(log x − log α)²/(2β²)], x > 0;
f(x) = 0, x ≤ 0, where α, β > 0.
Now f(x) ≥ 0 and
∫_{−∞}^{∞} f(x)dx = [1/(β√(2π))] ∫_0^{∞} (1/x) exp[−(log x − log α)²/(2β²)] dx
which, letting x = e^y, so that log x = y, dx = e^y dy, y ∈ (−∞, ∞), becomes
= [1/(β√(2π))] ∫_{−∞}^{∞} e^{−y} exp[−(y − log α)²/(2β²)] e^y dy
= [1/(β√(2π))] ∫_{−∞}^{∞} exp[−(y − log α)²/(2β²)] dy.
But this is the integral of an N(log α, β²) density and hence is equal to 1; that
is, if X is lognormally distributed, then Y = log X is normally distributed with
parameters log α and β². The distribution of X is called Lognormal and α, β are
called the parameters of the distribution (see Fig. 3.11). The notation X ∼
Lognormal(α, β) may be used to express the fact that X has the Lognormal
distribution with parameters α and β.
(For the many applications of the Lognormal distribution, the reader is
referred to the book The Lognormal Distribution by J. Aitchison and J. A. C.
Brown, Cambridge University Press, New York, 1957.)
"
These distributions occur very often in Statistics (interval esti-
3.3.9 t
mation, testing hypotheses, analysis of variance, etc.) and their
3.3.10 F densities will be presented later (see Chapter 9, Section 9.2).
We close this section with an example of a continuous random vector.
Figure 3.11 Graphs of the p.d.f. of the Lognormal distribution for several values of α, β.
f(x1, x2) = [1/(2πσ1σ2√(1 − ρ²))] e^{−q/2}, x1, x2 ∈ ℝ,
where
q = [1/(1 − ρ²)] [((x1 − μ1)/σ1)² − 2ρ((x1 − μ1)/σ1)((x2 − μ2)/σ2) + ((x2 − μ2)/σ2)²],
with σ1, σ2 > 0, −1 < ρ < 1 and μ1, μ2 ∈ ℝ. The distribution of X is also called the Bivariate Normal
distribution and the quantities μ1, μ2, σ1, σ2, ρ are called the parameters of the
distribution. (See Fig. 3.12.)
Clearly, f(x1, x2) > 0. That ∫∫_{ℝ²} f(x1, x2)dx1dx2 = 1 is seen as follows:
(1 − ρ²)q = ((x1 − μ1)/σ1)² − 2ρ((x1 − μ1)/σ1)((x2 − μ2)/σ2) + ((x2 − μ2)/σ2)²
= [((x2 − μ2)/σ2) − ρ((x1 − μ1)/σ1)]² + (1 − ρ²)((x1 − μ1)/σ1)².
Furthermore,
((x2 − μ2)/σ2) − ρ((x1 − μ1)/σ1) = [x2 − μ2 − ρ(σ2/σ1)(x1 − μ1)]/σ2
= [x2 − (μ2 + ρ(σ2/σ1)(x1 − μ1))]/σ2 = (x2 − b)/σ2,
where
b = μ2 + ρ(σ2/σ1)(x1 − μ1).
Figure 3.12 The graph of the Bivariate Normal surface z = f(x1, x2).
Thus
(1 − ρ²)q = ((x2 − b)/σ2)² + (1 − ρ²)((x1 − μ1)/σ1)²
and hence
∫_{−∞}^{∞} f(x1, x2)dx2 = [1/(√(2π)σ1)] exp[−(x1 − μ1)²/(2σ1²)]
× ∫_{−∞}^{∞} [1/(√(2π)σ2√(1 − ρ²))] exp[−(x2 − b)²/(2σ2²(1 − ρ²))] dx2
= [1/(√(2π)σ1)] exp[−(x1 − μ1)²/(2σ1²)] × 1,
since the integral above is that of an N(b, σ2²(1 − ρ²)) density. Since the first
factor is the density of an N(μ1, σ1²) random variable, integrating with respect
to x1, we get
∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x1, x2)dx1dx2 = 1.
REMARK 5 From the above derivations, it follows that, if f(x1, x2) is Bivariate
Normal, then
f1(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2 is N(μ1, σ1²),
and similarly,
f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1 is N(μ2, σ2²).
As will be seen in Chapter 4, the p.d.f.s f1 and f2 above are called marginal
p.d.f.s of f.
The notation X ∼ N(μ1, μ2, σ1², σ2², ρ) may be used to express the fact that
X has the Bivariate Normal distribution with parameters μ1, μ2, σ1², σ2², ρ.
Then X1 ∼ N(μ1, σ1²) and X2 ∼ N(μ2, σ2²).
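The marginal result of Remark 5 can be confirmed numerically: integrating the joint density over x2 (here by the midpoint rule) reproduces the N(μ1, σ1²) density. All parameter values below are hypothetical:

```python
from math import exp, pi, sqrt

# Bivariate Normal density and a numerical check that its x2-marginal at a
# point equals the N(mu1, sigma1**2) density there.

m1, m2, s1, s2, rho = 0.5, -1.0, 1.0, 2.0, 0.6

def f_joint(x1, x2):
    z1, z2 = (x1 - m1) / s1, (x2 - m2) / s2
    q = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)
    return exp(-q / 2.0) / (2.0 * pi * s1 * s2 * sqrt(1.0 - rho * rho))

def normal_pdf(x, m, s):
    return exp(-(x - m)**2 / (2.0 * s * s)) / (sqrt(2.0 * pi) * s)

x1 = 1.2
h = 0.001
# integrate over x2 in [-30, 30]; the tails beyond are negligible
marginal = sum(f_joint(x1, -30.0 + (i + 0.5) * h) * h for i in range(60_000))
assert abs(marginal - normal_pdf(x1, m1, s1)) < 1e-6
print("ok")
```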
Exercises
3.3.1 Let f be the p.d.f. of the N(μ, σ²) distribution and show that:
i) f is symmetric about μ;
ii) max_{x ∈ ℝ} f(x) = 1/(√(2π)σ).
3.3.2 Let X be distributed as N(0, 1), and for a < b, let p = P(a < X < b). Then
use the symmetry of the p.d.f. f in order to show that:
i) For 0 ≤ a < b, p = Φ(b) − Φ(a);
ii) For a ≤ 0 < b, p = Φ(b) + Φ(−a) − 1;
iii) For a ≤ b < 0, p = Φ(−a) − Φ(−b);
iv) For c > 0, P(−c < X < c) = 2Φ(c) − 1.
(See Normal Tables in Appendix III for the definition of .)
3.3.3 If X N(0, 1), use the Normal Tables in Appendix III in order to show
that:
i) P(−1 < X < 1) = 0.68269;
ii) P(−2 < X < 2) = 0.9545;
iii) P(−3 < X < 3) = 0.9973.
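The three probabilities of Exercise 3.3.3 can be reproduced without tables, since Φ(x) = (1 + erf(x/√2))/2; a short sketch:

```python
from math import erf, sqrt

def Phi(x):
    """Standard Normal d.f. via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# P(-c < X < c) = 2*Phi(c) - 1, by part (iv) of Exercise 3.3.2.
for c, tabulated in [(1, 0.68269), (2, 0.9545), (3, 0.9973)]:
    p = 2.0 * Phi(c) - 1.0
    assert abs(p - tabulated) < 5e-5
    print(c, round(p, 5))
```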
3.3.4 Let X be distributed as χ²_r. In Table 5, Appendix III, the values γ = P(X ≤ x) are
given for r ranging from 1 to 45, and for selected values of γ. From the entries
of the table, observe that, for a fixed γ, the values of x increase along with the
number of degrees of freedom r. Select some values of γ and record the
corresponding values of x for a set of increasing values of r.
3.3.5 Let X be an r.v. distributed as χ²_10. Use Table 5 in Appendix III in order
to determine the numbers a and b for which the following are true:
i) P(X < a) = P(X > b);
ii) P(a < X < b) = 0.90.
3.3.6 Consider certain events which in every time interval [t1, t2] (0 < t1 < t2)
occur independently for nonoverlapping intervals according to the Poisson
distribution P(λ(t2 − t1)). Let T be the r.v. denoting the time which lapses
between two consecutive such events. Show that the distribution of T is Nega-
tive Exponential with parameter λ by computing the probability that T > t.
3.3.7 Let X be an r.v. denoting the life length of a TV tube and suppose that
its p.d.f. f is given by:
f(x) = λe^{−λx} I_{(0,∞)}(x), λ > 0.
Compute the following probabilities:
i) P(j < X ≤ j + 1), j = 0, 1, . . . ;
ii) P(X > t) for some t > 0;
iii) P(X > s + t | X > s) for some s, t > 0;
iv) Compare the probabilities in parts (ii) and (iii) and conclude that the
Negative Exponential distribution is memoryless;
v) If it is known that P(X > s) = α, express the parameter λ in terms of α
and s.
3.3.8 Suppose that the life expectancy X of each member of a certain group
of people is an r.v. having the Negative Exponential distribution with param-
eter λ = 1/50 (years). For an individual from the group in question, compute
the probability that:
i) He will survive to retire at 65;
ii) He will live to be at least 70 years old, given that he just celebrated his 40th
birthday;
iii) For what value of c is P(X > c) = 1/2?
3.3.9 Let X be an r.v. distributed as U(−α, α) (α > 0). Determine the values
of the parameter for which the following are true:
i) P(−1 < X < 2) = 0.75;
ii) P(|X| < 1) = P(|X| > 2).
3.3.10 Refer to the Beta distribution and set:
B(α, β) = ∫_0^1 x^{α−1} (1 − x)^{β−1} dx.
0
Then show that
n C(n − 1, m − 1) ∫_0^p x^{m−1} (1 − x)^{n−m} dx
= [n!/((m − 1)!(n − m)!)] ∫_0^p x^{m−1} (1 − x)^{n−m} dx
= Σ_{j=m}^n C(n, j) p^j (1 − p)^{n−j}.
3.3.12 Let X be an r.v. with p.d.f. given by f(x) = 1/[π(1 + x²)]. Calculate the
probability that X² ≤ c.
3.3.15 Determine the constant c so that each of the following is a p.d.f.:
i) f(x) = ce^{−6x}, x > 0;
ii) f(x) = cx, −1 < x ≤ 0; f(x) = 0, x ≤ −1;
iii) f(x) = cx²e^{−x³} I_{(0,∞)}(x).
3.3.16 Let X be an r.v. with p.d.f. given by 3.3.15(ii). Compute the probabil-
ity that X > x.
3.3.17 Let X be the r.v. denoting the life length of a certain electronic device
expressed in hours, and suppose that its p.d.f. f is given by:
f(x) = (c/x^n) I_{(1,000, 3,000]}(x).
i) Show that it is a p.d.f. (called the Weibull p.d.f. with parameters α
and β);
ii) Observe that the Negative Exponential p.d.f. is a special case of a Weibull
p.d.f., and specify the values of the parameters for which this happens;
iii) For α = 1 and β = 1/2, and for α = 1 and β = 2, draw the respective graphs
of the p.d.f.s involved.
(Note: The Weibull distribution is employed for describing the lifetime of
living organisms or of mechanical systems.)
3.3.20 Let X and Y be r.v.s having the joint p.d.f. f given by:
f(x, y) = c(25 − x² − y²) I_{(0,5)}(√(x² + y²)).
Determine the constant c and compute the probability that 0 < X² + Y² < 4.
3.3.21 Let X and Y be r.v.s whose joint p.d.f. f is given by f(x, y) =
cxy I_{(0,2)×(0,5)}(x, y). Determine the constant c and compute the following
probabilities:
i) P(1/2 < X < 1, 0 < Y < 3);
ii) P(X < 2, 2 < Y < 4);
iii) P(1 < X < 2, Y > 5);
iv) P(X > Y).
3.3.22 Verify that the following function is a p.d.f.:
f(x, y) = [1/(4π)] cos y × I_A(x, y), A = (−π, π] × (−π/2, π/2].
3.3.23 (A mixed distribution) Show that the following function is a p.d.f.
f(x) = (1/4)e^x, x ≤ 0;
f(x) = 1/8, 0 < x < 2;
f(x) = 1/2^x, x = 2, 3, . . . ;
f(x) = 0, otherwise.
C(n, x) pn^x qn^{n−x} → e^{−λ} λ^x/x!, as n → ∞, for each fixed x = 0, 1, 2, . . . ,
where qn = 1 − pn and λn = npn → λ ∈ (0, ∞).
PROOF We have
C(n, x) pn^x qn^{n−x} = [n(n − 1) · · · (n − x + 1)/x!] pn^x qn^{n−x}
= [n(n − 1) · · · (n − x + 1)/x!] (λn/n)^x (1 − λn/n)^{n−x}
= {[n(n − 1) · · · (n − x + 1)]/n^x} (λn^x/x!) (1 − λn/n)^n (1 − λn/n)^{−x}
= [1 × (1 − 1/n) · · · (1 − (x − 1)/n)] (1 − λn/n)^{−x} (λn^x/x!) (1 − λn/n)^n
→ 1 × 1 × (λ^x/x!) e^{−λ} = e^{−λ} λ^x/x!,
since, if λn → λ as n → ∞, then
(1 − λn/n)^n → e^{−λ}. ▪
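The theorem just proved can be watched at work numerically: with λ = np held fixed, the Binomial probabilities approach their Poisson limit as n grows. λ = 2 and x = 3 are hypothetical choices:

```python
from math import comb, exp, factorial

lam, x = 2.0, 3
poisson = exp(-lam) * lam**x / factorial(x)

for n in (10, 100, 10_000):
    p = lam / n
    binom = comb(n, x) * p**x * (1.0 - p)**(n - x)
    print(n, round(abs(binom - poisson), 6))   # the gap shrinks with n
```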
C(m, x) C(n, r − x)/C(m + n, r) → C(r, x) p^x q^{r−x}, x = 0, 1, 2, . . . , r,
as m, n → ∞, provided m/(m + n) → p (and hence n/(m + n) → q = 1 − p).
PROOF We have
C(m, x) C(n, r − x)/C(m + n, r)
= [m!/(x!(m − x)!)] [n!/((r − x)!(n − r + x)!)] [r!(m + n − r)!/(m + n)!]
= C(r, x) {m(m − 1) · · · [m − (x − 1)] × n(n − 1) · · · [n − (r − x − 1)]}
/ {(m + n)(m + n − 1) · · · [(m + n) − (r − 1)]}.
Both numerator and denominator have r factors. Dividing through by (m + n),
we get
C(m, x) C(n, r − x)/C(m + n, r)
= C(r, x) {[m/(m + n)] [(m − 1)/(m + n)] · · · [(m − x + 1)/(m + n)]
× [n/(m + n)] [(n − 1)/(m + n)] · · · [(n − r + x + 1)/(m + n)]}
/ {1 × [1 − 1/(m + n)] · · · [1 − (r − 1)/(m + n)]}
→ C(r, x) p^x q^{r−x},
since
m/(m + n) → p and hence n/(m + n) → 1 − p = q. ▪
Theorem 2 thus justifies the approximation of
C(m, x) C(n, r − x)/C(m + n, r) by C(r, x) p^x q^{r−x}
by setting p = m/(m + n). This is true for all x = 0, 1, 2, . . . , r. It is to be
observed that C(r, x) [m/(m + n)]^x [1 − m/(m + n)]^{r−x} is the exact probability of having exactly x
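Theorem 2 can be illustrated the same way: letting m = 3k and n = 7k (so that m/(m + n) = 0.3 = p for every k), the Hypergeometric probabilities approach the Binomial ones as k grows; r = 5 and x = 2 are hypothetical choices:

```python
from math import comb

r, x, p = 5, 2, 0.3
target = comb(r, x) * p**x * (1.0 - p)**(r - x)

for k in (10, 100, 10_000):
    m, n = 3 * k, 7 * k                      # m/(m + n) = 0.3 for every k
    hyper = comb(m, x) * comb(n, r - x) / comb(m + n, r)
    print(k, round(abs(hyper - target), 6))  # the gap shrinks with k
```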
Exercises
3.4.1 For the following values of n, p and λ = np, draw graphs of B(n, p) and
P(λ) on the same coordinate axes:
i) n = 10, p = 4/16, so that λ = 2.5;
ii) n = 16, p = 2/16, so that λ = 2;
iii) n = 20, p = 2/16, so that λ = 2.5;
iv) n = 24, p = 1/16, so that λ = 1.5;
v) n = 24, p = 2/16, so that λ = 3.
3.4.2 Refer to Exercise 3.2.2 and suppose that the number of applicants is
equal to 72. Compute the probabilities (i)–(iii) by using the Poisson approxi-
mation to Binomial (Theorem 1).
3.4.3 Refer to Exercise 3.2.10 and use Theorem 2 in order to obtain an
approximate value of the required probability.
X⁻¹(T) = {s ∈ S; X(s) ∈ T}.
3.5* Random Variables as Measurable Functions and Related Results
If T1 ∩ T2 = ∅, then X⁻¹(T1) ∩ X⁻¹(T2) = ∅. (2)
Hence by (1) and (2) we have
X⁻¹(Σ_j Tj) = Σ_j X⁻¹(Tj). (3)
Also
X⁻¹(∩_j Tj) = ∩_j X⁻¹(Tj), (4)
X⁻¹(T^c) = [X⁻¹(T)]^c, (5)
X⁻¹(T) = S, (6)
X⁻¹(∅) = ∅. (7)
X⁻¹(D) = {A ⊆ S; A = X⁻¹(T) for some T ∈ D}.
By means of (1), (5), (6) above, we immediately have
THEOREM 3 The class X⁻¹(D) is a σ-field of subsets of S.
The above theorem is the reason we require measurability in our defini-
tion of a random variable. It guarantees that the probability distribution
function of a random vector, to be defined below, is well defined.
If X⁻¹(D) ⊆ A, then we say that X is (A, D)-measurable, or just measur-
able if there is no confusion possible. If (T, D) = (ℝ, B) and X is (A, B)-
measurable, we say that X is a random variable (r.v.). More generally, if (T, D)
= (ℝ^k, B^k), where ℝ^k = ℝ × ℝ × · · · × ℝ (k copies of ℝ), and X is (A, B^k)-
measurable, we say that X is a k-dimensional random vector (r. vector). In this
latter case, we shall write X if k 1, and just X if k = 1. A random variable is
a one-dimensional random vector.
On the basis of the properties (1)–(7) of X⁻¹, the following is immediate.
THEOREM 4 Define the class C* of subsets of T as follows: C* = {T′ ⊆ T; X⁻¹(T′) = A for some
A ∈ A}. Then C* is a σ-field.
COROLLARY Let D = σ(C), where C is a class of subsets of T. Then X is (A, D)-measurable
if and only if X⁻¹(C) ⊆ A. In particular, X is a random variable if and only if
PX(B) = P[X⁻¹(B)] = P(X ∈ B) = P({s ∈ S; X(s) ∈ B}). (8)
By the Corollary to Theorem 4, the sets X⁻¹(B) in S are actually events due to
the assumption that X is an r. vector. Therefore PX is well defined by (8); i.e.,
P[X⁻¹(B)] makes sense, is well defined. It is now shown that PX is a probability
measure on B^k. In fact, PX(B) ≥ 0, B ∈ B^k, since P is a probability measure.
Next, PX(ℝ^k) = P[X⁻¹(ℝ^k)] = P(S) = 1, and finally,
PX(Σ_{j=1}^∞ Bj) = P[X⁻¹(Σ_{j=1}^∞ Bj)] = P[Σ_{j=1}^∞ X⁻¹(Bj)] = Σ_{j=1}^∞ P[X⁻¹(Bj)] = Σ_{j=1}^∞ PX(Bj).
Exercises
3.5.1 Consider the sample space S supplied with the σ-field of events A. For
an event A, the indicator IA of A is defined by: IA(s) = 1 if s ∈ A and IA(s) = 0
if s ∈ A^c.
i) Show that IA is an r.v. for any A ∈ A.
ii) What is the partition of S induced by IA?
iii) What is the σ-field induced by IA?
3.5.2 Write out the proof of Theorem 1 by using (1), (5) and (6).
3.5.3 Write out the proof of Theorem 2.
Chapter 4
Distribution Functions, Probability Densities, and Their Relationship

4.1 The Cumulative Distribution Function
x1 < x2 implies (−∞, x1] ⊂ (−∞, x2]
and hence
Q((−∞, x1]) ≤ Q((−∞, x2]); equivalently, F(x1) ≤ F(x2).
iii) This means that, if xn ↓ x, then F(xn) ↓ F(x). In fact,
xn ↓ x implies (−∞, xn] ↓ (−∞, x]
and hence
Q((−∞, xn]) ↓ Q((−∞, x])
by Theorem 2, Chapter 2; equivalently, F(xn) ↓ F(x).
iv) Let xn → −∞. We may assume that xn ↓ −∞ (see also Exercise 4.1.6). Then
(−∞, xn] ↓ ∅, so that Q((−∞, xn]) ↓ Q(∅) = 0
by Theorem 2, Chapter 2. Equivalently, F(xn) → 0. Similarly, if xn → +∞,
we may assume xn ↑ +∞. Then
(−∞, xn] ↑ ℝ and hence Q((−∞, xn]) ↑ Q(ℝ) = 1; equivalently, F(xn) → 1. ▪
This limit always exists, since F(xn) is nondecreasing, but need not be equal to F(x+) (= the limit
from the right) = F(x). The quantities F(x) and F(x−) are used to express
the probability P(X = a); that is, P(X = a) = F(a) − F(a−). In fact, let xn ↑
a and set A = (X = a), An = (xn < X ≤ a). Then, clearly, An ↓ A and hence,
by Theorem 2, Chapter 2,
P(An) ↓ P(A), or lim_{n→∞} P(xn < X ≤ a) = P(X = a),
or
lim_{n→∞} [F(a) − F(xn)] = P(X = a),
or
F(a) − lim_{n→∞} F(xn) = P(X = a),
or
F(a) − F(a−) = P(X = a).
It is known that a nondecreasing function (such as F) may have
discontinuities which can only be jumps. Then F(a) F(a) is the length of
the jump of F at a. Of course, if F is continuous then F(x−) = F(x) and
hence P(X = x) = 0 for all x.
iii) If X is discrete, its d.f. is a step function, the value of it at x being defined
by
F(x) = Σ_{xj ≤ x} f(xj), and f(xj) = F(xj) − F(x_{j−1}),
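The two relations just displayed are inverse to each other: cumulative sums of the p.d.f. give the step d.f., and successive differences of the d.f. recover the p.d.f. A sketch for the B(6, 1/4) distribution of Fig. 4.1(a):

```python
from math import comb

n, p = 6, 0.25
f = [comb(n, k) * p**k * (1.0 - p)**(n - k) for k in range(n + 1)]

# F(x_j) = sum of f(x_i) for x_i <= x_j  (cumulative sums at the jump points)
F = []
acc = 0.0
for fk in f:
    acc += fk
    F.append(acc)

# f(x_j) = F(x_j) - F(x_{j-1})  (differencing recovers the p.d.f.)
recovered = [F[0]] + [F[k] - F[k - 1] for k in range(1, n + 1)]
assert all(abs(a - b) < 1e-12 for a, b in zip(recovered, f))
assert abs(F[-1] - 1.0) < 1e-12
print("ok")
```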
Figure 4.1 Examples of graphs of c.d.f.s:
(a) Binomial for n = 6, p = 1/4; (b) Poisson for λ = 2;
(c) U(α, β), where F(x) = (x − α)/(β − α) for α ≤ x ≤ β and F(x) = 1 for x > β;
(d) N(0, 1), i.e., the d.f. Φ.
Φ(y) = (1/√(2π)) ∫_{−∞}^y e^{−t²/2} dt.
We have
P(Y ≤ y) = P((X − μ)/σ ≤ y) = P(X ≤ yσ + μ)
= [1/(√(2π)σ)] ∫_{−∞}^{yσ+μ} exp[−(t − μ)²/(2σ²)] dt = (1/√(2π)) ∫_{−∞}^{y} e^{−u²/2} du = Φ(y),
upon setting u = (t − μ)/σ.
FY(y) = P(Y ≤ y) = P(−√y ≤ X ≤ √y)
= (1/√(2π)) ∫_{−√y}^{√y} e^{−x²/2} dx = (2/√(2π)) ∫_0^{√y} e^{−x²/2} dx.
Setting x = √t, so that dx = dt/(2√t), we get
FY(y) = (2/√(2π)) ∫_0^y e^{−t/2} [1/(2√t)] dt.
Hence
dFY(y)/dy = (1/√(2π)) (1/√y) e^{−y/2} = [1/(Γ(1/2) 2^{1/2})] y^{(1/2)−1} e^{−y/2};
that is,
fY(y) = [1/(Γ(1/2) 2^{1/2})] y^{(1/2)−1} e^{−y/2}, y > 0; fY(y) = 0, y ≤ 0,
and this is the p.d.f. of χ²₁. (Observe that here we used the fact that Γ(1/2) =
√π.) ▪
Exercises
4.1.1 Refer to Exercise 3.2.13, in Chapter 3, and determine the d.f.s corre-
sponding to the p.d.f.s given there.
4.1.2 Refer to Exercise 3.2.14, in Chapter 3, and determine the d.f.s corre-
sponding to the p.d.f.s given there.
4.1.3 Refer to Exercise 3.3.13, in Chapter 3, and determine the d.f.s corre-
sponding to the p.d.f.s given there.
4.1.4 Refer to Exercise 3.3.14, in Chapter 3, and determine the d.f.s corre-
sponding to the p.d.f.s given there.
4.1.5 Let X be an r.v. with d.f. F. Determine the d.f. of the following r.v.s:
−X, X², aX + b, XI_{[a,b)}(X) when:
i) X is continuous and F is strictly increasing;
ii) X is discrete.
4.1.6 Refer to the proof of Theorem 1 (iv) and show that we may assume
that xn ↓ −∞ (respectively, xn ↑ +∞) instead of xn → −∞ (respectively, xn → +∞).
4.1.7 Let f and F be the p.d.f. and the d.f., respectively, of an r.v. X. Then
show that F is continuous, and dF(x)/dx = f(x) at the continuity points x of f.
4.1.8
i) Show that the following function F is a d.f. (Logistic distribution) and
derive the corresponding p.d.f., f.
F(x) = 1/(1 + e^{−α(x+β)}), x ∈ ℝ (α > 0, β ∈ ℝ);
ii) Show that f(x) = αF(x)[1 − F(x)].
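Part (ii) of Exercise 4.1.8 can be confirmed numerically by comparing a central difference quotient of F with αF(1 − F); α = 2 and β = 0.5 are hypothetical values:

```python
from math import exp

alpha, beta = 2.0, 0.5

def F(x):
    """Logistic d.f. F(x) = 1 / (1 + exp(-alpha * (x + beta)))."""
    return 1.0 / (1.0 + exp(-alpha * (x + beta)))

h = 1e-6
for x in (-1.0, 0.0, 1.5):
    deriv = (F(x + h) - F(x - h)) / (2.0 * h)   # numerical f(x)
    assert abs(deriv - alpha * F(x) * (1.0 - F(x))) < 1e-6
print("ok")
```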
4.1.9 Refer to Exercise 3.3.17 in Chapter 3 and determine the d.f. F corre-
sponding to the p.d.f. f given there. Write out the expressions of F and f for
n = 2 and n = 3.
4.1.10 If X is an r.v. distributed as N(3, 0.25), use Table 3 in Appendix III in
order to compute the following probabilities:
i) P(X < 1);
ii) P(X > 2.5);
iii) P(0.5 < X < 1.3).
4.1.11 The distribution of IQs of the people in a given group is well approxi-
mated by the Normal distribution with μ = 105 and σ = 20. What proportion of
the individuals in the group in question has an IQ:
i) At least 150?
ii) At most 80?
iii) Between 95 and 125?
4.1.12 A certain manufacturing process produces light bulbs whose life
length (in hours) is an r.v. X distributed as N(2,000, 2002). A light bulb is
supposed to be defective if its lifetime is less than 1,800. If 25 light bulbs are
tested, what is the probability that at most 15 of them are defective? (Use the
required independence.)
4.1.13 A manufacturing process produces 1/2-inch ball bearings, which are
assumed to be satisfactory if their diameter lies in the interval 0.5 ± 0.0006 and
defective otherwise. A day's production is examined, and it is found that the
distribution of the actual diameters of the ball bearings is approximately
normal with mean μ = 0.5007 inch and σ = 0.0005 inch. Compute the propor-
tion of defective ball bearings.
4.1.14 If X is an r.v. distributed as N(μ, σ²), find the value of c (in terms of
μ and σ) for which P(X < c) = 2 − 9P(X > c).
4.1.15 Refer to the Weibull p.d.f., f, given in Exercise 3.3.19 in Chapter 3 and
do the following:
i) Calculate the corresponding d.f. F and the reliability function R(x) =
1 − F(x);
ii) Also, calculate the failure (or hazard) rate H(x) = f(x)/R(x), and draw its
graph for α = 1 and β = 1/2, 1, 2;
iii) For s and t > 0, calculate the probability P(X > s + t | X > t), where X is an
r.v. having the Weibull distribution;
iii) For s and t > 0, calculate the probability P(X > s + t|X > t) where X is an r.v.
having the Weibull distribution;
iv) What do the quantities F(x), R(x), H(x) and the probability in
part (iii) become in the special case of the Negative Exponential
distribution?
4.2 The d.f. of a Random Vector and Its Properties; Marginal and Conditional
d.f.s and p.d.f.s
For the case of a two-dimensional r. vector, a result analogous to Theorem
1 can be established. So consider the case that k = 2. We then have X =
(X1, X2) and the d.f. F (or FX or F_{X1,X2}) of X, or the joint distribution function
of X1, X2, is F(x1, x2) = P(X1 ≤ x1, X2 ≤ x2). Then the following theorem holds
true.
With the above notation we have
THEOREM 4
i) 0 ≤ F(x1, x2) ≤ 1, x1, x2 ∈ ℝ.
ii) The variation of F over rectangles with sides parallel to the axes, given in
Fig. 4.2, is 0.
iii) F is continuous from the right with respect to each of the coordinates x1, x2,
or both of them jointly.
Figure 4.2 The variation V of F over the rectangle with vertices (x1, y1), (x2, y1), (x1, y2), (x2, y2) is:
F(x1, y1) + F(x2, y2) − F(x1, y2) − F(x2, y1).
iv) If both x1, x2 → ∞, then F(x1, x2) → 1, and if at least one of x1, x2 →
−∞, then F(x1, x2) → 0. We express this by writing F(∞, ∞) = 1, F(−∞, x2) =
F(x1, −∞) = F(−∞, −∞) = 0, where −∞ < x1, x2 < ∞.
PROOF
i) Obvious.
ii) V = P(x1 < X1 ≤ x2, y1 < X2 ≤ y2) and is hence, clearly, ≥ 0.
iii) Same as in Theorem 3. (If x = (x1, x2) and zn = (x1n, x2n), then zn ↓ x means
x1n ↓ x1, x2n ↓ x2.)
iv) If x1, x2 ↑ ∞, then (−∞, x1] × (−∞, x2] ↑ ℝ², so that F(x1, x2) → P(S) = 1. If
at least one of x1, x2 goes (↓) to −∞, then (−∞, x1] × (−∞, x2] ↓ ∅, hence
F(x1, x2) → P(∅) = 0. ▪
REMARK 3 The function F(x1, ∞) = lim_{x2→∞} F(x1, x2) = F1(x1) is the d.f. of the random
variable X1. In fact,
F(x1, ∞) = lim_{xn→∞} P(X1 ≤ x1, X2 ≤ xn)
= P(X1 ≤ x1, −∞ < X2 < ∞) = P(X1 ≤ x1) = F1(x1).
Similarly F(, x2) = F2(x2) is the d.f. of the random variable X2. F1, F2 are called
marginal d.f.s.
REMARK 4 It should be pointed out here that results like those discussed
in parts (i)–(iv) in Remark 1 still hold true here (appropriately interpreted).
In particular, part (iv) says that F(x1, x2) has second-order partial derivatives
and
∂²F(x1, x2)/(∂x1∂x2) = f(x1, x2)
at continuity points of f.
For k > 2, we have a theorem strictly analogous to Theorems 3 and 6 and also remarks such as Remark 1(i)–(iv) following Theorem 3. In particular, the analog of (iv) says that F(x1, . . . , xk) has kth order partial derivatives and

∂^k F(x1, . . . , xk)/∂x1∂x2 · · · ∂xk = f(x1, . . . , xk)

at continuity points of f, where F, or FX, or FX1, . . . , Xk, is the d.f. of X, or the joint distribution function of X1, . . . , Xk. As in the two-dimensional case,

F(∞, . . . , ∞, xj, ∞, . . . , ∞) = Fj(xj)

is the d.f. of the random variable Xj, and if m of the xj's are replaced by ∞ (1 < m < k), then the resulting function is the joint d.f. of the random variables corresponding to the remaining (k − m) Xj's. All these d.f.s are called marginal distribution functions.
In Statement 2, we have seen that if X = (X1, . . . , Xk) is an r. vector, then
Xj, j = 1, 2, . . . , k are r.v.s and vice versa. Then the p.d.f. of X, f(x) =
f(x1, . . . , xk), is also called the joint p.d.f. of the r.v.s X1, . . . , Xk.
Consider first the case k = 2; that is, X = (X1, X2), f(x) = f(x1, x2), and set

f1(x1) = Σ(x2) f(x1, x2)  or  ∫(−∞ to ∞) f(x1, x2) dx2,
f2(x2) = Σ(x1) f(x1, x2)  or  ∫(−∞ to ∞) f(x1, x2) dx1.

Then, in the discrete case,

Σ(x1) f1(x1) = Σ(x1) Σ(x2) f(x1, x2) = 1,

and similarly we get the result for f2 (with integrals in place of sums in the continuous case). Furthermore, f1 is the p.d.f. of X1, and f2 is the p.d.f. of X2. In fact, in the discrete case,

P(X1 ∈ B) = Σ(x1 ∈ B) Σ(x2 ∈ ℝ) f(x1, x2) = Σ(x1 ∈ B) f1(x1),

and in the continuous case,

P(X1 ∈ B) = ∫(B) ∫(−∞ to ∞) f(x1, x2) dx2 dx1 = ∫(B) [∫(−∞ to ∞) f(x1, x2) dx2] dx1 = ∫(B) f1(x1) dx1.

Similarly f2 is the p.d.f. of the r.v. X2. We call f1, f2 the marginal p.d.f.s. Now suppose f1(x1) > 0. Then define f(x2|x1) as follows:

f(x2|x1) = f(x1, x2)/f1(x1).
This is a p.d.f. Indeed, f(x2|x1) ≥ 0 and

Σ(x2) f(x2|x1) = Σ(x2) f(x1, x2)/f1(x1) = f1(x1)/f1(x1) = 1,
∫(−∞ to ∞) f(x2|x1) dx2 = [1/f1(x1)] ∫(−∞ to ∞) f(x1, x2) dx2 = [1/f1(x1)] f1(x1) = 1.

Similarly, provided f2(x2) > 0, define

f(x1|x2) = f(x1, x2)/f2(x2)

and show that f(·|x2) is a p.d.f. Furthermore, if X1, X2 are both discrete, then f(x2|x1) has the following interpretation:
f(x2|x1) = f(x1, x2)/f1(x1) = P(X1 = x1, X2 = x2)/P(X1 = x1) = P(X2 = x2|X1 = x1).
Hence P(X2 ∈ B|X1 = x1) = Σ(x2 ∈ B) f(x2|x1). For this reason, we call f(·|x1) the conditional p.d.f. of X2, given that X1 = x1 (provided f1(x1) > 0). For a similar reason, we call f(·|x2) the conditional p.d.f. of X1, given that X2 = x2 (provided f2(x2) > 0). For the case that the p.d.f.s f and f2 are of the continuous type, the conditional p.d.f. f(x1|x2) may be given an interpretation similar to the one given above. By assuming (without loss of generality) that h1, h2 > 0, one has
(1/h1) P(x1 < X1 ≤ x1 + h1 | x2 < X2 ≤ x2 + h2)
= [(1/(h1h2)) P(x1 < X1 ≤ x1 + h1, x2 < X2 ≤ x2 + h2)] / [(1/h2) P(x2 < X2 ≤ x2 + h2)]
= (1/(h1h2))[F(x1, x2) + F(x1 + h1, x2 + h2) − F(x1, x2 + h2) − F(x1 + h1, x2)] / {(1/h2)[F2(x2 + h2) − F2(x2)]},
where F is the joint d.f. of X1, X2 and F2 is the d.f. of X2. By letting h1, h2 → 0 and assuming that (x1, x2) and x2 are continuity points of f and f2, respectively, the last expression on the right-hand side above tends to f(x1, x2)/f2(x2), which was denoted by f(x1|x2). Thus for small h1, h2, h1 f(x1|x2) is approximately equal to P(x1 < X1 ≤ x1 + h1|x2 < X2 ≤ x2 + h2), so that h1 f(x1|x2) is approximately the
conditional probability that X1 lies in a small neighborhood (of length h1) of x1,
given that X2 lies in a small neighborhood of x2. A similar interpretation may
be given to f(x2|x1). We can also define the conditional d.f. of X2, given X1 = x1,
by means of
F(x2|x1) = Σ(x2′ ≤ x2) f(x2′|x1)  or  ∫(−∞ to x2) f(x2′|x1) dx2′,
and similarly for F(x1|x2).
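These definitions are easy to tabulate directly for a discrete pair; the joint p.d.f. below is a hypothetical example, not one from the text, used only to illustrate how the marginal and conditional p.d.f.s are computed.

```python
# Hypothetical discrete joint p.d.f. f(x1, x2) on {0, 1, 2} x {0, 1}.
joint = {(0, 0): 0.10, (0, 1): 0.20,
         (1, 0): 0.25, (1, 1): 0.15,
         (2, 0): 0.05, (2, 1): 0.25}

def f1(x1):
    # marginal p.d.f. of X1: sum f(x1, x2) over all x2
    return sum(p for (a, _), p in joint.items() if a == x1)

def f_cond(x2, x1):
    # conditional p.d.f. f(x2 | x1) = f(x1, x2) / f1(x1), defined when f1(x1) > 0
    return joint.get((x1, x2), 0.0) / f1(x1)

print(f1(0))                               # marginal mass at x1 = 0
print(sum(f_cond(b, 1) for b in (0, 1)))   # a conditional p.d.f. sums to 1
```

As the definitions promise, each conditional p.d.f. f(·|x1) sums to 1 over x2.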
The concepts introduced thus far generalize in a straightforward way for
k > 2. Thus if X = (X1, . . . , Xk) with p.d.f. f(x1, . . . , xk), then we have called
f(x1, . . . , xk) the joint p.d.f. of the r.v.s X1, X2, . . . , Xk. If we sum (integrate)
over t of the variables x1, . . . , xk keeping the remaining s fixed (t + s = k), the
resulting function is the joint p.d.f. of the r.v.s corresponding to the remaining
s variables; that is,
fi1, . . . , is(xi1, . . . , xis) = Σ(xj1, . . . , xjt) f(x1, . . . , xk)  or  ∫(−∞ to ∞) · · · ∫(−∞ to ∞) f(x1, . . . , xk) dxj1 · · · dxjt.
There are

(k choose 1) + (k choose 2) + · · · + (k choose k − 1) = 2^k − 2
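The count 2^k − 2 is just the number of nonempty proper subsets of the index set {1, . . . , k}; a one-line check of the binomial identity:

```python
from math import comb

# Number of marginal p.d.f.s of k r.v.s: choose which s of the k variables
# to keep, 1 <= s <= k - 1; the binomial coefficients sum to 2^k - 2.
for k in range(2, 12):
    assert sum(comb(k, s) for s in range(1, k)) == 2**k - 2
print("identity verified for k = 2, ..., 11")
```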
such p.d.f.s, which are also called marginal p.d.f.s. Also if xi1, . . . , xis are such that fi1, . . . , is(xi1, . . . , xis) > 0, then the function (of xj1, . . . , xjt) defined by

f(xj1, . . . , xjt|xi1, . . . , xis) = f(x1, . . . , xk)/fi1, . . . , is(xi1, . . . , xis)
is a p.d.f. called the joint conditional p.d.f. of the r.v.s Xj1, . . . , Xjt, given Xi1 = xi1, . . . , Xis = xis, or just given Xi1, . . . , Xis. Again there are 2^k − 2 joint conditional p.d.f.s involving all k r.v.s X1, . . . , Xk. Conditional distribution functions are defined in a way similar to the one for k = 2. Thus
F(xj1, . . . , xjt|xi1, . . . , xis)
= Σ{(xj1′, . . . , xjt′) ≤ (xj1, . . . , xjt)} f(xj1′, . . . , xjt′|xi1, . . . , xis)
or
= ∫(−∞ to xj1) · · · ∫(−∞ to xjt) f(xj1′, . . . , xjt′|xi1, . . . , xis) dxj1′ · · · dxjt′.
i) fi1, . . . , is(xi1, . . . , xis) = [n!/(xi1! · · · xis!(n − r)!)] pi1^xi1 · · · pis^xis q^(n−r),

where

q = 1 − (pi1 + · · · + pis),  r = xi1 + · · · + xis;

that is, the r.v.s Xi1, . . . , Xis and Y = n − (Xi1 + · · · + Xis) have the Multinomial distribution with parameters n and pi1, . . . , pis, q.
ii) f(xj1, . . . , xjt|xi1, . . . , xis) = [(n − r)!/(xj1! · · · xjt!)] (pj1/q)^xj1 · · · (pjt/q)^xjt,  r = xi1 + · · · + xis;

that is, the (joint) conditional distribution of Xj1, . . . , Xjt, given Xi1, . . . , Xis, is Multinomial with parameters n − r and pj1/q, . . . , pjt/q.
DISCUSSION
i) Clearly,

(Xi1 = xi1, . . . , Xis = xis) ⊆ (Xi1 + · · · + Xis = r) = (n − Y = r) = (Y = n − r),

so that

(Xi1 = xi1, . . . , Xis = xis) = (Xi1 = xi1, . . . , Xis = xis, Y = n − r).
Denoting by O the outcome which is the grouping of all outcomes distinct from those designated by i1, . . . , is, we have that the probability of O is q, and the number of its occurrences is Y. Thus, the r.v.s Xi1, . . . , Xis and Y are distributed as asserted.
ii) We have

f(xj1, . . . , xjt|xi1, . . . , xis) = f(xj1, . . . , xjt, xi1, . . . , xis)/fi1, . . . , is(xi1, . . . , xis)
= f(x1, . . . , xk)/fi1, . . . , is(xi1, . . . , xis)
= {[n!/(x1! · · · xk!)] p1^x1 · · · pk^xk} / {[n!/(xi1! · · · xis!(n − r)!)] pi1^xi1 · · · pis^xis q^(n−r)}
= [pi1^xi1 · · · pis^xis pj1^xj1 · · · pjt^xjt / (xi1! · · · xis! xj1! · · · xjt!)] × [xi1! · · · xis!(n − r)! / (pi1^xi1 · · · pis^xis q^(xj1 + · · · + xjt))]

(since n − r = n − (xi1 + · · · + xis) = xj1 + · · · + xjt)

= [(n − r)!/(xj1! · · · xjt!)] (pj1/q)^xj1 · · · (pjt/q)^xjt,

as was to be seen.
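The derivation can be spot-checked numerically. The sketch below uses hypothetical values of n and the pi, and verifies the t = 1, s = 1 case of part (ii) for a trinomial vector: the conditional distribution of X2 given X1 = x1 is Binomial(n − x1, p2/q) with q = 1 − p1.

```python
from math import comb, factorial

# (X1, X2, X3) ~ Multinomial(n; p1, p2, p3), hypothetical parameters.
n, (p1, p2, p3) = 6, (0.2, 0.3, 0.5)

def multinom_pmf(x1, x2, x3):
    c = factorial(n) // (factorial(x1) * factorial(x2) * factorial(x3))
    return c * p1**x1 * p2**x2 * p3**x3

x1, q = 2, 1 - p1
marg = comb(n, x1) * p1**x1 * q**(n - x1)   # X1 ~ B(n, p1), by part (i)
for x2 in range(n - x1 + 1):
    cond = multinom_pmf(x1, x2, n - x1 - x2) / marg
    binom = comb(n - x1, x2) * (p2 / q)**x2 * (1 - p2 / q)**(n - x1 - x2)
    assert abs(cond - binom) < 1e-12
print("conditional pmf of X2 given X1 = x1 matches Binomial(n - x1, p2/q)")
```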
EXAMPLE 2 Let the r.v.s X1 and X2 have the Bivariate Normal distribution, and recall that
their (joint) p.d.f. is given by:
f(x1, x2) = [1/(2πσ1σ2√(1 − ρ²))]
× exp{−[1/(2(1 − ρ²))] [((x1 − μ1)/σ1)² − 2ρ((x1 − μ1)/σ1)((x2 − μ2)/σ2) + ((x2 − μ2)/σ2)²]}.
We saw that the marginal p.d.f.s f1, f2 are N(μ1, σ1²), N(μ2, σ2²), respectively; that is, X1, X2 are also normally distributed. Furthermore, in the process of proving that f(x1, x2) is a p.d.f., we rewrote it as follows:
f(x1, x2) = [1/(√(2π)σ1)] exp[−(x1 − μ1)²/(2σ1²)] × [1/(√(2π)σ2√(1 − ρ²))] exp[−(x2 − b)²/(2σ2²(1 − ρ²))],

where

b = μ2 + ρ(σ2/σ1)(x1 − μ1).
Hence

f(x2|x1) = f(x1, x2)/f1(x1) = [1/(√(2π)σ2√(1 − ρ²))] exp[−(x2 − b)²/(2σ2²(1 − ρ²))],

which is the p.d.f. of an N(b, σ2²(1 − ρ²)) r.v. Similarly f(x1|x2) is seen to be the p.d.f. of an N(b′, σ1²(1 − ρ²)) r.v., where

b′ = μ1 + ρ(σ1/σ2)(x2 − μ2).
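The identity f(x2|x1) = f(x1, x2)/f1(x1) and the claimed N(b, σ2²(1 − ρ²)) form can be checked at a point numerically; all parameter values below are hypothetical, chosen only for the check.

```python
from math import exp, pi, sqrt

# Hypothetical Bivariate Normal parameters (m = mu, s = sigma, r = rho).
m1, m2, s1, s2, r = 1.0, -0.5, 2.0, 1.5, 0.6

def joint(x1, x2):
    z1, z2 = (x1 - m1) / s1, (x2 - m2) / s2
    q = (z1 * z1 - 2 * r * z1 * z2 + z2 * z2) / (2 * (1 - r * r))
    return exp(-q) / (2 * pi * s1 * s2 * sqrt(1 - r * r))

def normal_pdf(x, mu, var):
    return exp(-(x - mu) ** 2 / (2 * var)) / sqrt(2 * pi * var)

x1, x2 = 2.0, 0.3
b = m2 + r * (s2 / s1) * (x1 - m1)                  # conditional mean
cond = joint(x1, x2) / normal_pdf(x1, m1, s1 ** 2)  # f(x2 | x1)
assert abs(cond - normal_pdf(x2, b, s2 ** 2 * (1 - r ** 2))) < 1e-12
print("f(x2 | x1) matches the N(b, s2^2 (1 - rho^2)) density")
```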
Exercises
4.2.1 Refer to Exercise 3.2.17 in Chapter 3 and:
i) Find the marginal p.d.f.s of the r.v.s Xj, j = 1, . . . , 6;
ii) Calculate the probability that X1 ≥ 5.
4.2.2 Refer to Exercise 3.2.18 in Chapter 3 and determine:
i) The marginal p.d.f. of each one of X1, X2, X3;
ii) The conditional p.d.f. of X1, X2, given X3; X1, X3, given X2; X2, X3, given
X1;
98 4 Distribution Functions, Probability Densities, and Their Relationship
iii) The conditional p.d.f. of X1, given X2, X3; X2, given X3, X1; X3, given X1,
X2 .
If n = 20, provide expressions for the following probabilities:
iv) P(3X1 + X2 ≤ 5);
v) P(X1 < X2 < X3);
vi) P(X1 + X2 = 10|X3 = 5);
vii) P(3 ≤ X1 ≤ 10|X2 = X3);
viii) P(X1 < 3X2|X1 > X3).
4.2.3 Let X, Y be r.v.s jointly distributed with p.d.f. f given by f(x, y) = 2/c² if 0 ≤ x ≤ y, 0 ≤ y ≤ c, and 0 otherwise.
i) Determine the constant c;
ii) Find the marginal p.d.f.s of X and Y;
iii) Find the conditional p.d.f. of X, given Y, and the conditional p.d.f. of Y,
given X;
iv) Calculate the probability that X ≤ 1.
4.2.4 Let the r.v.s X, Y be jointly distributed with p.d.f. f given by f(x, y) = e^(−x−y) I(0,∞)×(0,∞)(x, y). Compute the following probabilities:
i) P(X ≤ x);
ii) P(Y ≤ y);
iii) P(X < Y);
iv) P(X + Y ≤ 3).
4.2.5 If the joint p.d.f. f of the r.v.s Xj, j = 1, 2, 3, is given by

f(x1, x2, x3) = c³ e^(−c(x1 + x2 + x3)) I_A(x1, x2, x3),

where

A = (0, ∞) × (0, ∞) × (0, ∞),
i) Determine the constant c;
ii) Find the marginal p.d.f. of each one of the r.v.s Xj, j = 1, 2, 3;
iii) Find the conditional (joint) p.d.f. of X1, X2, given X3, and the conditional
p.d.f. of X1, given X2, X3;
iv) Find the conditional d.f.s corresponding to the conditional p.d.f.s in (iii).
4.2.6 Consider the function given below:

f(x|y) = y^x e^(−y)/x!  for x = 0, 1, . . . ; y ≥ 0,  and 0 otherwise.
i) Show that for each fixed y, f(|y) is a p.d.f., the conditional p.d.f. of an r.v.
X, given that another r.v. Y equals y;
ii) If the marginal p.d.f. of Y is Negative Exponential with parameter λ = 1, what is the joint p.d.f. of X, Y?
iii) Show that the marginal p.d.f. of X is given by f(x) = (1/2)^(x+1) I_A(x), where A = {0, 1, 2, . . . }.
4.2.7 Let Y be an r.v. distributed as P(λ) and suppose that the conditional distribution of the r.v. X, given Y = n, is B(n, p). Determine the p.d.f. of X and
the conditional p.d.f. of Y, given X = x.
4.2.8 Consider the function f defined as follows:

f(x1, x2) = [1/(2π)] exp[−(x1² + x2²)/2] + [1/(4e)] x1³ x2³ exp[−(x1² + x2²)/2] I[−1,1]×[−1,1](x1, x2)

and show that:
i) f is a non-Normal Bivariate p.d.f.;
ii) Both marginal p.d.f.s

f1(x1) = ∫(−∞ to ∞) f(x1, x2) dx2  and  f2(x2) = ∫(−∞ to ∞) f(x1, x2) dx1

are N(0, 1) p.d.f.s.
Typical cases:

Figure 4.3 Typical cases (a)–(e) of d.f.s F(x) and the corresponding pth quantile xp. Observe that the figures demonstrate that, as defined, xp need not be unique.
Consider the number (n + 1)p and set m = [(n + 1)p], where [y] denotes the largest integer which is ≤ y. Then if (n + 1)p is not an integer, f(x) has a unique mode at x = m. If (n + 1)p is an integer, then f(x) has two modes obtained for x = m and x = m − 1.
PROOF For x ≥ 1, we have

f(x)/f(x − 1) = {[n!/(x!(n − x)!)] p^x q^(n−x)} / {[n!/((x − 1)!(n − x + 1)!)] p^(x−1) q^(n−x+1)} = [(n − x + 1)/x](p/q).

That is,

f(x)/f(x − 1) = [(n − x + 1)/x](p/q).

Hence f(x) > f(x − 1) if and only if (n − x + 1)p > x(1 − p), or equivalently x < (n + 1)p. Thus if (n + 1)p is not an integer, f(x) increases for x ≤ m = [(n + 1)p] and decreases thereafter, so that the mode is unique at x = m. If (n + 1)p is an integer, then the ratio above equals 1 for x = (n + 1)p, so that f((n + 1)p) = f((n + 1)p − 1), and

x = (n + 1)p − 1

is a second point which gives the maximum value. ∎
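The mode rule can be brute-force checked against the p.d.f.; the n, p values below are arbitrary, with the last pair chosen so that (n + 1)p is an integer.

```python
from math import comb, floor

def binom_pmf(n, p, x):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

for n, p in [(10, 0.3), (7, 0.5), (9, 0.4)]:   # (9 + 1) * 0.4 = 4, an integer
    pmf = [binom_pmf(n, p, x) for x in range(n + 1)]
    m = floor((n + 1) * p)
    best = max(pmf)
    assert abs(pmf[m] - best) < 1e-12           # mode at m = [(n + 1) p]
    if ((n + 1) * p).is_integer():
        assert abs(pmf[m - 1] - best) < 1e-12   # second mode at m - 1
print("binomial mode rule confirmed")
```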
THEOREM 6 Let X be P(λ); that is,

f(x) = e^(−λ) λ^x/x!,  x = 0, 1, 2, . . . ;  λ > 0.

If λ is not an integer, f(x) has a unique mode at x = [λ]; if λ is an integer, f(x) has two modes obtained for x = λ and x = λ − 1.

PROOF For x ≥ 1, we have

f(x)/f(x − 1) = [e^(−λ) λ^x/x!] / [e^(−λ) λ^(x−1)/(x − 1)!] = λ/x.

Hence f(x) > f(x − 1) if and only if λ > x. Thus if λ is not an integer, f(x) keeps increasing for x ≤ [λ] and then decreases. Then the maximum of f(x) occurs at x = [λ]. If λ is an integer, then the maximum occurs at x = λ. But in this case f(λ) = f(λ − 1), which implies that x = λ − 1 is a second point which gives the maximum value to the p.d.f. ∎
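The same brute-force check works for the Poisson mode rule; the λ values are arbitrary, with λ = 4 the integer case.

```python
from math import exp, factorial, floor

def pois_pmf(lam, x):
    return exp(-lam) * lam ** x / factorial(x)

for lam in (2.7, 4.0):
    pmf = [pois_pmf(lam, x) for x in range(40)]   # far past the mode
    best = max(pmf)
    m = floor(lam)
    assert abs(pmf[m] - best) < 1e-12             # mode at [lam]
    if lam == int(lam):
        assert abs(pmf[m - 1] - best) < 1e-12     # second mode at lam - 1
print("Poisson mode rule confirmed")
```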
Exercises
4.3.1 Determine the pth quantile xp for each one of the p.d.f.s given in Exercises 3.2.13–15, 3.3.13–16 (Exercise 3.2.14 for α = 1/4) in Chapter 3 if p = 0.75, 0.50.
4.3.2 Let X be an r.v. with p.d.f. f symmetric about a constant c (that is, f(c − x) = f(c + x) for all x ∈ ℝ). Then show that c is a median of f.
4.3.3 Draw four graphs, two each for B(n, p) and P(λ), which represent the possible occurrences for modes of the distributions B(n, p) and P(λ).
4.3.4 Consider the same p.d.f.s mentioned in Exercise 4.3.1 from the point
of view of a mode.
DEFINITION 2 A set G in ℝ^m, m ≥ 1, is called open if for every x in G there exists an open cube in ℝ^m containing x and contained in G; by the term open cube we mean the Cartesian product of m open intervals of equal length. Without loss of generality, such cubes may be taken to be centered at x.
LEMMA 2 Every open set in ℝ^m is measurable.
PROOF It is analogous to that of Lemma 1. Indeed, let G be an open set in ℝ^m, and for each x ∈ G, consider an open cube centered at x and contained in G. The union over x, as x varies in G, of such cubes clearly is equal to G. The same is true if we restrict ourselves to x's in G whose m coordinates are rationals. Then the resulting cubes are countably many, and therefore their union is measurable, since so is each cube. ∎
DEFINITION 3 Recall that a function g: S ⊆ ℝ → ℝ is said to be continuous at x0 ∈ S if for every ε > 0 there exists a δ = δ(ε, x0) > 0 such that |x − x0| < δ implies |g(x) − g(x0)| < ε. The function g is continuous in S if it is continuous for every x ∈ S.
It follows from the concept of continuity that ε → 0 implies δ → 0.
LEMMA 3 Let g: ℝ → ℝ be continuous. Then g is measurable.
PROOF By Theorem 5 in Chapter 1 it suffices to show that g⁻¹(G) are measurable sets for all open intervals G in ℝ. Set B = g⁻¹(G). Then if B = ∅, the assertion is valid, so let B ≠ ∅ and let x0 be an arbitrary point of B, so that g(x0) ∈ G. Continuity of g at x0 implies that for every ε > 0 there exists δ = δ(ε, x0) > 0 such that |x − x0| < δ implies |g(x) − g(x0)| < ε. Equivalently, x ∈ (x0 − δ, x0 + δ) implies g(x) ∈ (g(x0) − ε, g(x0) + ε). Since g(x0) ∈ G and G is open, by choosing ε sufficiently small we can ensure that (g(x0) − ε, g(x0) + ε) is contained in G. Thus, for such a choice of ε and δ, x ∈ (x0 − δ, x0 + δ) implies that g(x) ∈ (g(x0) − ε, g(x0) + ε) ⊆ G. But B (= g⁻¹(G)) is the set of all x ∈ ℝ for which g(x) ∈ G. As all x ∈ (x0 − δ, x0 + δ) have this property, it follows that (x0 − δ, x0 + δ) ⊆ B. Since x0 is arbitrary in B, it follows that B is open. Then by Lemma 1, it is measurable. ∎
The concept of continuity generalizes, of course, to Euclidean spaces of
higher dimensions, and then a result analogous to the one in Lemma 3 also
holds true.
PROOF The proof is similar to that of Lemma 3. The details are presented here for the sake of completeness. Once again, it suffices to show that g⁻¹(G) are measurable sets for all open cubes G in ℝ^m. Set B = g⁻¹(G). If B = ∅ the assertion is true, and therefore suppose that B ≠ ∅ and let x0 be an arbitrary point of B. Continuity of g at x0 implies that for every ε > 0 there exists a δ = δ(ε, x0) > 0 such that ||x − x0|| < δ implies ||g(x) − g(x0)|| < ε; equivalently, x ∈ S(x0, δ) implies g(x) ∈ S(g(x0), ε), where S(c, r) stands for the open sphere with center c and radius r. Since g(x0) ∈ G and G is open, we can choose ε so small that S(g(x0), ε) is contained in G. Thus, for such a choice of ε and δ, x ∈ S(x0, δ) implies that g(x) ∈ S(g(x0), ε) ⊆ G. Since B (= g⁻¹(G)) is the set of all x ∈ ℝ^k for which g(x) ∈ G, it follows that S(x0, δ) ⊆ B. At this point, observe that there is a cube containing x0 and contained in S(x0, δ); call it C(x0, δ). Then C(x0, δ) ⊆ B, and therefore B is open. By Lemma 2, it is also measurable. ∎
We may now proceed with the justification of Statement 1.
THEOREM 7 Let X: (S, A) → (ℝ^k, B^k) be a random vector, and let g: (ℝ^k, B^k) → (ℝ^m, B^m) be measurable. Then g(X): (S, A) → (ℝ^m, B^m) and is a random vector. (That is, measurable functions of random vectors are random vectors.)
PROOF To prove that [g(X)]⁻¹(B) ∈ A if B ∈ B^m, we have [g(X)]⁻¹(B) = X⁻¹[g⁻¹(B)]; and g⁻¹(B) ∈ B^k by the measurability of g, so that X⁻¹[g⁻¹(B)] ∈ A, since X is measurable. ∎
implies that (xj − x0j)² < δ² for j = 1, . . . , k, or |xj − x0j| < δ, j = 1, . . . , k. This last expression is equivalent to |gj(x) − gj(x0)| < δ, j = 1, . . . , k. Thus the definition of continuity of gj is satisfied here for ε = δ. ∎
Now consider a k-dimensional function X defined on the sample space S.
Then X may be written as X = (X1, . . . , Xk), where Xj, j = 1, . . . , k are real-
valued functions. The question then arises as to how X and Xj, j = 1, . . . , k are
related from a measurability point of view. To this effect, we have the follow-
ing result.
THEOREM 8 Let X = (X1, . . . , Xk): (S, A) → (ℝ^k, B^k). Then X is an r. vector if and only if Xj, j = 1, . . . , k are r.v.s.
PROOF Suppose X is an r. vector and let gj, j = 1, . . . , k be the coordinate functions defined on ℝ^k. Then the gj's are continuous by Lemma 5 and therefore measurable by Lemma 4. Then for each j = 1, . . . , k, gj(X) = gj(X1, . . . , Xk) = Xj is measurable and hence an r.v.
Next, assume that Xj, j = 1, . . . , k are r.v.s. To show that X is an r. vector, by special case 3 in Section 2 of Chapter 1, it suffices to show that X⁻¹(B) ∈ A for each B = (−∞, x1] × · · · × (−∞, xk], x1, . . . , xk ∈ ℝ. Indeed,

X⁻¹(B) = (X ∈ B) = (Xj ∈ (−∞, xj], j = 1, . . . , k) = ∩(j = 1 to k) Xj⁻¹((−∞, xj]) ∈ A. ∎
Exercises
4.4.1 If X and Y are functions defined on the sample space S into the real line ℝ, show that:
Chapter 5
Moments of Random Variables – Some Moment and Probability Inequalities

5.1 Moments of Random Variables

i) E[g(X)] = Σ(x) g(x)f(x)  or  ∫(−∞ to ∞) · · · ∫(−∞ to ∞) g(x1, . . . , xk) f(x1, . . . , xk) dx1 · · · dxk,

and call it the mathematical expectation or mean value or just mean of g(X). Another notation for E[g(X)] which is often used is μg(X), or μ[g(X)], or just μ, if no confusion is possible.
iii) For r > 0, the rth absolute moment of g(X) is denoted by E|g(X)|^r and is defined by:

E|g(X)|^r = Σ(x) |g(x)|^r f(x),  x = (x1, . . . , xk),
or
E|g(X)|^r = ∫(−∞ to ∞) · · · ∫(−∞ to ∞) |g(x1, . . . , xk)|^r f(x1, . . . , xk) dx1 · · · dxk.
iv) For an arbitrary constant c, and n and r as above, the nth moment and rth absolute moment of g(X) about c are denoted by E[g(X) − c]^n, E|g(X) − c|^r, respectively, and are defined as follows:

E[g(X) − c]^n = Σ(x) [g(x) − c]^n f(x),  x = (x1, . . . , xk),
or
E[g(X) − c]^n = ∫(−∞ to ∞) · · · ∫(−∞ to ∞) [g(x1, . . . , xk) − c]^n f(x1, . . . , xk) dx1 · · · dxk,

and

E|g(X) − c|^r = Σ(x) |g(x) − c|^r f(x),  x = (x1, . . . , xk),
or
E|g(X) − c|^r = ∫(−∞ to ∞) · · · ∫(−∞ to ∞) |g(x1, . . . , xk) − c|^r f(x1, . . . , xk) dx1 · · · dxk.
For c = E[g(X)], the moments are called central moments. The 2nd central moment of g(X), that is,

E{g(X) − E[g(X)]}² = Σ(x) [g(x) − Eg(X)]² f(x),  x = (x1, . . . , xk),
or
= ∫(−∞ to ∞) · · · ∫(−∞ to ∞) [g(x1, . . . , xk) − Eg(X)]² f(x1, . . . , xk) dx1 · · · dxk,

is called the variance of g(X) and is also denoted by σ²[g(X)], or σ²g(X), or just σ², if no confusion is possible. The quantity +√(σ²[g(X)]) = σ[g(X)] is called the standard deviation (s.d.) of g(X) and is also denoted by σg(X), or just σ, if no confusion is possible. The variance of an r.v. is referred to as the moment of inertia in Mechanics.
E(Xj^n) = Σ(x) xj^n f(x) = Σ(x1, . . . , xk) xj^n f(x1, . . . , xk)  or  ∫(−∞ to ∞) · · · ∫(−∞ to ∞) xj^n f(x1, . . . , xk) dx1 · · · dxk
= Σ(xj) xj^n fj(xj)  or  ∫(−∞ to ∞) xj^n fj(xj) dxj,

which is the nth moment of the r.v. Xj. Thus the nth moment of an r.v. X with p.d.f. f is

E(X^n) = Σ(x) x^n f(x)  or  ∫(−∞ to ∞) x^n f(x) dx.

For n = 1, we get

EX = Σ(x) x f(x)  or  ∫(−∞ to ∞) x f(x) dx.
More specifically, in the definition of E[g(X)], one would expect to use the p.d.f. of g(X) rather than that of X. Actually, the definition of E[g(X)], as given, is correct and its justification is roughly as follows: Consider E[g(X)] = ∫(−∞ to ∞) g(x)f(x)dx and set y = g(x). Suppose that g is differentiable and has an inverse g⁻¹, and that some further conditions are met. Then
Σ(xj) (xj − EXj)^n fj(xj)  or  ∫(−∞ to ∞) (xj − EXj)^n fj(xj) dxj,

which is the nth central moment of the r.v. Xj (or the nth moment of Xj about its mean).
Thus the nth central moment of an r.v. X with p.d.f. f and mean μ = EX is

E(X − EX)^n = E(X − μ)^n = Σ(x) (x − EX)^n f(x) = Σ(x) (x − μ)^n f(x)
or
= ∫(−∞ to ∞) (x − EX)^n f(x) dx = ∫(−∞ to ∞) (x − μ)^n f(x) dx.

The quantity

E[(X1 − EX1)^n1 · · · (Xk − EXk)^nk]

is the (n1, . . . , nk)-central joint moment of X1, . . . , Xk or the (n1, . . . , nk)-joint moment of X1, . . . , Xk about their means.
4. For g(X1, . . . , Xk) = Xj(Xj 1) (Xj n + 1), j = 1, . . . , k, the quantity
x j x j 1 x j n + 1 f j x j ( ) ( ) ( )
[ ( )
x
E Xj Xj 1 Xj n+1 = ( )] j
x x 1 x n + 1 f x dx
j j j j j j ( ) ( ) ( )
110 5 Moments of Random VariablesSome Moment and Probability Inequalities
is the nth factorial moment of the r.v. Xj. Thus the nth factorial moment of an
r.v. X with p.d.f. f is
E[X(X − 1) · · · (X − n + 1)] = Σ(x) x(x − 1) · · · (x − n + 1) f(x)  or  ∫(−∞ to ∞) x(x − 1) · · · (x − n + 1) f(x) dx.
5.1.2 Basic Properties of the Expectation of an R.V.
From the very definition of E[g(X)], the following properties are immediate.
(E1) E(c) = c, where c is a constant.
(E2) E[cg(X)] = cE[g(X)], and, in particular, E(cX) = cE(X) if X is an
r.v.
(E3) E[g(X) + d] = E[g(X)] + d, where d is a constant. In particular,
E(X + d) = E(X) + d if X is an r.v.
(E4) Combining (E2) and (E3), we get E[cg(X) + d] = cE[g(X)] + d,
and, in particular, E(cX + d) = cE(X) + d if X is an r.v.
(E5) E[Σ(j = 1 to n) cj gj(X)] = Σ(j = 1 to n) cj E[gj(X)].
In fact, for example, in the continuous case, we have

E[Σ(j = 1 to n) cj gj(X)] = ∫(−∞ to ∞) · · · ∫(−∞ to ∞) [Σ(j = 1 to n) cj gj(x1, . . . , xk)] f(x1, . . . , xk) dx1 · · · dxk
= Σ(j = 1 to n) cj ∫(−∞ to ∞) · · · ∫(−∞ to ∞) gj(x1, . . . , xk) f(x1, . . . , xk) dx1 · · · dxk
= Σ(j = 1 to n) cj E[gj(X)].
(E7) If E(X^n) exists (that is, E|X|^n < ∞) for some n = 2, 3, . . . , then E(X^n′) also exists for all n′ = 1, 2, . . . with n′ < n.
σ²[g(X) + d] = E{[g(X) + d] − E[g(X) + d]}² = E{g(X) − E[g(X)]}² = σ²[g(X)],

the equality before the last one being true because of (E4).
(V6) σ²(X) = E[X(X − 1)] + EX − (EX)², if X is an r.v., as is easily seen.
This formula is especially useful in calculating the variance of a discrete r.v., as is seen below.
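Property (V6) is easy to confirm on a small hypothetical discrete distribution, comparing it with the direct formula σ²(X) = E(X²) − (EX)².

```python
# sigma^2(X) = E[X(X-1)] + EX - (EX)^2, checked against E(X^2) - (EX)^2
# on a hypothetical three-point distribution.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}
mean = sum(x * p for x, p in pmf.items())
ex2 = sum(x * x * p for x, p in pmf.items())
fact2 = sum(x * (x - 1) * p for x, p in pmf.items())   # E[X(X-1)]
var_direct = ex2 - mean ** 2
var_v6 = fact2 + mean - mean ** 2
assert abs(var_direct - var_v6) < 1e-12
print(var_v6)   # approximately 0.49
```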
Exercises
5.1.1 Verify the details of properties (E1)–(E7).
5.1.2 Verify the details of properties (V1)–(V5).
5.1.3 For r′ < r, show that |X|^r′ ≤ 1 + |X|^r and conclude that if E|X|^r < ∞, then E|X|^r′ < ∞ for all 0 < r′ < r.
5.1.4 Verify the equality (E[g(X)] =) ∫(−∞ to ∞) g(x) fX(x) dx = ∫(−∞ to ∞) y fY(y) dy for the case that X is distributed as N(0, 1) and Y = g(X) = X².
5.1.5 For any event A, consider the r.v. X = I_A, the indicator of A defined by I_A(s) = 1 for s ∈ A and I_A(s) = 0 for s ∈ A^c, and calculate EX^r, r > 0, and also σ²(X).
5.1.6 Let X be an r.v. such that

P(X = −c) = P(X = c) = 1/2.

Calculate EX, σ²(X) and show that

P(|X − EX| ≥ c) = σ²(X)/c².
5.1.7 Let X be an r.v. with finite EX.
i) For any constant c, show that E(X − c)² = E(X − EX)² + (EX − c)²;
ii) Use part (i) to conclude that E(X − c)² is minimum for c = EX.
5.1.8 Let X be an r.v. such that EX⁴ < ∞. Then show that
i) E(X − EX)³ = EX³ − 3(EX)(EX²) + 2(EX)³;
ii) E(X − EX)⁴ = EX⁴ − 4(EX)(EX³) + 6(EX)²(EX²) − 3(EX)⁴.
5.1.9 If EX⁴ < ∞, show that:

E[X(X − 1)] = EX² − EX;  E[X(X − 1)(X − 2)] = EX³ − 3EX² + 2EX;
5.1.11 A roulette wheel has 38 slots of which 18 are red, 18 black, and 2 green.
i) Suppose a gambler is placing a bet of $M on red. What is the gambler's expected gain or loss and what is the standard deviation?
ii) If the same bet of $M is placed on green and if $kM is the amount the gambler wins, calculate the expected gain or loss and the standard deviation.
iii) For what value of k do the two expectations in parts (i) and (ii) coincide?
iv) Does this value of k depend on M?
v) How do the respective standard deviations compare?
5.1.12 Let X be an r.v. such that P(X = j) = (1/2)^j, j = 1, 2, . . . .
i) Compute EX, E[X(X − 1)];
ii) Use (i) in order to compute σ²(X).
5.1.13 If X is an r.v. distributed as U(α, β), show that

EX = (α + β)/2,  σ²(X) = (α − β)²/12.
5.1.14 Let the r.v. X be distributed as U(α, β). Calculate EX^n for any positive integer n.
5.1.15 Let X be an r.v. with p.d.f. f symmetric about a constant c (that is, f(c − x) = f(c + x) for every x).
i) Then if EX exists, show that EX = c;
ii) If c = 0 and EX^(2n+1) exists, show that EX^(2n+1) = 0 (that is, those moments of X of odd order which exist are all equal to zero).
5.1.16 Refer to Exercise 3.3.13(iv) in Chapter 3 and find the EX for those α's for which this expectation exists, where X is an r.v. having the distribution in question.
5.1.17 Let X be an r.v. with p.d.f. given by

f(x) = (|x|/c²) I(−c,c)(x).

5.1.18 Let X be an r.v. with d.f. F.
i) Show that

EX = ∫(0 to ∞) [1 − F(x)] dx − ∫(−∞ to 0) F(x) dx;
ii) Use the interpretation of the definite integral as an area in order to give a
geometric interpretation of EX.
5.1.19 Let X be an r.v. of the continuous type with finite EX and p.d.f. f.
i) If m is a median of f and c is any constant, show that

E|X − c| = E|X − m| + 2∫(m to c) (c − x) f(x) dx;

ii) Utilize (i) in order to conclude that E|X − c| is minimized for c = m. (Hint: Consider the two cases that c ≥ m and c < m, and in each one split the integral from −∞ to c and c to ∞ in order to remove the absolute value. Then the fact that ∫(−∞ to m) f(x)dx = ∫(m to ∞) f(x)dx = 1/2 and simple manipulations prove part (i). For part (ii), observe that ∫(m to c) (c − x) f(x)dx ≥ 0 whether c ≥ m or c < m.)
5.1.20 If the r.v. X is distributed according to the Weibull distribution (see Exercise 4.1.15 in Chapter 4), then:
i) Show that

EX = Γ(1 + 1/β) α^(−1/β),  EX² = Γ(1 + 2/β) α^(−2/β),

so that

σ²(X) = [Γ(1 + 2/β) − Γ²(1 + 1/β)] α^(−2/β),

where recall that the Gamma function Γ is defined by Γ(γ) = ∫(0 to ∞) t^(γ−1) e^(−t) dt, γ > 0;
ii) Determine the numerical values of EX and σ²(X) for α = 1 and β = 1/2, β = 1, and β = 2.
5.2 Expectations and Variances of Some R.V.s

1. Let X be B(n, p). Then E(X) = np and σ²(X) = npq. In fact,

E(X) = Σ(x = 0 to n) x [n!/(x!(n − x)!)] p^x q^(n−x)
= Σ(x = 1 to n) [n(n − 1)!/((x − 1)!(n − x)!)] p^x q^(n−x)
= np Σ(x = 1 to n) [(n − 1)!/((x − 1)![(n − 1) − (x − 1)]!)] p^(x−1) q^((n−1)−(x−1))
= np Σ(x = 0 to n−1) [(n − 1)!/(x![(n − 1) − x]!)] p^x q^((n−1)−x) = np(p + q)^(n−1) = np.
Next,

E[X(X − 1)] = Σ(x = 0 to n) x(x − 1) [n!/(x!(n − x)!)] p^x q^(n−x)
= Σ(x = 2 to n) x(x − 1) [n(n − 1)(n − 2)!/(x(x − 1)(x − 2)![(n − 2) − (x − 2)]!)] p² p^(x−2) q^((n−2)−(x−2))
= n(n − 1)p² Σ(x = 2 to n) [(n − 2)!/((x − 2)![(n − 2) − (x − 2)]!)] p^(x−2) q^((n−2)−(x−2))
= n(n − 1)p² Σ(x = 0 to n−2) [(n − 2)!/(x![(n − 2) − x]!)] p^x q^((n−2)−x)
= n(n − 1)p² (p + q)^(n−2) = n(n − 1)p².

That is,

E[X(X − 1)] = n(n − 1)p².

Hence, by (V6),

σ²(X) = E[X(X − 1)] + EX − (EX)² = n(n − 1)p² + np − n²p²
= n²p² − np² + np − n²p² = np(1 − p) = npq.
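The two computations above can be confirmed by direct summation over the B(n, p) p.d.f. (the n, p values are hypothetical):

```python
from math import comb

# Direct check that a B(n, p) r.v. has EX = np and, via (V6), variance npq.
n, p = 12, 0.35
q = 1 - p
pmf = [comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]
mean = sum(x * pmf[x] for x in range(n + 1))
fact2 = sum(x * (x - 1) * pmf[x] for x in range(n + 1))   # E[X(X-1)]
assert abs(mean - n * p) < 1e-12
assert abs(fact2 - n * (n - 1) * p ** 2) < 1e-12
assert abs(fact2 + mean - mean ** 2 - n * p * q) < 1e-12  # (V6) gives npq
print("EX = np and variance = npq confirmed")
```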
2. Let X be P(λ). Then E(X) = σ²(X) = λ. In fact,

E(X) = Σ(x = 0 to ∞) x e^(−λ) λ^x/x! = Σ(x = 1 to ∞) x e^(−λ) λ^x/[x(x − 1)!] = λ e^(−λ) Σ(x = 1 to ∞) λ^(x−1)/(x − 1)!
= λ e^(−λ) Σ(x = 0 to ∞) λ^x/x! = λ e^(−λ) e^λ = λ.

Next,

E[X(X − 1)] = Σ(x = 0 to ∞) x(x − 1) e^(−λ) λ^x/x!
= Σ(x = 2 to ∞) x(x − 1) e^(−λ) λ² λ^(x−2)/[x(x − 1)(x − 2)!] = λ² e^(−λ) Σ(x = 0 to ∞) λ^x/x! = λ².

Hence EX² = λ² + λ, so that σ²(X) = λ² + λ − λ² = λ.
REMARK 3 One can also prove that the nth factorial moment of X is λ^n; that is, E[X(X − 1) · · · (X − n + 1)] = λ^n.
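A truncated-series check of E(X) = σ²(X) = λ and of the factorial-moment claim in Remark 3 (the λ value is arbitrary; the series is cut where its terms are negligible):

```python
from math import exp, factorial

lam, N = 3.0, 80            # 80 terms are plenty for lam = 3
pmf = [exp(-lam) * lam ** x / factorial(x) for x in range(N)]
mean = sum(x * p for x, p in enumerate(pmf))
fact2 = sum(x * (x - 1) * p for x, p in enumerate(pmf))
fact3 = sum(x * (x - 1) * (x - 2) * p for x, p in enumerate(pmf))
assert abs(mean - lam) < 1e-9
assert abs(fact2 + mean - mean ** 2 - lam) < 1e-9   # variance via (V6)
assert abs(fact3 - lam ** 3) < 1e-9                 # 3rd factorial moment
print("mean, variance, and factorial moments agree with lam, lam, lam^n")
```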
E(X^(2n+1)) = 0,  E(X^(2n)) = (2n)!/(2^n n!),  n ≥ 0.

In particular, then

E(X) = 0,  σ²(X) = E(X²) = 2!/(2 · 1!) = 1.

In fact,

E(X^(2n+1)) = [1/√(2π)] ∫(−∞ to ∞) x^(2n+1) e^(−x²/2) dx.

But

∫(−∞ to ∞) x^(2n+1) e^(−x²/2) dx = ∫(−∞ to 0) x^(2n+1) e^(−x²/2) dx + ∫(0 to ∞) x^(2n+1) e^(−x²/2) dx
= −∫(0 to ∞) y^(2n+1) e^(−y²/2) dy + ∫(0 to ∞) x^(2n+1) e^(−x²/2) dx = 0.
Setting m(2n) = E(X^(2n)), integration by parts yields

∫(−∞ to ∞) x^(2n) e^(−x²/2) dx = −x^(2n−1) e^(−x²/2) |(−∞ to ∞) + (2n − 1) ∫(−∞ to ∞) x^(2n−2) e^(−x²/2) dx
= (2n − 1) ∫(−∞ to ∞) x^(2n−2) e^(−x²/2) dx,

so that

m(2n) = (2n − 1) m(2n−2), and similarly,
m(2n−2) = (2n − 3) m(2n−4),
. . .
m(2) = 1 · m(0),
m(0) = 1  (since m(0) = E(X⁰) = E(1) = 1).
Multiplying them out, we obtain

m(2n) = (2n − 1)(2n − 3) · · · 1 = [1 · 2 · · · (2n − 3)(2n − 2)(2n − 1)(2n)] / [2 · 4 · · · (2n − 2)(2n)]
= (2n)!/{2^n [1 · 2 · · · (n − 1)n]} = (2n)!/(2^n n!).
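The even-moment formula can be spot-checked by numerical integration; the midpoint rule and the grid below are ad hoc choices, and the tails beyond |x| = 12 are negligible.

```python
from math import exp, pi, sqrt, factorial

def moment(k, a=-12.0, b=12.0, steps=100000):
    # midpoint rule for E(X^k), X ~ N(0, 1)
    h = (b - a) / steps
    s = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        s += x ** k * exp(-x * x / 2)
    return s * h / sqrt(2 * pi)

for n in (1, 2, 3):
    assert abs(moment(2 * n) - factorial(2 * n) / (2 ** n * factorial(n))) < 1e-4
assert abs(moment(3)) < 1e-6   # odd moments vanish by symmetry
print("E(X^2n) = (2n)!/(2^n n!) confirmed for n = 1, 2, 3")
```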
Now let X be N(μ, σ²). Then (X − μ)/σ is N(0, 1), so that

E[(X − μ)/σ] = 0,  σ²[(X − μ)/σ] = 1.

But

E[(X − μ)/σ] = (1/σ)(EX − μ).

Hence

E(X) = μ.

Also

σ²[(X − μ)/σ] = (1/σ²) E(X − μ)²,

and then

(1/σ²) E(X − μ)² = 1,

so that σ²(X) = E(X − μ)² = σ².
2. Let X be Gamma with parameters α and β. Then E(X) = αβ and σ²(X) = αβ². In fact,

E(X) = [1/(Γ(α)β^α)] ∫(0 to ∞) x x^(α−1) e^(−x/β) dx = [1/(Γ(α)β^α)] ∫(0 to ∞) x^α e^(−x/β) dx
= [1/(Γ(α)β^α)] (−β) ∫(0 to ∞) x^α d(e^(−x/β))
= [1/(Γ(α)β^α)] (−β) [x^α e^(−x/β) |(0 to ∞) − α ∫(0 to ∞) x^(α−1) e^(−x/β) dx]
= αβ [1/(Γ(α)β^α)] ∫(0 to ∞) x^(α−1) e^(−x/β) dx = αβ.
Next,

E(X²) = [1/(Γ(α)β^α)] ∫(0 to ∞) x^(α+1) e^(−x/β) dx = β²α(α + 1),

and hence

σ²(X) = β²α(α + 1) − α²β² = αβ²(α + 1 − α) = αβ².

REMARK 5
i) If X is χr², that is, if α = r/2, β = 2, we get E(X) = r, σ²(X) = 2r.
ii) If X is Negative Exponential, that is, if α = 1, β = 1/λ, we get E(X) = 1/λ, σ²(X) = 1/λ².
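A numerical check of E(X) = αβ and σ²(X) = αβ² for a case where Γ(α) is easy to write down, namely α = 3 (so Γ(3) = 2! = 2) and β = 2; the integration grid is an ad hoc choice.

```python
from math import exp

alpha, beta, gamma_alpha = 3, 2.0, 2.0   # Gamma(3) = 2! = 2

def density(x):
    # Gamma(alpha, beta) p.d.f.: x^(a-1) e^(-x/b) / (Gamma(a) b^a)
    return x ** (alpha - 1) * exp(-x / beta) / (gamma_alpha * beta ** alpha)

a, b, steps = 0.0, 80.0, 200000          # midpoint rule; tail beyond 80 negligible
h = (b - a) / steps
total = mean = ex2 = 0.0
for i in range(steps):
    x = a + (i + 0.5) * h
    w = density(x) * h
    total += w
    mean += x * w
    ex2 += x * x * w
assert abs(total - 1.0) < 1e-6
assert abs(mean - alpha * beta) < 1e-4                   # alpha*beta = 6
assert abs(ex2 - mean ** 2 - alpha * beta ** 2) < 1e-3   # alpha*beta^2 = 12
print("Gamma(3, 2): mean 6 and variance 12 confirmed numerically")
```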
3. Let X be Cauchy with parameters μ and σ². Then E(X^n) does not exist for any n ≥ 1. For example, for n = 1, we get

I = (σ/π) ∫(−∞ to ∞) x/[σ² + (x − μ)²] dx.

For σ = 1, μ = 0, this becomes

I = (1/π) ∫(−∞ to ∞) x/(1 + x²) dx = (1/2π) ∫(−∞ to ∞) [1/(1 + x²)] d(1 + x²)
= (1/2π) log(1 + x²) |(−∞ to ∞) = (1/2π)(∞ − ∞),

which is an indeterminate form. Thus the Cauchy distribution is an example of a distribution without a mean.
REMARK 6 In somewhat advanced mathematics courses, one sometimes encounters the so-called Cauchy Principal Value Integral. This coincides with the improper Riemann integral when the latter exists, and it often exists even if the Riemann integral does not. It is an improper integral in which the limits are taken symmetrically. As an example, for σ = 1, μ = 0, we have, in terms of the principal value integral,

I* = lim(A→∞) (1/π) ∫(−A to A) x/(1 + x²) dx = (1/2π) lim(A→∞) log(1 + x²) |(−A to A)
= (1/2π) lim(A→∞) [log(1 + A²) − log(1 + A²)] = 0.
Thus the mean of the Cauchy exists in terms of principal value, but not in the
sense of our definition which requires absolute convergence of the improper
Riemann integral involved.
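The distinction can be seen numerically: over [−A, A] the integral of |x| f(x) grows like log(1 + A²)/π without bound (so E|X| = ∞ and the mean does not exist in our sense), while the symmetric integral of x f(x) is always 0.

```python
from math import pi, log

def abs_moment(A, steps=200000):
    # midpoint rule for the integral of |x| / (pi (1 + x^2)) over [-A, A];
    # the exact value is log(1 + A^2) / pi, which is unbounded in A
    h = 2.0 * A / steps
    s = 0.0
    for i in range(steps):
        x = -A + (i + 0.5) * h
        s += abs(x) / (pi * (1.0 + x * x))
    return s * h

assert abs(abs_moment(100) - log(1 + 100 ** 2) / pi) < 1e-3
assert abs_moment(1000) > abs_moment(100) + 0.5   # keeps growing: no E|X|
print("truncated E|X| grows without bound, while the symmetric E(X) integral is 0")
```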
Exercises
5.2.1 If X is an r.v. distributed as B(n, p), calculate the kth factorial moment E[X(X − 1) · · · (X − k + 1)].
5.2.2 An honest coin is tossed independently n times and let X be the r.v.
denoting the number of H's that occur.
i) Calculate E(X/n), σ²(X/n);
ii) If n = 100, find a lower bound for the probability that the observed frequency X/n does not differ from 0.5 by more than 0.1;
iii) Determine the smallest value of n for which the probability that X/n does not differ from 0.5 by more than 0.1 is at least 0.95;
iv) If n = 50 and P(|(X/n) − 0.5| < c) ≥ 0.9, determine the constant c. (Hint: In (ii)–(iv), utilize Tchebichev's inequality.)
5.2.3 Refer to Exercise 3.2.16 in Chapter 3 and suppose that 100 people are
chosen at random. Find the expected number of people with blood of each one
of the four types and the variance about these numbers.
5.2.4 If X is an r.v. distributed as P(λ), calculate the kth factorial moment E[X(X − 1) · · · (X − k + 1)].
5.2.5 Refer to Exercise 3.2.7 in Chapter 3 and find the expected number of
particles to reach the portion of space under consideration there during time
t and the variance about this number.
5.2.6 If X is an r.v. with a Hypergeometric distribution, use an approach similar to the one used in the Binomial example in order to show that

EX = mr/(m + n),  σ²(X) = mnr(m + n − r)/[(m + n)²(m + n − 1)].
∫(λ to ∞) f(x) dx = Σ(x = 0 to n−1) e^(−λ) λ^x/x!.
Conclude that in this case, one may utilize the Incomplete Gamma tables (see, for example, Tables of the Incomplete Γ-Function, Cambridge University Press, 1957, Karl Pearson, editor) in order to evaluate the d.f. of a Poisson distribution at the points j = 1, 2, . . . .
5.2.9 Refer to Exercise 3.3.7 in Chapter 3 and suppose that each TV tube
costs $7 and that it sells for $11. Suppose further that the manufacturer sells an
item on money-back guarantee terms if the lifetime of the tube is less than c.
i) Express his expected gain (or loss) in terms of c and λ;
ii) For what value of c will he break even?
5.2.10 Refer to Exercise 4.1.12 in Chapter 4 and suppose that each bulb costs
30 cents and sells for 50 cents. Furthermore, suppose that a bulb is sold under
the following terms: The entire amount is refunded if its lifetime is <1,000 and
50% of the amount is refunded if its lifetime is <2,000. Compute the expected
gain (or loss) of the dealer.
5.2.11 If X is an r.v. having the Beta distribution with parameters α and β, then
i) Show that

EX^n = [Γ(α + β) Γ(α + n)]/[Γ(α) Γ(α + β + n)],  n = 1, 2, . . . ;

ii) Use (i) in order to find EX and σ²(X).
5.2.12 Let X be an r.v. distributed as Cauchy with parameters μ and σ². Then show that E|X| = ∞.
5.2.13 If the r.v. X is distributed as Lognormal with parameters μ and σ, compute EX, σ²(X).
5.2.14 Suppose that the average monthly water consumption by the residents of a certain community follows the Lognormal distribution with μ = 10⁴ cubic feet and σ = 10³ cubic feet monthly. Compute the proportion of the residents who consume more than 15 × 10³ cubic feet monthly.
5.2.15 Let X be an r.v. with finite third moment and set μ = EX, σ² = σ²(X). Define the (dimensionless quantity, pure number) γ1 by

γ1 = E[(X − μ)/σ]³.

γ1 is called the skewness of the distribution of the r.v. X and is a measure of asymmetry of the distribution. If γ1 > 0, the distribution is said to be skewed to the right and if γ1 < 0, the distribution is said to be skewed to the left. Then show that:
i) If the p.d.f. of X is symmetric about μ, then γ1 = 0;
ii) The Binomial distribution B(n, p) is skewed to the right for p < 1/2 and is skewed to the left for p > 1/2;
iii) The Poisson distribution P(λ) and the Negative Exponential distribution are always skewed to the right.
5.2.16 Let X be an r.v. with EX⁴ < ∞ and define the (pure number) γ₂ by
γ₂ = E[((X − μ)/σ)⁴] − 3, where μ = EX, σ² = σ²(X).
γ₂ is called the kurtosis of the distribution of the r.v. X and is a measure of
peakedness of this distribution, the N(0, 1) p.d.f. serving as the measure of
reference. If γ₂ > 0, the distribution is called leptokurtic and if γ₂ < 0, the
distribution is called platykurtic. Then show that:
i) γ₂ < 0 if X is distributed as U(α, β);
ii) γ₂ > 0 if X has the Double Exponential distribution (see Exercise 3.3.13(iii)
in Chapter 3).
5.2.17 Let X be an r.v. taking on the values j with probability p_j = P(X = j),
j = 0, 1, . . . . Set
G(t) = Σ_{j=0}^∞ p_j t^j, −1 ≤ t ≤ 1.
i) Show that, provided the interchange of differentiation and summation is
legitimate,
E[X(X − 1) ··· (X − k + 1)] = (d^k/dt^k) G(t) |_{t=1};
ii) Find G(t) when p_j is given by
p_j = C(n, j) p^j q^{n−j}, j ≥ 0, 0 < p < 1, q = 1 − p,
and
iii) when p_j is given by
p_j = e^{−λ} λ^j / j!, j ≥ 0, λ > 0;
122  5  Moments of Random Variables – Some Moment and Probability Inequalities
iv) Utilize (ii) and (iii) in order to calculate the kth factorial moments of X
being B(n, p) and X being P(). Compare the results with those found in
Exercises 5.2.1 and 5.2.4, respectively.
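Part (i) of Exercise 5.2.17 is easy to check numerically. The sketch below (Python; the value of λ is an arbitrary illustrative choice, not from the text) compares the second factorial moment E[X(X − 1)] of a P(λ) r.v., obtained by direct summation of the p.m.f., with the value λ² that differentiating G(t) = e^{λ(t−1)} twice at t = 1 predicts.

```python
import math

lam = 2.5  # arbitrary illustrative value of the Poisson parameter

# Second factorial moment E[X(X-1)] by direct summation of the p.m.f.;
# the series is truncated far into the tail, so the error is negligible.
fact2 = sum(j * (j - 1) * math.exp(-lam) * lam**j / math.factorial(j)
            for j in range(100))

# For the Poisson, G(t) = exp(lam*(t - 1)), so (d^2/dt^2) G(t) at t = 1 is lam^2.
assert abs(fact2 - lam**2) < 1e-9
```

The same direct-summation check works for the Binomial case of part (ii), where the kth factorial moment is n(n − 1)···(n − k + 1)p^k.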
5.3 Conditional Moments of Random Variables

Thus
E(X₂ | X₁ = x₁) = Σ_{x₂} x₂ f(x₂ | x₁) or ∫_{−∞}^{∞} x₂ f(x₂ | x₁) dx₂,
and
σ²(X₂ | X₁ = x₁) = Σ_{x₂} [x₂ − E(X₂ | X₁ = x₁)]² f(x₂ | x₁)
or ∫_{−∞}^{∞} [x₂ − E(X₂ | X₁ = x₁)]² f(x₂ | x₁) dx₂,
for the discrete and the continuous case, respectively.
For example, if (X₁, X₂) has the Bivariate Normal distribution, then f(x₂ | x₁)
is the p.d.f. of an N(b, σ₂²(1 − ρ²)) r.v., where
b = μ₂ + ρ(σ₂/σ₁)(x₁ − μ₁).
Hence
E(X₂ | X₁ = x₁) = μ₂ + ρ(σ₂/σ₁)(x₁ − μ₁).
Similarly,
E(X₁ | X₂ = x₂) = μ₁ + ρ(σ₁/σ₂)(x₂ − μ₂).
Let X₁, X₂ be two r.v.s with joint p.d.f. f(x₁, x₂). We just gave the definition
of E(X₂ | X₁ = x₁) for all x₁ for which f(x₂ | x₁) is defined; that is, for all x₁ for
which f_{X₁}(x₁) > 0. Then E(X₂ | X₁ = x₁) is a function of x₁. Replacing x₁ by X₁
and writing E(X₂ | X₁) instead of E(X₂ | X₁ = x₁), we then have that E(X₂ | X₁)
is itself an r.v., and a function of X₁. Then we may talk about E[E(X₂ | X₁)].
In connection with this, we have the following properties:
(CE1) Provided the expectations which appear below exist, we have
E[E(X₂ | X₁)] = E(X₂).
For the continuous case,
E[E(X₂ | X₁)] = ∫_{−∞}^{∞} [∫_{−∞}^{∞} x₂ f(x₂ | x₁) dx₂] f_{X₁}(x₁) dx₁
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} x₂ f(x₂ | x₁) f_{X₁}(x₁) dx₂ dx₁
= ∫∫ x₂ f(x₁, x₂) dx₂ dx₁ = ∫∫ x₂ f(x₁, x₂) dx₁ dx₂
= ∫_{−∞}^{∞} x₂ f_{X₂}(x₂) dx₂ = E(X₂).
REMARK 7 Note that here all interchanges of order of integration are legiti-
mate because of the absolute convergence of the integrals involved.
(CE2) Let X₁, X₂ be two r.v.s, let g(X₁) be a (measurable) function of X₁,
and suppose that E[X₂ g(X₁)] exists. Then for all x₁ for which the
conditional expectations below exist, we have
E[X₂ g(X₁) | X₁ = x₁] = g(x₁) E(X₂ | X₁ = x₁)
or
E[X₂ g(X₁) | X₁] = g(X₁) E(X₂ | X₁).
Again, restricting ourselves to the continuous case, we have
E[X₂ g(X₁) | X₁ = x₁] = ∫_{−∞}^{∞} x₂ g(x₁) f(x₂ | x₁) dx₂ = g(x₁) ∫_{−∞}^{∞} x₂ f(x₂ | x₁) dx₂
= g(x₁) E(X₂ | X₁ = x₁).
In particular, by taking X₂ = 1, we get
(CE2′) For all x₁ for which the conditional expectations below exist, we
have E[g(X₁) | X₁ = x₁] = g(x₁) (or E[g(X₁) | X₁] = g(X₁)).
(CV) Provided the quantities which appear below exist, we have
σ²[E(X₂ | X₁)] ≤ σ²(X₂),
and the inequality is strict, unless X₂ is a function of X₁ (on a set of
probability one).
PROOF Set
μ = E(X₂), φ(X₁) = E(X₂ | X₁).
Then
σ²(X₂) = E(X₂ − μ)² = E{[X₂ − φ(X₁)] + [φ(X₁) − μ]}²
= E[X₂ − φ(X₁)]² + E[φ(X₁) − μ]² + 2E{[X₂ − φ(X₁)][φ(X₁) − μ]}.
Next,
E{[X₂ − φ(X₁)][φ(X₁) − μ]}
= E[X₂ φ(X₁)] − E[φ²(X₁)] − μE(X₂) + μE[φ(X₁)]
= E{E[X₂ φ(X₁) | X₁]} − E[φ²(X₁)] − μE[E(X₂ | X₁)]
+ μE[φ(X₁)]  (by (CE1))
= E[φ²(X₁)] − E[φ²(X₁)] − μE[φ(X₁)] + μE[φ(X₁)]  (by (CE2)),
which is 0. Therefore
σ²(X₂) = E[X₂ − φ(X₁)]² + σ²[φ(X₁)],
and since
E[X₂ − φ(X₁)]² ≥ 0,
we have
σ²(X₂) ≥ σ²[φ(X₁)] = σ²[E(X₂ | X₁)].
The inequality is strict unless
E[X₂ − φ(X₁)]² = 0.
But
E[X₂ − φ(X₁)]² = σ²[X₂ − φ(X₁)], since E[X₂ − φ(X₁)] = μ − μ = 0,
and σ²[X₂ − φ(X₁)] = 0 if and only if X₂ = φ(X₁) = E(X₂ | X₁) with
probability one. ▪
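For a discrete pair (X₁, X₂), properties (CE1) and (CV) can be verified by brute-force enumeration. A minimal sketch (Python; the joint p.m.f. is an arbitrary illustrative choice, not from the text):

```python
# Joint p.m.f. of (X1, X2) on a few points; the values are illustrative.
pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 2): 0.4}

def E(h):
    # expectation of h(X1, X2) under the joint p.m.f.
    return sum(h(x1, x2) * p for (x1, x2), p in pmf.items())

mu2 = E(lambda x1, x2: x2)
var2 = E(lambda x1, x2: (x2 - mu2) ** 2)

# phi(x1) = E(X2 | X1 = x1), computed from the conditional p.m.f.
fX1 = {}
for (x1, x2), p in pmf.items():
    fX1[x1] = fX1.get(x1, 0.0) + p
phi = {x1: sum(x2 * p for (y1, x2), p in pmf.items() if y1 == x1) / fX1[x1]
       for x1 in fX1}

# (CE1): E[E(X2|X1)] = E(X2)
assert abs(sum(phi[x1] * fX1[x1] for x1 in fX1) - mu2) < 1e-12

# (CV): the variance of the conditional mean never exceeds var(X2)
var_phi = sum((phi[x1] - mu2) ** 2 * fX1[x1] for x1 in fX1)
assert var_phi <= var2 + 1e-12
```

Exercise 5.3.1 asks for the same verification analytically in the discrete case.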
Exercises
5.3.1 Establish properties (CE1) and (CE2) for the discrete case.
5.3.2 Let the r.v.s X, Y be jointly distributed with p.d.f. given by
f(x, y) = 2/[n(n + 1)]
if y = 1, . . . , x; x = 1, . . . , n, and 0 otherwise. Compute the following
quantities: E(X | Y = y), E(Y | X = x). (Hint: Recall that Σ_{x=1}^n x = n(n + 1)/2
and Σ_{x=1}^n x² = n(n + 1)(2n + 1)/6.)
5.3.3 Let X, Y be r.v.s with p.d.f. f given by f(x, y) = (x + y) I_{(0,1)×(0,1)}(x, y).
Calculate the following quantities: EX, σ²(X), EY, σ²(Y), E(X | Y = y),
σ²(X | Y = y).
5.3.4 Let X, Y be r.v.s with p.d.f. f given by f(x, y) = 2e^{−(x+y)} I_{(0,∞)×(0,∞)}(x, y).
Calculate the following quantities: EX, σ²(X), EY, σ²(Y), E(X | Y = y),
σ²(X | Y = y).
5.3.5 Let X be an r.v. with finite EX. Then for any r.v. Y, show that
E[E(X|Y)] = EX. (Assume the existence of all p.d.f.s needed.)
5.3.6 Consider the r.v.s X, Y and let h, g be (measurable) functions on ℝ into
itself such that E[h(X)g(Y)] and E[g(Y)] exist. Then show that
E[h(X)g(Y) | X = x] = h(x) E[g(Y) | X = x].
5.4 Some Important Applications: Probability and Moment Inequalities

THEOREM 1 Let X be a k-dimensional r. vector with p.d.f. f, let g ≥ 0 be a
(measurable) function on ℝ^k, so that g(X) is an r.v., and let c > 0. Then
P[g(X) ≥ c] ≤ E[g(X)]/c.
PROOF For the continuous case, set A = {(x₁, . . . , x_k)′ ∈ ℝ^k : g(x₁, . . . , x_k) ≥ c}.
Then
E[g(X)] = ∫ ··· ∫ g(x₁, . . . , x_k) f(x₁, . . . , x_k) dx₁ ··· dx_k
= ∫_A g(x₁, . . . , x_k) f(x₁, . . . , x_k) dx₁ ··· dx_k
+ ∫_{Aᶜ} g(x₁, . . . , x_k) f(x₁, . . . , x_k) dx₁ ··· dx_k,
so that
E[g(X)] ≥ ∫_A g(x₁, . . . , x_k) f(x₁, . . . , x_k) dx₁ ··· dx_k
≥ c ∫_A f(x₁, . . . , x_k) dx₁ ··· dx_k = cP(A) = cP[g(X) ≥ c],
as was to be shown. (The discrete case is entirely analogous; see Exercise
5.4.1.) ▪
In particular, if X is an r.v. with E(X) = μ and σ²(X) = σ², take g(X) = (X − μ)²
and replace c by c² to obtain Tchebichev's inequality:
P[|X − μ| ≥ c] = P[(X − μ)² ≥ c²] ≤ E(X − μ)²/c² = σ²/c².
Setting c = kσ (k > 0), this becomes
P[|X − μ| ≥ kσ] ≤ 1/k²; equivalently, P[|X − μ| < kσ] ≥ 1 − 1/k².
REMARK 8 Let X be an r.v. with mean μ and variance σ² = 0. Then
Tchebichev's inequality gives: P[|X − μ| ≥ c] = 0 for every c > 0. This result and
Theorem 2, Chapter 2, imply then that P(X = μ) = 1 (see also Exercise 5.4.6).
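Exercise 5.4.3 below asks for precisely this comparison; here is a quick numeric sketch (Python, standard library only) of the Tchebichev lower bound against the exact N(μ, σ²) probabilities:

```python
import math

def normal_central_prob(k):
    # P(|X - mu| < k*sigma) for X ~ N(mu, sigma^2), via the error function:
    # this probability equals erf(k / sqrt(2)) for any mu and sigma > 0.
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    tcheb_lower = 1.0 - 1.0 / k**2  # Tchebichev: P(|X - mu| < k sigma) >= 1 - 1/k^2
    exact = normal_central_prob(k)
    assert exact >= tcheb_lower     # the bound holds, with plenty of room to spare
```

For the normal distribution the bound is quite loose (at k = 2, the exact probability is about 0.95 against the guaranteed 0.75), which is the price of an inequality valid for every distribution with finite variance.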
LEMMA 1 Let X and Y be r.v.s such that
E(X) = E(Y) = 0, σ²(X) = σ²(Y) = 1.
Then
E²(XY) ≤ 1 or, equivalently, −1 ≤ E(XY) ≤ 1,
and
E(XY) = 1 if and only if P(Y = X) = 1,
E(XY) = −1 if and only if P(Y = −X) = 1.
PROOF We have
0 ≤ E(X − Y)² = E(X² − 2XY + Y²) = EX² − 2E(XY) + EY² = 2 − 2E(XY)
and
0 ≤ E(X + Y)² = E(X² + 2XY + Y²) = EX² + 2E(XY) + EY² = 2 + 2E(XY).
Hence E(XY) ≤ 1 and −1 ≤ E(XY), so that −1 ≤ E(XY) ≤ 1. Now let P(Y = X)
= 1. Then E(XY) = EY² = 1, and if P(Y = −X) = 1, then E(XY) = −EY² = −1.
Conversely, let E(XY) = 1. Then
σ²(X − Y) = E(X − Y)² − [E(X − Y)]² = E(X − Y)²
= EX² − 2E(XY) + EY² = 1 − 2 + 1 = 0,
so that
P(X = Y) = 1.
Similarly, if E(XY) = −1, then σ²(X + Y) = 0, so that P(Y = −X) = 1. ▪
THEOREM 2 (Cauchy-Schwarz inequality) Let X and Y be two random variables with
means μ₁, μ₂ and (positive) variances σ₁², σ₂², respectively. Then
E²[(X − μ₁)(Y − μ₂)] ≤ σ₁²σ₂²,
or, equivalently,
−σ₁σ₂ ≤ E[(X − μ₁)(Y − μ₂)] ≤ σ₁σ₂,
and
E[(X − μ₁)(Y − μ₂)] = σ₁σ₂
if and only if
P[Y = μ₂ + (σ₂/σ₁)(X − μ₁)] = 1,
and
E[(X − μ₁)(Y − μ₂)] = −σ₁σ₂
if and only if
P[Y = μ₂ − (σ₂/σ₁)(X − μ₁)] = 1.
PROOF Set
X₁ = (X − μ₁)/σ₁, Y₁ = (Y − μ₂)/σ₂.
Then E(X₁) = E(Y₁) = 0 and σ²(X₁) = σ²(Y₁) = 1, so that, by Lemma 1,
E²(X₁Y₁) ≤ 1 or, equivalently, −1 ≤ E(X₁Y₁) ≤ 1
becomes
E²[(X − μ₁)(Y − μ₂)]/(σ₁²σ₂²) ≤ 1
or, equivalently,
−σ₁σ₂ ≤ E[(X − μ₁)(Y − μ₂)] ≤ σ₁σ₂.
The second half of the conclusion follows similarly and will be left as an
exercise (see Exercise 5.4.6).
REMARK 9 A more familiar form of the Cauchy-Schwarz inequality is
E²(XY) ≤ (EX²)(EY²). This is established as follows: Since the inequality is
trivially true if either one of EX², EY² is ∞, suppose that they are both finite
and set Z = λX − Y, where λ is a real number. Then 0 ≤ EZ² = λ²(EX²)
− 2λ[E(XY)] + EY² for all λ, which happens if and only if E²(XY) − (EX²)(EY²)
≤ 0 (by the discriminant test for quadratic equations), or E²(XY) ≤ (EX²)(EY²).
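The moment form of the inequality can be sanity-checked on any finite distribution; a small sketch with an arbitrary illustrative p.m.f. (Python):

```python
# An arbitrary joint p.m.f. for (X, Y); the inequality must hold for any choice.
pmf = {(-1.0, 2.0): 0.25, (0.5, -1.0): 0.25, (2.0, 0.5): 0.5}

EXY = sum(x * y * p for (x, y), p in pmf.items())
EX2 = sum(x * x * p for (x, y), p in pmf.items())
EY2 = sum(y * y * p for (x, y), p in pmf.items())

# E^2(XY) <= (EX^2)(EY^2)
assert EXY ** 2 <= EX2 * EY2
```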
Exercises
5.4.1 Establish Theorem 1 for the discrete case.
5.4.2 Let g be a (measurable) function defined on ℝ into (0, ∞). Then, for any
r.v. X and any ε > 0,
P[g(X) ≥ ε] ≤ E[g(X)]/ε.
If furthermore g is even (that is, g(−x) = g(x)) and nondecreasing for x ≥ 0, then
P[|X| ≥ ε] ≤ E[g(X)]/g(ε).
5.4.3 For an r.v. X with EX = μ and σ²(X) = σ², both finite, use Tchebichev's
inequality in order to find a lower bound for the probability P(|X − μ| < kσ).
Compare the lower bounds for k = 1, 2, 3 with the respective probabilities
when X ∼ N(μ, σ²).
5.5 Covariance, Correlation Coefficient and Its Interpretation
In the notation of Theorem 2, the correlation coefficient ρ = ρ(X, Y) of X and
Y is defined by
ρ = E[((X − μ₁)/σ₁)((Y − μ₂)/σ₂)] = E[(X − μ₁)(Y − μ₂)]/(σ₁σ₂) = Cov(X, Y)/(σ₁σ₂)
= [E(XY) − μ₁μ₂]/(σ₁σ₂).
By Theorem 2, −1 ≤ ρ ≤ 1, and ρ = ±1 if and only if
Y = μ₂ ± (σ₂/σ₁)(X − μ₁)
with probability 1. So ρ = ±1 means X and Y are linearly related. From this
stems the significance of ρ as a measure of linear dependence between X and
Y. (See Fig. 5.1.) If ρ = 0, we say that X and Y are uncorrelated, while if ρ = ±1,
we say that X and Y are completely correlated (positively if ρ = 1, negatively if
ρ = −1).
Figure 5.1 [The lines y = μ₂ + (σ₂/σ₁)(x − μ₁) and y = μ₂ − (σ₂/σ₁)(x − μ₁),
both passing through the point (μ₁, μ₂), in the xy-plane.]
For −1 < ρ < 1, ρ ≠ 0, we say that X and Y are correlated (positively if ρ > 0,
negatively if ρ < 0). Positive values of ρ may indicate that there is a tendency of
large values of Y to correspond to large values of X and small values of Y to
correspond to small values of X. Negative values of ρ may indicate that small
values of Y correspond to large values of X and large values of Y to small
values of X. Values of ρ close to zero may also indicate that these tendencies
are weak, while values of ρ close to ±1 may indicate that the tendencies are
strong.
The following elaboration sheds more light on the intuitive interpreta-
tion of the correlation coefficient ρ (= ρ(X, Y)) as a measure of co-linearity
of the r.v.s X and Y. To this end, for ρ > 0, consider the line y = μ₂ + (σ₂/σ₁)(x − μ₁)
in the xy-plane and let D be the distance of the (random) point (X, Y)
from the above line. Recalling that the distance of the point (x₀, y₀) from the
line ax + by + c = 0 is given by |ax₀ + by₀ + c|/√(a² + b²), we have in the present
case:
D = |X − (σ₁/σ₂)Y + (σ₁/σ₂)μ₂ − μ₁| / √(1 + σ₁²/σ₂²),
since here a = 1, b = −σ₁/σ₂ and c = (σ₁/σ₂)μ₂ − μ₁. Thus,
D² = [X − (σ₁/σ₂)Y + (σ₁/σ₂)μ₂ − μ₁]² / (1 + σ₁²/σ₂²),
and we wish to evaluate the expected squared distance of (X, Y) from the line
y = μ₂ + (σ₂/σ₁)(x − μ₁); that is, ED². Carrying out the calculations, we find
(σ₁² + σ₂²) D² = σ₂²X² + σ₁²Y² − 2σ₁σ₂XY + 2σ₂(σ₁μ₂ − σ₂μ₁)X
− 2σ₁(σ₁μ₂ − σ₂μ₁)Y + (σ₁μ₂ − σ₂μ₁)².  (1)
Taking expectations in (1) and using the relations
EX² = μ₁² + σ₁², EY² = μ₂² + σ₂² and E(XY) = ρσ₁σ₂ + μ₁μ₂,
we obtain
ED² = [2σ₁²σ₂²/(σ₁² + σ₂²)](1 − ρ)  (ρ > 0).  (2)
Working in the same way with the line y = μ₂ − (σ₂/σ₁)(x − μ₁), which is the
relevant line for ρ < 0, we find
ED² = [2σ₁²σ₂²/(σ₁² + σ₂²)](1 + ρ)  (ρ < 0).  (3)
For ρ > 0, 1 − ρ = 1 − |ρ|, and for ρ < 0, 1 + ρ = 1 − |ρ|. There-
fore, regardless of the value of ρ, by the observation just made, relations (2)
and (3) are summarized by the expression
ED² = [2σ₁²σ₂²/(σ₁² + σ₂²)](1 − |ρ|).  (4)
Thus, for ρ > 0, the pairs (X, Y) tend to be arranged along the line
y = μ₂ + (σ₂/σ₁)(x − μ₁); these pairs get closer and closer
to this line as ρ gets closer to 1, and lie on the line for ρ = 1. For ρ < 0, the
pairs (X, Y) tend to be arranged along the line y = μ₂ − (σ₂/σ₁)(x − μ₁). These
points get closer and closer to this line as ρ gets closer to −1, and lie on
this line for ρ = −1. For ρ = 0, the expected distance is constantly equal to
2σ₁²σ₂²/(σ₁² + σ₂²) from either one of the lines y = μ₂ + (σ₂/σ₁)(x − μ₁) and
y = μ₂ − (σ₂/σ₁)(x − μ₁).
By setting x₁ = (x − μ₁)/σ₁ and y₁ = (y − μ₂)/σ₂, we move from the xy-plane to
the x₁y₁-plane. In this latter plane,
look at the point (X₁, Y₁) = ((X − μ₁)/σ₁, (Y − μ₂)/σ₂) and seek the line
Ax₁ + By₁ + C = 0
from which the expected squared distance of (X₁, Y₁) is minimum. That
is, determine the coefficients A, B and C, so that ED₁² is minimum, where
ED₁² = 1 + 2ρAB/(A² + B²) + C²/(A² + B²).  (5)
Clearly, whatever A and B may be, (5) is minimized for C = 0, in which case
ED₁² = 1 + 2ρAB/(A² + B²).  (6)
Since
(A + B)² = A² + B² + 2AB ≥ 0 and (A − B)² = A² + B² − 2AB ≥ 0,
or equivalently,
−1 ≤ 2AB/(A² + B²) ≤ 1,  (7)
it follows from (6) that, for ρ > 0, ED₁² is minimized when 2AB/(A² + B²) = −1,
that is, when B = −A, which gives the line y₁ = x₁; for ρ < 0, ED₁² is minimized
when 2AB/(A² + B²) = 1, that is, when B = A, which gives the line y₁ = −x₁. The
minimum expected squared distances are then 1 − ρ and 1 + ρ,
respectively.
Exercises
5.5.1 Let X be an r.v. taking on the values −2, −1, 1, 2, each with probability
1/4. Set Y = X² and compute the following quantities: EX, σ²(X), EY, σ²(Y),
ρ(X, Y).
5.5.2 Go through the details required in establishing relations (2), (3) and
(4).
5.5.3 Do likewise in establishing relation (5).
5.5.4 Refer to Exercise 5.3.2 (including the hint given there) and
i) Calculate the covariance and the correlation coefficient of the r.v.s X and
Y;
ii) Referring to relation (4), calculate the expected squared distance of (X,
Y) from the appropriate line y = μ₂ + (σ₂/σ₁)(x − μ₁) or y = μ₂ − (σ₂/σ₁)(x − μ₁)
(which one?);
iii) What is the minimum expected squared distance of (X₁, Y₁) from
the appropriate line y = x or y = −x (which one?), where X₁ = (X − μ₁)/σ₁
and Y₁ = (Y − μ₂)/σ₂?
(Hint: Recall that Σ_{x=1}^n x³ = n²(n + 1)²/4.)
5.5.5 Refer to Exercise 5.3.2 and calculate the covariance and the correla-
tion coefficient of the r.v.s X and Y.
5.5.6 Do the same in reference to Exercise 5.3.3.
5.5.7 Repeat the same in reference to Exercise 5.3.4.
α̂ = EY − β̂ EX,  β̂ = σ(Y)ρ(X, Y)/σ(X).
(The r.v. Ŷ = β̂X + α̂ is called the best linear predictor of Y, given X.)
5.5.11 If the r.v.s X1 and X2 have the Bivariate Normal distribution, show
that the parameter is, actually, the correlation coefficient of X1 and X2. (Hint:
Observe that the exponent in the joint p.d.f. of X1 and X2 may be written as
follows:
−[1/(2(1 − ρ²))] [((x₁ − μ₁)/σ₁)² − 2ρ((x₁ − μ₁)/σ₁)((x₂ − μ₂)/σ₂) + ((x₂ − μ₂)/σ₂)²]
= −(x₁ − μ₁)²/(2σ₁²) − (x₂ − b)²/(2σ₂²(1 − ρ²)), where b = μ₂ + ρ(σ₂/σ₁)(x₁ − μ₁).)
5.5.12 If the r.v.s X₁ and X₂ have jointly the Bivariate Normal distribution
with parameters μ₁, μ₂, σ₁², σ₂² and ρ, calculate E(c₁X₁ + c₂X₂) and σ²(c₁X₁ +
c₂X₂) in terms of the parameters involved, where c₁ and c₂ are real constants.
Cov(Σ_{i=1}^m Xᵢ, Σ_{j=1}^n Yⱼ) = Σ_{i=1}^m Σ_{j=1}^n Cov(Xᵢ, Yⱼ).
5.6* Justification of Relation (2) in Chapter 2

For any event A, the indicator I_A of A is defined by
I_A(s) = 1 if s ∈ A, I_A(s) = 0 if s ∈ Aᶜ.
The following identities hold:
∏_{j=1}^n I_{A_j} = I_{∩_{j=1}^n A_j},  (8)
Σ_{j=1}^n I_{A_j} = I_{∪_{j=1}^n A_j} when A₁, . . . , A_n are pairwise disjoint,  (9)
and
I_{Aᶜ} = 1 − I_A.  (10)
Clearly,
E(I_A) = P(A).  (11)
(1 − X₁)(1 − X₂) ··· (1 − X_r) = 1 − H₁ + H₂ − ··· + (−1)^r H_r,  (12)
where H_j stands for the sum of the products X_{i₁} ··· X_{i_j}, where the summation
extends over all subsets {i₁, i₂, . . . , i_j} of the set {1, 2, . . . , r}, j = 1, . . . , r. Let
α, β be such that: 0 < α, β and α + β ≤ r. Then the following is true:
Σ_J X_{i₁} ··· X_{i_α} H_β(J) = C(α + β, α) H_{α+β},  (13)
where J = {i₁, . . . , i_α} is the typical member of all subsets of size α of the set
{1, 2, . . . , r}, H_β(J) is the sum of the products X_{j₁} ··· X_{j_β}, where the summation
extends over all subsets of size β of the set {1, . . . , r} − J, and Σ_J is meant to
extend over all subsets J of size α of the set {1, 2, . . . , r}.
This is seen from the counting identity
C(r, α) · C(r − α, β) = [r!/(α!(r − α)!)] · [(r − α)!/(β!(r − α − β)!)]
= [(α + β)!/(α!β!)] · [r!/((α + β)!(r − α − β)!)] = C(α + β, α) · C(r, α + β).
where the summation extends over all choices of subsets J_m = {i₁, . . . , i_m} of the
set {1, 2, . . . , M} and B_m is the one used in Theorem 9, Chapter 2. Hence
I_{B_m} = Σ_{J_m} I_{A_{i₁} ∩ ··· ∩ A_{i_m} ∩ Aᶜ_{i_{m+1}} ∩ ··· ∩ Aᶜ_{i_M}}
= Σ_{J_m} I_{A_{i₁}} ··· I_{A_{i_m}} (1 − I_{A_{i_{m+1}}}) ··· (1 − I_{A_{i_M}})  (by (8), (9), (10))
= Σ_{J_m} I_{A_{i₁}} ··· I_{A_{i_m}} [1 − H₁(J_m) + H₂(J_m) − ··· + (−1)^{M−m} H_{M−m}(J_m)]
(by (12)).
Since
Σ_{J_m} I_{A_{i₁}} ··· I_{A_{i_m}} H_k(J_m) = C(m + k, m) H_{m+k}  (by (13)),
we have
I_{B_m} = H_m − C(m + 1, m) H_{m+1} + C(m + 2, m) H_{m+2} − ··· + (−1)^{M−m} C(M, m) H_M.
Taking expectations of both sides, we get (from (11) and the definition of S_r in
Theorem 9, Chapter 2)
P(B_m) = S_m − C(m + 1, m) S_{m+1} + C(m + 2, m) S_{m+2} − ··· + (−1)^{M−m} C(M, m) S_M,
as was to be proved. ▪
(For the proof just completed, also see pp. 80–85 in E. Parzen's book
Modern Probability Theory and Its Applications published by Wiley, 1960.)
REMARK 10 In measure theory the quantity I_A is sometimes called the char-
acteristic function of the set A and is usually denoted by χ_A. In probability
theory the term characteristic function is reserved for a different concept and
will be a major topic of the next chapter.
Chapter 6
Characteristic Functions,
Moment Generating Functions
and Related Theorems
6.1 Preliminaries
The main subject matter of this chapter is the introduction of the concept of
the characteristic function of an r.v. and the discussion of its main properties.
The characteristic function is a powerful mathematical tool, which is used
profitably for probabilistic purposes, such as producing the moments of an r.v.,
recovering its distribution, establishing limit theorems, etc. To this end, recall
that for z ∈ ℝ, e^{iz} = cos z + i sin z, where i = √(−1), and in what follows, i may
be treated formally as a real number, subject to its usual properties: i² = −1,
i³ = −i, i⁴ = 1, i⁵ = i, etc.
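A quick check of this identity with Python's complex arithmetic:

```python
import cmath
import math

# e^{iz} = cos z + i sin z for a few sample real values of z
for z in (0.0, 1.0, -2.5, math.pi):
    lhs = cmath.exp(1j * z)
    rhs = complex(math.cos(z), math.sin(z))
    assert abs(lhs - rhs) < 1e-12
```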
The sequence of lemmas below will be used to justify the theorems which
follow, as well as in other cases in subsequent chapters. A brief justification for
some of them is also presented, and relevant references are given at the end of
this section.
LEMMA A Let g₁, g₂ : {x₁, x₂, . . .} → [0, ∞) be such that
g₁(x_j) ≤ g₂(x_j), j = 1, 2, . . . ,
and that Σ_{x_j} g₂(x_j) < ∞. Then Σ_{x_j} g₁(x_j) < ∞.
PROOF If the summations are finite, the result is immediate; if not, it follows
by taking the limits of partial sums, which satisfy the inequality. ▪
LEMMA A′ Let g₁, g₂ : ℝ → [0, ∞) be such that g₁(x) ≤ g₂(x), x ∈ ℝ, that ∫_a^b g₁(x) dx
exists for every a, b ∈ ℝ with a < b, and that ∫_{−∞}^{∞} g₂(x) dx < ∞. Then
∫_{−∞}^{∞} g₁(x) dx < ∞.
LEMMA B Let g : {x₁, x₂, . . .} → ℝ be such that Σ_{x_j} |g(x_j)| < ∞. Then Σ_{x_j} g(x_j) also converges.
PROOF The result is immediate for finite sums, and it follows by taking the
limits of partial sums, which satisfy the inequality. "
b
LEMMA B Let g : ! ! be such that a g(x)dx exists for every a, b, ! with a < b, and that
|g( x ) |dx < . Then g(x)dx also converges.
PROOF Same as above replacing sums by integrals. "
The following lemma provides conditions under which the operations
of taking limits and expectations can be interchanged. In more advanced
probability courses this result is known as the Dominated Convergence
Theorem.
LEMMA C Let {X_n}, n = 1, 2, . . . , be a sequence of r.v.s, and let Y, X be r.v.s such
that |X_n(s)| ≤ Y(s), s ∈ S, n = 1, 2, . . . , X_n(s) → X(s) (on a set of s's of
probability 1) and E(Y) < ∞. Then E(X) exists and E(X_n) → E(X) as n → ∞, or
equivalently,
lim_{n→∞} E(X_n) = E(lim_{n→∞} X_n).
REMARK 1 The index n can be replaced by a continuous variable.
The next lemma gives conditions under which the operations of differen-
tiation and taking expectations commute.
LEMMA D For each t ∈ T (where T is ℝ or an appropriate subset of it, such as the interval
[a, b]), let X(·; t) be an r.v. such that (∂/∂t)X(s; t) exists for each s ∈ S and
t ∈ T. Furthermore, suppose there exists an r.v. Y with E(Y) < ∞ and such
that
|(∂/∂t) X(s; t)| ≤ Y(s), s ∈ S, t ∈ T.
Then
(d/dt) E[X(·; t)] = E[(∂/∂t) X(·; t)], for all t ∈ T.
The proofs of the above lemmas can be found in any book on real vari-
ables theory, although the last two will be stated in terms of weighting func-
tions rather than expectations; for example, see Advanced Calculus, Theorem
2, p. 285, Theorem 7, p. 292, by D. V. Widder, Prentice-Hall, 1947; Real
Analysis, Theorem 7.1, p. 146, by E. J. McShane and T. A. Botts, Van
Nostrand, 1959; The Theory of Lebesgue Measure and Integration, pp. 66–67,
by S. Hartman and J. Mikusiński, Pergamon Press, 1961. Also Mathematical
Methods of Statistics, pp. 45–46 and pp. 66–68, by H. Cramér, Princeton
University Press, 1961.
6.2 Definitions and Basic Theorems – The One-Dimensional Case

The characteristic function (ch.f.) of an r.v. X with p.d.f. f is defined by
φ_X(t) = E[e^{itX}], t ∈ ℝ; that is,
φ_X(t) = Σ_x e^{itx} f(x) = Σ_x [cos(tx) f(x) + i sin(tx) f(x)]
for the discrete case, and
φ_X(t) = ∫_{−∞}^{∞} e^{itx} f(x) dx = ∫_{−∞}^{∞} [cos(tx) f(x) + i sin(tx) f(x)] dx
= ∫_{−∞}^{∞} cos(tx) f(x) dx + i ∫_{−∞}^{∞} sin(tx) f(x) dx
for the continuous case. By Lemmas A, A′, B, B′, φ_X(t) exists for all t ∈ ℝ. The
ch.f. φ_X is also called the Fourier transform of f.
The following theorem summarizes the basic properties of a ch.f.
THEOREM 1 (Some properties of ch.f.'s)
i) φ_X(0) = 1.
ii) |φ_X(t)| ≤ 1.
iii) φ_X is continuous, and, in fact, uniformly continuous.
iv) φ_{X+d}(t) = e^{itd} φ_X(t), where d is a constant.
v) φ_{cX}(t) = φ_X(ct), where c is a constant.
vi) φ_{cX+d}(t) = e^{itd} φ_X(ct).
vii) (d^n/dt^n) φ_X(t) |_{t=0} = i^n E(X^n), n = 1, 2, . . . , if E|X^n| < ∞.
PROOF
i) φ_X(t) = Ee^{itX}. Thus φ_X(0) = Ee^{i0X} = E(1) = 1.
ii) |φ_X(t)| = |Ee^{itX}| ≤ E|e^{itX}| = E(1) = 1, because |e^{itX}| = 1. (For the proof of the
inequality, see Exercise 6.2.1.)
iii) |φ_X(t + h) − φ_X(t)| = |E[e^{i(t+h)X} − e^{itX}]|
= |E[e^{itX}(e^{ihX} − 1)]| ≤ E|e^{itX}| |e^{ihX} − 1|
= E|e^{ihX} − 1|.
Then
lim_{h→0} |φ_X(t + h) − φ_X(t)| ≤ lim_{h→0} E|e^{ihX} − 1| = E[lim_{h→0} |e^{ihX} − 1|] = 0,
provided we can interchange the order of lim and E, which here can be
done by Lemma C. We observe that uniformity holds since the last ex-
pression on the right is independent of t.
iv) φ_{X+d}(t) = Ee^{it(X+d)} = E(e^{itX}e^{itd}) = e^{itd} Ee^{itX} = e^{itd} φ_X(t).
v) φ_{cX}(t) = Ee^{it(cX)} = Ee^{i(ct)X} = φ_X(ct).
vi) Follows trivially from (iv) and (v).
vii) (d^n/dt^n) φ_X(t) = (d^n/dt^n) Ee^{itX} = E[(∂^n/∂t^n) e^{itX}] = E(i^n X^n e^{itX}),
provided we can interchange the order of differentiation and E. This can
be done here, by Lemma D (applied successively n times to X and its
n − 1 first derivatives), since E|X^n| < ∞ implies E|X^k| < ∞, k = 1, . . . , n
(see Exercise 6.2.2). Thus
(d^n/dt^n) φ_X(t) |_{t=0} = i^n E(X^n). ▪
REMARK 2 From part (vii) of the theorem we have that E(X^n) =
(−i)^n (d^n/dt^n) φ_X(t) |_{t=0}, so that the ch.f. produces the nth moment of the r.v.
REMARK 3 If X is an r.v. whose values are of the form x = a + kh, where a,
h are constants, h > 0, and k runs through the integral values 0, 1, . . . , n or 0,
1, . . . , or 0, ±1, . . . , ±n or 0, ±1, . . . , then the distribution of X is called a lattice
distribution. For example, if X is distributed as B(n, p), then its values are of
the form x = a + kh with a = 0, h = 1, and k = 0, 1, . . . , n. If X is distributed as
P(λ), or it has the Negative Binomial distribution, then again its values are of
the same form with a = 0, h = 1, and k = 0, 1, . . . . If now φ is the ch.f. of X, it
can be shown that the distribution of X is a lattice distribution if and only if
|φ(t)| = 1 for some t ≠ 0. It can be readily seen that this is indeed the case in the
cases mentioned above (for example, |φ(t)| = 1 for t = 2π). It can also be shown
that the distribution of X is a lattice distribution, if and only if the ch.f. φ is
periodic with period 2π (that is, φ(t + 2π) = φ(t), t ∈ ℝ).
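As an illustration, the B(n, p) ch.f. derived in the next section satisfies both lattice criteria; a sketch with illustrative n and p (Python):

```python
import cmath
import math

n, p = 7, 0.3  # illustrative values
q = 1.0 - p

def phi(t):
    # ch.f. of B(n, p): (p e^{it} + q)^n
    return (p * cmath.exp(1j * t) + q) ** n

# |phi(t)| = 1 at the nonzero point t = 2*pi ...
assert abs(abs(phi(2 * math.pi)) - 1.0) < 1e-12
# ... and phi is periodic with period 2*pi
assert abs(phi(2 * math.pi + 1.0) - phi(1.0)) < 1e-12
```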
In the following result, the ch.f. serves the purpose of recovering the
distribution of an r.v. by way of its ch.f.
THEOREM 2 (Inversion formula) Let X be an r.v. with p.d.f. f and ch.f. φ. Then if X is of the
discrete type, taking on the (distinct) values x_j, j ≥ 1, one has
i) f(x_j) = lim_{T→∞} (1/2T) ∫_{−T}^{T} e^{−itx_j} φ(t) dt, j ≥ 1.
If X is of the continuous type, then
ii) f(x) = lim_{h→0} lim_{T→∞} (1/2π) ∫_{−T}^{T} [(1 − e^{−ith})/(ith)] e^{−itx} φ(t) dt
and, in particular, if ∫_{−∞}^{∞} |φ(t)| dt < ∞, then (f is bounded and continuous and)
ii′) f(x) = (1/2π) ∫_{−∞}^{∞} e^{−itx} φ(t) dt.
PROOF (outline) i) The ch.f. φ is continuous, by Theorem 1(iii), and since
so is e^{−itx_j}, it follows that the integral ∫_{−T}^{T} e^{−itx_j} φ(t) dt exists for every T (> 0). We
have then
(1/2T) ∫_{−T}^{T} e^{−itx_j} φ(t) dt = (1/2T) ∫_{−T}^{T} e^{−itx_j} Σ_k e^{itx_k} f(x_k) dt
= (1/2T) Σ_k ∫_{−T}^{T} e^{it(x_k − x_j)} f(x_k) dt
= Σ_k f(x_k) (1/2T) ∫_{−T}^{T} e^{it(x_k − x_j)} dt
(the interchange of the integral and summations is valid here). That is,
(1/2T) ∫_{−T}^{T} e^{−itx_j} φ(t) dt = Σ_k f(x_k) (1/2T) ∫_{−T}^{T} e^{it(x_k − x_j)} dt.  (1)
But
∫_{−T}^{T} e^{it(x_k − x_j)} dt = ∫_{−T}^{T} [cos t(x_k − x_j) + i sin t(x_k − x_j)] dt
= ∫_{−T}^{T} cos t(x_k − x_j) dt + i ∫_{−T}^{T} sin t(x_k − x_j) dt,
and the second integral is zero, since the sine is an odd function. For x_k ≠ x_j,
∫_{−T}^{T} cos t(x_k − x_j) dt = [1/(x_k − x_j)] ∫_{−T}^{T} d[sin t(x_k − x_j)]
= {sin T(x_k − x_j) − sin[−T(x_k − x_j)]}/(x_k − x_j)
= 2 sin T(x_k − x_j)/(x_k − x_j).
Therefore,
(1/2T) ∫_{−T}^{T} e^{it(x_k − x_j)} dt = 1, if x_k = x_j,
and = sin[T(x_k − x_j)]/[T(x_k − x_j)], if x_k ≠ x_j.  (3)
But
|sin[T(x_k − x_j)]/(x_k − x_j)| ≤ 1/|x_k − x_j|, a constant independent of T, and therefore, for
x_k ≠ x_j, sin[T(x_k − x_j)]/[T(x_k − x_j)] → 0 as T → ∞. Thus
lim_{T→∞} (1/2T) ∫_{−T}^{T} e^{it(x_k − x_j)} dt = 1, if x_k = x_j,
and = 0, if x_k ≠ x_j.  (4)
Therefore, by (1) and (4),
lim_{T→∞} (1/2T) ∫_{−T}^{T} e^{−itx_j} φ(t) dt = lim_{T→∞} Σ_k f(x_k) (1/2T) ∫_{−T}^{T} e^{it(x_k − x_j)} dt
= Σ_k f(x_k) lim_{T→∞} (1/2T) ∫_{−T}^{T} e^{it(x_k − x_j)} dt = f(x_j),
as was to be seen.
ii), ii′) Strictly speaking, (ii′) follows from (ii). We are going to omit (ii)
entirely and attempt to give a rough justification of (ii′). The assumption that
∫_{−∞}^{∞} |φ(t)| dt < ∞ implies that ∫_{−∞}^{∞} e^{−itx} φ(t) dt exists, and is taken as follows for every
arbitrary but fixed x ∈ ℝ:
∫_{−∞}^{∞} e^{−itx} φ(t) dt = lim_{T→∞} ∫_{−T}^{T} e^{−itx} φ(t) dt.  (5)
But
∫_{−T}^{T} e^{−itx} φ(t) dt = ∫_{−T}^{T} e^{−itx} [∫_{−∞}^{∞} e^{ity} f(y) dy] dt
= ∫_{−T}^{T} [∫_{−∞}^{∞} e^{it(y−x)} f(y) dy] dt = ∫_{−∞}^{∞} f(y) [∫_{−T}^{T} e^{it(y−x)} dt] dy,  (6)
where the interchange of the order of integration is legitimate here. Since the
integral (with respect to y) is zero over a single point, we may assume in the
sequel that y ≠ x. Then
∫_{−T}^{T} e^{it(y−x)} dt = 2 sin T(y − x)/(y − x),  (7)
so that, by (5), (6) and (7),
∫_{−∞}^{∞} e^{−itx} φ(t) dt = 2 lim_{T→∞} ∫_{−∞}^{∞} f(y) [sin T(y − x)/(y − x)] dy.  (8)
Setting T(y − x) = z, expression (8) becomes
∫_{−∞}^{∞} e^{−itx} φ(t) dt = 2 lim_{T→∞} ∫_{−∞}^{∞} f(x + z/T) (sin z/z) dz
= 2f(x) ∫_{−∞}^{∞} (sin z/z) dz = 2πf(x),
by taking the limit under the integral sign, and by using continuity of f and the
fact that ∫_{−∞}^{∞} (sin z/z) dz = π. Solving for f(x), we have
f(x) = (1/2π) ∫_{−∞}^{∞} e^{−itx} φ(t) dt,
as asserted. ▪
EXAMPLE 1 Let X be B(n, p). In the next section, it will be seen that φ_X(t) = (pe^{it} + q)^n. Let
us apply (i) to this expression. First of all, we have
(1/2T) ∫_{−T}^{T} e^{−itx} φ(t) dt = (1/2T) ∫_{−T}^{T} (pe^{it} + q)^n e^{−itx} dt
= (1/2T) ∫_{−T}^{T} Σ_{r=0}^n C(n, r) (pe^{it})^r q^{n−r} e^{−itx} dt
= (1/2T) Σ_{r=0}^n C(n, r) p^r q^{n−r} ∫_{−T}^{T} e^{i(r−x)t} dt
= (1/2T) Σ_{r=0, r≠x}^n C(n, r) p^r q^{n−r} [1/(i(r − x))] e^{i(r−x)t} |_{−T}^{T}
+ (1/2T) C(n, x) p^x q^{n−x} ∫_{−T}^{T} dt
= Σ_{r=0, r≠x}^n C(n, r) p^r q^{n−r} [e^{i(r−x)T} − e^{−i(r−x)T}]/[2Ti(r − x)]
+ (1/2T) C(n, x) p^x q^{n−x} · 2T
= Σ_{r=0, r≠x}^n C(n, r) p^r q^{n−r} sin[(r − x)T]/[(r − x)T] + C(n, x) p^x q^{n−x}.
Taking the limit as T → ∞, we get the desired result, namely
f(x) = C(n, x) p^x q^{n−x}.
(One could also use (i) for calculating f(x), since φ is, clearly, periodic with
period 2π.)
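Exploiting that periodicity, the inversion in (i) reduces to an average over one period, which a plain Riemann sum recovers essentially exactly; a numeric sketch with illustrative n and p (Python):

```python
import cmath
import math

n, p = 5, 0.4  # illustrative values
q = 1.0 - p

def phi(t):
    # ch.f. of B(n, p)
    return (p * cmath.exp(1j * t) + q) ** n

def f_inv(x, N=512):
    # (1/2*pi) * integral over one period of e^{-itx} phi(t) dt,
    # approximated by a Riemann sum with N nodes; for a trigonometric
    # polynomial of degree n < N the sum is exact up to rounding
    h = 2 * math.pi / N
    s = sum(cmath.exp(-1j * h * k * x) * phi(h * k) for k in range(N))
    return (s / N).real

for x in range(n + 1):
    exact = math.comb(n, x) * p**x * q**(n - x)
    assert abs(f_inv(x) - exact) < 1e-10
```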
EXAMPLE 2 For an example of the continuous type, let X be N(0, 1). In the next section, we
will see that φ_X(t) = e^{−t²/2}. Since |φ(t)| = e^{−t²/2}, we know that ∫_{−∞}^{∞} |φ(t)| dt < ∞, so that
(ii′) applies:
f(x) = (1/2π) ∫ e^{−itx} e^{−t²/2} dt = (1/2π) ∫ e^{−(1/2)(t² + 2itx)} dt
= (1/2π) ∫ e^{−(1/2)[t² + 2t(ix) + (ix)²]} e^{(1/2)(ix)²} dt
= [e^{−(1/2)x²}/(2π)] ∫ e^{−(1/2)(t + ix)²} dt = [e^{−(1/2)x²}/(2π)] ∫ e^{−(1/2)u²} du
= [e^{−(1/2)x²}/(2π)] √(2π) = (1/√(2π)) e^{−x²/2},
as was to be shown.
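The same inversion can be carried out numerically: truncating the integral at |t| = 10 (where the integrand is negligible) and summing on a fine grid reproduces the N(0, 1) density; a Python sketch:

```python
import cmath
import math

def phi(t):
    # ch.f. of N(0, 1)
    return math.exp(-t * t / 2.0)

def f_inv(x, T=10.0, N=4000):
    # (1/2*pi) * integral_{-T}^{T} e^{-itx} phi(t) dt, via a Riemann sum;
    # the tail of |phi| beyond |t| = 10 is below e^{-50} and can be ignored
    h = 2 * T / N
    s = sum(cmath.exp(-1j * (-T + k * h) * x) * phi(-T + k * h)
            for k in range(N))
    return (s * h).real / (2 * math.pi)

for x in (0.0, 0.7, -1.3, 2.0):
    exact = math.exp(-x * x / 2.0) / math.sqrt(2 * math.pi)
    assert abs(f_inv(x) - exact) < 1e-8
```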
THEOREM 3 (Uniqueness Theorem) There is a one-to-one correspondence between the
ch.f. and the p.d.f. of an r.v.
PROOF The p.d.f. of an r.v. determines its ch.f. through the definition of the
ch.f. The converse, which is the involved part of the theorem, follows from
Theorem 2. ▪
Exercises
6.2.1 Show that for any r.v. X and every t ∈ ℝ, one has |Ee^{itX}| ≤ E|e^{itX}| (= 1).
(Hint: If z = a + ib, a, b ∈ ℝ, recall that |z| = √(a² + b²). Also use Exercise 5.4.7
in Chapter 5 in order to conclude that (EY)² ≤ EY² for any r.v. Y.)
6.2.2 Write out detailed proofs for parts (iii) and (vii) of Theorem 1 and
justify the use of Lemmas C, D.
6.2.3 For any r.v. X with ch.f. φ_X, show that φ_X(−t) = φ̄_X(t), t ∈ ℝ, where the bar
over φ_X denotes conjugate, that is, if z = a + ib, a, b ∈ ℝ, then z̄ = a − ib.
6.2.4 Show that the ch.f. φ_X of an r.v. X is real if and only if the p.d.f. f_X of X
is symmetric about 0 (that is, f_X(−x) = f_X(x), x ∈ ℝ). (Hint: If φ_X is real, then the
conclusion is reached by means of the previous exercise and Theorem 2. If f_X
is symmetric, show that φ_X(−t) = φ_X(t), t ∈ ℝ.)
6.2.5 Let X be an r.v. with p.d.f. f and ch.f. φ given by: φ(t) = 1 − |t| if |t| ≤ 1
and φ(t) = 0 if |t| > 1. Use the appropriate inversion formula to find f.
6.2.6 Consider the r.v. X with ch.f. φ(t) = e^{−|t|}, t ∈ ℝ, and utilize Theorem 2(ii′)
in order to determine the p.d.f. of X.
6.3 The Characteristic Functions of Some Random Variables

1. Let X be B(n, p). Then φ_X(t) = (pe^{it} + q)^n. In fact,
φ_X(t) = Σ_{x=0}^n e^{itx} C(n, x) p^x q^{n−x} = Σ_{x=0}^n C(n, x) (pe^{it})^x q^{n−x} = (pe^{it} + q)^n.
Hence
(d/dt) φ_X(t) |_{t=0} = n(pe^{it} + q)^{n−1} ipe^{it} |_{t=0} = inp, so that E(X) = np. Also,
(d²/dt²) φ_X(t) |_{t=0} = inp (d/dt)[(pe^{it} + q)^{n−1} e^{it}] |_{t=0}
= inp [(n − 1)(pe^{it} + q)^{n−2} ipe^{it} e^{it} + (pe^{it} + q)^{n−1} ie^{it}] |_{t=0}
= i²np[(n − 1)p + 1], so that E(X²) = np[(n − 1)p + 1] and
σ²(X) = E(X²) − (EX)²
= n²p² − np² + np − n²p² = np(1 − p) = npq;
that is, σ²(X) = npq.
2. Let X be P(λ). Then φ_X(t) = e^{λe^{it} − λ}. In fact,
φ_X(t) = Σ_{x=0}^∞ e^{itx} e^{−λ} λ^x/x! = e^{−λ} Σ_{x=0}^∞ (λe^{it})^x/x! = e^{−λ} e^{λe^{it}} = e^{λe^{it} − λ}.
Hence
(d/dt) φ_X(t) |_{t=0} = e^{λe^{it} − λ} iλe^{it} |_{t=0} = iλ, so that E(X) = λ. Next,
(d²/dt²) φ_X(t) |_{t=0} = (d/dt)[iλ e^{λe^{it} − λ + it}] |_{t=0}
= iλ e^{λe^{it} − λ + it} (iλe^{it} + i) |_{t=0}
= iλ(iλ + i) = i²λ(λ + 1), so that E(X²) = λ(λ + 1), and
σ²(X) = E(X²) − (EX)²
= λ(λ + 1) − λ² = λ;
that is, σ²(X) = λ.
1. Let X be N(μ, σ²). Then φ_X(t) = e^{itμ − σ²t²/2}; in particular, if X is
N(0, 1), then φ_X(t) = e^{−t²/2}. Indeed, if X is N(μ, σ²), then (X − μ)/σ is N(0, 1).
Thus
φ_{(X−μ)/σ}(t) = φ_{(1/σ)X − μ/σ}(t) = e^{−itμ/σ} φ_X(t/σ), and φ_X(t) = e^{itμ} φ_{(X−μ)/σ}(σt).
So it suffices to find the ch.f. of an N(0, 1) r.v. Y, say. Now
φ_Y(t) = (1/√(2π)) ∫ e^{ity} e^{−y²/2} dy = (1/√(2π)) ∫ e^{−(y − it)²/2} e^{−t²/2} dy
= e^{−t²/2} (1/√(2π)) ∫ e^{−(y − it)²/2} dy = e^{−t²/2}.
Hence
φ_X(t) = exp(itμ − σ²t²/2).
Hence
(d/dt) φ_X(t) |_{t=0} = exp(itμ − σ²t²/2)(iμ − σ²t) |_{t=0} = iμ, so that E(X) = μ. Next,
(d²/dt²) φ_X(t) |_{t=0} = [exp(itμ − σ²t²/2)(iμ − σ²t)² − σ² exp(itμ − σ²t²/2)] |_{t=0}
= i²μ² − σ² = i²(μ² + σ²).
Then E(X²) = μ² + σ² and σ²(X) = μ² + σ² − μ² = σ².
2. Let X be Gamma distributed with parameters α and β. Then φ_X(t) =
(1 − iβt)^{−α}. In fact,
φ_X(t) = [1/(Γ(α)β^α)] ∫₀^∞ e^{itx} x^{α−1} e^{−x/β} dx = [1/(Γ(α)β^α)] ∫₀^∞ x^{α−1} e^{−x(1 − iβt)/β} dx.
Setting x(1 − iβt)/β = y, we get
x = βy/(1 − iβt), dx = β dy/(1 − iβt), y ∈ [0, ∞).
Hence the above expression becomes
[1/(Γ(α)β^α)] [β^α/(1 − iβt)^α] ∫₀^∞ y^{α−1} e^{−y} dy = (1 − iβt)^{−α}.
Therefore
(d/dt) φ_X(t) |_{t=0} = iαβ/(1 − iβt)^{α+1} |_{t=0} = iαβ, so that E(X) = αβ, and
(d²/dt²) φ_X(t) |_{t=0} = i²α(α + 1)β²/(1 − iβt)^{α+2} |_{t=0} = i²α(α + 1)β²,
so that E(X²) = α(α + 1)β² and σ²(X) = α(α + 1)β² − α²β² = αβ².
For α = r/2, β = 2, we get the corresponding quantities for χ²_r, and for
α = 1, β = 1/λ, we get the corresponding quantities for the Negative Exponential
distribution. So
φ_X(t) = (1 − 2it)^{−r/2} and φ_X(t) = (1 − it/λ)^{−1} = λ/(λ − it),
respectively.
3. Let X be Cauchy distributed with μ = 0 and σ = 1. Then φ_X(t) = e^{−|t|}. In
fact,
φ_X(t) = (1/π) ∫ e^{itx} [1/(1 + x²)] dx = (1/π) ∫ cos(tx)/(1 + x²) dx
+ (i/π) ∫ sin(tx)/(1 + x²) dx = (2/π) ∫₀^∞ cos(tx)/(1 + x²) dx,
because
∫_{−∞}^{∞} sin(tx)/(1 + x²) dx = 0,
since sin(tx) is an odd function, and cos(tx) is an even function. Further, it can
be shown by complex variables theory that
∫₀^∞ cos(tx)/(1 + x²) dx = (π/2) e^{−|t|}.
Hence
φ_X(t) = e^{−|t|}.
Now
(d/dt) φ_X(t) = (d/dt) e^{−|t|}
does not exist for t = 0. This is consistent with the fact of nonexistence of E(X),
as has been seen in Chapter 5.
Exercises
6.3.1 Let X be an r.v. with p.d.f. f given in Exercise 3.2.13 of Chapter 3.
Derive its ch.f. φ, and calculate EX, E[X(X − 1)], σ²(X), provided they are
finite.
6.3.2 Let X be an r.v. with p.d.f. f given in Exercise 3.2.14 of Chapter 3.
Derive its ch.f. φ, and calculate EX, E[X(X − 1)], σ²(X), provided they are
finite.
6.3.3 Let X be an r.v. with p.d.f. f given by f(x) = λe^{−λ(x−α)} I_{(α,∞)}(x). Find its ch.f.
φ, and calculate EX, σ²(X), provided they are finite.
6.3.4 Let X have the Negative Binomial distribution with parameters r and p.
Show that its ch.f. φ is given by
φ(t) = p^r/(1 − qe^{it})^r.
6.3.5 Let X be distributed as U(α, β).
i) Show that its ch.f. φ is given by
φ(t) = (e^{itβ} − e^{itα})/[it(β − α)];
ii) By differentiating φ, show that EX = (α + β)/2 and σ²(X) = (α − β)²/12.
6.3.6 Consider the r.v. X with p.d.f. f given in Exercise 3.3.14(ii) of Chapter
3, and by using the ch.f. of X, calculate EX n, n = 1, 2, . . . , provided they are
finite.
vii) ∂^{n₁+···+n_k}/(∂t₁^{n₁} ··· ∂t_k^{n_k}) φ_{X₁, . . . , X_k}(t₁, . . . , t_k) |_{t₁ = ··· = t_k = 0}
= i^{n₁+···+n_k} E(X₁^{n₁} ··· X_k^{n_k}),
and, in particular,
∂^n/∂t_j^n φ_{X₁, . . . , X_k}(t₁, . . . , t_k) |_{t₁ = ··· = t_k = 0} = i^n E(X_j^n), j = 1, 2, . . . , k.
viii) If in the φ_{X₁, . . . , X_k}(t₁, . . . , t_k) we set t_{j₁} = ··· = t_{jₙ} = 0, then the resulting
expression is the joint ch.f. of the r.v.s X_{i₁}, . . . , X_{iₘ}, where the j's and the
i's are different and m + n = k.
Multidimensional versions of Theorem 2 and Theorem 3 also hold true.
We give their formulations below.
THEOREM 2′ (Inversion formula) Let X = (X₁, . . . , X_k) be an r. vector with p.d.f. f and ch.f.
φ. Then
i) f_{X₁, . . . , X_k}(x₁, . . . , x_k) = lim_{T→∞} [1/(2T)^k] ∫_{−T}^{T} ··· ∫_{−T}^{T} e^{−it₁x₁ − ··· − it_kx_k}
φ_{X₁, . . . , X_k}(t₁, . . . , t_k) dt₁ ··· dt_k,
if X is of the discrete type, with the analog of (ii) holding if X is of the
continuous type, and the analog of (ii′) holding if the integral
of |φ_{X₁, . . . , X_k}(t₁, . . . , t_k)| is finite.
THEOREM 3′ (Uniqueness Theorem) There is a one-to-one correspondence between the
ch.f. and the p.d.f. of an r. vector.
PROOFS The justification of Theorem 1′ is entirely analogous to that given
for Theorem 1, and so is the proof of Theorem 2′. As for Theorem 3′, the fact
that the p.d.f. of X determines its ch.f. follows from the definition of the ch.f.
That the ch.f. of X determines its p.d.f. follows from Theorem 2′. ▪
P(X₁ = x₁, . . . , X_k = x_k) = [n!/(x₁! ··· x_k!)] p₁^{x₁} ··· p_k^{x_k}.

Then

φ_{X₁,...,X_k}(t₁, . . . , t_k) = (p₁e^{it₁} + ··· + p_k e^{it_k})^n.

In fact,

φ_{X₁,...,X_k}(t₁, . . . , t_k) = Σ_{x₁,...,x_k} e^{it₁x₁ + ··· + it_kx_k} [n!/(x₁! ··· x_k!)] p₁^{x₁} ··· p_k^{x_k}
= Σ_{x₁,...,x_k} [n!/(x₁! ··· x_k!)] (p₁e^{it₁})^{x₁} ··· (p_k e^{it_k})^{x_k}
= (p₁e^{it₁} + ··· + p_k e^{it_k})^n.
Hence

∂^k φ_{X₁,...,X_k}(t₁, . . . , t_k) / ∂t₁ ··· ∂t_k |_{t₁=···=t_k=0}
= n(n − 1) ··· (n − k + 1) i^k p₁ ··· p_k (p₁e^{it₁} + ··· + p_k e^{it_k})^{n−k} |_{t₁=···=t_k=0}
= i^k n(n − 1) ··· (n − k + 1) p₁p₂ ··· p_k.

Hence

E(X₁ ··· X_k) = n(n − 1) ··· (n − k + 1) p₁p₂ ··· p_k.
Finally, the ch.f. of a (measurable) function g(X) of the r. vector X = (X₁, . . . , X_k) is defined by:

φ_{g(X)}(t) = E[e^{itg(X)}] = Σ_x e^{itg(x)} f(x₁, . . . , x_k), x = (x₁, . . . , x_k), or

φ_{g(X)}(t) = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} e^{itg(x₁,...,x_k)} f(x₁, . . . , x_k) dx₁ ··· dx_k.
Exercise
6.4.1 (Cramér–Wold) Consider the r.v.s X_j, j = 1, . . . , k and for c_j ∈ ℝ, j = 1, . . . , k, set

Y_c = Σ_{j=1}^{k} c_j X_j.

Then

i) Show that φ_{Y_c}(t) = φ_{X₁,...,X_k}(c₁t, . . . , c_kt), t ∈ ℝ, and φ_{X₁,...,X_k}(c₁, . . . , c_k) = φ_{Y_c}(1);
ii) Conclude that the distribution of the X's determines the distribution of Y_c for every c_j ∈ ℝ, j = 1, . . . , k. Conversely, the distribution of the X's is determined by the distribution of Y_c for every c_j ∈ ℝ, j = 1, . . . , k.

6.5 The Moment Generating Function 153
d^n/dt^n M_X(t) |_{t=0} = d^n/dt^n E(e^{tX}) |_{t=0} = E(d^n/dt^n e^{tX}) |_{t=0} = E(X^n e^{tX}) |_{t=0} = E(X^n).
This is the property from which the m.g.f. derives its name.
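As a quick sketch of this moment-producing property (our example, not the text's), one can differentiate the m.g.f. of a balanced die symbolically and read off the first two moments:

```python
import sympy as sp

t = sp.symbols('t')
# M_X(t) = E e^{tX} for X uniform on {1, ..., 6} (a balanced die)
M = sp.Rational(1, 6) * sum(sp.exp(t * x) for x in range(1, 7))
EX = sp.diff(M, t).subs(t, 0)        # first derivative at t = 0 gives E X
EX2 = sp.diff(M, t, 2).subs(t, 0)    # second derivative gives E X^2
assert EX == sp.Rational(7, 2)
assert sp.simplify(EX2 - EX**2 - sp.Rational(35, 12)) == 0
```

This recovers EX = 7/2 and σ²(X) = 35/12 without any direct summation of moments.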
M_X(t) = Σ_{x=0}^{n} e^{tx} (n choose x) p^x q^{n−x} = Σ_{x=0}^{n} (n choose x) (pe^t)^x q^{n−x} = (pe^t + q)^n,

d/dt M_X(t) |_{t=0} = n(pe^t + q)^{n−1} pe^t |_{t=0} = np = E(X),

and

d²/dt² M_X(t) |_{t=0} = d/dt [np(pe^t + q)^{n−1} e^t] |_{t=0}
= np[(n − 1)(pe^t + q)^{n−2} pe^t e^t + (pe^t + q)^{n−1} e^t] |_{t=0}
= n(n − 1)p² + np = n²p² − np² + np = E(X²),

so that σ²(X) = n²p² − np² + np − n²p² = np(1 − p) = npq.
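The same differentiation can be carried out symbolically in n and p; the sketch below (ours) confirms the mean np and variance np(1 − p) directly from the m.g.f. (pe^t + q)^n:

```python
import sympy as sp

t, n, p = sp.symbols('t n p', positive=True)
q = 1 - p
M = (p * sp.exp(t) + q) ** n                     # m.g.f. of B(n, p)
EX = sp.simplify(sp.diff(M, t).subs(t, 0))       # should reduce to n*p
var = sp.simplify(sp.diff(M, t, 2).subs(t, 0) - EX**2)
assert sp.simplify(EX - n * p) == 0
assert sp.simplify(var - n * p * (1 - p)) == 0
```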
2. If X ~ P(λ), then M_X(t) = e^{λe^t − λ}, t ∈ ℝ. In fact,

M_X(t) = Σ_{x=0}^{∞} e^{tx} e^{−λ} λ^x/x! = e^{−λ} Σ_{x=0}^{∞} (λe^t)^x/x! = e^{−λ} e^{λe^t} = e^{λe^t − λ}.

Then

d/dt M_X(t) |_{t=0} = d/dt e^{λe^t − λ} |_{t=0} = λe^t e^{λe^t − λ} |_{t=0} = λ = E(X),

and

d²/dt² M_X(t) |_{t=0} = d/dt (λe^t e^{λe^t − λ}) |_{t=0} = (λe^t e^{λe^t − λ} + λ²e^{2t} e^{λe^t − λ}) |_{t=0}
= λ(1 + λ) = E(X²), so that σ²(X) = λ + λ² − λ² = λ.
3. If X ~ N(μ, σ²), then M_X(t) = e^{μt + σ²t²/2}, t ∈ ℝ, and, in particular, if X ~ N(0, 1), then M_X(t) = e^{t²/2}, t ∈ ℝ. By the property for m.g.f. analogous to property (vi) in Theorem 1,

M_X(t) = e^{μt} M_{(X−μ)/σ}(σt), so that M_X(t) = e^{μt} e^{σ²t²/2} = e^{μt + σ²t²/2}.

Then

d/dt M_X(t) |_{t=0} = (μ + σ²t) e^{μt + σ²t²/2} |_{t=0} = μ = E(X),

and

d²/dt² M_X(t) |_{t=0} = d/dt [(μ + σ²t) e^{μt + σ²t²/2}] |_{t=0}
= [σ² e^{μt + σ²t²/2} + (μ + σ²t)² e^{μt + σ²t²/2}] |_{t=0} = σ² + μ²
= E(X²), so that σ²(X) = σ² + μ² − μ² = σ².
4. If X is distributed as Gamma with parameters α and β, then M_X(t) = (1 − βt)^{−α}, t < 1/β. Indeed,

M_X(t) = [1/(Γ(α)β^α)] ∫_0^∞ e^{tx} x^{α−1} e^{−x/β} dx = [1/(Γ(α)β^α)] ∫_0^∞ x^{α−1} e^{−x(1 − βt)/β} dx.

Setting y = x(1 − βt)/β, this becomes

[1/(Γ(α)(1 − βt)^α)] ∫_0^∞ y^{α−1} e^{−y} dy = 1/(1 − βt)^α,

d/dt M_X(t) |_{t=0} = αβ(1 − βt)^{−α−1} |_{t=0} = αβ = E(X),

and

d²/dt² M_X(t) |_{t=0} = d/dt [αβ(1 − βt)^{−α−1}] |_{t=0} = α(α + 1)β²(1 − βt)^{−α−2} |_{t=0}
= α(α + 1)β² = E(X²), so that σ²(X) = α(α + 1)β² − (αβ)² = αβ².

In particular, for α = r/2 and β = 2, we get the m.g.f. of the χ²_r, and its mean and variance; namely,

M_X(t) = (1 − 2t)^{−r/2}, t < 1/2, E(X) = r, σ²(X) = 2r.

For α = 1 and β = 1/λ, we obtain the m.g.f. of the Negative Exponential distribution, and its mean and variance; namely,

M_X(t) = λ/(λ − t), t < λ, EX = 1/λ, σ²(X) = 1/λ².
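A symbolic sketch (ours) of the Gamma computation: differentiating (1 − βt)^{−α} at t = 0 reproduces the stated mean αβ and variance αβ².

```python
import sympy as sp

t, a, b = sp.symbols('t alpha beta', positive=True)
M = (1 - b * t) ** (-a)            # Gamma(alpha, beta) m.g.f., valid for t < 1/beta
EX = sp.diff(M, t).subs(t, 0)      # alpha * beta
EX2 = sp.diff(M, t, 2).subs(t, 0)  # alpha * (alpha + 1) * beta**2
assert sp.simplify(EX - a * b) == 0
assert sp.simplify(EX2 - EX**2 - a * b**2) == 0   # variance alpha * beta^2
```

Substituting α = r/2, β = 2 (or α = 1, β = 1/λ) recovers the χ²_r and Negative Exponential cases quoted above.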
5. Let X have the Cauchy distribution with parameters μ and σ, and without loss of generality, let μ = 0, σ = 1. Then M_X(t) exists only for t = 0. In fact,

M_X(t) = E(e^{tX}) = (1/π) ∫_{−∞}^{∞} e^{tx} [1/(1 + x²)] dx
> (1/π) ∫_0^∞ e^{tx} [1/(1 + x²)] dx > (1/π) ∫_0^∞ tx [1/(1 + x²)] dx

if t > 0, since e^z > z, for z > 0, and this equals

(t/2π) ∫_0^∞ [2x/(1 + x²)] dx = (t/2π) ∫_1^∞ du/u = (t/2π) lim_{x→∞} log x = ∞.

Thus for t > 0, M_X(t) obviously is equal to ∞. If t < 0, by using the limits −∞, 0 in the integral, we again reach the conclusion that M_X(t) = ∞ (see Exercise 6.5.9).
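The divergence can also be seen numerically. The sketch below (ours; a crude midpoint rule with illustrative parameters) evaluates the truncated integral (1/π)∫₀^B e^{tx}/(1 + x²) dx for t = 0.1 and increasing B: it grows without bound rather than settling toward a finite value.

```python
import math

def truncated_integral(t, B, steps=100000):
    """(1/pi) * integral of e^{t x}/(1 + x^2) over [0, B], midpoint rule."""
    h = B / steps
    return sum(math.exp(t * (i + 0.5) * h) / (1.0 + ((i + 0.5) * h) ** 2)
               for i in range(steps)) * h / math.pi

vals = [truncated_integral(0.1, B) for B in (100, 200, 300)]
assert vals[0] < vals[1] < vals[2]   # keeps growing: no finite m.g.f. for t > 0
assert vals[2] > 1e6
```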
REMARK 4 The examples just discussed exhibit all three cases regarding the existence or nonexistence of an m.g.f. In Examples 1 and 3, the m.g.f.s exist for all t ∈ ℝ; in Examples 2 and 4, the m.g.f.s exist for proper subsets of ℝ; and in Example 5, the m.g.f. exists only for t = 0.

For an r.v. X, we also define what is known as its factorial moment generating function. More precisely, the factorial m.g.f. η_X (or just η when no confusion is possible) of an r.v. X is defined by:

η_X(t) = E(t^X), t ∈ ℝ, if E(t^X) exists.

This function is sometimes referred to as the Mellin or Mellin–Stieltjes transform of f. Clearly, η_X(t) = M_X(log t) for t > 0.
Formally, the nth factorial moment of an r.v. X is taken from its factorial
m.g.f. by differentiation as follows:
d^n/dt^n η_X(t) |_{t=1} = E[X(X − 1) ··· (X − n + 1)].

In fact,

d^n/dt^n η_X(t) = d^n/dt^n E(t^X) = E(∂^n/∂t^n t^X) = E[X(X − 1) ··· (X − n + 1) t^{X−n}],

provided Lemma D applies, so that the interchange of the order of differentiation and expectation is valid. Hence

d^n/dt^n η_X(t) |_{t=1} = E[X(X − 1) ··· (X − n + 1)]. (9)
REMARK 5 The factorial m.g.f. derives its name from the property just established. As has already been seen in the first two examples in Section 2 of Chapter 5, factorial moments are especially valuable in calculating the variance of discrete r.v.s. Indeed, since

σ²(X) = E(X²) − (EX)², and E(X²) = E[X(X − 1)] + E(X),

we get

σ²(X) = E[X(X − 1)] + E(X) − (EX)².

For example, if X is B(n, p), then η_X(t) = (pt + q)^n, so that

d²/dt² η_X(t) |_{t=1} = n(n − 1)p²(pt + q)^{n−2} |_{t=1} = n(n − 1)p²,

and if X is P(λ), then

η_X(t) = Σ_{x=0}^{∞} t^x e^{−λ} λ^x/x! = e^{−λ} Σ_{x=0}^{∞} (λt)^x/x! = e^{−λ} e^{λt} = e^{λt − λ}, t ∈ ℝ.
Hence

d²/dt² η_X(t) |_{t=1} = λ² e^{λt − λ} |_{t=1} = λ², so that σ²(X) = λ² + λ − λ² = λ.
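A one-line symbolic check of the Poisson factorial-moment computation (a sketch, ours): differentiating η_X(t) = e^{λt − λ} twice and evaluating at t = 1 gives λ² = E[X(X − 1)].

```python
import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)
eta = sp.exp(lam * (t - 1))             # factorial m.g.f. of P(lambda)
second = sp.diff(eta, t, 2).subs(t, 1)  # E[X(X - 1)] = lambda^2
assert sp.simplify(second - lam**2) == 0
```

Higher derivatives at t = 1 give λ^n, the nth factorial moment (compare Exercise 6.5.12).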
The m.g.f. of an r. vector X or the joint m.g.f. of the r.v.s X₁, . . . , X_k, denoted by M_X or M_{X₁,...,X_k}, is defined by:

M_{X₁,...,X_k}(t₁, . . . , t_k) = E(e^{t₁X₁ + ··· + t_kX_k}), t_j ∈ ℝ, j = 1, 2, . . . , k,

for those t_j's in ℝ for which this expectation exists. If M_{X₁,...,X_k}(t₁, . . . , t_k) exists, then formally φ_{X₁,...,X_k}(t₁, . . . , t_k) = M_{X₁,...,X_k}(it₁, . . . , it_k) and properties analogous to (i)–(vii), (viii) in Theorem 1′ hold true under suitable conditions. In particular,

∂^{n₁+···+n_k} M_{X₁,...,X_k}(t₁, . . . , t_k) / ∂t₁^{n₁} ··· ∂t_k^{n_k} |_{t₁=···=t_k=0} = E(X₁^{n₁} ··· X_k^{n_k}). (10)

For example, for the Multinomial distribution,

M_{X₁,...,X_k}(t₁, . . . , t_k) = (p₁e^{t₁} + ··· + p_k e^{t_k})^n, t_j ∈ ℝ, j = 1, . . . , k.
In fact,

M_{X₁,...,X_k}(t₁, . . . , t_k) = E(e^{t₁X₁ + ··· + t_kX_k}) = Σ_{x₁,...,x_k} e^{t₁x₁ + ··· + t_kx_k} [n!/(x₁! ··· x_k!)] p₁^{x₁} ··· p_k^{x_k}
= Σ_{x₁,...,x_k} [n!/(x₁! ··· x_k!)] (p₁e^{t₁})^{x₁} ··· (p_k e^{t_k})^{x_k}
= (p₁e^{t₁} + ··· + p_k e^{t_k})^n.
Also, for the Bivariate Normal distribution,

M_{X₁,X₂}(t₁, t₂) = exp[μ₁t₁ + μ₂t₂ + ½(σ₁²t₁² + 2ρσ₁σ₂t₁t₂ + σ₂²t₂²)], t₁, t₂ ∈ ℝ. (11)

In fact, the joint p.d.f. is

f(x₁, x₂) = [1/(2πσ₁σ₂√(1 − ρ²))] exp{−[1/(2(1 − ρ²))] [((x₁ − μ₁)/σ₁)² − 2ρ((x₁ − μ₁)/σ₁)((x₂ − μ₂)/σ₂) + ((x₂ − μ₂)/σ₂)²]}.

Set x = (x₁ x₂)′, μ = (μ₁ μ₂)′, and

Σ = [σ₁², ρσ₁σ₂; ρσ₁σ₂, σ₂²], |Σ| = σ₁²σ₂²(1 − ρ²),

Σ⁻¹ = [1/(σ₁²σ₂²(1 − ρ²))] [σ₂², −ρσ₁σ₂; −ρσ₁σ₂, σ₁²].
Therefore

(x − μ)′ Σ⁻¹ (x − μ)
= [1/(σ₁²σ₂²(1 − ρ²))] [σ₂²(x₁ − μ₁)² − 2ρσ₁σ₂(x₁ − μ₁)(x₂ − μ₂) + σ₁²(x₂ − μ₂)²]
= [1/(1 − ρ²)] [((x₁ − μ₁)/σ₁)² − 2ρ((x₁ − μ₁)/σ₁)((x₂ − μ₂)/σ₂) + ((x₂ − μ₂)/σ₂)²],

so that
f(x) = [1/(2π|Σ|^{1/2})] exp[−½ (x − μ)′ Σ⁻¹ (x − μ)].

In this form, μ is the mean vector of X = (X₁ X₂)′, and Σ is the covariance matrix of X.
Next, for t = (t₁ t₂)′, we have

M_X(t) = E(e^{t′X}) = ∫_{ℝ²} exp(t′x) f(x) dx
= [1/(2π|Σ|^{1/2})] ∫_{ℝ²} exp[t′x − ½ (x − μ)′ Σ⁻¹ (x − μ)] dx.

The exponent may be written as

t′x − ½ (x − μ)′ Σ⁻¹ (x − μ) = −½ [−2t′x + (x − μ)′ Σ⁻¹ (x − μ)]. (13)

Focus on the quantity in the bracket, carry out the multiplication, and observe that Σ′ = Σ, (Σ⁻¹)′ = Σ⁻¹, x′t = t′x, and x′Σ⁻¹μ = μ′Σ⁻¹x, to obtain

−2t′x + (x − μ)′ Σ⁻¹ (x − μ) = −(2t′μ + t′Σt) + [(x − (μ + Σt))′ Σ⁻¹ (x − (μ + Σt))]. (14)

Therefore

M_X(t) = exp(t′μ + ½ t′Σt) · [1/(2π|Σ|^{1/2})] ∫_{ℝ²} exp[−½ (x − (μ + Σt))′ Σ⁻¹ (x − (μ + Σt))] dx.

However, the second factor above is equal to 1, since it is the integral of a Bivariate Normal distribution with mean vector μ + Σt and covariance matrix Σ. Thus

M_X(t) = exp(t′μ + ½ t′Σt). (15)
Observing that

t′Σt = (t₁ t₂) Σ (t₁ t₂)′ = σ₁²t₁² + 2ρσ₁σ₂t₁t₂ + σ₂²t₂²,

we see that (15) coincides with (11).
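The last algebraic step can be verified symbolically. The sketch below (ours) expands the exponent t′μ + ½ t′Σt of (15) and confirms it equals the exponent displayed in (11):

```python
import sympy as sp

t1, t2, m1, m2, s1, s2, r = sp.symbols('t1 t2 mu1 mu2 sigma1 sigma2 rho')
t = sp.Matrix([t1, t2])
mu = sp.Matrix([m1, m2])
Sigma = sp.Matrix([[s1**2, r * s1 * s2], [r * s1 * s2, s2**2]])
# exponent of (15): t'mu + (1/2) t' Sigma t
exponent = (t.T * mu + sp.Rational(1, 2) * t.T * Sigma * t)[0, 0]
# exponent of (11), written out term by term
target = (m1 * t1 + m2 * t2
          + sp.Rational(1, 2) * (s1**2 * t1**2
                                 + 2 * r * s1 * s2 * t1 * t2
                                 + s2**2 * t2**2))
assert sp.simplify(sp.expand(exponent - target)) == 0
```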
Exercises
6.5.1 Derive the m.g.f. of the r.v. X which denotes the number of spots that
turn up when a balanced die is rolled.
6.5.4 Let X be an r.v. with p.d.f. f given by f(x) = λe^{−λ(x − α)} I_(α,∞)(x). Find its m.g.f. M(t) for those t's for which it exists. Then calculate EX and σ²(X), provided they are finite.
6.5.5 Let X be an r.v. distributed as B(n, p). Use its factorial m.g.f. in order
to calculate its kth factorial moment. Compare with Exercise 5.2.1 in Chapter
5.
6.5.6 Let X be an r.v. distributed as P(). Use its factorial m.g.f. in order to
calculate its kth factorial moment. Compare with Exercise 5.2.4 in Chapter 5.
6.5.7 Let X be an r.v. distributed as Negative Binomial with parameters r and p.

i) Show that its m.g.f. and factorial m.g.f., M_X(t) and η_X(t), respectively, are given by

M_X(t) = p^r/(1 − qe^t)^r, t < −log q;   η_X(t) = p^r/(1 − qt)^r, t < 1/q.

6.5.8 Let X be an r.v. distributed as U(α, β).

i) Show that its m.g.f. M(t) is given by

M(t) = (e^{tβ} − e^{tα})/(t(β − α));

ii) By differentiation, show that EX = (α + β)/2 and σ²(X) = (β − α)²/12.
6.5.9 Refer to Example 3 in the Continuous case and show that M_X(t) = ∞ for t < 0 as asserted there.
6.5.10 Let X be an r.v. with m.g.f. M given by M(t) = e^{μt + σ²t²/2}, t ∈ ℝ (μ ∈ ℝ, σ > 0). Find the ch.f. of X and identify its p.d.f. Also use the ch.f. of X in order to calculate EX⁴.
6.5.11 For an r.v. X, define the function γ by γ(t) = E(1 + t)^X for those t's for which E(1 + t)^X is finite. Then, if the nth factorial moment of X is finite, show that

d^n/dt^n γ(t) |_{t=0} = E[X(X − 1) ··· (X − n + 1)].

6.5.12 Refer to the previous exercise and let X be P(λ). Derive γ(t) and use it in order to show that the nth factorial moment of X is λ^n.
6.5.13 Let X be an r.v. with m.g.f. M and set K(t) = log M(t) for those t's for which M(t) exists. Furthermore, suppose that EX = μ and σ²(X) = σ² are both finite. Then show that

d/dt K(t) |_{t=0} = μ and d²/dt² K(t) |_{t=0} = σ².

(The function K just defined is called the cumulant generating function of X.)
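As an illustration of the cumulant generating function (a sketch, ours; the Poisson case is chosen only because its m.g.f. was derived above): for X ~ P(λ), K(t) = λ(e^t − 1), and both K′(0) and K″(0) equal λ, matching the mean and variance of P(λ).

```python
import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)
K = sp.log(sp.exp(lam * (sp.exp(t) - 1)))      # K(t) = log M(t) for P(lambda)
K1 = sp.simplify(sp.diff(K, t).subs(t, 0))     # should equal the mean lambda
K2 = sp.simplify(sp.diff(K, t, 2).subs(t, 0))  # should equal the variance lambda
assert K1 == lam and K2 == lam
```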
6.5.14 Let X be an r.v. such that EX^n is finite for all n = 1, 2, . . . . Use the expansion

e^x = Σ_{n=0}^{∞} x^n/n!

in order to show that, under appropriate conditions, one has that the m.g.f. of X is given by

M(t) = Σ_{n=0}^{∞} (EX^n) t^n/n!.
6.5.15 If X is an r.v. such that EX^n = n!, then use the previous exercise in order to find the m.g.f. M(t) of X for those t's for which it exists. Also find the ch.f. of X and from this, deduce the distribution of X.
6.5.16 Let X be an r.v. such that

EX^{2k} = (2k)!/(2^k k!), EX^{2k+1} = 0,

k = 0, 1, . . . . Find the m.g.f. of X and also its ch.f. Then deduce the distribution of X. (Use Exercise 6.5.14.)
6.5.17 Let X₁, X₂ be two r.v.s with m.g.f. given by

M(t₁, t₂) = ½ e^{t₁ + t₂} + (1/6)(1 + e^{t₁} + e^{t₂}), t₁, t₂ ∈ ℝ.

Calculate EX₁, σ²(X₁) and Cov(X₁, X₂), provided they are finite.
6.5.18 Refer to Exercise 4.2.5. in Chapter 4 and find the joint m.g.f.
M(t1, t2, t3) of the r.v.s X1, X2, X3 for those t1, t2, t3 for which it exists. Also find
their joint ch.f. and use it in order to calculate E(X1X2X3), provided the
assumptions of Theorem 1 (vii) are met.
6.5.19 Refer to the previous exercise and derive the m.g.f. M(t) of the r.v.
g(X1, X2, X3) = X1 + X2 + X3 for those ts for which it exists. From this, deduce
the distribution of g.
6.5.20 Let X₁, X₂ be two r.v.s with m.g.f. M and set K(t₁, t₂) = log M(t₁, t₂) for those t₁, t₂ for which M(t₁, t₂) exists. Furthermore, suppose that expectations, variances, and covariances of these r.v.s are all finite. Then show that for j = 1, 2,

∂/∂t_j K(t₁, t₂) |_{t₁=t₂=0} = EX_j,   ∂²/∂t_j² K(t₁, t₂) |_{t₁=t₂=0} = σ²(X_j),

∂²/∂t₁∂t₂ K(t₁, t₂) |_{t₁=t₂=0} = Cov(X₁, X₂).
6.5.21 Suppose the r.v.s X₁, . . . , X_k have the Multinomial distribution with parameters n and p₁, . . . , p_k, and let i, j be arbitrary but fixed, 1 ≤ i < j ≤ k. Consider the r.v.s X_i, X_j, and set X = n − X_i − X_j, so that these r.v.s have the Multinomial distribution with parameters n and p_i, p_j, p, where p = 1 − p_i − p_j.

i) Write out the joint m.g.f. of X_i, X_j, X, and by differentiation, determine E(X_iX_j);
ii) Calculate the covariance of X_i, X_j, Cov(X_i, X_j), and show that it is negative.
6.5.22 If the r.v.s X₁ and X₂ have the Bivariate Normal distribution with parameters μ₁, μ₂, σ₁², σ₂² and ρ, show that Cov(X₁, X₂) ≥ 0 if ρ ≥ 0, and Cov(X₁, X₂) < 0 if ρ < 0. Note: Two r.v.s X₁, X₂ for which

F_{X₁,X₂}(x₁, x₂) − F_{X₁}(x₁)F_{X₂}(x₂) ≥ 0, for all x₁, x₂ in ℝ, or
F_{X₁,X₂}(x₁, x₂) − F_{X₁}(x₁)F_{X₂}(x₂) ≤ 0, for all x₁, x₂ in ℝ,

are said to be positively quadrant dependent or negatively quadrant dependent, respectively. In particular, if X₁ and X₂ have the Bivariate Normal distribution, it can be seen that they are positively quadrant dependent or negatively quadrant dependent according to whether ρ ≥ 0 or ρ < 0.
6.5.23 Verify the validity of relation (13).
6.5.24

i) If the r.v.s X₁ and X₂ have the Bivariate Normal distribution with parameters μ₁, μ₂, σ₁², σ₂² and ρ, use their joint m.g.f. given by (11) and property (10) in order to determine E(X₁X₂);
ii) Show that ρ is, indeed, the correlation coefficient of X₁ and X₂.

6.5.25 Both parts of Exercise 6.4.1 hold true if the ch.f.s involved are replaced by m.g.f.s, provided, of course, that these m.g.f.s exist.

i) Use Exercise 6.4.1 for k = 2 and formulated in terms of m.g.f.s in order to show that the r.v.s X₁ and X₂ have a Bivariate Normal distribution if and only if for every c₁, c₂ ∈ ℝ, Y_c = c₁X₁ + c₂X₂ is normally distributed;
ii) In either case, show that c₁X₁ + c₂X₂ + c₃ is also normally distributed for any c₃ ∈ ℝ.
164 7 Stochastic Independence with Some Applications
Chapter 7

Stochastic Independence with Some Applications

DEFINITION 1 The r.v.s X_j, j = 1, . . . , k are said to be independent if, for all sets B_j ⊆ ℝ, j = 1, . . . , k, it holds that

P(X_j ∈ B_j, j = 1, . . . , k) = ∏_{j=1}^{k} P(X_j ∈ B_j).

The r.v.s X_j, j = 1, 2, . . . are said to be independent if every finite subcollection of them is a collection of independent r.v.s. Non-independent r.v.s are said to be dependent. (See also Definition 3 in Section 7.4, and the comment following it.)

REMARK 1 (i) The sets B_j, j = 1, . . . , k may not be chosen entirely arbitrarily, but there is plenty of leeway in their choice. For example, taking B_j = (−∞, x_j], x_j ∈ ℝ, j = 1, . . . , k would be sufficient. (See Lemma 3 in Section 7.4.)
(ii) Definition 1 (as well as Definition 3 in Section 7.4) also applies to m-dimensional r. vectors when ℝ (and B in Definition 3) is replaced by ℝ^m (B^m).
7.1 Stochastic Independence: Criteria of Independence 165
THEOREM 1 (Factorization Theorem) The r.v.s X_j, j = 1, . . . , k are independent if and only if any one of the following two (equivalent) conditions holds:

i) F_{X₁,...,X_k}(x₁, . . . , x_k) = ∏_{j=1}^{k} F_{X_j}(x_j), for all x_j ∈ ℝ, j = 1, . . . , k;

ii) f_{X₁,...,X_k}(x₁, . . . , x_k) = ∏_{j=1}^{k} f_{X_j}(x_j), for all x_j ∈ ℝ, j = 1, . . . , k.
PROOF

i) If X_j, j = 1, . . . , k are independent, then

P(X_j ∈ B_j, j = 1, . . . , k) = ∏_{j=1}^{k} P(X_j ∈ B_j), B_j ⊆ ℝ, j = 1, . . . , k.

In particular, this is true for B_j = (−∞, x_j], x_j ∈ ℝ, j = 1, . . . , k, which gives

F_{X₁,...,X_k}(x₁, . . . , x_k) = ∏_{j=1}^{k} F_{X_j}(x_j).

The proof of the converse is a deep probability result, and will, of course, be omitted. Some relevant comments will be made in Section 7.4, Lemma 3.

ii) For the discrete case, we set B_j = {x_j}, where x_j is in the range of X_j, j = 1, . . . , k. Then if X_j, j = 1, . . . , k are independent, we get

P(X₁ = x₁, . . . , X_k = x_k) = ∏_{j=1}^{k} P(X_j = x_j),

or

f_{X₁,...,X_k}(x₁, . . . , x_k) = ∏_{j=1}^{k} f_{X_j}(x_j).

Let now

f_{X₁,...,X_k}(x₁, . . . , x_k) = ∏_{j=1}^{k} f_{X_j}(x_j).

Then for any sets B_j = (−∞, y_j], y_j ∈ ℝ, j = 1, . . . , k, we get

Σ_{x₁∈B₁} ··· Σ_{x_k∈B_k} f_{X₁,...,X_k}(x₁, . . . , x_k) = Σ_{x₁∈B₁} ··· Σ_{x_k∈B_k} f_{X₁}(x₁) ··· f_{X_k}(x_k) = ∏_{j=1}^{k} Σ_{x_j∈B_j} f_{X_j}(x_j),

or

F_{X₁,...,X_k}(y₁, . . . , y_k) = ∏_{j=1}^{k} F_{X_j}(y_j).

Therefore X_j, j = 1, . . . , k are independent by (i). For the continuous case, we have: Let

f_{X₁,...,X_k}(x₁, . . . , x_k) = ∏_{j=1}^{k} f_{X_j}(x_j)

and let
C_j = (−∞, y_j], y_j ∈ ℝ, j = 1, . . . , k.

Then integrating both sides of this last relationship over the set C₁ × ··· × C_k, we get

F_{X₁,...,X_k}(y₁, . . . , y_k) = ∏_{j=1}^{k} F_{X_j}(y_j),

so that X_j, j = 1, . . . , k are independent by (i). Next, assume that

F_{X₁,...,X_k}(x₁, . . . , x_k) = ∏_{j=1}^{k} F_{X_j}(x_j)

(that is, the X_j's are independent). Then differentiating both sides, we get

f_{X₁,...,X_k}(x₁, . . . , x_k) = ∏_{j=1}^{k} f_{X_j}(x_j). ▪
REMARK 2 It is noted that this step also is justifiable (by means of calculus) for the continuity points of the p.d.f. only.

Consider independent r.v.s and suppose that g_j is a function of the jth r.v. alone. Then it seems intuitively clear that the r.v.s g_j(X_j), j = 1, . . . , k ought to be independent. This is, actually, true and is the content of the following lemma.

LEMMA 1 For j = 1, . . . , k, let the r.v.s X_j be independent and consider (measurable) functions g_j : ℝ → ℝ, so that g_j(X_j), j = 1, . . . , k are r.v.s. Then the r.v.s g_j(X_j), j = 1, . . . , k are also independent. The same conclusion holds if the r.v.s are replaced by m-dimensional r. vectors, and the functions g_j, j = 1, . . . , k are defined on ℝ^m into ℝ. (That is, functions of independent r.v.s (r. vectors) are independent r.v.s.)
PROOF See Section 7.4.
Independence of r.v.s also has the following consequence stated as a lemma. Both this lemma, as well as Lemma 1, are needed in the proof of Theorem 1′ below.

LEMMA 2 Consider the r.v.s X_j, j = 1, . . . , k and let g_j : ℝ → ℝ be (measurable) functions, so that g_j(X_j), j = 1, . . . , k are r.v.s. Then, if the r.v.s X_j, j = 1, . . . , k are independent, we have

E[∏_{j=1}^{k} g_j(X_j)] = ∏_{j=1}^{k} E[g_j(X_j)],

provided the expectations considered exist. The same conclusion holds if the g_j's are complex-valued.

PROOF See Section 7.2.

REMARK 3 The converse of the above statement need not be true as will be seen later by examples.
THEOREM 1′ (Factorization Theorem) The r.v.s X_j, j = 1, . . . , k are independent if and only if:

φ_{X₁,...,X_k}(t₁, . . . , t_k) = ∏_{j=1}^{k} φ_{X_j}(t_j), for all t_j ∈ ℝ, j = 1, . . . , k.

PROOF If X_j, j = 1, . . . , k are independent, then

f_{X₁,...,X_k}(x₁, . . . , x_k) = ∏_{j=1}^{k} f_{X_j}(x_j).

Hence

φ_{X₁,...,X_k}(t₁, . . . , t_k) = E exp(i Σ_{j=1}^{k} t_jX_j) = E ∏_{j=1}^{k} e^{it_jX_j} = ∏_{j=1}^{k} E(e^{it_jX_j})

by Lemmas 1 and 2, and this is ∏_{j=1}^{k} φ_{X_j}(t_j). Let us assume now that

φ_{X₁,...,X_k}(t₁, . . . , t_k) = ∏_{j=1}^{k} φ_{X_j}(t_j).
For the discrete case, we have (see Theorem 2(i) in Chapter 6)

f_{X_j}(x_j) = lim_{T→∞} (1/2T) ∫_{−T}^{T} e^{−it_jx_j} φ_{X_j}(t_j) dt_j, j = 1, . . . , k,

and for the multidimensional case, we have (see Theorem 2′(i) in Chapter 6)

f_{X₁,...,X_k}(x₁, . . . , x_k) = lim_{T→∞} (1/2T)^k ∫_{−T}^{T} ··· ∫_{−T}^{T} exp(−i Σ_{j=1}^{k} t_jx_j) φ_{X₁,...,X_k}(t₁, . . . , t_k) dt₁ ··· dt_k
= lim_{T→∞} (1/2T)^k ∫_{−T}^{T} ··· ∫_{−T}^{T} exp(−i Σ_{j=1}^{k} t_jx_j) ∏_{j=1}^{k} φ_{X_j}(t_j) dt₁ ··· dt_k
= ∏_{j=1}^{k} [lim_{T→∞} (1/2T) ∫_{−T}^{T} e^{−it_jx_j} φ_{X_j}(t_j) dt_j] = ∏_{j=1}^{k} f_{X_j}(x_j).
That is, X_j, j = 1, . . . , k are independent by Theorem 1(ii). For the continuous case, we have (see Theorem 2(ii) in Chapter 6)

f_{X_j}(x_j) = lim_{h→0} lim_{T→∞} (1/2π) ∫_{−T}^{T} [(1 − e^{−it_jh})/(it_jh)] e^{−it_jx_j} φ_{X_j}(t_j) dt_j, j = 1, . . . , k,

and for the multidimensional case, we have (see Theorem 2′(ii) in Chapter 6)

f_{X₁,...,X_k}(x₁, . . . , x_k) = lim_{h→0} lim_{T→∞} (1/2π)^k ∫_{−T}^{T} ··· ∫_{−T}^{T} ∏_{j=1}^{k} [(1 − e^{−it_jh})/(it_jh)] e^{−it_jx_j} φ_{X₁,...,X_k}(t₁, . . . , t_k) dt₁ ··· dt_k
= lim_{h→0} lim_{T→∞} (1/2π)^k ∫_{−T}^{T} ··· ∫_{−T}^{T} ∏_{j=1}^{k} [(1 − e^{−it_jh})/(it_jh)] e^{−it_jx_j} φ_{X_j}(t_j) dt₁ ··· dt_k
= ∏_{j=1}^{k} {lim_{h→0} lim_{T→∞} (1/2π) ∫_{−T}^{T} [(1 − e^{−it_jh})/(it_jh)] e^{−it_jx_j} φ_{X_j}(t_j) dt_j}
= ∏_{j=1}^{k} f_{X_j}(x_j),

so that X_j, j = 1, . . . , k are independent by Theorem 1(ii). ▪
For example, for the Bivariate Normal distribution,

f_{X₁,X₂}(x₁, x₂) = [1/(2πσ₁σ₂√(1 − ρ²))] e^{−q/2},

where

q = [1/(1 − ρ²)] [((x₁ − μ₁)/σ₁)² − 2ρ((x₁ − μ₁)/σ₁)((x₂ − μ₂)/σ₂) + ((x₂ − μ₂)/σ₂)²],

and

f_{X₁}(x₁) = [1/(√(2π)σ₁)] exp[−(x₁ − μ₁)²/(2σ₁²)],   f_{X₂}(x₂) = [1/(√(2π)σ₂)] exp[−(x₂ − μ₂)²/(2σ₂²)].

Thus, if X₁, X₂ are uncorrelated, so that ρ = 0, then

f_{X₁,X₂}(x₁, x₂) = f_{X₁}(x₁) f_{X₂}(x₂),

that is, X₁, X₂ are independent. The converse is always true by Corollary 2 in Section 7.2.
Exercises

7.1.1 Let X_j, j = 1, . . . , n be i.i.d. r.v.s with p.d.f. f and d.f. F. Set

X₍₁₎ = min(X₁, . . . , X_n), X₍ₙ₎ = max(X₁, . . . , X_n);

that is,

X₍₁₎(s) = min[X₁(s), . . . , X_n(s)], X₍ₙ₎(s) = max[X₁(s), . . . , X_n(s)].

Then express the d.f. and p.d.f. of X₍₁₎, X₍ₙ₎ in terms of f and F.

7.1.2 Let the r.v.s X₁, X₂ have p.d.f. f given by f(x₁, x₂) = I_{(0,1)×(0,1)}(x₁, x₂).

i) Show that X₁, X₂ are independent and identify their common distribution;
ii) Find the following probabilities: P(X₁ + X₂ < 1/3), P(X₁² + X₂² < 1/4), P(X₁X₂ > 1/2).
7.1.3 Let X1, X2 be two r.v.s with p.d.f. f given by f(x1, x2) = g(x1)h(x2).
i) Derive the p.d.f. of X₁ and X₂ and show that X₁, X₂ are independent;
ii) Calculate the probability P(X₁ > X₂) if g = h and h is of the continuous type.

7.1.4 Let X₁, X₂, X₃ be r.v.s with p.d.f. f given by f(x₁, x₂, x₃) = 8x₁x₂x₃ I_A(x₁, x₂, x₃), where A = (0, 1) × (0, 1) × (0, 1).

i) Show that these r.v.s are independent;
ii) Calculate the probability P(X₁ < X₂ < X₃).

7.1.5 Let X₁, X₂ be two r.v.s with p.d.f. f given by f(x₁, x₂) = c I_A(x₁, x₂), where A = {(x₁, x₂) ∈ ℝ²; x₁² + x₂² ≤ 9}.

i) Determine the constant c;
ii) Show that X₁, X₂ are dependent.
7.1.6 Let the r.v.s X₁, X₂, X₃ be jointly distributed with p.d.f. f given by

f(x₁, x₂, x₃) = (1/4) I_A(x₁, x₂, x₃),

where

A = {(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)}.

Then show that

i) X_i, X_j, i ≠ j, are independent;
ii) X₁, X₂, X₃ are dependent.
7.1.7 Refer to Exercise 4.2.5 in Chapter 4 and show that the r.v.s X₁, X₂, X₃ are independent. Utilize this result in order to find the p.d.f. of X₁ + X₂ and X₁ + X₂ + X₃.

7.1.8 Let X_j, j = 1, . . . , n be i.i.d. r.v.s with p.d.f. f and let B be a (Borel) set in ℝ.

i) In terms of f, express the probability that at least k of the X's lie in B for some fixed k with 1 ≤ k ≤ n;
ii) Simplify this expression if f is the Negative Exponential p.d.f. with parameter λ and B = (1/λ, ∞);
iii) Find a numerical answer for n = 10, k = 5, λ = 1/2.
7.1.9 Let X₁, X₂ be two independent r.v.s and let g : ℝ → ℝ be measurable. Let also Eg(X₂) be finite. Then show that E[g(X₂) | X₁ = x₁] = Eg(X₂).

7.1.10 If X_j, j = 1, . . . , n are i.i.d. r.v.s with ch.f. φ and sample mean X̄, express the ch.f. of X̄ in terms of φ.

7.1.11 For two i.i.d. r.v.s X₁, X₂, show that φ_{X₁−X₂}(t) = |φ_{X₁}(t)|², t ∈ ℝ.

7.1.12 Let X₁, X₂ be two r.v.s with joint and marginal ch.f.s φ_{X₁,X₂}, φ_{X₁} and φ_{X₂}. Show that X₁, X₂ are independent if and only if

φ_{X₁,X₂}(t₁, t₂) = φ_{X₁}(t₁) φ_{X₂}(t₂), t₁, t₂ ∈ ℝ,

whereas the relation

φ_{X₁,X₂}(t, t) = φ_{X₁}(t) φ_{X₂}(t), t ∈ ℝ,

need not imply independence.
E[∏_{j=1}^{k} g_j(X_j)] = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} g₁(x₁) ··· g_k(x_k) f_{X₁,...,X_k}(x₁, . . . , x_k) dx₁ ··· dx_k
= ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} g₁(x₁) ··· g_k(x_k) f_{X₁}(x₁) ··· f_{X_k}(x_k) dx₁ ··· dx_k (by independence)
= [∫_{−∞}^{∞} g₁(x₁) f_{X₁}(x₁) dx₁] ··· [∫_{−∞}^{∞} g_k(x_k) f_{X_k}(x_k) dx_k]
= E[g₁(X₁)] ··· E[g_k(X_k)].

Now suppose that the g_j's are complex-valued, and for simplicity, set g_j(X_j) = Y_j = Y_{j1} + iY_{j2}, j = 1, . . . , k. For k = 2,

E(Y₁Y₂) = E[(Y₁₁ + iY₁₂)(Y₂₁ + iY₂₂)]
= E(Y₁₁Y₂₁ − Y₁₂Y₂₂) + iE(Y₁₁Y₂₂ + Y₁₂Y₂₁)
= [E(Y₁₁Y₂₁) − E(Y₁₂Y₂₂)] + i[E(Y₁₁Y₂₂) + E(Y₁₂Y₂₁)]
= [(EY₁₁)(EY₂₁) − (EY₁₂)(EY₂₂)] + i[(EY₁₁)(EY₂₂) + (EY₁₂)(EY₂₁)]
= (EY₁₁ + iEY₁₂)(EY₂₁ + iEY₂₂) = (EY₁)(EY₂).

Assuming the result for k = m, we have

E(Y₁ ··· Y_{m+1}) = E[(Y₁ ··· Y_m)Y_{m+1}]
= E(Y₁ ··· Y_m)(EY_{m+1}) (by the part just established)
= (EY₁) ··· (EY_{m+1}).
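For the discrete case the same factorization can be checked by brute-force enumeration. The sketch below (ours; marginal p.d.f.s and functions g₁, g₂ are illustrative) sums over the joint p.d.f. f(x₁, x₂) = f₁(x₁)f₂(x₂) and compares with the product of the individual expectations:

```python
# Exact check of E[g1(X1) g2(X2)] = E[g1(X1)] * E[g2(X2)] for independent
# discrete r.v.s, enumerating the joint p.d.f. f(x1, x2) = f1(x1) f2(x2).
f1 = {0: 0.2, 1: 0.5, 2: 0.3}      # illustrative marginal p.d.f.s
f2 = {-1: 0.4, 3: 0.6}

def g1(x):
    return x ** 2 + 1

def g2(x):
    return 2 * x - 5

lhs = sum(g1(x1) * g2(x2) * p1 * p2
          for x1, p1 in f1.items() for x2, p2 in f2.items())
rhs = (sum(g1(x) * p for x, p in f1.items())
       * sum(g2(x) * p for x, p in f2.items()))
assert abs(lhs - rhs) < 1e-12
```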
COROLLARY 1 The covariance of an r.v. X and of any other r.v. which is equal to a constant c (with probability 1) is equal to 0; that is, Cov(X, c) = 0.

PROOF Cov(X, c) = E(cX) − (Ec)(EX) = cEX − cEX = 0. ▪

COROLLARY 2 If the r.v.s X₁ and X₂ are independent, then they have covariance equal to 0, provided their second moments are finite. In particular, if their variances are also positive, then they are uncorrelated.

PROOF In fact,

Cov(X₁, X₂) = E(X₁X₂) − (EX₁)(EX₂) = (EX₁)(EX₂) − (EX₁)(EX₂) = 0. ▪
For r.v.s X₁, . . . , X_k with finite variances σ_j² = σ²(X_j), and constants c₁, . . . , c_k, the following hold:

i) σ²(Σ_{j=1}^{k} c_jX_j) = Σ_{j=1}^{k} c_j²σ_j² + Σ_{1≤i≠j≤k} c_ic_j Cov(X_i, X_j)
= Σ_{j=1}^{k} c_j²σ_j² + 2 Σ_{1≤i<j≤k} c_ic_j Cov(X_i, X_j).

ii) If also σ_j > 0, j = 1, . . . , k, and ρ_{ij} = ρ(X_i, X_j), i ≠ j, then:

σ²(Σ_{j=1}^{k} c_jX_j) = Σ_{j=1}^{k} c_j²σ_j² + Σ_{1≤i≠j≤k} c_ic_j σ_iσ_j ρ_{ij}
= Σ_{j=1}^{k} c_j²σ_j² + 2 Σ_{1≤i<j≤k} c_ic_j σ_iσ_j ρ_{ij}.
PROOF

i) Indeed,

σ²(Σ_{j=1}^{k} c_jX_j) = E[Σ_{j=1}^{k} c_jX_j − E(Σ_{j=1}^{k} c_jX_j)]²
= E[Σ_{j=1}^{k} c_j(X_j − EX_j)]²
= E[Σ_{j=1}^{k} c_j²(X_j − EX_j)² + Σ_{i≠j} c_ic_j(X_i − EX_i)(X_j − EX_j)]
= Σ_{j=1}^{k} c_j²σ_j² + Σ_{1≤i≠j≤k} c_ic_j Cov(X_i, X_j)
= Σ_{j=1}^{k} c_j²σ_j² + 2 Σ_{1≤i<j≤k} c_ic_j Cov(X_i, X_j)

(since Cov(X_i, X_j) = Cov(X_j, X_i)).

This establishes part (i). Part (ii) follows by the fact that Cov(X_i, X_j) = σ_iσ_jρ_{ij} = σ_jσ_iρ_{ji}.

iii) Here Cov(X_i, X_j) = 0, i ≠ j, either because of independence and Corollary 2, or ρ_{ij} = 0, in case σ_j > 0, j = 1, . . . , k. Then the assertion follows from either part (i) or part (ii), respectively.

iv) Follows from part (iii) for c₁ = ··· = c_k = 1. ▪
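Part (i) is exactly the matrix identity c′Σc = Σ_j c_j²σ_j² + 2Σ_{i<j} c_ic_j Cov(X_i, X_j). The sketch below (ours; the covariance matrix is randomly generated) confirms the two sides agree numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
c = np.array([1.0, -2.0, 0.5])
A = rng.normal(size=(3, 3))
S = A @ A.T                         # a symmetric positive semi-definite "covariance" matrix
lhs = c @ S @ c                     # sigma^2(sum_j c_j X_j) written as c' Sigma c
rhs = (sum(c[j] ** 2 * S[j, j] for j in range(3))
       + 2 * sum(c[i] * c[j] * S[i, j]
                 for i in range(3) for j in range(i + 1, 3)))
assert np.isclose(lhs, rhs)
```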
Exercises
7.2.1 For any k r.v.s X_j, j = 1, . . . , k for which E(X_j) = μ (finite), j = 1, . . . , k, show that

Σ_{j=1}^{k} (X_j − μ)² = Σ_{j=1}^{k} (X_j − X̄)² + k(X̄ − μ)² = kS² + k(X̄ − μ)²,

where

X̄ = (1/k) Σ_{j=1}^{k} X_j and S² = (1/k) Σ_{j=1}^{k} (X_j − X̄)².
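The decomposition in Exercise 7.2.1 is a purely algebraic identity, so it holds for any numbers whatsoever; the sketch below (ours, with arbitrary data and an arbitrary μ) verifies it numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=20)             # any numbers will do; the identity is algebraic
mu = 0.7                            # a fixed (illustrative) value of mu
k = len(x)
xbar = x.mean()
S2 = ((x - xbar) ** 2).mean()       # S^2 = (1/k) sum_j (x_j - xbar)^2
lhs = ((x - mu) ** 2).sum()
rhs = k * S2 + k * (xbar - mu) ** 2
assert abs(lhs - rhs) < 1e-9
```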
7.2.2 Refer to Exercise 4.2.5 in Chapter 4 and find the E(X₁X₂), E(X₁X₂X₃), σ²(X₁ + X₂), σ²(X₁ + X₂ + X₃) without integration.

7.2.3 Let X_j, j = 1, . . . , n be independent r.v.s with finite moments of third order. Then show that

E[Σ_{j=1}^{n} (X_j − EX_j)]³ = Σ_{j=1}^{n} E(X_j − EX_j)³.

7.2.4 Let X_j, j = 1, . . . , n be i.i.d. r.v.s with mean μ and variance σ², both finite.
7.3 Independence: Some Consequences 173
i) In terms of p, c and σ, find the smallest value of n for which the probability that X̄ (the sample mean of the X's) and μ differ in absolute value at most by c is at least p;
ii) Give a numerical answer if p = 0.90, c = 0.1 and σ = 2.
7.2.5 Let X₁, X₂ be two r.v.s taking on the values −1, 0, 1 with the following respective probabilities:

f(−1, −1) = , f(−1, 0) = , f(−1, 1) = , . . .

THEOREM 2 Let X_j be B(n_j, p), j = 1, . . . , k and independent. Then

X = Σ_{j=1}^{k} X_j is B(n, p), where n = Σ_{j=1}^{k} n_j.

(That is, the sum of independent Binomially distributed r.v.s with the same parameter p and possibly distinct n_j's is also Binomially distributed.)
PROOF It suffices to prove that the ch.f. of X is that of a B(n, p) r.v., where n is as above. For simplicity, writing Σ_jX_j instead of Σ_{j=1}^{k} X_j, when this last expression appears as a subscript here and thereafter, we have

φ_X(t) = ∏_{j=1}^{k} φ_{X_j}(t) = ∏_{j=1}^{k} (pe^{it} + q)^{n_j} = (pe^{it} + q)^n,

which is the ch.f. of a B(n, p) r.v. ▪
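Equivalently, the p.d.f. of the sum is the convolution of the two Binomial p.d.f.s, and that convolution is again Binomial. A small sketch (ours; n₁ = 3, n₂ = 5, p = 0.4 are illustrative):

```python
from math import comb

def binom_pmf(n, p):
    return [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

def convolve(f, g):
    """p.d.f. of the sum of two independent discrete r.v.s on {0, 1, ...}."""
    h = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h

p = 0.4                                   # common parameter p; n1 = 3, n2 = 5
summed = convolve(binom_pmf(3, p), binom_pmf(5, p))
target = binom_pmf(8, p)                  # B(n1 + n2, p)
assert all(abs(a - b) < 1e-12 for a, b in zip(summed, target))
```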
THEOREM 3 Let X_j be P(λ_j), j = 1, . . . , k and independent. Then

X = Σ_{j=1}^{k} X_j is P(λ), where λ = Σ_{j=1}^{k} λ_j.
j =1
(That is, the sum of independent Poisson distributed r.v.s is also Poisson
distributed.)
PROOF We have

φ_X(t) = ∏_{j=1}^{k} φ_{X_j}(t) = ∏_{j=1}^{k} exp(λ_je^{it} − λ_j)
= exp(e^{it} Σ_{j=1}^{k} λ_j − Σ_{j=1}^{k} λ_j) = exp(λe^{it} − λ),

which is the ch.f. of a P(λ) r.v. ▪
THEOREM 4 Let X_j be N(μ_j, σ_j²), j = 1, . . . , k and independent. Then

i) X = Σ_{j=1}^{k} X_j is N(μ, σ²), where μ = Σ_{j=1}^{k} μ_j, σ² = Σ_{j=1}^{k} σ_j², and, more generally,
ii) X = Σ_{j=1}^{k} c_jX_j is N(μ, σ²), where μ = Σ_{j=1}^{k} c_jμ_j, σ² = Σ_{j=1}^{k} c_j²σ_j².

(That is, the sum of independent Normally distributed r.v.s is Normally distributed.)

PROOF (ii) We have

φ_X(t) = ∏_{j=1}^{k} φ_{c_jX_j}(t) = ∏_{j=1}^{k} φ_{X_j}(c_jt) = ∏_{j=1}^{k} exp(ic_jtμ_j − c_j²σ_j²t²/2)
= exp(iμt − σ²t²/2)

with μ and σ² as in (ii) above. Hence X is N(μ, σ²). (i) Follows from (ii) by setting c₁ = c₂ = ··· = c_k = 1. ▪
Now let X_j, j = 1, . . . , k be any k independent r.v.s with

E(X_j) = μ, σ²(X_j) = σ², j = 1, . . . , k.

Set

X̄ = (1/k) Σ_{j=1}^{k} X_j.

By assuming that the X's are normal, we get

COROLLARY Let X_j be N(μ, σ²), j = 1, . . . , k and independent. Then X̄ is N(μ, σ²/k), or equivalently, √k(X̄ − μ)/σ is N(0, 1).

PROOF In (ii) of Theorem 4, we set

c₁ = ··· = c_k = 1/k, μ₁ = ··· = μ_k = μ, and σ₁² = ··· = σ_k² = σ²

and get the first conclusion. The second follows from the first by the use of Theorem 4, Chapter 4, since

√k(X̄ − μ)/σ = (X̄ − μ)/(σ/√k). ▪
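The corollary is easy to see empirically. The simulation sketch below (ours; μ, σ, k are illustrative) draws many samples of size k, forms the sample means, and checks that their mean and variance are close to μ and σ²/k:

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, k = 2.0, 3.0, 5
# many independent samples of size k from N(mu, sigma^2); row means are Xbar
xbar = rng.normal(mu, sigma, size=(200000, k)).mean(axis=1)
assert abs(xbar.mean() - mu) < 0.02            # E Xbar = mu
assert abs(xbar.var() - sigma**2 / k) < 0.05   # sigma^2(Xbar) = sigma^2 / k
```

(This checks only the first two moments, of course; the full statement says X̄ is exactly Normal.)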
THEOREM 5 Let X_j be χ²_{r_j}, j = 1, . . . , k and independent. Then

X = Σ_{j=1}^{k} X_j is χ²_r, where r = Σ_{j=1}^{k} r_j.

PROOF We have

φ_X(t) = ∏_{j=1}^{k} φ_{X_j}(t) = ∏_{j=1}^{k} (1 − 2it)^{−r_j/2} = (1 − 2it)^{−r/2},

which is the ch.f. of a χ²_r r.v. ▪
Now, for independent X_j distributed as N(μ, σ²), recall the identity

Σ_{j=1}^{k} (X_j − μ)² = Σ_{j=1}^{k} (X_j − X̄)² + k(X̄ − μ)² = kS² + k(X̄ − μ)²,

where

S² = (1/k) Σ_{j=1}^{k} (X_j − X̄)².

Now

Σ_{j=1}^{k} [(X_j − μ)/σ]² is χ²_k and [√k(X̄ − μ)/σ]² is χ²₁,

by Theorem 3, Chapter 4. Then taking ch.f.s of both sides of the last identity above, we get (1 − 2it)^{−k/2} = (1 − 2it)^{−1/2} φ_{kS²/σ²}(t), so that φ_{kS²/σ²}(t) = (1 − 2it)^{−(k−1)/2}; that is, kS²/σ² is χ²_{k−1}. It follows that

ES² = (k − 1)σ²/k, and σ²(S²) = 2(k − 1)σ⁴/k².
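A simulation sketch (ours; k and σ are illustrative) checking the two moments of S² just derived:

```python
import numpy as np

rng = np.random.default_rng(7)
k, sigma = 4, 2.0
x = rng.normal(0.0, sigma, size=(300000, k))
S2 = x.var(axis=1)                  # S^2 = (1/k) sum_j (X_j - Xbar)^2, per sample
# E S^2 = (k - 1) sigma^2 / k  and  sigma^2(S^2) = 2 (k - 1) sigma^4 / k^2
assert abs(S2.mean() - (k - 1) * sigma**2 / k) < 0.05
assert abs(S2.var() - 2 * (k - 1) * sigma**4 / k**2) < 0.3
```

With k = 4 and σ = 2 these targets are ES² = 3 and σ²(S²) = 6, and the empirical values land well within the stated tolerances.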
The following result demonstrates that the sum of independent r.v.s having a certain distribution need not have a distribution of the same kind, as was the case in Theorems 2–5 above.

THEOREM 6 Let X_j, j = 1, . . . , k be independent r.v.s having the Cauchy distribution with parameters μ = 0 and σ = 1. Then X = Σ_{j=1}^{k} X_j is distributed as kY, where Y is Cauchy with μ = 0, σ = 1, and hence X/k = X̄ is Cauchy with μ = 0, σ = 1.

PROOF We have φ_X(t) = ∏_{j=1}^{k} φ_{X_j}(t) = [φ_{X₁}(t)]^k = (e^{−|t|})^k = e^{−k|t|}, which is the ch.f. of kY. ▪
Exercises
7.3.1 For j = 1, . . . , n, let X_j be independent r.v.s distributed as P(λ_j), and set

T = Σ_{j=1}^{n} X_j, λ = Σ_{j=1}^{n} λ_j.
7.3.3 The life of a certain part in a new automobile is an r.v. X whose p.d.f. is Negative Exponential with parameter λ = 0.005 day.

i) Find the expected life of the part in question;
ii) If the automobile comes supplied with a spare part, whose life is an r.v. Y distributed as X and independent of it, find the p.d.f. of the combined life of the part and its spare;
iii) What is the probability that X + Y ≥ 500 days?

7.3.4 Let X₁, X₂ be independent r.v.s distributed as B(n₁, p₁) and B(n₂, p₂), respectively. Determine the distribution of the r.v.s X₁ + X₂, X₁ − X₂ and X₁ − X₂ + n₂.
7.3.5 Let X₁, X₂ be independent r.v.s distributed as N(μ₁, σ₁²) and N(μ₂, σ₂²), respectively. Calculate the probability P(X₁ − X₂ > 0) as a function of μ₁, μ₂ and σ₁, σ₂. (For example, X₁ may represent the tensile strength (measured in p.s.i.) of a steel cable and X₂ may represent the strains applied on this cable. Then P(X₁ − X₂ > 0) is the probability that the cable does not break.)

7.3.6 Let X_i, i = 1, . . . , m and Y_j, j = 1, . . . , n be independent r.v.s such that the X's are distributed as N(μ₁, σ₁²) and the Y's are distributed as N(μ₂, σ₂²). Then

i) Calculate the probability P(X̄ > Ȳ) as a function of m, n, μ₁, μ₂ and σ₁, σ₂;
ii) Give the numerical value of this probability for m = 10, n = 15, μ₁ = μ₂ and σ₁² = σ₂² = 6.

7.3.7 Let X₁ and X₂ be independent r.v.s distributed as χ²_{r₁} and χ²_{r₂}, respectively, and for any two constants c₁ and c₂, set X = c₁X₁ + c₂X₂. Under what conditions on c₁ and c₂ is the r.v. X distributed as χ²_r? Also, specify r.
7.3.8 Let X_j, j = 1, . . . , n be independent r.v.s distributed as N(μ, σ²) and set

X = Σ_{j=1}^{n} α_jX_j, Y = Σ_{j=1}^{n} β_jX_j.
To start with, consider the probability space (S, A, P) and recall that k events A₁, . . . , A_k are said to be independent if for all 2 ≤ m ≤ k and all 1 ≤ i₁ < ··· < i_m ≤ k, it holds that P(A_{i₁} ∩ ··· ∩ A_{i_m}) = P(A_{i₁}) ··· P(A_{i_m}). This definition extends to classes of events and to σ-fields. In the present context, set

A_j = X_j⁻¹(B) and C_j = X_j⁻¹({(−∞, x]; x ∈ ℝ}), j = 1, . . . , k.

Then if C_j, j = 1, . . . , k are independent, so are A_j, j = 1, . . . , k.

PROOF By Definition 3, independence of the r.v.s X_j, j = 1, . . . , k means independence of the σ-fields. That independence of those σ-fields is implied by independence of the classes C_j, j = 1, . . . , k, is an involved result in probability theory and it cannot be discussed here. ▪
We may now proceed with the proof of Lemma 1.

PROOF OF LEMMA 1 In the first place, if X is an r.v. and A_X = X⁻¹(B), and if g(X) is a measurable function of X and A_{g(X)} = [g(X)]⁻¹(B), then A_{g(X)} ⊆ A_X. In fact, let A ∈ A_{g(X)}. Then there exists B ∈ B such that A = [g(X)]⁻¹(B). But

[g(X)]⁻¹(B) = X⁻¹[g⁻¹(B)] ∈ A_X,

since g⁻¹(B) ∈ B. Now set

A*_j = [g_j(X_j)]⁻¹(B), j = 1, . . . , k.

Then

A*_j ⊆ A_j, j = 1, . . . , k,

and the independence of the A_j's implies that of the A*_j's. ▪
Exercise
7.4.1 Consider the probability space (S, A, P) and let A₁, A₂ be events. Set X₁ = I_{A₁}, X₂ = I_{A₂} and show that X₁, X₂ are independent if and only if A₁, A₂ are independent.
Chapter 8
DEFINITION 1 i) We say that {X_n} converges almost surely (a.s.), or with probability one, to X as n → ∞, and we write X_n →(a.s.) X, or X_n → X with probability 1, or P[X_n → X] = 1, if X_n(s) → X(s) for all s ∈ S except possibly for a subset N of S such that P(N) = 0.

Thus X_n →(a.s.) X means that for every ε > 0 and for every s ∈ N^c there exists N(ε, s) > 0 such that

|X_n(s) − X(s)| < ε

for all n ≥ N(ε, s). This type of convergence is also known as strong convergence.

ii) We say that {X_n} converges in probability (P) to X as n → ∞, and we write X_n →(P) X, if for every ε > 0, P[|X_n − X| > ε] → 0 as n → ∞.

Thus X_n →(P) X means that: For every ε, δ > 0 there exists N(ε, δ) > 0 such that P[|X_n − X| > ε] < δ for all n ≥ N(ε, δ).

REMARK 1 Since P[|X_n − X| > ε] + P[|X_n − X| ≤ ε] = 1, then X_n →(P) X is equivalent to: P[|X_n − X| ≤ ε] → 1 as n → ∞. Also, if P[|X_n − X| > ε] → 0 for every ε > 0, then clearly P[|X_n − X| ≥ ε] → 0 as n → ∞.
8.1 Some Modes of Convergence 181
EXAMPLE 1 Let
fn(x) = 1/2, if x = 1 − 1/n or x = 1 + 1/n; 0, otherwise.
Then, clearly, fn(x) → f(x) = 0 as n → ∞ for all x ∈ ℝ, and f(x) is not a p.d.f.
Next, the d.f. Fn corresponding to fn is given by
Fn(x) = 0, if x < 1 − (1/n); 1/2, if 1 − (1/n) ≤ x < 1 + (1/n); 1, if x ≥ 1 + (1/n).
[Figure 8.1: the graph of Fn, a step function with jumps of size 1/2 at the points 1 − (1/n) and 1 + (1/n).]
Then Fn(x) → F(x) as n → ∞ for all x ≠ 1, where
F(x) = 0, if x < 1; 1, if x ≥ 1,
which is a d.f.
Under further conditions on fn, f, it may be the case, however, that fn
converges to a p.d.f. f.
We now assume that E|Xn|² < ∞, n = 1, 2, . . . . Then:
Exercises
8.1.1 For n = 1, 2, . . . , let Xn be independent r.v.s such that
P(Xn = 1) = pn,  P(Xn = 0) = 1 − pn.
Under what conditions on the pn's does Xn →P 0 as n → ∞?
8.1.4 For n = 1, 2, . . . , let Xn, Yn be r.v.s such that E(Xn − Yn)² → 0 as n → ∞, and suppose that E(Xn − X)² → 0 for some r.v. X. Then show that Yn → X in quadratic mean (q.m.).
8.1.5 Let Xj, j = 1, . . . , n be independent r.v.s distributed as U(0, 1), and set Yn = min(X1, . . . , Xn), Zn = max(X1, . . . , Xn), Un = nYn, Vn = n(1 − Zn). Then show that, as n → ∞, one has
i) Yn →P 0;
ii) Zn →P 1;
iii) Un →d U;
iv) Vn →d V, where U and V have the negative exponential distribution with parameter λ = 1.
THEOREM 1 i) Xn → X a.s. implies Xn →P X.
ii) Xn → X in q.m. implies Xn →P X.
iii) Xn →P X implies Xn →d X. The converse is also true if X is degenerate; that is, P[X = c] = 1 for some constant c. In terms of a diagram this is
a.s. convergence ⟹ convergence in probability ⟹ convergence in distribution;
convergence in q.m. ⟹ convergence in probability.
PROOF
i) Let A be the subset of S on which Xn → X as n → ∞. Then it is not hard to see (see Exercise 8.2.5) that
A = ∩_{k=1}^∞ ∪_{n=1}^∞ ∩_{r=1}^∞ (|X_{n+r} − X| < 1/k),
so that the set Aᶜ on which Xn does not converge to X is given by
Aᶜ = ∪_{k=1}^∞ ∩_{n=1}^∞ ∪_{r=1}^∞ (|X_{n+r} − X| ≥ 1/k).
The sets A, Aᶜ, as well as those appearing in the remainder of this discussion, are all events, and hence we can take their probabilities. By setting
Bk = ∩_{n=1}^∞ ∪_{r=1}^∞ (|X_{n+r} − X| ≥ 1/k),
we have Bk ↑ Aᶜ, as k → ∞, so that P(Bk) → P(Aᶜ), by Theorem 2, Chapter 2. Thus if Xn → X a.s., then P(Aᶜ) = 0, and therefore P(Bk) = 0, k ≥ 1. Next, it is clear that for every fixed k, and as n → ∞, Cn ↓ Bk, where
Cn = ∪_{r=1}^∞ (|X_{n+r} − X| ≥ 1/k).
Hence P(Cn) → P(Bk) = 0 by Theorem 2, Chapter 2, again. To summarize: if Xn → X a.s., which is equivalent to saying that P(Aᶜ) = 0, one has that P(Cn) → 0. But for any fixed positive integer m,
(|X_{n+m} − X| ≥ 1/k) ⊆ ∪_{r=1}^∞ (|X_{n+r} − X| ≥ 1/k),
so that
P(|X_{n+m} − X| ≥ 1/k) ≤ P[∪_{r=1}^∞ (|X_{n+r} − X| ≥ 1/k)] = P(Cn) → 0 as n → ∞
for every k ≥ 1. However, this is equivalent to saying that Xn →P X, as was to be seen.
ii) By special case 1 (applied with r = 2) of Theorem 1, we have
P[|Xn − X| > ε] ≤ E|Xn − X|²/ε².
184 8 Basic Limit Theorems
Thus, if Xn → X in q.m., then E|Xn − X|² → 0, which implies P[|Xn − X| > ε] → 0 for every ε > 0, or equivalently, Xn →P X.
iii) Let x ∈ ℝ be a continuity point of F and let ε > 0 be given. Then we have
(X ≤ x − ε) = (Xn ≤ x, X ≤ x − ε) ∪ (Xn > x, X ≤ x − ε)
⊆ (Xn ≤ x) ∪ (Xn > x, X ≤ x − ε)
⊆ (Xn ≤ x) ∪ (|Xn − X| ≥ ε),
since
(Xn > x, X ≤ x − ε) ⊆ (Xn − X > ε) ⊆ (|Xn − X| ≥ ε).
So
(X ≤ x − ε) ⊆ (Xn ≤ x) ∪ (|Xn − X| ≥ ε)
implies
P(X ≤ x − ε) ≤ P(Xn ≤ x) + P(|Xn − X| ≥ ε),
or
F(x − ε) ≤ Fn(x) + P(|Xn − X| ≥ ε).
Thus, if Xn →P X, then we have by taking limits
F(x − ε) ≤ lim inf_{n→∞} Fn(x).  (1)
In a similar manner one can show that
lim sup_{n→∞} Fn(x) ≤ F(x + ε).  (2)
But (1) and (2) imply F(x − ε) ≤ lim inf Fn(x) ≤ lim sup Fn(x) ≤ F(x + ε). Letting ε → 0, we get (by the fact that x is a continuity point of F) that
F(x) ≤ lim inf_{n→∞} Fn(x) ≤ lim sup_{n→∞} Fn(x) ≤ F(x).
Hence lim Fn(x) exists and equals F(x). Assume now that P[X = c] = 1. Then
F(x) = 0, if x < c; 1, if x ≥ c,
and our assumption is that Fn(x) → F(x), x ≠ c. We must show that Xn →P c. We have, for any ε > 0,
P[|Xn − c| ≤ ε] = P[−ε ≤ Xn − c ≤ ε]
= P[c − ε ≤ Xn ≤ c + ε]
= P[Xn ≤ c + ε] − P[Xn < c − ε]
≥ P[Xn ≤ c + ε] − P[Xn ≤ c − ε]
= Fn(c + ε) − Fn(c − ε).
8.2 Relationships Among the Various Modes of Convergence 185
Hence
lim_{n→∞} P[|Xn − c| ≤ ε] ≥ F(c + ε) − F(c − ε) = 1 − 0 = 1.
Thus
P[|Xn − c| ≤ ε] → 1 as n → ∞.
In fact,
E(Xn − c)² = E[(Xn − EXn) + (EXn − c)]²
= E(Xn − EXn)² + (EXn − c)²
= σ²(Xn) + (EXn − c)².
Hence E(Xn − c)² → 0 if and only if σ²(Xn) → 0 and EXn → c, as n → ∞.
REMARK 6 The following example shows that the converse of (iii) is not true.
EXAMPLE 4 Let S = {1, 2, 3, 4}, and on the subsets of S, let P be the discrete uniform function. Define the following r.v.s:
Xn(1) = Xn(2) = 1, Xn(3) = Xn(4) = 0, n = 1, 2, . . . ,
and
X(1) = X(2) = 0, X(3) = X(4) = 1.
Then
|Xn(s) − X(s)| = 1 for all s ∈ S.
Hence Xn does not converge in probability to X, as n → ∞. Now,
F_{Xn}(x) = 0, if x < 0; 1/2, if 0 ≤ x < 1; 1, if x ≥ 1,  and  F_X(x) = 0, if x < 0; 1/2, if 0 ≤ x < 1; 1, if x ≥ 1,
so that F_{Xn}(x) = F_X(x) for all x ∈ ℝ. Thus, trivially, F_{Xn}(x) → F_X(x) for all x ∈ ℝ.
Exercises
8.2.1 (Rényi) Let S = [0, 1) and let P be the probability function on subsets of S which assigns to intervals probability equal to their lengths. For n = 1, 2, . . . , define the r.v.s Xn as follows:
X_{2^N + j}(s) = N, if j/2^N ≤ s < (j + 1)/2^N; 0, otherwise,
j = 0, 1, . . . , 2^N − 1, N = 1, 2, . . . . Then show that
i) Xn →P 0;
ii) Xn(s) does not converge to 0, as n → ∞, for any s ∈ [0, 1);
iii) Xn²(s) does not converge to 0, as n → ∞, for s ∈ (0, 1);
iv) EXn → 0 as n → ∞.
8.2.2 For n = 1, 2, . . . , let Xn be r.v.s distributed as B(n, pn), where npn = λn → λ (> 0) as n → ∞. Then, by using ch.f.s, show that Xn →d X, where X is an r.v. distributed as P(λ).
8.2.3 For n = 1, 2, . . . , let Xn be r.v.s having the negative binomial distribution with pn and rn such that pn → 1, rn → ∞ as n → ∞, so that rn(1 − pn) = λn → λ (> 0). Show that Xn →d X, where X is an r.v. distributed as P(λ). (Use ch.f.s.)
8.2.4 If the i.i.d. r.v.s Xj, j = 1, . . . , n have a Cauchy distribution, show that there is no finite constant c for which X̄n →P c. (Use ch.f.s.)
8.2.5 In reference to the proof of Theorem 1, show that the set A of convergence of {Xn} to X is, indeed, expressed by
A = ∩_{k=1}^∞ ∪_{n=1}^∞ ∩_{r=1}^∞ (|X_{n+r} − X| < 1/k).
X̄n = (1/n) Σ_{j=1}^n Xj,  Gn(x) = P[√n(X̄n − μ)/σ ≤ x],  and  Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−t²/2} dt.
Then Gn(x) → Φ(x) as n → ∞, for every x in ℝ.
REMARK 8
i) We often express (loosely) the CLT by writing
√n(X̄n − μ)/σ ≃ N(0, 1), or [Sn − E(Sn)]/σ(Sn) ≃ N(0, 1),
for large n, where Sn = Σ_{j=1}^n Xj, since
√n(X̄n − μ)/σ = [Sn − E(Sn)]/σ(Sn).
ii) In part (i), the notation Sn was used to denote the sum of the r.v.s X1, . . . , Xn. This is a generally accepted notation, and we are going to adhere to it here. It should be pointed out, however, that the same or similar symbols have been employed elsewhere to denote different quantities (see, for example, Corollaries 1 and 2 in Chapter 7, or Theorem 9 and the Corollary to Theorem 8 in this chapter). This point should be kept in mind throughout.
iii) In the proof of Theorem 3 and elsewhere, the little o notation will be employed as a convenient notation for the remainder in Taylor series expansions. A relevant comment would then be in order. To this end, let {an}, {bn}, n = 1, 2, . . . be two sequences of numbers. We say that {an} is o(bn) (little o of bn), and we write an = o(bn), if an/bn → 0 as n → ∞. For example, if an = n and bn = n², then an = o(bn), since n/n² = 1/n → 0. Clearly, if an = o(bn), then an = bn o(1). Therefore o(bn) = bn o(1).
iv) We recall the following fact, which was also employed in the proof of Theorem 3, Chapter 3. Namely, if an → a as n → ∞, then
(1 + an/n)ⁿ → eᵃ as n → ∞.
PROOF OF THEOREM 3 We may now begin the proof. Let gn be the ch.f. of Gn and φ be the ch.f. of Φ; that is, φ(t) = e^{−t²/2}, t ∈ ℝ. Then, by Theorem 2, it suffices to show that gn(t) → φ(t), t ∈ ℝ. Setting Zj = (Xj − μ)/σ, we have
√n(X̄n − μ)/σ = (1/√n) Σ_{j=1}^n (Xj − μ)/σ = (1/√n) Σ_{j=1}^n Zj,
8.3 The Central Limit Theorem 189
Now consider the Taylor expansion of g_{Z1} around zero up to the second order term. Then
g_{Z1}(t/√n) = g_{Z1}(0) + (t/√n) g′_{Z1}(0) + (t²/2!n) g″_{Z1}(0) + o(t²/n).
Since
g_{Z1}(0) = 1, g′_{Z1}(0) = iE(Z1) = 0, g″_{Z1}(0) = i²E(Z1²) = −1,
we get
g_{Z1}(t/√n) = 1 − t²/2n + o(t²/n) = 1 − t²/2n + (t²/n)o(1) = 1 − (t²/2n)[1 − o(1)].
Thus
gn(t) = {1 − (t²/2n)[1 − o(1)]}ⁿ → e^{−t²/2} = φ(t) as n → ∞,
by Remark 8(iv).
The theorem just established has the following corollary, which along with
the theorem itself provides the justification for many approximations.
COROLLARY The convergence Gn(x) → Φ(x) is uniform in x ∈ ℝ.
(That is, for every x ∈ ℝ and every ε > 0 there exists N(ε) > 0 independent of x, such that |Gn(x) − Φ(x)| < ε for all n ≥ N(ε) and all x ∈ ℝ simultaneously.)
PROOF It is an immediate consequence of Lemma 1 in Section 8.6*.
The following examples are presented for the purpose of illustrating the
theorem and its corollary.
8.3.1 Applications
1. If Xj, j = 1, . . . , n are i.i.d. with E(Xj) = μ, σ²(Xj) = σ², the CLT is used to give an approximation to P[a < Sn ≤ b], −∞ < a < b < +∞. We have:
P[a < Sn ≤ b] = P{[a − E(Sn)]/σ(Sn) < [Sn − E(Sn)]/σ(Sn) ≤ [b − E(Sn)]/σ(Sn)}
= P{(a − nμ)/(σ√n) < [Sn − E(Sn)]/σ(Sn) ≤ (b − nμ)/(σ√n)}
= P{[Sn − E(Sn)]/σ(Sn) ≤ (b − nμ)/(σ√n)} − P{[Sn − E(Sn)]/σ(Sn) ≤ (a − nμ)/(σ√n)}
≈ Φ(b*) − Φ(a*),
where
a* = (a − nμ)/(σ√n),  b* = (b − nμ)/(σ√n).
(Here is where the corollary is utilized. The points a* and b* do depend on n, and therefore move along ℝ as n → ∞. The above approximation would not be valid if the convergence were not uniform in x ∈ ℝ.) That is, P(a < Sn ≤ b) ≈ Φ(b*) − Φ(a*).
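The approximation above is straightforward to compute, since Φ(x) = [1 + erf(x/√2)]/2. A minimal sketch (the function names and the U(0, 1) example are ours, not the book's):

```python
import math

def Phi(x):
    # standard normal d.f. via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def clt_approx(a, b, n, mu, sigma):
    # P(a < S_n <= b) ~ Phi(b*) - Phi(a*), with a* = (a - n*mu)/(sigma*sqrt(n))
    s = sigma * math.sqrt(n)
    return Phi((b - n * mu) / s) - Phi((a - n * mu) / s)

# e.g. n = 100 i.i.d. U(0,1) summands: mu = 1/2, sigma^2 = 1/12
print(round(clt_approx(45, 55, 100, 0.5, math.sqrt(1 / 12)), 4))
```

The same helper is reusable for the Binomial and Poisson special cases that follow.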
2. Normal approximation to the Binomial. This is the same problem as above, where now Xj, j = 1, . . . , n, are independently distributed as B(1, p). We have μ = p, σ = √(pq). Thus:
P(a < Sn ≤ b) ≈ Φ(b*) − Φ(a*),
where
a* = (a − np)/√(npq),  b* = (b − np)/√(npq).
REMARK 9 It is seen that the approximation is fairly good provided n and p are such that npq ≥ 20. For a given n, the approximation is best for p = 1/2 and deteriorates as p moves away from 1/2. Some numerical examples will shed some light on these points. Also, the Normal approximation to the Binomial distribution presented above can be improved if, in the expressions of a* and b*, we replace a and b by a + 0.5 and b + 0.5, respectively. This is called the continuity correction. In the following we give an explanation of the continuity correction. To start with, let
fn(r) = C(n, r) p^r q^{n−r}  (C(n, r) the binomial coefficient), and let  φn(x) = (1/√(2πnpq)) e^{−x²/2},
where
x = (r − np)/√(npq).
Then it can be shown that fn(r)/φn(x) → 1 as n → ∞, and this convergence is uniform for all x's in a finite interval [a, b]. (This is the De Moivre theorem.) Thus for large n, we have, in particular, that fn(r) is close to φn(x). That is, the probability C(n, r)p^r q^{n−r} is approximately equal to the value
(1/√(2πnpq)) exp[−(r − np)²/(2npq)]
of the normal density with mean np and variance npq for sufficiently large n. Note that this asymptotic relationship of the p.d.f.s is not implied, in general, by the convergence of the distribution functions in the CLT.
To give an idea of how the correction term 0.5 comes in, we refer to Fig. 8.2, drawn for n = 10, p = 0.2.
[Figure 8.2: the B(10, 0.2) probabilities plotted as bars over 0, 1, . . . , 5, together with the N(2, 1.6) density curve.]
Now
P(1 < Sn ≤ 3) = P(2 ≤ Sn ≤ 3) = fn(2) + fn(3) = shaded area,
while the approximation without correction is the area bounded by the normal
curve, the horizontal axis, and the abscissas 1 and 3. Clearly, the correction,
given by the area bounded by the normal curve, the horizontal axis and the
abscissas 1.5 and 3.5, is closer to the exact area.
To summarize, under the conditions of the CLT, and for discrete r.v.s,
P(a < Sn ≤ b) ≈ Φ((b − nμ)/(σ√n)) − Φ((a − nμ)/(σ√n)) without continuity correction,
and
P(a < Sn ≤ b) ≈ Φ((b + 0.5 − nμ)/(σ√n)) − Φ((a + 0.5 − nμ)/(σ√n)) with continuity correction.
In particular, for integer-valued r.v.s and probabilities of the form P(a ≤ Sn ≤ b), we first rewrite the expression as follows:
P(a ≤ Sn ≤ b) = P(a − 1 < Sn ≤ b),  (3)
and then apply the above approximations in order to obtain:
P(a ≤ Sn ≤ b) ≈ Φ(b*) − Φ(a*) without continuity correction,
where
a* = (a − 1 − nμ)/(σ√n),  b* = (b − nμ)/(σ√n),  (4)
and
P(a ≤ Sn ≤ b) ≈ Φ(b′) − Φ(a′) with continuity correction,
where
a′ = (a − 0.5 − nμ)/(σ√n),  b′ = (b + 0.5 − nμ)/(σ√n).  (5)
These expressions of a*, b* and a′, b′ in (4) and (5) will be used in calculating probabilities of the form (3) in the numerical examples below.
EXAMPLE 5 (Numerical) For n = 100 and p1 = 1/2, p2 = 5/16, find P(45 ≤ Sn ≤ 55).
i) For p1 = 1/2: Exact value: 0.7288.
Normal approximation without correction:
a* = (44 − 100·(1/2))/√(100·(1/2)·(1/2)) = −6/5 = −1.2,
b* = (55 − 100·(1/2))/√(100·(1/2)·(1/2)) = 5/5 = 1.
Thus
Φ(b*) − Φ(a*) = Φ(1) − Φ(−1.2) = Φ(1) + Φ(1.2) − 1 = 0.841345 + 0.884930 − 1 = 0.7263.
Normal approximation with correction:
a′ = (45 − 0.5 − 50)/5 = −5.5/5 = −1.1,
b′ = (55 + 0.5 − 50)/5 = 5.5/5 = 1.1.
Thus
Φ(b′) − Φ(a′) = Φ(1.1) − Φ(−1.1) = 2Φ(1.1) − 1 = 2 × 0.864334 − 1 = 0.7286.
Error without correction: 0.7288 − 0.7263 = 0.0025.
Error with correction: 0.7288 − 0.7286 = 0.0002.
ii) For p2 = 5/16, working as above, we get:
Exact value: 0.0000.
a* = 2.75, b* = 5.12, so that Φ(b*) − Φ(a*) = 0.0030.
a′ = 2.86, b′ = 5.23, so that Φ(b′) − Φ(a′) = 0.0021.
Then:
Error without correction: 0.0030.
Error with correction: 0.0021.
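The numbers in Example 5(i) can be reproduced directly: the exact Binomial sum against the two Normal approximations, with Φ computed via the error function (all helper names are ours):

```python
import math
from math import comb

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n, p = 100, 0.5
q = 1 - p
# exact: P(45 <= S_n <= 55) for S_n ~ B(100, 1/2)
exact = sum(comb(n, r) * p**r * q**(n - r) for r in range(45, 56))
s = math.sqrt(n * p * q)
no_corr = Phi((55 - n * p) / s) - Phi((45 - 1 - n * p) / s)        # a* = -1.2, b* = 1
with_corr = Phi((55 + 0.5 - n * p) / s) - Phi((45 - 0.5 - n * p) / s)  # a' = -1.1, b' = 1.1
print(round(exact, 4), round(no_corr, 4), round(with_corr, 4))
```

The output matches the book's 0.7288, 0.7263, and 0.7286 to four decimals.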
3. Normal approximation to the Poisson. This is the same problem as in (1), where now Xj, j = 1, . . . , n are independent P(λ). We have μ = λ, σ = √λ. Thus
P(a < Sn ≤ b) ≈ Φ((b − nλ)/√(nλ)) − Φ((a − nλ)/√(nλ)) without continuity correction,
and
P(a < Sn ≤ b) ≈ Φ((b + 0.5 − nλ)/√(nλ)) − Φ((a + 0.5 − nλ)/√(nλ)) with continuity correction.
Probabilities of the form P(a ≤ Sn ≤ b) are approximated as follows:
P(a ≤ Sn ≤ b) ≈ Φ(b*) − Φ(a*) without continuity correction,
where
a* = (a − 1 − nλ)/√(nλ),  b* = (b − nλ)/√(nλ),
and
P(a ≤ Sn ≤ b) ≈ Φ(b′) − Φ(a′) with continuity correction,
where
a′ = (a − 0.5 − nλ)/√(nλ),  b′ = (b + 0.5 − nλ)/√(nλ).
EXAMPLE 6 (Numerical) For n = 16, find P(12 ≤ Sn ≤ 21). We have:
Exact value: 0.7838.
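Assuming λ = 1 (our reading — the value of λ is not legible in the source, but λ = 1, which makes Sn distributed as P(16), reproduces the quoted exact value 0.7838), the exact value and the two approximations can be computed as follows:

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

n, lam = 16, 1.0
mean = n * lam                      # S_n is distributed as P(n*lam) = P(16)
exact = sum(poisson_pmf(k, mean) for k in range(12, 22))
s = math.sqrt(mean)
no_corr = Phi((21 - mean) / s) - Phi((12 - 1 - mean) / s)
with_corr = Phi((21 + 0.5 - mean) / s) - Phi((12 - 0.5 - mean) / s)
print(round(exact, 4), round(no_corr, 4), round(with_corr, 4))
```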
Exercises
8.3.1 Refer to Exercise 4.1.12 of Chapter 4 and suppose that another manu-
facturing process produces light bulbs whose mean life is claimed to be 10%
higher than the mean life of the bulbs produced by the process described in the
exercise cited above. How many bulbs manufactured by the new process must
be examined, so as to establish the claim of their superiority with probability
0.95?
8.3.2 A fair die is tossed independently 1,200 times. Find the approximate
probability that the number of ones X is such that 180 X 220. (Use the
CLT.)
8.3.3 Fifty balanced dice are tossed once and let X be the sum of the
upturned spots. Find the approximate probability that 150 X 200. (Use the
CLT.)
8.3.4 Let Xj, j = 1, . . . , 100 be independent r.v.s distributed as B(1, p). Find the exact and approximate value for the probability P(Σ_{j=1}^{100} Xj = 50). (For the latter, use the CLT.)
8.3.5 One thousand cards are drawn with replacement from a standard deck
of 52 playing cards, and let X be the total number of aces drawn. Find the
approximate probability that 65 X 90. (Use the CLT.)
8.3.6 A Binomial experiment with probability p of a success is repeated 1,000 times and let X be the number of successes. For p = 1/2 and p = 1/4, find the exact and approximate values of the probability P(1,000p − 50 ≤ X ≤ 1,000p + 50). (For the latter, use the CLT.)
8.3.7 From a large collection of bolts which is known to contain 3% defec-
tive bolts, 1,000 are chosen at random. If X is the number of the defective bolts
among those chosen, what is the probability that this number does not exceed 5%
of 1,000? (Use the CLT.)
8.3.8 Suppose that 53% of the voters favor a certain legislative proposal.
How many voters must be sampled, so that the observed relative frequency of
those favoring the proposal will not differ from the assumed frequency by
more than 2% with probability 0.99? (Use the CLT.)
8.3.9 In playing a game, you win or lose $1 with probability 1/2. If you play the
game independently 1,000 times, what is the probability that your fortune (that
is, the total amount you won or lost) is at least $10? (Use the CLT.)
8.3.10 A certain manufacturing process produces vacuum tubes whose life-
times in hours are independently distributed r.v.s with Negative Exponential
distribution with mean 1,500 hours. What is the probability that the total life of
50 tubes will exceed 75,000 hours? (Use the CLT.)
8.3.11 Let Xj, j = 1, . . . , n be i.i.d. r.v.s such that EXj = μ is finite and σ²(Xj) = σ² = 4. If n = 100, determine the constant c so that P(|X̄n − μ| ≤ c) = 0.90. (Use the CLT.)
8.3.12 Let Xj, j = 1, . . . , n be i.i.d. r.v.s with EXj = μ finite and σ²(Xj) = σ² ∈ (0, ∞).
i) Show that the smallest value of the sample size n for which P(|X̄n − μ| ≤ kσ) ≥ p is given by n = [(1/k)Φ⁻¹((1 + p)/2)]² if this number is an integer, and n is
< 1). It may be assumed that acceptance and rejection of admission offers by
the various students are independent events.
i) How many students n must be admitted, so that the probability P(|X − c| ≤ d) is maximum, where X is the number of students actually accepting an admission, and d is a prescribed number?
ii) What is the value of n for c = 20, d = 2, and p = 0.6?
iii) What is the maximum value of the probability P(|X − 20| ≤ 2) for p = 0.6?
(Hint: For part (i), use the CLT (with continuity correction) in order to find the approximate value of P(|X − c| ≤ d). Then draw the picture of the normal curve, and conclude that the probability is maximized when n is close to c/p. For part (iii), there will be two successive values of n suggesting themselves as optimal values of n. Calculate the respective probabilities, and choose that value of n which gives the larger probability.)
THEOREM 5 (WLLN) If Xj, j = 1, . . . , n, are i.i.d. r.v.s with (finite) mean μ, then
X̄n = (X1 + · · · + Xn)/n →P μ as n → ∞.
PROOF
i) The proof is a straightforward application of Tchebichev's inequality under the unnecessary assumption that the r.v.s also have a finite variance σ². Then EX̄n = μ, σ²(X̄n) = σ²/n, so that, for every ε > 0,
P[|X̄n − μ| ≥ ε] ≤ σ²/(nε²) → 0 as n → ∞.
8.4 Laws of Large Numbers 197
ii) This proof is based on ch.f.s (m.g.f.s could also be used if they exist). By Theorems 1(iii) (the converse case) and 2(ii) of this chapter, in order to prove that X̄n →P μ, it suffices to prove that
φ_{X̄n}(t) → φ(t) = e^{itμ}, t ∈ ℝ.
For simplicity, writing φ_{ΣXj} instead of φ_{Σ_{j=1}^n Xj} when this last expression appears as a subscript, we have
φ_{X̄n}(t) = φ_{(1/n)ΣXj}(t) = φ_{ΣXj}(t/n) = [φ_{X1}(t/n)]ⁿ
= [1 + iμ(t/n) + o(t/n)]ⁿ
= [1 + iμ(t/n) + (t/n)o(1)]ⁿ
= {1 + [iμt + o(1)]/n}ⁿ → e^{itμ} as n → ∞,
by Remark 8(iv).
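The Tchebichev bound used in part (i) of the proof can be watched numerically; a sketch with B(1, p) summands (all names and parameter choices are ours, not the book's):

```python
import random

random.seed(1)
n, p, eps = 2000, 0.3, 0.03
trials = 500
sigma2 = p * (1 - p)
# empirical frequency of |X_bar_n - mu| >= eps over many replications
bad = 0
for _ in range(trials):
    s = sum(random.random() < p for _ in range(n))
    if abs(s / n - p) >= eps:
        bad += 1
freq = bad / trials
bound = sigma2 / (n * eps * eps)   # Tchebichev: P[|X_bar_n - mu| >= eps] <= sigma^2/(n*eps^2)
print(freq, "<=", bound)
```

The observed frequency is far below the bound, as Tchebichev's inequality is quite crude; the inequality itself is what drives the proof.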
REMARK 10 An alternative proof of the WLLN, without the use of ch.f.s,
is presented in Lemma 1 in Section 8.6*. The underlying idea there is that of
truncation, as will be seen.
Both laws of large numbers hold in all concrete cases which we have studied, except for the Cauchy case, where E(Xj) does not exist. For example, in the Binomial case, we have:
If Xj, j = 1, . . . , n are independent and distributed as B(1, p), then
X̄n = (X1 + · · · + Xn)/n → p a.s., as n → ∞,
and also in probability.
For the Poisson case we have:
If Xj, j = 1, . . . , n are independent and distributed as P(λ), then:
X̄n = (X1 + · · · + Xn)/n → λ a.s., as n → ∞,
and also in probability.
For x ∈ ℝ,
Fn(x) = (1/n)[the number of X1, . . . , Xn ≤ x].
Fn is a step function which is a d.f. for a fixed set of values of X1, . . . , Xn. It is also an r.v. as a function of the r.v.s X1, . . . , Xn, for each x. Let
Yj(x) = Yj = 1, if Xj ≤ x; 0, if Xj > x, j = 1, . . . , n.
Then, clearly,
Fn(x) = (1/n) Σ_{j=1}^n Yj.
On the other hand, Yj, j = 1, . . . , n are independent since the X's are, and Yj is B(1, p), where
p = P(Yj = 1) = P(Xj ≤ x) = F(x).
Hence
E(Σ_{j=1}^n Yj) = np = nF(x),  σ²(Σ_{j=1}^n Yj) = npq = nF(x)[1 − F(x)].
It follows that
E[Fn(x)] = (1/n)·nF(x) = F(x).
So for each x ∈ ℝ, we get by the LLN that
Fn(x) → F(x) a.s. and Fn(x) →P F(x), as n → ∞.
Actually, more is true. Namely,
THEOREM 6 (Glivenko–Cantelli Lemma) With the above notation, we have
P[sup{|Fn(x) − F(x)|; x ∈ ℝ} → 0, as n → ∞] = 1
(that is, Fn(x) → F(x) a.s. uniformly in x ∈ ℝ).
PROOF Omitted.
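The a.s. uniform convergence can be watched numerically: with U(0, 1) samples, F(x) = x on [0, 1], and the supremum over x is attained at jump points of Fn. A sketch (the function name is ours):

```python
import random

random.seed(7)

def sup_dist_uniform(sample):
    # sup over x of |F_n(x) - x| for F(x) = x on [0, 1]; it suffices to check
    # both sides of each jump point of the empirical d.f. F_n
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        d = max(d, abs((i + 1) / n - x), abs(i / n - x))
    return d

for n in (100, 10_000):
    print(n, sup_dist_uniform([random.random() for _ in range(n)]))
```

The printed sup-distance shrinks as n grows, in line with the Glivenko–Cantelli lemma (at roughly the 1/√n rate).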
Exercises
8.4.1 Let Xj, j = 1, . . . , n be i.i.d. r.v.s and suppose that EX_j^k is finite for a given positive integer k. Set
X̄n^(k) = (1/n) Σ_{j=1}^n X_j^k
for the kth sample moment of the distribution of the X's and show that
X̄n^(k) →P EX_1^k as n → ∞.
8.4.2 Let Xj, j = 1, . . . , n be i.i.d. r.v.s with p.d.f. given in Exercise 3.2.14 of
Chapter 3 and show that the WLLN holds. (Calculate the expectation by
means of the ch.f.)
8.5 Further Limit Theorems 199
X̄n − μ̄n →P 0 as n → ∞.
Show that if the X's are pairwise uncorrelated and σ²_j ≤ M (< ∞), j ≥ 1, then the generalized version of the WLLN holds.
P(Xj = j^α) = P(Xj = −j^α) = 1/2.
Show that for all α such that 0 < α < 1/2, the generalized WLLN holds.
8.4.5 Decide whether the generalized WLLN holds for independent r.v.s such that the jth r.v. has the Negative Exponential distribution with parameter λj = 2^{j/2}.
8.4.6 For j = 1, 2, . . . , let Xj be independent r.v.s such that Xj is distributed as χ²_j/j. Show that the generalized WLLN holds.
8.4.7 For j = 1, 2, . . . , let Xj be independent r.v.s such that Xj is distributed as P(λj). If {(1/n) Σ_{j=1}^n λj} remains bounded, show that the generalized WLLN holds.
THEOREM 7 i) Let Xn, n ≥ 1, and X be r.v.s, and let g: ℝ → ℝ be continuous, so that g(Xn), n ≥ 1, and g(X) are r.v.s. Then Xn → X a.s. implies g(Xn) → g(X) a.s.
ii) More generally, if for j = 1, . . . , k, X_n^(j), n ≥ 1, and Xj are r.v.s, and g: ℝ^k → ℝ is continuous, so that g(X_n^(1), . . . , X_n^(k)) and g(X1, . . . , Xk) are r.v.s, then
X_n^(j) → Xj a.s., j = 1, . . . , k, imply g(X_n^(1), . . . , X_n^(k)) → g(X1, . . . , Xk) a.s.
PROOF Follows immediately from the definition of the a.s. convergence and the continuity of g.
A similar result holds true when a.s. convergence is replaced by convergence in probability, but a justification is needed.
THEOREM 7′ i) Let Xn, n ≥ 1, X and g be as in Theorem 7(i), and suppose that Xn →P X. Then g(Xn) →P g(X).
ii) More generally, let again X_n^(j), Xj and g be as in Theorem 7(ii), and suppose that X_n^(j) →P Xj, j = 1, . . . , k. Then g(X_n^(1), . . . , X_n^(k)) →P g(X1, . . . , Xk).
PROOF
i) We have P(X ∈ ℝ) = 1, and if Mn ↑ ∞ (Mn > 0), then P(X ∈ [−Mn, Mn]) → 1. Thus there exists n0 sufficiently large such that
P([X ∈ (−∞, −M_{n0})] ∪ [X ∈ (M_{n0}, ∞)]) = P(|X| > M_{n0}) < ε/2  (M_{n0} > 1).
Define M = M_{n0}; we then have
P(|X| > M) < ε/2.
g, being continuous in ℝ, is uniformly continuous in [−2M, 2M]. Thus for every ε > 0, there exists δ(ε, M) = δ(ε) (< 1) such that |g(x) − g(x′)| < ε for all x, x′ ∈ [−2M, 2M] with |x − x′| < δ(ε). From Xn →P X we have that there exists N(ε) > 0 such that
P[|Xn − X| ≥ δ(ε)] < ε/2, n ≥ N(ε).
Set
A1 = [|X| ≤ M],  A2(n) = [|Xn − X| < δ(ε)],
and
A3(n) = [|g(Xn) − g(X)| < ε]  (for n ≥ N(ε)).
Then it is easily seen that on A1 ∩ A2(n), we have −2M < X < 2M, −2M < Xn < 2M, and hence
A1 ∩ A2(n) ⊆ A3(n),
which implies that
A3ᶜ(n) ⊆ A1ᶜ ∪ A2ᶜ(n).
Hence
P[A3ᶜ(n)] ≤ P(A1ᶜ) + P[A2ᶜ(n)] ≤ ε/2 + ε/2 = ε  (for n ≥ N(ε)).
That is, for n ≥ N(ε),
P[|g(Xn) − g(X)| ≥ ε] ≤ ε.
PROOF It suffices to take g as follows and apply the second part of the theorem:
i) g(x, y) = x + y,
ii) g(x, y) = ax + by,
iii) g(x, y) = xy,
iv) g(x, y) = x/y, y ≠ 0.
The following is in itself a very useful theorem.
THEOREM 8 If Xn →d X and Yn →P c, c a constant, then
i) Xn + Yn →d X + c;
ii) XnYn →d cX;
iii) Xn/Yn →d X/c, provided P(Yn ≠ 0) = 1, c ≠ 0.
Equivalently,
i) P(Xn + Yn ≤ z) = F_{Xn+Yn}(z) → F_{X+c}(z) = P(X + c ≤ z) = P(X ≤ z − c) = F_X(z − c);
ii) P(XnYn ≤ z) = F_{XnYn}(z) → F_{cX}(z) = P(cX ≤ z), which equals P(X ≤ z/c) = F_X(z/c) for c > 0, and P(X ≥ z/c) = 1 − F_X((z/c)−) for c < 0;
iii) P(Xn/Yn ≤ z) = F_{Xn/Yn}(z) → F_{X/c}(z), which equals P(X ≤ cz) = F_X(cz) for c > 0, and P(X ≥ cz) = 1 − F_X(cz−) for c < 0,
provided P(Yn ≠ 0) = 1.
202 8 Basic Limit Theorems
Thus
P(Xn/Yn ≤ z) ≤ P(|Yn − c| ≥ δ) + P[Xn ≤ z(c + δ)], z ∈ ℝ
(for z ≥ 0; for z < 0, the argument is the same with c + δ replaced by c − δ). Letting n → ∞ and taking into consideration the fact that P(|Yn − c| ≥ δ) → 0 and P[Xn ≤ z(c + δ)] → F_X[z(c + δ)], we obtain
lim sup_{n→∞} P(Xn/Yn ≤ z) ≤ F_X[z(c + δ)], z ∈ ℝ.  (6)
Next,
(|Yn − c| < δ) ∪ (|Yn − c| ≥ δ) = S, so that
[Xn ≤ z(c − δ)] ⊆ ([Xn ≤ z(c − δ)] ∩ (|Yn − c| < δ)) ∪ (|Yn − c| ≥ δ).
By choosing δ < c, we have that |Yn − c| < δ is equivalent to 0 < c − δ < Yn < c + δ, and hence
[Xn ≤ z(c − δ)] ∩ (|Yn − c| < δ) ⊆ (Xn/Yn ≤ z), if z ≥ 0,
and
[Xn ≤ z(c + δ)] ∩ (|Yn − c| < δ) ⊆ (Xn/Yn ≤ z), if z < 0.
In either case, then,
[Xn ≤ z(c − δ)] ⊆ (|Yn − c| ≥ δ) ∪ (Xn/Yn ≤ z), z ∈ ℝ
(with c − δ replaced by c + δ for z < 0), and hence
P[Xn ≤ z(c − δ)] ≤ P(|Yn − c| ≥ δ) + P(Xn/Yn ≤ z).
Letting n → ∞ and taking into consideration the fact that P(|Yn − c| ≥ δ) → 0 and P[Xn ≤ z(c − δ)] → F_X[z(c − δ)], we obtain
F_X[z(c − δ)] ≤ lim inf_{n→∞} P(Xn/Yn ≤ z), z ∈ ℝ.  (7)
Since, as δ → 0, F_X[z(c ± δ)] → F_X(zc) (zc being a continuity point of F_X), relations (6) and (7) imply that lim P(Xn/Yn ≤ z) exists and is equal to
F_X(zc) = P(X ≤ zc) = P(X/c ≤ z) = F_{X/c}(z).
Thus
P(Xn/Yn ≤ z) = F_{Xn/Yn}(z) → F_{X/c}(z), z ∈ ℝ,
as was to be seen.
REMARK 12 Theorem 8 is known as Slutsky's theorem.
Now, if Xj, j = 1, . . . , n, are i.i.d. r.v.s, we have seen that the sample variance
S²n = (1/n) Σ_{j=1}^n (Xj − X̄n)² = (1/n) Σ_{j=1}^n X²j − X̄²n.
Next, the r.v.s X²j, j = 1, . . . , n are i.i.d., since the X's are, and
E(X²j) = σ²(Xj) + (EXj)² = σ² + μ², if μ = E(Xj), σ² = σ²(Xj)
(which are assumed to exist). Therefore the SLLN and WLLN give the result that
(1/n) Σ_{j=1}^n X²j → σ² + μ² a.s., as n → ∞,
and also in probability (by the same theorems just referred to). So we have proved the following theorem.
REMARK 13 Of course, S²n →P σ² implies [n/(n − 1)]S²n →P σ², since n/(n − 1) → 1 as n → ∞.
COROLLARY TO THEOREM 8 If X1, . . . , Xn are i.i.d. r.v.s with mean μ and (positive) variance σ², then
√(n − 1)(X̄n − μ)/Sn →d N(0, 1) and also √n(X̄n − μ)/Sn →d N(0, 1).
PROOF In fact,
√n(X̄n − μ)/σ →d N(0, 1),
by Theorem 3, and
[√n/√(n − 1)](Sn/σ) →P 1,
cn[g(Xn) − g(d)] = cn(Xn − d)g′(X*n).  (9)
However, |X*n − d| ≤ |Xn − d| →P 0 by (8), so that X*n →P d, and therefore, by Theorem 7′(i) again,
g′(X*n) →P g′(d).  (10)
By assumption, convergence (10) and Theorem 8(ii), we have cn(Xn − d)g′(X*n) →d g′(d)X. This result and relation (9) complete the proof of the theorem.
COROLLARY Let the r.v.s X1, . . . , Xn be i.i.d. with mean μ ∈ ℝ and variance σ² ∈ (0, ∞), and let g: ℝ → ℝ be differentiable with derivative continuous at μ. Then, as n → ∞,
√n[g(X̄n) − g(μ)] →d N(0, [g′(μ)]²σ²).
APPLICATION If the r.v.s Xj, j = 1, . . . , n in the corollary are distributed as B(1, p), then, as n → ∞,
√n[X̄n(1 − X̄n) − pq] →d N(0, pq(1 − 2p)²).
Here μ = p, σ² = pq, and g(x) = x(1 − x), so that g′(x) = 1 − 2x. The result follows.
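The application can be checked by simulating √n[X̄n(1 − X̄n) − pq] and comparing its spread with the asymptotic standard deviation √(pq)|1 − 2p| (a sketch; names and parameter choices ours):

```python
import math, random, statistics

random.seed(5)
n, p, reps = 400, 0.2, 3000
q = 1 - p
vals = []
for _ in range(reps):
    xbar = sum(random.random() < p for _ in range(n)) / n
    vals.append(math.sqrt(n) * (xbar * (1 - xbar) - p * q))
# asymptotic s.d. from N(0, pq(1-2p)^2)
asym_sd = math.sqrt(p * q) * abs(1 - 2 * p)
print(statistics.stdev(vals), asym_sd)
```

For p = 0.2 the asymptotic s.d. is 0.4 × 0.6 = 0.24, and the simulated spread lands close to it; note that at p = 1/2 the factor 1 − 2p vanishes and a faster (non-normal) limit takes over.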
Exercises
8.5.1 Use Theorem 8(ii) in order to show that if the CLT holds, then so does
the WLLN.
8.5.2 Refer to the proof of Theorem 7′(i) and show that on the set A1 ∩ A2(n), we actually have −2M < Xn < 2M.
8.5.3 Carry out the proof of Theorem 7′(ii). (Use the usual Euclidean distance in ℝ^k.)
8.6* Pólya's Lemma and Alternative Proof of the WLLN
LEMMA 1 (Pólya) Let F and {Fn} be d.f.s such that Fn(x) → F(x) as n → ∞, x ∈ ℝ, and let F be continuous. Then the convergence is uniform in x ∈ ℝ. That is, for every ε > 0 there exists N(ε) > 0 such that n ≥ N(ε) implies that |Fn(x) − F(x)| < ε for every x ∈ ℝ.
PROOF Since F(x) → 0 as x → −∞, and F(x) → 1 as x → ∞, there exists an interval [α, β] such that
F(α) < ε/2, F(β) > 1 − ε/2.  (11)
The continuity of F implies its uniform continuity in [α, β]. Then there is a finite partition α = x1 < x2 < · · · < xr = β of [α, β] such that
F(x_{j+1}) − F(x_j) < ε/2, j = 1, . . . , r − 1.  (12)
Next, Fn(x_j) → F(x_j) implies that there exists N_j(ε) > 0 such that for all n ≥ N_j(ε),
|Fn(x_j) − F(x_j)| < ε/2, j = 1, . . . , r.
By taking
n ≥ N(ε) = max(N1(ε), . . . , Nr(ε)),
we then have that
|Fn(x_j) − F(x_j)| < ε/2, j = 1, . . . , r.  (13)
Let x0 = −∞, x_{r+1} = ∞. Then by the fact that F(−∞) = 0 and F(∞) = 1, relation (11) implies that
F(x1) − F(x0) < ε/2, F(x_{r+1}) − F(x_r) < ε/2.  (14)
Thus, by means of (12) and (14), we have that
F(x_{j+1}) − F(x_j) < ε/2, j = 0, 1, . . . , r.  (15)
Also (13) trivially holds for j = 0 and j = r + 1; that is, we have
|Fn(x_j) − F(x_j)| < ε/2, j = 0, 1, . . . , r + 1.  (16)
Next, let x be any real number. Then x_j ≤ x < x_{j+1} for some j = 0, 1, . . . , r. By (15) and (16) and for n ≥ N(ε), we have the following string of inequalities:
Fn(x) ≤ Fn(x_{j+1}) < F(x_{j+1}) + ε/2 < F(x_j) + ε ≤ F(x) + ε,
Fn(x) ≥ Fn(x_j) > F(x_j) − ε/2 > F(x_{j+1}) − ε ≥ F(x) − ε.
Hence |Fn(x) − F(x)| < ε. Thus for n ≥ N(ε), we have
|Fn(x) − F(x)| < ε for every x ∈ ℝ.  (17)
Relation (17) concludes the proof of the lemma.
Below, a proof of the WLLN (Theorem 5) is presented without using
ch.f.s. The basic idea is that of suitably truncating the r.v.s involved, and is
due to Khintchine; it was also used by Markov.
and
Zj(n) = Zj = 0, if |Xj| ≤ δn; Xj, if |Xj| > δn, j = 1, . . . , n  (δ > 0 fixed).
Then, clearly, Xj = Yj + Zj, j = 1, . . . , n. Let us restrict ourselves to the continuous case and let f be the (common) p.d.f. of the X's. Then,
σ²(Yj) = σ²(Y1) = E(Y1²) − (EY1)² ≤ E(Y1²)
= E{X1² I_{[|X1| ≤ δn]}(X1)}
= ∫ x² I_{[|x| ≤ δn]}(x) f(x) dx
= ∫_{−δn}^{δn} x² f(x) dx ≤ δn ∫_{−δn}^{δn} |x| f(x) dx ≤ δn ∫_{−∞}^{∞} |x| f(x) dx
= δn E|X1|;
that is,
σ²(Yj) ≤ δn E|X1|.  (18)
Next,
E(Yj) = E(Y1) = E{X1 I_{[|X1| ≤ δn]}(X1)} = ∫ x I_{[|x| ≤ δn]}(x) f(x) dx.
Now,
|x I_{[|x| ≤ δn]}(x) f(x)| ≤ |x| f(x),  x I_{[|x| ≤ δn]}(x) f(x) → x f(x) as n → ∞,
and
∫ |x| f(x) dx < ∞.
Therefore
∫ x I_{[|x| ≤ δn]}(x) f(x) dx → ∫ x f(x) dx = μ as n → ∞,
by Lemma C of Chapter 6; that is,
E(Yj) → μ as n → ∞.  (19)
Next, by Tchebichev's inequality and (18),
P[|(1/n) Σ_{j=1}^n Yj − (1/n) Σ_{j=1}^n EYj| ≥ ε] = P[|Σ_{j=1}^n Yj − E(Σ_{j=1}^n Yj)| ≥ nε]
≤ σ²(Σ_{j=1}^n Yj)/(n²ε²) = nσ²(Y1)/(n²ε²) ≤ n·δnE|X1|/(n²ε²) = δE|X1|/ε².  (20)
P[|(1/n) Σ_{j=1}^n Yj − μ| ≥ 2ε] = P[|(1/n) Σ_{j=1}^n (Yj − EY1) + (EY1 − μ)| ≥ 2ε]
≤ P[|(1/n) Σ_{j=1}^n Yj − EY1| ≥ ε] + P[|EY1 − μ| ≥ ε]
≤ δE|X1|/ε²
for n sufficiently large, by (19) and (20); that is,
P[|(1/n) Σ_{j=1}^n Yj − μ| ≥ 2ε] ≤ δE|X1|/ε²  (21)
for n large enough. Next,
P(Zj ≠ 0) = P(|Zj| > δn) = P(|Xj| > δn)
= ∫_{−∞}^{−δn} f(x) dx + ∫_{δn}^{∞} f(x) dx
= ∫_{(|x| > δn)} f(x) dx
< ∫_{(|x| > δn)} (|x|/δn) f(x) dx  (since |x|/δn > 1 on (|x| > δn))
= (1/δn) ∫_{(|x| > δn)} |x| f(x) dx
< (1/δn)·δ² = δ/n, since ∫_{(|x| > δn)} |x| f(x) dx < δ² for n sufficiently large.
PROOF Omitted.
As an application of Theorem 11, refer to Example 2 and consider the subsequence of r.v.s {X_{2^k − 1}}, where
X_{2^k − 1} = I_{[1 − 1/2^{k−1}, 1)}.
Then for ε > 0 and large enough k, so that 1/2^{k−1} < ε, we have
P(|X_{2^k − 1}| > ε) = P(X_{2^k − 1} = 1) = 1/2^{k−1} < ε.
Hence the subsequence {X_{2^k − 1}} of {Xn} converges to 0 in probability.
Exercises
8.6.1 Use Theorem 11 in order to prove Theorem 7(i).
8.6.2 Do likewise in order to establish part (ii) of Theorem 7.
212 9 Transformations of Random Variables and Random Vectors
Chapter 9
A = {xi; h(xi) = yj},
and hence
fY(yj) = P(Y = yj) = P_Y({yj}) = P_X(A) = Σ_{xi ∈ A} fX(xi),
9.1 The Univariate Case 213
where
fX(xi) = P(X = xi).
EXAMPLE 1 Let X take on the values −n, . . . , −1, 1, . . . , n, each with probability 1/2n, and let Y = X². Then Y takes on the values 1, 4, . . . , n² with probabilities found as follows: If B = {r²}, r = 1, . . . , n, then
A = h⁻¹(B) = (x² = r²) = (x = −r or x = r) = (x = −r) + (x = r) = {−r} + {r}.
Thus
P_Y(B) = P_X(A) = P_X({−r}) + P_X({r}) = 1/2n + 1/2n = 1/n.
That is,
P(Y = r²) = 1/n, r = 1, . . . , n.
EXAMPLE 2 Let X be P(λ) and let Y = h(X) = X² + 2X − 3. Then Y takes on the values
{y = x² + 2x − 3; x = 0, 1, . . .} = {−3, 0, 5, 12, . . .}.
From
x² + 2x − 3 = y,
we get
x² + 2x − (y + 3) = 0, so that x = −1 + √(y + 4)
(the root x = −1 − √(y + 4) is rejected, since x ≥ 0). Thus
A = h⁻¹(B) = {−1 + √(y + 4)},
and
P_Y(B) = P(Y = y) = P_X(A) = e^{−λ} λ^{−1+√(y+4)} / (−1 + √(y + 4))!.
For example, for y = 12, we have P(Y = 12) = e^{−λ}λ³/3!.
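Example 2 can be verified numerically; λ is arbitrary, and we take λ = 1 purely for illustration (function names ours):

```python
import math

lam = 1.0

def poisson_pmf(x, lam):
    return math.exp(-lam) * lam**x / math.factorial(x)

def prob_Y(y, lam):
    # Y = X**2 + 2X - 3 is one-to-one on x = 0, 1, ...: invert via x = -1 + sqrt(y + 4)
    x = round(-1 + math.sqrt(y + 4))
    assert x * x + 2 * x - 3 == y    # y must be in the range of h
    return poisson_pmf(x, lam)

print(prob_Y(12, lam), math.exp(-lam) * lam**3 / math.factorial(3))
```

The two printed numbers agree, matching P(Y = 12) = e^{−λ}λ³/3!.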
It is a fact, proved in advanced probability courses, that the distribution PX of
an r.v. X is uniquely determined by its d.f. FX. The same is true for r. vectors.
(A first indication that such a result is feasible is provided by Lemma 3 in
Chapter 7.) Thus, in determining the distribution PY of the r.v. Y above, it
suffices to determine its d.f., FY. This is easily done if the transformation h is
one-to-one from S onto T and monotone (increasing or decreasing), where S
is the set of values of X for which fX is positive and T is the image of S, under
h: that is, the set to which S is transformed by h. By one-to-one it is meant
that for each y T, there is only one x S such that h(x) = y. Then the inverse
FY(y) = P[h(X) ≤ y] = P{h⁻¹[h(X)] ≤ h⁻¹(y)}
= P(X ≤ x) = FX(x),
where x = h⁻¹(y) and h is increasing. In the case where h is decreasing, we have
FY(y) = P[h(X) ≤ y] = P{h⁻¹[h(X)] ≥ h⁻¹(y)}
= P[X ≥ h⁻¹(y)] = P(X ≥ x)
= 1 − P(X < x) = 1 − FX(x−),
where FX(x−) is the limit from the left of FX at x; FX(x−) = lim FX(y), y ↑ x.
REMARK 1  Figure 9.1 points out why the direction of the inequality is reversed when h⁻¹ is applied if h is monotone decreasing.

Figure 9.1  [Graph of a decreasing y = h(x): the event (y ≤ y₀) corresponds, under h, to (x ≥ x₀), where x₀ = h⁻¹(y₀).]

that is,
\[ F_Y(y) = F_X(\sqrt{y}) - F_X(-\sqrt{y}). \]
We will now focus attention on the case that X has a p.d.f. and we will
determine the p.d.f. of Y = h(X), under appropriate conditions.
One way of going about this problem would be to find the d.f. F_Y of the r.v. Y by Theorem 1 (take B = (−∞, y], y ∈ ℝ), and then determine the p.d.f. f_Y of Y, provided it exists, by differentiating (for the continuous case) F_Y at continuity points of f_Y. The following example illustrates the procedure.
EXAMPLE 4  In Example 3, assume that X is N(0, 1), so that
\[ f_X(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}. \]
Then, if Y = X², we know that
\[ F_Y(y) = F_X(\sqrt{y}) - F_X(-\sqrt{y}), \qquad y \ge 0. \]
Next,
\[ \frac{d}{dy}F_X(\sqrt{y}) = f_X(\sqrt{y})\,\frac{d\sqrt{y}}{dy} = \frac{1}{2\sqrt{y}}\cdot\frac{1}{\sqrt{2\pi}}e^{-y/2}, \]
and
\[ \frac{d}{dy}F_X(-\sqrt{y}) = -\frac{1}{2\sqrt{y}}\,f_X(-\sqrt{y}) = -\frac{1}{2\sqrt{2\pi}\,\sqrt{y}}e^{-y/2}, \]
so that
\[ \frac{d}{dy}F_Y(y) = f_Y(y) = \frac{1}{\sqrt{2\pi}}\,y^{-1/2}e^{-y/2} = \frac{1}{\Gamma(\tfrac12)2^{1/2}}\,y^{\frac12 - 1}e^{-y/2} \qquad\big(\text{since }\Gamma(\tfrac12) = \sqrt{\pi}\big), \]
y > 0 and zero otherwise. We recognize it as being the p.d.f. of a χ²₁-distributed r.v., which agrees with Theorem 3, Chapter 4.
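As a quick numerical sanity check of Example 4 (not part of the text), the following Python sketch compares the derived density of Y = X² with the χ²₁ form and checks a simulated d.f. of X² against F_X(√y) − F_X(−√y); the sample size and test point y₀ = 1 are arbitrary choices.

```python
import math
import random

def phi_cdf(x):
    # standard normal d.f. via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def f_Y(y):
    # derived p.d.f. of Y = X^2 for X ~ N(0, 1)
    return y ** -0.5 * math.exp(-y / 2.0) / math.sqrt(2.0 * math.pi)

def chisq1_pdf(y):
    # chi-square(1) p.d.f.: y^{1/2 - 1} e^{-y/2} / (Gamma(1/2) 2^{1/2})
    return y ** -0.5 * math.exp(-y / 2.0) / (math.gamma(0.5) * 2.0 ** 0.5)

# the two closed forms agree, since Gamma(1/2) = sqrt(pi)
agree = all(abs(f_Y(y) - chisq1_pdf(y)) < 1e-12 for y in (0.1, 0.5, 1.0, 3.0))

# Monte Carlo: empirical d.f. of X^2 matches F_X(sqrt(y)) - F_X(-sqrt(y))
random.seed(0)
sample = [random.gauss(0.0, 1.0) ** 2 for _ in range(200_000)]
y0 = 1.0
empirical = sum(s <= y0 for s in sample) / len(sample)
theoretical = phi_cdf(math.sqrt(y0)) - phi_cdf(-math.sqrt(y0))
```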
Another approach to the same problem is the following. Let X be an r.v. whose p.d.f. f_X is continuous on the set S of positivity of f_X. Let y = h(x) be a (measurable) transformation defined on ℝ into ℝ which is one-to-one on the set S onto the set T (the image of S under h). Then the inverse transformation x = h⁻¹(y) exists for y ∈ T. It is further assumed that h⁻¹ is differentiable and its derivative is continuous and different from zero on T. Set Y = h(X), so that Y is an r.v. Under the above assumptions, the p.d.f. f_Y of Y is given by the following expression:
\[ f_Y(y) = \begin{cases} f_X\big[h^{-1}(y)\big]\left|\dfrac{d}{dy}h^{-1}(y)\right|, & y \in T \\ 0, & \text{otherwise.} \end{cases} \]
For a sketch of the proof, let B = [c, d] be any interval in T and set A = h⁻¹(B). Then A is an interval in S and
\[ P(Y \in B) = P[h(X) \in B] = P(X \in A) = \int_A f_X(x)\,dx. \]
Under the assumptions made, the theory of changing the variable in the integral on the right-hand side above applies (see, for example, T. M. Apostol, Mathematical Analysis), and gives
\[ P(Y \in B) = \int_B f_X\big[h^{-1}(y)\big]\left|\frac{d}{dy}h^{-1}(y)\right|dy. \]
As an application, let X be N(μ, σ²) and set Y = aX + b, a ≠ 0. Here
\[ h^{-1}(y) = \frac{1}{a}(y - b) \qquad\text{and}\qquad \frac{d}{dy}h^{-1}(y) = \frac{1}{a}. \]
Therefore
\[ f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma|a|}\exp\left[-\frac{\left(\frac{y-b}{a}-\mu\right)^2}{2\sigma^2}\right] = \frac{1}{\sqrt{2\pi}\,|a|\sigma}\exp\left\{-\frac{\big[y - (a\mu + b)\big]^2}{2a^2\sigma^2}\right\}, \]
which is the p.d.f. of a normally distributed r.v. with mean aμ + b and variance a²σ². Thus, if X is N(μ, σ²), then aX + b is N(aμ + b, a²σ²).
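A minimal Python sketch of this computation (the parameters μ, σ, a, b below are arbitrary illustration values): the change-of-variable formula f_X(h⁻¹(y))|dh⁻¹/dy| is evaluated pointwise and compared with the N(aμ + b, a²σ²) density.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

mu, sigma, a, b = 1.5, 2.0, -3.0, 0.7   # hypothetical parameters; any a != 0 works

def f_Y(y):
    # change-of-variable formula: f_X(h^{-1}(y)) |d h^{-1}/dy|, h^{-1}(y) = (y - b)/a
    return normal_pdf((y - b) / a, mu, sigma) * (1.0 / abs(a))

# pointwise agreement with the N(a*mu + b, a^2 sigma^2) density over a grid
max_err = max(abs(f_Y(y) - normal_pdf(y, a * mu + b, abs(a) * sigma))
              for y in [-10 + 0.5 * i for i in range(41)])
```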
Now it may happen that the transformation h satisfies all the requirements of Theorem 2 except that it is not one-to-one from S onto T. Instead, the following might happen: There is a (finite) partition of S into sets S_j, on each of which h is one-to-one onto its image T_j. Then, with h_j⁻¹ denoting the inverse of the restriction of h to S_j, the contribution of the jth piece to the p.d.f. of Y is
\[ f_X\big[h_j^{-1}(y)\big]\left|\frac{d}{dy}h_j^{-1}(y)\right|, \qquad y \in T_j, \]
and, summing over those j for which y ∈ T_j,
\[ f_Y(y) = \sum_j f_X\big[h_j^{-1}(y)\big]\left|\frac{d}{dy}h_j^{-1}(y)\right|. \]
For illustration, for Y = X² take
\[ S_1 = (-\infty, 0], \quad S_2 = (0, \infty), \quad T_1 = [0, \infty), \quad T_2 = (0, \infty), \]
by assuming that f_X(x) > 0 for every x ∈ ℝ. Next,
\[ h_1^{-1}(y) = -\sqrt{y}, \qquad h_2^{-1}(y) = \sqrt{y}, \]
so that
\[ \frac{d}{dy}h_1^{-1}(y) = -\frac{1}{2\sqrt{y}}, \qquad \frac{d}{dy}h_2^{-1}(y) = \frac{1}{2\sqrt{y}}, \qquad y > 0. \]
Therefore the two contributions are
\[ f_X(-\sqrt{y}\,)\frac{1}{2\sqrt{y}} \qquad\text{and}\qquad f_X(\sqrt{y}\,)\frac{1}{2\sqrt{y}}, \]
and for y > 0, we then get
\[ f_Y(y) = \frac{1}{2\sqrt{y}}\Big[f_X(\sqrt{y}) + f_X(-\sqrt{y})\Big], \]
and f_Y(y) = 0 for y ≤ 0.
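A small numerical check of the two-branch formula (my choice of X ~ N(1, 1) is deliberate: the density is asymmetric, so both branches contribute different amounts). The formula for f_Y is integrated numerically and compared with a simulated probability.

```python
import math
import random

def f_X(x):
    # N(1, 1) density: positive on all of R, so both branches of x = ±sqrt(y) matter
    return math.exp(-(x - 1.0) ** 2 / 2.0) / math.sqrt(2.0 * math.pi)

def f_Y(y):
    # two-to-one formula: f_Y(y) = [f_X(sqrt(y)) + f_X(-sqrt(y))] / (2 sqrt(y)), y > 0
    r = math.sqrt(y)
    return (f_X(r) + f_X(-r)) / (2.0 * r)

# integrate f_Y over (0, y0] with the midpoint rule (avoids the y = 0 singularity)
y0, n = 2.0, 20_000
h = y0 / n
integral = sum(f_Y((i + 0.5) * h) for i in range(n)) * h

# compare with a Monte Carlo estimate of P(X^2 <= y0)
random.seed(1)
emp = sum(random.gauss(1.0, 1.0) ** 2 <= y0 for _ in range(200_000)) / 200_000
```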
Exercises
9.1.1 Let X be an r.v. with p.d.f. f given in Exercise 3.2.14 of Chapter 3 and determine the p.d.f. of the r.v. Y = X³.
9.1.2 Let X be an r.v. with p.d.f. of the continuous type and set Y = Σⁿⱼ₌₁ cⱼI_{Bⱼ}(X), where Bⱼ, j = 1, . . . , n, are pairwise disjoint (Borel) sets and cⱼ, j = 1, . . . , n, are constants.
i) Express the p.d.f. of Y in terms of that of X, and notice that Y is a discrete r.v. whereas X is an r.v. of the continuous type;
ii) If n = 3, X is N(99, 5) and B₁ = (95, 105), B₂ = (92, 95) + (105, 107), B₃ = (−∞, 92] + [107, ∞), determine the distribution of the r.v. Y defined above;
iii) If X is interpreted as a specified measurement taken on each item of a product made by a certain manufacturing process and cⱼ, j = 1, 2, 3, are the profits (in dollars) realized by selling one item under the condition that X ∈ Bⱼ, j = 1, 2, 3, respectively, find the expected profit from the sale of one item.
9.1.3 Let X, Y be r.v.s representing the temperature of a certain object in degrees Celsius and Fahrenheit, respectively. Then it is known that Y = (9/5)X + 32. If X is distributed as N(μ, σ²), determine the p.d.f. of Y, first by determining its d.f., and secondly directly.
9.1.4 If the r.v. X is distributed as Negative Exponential with parameter λ, find the p.d.f. of each one of the r.v.s Y, Z, where Y = eˣ, Z = log X, first by determining their d.f.s, and secondly directly.
9.1.5 If the r.v. X is distributed as U(α, β):
i) Derive the p.d.f.s of the following r.v.s: aX + b (a > 0), 1/(X + 1), X² + 1, eˣ, log X (for α > 0), first by determining their d.f.s, and secondly directly;
ii) What do the p.d.f.s in part (i) become for α = 0 and β = 1?
iii) For α = 0 and β = 1, let Y = −log X and suppose that the r.v.s Yⱼ, j = 1, . . . , n, are independent and distributed as the r.v. Y. Use the ch.f. approach to determine the p.d.f. of Σⁿⱼ₌₁ Yⱼ.
9.2 The Multivariate Case
\[ F_Y(y) = P(X_1 + X_2 \le y) = \iint_{\{x_1 + x_2 \le y\}} f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2. \]
For 2α < y ≤ α + β,
\[ F_Y(y) = \frac{1}{(\beta - \alpha)^2}A, \]
where A is the area of that part of the square lying to the left of the line x₁ + x₂ = y. Since for y ≤ α + β, A = (y − 2α)²/2, we get
\[ F_Y(y) = \frac{(y - 2\alpha)^2}{2(\beta - \alpha)^2} \qquad\text{for } 2\alpha < y \le \alpha + \beta. \]
Similarly, for α + β < y ≤ 2β,
\[ F_Y(y) = 1 - \frac{(2\beta - y)^2}{2(\beta - \alpha)^2}. \]
Thus we have:
\[ F_Y(y) = \begin{cases} 0, & y \le 2\alpha \\[1mm] \dfrac{(y - 2\alpha)^2}{2(\beta - \alpha)^2}, & 2\alpha < y \le \alpha + \beta \\[1mm] 1 - \dfrac{(2\beta - y)^2}{2(\beta - \alpha)^2}, & \alpha + \beta < y \le 2\beta \\[1mm] 1, & y > 2\beta. \end{cases} \]

Figure 9.2  [The square (α, β)² in the (x₁, x₂)-plane; the line x₁ + x₂ = y cuts off the region {x₁ + x₂ ≤ y}, whose area is A.]
REMARK 3  The d.f. of X₁ + X₂ for any two independent r.v.s (not necessarily U(α, β) distributed) is called the convolution of the d.f.s of X₁, X₂ and is denoted by F_{X₁+X₂} = F_{X₁} * F_{X₂}. We also write f_{X₁+X₂} = f_{X₁} * f_{X₂} for the corresponding p.d.f.s.
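For discrete r.v.s the convolution is a finite sum, which is easy to compute directly; the sketch below (plain Python, with two fair dice as an arbitrary example, as in Exercise 9.2.1) convolves two p.m.f.s stored as dictionaries.

```python
def convolve(pmf1, pmf2):
    # discrete convolution: (f1 * f2)(y) = sum_x f1(x) f2(y - x)
    out = {}
    for x1, p1 in pmf1.items():
        for x2, p2 in pmf2.items():
            out[x1 + x2] = out.get(x1 + x2, 0.0) + p1 * p2
    return out

die = {x: 1.0 / 6.0 for x in range(1, 7)}   # one fair die
two = convolve(die, die)                    # distribution of X1 + X2
```

The resulting p.m.f. is the familiar triangular one on {2, . . . , 12}, with P(X₁ + X₂ = 7) = 6/36.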
\[ P(X_1 = y_1 - y_2)\,P(X_2 = y_2) = \binom{n_1}{y_1 - y_2}p^{y_1 - y_2}q^{n_1 - (y_1 - y_2)}\binom{n_2}{y_2}p^{y_2}q^{n_2 - y_2} = \binom{n_1}{y_1 - y_2}\binom{n_2}{y_2}p^{y_1}q^{(n_1 + n_2) - y_1}; \]
that is,
\[ f_{Y_1,Y_2}(y_1, y_2) = \binom{n_1}{y_1 - y_2}\binom{n_2}{y_2}p^{y_1}q^{(n_1 + n_2) - y_1}, \qquad 0 \le y_1 \le n_1 + n_2, \quad u \le y_2 \le v, \]
where
\[ u = \max(0,\ y_1 - n_1), \qquad v = \min(y_1,\ n_2). \]
Thus
\[ f_{Y_1}(y_1) = P(Y_1 = y_1) = \sum_{y_2 = u}^{v} f_{Y_1,Y_2}(y_1, y_2) = p^{y_1}q^{(n_1 + n_2) - y_1}\sum_{y_2 = u}^{v}\binom{n_1}{y_1 - y_2}\binom{n_2}{y_2}. \]
Next, for the four possible values of the pair (u, v), one finds in each case that
\[ \sum_{y_2 = u}^{v}\binom{n_1}{y_1 - y_2}\binom{n_2}{y_2} = \binom{n_1 + n_2}{y_1}; \]
that is, Y₁ = X₁ + X₂ is B(n₁ + n₂, p). (Observe that this agrees with Theorem 2, Chapter 7.)
Finally, with y₁ and y₂ as above, it follows that
\[ P(Y_2 = y_2 \mid Y_1 = y_1) = \binom{n_1}{y_1 - y_2}\binom{n_2}{y_2}\Big/\binom{n_1 + n_2}{y_1}, \]
the hypergeometric p.d.f., independent of p!
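Both conclusions above are easy to verify by brute force; the Python sketch below (with arbitrary illustration parameters n₁ = 5, n₂ = 7, p = 0.3) checks that the convolution of two binomial p.m.f.s with common p is again binomial, and that the conditional p.m.f. of Y₂ given Y₁ = y₁ is the hypergeometric one, free of p.

```python
from math import comb

n1, n2, p = 5, 7, 0.3   # arbitrary illustration parameters; q = 1 - p

def binom_pmf(n, k, p):
    # B(n, p) p.m.f., zero outside 0 <= k <= n
    return comb(n, k) * p ** k * (1 - p) ** (n - k) if 0 <= k <= n else 0.0

# convolution of B(n1, p) and B(n2, p) equals B(n1 + n2, p)
conv_ok = all(
    abs(sum(binom_pmf(n1, y1 - y2, p) * binom_pmf(n2, y2, p) for y2 in range(n2 + 1))
        - binom_pmf(n1 + n2, y1, p)) < 1e-12
    for y1 in range(n1 + n2 + 1)
)

# the conditional p.m.f. of Y2 given Y1 = y1 is hypergeometric -- no p in sight
y1 = 6
lo, hi = max(0, y1 - n1), min(y1, n2)
cond = [binom_pmf(n1, y1 - y2, p) * binom_pmf(n2, y2, p) / binom_pmf(n1 + n2, y1, p)
        for y2 in range(lo, hi + 1)]
hyper = [comb(n1, y1 - y2) * comb(n2, y2) / comb(n1 + n2, y1)
         for y2 in range(lo, hi + 1)]
max_gap = max(abs(c - h) for c, h in zip(cond, hyper))
```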
We next have two theorems analogous to Theorems 2 and 3 in Section 1.
That is,
THEOREM 2  Let the k-dimensional r. vector X have continuous p.d.f. f_X on the set S on which it is positive, and let
\[ y = h(x) = \big(h_1(x), \ldots, h_k(x)\big) \]
be a (measurable) transformation defined on ℝᵏ into ℝᵏ, so that Y = h(X) is a k-dimensional r. vector. Suppose that h is one-to-one on S onto T (the image of S under h), so that the inverse transformation
\[ x = h^{-1}(y) = \big(g_1(y), \ldots, g_k(y)\big) \quad\text{exists for } y \in T. \]
Then the p.d.f. f_Y of Y is given by
\[ f_Y(y) = \begin{cases} f_X\big[h^{-1}(y)\big]\,|J| = f_X\big[g_1(y), \ldots, g_k(y)\big]\,|J|, & y \in T \\ 0, & \text{otherwise,} \end{cases} \]
where the Jacobian J is a function of y, defined as the determinant
\[ J = \det\left(\frac{\partial g_i}{\partial y_j}\right)_{i,j = 1,\ldots,k}, \]
and is assumed to be ≠ 0 on T.
REMARK 4  In Theorem 2, the transformation h transforms the k-dimensional r. vector X to the k-dimensional r. vector Y. In many applications, however, the dimensionality m of Y is less than k. Then in order to determine the p.d.f. of Y, we work as follows. Let y = (h₁(x), . . . , h_m(x)) and choose another k − m transformations defined on ℝᵏ into ℝ, h_{m+j}, j = 1, . . . , k − m, say, so that they are of the simplest possible form and such that the transformation
\[ h = \big(h_1, \ldots, h_m, h_{m+1}, \ldots, h_k\big) \]
satisfies the assumptions of Theorem 2. Set Z = (Y₁, . . . , Y_m, Y_{m+1}, . . . , Y_k), where Y = (Y₁, . . . , Y_m) and Y_{m+j} = h_{m+j}(X), j = 1, . . . , k − m. Then by applying Theorem 2, we obtain the p.d.f. f_Z of Z, and then integrating out the last k − m arguments y_{m+j}, j = 1, . . . , k − m, we have the p.d.f. of Y.
A number of examples will be presented to illustrate the application of
Theorem 2 as well as of the preceding remark.
EXAMPLE 9  Let X₁, X₂ be i.i.d. r.v.s distributed as U(α, β). Set Y₁ = X₁ + X₂ and find the p.d.f. of Y₁.
We have
\[ f_{X_1,X_2}(x_1, x_2) = \begin{cases} \dfrac{1}{(\beta - \alpha)^2}, & \alpha < x_1, x_2 < \beta \\ 0, & \text{otherwise.} \end{cases} \]
Consider the transformation
\[ h: \begin{cases} y_1 = x_1 + x_2 \\ y_2 = x_2, \end{cases} \quad \alpha < x_1, x_2 < \beta; \qquad\text{then}\qquad \begin{cases} Y_1 = X_1 + X_2 \\ Y_2 = X_2. \end{cases} \]
From h, we get
\[ \begin{cases} x_1 = y_1 - y_2 \\ x_2 = y_2, \end{cases} \qquad\text{so that}\qquad J = \begin{vmatrix} 1 & -1 \\ 0 & 1 \end{vmatrix} = 1, \]
and also α < y₂ < β. Since y₁ − y₂ = x₁, α < x₁ < β, we have α < y₁ − y₂ < β. Thus the limits of y₁, y₂ are specified by α < y₂ < β, α < y₁ − y₂ < β. (See Figs. 9.3 and 9.4.)

Figures 9.3 and 9.4  [The square S = (α, β)² in the (x₁, x₂)-plane, and its image T under h: the parallelogram in the (y₁, y₂)-plane bounded by the lines y₂ = α, y₂ = β, y₁ − y₂ = α, and y₁ − y₂ = β.]

Thus we get
\[ f_{Y_1,Y_2}(y_1, y_2) = \begin{cases} \dfrac{1}{(\beta - \alpha)^2}, & 2\alpha < y_1 < 2\beta,\ \alpha < y_2 < \beta,\ \alpha < y_1 - y_2 < \beta \\ 0, & \text{otherwise.} \end{cases} \]
Therefore
\[ f_{Y_1}(y_1) = \begin{cases} \dfrac{1}{(\beta - \alpha)^2}\displaystyle\int_{\alpha}^{y_1 - \alpha} dy_2 = \dfrac{y_1 - 2\alpha}{(\beta - \alpha)^2}, & 2\alpha < y_1 \le \alpha + \beta \\[2mm] \dfrac{1}{(\beta - \alpha)^2}\displaystyle\int_{y_1 - \beta}^{\beta} dy_2 = \dfrac{2\beta - y_1}{(\beta - \alpha)^2}, & \alpha + \beta < y_1 < 2\beta \\[2mm] 0, & \text{otherwise.} \end{cases} \]
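The triangular density just derived is easy to check numerically; in the Python sketch below I take α = 0, β = 1 for concreteness, verify that f_{Y₁} integrates to 1, and compare one value of the d.f. with simulation (the test point 0.75 is arbitrary).

```python
import math
import random

alpha, beta = 0.0, 1.0   # U(0, 1) for concreteness

def f_Y1(y):
    # triangular density of X1 + X2 for X1, X2 i.i.d. U(alpha, beta)
    if 2 * alpha < y <= alpha + beta:
        return (y - 2 * alpha) / (beta - alpha) ** 2
    if alpha + beta < y < 2 * beta:
        return (2 * beta - y) / (beta - alpha) ** 2
    return 0.0

# the density integrates to 1 (midpoint rule)
n, lo, hi = 100_000, 2 * alpha, 2 * beta
h = (hi - lo) / n
total = sum(f_Y1(lo + (i + 0.5) * h) for i in range(n)) * h

# and it matches a Monte Carlo estimate of P(Y1 <= 0.75)
random.seed(2)
emp = sum(random.uniform(alpha, beta) + random.uniform(alpha, beta) <= 0.75
          for _ in range(200_000)) / 200_000
exact = 0.75 ** 2 / 2   # area of the triangle below the line x1 + x2 = 0.75
```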
Figure 9.5  [Graph of the triangular p.d.f. f_{Y₁}: it rises linearly on (2α, α + β) to the peak value 1/(β − α) at y₁ = α + β, and falls linearly on (α + β, 2β).]
is transformed by h onto
\[ T = \left\{(y_1, y_2);\ 1 < \frac{y_1}{y_2} < \beta,\ 1 < y_2 < \beta\right\}. \]

Figure 9.6  [The region T in the (y₁, y₂)-plane, bounded by the curves y₂ = y₁ and y₂ = y₁/β and the lines y₂ = 1, y₂ = β, with 1 < y₁ < β².]

Since
\[ f_{Y_1,Y_2}(y_1, y_2) = \begin{cases} \dfrac{1}{(\beta - 1)^2}\cdot\dfrac{1}{y_2}, & (y_1, y_2) \in T \\ 0, & \text{otherwise,} \end{cases} \]
we have
\[ f_{Y_1}(y_1) = \begin{cases} \dfrac{1}{(\beta - 1)^2}\displaystyle\int_1^{y_1}\frac{dy_2}{y_2} = \dfrac{\log y_1}{(\beta - 1)^2}, & 1 < y_1 \le \beta \\[2mm] \dfrac{1}{(\beta - 1)^2}\displaystyle\int_{y_1/\beta}^{\beta}\frac{dy_2}{y_2} = \dfrac{2\log\beta - \log y_1}{(\beta - 1)^2}, & \beta < y_1 < \beta^2; \end{cases} \]
that is,
\[ f_{Y_1}(y_1) = \begin{cases} \dfrac{\log y_1}{(\beta - 1)^2}, & 1 < y_1 \le \beta \\[1mm] \dfrac{2\log\beta - \log y_1}{(\beta - 1)^2}, & \beta < y_1 < \beta^2 \\[1mm] 0, & \text{otherwise.} \end{cases} \]
EXAMPLE 11  Let X₁, X₂ be i.i.d. r.v.s from N(0, 1). Show that the p.d.f. of the r.v. Y₁ = X₁/X₂ is Cauchy with μ = 0, σ = 1; that is,
\[ f_{Y_1}(y_1) = \frac{1}{\pi}\cdot\frac{1}{1 + y_1^2}, \qquad y_1 \in \mathbb{R}. \]
We have Y₁ = X₁/X₂. Let Y₂ = X₂ and consider the transformation
\[ h: \begin{cases} y_1 = x_1/x_2, & x_2 \ne 0 \\ y_2 = x_2; \end{cases} \qquad\text{then}\qquad \begin{cases} x_1 = y_1y_2 \\ x_2 = y_2, \end{cases} \]
and
\[ J = \begin{vmatrix} y_2 & y_1 \\ 0 & 1 \end{vmatrix} = y_2, \qquad\text{so that}\qquad |J| = |y_2|. \]
Since −∞ < x₁, x₂ < ∞ implies −∞ < y₁, y₂ < ∞, we have
\[ f_{Y_1,Y_2}(y_1, y_2) = f_{X_1,X_2}(y_1y_2, y_2)\,|y_2| = \frac{1}{2\pi}\exp\left(-\frac{y_1^2y_2^2 + y_2^2}{2}\right)|y_2|, \]
and therefore
\[ f_{Y_1}(y_1) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\exp\left[-\frac{\big(y_1^2 + 1\big)y_2^2}{2}\right]|y_2|\,dy_2 = \frac{1}{\pi}\int_0^{\infty}\exp\left[-\frac{\big(y_1^2 + 1\big)y_2^2}{2}\right]y_2\,dy_2. \]
Set
\[ \frac{\big(y_1^2 + 1\big)y_2^2}{2} = t, \qquad\text{so that}\qquad y_2^2 = \frac{2t}{y_1^2 + 1} \]
and
\[ 2y_2\,dy_2 = \frac{2\,dt}{y_1^2 + 1}, \qquad\text{or}\qquad y_2\,dy_2 = \frac{dt}{y_1^2 + 1}, \qquad t \in [0, \infty). \]
Then
\[ f_{Y_1}(y_1) = \frac{1}{\pi}\cdot\frac{1}{y_1^2 + 1}\int_0^{\infty}e^{-t}\,dt = \frac{1}{\pi}\cdot\frac{1}{y_1^2 + 1}. \]
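The conclusion of Example 11 can be spot-checked by simulation; the sketch below uses the facts that the Cauchy(0, 1) d.f. is 1/2 + arctan(y)/π, so that P(|Y₁| ≤ 1) = 1/2, and that the Cauchy median is 0 (sample size and tolerances are my own choices).

```python
import random

random.seed(3)
n = 200_000
# ratio of two independent standard normals (a zero denominator has
# probability 0 and is effectively impossible in double precision)
ratios = [random.gauss(0, 1) / random.gauss(0, 1) for _ in range(n)]

# Cauchy(0, 1): F(y) = 1/2 + arctan(y)/pi, hence P(-1 <= Y <= 1) = 1/2
inside = sum(-1.0 <= r <= 1.0 for r in ratios) / n
med = sorted(ratios)[n // 2]   # sample median should be near 0
```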
Hence
\[ J = \begin{vmatrix} y_2 & y_1 \\ -y_2 & 1 - y_1 \end{vmatrix} = y_2(1 - y_1) + y_1y_2 = y_2, \qquad\text{and}\qquad |J| = y_2. \]
Next,
\[ f_{X_1,X_2}(x_1, x_2) = \begin{cases} \dfrac{1}{2^{\alpha+\beta}\Gamma(\alpha)\Gamma(\beta)}\,x_1^{\alpha-1}x_2^{\beta-1}\exp\left(-\dfrac{x_1 + x_2}{2}\right), & x_1, x_2 > 0 \\ 0, & \text{otherwise,} \end{cases} \qquad \alpha, \beta > 0. \]
From the transformation, it follows that for x₁ = 0, y₁ = 0 and, for x₁ → ∞,
\[ y_1 = \frac{x_1}{x_1 + x_2} = \frac{1}{1 + (x_2/x_1)} \to 1. \]
Thus 0 < y₁ < 1 and, clearly, 0 < y₂ < ∞. Therefore, for 0 < y₁ < 1, 0 < y₂ < ∞, we get
\[ f_{Y_1,Y_2}(y_1, y_2) = \frac{1}{2^{\alpha+\beta}\Gamma(\alpha)\Gamma(\beta)}(y_1y_2)^{\alpha-1}\big[y_2(1 - y_1)\big]^{\beta-1}e^{-y_2/2}\,y_2 = \frac{1}{2^{\alpha+\beta}\Gamma(\alpha)\Gamma(\beta)}\,y_1^{\alpha-1}(1 - y_1)^{\beta-1}y_2^{\alpha+\beta-1}e^{-y_2/2}. \]
Hence
\[ f_{Y_1}(y_1) = \frac{y_1^{\alpha-1}(1 - y_1)^{\beta-1}}{2^{\alpha+\beta}\Gamma(\alpha)\Gamma(\beta)}\int_0^{\infty}y_2^{\alpha+\beta-1}e^{-y_2/2}\,dy_2. \]
But
\[ \int_0^{\infty}y_2^{\alpha+\beta-1}e^{-y_2/2}\,dy_2 = 2^{\alpha+\beta}\int_0^{\infty}t^{\alpha+\beta-1}e^{-t}\,dt = 2^{\alpha+\beta}\Gamma(\alpha + \beta). \]
Therefore
\[ f_{Y_1}(y_1) = \begin{cases} \dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\,y_1^{\alpha-1}(1 - y_1)^{\beta-1}, & 0 < y_1 < 1 \\ 0, & \text{otherwise;} \end{cases} \]
that is, Y₁ is distributed as Beta with parameters α and β.
Consider next the transformation
\[ h: \begin{cases} y_1 = \dfrac{x_1}{x_1 + x_2} \\[1mm] y_2 = \dfrac{x_1 + x_2}{x_1 + x_2 + x_3} \\[1mm] y_3 = x_1 + x_2 + x_3, \end{cases} \quad x_1, x_2, x_3 > 0; \qquad\text{then}\qquad \begin{cases} x_1 = y_1y_2y_3 \\ x_2 = -y_1y_2y_3 + y_2y_3 \\ x_3 = -y_2y_3 + y_3 \end{cases} \]
and
\[ J = \begin{vmatrix} y_2y_3 & y_1y_3 & y_1y_2 \\ -y_2y_3 & (1 - y_1)y_3 & (1 - y_1)y_2 \\ 0 & -y_3 & 1 - y_2 \end{vmatrix} = y_2y_3^2. \]
Now from the transformation, it follows that x₁, x₂, x₃ ∈ (0, ∞) implies that
\[ y_1 \in (0, 1), \qquad y_2 \in (0, 1), \qquad y_3 \in (0, \infty). \]
Thus
\[ f_{Y_1,Y_2,Y_3}(y_1, y_2, y_3) = \begin{cases} y_2y_3^2e^{-y_3}, & 0 < y_1 < 1,\ 0 < y_2 < 1,\ 0 < y_3 < \infty \\ 0, & \text{otherwise.} \end{cases} \]
Hence
\[ f_{Y_1}(y_1) = \int_0^{\infty}\int_0^1 y_2y_3^2e^{-y_3}\,dy_2\,dy_3 = 1, \qquad 0 < y_1 < 1, \]
\[ f_{Y_2}(y_2) = \int_0^{\infty}\int_0^1 y_2y_3^2e^{-y_3}\,dy_1\,dy_3 = y_2\int_0^{\infty}y_3^2e^{-y_3}\,dy_3 = 2y_2, \qquad 0 < y_2 < 1, \]
and
\[ f_{Y_3}(y_3) = \int_0^1\int_0^1 y_2y_3^2e^{-y_3}\,dy_1\,dy_2 = y_3^2e^{-y_3}\int_0^1 y_2\,dy_2 = \frac{1}{2}y_3^2e^{-y_3}, \qquad 0 < y_3 < \infty. \]
Since
\[ f_{Y_1,Y_2,Y_3}(y_1, y_2, y_3) = f_{Y_1}(y_1)\,f_{Y_2}(y_2)\,f_{Y_3}(y_3), \]
it follows that the r.v.s Y₁, Y₂, Y₃ are independent.
\[ f_X(x) = \frac{1}{\sqrt{2\pi}}e^{-(1/2)x^2}, \qquad x \in \mathbb{R}, \]
\[ f_Y(y) = \begin{cases} \dfrac{1}{\Gamma(r/2)2^{r/2}}\,y^{(r/2)-1}e^{-y/2}, & y > 0 \\ 0, & y \le 0. \end{cases} \]
Set U = Y and consider the transformation
\[ h: \begin{cases} t = \dfrac{x}{\sqrt{y/r}} \\ u = y; \end{cases} \qquad\text{then}\qquad \begin{cases} x = t\sqrt{u/r} \\ y = u \end{cases} \]
and
\[ J = \begin{vmatrix} \sqrt{u/r} & \dfrac{t}{2\sqrt{ur}} \\ 0 & 1 \end{vmatrix} = \sqrt{\frac{u}{r}}. \]
Therefore
\[ f_{T,U}(t, u) = \frac{1}{\sqrt{2\pi}}e^{-t^2u/(2r)}\cdot\frac{1}{\Gamma(r/2)2^{r/2}}\,u^{(r/2)-1}e^{-u/2}\cdot\sqrt{\frac{u}{r}} \]
\[ = \frac{1}{\sqrt{2\pi r}\,\Gamma(r/2)2^{r/2}}\,u^{(1/2)(r+1)-1}\exp\left[-\frac{u}{2}\left(1 + \frac{t^2}{r}\right)\right]. \]
Hence
\[ f_T(t) = \frac{1}{\sqrt{2\pi r}\,\Gamma(r/2)2^{r/2}}\int_0^{\infty}u^{(1/2)(r+1)-1}\exp\left[-\frac{u}{2}\left(1 + \frac{t^2}{r}\right)\right]du. \]
We set
\[ \frac{u}{2}\left(1 + \frac{t^2}{r}\right) = z, \qquad\text{so that}\qquad u = 2z\left(1 + \frac{t^2}{r}\right)^{-1}, \qquad du = 2\left(1 + \frac{t^2}{r}\right)^{-1}dz, \]
and obtain
\[ f_T(t) = \frac{2^{(1/2)(r+1)}}{\sqrt{2\pi r}\,\Gamma(r/2)2^{r/2}}\left(1 + \frac{t^2}{r}\right)^{-(1/2)(r+1)}\int_0^{\infty}z^{(1/2)(r+1)-1}e^{-z}\,dz = \frac{\Gamma\big[\tfrac12(r+1)\big]}{\sqrt{\pi r}\,\Gamma(r/2)}\left[1 + \frac{t^2}{r}\right]^{-(1/2)(r+1)}; \]
that is,
\[ f_T(t) = \frac{\Gamma\big[\tfrac12(r+1)\big]}{\sqrt{\pi r}\,\Gamma\big(\tfrac{r}{2}\big)\big[1 + (t^2/r)\big]^{(1/2)(r+1)}}, \qquad t \in \mathbb{R}. \]
The probabilities P(T ≤ t) for selected values of t and r are given in tables (the t-tables). (For the graph of f_T, see Fig. 9.7.)

Figure 9.7  [Graphs of the t₅ density and of the limiting t_∞ = N(0, 1) density.]
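As a numerical check of the closed form just obtained (not part of the text), the Python sketch below evaluates f_T with the standard-library gamma function, verifies that it integrates to 1, and illustrates the convergence of f_T(0) to the N(0, 1) density value as r grows; r = 5 and the integration grid are arbitrary choices.

```python
import math

def t_pdf(t, r):
    # f_T(t) = Gamma((r+1)/2) / ( sqrt(pi r) Gamma(r/2) [1 + t^2/r]^{(r+1)/2} )
    c = math.gamma((r + 1) / 2.0) / (math.sqrt(math.pi * r) * math.gamma(r / 2.0))
    return c / (1.0 + t * t / r) ** ((r + 1) / 2.0)

r, n_steps, span = 5, 200_000, 200.0   # integrate over (-100, 100); tails negligible
h = span / n_steps
total = sum(t_pdf(-span / 2.0 + (i + 0.5) * h, r) for i in range(n_steps)) * h

# as r grows, f_T approaches the N(0, 1) density; compare at t = 0 for r = 200
normal0 = 1.0 / math.sqrt(2.0 * math.pi)
gap_at_0 = abs(t_pdf(0.0, 200) - normal0)
```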
The density of the F distribution with r₁, r₂ d.f. (F_{r₁,r₂}). Let the independent r.v.s X and Y be distributed as χ²_{r₁} and χ²_{r₂}, respectively, and set
\[ F = \frac{X/r_1}{Y/r_2}. \]
The r.v. F is said to have the F distribution with r₁, r₂ degrees of freedom (d.f.) and is often denoted by F_{r₁,r₂}. We have
\[ f_X(x) = \begin{cases} \dfrac{1}{\Gamma\big(\tfrac12 r_1\big)2^{r_1/2}}\,x^{(r_1/2)-1}e^{-x/2}, & x > 0 \\ 0, & x \le 0, \end{cases} \]
and
\[ f_Y(y) = \begin{cases} \dfrac{1}{\Gamma\big(\tfrac12 r_2\big)2^{r_2/2}}\,y^{(r_2/2)-1}e^{-y/2}, & y > 0 \\ 0, & y \le 0. \end{cases} \]
Consider the transformation
\[ h: \begin{cases} f = \dfrac{x/r_1}{y/r_2} \\ z = y; \end{cases} \qquad\text{then}\qquad \begin{cases} x = \dfrac{r_1}{r_2}fz \\ y = z \end{cases} \quad\text{and}\quad J = \frac{r_1}{r_2}z. \]
Therefore, for f, z > 0,
\[ f_{F,Z}(f, z) = \frac{1}{\Gamma\big(\tfrac12 r_1\big)\Gamma\big(\tfrac12 r_2\big)2^{(1/2)(r_1+r_2)}}\left(\frac{r_1}{r_2}fz\right)^{(r_1/2)-1}z^{(r_2/2)-1}\exp\left(-\frac{r_1fz}{2r_2}\right)e^{-z/2}\,\frac{r_1}{r_2}z \]
\[ = \frac{(r_1/r_2)^{r_1/2}f^{(r_1/2)-1}}{\Gamma\big(\tfrac12 r_1\big)\Gamma\big(\tfrac12 r_2\big)2^{(1/2)(r_1+r_2)}}\;z^{(1/2)(r_1+r_2)-1}\exp\left[-\frac{z}{2}\left(\frac{r_1}{r_2}f + 1\right)\right]. \]
Therefore
\[ f_F(f) = \int_0^{\infty}f_{F,Z}(f, z)\,dz = \frac{(r_1/r_2)^{r_1/2}f^{(r_1/2)-1}}{\Gamma\big(\tfrac12 r_1\big)\Gamma\big(\tfrac12 r_2\big)2^{(1/2)(r_1+r_2)}}\int_0^{\infty}z^{(1/2)(r_1+r_2)-1}\exp\left[-\frac{z}{2}\left(\frac{r_1}{r_2}f + 1\right)\right]dz. \]
Set
\[ \frac{z}{2}\left(\frac{r_1}{r_2}f + 1\right) = t, \quad\text{so that}\quad z = 2t\left(\frac{r_1}{r_2}f + 1\right)^{-1}, \quad dz = 2\left(\frac{r_1}{r_2}f + 1\right)^{-1}dt, \quad t \in [0, \infty). \]
Thus continuing, we have
\[ f_F(f) = \frac{(r_1/r_2)^{r_1/2}f^{(r_1/2)-1}}{\Gamma\big(\tfrac12 r_1\big)\Gamma\big(\tfrac12 r_2\big)2^{(1/2)(r_1+r_2)}}\;2^{(1/2)(r_1+r_2)}\left(\frac{r_1}{r_2}f + 1\right)^{-(1/2)(r_1+r_2)}\int_0^{\infty}t^{(1/2)(r_1+r_2)-1}e^{-t}\,dt \]
\[ = \frac{\Gamma\big[\tfrac12(r_1 + r_2)\big](r_1/r_2)^{r_1/2}}{\Gamma\big(\tfrac12 r_1\big)\Gamma\big(\tfrac12 r_2\big)}\cdot\frac{f^{(r_1/2)-1}}{\big[1 + (r_1/r_2)f\big]^{(1/2)(r_1+r_2)}}. \]
Therefore
\[ f_F(f) = \begin{cases} \dfrac{\Gamma\big[\tfrac12(r_1 + r_2)\big](r_1/r_2)^{r_1/2}}{\Gamma\big(\tfrac12 r_1\big)\Gamma\big(\tfrac12 r_2\big)}\cdot\dfrac{f^{(r_1/2)-1}}{\big[1 + (r_1/r_2)f\big]^{(1/2)(r_1+r_2)}}, & f > 0 \\[2mm] 0, & f \le 0. \end{cases} \]
The probabilities P(F ≤ f) for selected values of f and r₁, r₂ are given by tables (the F-tables). (For the graph of f_F, see Fig. 9.8.)

Figure 9.8  [Graphs of the F₁₀,₁₀ and F₁₀,₄ densities.]
REMARK 6
i) If F is distributed as F_{r₁,r₂}, then, clearly, 1/F is distributed as F_{r₂,r₁}.
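Remark 6(i) can be checked by simulation; the Python sketch below builds χ² draws from the standard library's gamma sampler (a χ²_r r.v. is Gamma with shape r/2 and scale 2) and compares the sample median of 1/F with that of a directly simulated F_{r₂,r₁} sample. Degrees of freedom and sample size are arbitrary choices.

```python
import random

random.seed(4)
r1, r2, n = 10, 4, 100_000   # arbitrary degrees of freedom and sample size

def chisq(r):
    # chi-square(r) coincides with Gamma(shape = r/2, scale = 2)
    return random.gammavariate(r / 2.0, 2.0)

F_sample = [(chisq(r1) / r1) / (chisq(r2) / r2) for _ in range(n)]
G_sample = [(chisq(r2) / r2) / (chisq(r1) / r1) for _ in range(n)]  # F(r2, r1) directly

# Remark 6(i): 1/F is distributed as F(r2, r1) -- compare sample medians
med_recip = sorted(1.0 / f for f in F_sample)[n // 2]
med_direct = sorted(G_sample)[n // 2]
```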
Exercises
9.2.1 Let X₁, X₂ be independent r.v.s taking on the values 1, . . . , 6 with probability f(x) = 1/6, x = 1, . . . , 6. Derive the distribution of the r.v. X₁ + X₂.
9.2.2 Let X₁, X₂ be r.v.s with joint p.d.f. f given by
\[ f(x_1, x_2) = \frac{1}{\pi}I_A(x_1, x_2), \qquad\text{where}\qquad A = \big\{(x_1, x_2) \in \mathbb{R}^2;\ x_1^2 + x_2^2 \le 1\big\}. \]
Set Z² = X₁² + X₂² and derive the p.d.f. of the r.v. Z². (Hint: Use polar coordinates.)
9.2.3 Let X₁, X₂ be independent r.v.s distributed as N(0, 1). Then:
i) Find the p.d.f. of the r.v.s X₁ + X₂ and X₁ − X₂;
ii) Calculate the probability P(X₁ − X₂ < 0, X₁ + X₂ > 0).
9.2.4 Let X₁, X₂ be independent r.v.s distributed as Negative Exponential with parameter λ = 1. Then:
\[ f(x) = \frac{1}{x^2}I_{(1,\infty)}(x). \]
Determine the distribution of the r.v. X = X₁/X₂.
9.2.7 Let X be an r.v. distributed as t_r.
(Hint: For part (iv), use Stirling's formula (see, for example, W. Feller's book An Introduction to Probability Theory, Vol. I, 3rd ed., 1968, page 50), which states that, as n → ∞, Γ(n)/[(2π)^{1/2}n^{(2n−1)/2}e^{−n}] tends to 1.)
9.2.10 Let X₁, X₂ be independent r.v.s distributed as N(0, σ²). Then show that:
\[ EX_r = 0, \quad r \ge 2; \qquad \sigma^2(X_r) = \frac{r}{r - 2}, \quad r \ge 3. \]
\[ EX_{r_1,r_2} = \frac{r_2}{r_2 - 2}, \quad r_2 \ge 3; \qquad \sigma^2(X_{r_1,r_2}) = \frac{2r_2^2(r_1 + r_2 - 2)}{r_1(r_2 - 2)^2(r_2 - 4)}, \quad r_2 \ge 5. \]
9.2.13 Let X_r be an r.v. distributed as t_r, and let f_r be its p.d.f. Then show that:
\[ f_r(x) \xrightarrow[r \to \infty]{} \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right), \qquad x \in \mathbb{R}. \]
(Hint: Use Stirling's formula given as a hint in Exercise 9.2.8(iv).)
9.2.14 Let X_{r₁} and X_{r₁,r₂} be r.v.s distributed as χ²_{r₁} and F_{r₁,r₂}, respectively, and,
9.3 Linear Transformations of Random Vectors

9.3.1 Preliminaries
A transformation h: ℝᵏ → ℝᵏ which transforms the variables x₁, . . . , x_k to the variables y₁, . . . , y_k in the following manner:
\[ y_i = \sum_{j=1}^{k}c_{ij}x_j, \qquad c_{ij} \text{ real constants}, \quad i, j = 1, 2, \ldots, k, \tag{1} \]
is called a linear transformation. Let C = (c_{ij}) denote its matrix and Δ = |C| its determinant, assumed ≠ 0, so that the inverse transformation exists. Let D = (d_{ij}) be the matrix of the inverse transformation and Δ* = |D|. Then, as is known from linear algebra (see also Appendix 1), Δ* = 1/Δ. If, furthermore, the linear transformation above is such that the column vectors (c_{1j}, c_{2j}, . . . , c_{kj}), j = 1, . . . , k, are orthogonal, that is,
\[ \sum_{i=1}^{k}c_{ij}c_{ij'} = 0 \quad\text{for } j \ne j', \qquad\text{and}\qquad \sum_{i=1}^{k}c_{ij}^2 = 1, \quad j = 1, \ldots, k, \tag{3} \]
then the transformation is called orthogonal.
According to what has been seen so far, the Jacobian of the transformation (1) is J = Δ* = 1/Δ, and for the case that the transformation is orthogonal, we have |Δ| = 1, so that |J| = 1. These results are now applied as follows:
Consider the r. vector X = (X₁, . . . , X_k) with p.d.f. f_X and let S be the subset of ℝᵏ over which f_X > 0. Set
\[ Y_i = \sum_{j=1}^{k}c_{ij}X_j, \qquad i = 1, \ldots, k, \]
where we assume Δ = |(c_{ij})| ≠ 0. Then the p.d.f. of the r. vector Y = (Y₁, . . . , Y_k) is given by
\[ f_Y(y_1, \ldots, y_k) = \begin{cases} \dfrac{1}{|\Delta|}\,f_X\left(\displaystyle\sum_{j=1}^{k}d_{1j}y_j, \ldots, \sum_{j=1}^{k}d_{kj}y_j\right), & (y_1, \ldots, y_k) \in T \\ 0, & \text{otherwise,} \end{cases} \]
where T is the image of S under the transformation in question. In particular, if the transformation is orthogonal,
\[ f_Y(y_1, \ldots, y_k) = \begin{cases} f_X\left(\displaystyle\sum_{j=1}^{k}c_{j1}y_j, \ldots, \sum_{j=1}^{k}c_{jk}y_j\right), & (y_1, \ldots, y_k) \in T \\ 0, & \text{otherwise.} \end{cases} \]
Another consequence of orthogonality of the transformation is that
\[ \sum_{i=1}^{k}Y_i^2 = \sum_{i=1}^{k}X_i^2. \]
In fact,
\[ \sum_{i=1}^{k}Y_i^2 = \sum_{i=1}^{k}\left(\sum_{j=1}^{k}c_{ij}X_j\right)^2 = \sum_{i=1}^{k}\sum_{j=1}^{k}\sum_{l=1}^{k}c_{ij}c_{il}X_jX_l = \sum_{j=1}^{k}\sum_{l=1}^{k}X_jX_l\sum_{i=1}^{k}c_{ij}c_{il} = \sum_{j=1}^{k}X_j^2, \]
because, by (3), Σᵏᵢ₌₁ c_{ij}c_{il} equals 1 for j = l and 0 for j ≠ l; similarly, Σᵏⱼ₌₁ X_j² = Σᵏⱼ₌₁ Y_j².
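A small stdlib-only Python sketch of these two facts (the rotation angles used to build the orthogonal matrix are hypothetical choices): the columns of C are orthonormal, and y = Cx preserves the sum of squares.

```python
import math

# a 3x3 orthogonal matrix built as a product of two rotations (arbitrary angles)
def rot_z(t):
    return [[math.cos(t), -math.sin(t), 0], [math.sin(t), math.cos(t), 0], [0, 0, 1]]

def rot_x(t):
    return [[1, 0, 0], [0, math.cos(t), -math.sin(t)], [0, math.sin(t), math.cos(t)]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

C = matmul(rot_z(0.7), rot_x(-1.2))

# columns are orthonormal: sum_i c_ij c_ij' = 1 if j = j', 0 otherwise
col_dots = [[sum(C[i][j] * C[i][jp] for i in range(3)) for jp in range(3)]
            for j in range(3)]

# y = Cx preserves the sum of squares
x = [1.0, -2.0, 0.5]
y = [sum(C[i][j] * x[j] for j in range(3)) for i in range(3)]
ss_x = sum(v * v for v in x)
ss_y = sum(v * v for v in y)
```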
Then the r.v.s Y₁, . . . , Y_k are also independent, normally distributed with common variance σ² and means given by
\[ E(Y_i) = \sum_{j=1}^{k}c_{ij}\mu_j, \qquad i = 1, \ldots, k. \]
In fact, by orthogonality, the p.d.f. of Y is
\[ f_Y(y_1, \ldots, y_k) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^k\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{k}\left(\sum_{j=1}^{k}c_{ji}y_j - \mu_i\right)^2\right]. \]
Now
\[ \sum_{i=1}^{k}\left(\sum_{j=1}^{k}c_{ji}y_j - \mu_i\right)^2 = \sum_{i=1}^{k}\left(\sum_{j=1}^{k}c_{ji}y_j\right)^2 + \sum_{i=1}^{k}\mu_i^2 - 2\sum_{i=1}^{k}\mu_i\sum_{j=1}^{k}c_{ji}y_j \]
\[ = \sum_{j=1}^{k}\sum_{l=1}^{k}y_jy_l\sum_{i=1}^{k}c_{ji}c_{li} + \sum_{i=1}^{k}\mu_i^2 - 2\sum_{j=1}^{k}\left(\sum_{i=1}^{k}\mu_ic_{ji}\right)y_j \]
\[ = \sum_{j=1}^{k}y_j^2 - 2\sum_{j=1}^{k}\left(\sum_{i=1}^{k}c_{ji}\mu_i\right)y_j + \sum_{i=1}^{k}\mu_i^2. \]
On the other hand,
\[ \sum_{j=1}^{k}\left(y_j - \sum_{i=1}^{k}c_{ji}\mu_i\right)^2 = \sum_{j=1}^{k}y_j^2 - 2\sum_{j=1}^{k}\left(\sum_{i=1}^{k}c_{ji}\mu_i\right)y_j + \sum_{i=1}^{k}\sum_{l=1}^{k}\mu_i\mu_l\sum_{j=1}^{k}c_{ji}c_{jl} \]
\[ = \sum_{j=1}^{k}y_j^2 - 2\sum_{j=1}^{k}\left(\sum_{i=1}^{k}c_{ji}\mu_i\right)y_j + \sum_{i=1}^{k}\mu_i^2, \]
as was to be seen.
As a further application of Theorems 4 and 5, we consider the following result. Let Z₁, . . . , Z_k be independent N(0, 1), and set
\[ \begin{aligned} Y_1 &= \frac{1}{\sqrt{k}}Z_1 + \frac{1}{\sqrt{k}}Z_2 + \cdots + \frac{1}{\sqrt{k}}Z_k \\ Y_2 &= \frac{1}{\sqrt{2\cdot 1}}Z_1 - \frac{1}{\sqrt{2\cdot 1}}Z_2 \\ Y_3 &= \frac{1}{\sqrt{3\cdot 2}}Z_1 + \frac{1}{\sqrt{3\cdot 2}}Z_2 - \frac{2}{\sqrt{3\cdot 2}}Z_3 \\ &\ \ \vdots \\ Y_k &= \frac{1}{\sqrt{k(k-1)}}Z_1 + \cdots + \frac{1}{\sqrt{k(k-1)}}Z_{k-1} - \frac{k-1}{\sqrt{k(k-1)}}Z_k. \end{aligned} \]
We thus have
\[ c_{1j} = \frac{1}{\sqrt{k}}, \quad j = 1, \ldots, k, \qquad\text{and for } i = 2, \ldots, k, \]
\[ c_{ij} = \frac{1}{\sqrt{i(i-1)}} \ \text{ for } j = 1, \ldots, i-1, \qquad c_{ii} = -\frac{i-1}{\sqrt{i(i-1)}}, \qquad c_{ij} = 0 \ \text{ for } j > i. \]
k
k
c 2
1j =
k
= 1, and for i = 2, , k,
j =1
( )
2
k i i 1
1
c = c = i 1 i i 1 + i i 1
2 2
( )
j =1 j =1
ij ij
( ) ( )
1 i 1
+ = 1, =
i i
while for i = 2, . . . , k, we get
k
1 k
1 i
1 i 1 i 1
c1 j cij = cij = cij = = 0,
j =1 k j =1 k j =1
k i i 1 ( )
i i 1 ( )
240 9 Transformations of Random Variables and Random Vectors
c c ij lj if i > l .
j =1
For i < l, this is
(
i i1 l l 1
1
)( )
[(i 1) (i 1)] = 0,
and for i > l, this is
(
i i1 l l 1
1
)( )
[(l 1) (l 1)] = 0.
Thus the transformation is orthogonal. It follows, by Theorem 5, that
Y1, . . . , Yk are independent, N(0, 1), and that
k k
Y i2 = Z 2i by Theorem 4.
i =1 i =1
Thus
( kZ )
k k k 2
Y i2 = Y i2 Y 12 = Z 2i
i=2 i =1 i =1
k k
( )
2
= Z kZ = Zi Z .2
i
2
i =1 i =1
is independent of
Since Y1 is independent of ki=2Y 2i, we conclude that Z
)2. Thus we have the following theorem.
i = 1 (Zi Z
k
are independent. Hence X and S are independent. 2
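The Helmert-type transformation above is easy to write down concretely; the Python sketch below builds the matrix for k = 6 (an arbitrary choice), applies it to a normal sample, and checks the two algebraic identities used in the argument: Y₁ = √k Z̄ and Σᵢ≥₂ Y_i² = Σ(Z_i − Z̄)².

```python
import math
import random

def helmert(k):
    # row 1: 1/sqrt(k) everywhere; row i (i >= 2): 1/sqrt(i(i-1)) on the first
    # i-1 entries, -(i-1)/sqrt(i(i-1)) in position i, zeros afterwards
    C = [[1.0 / math.sqrt(k)] * k]
    for i in range(2, k + 1):
        d = math.sqrt(i * (i - 1))
        C.append([1.0 / d] * (i - 1) + [-(i - 1) / d] + [0.0] * (k - i))
    return C

k = 6
C = helmert(k)
random.seed(5)
z = [random.gauss(0, 1) for _ in range(k)]
y = [sum(C[i][j] * z[j] for j in range(k)) for i in range(k)]

zbar = sum(z) / k
lhs = sum(v * v for v in y[1:])           # sum_{i >= 2} Y_i^2
rhs = sum((v - zbar) ** 2 for v in z)     # sum (Z_i - Zbar)^2
y1_gap = abs(y[0] - math.sqrt(k) * zbar)  # Y_1 equals sqrt(k) * Zbar
```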
Exercises
9.3.1 For i = 1, 2, 3, let Xᵢ be independent r.v.s distributed as N(μᵢ, σ²), and set:
\[ Y_1 = \frac{1}{\sqrt{2}}X_1 + \frac{1}{\sqrt{2}}X_2, \qquad Y_2 = \frac{1}{\sqrt{3}}X_1 - \frac{1}{\sqrt{3}}X_2 + \frac{1}{\sqrt{3}}X_3, \]
\[ Y_3 = \frac{1}{\sqrt{6}}X_1 - \frac{1}{\sqrt{6}}X_2 - \frac{2}{\sqrt{6}}X_3. \]
Then:
i) Show that the r.v.s Y₁, Y₂, Y₃ are also independent normally distributed with variance σ², and specify their respective means.
(Hint: Verify that the transformation is orthogonal, and then use Theorem 5);
ii) If μ₁ = μ₂ = μ₃ = 0, use a conclusion in Theorem 4 in order to show that Y₁² + Y₂² + Y₃² ∼ σ²χ²₃.
9.3.2 If the pair of r.v.s (X, Y) has the Bivariate Normal distribution with parameters μ₁, μ₂, σ₁², σ₂², ρ, that is, (X, Y) ∼ N(μ₁, μ₂, σ₁², σ₂², ρ), then show that
\[ \left(\frac{X - \mu_1}{\sigma_1}, \frac{Y - \mu_2}{\sigma_2}\right) \sim N(0, 0, 1, 1, \rho), \]
and vice versa.
9.3.3 If (X, Y) ∼ N(0, 0, 1, 1, ρ), and c, d are constants with cd ≠ 0, then show that (cX, dY) ∼ N(0, 0, c², d², ρ₀), where ρ₀ = ρ if cd > 0, and ρ₀ = −ρ if cd < 0.
where
\[ \tau_1^2 = \sigma_1^2 + \sigma_2^2 + 2\rho\sigma_1\sigma_2, \qquad \tau_2^2 = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2, \qquad\text{and}\qquad \rho_0 = \frac{\sigma_1^2 - \sigma_2^2}{\tau_1\tau_2}; \]
ii) X + Y ∼ N(0, τ₁²) and X − Y ∼ N(0, τ₂²);
iii) The r.v.s X + Y and X − Y are independent if and only if σ₁ = σ₂. (Compare with the latter part of Exercise 9.3.5.)
9.3.7 Let (X, Y) ∼ N(μ₁, μ₂, σ₁², σ₂², ρ), and let c, d be constants with cd ≠ 0. Then:
i) (cX, dY) ∼ N(cμ₁, dμ₂, c²σ₁², d²σ₂², ρ₀), with ρ₀ = ρ if cd > 0, and ρ₀ = −ρ if cd < 0;
ii) (cX + dY, cX − dY) ∼ N(cμ₁ + dμ₂, cμ₁ − dμ₂, τ₁², τ₂², ρ₀), where
\[ \tau_1^2 = c^2\sigma_1^2 + d^2\sigma_2^2 + 2\rho cd\sigma_1\sigma_2, \qquad \tau_2^2 = c^2\sigma_1^2 + d^2\sigma_2^2 - 2\rho cd\sigma_1\sigma_2, \]
\[ \text{and}\qquad \rho_0 = \frac{c^2\sigma_1^2 - d^2\sigma_2^2}{\tau_1\tau_2}; \]
iii) The r.v.s cX + dY and cX − dY are independent if and only if c/d = σ₂/σ₁;
iv) The r.v.s in part (iii) are distributed as N(cμ₁ + dμ₂, τ₁²) and N(cμ₁ − dμ₂, τ₂²), respectively.
9.4 The Probability Integral Transform

THEOREM 7  Let X be an r.v. with continuous d.f. F, and define the r.v. Y by Y = F(X). Then the distribution of Y is U(0, 1).
PROOF  Let G be the d.f. of Y. We will show that G(y) = y, 0 < y < 1; G(0) = 0; G(1) = 1. Indeed, let y ∈ (0, 1). Since F(x) → 0 as x → −∞, there exists a such that (0 ≤) F(a) < y; and since F(x) → 1 as x → ∞, there exist δ > 0 with y + δ < 1 and b such that y + δ < F(b) (≤ 1). Set F(a) = c and F(b) = d. Then the function F is continuous in the closed interval [a, b], and all y of the form y + δ/n (n ≥ 2 integer) lie in (c, d). Therefore, by the Intermediate Value Theorem (see, for example, Theorem 3(ii) on page 95 in Calculus and Analytic Geometry, 3rd edition (1966), by George B. Thomas, Addison-Wesley Publishing Company, Inc., Reading, Massachusetts) there exist x₀ and xₙ (n ≥ 2) in (a, b) such that F(x₀) = y and F(xₙ) = y + δ/n. Then
\[ (X \le x_0) \subseteq \big[F(X) \le F(x_0)\big] \quad\text{(since } F \text{ is nondecreasing)} \]
\[ = \big[F(X) \le y\big] \quad\text{(since } F(x_0) = y\text{)} \]
\[ \subseteq \Big(F(X) < y + \frac{\delta}{n}\Big) = \big[F(X) < F(x_n)\big] \quad\Big(\text{since } F(x_n) = y + \frac{\delta}{n}\Big) \]
\[ \subseteq (X < x_n) \quad\text{(by the fact that } F \text{ is nondecreasing and by contradiction).} \]
That is, (X ≤ x₀) ⊆ [F(X) ≤ y] ⊆ (X ≤ xₙ). Hence
\[ P(X \le x_0) \le P\big[F(X) \le y\big] \le P(X \le x_n), \qquad\text{or}\qquad y = F(x_0) \le G(y) \le F(x_n) = y + \frac{\delta}{n}. \]
Letting n → ∞, we obtain G(y) = y. Next, G is right-continuous, being a d.f. Thus, as y ↓ 0, G(0) = lim G(y) = lim y = 0. Finally, as y ↑ 1, lim G(y) = lim y = 1, so that G(1) = 1. The proof is completed. ▲
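Theorem 7 is easily illustrated by simulation; the Python sketch below takes F to be the Exponential(λ) d.f. (an arbitrary continuous choice), applies it to a simulated sample, and checks that the resulting values look uniform on (0, 1) via a crude Kolmogorov–Smirnov-type distance.

```python
import math
import random

lam = 2.0                                   # Exponential(lam): a continuous d.f.
F = lambda x: 1.0 - math.exp(-lam * x)

random.seed(6)
n = 100_000
u = sorted(F(random.expovariate(lam)) for _ in range(n))

# if Y = F(X) is U(0, 1), the i-th order statistic should sit near i/n
max_gap = max(abs(u[i] - (i + 1) / n) for i in range(n))
```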
For the formulation and proof of the second result, we need some notation and a preliminary result. To this end, let X be an r.v. with d.f. F. Set y = F(x) and define F⁻¹ as follows:
\[ F^{-1}(y) = \inf\big\{x \in \mathbb{R};\ F(x) \ge y\big\}. \tag{5} \]
From this definition it is then clear that, when F is strictly increasing, for each x ∈ ℝ there is exactly one y ∈ (0, 1) such that F(x) = y. It is also clear that, if F is continuous, then the above definition becomes as follows:
\[ F^{-1}(y) = \inf\big\{x \in \mathbb{R};\ F(x) = y\big\}. \tag{6} \]
(See also Figs. 9.9, 9.10 and 9.11.)

Figures 9.9, 9.10 and 9.11  [Graphs illustrating F⁻¹: for a strictly increasing continuous F; for an F that is constant on an interval [A, B] (the whole interval corresponds to a single value y); and for an F with a jump at x.]

\[ F\big[F^{-1}(y)\big] \ge y. \tag{7} \]
Now assume that F⁻¹(y) ≤ t. Then F[F⁻¹(y)] ≤ F(t), since F is nondecreasing. Combining this result with (7), we obtain y ≤ F(t).
Next, assume that y ≤ F(t). This means that t belongs to the set {x ∈ ℝ; F(x) ≥ y} and hence F⁻¹(y) ≤ t. The proof of the lemma is completed. ▲
By means of the above lemma, we may now establish the following result.
By means of the above lemma, we may now establish the following result.
THEOREM 8 Let Y be an r.v. distributed as U(0, 1), and let F be a d.f. Define the r.v. X by
X = F 1(Y), where F1 is defined by (5). Then the d.f. of X is F.
PROOF  We have
\[ P(X \le x) = P\big[F^{-1}(Y) \le x\big] = P\big[Y \le F(x)\big] = F(x), \]
where the last step follows from the fact that Y is distributed as U(0, 1) and the one before it by Lemma 1. ▲
REMARK 7 As has already been stated, the theorem just proved provides a
specific way in which one can construct an r.v. X whose d.f. is a given d.f. F.
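This construction is the standard recipe of inverse-transform sampling; the Python sketch below uses it for the Exponential(1) case, where F(x) = 1 − e⁻ˣ has the closed-form inverse F⁻¹(y) = −log(1 − y) (the target distribution and sample size are my own choices).

```python
import math
import random

# sample from Exponential(1) as X = F^{-1}(Y) with Y ~ U(0, 1):
# F(x) = 1 - e^{-x}  gives  F^{-1}(y) = -log(1 - y)
random.seed(7)
n = 200_000
xs = [-math.log(1.0 - random.random()) for _ in range(n)]

mean = sum(xs) / n                       # should be near E X = 1
frac = sum(x <= 1.0 for x in xs) / n     # should be near F(1) = 1 - 1/e
target = 1.0 - math.exp(-1.0)
```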
Exercise
9.4.1 Let Xⱼ, j = 1, . . . , n, be independent r.v.s such that Xⱼ has continuous and strictly increasing d.f. Fⱼ. Set Yⱼ = Fⱼ(Xⱼ) and show that the r.v.
\[ X = -2\sum_{j=1}^{n}\log\big(1 - Y_j\big) \]
is distributed as χ²₂ₙ.
Chapter 10  Order Statistics and Related Theorems
In this chapter we introduce the concept of order statistics and also derive
various distributions. The results obtained here will be used in the second part
of this book for statistical inference purposes.
10.1 Order Statistics and Related Distributions

Let X₁, . . . , Xₙ be i.i.d. r.v.s and, for j = 1, . . . , n, let Yⱼ denote the jth smallest of the X's (that is, for each s ∈ S, look at X₁(s), X₂(s), . . . , Xₙ(s), and then Yⱼ(s) is defined to be the jth smallest among the numbers X₁(s), X₂(s), . . . , Xₙ(s), j = 1, 2, . . . , n). It follows that Y₁ ≤ Y₂ ≤ · · · ≤ Yₙ and, in general, the Y's are not independent.
We assume now that the X's are of the continuous type with p.d.f. f such that f(x) > 0 for (−∞ ≤) a < x < b (≤ ∞) and zero otherwise. One of the problems we are concerned with is that of finding the joint p.d.f. of the Y's. By means of Theorem 3, Chapter 9, it will be established that:
THEOREM 1 If X1, . . . , Xn are i.i.d. r.v.s with p.d.f. f which is positive for a < x < b and 0
otherwise, then the joint p.d.f. of the order statistics Y1, . . . , Yn is given by:
\[ g(y_1, \ldots, y_n) = \begin{cases} n!\,f(y_1)\cdots f(y_n), & a < y_1 < y_2 < \cdots < y_n < b \\ 0, & \text{otherwise.} \end{cases} \]
PROOF The proof is carried out explicitly for n = 3, but it is easily seen, with
the proper change in notation, to be valid in the general case as well. In the first
place, since for i ≠ j,
\[ P(X_i = X_j) = \iint_{(x_i = x_j)} f(x_i)f(x_j)\,dx_i\,dx_j = \int_a^b\left(\int_{x_j}^{x_j}f(x_i)\,dx_i\right)f(x_j)\,dx_j = 0, \]
and therefore P(Xᵢ = Xⱼ = Xₖ) = 0 for i ≠ j ≠ k, we may assume that the joint p.d.f., f(·, ·, ·), of X₁, X₂, X₃ is zero if at least two of the arguments x₁, x₂, x₃ are equal. Thus we have
\[ f(x_1, x_2, x_3) = \begin{cases} f(x_1)f(x_2)f(x_3), & a < x_1 \ne x_2 \ne x_3 < b \\ 0, & \text{otherwise.} \end{cases} \]
Now on each one of the S_{ijk}'s there exists a one-to-one transformation from the x's to the y's defined as follows:
S123 : y1 = x1 , y2 = x 2 , y3 = x3
S132 : y1 = x1 , y2 = x3 , y3 = x 2
S 213 : y1 = x 2 , y2 = x1 , y3 = x3
S 231 : y1 = x 2 , y2 = x3 , y3 = x1
S312 : y1 = x3 , y2 = x1 , y3 = x 2
S321 : y1 = x3 , y2 = x 2 , y3 = x1 .
S123 : x1 = y1 , x 2 = y2 , x3 = y3
S132 : x1 = y1 , x 2 = y3 , x3 = y2
S 213 : x1 = y2 , x 2 = y1 , x3 = y3
S 231 : x1 = y3 , x 2 = y1 , x3 = y2
S312 : x1 = y2 , x 2 = y3 , x3 = y1
S321 : x1 = y3 , x 2 = y2 , x3 = y1 .
On these sets the Jacobians are the determinants of the corresponding permutation matrices; for example,
\[ J_{123} = \begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} = 1, \qquad J_{132} = \begin{vmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{vmatrix} = -1, \]
and similarly J₂₁₃ = −1, J₂₃₁ = 1, J₃₁₂ = 1, J₃₂₁ = −1, so that |J| = 1 in all six cases.
Then, adding up the six contributions, we have
\[ g(y_1, y_2, y_3) = \begin{cases} f(y_1)f(y_2)f(y_3) + f(y_1)f(y_3)f(y_2) + f(y_2)f(y_1)f(y_3) \\ \quad + f(y_3)f(y_1)f(y_2) + f(y_2)f(y_3)f(y_1) + f(y_3)f(y_2)f(y_1), & a < y_1 < y_2 < y_3 < b \\ 0, & \text{otherwise.} \end{cases} \]
That is,
\[ g(y_1, y_2, y_3) = \begin{cases} 3!\,f(y_1)f(y_2)f(y_3), & a < y_1 < y_2 < y_3 < b \\ 0, & \text{otherwise.} \end{cases} \]
Notice that the proof in the general case is exactly the same. One has n! regions forming S, one for each permutation of the integers 1 through n. From the definition of a determinant and the fact that each row and column contains exactly one 1 and the rest all 0, it follows that the n! Jacobians are either 1 or −1, and the remaining part of the proof is identical to the one just given, except that one adds up n! like terms instead of 3!.
EXAMPLE 1  Let X₁, . . . , Xₙ be i.i.d. r.v.s distributed as N(μ, σ²). Then the joint p.d.f. of the order statistics Y₁, . . . , Yₙ is given by
\[ g(y_1, \ldots, y_n) = n!\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n\exp\left[-\frac{1}{2\sigma^2}\sum_{j=1}^{n}\big(y_j - \mu\big)^2\right], \qquad -\infty < y_1 < \cdots < y_n < \infty. \]
EXAMPLE 2  Let X₁, . . . , Xₙ be i.i.d. r.v.s distributed as U(α, β). Then the joint p.d.f. of the order statistics is
\[ g(y_1, \ldots, y_n) = \frac{n!}{(\beta - \alpha)^n}, \qquad \alpha < y_1 < \cdots < y_n < \beta, \]
and zero otherwise.
THEOREM 2  Let X₁, . . . , Xₙ be i.i.d. r.v.s with d.f. F and p.d.f. f which is positive and continuous for (−∞ ≤) a < x < b (≤ ∞) and zero otherwise, and let Y₁, . . . , Yₙ be the order statistics. Then the p.d.f. gⱼ of Yⱼ, j = 1, 2, . . . , n, is given by:
\[ \text{i)}\quad g_j(y_j) = \begin{cases} \dfrac{n!}{(j-1)!(n-j)!}\big[F(y_j)\big]^{j-1}\big[1 - F(y_j)\big]^{n-j}f(y_j), & a < y_j < b \\ 0, & \text{otherwise.} \end{cases} \]
In particular,
\[ \text{i}')\quad g_1(y_1) = \begin{cases} n\big[1 - F(y_1)\big]^{n-1}f(y_1), & a < y_1 < b \\ 0, & \text{otherwise,} \end{cases} \]
and
\[ \text{i}'')\quad g_n(y_n) = \begin{cases} n\big[F(y_n)\big]^{n-1}f(y_n), & a < y_n < b \\ 0, & \text{otherwise.} \end{cases} \]
The joint p.d.f. g_{ij} of any Yᵢ, Yⱼ with 1 ≤ i < j ≤ n is given by:
\[ \text{ii)}\quad g_{ij}(y_i, y_j) = \begin{cases} \dfrac{n!}{(i-1)!(j-i-1)!(n-j)!}\big[F(y_i)\big]^{i-1}\big[F(y_j) - F(y_i)\big]^{j-i-1} \\ \qquad\times\big[1 - F(y_j)\big]^{n-j}f(y_i)f(y_j), & a < y_i < y_j < b \\ 0, & \text{otherwise.} \end{cases} \]
In particular,
\[ \text{ii}')\quad g_{1n}(y_1, y_n) = \begin{cases} n(n-1)\big[F(y_n) - F(y_1)\big]^{n-2}f(y_1)f(y_n), & a < y_1 < y_n < b \\ 0, & \text{otherwise.} \end{cases} \]
PROOF  From Theorem 1, we have that g(y₁, . . . , yₙ) = n!f(y₁) · · · f(yₙ) for a < y₁ < · · · < yₙ < b and equals 0 otherwise. Since f is positive in (a, b), it follows that F is strictly increasing in (a, b) and therefore F⁻¹ exists in this interval. Hence if u = F(y), y ∈ (a, b), then y = F⁻¹(u), u ∈ (0, 1), and
\[ \frac{dy}{du} = \frac{1}{f\big[F^{-1}(u)\big]}, \qquad u \in (0, 1). \]
Set Uⱼ = F(Yⱼ), j = 1, . . . , n. Then the joint p.d.f. h of U₁, . . . , Uₙ is
\[ h(u_1, \ldots, u_n) = n!\,f\big[F^{-1}(u_1)\big]\cdots f\big[F^{-1}(u_n)\big]\cdot\frac{1}{f\big[F^{-1}(u_1)\big]}\cdots\frac{1}{f\big[F^{-1}(u_n)\big]} \]
for 0 < u₁ < · · · < uₙ < 1 and equals 0 otherwise; that is, h(u₁, . . . , uₙ) = n! for 0 < u₁ < · · · < uₙ < 1 and equals 0 otherwise. Hence for uⱼ ∈ (0, 1),
\[ h(u_j) = n!\int_0^{u_j}\cdots\int_0^{u_2}\int_{u_j}^{1}\cdots\int_{u_{n-1}}^{1}du_n\cdots du_{j+1}\,du_1\cdots du_{j-1}. \]
The first n − j integrations with respect to the variables uₙ, . . . , u_{j+1} yield [1/(n − j)!](1 − uⱼ)^{n−j}, and the last j − 1 integrations with respect to the variables u₁, . . . , u_{j−1} yield [1/(j − 1)!]uⱼ^{j−1}. Thus
\[ h(u_j) = \frac{n!}{(j-1)!(n-j)!}\,u_j^{j-1}\big(1 - u_j\big)^{n-j} \]
for uⱼ ∈ (0, 1) and equals 0 otherwise. Finally, using once again the transformation Uⱼ = F(Yⱼ), we obtain
\[ g_j(y_j) = \frac{n!}{(j-1)!(n-j)!}\big[F(y_j)\big]^{j-1}\big[1 - F(y_j)\big]^{n-j}f(y_j) \]
for yⱼ ∈ (a, b) and 0 otherwise. This completes the proof of (i).
Of course, (i′) and (i″) follow from (i) by setting j = 1 and j = n, respectively. An alternative and easier way of establishing (i′) and (i″) is the following:
\[ G_n(y_n) = P(Y_n \le y_n) = P(\text{all } X_j\text{'s} \le y_n) = F^n(y_n). \]
Thus gₙ(yₙ) = n[F(yₙ)]^{n−1}f(yₙ). Similarly,
\[ 1 - G_1(y_1) = P(Y_1 > y_1) = P(\text{all } X_j\text{'s} > y_1) = \big[1 - F(y_1)\big]^n. \]
Then
\[ -g_1(y_1) = n\big[1 - F(y_1)\big]^{n-1}\big[-f(y_1)\big], \qquad\text{or}\qquad g_1(y_1) = n\big[1 - F(y_1)\big]^{n-1}f(y_1). \]
The proof of (ii) is similar to that of (i), and in fact the same method can be used to find the joint p.d.f. of any number of Yⱼ's (see also Exercise 10.1.19).
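The formulas for the extreme order statistics are easy to confirm by simulation; the Python sketch below uses i.i.d. Exponential(λ) draws (an arbitrary concrete choice) and checks P(Y₁ > y) = [1 − F(y)]ⁿ and P(Yₙ ≤ y) = [F(y)]ⁿ at one test point.

```python
import math
import random

random.seed(8)
lam, n, reps = 1.0, 5, 100_000   # arbitrary rate, sample size, and replication count

# Y1 = min and Yn = max of n i.i.d. Exponential(lam) draws
mins, maxs = [], []
for _ in range(reps):
    draws = [random.expovariate(lam) for _ in range(n)]
    mins.append(min(draws))
    maxs.append(max(draws))

F = lambda x: 1.0 - math.exp(-lam * x)
y0 = 0.5
# 1 - G1(y) = [1 - F(y)]^n   and   Gn(y) = [F(y)]^n
emp_min = sum(m > y0 for m in mins) / reps
emp_max = sum(m <= y0 for m in maxs) / reps
th_min = (1.0 - F(y0)) ** n
th_max = F(y0) ** n
```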
EXAMPLE 3  Refer to Example 2. Then
\[ F(x) = \begin{cases} 0, & x \le \alpha \\ \dfrac{x - \alpha}{\beta - \alpha}, & \alpha < x < \beta \\ 1, & x \ge \beta, \end{cases} \]
and therefore
\[ g_j(y_j) = \begin{cases} \dfrac{n!}{(j-1)!(n-j)!}\left(\dfrac{y_j - \alpha}{\beta - \alpha}\right)^{j-1}\left(\dfrac{\beta - y_j}{\beta - \alpha}\right)^{n-j}\dfrac{1}{\beta - \alpha}, & \alpha < y_j < \beta \\ 0, & \text{otherwise} \end{cases} \]
\[ \phantom{g_j(y_j)} = \begin{cases} \dfrac{n!}{(j-1)!(n-j)!(\beta - \alpha)^n}\big(y_j - \alpha\big)^{j-1}\big(\beta - y_j\big)^{n-j}, & \alpha < y_j < \beta \\ 0, & \text{otherwise.} \end{cases} \]
Thus
\[ g_1(y_1) = \begin{cases} \dfrac{n\big(\beta - y_1\big)^{n-1}}{(\beta - \alpha)^n}, & \alpha < y_1 < \beta \\ 0, & \text{otherwise,} \end{cases} \qquad g_n(y_n) = \begin{cases} \dfrac{n\big(y_n - \alpha\big)^{n-1}}{(\beta - \alpha)^n}, & \alpha < y_n < \beta \\ 0, & \text{otherwise,} \end{cases} \]
and
\[ g_{1n}(y_1, y_n) = \begin{cases} \dfrac{n(n-1)\big(y_n - y_1\big)^{n-2}}{(\beta - \alpha)^n}, & \alpha < y_1 < y_n < \beta \\ 0, & \text{otherwise.} \end{cases} \]
In particular, for α = 0 and β = 1 (that is, for U(0, 1)), the formulas of Example 3 become
\[ g_j(y_j) = \begin{cases} \dfrac{n!}{(j-1)!(n-j)!}\,y_j^{j-1}\big(1 - y_j\big)^{n-j}, & 0 < y_j < 1 \\ 0, & \text{otherwise.} \end{cases} \]
Since Γ(m) = (m − 1)!, this becomes
\[ g_j(y_j) = \begin{cases} \dfrac{\Gamma(n+1)}{\Gamma(j)\Gamma(n-j+1)}\,y_j^{j-1}\big(1 - y_j\big)^{n-j}, & 0 < y_j < 1 \\ 0, & \text{otherwise,} \end{cases} \]
which is the density of a Beta distribution with parameters α = j, β = n − j + 1. Likewise
\[ g_1(y_1) = \begin{cases} n\big(1 - y_1\big)^{n-1}, & 0 < y_1 < 1 \\ 0, & \text{otherwise,} \end{cases} \qquad g_n(y_n) = \begin{cases} n\,y_n^{n-1}, & 0 < y_n < 1 \\ 0, & \text{otherwise,} \end{cases} \]
and
\[ g_{1n}(y_1, y_n) = \begin{cases} n(n-1)\big(y_n - y_1\big)^{n-2}, & 0 < y_1 < y_n < 1 \\ 0, & \text{otherwise.} \end{cases} \]
The r.v. Y = Y_n − Y_1 is called the (sample) range and is of some interest in
statistical applications. The distribution of Y is found as follows. Consider the
transformation

    y = y_n − y_1, z = y_1.  Then  y_1 = z, y_n = y + z,  and hence J = 1.

Therefore

    f_{Y,Z}(y, z) = g_{1n}(z, y + z) = n(n − 1)[F(y + z) − F(z)]^{n−2} f(z) f(y + z),
        0 < y < b − a,  a < z < b − y,

and zero otherwise. Integrating with respect to z, one obtains

    f_Y(y) = n(n − 1) ∫_a^{b−y} [F(y + z) − F(z)]^{n−2} f(z) f(y + z) dz for 0 < y < b − a; 0, otherwise.
In particular, if X is an r.v. distributed as U(0, 1), then

    f_Y(y) = n(n − 1) ∫_0^{1−y} y^{n−2} dz = n(n − 1) y^{n−2}(1 − y),  0 < y < 1;

that is,

    f_Y(y) = n(n − 1) y^{n−2}(1 − y) for 0 < y < 1; 0, otherwise.
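A density such as this one is conveniently checked through a moment: integrating y · n(n − 1)y^{n−2}(1 − y) over (0, 1) gives E(Y) = (n − 1)/(n + 1). The sketch below (an editorial addition; the function name and the choice of n are ours) compares this with a Monte Carlo estimate:

```python
import random

def sample_range_mean_mc(n, reps=30000, rng=None):
    """Monte Carlo estimate of E(Y_n - Y_1) for n i.i.d. U(0,1) r.v.s
    (an illustrative check; not from the text)."""
    rng = rng or random.Random(1)
    total = 0.0
    for _ in range(reps):
        xs = [rng.random() for _ in range(n)]
        total += max(xs) - min(xs)
    return total / reps

# f_Y(y) = n(n-1) y^(n-2) (1-y) on (0,1) implies E(Y) = (n-1)/(n+1).
n = 6
est = sample_range_mean_mc(n)
theo = (n - 1) / (n + 1)
```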
Let now U be an r.v. distributed as χ²_r and independent of the sample range Y. Set

    Z = Y/√(U/r).

We are interested in deriving the distribution of the r.v. Z. To this end, we
consider the transformation

    z = y/√(u/r), w = u/r.  Then  u = rw, y = z√w,  and hence J = r√w.

Therefore

    f_{Z,W}(z, w) = f_Y(z√w) f_U(rw) r√w,

and

    f_Z(z) = ∫_0^∞ f_Y(z√w) f_U(rw) r√w dw.
Exercises
Throughout these exercises, Xj, j = 1, . . . , n, are i.i.d. r.v.s and Yj = X(j) is the
jth order statistic of the Xs. The r.v.s Xj, j = 1, . . . , n may represent various
physical quantities such as the breaking strength of certain steel bars, the
crushing strength of bricks, the weight of certain objects, the life of certain
items such as light bulbs, vacuum tubes, etc. From these examples, it is then
clear that the distributions of the Y_j's and, in particular, those of Y_1 and Y_n, as well
as of Y_n − Y_1, are quantities of great interest.
10.1.1  Let X_j, j = 1, . . . , n be i.i.d. r.v.s with d.f. and p.d.f. F and f, respec-
tively, and let m be the median of F. Use Theorem 2(i) in order to calculate
the probability that all X_j's exceed m; also calculate the probability P(Y_n ≤ m).
10.1.2  Let X_1, X_2, X_3 be independent r.v.s with p.d.f. f given by:

    f(x) = e^{−(x−θ)} I_{(θ,∞)}(x).

Use Theorem 2(i) in order to determine the constant c = c(θ) for which
P(θ < Y_3 < c) = 0.90.
10.1.3  If the independent r.v.s X_1, . . . , X_n are distributed as U(α, β), then:
i) Calculate the probability that all X's are greater than (α + β)/2;
ii) In particular, for α = 0, β = 1, and n = 2, derive the p.d.f. of the r.v. Y_2/Y_1.
10.1.4  Let X_j, j = 1, . . . , n be independent r.v.s distributed as U(α, β). Then:
i) Use the p.d.f. derived in Example 3 in order to show that

    EY_j = α + (β − α) j/(n + 1)  and  σ²(Y_j) = (β − α)² j(n − j + 1)/[(n + 1)²(n + 2)];

ii) Derive EY_1, σ²(Y_1), and EY_n, σ²(Y_n) from part (i);
iii) What do the quantities in parts (i) and (ii) become for α = 0 and
β = 1? (Hint: In part (i), use the appropriate Beta p.d.f.'s to facilitate
the integrations.)
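The mean in part (i) above can be checked by simulation; the following sketch (an editorial addition, with all names and parameter choices ours) estimates E(Y_j) for a U(a, b) sample and compares it with a + (b − a)j/(n + 1):

```python
import random

def order_stat_mean_mc(n, j, a, b, reps=20000, rng=None):
    """Monte Carlo estimate of E(Y_j), the j-th order statistic (1-based j)
    of n i.i.d. U(a, b) r.v.s; a sketch for checking Exercise 10.1.4(i)."""
    rng = rng or random.Random(2)
    total = 0.0
    for _ in range(reps):
        xs = sorted(rng.uniform(a, b) for _ in range(n))
        total += xs[j - 1]
    return total / reps

n, j, a, b = 7, 3, 2.0, 5.0
est = order_stat_mean_mc(n, j, a, b)
theo = a + (b - a) * j / (n + 1)   # = 2 + 3*(3/8) = 3.125
```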
10.1.5
i) Refer to the p.d.f. g_{1n} derived in Theorem 2(ii), and show that, if X_1, . . . , X_n
are independent with distribution U(α, β), then:

    g_{1n}(y_1, y_n) = n(n − 1)(y_n − y_1)^{n−2}/(β − α)^n,  α < y_1 < y_n < β;

ii) Set Y = Y_n − Y_1, and show that the p.d.f. of Y is given by:

    f_Y(y) = n(n − 1)(β − α − y) y^{n−2}/(β − α)^n,  0 < y < β − α,

and that

    P(a < Y < b) = n[(β − α − b)b^{n−1} − (β − α − a)a^{n−1}]/(β − α)^n + (b^n − a^n)/(β − α)^n.
i) For the U(0, 1) case, show that the joint p.d.f. g_{ij} of Theorem 2(ii) becomes

    g_{ij}(y_i, y_j) = [n!/((i − 1)!(j − i − 1)!(n − j)!)] y_i^{i−1}(y_j − y_i)^{j−i−1}(1 − y_j)^{n−j},
        0 < y_i < y_j < 1;

ii) Show that

    ∫_0^z y^i(z − y)^{j−i−1} dy = [i!(j − i − 1)!/j!] z^j  and  ∫_0^1 z^{j+1}(1 − z)^{n−j} dz = (j + 1)!(n − j)!/(n + 2)!;

iii) Use parts (i) and (ii) to show that

    E(Y_iY_j) = i(j + 1)/[(n + 1)(n + 2)],

and that

    Cov(Y_i, Y_j) = i(n − j + 1)/[(n + 1)²(n + 2)]  and  ρ(Y_i, Y_j) = i(n − j + 1)/[i(n − i + 1)j(n − j + 1)]^{1/2};

    {(z_1, . . . , z_n)′ ∈ ℝ^n; z_j ≥ 0, j = 1, . . . , n, and Σ_{j=1}^n z_j ≤ 1}.
10.1.9
i) Let the independent r.v.s X_1, . . . , X_n have the Negative Exponential distri-
bution with parameter λ. Then show that Y_1 has the same distribution with
parameter nλ;
ii) Let F be the (common) d.f. of the independent r.v.s X_1, . . . , X_n, and
suppose that their first order statistic Y_1 has the Negative Exponential
distribution with parameter nλ. Then F is the Negative Exponential d.f.
with parameter λ.
10.1.10  Let the independent r.v.s X_1, . . . , X_n have the Negative Expon-
ential distribution with parameter λ. Then:
i) Use Theorem 2(i) in order to show that the p.d.f. of Y_n is given by:

    g_n(y_n) = nλe^{−λy_n}(1 − e^{−λy_n})^{n−1},  y_n > 0;

ii) Let Y be the (sample) range; that is, Y = Y_n − Y_1, and then use Theorem 2
(ii) in order to show that the p.d.f. of Y is given by:

    f_Y(y) = (n − 1)λe^{−λy}(1 − e^{−λy})^{n−2},  y > 0;

iii) Calculate the probability P(a < Y < b) for 0 < a < b;
iv) For a = 1/λ, b = 2/λ, and n = 10, find a numerical value for the probability
in part (iii).
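Integrating the p.d.f. in part (ii) suggests F_Y(y) = (1 − e^{−λy})^{n−1} for y > 0 — a step carried out here for illustration and not stated in the text. The sketch below (Python; all names and numbers are ours) uses it, together with simulation, for the probability asked for in part (iv):

```python
import random
import math

def range_cdf_mc(n, lam, y, reps=20000, rng=None):
    """Monte Carlo estimate of P(Y_n - Y_1 <= y) for n i.i.d. Negative
    Exponential(lam) r.v.s; a numerical sketch for Exercise 10.1.10."""
    rng = rng or random.Random(3)
    hits = 0
    for _ in range(reps):
        xs = [rng.expovariate(lam) for _ in range(n)]
        if max(xs) - min(xs) <= y:
            hits += 1
    return hits / reps

# Assumed closed form (our derivation): F_Y(y) = (1 - exp(-lam*y))^(n-1).
n, lam = 10, 2.0
a, b = 1 / lam, 2 / lam
p_theory = (1 - math.exp(-lam * b))**(n - 1) - (1 - math.exp(-lam * a))**(n - 1)
p_mc = range_cdf_mc(n, lam, b) - range_cdf_mc(n, lam, a)
```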
10.1.11  Let the independent r.v.s X_1, . . . , X_n have the Negative Exponen-
tial distribution with parameter λ, and let 1 < j < n. Use Theorem 2(ii) and
Exercises 10.1.9(i) and 10.1.10(i) in order to determine:
i) The conditional p.d.f. of Y_j, given Y_1;
ii) The conditional p.d.f. of Y_j, given Y_n;
iii) The conditional p.d.f. of Y_n, given Y_1; and the conditional p.d.f. of Y_1,
given Y_n.
10.1.12  Let the independent r.v.s X_1, . . . , X_n have the Negative Exponen-
tial distribution with parameter λ, and set: Z_1 = Y_1, Z_j = Y_j − Y_{j−1}, j = 2, . . . , n.
Then:
i) For j = 1, . . . , n, show that Z_j has the Negative Exponential distribution
with parameter (n − j + 1)λ, and that these r.v.s are independent;
ii) From the definition of the Z_j's, it follows that Y_j = Z_1 + · · · + Z_j, j = 1, . . . ,
n. Use this expression and part (i) in order to conclude that:

    EY_j = (1/λ)[1/n + 1/(n − 1) + · · · + 1/(n − j + 1)];

iii) Use part (i) in order to show that, for c > 0:

    P(min_{i≠j} |X_i − X_j| > c) = exp[−n(n − 1)λc/2];

iv) Also utilize parts (i) and (ii) in order to show that:

    Cov(Σ_{i=1}^n c_iY_i, Σ_{j=1}^n d_jY_j) = Σ_{i=1}^n σ_i²[Σ_{j=i}^n c_jd_j + Σ_{i≤k<l≤n}(c_kd_l + c_ld_k)],

where σ_i² = σ²(Z_i), and c_j, d_j ∈ ℝ are constants.
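The harmonic-sum formula of part (ii) is easy to check by simulation; the sketch below (an editorial addition, with all names and parameter choices ours) estimates E(Y_j) directly from exponential samples and compares:

```python
import random

def mean_order_stat_exponential(n, j, lam, reps=30000, rng=None):
    """Monte Carlo estimate of E(Y_j) for n i.i.d. Negative Exponential(lam)
    r.v.s (1-based j); a sketch for checking Exercise 10.1.12(ii)."""
    rng = rng or random.Random(4)
    total = 0.0
    for _ in range(reps):
        xs = sorted(rng.expovariate(lam) for _ in range(n))
        total += xs[j - 1]
    return total / reps

# Part (ii): E(Y_j) = (1/lam) * (1/n + 1/(n-1) + ... + 1/(n-j+1)).
n, j, lam = 8, 3, 1.5
theo = sum(1 / (n - k) for k in range(j)) / lam   # k = 0, ..., j-1
est = mean_order_stat_exponential(n, j, lam)
```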
Let X_j, j = 1, . . . , n be i.i.d. r.v.s. Then the sample median SM is defined as
follows:

    SM = Y_{(n+1)/2} if n is odd;  SM = ½(Y_{n/2} + Y_{(n+2)/2}) if n is even.  (*)
10.1.14  If X_j, j = 1, . . . , n are i.i.d. r.v.s, and n is odd, determine the p.d.f. of
SM when the underlying distribution is:
i) U(α, β);
ii) Negative Exponential with parameter λ.
10.1.15  If the r.v.s X_j, j = 1, . . . , n are independently distributed as N(μ, σ²),
show that the p.d.f. of SM is symmetric about μ, where SM is defined by (*).
Without integration, conclude that E(SM) = μ.
10.1.16  For n odd, let the independent r.v.s X_j, j = 1, . . . , n have p.d.f. f with
median m. Then determine the p.d.f. of SM, and also calculate the probability
P(SM > m) in each one of the following cases:
i) f(x) = 2xI_{(0,1)}(x);
ii) f(x) = 2(2 − x)I_{(1,2)}(x);
iii) f(x) = 2(1 − x)I_{(0,1)}(x);
iv) What do parts (i)-(iii) become for n = 3?
10.1.17  Refer to Exercise 10.1.2 and derive the p.d.f. of SM, where SM is
defined by (*).
10.1.18  Let X_j, j = 1, . . . , 6 be i.i.d. r.v.s with p.d.f. f given by f(x) = 1/6,
x = 1, . . . , 6. Find the p.d.f.'s of Y_1 and Y_6. Also, observe that P(Y_1 = y) =
P(Y_6 = 7 − y), y = 1, . . . , 6.
10.1.19 Carry out the proof of part (ii) of Theorem 2.
THEOREM 3  Let X_1, . . . , X_n be i.i.d. r.v.s with continuous d.f. F and let Z_j = F(Y_j), where Y_j,
j = 1, 2, . . . , n are the order statistics. Then Z_1, . . . , Z_n are order statistics
from U(0, 1), and hence their joint p.d.f. h is given by:

    h(z_1, . . . , z_n) = n! for 0 < z_1 < · · · < z_n < 1; 0, otherwise.
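Theorem 3 can be watched at work numerically: applying F to the order statistics of any continuous distribution should reproduce the behavior of U(0, 1) order statistics, whose means are j/(n + 1) (cf. Example 3). The sketch below (ours, not the text's; the exponential choice is arbitrary) does this check:

```python
import random
import math

def transformed_order_stats(n, lam, reps=20000, rng=None):
    """Numerical check of Theorem 3: with F the Negative Exponential(lam)
    d.f., the r.v.s Z_j = F(Y_j) should behave like U(0,1) order statistics.
    An illustrative sketch, not part of the text."""
    rng = rng or random.Random(5)
    F = lambda x: 1 - math.exp(-lam * x)
    means = [0.0] * n
    for _ in range(reps):
        ys = sorted(rng.expovariate(lam) for _ in range(n))
        for j, y in enumerate(ys):
            means[j] += F(y)
    return [m / reps for m in means]

# E of the j-th U(0,1) order statistic is j/(n+1).
n, lam = 4, 0.7
est = transformed_order_stats(n, lam)
theo = [j / (n + 1) for j in range(1, n + 1)]   # 0.2, 0.4, 0.6, 0.8
```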
10.2 Further Distribution Theory: Probability of Coverage of a Population Quantile

THEOREM 4  Let X_1, . . . , X_n be i.i.d. r.v.s with continuous d.f. F and let Y_1, . . . , Y_n be the
order statistics. For p, 0 < p < 1, let x_p be the (unique by assumption) pth
quantile. Then we have

    P(Y_i ≤ x_p ≤ Y_j) = Σ_{k=i}^{j−1} \binom{n}{k} p^k q^{n−k},  where q = 1 − p.

PROOF  Set

    W_j = 1 if X_j ≤ x_p, and W_j = 0 if X_j > x_p,  j = 1, 2, . . . , n.

Then W_1, . . . , W_n are i.i.d. r.v.s distributed as B(1, p), since

    P(W_1 = 1) = P(X_1 ≤ x_p) = F(x_p) = p.

Therefore

    P(at least i of X_1, . . . , X_n ≤ x_p) = Σ_{k=i}^n \binom{n}{k} p^k q^{n−k};

or equivalently,

    P(Y_i < x_p) = P(Y_i ≤ x_p) = Σ_{k=i}^n \binom{n}{k} p^k q^{n−k}.  (1)

Next,

    P(Y_i ≤ x_p) = P(Y_i ≤ x_p, Y_j ≥ x_p) + P(Y_i ≤ x_p, Y_j < x_p)
                 = P(Y_i ≤ x_p ≤ Y_j) + P(Y_j < x_p),

since

    (Y_j < x_p) ⊆ (Y_i ≤ x_p).

Therefore

    P(Y_i ≤ x_p ≤ Y_j) = P(Y_i ≤ x_p) − P(Y_j < x_p).

By means of (1), this gives

    P(Y_i ≤ x_p ≤ Y_j) = Σ_{k=i}^n \binom{n}{k} p^k q^{n−k} − Σ_{k=j}^n \binom{n}{k} p^k q^{n−k}
                       = Σ_{k=i}^{j−1} \binom{n}{k} p^k q^{n−k}.
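Notice that the binomial sum in Theorem 4 does not involve F: the interval [Y_i, Y_j] covers x_p with the same probability for every continuous d.f. The sketch below (an editorial illustration; the exponential choice of data and all names are ours) verifies this numerically:

```python
import random
import math

def coverage_theory(n, i, j, p):
    """P(Y_i <= x_p <= Y_j) according to Theorem 4:
    sum over k = i, ..., j-1 of C(n,k) p^k q^(n-k)."""
    q = 1 - p
    return sum(math.comb(n, k) * p**k * q**(n - k) for k in range(i, j))

def coverage_mc(n, i, j, p, reps=20000, rng=None):
    """Monte Carlo frequency with which [Y_i, Y_j] covers the p-th quantile,
    for i.i.d. Exp(1) data (a check of ours, not the text's)."""
    rng = rng or random.Random(6)
    x_p = -math.log(1 - p)            # p-th quantile of the Exp(1) d.f.
    hits = 0
    for _ in range(reps):
        ys = sorted(rng.expovariate(1.0) for _ in range(n))
        if ys[i - 1] <= x_p <= ys[j - 1]:
            hits += 1
    return hits / reps

n, i, j, p = 10, 3, 8, 0.5
theo = coverage_theory(n, i, j, p)    # = 912/1024, free of F
mc = coverage_mc(n, i, j, p)
```

This is the classical distribution-free confidence interval for a quantile: (Y_i, Y_j) is chosen so that the binomial sum meets a desired confidence level.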
Exercise
10.2.1 Let Xj, j = 1, . . . , n be i.i.d. r.v.s with continuous d.f. F. Use Theorem
3 and the relevant part of Example 3 in order to determine the distribution of
the r.v. F(Y1) and find its expectation.
Chapter 11
Sufficiency and Related Theorems

11.1 Sufficiency: Definition and Some Basic Results
Let X be an r.v. with p.d.f. f(·; θ) of known functional form but depending upon
an unknown r-dimensional constant vector θ = (θ_1, . . . , θ_r)′ which is called a
parameter. We let Ω stand for the set of all possible values of θ and call it the
parameter space. So Ω ⊆ ℝ^r, r ≥ 1. By F we denote the family of all p.d.f.'s we
get by letting θ vary over Ω; that is, F = {f(·; θ); θ ∈ Ω}.
Let X_1, . . . , X_n be a random sample of size n from f(·; θ), that is, n
independent r.v.s distributed as X above. One of the basic problems of
statistics is that of making inferences about the parameter θ (such as estimat-
ing θ, testing hypotheses about θ, etc.) on the basis of the observed values
x_1, . . . , x_n, the data, of the r.v.s X_1, . . . , X_n. In doing so, the concept of
sufficiency plays a fundamental role in allowing us to often substantially con-
dense the data without ever losing any information carried by them about the
parameter θ.
In most of the textbooks, the concept of sufficiency is treated exclusively
in conjunction with estimation and testing hypotheses problems. We propose,
however, to treat it in a separate chapter and gather together here all relevant
results which will be needed in the sequel. In the same chapter, we also
introduce and discuss other concepts such as: completeness, unbiasedness and
minimum variance unbiasedness.
For j = 1, . . . , m, let T_j be (measurable) functions defined on ℝ^n into ℝ and
not depending on θ or any other unknown quantities, and set T = (T_1, . . . , T_m)′.
Then

    T(X_1, . . . , X_n) = (T_1(X_1, . . . , X_n), . . . , T_m(X_1, . . . , X_n))′

is called an m-dimensional statistic. We shall write T(X_1, . . . , X_n) rather than
boldface T(X_1, . . . , X_n) if m = 1. Likewise, we shall write θ rather than boldface θ when r = 1.
Also, we shall often write T instead of T(X_1, . . . , X_n), by slightly abusing the
notation.
260 11 Sufficiency and Related Theorems
EXAMPLE 1  Let X = (X_1, . . . , X_r)′ have the Multinomial distribution with parameters n and
θ = (θ_1, . . . , θ_r)′, where Ω = {(θ_1, . . . , θ_r)′ ∈ ℝ^r; θ_j > 0, j = 1, . . . , r, Σ_{j=1}^r θ_j = 1};
that is,

    f(x; θ) = [n!/(x_1! · · · x_r!)] θ_1^{x_1} · · · θ_r^{x_r} I_A(x)
            = [n!/(x_1! · · · x_{r−1}!(n − x_1 − · · · − x_{r−1})!)] θ_1^{x_1} · · · θ_{r−1}^{x_{r−1}} (1 − θ_1 − · · · − θ_{r−1})^{n − Σ_{j=1}^{r−1} x_j} I_A(x),

where

    A = {x = (x_1, . . . , x_r)′ ∈ ℝ^r; x_j ≥ 0, j = 1, . . . , r, Σ_{j=1}^r x_j = n}.

For example, for r = 3, Ω is that part of the plane through the points (1, 0, 0),
(0, 1, 0) and (0, 0, 1) which lies in the first octant, whereas for r = 2, the
distribution of X = (X_1, X_2)′ is completely determined by that of X_1 = X which
is distributed as B(n, θ_1) = B(n, θ).
EXAMPLE 2  Let X be U(α, β). By setting θ_1 = α, θ_2 = β, we have θ = (θ_1, θ_2)′, Ω = {(θ_1, θ_2)′ ∈
ℝ²; θ_1, θ_2 ∈ ℝ, θ_1 < θ_2} (that is, the part of the plane above the main diagonal)
and

    f(x; θ) = [1/(θ_2 − θ_1)] I_A(x),  A = [θ_1, θ_2].

If α is known and we put θ = β, then Ω = (α, ∞) and

    f(x; θ) = [1/(θ − α)] I_A(x),  A = [α, θ].

Similarly, if β is known and θ = α.
EXAMPLE 3  Let X be N(μ, σ²). Then by setting θ_1 = μ, θ_2 = σ², we have θ = (θ_1, θ_2)′,

    Ω = {(θ_1, θ_2)′ ∈ ℝ²; θ_1 ∈ ℝ, θ_2 > 0}

(that is, the part of the plane above the horizontal axis) and

    f(x; θ) = [1/√(2πθ_2)] exp[−(x − θ_1)²/(2θ_2)].

If σ is known and we set θ = μ, then Ω = ℝ and

    f(x; θ) = [1/(√(2π)σ)] exp[−(x − θ)²/(2σ²)].

Similarly if μ is known and σ² = θ.
EXAMPLE 4  Let X = (X_1, X_2)′ have the Bivariate Normal distribution. Setting θ_1 = μ_1, θ_2 = μ_2,
θ_3 = σ_1², θ_4 = σ_2², θ_5 = ρ, we have then θ = (θ_1, . . . , θ_5)′ and

    Ω = {(θ_1, . . . , θ_5)′ ∈ ℝ^5; θ_1, θ_2 ∈ ℝ, θ_3, θ_4 > 0, θ_5 ∈ (−1, 1)}

and

    f(x; θ) = [1/(2πσ_1σ_2√(1 − ρ²))] e^{−q/2},

where

    q = [1/(1 − ρ²)]{[(x_1 − μ_1)/σ_1]² − 2ρ[(x_1 − μ_1)/σ_1][(x_2 − μ_2)/σ_2] + [(x_2 − μ_2)/σ_2]²},
    x = (x_1, x_2)′.
Before the formal definition of sufficiency is given, an example will be
presented to illustrate the underlying motivation.
EXAMPLE 5  Let X_j, j = 1, . . . , n, be i.i.d. r.v.s distributed as B(1, θ); that is,

    f_{X_j}(x_j; θ) = θ^{x_j}(1 − θ)^{1−x_j} I_A(x_j),  j = 1, . . . , n,

where A = {0, 1}, Ω = (0, 1). Set T = Σ_{j=1}^n X_j. Then T is B(n, θ), so that

    f_T(t; θ) = \binom{n}{t} θ^t(1 − θ)^{n−t} I_B(t),

where B = {0, 1, . . . , n}. We suppose that the Binomial experiment in question
is performed and that the observed values of Xj are xj, j = 1, . . . , n. Then the
problem is to make some kind of inference about θ on the basis of x_j, j = 1, . . . ,
n. As usual, we label as a success the outcome 1. Then the following question
arises: Can we say more about θ if we know how many successes occurred and
where, rather than merely how many successes occurred? The answer to this
question will be provided by the following argument. Given that the number
of successes is t, that is, given that T = t, t = 0, 1, . . . , n, find the probability of
each one of the \binom{n}{t} different ways in which the t successes can occur. Then, if
there are values of θ for which particular occurrences of the t successes can
happen with higher probability than others, we will say that knowledge of the
positions where the t successes occurred is more informative about θ than
simply knowledge of the total number of successes t. If, on the other hand, all
possible outcomes, given the total number of successes t, have the same
probability of occurrence, then clearly the positions where the t successes
occurred are entirely irrelevant and the total number of successes t provides all
possible information about θ. In the present case, we have
    P(X_1 = x_1, . . . , X_n = x_n | T = t) = P(X_1 = x_1, . . . , X_n = x_n, T = t)/P(T = t)
                                          = P(X_1 = x_1, . . . , X_n = x_n)/P(T = t)  if x_1 + · · · + x_n = t,

and zero otherwise, and this is equal to

    [θ^{x_1}(1 − θ)^{1−x_1} · · · θ^{x_n}(1 − θ)^{1−x_n}] / [\binom{n}{t} θ^t(1 − θ)^{n−t}]
        = [θ^t(1 − θ)^{n−t}] / [\binom{n}{t} θ^t(1 − θ)^{n−t}] = 1/\binom{n}{t}

if x_1 + · · · + x_n = t and zero otherwise. Thus, we found that for all x_1, . . . , x_n such
that x_j = 0 or 1, j = 1, . . . , n and Σ_{j=1}^n x_j = t,

    P(X_1 = x_1, . . . , X_n = x_n | T = t) = 1/\binom{n}{t},

independent of θ.
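The computation above can be reproduced by brute-force enumeration. The sketch below (an editorial illustration; the function name and the choices of n, t, θ are ours) confirms that the conditional distribution is uniform over the \binom{n}{t} admissible configurations, whatever θ is:

```python
from itertools import product
from math import comb

def conditional_given_sum(n, t, theta):
    """Exact P(X_1=x_1, ..., X_n=x_n | T=t) for i.i.d. B(1, theta) r.v.s,
    computed from the definition by enumeration (a check of the text's
    computation, not part of the text)."""
    p_t = comb(n, t) * theta**t * (1 - theta)**(n - t)   # T is B(n, theta)
    return {xs: (theta**t * (1 - theta)**(n - t)) / p_t
            for xs in product((0, 1), repeat=n) if sum(xs) == t}

n, t = 4, 2
tables = {theta: conditional_given_sum(n, t, theta) for theta in (0.2, 0.5, 0.9)}
# each admissible configuration gets probability 1/C(4, 2) = 1/6, whatever theta
```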
DEFINITION 1  Suppose that

    T_j = T_j(X_1, . . . , X_n),  j = 1, . . . , m,

are statistics. We say that T = (T_1, . . . , T_m)′ is an m-dimensional sufficient statistic for the
family F = {f(·; θ); θ ∈ Ω}, or for the parameter θ, if the conditional distribution
of (X_1, . . . , X_n)′, given T = t, is independent of θ for all values of t (actually, for
almost all (a.a.) t, that is, except perhaps for a set N in ℝ^m of values of t such
that P_θ(T ∈ N) = 0 for all θ ∈ Ω, where P_θ denotes the probability function
associated with the p.d.f. f(·; θ)).
REMARK 1  Thus, T being a sufficient statistic for θ implies that for every (meas-
urable) set A in ℝ^n, P_θ[(X_1, . . . , X_n) ∈ A | T = t] is independent of θ for a.a. t.
In particular, if T* = φ(X_1, . . . , X_n) is any other statistic and B is any (measurable) set,
then with A = {x ∈ ℝ^n; φ(x) ∈ B} we have

    P(T* ∈ B | T = t) = P[(X_1, . . . , X_n) ∈ A | T = t],

and this is independent of θ for a.a. t.
We finally remark that X = (X_1, . . . , X_n)′ is always a sufficient statistic
for θ.
Clearly, Definition 1 above does not seem appropriate for identifying a
sufficient statistic. This can be done quite easily by means of the following
theorem.

THEOREM 1  (Fisher-Neyman factorization theorem) Let X_1, . . . , X_n be i.i.d. r.v.s with
p.d.f. f(·; θ), θ = (θ_1, . . . , θ_r)′ ∈ Ω ⊆ ℝ^r. An m-dimensional statistic

    T = T(X_1, . . . , X_n) = (T_1(X_1, . . . , X_n), . . . , T_m(X_1, . . . , X_n))′

is sufficient for θ if and only if the joint p.d.f. of X_1, . . . , X_n factors as follows,

    f(x_1, . . . , x_n; θ) = g[T(x_1, . . . , x_n); θ] h(x_1, . . . , x_n),

where g depends on x_1, . . . , x_n only through T and h is (entirely) independent
of θ.
PROOF The proof is given separately for the discrete and the continuous
case.
Discrete case: In the course of this proof, we are going to use the notation
T(x_1, . . . , x_n) = t. In connection with this, it should be pointed out at the outset
that by doing so we restrict attention only to those x_1, . . . , x_n for which
T(x_1, . . . , x_n) = t.
Assume that the factorization holds, that is,

    f(x_1, . . . , x_n; θ) = g[T(x_1, . . . , x_n); θ] h(x_1, . . . , x_n).

Then

    P_θ(T = t) = P_θ[T(X_1, . . . , X_n) = t] = Σ P_θ(X_1 = x_1, . . . , X_n = x_n),

where the summation extends over all (x_1, . . . , x_n)′ for which T(x_1, . . . , x_n) = t.
Thus

    P_θ(T = t) = Σ f(x_1; θ) · · · f(x_n; θ) = Σ g(t; θ) h(x_1, . . . , x_n) = g(t; θ) Σ h(x_1, . . . , x_n).

Hence

    P_θ(X_1 = x_1, . . . , X_n = x_n | T = t) = P_θ(X_1 = x_1, . . . , X_n = x_n, T = t)/P_θ(T = t)
        = P_θ(X_1 = x_1, . . . , X_n = x_n)/P_θ(T = t)
        = g(t; θ) h(x_1, . . . , x_n)/[g(t; θ) Σ h(x_1, . . . , x_n)]
        = h(x_1, . . . , x_n)/Σ h(x_1, . . . , x_n),

which is independent of θ; that is, T is sufficient for θ.
Now, assume that T is sufficient for θ. Then the conditional probability

    P_θ(X_1 = x_1, . . . , X_n = x_n | T = t) = P_θ(X_1 = x_1, . . . , X_n = x_n)/P_θ(T = t)

is independent of θ; that is,

    P_θ(X_1 = x_1, . . . , X_n = x_n | T = t) = k[x_1, . . . , x_n, T(x_1, . . . , x_n)]

if and only if

    f(x_1; θ) · · · f(x_n; θ) = P_θ(X_1 = x_1, . . . , X_n = x_n) = P_θ(T = t) k[x_1, . . . , x_n, T(x_1, . . . , x_n)].

Setting

    g[T(x_1, . . . , x_n); θ] = P_θ(T = t)  and  h(x_1, . . . , x_n) = k[x_1, . . . , x_n, T(x_1, . . . , x_n)],

we get

    f(x_1, . . . , x_n; θ) = g[T(x_1, . . . , x_n); θ] h(x_1, . . . , x_n),

as was to be seen.
Continuous case: The proof in this case is carried out under some further
regularity conditions (and is not as rigorous as that of the discrete case). It
should be made clear, however, that the theorem is true as stated. A proof
without the regularity conditions mentioned above involves deeper concepts
of measure theory the knowledge of which is not assumed here. From Remark
1, it follows that m ≤ n. Then set T_j = T_j(X_1, . . . , X_n), j = 1, . . . , m, and assume
that there exist other n − m statistics T_j = T_j(X_1, . . . , X_n), j = m + 1, . . . , n, such
that the transformation

    t_j = T_j(x_1, . . . , x_n),  j = 1, . . . , n,

is invertible, so that

    x_j = x_j(t, t_{m+1}, . . . , t_n),  j = 1, . . . , n,  t = (t_1, . . . , t_m)′.
Suppose first that the factorization holds. Then the joint p.d.f. of T, T_{m+1}, . . . , T_n
is given by

    f_{T, T_{m+1}, . . . , T_n}(t, t_{m+1}, . . . , t_n; θ)
        = g(t; θ) h[x_1(t, t_{m+1}, . . . , t_n), . . . , x_n(t, t_{m+1}, . . . , t_n)] |J|
        = g(t; θ) h*(t, t_{m+1}, . . . , t_n),

where we set

    h*(t, t_{m+1}, . . . , t_n) = h[x_1(t, t_{m+1}, . . . , t_n), . . . , x_n(t, t_{m+1}, . . . , t_n)] |J|.

Hence

    f_T(t; θ) = ∫ · · · ∫ g(t; θ) h*(t, t_{m+1}, . . . , t_n) dt_{m+1} · · · dt_n = g(t; θ) h**(t),

where

    h**(t) = ∫ · · · ∫ h*(t, t_{m+1}, . . . , t_n) dt_{m+1} · · · dt_n.

Now, assume that T is sufficient for θ. Then

    f(x_1, . . . , x_n; θ) = f_{T, T_{m+1}, . . . , T_n}(t, t_{m+1}, . . . , t_n; θ) |J|^{−1}
                          = f(t_{m+1}, . . . , t_n | t; θ) f_T(t; θ) |J|^{−1}.

But f(t_{m+1}, . . . , t_n | t; θ) is independent of θ, by Remark 1. So we may set

    f(t_{m+1}, . . . , t_n | t) |J|^{−1} = h*(t_{m+1}, . . . , t_n; t) = h(x_1, . . . , x_n).

If we also set

    f_T(t; θ) = g[T(x_1, . . . , x_n); θ],

we get

    f(x_1, . . . , x_n; θ) = g[T(x_1, . . . , x_n); θ] h(x_1, . . . , x_n),

as was to be seen.
COROLLARY  Let φ: ℝ^m → ℝ^m be one-to-one ((measurable and) independent of θ), so that
the inverse φ^{−1} exists. Then, if T is sufficient for θ, we have that T̃ = φ(T) is also
sufficient for θ, and T is sufficient for θ̃ = ψ(θ), where ψ: ℝ^r → ℝ^r is one-to-one
(and measurable).

PROOF  We have T = φ^{−1}[φ(T)] = φ^{−1}(T̃). Thus

    f(x_1, . . . , x_n; θ) = g[T(x_1, . . . , x_n); θ] h(x_1, . . . , x_n)
                          = g{φ^{−1}[T̃(x_1, . . . , x_n)]; θ} h(x_1, . . . , x_n),

so that, by Theorem 1, T̃ is sufficient for θ. Next, since ψ is one-to-one,

    θ = ψ^{−1}[ψ(θ)] = ψ^{−1}(θ̃).

Hence

    f(x_1, . . . , x_n; θ) = g[T(x_1, . . . , x_n); θ] h(x_1, . . . , x_n)

becomes

    f̃(x_1, . . . , x_n; θ̃) = g̃[T(x_1, . . . , x_n); θ̃] h(x_1, . . . , x_n),

where we set

    f̃(x_1, . . . , x_n; θ̃) = f[x_1, . . . , x_n; ψ^{−1}(θ̃)]

and

    g̃[T(x_1, . . . , x_n); θ̃] = g[T(x_1, . . . , x_n); ψ^{−1}(θ̃)].

Thus, T is sufficient for the new parameter θ̃.
We now give a number of examples of determining sufficient statistics by
way of Theorem 1 in some interesting cases.
EXAMPLE 6  Refer to Example 1, where

    f(x; θ) = [n!/(x_1! · · · x_r!)] θ_1^{x_1} · · · θ_r^{x_r} I_A(x).

Then, by Theorem 1, it follows that the statistic (X_1, . . . , X_r)′ is sufficient for
θ = (θ_1, . . . , θ_r)′. Actually, by the fact that Σ_{j=1}^r θ_j = 1 and Σ_{j=1}^r x_j = n, we also have

    f(x; θ) = [n!/(x_1! · · · x_{r−1}!(n − x_1 − · · · − x_{r−1})!)] θ_1^{x_1} · · · θ_{r−1}^{x_{r−1}} (1 − θ_1 − · · · − θ_{r−1})^{n − Σ_{j=1}^{r−1} x_j} I_A(x),

from which it follows that the statistic (X_1, . . . , X_{r−1})′ is sufficient for (θ_1, . . . ,
θ_{r−1})′. In particular, for r = 2, X_1 = X is sufficient for θ_1 = θ.
EXAMPLE 7  Let X_1, . . . , X_n be i.i.d. r.v.s from U(θ_1, θ_2). Then by setting x = (x_1, . . . , x_n)′
and θ = (θ_1, θ_2)′, we get

    f(x; θ) = [1/(θ_2 − θ_1)^n] I_{[θ_1, ∞)}(x_{(1)}) I_{(−∞, θ_2]}(x_{(n)})
            = [1/(θ_2 − θ_1)^n] g_1[x_{(1)}, θ_1] g_2[x_{(n)}, θ_2],

where g_1[x_{(1)}, θ_1] = I_{[θ_1, ∞)}(x_{(1)}), g_2[x_{(n)}, θ_2] = I_{(−∞, θ_2]}(x_{(n)}). It follows that
(X_{(1)}, X_{(n)})′ is sufficient for θ = (θ_1, θ_2)′.
EXAMPLE 8  Let X_1, . . . , X_n be i.i.d. r.v.s from N(θ_1, θ_2). Then

    Σ_{j=1}^n (x_j − θ_1)² = Σ_{j=1}^n [(x_j − x̄) + (x̄ − θ_1)]² = Σ_{j=1}^n (x_j − x̄)² + n(x̄ − θ_1)²,

so that

    f(x; θ) = [1/√(2πθ_2)]^n exp[−(1/(2θ_2)) Σ_{j=1}^n (x_j − x̄)² − (n/(2θ_2))(x̄ − θ_1)²].

Now consider the conditional p.d.f. of (X_1, . . . , X_{n−1})′, given Σ_{j=1}^n X_j = y_n. By
using the transformation

    y_j = x_j, j = 1, . . . , n − 1,  y_n = Σ_{j=1}^n x_j,

one sees that the above mentioned conditional p.d.f. is given by the quotient of
the following p.d.f.s:

    [1/√(2πθ_2)]^n exp{−(1/(2θ_2))[(y_1 − θ_1)² + · · · + (y_{n−1} − θ_1)² + (y_n − y_1 − · · · − y_{n−1} − θ_1)²]}

and

    [1/√(2πnθ_2)] exp[−(1/(2nθ_2))(y_n − nθ_1)²].

This quotient is equal to

    [√(2πnθ_2)/(2πθ_2)^{n/2}] exp{(1/(2nθ_2))[(y_n − nθ_1)² − n(y_1 − θ_1)² − · · · − n(y_{n−1} − θ_1)² − n(y_n − y_1 − · · · − y_{n−1} − θ_1)²]},

and

    (y_n − nθ_1)² − n(y_1 − θ_1)² − · · · − n(y_{n−1} − θ_1)² − n(y_n − y_1 − · · · − y_{n−1} − θ_1)²
        = y_n² − n[y_1² + · · · + y_{n−1}² + (y_n − y_1 − · · · − y_{n−1})²],

independent of θ_1. Thus the conditional p.d.f. under consideration is indepen-
dent of θ_1 but it does depend on θ_2. Thus Σ_{j=1}^n X_j, or equivalently X̄, is not
sufficient for θ = (θ_1, θ_2). The concept of X̄ being sufficient for θ_1 is not valid
unless θ_2 is known.
Exercises
11.1.1  In each one of the following cases write out the p.d.f. of the r.v. X and
specify the parameter space Ω of the parameter θ involved.
i) X is distributed as Poisson;
ii) X is distributed as Negative Binomial;
iii) X is distributed as Gamma;
iv) X is distributed as Beta.
11.1.2  Let X_1, . . . , X_n be i.i.d. r.v.s distributed as stated below. Then use
Theorem 1 and its corollary in order to show that:
i) Σ_{j=1}^n X_j or X̄ is a sufficient statistic for θ, if the X's are distributed as
Poisson;
ii) Σ_{j=1}^n X_j or X̄ is a sufficient statistic for θ, if the X's are distributed as
Negative Binomial;
iii) (Π_{j=1}^n X_j, Σ_{j=1}^n X_j) or (Π_{j=1}^n X_j, X̄) is a sufficient statistic for (θ_1, θ_2) = (α,
β) if the X's are distributed as Gamma. In particular, Π_{j=1}^n X_j is a sufficient
statistic for θ = α if β is known, and Σ_{j=1}^n X_j or X̄ is a sufficient statistic for
θ = β if α is known. In the latter case, take α = 1 and conclude that Σ_{j=1}^n X_j
or X̄ is a sufficient statistic for the parameter θ = 1/λ of the Negative
Exponential distribution;
iv) (Π_{j=1}^n X_j, Π_{j=1}^n (1 − X_j)) is a sufficient statistic for (θ_1, θ_2) = (α, β) if the X's
are distributed as Beta. In particular, Π_{j=1}^n X_j or Σ_{j=1}^n log X_j is a sufficient
statistic for θ = α if β is known, and Π_{j=1}^n (1 − X_j) is a sufficient statistic for
θ = β if α is known.
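For the Poisson case of part (i), the mechanism behind sufficiency can be seen directly: the conditional distribution of the sample given the sum does not involve λ. The sketch below (an editorial companion, not part of the text; all names are ours) computes this conditional p.m.f. exactly for n = 2 and several values of λ:

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

def conditional_given_sum(x, t, lam):
    """P(X_1 = x | X_1 + X_2 = t) for i.i.d. Poisson(lam) r.v.s, computed
    straight from the definition; a numerical companion to Exercise
    11.1.2(i) (this check is ours, not the text's)."""
    num = poisson_pmf(x, lam) * poisson_pmf(t - x, lam)
    den = poisson_pmf(t, 2 * lam)       # X_1 + X_2 is distributed as P(2*lam)
    return num / den

# The ratio collapses to C(t, x)(1/2)^t, free of lam -- the heuristic reason
# why X_1 + X_2 (or X-bar) carries all the information about lam.
t = 5
tables = {lam: [conditional_given_sum(x, t, lam) for x in range(t + 1)]
          for lam in (0.3, 1.0, 4.0)}
```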
11.1.3  (Truncated Poisson r.v.s) Let X_1, X_2 be i.i.d. r.v.s with p.d.f. f(·; θ)
given by:

    f(0; θ) = e^{−θ},  f(1; θ) = θe^{−θ},  f(2; θ) = 1 − e^{−θ} − θe^{−θ},  f(x; θ) = 0 for x ≠ 0, 1, 2,

where θ > 0. Then show that X_1 + X_2 is not a sufficient statistic for θ.
11.1.4  Let X_1, . . . , X_n be i.i.d. r.v.s with the Double Exponential p.d.f. f(·; θ)
given in Exercise 3.3.13(iii) of Chapter 3. Then show that Σ_{j=1}^n |X_j| is a sufficient
statistic for θ.
11.1.5  If X_j = (X_{1j}, X_{2j})′, j = 1, . . . , n, is a random sample of size n from the
Bivariate Normal distribution with parameter θ as described in Example 4,
then, by using Theorem 1, show that:

    (X̄_1, X̄_2, Σ_{j=1}^n X_{1j}², Σ_{j=1}^n X_{2j}², Σ_{j=1}^n X_{1j}X_{2j})

is a sufficient statistic for θ.
11.1.6  If X_1, . . . , X_n is a random sample of size n from U(−θ, θ), θ ∈ (0, ∞),
show that (X_{(1)}, X_{(n)}) is a sufficient statistic for θ. Furthermore, show that this
statistic is not minimal by establishing that T = max(|X_1|, . . . , |X_n|) is also a
sufficient statistic for θ.
11.1.7  If X_1, . . . , X_n is a random sample of size n from N(μ, σ²), μ ∈ ℝ, show
that

    (Σ_{j=1}^n X_j, Σ_{j=1}^n X_j²)  or  (X̄, Σ_{j=1}^n (X_j − X̄)²)

is a sufficient statistic for θ = (μ, σ²).

    f(x; θ) = e^{−(x−θ)} I_{(θ,∞)}(x),  θ ∈ ℝ;

i) f(x; θ) = θx^{θ−1} I_{(0,1)}(x),  θ ∈ (0, ∞);

iii) f(x; θ) = [1/(6θ⁴)] x³ e^{−x/θ} I_{(0,∞)}(x),  θ ∈ (0, ∞);

iv) f(x; θ) = (θc^θ/x^{θ+1}) I_{(c,∞)}(x),  θ ∈ (0, ∞).
11.2 Completeness

In this section, we introduce the (technical) concept of completeness which we
also illustrate by a number of examples. Its usefulness will become apparent in
the subsequent sections. To this end, let X be a k-dimensional random vector
with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ^r, and let g: ℝ^k → ℝ be a (measurable) function,
so that g(X) is an r.v. We assume that E_θ g(X) exists for all θ ∈ Ω and set
F = {f(·; θ); θ ∈ Ω}.

DEFINITION 2  With the above notation, we say that the family F (or the random vector X) is
complete if for every g as above, E_θ g(X) = 0 for all θ ∈ Ω implies that g(x) = 0
except possibly on a set N of x's such that P_θ(X ∈ N) = 0 for all θ ∈ Ω.

The examples which follow illustrate the concept of completeness. Mean-
while, let us recall that if Σ_{j=0}^n c_j x^j = 0 for more than n values of x, then
c_j = 0, j = 0, . . . , n. Also, if Σ_{n=0}^∞ c_n x^n = 0 for all values of x in an interval for
which the series converges, then c_n = 0, n = 0, 1, . . . .
EXAMPLE 9  Let

    F = {f(·; θ); f(x; θ) = \binom{n}{x} θ^x(1 − θ)^{n−x} I_A(x), θ ∈ (0, 1)},

where A = {0, 1, . . . , n}. Then F is complete. In fact,

    E_θ g(X) = Σ_{x=0}^n g(x) \binom{n}{x} θ^x(1 − θ)^{n−x} = (1 − θ)^n Σ_{x=0}^n g(x) \binom{n}{x} ρ^x,  ρ = θ/(1 − θ).

Thus, if E_θ g(X) = 0 for every θ ∈ (0, 1), then the polynomial on the right
vanishes for every ρ ∈ (0, ∞), hence for more than n values of ρ, and therefore

    g(x) \binom{n}{x} = 0,  x = 0, 1, . . . , n,

which is equivalent to g(x) = 0, x = 0, 1, . . . , n.
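The "polynomial with more than n roots" argument can also be seen through linear algebra: evaluating E_θ g(X) at n + 1 distinct θ's gives a square linear system in the unknowns g(0), . . . , g(n), and a nonzero determinant means g ≡ 0 is its only solution. The following exact-arithmetic sketch (an editorial illustration, not part of the text; all names are ours) performs this check:

```python
from fractions import Fraction
from math import comb

def det(m):
    """Exact determinant via Gaussian elimination over the rationals."""
    m = [row[:] for row in m]
    size = len(m)
    d = Fraction(1)
    for c in range(size):
        piv = next((r for r in range(c, size) if m[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            m[c], m[piv] = m[piv], m[c]
            d = -d
        d *= m[c][c]
        for r in range(c + 1, size):
            f = m[r][c] / m[c][c]
            for k in range(c, size):
                m[r][k] -= f * m[c][k]
    return d

# Rows: n+1 distinct theta's in (0, 1); columns: the coefficients
# C(n,x) theta^x (1-theta)^(n-x) multiplying g(x) in E_theta g(X).
n = 4
thetas = [Fraction(k, n + 2) for k in range(1, n + 2)]
M = [[comb(n, x) * th**x * (1 - th)**(n - x) for x in range(n + 1)]
     for th in thetas]
D = det(M)   # nonzero, so only g = 0 satisfies E_theta g(X) = 0 at all points
```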
EXAMPLE 10  Let

    F = {f(·; θ); f(x; θ) = e^{−θ}(θ^x/x!) I_A(x), θ ∈ (0, ∞)},

where A = {0, 1, . . .}. Then F is complete. In fact,

    E_θ g(X) = Σ_{x=0}^∞ g(x) e^{−θ} θ^x/x! = e^{−θ} Σ_{x=0}^∞ [g(x)/x!] θ^x,

so that E_θ g(X) = 0 for all θ ∈ (0, ∞) implies g(x)/x! = 0, and hence g(x) = 0,
for x = 0, 1, . . . .
EXAMPLE 11  Let

    F = {f(·; θ); f(x; θ) = [1/(θ − α)] I_{[α, θ]}(x), θ ∈ (α, ∞)}.

Then F is complete. In fact,

    E_θ g(X) = [1/(θ − α)] ∫_α^θ g(x) dx.

Thus, if E_θ g(X) = 0 for all θ ∈ (α, ∞), then ∫_α^θ g(x) dx = 0 for all θ > α, which
intuitively implies (and that can be rigorously justified) that g(x) = 0 except
possibly on a set N of x's such that P_θ(X ∈ N) = 0 for all θ ∈ Ω, where X is an
r.v. with p.d.f. f(·; θ). The same is seen to be true if f(·; θ) is U(θ, β).
EXAMPLE 12  Let X_1, . . . , X_n be i.i.d. r.v.s from N(μ, σ²). If σ is known and μ = θ, it can be
shown that

    F = {f(·; θ); f(x; θ) = [1/(√(2π)σ)] exp[−(x − θ)²/(2σ²)], θ ∈ ℝ}

is complete. If μ is known and σ² = θ, then

    F = {f(·; θ); f(x; θ) = [1/√(2πθ)] exp[−(x − μ)²/(2θ)], θ ∈ (0, ∞)}

is not complete. In fact, let g(x) = x − μ. Then E_θ g(X) = E_θ(X − μ) = 0 for all
θ ∈ (0, ∞), while g(x) = 0 only for x = μ. Finally, if both μ and σ² are unknown,
it can be shown that (X̄, S²) is complete.
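The counterexample above is easy to visualize numerically: with g(x) = x − μ, the expectation E_θ g(X) is (estimated to be) zero for every value of θ = σ², even though g is not the zero function. A brief simulation sketch (an editorial illustration; all names and parameter choices are ours):

```python
import random

def mean_of_g_mc(mu, sigma, reps=40000, rng=None):
    """Monte Carlo estimate of E g(X) for g(x) = x - mu, X ~ N(mu, sigma^2);
    an illustrative check of Example 12's non-completeness argument."""
    rng = rng or random.Random(7)
    return sum(rng.gauss(mu, sigma) - mu for _ in range(reps)) / reps

# g is not the zero function, yet E_theta g(X) = 0 for every theta = sigma^2:
mu = 1.5
ests = [mean_of_g_mc(mu, s) for s in (0.5, 1.0, 2.0)]
```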
In the following, we establish two theorems which are useful in certain
situations.
THEOREM 2  Let X_1, . . . , X_n be i.i.d. r.v.s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ^r, and let T = (T_1, . . . ,
T_m)′ be a sufficient statistic for θ, where T_j = T_j(X_1, . . . , X_n), j = 1, . . . , m. Let
g(·; θ) be the p.d.f. of T and assume that the set S of positivity of g(·; θ) is the
same for all θ ∈ Ω. Let V = (V_1, . . . , V_k)′, V_j = V_j(X_1, . . . , X_n), j = 1, . . . , k, be
any other statistic which is assumed to be (stochastically) independent of T.
Then the distribution of V does not depend on θ.

PROOF  We have that for t ∈ S, g(t; θ) > 0 for all θ ∈ Ω and so f(v|t) is well
defined and is also independent of θ, by sufficiency. Then

    f_{V,T}(v, t; θ) = f(v|t) g(t; θ)

for all v and t ∈ S, while by independence

    f_{V,T}(v, t; θ) = f_V(v; θ) g(t; θ)

for all v and t. Therefore

    f_V(v; θ) g(t; θ) = f(v|t) g(t; θ)

for all v and t ∈ S. Hence f_V(v; θ) = f(v|t) for all v and t ∈ S; that is, f_V(v; θ) =
f_V(v) is independent of θ.
REMARK 4  The theorem need not be true if S depends on θ.
Under certain regularity conditions, the converse of Theorem 2 is true
and also more interesting. It relates sufficiency, completeness, and stochastic
independence.

THEOREM 3  (Basu) Let X_1, . . . , X_n be i.i.d. r.v.s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ^r, and let
T = (T_1, . . . , T_m)′ be a sufficient statistic for θ, where T_j = T_j(X_1, . . . , X_n),
j = 1, . . . , m. Let g(·; θ) be the p.d.f. of T and assume that C = {g(·; θ); θ ∈ Ω}
is complete. Let V = (V_1, . . . , V_k)′, V_j = V_j(X_1, . . . , X_n), j = 1, . . . , k, be any other
statistic. Then, if the distribution of V does not depend on θ, it follows that V
and T are independent.

PROOF  It suffices to show that for every t ∈ ℝ^m for which f(v|t) is defined,
one has f_V(v) = f(v|t), v ∈ ℝ^k. To this end, for an arbitrary but fixed v, consider
the statistic φ(T; v) = f_V(v) − f(v|T) which is defined for all t's except perhaps
for a set N of t's such that P_θ(T ∈ N) = 0 for all θ ∈ Ω. Then we have for the
continuous case (the discrete case is treated similarly)

    E_θ φ(T; v) = E_θ[f_V(v) − f(v|T)] = f_V(v) − E_θ f(v|T)
                = f_V(v) − ∫ · · · ∫ f(v|t_1, . . . , t_m) g(t_1, . . . , t_m; θ) dt_1 · · · dt_m
                = f_V(v) − f_V(v) = 0;

that is, E_θ φ(T; v) = 0 for all θ ∈ Ω and hence φ(t; v) = 0 for all t ∈ N^c by
completeness (N is independent of v by the definition of completeness). So
f_V(v) = f(v|t), t ∈ N^c, as was to be seen.
Exercises
11.2.1  If F is the family of all Negative Binomial p.d.f.'s, then show that F is
complete.
11.2.2  If F is the family of all U(−θ, θ) p.d.f.'s, θ ∈ (0, ∞), then show that F
is not complete.
11.2.3  (Basu) Consider an urn containing 10 identical balls numbered θ + 1,
θ + 2, . . . , θ + 10, where θ ∈ Ω = {0, 10, 20, . . .}. Two balls are drawn one by
one with replacement, and let X_j be the number on the jth ball, j = 1, 2. Use this
example to show that Theorem 2 need not be true if the set S in that theorem
does depend on θ.
11.3 Unbiasedness-Uniqueness

In this section, we shall restrict ourselves to the case that the parameter is real-
valued. We shall then introduce the concept of unbiasedness and we shall
establish the existence and uniqueness of uniformly minimum variance un-
biased statistics.

DEFINITION 3  Let X_1, . . . , X_n be i.i.d. r.v.s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ, and let U = U(X_1, . . . ,
X_n) be a statistic. Then we say that U is an unbiased statistic for θ if E_θU = θ for
every θ ∈ Ω, where by E_θU we mean that the expectation of U is calculated by
using the p.d.f. f(·; θ).
We can now formulate the following important theorem.

THEOREM 4  (Rao-Blackwell) Let X_1, . . . , X_n be i.i.d. r.v.s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ, and
let T = (T_1, . . . , T_m)′, T_j = T_j(X_1, . . . , X_n), j = 1, . . . , m, be a sufficient statistic
for θ. Let U = U(X_1, . . . , X_n) be an unbiased statistic for θ which is not a
function of T alone (with probability 1). Set φ(t) = E_θ(U|T = t). Then we have
that:
i) The r.v. φ(T) is a function of the sufficient statistic T alone.
ii) φ(T) is an unbiased statistic for θ.
iii) σ²_θ[φ(T)] < σ²_θ(U), θ ∈ Ω, provided E_θU² < ∞.

PROOF
i) That φ(T) is a function of the sufficient statistic T alone and does not
depend on θ is a consequence of the sufficiency of T.
ii) That φ(T) is unbiased for θ, that is, E_θφ(T) = θ for every θ ∈ Ω, follows
from (CE1), Chapter 5, page 123.
iii) This follows from (CV), Chapter 5, page 123.

The interpretation of the theorem is the following: If for some reason one
is interested in finding a statistic with the smallest possible variance within the
class of unbiased statistics of θ, then one may restrict oneself to the subclass of
the unbiased statistics which depend on T alone (with probability 1). This is so
because, if an unbiased statistic U is not already a function of T alone (with
probability 1), then it becomes so by conditioning it with respect to T. The
variance of the resulting statistic will be smaller than the variance of the
statistic we started out with by (iii) of the theorem. It is further clear that
the variance does not decrease any further by conditioning again with respect
to T, since the resulting statistic will be the same (with probability 1) by
(CE2), Chapter 5, page 123. The process of forming the conditional expecta-
tion of an unbiased statistic of θ, given T, is known as Rao-Blackwellization.
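Rao-Blackwellization can be watched at work in the B(1, θ) setting, where for U = X_1 one has E(U | T = t) = t/n, so φ(T) = X̄. The sketch below (an editorial illustration, not part of the text; all names and parameter choices are ours) compares the two variances, θ(1 − θ) versus θ(1 − θ)/n, by simulation:

```python
import random

def variances_mc(n, theta, reps=20000, rng=None):
    """Monte Carlo variances of U = X_1 and of its Rao-Blackwellization
    phi(T) = E(U | T) = T/n = X-bar, for i.i.d. B(1, theta) r.v.s.
    (Here E(X_1 | T = t) = t/n by the symmetry of the sample.)"""
    rng = rng or random.Random(8)
    def var(a):
        m = sum(a) / len(a)
        return sum((x - m)**2 for x in a) / len(a)
    us, phis = [], []
    for _ in range(reps):
        xs = [1 if rng.random() < theta else 0 for _ in range(n)]
        us.append(xs[0])             # unbiased for theta, but crude
        phis.append(sum(xs) / n)     # the Rao-Blackwellized statistic
    return var(us), var(phis)

n, theta = 10, 0.3
v_u, v_phi = variances_mc(n, theta)
# theory: sigma^2(U) = theta(1 - theta) = 0.21, sigma^2(phi(T)) = 0.021
```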
    E_θ[U(T) − V(T)] = 0, θ ∈ Ω,  or  E_θψ(T) = 0, θ ∈ Ω,

where ψ(T) = U(T) − V(T). Then, by completeness of C, we have ψ(t) = 0 for
all t ∈ ℝ^m except possibly on a set N of t's such that P_θ(T ∈ N) = 0 for all
θ ∈ Ω.

DEFINITION 4  An unbiased statistic for θ which is of minimum variance in the class of all
unbiased statistics of θ is called a uniformly minimum variance (UMV) unbi-
ased statistic of θ (the term "uniformly" referring to the fact that the variance
is minimum for all θ ∈ Ω).
Some illustrative examples follow.
EXAMPLE 13 Let X1, . . . , Xn be i.i.d. r.v.'s from B(1, θ), θ ∈ Ω = (0, 1). Then T = Σ_{j=1}^n Xj is a
sufficient statistic for θ, by Example 5, and also complete, by Example 9. Now
X̄ = (1/n)T is an unbiased statistic for θ and hence, by Theorem 5, UMV
unbiased for θ.
EXAMPLE 14 Let X1, . . . , Xn be i.i.d. r.v.'s from N(μ, σ²). Then if σ is known and μ = θ, we
have that T = Σ_{j=1}^n Xj is a sufficient statistic for θ, by Example 8. It is also
complete, by Example 12. Then, by Theorem 5, X̄ = (1/n)T is UMV unbiased
for θ, since it is unbiased for θ. Let μ be known and without loss of generality
set μ = 0 and σ² = θ. Then T = Σ_{j=1}^n Xj² is a sufficient statistic for θ, by Example
8. Since T is also complete (by Theorem 8 below) and S² = (1/n)T is unbiased
for θ, it follows, by Theorem 5, that it is UMV unbiased for θ.
Here is another example which serves as an application of both the Rao–
Blackwell and the Lehmann–Scheffé theorems.
EXAMPLE 15 Let X1, X2, X3 be i.i.d. r.v.'s from the Negative Exponential p.d.f. with param-
eter λ. Setting θ = 1/λ, the p.d.f. of the X's becomes f(x; θ) = (1/θ)e^{−x/θ}, x > 0. We
have then that Eθ(Xj) = θ and σ²(Xj) = θ², j = 1, 2, 3. Thus X1, for example, is an
unbiased statistic for θ with variance θ². It is further easily seen (by Theorem
276 11 Sufficiency and Related Theorems
Exercises
11.3.1 If X1, . . . , Xn is a random sample of size n from P(θ), then use
Exercise 11.1.2(i) and Example 10 to show that X̄ is the (essentially) unique
UMV unbiased statistic for θ.
11.3.2 Refer to Example 15 and, by utilizing the appropriate transformation,
show that X̄ is the (essentially) unique UMV unbiased statistic for θ.
11.4 The Exponential Family of p.d.f.'s: One-Dimensional Parameter Case
A family of p.d.f.'s is said to be of the one-parameter exponential form if it is
of the form
f(x; θ) = C(θ) exp[Q(θ)T(x)] h(x), x ∈ ℝ, θ ∈ Ω ⊆ ℝ, (1)
where C(θ) > 0, θ ∈ Ω, and also h(x) > 0 for x ∈ S, the set of positivity of f(x; θ),
which is independent of θ. It follows that
C⁻¹(θ) = Σ_{x ∈ S} exp[Q(θ)T(x)] h(x)
for the discrete case, and
C⁻¹(θ) = ∫_S exp[Q(θ)T(x)] h(x) dx
for the continuous case. If X1, . . . , Xn are i.i.d. r.v.'s with p.d.f. f(·; θ) as above,
then the joint p.d.f. of the X's is given by
f(x1, . . . , xn; θ) = Cⁿ(θ) exp[Q(θ) Σ_{j=1}^n T(xj)] h(x1) · · · h(xn),
xj ∈ ℝ, j = 1, . . . , n, θ ∈ Ω. (2)
Some illustrative examples follow.
EXAMPLE 16 Let
f(x; θ) = (n choose x) θ^x (1 − θ)^{n−x} I_A(x),
where A = {0, 1, . . . , n}. This p.d.f. can also be written as follows:
f(x; θ) = (1 − θ)^n exp[x log(θ/(1 − θ))] (n choose x) I_A(x), θ ∈ (0, 1),
and hence it is of the exponential form with
C(θ) = (1 − θ)^n, Q(θ) = log[θ/(1 − θ)], T(x) = x, h(x) = (n choose x) I_A(x).
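The factorization of Example 16 can be checked numerically. The sketch below is my own illustration (the values of n and θ are arbitrary): it verifies that C(θ) exp[Q(θ)T(x)] h(x) reproduces the B(n, θ) p.d.f. at every support point.

```python
import math

n, theta = 8, 0.35

def binom_pdf(x):
    # the B(n, theta) p.m.f. written directly
    return math.comb(n, x) * theta**x * (1 - theta)**(n - x)

# Exponential-family pieces from Example 16
C = (1 - theta) ** n
Q = math.log(theta / (1 - theta))
def T(x): return x
def h(x): return math.comb(n, x)   # (n choose x) I_A(x) on A = {0, ..., n}

for x in range(n + 1):
    assert abs(binom_pdf(x) - C * math.exp(Q * T(x)) * h(x)) < 1e-12
print("factorization agrees at every point of A")
```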
EXAMPLE 17 Let now the p.d.f. be N(μ, σ²). Then if σ is known and μ = θ, we have
f(x; θ) = [1/(σ√(2π))] exp[−θ²/(2σ²)] exp[(θ/σ²)x] exp[−x²/(2σ²)], x ∈ ℝ,
and hence it is of the exponential form with
C(θ) = [1/(σ√(2π))] exp[−θ²/(2σ²)], Q(θ) = θ/σ²,
T(x) = x, h(x) = exp[−x²/(2σ²)].
If now μ is known and σ² = θ, then we have
f(x; θ) = [1/√(2πθ)] exp[−(x − μ)²/(2θ)], θ ∈ (0, ∞),
and hence it is again of the exponential form with
C(θ) = 1/√(2πθ), Q(θ) = −1/(2θ), T(x) = (x − μ)² and h(x) = 1.
If the parameter space of a one-parameter exponential family of p.d.f.s
contains a non-degenerate interval, it can be shown that the family is com-
plete. More precisely, the following result can be proved.
THEOREM 6 Let X be an r.v. with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ, given by (1) and set C = {g(·; θ);
θ ∈ Ω}, where g(·; θ) is the p.d.f. of T(X). Then C is complete, provided Ω
contains a non-degenerate interval.
Then the completeness of the families established in Examples 9 and 10
and the completeness of the families asserted in the first part of Example 12
and the last part of Example 14 follow from the above theorem.
In connection with families of p.d.f.s of the one-parameter exponential
form, the following theorem holds true.
THEOREM 7 Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. of the one-parameter exponential form.
Then
i) T* = Σ_{j=1}^n T(Xj) is a sufficient statistic for θ.
ii) The p.d.f. of T* is of the form
g(t; θ) = Cⁿ(θ) e^{Q(θ)t} h*(t),
where h*(t) does not depend on θ.
PROOF Let the X's be discrete first. Then, by (2),
g(t; θ) = Pθ(T* = t) = Σ Cⁿ(θ) exp[Q(θ) Σ_{j=1}^n T(xj)] Π_{j=1}^n h(xj)
= Cⁿ(θ) e^{Q(θ)t} Σ Π_{j=1}^n h(xj) = Cⁿ(θ) e^{Q(θ)t} h*(t),
where the summation extends over all (x1, . . . , xn) with Σ_{j=1}^n T(xj) = t, and
h*(t) = Σ Π_{j=1}^n h(xj).
Next, let the X's be of the continuous type. Then the proof is carried out under
certain regularity conditions to be spelled out. We set Y1 = Σ_{j=1}^n T(Xj) and let
Yj = Xj, j = 2, . . . , n. Then consider the transformation
y1 = Σ_{j=1}^n T(xj), yj = xj, j = 2, . . . , n;
hence
T(x1) = y1 − Σ_{j=2}^n T(yj), xj = yj, j = 2, . . . , n,
and thus
x1 = T⁻¹[y1 − Σ_{j=2}^n T(yj)], xj = yj, j = 2, . . . , n,
assuming that T is one-to-one and differentiable. Then
∂x1/∂y1 = 1/T′[T⁻¹(z)], where z = y1 − Σ_{j=2}^n T(yj),
∂x1/∂yj = −T′(yj)/T′[T⁻¹(z)] and ∂xj/∂yj = 1, j = 2, . . . , n,
so that the Jacobian of the transformation is J = 1/T′[T⁻¹(z)]. Therefore
g(y1, . . . , yn; θ) = Cⁿ(θ) exp{Q(θ)[y1 − T(y2) − · · · − T(yn)
+ T(y2) + · · · + T(yn)]} h{T⁻¹[y1 − T(y2) − · · · − T(yn)]} Π_{j=2}^n h(yj) |J|
= Cⁿ(θ) e^{Q(θ)y1} h{T⁻¹[y1 − T(y2) − · · · − T(yn)]} Π_{j=2}^n h(yj) |J|.
So if we integrate with respect to y2, . . . , yn and set
h*(y1) = ∫ · · · ∫ h{T⁻¹[y1 − T(y2) − · · · − T(yn)]} Π_{j=2}^n h(yj) |J| dy2 · · · dyn,
THEOREM 9 Let the r.v.'s X1, . . . , Xn be i.i.d. from a p.d.f. of the one-parameter exponential
form and let T* be defined by (i) in Theorem 7. Then, if V is any other statistic,
it follows that V and T* are independent if and only if the distribution of V
does not depend on θ.
PROOF In the first place, T* is sufficient for θ, by Theorem 7(i), and the set
of positivity of its p.d.f. is independent of θ, by Theorem 7(ii). Thus the
assumptions of Theorem 2 are satisfied and therefore, if V is any statistic which
is independent of T*, it follows that the distribution of V does not depend on θ.
For the converse, we have that the family C of the p.d.f.'s of T* is complete, by
Theorem 8. Thus, if the distribution of a statistic V does not depend on θ,
it follows, by Theorem 3, that V and T* are independent. The proof is
completed.
APPLICATION Let X1, . . . , Xn be i.i.d. r.v.'s from N(μ, σ²). Then
X̄ = (1/n) Σ_{j=1}^n Xj and S² = (1/n) Σ_{j=1}^n (Xj − X̄)²
are independent.
PROOF We treat μ as the unknown parameter θ and let σ² be arbitrary (>0)
but fixed. Then the p.d.f. of the X's is of the one-parameter exponential form
and T* = X̄ is both sufficient for θ and complete. Let
V = V(X1, . . . , Xn) = Σ_{j=1}^n (Xj − X̄)².
Setting Yj = Xj − μ, j = 1, . . . , n, we have
Σ_{j=1}^n (Xj − X̄)² = Σ_{j=1}^n (Yj − Ȳ)².
But the distribution of Σ_{j=1}^n (Yj − Ȳ)² does not depend on θ, because
Pθ[Σ_{j=1}^n (Yj − Ȳ)² ∈ B] is equal to the integral of the joint p.d.f. of the Y's
over B and this p.d.f. does not depend on θ.
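The independence of X̄ and S² asserted above can be checked by simulation. The sketch below is my own illustration (sample size, μ, and σ are arbitrary): independence implies zero correlation, so the sample correlation of the two statistics over many replications should be near 0.

```python
import random, math

random.seed(1)
reps, n, mu, sigma = 20000, 5, 2.0, 1.5

xbars, s2s = [], []
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xb = sum(xs) / n
    xbars.append(xb)
    s2s.append(sum((x - xb) ** 2 for x in xs) / n)   # S^2 with divisor n

def mean(v): return sum(v) / len(v)

mx, ms = mean(xbars), mean(s2s)
cov = mean([(a - mx) * (b - ms) for a, b in zip(xbars, s2s)])
sx = math.sqrt(mean([(a - mx) ** 2 for a in xbars]))
ss = math.sqrt(mean([(b - ms) ** 2 for b in s2s]))
r = cov / (sx * ss)   # sample correlation of X-bar and S^2
print(round(r, 3))    # near 0, as independence requires
```

Zero correlation alone does not prove independence, of course; here it merely illustrates the theorem, which holds exactly for the normal family.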
Exercises
11.4.1 In each one of the following cases, show that the distribution of the
r.v. X is of the one-parameter exponential form and identify the various
quantities appearing in a one-parameter exponential family.
i) X is distributed as Poisson;
ii) X is distributed as Negative Binomial;
iii) X is distributed as Gamma with α known;
11.5 Some Multiparameter Generalizations
A family of p.d.f.'s is said to be of the multiparameter (r-parameter) exponential
form if it is of the form
f(x; θ) = C(θ) exp[Σ_{j=1}^r Qj(θ)Tj(x)] h(x),
where x = (x1, . . . , xk)′, xj ∈ ℝ, j = 1, . . . , k, k ≥ 1, θ = (θ1, . . . , θr)′ ∈ Ω ⊆ ℝ^r,
C(θ) > 0, and h(x) > 0 for x ∈ S, the set of positivity of f(·; θ), which is
independent of θ.
The following are examples of multiparameter exponential families.
EXAMPLE 18 Let X = (X1, . . . , Xr)′ have the multinomial p.d.f. Then
f(x1, . . . , xr; θ1, . . . , θ_{r−1}) = (1 − θ1 − · · · − θ_{r−1})^n
× exp[Σ_{j=1}^{r−1} xj log(θj/(1 − θ1 − · · · − θ_{r−1}))]
× [n!/(x1! · · · xr!)] I_A(x1, . . . , xr),
and hence it is of the exponential form with
C(θ) = (1 − θ1 − · · · − θ_{r−1})^n,
Qj(θ) = log[θj/(1 − θ1 − · · · − θ_{r−1})], Tj(x1, . . . , xr) = xj, j = 1, . . . , r − 1,
and
h(x1, . . . , xr) = [n!/(x1! · · · xr!)] I_A(x1, . . . , xr).
Similarly, if X is distributed as N(θ1, θ2) with both θ1 = μ and θ2 = σ² unknown,
the p.d.f. is of the exponential form with
C(θ) = [1/√(2πθ2)] exp[−θ1²/(2θ2)], Q1(θ) = θ1/θ2, Q2(θ) = −1/(2θ2),
T1(x) = x, T2(x) = x² and h(x) = 1.
For multiparameter exponential families, appropriate versions of Theo-
rems 6, 7 and 8 are also true. This point will not be pursued here, however.
Finally, if X1, . . . , Xn are i.i.d. r.v.'s with p.d.f. f(·; θ), θ = (θ1, . . . , θr)′ ∈
Ω ⊆ ℝ^r, not necessarily of an exponential form, the r-dimensional statistic U =
(U1, . . . , Ur)′, Uj = Uj(X1, . . . , Xn), j = 1, . . . , r, is said to be unbiased if EθUj =
θj, j = 1, . . . , r, for all θ ∈ Ω. Again, multiparameter versions of Theorems 4–9
may be formulated but this matter will not be dealt with here.
Exercises
11.5.1 In each one of the following cases, show that the distribution of the
r.v. X and the random vector X is of the multiparameter exponential form and
identify the various quantities appearing in a multiparameter exponential
family.
i) X is distributed as Gamma;
ii) X is distributed as Beta;
iii) X = (X1, X2) is distributed as Bivariate Normal with parameters as de-
scribed in Example 4.
11.5.2 If the r.v. X is distributed as U(α, β), show that the p.d.f. of X is not
of an exponential form regardless of whether one or both of α, β are unknown.
11.5.3 Use the not explicitly stated multiparameter versions of Theorems 6
and 7 to discuss:
p(x) = 1/(1 + e^{−(α+βx)}),
where β > 0, α ∈ ℝ are unknown parameters. In an experiment, k different
doses of the drug are considered, each dose is applied to a number of animals,
and the number of deaths among them is recorded. The resulting data can be
presented in a table as follows:
Dose (x)                x1   x2   . . .   xk
Number of deaths (Y)    Y1   Y2   . . .   Yk
Here x1, x2, . . . , xk and n1, n2, . . . , nk are known constants, Y1, Y2, . . . , Yk are
independent r.v.'s, and Yj is distributed as B(nj, p(xj)). Then show that:
i) The joint distribution of Y1, Y2, . . . , Yk constitutes an exponential family;
ii) The statistic
(Σ_{j=1}^k Yj, Σ_{j=1}^k xj Yj)
is sufficient for θ = (α, β).
(REMARK In connection with the probability p(x) given above, see also
Exercise 4.1.8 in Chapter 4.)
Chapter 12
Point Estimation
12.1 Introduction
Let X be an r.v. with p.d.f. f(·; θ), where θ ∈ Ω ⊆ ℝ^r. If θ is known, we can
calculate, in principle, all probabilities we might be interested in. In practice,
however, θ is generally unknown. Then the problem of estimating θ arises; or,
more generally, we might be interested in estimating some function of θ, g(θ),
say, where g is a (measurable and) usually real-valued function. We now
proceed to define what we mean by an estimator and an estimate of g(θ). Let
X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·; θ). Then
DEFINITION 1 Any statistic U = U(X1, . . . , Xn) which is used for estimating the unknown
quantity g(θ) is called an estimator of g(θ). The value U(x1, . . . , xn) of U for
the observed values of the X's is called an estimate of g(θ).
For simplicity and by slightly abusing the notation, the terms estimator and
estimate are often used interchangeably.
Exercise
12.1.1 Let X1, . . . , Xn be i.i.d. r.v.'s having the Cauchy distribution with σ =
1 and μ unknown. Suppose you were to estimate μ; which one of the estimators
X1, X̄ would you choose? Justify your answer.
(Hint: Use the distributions of X1 and X̄ as a criterion of selection.)
12.2 Criteria for Selecting an Estimator: Unbiasedness, Minimum Variance
Pθ[|U − g(θ)| ≤ ε] ≥ 1 − σ²U/ε².
Therefore the smaller σ²U is, the larger the lower bound of the probability of
concentration of U about g(θ) becomes. A similar interpretation can be given
by means of the CLT, when applicable.
Figure 12.1 (a) The p.d.f. h1(u; θ) of U1 (for a fixed θ). (b) The p.d.f. h2(u; θ) of U2 (for a fixed θ).
Following this line of reasoning, one would restrict oneself first to the class
of all unbiased estimators of g(θ) and next to the subclass of unbiased estima-
tors which have finite variance under all θ ∈ Ω. Then, within this restricted
class, one would search for an estimator with the smallest variance. Formaliz-
ing this, we have the following definition.
Exercises
12.2.1 Let X be an r.v. distributed as B(n, θ). Show that there is no unbiased
estimator of g(θ) = 1/θ based on X.
In discussing Exercises 12.2.212.2.4 below, refer to Example 3 in Chapter 10
and Example 7 in Chapter 11.
12.2.2 Let X1, . . . , Xn be independent r.v.'s distributed as U(0, θ), θ ∈ Ω =
(0, ∞). Find unbiased estimators of the mean and variance of the X's depend-
ing only on a sufficient statistic for θ.
12.2.3 Let X1, . . . , Xn be i.i.d. r.v.'s from U(θ1, θ2), θ1 < θ2, and find unbiased
estimators for the mean (θ1 + θ2)/2 and the range θ2 − θ1 depending only on a
sufficient statistic for (θ1, θ2).
12.2.4 Let X1, . . . , Xn be i.i.d. r.v.'s from U(θ, 2θ), θ ∈ Ω = (0, ∞), and set
U1 = [(n + 1)/(2n + 1)] X(n) and U2 = [(n + 1)/(5n + 4)] [2X(n) + X(1)].
Then show that both U1 and U2 are unbiased estimators of θ and that U2 is
uniformly better than U1 (in the sense of variance).
12.2.5 Let X1, . . . , Xn be i.i.d. r.v.'s from the Double Exponential distribu-
tion f(x; θ) = (1/2)e^{−|x−θ|}, θ ∈ Ω = ℝ. Then show that (X(1) + X(n))/2 is an
unbiased estimator of θ.
12.2.6 Let X1, . . . , Xm and Y1, . . . , Yn be two independent random samples
with the same mean θ and known variances σ1² and σ2², respectively. Then show
that for every c ∈ [0, 1], U = cX̄ + (1 − c)Ȳ is an unbiased estimator of θ. Also
find the value of c for which the variance of U is minimum.
12.2.7 Let X1, . . . , Xn be i.i.d. r.v.'s with mean μ and variance σ², both
unknown. Then show that X̄ is the minimum variance unbiased linear estima-
tor of μ.
12.3 The Case of Availability of Complete Sufficient Statistics
THEOREM 1 Let g be as in Definition 2 and assume that there exists an unbiased estimator
U = U(X1, . . . , Xn) of g(θ) with finite variance. Furthermore, let T = (T1, . . . ,
Tm)′, Tj = Tj(X1, . . . , Xn), j = 1, . . . , m, be a sufficient statistic for θ and suppose
that it is also complete. Set φ(T) = Eθ(U|T). Then φ(T) is a UMVU estimator
of g(θ) and is essentially unique.
This theorem will be illustrated by a number of concrete examples.
EXAMPLE 1 Let X1, . . . , Xn be i.i.d. r.v.'s from B(1, p) and suppose we wish to find a
UMVU estimator of the variance of the X's.
The variance of the X's is equal to pq. Therefore, if we set p = θ, θ ∈ Ω =
(0, 1) and g(θ) = θ(1 − θ), the problem is that of finding a UMVU estimator of
g(θ). We know that, if
U = [1/(n − 1)] Σ_{j=1}^n (Xj − X̄)²,
then EθU = g(θ). Thus U is an unbiased estimator of g(θ). Furthermore,
Σ_{j=1}^n (Xj − X̄)² = Σ_{j=1}^n Xj² − nX̄² = Σ_{j=1}^n Xj − (1/n)(Σ_{j=1}^n Xj)²,
because Xj takes on the values 0 and 1 only and hence Xj² = Xj. By setting
T = Σ_{j=1}^n Xj, we have then
Σ_{j=1}^n (Xj − X̄)² = T − T²/n, so that U = [1/(n − 1)](T − T²/n).
Thus U depends only on the complete, sufficient statistic T and therefore, by
Theorem 1, is a UMVU estimator of g(θ).
EXAMPLE 2 Let X1, . . . , Xr be i.i.d. r.v.'s from B(n, θ) and suppose we wish to find a
UMVU estimator of g(θ) = Pθ(X1 ≤ 2); that is,
g(θ) = (1 − θ)^n + nθ(1 − θ)^{n−1} + (n choose 2)θ²(1 − θ)^{n−2}.
Define U by
U = 1 if X1 ≤ 2 and U = 0 if X1 > 2.
Then EθU = Pθ(X1 ≤ 2) = g(θ), so that U is an unbiased estimator of g(θ).
However, U is not a function of the complete, sufficient statistic T = Σ_{j=1}^r Xj,
so we Rao–Blackwellize it. Since T is distributed as B(nr, θ) and X2 + · · · + Xr
is distributed as B(n(r − 1), θ), we have
Eθ(U|T = t) = Pθ(U = 1|T = t) = Pθ(X1 ≤ 2|T = t)
= Pθ(X1 ≤ 2, X1 + · · · + Xr = t)/Pθ(T = t)
= [1/Pθ(T = t)] [Pθ(X1 = 0, X2 + · · · + Xr = t)
+ Pθ(X1 = 1, X2 + · · · + Xr = t − 1)
+ Pθ(X1 = 2, X2 + · · · + Xr = t − 2)]
= [1/Pθ(T = t)] [Pθ(X1 = 0)Pθ(X2 + · · · + Xr = t)
+ Pθ(X1 = 1)Pθ(X2 + · · · + Xr = t − 1)
+ Pθ(X1 = 2)Pθ(X2 + · · · + Xr = t − 2)].
Carrying out the substitutions and cancelling the common powers of θ and
1 − θ, this becomes
Eθ(U|T = t) = [1/(nr choose t)] [(n(r−1) choose t)
+ n(n(r−1) choose t−1) + (n choose 2)(n(r−1) choose t−2)].
Therefore
φ(T) = [1/(nr choose T)] [(n(r−1) choose T)
+ n(n(r−1) choose T−1) + (n choose 2)(n(r−1) choose T−2)]
is a UMVU estimator of g(θ).
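The Rao–Blackwellized estimator obtained here can be evaluated exactly. The sketch below is my own illustration (n, r, θ are arbitrary; the helper `c` is mine, guarding binomial coefficients against out-of-range arguments): it checks that φ(T) is exactly unbiased for g(θ) = Pθ(X1 ≤ 2) by summing over the B(nr, θ) distribution of T.

```python
import math

n, r, theta = 4, 3, 0.3
m = n * (r - 1)   # X2 + ... + Xr is B(m, theta)

def c(a, b):
    # binomial coefficient, taken as 0 outside the range 0 <= b <= a
    return math.comb(a, b) if 0 <= b <= a else 0

def phi(t):
    # the conditional expectation E(U | T = t) computed in the example
    num = c(m, t) + n * c(m, t - 1) + math.comb(n, 2) * c(m, t - 2)
    return num / math.comb(n * r, t)

# g(theta) = P(X1 <= 2) for X1 ~ B(n, theta)
g = sum(math.comb(n, k) * theta**k * (1 - theta)**(n - k) for k in range(3))

# E phi(T) over the B(nr, theta) distribution of T
e_phi = sum(phi(t) * math.comb(n * r, t) * theta**t * (1 - theta)**(n * r - t)
            for t in range(n * r + 1))
print(abs(e_phi - g) < 1e-12)   # True: phi(T) is unbiased for g(theta)
```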
EXAMPLE 3 Consider certain events which occur according to the distribution P(λ). Then
the probability that no event occurs is equal to e^{−λ}. Let now X1, . . . , Xn (n ≥ 2)
be i.i.d. r.v.'s from P(λ). Then the problem is that of finding a UMVU estima-
tor of e^{−λ}.
Set
T = Σ_{j=1}^n Xj, λ = θ, g(θ) = e^{−θ}
and define U by
U = 1 if X1 = 0 and U = 0 if X1 ≥ 1.
Then
EθU = Pθ(U = 1) = Pθ(X1 = 0) = g(θ);
that is, U is an unbiased estimator of g(θ). However, it does not depend on T,
which is a complete, sufficient statistic for θ, according to Exercise 11.1.2(i)
and Example 10 in Chapter 11. It remains then for us to Rao–Blackwellize U.
For this purpose we use the fact that the conditional distribution of X1, given
T = t, is B(t, 1/n). (See Exercise 12.3.1.) Then
Eθ(U|T = t) = Pθ(X1 = 0|T = t) = (1 − 1/n)^t,
so that
φ(T) = (1 − 1/n)^T
is a UMVU estimator of e^{−θ}.
EXAMPLE 4 Let X1, . . . , Xn be i.i.d. r.v.'s from N(μ, σ²) with σ² unknown and μ known. We
are interested in finding a UMVU estimator of σ.
Set σ² = θ and let g(θ) = √θ = σ. By Corollary 5, Chapter 7, we have that
(1/θ) Σ_{j=1}^n (Xj − μ)² is χ²_n. So, if we set
S² = (1/n) Σ_{j=1}^n (Xj − μ)²,
then nS²/θ is χ²_n, so that √n S/√θ is distributed as χ_n. Then the expectation
Eθ(√n S/√θ) can be calculated and is independent of θ; call it c_n (see Exercise
12.3.2). That is,
Eθ(√n S/√θ) = c_n, so that Eθ(√n S/c_n) = √θ = σ.
Setting finally c′_n = c_n/√n, we obtain
Eθ(S/c′_n) = σ;
that is, S/c′_n is an unbiased estimator of g(θ) = σ. Since this estimator depends on
the complete, sufficient statistic (see Example 8 and Exercise 11.5.3(ii), Chap-
ter 11) S² alone, it follows that S/c′_n is a UMVU estimator of σ.
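The constant c_n can be computed in closed form, since the mean of a χ_n-distributed r.v. is √2 Γ((n+1)/2)/Γ(n/2); that formula is the standard one for the chi distribution, not taken from the text, and the sketch below (with arbitrary n, μ, σ of my choosing) uses it to check the unbiasedness of S/c′_n by simulation.

```python
import math, random

def c_n(n):
    # E(chi_n) = sqrt(2) * Gamma((n+1)/2) / Gamma(n/2)
    return math.sqrt(2) * math.gamma((n + 1) / 2) / math.gamma(n / 2)

n, mu, sigma = 8, 0.0, 2.0
cn_prime = c_n(n) / math.sqrt(n)      # c'_n = c_n / sqrt(n)

random.seed(2)
reps, acc = 30000, 0.0
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    s = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)   # S with mu known
    acc += s / cn_prime

print(acc / reps)   # close to sigma = 2.0
```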
EXAMPLE 5 Let again X1, . . . , Xn be i.i.d. r.v.'s from N(μ, σ²) with both μ and σ² unknown.
We are interested in finding UMVU estimators for each one of μ and σ².
Here θ = (μ, σ²)′ and let g1(θ) = μ, g2(θ) = σ². By setting
S² = (1/n) Σ_{j=1}^n (Xj − X̄)²,
we have that (X̄, S²) is a sufficient statistic for θ. (See Example 8, Chapter 11.)
Furthermore, it is complete. (See Example 12, Chapter 11.) Let U1 = X̄ and U2
= nS²/(n − 1). Clearly, EθU1 = μ. By Remark 5 in Chapter 7,
Eθ(nS²/σ²) = n − 1.
Therefore
Eθ[nS²/(n − 1)] = σ².
So U1 and U2 are unbiased estimators of μ and σ², respectively. Since they
depend only on the complete, sufficient statistic (X̄, S²), it follows that they
are UMVU estimators.
EXAMPLE 6 Let X1, . . . , Xn be i.i.d. r.v.'s from N(μ, σ²) with both μ and σ² unknown, and
write x_p for the upper pth quantile of the distribution (0 < p < 1). The problem is
that of finding a UMVU estimator of x_p.
Set θ = (μ, σ²)′. From the definition of x_p, one has Pθ(X1 ≥ x_p) = p. But
Pθ(X1 ≥ x_p) = P[(X1 − μ)/σ ≥ (x_p − μ)/σ] = 1 − Φ[(x_p − μ)/σ],
so that
Φ[(x_p − μ)/σ] = 1 − p.
Hence
(x_p − μ)/σ = Φ⁻¹(1 − p) and x_p = μ + σΦ⁻¹(1 − p).
Of course, since p is given, Φ⁻¹(1 − p) is a uniquely determined number. Then,
by setting g(θ) = μ + σΦ⁻¹(1 − p), our problem is that of finding a UMVU
estimator of g(θ). Let
U = X̄ + (S/c′_n)Φ⁻¹(1 − p),
where c′_n is defined in Example 4. Then by the fact that EθX̄ = μ and Eθ(S/c′_n)
= σ (see Example 4), we have that EθU = g(θ). Since U depends only on the
complete, sufficient statistic (X̄, S²), it follows that U is a UMVU estimator
of x_p.
Exercises
12.3.1 Let X1, . . . , Xn be i.i.d. r.v.'s from P(θ) and set T = Σ_{j=1}^n Xj. Then show
that the conditional p.d.f. of X1, given T = t, is that of B(t, 1/n). Furthermore,
observe that the same is true if X1 is replaced by any one of the remaining X's.
12.3.2 Refer to Example 4 and evaluate the quantity cn mentioned there.
12.3.3 If X1, . . . , Xn are i.i.d. r.v.'s from B(1, θ), θ ∈ Ω = (0, 1), by using
Theorem 1, show that X̄ is the UMVU estimator of θ.
12.3.4 If X1, . . . , Xn are i.i.d. r.v.'s from P(θ), θ ∈ Ω = (0, ∞), use Theorem
1 in order to determine the UMVU estimator of θ.
12.3.5 Let X1, . . . , Xn be i.i.d. r.v.'s from the Negative Exponential distribu-
tion with parameter λ = θ ∈ Ω = (0, ∞). Use Theorem 1 in order to determine
the UMVU estimator of θ.
12.3.6 Let X be an r.v. having the Negative Binomial distribution with
parameter θ ∈ Ω = (0, 1). Find the UMVU estimator of g(θ) = 1/θ and
determine its variance.
12.3.7 Let X1, . . . , Xn be independent r.v.'s distributed as N(μ, 1). Show that
X̄² − (1/n) is the UMVU estimator of g(μ) = μ².
12.3.8 Let X1, . . . , Xn be independent r.v.'s distributed as N(μ, σ²), where
both μ and σ² are unknown. Find the UMVU estimator of μ/σ.
12.3.9 Let (Xj, Yj), j = 1, . . . , n, be independent random vectors having the
Bivariate Normal distribution with parameter θ = (μ1, μ2, σ1, σ2, ρ)′. Find the
UMVU estimators of the following quantities: μ1μ2, σ1σ2, σ2/σ1.
12.3.10 Let X be an r.v. denoting the life span of a piece of equipment. Then
the reliability of the equipment at time x, R(x), is defined as the probability
that X > x. If X has the Negative Exponential distribution with parameter
λ = θ ∈ Ω = (0, ∞), find the UMVU estimator of the reliability R(x; θ) on the
basis of n observations on X.
12.3.11 Let X be an r.v. having the Geometric distribution; that is,
f(x; θ) = θ(1 − θ)^x, x = 0, 1, . . . , θ ∈ Ω = (0, 1),
12.4 The Case Where Complete Sufficient Statistics Are Not Available or
May Not Exist: Cramér–Rao Inequality
When complete, sufficient statistics are available, the problem of finding a
UMVU estimator is settled as in Section 3. When such statistics do not exist,
or it is not easy to identify them, one may use the approach described here in
searching for a UMVU estimator. According to this method, we first establish
a lower bound for the variances of all unbiased estimators and then we attempt
to identify an unbiased estimator with variance equal to the lower bound
found. If that is possible, the problem is solved again. At any rate, we do have
a lower bound of the variances of a class of estimators, which may be useful for
comparison purposes.
The following regularity conditions will be employed in proving the main
result in this section. We assume that Ω ⊆ ℝ and that g is real-valued and
differentiable for all θ ∈ Ω.
iii) ∫_S · · · ∫_S f(x1; θ) · · · f(xn; θ) dx1 · · · dxn or Σ_{x1 ∈ S} · · · Σ_{xn ∈ S} f(x1; θ) · · · f(xn; θ)
may be differentiated under the integral or summation sign, respectively.
iv) Eθ[(∂/∂θ) log f(X; θ)]², to be denoted by I(θ), is > 0 for all θ ∈ Ω.
vi) ∫_S · · · ∫_S U(x1, . . . , xn) f(x1; θ) · · · f(xn; θ) dx1 · · · dxn
or Σ_{x1 ∈ S} · · · Σ_{xn ∈ S} U(x1, . . . , xn) f(x1; θ) · · · f(xn; θ)
may be differentiated under the integral or summation sign, respectively,
where U(X1, . . . , Xn) is any unbiased estimator of g(θ). Then we have the
following theorem.
THEOREM 2 (Cramér–Rao inequality) Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·; θ) and
assume that the regularity conditions (i)–(vi) are fulfilled. Then for any un-
biased estimator U = U(X1, . . . , Xn) of g(θ), one has
σ²θU ≥ [g′(θ)]²/[nI(θ)], θ ∈ Ω, where g′(θ) = dg(θ)/dθ.
PROOF If σ²θU = ∞ or I(θ) = ∞ for some θ ∈ Ω, the inequality is trivially true
for those θ's. Hence we need only consider the case where σ²θU < ∞ and I(θ)
< ∞ for all θ ∈ Ω. Also it suffices to discuss the continuous case only, since the
discrete case is treated entirely similarly with integrals replaced by summation
signs.
We have
EθU(X1, . . . , Xn)
= ∫_S · · · ∫_S U(x1, . . . , xn) f(x1; θ) · · · f(xn; θ) dx1 · · · dxn = g(θ). (1)
Now, restricting ourselves to S, we have
(∂/∂θ)[f(x1; θ) · · · f(xn; θ)]
= [∂f(x1; θ)/∂θ] Π_{i ≠ 1} f(xi; θ) + · · · + [∂f(xn; θ)/∂θ] Π_{i ≠ n} f(xi; θ)
= Σ_{j=1}^n [∂f(xj; θ)/∂θ] Π_{i ≠ j} f(xi; θ)
= Σ_{j=1}^n {[1/f(xj; θ)][∂f(xj; θ)/∂θ]} Π_{i=1}^n f(xi; θ)
= [Σ_{j=1}^n (∂/∂θ) log f(xj; θ)] Π_{i=1}^n f(xi; θ). (2)
Hence, by means of (1), (2) and condition (vi),
g′(θ) = ∫_S · · · ∫_S U(x1, . . . , xn) [Σ_{j=1}^n (∂/∂θ) log f(xj; θ)] Π_{i=1}^n f(xi; θ) dx1 · · · dxn
= Eθ[U(X1, . . . , Xn) Σ_{j=1}^n (∂/∂θ) log f(Xj; θ)] = Eθ(UV), (3)
where we set
V = V(X1, . . . , Xn) = Σ_{j=1}^n (∂/∂θ) log f(Xj; θ).
Next,
∫_S · · · ∫_S f(x1; θ) · · · f(xn; θ) dx1 · · · dxn = 1.
Therefore, differentiating both sides with respect to θ by virtue of (iii), and
employing (2),
0 = ∫_S · · · ∫_S [Σ_{j=1}^n (∂/∂θ) log f(xj; θ)] Π_{i=1}^n f(xi; θ) dx1 · · · dxn = EθV. (4)
From (3) and (4) it follows that
Covθ(U, V) = Eθ(UV) − (EθU)(EθV) = g′(θ). (5)
Next,
0 = EθV = Eθ[Σ_{j=1}^n (∂/∂θ) log f(Xj; θ)] = Σ_{j=1}^n Eθ[(∂/∂θ) log f(Xj; θ)]
= nEθ[(∂/∂θ) log f(X1; θ)],
so that
Eθ[(∂/∂θ) log f(X1; θ)] = 0.
Therefore
σ²θV = σ²θ[Σ_{j=1}^n (∂/∂θ) log f(Xj; θ)] = Σ_{j=1}^n σ²θ[(∂/∂θ) log f(Xj; θ)]
= nσ²θ[(∂/∂θ) log f(X1; θ)]
= nEθ[(∂/∂θ) log f(X1; θ)]² = nEθ[(∂/∂θ) log f(X; θ)]². (6)
But
ρ(U, V) = Covθ(U, V)/(σθU)(σθV), so that ρ²(U, V) ≤ 1 gives
Cov²θ(U, V) ≤ (σ²θU)(σ²θV). (7)
Taking now into consideration (5) and (6), relation (7) becomes
[g′(θ)]² ≤ (σ²θU) nEθ[(∂/∂θ) log f(X; θ)]²,
or, by means of (iv),
σ²θU ≥ [g′(θ)]²/{nEθ[(∂/∂θ) log f(X; θ)]²} = [g′(θ)]²/[nI(θ)]. (8)
The proof of the theorem is completed.
DEFINITION 5 The expression I(θ) = Eθ[(∂/∂θ) log f(X; θ)]² is called Fisher's infor-
mation (about θ) number; nEθ[(∂/∂θ) log f(X; θ)]² is the information (about θ)
contained in the sample X1, . . . , Xn.
(For an alternative way of calculating I(θ), see Exercises 12.4.6 and 12.4.7.)
Returning to the proof of Theorem 2, we have that equality holds in (8) if
and only if Cov²θ(U, V) = (σ²θU)(σ²θV), because of (7). By the Schwarz inequality
(Theorem 2, Chapter 5), this is equivalent to
V = EθV + k(θ)(U − EθU) with Pθ-probability 1, (9)
where
k(θ) = σθV/σθU.
Furthermore, because of (i), the exceptional set for which (9) does not hold is
independent of θ and has Pθ-probability 0 for all θ ∈ Ω. Taking into considera-
tion (4), the fact that EθU = g(θ) and the definition of V, equation (9) becomes
as follows:
Σ_{j=1}^n (∂/∂θ) log f(Xj; θ) = k(θ)U(X1, . . . , Xn) − g(θ)k(θ). (10)
Integrating (10) with respect to θ, we obtain
Σ_{j=1}^n log f(Xj; θ) = U(X1, . . . , Xn) ∫k(θ)dθ − ∫g(θ)k(θ)dθ + h̃(X1, . . . , Xn),
where h̃(X1, . . . , Xn) is the constant of the integration, or
Σ_{j=1}^n log f(xj; θ) = U(x1, . . . , xn) ∫k(θ)dθ − ∫g(θ)k(θ)dθ + h̃(x1, . . . , xn). (11)
Taking exponentials of both sides of (11), we obtain
Π_{j=1}^n f(xj; θ) = C(θ) exp[Q(θ)U(x1, . . . , xn)] h(x1, . . . , xn), (12)
where
C(θ) = exp[−∫g(θ)k(θ)dθ], Q(θ) = ∫k(θ)dθ
and
h(x1, . . . , xn) = exp[h̃(x1, . . . , xn)].
Thus, if equality occurs in the Cramér–Rao inequality for some unbiased
estimator, then the joint p.d.f. of the X's is of the one-parameter exponential
form, provided certain conditions are met. More precisely, we have the follow-
ing result.
If equality occurs in the Cramér–Rao inequality for an unbiased estimator
U = U(X1, . . . , Xn) of g(θ) and the regularity conditions (i)–(vi) hold, then
Π_{j=1}^n f(xj; θ) = C(θ) exp[Q(θ)U(x1, . . . , xn)] h(x1, . . . , xn)
outside a set N in ℝ^n such that Pθ[(X1, . . . , Xn) ∈ N] = 0 for all θ ∈ Ω; here C(θ)
= exp[−∫g(θ)k(θ)dθ] and Q(θ) = ∫k(θ)dθ. That is, the joint p.d.f. of the X's is
of the one-parameter exponential family (and hence U is sufficient for θ).
REMARK 1 Theorem 2 has a certain generalization for the multiparameter
case, but this will not be discussed here.
In connection with the Cramér–Rao bound, we also have the following
important result.
THEOREM 3 Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·; θ) and let g be an estimable real-
valued function of θ. For an unbiased estimator U = U(X1, . . . , Xn) of g(θ), we
assume that the regularity conditions (i)–(vi) are satisfied. Then σ²θU is equal to
the Cramér–Rao bound if and only if there exists a real-valued function of θ,
d(θ), such that U = g(θ) + d(θ)V except perhaps on a set of Pθ-probability
zero for all θ ∈ Ω.
PROOF Under the regularity conditions (i)–(vi), we have that
σ²θU ≥ [g′(θ)]²/[nI(θ)], or σ²θU ≥ [g′(θ)]²/σ²θV,
since nI(θ) = σ²θV by (6). Then σ²θU is equal to the Cramér–Rao bound if and
only if
[g′(θ)]² = (σ²θU)(σ²θV).
But
[g′(θ)]² = Cov²θ(U, V) by (5).
Thus σ²θU is equal to the Cramér–Rao bound if and only if Cov²θ(U, V) = (σ²θU)
(σ²θV), or equivalently, if and only if U = a(θ) + d(θ)V with Pθ-probability
1 for some functions of θ, a(θ) and d(θ). Furthermore, because of (i), the
exceptional set for which this relationship does not hold is independent of θ
and has Pθ-probability 0 for all θ ∈ Ω. Taking expectations and utilizing the
unbiasedness of U and relation (4), we get a(θ) = g(θ), so that U = g(θ) + d(θ)V
except perhaps on a set of Pθ-probability 0 for all θ ∈ Ω. The proof of the
theorem is completed.
The following three examples serve to illustrate Theorem 2. The checking
of the regularity conditions is left as an exercise.
EXAMPLE 7 Let X1, . . . , Xn be i.i.d. r.v.'s from B(1, p), p ∈ (0, 1). By setting p = θ, we have
f(x; θ) = θ^x (1 − θ)^{1−x}, x = 0, 1,
so that
log f(x; θ) = x log θ + (1 − x) log(1 − θ).
Then
(∂/∂θ) log f(x; θ) = x/θ − (1 − x)/(1 − θ)
and
[(∂/∂θ) log f(x; θ)]² = x²/θ² − 2x(1 − x)/[θ(1 − θ)] + (1 − x)²/(1 − θ)².
Since
EθX² = θ, Eθ(1 − X)² = 1 − θ and Eθ[X(1 − X)] = 0
(see Chapter 5), we have
Eθ[(∂/∂θ) log f(X; θ)]² = 1/θ + 1/(1 − θ) = 1/[θ(1 − θ)],
so that the Cramér–Rao bound is equal to θ(1 − θ)/n.
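Since X̄ is unbiased for θ with variance θ(1 − θ)/n, it is natural to expect it to attain the bound just computed. The simulation below is my own illustration (n, θ, and replication count are arbitrary): it compares the empirical variance of X̄ with θ(1 − θ)/n.

```python
import random

random.seed(3)
n, theta, reps = 25, 0.4, 20000

xbars = []
for _ in range(reps):
    xbars.append(sum(1 if random.random() < theta else 0
                     for _ in range(n)) / n)

m = sum(xbars) / reps
v = sum((x - m) ** 2 for x in xbars) / reps
print(v, theta * (1 - theta) / n)   # empirical variance vs. Cramer-Rao bound
```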
For the P(θ) family one similarly finds
Eθ[(∂/∂θ) log f(X; θ)]² = 1/θ,
so that the Cramér–Rao bound is equal to θ/n. Since again X̄ is an unbiased
estimator of θ with variance θ/n, we have that X̄ is a UMVU estimator of θ.
EXAMPLE 9 Let X1, . . . , Xn be i.i.d. r.v.'s from N(μ, σ²). Assume first that σ² is known and
set μ = θ. Then
f(x; θ) = [1/(σ√(2π))] exp[−(x − θ)²/(2σ²)], x ∈ ℝ,
and hence
log f(x; θ) = −log(σ√(2π)) − (x − θ)²/(2σ²).
Next,
(∂/∂θ) log f(x; θ) = (x − θ)/σ²,
so that
[(∂/∂θ) log f(x; θ)]² = (x − θ)²/σ⁴.
Then
Eθ[(∂/∂θ) log f(X; θ)]² = 1/σ²,
so that the Cramér–Rao bound is equal to σ²/n.
Next, assume that μ is known and set σ² = θ. Then
f(x; θ) = [1/√(2πθ)] exp[−(x − μ)²/(2θ)],
so that
log f(x; θ) = −(1/2) log(2π) − (1/2) log θ − (x − μ)²/(2θ)
and
(∂/∂θ) log f(x; θ) = −1/(2θ) + (x − μ)²/(2θ²).
Then
[(∂/∂θ) log f(x; θ)]² = 1/(4θ²) − (x − μ)²/(2θ³) + (x − μ)⁴/(4θ⁴),
and since (X − μ)/√θ is N(0, 1), we obtain
Eθ[(X − μ)/√θ]² = 1, Eθ[(X − μ)/√θ]⁴ = 3. (See Chapter 5.)
Therefore
Eθ[(∂/∂θ) log f(X; θ)]² = 1/(4θ²) − 1/(2θ²) + 3/(4θ²) = 1/(2θ²)
and the Cramér–Rao bound is 2θ²/n. Next,
Σ_{j=1}^n [(Xj − μ)/√θ]² is χ²_n
(see first corollary to Theorem 5, Chapter 7), so that
Eθ{Σ_{j=1}^n [(Xj − μ)/√θ]²} = n and σ²θ{Σ_{j=1}^n [(Xj − μ)/√θ]²} = 2n.
Exercises
12.4.1 Let X1, . . . , Xn be i.i.d. r.v.'s from the Gamma distribution with α
known and β = θ ∈ Ω = (0, ∞) unknown. Then show that the UMVU estimator
of θ is
U(X1, . . . , Xn) = [1/(nα)] Σ_{j=1}^n Xj.
Then show that I(θ) = −Eθ[(∂²/∂θ²) log f(X; θ)].
12.4.7 In Exercises 12.4.1–12.4.4, recalculate I(θ) and the Cramér–Rao
bound by utilizing Exercise 12.4.6 where appropriate.
12.4.8 Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ. For an
estimator V = V(X1, . . . , Xn) of θ for which EθV is finite, write EθV = θ + b(θ).
Then b(θ) is called the bias of V. Show that, under the regularity conditions
(i)–(vi) preceding Theorem 2 (where (vi) is assumed to hold true for all
estimators for which the integral (sum) is finite), one has
σ²θV ≥ [1 + b′(θ)]²/{nEθ[(∂/∂θ) log f(X; θ)]²}, θ ∈ Ω.
Here X is an r.v. with p.d.f. f(·; θ) and b′(θ) = db(θ)/dθ. (This inequality is
established along the same lines as those used in proving Theorem 2.)
12.5 Criteria for Selecting an Estimator: The Maximum Likelihood Principle
DEFINITION 6 The estimate θ̂ = θ̂(x1, . . . , xn) is called a maximum likelihood estimate (MLE)
of θ if
L(θ̂|x1, . . . , xn) = max{L(θ|x1, . . . , xn); θ ∈ Ω};
θ̂(X1, . . . , Xn) is called an ML estimator (MLE for short) of θ.
REMARK 3 Since the function y = log x, x > 0, is strictly increasing, in order
to maximize (with respect to θ) L(θ|x1, . . . , xn) in the case that Ω ⊆ ℝ, it suffices
to maximize log L(θ|x1, . . . , xn). This is much more convenient to work with, as
will become apparent from examples to be discussed below.
In order to give an intuitive interpretation of an MLE, suppose first that the
X's are discrete. Then
L(θ|x1, . . . , xn) = Pθ(X1 = x1, . . . , Xn = xn);
that is, L(θ|x1, . . . , xn) is the probability of observing the x's which were
actually observed. Then it is intuitively clear that one should select as an
estimate of θ that θ which maximizes the probability of observing the x's which
were actually observed, if such a θ exists. A similar interpretation holds true
for the case that the X's are continuous by replacing L(θ|x1, . . . , xn) with the
probability element L(θ|x1, . . . , xn)dx1 · · · dxn, which represents the probability
(under Pθ) that Xj lies between xj and xj + dxj, j = 1, . . . , n.
In many important cases there is a unique MLE, which we then call the
MLE and which is often obtained by differentiation.
Although the principle of maximum likelihood does not seem to be justi-
fiable by a purely mathematical reasoning, it does provide a method for
producing estimates in many cases of practical importance. In addition, an
MLE is often shown to have several desirable properties. We will elaborate on
this point later.
The method of maximum likelihood estimation will now be applied to a
number of concrete examples.
EXAMPLE 10 Let X1, . . . , Xn be i.i.d. r.v.'s from P(θ). Then
L(θ|x1, . . . , xn) = e^{−nθ} θ^{Σ_{j=1}^n xj} / Π_{j=1}^n xj!
and hence
log L(θ|x1, . . . , xn) = −log(Π_{j=1}^n xj!) − nθ + (Σ_{j=1}^n xj) log θ.
Therefore the likelihood equation
(∂/∂θ) log L(θ|x1, . . . , xn) = 0 becomes −n + (1/θ)nx̄ = 0,
which gives θ = x̄. Next,
(∂²/∂θ²) log L(θ|x1, . . . , xn) = −(1/θ²)nx̄ < 0 for all θ > 0
and hence, in particular, for θ = x̄. Thus θ̂ = x̄ is the MLE of θ.
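The conclusion that the Poisson log-likelihood peaks at x̄ can be checked numerically. The sketch below is my own illustration (the sample values and grid resolution are arbitrary): it evaluates log L on a grid and confirms that the best grid point sits next to x̄.

```python
import math

xs = [2, 0, 3, 1, 4, 2, 1]          # a small illustrative Poisson sample
xbar = sum(xs) / len(xs)

def loglik(theta):
    # log L(theta | x1, ..., xn) for the P(theta) family
    return sum(-theta + x * math.log(theta) - math.log(math.factorial(x))
               for x in xs)

grid = [0.01 * k for k in range(1, 1001)]   # theta in (0, 10]
best = max(grid, key=loglik)
print(xbar, best)   # the best grid point lies within one step of xbar
```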
EXAMPLE 11 Let the r.v.'s X1, . . . , Xr have the multinomial distribution with parameters
n and p1, . . . , pr. Then
L(p1, . . . , pr|x1, . . . , xr) = [n!/(x1! · · · xr!)] p1^{x1} · · · pr^{xr}
= [n!/(x1! · · · xr!)] p1^{x1} · · · p_{r−1}^{x_{r−1}} (1 − p1 − · · · − p_{r−1})^{xr},
so that
log L(p1, . . . , pr|x1, . . . , xr) = log[n!/(x1! · · · xr!)] + x1 log p1 + · · ·
+ x_{r−1} log p_{r−1} + xr log(1 − p1 − · · · − p_{r−1}).
Differentiating with respect to pj, j = 1, . . . , r − 1, and equating the resulting
expressions to zero, we get
(1/pj)xj − (1/pr)xr = 0, j = 1, . . . , r − 1.
This is equivalent to
xj/pj = xr/pr, j = 1, . . . , r − 1;
that is,
x1/p1 = · · · = x_{r−1}/p_{r−1} = xr/pr,
and this common value is equal to
(x1 + · · · + x_{r−1} + xr)/(p1 + · · · + p_{r−1} + pr) = n/1 = n.
12.5 Criteria for Selecting an Estimator: The Maximum Likelihood Principle 305
Hence xⱼ/pⱼ = n and pⱼ = xⱼ/n, j = 1, . . . , r. It can be seen that these values of the p's actually maximize the likelihood function, and therefore p̂ⱼ = xⱼ/n, j = 1, . . . , r are the MLEs of the p's. (See Exercise 12.5.4.)
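The claim that the relative frequencies maximize the likelihood can be verified numerically; in the Python sketch below the cell counts are hypothetical, and the factor n!/∏ⱼxⱼ! is dropped since it does not involve the p's:

```python
import math

def multinom_log_lik(ps, xs):
    # multinomial log-likelihood up to the additive constant log(n!/prod x_j!)
    return sum(x * math.log(p) for x, p in zip(xs, ps))

xs = [5, 9, 6]                        # hypothetical cell counts; n = 20
n = sum(xs)
p_hat = [x / n for x in xs]           # p_j-hat = x_j / n
# any perturbation that stays inside the probability simplex does worse
alt = [p_hat[0] + 0.05, p_hat[1] - 0.05, p_hat[2]]
assert abs(sum(alt) - 1.0) < 1e-12
assert multinom_log_lik(p_hat, xs) > multinom_log_lik(alt, xs)
```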
EXAMPLE 12 Let X₁, . . . , Xₙ be i.i.d. r.v.'s from N(μ, σ²) with parameter θ = (μ, σ²). Then
\[
L(\theta|x_1,\dots,x_n) = \Big(\frac{1}{2\pi\sigma^2}\Big)^{n/2}\exp\Big[-\frac{1}{2\sigma^2}\sum_{j=1}^n (x_j-\mu)^2\Big],
\]
so that
\[
\log L(\theta|x_1,\dots,x_n) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{j=1}^n (x_j-\mu)^2.
\]
Suppose that μ is known, so that only σ² is to be estimated. The likelihood equation ∂/∂(σ²) log L = −n/(2σ²) + (1/(2σ⁴))Σⱼ(xⱼ − μ)² = 0 has the root σ² = (1/n)Σⱼ(xⱼ − μ)². Next,
\[
\frac{\partial^2}{\partial(\sigma^2)^2}\log L(\theta|x_1,\dots,x_n) = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}\sum_{j=1}^n (x_j-\mu)^2,
\]
which, for
\[
\sigma^2 = \frac{1}{n}\sum_{j=1}^n (x_j-\mu)^2,
\]
becomes
\[
\frac{1}{\sigma^4}\Big(\frac{n}{2}-n\Big) = -\frac{n}{2\sigma^4} < 0.
\]
So
\[
\hat\sigma^2 = \frac{1}{n}\sum_{j=1}^n (x_j-\mu)^2
\]
is the MLE of σ² in this case.
EXAMPLE 13 Let X₁, . . . , Xₙ be i.i.d. r.v.'s from U(α, β). Here θ = (α, β), and Ω is the part of the plane above the main diagonal. Then
\[
L(\theta|x_1,\dots,x_n) = \frac{1}{(\beta-\alpha)^n}\,I_{[\alpha,\infty)}\big(x_{(1)}\big)\,I_{(-\infty,\beta]}\big(x_{(n)}\big).
\]
Here the likelihood function is not differentiable with respect to α and β, but it is, clearly, maximized when β − α is minimum, subject to the conditions that α ≤ x₍₁₎ and β ≥ x₍ₙ₎. This happens when α = x₍₁₎ and β = x₍ₙ₎. Thus α̂ = x₍₁₎ and β̂ = x₍ₙ₎ are the MLEs of α and β, respectively.
In particular, if α = θ − c, β = θ + c, where c is a given positive constant, then
\[
L(\theta|x_1,\dots,x_n) = \frac{1}{(2c)^n}\,I_{[\theta-c,\infty)}\big(x_{(1)}\big)\,I_{(-\infty,\theta+c]}\big(x_{(n)}\big).
\]
The likelihood function is maximized, and its maximum is 1/(2c)ⁿ, for any θ such that θ − c ≤ x₍₁₎ and θ + c ≥ x₍ₙ₎; equivalently, θ ≤ x₍₁₎ + c and θ ≥ x₍ₙ₎ − c. This shows that any statistic that lies between X₍ₙ₎ − c and X₍₁₎ + c is an MLE of θ. For example, ½[X₍₁₎ + X₍ₙ₎] is such a statistic and hence an MLE of θ.
If β is known and α = θ, or if α is known and β = θ, then, clearly, x₍₁₎ and x₍ₙ₎ are the MLEs of α and β, respectively.
REMARK 4
i) The MLE may be a UMVU estimator. This, for instance, happens in Example 10, for μ in Example 12, and also for σ² in the same example when μ is known.
ii) The MLE need not be UMVU. This happens, e.g., in Example 12 for σ² when μ is unknown.
iii) The MLE is not always obtainable by differentiation. This is the case in Example 13.
iv) There may be more than one MLE. This case occurs in Example 13 when α = θ − c, β = θ + c, c > 0.
In the following, we present two of the general properties that an MLE enjoys.
THEOREM 4 Let X₁, . . . , Xₙ be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝʳ, and let T = (T₁, . . . , Tᵣ), Tⱼ = Tⱼ(X₁, . . . , Xₙ), j = 1, . . . , r, be a sufficient statistic for θ = (θ₁, . . . , θᵣ). Then, if θ̂ = (θ̂₁, . . . , θ̂ᵣ) is the unique MLE of θ, it follows that θ̂ is a function of T.
PROOF Since T is sufficient, Theorem 1 in Chapter 11 implies the following factorization:
\[
f(x_1;\theta)\cdots f(x_n;\theta) = g\big[T(x_1,\dots,x_n);\theta\big]\,h(x_1,\dots,x_n),
\]
where h is independent of θ. Therefore
\[
\max\big\{f(x_1;\theta)\cdots f(x_n;\theta);\ \theta\in\Omega\big\}
= h(x_1,\dots,x_n)\,\max\big\{g\big[T(x_1,\dots,x_n);\theta\big];\ \theta\in\Omega\big\}.
\]
Thus, if a unique MLE exists, it will have to be a function of T, as follows from the right-hand side of the equation above.
REMARK 5 Notice that the conclusion of the theorem holds true in all Examples 10-13. See also Exercise 12.3.10.
Another optimal property of an MLE is invariance, as is proved in the
following theorem.
THEOREM 5 Let X₁, . . . , Xₙ be i.i.d. r.v.'s with p.d.f. f(x; θ), θ ∈ Ω ⊆ ℝʳ, and let φ be defined on Ω onto Ω* ⊆ ℝᵐ and let it be one-to-one. Suppose θ̂ is an MLE of θ. Then φ(θ̂) is an MLE of φ(θ). That is, an MLE is invariant under one-to-one transformations.
PROOF Set θ* = φ(θ), so that θ = φ⁻¹(θ*). Then
\[
L(\theta|x_1,\dots,x_n) = L\big[\varphi^{-1}(\theta^*)\big|x_1,\dots,x_n\big],
\]
call it L*(θ*|x₁, . . . , xₙ). It follows that
\[
\max\big\{L(\theta|x_1,\dots,x_n);\ \theta\in\Omega\big\}
= \max\big\{L^*(\theta^*|x_1,\dots,x_n);\ \theta^*\in\Omega^*\big\}.
\]
By assuming the existence of an MLE, we have that the maximum at the left-hand side above is attained at an MLE θ̂. Then, clearly, the right-hand side attains its maximum at θ̂*, where θ̂* = φ(θ̂). Thus φ(θ̂) is an MLE of φ(θ).
For instance, since
\[
\frac{1}{n}\sum_{j=1}^n \big(x_j-\bar{x}\big)^2
\]
is the MLE of σ² in the normal case (see Example 12), it follows that
\[
\sqrt{\frac{1}{n}\sum_{j=1}^n \big(x_j-\bar{x}\big)^2}
\]
is the MLE of σ.
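In computational terms, the invariance property says that one may maximize in whichever parametrization is convenient and then transform. A minimal Python sketch (hypothetical data) computes the MLE of σ² and takes its square root, checking that the resulting σ̂ also maximizes the likelihood in the σ-parametrization:

```python
import math

xs = [2.0, 3.5, 1.0, 4.5, 3.0]        # hypothetical sample
n = len(xs)
xbar = sum(xs) / n
sigma2_hat = sum((x - xbar) ** 2 for x in xs) / n   # MLE of sigma^2
sigma_hat = math.sqrt(sigma2_hat)                   # MLE of sigma, by invariance (Theorem 5)

def log_lik_sigma(s):
    # log-likelihood as a function of sigma (mu fixed at x-bar, constants dropped)
    return sum(-math.log(s) - (x - xbar) ** 2 / (2 * s * s) for x in xs)

for s in (0.8 * sigma_hat, 1.2 * sigma_hat):
    assert log_lik_sigma(sigma_hat) > log_lik_sigma(s)
```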
Exercises
12.5.1 If X₁, . . . , Xₙ are i.i.d. r.v.'s from B(m, θ), θ ∈ Ω = (0, 1), show that X̄/m is the MLE of θ.
12.5.2 If X₁, . . . , Xₙ are i.i.d. r.v.'s from the Negative Binomial distribution with parameter θ ∈ Ω = (0, 1), show that r/(r + X̄) is the MLE of θ.
12.5.3 If X₁, . . . , Xₙ are i.i.d. r.v.'s from the Negative Exponential distribution with parameter θ ∈ Ω = (0, ∞), show that 1/X̄ is the MLE of θ.
12.5.4 Refer to Example 11 and show that the quantities p̂ⱼ = xⱼ/n, j = 1, . . . , r indeed maximize the likelihood function.
12.5.5 Refer to Example 12 and consider the case that both μ and σ² are unknown. Then show that the quantities μ̂ = x̄ and
\[
\hat\sigma^2 = \frac{1}{n}\sum_{j=1}^n \big(x_j-\bar{x}\big)^2
\]
indeed maximize the likelihood function.
12.5.6 Suppose that certain particles are emitted by a radioactive source (whose strength remains the same over a long period of time) according to a Poisson distribution with parameter λ during a unit of time. The source in question is observed for n time units, and let X be the r.v. denoting the number of time units in which no particles were emitted. Find the MLE of λ in terms of X.
12.5.7 Let X₁, . . . , Xₙ be i.i.d. r.v.'s with p.d.f. f(·; θ₁, θ₂) given by
\[
f(x;\theta_1,\theta_2) = \frac{1}{\theta_2}\exp\Big(-\frac{x-\theta_1}{\theta_2}\Big), \qquad x \ge \theta_1, \quad
\theta = (\theta_1,\theta_2) \in \Omega = \mathbb{R}\times(0,\infty).
\]
Find the MLEs of θ₁, θ₂.
12.5.8 Refer to Exercise 11.4.2, Chapter 11, and find the MLE of θ.
12.5.9 Refer to Exercise 12.3.10 and find the MLE of the reliability R(x; θ).
12.5.10 Let X₁, . . . , Xₙ be i.i.d. r.v.'s from the U(θ − ½, θ + ½), θ ∈ Ω = ℝ, distribution, and let
12.6 Criteria for Selecting an Estimator: The Decision-Theoretic Approach 309
\[
\hat\theta = \hat\theta(X_1,\dots,X_n) = X_{(n)} - \tfrac{1}{2} + \cos^2\!\big(2\pi X_1\big)\big(X_{(1)}-X_{(n)}+1\big).
\]
Then show that θ̂ is an MLE of θ, but it is not a function only of the sufficient statistic (X₍₁₎, X₍ₙ₎). (Thus Theorem 4 need not be correct if there exists more than one MLE of the parameters involved. For this, see also the paper "Maximum Likelihood and Sufficient Statistics" by D. S. Moore in the American Mathematical Monthly, Vol. 78, No. 1, January 1971, pp. 42-45.)
\[
L\big[\theta;\delta(x_1,\dots,x_n)\big] = \big|\theta-\delta(x_1,\dots,x_n)\big|,
\]
or more generally,
\[
L\big[\theta;\delta(x_1,\dots,x_n)\big] = \big|\theta-\delta(x_1,\dots,x_n)\big|^k, \qquad k>0;
\]
in particular, for k = 2,
\[
L\big[\theta;\delta(x_1,\dots,x_n)\big] = \big[\theta-\delta(x_1,\dots,x_n)\big]^2,
\]
the squared loss function.
DEFINITION 9 The risk function corresponding to the loss function L(·; ·) is denoted by R(θ; δ) and is defined by
\[
R(\theta;\delta) = E_\theta L\big[\theta;\delta(X_1,\dots,X_n)\big]
= \begin{cases}
\displaystyle\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}
L\big[\theta;\delta(x_1,\dots,x_n)\big]\,f(x_1;\theta)\cdots f(x_n;\theta)\,dx_1\cdots dx_n,\\[2ex]
\displaystyle\sum_{x_1}\cdots\sum_{x_n}
L\big[\theta;\delta(x_1,\dots,x_n)\big]\,f(x_1;\theta)\cdots f(x_n;\theta),
\end{cases}
\]
for the continuous and the discrete case, respectively.
That is, the risk corresponding to a given decision function is simply the
average loss incurred if that decision function is used.
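For discrete samples the risk can be computed exactly by enumeration. The Python sketch below (Bernoulli observations, squared loss, made-up values of n and θ) recovers the familiar fact that the risk of the sample mean is its variance θ(1 − θ)/n:

```python
from itertools import product

def risk(theta, delta, n):
    # R(theta; delta) = E_theta [theta - delta(X_1,...,X_n)]^2 for X_j i.i.d. B(1, theta)
    total = 0.0
    for xs in product((0, 1), repeat=n):
        p = theta ** sum(xs) * (1 - theta) ** (n - sum(xs))  # P_theta of this sample point
        total += p * (theta - delta(xs)) ** 2
    return total

n, theta = 4, 0.3
xbar = lambda xs: sum(xs) / len(xs)
# for the sample mean under squared loss, the risk equals theta(1-theta)/n
assert abs(risk(theta, xbar, n) - theta * (1 - theta) / n) < 1e-12
```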
Two decision functions δ and δ* such that
\[
R(\theta;\delta) = E_\theta L\big[\theta;\delta(X_1,\dots,X_n)\big]
= E_\theta L\big[\theta;\delta^*(X_1,\dots,X_n)\big] = R(\theta;\delta^*)
\]
for all θ ∈ Ω are said to be equivalent.
In the present context of (point) estimation, the decision δ = δ(x₁, . . . , xₙ) will be called an estimate of θ, and its goodness will be judged on the basis of its risk R(θ; δ). It is, of course, assumed that a certain loss function is chosen and then kept fixed throughout. To start with, we first rule out those estimates which are not admissible (inadmissible), where
DEFINITION 10 The estimator δ of θ is said to be admissible if there is no other estimator δ* of θ such that R(θ; δ*) ≤ R(θ; δ) for all θ ∈ Ω, with strict inequality for at least one θ.
Since for any two equivalent estimators δ and δ* we have R(θ; δ) = R(θ; δ*) for all θ ∈ Ω, it suffices to restrict ourselves to an essentially complete class of estimators, where
DEFINITION 11 A class D of estimators of θ is said to be essentially complete if for any estimator δ* of θ not in D one can find an estimator δ in D such that R(θ; δ) ≤ R(θ; δ*) for all θ ∈ Ω.
Thus, searching for an estimator with some optimal properties, we confine our attention to an essentially complete class of admissible estimators. Once this has been done, the question arises as to which member of this class is to be chosen as an estimator of θ. An apparently obvious answer to this question would be to choose an estimator δ such that R(θ; δ) ≤ R(θ; δ*) for any other estimator δ* within the class and for all θ ∈ Ω. Unfortunately, such estimators do not exist except in trivial cases. However, if we restrict ourselves only to the class of unbiased estimators with finite variance and take the loss function to be the squared loss function (see the paragraph following Definition 8), then, clearly, R(θ; δ) becomes simply the variance of δ(X₁, . . . , Xₙ). The criterion proposed above for selecting δ then coincides with that of finding a UMVU estimator. This problem has already been discussed in Section 3 and Section 4. Actually, some authors discuss UMVU estimators as a special case within the decision-theoretic approach as just mentioned. However, we believe that the approach adopted here is more pedagogic and easier for the reader to follow.
Setting aside the fruitless search for an estimator which would uniformly (in θ) minimize the risk within the entire class of admissible estimators, there are two principles on which our search may be based. The first is to look for an estimator which minimizes the worst that could happen to us, that is, to minimize the maximum (over θ) risk. Such an estimator, if it exists, is called a minimax (from minimizing the maximum) estimator. However, in this case, while we may still confine ourselves to the essentially complete class of estimators, we may not rule out inadmissible estimators, for it might so happen that
\[
\sup\big[R(\theta;\delta);\ \theta\in\Omega\big] \le \sup\big[R(\theta;\delta^*);\ \theta\in\Omega\big].
\]
[Figures 12.2 and 12.3: graphs of the risk functions R(θ; δ) and R(θ; δ*) plotted against θ.]
Figure 12.2 illustrates the fact that a minimax estimator may be inadmissible.
Now one may very well object to the minimax principle on the grounds
that it gives too much weight to the maximum risk and entirely neglects its
other values. For example, in Fig. 12.3, whereas the minimax estimate δ is slightly better at its maximum R(θ₀; δ), it is much worse than δ* at almost all other points.
Legitimate objections to minimax principles like the one just cited prompted the advancement of the concept of a Bayes estimate. To see what this is, some further notation is required. Recall that θ ∈ Ω ⊆ ℝ, and suppose now that θ is an r.v. itself with p.d.f. λ, to be called a prior p.d.f. Then set
\[
R(\delta) = E\big[R(\theta;\delta)\big]
= \begin{cases}
\displaystyle\int_\Omega R(\theta;\delta)\,\lambda(\theta)\,d\theta,\\[1ex]
\displaystyle\sum_{\theta\in\Omega} R(\theta;\delta)\,\lambda(\theta),
\end{cases}
\]
for the continuous and the discrete case, respectively. Assuming that the quantity just defined is finite, it is clear that R(δ) is simply the average (with respect to λ) risk over the entire parameter space Ω when the estimator δ is employed. Then it makes sense to choose that δ for which R(δ) ≤ R(δ*) for any other estimator δ*. Such a δ is called a Bayes estimator of θ, provided it exists. Let D₂ be the class of all estimators for which R(δ) is finite for a given prior p.d.f. λ on Ω. Then
DEFINITION 13 Within the class D₂, the estimator δ is said to be a Bayes estimator (in the decision-theoretic sense and with respect to the prior p.d.f. λ on Ω) if R(δ) ≤ R(δ*) for any other estimator δ*.
It should be pointed out at the outset that the Bayes approach to estimation poses several issues that we have to reckon with. First, the assumption of θ being an r.v. might be entirely unreasonable. For example, θ may denote the (unknown but fixed) distance between Chicago and New York City. In what follows, we take the loss function to be the squared loss function; that is,
\[
\delta = \delta(x_1,\dots,x_n), \qquad
L(\theta;\delta) = L\big[\theta;\delta(x_1,\dots,x_n)\big] = \big[\theta-\delta(x_1,\dots,x_n)\big]^2.
\]
12.7 Finding Bayes Estimators 313
Let θ be an r.v. with prior p.d.f. λ. Then we are interested in determining δ so that it will be a Bayes estimate (of θ in the decision-theoretic sense). We consider the continuous case, since the discrete case is handled similarly with the integrals replaced by summation signs. We have
\[
R(\theta;\delta) = E_\theta\big[\theta-\delta(X_1,\dots,X_n)\big]^2
= \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}\big[\theta-\delta(x_1,\dots,x_n)\big]^2
f(x_1;\theta)\cdots f(x_n;\theta)\,dx_1\cdots dx_n.
\]
Therefore
\begin{align*}
R(\delta) &= \int_\Omega R(\theta;\delta)\,\lambda(\theta)\,d\theta\\
&= \int_\Omega\Big\{\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}\big[\theta-\delta(x_1,\dots,x_n)\big]^2
f(x_1;\theta)\cdots f(x_n;\theta)\,dx_1\cdots dx_n\Big\}\,\lambda(\theta)\,d\theta\\
&= \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}\Big\{\int_\Omega\big[\theta-\delta(x_1,\dots,x_n)\big]^2
f(x_1;\theta)\cdots f(x_n;\theta)\,\lambda(\theta)\,d\theta\Big\}\,dx_1\cdots dx_n. \qquad (13)
\end{align*}
(As can be shown, the interchange of the order of integration is valid here
because the integrand is nonnegative. The theorem used is known as the
Fubini theorem.)
From (13), it follows that if δ is chosen so that
\[
\int_\Omega\big[\theta-\delta(x_1,\dots,x_n)\big]^2 f(x_1;\theta)\cdots f(x_n;\theta)\,\lambda(\theta)\,d\theta
\]
is minimized for each (x₁, . . . , xₙ), then R(δ) is also minimized. But
\begin{align*}
\int_\Omega\big[\theta-\delta(x_1,\dots,x_n)\big]^2 f(x_1;\theta)\cdots f(x_n;\theta)\,\lambda(\theta)\,d\theta
&= \delta^2(x_1,\dots,x_n)\int_\Omega f(x_1;\theta)\cdots f(x_n;\theta)\,\lambda(\theta)\,d\theta\\
&\quad - 2\,\delta(x_1,\dots,x_n)\int_\Omega \theta\,f(x_1;\theta)\cdots f(x_n;\theta)\,\lambda(\theta)\,d\theta\\
&\quad + \int_\Omega \theta^2 f(x_1;\theta)\cdots f(x_n;\theta)\,\lambda(\theta)\,d\theta, \qquad (14)
\end{align*}
and this quadratic in δ(x₁, . . . , xₙ) is minimized by
\[
\delta(x_1,\dots,x_n) = \frac{\int_\Omega \theta\,f(x_1;\theta)\cdots f(x_n;\theta)\,\lambda(\theta)\,d\theta}
{\int_\Omega f(x_1;\theta)\cdots f(x_n;\theta)\,\lambda(\theta)\,d\theta}.
\]
We have then proved the following result.
THEOREM 6 A Bayes estimate δ(x₁, . . . , xₙ) (of θ) corresponding to a prior p.d.f. λ on Ω for which
\[
\int_\Omega |\theta|\,f(x_1;\theta)\cdots f(x_n;\theta)\,\lambda(\theta)\,d\theta < \infty, \qquad
0 < \int_\Omega f(x_1;\theta)\cdots f(x_n;\theta)\,\lambda(\theta)\,d\theta < \infty,
\]
and
\[
\int_\Omega \theta^2 f(x_1;\theta)\cdots f(x_n;\theta)\,\lambda(\theta)\,d\theta < \infty,
\]
for each (x₁, . . . , xₙ), is given by
\[
\delta(x_1,\dots,x_n) = \frac{\int_\Omega \theta\,f(x_1;\theta)\cdots f(x_n;\theta)\,\lambda(\theta)\,d\theta}
{\int_\Omega f(x_1;\theta)\cdots f(x_n;\theta)\,\lambda(\theta)\,d\theta}. \qquad (15)
\]
Now the posterior p.d.f. of θ, given X₁ = x₁, . . . , Xₙ = xₙ, is
\[
h(\theta|x) = \frac{f(x_1;\theta)\cdots f(x_n;\theta)\,\lambda(\theta)}{h(x)}, \qquad (16)
\]
where, with x = (x₁, . . . , xₙ),
\[
h(x) = \int_\Omega f(x;\theta)\,\lambda(\theta)\,d\theta = \int_\Omega f(x_1;\theta)\cdots f(x_n;\theta)\,\lambda(\theta)\,d\theta,
\]
for the case that θ is of the continuous type. By means of (15) and (16), it follows then that the Bayes estimate of θ (in the decision-theoretic sense) δ(x₁, . . . , xₙ) is the expectation of θ with respect to its posterior p.d.f.; that is,
\[
\delta(x_1,\dots,x_n) = \int_\Omega \theta\,h(\theta|x)\,d\theta.
\]
Another Bayesian estimate of θ could be provided by the median of h(θ|x), or the mode of h(θ|x), if it exists.
REMARK 6 At this point, let us make the following observation regarding the maximum likelihood and the Bayesian approach to estimation problems. As will be seen, this observation establishes a link between maximum likelihood and Bayes estimates and provides insight into each. To this end, let h(θ|x) be the posterior p.d.f. of θ given by (16) and corresponding to the prior p.d.f. λ. Since f(x; θ) = L(θ|x), h(θ|x) may be written as follows:
\[
h(\theta|x) = \frac{\lambda(\theta)\,L(\theta|x)}{h(x)}. \qquad (17)
\]
EXAMPLE 14 Let X₁, . . . , Xₙ be i.i.d. r.v.'s from B(1, θ), θ ∈ Ω = (0, 1). We choose λ to be the Beta density with parameters α and β; that is,
\[
\lambda(\theta) = \begin{cases}
\dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,\theta^{\alpha-1}(1-\theta)^{\beta-1},
& \text{if } \theta\in(0,1)\\[1.5ex]
0, & \text{otherwise.}
\end{cases}
\]
Now, from the definition of the p.d.f. of a Beta distribution with parameters α and β, we have
\[
\int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}, \qquad (18)
\]
and, of course, Γ(γ) = (γ − 1)Γ(γ − 1). Then, for simplicity, writing Σⱼxⱼ rather than Σⁿⱼ₌₁xⱼ when this last expression appears as an exponent, we have
\begin{align*}
I_1 &= \int_\Omega f(x_1;\theta)\cdots f(x_n;\theta)\,\lambda(\theta)\,d\theta\\
&= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}
\int_0^1 \theta^{\sum_j x_j}(1-\theta)^{\,n-\sum_j x_j}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}\,d\theta\\
&= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}
\int_0^1 \theta^{\,\alpha+\sum_j x_j-1}(1-\theta)^{\,\beta+n-\sum_j x_j-1}\,d\theta,
\end{align*}
so that, by means of (18),
\[
I_1 = \frac{\Gamma(\alpha+\beta)\,\Gamma\big(\alpha+\sum_{j=1}^n x_j\big)\,\Gamma\big(\beta+n-\sum_{j=1}^n x_j\big)}
{\Gamma(\alpha)\,\Gamma(\beta)\,\Gamma(\alpha+\beta+n)}. \qquad (19)
\]
Next,
\begin{align*}
I_2 &= \int_\Omega \theta\,f(x_1;\theta)\cdots f(x_n;\theta)\,\lambda(\theta)\,d\theta\\
&= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}
\int_0^1 \theta^{\sum_j x_j+1}(1-\theta)^{\,n-\sum_j x_j}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}\,d\theta\\
&= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}
\int_0^1 \theta^{\,\alpha+\sum_j x_j+1-1}(1-\theta)^{\,\beta+n-\sum_j x_j-1}\,d\theta,
\end{align*}
so that
\[
I_2 = \frac{\Gamma(\alpha+\beta)\,\Gamma\big(\alpha+\sum_{j=1}^n x_j+1\big)\,\Gamma\big(\beta+n-\sum_{j=1}^n x_j\big)}
{\Gamma(\alpha)\,\Gamma(\beta)\,\Gamma(\alpha+\beta+n+1)}. \qquad (20)
\]
Relations (19) and (20) imply, by virtue of (15),
\[
\delta(x_1,\dots,x_n) = \frac{I_2}{I_1}
= \frac{\Gamma\big(\alpha+\sum_{j=1}^n x_j+1\big)\,\Gamma(\alpha+\beta+n)}
{\Gamma\big(\alpha+\sum_{j=1}^n x_j\big)\,\Gamma(\alpha+\beta+n+1)}
= \frac{\alpha+\sum_{j=1}^n x_j}{\alpha+\beta+n};
\]
that is,
\[
\delta(x_1,\dots,x_n) = \frac{\sum_{j=1}^n x_j+\alpha}{n+\alpha+\beta}. \qquad (21)
\]
In particular, for α = β = 1 (that is, for the U(0, 1) prior), the estimate becomes
\[
\delta(x_1,\dots,x_n) = \frac{\sum_{j=1}^n x_j+1}{n+2}.
\]
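Formula (21) is a one-liner to compute; the Python sketch below (hypothetical data and prior parameters) also checks the standard reading of (21) as a weighted average of the sample mean and the prior mean α/(α + β), with weight n/(n + α + β) on the data:

```python
xs = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]   # hypothetical Bernoulli sample
n, s = len(xs), sum(xs)
a, b = 2.0, 3.0                       # hypothetical Beta(alpha, beta) prior
bayes = (s + a) / (n + a + b)         # formula (21): the posterior mean

prior_mean, sample_mean = a / (a + b), s / n
w = n / (n + a + b)                   # the weight on the data grows with n
assert abs(bayes - (w * sample_mean + (1 - w) * prior_mean)) < 1e-12
assert min(prior_mean, sample_mean) <= bayes <= max(prior_mean, sample_mean)
```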
EXAMPLE 15 Let X₁, . . . , Xₙ be i.i.d. r.v.'s from N(θ, 1), and choose λ to be the p.d.f. of N(μ, 1), where μ is a known constant. Then
\begin{align*}
I_1 &= \int_\Omega f(x_1;\theta)\cdots f(x_n;\theta)\,\lambda(\theta)\,d\theta\\
&= \Big(\frac{1}{\sqrt{2\pi}}\Big)^{n+1}\exp\Big[-\frac{1}{2}\Big(\sum_{j=1}^n x_j^2+\mu^2\Big)\Big]
\int_{-\infty}^{\infty}\exp\Big\{-\frac{1}{2}\big[(n+1)\theta^2-2\theta(n\bar{x}+\mu)\big]\Big\}\,d\theta.
\end{align*}
But
\begin{align*}
(n+1)\theta^2-2\theta(n\bar{x}+\mu)
&= (n+1)\Big[\theta^2-2\theta\,\frac{n\bar{x}+\mu}{n+1}\Big]\\
&= (n+1)\Big[\theta^2-2\theta\,\frac{n\bar{x}+\mu}{n+1}+\Big(\frac{n\bar{x}+\mu}{n+1}\Big)^2\Big]
-(n+1)\Big(\frac{n\bar{x}+\mu}{n+1}\Big)^2\\
&= (n+1)\Big(\theta-\frac{n\bar{x}+\mu}{n+1}\Big)^2-\frac{(n\bar{x}+\mu)^2}{n+1}.
\end{align*}
Therefore
\begin{align*}
I_1 &= \Big(\frac{1}{\sqrt{2\pi}}\Big)^{n}\frac{1}{\sqrt{n+1}}
\exp\Big\{-\frac{1}{2}\Big[\sum_{j=1}^n x_j^2+\mu^2-\frac{(n\bar{x}+\mu)^2}{n+1}\Big]\Big\}\\
&\quad\times\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\,\sqrt{1/(n+1)}}
\exp\Big[-\frac{\big(\theta-\frac{n\bar{x}+\mu}{n+1}\big)^2}{2\cdot\frac{1}{n+1}}\Big]\,d\theta\\
&= \Big(\frac{1}{\sqrt{2\pi}}\Big)^{n}\frac{1}{\sqrt{n+1}}
\exp\Big\{-\frac{1}{2}\Big[\sum_{j=1}^n x_j^2+\mu^2-\frac{(n\bar{x}+\mu)^2}{n+1}\Big]\Big\}, \qquad (22)
\end{align*}
since the last integral above is that of the p.d.f. of N((n x̄ + μ)/(n + 1), 1/(n + 1)) and hence is equal to 1.
Next,
\begin{align*}
I_2 &= \int_\Omega \theta\,f(x_1;\theta)\cdots f(x_n;\theta)\,\lambda(\theta)\,d\theta\\
&= \Big(\frac{1}{\sqrt{2\pi}}\Big)^{n+1}\exp\Big[-\frac{1}{2}\Big(\sum_{j=1}^n x_j^2+\mu^2\Big)\Big]
\int_{-\infty}^{\infty}\theta\exp\Big\{-\frac{1}{2}\big[(n+1)\theta^2-2\theta(n\bar{x}+\mu)\big]\Big\}\,d\theta\\
&= \Big(\frac{1}{\sqrt{2\pi}}\Big)^{n}\frac{1}{\sqrt{n+1}}
\exp\Big\{-\frac{1}{2}\Big[\sum_{j=1}^n x_j^2+\mu^2-\frac{(n\bar{x}+\mu)^2}{n+1}\Big]\Big\}\\
&\quad\times\int_{-\infty}^{\infty}\theta\,\frac{1}{\sqrt{2\pi}\,\sqrt{1/(n+1)}}
\exp\Big[-\frac{\big(\theta-\frac{n\bar{x}+\mu}{n+1}\big)^2}{2\cdot\frac{1}{n+1}}\Big]\,d\theta\\
&= \Big(\frac{1}{\sqrt{2\pi}}\Big)^{n}\frac{1}{\sqrt{n+1}}
\exp\Big\{-\frac{1}{2}\Big[\sum_{j=1}^n x_j^2+\mu^2-\frac{(n\bar{x}+\mu)^2}{n+1}\Big]\Big\}\,
\frac{n\bar{x}+\mu}{n+1}, \qquad (23)
\end{align*}
since the last integral above is the mean of N((n x̄ + μ)/(n + 1), 1/(n + 1)). By means of (22) and (23), it follows then that
\[
\delta(x_1,\dots,x_n) = \frac{I_2}{I_1} = \frac{n\bar{x}+\mu}{n+1}. \qquad (24)
\]
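Formula (24) admits the same weighted-average reading as (21). A Python sketch (made-up sample; sampling variance and prior variance both equal to 1, as in the example):

```python
xs = [0.8, 1.3, 0.2, 1.9, 1.1]        # hypothetical sample from N(theta, 1)
mu = 0.0                              # hypothetical prior mean for the N(mu, 1) prior
n = len(xs)
xbar = sum(xs) / n
bayes = (n * xbar + mu) / (n + 1)     # formula (24): the posterior mean
# (24) is a convex combination of the sample mean and the prior mean
assert abs(bayes - ((n / (n + 1)) * xbar + (1 / (n + 1)) * mu)) < 1e-12
assert min(xbar, mu) <= bayes <= max(xbar, mu)
```

As n grows, the weight n/(n + 1) on the data tends to 1, so the estimate is pulled toward x̄ and away from the prior mean.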
Exercises
12.7.1 Refer to Example 14 and:
i) Determine the posterior p.d.f. h(θ|x);
ii) Construct a 100(1 − α)% Bayes confidence interval for θ; that is, determine a set {θ ∈ (0, 1); h(θ|x) ≥ c(x)}, where c(x) is determined by the requirement that the probability of this set, computed by means of h(θ|x), is equal to 1 − α;
iii) Derive the Bayes estimate in (21) as the mean of the posterior p.d.f. h(θ|x).
(Hint: For simplicity, assign equal probabilities to the two tails.)
12.7.2 Refer to Example 15 and:
i) Determine the posterior p.d.f. h(θ|x);
ii) Construct the equal-tail 100(1 − α)% Bayes confidence interval for θ;
iii) Derive the Bayes estimate in (24) as the mean of the posterior p.d.f. h(θ|x).
12.8 Finding Minimax Estimators
Recall that, with respect to a prior p.d.f. λ on Ω and the squared loss function, the Bayes estimate of θ is given by
\[
\delta(x_1,\dots,x_n) = \int_\Omega \theta\,h(\theta|x)\,d\theta.
\]
THEOREM 7 Suppose there is a prior p.d.f. λ on Ω such that for the Bayes estimate δ defined by (15) the risk R(θ; δ) is independent of θ. Then δ is minimax.
PROOF By the fact that δ is the Bayes estimate corresponding to the prior λ, one has
\[
\int_\Omega R(\theta;\delta)\,\lambda(\theta)\,d\theta \le \int_\Omega R(\theta;\delta^*)\,\lambda(\theta)\,d\theta
\]
for any estimate δ*. But R(θ; δ) = c by assumption. Hence
\[
\sup\big[R(\theta;\delta);\ \theta\in\Omega\big] = c
\le \int_\Omega R(\theta;\delta^*)\,\lambda(\theta)\,d\theta
\le \sup\big[R(\theta;\delta^*);\ \theta\in\Omega\big]
\]
for any estimate δ*. Therefore δ is minimax. The case that θ is of the discrete type is treated similarly.
The theorem just proved is illustrated by the following example.
EXAMPLE 16 Let X₁, . . . , Xₙ and λ be as in Example 14. Then the corresponding Bayes estimate δ is given by (21). Now by setting X = Σⁿⱼ₌₁Xⱼ and taking into consideration that EθX = nθ and EθX² = nθ(1 − θ + nθ), we obtain
\begin{align*}
R(\theta;\delta) &= E_\theta\Big(\frac{X+\alpha}{n+\alpha+\beta}-\theta\Big)^2\\
&= \frac{1}{(n+\alpha+\beta)^2}
\Big\{\theta^2\big[(\alpha+\beta)^2-n\big]+\theta\big[n-2\alpha(\alpha+\beta)\big]+\alpha^2\Big\}.
\end{align*}
By taking α = β = ½√n and denoting by δ* the resulting estimate, we have
\[
(\alpha+\beta)^2-n = 0, \qquad n-2\alpha(\alpha+\beta) = 0,
\]
so that
\[
R(\theta;\delta^*) = \frac{\alpha^2}{(n+\alpha+\beta)^2}
= \frac{n/4}{(n+\sqrt{n})^2} = \frac{1}{4(1+\sqrt{n})^2}.
\]
Since R(θ; δ*) is independent of θ, Theorem 7 implies that
\[
\delta^*(x_1,\dots,x_n) = \frac{\sum_{j=1}^n x_j+\frac{\sqrt{n}}{2}}{n+\sqrt{n}}
= \frac{2\sqrt{n}\,\bar{x}+1}{2\big(1+\sqrt{n}\big)}
\]
is minimax.
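The constancy of the risk, which is what Theorem 7 trades on, can be confirmed by direct computation. The Python sketch below evaluates the exact risk expression displayed above at several values of θ (n is hypothetical) and compares it with 1/[4(1 + √n)²]:

```python
import math

def risk(theta, n, a, b):
    # exact risk of the Bayes estimate (X + a)/(n + a + b) under squared loss
    num = theta ** 2 * ((a + b) ** 2 - n) + theta * (n - 2 * a * (a + b)) + a ** 2
    return num / (n + a + b) ** 2

n = 25
a = b = math.sqrt(n) / 2              # the minimax choice alpha = beta = sqrt(n)/2
target = 1 / (4 * (1 + math.sqrt(n)) ** 2)
for theta in (0.1, 0.3, 0.5, 0.7, 0.9):
    assert abs(risk(theta, n, a, b) - target) < 1e-12   # risk is flat in theta
```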
EXAMPLE 17 Let X₁, . . . , Xₙ be i.i.d. r.v.'s from N(θ, σ²), where σ² is known and θ = μ. It was shown (see Example 9) that the estimator X̄ of θ was UMVU. It can be shown that it is also minimax and admissible. The proof of these latter two facts, however, will not be presented here.
Now a UMVU estimator has uniformly (in θ) smallest risk when its competitors lie in the class of unbiased estimators with finite variance. However, outside this class there might be estimators which are better than a UMVU estimator. In other words, a UMVU estimator need not be admissible. Here is an example.
EXAMPLE 18 Let X₁, . . . , Xₙ be i.i.d. r.v.'s from N(0, σ²). Set σ² = θ. Then the UMVU estimator of θ is given by
\[
U = \frac{1}{n}\sum_{j=1}^n X_j^2.
\]
(See Example 9.) Its variance (risk) was seen to be equal to 2θ²/n; that is, R(θ; U) = 2θ²/n. Consider the estimator δ = αU. Then its risk is
\[
R(\theta;\delta) = E_\theta\big(\alpha U-\theta\big)^2
= E_\theta\big[\alpha(U-\theta)+(\alpha-1)\theta\big]^2
= \frac{\theta^2}{n}\big[(n+2)\alpha^2-2n\alpha+n\big].
\]
The value α = n/(n + 2) minimizes this risk, and the minimum risk is equal to 2θ²/(n + 2) < 2θ²/n for all θ. Thus U is not admissible.
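The risk computation in this example is easily mechanized. Using the exact expression for R(θ; αU) derived above, the Python sketch below checks that α = n/(n + 2) beats the UMVU choice α = 1 (n and θ are made up):

```python
def risk_scaled(alpha, n, theta):
    # exact risk of alpha * U under squared loss, with U = (1/n) sum X_j^2, theta = sigma^2
    return ((n + 2) * alpha ** 2 - 2 * n * alpha + n) * theta ** 2 / n

n, theta = 10, 2.0
r_umvu = risk_scaled(1.0, n, theta)           # alpha = 1 recovers the UMVU risk 2 theta^2 / n
assert abs(r_umvu - 2 * theta ** 2 / n) < 1e-12
r_best = risk_scaled(n / (n + 2), n, theta)   # the minimizing alpha = n/(n + 2)
assert abs(r_best - 2 * theta ** 2 / (n + 2)) < 1e-12
assert r_best < r_umvu                        # so U is dominated, hence not admissible
```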
Exercise
12.8.1 Let X₁, . . . , Xₙ be independent r.v.'s from the P(θ) distribution, and consider the loss function L(θ; δ) = [θ − δ(x)]²/θ. Then for the estimate δ(x) = x̄, calculate the risk R(θ; δ) = (1/θ)Eθ[θ − δ(X)]², and conclude that δ(x) is minimax.
The minimum chi-square method. According to this method, the estimate of θ is that value which minimizes, with respect to θ, the quantity
\[
\chi^2 = \sum_{j=1}^k \frac{\big[X_j-np_j(\theta)\big]^2}{np_j(\theta)}.
\]
Often the p's are differentiable with respect to the θ's, and then the minimization can be achieved, in principle, by differentiation. However, the actual solution of the resulting system of r equations is often tedious. The solution may be easier by minimizing the following modified χ² expression:
12.9 Other Methods of Estimation 321
\[
\chi^2_{\mathrm{mod}} = \sum_{j=1}^k \frac{\big[X_j-np_j(\theta)\big]^2}{X_j},
\]
provided, of course, all Xⱼ > 0, j = 1, . . . , k.
Under suitable regularity conditions, the resulting estimators can be shown to have some asymptotic optimal properties. (See Section 12.10.)
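As a concrete (hypothetical) illustration of the modified χ² method, take k = 3 cells with probabilities p₁ = θ², p₂ = 2θ(1 − θ), p₃ = (1 − θ)², minimize χ²_mod over a grid, and compare with the natural estimate (2X₁ + X₂)/(2n). The model and counts below are assumptions for the sketch, not taken from the text:

```python
def chisq_mod(theta, xs):
    # modified chi-square for the hypothetical cell probabilities theta^2, 2theta(1-theta), (1-theta)^2
    n = sum(xs)
    ps = [theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2]
    return sum((x - n * p) ** 2 / x for x, p in zip(xs, ps))

xs = [30, 50, 20]                     # hypothetical cell counts, all positive
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = min(grid, key=lambda t: chisq_mod(t, xs))
# the grid minimizer should land near the natural estimate (2*x1 + x2) / (2n)
natural = (2 * xs[0] + xs[1]) / (2 * sum(xs))
assert abs(theta_hat - natural) < 0.05
```

A grid search stands in here for the differentiation mentioned in the text; in practice one would solve the estimating equations or use a numerical optimizer.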
The method of moments. Let X₁, . . . , Xₙ be i.i.d. r.v.'s with p.d.f. f(·; θ), and for a positive integer r, assume that EXʳ = mᵣ is finite. The problem is that of estimating mᵣ. According to the present method, mᵣ will be estimated by the corresponding sample moment
\[
\frac{1}{n}\sum_{j=1}^n X_j^r.
\]
The resulting moment estimates are always unbiased and, under suitable regularity conditions, they enjoy some asymptotic optimal properties as well.
On the other hand, the theoretical moments are also functions of θ = (θ₁, . . . , θᵣ). Then we consider the following system
\[
\frac{1}{n}\sum_{j=1}^n X_j^k = m_k(\theta_1,\dots,\theta_r), \qquad k = 1,\dots,r,
\]
the solution of which (if possible) will provide estimators for θⱼ, j = 1, . . . , r.
EXAMPLE 19 Let X₁, . . . , Xₙ be i.i.d. r.v.'s from N(μ, σ²), where both μ and σ² are unknown. By the method of moments, we have
\[
\bar{X} = \mu, \qquad \frac{1}{n}\sum_{j=1}^n X_j^2 = \sigma^2+\mu^2,
\]
hence
\[
\hat\mu = \bar{X}, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{j=1}^n \big(X_j-\bar{X}\big)^2.
\]
EXAMPLE 20 Let X₁, . . . , Xₙ be i.i.d. r.v.'s from U(α, β), where both α and β are unknown. Since
\[
E X_1 = \frac{\alpha+\beta}{2} \qquad\text{and}\qquad \sigma^2\big(X_1\big) = \frac{(\beta-\alpha)^2}{12}
\]
(see Chapter 5), we have
\[
\bar{X} = \frac{\alpha+\beta}{2}, \qquad
\frac{1}{n}\sum_{j=1}^n X_j^2 = \frac{(\beta-\alpha)^2}{12}+\frac{(\alpha+\beta)^2}{4},
\quad\text{or}\quad
\frac{1}{n}\sum_{j=1}^n X_j^2-\bar{X}^2 = \frac{(\beta-\alpha)^2}{12} = S^2,
\]
where
\[
S = \Big[\frac{1}{n}\sum_{j=1}^n \big(X_j-\bar{X}\big)^2\Big]^{1/2}.
\]
Hence α̂ = X̄ − S√3, β̂ = X̄ + S√3.
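The two moment equations of this example invert in closed form, which the following Python sketch (hypothetical sample) verifies by checking that the fitted U(α̂, β̂) reproduces the first two sample moments:

```python
import math

xs = [2.1, 3.7, 2.9, 4.4, 3.2, 2.6]   # hypothetical sample
n = len(xs)
xbar = sum(xs) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)
alpha_hat = xbar - s * math.sqrt(3)   # moment estimators of this example
beta_hat = xbar + s * math.sqrt(3)
# the fitted uniform matches the sample mean and the sample variance exactly
assert abs((alpha_hat + beta_hat) / 2 - xbar) < 1e-12
assert abs((beta_hat - alpha_hat) ** 2 / 12 - s * s) < 1e-9
```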
REMARK 8 In Example 20, we see that the moment estimators α̂, β̂ of α, β, respectively, are not functions of the sufficient statistic (X₍₁₎, X₍ₙ₎) of (α, β). This is a drawback of the method of moment estimation. Another obvious disadvantage of this method is that it fails when no moments exist (as in the case of the Cauchy distribution), or when not enough moments exist.
Least squares method. This method is applicable when the underlying distribution is of a certain special form, and it will be discussed in detail in Chapter 16.
Exercises
12.9.1 Let X₁, . . . , Xₙ be independent r.v.'s distributed as U(θ − a, θ + b), where a, b > 0 are known and θ ∈ Ω = ℝ. Find the moment estimator of θ and calculate its variance.
12.9.2 If X₁, . . . , Xₙ are independent r.v.'s distributed as U(−θ, θ), θ ∈ Ω = (0, ∞), does the method of moments provide an estimator for θ?
12.9.3 If X₁, . . . , Xₙ are i.i.d. r.v.'s from the Gamma distribution with parameters α and β, show that α̂ = X̄²/S² and β̂ = S²/X̄ are the moment estimators of α and β, respectively, where
\[
S^2 = \frac{1}{n}\sum_{j=1}^n \big(X_j-\bar{X}\big)^2.
\]
12.9.4 Let X₁, . . . , Xₙ be i.i.d. r.v.'s with p.d.f. f given by
\[
f(x;\theta) = \frac{2}{\theta^2}\,x\,I_{(0,\theta)}(x), \qquad \theta\in\Omega=(0,\infty).
\]
Find the moment estimator of θ.
12.9.5 Let X₁, . . . , Xₙ be i.i.d. r.v.'s from the Beta distribution with parameters α, β and find the moment estimators of α and β.
12.9.6 Refer to Exercise 12.5.7 and find the moment estimators of θ₁ and θ₂.
If
\[
\sqrt{n}\,\big(V_n-\theta\big) \xrightarrow{d} N\big(0,\sigma^2(\theta)\big) \quad\text{as } n\to\infty,
\]
it follows that Vₙ → θ in Pθ-probability as n → ∞ (see Exercise 12.10.1).
THEOREM 9 Let X₁, . . . , Xₙ be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ. Then, if certain suitable regularity conditions are satisfied, the likelihood equation
\[
\frac{\partial}{\partial\theta}\log L\big(\theta|X_1,\dots,X_n\big) = 0
\]
has a root θₙ* = θₙ*(X₁, . . . , Xₙ), for each n, such that the sequence {θₙ*} of estimators is BAN, and the variance of its limiting normal distribution is equal to the inverse of Fisher's information number
\[
I(\theta) = E_\theta\Big[\frac{\partial}{\partial\theta}\log f(X;\theta)\Big]^2,
\]
where X is an r.v. distributed as the X's above.
In smooth cases, θₙ* will be an MLE or the MLE. Examples have been constructed, however, for which {θₙ*} does not satisfy (ii) of Definition 16 for some exceptional θ's. Appropriate regularity conditions ensure that these exceptional θ's are only a few (in the sense of their set having Lebesgue measure zero). The fact that there can be exceptional θ's, along with other considerations, has prompted the introduction of other criteria of asymptotic efficiency. However, this topic will not be touched upon here. Also, the proof of Theorem 9 is beyond the scope of this book, and therefore it will be omitted.
EXAMPLE 21
i) Let X₁, . . . , Xₙ be i.i.d. r.v.'s from B(1, θ). Then, by Exercise 12.5.1, the MLE of θ is X̄, which we denote by X̄ₙ here. The weak and strong consistency of X̄ₙ follows by the WLLN and SLLN, respectively (see Chapter 8). That √n(X̄ₙ − θ) is asymptotically normal N(0, I⁻¹(θ)), where I(θ) = 1/[θ(1 − θ)] (see Example 7), follows from the fact that √n(X̄ₙ − θ)/√(θ(1 − θ)) is asymptotically N(0, 1) by the CLT (see Chapter 8).
ii) If X₁, . . . , Xₙ are i.i.d. r.v.'s from P(θ), then the MLE X̄ = X̄ₙ of θ (see Example 10) is both (strongly) consistent and asymptotically normal by the same reasoning as above, with the variance of the limiting normal distribution being equal to I⁻¹(θ) = θ (see Example 8).
iii) The same is true of the MLE X̄ = X̄ₙ of μ and (1/n)Σⁿⱼ₌₁(Xⱼ − μ)² of σ² if X₁, . . . , Xₙ are i.i.d. r.v.'s from N(μ, σ²) with one parameter known and the other unknown (see Example 12). The variance of the limiting (normal) distribution of √n(X̄ₙ − μ) is I⁻¹(μ) = σ², and the variance of the limiting normal distribution of
\[
\sqrt{n}\,\Big[\frac{1}{n}\sum_{j=1}^n \big(X_j-\mu\big)^2-\sigma^2\Big]
\quad\text{is}\quad I^{-1}\big(\sigma^2\big) = 2\sigma^4 \quad (\text{see Example 9}).
\]
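For case (i), Fisher's information number can be computed by direct enumeration over the two values of X, confirming I(θ) = 1/[θ(1 − θ)]; a minimal Python sketch (θ chosen arbitrarily):

```python
# Fisher information for B(1, theta): I(theta) = E_theta [d/dtheta log f(X; theta)]^2
theta = 0.3
# score at x: d/dtheta [x log(theta) + (1 - x) log(1 - theta)] = x/theta - (1 - x)/(1 - theta)
score = {1: 1 / theta, 0: -1 / (1 - theta)}
info = theta * score[1] ** 2 + (1 - theta) * score[0] ** 2
assert abs(info - 1 / (theta * (1 - theta))) < 1e-12
```

The identity holds for every θ in (0, 1), since θ·(1/θ²) + (1 − θ)·1/(1 − θ)² = 1/θ + 1/(1 − θ) = 1/[θ(1 − θ)].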
It can further be shown that in all cases (i)-(iii) just considered the regularity conditions not explicitly mentioned in Theorem 9 are satisfied, and therefore the above sequences of estimators are actually BAN.
12.11 Closing Remarks 325
Exercise
12.10.1 Let X₁, . . . , Xₙ be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ, and let {Vₙ} = {Vₙ(X₁, . . . , Xₙ)} be a sequence of estimators of θ such that √n(Vₙ − θ) → Y in distribution as n → ∞, where Y is an r.v. distributed as N(0, σ²(θ)). Then show that Vₙ → θ in Pθ-probability as n → ∞. (That is, asymptotic normality of {Vₙ} implies its consistency in probability.)
Let
\[
\{U_n\} = \{U_n(X_1,\dots,X_n)\} \qquad\text{and}\qquad \{V_n\} = \{V_n(X_1,\dots,X_n)\}
\]
be two sequences of estimators of θ. Then we say that {Uₙ} and {Vₙ} are asymptotically equivalent if for every θ ∈ Ω,
\[
\sqrt{n}\,\big(U_n-V_n\big) \xrightarrow[n\to\infty]{P_\theta} 0.
\]
For an example, suppose that the X's are from B(1, θ). It has been shown (see Exercise 12.3.3) that the UMVU estimator of θ is Uₙ = X̄ₙ (= X̄), and this coincides with the MLE of θ (Exercise 12.5.1). However, the Bayes estimator of θ, corresponding to a Beta p.d.f. λ, is given by
\[
V_n = \frac{\sum_{j=1}^n X_j+\alpha}{n+\alpha+\beta}, \qquad (25)
\]
and the minimax estimator is
\[
W_n = \frac{\sum_{j=1}^n X_j+\frac{\sqrt{n}}{2}}{n+\sqrt{n}}. \qquad (26)
\]
That is, four different methods of estimation of the same parameter θ provided three different estimators. This is not surprising, since the criteria of optimality employed in the four approaches were different. Next, by the CLT, √n(Uₙ − θ) → Z in distribution as n → ∞, where Z is an r.v. distributed as N(0, θ(1 − θ)), and it can also be shown (see Exercise 12.11.1) that √n(Vₙ − θ) → Z in distribution as n → ∞, for any arbitrary but fixed (that is, not functions of n) values of α and β. It can also be shown (see Exercise 12.11.2) that √n(Uₙ − Vₙ) → 0 in Pθ-probability as n → ∞. Thus {Uₙ} and {Vₙ} are asymptotically equivalent according to Definition 17. As for Wₙ, it can be established (see Exercise 12.11.3) that √n(Wₙ − θ) → W in distribution as n → ∞, where W is an r.v. distributed as N(½ − θ, θ(1 − θ)).
Thus {Un} and {Wn} or {Vn} and {Wn} are not even comparable on the basis of
Definition 17.
Finally, regarding the question as to which estimator is to be selected in a
given case, the answer would be that this would depend on which kind of
optimality is judged to be most appropriate for the case in question.
Although the preceding comments were made in reference to the Binomial case, they are of a general nature, and were used for the sake of definiteness only.
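The algebra behind Exercise 12.11.2 is simple enough to tabulate: Uₙ − Vₙ = [x̄(α + β) − α]/(n + α + β), so √n(Uₙ − Vₙ) is of order 1/√n. The Python sketch below holds x̄ fixed as n grows, a simplification (in reality x̄ₙ varies with n but stays bounded), and displays this decay:

```python
import math

def scaled_gap(n, xbar, a, b):
    # sqrt(n) * (U_n - V_n) with U_n = xbar and V_n = (n*xbar + a) / (n + a + b)
    u = xbar
    v = (n * xbar + a) / (n + a + b)
    return math.sqrt(n) * (u - v)

a, b, xbar = 2.0, 3.0, 0.6            # hypothetical fixed prior parameters and sample mean
gaps = [abs(scaled_gap(n, xbar, a, b)) for n in (10 ** 2, 10 ** 4, 10 ** 6)]
assert gaps[0] > gaps[1] > gaps[2]    # the scaled gap shrinks as n grows
assert gaps[2] < 1e-2
```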
Exercises
12.11.1 In reference to Example 14, the estimator Vₙ given by (25) is the Bayes estimator of θ, corresponding to a prior Beta p.d.f. λ. Then show that √n(Vₙ − θ) → Z in distribution as n → ∞, where Z is an r.v. distributed as N(0, θ(1 − θ)).
12.11.2 In reference to Example 14, Uₙ = X̄ₙ is the UMVU (and also the ML) estimator of θ, whereas the estimator Vₙ is given by (25). Then show that √n(Uₙ − Vₙ) → 0 in Pθ-probability as n → ∞.
12.11.3 In reference to Example 14, Wₙ, given by (26), is the minimax estimator of θ. Then show that √n(Wₙ − θ) → W in distribution as n → ∞, where W is an r.v. distributed as N(½ − θ, θ(1 − θ)).
13.3 UMP Tests for Testing Certain Composite Hypotheses 327
Chapter 13
Testing Hypotheses
\[
H\ (H_0):\ \theta\in\omega \qquad\text{against the alternative}\qquad A:\ \theta\in\omega^c.
\]
Often hypotheses come up in the form of a claim that a new product, a new technique, etc., is more efficient than existing ones. In this context, H (or H₀) is a statement which nullifies this claim and is called a null hypothesis.
If ω contains only one point, that is, ω = {θ₀}, then H is called a simple hypothesis; otherwise it is called a composite hypothesis. Similarly for alternatives.
Once a hypothesis H is formulated, the problem is that of testing H on the basis of the observed values of the X's.
DEFINITION 2 A randomized (statistical) test (or test function) for testing H against the alternative A is a (measurable) function φ defined on ℝⁿ, taking values in [0, 1] and having the following interpretation: If (x₁, . . . , xₙ) is the observed value of (X₁, . . . , Xₙ) and φ(x₁, . . . , xₙ) = y, then a coin, whose probability of falling heads is y, is tossed and H is rejected or accepted according to whether heads or tails appear. A nonrandomized test is one whose values are 0 and 1 only; such a test has the form
328 13 Testing Hypotheses
\[
\phi(x_1,\dots,x_n) = \begin{cases}
1, & \text{if } (x_1,\dots,x_n)\in B\\
0, & \text{if } (x_1,\dots,x_n)\in B^c.
\end{cases}
\]
In this case, the (Borel) set B in ℝⁿ is called the rejection or critical region and Bᶜ is called the acceptance region.
In testing a hypothesis H, one may commit either one of the following two
kinds of errors: to reject H when actually H is true, that is, the (unknown)
parameter does lie in the subset specified by H; or to accept H when H is
actually false.
effective or if it has harmful side effects, the loss sustained by the company due
to an immediate obsolescence of the product, decline of the company's image,
etc., will be quite severe. On the other hand, failure to market a truly better
drug is an opportunity loss, but that may not be considered to be as serious as
the other loss. If a decision is to be made on the basis of a number of clinical
trials, the null hypothesis H should be that the cure rate of the new drug is no
more than 60% and A should be that this cure rate exceeds 60%.
We notice that for a nonrandomized test with critical region B, we have
\begin{align*}
\beta(\theta) &= P_\theta\big[(X_1,\dots,X_n)\in B\big]
= 1\cdot P_\theta\big[(X_1,\dots,X_n)\in B\big]
+ 0\cdot P_\theta\big[(X_1,\dots,X_n)\in B^c\big]\\
&= E_\theta\,\phi(X_1,\dots,X_n),
\end{align*}
and the same can be shown to be true for randomized tests (by an appropriate application of property (CE1) in Section 3 of Chapter 5). Thus
\[
\beta(\theta) = \beta_\phi(\theta) = E_\theta\,\phi(X_1,\dots,X_n), \qquad \theta\in\omega^c. \qquad (1)
\]
DEFINITION 4 A level-α test which maximizes the power among all tests of level α is said to be uniformly most powerful (UMP). Thus φ is a UMP, level-α test if (i) sup[β_φ(θ); θ ∈ ω] = α and (ii) β_φ(θ) ≥ β_φ*(θ), θ ∈ ωᶜ, for any other test φ* which satisfies (i).
If ωᶜ consists of a single point only, a UMP test is simply called most powerful (MP). In many important cases a UMP test does exist.
Exercise
13.1.1 In the following examples indicate which statements constitute a
simple and which a composite hypothesis:
i) X is an r.v. whose p.d.f. f is given by f(x) = 2e⁻²ˣI₍₀,∞₎(x);
ii) When tossing a coin, let X be the r.v. taking the value 1 if head appears and
0 if tail appears. Then the statement is: The coin is biased;
iii) X is an r.v. whose expectation is equal to 5.
be two given p.d.f.'s. We set f₀ = f(·; θ₀), f₁ = f(·; θ₁) and let X₁, . . . , Xₙ be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω = {θ₀, θ₁}. The problem is that of testing the hypothesis H : θ ∈ ω = {θ₀} against the alternative A : θ ∈ ωᶜ = {θ₁} at level α. In other words, we want to test the hypothesis that the underlying p.d.f. of the X's is f₀ against the alternative that it is f₁. In such a formulation, the p.d.f.'s f₀ and f₁ need not
\[
\phi(x_1,\dots,x_n) = \begin{cases}
1, & \text{if } f(x_1;\theta_1)\cdots f(x_n;\theta_1) > C\,f(x_1;\theta_0)\cdots f(x_n;\theta_0)\\
\gamma, & \text{if } f(x_1;\theta_1)\cdots f(x_n;\theta_1) = C\,f(x_1;\theta_0)\cdots f(x_n;\theta_0)\\
0, & \text{otherwise,}
\end{cases} \qquad (2)
\]
where the constants γ ∈ [0, 1] and C > 0 are determined by
\[
E_{\theta_0}\phi\big(X_1,\dots,X_n\big) = \alpha. \qquad (3)
\]
Then, for testing H against A at level α, the test defined by (2) and (3) is MP within the class of all tests whose level is ≤ α.
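For a discrete single observation, the construction of (2) and (3) can be carried out mechanically: order the sample points by the likelihood ratio f₁/f₀, reject outright while the accumulated size stays below α, and randomize on the boundary point. In the Python sketch below the two p.d.f.'s are made up, and the ratios are assumed distinct (ties would have to share a common γ):

```python
# hypothetical p.d.f.'s f0 (under H) and f1 (under A) on the points {0, 1, 2, 3}
f0 = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
f1 = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
alpha = 0.05

# order the sample points by the likelihood ratio Y = f1/f0, largest first
pts = sorted(f0, key=lambda x: f1[x] / f0[x], reverse=True)
size, phi = 0.0, {}
for x in pts:
    if size + f0[x] <= alpha:
        phi[x] = 1.0                           # reject outright (phi = 1)
        size += f0[x]
    else:
        phi[x] = (alpha - size) / f0[x]        # randomize on the boundary point (phi = gamma)
        break
for x in pts:
    phi.setdefault(x, 0.0)                     # accept everywhere else (phi = 0)

# (3): the test has exact size alpha under f0, and its power under f1 exceeds alpha
assert abs(sum(phi[x] * f0[x] for x in f0) - alpha) < 1e-12
assert sum(phi[x] * f1[x] for x in f1) >= alpha
```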
The proof is presented for the case that the Xs are of the continuous type,
since the discrete case is dealt with similarly by replacing integrals by summa-
tion signs.
PROOF For convenient writing, we set
(
z = x1 , . . . , xn , )
dz = dx1 dxn , (
Z = X1 , . . . , X n )
and f(z; ), f(Z; ) for f(x1; ) f(xn; ), f(X1; ) f(Xn; ), respectively.
Next, let T be the set of points z in ℝⁿ such that f₀(z) > 0, and let D^c = Z⁻¹(T^c).
Then

    P_{θ₀}(D^c) = P_{θ₀}(Z ∈ T^c) = ∫_{T^c} f₀(z) dz = 0,

and therefore in calculating P_{θ₀}-probabilities we may confine ourselves to the set D.
Setting Y = f₁(Z)/f₀(Z) on D, we then have by means of (2) that

    E_{θ₀} φ(Z) = P_{θ₀}[f₁(Z) > C f₀(Z)] + γ P_{θ₀}[f₁(Z) = C f₀(Z)]
                = P_{θ₀}{[f₁(Z)/f₀(Z) > C] ∩ D} + γ P_{θ₀}{[f₁(Z)/f₀(Z) = C] ∩ D}
                = P_{θ₀}(Y > C) + γ P_{θ₀}(Y = C),   (4)
13.2 Testing a Simple Hypothesis Against a Simple Alternative
Figure 13.1  [Graph of a typical function a, a(C) = P_{θ₀}(Y > C), showing the values a(C′), a(C″) and the level α.]
Now let G be the d.f. of Y under P_{θ₀}, and set a(C) = P_{θ₀}(Y > C) = 1 − G(C), so that
a is nonincreasing and right-continuous. Then

    P_{θ₀}(Y = C) = G(C) − G(C−) = [1 − a(C)] − [1 − a(C−)] = a(C−) − a(C),

and a(C) = 1 for C < 0, since P_{θ₀}(Y ≥ 0) = 1.
Figure 13.1 represents the graph of a typical function a. Now for any α (0
< α < 1) there exists C₀ (≥ 0) such that a(C₀) ≤ α ≤ a(C₀−). (See Fig. 13.1.) At
this point, there are two cases to consider. First, a(C₀) = a(C₀−); that is, C₀ is
a continuity point of the function a. Then α = a(C₀), and if in (2) C is replaced
by C₀ and γ = 0, the resulting test is of level α. In fact, in this case (4) becomes

    E_{θ₀} φ(Z) = P_{θ₀}(Y > C₀) = a(C₀) = α,
as was to be seen.
Next, we assume that C₀ is a discontinuity point of a. In this case, take
again C = C₀ in (2) and also set

    γ = [α − a(C₀)] / [a(C₀−) − a(C₀)]

(so that 0 ≤ γ ≤ 1). Again we assert that the resulting test is of level α. In the
present case, (4) becomes as follows:

    E_{θ₀} φ(Z) = P_{θ₀}(Y > C₀) + γ P_{θ₀}(Y = C₀)
                = a(C₀) + {[α − a(C₀)] / [a(C₀−) − a(C₀)]} [a(C₀−) − a(C₀)] = α.
Thus, in either case, there exists a test φ of the form (2) which is of level α. Now let φ* be
any other test of level ≤ α, and set

    B⁺ = {z ∈ ℝⁿ; φ(z) − φ*(z) > 0} = (φ − φ* > 0),
    B⁻ = {z ∈ ℝⁿ; φ(z) − φ*(z) < 0} = (φ − φ* < 0).

Then

    B⁺ = (φ > φ*) ⊆ (φ > 0) = (f₁ ≥ C f₀)   (5)
and
    B⁻ = (φ < φ*) ⊆ (φ < 1) = (f₁ ≤ C f₀).   (6)
Therefore, on account of (5) and (6), [φ(z) − φ*(z)][f₁(z) − C f₀(z)] ≥ 0 on both B⁺ and B⁻,
so that

    ∫ [φ(z) − φ*(z)][f₁(z) − C f₀(z)] dz ≥ 0,

which is equivalent to

    ∫ [φ(z) − φ*(z)] f₁(z) dz ≥ C ∫ [φ(z) − φ*(z)] f₀(z) dz.

But

    ∫ [φ(z) − φ*(z)] f₀(z) dz = E_{θ₀} φ(Z) − E_{θ₀} φ*(Z) = α − E_{θ₀} φ*(Z) ≥ 0,   (7)

since φ* is of level ≤ α. Hence ∫ [φ(z) − φ*(z)] f₁(z) dz ≥ 0; that is, E_{θ₁} φ(Z) ≥ E_{θ₁} φ*(Z),
and φ is MP, as was to be shown. ▪
REMARK 1  i) The determination of C and γ amounts to computing probabilities of the form
P_{θ₀}[Y ∈ (b₁, b₂]] = G(b₂) − G(b₁), for the d.f. G of Y under P_{θ₀}.
ii) The theorem shows that there is always a test of the structure (2) and (3)
which is MP. The converse is also true; namely, if φ is an MP level-α test,
then φ necessarily has the form (2), unless there is a test of size < α with
power 1.
This point will not be pursued further here.
The examples to be discussed below will illustrate how the theorem is
actually used in concrete cases. In the examples to follow, Ω = {θ₀, θ₁} and the
problem will be that of testing a simple hypothesis against a simple alternative
at level of significance α. It will then prove convenient to set

    R(z; θ₀, θ₁) = [f(x₁; θ₁) · · · f(xₙ; θ₁)] / [f(x₁; θ₀) · · · f(xₙ; θ₀)].

EXAMPLE 1  Let X₁, . . . , Xₙ be i.i.d. r.v.s from B(1, θ) and suppose θ₀ < θ₁. Then

    log R(z; θ₀, θ₁) = x log(θ₁/θ₀) + (n − x) log[(1 − θ₁)/(1 − θ₀)],

where x = ∑_{j=1}^n x_j, and therefore, by the fact that θ₀ < θ₁, R(z; θ₀, θ₁) > C is
equivalent to

    x > C₀,  where  C₀ = {log C − n log[(1 − θ₁)/(1 − θ₀)]} / log{θ₁(1 − θ₀)/[θ₀(1 − θ₁)]}.
Thus the MP test is given by

    φ(z) = 1  if ∑_{j=1}^n x_j > C₀,
         = γ  if ∑_{j=1}^n x_j = C₀,   (9)
         = 0  otherwise,

where C₀ and γ are determined by

    E_{θ₀} φ(Z) = P_{θ₀}(X > C₀) + γ P_{θ₀}(X = C₀) = α,   (10)

and X = ∑_{j=1}^n X_j is B(n, θᵢ), i = 0, 1. If θ₀ > θ₁, the inequality signs in (9) and (10)
are reversed.
For the sake of definiteness, let us take θ₀ = 0.50, θ₁ = 0.75, α = 0.05 and
n = 25. Then

    0.05 = P₀.₅(X > C₀) + γ P₀.₅(X = C₀) = 1 − P₀.₅(X ≤ C₀) + γ P₀.₅(X = C₀)

is equivalent to

    P₀.₅(X ≤ C₀) − γ P₀.₅(X = C₀) = 0.95.

For C₀ = 17, we have, by means of the Binomial tables, P₀.₅(X ≤ 17) = 0.9784
and P₀.₅(X = 17) = 0.0323. Thus γ is defined by 0.9784 − 0.0323γ = 0.95, whence
γ = 0.8792. Therefore the MP test in this case is given by (9) with C₀ = 17 and
γ = 0.8792. The power of the test is P₀.₇₅(X > 17) + 0.8792 P₀.₇₅(X = 17) = 0.8356.
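The determination of C₀ and γ above can be checked numerically. The sketch below (standard library only; the helper names are ours, not the book's) recomputes the cut-off point, the randomization constant and the power for θ₀ = 0.50, θ₁ = 0.75, α = 0.05, n = 25; exact computation reproduces the tabled values up to rounding.

```python
from math import comb

def pmf(k, n, p):
    # P(X = k) for X ~ B(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def sf(k, n, p):
    # P(X > k)
    return sum(pmf(j, n, p) for j in range(k + 1, n + 1))

n, alpha, p0, p1 = 25, 0.05, 0.50, 0.75

# Smallest C0 with P_{p0}(X > C0) <= alpha; then randomize on {X = C0} via (10)
C0 = next(k for k in range(n + 1) if sf(k, n, p0) <= alpha)
gamma = (alpha - sf(C0, n, p0)) / pmf(C0, n, p0)
power = sf(C0, n, p1) + gamma * pmf(C0, n, p1)
```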
EXAMPLE 2  Let X₁, . . . , Xₙ be i.i.d. r.v.s from P(θ) and suppose θ₀ < θ₁. Then

    log R(z; θ₀, θ₁) = x log(θ₁/θ₀) − n(θ₁ − θ₀),

where x = ∑_{j=1}^n x_j, and hence, by using the assumption that θ₀ < θ₁, one has that
R(z; θ₀, θ₁) > C is equivalent to x > C₀, where

    C₀ = log[C e^{n(θ₁ − θ₀)}] / log(θ₁/θ₀).
Thus the MP test is defined by

    φ(z) = 1  if ∑_{j=1}^n x_j > C₀,
         = γ  if ∑_{j=1}^n x_j = C₀,   (11)
         = 0  otherwise,

where C₀ and γ are determined by

    E_{θ₀} φ(Z) = P_{θ₀}(X > C₀) + γ P_{θ₀}(X = C₀) = α,   (12)

and X = ∑_{j=1}^n X_j is P(nθᵢ), i = 0, 1. If θ₀ > θ₁, the inequality signs in (11) and (12)
are reversed.
As an application, let us take θ₀ = 0.3, θ₁ = 0.4, α = 0.05 and n = 20. Then
(12) becomes

    P₀.₃(X ≤ C₀) − γ P₀.₃(X = C₀) = 0.95.

By means of the Poisson tables, one has that for C₀ = 10, P₀.₃(X ≤ 10) = 0.9574
and P₀.₃(X = 10) = 0.0413. Therefore γ is defined by 0.9574 − 0.0413γ = 0.95,
whence γ = 0.1791.
Thus the test is given by (11) with C₀ = 10 and γ = 0.1791. The power of the
test is

    P₀.₄(X > 10) + 0.1791 P₀.₄(X = 10) = 0.2013.
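A quick numerical check of Example 2, again with the standard library only (the function names are ours):

```python
from math import exp, factorial

def pmf(k, lam):
    # P(X = k) for X ~ P(lam)
    return exp(-lam) * lam**k / factorial(k)

def sf(k, lam):
    # P(X > k)
    return 1.0 - sum(pmf(j, lam) for j in range(k + 1))

n, alpha, th0, th1 = 20, 0.05, 0.3, 0.4
lam0, lam1 = n * th0, n * th1        # X = sum of the Xs is P(n*theta)

C0 = next(k for k in range(200) if sf(k, lam0) <= alpha)
gamma = (alpha - sf(C0, lam0)) / pmf(C0, lam0)
power = sf(C0, lam1) + gamma * pmf(C0, lam1)
```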
EXAMPLE 3  Let X₁, . . . , Xₙ be i.i.d. r.v.s from N(θ, 1) and suppose θ₀ < θ₁. Then

    log R(z; θ₀, θ₁) = (1/2) ∑_{j=1}^n [(x_j − θ₀)² − (x_j − θ₁)²]
                     = (θ₁ − θ₀) ∑_{j=1}^n x_j − (n/2)(θ₁² − θ₀²),

so that R(z; θ₀, θ₁) > C is equivalent to x̄ > C₀, where

    C₀ = log C / [n(θ₁ − θ₀)] + (θ₀ + θ₁)/2,

by using the fact that θ₀ < θ₁.
Thus the MP test is given by

    φ(z) = 1  if x̄ > C₀,   (13)
         = 0  otherwise,

where C₀ is determined by

    E_{θ₀} φ(Z) = P_{θ₀}(X̄ > C₀) = α,   (14)

and X̄ is N(θᵢ, 1/n), i = 0, 1. If θ₀ > θ₁, the inequality signs in (13) and (14) are
reversed.
Let, for example, θ₀ = −1, θ₁ = 1, α = 0.001 and n = 9. Then (14) gives

    P₋₁(X̄ > C₀) = P₋₁[3(X̄ + 1) > 3(C₀ + 1)] = P[N(0, 1) > 3(C₀ + 1)] = 0.001,

whence C₀ = 0.03. Therefore the MP test in this case is given by (13) with
C₀ = 0.03. The power of the test is

    P₁(X̄ > 0.03) = P₁[3(X̄ − 1) > −2.91] = P[N(0, 1) > −2.91] = 0.9982.
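The cut-off point and power in Example 3 follow directly from the standard normal quantile; a sketch using Python's statistics.NormalDist (the variable names are ours):

```python
from statistics import NormalDist

Z = NormalDist()                      # standard N(0, 1)
alpha, n, th0, th1 = 0.001, 9, -1.0, 1.0
sd = 1 / n**0.5                       # s.d. of the sample mean, here 1/3

# (14): P_{th0}(Xbar > C0) = alpha  =>  C0 = th0 + z_{1-alpha}/sqrt(n)
C0 = th0 + Z.inv_cdf(1 - alpha) * sd
# Power at th1 = 1
power = 1 - NormalDist(th1, sd).cdf(C0)
```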
EXAMPLE 4  Let X₁, . . . , Xₙ be i.i.d. r.v.s from N(0, θ) and suppose θ₀ < θ₁. Here

    log R(z; θ₀, θ₁) = [(θ₁ − θ₀)/(2θ₀θ₁)] x + (n/2) log(θ₀/θ₁),

where x = ∑_{j=1}^n x_j², so that, by means of θ₀ < θ₁, one has that R(z; θ₀, θ₁) > C is
equivalent to x > C₀, where

    C₀ = [2θ₀θ₁/(θ₁ − θ₀)] log[C (θ₁/θ₀)^{n/2}].

Thus the MP test in the present case is given by

    φ(z) = 1  if ∑_{j=1}^n x_j² > C₀,   (15)
         = 0  otherwise,

where C₀ is determined by

    E_{θ₀} φ(Z) = P_{θ₀}(∑_{j=1}^n X_j² > C₀) = α,   (16)

and X/θᵢ is distributed as χ²ₙ, i = 0, 1, where X = ∑_{j=1}^n X_j². If θ₀ > θ₁, the inequal-
ity signs in (15) and (16) are reversed. For an example, let θ₀ = 4, θ₁ = 16, α =
0.01 and n = 20. Then (16) becomes

    P₄(X > C₀) = P₄(X/4 > C₀/4) = P(χ²₂₀ > C₀/4) = 0.01,

whence C₀ = 150.264. Thus the test is given by (15) with C₀ = 150.264. The
power of the test is

    P₁₆(X > 150.264) = P₁₆(X/16 > 150.264/16) = P(χ²₂₀ > 9.3915) = 0.977.
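Since n = 20 is even here, the chi-square tail probability in Example 4 has a closed form (the survival function of χ²_{2k} equals a Poisson(x/2) left tail), which permits a table-free check (the function name is ours):

```python
from math import exp

def chi2_sf(x, df):
    # P(chi2_df > x) for even df, using P(chi2_{2k} > x) = P(Poisson(x/2) <= k-1)
    k, term, total = df // 2, 1.0, 0.0
    for j in range(k):
        total += term
        term *= (x / 2) / (j + 1)
    return exp(-x / 2) * total

n, th0, th1, C0 = 20, 4.0, 16.0, 150.264
level = chi2_sf(C0 / th0, n)          # should be close to alpha = 0.01
power = chi2_sf(C0 / th1, n)          # P(chi2_20 > 9.3915)
```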
Exercises
13.2.1 If X1, . . . , X16 are independent r.v.s, construct the MP test of the
hypothesis H that the common distribution of the Xs is N(0, 9) against the
alternative A that it is N(1, 9) at level of significance = 0.05. Also find
the power of the test.
13.2.2  Let X₁, . . . , Xₙ be independent r.v.s distributed as N(μ, σ²), where μ
is unknown and σ is known. Show that the sample size n can be determined so
that when testing the hypothesis H : μ = μ₀ against the alternative A : μ = μ₁, one
has predetermined values for α and β. What is the numerical value of n if
α = 0.05, β = 0.9 and σ = 1?
13.3 UMP Tests for Testing Certain Composite Hypotheses
DEFINITION 5  The family {g(·; θ); θ ∈ Ω} is said to have the monotone likelihood ratio (MLR)
property in V if the set of zs for which g(z; θ) > 0 is independent of θ, and there
exists a (measurable) function V defined on ℝⁿ into ℝ such that whenever θ, θ′ ∈ Ω
with θ < θ′, then: (i) g(·; θ) and g(·; θ′) are distinct and (ii) g(z; θ′)/g(z; θ)
is a monotone function of V(z).
Note that the likelihood ratio (LR) in (ii) is well defined except perhaps on
a set N of zs such that P_θ(Z ∈ N) = 0 for all θ ∈ Ω. In what follows, we will
always work outside such a set.
An important family of p.d.f.s having the MLR property is a one-
parameter exponential family.
PROPOSITION 1  Consider the exponential family

    f(x; θ) = C(θ) e^{Q(θ)T(x)} h(x),   (17)

where C(θ) > 0 for all θ ∈ Ω ⊆ ℝ and the set of positivity of h is independent
of θ. Suppose that Q is increasing. Then the family {g(·; θ); θ ∈ Ω} has the MLR
property in V, where V(z) = ∑_{j=1}^n T(x_j) and g(·; θ) is the joint p.d.f. determined
by (17). If Q is decreasing, the family has the MLR property in V′ = −V.
PROOF  We have

    g(z; θ) = C₀(θ) e^{Q(θ)V(z)} h*(z),

where C₀(θ) = Cⁿ(θ), V(z) = ∑_{j=1}^n T(x_j) and h*(z) = h(x₁) · · · h(xₙ). Therefore on
the set of zs for which h*(z) > 0 (which set has P_θ-probability 1 for all θ ∈ Ω),
one has

    g(z; θ′)/g(z; θ) = [C₀(θ′)/C₀(θ)] e^{[Q(θ′) − Q(θ)]V(z)}.

Now for θ < θ′, the assumption that Q is increasing implies that g(z; θ′)/g(z; θ)
is an increasing function of V(z). This completes the proof of the first assertion.
The proof of the second assertion follows from the fact that
[Q(θ′) − Q(θ)]V(z) = [Q(θ) − Q(θ′)][−V(z)], so that when Q is decreasing the LR is an
increasing function of V′(z) = −V(z). ▪
EXAMPLE 5  Consider the Logistic p.d.f. with parameter θ ∈ Ω = ℝ:

    f(x; θ) = e^{x−θ} / (1 + e^{x−θ})²,  x ∈ ℝ.

Then, for θ < θ′ and x < x′,

    f(x; θ′)/f(x; θ) < f(x′; θ′)/f(x′; θ)

if and only if

    (1 + e^{x−θ})/(1 + e^{x−θ′}) < (1 + e^{x′−θ})/(1 + e^{x′−θ′}).

However, this is equivalent to e^x(e^{−θ} − e^{−θ′}) < e^{x′}(e^{−θ} − e^{−θ′}). Therefore, if θ < θ′,
the last inequality is equivalent to e^x < e^{x′}, or x < x′. This shows that the
family {f(·; θ); θ ∈ ℝ} has the MLR property in V(x) = x.
For families of p.d.f.s having the MLR property, we have the following
important theorem.
THEOREM 2  Let X₁, . . . , Xₙ be i.i.d. r.v.s with p.d.f. f(x; θ), θ ∈ Ω ⊆ ℝ, and let the family
{g(·; θ); θ ∈ Ω} have the MLR property in V, where g(·; θ) is defined in (17). Let θ₀ ∈ Ω
and set ω = {θ ∈ Ω; θ ≤ θ₀}. Then for testing the (composite) hypothesis
H : θ ∈ ω against the (composite) alternative A : θ ∈ ω^c at level of significance α,
there exists a test φ which is UMP within the class of all tests of level ≤ α. In the
case that the LR is increasing in V(z), the test is given by

    φ(z) = 1  if V(z) > C,
         = γ  if V(z) = C,   (19)
         = 0  otherwise,

where C and γ are determined by

    E_{θ₀} φ(Z) = P_{θ₀}[V(Z) > C] + γ P_{θ₀}[V(Z) = C] = α.   (19′)

If the LR is decreasing in V(z), the test is taken from (19) and (19′) with
the inequality signs reversed.
The proof of the theorem is a consequence of the following two lemmas.
LEMMA 1  Under the assumptions made in Theorem 2, the test φ defined by (19) and (19′)
is MP (at level α) for testing the (simple) hypothesis H₀ : θ = θ₀ against the
(composite) alternative A : θ ∈ ω^c among all tests of level ≤ α.
PROOF  Let θ′ be an arbitrary but fixed point in ω^c and consider the problem
of testing the above hypothesis H₀ against the (simple) alternative A′ : θ = θ′ at
level α. Then, by Theorem 1, the MP test φ′ is given by

    φ′(z) = 1   if g(z; θ′) > C′ g(z; θ₀),
          = γ′  if g(z; θ′) = C′ g(z; θ₀),
          = 0   otherwise,

where C′ and γ′ are determined by

    E_{θ₀} φ′(Z) = α.

Let g(z; θ′)/g(z; θ₀) = ψ[V(z)]. Then in the case under consideration ψ is
defined on ℝ into itself and is increasing. Therefore

    ψ[V(z)] > C′  if and only if  V(z) > ψ⁻¹(C′) = C₀,
    ψ[V(z)] = C′  if and only if  V(z) = C₀.   (20)

In addition,

    E_{θ₀} φ′(Z) = P_{θ₀}{ψ[V(Z)] > C′} + γ′ P_{θ₀}{ψ[V(Z)] = C′}
                = P_{θ₀}[V(Z) > C₀] + γ′ P_{θ₀}[V(Z) = C₀].

Therefore the test φ′ defined above becomes as follows:

    φ′(z) = 1   if V(z) > C₀,
          = γ′  if V(z) = C₀,   (21)
          = 0   otherwise,

and

    E_{θ₀} φ′(Z) = P_{θ₀}[V(Z) > C₀] + γ′ P_{θ₀}[V(Z) = C₀] = α,   (21′)

so that C₀ = C and γ′ = γ by means of (19) and (19′).
It follows from (21) and (21′) that the test φ′ is independent of θ′ ∈ ω^c. In
other words, we have that φ′ = φ, and the test φ given by (19) and (19′)
is UMP for testing H₀ : θ = θ₀ against A : θ ∈ ω^c (at level α). ▪
LEMMA 2  Under the assumptions made in Theorem 2, and for the test function φ defined
by (19) and (19′), we have E_θ φ(Z) ≤ α for all θ ∈ ω.
PROOF  Let θ′ be an arbitrary but fixed point in ω and consider the problem
of testing the (simple) hypothesis H′ : θ = θ′ against the (simple) alternative
A₀ (= H₀) : θ = θ₀ at level α(θ′) = E_{θ′} φ(Z). Once again, by Theorem 1, the MP test
φ′ is given by

    φ′(z) = 1   if g(z; θ₀) > C′ g(z; θ′),
          = γ′  if g(z; θ₀) = C′ g(z; θ′),
          = 0   otherwise,

where C′ and γ′ are determined by

    E_{θ′} φ′(Z) = P_{θ′}{ψ[V(Z)] > C′} + γ′ P_{θ′}{ψ[V(Z)] = C′} = α(θ′).

As in the proof of Lemma 1, this test becomes

    φ′(z) = 1   if V(z) > C₀,
          = γ′  if V(z) = C₀,   (22)
          = 0   otherwise,

and

    E_{θ′} φ′(Z) = P_{θ′}[V(Z) > C₀] + γ′ P_{θ′}[V(Z) = C₀] = α(θ′),   (22′)

where C₀ = ψ⁻¹(C′).
Replacing θ₀ by θ′ in the expression on the left-hand side of (19′) and
comparing the resulting expression with (22′), one has that C₀ = C and γ′ = γ.
Therefore the tests φ and φ′ are identical. Furthermore, by the corollary to
Theorem 1, one has that α(θ′) ≤ α, since α is the power of the test φ′. ▪
PROOF OF THEOREM 2  Define the classes of tests C and C₀ as follows:

    C = {all level-α tests for testing H : θ ∈ ω},
    C₀ = {all level-α tests for testing H₀ : θ = θ₀}.

Then, clearly, C ⊆ C₀. Next, the test φ defined by (19) and (19′) belongs in C,
by Lemma 2, and is MP among all tests in C₀, by Lemma 1. Hence it is MP
among the tests in C. The desired result follows. ▪
REMARK 2  For the symmetric case where ω = {θ ∈ Ω; θ ≥ θ₀}, under the
assumptions of Theorem 2, a UMP test also exists for testing H : θ ∈ ω against
A : θ ∈ ω^c. The test is given by (19) and (19′) if the LR is decreasing in V(z) and
by those relationships with the inequality signs reversed if the LR is increasing
in V(z). The relevant proof is entirely analogous to that of Theorem 2.
COROLLARY  Let X₁, . . . , Xₙ be i.i.d. r.v.s with p.d.f. f(·; θ) given by

    f(x; θ) = C(θ) e^{Q(θ)T(x)} h(x),

where Q is strictly monotone. Then for testing H : θ ∈ ω = {θ ∈ Ω; θ ≤ θ₀} against
A : θ ∈ ω^c at level of significance α, there is a test φ which is UMP within the
class of all tests of level ≤ α. This test is given by (19) and (19′) if Q is increasing
and by (19) and (19′) with reversed inequality signs if Q is decreasing.
Also for testing H : θ ∈ ω = {θ ∈ Ω; θ ≥ θ₀} against A : θ ∈ ω^c at level α, there
is a test which is UMP within the class of all tests of level ≤ α. This test is given
by (19) and (19′) if Q is decreasing and by those relationships with reversed
inequality signs if Q is increasing.
In all tests, V(z) = ∑_{j=1}^n T(x_j).
PROOF  It is immediate on account of Proposition 1 and Remark 2. ▪
It can further be shown that the function β(θ) = E_θ φ(Z), θ ∈ Ω, for the
problem discussed in Theorem 2 and also for the symmetric situation mentioned
in Remark 2, is increasing for those θs for which it is less than 1 (see Figs. 13.2
and 13.3, respectively).

Figure 13.2  H : θ ≤ θ₀, A : θ > θ₀.   Figure 13.3  H : θ ≥ θ₀, A : θ < θ₀.
Another problem of practical importance is that of testing
    H : θ ∈ ω = {θ ∈ Ω; θ ≤ θ₁ or θ ≥ θ₂}

against A : θ ∈ ω^c, where θ₁, θ₂ ∈ Ω and θ₁ < θ₂. For instance, θ may represent a
dose of a certain medicine and θ₁, θ₂ are the limits within which θ is allowed to
vary. If θ ≤ θ₁ the dose is rendered harmless but also useless, whereas if θ ≥ θ₂
the dose becomes harmful. One may then hypothesize that the dose in ques-
tion is either useless or harmful and go about testing the hypothesis.
If the underlying distribution of the relevant measurements is assumed to
be of a certain exponential form, then a UMP test for the testing problem
above does exist. This result is stated as a theorem below but its proof is not
given, since this would rather exceed the scope of this book.
THEOREM 3  Let X₁, . . . , Xₙ be i.i.d. r.v.s with p.d.f. f(·; θ) given by

    f(x; θ) = C(θ) e^{Q(θ)T(x)} h(x),   (23)

and suppose that Q is increasing. Then, for testing H : θ ∈ ω = {θ ∈ Ω; θ ≤ θ₁ or
θ ≥ θ₂} against A : θ ∈ ω^c at level α, there exists a UMP test φ given by

    φ(z) = 1   if C₁ < V(z) < C₂,
         = γᵢ  if V(z) = Cᵢ (i = 1, 2)  (C₁ < C₂),   (24)
         = 0   otherwise,

where C₁, C₂, γ₁, γ₂ are determined by

    E_{θᵢ} φ(Z) = P_{θᵢ}[C₁ < V(Z) < C₂] + γ₁ P_{θᵢ}[V(Z) = C₁]
                  + γ₂ P_{θᵢ}[V(Z) = C₂] = α,  i = 1, 2,   (25)

and V(z) = ∑_{j=1}^n T(x_j).
Figure 13.4  H : θ ≤ θ₁ or θ ≥ θ₂, A : θ₁ < θ < θ₂.
If Q is decreasing, the test is given again by (24) and (25) with C1 < V (z) < C2
replaced by V(z) < C1 or V(z) > C2.
It can also be shown that (in nondegenerate cases) the function β(θ) =
E_θ φ(Z), for the problem discussed in Theorem 3, increases for θ ≤ θ₀ and
decreases for θ ≥ θ₀, for some θ₁ < θ₀ < θ₂ (see also Fig. 13.4).
Theorems 2 and 3 are illustrated by a number of examples below. In order
to avoid trivial repetitions, we mention once and for all that the hypotheses to
be tested are H : θ ∈ ω = {θ ∈ Ω; θ ≤ θ₀} against A : θ ∈ ω^c and H′ : θ ∈ ω′ = {θ ∈ Ω;
θ ≤ θ₁ or θ ≥ θ₂} against A′ : θ ∈ ω′^c; θ₀, θ₁, θ₂ ∈ Ω and θ₁ < θ₂. The level of
significance is α (0 < α < 1).
EXAMPLE 6  Let X₁, . . . , Xₙ be i.i.d. r.v.s from B(1, θ), Ω = (0, 1). Here

    V(z) = ∑_{j=1}^n x_j  and  Q(θ) = log[θ/(1 − θ)]

is increasing, since θ/(1 − θ) is so. Therefore, on account of the corollary to
Theorem 2, the UMP test for testing H is given by

    φ(z) = 1  if ∑_{j=1}^n x_j > C,
         = γ  if ∑_{j=1}^n x_j = C,   (26)
         = 0  otherwise,

where C and γ are determined by

    E_{θ₀} φ(Z) = P_{θ₀}(X > C) + γ P_{θ₀}(X = C) = α,   (27)

and X = ∑_{j=1}^n X_j is B(n, θ).
For a numerical application, let θ₀ = 0.5, α = 0.01 and n = 25. Then one
has

    P₀.₅(X > C) + γ P₀.₅(X = C) = 0.01.

The Binomial tables provide the values C = 18 and γ = 27/143. The power of the
test at θ = 0.75 is

    β(0.75) = P₀.₇₅(X > 18) + (27/143) P₀.₇₅(X = 18) = 0.5923.
By virtue of Theorem 3, for testing H′ the UMP test is given by

    φ(z) = 1   if C₁ < ∑_{j=1}^n x_j < C₂,
         = γᵢ  if ∑_{j=1}^n x_j = Cᵢ (i = 1, 2),
         = 0   otherwise,

with C₁, C₂ and γ₁, γ₂ defined by

    E_{θᵢ} φ(Z) = P_{θᵢ}(C₁ < X < C₂) + γ₁ P_{θᵢ}(X = C₁) + γ₂ P_{θᵢ}(X = C₂) = α,  i = 1, 2.

Again for a numerical application, take θ₁ = 0.25, θ₂ = 0.75, α = 0.05 and
n = 25. One has then

    P₀.₂₅(C₁ < X < C₂) + γ₁ P₀.₂₅(X = C₁) + γ₂ P₀.₂₅(X = C₂) = 0.05,
    P₀.₇₅(C₁ < X < C₂) + γ₁ P₀.₇₅(X = C₁) + γ₂ P₀.₇₅(X = C₂) = 0.05.

By means of the Binomial tables (with C₁ = 10 and C₂ = 15), these equations reduce to

    416γ₁ + 2γ₂ = 205,
    2γ₁ + 416γ₂ = 205,

from which we obtain

    γ₁ = γ₂ = 205/418 = 0.4904.

The power of the test at θ = 0.5 is

    β(0.5) = P₀.₅(10 < X < 15) + (205/418)[P₀.₅(X = 10) + P₀.₅(X = 15)] = 0.6711.
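The pair γ₁, γ₂ solves a 2×2 linear system. The sketch below sets up and solves that system directly; the cut-off values C₁ = 10, C₂ = 15 are those appearing in the power computation above, and the helper names are ours (standard library only):

```python
from math import comb

def pmf(k, n, p):
    # P(X = k) for X ~ B(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, alpha, t1, t2 = 25, 0.05, 0.25, 0.75
C1, C2 = 10, 15                      # cut-offs from the text's power formula

def between(p):
    # P_p(C1 < X < C2)
    return sum(pmf(k, n, p) for k in range(C1 + 1, C2))

# E_{ti} phi(Z) = between(ti) + g1*pmf(C1) + g2*pmf(C2) = alpha, i = 1, 2
a11, a12, b1 = pmf(C1, n, t1), pmf(C2, n, t1), alpha - between(t1)
a21, a22, b2 = pmf(C1, n, t2), pmf(C2, n, t2), alpha - between(t2)
det = a11 * a22 - a12 * a21
g1 = (b1 * a22 - b2 * a12) / det
g2 = (a11 * b2 - a21 * b1) / det

power = between(0.5) + g1 * pmf(C1, n, 0.5) + g2 * pmf(C2, n, 0.5)
```

By the symmetry of B(25, θ) under θ ↔ 1 − θ, the system forces γ₁ = γ₂, in agreement with the text.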
EXAMPLE 7  Let X₁, . . . , Xₙ be i.i.d. r.v.s from P(θ), Ω = (0, ∞). Here V(z) = ∑_{j=1}^n x_j and
Q(θ) = log θ is increasing. Therefore the UMP test for testing H is again given
by (26) and (27), where now X is P(nθ).
For a numerical example, take θ₀ = 0.5, α = 0.05 and n = 10. Then, by means
of the Poisson tables, we find C = 9 and

    γ = 182/363 = 0.5014.
EXAMPLE 8  Let X₁, . . . , Xₙ be i.i.d. r.v.s from N(θ, σ²) with σ known, Ω = ℝ. Here

    V(z) = ∑_{j=1}^n x_j  and  Q(θ) = θ/σ²

is increasing. Therefore the UMP test for testing H is given by

    φ(z) = 1 if x̄ > C, = 0 otherwise,

where C is determined by

    E_{θ₀} φ(Z) = P_{θ₀}(X̄ > C) = α,

and X̄ is N(θ, σ²/n). (See also Figs. 13.5 and 13.6.)
The power of the test, as is easily seen, is given by

    β(θ) = 1 − Φ[√n(C − θ)/σ].

For instance, for σ = 2 and θ₀ = 20, α = 0.05 and n = 25, one has C = 20.66. For
θ = 21, the power of the test is

    β(21) = 0.8023.
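The cut-off point and power function of Example 8 can be sketched as follows, using statistics.NormalDist (the names are ours):

```python
from statistics import NormalDist

Z = NormalDist()
sigma, th0, alpha, n = 2.0, 20.0, 0.05, 25

# P_{th0}(Xbar > C) = alpha  =>  C = th0 + z_{1-alpha} * sigma / sqrt(n)
C = th0 + Z.inv_cdf(1 - alpha) * sigma / n**0.5

def beta(th):
    # beta(theta) = 1 - Phi(sqrt(n)(C - theta)/sigma)
    return 1 - Z.cdf(n**0.5 * (C - th) / sigma)
```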
On the other hand, for testing H′ the UMP test is given by

    φ(z) = 1 if C₁ < x̄ < C₂, = 0 otherwise,

where C₁, C₂ are determined by

    E_{θᵢ} φ(Z) = P_{θᵢ}(C₁ < X̄ < C₂) = α,  i = 1, 2.

(See also Fig. 13.7.)
The power of the test is given by

    β(θ) = Φ[√n(C₂ − θ)/σ] − Φ[√n(C₁ − θ)/σ].

Figure 13.5  H : θ ≤ θ₀, A : θ > θ₀.   Figure 13.6  H : θ ≥ θ₀, A : θ < θ₀.
[Each figure shows the N(θ₀, σ²/n) density of X̄ with the corresponding cut-off point C.]
Figure 13.7  H : θ ≤ θ₁ or θ ≥ θ₂, A : θ₁ < θ < θ₂.
EXAMPLE 9  Let X₁, . . . , Xₙ be i.i.d. r.v.s from N(μ, θ) with μ known, Ω = (0, ∞). Here
V(z) = ∑_{j=1}^n (x_j − μ)² and Q(θ) = −1/(2θ) is increasing, so that the UMP test for
testing H is given by

    φ(z) = 1 if ∑_{j=1}^n (x_j − μ)² > C, = 0 otherwise,

where C is determined by E_{θ₀} φ(Z) = P_{θ₀}[∑_{j=1}^n (X_j − μ)² > C] = α. The power of
the test is

    β(θ) = 1 − P(χ²ₙ < C/θ)   (independent of μ ∈ ℝ).

(See also Figs. 13.8 and 13.9; χ²ₙ stands for an r.v. distributed as χ²ₙ.)
For a numerical example, take θ₀ = 4, α = 0.05 and n = 25. Then one has
C = 150.608, and for θ = 12, the power of the test is β(12) = 0.980.
On the other hand, for testing H′ the UMP test is given by

    φ(z) = 1 if C₁ < ∑_{j=1}^n (x_j − μ)² < C₂, = 0 otherwise,

Figure 13.8  H : θ ≤ θ₀, A : θ > θ₀.   Figure 13.9  H : θ ≥ θ₀, A : θ < θ₀.
where C₁, C₂ are determined by

    E_{θᵢ} φ(Z) = P_{θᵢ}[C₁ < ∑_{j=1}^n (X_j − μ)² < C₂] = α,  i = 1, 2.

For instance, with n = 25, θ₁ = 1, θ₂ = 3 and α = 0.01, these conditions become

    P(χ²₂₅ < C₂) − P(χ²₂₅ < C₁) = 0.01,
    P(χ²₂₅ < C₂/3) − P(χ²₂₅ < C₁/3) = 0.01,

and C₁, C₂ are determined from the Chi-square tables (by trial and error).
Exercises
13.3.1  Let X₁, . . . , Xₙ be i.i.d. r.v.s with p.d.f. f given below. In each case,
show that the joint p.d.f. of the Xs has the MLR property in V = V(x₁, . . . , xₙ)
and identify V.

 i) f(x; θ) = [θ^γ / Γ(γ)] x^{γ−1} e^{−θx} I₍₀,∞₎(x),  θ ∈ Ω = (0, ∞),  γ = known (> 0);

ii) f(x; θ) = ((r + x − 1) choose x) θ^r (1 − θ)^x I_A(x),  A = {0, 1, . . .},  θ ∈ Ω = (0, 1).
13.3.2  Refer to Example 8 and show that, for testing the hypotheses H and
H′ mentioned there, the power of the respective tests is given by

    β(θ) = 1 − Φ[√n(C − θ)/σ]

and

    β′(θ) = Φ[√n(C₂ − θ)/σ] − Φ[√n(C₁ − θ)/σ],

as asserted.
13.3.3  The length of life X of a 50-watt light bulb of a certain brand may be
assumed to be a normally distributed r.v. with unknown mean μ and known
s.d. σ = 150 hours. Let X₁, . . . , X₂₅ be independent r.v.s distributed as X and
suppose that x̄ = 1,730 hours. Test the hypothesis H : μ = 1,800 against the
alternative A : μ < 1,800 at level of significance α = 0.01.
Referring to Example 9, show that the power of the tests for the hypotheses H and H′
there is given by

    β(θ) = 1 − P(χ²ₙ < C/θ)  and  β′(θ) = P(χ²ₙ < C₂/θ) − P(χ²ₙ < C₁/θ),

as asserted.
13.3.6  Let X₁, . . . , X₂₅ be independent r.v.s distributed as N(0, σ²). Test the
hypothesis H : σ ≤ 2 against the alternative A : σ > 2 at level of significance α =
0.05. What does the relevant test become for ∑²⁵_{j=1} x_j² = 120, where x_j is the
observed value of X_j, j = 1, . . . , 25?
13.3.11 Let X be the number of times that an electric light switch can be
turned on and off until failure occurs. Then X may be considered to be an r.v.
distributed as Negative Binomial with r = 1 and unknown p. Let X1, . . . , X15 be
independent r.v.s distributed as X and suppose that x̄ = 15,150. Test the
hypothesis H : p = 10⁻⁴ against the alternative A : p > 10⁻⁴ at level of significance
α = 0.05.
13.3.12  Let X₁, . . . , Xₙ be independent r.v.s with p.d.f. f given by

    f(x; θ) = (1/θ) e^{−x/θ} I₍₀,∞₎(x),  θ ∈ Ω = (0, ∞).

 i) Derive the UMP test for testing the hypothesis H : θ ≥ θ₀ against the alter-
native A : θ < θ₀ at level of significance α;
ii) Determine the minimum sample size n required to obtain power at least
0.95 against the alternative θ₁ = 500 when θ₀ = 1,000 and α = 0.05.
THEOREM 4  Let X₁, . . . , Xₙ be i.i.d. r.v.s with p.d.f. of the exponential form

    f(x; θ) = C(θ) e^{θT(x)} h(x),  θ ∈ Ω ⊆ ℝ.   (28)

Then, for testing H : θ₁ ≤ θ ≤ θ₂ against A : θ < θ₁ or θ > θ₂, and also for testing
H₀ : θ = θ₀ against A₀ : θ ≠ θ₀, there exists a UMPU test given by

    φ(z) = 1   if V(z) < C₁ or V(z) > C₂,
         = γᵢ  if V(z) = Cᵢ (i = 1, 2)  (C₁ < C₂),
         = 0   otherwise,

where C₁, C₂, γ₁, γ₂ are determined by

    E_{θᵢ} φ(Z) = α,  i = 1, 2,  for H,

and

    E_{θ₀} φ(Z) = α,  E_{θ₀}[V(Z)φ(Z)] = α E_{θ₀} V(Z),  for H₀.

(Recall that z = (x₁, . . . , xₙ), Z = (X₁, . . . , Xₙ) and V(z) = ∑_{j=1}^n T(x_j).)
Furthermore, it can be shown that the function β(θ) = E_θ φ(Z), θ ∈ Ω
(except for degenerate cases), is decreasing for θ ≤ θ* and increasing for θ ≥ θ*,
for some θ₁ < θ* < θ₂ (see also Fig. 13.10).
REMARK 4 We would expect that cases like Binomial, Poisson and Normal
would fall under Theorem 4, while they seemingly do not. However, a simple
reparametrization of the families brings them under the form (28). In fact, by
Examples and Exercises of Chapter 11 it can be seen that all these families are
of the exponential form
Figure 13.10  H : θ₁ ≤ θ ≤ θ₂, A : θ < θ₁ or θ > θ₂.
13.4 UMPU Tests for Testing Certain Composite Hypotheses
    f(x; θ) = C(θ) e^{Q(θ)T(x)} h(x).

 i) For the Binomial case, Q(θ) = log[θ/(1 − θ)]. Then by setting log[θ/(1 − θ)]
= θ*, the family is brought under the form (28). From this transformation,
we get θ = e^{θ*}/(1 + e^{θ*}), and the hypotheses θ₁ ≤ θ ≤ θ₂, θ = θ₀ become,
equivalently, θ₁* ≤ θ* ≤ θ₂*, θ* = θ₀*, where

    θᵢ* = log[θᵢ/(1 − θᵢ)],  i = 0, 1, 2.

ii) For the Poisson case, Q(θ) = log θ and the transformation log θ = θ* brings
the family under the form (28). The transformation implies θ = e^{θ*} and the
hypotheses θ₁ ≤ θ ≤ θ₂, θ = θ₀ become, equivalently, θ₁* ≤ θ* ≤ θ₂*, θ* = θ₀*, with
θᵢ* = log θᵢ, i = 0, 1, 2.
iii) For the Normal case with σ known and μ = θ, Q(θ) = θ/σ², and the factor
1/σ² may be absorbed into T(x).
iv) For the Normal case with μ known and σ² = θ, Q(θ) = −1/(2θ), and the
transformation −1/(2θ) = θ* brings the family under the form (28) again.
Since θ = −1/(2θ*), the hypotheses θ₁ ≤ θ ≤ θ₂ and θ = θ₀ become, equiva-
lently, θ₁* ≤ θ* ≤ θ₂* and θ* = θ₀*, where θᵢ* = −1/(2θᵢ), i = 0, 1, 2.
As an application of Theorem 4 and for later reference, we consider the
following example. The level of significance will be α.
EXAMPLE 10  Suppose X₁, . . . , Xₙ are i.i.d. r.v.s from N(μ, σ²). Let σ be known and set
μ = θ. Suppose that we are interested in testing the hypothesis H : θ = θ₀ against
the alternative A : θ ≠ θ₀. In the present case,

    T(x) = x/σ²,

so that

    V(z) = ∑_{j=1}^n T(x_j) = (1/σ²) ∑_{j=1}^n x_j = (n/σ²) x̄.

By Theorem 4, the UMPU test φ rejects when V(z) < C₁ or V(z) > C₂
(here γ₁ = γ₂ = 0, the distribution of V(Z) being continuous), where C₁, C₂ are
determined by

    E_{θ₀} φ(Z) = α,  E_{θ₀}[V(Z)φ(Z)] = α E_{θ₀} V(Z).

Now φ can be expressed equivalently as follows:
    φ(z) = 1  if √n(x̄ − θ₀)/σ < C₁′ or √n(x̄ − θ₀)/σ > C₂′,
         = 0  otherwise,

where

    Cᵢ′ = σCᵢ/√n − √n θ₀/σ,  i = 1, 2.

On the other hand, under H, √n(X̄ − θ₀)/σ is N(0, 1). Therefore, because of
symmetry, −C₁′ = C₂′ = C, say (C > 0). Also

    √n(x̄ − θ₀)/σ < −C  or  √n(x̄ − θ₀)/σ > C

is equivalent to

    [√n(x̄ − θ₀)/σ]² > C²,

and, of course, [√n(X̄ − θ₀)/σ]² is χ²₁ under H. By summarizing then, we have

    φ(z) = 1  if [√n(x̄ − θ₀)/σ]² > C̄,
         = 0  otherwise,

where C̄ (= C²) is determined by

    P(χ²₁ > C̄) = α.
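Since χ²₁ is the square of a standard normal, the constant here is just the square of the two-sided normal quantile. A minimal sketch (the names are ours):

```python
from statistics import NormalDist

Z = NormalDist()
alpha = 0.05
# P(chi2_1 > Cbar) = alpha  <=>  Cbar = z_{1-alpha/2}**2
Cbar = Z.inv_cdf(1 - alpha / 2) ** 2

def reject(xbar, th0, sigma, n):
    # the test of H: theta = th0 in its equivalent chi-square form
    return (n**0.5 * (xbar - th0) / sigma) ** 2 > Cbar
```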
In many situations of practical importance, the underlying p.d.f. involves
a real-valued parameter θ in which we are exclusively interested, and in
addition some other real-valued parameters θ₁, . . . , θₖ in which we have no
interest. These latter parameters are known as nuisance parameters. More
explicitly, the p.d.f. is of the following exponential form:

    f(x; θ, θ₁, . . . , θₖ) = C(θ, θ₁, . . . , θₖ) exp[θT(x) + θ₁T₁(x) + · · · + θₖTₖ(x)] h(x).   (29)
    H₁ : θ ∈ ω₁ = {θ; θ ≤ θ₀},
    H₁′ : θ ∈ ω₁′ = {θ; θ ≥ θ₀},
    H₂ : θ ∈ ω₂ = {θ; θ ≤ θ₁ or θ ≥ θ₂},
    H₃ : θ ∈ ω₃ = {θ; θ₁ ≤ θ ≤ θ₂},
    H₄ : θ ∈ ω₄ = {θ₀},   (30)

to be tested against the respective alternatives Aᵢ (Aᵢ′) : θ ∈ ωᵢᶜ (ωᵢ′ᶜ), i = 1, . . . , 4.
Exercises
13.4.1  A coin, with probability p of falling heads, is tossed independently 100
times and 60 heads are observed. Use the UMPU test for testing the hypoth-
esis H : p = 1/2 against the alternative A : p ≠ 1/2 at level of significance α = 0.1.
13.4.2  Let X₁, X₂, X₃ be independent r.v.s distributed as B(1, p). Derive the
UMPU test for testing H : p = 0.25 against A : p ≠ 0.25 at level of significance α.
Determine the test for α = 0.05.
Whenever convenient, we will also use the notation z and Z instead of (x1, . . . ,
xn) and (X1, . . . , Xn), respectively. Finally, all tests will be of level .
    P(χ²_{n−1} > C/σ₀²) = α,   (32)

is UMP. The test given by (31) and (32) with reversed inequalities is UMPU
for testing H₁′ : σ ≥ σ₀ against A₁′ : σ < σ₀.
The power of the tests is easily determined by the fact that (1/σ²) ∑_{j=1}^n
(X_j − X̄)² is χ²_{n−1} when σ obtains (that is, σ is the true s.d.). For example, for
n = 25, σ₀ = 3 and α = 0.05, we have for H₁, C/9 = 36.415, so that C = 327.735.
The power of the test at σ = 5 is equal to P(χ²₂₄ > 13.1094) = 0.962.
For H₁′, C/9 = 13.848, so that C = 124.632, and the power at σ = 2 is
P(χ²₂₄ < 31.158) = 0.8384.
PROPOSITION 3  For testing H₂ : σ ≤ σ₁ or σ ≥ σ₂ against A₂ : σ₁ < σ < σ₂, the test given by

    φ(z) = 1  if C₁ < ∑_{j=1}^n (x_j − x̄)² < C₂,
         = 0  otherwise,   (33)

where C₁, C₂ are determined by

    P(C₁/σᵢ² < χ²_{n−1} < C₂/σᵢ²) = α,  i = 1, 2,   (34)

is UMPU. The test given by (33) and (34), where the inequalities C₁ < ∑_{j=1}^n
(x_j − x̄)² < C₂ are replaced by

    ∑_{j=1}^n (x_j − x̄)² < C₁  or  ∑_{j=1}^n (x_j − x̄)² > C₂,

is UMPU for the symmetric hypothesis. For instance, with n = 25, σ₁ = 2, σ₂ = 3
and α = 0.05, the conditions (34) become

    P(χ²₂₄ > C₁/4) − P(χ²₂₄ > C₂/4) = 0.05,
    P(χ²₂₄ > C₁/9) − P(χ²₂₄ > C₂/9) = 0.05.
13.5 Testing the Parameters of a Normal Distribution
For testing H₄ : σ = σ₀ against A₄ : σ ≠ σ₀, the UMPU test rejects when
∑_{j=1}^n (x_j − x̄)² falls outside the interval [C₁, C₂] and accepts otherwise, where
C₁, C₂ are determined by

    ∫_{C₁/σ₀²}^{C₂/σ₀²} g(t) dt = 1 − α  and  [1/(n − 1)] ∫_{C₁/σ₀²}^{C₂/σ₀²} t g(t) dt = 1 − α,

g being the p.d.f. of χ²_{n−1}.
For testing hypotheses about the mean when σ is unknown, the tests are based on the
statistic

    t(z) = √n (x̄ − μ₀) / { [1/(n − 1)] ∑_{j=1}^n (x_j − x̄)² }^{1/2}.   (35)
PROPOSITION 5  For testing H₁ : μ ≤ μ₀ against A₁ : μ > μ₀, the test given by

    φ(z) = 1  if t(z) > C,   (36)
         = 0  otherwise,

where C is determined by

    P(t_{n−1} > C) = α,   (37)

is UMPU. The test given by (36) and (37) with reversed inequalities is UMPU
for testing H₁′ : μ ≥ μ₀ against A₁′ : μ < μ₀; t(z) is given by (35). (See also Figs. 13.11
and 13.12; t_{n−1} stands for an r.v. distributed as t_{n−1}.)
For n = 25 and α = 0.05, we have P(t₂₄ > C) = 0.05; hence C = 1.7109 for H₁,
and C = −1.7109 for H₁′.
PROPOSITION 6  For testing H₄ : μ = μ₀ against A₄ : μ ≠ μ₀, the test given by

    φ(z) = 1  if t(z) < −C or t(z) > C  (C > 0),
         = 0  otherwise,

where C is determined by

    P(t_{n−1} > C) = α/2,

is UMPU; t(z) is given by (35). (See also Fig. 13.13.)

Figure 13.11  H₁ : μ ≤ μ₀, A₁ : μ > μ₀.   Figure 13.12  H₁′ : μ ≥ μ₀, A₁′ : μ < μ₀.
Figure 13.13  H₄ : μ = μ₀, A₄ : μ ≠ μ₀.
Exercises
13.5.1  The diameters of bolts produced by a certain machine are r.v.s dis-
tributed as N(μ, σ²). In order for the bolts to be usable for the intended
purpose, the s.d. σ must not exceed 0.04 inch. A sample of size 16 is taken and
it is found that s = 0.05 inch. Formulate the appropriate testing hypothesis
problem and carry out the test if α = 0.05.
13.5.2  Let X₁, . . . , Xₙ be i.i.d. r.v.s from N(μ, σ²), where both μ and σ are
unknown.
 i) Derive the UMPU test for testing the hypothesis H : μ = μ₀ against the
alternative A : μ ≠ μ₀ at level of significance α;
ii) Carry out the test if n = 25, μ₀ = 2, ∑²⁵_{j=1} (x_j − x̄)² = 24.8, and α = 0.05.
    ∑_{j=1}^n x_j = 1,752  and  ∑_{j=1}^n x_j² = 31,157.
Make the appropriate assumptions and test the hypothesis H that the manu-
facturer's claim is correct against the appropriate alternative A at level of
significance α = 0.01.
13.5.5 The diameters of certain cylindrical items produced by a machine are
r.v.s distributed as N(μ, 0.01). A sample of size 16 is taken and it is found that
x̄ = 2.48 inches. If the desired value for μ is 2.5 inches, formulate the appropri-
ate testing hypothesis problem and carry out the test if α = 0.05.
Figure 13.14  H₁ : σ ≤ σ₀, A₁ : σ > σ₀.   Figure 13.15  H₁′ : σ ≥ σ₀, A₁′ : σ < σ₀.
For testing H₁ : θ ≤ θ₀ against A₁ : θ > θ₀, where θ = σ₂²/σ₁², the test given by

    φ(z, w) = 1  if ∑_{j=1}^n (y_j − ȳ)² / ∑_{i=1}^m (x_i − x̄)² > C,   (38)
            = 0  otherwise,

where C is determined by

    P(F_{n−1,m−1} > C₀) = α,  C₀ = (m − 1)C / [(n − 1)θ₀],   (39)

is UMPU. The test given by (38) and (39) with reversed inequalities is UMPU
for testing H₁′ : θ ≥ θ₀ against A₁′ : θ < θ₀. (See also Figs. 13.14 and 13.15; F_{n−1,m−1}
stands for an r.v. distributed as F_{n−1,m−1}.)
The power of the test is easily determined by the fact that

    { [1/(n − 1)] ∑_{j=1}^n (Y_j − Ȳ)² / σ₂² } / { [1/(m − 1)] ∑_{i=1}^m (X_i − X̄)² / σ₁² }

is F_{n−1,m−1} distributed when (σ₁, σ₂) obtains. Thus the power of the test depends only
on θ. For m = 25, n = 21, θ₀ = 2 and α = 0.05, one has P(F₂₀,₂₄ > 5C/12) = 0.05,
hence 5C/12 = 2.0267 and C = 4.8640 for H₁; for H₁′,

    P(F₂₀,₂₄ < 5C/12) = P(F₂₄,₂₀ > 12/(5C)) = 0.05

implies 12/(5C) = 2.0825 and hence C = 1.1525.
Now set

    V(z, w) = [ (1/θ₀) ∑_{j=1}^n (y_j − ȳ)² ] / [ ∑_{i=1}^m (x_i − x̄)² + (1/θ₀) ∑_{j=1}^n (y_j − ȳ)² ].   (40)
Then the test given by

    φ(z, w) = 1  if V(z, w) < C₁ or V(z, w) > C₂,
            = 0  otherwise,

where C₁, C₂ are determined by

    P(C₁ < B_{(n−1)/2, (m−1)/2} < C₂) = P(C₁ < B_{(n+1)/2, (m−1)/2} < C₂) = 1 − α,

is UMPU; V(z, w) is defined by (40). (B_{r₁,r₂} stands for an r.v. distributed as Beta
with r₁, r₂ degrees of freedom.) For the actual determination of C₁, C₂, we use
the incomplete Beta tables. (See, for example, New Tables of the Incomplete
Gamma Function Ratio and of Percentage Points of the Chi-Square and Beta
Distributions by H. Leon Harter, Aerospace Research Laboratories, Office of
Aerospace Research; also, Tables of the Incomplete Beta-Function by Karl
Pearson, Cambridge University Press.)
For testing hypotheses about the means (with a common unknown variance), the tests are
based on the statistic

    t(z, w) = (ȳ − x̄) / { ∑_{i=1}^m (x_i − x̄)² + ∑_{j=1}^n (y_j − ȳ)² }^{1/2}.   (41)

For testing H₁ : μ₂ ≤ μ₁ against A₁ : μ₂ > μ₁, the test given by

    φ(z, w) = 1  if t(z, w) > C,   (42)
            = 0  otherwise,

where C is determined by

    P(t_{m+n−2} > C₀) = α,  C₀ = C √{ (m + n − 2) / [(1/m) + (1/n)] },   (43)

is UMPU. The test given by (42) and (43) with reversed inequalities is UMPU
for testing H₁′ : μ₂ ≥ μ₁ against A₁′ : μ₂ < μ₁; t(z, w) is given by (41). The determination
of the power of the test involves a non-central t-distribution, as was also the
case in Propositions 5 and 6.
For example, for m = 15, n = 10 and α = 0.05, one has for H₁:
P(t₂₃ > C√(23 · 6)) = 0.05; hence C√(23 · 6) = 1.7139 and C = 0.1459. For H₁′,
C = −0.1459.
For testing μ₂ = μ₁ against μ₂ ≠ μ₁, the test given by

    φ(z, w) = 1  if t(z, w) < −C or t(z, w) > C,
            = 0  otherwise,

where C is determined by

    P(t_{m+n−2} > C₀) = α/2,

C₀ as above, is UMPU.
Again with m = 15, n = 10 and α = 0.05, one has P(t₂₃ > C√(23 · 6)) = 0.025,
and hence C√(23 · 6) = 2.0687 and C = 0.1762.
Once again the determination of the power of the test involves the non-
central t-distribution.
REMARK 6  In Propositions 9 and 10, if the variances are not equal, the tests
presented above are not UMPU. The problem of comparing the means of two
normal densities when the variances are unequal is known as the Behrens-
Fisher problem. For this case, various tests have been proposed, but we will not
discuss them here.
Exercises
13.6.1  Let Xᵢ, i = 1, . . . , 9 and Yⱼ, j = 1, . . . , 10 be independent random
samples from the distributions N(μ₁, σ₁²) and N(μ₂, σ₂²), respectively. Suppose
that the observed values of the sample s.d.s are s_X = 2, s_Y = 3. At level of
significance α = 0.05, test the hypothesis H : σ₁ = σ₂ against the alternative A : σ₁
≠ σ₂ and find the power of the test at σ₁ = 2, σ₂ = 3. (Compute the value of the
test statistic, and set up the formulas for determining the cut-off points and the
power of the test.)
13.6.2  Let Xⱼ, j = 1, . . . , 4 and Yⱼ, j = 1, . . . , 4 be two independent random
samples from the distributions N(μ₁, σ₁²) and N(μ₂, σ₂²), respectively. Suppose
that the observed values of the Xs and Ys are as follows:

    x₁ = 10.1, x₂ = 8.4, x₃ = 14.3, x₄ = 11.7,
    y₁ = 9.0, y₂ = 8.2, y₃ = 12.1, y₄ = 10.3.

Test the hypothesis H : σ₁ = σ₂ against the alternative A : σ₁ ≠ σ₂ at level of
significance α = 0.05. (Compute the value of the test statistic, and set up the
formulas for determining the cut-off points and the power of the test.)
13.6.3 Five resistance measurements are taken on two test pieces and the
observed values (in ohms) are as follows:
x1 = 0.118, x 2 = 0.125, x3 = 0.121, x 4 = 0.117, x5 = 0.120
y1 = 0.114, y2 = 0.115, y3 = 0.119, y4 = 0.120, y5 = 0.110.
Make the appropriate assumptions and test the hypothesis H : μ₁ = μ₂ against
the alternative A : μ₁ ≠ μ₂ at level of significance α = 0.05. (Compute the value
of the test statistic, and set up the formulas for determining the cut-off points
and the power of the test.)
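As a numerical illustration of the pooled two-sample $t$ statistic that these exercises call for, the computation for the resistance data of Exercise 13.6.3 can be sketched as follows; this is a sketch only, and the cut-off point $t_{8,\,0.975} \approx 2.306$ quoted in the comment is a tabulated constant, not computed by the code:

```python
# Pooled two-sample t statistic for the resistance data of Exercise 13.6.3.
x = [0.118, 0.125, 0.121, 0.117, 0.120]
y = [0.114, 0.115, 0.119, 0.120, 0.110]
m, n = len(x), len(y)
xbar = sum(x) / m
ybar = sum(y) / n
# pooled sum of squared deviations and pooled variance estimate
ss = sum((v - xbar) ** 2 for v in x) + sum((w - ybar) ** 2 for w in y)
s2 = ss / (m + n - 2)
t = ((m * n / (m + n)) ** 0.5) * (xbar - ybar) / s2 ** 0.5
print(round(t, 3))  # ~2.017
# Tabulated cut-off t_{8, 0.975} ~ 2.306, so H is not rejected at alpha = 0.05.
```

Since $|t| < 2.306$, the hypothesis of equal means is not rejected for these data.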
13.6.4 Refer to Exercise 13.6.2 and suppose it is known that $\sigma_1 = 4$ and $\sigma_2 = 3$. Test the hypothesis $H$ that the two means do not differ by more than 1 at level of significance $\alpha = 0.05$.
13.6.5 The breaking powers of certain steel bars produced by processes $A$ and $B$ are r.v.'s distributed as normal with possibly different means but the same variance. A random sample of size 25 is taken from bars produced by each one of the processes, and it is found that $\bar{x} = 60$, $s_X = 6$, $\bar{y} = 65$, $s_Y = 7$. Test whether there is a difference between the two processes at the level of significance $\alpha = 0.05$.
13.6.6 Refer to Exercise 13.6.3, make the appropriate assumptions, and test the hypothesis $H : \sigma_1 = \sigma_2$ against the alternative $A : \sigma_1 \neq \sigma_2$ at level of significance $\alpha = 0.05$.
13.6.7 Let $X_i$, $i = 1, \dots, n$ and $Y_i$, $i = 1, \dots, n$ be independent random samples from the distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively, and suppose that all four parameters are unknown. By setting $Z_i = X_i - Y_i$, we have that the paired r.v.'s $Z_i$, $i = 1, \dots, n$, are independent and distributed as $N(\mu, \sigma^2)$ with $\mu = \mu_1 - \mu_2$ and $\sigma^2 = \sigma_1^2 + \sigma_2^2$. Then one may use Propositions 5 and 6 to test hypotheses about $\mu$.
Test the hypotheses $H_1 : \mu \le 0$ against $A_1 : \mu > 0$ and $H_2 : \mu = 0$ against $A_2 : \mu \neq 0$ at level of significance $\alpha = 0.05$ for the data given in (i) Exercise 13.6.2; (ii) Exercise 13.6.3.
13.7 Likelihood Ratio Tests

$$P_\theta\left[-2\log\lambda \le x\right] \longrightarrow G(x), \quad x \ge 0, \text{ for all } \theta \in \omega,$$

where $G$ is the d.f. of a chi-square distribution with the appropriate number of degrees of freedom.
EXAMPLE 11 (Testing the mean of a normal distribution) Let $X_1, \dots, X_n$ be i.i.d. r.v.'s from $N(\mu, \sigma^2)$, and consider the following testing hypotheses problems.
i) Suppose first that $\sigma$ is known, and consider testing the hypothesis $H : \mu = \mu_0$ against the alternative $A : \mu \neq \mu_0$, so that $\omega = \{\mu_0\}$. Since the MLE of $\mu$ over $\Omega$ is $\bar{x}$, we have

$$L(\hat\Omega) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \exp\left[-\frac{1}{2\sigma^2}\sum_{j=1}^{n}\left(x_j - \bar{x}\right)^2\right]$$

and

$$L(\omega) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \exp\left[-\frac{1}{2\sigma^2}\sum_{j=1}^{n}\left(x_j - \mu_0\right)^2\right].$$

Therefore $\lambda = L(\omega)/L(\hat\Omega)$,

$$-2\log\lambda = \frac{n\left(\bar{x} - \mu_0\right)^2}{\sigma^2},$$

and the LR test is equivalent to

$$\varphi(z) = \begin{cases} 1, & \text{if } \dfrac{n\left(\bar{x} - \mu_0\right)^2}{\sigma^2} > C \\ 0, & \text{otherwise,} \end{cases}$$

where $C$ is determined by

$$P\left(\chi_1^2 > C\right) = \alpha.$$

(Recall that $z = (x_1, \dots, x_n)$.)
Notice that this is consistent with Theorem 6. It should also be pointed
out that this test is the same as the test found in Example 10 and therefore
the present test is also UMPU.
ii) Consider the same problem as in (i) but suppose now that $\sigma$ is also unknown. We are still interested in testing the hypothesis $H : \mu = \mu_0$, which now is composite, since $\sigma$ is unspecified.
The MLEs of $\sigma^2$ under $\Omega$ and under $\omega$ are, respectively,

$$\hat\sigma_\Omega^2 = \frac{1}{n}\sum_{j=1}^{n}\left(x_j - \bar{x}\right)^2 \quad \text{and} \quad \hat\sigma_\omega^2 = \frac{1}{n}\sum_{j=1}^{n}\left(x_j - \mu_0\right)^2,$$

so that

$$L(\hat\Omega) = \left(\frac{1}{\sqrt{2\pi}\,\hat\sigma_\Omega}\right)^{n} \exp\left[-\frac{1}{2\hat\sigma_\Omega^2}\sum_{j=1}^{n}\left(x_j - \bar{x}\right)^2\right] = \left(\frac{1}{\sqrt{2\pi}\,\hat\sigma_\Omega}\right)^{n} e^{-n/2}$$

and

$$L(\hat\omega) = \left(\frac{1}{\sqrt{2\pi}\,\hat\sigma_\omega}\right)^{n} \exp\left[-\frac{1}{2\hat\sigma_\omega^2}\sum_{j=1}^{n}\left(x_j - \mu_0\right)^2\right] = \left(\frac{1}{\sqrt{2\pi}\,\hat\sigma_\omega}\right)^{n} e^{-n/2}.$$

Then

$$\lambda = \left(\frac{\hat\sigma_\Omega^2}{\hat\sigma_\omega^2}\right)^{n/2}, \quad \text{or} \quad \lambda^{2/n} = \frac{\sum_{j=1}^{n}\left(x_j - \bar{x}\right)^2}{\sum_{j=1}^{n}\left(x_j - \mu_0\right)^2}.$$

But

$$\sum_{j=1}^{n}\left(x_j - \mu_0\right)^2 = \sum_{j=1}^{n}\left(x_j - \bar{x}\right)^2 + n\left(\bar{x} - \mu_0\right)^2,$$

and therefore

$$\lambda^{2/n} = \left[1 + \frac{n\left(\bar{x} - \mu_0\right)^2}{(n-1)\,\frac{1}{n-1}\sum_{j=1}^{n}\left(x_j - \bar{x}\right)^2}\right]^{-1} = \left(1 + \frac{t^2}{n-1}\right)^{-1},$$

where

$$t = t(z) = \frac{\sqrt{n}\left(\bar{x} - \mu_0\right)}{\sqrt{\frac{1}{n-1}\sum_{j=1}^{n}\left(x_j - \bar{x}\right)^2}}.$$
Then $\lambda < \lambda_0$ is equivalent to $t^2 > C$ for a certain constant $C$. That is, the LR test is equivalent to the test

$$\varphi(z) = \begin{cases} 1, & \text{if } t < -C \text{ or } t > C \\ 0, & \text{otherwise,} \end{cases}$$
where $C$ is determined by
$$P\left(t_{n-1} > C\right) = \frac{\alpha}{2}.$$
Notice that, by Proposition 6, the test just derived is UMPU.
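The identity $\lambda^{2/n} = \left(1 + t^2/(n-1)\right)^{-1}$ derived above is exact and can be checked numerically; the sketch below uses an arbitrary illustrative sample and $\mu_0 = 10$ (both are assumptions made for the example, not data from the text):

```python
import math

# Check lambda^(2/n) = (1 + t^2/(n-1))^(-1) for an arbitrary sample, mu_0 = 10.
xs = [9.1, 10.4, 11.2, 8.7, 10.9, 9.8]
mu0 = 10.0
n = len(xs)
xbar = sum(xs) / n
ss_Omega = sum((x - xbar) ** 2 for x in xs)  # n * (sigma^2 MLE under Omega)
ss_omega = sum((x - mu0) ** 2 for x in xs)   # n * (sigma^2 MLE under omega)
lam = (ss_Omega / ss_omega) ** (n / 2)       # the likelihood ratio
t = math.sqrt(n) * (xbar - mu0) / math.sqrt(ss_Omega / (n - 1))
assert abs(lam ** (2 / n) - 1 / (1 + t ** 2 / (n - 1))) < 1e-12
```

The assertion holds for any sample, which is exactly why the LR test can be carried out through the $t_{n-1}$ distribution.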
EXAMPLE 12 (Comparing the parameters of two normal distributions) Let $X_1, \dots, X_m$ be i.i.d. r.v.'s from $N(\mu_1, \sigma_1^2)$ and $Y_1, \dots, Y_n$ be i.i.d. r.v.'s from $N(\mu_2, \sigma_2^2)$. Suppose that the $X$'s and $Y$'s are independent and consider the following testing hypotheses problems. In the present case, the joint p.d.f. of the $X$'s and $Y$'s is given by
$$\left(\frac{1}{2\pi}\right)^{(m+n)/2} \frac{1}{\sigma_1^m \sigma_2^n} \exp\left[-\frac{1}{2\sigma_1^2}\sum_{i=1}^{m}\left(x_i - \mu_1\right)^2 - \frac{1}{2\sigma_2^2}\sum_{j=1}^{n}\left(y_j - \mu_2\right)^2\right].$$
Suppose first that $\sigma_1 = \sigma_2 = \sigma$, unknown, and consider testing the hypothesis $H : \mu_1 = \mu_2$ against the alternative $A : \mu_1 \neq \mu_2$. Under $\Omega$, the MLEs are

$$\hat\mu_{1,\Omega} = \bar{x}, \quad \hat\mu_{2,\Omega} = \bar{y}, \quad \hat\sigma_\Omega^2 = \frac{1}{m+n}\left[\sum_{i=1}^{m}\left(x_i - \bar{x}\right)^2 + \sum_{j=1}^{n}\left(y_j - \bar{y}\right)^2\right],$$

as is easily seen. Therefore

$$L(\hat\Omega) = \left(\frac{1}{2\pi\hat\sigma_\Omega^2}\right)^{(m+n)/2} e^{-(m+n)/2}.$$
Under $\omega$ ($\mu_1 = \mu_2 = \mu$), the $m + n$ observations form a single sample; writing $v_1, \dots, v_{m+n}$ for $x_1, \dots, x_m, y_1, \dots, y_n$, the MLEs are

$$\hat\mu_\omega = \frac{1}{m+n}\sum_{k=1}^{m+n} v_k = \frac{1}{m+n}\left(\sum_{i=1}^{m} x_i + \sum_{j=1}^{n} y_j\right) = \frac{m\bar{x} + n\bar{y}}{m+n}$$

and

$$\hat\sigma_\omega^2 = \frac{1}{m+n}\sum_{k=1}^{m+n}\left(v_k - \bar{v}\right)^2 = \frac{1}{m+n}\left[\sum_{i=1}^{m}\left(x_i - \hat\mu_\omega\right)^2 + \sum_{j=1}^{n}\left(y_j - \hat\mu_\omega\right)^2\right].$$

Therefore

$$L(\hat\omega) = \left(\frac{1}{2\pi\hat\sigma_\omega^2}\right)^{(m+n)/2} e^{-(m+n)/2}.$$
It follows that

$$\lambda = \left(\frac{\hat\sigma_\Omega^2}{\hat\sigma_\omega^2}\right)^{(m+n)/2} \quad \text{and} \quad \lambda^{2/(m+n)} = \frac{\hat\sigma_\Omega^2}{\hat\sigma_\omega^2}.$$

Next, since $\bar{x} - \hat\mu_\omega = \dfrac{n\left(\bar{x} - \bar{y}\right)}{m+n}$,

$$\sum_{i=1}^{m}\left(x_i - \hat\mu_\omega\right)^2 = \sum_{i=1}^{m}\left[\left(x_i - \bar{x}\right) + \left(\bar{x} - \hat\mu_\omega\right)\right]^2 = \sum_{i=1}^{m}\left(x_i - \bar{x}\right)^2 + m\left(\bar{x} - \hat\mu_\omega\right)^2 = \sum_{i=1}^{m}\left(x_i - \bar{x}\right)^2 + \frac{mn^2}{(m+n)^2}\left(\bar{x} - \bar{y}\right)^2,$$

and likewise $\sum_{j=1}^{n}\left(y_j - \hat\mu_\omega\right)^2 = \sum_{j=1}^{n}\left(y_j - \bar{y}\right)^2 + \dfrac{m^2 n}{(m+n)^2}\left(\bar{x} - \bar{y}\right)^2,$
so that

$$(m+n)\hat\sigma_\omega^2 = (m+n)\hat\sigma_\Omega^2 + \frac{mn}{m+n}\left(\bar{x} - \bar{y}\right)^2 = \sum_{i=1}^{m}\left(x_i - \bar{x}\right)^2 + \sum_{j=1}^{n}\left(y_j - \bar{y}\right)^2 + \frac{mn}{m+n}\left(\bar{x} - \bar{y}\right)^2.$$

Hence

$$\lambda^{-2/(m+n)} = 1 + \frac{\frac{mn}{m+n}\left(\bar{x} - \bar{y}\right)^2}{\sum_{i=1}^{m}\left(x_i - \bar{x}\right)^2 + \sum_{j=1}^{n}\left(y_j - \bar{y}\right)^2} = 1 + \frac{t^2}{m+n-2},$$

where

$$t = \sqrt{\frac{mn}{m+n}}\left(\bar{x} - \bar{y}\right)\bigg/\sqrt{\frac{1}{m+n-2}\left[\sum_{i=1}^{m}\left(x_i - \bar{x}\right)^2 + \sum_{j=1}^{n}\left(y_j - \bar{y}\right)^2\right]}.$$
Therefore the LR test, which rejects $H$ whenever $\lambda < \lambda_0$, is equivalent to the following test:

$$\varphi(z, w) = \begin{cases} 1, & \text{if } t < -C \text{ or } t > C \quad (C > 0) \\ 0, & \text{otherwise,} \end{cases}$$

where $C$ is determined by

$$P\left(t_{m+n-2} > C\right) = \frac{\alpha}{2},$$

and $z = (x_1, \dots, x_m)$, $w = (y_1, \dots, y_n)$, because, under $H$, $t$ is distributed as $t_{m+n-2}$. We notice that the test above is the same as the UMPU test found in Proposition 10.
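The algebra above, in particular the decomposition of $(m+n)\hat\sigma_\omega^2$ and the resulting relation $\lambda^{-2/(m+n)} = 1 + t^2/(m+n-2)$, can be verified numerically; the sketch below uses the data of Exercise 13.6.2 purely for concreteness:

```python
import math

# Verify the two-sample decomposition and the LR/t relation on sample data.
x = [10.1, 8.4, 14.3, 11.7]
y = [9.0, 8.2, 12.1, 10.3]
m, n = len(x), len(y)
xbar, ybar = sum(x) / m, sum(y) / n
mu_w = (m * xbar + n * ybar) / (m + n)   # common MLE of the mean under omega
ss = sum((v - xbar) ** 2 for v in x) + sum((w - ybar) ** 2 for w in y)
s2_Omega = ss / (m + n)
s2_omega = (sum((v - mu_w) ** 2 for v in x)
            + sum((w - mu_w) ** 2 for w in y)) / (m + n)
# the decomposition of (m+n) * sigma_omega^2
lhs = (m + n) * s2_omega
rhs = (m + n) * s2_Omega + m * n / (m + n) * (xbar - ybar) ** 2
assert abs(lhs - rhs) < 1e-9
# the monotone relation between the LR statistic and t
lam = (s2_Omega / s2_omega) ** ((m + n) / 2)
t = math.sqrt(m * n / (m + n)) * (xbar - ybar) / math.sqrt(ss / (m + n - 2))
assert abs(lam ** (-2 / (m + n)) - (1 + t ** 2 / (m + n - 2))) < 1e-9
```

Both assertions hold for any data set, which is why rejecting for small $\lambda$ is the same as rejecting for large $|t|$.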
Next, keep the means $\mu_1$, $\mu_2$ unspecified and consider testing the hypothesis $H : \sigma_1 = \sigma_2$ against the alternative $A : \sigma_1 \neq \sigma_2$. Under $\Omega$, the MLEs are $\hat\mu_{1,\Omega} = \bar{x}$, $\hat\mu_{2,\Omega} = \bar{y}$,

$$\hat\sigma_{1,\Omega}^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \bar{x}\right)^2, \quad \hat\sigma_{2,\Omega}^2 = \frac{1}{n}\sum_{j=1}^{n}\left(y_j - \bar{y}\right)^2,$$

and under $\omega$,

$$\hat\sigma_\omega^2 = \frac{1}{m+n}\left[\sum_{i=1}^{m}\left(x_i - \bar{x}\right)^2 + \sum_{j=1}^{n}\left(y_j - \bar{y}\right)^2\right].$$

Therefore

$$L(\hat\Omega) = \frac{1}{(2\pi)^{(m+n)/2}\left(\hat\sigma_{1,\Omega}^2\right)^{m/2}\left(\hat\sigma_{2,\Omega}^2\right)^{n/2}}\, e^{-(m+n)/2}$$

and

$$L(\hat\omega) = \frac{1}{(2\pi)^{(m+n)/2}\left(\hat\sigma_\omega^2\right)^{(m+n)/2}}\, e^{-(m+n)/2},$$
so that

$$\lambda = \frac{\left(\hat\sigma_{1,\Omega}^2\right)^{m/2}\left(\hat\sigma_{2,\Omega}^2\right)^{n/2}}{\left(\hat\sigma_\omega^2\right)^{(m+n)/2}} = \frac{(m+n)^{(m+n)/2}}{m^{m/2}\, n^{n/2}} \cdot \frac{\left[\sum_{i=1}^{m}\left(x_i - \bar{x}\right)^2\right]^{m/2}\left[\sum_{j=1}^{n}\left(y_j - \bar{y}\right)^2\right]^{n/2}}{\left[\sum_{i=1}^{m}\left(x_i - \bar{x}\right)^2 + \sum_{j=1}^{n}\left(y_j - \bar{y}\right)^2\right]^{(m+n)/2}}.$$

Dividing both the numerator and the denominator by $\left[\sum_{j=1}^{n}\left(y_j - \bar{y}\right)^2\right]^{(m+n)/2}$, we obtain

$$\lambda = \frac{(m+n)^{(m+n)/2}}{m^{m/2}\, n^{n/2}} \cdot \frac{\left(\frac{m-1}{n-1}\, f\right)^{m/2}}{\left(1 + \frac{m-1}{n-1}\, f\right)^{(m+n)/2}},$$

where $f = \left[\sum_{i=1}^{m}\left(x_i - \bar{x}\right)^2/(m-1)\right]\Big/\left[\sum_{j=1}^{n}\left(y_j - \bar{y}\right)^2/(n-1)\right]$.
Therefore the LR test, which rejects $H$ whenever $\lambda < \lambda_0$, is equivalent to the test based on $f$ and rejecting $H$ if

$$\frac{\left(\frac{m-1}{n-1}\, f\right)^{m/2}}{\left(1 + \frac{m-1}{n-1}\, f\right)^{(m+n)/2}} < C \quad \text{for a certain constant } C.$$
Setting $g(f)$ for the left-hand side of this last inequality, we have that $g(0) = 0$ and $g(f) \to 0$ as $f \to \infty$. Furthermore, it can be seen (see Exercise 13.7.4) that $g(f)$ has a maximum at the point

$$f_{\max} = \frac{m(n-1)}{n(m-1)};$$

it is increasing between $0$ and $f_{\max}$ and decreasing in $(f_{\max}, \infty)$. Therefore

$$g(f) < C \quad \text{if and only if} \quad f < C_1 \text{ or } f > C_2$$

for certain specified constants $C_1$ and $C_2$.
Now, if in the expression of $f$ the $x$'s and $y$'s are replaced by $X$'s and $Y$'s, respectively, and we denote by $F$ the resulting statistic, it follows that, under $H$, $F$ is distributed as $F_{m-1,n-1}$. Therefore the constants $C_1$ and $C_2$ are uniquely determined by the following requirements:
$$P\left(F_{m-1,n-1} < C_1 \text{ or } F_{m-1,n-1} > C_2\right) = \alpha \quad \text{and} \quad g\left(C_1\right) = g\left(C_2\right).$$

However, in practice the $C_1$ and $C_2$ are determined so as to assign probability $\alpha/2$ to each one of the two tails of the $F_{m-1,n-1}$ distribution; that is, such that

$$P\left(F_{m-1,n-1} < C_1\right) = P\left(F_{m-1,n-1} > C_2\right) = \frac{\alpha}{2}.$$
(See also Fig. 13.16.)
Fm " 1, n " 1
%
2 % Figure 13.16
2
0 C1 C2
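The shape of $g$ claimed above is easy to probe numerically; for instance, with $m = 15$ and $n = 10$ (arbitrary illustrative choices), $g$ should peak at $f_{\max} = m(n-1)/(n(m-1))$ and be monotone on either side. This is a numerical sketch, not a proof:

```python
# Probe the shape of g(f) for m = 15, n = 10.
m, n = 15, 10

def g(f):
    # the left-hand side of the rejection inequality, as a function of f
    c = (m - 1) / (n - 1) * f
    return c ** (m / 2) / (1 + c) ** ((m + n) / 2)

f_max = m * (n - 1) / (n * (m - 1))   # claimed maximizing point
eps = 1e-4
assert g(f_max) > g(f_max - eps)
assert g(f_max) > g(f_max + eps)
# monotone increase before f_max and decrease after it, checked on a grid
grid = [0.01 * k for k in range(1, 500)]
left = [f for f in grid if f < f_max]
right = [f for f in grid if f > f_max]
assert all(g(a) < g(b) for a, b in zip(left, left[1:]))
assert all(g(a) > g(b) for a, b in zip(right, right[1:]))
```

The unimodal shape is exactly what turns the condition $g(f) < C$ into a two-sided rejection region $f < C_1$ or $f > C_2$.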
Exercises
13.7.1 Refer to Exercise 13.4.2 and use the LR test to test the hypothesis $H : p = 0.25$ against the alternative $A : p \neq 0.25$. Specifically, set $\lambda(t)$ for the likelihood ratio, where $t = x_1 + x_2 + x_3$, and:
i) Calculate the values $\lambda(t)$ for $t = 0, 1, 2, 3$ as well as the corresponding probabilities under $H$;
ii) Set up the LR test, in terms of both $\lambda(t)$ and $t$;
iii) Specify the (randomized) test of level of significance $\alpha = 0.05$;
iv) Compare the test in part (iii) with the UMPU test constructed in Exercise
13.4.2.
13.7.2 A coin, with probability $p$ of falling heads, is tossed 100 times and 60 heads are observed. At level of significance $\alpha = 0.1$:
i) Test the hypothesis $H : p = \frac{1}{2}$ against the alternative $A : p \neq \frac{1}{2}$ by using the LR test and employ the appropriate approximation to determine the cut-off point;
ii) Compare the cut-off point in part (i) with that found in Exercise 13.4.1.
13.7.3 If $X_1, \dots, X_n$ are i.i.d. r.v.'s from $N(\mu, \sigma^2)$, derive the LR test and the test based on $-2\log\lambda$ for testing the hypothesis $H : \mu = \mu_0$, first in the case that $\sigma$ is known and secondly in the case that $\sigma$ is unknown. In the first case, compare the test based on $-2\log\lambda$ with that derived in Example 11.
13.7.4 Consider the function

$$g(t) = \frac{\left(\frac{m-1}{n-1}\, t\right)^{m/2}}{\left(1 + \frac{m-1}{n-1}\, t\right)^{(m+n)/2}}, \quad t \ge 0,$$

and show that

$$\max\left[g(t);\ t \ge 0\right] = \frac{(m/n)^{m/2}}{\left[1 + (m/n)\right]^{(m+n)/2}},$$

attained at the point $t = \dfrac{m(n-1)}{n(m-1)}$; also show that $g$ is increasing in

$$\left[0, \frac{m(n-1)}{n(m-1)}\right] \quad \text{and decreasing in} \quad \left[\frac{m(n-1)}{n(m-1)}, \infty\right).$$
13.8 Applications of LR Tests: Contingency Tables, Goodness-of-Fit Tests

$$P\left(X_1 = x_1, \dots, X_k = x_k\right) = \frac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k},$$

$$\theta = \left(p_1, \dots, p_k\right) \in \Omega = \left\{\left(p_1, \dots, p_k\right);\ p_j > 0,\ j = 1, \dots, k,\ \sum_{j=1}^{k} p_j = 1\right\}.$$
We may suspect that the $p$'s have certain specified values; for example, in the case of a die, the die may be balanced. We then formulate this as a hypothesis and proceed to test it on the basis of the data. More generally, we may want to test the hypothesis that $\theta$ lies in a subset $\omega$ of $\Omega$.
Consider the case that $H : \theta \in \omega = \{\theta_0\} = \{(p_{10}, \dots, p_{k0})\}$. Then, under $\omega$,

$$L(\omega) = \frac{n!}{x_1! \cdots x_k!}\, p_{10}^{x_1} \cdots p_{k0}^{x_k},$$

while, under $\Omega$,

$$L(\hat\Omega) = \frac{n!}{x_1! \cdots x_k!}\, \hat{p}_1^{x_1} \cdots \hat{p}_k^{x_k},$$

where $\hat{p}_j = x_j/n$ are the MLEs of $p_j$, $j = 1, \dots, k$ (see Example 11, Chapter 12). Therefore

$$\lambda = \prod_{j=1}^{k}\left(\frac{n p_{j0}}{x_j}\right)^{x_j},$$

and $H$ is rejected if $-2\log\lambda > C$. The constant $C$ is determined by the fact that $-2\log\lambda$ is asymptotically $\chi_{k-1}^2$ distributed under $H$, as can be shown on the basis of Theorem 6, and the desired level of significance $\alpha$.
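The statistic $-2\log\lambda$ is computed directly from the counts; the sketch below uses made-up counts for $k = 4$ cells with equal cell probabilities under $H$ (both the counts and $p_{j0} = 1/4$ are assumptions for illustration):

```python
import math

# -2 log(lambda) for a multinomial sample; hypothetical counts, H: p_j0 = 1/4.
x = [30, 22, 25, 23]            # observed cell counts (made-up)
n = sum(x)
p0 = [0.25] * 4
lam = math.prod((n * p / xj) ** xj for p, xj in zip(p0, x))
stat = -2 * math.log(lam)
# equivalently: 2 * sum of x_j log(x_j / (n p_j0))
alt = 2 * sum(xj * math.log(xj / (n * p)) for xj, p in zip(x, p0))
assert abs(stat - alt) < 1e-9
# the asymptotically equivalent Pearson chi-square statistic is close to it
chi2 = sum((xj - n * p) ** 2 / (n * p) for xj, p in zip(x, p0))
print(round(stat, 3), round(chi2, 3))
```

For these counts the two statistics are close (about 1.48 versus 1.52), illustrating the asymptotic equivalence discussed below.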
Now consider $r$ events $A_i$, $i = 1, \dots, r$ which form a partition of the sample space $S$, and let $\{B_j, j = 1, \dots, s\}$ be another partition of $S$. Let $p_{ij} = P(A_i \cap B_j)$ and let

$$p_{i.} = \sum_{j=1}^{s} p_{ij}, \quad p_{.j} = \sum_{i=1}^{r} p_{ij},$$

so that

$$\sum_{i=1}^{r} p_{i.} = \sum_{j=1}^{s} p_{.j} = \sum_{i=1}^{r}\sum_{j=1}^{s} p_{ij} = 1.$$

Furthermore, the events $\{A_1, \dots, A_r\}$ and $\{B_1, \dots, B_s\}$ are independent if and only if $p_{ij} = p_{i.}\, p_{.j}$, $i = 1, \dots, r$, $j = 1, \dots, s$.
A situation where this set-up is appropriate is the following: Certain experimental units are classified according to two characteristics denoted by $A$ and $B$; let $A_1, \dots, A_r$ be the $r$ levels of $A$ and $B_1, \dots, B_s$ be the $s$ levels of $B$. For instance, $A$ may stand for gender, with $A_1$, $A_2$ for male and female, and $B$ may denote educational status, comprising the levels $B_1$ (elementary school graduate), $B_2$ (high school graduate), $B_3$ (college graduate), $B_4$ (beyond).
We may think of the $rs$ events $A_i \cap B_j$ as being arranged in an $r \times s$ rectangular array which is known as a contingency table; the event $A_i \cap B_j$ is called the $(i, j)$th cell.
Again consider $n$ experimental units classified according to the characteristics $A$ and $B$, and let $X_{ij}$ be the number of those falling into the $(i, j)$th cell. We set

$$X_{i.} = \sum_{j=1}^{s} X_{ij} \quad \text{and} \quad X_{.j} = \sum_{i=1}^{r} X_{ij},$$

so that

$$\sum_{i=1}^{r} X_{i.} = \sum_{j=1}^{s} X_{.j} = n.$$
Under the hypothesis of independence, $H : p_{ij} = p_i q_j$, where $p_i = p_{i.}$ and $q_j = p_{.j}$, so that

$$\sum_{j=1}^{s} p_{ij} = p_i, \quad \sum_{i=1}^{r} p_{ij} = q_j,$$

the likelihood function takes the following forms under $\Omega$ and $\omega$, respectively. Writing $\prod_{i,j}$ instead of $\prod_{i=1}^{r}\prod_{j=1}^{s}$, we have

$$L(\Omega) = \frac{n!}{\prod_{i,j} x_{ij}!} \prod_{i,j} p_{ij}^{x_{ij}},$$

$$L(\omega) = \frac{n!}{\prod_{i,j} x_{ij}!} \prod_{i,j}\left(p_i q_j\right)^{x_{ij}} = \frac{n!}{\prod_{i,j} x_{ij}!} \prod_{i} p_i^{x_{i.}} \prod_{j} q_j^{x_{.j}},$$

since

$$\prod_{i,j}\left(p_i q_j\right)^{x_{ij}} = \prod_{i} p_i^{\sum_j x_{ij}} \prod_{j} q_j^{\sum_i x_{ij}} = \prod_{i} p_i^{x_{i.}} \prod_{j} q_j^{x_{.j}}.$$
p ij ,W = , p i , = , q j , = ,
n n n
as is easily seen (see also Exercise 13.8.1). Therefore
xij
xij
n! xi .
x x x
( )
i. .j
=
L
n!
i , j xij ! i , j n
, L = ( )
i , j xij ! i n
. j
j n
and hence
x xi . x x. j
i . . j xi
x j
i n j n xi . x j
i j
= = .
nn xij ij
xij x
x
nij i ,j
i ,j
For the goodness-of-fit problem above, an alternative and classical procedure is based on the chi-square statistic

$$\chi^2 = \sum_{j=1}^{k} \frac{\left(X_j - np_j\right)^2}{np_j}.$$
Under $H$ one computes

$$\chi^2 = \sum_{j=1}^{k} \frac{\left(x_j - np_{j0}\right)^2}{np_{j0}}$$

and rejects $H$ if $\chi^2$ is too large, in the sense of being greater than a certain constant $C$ which is specified by the desired level of the test. It can further be shown that, under $\omega$, $\chi^2$ is asymptotically distributed as $\chi_{k-1}^2$. In fact, the present test is asymptotically equivalent to the test based on $-2\log\lambda$.
For the case of contingency tables and the problem of testing independence there, we have

$$\chi^2 = \sum_{i,j} \frac{\left(x_{ij} - np_i q_j\right)^2}{np_i q_j},$$

where $\omega$ is as in the previous case in connection with the contingency tables. However, $\chi^2$ is not a statistic, since it involves the parameters $p_i$, $q_j$. By replacing them by their MLEs, we obtain the statistic

$$\hat\chi^2 = \sum_{i,j} \frac{\left(x_{ij} - n\hat{p}_{i,\omega}\hat{q}_{j,\omega}\right)^2}{n\hat{p}_{i,\omega}\hat{q}_{j,\omega}}.$$
Exercises
13.8.1 Show that $\hat{p}_{ij,\Omega} = \dfrac{x_{ij}}{n}$, $\hat{p}_{i,\omega} = \dfrac{x_{i.}}{n}$, $\hat{q}_{j,\omega} = \dfrac{x_{.j}}{n}$, as claimed in the discussion in this section.
In Exercises 13.8.2-13.8.9 below, the test to be used will be the appropriate $\chi^2$ test.
13.8.2 Refer to Exercise 13.7.2 and test the hypothesis formulated there at the specified level of significance by using a $\chi^2$ goodness-of-fit test. Also, compare the cut-off point with that found in Exercise 13.7.2(i).
13.8.3 A die is cast 600 times and the numbers 1 through 6 appear with the
frequencies recorded below.
Face:      1    2    3    4    5    6
Frequency: 100  94   103  89   110  104
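A $\chi^2$ goodness-of-fit computation of the kind these exercises call for looks like this for the die data above, under $H$: each face has probability $1/6$; the cut-off point $\chi^2_{5,\,0.05} \approx 11.07$ quoted in the comment is a tabulated constant, not computed:

```python
# Pearson chi-square for the die data: 600 casts, expected 100 per face under H.
freqs = [100, 94, 103, 89, 110, 104]
n = sum(freqs)
expected = n / 6
chi2 = sum((f - expected) ** 2 / expected for f in freqs)
print(round(chi2, 2))  # 2.82
# Tabulated cut-off chi^2_{5, 0.05} ~ 11.07: the fairness hypothesis is not rejected.
```

The observed value 2.82 is far below the cut-off, so these frequencies are entirely consistent with a balanced die.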
Grade:     A   B   C   D   F
Frequency: 3   12  10  4   1
13.8.6 It is often assumed that I.Q. scores of human beings are normally distributed. Test this claim for the data given below by choosing appropriately the Normal distribution and taking $\alpha = 0.05$.
Interval:  x ≤ 90   90 < x ≤ 100   100 < x ≤ 110   110 < x ≤ 120   120 < x ≤ 130   x > 130
Frequency: 10       18             23              22              18              9
(Hint: Estimate $\mu$ and $\sigma^2$ from the grouped data; take the midpoints for the finite intervals and the points 65 and 160 for the leftmost and rightmost intervals, respectively.)
13.8.7 Consider a group of 100 people living and working under very similar
conditions. Half of them are given a preventive shot against a certain disease
and the other half serve as control. Of those who received the treatment, 40
did not contract the disease whereas the remaining 10 did so. Of those not
treated, 30 did contract the disease and the remaining 20 did not. Test the effectiveness of the vaccine at the level of significance $\alpha = 0.05$.
                          WARD
                    1     2     3     4     Totals
Favor proposal      37    29    32    21    119
Do not favor
  proposal          63    71    68    79    281
Totals              100   100   100   100   400
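For a contingency table such as the ward data above, the statistic $\hat\chi^2$ of the previous section is computed by replacing $p_i$, $q_j$ with the row and column proportions. The sketch below does this; the degrees of freedom $(r-1)(s-1) = 3$ and the cut-off $\chi^2_{3,\,0.05} \approx 7.815$ are standard tabulated facts, not computed here:

```python
# chi-hat^2 for the 2 x 4 ward table (rows: favor / do not favor; columns: wards 1-4).
table = [[37, 29, 32, 21],
         [63, 71, 68, 79]]
n = sum(sum(row) for row in table)
row_tot = [sum(row) for row in table]
col_tot = [sum(col) for col in zip(*table)]
# expected count in cell (i, j) under independence: row_i * col_j / n
chi2 = sum((table[i][j] - row_tot[i] * col_tot[j] / n) ** 2
           / (row_tot[i] * col_tot[j] / n)
           for i in range(len(table)) for j in range(len(table[0])))
print(round(chi2, 3))  # 6.448
# Tabulated cut-off chi^2_{3, 0.05} ~ 7.815: independence is not rejected at 0.05.
```

The observed value falls below the cut-off, so opinion and ward are not shown to be dependent at the 5% level.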
13.9 Decision-Theoretic Viewpoint of Testing Hypotheses

For testing the hypothesis $H : \theta \in \omega$ against the alternative $A : \theta \in \omega^c$, the appropriate loss function is of the form

$$L(\theta; \lambda) = \begin{cases} 0, & \text{if } \theta \in \omega \text{ and } \lambda = 0, \text{ or } \theta \in \omega^c \text{ and } \lambda = 1 \\ L_1, & \text{if } \theta \in \omega \text{ and } \lambda = 1 \\ L_2, & \text{if } \theta \in \omega^c \text{ and } \lambda = 0, \end{cases}$$

where $L_1, L_2 > 0$.
Clearly, a decision function in the present framework is simply a test function. The notation $\varphi$ instead of $\delta$ could be used if one wished.
By setting $Z = (X_1, \dots, X_n)$, the corresponding risk function is

$$R(\theta; \delta) = L(\theta; 1)P_\theta(Z \in B) + L(\theta; 0)P_\theta\left(Z \in B^c\right),$$

or

$$R(\theta; \delta) = \begin{cases} L_1 P_\theta(Z \in B), & \text{if } \theta \in \omega \\ L_2 P_\theta\left(Z \in B^c\right), & \text{if } \theta \in \omega^c. \end{cases} \quad (44)$$

In particular, if $\omega = \{\theta_0\}$, $\omega^c = \{\theta_1\}$ and $P_{\theta_0}(Z \in B) = \alpha$, $P_{\theta_1}(Z \in B) = \beta$, we have

$$R\left(\theta_0; \delta\right) = L_1\alpha \quad \text{and} \quad R\left(\theta_1; \delta\right) = L_2(1 - \beta). \quad (45)$$
The decision function $\delta$ is minimax if

$$\max\left[R\left(\theta_0; \delta\right), R\left(\theta_1; \delta\right)\right] \le \max\left[R\left(\theta_0; \delta^*\right), R\left(\theta_1; \delta^*\right)\right]$$

for any other decision function $\delta^*$.
Regarding the existence of minimax decision functions, we have the result below. The r.v.'s $X_1, \dots, X_n$ form a sample whose p.d.f. is either $f(\cdot; \theta_0)$ or else $f(\cdot; \theta_1)$. By setting $f_0 = f(\cdot; \theta_0)$ and $f_1 = f(\cdot; \theta_1)$, we have

THEOREM 7 Let $X_1, \dots, X_n$ be i.i.d. r.v.'s with p.d.f. $f(\cdot; \theta)$, $\theta \in \Omega = \{\theta_0, \theta_1\}$. We are interested in testing the hypothesis $H : \theta = \theta_0$ against the alternative $A : \theta = \theta_1$ at level $\alpha$. Define the subset $B$ of $\mathbb{R}^n$ as follows: $B = \{z \in \mathbb{R}^n;\ f(z; \theta_1) > C f(z; \theta_0)\}$ and assume that there is a determination of the constant $C$ such that

$$L_1 P_{\theta_0}(Z \in B) = L_2 P_{\theta_1}\left(Z \in B^c\right) \quad \left(\text{equivalently, } R\left(\theta_0; \delta\right) = R\left(\theta_1; \delta\right)\right). \quad (46)$$

Then the decision function $\delta$ defined by

$$\delta(z) = \begin{cases} 1, & \text{if } z \in B \\ 0, & \text{otherwise,} \end{cases} \quad (47)$$

is minimax.
PROOF For simplicity, set $P_0$ and $P_1$ for $P_{\theta_0}$ and $P_{\theta_1}$, respectively, and similarly $R(0; \delta)$ and $R(1; \delta)$ for $R(\theta_0; \delta)$ and $R(\theta_1; \delta)$. Then, by (45),

$$R(0; \delta) = L_1\alpha \quad \text{and} \quad R(1; \delta) = L_2(1 - \beta).$$

Let $A$ be any other (measurable) subset of $\mathbb{R}^n$ and let $\delta^*$ be the corresponding decision function, so that

$$R\left(0; \delta^*\right) = L_1 P_0(Z \in A) \quad \text{and} \quad R\left(1; \delta^*\right) = L_2 P_1\left(Z \in A^c\right).$$

Consider $R(0; \delta)$ and $R(0; \delta^*)$ and suppose that $R(0; \delta^*) \le R(0; \delta)$. This is equivalent to $L_1 P_0(Z \in A) \le L_1 P_0(Z \in B)$, or

$$P_0(Z \in A) \le \alpha.$$

Then Theorem 1 implies that $P_1(Z \in A) \le P_1(Z \in B)$, because the test defined by (47) is MP in the class of all tests of level $\le \alpha$. Hence

$$P_1\left(Z \in A^c\right) \ge P_1\left(Z \in B^c\right), \quad \text{or} \quad L_2 P_1\left(Z \in A^c\right) \ge L_2 P_1\left(Z \in B^c\right),$$

or equivalently, $R(1; \delta^*) \ge R(1; \delta)$. That is, if

$$R\left(0; \delta^*\right) \le R(0; \delta), \quad \text{then} \quad R(1; \delta) \le R\left(1; \delta^*\right). \quad (48)$$
Since by assumption $R(0; \delta) = R(1; \delta)$, we have

$$\max\left[R\left(0; \delta^*\right), R\left(1; \delta^*\right)\right] \ge R\left(1; \delta^*\right) \ge R(1; \delta) = \max\left[R(0; \delta), R(1; \delta)\right], \quad (49)$$

whereas if $R(0; \delta) < R(0; \delta^*)$, then

$$\max\left[R\left(0; \delta^*\right), R\left(1; \delta^*\right)\right] \ge R\left(0; \delta^*\right) > R(0; \delta) = \max\left[R(0; \delta), R(1; \delta)\right]. \quad (50)$$

Relations (49) and (50) show that $\delta$ is minimax, as was to be seen.
REMARK 7 It follows that the minimax decision function $\delta$ defined by (47) is an LR test and, in fact, is the MP test of level $P_0(Z \in B)$ constructed in Theorem 1.
We close this section with a consideration of the Bayesian approach. In connection with this, it is shown that, corresponding to a given probability distribution $\lambda_0$ on $\Omega = \{\theta_0, \theta_1\}$, there is always a Bayes decision function which is actually an LR test. More precisely, we have
THEOREM 8 Let $X_1, \dots, X_n$ be i.i.d. r.v.'s with p.d.f. $f(\cdot; \theta)$, $\theta \in \Omega = \{\theta_0, \theta_1\}$, and let $\lambda_0 = \{p_0, p_1\}$ $(0 < p_0 < 1)$ be a probability distribution on $\Omega$. Then for testing the hypothesis $H : \theta = \theta_0$ against the alternative $A : \theta = \theta_1$, there exists a Bayes decision function $\delta_{\lambda_0}$ corresponding to $\lambda_0 = \{p_0, p_1\}$, that is, a decision rule which minimizes the average risk $R_{\lambda_0}(\delta) = p_0 R(\theta_0; \delta) + p_1 R(\theta_1; \delta)$.

PROOF By virtue of (44), and by employing the simplified notation used in the proof of Theorem 7, we have

$$R_{\lambda_0}(\delta) = p_0 L_1 P_0(Z \in B) + p_1 L_2 P_1\left(Z \in B^c\right)$$
$$= p_0 L_1 P_0(Z \in B) + p_1 L_2\left[1 - P_1(Z \in B)\right]$$
$$= p_1 L_2 + \left[p_0 L_1 P_0(Z \in B) - p_1 L_2 P_1(Z \in B)\right], \quad (51)$$

which is equal to

$$p_1 L_2 + \int_{B}\left[p_0 L_1 f\left(z; \theta_0\right) - p_1 L_2 f\left(z; \theta_1\right)\right] dz$$
for the continuous case and equal to
$$p_1 L_2 + \sum_{z \in B}\left[p_0 L_1 f\left(z; \theta_0\right) - p_1 L_2 f\left(z; \theta_1\right)\right]$$
for the discrete case. In either case, it follows that the $\delta$ which minimizes $R_{\lambda_0}(\delta)$ is given by

$$\delta_{\lambda_0}(z) = \begin{cases} 1, & \text{if } p_0 L_1 f\left(z; \theta_0\right) - p_1 L_2 f\left(z; \theta_1\right) < 0 \\ 0, & \text{otherwise;} \end{cases}$$

equivalently,

$$\delta_{\lambda_0}(z) = \begin{cases} 1, & \text{if } z \in B \\ 0, & \text{otherwise,} \end{cases}$$

where

$$B = \left\{z \in \mathbb{R}^n;\ f\left(z; \theta_1\right) > \frac{p_0 L_1}{p_1 L_2}\, f\left(z; \theta_0\right)\right\},$$

as was to be seen.
REMARK 8 It follows that the Bayesian decision function $\delta_{\lambda_0}$ is an LR test and is, in fact, the MP test for testing $H$ against $A$ at the level $P_0(Z \in B)$, as follows by Theorem 1.
EXAMPLE 13 Let $X_1, \dots, X_n$ be i.i.d. r.v.'s from $N(\theta, 1)$, and let us determine the minimax decision function for testing $H : \theta = \theta_0$ against $A : \theta = \theta_1$ $(\theta_0 < \theta_1)$. Here

$$\frac{f\left(z; \theta_1\right)}{f\left(z; \theta_0\right)} = \exp\left[n\left(\theta_1 - \theta_0\right)\bar{x} - \frac{n}{2}\left(\theta_1^2 - \theta_0^2\right)\right],$$

so that $f(z; \theta_1) > C f(z; \theta_0)$ is equivalent to

$$\exp\left[n\left(\theta_1 - \theta_0\right)\bar{x}\right] > C \exp\left[\frac{n}{2}\left(\theta_1^2 - \theta_0^2\right)\right], \quad \text{or} \quad \bar{x} > C_0,$$

where

$$C_0 = \frac{1}{2}\left(\theta_1 + \theta_0\right) + \frac{\log C}{n\left(\theta_1 - \theta_0\right)} \quad \left(\text{for } \theta_1 > \theta_0\right).$$

Thus relation (46) becomes

$$L_1 P_{\theta_0}\left(\bar{X} > C_0\right) = L_2 P_{\theta_1}\left(\bar{X} \le C_0\right).$$
As a numerical example, take $\theta_0 = 0$, $\theta_1 = 1$, $n = 25$ and $L_1 = 5$, $L_2 = 2.5$. Then

$$L_1 P_{\theta_0}\left(\bar{X} > C_0\right) = L_2 P_{\theta_1}\left(\bar{X} < C_0\right)$$

becomes

$$P_{\theta_1}\left(\bar{X} < C_0\right) = 2 P_{\theta_0}\left(\bar{X} > C_0\right),$$

or

$$P_{\theta_1}\left[\sqrt{n}\left(\bar{X} - 1\right) < 5\left(C_0 - 1\right)\right] = 2 P_{\theta_0}\left[\sqrt{n}\,\bar{X} > 5C_0\right],$$

or

$$\Phi\left(5C_0 - 5\right) = 2\left[1 - \Phi\left(5C_0\right)\right], \quad \text{or} \quad 2\Phi\left(5C_0\right) + \Phi\left(5C_0 - 5\right) = 2.$$

Hence $C_0 = 0.53$, as is found by the Normal tables. Therefore the minimax decision function is given by

$$\delta(z) = \begin{cases} 1, & \text{if } \bar{x} > 0.53 \\ 0, & \text{otherwise.} \end{cases}$$

The type-I error probability of this test is

$$P_{\theta_0}\left(\bar{X} > 0.53\right) = P\left[N(0, 1) > 0.53 \times 5\right] = 1 - \Phi(2.65) = 1 - 0.996 = 0.004,$$
and the power is

$$P_{\theta_1}\left(\bar{X} > 0.53\right) = P\left[N(0, 1) > 5\left(0.53 - 1\right)\right] = \Phi(2.35) = 0.9906.$$

Therefore relation (44) gives

$$R\left(\theta_0; \delta\right) = 5 \times 0.004 = 0.02 \quad \text{and} \quad R\left(\theta_1; \delta\right) = 2.5 \times 0.0094 = 0.0235.$$

Thus

$$\max\left[R\left(\theta_0; \delta\right), R\left(\theta_1; \delta\right)\right] = 0.0235,$$

corresponding to the minimax $\delta$ given above.
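The numbers above can be reproduced with the standard normal d.f. $\Phi$ expressed through the error function; the sketch below evaluates both risks at the table-rounded cut-off $C_0 = 0.53$ (with exact arithmetic the two risks would equalize at the exact root of (46)):

```python
import math

# Risks of the minimax test of Example 13 at the table-rounded cut-off C0 = 0.53.
Phi = lambda u: 0.5 * (1 + math.erf(u / math.sqrt(2)))  # standard normal d.f.

n, L1, L2, C0 = 25, 5.0, 2.5, 0.53
alpha = 1 - Phi(math.sqrt(n) * C0)        # P_0(Xbar > 0.53)
power = 1 - Phi(math.sqrt(n) * (C0 - 1))  # P_1(Xbar > 0.53)
R0 = L1 * alpha                           # risk under theta_0
R1 = L2 * (1 - power)                     # risk under theta_1
print(round(R0, 4), round(R1, 4))
# both are close to the values 0.02 and 0.0235 found above via normal tables
```

With the rounded $C_0$ the two risks nearly, though not exactly, equalize, and the maximum risk is about 0.0235, as stated.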
EXAMPLE 14 Refer to Example 13 and determine the Bayes decision function corresponding to $\lambda_0 = \{p_0, p_1\}$.
From the discussion in the previous example it follows that the Bayes decision function is given by

$$\delta_{\lambda_0}(z) = \begin{cases} 1, & \text{if } \bar{x} > C_0 \\ 0, & \text{otherwise,} \end{cases}$$

where

$$C_0 = \frac{1}{2}\left(\theta_1 + \theta_0\right) + \frac{\log C}{n\left(\theta_1 - \theta_0\right)} \quad \text{and} \quad C = \frac{p_0 L_1}{p_1 L_2}.$$

For the numerical data of Example 13 and $\lambda_0 = \{\frac{2}{3}, \frac{1}{3}\}$, one has $C = 4$ and $C_0 \approx 0.55$, so that

$$\delta_{\lambda_0}(z) = \begin{cases} 1, & \text{if } \bar{x} > 0.55 \\ 0, & \text{otherwise.} \end{cases}$$

The type-I error probability of this test is $1 - \Phi(2.75) = 0.003$ and the power of the test is $P_{\theta_1}\left(\bar{X} > 0.55\right) = P\left[N(0, 1) > -2.25\right] = \Phi(2.25) = 0.9878$. Therefore relation (51) gives that the Bayes risk corresponding to $\lambda_0 = \{\frac{2}{3}, \frac{1}{3}\}$ is equal to 0.0202.
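Likewise, the Bayes cut-off and risk for $\lambda_0 = \{2/3, 1/3\}$ follow directly from $C = p_0 L_1/(p_1 L_2)$; the sketch below uses the unrounded $C_0$, so its result differs from the quoted 0.0202 only through the table rounding used above:

```python
import math

# Bayes cut-off and Bayes risk for Example 14, lambda_0 = {2/3, 1/3}.
Phi = lambda u: 0.5 * (1 + math.erf(u / math.sqrt(2)))

n, L1, L2 = 25, 5.0, 2.5
p0, p1 = 2 / 3, 1 / 3
theta0, theta1 = 0.0, 1.0
C = p0 * L1 / (p1 * L2)                                 # = 4
C0 = (theta0 + theta1) / 2 + math.log(C) / (n * (theta1 - theta0))
alpha = 1 - Phi(math.sqrt(n) * (C0 - theta0))           # type-I error
type2 = Phi(math.sqrt(n) * (C0 - theta1))               # type-II error, 1 - power
bayes_risk = p0 * L1 * alpha + p1 * L2 * type2
print(round(C0, 4), round(bayes_risk, 4))
# C0 is about 0.5555 and the Bayes risk about 0.020, matching the text's 0.0202
```

This also shows how sensitive the reported risk is to rounding the cut-off: the second decimal of $C_0$ barely moves the Bayes risk.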
EXAMPLE 15 Let $X_1, \dots, X_n$ be i.i.d. r.v.'s from $B(1, \theta)$. We are interested in determining the minimax decision function for testing $H : \theta = \theta_0$ against $A : \theta = \theta_1$ $(\theta_0 < \theta_1)$.
We have

$$\frac{f\left(z; \theta_1\right)}{f\left(z; \theta_0\right)} = \left(\frac{\theta_1}{\theta_0}\right)^{x}\left(\frac{1 - \theta_1}{1 - \theta_0}\right)^{n - x}, \quad \text{where } x = \sum_{j=1}^{n} x_j,$$

so that $f(z; \theta_1) > C f(z; \theta_0)$ is equivalent to

$$x \log\frac{\theta_1\left(1 - \theta_0\right)}{\theta_0\left(1 - \theta_1\right)} > C_0, \quad \text{where} \quad C_0 = \log C - n\log\frac{1 - \theta_1}{1 - \theta_0}.$$

Let now $\theta_0 = 0.5$, $\theta_1 = 0.75$, $n = 20$ and $L_1 = 1071/577 \approx 1.856$, $L_2 = 0.5$. Then

$$\frac{\theta_1\left(1 - \theta_0\right)}{\theta_0\left(1 - \theta_1\right)} = 3 \quad (> 1),$$

so the rejection region is of the form $x > C_0'$, where

$$C_0' = \left(\log C - n\log\frac{1 - \theta_1}{1 - \theta_0}\right)\bigg/\log\frac{\theta_1\left(1 - \theta_0\right)}{\theta_0\left(1 - \theta_1\right)}.$$

Next, $X = \sum_{j=1}^{n} X_j$ is $B(n, \theta)$, and for $C_0' = 13$ we have $P_{0.5}(X > 13) = 0.0577$ and $P_{0.75}(X > 13) = 0.7858$, so that $P_{0.75}(X \le 13) = 0.2142$. With the chosen values of $L_1$ and $L_2$, it follows then that relation (46) is satisfied. Therefore the minimax decision function is determined by

$$\delta(z) = \begin{cases} 1, & \text{if } x > 13 \\ 0, & \text{otherwise.} \end{cases}$$

Furthermore, the minimax risk is equal to $0.5 \times 0.2142 = 0.1071$.
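The binomial tail probabilities quoted above can be reproduced exactly, confirming that relation (46) holds for the stated $L_1$, $L_2$; this is a numerical check added here, not part of the original text:

```python
from math import comb

# Exact binomial tails for n = 20: P_theta(X > 13).
def tail_gt(n, k, p):
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(k + 1, n + 1))

n = 20
alpha = tail_gt(n, 13, 0.5)        # P_0.5(X > 13), ~0.0577
type2 = 1 - tail_gt(n, 13, 0.75)   # P_0.75(X <= 13), ~0.2142
L1, L2 = 1071 / 577, 0.5
# relation (46): L1 * alpha = L2 * type2, the common value is the minimax risk
assert abs(L1 * alpha - L2 * type2) < 1e-3
print(round(alpha, 4), round(type2, 4), round(L2 * type2, 4))
```

The check also explains the odd-looking value $L_1 = 1071/577$: it is chosen precisely so that the two weighted errors equalize at the cut-off 13.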
Chapter 14
Sequential Procedures
14.1 Some Basic Theorems of Sequential Sampling

$$\infty > m = E\left|Y_j\right| = E\left[E\left(\left|Y_j\right| \mid N\right)\right] = \sum_{n=1}^{\infty} E\left(\left|Y_j\right| \mid N = n\right)P(N = n)$$
$$= \sum_{n=1}^{j-1} E\left(\left|Y_j\right| \mid N = n\right)P(N = n) + \sum_{n=j}^{\infty} E\left(\left|Y_j\right| \mid N = n\right)P(N = n). \quad (4)$$

Since, for $n \le j - 1$, the event $(N = n)$ is determined by $Y_1, \dots, Y_n$ alone, one has $E\left(\left|Y_j\right| \mid N = n\right) = E\left|Y_j\right| = m$ for such $n$, so that

$$m = m\sum_{n=1}^{j-1} P(N = n) + \sum_{n=j}^{\infty} E\left(\left|Y_j\right| \mid N = n\right)P(N = n),$$
or

$$mP(N \ge j) = \sum_{n=j}^{\infty} E\left(\left|Y_j\right| \mid N = n\right)P(N = n). \quad (5)$$

For $j = 1$, relation (5) is simply

$$mP(N \ge 1) = m = E\left|Y_1\right| = \sum_{n=1}^{\infty} E\left(\left|Y_1\right| \mid N = n\right)P(N = n).$$

Therefore

$$\sum_{n=j}^{\infty} E\left(\left|Y_j\right| \mid N = n\right)P(N = n) = mP(N \ge j), \quad j \ge 1,$$

and hence

$$\sum_{j=1}^{\infty}\sum_{n=j}^{\infty} E\left(\left|Y_j\right| \mid N = n\right)P(N = n) = m\sum_{j=1}^{\infty} P(N \ge j) = m\sum_{j=1}^{\infty} jP(N = j) = mEN. \quad (6)$$
Here it must also be checked that, for nonnegative numbers $p_{jn}$, the sums

$$\sum_{n=1}^{\infty}\sum_{j=1}^{n} p_{jn} = p_{11} + \left(p_{12} + p_{22}\right) + \cdots + \left(p_{1n} + p_{2n} + \cdots + p_{nn}\right) + \cdots$$

and

$$\sum_{j=1}^{\infty}\sum_{n=j}^{\infty} p_{jn} = \left(p_{11} + p_{12} + \cdots\right) + \left(p_{22} + p_{23} + \cdots\right) + \cdots + \left(p_{nn} + p_{n,n+1} + \cdots\right) + \cdots$$

are equal. That this is, indeed, the case follows from part (i) and calculus results (see, for example, T. M. Apostol, Theorem 12-42, page 373, in Mathematical Analysis, Addison-Wesley, 1957).
PROOF OF THEOREM 1 Since $T_N = S_N - N \cdot EY_1$, it suffices to show (3). To this end, writing $Y_j$ for the centered r.v.'s $Y_j - EY_j$ (so that $EY_j = 0$), we have

$$ET_N = E\left[E\left(T_N \mid N\right)\right] = \sum_{n=1}^{\infty} E\left(T_N \mid N = n\right)P(N = n)$$
$$= \sum_{n=1}^{\infty} E\left(\sum_{j=1}^{n} Y_j \,\Big|\, N = n\right)P(N = n) = \sum_{n=1}^{\infty}\sum_{j=1}^{n} E\left(Y_j \mid N = n\right)P(N = n)$$
$$= \sum_{j=1}^{\infty}\sum_{n=j}^{\infty} E\left(Y_j \mid N = n\right)P(N = n). \quad (7)$$

The interchange of the order of summation is legitimate here because
$$\sum_{j=1}^{\infty}\sum_{n=j}^{\infty}\left|E\left(Y_j \mid N = n\right)\right|P(N = n) \le \sum_{j=1}^{\infty}\sum_{n=j}^{\infty} E\left(\left|Y_j\right| \mid N = n\right)P(N = n) < \infty$$

by Lemma 1(i). Next, for $j \ge 1$,
$$0 = EY_j = E\left[E\left(Y_j \mid N\right)\right] = \sum_{n=1}^{\infty} E\left(Y_j \mid N = n\right)P(N = n), \quad (8)$$

so that

$$0 = \sum_{n=1}^{j-1} E\left(Y_j \mid N = n\right)P(N = n) + \sum_{n=j}^{\infty} E\left(Y_j \mid N = n\right)P(N = n) = \sum_{n=j}^{\infty} E\left(Y_j \mid N = n\right)P(N = n). \quad (9)$$
This is so because the event $(N = n)$ depends only on $Y_1, \dots, Y_n$, so that, for $j > n$, $E(Y_j \mid N = n) = EY_j = 0$. Therefore (9) yields

$$\sum_{n=j}^{\infty} E\left(Y_j \mid N = n\right)P(N = n) = 0, \quad j \ge 2, \quad (10)$$

while for $j = 1$ the same conclusion,

$$\sum_{n=j}^{\infty} E\left(Y_j \mid N = n\right)P(N = n) = 0, \quad j \ge 1, \quad (11)$$

follows directly from (8). Summing (11) over $j$, we obtain

$$\sum_{j=1}^{\infty}\sum_{n=j}^{\infty} E\left(Y_j \mid N = n\right)P(N = n) = 0. \quad (12)$$

Relations (7) and (12) then yield $ET_N = 0$, as was to be shown.
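Theorem 1 (Wald's identity, $ES_N = (EN)(EY_1)$) can be illustrated by simulation with a stopped random walk; the particular walk and thresholds below are arbitrary choices made for the illustration:

```python
import random

# Monte Carlo illustration of Wald's identity E S_N = (E N)(E Y_1).
random.seed(7)
mean_Y = 0.6 * 1 + 0.4 * (-1)       # E Y_1 = 0.2 for this two-valued walk
tot_S, tot_N, reps = 0.0, 0, 20000
for _ in range(reps):
    s, n = 0, 0
    while -5 < s < 5:               # N = first time the walk leaves (-5, 5)
        s += 1 if random.random() < 0.6 else -1
        n += 1
    tot_S += s
    tot_N += n
print(round(tot_S / reps, 3), round(mean_Y * tot_N / reps, 3))
# the two averages agree up to Monte Carlo error
```

The stopping rule here is of exactly the two-boundary type treated in Theorem 2 below, so $N$ has finite expectation and the identity applies.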
THEOREM 2 Let $Z_1, Z_2, \dots$ be i.i.d. r.v.'s such that $P(Z_j = 0) \neq 1$. Set $S_n = Z_1 + \cdots + Z_n$ and for two constants $C_1$, $C_2$ with $C_1 < C_2$, define the r. quantity $N$ as the smallest $n$ for which $S_n \le C_1$ or $S_n \ge C_2$; set $N = \infty$ if $C_1 < S_n < C_2$ for all $n$. Then there exist $c > 0$ and $0 < r < 1$ such that

$$P(N \ge n) \le c r^{n} \quad \text{for all } n. \quad (13)$$
PROOF The assumption $P(Z_j = 0) \neq 1$ implies that $P(Z_j > 0) > 0$ or $P(Z_j < 0) > 0$. Let us suppose first that $P(Z_j > 0) > 0$. Then there exists $\varepsilon > 0$ such that $P(Z_j > \varepsilon) > 0$. In fact, if $P(Z_j > \varepsilon) = 0$ for every $\varepsilon > 0$, then, in particular, $P(Z_j > 1/n) = 0$ for all $n$. But $(Z_j > 1/n) \uparrow (Z_j > 0)$, and hence $0 = \lim_n P(Z_j > 1/n) = P(Z_j > 0) > 0$, a contradiction.
Thus, for the case that $P(Z_j > 0) > 0$, we have:

There exists $\varepsilon > 0$ such that $P\left(Z_j > \varepsilon\right) = \delta > 0$. $\quad$ (14)
With $C_1$, $C_2$ as in the theorem and $\varepsilon$ as in (14), there exists a positive integer $m$ such that

$$m\varepsilon > C_2 - C_1. \quad (15)$$

For such an $m$, we shall show that

$$P\left(\sum_{j=k+1}^{k+m} Z_j > C_2 - C_1\right) \ge \delta^{m} \quad \text{for } k \ge 0. \quad (16)$$

We have

$$\bigcap_{j=k+1}^{k+m}\left(Z_j > \varepsilon\right) \subseteq \left(\sum_{j=k+1}^{k+m} Z_j > m\varepsilon\right) \subseteq \left(\sum_{j=k+1}^{k+m} Z_j > C_2 - C_1\right), \quad (17)$$

the first inclusion being obvious because there are $m$ $Z$'s, each one of which is greater than $\varepsilon$, and the second inclusion being true because of (15). Thus

$$P\left(\sum_{j=k+1}^{k+m} Z_j > C_2 - C_1\right) \ge P\left(\bigcap_{j=k+1}^{k+m}\left(Z_j > \varepsilon\right)\right) = \prod_{j=k+1}^{k+m} P\left(Z_j > \varepsilon\right) = \delta^{m},$$

the inequality following from (17) and the equalities being true because of the independence of the $Z$'s and (14).
Clearly,

$$S_{km} = \sum_{j=0}^{k-1}\left[Z_{jm+1} + \cdots + Z_{(j+1)m}\right]. \quad (18)$$

Next,

$$(N \ge km + 1) \subseteq \left(C_1 < S_j < C_2,\ j = 1, \dots, km\right) \subseteq \bigcap_{j=0}^{k-1}\left[Z_{jm+1} + \cdots + Z_{(j+1)m} \le C_2 - C_1\right],$$

the first inclusion being obvious from the definition of $N$ and the second one following from (18). Therefore

$$P(N \ge km + 1) \le P\left(\bigcap_{j=0}^{k-1}\left[Z_{jm+1} + \cdots + Z_{(j+1)m} \le C_2 - C_1\right]\right)$$
$$= \prod_{j=0}^{k-1} P\left[Z_{jm+1} + \cdots + Z_{(j+1)m} \le C_2 - C_1\right] \le \left(1 - \delta^{m}\right)^{k},$$

the last inequality holding true because of (16) and the equality before it by the independence of the $Z$'s. Thus

$$P(N \ge km + 1) \le \left(1 - \delta^{m}\right)^{k}. \quad (19)$$
Now, for any $n$, let $k$ be the integer for which $km + 1 \le n < (k+1)m + 1$. Then

$$P(N \ge n) \le P(N \ge km + 1) \le \left(1 - \delta^{m}\right)^{k} = \frac{1}{1 - \delta^{m}}\left(1 - \delta^{m}\right)^{k+1}$$
$$= \frac{1}{1 - \delta^{m}}\left[\left(1 - \delta^{m}\right)^{1/m}\right]^{(k+1)m} = c\, r^{(k+1)m} \le c\, r^{n},$$

where $c = \left(1 - \delta^{m}\right)^{-1}$ and $r = \left(1 - \delta^{m}\right)^{1/m}$; these inequalities and equalities are true because of the choice of $k$, relation (19) and the definition of $c$ and $r$. Thus for the case that $P(Z_j > 0) > 0$, relation (13) is established. The case $P(Z_j < 0) > 0$ is treated entirely symmetrically, and also leads to (13). (See also Exercise 14.1.2.) The proof of the theorem is then completed.
The theorem just proved has the following important corollary.
COROLLARY Under the assumptions of Theorem 2, we have (i) $P(N < \infty) = 1$ and (ii) $EN < \infty$.
PROOF
i) Set $A = (N = \infty)$ and $A_n = (N \ge n)$. Then, clearly, $A = \bigcap_{n=1}^{\infty} A_n$. Since also $A_1 \supseteq A_2 \supseteq \cdots$, we have $A = \lim_{n\to\infty} A_n$ and hence

$$P(A) = P\left(\lim_{n\to\infty} A_n\right) = \lim_{n\to\infty} P\left(A_n\right) \le \lim_{n\to\infty} c\, r^{n} = 0,$$

by (13), so that $P(N < \infty) = 1$.
ii) By Exercise 14.1.1 and (13),
$$EN = \sum_{n=1}^{\infty} nP(N = n) = \sum_{n=1}^{\infty} P(N \ge n) \le c\sum_{n=1}^{\infty} r^{n} = \frac{cr}{1 - r} < \infty,$$
as was to be seen.
REMARK 2 The r.v. $N$ is positive integer-valued, and it might also take on the value $\infty$, but with probability 0, by the first part of the corollary. On the other hand, from the definition of $N$ it follows that, for each $n$, the event $(N = n)$ depends only on the r.v.'s $Z_1, \dots, Z_n$. Accordingly, $N$ is a stopping time by Definition 1 and Remark 1.
Exercises
14.1.1 For a positive integer-valued r.v. $N$, show that $EN = \sum_{n=1}^{\infty} P(N \ge n)$.
14.1.2 In Theorem 2, assume that P(Zj < 0) > 0 and arrive at relation (13).
14.2 Sequential Probability Ratio Test

$$\lambda_n = \lambda_n\left(X_1, \dots, X_n; \theta_0, \theta_1\right) = \frac{f_1\left(X_1\right) \cdots f_1\left(X_n\right)}{f_0\left(X_1\right) \cdots f_0\left(X_n\right)}.$$
We shall use the same notation $\lambda_n$ for $\lambda_n(x_1, \dots, x_n; \theta_0, \theta_1)$, where $x_1, \dots, x_n$ are the observed values of $X_1, \dots, X_n$.
For testing $H$ against $A$, consider the following sequential procedure: As long as $a < \lambda_n < b$, take another observation; as soon as $\lambda_n \le a$, stop sampling and accept $H$; and as soon as $\lambda_n \ge b$, stop sampling and reject $H$.
By letting $N$ stand for the smallest $n$ for which $\lambda_n \le a$ or $\lambda_n \ge b$, we have that $N$ takes on the values $1, 2, \dots$ and possibly $\infty$, and, clearly, for each $n$, the event $(N = n)$ depends only on $X_1, \dots, X_n$. Under suitable additional assumptions, we shall show that the value $\infty$ is taken on only with probability 0, so that $N$ will be a stopping time.
Then the sequential procedure just described is called a sequential prob-
ability ratio test (SPRT) for obvious reasons.
In what follows, we restrict ourselves to the common set of positivity of
f0 and f1, and for j = 1, . . . , n, set
$$Z_j = Z_j\left(X_j; \theta_0, \theta_1\right) = \log\frac{f_1\left(X_j\right)}{f_0\left(X_j\right)}, \quad \text{so that} \quad \log\lambda_n = \sum_{j=1}^{n} Z_j.$$
Clearly, the $Z_j$'s are i.i.d. since the $X$'s are so, and if $S_n = \sum_{j=1}^{n} Z_j$, then $N$ is redefined as the smallest $n$ for which $S_n \le \log a$ or $S_n \ge \log b$.
At this point, we also make the assumption that $P_i\left[f_0\left(X_1\right) \neq f_1\left(X_1\right)\right] > 0$ for $i = 0, 1$; equivalently, if $C$ is the set over which $f_0$ and $f_1$ differ, then it is assumed that $\int_C f_0(x)\,dx > 0$ and $\int_C f_1(x)\,dx > 0$ for the continuous case. This assumption is equivalent to $P_i\left(Z_1 \neq 0\right) > 0$, under which the corollary to Theorem 2 applies.
Summarizing, we have the following result.
PROPOSITION 1 Let $X_1, X_2, \dots$ be i.i.d. r.v.'s with p.d.f. either $f_0$ or else $f_1$, and suppose that $P_i\left[f_0\left(X_1\right) \neq f_1\left(X_1\right)\right] > 0$, $i = 0, 1$. Set

$$\lambda_n = \frac{f_1\left(X_1\right) \cdots f_1\left(X_n\right)}{f_0\left(X_1\right) \cdots f_0\left(X_n\right)}, \quad Z_j = \log\frac{f_1\left(X_j\right)}{f_0\left(X_j\right)}, \quad j = 1, \dots, n,$$

and

$$S_n = \sum_{j=1}^{n} Z_j = \log\lambda_n.$$

For two numbers $a$ and $b$ with $0 < a < b$, define the random quantity $N$ as the smallest $n$ for which $\lambda_n \le a$ or $\lambda_n \ge b$; equivalently, the smallest $n$ for which $S_n \le \log a$ or $S_n \ge \log b$. Then

$$P_i(N < \infty) = 1 \quad \text{and} \quad E_i N < \infty, \quad i = 0, 1.$$
Thus, the proposition assures us that N is actually a stopping time with
finite expectation, regardless of whether the true density is $f_0$ or $f_1$. The implication of $P_i(N < \infty) = 1$, $i = 0, 1$ is, of course, that the SPRT described above will terminate with probability one under both $H$ and $A$.
With $\alpha$ and $1 - \beta$ denoting the two error probabilities of the SPRT, one has

$$\alpha = P_0\left(\lambda_1 \ge b\right) + P_0\left(a < \lambda_1 < b,\ \lambda_2 \ge b\right) + \cdots + P_0\left(a < \lambda_1 < b, \dots, a < \lambda_{n-1} < b,\ \lambda_n \ge b\right) + \cdots \quad (20)$$

and

$$1 - \beta = P_1\left(\lambda_1 \le a\right) + P_1\left(a < \lambda_1 < b,\ \lambda_2 \le a\right) + \cdots + P_1\left(a < \lambda_1 < b, \dots, a < \lambda_{n-1} < b,\ \lambda_n \le a\right) + \cdots. \quad (21)$$
Relations (20) and (21) allow us to determine theoretically the cut-off points
a and b when and are given.
In order to find workable values of a and b, we proceed as follows. For
each n, set
$$f_{in} = f\left(x_1, \dots, x_n; \theta_i\right), \quad i = 0, 1,$$

and in terms of them, define the sets $T_n$ and $T_n'$ as below; namely,

$$T_1 = \left\{x_1 \in \mathbb{R};\ \frac{f_{11}\left(x_1\right)}{f_{01}\left(x_1\right)} \le a\right\}, \quad T_1' = \left\{x_1 \in \mathbb{R};\ \frac{f_{11}\left(x_1\right)}{f_{01}\left(x_1\right)} \ge b\right\} \quad (22)$$

and for $n \ge 2$,

$$T_n = \left\{\left(x_1, \dots, x_n\right) \in \mathbb{R}^n;\ a < \frac{f_{1j}}{f_{0j}} < b,\ j = 1, \dots, n - 1, \text{ and } \frac{f_{1n}}{f_{0n}} \le a\right\}, \quad (23)$$

$$T_n' = \left\{\left(x_1, \dots, x_n\right) \in \mathbb{R}^n;\ a < \frac{f_{1j}}{f_{0j}} < b,\ j = 1, \dots, n - 1, \text{ and } \frac{f_{1n}}{f_{0n}} \ge b\right\}. \quad (24)$$
14.1 Some Basic
14.2 Theorems
SequentialofProbability
SequentialRatio
Sampling
Test 391
In other words, $T_n$ is the set of points in $\mathbb{R}^n$ for which the SPRT terminates with $n$ observations and accepts $H$, while $T_n'$ is the set of points in $\mathbb{R}^n$ for which the SPRT terminates with $n$ observations and rejects $H$.
In the remainder of this section, the arguments will be carried out for the
case that the Xjs are continuous, the discrete case being treated in the same
way by replacing integrals by summation signs. Also, for simplicity, the differ-
entials in the integrals will not be indicated.
From (20), (22) and (24), one has

$$\alpha = \sum_{n=1}^{\infty}\int_{T_n'} f_{0n}. \quad (25)$$

Clearly,

$$P_i(N = n) = \int_{T_n} f_{in} + \int_{T_n'} f_{in}, \quad i = 0, 1,$$

and, by Proposition 1,

$$1 = \sum_{n=1}^{\infty} P_i(N = n) = \sum_{n=1}^{\infty}\int_{T_n} f_{in} + \sum_{n=1}^{\infty}\int_{T_n'} f_{in}, \quad i = 0, 1. \quad (26)$$
Let $\alpha'$ and $1 - \beta'$ be the two types of errors associated with $a'$ and $b'$. Then, replacing $\alpha$, $\beta$, $a$ and $b$ by $\alpha'$, $\beta'$, $a'$ and $b'$, respectively, in (29), and also taking into consideration (30), we obtain

$$a' = \frac{1 - \beta}{1 - \alpha} \ge \frac{1 - \beta'}{1 - \alpha'} \quad \text{and} \quad \frac{\beta}{\alpha} = b' \le \frac{\beta'}{\alpha'}. \quad (31)$$

That is,

$$\alpha' \le \frac{\alpha}{\beta} \quad \text{and} \quad 1 - \beta' \le \frac{1 - \beta}{1 - \alpha}. \quad (32)$$

From (31) we also have

$$\left(1 - \beta'\right)(1 - \alpha) \le (1 - \beta)\left(1 - \alpha'\right) \quad \text{and} \quad \alpha'\beta \le \alpha\beta',$$

or

$$\left(1 - \beta'\right) + \alpha'(1 - \beta) \le (1 - \beta) + \alpha\left(1 - \beta'\right) \quad \text{and} \quad \alpha' - \alpha'(1 - \beta) \le \alpha - \alpha\left(1 - \beta'\right),$$

and by adding them up,

$$\alpha' + \left(1 - \beta'\right) \le \alpha + (1 - \beta). \quad (33)$$
Summarizing the main points of our derivations, we have the following result.
PROPOSITION 2 For testing $H$ against $A$ by means of the SPRT with prescribed error probabilities $\alpha$ and $1 - \beta$ such that $\alpha < \beta$, the cut-off points $a$ and $b$ are determined by (20) and (21). Relation (30) provides approximate cut-off points $a'$ and $b'$ with corresponding error probabilities $\alpha'$ and $1 - \beta'$, say. Then relation (32) provides upper bounds for $\alpha'$ and $1 - \beta'$, and inequality (33) shows that their sum $\alpha' + (1 - \beta')$ is always bounded above by $\alpha + (1 - \beta)$.
REMARK 3 From (33) it follows that $\alpha' > \alpha$ and $1 - \beta' > 1 - \beta$ cannot happen simultaneously. Furthermore, the typical values of $\alpha$ and $1 - \beta$ are such as 0.01, 0.05 and 0.1, and then it follows from (32) that $\alpha'$ and $1 - \beta'$ lie close to $\alpha$ and $1 - \beta$, respectively. For example, for $\alpha = 0.01$ and $1 - \beta = 0.05$, we have $\alpha' < 0.0106$ and $1 - \beta' < 0.0506$. So there is no serious problem as far as $\alpha'$ and $1 - \beta'$ are concerned. The only problem which may arise is that, because $a'$ and $b'$ are used instead of $a$ and $b$, the resulting $\alpha'$ and $1 - \beta'$ are too small compared to $\alpha$ and $1 - \beta$, respectively. As a consequence, we would be led to taking a much larger number of observations than would actually be needed to obtain the prescribed $\alpha$ and $1 - \beta$. It can be argued that this does not happen.
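The claims of Proposition 2 and Remark 3 can be illustrated by simulating an SPRT; the sketch below does this for $N(\theta, 1)$ observations with $\theta_0 = 0$, $\theta_1 = 1$ (so $Z_j = x_j - \tfrac{1}{2}$), using $a'$, $b'$ from (30). The particular $\alpha$, $1 - \beta$ and the replication count are arbitrary choices:

```python
import math, random

# Simulated SPRT for N(theta, 1), theta0 = 0 vs theta1 = 1; Z_j = x_j - 0.5.
random.seed(1)
alpha, one_minus_beta = 0.05, 0.05
A = math.log(one_minus_beta / (1 - alpha))   # log a'
B = math.log((1 - one_minus_beta) / alpha)   # log b'
reps, rejections = 4000, 0
for _ in range(reps):
    s = 0.0
    while A < s < B:                          # sample until a boundary is crossed
        s += random.gauss(0.0, 1.0) - 0.5     # data generated under H (theta = 0)
    rejections += s >= B
alpha_prime = rejections / reps
print(alpha_prime)
# alpha' should respect the bound alpha/beta ~ 0.0526 of (32), up to MC error
```

As (32) predicts, the realized type-I error stays at or below $\alpha/\beta$; in fact the "overshoot" of $S_N$ past the boundaries typically makes it strictly smaller.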
Exercise
14.2.1 Derive inequality (28) by using arguments similar to the ones em-
ployed in establishing relation (27).
14.3 Optimality of the SPRT-Expected Sample Size
$$E_i N = \sum_{n=1}^{\infty} nP_i(N = n) = 1 \cdot P_i(N = 1) + \sum_{n=2}^{\infty} nP_i(N = n)$$
$$= P_i\left(\lambda_1 \le a \text{ or } \lambda_1 \ge b\right) + \sum_{n=2}^{\infty} nP_i\left(a < \lambda_j < b,\ j = 1, \dots, n - 1,\ \lambda_n \le a \text{ or } \lambda_n \ge b\right), \quad i = 0, 1. \quad (34)$$
Thus formula (34) provides the expected sample size of the SPRT under both
H and A, but the actual calculations are tedious. This suggests that we should
try to find an approximate value to Ei N, as follows. By setting A = log a and
B = log b, we have the relationships below:
$$\left(a < \lambda_j < b,\ j = 1, \dots, n - 1,\ \lambda_n \le a \text{ or } \lambda_n \ge b\right)$$
$$= \left(A < \sum_{i=1}^{j} Z_i < B,\ j = 1, \dots, n - 1,\ \sum_{i=1}^{n} Z_i \le A \text{ or } \sum_{i=1}^{n} Z_i \ge B\right), \quad n \ge 2 \quad (35)$$

and

$$\left(\lambda_1 \le a \text{ or } \lambda_1 \ge b\right) = \left(Z_1 \le A \text{ or } Z_1 \ge B\right). \quad (36)$$
From the right-hand side of (35), all partial sums $\sum_{i=1}^{j} Z_i$, $j = 1, \dots, n - 1$, lie between $A$ and $B$, and it is only the sum $\sum_{i=1}^{n} Z_i$ which is either $\le A$ or $\ge B$, and this is due to the $n$th observation $Z_n$. We would then expect that $\sum_{i=1}^{n} Z_i$ would not be too far away from either $A$ or $B$. Accordingly, by letting $S_N = \sum_{i=1}^{N} Z_i$, we are led to assume, as an approximation, that $S_N$ takes on the values $A$ and $B$ with respective probabilities

$$P_i\left(S_N = A\right) \quad \text{and} \quad P_i\left(S_N = B\right), \quad i = 0, 1.$$
But
( )
P0 S N A = 1 , P0 S N B = ( )
and
( )
P1 S N A = 1 , P1 S N B = . ( )
394 14 Sequential Procedures
Therefore we obtain
E₀S_N ≈ (1 − α)A + αB and E₁S_N ≈ (1 − β)A + βB.  (37)
On the other hand, by assuming that Eᵢ|Z₁| < ∞, i = 0, 1, Theorem 1 gives EᵢS_N = (EᵢN)(EᵢZ₁). Hence, if also EᵢZ₁ ≠ 0, then EᵢN = (EᵢS_N)/(EᵢZ₁). By virtue of (37), this becomes
E₀N ≈ [(1 − α)A + αB]/E₀Z₁ and E₁N ≈ [(1 − β)A + βB]/E₁Z₁.  (38)
Thus we have the following result.
PROPOSITION 3 In the SPRT with error probabilities α and 1 − β, the expected sample size EᵢN, i = 0, 1 is given by (34). If furthermore Eᵢ|Z₁| < ∞ and EᵢZ₁ ≠ 0, i = 0, 1, relation (38) provides approximations to EᵢN, i = 0, 1.
REMARK 4 Actually, in order to be able to calculate the approximations given by (38), it is necessary to replace A and B by their approximate values taken from (30), that is,
A ≈ log a = log[(1 − β)/(1 − α)] and B ≈ log b = log(β/α).  (39)
In utilizing (39), we also assume that α < β < 1, since (30) was derived under this additional (but entirely reasonable) condition.
Exercises
14.3.1 Let X₁, X₂, . . . be independent r.v.s distributed as P(θ), θ ∈ Ω = (0, ∞). Use the SPRT for testing the hypothesis H : θ = 0.03 against the alternative A : θ = 0.05 with α = 0.1, 1 − β = 0.05. Find the expected sample sizes under both H and A and compare them with the fixed sample size of the MP test for testing H against A with the same α and 1 − β as above.
14.3.2 Discuss the same questions as in the previous exercise if the Xⱼ's are independently distributed as Negative Exponential with parameter θ ∈ Ω = (0, ∞).
What we explicitly do is to set up the formal SPRT and, for selected numerical values of α and 1 − β, calculate a, b, upper bounds for α′ and 1 − β′, estimate EᵢN, i = 0, 1, and finally compare the estimated EᵢN, i = 0, 1 with the size of the fixed sample size test with the same error probabilities.
EXAMPLE 1 Let X₁, X₂, . . . be i.i.d. r.v.s with p.d.f.
f(x; θ) = θˣ(1 − θ)^{1−x}, x = 0, 1, θ ∈ Ω = (0, 1).
Then the SPRT prescribes that we continue sampling as long as
[A − n log((1 − θ₁)/(1 − θ₀))] / log[θ₁(1 − θ₀)/(θ₀(1 − θ₁))] < Σⱼ₌₁ⁿ Xⱼ < [B − n log((1 − θ₁)/(1 − θ₀))] / log[θ₁(1 − θ₀)/(θ₀(1 − θ₁))].  (40)
Next,
Z₁ = log[f₁(X₁)/f₀(X₁)] = X₁ log[θ₁(1 − θ₀)/(θ₀(1 − θ₁))] + log[(1 − θ₁)/(1 − θ₀)],
so that
EᵢZ₁ = θᵢ log[θ₁(1 − θ₀)/(θ₀(1 − θ₁))] + log[(1 − θ₁)/(1 − θ₀)], i = 0, 1.  (41)
For a numerical application, take α = 0.01 and 1 − β = 0.05. Then the cut-off points a′ and b′ are approximately equal to a and b, respectively, where a and b are given by (30). In the present case,
a = 0.05/(1 − 0.01) = 0.05/0.99 ≈ 0.0505 and b = 0.95/0.01 = 95.
For the cut-off points a and b, the corresponding error probabilities α′ and 1 − β′ are bounded as follows, according to (32):
α′ ≤ 0.01/0.95 ≈ 0.0105 and 1 − β′ ≤ 0.05/0.99 ≈ 0.0505.
Next, relation (39) gives (logarithms being taken to base 10)
A ≈ log(5/99) = −1.29667 and B ≈ log 95 = 1.97772.  (42)
At this point, let us suppose that θ₀ = 3/8 and θ₁ = 4/8. Then
log[θ₁(1 − θ₀)/(θ₀(1 − θ₁))] = log(5/3) = 0.22185 and log[(1 − θ₁)/(1 − θ₀)] = log(4/5) = −0.09691,
so that by means of (41), we have
E₀Z₁ = −0.013716 and E₁Z₁ = 0.014015.  (43)
Finally, by means of (42) and (43), relation (38) gives
E₀N ≈ 92.5 and E₁N ≈ 129.4.
On the other hand, the MP test for testing H against A based on a fixed
sample size n is given by (9) in Chapter 13. Using the normal approximation,
we find that for the given α = 0.01 and β = 0.95, n has to be equal to 244.05.
Thus both E0N and E1N compare very favorably with it.
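As a quick numerical check, the computations of this example can be reproduced in a few lines. This is only a sketch: it assumes base-10 logarithms (which reproduce the values in (42)) and uses the numbers quoted in the text.

```python
from math import log10

# Assumed values from the text: alpha = 0.01, 1 - beta = 0.05,
# theta0 = 3/8, theta1 = 4/8.
alpha, beta = 0.01, 0.95
theta0, theta1 = 3/8, 4/8

a = (1 - beta) / (1 - alpha)   # cut-off a of (30): 0.05/0.99
b = beta / alpha               # cut-off b of (30): 0.95/0.01
A, B = log10(a), log10(b)      # relation (39)/(42)

# Relation (41): E_i Z_1 = theta_i * log[th1(1-th0)/(th0(1-th1))] + log[(1-th1)/(1-th0)]
slope = log10(theta1 * (1 - theta0) / (theta0 * (1 - theta1)))  # log(5/3)
shift = log10((1 - theta1) / (1 - theta0))                      # log(4/5)
E0Z1 = theta0 * slope + shift
E1Z1 = theta1 * slope + shift

# Relation (38): approximate expected sample sizes under H and A
E0N = ((1 - alpha) * A + alpha * B) / E0Z1
E1N = ((1 - beta) * A + beta * B) / E1Z1
print(round(E0N, 1), round(E1N, 1))
```

Up to the rounding of the logarithms, this recovers the values quoted above, both far below the fixed sample size 244.05.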
EXAMPLE 2 Let X₁, X₂, . . . be i.i.d. r.v.s with p.d.f. that of N(θ, 1). Then
λₙ = exp[(θ₁ − θ₀) Σⱼ₌₁ⁿ Xⱼ − (n/2)(θ₁² − θ₀²)]
and we continue sampling as long as
[A + (n/2)(θ₁² − θ₀²)]/(θ₁ − θ₀) < Σⱼ₌₁ⁿ Xⱼ < [B + (n/2)(θ₁² − θ₀²)]/(θ₁ − θ₀).  (44)
Next,
Z₁ = log[f₁(X₁)/f₀(X₁)] = (θ₁ − θ₀)X₁ − (1/2)(θ₁² − θ₀²),
so that
EᵢZ₁ = θᵢ(θ₁ − θ₀) − (1/2)(θ₁² − θ₀²), i = 0, 1.  (45)
By using the same values of α and 1 − β as in the previous example, we have the same A and B as before. Taking θ₀ = 0 and θ₁ = 1, we have
E₀Z₁ = −0.5 and E₁Z₁ = 0.5.
Thus relation (38) gives
E₀N ≈ 2.53 and E₁N ≈ 3.63.
Now the fixed sample size MP test is given by (13) in Chapter 13. From this
we find that n ≈ 15.84. Again both E0N and E1N compare very favorably with
the fixed value of n which provides the same protection.
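Plugging the text's values into (38) reproduces these figures. This is a sketch only: A and B are taken directly from (42), and E₀Z₁, E₁Z₁ from (45) with θ₀ = 0, θ₁ = 1.

```python
# Values quoted in the text
A, B = -1.29667, 1.97772        # from (42)
alpha, beta = 0.01, 0.95
E0Z1, E1Z1 = -0.5, 0.5          # from (45) with theta0 = 0, theta1 = 1

# Relation (38)
E0N = ((1 - alpha) * A + alpha * B) / E0Z1
E1N = ((1 - beta) * A + beta * B) / E1Z1
print(round(E0N, 2), round(E1N, 2))
```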
Chapter 15
Confidence Regions – Tolerance Intervals

15.1 Confidence Intervals
Let X₁, . . . , Xₙ be i.i.d. r.v.s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ. The r. interval [L(X₁, . . . , Xₙ), U(X₁, . . . , Xₙ)] is called a confidence interval for θ with confidence coefficient 1 − α (0 < α < 1) if
P_θ[L(X₁, . . . , Xₙ) ≤ θ ≤ U(X₁, . . . , Xₙ)] ≥ 1 − α for all θ ∈ Ω.  (1)
Also we say that U(X₁, . . . , Xₙ) and L(X₁, . . . , Xₙ) is an upper and a lower confidence limit for θ, respectively, with confidence coefficient 1 − α, if, for all θ ∈ Ω,
P_θ[θ < U(X₁, . . . , Xₙ)] ≥ 1 − α
and
P_θ[L(X₁, . . . , Xₙ) < θ] ≥ 1 − α.  (2)
Thus the r. interval [L(X₁, . . . , Xₙ), U(X₁, . . . , Xₙ)] is a confidence interval for θ with confidence coefficient 1 − α, if the probability is at least 1 − α that the r. interval contains θ, whatever the true value of θ in Ω.
REMARK 1 Since, for all θ ∈ Ω,
P_θ[L(X₁, . . . , Xₙ) ≤ θ] + P_θ[θ ≤ U(X₁, . . . , Xₙ)]
= P_θ[L(X₁, . . . , Xₙ) ≤ θ ≤ U(X₁, . . . , Xₙ)] + 1,
it follows that, if L(X₁, . . . , Xₙ) and U(X₁, . . . , Xₙ) is a lower and an upper confidence limit for θ, respectively, each with confidence coefficient 1 − (α/2), then [L(X₁, . . . , Xₙ), U(X₁, . . . , Xₙ)] is a confidence interval for θ with confidence coefficient 1 − α. The length l(X₁, . . . , Xₙ) of this confidence interval is l = l(X₁, . . . , Xₙ) = U(X₁, . . . , Xₙ) − L(X₁, . . . , Xₙ), and the expected length is E_θ l, if it exists.
Now it is quite possible that there exists more than one confidence interval for θ with the same confidence coefficient 1 − α. In such a case, it is obvious
that we would be interested in finding the shortest confidence interval within a
certain class of confidence intervals. This will be done explicitly in a number of
interesting examples.
At this point, it should be pointed out that a general procedure for
constructing a confidence interval is as follows: We start out with an r.v.
Tₙ(θ) = T(X₁, . . . , Xₙ; θ) which depends on θ and on the X's only through a sufficient statistic of θ, and whose distribution, under P_θ, is completely determined. Then Lₙ = L(X₁, . . . , Xₙ) and Uₙ = U(X₁, . . . , Xₙ) are some rather simple functions of Tₙ(θ) which are chosen in an obvious manner.
The examples which follow illustrate the point.
Exercise
15.1.1 Establish the relation claimed in Remark 1 above.
15.2 Some Examples
In each of the examples below, we construct a confidence interval (and also the shortest confidence interval within a certain class) for θ with confidence coefficient 1 − α.
EXAMPLE 1 Let X₁, . . . , Xₙ be i.i.d. r.v.s from N(μ, σ²). First, suppose that σ is known, so that μ is the parameter, and consider the r.v. Tₙ(μ) = √n(X̄ − μ)/σ. Then Tₙ(μ) depends on the X's only through the sufficient statistic X̄ of μ, and its distribution is N(0, 1) for all μ.
Next, determine two numbers a and b (a < b) such that
P[a ≤ N(0, 1) ≤ b] = 1 − α.  (3)
From (3), we have
P_μ[a ≤ √n(X̄ − μ)/σ ≤ b] = 1 − α,
which is equivalent to
P_μ[X̄ − bσ/√n ≤ μ ≤ X̄ − aσ/√n] = 1 − α.
Therefore
[X̄ − bσ/√n, X̄ − aσ/√n]  (4)
is a confidence interval for μ with confidence coefficient 1 − α. Its length is equal to (b − a)σ/√n. From this it follows that, among all confidence intervals with confidence coefficient 1 − α which are of the form (4), the shortest one is that for which b − a is smallest, where a and b satisfy (3). It can be seen (see also Exercise 15.2.1) that this happens if b = c (> 0) and a = −c, where c is the upper α/2 quantile of the N(0, 1) distribution, which we denote by z_{α/2}. Therefore the shortest confidence interval for μ with confidence coefficient 1 − α (and which is of the form (4)) is given by
[X̄ − z_{α/2}σ/√n, X̄ + z_{α/2}σ/√n].  (5)
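A minimal sketch of interval (5) in code, using only Python's standard library; the data and the known σ below are made up for illustration:

```python
from statistics import NormalDist
from math import sqrt

def normal_mean_ci(xs, sigma, conf=0.95):
    """Shortest interval (5): [xbar - z*sigma/sqrt(n), xbar + z*sigma/sqrt(n)],
    sigma known; z is the upper alpha/2 quantile of N(0, 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half = z * sigma / sqrt(n)
    return xbar - half, xbar + half

# illustrative data; sigma assumed known to be 2
lo, hi = normal_mean_ci([4.1, 5.3, 3.8, 4.9, 5.0, 4.4, 4.6, 5.1, 4.2], sigma=2)
```

With 1 − α = 0.95, `z` is the familiar 1.96, and the interval is centered at the sample mean.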
Next, assume that μ is known, so that σ² is the parameter, and consider the r.v.
Tₙ(σ²) = nSₙ²/σ², where Sₙ² = (1/n) Σⱼ₌₁ⁿ (Xⱼ − μ)².
Then Tₙ(σ²) depends on the X's only through the sufficient statistic Sₙ² of σ², and its distribution is χₙ² for all σ².
Now determine two numbers a and b (0 < a < b) such that
P(a ≤ χₙ² ≤ b) = 1 − α.  (6)
From (6), we have
P_{σ²}(a ≤ nSₙ²/σ² ≤ b) = 1 − α,
which is equivalent to
P_{σ²}(nSₙ²/b ≤ σ² ≤ nSₙ²/a) = 1 − α.
Therefore
[nSₙ²/b, nSₙ²/a]  (7)
is a confidence interval for σ² with confidence coefficient 1 − α, and its length is equal to (1/a − 1/b)nSₙ². The expected length is equal to (1/a − 1/b)nσ².
Now, although there are infinitely many pairs of numbers a and b satisfying (6), in practice they are often chosen by assigning mass α/2 to each one of the tails of the χₙ² distribution. However, this is not the best choice, because then the corresponding interval (7) is not the shortest one. For the determination of the shortest confidence interval, we work as follows. From (6), it is obvious that a and b are not independent of each other; the one is a function of the other. So let b = b(a). Since the length of the confidence interval in (7) is l = (1/a − 1/b)nSₙ², it follows that the a for which l is shortest is given by dl/da = 0, which is equivalent to
db/da = b²/a².  (8)
Now, letting Gₙ and gₙ be the d.f. and the p.d.f. of the χₙ² distribution, relation (6) becomes Gₙ(b) − Gₙ(a) = 1 − α. Differentiating it with respect to a, one obtains
gₙ(b)(db/da) − gₙ(a) = 0, or db/da = gₙ(a)/gₙ(b).
Thus (8) becomes a²gₙ(a) = b²gₙ(b). By means of this result and (6), it follows that a and b are determined by
a²gₙ(a) = b²gₙ(b) and ∫ₐᵇ gₙ(t)dt = 1 − α.  (9)
For the numerical solution of (9), tables are required. Such tables are available (see Table 678 in R. F. Tate and G. W. Klett, "Optimum confidence intervals for the variance of a normal distribution," Journal of the American Statistical Association, 1959, Vol. 54, pp. 674-682) for n = 2(1)29 and 1 − α = 0.90, 0.95, 0.99, 0.995, 0.999.
To summarize then, the shortest (both in actual and expected length) confidence interval for σ² with confidence coefficient 1 − α (and which is of the form (7)) is given by
[nSₙ²/b, nSₙ²/a],
with a and b determined by (9). For instance, for n = 25 and 1 − α = 0.95, the equal-tails confidence interval is
[25S₂₅²/40.646, 25S₂₅²/13.120],
while the shortest confidence interval is
[25S₂₅²/45.7051, 25S₂₅²/14.2636],
and the ratio of their lengths is approximately 1.07.
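Since both lengths are proportional to (1/a − 1/b)nSₙ², the factor nSₙ² cancels and the length ratio can be checked directly from the four quantiles quoted above (the 13.120 and 40.646 equal-tails χ²₂₅ values, and the 14.2636 and 45.7051 Tate-Klett values):

```python
# Length ratio of the equal-tails to the shortest interval for sigma^2 (n = 25)
a_eq, b_eq = 13.120, 40.646      # equal-tails chi-square(25) quantiles
a_sh, b_sh = 14.2636, 45.7051    # Tate-Klett (shortest-interval) values
ratio = (1/a_eq - 1/b_eq) / (1/a_sh - 1/b_sh)
print(round(ratio, 2))
```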
EXAMPLE 2 Let X₁, . . . , Xₙ be i.i.d. r.v.s from the Gamma distribution with parameter β = θ and α known, a positive integer, call it r. Then Σⱼ₌₁ⁿ Xⱼ is a sufficient statistic for θ (see Exercise 11.1.2(iii), Chapter 11). Furthermore, for each j = 1, . . . , n, the r.v. 2Xⱼ/θ is χ²₂ᵣ, since
φ_{2Xⱼ/θ}(t) = φ_{Xⱼ}(2t/θ) = 1/(1 − 2it)^{2r/2}  (see Chapter 6).
Therefore
Tₙ(θ) = (2/θ) Σⱼ₌₁ⁿ Xⱼ
is χ²₂ᵣₙ distributed for all θ > 0. Now determine a and b (0 < a < b) such that
P(a ≤ χ²₂ᵣₙ ≤ b) = 1 − α.  (10)
Working as in Example 1, a confidence interval for θ with confidence coefficient 1 − α is then given by
[2 Σⱼ₌₁ⁿ Xⱼ/b, 2 Σⱼ₌₁ⁿ Xⱼ/a].  (11)
Its length and expected length are, respectively,
l = 2(1/a − 1/b) Σⱼ₌₁ⁿ Xⱼ and E_θ l = 2rnθ(1/a − 1/b).
As in the second part of Example 1, it follows that the equal-tails confidence
interval, which is customarily employed, is not the shortest among those of the
form (11).
In order to determine the shortest confidence interval, one has to minimize l subject to (10). But this is the same problem as the one we solved in the second part of Example 1. It follows then that the shortest (both in actual and expected length) confidence interval with confidence coefficient 1 − α (which is of the form (11)) is given by (11) with a and b determined by
a²g₂ᵣₙ(a) = b²g₂ᵣₙ(b) and ∫ₐᵇ g₂ᵣₙ(t)dt = 1 − α.
For example, for r = 2, n = 7 (so that 2rn = 28 degrees of freedom) and 1 − α = 0.95, the shortest confidence interval is
[2 Σⱼ₌₁⁷ Xⱼ/49.3675, 2 Σⱼ₌₁⁷ Xⱼ/16.5128].
The equal-tails confidence interval is
[2 Σⱼ₌₁⁷ Xⱼ/44.461, 2 Σⱼ₌₁⁷ Xⱼ/15.308],
so that the ratio of their lengths is approximately equal to 1.075.
EXAMPLE 3 Let X₁, . . . , Xₙ be i.i.d. r.v.s from the Beta distribution with β = 1 and α = θ unknown. Then Πⱼ₌₁ⁿ Xⱼ, or Σⱼ₌₁ⁿ log Xⱼ, is a sufficient statistic for θ. (See Exercise 11.1.2(iv) in Chapter 11.) Consider the r.v. Yⱼ = −2θ log Xⱼ. It is easily seen that its p.d.f. is (1/2)exp(−yⱼ/2), yⱼ > 0, which is the p.d.f. of a χ₂². This shows that
Tₙ(θ) = −2θ Σⱼ₌₁ⁿ log Xⱼ = Σⱼ₌₁ⁿ Yⱼ
is distributed as χ²₂ₙ, and we determine a and b (0 < a < b) such that
P(a ≤ χ²₂ₙ ≤ b) = 1 − α.  (12)
From (12), we have
P_θ(a ≤ −2θ Σⱼ₌₁ⁿ log Xⱼ ≤ b) = 1 − α,
which is equivalent to
P_θ[−a/(2 Σⱼ₌₁ⁿ log Xⱼ) ≤ θ ≤ −b/(2 Σⱼ₌₁ⁿ log Xⱼ)] = 1 − α.
Therefore a confidence interval for θ with confidence coefficient 1 − α is given by
[−a/(2 Σⱼ₌₁ⁿ log Xⱼ), −b/(2 Σⱼ₌₁ⁿ log Xⱼ)].  (13)
(Note that Σⱼ₌₁ⁿ log Xⱼ < 0, so both endpoints are positive.) Its length is equal to
l = −(b − a)/(2 Σⱼ₌₁ⁿ log Xⱼ).
n
Considering dl/da = 0 in conjunction with (12), in the same way as it was done in Example 2, we have that the shortest (both in actual and expected length) confidence interval (which is of the form (13)) is found by numerically solving the equations
g₂ₙ(a) = g₂ₙ(b) and ∫ₐᵇ g₂ₙ(t)dt = 1 − α.
EXAMPLE 4 Let X₁, . . . , Xₙ be i.i.d. r.v.s from U(0, θ), θ ∈ Ω = (0, ∞), and set Yₙ = X₍ₙ₎ for the largest order statistic. Then the p.d.f. gₙ of Yₙ is given by
gₙ(yₙ) = (n/θⁿ) yₙ^{n−1}, 0 ≤ yₙ ≤ θ (by Example 3, Chapter 10).
Consider the r.v. Tₙ(θ) = Yₙ/θ. Its p.d.f. is easily seen to be given by
hₙ(t) = n t^{n−1}, 0 ≤ t ≤ 1.
Next, determine a and b (0 < a < b ≤ 1) such that
P[a ≤ Tₙ(θ) ≤ b] = ∫ₐᵇ n t^{n−1} dt = bⁿ − aⁿ = 1 − α.  (14)
Then [X₍ₙ₎/b, X₍ₙ₎/a] is a confidence interval for θ with confidence coefficient 1 − α, and its length is l = X₍ₙ₎(1/a − 1/b).  (15)
Differentiating l with respect to b, we get
dl/db = X₍ₙ₎[−(1/a²)(da/db) + 1/b²],
while by way of (14), da/db = b^{n−1}/a^{n−1}, so that
dl/db = X₍ₙ₎ (a^{n+1} − b^{n+1})/(b²a^{n+1}).
Since this is less than 0 for all b, l is decreasing as a function of b, and its minimum is obtained for b = 1, in which case a = α^{1/n}, by means of (14).
Therefore the shortest (both in actual and expected length) confidence interval with confidence coefficient 1 − α (which is of the form (15)) is given by
[X₍ₙ₎, X₍ₙ₎/α^{1/n}].
For example, for n = 32 and 1 − α = 0.95, we have approximately
[X₍₃₂₎, 1.098X₍₃₂₎].
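The 1.098 factor follows immediately from a = α^{1/n} with b = 1 (a one-line check):

```python
# Shortest interval [X(n), X(n)/a] for U(0, theta): b = 1 forces a = alpha**(1/n)
n, alpha = 32, 0.05
a = alpha ** (1 / n)
factor = 1 / a          # upper endpoint is factor * X(n)
print(round(factor, 3))
```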
Exercises 15.2.5-15.2.7 at the end of this section are treated along the same lines as the examples already discussed and provide additional interesting cases where shortest confidence intervals exist. The inclusion of the discussions in relation to shortest confidence intervals in the previous examples, and the exercises just mentioned, has been motivated by a paper by W. C. Guenther on "Shortest confidence intervals" in The American Statistician, 1969, Vol. 23, Number 1.
Exercises
15.2.1 Let Φ be the d.f. of the N(0, 1) distribution, and let a and b with a < b be such that Φ(b) − Φ(a) = γ (0 < γ < 1). Show that b − a is minimum if b = c (> 0) and a = −c. (See also the discussion of the second part of Example 1.)
15.2.2 Let X₁, . . . , Xₙ be independent r.v.s having the Negative Exponential distribution with parameter θ ∈ Ω = (0, ∞), and set U = Σᵢ₌₁ⁿ Xᵢ.
i) Show that the r.v. U is distributed as Gamma with parameters (n, θ) and that the r.v. 2U/θ is distributed as χ²₂ₙ;
ii) Use part (i) to construct a confidence interval for θ with confidence coefficient 1 − α. (Hint: Use the parametrization f(x; θ) = (1/θ)e^{−x/θ}, x > 0.)
15.2.3
i) If the r.v. X has the Negative Exponential distribution with parameter θ ∈ Ω = (0, ∞), show that the reliability R(x; θ) = P(X > x) (x > 0) is equal to e^{−x/θ};
ii) If X₁, . . . , Xₙ is a random sample from the distribution in part (i) and U = Σᵢ₌₁ⁿ Xᵢ, then (by Exercise 15.2.2(i)) 2U/θ is distributed as χ²₂ₙ. Use this fact and part (i) of this exercise to construct a confidence interval for R(x; θ) with confidence coefficient 1 − α. (Hint: Use the parametrization f(x; θ) = (1/θ)e^{−x/θ}, x > 0.)
15.2.4 Refer to Example 4 and set R = X₍ₙ₎ − X₍₁₎. Then:
i) Find the distribution of R;
ii) Show that a confidence interval for θ, based on R, with confidence coefficient 1 − α is of the form [R, R/c], where c is a root of the equation
c^{n−1}[n − (n − 1)c] = α;
iii) Show that the expected length of the shortest confidence interval in Example 4 is shorter than that of the confidence interval in (ii) above.
15.2.5 Let X₁, . . . , Xₙ be i.i.d. r.v.s with p.d.f. given by
f(x; θ) = e^{−(x−θ)}I_{(θ,∞)}(x), θ ∈ Ω = ℝ,
and set Y₁ = X₍₁₎. Then show that:
i) The p.d.f. g of Y₁ is given by g(y) = ne^{−n(y−θ)}I_{(θ,∞)}(y);
ii) The r.v. Tₙ(θ) = 2n(Y₁ − θ) is distributed as χ₂²;
iii) A confidence interval for θ, based on Tₙ(θ), with confidence coefficient 1 − α is of the form [Y₁ − (b/2n), Y₁ − (a/2n)];
iv) The shortest confidence interval of the form given in (iii) is provided by
[Y₁ − χ²₂;α/(2n), Y₁],
where χ²₂;α is the upper αth quantile of the χ₂² distribution.
15.2.6 Let X₁, . . . , Xₙ be independent r.v.s having the Weibull p.d.f. given in Exercise 11.4.2, Chapter 11. Then show that:
i) The r.v. Tₙ(θ) = 2Y/θ is distributed as χ²₂ₙ, where Y = Σⱼ₌₁ⁿ Xⱼᵞ (γ being the shape parameter in that p.d.f.);
ii) A confidence interval for θ, based on Tₙ(θ), with confidence coefficient 1 − α is of the form [2Y/b, 2Y/a];
iii) The shortest confidence interval of the form given in (ii) is taken for a and b satisfying the equations
∫ₐᵇ g₂ₙ(t)dt = 1 − α and a²g₂ₙ(a) = b²g₂ₙ(b).
15.2.7 Let X₁, . . . , Xₙ be i.i.d. r.v.s with p.d.f. given by
f(x; θ) = (1/2θ)e^{−|x|/θ}, θ ∈ Ω = (0, ∞).
Then show that:
i) The r.v. Tₙ(θ) = 2Y/θ is distributed as χ²₂ₙ, where Y = Σⱼ₌₁ⁿ |Xⱼ|;
ii) and (iii) as in Exercise 15.2.6.
15.2.8 Consider the independent random samples X₁, . . . , Xₘ from N(μ₁, σ₁²) and Y₁, . . . , Yₙ from N(μ₂, σ₂²), where σ₁, σ₂ are known and μ₁, μ₂ are unknown, and let the r.v. T_{m,n}(μ₁ − μ₂) be defined by
T_{m,n}(μ₁ − μ₂) = [(X̄ₘ − Ȳₙ) − (μ₁ − μ₂)] / √(σ₁²/m + σ₂²/n).
Then show that:
i) A confidence interval for μ₁ − μ₂, based on T_{m,n}(μ₁ − μ₂), with confidence coefficient 1 − α is given by
[X̄ₘ − Ȳₙ − b√(σ₁²/m + σ₂²/n), X̄ₘ − Ȳₙ − a√(σ₁²/m + σ₂²/n)],
where a and b are such that Φ(b) − Φ(a) = 1 − α;
ii) The shortest confidence interval of the aforementioned form is provided by the last expression above with −a = b = z_{α/2}.
15.2.9 Refer to Exercise 15.2.8, but now suppose that μ₁, μ₂ are known and σ₁, σ₂ are unknown. Consider the r.v.
T_{m,n}(σ₁²/σ₂²) = (σ₁²/σ₂²)(Sₙ²/Sₘ²)
and show that a confidence interval for σ₁²/σ₂², based on T_{m,n}(σ₁²/σ₂²), with confidence coefficient 1 − α is given by
[a Sₘ²/Sₙ², b Sₘ²/Sₙ²],
where 0 < a < b are such that P(a ≤ F_{n,m} ≤ b) = 1 − α. In particular, the equal-tails confidence interval is provided by the last expression above with a = F′_{n,m;α/2} and b = F_{n,m;α/2}, where F′_{n,m;α/2} and F_{n,m;α/2} are the lower and the upper α/2 quantiles, respectively, of F_{n,m}.
15.2.10 Let X₁, . . . , Xₘ and Y₁, . . . , Yₙ be independent random samples from the Negative Exponential distributions with parameters θ₁ and θ₂, respectively, and set U = Σᵢ₌₁ᵐ Xᵢ, V = Σⱼ₌₁ⁿ Yⱼ. Then (by Exercise 15.2.2(i)) the independent r.v.s 2U/θ₁ and 2V/θ₂ are distributed as χ²₂ₘ and χ²₂ₙ, respectively, so that the r.v. [(2V/θ₂)/2n] / [(2U/θ₁)/2m] is distributed as F₂ₙ,₂ₘ. Use this result in order to construct a confidence interval for θ₁/θ₂ with confidence coefficient 1 − α. (Hint: Employ the parametrization used in Exercise 15.2.2.)
where t_{n−1;α/2} is the upper α/2 quantile of the t_{n−1} distribution. For instance, for n = 25 and 1 − α = 0.95, the corresponding confidence interval for μ is taken from (17) with t₂₄;₀.₀₂₅ = 2.0639. Thus we have approximately [X̄ₙ − 0.41278S₂₄, X̄ₙ + 0.41278S₂₄].
Suppose now that we wish to construct a confidence interval for σ². To this end, modify the r.v. Tₙ(σ²) of Example 1 as follows:
Tₙ(σ²) = (n − 1)S²ₙ₋₁/σ²,
which is distributed as χ²ₙ₋₁. Working as in the second part of Example 1, the shortest confidence interval of the form [(n − 1)S²ₙ₋₁/b, (n − 1)S²ₙ₋₁/a] is obtained for a and b satisfying
a²gₙ₋₁(a) = b²gₙ₋₁(b) and ∫ₐᵇ gₙ₋₁(t)dt = 1 − α.
Thus with n and 1 − α as above, one has, by means of the tables cited in Example 1, a = 13.5227 and b = 44.4802, so that the corresponding interval approximately is equal to [0.539S²₂₄, 1.775S²₂₄].
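Both numerical intervals above can be reproduced from the quoted quantiles; a quick sketch (the t and Tate-Klett values are the ones cited in the text, and small rounding differences in the last digit are to be expected):

```python
# n = 25, 1 - alpha = 0.95
t_quant = 2.0639                      # t_{24;0.025}
coeff = t_quant / 25 ** 0.5           # multiplies S_24 in the interval for mu

a, b = 13.5227, 44.4802               # Tate-Klett values for chi-square(24)
lo_coeff, hi_coeff = 24 / b, 24 / a   # multiply S^2_24 in the interval for sigma^2
print(coeff, lo_coeff, hi_coeff)
```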
EXAMPLE 6 Consider the independent r. samples X₁, . . . , Xₘ from N(μ₁, σ₁²) and Y₁, . . . , Yₙ from N(μ₂, σ₂²), where all μ₁, μ₂, σ₁ and σ₂ are unknown.
First, suppose that a confidence interval for μ₁ − μ₂ is desired. For this purpose, we have to assume that σ₁ = σ₂ = σ, say (unspecified). Consider the r.v.
T_{m,n}(μ₁ − μ₂) = [(X̄ₘ − Ȳₙ) − (μ₁ − μ₂)] / √{[(m − 1)S²ₘ₋₁ + (n − 1)S²ₙ₋₁]/(m + n − 2) · (1/m + 1/n)}.
Then T_{m,n}(μ₁ − μ₂) is distributed as t_{m+n−2}. Thus, as in the first case of Example 1 (and also Example 5), the shortest (both in actual and expected length) confidence interval based on T_{m,n}(μ₁ − μ₂) is given by
[(X̄ₘ − Ȳₙ) − t_{m+n−2;α/2}√{[(m − 1)S²ₘ₋₁ + (n − 1)S²ₙ₋₁]/(m + n − 2) · (1/m + 1/n)},
(X̄ₘ − Ȳₙ) + t_{m+n−2;α/2}√{[(m − 1)S²ₘ₋₁ + (n − 1)S²ₙ₋₁]/(m + n − 2) · (1/m + 1/n)}].
For instance, for m = 13, n = 14 and 1 − α = 0.95, we have t₂₅;₀.₀₂₅ = 2.0595, so that the corresponding interval approximately is equal to
[(X̄₁₃ − Ȳ₁₄) − 0.1586√(12S₁₂² + 13S₁₃²), (X̄₁₃ − Ȳ₁₄) + 0.1586√(12S₁₂² + 13S₁₃²)].
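The 0.1586 coefficient is t₂₅;₀.₀₂₅ √{(1/m + 1/n)/(m + n − 2)}, which multiplies √(12S₁₂² + 13S₁₃²); a one-line check:

```python
from math import sqrt

m, n = 13, 14
t_quant = 2.0595  # t_{25;0.025}, as quoted in the text
coeff = t_quant * sqrt((1/m + 1/n) / (m + n - 2))
print(round(coeff, 4))
```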
If our interest lies in constructing a confidence interval for σ₁²/σ₂², we consider the r.v.
T_{m,n}(σ₁²/σ₂²) = (σ₁²/σ₂²)(S²ₙ₋₁/S²ₘ₋₁),
which is distributed as F_{n−1,m−1}. Now determine two numbers a and b with 0 < a < b and such that
P(a ≤ F_{n−1,m−1} ≤ b) = 1 − α.
Then
P[a ≤ (σ₁²/σ₂²)(S²ₙ₋₁/S²ₘ₋₁) ≤ b] = 1 − α,
or
P[a S²ₘ₋₁/S²ₙ₋₁ ≤ σ₁²/σ₂² ≤ b S²ₘ₋₁/S²ₙ₋₁] = 1 − α.
Therefore a confidence interval for σ₁²/σ₂² is given by
[a S²ₘ₋₁/S²ₙ₋₁, b S²ₘ₋₁/S²ₙ₋₁].
In particular, the equal-tails confidence interval is provided by
[F′_{n−1,m−1;α/2} S²ₘ₋₁/S²ₙ₋₁, F_{n−1,m−1;α/2} S²ₘ₋₁/S²ₙ₋₁],
where F′_{n−1,m−1;α/2} and F_{n−1,m−1;α/2} are the lower and the upper α/2-quantiles of F_{n−1,m−1}. The point F_{n−1,m−1;α/2} is read off the F-tables, and the point F′_{n−1,m−1;α/2} is given by
F′_{n−1,m−1;α/2} = 1/F_{m−1,n−1;α/2}.
Thus, for the previous values of m, n and 1 − α, we have F₁₃,₁₂;₀.₀₂₅ = 3.2388 and F₁₂,₁₃;₀.₀₂₅ = 3.1532, so that the corresponding interval approximately is equal to
[0.3171 S₁₂²/S₁₃², 3.2388 S₁₂²/S₁₃²].
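The lower quantile 0.3171 is just the reciprocal of the upper quantile with the degrees of freedom interchanged:

```python
# F'_{13,12;0.025} = 1 / F_{12,13;0.025}
F_12_13_upper = 3.1532     # F_{12,13;0.025}, as quoted in the text
F_13_12_lower = 1 / F_12_13_upper
print(round(F_13_12_lower, 4))
```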
Exercise
15.3.1 Let X₁, . . . , Xₙ be independent r.v.s distributed as N(μ, σ²). Derive a confidence interval for σ with confidence coefficient 1 − α when μ is unknown.
15.4 Confidence Regions – Approximate Confidence Intervals
Suppose that X₁, . . . , Xₙ are i.i.d. r.v.s from N(μ, σ²), where both μ and σ² are unknown, and that a confidence region for the pair (μ, σ²) is desired. The r.v.s
√n(X̄ₙ − μ)/σ and (n − 1)S²ₙ₋₁/σ²
are independent, distributed as N(0, 1) and χ²ₙ₋₁, respectively. Determine c (> 0) and 0 < a < b such that
P[−c ≤ N(0, 1) ≤ c] = √(1 − α) and P(a ≤ χ²ₙ₋₁ ≤ b) = √(1 − α).
From these relationships, we obtain
P_{μ,σ²}[−c ≤ √n(X̄ₙ − μ)/σ ≤ c, a ≤ (n − 1)S²ₙ₋₁/σ² ≤ b]
= P_{μ,σ²}[−c ≤ √n(X̄ₙ − μ)/σ ≤ c] · P_{μ,σ²}[a ≤ (n − 1)S²ₙ₋₁/σ² ≤ b] = 1 − α.
Equivalently,
P_{μ,σ²}[(μ − X̄ₙ)² ≤ c²σ²/n, (n − 1)S²ₙ₋₁/b ≤ σ² ≤ (n − 1)S²ₙ₋₁/a] = 1 − α.  (19)
For the observed values of the Xs, we have the confidence region for (, 2)
indicated in Fig. 15.1. The quantities a, b and c may be determined so that
the resulting intervals are the shortest ones, both in actual and expected
lengths.
Now suppose again that θ is real-valued. In all of the examples considered
so far the r.v.s employed for the construction of confidence intervals had an
exact and known distribution. There are important examples, however, where
this is not the case. That is, no suitable r.v. with known distribution is available
which can be used for setting up confidence intervals. In cases like this, under
Figure 15.1 The confidence region for (μ, σ²)′ with confidence coefficient 1 − α: the set of points (μ, σ²) with (μ − x̄ₙ)² ≤ c²σ²/n and (1/b)Σⱼ₌₁ⁿ(xⱼ − x̄ₙ)² ≤ σ² ≤ (1/a)Σⱼ₌₁ⁿ(xⱼ − x̄ₙ)², plotted in the (μ, σ²)-plane.
certain regularity conditions, r.v.s with known approximate (asymptotic) distributions are available and may be used instead. Thus, with
Sₙ² = (1/n) Σⱼ₌₁ⁿ (Xⱼ − X̄ₙ)²,
the CLT provides the approximate confidence interval for the mean given in relation (20).
EXAMPLE 8 Let X₁, . . . , Xₙ be i.i.d. r.v.s from B(1, θ). Then a confidence interval for θ with approximate confidence coefficient 1 − α is provided by
[X̄ₙ − z_{α/2}√(X̄ₙ(1 − X̄ₙ)/n), X̄ₙ + z_{α/2}√(X̄ₙ(1 − X̄ₙ)/n)].
EXAMPLE 9 Let X₁, . . . , Xₙ be i.i.d. r.v.s from P(λ). Then a confidence interval for λ with approximate confidence coefficient 1 − α is provided by
[X̄ₙ − z_{α/2}√(X̄ₙ/n), X̄ₙ + z_{α/2}√(X̄ₙ/n)].
The two-sample problem also fits into this scheme, provided both means
and variances (known or not) are finite.
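A small sketch of the interval of Example 9 in stdlib Python; the counts below are made up for illustration:

```python
from statistics import NormalDist
from math import sqrt

def poisson_mean_ci(xs, conf=0.95):
    """Approximate interval xbar +/- z_{alpha/2} * sqrt(xbar/n) of Example 9."""
    n = len(xs)
    xbar = sum(xs) / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z * sqrt(xbar / n)
    return xbar - half, xbar + half

lo, hi = poisson_mean_ci([3, 1, 4, 2, 2, 5, 3, 0, 2, 3])  # illustrative counts
```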
We close this section with a result which shows that there is an intimate
relationship between constructing confidence regions and testing hypotheses.
Let X₁, . . . , Xₙ be i.i.d. r.v.s with p.d.f. f(x; θ), θ ∈ Ω ⊆ ℝʳ. For each θ* ∈ Ω, let us consider the problem of testing the hypothesis H(θ*): θ = θ* at level of significance α, and let A(θ*) stand for the acceptance region in ℝⁿ. Set Z = (X₁, . . . , Xₙ)′, z = (x₁, . . . , xₙ)′, and define the region T(z) in Ω as follows:
T(z) = {θ ∈ Ω : z ∈ A(θ)}.  (21)
In other words, T(z) is that subset of Ω with the following property: On the basis of z, every H(θ) with θ ∈ T(z) is accepted. From (21), it is obvious that
z ∈ A(θ) if and only if θ ∈ T(z).
Therefore
P_θ[T(Z) ∋ θ] = P_θ[Z ∈ A(θ)] ≥ 1 − α,
so that T(Z) is a confidence region for θ with confidence coefficient 1 − α. Thus we have the following theorem.
THEOREM 1 Let X₁, . . . , Xₙ be i.i.d. r.v.s with p.d.f. f(x; θ), θ ∈ Ω ⊆ ℝʳ. For each θ* ∈ Ω, consider the problem of testing H(θ*): θ = θ* at level α, and let A(θ*) be the acceptance region. Set Z = (X₁, . . . , Xₙ)′, z = (x₁, . . . , xₙ)′, and define T(z) by (21). Then T(Z) is a confidence region for θ with confidence coefficient 1 − α.
Exercises
15.4.1 Let X₁, . . . , Xₙ be i.i.d. r.v.s with (finite) unknown mean μ and (finite) known variance σ², and suppose that n is large.
i) Use the CLT to construct a confidence interval for μ with approximate confidence coefficient 1 − α;
ii) What does this interval become if n = 100, σ = 1 and α = 0.05?
iii) Refer to part (i) and determine n so that the length of the confidence interval is 0.1, provided σ = 1 and α = 0.05.
15.4.2 Refer to the previous problem and suppose that both μ and σ² are unknown. Then a confidence interval for μ with approximate confidence coefficient 1 − α is given by relation (20).
i) What does this interval become for n = 100 and α = 0.05?
ii) Show that the length of this confidence interval tends to 0 in probability (and also a.s.) as n → ∞;
iii) Discuss part (i) for the case that the underlying distribution is B(1, θ), θ ∈ Ω = (0, 1) or P(λ), λ ∈ Ω = (0, ∞).
15.5 Tolerance Intervals
If we notice that for the observed values t1 and t 2 of T1 and T2, respectively,
F(t 2) F(t1) is the portion of the distribution mass of F which lies in the interval
(t1, t 2], the concept of a tolerance interval has an interpretation analogous to
that of a confidence interval. Namely, suppose the r. experiment under consid-
eration is carried out independently n times and let (t1, t 2] be the resulting
interval for the observed values of the Xs. Suppose now that this is repeated
independently N times, so that we obtain N intervals (t₁, t₂]. Then, as N gets larger and larger, at least 100γ percent of the N intervals will cover at least 100p percent of the distribution mass of F.
Now regarding the actual construction of tolerance intervals, we have the
following result.
THEOREM 2 Let X₁, . . . , Xₙ be i.i.d. r.v.s with p.d.f. f of the continuous type and let Yⱼ = X₍ⱼ₎, j = 1, . . . , n be the order statistics. Then for any p ∈ (0, 1) and 1 ≤ i < j ≤ n, the r. interval (Yᵢ, Yⱼ] is a 100γ percent tolerance interval of 100p percent of F, where
γ = ∫ₚ¹ g_{j−i}(v)dv
and g_{j−i} is the p.d.f. given below.
PROOF Set Zᵢ = F(Yᵢ) and Zⱼ = F(Yⱼ). Then the assertion amounts to showing that
P(Zⱼ − Zᵢ ≥ p) = γ.  (22)
This suggests that we shall have to find the p.d.f. of Zⱼ − Zᵢ. Set
W₁ = Z₁ and Wₖ = Zₖ − Zₖ₋₁, k = 2, . . . , n.
Then
Zⱼ − Zᵢ = (W₁ + · · · + Wⱼ) − (W₁ + · · · + Wᵢ) = Wᵢ₊₁ + · · · + Wⱼ.
It can be shown that the p.d.f. of Zⱼ − Zᵢ depends on i and j only through r = j − i and is given by
gᵣ(v) = [Γ(n + 1)/(Γ(r)Γ(n − r + 1))] v^{r−1}(1 − v)^{n−r} for 0 < v < 1,
and gᵣ(v) = 0 otherwise. Thus γ is determined through the relation
∫ₚ¹ gᵣ(v)dv = γ.
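Since gᵣ is the Beta(r, n − r + 1) p.d.f., γ = ∫ₚ¹ gᵣ(v)dv equals the binomial probability P[Bin(n, p) ≤ r − 1], which gives a quick way to evaluate γ without numerical integration. A sketch with illustrative values of n, r and p (the identity itself is standard, not specific to this text):

```python
from math import comb

def tolerance_gamma(n, r, p):
    """gamma = P(Beta(r, n-r+1) >= p) = P(Bin(n, p) <= r-1)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r))

# e.g. n = 10 observations and the interval (Y_1, Y_10], so r = j - i = 9:
gamma = tolerance_gamma(10, 9, 0.5)
```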
P[F(X) ≤ p] = P[X ≤ F⁻¹(p)] = F[F⁻¹(p)] = p,
so that (−∞, X] does cover at most p of the distribution mass of F with probability p, and
P[F(X) > 1 − p] = 1 − P[F(X) ≤ 1 − p] = 1 − F[F⁻¹(1 − p)] = 1 − (1 − p) = p,
so that (X, ∞) does cover at least 1 − p of the distribution mass of F with probability p, as was to be seen.
Chapter 16
The General Linear Hypothesis

16.1 Introduction of the Model
yⱼ = β₁ + β₂xⱼ + · · · + β_{k+1}xⱼᵏ, j = 1, . . . , n (2 ≤ k ≤ n − 1),  (2)
for some values of the parameters β₁, . . . , β_{k+1}, or, finally,
yⱼ = β₁ + β₂ cos tⱼ + β₃ sin tⱼ + · · · + β₂ₖ cos ktⱼ + β_{2k+1} sin ktⱼ, j = 1, . . . , n (n ≥ 2k + 1),  (3)
for some values of the parameters β₁, . . . , β_{2k+1}.
In the presence of r. errors eⱼ, j = 1, . . . , n, the y's appearing in (1)-(3) are observed values of the following r.v.s, respectively:
Yⱼ = β₁ + β₂xⱼ + eⱼ, j = 1, . . . , n (n ≥ 2),  (1)
Yⱼ = β₁ + β₂xⱼ + · · · + β_{k+1}xⱼᵏ + eⱼ, j = 1, . . . , n (2 ≤ k ≤ n − 1),  (2)
Yⱼ = β₁ + β₂ cos tⱼ + β₃ sin tⱼ + · · · + β₂ₖ cos ktⱼ + β_{2k+1} sin ktⱼ + eⱼ, j = 1, . . . , n (n ≥ 2k + 1).  (3)
At this point, one observes that the models appearing in relations (1)-(3) are special cases of the following general model:
Y₁ = x₁₁β₁ + x₂₁β₂ + · · · + x_{p1}βₚ + e₁
Y₂ = x₁₂β₁ + x₂₂β₂ + · · · + x_{p2}βₚ + e₂
· · ·
Yₙ = x₁ₙβ₁ + x₂ₙβ₂ + · · · + x_{pn}βₚ + eₙ,
or in a more compact form,
Yⱼ = Σᵢ₌₁ᵖ xᵢⱼβᵢ + eⱼ, j = 1, . . . , n with p ≤ n and most often p < n.  (4)
By setting
Y = (Y₁, . . . , Yₙ)′, β = (β₁, . . . , βₚ)′, e = (e₁, . . . , eₙ)′,
and letting X be the p × n matrix whose ith row is (xᵢ₁, . . . , xᵢₙ), so that X′ is the n × p matrix whose jth row is (x₁ⱼ, . . . , x_{pj}), the model in (4) may be written as
Y = X′β + e.  (5)
DEFINITION 1 Let C = (Zᵢⱼ) be an n × k matrix whose elements Zᵢⱼ are r.v.s. Then, assuming the EZᵢⱼ are finite, EC is defined by EC = (EZᵢⱼ). In particular, for Z = (Z₁, . . . , Zₙ)′, we have EZ = (EZ₁, . . . , EZₙ)′, and for C = (Z − EZ)(Z − EZ)′, we have EC = E[(Z − EZ)(Z − EZ)′]. This last expression is denoted by Σ_Z and is called the variance-covariance matrix of Z, or just the covariance matrix of Z. Clearly the (i, j)th element of the n × n matrix Σ_Z is Cov(Zᵢ, Zⱼ), the covariance of Zᵢ and Zⱼ, so that the diagonal elements are simply the variances of the Z's.
Since the r.v.s eⱼ, j = 1, . . . , n are r. errors, it is reasonable to assume that Eeⱼ = 0 and that σ²(eⱼ) = σ², j = 1, . . . , n. Another assumption about the e's which is often made is that they are uncorrelated, that is, Cov(eᵢ, eⱼ) = 0 for i ≠ j. These assumptions are summarized by writing E(e) = 0 and Σₑ = σ²Iₙ, where Iₙ is the n × n unit matrix.
By then taking into consideration Definition 1 and the assumptions just made, our model in (5) becomes as follows:
Y = X′β + e, EY = X′β = η, Σ_Y = σ²Iₙ,  (6)
where e is an n × 1 r. vector, X′ is an n × p (p ≤ n) matrix of known constants, and β is a p × 1 vector of parameters, so that Y is an n × 1 r. vector.
This is the model we are going to concern ourselves with from now on.
It should also be mentioned in passing that the expectations ηⱼ of the r.v.s Yⱼ, j = 1, . . . , n are linearly related to the β's and are called linear regression functions. This motivates the title of the present chapter.
In the model represented by (6), there are p + 1 parameters β₁, . . . , βₚ, σ², and the problem is that of estimating these parameters and also testing certain hypotheses about the β's. This is done in the following sections.
16.2 Least Square Estimators – Normal Equations
According to the least squares (LS) principle, one estimates β by any value for which the squared distance of Y from η = X′β is minimum.
DEFINITION 2 Any value of β which minimizes the squared norm ||Y − η||², where η = X′β, is called a least square estimator (LSE) of β and is denoted by β̂.
The norm of an m-dimensional vector v = (v₁, . . . , vₘ)′, denoted by ||v||, is the usual Euclidean norm, namely
||v|| = (Σⱼ₌₁ᵐ vⱼ²)^{1/2}.
For the pictorial illustration of the principle of LS, let p = 2, x₁ⱼ = 1 and x₂ⱼ = xⱼ, j = 1, . . . , n, so that ηⱼ = β₁ + β₂xⱼ, j = 1, . . . , n. Thus (xⱼ, ηⱼ), j = 1, . . . , n are n points on the straight line η = β₁ + β₂x, and the LS principle specifies that β₁ and β₂ be chosen so that Σⱼ₌₁ⁿ(Yⱼ − ηⱼ)² is minimum; Yⱼ is the (observable) r.v. corresponding to xⱼ, j = 1, . . . , n. (See also Fig. 16.1, where n = 5: the values of β₁ and β₂ are chosen in order to minimize the quantity (Y₁ − η₁)² + · · · + (Y₅ − η₅)².)
From (η₁, . . . , ηₙ)′ = η = X′β, we have that
ηⱼ = Σᵢ₌₁ᵖ xᵢⱼβᵢ, j = 1, . . . , n,
and
||Y − η||² = Σⱼ₌₁ⁿ (Yⱼ − ηⱼ)² = Σⱼ₌₁ⁿ (Yⱼ − Σᵢ₌₁ᵖ xᵢⱼβᵢ)²,
which we denote by S(Y, β). Then any LSE β̂ is a root of the equations
Figure 16.1 The straight line η = β₁ + β₂x, the points (xⱼ, ηⱼ), and the deviations Yⱼ − ηⱼ, j = 1, . . . , 5.
∂S(Y, β)/∂βᵥ = 0, v = 1, . . . , p,
where
∂S(Y, β)/∂βᵥ = 2 Σⱼ₌₁ⁿ (Yⱼ − Σᵢ₌₁ᵖ xᵢⱼβᵢ)(−1)xᵥⱼ = −2 Σⱼ₌₁ⁿ xᵥⱼYⱼ + 2 Σⱼ₌₁ⁿ xᵥⱼ Σᵢ₌₁ᵖ xᵢⱼβᵢ.
Setting these derivatives equal to zero yields the normal equations
Σⱼ₌₁ⁿ xᵥⱼ Σᵢ₌₁ᵖ xᵢⱼβᵢ = Σⱼ₌₁ⁿ xᵥⱼYⱼ, v = 1, . . . , p, or in matrix notation, XX′β = XY.  (7)
THEOREM 1 Any LSE β̂ of β is a solution of the normal equations, and any solution of the normal equations is an LSE.
PROOF We have
η = X′β = (x₁₁β₁ + x₂₁β₂ + · · · + x_{p1}βₚ, x₁₂β₁ + x₂₂β₂ + · · · + x_{p2}βₚ, . . . , x₁ₙβ₁ + x₂ₙβ₂ + · · · + x_{pn}βₚ)′
= β₁(x₁₁, x₁₂, . . . , x₁ₙ)′ + β₂(x₂₁, x₂₂, . . . , x₂ₙ)′ + · · · + βₚ(x_{p1}, x_{p2}, . . . , x_{pn})′
= β₁ζ₁ + β₂ζ₂ + · · · + βₚζₚ,  (8)
where ζᵢ = (xᵢ₁, . . . , xᵢₙ)′, i = 1, . . . , p.
Figure 16.2 The space Vₙ, the subspace Vᵣ, the vector Y, and its projection η̂ into Vᵣ, so that Y − X′β̂ = Y − η̂ is orthogonal to Vᵣ.
Let Vₙ be the set of all n × 1 vectors and let Vᵣ be the vector space generated by ζ₁, . . . , ζₚ, of dimension r (≤ p), so that Vᵣ ⊆ Vₙ. Of course, Y ∈ Vₙ, and from (8), it follows that η ∈ Vᵣ. Let η̂ be the projection of Y into Vᵣ. Then η̂ = Σⱼ₌₁ᵖ β̂ⱼζⱼ, where the β̂ⱼ, j = 1, . . . , p may not be uniquely determined (η̂ is, however), but may be chosen to be functions of Y, since η̂ is a function of Y. Now, as is well known, ||Y − X′β||² = ||Y − η||² becomes minimum if η = η̂. Thus β̂ is an LSE of β if and only if X′β̂ = η̂, and this is equivalent to saying that Y − X′β̂ ⊥ Vᵣ. Clearly, an equivalent condition to it is that Y − X′β̂ ⊥ ζⱼ, j = 1, . . . , p, or ζⱼ′(Y − X′β̂) = 0, j = 1, . . . , p. From the definition of the ζⱼ, j = 1, . . . , p, this last condition is equivalent to X(Y − X′β̂) = 0, or equivalently, XX′β̂ = XY, which is the matrix notation for the normal equations. This completes the proof of the theorem. (For a pictorial illustration of some of the arguments used in the proof, see Fig. 16.2.)
In the course of the proof of the last theorem, it was seen that there exists at least one LSE of β, and by the theorem itself the totality of LSE's coincides with the set of solutions of the normal equations. Now a special but important case is that where X is of full rank, that is, rank X = p. Then S = XX′ is a p × p symmetric matrix of rank p, so that S⁻¹ exists. Therefore the normal equations in (7) provide a unique solution, namely β̂ = S⁻¹XY. This is part of the following result.
THEOREM 2 If rank X = p, then there exists a unique LSE β̂ of β, given by the expression
β̂ = S⁻¹XY, where S = XX′.  (9)
Furthermore, this LSE is linear in Y, unbiased and has covariance matrix given by Σ_β̂ = σ²S⁻¹.
PROOF The existence and uniqueness of the LSE and the fact that it is
given by (9) have already been established. That it is linear in Y follows
immediately from (9). Next, its unbiasedness is checked thus:
$$E\hat{\boldsymbol\beta}=E\big(\mathbf S^{-1}\mathbf X\mathbf Y\big)=\mathbf S^{-1}\mathbf X\,E\mathbf Y=\mathbf S^{-1}\mathbf X\mathbf X'\boldsymbol\beta=\mathbf S^{-1}\mathbf S\boldsymbol\beta=\mathbf I_p\boldsymbol\beta=\boldsymbol\beta.$$
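As a quick numerical check of Theorem 2 (a sketch with a randomly generated design, not an example from the text), the unique LSE β̂ = S⁻¹XY can be computed from the normal equations and compared against a generic least-squares solver:

```python
import numpy as np

# Sketch of Theorem 2: with rank X = p, the normal equations
# S beta = X Y, S = X X', have the unique solution beta_hat = S^{-1} X Y.
rng = np.random.default_rng(0)

n, p = 10, 3
Xp = rng.normal(size=(n, p))       # X' in the book's notation (n x p)
beta = np.array([1.0, -2.0, 0.5])
Y = Xp @ beta + rng.normal(size=n)

S = Xp.T @ Xp                      # S = X X' (p x p, invertible when rank X = p)
beta_hat = np.linalg.solve(S, Xp.T @ Y)

# The same estimate from a generic least-squares routine:
beta_ls = np.linalg.lstsq(Xp, Y, rcond=None)[0]
assert np.allclose(beta_hat, beta_ls)
print(beta_hat)
```

The final check X(Y − X′β̂) = 0 is exactly the normal equations in matrix form.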
422 16 The General Linear Hypothesis
and therefore
$$\psi=E(\mathbf a'\mathbf Y)=E(\mathbf d'\mathbf Y)+\mathbf b'E\mathbf Y=E(\mathbf d'\mathbf Y)+\mathbf b'\mathbf X'\boldsymbol\beta.$$
16.2 Least Square Estimators - Normal Equations 423
Also,
$$\sigma^2(\mathbf a'\mathbf Y)=\mathbf a'\boldsymbol\Sigma_{\mathbf Y}\mathbf a=\sigma^2\lVert\mathbf a\rVert^2=\sigma^2\lVert\mathbf d\rVert^2+\sigma^2\lVert\mathbf b\rVert^2.$$
Since also
$$\sigma^2(\mathbf d'\mathbf Y)=\sigma^2\lVert\mathbf d\rVert^2$$
by (10) again, we have
$$\sigma^2(\mathbf a'\mathbf Y)=\sigma^2(\mathbf d'\mathbf Y)+\sigma^2\lVert\mathbf b\rVert^2,$$
from which we conclude that
$$\sigma^2(\mathbf a'\mathbf Y)\ge\sigma^2(\mathbf d'\mathbf Y).$$
iii) By (i), E(d′Y) = ψ = c′β identically in β. But
$$E(\mathbf d'\mathbf Y)=\mathbf d'E\mathbf Y=\mathbf d'\mathbf X'\boldsymbol\beta,$$
so that d′X′β = c′β identically in β. Hence d′X′ = c′. Next, with η̂ = X′β̂, the projection of Y into V_r, one has d′(Y − η̂) = 0, since d ∈ V_r. Therefore
$$\mathbf d'\mathbf Y=\mathbf d'\hat{\boldsymbol\eta}=\mathbf d'\mathbf X'\hat{\boldsymbol\beta}=\mathbf c'\hat{\boldsymbol\beta}.$$
iv) Finally, let d* ∈ V_r be such that E(d*′Y) = ψ. Then we have
$$0=E(\mathbf d^{*\prime}\mathbf Y)-E(\mathbf d'\mathbf Y)=E\big[(\mathbf d^*-\mathbf d)'\mathbf Y\big]=(\mathbf d^*-\mathbf d)'\mathbf X'\boldsymbol\beta.$$
That is, (d* − d)′X′β = 0 identically in β, and hence (d* − d)′X′ = 0, which is equivalent to saying that d* − d ⊥ V_r. So both d* − d ∈ V_r and d* − d ⊥ V_r, and hence d* = d, as was to be seen.
Part (iii) of Lemma 1 justifies the following definition.
DEFINITION 4 Let ψ = c′β be an estimable function. Thus there exists a ∈ V_n such that E(a′Y) = ψ identically in β, and let d be the projection of a into V_r. Set ψ̂ = c′β̂ (= d′Y), where β̂ is any LSE of β. Then the unbiased, linear (in Y) estimator ψ̂ of ψ is called the LSE of ψ.
We are now able to formulate and prove the following basic result.
THEOREM 3 (Gauss–Markov) Assume the model described in (6) and let ψ be an estimable function. Then its LSE ψ̂ has the smallest variance in the class of all linear (in Y) and unbiased estimators of ψ.
PROOF Since ψ is estimable, there exists a ∈ V_n such that E(a′Y) = ψ identically in β, and let d be the projection of a into V_r. Then if b′Y is any other linear (in Y) and unbiased estimator of ψ, it follows, by Lemma 1, that σ²(b′Y) ≥ σ²(d′Y). Since d′Y = ψ̂, the proof is complete.
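The Gauss–Markov property can be seen numerically. The following Monte Carlo sketch (the line, the design points and the competing estimator are illustrative assumptions, not from the text) compares the LSE of a slope with another linear unbiased estimator, the "endpoint" estimator (Y_n − Y_1)/(x_n − x_1); both are unbiased, but the LSE has the smaller variance:

```python
import numpy as np

# Monte Carlo illustration of the Gauss-Markov theorem for EY_j = b1 + b2 x_j.
rng = np.random.default_rng(1)

x = np.arange(1.0, 11.0)           # n = 10 design points
b1, b2, sigma = 2.0, 0.7, 1.0
reps = 20000

Y = b1 + b2 * x + sigma * rng.normal(size=(reps, len(x)))
sxx = np.sum((x - x.mean()) ** 2)

# LSE of the slope, computed for every replication at once.
lse = (Y - Y.mean(axis=1, keepdims=True)) @ (x - x.mean()) / sxx
# A competing linear unbiased estimator of the slope.
endpoint = (Y[:, -1] - Y[:, 0]) / (x[-1] - x[0])

# Both are (nearly) unbiased; the LSE is the more precise one.
assert abs(lse.mean() - b2) < 0.01 and abs(endpoint.mean() - b2) < 0.01
assert lse.var() < endpoint.var()
```

Here the theoretical variances are σ²/Σ(x_j − x̄)² ≈ 0.012 for the LSE against 2σ²/81 ≈ 0.025 for the endpoint estimator.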
COROLLARY Suppose that rank X = p. Then for any c ∈ V_p, the function ψ = c′β is estimable, and hence its LSE ψ̂ = c′β̂ has the smallest variance in the class of all linear (in Y) and unbiased estimators of ψ. In particular, the same is true for each β_j, j = 1, ..., p, where β = (β_1, ..., β_p)′.
PROOF The first part follows immediately from the fact that β̂ = S⁻¹XY. The particular case follows from the first part by taking c to have all its components equal to zero except for the jth one, which is equal to one, for j = 1, ..., p.
$$\sigma^2(Z_j)=\sigma^2,\quad j=1,\dots,n.\tag{11}$$
Next, let EZ = ζ = (ζ_1, ..., ζ_n)′. Then ζ = E(PY) = Pη, where η ∈ V_r. It follows then that
$$\zeta_j=0,\quad j=r+1,\dots,n.\tag{12}$$
By recalling that η̂ is the projection of Y into V_r, we have (with p_j denoting the jth row of P, regarded as a vector of V_n)
$$\mathbf Y=\sum_{j=1}^n Z_j\mathbf p_j\quad\text{and}\quad\hat{\boldsymbol\eta}=\sum_{j=1}^r Z_j\mathbf p_j,$$
so that
16.3 Canonical Reduction of the Linear Model - Estimation of σ² 425
$$\lVert\mathbf Y-\hat{\boldsymbol\eta}\rVert^2=\Big\lVert\sum_{j=1}^n Z_j\mathbf p_j-\sum_{j=1}^r Z_j\mathbf p_j\Big\rVert^2=\Big\lVert\sum_{j=r+1}^n Z_j\mathbf p_j\Big\rVert^2=\sum_{j=r+1}^n Z_j^2.\tag{13}$$
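Relation (13) can be verified numerically. The sketch below (with an arbitrary random design, an assumption for illustration) builds an orthonormal basis whose first r vectors span V_r, forms Z = PY, and checks that the squared distance from Y to its projection equals the sum of the last n − r squared Z-coordinates:

```python
import numpy as np

# Numerical sketch of the canonical reduction and relation (13).
rng = np.random.default_rng(2)

n, r = 6, 2
Xp = rng.normal(size=(n, r))               # X' with rank r
Q, _ = np.linalg.qr(Xp, mode="complete")   # orthogonal n x n matrix;
P = Q.T                                    # first r rows of P span V_r

Y = rng.normal(size=n)
Z = P @ Y                                  # canonical coordinates
eta_hat = Q[:, :r] @ Z[:r]                 # projection of Y into V_r

# ||Y - eta_hat||^2 equals the sum of the last n - r squared Z's.
assert np.allclose(np.sum((Y - eta_hat) ** 2), np.sum(Z[r:] ** 2))
```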
REMARK 1 From the last theorem above, it follows that in order for us to be able to actually calculate the LSE σ̂² of σ², we would have to rewrite ‖Y − η̂‖² in a form appropriate for calculation. To this end, we have
$$\lVert\mathbf Y-\hat{\boldsymbol\eta}\rVert^2=\lVert\mathbf Y-\mathbf X'\hat{\boldsymbol\beta}\rVert^2=(\mathbf Y-\mathbf X'\hat{\boldsymbol\beta})'(\mathbf Y-\mathbf X'\hat{\boldsymbol\beta})$$
$$=\mathbf Y'\mathbf Y-\mathbf Y'\mathbf X'\hat{\boldsymbol\beta}-\hat{\boldsymbol\beta}'\mathbf X\mathbf Y+\hat{\boldsymbol\beta}'\mathbf X\mathbf X'\hat{\boldsymbol\beta}=\mathbf Y'\mathbf Y-\mathbf Y'\mathbf X'\hat{\boldsymbol\beta}+\hat{\boldsymbol\beta}'(\mathbf X\mathbf X'\hat{\boldsymbol\beta}-\mathbf X\mathbf Y).$$
But Y′X′β̂ is (1 × n)(n × p)(p × 1) = 1 × 1, that is, a number. Hence Y′X′β̂ = (Y′X′β̂)′ = β̂′XY. On the other hand, XX′β̂ − XY = 0, since XX′β̂ = XY by the normal equations (7). Therefore
$$\lVert\mathbf Y-\hat{\boldsymbol\eta}\rVert^2=\mathbf Y'\mathbf Y-\hat{\boldsymbol\beta}'\mathbf X\mathbf Y=\sum_{j=1}^n Y_j^2-\hat{\boldsymbol\beta}'\mathbf X\mathbf Y.\tag{14}$$
Finally, denoting by r_v the vth element of the p × 1 vector XY, one has
$$r_v=\sum_{j=1}^n x_{vj}Y_j,\quad v=1,\dots,p,\tag{15}$$
$$\mathbf X'=\begin{pmatrix}1&x_1\\1&x_2\\\vdots&\vdots\\1&x_n\end{pmatrix}\quad\text{and}\quad\boldsymbol\beta=(\beta_1,\beta_2)'.$$
Next,
$$\mathbf X\mathbf X'=\begin{pmatrix}1&1&\cdots&1\\x_1&x_2&\cdots&x_n\end{pmatrix}\begin{pmatrix}1&x_1\\1&x_2\\\vdots&\vdots\\1&x_n\end{pmatrix}=\begin{pmatrix}n&\sum_{j=1}^n x_j\\\sum_{j=1}^n x_j&\sum_{j=1}^n x_j^2\end{pmatrix}=\mathbf S,$$
so that the normal equations are given by (7) with S as above and
$$\mathbf X\mathbf Y=\begin{pmatrix}1&1&\cdots&1\\x_1&x_2&\cdots&x_n\end{pmatrix}(Y_1,Y_2,\dots,Y_n)'=\Big(\sum_{j=1}^n Y_j,\ \sum_{j=1}^n x_jY_j\Big)'.\tag{17}$$
Now
$$\lvert\mathbf S\rvert=n\sum_{j=1}^n x_j^2-\Big(\sum_{j=1}^n x_j\Big)^2=n\sum_{j=1}^n(x_j-\bar x)^2,$$
so that
$$\sum_{j=1}^n(x_j-\bar x)^2>0,$$
provided that not all xs are equal. Then S1 exists and is given by
$$\mathbf S^{-1}=\frac1{n\sum_{j=1}^n(x_j-\bar x)^2}\begin{pmatrix}\sum_{j=1}^n x_j^2&-\sum_{j=1}^n x_j\\-\sum_{j=1}^n x_j&n\end{pmatrix}.\tag{18}$$
It follows that
$$\hat{\boldsymbol\beta}=\begin{pmatrix}\hat\beta_1\\\hat\beta_2\end{pmatrix}=\mathbf S^{-1}\mathbf X\mathbf Y=\frac1{n\sum_{j=1}^n(x_j-\bar x)^2}\begin{pmatrix}\sum_{j=1}^n x_j^2\sum_{j=1}^n Y_j-\sum_{j=1}^n x_j\sum_{j=1}^n x_jY_j\\-\sum_{j=1}^n x_j\sum_{j=1}^n Y_j+n\sum_{j=1}^n x_jY_j\end{pmatrix},$$
so that
$$\hat\beta_1=\frac{\sum_{j=1}^n x_j^2\sum_{j=1}^n Y_j-\sum_{j=1}^n x_j\sum_{j=1}^n x_jY_j}{n\sum_{j=1}^n(x_j-\bar x)^2},\qquad\hat\beta_2=\frac{n\sum_{j=1}^n x_jY_j-\sum_{j=1}^n x_j\sum_{j=1}^n Y_j}{n\sum_{j=1}^n(x_j-\bar x)^2},\tag{19}$$
and
$$n\sum_{j=1}^n(x_j-\bar x)^2=n\sum_{j=1}^n x_j^2-\Big(\sum_{j=1}^n x_j\Big)^2.$$
But
$$n\sum_{j=1}^n x_jY_j-\sum_{j=1}^n x_j\sum_{j=1}^n Y_j=n\sum_{j=1}^n(x_j-\bar x)(Y_j-\bar Y),$$
as is easily seen, so that
$$\hat\beta_2=\frac{\sum_{j=1}^n(x_j-\bar x)(Y_j-\bar Y)}{\sum_{j=1}^n(x_j-\bar x)^2}.\tag{19'}$$
$$\lVert\mathbf Y-\hat{\boldsymbol\eta}\rVert^2=\sum_{j=1}^n Y_j^2-\hat\beta_1\sum_{j=1}^n Y_j-\hat\beta_2\sum_{j=1}^n x_jY_j.\tag{20}$$
Since also r = p = 2, the LSE of σ² is given by
$$\hat\sigma^2=\frac{\lVert\mathbf Y-\hat{\boldsymbol\eta}\rVert^2}{n-2}.\tag{21}$$
For a numerical example, take n = 12 and the xs and Ys as follows:
x1 = 30 x 7 = 70 Y1 = 37 Y7 = 20
x 2 = 30 x8 = 70 Y2 = 43 Y8 = 26
x3 = 30 x 9 = 70 Y3 = 30 Y9 = 22
x 4 = 50 x10 = 90 Y4 = 32 Y10 = 15
x5 = 50 x11 = 90 Y5 = 27 Y11 = 19
x 6 = 50 x12 = 90 Y6 = 34 Y12 = 20.
Then relation (19) provides us with the estimates β̂_1 = 46.3833 and β̂_2 = −0.3216, and (20) and (21) give the estimate σ̂² = 14.8939.
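The estimates above can be reproduced directly from (19), (19′) and (21). In the sketch below, carrying full precision gives β̂_2 = −1930/6000 ≈ −0.32167 (the value −0.3216 above is the same number truncated) and σ̂² = 15.01; the figure 14.8939 is what (20) yields after the coefficients are rounded to four decimals first.

```python
import numpy as np

# The 12-point numerical example, computed via (19'), (19) and (21).
x = np.array([30, 30, 30, 50, 50, 50, 70, 70, 70, 90, 90, 90], float)
Y = np.array([37, 43, 30, 32, 27, 34, 20, 26, 22, 15, 19, 20], float)
n = len(x)

sxx = np.sum((x - x.mean()) ** 2)
b2 = np.sum((x - x.mean()) * (Y - Y.mean())) / sxx   # (19')
b1 = Y.mean() - b2 * x.mean()                        # equivalent to (19)
sigma2 = np.sum((Y - b1 - b2 * x) ** 2) / (n - 2)    # (21)

print(b1, b2, sigma2)   # 46.3833..., -0.32166..., 15.01
```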
Exercises
16.3.1 Referring to the proof of the corollary to Theorem 4, elaborate on the
assertion that is a function of the r.v.s Zj, j = 1, . . . , r alone.
16.3.2 Verify relation (19).
16.3.3 Show directly, by means of (19) and (19′), that
$$E\hat\beta_1=\beta_1,\qquad E\hat\beta_2=\beta_2,\qquad\sigma^2(\hat\beta_2)=\frac{\sigma^2}{\sum_{j=1}^n(x_j-\bar x)^2},$$
16.4 Testing Hypotheses About η = E(Y) 429
and that β̂_2 and β̂_1 are normally distributed if the Y's are normally distributed.
i) Show that
$$\sigma^2(\hat\beta_1)=\frac{\sigma^2\sum_{j=1}^n x_j^2}{n\sum_{j=1}^n(x_j-\bar x)^2},\qquad\sigma^2(\hat\beta_2)=\frac{\sigma^2}{\sum_{j=1}^n(x_j-\bar x)^2},$$
and that, if x̄ = 0, then β̂_1 and β̂_2 are uncorrelated;
Again refer to Example 1 and suppose that x̄ = 0. Then:
ii) Conclude that β̂_1 is normally distributed if the Y's are normally distributed;
iii) Show that
$$\sigma^2(\hat\beta_1)=\frac{\sigma^2}n,\qquad\sigma^2(\hat\beta_2)=\frac{\sigma^2}{\sum_{j=1}^n x_j^2},$$
and, by assuming that n is even and x_j ∈ [−c, c], j = 1, ..., n for some c > 0, conclude that σ²(β̂_2) becomes a minimum if half of the x's are chosen equal to −c and the other half equal to c (if that is feasible). (It should be pointed out, however, that such a choice of the x's, when feasible, need not be optimal. This is the case, for example, when there is doubt about the linearity of the model.)
the joint p.d.f. of the Y's, and let S_C and S_c stand for the minimum of
$$S(\mathbf y,\boldsymbol\eta)=\lVert\mathbf y-\mathbf X'\boldsymbol\beta\rVert^2=\sum_{j=1}^n(y_j-EY_j)^2$$
under C and c, respectively. The joint p.d.f. of the Y's is
$$f_{\mathbf Y}(\mathbf y;\boldsymbol\eta,\sigma^2)=\Big(\frac1{2\pi\sigma^2}\Big)^{n/2}\exp\Big[-\frac1{2\sigma^2}\sum_{j=1}^n(y_j-EY_j)^2\Big]=\Big(\frac1{2\pi\sigma^2}\Big)^{n/2}\exp\Big[-\frac1{2\sigma^2}S(\mathbf y,\boldsymbol\eta)\Big].\tag{24}$$
From (24), it is obvious that for a fixed σ², the maximum of f_Y(y; η, σ²) with respect to η, under C, is obtained when S(y, η) is replaced by S_C. Thus in order to maximize f_Y(y; η, σ²) with respect to both η and σ², under C, it suffices to maximize with respect to σ² the quantity
$$\Big(\frac1{2\pi\sigma^2}\Big)^{n/2}\exp\Big(-\frac{S_C}{2\sigma^2}\Big),$$
or its logarithm
$$-\frac n2\log(2\pi)-\frac n2\log\sigma^2-\frac{S_C}{2\sigma^2}.$$
Differentiating this last expression with respect to σ² and equating the derivative to zero, we obtain
$$-\frac n{2\sigma^2}+\frac{S_C}{2\sigma^4}=0,\quad\text{so that}\quad\sigma^2=\frac{S_C}n.$$
The second derivative with respect to σ² is equal to n/(2σ⁴) − S_C/σ⁶, which for σ² = S_C/n becomes −n³/(2S_C²) < 0. Therefore
$$\max_C f_{\mathbf Y}(\mathbf y;\boldsymbol\eta,\sigma^2)=\Big(\frac n{2\pi S_C}\Big)^{n/2}e^{-n/2}.\tag{25}$$
Similarly,
$$\max_c f_{\mathbf Y}(\mathbf y;\boldsymbol\eta,\sigma^2)=\Big(\frac n{2\pi S_c}\Big)^{n/2}e^{-n/2},\tag{26}$$
$$S_C=\min_C S(\mathbf Y,\boldsymbol\eta)=\lVert\mathbf Y-\hat{\boldsymbol\eta}_C\rVert^2=\lVert\mathbf Y-\mathbf X'\hat{\boldsymbol\beta}_C\rVert^2,\qquad\hat{\boldsymbol\beta}_C=\text{LSE of }\boldsymbol\beta\text{ under }C,\tag{28}$$
and
$$S_c=\min_c S(\mathbf Y,\boldsymbol\eta)=\lVert\mathbf Y-\hat{\boldsymbol\eta}_c\rVert^2=\lVert\mathbf Y-\mathbf X'\hat{\boldsymbol\beta}_c\rVert^2,\qquad\hat{\boldsymbol\beta}_c=\text{LSE of }\boldsymbol\beta\text{ under }c.\tag{29}$$
The LR statistic is λ = (S_C/S_c)^{n/2}; set
$$g(\lambda)=\lambda^{-2/n}-1.\tag{30}$$
Then
$$\frac{dg(\lambda)}{d\lambda}=-\frac2n\lambda^{-(n+2)/n}<0,$$
so that g(λ) is decreasing. Thus λ < λ_0 if and only if g(λ) > g(λ_0). Taking into consideration relations (27) and (30), the last inequality becomes
$$\frac{n-r}q\cdot\frac{S_c-S_C}{S_C}>F_0,$$
where F_0 is determined by
$$P_H\Big[\frac{n-r}q\cdot\frac{S_c-S_C}{S_C}>F_0\Big]=\alpha.$$
Therefore the LR test is equivalent to the test which rejects H whenever
$$F>F_0,\quad\text{where}\quad F=\frac{n-r}q\cdot\frac{S_c-S_C}{S_C}\quad\text{and}\quad F_0=F_{q,n-r;\alpha}.\tag{31}$$
The statistics SC and Sc are given by (28) and (29), respectively, and the
distribution of F, under H, is Fq,nr, as is shown in the next section.
Now although the F-test in (31) is justified on the basis that it is equivalent to the LR test, its geometric interpretation, illustrated by Fig. 16.3 below, illuminates it even further. We have that η̂_C is the best estimator of η under C, and η̂_c is the best estimator of η under c. Then the F-test rejects H whenever η̂_C and η̂_c differ by too much; equivalently, whenever S_c − S_C is too large (when measured in terms of S_C).
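The test in (31) can be carried out in a few lines. The sketch below uses the regression setting of Example 1 with the hypothesis H: β_2 = 0 (so r = 2, q = 1); the six data points are an illustrative assumption, not from the text:

```python
import numpy as np

# F test (31): S_C from the full line, S_c from the constant fit under H.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 5.8])
n, r, q = len(x), 2, 1

sxx = np.sum((x - x.mean()) ** 2)
b2 = np.sum((x - x.mean()) * (Y - Y.mean())) / sxx
b1 = Y.mean() - b2 * x.mean()

S_C = np.sum((Y - b1 - b2 * x) ** 2)      # minimum over the full model
S_c = np.sum((Y - Y.mean()) ** 2)         # minimum under H: beta_2 = 0
F = ((n - r) / q) * (S_c - S_C) / S_C

# For this hypothesis, S_c - S_C also equals b2^2 * sum (x_j - x_bar)^2.
assert np.isclose(S_c - S_C, b2 ** 2 * sxx)
print(F)
```

H is then rejected when F exceeds the F_{1,4;α} critical value.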
Suppose now that rank X = p (< n), and let β̂ be the unique (and unbiased) LSE of β. By the fact that (Y − X′β̂)′(X′β̂ − X′β) = 0, because Y − X′β̂ ⊥ V_r, one has that the joint p.d.f. of the Y's is given by the following expression, where y has been replaced by Y:
$$\Big(\frac1{2\pi\sigma^2}\Big)^{n/2}\exp\Big[-\frac1{2\sigma^2}\Big(\lVert\mathbf Y-\mathbf X'\hat{\boldsymbol\beta}\rVert^2+\lVert\mathbf X'\hat{\boldsymbol\beta}-\mathbf X'\boldsymbol\beta\rVert^2\Big)\Big].$$
[Figure 16.3: in V_n, S_C is the squared norm of Y − η̂_C, S_c is the squared norm of Y − η̂_c, and S_c − S_C is the squared norm of η̂_C − η̂_c, with η̂_C ∈ V_r and η̂_c in the subspace of V_r specified by H.]
Exercises
16.4.1 Show that the MLE σ̃² and the LSE σ̂² of σ² are related as follows:
$$\tilde\sigma^2=\frac{n-r}n\hat\sigma^2\quad\text{and that}\quad E\hat\sigma^2=\sigma^2,$$
where σ̂² is given in Section 16.4.
16.4.2 Let Y_j, j = 1, ..., n be independent r.v.s, where Y_j is distributed as N(α + β(x_j − x̄), σ²); the x_j, j = 1, ..., n are known constants,
$$\bar x=\frac1n\sum_{j=1}^n x_j,$$
and α, β, σ² are parameters. Then:
i) Derive the LR test for testing the hypothesis H: β = 0 against the alternative A: β ≠ 0 at level of significance α;
ii) Set up a confidence interval for β with confidence coefficient 1 − α.
Hence
$$S_c-S_C=\sum_{j=1}^q Z_j^2,$$
so that
$$F=\frac{n-r}q\cdot\frac{\sum_{j=1}^q Z_j^2}{\sum_{j=r+1}^n Z_j^2}=\frac{\sum_{j=1}^q Z_j^2\big/q}{\sum_{j=r+1}^n Z_j^2\big/(n-r)}.$$
Under H,
$$\sum_{j=1}^q Z_j^2\ \text{is}\ \sigma^2\chi^2_q\quad\text{and}\quad\sum_{j=r+1}^n Z_j^2\ \text{is}\ \sigma^2\chi^2_{n-r}.$$
It follows that, under H (H′), the statistic F is distributed as F_{q,n−r}. The distribution of F, under the alternatives, is non-central F_{q,n−r}, which is defined in terms of a χ²_{n−r} and a non-central χ²_q distribution. For these definitions, the reader is referred to Appendix II.
From the derivations in this section and previous results, one has the
following theorem.
16.5 Derivation of the Distribution of the F Statistic 435
THEOREM 6 Assume the model described in (22) and let rank X = p (< n). Then the LSEs β̂ and σ̂² of β and σ², respectively, are independent.
PROOF It is an immediate consequence of the corollary to Theorem 4 and
Theorem 5 in Chapter 9.
Finally, we should like to emphasize that the transformation of the r.v.s
Yj, j = 1, . . . , n to the r.v.s Zj, j = 1, . . . , n is only a technical device for deriving
the distribution of F and also for proving unbiasedness of the LSE of 2. For
actually carrying out the test and also for calculating the LSE of 2, the Ys
rather than the Zs are used.
This section is closed with two examples.
EXAMPLE 2 Refer to Example 1 and suppose that the x's are not all equal, and that the Y's are normally distributed. It follows that rank X = r = 2, and the regression line is y = β_1 + β_2x in the xy-plane. Without loss of generality, we may assume that x_1 ≠ x_2. Then η_i = β_1 + β_2x_i, i = 1, 2 are linearly independent, and all η_j, j = 3, ..., n are linear combinations of η_1 and η_2. Therefore η ∈ V_2.
Now, suppose we are interested in testing the following hypothesis about the slope of the regression line; namely, H_1: β_2 = β_20, where β_20 is a given number. Hypothesis H_1 is equivalent to the hypothesis H′_1: η_i = β_1 + β_20x_i, i = 1, 2, from which it follows that, under H_1 (or H′_1), η ∈ V_1. Thus, r − q = 1, or q = 1. The LSEs of β_1 and β_2 are
$$\hat\beta_{1C}=\bar Y-\hat\beta_{2C}\bar x,\qquad\hat\beta_{2C}=\frac{\sum_{j=1}^n(x_j-\bar x)(Y_j-\bar Y)}{\sum_{j=1}^n(x_j-\bar x)^2},$$
whereas under H_1 (or H′_1) the LSEs become β̂_1c = Ȳ − β_20x̄, β̂_2c = β_20. Then η̂_C = X′β̂_C and
$$S_C=\lVert\mathbf Y-\hat{\boldsymbol\eta}_C\rVert^2=\sum_{j=1}^n(Y_j-\bar Y)^2-\hat\beta_{2C}^2\sum_{j=1}^n(x_j-\bar x)^2.$$
Likewise, η̂_c = X′β̂_c and
$$S_c=\lVert\mathbf Y-\hat{\boldsymbol\eta}_c\rVert^2=\sum_{j=1}^n(Y_j-\bar Y)^2+\beta_{20}\big(\beta_{20}-2\hat\beta_{2C}\big)\sum_{j=1}^n(x_j-\bar x)^2.$$
It follows that
$$S_c-S_C=\big(\hat\beta_{2C}-\beta_{20}\big)^2\sum_{j=1}^n(x_j-\bar x)^2,$$
and the test statistic is
$$F=\frac{(n-2)\big(\hat\beta_{2C}-\beta_{20}\big)^2\sum_{j=1}^n(x_j-\bar x)^2}{S_C}.$$
$$\sigma^2(Z)=\sigma^2\Big(\frac1m+\frac1n+\frac{(x_0-\bar x)^2}{\sum_{j=1}^n(x_j-\bar x)^2}\Big).\tag{32}$$
It follows by Theorem 6 that
$$\frac{\bar Y_0-\hat Y_0}{\hat\sigma\sqrt{\dfrac1m+\dfrac1n+\dfrac{(x_0-\bar x)^2}{\sum_{j=1}^n(x_j-\bar x)^2}}}$$
is t_{n−2} distributed, so that a prediction interval for Y_0 with confidence coefficient 1 − α is provided by
$$\big[\hat Y_0-st_{n-2;\alpha/2},\ \hat Y_0+st_{n-2;\alpha/2}\big],\quad\text{where}\quad s=\hat\sigma\sqrt{\frac1m+\frac1n+\frac{(x_0-\bar x)^2}{\sum_{j=1}^n(x_j-\bar x)^2}}.$$
For a numerical example, refer to the data used in Example 1 and let x_0 = 60, m = 1 and α = 0.05. Then Ŷ_0 = 27.0873, s = 1.2703 and t_{10;0.025} = 2.2281, so that the prediction interval for Y_0 is given by [24.2570, 29.9176].
Exercises
16.5.1 From the discussion in Section 16.5, it follows that the distribution of [(n − r)σ̂²]/σ² is χ²_{n−r}. Thus the statistic σ̂² can be used for testing hypotheses about σ² and also for constructing confidence intervals for σ².
16.5.2 Refer to Example 1 and:
ii) Show that
$$\frac{\hat\beta_1-\beta_1}{\hat\sigma\sqrt{\sum_{j=1}^n x_j^2\Big/\Big(n\sum_{j=1}^n(x_j-\bar x)^2\Big)}}\quad\text{and}\quad\frac{\hat\beta_2-\beta_2}{\hat\sigma\Big/\sqrt{\sum_{j=1}^n(x_j-\bar x)^2}}$$
are distributed as t_{n−2}.
Thus the r.v.s figuring in (ii) may be used for testing hypotheses about β_1 and β_2 and also for constructing confidence intervals for β_1 and β_2;
iii) Set up the test for testing the hypothesis H: β_1 = 0 (the regression line passes through the origin) against A: β_1 ≠ 0 at level α, and also construct a 1 − α confidence interval for β_1;
iv) Set up the test for testing the hypothesis H: β_2 = 0 (the Y's are independent of the x's) against A: β_2 ≠ 0 at level α, and also construct a 1 − α confidence interval for β_2;
v) What do the results in (iii) and (iv) become for n = 27 and α = 0.05?
16.5.3 Verify relation (32).
16.5.4 Refer to Example 3 and suppose that the r.v.s Y_0i = β_1 + β_2x_0 + e_i, i = 1, ..., m, corresponding to an unknown point x_0, are observed. It is assumed that the r.v.s Y_j, j = 1, ..., n and Y_0i, i = 1, ..., m are all independent.
i) Derive the MLE x̂_0 of x_0;
ii) Set V = Ȳ_0 − β̂_1 − β̂_2x̂_0, where Ȳ_0 is the sample mean of the Y_0i's, and show that the r.v.
$$\frac V{\sigma_V\sqrt{\dfrac{(m+n)\tilde\sigma^2}{(m+n-3)\sigma^2}}},$$
where
$$\sigma_V^2=\sigma^2\Big(\frac1m+\frac1n+\frac{(\hat x_0-\bar x)^2}{\sum_{j=1}^n(x_j-\bar x)^2}\Big)$$
and
$$\tilde\sigma^2=\frac1{m+n}\Big[\sum_{j=1}^n\big(Y_j-\hat\beta_1-\hat\beta_2x_j\big)^2+\sum_{i=1}^m\big(Y_{0i}-\bar Y_0\big)^2\Big],$$
is distributed as t_{m+n−3}.
16.5.5 Refer to the model considered in Example 1 and suppose that the xs
and the observed values of the Ys are given by the following table:
x 5 10 15 20 25 30
i) Find the LSEs of β_1, β_2 and σ² by utilizing the formulas (19), (19′) and (21), respectively;
ii) Construct confidence intervals for β_1, β_2 and σ² with confidence coefficient 1 − α = 0.95 (see Exercises 16.5.1 and 16.5.2(ii));
iii) On the basis of the assumed model, predict Y0 at x0 = 17 and construct a
prediction interval for Y0 with confidence coefficient 1 = 0.95 (see
Example 3).
16.5.6 The following table gives the reciprocal temperatures (x) and the
corresponding observed solubilities of a certain chemical substance.
Assume the model considered in Example 1 and discuss questions (i) and (ii)
of the previous exercise. Also discuss question (iii) of the same exercise for
x0 = 3.77.
16.5.7 Let Z_j, j = 1, ..., n be independent r.v.s, where Z_j is distributed as N(ζ_j, σ²). Suppose that ζ_j = 0 for j = r + 1, ..., n, whereas ζ_1, ..., ζ_r, σ² are parameters. Then derive the LR test for testing the hypothesis H: ζ_1 = ζ_1⁰ against the alternative A: ζ_1 ≠ ζ_1⁰ at level of significance α.
16.5.8 Consider the r.v.s of Exercise 16.4.2 and transform the Ys to Zs by
means of an orthogonal transformation P whose first two rows are
$$\Big(\frac{x_1-\bar x}{s_x},\dots,\frac{x_n-\bar x}{s_x}\Big)\quad\text{and}\quad\Big(\frac1{\sqrt n},\dots,\frac1{\sqrt n}\Big),\quad\text{where}\quad s_x^2=\sum_{j=1}^n(x_j-\bar x)^2.$$
Then:
Chapter 17
Analysis of Variance
17.1 One-way Layout (or One-way Classification) with the Same Number of
Observations Per Cell
The models to be discussed in the present chapter are special cases of the
general model which was studied in the previous chapter. In this section, we
consider what is known as a one-way layout, or one-way classification, which
we introduce by means of a couple of examples.
440
$$\mathbf Y=(Y_{11},\dots,Y_{1J};Y_{21},\dots,Y_{2J};\dots;Y_{I1},\dots,Y_{IJ})',$$
$$\mathbf e=(e_{11},\dots,e_{1J};e_{21},\dots,e_{2J};\dots;e_{I1},\dots,e_{IJ})',$$
$$\boldsymbol\beta=(\mu_1,\dots,\mu_I)'$$
[Figure 17.1: an I × J array of cells, with rows i = 1, ..., I and columns j = 1, ..., J; the (i, j)th cell holds the observation Y_ij.]
Here
$$\mathbf X'=\begin{pmatrix}\mathbf 1_J&\mathbf 0&\cdots&\mathbf 0\\\mathbf 0&\mathbf 1_J&\cdots&\mathbf 0\\\vdots&&\ddots&\vdots\\\mathbf 0&\mathbf 0&\cdots&\mathbf 1_J\end{pmatrix}\quad(IJ\times I),$$
where 1_J is a column of J ones, so that
$$\mathbf S=\mathbf X\mathbf X'=\begin{pmatrix}J&0&\cdots&0\\0&J&\cdots&0\\\vdots&&\ddots&\vdots\\0&0&\cdots&J\end{pmatrix}=J\mathbf I_p$$
and
$$\mathbf X\mathbf Y=\Big(\sum_{j=1}^J Y_{1j},\ \sum_{j=1}^J Y_{2j},\ \dots,\ \sum_{j=1}^J Y_{Ij}\Big)',$$
so that, by (9) of Chapter 16,
$$\hat{\boldsymbol\beta}=\mathbf S^{-1}\mathbf X\mathbf Y=\Big(\frac1J\sum_{j=1}^J Y_{1j},\ \frac1J\sum_{j=1}^J Y_{2j},\ \dots,\ \frac1J\sum_{j=1}^J Y_{Ij}\Big)'.$$
Therefore the LSEs of the μ's are given by
$$\hat\mu_i=\bar Y_{i.},\quad\text{where}\quad\bar Y_{i.}=\frac1J\sum_{j=1}^J Y_{ij},\quad i=1,\dots,I.\tag{2}$$
Next, one has
$$\boldsymbol\eta=E\mathbf Y=(\underbrace{\mu_1,\dots,\mu_1}_{J};\underbrace{\mu_2,\dots,\mu_2}_{J};\dots;\underbrace{\mu_I,\dots,\mu_I}_{J})',$$
so that, under the hypothesis H: μ_1 = ⋯ = μ_I (= μ, unspecified), η ∈ V_1. That is, r − q = 1 and hence q = r − 1 = p − 1 = I − 1. Therefore, according to (31) in Chapter 16, the F statistic for testing H is given by
$$F=\frac{n-r}q\cdot\frac{S_c-S_C}{S_C}=\frac{I(J-1)}{I-1}\cdot\frac{S_c-S_C}{S_C}.\tag{3}$$
Now, under H, the model becomes Y_ij = μ + e_ij, and the LSE of μ is obtained by differentiating with respect to μ the expression
$$\lVert\mathbf Y-\boldsymbol\eta_c\rVert^2=\sum_{i=1}^I\sum_{j=1}^J(Y_{ij}-\mu)^2;$$
this yields
$$\hat\mu=\bar Y_{..},\quad\text{where}\quad\bar Y_{..}=\frac1{IJ}\sum_{i=1}^I\sum_{j=1}^J Y_{ij}.\tag{4}$$
Therefore
$$S_C=\lVert\mathbf Y-\hat{\boldsymbol\eta}_C\rVert^2=\sum_{i=1}^I\sum_{j=1}^J\big(Y_{ij}-\hat\eta_{ij,C}\big)^2=\sum_{i=1}^I\sum_{j=1}^J\big(Y_{ij}-\bar Y_{i.}\big)^2$$
and
$$S_c=\lVert\mathbf Y-\hat{\boldsymbol\eta}_c\rVert^2=\sum_{i=1}^I\sum_{j=1}^J\big(Y_{ij}-\hat\eta_{ij,c}\big)^2=\sum_{i=1}^I\sum_{j=1}^J\big(Y_{ij}-\bar Y_{..}\big)^2.$$
Now
$$\sum_{j=1}^J\big(Y_{ij}-\bar Y_{i.}\big)^2=\sum_{j=1}^J Y_{ij}^2-J\bar Y_{i.}^2,$$
so that
$$S_C=SS_e,\quad\text{where}\quad SS_e=\sum_{i=1}^I\sum_{j=1}^J\big(Y_{ij}-\bar Y_{i.}\big)^2=\sum_{i=1}^I\sum_{j=1}^J Y_{ij}^2-J\sum_{i=1}^I\bar Y_{i.}^2.\tag{5}$$
Likewise,
$$S_c=SS_T,\quad\text{where}\quad SS_T=\sum_{i=1}^I\sum_{j=1}^J\big(Y_{ij}-\bar Y_{..}\big)^2=\sum_{i=1}^I\sum_{j=1}^J Y_{ij}^2-IJ\bar Y_{..}^2,\tag{6}$$
so that
$$S_c-S_C=J\sum_{i=1}^I\bar Y_{i.}^2-IJ\bar Y_{..}^2=J\Big(\sum_{i=1}^I\bar Y_{i.}^2-I\bar Y_{..}^2\Big)=J\sum_{i=1}^I\big(\bar Y_{i.}-\bar Y_{..}\big)^2,$$
since
$$\bar Y_{..}=\frac1I\sum_{i=1}^I\frac1J\sum_{j=1}^J Y_{ij}=\frac1I\sum_{i=1}^I\bar Y_{i.}.$$
That is,
$$S_c-S_C=SS_H,\tag{7}$$
where
$$SS_H=J\sum_{i=1}^I\big(\bar Y_{i.}-\bar Y_{..}\big)^2=J\sum_{i=1}^I\bar Y_{i.}^2-IJ\bar Y_{..}^2.$$
$$F=\frac{I(J-1)}{I-1}\cdot\frac{SS_H}{SS_e}=\frac{MS_H}{MS_e},\tag{8}$$
where
$$MS_H=\frac{SS_H}{I-1},\qquad MS_e=\frac{SS_e}{I(J-1)},$$
and SS_H and SS_e are given by (7) and (5), respectively. These expressions are also appropriate for actual calculations. Finally, according to Theorem 4 of Chapter 16, the LSE of σ² is given by
$$\hat\sigma^2=\frac{SS_e}{I(J-1)}.\tag{9}$$
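The one-way layout computations in (5), (7) and (8) fit in a few lines. The I × J data matrix below is an illustrative assumption, not from the text:

```python
import numpy as np

# One-way layout: SS_e (5), SS_H (7), SS_T (6) and the F statistic (8).
Y = np.array([[8.0, 9.0, 7.0, 8.0],     # group 1
              [6.0, 5.0, 7.0, 6.0],     # group 2
              [9.0, 10.0, 8.0, 9.0]])   # group 3
I, J = Y.shape

Yi = Y.mean(axis=1)                     # group means Y_i.
Ybar = Y.mean()                         # grand mean Y..

SS_e = np.sum((Y - Yi[:, None]) ** 2)           # within groups, (5)
SS_H = J * np.sum((Yi - Ybar) ** 2)             # between groups, (7)
SS_T = np.sum((Y - Ybar) ** 2)                  # total, (6)
F = (SS_H / (I - 1)) / (SS_e / (I * (J - 1)))   # (8)

assert np.isclose(SS_T, SS_H + SS_e)            # the split behind "ANOVA"
print(F)
```

For this data, F = 14.0, to be compared with the F_{I−1, I(J−1); α} critical value.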
Table 1 Analysis of Variance for One-way Layout

| source of variance | sums of squares | degrees of freedom | mean squares |
| --- | --- | --- | --- |
| between groups | SS_H = J Σ_{i=1}^I (Ȳ_i. − Ȳ..)² | I − 1 | MS_H = SS_H/(I − 1) |
| within groups | SS_e = Σ_{i=1}^I Σ_{j=1}^J (Y_ij − Ȳ_i.)² | I(J − 1) | MS_e = SS_e/[I(J − 1)] |
| total | SS_T = Σ_{i=1}^I Σ_{j=1}^J (Y_ij − Ȳ..)² | IJ − 1 | |
REMARK 1 From (5), (6) and (7) it follows that SST = SSH + SSe. Also from
(6) it follows that SST stands for the sum of squares of the deviations of the Yijs
from the grand (sample) mean Y. .. Next, from (5) we have that, for each i,
Jj = 1(Yij Yi.)2 is the sum of squares of the deviations of Yij, j = 1, . . . , J within
the ith group. For this reason, SSe is called the sum of squares within groups.
On the other hand, from (7) we have that SSH represents the sum of squares of
the deviations of the group means Yi. from the grand mean Y. . (up to the factor
J). For this reason, SSH is called the sum of squares between groups. Finally,
SST is called the total sum of squares for obvious reasons, and as mentioned
above, it splits into SSH and SSe. Actually, the analysis of variance itself derives
its name because of such a split of SST.
Now, as follows from the discussion in Section 5 of Chapter 16, the quantities SS_H and SS_e are independently distributed, under H, as σ²χ²_{I−1} and σ²χ²_{I(J−1)}, respectively. Then SS_T is σ²χ²_{IJ−1} distributed, under H. We may
Exercise
17.1.1 Apply the one-way layout analysis of variance to the data given in the
table below.
A B C
$$\sum_{i=1}^I\alpha_i=\sum_{j=1}^J\beta_j=0.$$
Here the ith row effect α_i is the contribution of the ith fertilizer, and the jth column effect β_j is the contribution of the jth variety of the commodity in question.
From the preceding two examples it follows that the outcome Yij is af-
fected by two factors, machines and workers in Example 4 and fertilizers and
varieties of agricultural commodity in Example 5. The I objects (machines or
fertilizers) and the J objects (workers or varieties of an agricultural commod-
ity) associated with these factors are also referred to as levels of the factors.
The same interpretation and terminology is used in similar situations through-
out this chapter.
In connection with model (10), there are the following three problems to be solved: estimation of μ; α_i, i = 1, ..., I; β_j, j = 1, ..., J; testing the hypotheses H_A: α_1 = ⋯ = α_I = 0 (that is, there is no row effect) and H_B: β_1 = ⋯ = β_J = 0 (that is, there is no column effect); and estimation of σ².
We first show that model (10) is a special case of the model described in
(6) of Chapter 16. For this purpose, we set
$$\mathbf Y=(Y_{11},\dots,Y_{1J};Y_{21},\dots,Y_{2J};\dots;Y_{I1},\dots,Y_{IJ})',$$
$$\mathbf e=(e_{11},\dots,e_{1J};e_{21},\dots,e_{2J};\dots;e_{I1},\dots,e_{IJ})',$$
$$\boldsymbol\beta=(\mu;\alpha_1,\dots,\alpha_I;\beta_1,\dots,\beta_J)'$$
and
and X′ is the IJ × (I + J + 1) matrix whose row corresponding to the (i, j)th cell is
$$\big(1;\ \underbrace{0,\dots,1,\dots,0}_{I\ \text{entries, 1 in position }i};\ \underbrace{0,\dots,1,\dots,0}_{J\ \text{entries, 1 in position }j}\big),$$
the rows being arranged in the cell order (1, 1), ..., (1, J); (2, 1), ..., (2, J); ...; (I, 1), ..., (I, J).
Y = X + e with n = IJ and p = I + J + 1.
It can be shown (see also Exercise 17.2.1) that X is not of full rank but rank X = r = I + J − 1. However, because of the two independent restrictions
$$\sum_{i=1}^I\alpha_i=\sum_{j=1}^J\beta_j=0$$
imposed on the parameters, the normal equations still have a unique solution,
as is found by differentiation.
In fact,
$$S(\mathbf Y,\boldsymbol\beta)=\sum_{i=1}^I\sum_{j=1}^J\big(Y_{ij}-\mu-\alpha_i-\beta_j\big)^2,\quad\text{and}\quad\frac\partial{\partial\mu}S(\mathbf Y,\boldsymbol\beta)=0$$
implies μ̂ = Ȳ.., where Ȳ.. is again given by (4);
$$\frac\partial{\partial\alpha_i}S(\mathbf Y,\boldsymbol\beta)=0$$
implies α̂_i = Ȳ_i. − Ȳ.., where Ȳ_i. is given by (2), and (∂/∂β_j)S(Y, β) = 0 implies β̂_j = Ȳ_.j − Ȳ.., where
$$\bar Y_{.j}=\frac1I\sum_{i=1}^I Y_{ij}.$$
Summarizing these results, we have then that the LSEs of μ, α_i and β_j are, respectively,
$$\hat\mu=\bar Y_{..},\qquad\hat\alpha_i=\bar Y_{i.}-\bar Y_{..},\quad i=1,\dots,I,\qquad\hat\beta_j=\bar Y_{.j}-\bar Y_{..},\quad j=1,\dots,J,\tag{11}$$
where Ȳ_i., i = 1, ..., I are given by (2), Ȳ.. is given by (4) and
$$\bar Y_{.j}=\frac1I\sum_{i=1}^I Y_{ij},\quad j=1,\dots,J.\tag{12}$$
Under H_A, minimizing
$$\sum_{i=1}^I\sum_{j=1}^J\big(Y_{ij}-\mu-\beta_j\big)^2$$
gives
$$\hat\mu_A=\bar Y_{..}=\hat\mu,\qquad\hat\beta_{j,A}=\bar Y_{.j}-\bar Y_{..}=\hat\beta_j,\quad j=1,\dots,J.\tag{13}$$
Therefore relations (28) and (29) in Chapter 16 give, by means of (11) and (12),
$$S_C=\lVert\mathbf Y-\hat{\boldsymbol\eta}_C\rVert^2=\sum_{i=1}^I\sum_{j=1}^J\big(Y_{ij}-\hat\eta_{ij,C}\big)^2=\sum_{i=1}^I\sum_{j=1}^J\big(Y_{ij}-\bar Y_{i.}-\bar Y_{.j}+\bar Y_{..}\big)^2$$
and
$$S_{c_A}=\lVert\mathbf Y-\hat{\boldsymbol\eta}_{c_A}\rVert^2=\sum_{i=1}^I\sum_{j=1}^J\big(Y_{ij}-\hat\eta_{ij,c_A}\big)^2=\sum_{i=1}^I\sum_{j=1}^J\big(Y_{ij}-\bar Y_{.j}\big)^2.$$
Then
$$S_C=SS_e=\sum_{i=1}^I\sum_{j=1}^J\big[(Y_{ij}-\bar Y_{.j})-(\bar Y_{i.}-\bar Y_{..})\big]^2=\sum_{i=1}^I\sum_{j=1}^J\big(Y_{ij}-\bar Y_{.j}\big)^2-J\sum_{i=1}^I\big(\bar Y_{i.}-\bar Y_{..}\big)^2\tag{14}$$
because
$$\sum_{i=1}^I\sum_{j=1}^J\big(Y_{ij}-\bar Y_{.j}\big)\big(\bar Y_{i.}-\bar Y_{..}\big)=J\sum_{i=1}^I\big(\bar Y_{i.}-\bar Y_{..}\big)^2.$$
Therefore
$$S_{c_A}-S_C=SS_A,\quad\text{where}\quad SS_A=J\sum_{i=1}^I\hat\alpha_i^2=J\sum_{i=1}^I\big(\bar Y_{i.}-\bar Y_{..}\big)^2=J\sum_{i=1}^I\bar Y_{i.}^2-IJ\bar Y_{..}^2.\tag{15}$$
It follows that for testing HA, the F statistic, to be denoted here by FA, is given
by
$$F_A=\frac{(I-1)(J-1)}{I-1}\cdot\frac{SS_A}{SS_e}=\frac{MS_A}{MS_e},\tag{16}$$
where
$$MS_A=\frac{SS_A}{I-1},\qquad MS_e=\frac{SS_e}{(I-1)(J-1)},$$
and SSA, SSe are given by (15) and (14), respectively. (However, for an expres-
sion of SSe to be used in actual calculations, see (20) below.)
Similarly, for testing H_B, the F statistic is
$$F_B=\frac{(I-1)(J-1)}{J-1}\cdot\frac{SS_B}{SS_e}=\frac{MS_B}{MS_e},\tag{17}$$
where MS_B = SS_B/(J − 1) and
$$SS_B=S_{c_B}-S_C=I\sum_{j=1}^J\hat\beta_j^2=I\sum_{j=1}^J\big(\bar Y_{.j}-\bar Y_{..}\big)^2=I\sum_{j=1}^J\bar Y_{.j}^2-IJ\bar Y_{..}^2.\tag{18}$$
The quantities SSA and SSB are known as sums of squares of row effects and
column effects, respectively.
Finally, if we set
$$SS_T=\sum_{i=1}^I\sum_{j=1}^J\big(Y_{ij}-\bar Y_{..}\big)^2=\sum_{i=1}^I\sum_{j=1}^J Y_{ij}^2-IJ\bar Y_{..}^2,\tag{19}$$
we show below that SST = SSe + SSA + SSB from where we get
SSe = SST SSA SSB. (20)
Relation (20) provides a way of calculating SSe by way of (15), (18) and (19).
Clearly,
$$SS_e=\sum_{i=1}^I\sum_{j=1}^J\big[(Y_{ij}-\bar Y_{..})-(\bar Y_{i.}-\bar Y_{..})-(\bar Y_{.j}-\bar Y_{..})\big]^2$$
$$=\sum_{i=1}^I\sum_{j=1}^J\big(Y_{ij}-\bar Y_{..}\big)^2+J\sum_{i=1}^I\big(\bar Y_{i.}-\bar Y_{..}\big)^2+I\sum_{j=1}^J\big(\bar Y_{.j}-\bar Y_{..}\big)^2$$
$$\quad-2\sum_{i=1}^I\sum_{j=1}^J\big(Y_{ij}-\bar Y_{..}\big)\big(\bar Y_{i.}-\bar Y_{..}\big)-2\sum_{i=1}^I\sum_{j=1}^J\big(Y_{ij}-\bar Y_{..}\big)\big(\bar Y_{.j}-\bar Y_{..}\big)$$
$$\quad+2\sum_{i=1}^I\sum_{j=1}^J\big(\bar Y_{i.}-\bar Y_{..}\big)\big(\bar Y_{.j}-\bar Y_{..}\big)=SS_T-SS_A-SS_B,$$
because
$$\sum_{i=1}^I\sum_{j=1}^J\big(Y_{ij}-\bar Y_{..}\big)\big(\bar Y_{i.}-\bar Y_{..}\big)=J\sum_{i=1}^I\big(\bar Y_{i.}-\bar Y_{..}\big)^2=SS_A,$$
Table 2 Analysis of Variance for Two-way Layout with One Observation Per Cell

| source of variance | sums of squares | degrees of freedom | mean squares |
| --- | --- | --- | --- |
| rows | SS_A = J Σ_{i=1}^I α̂_i² = J Σ_{i=1}^I (Ȳ_i. − Ȳ..)² | I − 1 | MS_A = SS_A/(I − 1) |
| columns | SS_B = I Σ_{j=1}^J β̂_j² = I Σ_{j=1}^J (Ȳ_.j − Ȳ..)² | J − 1 | MS_B = SS_B/(J − 1) |
| residual | SS_e = Σ_{i=1}^I Σ_{j=1}^J (Y_ij − Ȳ_i. − Ȳ_.j + Ȳ..)² | (I − 1)(J − 1) | MS_e = SS_e/[(I − 1)(J − 1)] |
| total | SS_T = Σ_{i=1}^I Σ_{j=1}^J (Y_ij − Ȳ..)² | IJ − 1 | |
$$\sum_{i=1}^I\sum_{j=1}^J\big(Y_{ij}-\bar Y_{..}\big)\big(\bar Y_{.j}-\bar Y_{..}\big)=\sum_{j=1}^J\big(\bar Y_{.j}-\bar Y_{..}\big)\sum_{i=1}^I\big(Y_{ij}-\bar Y_{..}\big)=I\sum_{j=1}^J\big(\bar Y_{.j}-\bar Y_{..}\big)^2=SS_B$$
and
$$\sum_{i=1}^I\sum_{j=1}^J\big(\bar Y_{i.}-\bar Y_{..}\big)\big(\bar Y_{.j}-\bar Y_{..}\big)=\sum_{i=1}^I\big(\bar Y_{i.}-\bar Y_{..}\big)\sum_{j=1}^J\big(\bar Y_{.j}-\bar Y_{..}\big)=0.$$
The pairs SS_e, SS_A and SS_e, SS_B are independent σ²χ² distributed r.v.s with certain degrees of freedom, as a consequence of the discussion in Section 5 of Chapter 16. Finally, the LSE of σ² is given by
$$\hat\sigma^2=MS_e.\tag{21}$$
This section is closed by summarizing the basic results in Table 2 above.
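The computations summarized in Table 2 can be sketched as follows; the I × J table of observations is an illustrative assumption, not data from the text:

```python
import numpy as np

# Two-way layout with one observation per cell: (15), (18), (19) and (20).
Y = np.array([[3.0, 7.0, 5.0],
              [1.0, 2.0, 0.0],
              [1.0, 2.0, 4.0],
              [4.0, 6.0, 6.0]])
I, J = Y.shape

Yi = Y.mean(axis=1)        # row means Y_i.
Yj = Y.mean(axis=0)        # column means Y_.j
Ybar = Y.mean()            # grand mean Y..

SS_A = J * np.sum((Yi - Ybar) ** 2)                         # rows, (15)
SS_B = I * np.sum((Yj - Ybar) ** 2)                         # columns, (18)
SS_T = np.sum((Y - Ybar) ** 2)                              # total, (19)
SS_e = np.sum((Y - Yi[:, None] - Yj[None, :] + Ybar) ** 2)  # residual, (14)

assert np.isclose(SS_T, SS_A + SS_B + SS_e)                 # relation (20)
F_A = (SS_A / (I - 1)) / (SS_e / ((I - 1) * (J - 1)))       # (16)
F_B = (SS_B / (J - 1)) / (SS_e / ((I - 1) * (J - 1)))       # (17)
print(F_A, F_B)
```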
Exercises
17.2.1 Show that rank X = I + J 1, where X is the matrix employed in
Section 2.
17.2.2 Apply the two-way layout with one observation per cell analysis of
variance to the data given in the following table (take = 0.05).
3 7 5 4
1 2 0 2
1 2 4 0
$$\sum_{i=1}^I\alpha_i=\sum_{j=1}^J\beta_j=\sum_{j=1}^J\gamma_{ij}=\sum_{i=1}^I\gamma_{ij}=0$$
$$\mathbf Y=(Y_{111},\dots,Y_{11K};\dots;Y_{1J1},\dots,Y_{1JK};\dots;Y_{IJ1},\dots,Y_{IJK})',$$
$$\mathbf e=(e_{111},\dots,e_{11K};\dots;e_{1J1},\dots,e_{1JK};\dots;e_{IJ1},\dots,e_{IJK})',$$
$$\boldsymbol\beta=(\mu_{11},\dots,\mu_{1J};\dots;\mu_{I1},\dots,\mu_{IJ})'$$
and
17.3 Two-way Layout (Classification) with K (≥ 2) Observations Per Cell 453
$$\mathbf X'=\begin{pmatrix}\mathbf 1_K&\mathbf 0&\cdots&\mathbf 0\\\mathbf 0&\mathbf 1_K&\cdots&\mathbf 0\\\vdots&&\ddots&\vdots\\\mathbf 0&\mathbf 0&\cdots&\mathbf 1_K\end{pmatrix}\quad(IJK\times IJ),$$
where 1_K is a column of K ones and the IJ column blocks are arranged in the cell order (1, 1), ..., (1, J); ...; (I, 1), ..., (I, J);
it is readily seen that
$$\mathbf Y=\mathbf X'\boldsymbol\beta+\mathbf e\quad\text{with}\quad n=IJK\ \text{and}\ p=IJ,\tag{22}$$
so that model (22) is a special case of model (6) in Chapter 16. From the form
of X it is also clear that rank X = r = p = IJ; that is, X is of full rank (see also
Exercise 17.3.1). Therefore the unique LSEs of the parameters involved are
obtained by differentiating with respect to μ_ij the expression
$$S(\mathbf Y,\boldsymbol\beta)=\sum_{i=1}^I\sum_{j=1}^J\sum_{k=1}^K\big(Y_{ijk}-\mu_{ij}\big)^2.$$
We have then
$$\hat\mu_{ij}=\bar Y_{ij.},\quad i=1,\dots,I;\ j=1,\dots,J.\tag{23}$$
Next, from the fact that μ_ij = μ + α_i + β_j + γ_ij and on the basis of the assumptions made in (22), we have
$$\mu=\bar\mu_{..},\qquad\alpha_i=\bar\mu_{i.}-\bar\mu_{..},\qquad\beta_j=\bar\mu_{.j}-\bar\mu_{..},\qquad\gamma_{ij}=\mu_{ij}-\bar\mu_{i.}-\bar\mu_{.j}+\bar\mu_{..},\tag{24}$$
by employing the dot notation already used in the previous two sections. From (24) we have that μ, α_i, β_j and γ_ij are linear combinations of the parameters μ_ij. Therefore, by the corollary to Theorem 3 in Chapter 16, they are estimable, and their LSEs μ̂, α̂_i, β̂_j, γ̂_ij are given by the above-mentioned linear combinations, upon replacing the μ_ij by their LSEs. It is then readily seen that
$$\hat\mu=\bar Y_{...},\qquad\hat\alpha_i=\bar Y_{i..}-\bar Y_{...},\qquad\hat\beta_j=\bar Y_{.j.}-\bar Y_{...},\qquad\hat\gamma_{ij}=\bar Y_{ij.}-\bar Y_{i..}-\bar Y_{.j.}+\bar Y_{...},$$
$$i=1,\dots,I;\ j=1,\dots,J.\tag{25}$$
Now from (23) and (25) it follows that μ̂_ij = μ̂ + α̂_i + β̂_j + γ̂_ij. Therefore
$$S_C=\sum_{i=1}^I\sum_{j=1}^J\sum_{k=1}^K\big(Y_{ijk}-\hat\mu_{ij}\big)^2=\sum_{i=1}^I\sum_{j=1}^J\sum_{k=1}^K\big(Y_{ijk}-\hat\mu-\hat\alpha_i-\hat\beta_j-\hat\gamma_{ij}\big)^2.$$
Next,
$$Y_{ijk}-\mu_{ij}=\big(Y_{ijk}-\hat\mu_{ij}\big)+\big(\hat\mu-\mu\big)+\big(\hat\alpha_i-\alpha_i\big)+\big(\hat\beta_j-\beta_j\big)+\big(\hat\gamma_{ij}-\gamma_{ij}\big)$$
and hence
$$S(\mathbf Y,\boldsymbol\beta)=\sum_{i=1}^I\sum_{j=1}^J\sum_{k=1}^K\big(Y_{ijk}-\mu_{ij}\big)^2=S_C+IJK\big(\hat\mu-\mu\big)^2$$
$$\quad+JK\sum_{i=1}^I\big(\hat\alpha_i-\alpha_i\big)^2+IK\sum_{j=1}^J\big(\hat\beta_j-\beta_j\big)^2+K\sum_{i=1}^I\sum_{j=1}^J\big(\hat\gamma_{ij}-\gamma_{ij}\big)^2,\tag{26}$$
because, as is easily seen, all other terms are equal to zero. (See also Exercise
17.3.2.)
From identity (26) it follows that, under the hypothesis H_A: α_1 = ⋯ = α_I = 0, the LSEs of the remaining parameters remain the same as those given in (25). It follows then that
$$S_{c_A}=S_C+JK\sum_{i=1}^I\hat\alpha_i^2,\quad\text{so that}\quad S_{c_A}-S_C=JK\sum_{i=1}^I\hat\alpha_i^2.$$
Thus for testing the hypothesis H_A the sums of squares to be employed are
$$SS_e=S_C=\sum_{i=1}^I\sum_{j=1}^J\sum_{k=1}^K\big(Y_{ijk}-\bar Y_{ij.}\big)^2\tag{27}$$
and
$$S_{c_A}-S_C=SS_A=JK\sum_{i=1}^I\hat\alpha_i^2=JK\sum_{i=1}^I\big(\bar Y_{i..}-\bar Y_{...}\big)^2=JK\sum_{i=1}^I\bar Y_{i..}^2-IJK\bar Y_{...}^2.\tag{28}$$
$$F_A=\frac{IJ(K-1)}{I-1}\cdot\frac{SS_A}{SS_e}=\frac{MS_A}{MS_e},\tag{29}$$
where
$$MS_A=\frac{SS_A}{I-1},\qquad MS_e=\frac{SS_e}{IJ(K-1)},$$
and SSA, SSe are given by (28) and (27), respectively.
For testing the hypothesis
H_B: β_1 = ⋯ = β_J = 0,
we find in an entirely symmetric way that the F statistic to be employed is given by
$$F_B=\frac{IJ(K-1)}{J-1}\cdot\frac{SS_B}{SS_e}=\frac{MS_B}{MS_e},\tag{30}$$
where
$$MS_B=\frac{SS_B}{J-1}\quad\text{and}\quad SS_B=IK\sum_{j=1}^J\hat\beta_j^2=IK\sum_{j=1}^J\big(\bar Y_{.j.}-\bar Y_{...}\big)^2=IK\sum_{j=1}^J\bar Y_{.j.}^2-IJK\bar Y_{...}^2.\tag{31}$$
j =1
Likewise, for testing the hypothesis H_AB: γ_ij = 0, i = 1, ..., I; j = 1, ..., J, the F statistic to be employed is
$$F_{AB}=\frac{IJ(K-1)}{(I-1)(J-1)}\cdot\frac{SS_{AB}}{SS_e}=\frac{MS_{AB}}{MS_e},\tag{32}$$
where
$$MS_{AB}=\frac{SS_{AB}}{(I-1)(J-1)}\quad\text{and}\quad SS_{AB}=K\sum_{i=1}^I\sum_{j=1}^J\hat\gamma_{ij}^2=K\sum_{i=1}^I\sum_{j=1}^J\big(\bar Y_{ij.}-\bar Y_{i..}-\bar Y_{.j.}+\bar Y_{...}\big)^2.\tag{33}$$
i =1 j =1
(However, for an expression of SSAB suitable for calculations, see (35) below.)
Finally, by setting
$$SS_T=\sum_{i=1}^I\sum_{j=1}^J\sum_{k=1}^K\big(Y_{ijk}-\bar Y_{...}\big)^2=\sum_{i=1}^I\sum_{j=1}^J\sum_{k=1}^K Y_{ijk}^2-IJK\bar Y_{...}^2,\tag{34}$$
we can show (see Exercise 17.3.3) that SST = SSe + SSA + SSB + SSAB, so that
SSAB = SST SSe SSA SSB. (35)
Relation (35) is suitable for calculating SSAB in conjunction with (27), (28), (31)
and (34).
Of course, the LSE of σ² is given by
$$\hat\sigma^2=MS_e.\tag{36}$$
Once again the main results of this section are summarized in a table, Table 3.
The number of degrees of freedom of SS_T is the sum of those of SS_A, SS_B, SS_AB and SS_e, which can be shown to be independently distributed as σ²χ² r.v.s with certain degrees of freedom.
EXAMPLE 6 For a numerical application, consider two drugs (I = 2) administered in three
dosages (J = 3) to three groups each of which consists of four (K = 4) subjects.
Certain measurements are taken on the subjects and suppose they are as
follows:
X111 = 18 X121 = 64 X131 = 61
X112 = 20 X122 = 49 X132 = 73
X113 = 50 X123 = 35 X133 = 62
X114 = 53 X124 = 62 X134 = 90
X211 = 34 X221 = 40 X231 = 56
X212 = 36 X222 = 63 X232 = 61
X213 = 40 X223 = 35 X233 = 58
X214 = 17 X224 = 63 X234 = 73
For this data we have
μ̂ = 50.5416; α̂_1 = 2.5417, α̂_2 = −2.5416; β̂_1 = −17.0416, β̂_2 = 0.8334, β̂_3 = 16.2084; γ̂_11 = −0.7917, γ̂_12 = −1.4167, γ̂_13 = 2.2083, γ̂_21 = 0.7916, γ̂_22 = 1.4166, γ̂_23 = −2.2084
Table 3 Analysis of Variance for Two-way Layout with K (≥ 2) Observations Per Cell

| source of variance | sums of squares | degrees of freedom | mean squares |
| --- | --- | --- | --- |
| A main effects | SS_A = JK Σ_{i=1}^I α̂_i² = JK Σ_{i=1}^I (Ȳ_i.. − Ȳ...)² | I − 1 | MS_A = SS_A/(I − 1) |
| B main effects | SS_B = IK Σ_{j=1}^J β̂_j² = IK Σ_{j=1}^J (Ȳ_.j. − Ȳ...)² | J − 1 | MS_B = SS_B/(J − 1) |
| AB interactions | SS_AB = K Σ_{i=1}^I Σ_{j=1}^J γ̂_ij² = K Σ_{i=1}^I Σ_{j=1}^J (Ȳ_ij. − Ȳ_i.. − Ȳ_.j. + Ȳ...)² | (I − 1)(J − 1) | MS_AB = SS_AB/[(I − 1)(J − 1)] |
| error | SS_e = Σ_{i=1}^I Σ_{j=1}^J Σ_{k=1}^K (Y_ijk − Ȳ_ij.)² | IJ(K − 1) | MS_e = SS_e/[IJ(K − 1)] |
| total | SS_T = Σ_{i=1}^I Σ_{j=1}^J Σ_{k=1}^K (Y_ijk − Ȳ...)² | IJK − 1 | |
and
FA = 0.8471, FB = 12.1038, FAB = 0.1641.
Thus for α = 0.05, we have F_{1,18;0.05} = 4.4139 and F_{2,18;0.05} = 3.5546; we accept H_A, reject H_B and accept H_AB. Finally, we have σ̂² = 183.0230.
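The computations of Example 6 can be reproduced from (27), (28), (31) and (33); small differences in the last decimals (for example, MS_e ≈ 183.014 against 183.0230 above) come from the rounding of intermediate quantities in the text:

```python
import numpy as np

# Example 6: Y[i, j, k] = measurement on subject k under drug i, dosage j.
Y = np.array([[[18, 20, 50, 53], [64, 49, 35, 62], [61, 73, 62, 90]],
              [[34, 36, 40, 17], [40, 63, 35, 63], [56, 61, 58, 73]]], float)
I, J, K = Y.shape

Yij = Y.mean(axis=2)               # cell means Y_ij.
Yi = Y.mean(axis=(1, 2))           # Y_i..
Yj = Y.mean(axis=(0, 2))           # Y_.j.
Ybar = Y.mean()                    # Y...

SS_e = np.sum((Y - Yij[:, :, None]) ** 2)                          # (27)
SS_A = J * K * np.sum((Yi - Ybar) ** 2)                            # (28)
SS_B = I * K * np.sum((Yj - Ybar) ** 2)                            # (31)
SS_AB = K * np.sum((Yij - Yi[:, None] - Yj[None, :] + Ybar) ** 2)  # (33)

MS_e = SS_e / (I * J * (K - 1))
F_A = (SS_A / (I - 1)) / MS_e
F_B = (SS_B / (J - 1)) / MS_e
F_AB = (SS_AB / ((I - 1) * (J - 1))) / MS_e
print(F_A, F_B, F_AB)    # approximately 0.847, 12.104, 0.164
```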
The models analyzed in the previous three sections describe three experi-
mental designs often used in practice. There are many others as well. Some of
them are taken from the ones just described by allowing different numbers of
observations per cell, by increasing the number of factors, by allowing the row
effects, column effects and interactions to be r.v.s themselves, by randomizing
the levels of some of the factors, etc. However, even a brief study of these
designs would be well beyond the scope of this book.
Exercises
17.3.1 Show that rank X = IJ, where X is the matrix employed in Section
17.3.
17.3.2 Verify identity (26).
17.3.3 Show that SST = SSe + SSA + SSB + SSAB, where SSe, SSA, SSB, SSAB and
SST are given by (27), (28), (31), (33) and (34), respectively.
17.3.4 Apply the two-way layout with two observations per cell analysis of
variance to the data given in the table below (take = 0.05).
95 117 60 138 94
$$\psi=\sum_{i=1}^I c_i\mu_i\quad\text{with}\quad\sum_{i=1}^I c_i=0.$$
Set
$$\hat\psi=\sum_{i=1}^I c_i\bar Y_{i.},\qquad\hat\sigma^2(\hat\psi)=\Big(\frac1J\sum_{i=1}^I c_i^2\Big)MS_e\quad\text{and}\quad S^2=(I-1)F_{I-1,n-I;\alpha},$$
where n = IJ. We will show in the sequel that the interval [ψ̂ − Sσ̂(ψ̂), ψ̂ + Sσ̂(ψ̂)] is a confidence interval with confidence coefficient 1 − α for all contrasts ψ. Next, consider the following definition.
DEFINITION 2 Let ψ and ψ̂ be as above. We say that ψ̂ is significantly different from zero, according to the S (for Scheffé) criterion, if the interval [ψ̂ − Sσ̂(ψ̂), ψ̂ + Sσ̂(ψ̂)] does not contain zero; equivalently, if |ψ̂| > Sσ̂(ψ̂).
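The S criterion is easy to apply in code. In the sketch below, the one-way layout data are illustrative, and the quantile F_{2,6;0.05} = 5.1433 (for I = 3, n − I = 6) is a standard table value hardcoded as an assumption:

```python
import numpy as np

# Scheffe simultaneous intervals psi_hat +/- S * sigma_hat(psi_hat).
Y = np.array([[8.0, 9.0, 7.0],
              [6.0, 5.0, 7.0],
              [11.0, 10.0, 12.0]])
I, J = Y.shape
n = I * J

Yi = Y.mean(axis=1)                                # group means Y_i.
MS_e = np.sum((Y - Yi[:, None]) ** 2) / (n - I)    # within-groups mean square
S = np.sqrt((I - 1) * 5.1433)                      # S^2 = (I-1) F_{I-1,n-I;alpha}

def scheffe_interval(c):
    """Interval for the contrast psi = sum c_i mu_i (requires sum c_i = 0)."""
    psi_hat = c @ Yi
    se = np.sqrt(np.sum(c ** 2) * MS_e / J)        # sigma_hat(psi_hat)
    return psi_hat - S * se, psi_hat + S * se

lo, hi = scheffe_interval(np.array([1.0, 0.0, -1.0]))  # mu_1 - mu_3
print(lo, hi)
```

Here the interval for μ_1 − μ_3 lies entirely below zero, so that contrast is significantly different from zero by the S criterion.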
Now it can be shown that the F test rejects the hypothesis H if and only if there is at least one contrast ψ such that ψ̂ is significantly different from zero. Thus, following the rejection of H, one would construct a confidence interval for each contrast ψ and then would proceed to find out which contrasts are responsible for the rejection of H, starting with the simplest contrasts first. The confidence intervals in question are provided by the following theorem.
THEOREM 1 Refer to the one-way layout described in Section 17.1 and let
$$\psi=\sum_{i=1}^I c_i\mu_i,\qquad\sum_{i=1}^I c_i=0,$$
so that
$$\hat\sigma^2(\hat\psi)=\Big(\frac1J\sum_{i=1}^I c_i^2\Big)MS_e,$$
where MS_e is given in Table 1. Then the interval [ψ̂ − Sσ̂(ψ̂), ψ̂ + Sσ̂(ψ̂)] is a confidence interval simultaneously for all contrasts ψ with confidence coefficient 1 − α, where S² = (I − 1)F_{I−1,n−I;α} and n = IJ.
PROOF Consider the problem of maximizing (minimizing), with respect to $c_i$, $i = 1, \ldots, I$, the quantity
$$f(c_1, \ldots, c_I) = \frac{\sum_{i=1}^{I} c_i\,(Y_{i\cdot} - \mu_i)}{\Big[\frac{1}{J}\sum_{i=1}^{I} c_i^2\Big]^{1/2}}$$
subject to the contrast constraint
$$\sum_{i=1}^{I} c_i = 0.$$
Now, clearly, $f(\lambda c_1, \ldots, \lambda c_I) = f(c_1, \ldots, c_I)$ for any $\lambda > 0$. Therefore the maximum (minimum) of $f(c_1, \ldots, c_I)$, subject to the restraint $\sum_{i=1}^{I} c_i = 0$, is the same as the maximum (minimum) of $f(c_1^*, \ldots, c_I^*) = f(c_1, \ldots, c_I)$, $c_i^* = \lambda c_i$, $i = 1, \ldots, I$, subject to the restraints
$$\sum_{i=1}^{I} c_i^* = 0 \qquad\text{and}\qquad \frac{1}{J}\sum_{i=1}^{I} c_i^{*2} = 1.$$
Hence the problem becomes that of maximizing (minimizing) the quantity
$$q(c_1, \ldots, c_I) = \sum_{i=1}^{I} c_i\,(Y_{i\cdot} - \mu_i),$$
subject to the constraints
$$\sum_{i=1}^{I} c_i = 0 \qquad\text{and}\qquad \sum_{i=1}^{I} c_i^2 = J.$$
Thus the points which maximize (minimize) $q(c_1, \ldots, c_I)$ are to be found on the circumference of the circle which is the intersection of the sphere $\sum_{i=1}^{I} c_i^2 = J$ and the plane $\sum_{i=1}^{I} c_i = 0$ which passes through the origin. Because of this it is clear that $q(c_1, \ldots, c_I)$ has both a maximum and a minimum. The solution of the problem in question will be obtained by means of Lagrange multipliers. To this end, one considers the expression
the expression
I I I
( ) ( )
h = h c1 , . . . , c I ; 1 , 2 = ci Yi . i + 1 ci + 2 ci2 J
i =1 i =1 i =1
and maximizes (minimizes) it with respect to ci, i = 1, , I and 1, 2. We
have
h
= Yk . k + 1 + 2 2 ck = 0, k = 1, . . . , I
ck
h I
= ci = 0 (37)
1 i =1
h I
= c i J = 0.
2
2 i =1
ck =
1
2 2
(
k Yk . 1 ) k = 1, . . . , I . (38)
( )
1 2
1 = . Y. . and 2 = i . Yi . + Y. . .
2 J i =1
ck =
(
J k . Yk . + Y. . ) , k = 1, . . . , I .
( )
2
J i =1 i . Yi . + Y. .
I
Next,
$$\sum_{k=1}^{I}\big(Y_{k\cdot} - \mu_k\big)\big(\mu_k - Y_{k\cdot} - \bar\mu_{\cdot} + \bar Y_{\cdot\cdot}\big) = -\sum_{k=1}^{I}\big[(\mu_k - Y_{k\cdot}) - (\bar\mu_{\cdot} - \bar Y_{\cdot\cdot})\big]\big(\mu_k - Y_{k\cdot}\big)$$
$$= -\sum_{k=1}^{I}\big(\mu_k - Y_{k\cdot}\big)^2 + \big(\bar\mu_{\cdot} - \bar Y_{\cdot\cdot}\big)\sum_{k=1}^{I}\big(\mu_k - Y_{k\cdot}\big) = -\sum_{k=1}^{I}\big(\mu_k - Y_{k\cdot}\big)^2 + I\big(\bar\mu_{\cdot} - \bar Y_{\cdot\cdot}\big)^2$$
$$= -\sum_{k=1}^{I}\big[(\mu_k - Y_{k\cdot}) - (\bar\mu_{\cdot} - \bar Y_{\cdot\cdot})\big]^2 \le 0.$$
Therefore, evaluating $f$ at the maximizing (minimizing) values of the $c_i$'s, one obtains
$$-\Big[J\sum_{i=1}^{I}\big(\mu_i - Y_{i\cdot} - \bar\mu_{\cdot} + \bar Y_{\cdot\cdot}\big)^2\Big]^{1/2} \le \frac{\sum_{i=1}^{I} c_i\,(Y_{i\cdot} - \mu_i)}{\Big[\frac{1}{J}\sum_{i=1}^{I} c_i^2\Big]^{1/2}} \le \Big[J\sum_{i=1}^{I}\big(\mu_i - Y_{i\cdot} - \bar\mu_{\cdot} + \bar Y_{\cdot\cdot}\big)^2\Big]^{1/2} \tag{39}$$
for all $c_i$, $i = 1, \ldots, I$ such that $\sum_{i=1}^{I} c_i = 0$.
Now we observe that
$$J\sum_{i=1}^{I}\big(\mu_i - Y_{i\cdot} - \bar\mu_{\cdot} + \bar Y_{\cdot\cdot}\big)^2 = J\sum_{i=1}^{I}\big[(Y_{i\cdot} - \bar Y_{\cdot\cdot}) - (\mu_i - \bar\mu_{\cdot})\big]^2$$
is $\sigma^2\chi^2_{I-1}$ distributed (see also Exercise 17.4.1) and also independent of $SS_e$, which is $\sigma^2\chi^2_{n-I}$ distributed. (See Section 17.1.) Therefore
$$\frac{J\sum_{i=1}^{I}\big(\mu_i - Y_{i\cdot} - \bar\mu_{\cdot} + \bar Y_{\cdot\cdot}\big)^2\big/(I-1)}{MS_e}$$
is $F_{I-1,\,n-I}$ distributed, and thus
$$P\Big[J\sum_{i=1}^{I}\big(\mu_i - Y_{i\cdot} - \bar\mu_{\cdot} + \bar Y_{\cdot\cdot}\big)^2 \le (I-1)F_{I-1,\,n-I;\,\alpha}\,MS_e\Big] = 1 - \alpha. \tag{40}$$
By means of (39) and (40), then,
$$P\Bigg[-\Big((I-1)F_{I-1,\,n-I;\,\alpha}\,\frac{1}{J}\sum_{i=1}^{I}c_i^2\,MS_e\Big)^{1/2} \le \sum_{i=1}^{I}c_i\,(Y_{i\cdot} - \mu_i) \le \Big((I-1)F_{I-1,\,n-I;\,\alpha}\,\frac{1}{J}\sum_{i=1}^{I}c_i^2\,MS_e\Big)^{1/2}\Bigg] = 1 - \alpha,$$
for all $c_i$, $i = 1, \ldots, I$ such that $\sum_{i=1}^{I}c_i = 0$, or equivalently,
$$P\big[\hat\psi - S\hat\sigma_{\hat\psi} \le \psi \le \hat\psi + S\hat\sigma_{\hat\psi}\big] = 1 - \alpha,$$
for all contrasts $\psi$, as was to be seen. (This proof has been adapted from the paper "A simple proof of Scheffé's multiple comparison theorem for contrasts in the one-way layout" by Jerome Klotz in The American Statistician, 1969, Vol. 23, Number 5.)
In closing, we would like to point out that a theorem similar to the one just proved can be shown for the two-way layout with $K\ (\ge 2)$ observations per cell, and as a consequence of it we can construct confidence intervals for all contrasts among the $\alpha$'s, or the $\beta$'s, or the $\gamma$'s.
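As a numerical illustration of Theorem 1, the sketch below computes a Scheffé interval for one contrast in a one-way layout. The data, the group sizes, and the contrast $\mu_1 - \mu_2$ are all made up for the example, and the SciPy F distribution is assumed to be available.

```python
import numpy as np
from scipy.stats import f as f_dist

# Hypothetical one-way layout: I = 3 groups, J = 4 observations per group.
Y = np.array([[12.1, 11.4, 12.8, 11.9],
              [10.2, 10.9,  9.8, 10.5],
              [11.0, 11.6, 11.3, 10.8]])
I, J = Y.shape
n = I * J

group_means = Y.mean(axis=1)                          # the Y_i.
MSe = ((Y - group_means[:, None]) ** 2).sum() / (n - I)

alpha = 0.05
S2 = (I - 1) * f_dist.ppf(1 - alpha, I - 1, n - I)    # S^2 = (I-1) F_{I-1,n-I;alpha}

def scheffe_interval(c):
    """Simultaneous interval for the contrast psi = sum_i c_i mu_i."""
    c = np.asarray(c, dtype=float)
    psi_hat = c @ group_means                         # psi-hat
    sd_hat = np.sqrt((c ** 2).sum() * MSe / J)        # sigma-hat of psi-hat
    half = np.sqrt(S2) * sd_hat
    return psi_hat - half, psi_hat + half

lo, hi = scheffe_interval([1.0, -1.0, 0.0])           # contrast mu_1 - mu_2
print(lo, hi)
```

Because the same $S$ serves every contrast, any number of such intervals may be formed from one sample while keeping the overall confidence coefficient $1 - \alpha$.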
Exercises
17.4.1 Show that the quantity $J\sum_{i=1}^{I}\big(\mu_i - Y_{i\cdot} - \bar\mu_{\cdot} + \bar Y_{\cdot\cdot}\big)^2$ mentioned in Section 17.4 is distributed as $\sigma^2\chi^2_{I-1}$, under the null hypothesis.
17.4.2 Refer to Exercise 17.1.1 and construct confidence intervals for all contrasts of the $\mu$'s (take $1 - \alpha = 0.95$).
Chapter 18
The Multivariate Normal Distribution
18.1 Introduction
In this chapter, we introduce the Multivariate Normal distribution and estab-
lish some of its fundamental properties. Also, certain estimation and inde-
pendence testing problems closely connected with it are discussed.
Let $Y_j$, $j = 1, \ldots, m$ be i.i.d. r.v.'s with common distribution $N(0, 1)$. Then we know that for any constants $c_j$, $j = 1, \ldots, m$ and $\mu$, the r.v. $\sum_{j=1}^{m}c_jY_j + \mu$ is distributed as $N\big(\mu, \sum_{j=1}^{m}c_j^2\big)$. Now instead of considering one (non-homogeneous) linear combination of the $Y$'s, consider $k$ such combinations; that is,
$$X_i = \sum_{j=1}^{m}c_{ij}Y_j + \mu_i, \quad i = 1, \ldots, k, \tag{1}$$
or in matrix notation
$$\mathbf X = C\mathbf Y + \boldsymbol\mu, \tag{2}$$
where
$$\mathbf X = (X_1, \ldots, X_k)', \quad C = (c_{ij})_{k\times m}, \quad \mathbf Y = (Y_1, \ldots, Y_m)' \quad\text{and}\quad \boldsymbol\mu = (\mu_1, \ldots, \mu_k)'.$$
464 18 The Multivariate Normal Distribution
From (2) and relation (10) of Chapter 16, it follows that $E\mathbf X = \boldsymbol\mu$ and $\Sigma_{\mathbf X} = C\Sigma_{\mathbf Y}C' = CI_mC' = CC'$; that is,
$$E\mathbf X = \boldsymbol\mu, \qquad \Sigma_{\mathbf X}\ (\text{or just}\ \Sigma) = CC'. \tag{3}$$
We now proceed to finding the ch.f. $\phi_{\mathbf X}$ of the r. vector $\mathbf X$. For $\mathbf t = (t_1, \ldots, t_k)' \in \mathbb{R}^k$, we have
$$\phi_{\mathbf X}(\mathbf t) = E\exp\big(i\mathbf t'\mathbf X\big) = E\exp\big[i\mathbf t'(C\mathbf Y + \boldsymbol\mu)\big] = \exp\big(i\mathbf t'\boldsymbol\mu\big)\,E\exp\big(i\mathbf t'C\mathbf Y\big). \tag{4}$$
But
$$\mathbf t'C\mathbf Y = \Big(\sum_{j=1}^{k}t_jc_{j1}, \ldots, \sum_{j=1}^{k}t_jc_{jm}\Big)\big(Y_1, \ldots, Y_m\big)' = \sum_{j=1}^{k}t_jc_{j1}\,Y_1 + \cdots + \sum_{j=1}^{k}t_jc_{jm}\,Y_m$$
and hence
$$E\exp\big(i\mathbf t'C\mathbf Y\big) = \phi_{Y_1}\Big(\sum_{j=1}^{k}t_jc_{j1}\Big)\cdots\phi_{Y_m}\Big(\sum_{j=1}^{k}t_jc_{jm}\Big) = \exp\Bigg[-\frac{1}{2}\Big(\sum_{j=1}^{k}t_jc_{j1}\Big)^2 - \cdots - \frac{1}{2}\Big(\sum_{j=1}^{k}t_jc_{jm}\Big)^2\Bigg] = \exp\Big(-\frac{1}{2}\mathbf t'CC'\mathbf t\Big) \tag{5}$$
because
$$\Big(\sum_{j=1}^{k}t_jc_{j1}\Big)^2 + \cdots + \Big(\sum_{j=1}^{k}t_jc_{jm}\Big)^2 = \mathbf t'CC'\mathbf t.$$
Therefore, by means of (3)-(5), we have the following result.
THEOREM 1 The ch.f. of the r. vector $\mathbf X = (X_1, \ldots, X_k)'$, which has the k-Variate Normal distribution with mean $\boldsymbol\mu$ and covariance matrix $\Sigma$, is given by
$$\phi_{\mathbf X}(\mathbf t) = \exp\Big(i\mathbf t'\boldsymbol\mu - \frac{1}{2}\mathbf t'\Sigma\mathbf t\Big). \tag{6}$$
THEOREM 2 Let $\mathbf X = C\mathbf Y + \boldsymbol\mu$ be as above with $k = m$ and $C$ non-singular. Then $\mathbf X$ has a p.d.f. given by
$$f_{\mathbf X}(\mathbf x) = (2\pi)^{-k/2}\,|\Sigma|^{-1/2}\exp\Big[-\frac{1}{2}\big(\mathbf x - \boldsymbol\mu\big)'\Sigma^{-1}\big(\mathbf x - \boldsymbol\mu\big)\Big], \quad \mathbf x \in \mathbb{R}^k, \tag{7}$$
where $\Sigma = CC'$ and $|\Sigma|$ denotes the determinant of $\Sigma$.
PROOF From $\mathbf X = C\mathbf Y + \boldsymbol\mu$ we get $C\mathbf Y = \mathbf X - \boldsymbol\mu$, which, since $C$ is non-singular, gives
$$\mathbf Y = C^{-1}\big(\mathbf X - \boldsymbol\mu\big).$$
Therefore
$$f_{\mathbf X}(\mathbf x) = f_{\mathbf Y}\big[C^{-1}(\mathbf x - \boldsymbol\mu)\big]\,\big|C^{-1}\big| = (2\pi)^{-k/2}\exp\Big(-\frac{1}{2}\sum_{j=1}^{k}y_j^2\Big)\big|C^{-1}\big|.$$
But
$$\sum_{j=1}^{k}y_j^2 = \big(\mathbf x - \boldsymbol\mu\big)'\big(C^{-1}\big)'C^{-1}\big(\mathbf x - \boldsymbol\mu\big) \quad (\text{see also Exercise 18.1.2}),$$
and
$$\big(C^{-1}\big)'C^{-1} = \big(C'\big)^{-1}C^{-1} = \big(CC'\big)^{-1} = \Sigma^{-1}. \tag{8}$$
Therefore
$$f_{\mathbf X}(\mathbf x) = (2\pi)^{-k/2}\big|C^{-1}\big|\exp\Big[-\frac{1}{2}\big(\mathbf x - \boldsymbol\mu\big)'\Sigma^{-1}\big(\mathbf x - \boldsymbol\mu\big)\Big].$$
Finally, from $\Sigma = CC'$ one has $|\Sigma| = |C|\,|C'| = |C|^2$, so that $|C^{-1}| = |\Sigma|^{-1/2}$. Thus
$$f_{\mathbf X}(\mathbf x) = (2\pi)^{-k/2}\,|\Sigma|^{-1/2}\exp\Big[-\frac{1}{2}\big(\mathbf x - \boldsymbol\mu\big)'\Sigma^{-1}\big(\mathbf x - \boldsymbol\mu\big)\Big],$$
as was to be seen.
REMARK 2 A k-Variate Normal distribution with p.d.f. given by (7) is called a non-singular k-Variate Normal. The use of the term non-singular corresponds to the fact that $|\Sigma| \neq 0$; that is, the fact that $\Sigma$ is of full rank.
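Relation (2) also gives a practical way to simulate a non-singular k-Variate Normal: factor $\Sigma$ as $CC'$ (a Cholesky factor can serve as $C$) and set $\mathbf X = C\mathbf Y + \boldsymbol\mu$. A minimal sketch, with an illustrative $\boldsymbol\mu$ and $\Sigma$ chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0, 0.5])          # illustrative mean vector
Sigma = np.array([[2.0, 0.6, 0.3],       # illustrative positive definite Sigma
                  [0.6, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])

C = np.linalg.cholesky(Sigma)            # a non-singular C with C C' = Sigma

m = 100_000
Y = rng.standard_normal((3, m))          # columns of i.i.d. N(0,1) r.v.'s
X = C @ Y + mu[:, None]                  # X = C Y + mu, one sample per column

# The empirical mean and covariance approximate mu and Sigma = C C'.
print(X.mean(axis=1))
print(np.cov(X))
```

The printed empirical moments should be close to $\boldsymbol\mu$ and $\Sigma$, in agreement with (3).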
COROLLARY 1 In the theorem, let $k = 2$. Then $\mathbf X = (X_1, X_2)'$ and the joint p.d.f. of $X_1$, $X_2$ is the Bivariate Normal p.d.f.
PROOF By Remark 1, both $X_1$ and $X_2$ are normally distributed, and let $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$. Also let $\rho$ be the correlation coefficient of $X_1$ and $X_2$. Then their covariance matrix $\Sigma$ is given by
$$\Sigma = \begin{pmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2\\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix}$$
and hence $|\Sigma| = \sigma_1^2\sigma_2^2\big(1 - \rho^2\big)$, so that
$$\Sigma^{-1} = \frac{1}{\sigma_1^2\sigma_2^2\big(1 - \rho^2\big)}\begin{pmatrix}\sigma_2^2 & -\rho\sigma_1\sigma_2\\ -\rho\sigma_1\sigma_2 & \sigma_1^2\end{pmatrix}.$$
Therefore
$$\sigma_1^2\sigma_2^2\big(1-\rho^2\big)\big(\mathbf x - \boldsymbol\mu\big)'\Sigma^{-1}\big(\mathbf x - \boldsymbol\mu\big) = \big(x_1 - \mu_1,\ x_2 - \mu_2\big)\begin{pmatrix}\sigma_2^2 & -\rho\sigma_1\sigma_2\\ -\rho\sigma_1\sigma_2 & \sigma_1^2\end{pmatrix}\begin{pmatrix}x_1 - \mu_1\\ x_2 - \mu_2\end{pmatrix}$$
$$= \big(x_1 - \mu_1\big)^2\sigma_2^2 - 2\rho\sigma_1\sigma_2\big(x_1 - \mu_1\big)\big(x_2 - \mu_2\big) + \big(x_2 - \mu_2\big)^2\sigma_1^2.$$
Hence
$$f_{X_1,X_2}(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\Bigg\{-\frac{1}{2\big(1-\rho^2\big)}\Bigg[\Big(\frac{x_1-\mu_1}{\sigma_1}\Big)^2 - 2\rho\,\frac{\big(x_1-\mu_1\big)\big(x_2-\mu_2\big)}{\sigma_1\sigma_2} + \Big(\frac{x_2-\mu_2}{\sigma_2}\Big)^2\Bigg]\Bigg\},$$
as was to be shown.
COROLLARY 2 The (normal) r.v.'s $X_i$, $i = 1, \ldots, k$ are independent if and only if they are uncorrelated.
PROOF The r.v.'s $X_i$, $i = 1, \ldots, k$ are uncorrelated if and only if $\Sigma$ is a diagonal matrix and its diagonal elements are the variances of the $X$'s. Then $|\Sigma| = \sigma_1^2\cdots\sigma_k^2$. On the other hand, $\Sigma^{-1}$ is also a diagonal matrix with the $j$th diagonal element given by $1/\sigma_j^2$, so that the p.d.f. in (7) factors into the product of the $k$ Normal p.d.f.'s of the $X_i$'s; hence the $X_i$'s are independent. The converse is immediate.
Exercises
18.1.1 Use Definition 1 herein in order to conclude that the LSE $\hat{\boldsymbol\beta}$ of $\boldsymbol\beta$ in (9) of Chapter 16 has the n-Variate Normal distribution with mean $\boldsymbol\beta$ and covariance matrix $\sigma^2S^{-1}$. In particular, $(\hat\beta_1, \hat\beta_2)'$, given by (19) and (19') of Chapter 16, have the Bivariate Normal distribution with means and variances
$$E\hat\beta_1 = \beta_1, \qquad E\hat\beta_2 = \beta_2, \qquad \sigma^2\big(\hat\beta_1\big) = \frac{\sigma^2\sum_{j=1}^{n}x_j^2}{n\sum_{j=1}^{n}\big(x_j - \bar x\big)^2}, \qquad \sigma^2\big(\hat\beta_2\big) = \frac{\sigma^2}{\sum_{j=1}^{n}\big(x_j - \bar x\big)^2},$$
and correlation coefficient equal to
$$-\frac{\sum_{j=1}^{n}x_j}{\sqrt{n}\,\sqrt{\sum_{j=1}^{n}x_j^2}}.$$
$$\phi_{\mathbf Y}(\mathbf t) = E\exp\big(i\mathbf t'\mathbf Y\big) = E\exp\big(i\mathbf t'A\mathbf X\big) = E\exp\big[i\big(A'\mathbf t\big)'\mathbf X\big] = \phi_{\mathbf X}\big(A'\mathbf t\big),$$
so that by means of (6), we have
$$\phi_{\mathbf Y}(\mathbf t) = \exp\Big[i\big(A'\mathbf t\big)'\boldsymbol\mu - \frac{1}{2}\big(A'\mathbf t\big)'\Sigma\big(A'\mathbf t\big)\Big] = \exp\Big[i\mathbf t'A\boldsymbol\mu - \frac{1}{2}\mathbf t'\big(A\Sigma A'\big)\mathbf t\Big],$$
and this last expression is the ch.f. of the m-Variate Normal with mean $A\boldsymbol\mu$ and covariance matrix $A\Sigma A'$, as was to be seen. The particular case follows from the general one just established.
THEOREM 4 For $j = 1, \ldots, n$, let $\mathbf X_j$ be independent $N(\boldsymbol\mu_j, \Sigma_j)$ k-dimensional r. vectors and let $c_j$ be constants. Then the r. vector
$$\mathbf X = \sum_{j=1}^{n}c_j\mathbf X_j \quad\text{is}\quad N\Big(\sum_{j=1}^{n}c_j\boldsymbol\mu_j,\ \sum_{j=1}^{n}c_j^2\Sigma_j\Big).$$
PROOF By independence,
$$\phi_{\mathbf X}(\mathbf t) = \prod_{j=1}^{n}\phi_{c_j\mathbf X_j}(\mathbf t) = \prod_{j=1}^{n}\phi_{\mathbf X_j}\big(c_j\mathbf t\big).$$
But
$$\phi_{\mathbf X_j}\big(c_j\mathbf t\big) = \exp\Big[i\big(c_j\mathbf t\big)'\boldsymbol\mu_j - \frac{1}{2}\big(c_j\mathbf t\big)'\Sigma_j\big(c_j\mathbf t\big)\Big] = \exp\Big[i\mathbf t'\big(c_j\boldsymbol\mu_j\big) - \frac{1}{2}\mathbf t'\big(c_j^2\Sigma_j\big)\mathbf t\Big],$$
so that
$$\phi_{\mathbf X}(\mathbf t) = \exp\Bigg[i\mathbf t'\sum_{j=1}^{n}c_j\boldsymbol\mu_j - \frac{1}{2}\mathbf t'\Big(\sum_{j=1}^{n}c_j^2\Sigma_j\Big)\mathbf t\Bigg].$$
COROLLARY For $j = 1, \ldots, n$, let $\mathbf X_j$ be independent $N(\boldsymbol\mu, \Sigma)$ k-dimensional r. vectors and let
$$\bar{\mathbf X} = \frac{1}{n}\sum_{j=1}^{n}\mathbf X_j.$$
Then $\bar{\mathbf X}$ is $N\big(\boldsymbol\mu,\ \frac{1}{n}\Sigma\big)$.
PROOF In the theorem, take $\boldsymbol\mu_j = \boldsymbol\mu$, $\Sigma_j = \Sigma$ and $c_j = 1/n$, $j = 1, \ldots, n$.
THEOREM 5 Let $\mathbf X = (X_1, \ldots, X_k)'$ be non-singular $N(\boldsymbol\mu, \Sigma)$ and set $Q = (\mathbf X - \boldsymbol\mu)'\Sigma^{-1}(\mathbf X - \boldsymbol\mu)$. Then $Q$ is an r.v. distributed as $\chi^2_k$.
… the integral is equal to one and we conclude that $\phi_Q(t) = (1 - 2it)^{-k/2}$, which is the ch.f. of $\chi^2_k$.
REMARK 4 Notice that Theorem 5 generalizes a known result for the one-dimensional case.
Exercise
18.2.1 Consider the k-dimensional random vectors $\mathbf X_n = (X_{1n}, \ldots, X_{kn})'$, $n = 1, 2, \ldots$ and $\mathbf X = (X_1, \ldots, X_k)'$ with d.f.'s $F_n$, $F$ and ch.f.'s $\phi_n$, $\phi$, respectively. Then we say that $\{\mathbf X_n\}$ converges in distribution to $\mathbf X$ as $n \to \infty$, and we write $\mathbf X_n \xrightarrow{d} \mathbf X$, if $F_n(\mathbf x) \to F(\mathbf x)$ as $n \to \infty$ for all $\mathbf x \in \mathbb{R}^k$ for which $F$ is continuous (see also Definition 1(iii) in Chapter 8). It can be shown that a multidimensional version of Theorem 2 in Chapter 8 holds true. Use this result (and also Theorem 3 in Chapter 6) in order to prove that $\mathbf X_n \xrightarrow{d} \mathbf X$ as $n \to \infty$ if and only if $\mathbf t'\mathbf X_n \xrightarrow{d} \mathbf t'\mathbf X$ for every $\mathbf t = (t_1, \ldots, t_k)' \in \mathbb{R}^k$. In particular, $\mathbf X_n \xrightarrow{d} \mathbf X$, where $\mathbf X$ is distributed as $N(\boldsymbol\mu, \Sigma)$, if and only if $\{\mathbf t'\mathbf X_n\}$ converges in distribution, as $n \to \infty$, to an r.v. $Y$ which is distributed as Normal with mean $\mathbf t'\boldsymbol\mu$ and variance $\mathbf t'\Sigma\mathbf t$, for every $\mathbf t \in \mathbb{R}^k$.
$$S = \big(S_{ij}\big), \quad\text{where}\quad S_{ij} = \sum_{k=1}^{n}\big(X_{ki} - \bar X_i\big)\big(X_{kj} - \bar X_j\big), \quad i, j = 1, \ldots, k.$$
Then
i) $\bar{\mathbf X}$ and $S$ are sufficient for $(\boldsymbol\mu, \Sigma)$;
ii) $\bar{\mathbf X}$ and $S/(n-1)$ are unbiased estimators of $\boldsymbol\mu$ and $\Sigma$, respectively;
iii) $\bar{\mathbf X}$ and $S/n$ are MLE's of $\boldsymbol\mu$ and $\Sigma$, respectively.
18.3 Estimation of μ and Σ and Test of Independence
Now suppose that the joint distribution of the r.v.'s $X$ and $Y$ is the Bivariate Normal distribution. That is,
$$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\,e^{-q/2},$$
where
$$q = \frac{1}{1-\rho^2}\Bigg[\Big(\frac{x-\mu_1}{\sigma_1}\Big)^2 - 2\rho\Big(\frac{x-\mu_1}{\sigma_1}\Big)\Big(\frac{y-\mu_2}{\sigma_2}\Big) + \Big(\frac{y-\mu_2}{\sigma_2}\Big)^2\Bigg].$$
Then by Corollary 2 to Theorem 2, the r.v.'s $X$ and $Y$ are independent if and only if they are uncorrelated. Thus the problem of testing independence for $X$ and $Y$ becomes that of testing the hypothesis $H: \rho = 0$. For this purpose, consider an r. sample of size $n$, $(X_j, Y_j)$, $j = 1, \ldots, n$, from the Bivariate Normal under consideration. Then their joint p.d.f., $f$, is given by
$$\Bigg(\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\Bigg)^{n}e^{-Q/2},$$
where
$$Q = \sum_{j=1}^{n}q_j$$
and
$$q_j = \frac{1}{1-\rho^2}\Bigg[\Big(\frac{x_j-\mu_1}{\sigma_1}\Big)^2 - 2\rho\Big(\frac{x_j-\mu_1}{\sigma_1}\Big)\Big(\frac{y_j-\mu_2}{\sigma_2}\Big) + \Big(\frac{y_j-\mu_2}{\sigma_2}\Big)^2\Bigg], \quad j = 1, \ldots, n. \tag{9}$$
For testing H, we are going to employ the LR test. And although the MLE's of the parameters involved are readily given by Theorem 6, we choose to derive them directly. For this purpose, we write $g(\boldsymbol\theta)$ for $\log f(\boldsymbol\theta)$ considered as a function of the parameter $\boldsymbol\theta$, where the parameter space $\Omega$ is given by
$$\Omega = \big\{\boldsymbol\theta = \big(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho\big)' \in \mathbb{R}^5;\ \mu_1, \mu_2 \in \mathbb{R};\ \sigma_1^2, \sigma_2^2 > 0;\ -1 < \rho < 1\big\},$$
whereas under H, the parameter space becomes
$$\omega = \big\{\boldsymbol\theta = \big(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho\big)' \in \mathbb{R}^5;\ \mu_1, \mu_2 \in \mathbb{R};\ \sigma_1^2, \sigma_2^2 > 0;\ \rho = 0\big\}.$$
We have
$$g = g(\boldsymbol\theta) = g\big(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho;\ x_1, \ldots, x_n, y_1, \ldots, y_n\big)$$
$$= -n\log 2\pi - \frac{n}{2}\log\sigma_1^2 - \frac{n}{2}\log\sigma_2^2 - \frac{n}{2}\log\big(1-\rho^2\big) - \frac{1}{2}\sum_{j=1}^{n}q_j, \tag{10}$$
Differentiating $g$ with respect to $\mu_1$ and $\mu_2$ and equating the partial derivatives to zero, we obtain the system
$$\frac{1}{\sigma_1}\mu_1 - \frac{\rho}{\sigma_2}\mu_2 = \frac{1}{\sigma_1}\bar x - \frac{\rho}{\sigma_2}\bar y, \qquad \frac{1}{\sigma_2}\mu_2 - \frac{\rho}{\sigma_1}\mu_1 = \frac{1}{\sigma_2}\bar y - \frac{\rho}{\sigma_1}\bar x. \quad (\text{See also Exercise 18.3.1.}) \tag{11}$$
Solving system (11) for $\mu_1$ and $\mu_2$, we get
$$\hat\mu_1 = \bar x, \qquad \hat\mu_2 = \bar y. \tag{12}$$
1 = x , 2 = y. (12)
Now let us set
$$S_x = \frac{1}{n}\sum_{j=1}^{n}\big(x_j - \bar x\big)^2, \qquad S_y = \frac{1}{n}\sum_{j=1}^{n}\big(y_j - \bar y\big)^2 \qquad\text{and}\qquad S_{xy} = \frac{1}{n}\sum_{j=1}^{n}\big(x_j - \bar x\big)\big(y_j - \bar y\big). \tag{13}$$
Then, differentiating $g$ with respect to $\sigma_1^2$ and $\sigma_2^2$, equating the partial derivatives to zero and replacing $\mu_1$ and $\mu_2$ by $\hat\mu_1$ and $\hat\mu_2$, respectively, we obtain after some simplifications
$$\frac{1}{\sigma_1^2}S_x - \frac{\rho}{\sigma_1\sigma_2}S_{xy} = 1 - \rho^2, \qquad \frac{1}{\sigma_2^2}S_y - \frac{\rho}{\sigma_1\sigma_2}S_{xy} = 1 - \rho^2. \quad (\text{See also Exercise 18.3.2.}) \tag{14}$$
Next, differentiating $g$ with respect to $\rho$ and equating the partial derivative to zero, we obtain after some simplifications (see also Exercise 18.3.3)
$$\rho\big(1 - \rho^2\big) - \rho\Big(\frac{1}{\sigma_1^2}S_x + \frac{1}{\sigma_2^2}S_y\Big) + \big(1 + \rho^2\big)\frac{S_{xy}}{\sigma_1\sigma_2} = 0. \tag{15}$$
In (14) and (15), solving for $\sigma_1^2$, $\sigma_2^2$ and $\rho$, we obtain (see also Exercise 18.3.4)
$$\hat\sigma_1^2 = S_x, \qquad \hat\sigma_2^2 = S_y, \qquad \hat\rho = \frac{S_{xy}}{\sqrt{S_xS_y}}. \tag{16}$$
It can further be shown (see also Exercise 18.3.5) that the values of the parameters given by (12) and (16) actually maximize $f$ (equivalently, $g$) and the maximum is given by
$$\max\big[f(\boldsymbol\theta);\ \boldsymbol\theta \in \Omega\big] = L\big(\hat\Omega\big) = \Bigg[\frac{e^{-1}}{2\pi\sqrt{S_xS_y}\sqrt{1 - \dfrac{S_{xy}^2}{S_xS_y}}}\Bigg]^{n}. \tag{17}$$
It follows that the MLE's of $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$ and $\rho$, under $\Omega$, are given by (12) and (16), which we may now denote by $\hat\mu_{1,\Omega}$, $\hat\mu_{2,\Omega}$, $\hat\sigma_{1,\Omega}^2$, $\hat\sigma_{2,\Omega}^2$ and $\hat\rho_\Omega$. That is,
$$\hat\mu_{1,\Omega} = \bar x, \quad \hat\mu_{2,\Omega} = \bar y, \quad \hat\sigma_{1,\Omega}^2 = S_x, \quad \hat\sigma_{2,\Omega}^2 = S_y, \quad \hat\rho_\Omega = \frac{S_{xy}}{\sqrt{S_xS_y}}. \tag{18}$$
Under $\omega$ (that is, for $\rho = 0$), it is seen (see also Exercise 18.3.6) that the MLE's of the parameters involved are given by
$$\hat\mu_{1,\omega} = \bar x, \quad \hat\mu_{2,\omega} = \bar y, \quad \hat\sigma_{1,\omega}^2 = S_x, \quad \hat\sigma_{2,\omega}^2 = S_y \tag{19}$$
and
$$\max\big[f(\boldsymbol\theta);\ \boldsymbol\theta \in \omega\big] = L\big(\hat\omega\big) = \Bigg[\frac{e^{-1}}{2\pi\sqrt{S_xS_y}}\Bigg]^{n}. \tag{20}$$
Replacing the $x$'s and $y$'s by $X$'s and $Y$'s, respectively, in (17) and (20), we have that the LR statistic $\lambda$ is given by
$$\lambda = \Bigg(1 - \frac{S_{XY}^2}{S_XS_Y}\Bigg)^{n/2} = \big(1 - R^2\big)^{n/2}, \tag{21}$$
where $R$ is the sample correlation coefficient, that is,
$$R = \sum_{j=1}^{n}\big(X_j - \bar X\big)\big(Y_j - \bar Y\big)\Bigg/\sqrt{\sum_{j=1}^{n}\big(X_j - \bar X\big)^2\sum_{j=1}^{n}\big(Y_j - \bar Y\big)^2}. \tag{22}$$
By setting
$$W = W(R) = \frac{\sqrt{n-2}\,R}{\sqrt{1 - R^2}}, \tag{24}$$
it is easily seen, by differentiation, that $W$ is an increasing function of $R$.
Therefore, the test in (23) is equivalent to the following test
Reject H whenever W < c or W > c , (25)
where c is determined, so that PH(W < c or W > c) = . It is shown in the sequel
that the distribution of W under H is tn2 and hence c is readily determined.
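The test in (25) can be sketched numerically as follows. The sample below is simulated from two independent normals (so H is true) purely for illustration, and SciPy's t distribution supplies the cutoff $c = t_{n-2;\,\alpha/2}$:

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(1)

# Simulated sample of size n from a bivariate normal with rho = 0 (illustrative).
n = 50
X = rng.standard_normal(n)
Y = rng.standard_normal(n)

# Sample correlation coefficient R of (22).
R = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sqrt(
    np.sum((X - X.mean()) ** 2) * np.sum((Y - Y.mean()) ** 2))

# W of (24); under H it is distributed as t_{n-2}.
W = np.sqrt(n - 2) * R / np.sqrt(1 - R ** 2)

alpha = 0.05
c = t_dist.ppf(1 - alpha / 2, n - 2)     # reject H whenever |W| > c
print(R, W, abs(W) > c)
```

With H true, the rejection indicator printed last should come out True only about $100\alpha\%$ of the time over repeated samples.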
$$R_x = \sum_{j=1}^{n}\big(x_j - \bar x\big)\big(Y_j - \bar Y\big)\Bigg/\sqrt{\sum_{j=1}^{n}\big(x_j - \bar x\big)^2\sum_{j=1}^{n}\big(Y_j - \bar Y\big)^2}. \tag{26}$$
Let also
$$v_j = \big(x_j - \bar x\big)\Bigg/\sqrt{\sum_{j=1}^{n}\big(x_j - \bar x\big)^2},$$
so that
$$\sum_{j=1}^{n}v_j = 0 \qquad\text{and}\qquad \sum_{j=1}^{n}v_j^2 = 1. \tag{27}$$
Then the statistic $W_x$ corresponding to $R_x$ becomes
$$W_x = W_v^* = \sqrt{n-2}\,\sum_{j=1}^{n}v_jY_j\Bigg/\sqrt{\sum_{j=1}^{n}Y_j^2 - n\bar Y^2 - \Big(\sum_{j=1}^{n}v_jY_j\Big)^2}. \tag{28}$$
We have that $Y_j$, $j = 1, \ldots, n$ are independent $N(\mu_2, \sigma_2^2)$. Now if we consider the $N(0, 1)$ r.v.'s $Y_j^* = (Y_j - \mu_2)/\sigma_2$, $j = 1, \ldots, n$ and replace $Y_j$ by $Y_j^*$ in (28), it is seen (see also Exercise 18.3.9) that $W_x = W_v^*$ remains unchanged. Therefore we may assume that the $Y$'s are themselves independent $N(0, 1)$. Next consider the transformation
$$Z_1 = \frac{1}{\sqrt n}Y_1 + \cdots + \frac{1}{\sqrt n}Y_n, \qquad Z_2 = v_1Y_1 + \cdots + v_nY_n.$$
For $\rho = 0$, the p.d.f. of $R$ is given by
$$f_R(r) = \frac{\Gamma\Big(\dfrac{n-1}{2}\Big)}{\Gamma\Big(\dfrac{1}{2}\Big)\Gamma\Big(\dfrac{n-2}{2}\Big)}\big(1 - r^2\big)^{(n-4)/2}, \quad -1 < r < 1.$$
PROOF From $W = \sqrt{n-2}\,R\big/\sqrt{1-R^2}$, it follows that $R$ and $W$ have the same sign, that is, $RW \ge 0$. Solving for $R$, one has then $R = W\big/\sqrt{W^2 + n - 2}$. By setting $w = \sqrt{n-2}\,r\big/\sqrt{1-r^2}$, one has $dw/dr = \sqrt{n-2}\,\big(1-r^2\big)^{-3/2}$, whereas
$$f_W(w) = \frac{\Gamma\Big(\dfrac{n-1}{2}\Big)}{\sqrt{(n-2)\pi}\,\Gamma\Big(\dfrac{n-2}{2}\Big)}\Bigg(1 + \frac{w^2}{n-2}\Bigg)^{-(n-1)/2}, \quad w \in \mathbb{R}.$$
Therefore
$$f_R(r) = f_W\Bigg(\frac{\sqrt{n-2}\,r}{\sqrt{1-r^2}}\Bigg)\frac{dw}{dr} = \frac{\Gamma\Big(\dfrac{n-1}{2}\Big)}{\sqrt{(n-2)\pi}\,\Gamma\Big(\dfrac{n-2}{2}\Big)}\Bigg(1 + \frac{(n-2)r^2}{(n-2)\big(1-r^2\big)}\Bigg)^{-(n-1)/2}\sqrt{n-2}\,\big(1-r^2\big)^{-3/2}$$
$$= \frac{\Gamma\Big(\dfrac{n-1}{2}\Big)}{\Gamma\Big(\dfrac{1}{2}\Big)\Gamma\Big(\dfrac{n-2}{2}\Big)}\big(1 - r^2\big)^{(n-4)/2},$$
as was to be shown.
The p.d.f. of $R$ when $\rho \neq 0$ can also be obtained, but its expression is rather complicated and we choose not to go into it.
We close this chapter with the following comment. Let $\mathbf X$ be a k-dimensional random vector distributed as $N(\boldsymbol\mu, \Sigma)$. Then its ch.f. is given by (6). Furthermore, if $\Sigma$ is non-singular, then the $N(\boldsymbol\mu, \Sigma)$ distribution has a p.d.f. which is given by (7). However, this is not the case if $\Sigma$ is singular. In this latter case, the distribution is called singular, and it can be shown that it is concentrated in a hyperplane of dimensionality less than $k$.
Exercises
18.3.1 Verify relation (11).
18.3.2 Verify relation (14).
18.3.3 Verify relation (15).
18.3.4 Show that $\hat\sigma_1^2$, $\hat\sigma_2^2$ and $\hat\rho$ given by (16) is indeed the solution of the system of equations in (14) and (15).
18.3.5 Consider $g$ given by (10) and set
$$d_{ij} = \frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\,g(\boldsymbol\theta)\Bigg|_{\boldsymbol\theta = \hat{\boldsymbol\theta}},$$
where
$$\boldsymbol\theta = \big(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho\big)', \qquad \hat{\boldsymbol\theta} = \big(\hat\mu_1, \hat\mu_2, \hat\sigma_1^2, \hat\sigma_2^2, \hat\rho\big)'$$
and $\hat\mu_1$, $\hat\mu_2$, $\hat\sigma_1^2$, $\hat\sigma_2^2$ and $\hat\rho$ are given by (12) and (16). Let $D = (d_{ij})$, $i, j = 1, \ldots, 5$ and denote by $D_{5-k}$ the determinant obtained from $D$ by deleting the last $k$ rows and columns, $k = 0, 1, \ldots, 5$ (so that $D_5 = |D|$ and $D_0 = 1$). Then show that the six numbers $D_0$, $D_1, \ldots, D_5$ are alternately positive and negative. This result, together with the fact that $d_{ij}$, $i, j = 1, \ldots, 5$ are continuous functions of $\boldsymbol\theta$, implies that the quantities given by (18) are, indeed, the MLE's of the parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$ and $\rho$, under $\Omega$. (See, for example, Mathematical Analysis by T. M. Apostol, Addison-Wesley, 1957, Theorem 7.9, pp. 151-152.)
18.3.6 Show that the MLEs of 1, 2, 21, and 22, under , are indeed given
by (19).
18.3.7 Show that $R^2 \le 1$, where $R$ is the sample correlation coefficient given by (22).
18.3.8 Verify relation (28).
18.3.9 Show that the statistic $W_x\ (= W_v^*)$ in (28) remains unchanged if the r.v.'s $Y_j$ are replaced by the r.v.'s
$$Y_j^* = \frac{Y_j - \mu_2}{\sigma_2}, \quad j = 1, \ldots, n.$$
18.3.10 Refer to the quantities $\bar{\mathbf X}$ and $S$ defined in Theorem 6 and, by using Basu's theorem (Theorem 3, Chapter 11), show that they are independent.
Chapter 19
Quadratic Forms
19.1 Introduction
In this chapter, we introduce the concept of a quadratic form in the variables
xj, j = 1, . . . , n and then confine attention to quadratic forms in which the xjs
are replaced by independent normally distributed r.v.s Xj, j = 1, . . . , n. In this
latter case, we formulate and prove a number of standard theorems referring
to the distribution and/or independence of quadratic forms.
A quadratic form, Q, in the variables xj, j = 1, . . . , n is a homogeneous
quadratic (second degree) function of xj, j = 1, . . . , n. That is,
$$Q = \sum_{i=1}^{n}\sum_{j=1}^{n}c_{ij}x_ix_j,$$
where here and in the sequel the coefficients of the $x$'s are always assumed to be real-valued constants. By setting $\mathbf x = (x_1, \ldots, x_n)'$ and $C = (c_{ij})$, we can write $Q = \mathbf x'C\mathbf x$. Now $Q$ is a $1\times 1$ matrix and hence $Q = Q'$, or $(\mathbf x'C\mathbf x)' = \mathbf x'C'\mathbf x = \mathbf x'C\mathbf x$. Therefore $Q = \frac{1}{2}(\mathbf x'C\mathbf x + \mathbf x'C'\mathbf x) = \mathbf x'A\mathbf x$, where $A = \frac{1}{2}(C + C')$; that is to say, if $A = (a_{ij})$, then $a_{ij} = \frac{1}{2}(c_{ij} + c_{ji})$, so that $a_{ij} = a_{ji}$. Thus $A$ is symmetric. We can then give the following definition.
DEFINITION 1 A (real) quadratic form, $Q$, in the variables $x_j$, $j = 1, \ldots, n$ is a homogeneous quadratic function of $x_j$, $j = 1, \ldots, n$,
$$Q = \sum_{i=1}^{n}\sum_{j=1}^{n}c_{ij}x_ix_j, \tag{1}$$
19.2 Some Theorems on Quadratic Forms 477
DEFINITION 2 For an $n\times n$ matrix $C$, the polynomial (in $\lambda$) $|C - \lambda I_n|$ is of degree $n$ and is called the characteristic polynomial of $C$. The $n$ roots of the equation $|C - \lambda I_n| = 0$ are called characteristic or latent roots or eigenvalues of $C$.
DEFINITION 3 The quadratic form $Q = \mathbf x'C\mathbf x$ is called positive definite if $\mathbf x'C\mathbf x > 0$ for every $\mathbf x \neq \mathbf 0$; it is called negative definite if $\mathbf x'C\mathbf x < 0$ for every $\mathbf x \neq \mathbf 0$, and positive semidefinite if $\mathbf x'C\mathbf x \ge 0$ for every $\mathbf x$. A symmetric $n\times n$ matrix $C$ is called positive definite, negative definite or positive semidefinite if the quadratic form associated with it, $Q = \mathbf x'C\mathbf x$, is positive definite, negative definite or positive semidefinite, respectively.
DEFINITION 4 If $Q = \mathbf x'C\mathbf x$, then the rank of $C$ is also called the rank of $Q$.
$$Q = \mathbf X'C\mathbf X, \quad\text{where}\quad C' = C,$$
where for $i = 1, \ldots, k$, $Q_i$ are quadratic forms in $\mathbf X$ with rank $Q_i = r_i$. Then the r.v.'s $Q_i$ are independent $\chi^2_{r_i}$ if and only if $\sum_{i=1}^{k}r_i = n$.
Next, we suppose that $\sum_{i=1}^{k}r_i = n$ and show that for $i = 1, \ldots, k$, $Q_i$ are independent $\chi^2_{r_i}$. To this end, one has that $Q_i = \mathbf X'C_i\mathbf X$, where $C_i$ is an $n\times n$ symmetric matrix with rank $C_i = r_i$. Consider the matrix $C_i$. By Theorem 11.I(ii) in Appendix I, there exist $r_i$ linear forms in the $X$'s such that
$$Q_i = \lambda_1^{(i)}\big(b_{11}^{(i)}X_1 + \cdots + b_{1n}^{(i)}X_n\big)^2 + \cdots + \lambda_{r_i}^{(i)}\big(b_{r_i1}^{(i)}X_1 + \cdots + b_{r_in}^{(i)}X_n\big)^2, \tag{4}$$
where $\lambda_1^{(i)}, \ldots, \lambda_{r_i}^{(i)}$ are either $1$ or $-1$. Now $\sum_{i=1}^{k}r_i = n$ and let $B$ be the $n\times n$ matrix defined by
$$B = \begin{pmatrix} b_{11}^{(1)} & \cdots & b_{1n}^{(1)}\\ \vdots & & \vdots\\ b_{r_11}^{(1)} & \cdots & b_{r_1n}^{(1)}\\ \vdots & & \vdots\\ b_{11}^{(k)} & \cdots & b_{1n}^{(k)}\\ \vdots & & \vdots\\ b_{r_k1}^{(k)} & \cdots & b_{r_kn}^{(k)} \end{pmatrix}.$$
Then
$$\sum_{i=1}^{k}Q_i = \mathbf X'\mathbf X,$$
and, by (4),
$$\sum_{i=1}^{k}Q_i = (B\mathbf X)'D(B\mathbf X) = \mathbf X'\big(B'DB\big)\mathbf X, \tag{5}$$
where $D$ is the diagonal matrix whose diagonal carries the $\lambda$'s of (4). Therefore (5) gives
$$\mathbf X'\mathbf X = \mathbf X'\big(B'DB\big)\mathbf X \quad\text{identically in } \mathbf X.$$
Hence $B'DB = I_n$. From the definition of $D$, it follows that $|D| = \pm 1$, so that rank $D = n$. Let $r =$ rank $B$. Then, of course, $r \le n$. Also $n =$ rank $I_n =$ rank $\big(B'DB\big) \le r$, so that $r = n$. It follows that $B$ is nonsingular and therefore the relationship
$B'DB = I_n$ implies $D = (B')^{-1}B^{-1} = (BB')^{-1}$. On the other hand, for any nonsingular square matrix $M$, $MM'$ is positive definite (by Theorem 10.I(ii) in Appendix I) and so is $(MM')^{-1}$. Thus $(BB')^{-1}$ is positive definite and hence so is $D$. From the form of $D$, it follows then that all diagonal elements of $D$ are equal to 1, which implies that $D = I_n$ and hence $B'B = I_n$; that is to say, $B$ is orthogonal. Set $\mathbf Y = B\mathbf X$. By Theorem 5, Chapter 9, it follows that, if $\mathbf Y = (Y_1, \ldots, Y_n)'$, then the r.v.'s $Y_j$, $j = 1, \ldots, n$ are independent $N(0, 1)$. Also the fact that $D = I_n$ and the transformation $\mathbf Y = B\mathbf X$ imply, by means of (4), that $Q_1$ is equal to the sum of the squares of the first $r_1$ $Y$'s, $Q_2$ is the sum of the squares of the next $r_2$ $Y$'s, $\ldots$, $Q_k$ is the sum of the squares of the last $r_k$ $Y$'s. It follows that, for $i = 1, \ldots, k$, $Q_i$ are independent $\chi^2_{r_i}$. The proof is completed.
APPLICATION 1 For $j = 1, \ldots, n$, let $Z_j$ be independent r.v.'s distributed as $N(\mu, \sigma^2)$ and set $X_j = (Z_j - \mu)/\sigma$, so that the $X$'s are i.i.d. distributed as $N(0, 1)$. It has been seen elsewhere that
$$\sum_{j=1}^{n}\Bigg(\frac{Z_j - \mu}{\sigma}\Bigg)^2 = \sum_{j=1}^{n}\Bigg(\frac{Z_j - \bar Z}{\sigma}\Bigg)^2 + \Bigg(\frac{\sqrt n\,\big(\bar Z - \mu\big)}{\sigma}\Bigg)^2;$$
equivalently,
$$\sum_{j=1}^{n}X_j^2 = \sum_{j=1}^{n}\big(X_j - \bar X\big)^2 + \big(\sqrt n\,\bar X\big)^2.$$
Now
$$\big(\sqrt n\,\bar X\big)^2 = \Bigg(\frac{1}{\sqrt n}\sum_{j=1}^{n}X_j\Bigg)^2 = \mathbf X'C_2\mathbf X,$$
where $C_2$ has its elements identically equal to $1/n$, so that rank $C_2 = 1$. Next it can be shown (see also Exercise 19.2.1) that
$$\sum_{j=1}^{n}\big(X_j - \bar X\big)^2 = \mathbf X'C_1\mathbf X,$$
where $C_1$ is given by
$$C_1 = \begin{pmatrix} (n-1)/n & -1/n & \cdots & -1/n\\ -1/n & (n-1)/n & \cdots & -1/n\\ \vdots & \vdots & & \vdots\\ -1/n & -1/n & \cdots & (n-1)/n \end{pmatrix}$$
and that rank $C_1 = n - 1$. Then Theorem 1 applies with $k = 2$ and gives that $\sum_{j=1}^{n}(X_j - \bar X)^2$ and $(\sqrt n\,\bar X)^2$ are independently distributed as $\chi^2_{n-1}$ and $\chi^2_1$, respectively. Thus it follows that $(1/\sigma^2)\sum_{j=1}^{n}(Z_j - \bar Z)^2$ is distributed as $\chi^2_{n-1}$ and is independent of $\bar Z$.
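The claims of Application 1 about $C_1$ and $C_2$ are easy to check numerically. A small sketch (with an arbitrary choice of $n$) verifying the ranks and the identity $\sum_j X_j^2 = \mathbf X'C_1\mathbf X + \mathbf X'C_2\mathbf X$:

```python
import numpy as np

n = 6
C2 = np.full((n, n), 1.0 / n)      # every element equal to 1/n
C1 = np.eye(n) - C2                # (n-1)/n on the diagonal, -1/n elsewhere

r1 = np.linalg.matrix_rank(C1)     # should be n - 1
r2 = np.linalg.matrix_rank(C2)     # should be 1
print(r1, r2)

# The two quadratic forms reproduce the decomposition of sum_j X_j^2.
rng = np.random.default_rng(2)
X = rng.standard_normal(n)
lhs = X @ X
rhs = X @ C1 @ X + X @ C2 @ X
print(np.isclose(lhs, rhs))
```

Since $r_1 + r_2 = (n-1) + 1 = n$, Theorem 1 applies exactly as stated in the text.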
For the proof of Theorem 2, one diagonalizes $C$ by an orthogonal matrix $P$ and sets $\mathbf Y = P^{-1}\mathbf X$, so that
$$Q = \mathbf X'C\mathbf X = (P\mathbf Y)'C(P\mathbf Y) = \mathbf Y'\big(P'CP\big)\mathbf Y = \sum_{j=1}^{m}\lambda_jY_j^2, \tag{8}$$
and the ch.f. of $Q$ is
$$\big[\big(1 - 2i\lambda_1t\big)\cdots\big(1 - 2i\lambda_mt\big)\big]^{-1/2}, \tag{9}$$
while the ch.f. of a $\chi^2_r$ r.v. is
$$\big(1 - 2it\big)^{-r/2}. \tag{10}$$
From (8)-(10), one then has that (see also Exercise 19.2.2)
$$\lambda_1 = \cdots = \lambda_m = 1 \qquad\text{and}\qquad m = r. \tag{11}$$
It follows then that rank $C = r$. We now show that $C^2 = C$. From (8), one has that $P'CP$ is diagonal and, by (11), its diagonal elements are either 1 or 0. Hence $P'CP$ is idempotent. Thus
$$P'CP = \big(P'CP\big)^2 = \big(P'CP\big)\big(P'CP\big) = P'C\big(PP'\big)CP = P'CI_nCP = P'C^2P.$$
That is,
$$P'CP = P'C^2P. \tag{12}$$
Multiplying both sides of (12) by $P$ on the left and by $P'$ on the right, and using the orthogonality of $P$, one concludes that $C = C^2$. This completes the proof of the theorem.
APPLICATION 2 Refer to Application 1. It can be shown (see also Exercise 19.2.3) that $C_1$ and $C_2$ are idempotent. Then Theorem 2 implies that $\sum_{j=1}^{n}(X_j - \bar X)^2$ and $(\sqrt n\,\bar X)^2$, or equivalently
$$\frac{1}{\sigma^2}\sum_{j=1}^{n}\big(Z_j - \bar Z\big)^2 \qquad\text{and}\qquad \Bigg(\frac{\sqrt n\,\big(\bar Z - \mu\big)}{\sigma}\Bigg)^2,$$
are distributed as $\chi^2_{n-1}$ and $\chi^2_1$, respectively.
To this theorem there are the following two corollaries which will be
employed in the sequel.
COROLLARY 1 If the quadratic form $Q = \mathbf X'C\mathbf X$ is distributed as $\chi^2_r$, then it is positive semidefinite.
PROOF From (8) and (10), one has that $Q = \mathbf X'C\mathbf X$ is equal to $\sum_{j=1}^{r}Y_j^2$, where $\mathbf X = (X_1, \ldots, X_n)'$ and $(Y_1, \ldots, Y_n)' = \mathbf Y = P^{-1}\mathbf X$. Thus $\mathbf X'C\mathbf X \ge 0$ for every $\mathbf X$, as was to be seen.
COROLLARY 2 Let $P$ be an orthogonal matrix and consider the transformation $\mathbf Y = P^{-1}\mathbf X$. Then if the quadratic form $Q = \mathbf X'C\mathbf X$ is $\chi^2_r$, so is the quadratic form $Q^* = \mathbf Y'\big(P'CP\big)\mathbf Y$.
PROOF By the theorem, it suffices to show that $P'CP$ is idempotent and that its rank is $r$. We have
$$\big(P'CP\big)^2 = P'C\big(PP'\big)CP = P'CCP = P'CP,$$
$$Q_2 = \mathbf X'\mathbf X - Q_1 = \mathbf X'\mathbf X - \mathbf X'C_1\mathbf X = \mathbf X'\big(I_n - C_1\big)\mathbf X$$
and $\big(I_n - C_1\big)^2 = I_n - 2C_1 + C_1^2 = I_n - C_1$; that is, $I_n - C_1$ is idempotent. Also rank $C_1$ + rank $\big(I_n - C_1\big) = n$ by Theorem 12.I(iii) in Appendix I, so that rank $\big(I_n - C_1\big) = n - r_1$. We have then that rank $Q_1$ + rank $Q_2 = n$, and therefore Theorem 1 applies and gives the result.
APPLICATION 3 Refer to Application 1. Since $\sqrt n\,\bar X$ is $N(0, 1)$, it follows that $\big(\sqrt n\,\bar X\big)^2$ is $\chi^2_1$. Then, by Theorem 3, $\sum_{j=1}^{n}\big(X_j - \bar X\big)^2$ is distributed as $\chi^2_{n-1}$ and is independent of $\big(\sqrt n\,\bar X\big)^2$.
Set
$$Q_i^* = \mathbf Y'B_i\mathbf Y, \quad\text{where}\quad B_i = P'C_iP, \quad i = 1, 2.$$
The equation $Q = Q_1 + Q_2$ implies
$$\big(Y_1, \ldots, Y_r\big)\big(Y_1, \ldots, Y_r\big)' = \sum_{j=1}^{r}Y_j^2 = Q_1^* + Q_2^*. \tag{13}$$
It follows that $Q_1^*$ and $Q_2^*$ are functions of $Y_j$, $j = 1, \ldots, r$ only. From the orthogonality of $P$, we have that the r.v.'s $Y_j$, $j = 1, \ldots, r$ are independent $N(0, 1)$. On the other hand, $Q_1^*$ is $\chi^2_{r_1}$ by Corollary 2 to Theorem 2. These facts together with (13) imply that Theorem 3 applies (with $n = r$) and provides the desired result.
This last theorem generalizes as follows.
THEOREM 5 Suppose that $Q = \sum_{i=1}^{k}Q_i$, where $Q$ and $Q_i$, $i = 1, \ldots, k\ (\ge 2)$ are quadratic forms in $\mathbf X$. Furthermore, let $Q$ be $\chi^2_r$, let $Q_i$ be $\chi^2_{r_i}$, $i = 1, \ldots, k-1$, and let $Q_k$ be positive semidefinite. Then $Q_k$ is distributed as $\chi^2_{r_k}$ with
$$r_k = r - \sum_{i=1}^{k-1}r_i, \qquad\text{and}\qquad Q_i,\ i = 1, \ldots, k$$
are independent.
PROOF The proof is by induction. For $k = 2$ the conclusion is true by Theorem 4. Let the theorem hold for $k = m$ and show that it also holds for $k = m + 1$. We write
$$Q = \sum_{i=1}^{m-1}Q_i + Q_m^*, \qquad\text{where}\qquad Q_m^* = Q_m + Q_{m+1}.$$
Then $Q_m^*$ is distributed as $\chi^2_{r_m^*}$ with
$$r_m^* = r - \sum_{i=1}^{m-1}r_i, \qquad\text{and}\qquad Q_1, \ldots, Q_{m-1}, Q_m^*$$
are independent, by the induction hypothesis. Thus $Q_m^* = Q_m + Q_{m+1}$, where $Q_m^*$ is $\chi^2_{r_m^*}$, $Q_m$ is $\chi^2_{r_m}$ and $Q_{m+1}$ is positive semidefinite. Once again Theorem 4 applies and gives that $Q_{m+1}$ is distributed as $\chi^2_{r_{m+1}}$ with
$$r_{m+1} = r_m^* - r_m = r - \sum_{i=1}^{m}r_i,$$
and that $Q_m$ and $Q_{m+1}$ are independent. It follows that $Q_i$, $i = 1, \ldots, m+1$ are also independent and the proof is concluded.
The theorem below gives a necessary and sufficient condition for inde-
pendence of two quadratic forms. More precisely, we have the following
result.
THEOREM 6 Consider the independent r.v.'s $Y_j$, $j = 1, \ldots, n$, where $Y_j$ is distributed as $N(\mu_j, \sigma^2)$, and for $i = 1, 2$, let $Q_i$ be quadratic forms in $\mathbf Y = (Y_1, \ldots, Y_n)'$; that is, $Q_i = \mathbf Y'C_i\mathbf Y$. Then $Q_1$ and $Q_2$ are independent if and only if $C_1C_2 = 0$.
PROOF The proof is presented only for the special case that $Y_j = X_j \sim N(0, 1)$ and $Q_i \sim \chi^2_{r_i}$, $i = 1, 2$. To this end, suppose that $C_1C_2 = 0$. By the fact that the $Q_i$ are distributed as $\chi^2_{r_i}$, Theorem 2 gives that the $C_i$ are idempotent; that is, $C_i^2 = C_i$, $i = 1, 2$. Next, by the symmetry of $C_i$, one has $C_2C_1 = C_2'C_1' = \big(C_1C_2\big)' = 0' = 0$. Therefore
$$C_1\big(I_n - C_1 - C_2\big) = C_2\big(I_n - C_1 - C_2\big) = 0.$$
Then Theorem 12.I(iii), in Appendix I, implies that
$$\text{rank}\,C_1 + \text{rank}\,C_2 + \text{rank}\big(I_n - C_1 - C_2\big) = n. \tag{14}$$
Also
$$\mathbf X'\mathbf X = \mathbf X'C_1\mathbf X + \mathbf X'C_2\mathbf X + \mathbf X'\big(I_n - C_1 - C_2\big)\mathbf X. \tag{15}$$
Then relations (14), (15) and Theorem 1 imply that $\mathbf X'C_1\mathbf X = Q_1$, $\mathbf X'C_2\mathbf X = Q_2$ (and $\mathbf X'\big(I_n - C_1 - C_2\big)\mathbf X$) are independent.
Let now $Q_1$, $Q_2$ be independent. Since $Q_1$ is $\chi^2_{r_1}$ and $Q_2$ is $\chi^2_{r_2}$, it follows that …
Exercises
19.2.1 Refer to Application 1 and show that
$$\sum_{j=1}^{n}\big(X_j - \bar X\big)^2 = \mathbf X'C_1\mathbf X,$$
as asserted there.
19.2.2 Justify the equalities asserted in (11).
19.2.3 Refer to Application 2 and show that the matrices C1 and C2 are both
idempotent.
19.2.4 Consider the usual linear model $\mathbf Y = X'\boldsymbol\beta + \mathbf e$, where $X$ is of full rank $p$, and let $\hat{\boldsymbol\beta} = S^{-1}X\mathbf Y$ be the LSE of $\boldsymbol\beta$. Write $\mathbf Y$ as follows: $\mathbf Y = X'\hat{\boldsymbol\beta} + \big(\mathbf Y - X'\hat{\boldsymbol\beta}\big)$ and show that:
i) $\|\mathbf Y\|^2 = \mathbf Y'X'S^{-1}X\mathbf Y + \|\mathbf Y - X'\hat{\boldsymbol\beta}\|^2$;
ii) The r.v.'s $\mathbf Y'X'S^{-1}X\mathbf Y$ and $\|\mathbf Y - X'\hat{\boldsymbol\beta}\|^2$ are independent, the first being distributed as noncentral $\chi^2_p$ and the second as $\chi^2_{n-p}$.
19.2.5 Let $X_1$, $X_2$, $X_3$ be independent r.v.'s distributed as $N(0, 1)$ and let the r.v. $Q$ be defined by
$$Q = \frac{1}{6}\big(5X_1^2 + 2X_2^2 + 5X_3^2 + 4X_1X_2 - 2X_1X_3 + 4X_2X_3\big).$$
Then find the distribution of $Q$ and show that $Q$ is independent of the r.v. $\sum_{j=1}^{3}X_j^2 - Q$.
19.2.6 Refer to Example 1 in Chapter 16 and (by using Theorem 6 herein) show that the r.v.'s $\hat\beta_1$, $\hat\beta_2$, as well as the r.v.'s $\hat\beta_2$, $\hat\sigma^2$, are independent, where $\hat\sigma^2$ is the LSE of $\sigma^2$.
19.2.7 For $j = 1, \ldots, n$, let $Y_j$ be independent r.v.'s, $Y_j$ being distributed as $N(\mu_j, 1)$, and set $\mathbf Y = (Y_1, \ldots, Y_n)'$. Let $\mathbf Y'\mathbf Y = \sum_{i=1}^{k}Q_i$, where for $i = 1, \ldots, k$, $Q_i$ are quadratic forms in $\mathbf Y$, $Q_i = \mathbf Y'C_i\mathbf Y$, with rank $Q_i = r_i$. Then show that the r.v.'s $Q_i$ are independent noncentral $\chi^2_{r_i;\,\delta_i}$ if and only if $\sum_{i=1}^{k}r_i = n$, where the noncentrality …
Chapter 20
Nonparametric Inference
$$\frac{\sqrt n\,\big(\bar X_n - \mu\big)}{\sigma}\xrightarrow[n\to\infty]{d} Z \sim N(0, 1).$$
Thus if $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the $N(0, 1)$ distribution, then
$$P\Bigg[-z_{\alpha/2} \le \frac{\sqrt n\,\big(\bar X_n - \mu\big)}{\sigma} \le z_{\alpha/2}\Bigg] = P\Bigg[\bar X_n - z_{\alpha/2}\frac{\sigma}{\sqrt n} \le \mu \le \bar X_n + z_{\alpha/2}\frac{\sigma}{\sqrt n}\Bigg] \approx 1 - \alpha,$$
so that $[L_n, U_n]$ is a confidence interval for $\mu$ with asymptotic confidence coefficient $1 - \alpha$; here
$$L_n = L\big(X_1, \ldots, X_n\big) = \bar X_n - z_{\alpha/2}\frac{\sigma}{\sqrt n} \qquad\text{and}\qquad U_n = U\big(X_1, \ldots, X_n\big) = \bar X_n + z_{\alpha/2}\frac{\sigma}{\sqrt n}.$$
Next, suppose that $\sigma^2$ is unknown and write $S_n^2$ for the sample variance of the $X$'s; namely,
$$S_n^2 = \frac{1}{n}\sum_{j=1}^{n}\big(X_j - \bar X_n\big)^2.$$
Then the WLLN's and the SLLN's, properly applied, ensure that $S_n^2$, viewed as an estimator of $\sigma^2$, is both a weakly and a strongly consistent estimator of $\sigma^2$. Also, by the corollary to Theorem 9 of Chapter 8, it follows that
$$\frac{\sqrt n\,\big(\bar X_n - \mu\big)}{S_n}\xrightarrow[n\to\infty]{d} Z \sim N(0, 1).$$
By setting
$$L_n^* = L^*\big(X_1, \ldots, X_n\big) = \bar X_n - z_{\alpha/2}\frac{S_n}{\sqrt n} \qquad\text{and}\qquad U_n^* = U^*\big(X_1, \ldots, X_n\big) = \bar X_n + z_{\alpha/2}\frac{S_n}{\sqrt n},$$
we have that $[L_n^*, U_n^*]$ is a confidence interval for $\mu$ with asymptotic confidence coefficient $1 - \alpha$.
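A minimal sketch of the interval $[L_n^*, U_n^*]$. The sample here is drawn from an exponential distribution purely for illustration (the method needs no knowledge of the underlying d.f.), and SciPy supplies $z_{\alpha/2}$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

# Illustrative i.i.d. sample; the underlying d.f. is treated as unknown.
X = rng.exponential(scale=2.0, size=400)
n = len(X)

Xbar = X.mean()
Sn = np.sqrt(((X - Xbar) ** 2).mean())   # S_n, with divisor n as in the text

alpha = 0.05
z = norm.ppf(1 - alpha / 2)              # z_{alpha/2}
Ln = Xbar - z * Sn / np.sqrt(n)          # L_n*
Un = Xbar + z * Sn / np.sqrt(n)          # U_n*
print(Ln, Un)
```

The interval is distribution-free in the asymptotic sense: only the CLT and the consistency of $S_n^2$ are used, not the form of $F$.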
Clearly, the examples mentioned so far are cases of nonparametric
point and interval estimation. A further instance of point nonparametric
estimation is provided by the following example. Let F be the (common
and unknown) d.f. of the Xis and set Fn for their sample or empirical d.f.; that
is,
$$F_n\big(x; s\big) = \frac{1}{n}\big[\text{the number of } X_1(s), \ldots, X_n(s) \le x\big], \quad x \in \mathbb{R},\ s \in S. \tag{1}$$
We often omit the random element $s$ and write $F_n(x)$ rather than $F_n(x; s)$. Then it was stated in Chapter 8 (see Theorem 6) that
$$F_n\big(x; \cdot\big)\xrightarrow[n\to\infty]{\text{a.s.}} F\big(x\big) \quad\text{uniformly in } x \in \mathbb{R}. \tag{2}$$
Thus $F_n(x; \cdot)$ is a strongly consistent estimator of $F(x)$, and for almost all $s \in S$ and every $\varepsilon > 0$, we have
$$F_n\big(x; s\big) - \varepsilon \le F\big(x\big) \le F_n\big(x; s\big) + \varepsilon,$$
provided $n \ge n(\varepsilon, s)$ independent of $x \in \mathbb{R}$.
We close this section by observing that Section 5 of Chapter 15 is con-
cerned with another nonparametric aspect, namely that of constructing toler-
ance intervals.
20.2 Estimation of a P.D.F.
A natural first estimator of the p.d.f. $f$ at $x$ is
$$f_n\big(x\big) = \frac{F_n\big(x + h\big) - F_n\big(x - h\big)}{2h}.$$
However,
$$f_n\big(x\big) = \frac{F_n\big(x + h\big) - F_n\big(x - h\big)}{2h} = \frac{1}{2h}\Bigg[\frac{\text{the number of } X_1, \ldots, X_n \le x + h}{n} - \frac{\text{the number of } X_1, \ldots, X_n \le x - h}{n}\Bigg]$$
$$= \frac{1}{2h}\cdot\frac{\text{the number of } X_1, \ldots, X_n \text{ in } \big(x - h,\ x + h\big]}{n};$$
that is,
$$f_n\big(x\big) = \frac{1}{2h}\cdot\frac{\text{the number of } X_1, \ldots, X_n \text{ in } \big(x - h,\ x + h\big]}{n},$$
and it can be further easily seen (see Exercise 20.2.1) that
$$f_n\big(x\big) = \frac{1}{nh}\sum_{j=1}^{n}K\Bigg(\frac{x - X_j}{h}\Bigg), \tag{3}$$
where
$$K\big(x\big) = \begin{cases}\dfrac{1}{2}, & \text{if } x \in (-1, 1],\\[4pt] 0, & \text{otherwise.}\end{cases}$$
Thus the proposed estimator $f_n(x)$ of $f(x)$ is expressed in terms of a known p.d.f. $K$ by means of (3). This expression also suggests an entire class of estimators to be introduced below. For this purpose, let $K$ be any p.d.f. defined on $\mathbb{R}$ into itself and satisfying the following properties:
$$\sup\big\{K(x);\ x \in \mathbb{R}\big\} < \infty,$$
$$\lim\big|xK(x)\big| = 0 \quad\text{as } |x| \to \infty, \tag{4}$$
$$K\big(-x\big) = K\big(x\big), \quad x \in \mathbb{R}.$$
Next, let $\{h_n\}$ be a sequence of positive constants such that
$$h_n\xrightarrow[n\to\infty]{} 0. \tag{5}$$
For each $x \in \mathbb{R}$ and by means of $K$ and $\{h_n\}$, define the r.v. $f_n(x; s)$, to be shortened to $f_n(x)$, as follows:
$$f_n\big(x\big) = \frac{1}{nh_n}\sum_{j=1}^{n}K\Bigg(\frac{x - X_j}{h_n}\Bigg). \tag{6}$$
THEOREM 1 Let $K$ be a p.d.f. satisfying (4), let $\{h_n\}$ be a sequence of positive constants satisfying (5), and for each $x \in \mathbb{R}$ let $f_n(x)$ be defined by (6). Then for any $x \in \mathbb{R}$ at which $f$ is continuous, the r.v. $f_n(x)$, viewed as an estimator of $f(x)$, is asymptotically unbiased in the sense that
$$Ef_n\big(x\big)\xrightarrow[n\to\infty]{} f\big(x\big).$$
Now let $\{h_n\}$ be as above and also satisfying the following requirement:
$$nh_n\xrightarrow[n\to\infty]{} \infty. \tag{7}$$
Then the following results hold true.
THEOREM 2 Under the same assumptions as those in Theorem 1 and the additional condition (7), for each $x \in \mathbb{R}$ at which $f$ is continuous, the estimator $f_n(x)$ of $f(x)$ is consistent in quadratic mean in the sense that
$$E\big[f_n\big(x\big) - f\big(x\big)\big]^2\xrightarrow[n\to\infty]{} 0.$$
The estimator $f_n(x)$, when properly normalized, is also asymptotically normal, as the following theorem states.
THEOREM 3 Under the same assumptions as those in Theorem 2, for each $x \in \mathbb{R}$ at which $f$ is continuous,
$$\frac{f_n\big(x\big) - E\big[f_n\big(x\big)\big]}{\sigma\big[f_n\big(x\big)\big]}\xrightarrow[n\to\infty]{d} Z \sim N(0, 1).$$
Finally, if $\{h_n\}$ also satisfies the condition
$$nh_n^2\xrightarrow[n\to\infty]{} \infty, \tag{8}$$
we may show the following result.
THEOREM 4 Under the same assumptions as those in Theorem 1 and also condition (8),
$$f_n\big(x\big)\xrightarrow[n\to\infty]{\text{a.s.}} f\big(x\big) \quad\text{uniformly in } x \in \mathbb{R}.$$
As an illustration, take $K$ to be the $N(0, 1)$ kernel
$$K\big(x\big) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}.$$
Then, clearly,
()
K x =
2
1
ex
2
2
2
1
, so that sup K x ; x ! < . { () }
Next, for x > 1, one has $e^{x} < e^{x^2}$, so that $e^{x/2} < e^{x^2/2}$ and hence
$$x\, e^{-x^2/2} < x\, e^{-x/2}.$$
Now consider the expansion $e^t = 1 + t\, e^{\lambda t}$ for some $0 < \lambda < 1$, and replace t by
x/2. We get then
$$0 \le \frac{x}{e^{x/2}} = \frac{x}{1 + \frac{x}{2}\, e^{\lambda x/2}} \le \frac{x}{\frac{x}{2}\, e^{\lambda x/2}} = \frac{2}{e^{\lambda x/2}} \xrightarrow[x\to\infty]{} 0,$$
and therefore $x\, e^{-x^2/2} \xrightarrow[x\to\infty]{} 0$. In a similar way $|x|\, e^{-x^2/2} \xrightarrow[x\to-\infty]{} 0$, so that
lim |x|K(x) = 0 as |x| → ∞. Since also K(−x) = K(x), condition (4) is satisfied.
Let us now take $h_n = 1/n^{1/4}$. Then $0 < h_n \to 0$, $n h_n^2 = n \cdot n^{-1/2} = n^{1/2} \underset{n\to\infty}{\longrightarrow} \infty$
and $n h_n = n^{3/4} \underset{n\to\infty}{\longrightarrow} \infty$. Thus the estimator given by (6) has all properties stated in
Theorems 1–4. This estimator here becomes as follows:
$$f_n(x) = \frac{1}{\sqrt{2\pi}\, n^{3/4}} \sum_{j=1}^{n} \exp\!\left[-\frac{n^{1/2}\,(x - X_j)^2}{2}\right].$$
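The estimator in (6) is easy to evaluate directly. The following is a minimal sketch (the function name is ours, not the text's) of $f_n(x)$ with the standard normal kernel and the bandwidth $h_n = n^{-1/4}$ used above; since the estimate is a mixture of normal densities, its numerical integral over a wide grid should be close to 1.

```python
import math

def kernel_density_estimate(xs, x, h=None):
    """f_n(x) = (1/(n h)) * sum_j K((x - X_j)/h) with the standard
    normal kernel, as in (6).  By default h = n**(-1/4), so that
    h -> 0, n*h -> infinity and n*h**2 -> infinity."""
    n = len(xs)
    if h is None:
        h = n ** (-0.25)
    K = lambda u: math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)
    return sum(K((x - xj) / h) for xj in xs) / (n * h)

# Sanity check on a small illustrative sample: the estimate is a
# p.d.f., so its rectangle-rule integral over [-6, 6] is close to 1.
sample = [0.2, -0.7, 1.5, 0.9, -0.1, 0.4, -1.2, 2.0]
grid = [i * 0.01 for i in range(-600, 601)]
total = sum(kernel_density_estimate(sample, x) * 0.01 for x in grid)
```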
Exercise
20.2.1 Let $X_j$, j = 1, . . . , n be i.i.d. r.v.'s and for some h > 0 and any $x \in \mathbb{R}$,
define $f_n(x)$ as follows:
$$f_n(x) = \frac{1}{2h} \cdot \frac{\text{the number of } X_1, \dots, X_n \text{ in } (x - h,\ x + h]}{n}.$$
Then show that
$$f_n(x) = \frac{1}{nh}\sum_{j=1}^{n} K\!\left(\frac{x - X_j}{h}\right),$$
where K is as in (3).
Testing hypotheses problems about F also arise and are of practical importance.
Thus we may be interested in testing the hypothesis H : F = F0, a given
d.f., against all possible alternatives. This hypothesis can be tested by utilizing
the chi-square test for goodness of fit discussed in Chapter 13, Section 8. The
chi-square test is the oldest nonparametric test regarding d.f.s. Alternatively,
the sample d.f. Fn may also be used for testing the same hypothesis as above.
In order to be able to employ the test proposed below, we have to make the
supplementary (but mild) assumption that F is continuous. Thus the hypoth-
esis to be tested here is
H : F = F0 , a given continuous d.f.,
against the alternative
A : F ≠ F0 (in the sense that F(x) ≠ F0(x) for at least one x ∈ ℝ).
For this purpose, consider the statistic
$$D_n = \sup\{|F_n(x) - F_0(x)|;\ x \in \mathbb{R}\}, \qquad (9)$$
where Fn is the sample d.f. defined by (1). Then, under H, it follows from (2)
that $D_n \xrightarrow[n\to\infty]{\text{a.s.}} 0$. Therefore we would reject H if $D_n > C$ and would accept it
otherwise. The constant C is to be determined through the relationship
$$P(D_n > C \mid H) = \alpha. \qquad (10)$$
In order for this determination to be possible, we would have to know the
distribution of Dn, under H, or of some known multiple of it. It has been shown
in the literature that
$$P\left(\sqrt{n}\, D_n \le x \mid H\right) \underset{n\to\infty}{\longrightarrow} \sum_{j=-\infty}^{\infty} (-1)^j\, e^{-2 j^2 x^2}, \qquad x \ge 0. \qquad (11)$$
Thus for large n, the right-hand side of (11) may be used for the purpose of
determining C by way of (10). For moderate values of n (n ≤ 100) and selected
α's (α = 0.10, 0.05, 0.025, 0.01, 0.005), there are tables available which facilitate
the calculation of C. (See, for example, Handbook of Statistical Tables by D. B.
Owen, Addison-Wesley, 1962.) The test employed above is known as the
Kolmogorov one-sample test.
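A minimal sketch of this test (function names are ours): for a continuous F0, the supremum in (9) is attained at the jump points of the sample d.f., and the limiting series in (11) is used in place of exact tables, so the resulting p-value is only a large-sample approximation.

```python
import math

def kolmogorov_statistic(xs, F0):
    """D_n = sup_x |F_n(x) - F0(x)| as in (9).  It suffices to compare
    F0 with the sample d.f. just before and at each order statistic."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        d = max(d, abs(i / n - F0(x)), abs((i - 1) / n - F0(x)))
    return d

def kolmogorov_cdf(x, terms=100):
    """Limiting d.f. of sqrt(n) * D_n under H, the series in (11)."""
    if x <= 0:
        return 0.0
    return sum((-1) ** j * math.exp(-2.0 * j * j * x * x)
               for j in range(-terms, terms + 1))

# Example: test H: F = U(0, 1) on a small sample.
sample = [0.05, 0.12, 0.31, 0.44, 0.58, 0.67, 0.71, 0.83, 0.90, 0.97]
Dn = kolmogorov_statistic(sample, lambda x: min(max(x, 0.0), 1.0))
p_value = 1.0 - kolmogorov_cdf(math.sqrt(len(sample)) * Dn)
```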
The testing hypothesis problem just described is of limited practical im-
portance. What arise naturally in practice are problems of the following type:
Let Xi, i = 1, . . . , m be i.i.d. r.v.s with continuous but unknown d.f. F and let
Yj, j = 1, . . . , n be i.i.d. r.v.s with continuous but unknown d.f. G. The two
random samples are assumed to be independent and the hypothesis of interest
here is
H : F = G.
One possible alternative is the following:
A : F ≠ G (12)
(in the sense that F(x) ≠ G(x) for at least one x ∈ ℝ).
For this purpose, the statistic employed is
$$D_{m,n} = \sup\{|F_m(x) - G_n(x)|;\ x \in \mathbb{R}\}, \qquad (13)$$
where $F_m$, $G_n$ are the sample d.f.'s of the X's and Y's, respectively. Under H,
F = G, so that
$$\left|F_m(x) - G_n(x)\right| = \left|\left[F_m(x) - F(x)\right] - \left[G_n(x) - G(x)\right]\right| \le \left|F_m(x) - F(x)\right| + \left|G_n(x) - G(x)\right|.$$
Hence
$$D_{m,n} \le \sup\{|F_m(x) - F(x)|;\ x \in \mathbb{R}\} + \sup\{|G_n(x) - G(x)|;\ x \in \mathbb{R}\},$$
whereas
$$\sup\{|F_m(x) - F(x)|;\ x \in \mathbb{R}\} \xrightarrow[m\to\infty]{\text{a.s.}} 0, \qquad \sup\{|G_n(x) - G(x)|;\ x \in \mathbb{R}\} \xrightarrow[n\to\infty]{\text{a.s.}} 0.$$
In other words, we have that $D_{m,n} \xrightarrow{\text{a.s.}} 0$ as $m, n \to \infty$, and this suggests
rejecting H if Dm,n > C and accepting it otherwise. The constant C is deter-
mined by means of the relation
$$P(D_{m,n} > C \mid H) = \alpha. \qquad (14)$$
It has been shown in the literature that
$$P\left(\sqrt{N}\, D_{m,n} \le x \mid H\right) \longrightarrow \sum_{j=-\infty}^{\infty} (-1)^j\, e^{-2 j^2 x^2} \quad \text{as } m, n \to \infty, \qquad x \ge 0, \qquad (15)$$
where N = mn/(m + n).
For testing H against the alternative A′ : F > G, we employ the one-sided statistic
$$D^{+}_{m,n} = \sup\{F_m(x) - G_n(x);\ x \in \mathbb{R}\}$$
20.4 More About Nonparametric Tests: Rank Tests 493
and reject H if $D^{+}_{m,n} > C^{+}$. The cut-off point $C^{+}$ is determined through the
relation
$$P\left(D^{+}_{m,n} > C^{+} \mid H\right) = \alpha$$
by utilizing the fact that
$$P\left(\sqrt{N}\, D^{+}_{m,n} \le x\right) \longrightarrow 1 - e^{-2x^2} \quad \text{as } m, n \to \infty, \qquad x \ge 0,$$
as can be shown. Here N is as before, that is, N = mn/(m + n). Similarly, for
testing H against the alternative A″ : F < G, we employ the statistic $D^{-}_{m,n}$ defined by
$$D^{-}_{m,n} = \sup\{G_n(x) - F_m(x);\ x \in \mathbb{R}\}$$
and reject H if $D^{-}_{m,n} > C^{-}$. The cut-off point $C^{-}$ is determined through the
relation
$$P\left(D^{-}_{m,n} > C^{-} \mid H\right) = \alpha$$
by utilizing the fact that
$$P\left(\sqrt{N}\, D^{-}_{m,n} \le x\right) \longrightarrow 1 - e^{-2x^2} \quad \text{as } m, n \to \infty, \qquad x \ge 0.$$
For relevant tables, the reader is referred to the reference cited earlier in this
section. The last three tests based on the statistics $D_{m,n}$, $D^{+}_{m,n}$ and $D^{-}_{m,n}$ are
known as Kolmogorov–Smirnov two-sample tests.
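The three statistics can be computed in a few lines; a sketch (the helper names are ours), using the fact that for step functions the suprema in (13) are attained at the pooled observation points:

```python
import bisect

def ks_two_sample(xs, ys):
    """D_{m,n}, D+_{m,n} and D-_{m,n}: suprema over x of
    |F_m - G_n|, (F_m - G_n) and (G_n - F_m), respectively."""
    xs, ys = sorted(xs), sorted(ys)

    def ecdf(sorted_vals, t):
        # sample d.f.: (number of observations <= t) / sample size
        return bisect.bisect_right(sorted_vals, t) / len(sorted_vals)

    diffs = [ecdf(xs, t) - ecdf(ys, t) for t in xs + ys]
    d_plus = max(max(diffs), 0.0)
    d_minus = max(max(-d for d in diffs), 0.0)
    return max(d_plus, d_minus), d_plus, d_minus

# Identical samples give D = 0; shifting the Y's far to the right makes
# F_m - G_n reach 1 somewhere, so D+ (and hence D) equals 1.
D, Dp, Dm = ks_two_sample([1, 2, 3, 4], [1, 2, 3, 4])
D2, Dp2, Dm2 = ks_two_sample([1, 2, 3, 4], [11, 12, 13, 14])
```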
ordering the Xs and Ys, we have strict inequalities with probability equal to
one.
For testing the hypothesis H specified above, we are going to use either
one of the rank sum statistics RX, RY defined by
$$R_X = \sum_{i=1}^{m} R(X_i), \qquad R_Y = \sum_{j=1}^{n} R(Y_j). \qquad (18)$$
$$P(R_X < C \mid H) = \alpha. \qquad (20)$$
Theoretically the determination of C is a simple matter; under H, all $\binom{N}{m}$
values of $(R(X_1), \dots, R(X_m))$ are equally likely, each having probability $1\big/\binom{N}{m}$.
The rejection region then is defined as follows: Consider all these $\binom{N}{m}$ values
and for each one of them form the rank sum $R_X$. Then the rejection region
consists of the k smallest values of these rank sums. For small values of m and
n (n ≤ m ≤ 10), this procedure is facilitated by tables (see reference cited in
previous section), whereas for large values of m and n it becomes unmanage-
able; for this latter case, the normal approximation to be discussed below may
be employed. The remaining two alternatives are treated in a similar fashion.
Next, consider the function u defined as follows:
$$u(z) = \begin{cases} 1, & \text{if } z > 0\\ 0, & \text{if } z < 0 \end{cases} \qquad (21)$$
and set
$$U = \sum_{i=1}^{m}\sum_{j=1}^{n} u\!\left(X_i - Y_j\right). \qquad (22)$$
Then U is, clearly, the number of times a Y precedes an X and it can be shown
(see Exercise 20.4.2) that
$$U = mn + \frac{n(n+1)}{2} - R_Y = R_X - \frac{m(m+1)}{2}. \qquad (23)$$
$$G(x) = F(x - \Delta), \qquad x \in \mathbb{R}, \ \text{for some unknown } \Delta \in \mathbb{R}.$$
As before, F is assumed to be unknown but continuous. In this case, we
say that G is a shift of F (to the right if Δ > 0 and to the left if Δ < 0). Then
the hypothesis H : F = G is equivalent to testing Δ = 0, and the alternatives
A : F ≠ G, A′ : F > G and A″ : F < G are equivalent to Δ ≠ 0, Δ > 0 and Δ < 0,
respectively.
In closing this section, we should like to mention that there is also the one-
sample Wilcoxon–Mann–Whitney test, as well as other one-sample and two-
sample rank tests available. However, their discussion here would be beyond
the purposes of the present chapter.
As an illustration, consider the following numerical example.
EXAMPLE 2 Let m = 5, n = 4 and suppose that X1 = 78, X2 = 65, X3 = 74, X4 = 45, X5 = 82;
Y1 = 110, Y2 = 71, Y3 = 53, Y4 = 50. Combining these values and ordering them
according to their size, we obtain
45 50 53 65 71 74 78 82 110
(X) (Y) (Y) (X) (Y) (X) (X) (X) (Y),
where an X or Y below a number means that the number is coming from the
X or Y sample, respectively. From this, we find that
R(X₁) = 7, R(X₂) = 4, R(X₃) = 6, R(X₄) = 1, R(X₅) = 8; R(Y₁) = 9,
R(Y₂) = 5, R(Y₃) = 3, R(Y₄) = 2, so that R_X = 26, R_Y = 19.
Here $\binom{N}{m} = \binom{9}{5} = 126$, and suppose that the level of significance is
$$\alpha = \frac{5}{126}\ (\approx 0.04).$$
Then for testing H against A (given by (16)), we would reject for small values
of RX, or equivalently (by means of (23)), for small values of U. For the given
m, n, and for the observed value of RX (or U), H is accepted. (See tables on
p. 341 of the reference cited in Section 20.3.)
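The rank computations of this example, as well as the identity (23), can be verified mechanically; a small sketch (the function name is ours):

```python
def rank_sums(xs, ys):
    """Combined-sample ranks, the rank sums R_X, R_Y of (18) and the
    statistic U of (22).  Ties are assumed absent, as in the text."""
    pooled = sorted(xs + ys)
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    r_x = sum(rank[x] for x in xs)
    r_y = sum(rank[y] for y in ys)
    # U = number of pairs (i, j) with Y_j < X_i (a Y precedes an X)
    u = sum(1 for x in xs for y in ys if y < x)
    return r_x, r_y, u

# The data of Example 2:
xs = [78, 65, 74, 45, 82]
ys = [110, 71, 53, 50]
RX, RY, U = rank_sums(xs, ys)
m, n = len(xs), len(ys)
# Identity (23): U = mn + n(n+1)/2 - R_Y = R_X - m(m+1)/2, and
# (Exercise 20.4.1) R_X + R_Y = N(N+1)/2 with N = m + n.
```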
Exercises
20.4.1 Consider the two independent random samples Xi, i = 1, . . . , m and
Yj, j = 1, . . . , n and let R(Xi) and R(Yj) be the ranks of Xi and Yj, respectively,
in the combined sample of the X's and Y's. Furthermore, let RX and RY be
defined by (18). Then show that
$$R_X + R_Y = \frac{N(N+1)}{2},$$
where N = m + n.
20.4.2 Let RX and RY be as in the previous exercise and let U be defined by
(22). Then establish (23).
20.4.3 Let Xi, i = 1, . . . , m and Yj, j = 1, . . . , n be two independent random
samples and let U be defined by (22). Then show that, under H,
$$EU = \frac{mn}{2}, \qquad \sigma^2(U) = \frac{mn(m+n+1)}{12}.$$
Also set $Z = \sum_{j=1}^{n} Z_j$ and p = P(Xj < Yj). Then, clearly, Z is distributed as B(n, p)
and the hypothesis H above is equivalent to testing p = 1/2. Depending on the
type of the alternatives, one would use the two-sided or the appropriate one-
sided test.
Some cases where the sign test just described is appropriate are when one is
interested in comparing the effectiveness of two different drugs used for the
treatment of the same disease, the efficiency of two manufacturing processes
producing the same item, the response of n customers regarding their prefer-
ences towards a certain consumer item, etc.
Of course, there is also the one-sample Sign test available, but we will not
discuss it here.
For the sake of an illustration, consider the following numerical example.
EXAMPLE 3 Let n = 10 and suppose that
X1 = 73, X 2 = 68, X 3 = 64, X 4 = 90, X 5 = 83,
X 6 = 48, X 7 = 100, X 8 = 75, X 9 = 90, X10 = 85
and
Y1 = 50, Y2 = 100, Y3 = 70, Y4 = 96, Y5 = 74,
Y6 = 64, Y7 = 76, Y8 = 83, Y9 = 98, Y10 = 40.
Then
Z1 = 0, Z2 = 1, Z3 = 1, Z4 = 1, Z5 = 0, Z6 = 1, Z7 = 0,
Z₈ = 1, Z₉ = 1, Z₁₀ = 0, so that $\sum_{j=1}^{10} Z_j = 6$.
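The computation above, and the binomial tail probability the sign test relies on, can be sketched as follows (function names are ours):

```python
from math import comb

def sign_statistic(xs, ys):
    """Z = sum of the Z_j, where Z_j = 1 if X_j < Y_j and 0 if X_j > Y_j;
    under H, Z is distributed as B(n, 1/2).  Ties are assumed absent."""
    return sum(1 for x, y in zip(xs, ys) if x < y)

def binomial_upper_tail(n, k, p=0.5):
    """P(Z >= k) for Z ~ B(n, p), as used by a one-sided sign test."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(k, n + 1))

# The data of Example 3:
xs = [73, 68, 64, 90, 83, 48, 100, 75, 90, 85]
ys = [50, 100, 70, 96, 74, 64, 76, 83, 98, 40]
Z = sign_statistic(xs, ys)
tail = binomial_upper_tail(10, Z)   # P(Z >= 6) under H
```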
cance is α. Formally, we may also employ the t-test (see (36), Chapter 13) for
testing the same hypothesis against the same specified alternative at the same
level α. Let n* be the (common) sample size required in order to achieve a
power equal to β by employing the t-test. We further assume that the limit of
n*/n, as n → ∞, exists and is independent of α and β. Denote this limit by e.
Then this quantity e is the Pitman asymptotic relative efficiency of the
Wilcoxon–Mann–Whitney test (or of the Sign test, depending on which one is
used) relative to the t-test. Thus, if we use the Wilcoxon–Mann–Whitney test
and if it so happens that e = 1/3, then this means that the Wilcoxon–Mann–
Whitney test requires approximately three times as many observations as the
t-test in order to achieve the same power. However, if e = 5, then the
Wilcoxon–Mann–Whitney test requires approximately only one-fifth as many
observations as the t-test in order to achieve the same power.
It has been found in the literature that the asymptotic efficiency of the
Wilcoxon–Mann–Whitney test relative to the t-test is 3/π ≈ 0.95 when the
underlying distribution is Normal, 1 when the underlying distribution is Uni-
form and ∞ when the underlying distribution is Cauchy.
I.3 Basic Definitions About Matrices 499
Appendix I
Topics from Vector and Matrix Algebra
x + y = y + x, ( x + y ) + z = x + ( y + z ) = x + y + z.
The product αx of the vector x = (x1, . . . , xn) by the real number (scalar) α is
the vector defined by αx = (αx1, . . . , αxn). For any two vectors x, y in Vn and
any two scalars α, β, the following properties are immediate:
α(x + y) = αx + αy,  (α + β)x = αx + βx,  α(βx) = β(αx) = (αβ)x,  1x = x.
The inner (or scalar) product x′y of any two vectors x = (x1, . . . , xn), y =
(y1, . . . , yn) is a scalar and is defined as follows:
500 Appendix I Topics from Vector and Matrix Algebra
$$\mathbf{x}'\mathbf{y} = \sum_{j=1}^{n} x_j\, y_j.$$
For any three vectors x, y and z in Vn and any scalars α, β, the following
properties are immediate:
x′y = y′x,  (αx)′y = x′(αy) = α(x′y),  x′(y + z) = x′y + x′z,
x′x ≥ 0 and x′x = 0 if and only if x = 0;
also ||αx|| = |α| ||x||.
For any vectors x_j, j = 1, . . . , r in Vn, the set
$$V = \left\{\mathbf{y} \in V_n;\ \mathbf{y} = \sum_{j=1}^{r} c_j\, \mathbf{x}_j,\ c_j \in \mathbb{R},\ j = 1, \dots, r\right\}$$
is a subspace of Vn.
The vectors x_j, j = 1, . . . , r in Vn are said to span (or generate) the subspace
V ⊆ Vn if every vector y in V may be written as follows: $\mathbf{y} = \sum_{j=1}^{r} c_j\, \mathbf{x}_j$ for some
scalars c_j, j = 1, . . . , r.
For any positive integer m < n, the m-dimensional vector space Vm may be
considered as a subspace of Vn by enlarging the m-tuples to n-tuples and
identifying the appropriate components with zero in the resulting n-tuples.
Thus, for example, the x-axis in the plane may be identified with the set {x =
(x1, x2) ∈ V2; x1 ∈ ℝ, x2 = 0}, which is a subspace of V2. Similarly, the y-axis in the
plane may be identified with the set {y = (y1, y2) ∈ V2; y1 = 0, y2 ∈ ℝ}, which is
a subspace of V2; the xy-plane in three-dimensional space may be identi-
fied with the set {z = (x1, x2, x3) ∈ V3; x1, x2 ∈ ℝ, x3 = 0}, which is a subspace of
V3, etc.
From now on, we shall assume that the above-mentioned identi-
fication has been made, and we shall write Vm ⊆ Vn to indicate that Vm is a
subspace of Vn.
I.2 Some Theorems on Vector Spaces 501
THEOREM 1.I For any positive integer n, consider any subspace V ⊆ Vn. Then V has a basis,
and any two bases in V have the same number of vectors, say, m (the
dimension of V). In particular, the dimension of Vn is n and m ≤ n.
THEOREM 2.I Let m, n be any positive integers with m < n and let {xj, j = 1, . . . , m} be an
orthonormal basis for Vm. Then this basis can be extended to an orthonormal
basis {xj, j = 1, . . . , n} for Vn.
THEOREM 3.I Let n be any positive integer, let x be a vector in Vn and let V be a subspace
of Vn. Then x ⊥ V if and only if x is orthogonal to the vectors of a basis for V,
or to the vectors of any set of vectors in V spanning V.
THEOREM 4.I Let m, n be any positive integers with m < n and let Vm be a subspace of Vn of
dimension m. Let U be the set of vectors in Vn each of which is orthogonal to
Vm. Then U is an r-dimensional subspace Ur of Vn with r = n − m, and is called
the orthocomplement (or orthogonal complement) of Vm in Vn. Furthermore,
any vector x in Vn may be written (decomposed) uniquely as follows: x = v + u
with v ∈ Vm, u ∈ Ur. The vectors v, u are called the projections of x into Vm
and Ur, respectively, and ||x||² = ||v||² + ||u||². Finally, as z varies in Vm, ||x − z|| has
a minimum value obtained for z = v, and as w varies in Ur, ||x − w|| has a
minimum value obtained for w = u.
BA is not defined unless r = m and even then, it is not true, in general, that
AB = BA. For example, take
$$A = \begin{pmatrix} 0 & 0\\ 0 & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 1\\ 0 & 0 \end{pmatrix}.$$
Then
$$AB = \begin{pmatrix} 0 & 0\\ 0 & 0 \end{pmatrix}, \qquad BA = \begin{pmatrix} 0 & 1\\ 0 & 0 \end{pmatrix},$$
so that AB ≠ BA. The products AB, BA are always defined for all square
matrices of the same order.
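The non-commutativity above can be checked directly; a small pure-Python sketch (the matmul helper is ours):

```python
def matmul(A, B):
    """Product of two matrices represented as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[0, 0],
     [0, 1]]
B = [[1, 1],
     [0, 0]]

AB = matmul(A, B)   # the 2 x 2 zero matrix
BA = matmul(B, A)   # [[0, 1], [0, 0]]
```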
Let A be an m × n matrix, let B, C be two n × r matrices and let D be an
r × k matrix. Then for any scalars α, β and γ, the following properties are
immediate:
IA = AI = A,  0A = A0 = 0,  A(B + C) = AB + AC,
where the a_ij are the elements of A and the summation extends over all
permutations (i, j, . . . , p) of (1, 2, . . . , m). The plus sign is chosen if the
permutation is even and the minus sign if it is odd. For further elaboration, see
any of the references cited at the end of this appendix. It can be shown that A
is nonsingular if and only if |A| ≠ 0. It can also be shown that if |A| ≠ 0, there
exists a unique matrix, to be denoted by A⁻¹, such that AA⁻¹ = A⁻¹A = I. The
matrix A⁻¹ is called the inverse of A. Clearly, (A⁻¹)⁻¹ = A.
Let A be a square matrix of order n such that AA′ = A′A = I. Then A is
said to be orthogonal. Let r_i and c_i, i = 1, . . . , n stand for the row and column
THEOREM 8.I i) Let r1, r2 be two vectors in Vn such that r′1r2 = 0 and ||r1|| = ||r2|| = 1. Then there
exists an n × n orthogonal matrix, the first two rows of which are equal to
r1, r2.
(For a concrete example, see the application after Theorem 5 in
Chapter 9.)
ii) Let x be a vector in Vn, let A be an n × n orthogonal matrix and set y = Ax.
Then x′x = y′y, so that ||x|| = ||y||.
iii) For every symmetric matrix A there is an orthogonal matrix B (of the
same order as that of A) such that the matrix B′AB is diagonal (and its
diagonal elements are the characteristic roots of A).
THEOREM 9.I i) For any square matrix A,
rank (AA′) = rank (A′A) = rank A = rank A′.
ii) Let A, B and C be m × n, n × r and r × k matrices, respectively.
Then
rank (AB) ≤ min(rank A, rank B)
and
rank (ABC) ≤ min(rank A, rank B, rank C).
iii) Let A, B and C be m × n, m × m and n × n matrices, respectively, and
suppose that B, C are non-singular. Then
rank (BA) = rank (AC) = rank (BAC) = rank A.
iv) Let A, B and C be m × n, m × m and n × n matrices, respectively, and
suppose that B, C are non-singular. Then rank (BAC) = rank A. In
particular, rank (B′AB) = rank (BAB′) = rank A if m = n and B is
orthogonal.
v) For any matrix A, rank A = number of nonzero characteristic roots of A.
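The rank identities in parts (i) and (ii) can be spot-checked numerically; a sketch with pure-Python helpers of our own (the identity of part (i) in fact holds for rectangular matrices as well):

```python
def rank(M, tol=1e-9):
    """Rank of a matrix (list of rows) via Gaussian elimination."""
    M = [row[:] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if abs(M[i][c]) > tol), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(rows):
            if i != r and abs(M[i][c]) > tol:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0]]     # rank 1 (second row = 2 * first row)
B = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]          # rank 2

r_A = rank(A)
r_AAt = rank(matmul(A, transpose(A)))   # part (i): equals rank A
r_AtA = rank(matmul(transpose(A), A))   # part (i): equals rank A
r_AB = rank(matmul(A, B))               # part (ii): <= min(rank A, rank B)
```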
THEOREM 10.I i) If A is positive definite, A⁻¹ exists and is also positive definite.
ii) For any nonsingular square matrix A, AA′ is positive definite (and
symmetric).
iii) Let A = (a_ij), i, j = 1, . . . , n and define A_j by
$$A_j = \begin{pmatrix} a_{11} & \cdots & a_{1j}\\ \vdots & & \vdots\\ a_{j1} & \cdots & a_{jj} \end{pmatrix}, \qquad j = 1, \dots, n.$$
Then A is positive definite if and only if |A_j| > 0, j = 1, . . . , n. In particular,
a diagonal matrix is positive definite if and only if its diagonal elements are
all positive.
iv) A matrix A of order n is positive definite (semidefinite, negative definite,
respectively) if and only if x′Ax > 0 (≥ 0, < 0, respectively) for every x ∈ Vn
with x ≠ 0.
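The leading-principal-minor criterion of part (iii) translates directly into code; a sketch (helper names are ours):

```python
def det(M):
    """Determinant by cofactor expansion along the first row
    (adequate for the small matrices used here)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] *
               det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def is_positive_definite(A):
    """Theorem 10.I(iii): every leading principal minor |A_j| > 0."""
    n = len(A)
    return all(det([row[:j] for row in A[:j]]) > 0 for j in range(1, n + 1))

A = [[2, -1, 0],
     [-1, 2, -1],
     [0, -1, 2]]     # leading minors 2, 3, 4: positive definite
B = [[1, 2],
     [2, 1]]         # leading minors 1, -3: not positive definite
pd_A = is_positive_definite(A)
pd_B = is_positive_definite(B)
```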
ii) [. . .] there exist r linear forms
$$\sum_{j=1}^{n} b_{ij}\, x_j, \qquad i = 1, \dots, r,$$
such that
$$Q = \sum_{i=1}^{r} \delta_i \left(\sum_{j=1}^{n} b_{ij}\, x_j\right)^2,$$
where $\delta_i$ is either 1 or −1, i = 1, . . . , r.
iii) Let Q be as in (ii). There exists an orthogonal matrix B such that if
$\mathbf{y} = B^{-1}\mathbf{x}$, then $Q = \sum_{j=1}^{m} \lambda_j\, y_j^2$.
In particular,
rank A₁ + rank (I − A₁) = n
I.4 Some Theorems About Matrices and Quadratic Forms 507
and
rank A₁ + rank A₂ + rank (I − A₁ − A₂) = n.
iv) If A_j, j = 1, . . . , m are symmetric idempotent matrices of the same order
and $\sum_{j=1}^{m} A_j$ is also idempotent, then A_iA_j = 0 for 1 ≤ i < j ≤ m.
The proofs of the theorems formulated in this appendix may be found in
most books on linear algebra. For example, see G. Birkhoff and S. MacLane, A
Survey of Modern Algebra, 3rd ed., Macmillan, 1965; S. Lang, Linear Algebra,
Addison-Wesley, 1968; D. C. Murdoch, Linear Algebra for Undergraduates,
Wiley, 1957; S. Perlis, Theory of Matrices, Addison-Wesley, 1952. For a brief
exposition of most results from linear algebra employed in statistics, see also
C. R. Rao, Linear Statistical Inference and Its Applications, Chapter 1, Wiley,
1965; H. Scheffé, The Analysis of Variance, Appendices I and II, Wiley, 1959;
and F. A. Graybill, An Introduction to Linear Statistical Models, Vol. I, Chap-
ter 1, McGraw-Hill, 1961.
508 Appendix II Noncentral t, χ² and F Distributions
Appendix II
The noncentral t-distribution with r degrees of freedom and noncentrality parameter δ has p.d.f.
$$f_{t_{r;\delta}}(t) = \frac{1}{\Gamma\!\left(\frac{r}{2}\right) 2^{(r-1)/2}\sqrt{\pi r}} \int_0^\infty x^r \exp\left\{-\frac{1}{2}\left[x^2 + \left(\frac{x t}{\sqrt{r}} - \delta\right)^2\right]\right\} dx, \qquad t \in \mathbb{R}.$$
The noncentral chi-square distribution with r degrees of freedom and noncentrality parameter δ has p.d.f.
$$f_{\chi^2_{r;\delta}}(x) = \sum_{j=0}^{\infty} P_j\, f_{r+2j}(x), \qquad x \ge 0,$$
where
$$P_j = e^{-\delta/2}\, \frac{(\delta/2)^j}{j!}$$
and $f_{r+2j}$ is the p.d.f. of $\chi^2_{r+2j}$, j = 0, 1, . . . .
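The Poisson-mixture representation above makes the density trivial to compute; a sketch (function names are ours), with a crude numerical check that the series defines a genuine p.d.f. whose mean is r + δ (a known property of this distribution):

```python
import math

def noncentral_chi2_pdf(x, r, delta, terms=40):
    """p.d.f. of the noncentral chi-square with r d.f. and noncentrality
    delta, as the Poisson(delta/2) mixture of the central chi-square
    densities f_{r+2j}."""
    def central_pdf(x, k):
        # central chi-square p.d.f. with k degrees of freedom
        return (x ** (k / 2.0 - 1.0) * math.exp(-x / 2.0)
                / (math.gamma(k / 2.0) * 2.0 ** (k / 2.0)))
    return sum(math.exp(-delta / 2.0) * (delta / 2.0) ** j / math.factorial(j)
               * central_pdf(x, r + 2 * j)
               for j in range(terms))

# Midpoint-rule integration on (0, 60): total mass ~ 1, mean ~ r + delta.
r, delta, dx = 3, 2.0, 0.02
grid = [dx * (i + 0.5) for i in range(3000)]
mass = sum(noncentral_chi2_pdf(x, r, delta) * dx for x in grid)
mean = sum(x * noncentral_chi2_pdf(x, r, delta) * dx for x in grid)
```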
Recall that the (central) F-distribution was defined by means of the quotient
$$F = \frac{X/r_1}{Y/r_2},$$
where X and Y were independent r.v.'s distributed as $\chi^2_{r_1}$ and $\chi^2_{r_2}$, respectively.
Suppose now that the r.v.'s X and Y are independent and distributed as $\chi^2_{r_1;\delta}$
and $\chi^2_{r_2}$, respectively, and set
$$F' = \frac{X/r_1}{Y/r_2}.$$
Then the distribution of F′ is said to be the noncentral F-distribution with r₁ and
r₂ d.f. and noncentrality parameter δ. This distribution, and also an r.v. having
this distribution, is often denoted by $F_{r_1, r_2; \delta}$, and its p.d.f., which does not have
any simple closed form, is given by the following expression:
$$f_{F_{r_1,r_2;\delta}}(f) = e^{-\delta/2} \sum_{j=0}^{\infty} \frac{(\delta/2)^j}{j!}\, c_j \left(\frac{r_1}{r_2}\right)^{\frac{1}{2}r_1 + j} \frac{f^{\frac{1}{2}r_1 + j - 1}}{\left(1 + \frac{r_1}{r_2}\, f\right)^{\frac{1}{2}(r_1 + r_2) + j}}, \qquad f \ge 0,$$
where
$$c_j = \frac{\Gamma\!\left[\frac{1}{2}(r_1 + r_2) + j\right]}{\Gamma\!\left(\frac{1}{2}r_1 + j\right)\Gamma\!\left(\frac{1}{2}r_2\right)}, \qquad j = 0, 1, \dots.$$
REMARKS
(i) By setting δ = 0 in the noncentral t, χ² and F-distributions, we obtain the
t, χ² and F-distributions, respectively. In view of this, the latter distribu-
tions may also be called central t, χ² and F-distributions.
(ii) Tables for the noncentral t, χ² and F-distributions are given in a reference
cited elsewhere, namely, Handbook of Statistical Tables by D. B. Owen,
Addison-Wesley, 1962.
Appendix III
Tables
Table 1
[Binomial probabilities tabulated for selected n, k and p = 1/16, 2/16, . . . , 8/16; the numerical entries are not reproduced here.]
Table 2
[Cumulative Poisson probabilities $\sum_{j=0}^{k} e^{-\lambda}\lambda^j/j!$ for selected values of λ.]
λ
k 0.001 0.005 0.010 0.015 0.020 0.025
0 0.9990 0050 0.9950 1248 0.9900 4983 0.9851 1194 0.9801 9867 0.9753 099
1 0.9999 9950 0.9999 8754 0.9999 5033 0.9998 8862 0.9998 0264 0.9996 927
2 1.0000 0000 0.9999 9998 0.9999 9983 0.9999 9945 0.9999 9868 0.9999 974
3 1.0000 0000 1.0000 0000 1.0000 0000 0.9999 9999 1.0000 000
4 1.0000 0000 1.0000 000
k 0.030 0.035 0.040 0.045 0.050 0.055
0 0.970 446 0.965 605 0.960 789 0.955 997 0.951 229 0.946 485
1 0.999 559 0.999 402 0.999 221 0.999 017 0.998 791 0.998 542
2 0.999 996 0.999 993 0.999 990 0.999 985 0.999 980 0.999 973
3 1.000 000 1.000 000 1.000 000 1.000 000 1.000 000 1.000 000
k 0.060 0.065 0.070 0.075 0.080 0.085
0 0.941 765 0.937 067 0.932 394 0.927 743 0.923 116 0.918 512
1 0.998 270 0.997 977 0.997 661 0.997 324 0.996 966 0.996 586
2 0.999 966 0.999 956 0.999 946 0.999 934 0.999 920 0.999 904
3 0.999 999 0.999 999 0.999 999 0.999 999 0.999 998 0.999 998
4 1.000 000 1.000 000 1.000 000 1.000 000 1.000 000 1.000 000
k 0.090 0.095 0.100 0.200 0.300 0.400
0 0.913 931 0.909 373 0.904 837 0.818 731 0.740 818 0.670 320
1 0.996 185 0.995 763 0.995 321 0.982 477 0.963 064 0.938 448
2 0.999 886 0.999 867 0.999 845 0.998 852 0.996 401 0.992 074
3 0.999 997 0.999 997 0.999 996 0.999 943 0.999 734 0.999 224
4 1.000 000 1.000 000 1.000 000 0.999 998 0.999 984 0.999 939
5 1.000 000 0.999 999 0.999 996
6 1.000 000 1.000 000
k 0.500 0.600 0.700 0.800 0.900 1.000
0 0.606 531 0.548 812 0.496 585 0.449 329 0.406 329 0.367 879
1 0.909 796 0.878 099 0.844 195 0.808 792 0.772 482 0.735 759
2 0.985 612 0.976 885 0.965 858 0.952 577 0.937 143 0.919 699
Table 2 (continued )
k 0.500 0.600 0.700 0.800 0.900 1.000
3 0.998 248 0.996 642 0.994 247 0.990 920 0.986 541 0.981 012
4 0.999 828 0.999 606 0.999 214 0.998 589 0.997 656 0.996 340
5 0.999 986 0.999 961 0.999 910 0.999 816 0.999 657 0.999 406
6 0.999 999 0.999 997 0.999 991 0.999 979 0.999 957 0.999 917
7 1.000 000 1.000 000 0.999 999 0.999 998 0.999 995 0.999 990
8 1.000 000 1.000 000 1.000 000 0.999 999
9 1.000 000
Table 3 gives values of the standard normal d.f.
$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt. \qquad [\Phi(-x) = 1 - \Phi(x).]$$
[The numerical entries are not reproduced here.]
Table 4 gives the quantiles x of the t-distribution determined by
$$P(t_r \le x) = \gamma, \qquad \gamma = 0.75,\ 0.90,\ 0.95,\ 0.975,\ 0.99,\ 0.995.$$
[The numerical entries are not reproduced here.]
Table 5 gives the quantiles x of the chi-square distribution determined by
$$P(\chi^2_r \le x) = \gamma, \qquad \gamma = 0.005,\ 0.01,\ 0.025,\ 0.05,\ 0.10,\ 0.25,\ 0.75,\ 0.90,\ 0.95,\ 0.975,\ 0.99,\ 0.995.$$
[The numerical entries are not reproduced here.]
Table 6 gives the quantiles x of the F-distribution determined by
$$P\left(F_{r_1, r_2} \le x\right) = \gamma,$$
for numerator degrees of freedom r₁ = 1, 2, . . . , 120.
[The numerical entries are not reproduced here.]
These tables have been adapted from Donald B. Owen's Handbook of Statistical Tables,
published by Addison-Wesley, by permission of the publishers.
Table 7 Table of Selected Discrete and Continuous Distributions and Some of their Characteristics

Distribution — p.d.f. — Mean — Variance

Binomial, B(n, p): $f(x) = \binom{n}{x} p^x q^{n-x}$, x = 0, 1, . . . , n; 0 < p < 1, q = 1 − p. Mean np; variance npq.
(Bernoulli, B(1, p): $f(x) = p^x q^{1-x}$, x = 0, 1. Mean p; variance pq.)
Poisson, P(λ): $f(x) = e^{-\lambda}\, \frac{\lambda^x}{x!}$, x = 0, 1, . . . ; λ > 0. Mean λ; variance λ.
Hypergeometric: $f(x) = \binom{m}{x}\binom{n}{r-x} \Big/ \binom{m+n}{r}$, x = 0, 1, . . . , min(r, m). Mean $\frac{mr}{m+n}$; variance $\frac{mnr(m+n-r)}{(m+n)^2 (m+n-1)}$.
Negative Binomial: $f(x) = \binom{r+x-1}{x} p^r q^x$, x = 0, 1, . . . ; 0 < p < 1, q = 1 − p. Mean $\frac{rq}{p}$; variance $\frac{rq}{p^2}$.
(Geometric: $f(x) = p q^x$, x = 0, 1, . . . . Mean $\frac{q}{p}$; variance $\frac{q}{p^2}$.)
Multinomial: $f(x_1, \dots, x_k) = \frac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$, $x_j \ge 0$ integers, $x_1 + \cdots + x_k = n$; $p_j > 0$, j = 1, 2, . . . , k, $p_1 + p_2 + \cdots + p_k = 1$, $q_j = 1 - p_j$. Vector of expectations: (np₁, . . . , np_k); vector of variances: (np₁q₁, . . . , np_k q_k).
Normal, N(μ, σ²): $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$, x ∈ ℝ; μ ∈ ℝ, σ > 0. Mean μ; variance σ².
(Standard Normal, N(0, 1): $f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$, x ∈ ℝ. Mean 0; variance 1.)
Gamma: $f(x) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, x^{\alpha-1} \exp\left(-\frac{x}{\beta}\right)$, x > 0; α, β > 0. Mean αβ; variance αβ².
Chi-square: $f(x) = \frac{1}{\Gamma(r/2)\, 2^{r/2}}\, x^{\frac{r}{2}-1} \exp\left(-\frac{x}{2}\right)$, x > 0; r > 0 integer. Mean r; variance 2r.
Negative Exponential: $f(x) = \lambda \exp(-\lambda x)$, x > 0; λ > 0. Mean 1/λ; variance 1/λ².
Uniform, U(α, β): $f(x) = \frac{1}{\beta - \alpha}$, α ≤ x ≤ β; −∞ < α < β < ∞. Mean $\frac{\alpha+\beta}{2}$; variance $\frac{(\beta-\alpha)^2}{12}$.
Beta: $f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}$, 0 < x < 1; α, β > 0. Mean $\frac{\alpha}{\alpha+\beta}$; variance $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$.
Cauchy: $f(x) = \frac{\sigma}{\pi} \cdot \frac{1}{\sigma^2 + (x-\mu)^2}$, x ∈ ℝ; μ ∈ ℝ, σ > 0. Mean does not exist; variance does not exist.
Bivariate Normal: $f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left(-\frac{q}{2}\right)$, where
$$q = \frac{1}{1-\rho^2}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right) + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right],$$
x₁, x₂ ∈ ℝ; μ₁, μ₂ ∈ ℝ, σ₁, σ₂ > 0, −1 ≤ ρ ≤ 1. Vector of expectations: (μ₁, μ₂); vector of variances: (σ₁², σ₂²).
k-Variate Normal, N(μ, 𝚺): $f(\mathbf{x}) = (2\pi)^{-k/2} |\boldsymbol{\Sigma}|^{-1/2} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'\, \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})\right]$, x ∈ ℝᵏ; μ ∈ ℝᵏ, 𝚺: k × k non-singular symmetric matrix. Mean vector: μ; covariance matrix: 𝚺.

Distribution — Characteristic function φ(t) — Moment generating function M(t)

Binomial, B(n, p): $\varphi(t) = \left(p e^{it} + q\right)^n$, t ∈ ℝ; $M(t) = \left(p e^{t} + q\right)^n$, t ∈ ℝ.
(Bernoulli, B(1, p): $\varphi(t) = p e^{it} + q$, t ∈ ℝ; $M(t) = p e^{t} + q$, t ∈ ℝ.)
Poisson, P(λ): $\varphi(t) = \exp\left(\lambda e^{it} - \lambda\right)$, t ∈ ℝ; $M(t) = \exp\left(\lambda e^{t} - \lambda\right)$, t ∈ ℝ.
Negative Binomial: $\varphi(t) = \frac{p^r}{\left(1 - q e^{it}\right)^r}$, t ∈ ℝ; $M(t) = \frac{p^r}{\left(1 - q e^{t}\right)^r}$, t < −log q.
(Geometric: $\varphi(t) = \frac{p}{1 - q e^{it}}$, t ∈ ℝ; $M(t) = \frac{p}{1 - q e^{t}}$, t < −log q.)
Multinomial: $\varphi(t_1, \dots, t_k) = \left(p_1 e^{it_1} + \cdots + p_k e^{it_k}\right)^n$, t₁, . . . , t_k ∈ ℝ; $M(t_1, \dots, t_k) = \left(p_1 e^{t_1} + \cdots + p_k e^{t_k}\right)^n$, t₁, . . . , t_k ∈ ℝ.
Normal, N(μ, σ²): $\varphi(t) = \exp\left(i\mu t - \frac{\sigma^2 t^2}{2}\right)$, t ∈ ℝ; $M(t) = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)$, t ∈ ℝ.
(Standard Normal: $\varphi(t) = \exp\left(-\frac{t^2}{2}\right)$, t ∈ ℝ; $M(t) = \exp\left(\frac{t^2}{2}\right)$, t ∈ ℝ.)
Gamma: $\varphi(t) = \frac{1}{(1 - i\beta t)^{\alpha}}$, t ∈ ℝ; $M(t) = \frac{1}{(1 - \beta t)^{\alpha}}$, t < 1/β.
Chi-square: $\varphi(t) = \frac{1}{(1 - 2it)^{r/2}}$, t ∈ ℝ; $M(t) = \frac{1}{(1 - 2t)^{r/2}}$, t < 1/2.
Negative Exponential: $\varphi(t) = \frac{\lambda}{\lambda - it}$, t ∈ ℝ; $M(t) = \frac{\lambda}{\lambda - t}$, t < λ.
Uniform, U(α, β): $\varphi(t) = \frac{e^{it\beta} - e^{it\alpha}}{it(\beta - \alpha)}$, t ∈ ℝ; $M(t) = \frac{e^{t\beta} - e^{t\alpha}}{t(\beta - \alpha)}$, t ∈ ℝ.
Cauchy (μ = 0, σ = 1): $\varphi(t) = \exp\left(-|t|\right)$, t ∈ ℝ; M(t) does not exist (for t ≠ 0).
Bivariate Normal: $\varphi(t_1, t_2) = \exp\left[i\mu_1 t_1 + i\mu_2 t_2 - \frac{1}{2}\left(\sigma_1^2 t_1^2 + 2\rho\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2\right)\right]$, t₁, t₂ ∈ ℝ; $M(t_1, t_2) = \exp\left[\mu_1 t_1 + \mu_2 t_2 + \frac{1}{2}\left(\sigma_1^2 t_1^2 + 2\rho\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2\right)\right]$, t₁, t₂ ∈ ℝ.
k-Variate Normal, N(μ, 𝚺): $\varphi(\mathbf{t}) = \exp\left(i\mathbf{t}'\boldsymbol{\mu} - \frac{1}{2}\mathbf{t}'\boldsymbol{\Sigma}\mathbf{t}\right)$, t ∈ ℝᵏ; $M(\mathbf{t}) = \exp\left(\mathbf{t}'\boldsymbol{\mu} + \frac{1}{2}\mathbf{t}'\boldsymbol{\Sigma}\mathbf{t}\right)$, t ∈ ℝᵏ.
Some Notation and Abbreviations
E(X) or EX or μ(X) or μ_X or just μ — expectation (mean value, mean) of X
σ²(X) (σ(X)) or σ²_X (σ_X) or just σ² (σ) — variance (standard deviation) of X
Cov(X, Y), ρ(X, Y) — covariance and correlation coefficient, respectively, of X and Y
X or (X₁, . . . , Xₙ)
Answers to Selected Exercises
Chapter 2 2.1.1. P(A₁ᶜ ∩ A₂) = P(A₂ᶜ ∩ A₃) = 1/6, P(A₁ᶜ ∩ A₃) = 1/3,
P(A₁ ∩ A₂ᶜ ∩ A₃ᶜ) = 0, P(A₁ᶜ ∩ A₂ᶜ ∩ A₃ᶜ) = 5/12.
2.1.2. (i) 1/9; (ii) 1/3.
2.1.3. (i) 3/190; (ii) 4/190.
2.1.4. P(A) = 0.14, P(B) = 0.315, P(C) = 0.095.
2.2.7. P(Aj | A) = j(5 j)/20, j = 1, . . . , 5.
2.2.8. (i) 2/5; (ii) 5/7.
2.2.9. (i) 15/26; (ii) 13/24.
548 Answers to Selected Exercises
2.2.17. $\frac{1}{3}\sum_{j=1}^{6}\left[\frac{m_j n_j}{(m_j + n_j)(m_j + n_j - 1)}\right]$.
2.4.1. 720.
2.4.2. 2n.
2.4.3. 900.
2.4.4. (i) 107; (ii) 104.
2.4.6. 1/360.
2.4.7. (i) 1/(24!); (ii) 1/(13!) (9!).
2.4.8. n(n 1).
2.4.12. 29/56.
2.4.13. (2n)!.
2.4.14. $(1/2)^{2n} \sum_{j=n+1}^{2n} \binom{2n}{j}$.
n n
2.4.15. j p j (1 p)n j .
j =0
2.4.21. $\sum_{j=5}^{10} \binom{10}{j} \left(\frac{1}{5}\right)^j \left(\frac{4}{5}\right)^{10-j}$.
2.4.29. With regard to order: P(A₁) = 0.125; P(A₂) = 0.25; P(A₃) ≈ 0.19663;
P(A₄) = 0.015625; P(A₅) ≈ 0.09375.
Without regard to order: P(A₁) = 7/53 ≈ 0.13207; P(A₂) = 0.50;
P(A₃) = 392/2067 ≈ 0.18965; P(A₅) = 507/5724 ≈ 0.08857.
3
26 26 13 13 4
4 48
2.4.31. (i) 2 ; (ii) 4 ; (iii) ; (iv) 384;
3 2 1 2 j = 2 j 5 j
13
(v) 4 .
5
2.6.3. 244/495.
Chapter 3 3.2.1. (i) {0, 1, 2, 3, 4}; (ii) $P(X = x) = \binom{4}{x}\, 2^{-4}$, x = 0, 1, . . . , 4.
3.3.12. $(2/\pi)\tan^{-1}(c/\sigma)$.
Chapter 4 4.2.3. (i) c = any real > 0; (ii) fX(x) = 2(c − x)/c², x ∈ [0, c], fY(y) = 2y/c²,
y ∈ [0, c]; (iii) f(x|y) = 1/y, x ∈ [0, y], y ∈ (0, c], f(y|x) = 1/(c − x),
0 ≤ x ≤ y < c; (iv) (2c − 1)/c², by assuming that c > 1/2.
4.2.4. (i) 1 − e⁻ˣ, x > 0; (ii) 1 − e⁻ʸ, y > 0; (iii) 1/2; (iv) 1 − 4e⁻³.
Chapter 5 5.4.5. $P(X = \mu) = P\left(\bigcap_{n=1}^{\infty}\left\{|X - \mu| < \tfrac{1}{n}\right\}\right) = P\left(\lim_{n\to\infty}\left\{|X - \mu| < \tfrac{1}{n}\right\}\right) = \lim_{n\to\infty} P\left(|X - \mu| < \tfrac{1}{n}\right) = 1.$
5.5.1. 0, 2.5, 2.5, 2.25, 0.
Answers to Selected Exercises 551
Chapter 7 7.1.1. $F_{X_{(1)}}(x) = 1 - [1 - F(x)]^n$, $F_{X_{(n)}}(x) = [F(x)]^n$. Then for the continuous
case,
$f_{X_{(n)}}(x) = n\, f(x)\, [F(x)]^{n-1}$.
7.1.2. (i) fX (x1) = I(0,1 )(x1), fX (x2) = I(0,1)(x2); (ii) 1/18, /16, (1 log 2)/2.
1 2
7.1.8. (i) $\sum_{j=k}^{n} \binom{n}{j} p^j (1-p)^{n-j}$, p = P(X₁ ∈ B); (ii) p = 1/e;
(iii) $\sum_{j=5}^{10} \binom{10}{j} e^{-j} (1 - e^{-1})^{10-j} \approx 0.3057$ independently of λ.
Chapter 8 8.1.1. pₙ → 0 as n → ∞.
8.1.3. $E(\bar X_n - \mu)^2 = \sigma^2(\bar X_n) = \frac{\sigma^2}{n} \to 0$ as n → ∞.
8.2.2. $\varphi_{X_n}(t) = \left(p_n e^{it} + q_n\right)^n = \left[1 + \frac{\lambda}{n}\left(e^{it} - 1\right)\right]^n \underset{n\to\infty}{\longrightarrow} e^{\lambda(e^{it}-1)} = \varphi_X(t)$,
where X ~ P(λ).
8.3.2. P(180 ≤ X ≤ 200) ≈ 0.88.
8.3.3. P(150 ≤ X ≤ 200) ≈ 0.96155.
8.3.5. P(65 ≤ X ≤ 90) ≈ 0.87686.
8.3.7. 0.999928.
8.3.8. 4,146.
8.3.11. c = 0.329.
8.3.15. n = 123.
8.3.16. 26.
8.4.3. $E(\bar X_n - \bar\mu_n)^2 = \frac{1}{n^2}\, E\left[\sum_{j=1}^{n}(X_j - \mu_j)\right]^2 = \frac{1}{n^2}\sum_{j=1}^{n} E(X_j - \mu_j)^2 = \frac{1}{n^2}\sum_{j=1}^{n}\sigma_j^2 \le \frac{nM}{n^2} = \frac{M}{n} \underset{n\to\infty}{\longrightarrow} 0.$
8.4.6. $\sigma^2\!\left(\frac{X_j}{j}\right) = \frac{1}{j^2}\,\sigma^2(X_j) = \frac{1}{j^2}\,(j^2\sigma^2) = \sigma^2$, and then Exercise 8.4.3 applies.
8.4.7. $\sigma^2(X_j) = \sqrt{j}$, so that $E(\bar X_n - \bar\mu_n)^2 = \frac{1}{n^2}\sum_{j=1}^{n}\sigma_j^2 = \frac{1}{n^2}\sum_{j=1}^{n}\sqrt{j} \le \frac{n\sqrt{n}}{n^2} = \frac{1}{\sqrt{n}} \underset{n\to\infty}{\longrightarrow} 0.$
Chapter 9 9.1.2. (ii) P(Y = c1) = 0.9596, P(Y = c2) = 0.0393, P(Y = c3) = 0.0011;
(iii) 0.9596c1 + 0.0393c2 + 0.0011c3.
9.1.3. $N\!\left(\frac{9}{5}\mu + 32,\ \frac{81}{25}\sigma^2\right)$.
(ii) The transformation $y = \dfrac{1}{1 + \frac{r_1}{r_2}\, x}$ gives
$$x = \frac{r_2}{r_1} \cdot \frac{1-y}{y}, \quad 0 < y < 1, \quad \text{and} \quad \frac{dx}{dy} = -\frac{r_2}{r_1} \cdot \frac{1}{y^2}.$$
Then
$$f_Y(y) = f_X\!\left(\frac{r_2}{r_1}\cdot\frac{1-y}{y}\right)\frac{r_2}{r_1}\cdot\frac{1}{y^2} = \frac{\Gamma\!\left(\frac{r_1+r_2}{2}\right)}{\Gamma\!\left(\frac{r_1}{2}\right)\Gamma\!\left(\frac{r_2}{2}\right)}\; y^{\frac{r_2}{2}-1}\,(1-y)^{\frac{r_1}{2}-1}$$
after cancellations. This last expression is the p.d.f. of the Beta distri-
bution with parameters r₂/2 and r₁/2.
(iii) By (ii) and for r₁ = r₂ (= r), 1/(1 + X) is B(r/2, r/2), which is symmetric
about 1/2.
Hence $P(X \le 1) = P\left(\frac{1}{1+X} \ge \frac{1}{2}\right) = \frac{1}{2}$; (iv) Set Y = r₁X. Then
$$f_Y(y) = \frac{\Gamma\!\left(\frac{r_1+r_2}{2}\right)}{\Gamma\!\left(\frac{r_1}{2}\right)\Gamma\!\left(\frac{r_2}{2}\right)}\; r_2^{-r_1/2}\; y^{\frac{r_1}{2}-1}\left(1 + \frac{y}{r_2}\right)^{-\frac{r_1+r_2}{2}} \underset{r_2\to\infty}{\longrightarrow} \frac{y^{(r_1/2)-1}\, e^{-y/2}}{\Gamma\!\left(\frac{r_1}{2}\right)\, 2^{r_1/2}},$$
since $\left(1 + \frac{y}{r_2}\right)^{-\frac{r_1+r_2}{2}} \to e^{-y/2}$ and, by Stirling's formula, $\Gamma\!\left[\tfrac{1}{2}(r_1+r_2)\right] \Big/ \left[\Gamma\!\left(\tfrac{r_2}{2}\right)\left(\tfrac{r_2}{2}\right)^{r_1/2}\right] \to 1$; that is, the limit is the p.d.f. of $\chi^2_{r_1}$.
9.2.13. $f_r(t) = \frac{\Gamma\!\left(\frac{r+1}{2}\right)}{\Gamma\!\left(\frac{r}{2}\right)\sqrt{\pi r}}\left(1 + \frac{t^2}{r}\right)^{-\frac{r+1}{2}}$.
As $r \to \infty$, Stirling's formula gives $\Gamma\!\left(\frac{r+1}{2}\right) \Big/ \left[\Gamma\!\left(\frac{r}{2}\right)\sqrt{r/2}\right] \to 1$, while
$\left(1 + \frac{t^2}{r}\right)^{-\frac{r+1}{2}} \to e^{-t^2/2}$, so that
$$f_r(t) \underset{r\to\infty}{\longrightarrow} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2},$$
the p.d.f. of N(0, 1).
2
9.3.5. (i) $\begin{pmatrix} U + V\\ U - V \end{pmatrix} \sim N\!\left(\begin{pmatrix} 0\\ 0 \end{pmatrix}, \begin{pmatrix} 2(1+\rho) & 0\\ 0 & 2(1-\rho) \end{pmatrix}\right)$;
(ii) $\begin{pmatrix} X + Y\\ X - Y \end{pmatrix} \sim N\!\left(\begin{pmatrix} \mu_1 + \mu_2\\ \mu_1 - \mu_2 \end{pmatrix}, \begin{pmatrix} 2\sigma^2(1+\rho) & 0\\ 0 & 2\sigma^2(1-\rho) \end{pmatrix}\right)$.
(ii) $EY_1 = \alpha + \frac{\beta-\alpha}{n+1}$, $\sigma^2(Y_1) = \sigma^2(Y_n) = \frac{n(\beta-\alpha)^2}{(n+1)^2(n+2)}$, $EY_n = \alpha + \frac{n(\beta-\alpha)}{n+1}$;
(iii) $EY_1 = \frac{\theta}{n+1}$, $\sigma^2(Y_1) = \sigma^2(Y_n) = \frac{n\theta^2}{(n+1)^2(n+2)}$, $EY_n = \frac{n\theta}{n+1}$.
10.1.9. For the converse, $e^{-n\lambda t} = P(Y_1 > t) = P(X_j > t,\ j = 1, \dots, n) = [P(X_1 > t)]^n$,
so that $P(X_1 > t) = e^{-\lambda t}$. Thus the common distribution of the X's is the Negative
Exponential distribution with parameter λ.
10.1.14. With $k = \frac{n+1}{2}$: $\frac{(2k-1)!}{[(k-1)!]^2} \cdot \frac{1}{(\beta-\alpha)^{2k-1}}\, (y-\alpha)^{k-1}(\beta-y)^{k-1}$, $y \in (\alpha, \beta)$,
and $\frac{(2k-1)!}{[(k-1)!]^2}\, \lambda\, (1 - e^{-\lambda y})^{k-1}(e^{-\lambda y})^{k}$, y > 0.
10.1.15. For n = 2k − 1, $f_{SM}(y) = \frac{(2k-1)!}{[(k-1)!]^2}\, [F(y)]^{k-1}[1 - F(y)]^{k-1} f(y)$, $y \in \mathbb{R}$. But
$f(\mu - y) = f(\mu + y)$ and $F(\mu + y) = 1 - F(\mu - y)$. Hence the result.
10.1.17. $f_{SM}(y) = 3!\,\lambda\, e^{-2\lambda(y-\theta)}\left[1 - e^{-\lambda(y-\theta)}\right]$, y > θ.
Chapter 11 11.1.10. (i) $\sum_{j=1}^{n} X_j$; (ii) (X₁, . . . , Xₙ); (iii) $\sum_{j=1}^{n} X_j$; (iv) $\sum_{j=1}^{n} X_j$.
11.2.2. Take g(x) = x. Then $E_\theta\, g(X) = \int_{-\theta}^{\theta} \frac{1}{2\theta}\, x\, dx = 0$ for every $\theta \in \Omega = (0, \infty)$.
Chapter 12 12.2.2. $\frac{n+1}{2n}\, X_{(n)}$, $\frac{n+2}{12n}\, X_{(n)}^2$.
12.2.3. $\frac{X_{(1)} + X_{(n)}}{2}$, $\frac{n+1}{n-1}\left[X_{(n)} - X_{(1)}\right]$.
12.3.2. $c_n = \sqrt{2}\; \Gamma\!\left(\frac{n+1}{2}\right) \Big/ \Gamma\!\left(\frac{n}{2}\right)$.
12.3.5. It is $\bar X_n$ if the p.d.f. is in the form $f(x; \theta) = \frac{1}{\theta}\, e^{-x/\theta}$, x > 0, and it is
$1/\bar X_n$ if the p.d.f. is in the form $f(x; \theta) = \theta e^{-\theta x}$, x > 0.
12.3.6. (X + r)/r, 2[(X + r)/r] = (1 )/r 2.
n 1 n 2 n
2 X / ( X j X ) .
2
12.3.8.
2 2 j =1
n n
1
12.3.9. ( X j X )(Yj Y ) /(n 1), XY n(n 1) ( X j X )(Yj Y ),
j =1 j =1
n 1 n
n 3 n
2 ( X j X )(Yj Y ) /( n 1) (X j X ) .
2
2 j =1 2 j =1
12.5.6. −log(X/n).
12.5.7. X(1).
X(1), X
n
X j n.
r
12.5.8.
j =1
12.5.9. ).
exp(x/X
12.9.1. X̄ − (a + b)/2, σ²[X̄ − (a + b)/2] = (b − a)²/12n.
12.9.6. X̄ − S and S, where S² = (1/n) Σⱼ₌₁ⁿ (Xⱼ − X̄)².
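The estimators in 12.9.6 are the moment estimators for a Negative Exponential distribution shifted by a location parameter ν (mean ν + θ, variance θ²). A simulation sketch (an editorial check, with the parameter values assumed) shows their consistency:

```python
import math
import random

def moment_estimates(nu=1.5, theta=2.0, n=200000, seed=3):
    """Sample X = nu + Exp(mean theta); return the moment estimates (x_bar - s, s)."""
    rng = random.Random(seed)
    xs = [nu + rng.expovariate(1.0 / theta) for _ in range(n)]
    x_bar = sum(xs) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)   # S as in 12.9.6
    return x_bar - s, s

nu_hat, theta_hat = moment_estimates()
print(nu_hat, theta_hat)
```

With a large sample the two estimates land close to ν and θ respectively.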
12.11.2. √n(Uₙ − Vₙ) = [(α + β) Σⱼ₌₁ⁿ Xⱼ − nα] / [√n(α + β + n)], and
E[√n(Uₙ − Vₙ)] = [(α + β)nθ − nα]/[√n(α + β + n)] →_{n→∞} 0,
σ²[√n(Uₙ − Vₙ)] = nθ(1 − θ)[(α + β)/√n(α + β + n)]² →_{n→∞} 0.
Chapter 13 13.3.12. (i) Reject H if Σⱼ₌₁ⁿ xⱼ < C, where C is determined by P_{θ₀}(Σⱼ₌₁ⁿ Xⱼ < C) = α; (ii) n = 23.
13.4.1. H is rejected.
13.5.1. H : θ ≤ 0.04, A : θ > 0.04. H is accepted.
13.5.4. Assume normality and independence. H is accepted.
13.5.5. H : μ = 2.5, A : μ ≠ 2.5. H is accepted.
13.7.2. H is rejected in both cases.
13.8.3. Cut-off point = 2.82, H is accepted.
13.8.4. H (hypothesizing the validity of the model) is accepted.
13.8.7. H (the vaccine is not effective) is rejected.
Chapter 14 14.3.1. E₀(N) = 77.3545, E₁(N) = 97.20, n (fixed sample size) = 869.90 ≈ 870.
14.3.2. E₀(N) = 2.32, E₁(N) = 4.863, n (fixed sample size) = 32.18 ≈ 33.
Chapter 15 15.2.4. (i) f_R(r) = n(n − 1)r^{n−2}(θ − r)/θⁿ, r ∈ (0, θ); (iii) The expected length of the shortest confidence interval in Example 4 is nθ(α^{−1/n} − 1)/(n + 1). The expected length of the confidence interval in (ii) is (n − 1)θ(1 − c)/c(n + 1), and the required inequality may be seen to be true.
15.4.1.
ii(i) [X̄ₙ − z_{α/2} σ/√n, X̄ₙ + z_{α/2} σ/√n];
i(ii) [X̄₁₀₀ − 0.196, X̄₁₀₀ + 0.196];
(iii) n = 1537.
15.4.2.
i(i) [X̄₁₀₀ − 0.0196S₁₀₀, X̄₁₀₀ + 0.0196S₁₀₀], S²₁₀₀ = Σⱼ₌₁¹⁰⁰ (Xⱼ − X̄₁₀₀)²/100;
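The numbers in 15.4.1 can be reproduced directly; this small sketch (an editorial check) assumes σ = 1, 1 − α = 0.95 (so z_{α/2} = 1.96), and a target interval length of 0.1 for part (iii):

```python
import math

def z_interval(x_bar, sigma, n, z=1.96):
    """Two-sided normal-theory confidence interval for the mean."""
    half = z * sigma / math.sqrt(n)
    return x_bar - half, x_bar + half

def sample_size_for_length(length, sigma, z=1.96):
    """Smallest n with interval length 2*z*sigma/sqrt(n) <= the given length."""
    return math.ceil((2 * z * sigma / length) ** 2)

lo, hi = z_interval(0.0, 1.0, 100)
print(hi)                                  # half-width 0.196, as in 15.4.1(ii)
print(sample_size_for_length(0.1, 1.0))    # 1537, as in 15.4.1(iii)
```

The sample-size bound follows from 2(1.96)σ/√n ≤ 0.1, i.e., n ≥ 1536.64.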
15.4.5. P(Yᵢ ≤ x_p ≤ Yⱼ) = Σₖ₌ᵢ^{j−1} C(10, k) pᵏ(1 − p)^{10−k} = 1 − α. Let p = 0.25 and (i, j) = (2, 9), (3, 4), (4, 7). Then 1 − α = 0.756, 0.2503, 0.2206, respectively. For p = 0.50 and (i, j) as above, 1 − α = 0.9786, 0.1172, 0.6562, respectively.
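The coverage probabilities in 15.4.5 are finite binomial sums and can be recomputed exactly (an editorial check):

```python
from math import comb

def coverage(i, j, p, n=10):
    """P(Y_i <= x_p <= Y_j) = sum_{k=i}^{j-1} C(n,k) p^k (1-p)^(n-k)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(i, j))

for pair in [(2, 9), (3, 4), (4, 7)]:
    print(pair, round(coverage(*pair, 0.25), 4), round(coverage(*pair, 0.50), 4))
```

The computed values agree with the listed answers to about three decimal places.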
15.4.7. [xp/2, xp/2], [0.8302, 2.0698].
Chapter 16 16.3.5.
ii(i) β̂ = (0.280, 0.572, 0.268)′, σ̂² = 7.9536;
i(ii) σ² ×
4.6  3.30  0.50
3.3  2.67  0.43
0.5  0.43  0.07
16.4.2.
i(i) Reject H if |β̂ − β₀| √(Σⱼ₌₁ⁿ (xⱼ − x̄)²) / √(nσ̂²/(n − 2)) > t_{n−2;α/2},
where β̂ = Σⱼ₌₁ⁿ (xⱼ − x̄)Yⱼ / Σⱼ₌₁ⁿ (xⱼ − x̄)², σ̂² = Σⱼ₌₁ⁿ [Yⱼ − α̂ − β̂(xⱼ − x̄)]²/n, α̂ = Ȳ;
(ii) [β̂ − t_{n−2;α/2} √(nσ̂²/[(n − 2) Σⱼ₌₁ⁿ (xⱼ − x̄)²]), β̂ + t_{n−2;α/2} √(nσ̂²/[(n − 2) Σⱼ₌₁ⁿ (xⱼ − x̄)²])].
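The slope test of 16.4.2 can be sketched in a few lines; this is an editorial illustration assuming the model Yⱼ = α + β(xⱼ − x̄) + eⱼ with σ² estimated by the residual sum of squares over n (the data below are made up):

```python
import math

def slope_t_statistic(xs, ys, beta0=0.0):
    """Least-squares slope and the t-statistic for H: beta = beta0,
    referred to the t distribution with n - 2 degrees of freedom."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta_hat = sum((x - x_bar) * y for x, y in zip(xs, ys)) / sxx
    alpha_hat = y_bar
    sigma2_hat = sum(
        (y - alpha_hat - beta_hat * (x - x_bar)) ** 2 for x, y in zip(xs, ys)
    ) / n
    t = (beta_hat - beta0) * math.sqrt(sxx) / math.sqrt(n * sigma2_hat / (n - 2))
    return beta_hat, t

# A line y = 1 + 2x plus a tiny symmetric wiggle.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.1, 8.9]
beta_hat, t = slope_t_statistic(xs, ys, beta0=0.0)
print(beta_hat, t)
```

For these data β̂ = 1.96 and the t-statistic is far beyond any usual t₂ cut-off, so H : β = 0 would be rejected.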
16.5.1. [nσ̂²/b, nσ̂²/a], a, b : P(a ≤ t ≤ b) = 1 − α;
(iv) Reject H if |β̂ − β₀| √(Σⱼ₌₁ⁿ xⱼ²) / √(nσ̂²/(n − 2)) > t_{n−2;α/2},
[nσ̂²/(b Σⱼ₌₁ⁿ xⱼ²), nσ̂²/(a Σⱼ₌₁ⁿ xⱼ²)], a, b as in (iii).
16.5.5. Reject H₁ if
|α̂₁ − α̂₂| / √(σ̂² [Σᵢ₌₁ᵐ xᵢ²/(m Σᵢ₌₁ᵐ (xᵢ − x̄)²) + Σⱼ₌₁ⁿ x′ⱼ²/(n Σⱼ₌₁ⁿ (x′ⱼ − x̄′)²)]) > t_{m+n−4;α/2},
and reject H₂ if
|β̂₁ − β̂₂| / √(σ̂² [1/Σᵢ₌₁ᵐ (xᵢ − x̄)² + 1/Σⱼ₌₁ⁿ (x′ⱼ − x̄′)²]) > t_{m+n−4;α/2}.
Chapter 17 17.1.1. SSH = 0.9609 (d.f. = 2), MSH = 0.48045, SSe = 8.9044 (d.f. = 6),
MSe = 1.48407, SST = 9.8653 (d.f. = 8).
17.2.2. SSA = 34.6652 (d.f. = 2), MSA = 17.3326, SSB = 12.2484 (d.f. = 3),
MSB = 4.0828, SSe = 12.0016 (d.f. = 6), MSe = 2.0003, SST = 58.9152 (d.f. = 11).
17.4.1. Yᵢⱼ ~ N(μᵢ, σ²), i = 1, . . . , I; j = 1, . . . , J, independent, implies
Ȳᵢ. − μᵢ ~ N(0, σ²/J), i = 1, . . . , I, independent. Since
(1/I) Σᵢ₌₁ᴵ (Ȳᵢ. − μᵢ) = Ȳ.. − μ̄., we have that Ȳ.. − μ̄. ~ N(0, σ²/IJ).
Chapter 18 18.1.3. Let X⁽¹⁾ = (X_{i₁}, . . . , X_{iₘ})′, X⁽²⁾ = (X_{j₁}, . . . , X_{jₙ})′, and partition μ and Σ as follows:
μ = (μ⁽¹⁾′, μ⁽²⁾′)′,  Σ = [Σ₁₁ Σ₁₂; Σ₂₁ Σ₂₂].
Then the conditional distribution of X⁽¹⁾, given X⁽²⁾ = x⁽²⁾, is the m-variate Normal with parameters
μ⁽¹⁾ + Σ₁₂ Σ₂₂⁻¹ (x⁽²⁾ − μ⁽²⁾)  and  Σ₁₁ − Σ₁₂ Σ₂₂⁻¹ Σ₂₁.
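In the bivariate special case, the conditioning formulas of 18.1.3 reduce to scalar arithmetic; this quick sketch (an editorial illustration, with the numbers assumed) recovers the familiar conditional variance σ₁²(1 − ρ²):

```python
def conditional_normal_params(mu1, mu2, s11, s12, s22, x2):
    """Conditional mean and variance of X1 given X2 = x2 for a bivariate
    normal: mu1 + s12/s22 * (x2 - mu2)  and  s11 - s12^2/s22."""
    mean = mu1 + s12 / s22 * (x2 - mu2)
    var = s11 - s12 * s12 / s22
    return mean, var

# Unit variances with correlation rho: conditional variance is 1 - rho^2.
rho = 0.8
mean, var = conditional_normal_params(0.0, 0.0, 1.0, rho, 1.0, 1.0)
print(mean, var)   # conditional mean 0.8, variance 1 - rho**2 = 0.36 (up to rounding)
```

The same two formulas, with matrices in place of scalars, give the general m-variate answer.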
18.3.7. In the inequality (Σⱼ₌₁ⁿ λⱼνⱼ)² ≤ (Σᵢ₌₁ⁿ λᵢ²)(Σⱼ₌₁ⁿ νⱼ²), λᵢ, νⱼ ∈ ℝ, i, j = 1, . . . , n, set λᵢ = Xᵢ − X̄, νⱼ = Yⱼ − Ȳ.
Chapter 19 19.2.5. Q = X′CX, where X = (X₁, X₂, X₃)′ and
C =
 5/6  −1/3  −1/6
−1/3   1/3  −1/3
−1/6  −1/3   5/6
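The off-diagonal minus signs in the matrix of 19.2.5 are easily lost in reproduction; the sign pattern used here is inferred, and idempotency (C² = C) pins it down, which can be checked exactly with rational arithmetic:

```python
from fractions import Fraction as F

# C from 19.2.5 with the inferred sign pattern; idempotency confirms it.
C = [
    [F(5, 6), F(-1, 3), F(-1, 6)],
    [F(-1, 3), F(1, 3), F(-1, 3)],
    [F(-1, 6), F(-1, 3), F(5, 6)],
]

C2 = [[sum(C[i][k] * C[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
print(C2 == C)   # True: C is symmetric idempotent of rank = trace(C) = 2
```

Since C is symmetric idempotent with trace 2, the standard quadratic-form theorem gives Q/σ² a chi-square distribution with 2 degrees of freedom when the Xᵢ are independent N(0, σ²).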
Chapter 20 20.4.1. R_X + R_Y = Σᵢ₌₁ᵐ R(Xᵢ) + Σⱼ₌₁ⁿ R(Yⱼ) = 1 + 2 + ··· + N = N(N + 1)/2.
20.4.3. Eu(Xᵢ − Yⱼ) = Eu²(Xᵢ − Yⱼ) = P(Xᵢ > Yⱼ) = 1/2, so that σ²[u(Xᵢ − Yⱼ)] = 1/2 − 1/4 = 1/4.
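The rank-sum identity of 20.4.1 is easy to see in action: ranking the combined sample just permutes 1, . . . , N, so the two rank sums always total N(N + 1)/2. A tiny sketch (sample values assumed):

```python
def ranks(combined):
    """Rank each value in the combined sample, 1 = smallest (no ties assumed)."""
    order = sorted(range(len(combined)), key=lambda i: combined[i])
    r = [0] * len(combined)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

xs = [2.1, 0.4, 3.3]          # m = 3
ys = [1.0, 2.8, 0.1, 5.0]     # n = 4
r = ranks(xs + ys)
N = len(xs) + len(ys)
rx, ry = sum(r[:len(xs)]), sum(r[len(xs):])
print(rx + ry, N * (N + 1) // 2)   # both equal 28
```

This identity is what makes it enough to tabulate only one of the two rank sums in the two-sample rank tests.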
INDEX
561
H
Hazard rate, 91
Hypergeometric distribution, 59, 542
approximation to, 81
expectation of, 119, 542
Multiple, 63
p.d.f. of, 59, 542
variance of, 119, 542
Hypothesis(-es)
alternative, 331
composite, 331, 353
linear, 416
null, 331
simple, 331
statistical, 331
testing a simple, 333, 337
testing a composite, 341
I
Independence, 27
complete (or mutual), 28
criteria of, 164
in Bivariate Normal distribution, 168
in the sense of probability, 28
of classes, 177–178
of events, 28
of random experiments, 32, 46
of r.v.'s, 164, 178
of sample mean and sample variance in the Normal distribution, 244, 284, 479, 481
of σ-fields, 46, 178
pairwise, 28
statistical, 28
stochastic, 28
Independent
Binomial r.v.'s, 173
Chi-square r.v.'s, 175
classes, 178
completely, 28
events, 28
in the sense of probability, 28
mutually, 28
Normal r.v.'s, 174–175
pairwise, 28
Poisson r.v.'s, 173
random experiments, 32, 46
r.v.'s, 164, 178
σ-fields, 46, 178
statistically, 28
stochastically, 28
Indicator, 84
function, 135
Indistinguishable balls, 38
Inequality(-ies)
Cauchy-Schwarz, 127
Cramér-Rao, 297–298
Markov, 126
moment, 125
probability, 125
Tchebichev's, 126
Inference(s), 263
Interaction(s), 452, 457
Intersection, 2
Invariance, 307, 493
Inverse image, 82
Inversion formula, 141, 151
J
Jacobian, 226, 237, 240
Joint, ch.f., 150
conditional distribution, 95, 467
conditional p.d.f., 95
d.f., 91, 94
moment, 107
m.g.f., 158
probability, 25
p.d.f., 95
p.d.f. of order statistics, 249
K
Kolmogorov
one sample test, 491
-Smirnov, two-sample test, 493
Kurtosis of a distribution, 121
of Double Exponential distribution, 121
of Uniform distribution, 121
L
Laplace transform, 153
Latent roots, 481, 504
Laws of Large Numbers (LLNs), 198
Strong (SLLNs), 198, 200
Weak (WLLNs), 198–200, 210
Least squares, 418
estimator, 418, 420
estimator in the case of full rank, 421
Lebesgue measure, 186, 328
Leptokurtic distribution, 121
Level, of factor, 447
of significance, 332
Likelihood, function, 307
ratio, 365
Likelihood ratio test, 365, 430, 432
applications of, 374
in Normal distribution(s), 367–372
interpretation of, 366
Likelihood statistic, 365
asymptotic distribution of, 366
Limit
inferior, 5
of a monotone sequence of sets, 5
superior, 5
theorems, basic, 180
theorems, further, 202