www.MathGeek.com
Probability:
Basic Ideas and Selected Topics
Eric B. Hall
Gary L. Wise
ALL RIGHTS RESERVED.
UNAUTHORIZED DUPLICATION
IS STRICTLY PROHIBITED.
Preface
In writing this book we were faced with a serious dilemma. To
how much of the vast subject of probability theory should an
undergraduate student be exposed? Although it is tempting to
remain at the level of coin flipping, card shuffling, and Riemann
integration, we feel that such an approach does a great disservice
to the students by reinforcing the many popular myths about
probability. In particular, probability theory is simply a branch
of measure theory, and no one should sugarcoat that fact.
Although some might suggest that this approach is over the head
of an average student, such has not been the case in our experience.
Indeed, most of the reluctance to cover probability at this
level seems to originate behind the desk rather than in front of
it. The importance of probability is increasing even faster than
the frontier of scientific knowledge, and hence the usefulness of
the standard non-measure-theoretic approach is being left far
behind. Students need to be able to reason clearly and think
critically rather than just learn how to parrot a few simplistic
results. Our goal with this book is to provide the serious student
of engineering with a rigorous yet understandable introduction
to basic probability that not only will serve his or her present
needs but will continue to serve as a useful reference into the
next century.
Contents
Preface 1
Acknowledgments 7
Introduction 9
Notation 11
1 Set Theory 13
1.1 Introduction 13
1.2 Unions and Intersections 14
1.3 Relations 19
1.4 Functions 19
1.5 σ-Algebras 24
1.6 Dynkin's π-λ Theorem 28
1.7 Topological Spaces 30
1.8 Caveats and Curiosities 31
2 Measure Theory 33
2.1 Definitions 33
2.2 Supremums and Infimums 35
2.3 Convergence of Sets: Lim Inf and Lim Sup 36
2.4 Measurable Functions 38
2.5 Real Borel Sets 39
2.6 Lebesgue Measure and Lebesgue Measurable Sets 43
2.7 Caveats and Curiosities 46
3 Integration 47
3.1 The Riemann Integral 47
3.2 The Riemann-Stieltjes Integral 49
3.3 The Lebesgue Integral 51
3.3.1 Simple Functions 51
3.3.2 Measurable Functions 52
3.3.3 Properties of the Lebesgue Integral 53
3.4 The Riemann Integral and the Lebesgue Integral 55
3.5 The Riemann-Stieltjes Integral and the Lebesgue Integral 56
3.6 Caveats and Curiosities 57
4 Functional Analysis 59
4.1 Vector Spaces 59
4.2 Normed Linear Spaces 60
4.3 Inner Product Spaces 62
4.4 The Radon-Nikodym Theorem 68
4.5 Caveats and Curiosities 68
5 Probability Theory 69
5.1 Introduction 69
5.2 Random Variables and Distributions 70
5.3 Independence 75
5.4 The Binomial Distribution 80
5.4.1 The Poisson Approximation to the Binomial Distribution 82
5.5 Multivariate Distributions 83
5.6 Caratheodory Extension Theorem 86
5.7 Expectation 94
5.8 Useful Inequalities 98
5.9 Transformations of Random Variables 102
5.10 Moment Generating and Characteristic Functions 105
5.11 The Gaussian Distribution 108
5.12 The Bivariate Gaussian Distribution 112
5.13 Multivariate Gaussian Distributions 113
5.14 Convergence of Random Variables 116
5.14.1 Pointwise Convergence 116
5.14.2 Almost Sure Convergence 116
5.14.3 Convergence in Probability 117
5.14.4 Convergence in Lp 118
5.14.5 Convergence in Distribution 119
5.15 The Central Limit Theorem 120
5.16 Laws of Large Numbers 122
5.17 Conditioning 123
5.18 Regression Functions 129
5.19 Statistical Hypothesis Testing 132
5.20 Caveats and Curiosities 134
6 Random Processes 135
6.1 Introduction 135
6.2 Gaussian Processes 138
6.3 Second Order Random Processes 139
6.4 The Karhunen-Loève Expansion 143
6.5 Markov Chains 145
6.6 Markov Processes 147
6.7 Martingales 149
6.8 Random Processes with Orthogonal Increments 151
6.9 Wide Sense Stationary Random Processes 154
6.10 Complex-Valued Random Processes 156
6.11 Linear Operations on WSS Random Processes 157
6.12 Nonlinear Transformations 158
6.13 Brownian Motion 164
6.14 Caveats and Curiosities 168
7 Problems 169
7.1 Set Theory 169
7.2 Measure Theory 171
7.3 Integration Theory 172
7.4 Functional Analysis 174
7.5 Distributions & Probabilities 174
7.6 Independence 176
7.7 Random Variables 177
7.8 Moments 179
7.9 Transformations of Random Variables 181
7.10 The Gaussian Distribution 182
7.11 Convergence 183
7.12 Conditioning 185
7.13 True/False Questions 187
8 Solutions 193
8.1 Solutions to Exercises 193
8.2 Solutions to Problems 202
8.3 Solutions to True/False Questions 238
Acknowledgments

The authors would like to thank David Drumm for many helpful
suggestions. In addition, they would like to thank Dr. Herb
Woodson, Dr. Stephen Szygenda, Dr. Tom Edgar, Dr. Edward
Powers, Dr. Francis Bostick, and Dr. James Cogdell. Also, they
would like to acknowledge the wonderful help that GLW received
in his recovery from a stroke, and in this regard, they mention
the supportive friendship of the preceding friends as well as that
of Dr. Michael Edmond and many dedicated therapists, including
Michelle Sanderson, Jerilyn Iliff, Janice Johnson, Audrey
Schooling, Liz Larue, and Mischa Smith. Finally, they are grateful
to Carey Taylor of the Texas Rehabilitation Commission for
his help in providing services for GLW's recovery.

This book was typeset using the LaTeX typesetting system
developed by Donald Knuth and Leslie Lamport.
Introduction
This book is designed to impart a working knowledge of probability
theory and random processes that will enable a student
to undertake serious studies in this area. No prior experience
with probability, statistics, or real analysis is required. All that
is needed is a familiarity with basic calculus and an ability to
follow mathematical reasoning.

Any course on probability theory must go down one of two roads.
On the first road the student flips coins, shuffles cards, looks
at pretty bell curves, and considers many simple consequences
of deep, dark theorems mentioned only in footnotes. Although
this road is popular with engineers (and some statisticians), it is
a dead-end road that produces students capable of dealing only
with a few overly restrictive special cases and incapable of thinking
for themselves. The second road treats probability theory as
a branch of an area of mathematics known as measure theory.
Although this approach requires a student to first learn some
very basic aspects of set theory and real analysis, the benefits of
taking this road are enormous. Students suddenly understand
the results that they are applying, formerly obtuse theorems become
transparently easy, and seemingly advanced engineering
tools such as the Kalman filter are seen as simple consequences
of much more general results. In this work we will take the latter
road without apology.
Notation
ℝ                  the set of all real numbers
ℤ                  the set of all integers
ℕ                  the set of all integers greater than zero
ℚ                  the set of all rational numbers
ℂ                  the set of all complex numbers
∅                  the empty set
i                  the imaginary unit
z*                 the complex conjugate of the complex number z
A ⊊ B              A ⊂ B and A ≠ B
P(S)               the set of all subsets of the set S
I_A                the indicator function of the set A
A^c                the complement of the set A
A \ B              the set of points in A that are not in B
A △ B              (A \ B) ∪ (B \ A)
−A                 {−x : x ∈ A} for A ⊂ ℝ
L_p(Ω, F, μ)       the set of all μ-equivalence classes
                   of functions f: (Ω, F) → (ℝ, B(ℝ))
                   such that ∫_Ω |f|^p dμ < ∞
B(T)               the Borel subsets of a Borel subset T of ℝ
M(A)               the Lebesgue measurable subsets of A ∈ B(ℝ)
m                  Lebesgue measure on M(ℝ)
λ                  Lebesgue measure on B(ℝ)
σ({A_i : i ∈ I})   the smallest σ-algebra including {A_i : i ∈ I}
σ({X_i : i ∈ I})   the smallest σ-algebra for which
                   X_i is measurable for each i ∈ I
f|_A               the function f restricted to A
f⁺                 max{f, 0} for a real-valued function f
f⁻                 −min{f, 0} for a real-valued function f
a.e. [μ]           almost everywhere with respect to the measure μ;
                   i.e., pointwise off a μ-null set
a.s.               almost surely
{a ∈ A : condition}   the set of points in A for which the
                   indicated condition is true
∀                  "for all" or "for each"
∃                  "there exists"
st                 "such that"
wp                 "with probability"
~                  "has the distribution"
□                  Quod Erat Demonstrandum
◊                  This symbol denotes an unusually difficult
                   section or problem. Proceed with caution.
1 Set Theory
1.1 Introduction
We will take a naive approach to set theory. That is, we will
assume that any describable collection of objects is a set.
Consider a set A. By writing x ∈ A we will mean that x is an
element of the set A. By writing x ∉ A we will mean that x is
not an element of the set A. Note that x ∈ A and x ∉ A cannot
both be true simultaneously. To see why our approach is naive,
let R denote the set of all sets A such that A ∉ A.¹ If R ∈ R
then by definition it follows that R ∉ R. Similarly, if R ∉ R
then by definition it follows that R ∈ R. Thus, although R is a
describable collection of objects, R is not a set!

This paradox was discovered by Bertrand Russell and had a
rather devastating effect on the work of a German logician
named Gottlob Frege who later wrote: "To a scientific author
hardly something worse can happen than the destruction of the
foundation of his edifice after the completion of his work. I
was placed in this position by a letter of Mr. Bertrand Russell
when the printing came to a close." To avoid such paradoxes,
set theory is based upon systems of axioms such as the Zermelo-
Fraenkel system. Mathematics is based upon such systems of
axioms and "mathematical truths" must be understood in that
light. One such axiom that we will use without hesitation is
the Axiom of Choice, which simply states that for any collection
{X_α : α ∈ A} of nonempty sets, there exists a function c
mapping A to ⋃_{α∈A} X_α such that c(α) ∈ X_α for each α ∈ A.
Although seemingly innocuous, there are many deep and dark
consequences of the Axiom of Choice.

The set with no elements is called the empty set and is denoted
by ∅. We say that a set B is a subset of a set A, and we write
B ⊂ A, if x ∈ A whenever x ∈ B. We sometimes denote this by

¹It is possible for a set to be an element of itself. For example, the set
of all sets that contain more than one element is itself a set that contains
more than one element, and hence is an element of itself.
saying that A is a superset of B, in which case we write A ⊃ B.
Note that any set is a subset and a superset of itself. Two sets A
and B are said to be equal if A ⊂ B and if B ⊂ A. In this case
we write A = B. If A and B are not equal we write A ≠ B. A
set A is said to be a proper subset of B if A ⊂ B and if A ≠ B.
(We sometimes denote this by writing A ⊊ B.)
Later generations will regard set theory as
a malady from which one has recovered.
Poincaré
Consider a nonempty set Ω and let x be an element from Ω.
The set {x} containing only the element x is called a singleton
set. In general, for elements x_i from Ω where i ranges over some
index set I, we will let {x_i : i ∈ I} denote the set containing only
the elements x_i for i ∈ I.
Exercise 1.1 Is there any difference between {∅} and ∅?
For any set A the power set of A is denoted by P(A) and is
defined to be the set of all subsets of A. That is, a set B is
an element of P(A) if and only if B ⊂ A. In set notation we
may write P(A) = {B : B ⊂ A}. Note that ∅ ∈ P(A) and that
A ∈ P(A) for any set A.
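Nothing in the text depends on computation, but for a finite set the power set is easy to enumerate. The Python sketch below (an illustration only; the function name is ours) builds P(A) and confirms that a set with n elements has 2^n subsets, and that ∅ and A itself always belong to P(A).

```python
from itertools import combinations

def power_set(s):
    """Return P(s) as a set of frozensets (frozensets, because
    ordinary sets are not hashable and so cannot be members of a set)."""
    elems = list(s)
    return {frozenset(c) for r in range(len(elems) + 1)
            for c in combinations(elems, r)}

P = power_set({1, 2, 3})
assert frozenset() in P and frozenset({1, 2, 3}) in P  # contains the empty set and A
assert len(P) == 2 ** 3                                # n elements give 2^n subsets
```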
Exercise 1.2 What is P(∅)?
1.2 Unions and Intersections
Let Ω and I be nonempty sets and consider a collection of subsets
of Ω denoted by {A_i : i ∈ I}. In this case the set I is called
an index set and is often taken to be a subset of the real line ℝ.
The intersection of the sets in {A_i : i ∈ I} is denoted by ⋂_{i∈I} A_i
and is defined to be the set of all points in Ω that are in A_i for
each i ∈ I. That is,

    ⋂_{i∈I} A_i = {x ∈ Ω : x ∈ A_i ∀ i ∈ I}.

(Note that this intersection equals Ω if I = ∅.) The union of
the sets in {A_i : i ∈ I} is denoted by ⋃_{i∈I} A_i and is defined to
be the set of all points in Ω that are in A_i for some i ∈ I. That
is,

    ⋃_{i∈I} A_i = {x ∈ Ω : ∃ i ∈ I st x ∈ A_i}.

(Note that this union equals ∅ if I = ∅.) In other words, for
two sets A and B, the set A ∩ B contains elements that are in
A and in B and the set A ∪ B contains elements that are in A
or in B.² If I = {1, ..., n} for some positive integer n then we
will often write

    ⋂_{i∈I} A_i   as   ⋂_{i=1}^{n} A_i

or as A_1 ∩ ... ∩ A_n, and similarly for unions. If I = ℕ, the set of
positive integers, then we will often write

    ⋂_{i∈I} A_i   as   ⋂_{i=1}^{∞} A_i

and similarly for unions.
Consider three sets A, B, and C. You should be able to prove
the following properties concerning unions and intersections:

1. A ∩ B = B ∩ A and A ∪ B = B ∪ A. That is, unions and
intersections are commutative.

2. A ∩ ∅ = ∅ and A ∪ ∅ = A.

²This "or" is not an exclusive or. That is, a point that is in both A and
B is also in A ∪ B.
3. A ∪ A = A ∩ A = A. That is, unions and intersections are
idempotent.

4. (A ∪ B) ∪ C = A ∪ (B ∪ C) and (A ∩ B) ∩ C = A ∩ (B ∩ C).
That is, unions and intersections are associative.

5. (A ∩ B) ⊂ A and A ⊂ (A ∪ B).

6. A ⊂ B if and only if A ∪ B = B.

7. If A ⊂ C and B ⊂ C then (A ∪ B) ⊂ C.

8. (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C) and (A ∩ B) ∪ C =
(A ∪ C) ∩ (B ∪ C). That is, unions and intersections are
distributive.
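For finite sets the listed identities can be spot-checked mechanically. The sketch below (a check on random examples, not a proof; all names are ours) draws random subsets of a small universe and tests associativity, distributivity, and property 6.

```python
import random

# Spot-check several of the identities on randomly drawn subsets
# of a small finite universe.
universe = set(range(10))
rng = random.Random(0)

def rand_subset():
    """A random subset of the universe: each point kept with probability 1/2."""
    return {x for x in universe if rng.random() < 0.5}

for _ in range(100):
    A, B, C = rand_subset(), rand_subset(), rand_subset()
    assert (A | B) | C == A | (B | C)           # property 4: associativity
    assert (A | B) & C == (A & C) | (B & C)     # property 8: distributivity
    assert (A & B) | C == (A | C) & (B | C)     # property 8: distributivity
    assert (A <= B) == (A | B == B)             # property 6
```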
Exercise 1.3 Prove that A ⊂ B if and only if A ∪ B = B.
The set difference of two sets A and B is denoted by A \ B and
is defined to be the set of points in A that are not in B. The set
A \ B is sometimes called the relative complement of B in A.
If the set A is clear from the context of our discussion we will
often write A \ B as B^c and refer to it simply as the complement
of B. That is, if Ω is some fixed nonempty underlying set then
B^c = {x ∈ Ω : x ∉ B}.

The symmetric difference of two sets A and B is denoted by
A △ B and is defined to be the set (A \ B) ∪ (B \ A). Two sets A
and B are said to be disjoint if A ∩ B = ∅. A collection of sets
is said to be disjoint if any two distinct sets from the collection
are disjoint.
In what follows, any set of the form A^c should be interpreted
to refer to the set Ω \ A for some fixed nonempty set Ω that
contains every point of interest. You should be able to prove
the following properties:

1. (A^c)^c = A.

2. A ∪ A^c = Ω.

3. A and A^c are disjoint.
It may seem to be a stark paradox that,
just when mathematics has been brought
close to the ultimate in abstractness, its
applications have begun to multiply and
proliferate in an extraordinary fashion.
... Far from being paradoxical, however,
this conjunction of two apparently opposite
trends in the development of mathematics
may rightly be viewed as the sign
of an essential truth about mathematics itself.
For it is only to the extent that mathematics
is freed from the bonds which have
attached it in the past to particular aspects
of reality that it can become the extremely
flexible and powerful instrument we need
to break paths into areas now beyond our
ken. Marshall Stone
The Cartesian product of two sets A and B is denoted by A × B
and is defined to be the set of all ordered pairs (a, b) for which
a ∈ A and b ∈ B. For example, ℝ × ℝ (often denoted by ℝ²) is
the plane. For n ∈ ℕ, the Cartesian product of n sets A_1, ...,
A_n is the set of all ordered n-tuples (a_1, ..., a_n) where a_i ∈ A_i
for each positive integer i ≤ n. This product is denoted by

    ∏_{i=1}^{n} A_i

or by A_1 × ... × A_n. Note that this product is empty if A_i is
empty for any i. For example, ℝ × ℝ × ℝ (denoted by ℝ³) is the
set of all ordered triples of three real numbers. Note that ℝ³,
ℝ × ℝ², and ℝ² × ℝ are three distinct sets.
For sets A and B, the set B^A is the set of all functions mapping
A into B. Let Λ and Ω be nonempty sets and, for each λ ∈ Λ,
let A_λ be a nonempty subset of Ω. The Cartesian product of
the A_λ's over the set Λ is a subset {w_λ ∈ Ω : λ ∈ Λ} of Ω^Λ such
that for all λ ∈ Λ, w_λ ∈ A_λ. We denote this product by

    ∏_{λ∈Λ} A_λ.

In the context of this product, the set A_λ is called the λth
factor. Also, if {h_λ ∈ Ω : λ ∈ Λ} is a point in the product
then h_λ is called the λth coordinate of the point. For λ ∈ Λ,
we will let π_λ: ∏_{α∈Λ} A_α → A_λ be the mapping that assigns a
point in ∏_{λ∈Λ} A_λ to its λth coordinate. The map π_λ is called
the canonical projection into the λth factor or the evaluation
at λ.
1.1 Theorem (DeMorgan's Law) Let Ω and I be nonempty sets
and assume that A_i ⊂ Ω for each i ∈ I. Then

    ⋂_{i∈I} A_i = (⋃_{i∈I} A_i^c)^c.

Proof. If I = ∅ then the result reduces to Ω = ∅^c and
follows by definition.

Let I be an arbitrary nonempty set, and for each i ∈ I, let A_i
be a subset of Ω. First assume that ⋂_{i∈I} A_i = ∅. Then for each
ω ∈ Ω, there exists some i ∈ I such that ω ∉ A_i and hence such
that ω ∈ ⋃_{i∈I} A_i^c. We have shown that Ω ⊂ ⋃_{i∈I} A_i^c. Clearly,
⋃_{i∈I} A_i^c ⊂ Ω. Thus, Ω = ⋃_{i∈I} A_i^c, and it follows that

    ⋂_{i∈I} A_i = ∅ = Ω^c = (⋃_{i∈I} A_i^c)^c.

Now, assume that ⋂_{i∈I} A_i ≠ ∅. Let ω be any point belonging to
⋂_{i∈I} A_i. Then ω ∈ A_i for each i ∈ I. In particular, ω ∉ ⋃_{i∈I} A_i^c,
and thus

    ⋂_{i∈I} A_i ⊂ (⋃_{i∈I} A_i^c)^c.

Conversely, it follows from this that assuming that ⋂_{i∈I} A_i ≠ ∅
implies that (⋃_{i∈I} A_i^c)^c ≠ ∅. Now let ω ∈ (⋃_{i∈I} A_i^c)^c. Then
ω ∉ A_i^c for any i ∈ I and thus ω ∈ A_i for all i ∈ I. Therefore,

    (⋃_{i∈I} A_i^c)^c ⊂ ⋂_{i∈I} A_i.
Hence, we have

    ⋂_{i∈I} A_i = (⋃_{i∈I} A_i^c)^c

for any set I and for any family {A_i : i ∈ I} of subsets of Ω. □

Note that the following corollaries are an immediate consequence
of the previous theorem.

1.1 Corollary A ∪ B = (A^c ∩ B^c)^c.

1.2 Corollary A ∩ B = (A^c ∪ B^c)^c.
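DeMorgan's Law also lends itself to a finite sanity check. The sketch below (an illustration only; the family of sets is arbitrary) verifies that the intersection of the A_i equals the complement of the union of their complements.

```python
# A finite sanity check of DeMorgan's Law on an arbitrary family:
# the A_i here are the multiples of 2, 3, 4, and 5 inside omega.
omega = set(range(20))
family = [{x for x in omega if x % d == 0} for d in (2, 3, 4, 5)]

lhs = set(omega)
for A in family:
    lhs &= A                       # intersection of the A_i

union_of_complements = set()
for A in family:
    union_of_complements |= (omega - A)
rhs = omega - union_of_complements  # complement of the union of complements

assert lhs == rhs
```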
1.3 Relations

Consider subsets A and B of a nonempty set Ω. A relation R
between A and B is a subset of A × B. If R is a relation between
A and B, then two points a ∈ A and b ∈ B are said to be R-related
if (a, b) ∈ R. We will call a relation R between A and
A a relation R on A. A relation R on A is said to be transitive³
if (a_1, a_2) ∈ R and (a_2, a_3) ∈ R imply that (a_1, a_3) ∈ R. A
relation R on A is symmetric if (a_1, a_2) ∈ R implies that (a_2,
a_1) ∈ R. A relation R on A is reflexive if (a, a) ∈ R for all
a ∈ A. A relation R on A is called an equivalence relation if it
is reflexive, symmetric, and transitive.
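Since a relation on a finite set A is just a set of ordered pairs, the three defining properties can be tested directly. The sketch below (illustrative, with names of our choosing) confirms that congruence modulo 3 is an equivalence relation on a finite set, while the strict order < is not.

```python
def is_equivalence(A, R):
    """Check reflexivity, symmetry, and transitivity of a relation R
    (a set of ordered pairs) on a finite set A."""
    reflexive = all((a, a) in R for a in A)
    symmetric = all((b, a) in R for (a, b) in R)
    transitive = all((a, c) in R
                     for (a, b) in R for (b2, c) in R if b == b2)
    return reflexive and symmetric and transitive

A = set(range(9))
congruence_mod_3 = {(a, b) for a in A for b in A if (a - b) % 3 == 0}
strict_less = {(a, b) for a in A for b in A if a < b}

assert is_equivalence(A, congruence_mod_3)
assert not is_equivalence(A, strict_less)   # not reflexive, not symmetric
```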
1.4 Functions

Let A and B be nonempty sets. A function f mapping A into
B (written as f: A → B) is a relation between A and B such
that:

³Even though a relation is a set, there is a difference between a transitive
set and a transitive relation. Here we are defining a transitive relation.
1. if a ∈ A then there exists b ∈ B such that (a, b) ∈ f, and,

2. if (a, b_1) ∈ f and (a, b_2) ∈ f then b_1 = b_2.

Thus, a function is defined everywhere on A and assigns precisely
one element of B to an element in A. If A and B are nonempty
sets and if f is a function mapping A into B, then we typically
use the notation f(a) = b to denote that (a, b) ∈ f. The set A
is called the domain of the function f. If S ⊂ A then we will
let f(S) denote the subset of B given by {b ∈ B : b = f(a) for
some a ∈ S}. The set f(S) is called the image of S under f.
The set f(A) is sometimes called the range of f. By convention,
for a ∈ A, f({a}) is usually taken to be the element f(a) ∈ B
rather than the subset {f(a)} of B. For any set A, the indicator
function of A is denoted by I_A(x) and equals 1 if x ∈ A and
equals zero otherwise.

Example 1.1 Let f: ℝ → [0, ∞) via f(x) = x². Then f((1,
2]) = (1, 4], f({2, 3}) = {4, 9}, and f({−3, 3}) = {9}. □
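For finite sets, the image f(S) of Example 1.1 can be computed directly. The fragment below (an aside, with a helper name of our choosing) mirrors the finite parts of that example; the interval images cannot be enumerated this way.

```python
def image(f, S):
    """The image f(S) = {f(a) : a in S} of a finite set S."""
    return {f(a) for a in S}

f = lambda x: x * x
assert image(f, {2, 3}) == {4, 9}
assert image(f, {-3, 3}) == {9}   # distinct points may share an image
```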
A function f: A → B is said to be injective or one-to-one if any
two distinct elements of A have distinct images in B; that is, if
a_1 ≠ a_2 then f(a_1) ≠ f(a_2). A function f: A → B is said to be
surjective or onto if f(A) = B; that is, given any b ∈ B there
exists an a ∈ A such that f(a) = b. A function f: A → B is
said to be bijective or to be a bijection between A and B if it is
both injective and surjective.
Let f: A → B and let M ⊂ B. The inverse image of M with
respect to f is denoted by f⁻¹(M) and is defined to be the set
{a ∈ A : f(a) ∈ M}. Note that f⁻¹ is a function mapping
P(B) into P(A). A function f: A → B is bijective if and only if
f⁻¹({x}) is a function mapping the set of all singleton subsets of
B into the set of all singleton subsets of A. In this case we write
f⁻¹({x}) as f⁻¹(x) and say that f is invertible with inverse f⁻¹.
Exercise 1.4 For each of the following functions answer the
following questions: Is the function onto? Is the function one-to-one?
If yes to both then what is the inverse of the function?
The inverse of a function f: A → B can be defined in two
different ways. First, we can consider f⁻¹ to be a function
that maps P(B) to P(A). This type of inverse always
exists for any function f. Second, we can consider f⁻¹
to be a function that maps B to A. This type of inverse
exists if and only if f is one-to-one and onto. The type
of inverse under consideration must be inferred from the
context.

1. f: ℝ → ℝ via f(x) = x².

2. f: ℝ → [0, ∞) via f(x) = x².

3. f: [0, ∞) → ℝ via f(x) = x².

4. f: [0, ∞) → [0, ∞) via f(x) = x².
Exercise 1.5 For a bijection f: A → B show that f(f⁻¹(b)) =
b and f⁻¹(f(a)) = a for each a in A and each b in B.
Exercise 1.6 Let S be any set with exactly two elements and
let R be any set with exactly three elements. Does there exist
a bijection from R into S? Why or why not? Does there exist a
bijection from S into R? Why or why not?
Exercise 1.7 If there exists a bijection of A into B then must
there also exist a bijection of B into A?
Two sets A and B are said to be equipotent if there exists a
bijection mapping A to B. A set S is said to be countable if it is
empty or if it is equipotent to a subset of the positive integers.
A set S is said to be finite if it is empty or if it is equipotent to a
set of the form {1, 2, ..., n} for some positive integer n. A set
S is said to be countably infinite if it is countable but not finite.
A set is said to be uncountable if it is not countable.
◊ Example 1.2 Let A = ℝ and let B denote the set of all
functions that map A into {0, 1}. We will show that A and B
are not equipotent. Assume, by way of contradiction, that A
and B are equipotent. There then exists a function g: A →
B such that g is onto and one-to-one. For a real number a,
denote g(a) by f_a(x); that is, g(a) is a function mapping ℝ to
{0, 1}. Let φ(x) = 1 − f_x(x) and note that φ ∈ B. Since g
is bijective, there exists a point α ∈ A such that φ(x) = f_α(x),
which implies that 1 − f_x(x) = f_α(x). If we let x = α then
it follows that 1 − f_α(α) = f_α(α), which in turn implies that
f_α(α) = 1/2. This, however, is not possible since f_α takes values
only in the set {0, 1}. This contradiction implies that A and B
are not equipotent. □
1.2 Theorem (Dedekind) A set is an infinite set if and only if it
is equipotent to a proper subset of itself.

1.1 Lemma If A and B are sets, if A is countable, and if f: A →
B, then f(A) is countable.

1.2 Lemma Let T be a set having at least two distinct elements and
let I be an infinite set. The set of all functions mapping I to T
is uncountable.

1.3 Theorem (Schroeder-Bernstein) Let A and B be sets. If
there exists a one-to-one mapping of A to B and a one-to-one
mapping of B to A then A and B are equipotent.

Proof. This result is proved on page 20 of Real and Abstract
Analysis by E. Hewitt and K. Stromberg (Springer-Verlag, New
York, 1965). □
1.3 Lemma (Cantor) For any set Ω, the sets Ω and P(Ω) are not
equipotent.

Proof. Assume that Ω and P(Ω) are equipotent. There then
exists a function f mapping Ω to P(Ω) that is onto and one-to-one.
Let U = {ω ∈ Ω : ω ∉ f(ω)}. Since U ∈ P(Ω) it follows
that U = f(x) for some point x in Ω. Is x ∈ U? If x ∈ U then
x ∉ f(x), which implies that x ∉ U. If x ∉ U then x ∈ f(x),
which implies that x ∈ U. Thus, no such function f exists and
the desired result follows. Note that this lemma implies that
"the set of all sets" is not a set! □
Exercise 1.8 Show that any subset of a countable set must
itself be countable.
Exercise 1.9 Show that a countably infinite union of countable
sets is countable. That is, show that ⋃_{i∈ℕ} A_i must be countable
if A_i is countable for each i ∈ ℕ.
1.4 Theorem The set ℚ of rational numbers is countable.

Proof. Note that

    ℚ = ⋃_{n∈ℤ} ⋃_{k∈ℕ} {n/k}

and hence ℚ is countable since it may be written as a countable
union of countable sets. □
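The double union in the proof suggests a concrete enumeration scheme. The sketch below (a finite window only; the function name is ours) lists the distinct rationals n/k for small n and k in a fixed order, mirroring how the countable union exhausts ℚ.

```python
from fractions import Fraction

def enumerate_rationals(limit):
    """List the distinct fractions n/k with |n| <= limit and
    1 <= k <= limit, in a fixed order: a finite window of the
    countable union in the proof."""
    seen, out = set(), []
    for bound in range(1, limit + 1):
        for n in range(-bound, bound + 1):
            for k in range(1, bound + 1):
                q = Fraction(n, k)
                if q not in seen:       # skip duplicates such as 1/2 = 2/4
                    seen.add(q)
                    out.append(q)
    return out

qs = enumerate_rationals(5)
assert Fraction(-3, 4) in qs
assert len(qs) == len(set(qs))   # each rational is listed exactly once
```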
1.5 Theorem The set of all real numbers is an uncountable set.

Proof. Assume that [0, 1) is countable. Hence there exists a
bijective function f: [0, 1) → ℕ. Using this function f, enumerate
the set [0, 1) as a sequence {a_1, a_2, ...}. Notice that each a_i
corresponds to a point in [0, 1) and hence may be expressed as
a decimal expansion, where we agree that any expansion ending
with a string of all 9's will instead be written in a form ending
with a string of 0's. Construct an element b of [0, 1) as follows:
Let b = 0.n_1 n_2 n_3 ... where, for each i ∈ ℕ, n_i is chosen to be
a single digit that is not equal to the ith digit in the decimal
expansion of a_i. Since b is an element in [0, 1) that is not equal
to a_i for any i, we conclude that [0, 1) (and hence ℝ) is uncountable.
□
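The diagonal construction in the proof can be imitated on any finite list of decimal expansions. The sketch below (an illustration, not part of the proof; names are ours) produces a digit string that differs from the ith entry in its ith digit.

```python
def diagonal(expansions):
    """Given a list of equal-length decimal-digit strings, build a
    string that differs from the i-th one in its i-th digit.
    Mapping digit d to (d + 1) % 9 always changes d and never
    produces a 9, avoiding expansions ending in repeating 9s."""
    return ''.join(str((int(s[i]) + 1) % 9) for i, s in enumerate(expansions))

listed = ['141592', '718281', '414213', '302585', '577215', '693147']
b = diagonal(listed)
for i, s in enumerate(listed):
    assert b[i] != s[i]   # b differs from each listed expansion
```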
1.5 σ-Algebras

Consider a nonempty set Ω and a subset 𝒜 of P(Ω). (That is,
an element of 𝒜 is a subset of Ω.) The set 𝒜 is said to be an
algebra (or a field) on Ω if the following three properties are
satisfied:

1. Ω ∈ 𝒜.

2. If A ∈ 𝒜 then A^c ∈ 𝒜.

3. If A ∈ 𝒜 and B ∈ 𝒜 then A ∪ B ∈ 𝒜.

That is, an algebra on Ω is a subset of P(Ω) that contains Ω,
that is closed under complementation, and that is closed under
finite unions.

The set 𝒜 is said to be a σ-algebra (or a σ-field) on the nonempty
set Ω if the following three properties are satisfied:

1. Ω ∈ 𝒜.

2. If A ∈ 𝒜 then A^c ∈ 𝒜.

3. If A_n ∈ 𝒜 for each n ∈ ℕ then ⋃_{n∈ℕ} A_n ∈ 𝒜.

That is, a σ-algebra on Ω is a subset of P(Ω) that contains Ω,
that is closed under complementation, and that is closed under
countable unions. Note that any σ-algebra is an algebra and that
any algebra contains the empty set. Note also that DeMorgan's
Law implies that an algebra is closed under finite intersections and
that a σ-algebra is closed under countable intersections. Finally,
note that any algebra containing only a finite number of elements
is also a σ-algebra.

Let's briefly review our notation: Let A be a subset of a
nonempty set Ω and assume that ω is a point in A and that A is
an element of an algebra 𝒜 on Ω. Then ω ∈ {ω} ⊂ A ⊂ Ω ∈ 𝒜,
A ∈ 𝒜 ⊂ P(Ω), and ω ∈ A. In the following exercises let Ω be
a nonempty set.
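On a finite underlying set, countable unions reduce to finite ones, so the three defining properties can be verified exhaustively. The sketch below (illustrative only, finite case; names are ours) checks one collection that is a σ-algebra on {1, 2, 3} and one that fails closure under complementation.

```python
def is_sigma_algebra(omega, collection):
    """Check the three defining properties on a finite universe,
    where pairwise unions suffice in place of countable unions."""
    fam = {frozenset(s) for s in collection}
    has_omega = frozenset(omega) in fam
    closed_complement = all(frozenset(omega - s) in fam for s in fam)
    closed_union = all(s | t in fam for s in fam for t in fam)
    return has_omega and closed_complement and closed_union

omega = {1, 2, 3}
good = [set(), {1}, {2, 3}, omega]
bad = [set(), {1}, omega]            # missing the complement {2, 3}

assert is_sigma_algebra(omega, good)
assert not is_sigma_algebra(omega, bad)
```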
Exercise 1.10 Is {∅, Ω} a σ-algebra on Ω?

Exercise 1.11 Is P(Ω) a σ-algebra on Ω?

Exercise 1.12 Let Ω = {1, 2, 3}. Find five different σ-algebras
on Ω.

Exercise 1.13 Show that an intersection of σ-algebras is
itself a σ-algebra. Does the same hold for a union of σ-algebras?

Exercise 1.14 Let Ω be the set of all real numbers and let 𝒜
be the collection of all subsets of Ω that are either finite or have
finite complements. (A set with a finite complement is said to
be cofinite.) Is 𝒜 an algebra on Ω? Is 𝒜 a σ-algebra on Ω?

Exercise 1.15 Let Ω be the set of all real numbers and let
𝒜 be the collection of all subsets of Ω that are either countable
or have countable complements. (A set with a countable complement
is said to be cocountable.) Is 𝒜 an algebra on Ω? Is 𝒜
a σ-algebra on Ω?
Consider a nonempty set Ω and a σ-algebra 𝒜 on Ω. The ordered
pair (Ω, 𝒜) is called a measurable space and sets in 𝒜 are called
measurable sets. Later we will refer to Ω as a sample space and
refer to measurable sets as events.

Consider a nonempty set Ω and let F be any subset of P(Ω). The
σ-algebra generated by F is denoted by σ(F) and is defined to
be the smallest σ-algebra on Ω that contains each element in F.
That is, if B is any σ-algebra on Ω that contains each element in
F then σ(F) ⊂ B. Note, also, that if F is already a σ-algebra,
then σ(F) = F.

Exercise 1.16 What is the difference (if any) between σ(∅)
and σ({∅})?
Exercise 1.17 What is σ({∅})?

Exercise 1.18 What is σ({A}) for a subset A of Ω?

Exercise 1.19 What is σ({A, B}) for subsets A and B of Ω?
(In general, σ({A, B}) will contain 16 elements.)

Exercise 1.20 Consider a σ-algebra F on a nonempty set Ω.
Does there exist a subset A of Ω such that A ⊂ F and A ∈ F?
We will next consider several properties of inverse functions. In
each of the following three lemmas we will let the context set
our notation.

Let A and B be nonempty sets and let f: A → B. Further, for
a nonempty set I, let B_i be a subset of B for each i ∈ I. Note
that:

    f⁻¹(⋃_{i∈I} B_i) = {a ∈ A : f(a) ∈ ⋃_{i∈I} B_i}   (by definition of the inverse)
                     = {a ∈ A : ∃ i ∈ I st f(a) ∈ B_i}   (by definition of the union)
                     = {a ∈ A : ∃ i ∈ I st a ∈ f⁻¹(B_i)}   (by definition of f⁻¹)
                     = {a ∈ A : a ∈ ⋃_{i∈I} f⁻¹(B_i)}   (by definition of the union)
                     = ⋃_{i∈I} f⁻¹(B_i).

Thus, we have the following result:

1.4 Lemma f⁻¹(⋃_{i∈I} B_i) = ⋃_{i∈I} f⁻¹(B_i).
Next, let M be a subset of B and notice that

    a ∈ (f⁻¹(M))^c ⟺ a ∉ f⁻¹(M)
                   ⟺ f(a) ∉ M
                   ⟺ f(a) ∈ M^c
                   ⟺ a ∈ f⁻¹(M^c).

Thus, we have the following result:

1.5 Lemma (f⁻¹(M))^c = f⁻¹(M^c).

Again, let A and B be nonempty sets and let f: A → B. Further,
for a nonempty set I, let B_i, for i ∈ I, be a subset of B.
Note that:

    ⋂_{i∈I} f⁻¹(B_i) = (⋃_{i∈I} (f⁻¹(B_i))^c)^c   ; via DeMorgan's Law
                     = (⋃_{i∈I} f⁻¹(B_i^c))^c   ; via Lemma 1.5
                     = (f⁻¹(⋃_{i∈I} B_i^c))^c   ; via Lemma 1.4
                     = f⁻¹((⋃_{i∈I} B_i^c)^c)   ; via Lemma 1.5
                     = f⁻¹(⋂_{i∈I} B_i)   ; via DeMorgan's Law.

Thus, we have the following result:

1.6 Lemma ⋂_{i∈I} f⁻¹(B_i) = f⁻¹(⋂_{i∈I} B_i).
If f: A → B and if F is a subset of P(B) then we will let f⁻¹(F)
denote the subset of P(A) consisting of every subset of A that is
an inverse image of some element in F. That is, S ∈ f⁻¹(F) if
and only if S = f⁻¹(T) for some T ∈ F. The following theorem
follows quickly from the three preceding results.

1.6 Theorem Let A and B be nonempty sets and let f: A → B. If
ℬ is a σ-algebra on B then f⁻¹(ℬ) is a σ-algebra on A.
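Theorem 1.6 can be observed concretely on finite sets: pulling a σ-algebra on B back through f yields a collection on A that satisfies the three defining properties. The sketch below (illustrative, finite case only; names are ours) performs exactly that check.

```python
def preimage(f, A, T):
    """The inverse image f^{-1}(T) = {a in A : f(a) in T}."""
    return frozenset(a for a in A if f(a) in T)

A = {1, 2, 3, 4}
B = {'odd', 'even'}
f = lambda x: 'odd' if x % 2 else 'even'
B_sigma = [set(), {'odd'}, {'even'}, B]    # a sigma-algebra on B

# Pull the sigma-algebra back through f and check the three properties
# directly (pairwise unions suffice on a finite set).
pulled = {preimage(f, A, T) for T in B_sigma}
assert frozenset(A) in pulled                               # contains A
assert all(frozenset(A - S) in pulled for S in pulled)      # complements
assert all(S | T in pulled for S in pulled for T in pulled) # unions
```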
Exercise 1.21 Let A and B be nonempty sets and let f:
A → B. For a subset S of A, let f(S) denote the subset of B
given by {f(s) : s ∈ S}. For a subset P of P(A), let f(P) denote
the subset of P(B) given by {f(S) : S ∈ P}. If 𝒜 is a σ-algebra
on A then must f(𝒜) be a σ-algebra on B?
1.7 Theorem Consider measurable spaces (Ω₁, F₁) and (Ω₂, F₂)
and let f be a function mapping Ω₁ to Ω₂. Let 𝒜 be a collection
of subsets of Ω₂ such that σ(𝒜) = F₂. If f⁻¹(𝒜) ⊂ F₁ then
f⁻¹(F₂) ⊂ F₁.

Proof. It follows from Lemma 1.4 and Lemma 1.5 that the
collection 𝒢 of all subsets A of Ω₂ such that f⁻¹(A) ∈ F₁ is a σ-algebra
on Ω₂. Note that Ω₂ ∈ 𝒢 since f⁻¹(Ω₂) = Ω₁, and that
∅ ∈ 𝒢 since f⁻¹(∅) = ∅. (Note that in the last equation the first
empty set is the empty subset of Ω₂ and the second empty set is
the empty subset of Ω₁.) Further, note that 𝒜 ⊂ 𝒢. This implies
that σ(𝒜) ⊂ 𝒢. Since σ(𝒜) = F₂ the desired result follows
immediately. □
◊ 1.6 Dynkin's π-λ Theorem

Consider a nonempty set Ω. A subset 𝒫 of P(Ω) is said to be a
π-system if it is closed under the formation of finite intersections;
that is, if A ∈ 𝒫 and B ∈ 𝒫 imply that A ∩ B ∈ 𝒫. A subset ℒ
of P(Ω) is said to be a λ-system if it satisfies the following three
properties:

1. Ω ∈ ℒ.

2. If A ∈ ℒ then A^c ∈ ℒ.

3. If A_n ∈ ℒ for each n ∈ ℕ and if A_i ∩ A_j = ∅ when i ≠ j
then ⋃_{n∈ℕ} A_n ∈ ℒ.
That is, a λ-system contains Ω, is closed under the formation of complements, and is closed under the formation of countable disjoint unions.

The following result is called Dynkin's π-λ Theorem and is often quite helpful in proving uniqueness.

1.8 Theorem (Dynkin) Consider a nonempty set Ω and subsets P and 𝓛 of ℙ(Ω). If P is a π-system and if 𝓛 is a λ-system then P ⊂ 𝓛 implies that σ(P) ⊂ 𝓛.
Proof. Let λ(P) denote the intersection of all λ-systems that include P as a subset. Each family of sets in this intersection contains Ω, is closed under proper differences, and is closed under strictly increasing limits of sets.⁴ Thus, the intersection itself contains Ω, is closed under proper differences, and is closed under strictly increasing limits of sets. Hence, λ(P) is a λ-system. Note that λ(P) ⊂ 𝓛. Thus if λ(P) is a π-system, then it will be a σ-algebra that is a superset of σ(P) and a subset of 𝓛, and hence the desired result will follow. Therefore, we will show that λ(P) is a π-system.

For each subset A of Ω, let P_A denote the family of all subsets B of Ω such that A ∩ B is an element of λ(P). Let A₁ ∈ λ(P). Notice that Ω ∈ P_{A₁} since A₁ ∩ Ω = A₁ ∈ λ(P). Now assume that C₁ and C₂ are elements of P_{A₁} such that C₁ ⊂ C₂. Then (A₁ ∩ C₁) ⊂ (A₁ ∩ C₂) and thus ((A₁ ∩ C₂) \ (A₁ ∩ C₁)) ∈ λ(P); also, (A₁ ∩ C₂) \ (A₁ ∩ C₁) = (A₁ ∩ C₂) ∩ (A₁ ∩ C₁)ᶜ = (A₁ ∩ C₂) ∩ (A₁ᶜ ∪ C₁ᶜ) = A₁ ∩ C₂ ∩ C₁ᶜ = A₁ ∩ (C₂ \ C₁). Thus P_{A₁} is closed under proper differences. Finally, assume that {Dₙ}_{n∈ℕ} is an increasing sequence of sets in P_{A₁}. Then the sequence {A₁ ∩ Dₙ}_{n∈ℕ} is either increasing or there exists some k ∈ ℕ such that n > k implies that A₁ ∩ Dₙ = A₁. In either case, lim(A₁ ∩ Dₙ) ∈ λ(P) since λ(P) is a λ-system containing the set A₁. Thus, P_{A₁} is a λ-system. Furthermore, notice that P ⊂ P_{A₁} since (A₁ ∩ B) ∈ P ⊂ λ(P) for all B ∈ P. Thus, if A₁ ∈ P, then P_{A₁} is a λ-system that includes P. Since λ(P) is the minimal λ-system that includes P, we see that for A₁ ∈ P, λ(P) ⊂ P_{A₁}.

⁴Limits of sets will be defined later. If this is your first trip through the book, then hold off on this proof until you have read the next chapter.
From this we see that if A₁ ∈ P and B ∈ λ(P) then (A₁ ∩ B) ∈ λ(P). This, in turn, implies that for B ∈ λ(P), P ⊂ P_B. Since, for B ∈ λ(P), P_B is a λ-system that includes P and since λ(P) is the minimal λ-system that includes P, we see that λ(P) ⊂ P_B. Now we observe that this means that for B ∈ λ(P) and for C ∈ λ(P) we have (B ∩ C) ∈ λ(P). Thus, λ(P) is a π-system. □
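For a finite Ω the objects in this proof can be computed exhaustively, which makes the theorem easy to check on examples. The following sketch is ours (all function names are invented): it builds the minimal λ-system and the generated σ-algebra by brute-force closure and confirms that, for a π-system of generators, the two coincide, which is the standard corollary of Dynkin's theorem.

```python
def is_pi_system(fam):
    # closed under pairwise (hence all finite) intersections
    return all(a & b in fam for a in fam for b in fam)

def lambda_closure(gen, omega):
    # smallest lambda-system containing gen: add omega, then close under
    # complements and (for finite omega) pairwise-disjoint unions
    omega = frozenset(omega)
    fam = set(gen) | {omega}
    while True:
        new = {omega - a for a in fam}
        new |= {a | b for a in fam for b in fam if not (a & b)}
        if new <= fam:
            return fam
        fam |= new

def sigma_closure(gen, omega):
    # smallest sigma-algebra containing gen (finite omega: close under
    # complements and arbitrary pairwise unions)
    omega = frozenset(omega)
    fam = set(gen) | {omega}
    while True:
        new = {omega - a for a in fam} | {a | b for a in fam for b in fam}
        if new <= fam:
            return fam
        fam |= new

omega = {1, 2, 3, 4}
P = {frozenset(s) for s in [omega, {1, 2}, {1, 3}, {1}]}
assert is_pi_system(P)
# Dynkin: for a pi-system of generators, the minimal lambda-system
# is already the generated sigma-algebra
assert lambda_closure(P, omega) == sigma_closure(P, omega)
```

Dropping the π-system hypothesis breaks this: a λ-system generated by an arbitrary family need not be closed under intersection.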
⋄ 1.7 Topological Spaces
Let Ω be a nonempty set. A topology 𝒰 for Ω is a subset of ℙ(Ω) that contains Ω, contains the empty set, is closed under finite intersections, and is closed under arbitrary unions. A topological space is an ordered pair (Ω, 𝒰) where Ω is a nonempty set and 𝒰 is a topology for Ω. The sets in 𝒰 are called the open sets with respect to the topology 𝒰 on Ω. The complement of an open set is called a closed set. Note that in any topological space (Ω, 𝒰) the sets Ω and ∅ are both open and closed. It follows from DeMorgan's Law that a finite union of closed sets is closed and an arbitrary intersection of closed sets is closed.
Example 1.3 Consider the set ℝᵏ for a positive integer k. For x ∈ ℝᵏ and r ∈ (0, ∞), let B(x, r) denote the open Euclidean ball in ℝᵏ centered at x with radius r; that is,

B(x, r) = {y ∈ ℝᵏ : Σ_{i=1}^k (πᵢ(x) − πᵢ(y))² < r²}.

For the usual topology on ℝᵏ, a subset U of ℝᵏ is open if and only if for any x ∈ U there exists a positive real number r such that B(x, r) ⊂ U. Unless noted otherwise, we will always assume that ℝᵏ is equipped with its usual topology. □
Let (Ω, 𝒰) be a topological space and let A ⊂ Ω. A point ω ∈ Ω is a limit point of A if A ∩ (U \ {ω}) is not empty for every open set U that contains ω. Note that a limit point of A need not be an element of A. The closure of A is the union of A with the set
of all limit points of A. A neighborhood of A is any subset of Ω that includes an open superset of A. An isolated point of A is a point ω ∈ A such that A ∩ N = {ω} for some neighborhood N of {ω}. A closed set that has no isolated points is said to be a perfect set. A subset of Ω is said to be a G_δ set if it is expressible as a countable intersection of open sets. A subset of Ω is said to be an F_σ set if it is expressible as a countable union of closed sets.
1.8 Caveats and Curiosities
It is important to keep in mind the crucial role that topology plays in dealing with questions regarding convergence. For example, the set {ℝ, ∅} is a topology on the real line that is called the trivial topology on ℝ. Under this topology, every sequence of real numbers converges to every real number!
I wanted certainty in the kind of way in which people want religious faith. I thought that certainty is more likely to be found in mathematics than elsewhere. But I discovered that many mathematical demonstrations, which my teachers expected me to accept, were full of fallacies, and that, if certainty were indeed discoverable in mathematics, it would be in a new field of mathematics, with more solid foundations than those that had hitherto been thought secure. But as the work proceeded, I was continually reminded of the fable about the elephant and the tortoise. Having constructed an elephant upon which the mathematical world could rest, I found the elephant tottering, and proceeded to construct a tortoise to keep the elephant from falling. But the tortoise was no more secure than the elephant, and after some twenty years of very arduous toil, I came to the conclusion that there was nothing more that I could do in the way of making mathematical knowledge indubitable.
Bertrand Russell (who, with Alfred North Whitehead, constructed a 362 page proof that 1+1=2.)
2 Measure Theory
2.1 Definitions
A measure μ on a measurable space (Ω, 𝓕) is a function on 𝓕 that satisfies the following three properties:

1. μ: 𝓕 → [0, ∞]
2. μ(∅) = 0
3. If Aₙ ∈ 𝓕 for all n ∈ ℕ and if Aₙ ∩ Aₘ = ∅ when m ≠ n then

μ(∪_{n∈ℕ} Aₙ) = Σ_{n∈ℕ} μ(Aₙ).

A function that satisfies property (3) is said to be countably additive. Thus, a measure is a countably additive, nonnegative, extended real-valued set function that maps the empty set to zero. If A is an element of 𝓕 then μ(A) is called the measure (or μ-measure) of A.
A measure μ on a measurable space (Ω, 𝓕) is said to be a finite measure if μ(Ω) < ∞. A measure μ on a measurable space (Ω, 𝓕) is said to be a σ-finite measure if Ω may be written as Ω = ∪_{n∈ℕ} Aₙ where Aₙ ∈ 𝓕 and μ(Aₙ) < ∞ for each n.

If μ is a measure on a measurable space (Ω, 𝓕) then the resulting ordered triplet (Ω, 𝓕, μ) is called a measure space. A probability measure P on a measurable space (Ω, 𝓕) is a measure on (Ω, 𝓕) such that P(Ω) = 1. The associated measure space (Ω, 𝓕, P) is then called a probability space and sets in 𝓕 are called events. If A is an event then P(A) is called the probability of A. Note that it does not make sense to discuss the probability of subsets of Ω that are not events.
Example 2.1 Let Ω be a nonempty set and let ω₀ be a point in Ω. Let μ: ℙ(Ω) → {0, 1} via μ(A) = 1 if ω₀ ∈ A and μ(A) = 0 if ω₀ ∉ A. Then μ is a measure on (Ω, ℙ(Ω)) and (Ω, ℙ(Ω), μ)
is a probability space. The particular measure in this example is known as Dirac measure at the point ω₀. □
Example 2.2 Let Ω be any nonempty set and define μ on ℙ(Ω) by letting μ(A) = ∞ if A is an infinite set and by letting μ(A) equal the number of points in A if A is a finite set. Then μ is a measure on (Ω, ℙ(Ω)). The particular measure in this example is known as counting measure. □
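The two examples above can be rendered directly in code. The sketch below is ours, restricted to finite sets so that everything is computable; it also checks additivity on a finite disjoint family.

```python
def dirac(omega0):
    # Dirac measure at omega0 (Example 2.1): mu(A) = 1 iff omega0 is in A
    return lambda A: 1 if omega0 in A else 0

def counting(A):
    # counting measure (Example 2.2), restricted here to finite sets
    return len(A)

mu = dirac("heads")
assert mu({"heads", "tails"}) == 1 and mu({"tails"}) == 0
assert mu(set()) == 0  # every measure sends the empty set to zero

# additivity, checked on a finite disjoint family
parts = [{1}, {2, 3}, {4}]
union = set().union(*parts)
assert counting(union) == sum(counting(p) for p in parts)
d2 = dirac(2)
assert d2(union) == sum(d2(p) for p in parts)
```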
2.1 Theorem (Monotonicity) Consider a measure space (Ω, 𝓕, μ) and let A and B be elements of 𝓕. If A ⊂ B then μ(A) ≤ μ(B).

Proof. Notice that B = A ∪ (B \ A) and also that A ∩ (B \ A) = ∅. Since μ is countably additive we see that μ(B) = μ(A) + μ(B \ A). Further, since μ is nonnegative, we see that μ(B \ A) ≥ 0. Thus, it follows that μ(A) ≤ μ(B). □
2.2 Theorem Consider a measure space (Ω, 𝓕, μ). The measure μ is countably subadditive. That is, given any (not necessarily disjoint) sequence {Aₙ}_{n∈ℕ} of sets in 𝓕 it follows that

μ(∪_{n∈ℕ} Aₙ) ≤ Σ_{n∈ℕ} μ(Aₙ).

Proof. Define a new sequence {A′ₙ}_{n∈ℕ} of measurable sets as follows: A′₁ = A₁ and

A′ₙ = Aₙ \ ∪_{k=1}^{n−1} Aₖ for n ∈ ℕ \ {1}.

Note that

∪_{n∈ℕ} Aₙ = ∪_{n∈ℕ} A′ₙ

and that the A′ₙ's are disjoint. (The collection {A′ₙ : n ∈ ℕ} is called a disjointification of the collection {Aₙ : n ∈ ℕ}.) Further, since A′ₙ ⊂ Aₙ for each n, Theorem 2.1 implies that
μ(A′ₙ) ≤ μ(Aₙ) for each n. This observation combined with the countable additivity of μ implies that

μ(∪_{n∈ℕ} Aₙ) = Σ_{n∈ℕ} μ(A′ₙ) ≤ Σ_{n∈ℕ} μ(Aₙ). □
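Disjointification is a small algorithm in its own right. The sketch below is ours (the name `disjointify` is invented); it computes {A′ₙ} for a finite list of sets and checks the two properties used in the proof.

```python
def disjointify(sets):
    # A'_n = A_n minus (A_1 ∪ ... ∪ A_{n-1}); same union, pairwise disjoint
    seen, out = set(), []
    for a in sets:
        out.append(set(a) - seen)
        seen |= set(a)
    return out

A = [{1, 2}, {2, 3}, {3, 4, 5}]
Ap = disjointify(A)
assert Ap == [{1, 2}, {3}, {4, 5}]
assert set().union(*Ap) == set().union(*A)       # same union
assert all(not (Ap[i] & Ap[j])                   # pairwise disjoint
           for i in range(len(Ap)) for j in range(i + 1, len(Ap)))
```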
Logic is the railway track along which the mind glides easily. It is the axioms that determine our destination by setting us on this track or the other, and it is in the matter of choice of axioms that applied mathematics differs most fundamentally from pure. Pure mathematics is controlled (or should we say "uncontrolled"?) by a principle of ideological isotropy: any line of thought is as good as another, provided that it is logically smooth. Applied mathematics on the other hand follows only those tracks which offer a view of natural scenery; if sometimes the track dives into a tunnel it is because there is prospect of scenery at the far end.
J. L. Synge
2.2 Supremums and Infimums
Let S be a subset of ℝ. An element x ∈ ℝ is said to be an upper bound of S if y ≤ x for all y ∈ S. An element x ∈ ℝ is said to be a lower bound of S if x ≤ y for all y ∈ S.
We say that a subset of ℝ is bounded above (below) if it has an upper (lower) bound. If a subset of ℝ has both an upper bound and a lower bound then we say that the set is bounded. We say
that a subset of ℝ is unbounded if it lacks either an upper or a lower bound.
Let S be a subset of ℝ. If S is bounded above then an upper bound of S is said to be a supremum (or least upper bound) of S if it is less than every other upper bound of S. We denote the supremum of S by sup S. If S is bounded below then a lower bound of S is said to be an infimum (or greatest lower bound) of S if it is greater than every other lower bound of S. We denote the infimum of S by inf S. If S is not bounded above then we will define sup S to be ∞ and if S is not bounded below then we will define inf S to be −∞. Thus, any subset of ℝ possesses an infimum and a supremum.
The supremum of a subset S of ℝ and the infimum of S need not belong to S. If sup S is an element of S then we sometimes refer to sup S as the maximum of S and denote it by max S. If inf S is an element of S then we sometimes refer to inf S as the minimum of S and denote it by min S. For example, if S = (a, b] then inf S = a, sup S = b, max S = b, and min S does not exist.
⋄ Exercise 2.1 Does there exist a subset A of ℝ such that sup A < inf A?
2.3 Convergence of Sets: Lim Inf and Lim Sup
Let {Aₙ}_{n∈ℕ} be a sequence of subsets of some nonempty set Ω. The set of all elements from Ω that belong to all but a finite number of the Aₙ's is called the inferior limit of the sequence {Aₙ}_{n∈ℕ} and is denoted by lim inf Aₙ or sometimes by [Aₙ a.a.] where a.a. is an abbreviation for "almost always." The set of all elements from Ω that belong to infinitely many Aₙ's is called the superior limit of the sequence {Aₙ}_{n∈ℕ} and is denoted by lim sup Aₙ or sometimes by [Aₙ i.o.] where i.o. is an abbreviation
for "infinitely often." That is,

lim inf Aₙ = ∪_{k=1}^∞ ∩_{m=k}^∞ Aₘ  and  lim sup Aₙ = ∩_{k=1}^∞ ∪_{m=k}^∞ Aₘ.

If lim inf Aₙ = lim sup Aₙ = A then we say that the sequence {Aₙ}_{n∈ℕ} converges to the set A. In such a case we denote A by lim_{n→∞} Aₙ.
Exercise 2.2 Show that lim sup Aₙ = (lim inf Aₙᶜ)ᶜ.

Exercise 2.3 Show that lim inf Aₙ ⊂ lim sup Aₙ.

Exercise 2.4 Define subsets Aₙ of ℝ via

Aₙ = (−1/n, 1] if n is odd and Aₙ = (−1, 1/n] if n is even

for positive integers n. Show that lim inf Aₙ = {0} and that lim sup Aₙ = (−1, 1].
2.3 Theorem (The First Borel-Cantelli Lemma) Let (Ω, 𝓕, μ) be a measure space. If {Aₙ}_{n∈ℕ} is a sequence of measurable sets and if

Σ_{n=1}^∞ μ(Aₙ) < ∞

then μ(lim sup Aₙ) = 0.

Proof. To begin, note that since

lim sup Aₙ = ∩_{m=1}^∞ ∪_{k=m}^∞ Aₖ

it follows that lim sup Aₙ ⊂ ∪_{k=m}^∞ Aₖ for each m ∈ ℕ. Thus, the monotonicity of μ implies that

μ(lim sup Aₙ) ≤ μ(∪_{k=m}^∞ Aₖ)
for each m ∈ ℕ. Hence, via the countable subadditivity of μ, it follows that

μ(lim sup Aₙ) ≤ Σ_{k=m}^∞ μ(Aₖ)

for each m ∈ ℕ. Note that since Σ_{n=1}^∞ μ(Aₙ) < ∞ it follows that Σ_{k=m}^∞ μ(Aₖ) → 0 as m → ∞; that is, the tail of the convergent series vanishes. Therefore, since μ(lim sup Aₙ) is nonnegative and must be smaller than any positive value we conclude that μ(lim sup Aₙ) = 0. □
The following continuity property of finite measures will be useful in proving some later results.

2.1 Lemma Consider a measure space (Ω, 𝓕, μ) such that μ(Ω) < ∞. If {Aₙ}_{n∈ℕ} is a sequence of measurable sets that converges to some (measurable) set A then the sequence μ(Aₙ) converges to μ(A) as n → ∞.
2.4 Measurable Functions
Let (Ω₁, 𝓕₁) and (Ω₂, 𝓕₂) be measurable spaces. If f: Ω₁ → Ω₂ is such that f⁻¹(𝓕₂) ⊂ 𝓕₁ then we say that f is a measurable function mapping (Ω₁, 𝓕₁) into (Ω₂, 𝓕₂), and we denote this property by writing f: (Ω₁, 𝓕₁) → (Ω₂, 𝓕₂).
Example 2.3 Let Ω₁ = {red, blue, green} and let Ω₂ = {0, 1}. Further, let 𝓕₁ = {∅, Ω₁, {red, blue}, {green}} and let 𝓕₂ = {∅, Ω₂, {0}, {1}}. Define f: Ω₁ → Ω₂ via f(red) = f(blue) = 0 and f(green) = 1. Define g: Ω₁ → Ω₂ via g(red) = 0 and g(green) = g(blue) = 1. Note that f⁻¹(∅) = ∅ and f⁻¹(Ω₂) = Ω₁. (Indeed, these relationships always hold.) In addition, note that f⁻¹({0}) = {red, blue} and that f⁻¹({1}) = {green}. Thus, since f⁻¹(𝓕₂) ⊂ 𝓕₁ (equal, in fact) we conclude that f is a measurable function mapping (Ω₁, 𝓕₁) into (Ω₂, 𝓕₂). Note, however, that since g⁻¹({0}) = {red} ∉ 𝓕₁ it follows that g is not a measurable function mapping (Ω₁, 𝓕₁) into (Ω₂, 𝓕₂). □
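Measurability of f and g in Example 2.3 can be checked mechanically, since everything is finite: compute each inverse image and test membership in 𝓕₁. The sketch below is ours (the helper names are invented).

```python
def preimage(f, domain, T):
    # f^{-1}(T): the points of the domain that f maps into T
    return frozenset(w for w in domain if f(w) in T)

def is_measurable(f, domain, F1, F2):
    # measurable iff every preimage of a set in F2 lies in F1
    return all(preimage(f, domain, T) in F1 for T in F2)

O1 = frozenset({"red", "blue", "green"})
F1 = {frozenset(), O1, frozenset({"red", "blue"}), frozenset({"green"})}
F2 = {frozenset(), frozenset({0, 1}), frozenset({0}), frozenset({1})}

f = {"red": 0, "blue": 0, "green": 1}.get
g = {"red": 0, "blue": 1, "green": 1}.get

assert is_measurable(f, O1, F1, F2)        # Example 2.3: f is measurable
assert not is_measurable(g, O1, F1, F2)    # g^{-1}({0}) = {red} is not in F1
```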
2.5 Real Borel Sets
Recall that a bounded open interval in ℝ is a subset of ℝ of the form (a, b) where a and b are real numbers such that a < b and where, as usual, (a, b) = {x ∈ ℝ : a < x < b}. Let 𝓐 denote the collection of all bounded open intervals in ℝ. The collection of Borel subsets of ℝ is denoted by B(ℝ) and is defined by B(ℝ) = σ(𝓐). That is, B(ℝ) is the smallest σ-algebra on ℝ that contains every bounded open interval. The subsets of ℝ in B(ℝ) are called real Borel sets or Borel measurable subsets of ℝ. Note that (ℝ, B(ℝ)) is a measurable space.
I hold ... that utility alone is not a proper measure of value, and would even go so far as to say that it is, when strictly and shortsightedly applied, a dangerously false measure of value. For mathematics, which is at once the pure and untrammelled creation of the mind and the indispensable tool of science and modern technology, the adoption of a strictly utilitarian standard could lead only to disaster; it would first bring about the drying up of the sources of new mathematical knowledge and would thereby eventually cause the suspension of significant new activity in applied mathematics as well. In mathematics we need rather to aim at a proper balance between pure theory and practical applications ...
Marshall Stone
⋄ Exercise 2.5 Try to find a subset of ℝ that is not a real Borel set.
Consider a measurable space (Ω, 𝓕). If f: (Ω, 𝓕) → (ℝ, B(ℝ)) then f is said to be a real-valued 𝓕-measurable function defined on Ω. If f: (ℝ, B(ℝ)) → (ℝ, B(ℝ)) then f is said to be a real-valued Borel measurable function defined on ℝ.
Exercise 2.6 Show that any countable subset of ℝ is a real Borel set.
Let f: ℝ → ℝ. Recall that we say that f is continuous at the real number x if and only if for any ε > 0 there exists δ > 0 such that if |x − y| < δ, then |f(x) − f(y)| < ε. Further, if f is continuous at x for each real number x, then we say that f is continuous.
2.4 Theorem Let f: ℝ → ℝ. The function f is continuous if and only if for each open set U of real numbers, f⁻¹(U) is an open set.

Proof. Suppose that f⁻¹(U) is open for each open set U of real numbers, and let x be an arbitrary real number. Then, given any real number ε > 0, the interval I = (f(x) − ε, f(x) + ε) is an open set, and so f⁻¹(I) must be open. Now, since x ∈ f⁻¹(I), there must exist some real number δ > 0 such that (x − δ, x + δ) ⊂ f⁻¹(I). But this implies that if |x − y| < δ, then f(y) ∈ (f(x) − ε, f(x) + ε). Hence, f is continuous at x and, since x was arbitrary, f is continuous.

Now, suppose that f: ℝ → ℝ is continuous, and let U be a nonempty open subset of ℝ. If f⁻¹(U) is empty then the desired result follows since the empty set is open. Assume then that f⁻¹(U) is not empty and let x ∈ f⁻¹(U). Then, since f(x) ∈ U there exists some ε > 0 such that (f(x) − ε, f(x) + ε) is a subset of U. Since f is continuous at x there exists a δ > 0 such that |f(x) − f(y)| < ε when |x − y| < δ. Thus, for every y ∈ (x − δ, x + δ), it follows that f(y) ∈ (f(x) − ε, f(x) + ε) ⊂ U, and hence (x − δ, x + δ) ⊂ f⁻¹(U). Thus, f⁻¹(U) is open. □
⋄ Exercise 2.7 Show that any continuous function mapping ℝ to ℝ is Borel measurable.
Consider a nonempty set Ω. A σ-subalgebra of a σ-algebra 𝓕 on Ω is a σ-algebra on Ω that is a subset of 𝓕. For example, {∅, Ω} is a σ-subalgebra of any σ-algebra on Ω. For a second example, let 𝓕 = {∅, Ω, A, Aᶜ} for some nonempty proper subset A of Ω. Even though the subset {∅, A} of 𝓕 is a σ-algebra on A, it is not a σ-subalgebra of 𝓕.

We say that a σ-algebra 𝓐 on a nonempty set Ω is countably generated if 𝓐 = σ({Aₙ : n ∈ ℕ}) for some choice of the Aₙ's. If 𝓕 is a countably generated σ-algebra on a nonempty set Ω and if 𝓖 is a σ-subalgebra of 𝓕, then must 𝓖 be countably generated? In the following example we show that the answer is no.
Example 2.4 Let Ω = [0, 1] and let 𝓕 = B([0, 1]). Further, let 𝓖 be the σ-subalgebra of 𝓕 given by the countable and cocountable subsets of [0, 1]. (A set is cocountable if it has a countable complement.) It follows from one of the problems that 𝓕 is countably generated. Assume now that 𝓖 is also countably generated. That is, assume that 𝓖 = σ({Aₙ : n ∈ ℕ}) where Aₙ ⊂ [0, 1] for each n ∈ ℕ. Note that without loss of generality, we may assume that Aₙ is countable for each n ∈ ℕ. Let B = ∪_{n∈ℕ} Aₙ and note that B is also countable. Thus, there exists some real number x such that x ∈ [0, 1] \ B. Notice also that if 𝓓 is the family of all subsets of B and their complements then 𝓓 is a σ-algebra such that 𝓖 ⊃ 𝓓 ⊃ σ({Aₙ : n ∈ ℕ}). But, 𝓓 ≠ 𝓖 since {x} is in 𝓖 but not in 𝓓. This contradiction implies that 𝓖 is not countably generated even though it is a σ-subalgebra of the countably generated σ-algebra 𝓕. □
2.2 Lemma Consider a measurable space (Ω, 𝓕) and real-valued 𝓕-measurable functions f and g defined on Ω. The set {ω ∈ Ω : f(ω) > g(ω)} is an element of 𝓕.

Proof. Write the set ℚ of rational numbers as a sequence {rₙ}_{n∈ℕ}. Note that

{ω ∈ Ω : f(ω) > g(ω)} = ∪_{n∈ℕ} {ω ∈ Ω : f(ω) > rₙ > g(ω)}
= ∪_{n∈ℕ} ({ω ∈ Ω : f(ω) > rₙ} ∩ {ω ∈ Ω : g(ω) < rₙ})
= ∪_{n∈ℕ} (f⁻¹((rₙ, ∞)) ∩ g⁻¹((−∞, rₙ))).

The desired result then follows since (rₙ, ∞) and (−∞, rₙ) are in B(ℝ) for each n ∈ ℕ. □
2.3 Lemma Consider a measurable space (Ω, 𝓕) and a real-valued 𝓕-measurable function f defined on Ω. If α is any real number then f + α and αf are 𝓕-measurable functions defined on Ω.

2.4 Lemma Consider a measurable space (Ω, 𝓕) and real-valued 𝓕-measurable functions f and g defined on Ω. The function f + g is an 𝓕-measurable function defined on Ω.

2.5 Lemma Consider a measurable space (Ω, 𝓕) and real-valued 𝓕-measurable functions f and g defined on Ω. The function fg is an 𝓕-measurable function defined on Ω, and, if g is nonzero then f/g is an 𝓕-measurable function defined on Ω.

2.6 Lemma Consider a measurable space (Ω, 𝓕) and a sequence {fₙ}_{n∈ℕ} of real-valued 𝓕-measurable functions defined on Ω. The functions sup_{k∈ℕ} fₖ(x) and inf_{k∈ℕ} fₖ(x) are 𝓕-measurable functions defined on Ω.
Consider a sequence {xₙ}_{n∈ℕ} of real numbers. Recall that the superior limit of this sequence is given by

lim sup_{n→∞} xₙ = inf_{j∈ℕ} sup_{n≥j} xₙ

and the inferior limit of this sequence is given by

lim inf_{n→∞} xₙ = sup_{j∈ℕ} inf_{n≥j} xₙ.

Further, this sequence is said to converge to a real number x if

lim sup_{n→∞} xₙ = lim inf_{n→∞} xₙ = x.
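The displayed formulas suggest a direct, if approximate, computation: truncate the tails at a finite N. The sketch below is ours; the index j is kept well below N because the final truncated tails are too short to be representative.

```python
def limsup_seq(x, N=500):
    # inf over j of sup over n >= j; tails truncated at N, and j kept
    # well below N to tame truncation effects
    return min(max(x(n) for n in range(j, N + 1)) for j in range(1, N // 2))

def liminf_seq(x, N=500):
    return max(min(x(n) for n in range(j, N + 1)) for j in range(1, N // 2))

x = lambda n: (-1) ** n + 1 / n      # oscillates: no ordinary limit
assert abs(limsup_seq(x) - 1) < 1e-2
assert abs(liminf_seq(x) - (-1)) < 1e-2
```

Since the two values disagree, the sequence does not converge, even though both one-sided limits exist.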
Finally, a sequence {fₙ}_{n∈ℕ} of real-valued functions defined on some nonempty set Ω is said to converge pointwise to a function f: Ω → ℝ if the sequence {fₙ(ω)}_{n∈ℕ} of real numbers converges to the real number f(ω) for each ω ∈ Ω. In this case, we denote the pointwise limit f as lim_{n→∞} fₙ.
2.7 Lemma Consider a measurable space (Ω, 𝓕) and a sequence {fₙ}_{n∈ℕ} of real-valued 𝓕-measurable functions defined on Ω. If lim_{n→∞} fₙ exists then it is an 𝓕-measurable function defined on Ω.
2.6 Lebesgue Measure and Lebesgue Measurable Sets
For an open interval (a, b) of ℝ, let ℓ((a, b)) denote the length of the interval (a, b). That is, if I = (a, b) with a < b then ℓ(I) = b − a.

Let A be a subset of ℝ. We will say that a countable collection {Iₙ : n ∈ M ⊂ ℕ} of open intervals covers A if

A ⊂ ∪_{n∈M} Iₙ.

For each such set A, let S_A be the subset of ℝ given by the set of all sums Σ_{n∈M} ℓ(Iₙ) taken over countable collections {Iₙ : n ∈ M ⊂ ℕ} of open intervals that cover A. The outer Lebesgue measure of A is denoted by m*(A) and is defined by m*(A) = inf S_A. (Note that outer Lebesgue measure is defined for any set in ℙ(ℝ) but is not a measure on (ℝ, ℙ(ℝ)) since it fails to be countably additive.)
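Computing m* exactly from the definition requires an infimum over all countable covers, which is intractable in general; but for a finite union of bounded intervals the infimum is attained by merging overlapping intervals and summing lengths. The sketch below is ours (the function name is invented), and whether the interval endpoints are included makes no difference to the value.

```python
def outer_measure_of_intervals(intervals):
    # m* of a finite union of bounded intervals, given as (a, b) pairs:
    # sort, merge overlapping/touching intervals, and sum the lengths
    merged = []
    for a, b in sorted(intervals):
        if merged and a <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], b)
        else:
            merged.append([a, b])
    return sum(b - a for a, b in merged)

# (0,1) ∪ (0.5,2) ∪ (3,4) has outer measure 2 + 1 = 3
assert outer_measure_of_intervals([(0, 1), (0.5, 2), (3, 4)]) == 3.0
```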
2.1 Definition (The Caratheodory Criterion) A subset E of ℝ is said to be Lebesgue measurable if

m*(B) = m*(B ∩ E) + m*(B ∩ Eᶜ)

for every subset B of ℝ.

Let M(ℝ) denote the collection of all subsets of ℝ that satisfy the Caratheodory Criterion; that is, M(ℝ) denotes the collection of all Lebesgue measurable subsets of ℝ.
2.5 Theorem The set B(ℝ) is a proper subset of M(ℝ).

2.6 Theorem The set M(ℝ) is a proper subset of ℙ(ℝ).

Proof. For the construction of a non-Lebesgue measurable subset of the real line, see pages 41-42 of Counterexamples in Probability and Real Analysis by G. Wise and E. Hall (Oxford University Press, New York, 1993). Also, see page 63 of Real Analysis by H. L. Royden (Macmillan, New York, 1988, Second edition). □

2.7 Theorem The set M(ℝ) is a σ-algebra on ℝ.

Proof. See pages 56-58 of Real Analysis by H. L. Royden (Macmillan, New York, 1988, Second edition). □
Lebesgue measure m on the measurable space (ℝ, M(ℝ)) is defined to be the restriction of m* to M(ℝ). That is, m(A) is equal to m*(A) if A ∈ M(ℝ) and m(A) is left undefined if A ∉ M(ℝ). Lebesgue measure λ on the measurable space (ℝ, B(ℝ)) is defined to be the restriction of m to B(ℝ).

Lebesgue measure corresponds to our intuitive concept of length. That is, the Lebesgue measure of an interval is the length of the interval. Lebesgue measure, however, is defined for subsets of ℝ that are much more complicated than intervals. Note, also, that we have only defined Lebesgue measure for certain subsets of the real line. Later, we will define it for certain subsets of ℝᵏ. In any case, however, when discussing the Lebesgue measure of a set A it will always be true that A is a subset of ℝᵏ for some positive integer k.
2.8 Theorem Let λ denote Lebesgue measure on (ℝ, B(ℝ)). If x ∈ ℝ then λ({x}) = 0.

Proof. For each positive integer n, let Iₙ denote the subset of ℝ given by

Iₙ = (x − 1/(2n), x + 1/(2n)).
Note that λ(Iₙ) = 1/n. Further, since {x} ⊂ Iₙ for each n it follows via monotonicity that λ({x}) ≤ 1/n for any positive integer n. Thus, we conclude that λ({x}) = 0. □
⋄ Exercise 2.8 If A is a Lebesgue measurable subset of ℝ having zero Lebesgue measure then must A be countable?
Consider a measure space (Ω, 𝓕, μ). A subset of Ω is said to be a null set (or a μ-null set) if it is measurable and has measure zero. That is, A is a null set if A ∈ 𝓕 and if μ(A) = 0. Let A ⊂ B where B is a null set. If A ∈ 𝓕 then A must also be a null set since μ(A) ≤ μ(B). In general, however, A need not be a null set since A need not be an element of 𝓕. A measure space is said to be complete if every subset of a null set is a measurable set. Note that while the empty set is always a null set, a null set need not be empty.
2.9 Theorem Corresponding to any measure space (Ω, 𝓕, μ) there exists a complete measure space (Ω, 𝓕₀, μ₀) such that

1. 𝓕 ⊂ 𝓕₀.
2. μ(A) = μ₀(A) for each set A ∈ 𝓕.
3. A ∈ 𝓕₀ if and only if A = E ∪ F where E ∈ 𝓕 and where F ⊂ N for some N ∈ 𝓕 with μ(N) = 0.

The measure space (Ω, 𝓕₀, μ₀) is said to be the completion of (Ω, 𝓕, μ).
2.10 Theorem The measure space (ℝ, M(ℝ), m) is the completion of the measure space (ℝ, B(ℝ), λ).
Exercise 2.9 If we complete Lebesgue measure on the real Borel sets, then we obtain the real Lebesgue sets. There do exist measures on the real Borel sets that when completed yield the power set of ℝ. Can you think of such a measure?
For a positive integer n, let ℝⁿ denote the n-fold Cartesian product of ℝ with itself. That is, an element of ℝⁿ is an ordered n-tuple of the form (a₁, …, aₙ) where aᵢ ∈ ℝ for each i. A set I of the form I = I₁ × ⋯ × Iₙ where Iₖ is an open interval of the form (aₖ, bₖ) for each k is called an open rectangle in ℝⁿ. The smallest σ-algebra on ℝⁿ that contains every open rectangle in ℝⁿ is denoted by B(ℝⁿ) and is called the set of Borel measurable subsets of ℝⁿ. Note that, for any positive integer n, (ℝⁿ, B(ℝⁿ)) is a measurable space. If f: (ℝᵏ, B(ℝᵏ)) → (ℝ, B(ℝ)) for some k ∈ ℕ then f is said to be a real-valued Borel measurable function defined on ℝᵏ.

2.11 Theorem For any k ∈ ℕ there exists a unique measure Λ on (ℝᵏ, B(ℝᵏ)) such that

Λ(A₁ × ⋯ × Aₖ) = λ(A₁) ⋯ λ(Aₖ)

for any sets A₁, …, Aₖ from B(ℝ) where λ is Lebesgue measure on (ℝ, B(ℝ)). The measure Λ on (ℝᵏ, B(ℝᵏ)) is called Lebesgue measure on (ℝᵏ, B(ℝᵏ)).
2.7 Caveats and Curiosities
3 Integration
3.1 The Riemann Integral
Let f be a bounded real-valued function defined on an interval [a, b] and let Γ = {α₀, …, αₙ} be a subdivision of [a, b]; that is, a = α₀ < α₁ < ⋯ < αₙ = b for some positive integer n. Let 𝒮 denote the collection of all subdivisions of [a, b]. Define real-valued functions S₁ and S₂ on 𝒮 via

S₁(Γ) = Σ_{i=1}^n (αᵢ − αᵢ₋₁) sup{f(x) : αᵢ₋₁ < x ≤ αᵢ}

and

S₂(Γ) = Σ_{i=1}^n (αᵢ − αᵢ₋₁) inf{f(x) : αᵢ₋₁ < x ≤ αᵢ}

where Γ = {α₀, …, αₙ} is an element from 𝒮. The upper Riemann integral of f over [a, b] is given by

U ∫ₐᵇ f(x) dx = inf{S₁(Γ) : Γ ∈ 𝒮}

and the lower Riemann integral of f over [a, b] is given by

L ∫ₐᵇ f(x) dx = sup{S₂(Γ) : Γ ∈ 𝒮}.

If the upper and lower Riemann integrals of f over [a, b] are each equal to the same value β then we say that f is Riemann integrable over [a, b], and we denote the value β by ∫ₐᵇ f(x) dx and call it the Riemann integral of f over [a, b]. As the next example shows, it is not difficult to find functions that are not Riemann integrable.
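The upper and lower sums S₁ and S₂ can be explored numerically. The sketch below is ours; the supremum and infimum over each cell are approximated by sampling, which is adequate for well-behaved functions but would be fooled by a function like the one in Example 3.1, whose sups and infs no finite sample detects reliably.

```python
def upper_lower_sums(f, grid, samples_per_cell=10):
    # S1 and S2 for the subdivision `grid`, with sup/inf over each cell
    # (a_{i-1}, a_i] approximated by sampling
    s1 = s2 = 0.0
    for a, b in zip(grid, grid[1:]):
        vals = [f(a + (b - a) * (j + 1) / samples_per_cell)
                for j in range(samples_per_cell)]
        s1 += (b - a) * max(vals)
        s2 += (b - a) * min(vals)
    return s1, s2

# f(x) = x on [0, 1]: upper and lower sums squeeze the integral 1/2
grid = [i / 1000 for i in range(1001)]
s1, s2 = upper_lower_sums(lambda x: x, grid)
assert s2 <= 0.5 <= s1 and s1 - s2 < 2e-3
```

As the subdivision is refined, s1 decreases toward the upper integral and s2 increases toward the lower integral.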
Example 3.1 Let [a, b] with a < b be a subinterval of ℝ and define a real-valued function f on [a, b] via

f(x) = 0 if x is irrational and f(x) = 1 if x is rational.
Once when walking past a lounge in the University of Chicago that was filled with a loud crowd watching TV, [Zygmund] asked one of his students what was going on. The student told him that the crowd was watching the World Series and explained to him some of the features of this baseball phenomenon. Zygmund thought about it all for a few minutes and commented, "I think it should be called the World Sequence."
Ronald Coifman and Robert Strichartz writing about Antoni Zygmund
That is, f(x) = I_ℚ(x). Let Γ = {α₀, …, αₙ} be a subdivision of [a, b]. Given any positive integer i ≤ n, there exists a rational number qᵢ and an irrational number τᵢ such that αᵢ₋₁ < qᵢ ≤ αᵢ and such that αᵢ₋₁ < τᵢ ≤ αᵢ. Hence, it follows that sup{f(x) : αᵢ₋₁ < x ≤ αᵢ} = 1 and inf{f(x) : αᵢ₋₁ < x ≤ αᵢ} = 0. From this we conclude that S₁(Γ) = Σ_{i=1}^n (αᵢ − αᵢ₋₁) = b − a and S₂(Γ) = 0. Since these values do not depend upon the particular subdivision Γ that was selected it follows that the upper Riemann integral of f over [a, b] is equal to b − a and that the lower Riemann integral of f over [a, b] is equal to zero. Since these values do not coincide, we see that f is not Riemann integrable over [a, b]. □
Example 3.1 points out a serious shortcoming of the Riemann integral. In particular, for a Borel set E we would like I_E to be integrable and ∫_ℝ I_E(x) dx to equal the Lebesgue measure of E. That is, ideally ∫_ℝ I_ℚ(x) dx should equal zero (the Lebesgue measure of ℚ) but the Riemann integral of I_ℚ does not exist. Although the Riemann integral is not general enough or powerful enough for our purposes, it remains useful for other purposes due to its simplicity and computability.
We will consider two additional types of integration. The first will be a straightforward extension of the Riemann integral and, as above, will be used to integrate functions defined on a subset
of the real line. The second new integration technique will be much more general in that it will allow us to integrate functions defined on arbitrary sets.
3.2 The Riemann-Stieltjes Integral

Let f be a real-valued function that is defined on an interval [a, b]. As before, let Γ = {α₀, …, αₙ} be a subdivision of [a, b]. (That is, a = α₀ < α₁ < ⋯ < αₙ = b.) Let 𝒢 denote the set of all subdivisions of [a, b]. Define a function S mapping 𝒢 into the extended nonnegative reals via

S(Γ) = Σ_{i=1}^n |f(αᵢ) − f(αᵢ₋₁)|.

The variation of f over [a, b] is defined by

V = sup{S(Γ) : Γ ∈ 𝒢}.

If V < ∞ then we say that f is of bounded variation on [a, b]. If V = ∞ then we say that f is of unbounded variation on [a, b].
Example 3.2 Consider a function f defined on [a, b] that is nondecreasing; that is, if a ≤ x < y ≤ b then f(x) ≤ f(y). Then S(Γ) = f(b) − f(a) for any subdivision Γ and hence it follows that V = f(b) − f(a). □

Example 3.3 Let f(x) = I_ℚ(x) for x ∈ [a, b]. Then, given any positive number B there exists a subdivision Γ of [a, b] such that S(Γ) > B. (Simply choose Γ = {α₀, …, αₙ} such that n is large and such that αᵢ is rational when i is even and irrational when i is odd.) Thus, V = ∞ and we conclude that f is of unbounded variation on [a, b]. □
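S(Γ) is straightforward to compute for a given subdivision. The sketch below is ours; it checks the telescoping identity of Example 3.2, and the fact that refining a subdivision can only increase S(Γ), which is why V is defined as a supremum.

```python
import math

def S(f, grid):
    # S(Γ) = Σ |f(α_i) − f(α_{i−1})| for the subdivision `grid`; the
    # variation V is the supremum of this over all subdivisions
    return sum(abs(f(b) - f(a)) for a, b in zip(grid, grid[1:]))

# Example 3.2: for a nondecreasing f the sum telescopes to f(b) − f(a)
f = lambda t: t * t
grid = [i / 10 for i in range(11)]               # subdivision of [0, 1]
assert abs(S(f, grid) - (f(1.0) - f(0.0))) < 1e-12

# refining a subdivision can only increase S(Γ) (triangle inequality)
g = lambda t: math.sin(20 * t)
coarse = [0.0, 0.5, 1.0]
fine = [i / 20 for i in range(21)]               # refinement of `coarse`
assert S(g, fine) >= S(g, coarse)
```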
Exercise 3.1 A function f defined on [a, b] and taking values in ℝ is said to satisfy a Lipschitz condition on [a, b] if there
exists a constant C such that |f(x) − f(y)| ≤ C|x − y| for all x and y in [a, b]. Show that for such a function f it follows that V ≤ C(b − a) where V is the variation of f over [a, b].
Now, let f and g be real-valued functions defined on [a, b] and consider a subdivision Γ = {α₀, ..., αₙ} of [a, b]. Let Φ be a sample from the subdivision Γ. That is, Φ = {β₁, ..., βₙ} is a collection of real numbers such that α_{i−1} ≤ β_i ≤ α_i for each positive integer i ≤ n. Let 𝒢 denote the set of all subdivisions of [a, b] and for a subdivision Γ let S_Γ denote the collection of all samples from the subdivision Γ. Let 𝒮 denote the set of all ordered pairs of the form (Γ, Φ) where Φ ∈ S_Γ and define a function R mapping 𝒮 to ℝ via

R((Γ, Φ)) = Σ_{i=1}^{n} f(β_i)(g(α_i) − g(α_{i−1})).

The value R((Γ, Φ)) is called a Riemann-Stieltjes sum of f with respect to g for the subdivision Γ.
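The construction above translates directly into code. The sketch below is our own illustration (the midpoint choice of the sample points β_i is an assumption; any β_i in [α_{i−1}, α_i] is allowed):

```python
# Sketch of a Riemann-Stieltjes sum R((gamma, phi)) with midpoint samples.

def rs_sum(f, g, subdivision):
    """Sum of f(beta_i) * (g(a_i) - g(a_{i-1})) over the subdivision."""
    total = 0.0
    for i in range(1, len(subdivision)):
        lo, hi = subdivision[i - 1], subdivision[i]
        beta = (lo + hi) / 2.0  # a sample point with lo <= beta <= hi
        total += f(beta) * (g(hi) - g(lo))
    return total

subdivision = [i / 1000.0 for i in range(1001)]  # subdivision of [0, 1]
# With g(x) = x the sum approximates the Riemann integral of f over [0, 1].
approx = rs_sum(lambda x: x * x, lambda x: x, subdivision)
print(approx)  # close to 1/3
```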
This was all part of his passion for order in the world of mathematics. He could not stand untidiness in his chosen territory, blunders, obscurity, or vagueness, unproven assertions or half substantiated claims ... the man who did his job incompetently, who spoilt Landau's world, received no mercy: that was the unpardonable sin in Landau's eyes, to make a mathematical mess where there had been order before. — G. H. Hardy and H. Heilbronn writing about Edmund Landau
For a subdivision Γ = {α₀, ..., αₙ} of [a, b], let

|Γ| = max_{1≤i≤n} (α_i − α_{i−1})

denote the size of Γ. If the limit

lim_{|Γ|→0} R((Γ, Φ))

exists and is finite then that limit is called the Riemann-Stieltjes integral of f with respect to g on [a, b] and is denoted by

∫_{[a,b]} f(x) dg(x).

(Note that this limit does not depend on Φ.) If g(x) = x then ∫_{[a,b]} f(x) dg(x) is simply the Riemann integral of the function f over [a, b].
3.1 Theorem If f is continuous on [a, b] and if g is of bounded variation on [a, b] then the Riemann-Stieltjes integral of f with respect to g on [a, b] exists.
3.2 Theorem (Integration by Parts) If

∫_{[a,b]} f(x) dg(x)

exists then so does

∫_{[a,b]} g(x) df(x)

and

∫_{[a,b]} f(x) dg(x) = (f(b)g(b) − f(a)g(a)) − ∫_{[a,b]} g(x) df(x).
3.3 Theorem If f is continuous on [a, b] and if g has a continuous derivative g′ on [a, b] then

∫_{[a,b]} f dg = ∫_a^b f g′ dx.
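Theorem 3.3 can be checked numerically. The sketch below is our own illustration (the particular functions f(x) = cos x, g(x) = x³ and the midpoint quadrature are assumptions, not from the text): Riemann-Stieltjes sums of f with respect to g land close to the ordinary Riemann integral of f(x)g′(x).

```python
# Numeric check of Theorem 3.3 with f = cos and g(x) = x^3 on [0, 1]:
# the Riemann-Stieltjes sums approximate the Riemann integral of f * g'.
import math

def rs_sum(f, g, a, b, n):
    """Riemann-Stieltjes sum on a uniform subdivision with midpoint samples."""
    pts = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(f((pts[i - 1] + pts[i]) / 2) * (g(pts[i]) - g(pts[i - 1]))
               for i in range(1, n + 1))

def riemann(h, a, b, n):
    """Midpoint-rule approximation to the Riemann integral of h over [a, b]."""
    pts = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(h((pts[i - 1] + pts[i]) / 2) * (pts[i] - pts[i - 1])
               for i in range(1, n + 1))

f = math.cos
g = lambda x: x ** 3
fg_prime = lambda x: math.cos(x) * 3 * x * x  # f(x) g'(x)

lhs = rs_sum(f, g, 0.0, 1.0, 2000)
rhs = riemann(fg_prime, 0.0, 1.0, 2000)
print(abs(lhs - rhs))  # small
```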
3.3 The Lebesgue Integral
3.3.1 Simple Functions
Consider a measure space (Ω, F, μ). A function f: Ω → ℝ is said to be a simple function¹ if it has the form

f(ω) = Σ_{i=1}^{n} a_i I_{A_i}(ω)

¹Or, more precisely, a measurable simple function.
where n ∈ ℕ, where a_i ∈ ℝ for each i, and where the A_i's are disjoint elements of F. Note that such a simple function is a measurable mapping from (Ω, F) to (ℝ, B(ℝ)). Note also that any function having the form given above with the A_i's not disjoint may be written as a simple function by taking intersections. For a simple function f as given above we will define the Lebesgue integral of f over Ω to be

∫_Ω f dμ = Σ_{i=1}^{n} a_i μ(A_i).
Example 3.4 Let Ω = {Head, Tail} and let F = 𝒫(Ω), the power set of Ω. Let μ be a measure defined on (Ω, F) via μ({Head}) = 1/2 and μ({Tail}) = 1/2. Let f map Ω to ℝ via

f(ω) = a₁ I_{Tail}(ω) + a₂ I_{Head}(ω)

where a₁ and a₂ are real numbers. Note that

∫_Ω f dμ = a₁ μ({Tail}) + a₂ μ({Head}) = (a₁ + a₂)/2. □
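A computation in the spirit of Example 3.4 can be sketched in a few lines. This is our own illustration (the dict-based representation of the measure and of the simple function is an assumption):

```python
# Sketch of the Lebesgue integral of a simple function on a finite space:
# sum of a_i * mu(A_i), here with singleton sets A_i.

def integral_simple(values, mu):
    """values: outcome -> f(outcome); mu: outcome -> measure of {outcome}."""
    return sum(values[w] * mu[w] for w in mu)

mu = {"Head": 0.5, "Tail": 0.5}
a1, a2 = 3.0, 7.0
f = {"Tail": a1, "Head": a2}
print(integral_simple(f, mu))  # (a1 + a2) / 2 = 5.0
```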
3.3.2 Measurable Functions
Consider a measurable real-valued function f defined on (Ω, F) and assume that f(ω) ≥ 0 for all ω ∈ Ω. Let S_f denote the set of all simple functions h defined on (Ω, F) such that 0 ≤ h(ω) ≤ f(ω) for all ω ∈ Ω. For such a nonnegative measurable function f we define the Lebesgue integral of f over Ω to be

∫_Ω f dμ = sup{∫_Ω h dμ : h ∈ S_f}.

Consider a measurable real-valued function f defined on (Ω, F) and let f⁺ and f⁻ denote the positive and negative parts of f, respectively. That is, f⁺(ω) = max{f(ω), 0} and f⁻(ω) = max{−f(ω), 0} for each ω ∈ Ω. Note that f⁺ and f⁻ are nonnegative measurable functions, that |f| = f⁺ + f⁻, and that
f = f⁺ − f⁻. We will define the Lebesgue integral of f over Ω to be

∫_Ω f dμ = ∫_Ω f⁺ dμ − ∫_Ω f⁻ dμ

provided that the two integrals on the right are not both equal to ∞; if they are each infinite then we say that the Lebesgue integral of f does not exist. The function f is said to be Lebesgue integrable if ∫_Ω f dμ exists and is finite. If A ∈ F then we will let

∫_A f dμ = ∫_Ω f I_A dμ.

Note that the Lebesgue integral of a nonnegative measurable function always exists although the value of the integral may be ∞.
We have considered two important concepts that are associated with Lebesgue. It is important not to confuse them. Lebesgue measure is a particular example of a measure that is only defined for certain subsets of the real line or ℝ^k. The Lebesgue integral allows us to integrate real-valued measurable functions that are defined on any measurable space. In particular, the Lebesgue integral is defined on general measure spaces and need not have any relation at all to Lebesgue measure. If, however, we consider the Lebesgue integral with respect to Lebesgue measure on ℝ^k then for certain functions we recover the familiar Riemann integral.
3.3.3 Properties of the Lebesgue Integral
Consider a measure space (Ω, F, μ). A condition is said to hold almost everywhere with respect to the measure μ (written a.e. [μ]) if there exists a μ-null set B such that the condition holds for all ω in Ω \ B. For example, if Ω = ℝ and if μ is Lebesgue measure then I_ℚ(x) = 0 a.e. [μ]. Lebesgue integrals satisfy the following properties:
1. If ∫_Ω f dμ exists and if k ∈ ℝ then ∫_Ω kf dμ exists and equals k ∫_Ω f dμ.

2. If g(ω) ≥ h(ω) for all ω ∈ Ω then ∫_Ω g dμ ≥ ∫_Ω h dμ provided that these integrals exist.

3. If ∫_Ω f dμ exists then |∫_Ω f dμ| ≤ ∫_Ω |f| dμ.

4. If the Lebesgue integral of f and of g each exist then ∫_Ω (f + g) dμ = ∫_Ω f dμ + ∫_Ω g dμ provided that the right hand side is not of the form ∞ − ∞ or −∞ + ∞.

5. A real-valued measurable function f is integrable if and only if |f| is integrable.

6. If f = 0 a.e. [μ] then ∫_Ω f dμ = 0.

7. If g = h a.e. [μ], if ∫_Ω g dμ exists, and if h is measurable then ∫_Ω h dμ exists and is equal to ∫_Ω g dμ.

8. If h is integrable then h is finite a.e. [μ].

9. If h ≥ 0 and ∫_Ω h dμ = 0 then h = 0 a.e. [μ].
The following two results are the "workhorses" of real analysis:

3.4 Theorem (Monotone Convergence or B. Levi's Theorem) Let {f_n}_{n∈ℕ} be a sequence of measurable real-valued functions defined on Ω such that 0 ≤ f₁(ω) ≤ f₂(ω) ≤ ... for all ω ∈ Ω and such that f_n(ω) → f(ω) as n → ∞ for all ω ∈ Ω for some function f. The function f is measurable and

∫_Ω f_n dμ → ∫_Ω f dμ

as n → ∞.
Proof. For a proof of this theorem, see page 172 of Real and Abstract Analysis by E. Hewitt and K. Stromberg (Springer-Verlag, New York, 1965). □
3.5 Theorem (Dominated Convergence Theorem) Let {f_n}_{n∈ℕ} be a sequence of measurable real-valued functions defined on Ω such that f(ω) = lim_{n→∞} f_n(ω) exists for all ω ∈ Ω. If |f_n| ≤ g for some integrable function g and for each n ∈ ℕ then

1. f is integrable,
2. lim_{n→∞} ∫_Ω |f_n − f| dμ = 0, and
3. lim_{n→∞} ∫_Ω f_n dμ = ∫_Ω f dμ.

Proof. For a proof of this theorem, see pages 172-173 of Real and Abstract Analysis by E. Hewitt and K. Stromberg (Springer-Verlag, New York, 1965). □
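The hypotheses of Theorem 3.5 can be watched at work numerically. The sketch below is our own illustration (the sequence f_n(x) = xⁿ on [0, 1] and the midpoint quadrature are assumptions, not from the text): the sequence is dominated by the integrable function g(x) = 1, converges pointwise a.e. to 0, and its integrals tend to 0.

```python
# Illustration of dominated convergence: on [0, 1], f_n(x) = x^n satisfies
# |f_n| <= g with g(x) = 1 integrable and f_n -> 0 pointwise a.e., so the
# integrals (which equal 1/(n + 1) exactly) should tend to 0.

def midpoint_integral(h, a, b, n=10000):
    """Midpoint-rule approximation to the integral of h over [a, b]."""
    return sum(h(a + (b - a) * (i + 0.5) / n) for i in range(n)) * (b - a) / n

vals = [midpoint_integral(lambda x, k=k: x ** k, 0.0, 1.0)
        for k in (1, 5, 25, 125)]
print(vals)  # decreasing toward 0, close to 1/(k + 1) for each k
```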
3.4 The Riemann Integral and the Lebesgue Integral
3.6 Theorem Let f be a bounded real-valued function defined on an interval [a, b]. If f is Riemann integrable on [a, b] then f is Lebesgue integrable with respect to Lebesgue measure on [a, b] and the two integrals are equal.

Proof. This result is proved on pages 121-122 of Real Variables by A. Torchinsky (Addison-Wesley, Redwood City, California, 1988). □
3.7 Theorem Let f be a bounded real-valued function defined on an interval [a, b]. The function f is Riemann integrable on [a, b] if and only if f is continuous a.e. on [a, b] with respect to Lebesgue measure.
Proof. This result is proved on page 123 of Real Variables by A. Torchinsky (Addison-Wesley, Redwood City, California, 1988). □
Note 3.1 The previous result holds for a bounded function defined on a bounded interval. In contrast, there do exist functions that possess improper Riemann integrals and yet are not Lebesgue integrable. The function sin(x)/x is such a function. (Recall that for an improper Riemann integral either the integrand or the interval over which the integrand is integrated is unbounded.)
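The behavior described in Note 3.1 can be seen numerically. The sketch below is our own illustration (the quadrature and cutoffs are assumptions; this is evidence, not a proof): the signed integrals of sin(x)/x settle near π/2, while the integrals of |sin(x)/x| keep growing, which is why sin(x)/x fails to be Lebesgue integrable on (0, ∞).

```python
# Numeric look at Note 3.1: truncated integrals of sin(x)/x stabilize,
# truncated integrals of |sin(x)/x| grow without bound (roughly like log).
import math

def midpoint_integral(h, a, b, n):
    """Midpoint-rule approximation; midpoints avoid evaluating at x = 0."""
    return sum(h(a + (b - a) * (i + 0.5) / n) for i in range(n)) * (b - a) / n

f = lambda x: math.sin(x) / x
for upper in (10.0, 100.0, 1000.0):
    signed = midpoint_integral(f, 0.0, upper, int(upper) * 100)
    absolute = midpoint_integral(lambda x: abs(f(x)), 0.0, upper, int(upper) * 100)
    print(upper, round(signed, 4), round(absolute, 4))
# signed values hover near pi/2; absolute values keep increasing
```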
Note 3.2 Let λ denote Lebesgue measure on (ℝ, B(ℝ)). If f: ℝ → ℝ is Lebesgue integrable then we will often denote the integral

∫_ℝ f dλ

via the more familiar notation

∫_ℝ f(x) dx.

The second expression, however, is just our notation for the Lebesgue integral with respect to Lebesgue measure and should not ordinarily be taken to refer to a Riemann integral.
3.5 The Riemann-Stieltjes Integral and the Lebesgue Integral

A function F: ℝ → ℝ is said to be right continuous if

lim_{y↓x} F(y) = F(x)

for any x ∈ ℝ. (The notation y ↓ x means that y → x with y > x.)
3.8 Theorem Let F: ℝ → ℝ be nondecreasing and right continuous. Let f be a continuous, real-valued function defined on [a, b]. The function F induces a measure μ on (ℝ, B(ℝ)) such that μ((s, t]) = F(t) − F(s) for all s < t and such that

∫_a^b f(x) dF(x) = ∫_{(a,b]} f dμ.

Proof. This result is proved on pages 5-9 of Probability Theory by R. G. Laha and V. K. Rohatgi (John Wiley, New York, 1979). This proof uses the Caratheodory Extension Theorem that is developed in Section 5.6 of this book. □
3.6 Caveats and Curiosities
4 Functional Analysis
4.1 Vector Spaces
Let X be a nonempty set and suppose that there exists a mapping f of X × X into X that is called the addition function and is denoted by f(x₁, x₂) = x₁ + x₂. Suppose also that there is a mapping g of ℝ × X into X that is called the scalar multiplication function and is denoted by g(α, x) = αx. The set X endowed with two such mappings is called a real vector space if the following properties are satisfied:

1. x + y = y + x for all x and y in X.

2. x + (y + z) = (x + y) + z for all x, y, and z in X.

3. There exists in X a unique element denoted by 0 and called the zero element such that x + 0 = x for each x in X.

4. To each x in X there corresponds a unique element in X denoted by −x such that x + (−x) = 0. (We will often write y + (−x) as y − x.)

5. α(x + y) = αx + αy for each α in ℝ and each x and y from X.

6. (α + β)x = αx + βx for each α and β in ℝ and each x in X.

7. α(βx) = (αβ)x for each α and β in ℝ and each x in X.

8. 1x = x for each x in X.

9. 0x = 0 for each x in X where the 0 on the left is a real number and the 0 on the right is the element in X described in Property 3.
Consider a real vector space X (or, more precisely, a real vector space (X, f, g)). A finite set {x₁, ..., xₙ} of elements (vectors) from X is said to be linearly dependent (or consist of elements
that are linearly dependent) if there exist real numbers (scalars) a₁, ..., aₙ, not all zero, such that a₁x₁ + ... + aₙxₙ = 0. Otherwise, the elements are said to be linearly independent. An infinite set is said to be linearly independent if every finite subset of it is linearly independent.
A nonempty subset M of a vector space X is called a subspace of X if x + y and αx are in M for every α in ℝ and every x and y from M. A subspace M of X is said to be a proper subspace if M ≠ X. A subspace of a vector space is itself a vector space. The intersection of any family of subspaces is itself a subspace.

Let S be a nonempty subset of a vector space X and let L(S) be the set of all finite linear combinations of elements from S. That is, x ∈ L(S) if and only if x = a₁x₁ + ... + aₙxₙ for some positive integer n and where x_i ∈ S and a_i ∈ ℝ for each i. The set L(S) is a subspace of X and is called the linear manifold generated by S or the linear span of S.
If X is a vector space then there may be some positive integer n such that X contains a set of n vectors that are linearly independent while every set of n + 1 vectors in X is linearly dependent. In this case we say that X is finite-dimensional and of dimension n. The trivial vector space {0} has dimension 0. If X is not finite dimensional then it is infinite dimensional. (The set ℝ^k endowed with the standard operations is k-dimensional. Spaces whose elements are functions are typically infinite-dimensional.) If X is n-dimensional for some positive integer n then there exists a linearly independent set S consisting of n elements such that the linear span of S is X itself. Such a set is called a basis for X.
4.2 Normed Linear Spaces
A mapping from a vector space X into ℝ is called a norm on X and is denoted by ‖·‖ if it satisfies the following properties:

1. ‖x + y‖ ≤ ‖x‖ + ‖y‖ for each x and y from X.
2. ‖αx‖ = |α| ‖x‖ for each α in ℝ and each x in X.

3. ‖x‖ ≥ 0 for each x in X.

4. ‖x‖ = 0 if and only if x = 0.

A nonempty set X is said to be a metric space if there exists a mapping ρ of X × X into ℝ (called a metric or distance function) such that:

1. ρ(x₁, x₂) ≥ 0 for each x₁ and x₂ from X.

2. ρ(x₁, x₂) = 0 if and only if x₁ = x₂.

3. ρ(x₁, x₂) = ρ(x₂, x₁) for each x₁ and x₂ from X.

4. ρ(x₁, x₃) ≤ ρ(x₁, x₂) + ρ(x₂, x₃) for each x₁, x₂, and x₃ from X.¹
An open ball centered at a point p in a metric space (X, ρ) is a set consisting of all points q in X such that ρ(p, q) < r for some fixed positive r. A point p is said to be a limit point of a subset E of X if every open ball centered at p contains a point q such that q ≠ p and such that q ∈ E. The set E is closed if every limit point of E is an element of E.

A sequence {x_i}_{i∈ℕ} of elements from a metric space (X, ρ) is said to be a Cauchy sequence if for every ε > 0 there exists an integer N such that ρ(x_n, x_m) < ε whenever n ≥ N and m ≥ N. A metric space in which every Cauchy sequence converges to a point in the space is said to be a complete metric space.
A vector space X equipped with a norm is called a normed linear space. With the aid of this norm on X we can define a metric d on X by letting d(x, y) = ‖x − y‖ for each x and y from X. That is, a normed linear space is also a metric space. A normed linear space that is complete with respect to the metric induced by its norm is called a Banach space.

¹This property is called the Triangle Inequality.
4.3 Inner Product Spaces
Consider a real vector space X. A mapping of X × X into ℝ is called an inner product on X and is denoted by (x, y) if it satisfies the following conditions:

1. (αx + βy, z) = α(x, z) + β(y, z) for all α and β in ℝ and all x, y, and z in X.

2. (x, y) = (y, x).

3. (x, x) ≥ 0.

4. (x, x) = 0 if and only if x = 0.

A vector space endowed with an inner product is called an inner product space or a pre-Hilbert space. An inner product may be used to define a norm by letting ‖x‖ = √(x, x). A complete inner product space is called a Hilbert space.
Mathematics is the one area of human enterprise where the motivation to deceive has been practically eliminated. Not because mathematicians are necessarily virtuous people, but because the nature of mathematical ability is such that deception can be immediately determined by other mathematicians. This requirement of honesty soon affects the character of the continuous student of mathematics. — Howard Fehr
Two elements x and y in an inner product space X are said to be orthogonal if (x, y) = 0. If S is a set of vectors from X then a vector y is said to be orthogonal to the set S if (x, y) = 0 for each x in S. A set S of vectors from X such that (x, y) = 0 for any distinct elements x and y from S is said to be an orthogonal set. An orthogonal set S of vectors from X such that ‖x‖ = 1 for each x ∈ S is said to be an orthonormal set. An orthonormal subset S of X is said to be total if there exists no orthonormal subset of X of which S is a proper subset. For a subspace M of a Hilbert space H let M⊥ (pronounced 'M perp') denote the subspace of H consisting of all elements in H that are orthogonal to every element in M.
Example 4.1 The vectors [0, 1] and [1, 0] comprise a total orthonormal subset of the inner product space ℝ² where the inner product is simply the vector dot product. □
4.1 Theorem (Bessel's Inequality) Let {u₁, u₂, ...} be an orthonormal subset of an inner product space X. Then, for each x ∈ X, it follows that

Σ_{k=1}^{∞} (x, u_k)² ≤ ‖x‖².
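Bessel's inequality is easy to verify numerically in a concrete space. The sketch below is our own illustration (the orthonormal pair in ℝ³ and the vector x are assumptions): the sum of squared coefficients is strictly less than ‖x‖² because the orthonormal set is not total.

```python
# Numeric check of Bessel's inequality in R^3 with the dot product.

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

u1 = [1.0, 0.0, 0.0]
u2 = [0.0, 1.0, 0.0]  # {u1, u2} is orthonormal but not total in R^3
x = [3.0, 4.0, 12.0]

coeff_sq = dot(x, u1) ** 2 + dot(x, u2) ** 2
norm_sq = dot(x, x)
print(coeff_sq, norm_sq)  # 25.0 <= 169.0
```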
4.2 Theorem Let {u₁, u₂, ...} be an orthonormal subset of a Hilbert space X. Each of the following conditions is necessary and sufficient for the orthonormal set to be total:

1. x = Σ_{n=1}^{∞} (x, uₙ)uₙ for each x ∈ X.

2. ‖x‖² = Σ_{n=1}^{∞} (x, uₙ)² for each x ∈ X.²
4.3 Theorem (Parallelogram Law) In an inner product space, the following equality holds for any two elements x and y of the space:

‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖².

4.4 Theorem If {x₁, ..., xₙ} is an orthonormal subset of a Hilbert space H and if x ∈ H then

‖x − Σ_{j=1}^{n} a_j x_j‖

²This equality is called Parseval's identity.
is minimized when a_j = (x, x_j) for j = 1, ..., n. (That is, the a_j's provide the coefficients for a best linear estimator of x in terms of the x_j's.)
Proof. Note that

‖x − Σ_{j=1}^{n} a_j x_j‖² = ‖x‖² − 2 Σ_{j=1}^{n} a_j(x, x_j) + ‖Σ_{j=1}^{n} a_j x_j‖²,

where we note that

‖Σ_{j=1}^{n} a_j x_j‖² = Σ_{j=1}^{n} a_j²

since the x_j's are orthonormal. Thus, we have

‖x − Σ_{j=1}^{n} a_j x_j‖² = ‖x‖² + Σ_{j=1}^{n} (a_j² − 2a_j(x, x_j)) = ‖x‖² + Σ_{j=1}^{n} ((a_j − (x, x_j))² − (x, x_j)²)

since

(a_j − (x, x_j))² = a_j² − 2a_j(x, x_j) + (x, x_j)².

Thus, we have

0 ≤ ‖x − Σ_{j=1}^{n} a_j x_j‖² = ‖x‖² − Σ_{j=1}^{n} (x, x_j)² + Σ_{j=1}^{n} (a_j − (x, x_j))²,

which is minimized when a_j = (x, x_j). □
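The conclusion of Theorem 4.4 can be probed numerically. The sketch below is our own illustration (the orthonormal subset of ℝ³ and the vector x are assumptions): the coefficients a_j = (x, x_j) yield a smaller squared error than a perturbed choice.

```python
# Numeric sketch of Theorem 4.4 in R^3: a_j = (x, x_j) minimizes the error.

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def err(x, basis, coeffs):
    """Squared norm of x minus the linear combination of basis vectors."""
    resid = [xi - sum(c * b[i] for c, b in zip(coeffs, basis))
             for i, xi in enumerate(x)]
    return dot(resid, resid)

basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # orthonormal subset of R^3
x = [2.0, -1.0, 5.0]
best = [dot(x, b) for b in basis]  # a_j = (x, x_j)
print(err(x, basis, best))  # 25.0: the residual is the third coordinate
print(err(x, basis, [best[0] + 0.1, best[1]]))  # slightly larger
```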
A subset E of a vector space X is said to be convex if it has the following geometric property: Whenever x and y are in E and 0 < t < 1 then the point (1 − t)x + ty is also in E. That is, convexity requires that E contain the line segment between any two of its points.
4.5 Theorem Let M be a nonempty closed convex subset of a Hilbert space H. If x ∈ H, then there is a unique element y₀ ∈ M such that ‖x − y₀‖ = inf{‖x − y‖ : y ∈ M}. The element y₀ is called the projection of x on M.

Proof. Let d = inf{‖x − y‖ : y ∈ M} and choose points y₁, y₂, ... ∈ M such that ‖x − yₙ‖ → d as n → ∞. We will show that {yₙ}_{n∈ℕ} is a Cauchy sequence.

The parallelogram law states that ‖u + v‖² + ‖u − v‖² = 2‖u‖² + 2‖v‖² for all u and v in H. Let u = yₙ − x and v = yₘ − x to obtain

‖yₙ + yₘ − 2x‖² + ‖yₙ − yₘ‖² = 2‖yₙ − x‖² + 2‖yₘ − x‖²,

or

‖yₙ − yₘ‖² = 2‖yₙ − x‖² + 2‖yₘ − x‖² − 4‖½(yₙ + yₘ) − x‖².

Since ½(yₙ + yₘ) ∈ M (by convexity), it follows that ‖½(yₙ + yₘ) − x‖ ≥ d. Thus

‖yₙ − yₘ‖² ≤ 2‖yₙ − x‖² + 2‖yₘ − x‖² − 4d².

Since the right hand side of this expression goes to 0 as n, m → ∞ it follows that {yₙ}_{n=1}^{∞} is Cauchy.

Since H is complete, yₙ converges to some limit y₀ ∈ H as n → ∞. Thus, ‖x − yₙ‖ → ‖x − y₀‖ as n → ∞. But then ‖x − y₀‖ = d and y₀ ∈ M since M is closed. Thus such an element y₀ exists.

To prove uniqueness, let y₀, z₀ ∈ M with ‖x − y₀‖ = ‖x − z₀‖ = d. In the parallelogram law, let u = y₀ − x and v = z₀ − x to obtain

‖y₀ + z₀ − 2x‖² + ‖y₀ − z₀‖² = 2‖y₀ − x‖² + 2‖z₀ − x‖² = 4d².

But

‖y₀ + z₀ − 2x‖² = 4‖½(y₀ + z₀) − x‖² ≥ 4d².

Thus, ‖y₀ − z₀‖ = 0, which implies that y₀ = z₀. □
4.6 Theorem Let M be a closed subspace of a Hilbert space H, and let y₀ be an element of M. Then ‖x − y₀‖ = inf{‖x − y‖ : y ∈ M} iff x − y₀ ⊥ M, i.e. iff (x − y₀, y) = 0 for all y ∈ M.

Proof. Assume that x − y₀ ⊥ M. If y ∈ M then

‖x − y‖² = ‖x − y₀ − (y − y₀)‖² = ‖x − y₀‖² + ‖y − y₀‖² − 2(x − y₀, y − y₀) = ‖x − y₀‖² + ‖y − y₀‖² ≥ ‖x − y₀‖²

since y − y₀ ∈ M. Thus, ‖x − y₀‖ = inf{‖x − y‖ : y ∈ M}.

Assume now that ‖x − y₀‖ = inf{‖x − y‖ : y ∈ M}. Let y ∈ M and let c be a real number. Since M is a subspace it follows that y₀ + cy ∈ M. Thus ‖x − y₀ − cy‖ ≥ ‖x − y₀‖. But

‖x − y₀ − cy‖² = ‖x − y₀‖² + c²‖y‖² − 2(x − y₀, cy).

Thus,

c²‖y‖² − 2(x − y₀, cy) ≥ 0.

Let c = b(x − y₀, y) for some b ∈ ℝ. Then

(x − y₀, cy) = (x − y₀, b(x − y₀, y)y) = b(x − y₀, y)².

Thus,

b²(x − y₀, y)²‖y‖² − 2b(x − y₀, y)² = (x − y₀, y)²(b²‖y‖² − 2b) ≥ 0.

But (b²‖y‖² − 2b) < 0 if b is small and positive. Thus (x − y₀, y) = 0. □
4.7 Theorem (Hilbert Space Projection Theorem) Let M be a closed subspace of a Hilbert space H. If x ∈ H, then x has a unique representation x = y + z where y ∈ M and z ∈ M⊥. Furthermore, y is the projection of x on M; that is, y is the nearest point in M to x.

Proof. Let y₀ be the projection of x on M (see Theorem 4.5) and let y = y₀ and z = x − y₀. Theorem 4.6 implies that z ∈
M⊥. Thus such a representation exists. To prove uniqueness, let x = y + z = y′ + z′ where y, y′ ∈ M and z, z′ ∈ M⊥. Then y − y′ ∈ M since M is a subspace and y − y′ ∈ M⊥ since y − y′ = z′ − z. Thus y − y′ is orthogonal to itself which implies that y = y′. But then z = z′, which proves uniqueness. □
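The Projection Theorem can be watched in a concrete finite-dimensional case. The sketch below is our own illustration (the subspace M, its orthonormal spanning set, and the vector x are assumptions): x splits as y + z with y ∈ M and z orthogonal to M.

```python
# Sketch of the Projection Theorem in R^3: decompose x into y + z with
# y in M = span{e1, e2} and z in the orthogonal complement of M.

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

e1 = [1.0, 0.0, 0.0]
e2 = [0.0, 1.0, 0.0]  # orthonormal set spanning M
x = [4.0, -3.0, 7.0]

# projection of x on M via the coefficients (x, e1) and (x, e2)
y = [dot(x, e1) * e1[i] + dot(x, e2) * e2[i] for i in range(3)]
z = [x[i] - y[i] for i in range(3)]  # the component in M-perp
print(y, z)  # y = [4.0, -3.0, 0.0], z = [0.0, 0.0, 7.0]
print(dot(z, e1), dot(z, e2))  # both 0: z is orthogonal to M
```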
4.8 Theorem (Riesz-Frechet) Consider a real Hilbert space H. Every bounded linear function f: H → ℝ may be expressed as an inner product on H. That is, every bounded linear function f: H → ℝ may be expressed in the form f(h) = (h, z) where z ∈ H is uniquely determined by f and has norm ‖z‖ = ‖f‖.

Proof. If f = 0 then let z = 0 and note that f(h) = (h, z) and ‖z‖ = ‖f‖ = 0. Assume that f ≠ 0 and note that z ≠ 0. Let N(f) denote the null space of f; that is, N(f) consists of those points h in H such that f(h) = 0. Note that z ⊥ N(f) since (h, z) = 0 for all h ∈ N(f).

Note that N(f) is a vector space. That is, if u and v are in N(f) then, since f is linear, f(u) + f(v) = f(u + v) = 0. Further, if α is a scalar and if u ∈ N(f) then αu ∈ N(f) since αf(u) = f(αu) = 0. Note also that N(f) is closed since f is a bounded, linear, and hence continuous, map. Thus, since f ≠ 0 it follows that N(f) ≠ H and hence, via Theorem 4.7, that N(f)⊥ ≠ {0}.

Let z₀ be any nonzero element of N(f)⊥, and let v = f(x)z₀ − f(z₀)x for some fixed x ∈ H. Applying f to each side implies that f(v) = f(x)f(z₀) − f(z₀)f(x) = 0. That is, v ∈ N(f). Further, since z₀ ∈ N(f)⊥ it follows that (v, z₀) = 0 = f(x)(z₀, z₀) − f(z₀)(x, z₀) which implies that f(x)‖z₀‖² − f(z₀)(x, z₀) = 0. Solving for f(x) implies that

f(x) = (f(z₀)/‖z₀‖²)(x, z₀)

where we recall that ‖z₀‖² > 0. Finally, we may rewrite f(x) as (x, z) where

z = (f(z₀)/‖z₀‖²) z₀. □
4.4 The Radon-Nikodym Theorem
4.9 Theorem Consider σ-finite measures μ and ν defined on a measurable space (Ω, F) such that any μ-null set is also a ν-null set. There exists an a.e. [μ] unique F-measurable function h: Ω → ℝ such that

ν(F) = ∫_F h dμ

for all F ∈ F.
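On a finite space the theorem is transparent. The sketch below is our own toy illustration (the measures μ and ν are assumptions, with μ positive on every singleton so the null-set hypothesis holds): the density h(ω) = ν({ω})/μ({ω}) recovers ν(F) as an integral against μ.

```python
# Toy Radon-Nikodym derivative on a three-point space: h = dnu/dmu.

omega = ["a", "b", "c"]
mu = {"a": 0.2, "b": 0.3, "c": 0.5}   # mu-null sets are empty here
nu = {"a": 0.1, "b": 0.6, "c": 0.3}   # so nu is absolutely continuous w.r.t. mu

h = {w: nu[w] / mu[w] for w in omega}  # the density dnu/dmu

def nu_from_integral(F):
    """Integral of h over F with respect to mu; should equal nu(F)."""
    return sum(h[w] * mu[w] for w in F)

print(nu_from_integral(["a", "c"]))  # 0.4 = nu({a, c})
```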
4.5 Caveats and Curiosities
5 Probability Theory
5.1 Introduction
Modern probability theory is a branch of measure theory that is distinguished by its special emphasis and applications. Much of the terminology of probability theory was established hundreds of years ago by people such as Pascal, Fermat, Bernoulli, Laplace, and Gauss. While this historical foundation provided much of the current vocabulary used in probability, it did not provide a rigorous mathematical basis for probability theory. Near the end of the nineteenth century, C. S. Peirce, the founder of pragmatism, wrote:

This branch of mathematics [probability] is the only one, I believe, in which good writers frequently get results entirely erroneous. In elementary geometry the reasoning is frequently fallacious, but erroneous conclusions are avoided; but it may be doubted if there is a single extensive treatise on probabilities in existence which does not contain solutions absolutely indefensible. This is partly owing to the want of any regular methods of procedure; for the subject involves too many subtleties to make it easy to put problems into equations without such aid.
At the beginning of the twentieth century measure theory was established primarily through the work of Henri Lebesgue. In 1929 Andrei Kolmogorov developed a measure-theoretical approach to probability theory and established probability theory as a rigorous mathematical theory.¹ Thus, much of the vocabulary of probability theory was established hundreds of years before the vocabulary of measure theory was established. Consequently, many concepts have different names when seen from the perspectives of probability theory and measure theory. For

¹This incident seems to have been overlooked by a large part of the engineering community.
example, an event in probability theory is a measurable set in
measure theory. On the other hand, there are concepts such
as statistical independence in probability theory that have no
analog in measure theory.
5.2 Random Variables and Distributions

Consider a probability space (Ω, F, P); that is, consider a measure space (Ω, F, P) such that P(Ω) = 1. A real-valued F-measurable function defined on Ω is said to be a random variable defined on (Ω, F, P). That is, X is a random variable if X: (Ω, F) → (ℝ, B(ℝ)). Note that X is a function and X(ω) is a real number. Note, also, that if P₁ and P₂ are probability measures defined on (Ω, F), then a random variable X defined on (Ω, F, P₁) is also a random variable defined on (Ω, F, P₂). A random variable X defined on (Ω, F, P) is said to be a bounded random variable if there exists some real number B such that |X(ω)| < B for all ω ∈ Ω.
The probability distribution function of a random variable X is the function F: ℝ → [0, 1] defined by F(x) = P(X ≤ x) where P(X ≤ x) denotes the probability of the event {ω ∈ Ω : X(ω) ≤ x}. (How do we know that this set is an event?) If several random variables are under consideration we may denote the distribution function of X by F_X. A probability distribution function F of a random variable X satisfies the following properties:

1. lim_{x→−∞} F(x) = 0.

2. lim_{x→∞} F(x) = 1.

3. If x < y then F(x) ≤ F(y).

4. F is right continuous.
[In statistics] you have the fact that the concepts are not very clean. The idea of probability, of randomness, is not a clean mathematical idea. You cannot produce random numbers mathematically. They can only be produced by things like tossing dice or spinning a roulette wheel. With a formula, any formula, the number you get would be predictable and therefore not random. So as a statistician you have to rely on some conception of a world where things happen in some way at random, a conception which mathematicians don't have. — Lucien LeCam
5. P(x < X ≤ y) = F(y) − F(x).

6. P(X > x) = 1 − F(x).

7. P(X = x) = F(x) − lim_{y↑x} F(y).

8. P(X = x) = 0 for x ∈ ℝ if and only if F is continuous at x.
Exercise 5.1 Consider a probability distribution function F. Show that lim_{x→−∞} F(x) = 0.

Exercise 5.2 Consider a probability distribution function F. Show that F is right continuous.

Exercise 5.3 Consider a random variable X defined on a probability space (Ω, F, P). Show that P(X = x) = F(x) − lim_{y↑x} F(y) where F is the probability distribution function of X.
Example 5.1 If F is continuous then P(X = x) is zero for each x ∈ ℝ. Does this mean that X cannot take on the value x for any x ∈ ℝ? No. Consider a dart that lands in a circular dart board of unit area in such a way that the probability that the dart lands in any particular circular region of the board is simply the area of that region. Since a single point on the board can be enclosed within a circle of arbitrarily small area it follows that the probability of hitting any particular point is zero. Thus, even though our (idealized) dart will hit a point when thrown, the probability that it will hit that point is zero before it is thrown. □
A random variable is said to be discrete if it takes values only in some countable subset of ℝ. A probability distribution function is said to be atomic if it is continuous except at most a countable number of points and if it is constant between any two adjacent points from the union of the set of discontinuities with {−∞, ∞}. A probability distribution function F is said to be absolutely continuous if

F(x) = ∫_{−∞}^{x} f(t) dt

for some integrable Borel measurable function f. If a function is absolutely continuous then it is continuous, but there do exist continuous functions that are not absolutely continuous. A probability distribution function F is said to be singular if

(d/dx) F(x) = 0 a.e.

with respect to Lebesgue measure. If a distribution function is atomic then it is singular, but there do exist singular distribution functions that are not atomic.
The following corollary relates our definition of a random variable to a definition that is frequently found in introductory texts.

5.1 Corollary Consider a measurable space (Ω, F) and let X be a function mapping Ω to ℝ. It follows that X⁻¹((−∞, x]) ∈ F for each x ∈ ℝ if and only if X⁻¹(B(ℝ)) ⊂ F.

Proof. It follows immediately that if X⁻¹(B(ℝ)) ⊂ F then X⁻¹((−∞, x]) ∈ F for each x ∈ ℝ. Further, using Theorem 1.7 it follows that X⁻¹(B(ℝ)) ⊂ F if X⁻¹((−∞, x]) ∈ F for each x ∈ ℝ. □
5.2 Corollary Any continuous function mapping ℝ to ℝ is Borel measurable.

Proof. Let f: ℝ → ℝ and recall that, for any subset A of ℝ, f⁻¹(Aᶜ) = (f⁻¹(A))ᶜ. Next, assume that f is continuous and recall from Theorem 2.4 that for any open subset U of ℝ, f⁻¹(U) is open. Thus, we see that for a closed subset K of ℝ, f⁻¹(K) is closed. Further, from Corollary 5.1 it follows that f is Borel measurable if and only if for each x ∈ ℝ, f⁻¹((−∞, x]) ∈ B(ℝ). Note that for any x ∈ ℝ, (−∞, x] is a closed set since it is the complement of the open set (x, ∞). Thus, for any x ∈ ℝ, f⁻¹((−∞, x]) is closed and hence is a real Borel set. We thus conclude that f is Borel measurable. □
I told him, I'm a scientist, we're objective. I told him a crash was improbable. I was trying to remember the exact probability when we smashed into the ground. — 27-year-old botanist Wim Kodman trying to calm a friend as their jet flew through turbulence.
5.1 Theorem (The Lebesgue Decomposition Theorem) Any probability distribution function F may be written in the form F(x) = α₁F₁(x) + α₂F₂(x) + α₃F₃(x) where α_i ≥ 0 for each i, where α₁ + α₂ + α₃ = 1, and where

1. F₁ is an atomic probability distribution function,
2. F₂ is an absolutely continuous probability distribution function, and
3. F₃ is a singular, continuous probability distribution function.
Note 5.1 The Cantor-Lebesgue function (which is developed by Example 2.6 on page 54 of Counterexamples in Probability and Real Analysis by Gary Wise and Eric Hall, Oxford University Press, 1993) is an example of a distribution function that is continuous and singular. In particular, it is equal to zero at zero, is equal to one at one, and is nondecreasing and continuous, yet has a derivative that is almost everywhere equal to zero.
Consider a random variable X that possesses a probability distribution function F that is absolutely continuous. There exists a nonnegative Borel measurable function f mapping ℝ to ℝ such that

P(X ∈ A) = ∫_A f(x) dx

for any real Borel set A. Such a function f is called a probability density function of the random variable X and exists if and only if the probability distribution function of X is absolutely continuous. A probability density function f for X is often denoted by f_X. Note that if X possesses an absolutely continuous distribution function F then F is a.e. differentiable with respect to Lebesgue measure and X possesses a probability density function given by the derivative of F at points where the derivative exists and defined to be an arbitrary nonnegative value at points where the derivative does not exist.
Let X be a random variable with an absolutely continuous probability distribution function F and a probability density function f. The function f satisfies the following properties:

1. F(x) = ∫_{(−∞, x]} f(s) ds.

2. ∫_ℝ f(x) dx = 1.

3. P(a ≤ X ≤ b) = P(a < X < b).
Note 5.2 Consider a random variable X defined on a probability space (Ω, 𝓕, P) and the corresponding measure μ_X defined on (ℝ, 𝓑(ℝ)) such that μ_X(B) = P(X ∈ B) for each

²Oxford University Press, 1993.
B ∈ 𝓑(ℝ). If μ_X is absolutely continuous with respect to Lebesgue measure λ defined on (ℝ, 𝓑(ℝ)) then there exists a Radon–Nikodym derivative dμ_X/dλ. A nonnegative version of this Radon–Nikodym derivative is known as a probability density function of X. Note from the Radon–Nikodym Theorem that such a function must be Borel measurable. Thus, there exist nonnegative integrable functions that integrate to one, yet which are not probability density functions.
5.3 Independence

Consider a probability space (Ω, 𝓕, P). Recall that elements of 𝓕 are said to be events. Two events A and B are said to be independent if P(A ∩ B) = P(A)P(B). Consider an index set I and let Aᵢ be an event for each i ∈ I. The sets {Aᵢ : i ∈ I} are mutually independent³ if for every finite collection {i₁, i₂, ..., iₖ} of distinct indices from I it follows that
$$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}) = P(A_{i_1})\, P(A_{i_2}) \cdots P(A_{i_k}).$$
The sets {Aᵢ : i ∈ I} are said to be pairwise independent if P(Aᵢ ∩ Aⱼ) = P(Aᵢ)P(Aⱼ) for all i and j from I with i ≠ j. If the index set I contains only two elements then mutual independence and pairwise independence are equivalent. In general, however, pairwise independence is implied by, but does not imply, mutual independence.
Note 5.3 Consider three events A₁, A₂, and A₃. The following chart illustrates the difference between pairwise independence and mutual independence of the three events.

³Many authors omit the word "mutually," but we prefer to retain it as a way of reinforcing the distinction between mutual independence and pairwise independence.
Pairwise independence requires:
$$P(A_1 \cap A_2) = P(A_1)P(A_2), \quad P(A_1 \cap A_3) = P(A_1)P(A_3), \quad P(A_2 \cap A_3) = P(A_2)P(A_3).$$
Mutual independence requires the three pairwise conditions above together with
$$P(A_1 \cap A_2 \cap A_3) = P(A_1)P(A_2)P(A_3).$$
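The gap between the two notions can be checked by brute force on a four-point sample space. The sketch below (our own illustration, not from the text) uses two fair coin flips: the three events are pairwise independent, yet the product rule fails for the triple intersection.

```python
from fractions import Fraction

# Sample space: two fair coin flips, each outcome with probability 1/4.
omega = ["HH", "HT", "TH", "TT"]
prob = lambda event: Fraction(sum(1 for w in omega if w in event), len(omega))

A1 = {"HH", "HT"}          # first flip is heads
A2 = {"HH", "TH"}          # second flip is heads
A3 = {"HH", "TT"}          # both flips agree

# Pairwise independence holds: P(Ai ∩ Aj) = P(Ai)P(Aj) for all i != j.
pairs = [(A1, A2), (A1, A3), (A2, A3)]
pairwise = all(prob(A & B) == prob(A) * prob(B) for A, B in pairs)

# But the four-way product condition fails, so the three events are not
# mutually independent: P(A1 ∩ A2 ∩ A3) = 1/4 while the product is 1/8.
triple = prob(A1 & A2 & A3) == prob(A1) * prob(A2) * prob(A3)

print(pairwise, triple)  # True False
```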
Consider a probability space (Ω, 𝓕, P). Let 𝓕₁, 𝓕₂, ..., 𝓕ₙ be subsets (not necessarily σ-subalgebras) of 𝓕. (That is, each 𝓕ᵢ is a collection of events.) The collections 𝓕₁, 𝓕₂, ..., 𝓕ₙ are said to be mutually independent if given any A₁ ∈ 𝓕₁, any A₂ ∈ 𝓕₂, ..., and any Aₙ ∈ 𝓕ₙ, it follows that A₁, A₂, ..., Aₙ are mutually independent.
[Cantor's theory] seems to me the most
admirable fruit of the mathematical mind
and indeed one of the highest achievements
of man's intellectual processes .... No one
shall expel us from the paradise which
Cantor has created for us. David Hilbert
Let X be a random variable defined on (Ω, 𝓕, P). We define the σ-algebra generated by X (denoted by σ(X)) to be the smallest σ-subalgebra of 𝓕 with respect to which X is measurable. That is, σ(X) = X⁻¹(𝓑(ℝ)). For a collection X₁, ..., Xₙ of random variables we will let σ(X₁, ..., Xₙ) denote the smallest σ-algebra with respect to which X₁, ..., Xₙ are each measurable. Note that
$$\sigma(X_1, \ldots, X_n) = \sigma\!\left(\sigma(X_1) \cup \cdots \cup \sigma(X_n)\right).$$
Random variables X₁, X₂, ..., Xₙ defined on (Ω, 𝓕, P) are said to be mutually independent if σ(X₁), σ(X₂), ..., σ(Xₙ) are mutually independent collections of events.
5.2 Theorem For an integer n > 1, consider mutually independent random variables X₁, X₂, ..., Xₙ defined on a common probability space. Let m be a positive integer such that m < n. Further, consider two functions f and g such that f : (ℝᵐ, 𝓑(ℝᵐ)) → (ℝ, 𝓑(ℝ)) and g : (ℝⁿ⁻ᵐ, 𝓑(ℝⁿ⁻ᵐ)) → (ℝ, 𝓑(ℝ)). The random variables f(X₁, ..., Xₘ) and g(Xₘ₊₁, ..., Xₙ) are independent.
5.3 Theorem (The Second Borel–Cantelli Lemma) Consider a probability space (Ω, 𝓕, P). If {Aₙ}ₙ∈ℕ is a sequence of mutually independent events and if
$$\sum_{n=1}^{\infty} P(A_n) = \infty,$$
then P(lim sup Aₙ) = 1.
Proof. Since lim sup Aₙ = (lim inf Aₙᶜ)ᶜ and since P(A) + P(Aᶜ) = 1 for any event A, the desired result will follow if we show that P(lim inf Aₙᶜ) = 0. Recall that
$$\liminf A_n^c = \bigcup_{k=1}^{\infty} \bigcap_{m=k}^{\infty} A_m^c.$$
By countable subadditivity it follows that
$$P(\liminf A_n^c) \leq \sum_{k=1}^{\infty} P\!\left(\bigcap_{m=k}^{\infty} A_m^c\right).$$
Thus, the desired result will follow if we show that
$$P\!\left(\bigcap_{m=k}^{\infty} A_m^c\right) = 0$$
for all k ∈ ℕ. Let j ∈ ℕ and note that
$$P\!\left(\bigcap_{n=k}^{k+j} A_n^c\right) = \prod_{n=k}^{k+j} \left(1 - P(A_n)\right)$$
via independence of the Aₙ's (and hence of the Aₙᶜ's). Note that 1 − x ≤ e⁻ˣ for all x ∈ ℝ and, in particular, for all x ∈ [0, 1].
Thus, it follows that
$$P\!\left(\bigcap_{n=k}^{k+j} A_n^c\right) = \prod_{n=k}^{k+j} \left(1 - P(A_n)\right) \leq \prod_{n=k}^{k+j} \exp\!\left(-P(A_n)\right) = \exp\!\left(-\sum_{n=k}^{k+j} P(A_n)\right).$$
Since $\sum_{n=1}^{\infty} P(A_n) = \infty$ we see that
$$\exp\!\left(-\sum_{n=k}^{k+j} P(A_n)\right) \to 0$$
and hence that
$$P\!\left(\bigcap_{n=k}^{k+j} A_n^c\right) \to 0$$
as j → ∞ for any k ∈ ℕ. Since
$$\bigcap_{n=k}^{k+j} A_n^c \to \bigcap_{n=k}^{\infty} A_n^c$$
as j → ∞, the desired result follows from Lemma 2.1. □
Example 5.2 An adaptive communications system transmits blocks of bits where each block contains a fixed number of bits. Let Xₙ equal 1 or 0 depending on whether an error occurs or does not occur in block n, respectively. Assume that the Xₙ's are mutually independent. Further, let pₙ denote the probability that Xₙ = 1. In this example we will derive a condition on the pₙ's that is necessary and sufficient for there to be almost surely only a finite number of errors.

Let Eₙ denote the event that the nth block of data has an error. That is, let Eₙ denote the event that Xₙ = 1. Thus, if Ω is the set of all possible sequences of received bits, then ω ∈ Eₙ if and only if sequence ω contains an error in block n. Note that lim sup Eₙ is the event that infinitely many errors occur.
That is, lim sup Eₙ is the set of ω such that ω ∈ Eₙ for infinitely many different values of n. Thus, our problem is to determine a condition that is both necessary and sufficient to ensure that P(lim sup Eₙ) = 0. If this probability is zero then with probability one there will be only a finite number of errors.

By the first Borel–Cantelli lemma we know that if $\sum_{n=1}^{\infty} p_n < \infty$ then P(lim sup Eₙ) = 0. Further, the second Borel–Cantelli lemma implies that if $\sum_{n=1}^{\infty} P(E_n) = \infty$ then P(lim sup Eₙ) = 1. That is, there will almost surely be infinitely many errors. Thus, it is necessary that $\sum_{n=1}^{\infty} p_n < \infty$ for there to be almost surely only a finite number of errors. Finally, we conclude that $\sum_{n=1}^{\infty} p_n < \infty$ occurs if and only if there are almost surely a finite number of errors. □
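The dichotomy in Example 5.2 can be observed numerically. The sketch below (our own illustration; the block-error probabilities p(n) = 1/n² and p(n) = 1/n are assumed for concreteness) simulates independent blocks: in the summable case the error count stays small, while in the divergent case it grows without bound.

```python
import random

def count_errors(p, n_blocks, rng):
    # Block n has an error with probability p(n), independently of the rest.
    return sum(1 for n in range(1, n_blocks + 1) if rng.random() < p(n))

rng = random.Random(0)
trials, n_blocks = 200, 5000

# sum 1/n^2 < infinity: first Borel-Cantelli -> finitely many errors a.s.
summable = [count_errors(lambda n: 1 / n**2, n_blocks, rng) for _ in range(trials)]
# sum 1/n = infinity: second Borel-Cantelli -> infinitely many errors a.s.
divergent = [count_errors(lambda n: 1 / n, n_blocks, rng) for _ in range(trials)]

# The mean count in the summable case stays near sum 1/n^2 ~ 1.64; in the
# divergent case it grows like the harmonic sum ln(n_blocks) ~ 8.5.
print(sum(summable) / trials, sum(divergent) / trials)
```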
Example 5.3 Although we know that all years are not of equal length, and although we might suspect that all days of the year are not equally likely to be birthdays, we will nevertheless make the simplifying assumptions that all years have 365 days and that each day is equally likely to be a birthday. This example is concerned with the probability of the existence of a common birthday between any two or more people among a given group of people. It seems easier to calculate the probability that each of the birthdays are different. Note that for two people, the probability of no common birthday is given by 1 − (1/365); that is, the first person has some birthday, and the second person then has 364 possible days for a noncommon birthday. Further, for three people, the probability of no common birthday is given by (1 − (1/365))(1 − (2/365)), and, for four people, the probability of no common birthday is given by (1 − (1/365))(1 − (2/365))(1 − (3/365)). Continuing in this way, we see that for n people (where n is a positive integer less than 365), the probability of no common birthday is given by
$$\left(1 - \frac{1}{365}\right)\left(1 - \frac{2}{365}\right)\left(1 - \frac{3}{365}\right) \times \cdots \times \left(1 - \frac{n-1}{365}\right).$$
Checking this numerically, we find that for n = 23, this probability is less than 1/2. Thus, for 23 or more people, the probability that at least two have a common birthday exceeds 1/2. □
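A short numerical check of the birthday computation, under the example's model of 365 equally likely days (the function name is ours):

```python
from math import prod

def p_no_shared_birthday(n, days=365):
    # Product (1 - 1/days)(1 - 2/days)...(1 - (n-1)/days): the probability
    # that n people all have distinct birthdays.
    return prod(1 - k / days for k in range(1, n))

# n = 23 is the smallest group size for which a shared birthday is more
# likely than not.
assert p_no_shared_birthday(22) > 0.5 > p_no_shared_birthday(23)
print(round(p_no_shared_birthday(23), 4))  # about 0.4927
```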
5.4 The Binomial Distribution

Consider a finite sequence of n terms taking the values H and T. Let N(n, k) denote the number of such sequences of length n having exactly k H's. Note that if we know this quantity for sequences of length n − 1 then we see that in sequences of length n, the sequences that have exactly k H's are given by those which have exactly k H's in the first n − 1 terms and a T for the nth term and those sequences that have k − 1 H's in the first n − 1 terms and an H in the nth term. Hence, N(n, k) = N(n − 1, k) + N(n − 1, k − 1). Next, use induction, and assume that
$$N(n, k) = \frac{n!}{k!(n-k)!}.$$
(We use the convention that zero factorial is one.) Assume that this expression is correct for n − 1. Then,
$$N(n, k) = \frac{(n-1)!}{k!(n-1-k)!} + \frac{(n-1)!}{(k-1)!(n-k)!} = \frac{n!}{k!(n-k)!}\left(\frac{n-k}{n} + \frac{k}{n}\right) = \frac{n!}{k!(n-k)!}.$$
Note that for k = 0 or for k = n, it follows straightforwardly that N(n, 0) = 1 and N(n, n) = 1. For n = 1 we have that N(1, 0) = 1 and N(1, 1) = 1. Thus, the general result follows by induction, and we conclude that the number of ways of selecting k items from a set of n items is given by
$$\frac{n!}{k!(n-k)!},$$
which is denoted by
$$\binom{n}{k}$$
and read as "n choose k."
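The recursion N(n, k) = N(n − 1, k) + N(n − 1, k − 1) with the boundary values N(n, 0) = N(n, n) = 1 can be checked directly against the closed form (a sketch; the function name is ours):

```python
from math import comb

def n_choose_k(n, k):
    # N(n, k) = N(n-1, k) + N(n-1, k-1), the recursion derived above.
    if k < 0 or k > n:
        return 0
    if k == 0 or k == n:
        return 1
    return n_choose_k(n - 1, k) + n_choose_k(n - 1, k - 1)

# The recursion agrees with the closed form n!/(k!(n-k)!).
assert all(n_choose_k(n, k) == comb(n, k)
           for n in range(12) for k in range(n + 1))
print(n_choose_k(10, 4))  # 210
```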
He uses statistics as a drunken man uses lampposts, for support rather than illumination. Andrew Lang
In many of the elementary aspects of probability, a sequence of
mutually independent trials whose only outcomes are success or
failure is considered such that the probability of success is fixed
from trial to trial. Such trials are called Bernoulli trials.
Consider a finite sequence of n Bernoulli trials where the probability of success on a trial is given by p. We model the underlying probability space as the set of all sequences of length n consisting of S's and F's. Let q = 1 − p. We assign a sequence of k S's and n − k F's to have probability pᵏqⁿ⁻ᵏ. Now, consider the probability of getting exactly k S's in n trials. Each such sequence has probability pᵏqⁿ⁻ᵏ. Further, there are $\binom{n}{k}$ such sequences. Since probability measures are countably additive, it follows that to find the probability of obtaining exactly k S's in n trials, we simply multiply the common probability of one such sequence by the total number of such sequences. Hence, the probability of obtaining exactly k S's in n trials is given by
$$\binom{n}{k} p^k q^{n-k}.$$
Further, note that the event of having exactly k₁ successes is disjoint from the event of having exactly k₂ successes if k₁ ≠ k₂. Thus, we see that the probability of having no more than r successes in n trials is given by
$$\sum_{k=0}^{r} \binom{n}{k} p^k q^{n-k}.$$
A random variable X taking values in the set {0, 1, ..., n} for some positive integer n such that
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$
for some p ∈ [0, 1] is said to have a binomial distribution with parameters p and n.
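A minimal sketch of the binomial probabilities just derived (the function names are ours):

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k) = C(n, k) p^k (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(r, n, p):
    # Probability of no more than r successes in n trials.
    return sum(binom_pmf(k, n, p) for k in range(r + 1))

# The pmf sums to one over {0, ..., n}, as countable additivity requires.
assert abs(binom_cdf(10, 10, 0.3) - 1.0) < 1e-12
print(binom_pmf(3, 10, 0.5))  # C(10,3)/2^10 = 120/1024
```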
5.4.1 The Poisson Approximation to the Binomial Distribution

Let b(k; n, p) denote the probability of obtaining exactly k successes in n Bernoulli trials where p denotes the probability of success. It is common to deal with a binomial distribution where, relatively speaking, the parameter n is large and the parameter p is small, and yet the product λ = np is positive and of moderate size. In such cases it is often convenient to use an approximation that is due to Poisson.

For k = 0 it follows that
$$b(0; n, p) = (1-p)^n = \left(1 - \frac{\lambda}{n}\right)^n.$$
Taking logarithms and using Taylor's expansion yields
$$\ln b(0; n, p) = n \ln\!\left(1 - \frac{\lambda}{n}\right) = -\lambda - \frac{\lambda^2}{2n} - \cdots.$$
Thus, for large n, it follows that b(0; n, p) ≈ e^(−λ). Alternatively, we could have obtained this result by recalling that for fixed λ,
$$\lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^n = e^{-\lambda}.$$
Also, for any fixed positive integer k, it follows that for sufficiently large n,
$$\frac{b(k; n, p)}{b(k-1; n, p)} = \frac{(n-k+1)\,p}{k(1-p)} \approx \frac{\lambda}{k}.$$
From this we successively conclude that
$$b(1; n, p) \approx \lambda\, b(0; n, p) \approx \lambda e^{-\lambda},$$
and,
$$b(2; n, p) \approx \frac{1}{2}\lambda\, b(1; n, p) \approx \frac{1}{2}\lambda^2 e^{-\lambda}.$$
Induction thus implies that
$$b(k; n, p) \approx \frac{\lambda^k}{k!} e^{-\lambda}.$$
This is the classical Poisson approximation to the binomial distribution.

Let
$$p(k; \lambda) = e^{-\lambda} \frac{\lambda^k}{k!}.$$
We have shown that p(k; λ) is an approximation for b(k; n, p) when n is sufficiently large. Note that
$$\sum_{k=0}^{\infty} p(k; \lambda) = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = 1.$$
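The quality of the approximation for large n and small p can be checked numerically; the parameters n = 1000 and p = 0.002 below are our own choice, giving λ = 2:

```python
from math import comb, exp, factorial

def b(k, n, p):
    # Exact binomial probability of k successes in n Bernoulli trials.
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson(k, lam):
    # Poisson approximation p(k; lambda) = e^-lambda lambda^k / k!
    return exp(-lam) * lam**k / factorial(k)

# n large, p small, lambda = n p of moderate size: the terms agree closely.
n, p = 1000, 0.002
lam = n * p
for k in range(5):
    print(k, round(b(k, n, p), 5), round(poisson(k, lam), 5))
```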
5.5 Multivariate Distributions

For a positive integer n, consider random variables X₁, X₂, ..., Xₙ defined on a probability space (Ω, 𝓕, P). The joint probability distribution function of X₁, ..., Xₙ is the function F : ℝⁿ → [0, 1] defined by F(x₁, ..., xₙ) = P(X₁ ≤ x₁ and ... and Xₙ ≤ xₙ). We will often denote the function F by F_{X₁, ..., Xₙ} when the particular random variables of interest are not clear from context.

The random variables X₁, ..., Xₙ possess a joint probability density function f if there exists a nonnegative Borel measurable function f : ℝⁿ → ℝ such that
$$P\big((X_1, X_2, \ldots, X_n) \in A\big) = \int_A f(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n$$
for all A ∈ 𝓑(ℝⁿ). Note that the integral of f over ℝⁿ is equal to 1.
For a positive integer n consider random variables X₁, ..., Xₙ defined on the same probability space and possessing a joint probability density function f_{X₁, ..., Xₙ}. For any positive integer i ≤ n, the random variable Xᵢ possesses a probability density function f_{Xᵢ} given by
$$f_{X_i}(x_i) = \int_{\mathbb{R}^{n-1}} f_{X_1, \ldots, X_n}(x_1, \ldots, x_n)\, dx_1 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_n.$$
A density function obtained in this way is called a marginal density function.
5.4 Theorem For a positive integer n consider random variables X₁, ..., Xₙ defined on the same probability space. The random variables X₁, ..., Xₙ are mutually independent if and only if F_{X₁,...,Xₙ}(x₁, ..., xₙ) = F_{X₁}(x₁) ⋯ F_{Xₙ}(xₙ) for all x₁, ..., xₙ ∈ ℝ.

5.5 Theorem For a positive integer n consider random variables X₁, ..., Xₙ defined on the same probability space and possessing a joint probability density function f_{X₁,...,Xₙ}. The random variables X₁, ..., Xₙ are mutually independent if and only if f_{X₁,...,Xₙ}(x₁, ..., xₙ) = f_{X₁}(x₁) ⋯ f_{Xₙ}(xₙ) a.e. with respect to Lebesgue measure on 𝓑(ℝⁿ).
A random variable X is said to have a uniform distribution on an interval [a, b] if
$$F_X(x) = \begin{cases} 0 & \text{if } x < a \\[2pt] \dfrac{x-a}{b-a} & \text{if } a \leq x \leq b \\[2pt] 1 & \text{if } x > b. \end{cases}$$
Note that a density function for X is given by
$$f_X(x) = \frac{1}{b-a}\, I_{[a,b]}(x).$$
If we knew that the outcome of an experiment resulted in values from some interval [a, b] but had no reason to believe that those values would tend to concentrate toward any particular part of that interval then we might choose to model the experiment via a uniform distribution.
Example 5.4 In this example we will consider a problem known as Buffon's needle problem, which was an early example of a problem-solving technique called Monte Carlo analysis in which a nonprobabilistic problem is solved using probabilistic techniques. Consider a plane that is ruled by the lines y = n for n ∈ ℤ and onto which a needle of unit length is cast randomly.
What is the probability that the needle intersects one of the ruled lines?

Let (X, Y) denote the coordinates of the center of the needle and let Θ denote the angle between the needle and the x axis. Let Z denote the distance from the needle's center to the nearest line beneath it. Note that Z = Y − ⌊Y⌋ where ⌊x⌋ (the floor of x) denotes the greatest integer not greater than x.

We will model the statement "needle is cast randomly" via the following assumptions:

1. Z is uniformly distributed on [0, 1].
2. Θ is uniformly distributed on [0, π].
3. Z and Θ are independent.

Note that these assumptions imply that
$$f_{Z,\Theta}(z, \theta) = f_Z(z)\, f_\Theta(\theta) = \frac{1}{\pi}\, I_{[0,1]}(z)\, I_{[0,\pi]}(\theta).$$
For what values of z and θ will the needle intersect the line immediately above its center? If z < 1/2 then the needle cannot intersect the line above its center. Assume then that 1/2 ≤ z ≤ 1. In this case the needle intersects the line directly above its center if and only if θ₀ ≤ θ ≤ π − θ₀ where θ₀ = sin⁻¹(2(1 − z)). Thus, the probability that the needle intersects the line above it is given by
$$\frac{1}{\pi} \int_{1/2}^{1} \int_{\sin^{-1}(2(1-z))}^{\pi - \sin^{-1}(2(1-z))} d\theta\, dz
= \frac{1}{2} - \frac{2}{\pi} \int_{0}^{1/2} \sin^{-1}(2y)\, dy
= \frac{1}{2} - \frac{2}{\pi} \left[\, y \sin^{-1}(2y) + \frac{1}{2}\sqrt{1 - 4y^2}\, \right]_0^{1/2}
= \frac{1}{\pi}.$$
By symmetry the needle has the same probability of hitting the line directly beneath its center. Thus, the probability that the needle hits any line on the grid is given by 2/π.
Note that this experiment can be used to obtain an estimate of the numerical value of π. That is, throw the needle N times and count the number of times H that the needle hits a line. The ratio 2N/H should be close to π for large values of N. Indeed, we will show later that this ratio converges to π. Solving a deterministic problem via probabilistic techniques is an example of a technique known as Monte Carlo simulation. □
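The needle-throwing experiment can be sketched in a few lines under the same model (Z uniform on [0, 1], Θ uniform on [0, π], independent); the function name is ours:

```python
import math
import random

def buffon_estimate(n_throws, rng):
    # Z: distance from needle center to the line below, uniform on [0, 1).
    # Theta: needle angle, uniform on [0, pi). A unit needle crosses the
    # nearer ruled line iff its vertical half-extent (1/2) sin(theta)
    # reaches that line, i.e. min(z, 1 - z) <= (1/2) sin(theta).
    hits = 0
    for _ in range(n_throws):
        z, theta = rng.random(), rng.random() * math.pi
        if min(z, 1 - z) <= 0.5 * math.sin(theta):
            hits += 1
    return 2 * n_throws / hits  # the ratio 2N/H from the text

rng = random.Random(42)
print(buffon_estimate(200_000, rng))  # approaches pi as N grows
```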
5.6 Carathéodory Extension Theorem

Let Ω be a nonempty set, and let 𝓐 be an algebra of subsets of Ω. That is, 𝓐 is a nonempty set of subsets of Ω that is closed under the operations of taking complements and finite unions. Recall that it follows from DeMorgan's Law that an algebra is also closed under the operation of taking finite intersections. Further, recall that an algebra on Ω contains both the empty set and the set Ω.

By a measure λ on an algebra 𝓐 we mean a function λ defined on 𝓐 and taking values in [0, ∞] that satisfies the following two properties:

1. λ(∅) = 0 and, for A ∈ 𝓐, λ(A) ≥ 0.
2. If {Aₙ}ₙ∈ℕ is a sequence of disjoint sets in 𝓐 whose union ⋃ₙ∈ℕ Aₙ also belongs to 𝓐, then
$$\lambda\!\left(\bigcup_{n \in \mathbb{N}} A_n\right) = \sum_{n=1}^{\infty} \lambda(A_n).$$

Note that when a countable union of disjoint sets in the algebra is itself in the algebra, we require that the measure on the algebra behave as a countably additive measure would.
Let Ω be a nonempty set. A function M defined on 𝓟(Ω) and taking values in [0, ∞] is called an outer measure if it satisfies the following three properties:

1. M(A) ≥ 0 for all A ∈ 𝓟(Ω), and M(∅) = 0.
2. M(A) ≤ M(B) if A ⊂ B ⊂ Ω.
3. $M\!\left(\bigcup_{k=1}^{\infty} A_k\right) \leq \sum_{k=1}^{\infty} M(A_k)$ for any sequence {Aₖ}ₖ∈ℕ of subsets of Ω.

As an example of an outer measure, note that Lebesgue outer measure is an outer measure on 𝓟(ℝ). Further, note that Dirac measure at a fixed point of a set is an outer measure on the family of all subsets of the set of interest.

As with outer Lebesgue measure, it is possible to use an outer measure to characterize a family of measurable sets. In doing so, we base the definition of measurability on Carathéodory's condition. For a given outer measure M, we say that a subset S of Ω is measurable if M(A) = M(A ∩ S) + M(A \ S) for any subset A of Ω. This condition has somewhat of an artificial touch to it. It almost seems mysterious, since it is not in the least intuitive. Indeed, it singles out the subsets S of Ω which, when split by any subset of Ω, result in two subsets of Ω for which the outer measure adds. Note that a subset S of Ω is measurable if and only if M(A₁ ∪ A₂) = M(A₁) + M(A₂) whenever A₁ ⊂ S and A₂ ⊂ Sᶜ.
Note that it follows from property (3) of an outer measure that M(A) ≤ M(A ∩ S) + M(A \ S). Hence, we see that a subset S of Ω is measurable if and only if, for any subset A of Ω, it follows that M(A) ≥ M(A ∩ S) + M(A \ S). Now, it follows almost immediately that if M is an outer measure on 𝓟(Ω) and if Z is a subset of Ω such that M(Z) = 0, then Z is measurable. That is, let Z be such a set and let A be any subset of Ω. Then we have that M(A ∩ Z) + M(A \ Z) ≤ M(Z) + M(A) by property (2) of outer measures. Then, since M(Z) = 0, we have that M(A ∩ Z) + M(A \ Z) ≤ M(A), which characterizes Z as being measurable since it is always true that M(A ∩ Z) + M(A \ Z) ≥ M(A) by property (3) of outer measures.
If M is an outer measure on the subsets of Ω, and if A is a measurable set, then M(A) is called the M-measure, or simply the measure, of A. This terminology is justified by the next theorem. Before presenting that theorem, however, we will present a lemma that will be of use in proving the theorem of interest.

5.1 Lemma Let Ω be a nonempty set and let M be an outer measure on the subsets of Ω. If A₁ and A₂ are measurable, then so is A₁ \ A₂.
Proof. We will show that M(A ∪ B) = M(A) + M(B) whenever A ⊂ (A₁ \ A₂) and B ⊂ (A₁ \ A₂)ᶜ. Since B = (B ∩ A₂) ∪ (B \ A₂), it follows that A ∪ B = (A ∪ (B \ A₂)) ∪ (B ∩ A₂). Hence, since A ∪ (B \ A₂) ⊂ A₂ᶜ and (B ∩ A₂) ⊂ A₂, it follows from the measurability of A₂ that M(A ∪ B) = M(A ∪ (B \ A₂)) + M(B ∩ A₂). However, A ⊂ A₁ and (B \ A₂) ⊂ (A₁ \ A₂)ᶜ \ A₂ ⊂ A₁ᶜ. Therefore, since A₁ is measurable, M(A ∪ (B \ A₂)) = M(A) + M(B \ A₂). Combining equalities and using the measurability of A₂ we see that M(A ∪ B) = M(A) + M(B \ A₂) + M(B ∩ A₂) = M(A) + M(B), and the lemma is proved. □
As before, let Ω be a nonempty set. If S is a subset of Ω, then any family of subsets of Ω whose union contains S as a subset is known as a cover of S. A countable cover of S is a cover of S that is countable.
5.6 Theorem Let Ω be a nonempty set. Let M be an outer measure on the subsets of Ω.

1. The family of M-measurable subsets of Ω forms a σ-algebra on Ω.
2. If {Aₖ}ₖ∈ℕ is a sequence of disjoint measurable sets then
$$M\!\left(\bigcup_{k=1}^{\infty} A_k\right) = \sum_{k=1}^{\infty} M(A_k).$$
More generally, for any subset A of Ω, it follows that
$$M\!\left(A \cap \bigcup_{k=1}^{\infty} A_k\right) = \sum_{k=1}^{\infty} M(A \cap A_k)$$
and
$$M(A) = \sum_{k=1}^{\infty} M(A \cap A_k) + M\!\left(A \setminus \bigcup_{k=1}^{\infty} A_k\right).$$
Proof. Let {Aₖ}ₖ∈ℕ be a sequence of disjoint measurable subsets of Ω. Let E = ⋃ₖ∈ℕ Aₖ and, for each positive integer j, let Eⱼ = ⋃ₖ₌₁ʲ Aₖ. We will show that
$$M(A) = \sum_{k=1}^{j} M(A \cap A_k) + M(A \setminus E_j).$$
The proof will proceed by induction on j. For j = 1, the result follows from the measurability of A₁. Now, assuming that the result holds for j − 1, it follows that
$$M(A) = M(A \cap A_j) + M(A \setminus A_j) = M(A \cap A_j) + \sum_{k=1}^{j-1} M\big((A \setminus A_j) \cap A_k\big) + M\big((A \setminus A_j) \setminus E_{j-1}\big).$$
Recalling that the Aₖ's are disjoint, it follows that (A \ Aⱼ) ∩ Aₖ = A ∩ Aₖ for k ≤ j − 1. Therefore, since (A \ Aⱼ) \ Eⱼ₋₁ = A \ Eⱼ, it follows that
$$M(A) = \sum_{k=1}^{j} M(A \cap A_k) + M(A \setminus E_j),$$
as required. This completes the proof of the previous claim.

Next, since Eⱼ ⊂ E, it follows that M(A \ Eⱼ) ≥ M(A \ E). Using this fact with the above result and considering the limit as j → ∞, we see that
$$M(A) \geq \sum_{k=1}^{\infty} M(A \cap A_k) + M(A \setminus E) \geq M(A \cap E) + M(A \setminus E).$$
However, we also have that M(A) ≤ M(A ∩ E) + M(A \ E). Therefore, E is measurable, and
$$M(A) = \sum_{k=1}^{\infty} M(A \cap A_k) + M(A \setminus E).$$
If we replace A with A ∩ E in this equation we see that
$$M(A \cap E) = \sum_{k=1}^{\infty} M(A \cap A_k),$$
and the proof of (2) is complete.

Note that we have also shown that a countable union of disjoint measurable sets is measurable. To prove (1), we must show that a countable union of arbitrary measurable sets is measurable. Returning now to the proof of (1), it follows from Lemma 5.1 and the fact that Ω is measurable that the complement of a measurable set is also measurable. Moreover, since E₁ ∪ E₂ = (E₁ᶜ \ E₂)ᶜ, it follows that E₁ ∪ E₂ is measurable if E₁ and E₂ are measurable. Therefore, any finite union of measurable sets is measurable. Next, let {Eₖ}ₖ∈ℕ be a sequence of measurable sets. If, for each positive integer j, Bⱼ = ⋃ₖ₌₁ʲ Eₖ, then
$$\bigcup_{k=1}^{\infty} E_k = B_1 \cup (B_2 \setminus B_1) \cup (B_3 \setminus B_2) \cup \cdots.$$
Since the Bⱼ's are measurable and nondecreasing, the terms on the right are measurable and disjoint. Thus, by the case already considered, it follows that ⋃ₖ₌₁^∞ Eₖ is measurable. This completes the proof of the theorem. □
A measure μ on an algebra 𝓐 is said to be σ-finite (with respect to 𝓐) if Ω can be written as Ω = ⋃ₖ∈ℕ Ωₖ where for each positive integer k, Ωₖ ∈ 𝓐 and μ(Ωₖ) < ∞. For example, Lebesgue measure is σ-finite on the algebra generated by the intervals (a, b].

Let Ω be a nonempty set, and let 𝓐 be an algebra on Ω. If μ is a measure on the algebra 𝓐, we define the outer extension μ* of μ as follows: For any subset A of Ω,
$$\mu^*(A) = \inf \sum_{k=1}^{\infty} \mu(A_k),$$
where the infimum is taken over all countable covers of A by sets in 𝓐. Note that it is always possible to find such a cover of A
since Ω itself belongs to 𝓐. The fact that 𝓐 is an algebra allows us to assume without loss of generality that the sets Aₖ are disjoint. We will make this assumption throughout the remainder of the section.
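On a small finite space the outer extension and Carathéodory's condition can be computed exhaustively. The toy algebra and measure below are our own choice, not from the text; the computation confirms that, for this μ*, exactly the sets of the algebra satisfy Carathéodory's condition.

```python
from itertools import chain, combinations

omega = frozenset({0, 1, 2, 3})
# A toy algebra on omega and a measure mu on it (assumed for illustration).
algebra = [frozenset(), frozenset({0, 1}), frozenset({2, 3}), omega]
mu = {frozenset(): 0, frozenset({0, 1}): 1, frozenset({2, 3}): 2, omega: 3}

def subsets(s):
    s = list(s)
    return (frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r))

def mu_star(a):
    # Outer extension: infimum of sum mu(A_k) over covers of a by algebra
    # sets (on a finite space, every cover is finite).
    best = float("inf")
    for cover in subsets(algebra):
        if a <= frozenset(chain.from_iterable(cover)):
            best = min(best, sum(mu[c] for c in cover))
    return best

def measurable(s):
    # Caratheodory's condition, checked against every subset a of omega.
    return all(mu_star(a) == mu_star(a & s) + mu_star(a - s)
               for a in subsets(omega))

# Exactly the four algebra sets pass Caratheodory's condition here; e.g.
# {0} fails because mu_star({0,1}) = 1 < mu_star({0}) + mu_star({1}) = 2.
print(sorted(sorted(s) for s in subsets(omega) if measurable(s)))
```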
5.2 Lemma Let Ω be a nonempty set. If 𝓐 is an algebra on Ω and if μ is a measure on 𝓐 then the outer extension μ* of μ is an outer measure.

Proof. Note that μ*(∅) = 0 since ∅ ∈ 𝓐, and μ*(A) ≥ 0 for any subset A of Ω. If A₁ and A₂ are two subsets of Ω such that A₁ ⊂ A₂, then any countable cover of A₂ by sets in 𝓐 is also a countable cover of A₁ by sets in 𝓐. Thus, we see that μ*(A₁) ≤ μ*(A₂). Now, let {Aₖ}ₖ∈ℕ be any sequence of subsets of Ω. We wish to show that
$$\mu^*\!\left(\bigcup_{k \in \mathbb{N}} A_k\right) \leq \sum_{k=1}^{\infty} \mu^*(A_k).$$
Let ε be a positive real number. For each positive integer k, there is a countable covering of Aₖ by sets {Aⱼₖ} from 𝓐 such that
$$\sum_{j \in \mathbb{N}} \mu(A_{jk}) \leq \mu^*(A_k) + \frac{\varepsilon}{2^k},$$
since μ*(Aₖ) is defined as an infimum. Now, since ⋃ₖ∈ℕ Aₖ ⊂ ⋃ⱼ∈ℕ ⋃ₖ∈ℕ Aⱼₖ, it follows that
$$\mu^*\!\left(\bigcup_{k \in \mathbb{N}} A_k\right) \leq \sum_{k \in \mathbb{N}} \sum_{j \in \mathbb{N}} \mu(A_{jk}) \leq \sum_{k=1}^{\infty} \mu^*(A_k) + \varepsilon,$$
and, since ε > 0 may be chosen arbitrarily close to zero, the desired result follows. □
5.7 Theorem (Carathéodory Extension Theorem) Let 𝓐 be an algebra on a nonempty set Ω. If λ is a measure on 𝓐, let λ* be the corresponding outer measure, and let 𝓐* be the σ-algebra of λ*-measurable sets. Then

1. the restriction of λ* to 𝓐* is an extension of λ;
2. if λ is σ-finite with respect to 𝓐, and if 𝓢 is any σ-algebra with 𝓐 ⊂ 𝓢 ⊂ 𝓐*, then λ* is the only measure on 𝓢 that is an extension of λ.
Proof. Let A ∈ 𝓐. Then clearly λ*(A) ≤ λ(A). On the other hand, given disjoint sets {Aₖ : k ∈ ℕ} in 𝓐 that cover A, let Aₖ′ = Aₖ ∩ A. Then Aₖ′ ∈ 𝓐 and A is the disjoint union of the Aₖ′'s. Hence λ(A) = Σₖ∈ℕ λ(Aₖ′). Since Aₖ′ ⊂ Aₖ, it follows that λ(A) ≤ Σₖ∈ℕ λ(Aₖ). Therefore, λ(A) ≤ λ*(A), and the proof of (1) is complete.

To prove (2), which states the uniqueness of the extension, let μ be any measure on the σ-algebra 𝓢, where 𝓐 ⊂ 𝓢 ⊂ 𝓐*, that agrees with λ on 𝓐. Given a set E ∈ 𝓢, consider any countable collection {Eₖ} such that E ⊂ ⋃ₖ∈ℕ Eₖ and such that each Eₖ ∈ 𝓐. Then
$$\mu(E) \leq \sum_{k \in \mathbb{N}} \mu(E_k) = \sum_{k \in \mathbb{N}} \lambda(E_k).$$
Therefore, by definition of λ*, it follows that μ(E) ≤ λ*(E). To show that equality holds, first suppose that there exists a set A ∈ 𝓐 with E ⊂ A and λ(A) < ∞. Applying what has just been proved to A \ E, which belongs to 𝓢, we see that μ(A \ E) ≤ λ*(A \ E). However,
$$\mu(E) + \mu(A \setminus E) = \mu(A) = \lambda^*(A) = \lambda^*(E) + \lambda^*(A \setminus E).$$
Since each of these terms is finite (due to the fact that λ(A) is finite) it follows that μ(E) = λ*(E) in this case.

In the general case, since λ is σ-finite, there exist disjoint Aₖ ∈ 𝓐 such that the Aₖ's cover Ω and such that λ(Aₖ) < ∞. We may apply the result above to each E ∩ Aₖ (which is a subset of Aₖ) to show that μ(E ∩ Aₖ) = λ*(E ∩ Aₖ). By summing over k, we see that μ(E) = λ*(E), and this completes the proof. □
The next result follows as a consequence of the Carathéodory Extension Theorem.

5.8 Theorem Let F : ℝ → [0, 1] be a probability distribution function and let μ₀((a, b]) = F(b) − F(a) for −∞ ≤ a < b. Then there is a unique extension of μ₀ to a measure μ on 𝓑(ℝ) such that μ(I) < ∞ for any bounded interval I.
Consider a random variable X defined on a probability space (Ω, 𝓕, P). The distribution or law of X is the probability measure μ_X on (ℝ, 𝓑(ℝ)) defined by μ_X(A) = P(X ∈ A) = P({ω ∈ Ω : X(ω) ∈ A}) for each A ∈ 𝓑(ℝ). We say that μ_X is the measure on (ℝ, 𝓑(ℝ)) induced by X. Note that F_X(x) = μ_X((−∞, x]) and that μ_X = P ∘ X⁻¹.
5.9 Theorem If F is a nondecreasing, right-continuous real-valued function defined on ℝ then there exists a unique measure μ on (ℝ, 𝓑(ℝ)) such that μ((a, b]) = F(b) − F(a) for all a ≤ b.

The measure μ corresponding to the function F in Theorem 5.9 is said to be the measure on (ℝ, 𝓑(ℝ)) induced by F and is obtained via Theorem 5.8. If F_X is the distribution function of a random variable X then the measure on (ℝ, 𝓑(ℝ)) induced by F_X is equal to the measure on (ℝ, 𝓑(ℝ)) induced by X.
5.10 Theorem If F is any probability distribution function then there exists on some probability space a random variable X such that F_X = F.

Proof. Let μ be the measure on (ℝ, 𝓑(ℝ)) induced by F and define a random variable X on the probability space (ℝ, 𝓑(ℝ), μ) by letting X(ω) = ω for each ω ∈ ℝ. The distribution function F_X of X is given by F_X(x) = μ({ω : X(ω) ≤ x}) = μ((−∞, x]). From Theorem 5.9 we know that the measure μ is such that μ((a, b]) = F(b) − F(a). In particular, μ((−∞, x]) = F(x) and hence F(x) = F_X(x). □
The clearer the teacher makes it, the worse it is for you. You must work things out for yourself and make the ideas your own. William Osgood
Many questions about a random variable X may be answered based only on the distribution function of X. That is, to answer such questions we do not need to know the probability space on which X is defined. Instead, we may simply take the distribution function F_X and use Theorem 5.10 to define a random variable Y on a probability space (ℝ, 𝓑(ℝ), μ) where μ is the measure on (ℝ, 𝓑(ℝ)) induced by F_X. Any question about X that depends only upon F_X will have the same answer if we ask it about the random variable Y instead. Thus, we will often say "let X be a random variable with distribution function F_X" and make no reference to the underlying probability space on which X is defined.
The following result establishes a link between the concept of measurability and the existence of a functional relation. In particular, this result will place on firm footing the engineering concept of a data processor.

5.11 Theorem Consider a collection {X₁, ..., Xₙ} of random variables defined on a probability space (Ω, 𝓕, P). A random variable X defined on this space is measurable with respect to σ(X₁, ..., Xₙ) if and only if there exists a Borel measurable function f : ℝⁿ → ℝ such that X(ω) = f(X₁(ω), ..., Xₙ(ω)) for all ω ∈ Ω.
5.7 Expectation

If X is a random variable defined on (Ω, 𝓕, P) then the expected value of X is denoted by E[X] and is defined by
$$E[X] = \int_\Omega X\, dP,$$
provided the integral exists. If g : (ℝ, 𝓑(ℝ)) → (ℝ, 𝓑(ℝ)) then
$$E[g(X)] = \int_\Omega g(X)\, dP.$$
A random variable X for which E[X] exists and is finite is said to be integrable or to have a finite mean or to be a first order random variable.
5.12 Theorem Consider an integrable random variable X and a Borel measurable function g : ℝ → ℝ. If F_X is the distribution function of X and if μ_X is the measure on (ℝ, 𝓑(ℝ)) induced by X then
$$E[g(X)] = \int_{\mathbb{R}} g(x)\, d\mu_X(x) = \int_{\mathbb{R}} g(x)\, dF_X(x).$$
Further, if X possesses a density function f_X then
$$E[g(X)] = \int_{\mathbb{R}} g(x) f_X(x)\, dx.$$
Example 5.5 In this example we will find a rather simple expectation using three different methods in order to illustrate some of the concepts that we have been considering.

Let X be a random variable defined on a probability space (Ω, 𝓕, P) with distribution function
$$F(x) = \begin{cases} 0 & \text{if } x < -2 \\[2pt] \frac{1}{2} & \text{if } -2 \leq x < 3 \\[2pt] 1 & \text{if } x \geq 3. \end{cases}$$
Note that P(X = −2) = P(X = 3) = 1/2. What is E[X² + 1]?

Method I: We will first find the expectation via a Lebesgue integral over Ω with respect to P. Let A = {ω ∈ Ω : X(ω) = −2} and let B = {ω ∈ Ω : X(ω) = 3} and note that Ω \ (A ∪ B) is a P-null set. Further, note that
$$\int_\Omega (X^2 + 1)\, dP = \int_A (X^2 + 1)\, dP + \int_B (X^2 + 1)\, dP = \int_A (4 + 1)\, dP + \int_B (9 + 1)\, dP = 5P(A) + 10P(B) = \frac{15}{2}.$$
The following result will be used by Method II.
5.3 Lemma If
$$G(x) = \begin{cases} \alpha & \text{if } x < y \\ \beta & \text{if } x \geq y \end{cases}$$
where β > α and if h : ℝ → ℝ is continuous at y then
$$\int_{\mathbb{R}} h(x)\, dG(x) = (\beta - \alpha)\, h(y).$$
Proof. Consider a subdivision Γ = {a₀, ..., aₙ} of an interval [a, b] such that a < y < b and assume that aⱼ > y > aⱼ₋₁. Recall the notation we introduced during our derivation of the Riemann–Stieltjes integral. The desired result follows since
$$R(\Gamma) = \sum_{i=1}^{n} h(b_i)\big(G(a_i) - G(a_{i-1})\big) = h(b_j)(\beta - \alpha)$$
and since h(bⱼ) → h(y) as |Γ| → 0. □
Method II: We will next express E[X² + 1] as a Riemann–Stieltjes integral over ℝ with respect to F. Let 1 > ε > 0 and note that
$$\int_{\mathbb{R}} (x^2 + 1)\, dF(x) = \int_{[-2-\varepsilon,\, -2+\varepsilon)} (x^2 + 1)\, dF(x) + \int_{[3-\varepsilon,\, 3+\varepsilon)} (x^2 + 1)\, dF(x) = \frac{1}{2}(4 + 1) + \frac{1}{2}(9 + 1) = \frac{15}{2}.$$
Method III: Finally, we will express E[X² + 1] as a Lebesgue integral over ℝ with respect to the measure μ_X on (ℝ, 𝓑(ℝ)) induced by X. First, note that
$$\mu_X(A) = P(X \in A) = \begin{cases} 1 & \text{if } -2 \in A \text{ and } 3 \in A \\ 1/2 & \text{if } -2 \in A \text{ and } 3 \in A^c \\ 1/2 & \text{if } -2 \in A^c \text{ and } 3 \in A \\ 0 & \text{if } -2 \in A^c \text{ and } 3 \in A^c \end{cases}$$
for A ∈ 𝓑(ℝ). Note that ℝ \ {−2, 3} is a μ_X-null set. Thus, it follows that
$$E[X^2 + 1] = \int_{\{-2\}} (x^2 + 1)\, d\mu_X + \int_{\{3\}} (x^2 + 1)\, d\mu_X = (4 + 1)\mu_X(\{-2\}) + (9 + 1)\mu_X(\{3\}) = \frac{15}{2}. \qquad \Box$$
5.4 Lemma Consider a random variable X that takes values only in some countable set {x₁, x₂, ...} and a function g : ℝ → ℝ such that g(X) is integrable. It follows that

    E[g(X)] = Σᵢ₌₁^∞ g(xᵢ) P(X = xᵢ).
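For a discrete random variable, Lemma 5.4 reduces expectation to a weighted sum over the support. A minimal sketch in Python (the function name is ours; the two-point distribution is the one from Example 5.5):

```python
def expect_g(pmf, g):
    """E[g(X)] for a discrete X given as {x_i: P(X = x_i)} (Lemma 5.4)."""
    return sum(g(x) * p for x, p in pmf.items())

# X from Example 5.5: P(X = 2) = P(X = 3) = 1/2.
pmf = {2: 0.5, 3: 0.5}
value = expect_g(pmf, lambda x: x**2 + 1)
print(value)  # 7.5, matching the 15/2 found by all three methods
```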
Example 5.6 This example is known as the St. Petersburg Paradox. Consider the following game. A fair coin is flipped until a tail appears; we win $2^k if it appears on the kth toss. Let the random variable X denote our winnings. What is E[X]? That is, how much should we be required to "put up" in order to make the game fair? Note that X takes on the value 2^k with probability 2^−k; i.e., the probability that we toss k − 1 heads and then toss one tail. Thus,

    E[X] = Σ_{k=1}^∞ 2^k 2^{−k} = ∞.

The paradox arises since most people would "expect" their winnings to be much less. The problem arises from our inability to put in perspective the very small probabilities of winning very large amounts. The problem returns a much more realistic value if we assign a maximum amount that can be won; that is, if we are allowed to "break the bank" when we reach a preassigned level. □
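A quick simulation illustrates the effect of capping the bank: the empirical mean of the capped game is finite and grows only logarithmically with the cap. This is an illustrative sketch, not part of the text; the cap values and seeds are arbitrary.

```python
import random

def st_petersburg(cap, trials=100_000, seed=0):
    """Average winnings per game when the bank caps payoffs at `cap`."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        k = 1                      # toss number on which the first tail appears
        while rng.random() < 0.5:  # heads: keep flipping
            k += 1
        total += min(2 ** k, cap)
    return total / trials

# With a cap of 2**m the expected (capped) winnings are m + 1 dollars,
# so a "fair" entry fee grows only logarithmically with the bank.
for cap in (2 ** 4, 2 ** 10, 2 ** 20):
    print(cap, st_petersburg(cap))
```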
Example 5.7 As part of a reliability study, a total of n items are tested. Suppose that each item has an exponential failure time distribution given by

    F_{Tᵢ}(t) = 1 − e^{−λt}

for t > 0, where Tᵢ is a random variable that denotes the time at which the ith item fails and where λ is a fixed positive constant. Note that if λ is large then we expect the item to fail quickly. Assume that the Tᵢ's are mutually independent. (Is this a good assumption?) Let T denote the time at which the first failure occurs. What is the expected value of T? Note that T exceeds
some positive time t if and only if Tᵢ > t for each i. Thus, for t > 0,

    P(T > t) = P(T₁ > t, T₂ > t, ..., Tₙ > t)
             = P(T₁ > t)P(T₂ > t) ⋯ P(Tₙ > t)
             = ∫_t^∞ λe^{−λt₁} dt₁ ∫_t^∞ λe^{−λt₂} dt₂ ⋯ ∫_t^∞ λe^{−λtₙ} dtₙ
             = e^{−λt} ⋯ e^{−λt} = e^{−nλt}.

From this we see that F_T(t) = 1 − e^{−nλt} for t > 0. Recall from the fundamental theorem of calculus that if a probability distribution function is differentiable then that derivative is a probability density function corresponding to that distribution. Thus, f_T(t) = nλe^{−nλt} for t > 0, from which it follows that

    E[T] = ∫_0^∞ t f_T(t) dt = 1/(nλ),

where we have used the fact that ∫_0^∞ y e^{−y} dy = 1. Note that the expected time of the first failure decreases as either n or λ increases. □
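A Monte Carlo check of Example 5.7 (a sketch; the sample size, seed, and rate are arbitrary choices of ours): the minimum of n independent Exponential(λ) failure times should average about 1/(nλ).

```python
import random

rng = random.Random(42)
n, lam, trials = 5, 2.0, 200_000

# T = min(T_1, ..., T_n); theory: E[T] = 1 / (n * lam)
mean_T = sum(min(rng.expovariate(lam) for _ in range(n))
             for _ in range(trials)) / trials
print(mean_T, 1 / (n * lam))  # both close to 0.1
```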
5.8 Useful Inequalities

Let X be a random variable defined on (Ω, F, P). If k ∈ ℕ then E[X^k] is called the kth moment of X and E[(X − E[X])^k] is called the kth central moment of X. The first moment of X is called the mean of X and the second central moment of X is called the variance of X and is denoted by σ², σ²_X, or by VAR[X]. The standard deviation of X is denoted by σ_X and is given by the nonnegative square root of the variance of X. A random variable with a finite second moment is said to be a second order random variable.
5.13 Theorem If k > 0 and if E[X^k] is finite then E[X^j] is finite when 0 < j < k.
Proof. Note that E[X^j] is finite if and only if E[|X|^j] is finite. Further, note that

    E[|X|^j] = ∫_Ω |X|^j dP
             = ∫_{|X|^j < 1} |X|^j dP + ∫_{|X|^j ≥ 1} |X|^j dP
             ≤ ∫_{|X|^j < 1} 1 dP + ∫_{|X|^j ≥ 1} |X|^k dP
             ≤ P({|X|^j < 1}) + E[|X|^k] < ∞.

Thus, if the kth moment is finite then all lower moments are also finite. □
Exercise 5.4 The density function

    f(x) = 1 / (π(1 + x²))

for x ∈ ℝ is called a Cauchy density function. Let X be a random variable with density function f. Show that none of the odd moments of X exists and that none of the even moments of X is finite.
Exercise 5.5 Although the first moment of a random variable need not exist, the second moment of a random variable always exists. Why?
Exercise 5.6 Show that if X is a second order random variable then

    VAR[X] = E[X²] − (E[X])².
5.14 Theorem Consider a positive integer n and let X₁, ..., Xₙ be mutually independent random variables defined on (Ω, F, P). If Xᵢ ≥ 0 for each i or if E[|Xᵢ|] < ∞ for each i then E[X₁ ⋯ Xₙ] exists and is equal to E[X₁] ⋯ E[Xₙ].
5.1 Inequality (Hölder) If 1 < p < ∞, 1 < q < ∞, and 1/p + 1/q = 1, then

    E[|XY|] ≤ (E[|X|^p])^{1/p} (E[|Y|^q])^{1/q}.
5.2 Inequality (Minkowski) If p ≥ 1, then

    (E[|X + Y|^p])^{1/p} ≤ (E[|X|^p])^{1/p} + (E[|Y|^p])^{1/p}.
The following inequality is a special case of Hölder's inequality.
5.3 Inequality (Cauchy–Schwarz) E[|XY|] ≤ √(E[X²]) √(E[Y²]).
5.4 Inequality (Chebyshev) If α > 0 then

    P(|X − E[X]| ≥ α) ≤ VAR[X] / α².
Example 5.8 Consider again Buffon's needle problem from Section 5.5 on page 84 and recall that the random variable Y = H/N provides an estimate of 2/π where H denotes the number of times the needle hits a line after N drops. Note that

    P(H = h) = (N choose h) (2/π)^h (1 − 2/π)^{N−h}

for h = 0, 1, ..., N, where we have used the binomial distribution from Section 5.4 on page 80. Thus, E[Y] = 2/π and VAR[Y] = (1/N)(2/π)(1 − 2/π). What value of N ensures that |Y − (2/π)| < 0.01 with probability 0.999? Chebyshev's inequality implies that such will be true if

    (1/((1/100)² N))(2/π)(1 − 2/π) ≤ 0.001.

This inequality holds when N > 2,313,350. The dedicated reader is invited to verify this result empirically. □
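The bound in Example 5.8 can be checked by direct computation. A sketch (variable names are ours): solve (1/(ε²N))·(2/π)(1 − 2/π) ≤ δ for N with ε = 0.01 and δ = 0.001.

```python
import math

p = 2 / math.pi                 # hit probability for one drop
eps, delta = 0.01, 0.001        # accuracy and allowed failure probability
var_one = p * (1 - p)           # variance of a single Bernoulli drop

# Chebyshev: P(|Y - p| >= eps) <= var_one / (N * eps**2) <= delta
N = math.ceil(var_one / (delta * eps**2))
print(N)  # about 2.31 million, consistent with N > 2,313,350
```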
Recall that a function Φ : ℝ → ℝ is said to be convex if Φ(λx + (1 − λ)y) ≤ λΦ(x) + (1 − λ)Φ(y) whenever 0 ≤ λ ≤ 1. A sufficient condition for Φ to be convex is that it have a nonnegative second derivative.
5.5 Inequality (Jensen) If Φ is convex on an interval containing the range of X then Φ(E[X]) ≤ E[Φ(X)]. Note that letting Φ(x) = x² implies that (E[X])² ≤ E[X²].
5.6 Inequality (Lyapounov) If 0 < α ≤ β then

    (E[|X|^α])^{1/α} ≤ (E[|X|^β])^{1/β}.
Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.  H. G. Wells
Let X and Y be random variables with finite means and assume that E[XY] is also finite. The covariance of X and Y is denoted by COV[X, Y] and is defined to be COV[X, Y] = E[(X − E[X])(Y − E[Y])]. Note that COV[X, Y] = E[XY] − E[X]E[Y], also. The random variables X and Y are said to be uncorrelated if COV[X, Y] = 0; that is, if E[XY] = E[X]E[Y]. Note that if X and Y are independent (and if E[X], E[Y], and E[XY] are finite) then X and Y are uncorrelated. If the variances σ²_X and σ²_Y of X and Y are finite and nonzero then the correlation coefficient between X and Y is denoted by ρ(X, Y) and is defined by

    ρ(X, Y) = COV[X, Y] / (σ_X σ_Y).
5.15 Theorem If X₁, ..., Xₙ are second order random variables then

    VAR[X₁ + ⋯ + Xₙ] = Σᵢ₌₁ⁿ VAR[Xᵢ] + 2 Σ_{i<j} COV[Xᵢ, Xⱼ].
5.3 Corollary If X₁, ..., Xₙ are second order, uncorrelated random variables (that is, if COV[Xᵢ, Xⱼ] = 0 when i ≠ j) then

    VAR[X₁ + ⋯ + Xₙ] = Σᵢ₌₁ⁿ VAR[Xᵢ].
5.4 Corollary If X₁, ..., Xₙ are second order, mutually independent random variables then

    VAR[X₁ + ⋯ + Xₙ] = Σᵢ₌₁ⁿ VAR[Xᵢ].
5.9 Transformations of Random Variables

5.16 Theorem If X and Y have a joint probability density function f_{X,Y}(x, y) then the random variable Z = X + Y possesses a density function given by

    f_Z(z) = ∫_{−∞}^{∞} f_{X,Y}(x, z − x) dx.
Proof. Let A_z = {(x, y) ∈ ℝ² : x + y ≤ z} and note that

    P(Z ≤ z) = ∬_{A_z} f_{X,Y}(x, y) dy dx
             = ∫_{−∞}^{∞} ∫_{−∞}^{z−x} f_{X,Y}(x, y) dy dx
             = ∫_{−∞}^{∞} ∫_{−∞}^{z} f_{X,Y}(x, s − x) ds dx
             = ∫_{−∞}^{z} ∫_{−∞}^{∞} f_{X,Y}(x, s − x) dx ds.

Thus, we have found a nonnegative function f_Z(s) such that

    P(Z ≤ z) = ∫_{−∞}^{z} f_Z(s) ds

for all z ∈ ℝ. It follows by definition that f_Z is a probability density function for Z. □
5.5 Corollary If X and Y are independent random variables possessing density functions f_X and f_Y, respectively, then the random variable Z = X + Y possesses a probability density function given by

    f_Z(z) = ∫_ℝ f_X(x) f_Y(z − x) dx.
Note that this density for Z is the convolution of f_X and f_Y. For example, if X and Y are independent random variables each with a uniform distribution on [0, 1] then X + Y has a triangular distribution on [0, 2]. A proof of the following result will be supplied by Example 5.10.
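The convolution of two Uniform[0, 1] densities can be checked numerically. A sketch using a Riemann-sum approximation of the convolution integral (the grid size is an arbitrary choice of ours):

```python
# Discretize f_X = f_Y = 1 on [0, 1] and approximate
# f_Z(z) = integral of f_X(x) f_Y(z - x) dx by a Riemann sum.
h = 0.001
xs = [i * h for i in range(int(1 / h))]

def f_uniform(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def f_Z(z):
    return h * sum(f_uniform(x) * f_uniform(z - x) for x in xs)

# Triangular density on [0, 2]: f_Z(z) = z for 0 <= z <= 1, 2 - z for 1 <= z <= 2.
print(f_Z(0.5), f_Z(1.0), f_Z(1.5))
```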
5.17 Theorem If X and Y possess a joint probability density function f_{X,Y}(x, y) then the random variable B = XY possesses a probability density function given by

    f_B(b) = ∫_ℝ f_{X,Y}(b/t, t) (1/|t|) dt.
5.18 Theorem Consider a random variable X that possesses a probability density function f_X and a function g : ℝ → ℝ that possesses a differentiable inverse. The random variable Y = g(X) possesses a probability density function given by

    f_Y(y) = f_X(g⁻¹(y)) |d/dy g⁻¹(y)|.
Example 5.9 Consider a random variable X with probability density function f_X and let g(x) = ax + b for a, b ∈ ℝ with a ≠ 0. Let Y = g(X) and note that g⁻¹(x) = (x − b)/a. Thus, Theorem 5.18 implies that

    f_Y(y) = f_X(g⁻¹(y)) |d/dy g⁻¹(y)|
           = f_X((y − b)/a) |d/dy ((y − b)/a)|
           = f_X((y − b)/a) (1/|a|).  □
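Theorem 5.18 can be sanity-checked on Example 5.9 with a concrete choice (ours, not the text's): X uniform on [0, 1] and Y = 2X + 1, so Y should be uniform on [1, 3] with density 1/2.

```python
a, b = 2.0, 1.0

def f_X(x):                      # density of Uniform[0, 1]
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def f_Y(y):                      # Theorem 5.18: f_X((y - b)/a) / |a|
    return f_X((y - b) / a) / abs(a)

print(f_Y(2.0), f_Y(0.5))  # 0.5 inside [1, 3], 0.0 outside
```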
5.19 Theorem Consider random variables X and Y that possess a joint probability density function f_{X,Y}(x, y). Consider functions g : ℝ² → ℝ and h : ℝ² → ℝ for which there exist functions α : ℝ² → ℝ and β : ℝ² → ℝ such that α(g(x, y), h(x, y)) = x and β(g(x, y), h(x, y)) = y and such that the partial derivatives of α and β each exist. The random variables B = g(X, Y) and T = h(X, Y) possess a joint probability density function given by

    f_{B,T}(b, t) = f_{X,Y}(α(b, t), β(b, t)) |det [ ∂α/∂b  ∂α/∂t ; ∂β/∂b  ∂β/∂t ]|.
Example 5.10 As an example we will prove Theorem 5.17. Let g(x, y) = xy and h(x, y) = y. Let α(b, t) = b/t and β(b, t) = t, and note that α(g(x, y), h(x, y)) = x and β(g(x, y), h(x, y)) = y as desired. Let B = XY and T = Y. Using the previous result it follows that

    f_{B,T}(b, t) = f_{X,Y}(b/t, t) |det [ 1/t  −b/t² ; 0  1 ]|
                  = f_{X,Y}(b/t, t) (1/|t|).

Thus, it follows that

    f_B(b) = ∫_ℝ f_{B,T}(b, t) dt = ∫_ℝ f_{X,Y}(b/t, t) (1/|t|) dt

as claimed. □
5.10 Moment Generating and Characteristic Functions

The moment generating function of a random variable X is defined to be M_X(s) = E[e^{sX}] for all s ∈ ℝ for which the expectation is finite, provided that M_X(s) is finite in some nonempty open interval containing the origin.
5.20 Theorem The moment generating function of a bounded random variable exists.

Proof. Let X be a bounded random variable, and note that e^{sX} is bounded as well for each fixed value of s. Thus, E[e^{sX}] exists for each fixed s and, by the dominated convergence theorem, is a continuous function of s. □
5.21 Theorem Consider a random variable X for which the moment generating function M_X(s) exists. The function M_X satisfies the following properties:

1. M_X(s) = Σ_{k=0}^∞ s^k E[X^k] / k!.⁴
5.22 Theorem If X and Y are independent random variables possessing moment generating functions M_X and M_Y, respectively, then the sum X + Y possesses a moment generating function that is given by M_{X+Y}(s) = M_X(s)M_Y(s).
5.23 Theorem Consider two random variables X and Y possessing moment generating functions M_X and M_Y, respectively. The random variables X and Y have the same distribution if and only if M_X = M_Y.
⁴This result is known as Taylor's Theorem.
In the space of one hundred and seventy-six years the Lower Mississippi has shortened itself two hundred and forty-two miles. That is an average of a trifle over one mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in the Old Oolitic Silurian Period, just a million years ago next November, the Lower Mississippi River was upward of one million three hundred thousand miles long, and stuck out over the Gulf of Mexico like a fishing-rod. And by the same token any person can see that seven hundred and forty-two years from now the Lower Mississippi will be only a mile and three-quarters long, and Cairo and New Orleans will have joined their streets together, and be plodding comfortably along under a single mayor and a mutual board of aldermen. There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.  Mark Twain
Example 5.11 A random variable X is said to have a Poisson distribution with parameter λ > 0 if

    P(X = k) = (λ^k / k!) e^{−λ}

for each nonnegative integer k. Note that for such a random variable X, the moment generating function M_X exists and is given by

    M_X(s) = E[e^{sX}] = Σ_{k=0}^∞ e^{sk} (λ^k / k!) e^{−λ} = e^{−λ} Σ_{k=0}^∞ (λe^s)^k / k! = exp(λ(e^s − 1)),

where we have recalled that the Taylor's series expansion for e^z is given by

    e^z = Σ_{k=0}^∞ z^k / k!.

Now, assume that X and Y are independent random variables each with a Poisson distribution with parameter λ. What is the distribution of X + Y? Using Theorem 5.22 we see that

    M_{X+Y}(s) = M_X(s)M_Y(s) = exp(2λ(e^s − 1)),

and hence from Theorem 5.23 it follows that X + Y is Poisson with parameter 2λ. □
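The conclusion of Example 5.11 can be verified directly on the probability mass functions: convolving two Poisson(λ) pmfs should reproduce Poisson(2λ). A sketch (the truncation level K and the value of λ are arbitrary choices of ours):

```python
import math

def poisson_pmf(lam, k):
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam, K = 1.5, 40   # truncate the support at K terms

# pmf of X + Y by discrete convolution of two Poisson(lam) pmfs
conv = [sum(poisson_pmf(lam, j) * poisson_pmf(lam, k - j) for j in range(k + 1))
        for k in range(K)]

err = max(abs(conv[k] - poisson_pmf(2 * lam, k)) for k in range(K))
print(err)  # numerically zero
```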
A problem with the moment generating function is that it need
not exist and hence is difficult to use in a general setting. The
characteristic function defined below shares many of the same
properties as the moment generating function yet always exists.
In nonprobabilistic contexts, a moment generating function is
similar to a Laplace transform and a characteristic function is
similar to a Fourier transform.
The characteristic function of a random variable X is the function Φ_X : ℝ → ℂ defined by

    Φ_X(t) = E[e^{itX}] = E[cos(tX)] + iE[sin(tX)].
For the characteristic functions of several common distributions,
see Table 5.1.
5.24 Theorem A characteristic function Φ_X exists for any random variable X and it possesses the following properties:

1. |Φ_X(t)| ≤ Φ_X(0) = 1 for all t ∈ ℝ.

2. If E[|X|^k] < ∞ then Φ_X^{(k)}(0) = i^k E[X^k].

3. Φ_X(−t) is the complex conjugate of Φ_X(t).

4. Φ_X(t) is real-valued if and only if F_X is symmetric; that is, if and only if ∫_B dF_X(x) = ∫_{−B} dF_X(x) for any real Borel set B where −B = {−x : x ∈ B}. (Note that a random variable with a symmetric, absolutely continuous probability distribution function possesses an even probability density function.)
5.25 Theorem Distinct probability distributions correspond to distinct characteristic functions.
5.26 Theorem If X and Y are independent random variables then

    Φ_{X+Y}(t) = Φ_X(t)Φ_Y(t).
5.27 Theorem If a, b ∈ ℝ and if Y = aX + b then

    Φ_Y(t) = e^{itb} Φ_X(at).
5.28 Theorem (Continuity Property) Suppose that {Fₙ}_{n∈ℕ} is a sequence of probability distribution functions with corresponding characteristic functions {φₙ : n ∈ ℕ}. If there exists a probability distribution function F such that Fₙ(x) → F(x) at each point x where F is continuous then φₙ(t) → φ(t) for all t, where φ is the characteristic function of F. Conversely, if φ(t) = limₙ→∞ φₙ(t) exists and is continuous at t = 0 then φ is the characteristic function of some probability distribution function F and Fₙ(x) → F(x) at each point x where F is continuous.
5.11 The Gaussian Distribution

A random variable X is said to be a Gaussian random variable or to possess a Gaussian distribution if X has a probability density function of the form

    f_X(x) = (1/√(2πσ²)) exp(−(x − m)² / (2σ²))

for all x ∈ ℝ where m ∈ ℝ and σ² > 0 are fixed parameters. To indicate that X has such a distribution we write X ~ N(m, σ²).⁵ As we will see, the mean of X is m and the variance of X is σ². Note that these two parameters completely specify the Gaussian
⁵Some texts refer to the Gaussian distribution as the Normal distribution. The "N" in our notation comes from this latter terminology.
Table 5.1 Common Characteristic Functions

    Distribution   f_X(x)                    Φ_X(t)
    Uniform        1_(0,1)(x)                (e^{it} − 1)/(it)
    Exponential    e^{−x} 1_(0,∞)(x)         1/(1 − it)
    Laplace        (1/2)e^{−|x|}             1/(1 + t²)
    Cauchy         1/(π(1 + x²))             e^{−|t|}
    Gaussian       (1/√(2π)) e^{−x²/2}       e^{−t²/2}
distribution of X. If X ~ N(0, 1) (i.e., if X is Gaussian with zero mean and unit variance), then we say that X is a standard Gaussian random variable or that X has a standard Gaussian distribution.
5.29 Theorem If X ~ N(m, σ²) then X possesses a moment generating function given by

    M_X(t) = exp(σ²t²/2 + tm).

Proof. Note that

    M_X(t) = ∫_ℝ e^{tx} (1/√(2πσ²)) exp(−(x − m)²/(2σ²)) dx
           = exp(σ²t²/2 + tm) ∫_ℝ (1/√(2πσ²)) exp(−(x − (m + σ²t))²/(2σ²)) dx,

where the final integral equals 1 since the integrand is a Gaussian density with mean σ²t + m and variance σ². □
Example 5.12 The moment generating function may now be used to confirm that if X ~ N(m, σ²) then E[X] = m and VAR[X] = σ². Note that M′_X(t) = (m + σ²t)M_X(t) and that M″_X(t) = σ²M_X(t) + (m + σ²t)²M_X(t). Thus, E[X] = M′_X(0) = m and E[X²] = M″_X(0) = σ² + m², which implies that VAR[X] = E[X²] − (E[X])² = σ² as expected. □
5.30 Theorem If a random variable X has a N(m, σ²) distribution then the random variable

    W = (X − m)/σ

is a standard Gaussian random variable.

Proof. Note that

    F_W(w) = P(W ≤ w) = P((X − m)/σ ≤ w) = P(X ≤ σw + m).

Thus, it follows that

    F_W(w) = ∫_{−∞}^{σw+m} f_X(x) dx = ∫_{−∞}^{w} (1/√(2π)) exp(−y²/2) dy  with y = (x − m)/σ.

Thus, we see that W has a standard Gaussian distribution. □
Note 5.4 If

    Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp(−t²/2) dt

then, for x ≥ 0,

    Φ(x) = 1 − (1/2)(1 + d₁x + d₂x² + d₃x³ + d₄x⁴ + d₅x⁵ + d₆x⁶)⁻¹⁶ + ε(x)

where |ε(x)| < 1.5 × 10⁻⁷ and where

    d₁ = 0.0498673470
    d₂ = 0.0211410061
    d₃ = 0.0032776263
    d₄ = 0.0000380036
    d₅ = 0.0000488906
    d₆ = 0.0000053830.
Further, if Φ(x) = 1 − p for 0 < p ≤ 1/2 then

    x = t − (c₀ + c₁t + c₂t²)/(1 + q₁t + q₂t² + q₃t³) + ε(p)

where |ε(p)| < 4.5 × 10⁻⁴, where t = √(ln(1/p²)), and where

    c₀ = 2.515517
    c₁ = 0.802853
    c₂ = 0.010328
    q₁ = 1.432788
    q₂ = 0.189269
    q₃ = 0.001308.
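The polynomial approximation in Note 5.4 is easy to implement and to compare against an exact evaluation of Φ via the error function. A sketch (the grid of test points is an arbitrary choice of ours):

```python
import math

D = (0.0498673470, 0.0211410061, 0.0032776263,
     0.0000380036, 0.0000488906, 0.0000053830)

def phi_approx(x):
    """Note 5.4 approximation of the standard Gaussian CDF, valid for x >= 0."""
    poly = 1.0 + sum(d * x ** (i + 1) for i, d in enumerate(D))
    return 1.0 - 0.5 * poly ** -16

def phi_exact(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

worst = max(abs(phi_approx(x) - phi_exact(x))
            for x in [i * 0.01 for i in range(701)])
print(worst)  # within the stated tolerance of 1.5e-7
```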
5.12 The Bivariate Gaussian Distribution

Two random variables X and Y are said to possess a bivariate or joint Gaussian distribution if they possess a joint probability density function of the form

    f_{X,Y}(x, y) = (1/(2πσ₁σ₂√(1 − ρ²))) exp(−q(x, y)/2)

with

    q(x, y) = (1/(1 − ρ²)) [ ((x − m₁)/σ₁)² − 2ρ((x − m₁)/σ₁)((y − m₂)/σ₂) + ((y − m₂)/σ₂)² ]

where σ₁ > 0, σ₂ > 0, m₁ ∈ ℝ, m₂ ∈ ℝ, and |ρ| < 1. Our notation for this distribution is N(m₁, m₂, σ₁², σ₂², ρ). Such random variables X and Y are said to be jointly Gaussian or mutually Gaussian.
Exercise 5.7 For X and Y as above, show that X ~ N(m₁, σ₁²) and that Y ~ N(m₂, σ₂²).

Exercise 5.8 For X and Y as above, show that the correlation coefficient of X and Y is ρ.
5.31 Theorem Let X and Y have a N(m₁, m₂, σ₁², σ₂², ρ) distribution. The random variables X and Y are independent if and only if ρ = 0. That is, mutually Gaussian random variables X and Y are independent if and only if they are uncorrelated.

Proof. We have already seen on page 101 that if the two random variables are independent then they are uncorrelated. To see that, in this case, if the random variables are uncorrelated then they are independent, simply let ρ = 0 and note that f_{X,Y}(x, y) = f_X(x)f_Y(y). □
5.32 Theorem If X and Y possess a bivariate Gaussian distribution then X + Y is a Gaussian random variable.
5.13 Multivariate Gaussian Distributions

A collection {X₁, ..., Xₙ} of random variables is said to possess a multivariate Gaussian distribution (or to be jointly Gaussian or mutually Gaussian) if they possess a joint probability density function of the form

    f_{X₁,...,Xₙ}(x₁, ..., xₙ) = (1/√((2π)ⁿ det Σ)) exp[ −(x − m)ᵀ Σ⁻¹ (x − m) / 2 ]

where x = [x₁ ... xₙ]ᵀ, m = [m₁ ... mₙ]ᵀ, and Σ is a symmetric positive definite matrix. Recall that a matrix N is symmetric if N = Nᵀ and that a real symmetric matrix is positive definite if all of its eigenvalues are positive. It follows easily that E[Xᵢ] = mᵢ for i = 1, ..., n and that COV[Xᵢ, Xⱼ] = σᵢⱼ where

    Σ = [ σ₁₁ σ₁₂ ... σ₁ₙ
          σ₂₁ σ₂₂ ... σ₂ₙ
          ...
          σₙ₁ σₙ₂ ... σₙₙ ].

The matrix Σ is called the covariance matrix of X₁, ..., Xₙ. We denote such a distribution for X = [X₁, ..., Xₙ]ᵀ by writing X ~ N(m, Σ).
Except for boolean algebra there is no theory more universally employed in mathematics than linear algebra; and there is hardly any theory which is more elementary, in spite of the fact that generations of professors and textbook writers have obscured its simplicity by preposterous calculations with matrices.  J. Dieudonné
5.33 Theorem If a collection of Gaussian random variables are mutually independent then they are mutually Gaussian.

5.34 Theorem If a collection of mutually Gaussian random variables are (pairwise) uncorrelated then they are mutually independent.

5.35 Theorem If X = [X₁, ..., Xₙ]ᵀ has a N(μ, Σ) distribution with Σ positive definite, if C is an m × n real matrix with rank⁶ m ≤ n, and if b is an m × 1 real vector, then CX + b has a N(Cμ + b, CΣCᵀ) distribution and CΣCᵀ is positive definite.

⁶The rank of a matrix is the number of linearly independent rows (or columns) in the matrix. The matrix C in this theorem can have more columns than rows but the rows must be linearly independent.
5.36 Theorem If X = [X₁, ..., Xₙ]ᵀ is composed of mutually Gaussian, positive variance random variables, then there exists a nonsingular n × n real matrix C such that Z = CX is a random vector composed of mutually independent positive variance Gaussian random variables.
Example 5.13 Let X₁ and X₂ be mutually Gaussian random variables with zero mean, unit variances, and correlation coefficient 1/2. Let

    [Z₁; Z₂] = [c₁ c₂; c₃ c₄][X₁; X₂] = [c₁X₁ + c₂X₂; c₃X₁ + c₄X₂].

Note that Z₁ and Z₂ are mutually Gaussian. Thus, for Z₁ and Z₂ to be independent we require that E[Z₁Z₂] = E[Z₁]E[Z₂] = 0. Let c₁ = c₃ = 1, let c₂ = 0, and note that E[Z₁Z₂] = E[X₁(X₁ + c₄X₂)] = E[X₁²] + c₄E[X₁X₂]. Note that E[X₁²] = 1 and E[X₁X₂] = 1/2. Thus, Z₁ and Z₂ are independent if c₄ = −2. □
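The choice c₄ = −2 in Example 5.13 can be confirmed with a covariance computation: if Z = CX then the covariance matrix of Z is CΣCᵀ (Theorem 5.35 with b = 0). A sketch with plain lists (no linear-algebra library assumed):

```python
# Covariance of (X1, X2): unit variances, correlation 1/2.
Sigma = [[1.0, 0.5],
         [0.5, 1.0]]
C = [[1.0, 0.0],    # Z1 = X1
     [1.0, -2.0]]   # Z2 = X1 - 2*X2  (c1 = c3 = 1, c2 = 0, c4 = -2)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

C_T = [list(row) for row in zip(*C)]
cov_Z = matmul(matmul(C, Sigma), C_T)
print(cov_Z)  # off-diagonal entries are 0, so Z1 and Z2 are uncorrelated
```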
Example 5.14 Let X₁, ..., Xₙ be random variables possessing a joint probability density function given by
Thus, since any subset of {X₁, ..., Xₙ} containing n − 1 random variables is composed of mutually independent standard Gaussian random variables, it follows that any proper subset of {X₁, ..., Xₙ} containing at least two random variables is also composed of mutually independent standard Gaussian random variables. However, it is clear that the random variables in {X₁, ..., Xₙ} are neither mutually independent nor mutually Gaussian. This example points out the dangers that arise when one attempts to show that a collection of random variables is jointly Gaussian. □
5.14 Convergence of Random Variables

Consider a probability space (Ω, F, P) and a sequence {Xₙ}_{n∈ℕ} of random variables defined on that space. In this section we will consider several ways in which the elements in this sequence may converge.

5.14.1 Pointwise Convergence

Consider a probability space (Ω, F, P) and a sequence {Xₙ}_{n∈ℕ} of random variables defined on that space. We say that the Xₙ's converge pointwise to a random variable X defined on (Ω, F, P) if |Xₙ(ω) − X(ω)| → 0 as n → ∞ for each ω ∈ Ω. In such a case we write Xₙ → X.
5.14.2 Almost Sure Convergence

In a probabilistic context, a condition that holds almost everywhere with respect to the underlying probability measure of interest is said to hold almost surely (written a.s.) or with probability one (written wp1).
Consider a probability space (Ω, F, P) and a sequence {Xₙ}_{n∈ℕ} of random variables defined on that space. We say that the Xₙ's converge almost surely to a random variable X defined on (Ω, F, P) if there exists a set E ∈ F such that P(E) = 0 and such that |Xₙ(ω) − X(ω)| → 0 as n → ∞ for each ω ∈ Eᶜ. In such a case we write Xₙ → X a.s.
5.37 Theorem Consider a probability space (Ω, F, P) and a sequence {Xₙ}_{n∈ℕ} of random variables defined on that space. If X is a random variable defined on (Ω, F, P) such that

    Σ_{n=1}^∞ E[(Xₙ − X)²] < ∞

then Xₙ → X a.s.
Example 5.15 Consider the probability space given by ([0, 1], B([0, 1]), λ) where B([0, 1]) denotes the collection of real Borel subsets of [0, 1] and λ is Lebesgue measure on B([0, 1]). Define random variables Xₙ for n ∈ ℕ on this space via

    Xₙ(ω) = 0  if ω ∈ [0, 1] ∩ ℚᶜ
            n  if ω ∈ [0, 1] ∩ ℚ.

Note that Xₙ(ω) → ∞ as n → ∞ for all ω ∈ [0, 1] ∩ ℚ. Even so, [0, 1] ∩ ℚ is countable and hence is a Lebesgue null set. Further, off the set [0, 1] ∩ ℚ we see that Xₙ → 0 as n → ∞. Thus, we conclude that Xₙ → 0 a.s. Note that this also follows from Theorem 5.37 since

    Σ_{n=1}^∞ E[(Xₙ − 0)²] = Σ_{n=1}^∞ n² λ([0, 1] ∩ ℚ) = 0,

which is finite. □
5.14.3 Convergence in Probability

Consider a probability space (Ω, F, P) and a sequence {Xₙ}_{n∈ℕ} of random variables defined on that space. We say that Xₙ converges in probability to a random variable X defined on (Ω, F, P) if for each ε > 0, P(|Xₙ − X| ≥ ε) → 0 as n → ∞. In such a case we write Xₙ →ᴾ X.
5.38 Theorem Consider a probability space (Ω, F, P) and a sequence {Xₙ}_{n∈ℕ} of random variables defined on that space. Let X be a random variable also defined on (Ω, F, P). If Xₙ → X a.s. then Xₙ →ᴾ X. That is, convergence in probability is weaker than almost sure convergence.
Example 5.16 Consider a sequence of mutually independent random variables {Xₙ}_{n∈ℕ} such that

    P(Xₙ = α) = 1/n      if α = 1
                1 − 1/n  if α = 0.

Let ε > 0 and note that

    P(|Xₙ − 0| ≥ ε) = P(Xₙ ≥ ε) = 0    if ε > 1
                                  1/n  if 0 < ε ≤ 1.

Thus, P(|Xₙ| ≥ ε) → 0 as n → ∞ for any ε > 0, which implies that Xₙ →ᴾ 0. Does Xₙ → 0 a.s.? See Problem 11.1. □
5.14.4 Convergence in Lp

Consider a probability space (Ω, F, P), and let p be a positive real number. Let Lp(Ω, F, P) denote the set of all random variables defined on (Ω, F, P) whose pth absolute moment is finite, where we agree to identify any two random variables that are equal almost surely. (The pth absolute moment of a random variable X is E[|X|ᵖ].)

Consider a probability space (Ω, F, P) and a sequence {Xₙ}_{n∈ℕ} of random variables defined on that space such that Xₙ ∈ Lp(Ω, F, P) for some fixed p > 0. We say that the Xₙ's converge in Lp (or in the pth mean) to a random variable X ∈ Lp(Ω, F, P) if E[|Xₙ − X|ᵖ] → 0 as n → ∞. In such a case we write Xₙ → X in Lp. If p = 1 then Lp convergence is sometimes called convergence in mean. If p = 2 then Lp convergence is sometimes called convergence in mean-square and we often write Xₙ →^{m.s.} X.
5.39 Theorem Consider a probability space (Ω, F, P) and a sequence {Xₙ}_{n∈ℕ} of random variables defined on that space. Let
After passing through several rooms in a museum filled with the paintings of a rather well-known modern painter, [Zygmund] mused, "Mathematics and art are quite different. We could not publish so many papers that used repeatedly the same idea and still command the respect of our colleagues."  Ronald Coifman and Robert Strichartz writing about Antoni Zygmund
X be a random variable also defined on (Ω, F, P). If there exists some p > 0 such that Xₙ → X in Lp then Xₙ →ᴾ X. That is, convergence in probability is weaker than convergence in Lp.
Exercise 5.9 Does the converse to Theorem 5.39 hold?

Exercise 5.10 Construct a sequence of random variables that does not converge pointwise to zero at any point yet does converge to zero in Lp for any p > 0.

Exercise 5.11 Show by an example that almost sure convergence need not imply convergence in Lp.
5.14.5 Convergence in Distribution

A sequence {Xₙ}_{n∈ℕ} of random variables is said to converge in distribution or converge in law to a random variable X if the sequence {F_{Xₙ}}_{n∈ℕ} of distribution functions converges to F_X(x) at all points x where F_X is continuous. In such a case we write Xₙ →ᴸ X. Note that these random variables need not be defined on the same probability space.
Table 5.2 Relations Between Types of Convergence

    Relation                              Reference
    Xₙ → X in Lp  ⇏  Xₙ → X a.s.          Exercise 5.10
    Xₙ → X a.s.   ⇏  Xₙ → X in Lp         Exercise 5.11
    Xₙ → X a.s.   ⇒  Xₙ →ᴾ X              Theorem 5.38
    Xₙ →ᴾ X       ⇏  Xₙ → X a.s.          Example 5.16
    Xₙ → X in Lp  ⇒  Xₙ →ᴾ X              Theorem 5.39
    Xₙ →ᴾ X       ⇏  Xₙ → X in Lp         Problem 11.3
    Xₙ →ᴾ X       ⇒  Xₙ →ᴸ X              Theorem 5.40
    Xₙ →ᴸ X       ⇏  Xₙ →ᴾ X              Example 5.17
5.40 Theorem Consider a probability space (Ω, F, P) and a sequence {Xₙ}_{n∈ℕ} of random variables defined on that space. Let X be a random variable also defined on (Ω, F, P). If Xₙ →ᴾ X then Xₙ →ᴸ X.
Example 5.17 Let X take on the values 0 and 1 each with probability 1/2, and let Xₙ = X for each n ∈ ℕ. Let Y = 1 − X. Note that Xₙ →ᴸ Y since F_{Xₙ} = F_X = F_Y for all n ∈ ℕ even though |Xₙ − Y| = 1 for each n ∈ ℕ. □
Table 5.2 summarizes the relationships between the different
types of convergence that we have considered.
5.15 The Central Limit Theorem

The Central Limit Theorem states that the sum of many independent random variables will be approximately Gaussian if each term in the sum has a high probability of being small. A key word in that description is "approximately." Nowhere does the Central Limit Theorem state that anything actually has a Gaussian distribution, except perhaps in a limit. In engineering applications, Gaussian assumptions are often justified by appeals to the Central Limit Theorem. Such appeals are, however, often at best simply not properly supported and at worst are simply specious. We must always keep in mind that the Central Limit Theorem is not a magic wand that can make anything have a Gaussian distribution.
5.41 Theorem (Central Limit Theorem) Suppose that {Xₙ}_{n∈ℕ} is a mutually independent sequence of identically distributed random variables each with mean m and finite positive variance σ². If Sₙ = X₁ + ⋯ + Xₙ then

    (Sₙ − nm)/(σ√n) →ᴸ Z

where Z is a standard Gaussian random variable.

Proof. (Sketch) Let m = 0. Let φ be the characteristic function of Xₙ and note that Sₙ/(σ√n) has characteristic function [φ(t/(σ√n))]ⁿ. Since the Xᵢ's have a finite variance, Taylor's theorem implies that

    φ(t) = 1 − σ²t²/2 + β(t)

where β(t)/t² → 0 as t → 0. (Recall from calculus that

    (1 − t²/(2n))ⁿ → exp(−t²/2)

as n → ∞.) Thus, it follows that the characteristic function of Sₙ/(σ√n) converges to exp(−t²/2), the characteristic function of a standard Gaussian random variable, as n → ∞. The desired result follows from Theorem 5.28 on page 108. □
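A simulation makes the theorem concrete while respecting its caveats: only the standardized sum is approximately Gaussian, and only for reasonably large n. A sketch with uniform summands (the sample sizes and seed are arbitrary choices of ours):

```python
import math
import random

rng = random.Random(7)
n, trials = 50, 20_000
m, var = 0.5, 1.0 / 12.0          # mean and variance of Uniform[0, 1]

def standardized_sum():
    s = sum(rng.random() for _ in range(n))
    return (s - n * m) / math.sqrt(n * var)

zs = [standardized_sum() for _ in range(trials)]
# Compare P(Z <= 0) and P(Z <= 1) with the standard Gaussian values.
frac0 = sum(z <= 0.0 for z in zs) / trials
frac1 = sum(z <= 1.0 for z in zs) / trials
print(frac0, frac1)  # near 0.5 and 0.8413
```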
Note 5.5 A sequence of mutually independent, identically distributed, second order random variables exists such that the convergence rate associated with the Central Limit Theorem can be arbitrarily slow. For details about this result, see the article "A Lower Bound for the Convergence Rate in the Central Limit Theorem" by V. K. Matskyavichyus in the Theory of Probability and its Applications, 1983, Vol. 28, No. 3, pp. 596-601. This result calls into question the standard engineering claim that the sum of a few dozen random variables is always approximately Gaussian.
5.16 Laws of Large Numbers

Consider n mutually independent tossings of a coin with constant probability p of turning up heads. Let T denote the number of times that the coin comes up heads in n tosses. If n is large then it is reasonable to expect the ratio T/n to be close to p. The laws of large numbers make this idea mathematically precise. According to the Weak Law of Large Numbers (WLLN), the ratio T/n converges to p in probability. According to the Strong Law of Large Numbers (SLLN), T/n converges to p almost surely.
5.42 Theorem (WLLN) If {Xₙ}_{n∈ℕ} is a sequence of identically distributed, mutually independent random variables each with a finite mean m, then

    (X₁ + ⋯ + Xₙ)/n →ᴾ m.

Proof. We will prove only the special case when the Xₙ's each have a finite positive variance σ². Let

    X̄ = (1/n) Σ_{k=1}^n Xₖ

and apply Chebyshev's inequality to obtain

    P(|(X₁ + ⋯ + Xₙ)/n − m| ≥ ε) ≤ σ²/(nε²).

The desired result now follows immediately. □
5.43 Theorem (SLLN) If {Xₙ}ₙ∈ℕ is a sequence of identically distributed, mutually independent random variables each with a finite mean m and a finite positive variance σ², then
(X₁ + ⋯ + Xₙ)/n → m a.s.
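The coin-tossing picture that opened this section can be simulated directly. The sketch below (an added illustration using NumPy; the value p = 0.3 and the number of tosses are arbitrary choices) follows the running ratio r/n along one sample path, which is the quantity the SLLN says settles at p for almost every path.

```python
import numpy as np

# One long run of independent coin tosses with P(heads) = p.
rng = np.random.default_rng(1)
p = 0.3
n = 100000
tosses = rng.random(n) < p          # True means heads
r_over_n = tosses.mean()            # r/n for the full run

# Running ratio r/n after 1, 2, ..., n tosses along this sample path.
running = np.cumsum(tosses) / np.arange(1, n + 1)
print(r_over_n, running[-1])
```

A single run of course illustrates only one sample path; the SLLN asserts that the set of paths for which the running ratio fails to converge to p has probability zero.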
5.17 Conditioning
Consider a random variable X defined on a probability space (Ω, ℱ, P) with E[|X|] < ∞, and let 𝒢 be a σ-subalgebra of ℱ. The conditional expectation of X given 𝒢 is denoted by E[X|𝒢] and is defined to be any random variable defined on (Ω, ℱ, P) that satisfies the following two properties:
1. E[X|𝒢] is 𝒢-measurable.
2. ∫_G E[X|𝒢] dP = ∫_G X dP for all G ∈ 𝒢.
Any 𝒢-measurable random variable that is equal a.s. to E[X|𝒢] is called a version of E[X|𝒢].
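On a finite probability space this definition can be made concrete: when 𝒢 is generated by a partition of Ω, E[X|𝒢] is constant on each atom of the partition and equals the P-weighted average of X over that atom. The sketch below (an added illustration; the six-point space and the values of X are arbitrary choices, not from the text) verifies the defining property (2) on each generating set.

```python
import numpy as np

# Omega = {0, ..., 5} with uniform P; G is the sigma-algebra generated
# by the partition {0,1,2} and {3,4,5}. Averaging X over each atom
# produces a G-measurable random variable with the same integral as X
# over every member of G, i.e. a version of E[X|G].
P = np.full(6, 1.0 / 6.0)
X = np.array([1.0, 2.0, 3.0, 10.0, 20.0, 30.0])
atoms = [np.array([0, 1, 2]), np.array([3, 4, 5])]

E_X_given_G = np.empty(6)
for atom in atoms:
    E_X_given_G[atom] = np.sum(P[atom] * X[atom]) / np.sum(P[atom])

# Defining property 2: integrals over each generating atom agree.
for atom in atoms:
    assert np.isclose(np.sum(P[atom] * E_X_given_G[atom]),
                      np.sum(P[atom] * X[atom]))
print(E_X_given_G)
```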
5.44 Theorem Consider a random variable X defined on a probability space (Ω, ℱ, P) with E[|X|] < ∞, and let 𝒢 be a σ-subalgebra of ℱ. The conditional expectation E[X|𝒢] exists and is almost surely unique.
If A ∈ ℱ then the conditional probability of A given 𝒢 is denoted by P(A|𝒢) and is defined by
P(A|𝒢) = E[I_A|𝒢].
Thus, P(A|𝒢) satisfies the following two properties:
1. P(A|𝒢) is 𝒢-measurable.
2. ∫_G P(A|𝒢) dP = P(A ∩ G) for all G ∈ 𝒢.
Exercise 5.12 Consider a random variable X defined on a probability space (Ω, ℱ, P) with E[|X|] < ∞. Show that E[X|ℱ] = X a.s.
Exercise 5.13 Consider a random variable X defined on a probability space (Ω, ℱ, P) with E[|X|] < ∞. Show that E[X|{∅, Ω}] = E[X]. Does this hold pointwise or just almost surely?
Consider random variables X and Y defined on a probability space (Ω, ℱ, P) with E[|X|] < ∞ and E[|Y|] < ∞, and let 𝒢 be a σ-subalgebra of ℱ. Conditional expectations satisfy the following properties:
1. If X = a a.s. for a ∈ ℝ then E[X|𝒢] = a a.s.
2. If α ∈ ℝ and β ∈ ℝ then E[αX + βY|𝒢] = αE[X|𝒢] + βE[Y|𝒢] a.s.
3. If X ≤ Y a.s. then E[X|𝒢] ≤ E[Y|𝒢] a.s.
4. |E[X|𝒢]| ≤ E[|X| | 𝒢] a.s.
Property (1) is a special case of the following result.
5.45 Theorem Consider integrable random variables X and Y defined on a probability space (Ω, ℱ, P), and let 𝒢 be a σ-subalgebra of ℱ. If X is 𝒢-measurable and if E[XY] is finite then E[XY|𝒢] = X E[Y|𝒢] a.s.
5.6 Corollary Consider a random variable X defined on a probability space (Ω, ℱ, P) with E[|X|] < ∞, and let 𝒢 be a σ-subalgebra of ℱ. If X is 𝒢-measurable then E[X|𝒢] = X a.s.
5.46 Theorem Consider a random variable X defined on a probability space (Ω, ℱ, P) with E[|X|] < ∞, and let 𝒢₁ and 𝒢₂ be σ-subalgebras of ℱ such that 𝒢₁ ⊂ 𝒢₂. It follows that
E[E[X|𝒢₂]|𝒢₁] = E[E[X|𝒢₁]|𝒢₂] = E[X|𝒢₁] a.s.
Proof. We will first show that E[E[X|𝒢₂]|𝒢₁] = E[X|𝒢₁] a.s. Recall that if
1. Y is 𝒢₁-measurable, and
2. ∫_G Y dP = ∫_G X dP for all G ∈ 𝒢₁,
then Y = E[X|𝒢₁] a.s. Thus, if E[E[X|𝒢₂]|𝒢₁] satisfies the previous two properties then E[E[X|𝒢₂]|𝒢₁] = E[X|𝒢₁] a.s. By definition, E[E[X|𝒢₂]|𝒢₁] is 𝒢₁-measurable. Thus, we need only show that
∫_G E[E[X|𝒢₂]|𝒢₁] dP = ∫_G X dP
for all G ∈ 𝒢₁. By definition, the conditional expectation E[E[X|𝒢₂]|𝒢₁] must satisfy
∫_G E[E[X|𝒢₂]|𝒢₁] dP = ∫_G E[X|𝒢₂] dP for all G ∈ 𝒢₁. (1)
Similarly, E[X|𝒢₂] must satisfy
∫_G E[X|𝒢₂] dP = ∫_G X dP for all G ∈ 𝒢₂,
which, since 𝒢₁ ⊂ 𝒢₂, implies that
∫_G E[X|𝒢₂] dP = ∫_G X dP for all G ∈ 𝒢₁.
Substituting this expression into (1) implies that
∫_G E[E[X|𝒢₂]|𝒢₁] dP = ∫_G X dP
for all G ∈ 𝒢₁, which is what we wanted to show. We will next show that E[E[X|𝒢₁]|𝒢₂] = E[X|𝒢₁] a.s. In this regard, the following lemma will be useful.
5.5 Lemma Consider a random variable Z defined on a probability space (Ω, ℱ, P) and let 𝒢₁ and 𝒢₂ be σ-subalgebras of ℱ. If Z is 𝒢₁-measurable and if 𝒢₁ ⊂ 𝒢₂ then Z is 𝒢₂-measurable.
Proof. Since Z is 𝒢₁-measurable it follows that Z⁻¹(ℬ(ℝ)) ⊂ 𝒢₁. But, since 𝒢₁ ⊂ 𝒢₂ it follows that Z⁻¹(ℬ(ℝ)) ⊂ 𝒢₂. But this means that Z is 𝒢₂-measurable. □
We will now continue with our proof of Theorem 5.46. By definition, E[X|𝒢₁] is 𝒢₁-measurable. Lemma 5.5 thus implies that E[X|𝒢₁] is also 𝒢₂-measurable. Corollary 5.6 thus implies that E[E[X|𝒢₁]|𝒢₂] = E[X|𝒢₁] a.s. □
5.7 Corollary Consider a random variable X defined on a probability space (Ω, ℱ, P) with E[|X|] < ∞, and let 𝒢 be a σ-subalgebra of ℱ. It follows that E[E[X|𝒢]] = E[X].
Consider random variables X, Y₁, ..., Yₙ defined on a probability space (Ω, ℱ, P) with E[|X|] < ∞. The conditional expectation of X given Y₁, ..., Yₙ is denoted by E[X|Y₁, ..., Yₙ] and is defined to be E[X|σ(Y₁, ..., Yₙ)].
5.47 Theorem If X and Y are independent random variables defined on a probability space (Ω, ℱ, P) with X integrable then E[X|Y] = E[X] a.s.
Proof. Since X and Y are independent it follows that P(A ∩ B) = P(A)P(B) for all A ∈ σ(X) and for all B ∈ σ(Y). Let A ∈ σ(Y) and consider the random variable I_A. Since σ(I_A) ⊂ σ(Y) it follows that X and I_A are independent. Note that, for all A ∈ σ(Y),
∫_A E[X|Y] dP = ∫_A X dP = ∫_Ω I_A X dP = E[I_A X] = E[I_A]E[X] = E[X] ∫_Ω I_A dP = ∫_A E[X] dP.
Note also that E[X] is σ(Y)-measurable. Thus it follows that E[X|Y] = E[X] a.s. □
Exercise 5.14 Show that if E[X|Y] = E[X] then E[XY] = E[X]E[Y].
5.48 Theorem (Jensen's Inequality) Consider a random variable X defined on a probability space (Ω, ℱ, P) with E[|X|] < ∞, and let 𝒢 be a σ-subalgebra of ℱ. If φ is a convex real-valued function defined on ℝ and if φ(X) is integrable then
φ(E[X|𝒢]) ≤ E[φ(X)|𝒢] a.s.
Example 5.18 Consider a random variable X defined on a probability space (Ω, ℱ, P) with E[X²] < ∞, and let 𝒢₁ and 𝒢₂ be σ-subalgebras of ℱ such that 𝒢₁ ⊂ 𝒢₂. We will show that
E[(X − E[X|𝒢₁])²] ≥ E[(X − E[X|𝒢₂])²].
To begin, for a σ-subalgebra 𝒢 of ℱ, note that
E[(X − E[X|𝒢])²] = E[X² − 2X E[X|𝒢] + E[X|𝒢]²]
= E[X²] − 2E[X E[X|𝒢]] + E[E[X|𝒢]²]
= E[X²] − 2E[E[X E[X|𝒢] | 𝒢]] + E[E[X|𝒢]²]
= E[X²] − 2E[E[X|𝒢]²] + E[E[X|𝒢]²]
= E[X²] − E[E[X|𝒢]²].
Thus, it follows that
E[(X − E[X|𝒢₁])²] ≥ E[(X − E[X|𝒢₂])²]
if and only if
E[E[X|𝒢₁]²] ≤ E[E[X|𝒢₂]²].
The desired result follows since we have
E[E[X|𝒢₁]²] = E[(E[E[X|𝒢₂]|𝒢₁])²] ≤ E[E[E[X|𝒢₂]²|𝒢₁]] = E[E[X|𝒢₂]²]
via Jensen's inequality. □
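The conclusion of Example 5.18 can be checked numerically on a finite space. In the sketch below (an added illustration; the four-point space, the partitions, and the values of X are arbitrary choices, not from the text), 𝒢₁ is generated by a coarse partition and 𝒢₂ by a finer one, so 𝒢₁ ⊂ 𝒢₂ and the mean-square error of E[X|𝒢₁] should be at least that of E[X|𝒢₂].

```python
import numpy as np

# Conditioning on a finer partition can only lower E[(X - E[X|G])^2].
P = np.full(4, 0.25)
X = np.array([1.0, 4.0, 2.0, 8.0])

def cond_exp(X, P, partition):
    # E[X|G] for the sigma-algebra generated by the given partition:
    # the P-weighted average of X over each atom.
    out = np.empty_like(X)
    for atom in partition:
        a = np.asarray(atom)
        out[a] = np.sum(P[a] * X[a]) / np.sum(P[a])
    return out

G1 = [[0, 1], [2, 3]]          # coarse partition generating G1
G2 = [[0], [1], [2], [3]]      # finest partition: E[X|G2] = X
mse1 = np.sum(P * (X - cond_exp(X, P, G1)) ** 2)
mse2 = np.sum(P * (X - cond_exp(X, P, G2)) ** 2)
print(mse1, mse2)
```

Here E[X|𝒢₂] = X exactly, so its mean-square error vanishes, while the coarse conditional expectation retains the within-atom variation of X.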
Consider a probability space (Ω, ℱ, P) and let H denote the Hilbert space of square integrable random variables defined on (Ω, ℱ, P) where ⟨X, Y⟩ = E[XY] and where we agree to identify any two random variables X and Y for which E[(X − Y)²] = 0. Let X, Y₁, ..., Yₙ be second order random variables defined on (Ω, ℱ, P). Our goal now is to find a Borel measurable function f: ℝⁿ → ℝ so that E[(X − f(Y₁, ..., Yₙ))²] is minimized over all such functions f. Let G be the subspace of H given by all elements of H that may be written as Borel measurable transformations of Y₁, ..., Yₙ. Using the Hilbert Space Projection Theorem (Theorem 4.7) we know that the function we are seeking is the projection of X on G; that is, we seek the point in G that is nearest to X.
5.6 Lemma The projection of X on G is given by E[X|Y₁, ..., Yₙ].
Proof. First, note that E[X|Y₁, ..., Yₙ] ∈ G since (via Jensen's inequality) we have
E[E[X|Y₁, ..., Yₙ]²] ≤ E[E[X²|Y₁, ..., Yₙ]] = E[X²] < ∞.
Next, let Z ∈ G and note that
E[XZ] = E[E[XZ|Y₁, ..., Yₙ]] = E[Z E[X|Y₁, ..., Yₙ]].
That is,
⟨X, Z⟩ = ⟨E[X|Y₁, ..., Yₙ], Z⟩
which implies that
⟨X − E[X|Y₁, ..., Yₙ], Z⟩ = 0.
Thus, X − E[X|Y₁, ..., Yₙ] is orthogonal to every element in G. □
Thus, we conclude that the best minimum mean-square Borel measurable estimate of X in terms of Y₁, ..., Yₙ is given by E[X|Y₁, ..., Yₙ]. The following result shows that Borel measurability of our estimators cannot be dispensed with:
5.49 Theorem Let M be any real number. There exists a probability space (Ω, ℱ, P), two bounded random variables X and Y defined on (Ω, ℱ, P), and a function f: ℝ → ℝ such that X(ω) = f(Y(ω)) for all ω ∈ Ω yet such that E[(X − E[X|Y])²] > M.
◊ Proof. See "A Note on a Common Misconception in Estimation" by Gary Wise in Systems and Control Letters, 1985, Vol. 5, pp. 355-356. For related material, see also "A Result on Multidimensional Quantization" by Eric Hall and Gary Wise, in Proceedings of the American Mathematical Society, Vol. 118, No. 2, June 1993, pp. 609-613. □
5.18 Regression Functions
Consider random variables X and Y defined on a probability space (Ω, ℱ, P) with E[|X|] < ∞. A regression function of X given Y = y is denoted by E[X|Y = y] and is defined to be any real-valued Borel measurable function on ℝ that satisfies
∫_B E[X|Y = y] dF_Y(y) = ∫_{Y⁻¹(B)} X dP
for all B ∈ ℬ(ℝ).
5.50 Theorem Any two regression functions of X given Y = y are equal almost everywhere with respect to the measure induced by F_Y.
5.51 Theorem Consider random variables X and Y defined on a probability space (Ω, ℱ, P) with E[|X|] < ∞. If φ(y) = E[X|Y = y] then E[X|Y] = φ(Y) a.s.
5.52 Theorem Consider two random variables X and Y possessing a joint density function f_{X,Y}. Then
E[X|Y = y] = ∫_ℝ x f_{X,Y}(x, y)/f_Y(y) dx
almost everywhere with respect to the measure induced by F_Y. That is, a version of E[X|Y = y] is given by
∫_ℝ x f_{X,Y}(x, y)/f_Y(y) dx.
(The ratio f_{X,Y}(x, y)/f_Y(y) is called a conditional density of X given Y = y and is denoted by f_{X|Y}(x|y).)
Example 5.19 Let X and Y be zero mean, unit variance, mutually Gaussian random variables with correlation coefficient ρ. In this example we will find E[X|Y = y] and E[X|Y]. Note that
E[X|Y = y] = ∫_ℝ x f_{X,Y}(x, y)/f_Y(y) dx,
where
f_{X,Y}(x, y) = (1/(2π√(1 − ρ²))) exp(−(x² − 2ρxy + y²)/(2(1 − ρ²)))
and
f_Y(y) = (1/√(2π)) exp(−y²/2).
Completing the square in the exponent (writing x² − 2ρxy + y² = (x − ρy)² + (1 − ρ²)y², so that the factor exp(−y²/2) cancels against f_Y(y)) reduces the integral to
E[X|Y = y] = ∫_ℝ x (1/√(2π(1 − ρ²))) exp(−(x − ρy)²/(2(1 − ρ²))) dx = ρy,
where the final integral above is simply the mean of a N(ρy, 1 − ρ²) random variable. Thus, it follows that E[X|Y = y] = ρy and hence that E[X|Y] = ρY a.s. □
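The regression function of Example 5.19 can be checked by simulation. In the sketch below (an added illustration; the value ρ = 0.6 and the construction of X from Y are arbitrary choices used to produce the required joint distribution), sample averages of X over narrow bins of Y are compared against ρy.

```python
import numpy as np

# Jointly Gaussian (X, Y), zero mean, unit variances, Corr(X, Y) = rho:
# X = rho * Y + sqrt(1 - rho^2) * W with W independent of Y.
rng = np.random.default_rng(2)
rho = 0.6
n = 200000
Y = rng.standard_normal(n)
X = rho * Y + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)

# Average X over narrow bins of Y; Example 5.19 predicts rho * y.
edges = np.linspace(-2.0, 2.0, 9)
centers = 0.5 * (edges[:-1] + edges[1:])
est = np.array([X[(Y >= lo) & (Y < hi)].mean()
                for lo, hi in zip(edges[:-1], edges[1:])])
print(np.max(np.abs(est - rho * centers)))
```

The agreement is only approximate: each bin average estimates E[X | Y in the bin] rather than E[X|Y = y] at the bin center, and sampling noise remains, but both effects shrink as the bins narrow and n grows.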
As the following theorem shows, the existence of a joint density
function for two random variables X and Y with X integrable
places no additional restrictions on the regression function of X
given Y = y.
5.53 Theorem Let g be any Borel measurable function mapping ℝ into ℝ. There exist random variables X and Y possessing a joint density function such that X is integrable and E[X|Y = y] = g(y) for all y ∈ ℝ.
Proof. Let g: ℝ → ℝ be Borel measurable and define
f(x, y) = (1/4) exp[−exp(|y|) |x − g(y)|].
Note that f(x, y) is a joint probability density function since
∫_ℝ ∫_ℝ f(x, y) dx dy = ∫_ℝ ∫_ℝ (1/4) exp[−exp(|y|) |x − g(y)|] dx dy
= ∫_ℝ ∫_ℝ (1/4) exp[−exp(|y|) |z|] dz dy
= ∫_ℝ (1/2) exp(−|y|) dy = 1.
Let X and Y be random variables such that the pair (X, Y) has a joint probability density function given by f(x, y). Notice from the above calculation that the second marginal probability density function of f(x, y) is given by f_Y(y) = exp(−|y|)/2. Recall that a version of E[X|Y = y] is given by ∫_ℝ x [f(x, y)/f_Y(y)] dx. This version will be used throughout the remainder of this proof. Substituting for f_Y(y) implies that
E[X|Y = y] = 2 exp(|y|) ∫_ℝ (x/4) exp[−exp(|y|) |x − g(y)|] dx
= 2 exp(|y|) ∫_ℝ ((z + g(y))/4) exp[−exp(|y|) |z|] dz
= 2 exp(|y|) g(y) (1/2) exp(−|y|) = g(y).
Hence, the random variables X and Y with the joint probability density function f(x, y) are such that E[X|Y = y] = g(y), where g was an arbitrarily preselected Borel measurable function. □
◊ 5.19 Statistical Hypothesis Testing
Basic concepts of statistics arise in medicine, engineering, sociology, business, education, and other areas. For example, consider
a medical situation in which a new medication for a particular
problem is being tested. Assume that patients are divided into
two groups, and assume that the patients in the first group are
each given the new medication and that the patients in the second group are each given a placebo. (A placebo is a sugar pill that is identical in appearance to the medication.) Assume that each group has 50 patients in it. What if 36 patients in the first
group were found to be free of the medical problem of concern,
and 25 patients in the second group were found to be free of the
medical problem of concern? How might we describe these results? Of the 50 patients taking the new medication, 36 of them
improved. This is an objective result of the data. However, we
should be careful before concluding that 72% of the time the
new medication will be effective. This conclusion belongs in the
realm of statistical inference.
A statistical hypothesis is a nonempty family of probability measures on a given measurable space. For convenience, we will take our underlying measurable space to be (ℝ, ℬ(ℝ)). Then, for instance, the probability measures of interest could be distributions of random variables. A statistical hypothesis is said to be simple if it is a singleton set. Roughly speaking, we are trying to discern which hypothesis is in effect based upon knowledge of a realization of a random variable. For example, if the hypotheses are simple and if, under one hypothesis, unit measure is given to a particular Borel set, and if, under the other hypothesis, the measure gives unit measure to a disjoint Borel set, then it should be straightforward to discern which hypothesis is in effect. Indeed, just see which of the two Borel sets contains the realization and announce the corresponding hypothesis.
Consider the situation where we have two disjoint simple statistical hypotheses H₀ and H₁ and assume that we know the probability π₀ of H₀ and π₁ of H₁. These probabilities are often called the priors since such a probability is the probability of a hypothesis being true without regard to any random variable that is observed. For convenience, assume that the relevant measures associated with H₀ and H₁ have probability densities denoted, respectively, by f₀ and f₁.
We note that there are two types of errors we could make in reaching a decision. We could announce H₁ when H₀ is true or we could announce H₀ when H₁ is true. Our goal will be to make a decision in such a way so as to minimize the probability of error P_e. Let S₀ denote a Borel set such that if a realization belongs to S₀ then we announce H₀ and if a realization belongs to S₀ᶜ we announce H₁. Let S₁ = S₀ᶜ. Thus, it follows that
P_e = π₀ ∫_{S₁} f₀(x) dx + π₁ ∫_{S₀} f₁(x) dx.
Rewriting this, we have
P_e = π₀ ∫_{S₁} f₀(x) dx + π₁ ∫_{S₀} f₁(x) dx + π₁ ∫_{S₁} f₁(x) dx − π₁ ∫_{S₁} f₁(x) dx
= π₁ + ∫_{S₁} (π₀ f₀(x) − π₁ f₁(x)) dx.
Now, we see that we can minimize P_e by choice of S₁ by defining S₁ to be the set of real numbers x such that π₀f₀(x) − π₁f₁(x) < 0. Consequently, S₀ is the set of all real numbers x such that π₀f₀(x) − π₁f₁(x) ≥ 0. We note that the equality condition in these inequalities is arbitrary since such a change in the inequality does not change the corresponding integral.
As an example, consider testing the hypothesis that a random variable X has a standard normal distribution versus the hypothesis that X is normal with a mean and a variance of one. Assume that each hypothesis is equally likely; that is, π₀ = π₁ = 1/2. Using the above procedure, we announce that the mean is one for real numbers x such that
exp(−(x − 1)²/2) > exp(−x²/2);
that is, we announce that the mean is one when x > 1/2. Hence, we announce that the mean is one whenever X ∈ (1/2, ∞), and this test minimizes the probability of error.
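The optimality of the x > 1/2 rule can be checked empirically. The sketch below (an added illustration; the comparison threshold 1.0 is an arbitrary choice of a suboptimal rule) simulates the two equally likely hypotheses and compares error rates.

```python
import numpy as np

# H0: N(0, 1) versus H1: N(1, 1) with equal priors pi0 = pi1 = 1/2.
# The rule derived in the text announces H1 exactly when x > 1/2.
rng = np.random.default_rng(3)
n = 100000
truth = rng.random(n) < 0.5             # True means H1 is in effect
x = rng.standard_normal(n) + truth      # adds mean 1 under H1

announce_h1 = x > 0.5                   # the minimum-Pe rule
pe_bayes = np.mean(announce_h1 != truth)

# Any shifted threshold should give a larger empirical error rate.
pe_shifted = np.mean((x > 1.0) != truth)
print(pe_bayes, pe_shifted)
```

The empirical error of the optimal rule should sit near Φ(−1/2) ≈ 0.3085, the exact probability of error for this pair of hypotheses.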
5.20 Caveats and Curiosities
6 Random Processes
6.1 Introduction
Throughout this chapter, we will assume that all probability spaces are complete unless otherwise specified. A random process (or a stochastic process) defined on a probability space (Ω, ℱ, P) is an indexed collection of random variables each defined on (Ω, ℱ, P). We denote a random process by {X(t) : t ∈ T} where T is a nonempty index set that often denotes time and is usually (in these notes, always) taken to be a subset of ℝ. Thus, for each fixed t in T, X(t) (or, more precisely, X(t, ·)) is simply a random variable defined on (Ω, ℱ, P). If T is a countably infinite set then we say that {X(t) : t ∈ T} is a random sequence or a discrete time or discrete parameter random process. If T is a subinterval of ℝ then we say that {X(t) : t ∈ T} is a continuous time or continuous parameter random process. We will often denote a random process {X(t) : t ∈ T} by {X(t)} when the index set T is arbitrary or clear from the context.
Consider a random process {X(t) : t ∈ T}. A function X(t, ω₀) : T → ℝ obtained by fixing some ω₀ ∈ Ω and letting t vary is called a sample function or sample path or trajectory of the random process {X(t)}. If T is countably infinite then a sample path is called a sample sequence. If {t₁, t₂, ..., tₙ} is any finite set of elements from T then the joint probability distribution of the random variables X(t₁), ..., X(tₙ) is called a finite dimensional distribution of the random process {X(t)}. A random process {Y(t) : t ∈ T} is said to be a modification of a random process {X(t) : t ∈ T} if X(t) = Y(t) a.s. for all t ∈ T. Notice that in such a case {X(t) : t ∈ T} and {Y(t) : t ∈ T} have the same family of finite dimensional distributions. Also, note that the associated P-null set can depend on t.
Two random processes {X(t) : t ∈ T} and {Y(t) : t ∈ T} are said to be indistinguishable if, for almost every ω, X(t, ω) = Y(t, ω) for all t ∈ T. Notice that there is just one set of measure zero off of which X(t) = Y(t) for all t in T, while for a modification the set of measure zero off of which X(t) = Y(t) may depend on t. If T is a countable set then the two definitions are equivalent since a countable union of null sets is itself a null set.
The value of a problem is not so much in coming up with the answer as in the ideas and attempted ideas it forces on the would-be solver. (I. N. Herstein)
Let D be a subset of ℝ. The set D is said to be dense in ℝ if every nonempty open subset of ℝ contains an element from D. For example, the set ℚ of rational numbers is dense in ℝ. Let {X(t) : t ∈ T} be a random process defined on a complete probability space (Ω, ℱ, P) where T is an interval. The random process {X(t) : t ∈ T} is said to be separable if there exists a countable dense subset I of T and a null set N ∈ ℱ such that if ω ∈ Nᶜ and t ∈ T then there exists a sequence {tₙ}ₙ∈ℕ of elements from I with tₙ → t such that X(tₙ, ω) → X(t, ω).
6.1 Theorem Consider a random process {X(t) : t ∈ T} defined on a complete probability space and assume that T ∈ ℬ(ℝ). There exists a separable random process {Y(t) : t ∈ T} defined on the same probability space that is a modification of {X(t) : t ∈ T}.
Theorem 6.1 says that requiring a random process to be separable places no additional restrictions on the family of finite dimensional distributions of that process. In short, any random process admits a separable modification.
Let (Ω₁, ℱ₁) and (Ω₂, ℱ₂) be two measurable spaces. If A ∈ ℱ₁ and B ∈ ℱ₂ then A × B is called a measurable rectangle. The smallest σ-algebra on Ω₁ × Ω₂ that contains every measurable rectangle is denoted by ℱ₁ × ℱ₂ and is called the product σ-algebra on Ω₁ × Ω₂.
6.2 Theorem If (Ω₁, ℱ₁, μ₁) and (Ω₂, ℱ₂, μ₂) are σ-finite measure spaces then there exists a σ-finite measure on the measurable space (Ω₁ × Ω₂, ℱ₁ × ℱ₂), called the product measure and denoted by μ₁ × μ₂, such that, for any measurable rectangle A × B,
(μ₁ × μ₂)(A × B) = μ₁(A)μ₂(B).
Consider a random process {X(t) : t ∈ T ⊂ ℝ} defined on a probability space (Ω, ℱ, P), and let ℳ(ℝ) denote the collection of all Lebesgue measurable subsets of ℝ. If T is an element of ℳ(ℝ) and if X is a measurable mapping from (ℝ × Ω, ℳ(ℝ) × ℱ) to (ℝ, ℳ(ℝ)) then we say that the random process {X(t) : t ∈ T} is measurable.
Example 6.1 Let A be a subset of ℝ that is not a Lebesgue measurable set, let X be a positive random variable defined on a probability space (Ω, ℱ, P), and define a random process {Y(t) : t ∈ ℝ} on this space via Y(t, ω) = X(ω)I_A(t). The inverse image of the Borel set (0, ∞) is A × Ω, which is not a measurable set in the product measure space. Thus, {Y(t)} is not a measurable random process. Notice that for each fixed t, Y(t) is a random variable yet, for each fixed ω, Y(t) is a non-Lebesgue measurable function of t. □
6.3 Theorem Let {X(t) : t ∈ T} be a random process defined on a complete probability space and assume that T is a Lebesgue measurable subset of ℝ. Suppose that there exists a subset N of ℝ having Lebesgue measure zero such that X(s) converges in probability to X(t) as s → t for every t in T \ N. (That is, suppose that {X(t) : t ∈ T \ N} is continuous in probability.) Then there exists a random process defined on the same space that is a measurable and separable modification of {X(t) : t ∈ T}.
Theorem 6.3 says that any random process that is continuous in probability admits a modification that is both separable and measurable. Recall that separability places no additional restrictions on the family of finite dimensional distributions of a random process. This statement cannot be made for measurability. In particular, there exist random processes that do not possess measurable modifications. An example of such a process (that is, nevertheless, discussed frequently in engineering contexts) is provided by the following theorem.
6.4 Theorem Let {X(t) : t ∈ ℝ} be a random process composed of second order, positive variance, mutually independent random variables defined on the same probability space. The random process {X(t) : t ∈ ℝ} does not admit a measurable modification.
One is often confronted with a need to integrate the sample
paths of a random process. The following theorem presents con
ditions that are sufficient to ensure that almost all of the sample
paths of a random process are Lebesgue integrable. Later, we
will define an L2 (or meansquare) integral for a certain family
of random processes. It will not be defined as a pathwise in
tegral but instead will be defined as an L2 limit. (If both the
pathwise integral and the L2 integral exist they will be equal
almost surely.) We will find this latter type of integral much
more useful for our purposes than an integral based upon the
sample paths of a random process.
6.5 Theorem Let {X(t) : t ∈ T} be a measurable random process. All sample paths of the random process are Lebesgue measurable functions of t. If E[X(t)] exists for all t ∈ T then it defines a Lebesgue measurable function of t. Further, if A is a Lebesgue measurable subset of T and if ∫_A E[|X(t)|] dt < ∞ then almost all sample paths of {X(t) : t ∈ T} are Lebesgue integrable over A.
6.2 Gaussian Processes
A random process {X(t) : t ∈ T} is called a Gaussian process if the random variables X(t₁), X(t₂), ..., X(tₙ) are mutually Gaussian for every finite subset {t₁, t₂, ..., tₙ} of T.
6.6 Theorem Let {Y(t) : t ∈ T} be any random process such that E[|Y(t)|²] < ∞ for all t ∈ T. There exists a Gaussian process {X(t) : t ∈ T} (defined, perhaps, on a different probability space) such that E[X(t)] = 0 and E[X(s)X(t)] = E[Y(s)Y(t)] for all s and t in T.
6.3 Second Order Random Processes
A random process {X (t) : t ETc lR} is said to be a second or
der random process or an L2 random process if E [X2 (t)] < CXJ for
all t E T. The a11tocovariance f11ndion of sl1ch a process is de
fined to be K(tl' t 2 ) = E[(X(tl)  E[X(t l )])(X(t 2 )  E[X(t2)])]
where t l , t2 E T. Notice that if {X(t) : t E T} has autocovari
ance function K and if f : T + lR then {X(t) + f(t) : t E T}
also has auto covariance function K. That is, changing the
means of the individual random variables in {X (t) : t E T}
does not change the autocovariance function of the process.
The autocorrelation function of a second order random process
{X(t) : t E T} is defined to be R(tl' t"2) = E[X(tl)X(t"2)] for tl
and t2 in T. Note that K(tl' t 2) = R(h, t 2 ) E[X(tl)]E[X(t2)]'
Note, also, that for a zero mean, second order random process,
the autocovariance function is equal to the autocorrelation func
tion.
6.7 Theorem Let K be a real-valued nonnegative definite function defined on T × T such that K(t, s) = K(s, t) for any t and s in T. There exists a second order random process {X(t) : t ∈ T} whose autocovariance function is K.
A random process {X(t) : t ∈ T} is said to be strictly stationary if given any positive integer n, any elements t₁ < t₂ < ⋯ < tₙ from T, and any h > 0 such that tᵢ + h ∈ T for each i ≤ n, the joint distribution function of the random variables X(t₁ + h), X(t₂ + h), ..., X(tₙ + h) is the same as that of X(t₁), X(t₂), ..., X(tₙ). That is, if t denotes time, a strictly stationary random process is one whose finite dimensional distributions remain the same as time is shifted.
Example 6.2 Let {X(t) : t ∈ ℝ} be a random process composed of identically distributed, mutually independent random variables. Let s, t₁, t₂, ..., tₙ ∈ ℝ where n ∈ ℕ, and note that
F_{X(t₁+s), ..., X(tₙ+s)}(x₁, ..., xₙ) = P(X(t₁ + s) ≤ x₁, ..., X(tₙ + s) ≤ xₙ)
= P(X(t₁ + s) ≤ x₁) ⋯ P(X(tₙ + s) ≤ xₙ)
= P(X(t₁) ≤ x₁) ⋯ P(X(tₙ) ≤ xₙ)
= F_{X(t₁), ..., X(tₙ)}(x₁, ..., xₙ).
Thus, it follows that {X(t) : t ∈ ℝ} is a strictly stationary random process. □
A random process {X(t) : t ∈ T} is said to be wide sense stationary (WSS) if it is a second order process and if K(s, t) depends only on the difference s − t. We denote K(s + t, s) by K(t) for a random process that is wide sense stationary. In the case of a WSS random process {X(t) : t ∈ T} the assumption that E[X(t)] is a constant function of t is often added. However, this condition is unnatural mathematically and has nothing to do with the essential properties of interest for WSS random processes. For example, let Θ be a random variable with a uniform distribution on [0, π]. For each t ∈ ℝ, let X(t) = cos(2πt + Θ). Then {X(t) : t ∈ ℝ} is a WSS random process with a nonconstant mean.
Example 6.3 Let {X(t) : t ∈ ℝ} be a random process composed of mutually independent random variables such that E[X(t)] = 0 for all t ∈ ℝ and such that E[X²(t)] = σ² ∈ (0, ∞) for all t ∈ ℝ. This random process is wide sense stationary since
E[X(t)X(t + s)] = E[X(t)]E[X(t + s)] = 0 if s ≠ 0, and E[X(t)X(t + s)] = E[X²(t)] = σ² if s = 0,
which is a constant function of t. □
6.8 Theorem A strictly stationary second order random process is wide sense stationary.
6.9 Theorem A wide sense stationary Gaussian process with a constant mean is strictly stationary.¹
We will next consider a calculus for second order processes. That is, we will consider a framework in which we may discuss continuity, differentiation, and integration of second order random processes.
¹Thus, the phrase "stationary Gaussian process" is not ambiguous.
A second order random process {X(t) : t ∈ ℝ} is said to be L₂ continuous at the point t ∈ ℝ if X(t + h) → X(t) in L₂ as h → 0. A second order random process {X(t) : t ∈ ℝ} is said to be L₂ differentiable at the point t ∈ ℝ if (X(t + h) − X(t))/h converges in L₂ to a limit X′(t) as h → 0. The next theorem relates L₂ continuity of a second order random process to the autocovariance function of the random process.
Recall that a function f : ℝ² → ℝ is said to be continuous at (x, y) if for every ε > 0, there exists a δ > 0 such that |f(x, y) − f(a, b)| < ε for all points (a, b) in ℝ² such that √((x − a)² + (y − b)²) < δ.
6.10 Theorem Let {X(t) : t ∈ ℝ} be a second order random process such that E[X(t)] is continuous. The random process {X(t) : t ∈ ℝ} is L₂ continuous at r ∈ ℝ if and only if K is continuous at (r, r) ∈ ℝ².
6.1 Lemma If an autocovariance function is continuous at (t, t) for all t in ℝ then it is continuous at (s, t) for all s and t in ℝ.
6.1 Corollary Let {X(t) : t ∈ ℝ} be a WSS random process with autocovariance function K(t). If the process is L₂ continuous at some point s then K is continuous at the origin. If K is continuous at the origin then it is continuous everywhere and the random process is L₂ continuous for all t.
Notice that the random process in Example 6.3 is nowhere L₂ continuous since its autocovariance function is discontinuous at the origin. We will next relate L₂ differentiability and differentiability of the autocovariance function in the wide sense stationary case.
6.11 Theorem Let {X(t) : t ∈ ℝ} be a WSS random process with autocovariance function K(t). If the process is L₂ differentiable at all points t ∈ ℝ then K(t) is twice differentiable for all t ∈ ℝ and {X′(t) : t ∈ ℝ} is a wide sense stationary random process with autocovariance function −K″(t).
We next consider integration of second order random processes. Let {X(t) : a ≤ t ≤ b} be a second order random process with autocovariance function K where a and b are real numbers with a < b. Let g be a real-valued function defined on [a, b]. We define
∫ₐᵇ g(t)X(t) dt
as follows. Let Δ = {t₀, t₁, ..., tₙ} be such that a = t₀ < t₁ < ⋯ < tₙ = b, and let |Δ| denote the maximum of |tᵢ − tᵢ₋₁| over all positive integers i ≤ n. Define
I(Δ) = Σₖ₌₁ⁿ g(tₖ)X(tₖ)(tₖ − tₖ₋₁).
If I(Δ) converges in L₂ to some random variable Z as |Δ| → 0 then we say that g(t)X(t) is L₂ integrable on [a, b] and we denote the L₂ limit Z by
∫ₐᵇ g(t)X(t) dt.
6.12 Theorem If, in the context of our discussion, E[X(t)] and g(t) are continuous on [a, b] and if K is continuous on [a, b] × [a, b] then g(t)X(t) is L₂ integrable on [a, b].
6.13 Theorem If E[X(t)] = 0, if g and h are continuous on [a, b], and if K is continuous on [a, b] × [a, b] then
E[∫ₐᵇ g(s)X(s) ds ∫ₐᵇ h(t)X(t) dt] = ∫ₐᵇ ∫ₐᵇ g(s)h(t)K(s, t) ds dt,
and
E[∫ₐᵇ g(s)X(s) ds] = E[∫ₐᵇ h(t)X(t) dt] = 0.
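Theorem 6.13 can be checked numerically for a process with a known autocovariance. In the sketch below (an added illustration; the process X(t) = A cos t + B sin t with A and B independent N(0, 1) is an arbitrary choice, not from the text), K(s, t) = cos(s − t), and with g = h = 1 on [0, 1] the theorem gives E[(∫₀¹ X(t) dt)²] = ∫₀¹ ∫₀¹ cos(s − t) ds dt = 2(1 − cos 1). The Riemann sums I(Δ) from the definition above are used for the integral.

```python
import numpy as np

# Monte Carlo check of Theorem 6.13 for X(t) = A cos(t) + B sin(t),
# which has zero mean and autocovariance K(s, t) = cos(s - t).
rng = np.random.default_rng(4)
trials = 50000
A = rng.standard_normal(trials)
B = rng.standard_normal(trials)

# I(Delta) on a uniform grid: sum_k X(t_k)(t_k - t_{k-1})
#   = A * (sum cos(t_k) dt) + B * (sum sin(t_k) dt).
n = 2000
t = np.linspace(0.0, 1.0, n + 1)
dt = t[1] - t[0]
c = np.sum(np.cos(t[1:])) * dt      # Riemann sum for int cos, near sin(1)
s = np.sum(np.sin(t[1:])) * dt      # Riemann sum for int sin, near 1 - cos(1)
integrals = A * c + B * s

lhs = np.mean(integrals ** 2)
rhs = 2.0 * (1.0 - np.cos(1.0))
print(lhs, rhs)
```

The sample mean of the integrals should also be near zero, matching the second assertion of the theorem.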
6.14 Theorem If E[X(t)] = 0, if h is continuous on [a, b], and if K is continuous on [a, b] × [a, b] then
E[X(s) ∫ₐᵇ h(t)X(t) dt] = ∫ₐᵇ K(s, t)h(t) dt.
6.2 Lemma If a sequence of random variables defined on some probability space converges in L₂ and converges almost surely then the limits are equal with probability one.
6.15 Theorem If the integral ∫ₐᵇ g(t)X(t) dt exists as an L₂ integral and, for almost all ω, as a Riemann integral then the two integrals are equal with probability one.
Proof. If g(t)X(t) is both L₂ integrable and Riemann integrable a.s. on [a, b] then (using the current notation) I(Δ) converges to Z in L₂ and almost surely. Thus, the desired conclusion follows by the previous lemma. □
◊ 6.4 The Karhunen-Loève Expansion
Let K be a continuous autocovariance function defined on [a, b] × [a, b]. Define an integral operator A on L₂([a, b]) (the set of all square integrable real-valued Lebesgue measurable functions defined on [a, b] where we identify any two functions that are equal a.e.) via
A[f](s) = ∫ₐᵇ K(s, t)f(t) dt,
where a ≤ s ≤ b and f ∈ L₂([a, b]). Notice that A maps the real-valued function f defined on [a, b] to the real-valued function A[f] defined on [a, b]. A function e(·) is said to be an eigenfunction of the integral operator A if A[e](s) = λe(s) for some constant λ and for a < s < b. The constant λ is called the eigenvalue associated with the eigenfunction e(·).
6.16 Theorem (Mercer) Using the above notation, let {e_n(·)}_{n∈ℕ}
be a sequence of eigenfunctions of the integral operator A such
that
1. ∫_a^b e_j(t)e_k(t) dt = 0 if j ≠ k and 1 if j = k,
2. if e(·) is any eigenfunction of A then e(·) is equal to a
linear combination of the e_n's, and,
3. the eigenvalue λ_n associated with e_n is nonzero for each
n ∈ ℕ.
It then follows that
K(s, t) = Σ_{n=1}^∞ λ_n e_n(s)e_n(t)
for s and t in [a, b] (where the series converges absolutely and
uniformly in both variables).
6.17 Theorem (Karhunen-Loève) Let {X(t) : a ≤ t ≤ b} be a
second order process with zero mean and continuous autocovariance
function K. Let {e_n(t)}_{n∈ℕ} be a sequence of eigenfunctions
of the integral operator A (as defined above) associated with K
that satisfies properties (1), (2), and (3) of Mercer's Theorem.
Then
X(t) = Σ_{n=1}^∞ Z_n e_n(t)
for a ≤ t ≤ b, where
Z_n = ∫_a^b X(t)e_n(t) dt.
Further, the Z_n's are zero mean, orthogonal (E[Z_k Z_j] = 0 for
k ≠ j) random variables such that E[Z_n^2] = λ_n, and the series
converges in L2 to X(t) uniformly in t; that is,
E[(X(t) − Σ_{k=1}^n Z_k e_k(t))^2] → 0
as n → ∞, uniformly for t in [a, b].
Notice that each term in the above series expansion for X(t) is a
product of a random part (that is, a function of ω) and a
deterministic part (that is, a function of t). As the following theorem
shows, the Karhunen-Loève expansion takes on a special form
when the random process is Gaussian.
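A discretized sketch of the expansion (our illustration, using the Brownian-motion kernel K(s, t) = min(s, t) on [0, 1] as a stand-in): the integral operator A becomes a matrix, its eigendecomposition approximates the eigenfunctions and eigenvalues, and Mercer's expansion can be checked directly.

```python
import numpy as np

# Midpoint discretization of [0, 1]; A[f](s) = int K(s, t) f(t) dt becomes
# the matrix (K * dt) acting on samples of f.
n = 500
t = (np.arange(n) + 0.5) / n
dt = 1.0 / n
K = np.minimum.outer(t, t)                 # Brownian-motion autocovariance

evals, evecs = np.linalg.eigh(K * dt)      # symmetric kernel, so eigh
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]
efuns = evecs / np.sqrt(dt)                # normalize so int e_n(t)^2 dt = 1

# For this kernel the eigenvalues are known: lambda_k = ((k - 1/2) pi)^(-2)
k = np.arange(1, 6)
analytic = 1.0 / ((k - 0.5) * np.pi) ** 2

# Mercer: K(s, t) = sum_n lambda_n e_n(s) e_n(t)
K_mercer = (efuns * evals) @ efuns.T
mercer_err = np.max(np.abs(K_mercer - K))
```

The leading discrete eigenvalues track the analytic ones, and the full eigensum reconstructs K to machine precision in this finite-dimensional surrogate.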
6.18 Theorem Let the previous discussion set notation. In the
Karhunen-Loève expansion for a Gaussian random process, the
random sequence {Z_i}_{i∈ℕ} is a Gaussian random sequence
composed of mutually independent random variables.
6.5 Markov Chains
Consider a discrete parameter random process {X_n : n ∈ ℕ ∪ {0}}
where each random variable in the process takes values
only in some subset C = {a_i : i ∈ I} of ℝ where I is a subset of
ℕ. For each j and k from I, let p_j = P(X_0 = a_j) and let p_{jk} =
P(X_1 = a_k | X_0 = a_j). The random sequence {X_n : n ∈ ℕ ∪ {0}}
is said to be a Markov chain if, for any nonnegative integer n,
P(X_0 = a_{j_0}, X_1 = a_{j_1}, ..., X_n = a_{j_n}) = p_{j_0} p_{j_0 j_1} p_{j_1 j_2} × ... × p_{j_{n−1} j_n}.
The points in C are called the states of the Markov chain, the
p_j values are called the initial probabilities of the Markov chain,
and the p_{jk} values are called the transition probabilities of the
Markov chain. If C is a finite set then the Markov chain is said
to be a finite Markov chain.
Higher order transition probabilities of a Markov chain are
defined as follows. Let
p_{jk}^{(n)} = P(X_n = a_k | X_0 = a_j).
This probability is equal to the sum of the probabilities of all
possible distinct sequences of states that begin at state a_j and
arrive, n steps later, at state a_k. For example, if n = 2, then
p_{jk}^{(2)} = Σ_{m∈I} p_{jm} p_{mk}.
A simple inductive argument shows that in general we have
p_{jk}^{(m+n)} = Σ_{i∈I} p_{ji}^{(m)} p_{ik}^{(n)},
which is a special case of a general Markov property known as
the Chapman-Kolmogorov equation.
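For a finite chain the n-step probabilities are just entries of the nth matrix power, and the Chapman-Kolmogorov equation becomes the identity P^(m+n) = P^m P^n. A small sketch (the three-state chain below is our own made-up example):

```python
import numpy as np

# Transition matrix with entries P[j, k] = p_{jk}; each row sums to one.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

m, n = 3, 4
Pm = np.linalg.matrix_power(P, m)          # p_{jk}^{(m)}
Pn = np.linalg.matrix_power(P, n)          # p_{jk}^{(n)}
Pmn = np.linalg.matrix_power(P, m + n)     # p_{jk}^{(m+n)}
ck_gap = np.max(np.abs(Pmn - Pm @ Pn))     # Chapman-Kolmogorov: should vanish

# Unconditional probabilities after n steps: p_k^{(n)} = sum_j p_j p_{jk}^{(n)}
p0 = np.array([1.0, 0.0, 0.0])             # chain starts in state a_1
pn = p0 @ Pn
```

When the chain starts deterministically in state a_1, the unconditional distribution after n steps is simply the first row of P^n.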
The unconditional probability of entering state a_k at the nth
step is denoted by p_k^{(n)} and is given by
p_k^{(n)} = Σ_{j∈I} p_j p_{jk}^{(n)}.
Note that if p_i = 1 (that is, if the Markov chain always begins
in state a_i) then p_k^{(n)} = p_{ik}^{(n)}.
We will say that a state a_k can be reached from state a_j if there
exists some nonnegative integer n for which p_{jk}^{(n)} is positive. A
set A of states is said to be closed if no state outside of A can
be reached from any state inside A. For an arbitrary set A of
states, the smallest closed set containing A is said to be the
closure of A. If the singleton set containing a particular state
is closed then that state is said to be an absorbing state. A
Markov chain is said to be irreducible if there exists no closed
set of states other than the set of all states. Note that a Markov chain
is irreducible if and only if every state can be reached from every
other state.
A state a_j is said to have period m > 1 if p_{jj}^{(n)} = 0 unless n is a
multiple of m and if m is the largest integer with this property.
A state a_j is said to be aperiodic if no such period exists. Let
f_{jk}^{(n)} denote the probability that for a Markov chain starting in
state a_j, the first entry into state a_k occurs at the nth step. Let
f_{jk} = Σ_{n=1}^∞ f_{jk}^{(n)}.
Note that if f_{jj} = 1 then for a Markov chain that begins in state a_j,
a return to state a_j will occur at some later time with
probability one. In this case, we let
μ_j = Σ_{n=1}^∞ n f_{jj}^{(n)},
and we call μ_j the mean recurrence time for the state a_j. A state
a_j is said to be persistent if f_{jj} = 1 and is said to be transient
if f_{jj} < 1. A persistent state a_j is said to be a null state if
μ_j = ∞. An aperiodic persistent state a_j with μ_j < ∞ is said
to be ergodic.
6.19 Theorem A state a_j is transient if and only if Σ_{n=0}^∞ p_{jj}^{(n)} < ∞.
6.20 Theorem A persistent state a_j is a null state if and only if
Σ_{n=0}^∞ p_{jj}^{(n)} = ∞ yet p_{jj}^{(n)} → 0 as n → ∞.
6.21 Theorem If the state a_j is aperiodic then lim_{n→∞} p_{jj}^{(n)} is either
equal to zero or to f_{jj}/μ_j.
6.22 Theorem In a finite Markov chain there exist no null states
and it is impossible that all states are transient.
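For a finite irreducible aperiodic chain every state is ergodic, and the limit in Theorem 6.21 is 1/μ_j, the reciprocal of the mean recurrence time, which coincides with the stationary probability of state a_j. A sketch with a made-up three-state chain:

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# Stationary distribution pi: the left eigenvector of P for eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
pi = pi / pi.sum()

# lim p_{jj}^{(n)} = 1 / mu_j = pi_j for an ergodic finite chain.
Pn = np.linalg.matrix_power(P, 200)
limit_diag = np.diag(Pn)
mu = 1.0 / pi                              # mean recurrence times
```

After 200 steps the diagonal of P^n has converged to the stationary probabilities to high accuracy.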
6.6 Markov Processes
Consider a random process {X(t) : t ∈ T} defined on a probability
space (Ω, F, P) where T ⊂ ℝ. The random process
{X(t) : t ∈ T} is said to be a Markov process if, for k ∈ ℕ and
0 ≤ t_1 ≤ t_2 ≤ ... ≤ t_k ≤ u where t_i ∈ T for each i and u ∈ T,
P(X(u) ∈ B | X(t_1), ..., X(t_k)) = P(X(u) ∈ B | X(t_k)) a.s.
for each real Borel set B. Recall that a conditional expectation
with respect to {X(s) : s ≤ t} is by definition a conditional
expectation with respect to σ({X(s) : s ≤ t}), the smallest
σ-algebra with respect to which every random variable in the set
{X(s) : s ≤ t} is measurable.
6.23 Theorem Consider a Markov process {X(t) : t ∈ [0, ∞)}. For
all real Borel sets B and for all t ≤ u, it follows that P(X(u) ∈
B | {X(s) : s ≤ t}) = P(X(u) ∈ B | X(t)) a.s.
Note that the result of the previous theorem follows from the
seemingly weaker condition used to define a Markov process.
The theorem says roughly that a conditional probability of a
future event (at time u) associated with a Markov process given
the present (at time t) and the past (at times before t) is the
same as a conditional probability of that future event given
just the present. That is, for a Markov process, the past and
the present combined are no more "informative" than just the
present for determining the probability of some future event.
The following corollary to the previous theorem restates this
property in terms of conditional expectation.
6.2 Corollary Consider a Markov process {X(t) : t ∈ [0, ∞)}. If
Z is an integrable random variable that is σ({X(s) : s ≥ t})-measurable
then E[Z | {X(s) : s ≤ t}] = E[Z | X(t)] a.s.
The following theorem says that the future and the past of a
Markov process are conditionally independent given the present.
6.24 Theorem Consider a Markov process {X(t) : t ∈ [0, ∞)}. If
Z is an integrable random variable that is σ({X(s) : s ≥ t})-measurable
and if Y is an integrable random variable that is
σ({X(s) : s ≤ t})-measurable then E[ZY | X(t)] = E[Z | X(t)]
E[Y | X(t)] a.s.
Notice in the previous theorem that Z is a function of the present
and the future of the Markov process and that Y is a function
of the present and the past of the Markov process.
6.25 Theorem (Chapman-Kolmogorov) Consider a Markov process
{X(t) : t ∈ [0, ∞)}. If Z is an integrable random variable
that is σ({X(s) : s ≥ t})-measurable and if 0 ≤ t_0 < t then
E[Z | X(t_0)] = E[E[Z | X(t)] | X(t_0)] a.s.
Example 6.4 Consider a zero mean, Gaussian, Markov
random process {X(t) : t ∈ ℝ} and consider real numbers
t_1 < t_2 < ... < t_n < t. Since the process is Markov it follows
that E[X(t) | X(t_1), X(t_2), ..., X(t_n)] = E[X(t) | X(t_n)]
a.s. But, since the process is also Gaussian, we know that
E[X(t) | X(t_n)] = aX(t_n) a.s. for some real constant a. Now,
consider the problem of estimating or predicting the value of
the process at some future time t based on a collection of past
samples of the process by taking the conditional expectation of
the process at time t given the past samples. For a Gaussian
Markov process, this estimate is simply a linear function of the
last sample. All previous samples taken before the last sample
may be discarded. Although we will not show it here, an estimate
of this type based on conditional expectation provides a
best (in a minimum mean-square error sense) estimator of the
random variable of interest as a Borel measurable transformation
of the data. □
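The discrete-time analogue of Example 6.4 can be checked by simulation. The sketch below (our illustration) uses a stationary AR(1) sequence, a standard discrete stand-in for a zero mean Gaussian Markov process, and recovers the regression coefficient a = Cov(X(t), X(t_n))/Var(X(t_n)), which here equals the one-step correlation ρ.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stationary AR(1): X_{k+1} = rho X_k + sqrt(1 - rho^2) W_k with W_k iid
# N(0, 1); this is a zero mean Gaussian Markov sequence with unit variance.
rho, nsteps, npaths = 0.8, 5, 400_000
X = rng.standard_normal(npaths)            # start in the stationary law
for _ in range(nsteps):
    X = rho * X + np.sqrt(1 - rho**2) * rng.standard_normal(npaths)
X_last = X.copy()                          # sample at the "present" time t_n
X_future = rho * X + np.sqrt(1 - rho**2) * rng.standard_normal(npaths)

# E[X(t) | X(t_n)] = a X(t_n) with a = Cov / Var; here a = rho.
a_hat = np.mean(X_future * X_last) / np.mean(X_last**2)
```

The estimated coefficient a_hat converges to ρ, illustrating that the best predictor depends only on the last sample.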
6.7 Martingales
Let {X_n}_{n∈ℕ} be a random sequence defined on a probability
space (Ω, F, P) and let {F_n}_{n∈ℕ} be a sequence of σ-subalgebras
of F. The random sequence {X_n}_{n∈ℕ} is said to be a martingale
relative to {F_n : n ∈ ℕ} if the following four conditions hold for
each positive integer n:
1. F_n ⊂ F_{n+1},
2. X_n is F_n-measurable,
3. E[|X_n|] < ∞, and
4. E[X_{n+1} | F_n] = X_n a.s.
Do you know what it is to be possessed by a
problem, to have within yourself some urge
that keeps you at it every waking moment,
that makes you alert to every sign pointing
the way to its solution; to be gripped by
a piece of work so that you cannot let it
alone, and to go on with deep joy to its
accomplishment? Lao G. Simons
A sequence of σ-algebras that satisfies condition (1) is called
a filtration. If condition (2) holds for all n ∈ ℕ then we say
that the random sequence {X_n}_{n∈ℕ} is adapted to the filtration
{F_n : n ∈ ℕ}. If F_n = σ(X_1, X_2, ..., X_n) then {F_n : n ∈ ℕ} is
a filtration and is called the canonical filtration associated with
the random sequence {X_n}_{n∈ℕ}. If a martingale is given without
a specified filtration then it should be regarded as a martingale
with respect to its canonical filtration.
6.26 Theorem If a random sequence is a martingale with respect to
some filtration then it is a martingale with respect to its canonical
filtration.
Proof. Assume that {X_n : n ∈ ℕ} is a martingale with respect
to some filtration {F_n : n ∈ ℕ} and let m and n be positive
integers such that m < n. Since X_m is F_m-measurable and
F_m ⊂ F_n, it follows that X_m is F_n-measurable. Thus, it follows
that X_1, X_2, ..., X_n are each F_n-measurable for any positive
integer n. Finally, since σ(X_1, ..., X_n) ⊂ F_n for any n ∈ ℕ, it
follows that
E[X_{n+1} | σ(X_1, ..., X_n)] = E[E[X_{n+1} | F_n] | σ(X_1, ..., X_n)]
= E[X_n | σ(X_1, ..., X_n)] = X_n a.s. □
A random sequence {X_n}_{n∈ℕ} is said to be a submartingale
relative to {F_n : n ∈ ℕ} if conditions (1), (2), and (3) given above
and condition (4') given below each hold for every positive integer
n: (4') E[X_{n+1} | F_n] ≥ X_n a.s.
A random sequence {X_n}_{n∈ℕ} is said to be a supermartingale
relative to {F_n : n ∈ ℕ} if conditions (1), (2), and (3) given
above and condition (4'') given below each hold for every positive
integer n: (4'') E[X_{n+1} | F_n] ≤ X_n a.s.
Example 6.5 Let X_1, X_2, ... be mutually independent random
variables with zero means and finite variances. Further, let
S_n = X_1 + X_2 + ... + X_n and T_n = S_n^2 for n ∈ ℕ. Note that
E[T_{n+1} | X_1, ..., X_n]
= E[(X_1 + X_2 + ... + X_n + X_{n+1})^2 | X_1, ..., X_n]
= E[(X_1 + ... + X_n)^2 + 2X_{n+1}(X_1 + ... + X_n) + X_{n+1}^2 | X_1, ..., X_n]
= E[S_n^2 + 2X_{n+1}S_n + X_{n+1}^2 | X_1, ..., X_n]
= E[S_n^2 | X_1, ..., X_n] + 2E[X_{n+1}S_n | X_1, ..., X_n] + E[X_{n+1}^2 | X_1, ..., X_n]
= S_n^2 + 2S_n E[X_{n+1}] + E[X_{n+1}^2]
= T_n + E[X_{n+1}^2]
≥ T_n a.s.
Thus, {T_n} is a submartingale with respect to σ(X_1, ..., X_n). □
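Taking expectations in Example 6.5 gives E[T_{n+1}] = E[T_n] + E[X_{n+1}^2], so for standard normal summands E[S_n^2] should grow by exactly one per step. A quick Monte Carlo sketch (our illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

npaths, nsteps = 200_000, 10
X = rng.standard_normal((npaths, nsteps))  # zero mean, unit variance
S = np.cumsum(X, axis=1)                   # S_n = X_1 + ... + X_n
T = S**2                                   # the submartingale T_n = S_n^2
mean_T = T.mean(axis=0)                    # should be close to n for each n
```

The sample means of T_n track 1, 2, ..., 10, the cumulative variances.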
Martingales are often used to model gambling games that are
fair. That is, they model a game in which the expected fortune
of the gambler after the next play is the same as his present
fortune. In this context, a submartingale would represent a
game that is favorable to the gambler and a supermartingale
would represent a game that is unfavorable to the gambler. (A
"martingale" is part of a horse's harness that prevents the horse
from raising its head too high. A martingale became a gambling
term through its association with horse racing and later
was used to describe processes of this sort.) The following the
orem is called the Martingale Convergence Theorem and is due
to Joseph Doob.
6.27 Theorem (Doob) Let the random sequence {X_n}_{n∈ℕ} be a submartingale
with respect to its canonical filtration. If sup_{n∈ℕ}
E[|X_n|] is finite then X_n converges almost surely to a random
variable X such that E[|X|] ≤ sup_{n∈ℕ} E[|X_n|].
Consider a filtration {F_n : n ∈ ℕ} and let F_∞ denote the smallest
σ-algebra containing ∪_{n=1}^∞ F_n. In this case we write F_n ↑ F_∞
and have the following result.
6.28 Theorem If F_n ↑ F_∞ and if Z is an integrable random variable
then E[Z | F_n] converges to E[Z | F_∞] a.s.
6.8 Random Processes with Orthogonal Increments
A random process {X(t) : t ∈ T} is said to possess orthogonal
increments if E[|X(t) − X(s)|^2] < ∞ for all s, t ∈ T and if,
whenever parameter values satisfy the inequality s_1 < t_1 ≤ s_2 <
t_2, the increments X(t_1) − X(s_1) and X(t_2) − X(s_2) of the process
are orthogonal; that is, E[(X(t_1) − X(s_1))(X(t_2) − X(s_2))] = 0.
6.29 Theorem Let {X(t) : t ∈ T} be a random process with orthogonal
increments. There exists a nondecreasing function F(t) such
that E[|X(t) − X(s)|^2] = F(t) − F(s) when s < t. Further, the
function F is unique up to an additive constant.
Notice that the previous theorem implies that the mean-square
continuity of a random process with orthogonal increments is
related to the pointwise continuity of the corresponding function
F(t). We denote the relationship between the function F(t)
and the random process {X(t)} by writing E[|dX(t)|^2] = dF(t).
(Our use of a differential here is just for notational purposes.)
Let {Y(t) : t ∈ ℝ} be a random process with orthogonal increments
and let h(t) be a real-valued function. We are now going
to direct our attention toward defining an integral of the form
∫_ℝ h(t) dY(t).
Since the sample functions of the random process {Y(t)} are
not generally of bounded variation, we cannot define the above
integral as an ordinary Riemann-Stieltjes integral with respect
to the individual sample functions. Instead, we will define this
integral (called a stochastic integral of h(t) with respect to the
random process {Y(t)}) as an L2 limit. As usual, we begin by
defining the integral when h(t) is a step function.
Assume that the function h(t) is of the following form where a_i
and c_i are real numbers for each i and where a_1 < a_2 < ... < a_n:
h(t) = 0 if t < a_1, h(t) = c_j if a_{j−1} ≤ t < a_j for 1 < j ≤ n,
and h(t) = 0 if t ≥ a_n.
For such a function h we will define the stochastic integral of
h(t) with respect to {Y(t)} to be
Σ_{j=2}^n c_j (Y(a_j) − Y(a_{j−1})).
More precisely, we define the integral to be any random variable
that is equal almost surely to the sum on the right hand side.
(One technical detail is that if a_j is a discontinuity point of
F then instead of Y(a_j) in the previous definition we use the
mean-square limit of Y(t) as t ↑ a_j. This limit will exist due
to the relation between F and Y(t).) It is not difficult to show
that if h(t) and g(t) are step functions (as defined above) and if
E[|dY(t)|^2] = dF(t) then
E[∫_ℝ h(t) dY(t) ∫_ℝ g(t) dY(t)] = ∫_ℝ h(t)g(t) dF(t).
Now, consider a real-valued function h(t) and let {h_n(t)}_{n∈ℕ} be
a sequence of step functions (as defined above) such that
∫_ℝ (h(t) − h_n(t))^2 dF(t) → 0
as n → ∞. Further, let Z_n denote the stochastic integral of the
step function h_n(t) with respect to the random process {Y(t)}.
The previous observation implies that there exists a random
variable Z such that E[(Z − Z_n)^2] → 0 as n → ∞. Further,
the random variable Z does not depend on the particular sequence
{h_n}_{n∈ℕ} given above. That is, a random variable equal
almost surely to Z will be obtained when any sequence {h_n}_{n∈ℕ}
converging in the above sense to h is chosen. We define the
stochastic integral
∫_ℝ h(t) dY(t)
to be any random variable that is equal almost surely to the
random variable Z.
6.30 Theorem In the context of our discussion, a function h may
be represented as a limit of step functions (in the above sense)
if the integral ∫_ℝ h^2(t) dF(t) exists and is finite.
If h(t) = g(t) + i p(t) is a complex-valued function such that g and
p each satisfy the condition of the previous theorem then the integral
∫_ℝ h(t) dY(t) is defined to be ∫_ℝ g(t) dY(t) + i ∫_ℝ p(t) dY(t).
We will say more about complex-valued random processes later.
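A sketch of the step-function case (our illustration): for standard Brownian motion the increments are orthogonal with E[|dY(t)|^2] = dt, so F(t) = t, and the isometry E[∫h dY ∫g dY] = ∫ h g dF can be checked by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(3)

# Step functions h and g on breakpoints a_1 < a_2 < a_3 < a_4, and Brownian
# increments Y(a_j) - Y(a_{j-1}) ~ N(0, a_j - a_{j-1}), independent.
a_pts = np.array([0.0, 0.3, 0.7, 1.0])
h_vals = np.array([1.0, -2.0, 0.5])        # value of h on each [a_{j-1}, a_j)
g_vals = np.array([2.0, 1.0, -1.0])

npaths = 400_000
incs = rng.standard_normal((npaths, 3)) * np.sqrt(np.diff(a_pts))

int_h = incs @ h_vals                      # sum_j c_j (Y(a_j) - Y(a_{j-1}))
int_g = incs @ g_vals

lhs = np.mean(int_h * int_g)
rhs = np.sum(h_vals * g_vals * np.diff(a_pts))   # int h g dF with dF = dt
```

Here the right-hand side evaluates to −0.35, and the Monte Carlo average on the left agrees with it up to sampling error.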
6.9 Wide Sense Stationary Random Processes
Recall the definition of a wide sense stationary random process
that was given on page 140. In this section we will consider
zero mean, continuous time WSS random processes with particular
concern for their harmonic properties. Throughout this
section we will assume, based upon the following theorem, that
all wide sense stationary random processes satisfy the following
condition:
lim_{t−s→0} E[|X(t) − X(s)|^2] = 0.
6.31 Theorem A wide sense stationary random process {X(t) : t ∈ ℝ}
possesses a separable and measurable modification if
lim_{t−s→0} E[|X(t) − X(s)|^2] = 0.
Further, if a modification of {X(t) : t ∈ ℝ} is measurable then
the process must satisfy the previous condition.
Thus, the previous condition is a minimal continuity hypothesis
and we will assume that it is satisfied whenever we discuss
continuous parameter WSS random processes. In addition, we will
always take the parameter set of such a process to be either ℝ or
[0, ∞). Recall that the autocovariance function of a zero mean
WSS random process is defined by K(t) = E[X(s + t)X(s)].
Note that the autocovariance function K(t) is continuous since
|K(t) − K(s)| = |E[(X(t) − X(s))X(0)]|
≤ √(E[|X(t) − X(s)|^2] E[|X(0)|^2])
where the right hand side approaches zero as t − s → 0 via the
previous continuity hypothesis.
6.32 Theorem The autocovariance function K(t) of a zero mean
WSS random process may be expressed as
K(t) = ∫_ℝ e^{2πitλ} dF(λ),
where the function F is nondecreasing, bounded, right continuous,
and such that F(−∞) = 0. Further, the function F is the
unique such function for which the above equality is satisfied.
Consider a zero mean WSS random process with an autocovariance
function K and let F denote the function obtained via the
previous theorem for this autocovariance function. The function
F is called the spectral distribution function of a zero mean
WSS random process with autocovariance function K. If F is
absolutely continuous then its derivative F′ exists almost everywhere
and is called a spectral density function of a WSS random
process with autocovariance function K.
6.33 Theorem If ∫_ℝ |K(t)| dt < ∞ then there exists a continuous
spectral density function given by
f(λ) = ∫_ℝ e^{−2πiλt} K(t) dt.
The spectrum of a WSS random process is given by the set of
all real numbers λ_0 such that F(λ_0 + ε) > F(λ_0 − ε) for every
ε > 0. That is, the spectrum consists of all points of increase of
the spectral distribution function F. Note in particular that the
spectrum of a WSS random process is a subset of ℝ and is not,
as is frequently misstated, a function. The spectrum consists
of frequencies that enter into the harmonic analysis of both the
autocovariance function and the sample functions of the random
process.
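Theorem 6.33 can be illustrated numerically (our sketch, not from the text): for K(t) = e^{−|t|} the transform ∫ e^{−2πiλt} K(t) dt works out to 2/(1 + (2πλ)^2), and a Riemann sum over a long grid reproduces it.

```python
import numpy as np

# Autocovariance K(t) = exp(-|t|) sampled on a long symmetric grid.
t = np.linspace(-40.0, 40.0, 400_001)
dt = t[1] - t[0]
K = np.exp(-np.abs(t))

def f_numeric(lam):
    # Since K is even the imaginary part cancels and the transform is real.
    return np.sum(K * np.cos(2 * np.pi * lam * t)) * dt

lams = np.array([0.0, 0.1, 0.5, 1.0])
numeric = np.array([f_numeric(l) for l in lams])
analytic = 2.0 / (1.0 + (2 * np.pi * lams) ** 2)
```

The numerical transform matches the closed form at each test frequency.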
6.34 Theorem Every zero mean wide sense stationary random process
{X(t) : t ∈ ℝ} satisfying the continuity condition given at
the beginning of this section possesses a spectral representation
of the form
X(t) = ∫_ℝ e^{2πitλ} dY(λ),
where the random process {Y(λ) : λ ∈ ℝ} has orthogonal increments
and is such that E[|dY(λ)|^2] = dF(λ) where F is the
spectral distribution function of {X(t)}.
Let {X(t) : t ∈ ℝ} be a mean-square continuous, wide sense
stationary random process defined on a probability space (Ω, F, P)
and let H denote the real Hilbert space L2(Ω, F, P).
Note that for any real number t, the random variable X(t) is a
point in H. Further, note that ‖X(t)‖ = √R(0) where R is the
autocorrelation function of the random process {X(t) : t ∈ ℝ}.
Let S denote the sphere in H consisting of all points in H that
are at a distance of √R(0) from the origin. Note that the random
process {X(t) : t ∈ ℝ} is a subset of S. Further, since the
random process is mean-square continuous, we see that as s → t,
‖X(s) − X(t)‖ → 0, which implies that the random process is a
continuous curve in H.
6.10 Complex-Valued Random Processes
It is often convenient to be able to deal with random processes
that are complexvalued. The extension from the real case is
very straightforward.
If {Y(t) : t ∈ T} is a random process taking values in the complex
plane then Y(t) = X(t) + iZ(t) where {X(t) : t ∈ T}
and {Z(t) : t ∈ T} are real-valued random processes. Further,
E[Y(t)] = E[X(t)] + iE[Z(t)] for all t for which the expectations
on the right hand side exist and are finite. We say that a
complex-valued function is measurable if the real and imaginary
parts of the function are each measurable. Finally, the autocovariance
function of a complex-valued random process {Y(t)} is
given by K(s, t) = E[(Y(s) − E[Y(s)])(Y(t) − E[Y(t)])*].
One should be careful not to carelessly apply theorems about
real-valued random processes to complex-valued random processes.
For example, a wide sense stationary complex-valued
Gaussian process need not be strictly stationary, and there exist
two complex-valued mutually Gaussian random variables that
are uncorrelated but not independent.
6.11 Linear Operations on WSS Random Processes
Let {X(t) : t ∈ ℝ} be a zero mean WSS random process with a
spectral representation given by
X(t) = ∫_ℝ e^{2πitλ} dY(λ),
where E[|dY(λ)|^2] = dF(λ). By a linear operation on the process
{X(t)} we will mean a transformation of {X(t)} into a
random process {Z(t) : t ∈ ℝ} of the form
Z(t) = ∫_ℝ C(λ) e^{2πitλ} dY(λ).
The function C(λ) may be any real or complex-valued function
for which the following integral exists and is finite:
∫_ℝ |C(λ)|^2 dF(λ).
The function C is called the gain of the linear operation. Note
that the process {Z(t)} is a zero mean WSS random process
and also satisfies the continuity condition given at the beginning
of this section. (That {Z(t)} is zero mean follows from the
definition of the stochastic integral and the fact that {X(t)} is
zero mean.) Further, {Z(t)} is a WSS random process since its
autocovariance function Q(t) is given by
Q(t) = ∫_ℝ e^{2πitλ} |C(λ)|^2 dF(λ).
In addition, the spectral distribution function G(λ) of {Z(t)} is
given by
G(λ) = ∫_{(−∞,λ]} |C(μ)|^2 dF(μ).
If {X(t)} possesses a spectral density function f(λ) then {Z(t)}
possesses a spectral density function g(λ) that is given by g(λ) =
|C(λ)|^2 f(λ).
In engineering contexts, it is common to consider linear operations
(sometimes called linear filters) of the following form
where the function h(t) is often (imprecisely) called an impulse
response function:
Z(t) = ∫_ℝ h(s) X(t − s) ds.
As the following theorem shows, this linear filter is a special case
of the linear operations that we have been considering.
6.35 Theorem Let {X(t) : t ∈ ℝ} be a zero mean WSS random
process with a bounded spectral density function and a spectral
representation given by
X(t) = ∫_ℝ e^{2πitλ} dY(λ).
Further, let h be a real or complex-valued function defined on ℝ
that is continuous a.e. with respect to Lebesgue measure, integrable,
and square integrable. Define
∫_ℝ h(s)X(t − s) ds
to be the limit in L2 as T → ∞ of the L2 integral
∫_{(−T,T)} h(s)X(t − s) ds.
This L2 limit exists and is equal to
∫_ℝ H(λ) e^{2πitλ} dY(λ),
where H is the Fourier transform of h. The function H is sometimes
called the transfer function of the linear filter.
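A numerical sanity check (our sketch) of g(λ) = |H(λ)|^2 f(λ): the output variance ∫ g(λ) dλ must agree with the time-domain double integral ∬ h(s)h(u)K(s − u) ds du. We take the RC-type impulse response h(s) = e^{−s} for s ≥ 0, so |H(λ)|^2 = 1/(1 + (2πλ)^2), and the input K(t) = e^{−|t|} with density f(λ) = 2/(1 + (2πλ)^2); both sides equal 1/2.

```python
import numpy as np

# Frequency side: int |H|^2 f dlambda for the RC filter h(s) = exp(-s), s >= 0.
lam = np.linspace(-50.0, 50.0, 200_001)
dlam = lam[1] - lam[0]
f_in = 2.0 / (1.0 + (2 * np.pi * lam) ** 2)        # input spectral density
H2 = 1.0 / (1.0 + (2 * np.pi * lam) ** 2)          # |H(lambda)|^2
var_freq = np.sum(H2 * f_in) * dlam

# Time side: int int h(s) h(u) K(s - u) ds du with K(t) = exp(-|t|);
# the integrand simplifies to exp(-2 max(s, u)).
s = (np.arange(1000) + 0.5) * 0.015                # midpoint grid on [0, 15]
ds = 0.015
var_time = np.sum(np.exp(-2.0 * np.maximum.outer(s, s))) * ds * ds
```

Both quadratures land near 0.5, as the analytic calculation predicts.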
◊ 6.12 Nonlinear Transformations
Random processes often appear as models for random signals
and noise. An assumption of stationarity is often warranted.
Nonlinear systems that commonly appear in practice include
half wave rectifiers, limiters, square law devices, and others. Let
{X(t) : t ∈ ℝ} be a stationary Gaussian random process with
mean zero and positive variance σ^2. Assume that the autocorrelation
function of X(t) is denoted by R(τ) = E[X(t)X(t + τ)].
Further, assume that the function R(·) is positive definite. For
two random variables X(t_1) and X(t_2) in the random process,
let ρ(t_1 − t_2) = R(t_1 − t_2)/σ^2 denote their correlation coefficient.
Recall that a bivariate probability density function f(·, ·) exists
for these two random variables. Further, we can and do take
this bivariate probability density function to be continuous as a
function of its two real arguments. Indeed, we note that f(x, y)
can be taken as the bivariate Gaussian density function given
on page 112 with m_1 = m_2 = 0, with σ_1 = σ_2 = σ, and with
ρ = ρ(t_1 − t_2). That is, X(t_1) and X(t_2) have a N(0, 0, σ^2,
σ^2, ρ(t_1 − t_2)) distribution. Now, let p denote the continuous
marginal Gaussian density function. That is, p is the continuous
probability density function corresponding to the N(0, σ^2)
distribution. Let τ = t_1 − t_2.
Define the measure m on B(ℝ) via m(B) = ∫_B p(x) dx, and note
that m is equivalent to Lebesgue measure on B(ℝ). Further,
consider the real Hilbert space L2(ℝ, B(ℝ), m). We will take
this real Hilbert space as our space of nonlinearities. We will let
the Borel measurable function g correspond to a point in L2(ℝ,
B(ℝ), m), and we will refer to g as a nonlinearity. Nonlinear
systems such as this are often referred to as zero memory nonlinearities.
That is, the output is a Borel measurable function
of the input at the same time; if it depends on the input at
earlier times, then the system is said to have memory. We will
be concerned with the random process {g(X(t)) : t ∈ ℝ}. Note
that this random process is also a stationary random process.
For a nonnegative integer n, let
θ_n(x) = ((−1)^n σ^n / √(n!)) (1/p(x)) (d^n/dx^n) p(x),
where p is the univariate Gaussian density function given above.
These functions are called the orthonormalized Hermite polynomials
and are obtained by applying the Gram-Schmidt orthonormalization
procedure to the collection of functions of the
form x^k for nonnegative integers k. Note that θ_n is an nth degree
polynomial. Note, also, that for each nonnegative integer n, the
norm of θ_n is unity, and ⟨θ_n, θ_m⟩ = 0 for nonnegative distinct
integers n and m. Thus, the set of functions {θ_n : n ∈ ℕ ∪ {0}}
is a set of orthonormal functions in L2(ℝ, B(ℝ), m). Indeed, it
is an orthonormal subset of the real Hilbert space L2(ℝ, B(ℝ), m).
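These polynomials can be generated from the probabilists' Hermite polynomials He_n: with the N(0, σ^2) weight, θ_n(x) = He_n(x/σ)/√(n!). The sketch below (our illustration) checks orthonormality in L2(ℝ, B(ℝ), m) using Gauss-Hermite quadrature.

```python
import numpy as np
from numpy.polynomial import hermite_e as He
from math import factorial, sqrt, pi

sigma = 2.0

def theta(n, x):
    # theta_n(x) = He_n(x / sigma) / sqrt(n!), orthonormal for N(0, sigma^2)
    c = np.zeros(n + 1)
    c[n] = 1.0
    return He.hermeval(x / sigma, c) / sqrt(factorial(n))

# hermegauss integrates against exp(-u^2 / 2); dividing by sqrt(2 pi) gives
# expectations under N(0, 1), and x = sigma * u gives N(0, sigma^2).
u, w = He.hermegauss(60)
x = sigma * u

gram = np.array([[np.sum(w * theta(i, x) * theta(j, x)) / sqrt(2 * pi)
                  for j in range(6)] for i in range(6)])
```

The Gram matrix of the first six θ_n's is the identity, confirming orthonormality under the Gaussian measure m.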
Recall Lusin's Theorem, which states that for any element g of
L2(ℝ, B(ℝ), m) and for any positive ε there exists a continuous
function c(·) that is equal to g on a given bounded interval
off a set of Lebesgue measure less than ε. Recalling
the Weierstrass approximation theorem, we know that there exists
a sequence of polynomials that converges to c(·) uniformly
on the interval of interest. Thus we can make the uniform norm
between c(·) and a polynomial arbitrarily small on the interval
of concern. This polynomial can be written as a linear combination
of the orthonormalized polynomials θ_n(·). With this
reasoning, we see that the set of finite linear combinations of the
orthonormalized polynomials {θ_n(·) : n ∈ ℕ ∪ {0}} is dense in
L2(ℝ, B(ℝ), m). Hence, any g ∈ L2(ℝ, B(ℝ), m) can be expressed as
g = Σ_{n=0}^∞ b_n θ_n(·),
where the convergence is in L2(ℝ, B(ℝ), m). Note that
Σ_{n=0}^∞ b_n^2 = ∫_ℝ |g(x)|^2 dm(x) < ∞
via Parseval's equality and since g is an element of L2(ℝ, B(ℝ), m).
Consider again the bivariate density function f of the random
variables X(t_1) and X(t_2) where, for convenience, we let X =
X(t_1) and Y = X(t_2). This bivariate density admits a series
expansion, given via the Mehler series, as
f(x, y) = p(x)p(y) Σ_{n=0}^∞ ρ^n(τ) θ_n(x) θ_n(y),
where the convergence is in the following sense:
lim_{N→∞} ∫_ℝ ∫_ℝ (f(x, y)/(p(x)p(y)) − Σ_{n=0}^N ρ^n(τ) θ_n(x) θ_n(y))^2 p(x)p(y) dx dy = 0.
We now consider a nonlinearity g and the bivariate density function
f as given above. Consider, also, the output random process
g(X(t)). We are interested in the bandwidth characteristics of
the output. The autocorrelation function of the output is given
by R_g(τ) = E[g(X(t))g(X(t + τ))]. Thus, we see that
R_g(τ) = E[g(X)g(Y)] = ∫_{−∞}^∞ ∫_{−∞}^∞ g(x)g(y)f(x, y) dx dy.
Observe, further, that
R_g(τ) = Σ_{n=0}^∞ b_n^2 ρ^n(τ),
where the convergence is uniform in τ, since the b_n's are square
summable and since |ρ(τ)| ≤ 1.
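The series for R_g can be checked against a closed form. For the cubic nonlinearity g(x) = x^3 with σ = 1 (our illustrative choice), the expansion is x^3 = 3θ_1(x) + √6 θ_3(x), so R_g(τ) = 9ρ(τ) + 6ρ^3(τ), which matches the known moment E[X^3 Y^3] = 9ρ + 6ρ^3 for a standard bivariate normal pair:

```python
import numpy as np
from numpy.polynomial import hermite_e as He
from math import factorial, sqrt, pi

u, w = He.hermegauss(60)                   # Gauss-Hermite_e nodes and weights

def b(n):
    # b_n = <g, theta_n> in L2(m) for g(x) = x^3, sigma = 1
    c = np.zeros(n + 1)
    c[n] = 1.0
    theta_n = He.hermeval(u, c) / sqrt(factorial(n))
    return np.sum(w * u**3 * theta_n) / sqrt(2 * pi)

coeffs = np.array([b(n) for n in range(8)])

rho = 0.6
series = np.sum(coeffs**2 * rho ** np.arange(8))
exact = 9 * rho + 6 * rho**3               # E[X^3 Y^3] at correlation rho
```

Only b_1 and b_3 are nonzero, and the truncated series reproduces the exact moment.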
Now suppose that the input random process {X(t) : t ∈ ℝ}
has a spectral density function that has compact support. Let
S denote the spectral density function of the input, and let Ω
denote the support of S. We are thus assuming that the input
is bandlimited and that the Lebesgue measure of Ω is bounded.
Recall that the Fourier transform of σ^2 ρ(τ) is equal to S(ω).
Hence, by well known properties of Fourier transforms, we see
that the Fourier transform of ρ^n(τ) is given by
σ^{−2n} S(ω)^{*n},
where S(ω)^{*n} = (S * S * ... * S)(ω) denotes S(·) convolved with
itself n − 1 times. (Note that S * S(ω) is S convolved with itself
once.) Assume that the nonlinearity g is not almost everywhere
equal to a polynomial. Then infinitely many of the b_n's are
nonzero. Note that in this case, if the input random process
{X(t) : t ∈ ℝ} were not bandlimited, then the output random
process {g(X(t)) : t ∈ ℝ} would not be bandlimited, either.
Next we show that even if the input random process were bandlimited,
the output random process is not bandlimited when g
is not a polynomial. Observe that the support of
σ^{−2n} S(ω)^{*n}
is given by the n-fold Minkowski sum of Ω with itself; that is, its
support is Ω ⊕ ... ⊕ Ω = ⊕^n Ω where ⊕ denotes the Minkowski
sum; that is, A ⊕ B = {a + b : a ∈ A, b ∈ B}. Assume again that
the input is bandlimited. It follows that ∞ > λ(⊕^n Ω) ≥ nλ(Ω)
where λ is Lebesgue measure. Further, recalling the Cantor
ternary set, we see that the above inequality can be strict, and
by considering a closed interval, we see that it can be satisfied
with equality. For the moment, we will measure bandwidth by
the Lebesgue measure of the support of the spectral density.
Thus, we see that the output is bandlimited if and only if there
exists a positive integer N such that b_n = 0 for all n > N.
Notice that this condition is equivalent to the nonlinearity being
almost everywhere equal to a polynomial. However, since we are
assuming that the nonlinearity is not almost everywhere equal to
a polynomial, we see that the b_n's do not truncate, and thus the
output is not bandlimited. On the other hand, if the nonlinearity
is almost everywhere equal to a polynomial then we see that the
output is bandlimited if and only if the input is bandlimited.
We summarize this result with the following theorem.
6.36 Theorem The output random process in the above discussion
is strictly bandlimited if and only if the Gaussian input random
process is strictly bandlimited and the nonlinearity g(·) is almost
everywhere equal to a polynomial.
Recall, in particular, that "limiters" are not polynomials, and
thus the output of a limiter with a Gaussian input is never
bandlimited.
Now we will consider the case where the input random process
is not strictly bandlimited but has a finite second moment bandwidth
given by
B_2[X] = (1/σ^2) ∫_{−∞}^∞ ω^2 S(ω) dω.
Assume for now that the mean has been subtracted from the output
random process. Observe that this is equivalent to assuming
that b₀ = 0 since B₀(x) = 1 for all x and thus b₀ = E[g(X(t))].
Observe, also, that σ^{−2n} S(ω)^{*n} can be viewed as a density
function of a sum of n mutually independent, identically distributed
random variables each with mean zero and variance equal to the
second moment bandwidth of the input B₂[X]. Thus it follows
that

    ∫_{−∞}^{∞} ω² σ^{−2n} S(ω)^{*n} dω = n B₂[X].

Now recall that the output autocorrelation function is given by

    R(τ) = Σ_{n=1}^{∞} bₙ² ρⁿ(τ).

Further, recall from standard properties of Fourier transforms
that B₂[X] = −ρ″(0). Similarly, note that

    B₂[Y] = −R″(0)/R(0),

where B₂[Y] denotes the output second moment bandwidth.
Next, we will deduce the preceding derivative, using Fubini's
theorem to justify term by term differentiation. Note that ρ(τ)
is maximized at the origin. Further, there exists a positive number δ
such that ρ(τ) is monotone in (0, δ). Also, ρⁿ(τ) has the same
monotonicity property. Taking derivatives from the right, and
using Fubini's theorem on term by term differentiation, we see
that

    R′(τ) = Σ_{n=1}^{∞} bₙ² n ρ^{n−1}(τ) ρ′(τ)

and that

    R″(τ) = Σ_{n=1}^{∞} ( bₙ² n(n−1) ρ^{n−2}(τ) (ρ′(τ))² + bₙ² n ρ^{n−1}(τ) ρ″(τ) ).

Thus, since ρ(0) = 1 and ρ′(0) = 0, we see that

    R″(0) = Σ_{n=1}^{∞} n bₙ² ρ″(0).
Further, we see that

    B₂[Y] = −R″(0)/R(0) = ( Σ_{n=1}^{∞} n bₙ² / Σ_{n=1}^{∞} bₙ² ) B₂[X].

Thus the output second moment bandwidth is greater than or
equal to the input second moment bandwidth, with equality holding
precisely when bₙ = 0 for all n > 1. This, however, characterizes
the case where g is almost everywhere equal to an affine
function; that is, when g(x) = ax + b for real numbers a and b.
We summarize this result in the following theorem.
6.37 Theorem  If {X(t) : t ∈ ℝ} is a zero mean Gaussian random
process that has a finite mean square bandwidth, and if g is a
nonlinearity such that g(X(t)) has zero mean, then the mean
square bandwidth of g(X(t)) is greater than or equal to that of
the input. Equality holds if and only if g is almost everywhere
equal to an affine function.
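Under the same (assumed) identification of the Bₙ with the orthonormalized probabilists' Hermite polynomials, the bandwidth ratio Σnbₙ²/Σbₙ² can be spot-checked for a concrete non-affine nonlinearity, the cubic g(x) = x³, whose only nonzero coefficients turn out to be b₁ = 3 and b₃ = √6:

```python
# Spot check of the bandwidth ratio sum(n b_n^2) / sum(b_n^2) for the
# cubic nonlinearity g(x) = x^3, with b_n taken (as an assumption)
# against orthonormalized probabilists' Hermite polynomials.
import math
import numpy as np
from numpy.polynomial import hermite_e as H

x, w = H.hermegauss(60)                  # Gauss-Hermite nodes and weights
w = w / math.sqrt(2.0 * math.pi)         # renormalize to the Gaussian density

def coeff(n):
    c = np.zeros(n + 1)
    c[n] = 1.0                           # select He_n
    return float(np.sum(w * x**3 * H.hermeval(x, c))) / math.sqrt(math.factorial(n))

b = [coeff(n) for n in range(1, 6)]              # b_1, ..., b_5
num = sum((k + 1) * v * v for k, v in enumerate(b))   # sum n b_n^2
den = sum(v * v for v in b)                           # sum b_n^2
print(num / den)    # 1.8: the output bandwidth exceeds the input's
```

The ratio is (1·9 + 3·6)/(9 + 6) = 1.8 > 1, consistent with the theorem: for a non-affine g the output bandwidth is strictly larger.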
◇ 6.13 Brownian Motion
The random processes that we will describe in this section were
first used to model the movement of a particle suspended in a
fluid and bombarded by molecules in thermal motion. Such motion
was first analyzed by a nineteenth-century botanist named
Robert Brown. The mathematical foundations of the theory
were later developed by Albert Einstein in 1905 and (rigorously)
by Norbert Wiener in 1924.
A Brownian motion process (or a Wiener process) is a random
process {W(t) : t ≥ 0} defined on some probability space (Ω,
F, P) that satisfies the following four properties:

1. W(0, ω) = 0 for each ω ∈ Ω,

2. for any real numbers 0 ≤ t₀ < t₁ < ··· < t_k, the increments
W(t_k) − W(t_{k−1}), W(t_{k−1}) − W(t_{k−2}), ..., W(t₁) −
W(t₀) are mutually independent,
3. for 0 ≤ s < t, the increment W(t) − W(s) is a Gaussian
random variable with mean zero and variance t − s,

4. for each ω ∈ Ω, W(t, ω) is continuous in t.
6.38 Theorem  There exists a random process defined on a probability
space (that may be taken as the unit interval with Lebesgue
measure) that satisfies conditions (1), (2), (3), and (4) given
above.
Returning to the physical motivation given above, a Brownian
motion process may be used as a model for a single component
of the path of a suspended particle subjected to molecular
bombardment. For example, consider the projection onto the
vertical axis of such a particle's path. Condition (2) reflects a
lack of memory of the suspended particle. That is, although the
future behavior of the particle depends on its present position,
it does not depend on how the particle arrived at its present
position. Condition (3), which specifies that the increments have
zero mean, indicates that the particle is equally likely to go up or
down. That is, there is no drift. Condition (3), which specifies
that the variance of the increments grows in proportion to the
length of the interval, indicates that the particle tends to wander
away from its present position and having done so suffers no
force tending to restore it. Condition (4) is a natural condition
to expect of the path of a particle and condition (1) is merely a
convention.
Since W(t) − W(0) = W(t), property (3) implies that W(t)
is Gaussian with mean 0 and variance t. If 0 ≤ s < t then,
using the previous properties, we see that E[W(s)W(t)] =
E[W(s)(W(t) − W(s))] + E[W²(s)] = E[W(s)]E[W(t) − W(s)] +
E[W²(s)] = s. Thus, it follows that E[W(s)W(t)] = min{s, t}.
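The covariance identity just derived is easy to confirm by simulation; here is a minimal sketch (an illustration, not part of the text) using the independent-increment construction:

```python
# Monte Carlo check that E[W(s) W(t)] = min(s, t) for Brownian motion,
# built from independent Gaussian increments.
import numpy as np

rng = np.random.default_rng(0)
s, t, n = 0.3, 0.7, 200_000
ws = np.sqrt(s) * rng.standard_normal(n)             # W(s)
wt = ws + np.sqrt(t - s) * rng.standard_normal(n)    # W(t) = W(s) + increment
print(np.mean(ws * wt))    # close to min(s, t) = 0.3
```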
6.39 Theorem  There exists a Brownian motion process that is a
measurable random process.

6.40 Theorem  With probability one, W(t, ω) is nowhere
differentiable as a function of t.
Thus, off a null set, the sample paths of a Brownian motion
process are continuous and nowhere differentiable. A nowhere
differentiable path represents the motion of a particle that at
no time has a velocity. Further, since a function of bounded
variation is differentiable a.e., the sample paths of a Brownian
motion process are of unbounded variation almost surely.
Brownian motion is a commonly used model for noise in
engineering applications. In the following example we will find the
Karhunen–Loève expansion of a Brownian motion process.
Example 6.6  Consider a Brownian motion process restricted
to the interval [0, 1]. The autocovariance function of such a
random process is given by K(s, t) = min(s, t) for s, t ∈ [0, 1].
To find the eigenvalues of the integral operator associated
with this autocovariance function, we must solve the integral
equation

    ∫₀¹ min(s, t) e(t) dt = λ e(s);  0 ≤ s ≤ 1,

which reduces to

    ∫₀ˢ t e(t) dt + s ∫ₛ¹ e(t) dt = λ e(s);  0 ≤ s ≤ 1.    (1)
Leibniz's rule² implies that

    (d/ds) ∫₀ˢ t e(t) dt = s e(s)

and that

    (d/ds) [ s ∫ₛ¹ e(t) dt ] = ∫ₛ¹ e(t) dt − s e(s).

Thus, differentiating (1) with respect to s implies that

    ∫ₛ¹ e(t) dt = λ e′(s)    (2)
²If ∂f/∂s(t, s) exists and is continuous and if α(s) and β(s) are
differentiable real-valued functions then

    (d/ds) ∫_{α(s)}^{β(s)} f(t, s) dt = f(β(s), s) (dβ(s)/ds) − f(α(s), s) (dα(s)/ds) + ∫_{α(s)}^{β(s)} (∂f/∂s)(t, s) dt.
and differentiating (2) with respect to s implies that

    −e(s) = λ e″(s).    (3)

Recall that a solution of (3) will have the form

    e(s) = A sin(s/√λ) + B cos(s/√λ)

for λ > 0 and A, B ∈ ℝ. Setting s = 0 in (1) implies that
e(0) = 0 for λ > 0 and hence that B = 0. Setting s = 1 in (2)
implies that cos(1/√λ) = 0 which in turn implies that

    1/√λ = (2n − 1)π/2

for n ∈ ℕ. Thus, writing e(s) as a function of n and s we have

    eₙ(s) = A sin((2n − 1)πs/2)
for n ∈ ℕ. Note that, for distinct positive integers n and m,
with j = 2n − 1 and k = 2m − 1,

    ∫₀¹ eₙ(s) eₘ(s) ds = (2A²/π) [ sin((j − k)π/2)/(2(j − k)) − sin((j + k)π/2)/(2(j + k)) ] = 0.
Thus, the eₙ's are orthogonal; the integral above vanishes since
the sum and the difference of two odd numbers are each even.
Requiring the eₙ's to be orthonormal implies that A = √2.
Thus, the eigenvalues are given by

    λₙ = 4/((2n − 1)²π²)

and the orthonormalized eigenfunctions are given by

    eₙ(t) = √2 sin((2n − 1)πt/2)
for n ∈ ℕ. Thus, the Karhunen–Loève theorem implies that

    X(t) = Σ_{n=1}^{∞} Zₙ eₙ(t)

where

    Zₙ = ∫₀¹ X(t) eₙ(t) dt

for each n ∈ ℕ. □
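The eigenpairs found above can be verified numerically by applying the kernel min(s, t) to eₙ on a grid; the following sketch uses simple trapezoidal quadrature:

```python
# Numerical check that e_n(t) = sqrt(2) sin((2n - 1) pi t / 2) and
# lambda_n = 4 / ((2n - 1)^2 pi^2) satisfy the integral equation with
# kernel min(s, t) on [0, 1].  A sketch using trapezoidal quadrature.
import numpy as np

m = 1001
t = np.linspace(0.0, 1.0, m)
wq = np.full(m, 1.0 / (m - 1))          # trapezoid weights
wq[0] = wq[-1] = 0.5 / (m - 1)
K = np.minimum.outer(t, t)              # kernel K(s, t) = min(s, t)

for n in (1, 2, 3):
    lam = 4.0 / ((2 * n - 1) ** 2 * np.pi ** 2)
    e = np.sqrt(2.0) * np.sin((2 * n - 1) * np.pi * t / 2)
    lhs = (K * e) @ wq                  # (K e)(s), the integral over t
    print(n, np.max(np.abs(lhs - lam * e)))   # small discretization error
```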
6.14 Caveats and Curiosities
7 Problems
7.1 Set Theory
Problem 1.1. Let Ω be a nonempty set and, for each t ∈ ℝ,
let A_t be a subset of Ω. Assume that if t₁ < t₂ then A_{t₁} ⊂ A_{t₂}.
Show that ∪_{t∈ℝ} A_t = ∪_{n∈ℕ} A_n.
Problem 1.2. Let Ω be a nonempty set, let F be a σ-algebra
on Ω, and let A be a nonempty subset of Ω. Let G be a family
of subsets of A given by G = {B ∈ ℙ(A) : B ∈ F}. Is G a
σ-algebra on A?
Problem 1.3. Prove or Disprove: The set of all integers is
equipotent to the set of all positive, even integers.
Problem 1.4. Consider a nonempty set Ω and let F be a
subset of ℙ(Ω). Show that σ(F) exists and is unique. (Recall
that σ(F) is the smallest σ-algebra on Ω that contains every
element of F.)
Problem 1.5. Consider a nonempty set Ω. A subset of Ω
is said to be cofinite if its complement is finite. (That is, A is
cofinite iff Aᶜ is finite.) Let F be the subset of ℙ(Ω) consisting
of all finite and cofinite subsets of Ω. Must F be an
algebra on Ω? Must F be a σ-algebra on Ω?
Problem 1.6. Consider nonempty sets X and Y and consider
a function f: X → Y. Show that f(f⁻¹(A)) ⊂ A for all A ⊂ Y
and that B ⊂ f⁻¹(f(B)) for all B ⊂ X.
Problem 1.7. For a function f: X → Y, show that the
following three statements are equivalent:

1. f is one-to-one.
2. f(A ∩ B) = f(A) ∩ f(B) for all A ⊂ X and all B ⊂ X.

3. f(A) ∩ f(B) = ∅ whenever A and B are disjoint subsets
of X.
Problem 1.8. Show that f: X → Y is onto if and only if
f(f⁻¹(B)) = B for each subset B of Y.
Problem 1.9. Consider the set S of all sequences of the form
{a₁, a₂, a₃, ...} where aᵢ is equal to 0 or 1 for each i. (For
example, {0, 0, 0, ...}, {1, 1, 1, ...}, and {1, 0, 1, 0, ...} are
all points in S.) Show that S is an uncountable set. (That is,
show that there are uncountably many different such sequences
of 0's and 1's.)
Problem 1.10. Any real number that is a root of a (nonzero)
polynomial with integer coefficients is called an algebraic number.
(For example, √2 is algebraic yet π is not.) Show that the
set of all algebraic numbers is countable.
◇ Problem 1.11. Show that any uncountable set of positive real
numbers includes a countable subset whose elements sum to ∞.
Problem 1.12. Let Ω be a nonempty set and let F be a
collection of subsets of Ω such that Ω ∈ F and such that if A
and B are in F then A \ B is in F. Show that F is an algebra
on Ω.
◇ Problem 1.13. A σ-algebra is said to be countably generated
if it is equal to σ(A₁, A₂, ...) for some countable collection {Aₙ}
of measurable sets. Show that B(ℝ) is countably generated.

Problem 1.14. What is the smallest σ-algebra on ℝ that
contains every singleton subset of ℝ?
Problem 1.15. Consider an uncountable set A and let B be
a countably infinite subset of A. Show that A is equipotent to
A \B.
Problem 1.16. Show that the interval (0, 1] is equipotent to
the set of all nonnegative real numbers.
◇ Problem 1.17. Does there exist a σ-algebra with a countably
infinite number of elements?
7.2 Measure Theory
Problem 2.1. Let μ: ℙ(ℕ) → [0, ∞] via μ(A) = 0 if A is a
finite set and μ(A) = ∞ if A is not a finite set. Is μ a measure
on (ℕ, ℙ(ℕ))?
Problem 2.2. Prove or Disprove: Let (Ω, F, μ) be a measure
space and let {Aₙ}_{n∈ℕ} be a sequence of sets from F. Assume
that the sequence {Aₙ}_{n∈ℕ} is a strictly decreasing sequence; that
is, assume that, for each positive integer n, A_{n+1} is a proper
subset of Aₙ. If the sequence {Aₙ}_{n∈ℕ} converges to the empty
set as n → ∞ then μ(Aₙ) converges to zero as n → ∞.
◇ Problem 2.3. Let Ω = ℝ² and, for each positive integer n, let
Aₙ be the open ball of radius one centered at the point ((−1)ⁿ/n, 0).
Find liminf_{n→∞} Aₙ and limsup_{n→∞} Aₙ.
Problem 2.4. Consider the measure space (ℝ, B(ℝ), λ) where
λ denotes Lebesgue measure. Let {A_t : t ∈ I} be a collection of
null sets where I is any index set.

1. Show that ∪_{t∈I} A_t need not be a measurable set.

2. If ∪_{t∈I} A_t is measurable then must it be a null set?
I would quarrel with mathematics, and say
that the sum of zeros is a dangerous number.
Stanislaw Jerzy Lec
Problem 2.5. Show that any countable subset of ℝ is
Lebesgue measurable and has Lebesgue measure zero.
Problem 2.6. Consider a function f that maps ℝ into ℝ. If
|f| is a Borel measurable function then must f also be a Borel
measurable function? Explain.
◇ Problem 2.7. Suppose that P₁ and P₂ are probability measures
on σ(P) where P is a π-system. Prove that if P₁ and P₂
agree on P then they also agree on σ(P).
7.3 Integration Theory
Problem 3.1. Let fₙ: [0, 1] → ℝ be defined via

    fₙ(x) = n if x ∈ (0, 1/n) and fₙ(x) = 0 if x ∉ (0, 1/n).

1. Find lim_{n→∞} fₙ(x).

2. Find lim_{n→∞} ∫₀¹ fₙ(x) dx.

3. Find ∫₀¹ lim_{n→∞} fₙ(x) dx.
Problem 3.2. Let Ω = {1, 2, 3, 4} and let μ be a real-valued
function on ℙ(Ω) such that μ(A) is equal to the number of points
in A. (For example, μ({1, 3}) = 2 and μ(Ω) = 4.) Show that
μ is a measure on (Ω, ℙ(Ω)). Let f: Ω → ℝ via f(ω) = ω².
Evaluate the Lebesgue integral

    ∫_Ω f dμ.
Problem 3.3. Consider a continuous probability distribution
function¹ F for which F(a) = 0 and F(b) = 1 for some a and b
from ℝ. Evaluate the following Riemann–Stieltjes integral:

    ∫_{[a,b]} F(x) dF(x).
Problem 3.4. Let F be a probability distribution function
that is absolutely continuous and let c > 0. Evaluate the
following integral:

    ∫_{−∞}^{∞} (F(x + c) − F(x)) dx.
Problem 3.5. Engineers frequently use the "delta function"
δ(t) which has the interesting property that if f: ℝ → ℝ is
continuous at the origin then

    ∫_{−∞}^{∞} δ(t) f(t) dt = f(0).

Unfortunately, no such function δ exists since if it did it would
equal 0 for nonzero t, and hence would integrate to zero. That
is, the above integral would be zero for any continuous function
f. We can, however, obtain such a "sampling property" using
a Riemann–Stieltjes integral. For what function g: ℝ → ℝ is it
true that

    ∫_{(a,b)} f(x) dg(x) = f(0)

when f is continuous at the origin and when a < 0 < b? Why
can't we simply define δ(t) to be the derivative of g(t)?
1 Probability distribution functions are defined on page 70 in Section 5.2.
If you have not yet studied that section, you may want to defer this problem
and the next problem until later.
7.4 Functional Analysis
Problem 4.1. For real numbers x and y, let d(x, y) = (x − y)².
Does d define a metric on the set of real numbers?
Problem 4.2. Let (M, ρ) be a metric space. Show that the
closure of an open ball B(x, r) = {y ∈ M : ρ(x, y) < r} need
not equal the corresponding closed ball {y ∈ M : ρ(x, y) ≤ r}.
Problem 4.3. Let a and b be real numbers with a < b. Let
C[a, b] denote the set of all real-valued functions that are
continuous on [a, b] and consider a metric d on C[a, b] defined by

    d(f, g) = max_{t∈[a,b]} |f(t) − g(t)|.

Show that if {fₙ}_{n∈ℕ} is a sequence of points in C[a, b] such that,
for each t ∈ [a, b], fₙ(t) → 0 as n → ∞ then it need not follow that
d(fₙ, 0) → 0 as n → ∞.
Problem 4.4. Consider the set ℚ of all rational numbers
endowed with a metric d given by d(x, y) = |x − y|. This metric
space is called the rational line. Show that the rational line is
not complete.
7.5 Distributions & Probabilities
Problem 5.1. Assume that B and C are random variables
possessing a joint probability density function given by

    f_{B,C}(b, c) = 1 for 0 ≤ b ≤ 1 and 0 ≤ c ≤ 1, and 0 otherwise.

What is the probability that the roots of the equation x² + 2Bx +
C = 0 are real?
Problem 5.2. Consider a random variable X that has a uniform
distribution on (0, 1). What is the probability that the
first digit after the decimal point in √X will be a 3?
Problem 5.3. Consider a random variable X that has a
continuous, strictly increasing, positive probability distribution
function F_X. What is the probability distribution function of
the random variable Z = F_X(X)?
Problem 5.4. Consider a random variable X that has a
continuous, strictly increasing, positive probability distribution
function F. Find a probability density function for the random
variable Y = ln(F(X)).
Problem 5.5. Consider random variables X and Y, let F_X
denote the distribution function of X, let F_Y denote the
distribution function of Y, and let F_{X,Y} denote the joint distribution
function of X and Y. Let Z = max{X, Y} and let W = min{X, Y}.
Find the distribution of Z and the distribution of W in
terms of F_X, F_Y, and F_{X,Y}.
Problem 5.6. Consider real numbers x₁, x₂, y₁, and y₂ such
that x₁ ≤ x₂ and y₁ ≤ y₂. Show that if F(x, y) is a joint
probability distribution function then it must follow that

    F(x₂, y₂) − F(x₂, y₁) − F(x₁, y₂) + F(x₁, y₁) ≥ 0.

Show that the function

    G(x, y) = 0 if x + y < 1 and G(x, y) = 1 if x + y ≥ 1

is not a joint probability distribution function.
Problem 5.7. Find the marginal probability density function
f_X if the joint probability density function f_{X,Y} is uniform on
the circle of radius one centered at the origin.
7.6 Independence
Problem 6.1. Consider a probability space (Ω, F, P) and two
events A and B from F that are independent. Show that Aᶜ and
Bᶜ are also independent events.
Problem 6.2. Assume that a dart is thrown at a circular
dart board having unit area in such a way that the probability
the dart lands in any particular circular region of the board is
given simply by the area of that region. Let (X, Y) denote the
coordinates of the dart's position on the board after one throw.
Are the random variables X and Y independent? Explain.
Problem 6.3. Consider a toss of two fair dice. Let A denote
the event that the number appearing on the first die is even. Let
B denote the event that the number appearing on the second
die is odd. Let C denote the event that the numbers on the two
dice are either both even or both odd. Are the events A, B, and
C mutually independent? Explain.
Problem 6.4. Consider a monkey that is seated at a typewriter
and who makes a single keystroke each second. Assume
that the keystrokes are mutually independent events. (Is this
a good assumption for a human typist?) Further, assume that
the set of all possible outcomes of a keystroke includes all
lowercase and uppercase English letters, the numbers zero through
nine, all punctuation, and a space. Assume that each possible
outcome of a keystroke has a fixed positive probability of being
typed. The typewriter never fails, the monkey is immortal,
and there is an endless stream of paper. (All standard
assumptions!) Prove that, with probability one, the entire script of the
play Hamlet by William Shakespeare will be typed an infinite
number of times.
Problem 6.5. Let X and Y be random variables possessing a
joint probability density function given by f(x, y) = 2 exp(−x − y)
for 0 < x < y < ∞. Are X and Y independent?
Problem 6.6. Assume that missiles are fired at a target in
such a way that the point at which each missile lands has a uniform
distribution on the interior of a disc of radius 5 miles centered
around the target. If we assume that the points at which
the missiles land are mutually independent, then how many missiles
must we fire to ensure at least a 0.95 probability of at least
one hit not more than one mile from the target?
7.7 Random Variables
Problem 7.1. Consider a random variable X defined on a
probability space (Ω, F, P) such that X(ω) = 87 for each ω ∈ Ω.
What is σ(X)?
Problem 7.2. Consider a probability space (ℝ, B(ℝ), P)
where P is any probability measure on (ℝ, B(ℝ)). Let X
be a random variable defined on this space via X(ω) = ω².
(Note that such a definition is possible only because we have let
Ω = ℝ.) What is σ(X)?
Problem 7.3. Let X and Y be random variables such that
E[(X − Y)²] = 0. Show that X = Y a.s.
Problem 7.4. Consider a random variable X with probability
density function

    f_X(x) = (1/2) exp(−|x|)

for x ∈ ℝ. Use Chebyshev's inequality to find an upper bound
on the probability that |X| > 2. What is the actual value of
that probability?
Problem 7.5. Consider a random variable X defined on a
probability space (Ω, F, P). Let g: ℝ → ℝ be a Borel measurable
function and let Y = g(X). Show that σ(Y) ⊂ σ(X).
When will σ(Y) = σ(X)?
◇ Problem 7.6. Consider a random variable X. Show that
σ(X) is countably generated.

◇ Problem 7.7. Prove that a function X mapping a measurable
space (Ω, F) into (ℝ, B(ℝ)) is a random variable if and only if
the set {ω ∈ Ω : X(ω) ≤ x} is an element of F for each x ∈ ℝ.
Problem 7.8. Let X and Y be integrable random variables
defined on (Ω, F, P). Show that X = Y a.s. if and only if

    ∫_F X dP = ∫_F Y dP

for all F ∈ F.
Problem 7.9. Consider independent random variables X and
Y such that each has a uniform distribution on the interval [0, 2].
Find E[|X − Y|].
Problem 7.10. For a positive integer n, let X₁, ..., Xₙ be a
collection of mutually independent, identically distributed random
variables each with a uniform distribution on the interval
[0, c] for some fixed positive real number c. If Z = max{X₁,
..., Xₙ} then what is E[Z]?
Problem 7.11. Consider a random variable X whose characteristic
function C_X(t) is such that C_X(2) = 0. For a fixed real
number s, find E[cos(X + s) cos(X + s + 1)].
7.8 Moments
Problem 8.1. Consider a nonnegative, integrable random
variable X defined on a probability space (Ω, F, P). Show that

    E[X] = ∫₀^∞ P(X > t) dt.
Problem 8.2. Consider a random variable X with a finite
second moment. Show that E[(X − m)²] is minimized over all
m ∈ ℝ when m = E[X].
Problem 8.3. Consider random variables X and Y with finite
second moments. Show that E[XY] is finite.
Problem 8.4. Consider random variables X and Y with finite
second moments. Show that COV[X, Y] = E[XY] − E[X]E[Y].
Problem 8.5. Consider random variables X and Y with finite
second moments. Show that |ρ(X, Y)| ≤ 1.
◇ Problem 8.6. Consider random variables X and Y with finite
second moments. What can be said about X and Y if
ρ(X, Y) = ±1?
Problem 8.7. Let Y be a random variable with a uniform
distribution on [a, b] where a < b. What is VAR[Y]?
Problem 8.8. If X is Poisson with parameter λ > 0 then what
is VAR[X]? Let X₁, X₂, X₃, and X₄ be mutually independent,
Poisson random variables each with a mean equal to 3. Let
Y = 4X₁ + X₂ + 6X₃ + 3X₄. What is VAR[Y]? (The Poisson
distribution is defined in Example 5.11 on page 105.)
Problem 8.9. Consider random variables X and Y such that
each has a finite positive second moment. Find a real number a
for which E[(X − aY)²] is minimized.
Problem 8.10. Let X₁, ..., Xₙ be mutually independent random
variables each with variance σ² and mean μ. Find the
correlation coefficient between Σ_{i=1}^{n} Xᵢ and X₁.
Problem 8.11. Let X be a random variable with a Poisson
distribution having parameter λ. Find E[t^X] for t ∈ ℝ. (The
Poisson distribution is defined in Example 5.11 on page 105.)
Problem 8.12. For an integer n > 1, let X₁, ..., Xₙ be
mutually independent random variables that are uniformly
distributed over the interval (−1, 1). Find the characteristic
function for the sum X₁ + ··· + Xₙ.
◇ Problem 8.13. Let Φ(t) be the characteristic function of a
random variable that possesses an even probability density function.
Show that 1 + Φ(2t) ≥ 2Φ²(t) for all t ∈ ℝ.
Problem 8.14. Let X denote the number of 'Heads' that
occur when a fair coin is flipped twice. What is the moment
generating function of X? Find E[Xⁿ] for n ∈ ℕ.
Problem 8.15. Consider independent random variables X
and Y such that X takes each of three values with probability
1/3 and

    Y = 0 with probability 1/3, 1 with probability 2/3.

Let Z be a random variable taking one value if X + Y = 0 or
X + Y = 3, a second value if X + Y = 1, and a third value if
X + Y = 2.
Find M_X(s) (the moment generating function of X), M_Z(s) (the
moment generating function of Z), and M_{X+Z}(s) (the moment
generating function of X + Z). Is M_{X+Z}(s) = M_X(s)M_Z(s)?
Are X and Z independent random variables?
Problem 8.16. Consider a random variable Θ with a uniform
distribution on [0, 2π]. Let X = cos(Θ) and let Y = sin(Θ).
Are X and Y uncorrelated? Are X and Y independent?
◇ Problem 8.17. Let x₁, x₂, y₁, and y₂ be real numbers such
that x₁ ≠ x₂ and y₁ ≠ y₂. Consider random variables X and Y
defined on the same probability space such that P(X = x₁) +
P(X = x₂) = 1 with P(X = x₁) > 0 and P(X = x₂) > 0 and
such that P(Y = y₁) + P(Y = y₂) = 1 with P(Y = y₁) > 0 and
P(Y = y₂) > 0. Prove or Disprove: If X and Y are uncorrelated
then X and Y are independent.
7.9 Transformations of Random Variables
Problem 9.1. The radius of a circle is approximately measured
in such a way that the approximation has a uniform
distribution on the interval (a, b) where b > a > 0. Find the
distribution of the resulting approximation of the circumference
of the circle and of the resulting approximation of the area of
the circle.
Problem 9.2. Let X and Y be independent random variables
with densities

    f_X(x) = (1/π) · 1/√(1 − x²);  |x| < 1

and

    f_Y(y) = (y/σ²) exp(−y²/(2σ²));  y > 0,

respectively. Find the distribution of the product XY.
Problem 9.3. Let X and Y be independent random variables
each with a density function given by f(x) = e^{−x} I_{[0,∞)}(x). Let
W = X/(X + Y). What is the distribution of W?
Problem 9.4. Consider a positive random variable X with
density function f_X. Find a density function for 1/X.
7.10 The Gaussian Distribution
Problem 10.1. Consider random variables X and Z that
are defined on the same probability space (Ω, F, P). Assume
that X has a standard Gaussian distribution and that Z has a
Gaussian distribution with mean 5 and variance 4. Find a real
number a such that P(X > a) = P(Z < 2.44).
Problem 10.2. Let X be a Gaussian random variable with
mean m and variance σ². What is E[X³] in terms of m and σ²?
What is E[X⁹⁷] if m = 0 and σ² = 38?
Problem 10.3. For a fixed positive integer n, let Z and
X₁, ..., Xₙ be zero mean, unit variance, mutually independent
Gaussian random variables. Let

    T = Z/√W

and let

    W = (1/n) Σ_{i=1}^{n} Xᵢ².

The random variable W has a density function f_W that you do
not need to find. Instead, find an expression for a density function
of T in terms of f_W and f_Z where f_Z is a density function
for Z.
Problem 10.4. Let X be a standard Gaussian random variable,
and let Z be a random variable that takes on the values 1
and −1 each with probability 1/2. Assume that X and Z are
independent, and let Y = XZ. Show that Y is a standard Gaussian
random variable. Is X + Y a Gaussian random variable? Are X
and Y uncorrelated? Are X and Y independent?
Problem 10.5. Let X₁ and X₂ be zero mean, unit variance,
mutually Gaussian random variables with correlation coefficient
1/3. Let X denote the random vector [X₁ X₂]ᵀ. Find a real
2 × 2 matrix C so that the random vector Z = CX is composed
of independent Gaussian random variables.
Problem 10.6. Let X be a Gaussian random variable with
mean m₁ and variance σ₁². Let Y be a Gaussian random variable
with mean m₂ and variance σ₂². Assume that X and Y are
independent and find the distribution of X + Y.
Problem 10.7. Let f₁ be a N(m₁, σ₁²) density function and let
f₂ be a N(m₂, σ₂²) density function. Consider a random variable
X that has a density given by λf₁(x) + (1 − λ)f₂(x) where 0 <
λ < 1. Find the moment generating function for X, the mean
of X, and the variance of X.
Problem 10.8. Let X and Y be independent Gaussian random
variables each with mean zero and variance one. Find
E[max(X, Y)].
7.11 Convergence
Problem 11.1. Define a sequence {Xₙ}_{n∈ℕ} of mutually
independent random variables such that each Xₙ takes one value
with probability 1/n and another with probability 1 − 1/n.
Does Xₙ → 0 a.s.? Explain.
Problem 11.2. A random variable X is said to have a Cauchy
distribution centered at zero with parameter a > 0 if X has a
density given by

    f_X(x) = a/(π(a² + x²)).

The characteristic function C_X(t) of X is given by C_X(t) =
exp(−a|t|). Let {Xₙ}_{n∈ℕ} be a sequence of mutually independent
random variables each having a Cauchy distribution centered at
zero with parameter a = 1. For a fixed positive integer n, let
Sₙ = X₁ + ··· + Xₙ. What is the distribution of Sₙ/n?
Problem 11.3. Show via an example that a sequence of random
variables may converge in probability without converging
in Lp for any p > 1.
Problem 11.4. Let c be a real constant. Show that Xₙ → c
in distribution if and only if Xₙ → c in probability.
Problem 11.5. Consider a sequence {Xₙ}_{n∈ℕ} of mutually
independent random variables each with a uniform distribution
on the interval (0, 1]. For each positive integer n, let Zₙ =
n(1 − max(X₁, ..., Xₙ)). Does Zₙ converge in distribution? If
so then to what distribution does F_{Zₙ} converge?
Problem 11.6. Consider a numerical scheme in which the
roundoff error to the second decimal place has the uniform
distribution on the interval (−0.05, 0.05). What is an approximate
value of the probability that the absolute error in the sum of
1000 such numbers is less than 2?
Problem 11.7. If you toss a fair coin 10,000 times then what
(approximately) is the probability that you will observe exactly
5000 heads?
Problem 11.8. Let {Xₙ}_{n∈ℕ} be a sequence of second order
random variables defined on (Ω, F, P) and let a be a real number.
Find conditions on E[Xₙ] and VAR[Xₙ] that are both sufficient
and necessary to ensure that Xₙ → a in L₂.
7.12 Conditioning
Problem 12.1. Let U and V be independent random variables
each with a zero mean, unit variance Gaussian distribution. Let
X = U + V and Y = U − V. Show that X and Y are independent
random variables each with a zero mean Gaussian distribution
having a variance equal to 2. Find E[X|U] and E[Y|U]. Are
E[X|U] and E[Y|U] independent random variables?
Problem 12.2. Show via an example that E[X|Y] = E[X]
need not imply that X and Y are independent.
Problem 12.3. Let X and Y be independent, zero mean
random variables and let Z = XY. Assume that Z has a finite
mean. Find E[Z|X], E[Z|Y], and E[Z|X, Y].
Problem 12.4. Consider the probability space ([0, 1], B([0, 1]),
λ) where λ denotes Lebesgue measure. Consider subsets of
[0, 1] given by A = [0, 1/4], B = (1/4, 2/3], and C = (2/3, 1].
Let F be the σ-algebra on Ω given by σ({A, B, C}) and let
X(ω) = ω² for ω ∈ [0, 1]. Find E[X|F].
Problem 12.5. Consider second order random variables X,
Y, and Z defined on the same probability space. Show that if
X and Z are independent and if X and Y are independent then
E[XZ|Y] = E[X]E[Z|Y] a.s.
Problem 12.6. Consider a sequence {Yₙ}_{n∈ℕ} of mutually
independent random variables each with mean zero and positive
variance σ². For each positive integer n, let
Problem 12.7. Let X and Y be second order random variables
defined on the same probability space. The conditional variance
of X given Y is denoted by VAR[X|Y] and is defined by

    VAR[X|Y] = E[(X − E[X|Y])² | Y].

Show that

    VAR[X] = E[VAR[X|Y]] + VAR[E[X|Y]].

Problem 12.8. Let X and Y be random variables defined on
the same probability space and assume that E[X²] < ∞. Let g
be a Borel measurable function mapping ℝ to ℝ. Show that

    E[(X − g(Y))²] = E[(X − E[X|Y])²] + E[(E[X|Y] − g(Y))²].

For what such function g is E[(X − g(Y))²] minimized?
Problem 12.9. Let Ω = {1, 2, 3, 4, 5, 6} and let F be the
power set of Ω. Define a probability measure P on (Ω, F) by
letting P({ω}) = 1/6 for each ω ∈ Ω. Let G = σ({{1, 3, 5}})
and let X(ω) = ω for each ω ∈ Ω. Find E[X|G].
Problem 12.10. Consider a probability space (Ω, F, P) and
let Ω₁, ..., Ω_N be disjoint measurable subsets of Ω such that
Ω = Ω₁ ∪ ··· ∪ Ω_N and such that P(Ωᵢ) > 0 for each i. Let G be
the σ-algebra on Ω generated by Ω₁, ..., Ω_N and let X be an
integrable random variable defined on (Ω, F, P). Find E[X|G]
for all ω ∈ Ωᵢ.
Problem 12.11. Let X be a random variable defined on (Ω,
F, P) such that E[X²] is finite. Let G₁ and G₂ be σ-subalgebras
of F. If Y = E[X|G₁] a.s. and X = E[Y|G₂] a.s. then show that
X = Y a.s.
Problem 12.12. Let X be a random variable with mean 3
and variance 2. Let Y be a random variable such that E[Y] = 4
and E[XY] = 3. If E[Y|X] = a + bX a.s. then find a and b.
◇ Problem 12.13. Let X and Y be zero mean, positive variance,
mutually Gaussian random variables possessing a correlation
coefficient ρ such that |ρ| < 1. Show that E[X²Y²] =
E[X²]E[Y²] + 2(E[XY])².
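Before attempting the proof, the identity can be spot-checked by simulation; a sketch with correlation ρ = 0.6 (a hypothetical choice) and unit variances:

```python
# Monte Carlo spot check (not a proof) of the identity
# E[X^2 Y^2] = E[X^2] E[Y^2] + 2 (E[XY])^2 for jointly Gaussian X, Y.
import numpy as np

rng = np.random.default_rng(1)
rho, n = 0.6, 1_000_000
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1.0 - rho * rho) * rng.standard_normal(n)
lhs = np.mean(x * x * y * y)
rhs = np.mean(x * x) * np.mean(y * y) + 2.0 * np.mean(x * y) ** 2
print(lhs, rhs)    # both near 1 + 2 rho^2
```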
Problem 12.14. Consider a sequence {Y₁, Y₂, ...} of mutually
independent random variables that are defined on the same
probability space and that each have a mean of 1. For each
positive integer n, let Xₙ = Y₁Y₂ ⋯ Yₙ. Find
and
where j < m < n.
Problem 12.15. Consider random variables X and Y possessing a joint probability density function
f(x, y) = 8xy if 0 ≤ x ≤ y ≤ 1, and f(x, y) = 0 otherwise.
Find E[X|Y] and E[Y|X].
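Assuming the density above has been reconstructed correctly as f(x, y) = 8xy on 0 ≤ x ≤ y ≤ 1, a crude Riemann sum confirms that it integrates to 1 and that E[X | Y = y] = 2y/3. The Python sketch below is a numerical check only, not part of the text.

```python
# Riemann-sum sanity check of the density f(x, y) = 8xy on 0 <= x <= y <= 1
# (zero elsewhere).  Midpoint rule on an N x N grid.
N = 1000
h = 1.0 / N
xs = [(i + 0.5) * h for i in range(N)]

total = sum(8 * x * y * h * h for y in xs for x in xs if x <= y)
assert abs(total - 1.0) < 5e-3            # the density integrates to 1

# For this density E[X | Y = y] = 2y/3; check it at y = 0.5 by integrating
# numerically over the slice {x : 0 <= x <= y}.
y = 0.5
num = sum(x * 8 * x * y * h for x in xs if x <= y)
den = sum(8 * x * y * h for x in xs if x <= y)
assert abs(num / den - 2 * y / 3) < 1e-3
```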
7.13 True/False Questions
A statement that is not always true should be considered false. For example, the statement "If x² = 4 then x = 2" is a false statement.
1. The set R is a subset of the set R².
2. There exists a function f : R → R such that f : A → A is a bijection for any nonempty subset A of R.
3. Any σ-algebra is a λ-system.
4. Consider a complete measure space (Ω, F, P) and let A be an element of F. If B ⊂ A then B ∈ F.
5. Consider a probability space (Ω, F, P) such that Ω and R are equipotent. If X : Ω → R is bijective then X is a random variable defined on (Ω, F, P).
6. Consider two independent random variables X and Y defined on a probability space (Ω, F, P). There does not exist a set A such that A ∈ σ(X) ∩ σ(Y) and 0 < P(A) < 1.
7. If the second moment of a random variable X exists then the first moment of X must also exist.
8. If f : R → R is constant a.e. with respect to Lebesgue measure then f is Riemann integrable.
9. Consider a nonempty set Ω and two subsets F and G of P(Ω). If F ∩ G = ∅ then σ(F) ≠ σ(G).
10. The expected value of an integrable random variable must be an element of the range of that random variable.
11. If two sets A and B are such that A is a subset of B then there always exists an element x in the set B that is not in the set A.
12. There exists a nonempty set Ω such that the power set of Ω is the smallest σ-algebra that contains Ω.
13. A probability measure is always a σ-finite measure.
14. The infimum of a set of positive real numbers must itself be a positive real number.
15. The collection of real Borel sets is the smallest σ-algebra on the real line that contains every closed interval.
16. There exist two subsets A and B of R such that B is a Lebesgue null set, such that A is a subset of B, and such that A is not an element of M(R).
17. It is possible for a random variable to be independent of itself.
18. A random variable X possessing an even probability density function must have a mean equal to zero.
19. It is possible for disjoint events to be independent and it is possible for disjoint events not to be independent.
20. A Lebesgue measurable subset of the real line that is not countable must have positive Lebesgue measure.
21. If X and Y are Gaussian random variables then X + Y must be a Gaussian random variable.
22. If X and Y are uncorrelated Gaussian random variables then X and Y must be independent random variables.
23. If X and Y are independent Gaussian random variables then X + Y must be a Gaussian random variable.
24. A function mapping the real line to a finite subset of the real line must be Riemann integrable.
25. Consider a probability space (Ω, F, P), a random variable X defined on this space, and a σ-subalgebra G of F. The conditional expectation E[X|G] must be F-measurable.
26. If all of the sample paths of a random process are continuous then all of the sample paths of a modification of that process must also be continuous.
27. Two distinct second order random processes must possess distinct autocovariance functions.
28. There does not exist a random variable with a first moment equal to √2 and a second moment equal to 1.
29. Let Ω be a nonempty set and let f be a function mapping Ω to R. There always exists a σ-algebra F on Ω so that f is a measurable mapping from (Ω, F) to (R, B(R)).
30. If ∫ f dμ is a Lebesgue integral then μ must be Lebesgue measure.
31. Consider two random variables X and Y defined on a probability space (Ω, F, P). If E[X − Y] = 0 then P(X = Y) = 1.
32. Consider two random variables X and Y defined on the same probability space. If E[X + Y] < ∞ then E[X] < ∞ and E[Y] < ∞.
33. Consider a function g : R → R and a random variable X defined on (Ω, F, P). The function g(X) will always be a random variable defined on (Ω, F, P).
34. If a random variable X is equal almost surely to a certain conditional expectation then X must be a version of that conditional expectation.
35. Consider random variables X and Y defined on the same probability space. If X = Y a.s. then σ(X) = σ(Y).
36. Consider a random variable X that possesses an absolutely continuous probability distribution function, and let g be a Borel measurable function mapping R to R. The random variable g(X) must also possess an absolutely continuous probability distribution function.
37. There exists a probability density function f such that the supremum of the set {f(x) : x ∈ R} is not finite.
38. Consider two random variables X and Y defined on the same probability space. If X is σ(Y)-measurable then σ(X, Y) = σ(X).
39. A set may be equipotent to a proper subset of itself.
40. Let Ω be a set containing at least two elements and let F and G be two distinct σ-algebras on Ω. The set F ∪ G is never a σ-algebra on Ω.
8 Solutions
8.1 Solutions to Exercises
1.1. Yes, {∅} is a set containing one element and ∅ is the set containing no elements.
1.2. Since the only subset of the empty set is the empty set itself it follows that {∅} is the power set of ∅.
1.3. Assume that A ⊂ B. If A ∪ B is empty then B is empty and hence A ∪ B = B. Assume that A ∪ B is not empty and let x ∈ A ∪ B. By definition of union, it follows that either x ∈ A or x ∈ B. If x ∈ A then x ∈ B since A ⊂ B. Thus, x ∈ B and we conclude that A ∪ B ⊂ B. If B is empty then A is empty and hence A ∪ B = B. Assume that B is not empty and let x ∈ B. Then x ∈ A ∪ B which implies that B ⊂ A ∪ B. Thus, we conclude that A ∪ B = B.
Assume that A ∪ B = B. If A is empty then A ⊂ B for any set B. Assume that A is not empty and let x ∈ A. Then x ∈ A ∪ B which implies that x ∈ B since A ∪ B = B. Thus, we conclude that A ⊂ B.
1.4. The first function is not onto and not one-to-one. The second function is onto but not one-to-one. The third function is one-to-one but not onto. The fourth function is bijective with inverse f⁻¹(x) = √x.
1.5. Choose some b ∈ B and note that since f is onto there exists some a ∈ A such that f(a) = b. Since f is bijective it follows that f⁻¹({b}) = {a}; that is, f⁻¹(b) = a. Substitution thus implies that f(f⁻¹(b)) = b.
Choose some a ∈ A and let f(a) = b. As above, note that f⁻¹(b) = a. Substitution thus implies that f⁻¹(f(a)) = a.
1.6. There does not exist a bijection from R into S since no function from R to S can be one-to-one. There does not exist a bijection from S into R since no function from S to R can be onto.
1.7. Yes, if f : A → B and f is bijective then f⁻¹ : B → A is a bijection from B to A. That is, f⁻¹ is onto since f is defined on all of A, and f⁻¹ is one-to-one since if f(a) = b_1 and f(a) = b_2 then b_1 = b_2. (That is, if b_1 ≠ b_2 then f⁻¹(b_1) cannot be equal to f⁻¹(b_2).)
1.8. Consider a countable set C and a subset B of C. Since C is countable there exists a bijection f mapping C to a subset N of the positive integers. Let g mapping B to f(B) be the restriction of f to B; that is, g = f on B and g is undefined on C \ B. Note that g is onto since it maps B to f(B) and that g is one-to-one since f is one-to-one. Thus, B is countable since g is a bijection from B to f(B) ⊂ N.
1.9. For notational simplicity, assume that all of the A_i's are countably infinite. For each i ∈ N, let A_i = {a^i_1, a^i_2, ...}. (For example, if f is a bijection from A_i to N then we could simply choose a^i_j such that f(a^i_j) = j.) Note that we may arrange the a^i_j in matrix form, with the elements of A_i forming the i-th row. Define a sequence {b_i}_{i∈N} by selecting elements from this array along successive diagonals:
b_1 b_3 b_6 ···
b_2 b_5 b_9 ···
b_4 b_8 b_13 ···
Note that this sequence defines a bijection from the union of the A_i's to N. That is, the countable union is itself countable.
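The diagonal walk pictured above can be made concrete: traversing the array along successive anti-diagonals visits every position (i, j) exactly once. A Python sketch of that enumeration (illustrative, not from the text):

```python
def diagonal_pairs():
    """Enumerate N x N along anti-diagonals: (1,1), (1,2), (2,1), (1,3), ..."""
    d = 2
    while True:
        for i in range(1, d):        # positions with i + j = d
            yield (i, d - i)
        d += 1

# Walking the infinite array a[i][j] (the j-th element of the i-th set A_i)
# in this order hits every position exactly once, which is the bijection
# used in the solution.
gen = diagonal_pairs()
seen = [next(gen) for _ in range(10)]

assert seen[0] == (1, 1)
assert len(set(seen)) == 10          # no position repeats
assert (2, 2) in seen
```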
1.10. Yes. This is the smallest possible algebra or σ-algebra on Ω.
1.11. Yes. This is the largest possible σ-algebra on Ω since it contains every subset of Ω.
1.12. Five σ-algebras on Ω are {∅, Ω}, P(Ω), {∅, Ω, {1}, {2, 3}}, {∅, Ω, {2}, {1, 3}}, and {∅, Ω, {3}, {1, 2}}.
1.13. Consider nonempty sets Ω and I, and for each i ∈ I let A_i be a σ-algebra on Ω. Let A denote the intersection of the A_i's for i ∈ I. (That is, A ∈ A if and only if A ∈ A_i for each i ∈ I.) First, note that Ω ∈ A since Ω ∈ A_i for each i ∈ I. Second, note that if A ∈ A then A, and hence Aᶜ, is in A_i for each i ∈ I, which implies that Aᶜ ∈ A. Finally, assume that A_n ∈ A for each n ∈ N. Then A_n ∈ A_i for each n ∈ N and each i ∈ I. Thus, ∪_{n∈N} A_n ∈ A_i for each i ∈ I which implies that ∪_{n∈N} A_n ∈ A.
Note that a union of σ-algebras need not be a σ-algebra. Let F = {A, Aᶜ, ∅, Ω} and G = {B, Bᶜ, ∅, Ω}. Note that F ∪ G (generally) does not include A ∪ B.
1.14. First, note that Ω ∈ A since Ωᶜ = ∅ is finite. Second, note that if A ∈ A then either Aᶜ is finite or Aᶜ has a finite complement, and hence Aᶜ ∈ A. Further, note that if A and B are in A then A ∪ B is finite if A and B are each finite, and A ∪ B has finite complement if either A or B has finite complement since (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ. Thus, A is closed under finite unions, and hence A is an algebra. To see that A is not a σ-algebra let A_i = {i} for i ∈ N and note that ∪_{i∈N} A_i = N which is neither finite nor has a finite complement.
1.15. First, note that Ω ∈ A since Ωᶜ = ∅ is finite, and hence countable. Second, note that if A ∈ A then either Aᶜ is countable or Aᶜ has a countable complement, and hence Aᶜ ∈ A. Now, let A_i for each i ∈ N be an element of A. If each of the A_i's is countable then so is their countable union. If one or more of the A_i's is cocountable then (by De Morgan's Law) it follows that their countable union is also cocountable. In each case, we see that the union of the A_i's is in A. Thus, A is both an algebra and a σ-algebra.
1.16. They each equal {∅, Ω}, but for different reasons. The σ-algebra σ(∅) is the smallest σ-algebra on Ω that contains every set in ∅. Since there are no sets in ∅, σ(∅) is simply the smallest σ-algebra on Ω, which is {∅, Ω}. The σ-algebra σ({∅}) is the smallest σ-algebra on Ω that contains ∅, which again is {∅, Ω}.
1.17. This again is simply {∅, Ω}.
1.18. This is {A, Aᶜ, Ω, ∅}.
1.19. Note that σ({A, B}) = {Ω, ∅, A, Aᶜ, B, Bᶜ, A ∪ B, (A ∪ B)ᶜ, A ∪ Bᶜ, B \ A, B ∪ Aᶜ, A \ B, Aᶜ ∪ Bᶜ, A ∩ B, A △ B, (A △ B)ᶜ}.
1.20. Yes, ∅ ∈ F and ∅ ⊂ F.
1.21. No. Let A = R and let A = {∅, A}. Further, let f : R → R via f(x) = 3 for all x ∈ R. Then f(A) = {∅, {3}} is not a σ-algebra on R since R ∉ f(A).
For another example, let A = R and let A = {∅, A, {5}, {5}ᶜ}. Further, let f : R → [−1, 1] via f(x) = sin(x). Then f(A) = {∅, [−1, 1], {sin(5)}}, which is not a σ-algebra on [−1, 1] since it does not contain {sin(5)}ᶜ.
2.1. Yes. Since every real number is an upper bound of ∅ it follows that the least upper bound (or supremum) of ∅ is −∞. Since every real number is a lower bound of ∅ it follows that the greatest lower bound (or infimum) of ∅ is ∞. Thus, sup ∅ < inf ∅.
2.2. Note that
(lim sup A_nᶜ)ᶜ = [∩_{k=1}^∞ ∪_{m=k}^∞ A_mᶜ]ᶜ = ∪_{k=1}^∞ [∪_{m=k}^∞ A_mᶜ]ᶜ = ∪_{k=1}^∞ ∩_{m=k}^∞ A_m = lim inf A_n.
2.3. Assume that lim inf A_n is not empty and note that
ω ∈ lim inf A_n ⟹ ω ∈ ∪_{k=1}^∞ ∩_{m=k}^∞ A_m
⟹ there exists N such that ω ∈ ∩_{m=N}^∞ A_m
⟹ ω ∈ ∪_{m=k}^∞ A_m for every k
⟹ ω ∈ ∩_{k=1}^∞ ∪_{m=k}^∞ A_m = lim sup A_n.
2.4. Recall that lim inf A_n consists of all those points that belong to all but perhaps a finite number of the A_n's. Let α be a positive real number. Choose a positive integer N so that 1/N < α. Note that α ∉ A_n when n is an even integer greater than N. Since there are an infinite number of such n's it follows that α cannot be in lim inf A_n. A similar argument implies that no negative real number is in lim inf A_n. Note, however, that since 0 ∈ A_n for each n, it follows that 0 ∈ lim inf A_n. Thus, we conclude that lim inf A_n = {0}.
Recall that lim sup A_n consists of all those points that belong to infinitely many of the A_n's. Note that any real number from the interval (−1, 0] is in A_n for any even integer n, and that any real number from the interval [0, 1] is in A_n for any odd integer n. Further, any real number outside of these intervals is not in A_n for any n. Thus, lim sup A_n = (−1, 1].
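Since the exercise statement itself is not reproduced in this chunk, a family of sets consistent with this solution can be used to check the claim over a finite horizon. In the Python sketch below the definition of A_n is an assumption made for illustration; only its lim inf and lim sup behavior matches the solution.

```python
from fractions import Fraction

# The solution describes sets with lim inf A_n = {0} and lim sup A_n = (-1, 1].
# One family consistent with that description (an assumption, since the
# exercise statement is not reproduced here) is
#   A_n = [-1/n, 1] for odd n,   A_n = (-1, 1/n] for even n.
def in_A(n, x):
    if n % 2 == 1:
        return -Fraction(1, n) <= x <= 1
    return -1 < x <= Fraction(1, n)

def in_liminf(x, horizon=100):
    # x is in lim inf A_n iff x lies in every A_n from some index on;
    # the finite horizon makes this a heuristic check, not a proof.
    return any(all(in_A(n, x) for n in range(k, 2 * horizon))
               for k in range(1, horizon))

def in_limsup(x, horizon=100):
    # x is in lim sup A_n iff every tail contains some A_n with x in it;
    # a window of three consecutive indices always holds an odd and an even n.
    return all(any(in_A(n, x) for n in range(k, k + 3))
               for k in range(1, horizon))

assert in_liminf(Fraction(0)) and not in_liminf(Fraction(1, 2))
assert in_limsup(Fraction(1, 2)) and in_limsup(Fraction(-1, 2))
assert in_limsup(Fraction(1)) and not in_limsup(Fraction(-1))
```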
2.5. Note that this exercise asked you to try to find a non-Borel set. In particular, you could have successfully completed this problem without actually finding such a set!
The purpose of this exercise is to convince the reader that constructing a non-Borel subset of the real line is not a trivial task. Since the construction of such a set at this point would take us rather far afield, a non-Borel set will not be presented here. For many examples, see the book Counterexamples in Probability and Real Analysis by Gary Wise and Eric Hall.
A proof for the existence of a non-Borel set is not quite as difficult. It follows immediately from the fact that the set of real Borel sets is equipotent to R.
2.6. To begin, we will show that any singleton subset of R is a real Borel set. Note that, for any x ∈ R,
{x} = ∩_{n=1}^∞ (x − 1/n, x + 1/n).
Thus, since {x} is a countable intersection of bounded open intervals it follows that {x} must be an element of B(R), the smallest σ-algebra containing every bounded open interval.
Now, let C be a countable subset of R. Since C is countable we may enumerate its elements as a sequence {c_1, c_2, ...}. Note that
C = ∪_{n=1}^∞ {c_n}.
Thus, since C is a countable union of sets from B(R) it follows that C must also be an element of B(R).
2.7. A function f : R → R is continuous if and only if f⁻¹(U) is open for every open subset U of R. Further, a function f : R → R is Borel measurable if and only if f⁻¹((−∞, x)) is a Borel set for each x ∈ R. Since (−∞, x) is open for each x ∈ R and each open subset of R is a Borel set, the desired result follows immediately.
2.8. The Cantor ternary set is an uncountable subset of R that has Lebesgue measure zero.
2.9. Dirac measure on a single point will yield the power set of the reals when completed.
3.1. Let G denote the collection of all subdivisions of [a, b] and recall that V = sup{S(Γ) : Γ ∈ G} where
S(Γ) = Σ_{i=1}^m |f(a_i) − f(a_{i−1})|
if Γ = {a_0, a_1, ..., a_m}. Since |f(x) − f(y)| ≤ C|x − y| for all x and y in [a, b] it follows that
S(Γ) ≤ C Σ_{i=1}^m (a_i − a_{i−1}) = C(b − a)
for any Γ ∈ G and hence that V ≤ C(b − a).
5.1. Since the set {ω ∈ Ω : X(ω) ≤ −n} converges to the empty set as n → ∞ it follows from Lemma 2.1 that F(−n) → 0 as n → ∞. From this the desired result follows immediately.
5.2. We must show that lim_{y↓x} F(y) = F(x). Again, we can use Lemma 2.1 since the set {ω ∈ Ω : X(ω) ≤ x + (1/n)} converges to the set {ω ∈ Ω : X(ω) ≤ x} as n → ∞.
5.3. Since P(X ≤ x) = P(X < x) + P(X = x), the desired result will follow if we show that P(X < x) = lim_{y↑x} F(y). Let {y_n}_{n∈N} be a strictly increasing sequence whose limit is x, and let A_n = {ω ∈ Ω : X(ω) ≤ y_n}. Note that ∪_{n=1}^∞ A_n = {ω ∈ Ω : X(ω) < x}. Note further that P(A_n) → P(∪_{n=1}^∞ A_n) as n → ∞ since A_n ⊂ A_{n+1} for each n ∈ N. Thus, the desired result follows from Lemma 2.1.
5.4. Recall that
Thus,
∫_0^∞ x f(x) dx = +∞
and
∫_{−∞}^0 x f(x) dx = −∞
which implies that the first moment does not exist. Note that
2. if n is even and n > 2 then lim_{x→±∞} x^n f(x) = ∞, and
3. if n is odd and n > 1 then lim_{x→±∞} x^n f(x) = ±∞.
Thus, the odd moments do not exist and the even moments are infinite.
5.5. The only way that a Lebesgue integral of a measurable function can fail to exist is if one encounters a sum of the form ∞ − ∞. This cannot occur if the measurable function is nonnegative or nonpositive.
5.6. Note that
VAR[X] = E[(X − E[X])²] = E[X²] − 2E[X E[X]] + E²[X] = E[X²] − 2E²[X] + E²[X] = E[X²] − E²[X].
5.7. Recall that X and Y possess a joint density of the form
f_{X,Y}(x, y) = (1 / (2π σ_1 σ_2 √(1 − ρ²))) exp(−q(x, y)/2)
where q(x, y) =
Further, recall that X has a density function f_X given by
f_X(x) = ∫_R f(x, y) dy.
The desired result follows after substituting, completing the square, and integrating.
5.8. Recall that X and Y possess a joint density of the form
f_{X,Y}(x, y) = (1 / (2π σ_1 σ_2 √(1 − ρ²))) exp(−q(x, y)/2)
where q(x, y) =
Further, recall that
The desired result follows immediately after finding
∫∫ xy f(x, y) dx dy.
5.9. No, see Problem 11.3.
5.10. Consider a sequence {X_n}_{n∈N} of random variables defined as follows on the probability space ([0, 1], B([0, 1]), λ) where λ is Lebesgue measure on B([0, 1]). Let X_1 = 1_{[0, 1/2]}, X_2 = 1_{[1/2, 1]}, X_3 = 1_{[0, 1/4]}, X_4 = 1_{[1/4, 1/2]}, X_5 = 1_{[1/2, 3/4]}, X_6 = 1_{[3/4, 1]}, X_7 = 1_{[0, 1/8]}, X_8 = 1_{[1/8, 1/4]}, ..., X_14 = 1_{[7/8, 1]}, X_15 = 1_{[0, 1/16]}, etc. Note that X_n does not converge to zero at any point in [0, 1] even though E[|X_n − 0|^p] = E[X_n] → 0 as n → ∞ for any p > 0.
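This "typewriter" construction can be reproduced mechanically: X_n is the indicator of a dyadic subinterval whose level and position are determined by n. The Python sketch below (not from the text) checks both claims on a finite range of n.

```python
# The typewriter sequence: X_n is the indicator of the j-th dyadic interval
# of length 2**-k, enumerated level by level as in the solution.
def interval(n):
    """Support of X_n (n is 1-indexed): returns (left, right)."""
    k, count = 1, 0
    while count + 2 ** k < n:       # skip full levels before the one holding n
        count += 2 ** k
        k += 1
    j = n - count - 1               # 0-based position within level k
    return (j / 2 ** k, (j + 1) / 2 ** k)

# E[X_n] equals the interval length, which tends to 0 ...
lengths = [interval(n)[1] - interval(n)[0] for n in range(1, 200)]
assert lengths[0] == 0.5 and lengths[-1] < 0.01

# ... but any fixed point, e.g. 0.3, lies in one interval at every level,
# so X_n(0.3) = 1 infinitely often and X_n(0.3) does not converge to 0.
hits = [n for n in range(1, 200) if interval(n)[0] <= 0.3 < interval(n)[1]]
assert len(hits) >= 7
```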
5.11. Consider the probability space given by (0, 1), the Borel subsets of (0, 1), and Lebesgue measure. Define a sequence of random variables on this space by setting X_n(ω) = 2^n 1_{(0, 1/n)}(ω) for n ∈ N. Note that X_n converges pointwise to zero as n → ∞. However,
E[|X_n − 0|^p] = E[X_n^p] = ∫_0^{1/n} 2^{np} dω = 2^{np}/n,
which goes to ∞ as n → ∞ for every p > 0. Thus, the X_n's do not converge to zero in L_p.
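This example too can be checked mechanically. The sketch below (not from the text) verifies pointwise convergence at one sample point and the growth of E[X_n^p] for p = 1, using exact rational arithmetic.

```python
from fractions import Fraction

def X(n, w):
    """X_n = 2**n on (0, 1/n) and zero elsewhere (w is a point of (0, 1))."""
    return 2 ** n if 0 < w < Fraction(1, n) else 0

# Pointwise convergence to 0: for fixed w, X_n(w) = 0 once 1/n <= w.
w = Fraction(1, 10)
assert all(X(n, w) == 0 for n in range(11, 100))

# But E[X_n ** p] = 2**(n*p) / n blows up (shown here for p = 1).
moments = [Fraction(2 ** n, n) for n in range(2, 20)]
assert all(b > a for a, b in zip(moments, moments[1:]))   # strictly increasing
```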
5.12. Since X is a random variable on (Ω, F, P) it must be F-measurable, and thus satisfies the first property in the definition of E[X|F]. Further, X trivially satisfies the second property of that definition. Thus, E[X|F] = X a.s.
5.13. Since E[X] is a constant it is measurable with respect to any σ-algebra and thus satisfies the first property in the definition of E[X|{∅, Ω}]. Further, note that
∫_Ω E[X] dP = E[X] = ∫_Ω X dP
and that any integral over ∅ is zero. Thus, E[X] satisfies the second property in the definition of E[X|{∅, Ω}]. We conclude that E[X|{∅, Ω}] = E[X] a.s. Note, however, that this equality actually holds pointwise since the only null set in {∅, Ω} is the empty set.
5.14. Note that
E[XY] = E[E[XY|Y]] = E[Y E[X|Y]] = E[Y E[X]] = E[X]E[Y].
8.2 Solutions to Problems
1.1. If ∪_{t∈R} A_t is empty then it follows immediately that ∪_{t∈R} A_t ⊂ ∪_{n∈N} A_n. Assume then that ∪_{t∈R} A_t is not empty and let x ∈ ∪_{t∈R} A_t. Then there exists some y ∈ R such that x ∈ A_y. Let m be any positive integer such that m > y and note that by assumption A_y ⊂ A_m. Hence, x ∈ A_m and x ∈ ∪_{n∈N} A_n. Thus, ∪_{t∈R} A_t ⊂ ∪_{n∈N} A_n.
If ∪_{n∈N} A_n is empty then it follows immediately that ∪_{n∈N} A_n ⊂ ∪_{t∈R} A_t. Assume that ∪_{n∈N} A_n is not empty and let x ∈ ∪_{n∈N} A_n. Then, x ∈ ∪_{t∈R} A_t since N ⊂ R. Hence, ∪_{n∈N} A_n ⊂ ∪_{t∈R} A_t and we conclude that in fact the two sets are equal.
1.2. Not necessarily. Simply let A be a subset of Ω that is not in F. Then, A is not in G and hence G is not a σ-algebra on A.
1.3. Let B denote the set of positive, even integers and define f : Z → B via f(0) = 2, f(n) = 4n if n ∈ N, and f(n) = 4|n| + 2 if −n ∈ N. Since f is bijective it follows that B and Z are equipotent.
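The bijection in this solution is simple enough to verify on a finite window of Z. A Python sketch (not from the text):

```python
def f(n):
    """Map from Z onto the positive even integers, as in the solution above."""
    if n == 0:
        return 2
    if n > 0:
        return 4 * n
    return 4 * abs(n) + 2

domain = range(-50, 51)
image = {f(n) for n in domain}
assert len(image) == len(domain)          # one-to-one on this window
assert image == set(range(2, 204, 2))     # exactly the even integers 2..202
```

Positive n hit the multiples of 4, negative n hit the numbers of the form 4k + 2 beyond 2, and 0 hits 2, so every positive even integer is reached exactly once.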
1.4. To begin, we will show that an intersection of σ-algebras on Ω is itself a σ-algebra on Ω. Consider a nonempty set Ω and a nonempty set Λ. For each λ ∈ Λ assume that F_λ is a σ-algebra on Ω and let M = ∩_{λ∈Λ} F_λ. Note that Ω ∈ F_λ for each λ ∈ Λ since F_λ is a σ-algebra on Ω for each λ ∈ Λ. Hence, Ω ∈ M. Next, let A ∈ M and note that A ∈ F_λ for each λ ∈ Λ. Hence, Aᶜ ∈ F_λ for each λ ∈ Λ since F_λ is a σ-algebra on Ω for each λ ∈ Λ. Thus, Aᶜ ∈ M and we see that M is closed under complementation. Finally, let A_n ∈ M for each n ∈ N and note that A_n ∈ F_λ for each λ ∈ Λ and each n ∈ N. Hence, ∪_{n∈N} A_n ∈ F_λ for each λ ∈ Λ since each F_λ is closed under countable unions. Thus, since this union must also be in M, it follows that M is closed under countable unions. Combining these three results we see that M is itself a σ-algebra on Ω.
Now, returning to the problem, let C denote the family of all σ-algebras on Ω that contain each element of F. Note that C is not empty since P(Ω) ∈ C. Let M denote the σ-algebra on Ω given by the intersection of all of the σ-algebras in C. Note that M contains every element of F. Further, assume that L is another σ-algebra on Ω that contains every element of F. Since L ∈ C it follows that M ⊂ L. Thus, M is the smallest σ-algebra on Ω that contains every element of F. That is, M = σ(F).
To show that M is unique, assume that M_1 = σ(F) and that M_2 = σ(F). By definition of σ(F) it follows that M_1 ⊂ M_2 and that M_2 ⊂ M_1. Hence, we conclude that M_1 = M_2; that is, σ(F) is the unique such σ-algebra on Ω.
1.5. The set F is an algebra on Ω but need not be a σ-algebra on Ω. Since Ωᶜ = ∅ is finite it follows that Ω is cofinite and hence that Ω ∈ F. If A ∈ F then A is either finite or cofinite and hence Aᶜ is either cofinite or finite. In either case, Aᶜ ∈ F. Finally, let A and B be elements of F. If A and B are each finite then A ∪ B is finite and hence is an element of F. If either A or B is cofinite then either Aᶜ or Bᶜ must be finite, which implies that (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ is finite and hence that (A ∪ B)ᶜ is in F. Since F is closed under complementation it follows that A ∪ B ∈ F and hence that F is an algebra on Ω.
To see that F need not be a σ-algebra, let Ω = R and let A_n = {n} for each n ∈ N. Note that A_n is finite for each n and hence is an element of F for each n. However, ∪_{n∈N} A_n = N and N is neither finite nor cofinite. Thus, since F is not closed under countable unions it follows that F is not a σ-algebra on Ω.
1.6. If y ∈ f(f⁻¹(A)) then y = f(x) for some x ∈ f⁻¹(A). If x ∈ f⁻¹(A) then f(x) ∈ A. Thus, since y ∈ A we conclude that f(f⁻¹(A)) is a subset of A. If x ∈ B then f(x) ∈ f(B) and hence x ∈ f⁻¹(f(B)). Thus, B is a subset of f⁻¹(f(B)).
1.7. [(1) ⟹ (2)] If y ∈ f(A) ∩ f(B) then there exist a ∈ A and b ∈ B such that y = f(a) = f(b). Since f is one-to-one it follows that a = b ∈ A ∩ B and hence that y ∈ f(A ∩ B). Further, if A ∩ B ≠ ∅ and y ∈ f(A ∩ B) then there exists some point z ∈ A ∩ B such that y = f(z). Since z ∈ A and z ∈ B it follows that y ∈ f(A) ∩ f(B). Thus, it follows that f(A ∩ B) = f(A) ∩ f(B).
[(2) ⟹ (3)] This part is obvious since f(∅) = ∅.
[(3) ⟹ (1)] Let f(a) = f(b). If a ≠ b then {a} and {b} are disjoint yet f({a}) ∩ f({b}) is equal to {f(a)} which is not empty. Hence f is one-to-one.
1.8. Assume that f is onto and that B ⊂ Y. If b ∈ B then there exists some a ∈ X such that f(a) = b and hence such that a ∈ f⁻¹(B). Thus, b = f(a) ∈ f(f⁻¹(B)). This and Problem 1.6 imply that f(f⁻¹(B)) = B.
Next, assume that f(f⁻¹(B)) = B for every subset B of Y. If y ∈ Y then f(f⁻¹({y})) is equal to {y} which implies that f⁻¹({y}) is not empty. Thus, f is onto.
1.9. Assume that the set S is countable and let the sequence {a_1, a_2, ...} denote the elements of S. Construct a sequence β of 0's and 1's as follows: Let the n-th term in β be 0 if the n-th term in a_n is 1 and let the n-th term in β be 1 otherwise. Note that β is an element of S yet is different from a_n for each n ∈ N. This contradiction implies that the set S is not countable.
1.10. Fix n ∈ N and note that every polynomial p(x) = a_0 + a_1 x + ··· + a_n x^n with integer coefficients is uniquely determined by the point (a_0, a_1, ..., a_n) from the countable set Z^{n+1}. Thus, the set P of all such polynomials is countable and we may list the elements of P as a sequence {p_1, p_2, ...}. The fundamental theorem of algebra implies that the set A_k = {x ∈ R : p_k(x) = 0} is a finite set for each k. Since a countable union of finite sets is countable it follows that the set of all algebraic numbers is countable.
1.11. A point x ∈ R is said to be a point of condensation of a subset E of R if every open interval containing x contains uncountably many elements of E. To begin, we will show that any uncountable subset E of R has at least one point of condensation.
Assume that there exists no condensation point of E. Then, for each x ∈ E there exists an open interval I_x such that x ∈ I_x and such that I_x ∩ E is countable. Let J_x be an open interval such that J_x ⊂ I_x, such that x ∈ J_x, and such that J_x has rational endpoints. Note that J_x ∩ E is also countable. Further, the collection of all such intervals J_x is countable and may be
enumerated as N_1, N_2, etc. Note that
E = ∪_{k=1}^∞ (N_k ∩ E)
which implies that E is countable. This contradiction implies that E must have at least one point of condensation.
Now, let E be an uncountable set of positive real numbers and let a be a condensation point of E. If a ≠ 0 then let (α, β) be an open interval containing a such that α > 0. Let {x_n}_{n∈N} be a sequence of distinct points in (α, β) ∩ E and note that Σ_{n=1}^∞ x_n = ∞ since x_n > α for each n. If a = 0 then since (0, β) = ∪_{k=1}^∞ (β/k, β) it follows that some interval of the form (β/k, β) contains uncountably many points of E. From this point, we may proceed as we did when a ≠ 0.
1.12. Since Ω ∈ F we see that F is closed under complementation. That is, if A ∈ F then Ω \ A = Aᶜ ∈ F. Now, let A ∈ F and B ∈ F. Then Bᶜ ∈ F and hence A \ Bᶜ = A ∩ B ∈ F. Thus, F is closed under finite intersections. De Morgan's Law thus implies that F is also closed under finite unions.
1.13. Recall that B(R) is the smallest σ-algebra on R containing all bounded open intervals. Let Q be the collection of all bounded open intervals of R with rational endpoints and note that Q is countable. Further, note that σ(Q) is a subset of B(R) since Q is a subset of the collection of all bounded, open intervals. Assume that σ(Q) is a proper subset of B(R). Then there must exist an open interval (x, y) that is not an element of σ(Q) since B(R) is the smallest σ-algebra containing all such intervals. Let {x_n}_{n∈N} and {y_n}_{n∈N} be sequences of rational numbers such that x_n ↓ x and y_n ↑ y with x_n < y_n for each n ∈ N. Note that since (x, y) = ∪_{n=1}^∞ (x_n, y_n) it follows that (x, y) ∈ σ(Q). This contradiction implies that σ(Q) = B(R), and thus we see that B(R) is countably generated.
1.14. Consider the σ-algebra F given by the countable and cocountable subsets of R. Note that F contains every singleton subset of R. Assume that there exists a σ-algebra G such that G contains every singleton subset of R and such that G is a proper subset of F. Let F ∈ F with F ∉ G. Note that F can be written as a countable union of singleton sets or as a complement of such a countable union. Thus, F ∈ G. This contradiction implies that F must be the smallest σ-algebra containing all singleton sets.
1.15. To begin, note that A \ B is uncountable. Let C be a countably infinite subset of A \ B. Enumerate the elements of B and C such that B = {b_1, b_2, ...} and C = {c_1, c_2, ...}. Finally, consider the function f : A \ B → A given by
f(x) = x if x ∉ C, f(x) = b_n if x = c_{2n}, and f(x) = c_n if x = c_{2n−1}.
Note that f is onto and one-to-one. Thus, we conclude that A and A \ B are equipotent.
1.16. Let f : (0, 1] → [0, ∞) via f(x) = (1 − x)/x. Let y ∈ [0, ∞) and note that f(1/(1 + y)) = y. Thus, since 1/(1 + y) ∈ (0, 1], we see that f is onto. Next, let a, b ∈ (0, 1] with a ≠ b. Since (1 − a)/a ≠ (1 − b)/b we see that f is one-to-one. Thus, we conclude that (0, 1] and [0, ∞) are equipotent.
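The map f and its inverse can be checked exactly with rational arithmetic. A Python sketch (not from the text; the test points are arbitrary):

```python
from fractions import Fraction

def f(x):
    """f(x) = (1 - x)/x maps (0, 1] onto [0, infinity), as in the solution."""
    return (1 - x) / x

def f_inv(y):
    """Inverse map: y -> 1/(1 + y), which lies in (0, 1] for y >= 0."""
    return 1 / (1 + y)

# f(f_inv(y)) = y exactly (rationals avoid any floating-point rounding).
for y in [Fraction(0), Fraction(1, 3), Fraction(7, 2), Fraction(100)]:
    x = f_inv(y)
    assert 0 < x <= 1 and f(x) == y
```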
1.17. No. Assume that M is a countably infinite σ-algebra on a nonempty set Ω and, for each ω ∈ Ω, let
A_ω = ∩_{M ∈ M : ω ∈ M} M.
Note that there are at most only a countable number of distinct A_ω's since M is countable. If there are only a finite number of A_ω's then M is finite, which contradicts our assumption. However, if there are a countably infinite number of distinct A_ω's then M must be uncountable. To see why this last point holds, consider an enumeration of the elements of M as {M_1, M_2, ...}, consider an enumeration of the distinct A_ω's as {A_1, A_2, ...}, and define
N_j = A_j if A_j ⊄ M_j, and N_j = ∅ if A_j ⊂ M_j.
Note that N = ∪_{j=1}^∞ N_j is different from M_j for every j and hence we conclude that M is not countable.
2.1. No. Let A_n = {n} for each n ∈ N. Then N = ∪_{n=1}^∞ A_n and the A_n's are disjoint, but μ(N) = ∞ ≠ Σ_{n=1}^∞ μ(A_n) = 0.
2.2. Consider the measure space (R, B(R), λ) where λ is Lebesgue measure. Let A_n = (n, ∞) for each n ∈ N and note that the A_n's comprise a strictly decreasing sequence of Borel sets. Further, the sequence converges to the empty set since given any real number x there exists an integer m such that x ∉ A_n for any n > m; i.e., lim sup A_n = ∅. However, λ(A_n) = ∞ for each n ∈ N and hence λ(A_n) does not converge to 0 as n → ∞. (What would happen if we required the measure μ to be a finite measure?)
2.3. Let U = {(x, y) ∈ R² : x² + y² < 1} and recall that lim inf A_n = ∪_{n=1}^∞ ∩_{k=n}^∞ A_k. Assume that (x, y) ∈ lim inf A_n. Then there exists some n ∈ N such that (x, y) ∈ ∩_{k=n}^∞ A_k. Note that (x, y) ∈ ∩_{k=n}^∞ A_k if and only if
(x − (−1)^k/k)² + y² < 1
for all k ≥ n, since (x, y) ∈ A_k if and only if
(x − (−1)^k/k, y) ∈ U.
Since
(x − (−1)^k/k)² + y² < 1
for all k ≥ n it follows that
x² − 2x(−1)^k/k + 1/k² + y² < 1
for all k ≥ n. Assume that x² + y² ≥ 1. Then it follows that
2x(−1)^k/k − 1/k² > 0
for all k ≥ n, and hence that 2x(−1)^k > 1/k for all k ≥ n. This last result, however, cannot be true since the left hand side alternates sign (or is zero) and the right hand side is always positive. Thus we conclude that x² + y² must be less than 1. Hence (x, y) ∈ U and thus lim inf A_n ⊂ U.
Now, assume that (x, y) ∈ U, let ε = 1 − (x² + y²), and note that ε > 0. Further, note that
(x − (−1)^k/k)² + y² = x² − 2x(−1)^k/k + 1/k² + y² ≤ x² + 2|x|/k + 1/k² + y² < x² + 2/k + 1/k² + y² ≤ x² + y² + 3/k = 1 − ε + 3/k
since (x, y) ∈ U and k ∈ N. Thus, we see that
(x − (−1)^k/k)² + y² < 1
if 3/k ≤ ε, or if k ≥ 3/ε. Thus, for n ≥ 3/ε it follows that (x, y) ∈ A_k for all k ≥ n. Hence, U ⊂ lim inf A_n which combined with the earlier result implies that lim inf A_n = U.
Let S = {(x, y) ∈ R² : x² + y² ≤ 1} \ {(0, 1), (0, −1)}. Recall that lim sup A_n = ∩_{n=1}^∞ ∪_{k=n}^∞ A_k and assume that (x, y) ∈ lim sup A_n. Note that (x, y) ∈ ∪_{k=n}^∞ A_k for all n ∈ N. We will first show that x² + y² ≤ 1. Let ε = x² + y² − 1 and assume that ε > 0. Note that (x, y) ∈ A_k if and only if
(x − (−1)^k/k)² + y² < 1
which is true if and only if
ε < 2x(−1)^k/k − 1/k².
Since (x, y) ∈ lim sup A_n we know that (x, y) ∈ ∪_{k=n}^∞ A_k for all n ∈ N. That is, for all n ∈ N there exists some k ∈ N such that k ≥ n and such that (x, y) ∈ A_k. Note that (x, y) ∈ A_k if and only if
ε < 2x/k − 1/k²
if k is even and if and only if
ε < −2x/k − 1/k²
if k is odd. Assume that x ≥ 0. Then since ε > 0 we see that (x, y) ∉ A_k for any odd value of k. Let n be an integer such that n > 2(x + 1)/ε and let k be any even integer not less than n. Since (x, y) ∈ lim sup A_n we see that there exists an (even) integer m such that m ≥ n and such that
ε < 2x/m − 1/m².
From this we conclude that
x > mε/2 + 1/(2m) > mε/2 ≥ nε/2.
Recall, however, that n > 2(x + 1)/ε. Hence, nε/2 > x + 1 while nε/2 < x. This contradiction implies that ε cannot be positive when x ≥ 0. A similar procedure shows that ε cannot be positive when x ≤ 0. Thus, we see that x² + y² ≤ 1 if (x, y) ∈ lim sup A_n. Let (x, y) = (0, ±1). Then, (x, y) ∉ A_k for any k ∈ N since
(x − (−1)^k/k)² + y² = ((−1)^k/k)² + (±1)² = 1/k² + 1 > 1
for all k ∈ N. Hence, lim sup A_n ⊂ S.
Now, let (x, y) E S and consider x < 0 and k odd. Then,
Note that if k > 1/lxl then, since x < 0, it follows that x +
(l/k) < o. Hence, if k > 1/lxl and if k is odd then
(It)2 :2 :2 :2
<x +y ::::;1.
(x'k +y
Thus, for any n E N we can find some kEN such that k > n
and such that (x, y) E Ak if x < o. Hence, (x, y) E U~nAk
for all n E N if x < o. A similar argument shows that (x,
y) E Uk=n Ak for all 17 E N if x > o. Finally, if (0, y) E S
then (0, y) E U = lim inf An. Since lim inf An C lim sup An
we thus see that (0, y) E lim sup An Hence, we conclude that
S C lim sup An. Combined with our earlier result we see that
lim sup An = S.
2.4. Let $C$ be a subset of $\mathbb{R}$ that is not a real Borel set. Let $A_t = \{t\}$ for each $t \in \mathbb{R}$ and note that $\lambda(A_t) = 0$ for each $t \in \mathbb{R}$. Let the index set $I$ be given by $C$. Then $\bigcup_{t \in I} A_t = C \notin \mathcal{B}(\mathbb{R})$. That is, an arbitrary union of null sets need not be a measurable set.

Let $C$ be a real Borel set such that $\lambda(C) > 0$. Let $A_t = \{t\}$ for each $t \in \mathbb{R}$ and note that $\lambda(A_t) = 0$ for each $t \in \mathbb{R}$. Let the index set $I$ be given by $C$. Then $\bigcup_{t \in I} A_t = C$. That is, even when an arbitrary union of null sets is measurable it need not be a null set.
2.5. Let $A$ be a countable subset of $\mathbb{R}$ and note that, since $A$ is countable, we may express $A$ as a countable union of singleton sets; that is, $A = \bigcup_{n=1}^{\infty} \{a_n\}$ where $a_n \in \mathbb{R}$ for each $n$. Recall that singleton subsets of $\mathbb{R}$ are Borel sets. Thus, $A$ as a countable union of Borel sets must also be a Borel set. Since the Borel sets are a subset of the Lebesgue sets we conclude that $A$ is Lebesgue measurable. Let $m$ denote Lebesgue measure on the real line and the Lebesgue measurable subsets of the real line. By countable subadditivity (or countable additivity if the $a_n$'s are distinct) we see that $m(A)$ must be zero since $m(\{a_n\}) = 0$ for each $n$.
2.6. Let $A$ be a subset of $\mathbb{R}$ such that $A \notin \mathcal{B}(\mathbb{R})$. Define a function $f : \mathbb{R} \to \mathbb{R}$ via $f(x) = 2I_A(x) - 1$. Note that $f$ is not Borel measurable since $f^{-1}(\{1\}) = A \notin \mathcal{B}(\mathbb{R})$. However, $|f| = 1$ is Borel measurable.
2.7. Let $\mathcal{E}$ be the collection of all sets $A \in \sigma(\mathcal{P})$ such that $P_1(A) = P_2(A)$. Note that $\Omega \in \mathcal{E}$ since $P_1$ and $P_2$ are probability measures. Further, if $A \in \mathcal{E}$ then $A^c \in \mathcal{E}$ since $P_1(A^c) = 1 - P_1(A) = 1 - P_2(A) = P_2(A^c)$. Finally, if $A_n \in \mathcal{E}$ for each $n \in \mathbb{N}$ and if the $A_n$'s are disjoint then $\bigcup_{n \in \mathbb{N}} A_n \in \mathcal{E}$ since
$$P_1\Big(\bigcup_{n \in \mathbb{N}} A_n\Big) = \sum_{n=1}^{\infty} P_1(A_n) = \sum_{n=1}^{\infty} P_2(A_n) = P_2\Big(\bigcup_{n \in \mathbb{N}} A_n\Big).$$
Thus, $\mathcal{E}$ is a $\lambda$-system. By assumption, $\mathcal{P} \subset \mathcal{E}$ and $\mathcal{P}$ is a $\pi$-system. Thus, the $\pi$-$\lambda$ theorem implies that $\sigma(\mathcal{P}) \subset \mathcal{E}$.
3.1. Note that $f_n(0) = 0$ for all $n \in \mathbb{N}$, and let $x \in (0, 1]$. Choose $n \in \mathbb{N}$ such that $1/n < x$ and note that $f_m(x) = 0$ for all $m \ge n$. Hence, $\lim_{n \to \infty} f_n(x) = 0$ for all $x \in [0, 1]$. Since $\int_0^1 f_n(x)\,dx = 1$ for all $n$ it follows that $\lim_{n \to \infty} \int_0^1 f_n(x)\,dx = 1$. The integral of the limit function, however, is zero since that integrand is identically zero.
3.2. Clearly, $\mu$ maps $\mathbb{P}(\Omega)$ into $[0, \infty]$. Indeed, it maps it into the set $\{0, 1, 2, 3, 4\}$. Further, $\mu(\varnothing) = 0$ since $\varnothing$ contains zero elements. Finally, if $A$ and $B$ are disjoint sets then $\mu(A \cup B)$ is simply $\mu(A) + \mu(B)$, the number of points in $A \cup B$. Next, note that
$$\int_{\Omega} f\,d\mu = \int_{\{1\}} f\,d\mu + \int_{\{2\}} f\,d\mu + \int_{\{3\}} f\,d\mu + \int_{\{4\}} f\,d\mu = f(1)\mu(\{1\}) + f(2)\mu(\{2\}) + f(3)\mu(\{3\}) + f(4)\mu(\{4\}) = 1 \times 1 + 4 \times 1 + 9 \times 1 + 16 \times 1 = 30.$$
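Since $\mu$ is counting measure on $\Omega = \{1, 2, 3, 4\}$ and the integrand is $f(x) = x^2$, the integral is just a finite sum, which is easy to check directly (a minimal sketch of that arithmetic):

```python
# Integrating f against counting measure on a finite set reduces to
# summing f over the points of the set (each singleton has measure 1).
omega = [1, 2, 3, 4]

def f(x):
    return x ** 2

integral = sum(f(k) for k in omega)
print(integral)  # 30
```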
3.3. Recall the integration by parts theorem for Riemann-Stieltjes integrals. Since $F$ is continuous and of bounded variation it follows that the integral exists. Thus, integrating by parts we see that
$$\int_a^b F(x)\,dF(x) = (F(b))^2 - (F(a))^2 - \int_a^b F(x)\,dF(x).$$
Since $F(b) = 1$ and $F(a) = 0$ we see that
$$\int_a^b F(x)\,dF(x) = \frac{1}{2}.$$
3.4. Let $f$ be a probability density function associated with $F$. Then
$$\int_{-\infty}^{\infty} (F(x + c) - F(x))\,dx = \int_{-\infty}^{\infty} \int_x^{x+c} f(t)\,dt\,dx = \int_{-\infty}^{\infty} \int_{t-c}^{t} dx\,f(t)\,dt = c \int_{-\infty}^{\infty} f(t)\,dt = c.$$
For an alternate solution, note that
$$E[X] = \int_0^{\infty} P(X > t)\,dt - \int_{-\infty}^{0} P(X < t)\,dt.$$
Thus,
$$c = E[X - (X - c)] = E[X] - E[X - c] = \int_0^{\infty} (1 - F(t))\,dt - \int_{-\infty}^{0} F(t)\,dt - \int_0^{\infty} (1 - F(t + c))\,dt + \int_{-\infty}^{0} F(t + c)\,dt = \int_0^{\infty} (F(t + c) - F(t))\,dt + \int_{-\infty}^{0} (F(t + c) - F(t))\,dt = \int_{-\infty}^{\infty} (F(t + c) - F(t))\,dt.$$
3.5. Using Lemma 5.3 on page 96, it follows that we should choose
$$g(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x > 0. \end{cases}$$
Note that $g$ is not differentiable at the origin, and hence the typical engineering appeal to "derivatives of the step function" is nonsensical.
4.1. No. Consider the three real numbers 1, 2, and 3. Note that $d(1, 3) = 4$ but $d(1, 2) = 1$ and $d(2, 3) = 1$. Thus, we see that $d(1, 3) > d(1, 2) + d(2, 3)$. Hence, $d$ does not satisfy the triangle inequality and consequently cannot be a metric.
4.2. Consider the metric $\rho$ defined on the positive integers $\mathbb{N}$ via $\rho(n, m) = |n - m|$. Notice that the open ball $B(1, 1) = \{1\}$, while the closed ball $\bar{B}(1, 1) = \{1, 2\}$. Further, the closure of $B(1, 1)$ is equal to $\{1\}$. Thus, the closure of the open ball $B(1, 1)$ is a proper subset of the closed ball $\bar{B}(1, 1)$.
4.3. Let $a = 0$, let $b = 1$, and let $f_n(t) = n^2 t e^{-nt}$. Clearly this sequence converges to zero pointwise as $n \to \infty$. However, note that $f_n(t)$ has a maximum at $t = 1/n$ and that $f_n(1/n) = n/e$. Thus, we see that although $f_n(t) \to 0$ as $n \to \infty$, $d(f_n, 0) \to \infty$ as $n \to \infty$.
4.4. For each positive integer $n$, let $x_n$ be a rational number in the interval $(\sqrt{2} - (1/n), \sqrt{2})$. Note that $d(x_n, x_m) < (1/n) + (1/m)$. Hence, we see that $\{x_n\}_{n \in \mathbb{N}}$ is a Cauchy sequence in $\mathbb{Q}$. However, there is no element in $\mathbb{Q}$ to which $x_n$ converges. Since we have found a Cauchy sequence in $\mathbb{Q}$ that does not converge to a point in $\mathbb{Q}$ we see that the rational line is not complete.
5.1. The polynomial $x^2 + 2Bx + C$ has real roots if and only if $B^2 - C \ge 0$. Thus, we are seeking the probability that $B^2 \ge C$. This probability is given by
$$P(B^2 \ge C) = \int_0^1 \int_0^{b^2} f_{B,C}(b, c)\,dc\,db = \int_0^1 b^2\,db = \frac{1}{3}.$$
That is, the polynomial has real roots with probability 1/3.
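A Monte Carlo check of the same probability, assuming (as the integral above suggests) that $B$ and $C$ are independent and uniform on $(0, 1)$; the sample size and seed are arbitrary choices:

```python
import random

random.seed(0)
trials = 200_000
# x^2 + 2Bx + C has real roots exactly when B^2 - C >= 0.
hits = sum(1 for _ in range(trials)
           if random.random() ** 2 >= random.random())
estimate = hits / trials
print(estimate)  # close to 1/3
```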
5.2. Note that
$$P(0.3 \le \sqrt{X} < 0.4) = P(0.09 \le X < 0.16) = 0.16 - 0.09 = 0.07.$$
5.3. Note first that $F_X^{-1} : (0, 1) \to \mathbb{R}$ exists and is strictly increasing. Thus, if $Z = F_X(X)$ then it follows that $F_Z(z) = P(Z \le z) = P(F_X(X) \le z) = P(X \le F_X^{-1}(z)) = F_X(F_X^{-1}(z)) = z$ for $0 < z < 1$. Thus, $Z$ is uniform on $(0, 1)$.
5.4. Let $Y = -\ln(F(X))$ and, as above, note that $F_Y(y) = P(Y \le y) = P(-\ln(F(X)) \le y) = P(\ln(F(X)) \ge -y) = P(F(X) \ge \exp(-y)) = P(X \ge F^{-1}(\exp(-y))) = 1 - P(X \le F^{-1}(\exp(-y))) = 1 - F(F^{-1}(\exp(-y))) = 1 - \exp(-y)$ for $y \ge 0$, where the sixth equality follows from the continuity of the indicated distribution function. Thus, $f_Y(y) = \exp(-y)$ for $y \ge 0$ and is zero for $y < 0$.
5.5. Note that $F_Z(z) = P(Z \le z) = P(X \le z, Y \le z) = F_{X,Y}(z, z)$. Also, note that $F_W(w) = P(W \le w) = 1 - P(W > w) = 1 - P(X > w, Y > w) = P(X \le w) + P(Y \le w) - P(X \le w, Y \le w) = F_X(w) + F_Y(w) - F_{X,Y}(w, w)$.
5.6. Assume that $X$ and $Y$ have a joint probability distribution function given by $F$. Note that $P(x_1 < X \le x_2, y_1 < Y \le y_2) = P(X \le x_2, Y \le y_2) - P(X \le x_1, Y \le y_2) - P(X \le x_2, Y \le y_1) + P(X \le x_1, Y \le y_1) = F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1) \ge 0$. Thus $G$ is not a distribution function since $G(2, 2) - G(0, 2) - G(2, 0) + G(0, 0) = 1 - 1 - 1 + 0 = -1$.
5.7. Let $C$ be the circle of radius one centered at the origin. Note that
$$f_X(x) = \int_{\mathbb{R}} \frac{1}{\pi} I_C(x, y)\,dy = \int_{-\sqrt{1 - x^2}}^{\sqrt{1 - x^2}} \frac{1}{\pi}\,dy = \frac{2}{\pi}\sqrt{1 - x^2}$$
where $-1 < x < 1$.
6.1. To begin, note that the three sets $A \cap B$, $A \cap B^c$, and $B \cap A^c$ partition $A \cup B$. Thus, by countable additivity it follows that $P(A \cup B) = P(A \cap B) + P(A \cap B^c) + P(B \cap A^c)$ which implies that $P(A^c \cap B^c) = 1 - P(A)P(B) - P(A \cap B^c) - P(B \cap A^c)$ where we have used De Morgan's law and the fact that $A$ and $B$ are independent. Note that since $A$ and $B$ are independent and since $A \cap B^c$ and $A \cap B$ partition $A$ it follows that $P(A \cap B^c) = P(A) - P(A \cap B) = P(A)(1 - P(B)) = P(A)P(B^c)$. Similarly, it follows that $P(B \cap A^c) = P(B)P(A^c)$. Substituting we see that $P(A^c \cap B^c) = P(A^c)P(B^c)$ which implies that $A^c$ and $B^c$ are independent.
6.2. Note that a circle with unit area has radius $r = 1/\sqrt{\pi}$. Assume that the dart board is the circle of unit area centered at the origin in $\mathbb{R}^2$. Note that $P(X \in [r/\sqrt{2}, r])$ and $P(Y \in [r/\sqrt{2}, r])$ are each positive since the dart's final resting place is determined by a uniform distribution over the area of the board. However, $P(X \in [r/\sqrt{2}, r], Y \in [r/\sqrt{2}, r])$ is zero since the region in question is outside of the circle. Thus, $X$ and $Y$ are not independent.
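The dependence is easy to see numerically by rejection-sampling uniform points on the disc of unit area (the sample size and seed are arbitrary choices):

```python
import math
import random

random.seed(1)
r = 1 / math.sqrt(math.pi)   # radius of the circle of unit area
lo = r / math.sqrt(2)

pts = []
while len(pts) < 100_000:    # uniform points on the disc, by rejection
    x = random.uniform(-r, r)
    y = random.uniform(-r, r)
    if x * x + y * y <= r * r:
        pts.append((x, y))

p_x = sum(1 for x, y in pts if x >= lo) / len(pts)
p_y = sum(1 for x, y in pts if y >= lo) / len(pts)
p_xy = sum(1 for x, y in pts if x >= lo and y >= lo) / len(pts)
print(p_x, p_y, p_xy)        # p_xy is (essentially) zero; p_x * p_y is not
```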
6.3. No, since $P(A)$, $P(B)$, and $P(C)$ each equal 1/2 yet $P(A \cap B \cap C)$ is equal to zero.
6.4. Let $h$ denote the number of keystrokes required to type Hamlet. Let $\Omega = \{\omega_1, \ldots, \omega_m\}$ denote the $m$ different characters that the typewriter is able to produce. The monkey's output may be thought of as a sequence of experiments where the outcome of each experiment is an element of $\Omega^h$. The probability of each possible outcome is simply the product of the probabilities of the keystrokes required to produce it. Let $p_i$ denote the probability of $\alpha_i \in \Omega^h$ where $1 \le i \le m^h$. Note that $p_i$ is positive for each $i$ and that the text of the play Hamlet corresponds to $\alpha_j$ for some integer $j$. We may model the situation as follows: Repeatedly toss an $m^h$-sided die where the $i$th side of the die appears on top with probability $p_i$. Our question then is on how many tosses will the $j$th side appear on top. Since each side comes up with positive probability and since the tosses are made independently, the second Borel-Cantelli lemma implies that the $j$th side (and, indeed, each side) will with probability one appear infinitely many times.
6.5. No, since
$$f_X(x) = \int_x^{\infty} 2e^{-x}e^{-y}\,dy = 2e^{-2x}; \quad x \ge 0$$
and since
$$f_Y(y) = \int_0^y 2e^{-x}e^{-y}\,dx = 2e^{-y}(1 - e^{-y}); \quad y \ge 0$$
and thus $f(x, y) \ne f_X(x)f_Y(y)$.
6.6. To begin, note that the area of the disc is $25\pi$ and the area of the disc with those points removed that are less than one mile from the center is $24\pi$. Thus, the probability that there are no hits within one mile of the target after $N$ shots is $(24/25)^N$. Hence, the probability that there is at least one hit within a mile of the target after $N$ shots is equal to $1 - (24/25)^N$. This probability exceeds 0.95 when $N \ge 74$.
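The cutoff $N \ge 74$ follows from solving $(24/25)^N < 0.05$, which can be checked directly:

```python
import math

# Smallest N with 1 - (24/25)**N > 0.95, i.e. with (24/25)**N < 0.05.
N = math.ceil(math.log(0.05) / math.log(24 / 25))
print(N)                          # 74
print(1 - (24 / 25) ** N)         # just above 0.95
print(1 - (24 / 25) ** (N - 1))   # just below 0.95
```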
7.1. Recall that $\sigma(X) = X^{-1}(\mathcal{B}(\mathbb{R}))$. Let $A$ be a real Borel set. If $87 \in A$ then $X^{-1}(A) = \Omega$ since $X(\omega) \in A$ for each $\omega \in \Omega$. Similarly, if $87 \notin A$ then $X^{-1}(A)$ is empty. Thus, $\sigma(X) = \{\varnothing, \Omega\}$.
7.2. If $A$ is a real Borel set such that $A \subset (-\infty, 0)$ then $X^{-1}(A) = \varnothing$. Further, if $A$ is any real Borel set then $X^{-1}(A) = X^{-1}(B)$ where $B = A \cap [0, \infty)$. If $A$ is a real Borel set such that $A \subset [0, \infty)$ then let $\sqrt{A}$ denote the set $\{\sqrt{x} : x \in A\}$ and let $-A$ denote the set $\{-x : x \in A\}$. For such a set $A$ it then follows that $X^{-1}(A) = \sqrt{A} \cup (-\sqrt{A})$ and hence that $\sigma(X)$ consists of all sets of this form. But, any real Borel set $B \subset [0, \infty)$ may be written as $\sqrt{C}$ for the set $C = \{x^2 : x \in B\}$. Thus, $\sigma(X)$ consists of all sets of the form $B \cup -B$ where $B \subset [0, \infty)$ is a real Borel set.
7.3. Assume that $X$ is not equal to $Y$ a.s. Then there must exist a set of positive probability on which $X - Y$ is not zero. Hence, there exists a set of positive probability on which $(X - Y)^2$ is positive. But, if $(X - Y)^2$ is positive on a set of positive probability then $E[(X - Y)^2]$ cannot be equal to zero. This contradiction implies that $X$ must equal $Y$ a.s.
7.4. Note that $E[X] = 0$ and that
$$\mathrm{VAR}[X] = \frac{1}{2} \int_{\mathbb{R}} x^2 e^{-|x|}\,dx = 2.$$
Thus, Chebyshev implies that $P(|X| > 2) \le 1/2$. Note, however, that
$$P(|X| > 2) = 1 - \int_{-2}^{2} \frac{1}{2} e^{-|x|}\,dx = e^{-2} \approx 0.135.$$
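The gap between the Chebyshev bound and the exact tail probability can be checked numerically; the crude Riemann sum below stands in for the integral of the density $\frac{1}{2}e^{-|x|}$ over the tail region:

```python
import math

cheb_bound = 2 / 2 ** 2        # Chebyshev: P(|X| > 2) <= VAR[X] / 2^2 = 1/2
exact = math.exp(-2)           # closed-form value of P(|X| > 2)

# Crude left Riemann sum for P(|X| > 2) = 2 * integral over (2, 22)
# of (1/2) e^{-x} dx (the tail beyond 22 is negligible).
h = 1e-4
tail = 2 * sum(0.5 * math.exp(-(2 + k * h)) * h for k in range(200_000))
print(cheb_bound, exact, tail)
```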
7.5. Recall that $\sigma(Y) = Y^{-1}(\mathcal{B}(\mathbb{R}))$. Let $B \in \mathcal{B}(\mathbb{R})$ and note that $Y^{-1}(B) = X^{-1}(g^{-1}(B))$. Since $g$ is Borel measurable it follows that $g^{-1}(B) \in \mathcal{B}(\mathbb{R})$ and hence that $X^{-1}(g^{-1}(B)) \in \sigma(X)$. Equality will occur when any Borel set $B$ may be written as $g^{-1}(A)$ for some $A \in \mathcal{B}(\mathbb{R})$. (For example, if $g$ is bijective.)
7.6. Let $A$ and $B$ be nonempty sets, let $f : A \to B$, and let $\mathcal{G}$ be a collection of subsets of $B$. To begin, we will show that $f^{-1}(\sigma_B(\mathcal{G})) = \sigma_A(f^{-1}(\mathcal{G}))$ where, for a nonempty set $M$ and a collection $\mathcal{M}$ of subsets of $M$, $\sigma_M(\mathcal{M})$ denotes the smallest $\sigma$-algebra on $M$ that contains every set in $\mathcal{M}$.

Recall that if $\{B_i : i \in I\}$ is a collection of subsets of $B$ then $f^{-1}(\bigcup_{i \in I} B_i) = \bigcup_{i \in I} f^{-1}(B_i)$ and $f^{-1}(\bigcap_{i \in I} B_i) = \bigcap_{i \in I} f^{-1}(B_i)$. That is, intersections and inverses commute and unions and inverses commute. Let $\mathcal{G}_1 = \{G \subset B : G \in \mathcal{G} \text{ or } G^c \in \mathcal{G}\}$. Further, for each positive integer $i > 1$, let $\mathcal{G}_i$ denote the set of all countable unions and all countable intersections of elements in $\bigcup_{j < i} \mathcal{G}_j$. Note that $\bigcup_{j < \infty} \mathcal{G}_j$ is a $\sigma$-algebra and, in fact, is
equal to $\sigma(\mathcal{G})$. Further, $f^{-1}(\sigma(\mathcal{G})) = \bigcup_{i < \infty} f^{-1}(\mathcal{G}_i)$. Also, since $f^{-1}(\mathcal{G}_i)$ is the set of all countable unions and countable intersections of elements in $\bigcup_{j < i} f^{-1}(\mathcal{G}_j)$ it follows that $\bigcup_{i < \infty} f^{-1}(\mathcal{G}_i)$ is a $\sigma$-algebra. Indeed, it is the $\sigma$-algebra generated by $f^{-1}(\mathcal{G})$. Thus, $f^{-1}(\sigma(\mathcal{G})) = \sigma(f^{-1}(\mathcal{G}))$.

Consider a random variable $X$ defined on a probability space $(\Omega, \mathcal{F}, P)$. Now, let $S$ be a countable collection of subsets of $\mathbb{R}$ such that $\mathcal{B}(\mathbb{R}) = \sigma(S)$. Our first result implies that $X^{-1}(\sigma_{\mathbb{R}}(S)) = \sigma_{\Omega}(X^{-1}(S))$. Note, also, that $X^{-1}(\sigma_{\mathbb{R}}(S)) = X^{-1}(\mathcal{B}(\mathbb{R})) = \sigma_{\Omega}(X)$. Thus, $\sigma_{\Omega}(X) = \sigma_{\Omega}(X^{-1}(S))$ from which it follows immediately that $\sigma_{\Omega}(X)$ is countably generated.
7.7. Consider measurable spaces $(\Omega_1, \mathcal{F}_1)$ and $(\Omega_2, \mathcal{F}_2)$ and let $f$ be a function mapping $\Omega_1$ to $\Omega_2$. Let $\mathcal{A}$ be a collection of subsets of $\Omega_2$ such that $\sigma(\mathcal{A}) = \mathcal{F}_2$. If $f^{-1}(\mathcal{A}) \subset \mathcal{F}_1$ then $f^{-1}(\mathcal{F}_2) \subset \mathcal{F}_1$. To see why this holds, recall that complements, unions, and intersections commute with inverses. Thus, the collection $\mathcal{Y}$ of all subsets $A$ of $\Omega_2$ such that $f^{-1}(A) \in \mathcal{F}_1$ is a $\sigma$-algebra on $\Omega_2$. Note that $\Omega_2 \in \mathcal{Y}$ since $f^{-1}(\Omega_2) = \Omega_1$. Further, note that $\mathcal{A} \subset \mathcal{Y}$. This implies that $\sigma(\mathcal{A}) \subset \mathcal{Y}$. Since $\sigma(\mathcal{A}) = \mathcal{F}_2$ the desired result follows immediately.

It follows immediately that if $X^{-1}(\mathcal{B}(\mathbb{R})) \subset \mathcal{F}$ then $X^{-1}((-\infty, x]) \in \mathcal{F}$ for each $x \in \mathbb{R}$. Further, using the result of the previous paragraph, it follows that $X^{-1}(\mathcal{B}(\mathbb{R})) \subset \mathcal{F}$ if $X^{-1}((-\infty, x]) \in \mathcal{F}$ for each $x \in \mathbb{R}$.
7.8. The forward implication is clear. The reverse implication follows quickly via a proof by contradiction.
7.9. Note that
$$E[|X - Y|] = \frac{1}{4} \int_0^2 \int_0^2 |x - y|\,dx\,dy = \frac{1}{4} \int_0^2 \int_y^2 (x - y)\,dx\,dy + \frac{1}{4} \int_0^2 \int_0^y (y - x)\,dx\,dy = \frac{1}{4} \int_0^2 \left(2 - 2y + \frac{y^2}{2}\right) dy + \frac{1}{4} \int_0^2 \frac{y^2}{2}\,dy = \frac{2}{3}.$$
7.10. Note that
$$P(Z \le z) = P(\max(X_1, \ldots, X_n) \le z) = P(X_1 \le z, \ldots, X_n \le z) = P(X_1 \le z) \cdots P(X_n \le z) = \left(\frac{z}{B}\right)^n$$
for $0 < z \le B$. Thus, it follows that
$$f_Z(z) = \frac{n z^{n-1}}{B^n}$$
for $0 < z < B$. Hence,
$$E[Z] = \int_0^{B} z\,\frac{n z^{n-1}}{B^n}\,dz = \frac{nB}{n + 1}.$$
7.11. To begin, recall that
$$\cos(a)\cos(b) = \frac{1}{2}\cos(a - b) + \frac{1}{2}\cos(a + b)$$
and
$$\cos(a + b) = \cos(a)\cos(b) - \sin(a)\sin(b).$$
Note also that $E[\cos(2X)] = 0$ and $E[\sin(2X)] = 0$ since $C_X(2) = E[\cos(2X)] + iE[\sin(2X)] = 0$. Thus, it follows that
$$E[\cos(X + s)\cos(X + s + 1)] = E\left[\frac{1}{2}\cos(1) + \frac{1}{2}\cos(2X + 2s + 1)\right] = \frac{1}{2}\cos(1) + \frac{1}{2}E[\cos(2X + 2s + 1)] = \frac{1}{2}\cos(1) + \frac{1}{2}E[\cos(2X)\cos(2s + 1) - \sin(2X)\sin(2s + 1)] = \frac{1}{2}\cos(1) + \frac{1}{2}\cos(2s + 1)E[\cos(2X)] - \frac{1}{2}\sin(2s + 1)E[\sin(2X)] = \frac{1}{2}\cos(1).$$
8.1. Note that
$$E[X] = \int_0^{\infty} x\,dF(x) = \int_0^{\infty} \int_0^x dt\,dF(x) = \int_0^{\infty} \int_t^{\infty} dF(x)\,dt = \int_0^{\infty} P(X > t)\,dt.$$
8.2. Since $E[(X - m)^2] = E[X^2] - 2mE[X] + m^2$ it follows that $\frac{d}{dm} E[(X - m)^2] = -2E[X] + 2m$. Setting this latter expression equal to zero implies that $m = E[X]$ is a critical point. Since $\frac{d^2}{dm^2} E[(X - m)^2] = 2 > 0$ it follows that $m$ minimizes $E[(X - m)^2]$.
8.3. Apply the Cauchy-Schwarz inequality to the product $XY$ to see that
$$|E[XY]| \le E[|XY|] \le \sqrt{E[X^2]}\sqrt{E[Y^2]}.$$
8.4. Note that $\mathrm{COV}[X, Y] = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y] - E[X]E[Y] + E[X]E[Y] = E[XY] - E[X]E[Y]$.
8.5. Apply the Cauchy-Schwarz inequality to $X - E[X]$ and $Y - E[Y]$ to see that
$$|E[(X - E[X])(Y - E[Y])]| \le \sqrt{E[(X - E[X])^2]}\sqrt{E[(Y - E[Y])^2]} = \sigma_X \sigma_Y$$
which implies that
$$|\rho(X, Y)| = \frac{|\mathrm{COV}[X, Y]|}{\sigma_X \sigma_Y} \le 1.$$
8.6. For $a, b \in \mathbb{R}$ let $Z = aX - bY$. Note that $0 \le E[Z^2] = a^2 E[X^2] - 2abE[XY] + b^2 E[Y^2]$. Note that the right hand side is a quadratic in $a$ that has at most one real root (possibly of multiplicity two). Note that the roots of this expression are given by
$$\frac{2bE[XY] \pm \sqrt{4b^2 E[XY]^2 - 4E[X^2]b^2 E[Y^2]}}{2E[X^2]}.$$
Based upon the previous observation we know that $4b^2 E[XY]^2 - 4E[X^2]b^2 E[Y^2] \le 0$ and hence that $E[XY]^2 - E[X^2]E[Y^2] \le 0$. Equality holds if and only if $E[Z^2]$, as a function of $a$, has a real root. Thus, equality holds if and only if $E[(aX - bY)^2] = 0$ for some $a$ and $b$ not both equal to zero. Thus, equality holds if and only if $P(aX = bY) = 1$ for $a$ and $b$ not both zero. In fact, if $\rho(X, Y) = 1$ then $Y$ increases linearly with $X$ (almost surely) and if $\rho(X, Y) = -1$ then $Y$ decreases linearly with $X$ (almost surely).
8.7. It follows quickly that
$$E[Y] = \frac{1}{b - a} \int_a^b y\,dy = \frac{a + b}{2}$$
and that
$$\mathrm{VAR}[Y] = \frac{1}{b - a} \int_a^b y^2\,dy - \left(\frac{a + b}{2}\right)^2 = \frac{(b - a)^2}{12}.$$
8.8. Recall that $M_X(t) = \exp(\lambda(e^t - 1))$. Thus, $M_X'(t) = \lambda e^t \exp(\lambda(e^t - 1))$ and $M_X''(t) = (\lambda e^t + \lambda^2 e^{2t})\exp(\lambda(e^t - 1))$. Thus, $E[X] = M_X'(0) = \lambda$ and $E[X^2] = M_X''(0) = \lambda + \lambda^2$ which implies that $\mathrm{VAR}[X] = E[X^2] - E[X]^2 = \lambda + \lambda^2 - \lambda^2 = \lambda$. Recall that $\mathrm{VAR}[aX] = a^2\,\mathrm{VAR}[X]$ for $a \in \mathbb{R}$. Thus, by independence we see that $\mathrm{VAR}[Y] = (16 + 1 + 36 + 9)(3) = 186$.
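The arithmetic for $\mathrm{VAR}[Y]$ can be reproduced directly; the coefficients below are a hypothetical reading of the squares 16, 1, 36, 9 appearing in the solution, with each $X_i$ Poisson with $\lambda = 3$:

```python
# VAR[sum a_i X_i] = sum a_i^2 VAR[X_i] for independent X_i,
# and VAR[X_i] = lambda for a Poisson(lambda) variable.
lam = 3
coeffs = [4, -1, 6, -3]   # hypothetical coefficients; only their squares matter
var_y = sum(a * a * lam for a in coeffs)
print(var_y)  # 186
```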
8.9. Note that $E[(X - aY)^2] = E[X^2] - 2aE[XY] + a^2 E[Y^2]$. Hence, $\frac{d}{da} E[(X - aY)^2] = -2E[XY] + 2aE[Y^2] = 0$ if $a = E[XY]/E[Y^2]$. Since $\frac{d^2}{da^2} E[(X - aY)^2] = 2E[Y^2] > 0$ it follows that this choice for $a$ results in a minimum value of $E[(X - aY)^2]$.
8.10. Let $Z = \sum_{i=1}^n X_i$ and note that $E[X_1 Z] = E[X_1^2] + (n - 1)\mu^2 = \sigma^2 + \mu^2 + (n - 1)\mu^2 = \sigma^2 + n\mu^2$, that $E[X_1] = \mu$, and that $E[Z] = n\mu$. Thus, $\mathrm{COV}[X_1, Z] = \sigma^2 + n\mu^2 - (\mu)(n\mu) = \sigma^2$. Further, $\mathrm{VAR}[X_1] = \sigma^2$ and $\mathrm{VAR}[Z] = n\sigma^2$. Thus,
$$\rho(X_1, Z) = \frac{\mathrm{COV}[X_1, Z]}{\sqrt{\mathrm{VAR}[X_1]\,\mathrm{VAR}[Z]}} = \frac{\sigma^2}{\sigma^2\sqrt{n}} = \frac{1}{\sqrt{n}}.$$
8.11. Recall that
$$P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}$$
for $k = 0, 1, 2, \ldots$. Thus,
$$E[t^X] = \sum_{k=0}^{\infty} t^k \frac{e^{-\lambda}\lambda^k}{k!} = e^{-\lambda}e^{\lambda t} = \exp(\lambda(t - 1)).$$
8.12. Recall that if $Y$ has a uniform distribution on $(0, 1)$ then
$$\varphi_Y(t) = \frac{e^{it} - 1}{it}.$$
Note that if $X = 2Y - 1$ then $X$ has a uniform distribution on $(-1, 1)$ and
$$\varphi_X(t) = e^{-it}\varphi_Y(2t) = e^{-it}\left(\frac{e^{2it} - 1}{2it}\right) = \frac{e^{it} - e^{-it}}{2it} = \frac{\sin(t)}{t}.$$
Finally, if $S_n = X_1 + \cdots + X_n$ then
$$\varphi_{S_n}(t) = \left(\frac{\sin(t)}{t}\right)^n.$$
8.13. Note that $\Phi(t)$ is real-valued, and hence that
$$\Phi(t) = E[\cos(tX)] = \int_{-\infty}^{\infty} \cos(tx)f(x)\,dx.$$
Let $g(x) = \cos(tx)\sqrt{f(x)}$, let $h(x) = \sqrt{f(x)}$, and recall that Schwarz's inequality implies that
$$\left(\int_{-\infty}^{\infty} g(x)h(x)\,dx\right)^2 \le \int_{-\infty}^{\infty} g^2(x)\,dx \int_{-\infty}^{\infty} h^2(x)\,dx.$$
Thus,
$$\Phi^2(t) \le \int_{-\infty}^{\infty} \cos^2(tx)f(x)\,dx \int_{-\infty}^{\infty} f(x)\,dx = \frac{1}{2}\int_{-\infty}^{\infty} (1 + \cos(2tx))f(x)\,dx = \frac{1}{2} + \frac{1}{2}\Phi(2t)$$
from which the desired result follows immediately.

For an alternate solution (that does not require $X$ to possess a density function), note that
$$\Phi(t) = E[\cos(tX)] = E\left[2\cos^2\left(\frac{tX}{2}\right) - 1\right] = 2E\left[\cos^2\left(\frac{tX}{2}\right)\right] - 1.$$
Thus, $\Phi(2t) = 2E[\cos^2(tX)] - 1$. Jensen's inequality implies that $E[\cos^2(tX)] \ge E[\cos(tX)]^2$. Thus, $\Phi(2t) \ge 2E[\cos(tX)]^2 - 1 = 2\Phi^2(t) - 1$ from which the desired result again follows immediately.
8.14. Note that $X$ is equal to 0, 1, and 2 with probabilities 1/4, 1/2, and 1/4, respectively. Thus,
$$M_X(s) = E[e^{sX}] = \sum_{k=0}^{2} e^{sk}P(X = k) = e^{0s}P(X = 0) + e^{1s}P(X = 1) + e^{2s}P(X = 2) = \frac{1}{4} + \frac{1}{2}e^s + \frac{1}{4}e^{2s} = \frac{1}{4}(1 + e^s)^2.$$
Note that
$$M_X'(s) = \frac{1}{2}e^s + \frac{1}{2}e^{2s},$$
that
$$M_X''(s) = \frac{1}{2}e^s + e^{2s},$$
and, in general, that
$$M_X^{(n)}(s) = \frac{1}{2}e^s + 2^{n-2}e^{2s}.$$
Thus, we see that
$$E[X^n] = M_X^{(n)}(0) = \frac{1}{2} + 2^{n-2}$$
for $n \in \mathbb{N}$.
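The formula $E[X^n] = \frac{1}{2} + 2^{n-2}$ can be checked against moments computed directly from the distribution of $X$, using exact rational arithmetic:

```python
from fractions import Fraction

# X takes the values 0, 1, 2 with probabilities 1/4, 1/2, 1/4.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

for n in range(1, 8):
    moment = sum(Fraction(k) ** n * p for k, p in pmf.items())
    predicted = Fraction(1, 2) + Fraction(2) ** (n - 2)
    assert moment == predicted, (n, moment, predicted)
print("E[X^n] = 1/2 + 2^(n-2) for n = 1, ..., 7")
```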
8.15. To begin, note that
$$M_X(s) = E[e^{sX}] = \frac{1}{3}(1 + e^s + e^{2s}).$$
Further, note that
$$Z = \begin{cases} 0 & \text{if } X = 0 \text{ and } Y = 0 \\ 0 & \text{if } X = 2 \text{ and } Y = 1 \\ 1 & \text{if } X = 1 \text{ and } Y = 0 \\ 1 & \text{if } X = 0 \text{ and } Y = 1 \\ 2 & \text{if } X = 1 \text{ and } Y = 1 \\ 2 & \text{if } X = 2 \text{ and } Y = 0. \end{cases}$$
From this we see that
$$Z = \begin{cases} 0 & \text{wp } 1/3 \\ 1 & \text{wp } 1/3 \\ 2 & \text{wp } 1/3, \end{cases}$$
from which it follows that
$$M_Z(s) = E[e^{sZ}] = \frac{1}{3}(1 + e^s + e^{2s}).$$
Next, note that
$$X + Z = \begin{cases} 0 & \text{if } X = 0 \text{ and } Y = 0 \\ 1 & \text{if } X = 0 \text{ and } Y = 1 \\ 2 & \text{if } X = 1 \text{ and } Y = 0 \\ 2 & \text{if } X = 2 \text{ and } Y = 1 \\ 3 & \text{if } X = 1 \text{ and } Y = 1 \\ 4 & \text{if } X = 2 \text{ and } Y = 0. \end{cases}$$
Thus, we see that
$$X + Z = \begin{cases} 0 & \text{wp } 1/9 \\ 1 & \text{wp } 2/9 \\ 2 & \text{wp } 1/3 \\ 3 & \text{wp } 2/9 \\ 4 & \text{wp } 1/9. \end{cases}$$
Hence,
$$M_{X+Z}(s) = \frac{1}{9} + \frac{2}{9}e^s + \frac{1}{3}e^{2s} + \frac{2}{9}e^{3s} + \frac{1}{9}e^{4s}.$$
Note also that
$$M_X(s)M_Z(s) = \frac{1}{9}(1 + e^s + e^{2s})^2 = \frac{1}{9} + \frac{2}{9}e^s + \frac{1}{3}e^{2s} + \frac{2}{9}e^{3s} + \frac{1}{9}e^{4s}$$
as well. However, $X$ and $Z$ are not independent since $P(Z = 0, X = 1) = 0$ even though $P(Z = 0)$ and $P(X = 1)$ are each positive.
8.16. The random variables $X$ and $Y$ are uncorrelated since
$$E[XY] = \int_0^{2\pi} \frac{1}{2\pi}\cos(\theta)\sin(\theta)\,d\theta = 0,$$
$$E[X] = \int_0^{2\pi} \frac{1}{2\pi}\cos(\theta)\,d\theta = 0,$$
and
$$E[Y] = \int_0^{2\pi} \frac{1}{2\pi}\sin(\theta)\,d\theta = 0.$$
$X$ and $Y$ are not independent, however, since $P(X \in [1/\sqrt{2}, 1], Y \in [1/\sqrt{2}, 1]) = 0 \ne P(X \in [1/\sqrt{2}, 1])P(Y \in [1/\sqrt{2}, 1])$.
8.17. Consider uncorrelated random variables $X$ and $Y$ with joint probability distribution function $P(X = x_i, Y = y_j) = p_{ij}$ for $i, j = 1, 2$ and marginal probability distributions $P(X = x_i) = p_i$ for $i = 1, 2$ and $P(Y = y_j) = q_j$ for $j = 1, 2$. Note that $p_{11} + p_{12} + p_{21} + p_{22} = 1$, $p_{i1} + p_{i2} = p_i$ for $i = 1, 2$, $p_{1j} + p_{2j} = q_j$ for $j = 1, 2$, $p_1 + p_2 = 1$, and $q_1 + q_2 = 1$. Note that $E[XY] = \sum_{i=1}^2 \sum_{j=1}^2 x_i y_j p_{ij}$, $E[X] = \sum_{i=1}^2 x_i p_i$, and $E[Y] = \sum_{j=1}^2 y_j q_j$. Since $X$ and $Y$ are uncorrelated it follows that $E[XY] - E[X]E[Y] = 0$ and hence that $x_1 y_1(p_{11} - p_1 q_1) + x_1 y_2(p_{12} - p_1 q_2) + x_2 y_1(p_{21} - p_2 q_1) + x_2 y_2(p_{22} - p_2 q_2) = 0$. Notice that $p_{12} = p_1 - p_{11}$, $p_{21} = q_1 - p_{11}$, $p_{22} = q_2 - p_{12} = q_2 - p_1 + p_{11}$. Substitution yields $x_1 y_1(p_{11} - p_1 q_1) - x_1 y_2(p_{11} - p_1 + p_1 q_2) - x_2 y_1(p_{11} - q_1 + p_2 q_1) + x_2 y_2(p_{11} - p_1 + q_2 - p_2 q_2) = 0$. Next, note that $p_1 q_1 = p_1 - p_1 q_2 = q_1 - p_2 q_1 = p_1 - q_2 + p_2 q_2$. Substituting again implies that $(x_1 y_1 - x_1 y_2 - x_2 y_1 + x_2 y_2)(p_{11} - p_1 q_1) = 0$ or that $(x_1 - x_2)(y_1 - y_2)(p_{11} - p_1 q_1) = 0$. Since $x_1 \ne x_2$ and $y_1 \ne y_2$ it follows that $p_{11} = p_1 q_1$. From this we see that $p_{12} = p_1 - p_1 q_1 = p_1(1 - q_1) = p_1 q_2$, $p_{21} = q_1 - p_1 q_1 = q_1(1 - p_1) = p_2 q_1$, and $p_{22} = p_2 - p_{21} = p_2 - p_2 q_1 = p_2(1 - q_1) = p_2 q_2$. That is, $p_{ij} = p_i q_j$ for $i, j = 1, 2$. Thus, $X$ and $Y$ are independent.
9.1. Let $f_R$ denote a uniform density on $(a, b)$. Let $X$ be the length of the circumference and let $Y$ be the area of the circle. It follows that
$$f_X(x) = \frac{1}{2\pi} f_R\left(\frac{x}{2\pi}\right) = \frac{1}{2\pi(b - a)}$$
for $2\pi a < x < 2\pi b$ and
$$f_Y(y) = \frac{1}{2\sqrt{\pi y}} f_R\left(\sqrt{\frac{y}{\pi}}\right) = \frac{1}{2\sqrt{\pi y}} \cdot \frac{1}{b - a}$$
for $\pi a^2 \le y \le \pi b^2$.
9.2. Let $Z = XY$. Theorem 5.17 on page 103 implies that
$$f_Z(z) = \int_{-\infty}^{\infty} \frac{1}{|y|} f_Y(y) f_X\left(\frac{z}{y}\right) dy = \frac{1}{\pi\sigma^2} \int_{|z|}^{\infty} \frac{1}{|y|} \cdot \frac{y \exp\left(\frac{-y^2}{2\sigma^2}\right)}{\sqrt{1 - (z^2/y^2)}}\,dy$$
(since $|z/y| < 1$ and $Y > 0$)
$$= \frac{1}{\pi\sigma^2} \exp\left(\frac{-z^2}{2\sigma^2}\right) \int_0^{\infty} \exp\left(\frac{-t^2}{2\sigma^2}\right) \frac{1}{\sqrt{1 - (z^2/(t^2 + z^2))}} \cdot \frac{t}{\sqrt{t^2 + z^2}}\,dt$$
(where we let $y^2 = t^2 + z^2$)
$$= \frac{1}{\pi\sigma^2} \exp\left(\frac{-z^2}{2\sigma^2}\right) \int_0^{\infty} \exp\left(\frac{-t^2}{2\sigma^2}\right) dt = \frac{1}{\pi\sigma^2} \exp\left(\frac{-z^2}{2\sigma^2}\right) \frac{\sqrt{2\pi}\,\sigma}{2} = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(\frac{-z^2}{2\sigma^2}\right)$$
for $z \in \mathbb{R}$. That is, $Z$ is $N(0, \sigma^2)$.
9.3. Let $g(x, y) = x/(x + y)$ and let $h(x, y) = x + y$. Note that if $\alpha(b, t) = bt$ then $\alpha(g(x, y), h(x, y)) = \alpha(x/(x + y), x + y) = x$, and if $\beta(b, t) = t(1 - b)$ then $\beta(g(x, y), h(x, y)) = \beta(x/(x + y), x + y) = y$. Let $B = X/(X + Y)$ and $T = X + Y$. Then
$$f_{B,T}(b, t) = f_{X,Y}(\alpha(b, t), \beta(b, t)) \left|\det \begin{bmatrix} \partial\alpha/\partial b & \partial\alpha/\partial t \\ \partial\beta/\partial b & \partial\beta/\partial t \end{bmatrix}\right| = f_{X,Y}(bt, t(1 - b)) \left|\det \begin{bmatrix} t & b \\ -t & 1 - b \end{bmatrix}\right| = e^{-bt}e^{-t(1-b)}|t| = te^{-t}$$
for $t > 0$ and $0 < b < 1$. Thus,
$$f_B(b) = \int_0^{\infty} te^{-t}\,dt = 1$$
for $0 < b < 1$ which implies that $B$ is uniform on $(0, 1)$.
9.4. Let $g(x) = 1/x$ for $x > 0$ and note that $g^{-1} = g$. Thus, it follows that
$$f_{1/X}(y) = f_X(g^{-1}(y)) \left|\frac{d}{dy} g^{-1}(y)\right| = f_X(1/y)\left|\frac{-1}{y^2}\right| = \frac{f_X(1/y)}{y^2}$$
for $y > 0$.
10.1. Note that
$$P(Z < 2.44) = P\left(\frac{Z - 5}{2} < \frac{2.44 - 5}{2}\right) = P(X < -1.28) = P(X > 1.28) = \frac{1}{10}.$$
Thus, $a = 1.28$.
10.2. Recall the moment generating function $M_X$ and note that
$$M_X''(t) = \left(\sigma^2 + (m + \sigma^2 t)^2\right) \exp\left[\frac{\sigma^2 t^2}{2} + tm\right]$$
and that
$$M_X'''(t) = \left(2(m + \sigma^2 t)\sigma^2 + (\sigma^2 + (m + \sigma^2 t)^2)(m + \sigma^2 t)\right) \exp\left[\frac{\sigma^2 t^2}{2} + tm\right].$$
Thus, $E[X^3] = M_X'''(0) = 2m\sigma^2 + (\sigma^2 + m^2)(m) = 3m\sigma^2 + m^3$. If $m = 0$ then $E[X^{97}] = 0$ since $x^{97} f_X(x)$ is an integrable, odd function.
10.3. Let $T = \sqrt{n}Z/W$ and let $B = W$. Note that $Z = TB/\sqrt{n}$. Theorem 5.19 on page 103 and the independence of $W$ and $Z$ imply that
$$f_{T,B}(t, b) = f_{Z,W}\left(\frac{bt}{\sqrt{n}}, b\right) \left|\det \begin{bmatrix} b/\sqrt{n} & t/\sqrt{n} \\ 0 & 1 \end{bmatrix}\right| = f_{Z,W}\left(\frac{bt}{\sqrt{n}}, b\right) \frac{|b|}{\sqrt{n}} = f_Z\left(\frac{bt}{\sqrt{n}}\right) f_W(b) \frac{|b|}{\sqrt{n}}.$$
Thus, we see that
$$f_T(t) = \int_{\mathbb{R}} f_Z\left(\frac{bt}{\sqrt{n}}\right) f_W(b) \frac{|b|}{\sqrt{n}}\,db.$$
10.4. Note that
$$F_Y(y) = P(Y \le y) = P(XZ \le y) = P(XZ \le y, Z = 1) + P(XZ \le y, Z = -1) = P(X \le y, Z = 1) + P(-X \le y, Z = -1) = P(X \le y)P(Z = 1) + P(-X \le y)P(Z = -1) = P(X \le y)\frac{1}{2} + P(-X \le y)\frac{1}{2} = P(X \le y)$$
(since $-X$ is also standard Gaussian), which implies that $Y$ is standard Gaussian. Note, however, that $X + Y$ is not Gaussian since
$$X + Y = \begin{cases} 2X & \text{wp } 1/2 \\ 0 & \text{wp } 1/2. \end{cases}$$
That is, $X + Y$ has a discontinuous distribution function. Note, also, that $X$ and $Y$ are uncorrelated since $E[XY] = E[X^2 Z] = E[X^2]E[Z] = 0$. Finally, however, note that $X$ and $Y$ are not independent since $P(X \in [1, 2], Y \in [3, 4]) = 0$ yet $P(X \in [1, 2])$ and $P(Y \in [3, 4])$ are each positive.
10.5. Let
$$\begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix} = \begin{bmatrix} c_1 & c_2 \\ c_3 & c_4 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \begin{bmatrix} c_1 X_1 + c_2 X_2 \\ c_3 X_1 + c_4 X_2 \end{bmatrix}.$$
Note that $Z_1$ and $Z_2$ are mutually Gaussian. Thus, for $Z_1$ and $Z_2$ to be independent we require that $E[Z_1 Z_2] = E[Z_1]E[Z_2]$. (Note that we effectively have one equation and four variables so long as we ensure that $C$ is not singular.) Let $c_1 = c_3 = 1$, let $c_2 = 0$, and note that $E[Z_1 Z_2] = E[X_1(X_1 + c_4 X_2)] = E[X_1^2] + c_4 E[X_1 X_2]$. Note that $E[X_1^2] = 1$ and $E[X_1 X_2] = 1/3$. Thus, $Z_1$ and $Z_2$ are independent if $c_4 = -3$ with $c_1$, $c_2$, and $c_3$ as given above.
10.6. Recall that
$$M_X(t) = \exp\left(\frac{\sigma_1^2 t^2}{2} + tm_1\right) \quad \text{and} \quad M_Y(t) = \exp\left(\frac{\sigma_2^2 t^2}{2} + tm_2\right).$$
Thus,
$$M_{X+Y}(t) = \exp\left(\frac{(\sigma_1^2 + \sigma_2^2)t^2}{2} + t(m_1 + m_2)\right)$$
which implies that $X + Y$ is $N(m_1 + m_2, \sigma_1^2 + \sigma_2^2)$.
10.7. Note that
$$\int_{\mathbb{R}} e^{tx}(\lambda f_1(x) + (1 - \lambda)f_2(x))\,dx = \lambda \int_{\mathbb{R}} e^{tx} f_1(x)\,dx + (1 - \lambda)\int_{\mathbb{R}} e^{tx} f_2(x)\,dx = \lambda\left[\exp\left(\frac{\sigma_1^2 t^2}{2} + tm_1\right)\right] + (1 - \lambda)\left[\exp\left(\frac{\sigma_2^2 t^2}{2} + tm_2\right)\right] = \lambda M_1(t) + (1 - \lambda)M_2(t)$$
where $M_1$ is the moment generating function associated with $f_1$ and $M_2$ is the moment generating function associated with $f_2$. Note, also, that
$$E[X] = M_X'(0) = \left[\lambda(\sigma_1^2 t + m_1)M_1(t) + (1 - \lambda)(\sigma_2^2 t + m_2)M_2(t)\right]_{t=0} = \lambda m_1 + (1 - \lambda)m_2.$$
Further,
$$E[X^2] = M_X''(0) = \left[\lambda\sigma_1^2 M_1(t) + \lambda(\sigma_1^2 t + m_1)^2 M_1(t) + (1 - \lambda)\sigma_2^2 M_2(t) + (1 - \lambda)(\sigma_2^2 t + m_2)^2 M_2(t)\right]_{t=0} = \lambda\sigma_1^2 + \lambda m_1^2 + (1 - \lambda)\sigma_2^2 + (1 - \lambda)m_2^2.$$
Thus,
$$\mathrm{VAR}[X] = E[X^2] - E[X]^2 = \lambda\sigma_1^2 + (1 - \lambda)\sigma_2^2 + \lambda m_1^2 + (1 - \lambda)m_2^2 - (\lambda m_1 + (1 - \lambda)m_2)^2.$$
10.8. Let $Z = \max(X, Y)$ and note that $F_Z(z) = F_{X,Y}(z, z)$ via Problem 5.5. Let $F$ be the distribution function of $X$ (and $Y$) and let $f$ be a density function for $X$ (and $Y$). Then, since $F_Z(z) = F(z)F(z)$ it follows that $f_Z(z) = F_Z'(z) = 2F(z)f(z)$. Thus,
$$E[Z] = \int_{-\infty}^{\infty} z f_Z(z)\,dz = \int_{-\infty}^{\infty} 2zf(z)F(z)\,dz = 2\int_{-\infty}^{\infty} z\frac{1}{\sqrt{2\pi}}e^{-z^2/2} \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}e^{-t^2/2}\,dt\,dz = \frac{1}{\pi}\int_{-\infty}^{\infty}\int_t^{\infty} ze^{-(z^2 + t^2)/2}\,dz\,dt = \frac{1}{\pi}\int_{-\infty}^{\infty} e^{-t^2/2}\int_t^{\infty} ze^{-z^2/2}\,dz\,dt = \frac{1}{\pi}\int_{-\infty}^{\infty} e^{-t^2/2}e^{-t^2/2}\,dt = \frac{1}{\pi}\int_{-\infty}^{\infty} e^{-t^2}\,dt = \frac{\sqrt{\pi}}{\pi} = \frac{1}{\sqrt{\pi}}.$$
11.1. Note that the $X_n$'s are mutually independent and that $\sum_{n=1}^{\infty} P(X_n = 1) = \sum_{n=1}^{\infty} \frac{1}{n} = \infty$. Thus, the second Borel-Cantelli lemma implies that $P(\limsup\{X_n = 1\}) = 1$. That is, there exists a null set $A$ such that for any $\omega \in A^c$, $X_n(\omega) = 1$ for infinitely many values of $n$. Hence, $X_n$ does not converge to zero for any $\omega \in A^c$. Since $P(A^c) = 1$ it follows that the $X_n$'s cannot converge to zero almost surely.
11.2. Since $C_{X_i}(t) = e^{-|t|}$ for each $i$ and since the $X_i$'s are mutually independent, it follows that
$$C_{S_n/n}(t) = \left(C_{X_i}\left(\frac{t}{n}\right)\right)^n = C_{X_i}(t).$$
Thus, $S_n/n$ has a Cauchy distribution centered at zero with parameter 1. That is, the distribution of the normalized sum is the same as the distribution of any specific random variable in the sum.
11.3. Consider a sequence $\{X_n\}_{n \in \mathbb{N}}$ of random variables such that
$$X_n = \begin{cases} n^3 & \text{with probability } 1/n^2 \\ 0 & \text{with probability } 1 - (1/n^2). \end{cases}$$
Fix $\varepsilon > 0$ and note that
$$P(|X_n - 0| \ge \varepsilon) = P(X_n \ge \varepsilon) = \begin{cases} 0 & \text{if } \varepsilon > n^3 \\ \frac{1}{n^2} & \text{if } 0 < \varepsilon \le n^3. \end{cases}$$
Thus, since $P(X_n \ge \varepsilon) \to 0$ as $n \to \infty$ for any positive $\varepsilon$ we conclude that $X_n$ converges in probability to zero. Note, however, that since $E[X_n^p] = n^{3p}/n^2 = n^{3p-2}$ it follows that the $X_n$'s do not converge to zero in $L_p$ for any $p \ge 1$.
11.4. By Theorem 5.40 we know that $X_n \to c$ in distribution if $X_n \to c$ in probability. We will show that $X_n \to c$ in probability if $X_n \to c$ in distribution. If we consider $c$ to be a constant random variable then $F_c(x) = I_{[c, \infty)}(x)$. Let $\varepsilon > 0$ be given and note that
$$P(|X_n - c| \ge \varepsilon) = P(X_n \le c - \varepsilon) + P(X_n \ge c + \varepsilon).$$
Note that $P(X_n \le c - \varepsilon) = F_{X_n}(c - \varepsilon) \to 0$ as $n \to \infty$ since $F_{X_n}(x) \to F_c(x)$ as $n \to \infty$ for all $x < c$. Similarly, $P(X_n \ge c + \varepsilon) \to 0$ as $n \to \infty$. From this we see that $X_n \to c$ in probability.
11.5. Note that
$$P(Z_n \le z) = P(n(1 - \max\{X_1, \ldots, X_n\}) \le z) = P\left(1 - \max\{X_1, \ldots, X_n\} \le \frac{z}{n}\right) = P\left(\max\{X_1, \ldots, X_n\} \ge 1 - \frac{z}{n}\right) = 1 - P\left(\max\{X_1, \ldots, X_n\} < 1 - \frac{z}{n}\right) = 1 - P\left(X_1 < 1 - \frac{z}{n}, \ldots, X_n < 1 - \frac{z}{n}\right) = 1 - P\left(X_1 < 1 - \frac{z}{n}\right) \cdots P\left(X_n < 1 - \frac{z}{n}\right) = 1 - \left(1 - \frac{z}{n}\right)^n$$
for $0 < z < n$. Recall that
$$\left(1 - \frac{z}{n}\right)^n \to e^{-z}$$
as $n \to \infty$. Thus, $F_{Z_n}(z) \to 1 - e^{-z}$ for $0 < z < \infty$ as $n \to \infty$.
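The convergence to the exponential distribution can be seen in simulation (the values of $n$, the number of trials, the evaluation point, and the seed are all arbitrary choices):

```python
import math
import random

random.seed(2)
n = 200          # uniforms per maximum
trials = 5_000
z = 1.0          # point at which to compare the CDFs

hits = 0
for _ in range(trials):
    zn = n * (1 - max(random.random() for _ in range(n)))
    if zn <= z:
        hits += 1

empirical = hits / trials
limit = 1 - math.exp(-z)
print(empirical, limit)   # the two should be close
```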
11.6. If the $X_i$'s are mutually independent uniform random variables taking values in the interval $(-0.05, 0.05)$ then $X_i$ has mean zero and variance $0.01/12$ for each $i$. Let $Z$ be a random variable with a standard Gaussian distribution function and let $\Phi(x) = P(Z \le x)$. The Central Limit Theorem implies that $S_n$ has approximately the same distribution as $Z\sigma\sqrt{n}$. Thus,
$$P(|S_{1000}| < 2) \approx P\left(|Z| < \frac{2}{\sqrt{1000(0.01)/12}}\right) = P\left(|Z| < \frac{2}{0.91}\right) = P(|Z| < 2.19) = 2\Phi(2.19) - 1 = 0.97.$$
11.7. Let $X_i$ equal 0 or 1 if the $i$th flip is tails or heads, respectively. Thus, $X = \sum_{i=1}^{10{,}000} X_i$ denotes the number of heads that are observed in 10,000 flips. We will approximate $P(X = 5000)$ by finding $P(4999.5 < X < 5000.5)$ with an appeal to the Central Limit Theorem.

Let $Z$ be a standard Gaussian random variable, and note that $E[X_i] = \frac{1}{2}$ and $\mathrm{VAR}[X_i] = E[X_i^2] - E^2[X_i] = \frac{1}{2} - \frac{1}{4} = \frac{1}{4}$. The Central Limit Theorem implies that
$$P(4999.5 < X < 5000.5) \approx P\left(\frac{4999.5 - 10{,}000\left(\frac{1}{2}\right)}{\sqrt{10{,}000}\sqrt{\frac{1}{4}}} < Z < \frac{5000.5 - 10{,}000\left(\frac{1}{2}\right)}{\sqrt{10{,}000}\sqrt{\frac{1}{4}}}\right) = P\left(-\frac{1}{100} < Z < \frac{1}{100}\right) = \Phi\left(\frac{1}{100}\right) - \Phi\left(-\frac{1}{100}\right) = 2\left(\Phi\left(\frac{1}{100}\right) - \Phi(0)\right) = 2\Phi\left(\frac{1}{100}\right) - 1.$$
Note that $\Phi(0.01) = 0.5040$. Hence, $P(X = 5000) \approx 2(0.5040) - 1 = 0.008$.
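The approximation can be compared with the exact binomial probability, which Python computes with exact integer arithmetic:

```python
import math

# Exact P(X = 5000) for X ~ Binomial(10000, 1/2).
exact = math.comb(10_000, 5_000) / 2 ** 10_000
print(exact)   # about 0.00798, versus the CLT estimate 0.008
```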
11.8. Note that
$$E[(X_n - \alpha)^2] = E[X_n^2] - 2\alpha E[X_n] + \alpha^2 = \mathrm{VAR}[X_n] + E[X_n]^2 - 2\alpha E[X_n] + \alpha^2 = \mathrm{VAR}[X_n] + (E[X_n] - \alpha)^2 \to 0$$
if and only if $\mathrm{VAR}[X_n] \to 0$ and $E[X_n] \to \alpha$ as $n \to \infty$.
12.1. Note that $X$ and $Y$ are mutually Gaussian and that $X$ and $Y$ are uncorrelated since $E[XY] = E[(U + V)(U - V)] = E[U^2] - E[V^2] = 0$. Thus, $X$ and $Y$ are independent. Problem 10.6 implies that $X$ and $Y$ are each $N(0, 2)$.

Note that $E[X|U] = E[U + V|U] = E[U|U] + E[V|U] = U + E[V] = U$ a.s. Similarly, $E[Y|U] = U$ a.s. Thus, not only are $E[X|U]$ and $E[Y|U]$ not independent, but they are each equal almost surely to the same positive variance random variable.
12.2. Let $\Omega = \{a, b, c\}$ and define a measure $P$ on $\mathbb{P}(\Omega)$ via $P(\{a\}) = P(\{b\}) = P(\{c\}) = 1/3$. Define random variables $X$ and $Y$ on the resulting probability space via $Y(a) = 1$, $Y(b) = Y(c) = -1$, $X(a) = 1$, $X(b) = 2$, and $X(c) = 0$. Since the distributions are discrete the second condition in the definition of conditional expectation reduces to
$$\sum_{\omega \in M} E[X|Y](\omega)P(\{\omega\}) = \sum_{\omega \in M} X(\omega)P(\{\omega\}),$$
where $M \in \sigma(Y) = \{\varnothing, \Omega, \{a\}, \{b, c\}\}$. Substitution for $M$ implies that $E[X|Y](a) = X(a)$ and $E[X|Y](b) + E[X|Y](c) = X(b) + X(c)$. Since $E[X|Y]$ is $\sigma(Y)$-measurable it follows that $E[X|Y](b) = E[X|Y](c)$. Thus, $E[X|Y](a) = E[X|Y](b) = E[X|Y](c) = 1$ and we see that $E[X|Y] = E[X] = 1/3 + 2/3 = 1$ as required. However, $X$ and $Y$ are not independent since $P(X = 1, Y = 1) = 1/3$ yet $P(X = 1)P(Y = 1) = 1/9$.
12.3. Note that $E[Z|X] = E[XY|X] = XE[Y|X] = XE[Y] = 0$ a.s. Similarly, $E[Z|Y] = 0$ a.s. However, $Z$ is $\sigma(X, Y)$-measurable and hence $E[Z|X, Y] = Z$ a.s.
12.4. Since $E[X|\mathcal{F}]$ is $\mathcal{F}$-measurable it follows that $E[X|\mathcal{F}] = a_1 I_A + a_2 I_B + a_3 I_C$ for some real constants $a_1$, $a_2$, and $a_3$. Recall that $E[X|\mathcal{F}]$ must satisfy
$$\int_F E[X|\mathcal{F}]\,d\lambda = \int_F X\,d\lambda$$
for all $F \in \mathcal{F}$. Choosing $F = A$ implies that
$$a_1 \lambda(A) = \int_A X\,d\lambda,$$
which in turn implies that
$$a_1 = \frac{1}{\lambda(A)} \int_0^{1/4} \omega^2\,d\omega = \frac{1}{48}.$$
Similarly, $a_2 = 97/432$ and $a_3 = 19/27$.
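The value $a_1 = 1/48$ can be reproduced numerically; the sketch below assumes, as the integral in the solution suggests, that $X(\omega) = \omega^2$, that $\lambda$ is Lebesgue measure, and that $A = [0, 1/4]$:

```python
# a1 = (1 / lambda(A)) * integral of w^2 over A = [0, 1/4],
# approximated here by a left Riemann sum.
h = 1e-6
upper = 0.25
integral = sum((k * h) ** 2 * h for k in range(int(upper / h)))
a1 = integral / upper          # divide by lambda(A) = 1/4
print(a1)                      # approximately 1/48 ~ 0.0208
```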
12.5. Note that if $A \in \sigma(Y)$ then
$$\int_A E[XZ|Y]\,dP = \int_A XZ\,dP = E[I_A XZ] = E[X]E[I_A Z] = E[X]\int_A Z\,dP = E[X]\int_A E[Z|Y]\,dP.$$
Since $E[X]E[Z|Y]$ is $\sigma(Y)$-measurable, it follows that $E[XZ|Y] = E[X]E[Z|Y]$ a.s.
12.6. Recall from Problem 12.5 that if X and Z are independent and if X and Y are independent then E[XZ|Y] = E[X]E[Z|Y] a.s. Let F_n denote σ(X_1, ..., X_n). Thus, it follows that
\[
\begin{aligned}
E[X_{n+1} \mid \mathcal{F}_n]
&= E\!\left[\Big(\sum_{k=1}^{n+1} Y_k\Big)^{\!2} - (n+1)\sigma^2 \,\Big|\, \mathcal{F}_n\right] \\
&= E\!\left[\Big(\sum_{k=1}^{n} Y_k + Y_{n+1}\Big)^{\!2} - (n+1)\sigma^2 \,\Big|\, \mathcal{F}_n\right] \\
&= E\!\left[\Big(\sum_{k=1}^{n} Y_k\Big)^{\!2} + 2Y_{n+1}\sum_{k=1}^{n} Y_k + Y_{n+1}^2 - n\sigma^2 - \sigma^2 \,\Big|\, \mathcal{F}_n\right] \\
&= E\!\left[X_n + 2Y_{n+1}\sum_{k=1}^{n} Y_k + Y_{n+1}^2 - \sigma^2 \,\Big|\, \mathcal{F}_n\right] \\
&= E[X_n \mid \mathcal{F}_n] + 2E\!\left[Y_{n+1}\sum_{k=1}^{n} Y_k \,\Big|\, \mathcal{F}_n\right] + E[Y_{n+1}^2 \mid \mathcal{F}_n] - \sigma^2 \\
&= X_n + 2E[Y_{n+1}]\,E\!\left[\sum_{k=1}^{n} Y_k \,\Big|\, \mathcal{F}_n\right] + E[Y_{n+1}^2] - \sigma^2 \\
&= X_n \quad \text{a.s.,}
\end{aligned}
\]
where in the second-to-last step we used our first result and noted that Y_{n+1} is independent of X_i for i ≤ n.
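One consequence of the martingale property shown above is that E[X_n] = E[X_1] for every n. A minimal Monte Carlo sketch (assuming ±1 coin-flip steps for the Y_k, which are zero-mean with σ² = 1, an illustrative choice not fixed by the problem) exhibits this constant-mean behavior:

```python
import random

random.seed(0)
sigma2 = 1.0  # variance of the i.i.d. zero-mean steps Y_k (here +-1 coin flips)

def X(n_steps):
    # X_n = (Y_1 + ... + Y_n)^2 - n * sigma^2
    s = sum(random.choice((-1.0, 1.0)) for _ in range(n_steps))
    return s * s - n_steps * sigma2

trials = 20000
for n in (1, 5, 20):
    est = sum(X(n) for _ in range(trials)) / trials
    print(n, round(est, 2))  # each sample average is near 0, since E[X_n] = 0 for all n
```

This only checks the constant-expectation consequence, not the full conditional-expectation identity, which is what the derivation itself establishes.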
12.7. Note that
\[
\begin{aligned}
VAR[X \mid Y] &= E[(X - E[X \mid Y])^2 \mid Y] \\
&= E[X^2 - 2X E[X \mid Y] + E[X \mid Y]^2 \mid Y] \\
&= E[X^2 \mid Y] - 2E[X E[X \mid Y] \mid Y] + E[E[X \mid Y]^2 \mid Y] \\
&= E[X^2 \mid Y] - 2E[X \mid Y]E[X \mid Y] + E[X \mid Y]^2 \\
&= E[X^2 \mid Y] - E[X \mid Y]^2.
\end{aligned}
\]
Thus,
\[
E[VAR[X \mid Y]] = E[E[X^2 \mid Y]] - E[E[X \mid Y]^2] = E[X^2] - E[E[X \mid Y]^2]
\]
and
\[
VAR[E[X \mid Y]] = E[E[X \mid Y]^2] - E[E[X \mid Y]]^2 = E[E[X \mid Y]^2] - E[X]^2.
\]
Thus, E[VAR[X|Y]] + VAR[E[X|Y]] = E[X²] − E[X]² = VAR[X].
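The decomposition E[VAR[X|Y]] + VAR[E[X|Y]] = VAR[X] can be verified exactly on any small discrete joint distribution; the pmf below is an arbitrary illustrative choice, and exact rational arithmetic makes the identity hold with equality rather than approximately:

```python
from fractions import Fraction as F

# A small joint pmf for (X, Y); the values are chosen arbitrarily for the check.
pmf = {(0, 0): F(1, 8), (1, 0): F(3, 8), (0, 1): F(1, 4), (2, 1): F(1, 4)}

def E(f):
    return sum(f(x, y) * p for (x, y), p in pmf.items())

EX, EX2 = E(lambda x, y: x), E(lambda x, y: x * x)
var_X = EX2 - EX**2

# E[X|Y] and VAR[X|Y], computed by conditioning on each value of Y.
ys = {y for _, y in pmf}
pY = {y0: sum(p for (x, y), p in pmf.items() if y == y0) for y0 in ys}
cexp = {y0: sum(x * p for (x, y), p in pmf.items() if y == y0) / pY[y0] for y0 in ys}
cvar = {y0: sum((x - cexp[y0])**2 * p for (x, y), p in pmf.items() if y == y0) / pY[y0]
        for y0 in ys}

lhs = sum(cvar[y0] * pY[y0] for y0 in ys)            # E[VAR[X|Y]]
m = sum(cexp[y0] * pY[y0] for y0 in ys)              # E[E[X|Y]] = E[X]
lhs += sum((cexp[y0] - m)**2 * pY[y0] for y0 in ys)  # + VAR[E[X|Y]]
print(lhs == var_X)  # True: E[VAR[X|Y]] + VAR[E[X|Y]] = VAR[X]
```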
12.8. To begin, note that
\[
\begin{aligned}
E[(X - g(Y))^2] &= E[(X - E[X \mid Y] + E[X \mid Y] - g(Y))^2] \\
&= E[(X - E[X \mid Y])^2] + 2E[(X - E[X \mid Y])(E[X \mid Y] - g(Y))] \\
&\quad + E[(E[X \mid Y] - g(Y))^2].
\end{aligned}
\]
The first result now follows since
\[
\begin{aligned}
E[(X - E[X \mid Y])(E[X \mid Y] - g(Y))]
&= E[E[(X - E[X \mid Y])(E[X \mid Y] - g(Y)) \mid Y]] \\
&= E[(E[X \mid Y] - g(Y))\,E[X - E[X \mid Y] \mid Y]] \\
&= E[(E[X \mid Y] - g(Y))(E[X \mid Y] - E[X \mid Y])] \\
&= 0.
\end{aligned}
\]
From this result it is clear that E[(X − g(Y))²] is minimized over all Borel measurable functions g when we let g(y) = E[X|Y = y].
12.9. Let A = {1, 3, 5}, let B = {2, 4, 6}, and note that E[X|Q] = a_1 I_A + a_2 I_B for some choice of a_1 and a_2 from ℝ. Since
\[
\int_A E[X \mid \mathcal{Q}]\,dP = \int_A a_1\,dP = \int_A X\,dP,
\]
it follows that a_1 = 3. That is, since a_1 P(A) = 1P({1}) + 3P({3}) + 5P({5}), it follows that a_1/2 = 9/6. Similarly, it follows that a_2 = 4. Thus, E[X|Q] = 3I_A + 4I_B.
12.10. It follows by the definition of conditional expectation that
\[
E[X \mid \mathcal{Q}](\omega) = \frac{\int_{D_i} X\,dP}{P(D_i)}
\]
for all ω ∈ D_i.
12.11. To begin, note that
\[
E[XY] = E[E[Y \mid \mathcal{Q}_2]\,Y] = E[E[E[Y \mid \mathcal{Q}_2]\,Y \mid \mathcal{Q}_2]] = E[E[Y \mid \mathcal{Q}_2]^2] = E[X^2]
\]
and
\[
E[XY] = E[X\,E[X \mid \mathcal{Q}_1]] = E[E[X\,E[X \mid \mathcal{Q}_1] \mid \mathcal{Q}_1]] = E[E[X \mid \mathcal{Q}_1]^2] = E[Y^2].
\]
Thus, E[(X − Y)²] = E[XY] − 2E[XY] + E[XY] = 0 and, hence, the desired result follows via Problem 7.3.
12.12. Since E[Y|X] = α + βX it follows that E[E[Y|X]] = E[Y] = α + βE[X]. Substitution implies that 4 = α + 3β. Note also that
\[
E[XY] = E[E[XY \mid X]] = E[X\,E[Y \mid X]] = E[X(\alpha + \beta X)] = \alpha E[X] + \beta E[X^2] = \alpha E[X] + \beta\,VAR[X] + \beta E[X]^2.
\]
Substituting here implies that −3 = 3α + 11β. Solving these two equations yields α = 53/2 and β = −15/2.
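When E[Y|X] is affine in X, the moment equations of the kind used above pin down the coefficients as β = COV[X, Y]/VAR[X] and α = E[Y] − βE[X]. A hypothetical sanity check on simulated data (true values α = 2, β = −0.5; these are illustrative and not the values of this problem):

```python
import random

random.seed(1)
# Simulate Y = a + b*X + noise with true a = 2.0, b = -0.5.
n = 200000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [2.0 - 0.5 * x + random.gauss(0, 0.1) for x in xs]

mx = sum(xs) / n
my = sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
var = sum((x - mx) ** 2 for x in xs) / n

b = cov / var      # slope: COV[X, Y] / VAR[X]
a = my - b * mx    # intercept: E[Y] - b * E[X]
print(round(a, 2), round(b, 2))  # close to the true values 2.0 and -0.5
```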
12.13. To begin, note that
\[
E[Y^2 \mid X = x] = \sigma_Y^2(1 - \rho^2) + \left(\frac{\rho\,\sigma_Y\,x}{\sigma_X}\right)^{\!2}
\]
since, for each fixed x, a conditional density f(y|x) of Y given X = x is Gaussian with mean ρσ_Y x/σ_X and variance σ_Y²(1 − ρ²). Next, recall that a moment generating function for X exists and is given by M_X(t) = exp(σ_X² t²/2). Finding the fourth derivative of this function and evaluating it at t = 0 implies that E[X⁴] = 3σ_X⁴. Thus, since ρ = E[XY]/(σ_X σ_Y), it follows that
\[
\begin{aligned}
E[X^2 Y^2] &= E[E[X^2 Y^2 \mid X]] \\
&= E[X^2 E[Y^2 \mid X]] \\
&= E\!\left[X^2\!\left(\sigma_Y^2(1 - \rho^2) + \rho^2 \frac{\sigma_Y^2}{\sigma_X^2} X^2\right)\right] \\
&= \sigma_Y^2(1 - \rho^2)\sigma_X^2 + \rho^2 \frac{\sigma_Y^2}{\sigma_X^2}\,(3\sigma_X^4) \\
&= \sigma_X^2 \sigma_Y^2 + 2\rho^2 \sigma_X^2 \sigma_Y^2 \\
&= E[X^2]E[Y^2] + 2(E[XY])^2.
\end{aligned}
\]
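The identity E[X²Y²] = E[X²]E[Y²] + 2(E[XY])² can be checked by simulation. The sketch below builds a jointly Gaussian pair with unit variances and correlation ρ = 0.6 (an illustrative choice) from independent standard normals:

```python
import random

random.seed(3)
rho = 0.6
n = 400000
# Jointly Gaussian (X, Y) with unit variances and correlation rho,
# constructed as Y = rho*X + sqrt(1 - rho^2)*Z with Z independent of X.
tot_x2y2 = tot_xy = 0.0
for _ in range(n):
    x = random.gauss(0, 1)
    y = rho * x + (1 - rho**2) ** 0.5 * random.gauss(0, 1)
    tot_x2y2 += x * x * y * y
    tot_xy += x * y

lhs = tot_x2y2 / n                          # estimate of E[X^2 Y^2]
rhs = 1.0 * 1.0 + 2.0 * (tot_xy / n) ** 2  # E[X^2]E[Y^2] + 2 E[XY]^2
print(lhs, rhs)  # both near 1 + 2*rho^2 = 1.72
```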
12.14. First, note that
\[
E[Y_1 \cdots Y_n \mid X_1, \ldots, X_m] = E[X_m Y_{m+1} \cdots Y_n \mid X_1, \ldots, X_m] = X_m E[Y_{m+1} \cdots Y_n] = X_m.
\]
Next, note that
\[
\begin{aligned}
E[(X_n - X_m)\cos(X_j)]
&= E[E[(X_n - X_m)\cos(X_j) \mid X_1, \ldots, X_m]] \\
&= E[\cos(X_j)\,E[X_n - X_m \mid X_1, \ldots, X_m]] \\
&= E[\cos(X_j)(E[X_n \mid X_1, \ldots, X_m] - X_m)] \\
&= E[\cos(X_j)(X_m - X_m)] \\
&= 0.
\end{aligned}
\]
12.15. To begin, note that
\[
f_Y(y) = \int_{\mathbb{R}} f(x, y)\,dx = \int_y^1 8xy\,dx = 4y(1 - y^2)
\]
for 0 ≤ y ≤ 1. Also, note that
\[
f_X(x) = \int_{\mathbb{R}} f(x, y)\,dy = \int_0^x 8xy\,dy = 4x^3
\]
for 0 ≤ x ≤ 1. Thus,
\[
E[X \mid Y = y] = \int_{\mathbb{R}} x\,\frac{f(x, y)}{f_Y(y)}\,dx = \frac{2}{1 - y^2}\int_y^1 x^2\,dx = \frac{2}{3}\,\frac{1 - y^3}{1 - y^2} = \frac{2}{3}\,\frac{1 + y + y^2}{1 + y}
\]
for 0 ≤ y ≤ 1. Further,
\[
E[Y \mid X = x] = \int_{\mathbb{R}} y\,\frac{f(x, y)}{f_X(x)}\,dy = \frac{2}{x^2}\int_0^x y^2\,dy = \frac{2x}{3}
\]
for 0 ≤ x ≤ 1.
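The closed forms above can be spot-checked numerically with plain Riemann sums; the helper below is hypothetical and checks E[X|Y = y] at y = 1/2:

```python
# Numeric check of the conditional mean for f(x, y) = 8xy on 0 <= y <= x <= 1,
# using midpoint Riemann sums (no external libraries).
def f(x, y):
    return 8.0 * x * y if 0.0 <= y <= x <= 1.0 else 0.0

N = 4000  # number of grid cells on [0, 1]

def cond_mean_X_given_Y(y):
    # E[X|Y=y] = (integral of x f(x,y) dx) / (integral of f(x,y) dx)
    num = sum(x * f(x, y) for x in ((i + 0.5) / N for i in range(N))) / N
    den = sum(f(x, y) for x in ((i + 0.5) / N for i in range(N))) / N
    return num / den

y = 0.5
closed_form = (2.0 / 3.0) * (1 + y + y**2) / (1 + y)
print(abs(cond_mean_X_given_Y(y) - closed_form) < 1e-3)  # True
```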
8.3 Solutions to True/False Questions

1. False  2. True  3. True  4. False  5. False  6. True  7. False  8. False  9. False  10. False
11. False  12. True  13. True  14. False  15. True  16. False  17. True  18. False  19. True  20. False
21. False  22. False  23. True  24. False  25. True  26. False  27. False  28. True  29. True  30. False
31. False  32. False  33. False  34. False  35. False  36. False  37. True  38. False  39. True  40. False
Index

absolutely continuous distribution, 72
affine function, 164
algebra, 24
almost always, 36
almost everywhere, 53
almost sure convergence, 116, 151
almost surely, 116
atomic distribution, 72
autocorrelation function, 139
autocovariance function, 139, 156
axiom of choice, 13
Banach space, 61
bandwidth, 161
  second moment, 162
basis, 60
Bernoulli trial, 81
Bessel's inequality, 63
bijection, 20
bijective function, 20
binomial distribution, 81
bivariate Gaussian distribution, 112
Borel measurable function, 40, 46, 73
Borel set, 39, 46
Borel–Cantelli lemma
  first, 37
  second, 77
bounded random variable, 70
bounded set, 35
bounded variation, 49
Brown, Robert, 164
Brownian motion process, 164
Buffon's needle problem, 84, 100
canonical filtration, 149
canonical projection, 18
Cantor–Lebesgue function, 74
Carathéodory criterion, 43
Carathéodory extension theorem, 86
Cartesian product
  arbitrary index set, 17
  n sets, 17
  two sets, 17
Cauchy distribution, 99, 109
Cauchy sequence, 61
Cauchy–Schwarz inequality, 100
central limit theorem, 120
central moment, 98
Chapman–Kolmogorov equation, 145
Chapman–Kolmogorov theorem, 148
characteristic function, 107, 121
Chebyshev's inequality, 100, 122
choose, 80
closed set, 30, 61
closure, 30, 146
cocountable set, 25, 41
cofinite set, 25
complement, 16
complete measure space, 45
complete metric space, 61
complete probability space, 135
complex-valued random process, 153, 156
conditional density, 130
conditional expectation, 123, 126
conditional independence, 148
conditional probability, 123
continuous function, 40, 73, 141
continuous in probability, 137
continuous time random process, 135
convergence in L_p, 118
convergence in distribution, 119
convergence in law, 119
convergence in mean, 118
convergence in mean-square, 118
convergence in probability, 117, 137
convergence in the pth mean, 118
convergence of random variables, 116
convergence of sets, 37
convergent sequence, 42
convexity, 64, 100, 127
convolution, 103
coordinate, 18
correlation coefficient, 101, 113
countable additivity, 33
countable cover, 88
countable set, 21
countable subadditivity, 34
countable union, 23
countably infinite set, 21
counting measure, 34
covariance, 101
covariance matrix, 114
cover, 88
data processor, 94
Dedekind's theorem, 22
DeMorgan's law, 18
dense set, 136
density function, 74
dimension, 60
Dirac measure, 34
discrete random variable, 72
discrete time random process, 135
disjoint sets, 16
disjointification, 34
distance function, 61
distribution, 93
distribution function, 70
domain, 20
dominated convergence theorem, 55
Doob, Joseph, 151
Dynkin's π-λ theorem, 29
eigenfunction, 143
eigenvalue, 143
Einstein, Albert, 164
empty set, 13
equipotence, 21
equipotent sets, 21
equivalence relation, 19
evaluation, 18
event, 25, 75
expectation, 94
expected value, 94
exponential distribution, 109
F_σ set, 31
factor, 18
field, 24
filtration, 149
finite dimensional distribution, 135
finite measure, 33
finite set, 21
finite-dimensional vector space, 60
first order random variable, 94
floor, 85
Fourier transform, 107
Frege, Gottlob, 13
Fubini's Theorem, 163
function of a random variable, 103
functions, 19
fundamental theorem of calculus, 98
G_δ set, 31
gain, 157
gambling, 151
Gaussian density function, 108
Gaussian distribution, 109, 130
Gaussian Markov process, 148
Gaussian process, 138
Gaussian random variable, 108
Gram–Schmidt procedure, 160
greatest lower bound, 36
half wave rectifier, 159
Hall, Eric, 74, 129
Hilbert space, 62, 128
Hilbert space projection theorem, 66, 128
Hölder's inequality, 100
improper Riemann integral, 56
impulse response function, 158
independent events, 75
independent random variables, 76
index set, 14, 135
indicator function, 20
indistinguishable random processes, 135
induced measure, 57, 93
induced metric, 61
inferior limit, 36, 42
infimum, 36
infinitely often, 36
initial probability, 145
injective function, 20
inner product, 62
inner product space, 62
integrable function, 53
integrable random variable, 94
integrable sample paths, 138
integral operator, 143
integration by parts, 51
intersection, 14
inverse image, 20
invertible function, 20
isolated point, 31
Jensen's inequality, 101, 127
joint Gaussian distribution, 112
joint probability density function, 83
joint probability distribution function, 83
jointly Gaussian random variables, 113
Karhunen–Loève expansion, 144, 166
Kolmogorov, Andrei, 69
L_p(Ω, F, P), 118
L₂ continuous process, 141
L₂ differentiable process, 141
L₂ integrable process, 142
L₂ integral, 138
λ-system, 28
Laplace distribution, 109
Laplace transform, 107
law, 93
least upper bound, 36
Lebesgue decomposition theorem, 73
Lebesgue, Henri, 69
Lebesgue integrable function, 53, 55
Lebesgue integral, 52, 53
Lebesgue measurable set, 43
Lebesgue measure, 44
Leibniz's rule, 166
lim inf, 36
lim sup, 36
limit point, 30, 61
limiter, 159, 162
linear filter, 158
linear manifold, 60
linear operation, 157
linear span, 60
linearly dependent vectors, 59
linearly independent vectors, 60
Lipschitz condition, 49
lower bound, 35
lower Riemann integral, 47
Lusin's Theorem, 160
Lyapounov's inequality, 101
marginal density function, 84
Markov chain, 145
  irreducible, 146
Markov process, 147
martingale, 149
martingale convergence theorem, 151
mean, 94, 98
mean recurrence time, 146
mean-square continuity, 141, 152
mean-square integral, 138
measurable function, 38
measurable modification, 137
measurable random process, 137
measurable rectangle, 136
measurable set, 25
measurable space, 25
measure, 33
measure on an algebra, 86
measure space, 33
Mehler series, 160
Mercer's theorem, 143
metric, 61
metric space, 61
minimum mean-square estimate, 128, 148
Minkowski sum, 162
Minkowski's inequality, 100
modification, 135
moment, 98
moment generating function, 105
monotone convergence theorem, 54
monotonicity, 34
Monte Carlo analysis, 84
Monte Carlo simulation, 86
multivariate Gaussian distribution, 113
mutually Gaussian random variables, 113
mutually independent events, 75
mutually independent random variables, 76, 84
neighborhood, 31
nondecreasing function, 49
nonnegative definite function, 139
norm, 60
normed linear space, 61
nowhere differentiable function, 165
null set, 45
one-to-one function, 20
onto function, 20
open ball, 61
open Euclidean ball, 30
open interval, 39
open rectangle, 46
open set, 30
orthogonal increments, 151
orthogonal vectors, 62
orthonormal vectors, 63
outer Lebesgue measure, 43
pairwise independent events, 75
parallelogram law, 63
Parseval's equality, 160
Parseval's identity, 63
perfect set, 31
π-system, 28
π_λ, 18
placebo, 132
pointwise convergence, 116
pointwise limit, 42
Poisson approximation, 82
Poisson distribution, 106
positive definite matrix, 114
power set, 14
pre-Hilbert space, 62
probability density function, 74
probability distribution function, 70
probability measure, 33
probability space, 33, 70
product σ-algebra, 136
product measure, 136
product of random variables, 103
projection, 65, 128
proper subset, 14
proper subspace, 60
quantization, 129
ℝⁿ, 46
Radon–Nikodym theorem, 68
random process, 135
random sequence, 135
random variable, 70
range, 20
rank, 114
rational line, 174
real vector space, 59
reflexive relation, 19
regression function, 129
relations, 19
relative complement, 16
Riemann integral, 47, 51, 55, 143
Riemann–Stieltjes integral, 51, 96
Riemann–Stieltjes sum, 50
Riesz–Fréchet theorem, 67
right continuous function, 56, 70
Russell, Bertrand, 13
sample function, 135
sample path, 135
sample sequence, 135
sample space, 25
Schroeder–Bernstein theorem, 22
second order random process, 139
second order random variable, 98
separable modification, 136, 137
separable random process, 136
sequence of random variables, 116
set difference, 16
σ(A), 216
σ-algebra, 24
  countably generated, 41
  generated, 25, 76
σ-field, 24
σ-finite measure, 33
σ-subalgebra, 41
simple function, 51
singleton set, 14
singular distribution, 72
size of a subdivision, 50
spectral density function, 155
spectral distribution function, 155
spectral representation, 155
spectrum, 155
square law device, 159
St. Petersburg paradox, 97
standard deviation, 98
standard Gaussian distribution, 110, 111
standard Gaussian random variable, 110
state, 145
  absorbing, 146
  aperiodic, 146
  closed set, 146
  ergodic, 146
  mean recurrence time, 146
  null, 146
  period, 146
  persistent, 146
  reachable, 146
  transient, 146
stationary Gaussian process, 140
statistical
  hypothesis, 132
  inference, 132
step function, 152
stochastic integral, 152
stochastic process, 135
strictly stationary process, 139
strong law of large numbers, 123
subdivision, 47
submartingale, 150
subset, 13
subspace, 60
sum of random variables, 102
superior limit, 36, 42
supermartingale, 150
superset, 13
supremum, 36
surjective function, 20
symmetric difference, 16
symmetric distribution, 107
symmetric matrix, 114
symmetric relation, 19
Taylor's series, 107, 121
Taylor's theorem, 105
topological space, 30
topology, 30
total set, 63
trajectory, 135
transfer function, 158
transition probability, 145
transitive relation, 19
triangle inequality, 61
unbounded set, 36
unbounded variation, 49
uncorrelated random variables, 101
uncountable set, 21
uniform convergence, 144
uniform distribution, 84, 109
union, 15
upper bound, 35
upper Riemann integral, 47
usual topology, 30
variance, 98
vector space, 59
version, 123
weak law of large numbers, 122
Weierstrass Approximation Theorem, 160
wide sense stationary process, 140, 154
Wiener, Norbert, 164
Wiener process, 164
Wise, Gary, 74, 129
with probability 1, 116
Zermelo–Fraenkel, 13
zero memory nonlinearity, 159