
www.MathGeek.com

Probability:
Basic Ideas and Selected Topics

Eric B. Hall

Gary L. Wise

ALL RIGHTS RESERVED.


UNAUTHORIZED DUPLICATION
IS STRICTLY PROHIBITED.


Preface

In writing this book we were faced with a serious dilemma. To how much of the vast subject of probability theory should an undergraduate student be exposed? Although it is tempting to remain at the level of coin flipping, card shuffling, and Riemann integration, we feel that such an approach does a great disservice to the students by reinforcing the many popular myths about probability. In particular, probability theory is simply a branch of measure theory, and no one should sugar-coat that fact. Although some might suggest that this approach is over the head of an average student, such has not been the case in our experience. Indeed, most of the reluctance to cover probability at this level seems to originate behind the desk rather than in front of it. The importance of probability is increasing even faster than the frontier of scientific knowledge, and hence the usefulness of the standard non-measure-theoretic approach is being left far behind. Students need to be able to reason clearly and think critically rather than just learn how to parrot a few simplistic results. Our goal with this book is to provide the serious student of engineering with a rigorous yet understandable introduction to basic probability that not only will serve his or her present needs but will continue to serve as a useful reference into the next century.


Contents

Preface
Acknowledgments
Introduction
Notation

1 Set Theory
1.1 Introduction
1.2 Unions and Intersections
1.3 Relations
1.4 Functions
1.5 σ-Algebras
1.6 Dynkin's π-λ Theorem
1.7 Topological Spaces
1.8 Caveats and Curiosities

2 Measure Theory
2.1 Definitions
2.2 Supremums and Infimums
2.3 Convergence of Sets: Lim Inf and Lim Sup
2.4 Measurable Functions
2.5 Real Borel Sets
2.6 Lebesgue Measure and Lebesgue Measurable Sets
2.7 Caveats and Curiosities

3 Integration
3.1 The Riemann Integral
3.2 The Riemann-Stieltjes Integral
3.3 The Lebesgue Integral


3.3.1 Simple Functions
3.3.2 Measurable Functions
3.3.3 Properties of the Lebesgue Integral
3.4 The Riemann Integral and the Lebesgue Integral
3.5 The Riemann-Stieltjes Integral and the Lebesgue Integral
3.6 Caveats and Curiosities

4 Functional Analysis
4.1 Vector Spaces
4.2 Normed Linear Spaces
4.3 Inner Product Spaces
4.4 The Radon-Nikodym Theorem
4.5 Caveats and Curiosities

5 Probability Theory
5.1 Introduction
5.2 Random Variables and Distributions
5.3 Independence
5.4 The Binomial Distribution
5.4.1 The Poisson Approximation to the Binomial Distribution
5.5 Multivariate Distributions
5.6 Caratheodory Extension Theorem
5.7 Expectation
5.8 Useful Inequalities
5.9 Transformations of Random Variables
5.10 Moment Generating and Characteristic Functions
5.11 The Gaussian Distribution
5.12 The Bivariate Gaussian Distribution
5.13 Multivariate Gaussian Distributions
5.14 Convergence of Random Variables
5.14.1 Pointwise Convergence
5.14.2 Almost Sure Convergence
5.14.3 Convergence in Probability
5.14.4 Convergence in Lp
5.14.5 Convergence in Distribution
5.15 The Central Limit Theorem
5.16 Laws of Large Numbers
5.17 Conditioning


5.18 Regression Functions
5.19 Statistical Hypothesis Testing
5.20 Caveats and Curiosities

6 Random Processes
6.1 Introduction
6.2 Gaussian Processes
6.3 Second Order Random Processes
6.4 The Karhunen-Loeve Expansion
6.5 Markov Chains
6.6 Markov Processes
6.7 Martingales
6.8 Random Processes with Orthogonal Increments
6.9 Wide Sense Stationary Random Processes
6.10 Complex-Valued Random Processes
6.11 Linear Operations on WSS Random Processes
6.12 Nonlinear Transformations
6.13 Brownian Motion
6.14 Caveats and Curiosities

7 Problems
7.1 Set Theory
7.2 Measure Theory
7.3 Integration Theory
7.4 Functional Analysis
7.5 Distributions & Probabilities
7.6 Independence
7.7 Random Variables
7.8 Moments
7.9 Transformations of Random Variables
7.10 The Gaussian Distribution
7.11 Convergence
7.12 Conditioning
7.13 True/False Questions

8 Solutions
8.1 Solutions to Exercises
8.2 Solutions to Problems
8.3 Solutions to True/False Questions


Acknowledgments

The authors would like to thank David Drumm for many helpful suggestions. In addition, they would like to thank Dr. Herb Woodson, Dr. Stephen Szygenda, Dr. Tom Edgar, Dr. Edward Powers, Dr. Francis Bostick, and Dr. James Cogdell. Also, they would like to acknowledge the wonderful help that GLW received in his recovery from a stroke, and in this regard, they mention the supportive friendship of the preceding friends as well as that of Dr. Michael Edmond and many dedicated therapists, including Michelle Sanderson, Jerilyn Iliff, Janice Johnson, Audrey Schooling, Liz Larue, and Mischa Smith. Finally, they are grateful to Carey Taylor of the Texas Rehabilitation Commission for his help in providing services for GLW's recovery.
This book was typeset using the LaTeX typesetting system developed by Donald Knuth and Leslie Lamport.


Introduction

This book is designed to impart a working knowledge of probability theory and random processes that will enable a student to undertake serious studies in this area. No prior experience with probability, statistics, or real analysis is required. All that is needed is a familiarity with basic calculus and an ability to follow mathematical reasoning.
Any course on probability theory must go down one of two roads. On the first road the student flips coins, shuffles cards, looks at pretty bell curves, and considers many simple consequences of deep, dark theorems mentioned only in footnotes. Although this road is popular with engineers (and some statisticians), it is a dead-end road that produces students capable of dealing only with a few overly-restrictive special cases and incapable of thinking for themselves. The second road treats probability theory as a branch of an area of mathematics known as measure theory. Although this approach requires a student to first learn some very basic aspects of set theory and real analysis, the benefits of taking this road are enormous. Students suddenly understand the results that they are applying, formerly obtuse theorems become transparently easy, and seemingly advanced engineering tools such as the Kalman filter are seen as simple consequences of much more general results. In this work we will take the latter road without apology.


Notation

ℝ  the set of all real numbers
ℤ  the set of all integers
ℕ  the set of all integers greater than zero
ℚ  the set of all rational numbers
ℂ  the set of all complex numbers
∅  the empty set
i  the imaginary unit
z*  the complex conjugate of the complex number z
A ⊊ B  A ⊂ B and A ≠ B
ℙ(S)  the set of all subsets of the set S
I_A  the indicator function of the set A
A^c  the complement of the set A
A \ B  the set of points in A that are not in B
A Δ B  (A \ B) ∪ (B \ A)
-A  {-x : x ∈ A} for A ⊂ ℝ
L_p(Ω, ℱ, μ)  the set of all μ-equivalence classes of functions f: (Ω, ℱ) → (ℝ, ℬ(ℝ)) such that ∫_Ω |f|^p dμ < ∞
ℬ(T)  the Borel subsets of a Borel subset T of ℝ
ℳ(A)  the Lebesgue measurable subsets of A ∈ ℬ(ℝ)
m  Lebesgue measure on ℳ(ℝ)
λ  Lebesgue measure on ℬ(ℝ)
σ({A_i : i ∈ I})  the smallest σ-algebra including {A_i : i ∈ I}
σ({X_i : i ∈ I})  the smallest σ-algebra for which X_i is measurable for each i ∈ I


f|_A  the function f restricted to A
f⁺  max{f, 0} for a real-valued function f
f⁻  -min{f, 0} for a real-valued function f
a.e. [μ]  almost everywhere with respect to the measure μ; i.e. pointwise off a μ-null set
a.s.  almost surely
{a ∈ A : condition}  the set of points in A for which the indicated condition is true
∀  "for all" or "for each"
∃  "there exists"
st  "such that"
wp  "with probability"
∼  "has the distribution"
□  Quod Erat Demonstrandum
◊  This symbol denotes an unusually difficult section or problem. Proceed with caution.


1 Set Theory

1.1 Introduction

We will take a naive approach to set theory. That is, we will assume that any describable collection of objects is a set. Consider a set A. By writing x ∈ A we will mean that x is an element of the set A. By writing x ∉ A we will mean that x is not an element of the set A. Note that x ∈ A and x ∉ A cannot both be true simultaneously. To see why our approach is naive, let R denote the set of all sets A such that A ∉ A.¹ If R ∈ R then by definition it follows that R ∉ R. Similarly, if R ∉ R then by definition it follows that R ∈ R. Thus, although R is a describable collection of objects, R is not a set!
This paradox was discovered by Bertrand Russell and had a rather devastating effect on the work of a German logician named Gottlob Frege, who later wrote: "To a scientific author hardly something worse can happen than the destruction of the foundation of his edifice after the completion of his work. I was placed in this position by a letter of Mr. Bertrand Russell when the printing came to a close." To avoid such paradoxes, set theory is based upon systems of axioms such as the Zermelo-Fraenkel system. Mathematics is based upon such systems of axioms, and "mathematical truths" must be understood in that light. One such axiom that we will use without hesitation is the Axiom of Choice, which simply states that for any collection {X_α : α ∈ A} of nonempty sets, there exists a function c mapping A to ⋃_{α∈A} X_α such that c(α) ∈ X_α for each α ∈ A. Although seemingly innocuous, the Axiom of Choice has many deep and dark consequences.
The set with no elements is called the empty set and is denoted by ∅. We say that a set B is a subset of a set A and we write B ⊂ A if x ∈ A whenever x ∈ B. We sometimes express this by saying that A is a superset of B, in which case we write A ⊃ B. Note that any set is a subset and a superset of itself. Two sets A and B are said to be equal if A ⊂ B and if B ⊂ A. In this case we write A = B. If A and B are not equal we write A ≠ B. A set A is said to be a proper subset of B if A ⊂ B and if A ≠ B. (We sometimes denote this by writing A ⊊ B.)

¹It is possible for a set to be an element of itself. For example, the set of all sets that contain more than one element is itself a set that contains more than one element, and hence is an element of itself.

Later generations will regard set theory as a malady from which one has recovered. -Poincaré

Consider a nonempty set Ω and let x be an element from Ω. The set {x} containing only the element x is called a singleton set. In general, for elements x_i from Ω where i ranges over some index set I, we will let {x_i : i ∈ I} denote the set containing only the elements x_i for i ∈ I.

Exercise 1.1 Is there any difference between {∅} and ∅?

For any set A the power set of A is denoted by ℙ(A) and is defined to be the set of all subsets of A. That is, a set B is an element of ℙ(A) if and only if B ⊂ A. In set notation we may write ℙ(A) = {B : B ⊂ A}. Note that ∅ ∈ ℙ(A) and that A ∈ ℙ(A) for any set A.
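For a finite set the power set can be listed explicitly. The following Python sketch (the helper name `power_set` is ours, not the authors') collects the subsets of each size with `itertools.combinations` and confirms, for one small example, that ∅ and A are always elements of ℙ(A) and that a set with n elements has 2^n subsets.

```python
from itertools import combinations

def power_set(a):
    """Return the power set of the finite set a as a set of frozensets."""
    elems = list(a)
    return {
        frozenset(combo)
        for r in range(len(elems) + 1)          # subsets of size 0, 1, ..., n
        for combo in combinations(elems, r)
    }

A = {1, 2, 3}
P = power_set(A)
print(len(P))               # 8, that is, 2**3
print(frozenset() in P)     # True: the empty set is a subset of A
print(frozenset(A) in P)    # True: A is a subset of itself
```

Frozensets are used because the elements of a Python set must be hashable, and ordinary sets are not.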

Exercise 1.2 What is ℙ(∅)?

1.2 Unions and Intersections

Let Ω and I be nonempty sets and consider a collection of subsets of Ω denoted by {A_i : i ∈ I}. In this case the set I is called an index set and is often taken to be a subset of the real line ℝ. The intersection of the sets in {A_i : i ∈ I} is denoted by ⋂_{i∈I} A_i and is defined to be the set of all points in Ω that are in A_i for each i ∈ I. That is,

$$\bigcap_{i\in I} A_i = \{x \in \Omega : x \in A_i \ \forall i \in I\}.$$

(Note that this intersection equals Ω if I = ∅.) The union of the sets in {A_i : i ∈ I} is denoted by ⋃_{i∈I} A_i and is defined to be the set of all points in Ω that are in A_i for some i ∈ I. That is,

$$\bigcup_{i\in I} A_i = \{x \in \Omega : \exists\, i \in I \text{ st } x \in A_i\}.$$

(Note that this union equals ∅ if I = ∅.) In other words, for two sets A and B, the set A ∩ B contains elements that are in A and in B, and the set A ∪ B contains elements that are in A or in B.² If I = {1, ..., n} for some positive integer n then we will often write ⋂_{i∈I} A_i as

$$\bigcap_{i=1}^{n} A_i$$

or as A_1 ∩ ... ∩ A_n, and similarly for unions. If I = ℕ, the set of positive integers, then we will often write ⋂_{i∈I} A_i as

$$\bigcap_{i=1}^{\infty} A_i$$

and similarly for unions.
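An indexed family {A_i : i ∈ I} with finite I can be modeled as a Python dictionary from indices to subsets of a fixed finite universe Ω. The sketch below (helper names are hypothetical) implements the two set-builder definitions above literally, so it also reproduces the stated conventions that the intersection over an empty index set is Ω and the union over an empty index set is ∅.

```python
def big_intersection(family, omega):
    """Points of omega lying in A_i for every index i (all of omega if the family is empty)."""
    return {x for x in omega if all(x in a for a in family.values())}

def big_union(family, omega):
    """Points of omega lying in A_i for some index i (empty if the family is empty)."""
    return {x for x in omega if any(x in a for a in family.values())}

omega = set(range(10))
family = {1: {0, 1, 2, 3}, 2: {2, 3, 4}, 3: {3, 4, 5}}
print(big_intersection(family, omega))   # {3}
print(big_union(family, omega))          # {0, 1, 2, 3, 4, 5}
print(big_intersection({}, omega) == omega, big_union({}, omega) == set())   # True True
```

The empty-index conventions fall out of Python's `all` and `any`, which return `True` and `False` respectively on an empty iterable.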


Consider three sets A, B, and C. You should be able to prove the following properties concerning unions and intersections:

1. A ∩ B = B ∩ A and A ∪ B = B ∪ A. That is, unions and intersections are commutative.

2. A ∩ ∅ = ∅ and A ∪ ∅ = A.

²This "or" is not an exclusive or. That is, a point that is in both A and B is also in A ∪ B.


3. A ∪ A = A ∩ A = A. That is, unions and intersections are idempotent.

4. (A ∪ B) ∪ C = A ∪ (B ∪ C) and (A ∩ B) ∩ C = A ∩ (B ∩ C). That is, unions and intersections are associative.

5. (A ∩ B) ⊂ A and A ⊂ (A ∪ B).

6. A ⊂ B if and only if A ∪ B = B.

7. If A ⊂ C and B ⊂ C then (A ∪ B) ⊂ C.

8. (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C) and (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C). That is, unions and intersections are distributive.

Exercise 1.3 Prove that A ⊂ B if and only if A ∪ B = B.

The set difference of two sets A and B is denoted by A \ B and is defined to be the set of points in A that are not in B. The set A \ B is sometimes called the relative complement of B in A. If the set A is clear from the context of our discussion we will often write A \ B as B^c and refer to it simply as the complement of B. That is, if Ω is some fixed nonempty underlying set then B^c = {x ∈ Ω : x ∉ B}.
The symmetric difference of two sets A and B is denoted by A Δ B and is defined to be the set (A \ B) ∪ (B \ A). Two sets A and B are said to be disjoint if A ∩ B = ∅. A collection of sets is said to be disjoint if any two distinct sets from the collection are disjoint.
In what follows, any set of the form A^c should be interpreted to refer to the set Ω \ A for some fixed nonempty set Ω that contains every point of interest. You should be able to prove the following properties:

2. A ∪ A^c = Ω.
3. A and A^c are disjoint.
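Python's built-in `set` type provides these operations directly: `-` is the set difference, `^` is the symmetric difference, and `isdisjoint` tests disjointness. A complement only makes sense relative to a fixed universe Ω, so the sketch below makes Ω explicit through a hypothetical helper `complement`. It also spot-checks the two listed properties of A^c for one example.

```python
omega = set(range(8))          # the fixed underlying set Omega
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}

def complement(s, universe=omega):
    """Return the points of universe that are not in s."""
    return universe - s

print(A - B)                          # A \ B = {0, 1}
print(A ^ B)                          # A symmetric-difference B = {0, 1, 4, 5}
print(A ^ B == (A - B) | (B - A))     # True, matching the definition
print(complement(B))                  # {0, 1, 6, 7}
print(A | complement(A) == omega, A.isdisjoint(complement(A)))   # True True
```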


It may seem to be a stark paradox that, just when mathematics has been brought close to the ultimate in abstractness, its applications have begun to multiply and proliferate in an extraordinary fashion. ... Far from being paradoxical, however, this conjunction of two apparently opposite trends in the development of mathematics may rightly be viewed as the sign of an essential truth about mathematics itself. For it is only to the extent that mathematics is freed from the bonds which have attached it in the past to particular aspects of reality that it can become the extremely flexible and powerful instrument we need to break paths into areas now beyond our ken. -Marshall Stone

The Cartesian product of two sets A and B is denoted by A × B and is defined to be the set of all ordered pairs (a, b) for which a ∈ A and b ∈ B. For example, ℝ × ℝ (often denoted by ℝ²) is the plane. For n ∈ ℕ, the Cartesian product of n sets A_1, ..., A_n is the set of all ordered n-tuples (a_1, ..., a_n) where a_i ∈ A_i for each positive integer i ≤ n. This product is denoted by

$$\prod_{i=1}^{n} A_i$$

or by A_1 × ... × A_n. Note that this product is empty if A_i is empty for any i. For example, ℝ × ℝ × ℝ (denoted by ℝ³) is the set of all ordered triples of real numbers. Note that ℝ³, ℝ × ℝ², and ℝ² × ℝ are three distinct sets.
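`itertools.product` enumerates the Cartesian product of finitely many finite sets as ordered tuples. The sketch below uses small finite sets chosen only for illustration. It checks that the product is empty as soon as one factor is empty, and that A³, A × A², and A² × A are different sets (flat triples versus nested pairs), mirroring the remark about ℝ³, ℝ × ℝ², and ℝ² × ℝ.

```python
from itertools import product

A = {0, 1}
B = {"x", "y", "z"}

AxB = set(product(A, B))
print(len(AxB))                       # 6 = |A| * |B|
print(set(product(A, set(), B)))      # set(): one empty factor empties the product

A3 = set(product(A, A, A))                       # triples (a1, a2, a3)
A_x_A2 = set(product(A, set(product(A, A))))     # pairs (a1, (a2, a3))
A2_x_A = set(product(set(product(A, A)), A))     # pairs ((a1, a2), a3)
print(A3 == A_x_A2, A_x_A2 == A2_x_A)            # False False
```

All three sets have 8 elements and are in obvious correspondence. They are nevertheless different sets, because their elements are different objects.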
For sets A and B, the set B^A is the set of all functions mapping A into B. Let Λ and Ω be nonempty sets and, for each λ ∈ Λ, let A_λ be a nonempty subset of Ω. The Cartesian product of the A_λ's over the set Λ is the subset of Ω^Λ consisting of all points {ω_λ ∈ Ω : λ ∈ Λ} such that ω_λ ∈ A_λ for all λ ∈ Λ. We denote this product by

$$\prod_{\lambda\in\Lambda} A_\lambda.$$

In the context of this product, the set A_λ is called the λ-th factor. Also, if {ω_λ ∈ Ω : λ ∈ Λ} is a point in the product then ω_λ is called the λ-th coordinate of the point. For λ ∈ Λ, we will let π_λ: ∏_{α∈Λ} A_α → A_λ be the mapping that assigns to each point in ∏_{α∈Λ} A_α its λ-th coordinate. The map π_λ is called the canonical projection into the λ-th factor or the evaluation at λ.

1.1 Theorem (DeMorgan's Law) Let Ω be a nonempty set, let I be an index set, and assume that A_i ⊂ Ω for each i ∈ I. Then

$$\bigcap_{i\in I} A_i = \Big(\bigcup_{i\in I} A_i^c\Big)^c.$$

Proof. If I = ∅ then the result reduces to Ω = ∅^c, which follows by definition.
Let I be an arbitrary nonempty set, and for each i ∈ I, let A_i be a subset of Ω. First assume that ⋂_{i∈I} A_i = ∅. Then for each ω ∈ Ω, there exists some i ∈ I such that ω ∉ A_i and hence such that ω ∈ ⋃_{i∈I} A_i^c. We have shown that Ω ⊂ ⋃_{i∈I} A_i^c. Clearly, ⋃_{i∈I} A_i^c ⊂ Ω. Thus, Ω = ⋃_{i∈I} A_i^c, and it follows that

$$\bigcap_{i\in I} A_i = \emptyset = \Omega^c = \Big(\bigcup_{i\in I} A_i^c\Big)^c.$$

Now, assume that ⋂_{i∈I} A_i ≠ ∅. Let ω be any point belonging to ⋂_{i∈I} A_i. Then ω ∈ A_i for each i ∈ I. In particular, ω ∉ ⋃_{i∈I} A_i^c, and thus

$$\bigcap_{i\in I} A_i \subset \Big(\bigcup_{i\in I} A_i^c\Big)^c.$$

Conversely, note that this inclusion implies that (⋃_{i∈I} A_i^c)^c ≠ ∅. Now let ω ∈ (⋃_{i∈I} A_i^c)^c. Then ω ∉ A_i^c for any i ∈ I and thus ω ∈ A_i for all i ∈ I. Therefore,

$$\Big(\bigcup_{i\in I} A_i^c\Big)^c \subset \bigcap_{i\in I} A_i.$$

Hence, we have

$$\bigcap_{i\in I} A_i = \Big(\bigcup_{i\in I} A_i^c\Big)^c$$

for any set I and for any family {A_i : i ∈ I} of subsets of Ω. □

Note that the following corollaries are immediate consequences of the previous theorem.

1.1 Corollary A ∪ B = (A^c ∩ B^c)^c.

1.2 Corollary A ∩ B = (A^c ∪ B^c)^c.
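No amount of testing replaces the proof, but a brute-force check over every pair of subsets of a small universe is a useful sanity test of Theorem 1.1 and its corollaries. The Python sketch below (helper names are hypothetical) checks both corollaries for all 16 × 16 pairs of subsets of Ω = {0, 1, 2, 3}. It then checks the indexed form of the theorem for one family of three sets.

```python
from itertools import combinations

omega = frozenset(range(4))

def subsets(s):
    """All subsets of the finite set s."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def comp(s):
    """Complement relative to omega."""
    return omega - s

subs = subsets(omega)
ok_pairs = all(
    (A | B) == comp(comp(A) & comp(B)) and (A & B) == comp(comp(A) | comp(B))
    for A in subs for B in subs
)

# Indexed form of Theorem 1.1 for one family {A_i : i in I} with |I| = 3.
family = [frozenset({0, 1, 2}), frozenset({1, 2}), frozenset({1, 3})]
lhs = frozenset.intersection(*family)
rhs = comp(frozenset().union(*(comp(a) for a in family)))
print(ok_pairs, lhs == rhs)   # True True
```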

1.3 Relations

Consider subsets A and B of a nonempty set Ω. A relation R between A and B is a subset of A × B. If R is a relation between A and B, then two points a ∈ A and b ∈ B are said to be R-related if (a, b) ∈ R. We will call a relation R between A and A a relation R on A. A relation R on A is said to be transitive³ if (a_1, a_2) ∈ R and (a_2, a_3) ∈ R imply that (a_1, a_3) ∈ R. A relation R on A is symmetric if (a_1, a_2) ∈ R implies that (a_2, a_1) ∈ R. A relation R on A is reflexive if (a, a) ∈ R for all a ∈ A. A relation R on A is called an equivalence relation if it is reflexive, symmetric, and transitive.
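A relation on a finite set A is just a set of ordered pairs, so each of the three defining properties can be checked directly from the definitions above. The Python sketch below uses hypothetical function names and two examples of our own choosing. Congruence modulo 3 on {0, ..., 5} is an equivalence relation. The usual order ≤ is reflexive and transitive but not symmetric.

```python
def is_reflexive(R, A):
    """(a, a) in R for every a in A."""
    return all((a, a) in R for a in A)

def is_symmetric(R):
    """(a1, a2) in R implies (a2, a1) in R."""
    return all((b, a) in R for (a, b) in R)

def is_transitive(R):
    """(a1, a2) and (a2, a3) in R imply (a1, a3) in R."""
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

def is_equivalence(R, A):
    return is_reflexive(R, A) and is_symmetric(R) and is_transitive(R)

A = range(6)
congruent_mod_3 = {(a, b) for a in A for b in A if (a - b) % 3 == 0}
less_or_equal = {(a, b) for a in A for b in A if a <= b}

print(is_equivalence(congruent_mod_3, A))   # True
print(is_equivalence(less_or_equal, A))     # False (not symmetric)
```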

1.4 Functions

Let A and B be nonempty sets. A function f mapping A into B (written as f: A → B) is a relation between A and B such that:
³Even though a relation is a set, there is a difference between a transitive set and a transitive relation. Here we are defining a transitive relation.


1. if a ∈ A then there exists b ∈ B such that (a, b) ∈ f, and,

2. if (a, b_1) ∈ f and (a, b_2) ∈ f then b_1 = b_2.

Thus, a function is defined everywhere on A and assigns precisely one element of B to each element of A. If A and B are nonempty sets and if f is a function mapping A into B, then we typically use the notation f(a) = b to denote that (a, b) ∈ f. The set A is called the domain of the function f. If S ⊂ A then we will let f(S) denote the subset of B given by {b ∈ B : b = f(a) for some a ∈ S}. The set f(S) is called the image of S under f. The set f(A) is sometimes called the range of f. By convention, for a ∈ A, f({a}) is usually taken to be the element f(a) ∈ B rather than the subset {f(a)} of B. For any set A, the indicator function of A is denoted by I_A(x) and equals 1 if x ∈ A and equals zero otherwise.

Example 1.1 Let f: ℝ → [0, ∞) via f(x) = x². Then f((1, 2]) = (1, 4], f({2, 3}) = {4, 9}, and f({-3, 3}) = {9}. □

A function f: A → B is said to be injective or one-to-one if any two distinct elements of A have distinct images in B; that is, if a_1 ≠ a_2 then f(a_1) ≠ f(a_2). A function f: A → B is said to be surjective or onto if f(A) = B; that is, given any b ∈ B there exists an a ∈ A such that f(a) = b. A function f: A → B is said to be bijective or to be a bijection between A and B if it is both injective and surjective.
Let f: A → B and let M ⊂ B. The inverse image of M with respect to f is denoted by f^{-1}(M) and is defined to be the set {a ∈ A : f(a) ∈ M}. Note that f^{-1} is a function mapping ℙ(B) into ℙ(A). A function f: A → B is bijective if and only if f^{-1}({x}) is a singleton subset of A for every x ∈ B; that is, if and only if f^{-1} maps the set of all singleton subsets of B into the set of all singleton subsets of A. In this case we write f^{-1}({x}) as f^{-1}(x) and say that f is invertible with inverse f^{-1}.
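A function between finite sets can be stored as a Python dictionary, which by construction assigns exactly one value to each key, matching the two defining conditions of a function. The image, the inverse image, and the three properties just defined can then be computed directly. The sketch below uses hypothetical helper names and restricts f(x) = x² to the finite set {-3, ..., 3}, so that, as in Example 1.1, the image of {-3, 3} collapses to a single point.

```python
def image(f, S):
    """f(S) = {f(a) : a in S}."""
    return {f[a] for a in S}

def preimage(f, M):
    """f^{-1}(M) = {a in the domain of f : f(a) in M}."""
    return {a for a in f if f[a] in M}

def is_injective(f):
    return len(set(f.values())) == len(f)

def is_surjective(f, codomain):
    return image(f, f.keys()) == set(codomain)

square = {x: x * x for x in range(-3, 4)}    # f(x) = x^2 on {-3, ..., 3}
codomain = {0, 1, 4, 9}

print(image(square, {-3, 3}))                 # {9}
print(sorted(preimage(square, {4, 9})))       # [-3, -2, 2, 3]
print(is_injective(square))                   # False: f(-3) = f(3)
print(is_surjective(square, codomain))        # True
```

Note that `preimage` is defined for every subset M of the codomain, whereas an inverse map from B back to A exists only when f is a bijection.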

Exercise 1.4 For each of the following functions answer the following questions: Is the function onto? Is the function one-to-one? If yes to both then what is the inverse of the function?


The inverse of a function f: A → B can be defined in two different ways. First, we can consider f^{-1} to be a function that maps ℙ(B) to ℙ(A). This type of inverse always exists for any function f. Second, we can consider f^{-1} to be a function that maps B to A. This type of inverse exists if and only if f is one-to-one and onto. The type of inverse under consideration must be inferred from the context.

2. f: ℝ → [0, ∞) via f(x) = x².

3. f: [0, ∞) → ℝ via f(x) = x².

4. f: [0, ∞) → [0, ∞) via f(x) = x².

Exercise 1.5 For a bijection f: A → B show that f(f^{-1}(b)) = b and f^{-1}(f(a)) = a for each a in A and each b in B.

Exercise 1.6 Let S be any set with exactly two elements and
let R be any set with exactly three elements. Does there exist
a bijection from R into S? Why or why not? Does there exist a
bijection from S into R? Why or why not?

Exercise 1.7 If there exists a bijection of A into B then must there also exist a bijection of B into A?

Two sets A and B are said to be equipotent if there exists a bijection mapping A to B. A set S is said to be countable if it is empty or if it is equipotent to a subset of the positive integers. A set S is said to be finite if it is empty or if it is equipotent to a set of the form {1, 2, ..., n} for some positive integer n. A set S is said to be countably infinite if it is countable but not finite. A set is said to be uncountable if it is not countable.

◊ Example 1.2 Let A = ℝ and let B denote the set of all functions that map A into {0, 1}. We will show that A and B are not equipotent. Assume, by way of contradiction, that A and B are equipotent. There then exists a function g: A → B such that g is onto and one-to-one. For a real number a, denote g(a) by f_a(x); that is, g(a) is a function mapping ℝ to {0, 1}. Let φ(x) = 1 - f_x(x) and note that φ ∈ B. Since g is bijective, there exists a point α ∈ A such that φ = f_α, which implies that 1 - f_x(x) = f_α(x) for every x ∈ ℝ. If we let x = α then it follows that 1 - f_α(α) = f_α(α), which in turn implies that f_α(α) = 1/2. This, however, is not possible since f_α takes values only in the set {0, 1}. This contradiction implies that A and B are not equipotent. □

1.2 Theorem (Dedekind) A set is an infinite set if and only if it is equipotent to a proper subset of itself.

1.1 Lemma If A and B are sets, if A is countable, and if f: A → B, then f(A) is countable.

1.2 Lemma Let T be a set having at least two distinct elements and
let I be an infinite set. The set of all functions mapping I to T
is uncountable.

1.3 Theorem (Schroeder-Bernstein) Let A and B be sets. If there exists a one-to-one mapping of A to B and a one-to-one mapping of B to A then A and B are equipotent.

Proof. This result is proved on page 20 of Real and Abstract Analysis by E. Hewitt and K. Stromberg (Springer-Verlag, New York, 1965). □

1.3 Lemma (Cantor) For any set Ω, the sets Ω and ℙ(Ω) are not equipotent.

Proof. Assume that Ω and ℙ(Ω) are equipotent. There then exists a function f mapping Ω to ℙ(Ω) that is onto and one-to-one. Let U = {ω ∈ Ω : ω ∉ f(ω)}. Since U ∈ ℙ(Ω) it follows that U = f(x) for some point x in Ω. Is x ∈ U? If x ∈ U then x ∉ f(x), which implies that x ∉ U. If x ∉ U then x ∈ f(x), which implies that x ∈ U. Thus, no such function f exists and the desired result follows. Note that this lemma implies that "the set of all sets" is not a set! □

Exercise 1.8 Show that any subset of a countable set must itself be countable.
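The key device in the proof of Lemma 1.3, the set U = {ω ∈ Ω : ω ∉ f(ω)}, can be computed for any function f: Ω → ℙ(Ω) on a finite set. The Python sketch below (names are hypothetical) builds U for many randomly chosen f and checks that U is never a value of f. U differs from f(ω) at the point ω itself, which is exactly why no such f can be onto.

```python
import random

def diagonal_set(f, omega):
    """Cantor's set U = {w in omega : w not in f(w)}."""
    return frozenset(w for w in omega if w not in f[w])

omega = list(range(5))
rng = random.Random(0)          # seeded so the run is reproducible

for _ in range(1000):
    # A random function from omega into the power set of omega.
    f = {w: frozenset(x for x in omega if rng.random() < 0.5) for w in omega}
    U = diagonal_set(f, omega)
    # U and f(w) disagree about the point w, so U is not in the range of f.
    assert all(U != f[w] for w in omega)

print("U was never in the range of f")
```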

Exercise 1.9 Show that a countably infinite union of countable sets is countable. That is, show that ⋃_{i∈ℕ} A_i must be countable if A_i is countable for each i ∈ ℕ.

1.4 Theorem The set ℚ of rational numbers is countable.

Proof. Note that

$$\mathbb{Q} = \bigcup_{n\in\mathbb{Z}} \Big[\bigcup_{k\in\mathbb{N}} \Big\{\frac{n}{k}\Big\}\Big]$$

and hence ℚ is countable since it may be written as a countable union of countable sets. □
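The union in this proof can be turned into an explicit listing of ℚ. The Python sketch below (the generator name is hypothetical) uses a standard dovetailing order rather than anything specific to the authors. At stage s it emits, in order, those fractions n/k with 1 ≤ k ≤ s and |n| ≤ s that have not appeared before. Every rational therefore appears exactly once and after finitely many steps, which is what countability requires.

```python
from fractions import Fraction
from itertools import islice

def rationals():
    """Yield every rational number exactly once."""
    seen = set()
    stage = 1
    while True:
        # Stage s covers the finite set {n/k : |n| <= s, 1 <= k <= s}.
        for k in range(1, stage + 1):
            for n in range(-stage, stage + 1):
                q = Fraction(n, k)          # Fraction reduces n/k to lowest terms
                if q not in seen:
                    seen.add(q)
                    yield q
        stage += 1

first = list(islice(rationals(), 10))
print([str(q) for q in first])
# ['-1', '0', '1', '-2', '2', '-1/2', '1/2', '-3', '3', '-3/2']
```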

1.5 Theorem The set of all real numbers is an uncountable set.

Proof. Assume that [0, 1) is countable. Hence there exists a bijective function f: [0, 1) → ℕ. Using this function f, enumerate the set [0, 1) as a sequence {a_1, a_2, ...}. Notice that each a_i corresponds to a point in [0, 1) and hence may be expressed as a decimal expansion, where we agree that any expansion ending with a string of all 9's will instead be written in a form ending with a string of 0's. Construct an element b of [0, 1) as follows: Let b = 0.b_1b_2b_3... where, for each i ∈ ℕ, b_i is chosen to be a single digit other than 9 that is not equal to the ith digit in the decimal expansion of a_i. (Avoiding the digit 9 ensures that the expansion of b does not end in a string of 9's and so is of the agreed form.) Since b is an element in [0, 1) whose expansion differs from that of a_i in the ith digit, b is not equal to a_i for any i, and we conclude that [0, 1) (and hence, by Exercise 1.8, ℝ) is uncountable. □
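The digit-selection step of this proof is easy to carry out on any finite list of expansions. In the Python sketch below (the helper name is hypothetical), digits are stored as strings. The ith digit of b is taken to be 5 unless the ith digit of a_i is 5, in which case 4 is used. This particular rule is our own choice; any rule that avoids the digit 9 and differs from a_i in the ith place would do.

```python
def diagonal_digits(expansions):
    """Return digits of b whose ith digit differs from the ith digit of expansions[i]."""
    return "".join("4" if d[i] == "5" else "5" for i, d in enumerate(expansions))

# Hypothetical first few terms of an attempted enumeration of [0, 1),
# written as the digit strings after the decimal point.
listing = ["1415926", "0500000", "3333333", "7182818", "0000000"]
b = diagonal_digits(listing)
print("0." + b)     # 0.54555
assert all(b[i] != d[i] for i, d in enumerate(listing))
```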


1.5 σ-Algebras

Consider a nonempty set Ω and a subset 𝒜 of ℙ(Ω). (That is, an element of 𝒜 is a subset of Ω.) The set 𝒜 is said to be an algebra (or a field) on Ω if the following three properties are satisfied:

1. Ω ∈ 𝒜.

2. If A ∈ 𝒜 then A^c ∈ 𝒜.

3. If A ∈ 𝒜 and B ∈ 𝒜 then A ∪ B ∈ 𝒜.

That is, an algebra on Ω is a subset of ℙ(Ω) that contains Ω, that is closed under complementation, and that is closed under finite unions.
The set 𝒜 is said to be a σ-algebra (or a σ-field) on the nonempty set Ω if the following three properties are satisfied:

1. Ω ∈ 𝒜.

2. If A ∈ 𝒜 then A^c ∈ 𝒜.

3. If A_n ∈ 𝒜 for each n ∈ ℕ then ⋃_{n∈ℕ} A_n ∈ 𝒜.

That is, a σ-algebra on Ω is a subset of ℙ(Ω) that contains Ω, that is closed under complementation, and that is closed under countable unions. Note that any σ-algebra is an algebra and that any algebra contains the empty set. Note also that DeMorgan's Law implies that an algebra is closed under finite intersections and that a σ-algebra is closed under countable intersections. Finally, note that any algebra containing only a finite number of elements is also a σ-algebra.
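On a finite set Ω, a collection of subsets has only finitely many members, so any countable union of members is in fact a finite union. Hence, as noted above, checking the algebra axioms is enough. The following Python sketch (the helper name is hypothetical) tests the three properties for a collection of subsets of a finite Ω. Closure under pairwise unions gives closure under all finite unions by induction.

```python
def is_sigma_algebra(collection, omega):
    """Check the sigma-algebra axioms for a collection of subsets of a FINITE set omega."""
    omega = frozenset(omega)
    coll = {frozenset(s) for s in collection}
    if not all(s <= omega for s in coll):
        return False                                        # members must be subsets of omega
    contains_omega = omega in coll
    closed_complement = all(omega - s in coll for s in coll)
    closed_union = all(s | t in coll for s in coll for t in coll)
    return contains_omega and closed_complement and closed_union

omega = {1, 2, 3, 4}
print(is_sigma_algebra([set(), {1, 2}, {3, 4}, omega], omega))   # True
print(is_sigma_algebra([set(), {1}, {2}, omega], omega))         # False: {2, 3, 4} is missing
```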
Let's briefly review our notation: Let A be a subset of a nonempty set Ω and assume that ω is a point in A and that A is an element of an algebra 𝒜 on Ω. Then ω ∈ {ω} ⊂ A ⊂ Ω ∈ 𝒜, A ∈ 𝒜 ⊂ ℙ(Ω), and ω ∈ A. In the following exercises let Ω be a nonempty set.


Exercise 1.10 Is {∅, Ω} a σ-algebra on Ω?

Exercise 1.11 Is ℙ(Ω) a σ-algebra on Ω?

Exercise 1.12 Let Ω = {1, 2, 3}. Find five different σ-algebras on Ω.

Exercise 1.13 Show that an intersection of σ-algebras is itself a σ-algebra. Does the same hold for a union of σ-algebras?

Exercise 1.14 Let Ω be the set of all real numbers and let 𝒜 be the collection of all subsets of Ω that are either finite or have finite complements. (A set with a finite complement is said to be cofinite.) Is 𝒜 an algebra on Ω? Is 𝒜 a σ-algebra on Ω?

Exercise 1.15 Let Ω be the set of all real numbers and let 𝒜 be the collection of all subsets of Ω that are either countable or have countable complements. (A set with a countable complement is said to be cocountable.) Is 𝒜 an algebra on Ω? Is 𝒜 a σ-algebra on Ω?

Consider a nonempty set Ω and a σ-algebra 𝒜 on Ω. The ordered pair (Ω, 𝒜) is called a measurable space and sets in 𝒜 are called measurable sets. Later we will refer to Ω as a sample space and refer to measurable sets as events.
Consider a nonempty set Ω and let ℱ be any subset of ℙ(Ω). The σ-algebra generated by ℱ is denoted by σ(ℱ) and is defined to be the smallest σ-algebra on Ω that contains each element in ℱ. That is, if ℬ is any σ-algebra on Ω that contains each element in ℱ then σ(ℱ) ⊂ ℬ. Note, also, that if ℱ is already a σ-algebra, then σ(ℱ) = ℱ.
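When Ω is finite, σ(ℱ) can be computed by repeatedly adding complements and pairwise unions until nothing new appears. The final collection contains ℱ and Ω and is closed under both operations. Any σ-algebra containing ℱ must contain everything produced along the way, so the result is the smallest such σ-algebra. The Python sketch below (the helper name is hypothetical) implements this closure. It reproduces, for one example, the remark in Exercise 1.19 that two sets in general position generate 16 sets.

```python
def generated_sigma_algebra(F, omega):
    """Smallest sigma-algebra on the FINITE set omega containing every set in F."""
    omega = frozenset(omega)
    coll = {frozenset(s) for s in F} | {omega}
    while True:
        new = {omega - s for s in coll}                 # complements
        new |= {s | t for s in coll for t in coll}      # pairwise unions
        if new <= coll:                                 # nothing new: closed, so done
            return coll
        coll |= new

omega = {1, 2, 3, 4}
A, B = {1, 2}, {2, 3}     # the four sets A∩B, A\B, B\A, (A∪B)^c are all nonempty
print(len(generated_sigma_algebra([A, B], omega)))   # 16
```

The loop terminates because each pass either returns or strictly enlarges a subset of the finite set ℙ(Ω).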

Exercise 1.16 What is the difference (if any) between σ(∅) and σ({∅})?


Exercise 1.17 What is σ({Ω})?

Exercise 1.18 What is σ({A}) for a subset A of Ω?

Exercise 1.19 What is σ({A, B}) for subsets A and B of Ω? (In general, σ({A, B}) will contain 16 elements.)

Exercise 1.20 Consider a σ-algebra ℱ on a nonempty set Ω. Does there exist a subset A of Ω such that A ⊂ ℱ and A ∈ ℱ?

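We will next consider several properties of inverse images. In each of the following three lemmas we will let the context set our notation.
Let A and B be nonempty sets and let f: A → B. Further, for a nonempty set I, let B_i be a subset of B for each i ∈ I. Note that:

$$
\begin{aligned}
f^{-1}\Big(\bigcup_{i\in I} B_i\Big) &= \Big\{a \in A : f(a) \in \bigcup_{i\in I} B_i\Big\} && \text{(by definition of the inverse image)}\\
&= \{a \in A : \exists\, i \in I \text{ st } f(a) \in B_i\} && \text{(by definition of the union)}\\
&= \{a \in A : \exists\, i \in I \text{ st } a \in f^{-1}(B_i)\} && \text{(by definition of } f^{-1})\\
&= \Big\{a \in A : a \in \bigcup_{i\in I} f^{-1}(B_i)\Big\} && \text{(by definition of the union)}\\
&= \bigcup_{i\in I} f^{-1}(B_i).
\end{aligned}
$$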

Thus, we have the following result:

1.4 Lemma

$$f^{-1}\Big(\bigcup_{i\in I} B_i\Big) = \bigcup_{i\in I} f^{-1}(B_i).$$


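Next, let M be a subset of B and notice that

$$
\begin{aligned}
a \in \big(f^{-1}(M)\big)^c &\iff a \notin f^{-1}(M)\\
&\iff f(a) \notin M\\
&\iff f(a) \in M^c\\
&\iff a \in f^{-1}(M^c).
\end{aligned}
$$

Thus, we have the following result:

1.5 Lemma

$$\big(f^{-1}(M)\big)^c = f^{-1}(M^c).$$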

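Again, let A and B be nonempty sets and let f: A → B. Further, for a nonempty set I, let B_i, for i ∈ I, be a subset of B. Note that:

$$
\begin{aligned}
\bigcap_{i\in I} f^{-1}(B_i) &= \Big(\bigcup_{i\in I} \big(f^{-1}(B_i)\big)^c\Big)^c && \text{via DeMorgan's Law}\\
&= \Big(\bigcup_{i\in I} f^{-1}(B_i^c)\Big)^c && \text{via Lemma 1.5}\\
&= \Big(f^{-1}\Big(\bigcup_{i\in I} B_i^c\Big)\Big)^c && \text{via Lemma 1.4}\\
&= f^{-1}\Big(\Big(\bigcup_{i\in I} B_i^c\Big)^c\Big) && \text{via Lemma 1.5}\\
&= f^{-1}\Big(\bigcap_{i\in I} B_i\Big) && \text{via DeMorgan's Law.}
\end{aligned}
$$

Thus, we have the following result:

1.6 Lemma

$$\bigcap_{i\in I} f^{-1}(B_i) = f^{-1}\Big(\bigcap_{i\in I} B_i\Big).$$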
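Lemmas 1.4, 1.5, and 1.6 can be spot-checked on a finite example. The Python sketch below represents f: A → B as a dictionary. The particular map (reduction mod 4) and the sets B_i are arbitrary choices made only for illustration. For each lemma it compares the two sides of the identity.

```python
A = set(range(10))
B = {0, 1, 2, 3}
f = {a: a % 4 for a in A}                  # an arbitrary map f: A -> B

def preimage(M):
    """f^{-1}(M) = {a in A : f(a) in M}."""
    return {a for a in A if f[a] in M}

family = [{0, 1, 2}, {1, 2}, {1, 3}]       # the sets B_i for an index set of size 3

# Lemma 1.4: the preimage of a union is the union of the preimages.
assert preimage(set().union(*family)) == set().union(*(preimage(Bi) for Bi in family))
# Lemma 1.5: complements (taken in A and in B, respectively) are preserved.
M = {1, 2}
assert A - preimage(M) == preimage(B - M)
# Lemma 1.6: the preimage of an intersection is the intersection of the preimages.
assert preimage(set.intersection(*family)) == set.intersection(*(preimage(Bi) for Bi in family))
print("Lemmas 1.4-1.6 hold for this example")
```

Unlike images, inverse images commute with all of these set operations, whether or not f is injective or surjective.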

If f: A → B and if ℱ is a subset of ℙ(B) then we will let f^{-1}(ℱ) denote the subset of ℙ(A) consisting of every subset of A that is an inverse image of some element in ℱ. That is, S ∈ f^{-1}(ℱ) if and only if S = f^{-1}(T) for some T ∈ ℱ. The following theorem follows quickly from the three preceding results.

1.6 Theorem Let A and B be nonempty sets and let f: A → B. If ℬ is a σ-algebra on B then f^{-1}(ℬ) is a σ-algebra on A.
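Theorem 1.6 can be illustrated on finite sets. Pull back every set in a σ-algebra ℬ on B, then check the axioms for the resulting collection f^{-1}(ℬ). The Python sketch below uses hypothetical helper names and a map of our own choosing that is neither injective nor surjective. As before, the finite-case check suffices because countable unions of members reduce to finite ones.

```python
def is_sigma_algebra(coll, omega):
    """Sigma-algebra axioms for a collection of subsets of a FINITE set omega."""
    return (omega in coll
            and all(omega - s in coll for s in coll)
            and all(s | t in coll for s in coll for t in coll))

A = frozenset(range(6))
B = frozenset("pqrs")
f = {0: "p", 1: "p", 2: "q", 3: "q", 4: "r", 5: "r"}   # "s" is never hit

def preimage(M):
    return frozenset(a for a in A if f[a] in M)

# A sigma-algebra on B generated by the partition {p, q}, {r, s}.
sigma_B = {frozenset(), frozenset("pq"), frozenset("rs"), B}
pulled_back = {preimage(M) for M in sigma_B}

print(sorted(sorted(S) for S in pulled_back))   # [[], [0, 1, 2, 3], [0, 1, 2, 3, 4, 5], [4, 5]]
print(is_sigma_algebra(sigma_B, B), is_sigma_algebra(pulled_back, A))   # True True
```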


Exercise 1.21 Let A and B be nonempty sets and let f: A → B. For a subset S of A, let f(S) denote the subset of B given by {f(s) : s ∈ S}. For a subset 𝒫 of ℙ(A), let f(𝒫) denote the subset of ℙ(B) given by {f(S) : S ∈ 𝒫}. If 𝒜 is a σ-algebra on A then must f(𝒜) be a σ-algebra on B?

1.7 Theorem Consider measurable spaces (Ω_1, ℱ_1) and (Ω_2, ℱ_2) and let f be a function mapping Ω_1 to Ω_2. Let 𝒜 be a collection of subsets of Ω_2 such that σ(𝒜) = ℱ_2. If f^{-1}(𝒜) ⊂ ℱ_1 then f^{-1}(ℱ_2) ⊂ ℱ_1.

Proof. Let 𝒢 be the collection of all subsets A of Ω_2 such that f^{-1}(A) ∈ ℱ_1. It follows from Lemma 1.4 and Lemma 1.5 that 𝒢 is closed under countable unions and under complementation. Further, ∅ ∈ 𝒢 since f^{-1}(∅) = ∅ ∈ ℱ_1. (Note that in the last equation the first empty set is a subset of Ω_2 and the second empty set is a subset of Ω_1.) Hence Ω_2 = ∅^c ∈ 𝒢, and so 𝒢 is a σ-algebra on Ω_2. Further, note that 𝒜 ⊂ 𝒢. This implies that σ(𝒜) ⊂ 𝒢. Since σ(𝒜) = ℱ_2 the desired result follows immediately. □

◊ 1.6 Dynkin's π-λ Theorem

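Consider a nonempty set $\Omega$. A subset $\mathcal{P}$ of $\mathbb{P}(\Omega)$ is said to be a $\pi$-system if it is closed under the formation of finite intersections; that is, if $A \in \mathcal{P}$ and $B \in \mathcal{P}$ imply that $A \cap B \in \mathcal{P}$. A subset $\mathcal{L}$ of $\mathbb{P}(\Omega)$ is said to be a $\lambda$-system if it satisfies the following three properties:

1. $\Omega \in \mathcal{L}$.

2. If $A \in \mathcal{L}$ then $A^c \in \mathcal{L}$.

3. If $A_n \in \mathcal{L}$ for each $n \in \mathbb{N}$ and if $A_i \cap A_j = \emptyset$ when $i \neq j$ then $\bigcup_{n \in \mathbb{N}} A_n \in \mathcal{L}$.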


That is, a $\lambda$-system contains $\Omega$, is closed under the formation of complements, and is closed under the formation of countable disjoint unions.
The following result is called Dynkin's $\pi$-$\lambda$ Theorem and is often quite helpful in proving uniqueness.

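1.8 Theorem (Dynkin) Consider a nonempty set $\Omega$ and subsets $\mathcal{P}$ and $\mathcal{L}$ of $\mathbb{P}(\Omega)$. If $\mathcal{P}$ is a $\pi$-system and if $\mathcal{L}$ is a $\lambda$-system then $\mathcal{P} \subset \mathcal{L}$ implies that $\sigma(\mathcal{P}) \subset \mathcal{L}$.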

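Proof. Let $\lambda(\mathcal{P})$ denote the intersection of all $\lambda$-systems that include $\mathcal{P}$ as a subset. Each family of sets in this intersection contains $\Omega$, is closed under proper differences, and is closed under strictly increasing limits of sets.$^4$ Thus, the intersection itself contains $\Omega$, is closed under proper differences, and is closed under strictly increasing limits of sets. Hence, $\lambda(\mathcal{P})$ is a $\lambda$-system. Note that $\lambda(\mathcal{P}) \subset \mathcal{L}$. Thus if $\lambda(\mathcal{P})$ is a $\pi$-system, then it will be a $\sigma$-algebra that is a superset of $\sigma(\mathcal{P})$ and a subset of $\mathcal{L}$, and hence the desired result will follow. Therefore, we will show that $\lambda(\mathcal{P})$ is a $\pi$-system.
For each subset $A$ of $\Omega$, let $\mathcal{P}_A$ denote the family of all subsets $B$ of $\Omega$ such that $A \cap B$ is an element of $\lambda(\mathcal{P})$. Let $A_1 \in \lambda(\mathcal{P})$. Notice that $\Omega \in \mathcal{P}_{A_1}$ since $A_1 \cap \Omega = A_1 \in \lambda(\mathcal{P})$. Now assume that $C_1$ and $C_2$ are elements of $\mathcal{P}_{A_1}$ such that $C_1 \subset C_2$. Then $(A_1 \cap C_1) \subset (A_1 \cap C_2)$ and thus $\big((A_1 \cap C_2) \setminus (A_1 \cap C_1)\big) \in \lambda(\mathcal{P})$; also,
$$(A_1 \cap C_2) \setminus (A_1 \cap C_1) = (A_1 \cap C_2) \cap (A_1 \cap C_1)^c = (A_1 \cap C_2) \cap (A_1^c \cup C_1^c) = A_1 \cap C_2 \cap C_1^c = A_1 \cap (C_2 \setminus C_1).$$
Thus $\mathcal{P}_{A_1}$ is closed under proper differences. Finally, assume that $\{D_n\}_{n \in \mathbb{N}}$ is an increasing sequence of sets in $\mathcal{P}_{A_1}$. Then the sequence $\{A_1 \cap D_n\}_{n \in \mathbb{N}}$ is either increasing or there exists some $k \in \mathbb{N}$ such that $n > k$ implies that $A_1 \cap D_n = A_1$. In either case, $\lim (A_1 \cap D_n) \in \lambda(\mathcal{P})$ since $\lambda(\mathcal{P})$ is a $\lambda$-system containing the set $A_1$. Thus, $\mathcal{P}_{A_1}$ is a $\lambda$-system. Furthermore, if $A_1 \in \mathcal{P}$ then $\mathcal{P} \subset \mathcal{P}_{A_1}$, since $(A_1 \cap B) \in \mathcal{P} \subset \lambda(\mathcal{P})$ for all $B \in \mathcal{P}$. Thus, if $A_1 \in \mathcal{P}$, then $\mathcal{P}_{A_1}$ is a $\lambda$-system that includes $\mathcal{P}$. Since $\lambda(\mathcal{P})$ is the minimal $\lambda$-system that includes $\mathcal{P}$, we see that for $A_1 \in \mathcal{P}$, $\lambda(\mathcal{P}) \subset \mathcal{P}_{A_1}$.

$^4$Limits of sets will be defined later. If this is your first trip through the book, then hold off on this proof until you have read the next chapter.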


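From this we see that if $A_1 \in \mathcal{P}$ and $B \in \lambda(\mathcal{P})$ then $(A_1 \cap B) \in \lambda(\mathcal{P})$. This, in turn, implies that for $B \in \lambda(\mathcal{P})$, $\mathcal{P} \subset \mathcal{P}_B$. Since, for $B \in \lambda(\mathcal{P})$, $\mathcal{P}_B$ is a $\lambda$-system that includes $\mathcal{P}$ and since $\lambda(\mathcal{P})$ is the minimal $\lambda$-system that includes $\mathcal{P}$, we see that $\lambda(\mathcal{P}) \subset \mathcal{P}_B$. Now we observe that this means that for $B \in \lambda(\mathcal{P})$ and for $C \in \lambda(\mathcal{P})$ we have $(B \cap C) \in \lambda(\mathcal{P})$. Thus, $\lambda(\mathcal{P})$ is a $\pi$-system. □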
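The hypothesis that $\mathcal{P}$ be a $\pi$-system cannot simply be dropped. The following Python sketch, an illustration on a four-point set with hypothetical helper names, exhibits a $\lambda$-system $\mathcal{L}$ that is not a $\pi$-system, together with a family $\mathcal{P} \subset \mathcal{L}$ (also not a $\pi$-system) for which $\sigma(\mathcal{P}) \not\subset \mathcal{L}$. On a finite set a sequence of pairwise disjoint sets has only finitely many nonempty terms, so checking finite disjoint unions suffices there.

```python
from itertools import combinations


def is_pi_system(family):
    """Closed under pairwise (hence all finite) intersections."""
    fam = set(family)
    return all(s & t in fam for s, t in combinations(fam, 2))


def is_lambda_system(family, universe):
    """Contains the universe, is closed under complements, and is closed
    under unions of pairwise-disjoint members (finite check)."""
    fam = set(family)
    if universe not in fam or any(universe - s not in fam for s in fam):
        return False
    nonempty = [s for s in fam if s]
    for r in range(2, len(nonempty) + 1):
        for sub in combinations(nonempty, r):
            disjoint = all(not (s & t) for s, t in combinations(sub, 2))
            if disjoint and frozenset().union(*sub) not in fam:
                return False
    return True


def generated_sigma_algebra(generators, universe):
    """Smallest family containing the generators that is closed under
    complements and pairwise unions; on a finite universe this is sigma(generators)."""
    fam = {frozenset(), universe} | set(generators)
    while True:
        new = {universe - s for s in fam} | {s | t for s in fam for t in fam}
        if new <= fam:
            return fam
        fam |= new


omega = frozenset({1, 2, 3, 4})
L = {frozenset(), omega, frozenset({1, 2}), frozenset({3, 4}),
     frozenset({1, 3}), frozenset({2, 4})}
P = {frozenset({1, 2}), frozenset({1, 3})}

print(is_lambda_system(L, omega), is_pi_system(L))     # True False
print(P <= L, generated_sigma_algebra(P, omega) <= L)  # True False
```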

◊ 1.7 Topological Spaces

Let $\Omega$ be a nonempty set. A topology $\mathcal{U}$ for $\Omega$ is a subset of $\mathbb{P}(\Omega)$ that contains $\Omega$, contains the empty set, is closed under finite intersections, and is closed under arbitrary unions. A topological space is an ordered pair $(\Omega, \mathcal{U})$ where $\Omega$ is a nonempty set and $\mathcal{U}$ is a topology for $\Omega$. The sets in $\mathcal{U}$ are called the open sets with respect to the topology $\mathcal{U}$ on $\Omega$. The complement of an open set is called a closed set. Note that in any topological space $(\Omega, \mathcal{U})$ the sets $\Omega$ and $\emptyset$ are both open and closed. It follows from DeMorgan's Law that a finite union of closed sets is closed and an arbitrary intersection of closed sets is closed.
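On a finite set these axioms can be checked mechanically, since any union of members of a finite family is a finite union. The Python sketch below is an illustration only; the three-point set, the family $\mathcal{U}$, and the helper names are hypothetical choices. It also lists the closed sets, confirming that $\Omega$ and $\emptyset$ are both open and closed.

```python
from itertools import combinations


def is_topology(family, universe):
    """Finite check: contains the universe and the empty set and is closed
    under pairwise intersections and unions. For a finite family, every
    union of members is a finite union, so pairwise closure suffices."""
    fam = set(family)
    if universe not in fam or frozenset() not in fam:
        return False
    return all(s & t in fam and s | t in fam for s, t in combinations(fam, 2))


def closed_sets(family, universe):
    """The closed sets are exactly the complements of the open sets."""
    return {universe - s for s in family}


omega = frozenset("abc")
U = {frozenset(), frozenset("a"), frozenset("ab"), omega}
print(is_topology(U, omega))  # True
print(sorted("".join(sorted(c)) for c in closed_sets(U, omega)))  # ['', 'abc', 'bc', 'c']
```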

Example 1.3 Consider the set $\mathbb{R}^k$ for a positive integer $k$. For $x \in \mathbb{R}^k$ and $r \in (0, \infty)$, let $B(x, r)$ denote the open Euclidean ball in $\mathbb{R}^k$ centered at $x$ with radius $r$; that is,
$$B(x, r) = \Big\{y \in \mathbb{R}^k : \sum_{i=1}^{k} \big(\pi_i(x) - \pi_i(y)\big)^2 < r^2\Big\}.$$
For the usual topology on $\mathbb{R}^k$, a subset $U$ of $\mathbb{R}^k$ is open if and only if for any $x \in U$ there exists a positive real number $r$ such that $B(x, r) \subset U$. Unless noted otherwise, we will always assume that $\mathbb{R}^k$ is equipped with its usual topology. □

Let $(\Omega, \mathcal{U})$ be a topological space and let $A \subset \Omega$. A point $\omega \in \Omega$ is a limit point of $A$ if $A \cap (U \setminus \{\omega\})$ is not empty for every open set $U$ that contains $\omega$. Note that a limit point of $A$ need not be an element of $A$. The closure of $A$ is the union of $A$ with the set of all limit points of $A$. A neighborhood of $A$ is any subset of $\Omega$ that includes an open superset of $A$. An isolated point of $A$ is a point $\omega \in A$ such that $A \cap N = \{\omega\}$ for some neighborhood $N$ of $\{\omega\}$. A closed set that has no isolated points is said to be a perfect set. A subset of $\Omega$ is said to be a $G_\delta$ set if it is expressible as a countable intersection of open sets. A subset of $\Omega$ is said to be an $F_\sigma$ set if it is expressible as a countable union of closed sets.

1.8 Caveats and Curiosities

It is important to keep in mind the crucial role that topology plays in dealing with questions regarding convergence. For example, the set $\{\mathbb{R}, \emptyset\}$ is a topology on the real line that is called the trivial topology on $\mathbb{R}$. Under this topology, every sequence of real numbers converges to every real number!


I wanted certainty in the kind of way in which people want religious faith. I thought that certainty is more likely to be found in mathematics than elsewhere. But I discovered that many mathematical demonstrations, which my teachers expected me to accept, were full of fallacies, and that, if certainty were indeed discoverable in mathematics, it would be in a new field of mathematics, with more solid foundations than those that had hitherto been thought secure. But as the work proceeded, I was continually reminded of the fable about the elephant and the tortoise. Having constructed an elephant upon which the mathematical world could rest, I found the elephant tottering, and proceeded to construct a tortoise to keep the elephant from falling. But the tortoise was no more secure than the elephant, and after some twenty years of very arduous toil, I came to the conclusion that there was nothing more that I could do in the way of making mathematical knowledge indubitable. -Bertrand Russell (who, with Alfred North Whitehead, constructed a 362-page proof that 1+1=2.)


2 Measure Theory

2.1 Definitions

A measure $\mu$ on a measurable space $(\Omega, \mathcal{F})$ is a function on $\mathcal{F}$ that satisfies the following three properties:

1. $\mu: \mathcal{F} \to [0, \infty]$

2. $\mu(\emptyset) = 0$

3. If $A_n \in \mathcal{F}$ for all $n \in \mathbb{N}$ and if $A_n \cap A_m = \emptyset$ when $m \neq n$ then
$$\mu\Big(\bigcup_{n \in \mathbb{N}} A_n\Big) = \sum_{n \in \mathbb{N}} \mu(A_n).$$

A function that satisfies property (3) is said to be countably additive. Thus, a measure is a countably additive, nonnegative, extended real-valued set function that maps the empty set to zero. If $A$ is an element of $\mathcal{F}$ then $\mu(A)$ is called the measure (or $\mu$-measure) of $A$.
A measure $\mu$ on a measurable space $(\Omega, \mathcal{F})$ is said to be a finite measure if $\mu(\Omega) < \infty$. A measure $\mu$ on a measurable space $(\Omega, \mathcal{F})$ is said to be a $\sigma$-finite measure if $\Omega$ may be written as $\Omega = \bigcup_{n \in \mathbb{N}} A_n$ where $A_n \in \mathcal{F}$ and $\mu(A_n) < \infty$ for each $n$.
If $\mu$ is a measure on a measurable space $(\Omega, \mathcal{F})$ then the resulting ordered triple $(\Omega, \mathcal{F}, \mu)$ is called a measure space. A probability measure $P$ on a measurable space $(\Omega, \mathcal{F})$ is a measure on $(\Omega, \mathcal{F})$ such that $P(\Omega) = 1$. The associated measure space $(\Omega, \mathcal{F}, P)$ is then called a probability space and sets in $\mathcal{F}$ are called events. If $A$ is an event then $P(A)$ is called the probability of $A$. Note that it does not make sense to discuss the probability of subsets of $\Omega$ that are not events.

Example 2.1 Let $\Omega$ be a nonempty set and let $\omega_0$ be a point in $\Omega$. Let $\mu: \mathbb{P}(\Omega) \to \{0, 1\}$ via $\mu(A) = 1$ if $\omega_0 \in A$ and $\mu(A) = 0$ if $\omega_0 \notin A$. Then $\mu$ is a measure on $(\Omega, \mathbb{P}(\Omega))$ and $(\Omega, \mathbb{P}(\Omega), \mu)$ is a probability space. The particular measure in this example is known as Dirac measure at the point $\omega_0$. □

Example 2.2 Let $\Omega$ be any nonempty set and define $\mu$ on $\mathbb{P}(\Omega)$ by letting $\mu(A) = \infty$ if $A$ is an infinite set and by letting $\mu(A)$ equal the number of points in $A$ if $A$ is a finite set. Then $\mu$ is a measure on $(\Omega, \mathbb{P}(\Omega))$. The particular measure in this example is known as counting measure. □
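On a finite set both measures can be evaluated directly. The Python sketch below is an illustration only: it represents sets as `frozenset`s, handles only finite sets (so the value $\infty$ of counting measure never arises), and the names `dirac`, `counting`, and `additive_on` are hypothetical. It checks additivity on one disjoint family.

```python
def dirac(w0):
    """Dirac measure at w0: A -> 1 if w0 is in A, else 0."""
    return lambda A: 1 if w0 in A else 0


def counting(A):
    """Counting measure, restricted in this sketch to finite sets."""
    return len(A)


def additive_on(mu, disjoint_sets):
    """Check mu(union) == sum of mu(A) over a finite pairwise-disjoint family."""
    union = frozenset().union(*disjoint_sets)
    return mu(union) == sum(mu(A) for A in disjoint_sets)


omega = frozenset(range(1, 7))
parts = [frozenset({1, 2}), frozenset({3}), frozenset({5, 6})]
delta3 = dirac(3)
print(delta3(omega), delta3(frozenset({4})))                     # 1 0
print(additive_on(delta3, parts), additive_on(counting, parts))  # True True
```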

2.1 Theorem (Monotonicity) Consider a measure space $(\Omega, \mathcal{F}, \mu)$ and let $A$ and $B$ be elements of $\mathcal{F}$. If $A \subset B$ then $\mu(A) \leq \mu(B)$.

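Proof. Notice that $B = A \cup (B \setminus A)$ and also that $A \cap (B \setminus A) = \emptyset$. Since $\mu$ is countably additive and $\mu(\emptyset) = 0$ (apply property (3) to the sequence $A, B \setminus A, \emptyset, \emptyset, \ldots$) we see that $\mu(B) = \mu(A) + \mu(B \setminus A)$. Further, since $\mu$ is nonnegative, we see that $\mu(B \setminus A) \geq 0$. Thus, it follows that $\mu(A) \leq \mu(B)$. □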

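2.2 Theorem Consider a measure space $(\Omega, \mathcal{F}, \mu)$. The measure $\mu$ is countably subadditive. That is, given any (not necessarily disjoint) sequence $\{A_n\}_{n \in \mathbb{N}}$ of sets in $\mathcal{F}$ it follows that
$$\mu\Big(\bigcup_{n \in \mathbb{N}} A_n\Big) \leq \sum_{n \in \mathbb{N}} \mu(A_n).$$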

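Proof. Define a new sequence $\{A_n'\}_{n \in \mathbb{N}}$ of measurable sets as follows: $A_1' = A_1$ and
$$A_n' = A_n \setminus \bigcup_{k=1}^{n-1} A_k \quad \text{for } n \in \mathbb{N} \setminus \{1\}.$$
Note that
$$\bigcup_{n \in \mathbb{N}} A_n = \bigcup_{n \in \mathbb{N}} A_n'$$
and that the $A_n'$'s are disjoint. (The collection $\{A_n' : n \in \mathbb{N}\}$ is called a disjointification of the collection $\{A_n : n \in \mathbb{N}\}$.) Further, since $A_n' \subset A_n$ for each $n$, Theorem 2.1 implies that $\mu(A_n') \leq \mu(A_n)$ for each $n$. This observation combined with the countable additivity of $\mu$ implies that
$$\mu\Big(\bigcup_{n \in \mathbb{N}} A_n\Big) = \mu\Big(\bigcup_{n \in \mathbb{N}} A_n'\Big) = \sum_{n \in \mathbb{N}} \mu(A_n') \leq \sum_{n \in \mathbb{N}} \mu(A_n). \qquad \square$$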
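The disjointification step is easy to carry out on finitely many finite sets. The Python sketch below (the function name `disjointify` and the sample sets are hypothetical) builds $A_n'$ exactly as in the proof, then checks that the union is unchanged, that the new sets are pairwise disjoint, and that, for counting measure, the subadditivity inequality holds.

```python
def disjointify(sets):
    """Return A'_1 = A_1 and A'_n = A_n minus (A_1 ∪ ... ∪ A_{n-1})."""
    seen = frozenset()
    result = []
    for A in sets:
        result.append(frozenset(A) - seen)
        seen = seen | A
    return result


A = [{1, 2, 3}, {2, 3, 4}, {4, 5}, {1, 5, 6}]
A_prime = disjointify(A)
print(A_prime)
# Counting measure: the union has 6 points, while the A_n have 11 points in total.
print(sum(len(s) for s in A_prime), sum(len(s) for s in A))  # 6 11
```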

Logic is the railway track along which the mind glides easily. It is the axioms that determine our destination by setting us on this track or the other, and it is in the matter of choice of axioms that applied mathematics differs most fundamentally from pure. Pure mathematics is controlled (or should we say "uncontrolled"?) by a principle of ideological isotropy: any line of thought is as good as another, provided that it is logically smooth. Applied mathematics on the other hand follows only those tracks which offer a view of natural scenery; if sometimes the track dives into a tunnel it is because there is prospect of scenery at the far end. -J. L. Synge

2.2 Supremums and Infimums

Let $S$ be a subset of $\mathbb{R}$. An element $x \in \mathbb{R}$ is said to be an upper bound of $S$ if $y \leq x$ for all $y \in S$. An element $x \in \mathbb{R}$ is said to be a lower bound of $S$ if $x \leq y$ for all $y \in S$.
We say that a subset of $\mathbb{R}$ is bounded above (below) if it has an upper (lower) bound. If a subset of $\mathbb{R}$ has both an upper bound and a lower bound then we say that the set is bounded. We say that a subset of $\mathbb{R}$ is unbounded if it lacks either an upper or a lower bound.
Let $S$ be a subset of $\mathbb{R}$. If $S$ is bounded above then an upper bound of $S$ is said to be a supremum (or least upper bound) of $S$ if it is less than every other upper bound of $S$. We denote the supremum of $S$ by $\sup S$. If $S$ is bounded below then a lower bound of $S$ is said to be an infimum (or greatest lower bound) of $S$ if it is greater than every other lower bound of $S$. We denote the infimum of $S$ by $\inf S$. If $S$ is not bounded above then we will define $\sup S$ to be $\infty$ and if $S$ is not bounded below then we will define $\inf S$ to be $-\infty$. Thus, any subset of $\mathbb{R}$ possesses an infimum and a supremum.
The supremum of a subset $S$ of $\mathbb{R}$ and the infimum of $S$ need not belong to $S$. If $\sup S$ is an element of $S$ then we sometimes refer to $\sup S$ as the maximum of $S$ and denote it by $\max S$. If $\inf S$ is an element of $S$ then we sometimes refer to $\inf S$ as the minimum of $S$ and denote it by $\min S$. For example, if $S = (a, b]$ then $\inf S = a$, $\sup S = b$, $\max S = b$, and $\min S$ does not exist.

◊ Exercise 2.1 Does there exist a subset $A$ of $\mathbb{R}$ such that $\sup A < \inf A$?

2.3 Convergence of Sets: Lim Inf and Lim Sup

Let $\{A_n\}_{n \in \mathbb{N}}$ be a sequence of subsets of some nonempty set $\Omega$. The set of all elements from $\Omega$ that belong to all but a finite number of the $A_n$'s is called the inferior limit of the sequence $\{A_n\}_{n \in \mathbb{N}}$ and is denoted by $\liminf A_n$ or sometimes by $[A_n \text{ a.a.}]$ where a.a. is an abbreviation for "almost always." The set of all elements from $\Omega$ that belong to infinitely many $A_n$'s is called the superior limit of the sequence $\{A_n\}_{n \in \mathbb{N}}$ and is denoted by $\limsup A_n$ or sometimes by $[A_n \text{ i.o.}]$ where i.o. is an abbreviation for "infinitely often." That is,
$$\liminf A_n = \bigcup_{k=1}^{\infty} \bigcap_{m=k}^{\infty} A_m$$
and
$$\limsup A_n = \bigcap_{k=1}^{\infty} \bigcup_{m=k}^{\infty} A_m.$$
If $\liminf A_n = \limsup A_n = A$ then we say that the sequence $\{A_n\}_{n \in \mathbb{N}}$ converges to the set $A$. In such a case we denote $A$ by $\lim_{n \to \infty} A_n$.
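The two formulas can be evaluated mechanically once every tail is cut off at some finite index $N$. The Python sketch below does exactly that; it is only an approximation in general, and the function name and the sample sequences are hypothetical. It is exact, for instance, for a sequence of sets that is periodic from some index $n_0$ on, provided $K \geq n_0$ and $N \geq K + p - 1$ where $p$ is the period.

```python
from functools import reduce


def truncated_lim_inf_sup(A, K, N):
    """Approximate liminf and limsup of A(1), A(2), ... (frozensets) by
    cutting every tail off at index N:
        liminf ~ union over k <= K of (intersection of A(m), k <= m <= N)
        limsup ~ intersection over k <= K of (union of A(m), k <= m <= N)
    Requires 1 <= K <= N so that every range below is nonempty."""
    caps = [reduce(frozenset.intersection, (A(m) for m in range(k, N + 1)))
            for k in range(1, K + 1)]
    cups = [reduce(frozenset.union, (A(m) for m in range(k, N + 1)))
            for k in range(1, K + 1)]
    return reduce(frozenset.union, caps), reduce(frozenset.intersection, cups)


def A(n):
    """Alternates between {0, 1} (n odd) and {0, 2} (n even)."""
    return frozenset({0, 1}) if n % 2 else frozenset({0, 2})


print(truncated_lim_inf_sup(A, K=5, N=50))  # liminf {0}, limsup {0, 1, 2}
```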

Exercise 2.2 Show that $\limsup A_n = \big(\liminf (A_n^c)\big)^c$.

Exercise 2.3 Show that $\liminf A_n \subset \limsup A_n$.

Exercise 2.4 Define subsets $A_n$ of $\mathbb{R}$ via
$$A_n = \begin{cases} (-1/n, 1] & \text{if } n \text{ is odd}\\ (-1, 1/n] & \text{if } n \text{ is even} \end{cases}$$
for positive integers $n$. Show that $\liminf A_n = \{0\}$ and that $\limsup A_n = (-1, 1]$.

2.3 Theorem (The First Borel-Cantelli Lemma) Let $(\Omega, \mathcal{F}, \mu)$ be a measure space. If $\{A_n\}_{n \in \mathbb{N}}$ is a sequence of measurable sets and if
$$\sum_{n=1}^{\infty} \mu(A_n) < \infty$$
then $\mu(\limsup A_n) = 0$.

Proof. To begin, note that since
$$\limsup A_n = \bigcap_{m=1}^{\infty} \bigcup_{k=m}^{\infty} A_k$$
it follows that $\limsup A_n \subset \bigcup_{k=m}^{\infty} A_k$ for each $m \in \mathbb{N}$. Thus, the monotonicity of $\mu$ implies that
$$\mu(\limsup A_n) \leq \mu\Big(\bigcup_{k=m}^{\infty} A_k\Big)$$
for each $m \in \mathbb{N}$. Hence, via the countable subadditivity of $\mu$, it follows that
$$\mu(\limsup A_n) \leq \sum_{k=m}^{\infty} \mu(A_k)$$
for each $m \in \mathbb{N}$. Note that since $\sum_{n=1}^{\infty} \mu(A_n) < \infty$ it follows that $\sum_{k=m}^{\infty} \mu(A_k) \to 0$ as $m \to \infty$; that is, the tail of the convergent series vanishes. Therefore, since $\mu(\limsup A_n)$ is nonnegative and must be smaller than any positive value we conclude that $\mu(\limsup A_n) = 0$. □
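The proof rests on the fact that the tails of a convergent series shrink to zero. As a hypothetical numerical illustration, not taken from the text, suppose $\mu(A_k) = 1/(k(k+1))$. The series then telescopes, $\sum_{k=m}^{M} \mu(A_k) = 1/m - 1/(M+1)$, so the bound $\sum_{k=m}^{\infty} \mu(A_k) = 1/m$ on $\mu(\limsup A_n)$ becomes as small as desired for large $m$. The Python sketch below checks the finite partial sums exactly with rational arithmetic.

```python
from fractions import Fraction


def mu_A(k):
    """Hypothetical values mu(A_k) = 1/(k(k+1)); their total sum is 1."""
    return Fraction(1, k * (k + 1))


def partial_tail(m, M):
    """Exact value of sum_{k=m}^{M} mu(A_k)."""
    return sum((mu_A(k) for k in range(m, M + 1)), Fraction(0))


for m in (1, 10, 100):
    # Telescoping: the partial tail is 1/m - 1/(M+1), so the full tail is 1/m.
    print(m, partial_tail(m, 10_000), Fraction(1, m))
```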

The following continuity property of finite measures will be useful in proving some later results.

2.1 Lemma Consider a measure space $(\Omega, \mathcal{F}, \mu)$ such that $\mu(\Omega) < \infty$. If $\{A_n\}_{n \in \mathbb{N}}$ is a sequence of measurable sets that converges to some (measurable) set $A$ then the sequence $\mu(A_n)$ converges to $\mu(A)$ as $n \to \infty$.

2.4 Measurable Functions

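Let $(\Omega_1, \mathcal{F}_1)$ and $(\Omega_2, \mathcal{F}_2)$ be measurable spaces. If $f: \Omega_1 \to \Omega_2$ is such that $f^{-1}(\mathcal{F}_2) \subset \mathcal{F}_1$ then we say that $f$ is a measurable function mapping $(\Omega_1, \mathcal{F}_1)$ into $(\Omega_2, \mathcal{F}_2)$, and we denote this property by writing $f: (\Omega_1, \mathcal{F}_1) \to (\Omega_2, \mathcal{F}_2)$.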

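Example 2.3 Let $\Omega_1 = \{\text{red}, \text{blue}, \text{green}\}$ and let $\Omega_2 = \{0, 1\}$. Further, let $\mathcal{F}_1 = \{\emptyset, \Omega_1, \{\text{red}, \text{blue}\}, \{\text{green}\}\}$ and let $\mathcal{F}_2 = \{\emptyset, \Omega_2, \{0\}, \{1\}\}$. Define $f: \Omega_1 \to \Omega_2$ via $f(\text{red}) = f(\text{blue}) = 0$ and $f(\text{green}) = 1$. Define $g: \Omega_1 \to \Omega_2$ via $g(\text{red}) = 0$ and $g(\text{green}) = g(\text{blue}) = 1$. Note that $f^{-1}(\emptyset) = \emptyset$ and $f^{-1}(\Omega_2) = \Omega_1$. (Indeed, these relationships always hold.) In addition, note that $f^{-1}(\{0\}) = \{\text{red}, \text{blue}\}$ and that $f^{-1}(\{1\}) = \{\text{green}\}$. Thus, since $f^{-1}(\mathcal{F}_2) \subset \mathcal{F}_1$ (equal, in fact) we conclude that $f$ is a measurable function mapping $(\Omega_1, \mathcal{F}_1)$ into $(\Omega_2, \mathcal{F}_2)$. Note, however, that since $g^{-1}(\{0\}) = \{\text{red}\} \notin \mathcal{F}_1$ it follows that $g$ is not a measurable function mapping $(\Omega_1, \mathcal{F}_1)$ into $(\Omega_2, \mathcal{F}_2)$. □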
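Example 2.3 can be replayed mechanically: a map between finite measurable spaces is measurable exactly when the inverse image of every set in $\mathcal{F}_2$ lies in $\mathcal{F}_1$. The Python sketch below encodes the two functions as dictionaries; the helper names `preimage` and `is_measurable` are hypothetical.

```python
def preimage(func, domain, target):
    """func^{-1}(T) = {w in domain : func(w) in T}; func is a dict."""
    return frozenset(w for w in domain if func[w] in target)


def is_measurable(func, omega1, F1, F2):
    """func: (omega1, F1) -> (omega2, F2) is measurable iff
    func^{-1}(T) belongs to F1 for every T in F2."""
    return all(preimage(func, omega1, T) in F1 for T in F2)


omega1 = frozenset({"red", "blue", "green"})
omega2 = frozenset({0, 1})
F1 = {frozenset(), omega1, frozenset({"red", "blue"}), frozenset({"green"})}
F2 = {frozenset(), omega2, frozenset({0}), frozenset({1})}
f = {"red": 0, "blue": 0, "green": 1}
g = {"red": 0, "blue": 1, "green": 1}

print(is_measurable(f, omega1, F1, F2))  # True
print(is_measurable(g, omega1, F1, F2))  # False: g^{-1}({0}) = {"red"} is not in F1
```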


2.5 Real Borel Sets

Recall that a bounded open interval in $\mathbb{R}$ is a subset of $\mathbb{R}$ of the form $(a, b)$ where $a$ and $b$ are real numbers such that $a < b$ and where, as usual, $(a, b) = \{x \in \mathbb{R} : a < x < b\}$. Let $\mathcal{A}$ denote the collection of all bounded open intervals in $\mathbb{R}$. The collection of Borel subsets of $\mathbb{R}$ is denoted by $\mathcal{B}(\mathbb{R})$ and is defined by $\mathcal{B}(\mathbb{R}) = \sigma(\mathcal{A})$. That is, $\mathcal{B}(\mathbb{R})$ is the smallest $\sigma$-algebra on $\mathbb{R}$ that contains every bounded open interval. The subsets of $\mathbb{R}$ in $\mathcal{B}(\mathbb{R})$ are called real Borel sets or Borel measurable subsets of $\mathbb{R}$. Note that $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is a measurable space.

I hold ... that utility alone is not a proper measure of value, and would even go so far as to say that it is, when strictly and short-sightedly applied, a dangerously false measure of value. For mathematics, which is at once the pure and untrammelled creation of the mind and the indispensable tool of science and modern technology, the adoption of a strictly utilitarian standard could lead only to disaster; it would first bring about the drying up of the sources of new mathematical knowledge and would thereby eventually cause the suspension of significant new activity in applied mathematics as well. In mathematics we need rather to aim at a proper balance between pure theory and practical applications ... -Marshall Stone

◊ Exercise 2.5 Try to find a subset of $\mathbb{R}$ that is not a real Borel set.


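Consider a measurable space $(\Omega, \mathcal{F})$. If $f: (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ then $f$ is said to be a real-valued $\mathcal{F}$-measurable function defined on $\Omega$. If $f: (\mathbb{R}, \mathcal{B}(\mathbb{R})) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ then $f$ is said to be a real-valued Borel measurable function defined on $\mathbb{R}$.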

Exercise 2.6 Show that any countable subset of $\mathbb{R}$ is a real Borel set.

Let $f: \mathbb{R} \to \mathbb{R}$. Recall that we say that $f$ is continuous at the real number $x$ if and only if for any $\varepsilon > 0$ there exists $\delta > 0$ such that if $|x - y| < \delta$, then $|f(x) - f(y)| < \varepsilon$. Further, if $f$ is continuous at $x$ for each real number $x$, then we say that $f$ is continuous.

2.4 Theorem Let $f: \mathbb{R} \to \mathbb{R}$. The function $f$ is continuous if and only if for each open set $U$ of real numbers, $f^{-1}(U)$ is an open set.

Proof. Suppose that $f^{-1}(U)$ is open for each open set $U$ of real numbers, and let $x$ be an arbitrary real number. Then, given any real number $\varepsilon > 0$, the interval $I = (f(x) - \varepsilon, f(x) + \varepsilon)$ is an open set, and so $f^{-1}(I)$ must be open. Now, since $x \in f^{-1}(I)$, there must exist some real number $\delta > 0$ such that $(x - \delta, x + \delta) \subset f^{-1}(I)$. But this implies that if $|x - y| < \delta$, then $f(y) \in (f(x) - \varepsilon, f(x) + \varepsilon)$. Hence, $f$ is continuous at $x$ and, since $x$ was arbitrary, $f$ is continuous.
Now, suppose that $f: \mathbb{R} \to \mathbb{R}$ is continuous, and let $U$ be a nonempty open subset of $\mathbb{R}$. If $f^{-1}(U)$ is empty then the desired result follows since the empty set is open. Assume then that $f^{-1}(U)$ is not empty and let $x \in f^{-1}(U)$. Then, since $f(x) \in U$ there exists some $\varepsilon > 0$ such that $(f(x) - \varepsilon, f(x) + \varepsilon)$ is a subset of $U$. Since $f$ is continuous at $x$ there exists a $\delta > 0$ such that $|f(x) - f(y)| < \varepsilon$ when $|x - y| < \delta$. Thus, for every $y \in (x - \delta, x + \delta)$, it follows that $f(y) \in (f(x) - \varepsilon, f(x) + \varepsilon) \subset U$; that is, $(x - \delta, x + \delta) \subset f^{-1}(U)$. Since $x$ was an arbitrary point of $f^{-1}(U)$, it follows that $f^{-1}(U)$ is open. □

◊ Exercise 2.7 Show that any continuous function mapping $\mathbb{R}$ to $\mathbb{R}$ is Borel measurable.


Consider a nonempty set $\Omega$. A $\sigma$-subalgebra of a $\sigma$-algebra $\mathcal{F}$ on $\Omega$ is a $\sigma$-algebra on $\Omega$ that is a subset of $\mathcal{F}$. For example, $\{\emptyset, \Omega\}$ is a $\sigma$-subalgebra of any $\sigma$-algebra on $\Omega$. For a second example, let $\mathcal{F} = \{\emptyset, \Omega, A, A^c\}$ for some proper subset $A$ of $\Omega$. Even though the subset $\{\emptyset, A\}$ of $\mathcal{F}$ is a $\sigma$-algebra on $A$, it is not a $\sigma$-subalgebra of $\mathcal{F}$.
We say that a $\sigma$-algebra $\mathcal{A}$ on a nonempty set $\Omega$ is countably generated if $\mathcal{A} = \sigma(\{A_n : n \in \mathbb{N}\})$ for some choice of the $A_n$'s. If $\mathcal{F}$ is a countably generated $\sigma$-algebra on a nonempty set $\Omega$ and if $\mathcal{G}$ is a $\sigma$-subalgebra of $\mathcal{F}$, then must $\mathcal{G}$ be countably generated? In the following example we show that the answer is no.

Example 2.4 Let $\Omega = [0, 1]$ and let $\mathcal{F} = \mathcal{B}([0, 1])$. Further, let $\mathcal{G}$ be the $\sigma$-subalgebra of $\mathcal{F}$ given by the countable and cocountable subsets of $[0, 1]$. (A set is cocountable if it has a countable complement.) It follows from one of the problems that $\mathcal{F}$ is countably generated. Assume now that $\mathcal{G}$ is also countably generated. That is, assume that $\mathcal{G} = \sigma(\{A_n : n \in \mathbb{N}\})$ where $A_n \subset [0, 1]$ for each $n \in \mathbb{N}$. Note that without loss of generality, we may assume that $A_n$ is countable for each $n \in \mathbb{N}$ (a cocountable $A_n$ may be replaced by its complement, which generates the same $\sigma$-algebra). Let $B = \bigcup_{n \in \mathbb{N}} A_n$ and note that $B$ is also countable. Thus, there exists some real number $x$ such that $x \in [0, 1] \setminus B$. Notice also that if $\mathcal{D}$ is the family of all subsets of $B$ and their complements then $\mathcal{D}$ is a $\sigma$-algebra such that $\mathcal{G} \supset \mathcal{D} \supset \sigma(\{A_n : n \in \mathbb{N}\})$. But, $\mathcal{D} \neq \mathcal{G}$ since $\{x\}$ is in $\mathcal{G}$ but not in $\mathcal{D}$. This contradiction implies that $\mathcal{G}$ is not countably generated even though it is a $\sigma$-subalgebra of the countably generated $\sigma$-algebra $\mathcal{F}$. □

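2.2 Lemma Consider a measurable space $(\Omega, \mathcal{F})$ and real-valued $\mathcal{F}$-measurable functions $f$ and $g$ defined on $\Omega$. The set $\{\omega \in \Omega : f(\omega) > g(\omega)\}$ is an element of $\mathcal{F}$.

Proof. Write the set $\mathbb{Q}$ of rational numbers as a sequence $\{r_n\}_{n \in \mathbb{N}}$. Since a rational number lies strictly between any two distinct real numbers, note that
$$\begin{aligned}
\{\omega \in \Omega : f(\omega) > g(\omega)\} &= \bigcup_{n \in \mathbb{N}} \{\omega \in \Omega : f(\omega) > r_n > g(\omega)\}\\
&= \bigcup_{n \in \mathbb{N}} \big(\{\omega \in \Omega : f(\omega) > r_n\} \cap \{\omega \in \Omega : g(\omega) < r_n\}\big)\\
&= \bigcup_{n \in \mathbb{N}} \big(f^{-1}((r_n, \infty)) \cap g^{-1}((-\infty, r_n))\big).
\end{aligned}$$
The desired result then follows since $(r_n, \infty)$ and $(-\infty, r_n)$ are in $\mathcal{B}(\mathbb{R})$ for each $n \in \mathbb{N}$. □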

2.3 Lemma Consider a measurable space $(\Omega, \mathcal{F})$ and a real-valued $\mathcal{F}$-measurable function $f$ defined on $\Omega$. If $\alpha$ is any real number then $f + \alpha$ and $\alpha f$ are $\mathcal{F}$-measurable functions defined on $\Omega$.

2.4 Lemma Consider a measurable space $(\Omega, \mathcal{F})$ and real-valued $\mathcal{F}$-measurable functions $f$ and $g$ defined on $\Omega$. The function $f + g$ is an $\mathcal{F}$-measurable function defined on $\Omega$.

2.5 Lemma Consider a measurable space $(\Omega, \mathcal{F})$ and real-valued $\mathcal{F}$-measurable functions $f$ and $g$ defined on $\Omega$. The function $fg$ is an $\mathcal{F}$-measurable function defined on $\Omega$, and, if $g$ is nonzero then $f/g$ is an $\mathcal{F}$-measurable function defined on $\Omega$.

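2.6 Lemma Consider a measurable space $(\Omega, \mathcal{F})$ and a sequence $\{f_n\}_{n \in \mathbb{N}}$ of real-valued $\mathcal{F}$-measurable functions defined on $\Omega$. The functions $\sup_{k \in \mathbb{N}} f_k$ and $\inf_{k \in \mathbb{N}} f_k$ are $\mathcal{F}$-measurable functions defined on $\Omega$.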

Consider a sequence $\{x_n\}_{n \in \mathbb{N}}$ of real numbers. Recall that the superior limit of this sequence is given by
$$\limsup_{n \to \infty} x_n = \inf_{j \in \mathbb{N}} \sup_{n \geq j} x_n$$
and the inferior limit of this sequence is given by
$$\liminf_{n \to \infty} x_n = \sup_{j \in \mathbb{N}} \inf_{n \geq j} x_n.$$
Further, this sequence is said to converge to a real number $x$ if
$$\limsup_{n \to \infty} x_n = \liminf_{n \to \infty} x_n = x.$$
Finally, a sequence $\{f_n\}_{n \in \mathbb{N}}$ of real-valued functions defined on some nonempty set $\Omega$ is said to converge pointwise to a function $f: \Omega \to \mathbb{R}$ if the sequence $\{f_n(\omega)\}_{n \in \mathbb{N}}$ of real numbers converges to the real number $f(\omega)$ for each $\omega \in \Omega$. In this case, we denote the pointwise limit $f$ as $\lim_{n \to \infty} f_n$.
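The formulas $\inf_j \sup_{n \geq j} x_n$ and $\sup_j \inf_{n \geq j} x_n$ can be approximated by using only finitely many terms. The Python sketch below is such an approximation, not an exact computation; the cutoffs $J$ and $N$ and the function names are hypothetical choices, and $J$ should be well below $N$ so that each truncated tail still has many terms. For $x_n = (-1)^n(1 + 1/n)$, whose superior and inferior limits are $1$ and $-1$, it returns values close to those.

```python
def truncated_limsup_liminf(x, J, N):
    """Approximate limsup x_n = inf_j sup_{n>=j} x_n and
    liminf x_n = sup_j inf_{n>=j} x_n using only j <= J and n <= N."""
    if not 1 <= J <= N:
        raise ValueError("need 1 <= J <= N")
    vals = [x(n) for n in range(1, N + 1)]
    tail_sup = vals[:]  # tail_sup[j-1] = max(x_j, ..., x_N)
    tail_inf = vals[:]  # tail_inf[j-1] = min(x_j, ..., x_N)
    for i in range(N - 2, -1, -1):
        tail_sup[i] = max(vals[i], tail_sup[i + 1])
        tail_inf[i] = min(vals[i], tail_inf[i + 1])
    return min(tail_sup[:J]), max(tail_inf[:J])


def x(n):
    """x_n = (-1)^n (1 + 1/n); its limsup is 1 and its liminf is -1."""
    return (-1) ** n * (1 + 1 / n)


print(truncated_limsup_liminf(x, J=500, N=1000))  # close to (1, -1)
```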


2.7 Lemma Consider a measurable space $(\Omega, \mathcal{F})$ and a sequence $\{f_n\}_{n \in \mathbb{N}}$ of real-valued $\mathcal{F}$-measurable functions defined on $\Omega$. If $\lim_{n \to \infty} f_n$ exists then it is an $\mathcal{F}$-measurable function defined on $\Omega$.

2.6 Lebesgue Measure and Lebesgue Measurable Sets

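For an open interval $(a, b)$ of $\mathbb{R}$, let $\ell((a, b))$ denote the length of the interval $(a, b)$. That is, if $I = (a, b)$ with $a < b$ then $\ell(I) = b - a$.
Let $A$ be a subset of $\mathbb{R}$. We will say that a countable collection $\{I_n : n \in M \subset \mathbb{N}\}$ of open intervals covers $A$ if
$$A \subset \bigcup_{n \in M} I_n.$$
For each such set $A$, let $S_A$ be the subset of $\mathbb{R}$ given by
$$S_A = \Big\{\sum_{n \in M} \ell(I_n) : \{I_n : n \in M \subset \mathbb{N}\} \text{ is a countable collection of open intervals that covers } A\Big\}.$$
The outer Lebesgue measure of $A$ is denoted by $m^*(A)$ and is defined by $m^*(A) = \inf S_A$. (Note that outer Lebesgue measure is defined for any set in $\mathbb{P}(\mathbb{R})$ but is not a measure on $(\mathbb{R}, \mathbb{P}(\mathbb{R}))$ since it fails to be countably additive.)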
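A standard consequence of this definition is that every countable set has outer measure zero: for any $\varepsilon > 0$, cover its $n$th point by an open interval of length $\varepsilon/2^n$, for a total length of at most $\varepsilon$. The Python sketch below carries out this covering for the first few rationals in $[0, 1]$ using exact rational arithmetic. The enumeration order and the names `first_rationals` and `epsilon_cover` are illustrative choices, and a finite computation can of course check only finitely many points.

```python
from fractions import Fraction


def first_rationals(count):
    """First `count` distinct rationals p/q in [0, 1], listed by denominator q."""
    seen, out = set(), []
    q = 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out


def epsilon_cover(points, eps):
    """Cover the n-th point (n = 1, 2, ...) by an open interval of length eps / 2**n."""
    return [(x - eps / 2 ** (n + 1), x + eps / 2 ** (n + 1))
            for n, x in enumerate(points, start=1)]


eps = Fraction(1, 1000)
pts = first_rationals(40)
cover = epsilon_cover(pts, eps)
total_length = sum(b - a for a, b in cover)
print(total_length < eps)                               # True
print(all(a < x < b for x, (a, b) in zip(pts, cover)))  # True
```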

2.1 Definition (The Carathéodory Criterion) A subset $E$ of $\mathbb{R}$ is said to be Lebesgue measurable if
$$m^*(B) = m^*(B \cap E) + m^*(B \cap E^c)$$
for every subset $B$ of $\mathbb{R}$.

Let $\mathcal{M}(\mathbb{R})$ denote the collection of all subsets of $\mathbb{R}$ that satisfy the Carathéodory Criterion; that is, $\mathcal{M}(\mathbb{R})$ denotes the collection of all Lebesgue measurable subsets of $\mathbb{R}$.


2.5 Theorem The set $\mathcal{B}(\mathbb{R})$ is a proper subset of $\mathcal{M}(\mathbb{R})$.

2.6 Theorem The set $\mathcal{M}(\mathbb{R})$ is a proper subset of $\mathbb{P}(\mathbb{R})$.

Proof. For the construction of a non-Lebesgue measurable subset of the real line, see pages 41-42 of Counterexamples in Probability and Real Analysis by G. Wise and E. Hall (Oxford University Press, New York, 1993). Also, see page 63 of Real Analysis by H. L. Royden (Macmillan, New York, 1988, Second edition). □

2.7 Theorem The set $\mathcal{M}(\mathbb{R})$ is a $\sigma$-algebra on $\mathbb{R}$.

Proof. See pages 56-58 of Real Analysis by H. L. Royden (Macmillan, New York, 1988, Second edition). □

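Lebesgue measure $m$ on the measurable space $(\mathbb{R}, \mathcal{M}(\mathbb{R}))$ is defined to be the restriction of $m^*$ to $\mathcal{M}(\mathbb{R})$. That is, $m(A)$ is equal to $m^*(A)$ if $A \in \mathcal{M}(\mathbb{R})$ and $m(A)$ is left undefined if $A \notin \mathcal{M}(\mathbb{R})$. Lebesgue measure $\lambda$ on the measurable space $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is defined to be the restriction of $m$ to $\mathcal{B}(\mathbb{R})$.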

Lebesgue measure corresponds to our intuitive concept of length. That is, the Lebesgue measure of an interval is the length of the interval. Lebesgue measure, however, is defined for subsets of $\mathbb{R}$ that are much more complicated than intervals. Note, also, that we have only defined Lebesgue measure for certain subsets of the real line. Later, we will define it for certain subsets of $\mathbb{R}^k$. In any case, however, when discussing the Lebesgue measure of a set $A$ it will always be true that $A$ is a subset of $\mathbb{R}^k$ for some positive integer $k$.

2.8 Theorem Let $\lambda$ denote Lebesgue measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. If $x \in \mathbb{R}$ then $\lambda(\{x\}) = 0$.

Proof. For each positive integer $n$, let $I_n$ denote the subset of $\mathbb{R}$ given by
$$I_n = \Big(x - \frac{1}{2n},\ x + \frac{1}{2n}\Big).$$
Note that $\lambda(I_n) = 1/n$. Further, since $\{x\} \subset I_n$ for each $n$ it follows via monotonicity that $\lambda(\{x\}) \leq 1/n$ for any positive integer $n$. Thus, we conclude that $\lambda(\{x\}) = 0$. □

◊ Exercise 2.8 If $A$ is a Lebesgue measurable subset of $\mathbb{R}$ having zero Lebesgue measure then must $A$ be countable?

Consider a measure space $(\Omega, \mathcal{F}, \mu)$. A subset of $\Omega$ is said to be a null set (or a $\mu$-null set) if it is measurable and has measure zero. That is, $A$ is a null set if $A \in \mathcal{F}$ and if $\mu(A) = 0$. Let $A \subset B$ where $B$ is a null set. If $A \in \mathcal{F}$ then $A$ must also be a null set since $0 \leq \mu(A) \leq \mu(B) = 0$. In general, however, $A$ need not be a null set since $A$ need not be an element of $\mathcal{F}$. A measure space is said to be complete if every subset of a null set is a measurable set. Note that while the empty set is always a null set, a null set need not be empty.

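2.9 Theorem Corresponding to any measure space $(\Omega, \mathcal{F}, \mu)$ there exists a complete measure space $(\Omega, \mathcal{F}_0, \mu_0)$ such that

1. $\mathcal{F} \subset \mathcal{F}_0$.

2. $\mu(A) = \mu_0(A)$ for each set $A \in \mathcal{F}$.

3. $A \in \mathcal{F}_0$ if and only if $A = E \cup F$ where $E \in \mathcal{F}$ and where $F \subset N$ for some $N \in \mathcal{F}$ with $\mu(N) = 0$.

The measure space $(\Omega, \mathcal{F}_0, \mu_0)$ is said to be the completion of $(\Omega, \mathcal{F}, \mu)$.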

2.10 Theorem The measure space $(\mathbb{R}, \mathcal{M}(\mathbb{R}), m)$ is the completion of the measure space $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \lambda)$.

Exercise 2.9 If we complete Lebesgue measure on the real Borel sets, then we obtain the real Lebesgue sets. There do exist measures on the real Borel sets that when completed yield the power set of $\mathbb{R}$. Can you think of such a measure?


For a positive integer $n$, let $\mathbb{R}^n$ denote the $n$-fold Cartesian product of $\mathbb{R}$ with itself. That is, an element of $\mathbb{R}^n$ is an ordered $n$-tuple of the form $(a_1, \ldots, a_n)$ where $a_i \in \mathbb{R}$ for each $i$. A set $I$ of the form $I = I_1 \times \cdots \times I_n$ where $I_k$ is an open interval of the form $(a_k, b_k)$ for each $k$ is called an open rectangle in $\mathbb{R}^n$. The smallest $\sigma$-algebra on $\mathbb{R}^n$ that contains every open rectangle in $\mathbb{R}^n$ is denoted by $\mathcal{B}(\mathbb{R}^n)$ and is called the set of Borel measurable subsets of $\mathbb{R}^n$. Note that, for any positive integer $n$, $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ is a measurable space. If $f: (\mathbb{R}^k, \mathcal{B}(\mathbb{R}^k)) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ for some $k \in \mathbb{N}$ then $f$ is said to be a real-valued Borel measurable function defined on $\mathbb{R}^k$.

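2.11 Theorem For any $k \in \mathbb{N}$ there exists a unique measure $\lambda$ on $(\mathbb{R}^k, \mathcal{B}(\mathbb{R}^k))$ such that
$$\lambda(A_1 \times \cdots \times A_k) = \lambda(A_1) \cdots \lambda(A_k)$$
for any sets $A_1, \ldots, A_k$ from $\mathcal{B}(\mathbb{R})$, where on the right-hand side $\lambda$ denotes Lebesgue measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. The measure $\lambda$ on $(\mathbb{R}^k, \mathcal{B}(\mathbb{R}^k))$ is called Lebesgue measure on $(\mathbb{R}^k, \mathcal{B}(\mathbb{R}^k))$.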

2.7 Caveats and Curiosities


3 Integration

3.1 The Riemann Integral

Let $f$ be a bounded real-valued function defined on an interval $[a, b]$ and let $\Gamma = \{\alpha_0, \ldots, \alpha_n\}$ be a subdivision of $[a, b]$; that is, $a = \alpha_0 < \alpha_1 < \cdots < \alpha_n = b$ for some positive integer $n$. Let $\mathcal{S}$ denote the collection of all subdivisions of $[a, b]$. Define real-valued functions $S_1$ and $S_2$ on $\mathcal{S}$ via
$$S_1(\Gamma) = \sum_{i=1}^{n} (\alpha_i - \alpha_{i-1}) \sup\{f(x) : \alpha_{i-1} < x \leq \alpha_i\}$$
and
$$S_2(\Gamma) = \sum_{i=1}^{n} (\alpha_i - \alpha_{i-1}) \inf\{f(x) : \alpha_{i-1} < x \leq \alpha_i\}$$
where $\Gamma = \{\alpha_0, \ldots, \alpha_n\}$ is an element from $\mathcal{S}$. The upper Riemann integral of $f$ over $[a, b]$ is given by
$$\mathcal{U}\!\int_a^b f(x)\,dx = \inf\{S_1(\Gamma) : \Gamma \in \mathcal{S}\}$$
and the lower Riemann integral of $f$ over $[a, b]$ is given by
$$\mathcal{L}\!\int_a^b f(x)\,dx = \sup\{S_2(\Gamma) : \Gamma \in \mathcal{S}\}.$$
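For a continuous nondecreasing $f$, the supremum of $f$ over $(\alpha_{i-1}, \alpha_i]$ is $f(\alpha_i)$ and the infimum is $f(\alpha_{i-1})$, so $S_1$ and $S_2$ can be computed exactly. The Python sketch below does this for uniform subdivisions. It relies on that monotonicity and continuity assumption and is not valid for general $f$; `upper_lower_sums` is a hypothetical name. For $f(x) = x^2$ on $[0, 1]$ the two sums differ by exactly $1/n$ and bracket the value $1/3$.

```python
from fractions import Fraction


def upper_lower_sums(f, a, b, n):
    """S_1 and S_2 for the uniform subdivision of [a, b] into n cells.
    ASSUMES f is continuous and nondecreasing on [a, b], so that on each
    cell (alpha_{i-1}, alpha_i] the sup is f(alpha_i) and the inf is
    f(alpha_{i-1})."""
    alphas = [a + (b - a) * Fraction(i, n) for i in range(n + 1)]
    S1 = sum((alphas[i] - alphas[i - 1]) * f(alphas[i]) for i in range(1, n + 1))
    S2 = sum((alphas[i] - alphas[i - 1]) * f(alphas[i - 1]) for i in range(1, n + 1))
    return S1, S2


def square(t):
    return t * t


for n in (4, 100):
    S1, S2 = upper_lower_sums(square, 0, 1, n)
    print(n, S1, S2, S1 - S2)  # for n = 4: 15/32 7/32 1/4
```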

If the upper and lower Riemann integrals of $f$ over $[a, b]$ are each equal to the same value $\beta$ then we say that $f$ is Riemann integrable over $[a, b]$ and we denote the value $\beta$ by $\int_a^b f(x)\,dx$ and call it the Riemann integral of $f$ over $[a, b]$. As the next example shows, it is not difficult to find functions that are not Riemann integrable.

Example 3.1 Let $[a, b]$ with $a < b$ be a subinterval of $\mathbb{R}$ and define a real-valued function $f$ on $[a, b]$ via
$$f(x) = \begin{cases} 0 & \text{if } x \text{ is irrational}\\ 1 & \text{if } x \text{ is rational.} \end{cases}$$


Once when walking past a lounge in the University of Chicago that was filled with a loud crowd watching TV, [Zygmund] asked one of his students what was going on. The student told him that the crowd was watching the World Series and explained to him some of the features of this baseball phenomenon. Zygmund thought about it all for a few minutes and commented, "I think it should be called the World Sequence." -Ronald Coifman and Robert Strichartz writing about Antoni Zygmund

That is, $f(x) = I_{\mathbb{Q}}(x)$. Let $\Gamma = \{\alpha_0, \ldots, \alpha_n\}$ be a subdivision of $[a, b]$. Given any positive integer $i \leq n$, there exists a rational number $q_i$ and an irrational number $\tau_i$ such that $\alpha_{i-1} < q_i \leq \alpha_i$ and such that $\alpha_{i-1} < \tau_i \leq \alpha_i$. Hence, it follows that $\sup\{f(x) : \alpha_{i-1} < x \leq \alpha_i\} = 1$ and $\inf\{f(x) : \alpha_{i-1} < x \leq \alpha_i\} = 0$. From this we conclude that $S_1(\Gamma) = \sum_{i=1}^{n} (\alpha_i - \alpha_{i-1}) = b - a$ and $S_2(\Gamma) = 0$. Since these values do not depend upon the particular subdivision $\Gamma$ that was selected it follows that the upper Riemann integral of $f$ over $[a, b]$ is equal to $b - a$ and that the lower Riemann integral of $f$ over $[a, b]$ is equal to zero. Since these values do not coincide, we see that $f$ is not Riemann integrable over $[a, b]$. □

Example 3.1 points out a serious shortcoming of the Riemann integral. In particular, for a Borel set $E$ we would like $I_E$ to be integrable and $\int_{\mathbb{R}} I_E(x)\,dx$ to equal the Lebesgue measure of $E$. That is, ideally $\int_{\mathbb{R}} I_{\mathbb{Q}}(x)\,dx$ should equal zero (the Lebesgue measure of $\mathbb{Q}$) but the Riemann integral of $I_{\mathbb{Q}}$ does not exist. Although the Riemann integral is not general enough or powerful enough for our purposes, it remains useful for other purposes due to its simplicity and computability.
We will consider two additional types of integration. The first will be a straightforward extension of the Riemann integral and, as above, will be used to integrate functions defined on a subset of the real line. The second new integration technique will be much more general in that it will allow us to integrate functions defined on arbitrary sets.

3.2 The Riemann-Stieltjes Integral

Let $f$ be a real-valued function that is defined on an interval $[a, b]$. As before, let $\Gamma = \{\alpha_0, \ldots, \alpha_n\}$ be a subdivision of $[a, b]$. (That is, $a = \alpha_0 < \alpha_1 < \cdots < \alpha_n = b$.) Let $\mathcal{G}$ denote the set of all subdivisions of $[a, b]$. Define a function $S$ mapping $\mathcal{G}$ into the extended nonnegative reals via
$$S(\Gamma) = \sum_{i=1}^{n} |f(\alpha_i) - f(\alpha_{i-1})|.$$
The variation of $f$ over $[a, b]$ is defined by
$$V = \sup\{S(\Gamma) : \Gamma \in \mathcal{G}\}.$$
If $V < \infty$ then we say that $f$ is of bounded variation on $[a, b]$. If $V = \infty$ then we say that $f$ is of unbounded variation on $[a, b]$.

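Example 3.2 Consider a function $f$ defined on $[a, b]$ that is nondecreasing; that is, if $a \leq x < y \leq b$ then $f(x) \leq f(y)$. Then each term $|f(\alpha_i) - f(\alpha_{i-1})|$ equals $f(\alpha_i) - f(\alpha_{i-1})$, so the sum telescopes and $S(\Gamma) = f(b) - f(a)$ for any subdivision $\Gamma$. Hence it follows that $V = f(b) - f(a)$. □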

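Example 3.3 Let $f(x) = I_{\mathbb{Q}}(x)$ for $x \in [a, b]$. Then, given any positive number $B$ there exists a subdivision $\Gamma$ of $[a, b]$ such that $S(\Gamma) > B$. (Simply choose $\Gamma = \{a_0, \ldots, a_n\}$ such that $n$ is large and such that, for $0 < i < n$, $a_i$ is rational when $i$ is even and irrational when $i$ is odd; every term of $S(\Gamma)$ except possibly the first and the last then equals one, so $S(\Gamma) \ge n - 2$.) Thus, $V = \infty$ and we conclude that $f$ is of unbounded variation on $[a, b]$. □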

Exercise 3.1 A function $f$ defined on $[a, b]$ and taking values in $\mathbb{R}$ is said to satisfy a Lipschitz condition on $[a, b]$ if there exists a constant $C$ such that $|f(x) - f(y)| \le C|x - y|$ for all $x$ and $y$ in $[a, b]$. Show that for such a function $f$ it follows that $V \le C(b - a)$ where $V$ is the variation of $f$ over $[a, b]$.

Now, let $f$ and $g$ be real-valued functions defined on $[a, b]$ and consider a subdivision $\Gamma = \{a_0, \ldots, a_n\}$ of $[a, b]$. Let $\Phi$ be a sample from the subdivision $\Gamma$. That is, $\Phi = \{\beta_1, \ldots, \beta_n\}$ is a collection of real numbers such that $a_{i-1} \le \beta_i \le a_i$ for each positive integer $i \le n$. Let $\mathcal{G}$ denote the set of all subdivisions of $[a, b]$ and, for a subdivision $\Gamma$, let $S_\Gamma$ denote the collection of all samples from the subdivision $\Gamma$. Let $\mathcal{S}$ denote the set of all ordered pairs of the form $(\Gamma, \Phi)$ where $\Gamma \in \mathcal{G}$ and $\Phi \in S_\Gamma$, and define a function $R$ mapping $\mathcal{S}$ to $\mathbb{R}$ via
$$R((\Gamma, \Phi)) = \sum_{i=1}^{n} f(\beta_i)\bigl(g(a_i) - g(a_{i-1})\bigr).$$

The value $R((\Gamma, \Phi))$ is called a Riemann-Stieltjes sum of $f$ with respect to $g$ for the subdivision $\Gamma$.

This was all part of his passion for order in the world of mathematics. He could not stand untidiness in his chosen territory, blunders, obscurity, or vagueness, unproven assertions or half substantiated claims ... the man who did his job incompetently, who spoilt Landau's world, received no mercy: that was the unpardonable sin in Landau's eyes, to make a mathematical mess where there had been order before. -G. H. Hardy and H. Heilbronn writing about Edmund Landau

For a subdivision $\Gamma = \{a_0, \ldots, a_n\}$ of $[a, b]$, let
$$|\Gamma| = \max_{1 \le i \le n} (a_i - a_{i-1})$$
denote the size of $\Gamma$. If the limit
$$\lim_{|\Gamma| \to 0} R((\Gamma, \Phi))$$
exists and is finite then that limit is called the Riemann-Stieltjes integral of $f$ with respect to $g$ on $[a, b]$ and is denoted by
$$\int_{[a,b]} f(x)\,dg(x).$$
(Note that, when it exists, this limit does not depend on the choice of the samples $\Phi$.) If $g(x) = x$ then $\int_{[a,b]} f(x)\,dg(x)$ is simply the Riemann integral of the function $f$ over $[a, b]$.
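A Riemann-Stieltjes sum is straightforward to evaluate for a given subdivision and sample. The sketch below (the function names `rs_sum` and `rs_sums_three_ways` are hypothetical) uses uniform subdivisions and three choices of the sample $\Phi$: left endpoints, right endpoints, and midpoints. For the assumed example $f(x) = x$ and $g(x) = x^2$ on $[0, 1]$, all three approach the same value $2/3$ as $|\Gamma| \to 0$, consistent with Theorem 3.3, which gives $\int_0^1 x \cdot 2x\,dx = 2/3$.

```python
from typing import Callable, Sequence, Tuple

Func = Callable[[float], float]


def rs_sum(f: Func, g: Func, points: Sequence[float], samples: Sequence[float]) -> float:
    """R((Gamma, Phi)) = sum_i f(beta_i) * (g(a_i) - g(a_{i-1}))."""
    if len(samples) != len(points) - 1:
        raise ValueError("need exactly one sample beta_i per subinterval")
    total = 0.0
    for left, right, beta in zip(points, points[1:], samples):
        if not left <= beta <= right:
            raise ValueError("each beta_i must lie in [a_{i-1}, a_i]")
        total += f(beta) * (g(right) - g(left))
    return total


def rs_sums_three_ways(f: Func, g: Func, a: float, b: float, n: int) -> Tuple[float, float, float]:
    """Left-endpoint, right-endpoint and midpoint samples on a uniform subdivision."""
    points = [a + (b - a) * i / n for i in range(n + 1)]
    lefts, rights = points[:-1], points[1:]
    mids = [(lo + hi) / 2 for lo, hi in zip(lefts, rights)]
    return (rs_sum(f, g, points, lefts),
            rs_sum(f, g, points, rights),
            rs_sum(f, g, points, mids))


# f(x) = x integrated against g(x) = x^2 on [0, 1]; the three sums close in on 2/3.
for n in (10, 100, 1000):
    print(n, rs_sums_three_ways(lambda x: x, lambda x: x * x, 0.0, 1.0, n))
```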

3.1 Theorem If $f$ is continuous on $[a, b]$ and if $g$ is of bounded variation on $[a, b]$ then the Riemann-Stieltjes integral of $f$ with respect to $g$ on $[a, b]$ exists.

3.2 Theorem (Integration by Parts) If
$$\int_{[a,b]} f(x)\,dg(x)$$
exists then so does
$$\int_{[a,b]} g(x)\,df(x)$$
and
$$\int_{[a,b]} f(x)\,dg(x) = \bigl(f(b)g(b) - f(a)g(a)\bigr) - \int_{[a,b]} g(x)\,df(x).$$

3.3 Theorem If $f$ is continuous on $[a, b]$ and if $g$ has a continuous derivative $g'$ on $[a, b]$ then
$$\int_{[a,b]} f\,dg = \int_a^b f g'\,dx.$$

3.3 The Lebesgue Integral

3.3.1 Simple Functions

Consider a measure space $(\Omega, \mathcal{F}, \mu)$. A function $f: \Omega \to \mathbb{R}$ is said to be a simple function¹ if it has the form
$$f(\omega) = \sum_{i=1}^{n} \alpha_i I_{A_i}(\omega)$$
where $n \in \mathbb{N}$, where $\alpha_i \in \mathbb{R}$ for each $i$, and where the $A_i$'s are disjoint elements of $\mathcal{F}$. Note that such a simple function is a measurable mapping from $(\Omega, \mathcal{F})$ to $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Note also that any function having the form given above with the $A_i$'s not disjoint may be written as a simple function by taking intersections. For a simple function $f$ as given above we will define the Lebesgue integral of $f$ over $\Omega$ to be
$$\int_\Omega f\,d\mu = \sum_{i=1}^{n} \alpha_i\,\mu(A_i).$$

¹Or, more precisely, a measurable simple function.

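Example 3.4 Let $\Omega = \{\text{Head}, \text{Tail}\}$ and let $\mathcal{F} = \mathcal{P}(\Omega)$. Let $\mu$ be a measure defined on $(\Omega, \mathcal{F})$ via $\mu(\{\text{Head}\}) = 1/2$ and $\mu(\{\text{Tail}\}) = 1/2$. Let $f$ map $\Omega$ to $\mathbb{R}$ via
$$f(\omega) = a_1 I_{\{\text{Tail}\}}(\omega) + a_2 I_{\{\text{Head}\}}(\omega)$$
where $a_1$ and $a_2$ are real numbers. Note that
$$\int_\Omega f\,d\mu = a_1\,\mu(\{\text{Tail}\}) + a_2\,\mu(\{\text{Head}\}) = \frac{a_1 + a_2}{2}.$$
□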
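For a finite $\Omega$ such as the one in Example 3.4, the definition of the integral of a simple function can be checked mechanically. The sketch below (the function name `simple_integral` is a hypothetical helper, and exact `Fraction` arithmetic is an assumption made to avoid rounding) groups outcomes into the disjoint level sets $A_i = \{\omega : f(\omega) = \alpha_i\}$ and returns $\sum_i \alpha_i \mu(A_i)$.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, Hashable


def simple_integral(f: Dict[Hashable, Fraction], mu: Dict[Hashable, Fraction]) -> Fraction:
    """Return sum_i alpha_i * mu(A_i) for a simple function on a finite Omega.

    f maps each outcome w to f(w); mu maps each outcome w to mu({w}).
    The level sets A_i = {w : f(w) = alpha_i} are disjoint, as the definition requires.
    """
    level_set_measure: Dict[Fraction, Fraction] = defaultdict(Fraction)  # alpha_i -> mu(A_i)
    for w, alpha in f.items():
        level_set_measure[alpha] += mu[w]
    return sum((alpha * m for alpha, m in level_set_measure.items()), Fraction(0))


# Example 3.4 with the illustrative choice a_1 = 3 (Tail) and a_2 = 5 (Head).
mu = {"Head": Fraction(1, 2), "Tail": Fraction(1, 2)}
print(simple_integral({"Tail": Fraction(3), "Head": Fraction(5)}, mu))  # (a_1 + a_2) / 2
```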

3.3.2 Measurable Functions

Consider a measurable real-valued function $f$ defined on $(\Omega, \mathcal{F})$ and assume that $f(\omega) \ge 0$ for all $\omega \in \Omega$. Let $S_f$ denote the set of all simple functions $h$ defined on $(\Omega, \mathcal{F})$ such that $0 \le h(\omega) \le f(\omega)$ for all $\omega \in \Omega$. For such a nonnegative measurable function $f$ we define the Lebesgue integral of $f$ over $\Omega$ to be
$$\int_\Omega f\,d\mu = \sup\left\{\int_\Omega h\,d\mu : h \in S_f\right\}.$$
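To make the supremum concrete, consider, as an illustrative assumption rather than an example from the text, $\Omega = [0, 1]$ with Lebesgue measure and $f(x) = x$. The dyadic simple functions $\varphi_n$ equal to $k/2^n$ on $[k/2^n, (k+1)/2^n)$ for $k = 0, \ldots, 2^n - 1$ (and $0$ at $x = 1$) lie in $S_f$, and $\int \varphi_n\,d\mu = 1/2 - 2^{-(n+1)}$ increases to $1/2 = \int_0^1 x\,dx$. The sketch (the name `dyadic_lower_integral` is hypothetical) computes these integrals exactly; their monotone increase is also the behavior described by Theorem 3.4.

```python
from fractions import Fraction


def dyadic_lower_integral(n: int) -> Fraction:
    """Integral of phi_n, where phi_n = k / 2**n on [k / 2**n, (k + 1) / 2**n)
    for k = 0, ..., 2**n - 1 (and phi_n(1) = 0): a simple function with
    0 <= phi_n <= f for f(x) = x on [0, 1] under Lebesgue measure."""
    width = Fraction(1, 2 ** n)  # Lebesgue measure of each level set
    return sum((k * width * width for k in range(2 ** n)), Fraction(0))


for n in range(1, 6):
    print(n, dyadic_lower_integral(n))
```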

Consider a measurable real-valued function $f$ defined on $(\Omega, \mathcal{F})$ and let $f^+$ and $f^-$ denote the positive and negative parts of $f$, respectively. That is, $f^+(\omega) = \max\{f(\omega), 0\}$ and $f^-(\omega) = \max\{-f(\omega), 0\}$ for each $\omega \in \Omega$. Note that $f^+$ and $f^-$ are nonnegative measurable functions, that $|f| = f^+ + f^-$, and that $f = f^+ - f^-$. We will define the Lebesgue integral of $f$ over $\Omega$ to be
$$\int_\Omega f\,d\mu = \int_\Omega f^+\,d\mu - \int_\Omega f^-\,d\mu$$
provided that the two integrals on the right are not both equal to $\infty$; if they are each infinite then we say that the Lebesgue integral of $f$ does not exist. The function $f$ is said to be Lebesgue integrable if $\int_\Omega f\,d\mu$ exists and is finite. If $A \in \mathcal{F}$ then we will let
$$\int_A f\,d\mu = \int_\Omega f I_A\,d\mu.$$

Note that the Lebesgue integral of a nonnegative measurable function always exists, although the value of the integral may be $\infty$.

We have considered two important concepts that are associated with Lebesgue. It is important not to confuse them. Lebesgue measure is a particular example of a measure that is only defined for certain subsets of the real line or $\mathbb{R}^k$. The Lebesgue integral allows us to integrate real-valued measurable functions that are defined on any measurable space. In particular, the Lebesgue integral is defined on general measure spaces and need not have any relation at all to Lebesgue measure. If, however, we consider the Lebesgue integral with respect to Lebesgue measure on $\mathbb{R}^k$ then for certain functions we recover the familiar Riemann integral.

3.3.3 Properties of the Lebesgue Integral

Consider a measure space $(\Omega, \mathcal{F}, \mu)$. A condition is said to hold almost everywhere with respect to the measure $\mu$ (written a.e. $[\mu]$) if there exists a $\mu$-null set $B$ such that the condition holds for all $\omega$ in $\Omega \setminus B$. For example, if $\Omega = \mathbb{R}$ and if $\mu$ is Lebesgue measure then $I_{\mathbb{Q}}(x) = 0$ a.e. $[\mu]$. Lebesgue integrals satisfy the following properties:


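1. If $\int_\Omega f\,d\mu$ exists and if $k \in \mathbb{R}$ then $\int_\Omega kf\,d\mu$ exists and equals $k\int_\Omega f\,d\mu$.

2. If $g(\omega) \ge h(\omega)$ for all $\omega \in \Omega$ then
$$\int_\Omega g\,d\mu \ge \int_\Omega h\,d\mu$$
provided that these integrals exist.

3. If $\int_\Omega f\,d\mu$ exists then
$$\left|\int_\Omega f\,d\mu\right| \le \int_\Omega |f|\,d\mu.$$

4. If the Lebesgue integral of $f$ and of $g$ each exist then
$$\int_\Omega (f + g)\,d\mu = \int_\Omega f\,d\mu + \int_\Omega g\,d\mu$$
provided that the right hand side is not of the form $\infty - \infty$ or $-\infty + \infty$.

5. A real-valued measurable function $f$ is integrable if and only if $|f|$ is integrable.

6. If $f = 0$ a.e. $[\mu]$ then $\int_\Omega f\,d\mu = 0$.

7. If $g = h$ a.e. $[\mu]$, if $\int_\Omega g\,d\mu$ exists, and if $h$ is measurable then $\int_\Omega h\,d\mu$ exists and is equal to $\int_\Omega g\,d\mu$.

8. If $h$ is integrable then $h$ is finite a.e. $[\mu]$.

9. If $h \ge 0$ and $\int_\Omega h\,d\mu = 0$ then $h = 0$ a.e. $[\mu]$.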

The following two results are the "workhorses" of real analysis:

3.4 Theorem (Monotone Convergence or B. Levi's Theorem) Let $\{f_n\}_{n \in \mathbb{N}}$ be a sequence of measurable real-valued functions defined on $\Omega$ such that $0 \le f_1(\omega) \le f_2(\omega) \le \cdots$ for all $\omega \in \Omega$ and such that $f_n(\omega) \to f(\omega)$ as $n \to \infty$ for all $\omega \in \Omega$ for some function $f$. The function $f$ is measurable and
$$\int_\Omega f_n\,d\mu \to \int_\Omega f\,d\mu$$
as $n \to \infty$.


Proof. For a proof of this theorem, see page 172 of Real and Abstract Analysis by E. Hewitt and K. Stromberg (Springer-Verlag, New York, 1965). □

3.5 Theorem (Dominated Convergence Theorem) Let $\{f_n\}_{n \in \mathbb{N}}$ be a sequence of measurable real-valued functions defined on $\Omega$ such that $f(\omega) = \lim_{n \to \infty} f_n(\omega)$ exists for all $\omega \in \Omega$. If $|f_n| \le g$ for some integrable function $g$ and for each $n \in \mathbb{N}$ then

1. $f$ is integrable,

2. $\lim_{n \to \infty} \int_\Omega |f_n - f|\,d\mu = 0$, and,

3. $\lim_{n \to \infty} \int_\Omega f_n\,d\mu = \int_\Omega f\,d\mu$.

Proof. For a proof of this theorem, see pages 172-173 of Real and Abstract Analysis by E. Hewitt and K. Stromberg (Springer-Verlag, New York, 1965). □

3.4 The Riemann Integral and the Lebesgue Integral

3.6 Theorem Let $f$ be a bounded real-valued function defined on an interval $[a, b]$. If $f$ is Riemann integrable on $[a, b]$ then $f$ is Lebesgue integrable with respect to Lebesgue measure on $[a, b]$ and the two integrals are equal.

Proof. This result is proved on pages 121-122 of Real Variables by A. Torchinsky (Addison-Wesley, Redwood City, California, 1988). □

3.7 Theorem Let $f$ be a bounded real-valued function defined on an interval $[a, b]$. The function $f$ is Riemann integrable on $[a, b]$ if and only if $f$ is continuous a.e. on $[a, b]$ with respect to Lebesgue measure.

Proof. This result is proved on page 123 of Real Variables by A. Torchinsky (Addison-Wesley, Redwood City, California, 1988). □

Note 3.1 The previous result holds for a bounded function defined on a bounded interval. In contrast, there do exist functions that possess improper Riemann integrals and yet are not Lebesgue integrable. The function $\sin(x)/x$ on $(0, \infty)$ is such a function: the improper Riemann integral $\lim_{b \to \infty} \int_0^b \sin(x)/x\,dx$ exists (and equals $\pi/2$), but $\int_{(0,\infty)} |\sin(x)/x|\,dx = \infty$. (Recall that for an improper Riemann integral either the integrand or the interval over which the integrand is integrated is unbounded.)
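The contrast in Note 3.1 can be seen numerically. The sketch below approximates $\int_0^{N\pi} \sin(x)/x\,dx$ and $\int_0^{N\pi} |\sin(x)|/x\,dx$; the helper names `midpoint_rule` and `sinc_partial_integrals` are hypothetical, and the midpoint rule is an assumed, simple quadrature rather than a method from the text. The first value settles near $\pi/2$, while the second keeps growing: on each interval $[k\pi, (k+1)\pi]$ we have $\int |\sin x|\,dx = 2$ and $1/x \ge 1/((k+1)\pi)$, so the second integral is at least $(2/\pi)\sum_{k=1}^{N} 1/k$, which diverges.

```python
import math
from typing import Callable, Tuple


def midpoint_rule(fn: Callable[[float], float], a: float, b: float, n: int) -> float:
    """Approximate the integral of fn over [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(fn(a + (i + 0.5) * h) for i in range(n))


def sinc_partial_integrals(humps: int, per_hump: int = 200) -> Tuple[float, float]:
    """Approximate the integrals of sin(x)/x and |sin(x)|/x over [0, humps * pi].

    Midpoints never hit x = 0, so no special case is needed there."""
    b = humps * math.pi
    n = humps * per_hump
    signed = midpoint_rule(lambda x: math.sin(x) / x, 0.0, b, n)
    absolute = midpoint_rule(lambda x: abs(math.sin(x)) / x, 0.0, b, n)
    return signed, absolute


for humps in (10, 100, 1000):
    signed, absolute = sinc_partial_integrals(humps)
    print(humps, round(signed, 4), round(absolute, 4))
```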

Note 3.2 Let $\lambda$ denote Lebesgue measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. If $f: \mathbb{R} \to \mathbb{R}$ is Lebesgue integrable then we will often denote the integral
$$\int_{\mathbb{R}} f\,d\lambda$$
via the more familiar notation
$$\int_{\mathbb{R}} f(x)\,dx.$$

The second expression, however, is just our notation for the Lebesgue integral with respect to Lebesgue measure and should not ordinarily be taken to refer to a Riemann integral.

3.5 The Riemann-Stieltjes Integral and the Lebesgue Integral

A function $F: \mathbb{R} \to \mathbb{R}$ is said to be right continuous if
$$\lim_{y \downarrow x} F(y) = F(x)$$
for any $x \in \mathbb{R}$. (The notation $y \downarrow x$ means that $y \to x$ with $y > x$.)


3.8 Theorem Let $F: \mathbb{R} \to \mathbb{R}$ be nondecreasing and right continuous. Let $f$ be a continuous, real-valued function defined on $[a, b]$. The function $F$ induces a measure $\mu$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ such that $\mu((s, t]) = F(t) - F(s)$ for all $s < t$ and such that
$$\int_a^b f(x)\,dF(x) = \int_{(a,b]} f\,d\mu.$$

Proof. This result is proved on pages 5-9 of Probability Theory by R. G. Laha and V. K. Rohatgi (John Wiley, New York, 1979). This proof uses the Caratheodory Extension Theorem that is developed in Section 5.6 of this book. □
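Theorem 3.8 also covers integrators with jumps, where Theorem 3.3 does not apply. As an assumed example (not from the text), take $F(x) = x + I_{[1/2,\infty)}(x)$, which is nondecreasing and right continuous; the induced measure is Lebesgue measure plus a unit point mass at $1/2$, so for $f(x) = x^2$ on $[0, 1]$ the right-hand side is $\int_0^1 x^2\,dx + f(1/2) = 7/12$. The sketch compares this with Riemann-Stieltjes sums using midpoint samples; the name `rs_sum_midpoints` is a hypothetical helper.

```python
from typing import Callable


def F(x: float) -> float:
    """Nondecreasing, right-continuous integrator: x plus a unit jump at 1/2."""
    return x + (1.0 if x >= 0.5 else 0.0)


def f(x: float) -> float:
    return x * x


def rs_sum_midpoints(f: Callable[[float], float], F: Callable[[float], float],
                     a: float, b: float, n: int) -> float:
    """Riemann-Stieltjes sum of f with respect to F on a uniform subdivision,
    sampling each subinterval at its midpoint."""
    points = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(f((left + right) / 2) * (F(right) - F(left))
               for left, right in zip(points, points[1:]))


# Measure side: mu = Lebesgue measure + unit point mass at 1/2, so the integral
# of f over (0, 1] is 1/3 (continuous part) plus f(1/2) (the atom).
measure_side = 1 / 3 + f(0.5)
for n in (10, 100, 1000):
    print(n, rs_sum_midpoints(f, F, 0.0, 1.0, n), measure_side)
```

The subinterval containing the jump contributes about $f(\beta_i) \cdot 1$, which tends to $f(1/2)$ as $|\Gamma| \to 0$; this is how the atom of $\mu$ appears on the Riemann-Stieltjes side.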

3.6 Caveats and Curiosities


4 Functional Analysis

4.1 Vector Spaces

Let $X$ be a nonempty set and suppose that there exists a mapping $f$ of $X \times X$ into $X$ that is called the addition function and is denoted by $f(x_1, x_2) = x_1 + x_2$. Suppose also that there is a mapping $g$ of $\mathbb{R} \times X$ into $X$ that is called the scalar multiplication function and is denoted by $g(\alpha, x) = \alpha x$. The set $X$ endowed with two such mappings is called a real vector space if the following properties are satisfied:

1. $x + y = y + x$ for all $x$ and $y$ in $X$.

2. $x + (y + z) = (x + y) + z$ for all $x$, $y$, and $z$ in $X$.

3. There exists in $X$ a unique element denoted by $0$ and called the zero element such that $x + 0 = x$ for each $x$ in $X$.

4. To each $x$ in $X$ there corresponds a unique element in $X$ denoted by $-x$ such that $x + (-x) = 0$. (We will often write $y + (-x)$ as $y - x$.)

5. $\alpha(x + y) = \alpha x + \alpha y$ for each $\alpha$ in $\mathbb{R}$ and each $x$ and $y$ from $X$.

6. $(\alpha + \beta)x = \alpha x + \beta x$ for each $\alpha$ and $\beta$ in $\mathbb{R}$ and each $x$ in $X$.

7. $\alpha(\beta x) = (\alpha\beta)x$ for each $\alpha$ and $\beta$ in $\mathbb{R}$ and each $x$ in $X$.

8. $1x = x$ for each $x$ in $X$.

9. $0x = 0$ for each $x$ in $X$, where the $0$ on the left is a real number and the $0$ on the right is the element in $X$ described in Property 3.

Consider a real vector space $X$ (or, more precisely, a real vector space $(X, f, g)$). A finite set $\{x_1, \ldots, x_n\}$ of elements (vectors) from $X$ is said to be linearly dependent (or to consist of elements that are linearly dependent) if there exist real numbers (scalars) $\alpha_1, \ldots, \alpha_n$, not all zero, such that $\alpha_1 x_1 + \cdots + \alpha_n x_n = 0$. Otherwise, the elements are said to be linearly independent. An infinite set is said to be linearly independent if every finite subset of it is linearly independent.

A nonempty subset $M$ of a vector space $X$ is called a subspace of $X$ if $x + y$ and $\alpha x$ are in $M$ for every $\alpha$ in $\mathbb{R}$ and every $x$ and $y$ from $M$. A subspace $M$ of $X$ is said to be a proper subspace if $M \neq X$. A subspace of a vector space is itself a vector space. The intersection of any family of subspaces is itself a subspace. Let $S$ be a nonempty subset of a vector space $X$ and let $\mathcal{L}(S)$ be the set of all finite linear combinations of elements from $S$. That is, $x \in \mathcal{L}(S)$ if and only if $x = \alpha_1 x_1 + \cdots + \alpha_n x_n$ for some positive integer $n$ and where $x_i \in S$ and $\alpha_i \in \mathbb{R}$ for each $i$. The set $\mathcal{L}(S)$ is a subspace of $X$ and is called the linear manifold generated by $S$ or the linear span of $S$.

If $X$ is a vector space then there may be some positive integer $n$ such that $X$ contains a set of $n$ vectors that are linearly independent while every set of $n + 1$ vectors in $X$ is linearly dependent. In this case we say that $X$ is finite-dimensional and of dimension $n$. The trivial vector space $\{0\}$ has dimension $0$. If $X$ is not finite-dimensional then it is infinite-dimensional. (The set $\mathbb{R}^k$ endowed with the standard operations is $k$-dimensional. Spaces whose elements are functions are typically infinite-dimensional.) If $X$ is $n$-dimensional for some positive integer $n$ then there exists a linearly independent set $S$ consisting of $n$ elements such that the linear span of $S$ is $X$ itself. Such a set is called a basis for $X$.
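In $\mathbb{R}^k$ the definition of linear dependence can be tested by Gaussian elimination: a finite set of vectors is linearly independent exactly when the rank of the matrix having those vectors as rows equals the number of vectors. The sketch below assumes rational coordinates so that exact `Fraction` arithmetic can be used (floating-point rank tests need a tolerance); the names `rank` and `linearly_independent` are hypothetical helpers.

```python
from fractions import Fraction
from typing import List, Sequence


def rank(vectors: Sequence[Sequence[int]]) -> int:
    """Rank of the given vectors (as rows), by Gauss-Jordan elimination over the rationals."""
    rows: List[List[Fraction]] = [[Fraction(c) for c in v] for v in vectors]
    if not rows:
        return 0
    r = 0  # number of pivot rows found so far
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def linearly_independent(vectors: Sequence[Sequence[int]]) -> bool:
    """True exactly when no nontrivial linear combination of the vectors is zero."""
    return rank(vectors) == len(vectors)


print(linearly_independent([[1, 0, 0], [0, 1, 0]]))  # two independent vectors in R^3
print(linearly_independent([[1, 2, 3], [2, 4, 6]]))  # second vector = 2 * first
```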

4.2 Normed Linear Spaces

A mapping from a vector space $X$ into $\mathbb{R}$ is called a norm on $X$ and is denoted by $\|\cdot\|$ if it satisfies the following properties:

1. $\|x + y\| \le \|x\| + \|y\|$ for each $x$ and $y$ from $X$.

2. $\|\alpha x\| = |\alpha|\,\|x\|$ for each $\alpha$ in $\mathbb{R}$ and each $x$ in $X$.

3. $\|x\| \ge 0$ for each $x$ in $X$.

4. $\|x\| = 0$ if and only if $x = 0$.

A nonempty set $X$ is said to be a metric space if there exists a mapping $\rho$ of $X \times X$ into $\mathbb{R}$ (called a metric or distance function) such that:

1. $\rho(x_1, x_2) \ge 0$ for each $x_1$ and $x_2$ from $X$.

2. $\rho(x_1, x_2) = 0$ if and only if $x_1 = x_2$.

3. $\rho(x_1, x_2) = \rho(x_2, x_1)$ for each $x_1$ and $x_2$ from $X$.

4. $\rho(x_1, x_3) \le \rho(x_1, x_2) + \rho(x_2, x_3)$ for each $x_1$, $x_2$, and $x_3$ from $X$.¹

¹This property is called the Triangle Inequality.

An open ball centered at a point $p$ in a metric space $(X, \rho)$ is a set consisting of all points $q$ in $X$ such that $\rho(p, q) < r$ for some fixed positive $r$. A point $p$ is said to be a limit point of a subset $E$ of $X$ if every open ball centered at $p$ contains a point $q$ such that $q \neq p$ and such that $q \in E$. The set $E$ is closed if every limit point of $E$ is an element of $E$.

A sequence $\{x_i\}_{i \in \mathbb{N}}$ of elements from a metric space $(X, \rho)$ is said to be a Cauchy sequence if for every $\varepsilon > 0$ there exists an integer $N$ such that $\rho(x_n, x_m) < \varepsilon$ whenever $n \ge N$ and $m \ge N$. A metric space in which every Cauchy sequence converges to a point in the space is said to be a complete metric space.

A vector space $X$ equipped with a norm is called a normed linear space. With the aid of this norm on $X$ we can define a metric $d$ on $X$ by letting $d(x, y) = \|x - y\|$ for each $x$ and $y$ from $X$. That is, a normed linear space is also a metric space. A normed linear space that is complete with respect to the metric induced by its norm is called a Banach space.
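The rational numbers $\mathbb{Q}$ with $\rho(x, y) = |x - y|$ form a metric space that is not complete. The rational Newton iterates $x_{k+1} = (x_k + 2/x_k)/2$, $x_0 = 1$, converge in $\mathbb{R}$ to $\sqrt{2}$ and hence form a Cauchy sequence in $\mathbb{Q}$, yet $\sqrt{2} \notin \mathbb{Q}$, so the sequence has no limit in $\mathbb{Q}$. The sketch (the name `newton_sqrt2` is hypothetical) uses exact `Fraction` arithmetic, so every iterate really is rational.

```python
from fractions import Fraction
from typing import List


def newton_sqrt2(steps: int) -> List[Fraction]:
    """Rational iterates x_0 = 1, x_{k+1} = (x_k + 2 / x_k) / 2."""
    xs = [Fraction(1)]
    for _ in range(steps):
        x = xs[-1]
        xs.append((x + 2 / x) / 2)
    return xs


xs = newton_sqrt2(5)
for k, x in enumerate(xs):
    # each iterate is rational, and its square is never exactly 2
    print(k, x, float(abs(x * x - 2)))
```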


4.3 Inner Product Spaces

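Consider a real vector space $X$. A mapping of $X \times X$ into $\mathbb{R}$ is called an inner product on $X$ and is denoted by $\langle x, y\rangle$ if it satisfies the following conditions:

1. $\langle \alpha x + \beta y, z\rangle = \alpha\langle x, z\rangle + \beta\langle y, z\rangle$ for all $x$, $y$, $z$ in $X$ and all $\alpha$, $\beta$ in $\mathbb{R}$.

2. $\langle x, y\rangle = \langle y, x\rangle$.

3. $\langle x, x\rangle \ge 0$.

4. $\langle x, x\rangle = 0$ if and only if $x = 0$.

A vector space endowed with an inner product is called an inner product space or a pre-Hilbert space. An inner product may be used to define a norm by letting $\|x\| = \sqrt{\langle x, x\rangle}$. A complete inner product space is called a Hilbert space.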

Mathematics is the one area of human enterprise where the motivation to deceive has been practically eliminated. Not because mathematicians are necessarily virtuous people, but because the nature of mathematical ability is such that deception can be immediately determined by other mathematicians. This requirement of honesty soon affects the character of the continuous student of mathematics. -Howard Fehr

Two elements $x$ and $y$ in an inner product space $X$ are said to be orthogonal if $\langle x, y\rangle = 0$. If $S$ is a set of vectors from $X$ then a vector $y$ is said to be orthogonal to the set $S$ if $\langle x, y\rangle = 0$ for each $x$ in $S$. A set $S$ of vectors from $X$ such that $\langle x, y\rangle = 0$ for any distinct elements $x$ and $y$ from $S$ is said to be an orthogonal set. An orthogonal set $S$ of vectors from $X$ such that $\|x\| = 1$ for each $x \in S$ is said to be an orthonormal set. An orthonormal subset $S$ of $X$ is said to be total if there exists no orthonormal subset of $X$ of which $S$ is a proper subset. For a subspace $M$ of a Hilbert space $H$ let $M^\perp$ (pronounced "M perp") denote the subspace of $H$ consisting of all elements in $H$ that are orthogonal to every element in $M$.

Example 4.1 The vectors $[0, 1]$ and $[1, 0]$ comprise a total orthonormal subset of the inner product space $\mathbb{R}^2$ where the inner product is simply the vector dot product. □

4.1 Theorem (Bessel's inequality) Let $\{u_1, u_2, \ldots\}$ be an orthonormal subset of an inner product space $X$. Then, for each $x \in X$, it follows that:
$$\sum_{k=1}^{\infty} \langle x, u_k\rangle^2 \le \|x\|^2.$$

4.2 Theorem Let $\{u_1, u_2, \ldots\}$ be an orthonormal subset of a Hilbert space $X$. Each of the following conditions is necessary and sufficient for the orthonormal set to be total:

1. $x = \sum_{n=1}^{\infty} \langle x, u_n\rangle u_n$ for each $x \in X$.

2. $\|x\|^2 = \sum_{n=1}^{\infty} \langle x, u_n\rangle^2$ for each $x \in X$.²

²This equality is called Parseval's identity.

4.3 Theorem (Parallelogram Law) In an inner product space, the following equality holds for any two elements $x$ and $y$ of the space:
$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2.$$

4.4 Theorem If $\{x_1, \ldots, x_n\}$ is an orthonormal subset of a Hilbert space $H$ and if $x \in H$ then
$$\Bigl\|x - \sum_{j=1}^{n} a_j x_j\Bigr\|$$
is minimized when $a_j = \langle x, x_j\rangle$ for $j = 1, \ldots, n$. (That is, the $a_j$'s provide the coefficients for a best linear estimator of $x$ in terms of the $x_j$'s.)

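Proof. Note that
$$\Bigl\|x - \sum_{j=1}^{n} a_j x_j\Bigr\|^2 = \|x\|^2 - 2\sum_{j=1}^{n} a_j\langle x, x_j\rangle + \Bigl\|\sum_{j=1}^{n} a_j x_j\Bigr\|^2$$
where we note that
$$\Bigl\|\sum_{j=1}^{n} a_j x_j\Bigr\|^2 = \sum_{j=1}^{n} a_j^2$$
since the $x_j$'s are orthonormal. Thus, we have
$$\begin{aligned}
\Bigl\|x - \sum_{j=1}^{n} a_j x_j\Bigr\|^2 &= \|x\|^2 + \sum_{j=1}^{n} \bigl(a_j^2 - 2a_j\langle x, x_j\rangle\bigr)\\
&= \|x\|^2 + \sum_{j=1}^{n} \bigl((a_j - \langle x, x_j\rangle)^2 - \langle x, x_j\rangle^2\bigr)
\end{aligned}$$
since
$$(a_j - \langle x, x_j\rangle)^2 = a_j^2 - 2a_j\langle x, x_j\rangle + \langle x, x_j\rangle^2.$$
Thus, we have
$$0 \le \Bigl\|x - \sum_{j=1}^{n} a_j x_j\Bigr\|^2 = \|x\|^2 - \sum_{j=1}^{n} \langle x, x_j\rangle^2 + \sum_{j=1}^{n} (a_j - \langle x, x_j\rangle)^2,$$
which is minimized when $a_j = \langle x, x_j\rangle$. □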
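Theorem 4.4 can be checked numerically in $\mathbb{R}^3$ with the dot product. The orthonormal pair, the vector $x$, and the helper names `dot` and `squared_error` below are assumptions chosen for illustration. With $a_j = \langle x, x_j\rangle$ the squared error should equal $\|x\|^2 - \sum_j \langle x, x_j\rangle^2 = 14 - 10 = 4$, and any other coefficients add $\sum_j (a_j - \langle x, x_j\rangle)^2$, as in the last display of the proof.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def squared_error(x: Vector, basis: Sequence[Vector], coeffs: Sequence[float]) -> float:
    """Return ||x - sum_j a_j x_j||^2 for the given coefficients a_j."""
    residual: List[float] = list(x)
    for a, e in zip(coeffs, basis):
        residual = [r - a * ei for r, ei in zip(residual, e)]
    return dot(residual, residual)


s = 1 / math.sqrt(2)
basis = [[s, s, 0.0], [s, -s, 0.0]]  # orthonormal in R^3, but not spanning it
x = [3.0, 1.0, 2.0]
best = [dot(x, e) for e in basis]     # a_j = <x, x_j>
print(best, squared_error(x, basis, best))

# Randomly perturbed coefficients should never do better than `best`.
rng = random.Random(1)
for _ in range(5):
    a = [b + rng.uniform(-1, 1) for b in best]
    print(squared_error(x, basis, a) >= squared_error(x, basis, best))
```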

A subset $E$ of a vector space $X$ is said to be convex if it has the following geometric property: whenever $x$ and $y$ are in $E$ and $0 < t < 1$ then the point $(1 - t)x + ty$ is also in $E$. That is, convexity requires that $E$ contain the line segment between any two of its points.


4.5 Theorem Let $M$ be a nonempty closed convex subset of a Hilbert space $H$. If $x \in H$, then there is a unique element $y_0 \in M$ such that $\|x - y_0\| = \inf\{\|x - y\| : y \in M\}$. The element $y_0$ is called the projection of $x$ on $M$.

Proof. Let $d = \inf\{\|x - y\| : y \in M\}$ and choose points $y_1, y_2, \ldots \in M$ such that $\|x - y_n\| \to d$ as $n \to \infty$. We will show that $\{y_n\}_{n \in \mathbb{N}}$ is a Cauchy sequence.

The parallelogram law states that $\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2$ for all $u$ and $v$ in $H$. Let $u = y_n - x$ and $v = y_m - x$ to obtain:
$$\|y_n + y_m - 2x\|^2 + \|y_n - y_m\|^2 = 2\|y_n - x\|^2 + 2\|y_m - x\|^2,$$
or,
$$\|y_n - y_m\|^2 = 2\|y_n - x\|^2 + 2\|y_m - x\|^2 - 4\Bigl\|\tfrac{1}{2}(y_n + y_m) - x\Bigr\|^2.$$
Since $\tfrac{1}{2}(y_n + y_m) \in M$ (by convexity), it follows that
$$\Bigl\|\tfrac{1}{2}(y_n + y_m) - x\Bigr\|^2 \ge d^2.$$
Thus
$$\|y_n - y_m\|^2 \le 2\|y_n - x\|^2 + 2\|y_m - x\|^2 - 4d^2.$$
Since the right hand side of this expression goes to $0$ as $n, m \to \infty$ it follows that $\{y_n\}_{n=1}^{\infty}$ is Cauchy.

Since $H$ is complete, $y_n$ converges to some limit $y_0 \in H$ as $n \to \infty$. Thus, $\|x - y_n\| \to \|x - y_0\|$ as $n \to \infty$. But then $\|x - y_0\| = d$, and $y_0 \in M$ since $M$ is closed. Thus such an element $y_0$ exists.

To prove uniqueness, let $y_0, z_0 \in M$ with $\|x - y_0\| = \|x - z_0\| = d$. In the parallelogram law, let $u = y_0 - x$ and $v = z_0 - x$ to obtain:
$$\|y_0 + z_0 - 2x\|^2 + \|y_0 - z_0\|^2 = 2\|y_0 - x\|^2 + 2\|z_0 - x\|^2 = 4d^2.$$
But,
$$\|y_0 + z_0 - 2x\|^2 = 4\Bigl\|\tfrac{1}{2}(y_0 + z_0) - x\Bigr\|^2 \ge 4d^2.$$
Thus, $\|y_0 - z_0\| = 0$, which implies that $y_0 = z_0$. □
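A simple instance of Theorem 4.5 is a closed box $M = [l_1, u_1] \times [l_2, u_2]$ in $\mathbb{R}^2$ with the Euclidean norm. The box is closed and convex, and the squared distance splits into a sum over coordinates, so the projection is obtained by clipping each coordinate into its interval. The box, the point $x$, and the helper name `project_onto_box` are assumptions for illustration; the sketch compares the clipped point against many random points of the box.

```python
import math
import random
from typing import List, Sequence


def project_onto_box(x: Sequence[float], lower: Sequence[float],
                     upper: Sequence[float]) -> List[float]:
    """Nearest point to x in the box prod_i [lower_i, upper_i] (Euclidean norm)."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


x = [2.0, -1.0]
y0 = project_onto_box(x, [0.0, 0.0], [1.0, 1.0])
d = math.dist(x, y0)  # plays the role of the infimum d in the proof

rng = random.Random(0)  # fixed seed so the check is reproducible
samples = [[rng.random(), rng.random()] for _ in range(10_000)]
print(y0, d, all(math.dist(x, y) >= d for y in samples))
```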


4.6 Theorem Let $M$ be a closed subspace of a Hilbert space $H$, and let $y_0$ be an element of $M$. Then $\|x - y_0\| = \inf\{\|x - y\| : y \in M\}$ if and only if $x - y_0 \perp M$, i.e., if and only if $\langle x - y_0, y\rangle = 0$ for all $y \in M$.

Proof. Assume that $x - y_0 \perp M$. If $y \in M$ then
$$\begin{aligned}
\|x - y\|^2 &= \|x - y_0 - (y - y_0)\|^2\\
&= \|x - y_0\|^2 + \|y - y_0\|^2 - 2\langle x - y_0, y - y_0\rangle\\
&= \|x - y_0\|^2 + \|y - y_0\|^2 \ge \|x - y_0\|^2
\end{aligned}$$
since $y - y_0 \in M$. Thus, $\|x - y_0\| = \inf\{\|x - y\| : y \in M\}$.

Assume now that $\|x - y_0\| = \inf\{\|x - y\| : y \in M\}$. Let $y \in M$ and let $c$ be a real number. Since $M$ is a subspace it follows that $y_0 + cy \in M$. Thus $\|x - y_0 - cy\| \ge \|x - y_0\|$. But,
$$\|x - y_0 - cy\|^2 = \|x - y_0\|^2 + c^2\|y\|^2 - 2\langle x - y_0, cy\rangle.$$
Thus,
$$c^2\|y\|^2 - 2\langle x - y_0, cy\rangle \ge 0.$$
Let $c = b\langle x - y_0, y\rangle$ for some $b \in \mathbb{R}$. Then
$$\langle x - y_0, cy\rangle = \bigl\langle x - y_0, b\langle x - y_0, y\rangle y\bigr\rangle = b\langle x - y_0, y\rangle^2.$$
Thus,
$$b^2\langle x - y_0, y\rangle^2\|y\|^2 - 2b\langle x - y_0, y\rangle^2 = \langle x - y_0, y\rangle^2\bigl(b^2\|y\|^2 - 2b\bigr) \ge 0.$$
But $b^2\|y\|^2 - 2b < 0$ if $b$ is small and positive. Thus $\langle x - y_0, y\rangle = 0$. □

4.7 Theorem (Hilbert Space Projection Theorem) Let $M$ be a closed subspace of a Hilbert space $H$. If $x \in H$, then $x$ has a unique representation $x = y + z$ where $y \in M$ and $z \in M^\perp$. Furthermore, $y$ is the projection of $x$ on $M$; that is, $y$ is the nearest point in $M$ to $x$.

Proof. Let $y_0$ be the projection of $x$ on $M$ (see Theorem 4.5) and let $y = y_0$ and $z = x - y_0$. Theorem 4.6 implies that $z \in M^\perp$. Thus such a representation exists. To prove uniqueness, let $x = y + z = y' + z'$ where $y, y' \in M$ and $z, z' \in M^\perp$. Then $y - y' \in M$ since $M$ is a subspace, and $y - y' \in M^\perp$ since $y - y' = z' - z$. Thus $y - y'$ is orthogonal to itself, which implies that $y = y'$. But then $z = z'$, which proves uniqueness. □
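For a concrete case of Theorem 4.7, take $H = \mathbb{R}^3$ with the dot product and $M = \mathcal{L}(\{u\})$, the span of a single nonzero vector $u$ (finite-dimensional, hence closed). Then $y = (\langle x, u\rangle/\langle u, u\rangle)\,u$ and $z = x - y$. The vectors and the name `decompose` below are illustrative assumptions; exact `Fraction` arithmetic makes $\langle z, u\rangle = 0$ hold exactly rather than up to rounding.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple


def dot(u: Sequence[Fraction], v: Sequence[Fraction]) -> Fraction:
    return sum((a * b for a, b in zip(u, v)), Fraction(0))


def decompose(x: Sequence[Fraction],
              u: Sequence[Fraction]) -> Tuple[List[Fraction], List[Fraction]]:
    """Write x = y + z with y in M = span{u} and z in M-perp (u must be nonzero)."""
    coeff = dot(x, u) / dot(u, u)  # y is the projection of x on M
    y = [coeff * ui for ui in u]
    z = [xi - yi for xi, yi in zip(x, y)]
    return y, z


u = [Fraction(1), Fraction(2), Fraction(2)]
x = [Fraction(3), Fraction(0), Fraction(1)]
y, z = decompose(x, u)
print(y, z, dot(z, u))
```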

4.8 Theorem (Riesz-Frechet) Consider a real Hilbert space $H$. Every bounded linear function $f: H \to \mathbb{R}$ may be expressed as an inner product on $H$. That is, every bounded linear function $f: H \to \mathbb{R}$ may be expressed in the form $f(h) = \langle h, z\rangle$ where $z \in H$ is uniquely determined by $f$ and has norm $\|z\| = \|f\|$.

Proof. If $f = 0$ then let $z = 0$ and note that $f(h) = \langle h, z\rangle$ and $\|z\| = \|f\| = 0$. Assume that $f \neq 0$ and note that then any such $z$ must be nonzero. Let $N(f)$ denote the null space of $f$; that is, $N(f)$ consists of those points $h$ in $H$ such that $f(h) = 0$. Note that $z$ must satisfy $z \perp N(f)$, since $\langle h, z\rangle = f(h) = 0$ for all $h \in N(f)$.

Note that $N(f)$ is a vector space. That is, if $u$ and $v$ are in $N(f)$ then, since $f$ is linear, $f(u + v) = f(u) + f(v) = 0$. Further, if $a$ is a scalar and if $u \in N(f)$ then $au \in N(f)$ since $f(au) = af(u) = 0$. Note also that $N(f)$ is closed since $f$ is a bounded, linear, and hence continuous, map. Thus, since $f \neq 0$ it follows that $N(f) \neq H$ and hence, via Theorem 4.7, that $N(f)^\perp \neq \{0\}$. Let $z_0$ be any nonzero element of $N(f)^\perp$, and let $v = f(x)z_0 - f(z_0)x$ for some fixed $x \in H$. Applying $f$ to each side implies that $f(v) = f(x)f(z_0) - f(z_0)f(x) = 0$. That is, $v \in N(f)$. Further, since $z_0 \in N(f)^\perp$ it follows that $\langle v, z_0\rangle = 0 = f(x)\langle z_0, z_0\rangle - f(z_0)\langle x, z_0\rangle$, which implies that $f(x)\|z_0\|^2 - f(z_0)\langle x, z_0\rangle = 0$. Solving for $f(x)$ implies that
$$f(x) = \frac{f(z_0)}{\|z_0\|^2}\langle x, z_0\rangle$$
where we recall that $\|z_0\|^2 > 0$. Finally, we may rewrite $f(x)$ as $\langle x, z\rangle$ where
$$z = \frac{f(z_0)}{\|z_0\|^2}\,z_0.$$


4.4 The Radon-Nikodym Theorem

4.9 Theorem Consider $\sigma$-finite measures $\mu$ and $\nu$ defined on a measurable space $(\Omega, \mathcal{F})$ such that any $\mu$-null set is also a $\nu$-null set. There exists an a.e. $[\mu]$ unique $\mathcal{F}$-measurable function $h: \Omega \to \mathbb{R}$ such that
$$\nu(F) = \int_F h\,d\mu$$
for all $F \in \mathcal{F}$.
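On a finite $\Omega$ with $\mathcal{F} = \mathcal{P}(\Omega)$, the function $h$ of Theorem 4.9 can be written down directly: $h(\omega) = \nu(\{\omega\})/\mu(\{\omega\})$ wherever $\mu(\{\omega\}) > 0$, and any value (zero below) on $\mu$-null points, which is exactly where the a.e. $[\mu]$ uniqueness enters. The measures and the names `radon_nikodym_finite` and `all_subsets` are illustrative assumptions; the check runs over every $F \subset \Omega$.

```python
from fractions import Fraction
from itertools import chain, combinations
from typing import Dict, Hashable, Iterable, Tuple


def radon_nikodym_finite(nu: Dict[Hashable, Fraction],
                         mu: Dict[Hashable, Fraction]) -> Dict[Hashable, Fraction]:
    """Density h with nu(F) = sum_{w in F} h(w) * mu({w}) on a finite Omega."""
    h = {}
    for w, m in mu.items():
        if m == 0:
            # A mu-null point must also be nu-null (the hypothesis of Theorem 4.9).
            if nu[w] != 0:
                raise ValueError("nu is not absolutely continuous with respect to mu")
            h[w] = Fraction(0)  # any value works here: h is unique only a.e. [mu]
        else:
            h[w] = nu[w] / m
    return h


def all_subsets(omega: Iterable[Hashable]) -> Iterable[Tuple[Hashable, ...]]:
    items = list(omega)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


mu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4), "d": Fraction(0)}
nu = {"a": Fraction(1), "b": Fraction(3), "c": Fraction(0), "d": Fraction(0)}
h = radon_nikodym_finite(nu, mu)
print(h)
```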

4.5 Caveats and Curiosities


5 Probability Theory

5.1 Introduction

Modern probability theory is a branch of measure theory that is distinguished by its special emphasis and applications. Much of the terminology of probability theory was established hundreds of years ago by people such as Pascal, Fermat, Bernoulli, Laplace, and Gauss. While this historical foundation provided much of the current vocabulary used in probability, it did not provide a rigorous mathematical basis for probability theory. Near the end of the nineteenth century, C. S. Peirce, the founder of pragmatism, wrote:

This branch of mathematics [probability] is the only one, I believe, in which good writers frequently get results entirely erroneous. In elementary geometry the reasoning is frequently fallacious, but erroneous conclusions are avoided; but it may be doubted if there is a single extensive treatise on probabilities in existence which does not contain solutions absolutely indefensible. This is partly owing to the want of any regular methods of procedure; for the subject involves too many subtleties to make it easy to put problems into equations without such aid.

At the beginning of the twentieth century measure theory was established primarily through the work of Henri Lebesgue. In 1929 Andrei Kolmogorov developed a measure-theoretical approach to probability theory and established probability theory as a rigorous mathematical theory.¹ Thus, much of the vocabulary of probability theory was established hundreds of years before the vocabulary of measure theory was established. Consequently, many concepts have different names when seen from the perspectives of probability theory and measure theory. For example, an event in probability theory is a measurable set in measure theory. On the other hand, there are concepts such as statistical independence in probability theory that have no analog in measure theory.

¹This incident seems to have been overlooked by a large part of the engineering community.

5.2 Random Variables and Distributions

Consider a probability space $(\Omega, \mathcal{F}, P)$; that is, consider a measure space $(\Omega, \mathcal{F}, P)$ such that $P(\Omega) = 1$. A real-valued $\mathcal{F}$-measurable function defined on $\Omega$ is said to be a random variable defined on $(\Omega, \mathcal{F}, P)$. That is, $X$ is a random variable if $X: (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Note that $X$ is a function and $X(\omega)$ is a real number. Note, also, that if $P_1$ and $P_2$ are probability measures defined on $(\Omega, \mathcal{F})$, then a random variable $X$ defined on $(\Omega, \mathcal{F}, P_1)$ is also a random variable defined on $(\Omega, \mathcal{F}, P_2)$. A random variable $X$ defined on $(\Omega, \mathcal{F}, P)$ is said to be a bounded random variable if there exists some real number $B$ such that $|X(\omega)| < B$ for all $\omega \in \Omega$.

The probability distribution function of a random variable $X$ is the function $F: \mathbb{R} \to [0, 1]$ defined by $F(x) = P(X \le x)$, where $P(X \le x)$ denotes the probability of the event $\{\omega \in \Omega : X(\omega) \le x\}$. (How do we know that this set is an event?) If several random variables are under consideration we may denote the distribution function of $X$ by $F_X$. A probability distribution function $F$ of a random variable $X$ satisfies the following properties:

1. $\lim_{x \to -\infty} F(x) = 0$.

2. $\lim_{x \to \infty} F(x) = 1$.

3. If $x < y$ then $F(x) \le F(y)$.

4. $F$ is right continuous.


[In statistics] you have the fact that the concepts are not very clean. The idea of probability, of randomness, is not a clean mathematical idea. You cannot produce random numbers mathematically. They can only be produced by things like tossing dice or spinning a roulette wheel. With a formula, any formula, the number you get would be predictable and therefore not random. So as a statistician you have to rely on some conception of a world where things happen in some way at random, a conception which mathematicians don't have. -Lucien LeCam

5. $P(x < X \le y) = F(y) - F(x)$ for $x < y$.

6. $P(X > x) = 1 - F(x)$.

7. $P(X = x) = F(x) - \lim_{y \uparrow x} F(y)$.

8. $P(X = x) = 0$ for $x \in \mathbb{R}$ if and only if $F$ is continuous at $x$.
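For a discrete example these properties can be checked directly. Assume, purely for illustration, that $X$ is the outcome of a fair die, so that $F(x) = k/6$ where $k$ is the number of faces not exceeding $x$. The left limit in Property 7 is approximated by evaluating $F(x - \varepsilon)$ for a small $\varepsilon$, which is adequate only because this $F$ is a step function whose jumps are a whole unit apart. The names `die_cdf` and `left_limit` are hypothetical.

```python
import math
from fractions import Fraction
from typing import Callable


def die_cdf(x: float) -> Fraction:
    """F(x) = P(X <= x) for X uniform on {1, 2, 3, 4, 5, 6}."""
    count = min(max(math.floor(x), 0), 6)  # number of faces k with k <= x
    return Fraction(count, 6)


def left_limit(F: Callable[[float], Fraction], x: float, eps: float = 1e-9) -> Fraction:
    """Crude stand-in for lim_{y -> x, y < x} F(y); valid here only because
    the jumps of die_cdf are a whole unit apart."""
    return F(x - eps)


print(die_cdf(3) - left_limit(die_cdf, 3))      # Property 7: P(X = 3)
print(die_cdf(2.5) - left_limit(die_cdf, 2.5))  # F continuous at 2.5, so P(X = 2.5) = 0
print(die_cdf(4) - die_cdf(2))                  # Property 5: P(2 < X <= 4)
print(1 - die_cdf(4))                           # Property 6: P(X > 4)
```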

Exercise 5.1 Consider a probability distribution function $F$. Show that $\lim_{x \to -\infty} F(x) = 0$.

Exercise 5.2 Consider a probability distribution function $F$. Show that $F$ is right continuous.

Exercise 5.3 Consider a random variable $X$ defined on a probability space $(\Omega, \mathcal{F}, P)$. Show that $P(X = x) = F(x) - \lim_{y \uparrow x} F(y)$ where $F$ is the probability distribution function of $X$.


Example 5.1 If $F$ is continuous then $P(X = x)$ is zero for each $x \in \mathbb{R}$. Does this mean that $X$ cannot take on the value $x$ for any $x \in \mathbb{R}$? No. Consider a dart that lands in a circular dart board of unit area in such a way that the probability that the dart lands in any particular circular region of the board is simply the area of that region. Since a single point on the board can be enclosed within a circle of arbitrarily small area, it follows that the probability of hitting any particular point is zero. Thus, even though our (idealized) dart will hit a point when thrown, the probability that it will hit that point is zero before it is thrown. □

A random variable is said to be discrete if it takes values only in some countable subset of $\mathbb{R}$. A probability distribution function is said to be atomic if it is continuous except at at most a countable number of points and if it is constant between any two adjacent points from the union of the set of discontinuities with $\{-\infty, \infty\}$. A probability distribution function $F$ is said to be absolutely continuous if there exists an integrable Borel measurable function $f$ such that
$$F(x) = \int_{-\infty}^{x} f(t)\,dt$$
for each $x \in \mathbb{R}$. If a function is absolutely continuous then it is continuous, but there do exist continuous functions that are not absolutely continuous. A probability distribution function $F$ is said to be singular if
$$\frac{d}{dx}F(x) = 0 \text{ a.e.}$$
with respect to Lebesgue measure. If a distribution function is atomic then it is singular, but there do exist singular distribution functions that are not atomic.

The following corollary relates our definition of a random variable to a definition that is frequently found in introductory texts.

5.1 Corollary Consider a measurable space $(\Omega, \mathcal{F})$ and let $X$ be a function mapping $\Omega$ to $\mathbb{R}$. It follows that $X^{-1}((-\infty, x]) \in \mathcal{F}$ for each $x \in \mathbb{R}$ if and only if $X^{-1}(\mathcal{B}(\mathbb{R})) \subset \mathcal{F}$.

Proof. It follows immediately that if $X^{-1}(\mathcal{B}(\mathbb{R})) \subset \mathcal{F}$ then $X^{-1}((-\infty, x]) \in \mathcal{F}$ for each $x \in \mathbb{R}$. Further, using Theorem 1.7 it follows that $X^{-1}(\mathcal{B}(\mathbb{R})) \subset \mathcal{F}$ if $X^{-1}((-\infty, x]) \in \mathcal{F}$ for each $x \in \mathbb{R}$. □

5.2 Corollary Any continuous function mapping $\mathbb{R}$ to $\mathbb{R}$ is Borel measurable.

Proof. Let $f: \mathbb{R} \to \mathbb{R}$ and recall that, for any subset $A$ of $\mathbb{R}$, $f^{-1}(A^c) = (f^{-1}(A))^c$. Next, assume that $f$ is continuous and recall from Theorem 2.4 that for any open subset $U$ of $\mathbb{R}$, $f^{-1}(U)$ is open. Thus, we see that for a closed subset $K$ of $\mathbb{R}$, $f^{-1}(K)$ is closed. Further, from Corollary 5.1 it follows that $f$ is Borel measurable if and only if for each $x \in \mathbb{R}$, $f^{-1}((-\infty, x]) \in \mathcal{B}(\mathbb{R})$. Note that for any $x \in \mathbb{R}$, $(-\infty, x]$ is a closed set since it is the complement of the open set $(x, \infty)$. Thus, for any $x \in \mathbb{R}$, $f^{-1}((-\infty, x])$ is closed and hence is a real Borel set. We thus conclude that $f$ is Borel measurable. □

I told him, I'm a scientist, we're objective. I told him a crash was improbable. I was trying to remember the exact probability when we smashed into the ground. -27-year-old botanist Wim Kodman trying to calm a friend as their jet flew through turbulence.

5.1 Theorem (The Lebesgue Decomposition Theorem) Any probability distribution function $F$ may be written in the form $F(x) = \alpha_1 F_1(x) + \alpha_2 F_2(x) + \alpha_3 F_3(x)$ where $\alpha_i \ge 0$ for each $i$, where $\alpha_1 + \alpha_2 + \alpha_3 = 1$, and where

1. $F_1$ is an atomic probability distribution function,

2. $F_2$ is an absolutely continuous probability distribution function, and

3. $F_3$ is a singular, continuous probability distribution function.


Note 5.1 The Cantor-Lebesgue function (which is developed by Example 2.6 on page 54 of Counterexamples in Probability and Real Analysis² by Gary Wise and Eric Hall) is an example of a distribution function that is continuous and singular. In particular, it is equal to zero at zero, is equal to one at one, and is nondecreasing and continuous, yet has a derivative that is almost everywhere equal to zero.
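The Cantor-Lebesgue function can be evaluated from ternary expansions. The following is a minimal Python sketch, assuming the standard construction (each ternary digit 2 becomes a binary digit 1, and the first ternary digit 1 ends the expansion); the function name `cantor` and the cutoff `depth` are illustrative choices, and floating-point arithmetic limits the precision to roughly double precision.

```python
def cantor(x: float, depth: int = 60) -> float:
    """Approximate the Cantor-Lebesgue function at x by reading ternary digits of x."""
    if x <= 0.0:
        return 0.0          # as a distribution function: 0 to the left of [0, 1]
    if x >= 1.0:
        return 1.0          # and 1 to the right of [0, 1]
    value, weight = 0.0, 0.5
    for _ in range(depth):
        x *= 3.0
        digit = int(x)      # next ternary digit of x: 0, 1, or 2
        x -= digit
        if digit == 1:
            # x lies in a removed middle third, where the function is constant
            return value + weight
        if digit == 2:
            value += weight # ternary digit 2 contributes binary digit 1
        weight /= 2.0
    return value

for point in (0.0, 0.25, 1 / 3, 0.5, 2 / 3, 1.0):
    print(point, cantor(point))
```

Since `cantor` returns 1/2 throughout the middle third $[1/3, 2/3]$, the function is constant there, which is one way to see why its derivative vanishes on each removed interval.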

Consider a random variable $X$ that possesses a probability distribution function $F$ that is absolutely continuous. There exists a nonnegative Borel measurable function $f$ mapping $\mathbb{R}$ to $\mathbb{R}$ such that
$$P(X \in A) = \int_A f(x)\, dx$$
for any real Borel set $A$. Such a function $f$ is called a probability density function of the random variable $X$ and exists if and only if the probability distribution function of $X$ is absolutely continuous. A probability density function $f$ for $X$ is often denoted by $f_X$. Note that if $X$ possesses an absolutely continuous distribution function $F$ then $F$ is a.e. differentiable with respect to Lebesgue measure, and $X$ possesses a probability density function given by the derivative of $F$ at points where the derivative exists and defined to be some nonnegative value (zero, for instance) at points where the derivative does not exist.
Let $X$ be a random variable with an absolutely continuous probability distribution function $F$ and a probability density function $f$. The function $f$ satisfies the following properties:

1. $F(x) = \int_{(-\infty, x]} f(s)\, ds$.

2. $\int_{\mathbb{R}} f(x)\, dx = 1$.

3. $P(a \le X \le b) = P(a < X < b)$.
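As a numerical sanity check of properties 1 and 2, the sketch below uses the exponential density $f(x) = e^{-x}$ for $x \ge 0$ (an example chosen here for illustration, not taken from the text) and a composite midpoint rule, which approximates the integrals by Riemann sums for this continuous density.

```python
import math

def f(x: float) -> float:
    """Exponential(1) density, used only as an example of an absolutely continuous law."""
    return math.exp(-x) if x >= 0 else 0.0

def F(x: float) -> float:
    """Its distribution function: F(x) = 1 - e^{-x} for x >= 0, and 0 otherwise."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def midpoint_integral(g, a: float, b: float, n: int = 100_000) -> float:
    """Approximate the integral of g over [a, b] with the composite midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# Property 1: F(2) is the integral of f over (-inf, 2]; f vanishes below 0.
print(midpoint_integral(f, 0.0, 2.0), F(2.0))
# Property 2: the total integral is 1 (the tail beyond 50 is below e^{-50}).
print(midpoint_integral(f, 0.0, 50.0))
```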

Note 5.2 Consider a random variable $X$ defined on a probability space $(\Omega, \mathcal{F}, P)$ and the corresponding measure $\mu_X$ defined on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ such that $\mu_X(B) = P(X \in B)$ for each $B \in \mathcal{B}(\mathbb{R})$. If $\mu_X$ is absolutely continuous with respect to Lebesgue measure $\lambda$ defined on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ then there exists a Radon-Nikodym derivative $d\mu_X/d\lambda$. A nonnegative version of this Radon-Nikodym derivative is known as a probability density function of $X$. Note from the Radon-Nikodym Theorem that such a function must be Borel measurable. Thus, there exist nonnegative integrable functions that integrate to one, yet which are not probability density functions (such as a nonnegative Lebesgue measurable function that integrates to one but is not Borel measurable).

²Oxford University Press, 1993.

5.3 Independence

Consider a probability space $(\Omega, \mathcal{F}, P)$. Recall that elements of $\mathcal{F}$ are said to be events. Two events $A$ and $B$ are said to be independent if $P(A \cap B) = P(A)P(B)$. Consider an index set $I$ and let $A_i$ be an event for each $i \in I$. The sets $\{A_i : i \in I\}$ are mutually independent³ if for every finite collection $\{i_1, i_2, \ldots, i_k\}$ of distinct indices from $I$ it follows that
$$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}) = P(A_{i_1})P(A_{i_2}) \cdots P(A_{i_k}).$$

The sets $\{A_i : i \in I\}$ are said to be pairwise independent if $P(A_i \cap A_j) = P(A_i)P(A_j)$ for all $i$ and $j$ from $I$ with $i \neq j$. If the index set $I$ contains only two elements then mutual independence and pairwise independence are equivalent. In general, however, pairwise independence is implied by, but does not imply, mutual independence.
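The standard counterexample showing that pairwise independence does not imply mutual independence can be checked by enumeration. This Python sketch assumes two tosses of a fair coin, with $A_1$ = "first toss is heads", $A_2$ = "second toss is heads", and $A_3$ = "the two tosses agree"; the particular events are an illustrative choice.

```python
from fractions import Fraction
from itertools import product

# Sample space: two tosses of a fair coin, each outcome with probability 1/4.
omega = list(product("HT", repeat=2))
prob = {w: Fraction(1, 4) for w in omega}

def P(event) -> Fraction:
    """Probability of an event given as a set of outcomes."""
    return sum(prob[w] for w in event)

A1 = {w for w in omega if w[0] == "H"}   # first toss is heads
A2 = {w for w in omega if w[1] == "H"}   # second toss is heads
A3 = {w for w in omega if w[0] == w[1]}  # the two tosses agree

pairwise = all(P(X & Y) == P(X) * P(Y) for X, Y in [(A1, A2), (A1, A3), (A2, A3)])
mutual = pairwise and P(A1 & A2 & A3) == P(A1) * P(A2) * P(A3)
print(pairwise, mutual)  # True False
```

Each pair intersects only in the outcome HH, giving probability 1/4 = (1/2)(1/2), but the triple intersection is also HH, with probability 1/4 rather than 1/8.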

Note 5.3 Consider three events $A_1$, $A_2$, and $A_3$. The following chart illustrates the difference between pairwise independence and mutual independence of the three events.

Pairwise independence requires:
$P(A_1 \cap A_2) = P(A_1)P(A_2)$,
$P(A_2 \cap A_3) = P(A_2)P(A_3)$,
$P(A_1 \cap A_3) = P(A_1)P(A_3)$.
Mutual independence requires these three equations and, in addition,
$P(A_1 \cap A_2 \cap A_3) = P(A_1)P(A_2)P(A_3)$.

³Many authors omit the word "mutually," but we prefer to retain it as a way of reinforcing the distinction between mutual independence and pairwise independence.

Consider a probability space $(\Omega, \mathcal{F}, P)$. Let $\mathcal{F}_1, \mathcal{F}_2, \ldots, \mathcal{F}_n$ be subsets (not necessarily $\sigma$-subalgebras) of $\mathcal{F}$. (That is, each $\mathcal{F}_i$ is a collection of events.) The collections $\mathcal{F}_1, \mathcal{F}_2, \ldots, \mathcal{F}_n$ are said to be mutually independent if, given any $A_1 \in \mathcal{F}_1$, any $A_2 \in \mathcal{F}_2$, ..., and any $A_n \in \mathcal{F}_n$, it follows that $A_1, A_2, \ldots, A_n$ are mutually independent.

[Cantor's theory] seems to me the most admirable fruit of the mathematical mind and indeed one of the highest achievements of man's intellectual processes.... No one shall expel us from the paradise which Cantor has created for us. -David Hilbert

Let $X$ be a random variable defined on $(\Omega, \mathcal{F}, P)$. We define the $\sigma$-algebra generated by $X$ (denoted by $\sigma(X)$) to be the smallest $\sigma$-subalgebra of $\mathcal{F}$ with respect to which $X$ is measurable. That is, $\sigma(X) = X^{-1}(\mathcal{B}(\mathbb{R}))$. For a collection $X_1, \ldots, X_n$ of random variables we will let $\sigma(X_1, \ldots, X_n)$ denote the smallest $\sigma$-algebra with respect to which $X_1, \ldots, X_n$ are each measurable. Note that

Random variables $X_1, X_2, \ldots, X_n$ defined on $(\Omega, \mathcal{F}, P)$ are said to be mutually independent if $\sigma(X_1), \sigma(X_2), \ldots, \sigma(X_n)$ are mutually independent collections of events.


5.2 Theorem For an integer $n > 1$, consider mutually independent random variables $X_1, X_2, \ldots, X_n$ defined on a common probability space. Let $m$ be a positive integer such that $m < n$. Further, consider two functions $f$ and $g$ such that $f : (\mathbb{R}^m, \mathcal{B}(\mathbb{R}^m)) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ and $g : (\mathbb{R}^{n-m}, \mathcal{B}(\mathbb{R}^{n-m})) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$. The random variables $f(X_1, \ldots, X_m)$ and $g(X_{m+1}, \ldots, X_n)$ are independent.

5.3 Theorem (The Second Borel-Cantelli Lemma) Consider a probability space $(\Omega, \mathcal{F}, P)$. If $\{A_n\}_{n \in \mathbb{N}}$ is a sequence of mutually independent events and if
$$\sum_{n=1}^{\infty} P(A_n) = \infty$$
then $P(\limsup A_n) = 1$.

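Proof. Since $\limsup A_n = (\liminf A_n^c)^c$ and since $P(A) + P(A^c) = 1$ for any event $A$, the desired result will follow if we show that $P(\liminf A_n^c) = 0$. Recall that
$$\liminf A_n^c = \bigcup_{k=1}^{\infty} \bigcap_{m=k}^{\infty} A_m^c.$$
By countable subadditivity it follows that
$$P(\liminf A_n^c) \le \sum_{k=1}^{\infty} P\left(\bigcap_{m=k}^{\infty} A_m^c\right).$$
Thus, the desired result will follow if we show that
$$P\left(\bigcap_{m=k}^{\infty} A_m^c\right) = 0$$
for all $k \in \mathbb{N}$. Let $j \in \mathbb{N}$ and note that
$$P\left(\bigcap_{n=k}^{k+j} A_n^c\right) = \prod_{n=k}^{k+j} P(A_n^c) = \prod_{n=k}^{k+j} (1 - P(A_n))$$
via independence of the $A_n$'s (and hence of the $A_n^c$'s). Note that $1 - x \le e^{-x}$ for all $x \in \mathbb{R}$ and, in particular, for all $x \in [0, 1]$.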
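Thus, it follows that
$$P\left(\bigcap_{n=k}^{k+j} A_n^c\right) = \prod_{n=k}^{k+j} (1 - P(A_n)) \le \prod_{n=k}^{k+j} \exp(-P(A_n)) = \exp\left(-\sum_{n=k}^{k+j} P(A_n)\right).$$
Since $\sum_{n=1}^{\infty} P(A_n) = \infty$ we see that
$$\exp\left(-\sum_{n=k}^{k+j} P(A_n)\right) \to 0$$
and hence that
$$P\left(\bigcap_{n=k}^{k+j} A_n^c\right) \to 0$$
as $j \to \infty$ for any $k \in \mathbb{N}$. Since
$$\bigcap_{n=k}^{k+j} A_n^c \to \bigcap_{n=k}^{\infty} A_n^c$$
as $j \to \infty$, the desired result follows from Lemma 2.1. □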
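The key inequality in the proof, $\prod (1 - P(A_n)) \le \exp(-\sum P(A_n))$, can be checked exactly for a hypothetical choice $P(A_n) = 1/n$ for $n \ge 2$, whose sum diverges. In that case the product telescopes to $(k-1)/(k+j)$, which tends to zero as $j \to \infty$, just as the proof predicts.

```python
import math
from fractions import Fraction

def prob_no_event(k: int, j: int) -> Fraction:
    """P(A_k^c ∩ ... ∩ A_{k+j}^c) for independent events with (hypothetical) P(A_n) = 1/n."""
    result = Fraction(1)
    for n in range(k, k + j + 1):
        result *= 1 - Fraction(1, n)   # independence: multiply the P(A_n^c)
    return result

def exp_bound(k: int, j: int) -> float:
    """The upper bound exp(-sum of P(A_n)) used in the proof."""
    return math.exp(-sum(1.0 / n for n in range(k, k + j + 1)))

k = 2
for j in (10, 100, 1000):
    exact = prob_no_event(k, j)      # telescopes to (k - 1) / (k + j)
    print(j, float(exact), exp_bound(k, j))
```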

Example 5.2 An adaptive communications system transmits blocks of bits where each block contains a fixed number of bits. Let $X_n$ equal 1 or 0 depending on whether an error occurs or does not occur in block $n$, respectively. Assume that the $X_n$'s are mutually independent. Further, let $p_n$ denote the probability that $X_n = 1$. In this example we will derive a condition on the $p_n$'s that is necessary and sufficient for there to be almost surely only a finite number of errors.
Let $E_n$ denote the event that the $n$th block of data has an error. That is, let $E_n$ denote the event that $X_n = 1$. Thus, if $\Omega$ is the set of all possible sequences of received bits, then $\omega \in E_n$ if and only if sequence $\omega$ contains an error in block $n$. Note that $\limsup E_n$ is the event that infinitely many errors occur.
That is, $\limsup E_n$ is the set of $\omega$ such that $\omega \in E_n$ for infinitely many different values of $n$. Thus, our problem is to determine a condition that is both necessary and sufficient to ensure that $P(\limsup E_n) = 0$. If this probability is zero then with probability one there will be only a finite number of errors.
By the first Borel-Cantelli lemma we know that if $\sum_{n=1}^{\infty} p_n < \infty$ then $P(\limsup E_n) = 0$. Further, since the $E_n$'s are mutually independent, the second Borel-Cantelli lemma implies that if $\sum_{n=1}^{\infty} P(E_n) = \infty$ then $P(\limsup E_n) = 1$. That is, there will almost surely be infinitely many errors. Thus, it is necessary that $\sum_{n=1}^{\infty} p_n < \infty$ for there to be almost surely only a finite number of errors. We conclude that there are almost surely only finitely many errors if and only if $\sum_{n=1}^{\infty} p_n < \infty$. □
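The dichotomy in Example 5.2 can be made concrete by tracking $\sum_{n \le N} p_n$, which equals the expected number of errors in the first $N$ blocks by linearity of expectation. The two error-probability sequences below, $p_n = 1/n^2$ and $p_n = 1/n$, are hypothetical choices for illustration: the first partial sums stay below $\pi^2/6$, while the second grow without bound (roughly like $\ln N$).

```python
import math

def expected_errors(p, n_blocks: int) -> float:
    """Sum of p(1), ..., p(n_blocks): the expected number of errors in those blocks."""
    return sum(p(n) for n in range(1, n_blocks + 1))

def p_summable(n: int) -> float:
    return 1.0 / n**2   # hypothetical: the full series sums to pi^2 / 6

def p_divergent(n: int) -> float:
    return 1.0 / n      # hypothetical: the harmonic series diverges

for n_blocks in (10, 1_000, 100_000):
    print(n_blocks,
          expected_errors(p_summable, n_blocks),
          expected_errors(p_divergent, n_blocks),
          math.log(n_blocks))
```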

Example 5.3 Although we know that all years are not of equal length, and although we might suspect that all days of the year are not equally likely to be birthdays, we will nevertheless make the simplifying assumptions that all years have 365 days and that each day is equally likely to be a birthday. This example is concerned with the probability that two or more people in a given group share a common birthday. It is easier to calculate the probability that all of the birthdays are different. Note that for two people, the probability of no common birthday is given by $1 - (1/365)$; that is, the first person has some birthday, and the second person then has 364 possible days for a noncommon birthday. Further, for three people, the probability of no common birthday is given by $(1 - (1/365))(1 - (2/365))$, and, for four people, the probability of no common birthday is given by $(1 - (1/365))(1 - (2/365))(1 - (3/365))$. Continuing in this way, we see that for $n$ people (where $n$ is a positive integer less than 365), the probability of no common birthday is given by
$$\left(1 - \frac{1}{365}\right)\left(1 - \frac{2}{365}\right)\left(1 - \frac{3}{365}\right) \times \cdots \times \left(1 - \frac{n-1}{365}\right).$$
Checking this numerically, we find that for $n = 23$, this probability is less than 1/2. Thus, for 23 or more people, the probability that at least two have a common birthday exceeds 1/2. □
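The numerical check can be carried out with exact rational arithmetic. A minimal Python sketch, under the same assumptions of 365 equally likely days:

```python
from fractions import Fraction

def prob_no_common_birthday(n: int, days: int = 365) -> Fraction:
    """Exact value of (1 - 1/days)(1 - 2/days)...(1 - (n-1)/days)."""
    result = Fraction(1)
    for i in range(1, n):
        result *= 1 - Fraction(i, days)
    return result

# Smallest group size for which a shared birthday is more likely than not.
n = 1
while prob_no_common_birthday(n) >= Fraction(1, 2):
    n += 1
print(n, float(prob_no_common_birthday(n)))  # 23 and roughly 0.4927
```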


5.4 The Binomial Distribution


Consider a finite sequence of $n$ terms taking the values H and T. Let $N(n, k)$ denote the number of such sequences of length $n$ having exactly $k$ H's. Note that if we know this quantity for sequences of length $n - 1$ then we see that the sequences of length $n$ that have exactly $k$ H's consist of those which have exactly $k$ H's in the first $n - 1$ terms and a T for the $n$th term, together with those that have $k - 1$ H's in the first $n - 1$ terms and an H in the $n$th term. Hence, $N(n, k) = N(n-1, k) + N(n-1, k-1)$. We now show by induction that
$$N(n, k) = \frac{n!}{k!(n-k)!}.$$
(We use the convention that zero factorial is one.) Assume that this expression is correct for $n - 1$. Then, for $1 \le k \le n - 1$,
$$N(n, k) = \frac{(n-1)!}{k!(n-1-k)!} + \frac{(n-1)!}{(k-1)!(n-k)!} = \frac{n!}{k!(n-k)!}\left(\frac{n-k}{n} + \frac{k}{n}\right) = \frac{n!}{k!(n-k)!}.$$

Note that for $k = 0$ or for $k = n$, it follows straightforwardly that $N(n, 0) = 1$ and $N(n, n) = 1$. For $n = 1$ we have that $N(1, 0) = 1$ and $N(1, 1) = 1$. Thus, the general result follows by induction, and we conclude that the number of ways of selecting $k$ items from a set of $n$ items is given by
$$\frac{n!}{k!(n-k)!},$$
which is denoted by
$$\binom{n}{k}$$
and read as "$n$ choose $k$."
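The recurrence $N(n, k) = N(n-1, k) + N(n-1, k-1)$ alone is enough to tabulate these counts (the result is Pascal's triangle). The sketch below builds the table from the recurrence, starting from the single empty sequence of length 0 (a convenient base case chosen here, consistent with the text's base case $n = 1$), so that it can be compared against Python's `math.comb`.

```python
import math

def count_sequences(n_max: int) -> list[list[int]]:
    """Table N[n][k] of H/T sequences of length n with exactly k H's, built from the recurrence."""
    N = [[1]]  # length 0: exactly one (empty) sequence, containing 0 H's
    for n in range(1, n_max + 1):
        prev = N[-1]
        row = [
            (prev[k] if k < n else 0)           # k H's in the first n-1 terms, then a T
            + (prev[k - 1] if k >= 1 else 0)    # k-1 H's in the first n-1 terms, then an H
            for k in range(n + 1)
        ]
        N.append(row)
    return N

table = count_sequences(10)
print(table[5])  # [1, 5, 10, 10, 5, 1]
```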


He uses statistics as a drunken man uses lampposts, for support rather than illumination. -Andrew Lang

In many of the elementary aspects of probability, one considers a sequence of mutually independent trials, each of which results only in success or failure, such that the probability of success is fixed from trial to trial. Such trials are called Bernoulli trials.
Consider a finite sequence of $n$ Bernoulli trials where the probability of success on a trial is given by $p$. We model the underlying probability space as the set of all sequences of length $n$ consisting of S's and F's. Let $q = 1 - p$. We assign a sequence of $k$ S's and $n - k$ F's to have probability $p^k q^{n-k}$. Now, consider the probability of getting exactly $k$ S's in $n$ trials. Each such sequence has probability $p^k q^{n-k}$. Further, there are $\binom{n}{k}$ such sequences. Since probability measures are countably additive (and hence finitely additive), it follows that to find the probability of obtaining exactly $k$ S's in $n$ trials, we simply multiply the common probability of one such sequence by the total number of such sequences. Hence, the probability of obtaining exactly $k$ S's in $n$ trials is given by
$$\binom{n}{k} p^k q^{n-k}.$$

Further, note that the event of having exactly $k_1$ successes is disjoint from the event of having exactly $k_2$ successes if $k_1 \neq k_2$. Thus, we see that the probability of having no more than $r$ successes in $n$ trials is given by
$$\sum_{k=0}^{r} \binom{n}{k} p^k q^{n-k}.$$
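A random variable $X$ taking values in the set $\{0, 1, \ldots, n\}$ for some positive integer $n$ such that
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \quad \text{for } k = 0, 1, \ldots, n$$
for some $p \in [0, 1]$ is said to have a binomial distribution with parameters $p$ and $n$.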
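A direct Python translation of the binomial probabilities and of the "no more than $r$ successes" sum; the function names and the parameter values $n = 10$, $p = 0.3$ are illustrative.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) = C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_cdf(r: int, n: int, p: float) -> float:
    """P(X <= r): sum over the disjoint events {X = k} for k = 0, ..., r."""
    return sum(binomial_pmf(k, n, p) for k in range(r + 1))

n, p = 10, 0.3
print(sum(binomial_pmf(k, n, p) for k in range(n + 1)))  # total probability, 1 up to rounding
print(binomial_cdf(3, n, p))
```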


5.4.1 The Poisson Approximation to the Binomial Distribution

Let $b(k; n, p)$ denote the probability of obtaining exactly $k$ successes in $n$ Bernoulli trials where $p$ denotes the probability of success. It is common to deal with a binomial distribution where, relatively speaking, the parameter $n$ is large and the parameter $p$ is small, and yet the product $\lambda = np$ is positive and of moderate size. In such cases it is often convenient to use an approximation that is due to Poisson.
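For $k = 0$ it follows that
$$b(0; n, p) = (1 - p)^n = \left(1 - \frac{\lambda}{n}\right)^n.$$
Taking logarithms and using Taylor's expansion yields
$$\ln b(0; n, p) = n \ln\left(1 - \frac{\lambda}{n}\right) = -\lambda - \frac{\lambda^2}{2n} - \cdots.$$
Thus, for large $n$, it follows that $b(0; n, p) \approx e^{-\lambda}$. Alternatively, we could have obtained this result by recalling that for fixed $\lambda$,
$$\lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^n = e^{-\lambda}.$$
Also, for any fixed positive integer $k$, it follows that for sufficiently large $n$,
$$\frac{b(k; n, p)}{b(k-1; n, p)} = \frac{\lambda - (k-1)p}{k(1-p)} \approx \frac{\lambda}{k}.$$
From this we successively conclude that
$$b(1; n, p) \approx \lambda\, b(0; n, p) \approx \lambda e^{-\lambda}$$
and
$$b(2; n, p) \approx \frac{1}{2}\lambda\, b(1; n, p) \approx \frac{1}{2}\lambda^2 e^{-\lambda}.$$
Induction thus implies that
$$b(k; n, p) \approx \frac{\lambda^k}{k!} e^{-\lambda}.$$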
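This is the classical Poisson approximation to the binomial distribution.
Let
$$p(k; \lambda) = e^{-\lambda} \frac{\lambda^k}{k!}.$$
We have shown that $p(k; \lambda)$ is an approximation for $b(k; n, p)$ when $n$ is sufficiently large and $\lambda = np$ is held fixed. Note that
$$\sum_{k=0}^{\infty} p(k; \lambda) = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = 1.$$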
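To see the approximation numerically, the sketch below compares $b(k; n, p)$ with $p(k; \lambda)$ for the illustrative values $n = 1000$ and $p = 0.003$ (so $\lambda = 3$); these values are assumptions chosen to fit the "large $n$, small $p$" regime described above.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """b(k; n, p) = C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """p(k; lambda) = e^{-lambda} lambda^k / k!"""
    return math.exp(-lam) * lam**k / math.factorial(k)

n, p = 1000, 0.003          # illustrative values: n large, p small
lam = n * p                 # lambda = np = 3
for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 5), round(poisson_pmf(k, lam), 5))

# Largest pointwise gap over k = 0, ..., 20.
worst = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(21))
print(worst)
```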

5.5 Multivariate Distributions

For a positive integer $n$, consider random variables $X_1, X_2, \ldots, X_n$ defined on a probability space $(\Omega, \mathcal{F}, P)$. The joint probability distribution function of $X_1, \ldots, X_n$ is the function $F : \mathbb{R}^n \to [0, 1]$ defined by $F(x_1, \ldots, x_n) = P(X_1 \le x_1 \text{ and } \ldots \text{ and } X_n \le x_n)$. We will often denote the function $F$ by $F_{X_1, \ldots, X_n}$ when the particular random variables of interest are not clear from context.
The random variables $X_1, \ldots, X_n$ possess a joint probability density function $f$ if there exists a nonnegative Borel measurable function $f : \mathbb{R}^n \to \mathbb{R}$ such that
$$P((X_1, X_2, \ldots, X_n) \in A) = \int_A f(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n$$
for all $A \in \mathcal{B}(\mathbb{R}^n)$. Note that the integral of $f$ over $\mathbb{R}^n$ is equal to 1.
For a positive integer $n$ consider random variables $X_1, \ldots, X_n$ defined on the same probability space and possessing a joint probability density function $f_{X_1, \ldots, X_n}$. For any positive integer $i \le n$, the random variable $X_i$ possesses a probability density function $f_{X_i}$ given by
$$f_{X_i}(x_i) = \int_{\mathbb{R}^{n-1}} f_{X_1, \ldots, X_n}(x_1, \ldots, x_n)\, dx_1 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_n.$$
A density function obtained in this way is called a marginal density function.

5.4 Theorem For a positive integer $n$ consider random variables $X_1, \ldots, X_n$ defined on the same probability space. The random variables $X_1, \ldots, X_n$ are mutually independent if and only if $F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = F_{X_1}(x_1) \cdots F_{X_n}(x_n)$ for all $x_1, \ldots, x_n \in \mathbb{R}$.

5.5 Theorem For a positive integer $n$ consider random variables $X_1, \ldots, X_n$ defined on the same probability space and possessing a joint probability density function $f_{X_1, \ldots, X_n}$. The random variables $X_1, \ldots, X_n$ are mutually independent if and only if $f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = f_{X_1}(x_1) \cdots f_{X_n}(x_n)$ a.e. with respect to Lebesgue measure on $\mathcal{B}(\mathbb{R}^n)$.

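A random variable $X$ is said to have a uniform distribution on an interval $[a, b]$ (where $a < b$) if
$$F_X(x) = \begin{cases} 0 & \text{if } x < a \\ \dfrac{x - a}{b - a} & \text{if } a \le x \le b \\ 1 & \text{if } x > b. \end{cases}$$
Note that a density function for $X$ is given by
$$f_X(x) = \frac{1}{b - a} I_{[a, b]}(x).$$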

If we knew that the outcome of an experiment resulted in values from some interval $[a, b]$ but had no reason to believe that those values would tend to concentrate toward any particular part of that interval then we might choose to model the experiment via a uniform distribution.

Example 5.4 In this example we will consider a problem known as Buffon's needle problem, which was an early example of a problem-solving technique called Monte Carlo analysis, in which a nonprobabilistic problem is solved using probabilistic techniques. Consider a plane that is ruled by the lines $y = n$ for $n \in \mathbb{Z}$ and onto which a needle of unit length is cast randomly.
What is the probability that the needle intersects one of the ruled lines?
Let $(X, Y)$ denote the coordinates of the center of the needle and let $\Theta$ denote the angle between the needle and the $x$ axis. Let $Z$ denote the distance from the needle's center to the nearest line beneath it. Note that $Z = Y - \lfloor Y \rfloor$ where $\lfloor x \rfloor$ (the floor of $x$) denotes the greatest integer not greater than $x$.
We will model the statement "needle is cast randomly" via the following assumptions:

1. $Z$ is uniformly distributed on $[0, 1]$.

2. $\Theta$ is uniformly distributed on $[0, \pi]$.

3. $Z$ and $\Theta$ are independent.

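Note that these assumptions imply that
$$f_{Z,\Theta}(z, \theta) = f_Z(z) f_\Theta(\theta) = \frac{1}{\pi} I_{[0,1]}(z) I_{[0,\pi]}(\theta).$$
For what values of $z$ and $\theta$ will the needle intersect the line immediately above its center? That line lies at distance $1 - z$ above the center, while the upper end of the needle rises a distance $\frac{1}{2}\sin\theta$ above the center. If $z < 1/2$ then the needle cannot intersect the line above its center. Assume then that $1/2 \le z \le 1$. In this case the needle intersects the line directly above its center if and only if $\theta_0 \le \theta \le \pi - \theta_0$ where $\theta_0 = \sin^{-1}(2(1 - z))$. Thus, substituting $y = 1 - z$, the probability that the needle intersects the line above it is given by
$$\begin{aligned}
\frac{1}{\pi} \int_{1/2}^{1} \int_{\sin^{-1}(2(1-z))}^{\pi - \sin^{-1}(2(1-z))} d\theta\, dz
&= \frac{1}{2} - \frac{2}{\pi} \int_0^{1/2} \sin^{-1}(2y)\, dy \\
&= \frac{1}{2} - \frac{2}{\pi} \left[ y \sin^{-1}(2y) + \frac{1}{2}\sqrt{1 - 4y^2} \right]_0^{1/2} \\
&= \frac{1}{2} - \frac{2}{\pi}\left(\frac{\pi}{4} - \frac{1}{2}\right) = \frac{1}{\pi}.
\end{aligned}$$
By symmetry the needle has the same probability of hitting the line directly beneath its center. Since the needle can hit both lines only if $\sin\theta = 1$ and $z = 1/2$, an event of probability zero, the probability that the needle hits any line on the grid is given by $2/\pi$.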
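The value $1/\pi$ for the line above the center can be double-checked by evaluating the first integral numerically. This sketch uses a composite midpoint rule (a choice made here; any accurate quadrature would do) applied to the length of the $\theta$-interval, $\pi - 2\sin^{-1}(2(1 - z))$, for $z \in [1/2, 1]$.

```python
import math

def hit_above_probability(n: int = 200_000) -> float:
    """Midpoint-rule value of (1/pi) * integral over z in [1/2, 1] of (pi - 2 asin(2(1 - z)))."""
    a, b = 0.5, 1.0
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        z = a + (i + 0.5) * h
        theta0 = math.asin(2.0 * (1.0 - z))    # hit iff theta0 <= theta <= pi - theta0
        total += (math.pi - 2.0 * theta0) * h  # length of that theta-interval times dz
    return total / math.pi

value = hit_above_probability()
print(value, 1 / math.pi)
```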


Note that this experiment can be used to obtain an estimate of the numerical value of $\pi$. That is, throw the needle $N$ times and count the number of times $H$ that the needle hits a line. The ratio $2N/H$ should be close to $\pi$ for large values of $N$. Indeed, we will show later that this ratio converges to $\pi$. Solving a deterministic problem via probabilistic techniques is an example of a technique known as Monte Carlo simulation. □
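A simulation of the experiment just described, under model assumptions 1 to 3 of Example 5.4. Note that the sketch draws $\Theta$ uniformly from $[0, \pi]$ using the library constant `math.pi`, so it illustrates the behavior of $2N/H$ rather than providing an independent computation of $\pi$; the seed and the number of throws are arbitrary choices.

```python
import math
import random

def buffon_estimate(throws: int, seed: int = 0) -> float:
    """Estimate pi as 2N/H with Z ~ U[0, 1] and Theta ~ U[0, pi] independent."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(throws):
        z = rng.uniform(0.0, 1.0)           # distance from the center to the line below
        theta = rng.uniform(0.0, math.pi)   # angle between the needle and the x axis
        reach = 0.5 * math.sin(theta)       # vertical half-extent of a unit needle
        if reach >= z or reach >= 1.0 - z:  # hits the line below or the line above
            hits += 1
    return 2 * throws / hits

estimate = buffon_estimate(200_000)
print(estimate)
```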

5.6 Caratheodory Extension Theorem

Let $\Omega$ be a nonempty set, and let $\mathcal{A}$ be an algebra of subsets of $\Omega$. That is, $\mathcal{A}$ is a nonempty set of subsets of $\Omega$ that is closed under the operations of taking complements and finite unions. Recall that it follows from DeMorgan's Law that an algebra is also closed under the operation of taking finite intersections. Further, recall that an algebra on $\Omega$ contains both the empty set and the set $\Omega$.
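By a measure $\lambda$ on an algebra $\mathcal{A}$ we mean a function $\lambda$ defined on $\mathcal{A}$ and taking values in $[0, \infty]$ that satisfies the following two properties:

1. $\lambda(\emptyset) = 0$ and, for $A \in \mathcal{A}$, $\lambda(A) \ge 0$.

2. If $\{A_n\}_{n \in \mathbb{N}}$ is a sequence of disjoint sets in $\mathcal{A}$ whose union $\bigcup_{n \in \mathbb{N}} A_n$ also belongs to $\mathcal{A}$, then
$$\lambda\left(\bigcup_{n \in \mathbb{N}} A_n\right) = \sum_{n \in \mathbb{N}} \lambda(A_n).$$

Note that when a countable union of disjoint sets in the algebra is itself in the algebra, we require the measure on the algebra to be countably additive on that union, just as a measure on a $\sigma$-algebra must be.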


Let $\Omega$ be a nonempty set. A function $M$ defined on $\mathcal{P}(\Omega)$ and taking values in $[0, \infty]$ is called an outer measure if it satisfies the following three properties:

1. $M(A) \ge 0$ for all $A \in \mathcal{P}(\Omega)$, and $M(\emptyset) = 0$.

2. $M(A) \le M(B)$ if $A \subset B \subset \Omega$.

3. $M\left(\bigcup_{k=1}^{\infty} A_k\right) \le \sum_{k=1}^{\infty} M(A_k)$ for any sequence $\{A_k\}_{k \in \mathbb{N}}$ of subsets of $\Omega$.

As an example of an outer measure, note that Lebesgue outer measure is an outer measure on $\mathcal{P}(\mathbb{R})$. Further, note that Dirac measure at a fixed point of a set is an outer measure on the family of all subsets of the set of interest.
As with Lebesgue outer measure, it is possible to use an outer measure to characterize a family of measurable sets. In doing so, we base the definition of measurability on Caratheodory's condition. For a given outer measure $M$, we say that a subset $S$ of $\Omega$ is measurable if $M(A) = M(A \cap S) + M(A \setminus S)$ for any subset $A$ of $\Omega$. This condition has somewhat of an artificial touch to it. It almost seems mysterious, since it is not in the least intuitive. Indeed, it singles out those subsets $S$ of $\Omega$ that split every subset $A$ of $\Omega$ into two pieces, $A \cap S$ and $A \setminus S$, on which the outer measure adds. Note that a subset $S$ of $\Omega$ is measurable if and only if $M(A_1 \cup A_2) = M(A_1) + M(A_2)$ whenever $A_1 \subset S$ and $A_2 \subset S^c$.
Note that it follows from property (3) of an outer measure that $M(A) \le M(A \cap S) + M(A \setminus S)$. Hence, we see that a subset $S$ of $\Omega$ is measurable if and only if, for any subset $A$ of $\Omega$, it follows that $M(A) \ge M(A \cap S) + M(A \setminus S)$. Now, it follows almost immediately that if $M$ is an outer measure on $\mathcal{P}(\Omega)$ and if $Z$ is a subset of $\Omega$ such that $M(Z) = 0$, then $Z$ is measurable. To see this, let $Z$ be such a set and let $A$ be any subset of $\Omega$. Then we have that $M(A \cap Z) + M(A \setminus Z) \le M(Z) + M(A)$ by property (2) of outer measures. Then, since $M(Z) = 0$, we have that $M(A \cap Z) + M(A \setminus Z) \le M(A)$, which characterizes $Z$ as being measurable since it is always true that $M(A \cap Z) + M(A \setminus Z) \ge M(A)$ by property (3) of outer measures.
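For a finite $\Omega$, Caratheodory's condition can be checked by brute force over all subsets. The sketch below does this for Dirac measure at a point (mentioned above) and for the outer measure that assigns 1 to every nonempty set and 0 to the empty set, a hypothetical example that is easily checked against properties (1) to (3). For the latter, only $\emptyset$ and $\Omega$ turn out to be measurable, since any other $S$ splits $\Omega$ into two nonempty pieces whose outer measures sum to 2.

```python
from itertools import combinations

def subsets(omega):
    """All subsets of a finite set, as frozensets."""
    items = sorted(omega)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def measurable_sets(omega, M):
    """Subsets S with M(A) = M(A & S) + M(A - S) for every subset A (Caratheodory's condition)."""
    family = subsets(omega)
    return [S for S in family if all(M(A) == M(A & S) + M(A - S) for A in family)]

omega = frozenset({1, 2, 3})

def dirac(A):       # Dirac measure at the point 1
    return 1 if 1 in A else 0

def nonempty(A):    # hypothetical outer measure: 1 on nonempty sets, 0 on the empty set
    return 1 if A else 0

print(len(measurable_sets(omega, dirac)))   # 8: every subset is measurable
print(measurable_sets(omega, nonempty))     # only the empty set and omega
```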


If $M$ is an outer measure on the subsets of $\Omega$, and if $A$ is a measurable set, then $M(A)$ is called the $M$-measure, or simply the measure, of $A$. This terminology is justified by the next theorem. Before presenting that theorem, however, we will present a lemma that will be of use in proving the theorem of interest.

5.1 Lemma Let $\Omega$ be a nonempty set and let $M$ be an outer measure on the subsets of $\Omega$. If $A_1$ and $A_2$ are measurable, then so is $A_1 \setminus A_2$.

Proof. We will show that $M(A \cup B) = M(A) + M(B)$ whenever $A \subset (A_1 \setminus A_2)$ and $B \subset (A_1 \setminus A_2)^c$. Since $B = (B \cap A_2) \cup (B \setminus A_2)$, it follows that $A \cup B = (A \cup (B \setminus A_2)) \cup (B \cap A_2)$. Hence, since $A \cup (B \setminus A_2) \subset A_2^c$ and $(B \cap A_2) \subset A_2$, it follows from the measurability of $A_2$ that $M(A \cup B) = M(A \cup (B \setminus A_2)) + M(B \cap A_2)$. However, $A \subset A_1$ and $(B \setminus A_2) \subset (A_1 \setminus A_2)^c \setminus A_2 \subset A_1^c$. Therefore, since $A_1$ is measurable, $M(A \cup (B \setminus A_2)) = M(A) + M(B \setminus A_2)$. Combining equalities and using the measurability of $A_2$ we see that $M(A \cup B) = M(A) + M(B \setminus A_2) + M(B \cap A_2) = M(A) + M(B)$, and the lemma is proved. □

As before, let $\Omega$ be a nonempty set. If $S$ is a subset of $\Omega$, then any family of subsets of $\Omega$ whose union contains $S$ as a subset is known as a cover of $S$. A countable cover of $S$ is a cover of $S$ that is countable.

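5.6 Theorem Let $\Omega$ be a nonempty set. Let $M$ be an outer measure on the subsets of $\Omega$.

1. The family of $M$-measurable subsets of $\Omega$ forms a $\sigma$-algebra on $\Omega$.

2. If $\{A_k\}_{k \in \mathbb{N}}$ is a sequence of disjoint measurable sets then
$$M\left(\bigcup_{k \in \mathbb{N}} A_k\right) = \sum_{k \in \mathbb{N}} M(A_k).$$
More generally, for any subset $A$ of $\Omega$, it follows that
$$M(A) = \sum_{k \in \mathbb{N}} M(A \cap A_k) + M\left(A \setminus \bigcup_{k \in \mathbb{N}} A_k\right)$$
and
$$M\left(A \cap \bigcup_{k \in \mathbb{N}} A_k\right) = \sum_{k \in \mathbb{N}} M(A \cap A_k).$$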

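Proof. Let $\{A_k\}_{k \in \mathbb{N}}$ be a sequence of disjoint measurable subsets of $\Omega$ and let $A$ be any subset of $\Omega$. Let $E = \bigcup_{k \in \mathbb{N}} A_k$ and, for each positive integer $j$, let $E_j = \bigcup_{k=1}^{j} A_k$. We will show that
$$M(A) = \sum_{k=1}^{j} M(A \cap A_k) + M(A \setminus E_j).$$
The proof will proceed by induction on $j$. For $j = 1$, the result follows from the measurability of $A_1$. Now, assuming that the result holds for $j - 1$ (for every subset of $\Omega$), it follows from the measurability of $A_j$, and then from the induction hypothesis applied to $A \setminus A_j$, that
$$\begin{aligned}
M(A) &= M(A \cap A_j) + M(A \setminus A_j) \\
&= M(A \cap A_j) + \sum_{k=1}^{j-1} M((A \setminus A_j) \cap A_k) + M((A \setminus A_j) \setminus E_{j-1}).
\end{aligned}$$
Recalling that the $A_k$'s are disjoint, it follows that $(A \setminus A_j) \cap A_k = A \cap A_k$ for $k \le j - 1$. Therefore, since $(A \setminus A_j) \setminus E_{j-1} = A \setminus E_j$, it follows that
$$M(A) = \sum_{k=1}^{j} M(A \cap A_k) + M(A \setminus E_j),$$
as required. This completes the proof of the previous claim.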


Next, since $E_j \subset E$, it follows that $M(A \setminus E_j) \ge M(A \setminus E)$. Using this fact with the above result and considering the limit as $j \to \infty$, we see that
$$M(A) \ge \sum_{k=1}^{\infty} M(A \cap A_k) + M(A \setminus E) \ge M(A \cap E) + M(A \setminus E),$$
where the second inequality follows from property (3) of outer measures. However, we also have that $M(A) \le M(A \cap E) + M(A \setminus E)$. Therefore, $E$ is measurable, and
$$M(A) = \sum_{k=1}^{\infty} M(A \cap A_k) + M(A \setminus E).$$


If we replace $A$ with $A \cap E$ in this equation we see that
$$M(A \cap E) = \sum_{k=1}^{\infty} M(A \cap A_k),$$
and taking $A = \Omega$ gives $M(E) = \sum_{k=1}^{\infty} M(A_k)$. This completes the proof of (2).


Note that we have also shown that a countable union of disjoint measurable sets is measurable. To prove (1), we must show that a countable union of arbitrary measurable sets is measurable. Returning now to the proof of (1), it follows from Lemma 5.1 and the fact that $\Omega$ is measurable that the complement of a measurable set is also measurable. Moreover, since $E_1 \cup E_2 = (E_1^c \setminus E_2)^c$, it follows that $E_1 \cup E_2$ is measurable if $E_1$ and $E_2$ are measurable. Therefore, any finite union of measurable sets is measurable. Next, let $\{E_k\}_{k \in \mathbb{N}}$ be a sequence of measurable sets. If, for each positive integer $j$, $B_j = \bigcup_{k=1}^{j} E_k$, then
$$\bigcup_{k=1}^{\infty} E_k = B_1 \cup \bigcup_{j=2}^{\infty} (B_j \setminus B_{j-1}).$$
Since the $B_j$'s are measurable and nondecreasing, the terms on the right are measurable and disjoint. Thus, by the case already considered, it follows that $\bigcup_{k=1}^{\infty} E_k$ is measurable. This completes the proof of the theorem. □

A measure $\mu$ on an algebra $\mathcal{A}$ is said to be $\sigma$-finite (with respect to $\mathcal{A}$) if $\Omega$ can be written as $\Omega = \bigcup_{k \in \mathbb{N}} \Omega_k$ where for each positive integer $k$, $\Omega_k \in \mathcal{A}$ and $\mu(\Omega_k) < \infty$. For example, Lebesgue measure is $\sigma$-finite on the algebra generated by the intervals $(a, b]$.
Let $\Omega$ be a nonempty set, and let $\mathcal{A}$ be an algebra on $\Omega$. If $\mu$ is a measure on the algebra $\mathcal{A}$, we define the outer extension $\mu^*$ of $\mu$ as follows: For any subset $A$ of $\Omega$,
$$\mu^*(A) = \inf \sum_{k=1}^{\infty} \mu(A_k),$$

where the infimum is taken over all countable covers $\{A_k\}_{k \in \mathbb{N}}$ of $A$ by sets in $\mathcal{A}$. Note that it is always possible to find such a cover of $A$ since $\Omega$ itself belongs to $\mathcal{A}$. The fact that $\mathcal{A}$ is an algebra allows us to assume without loss of generality that the sets $A_k$ are disjoint: replacing each $A_k$ by $A_k \setminus (A_1 \cup \cdots \cup A_{k-1})$ yields a disjoint cover by sets in $\mathcal{A}$ whose sum of measures is no larger. We will make this assumption throughout the remainder of the section.

5.2 Lemma Let $\Omega$ be a nonempty set. If $\mathcal{A}$ is an algebra on $\Omega$ and if $\mu$ is a measure on $\mathcal{A}$ then the outer extension $\mu^*$ of $\mu$ is an outer measure.

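Proof. Note that $\mu^*(\emptyset) = 0$ since $\emptyset \in \mathcal{A}$, and $\mu^*(A) \ge 0$ for any subset $A$ of $\Omega$. If $A_1$ and $A_2$ are two subsets of $\Omega$ such that $A_1 \subset A_2$, then any countable cover of $A_2$ by sets in $\mathcal{A}$ is also a countable cover of $A_1$ by sets in $\mathcal{A}$. Thus, we see that $\mu^*(A_1) \le \mu^*(A_2)$. Now, let $\{A_k\}_{k \in \mathbb{N}}$ be any sequence of subsets of $\Omega$. We wish to show that
$$\mu^*\left(\bigcup_{k \in \mathbb{N}} A_k\right) \le \sum_{k \in \mathbb{N}} \mu^*(A_k).$$
Let $\varepsilon$ be a positive real number. For each positive integer $k$, there is a countable covering of $A_k$ by sets $\{A_{jk}\}_{j \in \mathbb{N}}$ from $\mathcal{A}$ such that
$$\sum_{j \in \mathbb{N}} \mu(A_{jk}) \le \mu^*(A_k) + \frac{\varepsilon}{2^k},$$
since $\mu^*(A_k)$ is defined as an infimum. Now, since $\bigcup_{k \in \mathbb{N}} A_k \subset \bigcup_{j \in \mathbb{N}} \bigcup_{k \in \mathbb{N}} A_{jk}$, it follows that
$$\mu^*\left(\bigcup_{k \in \mathbb{N}} A_k\right) \le \sum_{k \in \mathbb{N}} \sum_{j \in \mathbb{N}} \mu(A_{jk}) \le \sum_{k \in \mathbb{N}} \mu^*(A_k) + \varepsilon$$
and, since $\varepsilon > 0$ may be chosen arbitrarily close to zero, the desired result follows. □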

5.7 Theorem (Caratheodory Extension Theorem) Let $\mathcal{A}$ be an algebra on a nonempty set $\Omega$. If $\lambda$ is a measure on $\mathcal{A}$, let $\lambda^*$ be the corresponding outer measure, and let $\mathcal{A}^*$ be the $\sigma$-algebra of $\lambda^*$-measurable sets. Then

1. the restriction of $\lambda^*$ to $\mathcal{A}^*$ is an extension of $\lambda$, and

2. if $\lambda$ is $\sigma$-finite with respect to $\mathcal{A}$, and if $\mathcal{S}$ is any $\sigma$-algebra with $\mathcal{A} \subset \mathcal{S} \subset \mathcal{A}^*$, then $\lambda^*$ is the only measure on $\mathcal{S}$ that is an extension of $\lambda$.

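Proof. Let $A \in \mathcal{A}$. Then clearly $\lambda^*(A) \le \lambda(A)$, since $A$ by itself (together with empty sets) forms a cover of $A$ by sets in $\mathcal{A}$. On the other hand, given disjoint sets $\{A_k : k \in \mathbb{N}\}$ in $\mathcal{A}$ that cover $A$, let $A_k' = A_k \cap A$. Then $A_k' \in \mathcal{A}$ and $A$ is the disjoint union of the $A_k'$'s. Hence $\lambda(A) = \sum_{k \in \mathbb{N}} \lambda(A_k')$. Since $A_k' \subset A_k$, it follows that $\lambda(A) \le \sum_{k \in \mathbb{N}} \lambda(A_k)$. Therefore, $\lambda(A) \le \lambda^*(A)$, and the proof of (1) is complete.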
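To prove (2), which states the uniqueness of the extension, let $\mu$ be any measure on the $\sigma$-algebra $\mathcal{S}$, where $\mathcal{A} \subset \mathcal{S} \subset \mathcal{A}^*$, that agrees with $\lambda$ on $\mathcal{A}$. Given a set $E \in \mathcal{S}$, consider any countable collection $\{E_k\}$ such that $E \subset \bigcup_{k \in \mathbb{N}} E_k$ and such that each $E_k \in \mathcal{A}$. Then
$$\mu(E) \le \sum_{k \in \mathbb{N}} \mu(E_k) = \sum_{k \in \mathbb{N}} \lambda(E_k).$$
Therefore, by definition of $\lambda^*$, it follows that $\mu(E) \le \lambda^*(E)$.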


To show that equality holds, first suppose that there exists a set $A \in \mathcal{A}$ with $E \subset A$ and $\lambda(A) < \infty$. Applying what has just been proved to $A \setminus E$, which belongs to $\mathcal{S}$, we see that $\mu(A \setminus E) \le \lambda^*(A \setminus E)$. However,
$$\mu(E) + \mu(A \setminus E) = \mu(A) = \lambda^*(A) = \lambda^*(E) + \lambda^*(A \setminus E).$$
Since each of these terms is finite (due to the fact that $\lambda(A)$ is finite) it follows that $\mu(E) = \lambda^*(E)$ in this case.
In the general case, since $\lambda$ is $\sigma$-finite, there exist disjoint $A_k \in \mathcal{A}$ such that the $A_k$'s cover $\Omega$ and such that $\lambda(A_k) < \infty$. We may apply the result above to each $E \cap A_k$ (which is a subset of $A_k$) to show that $\mu(E \cap A_k) = \lambda^*(E \cap A_k)$. By summing over $k$, we see that $\mu(E) = \lambda^*(E)$, and this completes the proof. □

The next result follows as a consequence of the Caratheodory Extension Theorem.

5.8 Theorem Let $F : \mathbb{R} \to [0, 1]$ be a probability distribution function and let $\mu_0((a, b]) = F(b) - F(a)$ for $-\infty \le a < b$. Then there is a unique extension of $\mu_0$ to a measure $\mu$ on $\mathcal{B}(\mathbb{R})$ such that $\mu(I) < \infty$ for any bounded interval $I$.

Consider a random variable $X$ defined on a probability space $(\Omega, \mathcal{F}, P)$. The distribution or law of $X$ is the probability measure $\mu_X$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ defined by $\mu_X(A) = P(X \in A) = P(\{\omega \in \Omega : X(\omega) \in A\})$ for each $A \in \mathcal{B}(\mathbb{R})$. We say that $\mu_X$ is the measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ induced by $X$. Note that $F_X(x) = \mu_X((-\infty, x])$ and that $\mu_X = P \circ X^{-1}$.

5.9 Theorem If $F$ is a nondecreasing, right-continuous real-valued function defined on $\mathbb{R}$ then there exists a unique measure $\mu$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ such that $\mu((a, b]) = F(b) - F(a)$ for all $a < b$.

The measure $\mu$ corresponding to the function $F$ in Theorem 5.9 is said to be the measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ induced by $F$ and is obtained via Theorem 5.8. If $F_X$ is the distribution function of a random variable $X$ then the measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ induced by $F_X$ is equal to the measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ induced by $X$.

5.10 Theorem If $F$ is any probability distribution function then there exists on some probability space a random variable $X$ such that $F_X = F$.

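Proof. Let $\mu$ be the measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ induced by $F$ and define a random variable $X$ on the probability space $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu)$ by letting $X(\omega) = \omega$ for each $\omega \in \mathbb{R}$. The distribution function $F_X$ of $X$ is given by $F_X(x) = \mu(\{\omega : X(\omega) \le x\}) = \mu((-\infty, x])$. From Theorem 5.9 we know that the measure $\mu$ is such that $\mu((a, b]) = F(b) - F(a)$. Letting $a \to -\infty$ and using the continuity of $\mu$ from below together with the fact that $F(a) \to 0$ as $a \to -\infty$, it follows that $\mu((-\infty, x]) = F(x)$ and hence $F(x) = F_X(x)$. □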
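The proof realizes $X$ as the identity map on $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu)$. In simulation a different construction, not the one used in the proof, is common: if $U$ is uniform on $[0, 1)$ and $F$ is continuous and strictly increasing, then $F^{-1}(U)$ has distribution function $F$. The sketch below assumes the exponential distribution function $F(x) = 1 - e^{-x}$ for $x \ge 0$ purely for illustration and compares the empirical distribution function of the samples with $F$.

```python
import math
import random

def sample_exponential(rng: random.Random) -> float:
    """Inverse transform: F(x) = 1 - exp(-x) has inverse F^{-1}(u) = -log(1 - u)."""
    u = rng.random()            # uniform on [0, 1), so 1 - u lies in (0, 1]
    return -math.log(1.0 - u)

def empirical_cdf(samples: list[float], x: float) -> float:
    """Fraction of the samples that are <= x."""
    return sum(1 for s in samples if s <= x) / len(samples)

rng = random.Random(12345)      # fixed seed for reproducibility
samples = [sample_exponential(rng) for _ in range(100_000)]
for x in (0.5, 1.0, 2.0):
    print(x, empirical_cdf(samples, x), 1.0 - math.exp(-x))
```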

The clearer the teacher makes it, the worse it is for you. You must work things out for yourself and make the ideas your own. -William Osgood


Many questions about a random variable $X$ may be answered based only on the distribution function of $X$. That is, to answer such questions we do not need to know the probability space on which $X$ is defined. Instead, we may simply take the distribution function $F_X$ and use Theorem 5.10 to define a random variable $Y$ on a probability space $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu)$ where $\mu$ is the measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ induced by $F_X$. Any question about $X$ that depends only upon $F_X$ will have the same answer if we ask it about the random variable $Y$ instead. Thus, we will often say "let $X$ be a random variable with distribution function $F_X$" and make no reference to the underlying probability space on which $X$ is defined.
The following result establishes a link between the concept of
measurability and the existence of a functional relation. In par-
ticular, this result will place on firm footing the engineering
concept of a data processor.

5.11 Theorem Consider a collection $\{X_1, \ldots, X_n\}$ of random variables
defined on a probability space $(\Omega, \mathcal{F}, P)$. A random variable
$X$ defined on this space is measurable with respect to $\sigma(X_1, \ldots, X_n)$
if and only if there exists a Borel measurable function
$f : \mathbb{R}^n \to \mathbb{R}$ such that $X(\omega) = f(X_1(\omega), \ldots, X_n(\omega))$ for all
$\omega \in \Omega$.

5.7 Expectation

If $X$ is a random variable defined on $(\Omega, \mathcal{F}, P)$ then the expected
value of $X$ is denoted by $E[X]$ and is defined by
$$E[X] = \int_\Omega X \, dP$$
provided the integral exists. If $g : (\mathbb{R}, \mathcal{B}(\mathbb{R})) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ then
$$E[g(X)] = \int_\Omega g(X) \, dP.$$
A random variable $X$ for which $E[X]$ exists and is finite is said
to be integrable or to have a finite mean or to be a first order
random variable.


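5.12 Theorem Consider an integrable random variable $X$ and a
Borel measurable function $g : \mathbb{R} \to \mathbb{R}$. If $F_X$ is the distribution
function of $X$ and if $\mu_X$ is the measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ induced
by $X$ then
$$E[g(X)] = \int_{\mathbb{R}} g(x)\,dF_X(x) = \int_{\mathbb{R}} g(x)\,d\mu_X(x).$$
Further, if $X$ possesses a density function $f_X$ then
$$E[g(X)] = \int_{\mathbb{R}} g(x) f_X(x)\,dx.$$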

Example 5.5 In this example we will find a rather simple
expectation using three different methods in order to illustrate
some of the concepts that we have been considering.
Let $X$ be a random variable defined on a probability space $(\Omega, \mathcal{F}, P)$
with distribution function
$$F(x) = \begin{cases} 0 & \text{if } x < -2 \\ 1/2 & \text{if } -2 \le x < 3 \\ 1 & \text{if } x \ge 3. \end{cases}$$
Note that $P(X = -2) = P(X = 3) = 1/2$. What is $E[X^2 + 1]$? □

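Method I: We will first find the expectation via a Lebesgue
integral over $\Omega$ with respect to $P$. Let $A = \{\omega \in \Omega : X(\omega) = -2\}$
and let $B = \{\omega \in \Omega : X(\omega) = 3\}$ and note that $\Omega \setminus (A \cup B)$
is a $P$-null set. Further, note that
$$\begin{aligned} \int_\Omega (X^2 + 1)\,dP &= \int_A (X^2 + 1)\,dP + \int_B (X^2 + 1)\,dP \\ &= \int_A (4 + 1)\,dP + \int_B (9 + 1)\,dP \\ &= 5P(A) + 10P(B) = \frac{15}{2}. \end{aligned}$$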

The following result will be used by Method II.


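5.3 Lemma If
$$G(x) = \begin{cases} \beta & \text{if } x \ge y \\ \alpha & \text{if } x < y \end{cases}$$
where $\beta > \alpha$ and if $h : \mathbb{R} \to \mathbb{R}$ is continuous at $y$ then
$$\int_{\mathbb{R}} h(x)\,dG(x) = (\beta - \alpha)h(y).$$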

Proof. Consider a subdivision $\Gamma = \{a_0, \ldots, a_n\}$ of an interval
$[a, b]$ such that $a < y < b$ and assume that $a_j > y > a_{j-1}$.
Recall the notation we introduced during our derivation of the
Riemann-Stieltjes integral. The desired result follows since
$$R(\Gamma) = \sum_{i=1}^{n} h(b_i)\bigl(G(a_i) - G(a_{i-1})\bigr) = h(b_j)(\beta - \alpha)$$
and since $h(b_j) \to h(y)$ as $|\Gamma| \to 0$. □

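Method II: We will next express $E[X^2 + 1]$ as a Riemann-Stieltjes
integral over $\mathbb{R}$ with respect to $F$. Since $F$ is constant off $\{-2, 3\}$
and jumps by $1/2$ at each of these points, Lemma 5.3 applies on a
small interval around each jump. Let $0 < \varepsilon < 1$ and note that
$$\begin{aligned} \int_{\mathbb{R}} (x^2 + 1)\,dF(x) &= \int_{(-2-\varepsilon,\,-2+\varepsilon)} (x^2 + 1)\,dF(x) + \int_{(3-\varepsilon,\,3+\varepsilon)} (x^2 + 1)\,dF(x) \\ &= \frac{1}{2}(4 + 1) + \frac{1}{2}(9 + 1) = \frac{15}{2}. \end{aligned}$$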

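Method III: Finally, we will express $E[X^2 + 1]$ as a Lebesgue integral
over $\mathbb{R}$ with respect to the measure $\mu_X$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ induced
by $X$. First, note that
$$\mu_X(A) = P(X \in A) = \begin{cases} 1 & \text{if } -2 \in A \text{ and } 3 \in A \\ 1/2 & \text{if } -2 \in A \text{ and } 3 \in A^c \\ 1/2 & \text{if } -2 \in A^c \text{ and } 3 \in A \\ 0 & \text{if } -2 \in A^c \text{ and } 3 \in A^c \end{cases}$$
for $A \in \mathcal{B}(\mathbb{R})$. Note that $\mathbb{R} \setminus \{-2, 3\}$ is a $\mu_X$-null set. Thus, it
follows that
$$\begin{aligned} E[X^2 + 1] &= \int_{\{-2\}} (x^2 + 1)\,d\mu_X + \int_{\{3\}} (x^2 + 1)\,d\mu_X \\ &= (4 + 1)\mu_X(\{-2\}) + (9 + 1)\mu_X(\{3\}) = \frac{15}{2}. \end{aligned}$$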
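As a cross-check on the three methods, the short Python sketch below computes $E[X^2 + 1]$ for the distribution of Example 5.5 in two further ways: as the sum $\sum_i g(x_i)P(X = x_i)$ over the two atoms of $X$ (the formula of Lemma 5.4 below), and as a Riemann-Stieltjes sum $\sum_i h(b_i)(F(a_i) - F(a_{i-1}))$ over a fine uniform subdivision, in the spirit of Lemma 5.3 and Method II. The function names, the interval $[-5, 5]$, and the choice of midpoints for the $b_i$ are illustrative assumptions, not part of the text.

```python
from fractions import Fraction

# Atoms of X in Example 5.5: the jump of F at each atom equals P(X = x).
pmf = {-2: Fraction(1, 2), 3: Fraction(1, 2)}


def g(x):
    return x * x + 1


def F(x):
    """Distribution function of X from Example 5.5."""
    if x < -2:
        return 0.0
    if x < 3:
        return 0.5
    return 1.0


def expectation_discrete(g, pmf):
    """E[g(X)] as the sum of g(x_i) P(X = x_i) over the atoms of X."""
    return sum(g(x) * p for x, p in pmf.items())


def riemann_stieltjes(h, G, a, b, n):
    """Riemann-Stieltjes sum of h with respect to G over a uniform
    subdivision of [a, b] into n pieces, evaluating h at midpoints."""
    step = (b - a) / n
    total = 0.0
    for i in range(1, n + 1):
        left = a + (i - 1) * step
        right = a + i * step
        total += h((left + right) / 2) * (G(right) - G(left))
    return total


exact = expectation_discrete(g, pmf)
approx = riemann_stieltjes(g, F, -5.0, 5.0, 10_000)
print(exact, approx)
```

The exact value is returned as a `Fraction`, while the Riemann-Stieltjes sum only approaches $15/2$ as the subdivision is refined, exactly as in the limit $|\Gamma| \to 0$ of Lemma 5.3.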


5.4 Lemma Consider a random variable $X$ that takes values only
in some countable set $\{x_1, x_2, \ldots\}$ and a function $g : \mathbb{R} \to \mathbb{R}$
such that $g(X)$ is integrable. It follows that
$$E[g(X)] = \sum_{i=1}^{\infty} g(x_i)\,P(X = x_i).$$

Example 5.6 This example is known as the St. Petersburg
Paradox. Consider the following game. A fair coin is flipped
until a tail appears; we win $2^k$ dollars if it appears on the $k$th toss.
Let the random variable $X$ denote our winnings. What is $E[X]$?
That is, how much should we be required to "put up" in order
to make the game fair? Note that $X$ takes on the value $2^k$ with
probability $2^{-k}$; i.e., the probability that we toss $k - 1$ heads and
then toss one tail. Thus,
$$E[X] = \sum_{k=1}^{\infty} 2^k 2^{-k} = \infty.$$
The paradox arises since most people would "expect" their winnings
to be much less. The difficulty stems from our inability to
put in perspective the very small probabilities of winning very
large amounts. The game has a much more realistic expected value
if we assign a maximum amount that can be won; that is, if we
are allowed to "break the bank" when we reach a preassigned
level. □
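To make the last remark concrete, here is a small Python sketch that assumes one particular form of the cap: the payout is $\min(2^K, 2^m)$, where $K$ is the toss on which the first tail appears and the bank holds $2^m$ dollars. Under that assumption
$$E[\min(2^K, 2^m)] = \sum_{k=1}^{m} 1 + \sum_{k > m} 2^{m-k} = m + 1,$$
so even a bank of $2^{20}$ (about a million) dollars makes the fair entry fee only 21 dollars. The function name and the truncation parameter `terms` are hypothetical.

```python
from fractions import Fraction


def capped_expectation(m, terms=200):
    """E[min(2^K, 2^m)] with P(K = k) = 2^-k for k = 1, 2, ...

    Terms k = 1..terms are summed exactly; the remaining tail
    sum_{k > terms} 2^m 2^-k = 2^m 2^-terms is added in closed form
    (valid because terms >= m, so the cap applies to every tail term).
    """
    if terms < m:
        raise ValueError("terms must be at least m")
    cap = 2 ** m
    total = Fraction(0)
    for k in range(1, terms + 1):
        total += Fraction(min(2 ** k, cap), 2 ** k)
    return total + Fraction(cap, 2 ** terms)


for m in (5, 10, 20, 30):
    print(m, capped_expectation(m))  # equals m + 1
```

Exact rational arithmetic is used so that the identity $E = m + 1$ holds with equality rather than up to rounding.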

Example 5.7 As part of a reliability study, a total of $n$ items
are tested. Suppose that each item has an exponential failure
time distribution given by
$$P(T_i \le t) = 1 - e^{-\lambda t}$$
for $t > 0$ where $T_i$ is a random variable that denotes the time at
which the $i$th item fails and where $\lambda$ is a fixed positive constant.
Note that if $\lambda$ is large then we expect the item to fail quickly.
Assume that the $T_i$'s are mutually independent. (Is this a good
assumption?) Let $T$ denote the time at which the first failure
occurs. What is the expected value of $T$? Note that $T$ exceeds
some positive time $t$ if and only if $T_i > t$ for each $i$. Thus, for
$t > 0$,
$$\begin{aligned} P(T > t) &= P(T_1 > t, T_2 > t, \ldots, T_n > t) \\ &= P(T_1 > t)P(T_2 > t) \cdots P(T_n > t) \\ &= \int_t^\infty \lambda e^{-\lambda t_1}\,dt_1 \int_t^\infty \lambda e^{-\lambda t_2}\,dt_2 \cdots \int_t^\infty \lambda e^{-\lambda t_n}\,dt_n \\ &= e^{-\lambda t} \cdots e^{-\lambda t} = e^{-n\lambda t}. \end{aligned}$$
From this we see that $F_T(t) = 1 - e^{-n\lambda t}$ for $t > 0$. Recall
from the fundamental theorem of calculus that if a probability
distribution function is differentiable then its derivative is a
probability density function corresponding to that distribution.
Thus, $f_T(t) = n\lambda e^{-n\lambda t}$ for $t > 0$ from which it follows that
$$E[T] = \int_0^\infty t f_T(t)\,dt = \frac{1}{n\lambda}$$
where we have used the fact that $\int_0^\infty y e^{-y}\,dy = 1$. Note that
the expected time of the first failure decreases as either $n$ or $\lambda$
increases. □
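A quick Monte Carlo check of $E[T] = 1/(n\lambda)$ can be written in Python under the example's assumptions (independent exponential lifetimes with a common rate $\lambda$). The values $n = 5$ and $\lambda = 2$, the number of trials, and the seed are arbitrary illustrative choices; the standard-library call `random.expovariate` takes the rate $\lambda$ (not the mean) as its argument.

```python
import random


def mean_first_failure(n, lam, trials=100_000, seed=1):
    """Estimate E[min(T_1, ..., T_n)] for independent T_i with
    P(T_i > t) = exp(-lam * t), by averaging simulated minima."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min(rng.expovariate(lam) for _ in range(n))
    return total / trials


n, lam = 5, 2.0
estimate = mean_first_failure(n, lam)
print(estimate, 1 / (n * lam))  # the two numbers should be close
```

Since $T$ is itself exponential with rate $n\lambda$, its standard deviation is also $1/(n\lambda)$, so with $10^5$ trials the estimate should typically agree with $0.1$ to about three decimal places.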

5.8 Useful Inequalities

Let $X$ be a random variable defined on $(\Omega, \mathcal{F}, P)$. If $k \in \mathbb{N}$
then $E[X^k]$ is called the $k$th moment of $X$ and $E[(X - E[X])^k]$
is called the $k$th central moment of $X$. The first moment of
$X$ is called the mean of $X$ and the second central moment of
$X$ is called the variance of $X$ and is denoted by $\sigma^2$, $\sigma_X^2$, or by
$\mathrm{VAR}[X]$. The standard deviation of $X$ is denoted by $\sigma_X$ and
is given by the nonnegative square root of the variance of $X$.
A random variable with a finite second moment is said to be a
second order random variable.

5.13 Theorem If $k > 0$ and if $E[X^k]$ is finite then $E[X^j]$ is finite
when $0 < j < k$.


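Proof. Note that $E[X^j]$ is finite if and only if $E[|X|^j]$ is finite.
Further, since $|X|^j \le |X|^k$ whenever $|X| \ge 1$, note that
$$\begin{aligned} E[|X|^j] &= \int_\Omega |X|^j\,dP = \int_{\{|X|^j < 1\}} |X|^j\,dP + \int_{\{|X|^j \ge 1\}} |X|^j\,dP \\ &\le \int_{\{|X|^j < 1\}} 1\,dP + \int_{\{|X|^j \ge 1\}} |X|^k\,dP \\ &\le P(\{|X|^j < 1\}) + E[|X|^k] < \infty. \end{aligned}$$
Thus, if the $k$th moment is finite then all lower moments are
also finite. □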

Exercise 5.4 The density function
$$f(x) = \frac{1}{\pi(1 + x^2)}$$
for $x \in \mathbb{R}$ is called a Cauchy density function. Let $X$ be a
random variable with density function $f$. Show that none of the
odd moments of $X$ exists and that none of the even moments of
$X$ is finite.

Exercise 5.5 Although the first moment of a random variable
need not exist, the second moment of a random variable always
exists. Why?

Exercise 5.6 Show that if $X$ is a second order random variable
then

5.14 Theorem Consider a positive integer $n$ and let $X_1, \ldots, X_n$ be
mutually independent random variables defined on $(\Omega, \mathcal{F}, P)$. If
$X_i \ge 0$ for each $i$ or if $E[|X_i|] < \infty$ for each $i$ then $E[X_1 \cdots X_n]$
exists and is equal to $E[X_1] \cdots E[X_n]$.


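5.1 Inequality (Hölder) If $1 < p < \infty$, $1 < q < \infty$, and $\frac{1}{p} + \frac{1}{q} = 1$,
then
$$E[|XY|] \le \left(E[|X|^p]\right)^{1/p}\left(E[|Y|^q]\right)^{1/q}.$$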

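5.2 Inequality (Minkowski) If $p \ge 1$, then
$$\left(E[|X + Y|^p]\right)^{1/p} \le \left(E[|X|^p]\right)^{1/p} + \left(E[|Y|^p]\right)^{1/p}.$$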

The following inequality is a special case of Hölder's inequality.

5.3 Inequality (Cauchy-Schwarz) $E[|XY|] \le \sqrt{E[X^2]}\sqrt{E[Y^2]}$.

5.4 Inequality (Chebyshev) If $\alpha > 0$ then
$$P(|X - E[X]| \ge \alpha) \le \frac{1}{\alpha^2}\mathrm{VAR}[X].$$

Example 5.8 Consider again Buffon's needle problem from
Section 5.5 on page 84 and recall that the random variable $Y = H/N$
provides an estimate of $2/\pi$ where $H$ denotes the number
of times the needle hits a line after $N$ drops. Note that
$$P(H = h) = \binom{N}{h}\left(\frac{2}{\pi}\right)^h\left(1 - \frac{2}{\pi}\right)^{N-h}$$
for $h = 0, 1, \ldots, N$, where we have used the binomial distribution
from Section 5.4 on page 80. Thus, $E[Y] = 2/\pi$
and $\mathrm{VAR}[Y] = \frac{1}{N}\,\frac{2}{\pi}\left(1 - \frac{2}{\pi}\right)$. What value of $N$ ensures that
$|Y - (2/\pi)| < 0.01$ with probability at least 0.999? Chebyshev's
inequality implies that such will be true if
$$\frac{1}{(1/100)^2}\,\frac{1}{N}\,\frac{2}{\pi}\left(1 - \frac{2}{\pi}\right) < 0.001.$$
This inequality holds when $N > 2{,}313{,}350$. The dedicated
reader is invited to verify this result empirically. □
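The arithmetic behind the bound can be reproduced in a few lines of Python. With $p = 2/\pi$, $\varepsilon = 0.01$, and $\delta = 0.001$, the condition $p(1 - p)/(N\varepsilon^2) < \delta$ is equivalent to $N > p(1 - p)/(\varepsilon^2\delta)$. The variable names are ours.

```python
import math

p = 2 / math.pi          # probability that one drop hits a line
eps = 0.01               # tolerance on |Y - 2/pi|
delta = 0.001            # allowed probability of missing the tolerance

# Chebyshev: P(|Y - p| >= eps) <= VAR[Y] / eps^2 = p (1 - p) / (N eps^2).
n_bound = p * (1 - p) / (eps ** 2 * delta)
n_min = math.floor(n_bound) + 1  # smallest integer N above the bound
print(n_bound, n_min)
```

Chebyshev's inequality uses only the variance, so the resulting $N$ is sufficient but generally conservative.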

Recall that a function $\Phi : \mathbb{R} \to \mathbb{R}$ is said to be convex if
$\Phi(\lambda x + (1 - \lambda)y) \le \lambda\Phi(x) + (1 - \lambda)\Phi(y)$ for all $x, y \in \mathbb{R}$
whenever $0 \le \lambda \le 1$. A sufficient
condition for $\Phi$ to be convex is that it have a nonnegative second
derivative.


5.5 Inequality (Jensen) If $\Phi$ is convex on an interval containing
the range of $X$ then $\Phi(E[X]) \le E[\Phi(X)]$. Note that letting
$\Phi(x) = x^2$ implies that $(E[X])^2 \le E[X^2]$.

5.6 Inequality (Lyapounov) If $0 < \alpha \le \beta$ then
$$\left(E[|X|^\alpha]\right)^{1/\alpha} \le \left(E[|X|^\beta]\right)^{1/\beta}.$$

Statistical thinking will one day be as
necessary for efficient citizenship as the ability
to read and write. -H. G. Wells

Let $X$ and $Y$ be random variables with finite means and assume
that $E[XY]$ is also finite. The covariance of $X$ and $Y$ is
denoted by $\mathrm{COV}[X, Y]$ and is defined to be
$\mathrm{COV}[X, Y] = E[(X - E[X])(Y - E[Y])]$. Note that $\mathrm{COV}[X, Y] = E[XY] - E[X]E[Y]$,
also. The random variables $X$ and $Y$ are said to be uncorrelated
if $\mathrm{COV}[X, Y] = 0$; that is, if $E[XY] = E[X]E[Y]$. Note that if
$X$ and $Y$ are independent (and if $E[X]$, $E[Y]$, and $E[XY]$ are
finite) then $X$ and $Y$ are uncorrelated. If the variances $\sigma_X^2$ and
$\sigma_Y^2$ of $X$ and $Y$ are finite and nonzero then the correlation
coefficient between $X$ and $Y$ is denoted by $\rho(X, Y)$ and is defined
by
$$\rho(X, Y) = \frac{\mathrm{COV}[X, Y]}{\sigma_X\sigma_Y}.$$

5.15 Theorem If $X_1, \ldots, X_n$ are second order random variables
then
$$\mathrm{VAR}[X_1 + \cdots + X_n] = \sum_{i=1}^{n} \mathrm{VAR}[X_i] + 2\sum_{\substack{i,j=1 \\ i<j}}^{n} \mathrm{COV}[X_i, X_j].$$

5.3 Corollary If $X_1, \ldots, X_n$ are second order, uncorrelated random
variables (that is, if $\mathrm{COV}[X_i, X_j] = 0$ when $i \ne j$) then
$$\mathrm{VAR}[X_1 + \cdots + X_n] = \sum_{i=1}^{n} \mathrm{VAR}[X_i].$$


5.4 Corollary If $X_1, \ldots, X_n$ are second order, mutually independent
random variables then
$$\mathrm{VAR}[X_1 + \cdots + X_n] = \sum_{i=1}^{n} \mathrm{VAR}[X_i].$$

5.9 Transformations of Random Variables

5.16 Theorem If $X$ and $Y$ have a joint probability density function
$f_{X,Y}(x, y)$ then the random variable $Z = X + Y$ possesses a
density function given by
$$f_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(x, z - x)\,dx.$$

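Proof. Let $A_z = \{(x, y) \in \mathbb{R}^2 : x + y \le z\}$ and note that
$$\begin{aligned} P(Z \le z) &= \iint_{A_z} f_{X,Y}(x, y)\,dy\,dx \\ &= \int_{-\infty}^{\infty}\int_{-\infty}^{z-x} f_{X,Y}(x, y)\,dy\,dx \\ &= \int_{-\infty}^{\infty}\int_{-\infty}^{z} f_{X,Y}(x, s - x)\,ds\,dx \\ &= \int_{-\infty}^{z}\int_{-\infty}^{\infty} f_{X,Y}(x, s - x)\,dx\,ds, \end{aligned}$$
where the third line substitutes $s = x + y$ in the inner integral and the
fourth interchanges the order of integration, which is permitted since
the integrand is nonnegative. Thus, we have found a nonnegative function
$f_Z(s)$ such that
$$P(Z \le z) = \int_{-\infty}^{z} f_Z(s)\,ds$$
for all $z \in \mathbb{R}$. It follows by definition that $f_Z$ is a probability
density function for $Z$. □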

5.5 Corollary If $X$ and $Y$ are independent random variables possessing
density functions $f_X$ and $f_Y$, respectively, then the random
variable $Z = X + Y$ possesses a probability density function
given by
$$f_Z(z) = \int_{\mathbb{R}} f_X(x) f_Y(z - x)\,dx.$$


Note that this density for $Z$ is the convolution of $f_X$ and $f_Y$.

For example, if $X$ and $Y$ are independent random variables each
with a uniform distribution on $[0, 1]$ then $X + Y$ has a triangular
distribution on $[0, 2]$. A proof of the following result will be
supplied by Example 5.10.
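The triangular shape can be checked numerically. The Python sketch below approximates the convolution integral of Corollary 5.5 for two Uniform$[0, 1]$ densities with a midpoint rule and compares the result with the triangular density, which equals $z$ for $0 \le z \le 1$ and $2 - z$ for $1 < z \le 2$. The integration window $[-1, 3]$ and the number of subintervals are arbitrary choices made for illustration.

```python
def uniform_pdf(x):
    """Density of the uniform distribution on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def triangular_pdf(z):
    """Density of X + Y for independent Uniform[0, 1] X and Y."""
    if 0.0 <= z <= 1.0:
        return z
    if 1.0 < z <= 2.0:
        return 2.0 - z
    return 0.0


def convolve_at(z, f, g, a=-1.0, b=3.0, n=20_000):
    """Midpoint-rule approximation of the integral of f(x) g(z - x) dx over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += f(x) * g(z - x)
    return total * h


for z in (0.25, 0.5, 1.0, 1.5, 1.75):
    print(z, convolve_at(z, uniform_pdf, uniform_pdf), triangular_pdf(z))
```

For each $z$ the integrand is the indicator of the interval $[\max(0, z - 1), \min(1, z)]$, so the numerical integral is essentially the length of that interval, which is exactly the triangular density.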

5.17 Theorem If $X$ and $Y$ possess a joint probability density function
$f_{X,Y}(x, y)$ then the random variable $B = XY$ possesses a
probability density function given by
$$f_B(b) = \int_{\mathbb{R}} f_{X,Y}\!\left(\frac{b}{t}, t\right)\frac{1}{|t|}\,dt.$$

5.18 Theorem Consider a random variable $X$ that possesses a probability
density function $f_X$ and a function $g : \mathbb{R} \to \mathbb{R}$ that possesses
a differentiable inverse. The random variable $Y = g(X)$
possesses a probability density function given by
$$f_Y(y) = f_X\bigl(g^{-1}(y)\bigr)\left|\frac{d}{dy}g^{-1}(y)\right|.$$

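Example 5.9 Consider a random variable $X$ with probability
density function $f_X$ and let $g(x) = ax + b$ for $a, b \in \mathbb{R}$ with
$a \ne 0$. Let $Y = g(X)$ and note that $g^{-1}(x) = (x - b)/a$. Thus,
Theorem 5.18 implies that
$$\begin{aligned} f_Y(y) &= f_X\bigl(g^{-1}(y)\bigr)\left|\frac{d}{dy}g^{-1}(y)\right| \\ &= f_X\!\left(\frac{y - b}{a}\right)\left|\frac{d}{dy}\left(\frac{y - b}{a}\right)\right| \\ &= f_X\!\left(\frac{y - b}{a}\right)\frac{1}{|a|}. \end{aligned}$$
□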

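5.19 Theorem Consider random variables $X$ and $Y$ that possess a
joint probability density function $f_{X,Y}(x, y)$. Consider functions
$g : \mathbb{R}^2 \to \mathbb{R}$ and $h : \mathbb{R}^2 \to \mathbb{R}$ for which there exist functions
$\alpha : \mathbb{R}^2 \to \mathbb{R}$ and $\beta : \mathbb{R}^2 \to \mathbb{R}$ such that $\alpha(g(x, y), h(x, y)) = x$
and $\beta(g(x, y), h(x, y)) = y$ and such that $\frac{\partial\alpha}{\partial x}(x, y)$, $\frac{\partial\alpha}{\partial y}(x, y)$,
$\frac{\partial\beta}{\partial x}(x, y)$, and $\frac{\partial\beta}{\partial y}(x, y)$ each exist. The random variables $B =
g(X, Y)$ and $T = h(X, Y)$ possess a joint probability density
function given by
$$f_{B,T}(b, t) = f_{X,Y}\bigl(\alpha(b, t), \beta(b, t)\bigr)\left|\det\begin{bmatrix} \frac{\partial\alpha}{\partial x}(b, t) & \frac{\partial\alpha}{\partial y}(b, t) \\ \frac{\partial\beta}{\partial x}(b, t) & \frac{\partial\beta}{\partial y}(b, t) \end{bmatrix}\right|.$$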

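Example 5.10 As an example we will prove Theorem 5.17.
Let $g(x, y) = xy$ and $h(x, y) = y$. Let $\alpha(b, t) = b/t$ and $\beta(b, t) = t$
and note that $\alpha(g(x, y), h(x, y)) = x$ and $\beta(g(x, y), h(x, y)) = y$
as desired. Let $B = XY$ and $T = Y$. Using the previous
result it follows that
$$\begin{aligned} f_{B,T}(b, t) &= f_{X,Y}\!\left(\frac{b}{t}, t\right)\left|\det\begin{bmatrix} 1/t & -b/t^2 \\ 0 & 1 \end{bmatrix}\right| \\ &= f_{X,Y}\!\left(\frac{b}{t}, t\right)\frac{1}{|t|}. \end{aligned}$$
Thus, it follows that
$$f_B(b) = \int_{\mathbb{R}} f_{B,T}(b, t)\,dt = \int_{\mathbb{R}} f_{X,Y}\!\left(\frac{b}{t}, t\right)\frac{1}{|t|}\,dt$$
as claimed. □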


5.10 Moment Generating and Characteristic Functions

The moment generating function of a random variable $X$ is defined
to be $M_X(s) = E[e^{sX}]$ for all $s \in \mathbb{R}$ for which the expectation
is finite, provided that $M_X(s)$ is finite in some nonempty
open interval containing the origin.

5.20 Theorem The moment generating function of a bounded random
variable exists.

Proof. Let $X$ be a bounded random variable, and note that $e^{sX}$
is bounded as well for each fixed value of $s$. Thus, $E[e^{sX}]$ exists
for each fixed $s$ and, by the dominated convergence theorem, is
a continuous function of $s$. □

5.21 Theorem Consider a random variable $X$ for which the moment
generating function $M_X(s)$ exists. The function $M_X$ satisfies the
following properties:

1. $M_X(s) = \sum_{k=0}^{\infty} s^k E[X^k]/k!$.

5.22 Theorem If $X$ and $Y$ are independent random variables possessing
moment generating functions $M_X$ and $M_Y$, respectively,
then the sum $X + Y$ possesses a moment generating function
that is given by $M_{X+Y}(s) = M_X(s)M_Y(s)$.

5.23 Theorem Consider two random variables $X$ and $Y$ possessing
moment generating functions $M_X$ and $M_Y$, respectively. The
random variables $X$ and $Y$ have the same distribution if and
only if $M_X = M_Y$.

⁴ This result is known as Taylor's Theorem.


In the space of one hundred and seventy-six
years the Lower Mississippi has shortened
itself two hundred and forty-two miles.
That is an average of a trifle over one mile
and a third per year. Therefore, any calm
person, who is not blind or idiotic, can
see that in the Old Oolitic Silurian Period,
just a million years ago next November,
the Lower Mississippi River was upward of
one million three hundred thousand miles
long, and stuck out over the Gulf of
Mexico like a fishing-rod. And by the same
token any person can see that seven hundred
and forty-two years from now the Lower
Mississippi will be only a mile and
three-quarters long, and Cairo and New Orleans
will have joined their streets together, and
be plodding comfortably along under a
single mayor and a mutual board of aldermen.
There is something fascinating about
science. One gets such wholesale returns of
conjecture out of such a trifling investment
of fact. -Mark Twain

Example 5.11 A random variable $X$ is said to have a Poisson
distribution with parameter $\lambda > 0$ if
$$P(X = k) = \frac{\lambda^k}{k!}e^{-\lambda}$$
for each nonnegative integer $k$. Note that for such a random
variable $X$, the moment generating function $M_X$ exists and is
given by
$$M_X(s) = \sum_{k=0}^{\infty} e^{sk}\frac{\lambda^k}{k!}e^{-\lambda} = e^{-\lambda}\sum_{k=0}^{\infty}\frac{(\lambda e^s)^k}{k!} = \exp\bigl(\lambda(e^s - 1)\bigr)$$
where we have recalled that the Taylor series expansion for $e^z$
is given by
$$e^z = \sum_{k=0}^{\infty}\frac{z^k}{k!}.$$

Now, assume that $X$ and $Y$ are independent random variables
each with a Poisson distribution with parameter $\lambda$. What is the
distribution of $X + Y$? Using Theorem 5.22 we see that
$$M_{X+Y}(s) = M_X(s)M_Y(s) = \exp\bigl(2\lambda(e^s - 1)\bigr)$$
and hence from Theorem 5.23 it follows that $X + Y$ is Poisson
with parameter $2\lambda$. □
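As an independent check that does not use moment generating functions, the Python sketch below computes the distribution of $X + Y$ directly from the discrete convolution $P(X + Y = k) = \sum_{j=0}^{k} P(X = j)P(Y = k - j)$ and compares it with the Poisson$(2\lambda)$ probabilities. The value $\lambda = 1.7$ and the range of $k$ are arbitrary.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) = lam^k e^{-lam} / k! for a Poisson(lam) random variable."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def pmf_of_sum(k, lam):
    """P(X + Y = k) for independent Poisson(lam) X and Y, by convolution."""
    return sum(poisson_pmf(j, lam) * poisson_pmf(k - j, lam) for j in range(k + 1))


lam = 1.7
for k in range(6):
    print(k, pmf_of_sum(k, lam), poisson_pmf(k, 2 * lam))
```

The two columns agree up to floating-point rounding, reflecting the binomial identity $\sum_{j} \binom{k}{j}\lambda^j\lambda^{k-j} = (2\lambda)^k$.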

A problem with the moment generating function is that it need
not exist and hence is difficult to use in a general setting. The
characteristic function defined below shares many of the same
properties as the moment generating function yet always exists.
In nonprobabilistic contexts, a moment generating function is
similar to a Laplace transform and a characteristic function is
similar to a Fourier transform.
The characteristic function of a random variable $X$ is the function
$\Phi_X : \mathbb{R} \to \mathbb{C}$ defined by
$$\Phi_X(t) = E[e^{itX}] = E[\cos(tX)] + iE[\sin(tX)].$$
For the characteristic functions of several common distributions,
see Table 5.1.

5.24 Theorem A characteristic function $\Phi_X$ exists for any random
variable $X$ and it possesses the following properties:

1. $|\Phi_X(t)| \le \Phi_X(0) = 1$ for all $t \in \mathbb{R}$.

2. If $E[|X|^k] < \infty$ then $\Phi_X^{(k)}(0) = i^k E[X^k]$.

3. $\overline{\Phi_X(t)} = \Phi_X(-t)$.

4. $\Phi_X(t)$ is real-valued if and only if $F_X$ is symmetric; that
is, if and only if $\int_B dF_X(x) = \int_{-B} dF_X(x)$ for any real
Borel set $B$ where $-B = \{-x : x \in B\}$. (Note that a
random variable with a symmetric, absolutely continuous
probability distribution function possesses an even probability
density function.)


5.25 Theorem Distinct probability distributions correspond to distinct
characteristic functions.

5.26 Theorem If $X$ and $Y$ are independent random variables then
$\Phi_{X+Y}(t) = \Phi_X(t)\Phi_Y(t)$ for all $t \in \mathbb{R}$.

5.27 Theorem If $a, b \in \mathbb{R}$ and if $Y = aX + b$ then
$\Phi_Y(t) = e^{ibt}\Phi_X(at)$ for all $t \in \mathbb{R}$.

5.28 Theorem (Continuity Property) Suppose that $\{F_n\}_{n \in \mathbb{N}}$ is a
sequence of probability distribution functions with corresponding
characteristic functions $\{\varphi_n : n \in \mathbb{N}\}$. If there exists a
probability distribution function $F$ such that $F_n(x) \to F(x)$ at
each point $x$ where $F$ is continuous then $\varphi_n(t) \to \varphi(t)$ for all
$t$, where $\varphi$ is the characteristic function of $F$. Conversely, if
$\varphi(t) = \lim_{n \to \infty}\varphi_n(t)$ exists for all $t$ and is continuous at $t = 0$ then $\varphi$ is
the characteristic function of some probability distribution function
$F$ and $F_n(x) \to F(x)$ at each point $x$ where $F$ is continuous.

5.11 The Gaussian Distribution

A random variable $X$ is said to be a Gaussian random variable
or to possess a Gaussian distribution if $X$ has a probability
density function of the form
$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(\frac{-(x - m)^2}{2\sigma^2}\right)$$
for all $x \in \mathbb{R}$ where $m \in \mathbb{R}$ and $\sigma^2 > 0$ are fixed parameters. To
indicate that $X$ has such a distribution we write $X \sim N(m, \sigma^2)$.⁵
As we will see, the mean of $X$ is $m$ and the variance of $X$ is $\sigma^2$.
Note that these two parameters completely specify the Gaussian
distribution of $X$. If $X \sim N(0, 1)$ (i.e., if $X$ is Gaussian with
zero mean and unit variance), then we say that $X$ is a standard
Gaussian random variable or that $X$ has a standard Gaussian
distribution.

⁵ Some texts refer to the Gaussian distribution as the Normal distribution.
The "N" in our notation comes from this latter terminology.

Table 5.1 Common Characteristic Functions

Distribution  | $f_X(x)$                                 | $\Phi_X(t)$
--------------+------------------------------------------+---------------------------
Uniform       | $I_{(0,1)}(x)$                           | $\dfrac{e^{it} - 1}{it}$
Exponential   | $e^{-x} I_{(0,\infty)}(x)$               | $\dfrac{1}{1 - it}$
Laplace       | $\frac{1}{2}e^{-\lvert x\rvert}$         | $\dfrac{1}{1 + t^2}$
Cauchy        | $\dfrac{1}{\pi(1 + x^2)}$                | $e^{-\lvert t\rvert}$
Gaussian      | $\dfrac{1}{\sqrt{2\pi}}e^{-x^2/2}$       | $e^{-t^2/2}$

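5.29 Theorem If $X \sim N(m, \sigma^2)$ then $X$ possesses a moment generating
function given by
$$M_X(t) = \exp\left(\frac{\sigma^2 t^2}{2} + tm\right).$$

Proof. Completing the square in the exponent, note that
$$\begin{aligned} M_X(t) &= \int_{-\infty}^{\infty} e^{tx}\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(\frac{-(x - m)^2}{2\sigma^2}\right)dx \\ &= \exp\left(\frac{\sigma^2 t^2}{2} + tm\right)\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(\frac{-(x - (\sigma^2 t + m))^2}{2\sigma^2}\right)dx \end{aligned}$$
where the final integral equals 1 since the integrand is a Gaussian
density with mean $\sigma^2 t + m$ and variance $\sigma^2$. □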


Example 5.12 The moment generating function may now
be used to confirm that if $X \sim N(m, \sigma^2)$ then $E[X] = m$ and
$\mathrm{VAR}[X] = \sigma^2$. Note that $M_X'(t) = (m + \sigma^2 t)M_X(t)$ and that
$M_X''(t) = \sigma^2 M_X(t) + (m + \sigma^2 t)^2 M_X(t)$. Thus, $E[X] = M_X'(0) =
m$ and $E[X^2] = M_X''(0) = \sigma^2 + m^2$, which implies that $\mathrm{VAR}[X] =
E[X^2] - (E[X])^2 = \sigma^2$ as expected. □

5.30 Theorem If a random variable $X$ has a $N(m, \sigma^2)$ distribution
then the random variable
$$W = \frac{X - m}{\sigma}$$
is a standard Gaussian random variable.

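Proof. Note that
$$F_W(w) = P(W \le w) = P\left(\frac{X - m}{\sigma} \le w\right) = P(X \le \sigma w + m).$$
Thus, it follows that
$$F_W(w) = \int_{-\infty}^{\sigma w + m} f_X(x)\,dx = \int_{-\infty}^{w}\frac{1}{\sqrt{2\pi}}\exp\left(\frac{-y^2}{2}\right)dy \quad \text{with } y = \frac{x - m}{\sigma}.$$
Thus, we see that $W$ has a standard Gaussian distribution. □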

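Note 5.4 If
$$\Phi(x) = \int_{-\infty}^{x}\frac{1}{\sqrt{2\pi}}\exp(-t^2/2)\,dt$$
then, for $x \ge 0$,
$$\Phi(x) = 1 - \frac{1}{2}\left(1 + d_1x + d_2x^2 + d_3x^3 + d_4x^4 + d_5x^5 + d_6x^6\right)^{-16} + \varepsilon(x)$$
where $|\varepsilon(x)| < 1.5 \times 10^{-7}$ and where

$d_1 = 0.0498673470$
$d_2 = 0.0211410061$
$d_3 = 0.0032776263$
$d_4 = 0.0000380036$
$d_5 = 0.0000488906$
$d_6 = 0.0000053830$.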


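Further, if $\Phi(x) = 1 - p$ for $0 < p \le 1/2$ then
$$x = t - \frac{c_0 + c_1t + c_2t^2}{1 + q_1t + q_2t^2 + q_3t^3} + \varepsilon(p)$$
where $|\varepsilon(p)| < 4.5 \times 10^{-4}$, where
$$t = \sqrt{\ln(1/p^2)},$$
and where

$c_0 = 2.515517$
$c_1 = 0.802853$
$c_2 = 0.010328$
$q_1 = 1.432788$
$q_2 = 0.189269$
$q_3 = 0.001308$.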
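Both approximations in Note 5.4 are easy to code. The Python sketch below implements them with the coefficients listed above and compares them against the standard library: `math.erf` gives $\Phi(x) = \frac{1}{2}(1 + \operatorname{erf}(x/\sqrt{2}))$, and `statistics.NormalDist().inv_cdf` (Python 3.8 or later) gives the exact quantile. The function names are ours, and the sketch takes $t = \sqrt{\ln(1/p^2)}$, the standard form of this rational approximation.

```python
import math
from statistics import NormalDist

D = [0.0498673470, 0.0211410061, 0.0032776263,
     0.0000380036, 0.0000488906, 0.0000053830]
C = [2.515517, 0.802853, 0.010328]
Q = [1.432788, 0.189269, 0.001308]


def phi_approx(x):
    """Polynomial approximation to Phi(x) for x >= 0 (|error| < 1.5e-7)."""
    poly = 1.0 + sum(d * x ** (k + 1) for k, d in enumerate(D))
    return 1.0 - 0.5 * poly ** -16


def phi_exact(x):
    """Standard Gaussian distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def upper_quantile_approx(p):
    """Rational approximation to the x with Phi(x) = 1 - p, 0 < p <= 1/2
    (|error| < 4.5e-4)."""
    t = math.sqrt(math.log(1.0 / p ** 2))
    num = C[0] + C[1] * t + C[2] * t ** 2
    den = 1.0 + Q[0] * t + Q[1] * t ** 2 + Q[2] * t ** 3
    return t - num / den


for x in (0.0, 1.0, 2.0, 3.0):
    print(x, phi_approx(x), phi_exact(x))
for p in (0.5, 0.05, 0.001):
    print(p, upper_quantile_approx(p), NormalDist().inv_cdf(1 - p))
```

Values of $\Phi(x)$ for $x < 0$ follow from the symmetry $\Phi(-x) = 1 - \Phi(x)$, and quantiles for $p > 1/2$ follow in the same way.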

5.12 The Bivariate Gaussian Distribution

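Two random variables $X$ and $Y$ are said to possess a bivariate
or joint Gaussian distribution if they possess a joint probability
density function of the form
$$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\exp\left(\frac{-q(x, y)}{2}\right)$$
with
$$q(x, y) = \frac{1}{1 - \rho^2}\left[\left(\frac{x - m_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x - m_1}{\sigma_1}\right)\left(\frac{y - m_2}{\sigma_2}\right) + \left(\frac{y - m_2}{\sigma_2}\right)^2\right]$$
where $\sigma_1 > 0$, $\sigma_2 > 0$, $m_1 \in \mathbb{R}$, $m_2 \in \mathbb{R}$, and $|\rho| < 1$. Our notation
for this distribution is $N(m_1, m_2, \sigma_1^2, \sigma_2^2, \rho)$. Such random
variables $X$ and $Y$ are said to be jointly Gaussian or mutually
Gaussian.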

Exercise 5.7 For $X$ and $Y$ as above, show that $X \sim N(m_1, \sigma_1^2)$
and that $Y \sim N(m_2, \sigma_2^2)$.

Exercise 5.8 For $X$ and $Y$ as above, show that the correlation
coefficient of $X$ and $Y$ is $\rho$.

5.31 Theorem Let $X$ and $Y$ have a $N(m_1, m_2, \sigma_1^2, \sigma_2^2, \rho)$ distribution.
The random variables $X$ and $Y$ are independent if and
only if $\rho = 0$. That is, mutually Gaussian random variables $X$
and $Y$ are independent if and only if they are uncorrelated.

Proof. We have already seen on page 101 that if two random
variables are independent then they are uncorrelated. Conversely,
since $\rho$ is the correlation coefficient of $X$ and $Y$ (Exercise 5.8),
if $X$ and $Y$ are uncorrelated then $\rho = 0$, and setting $\rho = 0$ in the
joint density shows that $f_{X,Y}(x, y) = f_X(x)f_Y(y)$. □

5.32 Theorem If $X$ and $Y$ possess a bivariate Gaussian distribution
then $X + Y$ is a Gaussian random variable.

5.13 Multivariate Gaussian Distributions

A collection $\{X_1, \ldots, X_n\}$ of random variables is said to possess
a multivariate Gaussian distribution (or to be jointly Gaussian
or mutually Gaussian) if they possess a joint probability density
function of the form
$$f_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = \frac{1}{\sqrt{(2\pi)^n}\sqrt{\det\Sigma}}\exp\left[-\frac{1}{2}(x - m)^T\Sigma^{-1}(x - m)\right]$$
where $x = [x_1 \cdots x_n]^T$, $m = [m_1 \cdots m_n]^T$, and $\Sigma$ is a symmetric
positive definite matrix. Recall that a matrix $N$ is symmetric if
$N = N^T$ and that a real symmetric matrix is positive definite if
all of its eigenvalues are positive. It follows easily that $E[X_i] =
m_i$ for $i = 1, \ldots, n$ and that $\mathrm{COV}[X_i, X_j] = \sigma_{ij}$ where
$$\Sigma = \begin{bmatrix} \sigma_{11} & \cdots & \sigma_{1n} \\ \vdots & \ddots & \vdots \\ \sigma_{n1} & \cdots & \sigma_{nn} \end{bmatrix}.$$
The matrix $\Sigma$ is called the covariance matrix of $X_1, \ldots, X_n$. We
denote such a distribution for $X = [X_1, \ldots, X_n]^T$ by writing
$X \sim N(m, \Sigma)$.

Except for boolean algebra there is no
theory more universally employed in
mathematics than linear algebra; and there is
hardly any theory which is more
elementary, in spite of the fact that generations
of professors and textbook writers have
obscured its simplicity by preposterous
calculations with matrices. -J. Dieudonné

5.33 Theorem If a collection of Gaussian random variables are mutually
independent then they are mutually Gaussian.

5.34 Theorem If a collection of mutually Gaussian random variables
are (pairwise) uncorrelated then they are mutually independent.

5.35 Theorem If $X = [X_1, \ldots, X_n]^T$ has a $N(\mu, \Sigma)$ distribution
with $\Sigma$ positive definite, if $C$ is an $m \times n$ real matrix with rank⁶
$m \le n$, and if $b$ is an $m \times 1$ real vector, then $CX + b$ has a
$N(C\mu + b, C\Sigma C^T)$ distribution and $C\Sigma C^T$ is positive definite.

⁶ The rank of a matrix is the number of linearly independent rows (or
columns) in the matrix. The matrix $C$ in this theorem can have more
columns than rows but the rows must be linearly independent.


5.36 Theorem If $X = [X_1, \ldots, X_n]^T$ is composed of mutually
Gaussian, positive variance random variables, then there exists
a nonsingular $n \times n$ real matrix $C$ such that $Z = CX$
is a random vector composed of mutually independent positive
variance Gaussian random variables.

Example 5.13 Let $X_1$ and $X_2$ be mutually Gaussian random
variables with zero mean, unit variances, and correlation
coefficient $1/2$. Let
$$\begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix} = \begin{bmatrix} c_1 & c_2 \\ c_3 & c_4 \end{bmatrix}\begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \begin{bmatrix} c_1X_1 + c_2X_2 \\ c_3X_1 + c_4X_2 \end{bmatrix}.$$
Note that $Z_1$ and $Z_2$ are mutually Gaussian. Thus, for $Z_1$ and
$Z_2$ to be independent we require that $E[Z_1Z_2] = E[Z_1]E[Z_2] =
0$. Let $c_1 = c_3 = 1$, let $c_2 = 0$, and note that $E[Z_1Z_2] =
E[X_1(X_1 + c_4X_2)] = E[X_1^2] + c_4E[X_1X_2]$. Note that $E[X_1^2] = 1$
and $E[X_1X_2] = 1/2$. Thus, $Z_1$ and $Z_2$ are independent if $c_4 =
-2$. □
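The choice $c_4 = -2$ can be double-checked through the covariance matrix. By Theorem 5.35 the covariance matrix of $Z = CX$ is $C\Sigma C^T$, so $Z_1$ and $Z_2$ are uncorrelated (and hence, being mutually Gaussian, independent by Theorem 5.34) exactly when its off-diagonal entry vanishes. The plain-Python matrix helpers below are written for illustration only; exact fractions avoid rounding.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


half = Fraction(1, 2)
Sigma = [[1, half], [half, 1]]  # unit variances, correlation coefficient 1/2
C = [[1, 0], [1, -2]]           # c1 = c3 = 1, c2 = 0, c4 = -2

cov_Z = matmul(matmul(C, Sigma), transpose(C))
print(cov_Z)  # off-diagonal entries are 0; VAR[Z1] = 1 and VAR[Z2] = 3
```

The diagonal entry $\mathrm{VAR}[Z_2] = 3$ agrees with Theorem 5.15: $\mathrm{VAR}[X_1 - 2X_2] = 1 + 4 - 2 \cdot 2 \cdot \frac{1}{2} = 3$.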

Example 5.14 Let $X_1, \ldots, X_n$ be random variables possessing
a joint probability density function given by

Thus, since any subset of $\{X_1, \ldots, X_n\}$ containing $n - 1$ random
variables is composed of mutually independent standard
Gaussian random variables, it follows that any proper subset of
$\{X_1, \ldots, X_n\}$ containing at least two random variables is also
composed of mutually independent standard Gaussian random
variables. However, it is clear that the random variables in
$\{X_1, \ldots, X_n\}$ are neither mutually independent nor mutually Gaussian.
This example points out the dangers that arise when one
attempts to show that a collection of random variables is jointly
Gaussian. □

5.14 Convergence of Random Variables

Consider a probability space $(\Omega, \mathcal{F}, P)$ and a sequence $\{X_n\}_{n \in \mathbb{N}}$
of random variables defined on that space. In this section we
will consider several ways in which the elements in this sequence
may converge.

5.14.1 Pointwise Convergence

Consider a probability space $(\Omega, \mathcal{F}, P)$ and a sequence $\{X_n\}_{n \in \mathbb{N}}$
of random variables defined on that space. We say that the $X_n$'s
converge pointwise to a random variable $X$ defined on $(\Omega, \mathcal{F}, P)$
if $|X_n(\omega) - X(\omega)| \to 0$ as $n \to \infty$ for each $\omega \in \Omega$. In such a case
we write $X_n \to X$.

5.14.2 Almost Sure Convergence

In a probabilistic context, a condition that holds almost everywhere
with respect to the underlying probability measure of
interest is said to hold almost surely (written a.s.) or with probability
one (written wp1).


Consider a probability space $(\Omega, \mathcal{F}, P)$ and a sequence $\{X_n\}_{n \in \mathbb{N}}$
of random variables defined on that space. We say that the $X_n$'s
converge almost surely to a random variable $X$ defined on $(\Omega, \mathcal{F}, P)$
if there exists a set $E \in \mathcal{F}$ such that $P(E) = 0$ and such
that $|X_n(\omega) - X(\omega)| \to 0$ as $n \to \infty$ for each $\omega \in E^c$. In such a
case we write $X_n \to X$ a.s.

5.37 Theorem Consider a probability space $(\Omega, \mathcal{F}, P)$ and a sequence
$\{X_n\}_{n \in \mathbb{N}}$ of random variables defined on that space. If
$X$ is a random variable defined on $(\Omega, \mathcal{F}, P)$ such that
$$\sum_{n=1}^{\infty} E[(X_n - X)^2] < \infty$$
then $X_n \to X$ a.s.

Example 5.15 Consider the probability space given by $([0, 1], \mathcal{B}([0, 1]), \lambda)$
where $\mathcal{B}([0, 1])$ denotes the collection of real
Borel subsets of $[0, 1]$ and $\lambda$ is Lebesgue measure on $\mathcal{B}([0, 1])$.
Define random variables $X_n$ for $n \in \mathbb{N}$ on this space via:
if $\omega \in [0, 1] \cap \mathbb{Q}^c$
if $\omega \in [0, 1] \cap \mathbb{Q}$.
Note that $X_n(\omega) \to \infty$ as $n \to \infty$ for all $\omega \in [0, 1] \cap \mathbb{Q}$. Even so,
$[0, 1] \cap \mathbb{Q}$ is countable and hence is a Lebesgue null set. Further,
off the set $[0, 1] \cap \mathbb{Q}$ we see that $X_n \to 0$ as $n \to \infty$. Thus,
we conclude that $X_n \to 0$ a.s. Note that this also follows from
Theorem 5.37 since

which is finite. □

5.14.3 Convergence in Probability

Consider a probability space $(\Omega, \mathcal{F}, P)$ and a sequence $\{X_n\}_{n \in \mathbb{N}}$
of random variables defined on that space. We say that $X_n$
converges in probability to a random variable $X$ defined on $(\Omega, \mathcal{F}, P)$
if for each $\varepsilon > 0$, $P(|X_n - X| \ge \varepsilon) \to 0$ as $n \to \infty$. In
such a case we write $X_n \xrightarrow{P} X$.

5.38 Theorem Consider a probability space $(\Omega, \mathcal{F}, P)$ and a sequence
$\{X_n\}_{n \in \mathbb{N}}$ of random variables defined on that space. Let
$X$ be a random variable also defined on $(\Omega, \mathcal{F}, P)$. If $X_n \to X$
a.s. then $X_n \xrightarrow{P} X$. That is, convergence in probability is
weaker than almost sure convergence.

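Example 5.16 Consider a sequence of mutually independent
random variables $\{X_n\}_{n \in \mathbb{N}}$ such that
$$P(X_n = \alpha) = \begin{cases} 1/n & \text{if } \alpha = 1 \\ 1 - 1/n & \text{if } \alpha = 0. \end{cases}$$
Let $\varepsilon > 0$ and note that
$$P(|X_n - 0| \ge \varepsilon) = P(X_n \ge \varepsilon) = \begin{cases} 0 & \text{if } \varepsilon > 1 \\ 1/n & \text{if } 0 < \varepsilon \le 1. \end{cases}$$
Thus, $P(X_n \ge \varepsilon) \to 0$ as $n \to \infty$ for any $\varepsilon > 0$ which implies
that $X_n \xrightarrow{P} 0$. Does $X_n \to 0$ a.s.? See Problem 11.1. □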

5.14.4 Convergence in Lp

Consider a probability space $(\Omega, \mathcal{F}, P)$, and let $p$ be a positive
real number. Let $L_p(\Omega, \mathcal{F}, P)$ denote the set of all random
variables defined on $(\Omega, \mathcal{F}, P)$ whose $p$th absolute moment is
finite, where we agree to identify any two random variables that
are equal almost surely. (The $p$th absolute moment of a random
variable $X$ is $E[|X|^p]$.)
Consider a probability space $(\Omega, \mathcal{F}, P)$ and a sequence $\{X_n\}_{n \in \mathbb{N}}$
of random variables defined on that space such that $X_n \in L_p(\Omega, \mathcal{F}, P)$
for some fixed $p > 0$. We say that the $X_n$'s converge
in $L_p$ (or in the $p$th mean) to a random variable $X \in L_p(\Omega, \mathcal{F}, P)$
if $E[|X_n - X|^p] \to 0$ as $n \to \infty$. In such a case we write
$X_n \to X$ in $L_p$. If $p = 1$ then $L_p$ convergence is sometimes called
convergence in mean. If $p = 2$ then $L_p$ convergence is sometimes
called convergence in mean-square and we often write $X_n \xrightarrow{\text{m.s.}} X$.

5.39 Theorem Consider a probability space $(\Omega, \mathcal{F}, P)$ and a sequence
$\{X_n\}_{n \in \mathbb{N}}$ of random variables defined on that space. Let
$X$ be a random variable also defined on $(\Omega, \mathcal{F}, P)$. If there
exists some $p > 0$ such that $X_n \to X$ in $L_p$ then $X_n \xrightarrow{P} X$.
That is, convergence in probability is weaker than convergence
in $L_p$.

After passing through several rooms in
a museum filled with the paintings of a
rather well-known modern painter, [Zygmund]
mused, "Mathematics and art are
quite different. We could not publish
so many papers that used repeatedly the
same idea and still command the respect
of our colleagues." -Ronald Coifman and
Robert Strichartz writing about Antoni
Zygmund

Exercise 5.9 Does the converse to Theorem 5.39 hold?

Exercise 5.10 Construct a sequence of random variables
that does not converge pointwise to zero at any point yet does
converge to zero in $L_p$ for any $p > 0$.

Exercise 5.11 Show by an example that almost sure convergence
need not imply convergence in $L_p$.

5.14.5 Convergence in Distribution

A sequence $\{X_n\}_{n \in \mathbb{N}}$ of random variables is said to converge in
distribution or converge in law to a random variable $X$ if the
sequence $\{F_{X_n}\}_{n \in \mathbb{N}}$ of distribution functions converges to $F_X(x)$
at all points $x$ where $F_X$ is continuous. In such a case we write
$X_n \xrightarrow{L} X$. Note that these random variables need not be defined
on the same probability space.

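Table 5.2 Relations Between Types of Convergence

Relation                                                                | Reference
------------------------------------------------------------------------+--------------
$X_n \to X$ in $L_p$ $\not\Rightarrow$ $X_n \to X$ a.s.                  | Exercise 5.10
$X_n \to X$ a.s. $\not\Rightarrow$ $X_n \to X$ in $L_p$                  | Exercise 5.11
$X_n \to X$ a.s. $\Rightarrow$ $X_n \xrightarrow{P} X$                   | Theorem 5.38
$X_n \xrightarrow{P} X$ $\not\Rightarrow$ $X_n \to X$ a.s.               | Example 5.16
$X_n \to X$ in $L_p$ $\Rightarrow$ $X_n \xrightarrow{P} X$               | Theorem 5.39
$X_n \xrightarrow{P} X$ $\not\Rightarrow$ $X_n \to X$ in $L_p$           | Problem 11.3
$X_n \xrightarrow{P} X$ $\Rightarrow$ $X_n \xrightarrow{L} X$            | Theorem 5.40
$X_n \xrightarrow{L} X$ $\not\Rightarrow$ $X_n \xrightarrow{P} X$        | Example 5.17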

5.40 Theorem Consider a probability space $(\Omega, \mathcal{F}, P)$ and a sequence
$\{X_n\}_{n \in \mathbb{N}}$ of random variables defined on that space. Let
$X$ be a random variable also defined on $(\Omega, \mathcal{F}, P)$. If $X_n \xrightarrow{P} X$
then $X_n \xrightarrow{L} X$.

Example 5.17 Let $X$ take on the values 0 and 1 each with probability 1/2, and let $X_n = X$ for each $n \in \mathbb{N}$. Let $Y = 1 - X$. Note that $X_n \xrightarrow{L} Y$ since $F_{X_n} = F_X = F_Y$ for all $n \in \mathbb{N}$, even though $|X_n - Y| = 1$ for each $n \in \mathbb{N}$. □
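Example 5.17 can be checked mechanically on a two-point sample space. The Python sketch below is only an illustration. The outcome labels `w0`, `w1` and the helper `cdf` are hypothetical names. Because $X_n = X$ for every $n$, comparing $F_X$ with $F_Y$ is all that convergence in distribution requires here.

```python
from fractions import Fraction

# Hypothetical two-point sample space; each outcome has probability 1/2.
omega = ["w0", "w1"]
prob = {"w0": Fraction(1, 2), "w1": Fraction(1, 2)}
X = {"w0": 0, "w1": 1}               # X takes the values 0 and 1
Y = {w: 1 - X[w] for w in omega}     # Y = 1 - X

def cdf(Z, x):
    """F_Z(x) = P(Z <= x) on the finite sample space."""
    return sum(prob[w] for w in omega if Z[w] <= x)

grid = [-1, 0, 0.5, 1, 2]
print([cdf(X, x) for x in grid])
print([cdf(Y, x) for x in grid])
print([abs(X[w] - Y[w]) for w in omega])
```

Exact `Fraction` arithmetic avoids any rounding question when the two distribution functions are compared.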

Table 5.2 summarizes the relationships between the different types of convergence that we have considered.

5.15 The Central Limit Theorem

The Central Limit Theorem states that the sum of many in-
dependent random variables will be approximately Gaussian if
each term in the sum has a high probability of being small.


A key word in that description is "approximately." Nowhere does the Central Limit Theorem state that anything actually has a Gaussian distribution, except perhaps in a limit. In engineering applications, Gaussian assumptions are often justified by appeals to the Central Limit Theorem. Such appeals, however, are often poorly supported at best and specious at worst. We must always keep in mind that the Central Limit Theorem is not a magic wand that can make anything have a Gaussian distribution.

5.41 Theorem (Central Limit Theorem) Suppose that $\{X_n\}_{n\in\mathbb{N}}$ is a mutually independent sequence of identically distributed random variables each with mean $m$ and finite positive variance $\sigma^2$. If $S_n = X_1 + \cdots + X_n$ then

$$\frac{S_n - nm}{\sigma\sqrt{n}} \xrightarrow{L} Z$$

where $Z$ is a standard Gaussian random variable.
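A small Monte Carlo experiment can make Theorem 5.41 concrete. The sketch below is a hypothetical illustration, not part of the theory. It assumes $X_k$ uniform on $[0, 1]$, so that $m = 1/2$ and $\sigma^2 = 1/12$, and uses Python's `random` module. It compares the empirical distribution function of $(S_n - nm)/(\sigma\sqrt{n})$ with the standard Gaussian distribution function $\Phi$ at a few points. The helper names, the number of trials, and the seed are arbitrary choices.

```python
import math
import random

def std_normal_cdf(z):
    """Phi(z), the standard Gaussian distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def standardized_sum(n, rng):
    """(S_n - n m) / (sigma sqrt(n)) for X_k ~ Uniform(0, 1), where m = 1/2, sigma^2 = 1/12."""
    s = sum(rng.random() for _ in range(n))
    return (s - 0.5 * n) / math.sqrt(n / 12.0)

def cdf_gap(n, trials=20000, seed=1):
    """Largest |empirical CDF - Phi| over a few test points z."""
    rng = random.Random(seed)
    samples = [standardized_sum(n, rng) for _ in range(trials)]
    gaps = []
    for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
        empirical = sum(1 for s in samples if s <= z) / trials
        gaps.append(abs(empirical - std_normal_cdf(z)))
    return max(gaps)

for n in (1, 2, 30):
    print(n, cdf_gap(n))
```

For this well-behaved summand the gap shrinks quickly with $n$. As Note 5.5 explains, the theorem itself promises no rate.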

Proof. (Sketch) Without loss of generality, let $m = 0$. Let $\phi$ be the characteristic function of $X_n$ and note that $S_n/(\sigma\sqrt{n})$ has characteristic function $[\phi(t/(\sigma\sqrt{n}))]^n$. Since the $X_i$'s have a finite variance, Taylor's theorem implies that

$$\phi(t) = 1 - \frac{t^2\sigma^2}{2} + \beta(t)$$

where $\beta(t)/t^2 \to 0$ as $t \to 0$. (Recall from calculus that

$$\left(1 - \frac{t^2}{2n}\right)^n \to \exp\left(-\frac{t^2}{2}\right)$$

as $n \to \infty$.) Thus, it follows that the characteristic function of $S_n/(\sigma\sqrt{n})$ converges to $\exp(-t^2/2)$, the characteristic function of a standard Gaussian random variable, as $n \to \infty$. The desired result follows from Theorem 5.28 on page 108. □
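The limit in the proof sketch can be watched numerically for one concrete distribution. This hypothetical Python sketch takes $X_k$ equal to $\pm 1$ with probability $1/2$ each, so $m = 0$, $\sigma = 1$, and $\phi(t) = \cos t$. The characteristic function of $S_n/\sqrt{n}$ is then exactly $\cos(t/\sqrt{n})^n$, which should approach $\exp(-t^2/2)$. The value $t = 1.5$ is an arbitrary choice.

```python
import math

def cf_standardized_sum(t, n):
    """Characteristic function of S_n / sqrt(n) when X_k = +1 or -1 with probability 1/2.

    Here m = 0, sigma = 1 and phi(t) = cos(t), so the result is cos(t / sqrt(n)) ** n.
    """
    return math.cos(t / math.sqrt(n)) ** n

t = 1.5
limit = math.exp(-t * t / 2.0)
for n in (1, 10, 100, 10_000):
    value = cf_standardized_sum(t, n)
    print(n, value, abs(value - limit))
```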

◊ Note 5.5 There exist sequences of mutually independent, identically distributed, second order random variables for which the convergence rate associated with the Central Limit Theorem is arbitrarily slow. For details about this result, see the article "A Lower Bound for the Convergence Rate in the Central Limit Theorem" by V. K. Matskyavichyus in the Theory of Probability and its Applications, 1983, Vol. 28, No. 3, pp. 596-601. This result calls into question the standard engineering claim that the sum of a few dozen random variables is always approximately Gaussian.

5.16 Laws of Large Numbers

Consider $n$ mutually independent tossings of a coin with constant probability $p$ of turning up heads. Let $T$ denote the number of times that the coin comes up heads in $n$ tosses. If $n$ is large then it is reasonable to expect the ratio $T/n$ to be close to $p$. The laws of large numbers make this idea mathematically precise. According to the Weak Law of Large Numbers (WLLN), the ratio $T/n$ converges to $p$ in probability. According to the Strong Law of Large Numbers (SLLN), $T/n$ converges to $p$ almost surely.
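The coin-tossing description can be simulated directly. This Python sketch is a hypothetical illustration. The value $p = 0.3$, the seed, and the sample sizes are arbitrary. Because every call reuses the same seed, the printed ratios follow the first $n$ tosses of a single run. A single simulated run illustrates, but does not prove, either law.

```python
import random

def head_fraction(n, p, seed=0):
    """Simulate n independent tosses with P(heads) = p and return T / n."""
    rng = random.Random(seed)
    T = sum(1 for _ in range(n) if rng.random() < p)
    return T / n

p = 0.3
for n in (10, 1_000, 100_000):
    print(n, head_fraction(n, p))
```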

5.42 Theorem (WLLN) If $\{X_n\}_{n\in\mathbb{N}}$ is a sequence of identically distributed, mutually independent random variables each with a finite mean $m$, then

$$\frac{X_1 + \cdots + X_n}{n} \xrightarrow{P} m.$$

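Proof. We will prove only the special case when the $X_n$'s each have a finite positive variance $\sigma^2$. Let

$$\bar{X} = \frac{1}{n}\sum_{k=1}^{n} X_k.$$

Since $E[\bar{X}] = m$ and, by independence, $\mathrm{Var}(\bar{X}) = \sigma^2/n$, Chebyshev's inequality implies that, for each $\varepsilon > 0$,

$$P\left(\left|\frac{X_1 + \cdots + X_n}{n} - m\right| > \varepsilon\right) \le \frac{\sigma^2}{n\varepsilon^2}.$$

The desired result now follows immediately by letting $n \to \infty$. □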
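For coin tossing, the probability bounded in this proof can be computed exactly, which shows how conservative Chebyshev's bound is. In the Python sketch below (a hypothetical illustration; the function names do not come from the text), $T$ is Binomial$(n, p)$ and $\sigma^2 = p(1 - p)$. The exact value of $P(|T/n - p| > \varepsilon)$ is summed from binomial probabilities, computed in log space with `math.lgamma` to avoid overflow. The quantities $p$ and $\varepsilon$ are kept as `Fraction`s so that the event $|k/n - p| > \varepsilon$ is decided without rounding.

```python
from fractions import Fraction
from math import exp, lgamma, log

def binom_pmf(n, k, p):
    """P(T = k) for T ~ Binomial(n, p), 0 < p < 1, computed in log space."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + k * log(p) + (n - k) * log(1.0 - p))

def deviation_prob(n, p, eps):
    """Exact P(|T/n - p| > eps); p and eps are Fractions so the event is decided exactly."""
    return sum(binom_pmf(n, k, float(p))
               for k in range(n + 1)
               if abs(Fraction(k, n) - p) > eps)

def chebyshev_bound(n, p, eps):
    """The bound sigma^2 / (n eps^2) from the proof, with sigma^2 = p (1 - p)."""
    return float(p * (1 - p) / (n * eps * eps))

p, eps = Fraction(1, 2), Fraction(1, 10)
for n in (10, 100, 1000):
    print(n, deviation_prob(n, p, eps), chebyshev_bound(n, p, eps))
```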


5.43 Theorem (SLLN) If $\{X_n\}_{n\in\mathbb{N}}$ is a sequence of identically distributed, mutually independent random variables each with a finite mean $m$ and a finite positive variance $\sigma^2$, then

$$\frac{X_1 + \cdots + X_n}{n} \to m \text{ a.s.}$$

5.17 Conditioning

Consider a random variable $X$ defined on a probability space $(\Omega, \mathcal{F}, P)$ with $E[|X|] < \infty$, and let $\mathcal{G}$ be a $\sigma$-subalgebra of $\mathcal{F}$. The conditional expectation of $X$ given $\mathcal{G}$ is denoted by $E[X|\mathcal{G}]$ and is defined to be any random variable defined on $(\Omega, \mathcal{F}, P)$ that satisfies the following two properties:

1. $E[X|\mathcal{G}]$ is $\mathcal{G}$-measurable.

2. $\int_G E[X|\mathcal{G}]\,dP = \int_G X\,dP$ for all $G \in \mathcal{G}$.

Any $\mathcal{G}$-measurable random variable that is equal a.s. to $E[X|\mathcal{G}]$ is called a version of $E[X|\mathcal{G}]$.
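On a finite sample space the definition can be checked by hand. Suppose $\mathcal{G}$ is generated by a partition of $\Omega$ into blocks of positive probability. Then a version of $E[X|\mathcal{G}]$ is obtained by replacing $X$ on each block by its $P$-weighted average there. This is constant on blocks, which gives property 1. Property 2 holds because every $G \in \mathcal{G}$ is a union of blocks. The Python sketch below verifies property 2 for every $G \in \mathcal{G}$. The probabilities, the values of $X$, and the partition are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite example: Omega = {0, ..., 5} with the probabilities below, and
# G the sigma-subalgebra generated by the partition {0, 1}, {2, 3, 4}, {5}.
P = {0: Fraction(1, 12), 1: Fraction(3, 12), 2: Fraction(2, 12),
     3: Fraction(1, 12), 4: Fraction(3, 12), 5: Fraction(2, 12)}
X = {0: 4, 1: -2, 2: 7, 3: 0, 4: 1, 5: 5}
blocks = [{0, 1}, {2, 3, 4}, {5}]

def cond_exp(X, P, blocks):
    """A version of E[X|G]: on each block, the P-weighted average of X (constant on blocks)."""
    result = {}
    for B in blocks:
        pB = sum(P[w] for w in B)
        avg = sum(X[w] * P[w] for w in B) / pB
        for w in B:
            result[w] = avg
    return result

def integral(Z, P, G):
    """The integral of Z over the event G with respect to P."""
    return sum(Z[w] * P[w] for w in G)

E = cond_exp(X, P, blocks)
# Every G in the sigma-algebra is a union of blocks; check property 2 for each one.
for r in range(len(blocks) + 1):
    for chosen in combinations(blocks, r):
        G = set().union(*chosen)
        assert integral(E, P, G) == integral(X, P, G)
print(E)
```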

5.44 Theorem Consider a random variable $X$ defined on a probability space $(\Omega, \mathcal{F}, P)$ with $E[|X|] < \infty$, and let $\mathcal{G}$ be a $\sigma$-subalgebra of $\mathcal{F}$. The conditional expectation $E[X|\mathcal{G}]$ exists and is almost surely unique.

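If $A \in \mathcal{F}$ then the conditional probability of $A$ given $\mathcal{G}$ is denoted by $P(A|\mathcal{G})$ and is defined by

$$P(A|\mathcal{G}) = E[I_A|\mathcal{G}].$$

Thus, $P(A|\mathcal{G})$ satisfies the following two properties:

1. $P(A|\mathcal{G})$ is $\mathcal{G}$-measurable.

2. $\int_G P(A|\mathcal{G})\,dP = P(A \cap G)$ for all $G \in \mathcal{G}$.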


Exercise 5.12 Consider a random variable $X$ defined on a probability space $(\Omega, \mathcal{F}, P)$ with $E[|X|] < \infty$. Show that $E[X|\mathcal{F}] = X$ a.s.

Exercise 5.13 Consider a random variable $X$ defined on a probability space $(\Omega, \mathcal{F}, P)$ with $E[|X|] < \infty$. Show that $E[X|\{\emptyset, \Omega\}] = E[X]$. Does this hold pointwise or just almost surely?

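Consider random variables $X$ and $Y$ defined on a probability space $(\Omega, \mathcal{F}, P)$ with $E[|X|] < \infty$ and $E[|Y|] < \infty$, and let $\mathcal{G}$ be a $\sigma$-subalgebra of $\mathcal{F}$. Conditional expectations satisfy the following properties:

1. If $X = a$ a.s. for $a \in \mathbb{R}$ then $E[X|\mathcal{G}] = a$ a.s.

2. If $\alpha \in \mathbb{R}$ and $\beta \in \mathbb{R}$ then $E[\alpha X + \beta Y|\mathcal{G}] = \alpha E[X|\mathcal{G}] + \beta E[Y|\mathcal{G}]$ a.s.

3. If $X \le Y$ a.s. then $E[X|\mathcal{G}] \le E[Y|\mathcal{G}]$ a.s.

4. $|E[X|\mathcal{G}]| \le E[|X|\,|\,\mathcal{G}]$ a.s.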

Property (1) is a special case of the following result.

5.45 Theorem Consider integrable random variables $X$ and $Y$ defined on a probability space $(\Omega, \mathcal{F}, P)$, and let $\mathcal{G}$ be a $\sigma$-subalgebra of $\mathcal{F}$. If $X$ is $\mathcal{G}$-measurable and if $E[XY]$ is finite then $E[XY|\mathcal{G}] = XE[Y|\mathcal{G}]$ a.s.

5.6 Corollary Consider a random variable $X$ defined on a probability space $(\Omega, \mathcal{F}, P)$ with $E[|X|] < \infty$, and let $\mathcal{G}$ be a $\sigma$-subalgebra of $\mathcal{F}$. If $X$ is $\mathcal{G}$-measurable then $E[X|\mathcal{G}] = X$ a.s.

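5.46 Theorem Consider a random variable $X$ defined on a probability space $(\Omega, \mathcal{F}, P)$ with $E[|X|] < \infty$, and let $\mathcal{G}_1$ and $\mathcal{G}_2$ be $\sigma$-subalgebras of $\mathcal{F}$ such that $\mathcal{G}_1 \subset \mathcal{G}_2$. It follows that

$$E[E[X|\mathcal{G}_1]|\mathcal{G}_2] = E[E[X|\mathcal{G}_2]|\mathcal{G}_1] = E[X|\mathcal{G}_1] \text{ a.s.}$$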


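Proof. We will first show that

$$E[E[X|\mathcal{G}_2]|\mathcal{G}_1] = E[X|\mathcal{G}_1] \text{ a.s.}$$

Recall that if

1. $Y$ is $\mathcal{G}_1$-measurable, and

2. $\int_G Y\,dP = \int_G X\,dP$ for all $G \in \mathcal{G}_1$

then $Y = E[X|\mathcal{G}_1]$ a.s. Thus, if $E[E[X|\mathcal{G}_2]|\mathcal{G}_1]$ satisfies the previous two properties then $E[E[X|\mathcal{G}_2]|\mathcal{G}_1] = E[X|\mathcal{G}_1]$ a.s. By definition, $E[E[X|\mathcal{G}_2]|\mathcal{G}_1]$ is $\mathcal{G}_1$-measurable. Thus, we need only show that

$$\int_G E[E[X|\mathcal{G}_2]|\mathcal{G}_1]\,dP = \int_G X\,dP$$

for all $G \in \mathcal{G}_1$. By definition, the conditional expectation $E[E[X|\mathcal{G}_2]|\mathcal{G}_1]$ must satisfy:

$$\int_G E[E[X|\mathcal{G}_2]|\mathcal{G}_1]\,dP = \int_G E[X|\mathcal{G}_2]\,dP \text{ for all } G \in \mathcal{G}_1. \qquad (1)$$

Similarly, $E[X|\mathcal{G}_2]$ must satisfy:

$$\int_G E[X|\mathcal{G}_2]\,dP = \int_G X\,dP \text{ for all } G \in \mathcal{G}_2,$$

which, since $\mathcal{G}_1 \subset \mathcal{G}_2$, implies that

$$\int_G E[X|\mathcal{G}_2]\,dP = \int_G X\,dP \text{ for all } G \in \mathcal{G}_1.$$

Substituting this expression into (1) implies that

$$\int_G E[E[X|\mathcal{G}_2]|\mathcal{G}_1]\,dP = \int_G X\,dP$$

for all $G \in \mathcal{G}_1$, which is what we wanted to show. We will next show that $E[E[X|\mathcal{G}_1]|\mathcal{G}_2] = E[X|\mathcal{G}_1]$ a.s. In this regard, the following lemma will be useful.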

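5.5 Lemma Consider a random variable $Z$ defined on a probability space $(\Omega, \mathcal{F}, P)$ and let $\mathcal{G}_1$ and $\mathcal{G}_2$ be $\sigma$-subalgebras of $\mathcal{F}$. If $Z$ is $\mathcal{G}_1$-measurable and if $\mathcal{G}_1 \subset \mathcal{G}_2$ then $Z$ is $\mathcal{G}_2$-measurable.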


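Proof. Since $Z$ is $\mathcal{G}_1$-measurable it follows that $Z^{-1}(\mathcal{B}(\mathbb{R})) \subset \mathcal{G}_1$. But, since $\mathcal{G}_1 \subset \mathcal{G}_2$, it follows that $Z^{-1}(\mathcal{B}(\mathbb{R})) \subset \mathcal{G}_2$. That is, $Z$ is $\mathcal{G}_2$-measurable. □

We will now continue with our proof of Theorem 5.46. By definition, $E[X|\mathcal{G}_1]$ is $\mathcal{G}_1$-measurable. Lemma 5.5 thus implies that $E[X|\mathcal{G}_1]$ is also $\mathcal{G}_2$-measurable. Corollary 5.6 thus implies that $E[E[X|\mathcal{G}_1]|\mathcal{G}_2] = E[X|\mathcal{G}_1]$ a.s. □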
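Both halves of Theorem 5.46, and Corollary 5.7 as well, can be checked exactly on a finite sample space where $\mathcal{G}_1$ and $\mathcal{G}_2$ are generated by nested partitions. Every block of the coarse partition is a union of blocks of the fine one, so $\mathcal{G}_1 \subset \mathcal{G}_2$. In this hypothetical Python sketch, $P$ is uniform on eight outcomes and $X(\omega) = \omega^2$. No outcome has probability zero, so "a.s." equality is the same as pointwise equality here.

```python
from fractions import Fraction

P = {w: Fraction(1, 8) for w in range(8)}   # hypothetical uniform P on {0, ..., 7}
X = {w: w * w for w in range(8)}            # hypothetical X(w) = w^2
coarse = [{0, 1, 2, 3}, {4, 5, 6, 7}]       # generates G1
fine = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]     # generates G2; G1 is contained in G2

def cond_exp(Z, blocks):
    """E[Z | sigma(blocks)]: the P-weighted average of Z on each block."""
    out = {}
    for B in blocks:
        pB = sum(P[w] for w in B)
        avg = sum(Z[w] * P[w] for w in B) / pB
        out.update({w: avg for w in B})
    return out

def expect(Z):
    return sum(Z[w] * P[w] for w in P)

E1 = cond_exp(X, coarse)
E2 = cond_exp(X, fine)
assert cond_exp(E2, coarse) == E1   # E[E[X|G2]|G1] = E[X|G1]
assert cond_exp(E1, fine) == E1     # E[E[X|G1]|G2] = E[X|G1]
assert expect(E1) == expect(X)      # Corollary 5.7: E[E[X|G1]] = E[X]
```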

5.7 Corollary Consider a random variable $X$ defined on a probability space $(\Omega, \mathcal{F}, P)$ with $E[|X|] < \infty$, and let $\mathcal{G}$ be a $\sigma$-subalgebra of $\mathcal{F}$. It follows that $E[E[X|\mathcal{G}]] = E[X]$.

Consider random variables $X, Y_1, \ldots, Y_n$ defined on a probability space $(\Omega, \mathcal{F}, P)$ with $E[|X|] < \infty$. The conditional expectation of $X$ given $Y_1, \ldots, Y_n$ is denoted by $E[X|Y_1, \ldots, Y_n]$ and is defined to be $E[X|\sigma(Y_1, \ldots, Y_n)]$.

5.47 Theorem If $X$ and $Y$ are independent random variables defined on a probability space $(\Omega, \mathcal{F}, P)$ with $X$ integrable then $E[X|Y] = E[X]$ a.s.

Proof. Since $X$ and $Y$ are independent it follows that $P(A \cap B) = P(A)P(B)$ for all $A \in \sigma(X)$ and for all $B \in \sigma(Y)$. Let $A \in \sigma(Y)$ and consider the random variable $I_A$. Since $\sigma(I_A) \subset \sigma(Y)$ it follows that $X$ and $I_A$ are independent. Note that, for all $A \in \sigma(Y)$,

$$\begin{aligned}
\int_A E[X|Y]\,dP &= \int_A X\,dP \\
&= \int_\Omega I_A X\,dP \\
&= E[I_A X] \\
&= E[I_A]E[X] \\
&= \int_\Omega I_A\,dP\; E[X] \\
&= \int_A dP\; E[X] \\
&= \int_A E[X]\,dP.
\end{aligned}$$

Note also that $E[X]$, being constant, is $\sigma(Y)$-measurable. Thus it follows that $E[X|Y] = E[X]$ a.s. □
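Under a product measure, a function of the first coordinate and a function of the second coordinate are independent. Theorem 5.47 therefore predicts that $E[X|Y]$ is the constant $E[X]$. The Python sketch below checks this exactly on a small hypothetical product space. The marginal probabilities and the functions defining $X$ and $Y$ are arbitrary choices. Here $\sigma(Y)$ is generated by the level sets of $Y$, so $E[X|Y]$ is computed as an average over each level set.

```python
from fractions import Fraction
from itertools import product

# Hypothetical product space Omega = {0, 1} x {0, 1, 2}; P is the product of pu and pv.
pu = {0: Fraction(1, 4), 1: Fraction(3, 4)}
pv = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
P = {(u, v): pu[u] * pv[v] for u, v in product(pu, pv)}
X = {w: 10 * w[0] - 2 for w in P}   # a function of the first coordinate only
Y = {w: w[1] ** 2 for w in P}       # a function of the second coordinate only

def cond_exp_given(Z, V):
    """E[Z | V] on a finite space: the P-weighted average of Z on each level set {V = c}."""
    out = {}
    for c in set(V.values()):
        block = [w for w in P if V[w] == c]
        pB = sum(P[w] for w in block)
        avg = sum(Z[w] * P[w] for w in block) / pB
        out.update({w: avg for w in block})
    return out

EX = sum(X[w] * P[w] for w in P)
E_X_given_Y = cond_exp_given(X, Y)
print(EX, set(E_X_given_Y.values()))
```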


Exercise 5.14 Show that if $E[X|Y] = E[X]$ then $E[XY] = E[X]E[Y]$.

5.48 Theorem (Jensen's Inequality) Consider a random variable $X$ defined on a probability space $(\Omega, \mathcal{F}, P)$ with $E[|X|] < \infty$, and let $\mathcal{G}$ be a $\sigma$-subalgebra of $\mathcal{F}$. If $\varphi$ is a convex real-valued function defined on $\mathbb{R}$ and if $\varphi(X)$ is integrable then

$$\varphi(E[X|\mathcal{G}]) \le E[\varphi(X)|\mathcal{G}] \text{ a.s.}$$

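Example 5.18 Consider a random variable $X$ defined on a probability space $(\Omega, \mathcal{F}, P)$ with $E[X^2] < \infty$, and let $\mathcal{G}_1$ and $\mathcal{G}_2$ be $\sigma$-subalgebras of $\mathcal{F}$ such that $\mathcal{G}_1 \subset \mathcal{G}_2$. We will show that

$$E[(X - E[X|\mathcal{G}_2])^2] \le E[(X - E[X|\mathcal{G}_1])^2].$$

To begin, for a $\sigma$-subalgebra $\mathcal{G}$ of $\mathcal{F}$, note that

$$\begin{aligned}
E[(X - E[X|\mathcal{G}])^2] &= E[X^2 - 2XE[X|\mathcal{G}] + E[X|\mathcal{G}]^2] \\
&= E[X^2] - 2E[XE[X|\mathcal{G}]] + E[E[X|\mathcal{G}]^2] \\
&= E[X^2] - 2E[E[XE[X|\mathcal{G}]|\mathcal{G}]] + E[E[X|\mathcal{G}]^2] \\
&= E[X^2] - 2E[E[X|\mathcal{G}]^2] + E[E[X|\mathcal{G}]^2] \\
&= E[X^2] - E[E[X|\mathcal{G}]^2].
\end{aligned}$$

Thus, it follows that

$$E[(X - E[X|\mathcal{G}_2])^2] \le E[(X - E[X|\mathcal{G}_1])^2]$$

if and only if

$$E[E[X|\mathcal{G}_1]^2] \le E[E[X|\mathcal{G}_2]^2].$$

The desired result follows since we have

$$\begin{aligned}
E[E[X|\mathcal{G}_1]^2] &= E[(E[E[X|\mathcal{G}_2]|\mathcal{G}_1])^2] \\
&\le E[E[E[X|\mathcal{G}_2]^2|\mathcal{G}_1]] \\
&= E[E[X|\mathcal{G}_2]^2]
\end{aligned}$$

via Jensen's inequality. □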
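Both the identity $E[(X - E[X|\mathcal{G}])^2] = E[X^2] - E[E[X|\mathcal{G}]^2]$ and the resulting monotonicity can be checked exactly for nested partitions of a finite sample space. In this hypothetical Python sketch, $\mathcal{G}_1 \subset \mathcal{G}_2 \subset \mathcal{G}_3$ run from the trivial $\sigma$-algebra to all subsets. The uniform $P$ and the values of $X$ are arbitrary choices.

```python
from fractions import Fraction

P = {w: Fraction(1, 6) for w in range(6)}    # hypothetical uniform P on {0, ..., 5}
X = {0: 3, 1: -1, 2: 4, 3: 1, 4: -5, 5: 9}   # hypothetical values of X
G1 = [{0, 1, 2, 3, 4, 5}]                    # trivial sigma-algebra {empty set, Omega}
G2 = [{0, 1, 2}, {3, 4, 5}]                  # contains G1
G3 = [{w} for w in range(6)]                 # all subsets; contains G2

def cond_exp(Z, blocks):
    """E[Z | sigma(blocks)]: the P-weighted average of Z on each block."""
    out = {}
    for B in blocks:
        pB = sum(P[w] for w in B)
        avg = sum(Z[w] * P[w] for w in B) / pB
        out.update({w: avg for w in B})
    return out

def expect(Z):
    return sum(Z[w] * P[w] for w in P)

def mse(blocks):
    """E[(X - E[X|G])^2] for G generated by the partition `blocks`."""
    E = cond_exp(X, blocks)
    return expect({w: (X[w] - E[w]) ** 2 for w in P})

for blocks in (G1, G2, G3):
    E = cond_exp(X, blocks)
    # Identity derived in the example: E[(X - E[X|G])^2] = E[X^2] - E[E[X|G]^2].
    assert mse(blocks) == expect({w: X[w] ** 2 for w in P}) - expect({w: E[w] ** 2 for w in P})

print(mse(G1), mse(G2), mse(G3))
```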


Consider a probability space $(\Omega, \mathcal{F}, P)$ and let $H$ denote the Hilbert space of square integrable random variables defined on $(\Omega, \mathcal{F}, P)$ where $\langle X, Y\rangle = E[XY]$ and where we agree to identify any two random variables $X$ and $Y$ for which $E[(X - Y)^2] = 0$. Let $X, Y_1, \ldots, Y_n$ be second order random variables defined on $(\Omega, \mathcal{F}, P)$. Our goal now is to find a Borel measurable function $f: \mathbb{R}^n \to \mathbb{R}$ so that $E[(X - f(Y_1, \ldots, Y_n))^2]$ is minimized over all such functions $f$. Let $G$ be the subspace of $H$ given by all elements of $H$ that may be written as Borel measurable transformations of $Y_1, \ldots, Y_n$. Using the Hilbert Space Projection Theorem (Theorem 4.7) we know that the function we are seeking is the projection of $X$ on $G$; that is, we seek the point in $G$ that is nearest to $X$.

5.6 Lemma The projection of $X$ on $G$ is given by $E[X|Y_1, \ldots, Y_n]$.

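Proof. First, note that $E[X|Y_1, \ldots, Y_n] \in G$ since (via Jensen's inequality) we have:

$$E\big[E[X|Y_1, \ldots, Y_n]^2\big] \le E\big[E[X^2|Y_1, \ldots, Y_n]\big] = E[X^2] < \infty.$$

Next, let $Z \in G$ and note that

$$E[XZ] = E\big[E[XZ|Y_1, \ldots, Y_n]\big] = E\big[Z\,E[X|Y_1, \ldots, Y_n]\big].$$

That is,

$$\langle X, Z\rangle = \langle E[X|Y_1, \ldots, Y_n], Z\rangle,$$

which implies that

$$\langle X - E[X|Y_1, \ldots, Y_n], Z\rangle = 0.$$

Thus, $X - E[X|Y_1, \ldots, Y_n]$ is orthogonal to every element in $G$. □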
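For a discrete $Y$ taking only two values, every function of $Y$ is determined by its two values. This makes both conclusions of Lemma 5.6 easy to check exactly. First, $E[X|Y]$ has the smallest mean-square error among the candidate functions tried. Second, $X - E[X|Y]$ is orthogonal to the indicators of $\{Y = 0\}$ and $\{Y = 1\}$, which span all functions of $Y$. The joint pmf and the grid of competing functions in this Python sketch are hypothetical choices.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint pmf of (X, Y); Y takes only the values 0 and 1.
pmf = {(0, 0): Fraction(1, 10), (1, 0): Fraction(2, 10), (3, 0): Fraction(1, 10),
       (0, 1): Fraction(3, 10), (2, 1): Fraction(1, 10), (5, 1): Fraction(2, 10)}

def cond_mean(y):
    """E[X | Y = y] = sum_x x p(x, y) / p_Y(y)."""
    p_y = sum(p for (x, yy), p in pmf.items() if yy == y)
    return sum(x * p for (x, yy), p in pmf.items() if yy == y) / p_y

def mse(f):
    """E[(X - f(Y))^2] for a function f of Y."""
    return sum((x - f(y)) ** 2 * p for (x, y), p in pmf.items())

def two_valued(a, b):
    """The function of Y equal to a on {Y = 0} and b on {Y = 1}."""
    return lambda y: a if y == 0 else b

best = mse(cond_mean)
grid = [Fraction(k, 4) for k in range(-8, 25)]
# No two-valued function on this grid does better than E[X | Y].
assert all(mse(two_valued(a, b)) >= best for a, b in product(grid, grid))
# Orthogonality: E[(X - E[X|Y]) g(Y)] = 0 for the indicators g of {Y = 0} and {Y = 1}.
for y0 in (0, 1):
    assert sum((x - cond_mean(y)) * p for (x, y), p in pmf.items() if y == y0) == 0
print(cond_mean(0), cond_mean(1), best)
```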

Thus, we conclude that the minimum mean-square Borel measurable estimate of $X$ in terms of $Y_1, \ldots, Y_n$ is given by $E[X|Y_1, \ldots, Y_n]$. The following result shows that Borel measurability of our estimators cannot be dispensed with:

5.49 Theorem Let $M$ be any real number. There exists a probability space $(\Omega, \mathcal{F}, P)$, two bounded random variables $X$ and $Y$ defined on $(\Omega, \mathcal{F}, P)$, and a function $f: \mathbb{R} \to \mathbb{R}$ such that $X(\omega) = f(Y(\omega))$ for all $\omega \in \Omega$ yet such that $E[(X - E[X|Y])^2] > M$.


◊ Proof. See "A Note on a Common Misconception in Estimation" by Gary Wise in Systems and Control Letters, 1985, Vol. 5, pp. 355-356. For related material, see also "A Result on Multidimensional Quantization" by Eric Hall and Gary Wise, in Proceedings of the American Mathematical Society, Vol. 118, No. 2, June 1993, pp. 609-613. □

5.18 Regression Functions

Consider random variables $X$ and $Y$ defined on a probability space $(\Omega, \mathcal{F}, P)$ with $E[|X|] < \infty$. A regression function of $X$ given $Y = y$ is denoted by $E[X|Y = y]$ and is defined to be any real-valued Borel measurable function on $\mathbb{R}$ that satisfies

$$\int_B E[X|Y = y]\,dF_Y(y) = \int_{Y^{-1}(B)} X\,dP$$

for all $B \in \mathcal{B}(\mathbb{R})$.

5.50 Theorem Any two regression functions of $X$ given $Y = y$ are equal almost everywhere with respect to the measure induced by $F_Y$.

5.51 Theorem Consider random variables $X$ and $Y$ defined on a probability space $(\Omega, \mathcal{F}, P)$ with $E[|X|] < \infty$. If $\phi(y) = E[X|Y = y]$ then $E[X|Y] = \phi(Y)$ a.s.

5.52 Theorem Consider two random variables $X$ and $Y$, with $E[|X|] < \infty$, possessing a joint density function $f_{X,Y}$. Then

$$E[X|Y = y] = \int_{\mathbb{R}} x\,\frac{f_{X,Y}(x, y)}{f_Y(y)}\,dx$$

almost everywhere with respect to the measure induced by $F_Y$. That is, a version of $E[X|Y = y]$ is given by

$$\int_{\mathbb{R}} x\,\frac{f_{X,Y}(x, y)}{f_Y(y)}\,dx.$$


(The ratio

$$\frac{f_{X,Y}(x, y)}{f_Y(y)}$$

is called a conditional density of $X$ given $Y = y$ and is denoted by $f_{X|Y}(x|y)$.)

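Example 5.19 Let $X$ and $Y$ be zero mean, unit variance, mutually Gaussian random variables with correlation coefficient $\rho$, where $|\rho| < 1$. In this example we will find $E[X|Y = y]$ and $E[X|Y]$. Note that

$$\begin{aligned}
E[X|Y = y] &= \int_{\mathbb{R}} x\,\frac{f_{X,Y}(x, y)}{f_Y(y)}\,dx \\
&= \int_{\mathbb{R}} x\,\frac{\dfrac{1}{2\pi\sqrt{1 - \rho^2}}\exp\left(\dfrac{-(x^2 - 2\rho xy + y^2)}{2(1 - \rho^2)}\right)}{\dfrac{1}{\sqrt{2\pi}}\exp\left(\dfrac{-y^2}{2}\right)}\,dx \\
&= \frac{1}{\sqrt{2\pi}\sqrt{1 - \rho^2}}\exp\left(\frac{y^2}{2}\right)\exp\left(\frac{-y^2}{2(1 - \rho^2)}\right)\int_{\mathbb{R}} x\exp\left(\frac{-(x^2 - 2\rho xy)}{2(1 - \rho^2)}\right)dx \\
&= \frac{1}{\sqrt{2\pi}\sqrt{1 - \rho^2}}\exp\left(\frac{y^2}{2}\right)\exp\left(\frac{-y^2}{2(1 - \rho^2)}\right)\exp\left(\frac{\rho^2 y^2}{2(1 - \rho^2)}\right)\int_{\mathbb{R}} x\exp\left(\frac{-(x^2 - 2\rho xy + \rho^2 y^2)}{2(1 - \rho^2)}\right)dx \\
&= \exp\left(\frac{y^2}{2}\right)\exp\left(\frac{-y^2}{2(1 - \rho^2)}\right)\exp\left(\frac{\rho^2 y^2}{2(1 - \rho^2)}\right)\int_{\mathbb{R}} \frac{x}{\sqrt{2\pi}\sqrt{1 - \rho^2}}\exp\left(\frac{-(x - \rho y)^2}{2(1 - \rho^2)}\right)dx \\
&= \exp\left(\frac{y^2}{2}\right)\exp\left(\frac{-y^2}{2(1 - \rho^2)}\right)\exp\left(\frac{\rho^2 y^2}{2(1 - \rho^2)}\right)\rho y \\
&= \exp\left(\frac{y^2(1 - \rho^2) - y^2 + \rho^2 y^2}{2(1 - \rho^2)}\right)\rho y \\
&= \rho y
\end{aligned}$$

where the final integral above is simply the mean of a $N(\rho y, 1 - \rho^2)$ random variable. Thus, it follows that $E[X|Y = y] = \rho y$ and hence that $E[X|Y] = \rho Y$ a.s. □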
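The closed form $E[X|Y = y] = \rho y$ can be compared against direct numerical integration of the ratio in Theorem 5.52. This Python sketch is a hypothetical check. The denominator $\int f_{X,Y}(x, y)\,dx$, which approximates $f_Y(y)$, is computed on the same grid as the numerator. The grid half-width, the step count, and the test pairs $(\rho, y)$ with $|\rho| < 1$ are arbitrary choices.

```python
import math

def joint_density(x, y, rho):
    """Zero mean, unit variance bivariate Gaussian density with correlation rho (|rho| < 1)."""
    c = 1.0 / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))
    return c * math.exp(-(x * x - 2.0 * rho * x * y + y * y) / (2.0 * (1.0 - rho * rho)))

def regression(y, rho, half_width=12.0, steps=24000):
    """Riemann-sum approximation of int x f(x, y) dx / int f(x, y) dx over [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    num = den = 0.0
    for k in range(steps + 1):
        x = -half_width + k * h
        w = joint_density(x, y, rho)
        num += x * w
        den += w
    return num / den

print(regression(1.3, 0.6), 0.6 * 1.3)
```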


As the following theorem shows, the existence of a joint density function for two random variables $X$ and $Y$ with $X$ integrable places no additional restrictions on the regression function of $X$ given $Y = y$.

5.53 Theorem Let $g$ be any Borel measurable function mapping $\mathbb{R}$ into $\mathbb{R}$. There exist random variables $X$ and $Y$ possessing a joint density function such that $X$ is integrable and $E[X|Y = y] = g(y)$ for all $y \in \mathbb{R}$.

Proof. Let $g: \mathbb{R} \to \mathbb{R}$ be Borel measurable and define

$$f(x, y) = \frac{1}{4}\exp[-\exp(|y|)\,|x - g(y)|].$$

Note that $f(x, y)$ is a joint probability density function since

$$\begin{aligned}
\int_{\mathbb{R}}\int_{\mathbb{R}} f(x, y)\,dx\,dy &= \int_{\mathbb{R}}\int_{\mathbb{R}} \frac{1}{4}\exp[-\exp(|y|)\,|x - g(y)|]\,dx\,dy \\
&= \int_{\mathbb{R}}\int_{\mathbb{R}} \frac{1}{4}\exp[-\exp(|y|)\,|z|]\,dz\,dy \\
&= \int_{\mathbb{R}} \frac{1}{2}\exp(-|y|)\,dy = 1.
\end{aligned}$$

Let $X$ and $Y$ be random variables such that the pair $(X, Y)$ has a joint probability density function given by $f(x, y)$. Notice from the above calculation that a second marginal probability density function of $f(x, y)$ is given by $f_Y(y) = \exp(-|y|)/2$. Recall that a version of $E[X|Y = y]$ is given by $\int_{\mathbb{R}} x\,[f(x, y)/f_Y(y)]\,dx$. This version will be used throughout the remainder of this proof. Substituting for $f_Y(y)$ implies that

$$\begin{aligned}
E[X|Y = y] &= 2\exp(|y|)\int_{\mathbb{R}} \frac{x}{4}\exp[-\exp(|y|)\,|x - g(y)|]\,dx \\
&= 2\exp(|y|)\int_{\mathbb{R}} \frac{z + g(y)}{4}\exp[-\exp(|y|)\,|z|]\,dz \\
&= 2\exp(|y|)\,\frac{g(y)}{2\exp(|y|)} = g(y),
\end{aligned}$$

where the term involving $z$ integrates to zero by symmetry. Hence, the random variables $X$ and $Y$ with the joint probability density function $f(x, y)$ are such that $E[X|Y = y] = g(y)$, where $g$ was an arbitrarily preselected Borel measurable function. □
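The construction in this proof can be checked numerically for a particular $g$. In the Python sketch below, $g(y) = \sin(3y) + y^2$ is a hypothetical choice. The integrals over $x$ are approximated by Riemann sums on a grid centred at $g(y)$, with spacing scaled by the decay rate $\exp(|y|)$. The sketch returns approximations of $f_Y(y) = \exp(-|y|)/2$ and of $E[X|Y = y]$, which should equal $g(y)$.

```python
import math

def g(y):
    """Hypothetical target regression function; the proof allows any Borel function."""
    return math.sin(3.0 * y) + y * y

def f(x, y):
    """Joint density from the proof: (1/4) exp(-exp(|y|) |x - g(y)|)."""
    return 0.25 * math.exp(-math.exp(abs(y)) * abs(x - g(y)))

def marginal_and_regression(y, points_per_scale=1000, scales=40):
    """Riemann-sum approximations of f_Y(y) and of int x f(x, y) dx / f_Y(y).

    The x-grid is centred at g(y) with spacing h = 1 / (points_per_scale * a),
    where a = exp(|y|) is the decay rate of f(., y); it covers scales / a on each side.
    """
    a = math.exp(abs(y))
    h = 1.0 / (points_per_scale * a)
    n = points_per_scale * scales
    fy = 0.0
    first_moment = 0.0
    for k in range(-n, n + 1):
        x = g(y) + k * h
        w = f(x, y) * h
        fy += w
        first_moment += x * w
    return fy, first_moment / fy

fy, regression = marginal_and_regression(0.8)
print(fy, math.exp(-0.8) / 2.0, regression, g(0.8))
```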


◊ 5.19 Statistical Hypothesis Testing

Basic concepts of statistics arise in medicine, engineering, sociology, business, education, and other areas. For example, consider a medical situation in which a new medication for a particular problem is being tested. Assume that patients are divided into two groups, and assume that the patients in the first group are each given the new medication and that the patients in the second group are each given a placebo. (A placebo is a sugar pill that is identical in appearance to the medication.) Assume that each group has 50 patients in it. What if 36 patients in the first group were found to be free of the medical problem of concern, and 25 patients in the second group were found to be free of the medical problem of concern? How might we describe these results? Of the 50 patients taking the new medication, 36 of them improved. This is an objective result of the data. However, we should be careful before concluding that 72% of the time the new medication will be effective. This conclusion belongs in the realm of statistical inference.

A statistical hypothesis is a nonempty family of probability measures on a given measurable space. For convenience, we will take our underlying measurable space to be $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Then, for instance, the probability measures of interest could be distributions of random variables. A statistical hypothesis is said to be simple if it is a singleton set. Roughly speaking, we are trying to discern which hypothesis is in effect based upon knowledge of a realization of a random variable. For example, if the hypotheses are simple and if, under one hypothesis, unit measure is given to a particular Borel set, and if, under the other hypothesis, the measure gives unit measure to a disjoint Borel set, then it should be straightforward to discern which hypothesis is in effect. Indeed, just see which of the two Borel sets contains the realization and announce the corresponding hypothesis.

Consider the situation where we have two disjoint simple statistical hypotheses $H_0$ and $H_1$ and assume that we know the probability $\pi_0$ of $H_0$ and $\pi_1$ of $H_1$. These probabilities are often called the priors since such a probability is the probability of a hypothesis being true without regard to any random variable that is observed. For convenience, assume that the relevant measures associated with $H_0$ and $H_1$ have probability densities denoted, respectively, by $f_0$ and $f_1$.

We note that there are two types of errors we could make in reaching a decision. We could announce $H_1$ when $H_0$ is true or we could announce $H_0$ when $H_1$ is true. Our goal will be to make a decision in such a way as to minimize the probability of error $P_e$. Let $S_0$ denote a Borel set such that if a realization belongs to $S_0$ then we announce $H_0$ and if a realization belongs to $S_0^c$ we announce $H_1$. Let $S_1 = S_0^c$. Thus, it follows that

$$P_e = \pi_0\int_{S_1} f_0(x)\,dx + \pi_1\int_{S_0} f_1(x)\,dx.$$

Rewriting this, we have

$$\begin{aligned}
P_e &= \pi_0\int_{S_1} f_0(x)\,dx + \pi_1\int_{S_0} f_1(x)\,dx + \pi_1\int_{S_1} f_1(x)\,dx - \pi_1\int_{S_1} f_1(x)\,dx \\
&= \pi_1 + \int_{S_1}\big(\pi_0 f_0(x) - \pi_1 f_1(x)\big)\,dx.
\end{aligned}$$

Now, we see that we can minimize $P_e$ over the choice of $S_1$ by defining $S_1$ to be the set of real numbers $x$ such that $\pi_0 f_0(x) - \pi_1 f_1(x) < 0$. Consequently, $S_0$ is the set of all real numbers $x$ such that $\pi_0 f_0(x) - \pi_1 f_1(x) \ge 0$. We note that the placement of the equality case in these inequalities is arbitrary, since points where $\pi_0 f_0(x) = \pi_1 f_1(x)$ contribute nothing to the integral.

As an example, consider testing the hypothesis that a random variable $X$ has a standard normal distribution versus the hypothesis that $X$ is normal with a mean and a variance of one. Assume that each hypothesis is equally likely; that is, $\pi_0 = \pi_1 = 1/2$. Using the above procedure, we announce that the mean is one for real numbers $x$ such that

$$\frac{1}{2}\cdot\frac{1}{\sqrt{2\pi}}e^{-x^2/2} < \frac{1}{2}\cdot\frac{1}{\sqrt{2\pi}}e^{-(x-1)^2/2};$$

taking logarithms, this is $-x^2 < -(x - 1)^2$, that is, $2x > 1$. So we announce that the mean is one when $x > 1/2$. Hence, we announce that the mean is one whenever $X \in (1/2, \infty)$, and this test minimizes the probability of error.
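The error probability of this test can be evaluated in closed form with the Gaussian tail function $Q(z) = P(Z > z)$. The Python sketch below assumes only the two hypotheses from the example and uses `math.erfc`. It checks that the threshold $1/2$ gives $P_e = Q(1/2)$ and that nearby thresholds do worse. The helper `bayes_threshold` is an extension, not taken from the text. It applies the same procedure with unequal priors: solving $\pi_0 e^{-x^2/2} < \pi_1 e^{-(x-1)^2/2}$ gives $x > 1/2 + \ln(\pi_0/\pi_1)$.

```python
import math

def Q(z):
    """Standard Gaussian tail probability P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def error_probability(threshold, pi0=0.5, pi1=0.5):
    """P_e for the rule 'announce H1 (mean one) iff x > threshold'.

    H0: X ~ N(0, 1); H1: X ~ N(1, 1).
    P_e = pi0 * P(X > threshold | H0) + pi1 * P(X <= threshold | H1).
    """
    announce_h1_under_h0 = Q(threshold)
    announce_h0_under_h1 = 1.0 - Q(threshold - 1.0)
    return pi0 * announce_h1_under_h0 + pi1 * announce_h0_under_h1

def bayes_threshold(pi0, pi1):
    """Boundary of {x : pi0 f0(x) < pi1 f1(x)} for these two hypotheses: 1/2 + ln(pi0 / pi1)."""
    return 0.5 + math.log(pi0 / pi1)

print(error_probability(0.5))
```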


5.20 Caveats and Curiosities


6 Random Processes
6.1 Introduction

Throughout this chapter, we will assume that all probability spaces are complete unless otherwise specified. A random process (or a stochastic process) defined on a probability space $(\Omega, \mathcal{F}, P)$ is an indexed collection of random variables each defined on $(\Omega, \mathcal{F}, P)$. We denote a random process by $\{X(t) : t \in T\}$ where $T$ is a nonempty index set that often denotes time and is usually (in these notes, always) taken to be a subset of $\mathbb{R}$. Thus, for each fixed $t$ in $T$, $X(t)$ (or, more precisely, $X(t, \cdot)$) is simply a random variable defined on $(\Omega, \mathcal{F}, P)$. If $T$ is a countably infinite set then we say that $\{X(t) : t \in T\}$ is a random sequence or a discrete time or discrete parameter random process. If $T$ is a subinterval of $\mathbb{R}$ then we say that $\{X(t) : t \in T\}$ is a continuous time or continuous parameter random process. We will often denote a random process $\{X(t) : t \in T\}$ by $\{X(t)\}$ when the index set $T$ is arbitrary or clear from the context.
Consider a random process $\{X(t) : t \in T\}$. A function $X(t, \omega_0) : T \to \mathbb{R}$ obtained by fixing some $\omega_0 \in \Omega$ and letting $t$ vary is called a sample function or sample path or trajectory of the random process $\{X(t)\}$. If $T$ is countably infinite then a sample path is called a sample sequence. If $\{t_1, t_2, \ldots, t_n\}$ is any finite set of elements from $T$ then the joint probability distribution of the random variables $X(t_1), \ldots, X(t_n)$ is called a finite dimensional distribution of the random process $\{X(t)\}$. A random process $\{Y(t) : t \in T\}$ is said to be a modification of a random process $\{X(t) : t \in T\}$ if $X(t) = Y(t)$ a.s. for all $t \in T$. Notice that in such a case $\{X(t) : t \in T\}$ and $\{Y(t) : t \in T\}$ have the same family of finite dimensional distributions. Also, note that the associated $P$-null set can depend on $t$.
Two random processes $\{X(t) : t \in T\}$ and $\{Y(t) : t \in T\}$ are said to be indistinguishable if, for almost every $\omega$, $X(t, \omega) = Y(t, \omega)$ for all $t \in T$. Notice that there is just one set of measure zero off of which $X(t) = Y(t)$ for all $t$ in $T$ while for a modification the set of measure zero off of which $X(t) = Y(t)$ may depend on $t$. If $T$ is a countable set then the two definitions are equivalent since a countable union of null sets is itself a null set.

The value of a problem is not so much in coming up with the answer as in the ideas and attempted ideas it forces on the would-be solver. -I. N. Herstein

Let $D$ be a subset of $\mathbb{R}$. The set $D$ is said to be dense in $\mathbb{R}$ if every nonempty open subset of $\mathbb{R}$ contains an element from $D$. For example, the set $\mathbb{Q}$ of rational numbers is dense in $\mathbb{R}$. Let $\{X(t) : t \in T\}$ be a random process defined on a complete probability space $(\Omega, \mathcal{F}, P)$ where $T$ is an interval. The random process $\{X(t) : t \in T\}$ is said to be separable if there exists a countable dense subset $I$ of $T$ and a null set $N \in \mathcal{F}$ such that if $\omega \in N^c$ and $t \in T$ then there exists a sequence $\{t_n\}_{n\in\mathbb{N}}$ of elements from $I$ with $t_n \to t$ such that $X(t_n, \omega) \to X(t, \omega)$.

6.1 Theorem Consider a random process $\{X(t) : t \in T\}$ defined on a complete probability space and assume that $T \in \mathcal{B}(\mathbb{R})$. There exists a separable random process $\{Y(t) : t \in T\}$ defined on the same probability space that is a modification of $\{X(t) : t \in T\}$.

Theorem 6.1 says that requiring a random process to be separable places no additional restrictions on the family of finite dimensional distributions of that process. In short, any such random process admits a separable modification.

Let $(\Omega_1, \mathcal{F}_1)$ and $(\Omega_2, \mathcal{F}_2)$ be two measurable spaces. If $A \in \mathcal{F}_1$ and $B \in \mathcal{F}_2$ then $A \times B$ is called a measurable rectangle. The smallest $\sigma$-algebra on $\Omega_1 \times \Omega_2$ that contains every measurable rectangle is denoted by $\mathcal{F}_1 \times \mathcal{F}_2$ and is called the product $\sigma$-algebra on $\Omega_1 \times \Omega_2$.

6.2 Theorem If $(\Omega_1, \mathcal{F}_1, \mu_1)$ and $(\Omega_2, \mathcal{F}_2, \mu_2)$ are $\sigma$-finite measure spaces then there exists a $\sigma$-finite measure on the measurable space $(\Omega_1 \times \Omega_2, \mathcal{F}_1 \times \mathcal{F}_2)$, called the product measure and denoted by $\mu_1 \times \mu_2$, such that, for any measurable rectangle $A \times B$,

$$(\mu_1 \times \mu_2)(A \times B) = \mu_1(A)\mu_2(B).$$

Consider a random process $\{X(t) : t \in T \subset \mathbb{R}\}$ defined on a probability space $(\Omega, \mathcal{F}, P)$, and let $\mathcal{M}(\mathbb{R})$ denote the collection of all Lebesgue measurable subsets of $\mathbb{R}$. If $T$ is an element of $\mathcal{M}(\mathbb{R})$ and if $X$ is a measurable mapping from $(\mathbb{R} \times \Omega, \mathcal{M}(\mathbb{R}) \times \mathcal{F})$ to $(\mathbb{R}, \mathcal{M}(\mathbb{R}))$ then we say that the random process $\{X(t) : t \in T\}$ is measurable.

Example 6.1 Let $A$ be a subset of $\mathbb{R}$ that is not a Lebesgue measurable set, let $X$ be a positive random variable defined on a probability space $(\Omega, \mathcal{F}, P)$, and define a random process $\{Y(t) : t \in \mathbb{R}\}$ on this space via $Y(t, \omega) = X(\omega)I_A(t)$. The inverse image of the Borel set $(0, \infty)$ is $A \times \Omega$, which is not a measurable set in the product measure space. Thus, $\{Y(t)\}$ is not a measurable random process. Notice that for each fixed $t$, $Y(t)$ is a random variable yet, for each fixed $\omega$, $Y(t)$ is a non-Lebesgue measurable function of $t$. □

6.3 Theorem Let $\{X(t) : t \in T\}$ be a random process defined on a complete probability space and assume that $T$ is a Lebesgue measurable subset of $\mathbb{R}$. Suppose that there exists a subset $N$ of $\mathbb{R}$ having Lebesgue measure zero such that $X(s)$ converges in probability to $X(t)$ as $s \to t$ for every $t$ in $T \setminus N$. (That is, suppose that $\{X(t) : t \in T \setminus N\}$ is continuous in probability.) Then there exists a random process defined on the same space that is a measurable and separable modification of $\{X(t) : t \in T\}$.

Theorem 6.3 says that any random process that is continuous in probability admits a modification that is both separable and measurable. Recall that separability places no additional restrictions on the family of finite dimensional distributions of a random process. This statement cannot be made for measurability. In particular, there exist random processes that do not possess measurable modifications. An example of such a process (that is, nevertheless, discussed frequently in engineering contexts) is provided by the following theorem.


6.4 Theorem Let $\{X(t) : t \in \mathbb{R}\}$ be a random process composed of second order, positive variance, mutually independent random variables defined on the same probability space. The random process $\{X(t) : t \in \mathbb{R}\}$ does not admit a measurable modification.

One is often confronted with a need to integrate the sample paths of a random process. The following theorem presents conditions that are sufficient to ensure that almost all of the sample paths of a random process are Lebesgue integrable. Later, we will define an $L_2$ (or mean-square) integral for a certain family of random processes. It will not be defined as a pathwise integral but instead will be defined as an $L_2$ limit. (If both the pathwise integral and the $L_2$ integral exist they will be equal almost surely.) We will find this latter type of integral much more useful for our purposes than an integral based upon the sample paths of a random process.

6.5 Theorem Let $\{X(t) : t \in T\}$ be a measurable random process. All sample paths of the random process are Lebesgue measurable functions of $t$. If $E[X(t)]$ exists for all $t \in T$ then it defines a Lebesgue measurable function of $t$. Further, if $A$ is a Lebesgue measurable subset of $T$ and if $\int_A E[|X(t)|]\,dt < \infty$ then almost all sample paths of $\{X(t) : t \in T\}$ are Lebesgue integrable over $A$.

6.2 Gaussian Processes

A random process $\{X(t) : t \in T\}$ is called a Gaussian process if the random variables $X(t_1), X(t_2), \ldots, X(t_n)$ are mutually Gaussian for every finite subset $\{t_1, t_2, \ldots, t_n\}$ of $T$.

6.6 Theorem Let $\{Y(t) : t \in T\}$ be any random process such that $E[|Y(t)|^2] < \infty$ for all $t \in T$. There exists a Gaussian process $\{X(t) : t \in T\}$ (defined, perhaps, on a different probability space) such that $E[X(t)] = 0$ and $E[X(s)X(t)] = E[Y(s)Y(t)]$ for all $s$ and $t$ in $T$.


6.3 Second Order Random Processes

A random process $\{X(t) : t \in T \subset \mathbb{R}\}$ is said to be a second order random process or an $L_2$ random process if $E[X^2(t)] < \infty$ for all $t \in T$. The autocovariance function of such a process is defined to be

$$K(t_1, t_2) = E[(X(t_1) - E[X(t_1)])(X(t_2) - E[X(t_2)])]$$

where $t_1, t_2 \in T$. Notice that if $\{X(t) : t \in T\}$ has autocovariance function $K$ and if $f : T \to \mathbb{R}$ then $\{X(t) + f(t) : t \in T\}$ also has autocovariance function $K$. That is, changing the means of the individual random variables in $\{X(t) : t \in T\}$ does not change the autocovariance function of the process. The autocorrelation function of a second order random process $\{X(t) : t \in T\}$ is defined to be $R(t_1, t_2) = E[X(t_1)X(t_2)]$ for $t_1$ and $t_2$ in $T$. Note that $K(t_1, t_2) = R(t_1, t_2) - E[X(t_1)]E[X(t_2)]$. Note, also, that for a zero mean, second order random process, the autocovariance function is equal to the autocorrelation function.

6.7 Theorem Let $K$ be a real-valued nonnegative definite function defined on $T \times T$ such that $K(t, s) = K(s, t)$ for any $t$ and $s$ in $T$. There exists a second order random process $\{X(t) : t \in T\}$ whose autocovariance function is $K$.
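Theorems 6.6 and 6.7 are existence results for arbitrary index sets. On a finite index set the construction is elementary, and the Python sketch below illustrates only that finite-dimensional case; it is not the proof. Suppose $K$ restricted to $t_1, \ldots, t_n$ is symmetric and strictly positive definite. Then a Cholesky factorization $K = LL^T$ turns independent standard Gaussian draws $Z$ into a zero-mean Gaussian vector $LZ$ with covariance $K$. The covariance $K(s, t) = \exp(-|s - t|)$ and the time points are hypothetical choices. A nonnegative definite but singular $K$ would need a different factorization, such as an eigendecomposition.

```python
import math
import random

def cholesky(K):
    """Lower-triangular L with L L^T = K, for a symmetric positive definite K (list of lists)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

def sample_gaussian_vector(L, rng):
    """Return L z for z of independent N(0, 1) draws; its covariance matrix is L L^T."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(L))]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(L))]

# Hypothetical index set and covariance K(s, t) = exp(-|s - t|).
times = [0.0, 0.5, 1.0, 2.0, 3.5]
K = [[math.exp(-abs(s - t)) for t in times] for s in times]
L = cholesky(K)
path = sample_gaussian_vector(L, random.Random(0))
print(path)
```

Repeating the draw many times and averaging the products $X(t_i)X(t_j)$ would approximate $K(t_i, t_j)$.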

A random process $\{X(t) : t \in T\}$ is said to be strictly stationary if given any positive integer $n$, any elements $t_1 < t_2 < \cdots < t_n$ from $T$, and any $h > 0$ such that $t_i + h \in T$ for each $i \le n$, the joint distribution function of the random variables $X(t_1 + h), X(t_2 + h), \ldots, X(t_n + h)$ is the same as that of $X(t_1), X(t_2), \ldots, X(t_n)$. That is, if $t$ denotes time, a strictly stationary random process is one whose finite dimensional distributions remain the same as time is shifted.

Example 6.2 Let $\{X(t) : t \in \mathbb{R}\}$ be a random process composed of identically distributed, mutually independent random variables. Let $s \in \mathbb{R}$ and let $t_1 < t_2 < \cdots < t_n$ be elements of $\mathbb{R}$ where $n \in \mathbb{N}$, and note that

$$\begin{aligned}
F_{X(t_1+s), \ldots, X(t_n+s)}(x_1, \ldots, x_n) &= P(X(t_1 + s) \le x_1, \ldots, X(t_n + s) \le x_n) \\
&= P(X(t_1 + s) \le x_1) \cdots P(X(t_n + s) \le x_n) \\
&= P(X(t_1) \le x_1) \cdots P(X(t_n) \le x_n) \\
&= F_{X(t_1), \ldots, X(t_n)}(x_1, \ldots, x_n).
\end{aligned}$$

Thus, it follows that $\{X(t) : t \in \mathbb{R}\}$ is a strictly stationary random process. □

A random process $\{X(t) : t \in T\}$ is said to be wide sense stationary (WSS) if it is a second order process and if $K(s, t)$ depends only on the difference $s - t$. We denote $K(s + t, s)$ by $K(t)$ for a random process that is wide sense stationary. In the case of a WSS random process $\{X(t) : t \in T\}$ the assumption that $E[X(t)]$ is a constant function of $t$ is often added. However, this condition is unnatural mathematically and has nothing to do with the essential properties of interest for WSS random processes. For example, let $\Theta$ be a random variable with a uniform distribution on $[0, 2\pi]$. For each $t \in \mathbb{R}$, let $X(t) = t + \cos(2\pi t + \Theta)$. Then $E[X(t)] = t$ and $K(s, t) = \frac{1}{2}\cos(2\pi(s - t))$, so $\{X(t) : t \in \mathbb{R}\}$ is a WSS random process with a nonconstant mean.

Example 6.3 Let $\{X(t) : t \in \mathbb{R}\}$ be a random process composed of mutually independent random variables such that $E[X(t)] = 0$ for all $t \in \mathbb{R}$ and such that $E[X^2(t)] = \sigma^2 \in (0, \infty)$ for all $t \in \mathbb{R}$. This random process is wide sense stationary since

$$E[X(t)X(t + s)] = \begin{cases} E[X(t)]E[X(t + s)] = 0 & \text{if } s \ne 0 \\ E[X^2(t)] = \sigma^2 & \text{if } s = 0 \end{cases}$$

is a constant function of $t$. □

6.8 Theorem A strictly stationary second order random process is wide sense stationary.

6.9 Theorem A wide sense stationary Gaussian process with a constant mean is strictly stationary.¹

We will next consider a calculus for second order processes. That is, we will consider a framework in which we may discuss continuity, differentiation, and integration of second order random processes.

¹Thus, the phrase "stationary Gaussian process" is not ambiguous.


A second order random process $\{X(t) : t \in \mathbb{R}\}$ is said to be $L_2$ continuous at the point $t \in \mathbb{R}$ if $X(t + h) \to X(t)$ in $L_2$ as $h \to 0$. A second order random process $\{X(t) : t \in \mathbb{R}\}$ is said to be $L_2$ differentiable at the point $t \in \mathbb{R}$ if $(X(t + h) - X(t))/h$ converges in $L_2$ to a limit $X'(t)$ as $h \to 0$. The next theorem relates $L_2$ continuity of a second order random process to the autocovariance function of the random process.

Recall that a function $f : \mathbb{R}^2 \to \mathbb{R}$ is said to be continuous at $(x, y)$ if for every $\varepsilon > 0$ there exists a $\delta > 0$ such that $|f(x, y) - f(a, b)| < \varepsilon$ for all points $(a, b)$ in $\mathbb{R}^2$ such that $\sqrt{(x - a)^2 + (y - b)^2} < \delta$.

6.10 Theorem Let $\{X(t) : t \in \mathbb{R}\}$ be a second order random process such that $E[X(t)]$ is continuous. The random process $\{X(t) : t \in \mathbb{R}\}$ is L2 continuous at $\tau \in \mathbb{R}$ if and only if $K$ is continuous at $(\tau, \tau) \in \mathbb{R}^2$.

6.1 Lemma If an autocovariance function is continuous at $(t, t)$ for all $t$ in $\mathbb{R}$ then it is continuous at $(s, t)$ for all $s$ and $t$ in $\mathbb{R}$.

6.1 Corollary Let $\{X(t) : t \in \mathbb{R}\}$ be a WSS random process with autocovariance function $K(t)$. If the process is L2 continuous at some point $s$ then $K$ is continuous at the origin. If $K$ is continuous at the origin then it is continuous everywhere and the random process is L2 continuous for all $t$.

Notice that the random process in Example 6.3 is nowhere L2 continuous since its autocovariance function is discontinuous at the origin. We will next relate L2 differentiability and differentiability of the autocovariance function in the wide sense stationary case.

6.11 Theorem Let $\{X(t) : t \in \mathbb{R}\}$ be a WSS random process with autocovariance function $K(t)$. If the process is L2 differentiable at all points $t \in \mathbb{R}$ then $K(t)$ is twice differentiable for all $t \in \mathbb{R}$ and $\{X'(t) : t \in \mathbb{R}\}$ is a wide sense stationary random process with autocovariance function $-K''(t)$.

We next consider integration of second order random processes. Let $\{X(t) : a \le t \le b\}$ be a second order random process with autocovariance function $K$ where $a$ and $b$ are real numbers with $a < b$. Let $g$ be a real-valued function defined on $[a, b]$. We define

$$\int_a^b g(t)X(t)\,dt$$

as follows. Let $\Delta = \{t_0, t_1, \ldots, t_n\}$ be such that $a = t_0 < t_1 < \cdots < t_n = b$, and let $|\Delta|$ denote the maximum of $|t_i - t_{i-1}|$ over all positive integers $i \le n$. Define

$$I(\Delta) = \sum_{k=1}^{n} g(t_k)X(t_k)(t_k - t_{k-1}).$$

If $I(\Delta)$ converges in L2 to some random variable $Z$ as $|\Delta| \to 0$ then we say that $g(t)X(t)$ is L2 integrable on $[a, b]$ and we denote the L2 limit $Z$ by

$$\int_a^b g(t)X(t)\,dt.$$

6.12 Theorem If, in the context of our discussion, $E[X(t)]$ and $g(t)$ are continuous on $[a, b]$ and if $K$ is continuous on $[a, b] \times [a, b]$ then $g(t)X(t)$ is L2 integrable on $[a, b]$.

6.13 Theorem If $E[X(t)] = 0$, if $g$ and $h$ are continuous on $[a, b]$, and if $K$ is continuous on $[a, b] \times [a, b]$ then

$$E\left[\int_a^b g(s)X(s)\,ds \int_a^b h(t)X(t)\,dt\right] = \int_a^b\int_a^b g(s)h(t)K(s, t)\,ds\,dt,$$

and

$$E\left[\int_a^b g(s)X(s)\,ds\right] = E\left[\int_a^b h(t)X(t)\,dt\right] = 0.$$
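Theorem 6.13 with $h = g$ gives $E[(\int_a^b g(t)X(t)\,dt)^2] = \int_a^b\int_a^b g(s)g(t)K(s, t)\,ds\,dt$, and since L2 convergence carries second moments along, this is also the limit of $E[I(\Delta)^2]$. A minimal sketch, assuming a zero mean process with $K(s, t) = \min(s, t)$ on $[0, 1]$ (the Wiener process, which this section does not discuss) and $g \equiv 1$: for $n$ equal subintervals, $E[I(\Delta)^2] = (n + 1)(2n + 1)/(6n^2)$, which tends to $\int_0^1\int_0^1 \min(s, t)\,ds\,dt = 1/3$. The function name is illustrative.

```python
import numpy as np


def riemann_sum_variance(K, g, a, b, n):
    """E[I(Delta)^2] for I(Delta) = sum_k g(t_k) X(t_k) (t_k - t_{k-1}) on the uniform
    partition of [a, b] into n pieces, for a zero mean process with autocovariance K:
    sum_j sum_k g(t_j) g(t_k) K(t_j, t_k) dt^2."""
    t = np.linspace(a, b, n + 1)[1:]      # right endpoints t_1, ..., t_n
    dt = (b - a) / n
    gt = g(t)
    return dt**2 * gt @ K(t[:, None], t[None, :]) @ gt


v = riemann_sum_variance(np.minimum, np.ones_like, 0.0, 1.0, 1000)
print(v, (1001 * 2001) / (6 * 1000**2), 1 / 3)
```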
6.14 Theorem If $E[X(t)] = 0$, if $h$ is continuous on $[a, b]$, and if $K$ is continuous on $[a, b] \times [a, b]$ then

$$E\left[X(s)\int_a^b h(t)X(t)\,dt\right] = \int_a^b K(s, t)h(t)\,dt.$$

6.2 Lemma If a sequence of random variables defined on some probability space converges in L2 and converges almost surely then the limits are equal with probability one.


6.15 Theorem If the integral $\int_a^b g(t)X(t)\,dt$ exists as an L2 integral and, for almost all $\omega$, as a Riemann integral then the two integrals are equal with probability one.

Proof. If $g(t)X(t)$ is both L2 integrable and Riemann integrable a.s. on $[a, b]$ then (using the current notation), along any sequence of partitions with $|\Delta| \to 0$, $I(\Delta)$ converges to $Z$ in L2 and converges almost surely to the Riemann integral. Thus, the desired conclusion follows by the previous lemma. □

◇ 6.4 The Karhunen-Loeve Expansion

Let $K$ be a continuous autocovariance function defined on $[a, b] \times [a, b]$. Define an integral operator $A$ on $L_2([a, b])$ (the set of all square integrable real-valued Lebesgue measurable functions defined on $[a, b]$ where we identify any two functions that are equal a.e.) via

$$A[f](s) = \int_a^b K(s, t)f(t)\,dt,$$

where $a \le s \le b$ and $f \in L_2([a, b])$. Notice that the operator $A$ maps the real-valued function $f$ defined on $[a, b]$ to the real-valued function $A[f]$ defined on $[a, b]$. A nonzero function $e(\cdot)$ is said to be an eigenfunction of the integral operator $A$ if $A[e](s) = \lambda e(s)$ for some constant $\lambda$ and for $a \le s \le b$. The constant $\lambda$ is called the eigenvalue associated with the eigenfunction $e(t)$.

6.16 Theorem (Mercer) Using the above notation, let $\{e_n(\cdot)\}_{n \in \mathbb{N}}$ be a sequence of eigenfunctions of the integral operator $A$ such that

1. $\displaystyle\int_a^b e_j(t)e_k(t)\,dt = \begin{cases} 0 & \text{if } j \neq k \\ 1 & \text{if } j = k, \end{cases}$

2. if $e(\cdot)$ is any eigenfunction of $A$ then $e(\cdot)$ is equal to a linear combination of the $e_n$'s, and,

3. the eigenvalue $\lambda_n$ associated with $e_n$ is nonzero for each $n \in \mathbb{N}$.

It then follows that

$$K(s, t) = \sum_{n=1}^{\infty} \lambda_n e_n(s)e_n(t)$$

for $s$ and $t$ in $[a, b]$ (where the series converges absolutely and uniformly in both variables).
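Mercer's expansion can be explored numerically. The sketch below uses the Nyström method (a technique not described in the text): it replaces the integral in $A[f]$ by an $m$-point midpoint rule and solves the resulting symmetric matrix eigenproblem. As an assumed example it takes $K(s, t) = \min(s, t)$ on $[0, 1]$, whose eigenvalues are known to be $\lambda_n = 1/((n - \tfrac12)^2\pi^2)$ with eigenfunctions $\sqrt{2}\sin((n - \tfrac12)\pi t)$. Setting $s = t$ in Mercer's series and integrating term by term gives $\sum_n \lambda_n = \int_a^b K(t, t)\,dt$, here $1/2$, which the discrete eigenvalues reproduce. The function name and grid size are illustrative.

```python
import numpy as np


def nystrom_eigenpairs(kernel, a, b, m):
    """Approximate eigenpairs of A[f](s) = int_a^b K(s, t) f(t) dt by replacing the
    integral with an m-point midpoint rule (Nystrom method)."""
    h = (b - a) / m
    t = a + (np.arange(m) + 0.5) * h          # midpoint nodes
    kmat = kernel(t[:, None], t[None, :])     # K(t_i, t_j), symmetric
    lam, vec = np.linalg.eigh(kmat * h)       # eigenvalues in ascending order
    lam, vec = lam[::-1], vec[:, ::-1]        # largest eigenvalues first
    efun = vec / np.sqrt(h)                   # so that sum_i e(t_i)^2 h = 1
    return t, lam, efun


# Assumed kernel: K(s, t) = min(s, t) on [0, 1]
t, lam, efun = nystrom_eigenpairs(np.minimum, 0.0, 1.0, 400)
exact = 1.0 / (((np.arange(1, 4) - 0.5) * np.pi) ** 2)
print(lam[:3], exact)
print(lam.sum())   # trace; compare with int_0^1 K(t, t) dt = 1/2
```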

6.17 Theorem (Karhunen-Loeve) Let $\{X(t) : a \le t \le b\}$ be a second order process with zero mean and continuous autocovariance function $K$. Let $\{e_n(t)\}_{n \in \mathbb{N}}$ be a sequence of eigenfunctions of the integral operator $A$ (as defined above) associated with $K$ that satisfies properties (1), (2), and (3) of Mercer's Theorem. Then

$$X(t) = \sum_{n=1}^{\infty} Z_n e_n(t),$$

for $a \le t \le b$, where

$$Z_n = \int_a^b X(t)e_n(t)\,dt.$$

Further, the $Z_n$'s are zero mean, orthogonal ($E[Z_kZ_j] = 0$ for $k \neq j$) random variables such that $E[Z_n^2] = \lambda_n$, and the series converges in L2 to $X(t)$ uniformly in $t$; that is,

$$E\left[\left(X(t) - \sum_{k=1}^{n} Z_k e_k(t)\right)^2\right] \to 0$$

as $n \to \infty$, uniformly for $t$ in $[a, b]$.
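The uniform L2 convergence can be made concrete. Because the $Z_n$'s are orthogonal with $E[Z_n^2] = \lambda_n$ and, by Theorem 6.14 and the eigenfunction equation, $E[X(t)Z_n] = \lambda_n e_n(t)$, the mean-square error of the $N$-term partial sum is $K(t, t) - \sum_{n=1}^{N}\lambda_n e_n(t)^2$. The sketch below evaluates this error on a grid for the assumed kernel $K(s, t) = \min(s, t)$ on $[0, 1]$, using its known eigenpairs $\lambda_n = 1/\omega_n^2$, $e_n(t) = \sqrt{2}\sin(\omega_n t)$, $\omega_n = (n - \tfrac12)\pi$. The function name is illustrative.

```python
import numpy as np


def kl_mse(t, n_terms):
    """E[(X(t) - sum_{n<=N} Z_n e_n(t))^2] for K(s, t) = min(s, t) on [0, 1], using
    lambda_n = 1/omega_n^2 and e_n(t) = sqrt(2) sin(omega_n t)."""
    omega = (np.arange(1, n_terms + 1) - 0.5) * np.pi
    lam = 1.0 / omega**2
    e = np.sqrt(2.0) * np.sin(np.outer(t, omega))   # e[i, n] = e_n(t_i)
    return t - (e**2 * lam).sum(axis=1)              # K(t, t) = t


t = np.linspace(0.0, 1.0, 501)
worst = {N: kl_mse(t, N).max() for N in (10, 100, 1000)}
print(worst)   # the worst-case error over [0, 1] shrinks as N grows
```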

Notice that each term in the above series expansion for $X(t)$ is a product of a random part (that is, a function of $\omega$) and a deterministic part (that is, a function of $t$). As the following theorem shows, the Karhunen-Loeve expansion takes on a special form when the random process is Gaussian.

6.18 Theorem Let the previous discussion set notation. In the Karhunen-Loeve expansion for a Gaussian random process, the random sequence $\{Z_i\}_{i \in \mathbb{N}}$ is a Gaussian random sequence composed of mutually independent random variables.


6.5 Markov Chains

Consider a discrete parameter random process $\{X_n : n \in \mathbb{N} \cup \{0\}\}$ where each random variable in the process takes values only in some subset $C = \{\alpha_i : i \in I\}$ of $\mathbb{R}$ where $I$ is a subset of $\mathbb{N}$. For each $j$ and $k$ from $I$, let $p_j = P(X_0 = \alpha_j)$ and let $p_{jk} = P(X_1 = \alpha_k \mid X_0 = \alpha_j)$. The random sequence $\{X_n : n \in \mathbb{N} \cup \{0\}\}$ is said to be a Markov chain if, for any nonnegative integer $n$ and any $j_0, j_1, \ldots, j_n$ in $I$,

$$P(X_0 = \alpha_{j_0}, X_1 = \alpha_{j_1}, \ldots, X_n = \alpha_{j_n}) = p_{j_0}\,p_{j_0j_1}\,p_{j_1j_2} \times \cdots \times p_{j_{n-1}j_n}.$$

The points in $C$ are called the states of the Markov chain, the $p_k$ values are called the initial probabilities of the Markov chain, and the $p_{jk}$ values are called the transition probabilities of the Markov chain. If $C$ is a finite set then the Markov chain is said to be a finite Markov chain.
Higher order transition probabilities of a Markov chain are defined as follows. Let

$$p_{jk}^{(n)} = P(X_n = \alpha_k \mid X_0 = \alpha_j).$$

This probability is equal to the sum of the probabilities of all possible distinct sequences of states that begin at state $\alpha_j$ and arrive, $n$ steps later, at state $\alpha_k$. For example, if $n = 2$, then

$$p_{jk}^{(2)} = \sum_{m \in I} p_{jm}p_{mk}.$$

A simple inductive argument shows that in general we have

$$p_{jk}^{(m+n)} = \sum_{i \in I} p_{ji}^{(m)}p_{ik}^{(n)},$$

which is a special case of a general Markov property known as the Chapman-Kolmogorov equation.
The unconditional probability of entering state $\alpha_k$ at the $n$th step is denoted by $p_k^{(n)}$ and is given by

$$p_k^{(n)} = \sum_{j \in I} p_j\,p_{jk}^{(n)}.$$


Note that if $p_i = 1$ (that is, if the Markov chain always begins in state $\alpha_i$) then $p_k^{(n)} = p_{ik}^{(n)}$.
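For a finite chain these formulas are matrix identities: with $P = (p_{jk})$, the $n$-step probabilities $p_{jk}^{(n)}$ are the entries of the matrix power $P^n$, the Chapman-Kolmogorov equation reads $P^{m+n} = P^mP^n$, and the row vector of unconditional probabilities is $p^{(n)} = pP^n$. The sketch below checks this for a hypothetical three-state chain; the matrix and the initial probabilities are illustrative choices, not from the text.

```python
import numpy as np

# Hypothetical three-state chain; row j holds the transition probabilities p_{jk}.
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
p0 = np.array([1.0, 0.0, 0.0])   # initial probabilities p_j (start in alpha_1)


def n_step(P, n):
    """Matrix whose (j, k) entry is the n-step transition probability p_{jk}^{(n)}."""
    return np.linalg.matrix_power(P, n)


m, n = 3, 4
lhs = n_step(P, m + n)                 # p_{jk}^{(m+n)}
rhs = n_step(P, m) @ n_step(P, n)      # sum_i p_{ji}^{(m)} p_{ik}^{(n)}
pn = p0 @ n_step(P, n)                 # p_k^{(n)} = sum_j p_j p_{jk}^{(n)}
print(np.allclose(lhs, rhs), pn)
```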

We will say that a state $\alpha_k$ can be reached from state $\alpha_j$ if there exists some nonnegative integer $n$ for which $p_{jk}^{(n)}$ is positive. A set $A$ of states is said to be closed if no state outside of $A$ can be reached from any state inside $A$. For an arbitrary set $A$ of states, the smallest closed set containing $A$ is said to be the closure of $A$. If the singleton set containing a particular state is closed then that state is said to be an absorbing state. A Markov chain is said to be irreducible if there exists no closed set of states other than the set of all states. Note that a Markov chain is irreducible if and only if every state can be reached from every other state.
A state $\alpha_j$ is said to have period $m > 1$ if $p_{jj}^{(n)} = 0$ unless $n$ is a multiple of $m$ and if $m$ is the largest integer with this property. A state $\alpha_j$ is said to be aperiodic if no such period exists. Let $f_{jk}^{(n)}$ denote the probability that, for a Markov chain starting in state $\alpha_j$, the first entry into state $\alpha_k$ occurs at the $n$th step. Let

$$f_{jk} = \sum_{n=1}^{\infty} f_{jk}^{(n)}.$$

Note that if $f_{jj} = 1$ then, for a Markov chain that begins in state $\alpha_j$, a return to state $\alpha_j$ will occur at some later time with probability one. In this case, we let

$$\mu_j = \sum_{n=1}^{\infty} n f_{jj}^{(n)},$$

and we call $\mu_j$ the mean recurrence time for the state $\alpha_j$. A state $\alpha_j$ is said to be persistent if $f_{jj} = 1$ and is said to be transient if $f_{jj} < 1$. A persistent state $\alpha_j$ is said to be a null state if $\mu_j = \infty$. An aperiodic persistent state $\alpha_j$ with $\mu_j < \infty$ is said to be ergodic.
6.19 Theorem A state $\alpha_j$ is transient if and only if $\sum_{n=0}^{\infty} p_{jj}^{(n)} < \infty$.

6.20 Theorem A persistent state $\alpha_j$ is a null state if and only if $\sum_{n=0}^{\infty} p_{jj}^{(n)} = \infty$ yet $p_{jj}^{(n)} \to 0$ as $n \to \infty$.


6.21 Theorem If the state $\alpha_j$ is aperiodic then $\lim_{n \to \infty} p_{jj}^{(n)}$ is either equal to zero or to $f_{jj}/\mu_j$.

6.22 Theorem In a finite Markov chain there exist no null states and it is impossible that all states are transient.
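The quantities $f_{jj}^{(n)}$ and $\mu_j$ can be computed for a small finite chain. The sketch below uses a standard "taboo" recursion that is not given in the text: if $Q$ is $P$ with column $j$ set to zero, then $f_{jj}^{(n)}$ is the $(j, j)$ entry of $Q^{n-1}P$. For the hypothetical irreducible, aperiodic three-state chain used here, state $\alpha_1$ (index 0) is persistent with mean recurrence time $\mu_1 = 4$, and $p_{11}^{(n)} \to 1/\mu_1 = 1/4$, consistent with Theorems 6.21 and 6.22. The series are truncated after finitely many terms, which is harmless here because $f_{11}^{(n)}$ decays geometrically.

```python
import numpy as np


def first_return_probs(P, j, n_max):
    """Return f_jj^{(n)} for n = 1, ..., n_max (first return to state j at step n)."""
    Q = P.copy()
    Q[:, j] = 0.0                 # taboo matrix: no visits to j before the final step
    f = np.empty(n_max)
    Qpow = np.eye(len(P))         # Q^{n-1}, starting from Q^0 = I
    for n in range(1, n_max + 1):
        f[n - 1] = (Qpow @ P)[j, j]
        Qpow = Qpow @ Q
    return f


# Hypothetical irreducible, aperiodic three-state chain (p_11 > 0)
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
j = 0                                          # state alpha_1
f = first_return_probs(P, j, 2000)
f_jj = f.sum()                                 # persistent if this equals 1
mu_j = np.sum(np.arange(1, f.size + 1) * f)    # mean recurrence time
p_jj_n = np.linalg.matrix_power(P, 200)[j, j]  # compare with f_jj / mu_j
print(f_jj, mu_j, p_jj_n, f_jj / mu_j)
```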

6.6 Markov Processes

Consider a random process $\{X(t) : t \in T\}$ defined on a probability space $(\Omega, \mathcal{F}, P)$ where $T \subset \mathbb{R}$. The random process $\{X(t) : t \in T\}$ is said to be a Markov process if, for $k \in \mathbb{N}$ and $0 \le t_1 \le t_2 \le \cdots \le t_k \le u$ where $t_i \in T$ for each $i$ and $u \in T$,

$$P(X(u) \in B \mid X(t_1), \ldots, X(t_k)) = P(X(u) \in B \mid X(t_k)) \text{ a.s.}$$

for each real Borel set $B$. Recall that a conditional expectation with respect to $\{X(s) : s \le t\}$ is by definition a conditional expectation with respect to $\sigma(\{X(s) : s \le t\})$, the smallest σ-algebra with respect to which every random variable in the set $\{X(s) : s \le t\}$ is measurable.

6.23 Theorem Consider a Markov process $\{X(t) : t \in [0, \infty)\}$. For all real Borel sets $B$ and for all $t \le u$, it follows that $P(X(u) \in B \mid \{X(s) : s \le t\}) = P(X(u) \in B \mid X(t))$ a.s.

Note that the result of the previous theorem follows from the seemingly weaker condition used to define a Markov process. The theorem says roughly that a conditional probability of a future event (at time $u$) associated with a Markov process given the present (at time $t$) and the past (at times before $t$) is the same as a conditional probability of that future event given just the present. That is, for a Markov process, the past and the present combined are no more "informative" than just the present for determining the probability of some future event. The following corollary to the previous theorem restates this property in terms of conditional expectation.


6.2 Corollary Consider a Markov process $\{X(t) : t \in [0, \infty)\}$. If $Z$ is an integrable random variable that is $\sigma(\{X(s) : s \ge t\})$-measurable then $E[Z \mid \{X(s) : s \le t\}] = E[Z \mid X(t)]$ a.s.

The following theorem says that the future and the past of a
Markov process are conditionally independent given the present.

6.24 Theorem Consider a Markov process $\{X(t) : t \in [0, \infty)\}$. If $Z$ is an integrable random variable that is $\sigma(\{X(s) : s \ge t\})$-measurable and if $Y$ is an integrable random variable that is $\sigma(\{X(s) : s \le t\})$-measurable then $E[ZY \mid X(t)] = E[Z \mid X(t)]\,E[Y \mid X(t)]$ a.s.

Notice in the previous theorem that $Z$ is a function of the present and the future of the Markov process and that $Y$ is a function of the present and the past of the Markov process.

6.25 Theorem (Chapman-Kolmogorov) Consider a Markov process $\{X(t) : t \in [0, \infty)\}$. If $Z$ is an integrable random variable that is $\sigma(\{X(s) : s \ge t\})$-measurable and if $0 \le t_0 < t$ then $E[Z \mid X(t_0)] = E[E[Z \mid X(t)] \mid X(t_0)]$ a.s.

Example 6.4 Consider a zero mean, Gaussian, Markov random process $\{X(t) : t \in \mathbb{R}\}$ and consider real numbers $t_1 < t_2 < \cdots < t_n < t$. Since the process is Markov it follows that $E[X(t) \mid X(t_1), X(t_2), \ldots, X(t_n)] = E[X(t) \mid X(t_n)]$ a.s. But, since the process is also Gaussian, we know that $E[X(t) \mid X(t_n)] = aX(t_n)$ a.s. for some real constant $a$. Now, consider the problem of estimating or predicting the value of the process at some future time $t$ based on a collection of past samples of the process by taking the conditional expectation of the process at time $t$ given the past samples. For a Gaussian Markov process, this estimate is simply a linear function of the last sample. All previous samples taken before the last sample may be discarded. Although we will not show it here, an estimate of this type based on conditional expectation provides a best (in a minimum mean-square error sense) estimator of the random variable of interest as a Borel measurable transformation of the data. □
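The claim that only the last sample matters can be checked with the normal equations for a zero mean Gaussian vector: $E[X(t) \mid X(t_1), \ldots, X(t_n)] = \sum_i w_iX(t_i)$ with $w = \Sigma^{-1}c$, where $\Sigma_{ij} = E[X(t_i)X(t_j)]$ and $c_i = E[X(t)X(t_i)]$. The sketch assumes the stationary covariance $E[X(s)X(s + \tau)] = e^{-|\tau|}$, which is well known to belong to a Gaussian Markov process; the sample times are arbitrary. All weights except the last vanish, and the last equals $e^{-(t - t_n)}$.

```python
import numpy as np


def predictor_weights(times, t, cov):
    """Weights w with E[X(t) | X(t_1), ..., X(t_n)] = sum_i w_i X(t_i) for a zero
    mean Gaussian process whose covariance is E[X(u) X(v)] = cov(u - v)."""
    times = np.asarray(times, dtype=float)
    gram = cov(times[:, None] - times[None, :])   # E[X(t_i) X(t_j)]
    cross = cov(t - times)                        # E[X(t) X(t_i)]
    return np.linalg.solve(gram, cross)           # normal equations


def cov(tau):
    """Assumed Gauss-Markov covariance exp(-|tau|)."""
    return np.exp(-np.abs(tau))


times = [0.0, 0.4, 1.1, 1.5, 2.0]   # arbitrary past sample times t_1 < ... < t_n
w = predictor_weights(times, 3.0, cov)
print(np.round(w, 12))              # only the weight on X(t_n) is nonzero
```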


6.7 Martingales

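Let $\{X_n\}_{n \in \mathbb{N}}$ be a random sequence defined on a probability space $(\Omega, \mathcal{F}, P)$ and let $\{\mathcal{F}_n\}_{n \in \mathbb{N}}$ be a sequence of σ-subalgebras of $\mathcal{F}$. The random sequence $\{X_n\}_{n \in \mathbb{N}}$ is said to be a martingale relative to $\{\mathcal{F}_n : n \in \mathbb{N}\}$ if the following four conditions hold for each positive integer $n$:

1. $\mathcal{F}_n \subset \mathcal{F}_{n+1}$,

2. $X_n$ is $\mathcal{F}_n$-measurable,

3. $E[|X_n|] < \infty$, and

4. $E[X_{n+1} \mid \mathcal{F}_n] = X_n$ a.s.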

Do you know what it is to be possessed by a problem, to have within yourself some urge that keeps you at it every waking moment, that makes you alert to every sign pointing the way to its solution; to be gripped by a piece of work so that you cannot let it alone, and to go on with deep joy to its accomplishment? -Lao G. Simons

A sequence of σ-algebras that satisfies condition (1) is called a filtration. If condition (2) holds for all $n \in \mathbb{N}$ then we say that the random sequence $\{X_n\}_{n \in \mathbb{N}}$ is adapted to the filtration $\{\mathcal{F}_n : n \in \mathbb{N}\}$. If $\mathcal{F}_n = \sigma(X_1, X_2, \ldots, X_n)$ then $\{\mathcal{F}_n : n \in \mathbb{N}\}$ is a filtration and is called the canonical filtration associated with the random sequence $\{X_n\}_{n \in \mathbb{N}}$. If a martingale is given without a specified filtration then it should be regarded as a martingale with respect to its canonical filtration.

6.26 Theorem If a random sequence is a martingale with respect to some filtration then it is a martingale with respect to its canonical filtration.


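Proof. Assume that $\{X_n : n \in \mathbb{N}\}$ is a martingale with respect to some filtration $\{\mathcal{F}_n : n \in \mathbb{N}\}$ and let $m$ and $n$ be positive integers such that $m < n$. Since $X_m$ is $\mathcal{F}_m$-measurable and $\mathcal{F}_m \subset \mathcal{F}_n$, it follows that $X_m$ is $\mathcal{F}_n$-measurable. Thus, it follows that $X_1, X_2, \ldots, X_n$ are each $\mathcal{F}_n$-measurable for any positive integer $n$. Finally, since $\sigma(X_1, \ldots, X_n) \subset \mathcal{F}_n$ for any $n \in \mathbb{N}$, the tower property of conditional expectation gives

$$\begin{aligned} E[X_{n+1} \mid \sigma(X_1, \ldots, X_n)] &= E[E[X_{n+1} \mid \mathcal{F}_n] \mid \sigma(X_1, \ldots, X_n)] \\ &= E[X_n \mid \sigma(X_1, \ldots, X_n)] \\ &= X_n \text{ a.s.} \end{aligned}$$

□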

A random sequence $\{X_n\}_{n \in \mathbb{N}}$ is said to be a submartingale relative to $\{\mathcal{F}_n : n \in \mathbb{N}\}$ if conditions (1), (2), and (3) given above and condition (4') given below each hold for every positive integer $n$:
(4') $E[X_{n+1} \mid \mathcal{F}_n] \ge X_n$ a.s.
A random sequence $\{X_n\}_{n \in \mathbb{N}}$ is said to be a supermartingale relative to $\{\mathcal{F}_n : n \in \mathbb{N}\}$ if conditions (1), (2), and (3) given above and condition (4'') given below each hold for every positive integer $n$:
(4'') $E[X_{n+1} \mid \mathcal{F}_n] \le X_n$ a.s.

Example 6.5 Let $X_1, X_2, \ldots$ be mutually independent random variables with zero means and finite variances. Further, let $S_n = X_1 + X_2 + \cdots + X_n$ and $T_n = S_n^2$ for $n \in \mathbb{N}$. Note that

$$\begin{aligned}
E[T_{n+1} \mid X_1, \ldots, X_n] &= E[(X_1 + X_2 + \cdots + X_n + X_{n+1})^2 \mid X_1, \ldots, X_n] \\
&= E[(X_1 + \cdots + X_n)^2 + 2X_{n+1}(X_1 + \cdots + X_n) + X_{n+1}^2 \mid X_1, \ldots, X_n] \\
&= E[S_n^2 + 2X_{n+1}S_n + X_{n+1}^2 \mid X_1, \ldots, X_n] \\
&= E[S_n^2 \mid X_1, \ldots, X_n] + 2E[X_{n+1}S_n \mid X_1, \ldots, X_n] + E[X_{n+1}^2 \mid X_1, \ldots, X_n] \\
&= S_n^2 + 2S_nE[X_{n+1}] + E[X_{n+1}^2] \\
&= T_n + E[X_{n+1}^2] \\
&\ge T_n \text{ a.s.}
\end{aligned}$$

Thus, $T_n$ is a submartingale with respect to $\sigma(X_1, \ldots, X_n)$. □
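A small exact check of Example 6.5, assuming the $X_k$ take the values $\pm 1$ with probability $1/2$ each (so $E[X_{n+1}^2] = 1$): since $\sigma(X_1, \ldots, X_n)$ is generated by finitely many possible histories, the conditional expectation of $T_{n+1}$ is obtained by averaging over the two values of $X_{n+1}$ for each history. The function name is illustrative.

```python
from itertools import product


def submartingale_step_holds(n):
    """Check E[T_{n+1} | X_1, ..., X_n] = T_n + E[X_{n+1}^2] = T_n + 1 for every
    history, when the X_k are independent and equal to +1 or -1 with probability 1/2."""
    for history in product((-1, 1), repeat=n):
        s_n = sum(history)
        t_n = s_n ** 2
        # Average T_{n+1} = (S_n + X_{n+1})^2 over the two equally likely values of X_{n+1}
        cond_exp = ((s_n + 1) ** 2 + (s_n - 1) ** 2) / 2
        if cond_exp != t_n + 1:
            return False
    return True


print(all(submartingale_step_holds(n) for n in range(1, 11)))
```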


Martingales are often used to model gambling games that are fair. That is, they model a game in which the expected fortune of the gambler after the next play is the same as his present fortune. In this context, a submartingale would represent a game that is favorable to the gambler and a supermartingale would represent a game that is unfavorable to the gambler. (A "martingale" is part of a horse's harness that prevents the horse from raising its head too high. A martingale became a gambling term through its association with horse racing and later was used to describe processes of this sort.) The following theorem is called the Martingale Convergence Theorem and is due to Joseph Doob.

6.27 Theorem (Doob) Let the random sequence $\{X_n\}_{n \in \mathbb{N}}$ be a submartingale with respect to its canonical filtration. If $\sup_{n \in \mathbb{N}} E[|X_n|]$ is finite then $X_n$ converges almost surely to a random variable $X$ such that $E[|X|] \le \sup_{n \in \mathbb{N}} E[|X_n|]$.

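Consider a filtration $\{\mathcal{F}_n : n \in \mathbb{N}\}$ and let $\mathcal{F}_\infty$ denote the smallest σ-algebra containing $\bigcup_{n=1}^{\infty} \mathcal{F}_n$. In this case we write $\mathcal{F}_n \uparrow \mathcal{F}_\infty$ and have the following result.

6.28 Theorem If $\mathcal{F}_n \uparrow \mathcal{F}_\infty$ and if $Z$ is an integrable random variable then $E[Z \mid \mathcal{F}_n]$ converges to $E[Z \mid \mathcal{F}_\infty]$ a.s.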

6.8 Random Processes with Orthogonal Increments

A random process $\{X(t) : t \in T\}$ is said to possess orthogonal increments if $E[|X(t) - X(s)|^2] < \infty$ for all $s, t \in T$ and if, whenever parameter values satisfy the inequality $s_1 < t_1 \le s_2 < t_2$, the increments $X(t_1) - X(s_1)$ and $X(t_2) - X(s_2)$ of the process are orthogonal; that is, $E[(X(t_1) - X(s_1))(X(t_2) - X(s_2))] = 0$.

6.29 Theorem Let $\{X(t) : t \in T\}$ be a random process with orthogonal increments. There exists a nondecreasing function $F(t)$ such that $E[|X(t) - X(s)|^2] = F(t) - F(s)$ when $s < t$. Further, the function $F$ is unique up to an additive constant.

Notice that the previous theorem ties the mean-square continuity of a random process with orthogonal increments to the pointwise continuity of the corresponding function $F(t)$: since $E[|X(t + h) - X(t)|^2] = |F(t + h) - F(t)|$, the process is mean-square continuous at $t$ exactly when $F$ is continuous at $t$. We denote the relationship between the function $F(t)$ and the random process $\{X(t)\}$ by writing $E[|dX(t)|^2] = dF(t)$. (Our use of a differential here is just for notational purposes.)
Let $\{Y(t) : t \in \mathbb{R}\}$ be a random process with orthogonal increments and let $h(t)$ be a real-valued function. We are now going to direct our attention toward defining an integral of the form

$$\int_{\mathbb{R}} h(t)\,dY(t).$$

Since the sample functions of the random process $\{Y(t)\}$ are not generally of bounded variation, we cannot define the above integral as an ordinary Riemann-Stieltjes integral with respect to the individual sample functions. Instead, we will define this integral (called a stochastic integral of $h(t)$ with respect to the random process $\{Y(t)\}$) as an L2 limit. As usual, we begin by defining the integral when $h(t)$ is a step function.
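Assume that the function $h(t)$ is of the following form where $a_i$ and $c_i$ are real numbers for each $i$ and where $a_1 < a_2 < \cdots < a_n$:

$$h(t) = \begin{cases} 0 & \text{if } t < a_1 \\ c_j & \text{if } a_{j-1} \le t < a_j \text{ for } 1 < j \le n \\ 0 & \text{if } t \ge a_n. \end{cases}$$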

For such a function $h$ we will define the stochastic integral of $h(t)$ with respect to $\{Y(t)\}$ to be

$$\int_{\mathbb{R}} h(t)\,dY(t) = \sum_{j=2}^{n} c_j\,(Y(a_j) - Y(a_{j-1})).$$

More precisely, we define the integral to be any random variable that is equal almost surely to the sum on the right hand side. (One technical detail is that if $a_j$ is a discontinuity point of $F$ then instead of $Y(a_j)$ in the previous definition we use the mean-square limit of $Y(t)$ as $t \uparrow a_j$. This limit will exist due to the relation between $F$ and $Y(t)$.) It is not difficult to show that if $h(t)$ and $g(t)$ are step functions (as defined above) and if $E[|dY(t)|^2] = dF(t)$ then

$$E\left[\int_{\mathbb{R}} h(t)\,dY(t) \int_{\mathbb{R}} g(t)\,dY(t)\right] = \int_{\mathbb{R}} h(t)g(t)\,dF(t).$$
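The isometry for step functions can be illustrated by simulation. The sketch below assumes that $\{Y(t)\}$ is a Wiener process, which has orthogonal (indeed independent) increments with $F(t) = t$, so that $Y(a_j) - Y(a_{j-1})$ is normal with mean zero and variance $a_j - a_{j-1}$. The breakpoints and the step values $c_j$ (for $h$) and $d_j$ (for $g$) are hypothetical. The Monte Carlo estimate of $E[\int h\,dY\int g\,dY]$ should agree with $\int h(t)g(t)\,dF(t) = \sum_j c_jd_j(a_j - a_{j-1}) = -0.3$ up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical breakpoints a_1 < ... < a_n and step values:
# h = c_j and g = d_j on [a_{j-1}, a_j), and both are zero elsewhere.
a = np.array([0.0, 0.5, 1.2, 2.0])
c = np.array([1.0, -2.0, 0.5])
d = np.array([3.0, 1.0, -1.0])

# Assumed orthogonal-increment process: a Wiener process, so F(t) = t and the
# increments Y(a_j) - Y(a_{j-1}) are independent N(0, a_j - a_{j-1}).
n_paths = 200_000
dY = rng.normal(0.0, np.sqrt(np.diff(a)), size=(n_paths, a.size - 1))

int_h = dY @ c                        # sum_j c_j (Y(a_j) - Y(a_{j-1})), one per path
int_g = dY @ d
mc = np.mean(int_h * int_g)           # Monte Carlo estimate of E[int h dY int g dY]
exact = np.sum(c * d * np.diff(a))    # int h(t) g(t) dF(t) with dF(t) = dt
print(mc, exact)
```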


Now, consider a real-valued function $h(t)$ and let $\{h_n(t)\}_{n \in \mathbb{N}}$ be a sequence of step functions (as defined above) such that

$$\int_{\mathbb{R}} (h(t) - h_n(t))^2\,dF(t) \to 0$$

as $n \to \infty$. Further, let $Z_n$ denote the stochastic integral of the step function $h_n(t)$ with respect to the random process $\{Y(t)\}$. The previous observation implies that $E[(Z_m - Z_n)^2] = \int_{\mathbb{R}} (h_m(t) - h_n(t))^2\,dF(t)$, so $\{Z_n\}$ is a Cauchy sequence in L2; since L2 is complete, there exists a random variable $Z$ such that $E[(Z - Z_n)^2] \to 0$ as $n \to \infty$. Further, the random variable $Z$ does not depend on the particular sequence $\{h_n\}_{n \in \mathbb{N}}$ given above. That is, a random variable equal almost surely to $Z$ will be obtained when any sequence $\{h_n\}_{n \in \mathbb{N}}$ converging in the above sense to $h$ is chosen. We define the stochastic integral

$$\int_{\mathbb{R}} h(t)\,dY(t)$$

to be any random variable that is equal almost surely to the random variable $Z$.

6.30 Theorem In the context of our discussion, a function $h$ may be represented as a limit of step functions (in the above sense) if the integral $\int_{\mathbb{R}} h^2(t)\,dF(t)$ exists and is finite.

If $h(t) = g(t) + ip(t)$ is a complex-valued function such that $g$ and $p$ each satisfy the condition of the previous theorem then the integral $\int_{\mathbb{R}} h(t)\,dY(t)$ is defined to be $\int_{\mathbb{R}} g(t)\,dY(t) + i\int_{\mathbb{R}} p(t)\,dY(t)$. We will say more about complex-valued random processes later.


6.9 Wide Sense Stationary Random Processes

Recall the definition of a wide sense stationary random process that was given on page 140. In this section we will consider zero mean, continuous time WSS random processes with particular concern for their harmonic properties. Throughout this section we will assume, based upon the following theorem, that all wide sense stationary random processes satisfy the following condition:

$$\lim_{t - s \to 0} E[|X(t) - X(s)|^2] = 0.$$

6.31 Theorem A wide sense stationary random process $\{X(t) : t \in \mathbb{R}\}$ possesses a separable and measurable modification if

$$\lim_{t - s \to 0} E[|X(t) - X(s)|^2] = 0.$$

Further, if a modification of $\{X(t) : t \in \mathbb{R}\}$ is measurable then the process must satisfy the previous condition.

Thus, the previous condition is a minimal continuity hypothesis and we will assume that it is satisfied whenever we discuss continuous parameter WSS random processes. In addition, we will always take the parameter set of such a process to be either $\mathbb{R}$ or $[0, \infty)$. Recall that the autocovariance function of a zero mean WSS random process is defined by $K(t) = E[X(s + t)X(s)]$. Note that the autocovariance function $K(t)$ is continuous since, by stationarity and the Cauchy-Schwarz inequality,

$$|K(t) - K(s)| = |E[(X(t) - X(s))X(0)]| \le \sqrt{E[|X(t) - X(s)|^2]\,E[|X(0)|^2]},$$

where the right hand side approaches zero as $t - s \to 0$ via the previous continuity hypothesis.

6.32 Theorem The autocovariance function $K(t)$ of a zero mean WSS random process may be expressed as

$$K(t) = \int_{\mathbb{R}} e^{2\pi i t\lambda}\,dF(\lambda),$$

where the function $F$ is nondecreasing, bounded, right continuous, and such that $F(-\infty) = 0$. Further, the function $F$ is the unique such function for which the above equality is satisfied.

Consider a zero mean WSS random process with an autocovariance function $K$ and let $F$ denote the function obtained via the previous theorem for this autocovariance function. The function $F$ is called the spectral distribution function of a zero mean WSS random process with autocovariance function $K$. If $F$ is absolutely continuous then its derivative $F'$ exists almost everywhere and is called a spectral density function of a WSS random process with autocovariance function $K$.

6.33 Theorem If $\int_{\mathbb{R}} |K(t)|\,dt < \infty$ then there exists a continuous spectral density function given by

$$f(\lambda) = \int_{\mathbb{R}} e^{-2\pi i t\lambda}K(t)\,dt.$$
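As a concrete check of the spectral density formula $f(\lambda) = \int_{\mathbb{R}} e^{-2\pi it\lambda}K(t)\,dt$ (the convention paired with $K(t) = \int e^{2\pi it\lambda}\,dF(\lambda)$ in this section), take the assumed autocovariance $K(t) = e^{-|t|}$, which is integrable. Since a real WSS autocovariance is even, the formula reduces to $f(\lambda) = 2\int_0^\infty \cos(2\pi t\lambda)K(t)\,dt$, and for this $K$ it evaluates to $2/(1 + 4\pi^2\lambda^2)$. The sketch approximates the integral with the trapezoidal rule on a truncated interval; the truncation point and grid size are arbitrary choices.

```python
import numpy as np


def spectral_density(K, lam, t_max=40.0, n=400_001):
    """Approximate f(lam) = int_R exp(-2 pi i t lam) K(t) dt for an even, integrable
    autocovariance K via f(lam) = 2 int_0^t_max cos(2 pi t lam) K(t) dt
    (trapezoidal rule; the tail beyond t_max is ignored)."""
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    t = np.linspace(0.0, t_max, n)
    dt = t[1] - t[0]
    vals = np.cos(2 * np.pi * np.outer(lam, t)) * K(t)
    integral = dt * (vals.sum(axis=1) - 0.5 * (vals[:, 0] + vals[:, -1]))
    return 2.0 * integral


lam = np.array([0.0, 0.5, 1.0])
print(spectral_density(lambda t: np.exp(-np.abs(t)), lam))
print(2.0 / (1.0 + 4.0 * np.pi**2 * lam**2))
```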

The spectrum of a WSS random process is given by the set of all real numbers $\lambda_0$ such that $F(\lambda_0 + \varepsilon) > F(\lambda_0 - \varepsilon)$ for every $\varepsilon > 0$. That is, the spectrum consists of all points of increase of the spectral distribution function $F$. Note in particular that the spectrum of a WSS random process is a subset of $\mathbb{R}$ and is not, as is frequently misstated, a function. The spectrum consists of frequencies that enter into the harmonic analysis of both the autocovariance function and the sample functions of the random process.

6.34 Theorem Every zero mean wide sense stationary random process $\{X(t) : t \in \mathbb{R}\}$ satisfying the continuity condition given at the beginning of this section possesses a spectral representation of the form

$$X(t) = \int_{\mathbb{R}} e^{2\pi i t\lambda}\,dY(\lambda),$$

where the random process $\{Y(\lambda) : \lambda \in \mathbb{R}\}$ has orthogonal increments and is such that $E[|dY(\lambda)|^2] = dF(\lambda)$ where $F$ is the spectral distribution function of $\{X(t)\}$.

Let $\{X(t) : t \in \mathbb{R}\}$ be a mean-square continuous, wide sense stationary random process defined on a probability space $(\Omega, \mathcal{F}, P)$ and let $H$ denote the real Hilbert space $L_2(\Omega, \mathcal{F}, P)$. Note that for any real number $t$, the random variable $X(t)$ is a point in $H$. Further, note that $\|X(t)\| = \sqrt{R(0)}$ where $R$ is the autocorrelation function of the random process $\{X(t) : t \in \mathbb{R}\}$. Let $S$ denote the sphere in $H$ consisting of all points in $H$ that are at a distance of $\sqrt{R(0)}$ from the origin. Note that the set $\{X(t) : t \in \mathbb{R}\}$ is a subset of $S$. Further, since the random process is mean-square continuous, we see that $\|X(s) - X(t)\| \to 0$ as $s \to t$, which implies that the random process traces a continuous curve in $H$.

6.10 Complex-Valued Random Processes

It is often convenient to be able to deal with random processes that are complex-valued. The extension from the real case is very straightforward.
If $\{Y(t) : t \in T\}$ is a random process taking values in the complex plane then $Y(t) = X(t) + iZ(t)$ where $\{X(t) : t \in T\}$ and $\{Z(t) : t \in T\}$ are real-valued random processes. Further, $E[Y(t)] = E[X(t)] + iE[Z(t)]$ for all $t$ for which the expectations on the right hand side exist and are finite. We say that a complex-valued function is measurable if the real and imaginary parts of the function are each measurable. Finally, the autocovariance function of a complex-valued random process $\{Y(t)\}$ is given by $K(s, t) = E[(Y(s) - E[Y(s)])(Y(t) - E[Y(t)])^*]$.
One should be careful not to carelessly apply theorems about real-valued random processes to complex-valued random processes. For example, a wide sense stationary complex-valued Gaussian process need not be strictly stationary, and there exist two complex-valued mutually Gaussian random variables that are uncorrelated but not independent.


6.11 Linear Operations on WSS Random Processes

Let $\{X(t) : t \in \mathbb{R}\}$ be a zero mean WSS random process with a spectral representation given by

$$X(t) = \int_{\mathbb{R}} e^{2\pi i t\lambda}\,dY(\lambda),$$

where $E[|dY(\lambda)|^2] = dF(\lambda)$. By a linear operation on the process $\{X(t)\}$ we will mean a transformation of $\{X(t)\}$ into a random process $\{Z(t) : t \in \mathbb{R}\}$ of the form

$$Z(t) = \int_{\mathbb{R}} e^{2\pi i t\lambda}\,C(\lambda)\,dY(\lambda).$$

The function $C(\lambda)$ may be any real or complex-valued function for which the following integral exists and is finite:

$$\int_{\mathbb{R}} |C(\lambda)|^2\,dF(\lambda).$$

The function $C$ is called the gain of the linear operation. Note that the process $\{Z(t)\}$ is a zero mean WSS random process and also satisfies the continuity condition given at the beginning of Section 6.9. (That $\{Z(t)\}$ is zero mean follows from the definition of the stochastic integral and the fact that $\{X(t)\}$ is zero mean.) Further, $\{Z(t)\}$ is a WSS random process since its autocovariance function $Q(t)$ is given by

$$Q(t) = \int_{\mathbb{R}} e^{2\pi i t\lambda}\,|C(\lambda)|^2\,dF(\lambda).$$

In addition, the spectral distribution function $G(\lambda)$ of $\{Z(t)\}$ is given by

$$G(\lambda) = \int_{(-\infty, \lambda]} |C(\mu)|^2\,dF(\mu).$$

If $\{X(t)\}$ possesses a spectral density function $f(\lambda)$ then $\{Z(t)\}$ possesses a spectral density function $g(\lambda)$ that is given by $g(\lambda) = |C(\lambda)|^2 f(\lambda)$.


In engineering contexts, it is common to consider linear operations (sometimes called linear filters) of the following form, where the function $h(t)$ is often (imprecisely) called an impulse response function:

$$Z(t) = \int_{\mathbb{R}} h(s)X(t - s)\,ds.$$

As the following theorem shows, this linear filter is a special case of the linear operations that we have been considering.

6.35 Theorem Let $\{X(t) : t \in \mathbb{R}\}$ be a zero mean WSS random process with a bounded spectral density function and a spectral representation given by

$$X(t) = \int_{\mathbb{R}} e^{2\pi i t\lambda}\,dY(\lambda).$$

Further, let $h$ be a real or complex-valued function defined on $\mathbb{R}$ that is continuous a.e. with respect to Lebesgue measure, integrable, and square integrable. Define

$$\int_{\mathbb{R}} h(s)X(t - s)\,ds$$

to be the limit in L2 as $T \to \infty$ of the L2 integral

$$\int_{(-T, T)} h(s)X(t - s)\,ds.$$

This L2 limit exists and is equal to

$$\int_{\mathbb{R}} e^{2\pi i t\lambda}H(\lambda)\,dY(\lambda),$$

where $H(\lambda) = \int_{\mathbb{R}} h(s)e^{-2\pi i s\lambda}\,ds$ is the Fourier transform of $h$. The function $H$ is sometimes called the transfer function of the linear filter.
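The transfer function and the relation $g(\lambda) = |C(\lambda)|^2 f(\lambda)$ (here with $C = H$) can be checked numerically. The sketch assumes the input autocovariance $K(t) = e^{-|t|}$, with spectral density $f(\lambda) = 2/(1 + 4\pi^2\lambda^2)$, and the causal filter $h(s) = e^{-s}$ for $s \ge 0$ (zero otherwise), whose Fourier transform is $H(\lambda) = 1/(1 + 2\pi i\lambda)$. The output variance $\int_{\mathbb{R}} |H(\lambda)|^2 f(\lambda)\,d\lambda$ should equal the time-domain value $\int_0^\infty\int_0^\infty h(s)h(u)K(s - u)\,ds\,du = 1/2$. Grid sizes and truncation limits are arbitrary choices.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal rule along the last axis (avoids NumPy-version-specific names)."""
    dx = np.diff(x)
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * dx, axis=-1)


def f(lam):
    """Assumed input spectral density, from K(t) = exp(-|t|)."""
    return 2.0 / (1.0 + 4.0 * np.pi**2 * lam**2)


def H_numeric(lam, s_max=40.0, n=200_001):
    """H(lam) = int h(s) exp(-2 pi i s lam) ds for h(s) = exp(-s), s >= 0, by quadrature."""
    s = np.linspace(0.0, s_max, n)
    integrand = np.exp(-s) * np.exp(-2j * np.pi * np.outer(lam, s))
    return trapezoid(integrand, s)


lam_check = np.array([-1.0, 0.0, 0.3, 2.0])
H_closed = 1.0 / (1.0 + 2j * np.pi * lam_check)

# Output variance Q(0) = int |H(lam)|^2 f(lam) dlam, using the closed form of H
lam = np.linspace(-50.0, 50.0, 1_000_001)
var_Z = trapezoid(np.abs(1.0 / (1.0 + 2j * np.pi * lam)) ** 2 * f(lam), lam)
print(np.max(np.abs(H_numeric(lam_check) - H_closed)), var_Z)
```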

◇ 6.12 Nonlinear Transformations


Random processes often appear as models for random signals and noise. An assumption of stationarity is often warranted. Nonlinear systems that commonly appear in practice include half wave rectifiers, limiters, square law devices, and others. Let $\{X(t) : t \in \mathbb{R}\}$ be a stationary Gaussian random process with mean zero and positive variance $\sigma^2$. Assume that the autocorrelation function of $X(t)$ is denoted by $R(\tau) = E[X(t)X(t + \tau)]$. Further, assume that the function $R(\cdot)$ is positive definite. For two random variables $X(t_1)$ and $X(t_2)$ in the random process, let $\rho(t_1 - t_2) = R(t_1 - t_2)/\sigma^2$ denote their correlation coefficient. Recall that a bivariate probability density function $f(\cdot, \cdot)$ exists for these two random variables. Further, we can and do take this bivariate probability density function to be continuous as a function of its two real arguments. Indeed, we note that $f(x, y)$ can be taken as the bivariate Gaussian density function given on page 112 with $m_1 = m_2 = 0$, with $\sigma_1 = \sigma_2 = \sigma$, and with $\rho = \rho(t_1 - t_2)$. That is, $X(t_1)$ and $X(t_2)$ have a $N(0, 0, \sigma^2, \sigma^2, \rho(t_1 - t_2))$ distribution. Now, let $p$ denote the continuous marginal Gaussian density function. That is, $p$ is the continuous probability density function corresponding to the $N(0, \sigma^2)$ distribution. Let $\tau = t_1 - t_2$.
Define the measure $m$ on $\mathcal{B}(\mathbb{R})$ via $m(B) = \int_B p(x)\,dx$, and note that $m$ is equivalent to Lebesgue measure on $\mathcal{B}(\mathbb{R})$. Further, consider the real Hilbert space $L_2(\mathbb{R}, \mathcal{B}(\mathbb{R}), m)$. We will take this real Hilbert space as our space of nonlinearities. We will let the Borel measurable function $g$ correspond to a point in $L_2(\mathbb{R}, \mathcal{B}(\mathbb{R}), m)$, and we will refer to $g$ as a nonlinearity. Nonlinear systems such as this are often referred to as zero memory nonlinearities. That is, the output is a Borel measurable function of the input at the same time; if it depends on the input at earlier times, then the system is said to have memory. We will be concerned with the random process $\{g(X(t)) : t \in \mathbb{R}\}$. Note that this random process is also a stationary random process.
For a nonnegative integer $n$, let

$$\theta_n(x) = \frac{(-1)^n\sigma^n}{\sqrt{n!}\,p(x)}\,\frac{d^n}{dx^n}\,p(x),$$

where $p$ is the univariate Gaussian density function given above. These functions are called the orthonormalized Hermite polynomials and are obtained by applying the Gram-Schmidt orthonormalization procedure to the collection of functions of the form $x^k$ for nonnegative integers $k$. Note that $\theta_n$ is an $n$th degree polynomial. Note, also, that for each nonnegative integer $n$, the norm of $\theta_n$ is unity, and $\langle \theta_n, \theta_m \rangle = 0$ for distinct nonnegative integers $n$ and $m$. Thus, the set of functions $\{\theta_n : n \in \mathbb{N} \cup \{0\}\}$ is an orthonormal subset of the real Hilbert space $L_2(\mathbb{R}, \mathcal{B}(\mathbb{R}), m)$.
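The orthonormal polynomials described above can be written as $\theta_n(x) = \mathrm{He}_n(x/\sigma)/\sqrt{n!}$, where $\mathrm{He}_n$ are the probabilists' Hermite polynomials (a standard identity, not stated in the text). The sketch below uses NumPy's `numpy.polynomial.hermite_e` module to evaluate them and Gauss-Hermite quadrature, which is exact for the polynomial integrands involved, to form the matrix of inner products $\langle \theta_n, \theta_k \rangle$ in $L_2(\mathbb{R}, \mathcal{B}(\mathbb{R}), m)$; it should be the identity. The helper names and the quadrature degree are illustrative.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e as He


def theta(n, x, sigma=1.0):
    """Orthonormalized Hermite polynomial theta_n(x) = He_n(x / sigma) / sqrt(n!)."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    return He.hermeval(x / sigma, coeffs) / math.sqrt(math.factorial(n))


def gram_matrix(n_max, sigma=1.0, deg=60):
    """Matrix of <theta_n, theta_k> = int theta_n theta_k p dx for the N(0, sigma^2)
    density p, computed with Gauss-Hermite quadrature for the weight exp(-z^2/2)."""
    z, w = He.hermegauss(deg)
    w = w / math.sqrt(2.0 * math.pi)      # weights now integrate against N(0, 1)
    x = sigma * z                         # X = sigma Z has density p
    vals = np.array([theta(n, x, sigma) for n in range(n_max + 1)])
    return (vals * w) @ vals.T


print(np.round(gram_matrix(6, sigma=2.0), 10))
```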
Recall Lusin's Theorem, which states that for any element $g$ of $L^2(\mathbb{R}, \mathcal{B}(\mathbb{R}), m)$ and for any positive $\varepsilon$ there exists a continuous function $c(\cdot)$ that is equal to $g$ on a given bounded interval pointwise off a set of Lebesgue measure less than $\varepsilon$. Recalling the Weierstrass approximation theorem, we know that there exists a sequence of polynomials that converges to $c(\cdot)$ uniformly on the interval of interest. Thus we can make the uniform norm between $c(\cdot)$ and a polynomial arbitrarily small on the interval of concern. This polynomial can be written as a linear combination of the orthonormalized polynomials $\theta_n(\cdot)$. With this reasoning, we see that the set of orthonormalized polynomials $\{\theta_n(\cdot) : n \in \mathbb{N} \cup \{0\}\}$ is dense in $L^2(\mathbb{R}, \mathcal{B}(\mathbb{R}), m)$. Hence, any $g \in L^2(\mathbb{R}, \mathcal{B}(\mathbb{R}), m)$ can be expressed as
$$g = \sum_{n=0}^{\infty} b_n \theta_n(\cdot),$$
where $b_n = \langle g, \theta_n \rangle = \int_{-\infty}^{\infty} g(x)\theta_n(x)p(x)\,dx$ and where the convergence is in $L^2(\mathbb{R}, \mathcal{B}(\mathbb{R}), m)$. Note that
$$\sum_{n=0}^{\infty} b_n^2 = \int_{-\infty}^{\infty} |g(x)|^2 p(x)\,dx < \infty$$
via Parseval's equality and since $g$ is an element of $L^2(\mathbb{R}, \mathcal{B}(\mathbb{R}), m)$.
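To make the coefficients $b_n$ and Parseval's equality concrete, the following sketch approximates $b_n = \int g\theta_n p\,dx$ by quadrature for the hard limiter $g(x) = \operatorname{sgn}(x)$, for which $\int |g|^2 p\,dx = 1$. The limiter, the choice $\sigma = 1$, the truncation to $[-14\sigma, 14\sigma]$, and all function names are assumptions made for illustration. For odd $n$ the exact value is $b_n = 2He_{n-1}(0)p(0)/\sqrt{n!}$ when $\sigma = 1$, so $b_1 = \sqrt{2/\pi}$, and the even-indexed coefficients vanish by symmetry.

```python
import math


def theta_values(n_max, x, sigma=1.0):
    """Return [theta_0(x), ..., theta_{n_max}(x)], where theta_n(x) = He_n(x/sigma)/sqrt(n!)."""
    u = x / sigma
    he = [1.0, u]
    for k in range(1, n_max):
        he.append(u * he[k] - k * he[k - 1])  # He_{k+1} = u He_k - k He_{k-1}
    return [he[n] / math.sqrt(math.factorial(n)) for n in range(n_max + 1)]


def hermite_coefficients(g, n_max, sigma=1.0, half_width=14.0, steps=8000):
    """Midpoint-rule approximation of b_n = integral of g(x) theta_n(x) p(x) dx, n = 0..n_max."""
    a = -half_width * sigma
    h = 2.0 * half_width * sigma / steps
    b = [0.0] * (n_max + 1)
    for i in range(steps):
        x = a + (i + 0.5) * h
        weight = g(x) * math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma) * h
        for n, t in enumerate(theta_values(n_max, x, sigma)):
            b[n] += weight * t
    return b


def hard_limiter(x):
    """g(x) = sgn(x); the quadrature grid above never evaluates x = 0 exactly."""
    return 1.0 if x > 0 else -1.0


if __name__ == "__main__":
    b = hermite_coefficients(hard_limiter, 15)
    print("b_1 =", b[1], " sqrt(2/pi) =", math.sqrt(2.0 / math.pi))
    print("partial Parseval sum through n = 15:", sum(c * c for c in b))
```

The partial sum of $b_n^2$ through $n = 15$ comes out near 0.87 and approaches $\int |g|^2 p\,dx = 1$ only slowly, since the limiter's discontinuity makes $b_n^2$ decay roughly like $n^{-3/2}$.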
Consider again the bivariate density function $f$ of the random variables $X(t_1)$ and $X(t_2)$ where, for convenience, we let $X = X(t_1)$ and $Y = X(t_2)$. This bivariate density admits a series expansion, given via the Mehler series, as
$$f(x, y) = p(x)p(y) \sum_{n=0}^{\infty} \rho^n(\tau)\theta_n(x)\theta_n(y),$$
where the convergence is in the following sense:
$$\lim_{N \to \infty} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left(\frac{f(x, y)}{p(x)p(y)} - \sum_{n=0}^{N} \rho^n(\tau)\theta_n(x)\theta_n(y)\right)^2 p(x)p(y)\,dx\,dy = 0.$$
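The Mehler series can also be checked numerically at individual points. The sketch below compares the closed-form zero-mean bivariate Gaussian density (common variance $\sigma^2$, correlation coefficient $\rho$) with a truncated Mehler sum. A pointwise comparison for $|\rho| < 1$ is only a sanity check, not the mean-square mode of convergence stated above; the test points, $\sigma$, $\rho$, the truncation level, and the function names are arbitrary illustrative choices.

```python
import math


def theta_values(n_max, x, sigma=1.0):
    """Return [theta_0(x), ..., theta_{n_max}(x)] with theta_n(x) = He_n(x/sigma)/sqrt(n!)."""
    u = x / sigma
    he = [1.0, u]
    for k in range(1, n_max):
        he.append(u * he[k] - k * he[k - 1])
    return [he[n] / math.sqrt(math.factorial(n)) for n in range(n_max + 1)]


def gaussian_pdf(x, sigma):
    """The N(0, sigma^2) density p(x)."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)


def bivariate_pdf(x, y, sigma, rho):
    """Zero-mean bivariate Gaussian density with variances sigma^2 and correlation rho."""
    det = 1.0 - rho * rho
    q = (x * x - 2.0 * rho * x * y + y * y) / (2.0 * sigma * sigma * det)
    return math.exp(-q) / (2.0 * math.pi * sigma * sigma * math.sqrt(det))


def mehler_partial_sum(x, y, sigma, rho, n_max):
    """p(x) p(y) * sum_{n=0}^{n_max} rho^n theta_n(x) theta_n(y)."""
    tx = theta_values(n_max, x, sigma)
    ty = theta_values(n_max, y, sigma)
    series = sum(rho ** n * tx[n] * ty[n] for n in range(n_max + 1))
    return gaussian_pdf(x, sigma) * gaussian_pdf(y, sigma) * series


if __name__ == "__main__":
    sigma, rho = 1.3, 0.6
    for x, y in [(0.3, -0.7), (1.2, 0.9), (-1.5, 0.4)]:
        print(x, y, bivariate_pdf(x, y, sigma, rho), mehler_partial_sum(x, y, sigma, rho, 80))
```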

We now consider a nonlinearity $g$ and the bivariate density function $f$ as given above. Consider, also, the output random process $g(X(t))$. We are interested in the bandwidth characteristics of the output. The autocorrelation function of the output is given by $R_g(\tau) = E[g(X(t))g(X(t + \tau))]$. Thus, we see that
$$R_g(\tau) = E[g(X)g(Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x)g(y)f(x, y)\,dx\,dy.$$

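Observe, further, that substituting the Mehler series into this double integral and using the orthonormality of the $\theta_n$'s yields
$$R_g(\tau) = \sum_{n=0}^{\infty} b_n^2 \rho^n(\tau),$$
where the convergence is uniform in $\tau$, since the $b_n$'s are square summable and since $|\rho(\tau)| \le 1$.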
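For the hard limiter $g(x) = \operatorname{sgn}(x)$ this series can be summed in closed form, which gives a check on the formula above. With $\sigma = 1$ (the limiter output does not depend on $\sigma$), only odd-indexed coefficients are nonzero and $b_{2k+1}^2 = (2/\pi)\,((2k-1)!!)^2/(2k+1)!$ (with $(-1)!! = 1$); the series $\sum_n b_n^2\rho^n$ then coincides with the Maclaurin series of $(2/\pi)\arcsin\rho$, the classical arcsine law for a hard-limited Gaussian process. The sketch below, whose function names are illustrative, compares a truncated sum with that closed form.

```python
import math


def limiter_coefficient_squares(n_max):
    """[b_0^2, ..., b_{n_max}^2] for g(x) = sgn(x): b_{2k+1}^2 = (2/pi) ((2k-1)!!)^2 / (2k+1)!."""
    squares = [0.0] * (n_max + 1)
    c = 1.0  # c_k = ((2k-1)!!)^2 / (2k+1)!, equal to 1 when k = 0
    k = 0
    while 2 * k + 1 <= n_max:
        if k > 0:
            # Ratio c_k / c_{k-1} = (2k-1)^2 / ((2k)(2k+1)); avoids huge factorials.
            c *= (2 * k - 1) ** 2 / ((2 * k) * (2 * k + 1))
        squares[2 * k + 1] = (2.0 / math.pi) * c
        k += 1
    return squares


def output_autocorrelation(rho, squares):
    """Truncated R_g = sum_n b_n^2 rho^n, with rho = rho(tau)."""
    return sum(b2 * rho ** n for n, b2 in enumerate(squares))


if __name__ == "__main__":
    squares = limiter_coefficient_squares(401)
    for rho in (-0.8, 0.3, 0.9):
        print(rho, output_autocorrelation(rho, squares), 2.0 / math.pi * math.asin(rho))
```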
Now suppose that the input random process $\{X(t) : t \in \mathbb{R}\}$ has a spectral density function that has compact support. Let $S$ denote the spectral density function of the input, and let $\Omega$ denote the support of $S$. We are thus assuming that the input is bandlimited and that the Lebesgue measure of $\Omega$ is finite. Recall that the Fourier transform of $\sigma^2 \rho(\tau)$ is equal to $S(\omega)$. Hence, by well known properties of Fourier transforms, we see that the Fourier transform of $\rho^n(\tau)$ is given by
$$\sigma^{-2n} S(\omega)^{*n},$$
where $S(\omega)^{*n} = (S * S * \cdots * S)(\omega)$ denotes $S(\cdot)$ convolved with itself $n - 1$ times. (Note that $S * S(\omega)$ is $S$ convolved with itself once.) Assume that the nonlinearity $g$ is not almost everywhere equal to a polynomial. Then infinitely many of the $b_n$'s are nonzero. Note that in this case, if the input random process $\{X(t) : t \in \mathbb{R}\}$ were not bandlimited, then the output random process $\{g(X(t)) : t \in \mathbb{R}\}$ would not be bandlimited, either.


Next we show that even if the input random process were bandlimited, the output random process is not bandlimited when $g$ is not a polynomial. Observe that the support of $\sigma^{-2n}S(\omega)^{*n}$ is given by the $n$-fold Minkowski sum of $\Omega$ with itself; that is, its support is $\Omega \oplus \cdots \oplus \Omega = \oplus^n \Omega$, where $\oplus$ denotes the Minkowski sum; that is, $A \oplus B = \{a + b : a \in A, b \in B\}$. Assume again that the input is bandlimited. It follows that $\infty > \lambda(\oplus^n \Omega) \ge n\lambda(\Omega)$, where $\lambda$ is Lebesgue measure. Further, recalling the Cantor ternary set $C$, for which $C \oplus C = [0, 2]$, we see that the above inequality can be strict, and by considering a closed interval, we see that it can be satisfied with equality. For the moment, we will measure bandwidth by the Lebesgue measure of the support of the spectral density. Thus, we see that the output is bandlimited if and only if there exists a positive integer $N$ such that $b_n = 0$ for all $n > N$. Notice that this condition is equivalent to the nonlinearity being almost everywhere equal to a polynomial. However, since we are assuming that the nonlinearity is not almost everywhere equal to a polynomial, we see that the $b_n$'s do not truncate, and thus the output is not bandlimited. On the other hand, if the nonlinearity is almost everywhere equal to a polynomial then we see that the output is bandlimited if and only if the input is bandlimited.
We summarize this result with the following theorem.

6.36 Theorem The output random process in the above discussion is strictly bandlimited if and only if the Gaussian input random process is strictly bandlimited and the nonlinearity $g(\cdot)$ is almost everywhere equal to a polynomial.

Note, in particular, that "limiters" (such as the hard limiter $g(x) = \operatorname{sgn}(x)$) are not almost everywhere equal to polynomials, and thus the output of a limiter with a Gaussian input is never bandlimited.
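The measure inequality $\lambda(\oplus^n\Omega) \ge n\lambda(\Omega)$ and the Cantor-set example can be explored with exact arithmetic when $\Omega$ is a finite union of closed intervals. The sketch below is an illustration under that assumption: it forms Minkowski sums interval by interval and uses the $k$th stage $C_k$ of the Cantor construction as a stand-in for the Cantor set. For a band $[-1, 1]$ the $n$-fold sum $[-n, n]$ attains equality, while $C_k \oplus C_k = [0, 2]$ has measure 2 even though $2\lambda(C_k) = 2(2/3)^k$. All function names are hypothetical.

```python
from fractions import Fraction


def merge(intervals):
    """Merge overlapping or touching closed intervals into a sorted disjoint list."""
    merged = []
    for a, b in sorted(intervals):
        if merged and a <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged


def minkowski_sum(A, B):
    """A (+) B for finite unions of closed intervals: [a1, a2] + [b1, b2] = [a1 + b1, a2 + b2]."""
    return merge([(a1 + b1, a2 + b2) for a1, a2 in A for b1, b2 in B])


def n_fold_sum(A, n):
    """The n-fold Minkowski sum of A with itself."""
    S = A
    for _ in range(n - 1):
        S = minkowski_sum(S, A)
    return S


def measure(A):
    """Lebesgue measure of a disjoint union of closed intervals."""
    return sum(b - a for a, b in A)


def cantor_stage(k):
    """The 2^k closed intervals left after k steps of the Cantor construction on [0, 1]."""
    A = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        A = [piece for a, b in A
             for piece in ((a, a + (b - a) / 3), (b - (b - a) / 3, b))]
    return A


if __name__ == "__main__":
    band = [(Fraction(-1), Fraction(1))]
    for n in (1, 2, 5):
        print(n, measure(n_fold_sum(band, n)), n * measure(band))
    C = cantor_stage(5)
    print(measure(minkowski_sum(C, C)), 2 * measure(C))
```

Exact `Fraction` arithmetic is used so that touching endpoints such as $2/3$ merge reliably; with floating point, rounding could leave spurious gaps in the merged support.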
Now we will consider the case where the input random process is not strictly bandlimited but has a finite second moment bandwidth given by
$$B_2[X] = \frac{\int_{-\infty}^{\infty} \omega^2 S(\omega)\,d\omega}{\int_{-\infty}^{\infty} S(\omega)\,d\omega}.$$

Assume for now that the mean has been subtracted from the output random process. Observe that this is equivalent to assuming that $b_0 = 0$, since $\theta_0(x) = 1$ for all $x$ and thus $b_0 = E[g(X(t))]$. Observe, also, that $\sigma^{-2n}S(\omega)^{*n}$ can be viewed as a density function of a sum of $n$ mutually independent, identically distributed random variables, each with mean zero and variance equal to the second moment bandwidth $B_2[X]$ of the input. Thus it follows that
$$\int_{-\infty}^{\infty} \omega^2 \sigma^{-2n} S(\omega)^{*n}\,d\omega = nB_2[X].$$
We let $B_2[Y]$ denote the output second moment bandwidth.


Now recall that the output autocorrelation function is given by
$$R(\tau) = \sum_{n=1}^{\infty} b_n^2 \rho^n(\tau).$$
Further, recall from standard properties of Fourier transforms that $B_2[X] = -\rho''(0)$. Similarly, note that
$$B_2[Y] = \frac{-R''(0)}{R(0)}.$$

Next, using Fubini's theorem on term by term differentiation, we will deduce the preceding derivative. Note that $\rho(\tau)$ is maximized at the origin. Further, there exists a positive number $\delta$ such that $\rho(\tau)$ is monotone in $(0, \delta)$. Also, $\rho^n(\tau)$ has the same monotonicity property. Taking derivatives from the right, and using Fubini's theorem on term by term differentiation, we see that
$$R'(\tau) = \sum_{n=1}^{\infty} b_n^2 n\rho^{n-1}(\tau)\rho'(\tau)$$
and
$$R''(\tau) = \sum_{n=1}^{\infty} \left(b_n^2 n(n-1)\rho^{n-2}(\tau)(\rho'(\tau))^2 + b_n^2 n\rho^{n-1}(\tau)\rho''(\tau)\right).$$
Since $\rho(0) = 1$ and $\rho'(0) = 0$, we see that
$$R''(0) = \sum_{n=1}^{\infty} nb_n^2 \rho''(0).$$


Further, we see that
$$B_2[Y] = \frac{-R''(0)}{R(0)} = \frac{\sum_{n=1}^{\infty} nb_n^2}{\sum_{n=1}^{\infty} b_n^2} B_2[X].$$
Since $n \ge 1$ in every term, the output second moment bandwidth is greater than or equal to the input second moment bandwidth, with equality holding precisely when $b_n = 0$ for all $n > 1$. This characterizes the case where $g$ is almost everywhere equal to an affine function; that is, when $g(x) = ax + b$ for real numbers $a$ and $b$.
We summarize this result in the following theorem.

6.37 Theorem If $\{X(t) : t \in \mathbb{R}\}$ is a zero mean Gaussian random process that has a finite mean square bandwidth, and if $g$ is a nonlinearity such that $g(X(t))$ has zero mean, then the mean square bandwidth of $g(X(t))$ is greater than or equal to that of the input. Equality holds if and only if $g$ is almost everywhere equal to an affine function.
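Given the coefficients $b_n$ of a zero-mean nonlinearity, the ratio $B_2[Y]/B_2[X] = \sum_n nb_n^2/\sum_n b_n^2$ derived above is straightforward to evaluate. The sketch below, with illustrative function names, does so for an affine $g$ (ratio 1), for the hypothetical choice $g(x) = x + x^2 - 1$ with $\sigma = 1$ (for which $b_1 = 1$, $b_2 = \sqrt{2}$, and the ratio is $5/3$), and for truncations of the hard-limiter coefficients. The limiter's partial ratios keep growing, because $nb_n^2$ decays only like $n^{-1/2}$ and so $\sum_n nb_n^2$ diverges.

```python
import math


def bandwidth_ratio(squares):
    """B2[Y] / B2[X] = sum_n n b_n^2 / sum_n b_n^2, where squares[n] = b_n^2 and b_0 = 0."""
    if squares[0] != 0:
        raise ValueError("subtract the mean first: b_0 must be zero")
    numerator = sum(n * b2 for n, b2 in enumerate(squares))
    return numerator / sum(squares)


def limiter_coefficient_squares(n_max):
    """b_n^2 for g(x) = sgn(x): zero for even n, (2/pi)((2k-1)!!)^2/(2k+1)! for n = 2k + 1."""
    squares = [0.0] * (n_max + 1)
    c, k = 1.0, 0
    while 2 * k + 1 <= n_max:
        if k > 0:
            c *= (2 * k - 1) ** 2 / ((2 * k) * (2 * k + 1))
        squares[2 * k + 1] = (2.0 / math.pi) * c
        k += 1
    return squares


if __name__ == "__main__":
    print(bandwidth_ratio([0.0, 4.0]))       # affine g(x) = 2x with sigma = 1
    print(bandwidth_ratio([0.0, 1.0, 2.0]))  # g(x) = x + x^2 - 1 with sigma = 1
    for n_max in (11, 101, 1001):
        print(n_max, bandwidth_ratio(limiter_coefficient_squares(n_max)))
```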

◊ 6.13 Brownian Motion

The random processes that we will describe in this section were first used to model the movement of a particle suspended in a fluid and bombarded by molecules in thermal motion. Such motion was first analyzed by a nineteenth-century botanist named Robert Brown. The mathematical foundations of the theory were later developed by Albert Einstein in 1905 and (rigorously) by Norbert Wiener in 1924.
A Brownian motion process (or a Wiener process) is a random process $\{W(t) : t \ge 0\}$ defined on some probability space $(\Omega, \mathcal{F}, P)$ that satisfies the following four properties:

1. $W(0, \omega) = 0$ for each $\omega \in \Omega$,

2. for any real numbers $0 \le t_0 < t_1 < \cdots < t_k$, the increments $W(t_k) - W(t_{k-1}), W(t_{k-1}) - W(t_{k-2}), \ldots, W(t_1) - W(t_0)$ are mutually independent,

3. for $0 \le s < t$, the increment $W(t) - W(s)$ is a Gaussian random variable with mean zero and variance $t - s$,

4. for each $\omega \in \Omega$, $W(t, \omega)$ is continuous in $t$.


6.38 Theorem There exists a random process defined on a probability space (that may be taken as the unit interval with Lebesgue measure) that satisfies conditions (1), (2), (3), and (4) given above.

Returning to the physical motivation given above, a Brownian motion process may be used as a model for a single component of the path of a suspended particle subjected to molecular bombardment. For example, consider the projection onto the vertical axis of such a particle's path. Condition (2) reflects a lack of memory of the suspended particle. That is, although the future behavior of the particle depends on its present position, it does not depend on how the particle arrived at its present position. Condition (3), which specifies that the increments have zero mean, indicates that the particle is equally likely to go up or down. That is, there is no drift. Condition (3), which specifies that the variance of the increments grows in proportion to the length of the interval, indicates that the particle tends to wander away from its present position and having done so suffers no force tending to restore it. Condition (4) is a natural condition to expect of the path of a particle and condition (1) is merely a convention.
Since $W(t) - W(0) = W(t)$, property (3) implies that $W(t)$ is Gaussian with mean 0 and variance $t$. If $0 \le s < t$ then, using the previous properties, we see that
$$E[W(s)W(t)] = E[W(s)(W(t) - W(s))] + E[W^2(s)] = E[W(s)]E[W(t) - W(s)] + E[W^2(s)] = s.$$
Thus, it follows that $E[W(s)W(t)] = \min\{s, t\}$.
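Properties (1) through (3) suggest a simple way to sample a Brownian motion process on a finite grid: start at zero and add independent $N(0, \Delta t)$ increments. The sketch below does this and estimates $E[W(s)W(t)]$ by Monte Carlo for comparison with $\min\{s, t\}$. It only produces the process at grid points, so it says nothing about property (4), and the grid size, number of paths, seed, and function names are arbitrary assumptions.

```python
import math
import random


def brownian_path(n_steps, horizon, rng):
    """Sample W(0), W(dt), ..., W(horizon) on a uniform grid with dt = horizon / n_steps."""
    dt = horizon / n_steps
    w = [0.0]  # property (1): W(0) = 0
    for _ in range(n_steps):
        # Properties (2) and (3): independent N(0, dt) increments.
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return w


def estimate_covariance(s, t, n_paths=20000, n_steps=50, horizon=1.0, seed=0):
    """Monte Carlo estimate of E[W(s) W(t)]; s and t are rounded to the nearest grid point."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    i, j = round(s / dt), round(t / dt)
    total = 0.0
    for _ in range(n_paths):
        w = brownian_path(n_steps, horizon, rng)
        total += w[i] * w[j]
    return total / n_paths


if __name__ == "__main__":
    for s, t in [(0.3, 0.8), (0.5, 0.5), (0.9, 0.2)]:
        print(s, t, estimate_covariance(s, t), min(s, t))
```

With 20000 paths the standard error of each estimate is well under 0.01 for these values of $s$ and $t$.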

6.39 Theorem There exists a Brownian motion process that is a measurable random process.

6.40 Theorem With probability one, $W(t, \omega)$ is nowhere differentiable as a function of $t$.

Thus, off a null set, the sample paths of a Brownian motion process are continuous and nowhere differentiable. A nowhere differentiable path represents the motion of a particle that at no time has a velocity. Further, since a function of bounded variation is differentiable a.e., the sample paths of a Brownian motion process are almost surely of unbounded variation on every interval. Brownian motion is a commonly used model for noise in engineering applications. In the following example we will find the Karhunen-Loève expansion of a Brownian motion process.

Example 6.6 Consider a Brownian motion process restricted to the interval $[0, 1]$. The autocovariance function of such a random process is given by $K(s, t) = \min(s, t)$ for $s, t \in [0, 1]$. To find the eigenvalues of the integral operator associated with this autocovariance function, we must solve the integral equation
$$\int_0^1 \min(s, t)e(t)\,dt = \lambda e(s); \quad 0 \le s \le 1,$$
which reduces to
$$\int_0^s te(t)\,dt + s\int_s^1 e(t)\,dt = \lambda e(s); \quad 0 \le s \le 1. \tag{1}$$
Leibniz's rule² implies that
$$\frac{d}{ds}\int_0^s te(t)\,dt = se(s)$$
and that
$$\frac{d}{ds}\int_s^1 se(t)\,dt = \int_s^1 e(t)\,dt - se(s).$$
Thus, differentiating (1) with respect to $s$ implies that
$$\int_s^1 e(t)\,dt = \lambda\frac{d}{ds}e(s) \tag{2}$$

² If $\frac{\partial f}{\partial s}(t, s)$ exists and is continuous and if $\alpha(s)$ and $\beta(s)$ are differentiable real-valued functions then
$$\frac{d}{ds}\int_{\alpha(s)}^{\beta(s)} f(t, s)\,dt = f(\beta(s), s)\frac{d\beta(s)}{ds} - f(\alpha(s), s)\frac{d\alpha(s)}{ds} + \int_{\alpha(s)}^{\beta(s)} \frac{\partial f}{\partial s}(t, s)\,dt.$$


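and differentiating (2) with respect to $s$ implies that
$$-e(s) = \lambda\frac{d^2}{ds^2}e(s). \tag{3}$$
Recall that a solution of (3) will have the form
$$e(s) = A\sin\left(\frac{s}{\sqrt{\lambda}}\right) + B\cos\left(\frac{s}{\sqrt{\lambda}}\right)$$
for $\lambda > 0$ and $A, B \in \mathbb{R}$. Setting $s = 0$ in (1) implies that $e(0) = 0$ for $\lambda > 0$ and hence that $B = 0$. Setting $s = 1$ in (2) implies that $\cos(1/\sqrt{\lambda}) = 0$, which in turn implies that
$$\frac{1}{\sqrt{\lambda}} = \frac{(2n - 1)\pi}{2}$$
for $n \in \mathbb{N}$. Thus, writing $e(s)$ as a function of $n$ and $s$, we have
$$e_n(s) = A\sin\left(\frac{(2n - 1)\pi s}{2}\right)$$
for $n \in \mathbb{N}$. Note that, for distinct positive integers $n$ and $m$, with $j = 2n - 1$ and $k = 2m - 1$,
$$\int_0^1 e_n(s)e_m(s)\,ds = \frac{2A^2}{\pi}\left[\frac{\sin((j - k)\pi/2)}{2(j - k)} - \frac{\sin((j + k)\pi/2)}{2(j + k)}\right] = 0,$$
since the sum and difference of odd numbers are even. Thus, the $e_n$'s are orthogonal. Since $\int_0^1 e_n^2(s)\,ds = A^2/2$, requiring the $e_n$'s to be orthonormal implies that $A = \sqrt{2}$. Thus, the eigenvalues are given by
$$\lambda_n = \frac{4}{(2n - 1)^2\pi^2}$$
and the orthonormalized eigenfunctions are given by
$$e_n(t) = \sqrt{2}\sin\left(\frac{(2n - 1)\pi t}{2}\right)$$
for $n \in \mathbb{N}$. Thus, the Karhunen-Loève theorem implies that
$$X(t) = \sum_{n=1}^{\infty} Z_n e_n(t)$$
where
$$Z_n = \int_0^1 X(t)e_n(t)\,dt$$
for each $n \in \mathbb{N}$. □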
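The eigenpairs found in Example 6.6 can be checked numerically. The sketch below approximates $\int_0^1 \min(s, t)e_n(t)\,dt$ with a midpoint rule and compares it with $\lambda_n e_n(s)$, checks the orthonormality of the $e_n$'s, and checks that $\sum_n \lambda_n = 1/2$, which agrees with $\int_0^1 K(t, t)\,dt = \int_0^1 t\,dt$ as Mercer's theorem predicts for a continuous covariance kernel. The number of quadrature steps, the test points, and the function names are arbitrary choices.

```python
import math


def eigenvalue(n):
    """lambda_n = 4 / ((2n - 1)^2 pi^2)."""
    return 4.0 / ((2 * n - 1) ** 2 * math.pi ** 2)


def eigenfunction(n, t):
    """e_n(t) = sqrt(2) sin((2n - 1) pi t / 2)."""
    return math.sqrt(2.0) * math.sin((2 * n - 1) * math.pi * t / 2.0)


def apply_covariance(n, s, steps=4000):
    """Midpoint-rule approximation of the integral over [0, 1] of min(s, t) e_n(t) dt."""
    h = 1.0 / steps
    return h * sum(min(s, (i + 0.5) * h) * eigenfunction(n, (i + 0.5) * h) for i in range(steps))


def inner(n, m, steps=4000):
    """Midpoint-rule approximation of the integral over [0, 1] of e_n(t) e_m(t) dt."""
    h = 1.0 / steps
    return h * sum(eigenfunction(n, (i + 0.5) * h) * eigenfunction(m, (i + 0.5) * h)
                   for i in range(steps))


if __name__ == "__main__":
    for n in (1, 2, 5):
        for s in (0.25, 0.6, 1.0):
            print(n, s, apply_covariance(n, s), eigenvalue(n) * eigenfunction(n, s))
    print(sum(eigenvalue(n) for n in range(1, 100001)))  # close to 1/2
```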


6.14 Caveats and Curiosities


7 Problems
7.1 Set Theory

Problem 1.1. Let $\Omega$ be a nonempty set and, for each $t \in \mathbb{R}$, let $A_t$ be a subset of $\Omega$. Assume that if $t_1 < t_2$ then $A_{t_1} \subset A_{t_2}$. Show that $\bigcup_{t \in \mathbb{R}} A_t = \bigcup_{n \in \mathbb{N}} A_n$.

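Problem 1.2. Let $\Omega$ be a nonempty set, let $\mathcal{F}$ be a $\sigma$-algebra on $\Omega$, and let $A$ be a nonempty subset of $\Omega$. Let $\mathcal{G}$ be a family of subsets of $A$ given by $\mathcal{G} = \{B \in \mathbb{P}(A) : B \in \mathcal{F}\}$. Is $\mathcal{G}$ a $\sigma$-algebra on $A$?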

Problem 1.3. Prove or Disprove: The set of all integers is


equipotent to the set of all positive, even integers.

Problem 1.4. Consider a nonempty set $\Omega$ and let $\mathcal{F}$ be a subset of $\mathbb{P}(\Omega)$. Show that $\sigma(\mathcal{F})$ exists and is unique. (Recall that $\sigma(\mathcal{F})$ is the smallest $\sigma$-algebra on $\Omega$ that contains every element in $\mathcal{F}$.)

Problem 1.5. Consider a nonempty set $\Omega$. A subset of $\Omega$ is said to be cofinite if its complement is finite. (That is, $A$ is cofinite iff $A^c$ is finite.) Let $\mathcal{F}$ be the subset of $\mathbb{P}(\Omega)$ consisting entirely of all finite and cofinite subsets of $\Omega$. Must $\mathcal{F}$ be an algebra on $\Omega$? Must $\mathcal{F}$ be a $\sigma$-algebra on $\Omega$?

Problem 1.6. Consider nonempty sets $X$ and $Y$ and consider a function $f: X \to Y$. Show that $f(f^{-1}(A)) \subset A$ for all $A \subset Y$ and that $B \subset f^{-1}(f(B))$ for all $B \subset X$.

Problem 1.7. For a function $f: X \to Y$, show that the following three statements are equivalent:

1. $f$ is one-to-one.

2. $f(A \cap B) = f(A) \cap f(B)$ for all $A \subset X$ and all $B \subset X$.

3. $f(A) \cap f(B) = \emptyset$ whenever $A$ and $B$ are disjoint subsets of $X$.

Problem 1.8. Show that $f: X \to Y$ is onto if and only if $f(f^{-1}(B)) = B$ for each subset $B$ of $Y$.

Problem 1.9. Consider the set $S$ of all sequences of the form $\{a_1, a_2, a_3, \ldots\}$ where $a_i$ is equal to 0 or 1 for each $i$. (For example, $\{0, 0, 0, \ldots\}$, $\{1, 1, 1, \ldots\}$, and $\{1, 0, 1, 0, \ldots\}$ are all points in $S$.) Show that $S$ is an uncountable set. (That is, show that there are uncountably many different such sequences of 0's and 1's.)

Problem 1.10. Any real number that is a root of a (nonzero) polynomial with integer coefficients is called an algebraic number. (For example, $\sqrt{2}$ is algebraic yet $\pi$ is not.) Show that the set of all algebraic numbers is countable.

◊ Problem 1.11. Show that any uncountable set of positive real numbers includes a countable subset whose elements sum to $\infty$.

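Problem 1.12. Let $\Omega$ be a nonempty set and let $\mathcal{F}$ be a collection of subsets of $\Omega$ such that $\Omega \in \mathcal{F}$ and such that if $A$ and $B$ are in $\mathcal{F}$ then $A \setminus B$ is in $\mathcal{F}$. Show that $\mathcal{F}$ is an algebra on $\Omega$.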

◊ Problem 1.13. A $\sigma$-algebra is said to be countably generated if it is equal to $\sigma(A_1, A_2, \ldots)$ for some countable collection $\{A_n\}$ of measurable sets. Show that $\mathcal{B}(\mathbb{R})$ is countably generated.

Problem 1.14. What is the smallest $\sigma$-algebra on $\mathbb{R}$ that contains every singleton subset of $\mathbb{R}$?


Problem 1.15. Consider an uncountable set $A$ and let $B$ be a countably infinite subset of $A$. Show that $A$ is equipotent to $A \setminus B$.

Problem 1.16. Show that the interval (0, 1] is equipotent to


the set of all nonnegative real numbers.

◊ Problem 1.17. Does there exist a $\sigma$-algebra with a countably infinite number of elements?

7.2 Measure Theory

Problem 2.1. Let $\mu: \mathbb{P}(\mathbb{N}) \to [0, \infty]$ via $\mu(A) = 0$ if $A$ is a finite set and $\mu(A) = \infty$ if $A$ is not a finite set. Is $\mu$ a measure on $(\mathbb{N}, \mathbb{P}(\mathbb{N}))$?

Problem 2.2. Prove or Disprove: Let $(\Omega, \mathcal{F}, \mu)$ be a measure space and let $\{A_n\}_{n \in \mathbb{N}}$ be a sequence of sets from $\mathcal{F}$. Assume that the sequence $\{A_n\}_{n \in \mathbb{N}}$ is a strictly decreasing sequence; that is, assume that, for each positive integer $n$, $A_{n+1}$ is a proper subset of $A_n$. If the sequence $\{A_n\}_{n \in \mathbb{N}}$ converges to the empty set as $n \to \infty$ then $\mu(A_n)$ converges to zero as $n \to \infty$.

◊ Problem 2.3. Let $\Omega = \mathbb{R}^2$ and, for each positive integer $n$, let $A_n$ be the open ball of radius one centered at the point $((-1)^n/n, 0)$. Find $\liminf_{n \to \infty} A_n$ and $\limsup_{n \to \infty} A_n$.

Problem 2.4. Consider the measure space $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \lambda)$ where $\lambda$ denotes Lebesgue measure. Let $\{A_t : t \in I\}$ be a collection of null sets where $I$ is any index set.

1. Show that $\bigcup_{t \in I} A_t$ need not be a measurable set.

2. If $\bigcup_{t \in I} A_t$ is measurable then must it be a null set?


I would quarrel with mathematics, and say that the sum of zeros is a dangerous number.
-Stanislaw Jerzy Lec

Problem 2.5. Show that any countable subset of $\mathbb{R}$ is Lebesgue measurable and has Lebesgue measure zero.

Problem 2.6. Consider a function $f$ that maps $\mathbb{R}$ into $\mathbb{R}$. If $|f|$ is a Borel measurable function then must $f$ also be a Borel measurable function? Explain.

◊ Problem 2.7. Suppose that $P_1$ and $P_2$ are probability measures on $\sigma(\mathcal{P})$ where $\mathcal{P}$ is a $\pi$-system. Prove that if $P_1$ and $P_2$ agree on $\mathcal{P}$ then they also agree on $\sigma(\mathcal{P})$.

7.3 Integration Theory

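Problem 3.1. Let $f_n: [0, 1] \to \mathbb{R}$ be defined via
$$f_n(x) = \begin{cases} n & \text{if } x \in (0, 1/n) \\ 0 & \text{if } x \notin (0, 1/n). \end{cases}$$

1. Find $\lim_{n \to \infty} f_n(x)$.

2. Find $\lim_{n \to \infty} \int_0^1 f_n(x)\,dx$.

3. Find $\int_0^1 \lim_{n \to \infty} f_n(x)\,dx$.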

Problem 3.2. Let $\Omega = \{1, 2, 3, 4\}$ and let $\mu$ be a real-valued function on $\mathbb{P}(\Omega)$ such that $\mu(A)$ is equal to the number of points in $A$. (For example, $\mu(\{1, 3\}) = 2$ and $\mu(\Omega) = 4$.) Show that $\mu$ is a measure on $(\Omega, \mathbb{P}(\Omega))$. Let $f: \Omega \to \mathbb{R}$ via $f(\omega) = \omega^2$. Evaluate the Lebesgue integral
$$\int_\Omega f\,d\mu.$$

Problem 3.3. Consider a continuous probability distribution function¹ $F$ for which $F(a) = 0$ and $F(b) = 1$ for some $a$ and $b$ from $\mathbb{R}$. Evaluate the following Riemann-Stieltjes integral:
$$\int_{[a,b]} F(x)\,dF(x).$$

Problem 3.4. Let $F$ be a probability distribution function that is absolutely continuous and let $c > 0$. Evaluate the following integral:
$$\int_{-\infty}^{\infty} (F(x + c) - F(x))\,dx.$$

Problem 3.5. Engineers frequently use the "delta function" $\delta(t)$, which has the interesting property that if $f: \mathbb{R} \to \mathbb{R}$ is continuous at the origin then
$$\int_{-\infty}^{\infty} \delta(t)f(t)\,dt = f(0).$$
Unfortunately, no such function $\delta$ exists, since if it did it would equal 0 for nonzero $t$, and hence would integrate to zero. That is, the above integral would be zero for any continuous function $f$. We can, however, obtain such a "sampling property" using a Riemann-Stieltjes integral. For what function $g: \mathbb{R} \to \mathbb{R}$ is it true that
$$\int_{(a, b)} f(x)\,dg(x) = f(0)$$
when $f$ is continuous at the origin and when $a < 0 < b$? Why can't we simply define $\delta(t)$ to be the derivative of $g(t)$?

1 Probability distribution functions are defined on page 70 in Section 5.2.


If you have not yet studied that section, you may want to defer this problem
and the next problem until later.


7.4 Functional Analysis

Problem 4.1. For real numbers $x$ and $y$, let $d(x, y) = (x - y)^2$. Does $d$ define a metric on the set of real numbers?

Problem 4.2. Let $(M, \rho)$ be a metric space. Show that the closure of an open ball $B(x, r) = \{y \in M : \rho(x, y) < r\}$ need not equal the corresponding closed ball $\overline{B}(x, r) = \{y \in M : \rho(x, y) \le r\}$.

Problem 4.3. Let $a$ and $b$ be real numbers with $a < b$. Let $C[a, b]$ denote the set of all real-valued functions that are continuous on $[a, b]$ and consider a metric $d$ on $C[a, b]$ defined by
$$d(f, g) = \max_{t \in [a, b]} |f(t) - g(t)|.$$
Show that if $\{f_n\}_{n \in \mathbb{N}}$ is a sequence of points in $C[a, b]$ such that, for $t \in [a, b]$, $f_n(t) \to 0$ as $n \to \infty$, then it need not follow that $d(f_n, 0) \to 0$ as $n \to \infty$.

Problem 4.4. Consider the set $\mathbb{Q}$ of all rational numbers endowed with a metric $d$ given by $d(x, y) = |x - y|$. This metric space is called the rational line. Show that the rational line is not complete.

7.5 Distributions & Probabilities

Problem 5.1. Assume that $B$ and $C$ are random variables possessing a joint probability density function given by
$$f_{B,C}(b, c) = \begin{cases} 1 & \text{for } 0 \le b \le 1 \text{ and } 0 \le c \le 1 \\ 0 & \text{otherwise.} \end{cases}$$
What is the probability that the roots of the equation $x^2 + 2Bx + C = 0$ are real?

Problem 5.2. Consider a random variable $X$ that has a uniform distribution on $(0, 1)$. What is the probability that the first digit after the decimal point in $\sqrt{X}$ will be a 3?

Problem 5.3. Consider a random variable $X$ that has a continuous, strictly increasing, positive probability distribution function $F_X$. What is the probability distribution function of the random variable $Z = F_X(X)$?

Problem 5.4. Consider a random variable $X$ that has a continuous, strictly increasing, positive probability distribution function $F$. Find a probability density function for the random variable $Y = -\ln(F(X))$.

Problem 5.5. Consider random variables $X$ and $Y$, let $F_X$ denote the distribution function of $X$, let $F_Y$ denote the distribution function of $Y$, and let $F_{X,Y}$ denote the joint distribution function of $X$ and $Y$. Let $Z = \max\{X, Y\}$ and let $W = \min\{X, Y\}$. Find the distribution of $Z$ and the distribution of $W$ in terms of $F_X$, $F_Y$, and $F_{X,Y}$.

Problem 5.6. Consider real numbers $x_1$, $x_2$, $y_1$, and $y_2$ such that $x_1 \le x_2$ and $y_1 \le y_2$. Show that if $F(x, y)$ is a joint probability distribution function then it must follow that
$$F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1) \ge 0.$$
Show that the function
$$G(x, y) = \begin{cases} 0 & \text{if } x + y < 1 \\ 1 & \text{if } x + y \ge 1 \end{cases}$$
is not a joint probability distribution function.

Problem 5.7. Find the marginal probability density function $f_X$ if the joint probability density function $f_{X,Y}$ is uniform on the circle of radius one centered at the origin.


7.6 Independence

Problem 6.1. Consider a probability space $(\Omega, \mathcal{F}, P)$ and two events $A$ and $B$ from $\mathcal{F}$ that are independent. Show that $A^c$ and $B^c$ are also independent events.

Problem 6.2. Assume that a dart is thrown at a circular


dart board having unit area in such a way that the probability
the dart lands in any particular circular region of the board is
given simply by the area of that region. Let (X, Y) denote the
coordinates of the dart's position on the board after one throw.
Are the random variables X and Y independent? Explain.

Problem 6.3. Consider a toss of two fair dice. Let $A$ denote the event that the number appearing on the first die is even. Let $B$ denote the event that the number appearing on the second die is odd. Let $C$ denote the event that the numbers on the two dice are either both even or both odd. Are the events $A$, $B$, and $C$ mutually independent? Explain.

Problem 6.4. Consider a monkey that is seated at a type-


writer and who makes a single keystroke each second. Assume
that the keystrokes are mutually independent events. (Is this
a good assumption for a human typist?) Further, assume that
the set of all possible outcomes of a keystroke includes all lowercase and uppercase English letters, the numbers zero through nine, all punctuation, and a space. Assume that each possible outcome of a keystroke has a fixed positive probability of being typed. The typewriter never fails, the monkey is immortal, and there is an endless stream of paper. (All standard assumptions!) Prove that, with probability one, the entire script of the play Hamlet by William Shakespeare will be typed an infinite number of times.


Problem 6.5. Let $X$ and $Y$ be random variables possessing a joint probability density function given by $f(x, y) = 2\exp(-x - y)$ for $0 < x < y < \infty$. Are $X$ and $Y$ independent?

Problem 6.6. Assume that missiles are fired at a target in such a way that the point at which each missile lands has a uniform distribution on the interior of a disc of radius 5 miles centered around the target. If we assume that the points at which the missiles land are mutually independent, then how many missiles must we fire to ensure at least a 0.95 probability of at least one hit not more than one mile from the target?

7.7 Random Variables

Problem 7.1. Consider a random variable $X$ defined on a probability space $(\Omega, \mathcal{F}, P)$ such that $X(\omega) = 87$ for each $\omega \in \Omega$. What is $\sigma(X)$?

Problem 7.2. Consider a probability space $(\mathbb{R}, \mathcal{B}(\mathbb{R}), P)$ where $P$ is any probability measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Let $X$ be a random variable defined on this space via $X(\omega) = \omega^2$. (Note that such a definition is possible only because we have let $\Omega = \mathbb{R}$.) What is $\sigma(X)$?

Problem 7.3. Let $X$ and $Y$ be random variables such that $E[(X - Y)^2] = 0$. Show that $X = Y$ a.s.

Problem 7.4. Consider a random variable $X$ with probability density function
$$f_X(x) = \frac{1}{2}\exp(-|x|)$$
for $x \in \mathbb{R}$. Use Chebyshev's inequality to find an upper bound on the probability that $|X| > 2$. What is the actual value of that probability?


Problem 7.5. Consider a random variable $X$ defined on a probability space $(\Omega, \mathcal{F}, P)$. Let $g: \mathbb{R} \to \mathbb{R}$ be a Borel measurable function and let $Y = g(X)$. Show that $\sigma(Y) \subset \sigma(X)$. When will $\sigma(Y) = \sigma(X)$?

◊ Problem 7.6. Consider a random variable $X$. Show that $\sigma(X)$ is countably generated.

◊ Problem 7.7. Prove that a function $X$ mapping a measurable space $(\Omega, \mathcal{F})$ into $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is a random variable if and only if the set $\{\omega \in \Omega : X(\omega) \le x\}$ is an element of $\mathcal{F}$ for each $x \in \mathbb{R}$.

Problem 7.8. Let $X$ and $Y$ be integrable random variables defined on $(\Omega, \mathcal{F}, P)$. Show that $X = Y$ a.s. if and only if
$$\int_F X\,dP = \int_F Y\,dP$$
for all $F \in \mathcal{F}$.

Problem 7.9. Consider independent random variables X and


Y such that each has a uniform distribution on the interval [0,
2]. Find E[IX - YI].

Problem 7.10. For a positive integer $n$, let $X_1, \ldots, X_n$ be a collection of mutually independent, identically distributed random variables each with a uniform distribution on the interval $[0, \theta]$ for some fixed positive real number $\theta$. If $Z = \max\{X_1, \ldots, X_n\}$ then what is $E[Z]$?

Problem 7.11. Consider a random variable $X$ whose characteristic function $C_X(t)$ is such that $C_X(2) = 0$. For a fixed real number $s$, find $E[\cos(X + s)\cos(X + s + 1)]$.


7.8 Moments

Problem 8.1. Consider a nonnegative, integrable random variable $X$ defined on a probability space $(\Omega, \mathcal{F}, P)$. Show that
$$E[X] = \int_0^\infty P(X > t)\,dt.$$

Problem 8.2. Consider a random variable $X$ with a finite second moment. Show that $E[(X - m)^2]$ is minimized over all $m \in \mathbb{R}$ when $m = E[X]$.

Problem 8.3. Consider random variables X and Y with finite


second moments. Show that E[XY] is finite.

Problem 8.4. Consider random variables X and Y with finite


second moments. Show that COV[X, Y] = E[XY]- E[X]E[Y].

Problem 8.5. Consider random variables $X$ and $Y$ with finite second moments. Show that $|\rho(X, Y)| \le 1$.

◊ Problem 8.6. Consider random variables $X$ and $Y$ with finite second moments. What can be said about $X$ and $Y$ if $\rho(X, Y) = \pm 1$?

Problem 8.7. Let $Y$ be a random variable with a uniform distribution on $[a, b]$ where $a < b$. What is VAR[Y]?

Problem 8.8. If $X$ is Poisson with parameter $\lambda > 0$ then what is VAR[X]? Let $X_1$, $X_2$, $X_3$, and $X_4$ be mutually independent Poisson random variables each with a mean equal to 3. Let $Y = 4X_1 + X_2 + 6X_3 + 3X_4$. What is VAR[Y]? (The Poisson distribution is defined in Example 5.11 on page 105.)


Problem 8.9. Consider random variables $X$ and $Y$ such that each has a finite positive second moment. Find a real number $a$ for which $E[(X - aY)^2]$ is minimized.

Problem 8.10. Let $X_1, \ldots, X_n$ be mutually independent random variables each with variance $\sigma^2$ and mean $\mu$. Find the correlation coefficient between $\sum_{i=1}^{n} X_i$ and $X_1$.

Problem 8.11. Let $X$ be a random variable with a Poisson distribution having parameter $\lambda$. Find $E[t^X]$ for $t \in \mathbb{R}$. (The Poisson distribution is defined in Example 5.11 on page 105.)

Problem 8.12. For an integer $n > 1$, let $X_1, \ldots, X_n$ be mutually independent random variables that are uniformly distributed over the interval $(-1, 1)$. Find the characteristic function for the sum $X_1 + \cdots + X_n$.

◊ Problem 8.13. Let $\Phi(t)$ be the characteristic function of a random variable that possesses an even probability density function. Show that $1 + \Phi(2t) \ge 2\Phi^2(t)$ for all $t \in \mathbb{R}$.

Problem 8.14. Let $X$ denote the number of 'Heads' that occur when a fair coin is flipped twice. What is the moment generating function of $X$? Find $E[X^n]$ for $n \in \mathbb{N}$.

Problem 8.15. Consider independent random variables $X$ and $Y$ such that
$$X = \begin{cases} 0 & \text{wp } 1/3 \\ 1 & \text{wp } 1/3 \\ 2 & \text{wp } 1/3 \end{cases}$$
and
$$Y = \begin{cases} 0 & \text{wp } 1/3 \\ 1 & \text{wp } 2/3. \end{cases}$$
Let $Z$ be a random variable such that
$$Z = \begin{cases} 0 & \text{if } X + Y = 0 \text{ or } X + Y = 3 \\ 1 & \text{if } X + Y = 1 \\ 2 & \text{if } X + Y = 2. \end{cases}$$
Find $M_X(s)$ (the moment generating function of $X$), $M_Z(s)$ (the moment generating function of $Z$), and $M_{X+Z}(s)$ (the moment generating function of $X + Z$). Is $M_{X+Z}(s) = M_X(s)M_Z(s)$? Are $X$ and $Z$ independent random variables?

Problem 8.16. Consider a random variable $\Theta$ with a uniform distribution on $[0, 2\pi]$. Let $X = \cos(\Theta)$ and let $Y = \sin(\Theta)$. Are $X$ and $Y$ uncorrelated? Are $X$ and $Y$ independent?

◊ Problem 8.17. Let $x_1$, $x_2$, $y_1$, and $y_2$ be real numbers such that $x_1 \ne x_2$ and $y_1 \ne y_2$. Consider random variables $X$ and $Y$ defined on the same probability space such that $P(X = x_1) + P(X = x_2) = 1$ with $P(X = x_1) > 0$ and $P(X = x_2) > 0$ and such that $P(Y = y_1) + P(Y = y_2) = 1$ with $P(Y = y_1) > 0$ and $P(Y = y_2) > 0$. Prove or Disprove: If $X$ and $Y$ are uncorrelated then $X$ and $Y$ are independent.

7.9 Transformations of Random Variables

Problem 9.1. The radius of a circle is approximately measured in such a way that the approximation has a uniform distribution in the interval $(a, b)$ where $0 < a < b$. Find the distribution of the resulting approximation of the circumference of the circle and of the resulting approximation of the area of the circle.

Problem 9.2. Let $X$ and $Y$ be independent random variables with densities
$$f_X(x) = \frac{1}{\pi}\frac{1}{\sqrt{1 - x^2}}; \quad |x| < 1$$
and
$$f_Y(y) = \frac{y}{\sigma^2}\exp\left(\frac{-y^2}{2\sigma^2}\right); \quad y > 0,$$
respectively. Find the distribution of the product $XY$.

Problem 9.3. Let $X$ and $Y$ be independent random variables each with a density function given by $f(x) = e^{-x}I_{[0,\infty)}(x)$. Let $W = X/(X + Y)$. What is the distribution of $W$?

Problem 9.4. Consider a positive random variable $X$ with density function $f_X$. Find a density function for $1/X$.

7.10 The Gaussian Distribution

Problem 10.1. Consider random variables $X$ and $Z$ that are defined on the same probability space $(\Omega, \mathcal{F}, P)$. Assume that $X$ has a standard Gaussian distribution and that $Z$ has a Gaussian distribution with mean 5 and variance 4. Find a real number $a$ such that $P(X > a) = P(Z < 2.44)$.

Problem 10.2. Let $X$ be a Gaussian random variable with mean $m$ and variance $\sigma^2$. What is $E[X^3]$ in terms of $m$ and $\sigma^2$? What is $E[X^{97}]$ if $m = 0$ and $\sigma^2 = 38$?

Problem 10.3. For a fixed positive integer $n$, let $Z$ and $X_1, \ldots, X_n$ be zero mean, unit variance, mutually independent Gaussian random variables. Let

and let

$$W = \sum_{i=1}^{n} X_i^2.$$
The random variable $W$ has a density function $f_W$ that you do not need to find. Instead, find an expression for a density function of $T$ in terms of $f_W$ and $f_Z$, where $f_Z$ is a density function for $Z$.


Problem 10.4. Let $X$ be a standard Gaussian random variable, and let $Z$ be a random variable that takes on the values 1 and $-1$ each with probability $\frac{1}{2}$. Assume that $X$ and $Z$ are independent, and let $Y = XZ$. Show that $Y$ is a standard Gaussian random variable. Is $X + Y$ a Gaussian random variable? Are $X$ and $Y$ uncorrelated? Are $X$ and $Y$ independent?

Problem 10.5. Let $X_1$ and $X_2$ be zero mean, unit variance, mutually Gaussian random variables with correlation coefficient 1/3. Let $X$ denote the random vector $[X_1\ X_2]^T$. Find a real $2 \times 2$ matrix $C$ so that the random vector $Z = CX$ is composed of independent Gaussian random variables.

Problem 10.6. Let $X$ be a Gaussian random variable with mean $m_1$ and variance $\sigma_1^2$. Let $Y$ be a Gaussian random variable with mean $m_2$ and variance $\sigma_2^2$. Assume that $X$ and $Y$ are independent and find the distribution of $X + Y$.

Problem 10.7. Let $f_1$ be a $N(m_1, \sigma_1^2)$ density function and let $f_2$ be a $N(m_2, \sigma_2^2)$ density function. Consider a random variable $X$ that has a density given by $\lambda f_1(x) + (1 - \lambda)f_2(x)$ where $0 < \lambda < 1$. Find the moment generating function for $X$, the mean of $X$, and the variance of $X$.

Problem 10.8. Let X and Y be independent Gaussian ran-


dom variables each with mean zero and variance one. Find
E[max(X, Y)].

7.11 Convergence

Problem 11.1. Define a sequence of mutually independent


random variables as follows:
with probability.!.n
with probability 1 - 1n


for n ∈ N. Does X_n → 0 a.s.? Explain.

Problem 11.2. A random variable X is said to have a Cauchy
distribution centered at zero with parameter a > 0 if X has a
density given by

$$f_X(x) = \frac{a}{\pi(a^2 + x^2)}.$$

The characteristic function C_X(t) of X is given by
C_X(t) = exp(-a|t|). Let {X_n}_{n∈N} be a sequence of mutually
independent random variables each having a Cauchy distribution
centered at zero with parameter a = 1. For a fixed positive
integer n, let S_n = X_1 + ... + X_n. What is the distribution
of S_n/n?

Problem 11.3. Show via an example that a sequence of random
variables may converge in probability without converging
in L_p for any p > 1.

Problem 11.4. Let c be a real constant. Show that X_n → c
in distribution if and only if X_n → c in probability.

Problem 11.5. Consider a sequence {X_n}_{n∈N} of mutually
independent random variables each with a uniform distribution
on the interval (0, 1]. For each positive integer n, let
Z_n = n(1 - max(X_1, ..., X_n)). Does Z_n converge in
distribution? If so then to what distribution does F_{Z_n} converge?

Problem 11.6. Consider a numerical scheme in which the
round-off error to the second decimal place has the uniform
distribution on the interval (-0.05, 0.05). What is an approximate
value of the probability that the absolute error in the sum of
1000 such numbers is less than 2?

Problem 11.7. If you toss a fair coin 10,000 times then what
(approximately) is the probability that you will observe exactly
5000 heads?


Problem 11.8. Let {X_n}_{n∈N} be a sequence of second order
random variables defined on (Ω, F, P) and let a be a real
number. Find conditions on E[X_n] and VAR[X_n] that are both
sufficient and necessary to ensure that X_n → a in L_2.

7.12 Conditioning

Problem 12.1. Let U and V be independent random variables
each with a zero mean, unit variance Gaussian distribution. Let
X = U + V and Y = U - V. Show that X and Y are independent
random variables each with a zero mean Gaussian distribution
having a variance equal to 2. Find E[X|U] and E[Y|U]. Are
E[X|U] and E[Y|U] independent random variables?

Problem 12.2. Show via an example that E[X|Y] = E[X]
need not imply that X and Y are independent.

Problem 12.3. Let X and Y be independent, zero mean
random variables and let Z = XY. Assume that Z has a finite
mean. Find E[Z|X], E[Z|Y], and E[Z|X, Y].

Problem 12.4. Consider the probability space ([0, 1], B([0, 1]), λ)
where λ denotes Lebesgue measure. Consider subsets of
[0, 1] given by A = [0, 1/4], B = (1/4, 2/3], and C = (2/3, 1].
Let F be the σ-algebra on [0, 1] given by σ({A, B, C}) and let
X(ω) = ω^2 for ω ∈ [0, 1]. Find E[X|F].

Problem 12.5. Consider second order random variables X,
Y, and Z defined on the same probability space. Show that if
X and Z are independent and if X and Y are independent then
E[XZ|Y] = E[X]E[Z|Y] a.s.


Problem 12.6. Consider a sequence {Y_n}_{n∈N} of mutually
independent random variables each with mean zero and positive
variance σ^2. For each positive integer n, let

Problem 12.7. Let X and Y be second order random variables
defined on the same probability space. The conditional variance
of X given Y is denoted by VAR[X|Y] and is defined by

$$\mathrm{VAR}[X|Y] = E[(X - E[X|Y])^2 \mid Y].$$

Show that

$$\mathrm{VAR}[X] = E[\mathrm{VAR}[X|Y]] + \mathrm{VAR}[E[X|Y]].$$

Problem 12.8. Let X and Y be random variables defined on
the same probability space and assume that E[X^2] < ∞. Let g
be a Borel measurable function mapping ℝ to ℝ. Show that

$$E[(X - g(Y))^2] = E[(X - E[X|Y])^2] + E[(E[X|Y] - g(Y))^2].$$

For what such function g is E[(X - g(Y))^2] minimized?

Problem 12.9. Let Ω = {1, 2, 3, 4, 5, 6} and let F be the
power set of Ω. Define a probability measure P on (Ω, F) by
letting P({ω}) = 1/6 for each ω ∈ Ω. Let G = σ({{1, 3, 5}})
and let X(ω) = ω for each ω ∈ Ω. Find E[X|G].

Problem 12.10. Consider a probability space (Ω, F, P) and
let D_1, ..., D_N be disjoint measurable subsets of Ω such that
Ω = D_1 ∪ ... ∪ D_N and such that P(D_i) > 0 for each i. Let G be
the σ-algebra on Ω generated by D_1, ..., D_N and let X be an
integrable random variable defined on (Ω, F, P). Find E[X|G](ω)
for all ω ∈ D_i.


Problem 12.11. Let X be a random variable defined on
(Ω, F, P) such that E[X^2] is finite. Let G_1 and G_2 be
σ-subalgebras of F. If Y = E[X|G_1] a.s. and X = E[Y|G_2] a.s.
then show that X = Y a.s.

Problem 12.12. Let X be a random variable with mean 3
and variance 2. Let Y be a random variable such that E[Y] = 4
and E[XY] = -3. If E[Y|X] = a + bX a.s. then find a and b.

◊ Problem 12.13. Let X and Y be zero mean, positive variance,
mutually Gaussian random variables possessing a correlation
coefficient ρ such that |ρ| < 1. Show that
E[X^2 Y^2] = E[X^2]E[Y^2] + 2(E[XY])^2.

Problem 12.14. Consider a sequence {Y_1, Y_2, ...} of mutually
independent random variables that are defined on the same
probability space and that each have a mean of 1. For each
positive integer n, let X_n = Y_1 Y_2 ⋯ Y_n. Find

and

where j < m < n.

Problem 12.15. Consider random variables X and Y possessing
a joint probability density function

$$f(x, y) = \begin{cases} 8xy & \text{if } 0 \le x \le y \le 1 \\ 0 & \text{otherwise.} \end{cases}$$

Find E[X|Y] and E[Y|X].

7.13 True/False Questions

A statement that is not always true should be considered false.
For example, the statement "If x^2 = 4 then x = 2" is a false
statement.


1. ____ The set ℝ is a subset of the set ℝ^2.

2. ____ There exists a function f: ℝ → ℝ such that f: A → A
is a bijection for any nonempty subset A of ℝ.

3. ____ Any σ-algebra is a λ-system.

4. ____ Consider a complete measure space (Ω, F, P)
and let A be an element of F. If B ⊂ A then B ∈ F.

5. ____ Consider a probability space (Ω, F, P) such that
Ω and ℝ are equipotent. If X: Ω → ℝ is bijective then X
is a random variable defined on (Ω, F, P).

6. ____ Consider two independent random variables X
and Y defined on a probability space (Ω, F, P). There
does not exist a set A such that A ∈ σ(X) ∩ σ(Y) and
0 < P(A) < 1.

7. ____ If the second moment of a random variable X
exists then the first moment of X must also exist.

8. ____ If f: ℝ → ℝ is constant a.e. with respect to
Lebesgue measure then f is Riemann integrable.

9. ____ Consider a nonempty set Ω and two subsets F
and G of ℙ(Ω). If F ∩ G = ∅ then σ(F) ≠ σ(G).

10. ____ The expected value of an integrable random
variable must be an element of the range of that random
variable.

11. ____ If two sets A and B are such that A is a subset
of B then there always exists an element x in the set B
that is not in the set A.


12. ____ There exists a nonempty set Ω such that the
power set of Ω is the smallest σ-algebra that contains Ω.

13. ____ A probability measure is always a σ-finite
measure.

14. ____ The infimum of a set of positive real numbers
must itself be a positive real number.

15. ____ The collection of real Borel sets is the smallest
σ-algebra on the real line that contains every closed interval.

16. ____ There exist two subsets A and B of ℝ such that
B is a Lebesgue null set, such that A is a subset of B, and
such that A is not an element of M(ℝ).

17. ____ It is possible for a random variable to be
independent of itself.

18. ____ A random variable X possessing an even
probability density function must have a mean equal to zero.

19. ____ It is possible for disjoint events to be
independent and it is possible for disjoint events not to be
independent.

20. ____ A Lebesgue measurable subset of the real line
that is not countable must have positive Lebesgue
measure.

21. ____ If X and Y are Gaussian random variables then
X + Y must be a Gaussian random variable.

22. ____ If X and Y are uncorrelated Gaussian random
variables then X and Y must be independent random
variables.


23. ____ If X and Y are independent Gaussian random
variables then X + Y must be a Gaussian random variable.

24. ____ A function mapping the real line to a finite
subset of the real line must be Riemann integrable.

25. ____ Consider a probability space (Ω, F, P), a
random variable X defined on this space, and a σ-subalgebra
G of F. The conditional expectation E[X|G] must be
F-measurable.

26. ____ If all of the sample paths of a random process are
continuous then all of the sample paths of a modification
of that process must also be continuous.

27. ____ Two distinct second order random processes
must possess distinct autocovariance functions.

28. ____ There does not exist a random variable with a
first moment equal to √2 and a second moment equal to 1.

29. ____ Let Ω be a nonempty set and let f be a function
mapping Ω to ℝ. There always exists a σ-algebra F on
Ω so that f is a measurable mapping from (Ω, F) to
(ℝ, B(ℝ)).

30. ____ If ∫_Ω f dμ is a Lebesgue integral then μ must be
Lebesgue measure.

31. ____ Consider two random variables X and Y defined
on a probability space (Ω, F, P). If E[X - Y] = 0 then
P(X = Y) = 1.

32. ____ Consider two random variables X and Y defined
on the same probability space. If E[X + Y] < ∞ then
E[X] < ∞ and E[Y] < ∞.


33. ____ Consider a function g: ℝ → ℝ and a random
variable X defined on (Ω, F, P). The function g(X) will
always be a random variable defined on (Ω, F, P).

34. ____ If a random variable X is equal almost surely to
a certain conditional expectation then X must be a version
of that conditional expectation.

35. ____ Consider random variables X and Y defined on
the same probability space. If X = Y a.s. then σ(X) = σ(Y).

36. ____ Consider a random variable X that possesses
an absolutely continuous probability distribution function,
and let g be a Borel measurable function mapping ℝ to ℝ.
The random variable g(X) must also possess an absolutely
continuous probability distribution function.

37. ____ There exists a probability density function f
such that the supremum of the set {f(x) : x ∈ ℝ} is not
finite.

38. ____ Consider two random variables X and Y defined
on the same probability space. If X is σ(Y)-measurable
then σ(X, Y) = σ(X).

39. ____ A set may be equipotent to a proper subset of
itself.

40. ____ Let Ω be a set containing at least two elements
and let F and G be two distinct σ-algebras on Ω. The set
F ∪ G is never a σ-algebra on Ω.


8 Solutions

8.1 Solutions to Exercises

1.1. Yes, {∅} is a set containing one element and ∅ is the set
containing no elements.
1.2. Since the only subset of the empty set is the empty set itself
it follows that {∅} is the power set of ∅.
1.3. Assume that A ⊂ B. If A ∪ B is empty then B is empty
and hence A ∪ B = B. Assume that A ∪ B is not empty and let
x ∈ A ∪ B. By definition of union, it follows that either x ∈ A
or x ∈ B. If x ∈ A then x ∈ B since A ⊂ B. Thus, x ∈ B and
we conclude that A ∪ B ⊂ B. If B is empty then A is empty
and hence A ∪ B = B. Assume that B is not empty and let
x ∈ B. Then x ∈ A ∪ B, which implies that B ⊂ A ∪ B. Thus,
we conclude that A ∪ B = B.
Assume that A ∪ B = B. If A is empty then A ⊂ B for any set
B. Assume that A is not empty and let x ∈ A. Then x ∈ A ∪ B,
which implies that x ∈ B since A ∪ B = B. Thus, we conclude
that A ⊂ B.
1.4. The first function is not onto and not one-to-one. The
second function is onto but not one-to-one. The third function
is one-to-one but not onto. The fourth function is bijective with
inverse f^{-1}(x) = √x.
1.5. Choose some b ∈ B and note that since f is onto there
exists some a ∈ A such that f(a) = b. Since f is bijective it
follows that f^{-1}({b}) = {a}; that is, f^{-1}(b) = a. Substitution
thus implies that f(f^{-1}(b)) = b.
Choose some a ∈ A and let f(a) = b. As above, note that
f^{-1}(b) = a. Substitution thus implies that f^{-1}(f(a)) = a.
1.6. There does not exist a bijection from R into S since no
function from R to S can be one-to-one. There does not exist
a bijection from S into R since no function from S to R can be
onto.


1.7. Yes, if f: A → B and f is bijective then f^{-1}: B → A is a
bijection from B to A. That is, f^{-1} is onto since f is defined on
all of A, and f^{-1} is one-to-one since if f(a) = b_1 and f(a) = b_2
then b_1 = b_2. (That is, if b_1 ≠ b_2 then f^{-1}(b_1) cannot be equal
to f^{-1}(b_2).)
1.8. Consider a countable set C and a subset B of C. Since C
is countable there exists a bijection f mapping C to a subset
N of the positive integers. Let g mapping B to f(B) be the
restriction of f to B; that is, g = f on B and g is undefined on
C \ B. Note that g is onto since it maps B to f(B) and that g
is one-to-one since f is one-to-one. Thus, B is countable since
g is a bijection from B to f(B) ⊂ N.
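1.9. For notational simplicity, assume that all of the A_i's are
countably infinite. For each i ∈ N, let A_i = {a_1^i, a_2^i, ...}. (For
example, if f is a bijection from A_i to N then we could simply
choose a_j^i such that f(a_j^i) = j.) Note that we may arrange the
a_j^i in matrix form as:

$$\begin{array}{cccc} a_1^1 & a_2^1 & a_3^1 & \cdots \\ a_1^2 & a_2^2 & a_3^2 & \cdots \\ a_1^3 & a_2^3 & a_3^3 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{array}$$

Define a sequence {b_i}_{i∈N} by selecting elements from the above
array in the following manner:

$$\begin{array}{cccc} b_1 & b_3 & b_6 & \cdots \\ b_2 & b_5 & b_9 & \cdots \\ b_4 & b_8 & b_{13} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{array}$$

Note that this sequence defines a bijection from the union of the
A_i's to N (when the A_i's are disjoint; otherwise any element that
has already been selected is simply skipped). That is, the countable
union is itself countable.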
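The diagonal selection rule can be made concrete. The Python sketch below (the function names are illustrative, not from the text) lists the (row i, column j) positions of a_j^i in the order b_1, b_2, b_3, ..., reading each anti-diagonal i + j = d from the bottom row upward, and gives a closed-form index that inverts that listing.

```python
from itertools import islice


def diagonal_order():
    """Yield positions (i, j) of a_j^i in the order b_1, b_2, b_3, ...

    Anti-diagonal d holds the d - 1 positions with i + j = d;
    each one is read from the bottom row (largest i) up to row 1."""
    d = 2
    while True:
        for i in range(d - 1, 0, -1):
            yield (i, d - i)
        d += 1


def index_of(i, j):
    """Return k such that a_j^i is b_k (the inverse of diagonal_order)."""
    d = i + j
    earlier = (d - 2) * (d - 1) // 2   # positions on anti-diagonals 2, ..., d - 1
    return earlier + (d - i)


print(list(islice(diagonal_order(), 6)))  # [(1, 1), (2, 1), (1, 2), (3, 1), (2, 2), (1, 3)]
print(index_of(3, 3))                      # 13
```

Every position receives exactly one index, which is the content of the claim that the listing is a bijection onto N.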
1.10. Yes. This is the smallest possible algebra or σ-algebra on
Ω.
1.11. Yes. This is the largest possible σ-algebra on Ω since it
contains every subset of Ω.
1.12. Five σ-algebras on Ω are {∅, Ω}, ℙ(Ω), {∅, Ω, {1}, {2, 3}},
{∅, Ω, {2}, {1, 3}}, and {∅, Ω, {3}, {1, 2}}.
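Assuming Ω = {1, 2, 3}, as the sets listed in 1.12 indicate, a brute-force search over all 2^8 = 256 families of subsets confirms that these five are in fact the only σ-algebras on Ω. On a finite set, closure under pairwise unions already gives closure under countable unions, so the sketch below (helper names are illustrative) only tests Ω, complements, and pairwise unions.

```python
from itertools import combinations

OMEGA = frozenset({1, 2, 3})


def all_subsets(s):
    """Every subset of s, as frozensets (2**len(s) of them)."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_sigma_algebra(family, omega):
    """Check that Ω is in the family and that the family is closed under
    complement and pairwise union; for a finite Ω this is equivalent
    to being a σ-algebra."""
    if omega not in family:
        return False
    for a in family:
        if omega - a not in family:
            return False
        for b in family:
            if a | b not in family:
                return False
    return True


subsets = all_subsets(OMEGA)
sigma_algebras = [
    frozenset(family)
    for r in range(len(subsets) + 1)
    for family in combinations(subsets, r)
    if is_sigma_algebra(frozenset(family), OMEGA)
]
print(len(sigma_algebras))  # 5
```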


1.13. Consider nonempty sets Ω and I, and for each i ∈ I let 𝒜_i
be a σ-algebra on Ω. Let 𝒜 denote the intersection of the 𝒜_i's
for i ∈ I. (That is, A ∈ 𝒜 if and only if A ∈ 𝒜_i for each i ∈ I.)
First, note that Ω ∈ 𝒜 since Ω ∈ 𝒜_i for each i ∈ I. Second,
note that if A ∈ 𝒜 then A and hence A^c is in 𝒜_i for each i ∈ I,
which implies that A^c ∈ 𝒜. Finally, assume that A_n ∈ 𝒜 for
each n ∈ N. Then A_n ∈ 𝒜_i for each n ∈ N and each i ∈ I. Thus,
⋃_{n∈N} A_n ∈ 𝒜_i for each i ∈ I, which implies that ⋃_{n∈N} A_n ∈ 𝒜.

Note that a union of σ-algebras need not be a σ-algebra. Let
F = {A, A^c, ∅, Ω} and G = {B, B^c, ∅, Ω}. Note that F ∪ G
(generally) does not include A ∪ B.
1.14. First, note that Ω ∈ 𝒜 since Ω^c = ∅ is finite. Second, note
that if A ∈ 𝒜 then either A is finite or A has a finite complement,
and hence A^c ∈ 𝒜. Further, note that if A and B are in 𝒜
then A ∪ B is finite if A and B are each finite, and A ∪ B has
finite complement if either A or B has finite complement since
(A ∪ B)^c = A^c ∩ B^c. Thus, 𝒜 is closed under finite unions, and
hence 𝒜 is an algebra. To see that 𝒜 is not a σ-algebra let
A_i = {i} for i ∈ N and note that ⋃_{i∈N} A_i = N, which is neither
finite nor has a finite complement.
1.15. First, note that Ω ∈ 𝒜 since Ω^c = ∅ is finite, and hence
countable. Second, note that if A ∈ 𝒜 then either A is countable
or A has a countable complement, and hence A^c ∈ 𝒜. Now,
for each i ∈ N let A_i be an element of 𝒜. If each of the A_i's is
countable then so is their countable union. If one or more of the
A_i's is cocountable then (by De Morgan's Law) it follows that
their countable union is also cocountable. In each case, we see
that the union of the A_i's is in 𝒜. Thus, 𝒜 is both an algebra
and a σ-algebra.
1.16. They each equal {∅, Ω}, but for different reasons. The
σ-algebra σ(∅) is the smallest σ-algebra on Ω that contains every
set in ∅. Since there are no sets in ∅, σ(∅) is simply the smallest
σ-algebra on Ω, which is {∅, Ω}. The σ-algebra σ({∅}) is the
smallest σ-algebra on Ω that contains ∅, which again is {∅, Ω}.
1.17. This again is simply {∅, Ω}.
1.18. This is {A, A^c, Ω, ∅}.


1.19. Note that σ({A, B}) = {Ω, ∅, A, A^c, B, B^c, A ∪ B,
(A ∪ B)^c, A ∪ B^c, B \ A, B ∪ A^c, A \ B, A^c ∪ B^c, A ∩ B,
A Δ B, (A Δ B)^c}.
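For finite sets this list can be checked mechanically. The sketch below (a hypothetical helper, not from the text) closes a family under complements and pairwise unions until nothing new appears, which on a finite Ω yields the generated σ-algebra. With A and B chosen so that the four pieces A ∩ B, A \ B, B \ A, and (A ∪ B)^c are all nonempty, it produces exactly 16 sets, matching the list above; when some of those pieces are empty, several of the listed sets coincide and fewer distinct sets remain.

```python
def generated_sigma_algebra(generators, omega):
    """Smallest family containing the generators that is closed under
    complement and pairwise union (this is σ(generators) when omega is finite)."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    while True:
        new = set(family)
        for a in family:
            new.add(omega - a)
            for b in family:
                new.add(a | b)
        if new == family:       # nothing new: the family is closed
            return family
        family = new


omega = set(range(1, 9))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
sigma = generated_sigma_algebra([A, B], omega)
print(len(sigma))               # 16
print(frozenset(A ^ B) in sigma)  # True: the symmetric difference A Δ B is included
```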
1.20. Yes, ∅ ∈ F and ∅ ⊂ F.
1.21. No. Let A = ℝ and let 𝒜 = {∅, A}. Further, let f: ℝ → ℝ
via f(x) = 3 for all x ∈ ℝ. Then f(𝒜) = {∅, {3}} is not a
σ-algebra on ℝ since ℝ ∉ f(𝒜).

For another example, let A = ℝ and let 𝒜 = {∅, A, {5}, {5}^c}.
Further, let f: ℝ → [-1, 1] via f(x) = sin(x). Then
f(𝒜) = {∅, [-1, 1], {sin(5)}}, which is not a σ-algebra on [-1, 1]
since it does not contain {sin(5)}^c.
2.1. Yes. Since every real number is an upper bound of ∅
it follows that the least upper bound (or supremum) of ∅ is
-∞. Since every real number is a lower bound of ∅ it follows
that the greatest lower bound (or infimum) of ∅ is ∞. Thus,
sup ∅ < inf ∅.
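2.2. Note that

$$\left(\limsup A_n^c\right)^c = \left[\bigcap_{k=1}^{\infty} \bigcup_{m=k}^{\infty} A_m^c\right]^c = \bigcup_{k=1}^{\infty} \left[\bigcup_{m=k}^{\infty} A_m^c\right]^c = \bigcup_{k=1}^{\infty} \bigcap_{m=k}^{\infty} A_m = \liminf A_n.$$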

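2.3. Assume that lim inf A_n is not empty and note that

$$\begin{aligned} \omega \in \liminf A_n &\Rightarrow \omega \in \bigcup_{k=1}^{\infty} \bigcap_{m=k}^{\infty} A_m \\ &\Rightarrow \exists N \text{ s.t. } \omega \in \bigcap_{m=N}^{\infty} A_m \\ &\Rightarrow \omega \in \bigcup_{m=k}^{\infty} A_m \ \ \forall k \\ &\Rightarrow \omega \in \bigcap_{k=1}^{\infty} \bigcup_{m=k}^{\infty} A_m = \limsup A_n. \end{aligned}$$

The third implication holds since, for any k, the index m = max(k, N)
satisfies m ≥ k and ω ∈ A_m. Hence lim inf A_n ⊂ lim sup A_n.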

2.4. Recall that lim inf A_n consists of all those points that belong
to all but perhaps a finite number of the A_n's. Let α be a positive
real number. Choose a positive integer N so that 1/N < α. Note
that α ∉ A_n when n is an even integer greater than N. Since
there are an infinite number of such n's it follows that α cannot
be in lim inf A_n. A similar argument implies that no negative
real number is in lim inf A_n. Note, however, that since 0 ∈ A_n
for each n, it follows that 0 ∈ lim inf A_n. Thus, we conclude that
lim inf A_n = {0}.
Recall that lim sup A_n consists of all those points that belong to
infinitely many of the A_n's. Note that any real number from the
interval (-1, 0] is in A_n for any even integer n, and that any
real number from the interval [0, 1] is in A_n for any odd integer
n. Further, any real number outside of these intervals is not in
A_n for any n. Thus, lim sup A_n = (-1, 1].
2.5. Note that this exercise asked you to try to find a non-Borel
set. In particular, you could have successfully completed this
problem without actually finding such a set!
The purpose of this exercise is to convince the reader that
constructing a non-Borel subset of the real line is not a trivial task.
Since the construction of such a set at this point would take us
rather far afield, a non-Borel set will not be presented here. For
many examples, see the book Counterexamples in Probability
and Real Analysis by Gary Wise and Eric Hall.
A proof of the existence of a non-Borel set is not quite as
difficult. It follows immediately from the fact that the set of real
Borel sets is equipotent to ℝ, whereas the power set of ℝ is not
(by Cantor's theorem, no set is equipotent to its power set).
2.6. To begin, we will show that any singleton subset of ℝ is a
real Borel set. Note that, for any x ∈ ℝ,

$$\{x\} = \bigcap_{n=1}^{\infty} \left(x - \frac{1}{n},\, x + \frac{1}{n}\right).$$

Thus, since {x} is a countable intersection of bounded open
intervals it follows that {x} must be an element of B(ℝ), the
smallest σ-algebra containing every bounded open interval.


Now, let C be a countable subset of ℝ. Since C is countable
we may enumerate its elements as a sequence {c_1, c_2, ...}. Note
that

$$C = \bigcup_{n=1}^{\infty} \{c_n\}.$$

Thus, since C is a countable union of sets from B(ℝ), it follows that
C must also be an element of B(ℝ).
2.7. A function f: ℝ → ℝ is continuous if and only if f^{-1}(U) is
open for every open subset U of ℝ. Further, a function f: ℝ → ℝ
is Borel measurable if and only if f^{-1}((-∞, x)) is a Borel
set for each x ∈ ℝ. Since (-∞, x) is open for each x ∈ ℝ and
each open subset of ℝ is a Borel set, the desired result follows
immediately.
2.8. The Cantor ternary set is an uncountable subset of ℝ that
has Lebesgue measure zero.
2.9. Dirac measure on a single point will yield the power set of
the reals when completed.
3.1. Let G denote the collection of all subdivisions of [a, b] and
recall that V = sup{S(Γ) : Γ ∈ G} where

$$S(\Gamma) = \sum_{i=1}^{m} |f(a_i) - f(a_{i-1})|$$

if Γ = {a_0, a_1, ..., a_m} with a = a_0 < a_1 < ... < a_m = b. Since
|f(x) - f(y)| ≤ C|x - y| for all x and y in [a, b] it follows that

$$S(\Gamma) \le C \sum_{i=1}^{m} (a_i - a_{i-1}) = C(b - a)$$

for any Γ ∈ G and hence that V ≤ C(b - a).
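A quick numerical check of the bound, under the illustrative assumption f = sin on [0, 10], which satisfies the Lipschitz condition with C = 1. Every variation sum S(Γ) stays at or below C(b - a) = 10, while a fine subdivision shows that the total variation is 6 - sin(10) ≈ 6.544, so the inequality V ≤ C(b - a) can be strict. The helper names are hypothetical.

```python
import math
import random


def variation_sum(f, points):
    """S(Γ) = sum of |f(a_i) - f(a_{i-1})| over a subdivision a_0 < ... < a_m."""
    return sum(abs(f(points[i]) - f(points[i - 1])) for i in range(1, len(points)))


def random_subdivision(a, b, m, rng):
    """A subdivision of [a, b] into m subintervals with random interior points."""
    return [a] + sorted(rng.uniform(a, b) for _ in range(m - 1)) + [b]


a, b, C = 0.0, 10.0, 1.0          # sin is Lipschitz on [a, b] with constant C = 1
rng = random.Random(0)
sums = [variation_sum(math.sin, random_subdivision(a, b, m, rng)) for m in (1, 10, 100, 1000)]
fine = variation_sum(math.sin, [a + (b - a) * k / 100_000 for k in range(100_001)])

print(max(sums) <= C * (b - a))   # True: each S(Γ) obeys the Lipschitz bound
print(fine)                       # close to 6 - sin(10), strictly below C(b - a) = 10
```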


5.1. Since the set {ω ∈ Ω : X(ω) ≤ n} converges to the empty
set as n → -∞ it follows from Lemma 2.1 that F(n) → 0 as
n → -∞. From this the desired result follows immediately.
5.2. We must show that lim_{y↓x} F(y) = F(x). Again, we can use
Lemma 2.1 since the set {ω ∈ Ω : X(ω) ≤ x + (1/n)} converges
to the set {ω ∈ Ω : X(ω) ≤ x} as n → ∞.
5.3. Since P(X ≤ x) = P(X < x) + P(X = x), the desired
result will follow if we show that P(X < x) = lim_{y↑x} F(y). Let
{u_n}_{n∈N} be a strictly increasing sequence whose limit is x, and
let A_n = {ω ∈ Ω : X(ω) ≤ u_n}. Note that ⋃_{n=1}^∞ A_n =
{ω ∈ Ω : X(ω) < x}. Note further that

$$F(u_n) = P(A_n) \to P\left(\bigcup_{n=1}^{\infty} A_n\right) = P(X < x)$$

as n → ∞ since A_n ⊂ A_{n+1} for each n ∈ N. Thus, the desired
result follows from Lemma 2.1 (since F is nondecreasing, the limit
along this one sequence equals lim_{y↑x} F(y)).
5.4. Recall that

Thus,

$$\int_0^{\infty} x f(x)\, dx = +\infty$$

and

$$\int_{-\infty}^{0} x f(x)\, dx = -\infty$$

which implies that the first moment does not exist. Note that

2. if n is even and n > 2 then lim_{x→±∞} x^n f(x) = ∞, and

3. if n is odd and n > 1 then lim_{x→±∞} x^n f(x) = ±∞.

Thus, the odd moments do not exist and the even moments are
infinite.
5.5. The only way that a Lebesgue integral of a measurable
function can fail to exist is if one encounters a sum of the form
∞ - ∞. This cannot occur if the measurable function is
nonnegative or nonpositive.
5.6. Note that

$$\begin{aligned} \mathrm{VAR}[X] &= E[(X - E[X])^2] \\ &= E[X^2] - 2E[X E[X]] + E^2[X] \\ &= E[X^2] - 2E^2[X] + E^2[X] \\ &= E[X^2] - E^2[X]. \end{aligned}$$
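The identity can be checked exactly on a small example. The sketch below uses an arbitrary three-point distribution chosen purely for illustration, and exact rational arithmetic so that both sides agree to the last digit.

```python
from fractions import Fraction


def expectation(values, probs, g=lambda x: x):
    """E[g(X)] for a discrete X with P(X = values[k]) = probs[k]."""
    return sum(p * g(x) for x, p in zip(values, probs))


values = [0, 1, 5]                                    # an illustrative distribution
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

mean = expectation(values, probs)
var_definition = expectation(values, probs, lambda x: (x - mean) ** 2)   # E[(X - E[X])^2]
var_shortcut = expectation(values, probs, lambda x: x ** 2) - mean ** 2  # E[X^2] - E^2[X]
print(mean, var_definition, var_shortcut)             # 7/6 113/36 113/36
```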


5.7. Recall that X and Y possess a joint density of the form

$$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left(-\frac{q(x, y)}{2}\right)$$

where q(x, y) =

Further, recall that X has a density function f_X given by

$$f_X(x) = \int_{\mathbb{R}} f(x, y)\, dy.$$

The desired result follows after substituting, completing the
square, and integrating.
5.8. Recall that X and Y possess a joint density of the form

$$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left(-\frac{q(x, y)}{2}\right)$$

where q(x, y) =

Further, recall that

The desired result follows immediately after finding

$$\iint x y\, f(x, y)\, dx\, dy.$$

5.9. No, see Problem 11.3.


5.10. Consider a sequence {X_n}_{n∈N} of random variables defined
as follows on the probability space ([0, 1], B([0, 1]), λ) where λ
is Lebesgue measure on B([0, 1]). Let X_1 = I_{[0,1/2]}, X_2 = I_{[1/2,1]},
X_3 = I_{[0,1/4]}, X_4 = I_{[1/4,1/2]}, X_5 = I_{[1/2,3/4]}, X_6 = I_{[3/4,1]},
X_7 = I_{[0,1/8]}, X_8 = I_{[1/8,1/4]}, ..., X_14 = I_{[7/8,1]},
X_15 = I_{[0,1/16]}, etc.

Note that X_n does not converge to zero at any point in [0, 1]
even though E[|X_n - 0|^p] = E[X_n] → 0 as n → ∞ for any p > 0.
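The indexing in 5.10 can be sketched as follows (Python with exact fractions; the function names are illustrative). The means E[X_n] shrink to zero, yet a fixed point such as ω = 1/3 is covered once on every level, so X_n(ω) = 1 infinitely often and X_n(ω) does not converge to zero.

```python
from fractions import Fraction


def typewriter_interval(n):
    """Endpoints (left, right) with X_n = I_[left, right], following 5.10:
    level k = 1, 2, ... uses the 2**k intervals of length 2**-k,
    indexed by n = 2**k - 1, ..., 2**(k + 1) - 2."""
    k = 1
    while n > 2 ** (k + 1) - 2:
        k += 1
    j = n - (2 ** k - 1)
    return Fraction(j, 2 ** k), Fraction(j + 1, 2 ** k)


def mean(n):
    """E[X_n] = E[|X_n - 0|^p] = length of the n-th interval, for any p > 0."""
    left, right = typewriter_interval(n)
    return right - left


def hits(omega, n_max):
    """Indices n <= n_max with X_n(omega) = 1."""
    result = []
    for n in range(1, n_max + 1):
        left, right = typewriter_interval(n)
        if left <= omega <= right:
            result.append(n)
    return result


print(mean(14), mean(1000))             # 1/8 1/512
print(len(hits(Fraction(1, 3), 1022)))  # 9: one hit on each level k = 1, ..., 9
```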
5.11. Consider the probability space given by (0, 1), the Borel
subsets of (0, 1), and Lebesgue measure. Define a sequence of
random variables on this space by setting X_n(ω) = 2^n I_{(0,1/n)}(ω)
for n ∈ N. Note that X_n converges pointwise to zero as n → ∞.
However,

$$E[|X_n - 0|^p] = E[X_n^p] = \int_0^{1/n} 2^{np}\, d\omega = \frac{2^{np}}{n},$$

which goes to ∞ as n → ∞ for every p > 0. Thus, the X_n's do
not converge to zero in L_p.
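An exact-arithmetic sketch of 5.11 (Python; the moments are computed only for positive integer p so that they stay rational, and the names are illustrative): for a fixed ω the values X_n(ω) are eventually zero, while E[X_n^p] = 2^{np}/n grows without bound.

```python
from fractions import Fraction


def value(n, omega):
    """X_n(omega) = 2**n if 0 < omega < 1/n, and 0 otherwise."""
    return 2 ** n if 0 < omega < Fraction(1, n) else 0


def moment(n, p):
    """E[X_n^p] = (2**n)**p * λ((0, 1/n)) = 2**(n*p) / n, for a positive integer p."""
    return Fraction(2 ** (n * p), n)


omega = Fraction(1, 10)
print([value(n, omega) for n in (5, 9, 10, 50)])  # [32, 512, 0, 0]
print([moment(n, 1) for n in (1, 5, 10)])          # [Fraction(2, 1), Fraction(32, 5), Fraction(512, 5)]
```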
5.12. Since X is a random variable on (Ω, F, P) it must be
F-measurable, and thus satisfies the first property in the definition
of E[X|F]. Further, X trivially satisfies the second property of
that definition. Thus, E[X|F] = X a.s.
5.13. Since E[X] is a constant it is measurable with respect to
any σ-algebra and thus satisfies the first property in the
definition of E[X|{∅, Ω}]. Further, note that

$$\int_{\Omega} E[X]\, dP = E[X] = \int_{\Omega} X\, dP$$

and that any integral over ∅ is zero. Thus, E[X] satisfies the
second property in the definition of E[X|{∅, Ω}]. We conclude
that E[X|{∅, Ω}] = E[X] a.s. Note, however, that this equality
actually holds pointwise since the only null set in {∅, Ω} is the
empty set.
5.14. Note that

$$\begin{aligned} E[XY] &= E[E[XY|Y]] \\ &= E[Y E[X|Y]] \\ &= E[Y E[X]] \\ &= E[X]E[Y]. \end{aligned}$$


8.2 Solutions to Problems

1.1. If ⋃_{t∈ℝ} A_t is empty then it follows immediately that
⋃_{t∈ℝ} A_t ⊂ ⋃_{n∈N} A_n. Assume then that ⋃_{t∈ℝ} A_t is not empty
and let x ∈ ⋃_{t∈ℝ} A_t. Then there exists some y ∈ ℝ such that
x ∈ A_y. Let m be any positive integer such that m > y and note
that by assumption A_y ⊂ A_m. Hence, x ∈ A_m and x ∈ ⋃_{n∈N} A_n.
Thus, ⋃_{t∈ℝ} A_t ⊂ ⋃_{n∈N} A_n.
If ⋃_{n∈N} A_n is empty then it follows immediately that
⋃_{n∈N} A_n ⊂ ⋃_{t∈ℝ} A_t. Assume that ⋃_{n∈N} A_n is not empty and let
x ∈ ⋃_{n∈N} A_n. Then x ∈ ⋃_{t∈ℝ} A_t since N ⊂ ℝ. Hence,
⋃_{n∈N} A_n ⊂ ⋃_{t∈ℝ} A_t and we conclude that in fact the two sets
are equal.

1.2. Not necessarily. Simply let A be a subset of Ω that is not
in F. Then A is not in G and hence G is not a σ-algebra on A.
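1.3. Let B denote the set of positive, even integers and define
f: Z → B via f(0) = 2, f(n) = 4n if n ∈ N, and f(n) = 4|n| + 2
if -n ∈ N. The positive integers are sent one-to-one onto the
multiples of 4, the negative integers one-to-one onto 6, 10, 14, ...,
and 0 onto 2, so f is bijective. It follows that B and Z are
equipotent.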
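A finite sanity check of the map in 1.3 (Python; f_inverse is a helper introduced here only for the check, and the ranges are arbitrary): composing f with an explicit inverse returns every integer in a window and every even number in a range. This illustrates, but does not replace, the bijectivity argument.

```python
def f(n):
    """The map from Z onto the positive even integers used in 1.3."""
    if n == 0:
        return 2
    if n > 0:
        return 4 * n            # multiples of 4
    return 4 * abs(n) + 2       # 6, 10, 14, ...: even but not multiples of 4


def f_inverse(m):
    """Inverse on positive even integers (hypothetical helper for checking)."""
    if m == 2:
        return 0
    if m % 4 == 0:
        return m // 4
    return -((m - 2) // 4)


window = range(-500, 501)
print(all(f_inverse(f(n)) == n for n in window))             # True
print(all(f(f_inverse(m)) == m for m in range(2, 2001, 2)))  # True
```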
1.4. To begin, we will show that an intersection of σ-algebras
on Ω is itself a σ-algebra on Ω. Consider a nonempty set Ω
and a nonempty index set Λ. For each λ ∈ Λ assume that F_λ is a
σ-algebra on Ω and let M = ⋂_{λ∈Λ} F_λ. Note that Ω ∈ F_λ for
each λ ∈ Λ since F_λ is a σ-algebra on Ω for each λ ∈ Λ. Hence,
Ω ∈ M. Next, let A ∈ M and note that A ∈ F_λ for each λ ∈ Λ.
Hence, A^c ∈ F_λ for each λ ∈ Λ since F_λ is a σ-algebra on Ω
for each λ ∈ Λ. Thus, A^c ∈ M and we see that M is closed
under complementation. Finally, let A_n ∈ M for each n ∈ N
and note that A_n ∈ F_λ for each λ ∈ Λ and each n ∈ N. Hence,
⋃_{n∈N} A_n ∈ F_λ for each λ ∈ Λ since each F_λ is closed under
countable unions. Thus, this union is also in M, and it follows
that M is closed under countable unions. Combining these three
results we see that M is itself a σ-algebra on Ω.
Now, returning to the problem, let C denote the family of all
σ-algebras on Ω that contain each element in F. Note that C
is not empty since ℙ(Ω) ∈ C. Let M denote the σ-algebra on
Ω given by the intersection of all of the σ-algebras in C. Note
that M contains every element in F. Further, assume that L is
another σ-algebra on Ω that contains every element in F. Since
L ∈ C it follows that M ⊂ L. Thus, M is the smallest σ-algebra
on Ω that contains every element in F. That is, M = σ(F).
To show that M is unique, assume that M_1 = σ(F) and that
M_2 = σ(F). By definition of σ(F) it follows that M_1 ⊂ M_2
and that M_2 ⊂ M_1. Hence, we conclude that M_1 = M_2; that
is, σ(F) is the unique such σ-algebra on Ω.
1.5. The set F is an algebra on Ω but need not be a σ-algebra
on Ω. Since Ω^c = ∅ is finite it follows that Ω is cofinite and
hence that Ω ∈ F. If A ∈ F then A is either finite or cofinite
and hence A^c is either cofinite or finite. In either case, A^c ∈ F.
Finally, let A and B be elements of F. If A and B are each finite
then A ∪ B is finite and hence is an element of F. If either A or
B is cofinite then either A^c or B^c must be finite, which implies
that (A ∪ B)^c = A^c ∩ B^c is finite and hence that (A ∪ B)^c is
in F. Since F is closed under complementation it follows that
A ∪ B ∈ F and hence that F is an algebra on Ω.
To see that F need not be a σ-algebra, let Ω = ℝ and let
A_n = {n} for each n ∈ N. Note that A_n is finite for each n and
hence is an element of F for each n. However, ⋃_{n∈N} A_n = N
and N is neither finite nor cofinite. Thus, since F is not closed
under countable unions it follows that F is not a σ-algebra on
Ω.
1.6. If y ∈ f(f^{-1}(A)) then y = f(x) for some x ∈ f^{-1}(A). If
x ∈ f^{-1}(A) then f(x) ∈ A. Thus, since y ∈ A we conclude that
f(f^{-1}(A)) is a subset of A. If x ∈ B then f(x) ∈ f(B) and
hence x ∈ f^{-1}(f(B)). Thus, B is a subset of f^{-1}(f(B)).
1.7. [(1) ⇒ (2)] If y ∈ f(A) ∩ f(B) then there exist a ∈ A
and b ∈ B such that y = f(a) = f(b). Since f is one-to-one
it follows that a = b ∈ A ∩ B and hence that y ∈ f(A ∩ B).
Further, if A ∩ B ≠ ∅ and y ∈ f(A ∩ B) then there exists
some point z ∈ A ∩ B such that y = f(z). Since z ∈ A and
z ∈ B it follows that y ∈ f(A) ∩ f(B). Thus, it follows that
f(A ∩ B) = f(A) ∩ f(B).
[(2) ⇒ (3)] This part is obvious since f(∅) = ∅.
[(3) ⇒ (1)] Let f(a) = f(b). If a ≠ b then {a} and {b} are
disjoint yet f({a}) ∩ f({b}) is equal to {f(a)}, which is not empty.
Hence f is one-to-one.
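For finite sets the equivalence of (1) and (2) can be confirmed exhaustively. The sketch below assumes, as the argument above indicates, that (1) is "f is one-to-one" and (2) is "f(A ∩ B) = f(A) ∩ f(B) for all A and B"; functions are represented as Python dicts, and the helper names are illustrative.

```python
from itertools import combinations


def subsets(s):
    """All subsets of a finite set s."""
    items = sorted(s)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def image(f, a):
    """f(A) for a function f given as a dict."""
    return {f[x] for x in a}


def preserves_intersections(f, domain):
    """True if f(A ∩ B) = f(A) ∩ f(B) for all subsets A, B of the domain."""
    subs = subsets(domain)
    return all(image(f, a & b) == image(f, a) & image(f, b) for a in subs for b in subs)


def is_one_to_one(f):
    return len(set(f.values())) == len(f)


domain = {1, 2, 3}
injective = {1: "x", 2: "y", 3: "z"}
collapsing = {1: "x", 2: "x", 3: "z"}   # f(1) = f(2), so A = {1}, B = {2} breaks (2)
print(preserves_intersections(injective, domain), is_one_to_one(injective))    # True True
print(preserves_intersections(collapsing, domain), is_one_to_one(collapsing))  # False False
```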
1.8. Assume that f is onto and that B ⊂ Y. If b ∈ B then
there exists some a ∈ X such that f(a) = b and hence such
that a ∈ f^{-1}(B). Thus, b = f(a) ∈ f(f^{-1}(B)). This and
Problem 1.6 imply that f(f^{-1}(B)) = B.
Next, assume that f(f^{-1}(B)) = B for every subset B of Y.
If y ∈ Y then f(f^{-1}({y})) is equal to {y}, which implies that
f^{-1}({y}) is not empty. Thus, f is onto.
1.9. Assume that the set S is countable and let the sequence
{a_1, a_2, ...} denote the elements in S. Construct a sequence β
of 0's and 1's as follows: Let the n-th term in β be 0 if the n-th
term in a_n is 1 and let the n-th term in β be 1 otherwise. Note
that β is an element of S yet differs from a_n in its n-th term for
each n ∈ N. This contradiction implies that the set S is not countable.
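The diagonal construction can be mirrored on a finite prefix of a list. The Python sketch below uses zero-based indexing, so position n in the code is the (n + 1)-th term in the text, and the listed sequences are arbitrary. Only finitely many terms can be shown, but the rule is exactly the one above: β disagrees with the n-th listed sequence in its n-th term.

```python
def diagonal_complement(sequences):
    """Given a_1, ..., a_N (0-1 lists, each of length >= N), return the first N
    terms of β, where β's n-th term is 0 if a_n's n-th term is 1 and 1 otherwise."""
    return [0 if seq[n] == 1 else 1 for n, seq in enumerate(sequences)]


listed = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
beta = diagonal_complement(listed)
print(beta)  # [1, 0, 1, 0, 1]
```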
1.10. Fix n ∈ N and note that every polynomial p(x) = a_0 +
a_1 x + ... + a_n x^n with integer coefficients is uniquely
determined by the point (a_0, a_1, ..., a_n) from the countable set
Z^{n+1}. Thus, for each n the set of such polynomials is countable,
and since a countable union of countable sets is countable, the set
P of all polynomials with integer coefficients is countable. We may
therefore list the nonzero elements of P as a sequence {p_1, p_2, ...}.
The fundamental theorem of algebra implies that the set
A_k = {x ∈ ℝ : p_k(x) = 0} is a finite set for each k. Since a countable
union of finite sets is countable it follows that the set of all algebraic
numbers is countable.
1.11. A point x ∈ ℝ is said to be a point of condensation of a
subset E of ℝ if every open interval containing x contains
uncountably many elements of E. To begin, we will show that any
uncountable subset E of ℝ has at least one point of condensation.
Assume that there exists no condensation point of E. Then, for
each x ∈ E there exists an open interval I_x such that x ∈ I_x
and such that I_x ∩ E is countable. Let J_x be an open interval
such that J_x ⊂ I_x, such that x ∈ J_x, and such that J_x has
rational endpoints. Note that J_x ∩ E is also countable. Further,
the collection of all such intervals J_x is countable and may be
enumerated as N_1, N_2, etc. Note that

$$E = \bigcup_{k=1}^{\infty} (N_k \cap E)$$

which implies that E is countable. This contradiction implies
that E must have at least one point of condensation.
Now, let E be an uncountable set of positive real numbers and
let a be a condensation point of E. If a ≠ 0 then let (α, β)
be an open interval containing a such that α > 0. Let {x_n}_{n∈N}
be a sequence of distinct points in (α, β) ∩ E and note that
Σ_{n=1}^∞ x_n = ∞ since x_n > α for each n. If a = 0 then since
(0, β) = ⋃_{k=1}^∞ (1/k, β), it follows that some interval of the form
(1/k, β) contains uncountably many points of E. From this point,
we may proceed as we did when a ≠ 0.
1.12. Since Ω ∈ F we see that F is closed under complementation.
That is, if A ∈ F then Ω \ A = A^c ∈ F. Now, let A ∈ F
and B ∈ F. Then B^c ∈ F and hence A \ B^c = A ∩ B ∈ F.
Thus, F is closed under finite intersections. De Morgan's Law
thus implies that F is also closed under finite unions.
1.13. Recall that B(ℝ) is the smallest σ-algebra on ℝ containing
all bounded open intervals. Let 𝒬 be the collection of all
bounded open intervals of ℝ with rational endpoints and note
that 𝒬 is countable. Further, note that σ(𝒬) is a subset of B(ℝ)
since 𝒬 is a subset of the collection of all bounded, open intervals.
Assume that σ(𝒬) is a proper subset of B(ℝ). Then there
must exist an open interval (x, y) that is not an element of σ(𝒬)
since B(ℝ) is the smallest σ-algebra containing all such intervals.
Let {x_n}_{n∈N} and {y_n}_{n∈N} be sequences of rational numbers such
that x_n ↓ x and y_n ↑ y with x_n < y_n for each n ∈ N. Note that
since (x, y) = ⋃_{n=1}^∞ (x_n, y_n) it follows that (x, y) ∈ σ(𝒬). This
contradiction implies that σ(𝒬) = B(ℝ), and thus we see that
B(ℝ) is countably generated.
1.14. Consider the $\sigma$-algebra $\mathcal{F}$ given by the countable and cocountable subsets of $\mathbb{R}$. Note that $\mathcal{F}$ contains every singleton subset of $\mathbb{R}$. Assume that there exists a $\sigma$-algebra $\mathcal{G}$ such that $\mathcal{G}$ contains every singleton subset of $\mathbb{R}$ and such that $\mathcal{F}$ is not a subset of $\mathcal{G}$. Let $F \in \mathcal{F}$ with $F \notin \mathcal{G}$. Note that $F$ can be written as a countable union of singleton sets or as a complement of such a countable union. Thus, $F \in \mathcal{G}$. This contradiction implies that $\mathcal{F}$ is contained in every $\sigma$-algebra that contains all singleton sets, and hence that $\mathcal{F}$ must be the smallest $\sigma$-algebra containing all singleton sets.
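1.15. To begin, note that $A \setminus B$ is uncountable, since $A$ is uncountable and $B$ is countable. Let $C$ be a countably infinite subset of $A \setminus B$. Enumerate the elements of $B$ and $C$ such that $B = \{b_1, b_2, \dots\}$ and $C = \{c_1, c_2, \dots\}$. Finally, consider a function $f : A \setminus B \to A$ via
$$f(x) = \begin{cases} x & \text{if } x \notin C,\\ b_n & \text{if } x = c_{2n},\\ c_n & \text{if } x = c_{2n-1}. \end{cases}$$
Note that $f$ is onto and one-to-one. Thus, we conclude that $A$ and $A \setminus B$ are equipotent.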
1.16. Let $f : (0, 1] \to [0, \infty)$ via $f(x) = (1 - x)/x$. Let $y \in [0, \infty)$ and note that $f(1/(1 + y)) = y$. Thus, since $1/(1 + y) \in (0, 1]$, we see that $f$ is onto. Next, let $a, b \in (0, 1]$ with $a \neq b$. Since $(1 - a)/a = 1/a - 1 \neq 1/b - 1 = (1 - b)/b$, we see that $f$ is one-to-one. Thus, we conclude that $(0, 1]$ and $[0, \infty)$ are equipotent.
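1.17. No. Assume that $\mathcal{M}$ is a countably infinite $\sigma$-algebra on a nonempty set $\Omega$ and, for each $\omega \in \Omega$, let
$$A_\omega = \bigcap_{\{M \in \mathcal{M} : \omega \in M\}} M.$$
Note that each $A_\omega$, being a countable intersection of elements of $\mathcal{M}$, is itself an element of $\mathcal{M}$, and that there are at most a countable number of distinct $A_\omega$'s since $\mathcal{M}$ is countable. If there are only a finite number of distinct $A_\omega$'s then $\mathcal{M}$ is finite, which contradicts our assumption. However, if there are a countably infinite number of distinct $A_\omega$'s then $\mathcal{M}$ must be uncountable. To see why this last point holds, consider an enumeration of the elements of $\mathcal{M}$ as $\{M_1, M_2, \dots\}$, consider an enumeration of the distinct $A_\omega$'s as $\{A_1, A_2, \dots\}$, and define
$$N_j = \begin{cases} A_j & \text{if } A_j \not\subset M_j,\\ \varnothing & \text{if } A_j \subset M_j. \end{cases}$$
Distinct $A_\omega$'s are disjoint, and each $A_j$ is either contained in or disjoint from each element of $\mathcal{M}$. Hence $N = \bigcup_{j=1}^{\infty} N_j$ is an element of $\mathcal{M}$ with $N \cap A_j \neq M_j \cap A_j$, so that $N$ is different from $M_j$ for every $j$, and hence we conclude that $\mathcal{M}$ is not countable.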
2.1. No. Let $A_n = \{n\}$ for each $n \in \mathbb{N}$. Then $\mathbb{N} = \bigcup_{n=1}^{\infty} A_n$ and the $A_n$'s are disjoint, but $\mu(\mathbb{N}) = \infty \neq \sum_{n=1}^{\infty} \mu(A_n) = 0$.


2.2. Consider the measure space $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \lambda)$ where $\lambda$ is Lebesgue measure. Let $A_n = (n, \infty)$ for each $n \in \mathbb{N}$ and note that the $A_n$'s comprise a strictly decreasing sequence of Borel sets. Further, the sequence converges to the empty set since, given any real number $x$, there exists an integer $m$ such that $x \notin A_n$ for any $n > m$; i.e., $\limsup A_n = \varnothing$. However, $\lambda(A_n) = \infty$ for each $n \in \mathbb{N}$ and hence $\lambda(A_n) \not\to 0$ as $n \to \infty$. (What would happen if we required the measure $\mu$ to be a finite measure?)
2.3. Let $U = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 < 1\}$ and recall that $\liminf A_n = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k$. Assume that $(x, y) \in \liminf A_n$. Then there exists some $n \in \mathbb{N}$ such that $(x, y) \in \bigcap_{k=n}^{\infty} A_k$. Since $(x, y) \in A_k$ if and only if $\left(x - \frac{(-1)^k}{k},\ y\right) \in U$, this means that
$$\left(x - \frac{(-1)^k}{k}\right)^2 + y^2 < 1$$
for all $k \geq n$. Expanding the square, it follows that
$$x^2 + y^2 < 1 + \frac{2x(-1)^k}{k} - \frac{1}{k^2}$$
for all $k \geq n$. Assume that $x^2 + y^2 \geq 1$. Then it follows that
$$\frac{2x(-1)^k}{k} - \frac{1}{k^2} > 0$$
for all $k \geq n$, and hence that $2x(-1)^k > 1/k$ for all $k \geq n$. This last result, however, cannot be true since the left hand side alternates sign (or is zero) and the right hand side is always positive. Thus we conclude that $x^2 + y^2$ must be less than 1. Hence $(x, y) \in U$ and thus $\liminf A_n \subset U$.

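Now, assume that $(x, y) \in U$, let $\varepsilon = 1 - (x^2 + y^2)$, and note that $\varepsilon > 0$. Further, note that
$$\begin{aligned}
\left(x - \frac{(-1)^k}{k}\right)^2 + y^2 &= x^2 - \frac{2x(-1)^k}{k} + \frac{1}{k^2} + y^2 \\
&\leq x^2 + \frac{2|x|}{k} + \frac{1}{k^2} + y^2 \\
&< x^2 + \frac{2}{k} + \frac{1}{k} + y^2 = x^2 + y^2 + \frac{3}{k} = 1 - \varepsilon + \frac{3}{k}
\end{aligned}$$
since $(x, y) \in U$ (so that $|x| < 1$) and $k \in \mathbb{N}$ (so that $1/k^2 \leq 1/k$). Thus, we see that
$$\left(x - \frac{(-1)^k}{k}\right)^2 + y^2 < 1$$
if $3/k \leq \varepsilon$, that is, if $k \geq 3/\varepsilon$. Thus, for $n \geq 3/\varepsilon$ it follows that $(x, y) \in A_k$ for all $k \geq n$. Hence, $U \subset \liminf A_n$, which combined with the earlier result implies that $\liminf A_n = U$.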
Let $S = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 \leq 1\} \setminus \{(0, 1), (0, -1)\}$. Recall that $\limsup A_n = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k$ and assume that $(x, y) \in \limsup A_n$. Note that $(x, y) \in \bigcup_{k=n}^{\infty} A_k$ for all $n \in \mathbb{N}$. We will first show that $x^2 + y^2 \leq 1$. Let $\varepsilon = x^2 + y^2 - 1$ and assume that $\varepsilon > 0$. Note that $(x, y) \in A_k$ if and only if
$$\left(x - \frac{(-1)^k}{k}\right)^2 + y^2 < 1,$$
which is true if and only if
$$\varepsilon < \frac{2x(-1)^k}{k} - \frac{1}{k^2}.$$
Since $(x, y) \in \limsup A_n$ we know that $(x, y) \in \bigcup_{k=n}^{\infty} A_k$ for all $n \in \mathbb{N}$. That is, for all $n \in \mathbb{N}$ there exists some $k \in \mathbb{N}$ such that $k \geq n$ and such that $(x, y) \in A_k$. Note that $(x, y) \in A_k$ if and only if
$$\varepsilon < \frac{2x}{k} - \frac{1}{k^2}$$
if $k$ is even and if and only if
$$\varepsilon < \frac{-2x}{k} - \frac{1}{k^2}$$
if $k$ is odd. Assume that $x \geq 0$. Then since $\varepsilon > 0$ we see that $(x, y) \notin A_k$ for any odd value of $k$. Let $n$ be an integer such that $n > 2(x + 1)/\varepsilon$. Since $(x, y) \in \limsup A_n$ we see that there exists an (even) integer $m$ such that $m \geq n$ and such that
$$\varepsilon < \frac{2x}{m} - \frac{1}{m^2}.$$
From this we conclude that
$$x > \frac{m\varepsilon}{2} + \frac{1}{2m} > \frac{m\varepsilon}{2} \geq \frac{n\varepsilon}{2}.$$
Recall, however, that $n > 2(x + 1)/\varepsilon$. Hence, $n\varepsilon/2 > x + 1$ and $n\varepsilon/2 < x$. This contradiction implies that $\varepsilon$ cannot be positive when $x \geq 0$. A similar procedure shows that $\varepsilon$ cannot be positive when $x \leq 0$. Thus, we see that $x^2 + y^2 \leq 1$ if $(x, y) \in \limsup A_n$. Let $(x, y) = (0, \pm 1)$. Then $(x, y) \notin A_k$ for any $k \in \mathbb{N}$ since
$$\left(x - \frac{(-1)^k}{k}\right)^2 + y^2 = \left(-\frac{(-1)^k}{k}\right)^2 + (\pm 1)^2 = \frac{1}{k^2} + 1 > 1$$
for all $k \in \mathbb{N}$. Hence, $\limsup A_n \subset S$.


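Now, let $(x, y) \in S$ and consider $x < 0$ and $k$ odd. Then,
$$\left(x - \frac{(-1)^k}{k}\right)^2 + y^2 = \left(x + \frac{1}{k}\right)^2 + y^2.$$
Note that if $k > 1/|x|$ then, since $x < 0$, it follows that $x < x + (1/k) < 0$. Hence, if $k > 1/|x|$ and if $k$ is odd then
$$\left(x - \frac{(-1)^k}{k}\right)^2 + y^2 < x^2 + y^2 \leq 1.$$
Thus, for any $n \in \mathbb{N}$ we can find some $k \in \mathbb{N}$ such that $k \geq n$ and such that $(x, y) \in A_k$ if $x < 0$. Hence, $(x, y) \in \bigcup_{k=n}^{\infty} A_k$ for all $n \in \mathbb{N}$ if $x < 0$. A similar argument (using even values of $k$) shows that $(x, y) \in \bigcup_{k=n}^{\infty} A_k$ for all $n \in \mathbb{N}$ if $x > 0$. Finally, if $(0, y) \in S$ then $|y| < 1$, so that $(0, y) \in U = \liminf A_n$. Since $\liminf A_n \subset \limsup A_n$ we thus see that $(0, y) \in \limsup A_n$. Hence, we conclude that $S \subset \limsup A_n$. Combined with our earlier result we see that $\limsup A_n = S$.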


2.4. Let $C$ be a subset of $\mathbb{R}$ that is not a real Borel set. Let $A_t = \{t\}$ for each $t \in \mathbb{R}$ and note that $\lambda(A_t) = 0$ for each $t \in \mathbb{R}$. Let the index set $I$ be given by $C$. Then $\bigcup_{t \in I} A_t = C \notin \mathcal{B}(\mathbb{R})$. That is, an arbitrary union of null sets need not be a measurable set.
Let $C$ be a real Borel set such that $\lambda(C) > 0$. Let $A_t = \{t\}$ for each $t \in \mathbb{R}$ and note that $\lambda(A_t) = 0$ for each $t \in \mathbb{R}$. Let the index set $I$ be given by $C$. Then $\bigcup_{t \in I} A_t = C$. That is, even when an arbitrary union of null sets is measurable it need not be a null set.
2.5. Let $A$ be a countable subset of $\mathbb{R}$ and note that, since $A$ is countable, we may express $A$ as a countable union of singleton sets; that is, $A = \bigcup_{n=1}^{\infty} \{a_n\}$ where $a_n \in \mathbb{R}$ for each $n$. Recall that singleton subsets of $\mathbb{R}$ are Borel sets. Thus, $A$, as a countable union of Borel sets, must also be a Borel set. Since the Borel sets are a subset of the Lebesgue sets, we conclude that $A$ is Lebesgue measurable. Let $m$ denote Lebesgue measure on the real line and the Lebesgue measurable subsets of the real line. By countable subadditivity (or countable additivity if the $a_n$'s are distinct) we see that $m(A)$ must be zero since $m(\{a_n\}) = 0$ for each $n$.
2.6. Let $A$ be a subset of $\mathbb{R}$ such that $A \notin \mathcal{B}(\mathbb{R})$. Define a function $f : \mathbb{R} \to \mathbb{R}$ via $f(x) = 2I_A(x) - 1$. Note that $f$ is not Borel measurable since $f^{-1}(\{1\}) = A \notin \mathcal{B}(\mathbb{R})$. However, $|f| = 1$ is Borel measurable.
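2.7. Let $\mathcal{L}$ be the collection of all sets $A \in \sigma(\mathcal{P})$ such that $P_1(A) = P_2(A)$. Note that $\Omega \in \mathcal{L}$ since $P_1$ and $P_2$ are probability measures. Further, if $A \in \mathcal{L}$ then $A^c \in \mathcal{L}$ since $P_1(A^c) = 1 - P_1(A) = 1 - P_2(A) = P_2(A^c)$. Finally, if $A_n \in \mathcal{L}$ for each $n \in \mathbb{N}$ and if the $A_n$'s are disjoint then $\bigcup_{n \in \mathbb{N}} A_n \in \mathcal{L}$ since
$$P_1\left(\bigcup_{n \in \mathbb{N}} A_n\right) = \sum_{n \in \mathbb{N}} P_1(A_n) = \sum_{n \in \mathbb{N}} P_2(A_n) = P_2\left(\bigcup_{n \in \mathbb{N}} A_n\right).$$
Thus, $\mathcal{L}$ is a $\lambda$-system. By assumption, $\mathcal{P} \subset \mathcal{L}$ and $\mathcal{P}$ is a $\pi$-system. Thus, the $\pi$-$\lambda$ theorem implies that $\sigma(\mathcal{P}) \subset \mathcal{L}$; that is, $P_1$ and $P_2$ agree on $\sigma(\mathcal{P})$.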
3.1. Note that $f_n(0) = 0$ for all $n \in \mathbb{N}$, and let $x \in (0, 1]$. Choose $n \in \mathbb{N}$ such that $1/n < x$ and note that $f_m(x) = 0$ for all $m \geq n$. Hence, $\lim_{n\to\infty} f_n(x) = 0$ for all $x \in [0, 1]$. Since $\int_0^1 f_n(x)\,dx = 1$ for all $n$, it follows that
$$\lim_{n\to\infty} \int_0^1 f_n(x)\,dx = 1 \neq 0 = \int_0^1 \lim_{n\to\infty} f_n(x)\,dx.$$
The final integral is zero since the integrand is zero.
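3.2. Clearly, $\mu$ maps $\mathcal{P}(\Omega)$ into $[0, \infty]$. Indeed, it maps it into the set $\{0, 1, 2, 3, 4\}$. Further, $\mu(\varnothing) = 0$ since $\varnothing$ contains zero elements. Finally, if $A$ and $B$ are disjoint sets then $\mu(A \cup B)$ is simply $\mu(A) + \mu(B)$, the number of points in $A \cup B$. (Since $\Omega$ has only four points, any disjoint sequence of subsets contains at most four nonempty sets, so finite additivity yields countable additivity.) Next, note that
$$\begin{aligned}
\int_\Omega f\,d\mu &= \int_{\{1\}} f\,d\mu + \int_{\{2\}} f\,d\mu + \int_{\{3\}} f\,d\mu + \int_{\{4\}} f\,d\mu \\
&= f(1)\mu(\{1\}) + f(2)\mu(\{2\}) + f(3)\mu(\{3\}) + f(4)\mu(\{4\}) \\
&= 1 \times 1 + 4 \times 1 + 9 \times 1 + 16 \times 1 \\
&= 30.
\end{aligned}$$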

3.3. Recall the integration by parts theorem for Riemann-Stieltjes integrals. Since $F$ is continuous and of bounded variation it follows that the integral exists. Thus, integrating by parts we see that
$$\int_a^b F(x)\,dF(x) = (F(b))^2 - (F(a))^2 - \int_a^b F(x)\,dF(x).$$
Since $F(b) = 1$ and $F(a) = 0$ we see that
$$\int_a^b F(x)\,dF(x) = \frac{1}{2}.$$

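3.4. Let $f$ be a probability density function associated with $F$. Then
$$\begin{aligned}
\int_{-\infty}^{\infty} (F(x + c) - F(x))\,dx &= \int_{-\infty}^{\infty} \int_x^{x+c} f(t)\,dt\,dx \\
&= \int_{-\infty}^{\infty} \int_{t-c}^{t} dx\, f(t)\,dt = c \int_{-\infty}^{\infty} f(t)\,dt = c.
\end{aligned}$$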

For an alternate solution, note that
$$E[X] = \int_0^{\infty} P(X > t)\,dt - \int_{-\infty}^{0} P(X < t)\,dt.$$
Thus,
$$\begin{aligned}
c &= E[X - (X - c)] \\
&= E[X] - E[X - c] \\
&= \int_0^{\infty} (1 - F(t))\,dt - \int_{-\infty}^{0} F(t)\,dt - \int_0^{\infty} (1 - F(t + c))\,dt + \int_{-\infty}^{0} F(t + c)\,dt \\
&= \int_0^{\infty} (F(t + c) - F(t))\,dt + \int_{-\infty}^{0} (F(t + c) - F(t))\,dt \\
&= \int_{-\infty}^{\infty} (F(t + c) - F(t))\,dt.
\end{aligned}$$
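As a numerical sanity check of the identity $\int_{-\infty}^{\infty}(F(x+c)-F(x))\,dx = c$, the following Python sketch approximates the integral with a midpoint rule for the standard Gaussian distribution function. The choice of distribution, the truncation interval $[-50, 50]$, and the step count are illustrative assumptions, not part of the solution.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard Gaussian distribution function, written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def shifted_cdf_integral(cdf, c: float, lo: float = -50.0, hi: float = 50.0,
                         steps: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral of cdf(x + c) - cdf(x) over [lo, hi].

    For light-tailed distributions the integrand is negligible outside [lo, hi].
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += cdf(x + c) - cdf(x)
    return total * h


if __name__ == "__main__":
    for c in (0.5, 1.0, 3.0):
        print(c, shifted_cdf_integral(normal_cdf, c))
```

Each printed value should agree closely with $c$; substituting another continuous, light-tailed distribution function should give the same result, since the identity does not depend on the particular $F$.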

3.5. Using Lemma 5.3 on page 96, it follows that we should choose
$$g(x) = \begin{cases} 0 & \text{if } x < 0,\\ 1 & \text{if } x > 0. \end{cases}$$
Note that $g$ is not differentiable at the origin, and hence the typical engineering appeal to "derivatives of the step function" is nonsensical.
4.1. No. Consider the three real numbers 1, 2, and 3. Note that $d(1, 3) = 4$ but $d(1, 2) = 1$ and $d(2, 3) = 1$. Thus, we see that $d(1, 3) > d(1, 2) + d(2, 3)$. Hence, $d$ does not satisfy the triangle inequality and consequently cannot be a metric.
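4.2. Consider the metric $\rho$ defined on the positive integers $\mathbb{N}$ via $\rho(n, m) = |n - m|$. Notice that the open ball $B(1, 1) = \{m \in \mathbb{N} : |1 - m| < 1\} = \{1\}$, while the closed ball $\bar{B}(1, 1) = \{m \in \mathbb{N} : |1 - m| \leq 1\} = \{1, 2\}$. Further, the closure of $B(1, 1)$ is equal to $\{1\}$, since every subset of $\mathbb{N}$ is closed under this metric. Thus, the closure of $B(1, 1)$ is a proper subset of the closed ball $\bar{B}(1, 1)$.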
4.3. Let $a = 0$, let $b = 1$, and let $f_n(t) = n^2 t e^{-nt}$. Clearly this sequence converges to zero pointwise as $n \to \infty$. However, note that $f_n(t)$ has a maximum at $t = 1/n$ (since $f_n'(t) = n^2 e^{-nt}(1 - nt)$) and that $f_n(1/n) = n/e$. Thus, we see that although $f_n(t) \to 0$ as $n \to \infty$, $d(f_n, 0) \to \infty$ as $n \to \infty$.
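To see the contrast between pointwise convergence and the distance $d(f_n, 0)$ numerically, the following Python sketch evaluates $f_n(t) = n^2 t e^{-nt}$ on a uniform grid of $[0, 1]$. It assumes, as the computation above suggests, that $d$ is the supremum distance on $[0, 1]$; the grid resolution is an arbitrary choice.

```python
import math


def f(n: int, t: float) -> float:
    """The sequence f_n(t) = n^2 * t * exp(-n t) from the solution."""
    return n * n * t * math.exp(-n * t)


def grid_sup(n: int, points: int = 100_001):
    """Return (t*, f_n(t*)) maximizing f_n over a uniform grid of [0, 1]."""
    best_t, best_v = 0.0, 0.0
    for i in range(points):
        t = i / (points - 1)
        v = f(n, t)
        if v > best_v:
            best_t, best_v = t, v
    return best_t, best_v


if __name__ == "__main__":
    for n in (1, 10, 100):
        t_star, v_star = grid_sup(n)
        # The grid maximum sits near t = 1/n with value near n/e, while f_n(0.5) -> 0.
        print(n, t_star, v_star, n / math.e, f(n, 0.5))
```

The maximum grows linearly in $n$ even though every fixed point $t$ sees $f_n(t) \to 0$.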

4.4. For each positive integer $n$, let $x_n$ be a rational number in the interval $(\sqrt{2} - (1/n), \sqrt{2})$. Note that $d(x_n, x_m) < (1/n) + (1/m)$. Hence, we see that $\{x_n\}_{n \in \mathbb{N}}$ is a Cauchy sequence in $\mathbb{Q}$. However, $x_n \to \sqrt{2}$ in $\mathbb{R}$ and $\sqrt{2} \notin \mathbb{Q}$, so there is no element in $\mathbb{Q}$ to which $x_n$ converges. Since we have found a Cauchy sequence in $\mathbb{Q}$ that does not converge to a point in $\mathbb{Q}$, we see that the rational line is not complete.
5.1. The polynomial $x^2 + 2Bx + C$ has real roots if and only if $B^2 - C \geq 0$. Thus, we are seeking the probability that $B^2 \geq C$. This probability is given by
$$P(B^2 \geq C) = \int_0^1 \int_0^{b^2} f_{B,C}(b, c)\,dc\,db = \int_0^1 b^2\,db = \frac{1}{3}.$$
That is, the polynomial has real roots with probability 1/3.
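A Monte Carlo sketch can corroborate the value 1/3. It assumes, as the integral above does, that $B$ and $C$ are independent and uniform on $(0, 1)$, so that $f_{B,C} = 1$ on the unit square; the trial count and seed are arbitrary.

```python
import random


def prob_real_roots(trials: int, seed: int = 0) -> float:
    """Estimate P(B^2 >= C) for independent B, C ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        b, c = rng.random(), rng.random()
        # x^2 + 2Bx + C has real roots iff its discriminant 4(B^2 - C) is nonnegative.
        if b * b >= c:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(prob_real_roots(200_000))  # should be close to 1/3
```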
5.2. Note that
$$P(0.3 \leq \sqrt{X} < 0.4) = P(0.09 \leq X < 0.16) = 0.16 - 0.09 = 0.07.$$

5.3. Note first that $F_X^{-1} : (0, 1) \to \mathbb{R}$ exists and is strictly increasing. Thus, if $Z = F_X(X)$ then it follows that
$$F_Z(z) = P(Z \leq z) = P(F_X(X) \leq z) = P(X \leq F_X^{-1}(z)) = F_X(F_X^{-1}(z)) = z$$
for $0 < z < 1$. Thus, $Z$ is uniform on $(0, 1)$.
5.4. Let $Y = -\ln(F(X))$ and, as above, note that
$$\begin{aligned}
F_Y(y) &= P(Y \leq y) = P(-\ln(F(X)) \leq y) = P(\ln(F(X)) \geq -y) \\
&= P(F(X) \geq \exp(-y)) = P(X \geq F^{-1}(\exp(-y))) = 1 - P(X \leq F^{-1}(\exp(-y))) \\
&= 1 - F(F^{-1}(\exp(-y))) = 1 - \exp(-y)
\end{aligned}$$
for $y \geq 0$, where the sixth equality follows from the continuity of the indicated distribution function. Thus, $f_Y(y) = \exp(-y)$ for $y \geq 0$ and is zero for $y < 0$.
5.5. Note that $F_Z(z) = P(Z \leq z) = P(X \leq z, Y \leq z) = F_{X,Y}(z, z)$. Also, note that
$$\begin{aligned}
F_W(w) &= P(W \leq w) = 1 - P(W > w) = 1 - P(X > w, Y > w) \\
&= P(X \leq w) + P(Y \leq w) - P(X \leq w, Y \leq w) = F_X(w) + F_Y(w) - F_{X,Y}(w, w).
\end{aligned}$$
5.6. Assume that $X$ and $Y$ have a joint probability distribution function given by $F$. Note that for $x_1 < x_2$ and $y_1 < y_2$,
$$\begin{aligned}
P(x_1 < X \leq x_2,\ y_1 < Y \leq y_2) &= P(X \leq x_2, Y \leq y_2) - P(X \leq x_1, Y \leq y_2) - P(X \leq x_2, Y \leq y_1) + P(X \leq x_1, Y \leq y_1) \\
&= F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1) \geq 0.
\end{aligned}$$
Thus $G$ is not a distribution function since
$$G(2, 2) - G(0, 2) - G(2, 0) + G(0, 0) = 1 - 1 - 1 + 0 = -1.$$
5.7. Let $C$ be the disc of radius one centered at the origin. Note that
$$f_X(x) = \int_{\mathbb{R}} \frac{1}{\pi} I_C(x, y)\,dy = \int_{-\sqrt{1 - x^2}}^{\sqrt{1 - x^2}} \frac{1}{\pi}\,dy = \frac{2}{\pi}\sqrt{1 - x^2}$$
where $-1 < x < 1$.


6.1. To begin, note that the three sets $A \cap B$, $A \cap B^c$, and $B \cap A^c$ partition $A \cup B$. Thus, by countable additivity it follows that $P(A \cup B) = P(A \cap B) + P(A \cap B^c) + P(B \cap A^c)$, which implies that $P(A^c \cap B^c) = 1 - P(A)P(B) - P(A \cap B^c) - P(B \cap A^c)$, where we have used De Morgan's Law and the fact that $A$ and $B$ are independent. Note that since $A$ and $B$ are independent and since $A \cap B^c$ and $A \cap B$ partition $A$, it follows that $P(A \cap B^c) = P(A) - P(A \cap B) = P(A)(1 - P(B)) = P(A)P(B^c)$. Similarly, it follows that $P(B \cap A^c) = P(B)P(A^c)$. Substituting, we see that
$$P(A^c \cap B^c) = 1 - P(A)P(B) - P(A)P(B^c) - P(B)P(A^c) = 1 - P(A) - P(B)P(A^c) = P(A^c)(1 - P(B)) = P(A^c)P(B^c),$$
which implies that $A^c$ and $B^c$ are independent.
6.2. Note that a disc with unit area has radius $r = 1/\sqrt{\pi}$. Assume that the dart board is the disc of unit area centered at the origin in $\mathbb{R}^2$. Note that $P(X \in [r/\sqrt{2}, r])$ and $P(Y \in [r/\sqrt{2}, r])$ are each positive since the dart's final resting place is determined by a uniform distribution over the area of the board. However, $P(X \in [r/\sqrt{2}, r],\ Y \in [r/\sqrt{2}, r])$ is zero since the square $[r/\sqrt{2}, r] \times [r/\sqrt{2}, r]$ meets the board only at the single point $(r/\sqrt{2}, r/\sqrt{2})$. Thus, $X$ and $Y$ are not independent.
6.3. No, since $P(A)$, $P(B)$, and $P(C)$ each equal 1/2 yet $P(A \cap B \cap C)$ is equal to zero.
6.4. Let $h$ denote the number of keystrokes required to type Hamlet. Let $\Omega = \{\omega_1, \dots, \omega_m\}$ denote the $m$ different characters that the typewriter is able to produce. The monkey's output may be thought of as a sequence of experiments where the outcome of each experiment is an element of $\Omega^h$. The probability of each possible outcome is simply the product of the probabilities of the keystrokes required to produce it. Let $p_i$ denote the probability of $\alpha_i \in \Omega^h$ where $1 \leq i \leq m^h$. Note that $p_i$ is positive for each $i$ and that the text of the play Hamlet corresponds to $\alpha_j$ for some integer $j$. We may model the situation as follows: Repeatedly toss an $m^h$-sided die where the $i$th side of the die appears on top with probability $p_i$. Our question then is on how many tosses the $j$th side will appear on top. Since each side comes up with positive probability and since the tosses are made independently, the events "the $j$th side appears on toss $k$" are independent with probabilities summing to $\sum_{k=1}^{\infty} p_j = \infty$, and so the second Borel-Cantelli lemma implies that the $j$th side (and, indeed, each side) will with probability one appear infinitely many times.
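6.5. No, since
$$f_X(x) = \int_x^{\infty} 2e^{-x}e^{-y}\,dy = 2e^{-2x};\quad x \geq 0$$
and since
$$f_Y(y) = \int_0^y 2e^{-x}e^{-y}\,dx = 2e^{-y}(1 - e^{-y});\quad y \geq 0,$$
and thus $f(x, y) \neq f_X(x)f_Y(y)$.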


6.6. To begin, note that the area of the disc is $25\pi$ and the area of the disc with those points removed that are less than one mile from the center is $24\pi$. Thus, the probability that there are no hits within one mile of the target after $N$ shots is $(24/25)^N$. Hence, the probability that there is at least one hit within a mile of the target after $N$ shots is equal to $1 - (24/25)^N$. This probability exceeds 0.95 when $N > \ln(0.05)/\ln(24/25) \approx 73.4$, that is, when $N \geq 74$.
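The threshold $N = 74$ can be confirmed with a short search. This Python sketch assumes, as the solution does, that the shots land independently and uniformly on the disc, so that a single shot misses the inner one-mile disc with probability $24/25$.

```python
import math


def min_shots(target: float = 0.95, miss: float = 24 / 25) -> int:
    """Smallest N such that 1 - miss**N exceeds target."""
    n = 1
    while 1 - miss ** n <= target:
        n += 1
    return n


if __name__ == "__main__":
    print(min_shots())
    # Closed form: N must exceed ln(1 - target) / ln(miss).
    print(math.log(0.05) / math.log(24 / 25))
```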
7.1. Recall that $\sigma(X) = X^{-1}(\mathcal{B}(\mathbb{R}))$. Let $A$ be a real Borel set. If $87 \in A$ then $X^{-1}(A) = \Omega$ since $X(\omega) \in A$ for each $\omega \in \Omega$. Similarly, if $87 \notin A$ then $X^{-1}(A)$ is empty. Thus, $\sigma(X) = \{\varnothing, \Omega\}$.
7.2. If $A$ is a real Borel set such that $A \subset (-\infty, 0)$ then $X^{-1}(A) = \varnothing$. Further, if $A$ is any real Borel set then $X^{-1}(A) = X^{-1}(B)$ where $B = A \cap [0, \infty)$. If $A$ is a real Borel set such that $A \subset [0, \infty)$ then let $\sqrt{A}$ denote the set $\{\sqrt{x} : x \in A\}$ and let $-A$ denote the set $\{-x : x \in A\}$. For such a set $A$ it then follows that $X^{-1}(A) = \sqrt{A} \cup (-\sqrt{A})$ and hence that $\sigma(X)$ consists of all sets of this form. But any real Borel set $B \subset [0, \infty)$ may be written as $\sqrt{C}$ for the set $C = \{x^2 : x \in B\}$. Thus, $\sigma(X)$ consists of all sets of the form $B \cup (-B)$ where $B \subset [0, \infty)$ is a real Borel set.
7.3. Assume that it is not the case that $X = Y$ a.s. Then there must exist a set of positive probability on which $X - Y$ is not zero. Hence, there exists a set of positive probability on which $(X - Y)^2$ is positive. But, if $(X - Y)^2$ is positive on a set of positive probability then $E[(X - Y)^2]$ cannot be equal to zero. This contradiction implies that $X$ must equal $Y$ a.s.
7.4. Note that $E[X] = 0$ and that
$$\mathrm{VAR}[X] = \frac{1}{2}\int_{\mathbb{R}} x^2 e^{-|x|}\,dx = 2.$$
Thus, Chebyshev's inequality implies that $P(|X| > 2) \leq \mathrm{VAR}[X]/2^2 = 1/2$. Note, however, that
$$P(|X| > 2) = 1 - \frac{1}{2}\int_{-2}^{2} e^{-|x|}\,dx = e^{-2} \approx 0.135.$$
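The gap between Chebyshev's bound and the exact tail is easy to see numerically. The Python sketch below assumes, as the variance integral indicates, that $X$ has the density $\frac{1}{2}e^{-|x|}$, and samples $X$ as an exponential magnitude with a random sign. The sample size and seed are arbitrary.

```python
import math
import random

VARIANCE = 2.0
CHEBYSHEV_BOUND = VARIANCE / 2.0 ** 2  # P(|X| >= 2) <= VAR[X] / 4
EXACT_TAIL = math.exp(-2.0)            # P(|X| > 2) = e^{-2}


def laplace_sample(rng: random.Random) -> float:
    """Draw from the density (1/2) exp(-|x|): an Exp(1) magnitude with a fair random sign."""
    magnitude = rng.expovariate(1.0)
    return magnitude if rng.random() < 0.5 else -magnitude


def empirical_tail(trials: int, seed: int = 0) -> float:
    """Fraction of simulated samples with |X| > 2."""
    rng = random.Random(seed)
    return sum(abs(laplace_sample(rng)) > 2.0 for _ in range(trials)) / trials


if __name__ == "__main__":
    print(CHEBYSHEV_BOUND, EXACT_TAIL, empirical_tail(100_000))
```

The bound is valid but loose by a factor of almost four here, which is typical: Chebyshev's inequality uses only the variance and must hold for every distribution with that variance.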

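7.5. Recall that $\sigma(Y) = Y^{-1}(\mathcal{B}(\mathbb{R}))$. Let $B \in \mathcal{B}(\mathbb{R})$ and note that $Y^{-1}(B) = X^{-1}(g^{-1}(B))$. Since $g$ is Borel measurable it follows that $g^{-1}(B) \in \mathcal{B}(\mathbb{R})$ and hence that $X^{-1}(g^{-1}(B)) \in \sigma(X)$. Thus, $\sigma(Y) \subset \sigma(X)$. Equality will occur when any Borel set $B$ may be written as $g^{-1}(A)$ for some $A \in \mathcal{B}(\mathbb{R})$. (For example, if $g$ is bijective.)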
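7.6. Let $A$ and $B$ be nonempty sets, let $f : A \to B$, and let $\mathcal{G}$ be a collection of subsets of $B$. To begin, we will show that $f^{-1}(\sigma_B(\mathcal{G})) = \sigma_A(f^{-1}(\mathcal{G}))$ where, for a nonempty set $M$ and a collection $\mathcal{M}$ of subsets of $M$, $\sigma_M(\mathcal{M})$ denotes the smallest $\sigma$-algebra on $M$ that contains every set in $\mathcal{M}$.
Recall that if $\{B_i : i \in I\}$ is a collection of subsets of $B$ then $f^{-1}(\bigcup_{i \in I} B_i) = \bigcup_{i \in I} f^{-1}(B_i)$ and $f^{-1}(\bigcap_{i \in I} B_i) = \bigcap_{i \in I} f^{-1}(B_i)$. That is, intersections and inverses commute, and unions and inverses commute. Let $\mathcal{G}_1 = \{G \subset B : G \in \mathcal{G} \text{ or } G^c \in \mathcal{G}\}$. Further, for each ordinal $i$ with $1 < i < \omega_1$, where $\omega_1$ denotes the first uncountable ordinal, let $\mathcal{G}_i$ denote the set of all countable unions and all countable intersections of elements in $\bigcup_{j<i} \mathcal{G}_j$. Note that $\bigcup_{j<\omega_1} \mathcal{G}_j$ is a $\sigma$-algebra and, in fact, is equal to $\sigma(\mathcal{G})$. (The union must run over all countable ordinals: any countable family of its elements lies in $\bigcup_{j<i} \mathcal{G}_j$ for some countable ordinal $i$, whereas a union over the finite stages alone need not be closed under countable unions.) Further, $f^{-1}(\sigma(\mathcal{G})) = \bigcup_{i<\omega_1} f^{-1}(\mathcal{G}_i)$. Also, since $f^{-1}(\mathcal{G}_i)$ is the set of all countable unions and countable intersections of elements in $\bigcup_{j<i} f^{-1}(\mathcal{G}_j)$, it follows that $\bigcup_{i<\omega_1} f^{-1}(\mathcal{G}_i)$ is a $\sigma$-algebra. Indeed, it is the $\sigma$-algebra generated by $f^{-1}(\mathcal{G})$. Thus, $f^{-1}(\sigma(\mathcal{G})) = \sigma(f^{-1}(\mathcal{G}))$.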
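Consider a random variable $X$ defined on a probability space $(\Omega, \mathcal{F}, P)$. Now, let $\mathcal{S}$ be a countable collection of subsets of $\mathbb{R}$ such that $\mathcal{B}(\mathbb{R}) = \sigma(\mathcal{S})$ (such a collection exists by Problem 1.13). Our first result implies that $X^{-1}(\sigma_{\mathbb{R}}(\mathcal{S})) = \sigma_{\Omega}(X^{-1}(\mathcal{S}))$. Note, also, that $X^{-1}(\sigma_{\mathbb{R}}(\mathcal{S})) = X^{-1}(\mathcal{B}(\mathbb{R})) = \sigma_{\Omega}(X)$. Thus, $\sigma_{\Omega}(X) = \sigma_{\Omega}(X^{-1}(\mathcal{S}))$, from which it follows immediately that $\sigma_{\Omega}(X)$ is countably generated.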
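7.7. Consider measurable spaces $(\Omega_1, \mathcal{F}_1)$ and $(\Omega_2, \mathcal{F}_2)$ and let $f$ be a function mapping $\Omega_1$ to $\Omega_2$. Let $\mathcal{A}$ be a collection of subsets of $\Omega_2$ such that $\sigma(\mathcal{A}) = \mathcal{F}_2$. If $f^{-1}(\mathcal{A}) \subset \mathcal{F}_1$ then $f^{-1}(\mathcal{F}_2) \subset \mathcal{F}_1$. To see why this holds, recall that complements, unions, and intersections commute with inverses. Thus, the collection $\mathcal{Y}$ of all subsets $E$ of $\Omega_2$ such that $f^{-1}(E) \in \mathcal{F}_1$ is a $\sigma$-algebra on $\Omega_2$. (Note that $\Omega_2 \in \mathcal{Y}$ since $f^{-1}(\Omega_2) = \Omega_1$.) Further, note that $\mathcal{A} \subset \mathcal{Y}$. This implies that $\sigma(\mathcal{A}) \subset \mathcal{Y}$. Since $\sigma(\mathcal{A}) = \mathcal{F}_2$ the desired result follows immediately.
It follows immediately that if $X^{-1}(\mathcal{B}(\mathbb{R})) \subset \mathcal{F}$ then $X^{-1}((-\infty, x]) \in \mathcal{F}$ for each $x \in \mathbb{R}$. Further, since $\mathcal{B}(\mathbb{R})$ is generated by the sets $(-\infty, x]$ for $x \in \mathbb{R}$, the result of the previous paragraph shows that $X^{-1}(\mathcal{B}(\mathbb{R})) \subset \mathcal{F}$ if $X^{-1}((-\infty, x]) \in \mathcal{F}$ for each $x \in \mathbb{R}$.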
7.8. The forward implication is clear. The reverse implication follows quickly via a proof by contradiction.
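7.9. Note that
$$\begin{aligned}
E[|X - Y|] &= \frac{1}{4}\int_0^2\int_0^2 |x - y|\,dx\,dy \\
&= \frac{1}{4}\int_0^2\int_y^2 (x - y)\,dx\,dy + \frac{1}{4}\int_0^2\int_0^y (y - x)\,dx\,dy \\
&= \frac{1}{4}\int_0^2 \left(2 - 2y - \frac{1}{2}y^2 + y^2\right)dy + \frac{1}{4}\int_0^2 \left(y^2 - \frac{1}{2}y^2\right)dy \\
&= \frac{1}{4}\int_0^2 (y^2 - 2y + 2)\,dy \\
&= \frac{2}{3}.
\end{aligned}$$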
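A Monte Carlo sketch corroborates the value 2/3. The factor 1/4 and the limits of integration above indicate that $X$ and $Y$ are independent and uniform on $(0, 2)$; the code adopts that reading as an assumption.

```python
import random


def mean_abs_diff(trials: int, width: float = 2.0, seed: int = 1) -> float:
    """Estimate E|X - Y| for independent X, Y ~ Uniform(0, width)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(rng.uniform(0.0, width) - rng.uniform(0.0, width))
    return total / trials


if __name__ == "__main__":
    print(mean_abs_diff(200_000))  # should be close to 2/3
```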

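7.10. Note that
$$\begin{aligned}
P(Z \leq z) &= P(\max(X_1, \dots, X_n) \leq z) \\
&= P(X_1 \leq z, \dots, X_n \leq z) \\
&= P(X_1 \leq z) \cdots P(X_n \leq z) \\
&= \left(\frac{z}{\theta}\right)^n
\end{aligned}$$
for $0 < z \leq \theta$. Thus, it follows that
$$f_Z(z) = \frac{n z^{n-1}}{\theta^n}$$
for $0 < z < \theta$. Hence,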

7.11. To begin, recall that
$$\cos(a)\cos(b) = \frac{1}{2}\cos(a - b) + \frac{1}{2}\cos(a + b)$$
and
$$\cos(a + b) = \cos(a)\cos(b) - \sin(a)\sin(b).$$
Note also that $E[\cos(2X)] = 0$ and $E[\sin(2X)] = 0$ since $C_X(2) = E[\cos(2X)] + iE[\sin(2X)] = 0$. Thus, it follows that
$$\begin{aligned}
E[\cos(X + s)\cos(X + s + 1)] &= E\left[\frac{1}{2}\cos(1) + \frac{1}{2}\cos(2X + 2s + 1)\right] \\
&= \frac{1}{2}\cos(1) + \frac{1}{2}E[\cos(2X + 2s + 1)] \\
&= \frac{1}{2}\cos(1) + \frac{1}{2}E[\cos(2X)\cos(2s + 1) - \sin(2X)\sin(2s + 1)] \\
&= \frac{1}{2}\cos(1) + \frac{1}{2}\cos(2s + 1)E[\cos(2X)] - \frac{1}{2}\sin(2s + 1)E[\sin(2X)] \\
&= \frac{1}{2}\cos(1).
\end{aligned}$$

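8.1. Note that
$$\begin{aligned}
E[X] &= \int_0^{\infty} x\,dF(x) \\
&= \int_0^{\infty}\int_0^{x} dt\,dF(x) \\
&= \int_0^{\infty}\int_{(t,\infty)} dF(x)\,dt \\
&= \int_0^{\infty} P(X > t)\,dt,
\end{aligned}$$
where Tonelli's theorem justifies interchanging the order of integration since the integrand is nonnegative.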
8.2. Since $E[(X - m)^2] = E[X^2] - 2mE[X] + m^2$, it follows that
$$\frac{d}{dm}E[(X - m)^2] = -2E[X] + 2m.$$
Setting this latter expression equal to zero implies that $m = E[X]$ is a critical point. Since
$$\frac{d^2}{dm^2}E[(X - m)^2] = 2 > 0,$$
it follows that $m = E[X]$ minimizes $E[(X - m)^2]$.
8.3. Apply the Cauchy-Schwarz inequality to the product $XY$ to see that
$$|E[XY]| \leq E[|XY|] \leq \sqrt{E[X^2]E[Y^2]}.$$

8.4. Note that
$$\mathrm{COV}[X, Y] = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y] - E[X]E[Y] + E[X]E[Y] = E[XY] - E[X]E[Y].$$
8.5. Apply the Cauchy-Schwarz inequality to $X - E[X]$ and $Y - E[Y]$ to see that
$$|E[(X - E[X])(Y - E[Y])]| \leq \sqrt{E[(X - E[X])^2]}\sqrt{E[(Y - E[Y])^2]} = \sigma_X\sigma_Y,$$
which implies that
$$|\rho(X, Y)| = \frac{|\mathrm{COV}[X, Y]|}{\sigma_X\sigma_Y} \leq 1.$$

8.6. For $a, b \in \mathbb{R}$ let $Z = aX - bY$. Note that $0 \leq E[Z^2] = a^2E[X^2] - 2abE[XY] + b^2E[Y^2]$. Note that the right hand side is a quadratic polynomial in $a$ that, being nonnegative, has at most one real root (possibly of multiplicity two). Note that (taking $b \neq 0$ and $E[X^2] > 0$) the roots of this expression are given by
$$a = \frac{2bE[XY] \pm \sqrt{4b^2E[XY]^2 - 4E[X^2]b^2E[Y^2]}}{2E[X^2]}.$$
Based upon the previous observation we know that $4b^2E[XY]^2 - 4E[X^2]b^2E[Y^2] \leq 0$ and hence that $E[XY]^2 - E[X^2]E[Y^2] \leq 0$. Equality holds if and only if $E[Z^2]$, as a function of $a$, has a real root. Thus, equality holds if and only if $E[(aX - bY)^2] = 0$ for some $a$ and $b$ not both equal to zero. Thus, equality holds if and only if $P(aX = bY) = 1$ for some $a$ and $b$ not both zero. In fact, if $\rho(X, Y) = 1$ then $Y$ increases linearly with $X$ (almost surely) and if $\rho(X, Y) = -1$ then $Y$ decreases linearly with $X$ (almost surely).
8.7. It follows quickly that
$$E[Y] = \frac{1}{b - a}\int_a^b y\,dy = \frac{a + b}{2}$$
and that
$$\mathrm{VAR}[Y] = \frac{1}{b - a}\int_a^b y^2\,dy - \left(\frac{a + b}{2}\right)^2 = \frac{(b - a)^2}{12}.$$

8.8. Recall that $M_X(t) = \exp(\lambda(e^t - 1))$. Thus, $M_X'(t) = \lambda e^t\exp(\lambda(e^t - 1))$ and $M_X''(t) = (\lambda e^t + \lambda^2e^{2t})\exp(\lambda(e^t - 1))$. Thus, $E[X] = M_X'(0) = \lambda$ and $E[X^2] = M_X''(0) = \lambda + \lambda^2$, which implies that $\mathrm{VAR}[X] = E[X^2] - E[X]^2 = \lambda + \lambda^2 - \lambda^2 = \lambda$. Recall that $\mathrm{VAR}[aX] = a^2\mathrm{VAR}[X]$ for $a \in \mathbb{R}$. Thus, by independence we see that $\mathrm{VAR}[Y] = (16 + 1 + 36 + 9)(3) = 186$.
8.9. Note that $E[(X - aY)^2] = E[X^2] - 2aE[XY] + a^2E[Y^2]$. Hence,
$$\frac{d}{da}E[(X - aY)^2] = -2E[XY] + 2aE[Y^2] = 0$$
if $a = E[XY]/E[Y^2]$. Since
$$\frac{d^2}{da^2}E[(X - aY)^2] = 2E[Y^2] > 0,$$
it follows that this choice for $a$ results in a minimum value of $E[(X - aY)^2]$.

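8.10. Let $Z = \sum_{i=1}^{n} X_i$ and note that $E[X_1Z] = E[X_1^2] + (n - 1)\mu^2 = \sigma^2 + \mu^2 + (n - 1)\mu^2 = \sigma^2 + n\mu^2$, that $E[X_1] = \mu$, and that $E[Z] = n\mu$. Thus, $\mathrm{COV}[X_1, Z] = \sigma^2 + n\mu^2 - (\mu)(n\mu) = \sigma^2$. Further, $\mathrm{VAR}[X_1] = \sigma^2$ and $\mathrm{VAR}[Z] = n\sigma^2$. Thus,
$$\rho(X_1, Z) = \frac{\mathrm{COV}[X_1, Z]}{\sqrt{\mathrm{VAR}[X_1]\,\mathrm{VAR}[Z]}} = \frac{\sigma^2}{\sqrt{\sigma^2 \cdot n\sigma^2}} = \frac{1}{\sqrt{n}}.$$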
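The following Python sketch estimates $\rho(X_1, Z)$ by simulation and compares it with $1/\sqrt{n}$. Gaussian summands with $\mu = 3$ and $\sigma = 2$ are an arbitrary illustrative choice; the derivation above only uses that the $X_i$'s are independent with common mean and finite common variance.

```python
import math
import random


def corr_first_with_sum(n: int, trials: int, mu: float = 3.0,
                        sigma: float = 2.0, seed: int = 2) -> float:
    """Sample correlation between X_1 and Z = X_1 + ... + X_n for i.i.d. N(mu, sigma^2) draws."""
    rng = random.Random(seed)
    xs, zs = [], []
    for _ in range(trials):
        draws = [rng.gauss(mu, sigma) for _ in range(n)]
        xs.append(draws[0])
        zs.append(sum(draws))
    mx, mz = sum(xs) / trials, sum(zs) / trials
    cov = sum((x - mx) * (z - mz) for x, z in zip(xs, zs)) / trials
    vx = sum((x - mx) ** 2 for x in xs) / trials
    vz = sum((z - mz) ** 2 for z in zs) / trials
    return cov / math.sqrt(vx * vz)


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n, corr_first_with_sum(n, 20_000), 1 / math.sqrt(n))
```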

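8.11. Recall that
$$P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}$$
for $k = 0, 1, 2, \dots$. Thus,
$$E[t^X] = \sum_{k=0}^{\infty} t^k\frac{e^{-\lambda}\lambda^k}{k!} = e^{-\lambda}\sum_{k=0}^{\infty}\frac{(\lambda t)^k}{k!} = \exp(\lambda(t - 1)).$$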

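8.12. Recall that if $Y$ has a uniform distribution on $(0, 1)$ then
$$\Phi_Y(t) = \frac{e^{it} - 1}{it}.$$
Note that if $X = 2Y - 1$ then $X$ has a uniform distribution on $(-1, 1)$ and
$$\Phi_X(t) = e^{-it}\Phi_Y(2t) = e^{-it}\left(\frac{e^{2it} - 1}{2it}\right) = \frac{e^{it} - e^{-it}}{2it} = \frac{\sin(t)}{t}.$$
Finally, if $S_n = X_1 + \cdots + X_n$, where the $X_i$'s are independent and uniform on $(-1, 1)$, then
$$\Phi_{S_n}(t) = \left(\frac{\sin(t)}{t}\right)^n.$$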

8.13. Note that $\Phi(t)$ is real-valued, and hence that
$$\Phi(t) = E[\cos(tX)] = \int_{-\infty}^{\infty}\cos(tx)f(x)\,dx.$$
Let $g(x) = \cos(tx)\sqrt{f(x)}$, let $h(x) = \sqrt{f(x)}$, and recall that Schwarz's inequality implies that
$$\left(\int_{-\infty}^{\infty} g(x)h(x)\,dx\right)^2 \leq \int_{-\infty}^{\infty} g^2(x)\,dx\int_{-\infty}^{\infty} h^2(x)\,dx.$$
Thus,
$$\begin{aligned}
\Phi^2(t) &\leq \int_{-\infty}^{\infty}\cos^2(tx)f(x)\,dx\int_{-\infty}^{\infty} f(x)\,dx \\
&= \frac{1}{2}\int_{-\infty}^{\infty}(1 + \cos(2tx))f(x)\,dx \\
&= \frac{1}{2} + \frac{1}{2}\Phi(2t),
\end{aligned}$$
from which the desired result follows immediately.
For an alternate solution (that does not require $X$ to possess a density function), note that
$$\Phi(t) = E[\cos(tX)] = E\left[2\cos^2\left(\frac{tX}{2}\right) - 1\right] = 2E\left[\cos^2\left(\frac{tX}{2}\right)\right] - 1.$$
Thus, $\Phi(2t) = 2E[\cos^2(tX)] - 1$. Jensen's inequality implies that $E[\cos^2(tX)] \geq E[\cos(tX)]^2$. Thus, $\Phi(2t) \geq 2E[\cos(tX)]^2 - 1 = 2\Phi^2(t) - 1$, from which the desired result again follows immediately.
8.14. Note that $X$ is equal to 0, 1, and 2 with probabilities 1/4, 1/2, and 1/4, respectively. Thus,
$$\begin{aligned}
M_X(s) &= E[e^{sX}] = \sum_{k=0}^{2} e^{sk}P(X = k) \\
&= e^{0s}P(X = 0) + e^{1s}P(X = 1) + e^{2s}P(X = 2) \\
&= \frac{1}{4} + \frac{1}{2}e^s + \frac{1}{4}e^{2s} = \frac{1}{4}(1 + e^s)^2.
\end{aligned}$$
Note that
$$M_X'(s) = \frac{1}{2}e^s + \frac{1}{2}e^{2s},$$
that
$$M_X''(s) = \frac{1}{2}e^s + e^{2s},$$
and, in general, that
$$M_X^{(n)}(s) = \frac{1}{2}e^s + 2^{n-2}e^{2s}.$$
Thus, we see that
$$E[X^n] = M_X^{(n)}(0) = \frac{1}{2} + 2^{n-2}$$
for $n \in \mathbb{N}$.
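The closed form $E[X^n] = \frac{1}{2} + 2^{n-2}$ can be checked exactly against the probability mass function using Python's `fractions` module. This is a verification sketch, not part of the original solution.

```python
from fractions import Fraction

# P(X = 0) = 1/4, P(X = 1) = 1/2, P(X = 2) = 1/4
PMF = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}


def moment_from_pmf(n: int) -> Fraction:
    """E[X^n] computed directly as the sum of k^n P(X = k)."""
    return sum((Fraction(k) ** n * p for k, p in PMF.items()), Fraction(0))


def moment_from_mgf(n: int) -> Fraction:
    """E[X^n] = M_X^{(n)}(0) = 1/2 + 2^(n - 2), as derived above."""
    return Fraction(1, 2) + Fraction(2) ** (n - 2)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, moment_from_pmf(n), moment_from_mgf(n))
```

Exact rational arithmetic avoids any floating-point doubt about whether the two expressions agree.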
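8.15. To begin, note that
$$M_X(s) = E[e^{sX}] = \frac{1}{3}(1 + e^s + e^{2s}).$$
Further, note that
$$Z = \begin{cases}
0 & \text{if } X = 0 \text{ and } Y = 0\\
0 & \text{if } X = 2 \text{ and } Y = 1\\
1 & \text{if } X = 1 \text{ and } Y = 0\\
1 & \text{if } X = 0 \text{ and } Y = 1\\
2 & \text{if } X = 1 \text{ and } Y = 1\\
2 & \text{if } X = 2 \text{ and } Y = 0.
\end{cases}$$
From this we see that
$$Z = \begin{cases}
0 & \text{wp } 1/3\\
1 & \text{wp } 1/3\\
2 & \text{wp } 1/3,
\end{cases}$$
from which it follows that
$$M_Z(s) = E[e^{sZ}] = \frac{1}{3}(1 + e^s + e^{2s}).$$
Next, note that
$$X + Z = \begin{cases}
0 & \text{if } X = 0 \text{ and } Y = 0\\
1 & \text{if } X = 0 \text{ and } Y = 1\\
2 & \text{if } X = 1 \text{ and } Y = 0\\
2 & \text{if } X = 2 \text{ and } Y = 1\\
3 & \text{if } X = 1 \text{ and } Y = 1\\
4 & \text{if } X = 2 \text{ and } Y = 0.
\end{cases}$$
Thus, we see that
$$X + Z = \begin{cases}
0 & \text{wp } 1/9\\
1 & \text{wp } 2/9\\
2 & \text{wp } 1/3\\
3 & \text{wp } 2/9\\
4 & \text{wp } 1/9.
\end{cases}$$
Hence,
$$M_{X+Z}(s) = \frac{1}{9} + \frac{2}{9}e^s + \frac{1}{3}e^{2s} + \frac{2}{9}e^{3s} + \frac{1}{9}e^{4s}.$$
Note also that
$$M_X(s)M_Z(s) = \frac{1}{9}(1 + e^s + e^{2s})^2 = \frac{1}{9} + \frac{2}{9}e^s + \frac{1}{3}e^{2s} + \frac{2}{9}e^{3s} + \frac{1}{9}e^{4s} = M_{X+Z}(s)$$
as well. However, $X$ and $Z$ are not independent since $P(Z = 0, X = 1) = 0$ even though $P(Z = 0)$ and $P(X = 1)$ are each positive.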
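Since each moment generating function here is a polynomial in $u = e^s$, the identity $M_X(s)M_Z(s) = M_{X+Z}(s)$ reduces to multiplying coefficient lists. The Python sketch below performs that check exactly; it illustrates that the moment generating function of a sum can factor into the individual ones even when the summands are dependent.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply two polynomials in u = e^s given as coefficient lists (index = power of u)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


THIRD = Fraction(1, 3)
M_X = [THIRD, THIRD, THIRD]  # (1 + e^s + e^{2s}) / 3
M_Z = [THIRD, THIRD, THIRD]  # same form as M_X
# Coefficients of M_{X+Z}, read off the distribution of X + Z listed above.
M_SUM = [Fraction(1, 9), Fraction(2, 9), Fraction(1, 3), Fraction(2, 9), Fraction(1, 9)]

if __name__ == "__main__":
    print(poly_mul(M_X, M_Z) == M_SUM)
```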
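8.16. The random variables $X$ and $Y$ are uncorrelated since
$$E[XY] = \int_0^{2\pi}\frac{1}{2\pi}\cos(\theta)\sin(\theta)\,d\theta = 0,$$
$$E[X] = \int_0^{2\pi}\frac{1}{2\pi}\cos(\theta)\,d\theta = 0,$$
and
$$E[Y] = \int_0^{2\pi}\frac{1}{2\pi}\sin(\theta)\,d\theta = 0.$$
However, $X$ and $Y$ are not independent since
$$P\left(X \in [1/\sqrt{2}, 1],\ Y \in [1/\sqrt{2}, 1]\right) = 0 \neq P\left(X \in [1/\sqrt{2}, 1]\right)P\left(Y \in [1/\sqrt{2}, 1]\right).$$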
8.17. Consider uncorrelated random variables $X$ and $Y$ with joint probability mass function $P(X = x_i, Y = y_j) = p_{ij}$ for $i, j = 1, 2$ and marginal probability distributions $P(X = x_i) = p_i$ for $i = 1, 2$ and $P(Y = y_j) = q_j$ for $j = 1, 2$. Note that $p_{11} + p_{12} + p_{21} + p_{22} = 1$, $p_{i1} + p_{i2} = p_i$ for $i = 1, 2$, $p_{1j} + p_{2j} = q_j$ for $j = 1, 2$, $p_1 + p_2 = 1$, and $q_1 + q_2 = 1$. Note that $E[XY] = \sum_{i=1}^{2}\sum_{j=1}^{2} x_iy_jp_{ij}$, $E[X] = \sum_{i=1}^{2} x_ip_i$, and $E[Y] = \sum_{j=1}^{2} y_jq_j$. Since $X$ and $Y$ are uncorrelated it follows that $E[XY] - E[X]E[Y] = 0$ and hence that
$$x_1y_1(p_{11} - p_1q_1) + x_1y_2(p_{12} - p_1q_2) + x_2y_1(p_{21} - p_2q_1) + x_2y_2(p_{22} - p_2q_2) = 0.$$
Notice that $p_{12} = p_1 - p_{11}$, $p_{21} = q_1 - p_{11}$, and $p_{22} = q_2 - p_{12} = q_2 - p_1 + p_{11}$. Substitution yields
$$x_1y_1(p_{11} - p_1q_1) - x_1y_2(p_{11} - p_1 + p_1q_2) - x_2y_1(p_{11} - q_1 + p_2q_1) + x_2y_2(p_{11} - p_1 + q_2 - p_2q_2) = 0.$$
Next, note that $p_1q_1 = p_1 - p_1q_2 = q_1 - p_2q_1 = p_1 - q_2 + p_2q_2$. Substituting again implies that $(x_1y_1 - x_1y_2 - x_2y_1 + x_2y_2)(p_{11} - p_1q_1) = 0$, or that $(x_1 - x_2)(y_1 - y_2)(p_{11} - p_1q_1) = 0$. Since $x_1 \neq x_2$ and $y_1 \neq y_2$, it follows that $p_{11} = p_1q_1$. From this we see that $p_{12} = p_1 - p_1q_1 = p_1(1 - q_1) = p_1q_2$, $p_{21} = q_1 - p_1q_1 = q_1(1 - p_1) = p_2q_1$, and $p_{22} = p_2 - p_{21} = p_2 - p_2q_1 = p_2(1 - q_1) = p_2q_2$. That is, $p_{ij} = p_iq_j$ for $i, j = 1, 2$. Thus, $X$ and $Y$ are independent.
9.1. Let $f_R$ denote a uniform density on $(a, b)$. Let $X$ be the length of the circumference and let $Y$ be the area of the circle. It follows that
$$f_X(x) = \frac{1}{2\pi}f_R\left(\frac{x}{2\pi}\right) = \frac{1}{2\pi(b - a)}$$
for $2\pi a < x < 2\pi b$ and
$$f_Y(y) = \frac{1}{2\sqrt{\pi y}}f_R\left(\sqrt{\frac{y}{\pi}}\right) = \frac{1}{2\sqrt{\pi y}}\cdot\frac{1}{b - a}$$
for $\pi a^2 \leq y \leq \pi b^2$.


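9.2. Let $Z = XY$. Theorem 5.17 on page 103 implies that
$$\begin{aligned}
f_Z(z) &= \int_{-\infty}^{\infty}\frac{1}{|y|}f_Y(y)f_X\left(\frac{z}{y}\right)dy \\
&= \frac{1}{\pi\sigma^2}\int_{|z|}^{\infty} y\exp\left(\frac{-y^2}{2\sigma^2}\right)\frac{1}{|y|}\frac{1}{\sqrt{1 - (z^2/y^2)}}\,dy \qquad (\text{since } |z/y| < 1 \text{ and } y > 0) \\
&= \frac{1}{\pi\sigma^2}\exp\left(\frac{-z^2}{2\sigma^2}\right)\int_0^{\infty}\exp\left(\frac{-t^2}{2\sigma^2}\right)\frac{1}{\sqrt{1 - (z^2/(t^2 + z^2))}}\,\frac{t}{\sqrt{t^2 + z^2}}\,dt \qquad (\text{where we let } y^2 = t^2 + z^2) \\
&= \frac{1}{\pi\sigma^2}\exp\left(\frac{-z^2}{2\sigma^2}\right)\int_0^{\infty}\exp\left(\frac{-t^2}{2\sigma^2}\right)dt \\
&= \frac{1}{\pi\sigma^2}\exp\left(\frac{-z^2}{2\sigma^2}\right)\frac{\sqrt{2\pi}\,\sigma}{2} \\
&= \frac{1}{\sigma\sqrt{2\pi}}\exp\left(\frac{-z^2}{2\sigma^2}\right)
\end{aligned}$$
for $z \in \mathbb{R}$. That is, $Z$ is $N(0, \sigma^2)$.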
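A simulation makes the conclusion concrete. Reading the densities off the integrand above, this Python sketch assumes that $X$ has the arcsine density $1/(\pi\sqrt{1 - x^2})$ on $(-1, 1)$ (the law of $\cos(2\pi U)$ for $U$ uniform on $(0, 1)$), that $Y$ has the Rayleigh density $(y/\sigma^2)e^{-y^2/(2\sigma^2)}$ for $y > 0$ (the law of $\sigma\sqrt{-2\ln V}$ for $V$ uniform on $(0, 1]$), and that $X$ and $Y$ are independent. Under these assumptions the product is the Box-Muller construction of a Gaussian.

```python
import math
import random


def sample_product(sigma: float, rng: random.Random) -> float:
    """Return X*Y with X arcsine on (-1, 1) and Y Rayleigh(sigma), drawn independently."""
    x = math.cos(2.0 * math.pi * rng.random())
    # 1 - random() lies in (0, 1], so the logarithm is always defined.
    y = sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
    return x * y


def summary(sigma: float, trials: int, seed: int = 5):
    """Sample mean, sample variance, and fraction of samples with |Z| < sigma."""
    rng = random.Random(seed)
    zs = [sample_product(sigma, rng) for _ in range(trials)]
    mean = sum(zs) / trials
    var = sum((z - mean) ** 2 for z in zs) / trials
    within = sum(abs(z) < sigma for z in zs) / trials
    return mean, var, within


if __name__ == "__main__":
    # For N(0, sigma^2): mean 0, variance sigma^2, and about 68.27% within one sigma.
    print(summary(1.5, 100_000))
```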
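9.3. Let $g(x, y) = x/(x + y)$ and let $h(x, y) = x + y$. Note that if $\alpha(b, t) = bt$ then $\alpha(g(x, y), h(x, y)) = \alpha(x/(x + y), x + y) = x$, and if $\beta(b, t) = t(1 - b)$ then $\beta(g(x, y), h(x, y)) = \beta(x/(x + y), x + y) = y$. Let $B = X/(X + Y)$ and $T = X + Y$. Then
$$\begin{aligned}
f_{B,T}(b, t) &= f_{X,Y}(\alpha(b, t), \beta(b, t))\left|\det\begin{bmatrix}\partial\alpha/\partial b & \partial\alpha/\partial t\\ \partial\beta/\partial b & \partial\beta/\partial t\end{bmatrix}\right| \\
&= f_{X,Y}(bt, t(1 - b))\left|\det\begin{bmatrix} t & b\\ -t & 1 - b\end{bmatrix}\right| \\
&= e^{-bt}e^{-t(1 - b)}|t| \\
&= te^{-t}
\end{aligned}$$
for $t > 0$ and $0 < b < 1$. Thus,
$$f_B(b) = \int_0^{\infty} te^{-t}\,dt = 1$$
for $0 < b < 1$, which implies that $B$ is uniform on $(0, 1)$.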


9.4. Let $g(x) = 1/x$ for $x > 0$ and note that $g^{-1} = g$. Thus, it follows that
$$f_{1/X}(y) = f_X(g^{-1}(y))\left|\frac{d}{dy}g^{-1}(y)\right| = f_X\left(\frac{1}{y}\right)\left|\frac{-1}{y^2}\right| = \frac{f_X(1/y)}{y^2}$$
for $y > 0$.
10.1. Note that
$$\begin{aligned}
P(Z < 2.44) &= P\left(\frac{Z - 5}{2} < \frac{2.44 - 5}{2}\right) \\
&= P(X < -1.28) \\
&= P(X > 1.28) \\
&= 1/10.
\end{aligned}$$
Thus, $a = 1.28$.
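The table lookup can be reproduced with Python's standard-library `statistics.NormalDist`. The sketch assumes, as the standardization above indicates, that $Z$ is Gaussian with mean 5 and standard deviation 2, so that $X = (Z - 5)/2$ is standard Gaussian.

```python
from statistics import NormalDist

Z = NormalDist(mu=5.0, sigma=2.0)
STANDARD = NormalDist()  # mean 0, standard deviation 1

if __name__ == "__main__":
    print(Z.cdf(2.44))             # P(Z < 2.44), approximately 0.10
    print(STANDARD.cdf(-1.28))     # P(X < -1.28), the same number
    print(STANDARD.inv_cdf(0.90))  # the 90th percentile of X, about 1.2816
```

The value 1.28 in the solution is the usual two-decimal table value of the 90th percentile.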
10.2. Recall the moment generating function $M_X$ and note that
$$M_X''(t) = (\sigma^2 + (m + \sigma^2t)^2)\exp\left[\frac{\sigma^2t^2}{2} + tm\right]$$
and that
$$M_X'''(t) = \left(2(m + \sigma^2t)\sigma^2 + (\sigma^2 + (m + \sigma^2t)^2)(m + \sigma^2t)\right)\exp\left[\frac{\sigma^2t^2}{2} + tm\right].$$
Thus, $E[X^3] = M_X'''(0) = 2m\sigma^2 + (\sigma^2 + m^2)(m) = 3m\sigma^2 + m^3$. If $m = 0$ then $E[X^{97}] = 0$ since $x^{97}f_X(x)$ is an integrable, odd function.
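10.3. Let $T = \sqrt{n}Z/W$ and let $B = W$. Note that $Z = TB/\sqrt{n}$. Theorem 5.19 on page 103 and the independence of $W$ and $Z$ imply that
$$\begin{aligned}
f_{T,B}(t, b) &= f_{Z,W}\left(\frac{bt}{\sqrt{n}}, b\right)\left|\det\begin{bmatrix} b/\sqrt{n} & t/\sqrt{n}\\ 0 & 1\end{bmatrix}\right| \\
&= f_{Z,W}\left(\frac{bt}{\sqrt{n}}, b\right)\frac{|b|}{\sqrt{n}} \\
&= f_Z\left(\frac{bt}{\sqrt{n}}\right)f_W(b)\frac{|b|}{\sqrt{n}}.
\end{aligned}$$
Thus, we see that
$$f_T(t) = \int_{\mathbb{R}} f_Z\left(\frac{bt}{\sqrt{n}}\right)f_W(b)\frac{|b|}{\sqrt{n}}\,db.$$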
10.4. Note that
$$\begin{aligned}
P(Y \leq y) &= P(XZ \leq y) \\
&= P(XZ \leq y, Z = 1) + P(XZ \leq y, Z = -1) \\
&= P(X \leq y, Z = 1) + P(-X \leq y, Z = -1) \\
&= P(X \leq y)P(Z = 1) + P(-X \leq y)P(Z = -1) \\
&= P(X \leq y)\frac{1}{2} + P(X \leq y)\frac{1}{2} \qquad (\text{since } -X \text{ is also standard Gaussian}) \\
&= P(X \leq y),
\end{aligned}$$
which implies that $Y$ is standard Gaussian. Note, however, that $X + Y$ is not Gaussian since
$$X + Y = \begin{cases} 2X & \text{wp } 1/2\\ 0 & \text{wp } 1/2. \end{cases}$$
That is, $X + Y$ has a discontinuous distribution function. Note, also, that $X$ and $Y$ are uncorrelated since $E[XY] = E[X^2Z] = E[X^2]E[Z] = 0$. Finally, however, note that $X$ and $Y$ are not independent since $P(X \in [1, 2], Y \in [3, 4]) = 0$ (because $|Y| = |X|$) yet $P(X \in [1, 2])$ and $P(Y \in [3, 4])$ are each positive.
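10.5. Let
$$\begin{bmatrix} Z_1\\ Z_2\end{bmatrix} = \begin{bmatrix} c_1 & c_2\\ c_3 & c_4\end{bmatrix}\begin{bmatrix} X_1\\ X_2\end{bmatrix} = \begin{bmatrix} c_1X_1 + c_2X_2\\ c_3X_1 + c_4X_2\end{bmatrix}.$$
Note that $Z_1$ and $Z_2$ are mutually Gaussian. Thus, for $Z_1$ and $Z_2$ to be independent we require that $E[Z_1Z_2] = E[Z_1]E[Z_2]$, where the right hand side is zero since $X_1$ and $X_2$ have zero means. (Note that we effectively have one equation and four variables so long as we ensure that $C$ is not singular.) Let $c_1 = c_3 = 1$, let $c_2 = 0$, and note that $E[Z_1Z_2] = E[X_1(X_1 + c_4X_2)] = E[X_1^2] + c_4E[X_1X_2]$. Note that $E[X_1^2] = 1$ and $E[X_1X_2] = 1/3$. Thus, $Z_1$ and $Z_2$ are independent if $c_4 = -3$ with $c_1$, $c_2$, and $c_3$ as given above.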
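10.6. Recall that
$$M_X(t) = \exp\left(\frac{\sigma_1^2t^2}{2} + tm_1\right)\quad\text{and}\quad M_Y(t) = \exp\left(\frac{\sigma_2^2t^2}{2} + tm_2\right).$$
Thus, by independence,
$$M_{X+Y}(t) = M_X(t)M_Y(t) = \exp\left(\frac{(\sigma_1^2 + \sigma_2^2)t^2}{2} + t(m_1 + m_2)\right),$$
which implies that $X + Y$ is $N(m_1 + m_2, \sigma_1^2 + \sigma_2^2)$.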
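10.7. Note that
$$\begin{aligned}
M_X(t) &= \int_{\mathbb{R}} e^{tx}\left(\lambda f_1(x) + (1 - \lambda)f_2(x)\right)dx \\
&= \lambda\int_{\mathbb{R}} e^{tx} f_1(x)\,dx + (1 - \lambda)\int_{\mathbb{R}} e^{tx} f_2(x)\,dx \\
&= \lambda\exp\left(\frac{\sigma_1^2 t^2}{2} + tm_1\right) + (1 - \lambda)\exp\left(\frac{\sigma_2^2 t^2}{2} + tm_2\right) \\
&= \lambda M_1(t) + (1 - \lambda)M_2(t),
\end{aligned}$$
where $M_1$ is the moment generating function associated with $f_1$ and $M_2$ is the moment generating function associated with $f_2$. Note, also, that
$$\begin{aligned}
E[X] &= M_X'(0) \\
&= \left[\lambda(\sigma_1^2 t + m_1)M_1(t) + (1 - \lambda)(\sigma_2^2 t + m_2)M_2(t)\right]_{t=0} \\
&= \lambda m_1 + (1 - \lambda)m_2.
\end{aligned}$$
Further,
$$\begin{aligned}
E[X^2] &= M_X''(0) \\
&= \big[\lambda\sigma_1^2 M_1(t) + \lambda(\sigma_1^2 t + m_1)^2 M_1(t) \\
&\qquad + (1 - \lambda)\sigma_2^2 M_2(t) + (1 - \lambda)(\sigma_2^2 t + m_2)^2 M_2(t)\big]_{t=0} \\
&= \lambda\sigma_1^2 + \lambda m_1^2 + (1 - \lambda)\sigma_2^2 + (1 - \lambda)m_2^2.
\end{aligned}$$
Thus,
$$\begin{aligned}
\mathrm{VAR}[X] &= E[X^2] - E[X]^2 \\
&= \lambda\sigma_1^2 + (1 - \lambda)\sigma_2^2 + \lambda m_1^2 + (1 - \lambda)m_2^2 - \left(\lambda m_1 + (1 - \lambda)m_2\right)^2 \\
&= \lambda\sigma_1^2 + (1 - \lambda)\sigma_2^2 + \lambda(1 - \lambda)(m_1 - m_2)^2.
\end{aligned}$$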
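A minimal numerical check of the mixture formulas: the sketch integrates $x$ and $x^2$ against $\lambda f_1 + (1 - \lambda)f_2$ with Simpson's rule and compares the results with the closed forms for $E[X]$ and $\mathrm{VAR}[X]$. The parameter values are hypothetical; in the code, `s1` and `s2` denote the variances $\sigma_1^2$ and $\sigma_2^2$.

```python
import math

def normal_pdf(x, m, var):
    """Density of N(m, var) at x."""
    return math.exp(-(x - m) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mixture_moments(lam, m1, s1, m2, s2, num=20000):
    """Simpson's-rule approximations of E[X] and E[X^2] for the density lam*f1 + (1-lam)*f2."""
    spread = 12 * math.sqrt(max(s1, s2))
    lo, hi = min(m1, m2) - spread, max(m1, m2) + spread
    h = (hi - lo) / num
    e1 = e2 = 0.0
    for i in range(num + 1):
        x = lo + i * h
        w = 1 if i in (0, num) else (4 if i % 2 else 2)
        f = lam * normal_pdf(x, m1, s1) + (1 - lam) * normal_pdf(x, m2, s2)
        e1 += w * x * f
        e2 += w * x * x * f
    return e1 * h / 3, e2 * h / 3

lam, m1, s1, m2, s2 = 0.3, -1.0, 0.5, 2.0, 1.5  # s1, s2 are the variances sigma_1^2, sigma_2^2
mean_closed = lam * m1 + (1 - lam) * m2
var_closed = lam * s1 + (1 - lam) * s2 + lam * (1 - lam) * (m1 - m2) ** 2
e1, e2 = mixture_moments(lam, m1, s1, m2, s2)
var_numeric = e2 - e1 ** 2
print(mean_closed, e1, var_closed, var_numeric)
```

The term $\lambda(1 - \lambda)(m_1 - m_2)^2$ is the extra spread contributed by the separation of the two component means.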

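10.8. Let $Z = \max(X, Y)$ and note that $F_Z(z) = F_{X,Y}(z, z)$ via Problem 5.5. Let $F$ be the distribution function of $X$ (and $Y$) and let $f$ be a density function for $X$ (and $Y$). Then, since $F_Z(z) = F(z)F(z)$, it follows that $f_Z(z) = F_Z'(z) = 2F(z)f(z)$. Thus,
$$\begin{aligned}
E[Z] &= \int_{-\infty}^{\infty} z f_Z(z)\,dz \\
&= \int_{-\infty}^{\infty} 2z f(z)F(z)\,dz \\
&= 2\int_{-\infty}^{\infty} z\frac{1}{\sqrt{2\pi}}e^{-z^2/2}\int_{-\infty}^{z}\frac{1}{\sqrt{2\pi}}e^{-t^2/2}\,dt\,dz \\
&= \frac{1}{\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{z} z e^{-z^2/2}e^{-t^2/2}\,dt\,dz \\
&= \frac{1}{\pi}\int_{-\infty}^{\infty}\int_{t}^{\infty} e^{-t^2/2} z e^{-z^2/2}\,dz\,dt \\
&= \frac{1}{\pi}\int_{-\infty}^{\infty}\int_{t}^{\infty} z e^{-(z^2 + t^2)/2}\,dz\,dt \\
&= \frac{1}{\pi}\int_{-\infty}^{\infty}\int_{2t^2}^{\infty}\frac{1}{2}e^{-w/2}\,dw\,dt \\
&= \frac{1}{\pi}\int_{-\infty}^{\infty} e^{-t^2}\,dt \\
&= \frac{1}{\sqrt{2}\,\pi}\int_{-\infty}^{\infty} e^{-s^2/2}\,ds = \frac{1}{\sqrt{\pi}}.
\end{aligned}$$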
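Assuming, as the integrals above indicate, that $X$ and $Y$ are independent standard Gaussian random variables, the value $E[\max(X, Y)] = 1/\sqrt{\pi} \approx 0.5642$ can be confirmed by integrating $2z f(z)F(z)$ numerically. The sketch uses `math.erf` for the standard Gaussian distribution function; the grid size and truncation width are arbitrary choices.

```python
import math

def phi(z):
    """Standard Gaussian density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):
    """Standard Gaussian distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def expected_max(num=20000, half_width=12.0):
    """Simpson's rule for E[max(X, Y)] = integral of 2 z phi(z) Phi(z) dz over [-half_width, half_width]."""
    a, b = -half_width, half_width
    h = (b - a) / num
    total = 0.0
    for i in range(num + 1):
        z = a + i * h
        w = 1 if i in (0, num) else (4 if i % 2 else 2)
        total += w * 2 * z * phi(z) * Phi(z)
    return total * h / 3

numeric = expected_max()
exact = 1 / math.sqrt(math.pi)
print(numeric, exact)
```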
11.1. Note that the $X_n$'s are mutually independent and that $\sum_{n=1}^{\infty} P(X_n = 1) = \infty$. Thus, the second Borel-Cantelli Lemma implies that $P(\limsup\{X_n = 1\}) = 1$. That is, there exists a null set $A$ such that for any $\omega \in A^c$, $X_n(\omega) = 1$ for infinitely many values of $n$. Hence, $X_n(\omega)$ does not converge to zero for any $\omega \in A^c$. Since $P(A^c) = 1$ it follows that the $X_n$'s cannot converge to zero almost surely.
11.2. Since $C_{X_i}(t) = e^{-|t|}$ for each $i$ and since the $X_i$'s are mutually independent, it follows that
$$C_{S_n/n}(t) = \left(C_{X_i}\!\left(\frac{t}{n}\right)\right)^n = C_{X_i}(t).$$
Thus, $S_n/n$ has a Cauchy distribution centered at zero with parameter 1. That is, the distribution of the normalized sum is the same as the distribution of any specific random variable in the sum.
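To see this empirically, the sketch below assumes the $X_i$ are independent standard Cauchy variables (characteristic function $e^{-|t|}$), generated by inverse-CDF sampling. Since $P(|C| \le 1) = 1/2$ for a standard Cauchy variable $C$, the fraction of sample means $S_n/n$ landing in $[-1, 1]$ should stay near $1/2$ however large $n$ is. The seed, $n$, and the number of trials are arbitrary.

```python
import math
import random

rng = random.Random(7)

def standard_cauchy():
    """Inverse-CDF sampling: tan(pi (U - 1/2)) is standard Cauchy when U is uniform on (0, 1)."""
    return math.tan(math.pi * (rng.random() - 0.5))

n, trials = 50, 20000
sample_means = [sum(standard_cauchy() for _ in range(n)) / n for _ in range(trials)]

# If S_n/n is standard Cauchy, P(|S_n/n| <= 1) = 1/2 for every n.
frac_within_one = sum(1 for s in sample_means if abs(s) <= 1) / trials

# The characteristic-function identity used above: (e^{-|t|/n})^n = e^{-|t|}.
t = 2.7
cf_gap = math.exp(-abs(t) / n) ** n - math.exp(-abs(t))
print(frac_within_one, cf_gap)
```

For a distribution with finite variance the same fraction would tend to 1 as $n$ grows; here averaging does not concentrate the mean at all.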
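11.3. Consider a sequence $\{X_n\}_{n \in \mathbb{N}}$ of random variables such that
$$X_n = \begin{cases} n^3 & \text{with probability } 1/n^2 \\ 0 & \text{with probability } 1 - (1/n^2). \end{cases}$$
Fix $\varepsilon > 0$ and note that
$$P(|X_n - 0| \ge \varepsilon) = P(X_n \ge \varepsilon) = \begin{cases} 0 & \text{if } \varepsilon > n^3 \\ 1/n^2 & \text{if } 0 < \varepsilon \le n^3. \end{cases}$$
Thus, since $P(X_n \ge \varepsilon) \to 0$ as $n \to \infty$ for any positive $\varepsilon$, we conclude that $X_n$ converges in probability to zero. Note, however, that since $E[X_n^p] = n^{3p}/n^2 = n^{3p-2} \to \infty$, it follows that the $X_n$'s do not converge to zero in $L_p$ for any $p > 1$.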
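The two computations in the solution can be tabulated exactly with Python's `fractions` module: the tail probability $P(X_n \ge \varepsilon)$ shrinks like $1/n^2$, while $E[X_n] = n$ grows without bound. The helper names and the choice $\varepsilon = 1/10$ are illustrative.

```python
from fractions import Fraction

def tail_prob(n, eps):
    """P(X_n >= eps) for eps > 0, where X_n = n^3 with probability 1/n^2 and 0 otherwise."""
    return Fraction(1, n * n) if eps <= n ** 3 else Fraction(0)

def pth_moment(n, p):
    """E[X_n^p] = (n^3)^p * (1/n^2) = n^(3p - 2)."""
    return Fraction(n ** (3 * p), n * n)

eps = Fraction(1, 10)
ns = [1, 10, 100, 1000]
tails = [tail_prob(n, eps) for n in ns]         # shrinks like 1/n^2
first_moments = [pth_moment(n, 1) for n in ns]  # E[X_n] = n grows without bound
print(tails, first_moments)
```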
11.4. By Theorem 5.40 we know that $X_n \to c$ in distribution if $X_n \to c$ in probability. We will show that $X_n \to c$ in probability if $X_n \to c$ in distribution. If we consider $c$ to be a constant random variable then $F_c(x) = I_{[c, \infty)}(x)$. Let $\varepsilon > 0$ be given and note that
$$P(|X_n - c| \ge \varepsilon) = P(X_n \le c - \varepsilon) + P(X_n \ge c + \varepsilon).$$
Note that $P(X_n \le c - \varepsilon) = F_{X_n}(c - \varepsilon) \to 0$ as $n \to \infty$ since $F_{X_n}(x) \to F_c(x) = 0$ as $n \to \infty$ for all $x < c$. Similarly, since $F_{X_n}(x) \to F_c(x) = 1$ for all $x > c$, we have $P(X_n \ge c + \varepsilon) \le 1 - F_{X_n}(c + \varepsilon/2) \to 0$ as $n \to \infty$. From this we see that $X_n \to c$ in probability.
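11.5. Note that
$$\begin{aligned}
P(Z_n \le z) &= P\left(n(1 - \max\{X_1, \ldots, X_n\}) \le z\right) \\
&= P\left(1 - \max\{X_1, \ldots, X_n\} \le \tfrac{z}{n}\right) \\
&= P\left(\max\{X_1, \ldots, X_n\} \ge 1 - \tfrac{z}{n}\right) \\
&= 1 - P\left(\max\{X_1, \ldots, X_n\} < 1 - \tfrac{z}{n}\right) \\
&= 1 - P\left(X_1 < 1 - \tfrac{z}{n}, \ldots, X_n < 1 - \tfrac{z}{n}\right) \\
&= 1 - P\left(X_1 < 1 - \tfrac{z}{n}\right)\cdots P\left(X_n < 1 - \tfrac{z}{n}\right) \\
&= 1 - \left(1 - \tfrac{z}{n}\right)^n
\end{aligned}$$
for $0 < z < n$. Recall that
$$\left(1 - \frac{z}{n}\right)^n \to e^{-z}$$
as $n \to \infty$. Thus, $F_{Z_n}(z) \to 1 - e^{-z}$ for $0 < z < \infty$ as $n \to \infty$.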
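The derivation uses $P(X_i < 1 - z/n) = 1 - z/n$, which holds when the $X_i$ are uniform on $(0, 1)$; this sketch assumes exactly that. It compares the exact distribution function $1 - (1 - z/n)^n$ with the limit $1 - e^{-z}$ and checks the exact formula against a small simulation. The value $z = 1.3$, the seed, and the sample sizes are arbitrary.

```python
import math
import random

def cdf_zn(z, n):
    """Exact P(Z_n <= z) = 1 - (1 - z/n)^n for 0 < z < n (uniform X_i assumed)."""
    return 1 - (1 - z / n) ** n

def cdf_limit(z):
    """Limiting distribution function 1 - e^{-z}, z > 0."""
    return 1 - math.exp(-z)

z = 1.3
gaps = [cdf_zn(z, n) - cdf_limit(z) for n in (10, 100, 1000, 10000)]  # positive, shrinking

# Simulate Z_n = n (1 - max{X_1, ..., X_n}) with n = 20 to check the exact formula.
rng = random.Random(11)
n, trials = 20, 50000
hits = sum(1 for _ in range(trials) if n * (1 - max(rng.random() for _ in range(n))) <= z)
mc_estimate = hits / trials
print(gaps, mc_estimate, cdf_zn(z, n))
```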

11.6. If the $X_i$'s are mutually independent uniform random variables taking values in the interval $(-0.05, 0.05)$ then $X_i$ has mean zero and variance $\sigma^2 = 0.01/12$ for each $i$. Let $Z$ be a random variable with a standard Gaussian distribution function and let $\Phi(x) = P(Z \le x)$. The Central Limit Theorem implies that $S_n$ has approximately the same distribution as $Z\sigma\sqrt{n}$. Thus, with $n = 1000$,
$$\begin{aligned}
P(|S_n| < 2) &\approx P\left(|Z| < \frac{2}{\sqrt{1000(0.01)/12}}\right) \\
&= P\left(|Z| < \frac{2}{0.91}\right) \\
&= P(|Z| < 2.19) \\
&= 2\Phi(2.19) - 1 \approx 0.97.
\end{aligned}$$
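The arithmetic in this Central Limit Theorem approximation can be reproduced directly, writing $\Phi$ in terms of `math.erf`. The numbers follow the solution ($n = 1000$ and the interval $(-0.05, 0.05)$); no simulation is involved.

```python
import math

def Phi(x):
    """Standard Gaussian distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n = 1000
var_each = 0.1 ** 2 / 12           # variance of a uniform variable on (-0.05, 0.05)
sd_sum = math.sqrt(n * var_each)   # standard deviation of S_n, about 0.913
threshold = 2 / sd_sum             # about 2.19
approx = 2 * Phi(threshold) - 1    # CLT approximation to P(|S_n| < 2)
print(sd_sum, threshold, approx)
```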


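11.7. Let $X_i$ equal 0 or 1 if the $i$th flip is tails or heads, respectively. Thus, $X = \sum_{i=1}^{10{,}000} X_i$ denotes the number of heads that are observed in 10,000 flips. We will approximate $P(X = 5000)$ by finding $P(4999.5 < X < 5000.5)$ with an appeal to the Central Limit Theorem.
The Central Limit Theorem implies that $(X - 10{,}000\,E[X_i])/(\sqrt{10{,}000}\sqrt{\mathrm{VAR}[X_i]})$ has approximately a standard Gaussian distribution. Let $Z$ be a standard Gaussian random variable with distribution function $\Phi$, and note that $E[X_i] = \frac{1}{2}$ and $\mathrm{VAR}[X_i] = E[X_i^2] - E^2[X_i] = \frac{1}{2} - \frac{1}{4} = \frac{1}{4}$. Thus,
$$\begin{aligned}
P(4999.5 < X < 5000.5) &\approx P\left(\frac{4999.5 - 10{,}000(\frac{1}{2})}{\sqrt{10{,}000}\sqrt{\frac{1}{4}}} < Z < \frac{5000.5 - 10{,}000(\frac{1}{2})}{\sqrt{10{,}000}\sqrt{\frac{1}{4}}}\right) \\
&= P\left(-\frac{1}{100} < Z < \frac{1}{100}\right) \\
&= \Phi\left(\frac{1}{100}\right) - \Phi\left(-\frac{1}{100}\right) \\
&= 2\left(\Phi\left(\frac{1}{100}\right) - \Phi(0)\right) \\
&= 2\Phi\left(\frac{1}{100}\right) - 1.
\end{aligned}$$
Note that $\Phi(0.01) = 0.5040$. Hence, $P(X = 5000) \approx 2(0.5040) - 1 = 0.008$.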
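Because $X$ is binomial with $n = 10{,}000$ and $p = 1/2$, the exact value of $P(X = 5000)$ can be computed with Python's big-integer `math.comb` (Python 3.8 or later) and compared with the continuity-corrected Central Limit Theorem approximation. Both come out near $0.00798$, which rounds to the $0.008$ found above.

```python
import math

def Phi(x):
    """Standard Gaussian distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n = 10_000
mean = n * 0.5               # n E[X_i]
sd = math.sqrt(n * 0.25)     # sqrt(n VAR[X_i]) = 50
clt = Phi((5000.5 - mean) / sd) - Phi((4999.5 - mean) / sd)
exact = math.comb(n, 5000) / 2 ** n   # exact binomial probability (big-integer arithmetic)
print(clt, exact)
```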
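11.8. Note that
$$\begin{aligned}
E[(X_n - \alpha)^2] &= E[X_n^2] - 2\alpha E[X_n] + \alpha^2 \\
&= \mathrm{VAR}[X_n] + E[X_n]^2 - 2\alpha E[X_n] + \alpha^2 \\
&= \mathrm{VAR}[X_n] + (E[X_n] - \alpha)^2 \\
&\to 0
\end{aligned}$$
if and only if $\mathrm{VAR}[X_n] \to 0$ and $E[X_n] \to \alpha$ as $n \to \infty$, since both terms in the last sum are nonnegative.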

12.1. Note that $X$ and $Y$ are mutually Gaussian and that $X$ and $Y$ are uncorrelated since $E[XY] = E[(U + V)(U - V)] = E[U^2] - E[V^2] = 0$. Thus, $X$ and $Y$ are independent. Problem 10.6 implies that $X$ and $Y$ are each $N(0, 2)$.

Note that $E[X|U] = E[U + V|U] = E[U|U] + E[V|U] = U + E[V] = U$ a.s. Similarly, $E[Y|U] = U$ a.s. Thus, not only are $E[X|U]$ and $E[Y|U]$ not independent, but they are each equal almost surely to the same positive variance random variable.
12.2. Let $\Omega = \{a, b, c\}$ and define a measure $P$ on $\mathbb{P}(\Omega)$ via $P(\{a\}) = P(\{b\}) = P(\{c\}) = 1/3$. Define random variables $X$ and $Y$ on the resulting probability space via $Y(a) = 1$, $Y(b) = Y(c) = -1$, $X(a) = 1$, $X(b) = 2$, and $X(c) = 0$. Since the distributions are discrete the second condition in the definition of conditional expectation reduces to
$$\sum_{\omega \in M} E[X|Y](\omega)P(\{\omega\}) = \sum_{\omega \in M} X(\omega)P(\{\omega\}),$$
where $M \in \sigma(Y) = \{\varnothing, \Omega, \{a\}, \{b, c\}\}$. Substitution for $M$ implies that $E[X|Y](a) = X(a)$ and $E[X|Y](b) + E[X|Y](c) = X(b) + X(c)$. Since $E[X|Y]$ is $\sigma(Y)$-measurable it follows that $E[X|Y](b) = E[X|Y](c)$. Thus, $E[X|Y](a) = E[X|Y](b) = E[X|Y](c) = 1$ and we see that $E[X|Y] = E[X] = 1/3 + 2/3 = 1$ as required. However, $X$ and $Y$ are not independent since $P(X = 1, Y = 1) = 1/3$ yet $P(X = 1)P(Y = 1) = 1/9$.
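The three-point example can be checked mechanically: on a finite space, $E[X|Y]$ is obtained by averaging $X$ over each atom $\{Y = y\}$ of $\sigma(Y)$. The sketch uses exact fractions; the helper name `cond_exp_given` is ours, not the book's.

```python
from fractions import Fraction

omega = ["a", "b", "c"]
P = {w: Fraction(1, 3) for w in omega}
X = {"a": 1, "b": 2, "c": 0}
Y = {"a": 1, "b": -1, "c": -1}

def cond_exp_given(xv, yv):
    """E[X|Y] on a finite space: on each atom {Y = y} of sigma(Y), average X with respect to P."""
    result = {}
    for y in set(yv.values()):
        atom = [w for w in omega if yv[w] == y]
        p_atom = sum(P[w] for w in atom)
        value = sum(xv[w] * P[w] for w in atom) / p_atom
        for w in atom:
            result[w] = value
    return result

e_x_given_y = cond_exp_given(X, Y)
e_x = sum(X[w] * P[w] for w in omega)
p_joint = sum(P[w] for w in omega if X[w] == 1 and Y[w] == 1)
p_product = (sum(P[w] for w in omega if X[w] == 1)
             * sum(P[w] for w in omega if Y[w] == 1))
print(e_x_given_y, e_x, p_joint, p_product)
```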
12.3. Note that $E[Z|X] = E[XY|X] = XE[Y|X] = XE[Y] = 0$ a.s. Similarly, $E[Z|Y] = 0$ a.s. However, $Z$ is $\sigma(X, Y)$-measurable and hence $E[Z|X, Y] = Z$ a.s.
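12.4. Since $E[X|\mathcal{F}]$ is $\mathcal{F}$-measurable it follows that $E[X|\mathcal{F}] = a_1I_A + a_2I_B + a_3I_C$ for some real constants $a_1$, $a_2$, and $a_3$. Recall that $E[X|\mathcal{F}]$ must satisfy
$$\int_F E[X|\mathcal{F}]\,d\lambda = \int_F X\,d\lambda$$
for all $F \in \mathcal{F}$. Choosing $F = A$ implies that
$$a_1\lambda(A) = \int_A X\,d\lambda,$$
which in turn implies that
$$a_1 = \frac{1}{\lambda(A)}\int_0^{1/4} w^2\,dw = \frac{1}{48}.$$
Similarly, $a_2 = 97/432$ and $a_3 = 19/27$.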
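The statement of Problem 12.4 is not reproduced here, so this sketch rests on assumptions: the integral $\int_0^{1/4} w^2\,dw$ suggests $X(w) = w^2$ on $[0, 1]$ with Lebesgue measure and $A = [0, 1/4)$, and the hypothetical blocks $B = [1/4, 2/3)$ and $C = [2/3, 1]$ are an inference, chosen because they reproduce the stated values $97/432$ and $19/27$. On each block the coefficient is the average of $X$ over that block.

```python
from fractions import Fraction as F

def block_average_of_square(a, b):
    """(1 / (b - a)) * integral_a^b w^2 dw: the value of E[X|F] on [a, b) if X(w) = w^2."""
    return (b ** 3 - a ** 3) / (3 * (b - a))

# Hypothetical partition points: A = [0, 1/4), B = [1/4, 2/3), C = [2/3, 1].
cuts = [F(0), F(1, 4), F(2, 3), F(1)]
coefficients = [block_average_of_square(cuts[i], cuts[i + 1]) for i in range(3)]
print(coefficients)
```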


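12.5. Note that if $A \in \sigma(Y)$ then
$$\begin{aligned}
\int_A E[XZ|Y]\,dP &= \int_A XZ\,dP \\
&= \int I_A XZ\,dP \\
&= E[I_A XZ] \\
&= E[X]E[I_A Z] \\
&= E[X]\int_A Z\,dP \\
&= E[X]\int_A E[Z|Y]\,dP.
\end{aligned}$$
Since $E[X]E[Z|Y]$ is $\sigma(Y)$-measurable, it follows that $E[XZ|Y] = E[X]E[Z|Y]$ a.s.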

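12.6. Recall from Problem 12.5 that if $X$ and $Z$ are independent and if $X$ and $Y$ are independent then $E[XZ|Y] = E[X]E[Z|Y]$ a.s. Let $\mathcal{F}_n$ denote $\sigma(X_1, \ldots, X_n)$. Thus, it follows that
$$\begin{aligned}
E\left[\left(\sum_{k=1}^{n+1} Y_k\right)^2 - (n+1)\sigma^2 \,\middle|\, \mathcal{F}_n\right]
&= E\left[\left(\sum_{k=1}^{n} Y_k + Y_{n+1}\right)^2 - (n+1)\sigma^2 \,\middle|\, \mathcal{F}_n\right] \\
&= E\left[\left(\sum_{k=1}^{n} Y_k\right)^2 + 2Y_{n+1}\sum_{k=1}^{n} Y_k + Y_{n+1}^2 - n\sigma^2 - \sigma^2 \,\middle|\, \mathcal{F}_n\right] \\
&= E\left[X_n + 2Y_{n+1}\sum_{k=1}^{n} Y_k + Y_{n+1}^2 - \sigma^2 \,\middle|\, \mathcal{F}_n\right] \\
&= E[X_n|\mathcal{F}_n] + 2E\left[Y_{n+1}\sum_{k=1}^{n} Y_k \,\middle|\, \mathcal{F}_n\right] + E[Y_{n+1}^2|\mathcal{F}_n] - \sigma^2 \\
&= X_n + 2E[Y_{n+1}]E\left[\sum_{k=1}^{n} Y_k \,\middle|\, \mathcal{F}_n\right] + E[Y_{n+1}^2] - \sigma^2 \\
&= X_n \text{ a.s.},
\end{aligned}$$
where in the second to the last step we used our first result and noticed that $Y_{n+1}$ is independent of $X_i$ for $i \le n$.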
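A small exact check of the martingale property, under the hypothetical choice $Y_k = \pm 1$ with probability $1/2$ each (so $E[Y_k] = 0$ and $\sigma^2 = 1$). For every path $(Y_1, \ldots, Y_n)$ the sketch averages $X_{n+1}$ over the two values of $Y_{n+1}$ and compares the result with $X_n$. This conditions on $\sigma(Y_1, \ldots, Y_n)$, which contains $\mathcal{F}_n$, so by the tower property it also gives $E[X_{n+1}|\mathcal{F}_n] = X_n$.

```python
from itertools import product

SIGMA2 = 1  # variance of each Y_k under the hypothetical choice Y_k = +1 or -1 with probability 1/2

def x_n(path):
    """X_n = (Y_1 + ... + Y_n)^2 - n * sigma^2 for a path (Y_1, ..., Y_n)."""
    return sum(path) ** 2 - len(path) * SIGMA2

def martingale_gap(n):
    """Largest |E[X_{n+1} | Y_1, ..., Y_n] - X_n| over all 2^n paths of length n."""
    worst = 0.0
    for path in product((-1, 1), repeat=n):
        next_mean = (x_n(path + (1,)) + x_n(path + (-1,))) / 2  # average over Y_{n+1} = +1, -1
        worst = max(worst, abs(next_mean - x_n(path)))
    return worst

gaps = [martingale_gap(n) for n in range(1, 11)]
print(gaps)
```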
12.7. Note that
$$\begin{aligned}
\mathrm{VAR}[X|Y] &= E[(X - E[X|Y])^2|Y] \\
&= E[X^2 - 2XE[X|Y] + E[X|Y]^2|Y] \\
&= E[X^2|Y] - 2E[XE[X|Y]|Y] + E[E[X|Y]^2|Y] \\
&= E[X^2|Y] - 2E[X|Y]E[X|Y] + E[X|Y]^2 \\
&= E[X^2|Y] - E[X|Y]^2.
\end{aligned}$$
Thus,
$$E[\mathrm{VAR}[X|Y]] = E[E[X^2|Y]] - E[E[X|Y]^2] = E[X^2] - E[E[X|Y]^2]$$
and
$$\mathrm{VAR}[E[X|Y]] = E[E[X|Y]^2] - E[E[X|Y]]^2 = E[E[X|Y]^2] - E[X]^2.$$
Thus, $E[\mathrm{VAR}[X|Y]] + \mathrm{VAR}[E[X|Y]] = E[X^2] - E[X]^2 = \mathrm{VAR}[X]$.
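The identity $\mathrm{VAR}[X] = E[\mathrm{VAR}[X|Y]] + \mathrm{VAR}[E[X|Y]]$ can be verified exactly on any finite joint distribution. The pmf below is hypothetical; conditional moments are computed by summing over the atoms $\{Y = y\}$ with exact fractions.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (X, Y); the probabilities sum to 1.
pmf = {(0, 0): F(1, 8), (1, 0): F(1, 8), (3, 0): F(1, 4),
       (-1, 1): F(1, 6), (2, 1): F(1, 3)}

def expect(g):
    """E[g(X, Y)] under the pmf."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())

p_y = {}
for (x, y), p in pmf.items():
    p_y[y] = p_y.get(y, 0) + p

def cond_moment(k, y):
    """E[X^k | Y = y]."""
    return sum(p * x ** k for (x, yy), p in pmf.items() if yy == y) / p_y[y]

var_x = expect(lambda x, y: x * x) - expect(lambda x, y: x) ** 2
e_cond_var = sum(p_y[y] * (cond_moment(2, y) - cond_moment(1, y) ** 2) for y in p_y)
var_cond_mean = (sum(p_y[y] * cond_moment(1, y) ** 2 for y in p_y)
                 - sum(p_y[y] * cond_moment(1, y) for y in p_y) ** 2)
print(var_x, e_cond_var + var_cond_mean)
```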
12.8. To begin, note that
$$\begin{aligned}
E[(X - g(Y))^2] &= E[(X - E[X|Y] + E[X|Y] - g(Y))^2] \\
&= E[(X - E[X|Y])^2] + 2E[(X - E[X|Y])(E[X|Y] - g(Y))] \\
&\qquad + E[(E[X|Y] - g(Y))^2].
\end{aligned}$$
The first result now follows since
$$\begin{aligned}
E[(X - E[X|Y])(E[X|Y] - g(Y))] &= E[E[(X - E[X|Y])(E[X|Y] - g(Y))|Y]] \\
&= E[(E[X|Y] - g(Y))E[X - E[X|Y]|Y]] \\
&= E[(E[X|Y] - g(Y))(E[X|Y] - E[X|Y])] \\
&= 0.
\end{aligned}$$
From this result it is clear that $E[(X - g(Y))^2]$ is minimized over all Borel measurable functions $g$ when we let $g(y) = E[X|Y = y]$.
12.9. Let $A = \{1, 3, 5\}$, let $B = \{2, 4, 6\}$, and note that $E[X|\mathcal{G}] = a_1I_A + a_2I_B$ for some choice of $a_1$ and $a_2$ from $\mathbb{R}$. Since
$$\int_A E[X|\mathcal{G}]\,dP = \int_A a_1\,dP = \int_A X\,dP,$$
it follows that $a_1 = 3$. That is, since $a_1P(A) = 1P(\{1\}) + 3P(\{3\}) + 5P(\{5\})$, it follows that $a_1/2 = 9/6$. Similarly, it follows that $a_2 = 4$. Thus, $E[X|\mathcal{G}] = 3I_A + 4I_B$.


12.10. It follows by the definition of conditional expectation that
$$E[X|\mathcal{G}](\omega) = \frac{\int_{D_i} X\,dP}{P(D_i)}$$
for all $\omega \in D_i$.
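This formula translates directly into code when $\mathcal{G}$ is generated by a finite partition. As an illustration, the sketch applies it to the partition $\{1, 3, 5\}$, $\{2, 4, 6\}$ of Solution 12.9, assuming (as the computation there implies) a fair die with $X$ equal to the face shown; it returns the values 3 and 4 found there. The function name is hypothetical.

```python
from fractions import Fraction

def cond_exp_on_partition(values, probs, blocks):
    """E[X|G] for G generated by a finite partition: on each block D_i the value is
    (sum over D_i of X dP) / P(D_i)."""
    out = {}
    for block in blocks:
        p_block = sum(probs[w] for w in block)
        avg = sum(values[w] * probs[w] for w in block) / p_block
        for w in block:
            out[w] = avg
    return out

die = range(1, 7)
probs = {w: Fraction(1, 6) for w in die}   # fair die (assumption)
values = {w: w for w in die}               # X(w) = w, the face shown
result = cond_exp_on_partition(values, probs, [{1, 3, 5}, {2, 4, 6}])
print(result)
```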
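12.11. To begin, note that
$$E[(X - Y)^2] = E[X^2] - 2E[XY] + E[Y^2],$$
where
$$\begin{aligned}
E[XY] &= E[E[Y|\mathcal{G}_2]Y] \\
&= E[E[E[Y|\mathcal{G}_2]Y|\mathcal{G}_2]] \\
&= E[E[Y|\mathcal{G}_2]^2] \\
&= E[X^2]
\end{aligned}$$
and
$$\begin{aligned}
E[XY] &= E[XE[X|\mathcal{G}_1]] \\
&= E[E[XE[X|\mathcal{G}_1]|\mathcal{G}_1]] \\
&= E[E[X|\mathcal{G}_1]^2] \\
&= E[Y^2].
\end{aligned}$$
Thus, $E[(X - Y)^2] = E[XY] - 2E[XY] + E[XY] = 0$ and, hence, the desired result follows via Problem 7.3.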
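12.12. Since $E[Y|X] = \alpha + \beta X$ it follows that $E[E[Y|X]] = E[Y] = \alpha + \beta E[X]$. Substitution implies that $4 = \alpha + 3\beta$. Note also that
$$\begin{aligned}
E[XY] &= E[E[XY|X]] \\
&= E[XE[Y|X]] \\
&= E[X(\alpha + \beta X)] \\
&= \alpha E[X] + \beta E[X^2] \\
&= \alpha E[X] + \beta\,\mathrm{VAR}[X] + \beta E[X]^2.
\end{aligned}$$
Substituting here implies that $-3 = 3\alpha + 11\beta$. Solving these two equations yields $\alpha = 53/2$ and $\beta = -15/2$.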
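The two linear equations $\alpha + 3\beta = 4$ and $3\alpha + 11\beta = -3$ can be solved exactly, for instance by Cramer's rule with fractions; `solve_2x2` is a hypothetical helper written for this check.

```python
from fractions import Fraction as F

def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*u + a12*v = b1 and a21*u + a22*v = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("singular system")
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det

# alpha + 3 beta = 4 and 3 alpha + 11 beta = -3
alpha, beta = solve_2x2(F(1), F(3), F(4), F(3), F(11), F(-3))
print(alpha, beta)
```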


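12.13. To begin, note that
$$E[Y^2|X = x] = \sigma_Y^2(1 - \rho^2) + \left(\frac{\rho\sigma_Y x}{\sigma_X}\right)^2$$
since, for each fixed $x$, a conditional density $f(y|x)$ of $Y$ given $X = x$ is Gaussian with mean $\rho\sigma_Y x/\sigma_X$ and variance $\sigma_Y^2(1 - \rho^2)$. Next, recall that a moment generating function for $X$ exists and is given by $M_X(t) = \exp(\sigma_X^2 t^2/2)$. Finding the fourth derivative of this function and evaluating it at $t = 0$ implies that $E[X^4] = 3\sigma_X^4$. Thus, since $\rho = E[XY]/(\sigma_X\sigma_Y)$, it follows that
$$\begin{aligned}
E[X^2Y^2] &= E[E[X^2Y^2|X]] \\
&= E[X^2E[Y^2|X]] \\
&= E\left[X^2\left(\sigma_Y^2(1 - \rho^2) + \rho^2\sigma_Y^2\sigma_X^{-2}X^2\right)\right] \\
&= \sigma_Y^2(1 - \rho^2)\sigma_X^2 + \rho^2\sigma_Y^2\sigma_X^{-2}(3\sigma_X^4) \\
&= \sigma_X^2\sigma_Y^2 + 2\rho^2\sigma_X^2\sigma_Y^2 \\
&= E[X^2]E[Y^2] + 2(E[XY])^2.
\end{aligned}$$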
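A Monte Carlo sketch of the identity $E[X^2Y^2] = E[X^2]E[Y^2] + 2(E[XY])^2$ for a zero-mean bivariate Gaussian pair. The pair is built from two independent standard Gaussians so that the correlation coefficient is $\rho$; the values $\sigma_X = \sigma_Y = 1$, $\rho = 0.5$, the seed, and the sample size are arbitrary.

```python
import math
import random

rng = random.Random(5)
sx, sy, rho = 1.0, 1.0, 0.5
n = 200_000
acc = 0.0
for _ in range(n):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x = sx * z1
    y = sy * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)  # E[XY] = rho * sx * sy
    acc += (x * y) ** 2
mc = acc / n
formula = sx ** 2 * sy ** 2 + 2 * (rho * sx * sy) ** 2  # E[X^2]E[Y^2] + 2 (E[XY])^2
print(mc, formula)
```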

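12.14. First, note that
$$\begin{aligned}
E[Y_1\cdots Y_n|X_1, \ldots, X_m] &= E[X_mY_{m+1}\cdots Y_n|X_1, \ldots, X_m] \\
&= X_mE[Y_{m+1}\cdots Y_n] \\
&= X_m.
\end{aligned}$$
Next, note that
$$\begin{aligned}
E[(X_n - X_m)\cos(X_j)] &= E[E[(X_n - X_m)\cos(X_j)|X_1, \ldots, X_m]] \\
&= E[\cos(X_j)E[X_n - X_m|X_1, \ldots, X_m]] \\
&= E[\cos(X_j)(E[X_n|X_1, \ldots, X_m] - X_m)] \\
&= E[\cos(X_j)(X_m - X_m)] \\
&= 0.
\end{aligned}$$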
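12.15. To begin, note that
$$f_Y(y) = \int_{\mathbb{R}} f(x, y)\,dx = \int_y^1 8xy\,dx = 4y(1 - y^2)$$
for $0 \le y \le 1$. Also, note that
$$f_X(x) = \int_{\mathbb{R}} f(x, y)\,dy = \int_0^x 8xy\,dy = 4x^3$$
for $0 \le x \le 1$. Thus,
$$\begin{aligned}
E[X|Y = y] &= \int_{\mathbb{R}} x\frac{f(x, y)}{f_Y(y)}\,dx \\
&= \frac{2}{1 - y^2}\int_y^1 x^2\,dx \\
&= \frac{2}{3}\,\frac{1 - y^3}{1 - y^2} \\
&= \frac{2}{3}\,\frac{1 + y + y^2}{1 + y}
\end{aligned}$$
for $0 \le y \le 1$. Further,
$$\begin{aligned}
E[Y|X = x] &= \int_{\mathbb{R}} y\frac{f(x, y)}{f_X(x)}\,dy \\
&= \frac{2}{x^2}\int_0^x y^2\,dy \\
&= \frac{2}{3}x
\end{aligned}$$
for $0 \le x \le 1$.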
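A numerical cross-check of both conditional means. The joint density is taken to be $8xy$ on $0 \le y \le x \le 1$ and zero elsewhere; this support is inferred from the limits of integration above. Simpson's rule is exact for these polynomial integrands up to rounding, so the agreement should be near machine precision. The test points $y = 0.3$ and $x = 0.8$ are arbitrary.

```python
def f(x, y):
    """Joint density 8xy on 0 <= y <= x <= 1 (support inferred from the integration limits)."""
    return 8 * x * y if 0 <= y <= x <= 1 else 0.0

def simpson(g, a, b, num=2000):
    """Composite Simpson's rule for the integral of g over [a, b] (num must be even)."""
    h = (b - a) / num
    s = g(a) + g(b)
    for i in range(1, num):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3

def cond_mean_x_given_y(y):
    """E[X | Y = y]: x ranges over [y, 1]."""
    return simpson(lambda x: x * f(x, y), y, 1) / simpson(lambda x: f(x, y), y, 1)

def cond_mean_y_given_x(x):
    """E[Y | X = x]: y ranges over [0, x]."""
    return simpson(lambda y: y * f(x, y), 0, x) / simpson(lambda y: f(x, y), 0, x)

y0, x0 = 0.3, 0.8
closed_x = 2 * (1 + y0 + y0 ** 2) / (3 * (1 + y0))
closed_y = 2 * x0 / 3
print(cond_mean_x_given_y(y0), closed_x, cond_mean_y_given_x(x0), closed_y)
```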

8.3 Solutions to True/False Questions


1. False    2. True     3. True     4. False    5. False
6. True     7. False    8. False    9. False    10. False
11. False   12. True    13. True    14. False   15. True
16. False   17. True    18. False   19. True    20. False
21. False   22. False   23. True    24. False   25. True
26. False   27. False   28. True    29. True    30. False
31. False   32. False   33. False   34. False   35. False
36. False   37. True    38. False   39. True    40. False


Index

absolutely continuous distribution, 72
affine function, 164
algebra, 24
almost always, 36
almost everywhere, 53
almost sure convergence, 116, 151
almost surely, 116
atomic distribution, 72
autocorrelation function, 139
autocovariance function, 139, 156
axiom of choice, 13

Banach space, 61
bandwidth, 161
  second moment, 162
basis, 60
Bernoulli trial, 81
Bessel's inequality, 63
bijection, 20
bijective function, 20
binomial distribution, 81
bivariate Gaussian distribution, 112
Borel measurable function, 40, 46, 73
Borel set, 39, 46
Borel-Cantelli lemma
  first, 37
  second, 77
bounded random variable, 70
bounded set, 35
bounded variation, 49
Brown, Robert, 164
Brownian motion process, 164
Buffon's needle problem, 84, 100

canonical filtration, 149
canonical projection, 18
Cantor-Lebesgue function, 74
Caratheodory criterion, 43
Caratheodory extension theorem, 86
Cartesian product
  arbitrary index set, 17
  n sets, 17
  two sets, 17
Cauchy distribution, 99, 109
Cauchy sequence, 61
Cauchy-Schwarz inequality, 100
central limit theorem, 120
central moment, 98
Chapman-Kolmogorov equation, 145
Chapman-Kolmogorov theorem, 148
characteristic function, 107, 121


Chebyshev's inequality, 100, 122
choose, 80
closed set, 30, 61
closure, 30, 146
cocountable set, 25, 41
cofinite set, 25
complement, 16
complete measure space, 45
complete metric space, 61
complete probability space, 135
complex-valued random process, 153, 156
conditional density, 130
conditional expectation, 123, 126
conditional independence, 148
conditional probability, 123
continuous function, 40, 73, 141
continuous in probability, 137
continuous time random process, 135
convergence in $L_p$, 118
convergence in distribution, 119
convergence in law, 119
convergence in mean, 118
convergence in mean-square, 118
convergence in probability, 117, 137
convergence in the pth mean, 118
convergence of random variables, 116
convergence of sets, 37
convergent sequence, 42
convexity, 64, 100, 127
convolution, 103
coordinate, 18
correlation coefficient, 101, 113
countable additivity, 33
countable cover, 88
countable set, 21
countable subadditivity, 34
countable union, 23
countably infinite set, 21
counting measure, 34
covariance, 101
covariance matrix, 114
cover, 88

data processor, 94
Dedekind's theorem, 22
DeMorgan's law, 18
dense set, 136
density function, 74
dimension, 60
Dirac measure, 34
discrete random variable, 72
discrete time random process, 135
disjoint sets, 16
disjointification, 34
distance function, 61
distribution, 93
distribution function, 70
domain, 20
dominated convergence theorem, 55
Doob, Joseph, 151
Dynkin's $\pi$-$\lambda$ theorem, 29

eigenfunction, 143
eigenvalue, 143


Einstein, Albert, 164
empty set, 13
equipotence, 21
equipotent sets, 21
equivalence relation, 19
evaluation, 18
event, 25, 75
expectation, 94
expected value, 94
exponential distribution, 109

$F_\sigma$ set, 31
factor, 18
field, 24
filtration, 149
finite dimensional distribution, 135
finite measure, 33
finite set, 21
finite-dimensional vector space, 60
first order random variable, 94
floor, 85
Fourier transform, 107
Frege, Gottlob, 13
Fubini's Theorem, 163
function of a random variable, 103
functions, 19
fundamental theorem of calculus, 98

$G_\delta$ set, 31
gain, 157
gambling, 151
Gaussian density function, 108
Gaussian distribution, 109, 130
Gaussian Markov process, 148
Gaussian process, 138
Gaussian random variable, 108
Gram-Schmidt procedure, 160
greatest lower bound, 36

half wave rectifier, 159
Hall, Eric, 74, 129
Hilbert space, 62, 128
Hilbert space projection theorem, 66, 128
Holder's inequality, 100

improper Riemann integral, 56
impulse response function, 158
independent events, 75
independent random variables, 76
index set, 14, 135
indicator function, 20
indistinguishable random processes, 135
induced measure, 57, 93
induced metric, 61
inferior limit, 36, 42
infimum, 36
infinitely often, 36
initial probability, 145
injective function, 20
inner product, 62
inner product space, 62
integrable function, 53
integrable random variable, 94
integrable sample paths, 138


integral operator, 143
integration by parts, 51
intersection, 14
inverse image, 20
invertible function, 20
isolated point, 31

Jensen's inequality, 101, 127
joint Gaussian distribution, 112
joint probability density function, 83
joint probability distribution function, 83
jointly Gaussian random variables, 113

Karhunen-Loeve expansion, 144, 166
Kolmogorov, Andrei, 69

$L_p(\Omega, \mathcal{F}, P)$, 118
$L_2$ continuous process, 141
$L_2$ differentiable process, 141
$L_2$ integrable process, 142
$L_2$ integral, 138
$\lambda$-system, 28
Laplace distribution, 109
Laplace transform, 107
law, 93
least upper bound, 36
Lebesgue decomposition theorem, 73
Lebesgue, Henri, 69
Lebesgue integrable function, 53, 55
Lebesgue integral, 52, 53
Lebesgue measurable set, 43
Lebesgue measure, 44
Leibniz's rule, 166
lim inf, 36
lim sup, 36
limit point, 30, 61
limiter, 159, 162
linear filter, 158
linear manifold, 60
linear operation, 157
linear span, 60
linearly dependent vectors, 59
linearly independent vectors, 60
Lipschitz condition, 49
lower bound, 35
lower Riemann integral, 47
Lusin's Theorem, 160
Lyapounov's inequality, 101

marginal density function, 84
Markov chain, 145
  irreducible, 146
Markov process, 147
martingale, 149
martingale convergence theorem, 151
mean, 94, 98
mean recurrence time, 146
mean-square continuity, 141, 152
mean-square integral, 138
measurable function, 38
measurable modification, 137
measurable random process, 137
measurable rectangle, 136
measurable set, 25
measurable space, 25
measure, 33
measure on an algebra, 86
measure space, 33


Mehler series, 160
Mercer's theorem, 143
metric, 61
metric space, 61
minimum mean-square estimate, 128, 148
Minkowski sum, 162
Minkowski's inequality, 100
modification, 135
moment, 98
moment generating function, 105
monotone convergence theorem, 54
monotonicity, 34
Monte Carlo analysis, 84
Monte Carlo simulation, 86
multivariate Gaussian distribution, 113
mutually Gaussian random variables, 113
mutually independent events, 75
mutually independent random variables, 76, 84

neighborhood, 31
nondecreasing function, 49
nonnegative definite function, 139
norm, 60
normed linear space, 61
nowhere differentiable function, 165
null set, 45

one-to-one function, 20
onto function, 20
open ball, 61
open Euclidean ball, 30
open interval, 39
open rectangle, 46
open set, 30
orthogonal increments, 151
orthogonal vectors, 62
orthonormal vectors, 63
outer Lebesgue measure, 43

pairwise independent events, 75
parallelogram law, 63
Parseval's equality, 160
Parseval's identity, 63
perfect set, 31
$\pi$-system, 28
$\pi_\lambda$, 18
placebo, 132
pointwise convergence, 116
pointwise limit, 42
Poisson approximation, 82
Poisson distribution, 106
positive definite matrix, 114
power set, 14
pre-Hilbert space, 62
probability density function, 74
probability distribution function, 70
probability measure, 33
probability space, 33, 70
product $\sigma$-algebra, 136
product measure, 136
product of random variables, 103
projection, 65, 128
proper subset, 14
proper subspace, 60

quantization, 129


$\mathbb{R}^n$, 46
Radon-Nikodym theorem, 68
random process, 135
random sequence, 135
random variable, 70
range, 20
rank, 114
rational line, 174
real vector space, 59
reflexive relation, 19
regression function, 129
relations, 19
relative complement, 16
Riemann integral, 47, 51, 55, 143
Riemann-Stieltjes integral, 51, 96
Riemann-Stieltjes sum, 50
Riesz-Frechet theorem, 67
right continuous function, 56, 70
Russell, Bertrand, 13

sample function, 135
sample path, 135
sample sequence, 135
sample space, 25
Schroeder-Bernstein theorem, 22
second order random process, 139
second order random variable, 98
separable modification, 136, 137
separable random process, 136
sequence of random variables, 116
set difference, 16
$\sigma_A$, 216
$\sigma$-algebra, 24
  countably generated, 41
  generated, 25, 76
$\sigma$-field, 24
$\sigma$-finite measure, 33
$\sigma$-subalgebra, 41
simple function, 51
singleton set, 14
singular distribution, 72
size of a subdivision, 50
spectral density function, 155
spectral distribution function, 155
spectral representation, 155
spectrum, 155
square law device, 159
St. Petersburg paradox, 97
standard deviation, 98
standard Gaussian distribution, 110, 111
standard Gaussian random variable, 110
state, 145
  absorbing, 146
  aperiodic, 146
  closed set, 146
  ergodic, 146
  mean recurrence time, 146
  null, 146
  period, 146
  persistent, 146
  reachable, 146
  transient, 146
stationary Gaussian process, 140
statistical
  hypothesis, 132


  inference, 132
step function, 152
stochastic integral, 152
stochastic process, 135
strictly stationary process, 139
strong law of large numbers, 123
subdivision, 47
submartingale, 150
subset, 13
subspace, 60
sum of random variables, 102
superior limit, 36, 42
supermartingale, 150
superset, 13
supremum, 36
surjective function, 20
symmetric difference, 16
symmetric distribution, 107
symmetric matrix, 114
symmetric relation, 19

Taylor's series, 107, 121
Taylor's theorem, 105
topological space, 30
topology, 30
total set, 63
trajectory, 135
transfer function, 158
transition probability, 145
transitive relation, 19
triangle inequality, 61

unbounded set, 36
unbounded variation, 49
uncorrelated random variables, 101
uncountable set, 21
uniform convergence, 144
uniform distribution, 84, 109
union, 15
upper bound, 35
upper Riemann integral, 47
usual topology, 30

variance, 98
vector space, 59
version, 123

weak law of large numbers, 122
Weierstrass Approximation Theorem, 160
wide sense stationary process, 140, 154
Wiener, Norbert, 164
Wiener process, 164
Wise, Gary, 74, 129
with probability 1, 116

Zermelo-Fraenkel, 13
zero memory nonlinearity, 159
